Word Up

Word Up

Notes on recreating existing Word reports in Rmarkdown with Officer, Officedown and flextable.

Background: I’m currently working on a project where I am using parameterised rmarkdown for the first time - hooray! This will end up as 13 separate documents of about 30 pages, with lots of tables and charts. This is the first phase of many similar documents.

But, I have to render my final output to Word, which, I’m not going to lie, was a bit of a worry.
That’s a lot of Word, and a lot of scope for things to go wrong.

While I’m not adverse to Excel, I am far from a fan of Word.
Much of this is because of its infuriating layout quirks when it comes to margins,figures and tables.

The {officer}, {officedown} and {flextable} packages make working with Word easier.
You can use {officer} to create Word and Powerpoint documents when working in R, {officedown} to do the same in R Markdown, and {flextable} for, well, tables.

The advantage of {officedown} is it allows dynamic figure and table captions, cross referencing, and whole host of other features ( like multi column layouts, changing page orientation, and different formats for different sections) {officer} is a lower level package, where you work with adding paragraphs, and other blocks. One of my favourite features is block_pour_docx which allows you to insert an entire page, or even an entire document, into your rmarkdown file, as a one-liner.

{flextable} is not just for Word, and ranks alongside {gt} as a general purpose solution for all your table needs. If you haven’t considered it before now, it is definitely worth spending some time on.

There’s not a lot of documentation / help/ blog posts out there for working with these packages, especially for Word.
Or, at least, not in comparison to working with Rmarkdown generally.

Alison Hill wrote a post detailing her friction log when working with Powerpoint, so consider this my friction log of working with Word.

(These are in no particular order)

Applying your departmental / organisational styles

You will see advice that says you need to knit a regular Word document ( from the New File menu item in RStudio), then open it (it doesn’t need any content) and then amend the styles in this document, before passing it to officer as a reference document.

My colleagues already had a .dot template with our departmental styles applied. I found that I could use that (after I’d saved a copy as a .docx file) and the styles were picked up by {officedown}. No need to generate a blank file and amend everything from scratch (there are a lot of style elements and it would have been really tedious). If you already have a document with styles in place, you should be good to go. Here’s how my Rmd file looked. You specify rdocx_document as the output format, and I saved my template file in the root directory so it was easy to reference :

output:
  officedown::rdocx_document:
    reference_docx: "my_report_template.docx"

Adding a row to a table to create space / improve presentation

Throughout this project I’m recreating an existing suite of tables and plots that have, until now, been produced in Excel & Word. The first table had totals at Scotland, Council and Health Board level, then a space, then a range of lower level data.

Once I’d got the data into shape, I first added the space using dplyr::add_row().

But, over time, this felt wrong - I didn’t like the idea of adding an unnecessary empty row to a dataframe, especially if I might need it for further analysis later on.

The solution, instead, was to add padding to the relevant row(s)

flextable::flextable(df) %>%
    flextable::padding(., i = c(1,9), padding = 10, padding.top = FALSE)

Here I’m adding padding to rows 1 and 9 of my table ( flextable uses the i and j conventions for applying functions and modifications to rows and columns), and ensuring the padding goes at the bottom by setting padding.top to FALSE.

Setting row heights

Another way to improve presentation in some of my tables was to reduce row heights for some rows, and increase it for others.

out_table  %>%
    flextable::height(., i = c(4), height = .015, part = "body") %>%
    flextable::height(., i = c(1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13), height = .25,
           part = "body")

Row 4 gets less space, while the other row heights are increased. N.B - I thought I could do something like

flextable::height(., i = c(1:3, 5:10, 12:13), height = .25,
           part = "body")

but it didn’t seem to like it, so each row got entered individually instead.

Amending number formats for specific columns

Here I amend the 3rd, 4th and 5th columns - because the 3rd column name is dynamic, I didn’t want to refer to it by name

flextable::flextable(table_data, cwidth = 1.05) %>%
    flextable::colformat_double(.,j = c(3, 4, 5), digits = 1) %>%

For other tables, where I know the column names are static, I can enter the names directly:

flextable::flextable(table_data, cwidth = 1.05) %>%
    flextable::colformat_double(.,j = c("Rate","Lower bound","Upper bound"), digits = 1)

Adding a partial header row above existing column names

I needed to add header row above the final two columns in my seven column table, denoting whether there was a statistically significant difference between them and a reference value.

flextable::add_header_row(.,values = c("","Significance"), colwidths = c(5,2)) 

The first 5 columns get passed nothing, while “Significance” spans the final two columns.

Highlighting a row

Using bg, we can add a custom background colour to highlight a specific row.

I found that when creating the table in a function, I had to to use the which syntax below:


flextable::bg(out_table, 
                     i = which(out_table$body$dataset$areaname == mylocation),
                     bg = "wheat",
                     part = "body") 

Prior to this, I was using the syntax below, but it would not work inside a function.

 i = ~ areaname == eval(mylocation),

Remove / blank out a column heading

Easy enough, give the column name you want to amend and pass it ""

out_table %>%
    flextable::set_header_labels(., areaname = "")

superscript in the chunks

Ok, technically not an {officer} / {officedown} thing, but here it is anyway. You’ll see I’m specifying the figure captions here, also


{r fig6, fig.height = 4, fig.width = 6.5, fig.cap= "Annual birth rate per 1,000 population^1^, 2002 - 2020"}


{r fig8, fig.height = 3.5, fig.width = 6.5, fig.cap= "Age-sex standardised all age death rate per 100,000 population^1^, 2002 - 2020^2^,^3^", fig.cap.style = "Image Caption"}


ensuring a table returned even if no data exists

If this sounds crazy / stupid, then here’s the scenario. This particular table looks at deprivation. Not all areas have datazones within a certain deprivation decile, hence, there will not always be data to show. However, I want the table numbers to be consistent across all the documents - table 10 in doc1 should address the same topic as table 10 in doc 2 (instead of showing table 11 data under a table10 heading).

I went belt and braces here, using tibble::tribble() and tibble::add_row()


if (my_nrows < 1) {

  out_data <- tibble::tribble(~`Data Zone`, ~Name, ~`Rank in Scotland (1 = most deprived area  6976 = least deprived`,  ~ `National decile of deprivation`, NA_character_,NA_character_,NA_character_,NA_character_)

  outdata <- out_data %>%
    tibble::add_row(., tibble::tibble_row())

}

This returns an empty table for the edge case where currently no zones are in the specified decile.

Here’s a strange one. When knitting docs individually, the resulting Word file opened and the table of contents was displayed. But, when rendering all of the docs in a loop using purrr, there was no TOC, and instead there was the following warning:

 External links warning

I was asked if I’d like to update external links within the document. If I said ‘no’ , then the TOC did not update. If I said ‘yes’, then it would update, and appear as intended. I’m not sure why I only saw this when rendering with rmarkdown::render(), but there doesn’t appear to be a way round it. This means, when the final documents are distributed, we will have to advise readers to accept any prompts to update external links, which always feels a bit uncomfortable. Or, we have to ensure that we ourselves open and refresh all the documents before onward distribution. More on this here

The great & / and debacle

TLDR - lots of my variables had and within their names. In some datasets, this was changed to the ampersand. For example Skye, Lochalsh and Wester Ross became Skye, Lochalsh & Wester Ross

I need to unify these, and let’s face it, typing “&” is easier, and takes up less space.

Big mistake. All seemed to be going well, until I tried generating a report with Skye, Lochalsh & Wester Ross as the parameter. I’m sure someone is laughing at this, thinking, “well of course you can’t have “&”, that’s a special character in html / xml”, but I didn’t know that, being as I’m not an actual super nerd. (Well, OK, I am, but even I have my limits).

Debugging this took me a long time. My first thought was I’d messed up my functions, and then I got some great advice on Twitter, and realised that the markdown file was rendering 100% completely and the failure point was between the .md file and the final Word output.

Reverting all ampersands back to and ( which mean re-running the entire pipeline from scratch) allowed me to generate the elusive final document, and a lesson learned.

Oh how I laughed
I like ‘&’. I like ‘and’. But which is better?
There’s only one way to find out.
FIGHHTTTT!

Best layout for table widths

Depends a lot on your margins, but instead of relying on autofit(), this seems to work well :

    flextable::width(., width = dim(.)$widths * 6.5 / (flextable::flextable_dim(.)$widths))

Inserting cover images

I’m going to leave that for the next post, which I hope will be published within the next few days..so please look out for the follow up post!


© 2016 - 2022. All rights reserved.