Org files to beatiful docx files with Pandoc
When it comes to sharing written documents with my co-workers, I usually need to use Word or PDFs. Org-mode includes an export option to odt files, which does the job in a pinch, but it’s a bit harder to customize, especially if I want to preserve certain formatting elements. In the past, I used to create ODT files, import them into Word, make whatever changes I needed, and save them in a docx file. The problem there, besides the manual work, is consistency. If I want to make sure I use the same style for fonts, header sizes, and table formats, things get annoying fast.
About a year ago, I discovered that Pandoc can use an external Word document (docx) as a style template. As long as I have this file accessible, every org file I convert to docx will automatically (and beautifully, I may add) retain the exact style changes I need. This, of course, can be automated later inside Emacs, so the export process is basically seamless.
The problem lies in the fact that you need to work with Word and understand how its styles. Readers of this blog know I love to complain about Microsoft’s UI and its lack of consistency, and the instructions here are one fine example in my opinion. Be it as it may, I will try to keep the snarky comments in check.
Creating the default style sheet
The first thing to do is get Pandoc to create its default style reference document as a .docx. This can be done with pandoc -o custom-reference.docx --print-default-data-file reference.docx.
What you’re telling Pandoc here is to output (-o) a file named custom-reference.docx (you can call it whatever you’d like, as long as it’s a docx, since you want to work with Word). What to output goes into this file? The default reference document.
Once you do that, you will have a custom-reference.docx in the folder you ran the command from.
Now it’s time to work inside Word.
Modifying the styles
What you will see is a rather plain file listing a bunch of styles and their name. First, you will see “Title,” and it will be nice and big because it’s using the Title style in Word. Heading 1 will use the Heading 1 style, Body Text (further down) will use the Body Text style, and so on.
What you want to do is to access these styles in Word. You will be modifying them to fit to your liking.
To do that, search for the Styles Pane. In my case, Office 365 for macOS, I can find it in the ribbon under Home, but I wouldn’t be surprised if that’s a little different if you’re using Windows, or a different version of Word.
With the Style Pane open, you will see a long list of the available styles. You can navigate this way, but I find it’s easier to just look for them in the document to the left and select them; it will then show under Current style: to the right.
Now that you have a style selected, select the name of that style from the Style Pane and choose Modify Style…1. You will see the options available to you, such as the font name, text color, alignment, etc.
Make sure that you do not change the Name: of the style! You’re modifying existing default styles that Pandoc recognizes; if you create a new one (by giving it a new name), Pandoc will not know what to do with it and will ignore it. Remember, you are modifying a reference sheet of existing styles, not creating new styles. Likewise, this means that you have no reason to change anything in the document itself (the content) - Pandoc doesn’t care about that and will ignore it. You’re only touching the options in the Modify Style… window.
That’s basically all there is to it. As you will see soon, some items can get a bit more complicated as they are not available in the same way. Save the files when you’re done modifying the styles. You will probably keep going back to this document as you export docx files, tweaking things until you like them.
Using Pandoc to export from org-mode to a docx file
The next step is to tell Pandoc to convert an existing org file to a docx file using the reference sheet we just worked on. To do that, use the following command:
pandoc -s <input_file.org> -o <output_file.docx> --reference-doc custom-reference.docx.
The idea is the same as before, but we use a stationary option, -s (the reason for that is that we’re telling Pandoc to use a “full” version of a document, needed with the reference doc option, otherwise it can’t use our reference document).
Then, we’re pointing Pandoc to our org file and using the output, -o, as the doc file we want, then finally using the reference-doc option by utilizing our reference file, custom-reference.docx (unless you change the name to something else). That’s it, Pandoc will create the file in the folder you are in.
Images and Tables
For me, things get a bit more complicated. My technical documents often include screenshots, and I need those embedded in the resulting Word docx file.
The above command will embed images in the Word document, as long as org-mode can find and display your images (using the inlineimages option in org-mode). My org-mode technical notes are all written with Denote these days, which keeps everything organized, including the images, which are inside the default data subfolder. If your workflow is different, be mindful of your path to the images; you’d probably want to use an absolute path in org-mode (use C-u C-u C-c C-l when linking an image instead of the usual C-u C-c C-l for this). There’s also the option to extract media in Pandoc (see the –extract-media option). In my opinion, it’s simpler and cleaner to dedicate a folder for your org documents, completed with a subfolder for attachments.
Another issue that stumbled me was the tables.
Pandoc’s default style for tables is minimal - without borders. This might look nice if you want to hide your tables, but if you need a table with borders and perhaps some shading, you’ll run into issues.
The problem is with Word. The style for the table is not included in the Style Pane. In order to change the style for your table in the reference doc, you need to locate Table Design in your ribbon. Notice that if you work on a small screen (like in my recording), you might have to expand your options to find it. Then, you will have to extend the pane further and locate the Modify Table Style below. Once again, this is true for my case, Office 365 on macOS. If you’re using a different Office version or on Windows, this menu might be located elsewhere.
Once you have Modify Table Style open, you can change the borders and shading to your heart’s content, but as before, make sure the style name remains Table. If you create a new style, Pandoc will ignore it.
A few tricks I picked up
Under the name of the style, in the Modify Style or Modify Table Style window, you will see the Style based on option. You can change this option and browse through different colors and fonts to find something you like, and then customize further instead of starting from scratch. As long as you don’t change the name of the style, you’re OK. I found this useful when working with tables, as I was looking for a table with shading on every other row.
Speaking of tables, the border button is the little square icon in the middle. I spent a few moments trying to find it.
Bonus: use dwim-shell-command for quick export
If you’re an Emacs user, you like shortcuts, and you like to use shell commands from within Emacs. If you haven’t yet, check out dwim-shell-command.
Here’s how I have it set up. All I have to do is mark the org file(s) I want to convert to a docx, and that’s it. It doesn’t matter if I want to export only one file or fifty; it’s the same ease of use.
(defun jtr/dwim-shell-command-pandoc-org-to-docx ()
"uses pandoc to convert an org file to docx using a template docx file. The docx template file is stored in my Notes sync folder"
(interactive)
(dwim-shell-command-on-marked-files
"converting from org to docx"
"pandoc -s '<<f>>' -o '<<fne>>.docx' --reference-doc ~/Sync/Notes/Info/custom-reference.docx"
:utils "pandoc"))
Footnotes
1 : Or, another way to work is to use to select the text you want to modify, then go to Format and select Style from there; you will need to then choose Modify in the window that shows.