Comparing Microsoft Word renderings with PDF export
Some time ago I came up with the idea of automatically comparing how different office applications render Microsoft documents. Here are some output and thoughts.
- Export test file to PDF from application
- Convert the PDF to TIFF with ImageMagick’s convert
- findimagedupes $MICROSOFT_TIFF $COMPETITOR_TIFF
office2007beta2/4-1-1-23.tiff abiword/4-1-1-23.tiff: 89.45% similar.
office2007beta2/2-19-1-1.tiff freepdfconvert/2-19-1-1.tiff: 98.44% similar.
office2007beta2/2-19-14-7.tiff freepdfconvert/2-19-14-7.tiff: 99.61% similar.
office2007beta2/4-1-1-23.tiff freepdfconvert/4-1-1-23.tiff: 96.48% similar.
office2007beta2/2-19-1-1.tiff neevia/2-19-1-1.tiff: 98.44% similar.
office2007beta2/2-19-14-7.tiff neevia/2-19-14-7.tiff: 99.61% similar.
office2007beta2/4-1-1-23.tiff neevia/4-1-1-23.tiff: 94.92% similar.
office2007beta2/2-19-1-1.tiff OO/2-19-1-1.tiff: 91.80% similar.
office2007beta2/2-19-14-7.tiff OO/2-19-14-7.tiff: 99.22% similar.
office2007beta2/4-1-1-23.tiff OO/4-1-1-23.tiff: 88.28% similar.
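The three steps above could be wrapped in a small script. What follows is only a sketch: it assumes ImageMagick’s convert and findimagedupes are both on the PATH, that the PDFs have already been exported by hand, and that each TIFF is written next to its PDF. The function names are made up for illustration. findimagedupes is called with exactly the two arguments shown in the steps, nothing more.

```python
import subprocess
from pathlib import Path
from typing import List


def tiff_path(pdf: Path) -> Path:
    """Return the TIFF path that sits next to the given PDF (assumed layout)."""
    return pdf.with_suffix(".tiff")


def convert_command(pdf: Path) -> List[str]:
    """ImageMagick invocation that rasterises one PDF to TIFF."""
    return ["convert", str(pdf), str(tiff_path(pdf))]


def compare_command(reference_tiff: Path, candidate_tiff: Path) -> List[str]:
    """findimagedupes invocation with the same two arguments as in the steps."""
    return ["findimagedupes", str(reference_tiff), str(candidate_tiff)]


def compare_pdfs(reference_pdf: Path, candidate_pdf: Path) -> str:
    """Rasterise both PDFs, then return whatever findimagedupes prints."""
    for pdf in (reference_pdf, candidate_pdf):
        # check=True raises if convert chokes on the PDF, so the
        # failure is visible instead of being silently skipped.
        subprocess.run(convert_command(pdf), check=True)
    result = subprocess.run(
        compare_command(tiff_path(reference_pdf), tiff_path(candidate_pdf)),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling compare_pdfs(Path("office2007beta2/4-1-1-23.pdf"), Path("OO/4-1-1-23.pdf")) would then run the whole chain for one test case. Building the command lists in separate functions keeps them easy to inspect without running either tool.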
- AbiWord 2.4.4 failed to generate PDFs that ImageMagick’s convert could work on, except in one case. Btw, the rendering in that case is completely wrong. Notice, though, that it still scored an 89.45% similarity.
- The chart test PDFs that MS Office 2007 beta 2 output made ImageMagick’s convert choke! :( Hence, I had to compare them by hand.
- Neevia and freepdfconvert do very high quality PDF conversion, close to what MS Office 2007 itself produces.
- Converting test files to PDF in each application is extremely laborious
- OpenOffice 2.0.3 does well, except for minor details in the rendering such as font size and alignment
- I did try other “online” PDF converters. Neevia and freepdfconvert were the only two to pass my basic CJK test.
- Reference PDF conversion with OO can be automated
- Of course, I assume that the PDF export is the same as the on-screen rendering. This isn’t strictly true. For example, the coloured background of a Word document won’t come out in the PDF.
- Imagemagick’s compare and pdiff can help
- I tried Softmaker’s Textmaker 2002 Linux edition. It couldn’t render any of my doc tests. The new 2006 Textmaker version for Windows renders my tests well, though it seems unable to export to PDF. Doh!
- Microsoft are doing away with PDF export
- The similarity index can only be used as a rough guide to flag possible problems. OO renderings are often below 90%, but the information is there.
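Using the similarity index as a rough flag could look something like this. It is a sketch, not a finished tool: it assumes the output lines have the "reference candidate: NN.NN% similar." shape shown earlier, and the 95% cut-off is an arbitrary choice for illustration, not a value I have validated.

```python
import re
from typing import List, NamedTuple


class Result(NamedTuple):
    reference: str
    candidate: str
    similarity: float


# Matches "ref.tiff cand.tiff: 89.45% similar." whether the results
# come one per line or run together on a single line.
RESULT_RE = re.compile(r"(\S+)\s+(\S+):\s+(\d+(?:\.\d+)?)% similar\.")


def parse_results(output: str) -> List[Result]:
    """Turn the similarity report text into structured results."""
    return [
        Result(m.group(1), m.group(2), float(m.group(3)))
        for m in RESULT_RE.finditer(output)
    ]


def flag_for_review(results: List[Result], threshold: float = 95.0) -> List[Result]:
    """Return the results scoring below the (assumed) threshold."""
    return [r for r in results if r.similarity < threshold]


sample = (
    "office2007beta2/4-1-1-23.tiff abiword/4-1-1-23.tiff: 89.45% similar.\n"
    "office2007beta2/2-19-14-7.tiff neevia/2-19-14-7.tiff: 99.61% similar.\n"
    "office2007beta2/4-1-1-23.tiff OO/4-1-1-23.tiff: 88.28% similar.\n"
)

for r in flag_for_review(parse_results(sample)):
    print(f"{r.candidate}: {r.similarity:.2f}% (check by hand)")
```

With the three sample lines this flags the AbiWord and OO renderings and leaves the Neevia one alone. A flagged file still needs a human look: the broken AbiWord rendering and a usable OO rendering land in the same range.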
If there is a way to hack Microsoft Office 2007 beta 2 to create (correct) PDFs from the command (cmd) line, then this technique has a hope. I can’t be bothered to go through hundreds of test cases by hand. Since Neevia and “freepdfconvert” are so close to the Microsoft Office 2007 renderings in my tests, perhaps they could be used as a base reference.