Document compatibility between Microsoft and open source formats is probably going to get worse before it gets better.

Right now we have quite decent compatibility between OpenOffice and MS Word with the 97–2003 .doc format. Not perfect by a long shot, especially for CJK documents in my tests, but usable.

So here is why it’s going to get worse. Microsoft is pushing its new .docx XML standard, which, of course, OpenOffice can’t import (nor can anything else, for that matter).

In my tests, if you open a .docx as “Word XML” in OpenOffice, expect a crash. “Oh, but .docx is actually a standard proposed to ECMA,” one might exclaim, “so now one can look up the spec instead of reverse engineering the proprietary Word 97–2003 .doc format.” Er, no.

The proprietary Microsoft specification, expertly named “Office Open XML” to confuse everyone, is a 4081-page, Word-centric document from hell. I’ll be surprised if this red-hot spec is complete and stable by the time Office 2007 actually ships. Still, Microsoft has the money and contacts to push this standard as if it were a reasonable, implementable reference specification. Ouch.
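For the curious: a .docx is a ZIP package, and the body text lives in the part `word/document.xml` as WordprocessingML. Below is a minimal Python sketch (standard library only) that pulls the paragraph text out of that part. The function name `docx_paragraphs` is hypothetical, and the sketch ignores formatting and every other part of the package (styles, headers, footnotes), so treat it as an illustration of the layout rather than a real converter.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx (ECMA-376)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each <w:p> paragraph in a .docx file.

    `source` may be a path or a binary file-like object. Only <w:t> text
    runs inside paragraphs are read; formatting is dropped, and headers,
    footnotes and styles (which live in separate parts) are not looked at.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is usually split across several runs (<w:r>)
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    # Tiny in-memory package holding only word/document.xml, which is all
    # this sketch reads (a real .docx has several more parts).
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>World</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", document_xml)
    buf.seek(0)
    print(docx_paragraphs(buf))  # ['Hello World']
```

Note how even a two-word “Hello World” can be split across two runs; that is one reason .docx text extraction is less trivial than it looks.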

Microsoft’s “Office Open XML” spec was also, of course, designed to completely derail the true “open source” format standard proposal, ODF. ODF, otherwise known as the OpenDocument Format, is currently implemented in OpenOffice as “OpenDocument Text” (.odt). I don’t have much experience with ODT or ODF, except that it seems to incorporate every XML technology under the sun. So expect your Office application to be like a Web browser, except more complex. How on earth it maps onto A4 PostScript or PDF is anyone’s guess.
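An ODT file is also just a ZIP package. As a rough sketch, assuming ODF 1.2 and Python’s standard library, this is about the smallest “Hello World” .odt one could build: a `mimetype` entry that must come first and be stored uncompressed, a `META-INF/manifest.xml` listing the other parts, and a `content.xml` holding one paragraph. The helper name `hello_world_odt` is hypothetical, and the output has not been validated against every ODF consumer; a real OpenOffice export also carries `styles.xml`, `meta.xml` and friends.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.text"

MANIFEST_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
 <manifest:file-entry manifest:full-path="/" manifest:version="1.2" manifest:media-type="{MIMETYPE}"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

# {text} is filled in with str.format; the XML itself contains no other braces
CONTENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
 <office:body>
  <office:text>
   <text:p>{text}</text:p>
  </office:text>
 </office:body>
</office:document-content>
"""


def hello_world_odt(text="Hello World"):
    """Return the bytes of a minimal .odt package containing one paragraph."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        # ODF wants "mimetype" as the first entry, stored without compression
        z.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/manifest.xml", MANIFEST_XML)
        # Escape &, < and > so arbitrary text stays well-formed XML
        z.writestr("content.xml", CONTENT_XML.format(text=escape(text)))
    return buf.getvalue()
```

To try it, write the bytes out, e.g. `open("hello.odt", "wb").write(hello_world_odt())`, and open the file in OpenOffice. Compare that with the handful of XML parts in the extracted ODT versus the DOCX.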

ODF is now implemented by a couple of programs. “Office Open XML” is only implemented in the repeatedly delayed Office 2007 release. Here is a comparison article on Wikipedia.

Anyway, if you are wondering what a “Hello World” in each “Office” format looks like, then I have a treat for you:

# Hello World ODT file
# ODT converted from Hello World Word doc
# Extracted ODT
# MS Word 97–2003 Hello World
# Microsoft’s Office Open .docx
# Extracted DOCX
# Hello World PDF converted from .docx by Office 2007 beta 2
# Hello World PDF from OpenOffice 2
# Microsoft’s XML Paper Specification (XPS) Hello World
# Extracted XPS
# TIFF from “Print to file” in Office 2007
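The “Extracted” items above hint at the common trick: the ODT, DOCX and XPS files are all ZIP archives, while the Word 97–2003 .doc is not (it is an OLE compound file). Here is a small, heuristic Python sketch for telling them apart, based on two packaging conventions: ODF packages begin with a `mimetype` entry, and OPC packages (the container used by .docx and XPS) carry a `[Content_Types].xml` part. `package_kind` is a hypothetical helper name, and the checks are deliberately shallow.

```python
import io
import zipfile


def package_kind(source):
    """Guess which family an office file belongs to (heuristic only).

    `source` may be a path or a binary file-like object.
    """
    if not zipfile.is_zipfile(source):
        return "not a ZIP package (e.g. a Word 97-2003 .doc)"
    with zipfile.ZipFile(source) as z:
        names = z.namelist()
        # ODF: first entry is "mimetype", holding the media type as text
        if names and names[0] == "mimetype":
            media_type = z.read("mimetype").decode("ascii").strip()
            return "ODF: " + media_type
        # OPC (.docx, .xps, ...): always has a [Content_Types].xml part.
        # Strictly, the main part is located via _rels/.rels;
        # word/document.xml is just the conventional name for .docx.
        if "[Content_Types].xml" in names:
            if "word/document.xml" in names:
                return "OPC: WordprocessingML (.docx)"
            return "OPC: some other package (e.g. .xps, .xlsx, .pptx)"
    return "unknown ZIP package"


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        z.writestr("content.xml", "<x/>")
    buf.seek(0)
    print(package_kind(buf))  # ODF: application/vnd.oasis.opendocument.text
```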

Scary things to think about:
# The sheer complexity of these new XML formats
# What about the millions of existing Word 97–2003 documents out there and their 3rd party tools?
# Might implementers create XML filters for existing Word doc tools?
# Will “commercial” implementers write their own OpenDocument engine from scratch?
# How many people will convert (upgrade) their Word documents to .docx?
# Will there be a plugin for Office 2007 to export to .odt?
# Will OpenOffice implement import from .docx? Should they?
# Will Microsoft be forced to implement .odt?
# What happens if the industry is bribed towards .docx?
# Long term: what will the fallout be? Two “new” standards are vying here. Will either complex standard work out?
# What about Excel? Excel is supposedly mapped out a little in Microsoft’s “Office Open” spec, whilst there is nothing for spreadsheets in ODF.
# Will Office formats map onto, or become part of, the HTML Web?

Personally, I think the new Office format should be reStructuredText. ;)


Btw, I’ve gone off reStructuredText. I am now a big HTML and http://www.princexml.com/ fanboy. :)

Anyway, I know for a fact that Google Docs does not actually use any of the aforementioned formats to store its documents. Its main language is really the Web. Ah, HTML again.

I reckon this Web thing could be big. ;)

Comment by hendry
There actually is a spreadsheet format in the ODF spec: ODS.
Comment by jeff