Thu 21st Feb 00:06 2008: MS Office

It seems that Microsoft have released documentation on the major MS Office file binary formats.


Joel Spolsky defends the lack of clarity.

Personally, I find notes such as this one rather confusing:

Note The end of a section is also the end of a paragraph. The last character of a section is a
section mark which stands in place of the paragraph mark normally required to end a
paragraph. An exception is made for the last character of a document which is always a
paragraph mark although the end of a document is always an implicit end of section.
(Page 31, Word spec.)

Does that mean much to you? It doesn't help me a lot, and it doesn't suggest a cleanly designed format.
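To make the note's rule concrete, here is a minimal Python sketch that reads it literally: a section mark ends both the current paragraph and the current section, and the last character of a document is always a paragraph mark, with the end of the document implicitly closing the final section. The constants `PARA_MARK` (carriage return) and `SECTION_MARK` (form feed) and the function `split_sections` are assumptions for illustration only; a real Word reader would probably need more than the raw characters (for instance, the format's own section data) to find section boundaries.

```python
# Assumed mark characters, chosen for illustration only.
PARA_MARK = "\r"       # assumed paragraph mark
SECTION_MARK = "\x0c"  # assumed section mark


def split_sections(text):
    """Split a character stream into sections, each a list of paragraphs.

    Follows the quoted note literally:
    - a section mark stands in place of a paragraph mark, so it ends
      both the current paragraph and the current section;
    - the last character of the document must be a paragraph mark, and
      the end of the document implicitly ends the last section.
    """
    if not text.endswith(PARA_MARK):
        raise ValueError("last character of a document must be a paragraph mark")

    sections, paragraphs, current = [], [], []
    for ch in text:
        if ch == PARA_MARK:
            # Ordinary end of paragraph.
            paragraphs.append("".join(current))
            current = []
        elif ch == SECTION_MARK:
            # End of paragraph AND end of section in one character.
            paragraphs.append("".join(current))
            sections.append(paragraphs)
            paragraphs, current = [], []
        else:
            current.append(ch)

    # Implicit end of section at the end of the document.
    if paragraphs:
        sections.append(paragraphs)
    return sections


print(split_sections("Title\rIntro\x0cChapter one\rMore text\r"))
# → [['Title', 'Intro'], ['Chapter one', 'More text']]
```

Note that the last section has no section mark of its own: it ends with an ordinary paragraph mark, and running out of input is what closes it.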

But then again, I haven't read it in full; I'm really only interested in the MS Publisher format, as I am trying to decode that myself.

Joel anticipates that a naysayer might claim that these formats:

  • are deliberately obfuscated
  • are the product of a demented Borg mind
  • were created by insanely bad programmers
  • and are impossible to read or create correctly.

From what little I have glanced at, I would have to concur with the hypothetical naysayer. Joel's explanations ("we weren't thinking of interoperability") are not terribly convincing; when one version of MS Word is not compatible with another version, you have a problem anyway. The argument that RTF is a standard format is laughable; I have read the documentation, and it barely covers any features at all (headers and footers, for example, are not properly documented). CSV is another "simple" format which doesn't work properly: are quotes required around fields containing spaces, or not? It depends on which MS application wrote the file, and which one reads it.
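For comparison, RFC 4180 (the informal CSV specification published in 2005) says that fields containing commas, double quotes or line breaks should be enclosed in double quotes, and that spaces are part of a field. A minimal sketch using Python's standard `csv` module, whose default writer quotes only fields containing the delimiter, the quote character or a line-terminator character, shows a reader treating quoted and unquoted space-containing fields identically. This illustrates one reader's behaviour under that convention, not how any particular MS application handles the same file.

```python
import csv
import io

# Two data rows that differ only in whether the spaced field is quoted.
data = 'id,name\r\n1,John Smith\r\n2,"John Smith"\r\n'

rows = list(csv.reader(io.StringIO(data)))
print(rows)
# → [['id', 'name'], ['1', 'John Smith'], ['2', 'John Smith']]

# Minimal quoting: a space alone does not trigger quotes; a comma or an
# embedded quote does, and embedded quotes are doubled.
out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL)
writer.writerow(["1", "John Smith", "Smith, John", 'say "hi"'])
print(repr(out.getvalue()))
# → '1,John Smith,"Smith, John","say ""hi"""\r\n'
```

Under this convention the answer to "are quotes required around fields with spaces?" is simply no, and a conforming reader accepts either form.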

Joel's argument that, because the formats took thousands of coder-years to write, being compatible with them must also take thousands of coder-years is misguided, at best. The point of a document format, whether it's intended to be internal or public, is that it be clearly documented and understood, not just easy to code. At least, that's the way it's done in the professional IT world: SGML, the W3C, and so on. That discipline makes sense even if MS assumed that their closed-source monopoly would be accepted by every country on the planet, in perpetuity, for compatibility purposes. It's a lot easier to check that a web page is valid (it passes or fails the online W3C validator) than to check that an MS Office document is valid (it opens "correctly", whatever "correctly" means, in every version on every platform).

I haven't looked at the legal requirements behind this publication, but it looks as if MS have put forward the bare minimum, whilst also acknowledging that their presumption has always been monopoly domination of the market. As that monopoly has been declared illegal, the onus to rectify the situation must fall on Microsoft, not on third-party developers to fit in with Microsoft's formats. Microsoft are the ones who have been found guilty, not the many third parties who need to interoperate with their poorly designed, poorly written software.

Steve's urandom blog