<!--#include virtual="/c/head.shtml"-->
<h3>OpenOffice.org Findings</h3>
I've been looking at the work done by the OpenOffice.org team (<a target="_blank" href="http://www.openoffice.org/">http://openoffice.org/</a>)
and the file format. It's very good: it's small, open, XML-based, and allows for nearly infinite expansion and backwards-compatibility
in the future.<P>
So far, I've <a href="xml.shtml">documented my findings here</a> about the .sxw file format.<P>
There has clearly been a lot of work put into the xml.openoffice.org site, and even more work into the file format, but
the documentation they have doesn't really tell you what you need to know to <i>start</i> getting your teeth into the file format.
<P>
I'm starting to attack the file format myself, and have started creating some pretty decent-looking documents in the format.
If I can show you a way in to understanding what it's about, and where the useful information is to be found, then I'll consider
my goal achieved.
<P>
I'm interested in the format for two reasons. Firstly, I believe it's such an excellent format that
it will take off, and end up having huge significance - think of the influence Microsoft have over millions of PC users
who have to upgrade to the latest version of MS Office simply because everyone else has. Secondly, I'm in charge
of an application which creates user-editable reports. So far, it creates HTML and RTF documents... HTML is not a very
powerful WP language (it's <i>not</i> a WP language!), and RTF is - effectively - undocumented. The RTF documentation
is <a href="RTF-Spec-1.5.rtf">here</a>. Okay, this is 1.5, which Office 97 supports, so it's a bit old, but please, feel
free to (a) find this on www.microsoft.com, and once you've cleared that hurdle, please (b) let me
know how it documents putting page numbers into footers... it doesn't. It gives a passing mention to footers, but nothing
about putting page numbers into them - a pretty standard thing to do with a document.
<P>
A valid question to ask is, "Who are you to be talking about XML? Your website's HTML is simplistic and not even valid HTML, let alone
XML." That's for a very good reason - see the <a target="_blank" href="http://www.anybrowser.org/campaign/">Any Damn Browser</a> icon
at the bottom of the page - it might be simple and wrong, but it works. For XML, the rules are different. I am not breaking the HTML
rules lightly, I am breaking them because it makes the site visible in everything including Lynx, Mosaic, Netscape, Internet Explorer,
and Mozilla. I assume it works in Opera and everything else I've not tried, too... With XML, all tags must be closed, and everything
is properly defined. The OpenOffice.org File Format defines a standard which every word processor can implement, and which
allows for extra features, and for different features offered by different word processors, seamlessly - so long as they
all follow the specification. That is why the OpenOffice.org spec. is good - it's not just good, it's properly documented.
<P>
<h2>Gotchas</h2>
One gotcha I've found so far: when trying out your code, adding features and so on, don't use "File|Reload" to see if your changes
have worked. I believe the team are working on this problem, but at the moment, opening a file either succeeds or fails - it does
not give any explanation of the problem. This has a rather drastic effect on the File|Reload feature... if the file was
okay before and you then broke it, the document will fail to load, but the cursor will go to the start of the current document, thus
giving the impression that the file has been reloaded, and that what you're seeing is the document you've just created. I have opened
<a target="_blank" href="http://www.openoffice.org/issues/show_bug.cgi?id=7597">bug #7597</a> on this issue.
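<P>
Until the loader reports errors itself, one workaround is to check that your XML is at least well-formed before you try to open it.
The following is only a minimal sketch in Python, using the standard library's ElementTree parser; the function name
<code>check_well_formed</code> and the two sample strings are my own inventions for illustration. It only catches well-formedness
problems (such as an unclosed tag), not files that are well-formed but break the OpenOffice.org specification.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a short description of the first error.

    Hypothetical helper: this checks well-formedness only, not validity
    against the OpenOffice.org file format specification.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"line {line}, column {column}: {err}"
    return None


# Made-up sample documents, not real content.xml files.
good = "<doc><p>Hello</p></doc>"
bad = "<doc><p>Hello</doc>"  # the <p> tag is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

If the function returns a message rather than None, the file would not have loaded anyway, so there is no point trying File|Reload on it.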

<P>
<H2>Benefits</h2>
<h3>The benefits I see to the OpenOffice.org File Format</h3>
The OpenOffice.org file format is, IMHO, great.<P>
I'm not a word-processor developer, though I am working on an application which creates WP files; that makes me a much lower
life-form. I don't have to worry about everything a user might do, I just have to make sure that the files I create are acceptable
to the word processors. The WP programmers can then worry about what strange things users might do to the document.<P>
The file size is small, because it's XML (which is really just text) which is then zipped. This tends to be more compact than a binary
format, because a binary format has to do any compression piece by piece while creating the file. By zipping the whole file after it's
created, a better compression rate is possible. A typical text document that takes 24,648 bytes in MS .DOC format comes to a mere
9,329 bytes as .SXW. Most of the world are still using 56Kb modems; in the UK we pay by the minute for internet access, so downloading
a document in 5 minutes instead of 15 minutes makes a big difference.
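<P>
To get a feel for how well zipped XML text compresses, here is a rough sketch in Python using the standard <code>zipfile</code> module.
The sample XML is made up and very repetitive, so it compresses far better than a real document would; treat the numbers it prints
as an illustration of the principle, not as a measurement of the .SXW format.

```python
import io
import zipfile

# Made-up, highly repetitive XML; a real document would compress less well.
paragraph = "<p>The quick brown fox jumps over the lazy dog.</p>\n"
xml_text = "<doc>\n" + paragraph * 200 + "</doc>\n"
raw = xml_text.encode("utf-8")

# Write the XML into an in-memory ZIP archive using deflate compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("content.xml", raw)

packed_size = len(buffer.getvalue())
print(len(raw), "bytes of XML ->", packed_size, "bytes zipped")
```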
<P>
It's clever. It uses XML, and there seems to have been some "discussion" internally about this, but it also supports binary file formats
(such as graphics) - it would be stupid for OO.org to add a new image format to all the ones already out there, and it would
have been silly for OO.org to insist on a single format. It would also have made the file size huge if all images were encoded in a
text representation (such as base-64, which MIME uses for email attachments), and this is really why the .SXW file format is a .ZIP file - it just
includes the image files as they are. Sure, a well-compressed image isn't going to be compressed any further by .ZIP, but it isn't going
to get noticeably bigger either. This really is the best of both worlds. Think about HTML - there is a pretty good argument for WWW documents
to be passed around in this way - get the content, style, images, and Binary Large OBjects (BLOBs) separately, as and when
required. By putting them in different files, the whole issue is neatly resolved.
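<P>
The idea of keeping XML and binary files side by side in one ZIP can be sketched in a few lines of Python. This is an
illustration only: the member name <code>Pictures/example.png</code> and the stand-in image bytes are assumptions of mine, and the
archive it builds is not a complete or loadable .SXW file. It just shows text being deflated while a binary member is
stored untouched and read back byte-for-byte.

```python
import io
import zipfile

content_xml = b"<doc><p>See the picture.</p></doc>"  # made-up XML, not real content.xml
fake_image = bytes(range(256)) * 4  # stand-in bytes, not a real PNG

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    # Text compresses well, so deflate it.
    package.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)
    # An already-compressed image gains nothing, so store it as-is.
    package.writestr("Pictures/example.png", fake_image, compress_type=zipfile.ZIP_STORED)

# Re-open the archive and list each member with its original and stored sizes.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as package:
    names = package.namelist()
    for info in package.infolist():
        print(info.filename, info.file_size, info.compress_size)
    restored = package.read("Pictures/example.png")
```

The same <code>namelist()</code> call should work on a real .sxw file if you pass its path to <code>zipfile.ZipFile</code>, which is a quick way to see what is inside one.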
<P>
Knocks HTML/CSS into a cocked hat, if you ask me.


<h2>Queries - Questions I want to put to the OpenOffice.org team</h2>
<HR noshade>
Styles seem to be contained in both styles.xml and content.xml - presumably content.xml overrides styles.xml?
<P>
<HR noshade>
Is anything standard? How about Standard? OpenOffice.org 1.0 and StarOffice 6 are (roughly) the same thing - 
SO6 has extra fonts (incl. Times, the default; OO.org's default is Thorndale).<P>
It seems strange that when my document specifies Times / family "times" that OpenOffice.org uses Thorndale, as does
StarOffice (which includes Times as a value-add to OO.org). How do I make StarOffice (ideally any system with Times font
installed) use Times instead of Thorndale?
<HR noshade>

<!--#include virtual="/bottom.shtml"-->