File Format Fights and the Future
By Randy Edwards
In today's computing world Microsoft's Office suite commands a huge market share. It wasn't always that way; some can remember years past when we had a wide variety of word processors and spreadsheets. Those people will also recall the confusion of exchanging files with a friend or colleague who used a different program.
Now, with Microsoft's domination of the market, Microsoft Word *.DOC files and Excel *.XLS spreadsheet files seem like industry standards and are ubiquitous. However, as popular as those file formats are, they are not open standards -- the file formats are proprietary, and only Microsoft really knows how data is stored inside those files. While some companies produce "filters" to read/write Microsoft Office file formats, those filters are always "iffy" simply because the filter producers are reverse engineering the formats and often have to "guess" what is really in them.
This has been a cause for concern among some companies, who see their data locked inside a secret black box designed and copyrighted by Microsoft. Some countries have viewed this as an issue of liberty. For example, a legislator in Peru wrote long essays on how the government of Peru's data -- which is, logically, owned by the people of Peru -- could be locked up in such a way as to force the people of Peru to buy specific software to access the data they paid to collect. The legislator proposed that the government use only openly-defined file formats, to give Peruvian citizens free access to their data.
Microsoft has heard these complaints and has responded that it will move from the secret binary file formats (*.DOC, *.XLS, etc.) to an XML-based file format in its new version of Microsoft Office, called Office 11. Microsoft is portraying its move to XML as a nod to open file formats.
XML, the Extensible Markup Language, is so extensible as to make it a non-standard standard. Everything in XML is put between an opening <TAG> and a matching closing </TAG>. The basic rule is that for every tag you open, you have to have a matching closing tag. The data inside a tag can be anything, and so can the tag names, which means that different industries or interests can invent XML vocabularies to suit themselves -- this is XML's genius. An XML file format for the legal industry would contain different tags than an XML file format for the automotive parts industry.
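To make the "only rule" concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree parser. The tag names (Contract, Party, Clause, Part, Number, Fits) and the sample data are invented purely for illustration and do not come from any real industry format. The sketch only checks that tags are properly opened and closed; it says nothing about what the tags mean.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, i.e. every opened tag is properly closed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two made-up vocabularies: one "legal", one "automotive parts" (hypothetical tags).
legal_doc = "<Contract><Party>Acme Corp</Party><Clause>Payment due in 30 days</Clause></Contract>"
parts_doc = "<Part><Number>A-1001</Number><Fits>1998 sedan</Fits></Part>"

# The same parts data with the closing </Number> tag left out.
broken_doc = "<Part><Number>A-1001</Part>"

for name, doc in [("legal", legal_doc), ("parts", parts_doc), ("broken", broken_doc)]:
    print(name, is_well_formed(doc))
```

Both invented vocabularies pass, because the parser does not care what the tags are called. Only the document with the missing closing tag is rejected.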
For example, you could easily invent a tag called URL and have something like <URL>http://www.golgotha.net</URL>. This is straightforward and simple. But you could also have a perfectly legal XML tag like <ProprietaryFile>\\\Windows\System32\SecretFile.DLL</ProprietaryFile>. SecretFile.DLL would do some sort of black-box processing. It would obviously work only on Windows systems, and its producer would be the only one who really knew what SecretFile.DLL was actually doing.
Given this scenario, it is possible to create proprietary XML references to do tons of things. The end user would be in no better position to understand what the XML file format is actually doing than with a binary *.DOC or *.XLS file format. Read that sentence again -- even though XML is supposed to be a standard and is in plain text, an XML file format can be created which is essentially unreadable and which third parties cannot process or even understand.
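The scenario above can be sketched in Python with the standard xml.etree.ElementTree parser. The two tags are the ones from the example; the wrapping Document element is a hypothetical addition, needed only because an XML file must have a single root element. This is an illustration of the argument, not a description of any actual Microsoft file format.

```python
import xml.etree.ElementTree as ET

# The wrapper <Document> is an assumption made for this sketch.
doc = r"""<Document>
  <URL>http://www.golgotha.net</URL>
  <ProprietaryFile>\\\Windows\System32\SecretFile.DLL</ProprietaryFile>
</Document>"""

root = ET.fromstring(doc)

# A third-party program can read every tag and its text without trouble...
fields = {child.tag: child.text for child in root}

for tag, text in fields.items():
    print(tag, "->", text)

# ...but for ProprietaryFile all it gets is a path string. Nothing in the
# XML says what the referenced DLL does with the document's data.
```

The file parses cleanly and every value is readable as text, yet the second value is only a pointer to code the reader cannot inspect. Being plain-text XML does not, by itself, make the format open.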
Now, the question becomes: will Microsoft do such a thing? We don't have to wait for the answer -- we can deduce it from past behavior. It's already been proven in court that Microsoft will stoop to illegal behavior to eliminate anything (e.g. Java) that threatens Windows. Microsoft uses the profits from its Windows and Microsoft Office monopolies to bankroll its unprofitable ventures (MSN, Xbox, etc.) to take over new markets. Therefore we can safely assume that Microsoft will not allow anything to interfere with Microsoft Office's monopoly position. If Microsoft were to create a truly open XML version of its file formats, you can assume that many of its customers would opt for cheaper products to read/write Microsoft Office file formats. Why would a business pay $400 for a copy of Microsoft Word when it can instead use a free copy of OpenOffice to read/write the same files?
But there's more than financial interest to consider. Right now there is a group dedicated to producing an open, common XML file format for various office suite files (spreadsheets, word processing documents, etc.). This group wants true interoperability between office suites. If Microsoft were truly interested in open file formats, one would think it would support this effort. Instead, Microsoft is working to torpedo the fledgling office file format standards effort at OASIS (Organization for the Advancement of Structured Information Standards). The reasons for this, as noted above, are clear.
The problem for Microsoft is that too many people know its past and can predict its future actions. Too many of its customers are upset at being raked over the coals. OASIS is attracting the interest of some large corporations and European and other governments who have dealt with the insanity of proprietary file formats for too long. There is a growing movement for office suite users to ensure that their data is generically accessible.
The battle lines are drawn, and the upcoming fight is critical: who will have access to your data? Will you be able to access your data generically, or will you have to use one vendor's product to access your company's data? While one can make an argument for either side, it's important for information technology people and office users to know what is at stake.
This article is copyright © Randy Edwards 2002 and is licensed under the GFDL.