Why care about document formats?
Document formats play an important role in the exchange of information between people. Throughout the history of its Office products, Microsoft has worked to enable the exchange of documents across applications and systems. In this white paper, learn what Microsoft is doing to help people exchange documents, including supporting existing industry standards, and helping standardize document formats that can help you bring your existing documents into an open, compatible environment.
Why care about document formats?
Documents are a primary information exchange vehicle between people, in paper and digital form. Document formats play an important role in helping computer users share information. Many computer users can recall a time when Microsoft® Windows®-based PCs could not share documents with users of Macintosh® computers. Through technical collaboration, the problem was eventually solved, helping people exchange information between these computing platforms. For people who communicated with both Windows and Macintosh users, this was a tremendous step forward.
When document formats are shared among many applications and platforms, people have an improved ability to communicate with others. Document formats such as PDF, .DOC, .TXT and many others achieved a sort of de facto standard status because of widespread usage. Like HTML, these document formats provided a suitable information exchange medium because the sender of the document could make a reasonable assumption that the recipient could open or read their file.
For most users, document formats are of little consequence; for the most part, people don’t care which file format is being used, as long as they can open it, read it, or do what they choose to do with it without interruption. Very often, users are led to the same conclusion: “I usually don’t have any problem opening documents that are sent to me; I don’t see this as a problem. Why should I care?”
Today, many public and private enterprises endeavor to supply file formats that applications can use and share to enable greater interoperability between an increased variety of software applications and platforms. There has been a lot of recent discussion about how to achieve sustainable document format standards in a world where the diversity of software is increasing.
For those on the outside looking in, the discussion can be confusing and intimidating because it is saturated with very different opinions. Interestingly, many of the participants in this discussion share some common goals in enabling information exchange, but it can be difficult to gain a clear perspective on the debate and separate the various arguments. Conflicting opinions on document format standards have spawned a healthy discussion of what standardized document formats should do, who should have “control” over their design, and who should have rights to use them.
While there are many efforts to supply common document formats, there are also varying opinions and approaches for how to solve the challenge. Open XML Formats, UOF, PDF, ODF, RTF, TXT, ASCII, HTML and many other formats offer solutions, but the marketplace has not decided on a “winner;” in fact, the most reasonable outcome for document formats is that there is no single “winner.” Many document formats are used today to accomplish specific tasks. Scrapping all of them in favor of a single, monolithic document format is not a reasonable approach.
Balancing practical reality and potential benefit
Many aspects are shaping the discussion around standardized document formats. Effort is required to sort out these important issues to understand how they impact document formats and what can be done about them. Some of these aspects include:
- Freedom to choose a format that suits the needs of the task at hand
- Document formats that can be easily exchanged by many applications and systems
- Freedom from dependence on specific applications, vendors or platforms to exchange documents
- Maximum compatibility with existing documents
- Preserving documents for records management and archival purposes
- Document formats that support the breadth of language and assistive technology requirements
- Accounting for the incredible variety of software applications, usage and functionality
- Protecting information stored in documents from unwanted usage
These goals represent a strong desire for independence, choice, innovation and freedom for software applications. They also reflect a strong desire of organizations to get more out of the software they already have and to do a better job of integrating their desktops with back-end systems. There are many other factors that can be considered, but this list presents a formidable challenge for organizations seeking to comprehend what standardized document formats mean for their computing ecosystems.
Many of these goals might also be in conflict. For example, does a document format suitable for archives need to protect its content from unwanted usage? Should it also support the real-time updates and information exchange required to integrate with other applications and systems? It would seem that the many worthy goals for document format standards reflect the diversity of software usage. It would also seem that one format cannot reasonably accommodate all of these goals.
How Microsoft is helping improve standardized document formats
Users of Microsoft Office in public and private sector agencies want to achieve the benefits of open, standardized file formats, but also want to preserve their ability to work with the content in their existing documents. As in any technology area, backward compatibility is a desirable feature, and Microsoft worked with others in the industry to design and document the Ecma Open XML standard to achieve this goal.
Throughout the history of Office, Microsoft has supported many document formats in its applications, including binary formats, RTF, TXT, ASCII, CSV, HTML, and hundreds of other formats. Support for many file formats in Office is designed to facilitate information exchange between Microsoft Office and other applications and systems. Microsoft began the transition to XML-based formats in Office 2000, when support for XML-based document properties was introduced. In the 2007 release of Microsoft Office, support for additional formats like PDF and the newly created XPS has been introduced to improve the ability of Microsoft Office to exchange information with other systems.
In 2006, Microsoft began participating in an effort led by the international standards body, Ecma International, to work on an open, published standard for documents that had a primary goal of being able to fully represent all of the information stored in the billions of documents that have been created in the past. This approach enables users to easily migrate existing documents to this new file format. Along with Apple, Novell, Toshiba, Statoil, the US Library of Congress and other entities, Microsoft worked in the Ecma Technical Committee #45 to fully document and publish the Ecma Office Open XML formats.
In December of 2006, the Ecma Office Open XML formats were ratified as an international standard that can be implemented by any developer, including Microsoft competitors. People are not required to use only one specific application or program to read their documents, and they are free to choose among many applications that can exchange these files. The diligence of the Open XML format technical committee in preserving the behavior of legacy documents offers a clear, practical path for those seeking to easily migrate their documents into an open environment. With a detailed, open, published international standard for file formats, exchanging documents across applications and systems becomes easier, and people now have an increased choice of applications and platforms.
The marketplace for business productivity applications has responded favorably to the Open XML formats. Novell has introduced a version of the OpenOffice suite that can read and write Open XML formats. Corel will introduce support for the Open XML formats in the WordPerfect suite in 2007. Microsoft has introduced compatibility tools for the Open XML formats that enable users of Office 2000, XP and 2003 to read and write Open XML formats. The newly released 2007 Office suite also uses the Open XML formats as its default file formats. This rapid, broad adoption of Open XML formats indicates the desire of software vendors and users to store their documents in open file formats. The choice to support Open XML formats is a strong signal from the ecosystem that the Open XML formats provide significant value to people.
Open Document Format
In 2006, another document format standard was ratified by the International Organization for Standardization (ISO). Open Document Format (ODF) has its origins as the “Open Office XML Format,” ratified by OASIS as a document format standard in 2005. Designed originally as a document format for OpenOffice.org, ODF was intended as a document format specification that enabled the exchange of documents between OpenOffice.org and other supporting applications. Interest in OpenOffice.org accelerated upon the declaration of the Commonwealth of Massachusetts that open file formats were to be used in the exchange of documents between government agencies. This policy raised a number of important questions for document standards, including the role of PDF, accessibility and assistive technology, and even the programs required to read and write ODF.
ODF is designed to represent functionality in OpenOffice.org products, and was originally named “Open Office XML Format.” Unlike the Open XML formats, which are specifically designed to carry forward information in legacy files, ODF is not optimized to represent content that exists in documents that have already been created; it was only designed to reflect the information created by one application. For example, the ODF technical committee within the OASIS standards body declared that a standardized markup language for spreadsheet formulas (such as “SUM” and “Average”) was “outside of the scope” of their charter.
ODF is also supported in most major business productivity suites today. Microsoft Office users can download and install a free plug-in from the Open Source software community to convert documents to ODF. OpenOffice.org also uses ODF. Corel has announced ODF support in 2007. Other business productivity applications, such as KOffice and AbiWord, also support ODF.
Aren’t these just two formats designed to do the same thing?
The rhetoric surrounding the standardization of ODF and Open XML has generated debate over the merits of each format. Those who are polarized toward ODF claim that Open XML and ODF are designed for the same purpose, and that only one format should exist. Open XML advocates, along with a mainstream of users, support the idea that Open XML and ODF are designed for different purposes, and exist to address different user needs, much like PDF, RTF, HTML and the countless other text and document formats that exist. Other document formats such as PDF and UOF provide different examples of how specific design goals shape the use of a document format. PDF is a fixed document format for representing content that is no longer editable; UOF is a standard from the Chinese government intended for use within China.
Many arguments are made about the process by which ODF and Open XML formats were standardized. Some argue over minute technical details within the specifications; others argue about the terms under which the intellectual property in the file formats should be made available. In truth, the Ecma Office Open XML Formats and Open Document Format share many similarities in the standardization process. Both standards underwent a lengthy technical review by representatives of multiple companies and other interested parties. Both formats originated from software products; ODF originated from OpenOffice (and was originally named “Open Office XML Document Format”), and Open XML formats are a reflection of the proprietary .doc, .xls, and .ppt file formats of past Office versions.
The real differences between ODF, Open XML formats and other formats are not in the politics and rhetoric. By comparison to Open XML formats, the ODF specification is short and simple, but is not optimized for representing the content in existing documents. The Open XML Formats specification is optimized for the level of precision and detail required to carry forward billions of existing files, including a complete specification for spreadsheet formulas and many other features that are lacking in the ODF specification. The Open XML Formats also offer the unique capability of hosting custom-defined data languages within the document format. Organizations can use Open XML formats to report information from other applications and systems without having to translate it first. This capability is a key innovation for developers seeking to incorporate real-time business information into their documents, or those who seek to “tag” documents with their own categorization system in order to improve their understanding of its contents.
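As a rough illustration of the custom-data idea described above, the Python sketch below stores a piece of business data, written in an organization's own XML vocabulary, as a separate part inside a ZIP archive and then reads it back. Open XML files are ZIP packages made up of XML parts, but everything specific here is an assumption made for illustration: the invoice vocabulary and its namespace are invented, the part name customXml/item1.xml is only assumed to resemble what Office applications write, and the archive is not a complete or valid Open XML document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical business data in an organization's own XML vocabulary.
# The namespace and element names are invented for this illustration.
CUSTOM_XML = (
    '<inv:invoice xmlns:inv="urn:example:invoice">'
    "<inv:customer>Contoso</inv:customer>"
    '<inv:total currency="USD">1250.00</inv:total>'
    "</inv:invoice>"
)


def build_package(custom_xml):
    """Write an in-memory ZIP archive holding one custom XML part.

    This is NOT a valid Open XML document: a real package also needs
    content-type, relationship, and main document parts.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # Assumed part name; chosen to resemble a custom XML data part.
        package.writestr("customXml/item1.xml", custom_xml)
    return buffer.getvalue()


def read_custom_parts(package_bytes):
    """Return {part name: parsed root element} for parts under customXml/."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if name.startswith("customXml/") and name.endswith(".xml"):
                parts[name] = ET.fromstring(package.read(name))
    return parts


package_bytes = build_package(CUSTOM_XML)
parts = read_custom_parts(package_bytes)

# The data comes back in its original vocabulary, with no translation step.
ns = {"inv": "urn:example:invoice"}
root = parts["customXml/item1.xml"]
customer = root.find("inv:customer", ns).text
total = root.find("inv:total", ns).text
print(customer, total)
```

The point of the sketch is only that the custom data travels inside the document package in its own schema, so another system that understands that schema can read it with ordinary ZIP and XML tools.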
Support for both ODF and Open XML formats, as well as PDF, RTF and HTML, in productivity suites would suggest an acknowledgement of the reality that customers need many file formats to accomplish the work they do. To support the variety of needs for document formats, many translation projects are under development to facilitate translation between Open XML formats, ODF, UOF, PDF and others. Indeed, the needs of customers who want to use multiple formats are being met by developers who design products to support multiple formats and who use translators to exchange data between them.
Can’t we have just one document format?
When thinking about this question, it’s important to compare it to a more digestible real world example. Most governments have the need for vehicles to carry out government business. Whether it’s fire engines, ambulances, police cruisers, prisoner transport vehicles, mass transit vehicles, snow plows or other needs, the sheer variety of tasks and needs of the populace require a government to have the flexibility to use the right vehicle for the job. Similarly, open file formats mean many things to many people, and one document format cannot address the list of needs that arise in the multitude of scenarios in which documents are created and used. Just as an ambulance isn’t the optimal choice to clean streets and a snow plow isn’t useful to transport commuters to work, the reality of software usage today suggests that many, many file formats exist to satisfy the incredible diversity of needs in software applications. Image file formats, editable document formats, fixed document formats, archive formats, spreadsheet formats, page layout formats, email formats, diagramming formats and countless others exist to satisfy the needs of software usage. Some document formats are optimized to present a fixed, static representation of information so that it cannot be changed, ever. Editable document formats are designed to maximize editability. Specific formats like spreadsheets or page layout document formats are designed to suit specific needs of software applications and systems.
Imagine a common scenario involving PDF, Microsoft Office Excel, and Microsoft Project. All of these programs share information, and may represent information from a specific project at any instant in time. But it makes little sense to merge these into a single format, as the data represented in these formats are intended for different purposes. The PDF documents in this scenario would be for the purpose of presenting final versions of the information. Excel may be used to perform analysis of data, which PDF is not suited to do. The Project file contains information about tasks and resources that is editable by the project owner, and is not suitable for analysis, and not appropriate for broad distribution in a final format. Combining these document formats would make little sense.
In fact, the very tenets of document format exchange, which include use in multiple applications and maximum compatibility with existing documents, demand the ability to choose those formats which best suit the task at hand. Legislating or mandating the use of a single document format is an arbitrary measure that doesn’t reflect the reality of software usage today.
What can I do about document formats for my organization?
One advantage of open file formats is the ease with which organizations can enable support for them within their existing infrastructure. Microsoft offers a Compatibility Pack to enable support for the Open XML formats in Office 2000, XP and 2003. Microsoft and many other software providers offer free add-ins to Microsoft Office to save documents using the PDF format. A free, open-source translator for supporting ODF within Microsoft Office is also available from sourceforge.net. Users of Office XP, 2003 or 2007 can add ODF support to their Office installations easily.