CREATING WORD DOCUMENT IN OFFICE OPEN XML FORMAT USING JAVA.
Author
Sanjay Kumar Madhva/ Kulkarni D.V./ Srinidhi H. S/Pujari Y.
Sonata Software Limited
RV Road Bangalore INDIA.
Introduction to Office Open XML
Scope of the article
WordProcessingML
Package
Parts
Item
Content type
Content-type item
Relationship
Package relationship
Package-relationship item
Relationship markup
JAVA WordprocessingML implementation
JAVA WordprocessingML Folder creation
JAVA WordprocessingML File creation
Create [Content_Types].xml
Create or copy image1.jpg
Create .rels
Create document.xml.rels
Create document.xml
JAVA Packaging Class Implementation
Importing Classes
Create an OpenXMLZipFile Class
Create a CreateZipFile Method
Create a UnZipFile Method
Creating WordprocessingML package
Introduction to Office Open XML
The introduction of the Open XML file formats standard from Ecma gives developers the option of creating or editing an Open XML document using any development tool on any platform, as long as they conform to the standardized file format. The use of open document formats such as WordprocessingML improves interoperability by enabling standards-based XML 1.0 tools to create, read, and write files that conform to the standardized file format. The Office Open XML formats can be used by a wide set of tools and platforms in order to foster interoperability across office productivity applications and with line-of-business systems.
This article is based on the Office Open XML standard being developed by the Ecma TC45 technical committee: the family of XML schemas collectively called Open XML. This standard defines the XML vocabularies consumed and produced by applications such as the “Office 2007” versions of Microsoft Word, Microsoft Excel, and Microsoft PowerPoint. The standard also describes the packaging of documents that conform to these schemas.
Scope of the article
This article describes the packaging mechanism and the minimum set of files required to create an Office Open XML Word document (referred to as WordprocessingML) using JAVA. This document, although created with no Microsoft APIs or software, can be consumed or viewed by Word 2007. (It may also be consumed by Word 2000, Word XP, or Word 2003, using the free add-in for Open XML support that will be released by Microsoft when Office 2007 is released.)
Assumption:
All the required files such as XML and images are created manually under the directory for packaging.
Understanding Office Open XML
In order to create a WordprocessingML document, let us first understand how the document is structured in the Open XML packaging specification. The following sections cover the terms used in the rest of this article.
WordProcessingML
A WordprocessingML document (Office Open XML document) is represented as a series of related parts that are stored in a container called a package. Information about the relationships between a package and its parts is stored in the package’s package-relationship item. Information about the relationships between two parts is stored in the part-relationship item for the source part. A package is an ordinary Zip archive whose items correspond directly to those related parts.
Package
Package – A Zip archive that contains all the relationship items and parts of the Office Open XML document, such that those parts are reachable via a set of relationships defined in the relationship items.
A package acts as a container for a collection of components, which are composed, processed, and persisted according to a set of rules. There are two kinds of components: parts and relationship items. A package is implemented as a Zip archive, with each component in the package corresponding to an item in the archive. A Zip archive is a ZIP file as defined in the ZIP file format specification, but excluding all elements of that specification related to encryption or decryption. A package provides a convenient way to distribute a document with all of its component pieces, such as images, fonts, and data.
The purpose of a package is to combine all of the pieces of a document into a single file. A package holding a WordprocessingML document with a picture might contain a number of parts: an XML markup part representing the document, a part containing page header information, a part containing footnotes, and a part representing the picture in JPEG form.
Note: “XML markup” here means XML that is valid according to the Office Open XML schemas.
Note: All XML content of the components defined in this Standard must be encoded using either UTF-8 or UTF-16.
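Because a package is an ordinary Zip archive, its items can be inspected with the standard java.util.zip classes. The following is a minimal sketch, assuming a hypothetical helper class named PackageLister (it is not part of the class developed later in this article), that returns the names of all items in a package.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
final class PackageLister {

    // Returns the names of all items (Zip entries) in the package, sorted alphabetically.
    static List<String> listItems(String packagePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(packagePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Calling PackageLister.listItems on any .docx file shows one name per part or relationship item, plus the content-type item.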
Parts
Part – A package component that has associated common properties. A part corresponds to an item in a package.
A WordprocessingML document contains a part for the body of the text; it might also contain a part for an image referenced by that text, and parts that define document characteristics, styles, and fonts.
Parts can have relationships to each other, as well as to the package itself. These relationships are defined using XML in one or more relationship items. Each part has a content type and is unambiguously addressed using well-defined naming guidelines. Content-type information is recorded in the content-type item.
Each part has a part name. Part names refer to parts within a package, typically as part of a URI reference. Like file names in a file system and URIs, part names are hierarchical. A part name consists of segments, each representing a level in the hierarchy. For example, the part name “/hello/world/document.xml” contains three segments: “hello”, “world”, and “document.xml”. Segments form a tree structure. This is similar to a file system, where the non-leaf nodes in the tree are folders and the leaf nodes are files, which contain the actual content. The non-leaf segments serve a similar function to folders: they organize the parts of the package.
Example:
<Override PartName="/hello/world/document.xml"
    ContentType="application/vnd.ms-word.main+xml" />
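To make the segment structure concrete, here is a small sketch that splits a part name into its segments. The class name PartNames is hypothetical, and the checks are limited to what is described above (a leading “/” and non-empty segments); it does not attempt to enforce every naming rule of the standard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class PartNames {

    // Splits a part name such as "/hello/world/document.xml" into its segments.
    static List<String> segments(String partName) {
        if (partName == null || !partName.startsWith("/")) {
            throw new IllegalArgumentException("part name must start with '/': " + partName);
        }
        List<String> result = new ArrayList<>();
        // The limit -1 keeps trailing empty strings, so "/a/" is rejected as well.
        for (String segment : partName.substring(1).split("/", -1)) {
            if (segment.isEmpty()) {
                throw new IllegalArgumentException("empty segment in: " + partName);
            }
            result.add(segment);
        }
        return result;
    }
}
```

For the part name used in the example above, the last segment (“document.xml”) is the leaf that holds the content, and the preceding segments play the role of folders.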
Item
Item – In the context of a package, “item” is a synonym for “Zip item”.
Content type
Content type – A description of the type of content stored in a part. A content type defines a media type, a subtype, and an optional set of parameters. The content types of a package are recorded in a mandatory file, which must be named [Content_Types].xml.
Content-type item
Content-type item – An XML representation of the mappings from part names to content types, stored as an item in a package. A content-type item is not itself a part. It is mandatory and looks like the following:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Types xmlns="">
    <Default Extension="xml" ContentType="application/xml" />
    <Default Extension="rels" ContentType="application/vnd.ms-package.relationships+xml" />
    <Default Extension="jpg" ContentType="image/jpeg" />
    <Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
</Types>
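The content-type item combines two kinds of entries: a Default entry applies to every part with a given file extension, while an Override entry applies to one specific part name. The sketch below illustrates how a consumer could look up the content type of a part from these entries. The class ContentTypeMap is hypothetical, it assumes that an Override takes precedence over a Default when both match, and it compares part names exactly as given.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory model of a content-type item, for illustration only.
final class ContentTypeMap {
    private final Map<String, String> defaults = new HashMap<>();   // extension -> content type
    private final Map<String, String> overrides = new HashMap<>();  // part name -> content type

    void addDefault(String extension, String contentType) {
        defaults.put(extension.toLowerCase(Locale.ROOT), contentType);
    }

    void addOverride(String partName, String contentType) {
        overrides.put(partName, contentType);
    }

    // Returns the content type for the part name, or null when no entry matches.
    String resolve(String partName) {
        String type = overrides.get(partName);
        if (type != null) {
            return type; // assumption: an Override wins over a Default
        }
        int dot = partName.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, so no Default can apply
        }
        return defaults.get(partName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Loaded with the four entries shown above, resolve("/document.xml") yields the main-document content type from the Override entry even though the “xml” Default also matches, and resolve("/image1.jpg") yields “image/jpeg”.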
Relationship
Parts often contain references to other parts in the package and to resources outside of the package. In general, however, these references are represented inside the referring part in ways that are specific to the content type of the part; that is, in arbitrary markup or an application-specific encoding. This effectively hides the internal and external linkages between parts from consumers that do not understand the content type of the parts containing such references.
The package uses relationships as a higher-level mechanism to describe references from parts to other internal or external resources. A relationship represents the kind of connection between a source and a target resource. If the source is a part, the relationship is referred to as a part relationship. If the source is the package itself, the relationship is referred to as a package relationship. Relationships make the connections directly discoverable without looking at the content of the parts, so they are independent of content-specific schemas and make it faster to resolve the location of other parts.
Package relationship
A relationship whose target is a part and whose source is the package as a whole.
Package-relationship item
An XML representation of one or more package relationships, stored as an item in a package. A package-relationship item is not itself a part.
Relationship markup
Relationships are represented using one or more Relationship elements nested in a single Relationships element. These elements are defined in the relationship namespace.
Every Relationship element must have an Id attribute, the value of which must be unique within the relationship item. The Id type is xsd:ID and must conform to the naming restrictions for that type.
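The uniqueness rule for Id values can be checked mechanically. The following sketch parses the text of a relationship item with the JDK's built-in DOM parser and reports whether every Relationship element carries a non-empty Id that is not repeated. The class name RelationshipIds is hypothetical, and the sketch checks uniqueness only; it does not verify that each value is a lexically valid xsd:ID.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
final class RelationshipIds {

    // Returns true when every Relationship element has a non-empty Id
    // and no Id value occurs twice within the relationship item.
    static boolean idsAreUnique(String relationshipsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(relationshipsXml.getBytes(StandardCharsets.UTF_8)));
        NodeList relationships = doc.getElementsByTagName("Relationship");
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < relationships.getLength(); i++) {
            String id = ((Element) relationships.item(i)).getAttribute("Id");
            if (id.isEmpty() || !seen.add(id)) {
                return false; // missing or duplicate Id
            }
        }
        return true;
    }
}
```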
This concludes the commonly used terms in creating an Office Open XML document. What follows is how to create an Office Open XML Word document, referred to as WordprocessingML, using JAVA.
JAVA WordprocessingML implementation
In this article we create a WordprocessingML document that contains the body text “This document was created using JAVA….” and an embedded image. This is achieved by creating a hierarchy of folders and files, which are then packaged together as described in the steps below.
JAVA WordprocessingML Folder creation
- Create a directory, for example “c:\WordprocessingML\”, which will contain all the files required for packaging, such as [Content_Types].xml, image1.jpg, and document.xml.
- Under “c:\WordprocessingML” create a folder named “_rels”.
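This article assumes the folders are created by hand, but the same layout can be produced from JAVA. A minimal sketch, assuming a hypothetical helper class PackageFolders and the java.nio.file API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class PackageFolders {

    // Creates the package root folder and its "_rels" subfolder; returns the "_rels" path.
    static Path createLayout(Path root) throws IOException {
        Path rels = root.resolve("_rels");
        // createDirectories also creates the root and any missing parent folders,
        // and does nothing if the folders already exist.
        Files.createDirectories(rels);
        return rels;
    }
}
```

For the directory used in this article the call would be PackageFolders.createLayout(java.nio.file.Paths.get("c:\\WordprocessingML")).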
JAVA WordprocessingML File creation
Create [Content_Types].xml
In the directory “c:\WordprocessingML\” create an XML file named “[Content_Types].xml”, which will contain the WordprocessingML content types.
The file content is shown below.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Types xmlns="">
    <Default Extension="xml" ContentType="application/xml" />
    <Default Extension="rels" ContentType="application/vnd.ms-package.relationships+xml" />
    <Default Extension="jpg" ContentType="image/jpeg" />
    <Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
</Types>
In the Override tag, the PartName attribute names the XML representation of the Word document, and the ContentType attribute indicates that it is the main document in XML format.
<Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
Images used in the document are declared as shown below, where the Extension attribute gives the file extension and the ContentType attribute contains the matching image content type (“image/jpeg” for jpg files).
<Default Extension="jpg" ContentType="image/jpeg" />
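The XML files in this article are assumed to be created by hand. If they are generated from JAVA instead, the text should be written in one of the encodings the standard allows (UTF-8 or UTF-16) so that it matches the encoding named in the XML declaration. The sketch below writes an XML item as UTF-8; the class XmlItemWriter and its parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class XmlItemWriter {

    // Writes the XML text as UTF-8 to packageDir/itemName, creating missing folders.
    static Path write(Path packageDir, String itemName, String xml) throws IOException {
        Path target = packageDir.resolve(itemName);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```

The same method can be used for every XML item described in this section, for example with the item name “[Content_Types].xml” or “_rels/.rels”.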
Create or copy image1.jpg
Copy or create a JPEG image file named “image1.jpg” under “c:\WordprocessingML”; this is the image that will be embedded in the document.
Create .rels
Create an XML file named “.rels” under “c:\WordprocessingML\_rels\” with the content below.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="">
    <Relationship Id="rId1"
        Type=""
        Target="document.xml" />
</Relationships>
Create document.xml.rels
Create an XML file named “document.xml.rels” under “c:\WordprocessingML\_rels\” with the content below.
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Relationships xmlns="">
    <Relationship Id="rId1"
        Type=""
        Target="image1.jpg" />
</Relationships>
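The two relationship items follow a naming pattern: the package-relationship item is stored as “_rels/.rels”, and the relationship item for a part is stored in a “_rels” folder next to that part, under the part's file name with “.rels” appended. The sketch below derives these names; the class RelationshipItemNames is hypothetical.

```java
// Hypothetical helper for illustration only.
final class RelationshipItemNames {

    // Name of the package-relationship item.
    static String forPackage() {
        return "/_rels/.rels";
    }

    // Name of the relationship item for a source part,
    // e.g. "/document.xml" maps to "/_rels/document.xml.rels".
    static String forPart(String partName) {
        int slash = partName.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("part name must contain '/': " + partName);
        }
        String folder = partName.substring(0, slash + 1); // includes the trailing '/'
        String file = partName.substring(slash + 1);
        return folder + "_rels/" + file + ".rels";
    }
}
```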
Create document.xml
Create an XML file named “document.xml” under “c:\WordprocessingML\” with the content below. This XML contains the text of the document as well as its formatting markup, such as paragraphs and runs. In the example below, the <w:p> tag represents a paragraph of text. The example also contains a <w:pict> tag for the image to be embedded in the document.
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<w:wordDocument xmlns:w=""
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:r="">
    <w:body>
        <w:p>
            <w:r>
                <w:t>This document was created using JAVA….</w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:r>
                <w:pict>
                    <v:shape>
                        <v:imagedata r:id="rId1" />
                    </v:shape>
                </w:pict>
            </w:r>
        </w:p>
    </w:body>
</w:wordDocument>
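When the body text comes from a program rather than being typed by hand, characters such as “&” and “<” must be escaped before they are placed inside <w:t>, otherwise the part is no longer well-formed XML. The sketch below builds the paragraph markup used above for an arbitrary string; the class WordMl is hypothetical, and it produces only the <w:p> fragment, not the whole document.

```java
// Hypothetical helper for illustration only.
final class WordMl {

    // Escapes the characters that are not allowed literally in XML text content.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the text in a paragraph (w:p) that contains one run (w:r) with one text element (w:t).
    static String paragraph(String text) {
        return "<w:p><w:r><w:t>" + escape(text) + "</w:t></w:r></w:p>";
    }
}
```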
JAVA Packaging Class Implementation
Importing Classes
The following classes need to be imported to create the packaging class. All of them are built-in JAVA classes.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
Create an OpenXMLZipFile Class
The OpenXMLZipFile class will contain all the methods required for packaging.
public class OpenXMLZipFile {
    // CreateZipFile method, which takes zipFileName and ToCompressFiles as arguments,
    // goes through the array of ToCompressFiles and packs each file into zipFileName.
    // The sections below explain the method implementation.
    ….
}
Create a CreateZipFile Method
- Create a ZipOutputStream.
- Set the compression level to Deflater.BEST_COMPRESSION.
- Loop through the list of files to be zipped and, for each file, do the following:
  - Create a ZipEntry from the file name.
  - Add the ZipEntry to the ZipOutputStream.
  - Write the contents of the file to the ZipOutputStream.
public static void CreateZipFile(String zipFileName, String[] ToCompressFiles)
{
    try
    {
        String[] fileNames = ToCompressFiles;
        //fileNames[0] = "C:\\noname.xml";
        //fileNames[1] = "C:\\sql_reference.pdf";
        FileInputStream inStream;
        // "C:\\ZipExample1.zip"
        FileOutputStream outStream = new FileOutputStream(zipFileName);
        ZipOutputStream zipOStream = new ZipOutputStream(outStream);
        zipOStream.setLevel(Deflater.BEST_COMPRESSION);
        for (int loop = 0; loop < fileNames.length; loop++)
        {
            inStream = new FileInputStream(fileNames[loop]);
            // the name of the Zip item is the file name exactly as it was passed in
            zipOStream.putNextEntry(new ZipEntry(fileNames[loop]));
            int i = 0;
            // copy the file into the archive one byte at a time
            while ((i = inStream.read()) != -1)
            {
                zipOStream.write(i);
            }
            zipOStream.closeEntry();
            inStream.close();
        }
        zipOStream.flush();
        zipOStream.close();
    }
    catch (IllegalArgumentException iae)
    {
        iae.printStackTrace();
    }
    catch (FileNotFoundException fe)
    {
        System.out.println("File not found====" + fe);
    }
    catch (IOException ioe)
    {
        System.out.println("IOException====" + ioe);
        ioe.printStackTrace();
    }
}
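The method above uses each file name, exactly as passed, as the name of the Zip item. Items in a package are addressed relative to the package root with “/” separators (for example “_rels/.rels”), so the caller has to pass names in that form. As an alternative, the sketch below takes the package folder as a separate argument and keeps the item names relative to it. The class PackageZipper and its parameters are hypothetical; it also copies through a buffer and closes the streams with try-with-resources, which requires Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical variant of CreateZipFile, for illustration only.
final class PackageZipper {

    // itemNames are relative to baseDir and use '/' as separator, e.g. "_rels/.rels".
    static void zip(String zipFileName, File baseDir, String[] itemNames) throws IOException {
        byte[] buffer = new byte[8192];
        try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFileName))) {
            zipOut.setLevel(Deflater.BEST_COMPRESSION);
            for (String itemName : itemNames) {
                // Map the item name to a file below the package folder.
                File source = new File(baseDir, itemName.replace('/', File.separatorChar));
                try (InputStream in = new BufferedInputStream(new FileInputStream(source))) {
                    // The Zip item keeps the relative name, not the full path on disk.
                    zipOut.putNextEntry(new ZipEntry(itemName));
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        zipOut.write(buffer, 0, read);
                    }
                    zipOut.closeEntry();
                }
            }
        }
    }
}
```

Unlike CreateZipFile, this sketch lets IOException propagate to the caller instead of printing it, so a failed packaging run cannot go unnoticed.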
Create a UnZipFile Method
- Read the zip file.
- Loop through the entries in the zip file and, for each entry, do the following:
  - Create a File object. The file name is derived from the ZipEntry.
  - Create an OutputStream using the File object.
  - Read the contents of the ZipEntry into an InputStream.
  - Write the contents of the InputStream into the OutputStream.
public static void UnZipFile(String zipFileName, String ToExtractFile)
{
    String inputFileName = zipFileName; // "C:\\PPT.zip";
    String desFileName = ToExtractFile; // "C:\\TEST\\";
    try
    {
        File sourceZipFile = new File(inputFileName);
        File destDirectory = new File(desFileName);
        // Open the ZIP file for reading
        ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
        // Get the entries
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements())
        {
            ZipEntry zipEntry = entries.nextElement();
            String currName = zipEntry.getName();
            File destFile = new File(destDirectory, currName);
            // grab file's parent directory structure
            File destinationParent = destFile.getParentFile();
            // create the parent directory structure if needed
            destinationParent.mkdirs();
            if (!zipEntry.isDirectory())
            {
                BufferedInputStream is = new
                    BufferedInputStream(zipFile.getInputStream(zipEntry));
                int currentByte;
                // write the current file to disk
                FileOutputStream fos = new FileOutputStream(destFile);
                BufferedOutputStream dest = new BufferedOutputStream(fos);
                // read and write until last byte is encountered
                while ((currentByte = is.read()) != -1)
                {
                    dest.write(currentByte);
                }
                dest.flush();
                dest.close();
                is.close();
            }
        }
        zipFile.close();
    }
    catch (IOException ioe)
    {
        System.out.println("IOException occurred=====" + ioe);
        ioe.printStackTrace();
    }
}
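UnZipFile derives each output file directly from the entry name. If the archive comes from an untrusted source, an entry name containing “..” could point outside the destination directory. A common precaution, shown here as a sketch with a hypothetical helper class SafeExtract, is to compare canonical paths before writing the file.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper for illustration only.
final class SafeExtract {

    // Returns the file for the entry, or throws if it would fall outside destDirectory.
    static File resolveInside(File destDirectory, String entryName) throws IOException {
        File destFile = new File(destDirectory, entryName);
        String dirPath = destDirectory.getCanonicalPath() + File.separator;
        if (!destFile.getCanonicalPath().startsWith(dirPath)) {
            throw new IOException("Entry is outside of the target directory: " + entryName);
        }
        return destFile;
    }
}
```

In UnZipFile, the line that creates destFile could call SafeExtract.resolveInside(destDirectory, currName) instead of constructing the File directly.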
After implementation, the complete OpenXMLZipFile class looks as follows.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
public class OpenXMLZipFile
{
    // CreateZipFile method, which takes zipFileName and ToCompressFiles as arguments,
    // goes through the array of ToCompressFiles and packs each file into zipFileName
    public static void CreateZipFile(String zipFileName, String[] ToCompressFiles)
    {
        try
        {
            String[] fileNames = ToCompressFiles;
            //fileNames[0] = "C:\\noname.xml";
            //fileNames[1] = "C:\\sql_reference.pdf";
            FileInputStream inStream;
            // "C:\\ZipExample1.zip"
            FileOutputStream outStream = new FileOutputStream(zipFileName);
            ZipOutputStream zipOStream = new ZipOutputStream(outStream);
            zipOStream.setLevel(Deflater.BEST_COMPRESSION);
            for (int loop = 0; loop < fileNames.length; loop++)
            {
                inStream = new FileInputStream(fileNames[loop]);
                // the name of the Zip item is the file name exactly as it was passed in
                zipOStream.putNextEntry(new ZipEntry(fileNames[loop]));
                int i = 0;
                // copy the file into the archive one byte at a time
                while ((i = inStream.read()) != -1)
                {
                    zipOStream.write(i);
                }
                zipOStream.closeEntry();
                inStream.close();
            }
            zipOStream.flush();
            zipOStream.close();
        }
        catch (IllegalArgumentException iae)
        {
            iae.printStackTrace();
        }
        catch (FileNotFoundException fe)
        {
            System.out.println("File not found====" + fe);
        }
        catch (IOException ioe)
        {
            System.out.println("IOException====" + ioe);
            ioe.printStackTrace();
        }
    }
    // UnZipFile method, which extracts every entry of zipFileName into the directory ToExtractFile
    public static void UnZipFile(String zipFileName, String ToExtractFile)
    {
        String inputFileName = zipFileName; // "C:\\PPT.zip";
        String desFileName = ToExtractFile; // "C:\\TEST\\";
        try
        {
            File sourceZipFile = new File(inputFileName);
            File destDirectory = new File(desFileName);
            // Open the ZIP file for reading
            ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
            // Get the entries
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements())
            {
                ZipEntry zipEntry = entries.nextElement();
                String currName = zipEntry.getName();
                File destFile = new File(destDirectory, currName);
                // grab file's parent directory structure
                File destinationParent = destFile.getParentFile();
                // create the parent directory structure if needed
                destinationParent.mkdirs();
                if (!zipEntry.isDirectory())
                {
                    BufferedInputStream is = new
                        BufferedInputStream(zipFile.getInputStream(zipEntry));
                    int currentByte;
                    // write the current file to disk
                    FileOutputStream fos = new FileOutputStream(destFile);
                    BufferedOutputStream dest = new BufferedOutputStream(fos);
                    // read and write until last byte is encountered
                    while ((currentByte = is.read()) != -1)
                    {
                        dest.write(currentByte);
                    }
                    dest.flush();
                    dest.close();
                    is.close();
                }
            }
            zipFile.close();
        }
        catch (IOException ioe)
        {
            System.out.println("IOException occurred=====" + ioe);
            ioe.printStackTrace();
        }
    }
}
Creating WordprocessingML package
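To create a WordprocessingML package, perform the following steps.
- Create an instance of the class OpenXMLZipFile (optional, because CreateZipFile is a static method and can be called on the class itself).
OpenXMLZipFile myWordprocessingML = new OpenXMLZipFile();
- Create the variables. CreateZipFile uses each file name as the name of the Zip item, so the names are given relative to “c:\WordprocessingML” (the package root) with forward slashes, and the program is run from that directory. The list includes the “.rels” file created earlier.
String zipFileName = "c:\\myFirstDocumentUsingJava.docx";
String[] ToCompressFiles = new String[5];
ToCompressFiles[0] = "[Content_Types].xml";
ToCompressFiles[1] = "image1.jpg";
ToCompressFiles[2] = "_rels/.rels";
ToCompressFiles[3] = "_rels/document.xml.rels";
ToCompressFiles[4] = "document.xml";
- Call the method CreateZipFile.
OpenXMLZipFile.CreateZipFile(zipFileName, ToCompressFiles);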
The output of the above method will be a file named “myFirstDocumentUsingJava.docx”. This document fully conforms to the Open XML standard, and can be opened using Office 2007 (or the 2000/XP/2003 versions of Office with the free Open XML add-in installed).
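Before opening the result in Word, it can be useful to confirm that the archive really contains the items described in this article under the expected names. A minimal sketch, assuming a hypothetical helper class PackageCheck:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
final class PackageCheck {

    // Returns the required item names that are not present in the package.
    static List<String> missingItems(String packagePath, String[] requiredNames) throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(packagePath)) {
            for (String name : requiredNames) {
                if (zip.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }
}
```

For the document built here, the required names would be “[Content_Types].xml”, “_rels/.rels”, “_rels/document.xml.rels”, “document.xml”, and “image1.jpg”; an empty result means all of them were found.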