Introduction
The Office Open XML markup standard was introduced to store Microsoft Office content in an XML-based format. It builds on open technologies (ZIP packaging and XML), so applications other than Office can read and write the files. A key advantage of the Office Open XML structure is that the various parts of a document can be edited more easily because they reside in separate files.
WordprocessingML for documents, SpreadsheetML for spreadsheets, and PresentationML for presentations are the three major markup languages of Open XML.
WordprocessingML for documents
A WordprocessingML (Open XML) document is a zipped collection of folders, subfolders, XML files and other file types. The basic structure of a WordprocessingML document consists of the <document> element and, inside it, the <body> element, which holds one or more block-level elements such as <p>, which represents a paragraph.
A paragraph contains one or more <r> elements. The <r> stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more <t> elements. The <t> element contains a range of text.
A simple WordprocessingML document uses the following XML elements:
document: The root element of a WordprocessingML main document part.
body: The container for the collection of block-level structures that make up the main story.
p: A paragraph.
r: A run.
t: A range of text.
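The element hierarchy above can be sketched without the SDK. The snippet below is a minimal illustration using only Python's standard library, not the Open XML SDK this article refers to; the function names `build_document_xml` and `build_docx` are hypothetical, and the package is assumed to need only three parts (content types, the root relationship file, and the main document part).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship that points at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def build_document_xml(text: str) -> bytes:
    """Build document > body > p > r > t around a single range of text."""
    ET.register_namespace("w", W_NS)
    document = ET.Element(f"{{{W_NS}}}document")
    body = ET.SubElement(document, f"{{{W_NS}}}body")
    paragraph = ET.SubElement(body, f"{{{W_NS}}}p")
    run = ET.SubElement(paragraph, f"{{{W_NS}}}r")
    ET.SubElement(run, f"{{{W_NS}}}t").text = text
    return ET.tostring(document, encoding="UTF-8", xml_declaration=True)


def build_docx(text: str) -> bytes:
    """Zip the three parts into an in-memory package and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", build_document_xml(text))
    return buffer.getvalue()
```

Writing the bytes returned by `build_docx("Trigent POC on WordprocessingML")` to a file with a .docx extension gives a package whose main part contains exactly one paragraph, one run and one text element.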
The following code uses the Open XML SDK 2.5 to create a simple WordprocessingML document that contains the text “Trigent POC on WordprocessingML”.
A real Word document might contain many more features, such as headers, footers, tables, images and tables of contents.
SpreadsheetML for spreadsheets
The structure of a SpreadsheetML document consists of the <workbook> element, which is the container for the worksheets. A separate XML file is created for each worksheet. The workbook and its worksheets are the minimum elements required for a valid spreadsheet document. In addition, a spreadsheet document might contain <table>, <chartsheet>, <pivotTableDefinition> or other spreadsheet-related elements.
We will analyze the different parts of a spreadsheet using the worksheet below as an example.
The following figure shows the workbook part of the above spreadsheet. In this example, the <sheet> tag references its worksheet through the r:id attribute.
Next comes the relationship file of the workbook.xml part.
This part holds four relationships, which correspond to the styles, the theme, the worksheet and the contents of the worksheet.
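The link between a <sheet> tag and its worksheet file can be followed in a few lines. The sketch below uses Python's standard library rather than the Open XML SDK; the two XML strings are hypothetical stand-ins for workbook.xml and its relationship file (the sheet name, the IDs and the targets are assumptions), and `sheet_targets` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical workbook.xml: one sheet that points at relationship rId1.
WORKBOOK_XML = f"""<workbook xmlns="{S_NS}" xmlns:r="{R_NS}">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>"""

# Hypothetical workbook.xml.rels with the four relationships described above.
WORKBOOK_RELS_XML = f"""<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId1" Type="{R_NS}/worksheet" Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId2" Type="{R_NS}/theme" Target="theme/theme1.xml"/>
  <Relationship Id="rId3" Type="{R_NS}/styles" Target="styles.xml"/>
  <Relationship Id="rId4" Type="{R_NS}/sharedStrings" Target="sharedStrings.xml"/>
</Relationships>"""


def sheet_targets(workbook_xml: str, rels_xml: str) -> dict:
    """Map each sheet name to the part its r:id relationship points at."""
    rels = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).findall(f"{{{PKG_REL_NS}}}Relationship")
    }
    sheets = ET.fromstring(workbook_xml).findall(f"{{{S_NS}}}sheets/{{{S_NS}}}sheet")
    # The r:id attribute lives in the relationships namespace, not the main one.
    return {sheet.get("name"): rels[sheet.get(f"{{{R_NS}}}id")] for sheet in sheets}


if __name__ == "__main__":
    print(sheet_targets(WORKBOOK_XML, WORKBOOK_RELS_XML))
```

The workbook itself never names a file. It only carries an ID, and the relationship file decides which part that ID resolves to, so a worksheet part can be renamed or moved by updating its relationship target alone.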
sharedStrings.xml holds the string contents of the spreadsheet, as we can see below.
The following code example uses the classes in the Open XML SDK 2.5 to create a workbook.
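To make the role of sharedStrings.xml concrete: a cell marked with t="s" stores an index into the shared string table rather than the text itself. The following is a rough sketch in standard-library Python, not SDK code; the sample XML, the cell contents and the helper names `load_shared_strings` and `cell_values` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical sharedStrings.xml with three entries (indexes 0, 1 and 2).
SHARED_STRINGS_XML = f"""<sst xmlns="{S_NS}" count="3" uniqueCount="3">
  <si><t>Name</t></si>
  <si><t>Score</t></si>
  <si><t>Alice</t></si>
</sst>"""

# Hypothetical worksheet: cells with t="s" hold an index, B2 holds a number.
SHEET_XML = f"""<worksheet xmlns="{S_NS}">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1" t="s"><v>1</v></c>
    </row>
    <row r="2">
      <c r="A2" t="s"><v>2</v></c>
      <c r="B2"><v>42</v></c>
    </row>
  </sheetData>
</worksheet>"""


def load_shared_strings(sst_xml: str) -> list:
    """Return the shared strings in index order, joining any text runs."""
    root = ET.fromstring(sst_xml)
    return [
        "".join(t.text or "" for t in si.iter(f"{{{S_NS}}}t"))
        for si in root.findall(f"{{{S_NS}}}si")
    ]


def cell_values(sheet_xml: str, sst_xml: str) -> dict:
    """Map cell references to display text, resolving shared-string indexes."""
    strings = load_shared_strings(sst_xml)
    values = {}
    for cell in ET.fromstring(sheet_xml).iter(f"{{{S_NS}}}c"):
        raw = cell.findtext(f"{{{S_NS}}}v")
        if raw is None:
            continue  # empty cell, nothing to resolve
        is_shared = cell.get("t") == "s"
        values[cell.get("r")] = strings[int(raw)] if is_shared else raw
    return values
```

Calling `cell_values(SHEET_XML, SHARED_STRINGS_XML)` resolves A1, B1 and A2 through the string table and leaves the numeric cell B2 as the raw text of its <v> element.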
PresentationML for presentations
The major parts of a PresentationML document are the <presentation> (Presentation) element that contains <sldMaster> (Slide Master), <sldLayout> (Slide Layout), <sld> (Slide), and <theme> (Theme) elements. The hierarchy in PresentationML is deeper than in the other two formats: a slide references a specific slide layout, which in turn references a slide master. Together these three elements form the final slide on screen. There are several types of slide masters for several types of content. The number and types of parts vary with the content of the presentation.
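The slide, layout and master chain described above is expressed through relationship files, and it can be walked as sketched here. This is an illustration in standard-library Python under stated assumptions, not the SDK's API: the part names and targets in `PARTS` are hypothetical examples of a typical package layout, and `rels_path`, `follow` and `slide_chain` are made-up helper names.

```python
import posixpath
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

# Hypothetical relationship parts, keyed by their path inside the package.
PARTS = {
    "ppt/slides/_rels/slide1.xml.rels": (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_BASE}slideLayout" '
        'Target="../slideLayouts/slideLayout1.xml"/>'
        '</Relationships>'
    ),
    "ppt/slideLayouts/_rels/slideLayout1.xml.rels": (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_BASE}slideMaster" '
        'Target="../slideMasters/slideMaster1.xml"/>'
        '</Relationships>'
    ),
}


def rels_path(part: str) -> str:
    """Relationship file of a part: <folder>/_rels/<name>.rels."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def follow(parts: dict, part: str, kind: str) -> str:
    """Return the part reached from `part` through a relationship of `kind`."""
    root = ET.fromstring(parts[rels_path(part)])
    for rel in root.findall(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == REL_BASE + kind:
            # Targets are relative to the folder of the source part.
            joined = posixpath.join(posixpath.dirname(part), rel.get("Target"))
            return posixpath.normpath(joined)
    raise KeyError(f"{part} has no {kind} relationship")


def slide_chain(parts: dict, slide: str) -> list:
    """Resolve slide -> slide layout -> slide master."""
    layout = follow(parts, slide, "slideLayout")
    master = follow(parts, layout, "slideMaster")
    return [slide, layout, master]
```

Calling `slide_chain(PARTS, "ppt/slides/slide1.xml")` returns the slide, its layout and its master in that order, which mirrors how the three parts combine into the final slide.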
<presentation> (Presentation) element
A slide with common types of content:
This content looks as follows in XML:
The following code example uses the classes in the Open XML SDK 2.5 to create a sample presentation.
The important thing to note in the above code is that the parts of the presentation (SlidePart, SlideLayoutPart, SlideMasterPart, ThemePart) are created in a hierarchical manner.
Conclusion
Open XML is an open standard that is free for all to use and well documented. Because the format is programmable, developers can build straightforward tools to save, load, and use documents in a wide variety of applications.