Overview of Three Important Markup Languages of Open XML

Introduction

The Office Open XML standard was introduced to store Microsoft Office content in an XML-based format. It builds on open technologies (ZIP packaging and XML), so many different applications can read and write the files. A key advantage of the Office Open XML structure is that the various parts of a document can be edited more easily because they reside in separate files inside the package.

The three major markup languages of Open XML are WordprocessingML for documents, SpreadsheetML for spreadsheets and PresentationML for presentations.

WordprocessingML for documents

A WordprocessingML document is a zipped collection (a package) of folders, subfolders, XML files and other file types. The basic structure of the main document part consists of the <document> element and, inside it, the <body> element, which holds one or more block-level elements such as <p>, which represents a paragraph.

A paragraph contains one or more <r> elements. The <r> stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more <t> elements. The <t> element contains a range of text.
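The nesting described above can be sketched with Python's standard library. This is only an illustration of the element hierarchy, not the Open XML SDK; the helper names `w` and `build_document` are made up for this example, and the sample text is the one used later in this article.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace, conventionally bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name of a WordprocessingML tag (hypothetical helper)."""
    return f"{{{W_NS}}}{tag}"


def build_document(text):
    """Build document > body > p > r > t around a single piece of text."""
    document = ET.Element(w("document"))
    body = ET.SubElement(document, w("body"))
    paragraph = ET.SubElement(body, w("p"))   # block-level element
    run = ET.SubElement(paragraph, w("r"))    # region of text sharing one set of properties
    text_element = ET.SubElement(run, w("t"))  # the text itself
    text_element.text = text
    return document


doc = build_document("Trigent POC on WordprocessingML")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Formatting would be attached to the run through a run-properties child element, which this sketch leaves out.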

The following figure shows the XML elements of a simple WordprocessingML document:

[Figure OpenXML01: XML elements of a simple WordprocessingML document]

document: The root element of a WordprocessingML main document part.

body: The container for the collection of block-level structures that comprise the main story.

p: A paragraph.

r: A run.

t: A region of text.

The following code uses the Open XML SDK 2.5 to create a simple WordprocessingML document that contains the text “Trigent POC on WordprocessingML”.

[Figure OpenXML02: Open XML SDK 2.5 code that creates the document]
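For readers without the SDK at hand, the same kind of package can be sketched with nothing but Python's `zipfile` module. This is not the SDK listing from the figure; it is a minimal sketch that assumes three parts are enough for a small document (content types, package relationships and the main document part). The function name `make_docx` and the output file name in the comment are hypothetical.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Declares the content type of every part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/vnd.'
    'openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship that points at the main document part.
PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

# Main document part: document > body > p > r > t.
DOCUMENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def make_docx(text):
    """Return the bytes of a minimal WordprocessingML package (hypothetical helper)."""
    document = DOCUMENT_TEMPLATE.format(text=escape(text))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


docx_bytes = make_docx("Trigent POC on WordprocessingML")
# To inspect the result in a word processor, write the bytes to disk, e.g.:
# open("sample.docx", "wb").write(docx_bytes)
```

Building the package by hand makes the "zipped collection of parts" idea visible. For anything beyond a toy document, a library such as the SDK takes care of the bookkeeping.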

A real-world document might contain many more features, such as headers, footers, tables, images and tables of contents.

SpreadsheetML for spreadsheets

The structure of a SpreadsheetML document is built around the <workbook> element, which is the container for the worksheets. A separate XML file is created for each worksheet. The workbook and at least one worksheet are the minimum required for a valid spreadsheet document. In addition, a spreadsheet document might contain <table>, <chartsheet>, <pivotTableDefinition> or other spreadsheet-related elements.

We will examine the different parts of a spreadsheet using the worksheet below as an example.

[Figure OpenXML03: the example worksheet]

The following figure shows the workbook part of the above spreadsheet. In this example, the <sheet> element references its worksheet through the r:id attribute.

[Figure OpenXML04: workbook part with a <sheet> element and its r:id attribute]

Next comes the relationships file that belongs to the workbook.xml part.

[Figure OpenXML05: relationships file of the workbook.xml part]

This part contains four relationships, which correspond to the styles, the theme, the worksheet and the contents of the worksheet.
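How the r:id attribute connects a sheet to its worksheet part can be sketched as follows. The XML strings are hypothetical stand-ins for the files in the figures (the sheet name, the relationship Ids and the targets are assumptions), and `sheet_targets` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical workbook part: one sheet that points at relationship rId1.
WORKBOOK_XML = f"""<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL_NS}">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>"""

# Hypothetical relationships file with the four relationships described above.
WORKBOOK_RELS_XML = f"""<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId1" Type="{DOC_REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>
  <Relationship Id="rId2" Type="{DOC_REL_NS}/theme" Target="theme/theme1.xml"/>
  <Relationship Id="rId3" Type="{DOC_REL_NS}/styles" Target="styles.xml"/>
  <Relationship Id="rId4" Type="{DOC_REL_NS}/sharedStrings" Target="sharedStrings.xml"/>
</Relationships>"""


def sheet_targets(workbook_xml, rels_xml):
    """Map each sheet name to the part its r:id relationship points at."""
    relationships = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_REL_NS}}}Relationship")
    }
    targets = {}
    for sheet in ET.fromstring(workbook_xml).iter(f"{{{MAIN_NS}}}sheet"):
        relationship_id = sheet.get(f"{{{DOC_REL_NS}}}id")  # the r:id attribute
        targets[sheet.get("name")] = relationships[relationship_id]
    return targets


targets = sheet_targets(WORKBOOK_XML, WORKBOOK_RELS_XML)
print(targets)
```

The targets are relative to the folder that holds workbook.xml, so in this sample the worksheet would live at xl/worksheets/sheet1.xml.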

The sharedStrings.xml part holds the text content of the spreadsheet, as shown below.

[Figure OpenXML06: contents of sharedStrings.xml]
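A short sketch may help show how a worksheet uses the shared strings. A cell marked with t="s" stores an index into sharedStrings.xml instead of the text itself. The sample XML below is hypothetical (it is not the content of the figures), and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical shared string table with two entries (index 0 and 1).
SHARED_STRINGS_XML = f"""<sst xmlns="{MAIN_NS}" count="2" uniqueCount="2">
  <si><t>Name</t></si>
  <si><t>Trigent</t></si>
</sst>"""

# Hypothetical worksheet: A1 and A2 are shared strings, B1 is a plain number.
SHEET_XML = f"""<worksheet xmlns="{MAIN_NS}">
  <sheetData>
    <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>
    <row r="2"><c r="A2" t="s"><v>1</v></c></row>
  </sheetData>
</worksheet>"""


def load_shared_strings(sst_xml):
    """Return the shared strings in index order.

    Simplification: all <t> descendants of an <si> are joined, which is
    enough for plain and rich-text entries in this sample.
    """
    root = ET.fromstring(sst_xml)
    return [
        "".join(t.text or "" for t in item.iter(f"{{{MAIN_NS}}}t"))
        for item in root.findall(f"{{{MAIN_NS}}}si")
    ]


def cell_values(sheet_xml, shared_strings):
    """Map cell references to values, resolving shared-string indexes."""
    values = {}
    for cell in ET.fromstring(sheet_xml).iter(f"{{{MAIN_NS}}}c"):
        value = cell.find(f"{{{MAIN_NS}}}v")
        if value is None:
            continue  # empty cell
        raw = value.text or ""
        if cell.get("t") == "s":
            values[cell.get("r")] = shared_strings[int(raw)]
        else:
            values[cell.get("r")] = raw  # kept as the stored text, e.g. "42"
    return values


strings = load_shared_strings(SHARED_STRINGS_XML)
cells = cell_values(SHEET_XML, strings)
print(cells)
```

Storing each distinct string once keeps worksheets small when the same text appears in many cells.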

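The following code example uses the classes in the Open XML SDK 2.5 to create a workbook.

[Figure OpenXML07: Open XML SDK 2.5 code that creates a workbook, part 1]

[Figure OpenXML08: Open XML SDK 2.5 code that creates a workbook, part 2]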
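As a rough counterpart to the SDK listing, a workbook package can also be assembled by hand in Python. This sketch assumes that a workbook part, one worksheet part and their relationships are sufficient for a minimal file. It stores the text as an inline string so that no sharedStrings.xml part is needed. The name `make_xlsx` and the cell text are made up for the example.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
SHEET_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml"

CONTENT_TYPES = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    f'<Override PartName="/xl/workbook.xml" ContentType="{SHEET_TYPE}.sheet.main+xml"/>'
    f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{SHEET_TYPE}.worksheet+xml"/>'
    '</Types>'
)

# Package relationship: where the workbook part lives.
PACKAGE_RELS = (
    f'<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{DOC_REL_NS}/officeDocument" Target="xl/workbook.xml"/>'
    '</Relationships>'
)

# Workbook part: a single sheet that refers to relationship rId1.
WORKBOOK = (
    f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL_NS}">'
    '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
    '</workbook>'
)

# Workbook relationships: rId1 resolves to the worksheet part.
WORKBOOK_RELS = (
    f'<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{DOC_REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>'
    '</Relationships>'
)


def make_xlsx(cell_text):
    """Return the bytes of a minimal SpreadsheetML package with text in A1."""
    sheet = (
        f'<worksheet xmlns="{MAIN_NS}"><sheetData><row r="1">'
        f'<c r="A1" t="inlineStr"><is><t>{escape(cell_text)}</t></is></c>'
        '</row></sheetData></worksheet>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("xl/workbook.xml", WORKBOOK)
        package.writestr("xl/_rels/workbook.xml.rels", WORKBOOK_RELS)
        package.writestr("xl/worksheets/sheet1.xml", sheet)
    return buffer.getvalue()


xlsx_bytes = make_xlsx("Trigent POC on SpreadsheetML")
```

Styles, a theme and shared strings, which appear in the relationships discussed earlier, are optional extras that this sketch omits.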

PresentationML for presentations

The major parts of a PresentationML document are the <presentation> (Presentation) element together with the <sldMaster> (Slide Master), <sldLayout> (Slide Layout), <sld> (Slide) and <theme> (Theme) elements. The hierarchy in PresentationML is deeper than in the other two formats: a slide references a specific slide layout, which in turn references a slide master. Together, these three elements determine how the final slide appears on screen. There are several types of master slides for different types of content. The number and types of parts vary with the content of the presentation.
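The slide, layout, master chain can be followed through relationship files, as in this sketch. The part names and the in-memory `PACKAGE_RELS` table are hypothetical and simplified to one relationship per part (a real slide master also lists its layouts); `related_part` and `slide_chain` are names invented for this example.

```python
import posixpath
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"


def rels_xml(rel_type, target):
    """Build a one-entry relationships file (hypothetical sample data)."""
    return (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_BASE}{rel_type}" Target="{target}"/>'
        '</Relationships>'
    )


# Stand-in for the .rels files of a package, keyed by the owning part.
PACKAGE_RELS = {
    "ppt/slides/slide1.xml": rels_xml("slideLayout", "../slideLayouts/slideLayout1.xml"),
    "ppt/slideLayouts/slideLayout1.xml": rels_xml("slideMaster", "../slideMasters/slideMaster1.xml"),
    "ppt/slideMasters/slideMaster1.xml": rels_xml("theme", "../theme/theme1.xml"),
}


def related_part(part, rel_type):
    """Return the part that `part` points at through a relationship of `rel_type`."""
    xml = PACKAGE_RELS.get(part)
    if xml is None:
        return None
    for rel in ET.fromstring(xml).iter(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == REL_BASE + rel_type:
            # Targets are relative to the folder of the part that owns them.
            folder = posixpath.dirname(part)
            return posixpath.normpath(posixpath.join(folder, rel.get("Target")))
    return None


def slide_chain(slide_part):
    """Follow slide -> layout -> master -> theme and return the parts visited."""
    layout = related_part(slide_part, "slideLayout")
    master = related_part(layout, "slideMaster") if layout else None
    theme = related_part(master, "theme") if master else None
    return [slide_part, layout, master, theme]


chain = slide_chain("ppt/slides/slide1.xml")
for part in chain:
    print(part)
```

Anything a slide does not define itself is inherited along this chain, which is why the three parts together determine what appears on screen.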

<presentation> (Presentation) element

[Figure OpenXML09: the <presentation> element]

A slide with common types of content:

[Figure OpenXML10: a slide with common types of content]

This content looks as follows in XML:

[Figure OpenXML11: XML of the slide content]

The following code example uses the classes in the Open XML SDK 2.5 to create a sample presentation.

[Figure OpenXML12: Open XML SDK 2.5 code that creates a sample presentation]

The important thing to note in the above code is that the parts of the presentation (SlidePart, SlideLayoutPart, SlideMasterPart, ThemePart) are created in a hierarchical manner.

Conclusion

Open XML is an open standard that is free for anyone to use, and it is well documented. Its programmability lets developers build straightforward tools that save, load and use the document formats in a wide variety of applications.
