Creating Open XML Documents using SharePoint Object Model

Introduction

Open XML files are ZIP packages that contain XML parts, so they are easy to generate or modify programmatically. By combining the programmability of Open XML with SharePoint, we can put together a small document-generation system.
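To make the "ZIP file that contains XML" point concrete, here is a minimal sketch that uses only the Python standard library rather than the Open XML SDK used in this post. It writes the three parts a bare word-processing package needs and lists them back. The function names and the sample text are illustrative assumptions, not part of the sample project.

```python
import io
import zipfile

# Content types: tells consumers what kind of data each part holds.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package relationships: points at the main document part.
PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

# Main document part with a single paragraph (sample text is illustrative).
DOCUMENT = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>Hello from Open XML</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def build_minimal_docx() -> bytes:
    """Return the bytes of a minimal word-processing package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", DOCUMENT)
    return buffer.getvalue()


def list_parts(docx_bytes: bytes) -> list[str]:
    """Open the package as an ordinary ZIP archive and list its entries."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()
```

Calling list_parts(build_minimal_docx()) returns the three entry names in the order they were written, which shows that the package is nothing more than XML files inside a ZIP container.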

Here we will see how to develop a document-generation system that uses SharePoint lists to populate tables in an Open XML word-processing document.

SharePoint List

Two SharePoint lists, Policy and PerminsAndForms, are created to hold the data that will populate the table in the Word document.

Template Document

We need to create a template document that defines the format of the output document: styles, settings, fonts, table columns, and so on. The figure below shows the template created for this sample.

Programming for Open XML Using the .NET Framework

This sample uses the Open XML SDK (version 2.5), which is available as a free download. Add a reference to the SDK's DocumentFormat.OpenXml.dll in the project before writing any code.

We also need to reference Microsoft.SharePoint.Client.dll and Microsoft.SharePoint.Client.Runtime.dll to access SharePoint and read data from the two lists above.

The sample project has two classes: SPDocumentGenerator contains the logic that creates the Word document, and SPDataRepository communicates with SharePoint and retrieves the data.

We will see important methods of these classes and the logic involved in generating the final output.

The CreatePackage method of SPDocumentGenerator creates a blank document package at the path passed to it as a parameter. All parts of the document, such as the body, fonts, styles, settings, and themes, are created in the CreateParts method. Each part is created in its own method: GenerateMainDocumentPart1Content creates the body of the document, GenerateFontTablePart1Content creates the fonts, GenerateStyleDefinitionsPart1Content creates the styles, and so on.

GenerateMainDocumentPart1Content is the method that matters most here because it creates the content of the document: in our case, some plain text and a table of data from the SharePoint list.

This method begins by creating the Body object and then a Paragraph. The paragraph is the most basic unit of block-level content in a WordprocessingML document. A paragraph can contain optional paragraph properties and inline content (the Run object seen in the above code).

Below the paragraph, the next level of the document hierarchy is the Run, which defines a region of text. A run can also have properties (the RunProperties object). Some examples of run properties are bold, border, character style, color, font, font size, italic, and underline.

The Text object is the container for the text that makes up the document content. We need to use a Text object to place text in a WordprocessingML document.

In the above code, the text “Creating Open XML Documents using SharePoint Object Model” is added to the main document. A Run object holds that region of text within the paragraph, and a RunProperties object applies bold formatting to the run.
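The paragraph, run, run properties, and text hierarchy described above can be sketched at the markup level. This is not the SDK code from the sample project; it is a standard-library Python sketch that builds the equivalent WordprocessingML elements, and the helper names (w, bold_paragraph) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def bold_paragraph(text: str) -> ET.Element:
    """Build <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>text</w:t></w:r></w:p>."""
    paragraph = ET.Element(w("p"))                 # Paragraph
    run = ET.SubElement(paragraph, w("r"))         # Run
    run_properties = ET.SubElement(run, w("rPr"))  # RunProperties
    ET.SubElement(run_properties, w("b"))          # Bold
    text_element = ET.SubElement(run, w("t"))      # Text
    text_element.text = text
    return paragraph


# The body is the container for block-level content such as paragraphs.
body = ET.Element(w("body"))
body.append(
    bold_paragraph("Creating Open XML Documents using SharePoint Object Model")
)
xml_text = ET.tostring(body, encoding="unicode")
```

The run properties element is added before the text element because WordprocessingML expects w:rPr to be the first child of a run.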

Now we will see how to create the table.

In the code below, a Table object is created, and its style and width are defined through a TableProperties object. The required columns and their widths are defined with a TableGrid object. We then create each TableRow and the TableCell objects within it. In our example the table has four columns with a static header row. Each column header is built from Paragraph, Run, and Text objects as explained above, and the Paragraph is then added to a cell of the TableRow. “Sl No.” is the first column in this table.
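As a rough illustration of the table structure just described, the following standard-library Python sketch builds the corresponding markup: table properties, a grid with one entry per column, and a bold header row. Only “Sl No.” comes from the sample; the other header names, the column widths, and the "TableGrid" style name are assumptions.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def header_cell(text: str) -> ET.Element:
    """A table cell holding one bold paragraph (Paragraph > Run > Text)."""
    cell = ET.Element(w("tc"))
    paragraph = ET.SubElement(cell, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(ET.SubElement(run, w("rPr")), w("b"))
    ET.SubElement(run, w("t")).text = text
    return cell


def build_table(headers: list[str], column_widths: list[int]) -> ET.Element:
    """Build a table with properties, a column grid, and a header row."""
    table = ET.Element(w("tbl"))

    # TableProperties: style and overall width of the table.
    properties = ET.SubElement(table, w("tblPr"))
    ET.SubElement(properties, w("tblStyle"), {w("val"): "TableGrid"})
    ET.SubElement(properties, w("tblW"), {w("w"): "0", w("type"): "auto"})

    # TableGrid: one gridCol per column, each with its width.
    grid = ET.SubElement(table, w("tblGrid"))
    for width in column_widths:
        ET.SubElement(grid, w("gridCol"), {w("w"): str(width)})

    # Static header row: one cell per column title.
    header_row = ET.SubElement(table, w("tr"))
    for title in headers:
        header_row.append(header_cell(title))
    return table


# Hypothetical column titles and widths, apart from "Sl No.".
table = build_table(
    ["Sl No.", "Title", "Category", "Owner"], [900, 3600, 2400, 2400]
)
```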

Now we will see how to create the data rows in this table. In the code snippet below, we create an instance of the SPDataRepository class, which contains the logic to retrieve data from SharePoint using CSOM (Client Side Object Model). We will explore that code in the next section.

We use the getListItems method of SPDataRepository to read the required SharePoint list. The first parameter of getListItems is the SharePoint list name; the other two parameters are used to fetch only the list items that match a criterion. The second parameter is the name of the field to filter on, and the third is the value to match.

The items returned by getListItems are stored in a collection (lstPolicies). We then create a TableRow for each item and a TableCell for each field in the item. As with the static text, we create Paragraph, Run, and Text objects to assign each list item field value to its table cell.
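The row-per-item, cell-per-field loop can be sketched as follows. This is a standard-library Python illustration of the resulting markup, not the SDK code from the sample; plain dictionaries stand in for the SharePoint list items, and the field names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def text_cell(text: str) -> ET.Element:
    """A table cell containing Paragraph > Run > Text."""
    cell = ET.Element(w("tc"))
    paragraph = ET.SubElement(cell, w("p"))
    run = ET.SubElement(paragraph, w("r"))
    ET.SubElement(run, w("t")).text = text
    return cell


def append_data_rows(table: ET.Element, items: list[dict],
                     field_names: list[str]) -> ET.Element:
    """Append one row per item: a serial number, then one cell per field."""
    for serial_number, item in enumerate(items, start=1):
        row = ET.SubElement(table, w("tr"))
        row.append(text_cell(str(serial_number)))
        for name in field_names:
            # Missing fields become empty cells so every row keeps its shape.
            row.append(text_cell(str(item.get(name, ""))))
    return table


# Dictionaries stand in for list items; names and values are made up.
policies = [
    {"Title": "Travel policy", "Category": "Finance", "Owner": "HR"},
    {"Title": "Leave policy", "Category": "People"},
]
table = append_data_rows(ET.Element(w("tbl")), policies,
                         ["Title", "Category", "Owner"])
```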

The figure below displays the getListItems method of SPDataRepository. First, we need access to SharePoint, which we get by creating a ClientContext object with the URL of the SharePoint site. Then, using the GetByTitle method, we retrieve the specific list whose name is passed to this method.

The next part of the code defines a CAML (Collaborative Application Markup Language) query that specifies which items of the list are to be returned. The query string is then assigned to the ViewXml property of the CamlQuery object.
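For readers unfamiliar with CAML, this sketch shows the kind of query string that could be assigned to ViewXml for a "field equals value" filter like the one getListItems supports. It is written in standard-library Python purely to show the XML shape; the function name, the default "Text" value type, and the sample field and value are assumptions, since the original query is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_view_xml(field_name=None, value=None, value_type="Text") -> str:
    """Return a CAML <View> string, optionally filtered by field == value."""
    view = ET.Element("View")
    query = ET.SubElement(view, "Query")
    if field_name is not None:
        # <Where><Eq><FieldRef Name="..."/><Value Type="...">...</Value></Eq></Where>
        where = ET.SubElement(query, "Where")
        equals = ET.SubElement(where, "Eq")
        ET.SubElement(equals, "FieldRef", {"Name": field_name})
        value_element = ET.SubElement(equals, "Value", {"Type": value_type})
        value_element.text = value
    return ET.tostring(view, encoding="unicode")


# Hypothetical field name and value.
filtered_query = build_view_xml("Category", "Health & Safety")
unfiltered_query = build_view_xml()
```

Building the string through an XML API rather than string concatenation means characters such as & in the matching value are escaped automatically.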

The GetItems() method takes a CamlQuery as input and returns the items that meet its criteria.

One thing to note is that the client-side object model (CSOM) follows the same pattern as SQL: build a query, execute it, and then read the result. Here the Load method of the ClientContext object builds the query, and ExecuteQuery submits the prepared query to the SharePoint server and retrieves the data.

Finally, the result is returned to the calling object, which in our case is SPDocumentGenerator.

The output document generated by our application looks like the one below.

Conclusion

One of the important features of Open XML is its programmability. In this post, we explored a very simple solution that shows how Open XML can be used to create Office documents with SharePoint as the data source. It also demonstrates how information can be reused between applications and information systems.

Overview of Three Important Markup Languages of Open XML

Introduction

The Open XML markup standard was introduced to store Microsoft Office content in an XML-based format. It uses open technologies, which makes the format usable by a wide range of application software. A key advantage of the Office Open XML structure is that the various parts of a document can be edited more easily because they reside in separate files.

WordprocessingML for documents, SpreadsheetML for spreadsheets and PresentationML for presentations are three major markup languages of Open XML.

WordprocessingML for documents

A WordprocessingML (Open XML) document is a zipped collection of folders, subfolders, XML files, and other file types. The basic structure of a WordprocessingML document consists of the <document> and <body> elements, followed by one or more block-level elements such as <p>, which represents a paragraph.

A paragraph contains one or more <r> elements. The <r> stands for run, which is a region of text with a common set of properties, such as formatting. A run contains one or more <t> elements. The <t> element contains a range of text.

The following figure shows the XML elements of a simple WordprocessingML document:

document — The root element for a WordprocessingML’s main document part.

body — The container for the collection of block-level structures that comprises the main story.

p — A paragraph.

r — A run.

t — A region of text.
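To see these five elements working together, the sketch below walks a small main document part and joins the text of each paragraph's runs. It uses the Python standard library instead of the Open XML SDK, and the sample markup, including its second paragraph, is made up for illustration.

```python
import xml.etree.ElementTree as ET

W = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# A tiny main document part: document > body > p > r > t.
DOCUMENT_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body>'
    '<w:p>'
    '<w:r><w:t xml:space="preserve">Trigent POC on </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>WordprocessingML</w:t></w:r>'
    '</w:p>'
    '<w:p><w:r><w:t>A second paragraph</w:t></w:r></w:p>'
    '</w:body>'
    '</w:document>'
)


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the plain text of each paragraph directly under the body."""
    root = ET.fromstring(document_xml)           # <w:document>
    body = root.find("w:body", W)                # <w:body>
    texts = []
    for paragraph in body.findall("w:p", W):     # each <w:p>
        pieces = [
            text.text or ""
            for run in paragraph.findall("w:r", W)   # each <w:r>
            for text in run.findall("w:t", W)        # each <w:t>
        ]
        texts.append("".join(pieces))
    return texts
```

A paragraph's visible text is the concatenation of its runs, which is why one sentence can be split across several runs when only part of it is formatted differently.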

The following code uses the Open XML SDK 2.5 to create a simple WordprocessingML document that contains the text “Trigent POC on WordprocessingML”.

A real Word document might contain many more elements, such as headers, footers, tables, images, and tables of contents.

SpreadsheetML for spreadsheets

The structure of a SpreadsheetML document consists of the <workbook> element, which is the container for the worksheets. A separate XML file is created for each worksheet. These elements are the minimum required for a valid spreadsheet document. In addition, a spreadsheet document might contain <table>, <chartsheet>, <pivotTableDefinition>, or other spreadsheet-related elements.

We will analyze the different parts of a spreadsheet using the worksheet below as an example.

The following figure shows the workbook part of the above spreadsheet. In this example, the sheet tag references the worksheet through the r:id attribute.

Next is the relationship file of the workbook.xml part.

This part has four relationships, which correspond to the styles, the theme, the worksheet, and the contents of the worksheet.
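The link between a sheet's r:id attribute and the relationship file can be sketched like this, in standard-library Python rather than the SDK. The markup is a hand-written stand-in for the figures: the sheet name, relationship IDs, and target paths are typical values, not copied from the sample.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Stand-in for xl/workbook.xml: one sheet that points at rId1.
WORKBOOK_XML = (
    f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL_NS}">'
    '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
    '</workbook>'
)

# Stand-in for xl/_rels/workbook.xml.rels with four relationships.
WORKBOOK_RELS_XML = (
    f'<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{DOC_REL_NS}/worksheet" '
    'Target="worksheets/sheet1.xml"/>'
    f'<Relationship Id="rId2" Type="{DOC_REL_NS}/theme" Target="theme/theme1.xml"/>'
    f'<Relationship Id="rId3" Type="{DOC_REL_NS}/styles" Target="styles.xml"/>'
    f'<Relationship Id="rId4" Type="{DOC_REL_NS}/sharedStrings" '
    'Target="sharedStrings.xml"/>'
    '</Relationships>'
)


def sheet_targets(workbook_xml: str, rels_xml: str) -> dict[str, str]:
    """Map each sheet name to the part its r:id relationship points at."""
    targets_by_id = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_REL_NS}}}Relationship")
    }
    result = {}
    for sheet in ET.fromstring(workbook_xml).iter(f"{{{MAIN_NS}}}sheet"):
        relationship_id = sheet.get(f"{{{DOC_REL_NS}}}id")  # the r:id attribute
        result[sheet.get("name")] = targets_by_id[relationship_id]
    return result
```

Here sheet_targets(WORKBOOK_XML, WORKBOOK_RELS_XML) maps "Sheet1" to "worksheets/sheet1.xml", a path relative to the folder that holds workbook.xml.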

sharedStrings.xml holds the text contents of the spreadsheet, as shown below.
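Worksheet cells that contain text do not store the text themselves; they store an index into sharedStrings.xml. The following standard-library Python sketch resolves those indexes. The sample strings and cell references are invented for illustration and are not the contents of the spreadsheet in the figure.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def q(tag: str) -> str:
    """Qualify a tag name with the SpreadsheetML namespace."""
    return f"{{{MAIN_NS}}}{tag}"


# Stand-in for xl/sharedStrings.xml (invented strings).
SHARED_STRINGS_XML = (
    f'<sst xmlns="{MAIN_NS}" count="2" uniqueCount="2">'
    '<si><t>Name</t></si><si><t>City</t></si>'
    '</sst>'
)

# Stand-in for a worksheet part: t="s" marks a shared-string cell.
SHEET_XML = (
    f'<worksheet xmlns="{MAIN_NS}"><sheetData><row r="1">'
    '<c r="A1" t="s"><v>0</v></c>'
    '<c r="B1" t="s"><v>1</v></c>'
    '<c r="C1"><v>42</v></c>'
    '</row></sheetData></worksheet>'
)


def cell_values(sheet_xml: str, shared_strings_xml: str) -> dict[str, str]:
    """Return {cell reference: displayed value}, resolving shared strings."""
    strings = [
        "".join(t.text or "" for t in item.iter(q("t")))
        for item in ET.fromstring(shared_strings_xml).findall(q("si"))
    ]
    values = {}
    for cell in ET.fromstring(sheet_xml).iter(q("c")):
        raw = cell.findtext(q("v"))
        if raw is None:
            continue  # cell has no stored value
        # For t="s" the value is an index into the shared-string table.
        values[cell.get("r")] = strings[int(raw)] if cell.get("t") == "s" else raw
    return values
```

Storing each distinct string once and referring to it by index keeps worksheets small when the same text appears in many cells.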

The following code example uses the classes in the Open XML SDK 2.5 to create a workbook.

PresentationML for presentations

The major parts of a PresentationML document are the <presentation> (Presentation) element together with the <sldMaster> (Slide Master), <sldLayout> (Slide Layout), <sld> (Slide), and <theme> (Theme) elements. In PresentationML the hierarchy is deeper: a slide references a specific slide layout, which in turn references a slide master. Together these three elements form the final slide on screen. There can be several slide masters for different types of content. The number and types of parts vary with the content of the presentation.
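The slide, layout, master chain can be followed through each part's relationship file. The sketch below does this in standard-library Python with a deliberately simplified set of relationships (real parts usually carry more), and the part names are typical values assumed for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"


def rels_xml(rel_type: str, target: str) -> str:
    """A relationship file with a single relationship (simplified)."""
    return (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_TYPE_BASE}{rel_type}" Target="{target}"/>'
        '</Relationships>'
    )


# Relationship files keyed by the part that owns them (assumed part names).
PART_RELS = {
    "ppt/slides/slide1.xml":
        rels_xml("slideLayout", "../slideLayouts/slideLayout1.xml"),
    "ppt/slideLayouts/slideLayout1.xml":
        rels_xml("slideMaster", "../slideMasters/slideMaster1.xml"),
    "ppt/slideMasters/slideMaster1.xml":
        rels_xml("theme", "../theme/theme1.xml"),
}


def related_part(part_name: str, rel_type: str) -> str:
    """Resolve the first relationship of the given type to a part name."""
    root = ET.fromstring(PART_RELS[part_name])
    for rel in root.iter(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == REL_TYPE_BASE + rel_type:
            # Targets are relative to the folder of the owning part.
            base = posixpath.dirname(part_name)
            return posixpath.normpath(posixpath.join(base, rel.get("Target")))
    raise KeyError(f"{part_name} has no {rel_type} relationship")


def slide_chain(slide_part: str) -> list[str]:
    """Follow slide -> slide layout -> slide master -> theme."""
    layout = related_part(slide_part, "slideLayout")
    master = related_part(layout, "slideMaster")
    theme = related_part(master, "theme")
    return [slide_part, layout, master, theme]
```

Calling slide_chain("ppt/slides/slide1.xml") returns the four part names in order, ending with "ppt/theme/theme1.xml".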

<presentation> (Presentation) element

A slide with common types of content:

This content looks as follows in XML:


The following code example uses the classes in the Open XML SDK 2.5 to create a sample presentation.

The important thing to note in the above code is that the parts of the presentation (SlidePart, SlideLayoutPart, SlideMasterPart, ThemePart) are created in a hierarchical manner.

Conclusion

Open XML is a well-documented standard that is free for all to use. Its programmability lets developers build straightforward tools that save, load, and use the document format in a wide variety of applications.
