Raf's laboratory: Raffaele Rialdi personal website

Creating Word docx document using Office Open XML SDK 2.0

January 19, 2010
http://www.iamraf.net/Articles/Creating-Word-docx-document-using-Office-Open-XML-SDK-20

There is no doubt that the new Office Open XML file format is a huge gain for every developer and, as a consequence, for users too.

The new file format is native in Office 2007, but it can also be used in Office 2003 through a plugin, in earlier Office releases through external (free) utilities, and in OpenOffice. The very first gain is that the file format is an ECMA standard and, more recently, an ISO standard. This means that two committees ensure the stability and evolution of the format.

In order to read and write docx, xlsx and pptx files (word processing, spreadsheet and presentation), there are a few things to keep in mind:

  1. These files are zip archives (try renaming one with a .zip extension and opening it) with specific requirements for their folders and files. This part of the specification is called OPC (Open Packaging Conventions) and is shared with XPS files.
    OPC is handled by version 1 of the Open XML SDK, but it can also be handled directly with the System.IO.Packaging classes in .NET Framework 3.0 or later.
  2. The content of the document is written in one or more XML files inside the zip archive.
    The XML schemas are defined in the Office Open XML specification. Office 2007 currently uses the ECMA version of the specification; in the (near?) future there will be a gradual transition to the ISO schemas.
    Version 2 of the SDK handles the document content according to the ECMA standard.
  3. If you read a generic document, you have to be prepared to handle the whole specification, or at least know which parts you can safely ignore.
    If you have to create (write) a document, you only need the elements required to describe your content.
    This means that, as always, writing is easier than reading.
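Point 1 is easy to verify without any Office SDK. The following is a minimal sketch that treats a package as a plain zip archive through System.IO.Compression.ZipArchive (this assumes .NET Framework 4.5 or later, newer than the Framework 3.x mentioned above); PackageInspector and ListParts are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: a .docx/.xlsx/.pptx file is an OPC package, i.e. a zip
// archive, so its parts can be listed with the plain zip classes.
public static class PackageInspector
{
    // Returns the entry names (part paths) sorted ordinally.
    public static List<string> ListParts(Stream package)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries
                      .Select(entry => entry.FullName)
                      .OrderBy(name => name, StringComparer.Ordinal)
                      .ToList();
        }
    }

    // Convenience overload for a file on disk, e.g. "sample.docx".
    public static List<string> ListParts(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return ListParts(file);
        }
    }
}
```

For a typical Word document the list includes [Content_Types].xml, _rels/.rels and word/document.xml, plus whatever other parts (styles, properties, images) the document contains.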

Figure: The OPC file format

OK, let's start by creating a new Word document without using Word.

  • Open Visual Studio (or whatever IDE you want) and create a new C# project of your choice (Windows Forms, WPF, Console).
  • Add a reference to DocumentFormat.OpenXml.dll and WindowsBase.dll
  • Write the code! :)
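    // Namespaces from DocumentFormat.OpenXml.dll
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    // ...then, inside a method, with fileName set to the path of the new .docx:
    using (var doc = WordprocessingDocument.Create(
            fileName, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = doc.AddMainDocumentPart();
        // ....
    }

This is the bare minimum needed to create the document, but obviously a few more things are needed.

Replace the comment (// ....) with the following snippet to create the body of the document: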

    mainPart.Document = new Document(new Body());

    Paragraph par = CreateSimpleParagraph();
    mainPart.Document.Body.Append(par);

Finally, create the paragraph with the following method:

    private Paragraph CreateSimpleParagraph()
    {
        Paragraph par = new Paragraph(new Run(
            new Text("Welcome to TechDays/WPC 2009")));
        return par;
    }
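To see what those few classes amount to, here is a sketch that writes a comparable minimal package by hand with ZipArchive and LINQ to XML, with no SDK involved. It assumes .NET 4.5 or later; MinimalDocx is a hypothetical helper, and the three parts it writes are roughly the minimum a Word document needs, not a byte-for-byte copy of what the SDK emits.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper: builds a one-paragraph .docx without the Open XML SDK.
public static class MinimalDocx
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace Ct = "http://schemas.openxmlformats.org/package/2006/content-types";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";

    public static void Write(Stream output, string text)
    {
        // [Content_Types].xml: defaults per extension plus an override for the main part.
        var contentTypes = new XDocument(
            new XElement(Ct + "Types",
                new XElement(Ct + "Default",
                    new XAttribute("Extension", "rels"),
                    new XAttribute("ContentType", "application/vnd.openxmlformats-package.relationships+xml")),
                new XElement(Ct + "Default",
                    new XAttribute("Extension", "xml"),
                    new XAttribute("ContentType", "application/xml")),
                new XElement(Ct + "Override",
                    new XAttribute("PartName", "/word/document.xml"),
                    new XAttribute("ContentType", "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"))));

        // _rels/.rels: package-level relationship pointing at the main document part.
        var packageRels = new XDocument(
            new XElement(Rel + "Relationships",
                new XElement(Rel + "Relationship",
                    new XAttribute("Id", "rId1"),
                    new XAttribute("Type", "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"),
                    new XAttribute("Target", "word/document.xml"))));

        // word/document.xml: document > body > p > r > t, the same nesting as the SDK classes.
        var document = new XDocument(
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "t", text))))));

        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddEntry(zip, "[Content_Types].xml", contentTypes);
            AddEntry(zip, "_rels/.rels", packageRels);
            AddEntry(zip, "word/document.xml", document);
        }
    }

    // Writes one XML part as a zip entry (one entry stream open at a time).
    static void AddEntry(ZipArchive zip, string name, XDocument xml)
    {
        using (Stream entry = zip.CreateEntry(name).Open())
        {
            xml.Save(entry);
        }
    }
}
```

Usage: MinimalDocx.Write(File.Create("hello.docx"), "Welcome to TechDays/WPC 2009"). The point is not to replace the SDK, but to show that Document, Body, Paragraph, Run and Text are thin wrappers over exactly this markup.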

That's all! Now let's add a couple of methods that create formatted paragraphs, just to show how to add some decoration to our document.

    private Paragraph CreateFormattedParagraph()
    {
        Paragraph par = new Paragraph(new ParagraphProperties());
        // Vertical alignment of the characters on each line
        par.ParagraphProperties.TextAlignment = new TextAlignment();
        par.ParagraphProperties.TextAlignment.Val = VerticalTextAlignmentValues.Top;

        // Horizontal alignment of the paragraph
        par.ParagraphProperties.Justification = new Justification();
        par.ParagraphProperties.Justification.Val = JustificationValues.Right;

        // First-line indent, in twentieths of a point
        par.ParagraphProperties.Indentation = new Indentation();
        par.ParagraphProperties.Indentation.FirstLine = 12;

        Run run = new Run(new Text("Welcome to TechDays/WPC 2009"));
        //run.Append(new Break() { Type = BreakValues.Page });    // optional page break
        par.Append(run);
        return par;
    }

    private Paragraph CreateBorderParagraph()
    {
        Paragraph par = new Paragraph(new ParagraphProperties());
        par.ParagraphProperties.ParagraphBorders = new ParagraphBorders();
        par.ParagraphProperties.ParagraphBorders.LeftBorder = new LeftBorder();
        // Border width, in eighths of a point (24 = 3 pt)
        par.ParagraphProperties.ParagraphBorders.LeftBorder.Size = 24;
        par.ParagraphProperties.ParagraphBorders.LeftBorder.Val = BorderValues.Single;
        // RGB color as a hex string
        par.ParagraphProperties.ParagraphBorders.LeftBorder.Color = "4F81BD";

        par.Append(new Run(new Text("Hello, world")));
        return par;
    }
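The numbers in these properties are not points. As a sketch, the hypothetical helpers below convert between points and the units WordprocessingML uses for the values in this article: twentieths of a point for indentation and paragraph spacing, eighths of a point for border widths, and half-points for font sizes and kerning.

```csharp
using System;

// Hypothetical unit helpers (not part of the SDK).
public static class WordUnits
{
    // w:ind, w:spacing (paragraph): twentieths of a point ("twips")
    public static int PointsToTwips(double points) => (int)Math.Round(points * 20);
    public static double TwipsToPoints(int twips) => twips / 20.0;

    // Border w:sz (e.g. LeftBorder.Size): eighths of a point
    public static int PointsToBorderEighths(double points) => (int)Math.Round(points * 8);
    public static double BorderEighthsToPoints(int eighths) => eighths / 8.0;

    // Run w:sz (FontSize) and w:kern: half-points
    public static int PointsToHalfPoints(double points) => (int)Math.Round(points * 2);
    public static double HalfPointsToPoints(int halfPoints) => halfPoints / 2.0;
}
```

So FirstLine = 12 is only 0.6 pt, the left border of 24 is 3 pt, and a FontSize of 52 is 26 pt.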

 

You may now wonder how many tags/classes are available to the developer to create a full-featured document. Honestly, there are so many that you will need the specification on your lap or on a secondary monitor.

There is a valid alternative that, trust me, you will really love:

  • Create the document using MS Word and close it.
  • Run DocumentReflector.exe from the SDK folder (C:\Program Files\Open XML Format SDK\V2.0\tools) and open the document.
  • Select and copy the code in the right-hand pane and paste it into your project.

Figure: DocumentReflector

Isn't it wonderful? This also explains why the SDK object model is unusual: it is not strongly typed, so you are free to create invalid hierarchies, that is, invalid documents. The advantage of a one-to-one match between tags and classes is that an existing document can easily be converted into code, as DocumentReflector does.
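Because tags and classes match one to one, turning markup into SDK constructor calls is almost mechanical. The toy below only illustrates that idea and is not how DocumentReflector is actually implemented: ToyReflector is hypothetical, it maps a handful of tags, ignores attributes entirely, and throws on anything else.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Toy illustration of the tag -> class correspondence (NOT DocumentReflector's code).
public static class ToyReflector
{
    // Hypothetical subset of the mapping between WordprocessingML tags and SDK classes.
    static readonly Dictionary<string, string> ClassForTag = new Dictionary<string, string>
    {
        ["body"] = "Body",
        ["p"] = "Paragraph",
        ["r"] = "Run",
        ["rPr"] = "RunProperties",
        ["b"] = "Bold",
        ["t"] = "Text",
    };

    // Emits a C# constructor expression that mirrors the element tree.
    public static string ToCode(XElement element)
    {
        if (!ClassForTag.TryGetValue(element.Name.LocalName, out string className))
            throw new NotSupportedException("No class mapped for <" + element.Name.LocalName + ">");

        // Leaf with text content (e.g. w:t): pass the text to the constructor.
        if (!element.HasElements && element.Value.Length > 0)
            return "new " + className + "(\"" + Escape(element.Value) + "\")";

        // Otherwise pass the converted children, in document order.
        string args = string.Join(", ", element.Elements().Select(ToCode));
        return "new " + className + "(" + args + ")";
    }

    // Escapes backslashes and quotes for a C# string literal.
    static string Escape(string s) => s.Replace("\\", "\\\\").Replace("\"", "\\\"");
}
```

For <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hi</w:t></w:r></w:p> it returns new Paragraph(new Run(new RunProperties(new Bold()), new Text("Hi"))).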

Take the time to look at the SDK documentation in the C:\Program Files\Open XML Format SDK\V2.0\doc folder. It is organized in two sections: a valuable collection of "how-to" articles, and the file format documentation, which is even more useful than the official format specification.

I have one final example to talk about. I took the ASCII text of Julius Caesar's "De Bello Gallico" (a well-known avatar of the author) from the Internet and created a book-formatted document from it. The result is the following:

Figure: the resulting document

In this example, the most important innovation is the use of styles. Every word processor user should use styles instead of formatting text directly. This lets you easily change the look of the whole document by changing only the styles.

Creating the styles is pretty easy. You typically define one base style and then define the others based on the first one.

    private Styles GetStyles()
    {
        Styles styles = new Styles(
            // Normal: the base style and the default paragraph style
            new Style(
                new StyleName() { Val = "Normal" },
                new PrimaryStyle()
                ) { Type = StyleValues.Paragraph, StyleId = "Normal", Default = true },

            // ParaFancy
            new Style(
                new StyleName() { Val = "ParaFancy" },
                new BasedOn() { Val = "Normal" },
                new NextParagraphStyle() { Val = "ParaFancy" },
                new PrimaryStyle(),
                // Paragraph property children must follow the schema order
                // (widowControl, spacing, ind, jc) for the markup to be valid
                new StyleParagraphProperties(
                    new WidowControl(),
                    new SpacingBetweenLines() { After = (UInt32Value)300U, Line = 240, LineRule = LineSpacingRuleValues.Auto },
                    new Indentation() { FirstLine = 220U },
                    new Justification() { Val = JustificationValues.Both }
                        )
                ) { Type = StyleValues.Paragraph, StyleId = "ParaFancy" },

            // Title
            new Style(
                new StyleName() { Val = "Title" },
                new BasedOn() { Val = "Normal" },
                new NextParagraphStyle() { Val = "Normal" },
                new LinkedStyle() { Val = "TitleChar" },
                new PrimaryStyle(),
                new StyleParagraphProperties(
                    new ParagraphBorders(
                        new BottomBorder() { Val = BorderValues.Single, Color = "4F81BD", Size = (UInt32Value)8U, Space = (UInt32Value)4U }),
                    new SpacingBetweenLines() { After = (UInt32Value)300U, Line = 240, LineRule = LineSpacingRuleValues.Auto },
                    new ContextualSpacing()
                        ),
                new StyleRunProperties(
                    new Color() { Val = "17365D" },
                    new Spacing() { Val = 5 },
                    new Kern() { Val = (UInt32Value)28U },
                    new FontSize() { Val = (UInt32Value)52U },
                    new FontSizeComplexScript() { Val = (UInt32Value)52U })
                // Without a StyleId, paragraphs could not reference this style
                ) { Type = StyleValues.Paragraph, StyleId = "Title" }
            );
        return styles;
    }
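The basedOn mechanism can be pictured with a small, hypothetical model: StyleDef and StyleResolver are not SDK types, and property names such as "jc" or "font" are just labels. Each style stores only what it sets, and the effective formatting is found by walking the chain from the base style down. This is a simplification of how Word resolves formatting, which also involves document defaults and direct formatting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of w:basedOn inheritance (not an SDK type).
public sealed class StyleDef
{
    public string StyleId { get; }
    public string BasedOn { get; }   // null for the root style (e.g. "Normal")
    public Dictionary<string, string> Properties { get; } = new Dictionary<string, string>();

    public StyleDef(string styleId, string basedOn = null)
    {
        StyleId = styleId;
        BasedOn = basedOn;
    }
}

public static class StyleResolver
{
    // Effective properties of styleId: base style first, derived styles override.
    public static Dictionary<string, string> Resolve(IReadOnlyDictionary<string, StyleDef> styles, string styleId)
    {
        // Collect the chain styleId -> basedOn -> ... -> root, guarding against cycles.
        var chain = new List<StyleDef>();
        var seen = new HashSet<string>();
        for (string id = styleId; id != null; id = styles[id].BasedOn)
        {
            if (!seen.Add(id))
                throw new InvalidOperationException("Cycle in basedOn chain at " + id);
            chain.Add(styles[id]);
        }

        // Apply from the root down so that derived styles win.
        var result = new Dictionary<string, string>();
        for (int i = chain.Count - 1; i >= 0; i--)
            foreach (var pair in chain[i].Properties)
                result[pair.Key] = pair.Value;
        return result;
    }
}
```

Changing a property on the base style changes every style based on it that does not override that property, which is the practical payoff of defining one base style first.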

Styles must be stored in a separate part (XML file) of the OPC container. This takes only two lines:

    StyleDefinitionsPart stylePart = mainPart.AddNewPart<StyleDefinitionsPart>();
    stylePart.Styles = GetStyles();
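Behind those two lines the SDK also registers the new part in the package. As a sketch of what that amounts to, a styles part needs a content-type override and a relationship from the main document part, built here with LINQ to XML. StylesPartEntries is hypothetical, and the part name /word/styles.xml, the target styles.xml and the relationship id are assumptions for illustration; the SDK chooses its own.

```csharp
using System.Xml.Linq;

// Hypothetical helper: the two package entries a styles part needs.
public static class StylesPartEntries
{
    static readonly XNamespace Ct = "http://schemas.openxmlformats.org/package/2006/content-types";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/package/2006/relationships";

    public const string StylesContentType =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml";
    public const string StylesRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";

    // Goes into [Content_Types].xml (part name is an assumption).
    public static XElement ContentTypeOverride(string partName = "/word/styles.xml") =>
        new XElement(Ct + "Override",
            new XAttribute("PartName", partName),
            new XAttribute("ContentType", StylesContentType));

    // Goes into word/_rels/document.xml.rels; Target is relative to the word/ folder.
    public static XElement DocumentRelationship(string id, string target = "styles.xml") =>
        new XElement(Rel + "Relationship",
            new XAttribute("Id", id),
            new XAttribute("Type", StylesRelationshipType),
            new XAttribute("Target", target));
}
```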

Once the styles are defined, you assign the desired style while creating each paragraph:

    private Paragraph CreateParagraph(string str, string style)
    {
        Paragraph par = new Paragraph(
            GetPropertiesForStyle(style),
            new Run(new Text(str)));
        return par;
    }

    // styleId must match the StyleId of a style in the styles part
    // (e.g. "ParaFancy"), not its display name
    public ParagraphProperties GetPropertiesForStyle(string styleId)
    {
        var element =
            new ParagraphProperties(
                new ParagraphStyleId() { Val = styleId });
        return element;
    }
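Paragraphs point at styles through their StyleId, not their StyleName, so a typo or a style defined without a StyleId typically leaves paragraphs without the intended formatting, with no error reported. A hypothetical checker like the one below, written with LINQ to XML over the raw document.xml and styles.xml markup, can catch such mismatches.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical checker: pStyle references with no matching w:styleId.
public static class StyleReferenceCheck
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<string> MissingStyleIds(XDocument styles, XDocument document)
    {
        // Every w:styleId defined in styles.xml
        var defined = new HashSet<string>(
            styles.Descendants(W + "style")
                  .Select(s => (string)s.Attribute(W + "styleId"))
                  .Where(id => id != null));

        // Every distinct w:pStyle/@w:val in document.xml that is not defined
        return document.Descendants(W + "pStyle")
                       .Select(p => (string)p.Attribute(W + "val"))
                       .Where(id => id != null && !defined.Contains(id))
                       .Distinct()
                       .ToList();
    }
}
```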

Formatting the first letter of each paragraph is only a matter of imagination. Here is a straightforward implementation:

    private Paragraph CreateParagraphFirstLetterBold(string str, string style)
    {
        Paragraph par = new Paragraph(GetPropertiesForStyle(style));

        if (str.Length == 0)
            return par;
        if (str.Length == 1 || style != "ParaFancy")
        {
            par.Append(new Run(new Text(str)));
            return par;
        }

        string firstChar = str[0].ToString();
        string rest = str.Substring(1);

        par.Append(
            // First letter: bold, 16 pt (FontSize is in half-points)
            new Run(
                new RunProperties(
                    new Bold(),
                    new FontSize() { Val = 32U }
                    ),
                new Text(firstChar)
                ),
            // Preserve spaces so that a leading blank in the rest (e.g. "A word") is not dropped
            new Run(
                new Text(rest) { Space = SpaceProcessingModeValues.Preserve })
                );

        return par;
    }
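str[0] is a single UTF-16 code unit, so a paragraph starting with a character outside the Basic Multilingual Plane, or with a letter followed by a combining accent, would be cut in the middle. A sketch of the split step alone, using System.Globalization.StringInfo.GetNextTextElement; FirstLetter.Split is a hypothetical helper that could stand in for the str[0] / Substring(1) pair.

```csharp
using System.Globalization;

// Hypothetical helper: splits off the first text element (user-perceived character).
public static class FirstLetter
{
    public static (string First, string Rest) Split(string text)
    {
        if (string.IsNullOrEmpty(text))
            return (string.Empty, string.Empty);

        // A text element may span several UTF-16 code units
        // (surrogate pair, base letter plus combining marks).
        string first = StringInfo.GetNextTextElement(text, 0);
        return (first, text.Substring(first.Length));
    }
}
```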

Inserting an image takes longer and is quite tedious, so I simply generated that code with the DocumentReflector tool, as you will see in the attached sample.

But there is one last, very important piece of information to include in our document: the document properties.
We live in the age of metadata. Metadata are fundamental to index, categorize, tag and retrieve documents, and we can add this precious information with a few lines of code:

    using (WordprocessingDocument doc = WordprocessingDocument.Create(document, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = doc.AddMainDocumentPart();

        // Core document properties stored in the package
        doc.PackageProperties.Creator = "Raf";
        doc.PackageProperties.Category = "Sample";
        doc.PackageProperties.Keywords = "Caesar De Bello Gallico Latin";
        doc.PackageProperties.Description = "This is a sample for TechDays/WPC 2009 conference";
        doc.PackageProperties.ContentStatus = "First draft by Raf";
        doc.PackageProperties.Subject = "Office OpenXML";
        doc.PackageProperties.Title = "DocumentProperties Sample";
        // ...
    }
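Those properties end up as a small XML part inside the package, using Dublin Core elements plus a few OPC-specific ones. The sketch below builds roughly equivalent markup with LINQ to XML; CorePropertiesXml is a hypothetical helper, it omits dates and other optional properties, and the name of the part holding this XML depends on the tool that writes it (Word uses docProps/core.xml).

```csharp
using System.Xml.Linq;

// Hypothetical helper: core properties markup similar to what the package stores.
public static class CorePropertiesXml
{
    static readonly XNamespace Cp = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    public static XDocument Build(string creator, string title, string subject,
                                  string description, string keywords,
                                  string category, string contentStatus)
    {
        return new XDocument(
            new XElement(Cp + "coreProperties",
                new XAttribute(XNamespace.Xmlns + "cp", Cp.NamespaceName),
                new XAttribute(XNamespace.Xmlns + "dc", Dc.NamespaceName),
                // Dublin Core elements
                new XElement(Dc + "creator", creator),
                new XElement(Dc + "title", title),
                new XElement(Dc + "subject", subject),
                new XElement(Dc + "description", description),
                // OPC core-properties elements
                new XElement(Cp + "keywords", keywords),
                new XElement(Cp + "category", category),
                new XElement(Cp + "contentStatus", contentStatus)));
    }
}
```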


Copyright © Raffaele Rialdi 2009, Senior Software Developer, Consultant, p.iva IT01741850992, hosted by Vevy Europe Advanced Technologies Division. Site created by Raffaele Rialdi, 2009 - 2015 Hosted by: © 2008-2015 Vevy Europe S.p.A. - via Semeria, 16A - 16131 Genova - Italia - P.IVA 00269300109