Monday, 17 August 2015

Working with Word documents using Aspose in C#


The system we are currently building is a platform for document authoring and collaboration. It lets users upload Word documents, and our API extracts the content and translates it into a custom document format. Our current code base does this by parsing the Word Open XML content and translating it into the content elements of our document format. The code largely revolves around the nuances of the Open XML format and conditionally handling its inconsistencies. Metrics on the code that parses the Word document show that parts of it have very high cyclomatic complexity.


This is inevitable if we build all of this ourselves, but in the grand scheme of things, working in a start-up, it cost us at least two people's time for a couple of months. Is this worth it, and will our code be as good at parsing Word documents as some of the libraries out there? The answer is an obvious NO. We didn't need to build this on our own. For approximately $14,000 I could get a Site OEM license of Aspose.Words for .NET and use it.

Overview of Aspose.Words for .NET

A snapshot of the capabilities of Aspose.Words for .NET from their site is shown below.


Having evaluated a few options, it was an easy conclusion to use Aspose. The product has a mature API and can convert Word documents into other formats quite easily. The picture below from Aspose should explain everything you need to know about the formats Aspose supports.



Importing Word Content

Extracting content from different document elements is made easy by Aspose's document tree navigation and composite nodes. One of our developers shared the code snippets shown below on how to extract paragraphs, table content and footers from a Word document.

Extracting from Paragraphs and footer:
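
As a rough illustration of this kind of extraction, here is a minimal sketch. It is not our developer's exact snippet: the file name `input.docx` and the class name are placeholders, and it only assumes the standard Aspose.Words document tree calls (`GetChildNodes`, `GetAncestor`, `HeadersFooters`).

```csharp
using System;
using Aspose.Words;

class ParagraphAndFooterExtractor
{
    static void Main()
    {
        // "input.docx" is a placeholder path used only for this sketch.
        Document doc = new Document("input.docx");

        // Walk every paragraph in the document tree (true = include all descendants).
        // Note that this also returns paragraphs that sit inside table cells.
        foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // Skip paragraphs that belong to a header or footer; those are handled below.
            if (paragraph.GetAncestor(NodeType.HeaderFooter) != null)
                continue;

            // Export the node as plain text and drop surrounding whitespace.
            string text = paragraph.ToString(SaveFormat.Text).Trim();
            if (text.Length > 0)
                Console.WriteLine("Paragraph: " + text);
        }

        // Each section can carry its own footer, so check every section.
        foreach (Section section in doc.Sections)
        {
            HeaderFooter footer = section.HeadersFooters[HeaderFooterType.FooterPrimary];
            if (footer != null)
                Console.WriteLine("Footer: " + footer.ToString(SaveFormat.Text).Trim());
        }
    }
}
```

Only the primary footer is read here; a document that uses separate first-page or even-page footers would need the other `HeaderFooterType` values as well.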



Extracting content from tables:
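
Table extraction follows the same tree-walking idea. The sketch below is an assumption about how it could look rather than the code we actually shipped: `input.docx` and the tab-separated output are illustrative choices only.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Tables;

class TableExtractor
{
    static void Main()
    {
        // "input.docx" is a placeholder path used only for this sketch.
        Document doc = new Document("input.docx");

        // true = search the whole tree, so nested tables are returned as well.
        foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
        {
            foreach (Row row in table.Rows)
            {
                List<string> cells = new List<string>();
                foreach (Cell cell in row.Cells)
                {
                    // Export each cell as plain text and trim the end-of-cell whitespace.
                    cells.Add(cell.ToString(SaveFormat.Text).Trim());
                }

                // Print one line per row, with cells separated by tabs.
                Console.WriteLine(string.Join("\t", cells));
            }

            // Blank line between tables.
            Console.WriteLine();
        }
    }
}
```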



Producing Word Documents from Data using Templates

A feature we needed was to take our custom document format, extract data from it and produce Word documents from Word templates. The mail merge feature in Aspose is pretty slick and let us do this without much effort.

Before we started writing any code we created a Word document template (as shown below in the screenshot) to identify what data needs to be injected into the template. The code snippet below is from one of our developers, who worked with the library more extensively than I did. Aspose has a concept of regions to dynamically grow portions of the document, such as tables. Since we persist the output Word document in a file system, we converted the output into a stream object.
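
To give a feel for the approach, here is a minimal sketch of a mail merge with a region, saved to a stream. The names in it are hypothetical and not taken from our template: it assumes the template has simple merge fields called `Title` and `Author`, and a table row wrapped in `TableStart:Items` and `TableEnd:Items` merge fields containing `Name` and `Quantity` fields. The file names are placeholders too.

```csharp
using System.Data;
using System.IO;
using Aspose.Words;

class TemplateMerger
{
    static void Main()
    {
        // "template.docx" is a placeholder path used only for this sketch.
        Document doc = new Document("template.docx");

        // Fill the simple (non-repeating) merge fields. Field names are hypothetical.
        doc.MailMerge.Execute(
            new string[] { "Title", "Author" },
            new object[] { "Quarterly Report", "Jane Doe" });

        // The DataTable name must match the region name in the template
        // (TableStart:Items ... TableEnd:Items); columns map to merge fields.
        DataTable items = new DataTable("Items");
        items.Columns.Add("Name");
        items.Columns.Add("Quantity");
        items.Rows.Add("Widget", "3");
        items.Rows.Add("Gadget", "5");

        // The region is repeated once per row, which grows the table in the output.
        doc.MailMerge.ExecuteWithRegions(items);

        // Save into a stream first, then persist it to the file system.
        using (MemoryStream stream = new MemoryStream())
        {
            doc.Save(stream, SaveFormat.Docx);
            File.WriteAllBytes("output.docx", stream.ToArray());
        }
    }
}
```

Saving to a `MemoryStream` keeps the merge step independent of where the document ends up, so the same bytes could go to disk, a blob store or an HTTP response.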


When we execute the code to merge the data with the template, the resulting Word document looks like the following screenshot.


Clearly this is a feature to buy rather than build on our own: no matter how good we are, the cost of building it is going to exceed the price of a full-blown Aspose.Words for .NET license. The library helped us avoid a lot of complex code that we would otherwise have written to meet these requirements. We are no experts in Open XML and, frankly, don't think we should be writing code to parse Word documents. Aspose.Words for .NET was an easy choice. Aspose has other components which are worth a look.