I am trying to export the contents of a Word 2007 document as HTML and publish it to my content management system.

Right now, exporting from Word using the APIs produces incorrect HTML. For example, all lists are converted to P tags.

Is there a way to generate correct HTML from Word?




Phil Hoff - MSFT

To get cleaner HTML you may have to resort to transforming the DocX file yourself. There are several articles on the web that describe the process (a link to the first one I came across is below). How complicated that process would be depends on how much of Word's native formatting you need to preserve during the conversion. p=691502
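As a rough illustration of what transforming the file yourself could look like, here is a minimal Python sketch. It assumes the document is saved as .docx, reads word/document.xml from the package, and treats any paragraph carrying a w:numPr element as a list item. The function name docx_to_html is hypothetical, and the sketch deliberately ignores nesting levels, numbered versus bulleted lists (that information lives in word/numbering.xml), character formatting, tables, and images.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_html(docx_file):
    """Convert a .docx (path or file-like object) to very plain HTML.

    Hypothetical helper: paragraphs become <p>, and consecutive
    paragraphs with list numbering properties become one <ul>.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    parts = []
    in_list = False
    for para in root.iter(W + "p"):
        # Join the text of all runs in the paragraph and HTML-escape it.
        text = escape("".join(t.text or "" for t in para.iter(W + "t")))
        # Word marks list paragraphs with w:pPr/w:numPr.
        is_item = para.find(f"{W}pPr/{W}numPr") is not None

        if is_item and not in_list:
            parts.append("<ul>")
        elif not is_item and in_list:
            parts.append("</ul>")
        in_list = is_item

        if is_item:
            parts.append(f"<li>{text}</li>")
        elif text:  # skip empty non-list paragraphs
            parts.append(f"<p>{text}</p>")

    if in_list:
        parts.append("</ul>")
    return "\n".join(parts)
```

Calling docx_to_html("report.docx") (the file name is only an example) returns a string you could post to the CMS. How much more of the markup you need to handle depends on how much of Word's formatting has to survive.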

You may get a better answer in an Office- or Word-specific forum, as VSTO does not provide any additional capabilities in this area.


Saurabh Nandu - MVP

Thanks Phil, I'll look at the links.

There were no Office- or Word-specific forums on this site, so I wanted to double-check whether VSTO could help. I'll try hunting down a Word newsgroup.

It's very sad that even in the year 2007, Word can't generate clean HTML!