nitm

hi everyone,

i'm parsing a word document (using .net c#) and i need to transform parts of it to html...
i know that word has the option to save documents as html, but is there a way
to save only parts of the document as html (range => html)?

my solution so far is to create a new document paste the range there and 
then save the entire document as html.
this solution is way too complex and goofy for my taste...

any other (simpler) ways to do this?

thanks!


Re: Visual Studio Tools for Office word range to html?

Cindy Meister

No, there's nothing in the object model that will let you extract the HTML directly from a Range. Word requires a converter that only triggers when reading/writing to files or the Clipboard.

If this is Word 2003 or later, then you can get the XML and build a transform to convert that to HTML.

Your other option is to copy it to the Clipboard, then get the HTML off the Clipboard. You should find some explanations and code samples on how to get the HTML from the Clipboard in this forum.






Re: Visual Studio Tools for Office word range to html?

nitm

hi Cindy,

i'm not sure if i get you right..

you're saying that i can copy part of the word document to the clipboard and call the converter to transform the data in the clipboard into html? it sounds great if that's the case.

thanks a lot!





Re: Visual Studio Tools for Office word range to html?

Cindy Meister

The converter will come into play automatically. Word places information on the clipboard in multiple formats. To see this, copy some formatted text in Word, then go to Edit/Paste Special and look at the formats listed there. These formats are generally available from the Clipboard, not just in Word.

As I said, do a search in the forum and you should find some discussions and code samples.






Re: Visual Studio Tools for Office word range to html?

nitm

thanks for the reply.

the clipboard sounds like the right option for me but i can't make it work...

i separate parts of the word document into different objects (i have a special class for that called Section).

each section has a few parameters and a Range which is the range from the original word document that contains the data for that section (the data can be anything from text, images, table, etc..).

i need to transform this data into html, so this is what i did:

Code Snippet

    m_text.Select();
    m_text.Copy();

    if (Clipboard.ContainsText(TextDataFormat.Html))
        Console.WriteLine(Clipboard.GetData(DataFormats.Html).ToString());
    else
        Console.WriteLine("no good: " + m_text.Text);

(m_text is the Range object)

all of the sections give the same result: "no good: " + the text of the section (the text is the right text).

what am i doing wrong? it looks like the content of the range does not go into the clipboard...

thanks a lot!





Re: Visual Studio Tools for Office word range to html?

Cindy Meister

First thing I'd do is have your app stop right after the Copy method, with the Word application visible and accessible. Then go to Edit/Paste Special and see what formats are available there. And paste into Word to see if the text you think you copied is actually on the clipboard.

Note that, theoretically, you should be able to copy the text without selecting it first. Sometimes, Word doesn't like to use Copy and Paste with Range objects. You could also give wdApp.Selection.Copy() a try (where wdApp represents the Word.Application object).

You might also check with your code what formats the Clipboard is offering at this point.
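
Checking the offered formats from code could look roughly like the following. This is only a minimal sketch: the ClipboardFormats class and its List method are made-up names, and it assumes the calling thread is a single-threaded apartment (for example a Main method marked [STAThread]). On other threads GetDataObject may simply return null, which the sketch reports as an empty list.

```csharp
using System;
using System.Windows.Forms;

static class ClipboardFormats
{
    // Returns the names of all formats currently offered on the Windows
    // Clipboard, or an empty array if the Clipboard could not be read
    // from the calling thread.
    public static string[] List()
    {
        IDataObject data = Clipboard.GetDataObject();
        return data == null ? new string[0] : data.GetFormats();
    }
}
```

Calling it right after the Copy method, e.g. foreach (string f in ClipboardFormats.List()) Console.WriteLine(f);, shows whether "HTML Format" (the value of DataFormats.Html) is among the entries.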






Re: Visual Studio Tools for Office word range to html?

nitm

ok, i added break points right after the 'select' and 'copy' operations to see what's going on.

after the copy operation i checked the paste special and it had a lot of formats in there including html, i used that and it pasted the right data in the right format so it works fine, but still Clipboard.ContainsText(TextDataFormat.Html) returns false.

any idea why?

thanks.





Re: Visual Studio Tools for Office word range to html?

Cindy Meister

At this point I have to admit that I'm lost. Possibly, the data is being put only on the Office Clipboard and not being passed to the Windows Clipboard, but I don't know how to work around that. You might read through all the message threads that discuss this topic. I think an MSFT person has posted about the problem once, although I don't remember the details.




Re: Visual Studio Tools for Office word range to html?

nitm

it took me a while to see your reply since i was away from work...

you are probably right about the data not being passed to the windows clipboard, when i try to access the dataobject with Clipboard.GetDataObject() i get null.

is it possible that all of this is happening because i use more than one thread?

i searched for similar threads but could not find anything.

thanks a lot, nitzan.





Re: Visual Studio Tools for Office word range to html?

Cindy Meister

You won't find much information on this because 1) the question is really off-topic for this forum and 2) the Office<->Clipboard interfaces aren't really documented.

I can tell you that, from everything I've read, you may indeed run into real problems when trying to work with multiple threads in connection with Office. Can you test something running in a single thread?






Re: Visual Studio Tools for Office word range to html?

nitm

no need for testing, that was the problem... as it turns out you can only access the clipboard from the gui thread..

i used Form.Invoke on a delegate that copies the text and returns the html from the clipboard as string and now everything works fine.
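
a rough sketch of that approach, not the exact code from my project: ClipboardBridge and CopyAndReadHtml are made-up names, and the copy step is passed in as a delegate so the sketch does not depend on the Word interop assembly. it assumes uiControl is a form or control that was created on the gui thread and already has its window handle.

```csharp
using System;
using System.Windows.Forms;

static class ClipboardBridge
{
    // Runs 'copy' and then reads the HTML clipboard format, both on the
    // thread that owns 'uiControl' (normally the GUI thread).
    // Returns null if the Clipboard offers no HTML.
    public static string CopyAndReadHtml(Control uiControl, Action copy)
    {
        Func<string> work = () =>
        {
            copy();
            return Clipboard.ContainsText(TextDataFormat.Html)
                ? Clipboard.GetText(TextDataFormat.Html)
                : null;
        };
        // Control.Invoke blocks until the delegate has finished on the
        // owning thread and hands back its return value as object.
        return (string)uiControl.Invoke(work);
    }
}
```

from a worker thread it would be called like string html = ClipboardBridge.CopyAndReadHtml(mainForm, () => m_text.Copy()); where mainForm stands for whatever form the application already has.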

well, not exactly fine, now i have a new problem:

my word document is written in hebrew and when i save the html (to a text file) i get something weird..

how do i get the html in the right encoding?

thanks a lot.
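
one thing that might be relevant here, as an assumption rather than something verified in this thread: the HTML clipboard format stores its data as UTF-8 bytes, with a short ASCII header whose StartFragment and EndFragment values are byte offsets. if those bytes are decoded with the ANSI code page somewhere along the way, hebrew text would come out garbled. the sketch below decodes a raw payload explicitly as UTF-8 and cuts out the copied fragment. HtmlClipboardFormat and ExtractFragment are made-up names, and how to obtain the raw bytes depends on the framework version, so that part is not shown.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class HtmlClipboardFormat
{
    // Extracts the copied fragment from a raw "HTML Format" payload.
    // The offsets in the header count bytes, not characters, so the
    // slice is taken from the byte array before decoding.
    public static string ExtractFragment(byte[] payload)
    {
        // The header is plain ASCII, so reading it via UTF-8 is safe.
        string text = Encoding.UTF8.GetString(payload);
        int start = ReadOffset(text, "StartFragment");
        int end = ReadOffset(text, "EndFragment");
        if (start < 0 || end < start || end > payload.Length)
            throw new FormatException("Missing or invalid fragment offsets.");
        return Encoding.UTF8.GetString(payload, start, end - start);
    }

    // Reads a numeric header field such as "StartFragment:0000000141".
    // Returns -1 when the field is absent.
    static int ReadOffset(string text, string key)
    {
        Match m = Regex.Match(text, key + @":(\d+)");
        return m.Success ? int.Parse(m.Groups[1].Value) : -1;
    }
}
```

when the resulting string is saved, passing the encoding explicitly, e.g. File.WriteAllText(path, html, Encoding.UTF8), keeps the file consistent with what was decoded.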





Re: Visual Studio Tools for Office word range to html?

Cindy Meister

At this point you're not only off-topic, but out-of-range as far as my knowledge goes. Encoding just isn't something I deal with at that level. But there are WORD specialists who deal with encoding questions - in the word newsgroups. Klaus Linke is the best I know of, followed by Graham Mayor. Do a Google groups search on encoding questions in the word newsgroups on Microsoft.com and you should turn up messages from them so that you can decide which group would be best to target with the question.

If all else fails, the word.international.features newsgroup would probably be most appropriate as far as the topic goes.

When you post, it will help if you 1. Specify the version of Word you're working with, 2. describe "something weird" in more detail and 3. be careful not to mention you're programming Word (that will scare people off in these end-user groups). Present the problem as it occurs when you File/Save As in the Word interface.






Re: Visual Studio Tools for Office word range to html?

nitm

ok, i will try that.

thanks a lot for all your help!