samoc

If I do this process using VBA in Word, it takes about 10 seconds.

If I run this code from .NET (C#), automating Word, it takes 30 minutes to process 30,000 paragraphs. Is the way I am doing this in .NET correct, or is there a faster way?

All I need to do is loop through the paragraphs and do some task on the text, depending on the formatting or properties set on the paragraph.

Thanks

using System;
using Microsoft.Office.Interop.Word;

public class WordDoc
{
    private ApplicationClass WordApp;
    private string FileName;   // used below but never declared in the first version

    public WordDoc(string sFileName)
    {
        WordApp = new ApplicationClass();
        WordApp.Visible = false;
        FileName = sFileName;
    }

    public void Process()
    {
        object read_only = false;
        object visible = true;
        object ofalse = false;
        object otrue = true;
        object dynamic = 2;

        object missing = System.Reflection.Missing.Value;
        object oFileName = FileName;

        // Documents.Open: file name plus 15 optional by-ref arguments
        Document Doc = WordApp.Documents.Open(ref oFileName, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing);

        // Every Doc.Paragraphs... expression here is a separate COM call into Word,
        // including the Count in the loop condition, which is re-read each iteration.
        for (int i = 1; i <= Doc.Paragraphs.Count; i++)
        {
            string sLine = Doc.Paragraphs[i].Range.Text;

            // .....
        }
    }
}



Re: Visual Studio Tools for Office Access Word Object Model thru C# to get at Paragraphs

Cindy Meister

This question is really off-topic in the VSTO forum. General Office interop is not supported here, only questions revolving around the VSTO technology. And unfortunately you don't mention which version of Word you're targeting.

Keep in mind that when you automate an Office application from outside the application (not from VBA), you're going through an interface. Execution is always slower when it's not native VBA. In addition, coming from .NET the code has to slog through an additional layer: .NET to COM and back again. It's going to be slow.

Given what you show us, why not read the entire document into a string and then work with the string?

string docText = Doc.Content.Text;

Working with the string will happen entirely in the .NET Framework, so it should be fairly fast. You should be able to identify the paragraph-mark character (probably \r).
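A minimal sketch of the string approach, assuming Word marks the end of every paragraph (including the last) with '\r'. The class and method names are hypothetical. Table cells may show up with an end-of-cell marker ("\r\a") and are not special-cased here.

```csharp
using System.Collections.Generic;

// Hypothetical helper: splits the string returned by Doc.Content.Text
// into one entry per paragraph, assuming '\r' is the paragraph mark.
public static class ParagraphText
{
    public static List<string> SplitParagraphs(string docText)
    {
        var paragraphs = new List<string>();
        if (string.IsNullOrEmpty(docText))
            return paragraphs;

        // Word normally ends the last paragraph with a mark too; drop that one
        // so the split does not produce a spurious empty final entry.
        // (Ordinal char check, to avoid culture-sensitive string comparison.)
        string body = docText[docText.Length - 1] == '\r'
            ? docText.Substring(0, docText.Length - 1)
            : docText;

        // Empty paragraphs (two consecutive marks) are kept as "" entries.
        paragraphs.AddRange(body.Split('\r'));
        return paragraphs;
    }
}
```

Called as `ParagraphText.SplitParagraphs(Doc.Content.Text)`, this makes one COM call for the whole document instead of one or more per paragraph. The trade-off is that the plain text carries no formatting, so it only helps when the task depends on the text alone.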

If this is Word 2003 or later, another approach would be to pick up and work with the document's XML instead of the plain text.






Re: Visual Studio Tools for Office Access Word Object Model thru C# to get at Paragraphs

samoc

Hi Cindy

Thanks for the info, and sorry I posted in the wrong group.

I previously tried the string approach, but the tags weren't consistent throughout the document.

I just looked at the XML again, and it looks like it has a lot of clutter.

Is there an easy way to find the tags? The XML is pretty ugly to read in a text editor, and IE opens it as a document.

In VBA it was easy to loop through the paragraphs and find a paragraph where:

If (para.Range.Font.Size = 9.5 And _
    para.Range.Shading.BackgroundPatternColor = wdColorBlack And _
    para.Range.Font.Bold = True) Then

    sLine = para.Range.Text
    'Do something with paragraph

End If

Thanks





Re: Visual Studio Tools for Office Access Word Object Model thru C# to get at Paragraphs

Cindy Meister

samoc wrote:

I previously tried the string approach, but the tags weren't consistent throughout the document.

I just looked at the XML again, and it looks like it has a lot of clutter.

Is there an easy way to find the tags? The XML is pretty ugly to read in a text editor, and IE opens it as a document.

Each paragraph will begin with a <w:p> tag (hope that comes through OK; the forum tends to turn some XML code combinations into "smileys"). So, using XPath, you should be able to "jump" to each paragraph, then extract only the <w:t> tags to put together the text for that paragraph.
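The XPath idea can be sketched with System.Xml as below. This assumes you already have the document's XML as a string and that it uses the WordprocessingML 2006 namespace (the word/document.xml part of a .docx). Word 2003's XML format uses the namespace "http://schemas.microsoft.com/office/word/2003/wordml" with the same w:p / w:r / w:t element names, so swapping the constant should be enough there. The class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper: one string per <w:p>, built from its <w:t> elements.
public static class WordXmlParagraphs
{
    // Assumed namespace (WordprocessingML 2006); change for Word 2003 XML.
    public const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<string> GetParagraphTexts(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath needs the "w" prefix mapped to the WordprocessingML namespace.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", WordNs);

        var result = new List<string>();

        // "Jump" to every paragraph, wherever it sits (body, table cells, ...).
        foreach (XmlNode para in doc.SelectNodes("//w:p", ns))
        {
            var text = new StringBuilder();

            // Collect only the <w:t> text nodes under this paragraph.
            // Note: text of paragraphs nested inside it (e.g. text boxes)
            // would also be picked up; this sketch does not separate them.
            foreach (XmlNode t in para.SelectNodes(".//w:t", ns))
                text.Append(t.InnerText);

            result.Add(text.ToString());
        }
        return result;
    }
}
```

Everything here runs in .NET against an in-memory document, so there are no COM round trips per paragraph.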

If you haven't done a lot with XML, you might want to ask in a forum or newsgroup that deals with XML to find out how best to write code that works only with specific tags like that. More information about the WordprocessingML vocabulary can be found at OpenXMLDeveloper.org.
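To apply samoc's VBA test (9.5 pt, bold, black background shading) to the XML instead, something along these lines could work. It is a sketch under several assumptions: the WordprocessingML 2006 namespace; font size stored in half-points, so 9.5 pt is w:sz w:val="19"; the background shading colour being the w:fill attribute of <w:shd>. It only sees *direct* formatting on paragraphs and runs, so formatting that comes from a style (which Word's object model would report) is missed. Only runs that are direct children of <w:p> are examined (runs inside hyperlinks, for example, are skipped). Because Range.Font.Size in VBA returns an undefined value when runs differ, the sketch requires every text run to match. All names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper mirroring the VBA Font.Size / Bold / Shading test.
public static class WordXmlFormatFilter
{
    public const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<string> FindMatchingParagraphs(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", WordNs);

        var result = new List<string>();
        foreach (XmlNode para in doc.SelectNodes("//w:p", ns))
        {
            // Paragraph-level shading: <w:pPr><w:shd w:fill="000000"/></w:pPr>
            bool paraShaded = IsBlackFill(para.SelectSingleNode("w:pPr/w:shd", ns));

            // Only runs that actually carry text are checked.
            XmlNodeList runs = para.SelectNodes("w:r[w:t]", ns);
            if (runs.Count == 0)
                continue;

            bool allMatch = true;
            var text = new StringBuilder();
            foreach (XmlNode run in runs)
            {
                if (!RunMatches(run, ns, paraShaded))
                {
                    allMatch = false;
                    break;
                }
                foreach (XmlNode t in run.SelectNodes("w:t", ns))
                    text.Append(t.InnerText);
            }
            if (allMatch)
                result.Add(text.ToString());
        }
        return result;
    }

    static bool RunMatches(XmlNode run, XmlNamespaceManager ns, bool paraShaded)
    {
        // Font size is in half-points, so 9.5 pt is w:val="19".
        XmlNode size = run.SelectSingleNode("w:rPr/w:sz/@w:val", ns);
        XmlNode bold = run.SelectSingleNode("w:rPr/w:b", ns);
        bool shaded = paraShaded || IsBlackFill(run.SelectSingleNode("w:rPr/w:shd", ns));
        return size != null && size.Value == "19" && IsOn(bold) && shaded;
    }

    // <w:b/> means bold; w:val="0", "false" or "off" switches it off.
    static bool IsOn(XmlNode toggle)
    {
        if (toggle == null) return false;
        XmlAttribute val = toggle.Attributes["val", WordNs];
        return val == null || (val.Value != "0" && val.Value != "false" && val.Value != "off");
    }

    // Background shading colour is assumed to be the w:fill attribute.
    static bool IsBlackFill(XmlNode shd)
    {
        XmlAttribute fill = shd?.Attributes["fill", WordNs];
        return fill != null && fill.Value == "000000";
    }
}
```

If the documents get their 9.5 pt bold black look from a paragraph style, the check would need to resolve the style definitions as well, which is where the XML route gets considerably more involved than the VBA version.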