Parsing a Word document

nitm · Jun 17, 2007

Hi everyone,

I need to parse a Word document with special tags.
The documents always have the same structure:
each document is divided into sections, and each section starts with a <section>
tag. After that tag there can be any number of optional tags (e.g. <title>i
am a title<title>, <author>paul auster<author>, <date>06/06/2007<date>, and
so on), and after these tags comes the section text.

I need to go over the entire document and break these sections apart (I have
a special object for a section).
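
For illustration, the structure described above can also be split with plain string handling once the document text is available as a string (for example from wordDoc.Content.Text, which is an assumption here). This is only a minimal sketch: the Section class and the SectionParser name are hypothetical, since the post does not show the actual section object, and it assumes each optional tag is written as <name>value<name> exactly as in the examples.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container; the post only mentions "a special object for a section".
class Section
{
    public Dictionary<string, string> Tags = new Dictionary<string, string>();
    public string Text = "";
}

static class SectionParser
{
    // Matches <name>value<name> at the start of the remaining text,
    // i.e. the same tag before and after the value, as in the post's examples.
    static readonly Regex TagPattern =
        new Regex(@"^\s*<(\w+)>(.*?)<\1>", RegexOptions.Singleline);

    public static List<Section> Parse(string documentText)
    {
        var sections = new List<Section>();
        string[] chunks = documentText.Split(
            new[] { "<section>" }, StringSplitOptions.None);

        // chunks[0] is whatever precedes the first <section> tag, so skip it.
        for (int i = 1; i < chunks.Length; i++)
        {
            var section = new Section();
            string rest = chunks[i];

            // Peel off the optional tags one by one from the front.
            Match m = TagPattern.Match(rest);
            while (m.Success)
            {
                section.Tags[m.Groups[1].Value] = m.Groups[2].Value;
                rest = rest.Substring(m.Length);
                m = TagPattern.Match(rest);
            }

            // Whatever is left after the tags is the section text.
            section.Text = rest.Trim();
            sections.Add(section);
        }
        return sections;
    }
}
```

Calling SectionParser.Parse on the whole document text returns one Section per <section> tag, with the optional tags collected in Tags and the remaining text in Text.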

I use C# and I can't find anything on the web that helps with this.
Here's what I have so far:

Microsoft.Office.Interop.Word.Range currentRange, baseRange;
currentRange = wordDoc.Content;
baseRange = currentRange.Duplicate;
bool ans = currentRange.Find.Execute(ref searchFor, ref falseObj, ref falseObj,
    ref trueObj, ref falseObj, ref falseObj, ref trueObj, ref wrap, ref falseObj,
    ref missing, ref falseObj, ref falseObj, ref falseObj, ref falseObj,
    ref falseObj);
if (!ans) {
    addUserMessage("Error: document does not seem to be in a valid format");
}
else while (ans) {
    baseRange.Start = currentRange.End + 1;
    ans = currentRange.Find.Execute(ref searchFor, ref falseObj, ref falseObj,
        ref trueObj, ref falseObj, ref falseObj, ref trueObj, ref wrap, ref falseObj,
        ref missing, ref falseObj, ref falseObj, ref falseObj, ref falseObj,
        ref falseObj);

    baseRange.End = currentRange.Start - 1;
    addUserMessage(baseRange.Text);
}

This works great, except that it always goes back to the start of the document
instead of returning false to ans, so the loop never stops.

What's wrong with my code? And if anyone thinks there's a better way to
do what I'm trying to do, I'll be happy to hear about it.

thanks a lot, nitzan

Cindy M. · Jun 18, 2007

Hi nitm,

> this works great except that it always goes to the start of the document
> instead of returning false to ans... thus the loop never stops.

What value have you defined for the wrap argument (ref wrap)?

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by e-mail

nitm · Jun 18, 2007

The wrap parameter is set to wdFindContinue. Is that wrong?

thanks!

Cindy M. · Jun 19, 2007

Hi nitm,

> the wrap parameter is set to wdFindContinue, is that wrong?

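That's the problem. With wdFindContinue the search wraps around to the
beginning of the document when it reaches the end, so Execute keeps
finding the tag again and never returns false.

(Glad it was a simple thing to track down - you had me worried!)

You want wdFindStop.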
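
A minimal sketch of the loop with that change, for illustration only. It assumes a project reference to the Microsoft.Office.Interop.Word assembly and an already opened wordDoc; the SectionSplitter class, the SplitSections method and the literal search text "<section>" are hypothetical, and wildcard matching is switched off here so that the angle brackets are treated as plain characters.

```csharp
using System;
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class SectionSplitter
{
    // Returns the text that follows each "<section>" tag in the document.
    public static List<string> SplitSections(Word.Document wordDoc)
    {
        object searchFor = "<section>";   // assumed literal tag text
        object falseObj = false;
        object trueObj = true;
        object missing = Type.Missing;
        object replaceNone = Word.WdReplace.wdReplaceNone;
        // wdFindStop: stop at the end of the range instead of wrapping around.
        object wrap = Word.WdFindWrap.wdFindStop;

        Word.Range currentRange = wordDoc.Content;
        int docEnd = currentRange.End;    // remember the end before Find moves the range

        // First pass: record where every tag starts and ends.
        var tags = new List<int[]>();
        while (currentRange.Find.Execute(ref searchFor, ref falseObj, ref falseObj,
            ref falseObj, ref falseObj, ref falseObj, ref trueObj, ref wrap,
            ref falseObj, ref missing, ref replaceNone, ref missing, ref missing,
            ref missing, ref missing))
        {
            // After a successful Execute the range covers the match itself.
            tags.Add(new[] { currentRange.Start, currentRange.End });
        }

        // Second pass: each section runs from the end of one tag
        // to the start of the next tag (or to the end of the document).
        var sections = new List<string>();
        for (int i = 0; i < tags.Count; i++)
        {
            object start = tags[i][1];
            object end = (i + 1 < tags.Count) ? tags[i + 1][0] : docEnd;
            Word.Range sectionRange = wordDoc.Range(ref start, ref end);
            sections.Add(sectionRange.Text ?? "");
        }
        return sections;
    }
}
```

An empty result here corresponds to the "document does not seem to be in a valid format" case in the original code.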

