How do I parse a document?

Fernando Ronci

Hi,

I've got a 1500+ line MS Word 2003 document that contains plenty of short
articles. Each article begins with a bulleted, bold-type title, styled with
the 7th bullet from Word's bullet gallery found at "Format | Bullets and
Numbering".

I want a macro that does the following:
- parse the whole document and extract all bulleted lines styled with the
7th bullet.
- populate a listbox with the titles (extracted above) so that I can choose
a topic and jump to the corresponding article.

I'm new to VBA programming and don't know how to tackle the problem.
I was barely able to get the line count with a piece of code I copied and
pasted from somewhere, like so:

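' The Sub header was missing from the pasted code; the name CountLines is assumed.
Sub CountLines()
    Dim NumLines As Long
    ' This Find setup is left over from the copied code. It is never
    ' executed and has no effect on the line count.
    With ActiveDocument.Range.Find
        .ClearFormatting
        .Format = False
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
    End With
    ' ComputeStatistics returns the number of lines in the document.
    NumLines = ActiveDocument.ComputeStatistics(wdStatisticLines)
    MsgBox "The document contains " & NumLines & " lines"
End Sub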

So, NumLines holds the document's line count. Now, how do I continue from
here to get the rest done? (Word's object model is huge and I'm stuck).

Guidelines and pointers appreciated.

Thank you.

Fernando Ronci
E-mail: (e-mail address removed)
 
Ed

Hi, Fernando. The first thing I would do is click in the title and then
click in the text underneath and see if they have different styles assigned
to them. Also, have these titles been manually formatted and the bullets
inserted as a special character? Or were they formatted using the Bullets
and Numbering scheme?

Ed
 
Fernando Ronci

Hello Ed,

I myself created the .doc. It's a collection of related articles that I
gathered here and there. Nothing fancy. Just chunks of plain text that I've
been saving on a single .doc so as to have all the articles in one single
place. Now the list has grown and I'm thinking of adding some logic (e.g. a
macro that displays a listbox with the titles) to fast-jump to any given
article. As I said in my original post, every article begins with a
bulleted (7th bullet) + bold-type line that acts as the title of the
article. So, it occurs to me that this fact (the bullet on the title) could
be used as the pattern that signals the beginning of an article. Also this
very first line should be used to populate the listbox.

The scenario is not difficult. The problem is that I don't know Word's
Object Model.
I'd appreciate pointers to existing macros that tackle this, or a similar
problem of parsing a document searching for a pattern and populating a
listbox.

Thanks.
Fernando
 
Helmut Weber

Hi Fernando,
> Each article begins with a bulleted, bold-type title, styled with
> the 7th bullet from Word's bullet gallery

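Such a paragraph is a list paragraph, i.e. a member of the
ListParagraphs collection.

MsgBox ActiveDocument.ListParagraphs.Count

should therefore return the number of articles in your doc, provided the
titles are the only bulleted or numbered paragraphs in it.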
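
As a starting point, the titles themselves can be listed rather than just counted. The following is only a minimal sketch, not a finished solution: the macro name ListBulletedTitles is made up for illustration, and it assumes that every title is a bulleted paragraph whose first character is bold. It walks ActiveDocument.Paragraphs rather than ListParagraphs so that the titles come out in document order. It does not check for the 7th bullet specifically; if other bulleted lists exist in the document, comparing ListFormat.ListString (the bullet character) against that of a known title might be one way to narrow the match.

```vba
Sub ListBulletedTitles()
    ' Sketch: print each bulleted, bold paragraph to the Immediate window
    ' (Ctrl+G in the VBA editor). The macro name is hypothetical.
    Dim para As Paragraph
    Dim titleText As String

    For Each para In ActiveDocument.Paragraphs
        ' wdListBullet matches bulleted (not numbered) paragraphs.
        If para.Range.ListFormat.ListType = wdListBullet Then
            ' Test only the first character: the paragraph mark may not
            ' be bold, which would make the whole range "undefined".
            If para.Range.Characters(1).Font.Bold = True Then
                titleText = para.Range.Text
                ' Drop the trailing paragraph mark.
                titleText = Left$(titleText, Len(titleText) - 1)
                Debug.Print titleText
            End If
        End If
    Next para
End Sub
```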

Then you could use a modeless userform with a listbox,
and add, let's say, the first three words of each list paragraph
to the listbox (AddItem).
The listbox would then need an event handler, such as Change or DblClick,
to search the doc and select the text of the chosen item.
All very simplified, of course.
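
The userform idea could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested solution: the form name frmTitles and the listbox name lstTitles are hypothetical and must match whatever you create in the VBA editor. The code goes in the userform's own module. Instead of searching for the title text again, this variant remembers where each title starts and jumps there on a double-click, and it adds the whole title rather than the first three words.

```vba
' Code for the UserForm module. Assumed (hypothetical) names:
' the form is frmTitles and it contains one ListBox named lstTitles.
Option Explicit

' Start position in the document of each title shown in the listbox.
Private titleStarts() As Long

Private Sub UserForm_Initialize()
    Dim para As Paragraph
    Dim titleText As String
    Dim n As Long

    For Each para In ActiveDocument.Paragraphs
        ' Assumption: every bulleted paragraph is an article title.
        If para.Range.ListFormat.ListType = wdListBullet Then
            titleText = para.Range.Text
            titleText = Left$(titleText, Len(titleText) - 1) ' drop paragraph mark
            lstTitles.AddItem titleText

            ' Remember where this title begins.
            ReDim Preserve titleStarts(n)
            titleStarts(n) = para.Range.Start
            n = n + 1
        End If
    Next para
End Sub

Private Sub lstTitles_DblClick(ByVal Cancel As MSForms.ReturnBoolean)
    ' Nothing selected: nothing to do.
    If lstTitles.ListIndex < 0 Then Exit Sub

    ' Put the cursor at the start of the chosen title.
    ActiveDocument.Range(titleStarts(lstTitles.ListIndex), _
                         titleStarts(lstTitles.ListIndex)).Select
End Sub
```

To display the form without blocking the document, a one-line macro in a standard module such as frmTitles.Show vbModeless should do. Note that the stored positions go stale if the document is edited while the form is open, so the form would need to be closed and reopened after edits.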

I'll stop here, because:
> I'm new to VBA programming and don't know how to tackle the problem.
For a beginner it would be quite a job.

To get you started, see:
http://word.mvps.org/faqs/Userforms/CreateAUserForm.htm

Ever thought about using "View, Outline"?

--
Greetings from Bavaria, Germany

Helmut Weber, MVP WordVBA

Win XP, Office 2003
"red.sys" & Chr$(64) & "t-online.de"
 
Russ

Fernando,

I don't want to discourage you from using VBA, but what you desire is more
or less built into Word. If you use Helmut's suggestion about Outline view
you might be satisfied with the results. This piece of text is taken from
Word Help:
"In outline view, you can collapse a document to show only the headings and
body text you want. This makes it easier to view the document's
organization, move through the document, and rearrange large blocks of text.
Keep in mind that you can collapse only text (your "titles") that is
formatted with the built-in heading styles (Heading 1 through Heading 9) or
outline levels (Level 1 through Level 9)."

Also, if you give your "titles" those styles, Word can automatically
insert a 'Table of Contents' list with clickable page numbers that take you
to the starting page of each article. Please read more about these subjects
in Word Help.
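
If there are too many titles to restyle by hand, a short macro could assign the outline level for you. This is a hedged sketch only: the macro name PromoteTitlesToOutlineLevel1 is invented, it assumes every bulleted paragraph is a title, and it changes the document, so try it on a copy first. It sets the outline level rather than applying a Heading style, so the bullets and bold formatting should stay as they are.

```vba
Sub PromoteTitlesToOutlineLevel1()
    ' Sketch: give every bulleted paragraph outline level 1 so that
    ' Outline view can collapse the document down to the titles.
    ' The macro name is hypothetical. Run it on a copy of the document.
    Dim para As Paragraph

    For Each para In ActiveDocument.Paragraphs
        ' Assumption: bulleted paragraphs are the article titles.
        If para.Range.ListFormat.ListType = wdListBullet Then
            para.OutlineLevel = wdOutlineLevel1
        End If
    Next para
End Sub
```

After running it, Outline view set to show level 1 should display only the titles.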