Splitting doc by header style


werD

Hello,

I've got a Word doc that is basically in this format, with no page breaks:

Page Title (Heading 2)
Page Data (Table)
Page Title (Heading 2)
Page Data (Table)
Page Title (Heading 2)
Page Data (Table)
etc...

I'm using .NET to loop through the doc by Paragraph and Table, and I'm able to
get to the plain text of the objects via a paragraph loop, but I can't seem to
narrow down the data formatted as Heading 2 and just get the next table as
HTML or similar so that I can create some pages based on them.


I'd like to do something similar to this, although it's obviously not right
(I have 40 items marked with Heading 2, but the headers loop only runs 3 times):

For Each h As Word.HeaderFooter In s.Headers
    '?PageTitle=h.range.Text
    '?BodyText = '?Table and Text
Next

I've seen this type of functionality in tools like RoboHelp (split a document
into web pages based on defined formatting) before, but I'm not sure what the
proper logic would be to get to these pieces of data.

I appreciate any insight you have and would be glad to clarify further.

Thanks in Advance
DrewG
 

Shauna Kelly

Hi

I think you'll need to clarify what you're doing and what you're aiming for
before we can help you.

Headings are the short paragraphs that introduce a new part of content and
are generally styled with Heading 1, Heading 2, ... Heading 9.

Headers are the text at the top (well, generally the top) of a page that is
the same on each page and might include a field to generate a page number.
They are closely related to footers, which provide the same text at the
bottom of each page. Headers and footers are properties of a Section, of
which your document has at least one. And each Section has exactly 3 headers
and 3 footers (first page, odd and even) whether or not you've chosen to
display all 3.

So, are we talking about headings or headers?

And, what version of Word are you using?

Hope this helps.

Shauna Kelly. Microsoft MVP.
http://www.shaunakelly.com/word
 

werD

Ah, I see. Thank you for clarifying the headers distinction. I am indeed
trying to split by headings and not headers. So I take it I should be looking
for a paragraph styled with the appropriate heading style (in my case, Heading 2).
How would I then grab the following table for an HTML conversion? My goal is to
split this document up for viewing on different web pages, but I'm trying to
get just the basic table and text formatting so I can load up an XML document
with it for storage. The app will be running on a machine with Office
2007 (Office 12 libraries), but the document will be in Office 2003 format.
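A minimal VB.NET sketch of grabbing "the following table", assuming (as in the layout above) that each table starts right after its Heading 2 paragraph and that doc is an open Word.Document from the Office 12 interop assembly. GetHeadingTables is a hypothetical helper name. It compares against the localized name of the built-in Heading 2 style instead of the literal string "Heading 2", and asks Word whether the position just after the heading's paragraph mark is inside a table.

```vbnet
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Public Module HeadingTables
    ' Hypothetical helper: pairs each Heading 2 paragraph's text with the table
    ' that immediately follows it (Nothing if the next paragraph is not in a table).
    Public Function GetHeadingTables(ByVal doc As Word.Document) As List(Of KeyValuePair(Of String, Word.Table))
        Dim result As New List(Of KeyValuePair(Of String, Word.Table))()

        ' Localized name of the built-in style, so this also works in non-English Word.
        Dim h2Name As String = doc.Styles.Item(Word.WdBuiltinStyle.wdStyleHeading2).NameLocal

        For Each p As Word.Paragraph In doc.Paragraphs
            If CType(p.Style, Word.Style).NameLocal <> h2Name Then Continue For

            ' Collapsed range at the first character after the heading's paragraph mark.
            Dim afterHeading As Word.Range = doc.Range()
            afterHeading.SetRange(p.Range.End, p.Range.End)

            Dim tbl As Word.Table = Nothing
            If CBool(afterHeading.Information(Word.WdInformation.wdWithInTable)) Then
                tbl = afterHeading.Tables.Item(1)
            End If

            ' Range.Text ends with the paragraph mark (Chr(13)); trim it off the title.
            Dim title As String = p.Range.Text.TrimEnd(ChrW(13))
            result.Add(New KeyValuePair(Of String, Word.Table)(title, tbl))
        Next
        Return result
    End Function
End Module
```

Each returned table could then be passed to whichever HTML conversion you settle on; a Nothing table means the heading was followed by ordinary text rather than a table.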
 

werD

If need be, I can use Office 2003 or XP as well.


 

werD

So I've written this loop to go through and find all paragraphs that are
Heading 2 and then pull out all text from the following paragraphs until a new
Heading 2 is found. I can't seem to figure out how to pull out the following
text as a table instead of just paragraph text.

So... the outline of a page area looks similar to this:

Heading 2 Text
_____________________
|text1|       |text4|
|text2| TEXT  |text5|
|text3|       |text6|
|_____|_______|_____|

But my output is this:

Heading 2 Text
text1 text2 text3 TEXT text4 text5 text6

Here's the loop I've written to get this far:

Dim H2ParaFound As Boolean = False
Dim ExtraTitleChunks As Integer = 0
Dim NrmlParaChunks As Integer = 0
Dim tempPgHdrText As String = String.Empty
Dim tempParaText As String = String.Empty
Dim txtfound As Boolean = False

For Each p As Word.Paragraph In doc.Paragraphs
    Dim stype As String = CType(p.Style, Word.Style).NameLocal
    If stype = "Heading 2" Then
        'if this is the second part of a title
        If H2ParaFound = True And ExtraTitleChunks > 0 Then
            tempPgHdrText &= "," & p.Range.Text
            H2ParaFound = True
            ExtraTitleChunks += 1
            NrmlParaChunks = 0
        Else
            'First part of a title found
            If txtfound = True Then
                Me.txtResults.Text &= tempParaText & vbCrLf
                tempParaText = String.Empty
                txtfound = False
            End If
            tempPgHdrText = "Page Title: " & p.Range.Text
            H2ParaFound = True
            ExtraTitleChunks += 1
            NrmlParaChunks = 0
        End If
    Else
        'if this is not Heading 2
        If H2ParaFound = True Then
            Me.txtResults.Text &= tempPgHdrText & vbCrLf
            tempPgHdrText = String.Empty
            tempParaText = p.Range.Text
            txtfound = True
            H2ParaFound = False
            ExtraTitleChunks = 0
            NrmlParaChunks += 1
        Else
            tempParaText &= p.Range.Text
            txtfound = True
            tempPgHdrText = String.Empty
            H2ParaFound = False
            ExtraTitleChunks = 0
            NrmlParaChunks += 1
        End If
    End If
Next
If tempParaText.Length > 0 Then
    Me.txtResults.Text &= tempParaText & vbCrLf
    tempParaText = String.Empty
End If
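The flattened output is what this loop would be expected to produce: doc.Paragraphs also enumerates the paragraphs inside every table cell, and p.Range.Text returns their text without any row or column structure. One way around it, sketched below in VB.NET under the assumption that there are no nested tables, is to notice when a paragraph is inside a table, hand the whole table to a separate step once, and skip the rest of that table's paragraphs. Walk, handleText and handleTable are hypothetical names; the two callbacks stand in for whatever builds your output (your existing heading/text logic would go in handleText).

```vbnet
Imports System
Imports Word = Microsoft.Office.Interop.Word

Public Module ParagraphWalker
    ' Hypothetical helper: walks the document once, passing ordinary paragraphs to
    ' handleText and each table to handleTable exactly once (no nested tables assumed).
    Public Sub Walk(ByVal doc As Word.Document, _
                    ByVal handleText As Action(Of Word.Paragraph), _
                    ByVal handleTable As Action(Of Word.Table))
        ' End position of the last table already handed to handleTable.
        Dim lastTableEnd As Integer = -1

        For Each p As Word.Paragraph In doc.Paragraphs
            Dim inTable As Boolean = CBool(p.Range.Information(Word.WdInformation.wdWithInTable))

            If Not inTable Then
                handleText.Invoke(p)
            ElseIf p.Range.Start >= lastTableEnd Then
                ' First paragraph of a table we have not seen yet.
                Dim tbl As Word.Table = p.Range.Tables.Item(1)
                handleTable.Invoke(tbl)
                lastTableEnd = tbl.Range.End
            End If
            ' Remaining paragraphs of an already-handled table fall through and are skipped.
        Next
    End Sub
End Module
```

Keeping the table as a Word.Table object, rather than accumulating its text, is what leaves room to convert it to HTML later.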

Any thoughts or insight?

DrewG

I'm starting to see why people complain about the Word Doc Obj Model
 

Shauna Kelly

Hi

Let's go back to the beginning here.

You want to chop a document up into many smaller documents. If you were
doing this manually, how would you do it? Bear in mind you can't select some
text and tell Word to save the selection as a separate document (and if you
think you remember ever doing that, it was with maybe WordPerfect in the
mid- to late-1980s).

You can't count on finding a range of interest and copying it into a new
document, unless (a) you base the new document on the same template as the
main document, (b) the styles in the main document haven't changed since it
was born and (c) you manually fix any section break settings (eg margins).

So, the only real way to achieve what you want is to find a bit of text you
want, delete everything above it, delete everything below it, and use Save
As to save the result as the new document.

You'll have to do the same in code.

A Range has a .Start and .End that are just Longs. And, you can create a
Range and explicitly set its .Start and .End.
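As a small illustration of that arithmetic (a sketch, not part of the Word object model): if you collect the .Start of every Heading 2 paragraph in document order, each page runs from one heading's .Start to the next heading's .Start, and the last page runs to the end of the document. Because .End points just past the last character of a range, a page that stops at the next heading's .Start contains none of that heading. GetPageSpans is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module PageSpans
    ' Hypothetical helper: turns the .Start positions of the Heading 2 paragraphs
    ' (in document order) into one (Start, End) pair per page. Each page ends
    ' where the next heading starts; the last page ends at docEnd.
    Public Function GetPageSpans(ByVal headingStarts As IList(Of Integer), ByVal docEnd As Integer) As List(Of Tuple(Of Integer, Integer))
        Dim spans As New List(Of Tuple(Of Integer, Integer))()
        For i As Integer = 0 To headingStarts.Count - 1
            Dim pageEnd As Integer = If(i + 1 < headingStarts.Count, headingStarts(i + 1), docEnd)
            spans.Add(Tuple.Create(headingStarts(i), pageEnd))
        Next
        Return spans
    End Function
End Module
```

For example, headings starting at 0, 120 and 300 in a document ending at 450 give the pages (0, 120), (120, 300) and (300, 450). Each pair could then be applied to a Word range with rng.SetRange(span.Item1, span.Item2).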

So as a shell:

Dim doc As Word.Document

Dim rngDeleteAbove As Word.Range
Dim rngDeleteBelow As Word.Range

Dim rngStartOfFirstHeading As Word.Range
Dim rngStartOfNextHeading As Word.Range

'set doc to be the Document of interest

'Use .Find to set the rngStartOfFirstHeading

'Use .Find to set the rngStartOfNextHeading

'Everything before the first heading. .End is exclusive, so ending
'exactly at the heading's .Start leaves the heading itself intact.
Set rngDeleteAbove = doc.Range
rngDeleteAbove.End = rngStartOfFirstHeading.Start

'Everything from the next heading to the end of the document
Set rngDeleteBelow = doc.Range
rngDeleteBelow.Start = rngStartOfNextHeading.Start

'Ranges adjust themselves as text is deleted, so this order is safe
rngDeleteAbove.Delete
rngDeleteBelow.Delete

doc.SaveAs "C:\MyPath\MyFileName.doc"
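The two .Find placeholders could be filled in along these lines. This is a VB.NET sketch (the object-model calls are the same ones VBA uses) that searches for the built-in Heading 2 style with empty search text, so only the formatting is matched; FindNextHeading2 is a hypothetical name.

```vbnet
Imports Word = Microsoft.Office.Interop.Word

Public Module HeadingFinder
    ' Hypothetical helper: returns the next run of Heading 2 text at or after
    ' startPos, or Nothing when there are no more headings.
    Public Function FindNextHeading2(ByVal doc As Word.Document, ByVal startPos As Integer) As Word.Range
        Dim rng As Word.Range = doc.Range()
        rng.SetRange(startPos, doc.Content.End)

        With rng.Find
            .ClearFormatting()
            .Text = ""                 ' empty text: match on formatting only
            .Style = doc.Styles.Item(Word.WdBuiltinStyle.wdStyleHeading2)
            .Format = True
            .Forward = True
            .Wrap = Word.WdFindWrap.wdFindStop
            ' On success, Execute redefines rng to cover the found text.
            If .Execute() Then Return rng
        End With
        Return Nothing
    End Function
End Module
```

Calling FindNextHeading2(doc, 0) would give the first heading, and calling it again with that range's .End would give the next one. Consecutive Heading 2 paragraphs (a title split over two lines) may come back as a single match.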



Hope this helps.

Shauna Kelly. Microsoft MVP.
http://www.shaunakelly.com/word
 

werD

Thanks. I figured out a good looping logic. I'm looking for a lightweight,
all-in-one solution to get this converted, though. I will be saving the data to
an XML file as it's loaded by the user, not to individual Word docs. With .NET I
have no issues with this; I can store the range as XML, a Word doc, hashtables,
etc. The real issue I'm having is getting the HTML equivalent of the
tables within the range. I would just do a "for each row", but in most cases
the middle column is one large merged cell, so that won't work.
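Looping over tbl.Rows generally raises an error once a table has vertically merged cells, but tbl.Range.Cells can still be enumerated, and each Word.Cell reports RowIndex, ColumnIndex and Range.Text (that text ends with Chr(13) & Chr(7), which should be trimmed). Below is a hypothetical VB.NET helper that turns such (row, column, text) triples into HTML. It assumes the table has no horizontal merges and that Word lists a vertically merged cell once, at its top row, so a missing cell below another in the same column is read as a rowspan. Both assumptions are worth checking against the real documents.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Net
Imports System.Text

Public Module TableHtml
    ' Hypothetical container for one cell read from Word
    ' (c.RowIndex, c.ColumnIndex and the trimmed c.Range.Text).
    Public Structure CellInfo
        Public Row As Integer
        Public Col As Integer
        Public Text As String

        Public Sub New(ByVal rowIndex As Integer, ByVal colIndex As Integer, ByVal cellText As String)
            Row = rowIndex
            Col = colIndex
            Text = cellText
        End Sub
    End Structure

    ' Builds an HTML table. A column with no cell in the rows below a cell is
    ' read as a vertical merge and emitted as rowspan (no horizontal merges assumed).
    Public Function ToHtml(ByVal cells As IList(Of CellInfo)) As String
        If cells.Count = 0 Then Return "<table></table>"

        ' Record which (row, column) positions actually have a cell.
        Dim present As New HashSet(Of Tuple(Of Integer, Integer))()
        For Each c As CellInfo In cells
            present.Add(Tuple.Create(c.Row, c.Col))
        Next
        Dim maxRow As Integer = cells.Max(Function(x) x.Row)

        Dim sb As New StringBuilder("<table>")
        For r As Integer = 1 To maxRow
            Dim rowNo As Integer = r ' local copy so the lambda does not capture the loop variable
            sb.Append("<tr>")
            For Each c As CellInfo In cells.Where(Function(x) x.Row = rowNo).OrderBy(Function(x) x.Col)
                ' Count the rows directly below that have no cell in this column.
                Dim span As Integer = 1
                While c.Row + span <= maxRow AndAlso Not present.Contains(Tuple.Create(c.Row + span, c.Col))
                    span += 1
                End While
                sb.Append(If(span > 1, "<td rowspan=""" & span & """>", "<td>"))
                sb.Append(WebUtility.HtmlEncode(c.Text)).Append("</td>")
            Next
            sb.Append("</tr>")
        Next
        Return sb.Append("</table>").ToString()
    End Function
End Module
```

The list would be filled from For Each c As Word.Cell In tbl.Range.Cells. For the diagram earlier in the thread, read as three rows with the middle column merged, the middle cell comes out as <td rowspan="3">TEXT</td>.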

Is there any way, besides looping through each column/cell, that I can get the
entire table as an HTML chunk or a quickly convertible equivalent of that?
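One way to avoid walking the cells at all is to let Word write the HTML: copy the range's FormattedText into a scratch document, save that as filtered HTML (wdFormatFilteredHTML), and cut the table element out of the file. Word's HTML export expresses merged cells with rowspan/colspan. A VB.NET sketch, assuming an already-running Word.Application, one table per range, and Word's default web encoding; GetTableHtml is a hypothetical name and the temp-file handling is only illustrative. The markup will still carry Word-specific class and style attributes and may not be well-formed XML, so it is probably safest stored as text (e.g. in a CDATA section).

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Word = Microsoft.Office.Interop.Word

Public Module RangeToHtml
    ' Hypothetical helper: copies a range (e.g. one heading-to-heading page) into a
    ' scratch document, saves it as filtered HTML and returns the <table> markup.
    Public Function GetTableHtml(ByVal app As Word.Application, ByVal source As Word.Range) As String
        Dim htmlPath As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() & ".htm")
        Dim scratch As Word.Document = app.Documents.Add()
        Try
            ' FormattedText carries tables, including merged cells, across documents.
            scratch.Content.FormattedText = source.FormattedText
            ' CObj avoids copying the ByRef Object parameter back into a String (Option Strict).
            scratch.SaveAs(FileName:=CObj(htmlPath), FileFormat:=Word.WdSaveFormat.wdFormatFilteredHTML)
        Finally
            ' Cast to _Document to avoid the ambiguity with the Close event.
            DirectCast(scratch, Word._Document).Close(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
        End Try

        ' Assumes the default ANSI web encoding; check the charset meta tag if text looks wrong.
        Dim html As String = File.ReadAllText(htmlPath, Encoding.Default)
        File.Delete(htmlPath)

        ' Spans from the first <table to the last </table> (one table per range assumed).
        Dim first As Integer = html.IndexOf("<table", StringComparison.OrdinalIgnoreCase)
        Dim last As Integer = html.LastIndexOf("</table>", StringComparison.OrdinalIgnoreCase)
        If first < 0 OrElse last < first Then Return String.Empty
        Return html.Substring(first, last + "</table>".Length - first)
    End Function
End Module
```

This trades a file round trip per page for not having to reproduce Word's merge logic yourself; with 40 pages that is 40 small saves, which may or may not be acceptable for your app.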

Thanks for your posts,
DrewG
 
