VB text/formatting extraction from Word XP

theWatcher

Hello -

I am writing a VB6 app that opens a Word document (Word XP) and populates the
pre-existing Doc Variables with data from a database. The user is then able to
edit the text and/or formatting (italics, superscript, bold, etc.) of these
fields. Once the changes have been made, the VB app needs to extract the
modified data from the specific fields (text and/or formatting), analyze it, and
store it in the database so that the exact document may be reproduced at a later
date. The questions I have are:

1. How do I extract the modified data from the doc variable fields? The
Variables("varname").Value property only contains the data I originally sent to
it and seems not to be dynamically linked to the field on the document. The
closest I have come is to use the Fields(x).Select and Selection.Text
combination, but I'd rather have access to the named variables, since
Fields(1,2,3...) may not refer to the same information in every document.

2. How do I identify formatting changes in a specific field? If the user
simply changes normal text to italicized, I need to capture that change as well.

3. How do I extract formatting codes from Word so that the formatted text can
be preserved in a database for future reproduction? How do I read that
formatted data into Word later to reproduce the exact document? (Is this
HTML-type formatting?)

4. How do I interrupt or Cancel the Close event in Word? I do not want for the
user to be able to close the document before the app has a chance to process the
data, so I have hidden all of the Close/Save functions in the File menu, but the
Close button (X) at the top, right of the screen still exists. I want to cancel
the close event (similar to Canceling the FormUnload event for VB forms), but
have been unable to find one for Word.

Thank you for any help you can provide.
 
Jonathan West

Hi Watcher,

Quite a few questions here, answers below each one.

theWatcher said:
Hello -

I am writing a VB6 app that opens a Word document (Word XP) and populates the
pre-existing Doc Variables with data from a database. The user is then able to
edit the text and/or formatting (italics, superscript, bold, etc.) of these
fields. Once the changes have been made, the VB app needs to extract the
modified data from the specific fields (text and/or formatting), analyze it, and
store it in the database so that the exact document may be reproduced at a later
date. The questions I have are:

1. How do I extract the modified data from the doc variable fields? The
Variables("varname").Value property only contains the data I originally sent to
it and seems not to be dynamically linked to the field on the document. The
closest I have come is to use the Fields(x).Select and Selection.Text
combination, but I'd rather have access to the named variables, since
Fields(1,2,3...) may not refer to the same information in every document.

Two possibilities spring to mind.

1. Mark each relevant field with a bookmark, so you can identify the ones
you want to read from the bookmark, or

2. Check the Code property of each field to check that it is reflecting the
correct docvariable.
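
As a rough, untested sketch of the second possibility: the function below walks the fields in the main body of the document, picks out DOCVARIABLE fields, and compares the variable name in each field code with the one you ask for. It assumes early binding to the Word object library. FindDocVariableField is a hypothetical helper name, not something Word provides.

```vb
' Hypothetical helper: returns the first DOCVARIABLE field in the main
' document body whose field code names varName, or Nothing if none matches.
Public Function FindDocVariableField(ByVal doc As Word.Document, _
                                     ByVal varName As String) As Word.Field
    Dim fld As Word.Field
    Dim codeText As String
    Dim pos As Long

    For Each fld In doc.Fields
        If fld.Type = wdFieldDocVariable Then
            ' A field code typically looks like:
            '   DOCVARIABLE  varname  \* MERGEFORMAT
            codeText = Trim$(fld.Code.Text)
            ' Drop the leading DOCVARIABLE keyword.
            codeText = Trim$(Mid$(codeText, Len("DOCVARIABLE") + 1))
            ' Drop any switches, which start at the first backslash.
            pos = InStr(codeText, "\")
            If pos > 0 Then codeText = Left$(codeText, pos - 1)
            ' Drop quotes around names that contain spaces.
            codeText = Trim$(Replace(codeText, """", ""))

            If StrComp(codeText, varName, vbTextCompare) = 0 Then
                Set FindDocVariableField = fld
                Exit Function
            End If
        End If
    Next fld
End Function
```

If a field is found, its Result property is a Range, so fld.Result.Text gives the text currently shown in the document and fld.Result.Font gives its formatting. With the bookmark approach, doc.Bookmarks("SomeName").Range gives you the same kind of Range, where SomeName stands for whatever bookmark name you chose.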

Which of those is more appropriate to you depends on your exact
application.

2. How do I identify formatting changes in a specific field? If the user
simply changes normal text to italicized, I need to capture that change as
well.

Once you have identified the field, you can check the various properties of
its Font object, and see whether they still match the properties of the
underlying style.

3. How do I extract formatting codes from Word so that the formatted text can
be preserved in a database for future reproduction? How do I read that
formatted data into Word later to reproduce the exact document? (Is this
HTML-type formatting?)

There is no direct way of doing this that I know of. The only suggestion I
can make is to copy the relevant text into the clipboard, using the
Selection.Copy or Range.Copy method, and then get it into the database from
there.

4. How do I interrupt or Cancel the Close event in Word? I do not want for the
user to be able to close the document before the app has a chance to process the
data, so I have hidden all of the Close/Save functions in the File menu, but the
Close button (X) at the top, right of the screen still exists. I want to cancel
the close event (similar to Canceling the FormUnload event for VB forms), but
have been unable to find one for Word.

Take a look at this article

Intercepting events like Save and Print
http://www.mvps.org/word/FAQs/MacrosVBA/InterceptSavePrint.htm
 
theWatcher

Jonathan -

Thank you for your reply. I was unaware of the "multiple posts" rule/etiquette - I will avoid posting to more than one group in the future.

I think I'll follow your advice regarding the addition of bookmarks to ID the fields I'm analyzing. It's too bad that Word DocVariables are not dynamically linked to their fields. This seems a logical thing to do. Your suggestion appears to be a decent workaround.

Regarding the interruption of the Close event, however, I do not see how it is possible to interrupt or cancel the event. I read your article on different Word events - I was hoping that Word understood some BeforeClose event, but it seems that only the Document_Close event exists. While I can execute code at this point and even delay the close, I cannot keep Word from closing. This is a problem, since I need to read information from the document (once focus has been given back to the VB app), and I also want to give the user the option to continue editing the document.

You suggested that I "check the various properties of its Font object, and see whether they still match the properties of the underlying style", to find out if formatting changes have been made. A single field may contain a number of formatting styles (some text is bold, some normal, some italicized, some superscript, etc.). Will your proposed method still be able to identify the changes, or will I need to parse the text a character at a time to see which characters (if any) have been modified?

Lastly, regarding the preservation of text formatting, have you any experience with importing/exporting text to/from Word through the use of HTML tags?

Thanks again.
 
Jonathan West

theWatcher said:
Jonathan -

Thank you for your reply. I was unaware of the "multiple posts"
rule/etiquette - I will avoid posting to more than one group in the future.

The reason for that can be seen in this article

Tips from MVPs on posting to the Word newsgroups
http://www.mvps.org/word/FindHelp/Posting.htm

The problem caused by multiposting is that someone can spend a long time
researching the answer, unaware that it has already been answered elsewhere.
As this is a peer-to-peer group, nobody gets paid to answer questions, so
avoiding duplicated effort gives everybody's questions a better chance of
being answered.

I think I'll follow your advice regarding the addition of bookmarks to ID
the fields I'm analyzing. It's too bad that Word DocVariables are not
dynamically linked to their fields. This seems a logical thing to do. Your
suggestion appears to be a decent workaround.

Regarding the interruption of the Close event, however, I do not see how
it is possible to interrupt or cancel the event. I read your article on
different Word events - I was hoping that Word understood some BeforeClose
event, but it seems that only the Document_Close event exists. While I can
execute code at this point and even delay the close, I cannot keep Word from
closing. This is a problem, since I need to read information from the
document (once focus has been given back to the VB app), and I also want to
give the user the option to continue editing the document.


You can use the DocumentBeforeClose event of the Application object. That
event can be cancelled. The "Intercepting events" section of the article I
mentioned earlier includes a code example for a similar event, the
DocumentBeforePrint event, which shows how to cancel the process. The same
principle applies to DocumentBeforeClose.
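
A minimal sketch of how that might look from the VB6 side, assuming the project has a reference to the Word object library and the code sits in a form or class module (WithEvents is not allowed in a standard module). The names mWord, mDataProcessed and Attach are made up for illustration.

```vb
' Goes in a VB6 form or class module, not a standard module.
' Requires a project reference to the Microsoft Word object library.
Private WithEvents mWord As Word.Application

' Hypothetical flag: your own code sets this to True once the data
' has been read back from the document.
Private mDataProcessed As Boolean

' Call this with the Word.Application instance your app is automating.
Public Sub Attach(ByVal wordApp As Word.Application)
    Set mWord = wordApp
End Sub

' Fires before any document in that Word instance closes.
Private Sub mWord_DocumentBeforeClose(ByVal Doc As Word.Document, _
                                      Cancel As Boolean)
    If Not mDataProcessed Then
        ' Setting Cancel to True keeps the document open.
        Cancel = True
    End If
End Sub
```

The event fires for every document in that instance of Word, so in practice you would probably also check Doc against the document your app opened before cancelling.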

You suggested that I "check the various properties of its Font object, and
see whether they still match the properties of the
underlying style", to find out if formatting changes have been made. A
single field may contain a number of formatting styles (some text is bold,
some normal, some italicized, some superscript, etc.). Will your proposed
method be able to identify the changes even still, or will I need to parse
the text a character at a time to compare which (if any) have been modified?

For any particular characteristic, if all the text shares the same value
then that value is returned. If different characters have different values,
then you get back the value 9999999, and you then know you have to look
character by character.

For instance, if all the text is bold, then .Font.Bold is -1. If all the
text is not bold, then the value is 0, and if some characters are bold, then
the value returned is 9999999.
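
As an illustration only, here is a small sketch of that check for the Bold property. The Word object library defines the constant wdUndefined for the value 9999999. ReportBold is a hypothetical routine name, and rng is whatever Range you have identified (for example, the Result of a field).

```vb
' Hypothetical routine: reports the bold state of a range, dropping to a
' character-by-character pass only when the formatting is mixed.
Public Sub ReportBold(ByVal rng As Word.Range)
    Dim ch As Word.Range

    Select Case rng.Font.Bold
        Case -1
            Debug.Print "All of the text is bold"
        Case 0
            Debug.Print "None of the text is bold"
        Case wdUndefined    ' 9999999: mixed formatting
            For Each ch In rng.Characters
                ' Each item in Characters is a one-character Range.
                Debug.Print ch.Text, (ch.Font.Bold = -1)
            Next ch
    End Select
End Sub
```

The same pattern should work for Italic, Superscript and the other on/off Font properties.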

Lastly, regarding the preservation of text formatting, have you any
experience with importing/exporting text to/from Word through the use of
HTML tags?

There are two options I would explore in your position.

1. Copy the relevant text to the clipboard, and then use Windows API code to
extract the HTML version of the text and save it where you want it. For
this, you might be interested in a code sample produced a couple of years
ago by Karl Peterson, that appears to do what you want. Go to
http://www.mvps.org/vb/index2.html?samples.htm, and scroll down to the
ClipEx.zip section.

2. Bookmark the relevant text, save the file as HTML, and then use simple
text analysis to work out which formatted text falls within the bookmark you
defined, and extract that.

I've not had to deal with your specific problem, so haven't looked at
either of those solutions in great detail, but hopefully they should be able
to point you in a useful direction.
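
For the first option, a rough and untested sketch of reading the HTML version of the clipboard from VB6 is shown below. This is not Karl Peterson's ClipEx code, just the plain Windows API calls involved, with 32-bit declarations as used in VB6. GetClipboardHtml is a hypothetical function name. It assumes you have already put the text on the clipboard with Range.Copy or Selection.Copy.

```vb
Private Declare Function RegisterClipboardFormat Lib "user32" _
    Alias "RegisterClipboardFormatA" (ByVal lpszFormat As String) As Long
Private Declare Function OpenClipboard Lib "user32" _
    (ByVal hwnd As Long) As Long
Private Declare Function CloseClipboard Lib "user32" () As Long
Private Declare Function GetClipboardData Lib "user32" _
    (ByVal wFormat As Long) As Long
Private Declare Function GlobalLock Lib "kernel32" _
    (ByVal hMem As Long) As Long
Private Declare Function GlobalUnlock Lib "kernel32" _
    (ByVal hMem As Long) As Long
Private Declare Function GlobalSize Lib "kernel32" _
    (ByVal hMem As Long) As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, ByVal Source As Long, ByVal Length As Long)

' Hypothetical helper: returns the "HTML Format" clipboard data as a
' string, or an empty string if that format is not on the clipboard.
Public Function GetClipboardHtml() As String
    Dim fmt As Long, hMem As Long, pMem As Long, cb As Long
    Dim buf() As Byte
    Dim s As String, nulPos As Long

    fmt = RegisterClipboardFormat("HTML Format")
    If OpenClipboard(0&) = 0 Then Exit Function

    hMem = GetClipboardData(fmt)
    If hMem <> 0 Then
        cb = GlobalSize(hMem)
        pMem = GlobalLock(hMem)
        If pMem <> 0 And cb > 0 Then
            ReDim buf(0 To cb - 1)
            CopyMemory buf(0), pMem, cb
            GlobalUnlock hMem
            ' The data is UTF-8; this ANSI conversion is only safe for
            ' plain ASCII text and will garble other characters.
            s = StrConv(buf, vbUnicode)
            ' The block may be larger than the data; cut at the first null.
            nulPos = InStr(s, vbNullChar)
            If nulPos > 0 Then s = Left$(s, nulPos - 1)
            GetClipboardHtml = s
        End If
    End If

    CloseClipboard
End Function
```

The string that comes back starts with a short header (Version, StartHTML, EndHTML, StartFragment, EndFragment) giving byte offsets into the data, followed by the HTML itself, so you would still need to cut out the fragment you want before storing it.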
 
