How to get the plain text (without "||||||||||") from a Word document that includes tables?

Hurry Wood

I am writing an add-in for Word 2000. The task is to retrieve the plain text
of a Word document, process it, and then put it back into the document.
The problem is that the number of characters returned by
"Document.Characters.Range.Count" does not equal the number of characters
in the text returned by "Document.Characters.Range.Text". This always
happens when the document includes tables or cells. The task requires
retrieving plain text from the document, but from my research I find that
tables and cells are translated into special characters
(e.g. "||||||||||||||||"). Those special characters are not allowed in my
task. How can I get the plain text (without the table characters "||||||")
from the document?

Hurry
 
Cindy M -WordMVP-

Hi Hurry,

If you ever encounter a document with field codes, I think you'll find the
problem even more pronounced.

In order to get only text, and ignore table stuff, I think you'd need to loop
the Paragraphs collection of the document's MainStory, picking up the
.Range.Text for each. And if you need to take fields into consideration as
well, look at the information on TextRetrievalMode so that you can set it to
ignore field codes (and deal with hidden text) for the range you're
processing.
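
The suggestion above could be sketched roughly as follows. This is a minimal sketch only: the procedure name and variable names are made up for illustration, and it has not been checked against Word 2000 specifically. It only shows the paragraph loop and the TextRetrievalMode settings; whether any table-related characters still come through has to be tested on a real document.

```vba
' Hypothetical example: collect text paragraph by paragraph from the
' main text story, leaving out field codes and hidden text.
Sub CollectParagraphText()
    Dim doc As Word.Document
    Dim para As Word.Paragraph
    Dim rng As Word.Range
    Dim sText As String

    Set doc = ActiveDocument

    For Each para In doc.StoryRanges(wdMainTextStory).Paragraphs
        Set rng = para.Range
        ' These settings affect only what rng.Text returns,
        ' not what is displayed in the document.
        rng.TextRetrievalMode.IncludeFieldCodes = False
        rng.TextRetrievalMode.IncludeHiddenText = False
        sText = sText & rng.Text
    Next para

    Debug.Print sText
End Sub
```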

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by e-mail :)
 
Hurry Wood

Hi Cindy,

Thank you for your reply.
I have considered the field codes and hidden text, and have set both to
FALSE to prevent them from appearing in the text. Is there a specification
of which Unicode codes will be added to the text (e.g. the table mark)?
 
Cindy M -WordMVP-

Hi Hurry,
If you loop through the paragraphs you won't pick up any table characters.
There is no way to "filter out" tables when picking up a document's
range.text as a whole.

 
Hurry Wood

Hi Cindy,

Thank you for your help.

I have tried walking through the paragraphs and retrieving the text by
calling "paragraphs(index).Range.Text", but the table-like marks were still
there. The length of the text retrieved from "paragraphs(index).Range.Text"
still did not equal "paragraphs(index).Range.Characters.Count". What else
should I do?

Hurry
 
Cindy M -WordMVP-

Hi Hurry,
Can you get the ANSI character codes for these "table marks"? I've never,
ever seen anything to do with any tables when picking up range.text from
paragraphs in a table...

Please also post the code you're using to get the document object and pick
up the text. And which programming language environment are you working in?
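
One way to find out which character codes are involved is to print the code of every control character in each paragraph's text. The following is a sketch under assumptions: the procedure name is hypothetical, and "control character" is taken here to mean a code below 32.

```vba
' Hypothetical example: list the position and code of every control
' character found in each paragraph's Range.Text.
Sub DumpControlCharCodes()
    Dim para As Word.Paragraph
    Dim sText As String
    Dim i As Long
    Dim code As Long

    For Each para In ActiveDocument.Range.Paragraphs
        sText = para.Range.Text
        For i = 1 To Len(sText)
            code = AscW(Mid$(sText, i, 1))
            ' Only report codes in the control range 0..31.
            If code >= 0 And code < 32 Then
                Debug.Print "Position " & i & ": code " & code
            End If
        Next i
    Next para
End Sub
```

The output appears in the Immediate window of the VBA editor.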

 
Hurry Wood

Hi Cindy,

No, I don't know whether they are special table marks in Word, but I am sure
they are not the ANSI tab mark. I guess these additional marks might be
Word-specific table marks, because those codes are always there whenever
the document includes table contents. The problem is that the number of
characters retrieved from Range.Text doesn't equal the result of
Range.Characters.Count. You could verify this by comparing
"Range.Characters.Count" with the number of characters in "Range.Text";
the results always differ if your active document includes table contents.


Hurry
 
Cindy M -WordMVP-

Hi Hurry,
Read my previous response carefully. I was NOT asking about tab characters. I
was asking which ANSI characters these odd things are that you're picking up.

Yes, of course a table will generate/contain characters in the range. It must.
How else would Word know a table is there? But if you're still trying to pick
up characters across an entire range that includes a table, rather than
paragraph-by-paragraph, that's what you're going to get. And if you can't
research the question I asked about what character codes these odd characters
are, then you really aren't going to get out of this loop you're in...

 
Hurry Wood

Hi Cindy,

I have tried the paragraph-by-paragraph solution, but I still got some
additional codes, which included "|||||". Could you post a code snippet
here? (VB or VC++)


Hurry
 
Cindy M -WordMVP-

Hi Hurry,
Try this. It would be helpful if you'd *TEST* what one asks you to test. If
you had checked the character number a few days ago, and reported ANSI 7, I
could have told you right away what was going on. But if you can't provide
information, it makes it difficult for anyone to help you. Every table cell
contains an end-of-cell marker, and depending on how your code picks up the
text, it will pick this up, as well.

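Sub LoopParasForText()
    Dim doc As Word.Document
    Dim para As Word.Paragraph
    Dim sText As String

    Set doc = ActiveDocument

    ' Walk the document paragraph by paragraph and strip the
    ' end-of-cell marker, Chr$(7), from each paragraph's text.
    For Each para In doc.Range.Paragraphs
        sText = sText & Replace(para.Range.Text, Chr$(7), "")
    Next para

    ' Show the collected text in the Immediate window.
    Debug.Print sText
End Sub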
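
The count mismatch described earlier in the thread can be checked against a single cell. The sketch below is an assumption-laden illustration, not part of the original answer: the procedure name is hypothetical, and it relies on my understanding that the end-of-cell marker shows up in Range.Text as Chr$(13) followed by Chr$(7) while the Characters collection counts it once. It prints the numbers so that this can be verified rather than taken on trust.

```vba
' Hypothetical example: compare the string length and the character
' count for the first cell of the first table in the document.
Sub CompareCellCounts()
    Dim rng As Word.Range

    If ActiveDocument.Tables.Count = 0 Then
        Debug.Print "No table in the active document."
        Exit Sub
    End If

    Set rng = ActiveDocument.Tables(1).Cell(1, 1).Range

    Debug.Print "Len(Range.Text):        " & Len(rng.Text)
    Debug.Print "Range.Characters.Count: " & rng.Characters.Count
    ' Length after removing the Chr$(7) part of the end-of-cell marker.
    Debug.Print "Len without Chr$(7):    " & _
                Len(Replace(rng.Text, Chr$(7), ""))
End Sub
```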

 
Hurry Wood

Hi Cindy,

Thank you for your help.
Yes, the code you posted is what I wanted. Could you provide a list of the
ANSI codes (like ANSI 7) that Word might insert internally?

Hurry
 
Cindy M -WordMVP-

Hi Hurry,
Not a comprehensive one, probably, since these things don't
necessarily crop up that often.

Chr$(13) you're probably already familiar with (paragraph mark)

Chr$(11) = new line (Shift+Enter)

Chr$(12) = new page (Ctrl+Enter)

Chr$(19) = field code, opening bracket

Chr$(20) = field separator (between the field code and its result)

Chr$(21) = field code, closing bracket
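
A helper that handles the codes listed above, plus Chr$(7) from earlier in the thread, might look like the sketch below. The function name is hypothetical, and the choice to turn line and page breaks into paragraph marks rather than drop them is an assumption about what "plain text" should mean for the task. Chr$(21), which to my knowledge also belongs to the field markers, is included as an assumption.

```vba
' Hypothetical helper: remove or normalize Word's control characters
' in a string picked up from Range.Text.
Function CleanWordText(ByVal sText As String) As String
    ' End-of-cell marker: removed.
    sText = Replace(sText, Chr$(7), "")
    ' Manual line break and page break: converted to paragraph marks
    ' (an assumption; they could equally be removed).
    sText = Replace(sText, Chr$(11), Chr$(13))
    sText = Replace(sText, Chr$(12), Chr$(13))
    ' Field markers: removed. This leaves any field code text itself
    ' in place, so it is not a substitute for TextRetrievalMode.
    sText = Replace(sText, Chr$(19), "")
    sText = Replace(sText, Chr$(20), "")
    sText = Replace(sText, Chr$(21), "")
    CleanWordText = sText
End Function
```

Chr$(13) is deliberately left alone so that paragraph boundaries survive.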

 
