Pulling out words in capitals from word doc

G

Gem_man

Plesae could someone help me out with some code to extract words that appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks
 
J

Jay Freedman

Plesae could someone help me out with some code to extract words that appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean by
"extract". Do you want to copy them to another document?
 
G

Gem_man

Thanks Jay,

Yes, copy them them to a new word. The ultimate intention is to arrange
these words in alphabetical order.

regards

Jay Freedman said:
Plesae could someone help me out with some code to extract words that appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean by
"extract". Do you want to copy them to another document?
 
H

Helmut Weber

Hi,

you might have to take care of words
containing lowercase letters which were formatted
as .Font.AllCaps = True in addition.

Greetings from Bavaria, Germany
Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word 2002, Windows 2000
 
D

Doug Robbins

The following should do what you want:

Dim Source As Document, Target As Document, Capword As Range
Set Source = ActiveDocument
Set Target = Documents.Add
Source.Activate
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="[A-Z]{2,}[ .]{1}", MatchWildcards:=True,
Wrap:=wdFindStop, Forward:=True) = True
Set Capword = Selection.Range
If Right(Capword, 1) = "." Then
Capword.End = Capword.End - 1
End If
Target.Range.InsertAfter Capword.Text & vbCr
Loop
End With
Target.Activate
Target.Content.Sort SortOrder:=wdSortOrderAscending


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
Gem_man said:
Thanks Jay,

Yes, copy them them to a new word. The ultimate intention is to arrange
these words in alphabetical order.

regards

Jay Freedman said:
Plesae could someone help me out with some code to extract words that
appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean by
"extract". Do you want to copy them to another document?
 
G

Gem_man

Thanks for that Doug. It worked a treat.

I am trying to do other things with this. I have managed to operate the code
from a userform and have managed to give the option of the CAPITAL words
being arranged in vertical or horizontal format.

Would you be so kind as to give me tip on how to be able to count the unique
occurences of each word? Also i want to be able to programatically delete all
duplicates of a word.

For some reason my Answer Wizard section of the VBA help is blank so finding
a few hints is proving extremely difficult

Many thanks

Doug Robbins said:
The following should do what you want:

Dim Source As Document, Target As Document, Capword As Range
Set Source = ActiveDocument
Set Target = Documents.Add
Source.Activate
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="[A-Z]{2,}[ .]{1}", MatchWildcards:=True,
Wrap:=wdFindStop, Forward:=True) = True
Set Capword = Selection.Range
If Right(Capword, 1) = "." Then
Capword.End = Capword.End - 1
End If
Target.Range.InsertAfter Capword.Text & vbCr
Loop
End With
Target.Activate
Target.Content.Sort SortOrder:=wdSortOrderAscending


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
Gem_man said:
Thanks Jay,

Yes, copy them them to a new word. The ultimate intention is to arrange
these words in alphabetical order.

regards

Jay Freedman said:
On Sun, 31 Jul 2005 16:27:02 -0700, "Gem_man"

Plesae could someone help me out with some code to extract words that
appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean by
"extract". Do you want to copy them to another document?
 
J

Jay Freedman

Extending Doug's macro, replace the last two lines with this:

Dim i As Long
With Target
.Content.Sort SortOrder:=wdSortOrderAscending
For i = .Paragraphs.Count To 2 Step -1
If .Paragraphs(i).Range.Text = _
.Paragraphs(i - 1).Range.Text Then
.Paragraphs(i).Range.Delete
End If
Next i
Do While Len(.Paragraphs(1).Range.Text) = 1
.Paragraphs(1).Range.Delete
Loop
.Activate
MsgBox "Found " & .Paragraphs.Count & " unique words"
End With

The earlier code puts each word in a separate paragraph (because of
the expression "& vbCr" in the InsertAfter line). The For loop
compares each paragraph, starting with the last one, to the paragraph
that precedes it. If they match, the later one of the pair is deleted.
The loop continues until the second word is compared to the first one.

The Do While loop removes any empty paragraphs, which sort to the
beginning of the document; there will always be at least one.

Finally, the MsgBox displays the paragraph count, which is now also
the count of unique words.

--
Regards,
Jay Freedman
Microsoft Word MVP

Thanks for that Doug. It worked a treat.

I am trying to do other things with this. I have managed to operate the code
from a userform and have managed to give the option of the CAPITAL words
being arranged in vertical or horizontal format.

Would you be so kind as to give me tip on how to be able to count the unique
occurences of each word? Also i want to be able to programatically delete all
duplicates of a word.

For some reason my Answer Wizard section of the VBA help is blank so finding
a few hints is proving extremely difficult

Many thanks

Doug Robbins said:
The following should do what you want:

Dim Source As Document, Target As Document, Capword As Range
Set Source = ActiveDocument
Set Target = Documents.Add
Source.Activate
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="[A-Z]{2,}[ .]{1}", MatchWildcards:=True,
Wrap:=wdFindStop, Forward:=True) = True
Set Capword = Selection.Range
If Right(Capword, 1) = "." Then
Capword.End = Capword.End - 1
End If
Target.Range.InsertAfter Capword.Text & vbCr
Loop
End With
Target.Activate
Target.Content.Sort SortOrder:=wdSortOrderAscending


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
Gem_man said:
Thanks Jay,

Yes, copy them them to a new word. The ultimate intention is to arrange
these words in alphabetical order.

regards

:

On Sun, 31 Jul 2005 16:27:02 -0700, "Gem_man"

Plesae could someone help me out with some code to extract words that
appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean by
"extract". Do you want to copy them to another document?
 
G

Gem_man

Thanks for that Jay. This is great. From your code I think I have figured how
to count the number of times the same word appears too :)

One further question. Have either you or Doug got any hints on how I can
pull not only the words that are in captal letters off the original doc but
also the preceeding 1 or maybe 2 words (irrespective of the lettering case).

I think with this info I will have enough to work out the rest of the stuff
myself... he said hopefully.

Thanks again for your assistance, I am very grateful


Jay Freedman said:
Extending Doug's macro, replace the last two lines with this:

Dim i As Long
With Target
.Content.Sort SortOrder:=wdSortOrderAscending
For i = .Paragraphs.Count To 2 Step -1
If .Paragraphs(i).Range.Text = _
.Paragraphs(i - 1).Range.Text Then
.Paragraphs(i).Range.Delete
End If
Next i
Do While Len(.Paragraphs(1).Range.Text) = 1
.Paragraphs(1).Range.Delete
Loop
.Activate
MsgBox "Found " & .Paragraphs.Count & " unique words"
End With

The earlier code puts each word in a separate paragraph (because of
the expression "& vbCr" in the InsertAfter line). The For loop
compares each paragraph, starting with the last one, to the paragraph
that precedes it. If they match, the later one of the pair is deleted.
The loop continues until the second word is compared to the first one.

The Do While loop removes any empty paragraphs, which sort to the
beginning of the document; there will always be at least one.

Finally, the MsgBox displays the paragraph count, which is now also
the count of unique words.

--
Regards,
Jay Freedman
Microsoft Word MVP

Thanks for that Doug. It worked a treat.

I am trying to do other things with this. I have managed to operate the code
from a userform and have managed to give the option of the CAPITAL words
being arranged in vertical or horizontal format.

Would you be so kind as to give me tip on how to be able to count the unique
occurences of each word? Also i want to be able to programatically delete all
duplicates of a word.

For some reason my Answer Wizard section of the VBA help is blank so finding
a few hints is proving extremely difficult

Many thanks

Doug Robbins said:
The following should do what you want:

Dim Source As Document, Target As Document, Capword As Range
Set Source = ActiveDocument
Set Target = Documents.Add
Source.Activate
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="[A-Z]{2,}[ .]{1}", MatchWildcards:=True,
Wrap:=wdFindStop, Forward:=True) = True
Set Capword = Selection.Range
If Right(Capword, 1) = "." Then
Capword.End = Capword.End - 1
End If
Target.Range.InsertAfter Capword.Text & vbCr
Loop
End With
Target.Activate
Target.Content.Sort SortOrder:=wdSortOrderAscending


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
Thanks Jay,

Yes, copy them them to a new word. The ultimate intention is to arrange
these words in alphabetical order.

regards

:

On Sun, 31 Jul 2005 16:27:02 -0700, "Gem_man"

Plesae could someone help me out with some code to extract words that
appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean by
"extract". Do you want to copy them to another document?
 
D

Doug Robbins

Replacing the

[A-Z]{2,}[ .]{1}

with

[A-z]{1,} [A-z]{1,} [A-Z]{2,}[ .]{1}

will result in the capitalised word and the two preceding words to be found.
Note that getting the sort on the capitalised words will be another story.

--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
Gem_man said:
Thanks for that Jay. This is great. From your code I think I have figured
how
to count the number of times the same word appears too :)

One further question. Have either you or Doug got any hints on how I can
pull not only the words that are in captal letters off the original doc
but
also the preceeding 1 or maybe 2 words (irrespective of the lettering
case).

I think with this info I will have enough to work out the rest of the
stuff
myself... he said hopefully.

Thanks again for your assistance, I am very grateful


Jay Freedman said:
Extending Doug's macro, replace the last two lines with this:

Dim i As Long
With Target
.Content.Sort SortOrder:=wdSortOrderAscending
For i = .Paragraphs.Count To 2 Step -1
If .Paragraphs(i).Range.Text = _
.Paragraphs(i - 1).Range.Text Then
.Paragraphs(i).Range.Delete
End If
Next i
Do While Len(.Paragraphs(1).Range.Text) = 1
.Paragraphs(1).Range.Delete
Loop
.Activate
MsgBox "Found " & .Paragraphs.Count & " unique words"
End With

The earlier code puts each word in a separate paragraph (because of
the expression "& vbCr" in the InsertAfter line). The For loop
compares each paragraph, starting with the last one, to the paragraph
that precedes it. If they match, the later one of the pair is deleted.
The loop continues until the second word is compared to the first one.

The Do While loop removes any empty paragraphs, which sort to the
beginning of the document; there will always be at least one.

Finally, the MsgBox displays the paragraph count, which is now also
the count of unique words.

--
Regards,
Jay Freedman
Microsoft Word MVP

Thanks for that Doug. It worked a treat.

I am trying to do other things with this. I have managed to operate the
code
from a userform and have managed to give the option of the CAPITAL words
being arranged in vertical or horizontal format.

Would you be so kind as to give me tip on how to be able to count the
unique
occurences of each word? Also i want to be able to programatically
delete all
duplicates of a word.

For some reason my Answer Wizard section of the VBA help is blank so
finding
a few hints is proving extremely difficult

Many thanks

:

The following should do what you want:

Dim Source As Document, Target As Document, Capword As Range
Set Source = ActiveDocument
Set Target = Documents.Add
Source.Activate
Selection.HomeKey wdStory
Selection.Find.ClearFormatting
With Selection.Find
Do While .Execute(FindText:="[A-Z]{2,}[ .]{1}",
MatchWildcards:=True,
Wrap:=wdFindStop, Forward:=True) = True
Set Capword = Selection.Range
If Right(Capword, 1) = "." Then
Capword.End = Capword.End - 1
End If
Target.Range.InsertAfter Capword.Text & vbCr
Loop
End With
Target.Activate
Target.Content.Sort SortOrder:=wdSortOrderAscending


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP
Thanks Jay,

Yes, copy them them to a new word. The ultimate intention is to
arrange
these words in alphabetical order.

regards

:

On Sun, 31 Jul 2005 16:27:02 -0700, "Gem_man"

Plesae could someone help me out with some code to extract words
that
appear
in CAPITAL LETTERS in a word document?

I cannot find any reference to this in VBA help files.

Many thanks

The general approach is to use a wildcard search (see
http://www.gmayor.com/replace_using_wildcards.htm) and search for
the
expression

<[A-Z]@>

That will find each word that consists of only upper case letters.

What you do with the word once you find it depends on what you mean
by
"extract". Do you want to copy them to another document?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top