determine document language

Co

Hi All,

I need some code that can check which language a document was written in.
If you had a noise list of frequently used words for each language (for English: the, at, for, with, him, etc.), you could try to match these against the paragraphs in the text. Once a certain number of matches is reached, the code could tell whether the doc was in English, French, and so on.
Does anyone have an idea or some sample code?

Regards
Marco
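
The noise-list idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name GuessLanguage, the minHits threshold and the three tiny word lists are all made up here, and a usable list would need far more words per language.

```vba
' Returns the language whose noise words occur most often in sample,
' or "unknown" when no language reaches minHits matches.
' The word lists below are tiny, hypothetical examples.
Function GuessLanguage(ByVal sample As String, _
                       Optional ByVal minHits As Long = 3) As String
    Dim lists As Object, hits As Object
    Dim lang As Variant, token As Variant, sep As Variant
    Dim bestHits As Long

    Set lists = CreateObject("Scripting.Dictionary")
    Set hits = CreateObject("Scripting.Dictionary")

    ' Each list is padded with spaces so whole words can be looked up
    lists("English") = " the at for with him and of is "
    lists("French") = " le la les des est avec pour dans "
    lists("German") = " der die das und ist mit nicht ein "

    ' Lower-case the text and turn common punctuation into spaces
    sample = LCase$(sample)
    For Each sep In Array(".", ",", ";", ":", "!", "?", "(", ")", _
                          vbCr, vbLf, vbTab)
        sample = Replace(sample, sep, " ")
    Next sep

    ' Count, per language, how many tokens are on its noise list
    For Each token In Split(sample, " ")
        If Len(token) > 0 Then
            For Each lang In lists.Keys
                If InStr(lists(lang), " " & token & " ") > 0 Then
                    hits(lang) = hits(lang) + 1
                End If
            Next lang
        End If
    Next token

    ' Pick the language with the most matches, if any reaches minHits
    GuessLanguage = "unknown"
    bestHits = minHits - 1
    For Each lang In hits.Keys
        If hits(lang) > bestHits Then
            bestHits = hits(lang)
            GuessLanguage = lang
        End If
    Next lang
End Function
```

It could be called from the Immediate window with something like Debug.Print GuessLanguage(ActiveDocument.Content.Text). Counting whole tokens rather than substrings keeps "at" from matching inside "cat".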
 
macropod

Hi Marco,

Check out the DetectLanguage Method in Word's VBA help file. There's even a working example there of how the method can be used.
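
The help-file example itself is not reproduced in this thread. A minimal sketch along the same lines might look like the following; it assumes a document is open, and the macro name is made up.

```vba
Sub ReportDetectedLanguage()
    With ActiveDocument
        ' Word skips text it has already evaluated, so clear the flag
        ' to force a fresh detection pass
        .LanguageDetected = False
        .DetectLanguage

        ' Content.LanguageID is wdUndefined when the text carries
        ' more than one language
        Select Case .Content.LanguageID
            Case wdEnglishUS
                MsgBox "Detected as U.S. English."
            Case wdUndefined
                MsgBox "More than one language is applied."
            Case Else
                MsgBox "Language ID: " & .Content.LanguageID
        End Select
    End With
End Sub
```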
 
Klaus Linke

... and the method you want to implement in your macro (a list of "stopwords") is pretty much what Word uses out of the box if you allow it to detect the language automatically. Only, it does it locally rather than at the document level.
Somewhere in a Resource Kit, I've seen the lists of stop words that Word employs for the different languages.

Regards,
Klaus


Cindy M.

Hi Macropod,

macropod said:
Check out the DetectLanguage Method in Word's VBA help file. There's even a working example there of how the method can be used.

The one problem with this (vs. using the LanguageID property of a Range or Style definition) is that it requires
that the language be installed in the Windows Control Panel/Regional Settings AND recognized for Office. This works
fine if you can be sure there will be a limited number of languages, but it will cause problems if Word can't find
the language on the system.

Plus, don't forget that this actually works on a Range, and the document may contain multiple languages if what the
user types isn't something in the dictionary, or is misspelled, or if the user pastes something (from the Internet,
for example).

Personally, I think this method should *not* be used to determine in which language a document was written. But
there are developers who use it in this manner.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or reply in the newsgroup and not by e-mail
:)
 
Co

Cindy,

appreciate your comments.
How would you solve such a problem?

Marco
 
Klaus Linke

Not Cindy, but start with ActiveDocument.Content.LanguageID?
If that's wdUndefined (mixed languages), look further to see what language
is applied to most of the text.

In my experience, the LanguageID tends to be mostly applied properly.
If you're sure you have docs in which it isn't, you could use the method you
proposed originally... Maybe the stopword list I mentioned would come in
handy.

Regards,
Klaus
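
The two steps described above could be sketched like this. It is only an illustration: the function name is hypothetical, walking the Words collection is just one (slow) way to "look further", and it needs the Microsoft Scripting Runtime to be available for the late-bound Dictionary.

```vba
' Returns the LanguageID applied to most of the text in doc,
' or wdUndefined if no single language can be determined.
Function DominantLanguageID(doc As Document) As Long
    Dim counts As Object, w As Range, key As Variant
    Dim id As Long, bestCount As Long

    ' Step 1: uniform formatting, the whole range reports one ID
    If doc.Content.LanguageID <> wdUndefined Then
        DominantLanguageID = doc.Content.LanguageID
        Exit Function
    End If

    ' Step 2: mixed formatting, tally characters per language
    Set counts = CreateObject("Scripting.Dictionary")
    For Each w In doc.Content.Words
        id = w.LanguageID
        If id <> wdUndefined Then
            counts(id) = counts(id) + Len(w.Text)
        End If
    Next w

    ' Pick the language with the largest character count
    DominantLanguageID = wdUndefined
    For Each key In counts.Keys
        If counts(key) > bestCount Then
            bestCount = counts(key)
            DominantLanguageID = key
        End If
    Next key
End Function
```

For example, Debug.Print DominantLanguageID(ActiveDocument) prints the numeric WdLanguageID value. Note that this reports the language that was applied to the text, which is not necessarily the language it was written in.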
 
Co

Klaus,

Is there a way to retrieve this Word stopword list for say English,
French, German, Dutch and Italian?

Marco
 
Cindy M.

Hi Klaus,

Klaus Linke said:
... start with ActiveDocument.Content.LanguageID?
If that's wdUndefined (mixed languages), look further to see what language
is applied to most of the text.

Agreed.

If we're talking Word 2003 or 2007, I might then pick up the XML property and parse
through that, rather than "walk" the object model.

Cindy Meister
 
Klaus Linke

True... Though if you have a list of languages you're interested in, getting
the number of characters formatted with a certain language should be pretty
quick using Find with

With Selection.Find
    .ClearFormatting
    .Text = ""
    .Format = True
    .Wrap = wdFindStop
    .LanguageID = wdEnglishUS
End With

and

' Add up the length of every stretch of text formatted as US English
While Selection.Find.Execute
    nLang = nLang + Selection.End - Selection.Start
    Selection.Collapse wdCollapseEnd
Wend

Regards,
Klaus
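
Wrapped into a function, the same Find idea can be run over a list of candidate languages. The sketch below uses a Range instead of the Selection so the cursor stays put. The function and macro names are hypothetical, and the language list is simply the one Marco asked about.

```vba
' Counts the characters in doc that are formatted with langID.
Function CharsInLanguage(doc As Document, ByVal langID As Long) As Long
    Dim rng As Range, total As Long

    Set rng = doc.Content
    With rng.Find
        .ClearFormatting
        .Text = ""
        .Format = True
        .Forward = True
        .Wrap = wdFindStop
        .LanguageID = langID
    End With

    ' Each hit redefines rng to one stretch of matching text
    Do While rng.Find.Execute
        total = total + (rng.End - rng.Start)
        ' Stop at the end of the document to avoid looping forever
        If rng.End >= doc.Content.End Then Exit Do
        rng.Collapse wdCollapseEnd
    Loop

    CharsInLanguage = total
End Function

Sub ReportLanguageShares()
    Dim ids As Variant, names As Variant, i As Long

    ids = Array(wdEnglishUS, wdFrench, wdGerman, wdDutch, wdItalian)
    names = Array("English (US)", "French", "German", "Dutch", "Italian")

    ' Print one line per language to the Immediate window
    For i = LBound(ids) To UBound(ids)
        Debug.Print names(i), CharsInLanguage(ActiveDocument, ids(i))
    Next i
End Sub
```

Regional variants such as wdEnglishUK are separate IDs and would have to be added to the list if they matter.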
 
Co

Cindy,

What exactly do you mean by this?

"I might then pick up the XML property and parse through that, rather than 'walk' the object model"

Could you give me an example?

Marco
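
One possible reading of that suggestion, as a rough sketch only. It assumes "the XML property" means Range.XML, which returns the range as Word 2003 WordprocessingML, and that directly applied language formatting appears there as a w:lang element inside a run's w:rPr. The macro name and the "(inherited)" bucket are made up for illustration; runs without their own w:lang take their language from a style or the document defaults, which this sketch does not resolve.

```vba
Sub TallyLanguagesFromXml()
    Dim xmlDoc As Object, counts As Object
    Dim runNode As Object, langNode As Object, textNode As Object
    Dim langTag As String, key As Variant

    Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
    xmlDoc.async = False
    ' Map the prefix w: to the Word 2003 WordprocessingML namespace
    xmlDoc.setProperty "SelectionNamespaces", _
        "xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'"

    If Not xmlDoc.LoadXML(ActiveDocument.Content.XML) Then
        MsgBox "Could not parse the document XML."
        Exit Sub
    End If

    Set counts = CreateObject("Scripting.Dictionary")

    ' Add up the text length of every run, grouped by its language tag
    For Each runNode In xmlDoc.SelectNodes("//w:r")
        Set langNode = runNode.SelectSingleNode("w:rPr/w:lang/@w:val")
        If langNode Is Nothing Then
            langTag = "(inherited)"
        Else
            langTag = langNode.Text
        End If
        For Each textNode In runNode.SelectNodes("w:t")
            counts(langTag) = counts(langTag) + Len(textNode.Text)
        Next textNode
    Next runNode

    ' One line per language tag in the Immediate window
    For Each key In counts.Keys
        Debug.Print key, counts(key)
    Next key
End Sub
```

The appeal of this route is that the whole document is fetched in a single call and parsed in memory, instead of querying the object model word by word.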
 
