Accessing a Word document source

jic

Greetings!

We have a bunch of Word documents that were created a long time ago.
One of the programs that we use for translation complains that some
tracked changes were left without being accepted or rejected. I have a
script that supposedly accepts all changes in the Word doc, but it does
not work on some of these older documents. I know how to fix this
problem by hand, but I would like to do it programmatically. The
way to fix it by hand is:

1. Open the Word document
2. Go to Tools > Macro > Microsoft Script Editor
3. Use Edit > Find and Replace > Find to search for the string mso-prop-change; you should find something like:
<p class=MsoNormal style='mso-prop-change:"John Johnson" 20071130T1052'"...
4. Remove the part style='mso-prop-change:"John Johnson" 20071130T1052'"
5. Repeat this until you no longer find the string mso-prop-change
6. Save the document and close the Microsoft Script Editor

I am able to open the Script Editor by doing,

var w = WScript;
var word = w.CreateObject("Word.Application");
var doc = word.Documents.Open(file);
// msoHTMLProjectOpenSourceView = 1
// msoHTMLProjectOpenTextView = 2
// This opens the Microsoft Script Editor
var htmldoc = word.ActiveDocument.HTMLProject.Open(2);

The problem is that I would like to do this behind the scenes, without
the Script Editor coming up. I would also like to do a search
and replace on that specific document. I have searched the
Microsoft site for a solution, but found none.

I know how to do the search and replace; all I need is to be able to
access the XML side of the Word document.

Any ideas?

thanks,

jic
 
jic

Does anyone out there know how to do this? It cannot be that difficult. Maybe
this has not been addressed before. Any help would be greatly appreciated.

thanks,

jic
 
Dorak

Here's what I use. The Reviewing toolbar doesn't even show. I'm not sure if it
will work on older versions; this is 2003.

Sub trk()
' trk Macro
' Macro recorded 9/16/2008 by C. Dorak
'
CommandBars("Reviewing").Visible = False
WordBasic.ShowComments
WordBasic.ShowInkAnnotations
WordBasic.ShowInsertionsAndDeletions
WordBasic.ShowFormatting
WordBasic.AcceptAllChangesInDoc
ActiveDocument.TrackRevisions = Not ActiveDocument.TrackRevisions
End Sub
 
Dorak

I'm sorry, that was a toggle macro assuming that track changes are on and
everything was checked. Here's a better one.

Sub AcceptAllandFinal()
'
' AcceptAllandFinal Macro
' Macro recorded 11/26/2007 by Admin
'
    ' Accept every tracked change in the document
    WordBasic.AcceptAllChangesInDoc
    ' Hide revision marks and show the final version
    With ActiveWindow.View
        .ShowRevisionsAndComments = False
        .RevisionsView = wdRevisionsViewFinal
    End With
    CommandBars("Reviewing").Visible = False
    ' Toggle change tracking (turns it off if it was on)
    ActiveDocument.TrackRevisions = Not ActiveDocument.TrackRevisions
End Sub
 
jic

Thanks for the help, but that does not work. I had tried that before, but
I tried it again, since you had it in a different sequence. I am using
JScript, so I hope that it is not a problem with JScript. Anyway, what I
would like to do is this: in the Word document source code, there are
"style='mso-prop-change:" entries that I would like to delete. In the
document that I am having problems with, even though the track changes options
are off and all changes have been accepted, some of these "style='mso-prop-change:"
entries were left. These cause problems for the translation software that we use.
For example, these entries are found in the document in question:

<p class=CEDTable style='mso-prop-change:"John Johnson" 20060509T0844'>
<span style='font-size:8.0pt;font-family:"Arial Narrow";mso-bidi-font-family:"Arial Narrow"'>
<o:p> </o:p>
</span>
</p>
<p class=CEDTable style='mso-prop-change:"John Johnson" 20060509T0844'>
<span style='font-size:8.0pt;font-family:"Arial Narrow";mso-bidi-font-family:"Arial Narrow"'>Support for rough and highly textured stocks using patented ultrasonic Transfer Overdrive technology<o:p></o:p>
</span>
</p>
<p class=CEDTable style='mso-prop-change:"John Johnson" 20060509T0844'>
<span style='font-size:8.0pt;font-family:"Arial Narrow";mso-bidi-font-family:"Arial Narrow"'>
<o:p> </o:p></span>
</p>

What I would like to do is search through the document and replace all
instances of

" style='mso-prop-change:"John Johnson" 20060509T0844'"

with " " (a single space). This would fix my problem. I can do this by hand and it
works; however, I would like to do it with a script.
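
For reference, the replace step itself might look something like the sketch below once the source text is available as a string. This is only an illustration, not a tested fix for the Word side: stripPropChange is a made-up helper name, and it assumes the attribute is always single-quoted and holds nothing but the mso-prop-change property, as in the samples above. It removes the attribute together with its leading space rather than leaving a space behind. It sticks to var and plain regular expressions, so it should also run under JScript.

```javascript
// Hypothetical helper (not part of Word or WSH): removes every
// style attribute whose value begins with mso-prop-change.
// Assumption: the attribute is single-quoted and contains only that
// one property, as in the sample markup from the document.
function stripPropChange(html) {
  // \s+ also consumes the whitespace before "style", so no gap is left
  return html.replace(/\s+style='mso-prop-change:[^']*'/g, "");
}

// Sample input modeled on the entries shown above
var sample =
  "<p class=CEDTable style='mso-prop-change:\"John Johnson\" 20060509T0844'>" +
  "<span style='font-size:8.0pt'>text</span></p>";

var cleaned = stripPropChange(sample);
```

Ordinary style attributes, such as the one on the span, are left alone because their value does not start with mso-prop-change.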

Thanks for the help.

josé
 
jic

I have a work-around. If I save the file as XML and then back to DOC, the
problem goes away. Something like this, in JScript...

// WdSaveFormat values. A plain .js script run under WSH does not see
// Word's named constants, so they are defined here.
var wdFormatDocument = 0;
var wdFormatXML = 11;

word.ActiveDocument.XMLSaveDataOnly = false;
word.ActiveDocument.XMLUseXSLTWhenSaving = false;
word.ActiveDocument.XMLSaveThroughXSLT = "";
word.ActiveDocument.XMLHideNamespaces = false;
word.ActiveDocument.XMLShowAdvancedErrors = false;
word.ActiveDocument.XMLSchemaReferences.HideValidationErrors = false;
word.ActiveDocument.XMLSchemaReferences.AutomaticValidation = true;
word.ActiveDocument.XMLSchemaReferences.IgnoreMixedContent = false;
word.ActiveDocument.XMLSchemaReferences.AllowSaveAsXMLWithoutValidation = false;
word.ActiveDocument.XMLSchemaReferences.ShowPlaceholderText = false;

// Save as Word XML first
word.ActiveDocument.SaveAs( "filename.xml",
    wdFormatXML,
    false,
    "",
    false,
    "",
    false,
    false,
    false,
    false,
    false);

and then save it back to DOC:

word.ActiveDocument.SaveAs( "filename.doc",
    wdFormatDocument,
    false,
    "",
    false,
    "",
    false,
    false,
    false,
    false,
    false);

These steps clean the bad XML code. Well, at least for my problem.

thanks,

josé
 
