Jay
Hey all,
We currently have a process that takes HTML from a rich text box and inserts
it into a DOCX file via the Range.InsertFile method. The problem is that the
resulting Word document ends up with strange characters after InsertFile is
executed.
Our process is to create an HTML file for each of the rich text boxes to be
displayed in the DOCX file. We then locate the range in the DOCX file to be
the target location of the HTML and then use the InsertFile method.
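For reference, the file-creation step could be sketched roughly as below. This is only an illustration under assumptions: HtmlFragmentWriter and its Write method are hypothetical names, and writing the file as UTF-8 with a byte order mark plus a charset meta tag is one possible way to state the encoding explicitly. Whether Word's HTML import honors it would still need to be tested.

```csharp
using System;
using System.IO;
using System.Text;

static class HtmlFragmentWriter
{
    // Hypothetical helper: wraps an HTML fragment in a minimal document
    // that declares its own encoding, then writes it as UTF-8 with a BOM.
    public static void Write(string path, string fragment)
    {
        string html =
            "<html><head>" +
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">" +
            "</head><body>" + fragment + "</body></html>";

        // UTF8Encoding(true) makes File.WriteAllText emit the UTF-8 BOM.
        File.WriteAllText(path, html, new UTF8Encoding(true));
    }

    public static void Main()
    {
        // Example path and fragment are placeholders, not real data.
        string path = Path.Combine(Path.GetTempPath(), "fragment.html");
        Write(path, "<p>It\u2019s a \u201Cquoted\u201D line.</p>");
        Console.WriteLine("Wrote " + path);
    }
}
```

The resulting file would then be passed to Range.InsertFile exactly as in the current process.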
If you stop during this process and open the generated HTML file, none of
these characters appear in it. Word 2007 must be doing something inside the
InsertFile method, but I'm not sure how we can even begin to correct the
issue.
Sample characters include:
* "Â" - appears to be inserted for "some" line-breaks in the word document
* "€™" - appears to be inserted for "some" single quotes
* "€œ" - appears to be inserted for "some" double-quote (left)
* "€�" - appears to be inserted for "some" double-quote (right)
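These sequences look like what typographic quotes and non-breaking spaces turn into when their UTF-8 bytes are read as Windows-1252. The code points used in the clean method (226, 8364, 8482, 339) fit that pattern. The sketch below is a way to check that guess, not a confirmed diagnosis. MojibakeDemo and Utf8ReadAs1252 are hypothetical names, and the code assumes .NET Framework, where code page 1252 is available by default.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes the same bytes as Windows-1252,
    // which is what a reader does when it guesses the wrong code page.
    public static string Utf8ReadAs1252(string text)
    {
        // Assumption: .NET Framework. On .NET Core / .NET 5+ code page 1252
        // must first be enabled via
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Encoding cp1252 = Encoding.GetEncoding(1252);
        return cp1252.GetString(Encoding.UTF8.GetBytes(text));
    }

    public static void Main()
    {
        // U+2019 (right single quote), UTF-8 bytes E2 80 99: the string "â€™"
        Console.WriteLine(Utf8ReadAs1252("\u2019"));
        // U+201C (left double quote), UTF-8 bytes E2 80 9C: the string "â€œ"
        Console.WriteLine(Utf8ReadAs1252("\u201C"));
        // U+201D (right double quote), UTF-8 bytes E2 80 9D: "â€" followed by
        // byte 0x9D, which has no printable character in Windows-1252
        Console.WriteLine(Utf8ReadAs1252("\u201D"));
        // U+00A0 (non-breaking space), UTF-8 bytes C2 A0: "Â" followed by U+00A0
        Console.WriteLine(Utf8ReadAs1252("\u00A0"));
    }
}
```

If the output matches what shows up in the DOCX file, the HTML is probably being read with the wrong encoding rather than having characters added to it.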
Because of this, we added a method to "clean" these strange characters out of
the DOCX file. It is called after all of the required InsertFile calls have
been made. We were able to replace the "Â" character with an empty string,
and this appears to be working.
Sample clean method:
// characters seen in the damaged output, built from their decimal code points
string capitalAWithCarrot = Convert.ToString((char)194);  // U+00C2
string smallAWithCarrot = Convert.ToString((char)226);    // U+00E2
string euroSymbol = Convert.ToString((char)8364);         // U+20AC
string trademarkSymbol = Convert.ToString((char)8482);    // U+2122
string ohEeSymbol = Convert.ToString((char)339);          // U+0153
string apostropheSymbol = Convert.ToString((char)39);     // U+0027
string leftDoubleQuote = Convert.ToString((char)8220);    // U+201C
string rightDoubleQuote = Convert.ToString((char)8221);   // U+201D
// example: strip the capital A with circumflex
findTag = capitalAWithCarrot;
replaceWithValue = string.Empty;
this.FindAndReplaceTextInDOCX(doc, findTag, replaceWithValue, queueItem);
Each one of the above is passed to a method (FindAndReplaceTextInDOCX())
that does the following:
object replaceAllAsObject = WdReplace.wdReplaceAll;
// loop through each StoryRange (section of Word doc)
foreach (Range tmpRange in doc.StoryRanges)
{
    // set text to find and replace
    tmpRange.Find.ClearFormatting();
    tmpRange.Find.Text = findMe;
    tmpRange.Find.Replacement.Text = replaceWithMe;
    // set to find continue so dialog to continue is not displayed
    tmpRange.Find.Wrap = WdFindWrap.wdFindContinue;
    // perform replacement...passing in find and replace All (11th argument)
    tmpRange.Find.Execute(
        ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing,
        ref replaceAllAsObject,
        ref missing, ref missing, ref missing, ref missing);
}
This works fine in most cases. However, I cannot identify the ANSI character
value for one character that is injected. The 4th example above is actually a
euro sign followed by a square box with a question mark inside it. I
searched an ANSI character set article
(http://www.alanwood.net/demos/ansi.html), but the square-with-question-mark
character was not there. Any ideas as to how I would find out what character
that is?
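One way to identify an unknown character is to print the numeric code point of each character in the affected text. The sketch below is a minimal illustration. CodePointDump and Describe are hypothetical names, and the input literal is only a stand-in. In the real process the string would presumably come from the Text property of a Range that contains the odd characters.

```csharp
using System;
using System.Text;

static class CodePointDump
{
    // Returns each UTF-16 code unit of the text as "U+XXXX (decimal)".
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            sb.AppendFormat("U+{0:X4} ({1}) ", (int)c, (int)c);
        }
        return sb.ToString().TrimEnd();
    }

    public static void Main()
    {
        // Stand-in input: a euro sign followed by the control character U+009D.
        // Replace with the suspicious text taken from the document.
        Console.WriteLine(Describe("\u20AC\u009D"));
        // prints "U+20AC (8364) U+009D (157)"
    }
}
```

The decimal value in parentheses can be used directly in a cast such as (char)157 to build a find string like the ones in the clean method.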
Ideally, we would like to figure out why the InsertFile method is adding
these characters to begin with, but if not, does anyone know of a way we can
find and replace the characters that I listed above?
FYI - I posted this same issue in the VSTO forum but thought I'd have more
luck here.
Thanks,
Jay