How to clean .rtf documents

MLeditor_Dana · Nov 8, 2005

My company uses .rtf documents as "raw" documents (revised annually) that are
subsequently converted (via a special in-house conversion program) to
specially coded html documents for use on our web site. Over the past 2
years or so, we have noticed more than ever that our conversion program is
getting caught up on all the extraneous code automatically entered into .rtf
documents as they are opened and edited by various people. The code causes
our conversion program to skip important sections of text that should be
recognized and linked, and occasionally, the file will get so jumbled with
this text that the conversion program rejects it altogether!

Currently, the only way I know to fix the problem is to save the entire
document as a .txt file, resave it as .rtf, and then reapply all the lost
formatting (bolds, italics, etc.). And even then, I still sometimes have to
re-open the file as .txt and manually hunt for the offending code! We have
over 1200 documents that I reconvert 4 times a year, so I'm wasting a ton of
time.

Is there a way to "clean" this code? Also, is there a way to turn off this
annoying code so that it doesn't get entered in the first place? (I believe
much of this formatting is RSID code that records document changes and
versions? Much of our revision process involved pasting text from other
documents, so I'm sure some of this code is getting entered as a result of
the copy/paste.)

For example, this:
2-Methyl-3-hydroxybutyryl-}{\insrsid8069086
coA}{\insrsid8069086\charrsid1904895 dehydrogenase deficiency is a rare
X-linked organic aciduria with a highly unusual \'93neurodegenerative\'94
disease

should simply be this:
2-Methyl-3-hydroxybutyryl-coA dehydrogenase deficiency is a rare X-linked
organic aciduria with a highly unusual "neurodegenerative" disease

ANY assistance is MUCH appreciated!!

MLeditor_Dana · Nov 9, 2005

Thanks Don. Unfortunately, our "html conversion" is not a simple
convert--the conversion also does a massive amount of linking, indexing, etc.
We also have a pretty sophisticated document management system.
Unfortunately it was all built based on our "raw" documents as .rtf, and it
would be a tragedy to scrap the entire system to use a different type of raw
document at this point. Integrating our conversion program with a 3rd party
software is probably futile.

I was REALLY just hoping to find a way to clean up these .rtf documents. It
just doesn't seem right that there's no way to strip out all that extra RSID
code if we don't need or want it in there.

--dana
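
For what it's worth, the RSID tags in the sample above are ordinary RTF control words (\insrsidN, \charrsidN and friends), so they can in principle be stripped from the raw file without a round trip through .txt. Below is a minimal Python sketch of that idea, not a tested production tool. The names strip_rsid and clean_file are made up for illustration, and the regular expressions assume the RSID words look like the ones in the sample and that the {\*\rsidtbl ...} group contains no nested braces. It leaves all other formatting (bold, italics, \'93 quote escapes, the }{ group boundaries) alone, and it does not handle odd cases such as a literal backslash in the text right before an RSID word.

```python
import re
from pathlib import Path

# {\*\rsidtbl ...}: the table of revision-save IDs near the top of the file.
# Assumption: the group contains no nested braces.
RSID_TABLE = re.compile(r"\{\\\*\\rsidtbl[^{}]*\}")

# One or more consecutive RSID control words (\rsidN, \insrsidN, \charrsidN,
# \pararsidN, ...), plus the single space that may terminate the last one.
RSID_RUN = re.compile(r"(?:\\[a-z]*rsid\d+)+( ?)")

# A control word such as \b or \fs24 sitting directly before the run.
CONTROL_WORD_TAIL = re.compile(r"\\[a-zA-Z]+-?\d*$")


def strip_rsid(rtf: str) -> str:
    """Return the RTF source with RSID control words removed."""
    rtf = RSID_TABLE.sub("", rtf)

    def replace(match):
        # Look at a short window of text just before the RSID run.
        before = match.string[max(0, match.start() - 64):match.start()]
        # If another control word precedes the run, keep one space so that
        # word does not fuse with the text that follows (\b + coA -> \bcoA).
        if match.group(1) and CONTROL_WORD_TAIL.search(before):
            return " "
        return ""

    return RSID_RUN.sub(replace, rtf)


def clean_file(src: Path, dst: Path) -> None:
    """Clean one .rtf file; latin-1 keeps every byte intact on the round trip."""
    text = src.read_bytes().decode("latin-1")
    dst.write_bytes(strip_rsid(text).encode("latin-1"))
```

As a quick check, strip_rsid(r"{\b\insrsid8069086 coA}{\insrsid8069086\charrsid1904895 dehydrogenase}") returns {\b coA}{dehydrogenase}. For a batch of 1200 files, clean_file could be called in a loop over Path(folder).glob("*.rtf"), writing to a separate output folder so the originals stay untouched until the converter has been tried on the cleaned copies.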
