How to clean .rtf documents

MLeditor_Dana

My company uses .rtf documents as "raw" documents (revised annually) that are
subsequently converted (via a special in-house conversion program) to
specially coded html documents for use on our web site. Over the past 2
years or so, we have noticed more than ever that our conversion program is
getting caught up on all the extraneous code automatically entered into .rtf
documents as they are opened and edited by various people. The code causes
our conversion program to skip important sections of text that should be
recognized and linked, and occasionally, the file will get so jumbled with
this text that the conversion program rejects it altogether!

Currently, the only way I know to fix the problem is to save the entire
document as a .txt file, then resave it as .rtf, then reapply all my lost
formatting (bolds, italics, etc). And even then, I still sometimes have to
re-open the file as .txt and manually hunt for the offending code! We have
over 1200 documents that I reconvert 4 times a year, so I'm wasting a ton of
time.

Is there a way to "clean" this code? Also, is there a way to turn off this
annoying code so that it doesn't get entered in the first place? (I believe
much of this formatting is RSID code that records document changes and
versions? Much of our revision process involves pasting text from other
documents, so I’m sure some of this code is getting entered as a result of
the copy/paste.)

For example, this:
2-Methyl-3-hydroxybutyryl-}{\insrsid8069086
coA}{\insrsid8069086\charrsid1904895 dehydrogenase deficiency is a rare
X-linked organic aciduria with a highly unusual \'93neurodegenerative\'94
disease

should simply be this:
2-Methyl-3-hydroxybutyryl-coA dehydrogenase deficiency is a rare X-linked
organic aciduria with a highly unusual "neurodegenerative" disease

ANY assistance is MUCH appreciated!!
 
MLeditor_Dana

Thanks Don. Unfortunately, our "html conversion" is not a simple
conversion: it also does a massive amount of linking, indexing, etc.
We also have a pretty sophisticated document management system.
Unfortunately it was all built based on our "raw" documents as .rtf, and it
would be a tragedy to scrap the entire system to use a different type of raw
document at this point. Integrating our conversion program with third-party
software is probably futile.

I was REALLY just hoping to find a way to clean up these .rtf documents. It
just doesn't seem right that there's no way to strip out all that extra RSID
code if we don't need or want it in there.

--dana
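
For what it's worth, the kind of cleanup asked about above could probably be
scripted rather than done by hand. Below is a minimal sketch in Python, not a
tested production tool. It assumes the unwanted code is limited to RSID
control words that carry a number (\rsidN, \insrsidN, \charrsidN, \pararsidN
and similar), and the names strip_rsid, TOKEN, RSID and BARE_WORD are made up
for this example. It leaves the \'93 and \'94 quote escapes and all other
formatting alone, does not merge the leftover }{ group boundaries, and does
not handle embedded binary (\bin) data.

```python
import re

# One RTF token: a control word (optional numeric parameter plus the single
# space that delimits it), a hex escape such as \'93, any other control
# symbol, a brace, a run of plain text, or a stray backslash.
TOKEN = re.compile(
    r"\\[A-Za-z]+(?:-?\d+)? ?"
    r"|\\'[0-9A-Fa-f]{2}"
    r"|\\[^A-Za-z]"
    r"|[{}]"
    r"|[^\\{}]+"
    r"|\\",
    re.DOTALL,
)

# Assumption: the words to drop all end in "rsid" followed by a number.
RSID = re.compile(r"\\[a-z]*rsid\d+ ?")

# A control word that is not yet followed by its delimiting space.
BARE_WORD = re.compile(r"\\[A-Za-z]+(?:-?\d+)?")


def strip_rsid(rtf: str) -> str:
    """Return the RTF source with numbered RSID control words removed."""
    kept = []
    dropped = False
    for tok in TOKEN.findall(rtf):
        if RSID.fullmatch(tok):
            dropped = True
            continue
        is_text = tok[0] not in "\\{}"
        # If the removed word sat between another control word and text,
        # put a delimiter back so e.g. \b and "word" do not fuse into \bword.
        if dropped and is_text and kept and BARE_WORD.fullmatch(kept[-1]):
            kept[-1] += " "
        dropped = False
        kept.append(tok)
    return "".join(kept)


sample = r"{\insrsid8069086 coA}{\b\insrsid8069086\charrsid1904895 dehydrogenase}"
print(strip_rsid(sample))  # {coA}{\b dehydrogenase}
```

To run something like this over a folder of documents, one option is to read
each file with a single-byte encoding such as latin-1 so every byte
round-trips unchanged, pass the text through strip_rsid, and write the result
to a copy rather than over the original until the converter has been checked
against it.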
 
