How to extract data between two consecutive numbering lists...


Andi Setiawan

Dear everyone...

I would appreciate any suggestions or hints on how to solve the problem below.

A rough explanation is enough, just so that I can head in the right
direction.



My problem is:

There are many (more than ten thousand) Microsoft Word *.doc files.

Each contains 30 mathematics problems, numbered with Word's automatic list
numbering. There are no bookmarks.

Each problem might contain OLE objects, pictures, and mathematics equations.



What I want to do is extract each problem, whichever file it is in, into its
own document in a specified folder.

More precisely,



Problem 1 from F1.doc is extracted into a new document, G1.doc, which is
saved in the folder Folder1.

Problem 1 from F2.doc is extracted into a new document, G2.doc, which is
saved in the folder Folder1.

………………………………..

Problem 2 from F1.doc is extracted into a new document, G1.doc, which is
saved in the folder Folder2.

Problem 2 from F2.doc is extracted into a new document, G2.doc, which is
saved in the folder Folder2.

………………………………..

Problem 30 from F1.doc is extracted into a new document, G1.doc, which is
saved in the folder Folder30.

Problem 30 from F2.doc is extracted into a new document, G2.doc, which is
saved in the folder Folder30.

………………………………..
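
The naming scheme above can be sketched as a small function. This is only an illustration of the mapping, written in Python for brevity (the same logic carries over to C#). It assumes every source file is named F<n>.doc and that the pattern continues as in the examples; the function name `target_path` and the `root` parameter are hypothetical, not part of the original question.

```python
import re
from pathlib import PurePosixPath

PROBLEMS_PER_FILE = 30  # stated in the question: 30 problems per document


def target_path(source_name: str, problem_number: int, root: str = "output") -> PurePosixPath:
    """Map problem k of F<n>.doc to <root>/Folder<k>/G<n>.doc.

    Assumption: source files follow the F<n>.doc pattern from the examples.
    """
    match = re.fullmatch(r"F(\d+)\.doc", source_name)
    if match is None:
        raise ValueError(f"unexpected source file name: {source_name!r}")
    if not 1 <= problem_number <= PROBLEMS_PER_FILE:
        raise ValueError(f"problem number out of range: {problem_number}")
    file_index = match.group(1)
    # Folder is chosen by problem number, file name by source file index.
    return PurePosixPath(root) / f"Folder{problem_number}" / f"G{file_index}.doc"
```

With this mapping, every Folder<k> ends up holding one G<n>.doc per source file, so each folder collects the same problem number from all documents.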





I know C#, Regex, and XML.

I also have Office 2003 and Visual C# Express Edition installed.



Thank you in advance….





Best regards,



Andi Setiawan
 

Cindy M.

Hi Andi Setiawan,
> I need your suggestion or hint how to accomplish my problem below.
>
> Just give me a rough explanation in such a way that I can go to the correct
> path.

Any chance STYLES have been used to format the document? That would be the
easiest way to identify the different areas of text...

Which version of Word do you have at your disposal?
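
To illustrate why a style (or any other reliable marker on the first paragraph of each problem) makes this easy, here is a rough sketch of the grouping step only, in Python for brevity. It assumes each problem starts with a paragraph that can be recognized as numbered, which the thread has not confirmed. `Para` and `split_problems` are hypothetical names, and reading the paragraphs out of Word is deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Para:
    """Hypothetical stand-in for one paragraph read from a document."""
    text: str
    is_numbered: bool  # True if this paragraph carries the problem number/style


def split_problems(paragraphs: Iterable[Para]) -> List[List[Para]]:
    """Group paragraphs so that each numbered paragraph starts a new problem.

    Un-numbered paragraphs (continuation text, pictures, equations) are
    attached to the most recent problem. Anything before the first numbered
    paragraph (for example a title) is skipped.
    """
    problems: List[List[Para]] = []
    for para in paragraphs:
        if para.is_numbered:
            problems.append([para])
        elif problems:
            problems[-1].append(para)
    return problems
```

If the documents really contain 30 numbered top-level paragraphs each, the result should have 30 groups per file, which is a cheap sanity check before anything is saved.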

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 17 2005)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or reply
in the newsgroup and not by e-mail :)
 
