I want to clean up Word documents so that we end up with a consistent,
structured but basic Word doc. Our clients are not too great with Word,
so what we get from them is a real mess. Below are my basic requirements:
What is needed:
Take an unstructured document and create structure: if a top-level
heading is not using the Heading 1 style, recognise what should be
Heading 1 and convert it to Heading 1.
So if the largest font is found on one or two lines (maybe more), it
must be H1; then work down from there to H2, H3, H4 and so on.
There could be many unstructured styles; the ones below are only
examples:
Unstructured document = Structured document
Heading 1
  Heading 1 + Bold = Heading 1
  Normal + 20 pt, Bold = Heading 1
  Heading 1 + 22 pt, Bold = Heading 1
Heading 2
  Heading 2 + Bold = Heading 2
  Normal + 16 pt, Bold = Heading 2
  Heading 2 + 18 pt, Bold = Heading 2
and the same for Heading 3, 4, 5, 6
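To make the heading rule concrete, here is a minimal sketch in Python. It is only one possible reading of the rule: the `Para` class, the `assign_headings` function, the 12 pt body size and the 150-character limit (a stand-in for "one or two lines") are all assumptions for illustration, and reading the paragraphs out of a real Word file is left out. Paragraphs that already carry a Heading style keep their level and simply lose the extra formatting; oversized paragraphs without a Heading style are ranked by font size, largest first.

```python
import re
from dataclasses import dataclass

@dataclass
class Para:
    text: str
    style: str       # paragraph style name, e.g. "Normal" or "Heading 1"
    size_pt: float   # dominant font size in points

HEADING_STYLE = re.compile(r"Heading ([1-6])")

def is_loose_heading(p, body_size_pt, max_chars):
    """True for a short, oversized paragraph that lacks a real Heading style."""
    return (not HEADING_STYLE.fullmatch(p.style)
            and p.size_pt > body_size_pt
            and len(p.text) <= max_chars)

def assign_headings(paras, body_size_pt=12.0, max_chars=150):
    """Return the target heading style per paragraph, or None if not a heading."""
    # Distinct font sizes of the loose headings, largest first.
    loose_sizes = sorted(
        {p.size_pt for p in paras if is_loose_heading(p, body_size_pt, max_chars)},
        reverse=True,
    )
    # Largest size -> level 1, next -> level 2, ... capped at Heading 6.
    level_by_size = {size: min(rank, 6)
                     for rank, size in enumerate(loose_sizes, start=1)}

    targets = []
    for p in paras:
        match = HEADING_STYLE.fullmatch(p.style)
        if match:
            # Already a heading style: keep the level, drop the overrides.
            targets.append(f"Heading {match.group(1)}")
        elif is_loose_heading(p, body_size_pt, max_chars):
            targets.append(f"Heading {level_by_size[p.size_pt]}")
        else:
            targets.append(None)
    return targets
```

With the examples above, a "Normal + 20 pt" paragraph would come out as Heading 1 and a "Normal + 16 pt" paragraph as Heading 2, as long as 20 pt and 16 pt are the two largest loose sizes in the document.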
Paragraph text:
This is where the fun starts. We need to recognise what is a
paragraph, both numbered and normal body text. I think it will be
easier to do this by eliminating the headings and the other styles
(see below).
So if it is NOT a heading, quote text or bulleted text, and is not in a
table, it could be a paragraph, and a basic BodyText style should be
applied. If it has a number at the front, it should be a List
Paragraph: the correct numbering should be followed, but the style
changed to a numbered List Paragraph.
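The elimination logic could be sketched like this in Python. The `Block` class, its flags and the `classify` function are hypothetical names; the flags are assumed to have been filled in by earlier checks. The regular expression only catches numbers that were typed as text ("1." or "2.3)"). Word's automatic numbering is not part of the paragraph text, so a real implementation would also have to look at the paragraph's list formatting.

```python
import re
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    is_heading: bool = False
    is_quote: bool = False
    is_bulleted: bool = False
    in_table: bool = False

# A typed number such as "1." or "2.3)" followed by whitespace.
NUMBER_PREFIX = re.compile(r"^\s*\d+(?:\.\d+)*[.)]\s+")

def classify(block):
    """Return the paragraph style to apply, or None if another rule owns it."""
    if block.is_heading or block.is_quote or block.is_bulleted or block.in_table:
        return None  # eliminated: handled by the heading/quote/bullet/table rules
    if NUMBER_PREFIX.match(block.text):
        return "List Paragraph"
    return "BodyText"  # style name as used in this post
```

Requiring a "." or ")" after the number is a deliberate choice so that a sentence like "2020 was a good year" is not mistaken for a list item.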
Quote text
If a paragraph is entirely italic, we will assume it's a quote,
so the basic "quote" style should be applied.
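As a rough sketch of the "all italic" test in Python: a paragraph is treated as a list of formatted runs, and it counts as a quote only if every run that contains visible text is italic. The `Run` class and the `is_quote` function are hypothetical names used for illustration; ignoring whitespace-only runs is also an assumption, made so that a stray non-italic space does not defeat the rule.

```python
from dataclasses import dataclass

@dataclass
class Run:
    text: str
    italic: bool = False

def is_quote(runs):
    """True if the paragraph has visible text and all of it is italic."""
    visible = [r for r in runs if r.text.strip()]  # skip whitespace-only runs
    return bool(visible) and all(r.italic for r in visible)
```

An empty paragraph is not treated as a quote, and a paragraph with only one italic word falls through to the character style rule instead.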
Character Styles
There are 4 basic Character Styles we want to use: Bold, Italic,
BoldItalic and Underline. If a single word or part of a paragraph has
been styled with the bold button, italic button, underline button or a
combination of bold and italic, the corresponding style in the attached
document should be applied.
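The mapping from button formatting to the four character styles might look like this in Python. `Run` and `character_style` are hypothetical names, and the style names are the ones listed above. The post does not say what should happen when underline is combined with bold or italic, so giving bold and italic priority over underline is purely an assumption in this sketch.

```python
from dataclasses import dataclass

@dataclass
class Run:
    text: str
    bold: bool = False
    italic: bool = False
    underline: bool = False

def character_style(run):
    """Return the character style name for a run, or None for plain text."""
    if run.bold and run.italic:
        return "BoldItalic"
    if run.bold:
        return "Bold"
    if run.italic:
        return "Italic"
    if run.underline:
        return "Underline"  # only reached when neither bold nor italic is set
    return None
```

This would be applied run by run, after the paragraph-level rules have decided the paragraph style, so that a fully italic quote paragraph is not also given the Italic character style.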
Can anyone help me with this, or point me in the right direction?
Many thanks,
Adam