ODC connections - interpreting HTML

E

Emlyn B

I have a question about ODC connections set up to mailmerge from a datafile
into Word. Is there entry you can put into the associated Schema.ini for this
ODC connection to enable the interpretation of HTML tags so that when the
data is merged into Word the HTML gets generated and does not show as just
the HTML tags. My datafile looks like so:

TABLE
<TABLE><TR><TD WIDTH=80>Course </TD><TD WIDTH=200>Title



</TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points </TD><TD WIDTH=60>ECTS
</TD><TD WIDTH=100>Result </TD></TR></TABLE>

where table is the header and the record is the HTML. My entry in the
Schema.ini is

[names_list.txt]
Format=Delimited(})
----
I have also found that if I was just to merge this data to a Word document
without using this ODC, the HTML seems to get interpreted ok, and in the
document a table gets created. This would be using the default text
intrepretor set up in the Windows registry presumably. So would this be
possible?
Many thanks
 
P

Peter Jamieson

I'm not sure you can use the text file provider + SCHEMA.INI for that -
MS's documentation suggests that the text file provider and the HTML
provider are the same thing, but I think you have to specify different
extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"

or something like

Extended Properties=&quot;HTML Import;HDR=YES;&quot;

in a .odc file.

(The engine type, 112, is different from the text file engine type, 96,
but in this case it seems that you need the extended properties rather
than the engine type).

If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

FWIW

1. the header row can be defined as

<tr><th>field1</th><th>field2</th><th>etc.</th></tr>
or
<tr><td>field1</td><td>field2</td><td>etc.</td></tr>

2. If you have more than one table, the provider use the name in any
<caption> tag in the table if there is one, but for "uncaptioned" tables
it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.

SELECT * FROM


)

As far as I can see, this all works with simple examples but I've never
used it "for real". You appear to need a .odc when the data gets more
complex (perhaps if there is more than one table or you named a table
with a caption), but you may be able to specify the .htm file as the
Name parameter and put everything else in the Connection parameter (and
possibly the SQLStatement paramter) in simpler cases.


Peter Jamieson

http://tips.pjmsn.me.uk
Emlyn said:
I have a question about ODC connections set up to mailmerge from a datafile
into Word. Is there entry you can put into the associated Schema.ini for this
ODC connection to enable the interpretation of HTML tags so that when the
data is merged into Word the HTML gets generated and does not show as just
the HTML tags. My datafile looks like so:

TABLE
<TABLE><TR><TD WIDTH=80>Course </TD><TD WIDTH=200>Title



</TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points </TD><TD WIDTH=60>ECTS
</TD><TD WIDTH=100>Result </TD></TR></TABLE>

where table is the header and the record is the HTML. My entry in the
Schema.ini is

[names_list.txt]
Format=Delimited(})
----
I have also found that if I was just to merge this data to a Word document
without using this ODC, the HTML seems to get interpreted ok, and in the
document a table gets created. This would be using the default text
intrepretor set up in the Windows registry presumably. So would this be
possible?
Many thanks
 
E

Emlyn B

Thanks for replying. However, I probably should have shown you what the full
file looks like:

DATE_REPORT}RUN_ID}TEMPLATE_ID}TABLE_EXAMPLE
06 February 2009}18}DSUPP.DOC}<TABLE><TR><TD WIDTH=80>Course </TD><TD
WIDTH=200>Title </TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points
</TD><TD WIDTH=60>ECTS </TD><TD WIDTH=100>Result
</TD></TR></TABLE>

That is it is delimited by '}' and contains both text and html. So I want
the merge process to read all these field and merge 06 February 2009 into the
DATE_REPORT field and merge the html table into the TABLE EXAMPLE field. When
I followed your suggestion, of putting HTML Import;HDR=YES into the extended
properties, only the HTML table appears to be read. By the looks of things it
is not possible. You can only have one or the other. However if I was to open
the doc using the default Windows text converter, it seems to work ok. Is
this possible? Thanks in advance.

Peter Jamieson said:
I'm not sure you can use the text file provider + SCHEMA.INI for that -
MS's documentation suggests that the text file provider and the HTML
provider are the same thing, but I think you have to specify different
extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"

or something like

Extended Properties="HTML Import;HDR=YES;"

in a .odc file.

(The engine type, 112, is different from the text file engine type, 96,
but in this case it seems that you need the extended properties rather
than the engine type).

If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

FWIW

1. the header row can be defined as

<tr><th>field1</th><th>field2</th><th>etc.</th></tr>
or
<tr><td>field1</td><td>field2</td><td>etc.</td></tr>

2. If you have more than one table, the provider use the name in any
<caption> tag in the table if there is one, but for "uncaptioned" tables
it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.

SELECT * FROM


)

As far as I can see, this all works with simple examples but I've never
used it "for real". You appear to need a .odc when the data gets more
complex (perhaps if there is more than one table or you named a table
with a caption), but you may be able to specify the .htm file as the
Name parameter and put everything else in the Connection parameter (and
possibly the SQLStatement paramter) in simpler cases.


Peter Jamieson

http://tips.pjmsn.me.uk
Emlyn said:
I have a question about ODC connections set up to mailmerge from a datafile
into Word. Is there entry you can put into the associated Schema.ini for this
ODC connection to enable the interpretation of HTML tags so that when the
data is merged into Word the HTML gets generated and does not show as just
the HTML tags. My datafile looks like so:

TABLE
<TABLE><TR><TD WIDTH=80>Course </TD><TD WIDTH=200>Title



</TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points </TD><TD WIDTH=60>ECTS
</TD><TD WIDTH=100>Result </TD></TR></TABLE>

where table is the header and the record is the HTML. My entry in the
Schema.ini is

[names_list.txt]
Format=Delimited(})
----
I have also found that if I was just to merge this data to a Word document
without using this ODC, the HTML seems to get interpreted ok, and in the
document a table gets created. This would be using the default text
intrepretor set up in the Windows registry presumably. So would this be
possible?
Many thanks
 
P

Peter Jamieson

I see what you mean. It's quite interesting that the converter
treats the file that way.

However, so far I haven't seen a simple way to grab both the regular
data and the HTML data, certainly not during a merge operation.

Peter Jamieson

http://tips.pjmsn.me.uk

Emlyn said:
Thanks for replying. However, I probably should have shown you what the full
file looks like:

DATE_REPORT}RUN_ID}TEMPLATE_ID}TABLE_EXAMPLE
06 February 2009}18}DSUPP.DOC}<TABLE><TR><TD WIDTH=80>Course </TD><TD
WIDTH=200>Title </TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points
</TD><TD WIDTH=60>ECTS </TD><TD WIDTH=100>Result
</TD></TR></TABLE>

That is it is delimited by '}' and contains both text and html. So I want
the merge process to read all these field and merge 06 February 2009 into the
DATE_REPORT field and merge the html table into the TABLE EXAMPLE field. When
I followed your suggestion, of putting HTML Import;HDR=YES into the extended
properties, only the HTML table appears to be read. By the looks of things it
is not possible. You can only have one or the other. However if I was to open
the doc using the default Windows text converter, it seems to work ok. Is
this possible? Thanks in advance.

Peter Jamieson said:
I'm not sure you can use the text file provider + SCHEMA.INI for that -
MS's documentation suggests that the text file provider and the HTML
provider are the same thing, but I think you have to specify different
extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"

or something like

Extended Properties="HTML Import;HDR=YES;"

in a .odc file.

(The engine type, 112, is different from the text file engine type, 96,
but in this case it seems that you need the extended properties rather
than the engine type).

If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

FWIW

1. the header row can be defined as

<tr><th>field1</th><th>field2</th><th>etc.</th></tr>
or
<tr><td>field1</td><td>field2</td><td>etc.</td></tr>

2. If you have more than one table, the provider use the name in any
<caption> tag in the table if there is one, but for "uncaptioned" tables
it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.

SELECT * FROM


)

As far as I can see, this all works with simple examples but I've never
used it "for real". You appear to need a .odc when the data gets more
complex (perhaps if there is more than one table or you named a table
with a caption), but you may be able to specify the .htm file as the
Name parameter and put everything else in the Connection parameter (and
possibly the SQLStatement paramter) in simpler cases.


Peter Jamieson

http://tips.pjmsn.me.uk
Emlyn said:
I have a question about ODC connections set up to mailmerge from a datafile
into Word. Is there entry you can put into the associated Schema.ini for this
ODC connection to enable the interpretation of HTML tags so that when the
data is merged into Word the HTML gets generated and does not show as just
the HTML tags. My datafile looks like so:

TABLE
<TABLE><TR><TD WIDTH=80>Course </TD><TD WIDTH=200>Title



</TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points </TD><TD WIDTH=60>ECTS
</TD><TD WIDTH=100>Result </TD></TR></TABLE>

where table is the header and the record is the HTML. My entry in the
Schema.ini is

[names_list.txt]
Format=Delimited(})
----
I have also found that if I was just to merge this data to a Word document
without using this ODC, the HTML seems to get interpreted ok, and in the
document a table gets created. This would be using the default text
intrepretor set up in the Windows registry presumably. So would this be
possible?
Many thanks
 
P

Peter Jamieson

OK, I have had a further look at this. Frankly, I think you will
probably be better off either trying to get your source data in another
format, or if you are in a position to do so, writing code to transform
it into something easier to deal with. If you go down that path, what is
feasible depends on your coding capabilities - e.g. you could try to
write Word VBA code that reads the file and generates the Word document
you want, or you could write code that transforms the data into a format
that makes it possible to produce output using simple techniques - e.g.
transform it into two Access tables then use the Access reporter to
generate your output, or if you have to use Word for the output,
transform the data into a flat file that would then allow you to use the
"one-to-many" catalog merge documented by macropod (have a look around
this group for a pointer to that).

Of course, the chances are that the data originates from a couple of
tables in a database somewhere and that if you were able to start from
that, you could avoid all these intermediate steps.

The following idea are really so shaky I wouldn't dream of recommending
them, but since I spent quite a lot of time looking at the problem I
suppose I may as well write it up as someone may gain something from it.

In order for this to have the slightest chance of working, your HTML
must not contain any newlines/carriage returns etc. (which will break at
least part of this), and assuming that you could have multiple rows in
mixed.txt, each with an HTML table, there might be problems with number
of rows and the length of the HTML. If you only ever have one row, you
could simplify this.

Suppose your file is called mixed.txt and is in c:\a
a. Make a copy of mixed.txt called mixed.htm (even though it isn't a
..htm file)
b. edit the appropriate SCHEMA.INI so that mixed.txt is defined as
Delimiter(})
c. create a mixedtxt.odc that accesses mixed.txt as a text file - i.e.
the Data Source is c:\a and the engine type is 96
d. create a mixedhtm.odc that accesses mixed.htm as a HTM file, i.e.
the Data Source is c:\a\mixed.htm, Extended Properties=&quot;HTML
Import;HDR=YES;&quot;; and if you like, the Engine type is 112
e. use mixedtxt.odc as the data source for the merge. Don't try to
insert the column that contains the table.
f. insert a DATABASE field to insert mixedhtm.odc. At first it
probably won't work, because Word may complain about the connection
string. But reduce the field to

{ DATABASE \d "C:\\a\\mixedhtm.odc" \s "SELECT * FROM `Table`" \h }

That successfully inserts the table from row 1

To get the field to insert from subsequent rows, insert a nested field ,
something like this:

{ DATABASE \d "C:\\a\\mixedhtm.odc" \s "SELECT * FROM `Table{ = {
MERGEREC }-1 \#"0;0;''" }`" \h }

In other words, if it's record 1, use "Table", if it's record 2, use
"Table1" etc.

Merge to a new document. Reformat all the tables as necessary.



Peter Jamieson

http://tips.pjmsn.me.uk

Peter said:
I see what you mean. It's quite interesting that the converter treats
the file that way.

However, so far I haven't seen a simple way to grab both the regular
data and the HTML data, certainly not during a merge operation.

Peter Jamieson

http://tips.pjmsn.me.uk

Emlyn said:
Thanks for replying. However, I probably should have shown you what
the full file looks like:

DATE_REPORT}RUN_ID}TEMPLATE_ID}TABLE_EXAMPLE
06 February 2009}18}DSUPP.DOC}<TABLE><TR><TD WIDTH=80>Course
</TD><TD WIDTH=200>Title </TD><TD WIDTH=60>Level</TD><TD
WIDTH=60>Points </TD><TD WIDTH=60>ECTS </TD><TD
WIDTH=100>Result </TD></TR></TABLE>

That is it is delimited by '}' and contains both text and html. So I
want the merge process to read all these field and merge 06 February
2009 into the DATE_REPORT field and merge the html table into the
TABLE EXAMPLE field. When I followed your suggestion, of putting HTML
Import;HDR=YES into the extended properties, only the HTML table
appears to be read. By the looks of things it is not possible. You can
only have one or the other. However if I was to open the doc using the
default Windows text converter, it seems to work ok. Is this possible?
Thanks in advance.

Peter Jamieson said:
I'm not sure you can use the text file provider + SCHEMA.INI for that
- MS's documentation suggests that the text file provider and the
HTML provider are the same thing, but I think you have to specify
different extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"

or something like

Extended Properties="HTML Import;HDR=YES;"

in a .odc file.

(The engine type, 112, is different from the text file engine type,
96, but in this case it seems that you need the extended properties
rather than the engine type).

If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

FWIW

1. the header row can be defined as

<tr><th>field1</th><th>field2</th><th>etc.</th></tr>
or
<tr><td>field1</td><td>field2</td><td>etc.</td></tr>

2. If you have more than one table, the provider use the name in any
<caption> tag in the table if there is one, but for "uncaptioned"
tables it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.

SELECT * FROM


)

As far as I can see, this all works with simple examples but I've
never used it "for real". You appear to need a .odc when the data
gets more complex (perhaps if there is more than one table or you
named a table with a caption), but you may be able to specify the
.htm file as the Name parameter and put everything else in the
Connection parameter (and possibly the SQLStatement paramter) in
simpler cases.


Peter Jamieson

http://tips.pjmsn.me.uk

Emlyn B wrote:
I have a question about ODC connections set up to mailmerge from a
datafile into Word. Is there entry you can put into the associated
Schema.ini for this ODC connection to enable the interpretation of
HTML tags so that when the data is merged into Word the HTML gets
generated and does not show as just the HTML tags. My datafile looks
like so:

TABLE
<TABLE><TR><TD WIDTH=80>Course </TD><TD
WIDTH=200>Title



</TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points </TD><TD
WIDTH=60>ECTS </TD><TD WIDTH=100>Result
</TD></TR></TABLE>

where table is the header and the record is the HTML. My entry in
the Schema.ini is

[names_list.txt]
Format=Delimited(})
----
I have also found that if I was just to merge this data to a Word
document without using this ODC, the HTML seems to get interpreted
ok, and in the document a table gets created. This would be using
the default text intrepretor set up in the Windows registry
presumably. So would this be possible?
Many thanks
 
E

Emlyn B

Thanks for these recommendations. I will try them.

Peter Jamieson said:
OK, I have had a further look at this. Frankly, I think you will
probably be better off either trying to get your source data in another
format, or if you are in a position to do so, writing code to transform
it into something easier to deal with. If you go down that path, what is
feasible depends on your coding capabilities - e.g. you could try to
write Word VBA code that reads the file and generates the Word document
you want, or you could write code that transforms the data into a format
that makes it possible to produce output using simple techniques - e.g.
transform it into two Access tables then use the Access reporter to
generate your output, or if you have to use Word for the output,
transform the data into a flat file that would then allow you to use the
"one-to-many" catalog merge documented by macropod (have a look around
this group for a pointer to that).

Of course, the chances are that the data originates from a couple of
tables in a database somewhere and that if you were able to start from
that, you could avoid all these intermediate steps.

The following idea are really so shaky I wouldn't dream of recommending
them, but since I spent quite a lot of time looking at the problem I
suppose I may as well write it up as someone may gain something from it.

In order for this to have the slightest chance of working, your HTML
must not contain any newlines/carriage returns etc. (which will break at
least part of this), and assuming that you could have multiple rows in
mixed.txt, each with an HTML table, there might be problems with number
of rows and the length of the HTML. If you only ever have one row, you
could simplify this.

Suppose your file is called mixed.txt and is in c:\a
a. Make a copy of mixed.txt called mixed.htm (even though it isn't a
..htm file)
b. edit the appropriate SCHEMA.INI so that mixed.txt is defined as
Delimiter(})
c. create a mixedtxt.odc that accesses mixed.txt as a text file - i.e.
the Data Source is c:\a and the engine type is 96
d. create a mixedhtm.odc that accesses mixed.htm as a HTM file, i.e.
the Data Source is c:\a\mixed.htm, Extended Properties="HTML
Import;HDR=YES;"; and if you like, the Engine type is 112
e. use mixedtxt.odc as the data source for the merge. Don't try to
insert the column that contains the table.
f. insert a DATABASE field to insert mixedhtm.odc. At first it
probably won't work, because Word may complain about the connection
string. But reduce the field to

{ DATABASE \d "C:\\a\\mixedhtm.odc" \s "SELECT * FROM `Table`" \h }

That successfully inserts the table from row 1

To get the field to insert from subsequent rows, insert a nested field ,
something like this:

{ DATABASE \d "C:\\a\\mixedhtm.odc" \s "SELECT * FROM `Table{ = {
MERGEREC }-1 \#"0;0;''" }`" \h }

In other words, if it's record 1, use "Table", if it's record 2, use
"Table1" etc.

Merge to a new document. Reformat all the tables as necessary.



Peter Jamieson

http://tips.pjmsn.me.uk

Peter said:
I see what you mean. It's quite interesting that the converter treats
the file that way.

However, so far I haven't seen a simple way to grab both the regular
data and the HTML data, certainly not during a merge operation.

Peter Jamieson

http://tips.pjmsn.me.uk

Emlyn said:
Thanks for replying. However, I probably should have shown you what
the full file looks like:

DATE_REPORT}RUN_ID}TEMPLATE_ID}TABLE_EXAMPLE
06 February 2009}18}DSUPP.DOC}<TABLE><TR><TD WIDTH=80>Course
</TD><TD WIDTH=200>Title </TD><TD WIDTH=60>Level</TD><TD
WIDTH=60>Points </TD><TD WIDTH=60>ECTS </TD><TD
WIDTH=100>Result </TD></TR></TABLE>

That is it is delimited by '}' and contains both text and html. So I
want the merge process to read all these field and merge 06 February
2009 into the DATE_REPORT field and merge the html table into the
TABLE EXAMPLE field. When I followed your suggestion, of putting HTML
Import;HDR=YES into the extended properties, only the HTML table
appears to be read. By the looks of things it is not possible. You can
only have one or the other. However if I was to open the doc using the
default Windows text converter, it seems to work ok. Is this possible?
Thanks in advance.

:

I'm not sure you can use the text file provider + SCHEMA.INI for that
- MS's documentation suggests that the text file provider and the
HTML provider are the same thing, but I think you have to specify
different extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"

or something like

Extended Properties="HTML Import;HDR=YES;"

in a .odc file.

(The engine type, 112, is different from the text file engine type,
96, but in this case it seems that you need the extended properties
rather than the engine type).

If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

FWIW

1. the header row can be defined as

<tr><th>field1</th><th>field2</th><th>etc.</th></tr>
or
<tr><td>field1</td><td>field2</td><td>etc.</td></tr>

2. If you have more than one table, the provider use the name in any
<caption> tag in the table if there is one, but for "uncaptioned"
tables it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.

SELECT * FROM


)

As far as I can see, this all works with simple examples but I've
never used it "for real". You appear to need a .odc when the data
gets more complex (perhaps if there is more than one table or you
named a table with a caption), but you may be able to specify the
.htm file as the Name parameter and put everything else in the
Connection parameter (and possibly the SQLStatement paramter) in
simpler cases.


Peter Jamieson

http://tips.pjmsn.me.uk

Emlyn B wrote:
I have a question about ODC connections set up to mailmerge from a
datafile into Word. Is there entry you can put into the associated
Schema.ini for this ODC connection to enable the interpretation of
HTML tags so that when the data is merged into Word the HTML gets
generated and does not show as just the HTML tags. My datafile looks
like so:

TABLE
<TABLE><TR><TD WIDTH=80>Course </TD><TD
WIDTH=200>Title



</TD><TD WIDTH=60>Level</TD><TD WIDTH=60>Points </TD><TD
WIDTH=60>ECTS </TD><TD WIDTH=100>Result
</TD></TR></TABLE>

where table is the header and the record is the HTML. My entry in
the Schema.ini is

[names_list.txt]
Format=Delimited(})
----
I have also found that if I was just to merge this data to a Word
document without using this ODC, the HTML seems to get interpreted
ok, and in the document a table gets created. This would be using
the default text intrepretor set up in the Windows registry
presumably. So would this be possible?
Many thanks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top