Ker_01
This is a follow-up to a post from yesterday (thanks to Tim Williams for
responding). I have more information now, and felt it warranted a second try
to see if there is a way to do this now that we've gotten the documents
exposed via the web interface. Using XL2003 on WinXP.
We have a corporate web application that exposes various documents in
multiple levels of subdirectories. My belief is that these are stored in a
database, but now they are directly accessible via web links through this
web application, so where they come from hopefully doesn't affect what I am
trying to accomplish.
Starting from the main page of the web application, I need to scrape the
entire directory tree and capture some of the details (javascript links to
.doc and .pdf files that can be opened through IE6 via 'dedicated' URLs for
each document). I'm sure I'll have more questions once I start dissecting
the HTML, but for starters I need to understand how to even scrape multiple
levels within the directory tree of a website. I've copied in some of the
URLs (changed slightly for corporate security) to give a sense of what I'm
working with.
Top of tree:
http://ourserver.com/rtsa-bin/PermaSite.dll/aaumain.htm?site=omatcone&pagetitle=M S - L
I can click a link to go to the next level of subfolder:
http://ourserver.com/rtsa-bin/Perma...00cdb6&foldername=M+R&se=omatcone&pagetitle=M
Third level of folder:
http://ourserver.com/rtsa-bin/Perma...=EPD+100000-109999&site=omatconec&pagetitle=M
and so on.
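Since I can't reach your site, here's a minimal sketch of the crawl logic in Python (in XL2003 you'd do the same thing in VBA, e.g. driving InternetExplorer.Application or MSXML2.XMLHTTP): walk the folder links breadth-first, collecting document ids as you go. The `foldername=` pattern and the toy pages are assumptions based on your sample URLs, not your site's real markup.

```python
import re
from collections import deque

# Toy stand-in for the real site: maps a URL to the HTML it returns.
# In practice you would fetch each page instead of reading a dictionary.
FAKE_PAGES = {
    "/top.htm": '<a href="/f1.htm?foldername=M+R">M R</a>'
                '<a href="javascript:openDocument(\'0900043d802b3528\');">d1</a>',
    "/f1.htm?foldername=M+R":
        '<a href="/f2.htm?foldername=EPD">EPD</a>'
        '<a href="javascript:openDocument(\'0900043d802b9999\');">d2</a>',
    "/f2.htm?foldername=EPD":
        '<a href="javascript:openDocument(\'0900043d802b1234\');">d3</a>',
}

FOLDER_RE = re.compile(r'href="([^"]*foldername=[^"]*)"')  # subfolder links
DOC_RE = re.compile(r"openDocument\('([0-9a-f]+)'\)")      # document ids

def crawl(start_url, fetch):
    """Breadth-first walk of the folder tree; returns all document ids."""
    seen, queue, doc_ids = {start_url}, deque([start_url]), []
    while queue:
        html = fetch(queue.popleft())
        doc_ids.extend(DOC_RE.findall(html))
        for link in FOLDER_RE.findall(html):
            if link not in seen:  # don't revisit folders (guards against loops)
                seen.add(link)
                queue.append(link)
    return doc_ids

ids = crawl("/top.htm", FAKE_PAGES.get)
print(ids)  # ids from all three levels of the toy tree
```

The same queue-plus-seen-set shape works no matter how deep the tree goes, which is the part that doesn't depend on your site's specific HTML.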
A sample link for a single document within one of the pages in the web
tree/directory is:
javascript:openDocument('0900043d802b3528');
where clicking that link ultimately opens:
http://ourserver.com/Documentation/03451TRs142.pdf
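Once you have a page's HTML (in VBA, e.g. the IE document object's innerHTML), the document id and the link's display text can be pulled out together, which gives you a label for each hyperlink in the workbook. A Python sketch; the one-line sample anchor is an assumption about the attribute layout, based on your example:

```python
import re

# Sample anchor in the shape your javascript links suggest (assumed layout).
html = "<a href=\"javascript:openDocument('0900043d802b3528');\">03451TRs142.pdf</a>"

# Capture the id inside openDocument(...) and the visible link text together.
pair_re = re.compile(r"openDocument\('([0-9a-f]+)'\);\">([^<]+)</a>")
pairs = pair_re.findall(html)
print(pairs)
```

If the real anchors differ (extra attributes, different quoting), only the regex needs adjusting; the id-plus-text pairing is the useful output either way.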
Ultimately I need to recreate all the links in an Excel workbook so users
can click a hyperlink and access the relevant document. An Excel
hyperlink that uses the javascript:openDocument command is totally fine with
me, but first I need to collect them all. Alternatively, I'll have to figure
out how to cycle through each javascript command anyway, then identify the
URL it opened (which sounds harder).
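One low-tech way to get the collected links into a workbook: write a CSV whose cells contain =HYPERLINK(...) formulas, which Excel evaluates on import. A Python sketch, with a hypothetical (id, name, URL) triple standing in for the crawler's real output; note that Excel may not follow javascript: hyperlinks, so the direct /Documentation/... URL form from your example is probably the safer target:

```python
import csv
import io

# Hypothetical triples collected by the crawler; the real mapping from
# document id to direct URL would have to come from the site itself.
docs = [
    ("0900043d802b3528", "03451TRs142.pdf",
     "http://ourserver.com/Documentation/03451TRs142.pdf"),
]

buf = io.StringIO()
writer = csv.writer(buf)
for doc_id, name, url in docs:
    # A cell beginning with '=' is parsed as a formula when Excel opens the CSV.
    writer.writerow([doc_id, '=HYPERLINK("%s","%s")' % (url, name)])

output = buf.getvalue()
print(output)
```

In practice you'd write `buf.getvalue()` to a .csv file and open it in Excel; the csv module handles the quote-doubling that the embedded quotation marks require.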
Any advice or code snippets would be greatly appreciated; I haven't done
anything with HTML at all.
Thanks,
Keith