Need to open HTML Document

Mark

Hello, I have code in my program that can open HTML Files and
search them for hrefs. There are also HTML Documents that I need to
search. If selected, they open in IE; if I right-click and choose
"Open With", they open as a text file. Is there a way to do this in my
code?

T Lavedas

What do you mean by HTML *files* as opposed to HTML *documents*? Are
the files local and the documents accessed over the web? Also, what
do you mean by *selected*? I thought you said they were being
accessed by code - as text files with FSO, I suppose. If so, I don't
see how they could *open*. Please explain how this happens.

I believe you posted some code yesterday, but since I didn't
understand the distinction between files and documents, I didn't
respond. Since you are asking the question again, I thought I'd chime
in to get clarification. Then maybe I or someone else will be able to
respond with something useful.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/

Mark

Hello

Well, basically, there were two types of files. When double-clicked,
one would open in IE and the other in Notepad, but both are HTML. The
one that opens in IE is named "HTML Document", and the one that opens
in Notepad is "HTML File". I went into Folder Options and changed it
so that both of them open in Notepad, so my program can search the
source code for hrefs. I did this because it was finding hrefs in some
files but not in others.

After researching a bit more, I have found the problem. In the files
where the hrefs are found, they sit on one line between the <P></P>
tags in the source code. In the ones it is not finding, there are
multiple hrefs within the tags. The code I have to search for the
hrefs is below, and I think I need to add more code to grab these
others. Any ideas?

Private Function GetHrefs(ByVal html, ByVal strFilex, ByVal strTitlex)
    Dim re, matches, match, d, uri, name, r As Long, c As Range, Lrow As Long
    Dim saveLink
    Dim iRet
    saveLink = False

    ' Match <a ... href=...>text</a>; group 1 = URL, group 2 = link text
    Set re = CreateObject("vbscript.regexp")
    re.Pattern = "<a\s+.*?href=[""\']?([^""\' >]*)[""\']?[^>]*>(.*?)<\/a>"
    re.IgnoreCase = True
    re.MultiLine = True
    re.Global = True
    Set matches = re.Execute(html)

    For Each match In matches
        ' InspectLink, GetURLAddress, GetURLTitle and GetType are
        ' helper functions defined elsewhere in the program
        iRet = InspectLink(GetURLAddress(match))
        If (iRet > 0) Then
            ' Globalindx: worksheet row counter declared outside this function
            Cells(Globalindx, 2) = strFilex
            Cells(Globalindx, 3) = strTitlex
            Cells(Globalindx, 4) = GetURLTitle(match)
            Cells(Globalindx, 5) = GetURLAddress(match)
            Cells(Globalindx, 6) = GetType(iRet)
            Globalindx = Globalindx + 1
        End If
    Next

    Set matches = Nothing
    Set re = Nothing
End Function
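One likely cause, offered as an assumption given that the links that are found sit on a single line: in VBScript's RegExp, `.` does not match a line break, and `MultiLine = True` only changes how `^` and `$` behave. So any anchor whose tag or link text wraps onto another line fails the `.*?` parts of the pattern above. A minimal sketch of a line-break-tolerant pattern follows; `FindHrefs` is a hypothetical name, and it returns only the URLs rather than writing to the sheet.

```vbscript
' Sketch: collect every href from an HTML string with VBScript's RegExp.
' [^>]*? and [\s\S]*? replace .*? because "." does not match a line
' break, so tags or link text that wrap across lines were skipped.
' FindHrefs is a hypothetical name, not part of the original program.
Function FindHrefs(ByVal html)
    Dim re, matches, m, result(), i
    Set re = CreateObject("VBScript.RegExp")
    ' Group 1 = quoted or unquoted href value, group 2 = link text
    re.Pattern = "<a\s[^>]*?href\s*=\s*[""']?([^""'\s>]*)[""']?[^>]*>([\s\S]*?)</a>"
    re.IgnoreCase = True
    re.Global = True              ' return every match, not just the first
    Set matches = re.Execute(html)
    If matches.Count = 0 Then
        FindHrefs = Array()       ' empty array when nothing matches
        Exit Function
    End If
    ReDim result(matches.Count - 1)
    i = 0
    For Each m In matches
        result(i) = m.SubMatches(0)   ' keep just the URL
        i = i + 1
    Next
    FindHrefs = result
End Function
```

The same substitution (`[^>]*?` and `[\s\S]*?` in place of `.*?`) could be made directly in the `re.Pattern` line of GetHrefs; the htmlfile approach suggested in the reply below avoids regex parsing altogether.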

T Lavedas

Yes, I would enlist IE to do all of the searching and parsing,
something like this ...

Function ListHref(sHTMLText)
  Dim s
  with CreateObject("htmlfile")
    .write(sText)
    .close
    for each sTagType in Array("A", "Base", "Link", "Area")
      set tags = .parentWindow.document.body.all.tags(sTagType)
      for each tag in tags
        s = s & tag.href & vbnewline
      next ' tag
    next ' tagtype
    ListHref = Split(s, vbnewline) ' returns an array
  end with
End Function

Just pass it the test read from your HTML file and it returns an array
of all the HREFs, on reference per element. You can use a For Each
loop on the array to place the results into your spreadsheet.

Tom Lavedas
===========
http://members.cox.net/tglbatch/wsh/

T Lavedas

Oops, there's an error in the posted code. This line ...

.write(sText)

should read ...

.write(sHTMLText)

Also, there are two typos in the last paragraph. It should have
read ...

"Just pass it the text read from your HTML file and it returns an
array of all the HREFs, one reference per element. You can use a For
Each loop on the array to place the results into your spreadsheet."

Tom Lavedas
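Folding the `.write(sHTMLText)` correction into the function, a minimal Excel VBA sketch of the For Each loop described above might look like this. Two further tweaks are assumptions, not part of the posted code: the tags are looked up on the whole document rather than only `body` (since `<base>` and `<link>` normally sit in `<head>`), and the trailing separator is dropped so `Split` does not return an empty last element. `WriteHrefsToSheet`, its start-row parameter, and the choice of column 5 are hypothetical.

```vbscript
' Sketch (Excel VBA, also parses as VBScript): ListHref with the
' .write correction applied, plus a hypothetical loop that writes one
' href per worksheet row.
Function ListHref(sHTMLText)
    Dim s, sTagType, tags, tag
    With CreateObject("htmlfile")
        .write sHTMLText   ' corrected: was sText
        .Close
        For Each sTagType In Array("A", "Base", "Link", "Area")
            ' Search the whole document, not just <body>, because <base>
            ' and <link> normally live in <head>
            Set tags = .getElementsByTagName(sTagType)
            For Each tag In tags
                ' Skip elements with no href (e.g. <a name="...">)
                If Len(tag.href) > 0 Then s = s & tag.href & vbNewLine
            Next
        Next
    End With
    ' Drop the trailing separator so Split does not add an empty last element
    If Len(s) > 0 Then s = Left(s, Len(s) - Len(vbNewLine))
    ListHref = Split(s, vbNewLine)   ' zero-based array of hrefs
End Function

' Hypothetical helper: lStartRow and column 5 are assumptions chosen to
' mirror the Cells(Globalindx, 5) column used in GetHrefs.
Sub WriteHrefsToSheet(ByVal sHTMLText, ByVal lStartRow)
    Dim vHref, r
    r = lStartRow
    For Each vHref In ListHref(sHTMLText)
        Cells(r, 5) = vHref
        r = r + 1
    Next
End Sub
```

Called as `WriteHrefsToSheet html, 2`, where `html` holds the text read from the file, it would fill column E from row 2 down. Note that `tag.href` returns the URL as the parser resolved it, so relative links may not come back exactly as written in the source.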