Ker_01
I'm parsing an HTML file. Originally I thought I only needed to capture
all the links, and the following worked well in my particular application
(a sample HTML snippet is pasted at the bottom of the post):
^<A HREF=.*>
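For reference, here is a minimal Python sketch of how that first expression behaves. It assumes the application had case-insensitive matching on (the HTML writes `Href`, the pattern `HREF`) and multiline mode on (so `^` matches at the start of every line, not only the start of the file); both are assumptions about the original setup. The sample tag is written as `javascript:openDocument(...)`, on the assumption that the post's `javascriptpenDocument` lost a `:o` in posting.

```python
import re

# Two lines in the same shape as the sample HTML at the bottom of the post.
# (Assumes the tag reads "javascript:openDocument"; the post shows "javascriptpenDocument".)
sample = (
    "<A Href=javascript:openDocument('0900043d803a1ce4');>\n"
    "<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>\n"
)

# IGNORECASE: the HTML writes "Href", the pattern writes "HREF".
# MULTILINE: lets ^ match at the start of every line, not just the string.
link_re = re.compile(r"^<A HREF=.*>", re.IGNORECASE | re.MULTILINE)

links = link_re.findall(sample)
print(links)
```

Because `.` does not match a newline by default, `.*` stops at the end of the line, so this pattern only ever returns single-line link tags; the `<img` line is skipped because it does not start with `<A`.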
However, I've now found that I only need to capture and process certain
links. The information that determines whether a link needs to be
processed is buried between that link and the next link (or EOF), so
I need to capture a larger, multiline section of text and test each one to
see whether it contains my identifier. It appears that I can safely use the
</TR> tag as a delimiter: it always comes after my identifier and before the
next link (or EOF). So I'm trying to edit my regex to grab this larger,
multiline section of text; then, if the identifier is the correct one, I'll
use my first regular expression (or a slightly modified version of it) to
grab just the URL from within the match.
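The "slightly modified version" could be as small as putting the part after `HREF=` into a capture group. A minimal Python sketch (in .NET, `match.Groups[1].Value` plays the role of Python's `match.group(1)`); the tag text assumes `javascript:openDocument`, as noted in the code comment:

```python
import re

# Assumes the tag reads "javascript:openDocument"; the post shows "javascriptpenDocument".
tag = "<A Href=javascript:openDocument('0900043d803a1ce4');>"

# Same idea as ^<A HREF=.*>, but the attribute value sits in a capture group,
# and [^>]* stops at the first '>' instead of the last '>' on the line.
href_re = re.compile(r"<A HREF=([^>]*)>", re.IGNORECASE)

match = href_re.search(tag)
href = match.group(1) if match else None
print(href)
```

Using `[^>]*` rather than `.*` matters for lines such as the Back/Home links, where several tags share one line. Note that a quoted value like `'javascript:history.back();'` would be returned with its quotes.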
I've been using http://www.aivosto.com/vbtips/regex.html as a helpful reference
on regular expressions, but when I test my code on
http://regexlib.com/RETester.aspx I get no results (my first
expression worked fine). Any assistance would be greatly appreciated. I
think I'm pretty close, but the following isn't working:
^<A HREF=.*/TR>
Any advice? The only difference is replacing the single '>' with '/TR>'. I
suspect the problem has to do with spaces or line breaks, but I don't know for
certain.
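Line breaks are the likely culprit. By default `.` matches any character except a newline, so `.*` cannot run past the end of the `<A HREF=` line to reach a `/TR>` several lines down. Turning on "dot matches newline" (`re.DOTALL` in Python; `RegexOptions.Singleline`, or the inline `(?s)`, in .NET) removes that limit, but a greedy `.*` then runs to the *last* `/TR>` in the text, so the lazy `.*?` is needed to stop at the first one. A minimal Python sketch on a trimmed two-row sample (not a verbatim copy of the post's HTML):

```python
import re

# Two rows in the same shape as the sample HTML (trimmed; assumes "javascript:openDocument").
html = (
    "<A Href=javascript:openDocument('0900043d802b3528');>\n"
    "</a>\n"
    "<td class='classtd'>\nGreen-tipped Martin\n</td>\n"
    "</TR>\n"
    "<TR>\n"
    "<A Href=javascript:openDocument('0900043d803a1ce4');>\n"
    "</a>\n"
    "<td class='classtd'>\nLand Spread Vector\n</td>\n"
    "</TR>\n"
)

flags = re.IGNORECASE | re.MULTILINE

# Original attempt: "." stops at the newline, so ".*" never reaches "/TR>".
print(re.findall(r"^<A HREF=.*/TR>", html, flags))  # []

# DOTALL lets "." cross newlines, but greedy ".*" runs to the LAST "/TR>",
# swallowing both rows into a single match.
print(len(re.findall(r"^<A HREF=.*/TR>", html, flags | re.DOTALL)))  # 1

# Lazy ".*?" stops at the FIRST "/TR>" after each link: one match per row.
rows = re.findall(r"^<A HREF=.*?/TR>", html, flags | re.DOTALL)
print(len(rows))  # 2
```

In a tester without option checkboxes, prefixing the pattern with the inline flags, as in `(?ims)^<A HREF=.*?/TR>`, should have the same effect in both Python and .NET.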
I'm posting a sample of my much larger HTML below; I only want to
capture the ^<A HREF=.*> URL match for items whose classtd cell contains
"Land Spread Vector".
I prefer using multiple simple regular expressions over one donated
expression that does it all, so I can understand my own code and at least
attempt to troubleshoot it if I need to change anything.
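Putting the steps together as several small expressions, in line with that preference, might look like the Python sketch below. `links_for` and `SAMPLE` are hypothetical names; the sample is a trimmed copy of the HTML below (image lines dropped, `javascript:openDocument` assumed for the garbled `javascriptpenDocument`), and a plain substring test stands in for an identifier regex.

```python
import re

# Step 1 pattern: one chunk per link, from "<A HREF=" through the next "/TR>".
CHUNK_RE = re.compile(r"^<A HREF=.*?/TR>", re.IGNORECASE | re.MULTILINE | re.DOTALL)
# Step 3 pattern: the original single-line link expression.
LINK_RE = re.compile(r"^<A HREF=.*>", re.IGNORECASE | re.MULTILINE)


def links_for(html, identifier):
    """Hypothetical helper: return the link tag of every row containing `identifier`."""
    links = []
    for chunk in CHUNK_RE.findall(html):          # step 1: link ... </TR>
        if identifier in chunk:                   # step 2: test the identifier
            links.extend(LINK_RE.findall(chunk))  # step 3: pull out the link tag
    return links


# Trimmed sample (assumes "javascript:openDocument"; the post shows "javascriptpenDocument").
SAMPLE = """\
<A Href=javascript:openDocument('0900043d802b3528');>
101998
</a>
</td>
<td class='classtd'>
Green-tipped Martin
</td>
</TR>
<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d803a1ce4');>
101998 - APRRE - Assert.doc
</a>
</td>
<td class='classtd'>
Land Spread Vector
</td>
</TR>
<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d802b635e');>
101998-R
</a>
</td>
<td class='classtd'>
Reevaluation
</td>
</TR>
"""

print(links_for(SAMPLE, "Land Spread Vector"))
```

The substring test accepts the identifier anywhere in the row. If "Land Spread Vector" could also appear in the link text or another column, a stricter check such as `re.search(r"<td class='classtd'>\s*Land Spread Vector\s*</td>", chunk, re.IGNORECASE)` would pin it to a classtd cell.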
Thanks!
Keith
<A Href=javascript:openDocument('0900043d802b3528');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
101998
</a>
</td>
<td class='classtd'>
Green-tipped Martin
</td>
<td class='classtd'>
CURRENT,3.2
</td>
</TR>
<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d803a1ce4');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
101998 - APRRE - Assert.doc
</a>
</td>
<td class='classtd'>
Land Spread Vector
</td>
<td class='classtd'>
CURRENT,3.0
</td>
</TR>
<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d802b635e');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
101998-R
</a>
</td>
<td class='classtd'>
Reevaluation
</td>
<td class='classtd'>
CURRENT,1.0
</td>
</TR>
</TD></TR></TABLE><BR><BR>
<CENTER>
<A Href='javascript:history.back();'><img
src='/OurDir/images/back_down.jpg' border=0 align='center'
alt='Back'></A>
<A Href='javascript:goHome();'><img
src='/OurDir/images/home_down.jpg' border=0 align='center' alt='Home'></A>
</CENTER>
</BODY>
</HTML>