Akrobrat
Greetings all,
I am trying to extract the URLs of a set of animated movies from
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex, but I
have recently hit a brick wall with one particular site. I have
experimented with a number of patterns but have not yet gotten the
correct result.
The expected result is:
However, if I do get a non-null result back, it is usually:
---------------------- Sample Patterns Tested: ----------------------
..Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W?"
..Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
..Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"
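For context, here is a minimal sketch of how a pattern like the first one above might be run from VBA. It assumes the late-bound VBScript.RegExp object; the function name ExtractProdLinks and the html parameter are hypothetical and only there for illustration.

```vba
' Sketch only: ExtractProdLinks and its html argument are hypothetical names.
' Late binding is used, so no reference to the VBScript regex library is needed.
Function ExtractProdLinks(ByVal html As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim results As New Collection

    Set re = CreateObject("VBScript.RegExp")
    With re
        .Global = True       ' return every match, not just the first one
        .IgnoreCase = True   ' HTML tag and attribute names may vary in case
        ' First pattern from the list above, unchanged
        .Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W?"
    End With

    For Each m In re.Execute(html)
        results.Add m.SubMatches(0)   ' first capture group, i.e. the (.*?) part
    Next m

    Set ExtractProdLinks = results
End Function
```

One thing to keep in mind with this engine: "." does not match line breaks, so (.*?) cannot span an href value that wraps onto a second line in the page source.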
---------------------- Partial Source Data (from website): ----------------------
<div class="logo">
<a href="http://www.bestbuy.com/site/olspage.jsp?type=category&id=cat00000" name="&lid=hdr_logo"><img src="http://images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif" alt="Best Buy Logo"/></a>
</div>
<td class="skucontent">
<a href="/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen&type=product&id=1754542" class="prodlink">
Transformers - Widescreen Dubbed Subtitle AC3</a><br/>
---------------------- ---------------------- ----------------------
I'm most interested in keying on the class="prodlink" attribute, since
that is what marks an anchor tag as a movie URL. I know that regex in
VBA can be a bit tricky owing to the doubled double quotes and other
non-alphanumeric characters, but can any of you guys spot what I'm
doing wrong? Thanks for your help!