extracting email address out of message body

C

CompleteNewb

I've seen some posts about this before, but the suggestions didn't work at
all when I tried them (probably my fault).

We send a newsletter out TO PEOPLE WHO SUBSCRIBE to it (we are NOT spammers,
and include opting out options in every mailing). Again, we're not
spammers, and that's why I've been charged with keeping the mailing list
clean. We get a lot of "mailbox full" or "user not found" bounces, and
after we get three of these per address, I want to take that email address
out of our distribution list.

So, I've been using Outlook to import all the bounces from our server, then
I export the bounces to Access. Now in Access I've got a "body" field that
has the entirety of the bounce message, which differs across all ISP's.
Some say the reason, some just say it didn't get through, and some have
cryptic messages that don't really say much of anything. Also, some include
the first 20 lines or whatever of the original outgoing message, some have
the whole thing, some have the plain text version, some put weird characters
in, etc. The point here is that the email address is never in the same
place, so I can't count on code that looks in a specific place inside the
body. One thing they all have, though, is the email address the original
message was sent to (I can't use the "From" filed, because it's just the
postmaster, or sysadmin, or whatever, of the ISP) SOMEWHERE in the body.

What I'd like to do is extract all the email addresses out of the bodies of
all these bounces, into a new field for that record. I tried several
permutations of my VERY limited knowledge of VBA, trying to locate the @
symbol, then getting all the characters to the right and left of the symbol
until a space is encountered. I couldn't figure out how to do this,
however.

Can someone here help me with this? I don't mind getting our own sending
address as well as the recipient's (a lot of bounces include the sender's
email address in the bounce message as well as the failed destination
address), because I figure I can run through the new field with just the
email addresses and delete our own sending address, leaving the recipient's.
Or, perhaps I'm going about it all wrong to begin with. If this is the
wrong group to post to, please help direct me to a more appropriate group if
you could.

Any help, advice, samples, etc. would be greatly appreciated, and thanks for
reading.
 
J

John Spencer

Assuming that you have a table with all the addresses you sent to you MIGHT
be able to use a non-equi join.
SELECT tblEmail.*, tblSourceList.Email
FROM tblEmail INNER JOIN tblSourceList
ON tblEmail.BigField LIKE "*" & tblSourceList.Email & "*"

I would guess that this might fail to work if tblEmail.BigField is a memo
field.

Otherwise, I think you are going to have to write a custom VBA function to
extract the Email address from the field.
UNTESTED AIRCODE follows

Public Function fProbableEmail(strIN)
Dim vArray as Variant
Dim I as Long
Dim strReturn as String

if Len(strIN & "") = 0 then
fProbableEmail = strIN
ELSe
vArray = Split(strIn," ")
For I = Lbound(vArray) to UBound(vArray)
If vArray(I) LIKE "*[@]*" Then
If vArray(I) LIKE "*[email protected]*" Then
'Screen out your address
Else
fProbableEmail= vArray(I)
exit For
End if

End If
Next I
End if

End Function

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
..
 
C

CompleteNewb

John, your vba worked virtually flawlessly. Seriously, I was flabbergasted.
You should have seen the pages of crap I wrote while trying to experiment
and do this, and your one small paragraph was almost perfect. I am in awe.

I have a few questions, and I hope you have the time to address them (I feel
guilty asking for more help when your one snippet worked so well, but I feel
I should try, and I'm just as curious to learn as I am to save countless
hours of fiddling).

1) The snippet you provided seems to get the first email address to show
up, and then it goes to the next record, skipping any subsequent email
addresses in the field. This is okay with me, as the majority of bounce
messages have the first email address in the body as the one that couldn't
be reached, but not all of them. I realize if I get all of them, then I'll
have several in what the function returns (like my own email address when it
says who the sender was, that weird message ID one that's always something
like <[email protected]>, etc.). I removed the
If..Then that checks for the mydomain.com addresses and ignores them
(because I didn't know what to replace the comment line with in your
snippet; I'm quite the newb), so I'm not sure why it only gets the first one
and then moves to the next record. Is it because your For..Next procedure
moves on after the first capture? Is that the nature of the For..Next
function, that once it gets a successful datum it goes to Next? Do you know
how I can get ALL the email addreses from this memo field, not just the
first one? The biggest reason I ask is that some ISP's will list ALL the
email addresses that got bounced in one bounce message for that domain,
instead of a separate bounce message per address in that domain.

2) Assuming I CAN accomplish the above, what would I use to either ignore
the email addresses that have the mydomain.com in them, or as a second step
remove them from the new field once the first procedure gets all the emails
into that field?

3) I am seeing that a lot of these bounces use some kind of character that
I can't see after the email address that must be something like a carriage
return, because I get some text from the line AFTER the email address until
there's a space in THAT line. For instance, several hundred of the records
have in the new field:

(e-mail address removed)
Reason:

So in the bounce message, it has the failed email address, some sort of
"Next line" code that I can't see in the body, then on the next line
"Reason: User mailbox over quota" or whatever, so your code gets the email
address, new line, and everything up until the first space. I tried
copy/pasting whatever invisible character is after the email address and
using that in place of the " " in your code, but it didn't work. I'm not
even sure the character (whatever it might be) takes up a space I can copy.
I also tried using the vbCrLf and Chr$(13) & Chr$(10), but those didn't work
either. When I export to excel instead of Access, I see a lot of weird
special characters, like the upright thick black line that looks like it's
where carriage returns are, and I don't see these weird charatcers when
exporting to Access. Again, this isn't a huge deal, because I can make
several trips through in table view just uding Find/Replace to get rid of
whatever the most common net line words are (Reason:, Cause:, etc.). I was
just wondering if it's something I can do inclusively with the first step.
Is there some type of intermediate step I can take, like telling Outlook to
export the raw content, or plain text of the bounce messages? Is this
something I should ask the Outlook groups about?

Whether you can help with these issues or not, my PROFUSE thanks and serious
kudos for your expertise. I was almost ashamed of myself when I saw how
small your aircode was, when thinking of how long I took to try out those
big half-page vba attempts. I SERIOUSLY need to get some VBA training, but
every class I took never seemed to be all that useful in application. I
learned more from experimenting than I did in class, but that has also led
to some VERY inefficient code.

Thanks again.





John Spencer said:
Assuming that you have a table with all the addresses you sent to you
MIGHT be able to use a non-equi join.
SELECT tblEmail.*, tblSourceList.Email
FROM tblEmail INNER JOIN tblSourceList
ON tblEmail.BigField LIKE "*" & tblSourceList.Email & "*"

I would guess that this might fail to work if tblEmail.BigField is a memo
field.

Otherwise, I think you are going to have to write a custom VBA function to
extract the Email address from the field.
UNTESTED AIRCODE follows

Public Function fProbableEmail(strIN)
Dim vArray as Variant
Dim I as Long
Dim strReturn as String

if Len(strIN & "") = 0 then
fProbableEmail = strIN
ELSe
vArray = Split(strIn," ")
For I = Lbound(vArray) to UBound(vArray)
If vArray(I) LIKE "*[@]*" Then
If vArray(I) LIKE "*[email protected]*" Then
'Screen out your address
Else
fProbableEmail= vArray(I)
exit For
End if

End If
Next I
End if

End Function

--
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
.

CompleteNewb said:
I've seen some posts about this before, but the suggestions didn't work
at all when I tried them (probably my fault).

We send a newsletter out TO PEOPLE WHO SUBSCRIBE to it (we are NOT
spammers, and include opting out options in every mailing). Again, we're
not spammers, and that's why I've been charged with keeping the mailing
list clean. We get a lot of "mailbox full" or "user not found" bounces,
and after we get three of these per address, I want to take that email
address out of our distribution list.

So, I've been using Outlook to import all the bounces from our server,
then I export the bounces to Access. Now in Access I've got a "body"
field that has the entirety of the bounce message, which differs across
all ISP's. Some say the reason, some just say it didn't get through, and
some have cryptic messages that don't really say much of anything. Also,
some include the first 20 lines or whatever of the original outgoing
message, some have the whole thing, some have the plain text version,
some put weird characters in, etc. The point here is that the email
address is never in the same place, so I can't count on code that looks
in a specific place inside the body. One thing they all have, though, is
the email address the original message was sent to (I can't use the
"From" filed, because it's just the postmaster, or sysadmin, or whatever,
of the ISP) SOMEWHERE in the body.

What I'd like to do is extract all the email addresses out of the bodies
of all these bounces, into a new field for that record. I tried several
permutations of my VERY limited knowledge of VBA, trying to locate the @
symbol, then getting all the characters to the right and left of the
symbol until a space is encountered. I couldn't figure out how to do
this, however.

Can someone here help me with this? I don't mind getting our own sending
address as well as the recipient's (a lot of bounces include the sender's
email address in the bounce message as well as the failed destination
address), because I figure I can run through the new field with just the
email addresses and delete our own sending address, leaving the
recipient's. Or, perhaps I'm going about it all wrong to begin with. If
this is the wrong group to post to, please help direct me to a more
appropriate group if you could.

Any help, advice, samples, etc. would be greatly appreciated, and thanks
for reading.
 
J

John Spencer

Public Function fProbableEmail(strIN)
Dim vArray as Variant
Dim I as Long
Dim strReturn as String
Dim strMailAddress as string

if Len(strIN & "") = 0 then
fProbableEmail = strIN
exit function
ELSe
vArray = Split(strIn," ")
For I = Lbound(vArray) to UBound(vArray)
If vArray(I) LIKE "*[@]*" Then
If vArray(I) LIKE "*@YourDomain.com*" Then
'Screen out addresses at your domain
Else
strMailAddress = vArray(I)
'look for CR and strip to the Cr
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & vbCr, vbCr)-1)
'look for LineFeed and strip to the linefeed
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & vbLF, vbLf)-1)
'look for Tab and strip to the Tab
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & VbTab, VbTab)-1)
'look for Comma and strip to the Comma
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & ",", ",")-1)

strReturn = "; " & strMailAddress & strReturn

End if

End If
Next I
End if

fProblemEmail = Mid(strReturn, 3)

End Function




'====================================================
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
'====================================================
 
C

CompleteNewb

John:

Thanks so much for giving it a go, but that returns nothing for all the
fields.

I'm already impressed with the first try, and I'll see if I can muddle
through tacking on some secondary routines to run once I have the new field
created. It already saves me amazing amounts of time.

I may post on here again if I can't figure out some things after
experimenting for a while, but your code has put me on the right path and
shown me some commands I've not used before.

Thanks again, your time has been VERY MUCH appreciated.

The Complete and Thankful Newb

John Spencer said:
Public Function fProbableEmail(strIN)
Dim vArray as Variant
Dim I as Long
Dim strReturn as String
Dim strMailAddress as string

if Len(strIN & "") = 0 then
fProbableEmail = strIN
exit function
ELSe
vArray = Split(strIn," ")
For I = Lbound(vArray) to UBound(vArray)
If vArray(I) LIKE "*[@]*" Then
If vArray(I) LIKE "*@YourDomain.com*" Then
'Screen out addresses at your domain
Else
strMailAddress = vArray(I)
'look for CR and strip to the Cr
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & vbCr, vbCr)-1)
'look for LineFeed and strip to the linefeed
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & vbLF, vbLf)-1)
'look for Tab and strip to the Tab
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & VbTab, VbTab)-1)
'look for Comma and strip to the Comma
strMailAddress = Left(strMailAddress, _
Instr(1, strMailAddress & ",", ",")-1)

strReturn = "; " & strMailAddress & strReturn

End if

End If
Next I
End if

fProblemEmail = Mid(strReturn, 3)

End Function




'====================================================
John Spencer
Access MVP 2002-2005, 2007
Center for Health Program Development and Management
University of Maryland Baltimore County
'====================================================

John, your vba worked virtually flawlessly. Seriously, I was
flabbergasted. You should have seen the pages of crap I wrote while
trying to experiment and do this, and your one small paragraph was almost
perfect. I am in awe.

I have a few questions, and I hope you have the time to address them (I
feel guilty asking for more help when your one snippet worked so well,
but I feel I should try, and I'm just as curious to learn as I am to save
countless hours of fiddling).
 
E

EllenM

Hello,
If you're still in the process of parsing out your e-mail addresses, try
this query:

SELECT BounceTable.BounceField,
rgxExtract([BounceField],"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",0) AS
Email1,
Replace([BounceField],(rgxExtract([BounceField],"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",0)),"")
AS BounceMinus1,
rgxExtract([BounceMinus1],"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",0) AS
Email2,
Replace([BounceMinus1],(rgxExtract([BounceMinus1],"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",0)),"")
AS BounceMinus2,
rgxExtract([BounceMinus2],"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",0) AS
Email3,
Replace([BounceMinus2],(rgxExtract([BounceMinus2],"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",0)),"") AS BounceMinus3
FROM BounceTable;

You'll also need the "rgxExtract" function found in
http://www.j.nurick.dial.pipex.com/Code/vbRegex/rgxExtract.htm. For this
function to work, you'll need to remove all tabs and returns from your
bounces. Your bounces should be in one single line.

The query will find the first three e-mail addresses. There will also be 3
fields with the bounce minus 1,2 or 3 e-mail addresses.

Repeat the query with the "BounceMinus3" field to strip out the next three
e-mail addresses.

Continue repeating the process till there's no more addresses to be
extracted out. Then you can pick out addresses that are found in your
subscriber table.

For more information on regular expressions, search this forum for "isolate
patterns", the post where I learned about regular expressions in Access.
(Thank you John Spencer!!)

Hope this is helpful.

Ellen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top