B
Boris Zakharin
I have a VBA method to read log files and parse data into the Access
database. I have been told that the speed at which my method runs now is
unacceptable, so I'm looking for any advie to speed this up.
The structure of the file is (at high level) as follows:
The log represents the interaction of users with the company's mail server.
Each entry describes an event such as mail sent, mail received, etc
Each entry is exactly one line in length
Each entry has a time stamp (the date is in the file name)
Each entry has an ID number that is guaranteed to be the same for all events
associated with one message (that is, this number is the same for the send
event and for all receive events of the same message)
Now, my current approach is as follows:
Using Scripting.FileSystemObject I get each file name using a for each
construct
I get the date from the file name using FormatDateTime(Left(f1.Name, 2) &
"/" & Mid(f1.Name, 3, 2) & "/" & year, vbShortDate)
I open the file using OpenAsTextStream(ForReading)
I read each line using vbCrLf & ts.readline & vbCrLf
From each line, I try to extract the time, sender, recipient, line id, etc
(even though they cannot all be in the same line at once)
To locate information in the line (this is done by calling a separate
function) I use InStr and InStrRev to get the left and right delimiters then
return the string produced by the Mid stratement
Based on which information is available, I call an append query using
CurrentDb.QueryDefs and pass the information as parameters
If it's a send event I first delete all previous events with the same ID
then i add this line to the senders table
If it's a receive event I first add the line to the recipients table
Then I open a query which joins the senders table and the final table as a
recordsert
If the recordset has only one entry, I pass its fields as parameters to an
append query to insert into the final table
If it has more than one entry I use SELECT TOP 1 (I sort on an autonumber
field so that the most recently inserted will be at the top)
There are other details that have not gone into, but here's an example of a
query I use to join the senders and recipients table. There's also a num:
SELECT s.rDate, s.rTime, s.rFrom, [@rTo] AS rTo, n.nr, s.lc, [@rTime] AS
sTime
FROM senders AS s, numRecs AS n, email_data AS e
WHERE s.rFrom=e.rTo And s.lc=e.lc And s.lc=n.lc And s.rFrom=n.rFrom And
n.nr='1' And s.lc=[@lc] And e.rFrom=[@rTo];
Here numRecs has the number of recipients which is also read from the same
line as the name of the sender and email_data is the final tasble. This is
required because a reply to a message has the same ID as the original
message nad is being loomed up in the part of the table already created (lc
is the ID number and nr is the number of recipients)
Does this sound like there could be a better way to do this?
Thanks,
Boris Zakharin
database. I have been told that the speed at which my method runs now is
unacceptable, so I'm looking for any advie to speed this up.
The structure of the file is (at high level) as follows:
The log represents the interaction of users with the company's mail server.
Each entry describes an event such as mail sent, mail received, etc
Each entry is exactly one line in length
Each entry has a time stamp (the date is in the file name)
Each entry has an ID number that is guaranteed to be the same for all events
associated with one message (that is, this number is the same for the send
event and for all receive events of the same message)
Now, my current approach is as follows:
Using Scripting.FileSystemObject I get each file name using a for each
construct
I get the date from the file name using FormatDateTime(Left(f1.Name, 2) &
"/" & Mid(f1.Name, 3, 2) & "/" & year, vbShortDate)
I open the file using OpenAsTextStream(ForReading)
I read each line using vbCrLf & ts.readline & vbCrLf
From each line, I try to extract the time, sender, recipient, line id, etc
(even though they cannot all be in the same line at once)
To locate information in the line (this is done by calling a separate
function) I use InStr and InStrRev to get the left and right delimiters then
return the string produced by the Mid stratement
Based on which information is available, I call an append query using
CurrentDb.QueryDefs and pass the information as parameters
If it's a send event I first delete all previous events with the same ID
then i add this line to the senders table
If it's a receive event I first add the line to the recipients table
Then I open a query which joins the senders table and the final table as a
recordsert
If the recordset has only one entry, I pass its fields as parameters to an
append query to insert into the final table
If it has more than one entry I use SELECT TOP 1 (I sort on an autonumber
field so that the most recently inserted will be at the top)
There are other details that have not gone into, but here's an example of a
query I use to join the senders and recipients table. There's also a num:
SELECT s.rDate, s.rTime, s.rFrom, [@rTo] AS rTo, n.nr, s.lc, [@rTime] AS
sTime
FROM senders AS s, numRecs AS n, email_data AS e
WHERE s.rFrom=e.rTo And s.lc=e.lc And s.lc=n.lc And s.rFrom=n.rFrom And
n.nr='1' And s.lc=[@lc] And e.rFrom=[@rTo];
Here numRecs has the number of recipients which is also read from the same
line as the name of the sender and email_data is the final tasble. This is
required because a reply to a message has the same ID as the original
message nad is being loomed up in the part of the table already created (lc
is the ID number and nr is the number of recipients)
Does this sound like there could be a better way to do this?
Thanks,
Boris Zakharin