parsing a tagged text file from PubMed

M

Mike Hodkin

I have a text file retrieved from PubMed which has multiple citations, all
formated as seen below. How can I best use VBA to parse the text associated
with particular tags, e.g. "PMID" or "TI", into their own fields?
Obviously, each citation would be a new record. I suppose this is what the
commercial reference managers do, e.g. EndNote



thanks,

MJH





UI - 21029628

PMID- 11164634

OWN - NLM

STAT- completed

DA - 20010112

DCOM- 20010301

IS - 0303-2647

VI - 58

IP - 1-3

DP - 2000 Oct-Dec

TI - Coincidence detection in the Hodgkin-Huxley equations.

PG - 83-91

FAU - Marsalek, P

AU - Marsalek P

LA - eng

PT - Journal Article

PL - Ireland

TA - Biosystems

JID - 0430773

SB - IM

MH - Auditory Pathways/*physiology

MH - Hearing/*physiology

MH - Support, Non-U.S. Gov't

EDAT- 2001/02/13 11:00

MHDA- 2001/03/07 10:01

AID - S0303264700001106 [pii]

PST - ppublish

SO - Biosystems 2000 Oct-Dec;58(1-3):83-91.
 
H

HSalim

Mike,
Your example is good but incomplete, so I am going to make these
assumptions:
1. There are multiple records in the text field
2. UI indicates the start of each record;
3. The numbers following UI will be used as the primary key
4. Since there are multiple MH records, they will be sent to a separate
table

You can call the code from anywhere if you put it in a module. - example
call readfile("C:\temp\newsgrp1.txt", "Citations", "MH")

You must add a reference to MIcrosoft Scripting Runtime.

see table and code listing below:

I hope this is what you wanted.
HS
------------------------------------------------------
For this test I created two tables -
Citations ( UI char(50) primary key,
PMIDchar(50),
OWNchar(50),
STATchar(50),
DA char(50),
DCOMchar(50),
IS char(50),
VI char(50),
IP char(50),
DP char(50),
TI char(100),
PG char(50),
FAUchar(50),
AU char(50),
LA char(50),
PT char(50),
PL char(50),
TA char(50),
JIDchar(50),
SB char(50),
EDAT char(50),
MHDA char(50),
AID char(50),
PST char(50),
SO char(50))

---------------------------------------------
tblMH
UI char(50),
MH Char(50),
RowID longint, primary key, Autonumber
---------------------------------------------
Public Sub readfile(Filespec As String, DestTable As String, mhtable As
String)

Dim fso As New FileSystemObject _
, cfile _
, dbs As DAO.Database _
, rst As DAO.Recordset _
, Rowid As String _
, prefix As String _
, LineData As String _
, counter As Integer _
, strUI As String


Set dbs = CurrentDb
Set rst = dbs.OpenRecordset(DestTable, dbOpenDynaset, dbAppendOnly)
Set rstmh = dbs.OpenRecordset(mhtable, dbOpenDynaset, dbAppendOnly)
counter = 0
Set cfile = fso_OpenTextFile(Filespec, ForReading, False)

' Begin reading the file
While Not (cfile.AtEndOfStream)
currline = cfile.ReadLine

If Len(Trim(currline)) > 0 Then ' skip blank lines
Debug.Print currline ' for testing only
prefix = Trim(Left(currline, InStr(1, currline, "-",
vbTextCompare) - 1))
LineData = Trim(Mid(currline, InStr(1, currline, "-", vbTextCompare)
+ 1))

Select Case prefix
Case "UI" 'Special Handling for UI since it marks the beginning of a
record
If counter > 0 Then rst.Update 'suppress the update till the
second occurance of UI
rst.AddNew
counter = counter + 1
rst!UI = LineData
strUI = LineData ' carry UI number for MH records
Case "MH" ' Move MH data to separate table, use UI as foreign key
rstmh.AddNew
rstmh!UI = strUI
rstmh!mh = LineData
rstmh.Update
Case Else 'all other fields can go to Citations table
rst.Fields(prefix) = LineData
End Select
End If
Wend
rst.Update 'Update the last set of rows

End Sub
-------------------------------------
 
M

Mike Hodkin

Thanks for the detailed reply...I only need a few certain fields to create a
bibliography reference, namely AU, TI, etc. (no MH fields needed), so my
needs are simpler than you assumed. Your example code will get me started,
and I'm sure I can learn much from it.

thanks again...
Mike
 
Top