Read .TXT file line by line?

C

Charlotte E.

How to read a .TXT-file line by line?

I would like to read the first line of the .TXT-file,
Perform some action on the read data (the .TXT-line)

Read the next line - perform some action
Read the next line - perfom the action again
etc...

Until all lines of the .TXT-file is read, and the action is performed on
all lines (imported data)

My problem is that I have a HUGE .TXT-file of almost 3.000.000 lines!!!
I'm aware that going through all these lines, one-by-one, will probably
take a week or so, so I have to make my first test-program read a
smaller file, until I'm sure it is working.

But, how to read the file, line by line, in the first place, since I
can't import such a huge file in my trusty old XL2003?

Oh, I case you wonder what the 'action' is, I want to remove those lines
in the .TXT-file that doens't fullfill certain criteria, thus (hopefully
:) ending up with a .TXT-file of only approx. 105.000 lines, which then
can be converted into a webtable.


Thanks in advance...

CE
 
M

Mike S

How to read a .TXT-file line by line?

I would like to read the first line of the .TXT-file,
Perform some action on the read data (the .TXT-line)

Read the next line - perform some action
Read the next line - perfom the action again
etc...

Until all lines of the .TXT-file is read, and the action is performed on
all lines (imported data)

My problem is that I have a HUGE .TXT-file of almost 3.000.000 lines!!!
I'm aware that going through all these lines, one-by-one, will probably
take a week or so, so I have to make my first test-program read a
smaller file, until I'm sure it is working.

But, how to read the file, line by line, in the first place, since I
can't import such a huge file in my trusty old XL2003?

Oh, I case you wonder what the 'action' is, I want to remove those lines
in the .TXT-file that doens't fullfill certain criteria, thus (hopefully
:) ending up with a .TXT-file of only approx. 105.000 lines, which then
can be converted into a webtable.

Thanks in advance...
CE

How big is the entire file?

If you run this code (sourcefile contains the full path to the text
file) do you see an error?

Option Explicit

Public Function FileIntoString(sourcefile As String) As String
On Error GoTo FileIntoStringErr
Dim filenum As Long
filenum = FreeFile
Open sourcefile For Binary As #filenum
FileIntoString = Space(LOF(filenum))
Get #filenum, , FileIntoString
Close #filenum
Exit Function
'
FileIntoStringErr:
msgbox "Error in Function FileIntoString " & _
Err.Number & " " & Err.Description
FileIntoString = ""
End Function

If you don't see an error, you can use that code to read the entire file
into a string, then split the string into an array so that each line of
the file is in a separate array element very easily:

dim Filedata() as string

Filedata = Split(FileIntoString(FullPathToTextFile),vbCrLf)

Now that each line of the file is in a separate array element, you can
loop through all or some of them very easily. I like to make sure I
ignore blank lines, you may not need that protection:

Loop through all array elements:

Dim n as Long

For n = LBound(Filedata) To UBound(Filedata )
If Not Trim$(Filedata(n)) = "" Then
DoSomething Filedata(n)
End If
Next

For testing your code if you want to just work with a limited number of
elements of the array, e.g. 10. This assumes your Option Base is set to
or defaults to 0:

Syntax: Option Base {0 | 1}
Because the default base is 0, the Option Base statement is never
required. If used, the statement must appear in a module before any
procedures. Option Base can appear only once in a module and must
precede array declarations that include dimensions.
http://msdn.microsoft.com/en-us/library/gg251511.aspx

For n = LBound(Filedata) To 9
If Not Trim$(Filedata(n)) = "" Then
DoSomething Filedata(n)
End If
Next

Does that work for you? This is untested, if it doesn't work I'll be
happy to check it further, or if you like send me the text file zipped
if it doesn't contain any private or secure information and I'll debug
the code

Mike
 
H

Harald Staff

Hej Charlotte E.

Tricket er Line input:

Sub test()
Dim Linje As String, Avsn() As String
Dim iFnum As Integer
iFnum = FreeFile
Open "C:\Temp\Fil.Txt" For Input As #iFnum
While Not EOF(iFnum)
Line Input #iFnum, Linje
'eksempel "some action":
Avsn = Split(Linje, " ")
Debug.Print Avsn(0)
Wend
Close #iFnum
End Sub

HTH. Best wishes Harald
 
C

Charlotte E.

How big is the entire file?

It is 2.7 GB!

If you run this code (sourcefile contains the full path to the text
file) do you see an error?

I'll try this weekend, and return...

But, with the entire fle being 2.7 GB, I doubt it will fit into an array???


CE
 
A

Auric__

Translated from Danish as a convenience (Oversat fra dansk som en
bekvemmelighed; dersom norske, beklager, ser det samme til meg):
Hey Charlotte E.

The trick is Line Input:

Sub test()
Dim Linje As String, Avsn() As String
Dim iFnum As Integer
iFnum = FreeFile
Open "C:\Temp\Fil.Txt" For Input As #iFnum
While Not EOF(iFnum)
Line Input #iFnum, Linje
'example "some action":
Avsn = Split(Linje, " ")
Debug.Print Avsn(0)
Wend
Close #iFnum
End Sub

To clarify this one a bit, this text:
'example "some action":
Avsn = Split(Linje, " ")
Debug.Print Avsn(0)

....is where you check the line to see if it fits your criteria. To write
the lines that you want to keep to another file, do this:
Sub test()
Dim Linje As String
Dim iFnum As Integer, oFnum As Integer
iFnum = FreeFile
Open "C:\path\to\bigFile.txt" For Input As iFnum
oFnum = FreeFile
Open "output.txt" For Output As oFnum
While Not EOF(iFnum)
Line Input #iFnum, Linje
'...check criteria here...
Print #oFnum, Linje
Wend
Close #iFnum
End Sub

The method Mike S showed is *usually* faster than this way, but probably
not feasible with such a large file. (*I* wouldn't want to try it.)

Another problem is that Excel is limited in how many lines it can have in a
spreadsheet. I don't know what the limits are for 2003, but in 2000 it's
65,536 lines. If your final file has more lines than that, you'll need to
split it up.
 
I

isabelle

hi,

Sub Read_txt_File()
Dim ff As Integer, rws() As String, Temp As String, MyFile As String
ff = FreeFile
MyFile = "C:\MyFile.txt" '**** adapt
Open MyFile For Binary As #ff
Temp = String(FileLen(MyFile), " ")
Get #ff, , Temp
Close #ff
rws = Split(Temp, vbCrLf)
For i = LBound(rws) To UBound(rws)
Range("A" & i + 1) = rws(i)
Next
End Sub
 
J

Jim Cone

My turn <g> ...
Making the assumption you are using xl2007/2010, suggest you change
lngMaxRows to ~250000
ARR_Size to ~ 50,000

The above changes would give you ~12 columns with 250,000 rows in each column.
(something to tell Grandma about if it doesn't blow up)
'---
Public Function ReadTextFile(ByRef strFullPath As String, _
ByRef WS As Excel.Worksheet) As String
'JBC - Portland, Oregon USA - December, 2008
Dim objFso As Object
Dim oTextFile As Object
Dim colCounter As Long
Dim lngMaxRows As Long
Dim lngCount As Long
Dim lngMarker As Long
Dim N As Long
Dim strTemp As String
Dim vArr() As String
Const ARR_Size As Long = 20000

On Error GoTo ErrHandler
N = 1
lngCount = 1
colCounter = 1
lngMaxRows = CLng(WS.Rows.Count \ ARR_Size)
lngMaxRows = lngMaxRows * ARR_Size 'maximum rows used on sheet
ReDim vArr(1 To ARR_Size, 1 To 1)
Set objFso = CreateObject("Scripting.FileSystemObject")

Application.ScreenUpdating = False
'Reads every line and adds it the the array.
If objFso.FileExists(strFullPath) Then
Set oTextFile = objFso_OpenTextFile(strFullPath)
Do While Not oTextFile.AtEndOfStream
strTemp = oTextFile.Readline
If Len(strTemp) Then
vArr(N, 1) = strTemp
N = N + 1
If N > lngMaxRows Then
DoEvents
'Add array to sheet, switch to next column, reset variables.
WS.Range(WS.Cells(LBound(vArr(), 1), colCounter), _
WS.Cells(UBound(vArr(), 1), colCounter)).Value = vArr()
colCounter = colCounter + 1
lngMarker = 1
lngCount = 1
N = 1
ReDim vArr(N To ARR_Size, 1 To 1)
ElseIf N > (ARR_Size * lngCount) Then
'Add array to sheet
WS.Range(WS.Cells(LBound(vArr(), 1), colCounter), _
WS.Cells(UBound(vArr(), 1), colCounter)).Value = vArr()
ReDim vArr(N To (N + ARR_Size), 1 To 1)
'Keep track of how many times array added to the same column.
lngCount = lngCount + 1
'Flag to identify if partially filled array exists when loop completes.
lngMarker = N
ElseIf N Mod 5000 = 0 Then
Application.StatusBar = "Row " & N & " - Column " & colCounter
End If
End If
Loop
'If a partially filled array leftover, add it to sheet
If N > lngMarker Then WS.Range(WS.Cells(LBound(vArr(), 1), colCounter), _
WS.Cells(UBound(vArr(), 1), colCounter)).Value = vArr()
End If

ExitRoutine:
On Error Resume Next
oTextFile.Close
Set oTextFile = Nothing
Set objFso = Nothing
Application.StatusBar = False
Application.ScreenUpdating = True
Exit Function

ErrHandler:
Resume ExitRoutine
End Function
'---

Jim Cone
Portland, Oregon USA .
http://www.mediafire.com/PrimitiveSoftware .
(Formats & Styles xl add-in: lists/removes unused styles & number formats) - free






"Charlotte E." <[email protected]>
wrote in message
 
M

Mike S

Translated from Danish as a convenience (Oversat fra dansk som en
bekvemmelighed; dersom norske, beklager, ser det samme til meg):


To clarify this one a bit, this text:

...is where you check the line to see if it fits your criteria. To write
the lines that you want to keep to another file, do this:
Sub test()
Dim Linje As String
Dim iFnum As Integer, oFnum As Integer
iFnum = FreeFile
Open "C:\path\to\bigFile.txt" For Input As iFnum
oFnum = FreeFile
Open "output.txt" For Output As oFnum
While Not EOF(iFnum)
Line Input #iFnum, Linje
'...check criteria here...
Print #oFnum, Linje
Wend
Close #iFnum
End Sub

The method Mike S showed is *usually* faster than this way, but probably
not feasible with such a large file. (*I* wouldn't want to try it.)

Another problem is that Excel is limited in how many lines it can have in a
spreadsheet. I don't know what the limits are for 2003, but in 2000 it's
65,536 lines. If your final file has more lines than that, you'll need to
split it up.

I agree with everything Auric__ said, line input is the way to go since
your file is so big.

Mike
 
G

GS

Charlotte E. brought next idea :
How to read a .TXT-file line by line?

I would like to read the first line of the .TXT-file,
Perform some action on the read data (the .TXT-line)

Read the next line - perform some action
Read the next line - perfom the action again
etc...

Until all lines of the .TXT-file is read, and the action is performed on all
lines (imported data)

My problem is that I have a HUGE .TXT-file of almost 3.000.000 lines!!!
I'm aware that going through all these lines, one-by-one, will probably take
a week or so, so I have to make my first test-program read a smaller file,
until I'm sure it is working.

But, how to read the file, line by line, in the first place, since I can't
import such a huge file in my trusty old XL2003?

Oh, I case you wonder what the 'action' is, I want to remove those lines in
the .TXT-file that doens't fullfill certain criteria, thus (hopefully :)
ending up with a .TXT-file of only approx. 105.000 lines, which then can be
converted into a webtable.


Thanks in advance...

CE

Not to discount others' replies here but given the size of the file
you say you are working with, these are similar to files resulting from
a database 'dump'. In this case it would be far more efficient to read
the file in 'blocks' and process each line of a block in memory rather
than read/process each line one at a time. Okay for smaller files if
you don't mind the time it takes but you might not like the performance
this approach has with such large files.

You can find lots of examples/samples of large file parsers if you
google this with "VB6" included. (ie: "parse large text files vb6")
 
P

Peter Jamieson

Hi Charlotte,
I know you ask in an ms excel programming group but this task, especially if
you have to perform
it repeatedly, may best be done in Perl.
The time taken to process your file you estimate as a week: much quicker to
get say the Windows ActiveState
version of Perl. Then a few lines of Perl script will perform your task with
very little overhead.
Perl was (almost) designed to do just the task you have!
Cheers, Peter
 
R

Rick Rothstein

My problem is that I have a HUGE .TXT-file of
almost 3.000.000 lines!!!

On average, how long (number of characters) are those lines of text? Also,
what is the total file size?

Rick Rothstein (MVP - Excel)
 
C

Charlotte E.

Thanks for all your suggentions and help, guys :)

Got it working :)))

CE



Den 02.12.2011 09:00, Charlotte E. skrev:
 
M

Mike S

Thanks for all your suggentions and help, guys :)

Got it working :)))
<snip>

Just out of curiosity, did you use line input, and how long does it take
to go work it's way through a 2GB file?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top