Linking Large Text Database to Excel

C

cardan

Hello, I have a large database issue I hope someone can help me with.
It is hard to explain so I have tried to be as descriptive as possible
without losing sight of my problem.

OVERALL QUESTION
Is there a way to have my Excel workbook search a very large text
database outside the Excel file?
I am working with an extremely large data set and my Excel file is
starting to get too large (20M) that I need to figure out a way to
reduce file size (and maybe increase functionality and flexibility).

QUICK BACKGROUND
Every bank in the US is required to report their financial statements
on a quarterly basis called Call Reports. This information is
available free and online on the FDIC website. There are about 7,500
banks nationally that report. These reports contain maybe 1,000
different numbers, each given its own code (i.e. Total Assets is
RCON2170- I’ll call them RCON #’s for short) and every bank uses the
same template.

I am able to download this data in bulk form into a zipped text folder
which I then extract. This folder will now contains about 40
different text files – each named for a section in the report (ie.
RCCI is the balance sheet section). The Banks unique identifier is
listed in column A and the number code (RCON) is listed in row 1.
(Each bank identifier and number code are unique). The whole unzipped
folder is approximately 65M.

CURRENT SETUP
Of these 40 sections mentioned, I need about 6 of the sections in my
Excel file. Right now I convert the sections I need into an Excel
format (Convert with a Tab Delimited) and put it into my model. I have
template tabs that uses the INDEX-MATCH formulas to find the right
number based on the RCON code and the banks unique identifier. This
setup allows for comparisons amongst different banks. All the user has
to do is input the banks unique identifier and it will return the
appropriate numbers based on the RCON number. (Sometimes we will do 10
banks side by side). The INDEX MATCH works very well.

THE PROBLEM
The issue is that each tab represents a section of the report. Each
section lists each bank (7,500) and contains approximately 100 RCON
numbers (a data set of 750,000 fields per tab). My file for a
quarterly report is approximately 20M. Is there a way to have my
Excel workbook search the text files for the appropriate RCON number
and the banks unique identifier? That way I can keep my files limited
in size and may be able to include previous Call Reports for Trend
Analysis.

Any help would be extremely appreciated. (Also let me know if this
format is acceptable to understand my issues :)
 
J

JLatham

Short answer: yes. VBA can deal with this type of thing pretty well, and
actually with pretty good speed.

The process would go something like this:
You enter the RCON number into a cell or as a response to an InputBox$()
statement. Then the code would open the text files and look for the RCON
number in each row of input from them, and as the RCON is encountered, would
then extract the information you need from the row and it into the Excel
sheet.

In order to accomplish this, a very good analysis/understanding of the
content and format of the lines of the text files is required. You usually
have to write custom code to "parse" the inut data from the text file to get
just what you want from it and pull it into Excel.

As far as whether or not this format is acceptable to understand your
issues, I think so - at least I think I understand your needs. But I could
be wrong - that's been known to happen from time to time (usually with rather
short intervals between the misunderstandings).
 
G

Gary Keramidas

i actually automated a specific call report for a credit union. it was a
report that they needed to fill out. they would retrieve the gl income and
balance data from their server and then i would populate the report with the
correct amounts.

doesn't really help. just thought i'd mention it since you mentioned a call
report.
--


Gary Keramidas
Excel 2003


Hello, I have a large database issue I hope someone can help me with.
It is hard to explain so I have tried to be as descriptive as possible
without losing sight of my problem.

OVERALL QUESTION
Is there a way to have my Excel workbook search a very large text
database outside the Excel file?
I am working with an extremely large data set and my Excel file is
starting to get too large (20M) that I need to figure out a way to
reduce file size (and maybe increase functionality and flexibility).

QUICK BACKGROUND
Every bank in the US is required to report their financial statements
on a quarterly basis called Call Reports. This information is
available free and online on the FDIC website. There are about 7,500
banks nationally that report. These reports contain maybe 1,000
different numbers, each given its own code (i.e. Total Assets is
RCON2170- I’ll call them RCON #’s for short) and every bank uses the
same template.

I am able to download this data in bulk form into a zipped text folder
which I then extract. This folder will now contains about 40
different text files – each named for a section in the report (ie.
RCCI is the balance sheet section). The Banks unique identifier is
listed in column A and the number code (RCON) is listed in row 1.
(Each bank identifier and number code are unique). The whole unzipped
folder is approximately 65M.

CURRENT SETUP
Of these 40 sections mentioned, I need about 6 of the sections in my
Excel file. Right now I convert the sections I need into an Excel
format (Convert with a Tab Delimited) and put it into my model. I have
template tabs that uses the INDEX-MATCH formulas to find the right
number based on the RCON code and the banks unique identifier. This
setup allows for comparisons amongst different banks. All the user has
to do is input the banks unique identifier and it will return the
appropriate numbers based on the RCON number. (Sometimes we will do 10
banks side by side). The INDEX MATCH works very well.

THE PROBLEM
The issue is that each tab represents a section of the report. Each
section lists each bank (7,500) and contains approximately 100 RCON
numbers (a data set of 750,000 fields per tab). My file for a
quarterly report is approximately 20M. Is there a way to have my
Excel workbook search the text files for the appropriate RCON number
and the banks unique identifier? That way I can keep my files limited
in size and may be able to include previous Call Reports for Trend
Analysis.

Any help would be extremely appreciated. (Also let me know if this
format is acceptable to understand my issues :)
 
C

cardan

Short answer: yes.  VBA can deal with this type of thing pretty well, and
actually with pretty good speed.

The process would go something like this:
You enter the RCON number into a cell or as a response to an InputBox$()
statement.  Then the code would open the text files and look for the RCON
number in each row of input from them, and as the RCON is encountered, would
then extract the information you need from the row and it into the Excel
sheet.

In order to accomplish this, a very good analysis/understanding of the
content and format of the lines of the text files is required.  You usually
have to write custom code to "parse" the inut data from the text file to get
just what you want from it and pull it into Excel.

As far as whether or not this format is acceptable to understand your
issues, I think so - at least I think I understand your needs.  But I could
be wrong - that's been known to happen from time to time (usually with rather
short intervals between the misunderstandings).

Thanks for the reply. Is there a source you would recommend that
research this at? My VBA skills are sub par so I need all the help I
can get! Also, what are your thoughts about using an INDIRECT? Could
that work if I converted the text files into Excel? Thanks again.
 
C

cardan

i actually automated a specific call report for a credit union. it was a
report that they needed to fill out. they would retrieve the gl income and
balance data from their server and then i would populate the report with the
correct amounts.

doesn't really help. just thought i'd mention it since you mentioned a call
report.
--

Gary Keramidas
Excel 2003


Hello, I have a large database issue I hope someone can help me with.
It is hard to explain so I have tried to be as descriptive as possible
without losing sight of my problem.

OVERALL QUESTION
Is there a way to have my Excel workbook search a very large text
database outside the Excel file?
I am working with an extremely large data set and my Excel file is
starting to get too large (20M) that I need to figure out a way to
reduce file size (and maybe increase functionality and flexibility).

QUICK BACKGROUND
Every bank in the US is required to report their financial statements
on a quarterly basis called Call Reports. This information is
available free and online on the FDIC website.   There are about 7,500
banks nationally that report.  These reports contain maybe 1,000
different numbers, each given its own code (i.e. Total Assets is
RCON2170- I’ll call them RCON #’s for short) and every bank uses the
same template.

I am able to download this data in bulk form into a zipped text folder
which I then extract.  This folder will now contains about 40
different text files – each named for a section in the report (ie.
RCCI is the balance sheet section).   The Banks unique identifier is
listed in column A and the number code (RCON) is listed in row 1.
(Each bank identifier and number code are unique).  The whole unzipped
folder is approximately 65M.

CURRENT SETUP
Of these 40 sections mentioned, I need about 6 of the sections in my
Excel file. Right now I convert the sections I need into an Excel
format (Convert with a Tab Delimited) and put it into my model. I have
template tabs that uses the INDEX-MATCH formulas to find the right
number based on the RCON code and the banks unique identifier.  This
setup allows for comparisons amongst different banks. All the user has
to do is input the banks unique identifier and it will return the
appropriate numbers based on the RCON number. (Sometimes we will do 10
banks side by side). The INDEX MATCH works very well.

THE PROBLEM
The issue is that each tab represents a section of the report. Each
section lists each bank (7,500) and contains approximately 100 RCON
numbers (a data set of 750,000 fields per tab).  My file for a
quarterly report is approximately 20M.  Is there a way to have my
Excel workbook search the text files for the appropriate RCON number
and the banks unique identifier?  That way I can keep my files limited
in size and may be able to include previous Call Reports for Trend
Analysis.

Any help would be extremely appreciated.  (Also let me know if this
format is acceptable to understand my issues :)

Thank you for the reply. It sounds we are on opposite ends of the
Call Report, you feed the info into it, and I extract it. The model I
have is kinda cool and robust, just too large.
 
P

PY & Associates

Thanks for the reply. Is there a source you would recommend that
research this at?  My VBA skills are sub par so I need all the help I
can get!  Also, what are your thoughts about using an INDIRECT? Could
that work if I converted the text files into Excel?  Thanks again.- Hide quoted text -

- Show quoted text -

I did something similar for the health care provider.
Client insisted to do everything with Excel 2007.
The file is 250 MB with reasonable speed.

To keep the file size down, I suggest look into using ms query.
 
J

JLatham

cardan,
Why don't you get in touch with me via email (remove spaces) through
Help From @ JLatham Site.com
remind me of this link, your request and that you are "cardan" in this forum.

The general code is fairly simple, it's the 'parsing' of the found rows of
the text file that have to be tweaked. In general it goes a little like this:

Sub ReadTextFile()
Dim textFileName As String
Dim fileBuff As Integer
Dim rawData As String
Dim RCON As String
RCON=InputBox$("Enter the RCON to search for: ","RCON Entry","")
If RCON="" Then
Exit Sub
End IF
textFileName = Application.GetOpenFilename
fileBuff = FreeFile()
Open textFileName For Input As #fileBuff
Do While Not EOF(fileBuff)
LineInput #fileBuff, rawData
'see if RCON is in the input
If Instr(rawData,RCON)>0 Then
'here is where you would parse the data
'and put what you need from it onto the worksheet
End If
Loop
Close #fileBuff
End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top