Microsoft Access Importing Data

T

tonyrulesyall

I am using a software package that exports into Microsoft Access
2002 (also with Windows XP Professional).

The issue that I am having is that the data is exported into
a "flat file" format. For example the data looks like this:

ID First name Last Name City Charges
01 John Doe Boston $11.20
01 John Doe Boston $99.99
01 John Doe Boston $55.99
02 Jane Smith Houston $11.88
02 Jane Smith Houston $88.99

The data is superhuge. Just imagine screen after screen
with user charges, repeated city names, etc.

The data does not seem to be a good fit (the database file
is about a gigabyte plus). Since the data is not relational,
it takes a long time to run. (I believe the person who set it
up before me clocked it at 2 hours just to run a report.)

Can Access break down the tables into relational ones?
For example, extract the raw data and make a city table
(ex. one time "Boston" shows up, one time "Houston" shows up, etc.) I
would like all the data to link up in a one-to-one or one-to-
many connection on it's own, so I don't have to worry about
data integrity. Human error is a real issue here because of
all the data and one mistake could make reports worthless.

These reports will be likely on a weekly basis, so I would
like this to be as user friendly as possible (since it is going
to be done many times). Automation with macros would also
be a big plus (if you could give ideas how I would go about doing
this).

Any ideas? If you could walk me through step by step, I'd really
appreciate it.

Thanks!
 
J

John W. Vinson

I am using a software package that exports into Microsoft Access
2002 (also with Windows XP Professional).

The issue that I am having is that the data is exported into
a "flat file" format. For example the data looks like this:

ID First name Last Name City Charges
01 John Doe Boston $11.20
01 John Doe Boston $99.99
01 John Doe Boston $55.99
02 Jane Smith Houston $11.88
02 Jane Smith Houston $88.99

The data is superhuge. Just imagine screen after screen
with user charges, repeated city names, etc.

Huge as in... 100000 rows, a million, ten million?
The data does not seem to be a good fit (the database file
is about a gigabyte plus). Since the data is not relational,
it takes a long time to run. (I believe the person who set it
up before me clocked it at 2 hours just to run a report.)

Sympathies... said:
Can Access break down the tables into relational ones?
For example, extract the raw data and make a city table
(ex. one time "Boston" shows up, one time "Houston" shows up, etc.) I
would like all the data to link up in a one-to-one or one-to-
many connection on it's own, so I don't have to worry about
data integrity. Human error is a real issue here because of
all the data and one mistake could make reports worthless.

Sure. But if there is garbage in the data being exported from the software
package, you'll still get garbage in the report! At best (and with a lot of
maintenance) a normalized structure with referential integrity enforced will
just cause records to be rejected, say because one record contains "Hoouston"
and it's not found in the Cities table.

Basically, what you would do is have normalized tables set up: a table of
Cities, a table of Customers, and a table of Charges. You could import or link
to your repeating table and run Append queries to migrate data into the
normalized tables; the queries could be called from VBA code.

HOWEVER... lots of questions. Can you count on the ID being unique? What about
duplicate names? Might you have one record with ID 3152, name Dave Smith, and
another record ID 3152, name Josefina Gallegos? Which do you trust - the ID or
the name? If you trust the name, how do you reconcile duplicate names (there
might be two or three or four Dave Smiths)? Do you have a defined set of
cities, or do you want to add new cities to the Cities table based on what
comes in the file? Does the last apply even if there is a record for Hoouston
in the file?

I REALLY think that a serious discussion with whoever is providing the data is
in order!
 
J

John W. Vinson

I am using a software package that exports into Microsoft Access
2002 (also with Windows XP Professional).

The issue that I am having is that the data is exported into
a "flat file" format. For example the data looks like this:

ID First name Last Name City Charges
01 John Doe Boston $11.20
01 John Doe Boston $99.99
01 John Doe Boston $55.99
02 Jane Smith Houston $11.88
02 Jane Smith Houston $88.99

The data is superhuge. Just imagine screen after screen
with user charges, repeated city names, etc.

Just thought after I posted:

You might try creating a new .mdb file (just because the existing one may have
a lot of rubbish in it). Create a table to match your import - empty - but put
nonunique Indexes on whatever fields are used in your report for sorting or
filtering - ID, City, and a two-field index on Last Name and First Name, say.

Then import into this table (or append the data into it from an external
file).

This should make your report run a WHOLE lot faster.
 
M

MrsTasha96

Well, without being able to actually see all the tables and fields you are
working with, I would suggest that you create queries to run your reports.
Queries can group the duplicate information for you. For example, if you
wanted to get a sum of charges for each individual user, you would create a
query with all the fields you want displayed and then add the Total Row,
which would allow you to group the fields and add the Sum the Charges field.

Another option is to use the Make Table option from the Query section so
relationships can be made using the new tables (same info without
duplicates). As far as I know, Access will not automatically create
relationships until you have defined Primary Keys for your tables and
specified which fields have like data types.

I tested the query theory out to make sure it would work. Here's what I did:

The ultimate goal was to establish relationships. In order to establish
relationship I needed to have at least two tables and one of the those tables
must have a primary key defined. You cannot define primary keys with
duplicate information. So we must consolidate the duplicate information
first.

1. Make a copy of the original flat file. This way, if you make errors,
your original is not destroyed.
2. Create a table to list all users contact information. (For this example,
I used the mini-data you provided)
a. Create a new query in Design View.
b. The Show Table dialog box pops up, added the table (I will refer to this
table as “tblFlatFileâ€. I used the prefix ‘tbl’ so I easily identify my
objects in Access.) with the duplicate user information.
c. Double-click the following fields: ID, FirstName, LastName.
d. Click the Totals button (Sigma Symbol) on the toolbar to add the Group By
field. (This is the most important step because this is how Access will get
rid of duplicate entries.)
e. Go to Query Menu and selected Make-Table Query option. Save the table as
“tblUserInfoâ€.
f. Clicked the save button to save the query as “qryUserInfo for Tableâ€.
g. Run the query. It creates a new table with two rows because it
recognizes I only have 2 unique users in the mini-data.
h. Close the new table and query.
3. Open “tblUserInfo†in design view.
a. Make the ID field the primary key by right-clicking the field selector
and selecting primary key.
b. Save the table and close it.
4. Go to the Tools menu and select Relationships.
a. Add both tables (tblFlatFile & tblUserInfo) to the Relationships Window.
b. The ID field in “tblUserInfo†should be in bold. (If it is not, then you
have not defined it as the primary key and need to repeat step 3.)
c. Now click drag the ID field from “tblUserInfo “ right on top of the ID
field in “tblFlatFileâ€.
d. The Edit Relationships dialog box pops up. Check the Enforce Referential
Integrity box, Cascade Update Related Records, and Cascade Delete Related
Records. This way if the user doesn't exist in the tblUserInfo table, you
cannot add charges for the user in the tblFlatFile. Then, click the Create
button.
e. Now, click the Save button on the toolbar and then close the
Relationships window.
5. Confirm the relationships exist.
a. Open the “tblUserInfo†table. You should see plus signs next to the ID
fields. Click the + sign to verify the relationship.
6. Once confirmed, you can create your reports using the new table you
created or by creating custom queries to pull the info out and have Access
make the table for you using the Make Table Query.
Hope that helps. I could send you the sample file I created so you can see
it if you want.
 
M

MrsTasha96

Well, without being able to actually see all the tables and fields you are
working with, I would suggest that you create queries to run your reports.
Queries can group the duplicate information for you as long as the IDs are
unique for each individual. For example, if you wanted to get a sum of
charges for each individual user, you would create a query with all the
fields you want displayed and then add the Total Row, which would allow you
to group the fields and add the Sum the Charges field.

Another option is to use the Make Table option from the Query section so
relationships can be made using the new tables (same info without
duplicates). As far as I know, Access will not automatically create
relationships until you have defined Primary Keys for your tables and
specified which fields have like data types.

I tested the query theory out to make sure it would work. Here's what I did:

The ultimate goal was to establish relationships. In order to establish
relationship I needed to have at least two tables and one of the those tables
must have a primary key defined. You cannot define primary keys with
duplicate information. So we must consolidate the duplicate information
first.

1. Make a copy of the original flat file. This way, if you make errors,
your original is not destroyed.
2. Create a table to list all users contact information. (For this example,
I used the mini-data you provided)
a. Create a new query in Design View.
b. The Show Table dialog box pops up, added the table (I will refer to this
table as “tblFlatFileâ€. I used the prefix ‘tbl’ so I easily identify my
objects in Access.) with the duplicate user information.
c. Double-click the following fields: ID, FirstName, LastName.
d. Click the Totals button (Sigma Symbol) on the toolbar to add the Group By
field. (This is the most important step because this is how Access will get
rid of duplicate entries.)
e. Go to Query Menu and selected Make-Table Query option. Save the table as
“tblUserInfoâ€.
f. Clicked the save button to save the query as “qryUserInfo for Tableâ€.
g. Run the query. It creates a new table with two rows because it
recognizes I only have 2 unique users in the mini-data.
h. Close the new table and query.
3. Open “tblUserInfo†in design view.
a. Make the ID field the primary key by right-clicking the field selector
and selecting primary key.
b. Save the table and close it.
4. Go to the Tools menu and select Relationships.
a. Add both tables (tblFlatFile & tblUserInfo) to the Relationships Window.
b. The ID field in “tblUserInfo†should be in bold. (If it is not, then you
have not defined it as the primary key and need to repeat step 3.)
c. Now click drag the ID field from “tblUserInfo “ right on top of the ID
field in “tblFlatFileâ€.
d. The Edit Relationships dialog box pops up. Check the Enforce Referential
Integrity box, Cascade Update Related Records, and Cascade Delete Related
Records. This way if the user doesn't exist in the tblUserInfo table, you
cannot add charges for the user in the tblFlatFile. Then, click the Create
button.
e. Now, click the Save button on the toolbar and then close the
Relationships window.
5. Confirm the relationships exist.
a. Open the “tblUserInfo†table. You should see plus signs next to the ID
fields. Click the + sign to verify the relationship.
6. Once confirmed, you can create your reports using the new table you
created or by creating custom queries to pull the info out and have Access
make the table for you using the Make Table Query.
Hope that helps. I could send you the sample file I created so you can see
it if you want.
 
K

Ken Sheridan

The following set of queries and explanatory notes are taken from a demo file
I produced some time ago to show how imported data (MasterTable) can be
decomposed into normalized tables. The demo uses a simple international
contacts lists so there are countries, regions and cities involved in the
addresses. Also each contact can have more than one employer. The queries
are listed in the order in which they must be executed. Hopefully it will
illustrate the principles involved:

The query below inserts rows into the Countries table with unique values of
the Country column imported from Excel into the table MasterTable. This is
necessary before rows can be inserted into the Regions table as that table
references the Countries table in a many-to-one relationship.

INSERT INTO Countries (Country)
SELECT DISTINCT Country
FROM MasterTable;

Having inserted rows into the Countries table rows can now be inserted into
the Regions table with the query below. This joins the MasterTable to the
newly filled Countries table on the Country columns and inserts unique values
from the Region column of the MasterTable and the CountryID column of the
Countries table into the Regions table.

INSERT INTO Regions (Region, CountryID)
SELECT DISTINCT Region, CountryID
FROM MasterTable INNER JOIN Countries
ON MasterTable.Country=Countries.Country;

Having inserted rows into the Regions table rows can now be inserted into
the Cities table with the query below. This joins the MasterTable to the
newly filled Regions table on the Region columns. The Countries table is
joined to the MasterTable on the Country columns and to the Regions table on
the CountryID columns, thus taking account of any regions of the same name in
different countries. The query inserts unique values from the City column of
the MasterTable and the RegionID column of the Regions table into the Cities
table.

INSERT INTO Cities (City, RegionID)
SELECT DISTINCT MasterTable.City, Regions.RegionID
FROM Countries INNER JOIN (MasterTable INNER JOIN Regions
ON MasterTable.Region=Regions.Region)
ON (MasterTable.Country=Countries.Country)
AND (Countries.CountryID=Regions.CountryID);

The previous queries inserted rows into the Countries, Regions and Cities
tables. Following the insertion of data into the last of these, Cities, it is
now possible to insert rows into the Contacts table as this only needs to
reference the Cities table, the relevant Region and Country being referenced
via the relationships between these three tables. The query below does this
by joining the MasterTable to both the Cities table, on the City columns, and
to the Regions table, on the Region columns. The Cities table is also joined
to Regions on RegionID and the Countries table is joined to the MasterTable
on Country and the Regions table on Country ID. This is to take account of
the possibility of two cities having the same name, but being in different
regions, which themselves could theoretically have the same name but be in
different countries, so that the correct CityID value is inserted into
Contacts.

For simplicity it is assumed that contacts at the same address have unique
names. This might not always be the case, particularly with commercial
premises (the developer of this demo once worked with two Maggie Taylors in
the same building!). In such cases, however, there is likely to be some
distinguishing value such as Job Title or Department which could be used.

INSERT INTO Contacts ( FirstName, LastName, Address, CityID )
SELECT DISTINCT MasterTable.FirstName, MasterTable.LastName,
MasterTable.Address, Cities.CityID
FROM Countries INNER JOIN ((MasterTable INNER JOIN Cities ON
MasterTable.City = Cities.City)
INNER JOIN Regions ON (Regions.RegionID = Cities.RegionID)
AND (MasterTable.Region = Regions.Region))
ON (Countries.CountryID = Regions.CountryID)
AND (Countries.Country = MasterTable.Country);

The query below inserts rows into the Employers table with unique values of
the Employer column imported from Excel into the table MasterTable. This is
necessary before rows can be inserted into the ContactEmployers table as that
table references the Employers table.

INSERT INTO Employers (Employer)
SELECT DISTINCT Employer
FROM MasterTable;

Having inserted rows into the Contacts and Employers table it is now
possible to insert rows into the ContactEmployers table which models the
many-to-many relationship between Contacts and Employers. The query below
does this by joining the MasterTable to Contacts on the Address, LastName and
Firstname columns and to the Employers table on the Employer columns. The
ContactID values from Contacts and EmployerID values from Employers are
inserted into the two columns of ContactEmployers.

INSERT INTO ContactEmployers (ContactID, EmployerID)
SELECT Contacts.ContactID, Employers.EmployerID
FROM (Contacts INNER JOIN MasterTable
ON (Contacts.Address=MasterTable.Address)
AND (Contacts.LastName=MasterTable.LastName)
AND (Contacts.FirstName=MasterTable.FirstName))
INNER JOIN Employers ON MasterTable.Employer=Employers.Employer;

The above does assume that the imported data is completely consistent, which
as John has cautioned, might not be the case.

If you'd like a copy of the demo itself, which is made up of an Excel file
from which the data is imported and the Access file which decomposes it using
the above queries mail me at:

kenwsheridan<at>yahoo<dot>co<dot>uk

Ken Sheridan
Stafford, England
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top