Need the best way to consolidate data

C

ColeTX

I am new to access - intermediate level with excel. I have a file
containing unique persons (bob, sue, mary). First, in some fields I
have this problem.

First Middle Last Address
Robert T Barker 1701 Lollypop Ln.
Bob Barker 1701 Lollypop Ln.
Bob Tucker Barker 1701 Lollypop Ln.
Bob Barker 1701 Lollypop Lane

The problem lies in identifying these duplicates (no middle initial,
'Lane instead of 'Ln.'


The second problem I have is more complex

First Last Address Sent to Item
Bob Barker 1701 Lollypop Ln. Point A Corn
Bob Barker 1701 Lollypop Ln. Point B Beans
Sue Smith 156 Highway 70 Point B Corn
Sue Smith 156 Highway 70 Point H Squash

I want to change this to

First Last Address Sent to Item Sent to Item
Bob Barker 1701 Lollypop Ln. Point A Corn Point B Beans
Sue Smith 156 Highway 70 Point B Corn Point H Squash

There are an 200 possible locations to be sent to. and there are an
infininite amount of items.

Once consolidated, I want to be able to sort by everyone who directed
to Point B, even thought the data is in multiple columns.
 
T

TedMi

Your assessment of the relative complexities of the 2 problems is backwards.
Your first problem, identifying dups in an address list, is by far the more
complex one. Professional list managers charge BIG Bucks for list cleanup,
and the results are far from perfect. Your best bet is to apply eyeball
inspection and manual intervention.

Your second problem is easy - that's exactly what relational DB's are
designed for. Just forget about thinking in spreadsheet terms of rows,
columns and cells and think in terms of entities ("things") in your
database). Your desired result is a REPORT, not a table. How the underlying
data is actually stored should be very different.

From your description, you have 3 entities: persons, locations, and items.
So you need 3 tables:
Person
PersonID (Primary Key)
First Name
Last Name
Address
etc.

Location
LocationID (Primary Key)
Location name
Any other info particular to location

Item
ItemID (Primary Key)
ItemName

If all item names are short, you can dispense with the ItemID column and use
the ItemName as the PK.

And another table to indicate what was shipped where to whom:
Shipments
PersonID (Foreign Key to Person table)
LocationID (FK to Location)
ItemID or ItemName (FK to Item)
any other info identifying this shipment:
Shipment date?
Ship mode?
Ship cost?

How do you ensure that duplicates do not get entered into this table?
What makes a shipment unique?

Now you can create any kind of report, e.g.

Joe Blow, his address
Point A, Corn
Point B, Beans

Jane Doe, her address
Point C, wheat
Point A, beef
etc.

It's not easy to string the multiple location/item pairs horizontally, and
visually not very useful. How easy is it to read a report handle it if there
are dozens or hundreds of pairs per person? A vertical orientation handles
this quite readily.
 
J

John Vinson

I am new to access - intermediate level with excel. I have a file
containing unique persons (bob, sue, mary). First, in some fields I
have this problem.

First Middle Last Address
Robert T Barker 1701 Lollypop Ln.
Bob Barker 1701 Lollypop Ln.
Bob Tucker Barker 1701 Lollypop Ln.
Bob Barker 1701 Lollypop Lane

The problem lies in identifying these duplicates (no middle initial,
'Lane instead of 'Ln.'

It's a real and difficult problem, and is one of the reasons that
commercial (and often expensive) list-cleaning services exist.
Considering that you might have two entries

Fred G Brown 123 3rd St
Fred G Brown 123 3rd St

who are in fact father and son (you don't show a field for Suffix or
they'd be distinguished as Sr. and Jr.), it's hard to see any
automated way to RELIABLY solve the problem.

My best suggestion would be to have a continuous Form which you can
flexibly sort by any field - say get all the names at one address
together, then all the Barkers sorted by address - and manually fix
them.
The second problem I have is more complex

First Last Address Sent to Item
Bob Barker 1701 Lollypop Ln. Point A Corn
Bob Barker 1701 Lollypop Ln. Point B Beans
Sue Smith 156 Highway 70 Point B Corn
Sue Smith 156 Highway 70 Point H Squash

I want to change this to

First Last Address Sent to Item Sent to Item
Bob Barker 1701 Lollypop Ln. Point A Corn Point B Beans
Sue Smith 156 Highway 70 Point B Corn Point H Squash

There are an 200 possible locations to be sent to. and there are an
infininite amount of items.

You're proposing to take a well-designed, properly normalized table
and trash it by converting it to a non-normalized, improper design.
Don't.

You can create a Crosstab query (with some difficulty) to DISPLAY the
data in this way but storing the data in a table as you propose is not
merely difficult, it's impossible. Tables have fixed width with
predefined nonrepeating fields.
Once consolidated, I want to be able to sort by everyone who directed
to Point B, even thought the data is in multiple columns.

It's perfectly easy in your properly normalized design; it would be
monstrously difficult in your proposed "wide-flat" structure.

If you have People, Items, and Items, and any person can send an
arbitrary number of items to an arbitrary number of places, I'd
suggest a four-table solution:

People
PersonID Primary Key
LastName
MiddleName
FirstName
Address
<other bio info>

Items
ItemID Primary Key
ItemDescription <e.g. "beans">

Locations
LocationID Primary Key
Loction Text

Sendings
PersonID <who sent it>
ItemID <what they sent>
LocationID <where they sent it>
<maybe other info, such as date sent>

This gives you complete flexibility using queries joining these tables
to find everything Joe Doakes sent; or everybody who sent Cayenne
chiles anywhere; or everybody who sent coals to Newcastle.

John W. Vinson[MVP]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top