Splitting and archiving a large database


Elizabeth- MDC

I'm designing a database to store and track up to 15,000 customers, their
personal info and shopping history. I'm planning on splitting it since up to
5 users will be on it at any given time.

Originally I was thinking one table for customer info, one table to track
purchase amounts and dates (linked by customer number), and one to track our
monthly giveaway program (again linked by customer number).

My concern is that with thousands of customers with multiple visits each
month, that querying the database and retrieving individual information will
be slow and cumbersome, especially in a year or two.

I was thinking of archiving the data, so that only 6 months of sales and
monthly visits will be accessible except for history reports.

Is this necessary, and will it speed up the database in any significant way?
 

TedMi

15,000 customers and 5 users are no problem for Access. You should not limit
the availability of data to (say) 6 months because of limitations of the
software. You should choose software that supports the availability of data
that your business requires.
 

John Vinson

I'm designing a database to store and track up to 15,000 customers, their
personal info and shopping history. I'm planning on splitting it since up to
5 users will be on it at any given time.

Access is multiuser out of the box; unless two users are editing the
*same exact record* at the same time, there's no need to split up the
tables. Thirty simultaneously editing users is normally not a problem;
and 15,000 records is quite a modest-sized database. 1,500,000
customers is getting pretty big.

Instead, split the *application*. If you use Tools... Database
Utilities... Database Splitter Wizard you can split the database into
a shared "backend" containing only the tables, and a "frontend" with
the forms, queries, reports, etc.
Originally I was thinking one table for customer info, one table to track
purchase amounts and dates (linked by customer number), and one to track our
monthly giveaway program (again linked by customer number).

That's just good design. I take it now that you have a single big flat
table with many duplicates?
My concern is that with thousands of customers with multiple visits each
month, that querying the database and retrieving individual information will
be slow and cumbersome, especially in a year or two.

See above. You're not even making Access breathe hard yet, assuming
proper table design and indexing.
I was thinking of archiving the data, so that only 6 months of sales and
monthly visits will be accessible except for history reports.

Is this necessary, and will it speed up the database in any significant way?

You can archive data (to a separate .mdb file) IF it is *demonstrably*
necessary. Don't assume that "Access can't possibly handle a 100,000
record table so I have to split first" - if you try it, and *in your
environment, with your equipment, and with your users' tolerance for
speed* it's too slow, then first optimize the data storage and
indexing; and only if that fails, archive the data.
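The "optimize indexing first" advice can be illustrated with a small sketch.
This uses SQLite from Python as a stand-in for Jet, and the table, column,
and index names are invented for the example; the point is that indexing the
join and filter columns is what keeps per-customer lookups fast on a large
Purchases table.

```python
# Sketch (assumed names): index the foreign-key and date columns before
# concluding that a large Purchases table is "too slow".
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Purchases (
    PurchaseID   INTEGER PRIMARY KEY,
    CustomerID   INTEGER NOT NULL,
    PurchaseDate TEXT    NOT NULL,
    Amount       REAL    NOT NULL)""")

# Indexes on the columns used for joins and date-range filters
con.execute("CREATE INDEX ix_purch_cust ON Purchases (CustomerID)")
con.execute("CREATE INDEX ix_purch_date ON Purchases (PurchaseDate)")

# A typical per-customer lookup can now use the index instead of a full scan
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Purchases WHERE CustomerID = 42"
).fetchall()
print(plan)  # the plan mentions ix_purch_cust rather than a table scan
```

The same principle applies in Access: index CustomerID (and any date field
you filter on) in the detail tables, and retest before archiving anything.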

John W. Vinson[MVP]
 

TC

No, Access will deal with those numbers in its sleep - if you do it
properly!

"Properly", in this context, means:

(1) The table structures must be defined correctly. If you know about
"database normalization", you already know what I mean. If you /don't/
know about database normalization, you need to read up on that topic
before you proceed.

(2) In particular, the tables must be primary-keyed correctly.

(3) The SQL must be written properly & efficiently.

By far the most common cause of slow Access databases, IMHO, is bad
table design and poorly written SQL.

With a split front-end/back-end structure, you also need the front-end
to keep a connection open to the back-end, for the duration of the
front-end run. You could google "persistent connection" in these
groups, or ask later on, when you get to that point. It's just a few
lines of code - there are no design issues involved.

HTH,
TC [MVP Access]
 

Elizabeth- MDC

Thanks everyone for your help.

Yes my db is normalized. 2NF if I don't do a Zip Code table, 3NF if I do.
And my SQL is fine.

I am still in the designing phase. When I mentioned splitting, I meant
front end and back end. As for the customer table, I have no problem with
that. That will only contain around 15,000 records. My concern was with
the two linked tables. These will grow by approx. 7,500 records a month
(90,000 a year) for each table.

In terms of speed, I already solved part of my problem by including three
more fields in the "Customers" table that record running totals for
purchases, date of last visit, and whether they received a monthly giveaway.
I figure this will be faster than running a query every time a new customer
comes in to total their purchases (though adding the fields keeps me at 3NF
instead of 4). Detailed info (what is stored in the "Purchases" table) will
be available mainly for reports.
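The running-total idea above can be sketched like this (SQLite/Python as a
stand-in for Access, with assumed table and column names): each purchase
both inserts a detail row and bumps the customer's stored total inside one
transaction, so the fast lookup stays consistent with the detail table.

```python
# Sketch of the denormalized running-total design (names are assumed).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (
    CustomerID     INTEGER PRIMARY KEY,
    TotalPurchases REAL NOT NULL DEFAULT 0,
    LastVisit      TEXT);
CREATE TABLE Purchases (
    PurchaseID   INTEGER PRIMARY KEY,
    CustomerID   INTEGER NOT NULL REFERENCES Customers,
    PurchaseDate TEXT NOT NULL,
    Amount       REAL NOT NULL);
INSERT INTO Customers (CustomerID) VALUES (1);
""")

def record_purchase(cust_id, date, amount):
    with con:  # one transaction: detail row and running total together
        con.execute(
            "INSERT INTO Purchases (CustomerID, PurchaseDate, Amount) "
            "VALUES (?, ?, ?)", (cust_id, date, amount))
        con.execute(
            "UPDATE Customers SET TotalPurchases = TotalPurchases + ?, "
            "LastVisit = ? WHERE CustomerID = ?", (amount, date, cust_id))

record_purchase(1, "2006-03-01", 25.50)
record_purchase(1, "2006-03-15", 10.00)
total = con.execute(
    "SELECT TotalPurchases FROM Customers WHERE CustomerID = 1").fetchone()[0]
print(total)  # 35.5, without summing the Purchases table
```

The trade-off is the usual one with denormalization: the stored total must
be updated everywhere a purchase can be added, edited, or deleted.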

I know there's a possibility of RI problems if I split the back end into two
(or more) files. The Access help file says that RI can still be set if the
files are all in Access. Does anyone have any experience with this?

tina said:
to add to TC's excellent remarks, here is a link to Tony Toews' Performance
FAQs, with additional suggestions:

http://www.granite.ab.ca/access/performancefaq.htm

hth


 

John Vinson

The Access help file says that RI can still be set if the
files are all in Access. Does anyone have any experience with this?

You can enforce RI between two tables ONLY if they are in the same
database file. The reason makes sense: if there is a relational
constraint defined in A.MDB affecting a table in B.MDB, there is no
way that it can be enforced unless A.MDB is open. There's nothing to
stop someone from opening B.MDB directly, or opening it from X.MDB,
and making a change which violates the rule.

You may want to look ahead to the future, and consider putting the
data into SQL Server, still using Access as a frontend. You can test
this out for free using the MSDE version of SQL Server which comes with
Access - you can install it from the SQL folder on the OfficePro CD.

John W. Vinson[MVP]
 

Tom Ellison

John's analysis here is so terrific, I just HAD to compliment him on it.
The major difficulty in partitioning a logical database between several
physical databases is that you lose the enforcement of referential integrity
(RI). I have never found another major disadvantage of this, and I'd really
like to know if there is one.

MSDE will not only allow you to develop your database, but will probably
allow you to operate it for some time. When the time comes that you exceed
its limitations, the migration will be painless, except for the cost of SQL
Server, and perhaps a server operating system, since MSDE can be run on a
non-server OS for now.

Tom Ellison
 

Elizabeth- MDC

Thank you everyone for all your help.

Is the max size on MSDE still 2 GB? And has anyone had any experience
dealing with SQL Server 2005 Express? Unfortunately cost is an issue.

The interesting thing is that RI is not going to be a big problem with this
db. Even if the customer is deleted, I still want to keep their history for
statistical analysis. The only problem is changing their customer ID, which
is the primary field. Each customer has a bar-coded card, with the number
identified as the primary key. But if I use an autonumbered field as my key
to link the tables, then there should be no problems with orphaned records
whose parents merely changed their names.
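The surrogate-key idea described above can be sketched as follows
(SQLite/Python stand-in; all table and column names are assumptions): the
autonumber key links the history to the customer, so the bar-code number
can be reissued without orphaning any records.

```python
# Sketch (assumed names): an autonumber surrogate key keeps history linked
# even when the customer's bar-coded card number changes.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    CardNumber TEXT NOT NULL UNIQUE);              -- bar-coded card
CREATE TABLE Purchases (
    PurchaseID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customers (CustomerID),
    Amount     REAL NOT NULL);
""")
con.execute("INSERT INTO Customers (CardNumber) VALUES ('0001-1111')")
con.execute("INSERT INTO Purchases (CustomerID, Amount) VALUES (1, 50.0)")

# Replacing a lost card changes only CardNumber; history still joins cleanly
con.execute(
    "UPDATE Customers SET CardNumber = '0002-9999' WHERE CustomerID = 1")
rows = con.execute("""
    SELECT c.CardNumber, p.Amount
    FROM Customers AS c
    JOIN Purchases AS p ON p.CustomerID = c.CustomerID
""").fetchall()
print(rows)  # [('0002-9999', 50.0)] - no orphaned history
```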

Could someone tell me if my analysis is correct? I haven't had a lot of
experience with this.
 

Tom Ellison

Dear Elizabeth:

Is there a possibility that you could summarize the records and still get
good statistics?

Let's say you bracket the data into statistical groups. For a given period
of time, say all of 2005, you have 2481 rows where the value of some
statistic lies between 1000 and 4999.99, with an average value of 2137.14.
You could just keep summaries like this, and still be able to produce
certain statistical summaries, combining this with current values. Is such
an approach conceivable? It could drastically reduce your storage
requirements.
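Tom's bracketing idea reduces to a simple GROUP BY over bracket labels.
Here is a minimal sketch (SQLite/Python stand-in; the brackets, amounts,
and names are illustrative, not from the original thread):

```python
# Sketch: collapse detail rows into per-bracket counts and averages.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Purchases (Amount REAL)")
con.executemany("INSERT INTO Purchases VALUES (?)",
                [(120,), (450,), (1500,), (2500,), (3200,), (7500,)])

summary = con.execute("""
    SELECT CASE
             WHEN Amount < 1000 THEN 'under 1000'
             WHEN Amount < 5000 THEN '1000-4999.99'
             ELSE '5000 and up'
           END                       AS Bracket,
           COUNT(*)                  AS NumRows,
           ROUND(AVG(Amount), 2)     AS AvgAmount
    FROM Purchases
    GROUP BY Bracket
    ORDER BY MIN(Amount)
""").fetchall()
print(summary)
# [('under 1000', 2, 285.0), ('1000-4999.99', 3, 2400.0),
#  ('5000 and up', 1, 7500.0)]
```

Only these summary rows would need to be kept per period; the detail rows
could then be archived or discarded, at the cost of losing any statistic
not anticipated by the brackets.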

Tom Ellison
 

Elizabeth- MDC

Tom,

If I were designing it for myself, then I could just grab the statistics I
needed. Unfortunately the people I'm designing this for can never make up
their minds what they want from any program. Today they might want to simply
track spending by zip code, next year they might want statistics based on the
age of the shopper, then all of a sudden, they need to know how many men
shopped at the Gap in May. I personally would rather save the raw data.

For the day-to-day operation of the database, I don't need the stored
information beyond the customer's personal information found in one table and
possibly six months of previous spending history. The only use for older
data is for reporting and analysis purposes.

I think I might have just figured out a way to consolidate the two history
tables into one. Figure 100,000 records a year for this table plus a
lifetime total of 50,000 records in the Customer info table. Think I can get
a couple of years out of this?


Thank you.


Elizabeth
 

Tom Ellison

Dear Elizabeth:

Sizing your database depends on the size of each record and the size of
indexes, and can be a bit difficult to estimate precisely. Given moderate
values for these, 200,000 rows isn't so many. It would allow 10,000 bytes
per row, and that's generally very high. For 10 years I might get a bit
concerned. An empirical test might be useful. Performance is another issue.
But perhaps the analyses you'll be doing will often be scratch work, so
they'll have to wait to be programmed, as well as run. You may want to
factor in that the programming time may include time to test the queries as
they are being generated, and if the programmer must wait minutes or hours
for each test to run, it can take a while to program. There are many
uncertain factors possible here.

If you start with MSDE instead of Jet, you'll be in position to migrate
seamlessly to SQL Server when size or performance becomes the issue. It's a
thought worth considering.

Tom Ellison
 

Elizabeth- MDC

Thanks. I think I'm just going to have to do some serious testing with this
one. The tables have few fields, so I'm guessing that I can get a couple of
years out of it, and if necessary just archive the data every two or three
years if we want to keep it.
 
