P
Petr Danes
I have a database that is coming close to straining Access's capabilities.
It reads in about 2500 text files (library catalogs) and indexes the
location of each word in each file, excluding trivialities like "a", "and",
"the" and so on. There are three main relations - file names, words and a
linking table (each file has many words, each word can be in many files).
The words relation has over 600,000 entries, the linking relation over 7
million. Simple searches go well, but more complicated ones were rather
slow, since they involve SQL created on the fly, with self-joins. Below is a
sample of some such statements.
Simple select, quite fast:
SELECT
HSBlokSoubor.TagWordAutoID,
HSBlokSoubor.SouborAutoID,
HSBlokSoubor.HSBlok
FROM
Soubory
INNER JOIN
HSBlokSoubor
ON
Soubory.SouborAutoID = HSBlokSoubor.SouborAutoID
WHERE
(((Soubory.OmezeneHledani)=True));
More complicated select, not so fast:
SELECT
HB1.SouborAutoID,
HB1.HSBlok,
WT1.TagWord AS TagWord1,
WT2.TagWord AS TagWord2,
WT3.TagWord AS TagWord3
INTO
NalezeneSoubory
FROM
WordTag AS WT1
INNER JOIN
((HSBlokSoubor AS HB1
INNER JOIN
(WordTag AS WT2
INNER JOIN
HSBlokSoubor AS HB2
ON
WT2.TagWordAutoID = HB2.TagWordAutoID)
ON
(HB1.SouborAutoID = HB2.SouborAutoID) AND (HB1.HSBlok = HB2.HSBlok)) INNER
JOIN
(WordTag AS WT3 INNER JOIN HSBlokSoubor AS HB3
ON
WT3.TagWordAutoID = HB3.TagWordAutoID)
ON
(HB2.SouborAutoID = HB3.SouborAutoID) AND (HB2.HSBlok = HB3.HSBlok)) ON
WT1.TagWordAutoID = HB1.TagWordAutoID
WHERE
(((Mid([WT1].[TagWord],4))="amerika" Or (Mid([WT1].[TagWord],4))="America")
AND ((Mid([WT2].[TagWord],4))="rakouska")
AND ((Mid([WT3].[TagWord],4))="francie"));
In an attempt to improve performance, I decided to try splitting off the
data to SQL server and see if a more robust database engine (on the same
computer, no network involved) would better handle the job. I've never split
a database or used SQL server, so as a first try I just ran the splitting
wizard and split the data off to another Access database. The performance
improvement from just this one step is unbelievable - queries that took over
a minute are now done in a few seconds! Does this make sense to anyone? All
the posts on this topic I've seen in the archives complain about reduced
performance after splitting, and recognized experts in these forums
generally agree that reduced performance is part of the price you pay for
the other benefits of splitting; never that performance is an added benefit
of same.
It seems more likely to me that I had some flaw in my original design that
splitting somehow bypassed, but I can't think what it might be. I did
compact and repair the original database after loading the data, since the
size before C&R was over 1.5 GB, pushing the 2GB Access limit. After C&R the
size is 400MB, but there is no noticeable speed improvement.
Does anyone have any thoughts on this? I'm pleased with the improvement, of
course, but rather at a loss to explain it.
Pete
It reads in about 2500 text files (library catalogs) and indexes the
location of each word in each file, excluding trivialities like "a", "and",
"the" and so on. There are three main relations - file names, words and a
linking table (each file has many words, each word can be in many files).
The words relation has over 600,000 entries, the linking relation over 7
million. Simple searches go well, but more complicated ones were rather
slow, since they involve SQL created on the fly, with self-joins. Below is a
sample of some such statements.
Simple select, quite fast:
SELECT
HSBlokSoubor.TagWordAutoID,
HSBlokSoubor.SouborAutoID,
HSBlokSoubor.HSBlok
FROM
Soubory
INNER JOIN
HSBlokSoubor
ON
Soubory.SouborAutoID = HSBlokSoubor.SouborAutoID
WHERE
(((Soubory.OmezeneHledani)=True));
More complicated select, not so fast:
SELECT
HB1.SouborAutoID,
HB1.HSBlok,
WT1.TagWord AS TagWord1,
WT2.TagWord AS TagWord2,
WT3.TagWord AS TagWord3
INTO
NalezeneSoubory
FROM
WordTag AS WT1
INNER JOIN
((HSBlokSoubor AS HB1
INNER JOIN
(WordTag AS WT2
INNER JOIN
HSBlokSoubor AS HB2
ON
WT2.TagWordAutoID = HB2.TagWordAutoID)
ON
(HB1.SouborAutoID = HB2.SouborAutoID) AND (HB1.HSBlok = HB2.HSBlok)) INNER
JOIN
(WordTag AS WT3 INNER JOIN HSBlokSoubor AS HB3
ON
WT3.TagWordAutoID = HB3.TagWordAutoID)
ON
(HB2.SouborAutoID = HB3.SouborAutoID) AND (HB2.HSBlok = HB3.HSBlok)) ON
WT1.TagWordAutoID = HB1.TagWordAutoID
WHERE
(((Mid([WT1].[TagWord],4))="amerika" Or (Mid([WT1].[TagWord],4))="America")
AND ((Mid([WT2].[TagWord],4))="rakouska")
AND ((Mid([WT3].[TagWord],4))="francie"));
In an attempt to improve performance, I decided to try splitting off the
data to SQL server and see if a more robust database engine (on the same
computer, no network involved) would better handle the job. I've never split
a database or used SQL server, so as a first try I just ran the splitting
wizard and split the data off to another Access database. The performance
improvement from just this one step is unbelievable - queries that took over
a minute are now done in a few seconds! Does this make sense to anyone? All
the posts on this topic I've seen in the archives complain about reduced
performance after splitting, and recognized experts in these forums
generally agree that reduced performance is part of the price you pay for
the other benefits of splitting; never that performance is an added benefit
of same.
It seems more likely to me that I had some flaw in my original design that
splitting somehow bypassed, but I can't think what it might be. I did
compact and repair the original database after loading the data, since the
size before C&R was over 1.5 GB, pushing the 2GB Access limit. After C&R the
size is 400MB, but there is no noticeable speed improvement.
Does anyone have any thoughts on this? I'm pleased with the improvement, of
course, but rather at a loss to explain it.
Pete