Find records with same, but unknown, value in a field

Petr Danes · May 27, 2007

I have trouble believing that I can't figure this out, but I'm stuck. How do
I set up a condition in a query where I want all the records that have the
same value in a particular field, when I don't know (or care) what that
value actually is?

Here's what I'm doing: This is a rather large database which reads in around
2000 text files, then indexes every word in each file for subsequent
searching. Each file contains many blocks of entries, sort of a file of card
catalogs. Here is a sample of one, with three cards:

.._HS Storch, Johann OKS

..AU
Storch, Johann
..TI
Der kleine Helfer.

..IM
Prag, 1919.

..RO

..PO

..PR

..VA
Vazba
..LI

..SI
Be¼ov nad Teplou 601
..SF
T/71
..OD

..DA
1919
..TD
Dv, Boh
..JZ
Ger
..GE
@PA
.._HS Dannhauser, Alfred AFFE

..AU
Dannhauser, Alfred
..TI
Die Tragoedie der Frau. Das Problem der reiferen Jahre. 2.«
Aufl.

..IM
Stuttgart, 1928.

..RO

..PO

..PR

..VA
Vazba
..LI

..SI
Be¼ov nad Teplou 602
..SF
T/71
..OD

..DA
1928
..TD
Dv
..JZ
Ger
..GE
@PA
.._HS Smolle, Leo NATOC

..AU
Smolle, Leo
..TI
Das Buch von unserem Kaiser. 1848 - 1888. Festschrift ...

..IM
Wien, 1888.

..RO

..PO

..PR

..VA
Vazba
..LI

..SI
Be¼ov nad Teplou 603
..SF
T/71
..OD

..DA
1888
..TD
Nt
..JZ
Ger
..GE
@PA

The start of every card is "._HS texttexttext..", and all the subsequent
lines are in pairs, the .XX identifier preceding the line or lines with the
actual text of interest.

My initial design had a table with a column for each of the .XX fields, but
in the interest of normalization (and my own sanity, there're over seventy
possible fields in each card) I switched to a table with the following
structure:

SouborAutoID (FK to table of file names)
SlovoAutoID (FK to table of unique words)
Linka (Line number in the original text file)
HSBlok (Card number in the original text file)
Tag (Text of the two-letter field identifier)

I need to assemble queries where I look for things like:

(Tag="IM" AND (Slovo="Prague" OR Slovo="Prag" OR Slovo="Praha"))
AND
(Tag="TI" AND (Slovo="Kunratice" OR Slovo="Kynzvart"))

For simplicity I omitted the JOIN clause; Slovo is in a related table, BUT,
I need to find these things all on the same card, that is, where the field
HSBlok is the same. I don't now what that value might be, there will in fact
be many values retrieved by such queries, all I care is that the value is
the SAME on both the IM and the TI record. That is, I need to add a clause
to the above which does something like:

AND HSBlok Is Identical On Both Records

I thought of doing it with a self-join, but what happens when I have three,
four or even more tags to test? This would have been trivial in my original
model, since every card comprised exactly one record, but this way I'm
stumped. Did I over-normalize my design? Is there even such a thing? Have I
finally gone off the deep end? It seems like some sort of grouping clause
should help, but I haven't been able to come up with anything.

Pete

Rene · May 27, 2007

Select Queries
Select New
Select Find Duplicates Qyery Wizard

Rene

Petr Danes · May 28, 2007

Hello Rene,

Thank you for the suggestion, but I'm afraid that a simple count like the
wizard generates won't do the trick. Suppose that I have something like:

..TI
The Case for Tort Reform in the US
..AU
Brown, John & Brown, John Jr.

This generates two records with Brown as the word and AU as the type tag. If
I were to search for something like:
(Tag="TI" AND Slovo="Frankenstein")
AND
(Tag="AU" AND Slovo="Brown")

the Count>1 generated by the duplicate wizard would indicate a match,
because it would find the two "Brown" records with an AU tag, even though
the word Frankenstein does not appear in the title tag. I need to be certain
that the HSBlok fields have identical values for EACH tag-word combination,
since the number of records found meeting each criteria varies.

The other problem is that the main link table has over 9 million records and
growing. Such grouping gives me retrieval speeds that are completely
unacceptable, even if the method did work.

Pete

Word Error Message: This file is corrupt and cannot be opened	1	Nov 30, 2009
What is the syntax for adding to "Blocked Senders List" to OL2K3	5	Jun 15, 2004

Find records with same, but unknown, value in a field

Petr Danes

Rene

Petr Danes

Ask a Question

Similar Threads