HOW DO I FIND NUMBER OF OCCURENCES FOR DIFFERENT LETTERS IN A FIEL

J

jagadeeshtiru

WHILE I AM CREATING A DATABASE IN ACCESS,
I NEED TO COUNT THE NUMBER OF REPEATED LETTERS FOR EACH ROW.
THE LETTER COUNT SHOULD BE PRINT IN SEPERATE FIELDS FOR EACH DIFFERENT LETTER.

I HAVE 12500 ROWS IN FIELD, CONTAINING A SEQUENCE OF LETTERS OF "A",
"T","G","C" WITH LENGTH <=600.
NOW I NEED TO COUNT THE NUMBER OF A's, T's, G's AND C's PRESENT IN EACH ROW.

THE COUNT MUST BE INTO A NEW FIELD.

CAN YOU SOLVE MY PROBLEM?

PLEASE...

THANKING YOU
 
J

John Vinson

WHILE I AM CREATING A DATABASE IN ACCESS,
I NEED TO COUNT THE NUMBER OF REPEATED LETTERS FOR EACH ROW.
THE LETTER COUNT SHOULD BE PRINT IN SEPERATE FIELDS FOR EACH DIFFERENT LETTER.

I HAVE 12500 ROWS IN FIELD, CONTAINING A SEQUENCE OF LETTERS OF "A",
"T","G","C" WITH LENGTH <=600.
NOW I NEED TO COUNT THE NUMBER OF A's, T's, G's AND C's PRESENT IN EACH ROW.

THE COUNT MUST BE INTO A NEW FIELD.

CAN YOU SOLVE MY PROBLEM?

PLEASE...

THANKING YOU

First off... please turn off your CAPS LOCK key. It's hard to read,
impolite, and looks like you are SHOUTING AT US.

Secondly - this must obviously be a Memo field since text fields are
limited to 255 bytes, right? DNA sequences it seems...

You will need some VBA code to do this. Here's some untested code, but
it should work.

Public Function CountLetter(strSeq as String, strBase as String) _
As Integer
Dim iPos As Integer
CountLetter = 0
For iPos = 1 to Len(strSeq)
If Mid(strSeq, iPos, 1) = strBase Then
CountLetter = CountLetter + 1
End If
Next iPos
End Sub

In a Query in four vacant Field cells, assuming the memo field is
named Seq, type

CountA: CountLetter([Seq], "A")
CountT: CountLetter([Seq], "T")

and so on.

Don't expect this to be blazing fast on a large table!

John W. Vinson[MVP]
 
J

John Nurick

Public Function CountLetter(strSeq as String, strBase as String) _
As Integer
Dim iPos As Integer
CountLetter = 0
For iPos = 1 to Len(strSeq)
If Mid(strSeq, iPos, 1) = strBase Then
CountLetter = CountLetter + 1
End If
Next iPos
End Sub

In a Query in four vacant Field cells, assuming the memo field is
named Seq, type

CountA: CountLetter([Seq], "A")
CountT: CountLetter([Seq], "T")

and so on.

Don't expect this to be blazing fast on a large table!

Hi John,

Out of curiosity I wrote a similar function using Static variables to
avoid iterating the string every time the function is called. I timed
update queries using the two functions, running them against a table
containing 100,000 random sequences, i.e. 400,000 function calls. The
sequences were of random lengths between 300 and 599 characters.

Results of three runs of each query suggest that the Static approach is
usefully faster. These are hh:mm:ss:
Vinson: 00:05:22
Nurick: 00:02:35
Vinson: 00:05:20
Nurick: 00:02:34
Vinson: 00:05:19
Nurick: 00:02:37

Here's the function I used:

Public Function AllAtOnce(Sequence As String, Base As String) As Long
Static S As String
Static ACount As Long
Static CCount As Long
Static GCount As Long
Static TCount As Long
Dim j As Long

If Sequence <> S Then 'perform a count
S = Sequence
ACount = 0
CCount = 0
GCount = 0
TCount = 0

For j = 1 To Len(S)
Select Case Mid(S, j, 1)
Case "A": ACount = ACount + 1
Case "C": CCount = CCount + 1
Case "G": GCount = GCount + 1
Case "T": TCount = TCount + 1
End Select
Next
End If

Select Case Base
Case "A": AllAtOnce = ACount
Case "C": AllAtOnce = CCount
Case "G": AllAtOnce = GCount
Case "T": AllAtOnce = TCount
End Select
End Function
 
B

Brendan Reynolds

See the thread "Count occurances of specific character in string" in the
microsoft.public.access.queries newsgroup
 
J

John Nurick

See the thread "Count occurances of specific character in string" in the
microsoft.public.access.queries newsgroup

Neat. This query

UPDATE tblMain
SET
ACount = Len([Sequence])-Len(Replace([Sequence],"A","")),
CCount = Len([Sequence])-Len(Replace([Sequence],"C","")),
GCount = Len([Sequence])-Len(Replace([Sequence],"G","")),
TCount = Len([Sequence])-Len(Replace([Sequence],"T",""))
;

updated the 100,000 records in about 1'34", a minute faster than my
"static" function.
 
J

John Vinson

UPDATE tblMain
SET
ACount = Len([Sequence])-Len(Replace([Sequence],"A","")),
CCount = Len([Sequence])-Len(Replace([Sequence],"C","")),
GCount = Len([Sequence])-Len(Replace([Sequence],"G","")),
TCount = Len([Sequence])-Len(Replace([Sequence],"T",""))
;

updated the 100,000 records in about 1'34", a minute faster than my
"static" function.

wow. Never would have thought of That one. "Me 'at's off to the Dook!"

John W. Vinson[MVP]
 
J

John Nurick

UPDATE tblMain
SET
ACount = Len([Sequence])-Len(Replace([Sequence],"A","")),
CCount = Len([Sequence])-Len(Replace([Sequence],"C","")),
GCount = Len([Sequence])-Len(Replace([Sequence],"G","")),
TCount = Len([Sequence])-Len(Replace([Sequence],"T",""))
;

updated the 100,000 records in about 1'34", a minute faster than my
"static" function.

wow. Never would have thought of That one. "Me 'at's off to the Dook!"

Hats off, definitely, but I'm kicking myself for not having thought of
it. In Perl I'd probably have done
$Acount = s/A/A/g;
$Ccount = s/C/C/g;
...
which replaces each character with itself and counts the number of
replacements, and from that it's only a small step to TK's solution.
 
G

Graham R Seach

Just to add a dimension to this thread, this is what I use (in VBA):
CountOccurrences = (Len(sSource) - Len(Replace(sSource, sFind, ""))) /
Len(sFind)

Regards,
Graham R Seach
Microsoft Access MVP
Sydney, Australia
---------------------------

John Nurick said:
See the thread "Count occurances of specific character in string" in the
microsoft.public.access.queries newsgroup

Neat. This query

UPDATE tblMain
SET
ACount = Len([Sequence])-Len(Replace([Sequence],"A","")),
CCount = Len([Sequence])-Len(Replace([Sequence],"C","")),
GCount = Len([Sequence])-Len(Replace([Sequence],"G","")),
TCount = Len([Sequence])-Len(Replace([Sequence],"T",""))
;

updated the 100,000 records in about 1'34", a minute faster than my
"static" function.
 
D

Douglas J. Steele

Documented in my April, 2004 Access Answers column in Smart Access
http://members.rogers.com/douglas.j.steele/SmartAccess.html although I use \
Len(sFind) instead of / Len(sFind).

As I state in the article, I got it from a post Mike Sutton (a VB MVP) made
some time back.

--
Doug Steele, Microsoft Access MVP

(no e-mails, please!)



Graham R Seach said:
Just to add a dimension to this thread, this is what I use (in VBA):
CountOccurrences = (Len(sSource) - Len(Replace(sSource, sFind, ""))) /
Len(sFind)

Regards,
Graham R Seach
Microsoft Access MVP
Sydney, Australia
---------------------------

John Nurick said:
See the thread "Count occurances of specific character in string" in the
microsoft.public.access.queries newsgroup

Neat. This query

UPDATE tblMain
SET
ACount = Len([Sequence])-Len(Replace([Sequence],"A","")),
CCount = Len([Sequence])-Len(Replace([Sequence],"C","")),
GCount = Len([Sequence])-Len(Replace([Sequence],"G","")),
TCount = Len([Sequence])-Len(Replace([Sequence],"T",""))
;

updated the 100,000 records in about 1'34", a minute faster than my
"static" function.
 
G

Graham R Seach

I don't read Smart Access. Perhaps I should. It seems it would save me the
trouble of working out stuff for myself.

Regards,
Graham R Seach
Microsoft Access MVP
Sydney, Australia

Microsoft Access 2003 VBA Programmer's Reference
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764559036.html
Douglas J. Steele said:
Documented in my April, 2004 Access Answers column in Smart Access
http://members.rogers.com/douglas.j.steele/SmartAccess.html although I use
\ Len(sFind) instead of / Len(sFind).

As I state in the article, I got it from a post Mike Sutton (a VB MVP)
made some time back.

--
Doug Steele, Microsoft Access MVP

(no e-mails, please!)



Graham R Seach said:
Just to add a dimension to this thread, this is what I use (in VBA):
CountOccurrences = (Len(sSource) - Len(Replace(sSource, sFind, ""))) /
Len(sFind)

Regards,
Graham R Seach
Microsoft Access MVP
Sydney, Australia
---------------------------

John Nurick said:
See the thread "Count occurances of specific character in string" in the
microsoft.public.access.queries newsgroup

Neat. This query

UPDATE tblMain
SET
ACount = Len([Sequence])-Len(Replace([Sequence],"A","")),
CCount = Len([Sequence])-Len(Replace([Sequence],"C","")),
GCount = Len([Sequence])-Len(Replace([Sequence],"G","")),
TCount = Len([Sequence])-Len(Replace([Sequence],"T",""))
;

updated the 100,000 records in about 1'34", a minute faster than my
"static" function.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top