data filtering based on last two digits of large number

M

Margo Guda

I have a very large database where I want to filter out numbers that do
not have certain two digits as their last two. For example, the column
based on which I want to filter out data contains numbers like
197301310153. I want to filter out anything that does not end in 53.
I tried using a criterium with a formula such as
=RIGHT(TEXT(A5,"###########"),2)="53", where A5 contains the first
record of the database. I put this criterium in A2, and when I select
the advanced filter from the menu I make sure it does not look for a
column label, i.e. the criterium only refers to this cell A2. But this
does not filter out anything. Can anyone here help me?
Thanks in advance.

Margo Guda.
 
B

Bernard Liengme

In a cell on row 5 enter =--RIGHT(A5,2)
Put a name (TEST) in the top row
Select all the data including the top row with names and filter
 
M

Margo Guda

Can you explain a little more? I tried this, but not all records that do
not satisfy the criterium are filtered out. for example, I still see a
record for 200201200642, even though I am trying to get only numbers
ending in 53 to show. I seem to still be doing something wrong.
 
M

Margo Guda

Here is a snapshot of the first few rows of my dataset, containing at
least one value that was supposed to be filtered out. (see the fourth
row of figures). Some values are indeed filtered out, but many less than
should be.

date/time dir speed gust TEST


date/time dir speed gust TEST
200201010053 50 7 *** 53
200201010153 60 7 ***
200201010253 60 10 ***
200201010336 50 9 ***
200201010453 80 7 ***
200201010553 70 9 ***
200201010653 60 9 ***
200201010853 90 18 ***
200201010953 100 17 ***
200201011253 110 15 ***
200201011353 120 13 ***
200201011453 110 15 ***
 
M

Margo Guda

Do you mean add a column with this formula to the whole dataset? (i.e.
in a new column, put one in every cell where column A has a number?).
That would pose a different problem. The file as it now stands is over
14 MB. I'm not sure if my system can handle it if I do this (There are
tens of thousands of data to do!)
 
B

Bernard Liengme

You must copy the formula all the way down the column.
Not really supersized- yet!
 
B

Bernard Liengme

You might want to use Tools | Options | Calculations and set calculations to
manual. Then copy formula. Now press F9 to calculate.
Remember to turn automatic calculations back on.
 
P

pdberger

Margo --

Here's a different approach, because it looks like you want to manipulate
some wind data in other columns:

A B C D
1 12345 2 3
=IF(RIGHT(TEXT(A1,"0"),2)="45",B1*C1,"")
2 22345 2 4
3 12346 2 3

In case it gets garbled at the end of a line, here is cell D1 again:

=IF(RIGHT(TEXT(A2,"0"),2)="45",B2*C2,"")

It says, "Take the number in A2, converting it to a string with a "0000"
format with no decimal points. Take the right two characters and, if they
are "45", multiply b1 * c1. If not, leave the cell blank." This would
require creating a new column. If that's a problem, I suspect you could do
it either with an array formula of some kind.

HTH.

Peter
 
M

Margo Guda

Here is the situation, and the reason why I want to do it with
filtering: This is supposedly hourly wind data for a number of years,
one year per sheet. The file is 14 MB. At unpredictable times, there are
more observations in the hour. Also, some hours are missing. I am
preparing to do an analysis of the hourly data, and the first thing I
want to do is build a matrix, one row per day, one column for each hour.
Thus, I need to weed out the extra observations. Each observation has a
timestamp, the last four digits of which are the hour and the minutes
past the hour it was taken. So, 0145 means the observation is for 01:45.
In that year, most observations were taken at 45 minutes past the hour
(in this example). But in special cases - that I do not need - there may
be more observations for that hour, say at 15 and 30 minutes past the
hour as well. Those I need to weed out. What I had hoped to do is make a
filter that would take only the records with the 45 timestamp, and
either filter the database in place, or better, copy the required
records to a new file. I would then continue to work with the data in
the new file, where I could build my matrix and do the analyses I need.
Bernard's suggestion worked - but only to some extent. Some offending
records would indeed be filtered out, but far from all. His later
approach, and yours, will swell my file to even unwieldier proportions,
and I would still have to do the filtering in order to get rid of the
ballast. And I don't understand why his first stab does not filter out
every record that does not satisfy the criterion. Can you explain that?
 
R

Ron Coderre

I think an Advanced Filter will do what you want easily without copying
thousands of formulas and without increasing the size of the workbook.

You've got 2 ways to go with this method:
1)You can copy the valid records to a new location
or
2)You can delete the invalid records (my preference)

I assumed your data begins in A1

METHOD 1:
F1: TestCrit
F2: =(--RIGHT(A2,2)=53)

Select your data range (mine is A1:D12000)
Data|Filter|Advanced Filter
Check: Copy to another location
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Copy To: $H1
Click the [OK] butotn

All of the valid records, where the date/time values end in 53, will be
copied to the range beginning with cell H1

METHOD 2:
F1: TestCrit
F2: =(--RIGHT(A2,2)<>53)

Select the data range (mine is $A$1:$D$12000)
Data|Filter|Advanced Filter
Check: Filter the list in-place
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Click the [OK] butotn

Only the values that do not end in 53 will be displayed.
(Check that NO valid rows are displayed)

Select from the first visible record under A1 down through the last visible
record.
Edit|Delete..... (Excel only allows you to delete entire rows when filtering)
(That will delete those rows)
Data|Filter|Show....to display the remaining valid rows
File|Save As...so you don't overwrite the original file.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro
 
M

Margo Guda

I finally found how to do this. Somehow the Right(A2,2) etc criterion
did not do the trick, but the following did:
=ISNUMBER(FIND("53",A2,11)). I found this on a website called
Contextures, in a simpler form (without the third parameter, but I
needed that because in some instances I would have two possible
instances of the search figures in the cell). I am confused about why
this works, though. I thought FIND is a text function; would it handle
the large numbers as strings? Excel does think they are numbers, but I
think the filter function does both, somehow? Anyway, I've found a way
of filtering out the bad data, indeed by, once I had a good working
filter, copying the good cells to a new range and then cutting them out
and pasting into a new worksheet. I have not deleted the invalid data
because they do have some validity (although not for the project I'm
working on now). So I need them in there.
Thanks everyone for trying to help.

Margo Guda.

Ron said:
I think an Advanced Filter will do what you want easily without copying
thousands of formulas and without increasing the size of the workbook.

You've got 2 ways to go with this method:
1)You can copy the valid records to a new location
or
2)You can delete the invalid records (my preference)

I assumed your data begins in A1

METHOD 1:
F1: TestCrit
F2: =(--RIGHT(A2,2)=53)

Select your data range (mine is A1:D12000)
Data|Filter|Advanced Filter
Check: Copy to another location
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Copy To: $H1
Click the [OK] butotn

All of the valid records, where the date/time values end in 53, will be
copied to the range beginning with cell H1

METHOD 2:
F1: TestCrit
F2: =(--RIGHT(A2,2)<>53)

Select the data range (mine is $A$1:$D$12000)
Data|Filter|Advanced Filter
Check: Filter the list in-place
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Click the [OK] butotn

Only the values that do not end in 53 will be displayed.
(Check that NO valid rows are displayed)

Select from the first visible record under A1 down through the last visible
record.
Edit|Delete..... (Excel only allows you to delete entire rows when filtering)
(That will delete those rows)
Data|Filter|Show....to display the remaining valid rows
File|Save As...so you don't overwrite the original file.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro


:

Here is a snapshot of the first few rows of my dataset, containing at
least one value that was supposed to be filtered out. (see the fourth
row of figures). Some values are indeed filtered out, but many less than
should be.

date/time dir speed gust TEST


date/time dir speed gust TEST
200201010053 50 7 *** 53
200201010153 60 7 ***
200201010253 60 10 ***
200201010336 50 9 ***
200201010453 80 7 ***
200201010553 70 9 ***
200201010653 60 9 ***
200201010853 90 18 ***
200201010953 100 17 ***
200201011253 110 15 ***
200201011353 120 13 ***
200201011453 110 15 ***
 
R

Ron Coderre

I'm glad you got that to work.

I'm curious, though:
You stated that =ISNUMBER(FIND("53",A2,11)) finds the 53 and =Right(A2,2)
does not. Did you use the criteria formula that was posted?:
=(--RIGHT(A2,2)=53)

The double-negative-operators are necessary in that form of the criteria.
=RIGHT(A2,2) returns the word "53" and you'd need to use this instead:
=(RIGHT(A2,2)="53")

=--RIGHT(A2,2) returns the number 53

If you did use the correct form of the criteria then that seems to indicate
that the value in A2 does not end in 2 digits. It may have trailing spaces
or non-printing characters. If you're curious, try this:

=LEN(A2)

If that formula returns a number greater than 12 (the length of
200201010053), you've found the complication.

***********
Regards,
Ron

XL2002, WinXP-Pro


Margo Guda said:
I finally found how to do this. Somehow the Right(A2,2) etc criterion
did not do the trick, but the following did:
=ISNUMBER(FIND("53",A2,11)). I found this on a website called
Contextures, in a simpler form (without the third parameter, but I
needed that because in some instances I would have two possible
instances of the search figures in the cell). I am confused about why
this works, though. I thought FIND is a text function; would it handle
the large numbers as strings? Excel does think they are numbers, but I
think the filter function does both, somehow? Anyway, I've found a way
of filtering out the bad data, indeed by, once I had a good working
filter, copying the good cells to a new range and then cutting them out
and pasting into a new worksheet. I have not deleted the invalid data
because they do have some validity (although not for the project I'm
working on now). So I need them in there.
Thanks everyone for trying to help.

Margo Guda.

Ron said:
I think an Advanced Filter will do what you want easily without copying
thousands of formulas and without increasing the size of the workbook.

You've got 2 ways to go with this method:
1)You can copy the valid records to a new location
or
2)You can delete the invalid records (my preference)

I assumed your data begins in A1

METHOD 1:
F1: TestCrit
F2: =(--RIGHT(A2,2)=53)

Select your data range (mine is A1:D12000)
Data|Filter|Advanced Filter
Check: Copy to another location
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Copy To: $H1
Click the [OK] butotn

All of the valid records, where the date/time values end in 53, will be
copied to the range beginning with cell H1

METHOD 2:
F1: TestCrit
F2: =(--RIGHT(A2,2)<>53)

Select the data range (mine is $A$1:$D$12000)
Data|Filter|Advanced Filter
Check: Filter the list in-place
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Click the [OK] butotn

Only the values that do not end in 53 will be displayed.
(Check that NO valid rows are displayed)

Select from the first visible record under A1 down through the last visible
record.
Edit|Delete..... (Excel only allows you to delete entire rows when filtering)
(That will delete those rows)
Data|Filter|Show....to display the remaining valid rows
File|Save As...so you don't overwrite the original file.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro


:

Here is a snapshot of the first few rows of my dataset, containing at
least one value that was supposed to be filtered out. (see the fourth
row of figures). Some values are indeed filtered out, but many less than
should be.

date/time dir speed gust TEST


date/time dir speed gust TEST
200201010053 50 7 *** 53
200201010153 60 7 ***
200201010253 60 10 ***
200201010336 50 9 ***
200201010453 80 7 ***
200201010553 70 9 ***
200201010653 60 9 ***
200201010853 90 18 ***
200201010953 100 17 ***
200201011253 110 15 ***
200201011353 120 13 ***
200201011453 110 15 ***


Bernard Liengme wrote:

In a cell on row 5 enter =--RIGHT(A5,2)
Put a name (TEST) in the top row
Select all the data including the top row with names and filter
 
M

Margo Guda

Ron said:
I'm glad you got that to work.

I'm curious, though:
You stated that =ISNUMBER(FIND("53",A2,11)) finds the 53 and =Right(A2,2)
does not. Did you use the criteria formula that was posted?:
=(--RIGHT(A2,2)=53)
Yes. But I did not say it did not find them - just that some values that
I wanted to filter out also slipped through, for instance those with 46
instead of 53. THe 53s were not suppressed.
The double-negative-operators are necessary in that form of the criteria.
=RIGHT(A2,2) returns the word "53" and you'd need to use this instead:
=(RIGHT(A2,2)="53")

=--RIGHT(A2,2) returns the number 53

If you did use the correct form of the criteria then that seems to indicate
that the value in A2 does not end in 2 digits. It may have trailing spaces
or non-printing characters. If you're curious, try this:

=LEN(A2)
I still think there is some confusion with the 12-digit number being
handled as a string by excel, even though it seems to be in number
format. After some manipulation of the worksheet, it seemed to think the
number was a string, not a number.
Anyway since I got this to work, and some other sorting and filtering
things I tried to do suggest conversion to a string somewhere along the
way, I'll stick to this method, since it works well and is not too
consuming of resources.
I appreciate your input. Thanks for following up, too.

Signing off,
Margo Guda.
If that formula returns a number greater than 12 (the length of
200201010053), you've found the complication.

***********
Regards,
Ron

XL2002, WinXP-Pro


:

I finally found how to do this. Somehow the Right(A2,2) etc criterion
did not do the trick, but the following did:
=ISNUMBER(FIND("53",A2,11)). I found this on a website called
Contextures, in a simpler form (without the third parameter, but I
needed that because in some instances I would have two possible
instances of the search figures in the cell). I am confused about why
this works, though. I thought FIND is a text function; would it handle
the large numbers as strings? Excel does think they are numbers, but I
think the filter function does both, somehow? Anyway, I've found a way
of filtering out the bad data, indeed by, once I had a good working
filter, copying the good cells to a new range and then cutting them out
and pasting into a new worksheet. I have not deleted the invalid data
because they do have some validity (although not for the project I'm
working on now). So I need them in there.
Thanks everyone for trying to help.

Margo Guda.

Ron said:
I think an Advanced Filter will do what you want easily without copying
thousands of formulas and without increasing the size of the workbook.

You've got 2 ways to go with this method:
1)You can copy the valid records to a new location
or
2)You can delete the invalid records (my preference)

I assumed your data begins in A1

METHOD 1:
F1: TestCrit
F2: =(--RIGHT(A2,2)=53)

Select your data range (mine is A1:D12000)
Data|Filter|Advanced Filter
Check: Copy to another location
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Copy To: $H1
Click the [OK] butotn

All of the valid records, where the date/time values end in 53, will be
copied to the range beginning with cell H1

METHOD 2:
F1: TestCrit
F2: =(--RIGHT(A2,2)<>53)

Select the data range (mine is $A$1:$D$12000)
Data|Filter|Advanced Filter
Check: Filter the list in-place
List Range: $A$1:$D$12000
Criteria Range: $F$1:$F$2
Click the [OK] butotn

Only the values that do not end in 53 will be displayed.
(Check that NO valid rows are displayed)

Select from the first visible record under A1 down through the last visible
record.
Edit|Delete..... (Excel only allows you to delete entire rows when filtering)
(That will delete those rows)
Data|Filter|Show....to display the remaining valid rows
File|Save As...so you don't overwrite the original file.

Does that help?

***********
Regards,
Ron

XL2002, WinXP-Pro


:



Here is a snapshot of the first few rows of my dataset, containing at
least one value that was supposed to be filtered out. (see the fourth
row of figures). Some values are indeed filtered out, but many less than
should be.

date/time dir speed gust TEST


date/time dir speed gust TEST
200201010053 50 7 *** 53
200201010153 60 7 ***
200201010253 60 10 ***
200201010336 50 9 ***
200201010453 80 7 ***
200201010553 70 9 ***
200201010653 60 9 ***
200201010853 90 18 ***
200201010953 100 17 ***
200201011253 110 15 ***
200201011353 120 13 ***
200201011453 110 15 ***


Bernard Liengme wrote:


In a cell on row 5 enter =--RIGHT(A5,2)
Put a name (TEST) in the top row
Select all the data including the top row with names and filter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top