How do I pull a random sample of people from a list in Excel?

PM

I am trying to pull 600 contact records from an Excel list of over 50,000.
I think a random sample will work fine. There are two demographic markers that
I have pulled counts on for the entire database. I need a similar ratio of
contacts in my sample of 600, and I believe a random sample will give me
that. Any ideas on how to do this?
 
DOR

You can pull a random sample of N items from a column using this
technique:

Assuming your items are in A2:A100 (with B1 blank or holding a text header),
enter this in B2, the first cell of a helper column B:

=IF(RAND()<(SampSize-COUNT($B$1:B1))/(ROWS($A$2:$A$100)-(ROW()-ROW($2:$2))),ROW(),"")

where SampSize is the name of a cell containing the number of items you
want in your sample.

Drag/copy down to the end of your list.

In C2 enter

=IF(ROW()-ROW($1:$1)>SampSize,"",INDEX(A:A,SMALL(B:B,ROW()-ROW($1:$1))))

Drag it down as far as you want, at least to cover the number of items
you want in your sample. You can drag it further; you should just get
blanks beyond your sample size.

SampSize is a variable, and can be changed at any time to give you a
different sized sample, as long as you have enough instances of the
formula in column C.
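
For anyone who wants to see the logic of the helper column outside Excel, here is a minimal Python sketch of the same idea: each row is picked with probability (items still needed) / (rows still left), so the sample comes out in list order. The function name, the `rng` parameter and the `contact_N` values are made up for illustration; they are not part of the worksheet.

```python
import random

def selection_sample(items, samp_size, rng=random):
    """Pick samp_size items in a single pass, keeping the original list order."""
    chosen = []
    remaining_items = len(items)
    for item in items:
        still_needed = samp_size - len(chosen)
        # Same test as the helper column: RAND() < needed / rows left
        if rng.random() < still_needed / remaining_items:
            chosen.append(item)
        remaining_items -= 1
    return chosen

# Hypothetical stand-in for A2:A100 (99 items)
contacts = [f"contact_{i}" for i in range(1, 100)]
sample = selection_sample(contacts, 10)
print(len(sample), sample)
```

As long as the sample size is no larger than the list, the ratio reaches 1 whenever the rows left equal the items still needed, so the loop ends with the requested number of items.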

Hope this helps

Declan O'Riordan
 
DOR

OOOPS! I gave you a more complex formula than you needed for the
helper column. It can just be =RAND(), as JE McGimpsey points out in
his referenced web site. My formula still works; it just selects the
random sample differently since it was originally formulated for a
slightly different application.

However, the suggestion for column C still stands - it enables you to
make the sample size a spreadsheet variable. It works with RAND() in
the helper column. When used with my original helper column formula it
presents the samples in the sequence in which they exist in the list
rather than the random number sequence in which they would be presented
using JE's web-site method 2, which may or may not be a benefit.

HTH

Declan
 
DOR

Double OOOPS!

I was hasty in saying that the formula I suggested for column C still
stands if you change the helper column to =RAND(). It doesn't; it
needs to be changed to

=IF(ROW()-ROW($1:$1)>SampSize,"",INDEX($A:$A,MATCH(SMALL(B:B,ROW()-ROW($1:$1)),B:B,0)))

where SampSize refers to the variable, sample size.

I overlooked the MATCH expression, which was unnecessary in my version.
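
As a rough illustration of what the =RAND() helper column plus SMALL/MATCH/INDEX is doing, here is a small Python sketch. It assumes one random key per item, takes the SampSize smallest keys, and looks each key back up to find its item. The names `sample_by_random_key`, `keys` and the `contact_N` values are hypothetical.

```python
import random

def sample_by_random_key(items, samp_size, rng=random):
    """Return the items whose random keys are the samp_size smallest."""
    keys = [rng.random() for _ in items]      # helper column B: =RAND()
    smallest = sorted(keys)[:samp_size]       # SMALL(B:B, k) for k = 1..SampSize
    # MATCH(key, B:B, 0) finds the row of each key, INDEX(A:A, row) returns the item
    return [items[keys.index(key)] for key in smallest]

contacts = [f"contact_{i}" for i in range(1, 100)]   # stands in for A2:A100
sample = sample_by_random_key(contacts, 5, random.Random(0))
print(sample)
```

Unlike the original helper formula, the result here comes out in random-key order rather than list order, which matches the difference described above.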

Declan O'R
 
Alan

Hi,

This is the simplest way I know:

If you have your data in B1:D50000, do the following:

Enter the following into A1 and copy down to A50000:

=RAND()

Now in E1 enter:

=VLOOKUP(SMALL(A$1:A$50000,ROW()),A$1:D$50000,Col_Offset,FALSE)

Where Col_Offset is 2 to return the data in column B, 3 to return the
data in column C, and so on.

Copy E1 down to E600 (to get a sample of 600 items).
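
To get a feel for whether a plain random sample of 600 keeps roughly the same demographic ratios as the full list, the same random-key approach can be tried in Python on made-up data. Everything about the data below is an assumption for illustration: the two marker columns, their values, and the fixed seed are invented, not taken from the original list.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the run is repeatable (assumption)

# Hypothetical stand-in for B1:D50000: (name, marker_1, marker_2)
rows = [
    (f"contact_{i}", rng.choice("MF"), rng.choice(["18-34", "35-54", "55+"]))
    for i in range(50_000)
]

# Column A: one random key per row; keep the 600 rows with the smallest keys
keyed = sorted((rng.random(), row) for row in rows)
sample = [row for _, row in keyed[:600]]

def shares(records, col):
    """Fraction of records holding each value in the given column."""
    counts = Counter(record[col] for record in records)
    return {value: n / len(records) for value, n in counts.items()}

for col in (1, 2):
    print("full list:", shares(rows, col))
    print("sample:   ", shares(sample, col))
```

With 600 records the sample shares will usually land within a few percentage points of the full-list shares, but they will not match exactly.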

HTH,

Alan.

--
The views expressed are my own, and not those of my employer or anyone
else associated with me.

My current valid email address is:

(e-mail address removed)

This is valid as is. It is not munged, or altered at all.

It will be valid for AT LEAST one month from the date of this post.

If you are trying to contact me after that time,
it MAY still be valid, but may also have been
deactivated due to spam. If so, and you want
to contact me by email, try searching for a
more recent post by me to find my current
email address
 
Alan

I should just comment that it is theoretically possible to get two
duplicate random numbers.

However, since RAND() evaluates to 15 decimal places, I guess that
means that, even for the 50,000th line, the chances of getting a
duplicate number are only about 50,000 in 10^15 or about 5 in 10^11
(if I have that right).

That is less than the chance of randomly selecting two people from a
list of everyone on Earth and getting the same person both times.

I can live with that in a practical world, and if it is an issue, you
can always put an error check in your worksheet to alert to
duplication if you really want to.
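
The arithmetic above can be checked with a short Python sketch. It takes the same assumption as the estimate, namely that RAND() yields one of 10^15 equally likely values, and also works out the chance of a duplicate anywhere in the column using the usual birthday-problem approximation. The variable names are just for illustration.

```python
import math

n = 50_000     # rows holding a RAND() value
m = 10**15     # distinct values assumed (15 decimal places)

# Chance that the 50,000th row repeats one of the earlier values
p_last = (n - 1) / m

# Chance of at least one duplicate anywhere in the column
# (birthday-problem approximation: 1 - exp(-n(n-1) / 2m))
p_any = -math.expm1(-n * (n - 1) / (2 * m))

print(f"last row duplicates an earlier one: {p_last:.2e}")
print(f"any duplicate in the column:        {p_any:.2e}")
```

Under that assumption the first figure is about 5 in 10^11, and the chance of any duplicate in the whole column is on the order of 1 in 800,000, which is still small enough to ignore for most practical purposes.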

Alan.


 
