Rod Urand said:
Thanks for that, not sure it will work. I need to
ensure the sample has precisely the right ratios
according to the existing data.
First, in general, you can never guarantee "precisely the right ratios" due
to quantization limitations. For example, if 25% of 100 people are female,
you cannot randomly select 47 people such that "precisely" 25% are female.
Second, a truly random sampling cannot "guarantee" even close to the same
ratios of all characteristics (you mentioned ethnicity as well as gender).
A truly random sampling can only guarantee certain ratios with some
probability.
If you were interested in guaranteeing the same ratio (within quantization
limitations) of only two categories, e.g. male and female, you could split
the population into male and female, and then take a random sampling of
INT(N*56%) from the male subgroup and a random sampling of N - INT(N*56%)
from the female subgroup. (Note: Do not try to take ROUND(N*56%,0) and
ROUND(N*44%,0). That might not total N.)
Likewise, if you want the same ratio of male/female and white/non-white,
say, you could make 4 subgroups based on the pairwise characteristics. If
43% of males are white, and 47% of females are white, you could take the
following random samplings from each subgroup: INT(M*43%) from white males
and M - INT(M*43%) from nonwhite males, and INT(F*47%) from white females
and F - INT(F*47%) from nonwhite females, where M = INT(N*56%) and F = N -
INT(N*56%).
But as you can imagine, that strategy breaks down very quickly as the number
of pairwise (or "morewise") characteristics increases. Furthermore, the
quantization error increases as the size of subgroups decreases.
At that point, it is best to take truly random samplings of a "sufficient
size" from the entire population, and let the chips fall where they may.
What constitutes a "sufficient size" depends on the standard deviation and
the population size. There are ways to calculate it in simple cases. But
it may be difficult to calculate when there are many characteristics to
consider.
Actually, it is best to take multiple truly random samplings of the same
size, and average the results. But that is not always feasible.
----- original message -----
Rod Urand said:
Bob,
Thanks for that, not sure it will work. I need to ensure the sample
has precisely the right ratios according to the existing data.
For example, in the list there are 1500 people. 56% are women, 44% are
men.
Of the 500 sample records, 56% or 280 must be women, 44% or 220 must
be men.
----- previous message -----