jimbo
When 50% of a data set consists of a text value and the rest is numeric,
the average is typically calculated either by converting the text value
to a numeric one and then calculating the mean, or by ignoring the text
values and averaging the numeric values only.
For instance, the lowest detectable limit for serum aluminum may be
<5, which is a text value, while the remaining data ranges between
5 and 100. Setting every <5 equal to 5 overestimates the mean, because
those values actually range between 0 and 4.9, not 5. Ignoring the <5s
is worse: if the distribution is Gaussian, that approach yields a mean
close to 47, where the substitution approach gives a mean around 20.
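To make the bias of the two work-arounds concrete, here is a minimal sketch on simulated data (the distribution parameters are made up for illustration, not taken from the post; a simulated Gaussian can also dip below zero, which real concentrations cannot):

```python
# Compare the two common work-arounds for "<LIMIT" text values:
# (1) substitute the limit itself, (2) drop the values entirely.
import numpy as np

rng = np.random.default_rng(0)
LIMIT = 5.0

# Simulated "true" values, some of which fall below the reporting limit.
values = rng.normal(loc=20.0, scale=15.0, size=1000)
below = values < LIMIT

substituted = np.where(below, LIMIT, values)  # every "<5" treated as 5
ignored = values[~below]                      # every "<5" dropped

print(values.mean())       # true mean, unknown in practice
print(substituted.mean())  # biased upward: "<5" values pulled up to 5
print(ignored.mean())      # biased further upward: low tail discarded
```

Both shortcuts overestimate the true mean here, and dropping the censored values overestimates it more, which matches the complaint in the post.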
But there should be a way to use the slope of the distribution as it
approaches the lower reportable limit, together with the number of text
values, to determine a numeric equivalent for the <5 values and then,
with those numeric values, compute the mean of the data set as well as
other statistical measures.
Does anyone know of such a calculation, or a similar or equivalent one?
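One standard treatment of exactly this problem is maximum-likelihood estimation for left-censored data: each numeric value contributes its density, and each "<5" contributes the probability mass below the limit, so the fit uses the shape of the distribution near the limit plus the count of censored values, much as described above. A minimal sketch, assuming a Gaussian distribution and made-up simulated numbers:

```python
# Sketch: ML estimate of mean and SD of a Gaussian data set in which
# values below a detection limit (here 5) are reported only as "<5".
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(42)
LIMIT = 5.0

# Simulated "true" values, then censored at the reporting limit.
true_values = rng.normal(loc=20.0, scale=15.0, size=1000)
observed = true_values[true_values >= LIMIT]    # reported numerically
n_censored = int((true_values < LIMIT).sum())   # reported only as "<5"

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # Observed values contribute their log-density; each censored value
    # contributes the log-probability of falling below the limit.
    ll = stats.norm.logpdf(observed, mu, sigma).sum()
    ll += n_censored * stats.norm.logcdf(LIMIT, mu, sigma)
    return -ll

result = optimize.minimize(neg_log_likelihood,
                           x0=[observed.mean(), observed.std()],
                           method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)  # should land near the true 20 and 15
```

The same likelihood idea works with a lognormal or other assumed distribution; related alternatives in the environmental/clinical literature include regression on order statistics and Kaplan-Meier estimation for non-detects.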