T
toadflax
Hi,
I am struggling with a data set which is drawn from several different
sources and thus has a large amount of duplicated information. It's in
three columns, with the contents of first two columns usually being
duplicated (unless there's a typo or such) but the value in the third
column differing because it indicates the source of the data. There
are varying numbers of these rows from one (only coming from a single
data source) to eight (coming from all eight data sources). What I
want to do is to combine the data on a single row by looking row-by-
row at the first two columns. If they contain exactly the same values,
I'd like to combine them onto a single row, with the third and
subsequent columns in the row showing the data source information. If
they only come from one data source or the first two columns aren't
exact duplicates, I'd like them left as is. Can this be done with a
macro or some other way? I've done a good bit of searching but
couldn't find anything that exactly addressed my situation.
For example (with semi-colons being used to indicate columns, they
aren't actually in the data):
Group;Family;State
PD;Imaginaceae;NSW
PD;Imaginaceae;Qld
PD;Imaginaceae;WA
CL;Frustraceae;Qld
CP;Fakaceae;Qld
PG;Fakaceae;NT
PG;Furphyaceae;Qld
PG;Furphyaceae;WA
What I want it to look like is:
Group;Family;State1;State2;State3;...State8
PD;Imaginaceae;NSW;Qld;WA
CL;Frustraceae;Qld
CP;Fakaceae;Qld
PG;Fakaceae;NT
PG;Furphyaceae;Qld;WA
Do I need to sort the data alphabetically by say family and then group
for the macros to work, or can it be in any order throughout the file?
Additionally, some of the names are bolded and this means something,
so I'd rather not lose it. Also, I may at some future time want to add
other sources of data, so it would be great if there were comments or
something to indicate what if anything I'd need to change if I decide
to run this on such a file with more than eight possible rows... Don't
ask for much, do I?
Thanks very much for any help,
Michelle
I am struggling with a data set which is drawn from several different
sources and thus has a large amount of duplicated information. It's in
three columns, with the contents of first two columns usually being
duplicated (unless there's a typo or such) but the value in the third
column differing because it indicates the source of the data. There
are varying numbers of these rows from one (only coming from a single
data source) to eight (coming from all eight data sources). What I
want to do is to combine the data on a single row by looking row-by-
row at the first two columns. If they contain exactly the same values,
I'd like to combine them onto a single row, with the third and
subsequent columns in the row showing the data source information. If
they only come from one data source or the first two columns aren't
exact duplicates, I'd like them left as is. Can this be done with a
macro or some other way? I've done a good bit of searching but
couldn't find anything that exactly addressed my situation.
For example (with semi-colons being used to indicate columns, they
aren't actually in the data):
Group;Family;State
PD;Imaginaceae;NSW
PD;Imaginaceae;Qld
PD;Imaginaceae;WA
CL;Frustraceae;Qld
CP;Fakaceae;Qld
PG;Fakaceae;NT
PG;Furphyaceae;Qld
PG;Furphyaceae;WA
What I want it to look like is:
Group;Family;State1;State2;State3;...State8
PD;Imaginaceae;NSW;Qld;WA
CL;Frustraceae;Qld
CP;Fakaceae;Qld
PG;Fakaceae;NT
PG;Furphyaceae;Qld;WA
Do I need to sort the data alphabetically by say family and then group
for the macros to work, or can it be in any order throughout the file?
Additionally, some of the names are bolded and this means something,
so I'd rather not lose it. Also, I may at some future time want to add
other sources of data, so it would be great if there were comments or
something to indicate what if anything I'd need to change if I decide
to run this on such a file with more than eight possible rows... Don't
ask for much, do I?
Thanks very much for any help,
Michelle