K
Keith
I have a data extraction / cleansing problem.
A computer help desk system has ticket information with textual detail
placed in a field called 'Comments'. This data has been extracted to
Excel spreadsheets for the last few months. I cannot know exactly
where the 'Comments' column may be in each spreadsheet, however I know
that the ending column location for each row is 'BA'.
We need to extract phone numbers and extensions out of this data.
Phone numbers can be 10 digit telephone numbers such as: 123-123-1234
or 1231231234.
The 10 digit telephone numbers can have 4 or 5 digit extensions (or
PBX stations) such as 12345 or 1234.
The telephone combination at times can be put together such as
1231231234/12345.
Phone numbers can also be 7 digit (local) numbers such as 1231234 or
123-1234.
This information is intermixed with text and I need to extract the
phone numbers out of the Comments.
Comments are like this:
"Joe Blow 1231231234/12345 needs to have his extension moved to new
desk. Call site manager Jack Black 1231232345/09876 for access."
* I need to pull out "1231231234", "12345", "1231232345" and "09876"
from the above and put the extracted data at the end of the
spreadsheet row. Given the end of the row is at BA, and extracted
numbers from the above, put "1231231234" in that row's BB column,
"12345" in BC column, "1231232345" in BD column and "09876" in BE
column.
"Add VM 1234, updated dictionary, added 2345 & 2346. Completed by Jay
Smith."
* I need to pull out "1234", "2345" and "2346" and put "1234" in BB
column, "2345" in BC column and "2346" in BD column.
"AS000AD00SK000DD000R0E0R0WQ0E0D0D0 Jane Smith 4441231237. Setup vmail
for ext 1237. Added to call pickup group as 4948."
* I need to pull out "4441231237", "1237" and "2346" and put
"4441231237" in BB column, "1237" in BC column and "2346" in BD
column.
"4441234567 thru 7890 have been added to VNET Thanks!"
* I need to pull out "4441234567" and "7890" and put "4441234567" in
BB column and "7890" in BC column.
I have found a VBA utility that can pull numbers out of a string from
http://www.google.com/groups?hl=en&lr=&ie=UTF-8&[email protected]
(or http://www.mvps.org/dmcritchie/excel/grove_digitsid.htm), but that
only returns one number.
I was thinking about adding a parent function which would get the
column, tokenize the contents of 'Comments' on space, check the length
of the tokenized sting content length > 3 and if it is pass the
tokenized sting content to the above routine. If it returns a good
number do some comparisons to be sure it passes the above business
rules and put the results in the end of that row.
Being a good reuse programmer, I was wondering if anyone has something
that does something kind-of similar to what I need before I start on
my coding journey.
Please reply to the group, so that other people can search on this and
(hopefully) find a good answer.
Thanks,
Keith
A computer help desk system has ticket information with textual detail
placed in a field called 'Comments'. This data has been extracted to
Excel spreadsheets for the last few months. I cannot know exactly
where the 'Comments' column may be in each spreadsheet, however I know
that the ending column location for each row is 'BA'.
We need to extract phone numbers and extensions out of this data.
Phone numbers can be 10 digit telephone numbers such as: 123-123-1234
or 1231231234.
The 10 digit telephone numbers can have 4 or 5 digit extensions (or
PBX stations) such as 12345 or 1234.
The telephone combination at times can be put together such as
1231231234/12345.
Phone numbers can also be 7 digit (local) numbers such as 1231234 or
123-1234.
This information is intermixed with text and I need to extract the
phone numbers out of the Comments.
Comments are like this:
"Joe Blow 1231231234/12345 needs to have his extension moved to new
desk. Call site manager Jack Black 1231232345/09876 for access."
* I need to pull out "1231231234", "12345", "1231232345" and "09876"
from the above and put the extracted data at the end of the
spreadsheet row. Given the end of the row is at BA, and extracted
numbers from the above, put "1231231234" in that row's BB column,
"12345" in BC column, "1231232345" in BD column and "09876" in BE
column.
"Add VM 1234, updated dictionary, added 2345 & 2346. Completed by Jay
Smith."
* I need to pull out "1234", "2345" and "2346" and put "1234" in BB
column, "2345" in BC column and "2346" in BD column.
"AS000AD00SK000DD000R0E0R0WQ0E0D0D0 Jane Smith 4441231237. Setup vmail
for ext 1237. Added to call pickup group as 4948."
* I need to pull out "4441231237", "1237" and "2346" and put
"4441231237" in BB column, "1237" in BC column and "2346" in BD
column.
"4441234567 thru 7890 have been added to VNET Thanks!"
* I need to pull out "4441234567" and "7890" and put "4441234567" in
BB column and "7890" in BC column.
I have found a VBA utility that can pull numbers out of a string from
http://www.google.com/groups?hl=en&lr=&ie=UTF-8&[email protected]
(or http://www.mvps.org/dmcritchie/excel/grove_digitsid.htm), but that
only returns one number.
I was thinking about adding a parent function which would get the
column, tokenize the contents of 'Comments' on space, check the length
of the tokenized sting content length > 3 and if it is pass the
tokenized sting content to the above routine. If it returns a good
number do some comparisons to be sure it passes the above business
rules and put the results in the end of that row.
Being a good reuse programmer, I was wondering if anyone has something
that does something kind-of similar to what I need before I start on
my coding journey.
Please reply to the group, so that other people can search on this and
(hopefully) find a good answer.
Thanks,
Keith