Peter Hallett
I have encountered a rather interesting problem. I have a large file of
badly-entered names and addresses – nothing to do with me, I hasten to add.
There are many instances of multiple spaces and other spuriously repeated
characters. Before outputting this data, I therefore put the file through a
VBA filter, which is largely successful in removing this ‘debris’. It simply
takes each field in turn and examines it, character by character, using the
Mid function, in a nested set of Do loops. For specified characters,
particularly the space, if two adjacent characters are found to be the same
then one of them is ignored. Oddly, although this removes by far the
majority of the unwanted padding, a few instances of repeated spaces still
remain. Debugging has shown that, in these cases, although the Mid function
returns " " for both adjacent characters, the test:
If (stTestChar1 = " ") And (stTestChar2 = " ") Then …..
fails to identify a match. I conclude from this that two different ASCII
characters are being interpreted as spaces.
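One way to check this conclusion would be to list the numeric code behind each character in an affected field. The following is only a sketch: the procedure name DumpCharCodes is hypothetical, and it assumes the stray character is something like a non-breaking space (code 160), which looks identical to an ordinary space (code 32) on screen.

```vba
' Hypothetical helper (not part of the existing filter): prints the
' position and numeric code of every character in a field, so that
' look-alike "spaces" can be told apart.
Sub DumpCharCodes(ByVal stField As String)
    Dim i As Long
    For i = 1 To Len(stField)
        ' AscW returns the character code of the single character
        ' extracted by Mid$; an ordinary space is 32, and a
        ' non-breaking space (assumed culprit) is 160.
        Debug.Print i, AscW(Mid$(stField, i, 1))
    Next i
End Sub
```

Calling DumpCharCodes "A" & ChrW(160) & " B" from the Immediate window would list both 160 and 32 among the codes, which is the kind of mismatch that would make the equality test above fail.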
I cannot immediately see how such characters can be conveniently identified.
One way would be to test whether the numeric code of each character lies
within one of the acceptable ranges. This would certainly impose a much
smaller processing overhead than testing individually for each unwanted
character, but I am not sure how to go about it. Can anyone enlighten me?