Removing multiple duplicate events

F

Father John

Hi

I have a spreadsheet that has a complete address in one column (a)
In column (b) (B1:B165) I have the name of the suburbs I wish to remove from
col (a)

Start
Col (A)
15 Smith Street Blacktown

Col (B7) contains the suburb Blacktown (I have 165 suburbs in this column)

I want to remove that same data from (A) so I just end up with 15 Smith
Street.

Any ideas?

Thanks in advance - Stephen
 
M

Max

Lightly tested, this should work ok ..
The reference suburbs in B1:B165 are assumed fully populated
With your source addresses in A1 down,
Put this in C1, normal ENTER to confirm will do:
=IF(SUMPRODUCT(--ISNUMBER(SEARCH($B$1:$B$165,A1)))>0,TRIM(SUBSTITUTE(A1,INDEX($B$1:$B$165,MATCH(TRUE,INDEX(ISNUMBER(SEARCH($B$1:$B$165,A1)),),0)),"")),"")
Copy C1 down to the last row of source data in col A. Col C should return
the desired cleansed results. Success? wave it here ..
 
M

Max

A slight correction to the end part of the expression. It should read:
.... )),),0)),"")),A1)
 
A

Ashish Mathur

Hi,

Assume 15 Smith Street Blacktown is in cell A7. In cell C7, enter the
following formula

=TRIM(LEFT(A7,SEARCH(B7,A7)-1))
 
F

Father John

Ashish
The only problem is the data in col. B is a list of the suburbs in a region
and your formula relies by chance as one of the suburbs being on the same
line in the sheet
It would need to say if any of the data in col b. = the same as col a.
delete it from col. A

Many thanks
Stephen
 
F

Father John

Hi Max
Thanks for the effort here!
Not sure now of the entire structure of the expression as the correction
doesn't show if I am replacing a section or adding to it would you be able
to retype the entire line?
With thanks
Stephen West
Gold Coast, Australia
 
S

Steve Dunn

Stephen,

just replace the last "" with A1.

Max,

that's pretty neat! I particularly like the INDEX(array,) wrapping to
elimate the need for array entry. Definitely something I'll be putting to
good use. Cheers.

Steve D.
 
A

Ashish Mathur

Hi,

You may try this array formula (Ctrl+Shift+Enter)

=TRIM(LEFT(A7,MAX(IF(ISNUMBER(SEARCH(B1:B165,A7)),SEARCH(B1:B165,A7)))-1))
 
R

Ron Rosenfeld

Hi

I have a spreadsheet that has a complete address in one column (a)
In column (b) (B1:B165) I have the name of the suburbs I wish to remove from
col (a)

Start
Col (A)
15 Smith Street Blacktown

Col (B7) contains the suburb Blacktown (I have 165 suburbs in this column)

I want to remove that same data from (A) so I just end up with 15 Smith
Street.

Any ideas?

Thanks in advance - Stephen

There is a potential problem with Max's solution if one suburb name is a part
of a subsequent suburb name, or if a suburb name happens to also be a street
name. The problem arises because Max's algorithm does not check that the
position of the match is at the end of the string.

I took a list of eastern Sydney suburbs.

These include, among others

Bondi
Bondi Beach
Bondi Junction

If you have addresses in Bondi Beach, or Bondi Junction, Max's formula will
only remove Bondi.

In other words:

17 Doolittle Place Bondi Beach --> 17 Doolittle Place Beach

One possible solution is to manually ensure that the order of suburbs is such
that the shortest name (assessed by fewest number of words) is at the bottom of
the list, so that those suburbs are listed as

Bondi Junction
Bondi Beach
Bondi

Another issue can arise if a suburb name happens to be part of the address.

For example, Mascot and Eastlakes are both eastern Sydney suburbs. So,
depending on the sort direction,

17 Mascot St Eastlakes might --> 17 St Eastlakes.

Here is a User Defined Function that uses a Regular Expression to ensure the
Suburb name being matched does, indeed, occur at the end of the string.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

C1: =RemSuburb(A1,$B$1:$B$165)

and fill down as needed.

===========================================
Option Explicit
Function RemSuburb(Address As String, Suburbs As Range) As String
Dim re As Object
Dim SuburbString As String
Dim c As Range

For Each c In Suburbs
'create pipe-delimited list of Suburbs
If Len(c.Value) > 0 Then
SuburbString = SuburbString & Trim(c.Value) & "|"
End If
Next c
'remove last pipe
SuburbString = Left(SuburbString, Len(SuburbString) - 1)

Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s(" & SuburbString & ")$"

RemSuburb = re.Replace(Trim(Address), "")
End Function
=================================================
--ron
 
R

Ron Rosenfeld

There is a potential problem with Max's solution if one suburb name is a part
of a subsequent suburb name, or if a suburb name happens to also be a street
name. The problem arises because Max's algorithm does not check that the
position of the match is at the end of the string.

Of course, even with this solution, one could still have suburb name
combinations that would return unwanted results, if the repeated word were at
the end of the suburb name, and occurred first in order.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top