Hi Ron
The purpose of this process is to come up with a total cost for all
similar items (e.g the total cost of printers, regardless of brand -
Epson/Panasonic).
The problem is to come up with a list of keywords of items, as there
are a few thousand rows of different items with different brands (and
many are overlapping matches as you have pointed out). I do not know
entirely what are the items contain in the list.
I was hoping that if excel could highlight / identify these items
(say, keyword=printer) which are similar then I could use the
SUMPRODUCT function to total up all cost based on the keywords.
But then again, excel obviously would not know how to distinguish
between an item (handphone, printer, cond, etc) from brands (Epson,
panasonic, etc).
I reckon the best way is to get excel to list all duplicate words
(regardless if its item / brand), then I will manually pick out the
items from the list.
I wonder if there is a better way around this.
Many thanks again for your effort and time in looking at this.
SauQ
I think you will have to develop your keyword list manually. There are
routines that can generate a list of individual words, sorted either
alphabetically or by word count. Then you can manually decide which you want
to use as keywords (perhaps with wild cards) in a SUMPRODUCT or SUMIF formula.
Given your initial example:
YORK A/COND
SANYO 1.0HP air-cond
CANON laser printer (model : L-P'TER*LBP-3050)
EPSON - Batching system printer
EPSON - Dot matrix printer
Nokia 1200 - handphone
HP printer
Panasonic K Printer
Panasonic X printer
Panasonic P-P1121 printer
Nokia N6070 handphone
Nokia 1200 handphone
Microphone
Panasonic printer
2nd hand hand phone
And defining a "word" as having only letters, digits or hyphens; and also being
at least two characters in length, here is a list of individual words sorted by
frequency (descending):
printer 8
Panasonic 4
Nokia 3
handphone 3
COND 2
EPSON 2
1200 2
hand 2
YORK 1
SANYO 1
0HP 1
air-cond 1
CANON 1
laser 1
model 1
L-P 1
Batching 1
system 1
Dot 1
matrix 1
HP 1
P-P1121 1
N6070 1
Microphone 1
2nd 1
phone 1
And here is a list sorted alphabetically (ascending):
0HP 1
1200 2
2nd 1
air-cond 1
Batching 1
CANON 1
COND 2
Dot 1
EPSON 2
hand 2
handphone 3
HP 1
laser 1
L-P 1
matrix 1
Microphone 1
model 1
N6070 1
Nokia 3
Panasonic 4
phone 1
P-P1121 1
printer 8
SANYO 1
system 1
YORK 1
There are instructions in the macro posted below for how to change from numeric
to alpha sorting; and also how to change from ascending to descending sorts.
Also, in the macro, as of now, the data source (rSrc) is set to Selection; and
the output destination is set to start at C1 (rDest). These can be changed or
set up in different ways.
Maybe this will help.
To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this Macro (Sub):
Select your data range (unless you have modified the Sub to do it
automatically).
Then <alt-F8> opens the macro dialog box. Select the macro by name, and
<RUN>.
===================================
Option Explicit
Option Compare Text
Sub UniqueWordList()
Dim rSrc As Range, rDest As Range, c As Range
Dim cWordList As Collection
Dim res() As Variant
Dim w() As String
Dim i As Long
Set cWordList = New Collection
Set rSrc = Selection
Set rDest = Range("C1")
rDest.EntireColumn.NumberFormat = "@"
For Each c In rSrc
w = Split(c.Value)
For i = 0 To UBound(w)
w(i) = StripWord(w(i))
If Not w(i) = "" Then
On Error Resume Next
cWordList.Add Item:=w(i), Key:=w(i)
On Error GoTo 0
End If
Next i
Next c
'transfer words to results array
ReDim res(1 To cWordList.Count, 0 To 1)
For i = 1 To cWordList.Count
res(i, 0) = cWordList(i)
Next i
'get counts
For i = LBound(res) To UBound(res)
For Each c In rSrc
res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0))
Next c
Next i
'sort alpha: d=0; sort numeric d=1
'there are various ways of sorting
BubbleSort res, 0
rDest.CurrentRegion.Clear
For i = LBound(res) To UBound(res)
rDest.Offset(i, 0).NumberFormat = "@"
rDest.Offset(i, 0).Value = res(i, 0)
'For just lowercase output, use:
'rDest.Offset(i, 0).Value = LCase(res(i, 0))
rDest.Offset(i, 1).Value = res(i, 1)
Next i
End Sub
Private Function StripWord(s As String) As String
Dim re As Object, mc As Object, m As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
re.Pattern = "[-\w]{2,}"
If re.test(s) = True Then
Set mc = re.Execute(s)
StripWord = mc(0).Value
End If
Set re = Nothing
End Function
Private Function CountWord(ByVal s As String, sPat) As Long
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.IgnoreCase = True
re.Pattern = "\b" & sPat & "\b"
Set mc = re.Execute(s)
CountWord = mc.Count
End Function
Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension
Dim temp(0, 1) As Variant
Dim i As Integer
Dim NoExchanges As Integer
' Loop until no more "exchanges" are made.
Do
NoExchanges = True
' Loop through each element in the array.
For i = LBound(TempArray) To UBound(TempArray) - 1
' If the element is less than the element
' following it, exchange the two elements.
' change "<" to ">" to sort ascending
If TempArray(i, d) > TempArray(i + 1, d) Then
NoExchanges = False
temp(0, 0) = TempArray(i, 0)
temp(0, 1) = TempArray(i, 1)
TempArray(i, 0) = TempArray(i + 1, 0)
TempArray(i, 1) = TempArray(i + 1, 1)
TempArray(i + 1, 0) = temp(0, 0)
TempArray(i + 1, 1) = temp(0, 1)
End If
Next i
Loop While Not (NoExchanges)
End Sub
=====================================================
--ron