Hi,
I've a rather large spreadsheet, and I want to extract the hastags from it (Literally twitter like so my column may contain 'My query is about #excel #spreadsheets today')
I've tried search, find, vlookup but can't find anything that's going to work without causing errors, or without requiring a predifined list of tags to lookup.
I want the result to extract the actual hash tag(s) so I can report on how many times it has been used, so if anyone can help with either a formula or a pivot table with this it'd be greatly appreciated.
Just to clarify, there is no set list of pre-defined hash tags, and they can appear in any location within the column
Many thanks in advance!
Iain
This can be done with a VBA Macro.
I would first generate a list of unique hash tags
Then count each one.
Finally, write the results back to the worksheet.
The following assumes your data source is in column A; and that you want the results in Columns D:E (both of those assumptions are easily changed)
In addition, the results are unsorted (essentially in the order of the first appearance of the tags in your list), but a sort routine could be added to sort either alphabetically, or by count, if that is preferable.
To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.
To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro by name, and <RUN>.
===========================================
Option Explicit
Sub HashTagCount()
Dim colHashTags As Collection
Dim vSrc As Variant, rSrc As Range
Dim rDest As Range
Dim vRes() As Variant
Dim L As Long
Dim v As Variant
Dim c As Range, sFirstAddress As String
Dim re As Object, mc As Object, m As Object
Set rSrc = Range("A1", Cells(Rows.Count, "A").End(xlUp))
vSrc = rSrc
Set rDest = Range("D1")
'generate list of unique hash tags
Set colHashTags = New Collection
Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.Pattern = "#\S+"
End With
For Each v In vSrc
If re.test(v) = True Then
Set mc = re.Execute(v)
On Error Resume Next
With WorksheetFunction
For Each m In mc
colHashTags.Add Item:=.Proper(m), Key:=.Proper(m)
Next m
End With
On Error GoTo 0
End If
Next v
'get count of each hash tag
ReDim vRes(1 To colHashTags.Count, 0 To 1)
For L = 1 To colHashTags.Count
vRes(L, 0) = colHashTags(L)
With re
.Pattern = colHashTags(L)
.ignorecase = True
End With
With rSrc
Set c = .Find(what:=colHashTags(L), _
LookIn:=xlValues, _
lookat:=xlPart, _
MatchCase:=False)
If Not c Is Nothing Then
sFirstAddress = c.Address
Do
Set mc = re.Execute(c.Text)
vRes(L, 1) = vRes(L, 1) + mc.Count
Set c = .FindNext(after:=c)
If c Is Nothing Then Exit Do
Loop While c.Address <> sFirstAddress
End If
End With
Next L
Set rDest = rDest.Resize(rowsize:=UBound(vRes, 1), columnsize:=2)
rDest.EntireColumn.ClearContents
rDest = vRes
rDest.EntireColumn.AutoFit
End Sub
====================================