sorting alphanumerical characters

L

lumi

Hi, I want to do an index for molecular formulas. Normally I import a file (.csv) and I have one column (text) containing:
C3H2
C12H6F3
C12H6FO2
C6H12NO2
C10H5F3 and so on
I would like to do a sorting in which C atoms are first sorted from 1-~, then H from 1 -~ and then alphabetical order of the rest of the elements(i.e F,N,O)in ascending order. The final list should look like:
C3H2
C6H12NO2
C10H5F3
C12H6FO2
C12H6F3
Any suggestions please?
Thank you
Luminita
 
B

Bernie Deitrick

Luminita,

You can use a macro to sort chemical formulas. The version I've written
will sort a list starting in cell A1, extending down column A without any
gaps, and with nothing else on the sheet. As written, it will sort based on
5 element symbols, and the count of the first four elements. I'm pretty sure
it works, but would appreciate any feedback, good or bad. Copy the code into
a regular code module, then run the macro after you've imported your list.

HTH,
Bernie
MS Excel MVP

Option Explicit

Sub SortChemicalFormulas()
'Macro to sort chemical formulas in column A
'starting in cell A1.
'This will work for all chemical formulas that have
'fewer than 100 of any element.
'Elements must be in proper case: H, N, Na, Cl
'Written by Bernie Deitrick July 1, 2004
'If you use this, let me know at
' deitbe at consumer dot org
'To decipher my address, simply take out spaces
'and change the dot to a . and the at to @

Dim myCell As Range
Dim i As Integer
Dim UpperC As Boolean
Dim TwoLetter As Boolean
Dim isNum As Boolean
Dim TwoNum As Boolean
Dim Count1 As Boolean
Dim myRange As Range

For Each myCell In Range("A1").CurrentRegion
For i = 1 To Len(myCell.Value)
UpperC = False
TwoLetter = False
TwoNum = False
isNum = False
Count1 = False

'Find leading Upper Case letters or numbers
If Not IsNumeric(Mid(myCell.Value, i, 1)) Then
If Mid(myCell.Value, i, 1) = UCase(Mid(myCell.Value, i, 1)) Then
UpperC = True
End If
Else
isNum = True
End If

'Differentiate between cases like NO, N2O, NaO and Na2O
If UpperC Then
If Not IsNumeric(Mid(myCell.Value, i + 1, 1)) Then
If Mid(myCell.Value, i + 1, 1) = _
LCase(Mid(myCell.Value, i + 1, 1)) Then
TwoLetter = True
If Not IsNumeric(Mid(myCell.Value, i + 2, 1)) Then
Count1 = True
End If
Else
Count1 = True
End If
End If
End If

'Find if there are two digits after the symbol
If isNum Then
If IsNumeric(Mid(myCell.Value, i + 1, 1)) Then
TwoNum = True
End If
End If

'Write the symbols and numbers out to the sheet for later sorting
If TwoLetter Or TwoNum Then
Cells(myCell.Row, 256).End(xlToLeft)(1, 2).Value = _
Mid(myCell.Value, i, 2)
i = i + 1
If Count1 Then
Cells(myCell.Row, 256).End(xlToLeft)(1, 2).Value = 1
End If
Else
Cells(myCell.Row, 256).End(xlToLeft)(1, 2).Value = _
Mid(myCell.Value, i, 1)
If Count1 Then
Cells(myCell.Row, 256).End(xlToLeft)(1, 2).Value = 1
End If
End If

Next i
Next myCell

Set myRange = Range("A1").CurrentRegion
If Application.WorksheetFunction.CountBlank(myRange) > 0 Then
myRange.SpecialCells(xlCellTypeBlanks).Value = 0
End If
With myRange.Resize(myRange.Rows.Count, 10)
.Sort Key1:=Range("H1"), Order1:=xlAscending, _
Key2:=Range("I1"), Order2:=xlAscending, _
Key3:=Range("J1"), Order3:=xlAscending, _
Header:=xlGuess, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
.Sort Key1:=Range("E1"), Order1:=xlAscending, _
Key2:=Range("F1"), Order2:=xlAscending, _
Key3:=Range("G1"), Order3:=xlAscending, _
Header:=xlGuess, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
.Sort Key1:=Range("B1"), Order1:=xlAscending, _
Key2:=Range("C1"), Order2:=xlAscending, _
Key3:=Range("D1"), Order3:=xlAscending, _
Header:=xlGuess, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom
.Range(Range("B1"), Cells(1, Range("A1").CurrentRegion. _
Columns.Count)).EntireColumn.Delete
End With

End Sub

lumi said:
Hi, I want to do an index for molecular formulas. Normally I import a file
(.csv) and I have one column (text) containing:
C3H2
C12H6F3
C12H6FO2
C6H12NO2
C10H5F3 and so on
I would like to do a sorting in which C atoms are first sorted from 1-~,
then H from 1 -~ and then alphabetical order of the rest of the elements(i.e
F,N,O)in ascending order. The final list should look like:
 
D

David McRitchie

Hi Luminita,

The numbers will be normalized at 3 digits, in the following example:

C3H2 C003H002 =personal.xls!ChemNSort(A1,3)
C6H12NO2 C006H012NO002 =personal.xls!ChemNSort(A2,3)
C10H5F3 C010H005F003 =personal.xls!ChemNSort(A3,3)
C12H6F3 C012H006F003 =personal.xls!ChemNSort(A4,3)
C12H6FO2 C012H006FO002 =personal.xls!ChemNSort(A5,3)


Function NormDigits(cell As String, Optional p As Long) As String
'David McRitchie, newuser, 2004-07-01
'-- http://www.mvps.org/dmcritchie/sorting.htm
Dim i As Long, n As String, s As String
Dim newstr As String
s = UCase(Trim(cell))
If p = 0 Then p = 3 '-- default is 3 digits for nomenclature
newstr = ""
n = ""
reloop:
For i = 1 To Len(s)
If Mid(s, i, 1) Like "[0-9]" Then
n = n & Mid(s, i, 1)
ElseIf n = "" Then
newstr = newstr & Mid(s, i, 1)
Else
newstr = newstr & Format(n, Left("0000000", p))
n = ""
newstr = newstr & Mid(s, i, 1)
End If
Next i
If n <> "" Then newstr = newstr & Format(n, Left("0000000", p))
NormDigits = newstr
End Function

could also work for TCP/IP addresses
1.1.1.1 001.001.001.001 =personal.xls!ChemNSort(A7,3)
31.32.34.250 031.032.034.250 =personal.xls!ChemNSort(A8,3)

---
HTH,
David McRitchie, Microsoft MVP - Excel [site changed Nov. 2001]
My Excel Pages: http://www.mvps.org/dmcritchie/excel/excel.htm
Search Page: http://www.mvps.org/dmcritchie/excel/search.htm

lumi said:
Hi, I want to do an index for molecular formulas. Normally I import a file (.csv) and I have one column (text) containing:
C3H2
C12H6F3
C12H6FO2
C6H12NO2
C10H5F3 and so on
I would like to do a sorting in which C atoms are first sorted from 1-~, then H from 1 -~ and then alphabetical order of the rest
of the elements(i.e F,N,O)in ascending order. The final list should look like:
 
B

Bernie Deitrick

David,

A function like yours is a better idea than my macro, since there is
unlimited length for sorting (so I will re-write mine as a function).
However, your function would also need to handle elements with only 1 atom,
as well as two letter chemical symbols.

NaCl => Na001Cl001
CHNO => C001H001N001O001

Bernie
MS Excel MVP

David McRitchie said:
Hi Luminita,

The numbers will be normalized at 3 digits, in the following example:

C3H2 C003H002 =personal.xls!ChemNSort(A1,3)
C6H12NO2 C006H012NO002 =personal.xls!ChemNSort(A2,3)
C10H5F3 C010H005F003 =personal.xls!ChemNSort(A3,3)
C12H6F3 C012H006F003 =personal.xls!ChemNSort(A4,3)
C12H6FO2 C012H006FO002 =personal.xls!ChemNSort(A5,3)


Function NormDigits(cell As String, Optional p As Long) As String
'David McRitchie, newuser, 2004-07-01
'-- http://www.mvps.org/dmcritchie/sorting.htm
Dim i As Long, n As String, s As String
Dim newstr As String
s = UCase(Trim(cell))
If p = 0 Then p = 3 '-- default is 3 digits for nomenclature
newstr = ""
n = ""
reloop:
For i = 1 To Len(s)
If Mid(s, i, 1) Like "[0-9]" Then
n = n & Mid(s, i, 1)
ElseIf n = "" Then
newstr = newstr & Mid(s, i, 1)
Else
newstr = newstr & Format(n, Left("0000000", p))
n = ""
newstr = newstr & Mid(s, i, 1)
End If
Next i
If n <> "" Then newstr = newstr & Format(n, Left("0000000", p))
NormDigits = newstr
End Function

could also work for TCP/IP addresses
1.1.1.1 001.001.001.001 =personal.xls!ChemNSort(A7,3)
31.32.34.250 031.032.034.250 =personal.xls!ChemNSort(A8,3)

---
HTH,
David McRitchie, Microsoft MVP - Excel [site changed Nov. 2001]
My Excel Pages: http://www.mvps.org/dmcritchie/excel/excel.htm
Search Page: http://www.mvps.org/dmcritchie/excel/search.htm

Hi, I want to do an index for molecular formulas. Normally I import a file (.csv) and I have one column (text) containing:
C3H2
C12H6F3
C12H6FO2
C6H12NO2
C10H5F3 and so on
I would like to do a sorting in which C atoms are first sorted from 1-~,
then H from 1 -~ and then alphabetical order of the rest
 
B

Bernie Deitrick

Here's a function version, used like

=ChemName(A1)

and then copied down to match your list.

Sort all the cells based on the column with the function.

HTH,
Bernie
MS Excel MVP

Function ChemName(myCell As Range) As String
'Function used to sort chemical formulas
'Used like =ChemName(A1), then copied down to match list.
'This will work for all chemical formulas that have
'fewer than 100 of any element.
'Elements must be in proper case: H, N, Na, Cl
'Written by Bernie Deitrick July 1, 2004
'If you use this, let me know at
' deitbe at consumer dot org
'To decipher my address, simply take out spaces
'and change the dot to a . and the at to @

Dim i As Integer
Dim UpperC As Boolean
Dim TwoLetter As Boolean
Dim isNum As Boolean
Dim TwoNum As Boolean
Dim Count1 As Boolean
Dim myRange As Range

ChemName = ""

For i = 1 To Len(myCell.Value)
UpperC = False
TwoLetter = False
TwoNum = False
isNum = False
Count1 = False

'Find leading Upper Case letters or numbers
If Not IsNumeric(Mid(myCell.Value, i, 1)) Then
If Mid(myCell.Value, i, 1) = UCase(Mid(myCell.Value, i, 1)) Then
UpperC = True
End If
Else
isNum = True
End If

'Differentiate between cases like NO, N2O, NaO and Na2O
If UpperC Then
If Not IsNumeric(Mid(myCell.Value, i + 1, 1)) Then
If Mid(myCell.Value, i + 1, 1) = _
LCase(Mid(myCell.Value, i + 1, 1)) Then
TwoLetter = True
If Not IsNumeric(Mid(myCell.Value, i + 2, 1)) Then
Count1 = True
End If
Else
Count1 = True
End If
End If
End If

'Find if there are two digits after the symbol
If isNum Then
If IsNumeric(Mid(myCell.Value, i + 1, 1)) Then
TwoNum = True
End If
End If

'Write the symbols and numbers out to the function
If TwoLetter Or TwoNum Then
ChemName = ChemName & Mid(myCell.Value, i, 2)
i = i + 1
If Count1 Then
ChemName = ChemName & "01"
End If
Else
ChemName = ChemName & "0" & Mid(myCell.Value, i, 1)
If Count1 Then
ChemName = ChemName & "01"
End If
End If

Next i

End Function

lumi said:
Hi, I want to do an index for molecular formulas. Normally I import a
file
(.csv) and I have one column (text) containing:
C3H2
C12H6F3
C12H6FO2
C6H12NO2
C10H5F3 and so on
I would like to do a sorting in which C atoms are first sorted from 1-~,
then H from 1 -~ and then alphabetical order of the rest of the
elements(i.e
F,N,O)in ascending order. The final list should look like:
 
D

David McRitchie

Hi Bernie,
I was thinking of that after I posted then looked to see if you did that.
So here is my new soluton. Hope the hyphens don't confuse, but
the purpose is strictly for sorting.

Function ChemNDigits(cell As String, Optional p As Long) As String
' David McRitchie, newusers, 2004-07-01
' http://www.mvps.org/dmcritchie/sorting.htm & tcpip.htm
Dim i As Long, n As String, s As String, c As String
Dim newstr As String
s = Trim(cell)
If p = 0 Then p = 3
newstr = ""
n = 0
'------ end of initialization
c = Left(s, 1)
For i = 2 To Len(s)
If Mid(s, i, 1) Like "[A-Z]" Then
If c <> "" Then
newstr = newstr & Left(c & "--", 2)
End If
n = Application.WorksheetFunction.Max(1, n)
newstr = newstr & Format(n, Left("0000000", p))
n = 0
c = Mid(s, i, 1)
ElseIf Mid(s, i, 1) Like "[a-z]" Then
c = c & Mid(s, i, 1)
ElseIf Mid(s, i, 1) Like "[0-9]" Then
If c <> "" Then newstr = newstr & Left(c & "--", 2)
n = n & Mid(s, i, 1)
c = ""
Else
newstr = newstr & "..error.."
End If
Next i
If c <> "" Then newstr = newstr & Left(c & "--", 2)
n = Application.WorksheetFunction.Max(1, n)
newstr = newstr & Format(n, Left("0000000", p))
ChemNDigits = newstr 'Chemical Nomenclature
End Function

ignore spaces hard to get letter to line up

NaCl => Na001Cl 001
CHNO => C- 000H -000N-001O-001
 
D

David McRitchie

Hi Luminita,
I don't know if the digits can go up to 100 or not, if they can't,
you can change it to =ChemNDigits(A1,2)
or change the function default to p=2 instead of p=3
It really doesn't matter because it is just for sorting not for viewing.

The real reason for another posting was to point out that if you
want to convert your Chemical names to show subscripted
numbers you can use David Hagar's subroutine. It will not
change the value but it will make them look better.
'David Hager, 2002-01-22, programming
' http://google.com/groups?selm=uDg8ZVuoBHA.2284@tkmsftngp05
You will probably have to fix the line wraps for a couple of lines after copying.

You can see the results of my Function and David Hagar's subroutine
at http://www.mvps.org/dmcritchie/excel/sorttcp.htm#chemndigits
 
L

lumi

Thanks a lot. I will give it a try today and see how it works.I have never used macros before but I think I can manage at this stage. Yes I would like to have subscripts but I before I sorted the problem by copy/paste to Word and subscripts all the column with numbers, then converting the table to text.
Cheers
Luminita
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top