Find x number of lowest values from a 200 x 200 matrix

G

Grotifant

Hi all,

I've to find the 50 lowest values from a 200 x 200 correlation matrix
and copy it together with the names into a final list in a different
w/sheet. The matrix is of the form:

Name1 Name2 .... Name200
Name1 0.62 .... 0.15
Name2 .... 0.34
....
Name199 0.86

I would like to get an output of the following form:

Column A Column B Column C
Name1 Name200 0.15
Name67 Name 89 0.16
etc.

However, there are two things that make it more difficult:

1.) I would like to keep the original interior color of the names. Each
name has a interior colorindex representing a specific industry group
and I would like to have that reflected in the final list.

2.) Each name should only be included once in the final output, i.e. I
don't want any duplicates in the list of the companies with the lowest
correlation. For example, if name1/name200 form the first pair, then I
would like to make sure that both companies are not included in this
list anymore, even if e.g. name1/name70 has a very low correlation as
well.

Below is what I've done so far. It works except for point 2 above
(i.e.final list includes duplicates). The code uses an array to save
the names, the interior color of the names and the correlation, but I
don't know how to make sure that rows/columns that were used before are
ignored when looking for the next lowest value. Also the fact that I
change the value of the correlation and only use the "min" function is
probably not the most efficient to do.


sub matching_names_with_lowest_correlation()

Dim i As Integer, g As Integer, x As Integer, r As Integer
Dim q As Integer, p As Integer, t As Integer
Dim FirstCell As Range
Dim FoundCell As Range
Dim AllCells As Range
Dim workrange As Range
Dim mymatches(1 To 50, 1 To 5) As Variant

For i = 1 To 50

Set workrange = Selection
MinVal = Application.Min(workrange)
workrange.Find(What:=MinVal).Select

On Error Resume Next
Range(Selection, Selection.End(xlToLeft)).Select
Range(Selection, Selection.End(xlToLeft)).Select
q = Selection.Columns.count - 1
ActiveCell.Select
mymatches(i, 1) = ActiveCell
mymatches(i, 2) = ActiveCell.Interior.ColorIndex

ActiveCell.Offset(0, q).Select

ActiveCell.Select
Range(Selection, Selection.End(xlUp)).Select
p = Selection.Rows.count - 1
ActiveCell.Select
mymatches(i, 3) = ActiveCell
mymatches(i, 4) = ActiveCell.Interior.ColorIndex

ActiveCell.Offset(p, 0).Select
mymatches(i, 5) = ActiveCell

ActiveCell.Value = ActiveCell.Value + 1000
Selection.Interior.ColorIndex = 6

Selection.End(xlUp).Select
Selection.End(xlToLeft).Select

Next i
On Error GoTo 0


' Selects cells based on their formatting

ActiveCell.Offset(1, 1).Select
Range(Selection, Selection.End(xlDown)).Select
Range(Selection, Selection.End(xlToRight)).Select

With Application.FindFormat
.Clear
.Interior.ColorIndex = 6
End With

'Look for first matching cell
Set FirstCell = ActiveSheet.UsedRange.Find(What:="",
SearchFormat:=True)

'Initialize AllCells
Set AllCells = FirstCell
Set FoundCell = FirstCell

'Loop until the FirstCell is found again
Do
Set FoundCell = ActiveSheet.UsedRange.Find _
(After:=FoundCell, What:="", SearchFormat:=True)
If FoundCell Is Nothing Then Exit Do
Set AllCells = Union(FoundCell, AllCells)
If FoundCell.Address = FirstCell.Address Then Exit Do
Loop

AllCells.Select

For Each cell In Selection
cell.Select
ActiveCell.Value = ActiveCell.Value - 1000
Selection.Interior.ColorIndex = 3
Next cell

Worksheets("FinalList").Select

t = 1
For p = 1 To 50

Cells(p, t) = mymatches(p, 1)
Cells(p, t).Interior.ColorIndex = mymatches(p, 2)
Cells(p, t + 1) = mymatches(p, 3)
Cells(p, t + 1).Interior.ColorIndex = mymatches(p, 4)
Cells(p, t + 2) = mymatches(p, 5)

Next p

End Sub



I'm really stuck at this point - any help is greatly appreciated!!!!

Rgds,
Manuel
 
B

Bernie Deitrick

Manuel,

Try using a bit more of Excel's built-in functionality. The code below
should give you the fifty lowest correlation values, with the formatting
preserved, on a new sheet. Select a single cell within the correlation
table, and run the code.

HTH,
Bernie
MS Excel MVP


Sub MatchingNamesWithLowestCorrelation2()
Dim myCell As Range
Dim newSheet As Worksheet
Dim mySheet As Worksheet
Dim i As Long
Dim j As Integer
Dim k As Long
Dim mySelection As Range

Set mySheet = ActiveSheet
Set mySelection = ActiveCell.CurrentRegion
On Error Resume Next
Application.DisplayAlerts = False
Worksheets("New Database").Delete
Set newSheet = Worksheets.Add
newSheet.Name = "New Database"
mySheet.Activate
i = 1

For j = mySelection(1).Column + 1 To
mySelection(mySelection.Cells.Count).Column
For k = mySelection(1).Row + 1 To mySelection(mySelection.Cells.Count).Row
If mySheet.Cells(k, j).Value <> "" Then
Cells(mySelection(1).Row, j).Copy newSheet.Cells(i, 1)
Cells(k, mySelection(1).Column).Copy newSheet.Cells(i, 2)
newSheet.Cells(i, 3).Value = Cells(k, j).Value
i = i + 1
End If
Next k
Next j

With newSheet
.Range("A1").CurrentRegion.Sort Key1:=.Range("C1"), Order1:=xlAscending,
Header:=xlNo
.Range("D1").FormulaR1C1 = _
"=COUNTIF(R1C1:RC[-3],RC[-3])+COUNTIF(R1C2:RC[-2],RC[-2])"
.Range("D1").AutoFill Destination:=.Range("D1:D" &
..Range("C65536").End(xlUp).Row)
.Range("A1").CurrentRegion.Sort Key1:=.Range("D1"), Order1:=xlAscending,
Header:=xlNo
.Range(.Range("A51"), .Range("A65536").End(xlUp)).EntireRow.Delete
.Range("A1").EntireRow.Insert
.Range("A1").Value = "Column Header"
.Range("B1").Value = "Row Label"
.Range("C1").Value = "Values"
.Range("D1").Value = "Total Count"
.Columns("A:D").EntireColumn.AutoFit
End With
Application.DisplayAlerts = True

End Sub
 
G

Grotifant

Thanks a lot for this Bernie. It works fine and your code is much more
neat and efficient than my code.
The only problem that I've got left is that there are still multiple
company names in the list (e.g. name1 occurs 3 times).
Do you think it is possible to create the same list but only with the
top 50 pairs where each pair consists of 2 different companies that are
not part of any of the other pairs?

Thanks again,
Manuel
 
B

Bernie Deitrick

Manuel,

The formula that the code inserts after the initial sort and prior to the
second sort counts the previous occurences of the names. In my testing, the
top values never had any repeats. You should be able to get the same
result. It might be a calculation problem - try setting your calculation
mode to automatic. If that doesn't work, try commenting out the line with
the .EntireRow.Delete, and run the code, then do a manual sort of the data,
sorting first on column C, then re-sorting based on column D (use 2 distinct
sorts, not 1 sort with two criteria). If that doesn't work, you can send
me a workbook and I will take a look at it, and perhaps I will be able to
"sort" it out for you.

HTH,
Bernie
MS Excel MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top