counting elements of two arrays

G

Geoff

Hello

I have two arrays that contain data that I am using for correlations.
The two arrays are dynamic (part of a larer data set) Each array will
ahve a different number of missing values. The two arrays are a time
series, the position in the array is related to specific time stamp. I
need to be able to count the number of times there is a value in both
arrays for each timestamp and report that value. This will help in
determining the validity of the correlations.

The data set is a workbook with 20 sheets each contianing 200 columns
and 8700 rows. The data is arranged in columns for variables and the
rows represent an individual timestamp.

I am currently selecting one column from the workbook and then looping
through the workbook to find columns that have significant
correlations.

The problem is that I am getting high correlations because excel does
a pairwise deletion and so if there are two values that are common and
both change positively then I get a high correlation but the value of
the correlation in interpreting the data is a waste. as most the
values are being ignored because they are not common.

I am stuck at how to start the process. Coudl some one please suggest
a place to start.

Kind regards
Geoff
 
J

Joel

What math function(s) are you using to get your correlation. How many bit
accuracy are you look at.
 
G

Geoff

I am using the intenal correlation function within excel. I am not
looking for an incredible accuracy. What I need to know is how many
common elements are in the two arrays that I am producing the
correlation for. I think it is a Pearson function but that is entirely
speculative :)

Tha data is froma real world data collection system for a paper
machine. There are large gaps in the data after removing text and
erroneous data. For the data that is left I need some idea of what
correlates with key parameters and am now realising that I need to
know if I can believe the correlation.

I thought a simple way of doing this would be to check that there is
at least 100 or more values that are common to the two arrays. 100 may
not be the absolute number but it can be a starting point.

I am hoping there is a simple way to do this. I could just go through
the two arrays and check if there is a value for each row in the array
and once it reaches 100 check the correlation. This would be very slow
for a macro that has such a large body of data to check.

Regards, Geoff
 
J

Joel

The Correlation function will give you a 1 when the two columns are exacttly
the same. When they don't match, the results closestt to one (absolute) is
the best results.


Correlattion function in Excel ignores empty cells. If you first put a
value like 0 into the empty cells then you would get better results.

I think the best result would be to use correlation but when a cell is empty
place the same value in both columns. This way correlation will count these
cells

You should go to the excel worksheet and from Inset Menu - Function -
Correl. the select help (lowerr left corner of window). This will show you
the calculation that is made for Correl.

The formula basical is calculatting the difference between the cells in the
two columns, and then dividing the product of the squares. It ignores blank
cells in the calculation. Putting the same value in both columns will not
chbange the results of the Numerator of the function, but does change the
value of the denominator.

I don't think it will take a macro very long to automatically put the same
value in both columns. iff you tell me how your columns are aranged I will
write the macro. It take me only 5 minutes.

Are your columns A & B, the D & E leavving one empty column between pairs,
or some other arrangement.
 
G

Geoff

Hello Joel

I do not want to put in a value in the empty cells. I don't want to
force a correlation I want to ignore columns that have correlations
based on only a few values. I have 20 sheets each with 200 columns and
8700 rows. I want to find out which columns have a real correlation
with a particular columns of interest.

Here is the macro that I am currently using. The tagname is allows me
to choose from the data the column that I want to find out what
correlates so that I can then have acloser look at the data to see
what is the value of the correlation. Some columns will have 6000+
data points and others will ahve 100. But if only 10 of the data
points overlap with the 6000 then any correlation is not significant
as there is insufficiant data for my purposes. To force the
correlation to happen will not help me identify which items to look at
more closely.

I want to identify which columns contain more than say 100 common data
points.

Thank you for your help.

Sub CorrelSearch()
Dim startrow As Integer, endrow As Integer, startcol As Integer,
endcol As Integer
Dim r As Long, c As Long
Dim StartTime As Double
Dim myArray() As Variant, rsltarray() As Variant
Dim myRange As Range
Dim clcell As Long
Dim aSheet As Object
Dim dupval As Variant, rsltcount As Variant
Dim colarray() As Variant, corelarray() As Variant
Dim colave As Double, colsd As Double
Dim testval As Double
Dim tagname As Variant
Dim srchrange As Variant
Dim corelval As Double
Dim coreltest As Double
Dim rsltrange As Range


StartTime = Timer

' find from lookup table tag name
tagname = InputBox(prompt:="Enter the tagname",
Title:="Correlation selection")
coreltest = InputBox(prompt:="Enter the correlation value")
' copy selected tag data to an array
For Each aSheet In ActiveWorkbook.Sheets
Worksheets(aSheet.Name).Select
Set srchrange = Range(Cells(1, 1), Cells(1, 256))
' srchrange = Range(Selection)
If WorksheetFunction.CountIf(srchrange, tagname) = 1 Then
Cells.Find(What:=tagname, LookIn:=xlFormulas,
LookAt:=xlPart).Activate
corelarray = Range(Cells(7, ActiveCell.Column),
Cells(ActiveSheet.Cells.SpecialCells(xlLastCell).Row,
ActiveCell.Column))
End If

Next aSheet

' cycle through all pages in workbook and check correlation
rsltcount = 0
ReDim rsltarray(2000, 8)
For Each aSheet In ActiveWorkbook.Sheets
Application.ScreenUpdating = True
Worksheets(aSheet.Name).Select
Application.ScreenUpdating = False
Set myRange = Range("B7", ActiveCell.SpecialCells(xlLastCell))
startcol = myRange.Column
endcol = myRange.Columns(myRange.Columns.Count).Column
startrow = myRange.Row
endrow = myRange.Rows(myRange.Rows.Count).Row
ReDim myArray(startrow To endrow, startcol To endcol)
myArray() = myRange
For c = 1 To endcol - startcol + 1
Application.StatusBar = ActiveSheet.Name & ", column " & c
colarray = Application.Index(myArray, 0, c)
On Error Resume Next
If ArrayCount(colarray) > ArrayCount(corelarray) * 0.1
Then
corelval = WorksheetFunction.Correl(corelarray,
colarray)
If Abs(corelval) > coreltest Then
colave = WorksheetFunction.Average(colarray)
colsd = WorksheetFunction.StDev(colarray)
rsltcount = rsltcount + 1
rsltarray(rsltcount, 1) = Cells(1, c + 1)
rsltarray(rsltcount, 2) = Cells(2, c + 1)
rsltarray(rsltcount, 3) = colave
rsltarray(rsltcount, 4) = colsd
rsltarray(rsltcount, 5) = colsd / colave * 100
rsltarray(rsltcount, 6) = corelval
rsltarray(rsltcount, 7) = corelval * corelval
rsltarray(rsltcount, 8) =
WorksheetFunction.Count(colarray)
End If
End If
Next c


Next aSheet

Sheets.Add after:=ActiveSheet
Set rsltrange = Range(Cells(1, 1), Cells(rsltcount, 8))
rsltrange.Value = rsltarray
ActiveSheet.Name = tagname
' Display elapsed time
Application.ScreenUpdating = True
MsgBox Format(Timer - StartTime, "00.00") & " seconds"
MsgBox rsltcount

End Sub
 
J

Joel

If you want to try using countif > 0 which will get the non-blank dates it
requires only one line change. This method doesn't look at bad data, just
cares if there is data or is'nt data.

chage from:
corelval = WorksheetFunction.Correl(corelarray,
colarray)
change to:
corelval = WorksheetFunction.Countif(corelarray,
">0")



If ArrayCount(colarray) > ArrayCount(corelarray) * 0.1
Then

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
corelval = WorksheetFunction.Correl(corelarray,
colarray
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
If Abs(corelval) > coreltest Then
colave = WorksheetFunction.Average(colarray)
colsd = WorksheetFunction.StDev(colarray)
rsltcount = rsltcount + 1
rsltarray(rsltcount, 1) = Cells(1, c + 1)
rsltarray(rsltcount, 2) = Cells(2, c + 1)
rsltarray(rsltcount, 3) = colave
rsltarray(rsltcount, 4) = colsd
rsltarray(rsltcount, 5) = colsd / colave * 100
rsltarray(rsltcount, 6) = corelval
rsltarray(rsltcount, 7) = corelval * corelval
rsltarray(rsltcount, 8) =
WorksheetFunction.Count(colarray)
End If
End If
Next c
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top