How to calculate correlation between all columns and remove highly correlated ones using pandas?

The method here worked well for me, only a few lines of code: https://chrisalbon.com/machine_learning/feature_selection/drop_highly_correlated_features/

    import numpy as np

    # Create correlation matrix
    corr_matrix = df.corr().abs()

    # Select upper triangle of correlation matrix
    upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

    # Find features with correlation greater than 0.95
    to_drop = [column for column in upper.columns if any(upper[column] > 0.95)]

…
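Since the excerpt stops right after `to_drop` is built, here is a minimal sketch of how the whole thing might be wrapped up. The function name `drop_highly_correlated`, the `threshold` parameter, the final `df.drop(...)` step, and the demo data are my own assumptions, not part of the linked article.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.95):
    """Hypothetical helper: drop one column from each highly correlated pair."""
    # Absolute pairwise Pearson correlations between numeric columns
    corr_matrix = df.corr().abs()

    # Keep only the strict upper triangle so each pair is considered once
    mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
    upper = corr_matrix.where(mask)

    # A column is dropped if it correlates above the threshold
    # with any column that comes before it
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

    # Assumed final step: actually remove the selected columns
    return df.drop(columns=to_drop), to_drop


# Made-up demo data: "a_copy" is "a" plus a little noise, "b" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.01, size=200),
    "b": rng.normal(size=200),
})

reduced, dropped = drop_highly_correlated(demo)
print(dropped)
print(list(reduced.columns))
```

Note that which column of a correlated pair survives depends on column order: the earlier one is kept, the later one is dropped.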

Correlation of two arrays in C#

You can keep the values in separate lists, paired by index, and use a simple Zip:

    var fitResult = new FitResult();
    var values1 = new List<int>();
    var values2 = new List<int>();
    var correls = values1.Zip(values2, (v1, v2) => fitResult.CorrelationCoefficient(v1, v2));

A second way is to write your own custom implementation (mine isn’t optimized for …
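The custom implementation is cut off in this excerpt, so I can't reproduce it. For reference, this is roughly what a hand-written Pearson correlation looks like, sketched in Python rather than C#. The function name `pearson` and the error handling are my own choices and say nothing about how the original answer did it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (sketch)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two observations")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(ss_x * ss_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3], [3, 2, 1]))
```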

A matrix version of cor.test()

corr.test in the psych package is designed to do this:

    library("psych")
    data(sat.act)
    corr.test(sat.act)

As noted in the comments, to replicate the p-values from the base cor.test() function over the entire matrix, you need to turn off adjustment of the p-values for multiple comparisons (the default is Holm’s method of adjustment):

    corr.test(sat.act, adjust …
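If you need the same idea outside R, a rough Python sketch could look like the following. It is not a port of `corr.test`: it only builds a matrix of Pearson correlations and unadjusted p-values (no Holm correction) using `scipy.stats.pearsonr`. The function name `corr_with_pvalues`, the pairwise `dropna()`, the diagonal values, and the demo data are all assumptions of mine.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def corr_with_pvalues(df):
    """Hypothetical helper: pairwise Pearson r and unadjusted p-value matrices."""
    cols = list(df.columns)
    k = len(cols)
    r = pd.DataFrame(np.eye(k), index=cols, columns=cols)
    # Diagonal p-values are set to 0 here purely as a convention (assumption)
    p = pd.DataFrame(np.zeros((k, k)), index=cols, columns=cols)

    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            # Use only rows where both columns are present
            pair = df[[a, b]].dropna()
            stat, pval = pearsonr(pair[a], pair[b])
            r.loc[a, b] = r.loc[b, a] = stat
            p.loc[a, b] = p.loc[b, a] = pval

    return r, p


# Made-up demo data: y depends on x, z is independent noise
rng = np.random.default_rng(1)
x = rng.normal(size=100)
data = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=100),
    "z": rng.normal(size=100),
})

r_mat, p_mat = corr_with_pvalues(data)
print(r_mat.round(3))
print(p_mat.round(4))
```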

Use .corr to get the correlation between two columns

Without actual data it is hard to answer the question, but I guess you are looking for something like this:

    Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])

That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita'. To give an example:

    import pandas as pd
    df = pd.DataFrame({'A': …
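The example in the excerpt is truncated, so here is a small stand-in with made-up numbers (the columns `A`, `B`, `C` and their values are invented for illustration). It shows `Series.corr` on two columns, which uses Pearson correlation by default.

```python
import pandas as pd

# Invented data: B is an exact multiple of A, C runs roughly opposite to A
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 3, 4, 1, 2],
})

# Correlation between two individual columns (Pearson by default)
r_ab = df["A"].corr(df["B"])
r_ac = df["A"].corr(df["C"])

print(r_ab)
print(r_ac)
```

Here `r_ab` comes out at (numerically) 1 because `B` is a perfect linear function of `A`, and `r_ac` is negative because `C` tends to fall as `A` rises.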

Computing the correlation coefficient between two multi-dimensional arrays

Correlation (default 'valid' case) between two 2D arrays:

You can simply use matrix multiplication with np.dot, like so:

    out = np.dot(arr_one, arr_two.T)

Correlation with the default "valid" case between each pairwise combination of rows (row1, row2) of the two input arrays corresponds to the multiplication result at each (row1, row2) position.

Row-wise correlation coefficient calculation for two 2D arrays:

    def …
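The function definition is cut off in the excerpt. One way the row-wise correlation coefficient could be computed is sketched below: center each row, then divide the matrix product by the product of the row norms. The name `rowwise_corr` and the demo arrays are my own; this is not necessarily the code from the original answer.

```python
import numpy as np


def rowwise_corr(a, b):
    """Pearson r between every row of `a` and every row of `b` (sketch).

    `a` has shape (m, n) and `b` has shape (k, n); the result has shape (m, k).
    """
    # Subtract each row's mean so rows are centered
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)

    # Sum of squared deviations per row
    ss_a = (a_c ** 2).sum(axis=1)
    ss_b = (b_c ** 2).sum(axis=1)

    # Cross-products via matrix multiplication, normalized by the row norms
    return np.dot(a_c, b_c.T) / np.sqrt(np.outer(ss_a, ss_b))


# Made-up demo arrays with the same number of columns
rng = np.random.default_rng(2)
arr_one = rng.normal(size=(3, 10))
arr_two = rng.normal(size=(4, 10))

out = rowwise_corr(arr_one, arr_two)
print(out.shape)
```

As a sanity check, the result should match the off-diagonal block of `np.corrcoef(arr_one, arr_two)`, which stacks both arrays and correlates all rows with each other.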