correlation
How to calculate correlation between all columns and remove highly correlated ones using pandas?
The method described here worked well for me and takes only a few lines of code: https://chrisalbon.com/machine_learning/feature_selection/drop_highly_correlated_features/
import numpy as np
# Create correlation matrix
corr_matrix = df.corr().abs()
# Select upper triangle of correlation matrix
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
# Find features with correlation greater than 0.95
to_drop = [column for column in upper.columns if any(upper[column] > 0.95)]
…
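As a minimal, self-contained sketch of the upper-triangle approach quoted above: the helper name `drop_highly_correlated`, the toy DataFrame, and the final `df.drop(columns=...)` step are assumptions added for illustration and are not taken from the linked article.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.95):
    """Hypothetical helper: drop one column from each highly correlated pair."""
    # Absolute pairwise Pearson correlations between all numeric columns
    corr_matrix = df.corr().abs()
    # Keep only entries strictly above the diagonal so each pair appears once
    mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
    upper = corr_matrix.where(mask)
    # A column is dropped if it correlates above the threshold with any earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy data (made up): "a_copy" is a linear function of "a", "b" is unrelated
a = np.arange(10, dtype=float)
toy = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + 1.0,
    "b": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})

reduced, dropped = drop_highly_correlated(toy)
print(dropped)
print(list(reduced.columns))
```

Because only the upper triangle is inspected, the later column of each correlated pair is the one removed, so the result depends on column order.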
Correlation of two arrays in C#
You can keep the values in separate lists at matching indexes and use a simple Zip:
var fitResult = new FitResult();
var values1 = new List<int>();
var values2 = new List<int>();
var correls = values1.Zip(values2, (v1, v2) => fitResult.CorrelationCoefficient(v1, v2));
A second way is to write your own custom implementation (mine isn’t optimized for …
A matrix version of cor.test()
corr.test in the psych package is designed to do this:
library("psych")
data(sat.act)
corr.test(sat.act)
As noted in the comments, to replicate the p-values from the base cor.test() function over the entire matrix, you need to turn off the adjustment of the p-values for multiple comparisons (the default is Holm’s method of adjustment):
corr.test(sat.act, adjust …
LabelEncoder for categorical features?
TL;DR: Using a LabelEncoder to encode ordinal or any other kind of features is a bad idea! This is clearly stated in the docs, which mention that, as its name suggests, this encoding method is aimed at encoding the label: This transformer should be used to encode target values, i.e. y, and not …
How can I create a correlation matrix in R?
An example:
d <- data.frame(x1=rnorm(10), x2=rnorm(10), x3=rnorm(10))
cor(d) # get correlations (returns matrix)
Use .corr to get the correlation between two columns
Without actual data it is hard to answer the question, but I guess you are looking for something like this:
Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita'. To give an example:
import pandas as pd
df = pd.DataFrame({'A': …
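Since the original example is cut off, here is a small stand-in sketch of `Series.corr` on made-up data. The column names `x`, `y`, and `z` and their values are invented for illustration; they are not the asker's Top15 data.

```python
import pandas as pd

# Made-up data: y is an exact multiple of x, z tends to fall as x rises
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Series.corr computes the Pearson correlation by default
r_xy = df["x"].corr(df["y"])
r_xz = df["x"].corr(df["z"])

print(r_xy)
print(r_xz)
```

Calling `.corr` on one Series with another returns a single number, whereas `df.corr()` returns the full matrix for every pair of columns.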
Plot correlation matrix into a graph
Not quite the same look, but worth checking out, since it conveys more visual information. [Figure: correlation matrix drawn with ellipses] [Figure: correlation matrix drawn with circles] More examples can be found in the corrplot vignette referenced by @assylias below.
Computing the correlation coefficient between two multi-dimensional arrays
Correlation (default 'valid' case) between two 2D arrays: You can simply use the matrix multiplication np.dot, like so:
out = np.dot(arr_one, arr_two.T)
Correlation with the default "valid" case between each pairwise combination of rows (row1, row2) of the two input arrays corresponds to the multiplication result at each (row1, row2) position.
Row-wise correlation coefficient calculation for two 2D arrays:
def …
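The function body is cut off in the excerpt, so the following is only a sketch of one way to compute row-wise Pearson correlation coefficients with NumPy. The name `rowwise_corr` and the sample arrays are assumptions, and this is not necessarily the original answer's implementation.

```python
import numpy as np


def rowwise_corr(a, b):
    """Hypothetical helper: Pearson r between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Centre each row by subtracting its own mean
    a_centered = a - a.mean(axis=1, keepdims=True)
    b_centered = b - b.mean(axis=1, keepdims=True)
    # Sum of products of deviations for every (row of a, row of b) pair
    cross = a_centered @ b_centered.T
    # Root of the sum of squared deviations for each row
    a_norm = np.sqrt((a_centered ** 2).sum(axis=1))
    b_norm = np.sqrt((b_centered ** 2).sum(axis=1))
    return cross / np.outer(a_norm, b_norm)


# Made-up inputs: both arrays must have the same number of columns
arr_one = np.array([[1.0, 2.0, 3.0, 4.0],
                    [4.0, 3.0, 2.0, 1.0],
                    [1.0, 3.0, 2.0, 5.0]])
arr_two = np.array([[2.0, 4.0, 6.0, 8.0],
                    [1.0, 0.0, 1.0, 0.0]])

result = rowwise_corr(arr_one, arr_two)
print(result.shape)
```

The output has one row per row of `arr_one` and one column per row of `arr_two`; it should match the off-diagonal block of `np.corrcoef(arr_one, arr_two)`. A constant row has zero variance and would produce a division by zero, which this sketch does not guard against.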