Should Feature Selection be done before Train-Test Split or after?
It is not actually difficult to demonstrate why using the whole dataset (i.e. before splitting to train/test) for selecting features can lead you astray. Here is one such demonstration using random dummy data with Python and scikit-learn: import numpy as np from sklearn.feature_selection import SelectKBest from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression from sklearn.metrics … Read more