TfidfVectorizer in scikit-learn: ValueError: np.nan is an invalid document
As the traceback says, the vectorizer expects strings. Your column has dtype object and contains NaN, so convert it to Unicode strings before vectorizing:

    x = v.fit_transform(df['Review'].values.astype('U'))  # astype(str) would also work

From the documentation page for TfidfVectorizer:

    fit_transform(raw_documents, y=None)

    Parameters:
    raw_documents : iterable
        An iterable which yields either str, unicode or file objects.
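One thing to be aware of with the cast: a NaN entry becomes the literal string "nan", which is then tokenized like any other word. The sketch below compares the cast with replacing missing values by an empty string first. The tiny DataFrame and the variable names are made up for illustration, and `fillna('')` is an alternative that is not part of the answer above, so treat it as one possible approach rather than the required fix.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical data: an object-dtype 'Review' column with one missing value.
reviews = pd.Series(["good product", np.nan, "bad service"], dtype=object)
df = pd.DataFrame({"Review": reviews})

v = TfidfVectorizer()

# Option 1 (from the answer): cast to Unicode strings.
# NaN turns into the string "nan", so "nan" ends up in the vocabulary.
x_cast = v.fit_transform(df["Review"].values.astype("U"))
vocab_cast = sorted(v.vocabulary_)

# Option 2 (assumed alternative): fill missing values with an empty string,
# so the missing review contributes no tokens at all.
x_filled = v.fit_transform(df["Review"].fillna(""))
vocab_filled = sorted(v.vocabulary_)

print(vocab_cast)
print(vocab_filled)
```

Both calls avoid the ValueError. Which one is preferable depends on whether a spurious "nan" feature matters for your model; dropping the rows with `dropna()` is another possibility if the missing reviews are not needed.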