In short:
df['Text'].apply(word_tokenize)
Or if you want to add another column to store the tokenized list of strings:
df['tokenized_text'] = df['Text'].apply(word_tokenize)
There are also tokenizers written specifically for Twitter text; see http://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.casual
To use nltk.tokenize.TweetTokenizer:
from nltk.tokenize import TweetTokenizer
tt = TweetTokenizer()
df['Text'].apply(tt.tokenize)
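As a quick sketch (the sample tweets are made up), TweetTokenizer keeps hashtags, @-mentions, and emoticons as single tokens, which word_tokenize would split apart:

```python
import pandas as pd
from nltk.tokenize import TweetTokenizer

# hypothetical sample tweets -- substitute your own DataFrame
df = pd.DataFrame({'Text': ["I love #nltk :)", "@user check this out!"]})

tt = TweetTokenizer()
df['tokens'] = df['Text'].apply(tt.tokenize)
print(df['tokens'].tolist())
```

Unlike word_tokenize, TweetTokenizer needs no model download, so it works out of the box.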
Similar to:
- How to apply pos_tag_sents() to pandas dataframe efficiently
- how to use word_tokenize in data frame
- Tokenizing words into a new column in a pandas dataframe
- Run nltk sent_tokenize through Pandas dataframe
- Python text processing: NLTK and pandas