How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL

You can use pd.Series.isin.

For "IN" use:

    something.isin(somewhere)

Or for "NOT IN":

    ~something.isin(somewhere)

As a worked example:

    >>> import pandas as pd
    >>> df
       country
    0       US
    1       UK
    2  Germany
    3    China
    >>> countries_to_keep
    ['UK', 'China']
    >>> df.country.isin(countries_to_keep)
    0    False
    1     True
    2    False
    3     True
    Name: country, dtype: bool
    >>> df[df.country.isin(countries_to_keep)]
      country
    1      UK
    …
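As a self-contained sketch of the same idea, the snippet below builds a small frame like the one in the excerpt and applies both filters. The frame contents and the variable names `kept` and `dropped` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical data mirroring the excerpt's example frame.
df = pd.DataFrame({"country": ["US", "UK", "Germany", "China"]})
countries_to_keep = ["UK", "China"]

# SQL "IN": keep rows whose country appears in the list.
kept = df[df["country"].isin(countries_to_keep)]

# SQL "NOT IN": invert the boolean mask with ~.
dropped = df[~df["country"].isin(countries_to_keep)]

print(kept)
print(dropped)
```

Both results keep the original index labels, so call `reset_index(drop=True)` afterwards if a fresh 0-based index is needed.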

How do I select rows from a DataFrame based on column values?

To select rows whose column value equals a scalar, some_value, use ==:

    df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

    df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

    df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than …
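The three selection patterns can be tried on a toy frame. This is a minimal sketch: the column names `name` and `score` and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; column names and values are for illustration only.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [1, 5, 8, 12]})

# Equality against a scalar.
equal_rows = df.loc[df["score"] == 5]

# Membership in an iterable.
member_rows = df.loc[df["name"].isin(["a", "d"])]

# Two conditions combined with &. Each comparison is wrapped in
# parentheses because & has higher precedence than comparison operators.
range_rows = df.loc[(df["score"] >= 5) & (df["score"] <= 10)]

print(equal_rows, member_rows, range_rows, sep="\n\n")
```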

How to sum a variable by group

You can also use the dplyr package for that purpose:

    library(dplyr)
    x %>%
      group_by(Category) %>%
      summarise(Frequency = sum(Frequency))

    #Source: local data frame [3 x 2]
    #
    #  Category Frequency
    #1    First        30
    #2   Second         5
    #3    Third        34

Or, for multiple summary columns (works with one column too):

    x %>%
      group_by(Category) %>%
      summarise(across(everything(), sum))

Here …

How to deal with SettingWithCopyWarning in Pandas

The SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [See GH5390 and GH5597 for background discussion.]

    df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning offers a suggestion to rewrite as …
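One common way to avoid chained assignment is to put the row mask and the column label into a single `.loc` call. The sketch below assumes a toy frame with columns `A` and `B` and a placeholder `new_val`; it shows that pattern only and is not necessarily the full rewrite the excerpt goes on to describe.

```python
import pandas as pd

# Hypothetical frame and replacement value, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})
new_val = 0

# One indexing operation: the row mask and the column label are passed
# together, so the assignment targets df itself rather than an
# intermediate object that may be a copy.
df.loc[df["A"] > 2, "B"] = new_val

print(df)
```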

Reshaping data.frame from wide to long format

Three alternative solutions:

1) With data.table:

You can use the same melt function as in the reshape2 package; the data.table version is an extended and improved implementation. melt from data.table also has more parameters than the melt function from reshape2. For example, you can also specify the name of the variable column:

    library(data.table)
    long <- melt(setDT(wide), id.vars = c("Code", "Country"), …