dataframe – Page 81 – Make Me Engineer

How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL

April 14, 2022 by Tarik

You can use pd.Series.isin.

For “IN” use: something.isin(somewhere)

Or for “NOT IN”: ~something.isin(somewhere)

As a worked example:

    >>> import pandas as pd
    >>> df
       country
    0       US
    1       UK
    2  Germany
    3    China
    >>> countries_to_keep
    ['UK', 'China']
    >>> df.country.isin(countries_to_keep)
    0    False
    1     True
    2    False
    3     True
    Name: country, dtype: bool
    >>> df[df.country.isin(countries_to_keep)]
      country
    1      UK
    …
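For a copy-pasteable version of the worked example, the sketch below rebuilds the four-row `df` from its printed output (the construction code is an assumption, since the excerpt only shows the printed frame) and adds the NOT IN case with `~`:

```python
import pandas as pd

# Rebuild the toy frame shown in the worked example
df = pd.DataFrame({"country": ["US", "UK", "Germany", "China"]})
countries_to_keep = ["UK", "China"]

# SQL: WHERE country IN ('UK', 'China')
in_rows = df[df["country"].isin(countries_to_keep)]

# SQL: WHERE country NOT IN ('UK', 'China'); ~ negates the boolean mask
not_in_rows = df[~df["country"].isin(countries_to_keep)]

print(in_rows)
print(not_in_rows)
```

Both results keep the original index labels (1 and 3 for IN, 0 and 2 for NOT IN), which is worth knowing if you later join back to `df`.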

How do I select rows from a DataFrame based on column values?

April 13, 2022 by Tarik

To select rows whose column value equals a scalar, some_value, use ==:

    df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

    df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

    df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python’s operator precedence rules, & binds more tightly than …
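A runnable sketch of the three selections, using a small hypothetical frame (`column_name`, `some_value`, `some_values`, `A` and `B` are placeholders, as in the answer). It also shows what goes wrong when the parentheses around each comparison are dropped:

```python
import pandas as pd

# Hypothetical data; "column_name" stands in for any real column
df = pd.DataFrame({"column_name": [1, 5, 9, 12, 5], "other": list("abcde")})
some_value = 5
some_values = [1, 9]
A, B = 4, 10

eq_rows = df.loc[df["column_name"] == some_value]        # value == 5
in_rows = df.loc[df["column_name"].isin(some_values)]     # value is 1 or 9
range_rows = df.loc[(df["column_name"] >= A) & (df["column_name"] <= B)]  # 4 <= value <= 10

# Without parentheses, & is evaluated first, so the expression parses as the
# chained comparison  df["column_name"] >= (A & df["column_name"]) <= B,
# which asks for the truth value of a whole Series and raises ValueError.
try:
    df.loc[df["column_name"] >= A & df["column_name"] <= B]
    precedence_error = False
except ValueError:
    precedence_error = True

print(range_rows)
```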

How to sum a variable by group

April 13, 2022 by Tarik

You can also use the dplyr package for that purpose:

    library(dplyr)
    x %>%
      group_by(Category) %>%
      summarise(Frequency = sum(Frequency))

    #Source: local data frame [3 x 2]
    #
    #  Category Frequency
    #1    First        30
    #2   Second         5
    #3    Third        34

Or, for multiple summary columns (works with one column too):

    x %>%
      group_by(Category) %>%
      summarise(across(everything(), sum))

Here …
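The excerpt uses R's dplyr. For readers on the pandas side of this tag, a rough analogue is `groupby(...).sum()`. This is a swapped-in pandas technique, not part of the original answer, and the input frame `x` is hypothetical (the post does not show it); its `Frequency` values are chosen only so the totals match the printed output, and the extra `Count` column exists only to show the "every column" variant:

```python
import pandas as pd

# Hypothetical input; the original post does not show x
x = pd.DataFrame({
    "Category": ["First", "Second", "First", "Third", "Third"],
    "Frequency": [10, 5, 20, 12, 22],
    "Count": [1, 1, 1, 1, 1],
})

# Roughly: group_by(Category) %>% summarise(Frequency = sum(Frequency))
one_col = x.groupby("Category", as_index=False)["Frequency"].sum()

# Roughly: summarise(across(everything(), sum)), i.e. sum every non-grouping column
all_cols = x.groupby("Category", as_index=False).sum()

print(one_col)
print(all_cols)
```

`as_index=False` keeps `Category` as an ordinary column, which is closer to the data frame dplyr returns than the default grouped index.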

How to deal with SettingWithCopyWarning in Pandas

April 12, 2022 by Tarik

The SettingWithCopyWarning was created to flag potentially confusing “chained” assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]

    df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning offers a suggestion to rewrite as …
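A minimal sketch contrasting the chained assignment with a single `.loc` assignment, the alternative the pandas documentation generally recommends. The frame and `new_val` are hypothetical, and the exact warning class depends on the pandas version, so the sketch silences it rather than relying on a particular one:

```python
import warnings
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})
new_val = 0

# Chained assignment: df[mask] builds a new object, so ['B'] = ... writes into
# that temporary object and df itself is left unchanged. pandas warns here
# (SettingWithCopyWarning, or ChainedAssignmentError under Copy-on-Write).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df["A"] > 2]["B"] = new_val
after_chained = df["B"].tolist()
print(after_chained)

# One .loc call selects rows and column in a single indexing operation on df,
# so the assignment reaches the original frame.
df.loc[df["A"] > 2, "B"] = new_val
print(df["B"].tolist())
```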

Reshaping data.frame from wide to long format

April 12, 2022 by Tarik

Three alternative solutions:

1) With data.table:

You can use the same melt function as in the reshape2 package; data.table's version is an extended and improved implementation. melt from data.table also has more parameters than the melt function from reshape2. You can, for example, also specify the name of the variable column:

    library(data.table)
    long <- melt(setDT(wide), id.vars = c("Code", "Country"), … Read more
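For comparison, the same wide-to-long reshape in pandas, as a swapped-in analogue rather than part of the original R answer: `DataFrame.melt` takes `id_vars` (playing the role of `id.vars`) and lets you name the variable column via `var_name`. The year columns and values in `wide` are hypothetical:

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame({
    "Code": ["AFG", "ALB"],
    "Country": ["Afghanistan", "Albania"],
    "1950": [10, 20],
    "1951": [11, 21],
})

# id_vars stay as identifier columns; every other column is stacked into
# a "year" column (its old header) and a "value" column (its cell)
long = wide.melt(id_vars=["Code", "Country"], var_name="year", value_name="value")
print(long)
```

The result stacks one column at a time, so all 1950 rows come before the 1951 rows.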

Extract a part of a data frame by selecting specific observations of a column in R [duplicate]

April 11, 2022 by Tarik

Let's suppose your data frame is called DF and you want item_i = 9; you could try:

    DF[DF$item_i == 9, ]

If you want item_i = 1 or 9, then:

    DF[DF$item_i %in% c(1, 9), ]

how to integrate properties defined on multiple rows using a data.frame or data.table long format approach

April 11, 2022 by Tarik

If I understand the OP correctly, you want something like this:

    dt[, {bigN = .N; .SD[, .N / bigN, by = subg]}, by = group]

or maybe (and very similarly) this:

    dt[, {counts.sum = sum(counts); .SD[, counts / counts.sum, by = subg]}, by = group]
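A rough pandas analogue of both data.table expressions, swapped in for readers using pandas and assuming a hypothetical frame with the same column names `group`, `subg` and `counts` (the OP's data is not shown). The first expression yields one share per (group, subg) pair; the second yields one share per row:

```python
import pandas as pd

# Hypothetical long-format data using the column names from the answer
dt = pd.DataFrame({
    "group":  ["a", "a", "a", "b", "b"],
    "subg":   ["x", "x", "y", "x", "y"],
    "counts": [2, 3, 5, 1, 3],
})

# Like .N / bigN: rows per (group, subg) divided by rows per group
sizes = dt.groupby(["group", "subg"]).size()
row_share = (
    (sizes / sizes.groupby(level="group").transform("sum"))
    .rename("share")
    .reset_index()
)

# Like counts / counts.sum: each row's counts over its group's total
dt["counts_share"] = dt["counts"] / dt.groupby("group")["counts"].transform("sum")

print(row_share)
print(dt)
```

`transform("sum")` broadcasts each group's total back to the original shape, which plays the role of the `bigN` and `counts.sum` temporaries in the data.table versions.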