Selecting top N rows for each group in a table

If you’re using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:

    ;WITH HairColors AS
    (
        SELECT id, name, hair, score,
               ROW_NUMBER() OVER (PARTITION BY hair ORDER BY score DESC) AS RowNum
        FROM dbo.YourTable  -- hypothetical table name: substitute your own
    )
    SELECT id, name, hair, score
    FROM HairColors
    WHERE RowNum <= 3

This CTE will “partition” your …
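To try the query without a SQL Server instance, here is a minimal sketch that runs the same ROW_NUMBER() OVER (PARTITION BY ...) pattern against an in-memory SQLite database from Python. Assumptions: SQLite (3.25 or newer, the release that added window functions) stands in for SQL Server, and the table name `people` and its rows are hypothetical. SQLite doesn't need the leading `;`, which in T-SQL only guards against an unterminated preceding statement.

```python
import sqlite3

# Hypothetical sample rows for a hypothetical "people" table (id, name, hair, score).
ROWS = [
    (1, "Ann", "brown", 90),
    (2, "Bob", "brown", 75),
    (3, "Cid", "brown", 60),
    (4, "Dee", "brown", 50),
    (5, "Eve", "red", 88),
    (6, "Fay", "red", 70),
]

# Same shape as the T-SQL answer: number rows per hair colour by descending
# score inside the CTE, then keep only the first n of each partition.
QUERY = """
WITH HairColors AS (
    SELECT id, name, hair, score,
           ROW_NUMBER() OVER (PARTITION BY hair ORDER BY score DESC) AS RowNum
    FROM people
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= ?
ORDER BY hair, score DESC
"""


def top_n_per_group(rows, n):
    """Return the n highest-scoring rows for each hair colour."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE people (id INTEGER, name TEXT, hair TEXT, score INTEGER)")
        conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()


for row in top_n_per_group(ROWS, 3):
    print(row)
```

Each hair colour keeps at most three rows ("Dee" drops out of the brown group). ROW_NUMBER() numbers tied scores in no guaranteed order, so RANK() or DENSE_RANK() is the usual swap if every row tied at the cutoff should survive.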

Ranking order per group in Pandas

There are lots of different arguments you can pass to rank; it looks like you can use rank(method="dense", ascending=False) to get the results you want, after doing a groupby:

    >>> df["rank"] = df.groupby("group_ID")["value"].rank(method="dense", ascending=False)
    >>> df
         group_ID item_ID  value  rank
    0  0S00A1HZEy      AB     10     2
    1  0S00A1HZEy      AY      4     3
    2  0S00A1HZEy      AC     35     1
    …
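To make the "dense" choice concrete, here is a small plain-Python sketch (no pandas) of what dense, descending ranking within each group computes: equal values share a rank, and the next distinct value gets the next integer with no gap. The first three values reuse the rows shown above; the second group and its values are hypothetical.

```python
from collections import defaultdict


def dense_rank_desc(groups, values):
    """Dense-rank values within each group, largest value = rank 1.

    Tied values share a rank and the next distinct value gets the next
    integer, so there are no gaps (the behaviour of method="dense").
    """
    # Collect the distinct values seen in each group.
    distinct = defaultdict(set)
    for g, v in zip(groups, values):
        distinct[g].add(v)
    # Per group, map each distinct value to its 1-based position in descending order.
    rank_of = {
        g: {v: i for i, v in enumerate(sorted(vals, reverse=True), start=1)}
        for g, vals in distinct.items()
    }
    return [rank_of[g][v] for g, v in zip(groups, values)]


groups = ["0S00A1HZEy"] * 3 + ["hypothetical_B"] * 3
values = [10, 4, 35, 7, 7, 2]
print(dense_rank_desc(groups, values))  # [2, 3, 1, 1, 1, 2]
```

By contrast, method="min" would leave a gap after a tie (7, 7, 2 would rank 1, 1, 3). pandas itself returns the ranks as floats (2.0, 3.0, 1.0).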

How do I find the closest values in a Pandas series to an input number?

You could use argsort(). Say input = 3:

    In [198]: input = 3

    In [199]: df.iloc[(df['num']-input).abs().argsort()[:2]]
    Out[199]:
       num
    2    4
    4    2

df_sort is the dataframe with the 2 closest values.

    In [200]: df_sort = df.iloc[(df['num']-input).abs().argsort()[:2]]

For the index:

    In [201]: df_sort.index.tolist()
    Out[201]: [2, 4]

For the values:

    In [202]: df_sort['num'].tolist()
    Out[202]: [4, 2]

Detail, for the …
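The same "sort by absolute distance, keep the first k" idea works without pandas. Note that argsort() returns integer positions rather than index labels, which is why the answer selects with iloc. A minimal sketch, assuming a plain list in place of df['num'] (the sample numbers are hypothetical, chosen to reproduce the output above) and naming the target `target` to avoid shadowing Python's built-in input():

```python
import heapq


def k_closest(nums, target, k=2):
    """Return (position, value) pairs for the k entries of nums nearest to target.

    Mirrors df.iloc[(df['num'] - target).abs().argsort()[:k]]: rank positions
    by absolute distance and keep the first k. heapq.nsmallest behaves like
    sorted(...)[:k], so ties keep their original order.
    """
    idx = heapq.nsmallest(k, range(len(nums)), key=lambda i: abs(nums[i] - target))
    return [(i, nums[i]) for i in idx]


nums = [1, 6, 4, 5, 2]  # hypothetical stand-in for df['num']
print(k_closest(nums, 3))  # [(2, 4), (4, 2)]
```

heapq.nsmallest only keeps k candidates at a time, which avoids fully sorting a long list when k is small.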

A better similarity ranking algorithm for variable length strings

Simon White of Catalysoft wrote an article about a very clever algorithm that compares adjacent character pairs and works really well for my purposes: http://www.catalysoft.com/articles/StrikeAMatch.html Simon has a Java version of the algorithm, and below I wrote a PL/Ruby version of it (taken from the plain Ruby version done in the related forum entry comment …
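The core of the linked algorithm fits in a few lines. The following is a Python sketch based on the article's description, not the author's PL/Ruby code (which the excerpt cuts off): upper-case both strings, collect adjacent character pairs within each word, and score 2 x shared pairs / total pairs, which is the Dice coefficient over character bigrams. Returning 0.0 when neither string has any pairs is an assumption of this sketch.

```python
from collections import Counter


def letter_pairs(word):
    """Adjacent character pairs in one word, e.g. 'FRANCE' -> FR, RA, AN, NC, CE."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def word_letter_pairs(text):
    """Pairs from every whitespace-separated word; pairs never span a space."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(letter_pairs(word))
    return pairs


def compare_strings(a, b):
    """Similarity in [0, 1]: 2 * shared pairs / (pairs in a + pairs in b).

    Shared pairs are a multiset intersection, so a repeated pair only
    matches as many times as it occurs in both strings.
    """
    pairs_a = word_letter_pairs(a)
    pairs_b = word_letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0  # assumption: two pair-less strings score 0 rather than dividing by zero
    shared = sum((Counter(pairs_a) & Counter(pairs_b)).values())
    return 2.0 * shared / total


print(compare_strings("France", "French"))  # 0.4
```

"France" and "French" share FR and NC out of 5 + 5 pairs, giving 2 x 2 / 10 = 0.4. Because pairs are built per word, word order matters less than in an edit-distance comparison.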
