ranking – Make Me Engineer

selecting top N rows for each group in a table

June 1, 2023 by Tarik

If you’re using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this: ;WITH HairColors AS (SELECT id, name, hair, score, ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as ‘RowNum’ ) SELECT id, name, hair, score FROM HairColors WHERE RowNum <= 3 This CTE will “partition” your … Read more

Find the n most common values in a vector

November 29, 2022 by Tarik

I’m sure this is a duplicate, but the answer is simple: sort(table(variable),decreasing=TRUE)[1:3]

What string similarity algorithms are there?

October 8, 2022 by Tarik

The Levenshtein distance is the algorithm I would recommend. It calculates the minimum number of operations you must do to change 1 string into another. The fewer changes means the strings are more similar…

How do I find the closest values in a Pandas series to an input number?

July 6, 2022 by Tarik

You could use argsort() like Say, input = 3 In [198]: input = 3 In [199]: df.iloc[(df[‘num’]-input).abs().argsort()[:2]] Out[199]: num 2 4 4 2 df_sort is the dataframe with 2 closest values. In [200]: df_sort = df.iloc[(df[‘num’]-input).abs().argsort()[:2]] For index, In [201]: df_sort.index.tolist() Out[201]: [2, 4] For values, In [202]: df_sort[‘num’].tolist() Out[202]: [4, 2] Detail, for the … Read more

A better similarity ranking algorithm for variable length strings

May 29, 2022 by Tarik

Simon White of Catalysoft wrote an article about a very clever algorithm that compares adjacent character pairs that works really well for my purposes: http://www.catalysoft.com/articles/StrikeAMatch.html Simon has a Java version of the algorithm and below I wrote a PL/Ruby version of it (taken from the plain ruby version done in the related forum entry comment … Read more

Extracting the corresponding value of a rank vector in Matlab

April 10, 2022 by Tarik

This kind of question has been asked and answered on many occasions, but I can’t seem to find a good duplicate. Regardless, you would use the second output of sort on the rank vector to give you the position of where each value would appear in the sorted result. Specifically, for each value in the … Read more