What is the difference between pandas.qcut and pandas.cut?

To begin, note that quantiles is just the most general term for things like percentiles, quartiles, and medians. You specified five bins in your example, so you are asking qcut for quintiles.

So, when you ask for quintiles with qcut, the bins will be chosen so that you have the same number of records in each bin. You have 30 records, so should have 6 in each bin (your output should look like this, although the breakpoints will differ due to the random draw):

pd.qcut(factors, 5).value_counts()

[-2.578, -0.829]    6
(-0.829, -0.36]     6
(-0.36, 0.366]      6
(0.366, 0.868]      6
(0.868, 2.617]      6

Conversely, for cut you will see something more uneven:

pd.cut(factors, 5).value_counts()

(-2.583, -1.539]    5
(-1.539, -0.5]      5
(-0.5, 0.539]       9
(0.539, 1.578]      9
(1.578, 2.617]      2

That’s because cut will choose the bins to be evenly spaced according to the values themselves and not the frequency of those values. Hence, because you drew from a random normal, you’ll see higher frequencies in the inner bins and fewer in the outer. This is essentially going to be a tabular form of a histogram (which you would expect to be fairly bell shaped with 30 records).

Leave a Comment

tech