Efficiently Creating A Pandas DataFrame From A NumPy 3D Array

Here’s one approach that does most of the processing in NumPy before finally producing a DataFrame:

    m, n, r = a.shape
    out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m*n, -1)))
    out_df = pd.DataFrame(out_arr)

If you know precisely that the number of columns will be 2, such that we would have b and c as the last two …
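As a minimal sketch of the reshaping step in this excerpt, the snippet below runs the same three lines on a small sample array. The sample input and the printed check are assumptions added for illustration; they are not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample input: 2 blocks, 3 rows per block, 4 values per row.
a = np.arange(24).reshape(2, 3, 4)

m, n, r = a.shape

# First column: the block index, repeated once per row of that block.
# Remaining columns: the 3D array flattened to (m*n, r).
out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m * n, -1)))
out_df = pd.DataFrame(out_arr)

print(out_df.shape)
```

Each row of `out_df` carries the index of the block it came from in column 0, followed by the `r` values of that row.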

Show confidence limits and prediction limits in scatter plot

Here’s what I put together. I tried to closely emulate your screenshot. Given

    import numpy as np
    import scipy as sp
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    %matplotlib inline

    # Raw Data
    heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
    weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

Two detailed options to plot confidence intervals:

    def plot_ci_manual(t, s_err, n, x, x2, y2, ax=None): …
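The excerpt cuts off before the body of `plot_ci_manual`, so the following is not the author's function. It is a rough sketch, under the assumption of an ordinary straight-line least-squares fit, of how the half-widths of a confidence band and a prediction band could be computed from the same data using the standard textbook formulas. The function name `regression_bands` and its `alpha` parameter are hypothetical.

```python
import numpy as np
import scipy.stats as stats

# Raw data from the excerpt.
heights = np.array([50,52,53,54,58,60,62,64,66,67,68,70,72,74,76,55,50,45,65])
weights = np.array([25,50,55,75,80,85,50,65,85,55,45,45,50,75,95,65,50,40,45])

def regression_bands(x, y, x2, alpha=0.05):
    """Hypothetical helper: fitted line plus CI and PI half-widths at x2."""
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)        # straight-line fit (assumed model)
    resid = y - (slope * x + intercept)
    dof = n - 2                                   # two fitted parameters
    s_err = np.sqrt(np.sum(resid**2) / dof)       # standard error of the residuals
    t = stats.t.ppf(1 - alpha / 2, dof)           # two-sided t quantile
    y2 = slope * x2 + intercept
    spread = (x2 - x.mean())**2 / np.sum((x - x.mean())**2)
    ci = t * s_err * np.sqrt(1 / n + spread)      # band for the mean response
    pi = t * s_err * np.sqrt(1 + 1 / n + spread)  # band for a new observation
    return y2, ci, pi

x2 = np.linspace(heights.min(), heights.max(), 100)
y2, ci, pi = regression_bands(heights, weights, x2)
```

The bands could then be drawn with something like `plt.fill_between(x2, y2 - ci, y2 + ci, alpha=0.3)`. The prediction band is always wider than the confidence band because it also accounts for the scatter of individual observations.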

Vectorized way of calculating row-wise dot product of two matrices with SciPy

A straightforward way to do that is:

    import numpy as np
    a = np.array([[1,2,3],[3,4,5]])
    b = np.array([[1,2,3],[1,2,3]])
    np.sum(a*b, axis=1)

which avoids the Python loop and is faster in cases like:

    def npsumdot(x, y):
        return np.sum(x*y, axis=1)

    def loopdot(x, y):
        result = np.empty((x.shape[0]))
        for i in range(x.shape[0]):
            result[i] = np.dot(x[i], y[i])
        return result

    timeit npsumdot(np.random.rand(500000,50),np.random.rand(500000,50))
    # 1 loops, best of 3: …
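For a quick check of the row-wise dot product on the small arrays from the excerpt, the sketch below compares the `np.sum(a*b, axis=1)` form with `np.einsum`. The `einsum` variant is an alternative added here for comparison and is not taken from the original post.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5]])
b = np.array([[1, 2, 3], [1, 2, 3]])

# Multiply element-wise, then sum across each row.
row_dots = np.sum(a * b, axis=1)

# Alternative (not from the post): einsum contracts over the column index j
# for each row i without building the intermediate product array.
row_dots_einsum = np.einsum('ij,ij->i', a, b)

print(row_dots)         # [14 26]
print(row_dots_einsum)  # [14 26]
```

Both forms give one dot product per row. Which one is faster can depend on array shape and NumPy version, so it is worth timing on your own data.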

Get intersecting rows across two 2D numpy arrays

For short arrays, using sets is probably the clearest and most readable way to do it. Another way is to use numpy.intersect1d. You’ll have to trick it into treating the rows as a single value, though… This makes things a bit less readable…

    import numpy as np
    A = np.array([[1,4],[2,5],[3,6]])
    B = np.array([[1,4],[3,6],[7,8]])
    nrows, ncols …
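The excerpt is truncated before the `numpy.intersect1d` code, so the following is only a sketch of the two ideas it describes, not the author's exact code. It assumes both arrays are C-contiguous and share the same dtype. The structured-dtype view is one common way to make each row behave as a single value, and the name `row_dtype` is hypothetical.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# Option 1: sets of row tuples (clear and readable for short arrays).
common_set = set(map(tuple, A.tolist())) & set(map(tuple, B.tolist()))

# Option 2: view each row as one structured element so that
# intersect1d compares whole rows instead of individual numbers.
nrows, ncols = A.shape
row_dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
             'formats': ncols * [A.dtype]}
A_rows = np.ascontiguousarray(A).view(row_dtype)
B_rows = np.ascontiguousarray(B).view(row_dtype)
C = np.intersect1d(A_rows, B_rows)

# Convert the structured result back to a plain 2D array.
C = C.view(A.dtype).reshape(-1, ncols)

print(sorted(common_set))  # [(1, 4), (3, 6)]
print(C.tolist())          # [[1, 4], [3, 6]]
```

The set version returns an unordered collection of tuples, while the `intersect1d` version returns a sorted NumPy array of the shared rows.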
