Working with big data in Python and NumPy, not enough RAM, how to save partial results on disk?

Using numpy.memmap you can create arrays that are mapped directly into a file:

    import numpy
    a = numpy.memmap('test.mymemmap', dtype='float32', mode='w+', shape=(200000, 1000))
    # here you will see a 762MB file created in your working directory

You can treat it as a conventional array:

    a += 1000

It is even possible to assign more arrays to the same file, controlling …
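To make the "save partial results on disk" idea concrete, here is a minimal sketch of filling a memmap one block of rows at a time and reopening it read-only afterwards. The helper name fill_in_chunks, the file name partial.mymemmap, the small shape, and the placeholder computation are all assumptions made for illustration; none of them come from the original answer.

```python
import os
import tempfile

import numpy as np


def fill_in_chunks(path, shape, chunk_rows, dtype="float32"):
    """Write results block by block into a disk-backed array (hypothetical helper)."""
    out = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    for start in range(0, shape[0], chunk_rows):
        stop = min(start + chunk_rows, shape[0])
        # Placeholder for a real computation: every row is filled with its row index.
        out[start:stop] = np.arange(start, stop, dtype=dtype)[:, None]
    out.flush()  # push pending changes to the file
    del out      # drop the write mapping
    return path


# A small shape keeps the demo file tiny (1000 * 50 * 4 bytes).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "partial.mymemmap")
fill_in_chunks(path, shape=(1000, 50), chunk_rows=128)

# Reopen read-only; dtype and shape must be supplied again because
# the raw file stores no metadata.
result = np.memmap(path, dtype="float32", mode="r", shape=(1000, 50))
row_sums = np.asarray(result.sum(axis=1))
```

Only the block being written needs to fit in memory at any time, which is the point of the approach. Note that the file holds raw bytes only, so the dtype and shape have to be recorded somewhere else if the array is to be reopened later.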

Installing SciPy with pip

Prerequisite:

    sudo apt-get install build-essential gfortran libatlas-base-dev python-pip python-dev
    sudo pip install --upgrade pip

Actual packages:

    sudo pip install numpy
    sudo pip install scipy

Optional packages:

    sudo pip install matplotlib OR sudo apt-get install python-matplotlib
    sudo pip install -U scikit-learn
    sudo pip install pandas

Computing the correlation coefficient between two multi-dimensional arrays

Correlation (default 'valid' case) between two 2D arrays: you can simply use matrix multiplication with np.dot, like so:

    out = np.dot(arr_one, arr_two.T)

Correlation with the default "valid" case between each pairwise combination of rows (row1, row2) of the two input arrays corresponds to the multiplication result at each (row1, row2) position.

Row-wise correlation coefficient calculation for two 2D arrays: def …
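The excerpt is cut off before the function itself, so the following is not the original code, just one possible sketch of a row-wise Pearson correlation coefficient built on the same matrix-multiplication idea. The function name rowwise_corrcoef and the random test arrays are assumptions for illustration.

```python
import numpy as np


def rowwise_corrcoef(a, b):
    """Pearson correlation between every row of a and every row of b.

    a has shape (m, k), b has shape (n, k); the result has shape (m, n).
    Hypothetical helper, not taken from the truncated answer.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Center each row on its own mean.
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    # Sum of squared deviations per row.
    ss_a = (a_c ** 2).sum(axis=1)
    ss_b = (b_c ** 2).sum(axis=1)
    # Covariance-like numerator via one matrix product, then normalize.
    return a_c @ b_c.T / np.sqrt(np.outer(ss_a, ss_b))


rng = np.random.default_rng(0)
arr_one = rng.random((4, 30))
arr_two = rng.random((6, 30))
coeffs = rowwise_corrcoef(arr_one, arr_two)

# Cross-check against np.corrcoef, which stacks both inputs and returns
# a (10, 10) matrix; the upper-right block holds the cross-correlations.
reference = np.corrcoef(arr_one, arr_two)[:4, 4:]
```

A row with zero variance would produce a division by zero here, so constant rows need separate handling in real use.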

Binning data in Python with SciPy/NumPy

It's probably faster and easier to use numpy.digitize():

    import numpy
    data = numpy.random.random(100)
    bins = numpy.linspace(0, 1, 10)
    digitized = numpy.digitize(data, bins)
    bin_means = [data[digitized == i].mean() for i in range(1, len(bins))]

An alternative to this is to use numpy.histogram():

    bin_means = (numpy.histogram(data, bins, weights=data)[0] /
                 numpy.histogram(data, bins)[0])

Try for yourself which one is faster… …
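As a quick sanity check that the two approaches compute the same thing, here is a small sketch that wraps each in a function and compares the results. The function names, the fixed seed, and the sample size of 1000 are assumptions chosen for the example.

```python
import numpy as np


def bin_means_digitize(data, bins):
    """Mean of the values falling in each bin, using np.digitize."""
    idx = np.digitize(data, bins)  # bin index per value, starting at 1
    return np.array([data[idx == i].mean() for i in range(1, len(bins))])


def bin_means_histogram(data, bins):
    """Same quantity as per-bin sum divided by per-bin count."""
    sums, _ = np.histogram(data, bins, weights=data)
    counts, _ = np.histogram(data, bins)
    return sums / counts


rng = np.random.default_rng(0)
data = rng.random(1000)        # values in [0, 1)
bins = np.linspace(0, 1, 10)   # 10 edges, so 9 bins

means_a = bin_means_digitize(data, bins)
means_b = bin_means_histogram(data, bins)
```

Both versions yield NaN (with a runtime warning) for a bin that contains no values, so sparse data may need an explicit check on the counts. To compare speed, each function can be timed with the standard timeit module on data of a realistic size.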

Moving average or running mean

UPDATE: more efficient solutions have been proposed, uniform_filter1d from scipy being probably the best among the "standard" 3rd-party libraries, and some newer or specialized libraries are available too.

You can use np.convolve for that:

    np.convolve(x, np.ones(N)/N, mode='valid')

Explanation

The running mean is a case of the mathematical operation of convolution. For the running mean, you …
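A small worked example may help show what the convolution actually returns. The wrapper name running_mean and the input values are assumptions for illustration; the np.convolve call is the one quoted above.

```python
import numpy as np


def running_mean(x, n):
    """Mean over each full window of n consecutive samples."""
    # Convolving with n equal weights of 1/n averages each window;
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(x, np.ones(n) / n, mode="valid")


x = np.arange(10, dtype=float)   # 0, 1, ..., 9
smoothed = running_mean(x, 3)    # windows (0,1,2), (1,2,3), ..., (7,8,9)

# Brute-force version of the same thing, for comparison.
brute = np.array([x[i:i + 3].mean() for i in range(len(x) - 3 + 1)])
```

With mode="valid" the output has len(x) - N + 1 elements, here 8, so the result is shorter than the input and is not aligned with it sample for sample.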
