Using numpy.memmap, you can create arrays that are mapped directly onto a file:
import numpy
a = numpy.memmap('test.mymemmap', dtype="float32", mode="w+", shape=(200000,1000))
# here you will see a 762MB file created in your working directory
You can treat it as a conventional array:
a += 1000.
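The file size quoted above follows from the shape and the element size: 200000 * 1000 elements at 4 bytes each is 800,000,000 bytes, or about 762 MiB. A minimal sketch of that check on a much smaller array is shown here; the temporary path and the shape (200, 1000) are placeholders chosen for illustration, not part of the original example.

```python
import os
import tempfile

import numpy

# Hypothetical small file in a temporary directory (placeholder path and shape).
path = os.path.join(tempfile.mkdtemp(), "small.mymemmap")
shape = (200, 1000)

a = numpy.memmap(path, dtype="float32", mode="w+", shape=shape)

# Expected size in bytes: number of elements times bytes per element.
expected_bytes = shape[0] * shape[1] * numpy.dtype("float32").itemsize
actual_bytes = os.path.getsize(path)

a += 1000.0  # in-place arithmetic, as with an ordinary array
a.flush()    # ask NumPy to write pending changes to the file

print(expected_bytes, actual_bytes)
```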
It is even possible to map several arrays onto the same file and control it from multiple sources if needed. But I’ve experienced some tricky things here. To open the full array, you first have to “close” the previous one using del:
del a
b = numpy.memmap('test.mymemmap', dtype="float32", mode="r+", shape=(200000,1000))
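As a rough sketch of this close-and-reopen pattern, the snippet below writes a value, flushes and deletes the first map, then reopens the file read-only to confirm the value persisted. The temporary path and the small shape (20, 10) are assumptions made for illustration.

```python
import os
import tempfile

import numpy

# Hypothetical small file (placeholder path and shape).
path = os.path.join(tempfile.mkdtemp(), "roundtrip.mymemmap")
shape = (20, 10)

a = numpy.memmap(path, dtype="float32", mode="w+", shape=shape)
a[3, 4] = 42.0
a.flush()  # write pending changes to the file
del a      # drop the only reference, "closing" the map

# Reopen the whole array; mode="r" maps it read-only.
b = numpy.memmap(path, dtype="float32", mode="r", shape=shape)
value = float(b[3, 4])
print(value)
```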
But opening only part of the array makes simultaneous control possible:
b = numpy.memmap('test.mymemmap', dtype="float32", mode="r+", shape=(2,1000))
b[1,5] = 123456.
print(a[1,5])  # assumes a is still open, i.e. it was not deleted as in the previous snippet
# 123456.0
Great! a was changed together with b, and the changes already go to the underlying file (call b.flush() if you need to be sure they have reached the disk).
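The shared behaviour can be checked on a small scale. This is only a sketch under assumed names: the temporary path and the shapes are placeholders, and numpy.fromfile is used here just to read the raw file back independently of both maps.

```python
import os
import tempfile

import numpy

# Hypothetical small file (placeholder path and shapes).
path = os.path.join(tempfile.mkdtemp(), "shared.mymemmap")

# Full array, kept open the whole time.
a = numpy.memmap(path, dtype="float32", mode="w+", shape=(20, 10))

# Second map covering only the first two rows of the same file.
b = numpy.memmap(path, dtype="float32", mode="r+", shape=(2, 10))
b[1, 5] = 123456.0

seen_through_a = float(a[1, 5])  # read the same element via the full map

b.flush()  # make sure the change is written to the file
on_disk = numpy.fromfile(path, dtype="float32").reshape(20, 10)
print(seen_through_a, on_disk[1, 5])
```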
Another important thing worth mentioning is the offset parameter. Suppose you want to take not the first 2 rows in b, but rows 150000 and 150001.
b = numpy.memmap('test.mymemmap', dtype="float32", mode="r+", shape=(2,1000),
                 offset=150000*1000*32//8)  # offset in bytes; // keeps it an integer on Python 3
b[1,2] = 999999.
print(a[150001,2])
# 999999.0
Now you can access and update any part of the array in simultaneous operations. Note that the offset is given in bytes, so the element size enters the calculation. For ‘float64’, the offset in this example would be 150000*1000*64//8.
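One way to avoid hard-coding the element size is to read it from the dtype. The helper below, row_offset, is a hypothetical name introduced for illustration, and the small file and shapes are placeholders; it is a sketch of the byte-offset calculation described above, not a part of NumPy.

```python
import os
import tempfile

import numpy


def row_offset(first_row, n_cols, dtype):
    """Byte offset of row `first_row` in a C-ordered 2D array (hypothetical helper)."""
    return first_row * n_cols * numpy.dtype(dtype).itemsize


# Hypothetical small file (placeholder path and shape).
path = os.path.join(tempfile.mkdtemp(), "offset.mymemmap")
n_rows, n_cols = 200, 10

a = numpy.memmap(path, dtype="float64", mode="w+", shape=(n_rows, n_cols))

# Map only rows 150 and 151 of the same file.
offset = row_offset(150, n_cols, "float64")
b = numpy.memmap(path, dtype="float64", mode="r+", shape=(2, n_cols), offset=offset)
b[1, 2] = 999999.0

value = float(a[151, 2])  # same element, read through the full map
print(offset, value)
```

Because the helper multiplies integers only, the result stays an integer, which is what the offset argument expects.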
Other references:
- Is it possible to map a discontiuous data on disk to an array with python?
- numpy.memmap documentation