Cython: how do you create an array of a cdef class?

The short answer is no: it is not really possible in a useful way (see the newsgroup post of essentially the same question).

It wouldn’t be possible to have a direct array (allocated in a single chunk) of Child objects. One reason is lifetime: if code elsewhere ever gets a reference to a Child in the array, that Child has to be kept alive (but not the whole array), and that cannot be guaranteed if they are all allocated in the same chunk of memory. Additionally, if the array were resized (should that be a requirement), any other references to the objects within it would be invalidated.

Therefore you’re left with having an array of pointers to Child. Such a structure would be fine, but internally would look almost exactly like a Python list (so there’s really no benefit to doing something more complicated in Cython…).
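The lifetime argument above can be seen from plain Python. This is only a minimal sketch: `Child` here is an ordinary Python class standing in for the cdef class, the function name is made up for illustration, and the immediate freeing relies on CPython's reference counting.

```python
import weakref


class Child:
    """Hypothetical stand-in for the cdef class in the question."""

    def __init__(self, value):
        self.value = value


def demo_list_of_references():
    children = [Child(i) for i in range(3)]
    kept = children[1]                # a second reference to one element
    probe = weakref.ref(children[0])  # watches an element nobody else holds

    del children                      # drop the container itself

    # `kept` is still valid because the list only held a pointer to it.
    # The first element had no other owner, so CPython frees it right away
    # and the weak reference goes dead.
    return kept.value, probe() is None


print(demo_list_of_references())
```

Each element lives or dies on its own, which is exactly what a single contiguous allocation of objects could not offer.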

There are a few sensible workarounds:

  1. The workaround suggested in the newsgroup post is just to use a Python list. You could also use a NumPy array with dtype=object. If you need to access a cdef function in the class, you can do a cast first:

    cdef Child c = <Child?>a[0] # omit the ? if you don't want
                                # the overhead of checking the type.
    c.some_cdef_function()
    

    Internally, both of these options are stored as a C array of PyObject pointers to your Child objects, so they are not as inefficient as you probably assume.

  2. A further possibility might be to store your data as a C struct (cdef struct ChildStruct: ....), which can readily be stored as an array. When you need a Python interface to that struct, you can define Child so that it contains either a copy of ChildStruct (but modifications won’t propagate back to your original array) or a pointer to ChildStruct (but you need to be careful to ensure that the memory is not freed while the Child pointing to it is alive).

  3. You could use a NumPy structured array. This is pretty similar to using an array of C structs, except that NumPy handles the memory and provides a Python interface.

  4. The memoryview syntax in your question is valid: cdef Child[:] array_of_child. This can be initialized from a NumPy array with dtype=object:

    # assumes numpy has been imported as np
    array_of_child = np.array([Child() for i in range(100)], dtype=object)
    

    In terms of data structure, this is an array of pointers (i.e. the same as a Python list, except that it can be multi-dimensional). It avoids the need for <Child> casting. The important thing it doesn’t do is any kind of type-checking: if you put an object that isn’t a Child into the array, Cython won’t notice (because the underlying dtype is object), and you will get nonsense answers or segmentation faults.

    In my view this approach gives you a false sense of security about two things: first, that you have made a more efficient data structure (you haven’t, it’s basically the same as a list); second, that you have any kind of type safety. However, it does exist. (If you want to use memoryviews, e.g. for multi-dimensional arrays, it would probably be better to use a memoryview of type object, which is honest about the underlying dtype.)
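The copy-versus-pointer trade-off from option 2 can be tried out without compiling anything. This is a rough sketch using the standard library's ctypes as an analogue, not Cython itself; `ChildStruct` with a single `value` field and the function name are assumptions made for illustration.

```python
import ctypes


class ChildStruct(ctypes.Structure):
    """Hypothetical struct layout; the real fields depend on your class."""
    _fields_ = [("value", ctypes.c_int)]


def demo_copy_vs_view():
    # One contiguous block holding four structs.
    children = (ChildStruct * 4)()
    for i in range(4):
        children[i].value = i

    # Wrapper holding a copy: writes do not reach the array.
    copy = ChildStruct.from_buffer_copy(children[1])
    copy.value = 100
    after_copy_write = children[1].value

    # Wrapper sharing the array's memory: writes do reach the array.
    view = children[2]
    view.value = 200
    after_view_write = children[2].value

    contiguous = ctypes.sizeof(children) == 4 * ctypes.sizeof(ChildStruct)
    return after_copy_write, after_view_write, contiguous


print(demo_copy_vs_view())
```

ctypes keeps the underlying array alive for as long as `view` exists. A raw ChildStruct pointer stored in a cdef class gives no such guarantee, which is the lifetime care mentioned in option 2.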
