Align already captured rgb and depth images

Question

prologue

This answer is based on mine previous answer:

Does Kinect Infrared View Have an offset with the Kinect Depth View

I manually crop your input image so I separate colors and depth images (as my program need them separated. This could cause minor offset change by few pixels. Also as I do not have the depths (depth image is 8bit only due to grayscale RGB) then the depth accuracy I work with is very poor see:

depth

So my results are affected by all this negatively. Anyway here is what you need to do:

determine FOV for both images

So find some measurable feature visible on both images. The bigger in size the more accurate the result. For example I choose these:

form a point cloud or mesh

I use depth image as reference so my point cloud is in its FOV. As I do not have the distances but 8bit values instead I converted that to some distance by multiplying by constant. So I scan whole depth image and for every pixel I create point in my point cloud array. Then convert the dept pixel coordinate to color image FOV and copy its color too. something like this (in C++):

picture rgb,zed; // your input images
struct pnt3d { float pos[3]; DWORD rgb; pnt3d(){}; pnt3d(pnt3d& a){ *this=a; }; ~pnt3d(){}; pnt3d* operator = (const pnt3d *a) { *this=*a; return this; }; /*pnt3d* operator = (const pnt3d &a) { ...copy... return this; };*/ };
pnt3d **xyz=NULL; int xs,ys,ofsx=0,ofsy=0;

void copy_images()
    {
    int x,y,x0,y0;
    float xx,yy;
    pnt3d *p;
    for (y=0;y<ys;y++)
     for (x=0;x<xs;x++)
        {
        p=&xyz[y][x];
        // copy point from depth image
        p->pos[0]=2.000*((float(x)/float(xs))-0.5);
        p->pos[1]=2.000*((float(y)/float(ys))-0.5)*(float(ys)/float(xs));
        p->pos[2]=10.0*float(DWORD(zed.p[y][x].db[0]))/255.0;
        // convert dept image x,y to color image space (FOV correction)
        xx=float(x)-(0.5*float(xs));
        yy=float(y)-(0.5*float(ys));
        xx*=98.0/108.0;
        yy*=106.0/119.0;
        xx+=0.5*float(rgb.xs);
        yy+=0.5*float(rgb.ys);
        x0=xx; x0+=ofsx;
        y0=yy; y0+=ofsy;
        // copy color from rgb image if in range
        p->rgb=0x00000000; // black
        if ((x0>=0)&&(x0<rgb.xs))
         if ((y0>=0)&&(y0<rgb.ys))
          p->rgb=rgb2bgr(rgb.p[y0][x0].dd); // OpenGL has reverse RGBorder then my image
        }
    }

where **xyz is my point cloud 2D array allocated t depth image resolution. The picture is my image class for DIP so here some relevant members:

xs,ys is the image resolution in pixels
p[ys][xs] is the image direct pixel access as union of DWORD dd; BYTE db[4]; so I can access color as single 32 bit variable or each color channel separately.
rgb2bgr(DWORD col) just reorder color channels from RGB to BGR.

render it

I use OpenGL for this so here the code:

    glBegin(GL_QUADS);
    for (int y0=0,y1=1;y1<ys;y0++,y1++)
    for (int x0=0,x1=1;x1<xs;x0++,x1++)
        {
        float z,z0,z1;
        z=xyz[y0][x0].pos[2]; z0=z; z1=z0;
        z=xyz[y0][x1].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
        z=xyz[y1][x0].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
        z=xyz[y1][x1].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
        if (z0   <=0.01) continue;
        if (z1   >=3.90) continue;  // 3.972 pre vsetko nad .=3.95m a 4.000 ak nechyti vobec nic
        if (z1-z0>=0.10) continue;
        glColor4ubv((BYTE* )&xyz[y0][x0].rgb);
        glVertex3fv((float*)&xyz[y0][x0].pos);
        glColor4ubv((BYTE* )&xyz[y0][x1].rgb);
        glVertex3fv((float*)&xyz[y0][x1].pos);
        glColor4ubv((BYTE* )&xyz[y1][x1].rgb);
        glVertex3fv((float*)&xyz[y1][x1].pos);
        glColor4ubv((BYTE* )&xyz[y1][x0].rgb);
        glVertex3fv((float*)&xyz[y1][x0].pos);
        }
    glEnd();

You need to add the OpenGL initialization and camera settings etc of coarse. Here the unaligned result:

colors

align it

If you notice I added ofsx,ofsy variables to copy_images(). This is the offset between cameras. I change them on arrows keystrokes by 1 pixel and then call copy_images and render the result. This way I manually found the offset very quickly:

As you can see the offset is +17 pixels in x axis and +4 pixels in y axis. Here side view to better see the depths:

Hope It helps a bit

Leave a Comment Cancel reply