Align already captured rgb and depth images

prologue

This answer is based on mine previous answer:

  • Does Kinect Infrared View Have an offset with the Kinect Depth View

I manually crop your input image so I separate colors and depth images (as my program need them separated. This could cause minor offset change by few pixels. Also as I do not have the depths (depth image is 8bit only due to grayscale RGB) then the depth accuracy I work with is very poor see:

depth

So my results are affected by all this negatively. Anyway here is what you need to do:

  1. determine FOV for both images

    So find some measurable feature visible on both images. The bigger in size the more accurate the result. For example I choose these:

    FOV

  2. form a point cloud or mesh

    I use depth image as reference so my point cloud is in its FOV. As I do not have the distances but 8bit values instead I converted that to some distance by multiplying by constant. So I scan whole depth image and for every pixel I create point in my point cloud array. Then convert the dept pixel coordinate to color image FOV and copy its color too. something like this (in C++):

    picture rgb,zed; // your input images
    struct pnt3d { float pos[3]; DWORD rgb; pnt3d(){}; pnt3d(pnt3d& a){ *this=a; }; ~pnt3d(){}; pnt3d* operator = (const pnt3d *a) { *this=*a; return this; }; /*pnt3d* operator = (const pnt3d &a) { ...copy... return this; };*/ };
    pnt3d **xyz=NULL; int xs,ys,ofsx=0,ofsy=0;
    
    void copy_images()
        {
        int x,y,x0,y0;
        float xx,yy;
        pnt3d *p;
        for (y=0;y<ys;y++)
         for (x=0;x<xs;x++)
            {
            p=&xyz[y][x];
            // copy point from depth image
            p->pos[0]=2.000*((float(x)/float(xs))-0.5);
            p->pos[1]=2.000*((float(y)/float(ys))-0.5)*(float(ys)/float(xs));
            p->pos[2]=10.0*float(DWORD(zed.p[y][x].db[0]))/255.0;
            // convert dept image x,y to color image space (FOV correction)
            xx=float(x)-(0.5*float(xs));
            yy=float(y)-(0.5*float(ys));
            xx*=98.0/108.0;
            yy*=106.0/119.0;
            xx+=0.5*float(rgb.xs);
            yy+=0.5*float(rgb.ys);
            x0=xx; x0+=ofsx;
            y0=yy; y0+=ofsy;
            // copy color from rgb image if in range
            p->rgb=0x00000000; // black
            if ((x0>=0)&&(x0<rgb.xs))
             if ((y0>=0)&&(y0<rgb.ys))
              p->rgb=rgb2bgr(rgb.p[y0][x0].dd); // OpenGL has reverse RGBorder then my image
            }
        }
    

    where **xyz is my point cloud 2D array allocated t depth image resolution. The picture is my image class for DIP so here some relevant members:

    • xs,ys is the image resolution in pixels
    • p[ys][xs] is the image direct pixel access as union of DWORD dd; BYTE db[4]; so I can access color as single 32 bit variable or each color channel separately.
    • rgb2bgr(DWORD col) just reorder color channels from RGB to BGR.
  3. render it

    I use OpenGL for this so here the code:

        glBegin(GL_QUADS);
        for (int y0=0,y1=1;y1<ys;y0++,y1++)
        for (int x0=0,x1=1;x1<xs;x0++,x1++)
            {
            float z,z0,z1;
            z=xyz[y0][x0].pos[2]; z0=z; z1=z0;
            z=xyz[y0][x1].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
            z=xyz[y1][x0].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
            z=xyz[y1][x1].pos[2]; if (z0>z) z0=z; if (z1<z) z1=z;
            if (z0   <=0.01) continue;
            if (z1   >=3.90) continue;  // 3.972 pre vsetko nad .=3.95m a 4.000 ak nechyti vobec nic
            if (z1-z0>=0.10) continue;
            glColor4ubv((BYTE* )&xyz[y0][x0].rgb);
            glVertex3fv((float*)&xyz[y0][x0].pos);
            glColor4ubv((BYTE* )&xyz[y0][x1].rgb);
            glVertex3fv((float*)&xyz[y0][x1].pos);
            glColor4ubv((BYTE* )&xyz[y1][x1].rgb);
            glVertex3fv((float*)&xyz[y1][x1].pos);
            glColor4ubv((BYTE* )&xyz[y1][x0].rgb);
            glVertex3fv((float*)&xyz[y1][x0].pos);
            }
        glEnd();
    

    You need to add the OpenGL initialization and camera settings etc of coarse. Here the unaligned result:

    colors

  4. align it

    If you notice I added ofsx,ofsy variables to copy_images(). This is the offset between cameras. I change them on arrows keystrokes by 1 pixel and then call copy_images and render the result. This way I manually found the offset very quickly:

    align

    As you can see the offset is +17 pixels in x axis and +4 pixels in y axis. Here side view to better see the depths:

    side

Hope It helps a bit

Leave a Comment

tech