Is there a more efficient way of texturing a circle?

Why not use:

(x-x0)^2 + (y-y0)^2 <= r^2

so simply:

int x0=?,y0=?,r=?; // your planet position and size
int x,y,xx,rr,col;
for (rr=r*r,x=-r;x<=r;x++)
 for (xx=x*x,y=-r;y<=r;y++)
  if (xx+(y*y)<=rr)
   {
   col = whateverFunctionIMake(x, y);
   setPixel(x0+x, y0+y, col);   
   }

This works purely on integers, with no floating-point or other slow operations and no gaps … Do not forget to use randseed for the coloring function …
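A self-contained version of the loop above might look like the sketch below. It is only an illustration: the names `planetColor` and `fillDisc`, the row-major `std::vector` image, and the per-pixel bounds check are my assumptions, and the integer hash merely stands in for whatever seeded coloring function you actually write.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the coloring function: a small integer hash of
// (x, y, seed), so the same seed always reproduces the same texture.
std::uint32_t planetColor(int x, int y, std::uint32_t seed)
{
    std::uint32_t h = seed;
    h ^= static_cast<std::uint32_t>(x) * 0x9E3779B1u;
    h ^= static_cast<std::uint32_t>(y) * 0x85EBCA77u;
    h ^= h >> 15; h *= 0x2C1B3C6Du;
    h ^= h >> 12;
    return h & 0x00FFFFFFu;              // keep 24 bits: 0x00RRGGBB
}

// Fills the disc (x-x0)^2 + (y-y0)^2 <= r^2 in a w*h row-major image.
// Returns the number of pixels written.
int fillDisc(std::vector<std::uint32_t>& img, int w, int h,
             int x0, int y0, int r, std::uint32_t seed)
{
    assert(r >= 0 && static_cast<int>(img.size()) == w * h);
    int count = 0;
    const int rr = r * r;
    for (int x = -r; x <= r; x++)
    {
        const int xx = x * x;            // hoisted out of the inner loop
        for (int y = -r; y <= r; y++)
        {
            if (xx + y * y > rr) continue;                        // outside the circle
            const int px = x0 + x, py = y0 + y;
            if (px < 0 || px >= w || py < 0 || py >= h) continue; // outside the image
            img[py * w + px] = planetColor(x, y, seed);
            count++;
        }
    }
    return count;
}
```

The color depends only on the offset from the circle center and the seed, so moving the planet does not change its texture.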

[Edit1] some more stuff

Now if you want speed, then you need direct pixel access. On most platforms Pixels, SetPixel, PutPixels etc. are slow because they do a lot of extra work such as range checking and color conversions … If you have direct pixel access, or render into your own array/image, you need to clip the circle against the screen up front. That way you do not have to test every pixel against the screen bounds, and you still avoid access violations when the circle overlaps the screen edge.

As mentioned in the comments, you can get rid of the x*x and y*y inside the loops by updating the previous value (as both x and y only increment). For more info about it see:

  • 32bit SQRT in 16T without multiplication

the math is like this:

(x+1)^2 = (x+1)*(x+1) = x^2 + 2x + 1

so instead of xx = x*x we just do xx+=x+x+1 if x has not been incremented yet, or xx+=x+x-1 if x has already been incremented.

Putting it all together, I got this:
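To see the identity in action, here is a tiny sketch that builds a run of squares using one multiplication for the start value and only additions afterwards. The function name `squaresIncremental` is made up for this illustration.

```cpp
#include <cassert>
#include <vector>

// Returns x*x for x = first .. last, computed with one multiplication
// for the start value and additions only afterwards.
std::vector<int> squaresIncremental(int first, int last)
{
    assert(first <= last);
    std::vector<int> out;
    int xx = first * first;
    for (int x = first; x <= last; x++)
    {
        out.push_back(xx);
        xx += x + x + 1;   // (x+1)^2 = x^2 + 2x + 1, x not incremented yet
    }
    return out;
}
```

For the range -3 .. 3 this yields 9, 4, 1, 0, 1, 4, 9, so the update also works while x is still negative.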

void circle(int x,int y,int r,DWORD c)
    {
    // my Pixel access
    int **Pixels=Main->pyx;         // Pixels[y][x]
    int   xs=Main->xs;              // resolution
    int   ys=Main->ys;
    // circle
    int sx,sy,sx0,sx1,sy0,sy1;      // [screen]
    int cx,cy,cx0,    cy0    ;      // [circle]
    int rr=r*r,cxx,cyy,cxx0,cyy0;   // [circle^2]
    // BBOX + screen clip
    sx0=x-r; if (sx0>=xs) return; if (sx0<  0) sx0=0;
    sy0=y-r; if (sy0>=ys) return; if (sy0<  0) sy0=0;
    sx1=x+r; if (sx1<  0) return; if (sx1>=xs) sx1=xs-1;
    sy1=y+r; if (sy1<  0) return; if (sy1>=ys) sy1=ys-1;
    cx0=sx0-x; cxx0=cx0*cx0;
    cy0=sy0-y; cyy0=cy0*cy0;
    // render
    for (cxx=cxx0,cx=cx0,sx=sx0;sx<=sx1;sx++,cxx+=cx,cx++,cxx+=cx)
     for (cyy=cyy0,cy=cy0,sy=sy0;sy<=sy1;sy++,cyy+=cy,cy++,cyy+=cy)
      if (cxx+cyy<=rr)
       Pixels[sy][sx]=c;
    }

This renders a filled circle with radius 512 px in ~35 ms, so about 23.5 Mpx/s fill rate on my setup (AMD A8-5500 3.2GHz, Win7 64bit, single thread, VCL/GDI 32bit app built with BDS2006 C++). Just change the direct pixel access to the style/API you use …
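If you do not have my `Main->pyx` style frame buffer, the same idea can be tried against a plain array. The sketch below is an adaptation, not the code I measured: the `Image` struct and `fillCircleClipped` name are assumptions for illustration, and the function returns the number of written pixels only so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal frame buffer standing in for Main->pyx / xs / ys.
struct Image
{
    int xs, ys;                          // resolution
    std::vector<std::uint32_t> px;       // row-major, px[y*xs + x]
    Image(int w, int h) : xs(w), ys(h), px(static_cast<std::size_t>(w) * h, 0) {}
};

// Filled circle, clipped against the image once, squares updated incrementally.
// Returns the number of pixels written.
int fillCircleClipped(Image& img, int x, int y, int r, std::uint32_t c)
{
    assert(r >= 0);
    // bounding box + screen clip
    int sx0 = x - r, sy0 = y - r, sx1 = x + r, sy1 = y + r;
    if (sx0 >= img.xs || sy0 >= img.ys || sx1 < 0 || sy1 < 0) return 0;
    if (sx0 < 0) sx0 = 0;
    if (sy0 < 0) sy0 = 0;
    if (sx1 >= img.xs) sx1 = img.xs - 1;
    if (sy1 >= img.ys) sy1 = img.ys - 1;
    // circle-local start coordinates of the clipped box
    const int rr = r * r;
    const int cx0 = sx0 - x, cy0 = sy0 - y;
    int written = 0;
    int cxx = cx0 * cx0;
    for (int sx = sx0, cx = cx0; sx <= sx1; sx++, cxx += cx + cx + 1, cx++)
    {
        int cyy = cy0 * cy0;
        for (int sy = sy0, cy = cy0; sy <= sy1; sy++, cyy += cy + cy + 1, cy++)
            if (cxx + cyy <= rr) { img.px[sy * img.xs + sx] = c; written++; }
    }
    return written;
}
```

The loop order mirrors the code above (x outer, y inner). With a row-major buffer, swapping the loops so the inner one walks along a row may be friendlier to the cache, but I have not measured that here.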

[Edit2]

To measure speed on x86/x64 you can use the RDTSC asm instruction. Here is some ancient C++ code I used ages ago (in a 32bit environment without native 64bit stuff):

double _rdtsc()
    {
    LARGE_INTEGER x; // 64bit integer union (LowPart/HighPart/QuadPart) from windows.h
    DWORD l,h;       // standard unsigned 32 bit variables
    asm {
        rdtsc
        mov l,eax
        mov h,edx
        }
    x.LowPart=l;
    x.HighPart=h;
    return double(x.QuadPart);
    }

It returns the number of clocks your CPU has counted since power up. Beware, you should account for overflows: on fast machines a 32bit counter overflows within seconds (2^32 clocks at 3.2 GHz is about 1.3 s). Also each core has a separate counter, so set the affinity to a single CPU. On CPUs with a variable clock speed, warm up the CPU with some computation before measuring. To convert to time just divide by the CPU clock frequency. To obtain it just do this:

t0=_rdtsc();
sleep(250);          // wait 250 ms (a quarter of a second)
t1=_rdtsc();
fcpu = (t1-t0)*4;    // clocks per second [Hz]

and measurement:

t0=_rdtsc();
// measured stuff
t1=_rdtsc();
time = (t1-t0)/fcpu; // [s]

If t1<t0 the counter overflowed, and you need to add a constant (the counter range) to the result or measure again. Also, the measured process must take less than the overflow period. To enhance precision ignore OS granularity. For more info see:

  • Measuring Cache Latencies
  • Cache size estimation on your system? setting affinity example
  • Negative clock cycle measurements with back-to-back rdtsc?
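For the overflow case described above, one way to "add the constant" is to let unsigned arithmetic do it. The sketch below assumes you only keep the low 32 bits of the counter; the helper names `elapsedTicks32` and `ticksToSeconds` are made up for this illustration, and the trick only holds if the counter wrapped at most once between the two samples.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed ticks between two 32-bit counter samples. Unsigned subtraction
// wraps modulo 2^32, which is the same as adding 2^32 when t1 < t0.
// Only valid if the counter wrapped at most once between the samples.
std::uint32_t elapsedTicks32(std::uint32_t t0, std::uint32_t t1)
{
    return t1 - t0;
}

// Converts ticks to seconds given a calibrated counter frequency in Hz.
double ticksToSeconds(std::uint64_t ticks, double fcpu)
{
    assert(fcpu > 0.0);
    return static_cast<double>(ticks) / fcpu;
}
```

For example, samples 0xFFFFFFF0 and 0x00000010 give 0x20 elapsed ticks even though t1<t0, and 800000000 ticks at an fcpu of 3.2e9 Hz come out as 0.25 s.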
