Convolving an image with a kernel in the Fourier domain

The discrete Fourier transform (DFT) and, by extension, the FFT (which computes the DFT) place the origin at the first element (for an image, the top-left pixel), for both the input and the output. This is why we often apply the fftshift function to the output: it shifts the origin to a location more familiar to us (the middle of the image).
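As a quick illustration (a minimal sketch with throwaway variables, not part of the example further down), the zero-frequency component of the DFT sits at [0,0], and fftshift moves it to the middle of the array:

import numpy as np
from scipy import fftpack

x = np.random.rand(8, 8)
F = fftpack.fft2(x)
print(np.abs(F[0, 0]), x.sum())    # the DC component is at [0,0] and equals the sum of the input (up to rounding)
Fs = fftpack.fftshift(F)
print(np.abs(Fs[4, 4]), x.sum())   # after fftshift the same value sits at [N//2, N//2]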

This means that we need to transform a 3×3 uniform weighted blurring kernel to look like this before passing it to the FFT function:

1/9  1/9  0  0  ... 0  1/9
1/9  1/9  0  0  ... 0  1/9
  0    0  0  0  ... 0    0
...  ...               ...
  0    0  0  0  ... 0    0
1/9  1/9  0  0  ... 0  1/9

That is, the middle of the kernel is at the top-left corner of the image, with the pixels above and to the left of the middle wrapping around and appearing at the right and bottom ends of the image.
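To see that this is just the wrap-around described above, here is a small sketch (on a throwaway 8×8 grid, not the actual image) that builds the same layout with np.roll:

import numpy as np

k_im = np.zeros((8, 8))
k_im[:3, :3] = 1/9                           # 3×3 kernel with its top-left at [0,0]
k_im = np.roll(k_im, (-1, -1), axis=(0, 1))  # move the kernel's middle to [0,0]; the edges wrap around
print(k_im)                                  # matches the layout sketched above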

We can do this using the ifftshift function, applied to the kernel after padding. When padding the kernel, we need to take care that the origin (the middle of the kernel) ends up at location k_im.shape // 2 (integer division) within the padded kernel image k_im. Initially the origin is at [3,3]//2 == [1,1]. Usually the image whose size we're matching has even dimensions, for example [256,256]; the origin there will be at [256,256]//2 == [128,128]. This means we need to pad by a different amount on the left than on the right (and on the top than on the bottom), so we need to be careful computing this padding:

sz = img.shape  # the sizes we're matching
kernel = np.ones((3,3)) / 9
sz = (sz[0] - kernel.shape[0], sz[1] - kernel.shape[1])  # total amount of padding
kernel = np.pad(kernel, (((sz[0]+1)//2, sz[0]//2), ((sz[1]+1)//2, sz[1]//2)), 'constant')
kernel = fftpack.ifftshift(kernel)
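As a sanity check (continuing the snippet above, and assuming the image dimensions are even), the shifted kernel should now have its middle at the top-left pixel and its wrapped-around values in the far corners, exactly as in the layout sketched earlier:

assert kernel.shape == img.shape   # kernel now matches the image size
assert kernel[0, 0] == 1/9         # the middle of the kernel is at the top-left pixel
assert kernel[1, -1] == 1/9 and kernel[-1, 1] == 1/9 and kernel[-1, -1] == 1/9   # wrapped-around corners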

Note that the input image, img, does not need to be padded (though you can do this if you want to enforce a size for which the FFT is cheaper). There is also no need to apply fftshift to the result of the FFT before the multiplication, and then reverse this shift right after; these two shifts are redundant. You should use fftshift only if you want to display the Fourier-domain image. Finally, applying logarithmic scaling to the filtered image is wrong: logarithmic scaling is useful for displaying the magnitude of the spectrum, not the spatial-domain result of the filtering.
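To convince yourself that those extra shifts really are redundant, here is a small sketch (with throwaway arrays, not the actual image): shifting both spectra and undoing the shift after the multiplication gives the same product, because the element-wise multiplication commutes with the shift:

import numpy as np
from scipy import fftpack

A = fftpack.fft2(np.random.rand(16, 16))
B = fftpack.fft2(np.random.rand(16, 16))
print(np.allclose(A * B, fftpack.ifftshift(fftpack.fftshift(A) * fftpack.fftshift(B))))   # True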

The resulting code is (I’m using pyplot for display, not using PIL at all):

import numpy as np
from scipy import misc
from scipy import fftpack
import matplotlib.pyplot as plt

img = misc.face()[:, :, 0]   # grayscale test image (one channel of the raccoon face)

# Pad the kernel to the image size, keeping its middle at img.shape // 2,
# then shift that middle to the top-left pixel.
kernel = np.ones((3, 3)) / 9
sz = (img.shape[0] - kernel.shape[0], img.shape[1] - kernel.shape[1])  # total amount of padding
kernel = np.pad(kernel, (((sz[0]+1)//2, sz[0]//2), ((sz[1]+1)//2, sz[1]//2)), 'constant')
kernel = fftpack.ifftshift(kernel)

# Multiplication in the frequency domain corresponds to (circular) convolution
# in the spatial domain; keep only the real part of the inverse transform.
filtered = np.real(fftpack.ifft2(fftpack.fft2(img) * fftpack.fft2(kernel)))
plt.imshow(filtered, vmin=0, vmax=255)   # display with the intensity scaling of the input
plt.show()

Note that I am taking the real part of the inverse FFT. The imaginary part should contain only values very close to zero, which are the result of rounding errors in the computations. Taking the absolute value, though common, is incorrect. For example, you might want to apply a filter to an image that contains negative values, or apply a filter that produces negative values. Taking the absolute value here would create artefacts. If the output of the inverse FFT contains imaginary values significantly different from zero, then there is an error in the way that the filtering kernel was padded.
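If you want to check this in the example above, you can look at the largest imaginary value before discarding it (a quick diagnostic, not part of the code above):

result = fftpack.ifft2(fftpack.fft2(img) * fftpack.fft2(kernel))
print(np.max(np.abs(result.imag)))   # should be tiny compared to the image values (pure rounding error)
filtered = result.real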

Also note that the kernel here is tiny, and consequently the blurring effect is tiny too. To better see the effect of the blurring, make a larger kernel, for example np.ones((7,7)) / 49.
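As a final check (this uses scipy.ndimage, which doesn't appear in the code above, purely as an independent reference), the frequency-domain filtering should match a spatial-domain convolution with wrap-around boundary handling, since multiplication of DFTs implements circular convolution:

from scipy import ndimage

reference = ndimage.convolve(img.astype(float), np.ones((3, 3)) / 9, mode='wrap')
print(np.allclose(filtered, reference))   # True, up to floating-point rounding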
