James Gregson's Website

Sep 09, 2018

Consistent Spatial and Frequency Domain Image Filtering with Python

In image reconstruction tasks, it's often convenient to perform filtering operations in either the spatial or frequency domain. Both have their advantages and disadvantages. For the spatial domain, small filters such as finite differences can be evaluated extremely quickly and include local coefficients but objective functions often result in large systems of equations. In contrast, many objective functions diagonalize to yield pixel-wise solutions in the frequency domain but spatial variation cannot be incorporated directly.

As a result, it's common to express reconstruction tasks with individual terms in each domain. The domains are then coupled through penalties or constraints that can be expressed naturally either domain and lead to efficient solvers.

By way of example: in the total-variation regularized inpainting example below, the data term expresses a least-squares penalty on a (fixed) random selection of pixels in the reconstruction \({\bf I}\) and measurements \({\bf Y}\) via the selection mask \({\bf S}\) (\(\odot\) is pixelwise multiplication). This term cannot be expressed efficiently in the Fourier domain. However the regularizers (e.g. \(\lambda \| \frac{\partial}{\partial x} {\bf I} \|_1\)) can and should be since otherwise they couple every pixel in the image together. This is bad because it results in very costly solves.

\begin{equation*} {\bf I} = \mbox{argmin}_{\bf I} \frac{1}{2} \| S \odot \left( {\bf I} - {\bf Y} \right) \|_2^2 + \lambda \| \frac{\partial}{\partial x} {\bf I} \|_1 + + \lambda \| \frac{\partial}{\partial y} {\bf I} \|_1 \end{equation*}

Solving this efficiently requires splitting the problem into separate solve that couple through constraints and Lagrange multipliers (or more simply through penalties). How to do that is an entire topic in itself but the key thing is that filtering is performed in both domains and must be consistent to produce sensible results. If this is not the case, all sorts of weird things happen: images shifting each iteration, convergence stalling, etc.

This sounds (and is) easy but requires a bit of attention to detail.

Brass Tacks

The starting point here is a definition of a sparse kernel. By sparse, I mean a kernel with very few non-zero weights, generally less than 10-20 or so. This includes common filters like Laplacians, edge filters and finite difference stencils.

In the following code, a kernel is defined by three equal length lists containing the row-offsets, column-offsets and filter values for the kernel. In the code, the user specifies the correlation kernel; this produces results that match typical finite-difference kernels when the correlate_spatial or correlate_fourier methods are called. Here is the code for the kernel class:

import numpy

class SparseKernel2D:
    """Represents a kernel for a 2D convolution operation

    Example:

        # make a first-order finite difference along the column axis
        K = SparseKernel2D( [0,0], [-1,0], [-1.0, 1.0] )

    """

    def __init__( self, offy, offx, vals ):
        """Represents a kernel for a 2D convolution operation

        Args:
            offy (integer array/list) vertical offsets of non-zero kernel values
            offx (integer array/list) horizontal offsets of non-zero kernel values
            vals (float array/list)   non-zero kernel values

        Raises:
            ValueError whenever offx, offx, vals do not have the same length
        """
        if len(offy) != len(offx) or len(vals) != len(offx):
            raise ValueError('offx, offy and vals must be 1D array-like with the same size')

        self.offx = offx
        self.offy = offy
        self.vals = vals

    def image( self, dim ):
        """Returns an image represenation of the kernel at the specified dimensions with the kernel centered.

        Args:
            dim (integer tuple) output image dimensions [height,width]

        Raises:
            ValueError when dim is not 2-element

        Returns:
            height x width image of float32 elements
        """
        if len(dim) != 2:
            raise ValueError('Expected a 2-element integer tuple/list for image dimensions')

        cen = ( int(dim[0]/2), int(dim[1]/2) )
        out = numpy.zeros( dim, dtype=numpy.float32 )
        for dx,dy,val in zip( self.offx, self.offy, self.vals):
            out[cen[0]+dy,cen[1]+dx] = val
        return out

    def spectrum( self, dim ):
        """Returns a spectrum representation of the kernel at the specified dimensions. The returned
        kernel is *not* centered, i.e. has it's DC component at the corner.

        Args:
            dim (integer tuple) output spectrum dimensions [height,width]

        Raises:
            ValueError when dim is not 2-element

        Returns:
            height x width complex image
        """
        return numpy.fft.fft2( numpy.roll( self.image(dim), [int(dim[0]/2+0.5),int(dim[1]/2+0.5)], axis=[0,1] ) )

    def convolve_spatial( self, I ):
        """Convolves the input image by the kernel in the spatial domain

        Args:
            I (2d float32 array) input image to convolve with kernel

        Raises:
            ValueError when input is not a 2D ndarray or I.dtype is not numpy.floating subclass

        Returns:
            Convolution results as 2D ndarray of I.dtype elements
        """
        if not isinstance(I,numpy.ndarray) or I.ndim != 2 or not numpy.issubdtype(I.dtype,numpy.floating):
            raise ValueError('Expected a 2D ndarray or float32 values')

        result = numpy.zeros( I.shape, dtype=I.dtype )

        for dx,dy,val in zip(self.offx, self.offy, self.vals):
            result += val*numpy.roll( I, [dy,dx], axis=[0,1] )

        return result

    def correlate_spatial( self, I ):
        """Correlates the input image with the kernel in the spatial domain

        """
        if not isinstance(I,numpy.ndarray) or I.ndim != 2 or not numpy.issubdtype(I.dtype,numpy.floating):
            raise ValueError('Expected a 2D ndarray or float32 values')

        result = numpy.zeros( I.shape, dtype=I.dtype )

        for dx,dy,val in zip(self.offx, self.offy, self.vals):
            result += val*numpy.roll( I, [-dy,-dx], axis=[0,1] )

        return result

    def convolve_fourier( self, I ):
        """Convolves the input image by the kernel in the Fourier domain

        Args:
            I (2d float32 array) input image to convolve with kernel

        Raises:
            ValueError when input is not a 2D ndarray or I.dtype is not numpy.floating subclass

        Returns:
            Convolution results as 2D ndarray of I.dtype elements
        """
        if not isinstance(I,numpy.ndarray) or I.ndim != 2 or not numpy.issubdtype(I.dtype,numpy.floating):
            raise ValueError('Expected a 2D ndarray or float32 values')

        fK = self.spectrum( I.shape )
        fI = numpy.fft.fft2( I )

        return numpy.real( numpy.fft.ifft2( fK*fI ) )

    def correlate_fourier( self, I ):
        """Correlates the input image by the kernel in the Fourier domain

        Args:
            I (2d float32 array) input image to convolve with kernel

        Raises:
            ValueError when input is not a 2D ndarray or I.dtype is not numpy.floating subclass

        Returns:
            Convolution results as 2D ndarray of I.dtype elements
        """
        if not isinstance(I,numpy.ndarray) or I.ndim != 2 or not numpy.issubdtype(I.dtype,numpy.floating):
            raise ValueError('Expected a 2D ndarray or float32 values')

        fK = self.spectrum( I.shape )
        fI = numpy.fft.fft2( I )

        return numpy.real( numpy.fft.ifft2( numpy.conj(fK)*fI ) )

Testing & Validation

To ensure that indexing is done correctly, it's important to check that the kernel can represent an identity operation correctly for odd and even image sizes in both Fourier and spatial domains. After that, it's important that the convolution and correlation functions return the correct results.

This last point depends on whether the convolution or correlation filter is being defined. The difference between the two is mirroring horizontally and vertically. Personally, I find it most natural to define the correlation filter which coincides with the typical finite difference stencils. Generating some analytic data as well as some known finite difference kernels allows correct operation to be confirmed:

def TestSparseKernel2D():
    """Runs a series of test functions on the SparseKernel2D class"""

    # local helper function that will raise an error if the
    # results are insufficiently accurate
    def MSE( x, ref ):
        val = numpy.mean( numpy.power( x-ref, 2 ) )
        if val > 1e-6:
            raise ValueError('Values do not match!')
        return val

    # define a mean-squared error function
    #MSE = lambda I, Ref: numpy.mean( numpy.power( I-Ref, 2 ) )

    # generate an identity kernel
    ident = SparseKernel2D( [0],[0],[1.0] )

    # generate some test images
    x,y = numpy.meshgrid( numpy.arange(0.,200.), numpy.arange(0.,100.) )
    z = x+y

    # check that convolving with the identity kernel
    # reproduces the input for even mesh sizes
    conv_z_s = ident.convolve_spatial( z )
    conv_z_f = ident.convolve_fourier( z )

    print('Identity test, even sizes: ',z.shape)
    print('\tMSE spatial: %f'%MSE(conv_z_s,z) )
    print('\tMSE fourier: %f'%MSE(conv_z_f,z) )

    # check that convolving with the identity kernel
    # reproduces the input for odd/even mesh sizes
    tz = z[1:,:]
    conv_z_s = ident.convolve_spatial( tz )
    conv_z_f = ident.convolve_fourier( tz )

    print('Identity test, odd/even sizes: ',tz.shape)
    print('\tMSE spatial: %f'%MSE(conv_z_s,tz) )
    print('\tMSE fourier: %f'%MSE(conv_z_f,tz) )

    # check that convolving with the identity kernel
    # reproduces the input for odd/even mesh sizes
    tz = z[:,1:]
    conv_z_s = ident.convolve_spatial( tz )
    conv_z_f = ident.convolve_fourier( tz )

    print('Identity test, even/odd sizes: ',tz.shape)
    print('\tMSE spatial: %f'%MSE(conv_z_s,tz) )
    print('\tMSE fourier: %f'%MSE(conv_z_f,tz) )

    # check that convolving with the identity kernel
    # reproduces the input for odd/odd mesh sizes
    tz = z[:-1,1:]
    conv_z_s = ident.convolve_spatial( tz )
    conv_z_f = ident.convolve_fourier( tz )

    print('Identity test, odd/odd sizes: ',tz.shape)
    print('\tMSE spatial: %f'%MSE(conv_z_s,tz) )
    print('\tMSE fourier: %f'%MSE(conv_z_f,tz) )

    # generate first order finite difference formula
    # try convolving in both the spatial and frequency
    # domains
    ddx = SparseKernel2D( [0,0],[-1,0],[-1.0,1.0] )
    ddx_z_s = ddx.correlate_spatial( z )[1:-2,1:-2]
    ddx_z_f = ddx.correlate_fourier( z )[1:-2,1:-2]
    ones = numpy.ones( ddx_z_s.shape, dtype=numpy.float32 )

    print('First order finite-difference d(x+y)/dx')
    print('\tMSE spatial: %f' % MSE(ddx_z_s,ones) )
    print('\tMSE fourier: %f' % MSE(ddx_z_f,ones) )

    # generate first order finite difference formula
    # try convolving in both the spatial and frequency
    # domains
    ddy = SparseKernel2D( [-1,0],[0,0],[-1.0,1.0] )
    ddy_z_s = ddy.correlate_spatial( z )[1:-2,1:-2]
    ddy_z_f = ddy.correlate_fourier( z )[1:-2,1:-2]
    ones = numpy.ones( ddy_z_s.shape, dtype=numpy.float32 )

    print('First order finite-difference d(x+y)/dy')
    print('\tMSE spatial: %f' % MSE(ddy_z_s,ones) )
    print('\tMSE fourier: %f' % MSE(ddy_z_f,ones) )

    # now try the convolution functions, this simply
    # flips the kernel about the central pixel
    # horizontally and vertically. for the filters
    # defined above, this negates the results
    ddx_z_s = ddx.convolve_spatial( z )[1:-2,1:-2]
    ddx_z_f = ddx.convolve_fourier( z )[1:-2,1:-2]
    ones = numpy.ones( ddx_z_s.shape, dtype=numpy.float32 )

    print('Convolved first order finite-difference d(x+y)/dx')
    print('\tMSE spatial: %f' % MSE(ddx_z_s,-ones) )
    print('\tMSE fourier: %f' % MSE(ddx_z_f,-ones) )

    # generate first order finite difference formula
    # try convolving in both the spatial and frequency
    # domains
    ddy_z_s = ddy.convolve_spatial( z )[1:-2,1:-2]
    ddy_z_f = ddy.convolve_fourier( z )[1:-2,1:-2]
    ones = numpy.ones( ddy_z_s.shape, dtype=numpy.float32 )

    print('Convolved first order finite-difference d(x+y)/dy')
    print('\tMSE spatial: %f' % MSE(ddy_z_s,-ones) )
    print('\tMSE fourier: %f' % MSE(ddy_z_f,-ones) )

This prints:

Identity test, even sizes:  (100, 200)
    MSE spatial: 0.000000
    MSE fourier: 0.000000
Identity test, odd/even sizes:  (99, 200)
    MSE spatial: 0.000000
    MSE fourier: 0.000000
Identity test, even/odd sizes:  (100, 199)
    MSE spatial: 0.000000
    MSE fourier: 0.000000
Identity test, odd/odd sizes:  (99, 199)
    MSE spatial: 0.000000
    MSE fourier: 0.000000
First order finite-difference d(x+y)/dx
    MSE spatial: 0.000000
    MSE fourier: 0.000000
First order finite-difference d(x+y)/dy
    MSE spatial: 0.000000
    MSE fourier: 0.000000
Convolved first order finite-difference d(x+y)/dx
    MSE spatial: 0.000000
    MSE fourier: 0.000000
Convolved first order finite-difference d(x+y)/dy
    MSE spatial: 0.000000
    MSE fourier: 0.000000

Sep 08, 2018

Bilateral Filtering in Python

Bilteratal filtering is a process for removing noise from images. Like most filters, each output pixel is produced as a weighted combination of input pixels. However unlike typical filters, the weights used to combine input pixels are a function of the input pixels themselves rather than fixed in advance. This makes it a non-linear filter and is key to its effectiveness.

Given an input grayscale image \({\bf I}(y,x)\) where \(x\) & \(y\) are column and row indices, bilateral filtering defines the output image \({\bf F}\) as:

$$ {\bf F}(y,x) = \frac{ \sum_{r=y-w}^{y+w} \sum_{s=x-w}^{x+w} {\bf I}(r,s) \mathcal{N}_s(r-y,s-x) \mathcal{N}_v \left( {\bf I}(y,x)-{\bf I}(s,r) \right) }{ \sum_{r=y-w}^{y+w} \sum_{s=x-w}^{x+w} \mathcal{N}_s(r-y,s-x) \mathcal{N}_v \left( {\bf I}(y,x)-{\bf I}(s,r) \right)} $$

This looks more involved than it actually is:

  • The numerator and denominator both sum over a window that is \(2w+1 \times 2w+1\) pixels in size, centered on \(y\),\(x\). The pixel coordinates \(r\),\(s\) reference each pixel within that window.
  • The pixel values of the input are combined via a weighted sum using the spatial proximity in \(\mathcal{N}_s(\delta_y,\delta_x) = e^{ -\frac{\delta_x^2+\delta_y^2}{2\sigma_s^2} }\), as in standard Gaussian filtering.
  • An additional factor based on the value proximity \(\mathcal{N}_v \left( v \right) = e^{ -\frac{v^2}{2 \sigma_v^2} }\) is used to modify the weights. This is the nonlinear part of the filter and causes pixels with similar value to the central pixel to be weighted more strongly.
  • The entire sum is normalized by the sum of the product of the spatial proximity and value proximity weights in order to produce an output with proper range.

The mathematical definition is good for understanding but a straightforward implementation results in four nested loops: two for the rows/columns and two for the offsets in each window. In pure python this leads to considerable overhead. A better approach is to vectorize the operations by:

  • Only looping over the offsets for the window
  • Generating the spatial proximity weights once per offset
  • Shifting the entire input image by these offsets
  • Generating the value proximity weights in builk using the shifted image
  • Accumulating an unnormalized result image and sum of weights image in bulk
  • Returning the pixel-wise quotient of the result and weight sum images

Code that implements this is below:

def filter_bilateral( img_in, sigma_s, sigma_v, reg_constant=1e-8 ):
    """Simple bilateral filtering of an input image

    Performs standard bilateral filtering of an input image. If padding is desired,
    img_in should be padded prior to calling

    Args:
        img_in       (ndarray) monochrome input image
        sigma_s      (float)   spatial gaussian std. dev.
        sigma_v      (float)   value gaussian std. dev.
        reg_constant (float)   optional regularization constant for pathalogical cases

    Returns:
        result       (ndarray) output bilateral-filtered image

    Raises: 
        ValueError whenever img_in is not a 2D float32 valued numpy.ndarray
    """

    # check the input
    if not isinstance( img_in, numpy.ndarray ) or img_in.dtype != 'float32' or img_in.ndim != 2:
        raise ValueError('Expected a 2D numpy.ndarray with float32 elements')

    # make a simple Gaussian function taking the squared radius
    gaussian = lambda r2, sigma: (numpy.exp( -0.5*r2/sigma**2 )*3).astype(int)*1.0/3.0

    # define the window width to be the 3 time the spatial std. dev. to 
    # be sure that most of the spatial kernel is actually captured
    win_width = int( 3*sigma_s+1 )

    # initialize the results and sum of weights to very small values for
    # numerical stability. not strictly necessary but helpful to avoid
    # wild values with pathological choices of parameters
    wgt_sum = numpy.ones( img_in.shape )*reg_constant
    result  = img_in*reg_constant

    # accumulate the result by circularly shifting the image across the
    # window in the horizontal and vertical directions. within the inner
    # loop, calculate the two weights and accumulate the weight sum and 
    # the unnormalized result image
    for shft_x in range(-win_width,win_width+1):
        for shft_y in range(-win_width,win_width+1):
            # compute the spatial weight
            w = gaussian( shft_x**2+shft_y**2, sigma_s )

            # shift by the offsets
            off = numpy.roll(img_in, [shft_y, shft_x], axis=[0,1] )

            # compute the value weight
            tw = w*gaussian( (off-img_in)**2, sigma_v )

            # accumulate the results
            result += off*tw
            wgt_sum += tw

    # normalize the result and return
    return result/wgt_sum

An example of calling the function is below:

# use opencv for image loading
import cv2
import numpy

# function definition here....

# read the lena image, convert to float and scale to [0,1]
I = cv2.imread('images/lena.png', cv2.IMREAD_UNCHANGED ).astype(numpy.float32)/255.0

# bilateral filter the image
B = numpy.stack([ 
        filter_bilateral( I[:,:,0], 10.0, 0.1 ),
        filter_bilateral( I[:,:,1], 10.0, 0.1 ),
        filter_bilateral( I[:,:,2], 10.0, 0.1 )], axis=2 )

# stack the images horizontally
O = numpy.hstack( [I,B] )

# write out the image
cv2.imwrite( 'images/lena_bilateral.png', out*255.0 )

Which produces the following, the input image is one the left and the output image is on the right.

Input Lena image and bilateral filtered

The only downside to this is the speed. Evaluating the value proximity weights is costly. It can be sped up by discretizing and interpolating them from a lookup table. Since the filter also preserves edges very well, there is often the temptation to use very large spatial filter sizes. In the example above I used a value of 10 which corresponds to a 61x61 pixel kernel. This makes the filter a prime candidate to implement in C++ or CUDA for efficiency.

Sep 02, 2018

16 bit Image IO with Python

Reading and writing 16 bit images is very important for a wide variety of topics but is surprisingly overlooked by many imaging libraries in Python.

There are a number of different Python extensions that support 16 bit images but have surprising restrictions or complications:

  • scikit-image+FreeImage plugin: apparently reads/writes 16 bit formats but I could not find any explanation of how to actually install it.
  • Pillow: extension of the PIL library but seems to only support monochrome 16 bit formats.
  • PyPNG: reads/writes 16 bit images but is terribly slow due to pure-Python implementation
  • easy_image_io: My own foray into this, reads/writes 16 bit pngs & tiffs but requires compilation

However, I recently discovered that OpenCV will happily read/write at least 16 bit png files. Installation is very easy:

pip install opencv-python

Reading and writing is then easily done:

import cv2
I = np.ones( (10,20,3) )*65535
cv2.imwrite( 'output16.png', I.astype(np.uint16) )
I2 = cv2.imread( 'output16.png', cv2.IMREAD_UNCHANGED )
I2.shape
# prints (10, 20, 3)
I2[1,1,1]
# prints 65535

This is by far the easiest way I've seen to get 16 bit image IO running under python, provided you're willing to introduce the relatively large dependency of OpenCV.