James Gregson's Website

Feb 09, 2019

Extract Image Patches in Python

There are a number of imaging algorithms that work with patch representations. Non-local means, dictionary learning and Gaussian Mixture Models are some examples.

In order to use these, you need to be able to extract image patches and reconstruct images efficiently. There are probably faster methods but the following does the trick for me without much in the way of dependencies:

import numpy

def extract_grayscale_patches( img, shape, offset=(0,0), stride=(1,1) ):
    """Extracts (typically) overlapping regular patches from a grayscale image

    Changing the offset and stride parameters will result in images
    reconstructed by reconstruct_from_grayscale_patches having different
    dimensions! Callers should pad and unpad as necessary!

        img (HxW ndarray): input image from which to extract patches

        shape (2-element arraylike): shape of that patches as (h,w)

        offset (2-element arraylike): offset of the initial point as (y,x)

        stride (2-element arraylike): vertical and horizontal strides

        patches (ndarray): output image patches as (N,shape[0],shape[1]) array

        origin (2-tuple): array of top and array of left coordinates
    """
    px, py = numpy.meshgrid( numpy.arange(shape[1]), numpy.arange(shape[0]) )
    l, t = numpy.meshgrid(
        numpy.arange(offset[1],img.shape[1]-shape[1]+1,stride[1]),
        numpy.arange(offset[0],img.shape[0]-shape[0]+1,stride[0]) )
    l = l.ravel()
    t = t.ravel()
    x = numpy.tile( px[None,:,:], (t.size,1,1)) + numpy.tile( l[:,None,None], (1,shape[0],shape[1]))
    y = numpy.tile( py[None,:,:], (t.size,1,1)) + numpy.tile( t[:,None,None], (1,shape[0],shape[1]))
    return img[y.ravel(),x.ravel()].reshape((t.size,shape[0],shape[1])), (t,l)

def reconstruct_from_grayscale_patches( patches, origin, epsilon=1e-12 ):
    """Rebuild an image from a set of patches by averaging

    The reconstructed image will have different dimensions than the
    original image if the strides and offsets of the patches were changed
    from the defaults!

        patches (ndarray): input patches as (N,patch_height,patch_width) array

        origin (2-tuple): top and left coordinates of each patch

        epsilon (scalar): regularization term for averaging when some
            image pixels are not covered by any patch

        image (ndarray): output image reconstructed from patches of
            size ( max(origin[0])+patches.shape[1], max(origin[1])+patches.shape[2])

        weight (ndarray): output weight matrix consisting of the count
            of patches covering each pixel
    """
    patch_width  = patches.shape[2]
    patch_height = patches.shape[1]
    img_width    = numpy.max( origin[1] ) + patch_width
    img_height   = numpy.max( origin[0] ) + patch_height

    out = numpy.zeros( (img_height,img_width) )
    wgt = numpy.zeros( (img_height,img_width) )
    for i in range(patch_height):
        for j in range(patch_width):
            out[origin[0]+i,origin[1]+j] += patches[:,i,j]
            wgt[origin[0]+i,origin[1]+j] += 1.0

    return out/numpy.maximum( wgt, epsilon ), wgt

if __name__ == '__main__':
    import cv2
    import time
    import matplotlib.pyplot as plt

    img = cv2.imread('lena.png')[:,:,::-1]

    start = time.time()
    p, origin = extract_grayscale_patches( img[:,:,2], (8,8), stride=(1,1) )
    end = time.time()
    print( 'Patch extraction took: {}s'.format(numpy.round(end-start,2)) )
    start = time.time()
    r, w = reconstruct_from_grayscale_patches( p, origin )
    end = time.time()
    print('Image reconstruction took: {}s'.format(numpy.round(end-start,2)) )
    print( 'Reconstruction error is: {}'.format( numpy.linalg.norm( img[:r.shape[0],:r.shape[1],2]-r ) ) )

    plt.subplot( 131 )
    plt.imshow( img[:,:,2] )
    plt.title('Input image')
    plt.subplot( 132 )
    plt.imshow( p[p.shape[0]//2] )
    plt.title('Central patch')
    plt.subplot( 133 )
    plt.imshow( r )
    plt.title('Reconstructed image')
    plt.show()

Remove the end bit to have only numpy as a dependency.
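
If the stride is increased, pixels near the right/bottom borders can fall outside every patch unless the image is padded first. Below is a minimal sketch of that workflow, assuming the two functions above are in scope; the image and sizes are just placeholders:

import numpy

img = numpy.random.rand( 100, 75 )
shape, stride = (8,8), (4,4)

# pad so that (padded size - patch size) is a multiple of the stride
pad_y = (-(img.shape[0]-shape[0])) % stride[0]
pad_x = (-(img.shape[1]-shape[1])) % stride[1]
padded = numpy.pad( img, ((0,pad_y),(0,pad_x)), mode='edge' )

p, origin = extract_grayscale_patches( padded, shape, stride=stride )
r, w = reconstruct_from_grayscale_patches( p, origin )
r = r[:img.shape[0],:img.shape[1]]   # unpad back to the original size

print( numpy.allclose( r, img ) )    # True here: every pixel is covered by at least one patch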

Feb 01, 2019

Assignment Problems

Assignment problems occur in many fields that deal with resource allocation. There are many variants, however the simplest involves \(N\) agents indexed by \(i\) which must perform \(N\) tasks indexed by \(j\). The cost for agent \(i\) to perform task \(j\) is encoded in a cost matrix \(C\) that is weighted component-wise by assignments from an assignment matrix \(W_{ij}\) to form the objective.

It's common to have the requirements that each task must be completed and that no agent can perform more than one task. Often it is required that each agent perform a single task to completion but I'm not considering that case here.

\begin{align*} \newcommand{\argmin}{{\bf\mbox{argmin}}} \newcommand{\st}{{\bf\mbox{subject to:}}} \newcommand{\gap}{\hspace{0.25cm}} W = \argmin\gap& &\sum_{ij} C_{ij} W_{ij} \\ \st\gap& & \sum_i W_{ij} = 1\gap\forall j\\ & & \sum_j W_{ij} = 1\gap\forall i\\ & & W_{ij} \geq 0 \gap\forall i,j \end{align*}

The problem can be solved using the Hungarian algorithm but I thought I'd give it a try with ADMM to see how well it works.

The probability simplex constraints are a pain but it's pretty easy to solve this by introducing splitting variables \(A, B\) each equal to \(W\). This gives the following modified problem:

\begin{align*} W = \argmin\gap& &\sum_{ij} C_{ij} W_{ij} + \sum_i \mathcal{I}_\Delta(A_{i*}) + \sum_j \mathcal{I}_\Delta(B_{*j}) \\ \st\gap& & W - A = 0 \\ & & W - B = 0 \end{align*}

The function \(\mathcal{I}_\Delta(v)\) is the indicator function for the probability simplex constraint on a vector \(v\):

\begin{equation*} \mathcal{I}_\Delta(v) = \begin{cases} 0 & \gap\mbox{if } v_i \geq 0 \;\forall i \mbox{ and } \sum_i v_i = 1 \\ \infty &\gap\mbox{otherwise} \end{cases} \end{equation*}

The proximal operator for this function requires Euclidean projection onto the non-negativity and sum constraints. It's possible to alternately project onto the non-negativity and sum constraints but this converges slowly. However, an \(O(N \log N)\) algorithm exists and can be implemented in Python as below:

def prox_probability_simplex( v ):
    """Algorithm of https://eng.ucmerced.edu/people/wwang5/papers/SimplexProj.pdf"""
    j = numpy.arange(1,len(v)+1)
    u = numpy.sort(v)[::-1]
    y = numpy.cumsum( u )
    rho = numpy.argmax((u+(1.0-y)/j>0)*j)+1
    off = (1-y[rho-1])/rho
    ret = numpy.maximum( v+off, 0.0 )
    return ret
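
As a quick sanity check, the projection of any vector should be non-negative and sum to one; a couple of lines verify this for random input (assuming numpy is imported as in the other snippets):

v = numpy.random.standard_normal( 10 )
p = prox_probability_simplex( v )
print( numpy.all( p >= 0.0 ), numpy.isclose( numpy.sum(p), 1.0 ) )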

Neglecting constants w.r.t \(W, A, B\), the augmented Lagrangian of this problem is:

\begin{equation*} \mathcal{L}(W,A,U_A,B,U_B) = \sum_{ij} C_{ij} W_{ij} + \sum_i \mathcal{I}_\Delta(A_{i*}) + \sum_j \mathcal{I}_\Delta(B_{*j}) + \frac{\rho}{2}\| W - A + U_A \|_2^2 + \frac{\rho}{2}\| W - B + U_B \|_2^2 \end{equation*}

ADMM consists of alternating minimizations for the solution \(W\) and the splitting variables \(A\) and \(B\). After each set of updates, the (scaled) Lagrange multipliers \(U_A, U_B\) are updated. The objective terms parallelize really cleanly: the data term and the two quadratic penalties parallelize per-component of \(W\), while the \(\mathcal{I}_\Delta(.)\) terms each parallelize independently over rows and columns of the splitting variables.

The resulting algorithm is as follows. I have not parallelized the solves although the \(W\) solve is vectorized at least:

def assignment_problem( C, W=None, rho=0.1, thresh=0.55, early_stop=True, num_iterations=10000 ):
    """Applies an ADMM algorithm to the assignment problem

    This algorithm will only solve the problem at the limit of iterations. Early
    stopping or running too few iterations could give a non-optimal or invalid
    matching (i.e. not a permutation of agents to tasks)

    That said, it often does okay!

        C (NxN matrix): Input cost matrix mapping agents (row indices) to tasks
            (column indices). Invalid assignments should be indicated with very
            large positive entries

        W (NxN matrix): Initial assignments to warm-start the process or None
            to start from all zeros

        rho (scalar): Lagrangian augmentation weight

        thresh (scalar): Threshold above which assignments are treated as hard
            for the purposes of returned assignments

        early_stop (boolean): enable early stopping in the case that thresholding
            the assignment matrix by thresh produces a valid assignment. This
            does not necessarily produce an optimal solution!

        num_iterations (int): maximum number of iterations to perform, may be
            fewer if early_stop == True


        assignment (N-element integer array): assignments from agents to task
            where each element assignment[i]=v maps agent i to task v

        assignment matrix (NxN matrix): solution variables to warm-start if
            a valid solution was not found

        loss (scalar): the achieved loss/cost of the solution

        iter (integer): number of iterations performed
    """
    if W is None:
        W = numpy.zeros_like(C)

    A = W.copy()
    B = W.copy()
    Ua = W-A
    Ub = W-B

    loss = numpy.sum(C*W)
    for iter in range( num_iterations ):
        # update primary variables, trivially parallelizes per component
        W = numpy.maximum( (rho*(A-Ua)+rho*(B-Ub)-C)/(2.0*rho), 0.0 )

        # update row constraints, trivially parallelizes by row
        for i in range(A.shape[0]):
            A[i,:] = prox_probability_simplex( W[i,:] + Ua[i,:] )
        Ua += (W - A)

        # update column constraints, trivially parallelizes by column
        for i in range(A.shape[1]):
            B[:,i] = prox_probability_simplex( W[:,i] + Ub[:,i] )
        Ub += (W - B)

        tloss = numpy.sum(C*W)
        rowsum = numpy.sum(W > thresh, axis=1)
        colsum = numpy.sum(W > thresh, axis=0)
        if early_stop and numpy.max(rowsum) == 1 and numpy.min(rowsum) == 1 and numpy.max(colsum) == 1 and numpy.min(colsum) == 1:
            return numpy.argmax( W > thresh, axis=1 ), W, tloss, iter

    return numpy.argmax( W > thresh, axis=1 ), W, tloss, iter

def check_validity( x ):
    """Checks that a solution from the above is a permutation"""
    return numpy.linalg.norm( numpy.sort(x) - numpy.arange(len(x)) ) > 1e-8
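
For small problems the ADMM routine is easy to sanity-check against an exact Hungarian solver. The sketch below uses scipy's linear_sum_assignment (an extra dependency, not needed by the solver itself); the size and seed are arbitrary:

import numpy
from scipy.optimize import linear_sum_assignment

numpy.random.seed( 0 )
C = numpy.random.standard_normal( (10,10) )

# exact solution via the Hungarian algorithm
rows, cols = linear_sum_assignment( C )
exact_cost = C[rows,cols].sum()

# approximate solution via the ADMM routine above
assignment, W, loss, iters = assignment_problem( C )
admm_cost = C[numpy.arange(len(assignment)),assignment].sum()

# first value is True if the ADMM result is a valid permutation
print( check_validity( assignment ), exact_cost, admm_cost, iters )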

It's worth noting a few things:

  • With the splitting and ADMM, this algorithm will only exactly solve the problem in the limit of iteration count.
  • Early stopping can reduce the iterations needed to achieve a valid, low-cost matching by 10-50X depending on the problem but this matching may not be optimal. For some test problems with 10x10 cost matrices having standard normal entries this happened about 1% of the time. The achieved loss values are typically within 1-2% of each other.
  • Early stopping does not always generate output identical to running the algorithm for many more iterations. Subsets of agents/tasks where multiple matchings produce close results tend to be hard to resolve.

Jan 28, 2019

Dictionary Learning

Dictionary learning consists of jointly solving for both a dictionary \(D\), the columns of which store features and weights \(X\) which store coefficients applied to \(D\) required to best approximate a collection of input datapoints stored as columns of \(Y\). The weights \(X\) are required to be sparse, i.e. have relatively few non-zero entries:

\begin{equation*} \newcommand{\gap}{\hspace{0.25cm}} \newcommand{\argmin}{{\bf\mbox{argmin}}} \newcommand{\st}{{\bf\mbox{subject to:}}} D, X = \argmin \frac{1}{2} \| Y - DX \|_F^2 + \lambda \| X \|_1 \end{equation*}

An additional constraint must be added to \(D\) because, without one, the 1-norm in the objective can be made arbitrarily small by allowing \(D\) to grow arbitrarily large. Typically this constraint is that the columns of \(D\) have unit 2-norm.

This is a non-convex problem, but it can be solved relatively well by iterating weight and dictionary updates, leaving the other fixed. Ignoring the norm constraint, the dictionary update is a linear least-squares problem and can be solved as:

\begin{equation*} D = \left( \left( X X^T + \epsilon I \right)^{-1} X Y^T \right)^T \end{equation*}

After this, the columns can be normalized.

The weight update is a sparse-coding problem and can be solved using ADMM. The first step is to introduce a splitting variable that allows the Frobenius-norm and 1-norm to be handled separately:

\begin{align*} X = \argmin & & \frac{1}{2} \| Y - D X \|_F^2 + \lambda \| Z \|_1 \\ \st & & X - Z = 0 \end{align*}

Once in this form, the update algorithm for the weights can be written from inspection based on the link above, using \(F(X) = \frac{1}{2}\| Y - D X \|_F^2\) and \(G(Z) = \lambda \| Z \|_1\).

\begin{align*} \newcommand{\prox}[2]{{\bf\mbox{prox}_{#1}}\left(#2\right)} X^{k+1} &=& \prox{F}{Z^k - U^k} \\ Z^{k+1} &=& \prox{\lambda G}{X^{k+1} + U^k} \\ U^{k+1} &=& U^k + X^{k+1} - Z^{k+1} \end{align*}

All that remains is to determine the proximal operators and initialization. The proximal operator for \(X\) is for a pretty conventional least-squares problem:

\begin{align*} \prox{F}{V} &=& \argmin\gap F(X) + \frac{1}{2}\| X - V \|_F^2 \\ &=& \argmin\gap \frac{1}{2} \| Y - D X \|_F^2 + \frac{1}{2} \| X - V \|_F^2 \\ &=& \left( D^T D + I \right)^{-1} \left( D^T Y + V \right) \end{align*}

The corresponding proximal operator for \(Z\) is for 'pure' 1-norm, which is just the soft-thresholding operator:

\begin{align*} \prox{\lambda G}{V} &=& \argmin\gap G(Z) + \frac{1}{2\lambda} \| Z - V \|_2^2 \\ &=& ( |V| - \lambda )_+ {\bf\mbox{sgn}}( V ) \end{align*}

If you're in doubt where either of these come from, consult the ADMM link above.

This leads to some interesting choices. Solving the split problem has two sets of variables, \(X\) and \(Z\). \(X\) minimizes the data term but \(Z\) enforces the constraints. Nominally, both of these should be equal but this only holds at the limit of iteration count, assuming the overall algorithm works at all (the problem is non-convex after all). Which should be used when updating \(D\)?

For a test problem involving around 5.5M points of 48 dof data, I found that using \(Z\) is vastly preferable. Using a dictionary with only 100 atoms (~2X overcomplete) yields reconstruction errors around 1% or less but the percentage of non-zeros in the coding weights is only around 25%. It settled relatively quickly. Using \(X\) on the other hand oscillated quite a bit and did not reach these levels of accuracy/sparsity.

I conclude that it's better to use a looser fit than slacker constraint satisfaction, for this problem at least. Presumably the dictionary/codebooks adapt to each other better when they are consistent. More 'compressible' problems may yield sparser solutions.

import numpy

def sparse_coding( Y, D, X, rho, num_iterations, Z=None, U=None ):
    if Z is None:
        Z = X.copy()
    if U is None:
        U = X - Z

    # precompute solve and part of RHS
    iDtD = numpy.linalg.inv( numpy.dot(D.transpose(),D) + numpy.eye(D.shape[1]) )
    DtY  = numpy.dot( D.transpose(), Y )

    for iter in range(num_iterations):
        print('    Sparse coding iteration [{}/{}]'.format(iter+1,num_iterations) )
        # primary update
        X = numpy.dot( iDtD, DtY + Z - U )

        # splitting variable update
        T = X + U
        Z = numpy.maximum( numpy.abs(T) - rho, 0.0)*numpy.sign(T)

        # lagrange multiplier update
        U = T - Z
    return X, Z, U

def dictionary_learning( Y, num_atoms, rho=0.001, num_outer_iterations=10, num_inner_iterations=10, epsilon=1e-8 ):
    # initialize the dictionary and weights
    X = numpy.random.standard_normal( (num_atoms, Y.shape[1]) )
    D = numpy.random.standard_normal( (Y.shape[0], num_atoms) )

    # outer loop to fit dictionary and weights
    Z = X.copy()
    U = X - Z
    for outer in range(num_outer_iterations):
        print( '  Outer iteration [{}/{}]'.format(outer+1,num_outer_iterations) )

        # dictionary update
        D = numpy.linalg.solve(
            numpy.dot(Z, Z.transpose()) + epsilon*numpy.eye(num_atoms),
            numpy.dot(Z, Y.transpose()) ).transpose()
        for i in range(D.shape[1]):
            D[:,i] /= numpy.linalg.norm(D[:,i])

        # sparse coding weight update
        X, Z, U = sparse_coding( Y, D, X, rho, num_inner_iterations, Z, U )

        # print some stats
        print( '    ||Y-DX|| RMS error: {}'.format( numpy.sqrt(numpy.mean(numpy.square(Y - numpy.dot(D,Z)))) ) )
        print( '          mean(nnz(X)): {}'.format( numpy.mean( numpy.sum(numpy.abs(Z)>1e-4, axis=0) ) ) )

    # return dictionary and solution variables
    return D, Z
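
A minimal smoke test on random data might look like the following; the sizes are arbitrary and random data won't actually be sparsely codable, so this only checks that the updates run and the error behaves sensibly:

if __name__ == '__main__':
    numpy.random.seed( 0 )
    Y = numpy.random.standard_normal( (48, 2000) )
    D, Z = dictionary_learning( Y, num_atoms=100, num_outer_iterations=5, num_inner_iterations=10 )
    print( 'Final RMS error: {}'.format( numpy.sqrt(numpy.mean(numpy.square(Y - numpy.dot(D,Z)))) ) )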

Jan 12, 2019

PyQt5 + PyOpenGL Example

It took a little bit for me to find working example code for PyOpenGL+PyQt5. The following exposes a GLUT-like interface where you can provide callback functions for basic window, mouse and keyboard events.

# simple_viewer.py

import sys

from PyQt5 import QtWidgets
from PyQt5 import QtCore
from PyQt5 import QtGui
from PyQt5 import QtOpenGL

class SimpleViewer(QtOpenGL.QGLWidget):

    initialize_cb    = QtCore.pyqtSignal()
    resize_cb        = QtCore.pyqtSignal(int,int)
    idle_cb          = QtCore.pyqtSignal()
    render_cb        = QtCore.pyqtSignal()

    mouse_move_cb    = QtCore.pyqtSignal( QtGui.QMouseEvent )
    mouse_press_cb   = QtCore.pyqtSignal( QtGui.QMouseEvent )
    mouse_release_cb = QtCore.pyqtSignal( QtGui.QMouseEvent )
    mouse_wheel_cb   = QtCore.pyqtSignal( QtGui.QWheelEvent )

    key_press_cb     = QtCore.pyqtSignal( QtGui.QKeyEvent )
    key_release_cb   = QtCore.pyqtSignal( QtGui.QKeyEvent )

    def __init__(self, parent=None):
        self.parent = parent
        QtOpenGL.QGLWidget.__init__(self, parent)

    def mouseMoveEvent( self, evt ):
        self.mouse_move_cb.emit( evt )

    def mousePressEvent( self, evt ):
        self.mouse_press_cb.emit( evt )

    def mouseReleaseEvent( self, evt ):
        self.mouse_release_cb.emit( evt )

    def keyPressEvent( self, evt ):
        self.key_press_cb.emit( evt )

    def keyReleaseEvent( self, evt ):
        self.key_release_cb.emit( evt )

    def initializeGL(self):
        self.initialize_cb.emit()

    def resizeGL(self, width, height):
        if height == 0: height = 1
        self.resize_cb.emit( width, height )

    def paintGL(self):
        self.render_cb.emit()

    @staticmethod
    def application():
        """Create the QApplication used to run the viewer"""
        return QtWidgets.QApplication( sys.argv )
And here is a basic example using it:

"""About the simplest PyQt OpenGL example with decent interaction"""
import sys

from OpenGL.GL import *
from OpenGL.GLU import *

from simple_viewer import SimpleViewer

width = 800
height = 600
aspect = width/height

def resize( w, h ):
    global width, height, aspect
    width = w
    height = h
    aspect = w/h
    glViewport( 0, 0, width, height )

def initialize():
    glClearColor( 0.7, 0.7, 1.0, 0.0 )

def render():
    glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT )

    glMatrixMode( GL_PROJECTION )
    glLoadIdentity()
    gluPerspective( 45.0, aspect, 0.1, 10.0 )

    glMatrixMode( GL_MODELVIEW )
    glLoadIdentity()
    gluLookAt( 0.0, 2.0, 5.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0 )

    # draw the coordinate axes as colored lines
    glBegin( GL_LINES )
    glColor3f(  1.0, 0.0, 0.0 )
    glVertex3f( 1.0, 0.0, 0.0 )
    glVertex3f( 0.0, 0.0, 0.0 )
    glColor3f(  0.0, 1.0, 0.0 )
    glVertex3f( 0.0, 1.0, 0.0 )
    glVertex3f( 0.0, 0.0, 0.0 )
    glColor3f(  0.0, 0.0, 1.0 )
    glVertex3f( 0.0, 0.0, 1.0 )
    glVertex3f( 0.0, 0.0, 0.0 )
    glEnd()

def mouse_move( evt ):
    print('Mouse move {}: [{},{}]'.format(evt.button(),evt.x(),evt.y()) )

def mouse_press( evt ):
    print('Mouse press {}: [{},{}]'.format(evt.button(),evt.x(),evt.y()) )

def mouse_release( evt ):
    print('Mouse release {}: [{},{}]'.format(evt.button(),evt.x(),evt.y()) )

def key_press( evt ):
    print('Key press {}'.format(evt.key()) )

def key_release( evt ):
    print('Key release {}'.format(evt.key()) )

# create the QApplication
app = SimpleViewer.application()

# set up the display
viewer = SimpleViewer()
viewer.resize_cb.connect( resize )
viewer.initialize_cb.connect( initialize )
viewer.render_cb.connect( render )

# keyboard & mouse interactions
viewer.key_press_cb.connect( key_press )
viewer.key_release_cb.connect( key_release )
viewer.mouse_press_cb.connect( mouse_press )
viewer.mouse_release_cb.connect( mouse_release )
viewer.mouse_move_cb.connect( mouse_move )

# resize the window
viewer.resize( width, height )

# main loop
try:
    viewer.show()
    sys.exit( app.exec_() )
except SystemExit:
    pass

Nov 15, 2018

Reordering Matrix Products

Here are a few operations for dealing with objective functions of matrix-valued variables:

Starting Definitions

Let \(A\) be a \(N\times M\) matrix with entry \(A_{ij}\) being the entry at row \(i\) and column \(j\). The \(i\)'th row of \(A\) is then \(A_{i*}\) and the \(j\)'th column is \(A_{*j}\).

Row and Column Vectorization

Define the column vectorization operator which returns a single column vector containing the columns of \(A\) stacked

\begin{equation*} \newcommand{\cvec}{{\bf\mbox{cvec}}} \cvec(A) = \begin{bmatrix} A_{*1} \\ A_{*2} \\ \vdots \\ A_{*M} \end{bmatrix} \end{equation*}

likewise the row vectorization operator returns a single column vector containing the (transposes of) rows of \(A\) stacked:

\begin{equation*} \newcommand{\rvec}{{\bf\mbox{rvec}}} \rvec(A) = \cvec(A^T) = \begin{bmatrix} A_{1*}^T \\ A_{2*}^T \\ \vdots \\ A_{N*}^T \end{bmatrix} \end{equation*}

In numpy, matrices are stored in row-major order, so the Python definitions of these operations use rvec as the basic operation and express cvec in terms of it:

def cvec( A ):
    """Stack columns of A into single column vector"""
    return rvec(A.T)

def rvec( A ):
    """Stack rows of A into a single column vector"""
    return A.ravel()

Similarly, the inverses of \(\cvec(A)\) and \(\rvec(A)\) unpack the vectorized values back into a matrix of the original shape:

\begin{align*} \newcommand{\cmat}{{\bf\mbox{cmat}}} \newcommand{\rmat}{{\bf\mbox{rmat}}} \cmat\left( \cvec(A), M \right) = A \\ \rmat\left( \rvec(A), N \right) = A \\ \end{align*}

with corresponding python definitions:

def rmat( v, nr ):
    """Reshape vector v into matrix with nr rows"""
    return v.reshape((nr,-1))

def cmat( v, nc ):
    """Reshape vector v into matrix with nc columns"""
    return v.reshape((nc,-1)).T
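
A quick check that the reshape operators really do invert the vectorization operators (this assumes numpy is imported as np, as in the later snippets):

import numpy as np

A = np.random.standard_normal( (3,4) )   # N=3 rows, M=4 columns
print( np.allclose( rmat( rvec(A), 3 ), A ) )
print( np.allclose( cmat( cvec(A), 4 ), A ) )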

Finally, define two additional operators. The first of these is the spread operator, which takes the Kronecker product between an identity matrix and the input matrix, resulting in copies of the input matrix along the diagonal:

\begin{equation*} \newcommand{\spread}[2]{{\bf\mbox{spread}}_{#2}\left(#1\right)} \spread{A}{r} = I_{r\times r} \otimes A = \begin{bmatrix} A & 0 & \dots & 0 \\ 0 & A & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & A \end{bmatrix} \end{equation*}

The second of these is the split operator which reverses the order of the arguments to the Kronecker product, basically replacing each entry \(A_{ij}\) with \(A_{i,j} I_{r\times r}\):

\begin{equation*} \newcommand{\split}[2]{{\bf\mbox{split}}_{#2}\left(#1\right)} \split{A}{r} = A \otimes I_{r\times r} = \begin{bmatrix} A_{11} I_{r\times r} & A_{12} I_{r\times r} & \dots & A_{1M} I_{r\times r} \\ A_{21} I_{r\times r} & A_{22} I_{r\times r} & \dots & A_{2M} I_{r\times r} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} I_{r\times r} & A_{N2} I_{r\times r} & \dots & A_{NM} I_{r\times r} \end{bmatrix} \end{equation*}

Python implementation of these are one-liners that just call the existing numpy Kronecker product:

def spread( M, n ):
    """Make n copies of M along the matrix diagonal"""
    return np.kron( np.eye(n), M )

def split( M, n ):
    """Replace each entry of M with a n-by-n identity matrix scaled by the entry value"""
    return np.kron( M, np.eye(n) )

With these, it is possible to express matrix products in vectorized form. There are also some fun properties that come from the Kronecker product:

\begin{align*} \spread{M}{n}^T &=& \spread{M^T}{n} \\ \spread{M}{n}^{-1} &=& \spread{M^{-1}}{n} \\ \split{M}{n}^T &=& \split{M^T}{n} \\ \split{M}{n}^{-1} &=& \split{M^{-1}}{n} \end{align*}
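
These identities are easy to verify numerically for a small square, invertible matrix, e.g.:

M = np.random.standard_normal( (3,3) ) + 3.0*np.eye(3)   # make it comfortably invertible
n = 4
print( np.allclose( spread(M,n).T, spread(M.T,n) ) )
print( np.allclose( np.linalg.inv(spread(M,n)), spread(np.linalg.inv(M),n) ) )
print( np.allclose( split(M,n).T, split(M.T,n) ) )
print( np.allclose( np.linalg.inv(split(M,n)), split(np.linalg.inv(M),n) ) )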

Matrix Products

Let \(X\) be a \(N\times K\) matrix and \(Y\) be a \(K\times M\) matrix, then their product \(W = XY\) will be a \(N\times M\) matrix and the following are true:

\begin{align*} \rvec(W) &=& \split{X}{M} \rvec(Y) \\ \cvec(W) &=& \spread{X}{M} \cvec(Y) \\ \rvec(W) &=& \spread{Y^T}{N} \rvec(X) \\ \cvec(W) &=& \split{Y^T}{N} \cvec(X) \end{align*}
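
All four identities can be checked numerically with the helper functions defined above; the sizes here are arbitrary:

N, K, M = 3, 4, 5
X = np.random.standard_normal( (N,K) )
Y = np.random.standard_normal( (K,M) )
W = np.dot( X, Y )

print( np.allclose( rvec(W), np.dot( split(X,M),    rvec(Y) ) ) )
print( np.allclose( cvec(W), np.dot( spread(X,M),   cvec(Y) ) ) )
print( np.allclose( rvec(W), np.dot( spread(Y.T,N), rvec(X) ) ) )
print( np.allclose( cvec(W), np.dot( split(Y.T,N),  cvec(X) ) ) )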

What these let you do is unbury any (single) term that's in the middle of some horrendous expression. For example, say there's an expression \(W = AXB\) with \(A \in \mathcal{R}^{N\times R}\), \(X \in \mathcal{R}^{R\times S}\) and \(B \in \mathcal{R}^{S \times M}\) and you want to isolate \(X\) as a column vector. Then you have:

\begin{align*} \cvec(W) &=& \spread{A}{M} \split{B^T}{R} \cvec(X) \\ &=& \left( I_{M\times M} \otimes A \right) \left( B^T \otimes I_{R\times R} \right) \cvec(X) \\ &=& \left( I_{M\times M} B^T \right) \otimes \left( A I_{R\times R} \right) \cvec(X) \\ &=& \left( B^T \otimes A \right) \cvec(X) \end{align*}

The step in the third line is the Kronecker product identity:

\begin{equation*} \left( A \otimes B \right) \left( C \otimes D \right ) = \left( A C \right) \otimes \left( B D \right) \end{equation*}

The same steps can also be done if you want row-major storage of \(X\):

\begin{align*} \rvec(W) &=& \split{A}{M} \spread{B^T}{R} \rvec(X) \\ &=& \left( A \otimes I_{M\times M} \right) \left( I_{R\times R} \otimes B^T \right) \rvec(X) \\ &=& \left( A \otimes B^T \right) \rvec(X) \end{align*}

In these cases the dimensions work out correctly but you don't need to worry too much about them since the dimensions of \(X\) are constrained by the column count of \(A\) and the row count of \(B\). This only applies when operating with them symbolically of course.

Frobenius Norm Problems

With this information, it becomes pretty easy to deal with Frobenius norm problems like:

\begin{equation*} X = \mbox{argmin} \frac{1}{2} \| A X B - Y \|_F^2 \end{equation*}

I have found these frustrating in the past since I invariably end up in indexing hell. But it becomes much easier with the identities above. Applying them first of all isolates the \(X\) term on the right-hand-side of the expression and also converts the Frobenius norm to an \(L_2\) norm. Making the substitutions \(\tilde{x} = \cvec(X)\) and \(\tilde{y} = \cvec(Y)\),

\begin{align*} \tilde{x}_* &=& \mbox{argmin} \frac{1}{2} \| \left( B^T \otimes A \right) \tilde{x} - \tilde{y} \|_2^2 \\ &=& \mbox{argmin} \frac{1}{2} \tilde{x}^T \left( B \otimes A^T \right) \left( B^T \otimes A \right) \tilde{x} - \tilde{x}^T \left( B \otimes A^T \right) \tilde{y} + \mbox{constants} \\ &=& \mbox{argmin} \frac{1}{2} \tilde{x}^T \left( B B^T \otimes A^T A \right) \tilde{x} - \tilde{x}^T \left( B \otimes A^T \right) \tilde{y} \end{align*}

This now looks like a standard least-squares problem, albeit one with this Kronecker product mashed into it. Setting the gradient to zero gives the solution as:

\begin{align*} \tilde{x}_* &=& \left( B B^T \otimes A^T A \right)^{-1} \left( B \otimes A^T \right ) \tilde{y} \\ &=& \left( (B B^T)^{-1} \otimes (A^T A)^{-1} \right) \left( B \otimes A^T \right) \tilde{y} \\ &=& \left( (B B^T)^{-1} B \otimes (A^T A)^{-1} A^T \right) \tilde{y} \end{align*}

This result has a similar form to \(AXB = \left( B^T \otimes A\right)\cvec(X)\), which can be used to get back to the matrix form of the expression:

\begin{align*} X &=& (A^T A)^{-1} A^T Y \left( (B B^T)^{-1} B\right)^T \\ &=& (A^T A)^{-1} A^T Y B^T (B B^T)^{-1} \end{align*}
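
The two forms of the solution can be compared numerically; the sketch below solves the Kronecker-product system directly and checks it against the matrix expression, using the cvec/cmat helpers from above (sizes are arbitrary, and \(A^TA\), \(BB^T\) must be invertible):

N, R, S, M = 4, 3, 2, 5
A = np.random.standard_normal( (N,R) )
B = np.random.standard_normal( (S,M) )
Y = np.random.standard_normal( (N,M) )

# Kronecker form: (B B^T kron A^T A) x = (B kron A^T) cvec(Y)
x = np.linalg.solve( np.kron( np.dot(B,B.T), np.dot(A.T,A) ), np.dot( np.kron(B,A.T), cvec(Y) ) )

# matrix form: X = (A^T A)^{-1} A^T Y B^T (B B^T)^{-1}
X = np.dot( np.linalg.solve( np.dot(A.T,A), np.dot(np.dot(A.T,Y),B.T) ), np.linalg.inv( np.dot(B,B.T) ) )

print( np.allclose( cmat(x,S), X ) )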

It's also possible to come at this from a different direction

\begin{align*} X &=& \mbox{argmin} \frac{1}{2} \| A X B - Y \|_F^2 \\ &=& \mbox{argmin} \frac{1}{2} \mbox{Tr}\left( (AXB - Y)(AXB - Y)^T \right) \end{align*}

The gradient of this is given in the Matrix Cookbook.

\begin{equation*} \frac{\partial}{\partial X} \frac{1}{2} \mbox{Tr}\left( (AXB - Y)(AXB - Y)^T \right) = A^T \left( A X B - Y \right) B^T \end{equation*}

Setting the gradient of this to zero and isolating \(X\) gives the following:

\begin{align*} A^T A X B B^T - A^T Y B^T = 0 \\ X = (A^T A)^{-1} A^T Y B^T (B B^T)^{-1} \end{align*}

which matches the previous result.

Nov 10, 2018

Gradients are Row-Vectors

\begin{equation*} \newcommand{pd}[2]{\frac{\partial #1}{\partial #2}} \end{equation*}

Let \(x = \left[ x_1, x_2, x_3 \right]^T\) and \(y_1 = f(x)\) be a scalar function. Then \(\nabla y_1\) is:

\begin{equation*} \nabla y_1 = \left[ \pd{f}{x_1}, \pd{f}{x_2}, \pd{f}{x_3} \right] \end{equation*}

i.e., a row-vector. This makes it compatible with the corresponding definition of the Jacobian, i.e. when

\begin{equation*} y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} f_1(x) \\ f_2(x) \end{bmatrix} \end{equation*}

then the Jacobian of \(y\) is:

\begin{equation*} J_y = \begin{bmatrix} \pd{f_1}{x_1} & \pd{f_1}{x_2} & \pd{f_1}{x_3} \\ \pd{f_2}{x_1} & \pd{f_2}{x_2} & \pd{f_2}{x_3} \end{bmatrix} = \begin{bmatrix} \nabla f_1 \\ \nabla f_2 \end{bmatrix} \end{equation*}

This also allows linearization of the scalar or vector function to be consistent. Here \(\delta x\) is a small perturbation of \(x\):

\begin{align*} y_1(x+\delta x) \approx y_1(x) + \nabla y_1 \delta x \\ y (x+\delta x) \approx y(x) + J_y \delta x \end{align*}

That's great, but where it is useful is in differentiating more complex expressions, for example the objective \(E = \frac{1}{2} \| y \|^2\) that appears in optimization or regression. \(E\) is a scalar function of a vector function of a vector and it's a pain to differentiate this symbolically with respect to the parameters \(x\).

What helps here is this:

\begin{equation*} \pd{E}{x} = \pd{E}{y} \pd{y}{x} = y^T J_y \end{equation*}

The transpose on the \(y\) factor is included because it fits the definition that the gradient of a scalar function taking a vector argument is a row-vector. The above expression expands to:

\begin{equation*} \pd{E}{x} = \left[ y_1, y_2 \right] \begin{bmatrix} \pd{f_1}{x_1} & \pd{f_1}{x_2} & \pd{f_1}{x_3} \\ \pd{f_2}{x_1} & \pd{f_2}{x_2} & \pd{f_2}{x_3} \end{bmatrix} = \begin{bmatrix} y_1 \nabla y_1 + y_2 \nabla y_2 \end{bmatrix} \end{equation*}

which shows that the terms involving \(y_1\) and \(y_2\) are correctly decoupled and only combine through the summation implied by the 2-norm.

What about when \(y = A x - b\)? Well, it's pretty easy to substitute in:

\begin{align*} \pd{E}{y} = y^T \\ \pd{y}{x} = A \\ \pd{E}{x} = (A x - b)^T A \end{align*}

Wait! What's up with that? Isn't it supposed to be \(A^T(A x -b)\)? Well, yes, but that's when gradients are column vectors. Which they're not here. And if you transpose the row vector result you get the expected column vector result.

\begin{equation*} \pd{E}{x}^T = \left( (A x - b)^T A \right)^T = A^T (A x -b) \end{equation*}
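
A short numerical check makes this concrete: the row-vector gradient \((Ax-b)^T A\) matches a central finite difference of \(E\). This is just a sketch with arbitrary sizes:

import numpy as np

A = np.random.standard_normal( (5,3) )
b = np.random.standard_normal( 5 )
x = np.random.standard_normal( 3 )

E = lambda x: 0.5*np.sum( (np.dot(A,x)-b)**2 )
grad = np.dot( np.dot(A,x)-b, A )   # row-vector gradient (A x - b)^T A

eps = 1e-6
fd = np.array([ (E(x+eps*e)-E(x-eps*e))/(2*eps) for e in np.eye(3) ])
print( np.allclose( grad, fd, atol=1e-5 ) )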

Playing devil's advocate

So what happens if you decide to make gradients column vectors? Well, for starters, you have to redefine the del operator from \(\nabla = \left[ \pd{}{x_1}, \pd{}{x_2}, \dots, \pd{}{x_N} \right]\) to:

\begin{equation*} \nabla = \begin{bmatrix} \pd{}{x_1} \\ \pd{}{x_2} \\ \vdots \\ \pd{}{x_N} \end{bmatrix} \end{equation*}

Redefining fundamental mathematical things should be the first clue this is wrong but pressing on using the same definitions, you get:

\begin{equation*} \nabla y_1 = \begin{bmatrix} \pd{y_1}{x_1} \\ \pd{y_1}{x_2} \\ \pd{y_1}{x_3} \end{bmatrix} \end{equation*}

Stacking these gradients column-wise gives something resembling the Jacobian but which is actually its transpose:

\begin{equation*} \pd{y}{x} = J_y^T = \begin{bmatrix} \pd{y_1}{x_1} & \pd{y_2}{x_1} \\ \pd{y_1}{x_2} & \pd{y_2}{x_2} \\ \pd{y_1}{x_3} & \pd{y_2}{x_3} \end{bmatrix} \end{equation*}

The chain rule then gives:

\begin{equation*} \pd{E}{x} = \pd{E}{y} \pd{y}{x} \end{equation*}

which leads to total garbage since the array dimensions are not compatible for matrix multiplication anymore.

\begin{equation*} \pd{E}{x} \neq y J^T = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \begin{bmatrix} \pd{y_1}{x_1} & \pd{y_2}{x_1} \\ \pd{y_1}{x_2} & \pd{y_2}{x_2} \\ \pd{y_1}{x_3} & \pd{y_2}{x_3} \end{bmatrix} \hspace{0.25cm}{\bf\mbox{Incompatible!}} \end{equation*}

But this can be fixed by changing the order of the chain rule from \(\pd{E}{x} = \pd{E}{y} \pd{y}{x}\) to \(\pd{E}{x} = \pd{y}{x} \pd{E}{y}\) to get:

\begin{equation*} \pd{E}{x} = J^T y \end{equation*}

which for the example of \(y = A x - b\) gives the expected column oriented gradient of \(\nabla E = A^T (A x - b)\). The difference here is that matrix composition left multiplies new transformations onto an existing one while the chain rule is typically written with right multiplication. This is really the only difference between the column-oriented and row-oriented gradients, other than the discrepancy between the definitions of the gradient and Jacobian matrix. When the Jacobian and gradient are defined correctly (i.e. row-oriented) this does not occur.

What this ends up meaning is that in order for the results to work out correctly with column-oriented gradients you have to carry the Jacobian transpose throughout your work and remember to re-order the chain rule, otherwise you need to forensically reconstruct what the dimensions should be from the scalar terms of your objective function (in optimization anway). I feel this is too high a price to pay.

On the other hand, defining points in one domain as column-vectors and gradients as row-vectors means that expressions involving both need some added boilerplate, e.g.:

\begin{equation*} v = x - \delta \nabla y \hspace{0.25cm}{\bf\mbox{Incompatible!}} \end{equation*}

with \(\delta\) a scalar no longer have compatible dimensions. Instead the equation above would need to be expressed as:

\begin{equation*} v = x - \delta (\nabla y)^T \end{equation*}

This is awkward but tends not to be a big problem since many programming languages have good array broadcasting that allows the transpose to be omitted when programming.

Some Random Example


Consider the matrices

\begin{equation*} A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \end{equation*}


\begin{equation*} B = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \end{bmatrix} \end{equation*}

and let \(A\) & \(B\) be functions of \(\theta \in \mathcal{R}^5\). Now try to minimize the Frobenius norm of \(Y = AB\):

\begin{equation*} F(\theta) = \frac{1}{2} \| Y \|_F^2 = \frac{1}{2} \mbox{Tr}\left(Y Y^T\right) \end{equation*}

\(F(\theta)\) is a scalar and \(\theta\) is a vector so the gradient should be a row-vector of size \(1\times5\) in the end. Starting with the chain rule, an issue is evident:

\begin{equation*} \nabla F(\theta) = Y \pd{Y}{\theta} \end{equation*}

\(Y\) is the wrong shape, \(3\times4\), and \(\pd{Y}{\theta}\) is a \(3 \times 4 \times 5\) array where the final index \(k\) holds \(\pd{Y}{\theta_k}\). In order to make this work, we can concatenate the rows \(y_{i*}\) of \(Y\) to get a \(1\times 12\) matrix \(\tilde{Y}\), i.e.

\begin{equation*} \tilde{Y} = \begin{bmatrix} y_{1*} & y_{2*} & y_{3*} \end{bmatrix} = \mbox{vec}(Y^T)^T \end{equation*}

where \(\mbox{vec}(P)\) stacks the columns of \(P\) into a column vector. We can likewise stack the columns \(y_{*j}\) of \(Y\) to get a \(12\times1\) column vector \(\hat{Y}=\mbox{vec}(Y)\). The derivative of this with respect to \(\theta\) is just the Jacobian of each column of \(Y\), stacked row-wise:

\begin{equation*} \pd{\hat{Y}}{\theta} = \begin{bmatrix} \pd{y_{*1}}{\theta} \\ \pd{y_{*2}}{\theta} \\ \pd{y_{*3}}{\theta} \\ \pd{y_{*4}}{\theta} \end{bmatrix} \end{equation*}

By doing this, the sum over all matrix entries implied by the Frobenius norm is absorbed as an inner product of \(\tilde{Y} \in \mathcal{R}^{1\times12}\) and \(\pd{\hat{Y}}{\theta} \in \mathcal{R}^{12\times5}\). This gives the correct dimensions for the gradient of \(1\times 5\). Note that everything works pretty cleanly by adopting the row-vector form for gradients.

We still need to compute the \(\pd{Y_{*i}}{\theta}\) terms. Starting from the definition of \(Y = AB\) it's pretty easy to see that \(y_{*i} = A b_{*i}\). We want to get \(\pd{y_{*i}}{\theta} \in \mathcal{R}^{3\times5}\) and so go to the chain rule:

\begin{equation*} \pd{y_{*i}}{\theta} = A \pd{b_{*i}}{\theta} + \pd{A}{\theta} b_{*i} \end{equation*}

The first term is no problem, it is simply a transform (\(A\)) applied to the Jacobian of \(b_{*i}\). The shapes work out correctly. The second term is more problematic, \(\pd{A}{\theta}\) is a \(3\times 2 \times 5\) array while \(b_{*i}\) is \(2 \times 1\). One way to address this is to stack rows of \(A\) into a column vector using the vectorization operator and splay \(b_{*i}\) using the Kronecker Product:

\begin{equation*} A b_{*i} = \begin{bmatrix} b_{*i}^T & {\bf 0} & {\bf 0} \\ {\bf 0} & b_{*i}^T & {\bf 0} \\ {\bf 0} & {\bf 0} & b_{*i}^T \end{bmatrix} \begin{bmatrix} a_{1*}^T \\ a_{2*}^T \\ a_{3*}^T \end{bmatrix} = \left( {\bf I_{3\times3}} \otimes b_{*i}^T \right) \mbox{vec}(A^T) \end{equation*}

Nov 06, 2018

Inverse Kinematics

I've been looking into inverse kinematics a bit lately. Inverse kinematics is the problem of solving for joint parameters given a linkage and end-effector position. There are a number of algorithms for doing this but I'm most interested in cases where there may be cycles and (slightly) incompatible joint configurations.

Two link IK chain

Given two links A and B with initial reference frames \(F_A\) and \(F_B\) connected by a joint J with initial reference frame \(F_J\), the transformations of each link reference frame to the joint reference frame are:

\begin{align*} T_{AJ} = F_J^{-1} F_A \\ T_{BJ} = F_J^{-1} F_B \end{align*}

These transforms are fixed for all time using the reference frames for the links and joint during setup. Using these transforms allows the links to be brought into a common (but unknown) joint frame:

\begin{align*} \hat{T}_A = T_{AJ} T_A \\ \hat{T}_B = T_{BJ} T_B \end{align*}

Once in the joint frame, the joint constraint functions can be defined. In the case of a ball and socket joint this is:

\begin{equation*} \hat{T}_A = R \hat{T}_B \end{equation*}

This allows the joint transformation to be found:

\begin{equation*} R = \hat{T}_A \hat{T}_B^{-1} \end{equation*}

Once the transformation is found, its parameters can be extracted and/or constrained. In the case of the ball and socket joint, this requires setting the translational components to zero, producing a constrained configuration \(R_*\). With \(R_*\) defined, it's then possible to express the constraint in the global frame for points \(p_A\) & \(p_B\) defined in the coordinates of A and B:

\begin{equation*} T_{AJ} T_A p_A = R_* T_{BJ} T_B p_B \end{equation*}

This equation forms the basis of constraints for the joint. The constraints can be enforced by ensuring this equation holds for three points that are not collinear. However, the points are expressed in the frames of A and B, which is inconvenient. Instead each point can be expressed in the frame of \(F_J\) initially and transformed by \(T_{AJ}^{-1}\) and \(T_{BJ}^{-1}\) respectively:

\begin{align*} p_A = T_{AJ}^{-1} p_J \\ p_B = T_{BJ}^{-1} p_J \\ T_{AJ} T_A p_A = R_* T_{BJ} T_B p_B \end{align*}

This allows basis vectors in the frame of the joint to be used to specify the correspondences for the constraints.

To actually solve this in practice, the constraint equation can be linearized. If \(\alpha\) and \(\delta \alpha\) are the transformation parameters and incremental change of A, then \(T_A\) can be approximated with a first order Taylor series:

\begin{equation*} T_{\alpha+\delta\alpha} p_A \approx T_\alpha p_A + J_{\alpha, p_A} \delta \alpha \end{equation*}

where \(J_{\alpha,p_A}\) is the Jacobian of the transform of \(p_A\) with respect to link parameters \(\alpha\). This lets the constraint be written as:

\begin{equation*} T_{AJ} \left( T_\alpha p_A + J_{\alpha,p_A} \delta \alpha \right) = R_* T_{BJ} \left( T_\beta p_B + J_{\beta,p_B} \delta \beta \right) \end{equation*}

which can be simplified to:

\begin{equation*} T_{AJ} J_{\alpha,p_A} \delta \alpha - R_* T_{BJ} J_{\beta,p_B} \delta \beta = R_* T_{BJ} T_\beta p_B - T_{AJ} T_\alpha p_A \end{equation*}

Nov 02, 2018

Rotation Matrix Euler Angles

This page describes a basic approach to extracting Euler angles from rotation matrices. This follows what OpenCV does, (which is where I got the process) but I also wanted to understand why this approach was used.

Here I'm considering rotation performed first around x-axis (\(\alpha\)), then y-axis (\(\beta\)) and finally z-axis (\(\gamma\)). For this, the rotation matrix is shown below. Other rotation orders are analogous.

\begin{equation*} R(\alpha,\beta,\gamma) = \left[\begin{matrix}\cos{\left (\beta \right )} \cos{\left (\gamma \right )} & \sin{\left (\alpha \right )} \sin{\left (\beta \right )} \cos{\left (\gamma \right )} - \sin{\left (\gamma \right )} \cos{\left (\alpha \right )} & \sin{\left (\alpha \right )} \sin{\left (\gamma \right )} + \sin{\left (\beta \right )} \cos{\left (\alpha \right )} \cos{\left (\gamma \right )} & 0\\\sin{\left (\gamma \right )} \cos{\left (\beta \right )} & \sin{\left (\alpha\right )} \sin{\left (\beta \right )} \sin{\left (\gamma \right )} + \cos{\left (\alpha \right )} \cos{\left (\gamma \right )} & - \sin{\left (\alpha \right )} \cos{\left (\gamma \right )} + \sin{\left (\beta \right )} \sin{\left (\gamma \right )} \cos{\left (\alpha \right )} & 0\\- \sin{\left (\beta \right )} & \sin{\left (\alpha \right )} \cos{\left (\beta \right )} & \cos{\left (\alpha \right )} \cos{\left (\beta \right )} & 0\\0 & 0 & 0 & 1\end{matrix}\right] \end{equation*}

To solve for the Euler angles, we can use a little bit of trigonometry. The best way, in my opinion, is to find pairs of entries that are each products of two factors. The identity \(\sin^2 + \cos^2 = 1\) can then be used to isolate specific angles, e.g. \(R_{0,0}\) and \(R_{1,0}\) can be used to find \(\cos(\beta)\), which unlocks the rest of the angles:

\begin{align*} (-1)^k \cos(\beta + k \pi) = \sqrt{ \cos^2(\beta) ( \sin^2(\gamma) + \cos^2(\gamma)) } = \sqrt{ R_{0,0}^2 + R_{1,0}^2 } \\ \cos(\beta) \approx \sqrt{ R_{0,0}^2 + R_{1,0}^2 } \end{align*}

Pay attention to the first line: for any true value of \(\beta\), this formula will return the same value for \(\beta + k\pi\) where \(k\) is an arbitrary integer. The choice of \(k\) in turn determines the angles \(\alpha\) and \(\gamma\) needed for the overall rotation to remain compatible. The approximate equality returns \(\beta \in [-\pi/2, \pi/2]\).

Once \(\cos(\beta)\) is found we can find the angles directly using atan2:

\begin{align*} \beta = \mbox{atan2}( -R_{2,0}, \cos(\beta) ) \\ \alpha = \mbox{atan2}\left( \frac{R_{2,1}}{\cos(\beta)}, \frac{R_{2,2}}{\cos(\beta)} \right) \\ \gamma = \mbox{atan2}\left( \frac{R_{1,0}}{\cos(\beta)}, \frac{R_{0,0}}{\cos(\beta)} \right) \end{align*}

The other issue occurs when \(|\cos(\beta)| = 0\), which causes division by zero in the equations for \(\alpha,\gamma\) (\(\beta\) is still well defined). In this case there are a number of options. What OpenCV does is to arbitrarily decide that \(\gamma = 0\), which means that \(\sin(\gamma) = 0\) and \(\cos(\gamma) = 1\). The formulas in this case are:

\begin{align*} \beta = \mbox{atan2}( -R_{2,0}, \cos(\beta) ) \\ \alpha = \mbox{atan2}( -R_{1,2}, R_{1,1} ) \\ \gamma = 0 \end{align*}

There are, of course, other options. Python code implementing these operations is below, along with unit testing for the degenerate and non-degenerate cases.

import unittest
import numpy

def x_rotation( theta ):
    """Generate a 4x4 homogeneous rotation matrix about x-axis"""
    c = numpy.cos(theta)
    s = numpy.sin(theta)
    return numpy.array([
        [ 1, 0, 0, 0],
        [ 0, c,-s, 0],
        [ 0, s, c, 0],
        [ 0, 0, 0, 1]
    ])

def y_rotation( theta ):
    """Generate a 4x4 homogeneous rotation matrix about y-axis"""
    c = numpy.cos(theta)
    s = numpy.sin(theta)
    return numpy.array([
        [ c, 0, s, 0],
        [ 0, 1, 0, 0],
        [-s, 0, c, 0],
        [ 0, 0, 0, 1]
    ])

def z_rotation( theta ):
    """Generate a 4x4 homogeneous rotation matrix about z-axis"""
    c = numpy.cos(theta)
    s = numpy.sin(theta)
    return numpy.array([
        [ c,-s, 0, 0],
        [ s, c, 0, 0],
        [ 0, 0, 1, 0],
        [ 0, 0, 0, 1]
    ])

def xyz_rotation( angles ):
    """Generate 4x4 homogeneous rotation in x then y then z order"""
    return numpy.dot( z_rotation(angles[2]), numpy.dot( y_rotation(angles[1]), x_rotation(angles[0]) ) )

def xyz_rotation_angles( R, eps=1e-8, check=False ):
    """Back out the Euler angles that would lead to the given matrix"""

    if check and numpy.linalg.norm( numpy.dot( R.transpose(), R ) - numpy.eye(4) ) > eps:
        raise ValueError('Input matrix is not a pure rotation, R\'R != I')

    cos_beta = numpy.sqrt( R[0,0]**2 + R[1,0]**2 )
    if numpy.abs(cos_beta) > eps:
        alpha = numpy.arctan2( R[2,1]/cos_beta, R[2,2]/cos_beta )
        beta  = numpy.arctan2(-R[2,0], cos_beta )
        gamma = numpy.arctan2( R[1,0]/cos_beta, R[0,0]/cos_beta )
        return numpy.array((alpha,beta,gamma))
    else:
        alpha = numpy.arctan2(-R[1,2], R[1,1] )
        beta  = numpy.arctan2(-R[2,0], cos_beta )
        gamma = 0
        return numpy.array((alpha,beta,gamma))

class test_euler_angles(unittest.TestCase):
    def test_angles(self):
        """Do fuzz testing on arbitrary rotations, limiting beta to valid range"""
        for i in range(1000):
            alpha = (numpy.random.rand()-0.5)*numpy.pi*2
            beta  = (numpy.random.rand()-0.5)*numpy.pi*0.9999
            gamma = (numpy.random.rand()-0.5)*numpy.pi*2

            ang = xyz_rotation_angles( xyz_rotation([alpha,beta,gamma]))
            self.assertAlmostEqual( alpha, ang[0] )
            self.assertAlmostEqual( beta,  ang[1] )
            self.assertAlmostEqual( gamma, ang[2] )

    def test_degeneracies(self):
        """Do fuzz testing on the degenerate condition of beta = +- pi/2"""
        for i in range(1000):
            alpha = (numpy.random.rand()-0.5)*numpy.pi*2
            beta  = numpy.sign(numpy.random.randn())*numpy.pi/2
            gamma = (numpy.random.rand()-0.5)*numpy.pi*2
            R     = xyz_rotation((alpha,beta,gamma))
            ang   = xyz_rotation_angles( R )
            R2    = xyz_rotation( ang )
            # check that beta is recovered, gamma is set to zero and
            # the rotation matrices match to high precision
            self.assertAlmostEqual( beta, ang[1] )
            self.assertAlmostEqual( ang[2], 0 )
            for j in range(16):
                self.assertAlmostEqual( R.ravel()[j], R2.ravel()[j] )

if __name__ == '__main__':
    unittest.main()

Nov 02, 2018

Transformation Matrix Jacobians

Starting with the \(4\times4\) homogeneous transform matrix with parameters \(\beta = [ \theta_x, \theta_y, \theta_z, t_x, t_y, t_z ]^T\) where rotations are performed in XYZ order and using the following substitutions:

\begin{align*} c_x = \cos(\theta_x) \\ s_x = \sin(\theta_x) \\ c_y = \cos(\theta_y) \\ s_y = \sin(\theta_y) \\ c_z = \cos(\theta_z) \\ s_z = \sin(\theta_z) \\ \end{align*}

the vector function mapping a point \(p = [p_x, p_y, p_z, 1]^T\) in the body coordinate system to a point in the world coordinate system \(w = [w_x, w_y, w_z, 1]^T\) is:

\begin{equation*} \begin{bmatrix} w_x \\ w_y \\ w_z \\ 1 \end{bmatrix} = F( p, \beta ) = \left[\begin{matrix}c_{y} c_{z} & - c_{x} s_{z} + c_{z} s_{x} s_{y} & c_{x} c_{z} s_{y} + s_{x} s_{z} & t_{x}\\c_{y} s_{z} & c_{x} c_{z} + s_{x} s_{y} s_{z} & c_{x} s_{y} s_{z} - c_{z} s_{x} & t_{y}\\- s_{y} & c_{y} s_{x} & c_{x} c_{y} & t_{z}\\0 & 0 & 0 & 1\end{matrix}\right]\begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} \end{equation*}

and the Jacobian with respect to the parameters \(\beta\) is:

\begin{equation*} J_F = \left[\begin{matrix}p_{y} \left(c_{x} c_{z} s_{y} + s_{x} s_{z}\right) + p_{z} \left(c_{x} s_{z} - c_{z} s_{x} s_{y}\right) & c_{x} c_{y} c_{z} p_{z} + c_{y} c_{z} p_{y} s_{x} - c_{z}p_{x} s_{y} & - c_{y} p_{x} s_{z} + p_{y} \left(- c_{x} c_{z} - s_{x} s_{y} s_{z}\right) + p_{z} \left(- c_{x} s_{y} s_{z} + c_{z} s_{x}\right) & 1 & 0 & 0\\p_{y} \left(c_{x} s_{y} s_{z} - c_{z} s_{x}\right) + p_{z} \left(- c_{x} c_{z} - s_{x} s_{y} s_{z}\right) & c_{x} c_{y} p_{z} s_{z} + c_{y} p_{y} s_{x} s_{z} - p_{x} s_{y} s_{z} & c_{y} c_{z} p_{x} + p_{y} \left(- c_{x} s_{z} + c_{z} s_{x} s_{y}\right) + p_{z} \left(c_{x} c_{z} s_{y} + s_{x} s_{z}\right) & 0 & 1 & 0\\c_{x} c_{y} p_{y} - c_{y} p_{z} s_{x} & - c_{x} p_{z} s_{y} - c_{y} p_{x} -p_{y} s_{x} s_{y} & 0 & 0 & 0 & 1\\0 & 0 & 0 & 0 & 0 & 0\end{matrix}\right] \end{equation*}

Python code for these respective operations is below:

import numpy

def make_transform( beta ):
    c_x = numpy.cos(beta[0])
    s_x = numpy.sin(beta[0])
    c_y = numpy.cos(beta[1])
    s_y = numpy.sin(beta[1])
    c_z = numpy.cos(beta[2])
    s_z = numpy.sin(beta[2])
    t_x = beta[3]
    t_y = beta[4]
    t_z = beta[5]
    return numpy.array([
        [c_y*c_z, -c_x*s_z + c_z*s_x*s_y, c_x*c_z*s_y + s_x*s_z, t_x],
        [c_y*s_z, c_x*c_z + s_x*s_y*s_z, c_x*s_y*s_z - c_z*s_x, t_y],
        [-s_y, c_y*s_x, c_x*c_y, t_z],
        [0, 0, 0, 1]
    ])

def make_transform_jacobian( beta, p ):
    c_x = numpy.cos(beta[0])
    s_x = numpy.sin(beta[0])
    c_y = numpy.cos(beta[1])
    s_y = numpy.sin(beta[1])
    c_z = numpy.cos(beta[2])
    s_z = numpy.sin(beta[2])
    t_x = beta[3]
    t_y = beta[4]
    t_z = beta[5]
    p_x = p[0]
    p_y = p[1]
    p_z = p[2]
    return numpy.array([
        [p_y*(c_x*c_z*s_y + s_x*s_z) + p_z*(c_x*s_z - c_z*s_x*s_y), c_x*c_y*c_z*p_z + c_y*c_z*p_y*s_x - c_z*p_x*s_y, -c_y*p_x*s_z + p_y*(-c_x*c_z - s_x*s_y*s_z) + p_z*(-c_x*s_y*s_z + c_z*s_x), 1, 0, 0],
        [p_y*(c_x*s_y*s_z - c_z*s_x) + p_z*(-c_x*c_z - s_x*s_y*s_z), c_x*c_y*p_z*s_z + c_y*p_y*s_x*s_z - p_x*s_y*s_z, c_y*c_z*p_x + p_y*(-c_x*s_z + c_z*s_x*s_y) + p_z*(c_x*c_z*s_y + s_x*s_z), 0, 1, 0],
        [c_x*c_y*p_y - c_y*p_z*s_x, -c_x*p_z*s_y - c_y*p_x - p_y*s_x*s_y, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0]
    ])

I generated these using sympy to build the transformations and used a finite-difference Jacobian function to check the output.
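
That finite-difference check isn't shown above; a minimal version of such a check might look like the following (the helper name, test point and tolerance are my own, not from the original scripts):

def finite_difference_jacobian( beta, p, h=1e-6 ):
    """Central-difference approximation of the Jacobian of make_transform(beta) applied to p"""
    J = numpy.zeros( (4,6) )
    for k in range(6):
        d = numpy.zeros( 6 ); d[k] = h
        J[:,k] = ( numpy.dot( make_transform(beta+d), p ) - numpy.dot( make_transform(beta-d), p ) )/(2.0*h)
    return J

beta = numpy.random.standard_normal( 6 )
p    = numpy.array( [0.5, -0.25, 1.0, 1.0] )
print( numpy.allclose( make_transform_jacobian( beta, p ), finite_difference_jacobian( beta, p ), atol=1e-5 ) )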

Oct 25, 2018

Create Docker Image

This process follows this link. First update apt and install docker.io:

$ sudo apt-get update
$ sudo apt install docker.io

Next start the service and enable it at boot time:

$ sudo systemctl start docker
$ sudo systemctl enable docker

Finally add your current user to the docker group:

$ sudo usermod -a -G docker $USER

and log out of your account and back in.

Create Dockerfile

Create a blank Dockerfile

$ touch Dockerfile

and then edit it to look like the following:

# start with ubuntu
FROM ubuntu:16.04

# update the software repository
RUN apt-get update
RUN apt-get install -y python3-dev

# add user appuser and switch to it
RUN useradd -ms /bin/bash appuser
USER appuser
WORKDIR /home/appuser

# copy over a test message and python
# script to print it to the screen
COPY data/ ./data/
COPY print_message.py .

# run the script at container startup
CMD ["python3", "print_message.py"]

Next make a directory to share with the image and put a simple message file into it.

$ mkdir data
$ echo "Hello from docker" > data/message.txt

Then create a python script to print the message:

if __name__ == '__main__':

    with open('data/message.txt') as f:
        data = f.read()
        print( data )

Build the image and run it:

$ docker build -t test_image .
$ docker run -it test_image
Hello from docker

And that's all there is to it.
