James Gregson's Website

Jun 14, 2021

Least Squares & MLE

Given a linear model \(y_i = x_i^T \beta\), we collect \(N\) observations \(\tilde{y_i} = y_i + \epsilon_i\) where \(\epsilon_i \sim \mathcal{N}(0,\sigma^2)\). The likelihood of any one of our observations \(\tilde{y_i}\) given current estimates of our parameters \(\beta\) is then:

\begin{equation*} \newcommand{\exp}[1]{{\mbox{exp}\left( #1 \right)}} \newcommand{\log}[1]{{\mbox{log}\left( #1 \right)}} \DeclareMathOperator*{\argmax}{argmax} \DeclareMathOperator*{\argmin}{argmin} P(\tilde{y}_i | \beta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp{-\frac{1}{2}\left(\frac{e_i}{\sigma}\right)^2} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp{-\frac{1}{2}\left(\frac{\tilde{y_i}-x_i^T \beta}{\sigma}\right)^2} \end{equation*}

and the likelihood of all of them occuring is:

\begin{equation*} P(\tilde{y} | \beta) = \prod_{i=1}^N P(\tilde{y}_i | \beta ) \end{equation*}

The parameters most likely to explain the observations, given only the observations and definition of the model, are those that maximize this likelihood. Equivalently, we can find the parameters that minimize the negative log-likelihood since the two have the same optima:

\begin{align*} \beta^* &=& \argmax_\beta P(\tilde{y} | \beta) \\ &=& \argmin_\beta - \ln( P(\tilde{y} | \beta ) \\ &=& \argmin_\beta - N\log{\frac{1}{\sqrt{2\pi\sigma^2}}} - \log{ \prod_{i=1}^N P(\tilde{y}_i | \beta ) } \end{align*}

The benefit of this is that the terms decouple in the case of normally distributed residuals. Ditching constant factors and collecting the product terms into the exponent the above becomes:

\begin{align*} \beta^* &=& \argmin_\beta -\ln\left( \exp{-\frac{1}{2}\sum_{i=1}^N \left(\frac{\tilde{y}_i - M_i^T \beta}{\sigma}\right)^2} \right) \\ &=& \argmin_\beta \frac{1}{2\sigma^2} \sum_{i=1}^N \left( \tilde{y}_i - M_i^T \beta \right)^2 \end{align*}

which gives the least-squares version.

Least Squares & MAP

We can also add a prior for the parameters \(\beta\), e.g. \(\beta_i \sim \mathcal{N}(0,\sigma_\beta)\). In this case, each component of \(\beta\) is independent. The MAP estimate for \(\beta\) is then:

\begin{align*} \beta^* &=& \argmax_\beta P(\tilde{y} | \beta) P(\beta) \\ &=& \argmax_\beta \prod_i P(\tilde{y}_i | \beta) \prod_j P(\beta_j) \\ &=& \argmax_\beta \prod_i \frac{1}{\sqrt{2\pi\sigma^2}}\exp{ -\frac{1}{2}\left(\frac{e_i}{\sigma}\right)^2} \prod_j \frac{1}{\sqrt{2\pi\sigma_\beta^2}}\exp{ -\frac{1}{2}\left(\frac{\beta_j}{\sigma_\beta}\right)^2 } \end{align*}

The same negative log trick can then be used to convert products to sums, also ditching constants, to get:

\begin{align*} \beta^* &=& \argmin \frac{1}{2\sigma^2} \sum_i e_i^2 + \frac{1}{2\sigma_\beta^2} \sum_j \beta_j^2 \\ &=& \argmin \frac{1}{2\sigma^2} \sum_i \left( \tilde{y}_i - x_i^T \beta \right)^2 + \frac{1}{2\sigma_\beta^2} \sum_j \beta_j^2 \end{align*}

Using different priors will of course give different forms for the second term, e.g. choosing components of \(\beta\) to follow a Laplace distribution \(\beta_j \sim \frac{\lambda}{2} \exp{-\lambda |\beta_j|}\), gives:

\begin{equation*} \beta^* = \argmin \frac{1}{2\sigma^2} \sum_i \left( \tilde{y}_i - x_i^T \beta \right)^2 + \lambda \sum_j |\beta_j| \end{equation*}
posted at 00:00  ·   ·  Math  MLE  MAP

Dec 13, 2020

Essential Matrix

The essential matrix relates normalized coordinates in one view with those in another view of the same scene and is widely used in stereo reconstruction algorithms. It is a 3x3 matrix with rank 2, having only 8 degrees of freedom.

Given \(p\) as a normalized point in image P and \(q\) as the corresponding normalied point in another image Q of the same scene, the essential matrix provides the constraint:

\begin{equation*} q^T E p = 0 \end{equation*}

It's important that \(p\) and \(q\) be normalized homogeneous points, i.e. the following where \([u_p,v_p]\) are the pixel coordinates of point p:

\begin{equation*} p = K^{-1} \begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix} \end{equation*}

By fixing the extrinsics of image P as \(R_P = I\) and \(t_P = [0,0,0]^T\) the essential matrix relating P and Q can defined by the relationship:

\begin{equation*} E = t_\times R \end{equation*}

where \(R\) and \(t\) are the relative transforms of camera Q with respect to camera P and \(t_\times\) is the matrix representation of the the cross-product:

\begin{equation*} t_\times = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} \end{equation*}

Nov 07, 2019

Load/Save Wavefront .OBJ Files in Python

It's pretty useful to be able to load meshes when doing graphics. One format that is nearly ubiquitous are Wavefront .OBJ files, due largely to their good features-to-complexity ratio. Sure there are better formats, but when you just want to get a mesh into a program it's hard to beat .obj which can be imported and exported from just about anyhwere.

The code below loads and saves .obj files, including material assignments (but not material parameters), normals, texture coordinates and faces of any number of vertices. It supports faces with/without either/both of normals and texture coordinates and (if you can accept 6D vertices) also suports per-vertex colors (a non-standard extension sometimes used for debugging in mesh processing). Faces without either normals or tex-coords assign the invalid value -1 for these entries.

From the returned object, it is pretty easy to do things like sort faces by material indices, split vertices based on differing normals or along texture boundaries and so on. It also has the option to tesselate non-triangular faces, although this only works for convex faces (you should only be using convex faces anyway though!).

# wavefront.py
import numpy as np

class WavefrontOBJ:
    def __init__( self, default_mtl='default_mtl' ):
        self.path      = None               # path of loaded object
        self.mtllibs   = []                 # .mtl files references via mtllib
        self.mtls      = [ default_mtl ]    # materials referenced
        self.mtlid     = []                 # indices into self.mtls for each polygon
        self.vertices  = []                 # vertices as an Nx3 or Nx6 array (per vtx colors)
        self.normals   = []                 # normals
        self.texcoords = []                 # texture coordinates
        self.polygons  = []                 # M*Nv*3 array, Nv=# of vertices, stored as vid,tid,nid (-1 for N/A)

def load_obj( filename: str, default_mtl='default_mtl', triangulate=False ) -> WavefrontOBJ:
    """Reads a .obj file from disk and returns a WavefrontOBJ instance

    Handles only very rudimentary reading and contains no error handling!

    Does not handle:
        - relative indexing
        - subobjects or groups
        - lines, splines, beziers, etc.
    # parses a vertex record as either vid, vid/tid, vid//nid or vid/tid/nid
    # and returns a 3-tuple where unparsed values are replaced with -1
    def parse_vertex( vstr ):
        vals = vstr.split('/')
        vid = int(vals[0])-1
        tid = int(vals[1])-1 if len(vals) > 1 and vals[1] else -1
        nid = int(vals[2])-1 if len(vals) > 2 else -1
        return (vid,tid,nid)

    with open( filename, 'r' ) as objf:
        obj = WavefrontOBJ(default_mtl=default_mtl)
        obj.path = filename
        cur_mat = obj.mtls.index(default_mtl)
        for line in objf:
            toks = line.split()
            if not toks:
            if toks[0] == 'v':
                obj.vertices.append( [ float(v) for v in toks[1:]] )
            elif toks[0] == 'vn':
                obj.normals.append( [ float(v) for v in toks[1:]] )
            elif toks[0] == 'vt':
                obj.texcoords.append( [ float(v) for v in toks[1:]] )
            elif toks[0] == 'f':
                poly = [ parse_vertex(vstr) for vstr in toks[1:] ]
                if triangulate:
                    for i in range(2,len(poly)):
                        obj.mtlid.append( cur_mat )
                        obj.polygons.append( (poly[0], poly[i-1], poly[i] ) )
                    obj.polygons.append( poly )
            elif toks[0] == 'mtllib':
                obj.mtllibs.append( toks[1] )
            elif toks[0] == 'usemtl':
                if toks[1] not in obj.mtls:
                cur_mat = obj.mtls.index( toks[1] )
        return obj

def save_obj( obj: WavefrontOBJ, filename: str ):
    """Saves a WavefrontOBJ object to a file

    Warning: Contains no error checking!

    with open( filename, 'w' ) as ofile:
        for mlib in obj.mtllibs:
            ofile.write('mtllib {}\n'.format(mlib))
        for vtx in obj.vertices:
            ofile.write('v '+' '.join(['{}'.format(v) for v in vtx])+'\n')
        for tex in obj.texcoords:
            ofile.write('vt '+' '.join(['{}'.format(vt) for vt in tex])+'\n')
        for nrm in obj.normals:
            ofile.write('vn '+' '.join(['{}'.format(vn) for vn in nrm])+'\n')
        if not obj.mtlid:
            obj.mtlid = [-1] * len(obj.polygons)
        poly_idx = np.argsort( np.array( obj.mtlid ) )
        cur_mat = -1
        for pid in poly_idx:
            if obj.mtlid[pid] != cur_mat:
                cur_mat = obj.mtlid[pid]
                ofile.write('usemtl {}\n'.format(obj.mtls[cur_mat]))
            pstr = 'f '
            for v in obj.polygons[pid]:
                # UGLY!
                vstr = '{}/{}/{} '.format(v[0]+1,v[1]+1 if v[1] >= 0 else 'X', v[2]+1 if v[2] >= 0 else 'X' )
                vstr = vstr.replace('/X/','//').replace('/X ', ' ')
                pstr += vstr
            ofile.write( pstr+'\n')

A function that round-trips the loader is shown below, the resulting mesh dump.obj can be loaded into Blender and show the same material ids and texture coordinates. If you inspect it (and remove comments) you will see that the arrays match the string in the function:

from wavefront import *

def obj_load_save_example():
    data = '''
# slightly edited blender cube
mtllib cube.mtl
v 1.000000 1.000000 -1.000000
v 1.000000 -1.000000 -1.000000
v 1.000000 1.000000 1.000000
v 1.000000 -1.000000 1.000000
v -1.000000 1.000000 -1.000000
v -1.000000 -1.000000 -1.000000
v -1.000000 1.000000 1.000000
v -1.000000 -1.000000 1.000000
vt 0.625000 0.500000
vt 0.875000 0.500000
vt 0.875000 0.750000
vt 0.625000 0.750000
vt 0.375000 0.000000
vt 0.625000 0.000000
vt 0.625000 0.250000
vt 0.375000 0.250000
vt 0.375000 0.250000
vt 0.625000 0.250000
vt 0.625000 0.500000
vt 0.375000 0.500000
vt 0.625000 0.750000
vt 0.375000 0.750000
vt 0.125000 0.500000
vt 0.375000 0.500000
vt 0.375000 0.750000
vt 0.125000 0.750000
vt 0.625000 1.000000
vt 0.375000 1.000000
vn 0.0000 0.0000 -1.0000
vn 0.0000 1.0000 0.0000
vn 0.0000 0.0000 1.0000
vn -1.0000 0.0000 0.0000
vn 1.0000 0.0000 0.0000
vn 0.0000 -1.0000 0.0000
usemtl mat1
# no texture coordinates
f 6//1 5//1 1//1 2//1
usemtl mat2
f 1/5/2 5/6/2 7/7/2 3/8/2
usemtl mat3
f 4/9/3 3/10/3 7/11/3 8/12/3
usemtl mat4
f 8/12/4 7/11/4 5/13/4 6/14/4
usemtl mat5
f 2/15/5 1/16/5 3/17/5 4/18/5
usemtl mat6
# no normals
f 6/14 2/4 4/19 8/20
    with open('test_cube.obj','w') as obj:
        obj.write( data )
    obj = load_obj( 'test_cube.obj' )
    save_obj( obj, 'dump.obj' )

if __name__ == '__main__':

Hope it's useful!

Nov 05, 2019

Useful Mesh Functions in Python

These are collections of functions that I end up re-implementing frequently.

import numpy as np

def manifold_mesh_neighbors( tris: np.ndarray ) -> np.ndarray:
    """Returns an array of triangles neighboring each triangle

        tris (Nx3 int array): triangle vertex indices

        nbrs (Nx3 int array): neighbor triangle indices,
            or -1 if boundary edge
    if tris.shape[1] != 3:
        raise ValueError('Expected a Nx3 array of triangle vertex indices')

    e2t = {}
    for idx,(a,b,c) in enumerate(tris):
        e2t[(b,a)] = idx
        e2t[(c,b)] = idx
        e2t[(a,c)] = idx

    nbr = np.full(tris.shape,-1,int)
    for idx,(a,b,c) in enumerate(tris):
        nbr[idx,0] = e2t[(a,b)] if (a,b) in e2t else -1
        nbr[idx,1] = e2t[(b,c)] if (b,c) in e2t else -1
        nbr[idx,2] = e2t[(c,a)] if (c,a) in e2t else -1
    return nbr

if __name__ == '__main__':
    tris = np.array((
    nbrs = manifold_mesh_neighbors( tris )

    tar = np.array((
    if not np.allclose(nbrs,tar):
        raise ValueError('uh oh.')

Nov 04, 2019

Useful Transformation Functions in Python

These are collections of functions that I end up re-implementing frequently.

from typing import *
import numpy as np

def closest_rotation( M: np.ndarray ):
    U,s,Vt = np.linalg.svd( M[:3,:3] )
    if np.linalg.det(U@Vt) < 0:
        Vt[2] *= -1.0
    return U@Vt

def mat2quat( M: np.ndarray ):
    if np.abs(np.linalg.det(M[:3,:3])-1.0) > 1e-5:
        raise ValueError('Matrix determinant is not 1')
    if np.abs( np.linalg.norm( M[:3,:3].T@M[:3,:3] - np.eye(3)) ) > 1e-5:
        raise ValueError('Matrix is not orthogonal')
    w = np.sqrt( 1.0 + M[0,0]+M[1,1]+M[2,2])/2.0
    x = (M[2,1]-M[1,2])/(4*w)
    y = (M[0,2]-M[2,0])/(4*w)
    z = (M[1,0]-M[0,1])/(4*w)
    return np.array((x,y,z,w),float)

def quat2mat( q: np.ndarray ):
    qx,qy,qz,qw = q/np.linalg.norm(q)
    return np.array((
        (1 - 2*qy*qy - 2*qz*qz,     2*qx*qy - 2*qz*qw,     2*qx*qz + 2*qy*qw),
        (    2*qx*qy + 2*qz*qw, 1 - 2*qx*qx - 2*qz*qz,     2*qy*qz - 2*qx*qw),
        (    2*qx*qz - 2*qy*qw,         2*qy*qz + 2*qx*qw, 1 - 2*qx*qx - 2*qy*qy),

if __name__ == '__main__':
    for i in range( 1000 ):
        q = np.random.standard_normal(4)
        q /= np.linalg.norm(q)
        A = quat2mat( q )
        s = mat2quat( A )
        if np.linalg.norm( s-q ) > 1e-6 and np.linalg.norm( s+q ) > 1e-6:
            raise ValueError('uh oh.')

        R = closest_rotation( A+np.random.standard_normal((3,3))*1e-6 )
        if np.linalg.norm(A-R) > 1e-5:
            print( np.linalg.norm( A-R ) )
            raise ValueError('uh oh.')

Nov 03, 2019

Mesh Processing in Python: Implementing ARAP deformation

Python is a pretty nice language for prototyping and (with numpy and scipy) can be quite fast for numerics but is not great for algorithms with frequent tight loops. One place I run into this frequently is in graphics, particularly applications involving meshes.

Not much can be done if you need to frequently modify mesh structure. Luckily, many algorithms operate on meshes with fixed structure. In this case, expressing algorithms in terms of block operations can be much faster. Actually implementing methods in this way can be tricky though.

To gain more experience in this, I decided to try implementing the As-Rigid-As-Possible Surface Modeling mesh deformation algorithm in Python. This involves solving moderately sized Laplace systems where the right-hand-side involves some non-trivial computation (per-vertex SVDs embedded within evaluation of the Laplace operator). For details on the algorithm, see the link.

As it turns out, the entire implementation ended up being just 99 lines and solves 1600 vertex problems in just under 20ms (more on this later). Full code is available.

Building the Laplace operator

ARAP uses cotangent weights for the Laplace operator in order to be relatively independent of mesh grading. These weights are functions of the angles opposite the edges emanating from each vertex. Building this operator using loops is slow but can be sped up considerably by realizing that every triangle has exactly 3 vertices, 3 angles and 3 edges. Operations can be evaluated on triangles and accumulated to edges.

Here's the Python code for a cotangent Laplace operator:

from typing import *
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as spla

def build_cotan_laplacian( points: np.ndarray, tris: np.ndarray ):
    a,b,c = (tris[:,0],tris[:,1],tris[:,2])
    A = np.take( points, a, axis=1 )
    B = np.take( points, b, axis=1 )
    C = np.take( points, c, axis=1 )

    eab,ebc,eca = (B-A, C-B, A-C)
    eab = eab/np.linalg.norm(eab,axis=0)[None,:]
    ebc = ebc/np.linalg.norm(ebc,axis=0)[None,:]
    eca = eca/np.linalg.norm(eca,axis=0)[None,:]

    alpha = np.arccos( -np.sum(eca*eab,axis=0) )
    beta  = np.arccos( -np.sum(eab*ebc,axis=0) )
    gamma = np.arccos( -np.sum(ebc*eca,axis=0) )

    wab,wbc,wca = ( 1.0/np.tan(gamma), 1.0/np.tan(alpha), 1.0/np.tan(beta) )
    rows = np.concatenate((   a,   b,   a,   b,   b,   c,   b,   c,   c,   a,   c,   a ), axis=0 )
    cols = np.concatenate((   a,   b,   b,   a,   b,   c,   c,   b,   c,   a,   a,   c ), axis=0 )
    vals = np.concatenate(( wab, wab,-wab,-wab, wbc, wbc,-wbc,-wbc, wca, wca,-wca, -wca), axis=0 )
    L = sparse.coo_matrix((vals,(rows,cols)),shape=(points.shape[1],points.shape[1]), dtype=float).tocsc()
    return L

Inputs are float \(3 \times N_{verts}\) and integer \(N_{tris} \times 3\) arrays for vertices and triangles respectively. The code works by making arrays containing vertex indices and positions for all triangles, then computes and normalizes edges and finally computes angles and cotangents. Once the cotangents are available, initialization arrays for a scipy.sparse.coo_matrix are constructed and the matrix built. The coo_matrix is intended for finite-element matrices so it accumulates rather than overwrites duplicate values. The code is definitely faster and most likely fewer lines than a loop based implementation.

Evaluating the Right-Hand-side

The Laplace operator is just a warm-up exercise compared to the right-hand-side computation. This involves:

  • For every vertex, collecting the edge-vectors in its one-ring in both original and deformed coordinates
  • Computing the rotation matrices that optimally aligns these neighborhoods, weighted by the Laplacian edge weights, per-vertex. This involves a SVD per-vertex.
  • Rotating the undeformed neighborhoods by the per-vertex rotation and summing the contributions.

In order to vectorize this, I constructed neighbor and weight arrays for the Laplace operator. These store the neighboring vertex indices and the corresponding (positive) weights from the Laplace operator. A catch is that vertices have different numbers of neighbors. To address this I use fixed sized arrays based on the maximum valence in the mesh. These are initialized with default values of the vertex ids and weights of zero so that unused vertices contribute nothing and do not affect matrix structure.

To compute the rotation matrices, I used some index magic to accumulate the weighted neighborhoods and rely on numpy's batch SVD operation to vectorize over all vertices. Finally, I used the same index magic to rotate the neighborhoods and accumulate the right-hand-side.

The code to accumulate the neighbor and weight arrays is:

def build_weights_and_adjacency( points: np.ndarray, tris: np.ndarray, L: Optional[sparse.csc_matrix]=None ):
    L = L if L is not None else build_cotan_laplacian( points, tris )
    n_pnts, n_nbrs = (points.shape[1], L.getnnz(axis=0).max()-1)
    nbrs = np.ones((n_pnts,n_nbrs),dtype=int)*np.arange(n_pnts,dtype=int)[:,None]
    wgts = np.zeros((n_pnts,n_nbrs),dtype=float)

    for idx,col in enumerate(L):
        msk = col.indices != idx
        indices = col.indices[msk]
        values  = col.data[msk]
        nbrs[idx,:len(indices)] = indices
        wgts[idx,:len(indices)] = -values

Rather than break down all the other steps individually, here is the code for an ARAP class:

class ARAP:
    def __init__( self, points: np.ndarray, tris: np.ndarray, anchors: List[int], anchor_weight: Optional[float]=10.0, L: Optional[sparse.csc_matrix]=None ):
        self._pnts    = points.copy()
        self._tris    = tris.copy()
        self._nbrs, self._wgts, self._L = build_weights_and_adjacency( self._pnts, self._tris, L )

        self._anchors = list(anchors)
        self._anc_wgt = anchor_weight
        E = sparse.dok_matrix((self.n_pnts,self.n_pnts),dtype=float)
        for i in anchors:
            E[i,i] = 1.0
        E = E.tocsc()
        self._solver = spla.factorized( ( self._L.T@self._L + self._anc_wgt*E.T@E).tocsc() )

    def n_pnts( self ):
        return self._pnts.shape[1]

    def n_dims( self ):
        return self._pnts.shape[0]

    def __call__( self, anchors: Dict[int,Tuple[float,float,float]], num_iters: Optional[int]=4 ):
        con_rhs = self._build_constraint_rhs(anchors)
        R = np.array([np.eye(self.n_dims) for _ in range(self.n_pnts)])
        def_points = self._solver( self._L.T@self._build_rhs(R) + self._anc_wgt*con_rhs )
        for i in range(num_iters):
            R = self._estimate_rotations( def_points.T )
            def_points = self._solver( self._L.T@self._build_rhs(R) + self._anc_wgt*con_rhs )
        return def_points.T

    def _estimate_rotations( self, def_pnts: np.ndarray ):
        tru_hood = (np.take( self._pnts, self._nbrs, axis=1 ).transpose((1,0,2)) - self._pnts.T[...,None])*self._wgts[:,None,:]
        rot_hood = (np.take( def_pnts,   self._nbrs, axis=1 ).transpose((1,0,2)) - def_pnts.T[...,None])

        U,s,Vt = np.linalg.svd( rot_hood@tru_hood.transpose((0,2,1)) )
        R = U@Vt
        dets = np.linalg.det(R)
        Vt[:,self.n_dims-1,:] *= dets[:,None]
        R = U@Vt
        return R

    def _build_rhs( self, rotations: np.ndarray ):
        R = (np.take( rotations, self._nbrs, axis=0 )+rotations[:,None])*0.5
        tru_hood = (self._pnts.T[...,None]-np.take( self._pnts, self._nbrs, axis=1 ).transpose((1,0,2)))*self._wgts[:,None,:]
        rhs = np.sum( (R@tru_hood.transpose((0,2,1))[...,None]).squeeze(), axis=1 )
        return rhs

    def _build_constraint_rhs( self, anchors: Dict[int,Tuple[float,float,float]] ):
        f = np.zeros((self.n_pnts,self.n_dims),dtype=float)
        f[self._anchors,:] = np.take( self._pnts, self._anchors, axis=1 ).T
        for i,v in anchors.items():
            if i not in self._anchors:
                raise ValueError('Supplied anchor was not included in list provided at construction!')
            f[i,:] = v
        return f

The indexing was tricky to figure out. Error handling and input checking can obviously be improved. Something I'm not going into are the handling of anchor vertices and the overall quadratic forms that are being solved.

So does it Work?

Yes. Here is a 2D equivalent to the bar example from the paper:

import time
import numpy as np
import matplotlib.pyplot as plt

import arap

def grid_mesh_2d( nx, ny, h ):
    x,y = np.meshgrid( np.linspace(0.0,(nx-1)*h,nx), np.linspace(0.0,(ny-1)*h,ny))
    idx = np.arange(nx*ny,dtype=int).reshape((ny,nx))
    quads = np.column_stack(( idx[:-1,:-1].flat, idx[1:,:-1].flat, idx[1:,1:].flat, idx[:-1,1:].flat ))
    tris  = np.vstack((quads[:,(0,1,2)],quads[:,(0,2,3)]))
    return np.row_stack((x.flat,y.flat)), tris, idx

nx,ny,h = (21,5,0.1)
pnts, tris, ix = grid_mesh_2d( nx,ny,h )

anchors = {}
for i in range(ny):
    anchors[ix[i,0]]    = (       0.0,           i*h)
    anchors[ix[i,1]]    = (         h,           i*h)
    anchors[ix[i,nx-2]] = (h*nx*0.5-h,  h*nx*0.5+i*h)
    anchors[ix[i,nx-1]] = (  h*nx*0.5,  h*nx*0.5+i*h)

deformer = arap.ARAP( pnts, tris, anchors.keys(), anchor_weight=1000 )

start = time.time()
def_pnts = deformer( anchors, num_iters=2 )
end = time.time()

plt.triplot( pnts[0], pnts[1], tris, 'k-', label='original' )
plt.triplot( def_pnts[0], def_pnts[1], tris, 'r-', label='deformed' )
plt.title('deformed in {:0.2f}ms'.format((end-start)*1000.0))

The result of this script is the following:

As Rigid As Possible deformation

This is qualitatively quite close to the result for two iterations in Figure 3 of the ARAP paper. Note that the timings may be a bit unreliable for such a small problem size.

A corresponding result for 3D deformations is here (note that this corrects an earlier bug in rotation computations for 3D):

As Rigid As Possible deformation in 3D


By using SciPy's sparse matrix functions and pre-factoring the systems, the solves are very quick. This is because SciPy uses optimized libraries for the factorization and backsolves. Prefactoring the system matrices for efficiency was a key point of the paper and it's nice to be able to take advantage of this from Python.

To look at the breakdown of time taken for right-hand-side computation and solves, I increased the problem size to 100x25, corresponding to a 2500 vertex mesh. Total time for two iterations was roughly 35ms, of which 5ms were the linear solves and 30ms were the right-hand-side computations. So the right-hand-side computation completely dominates the overall time. That said, using a looping approach, I could not even build the Laplace matrix in 30ms to say nothing of building the right-hand-side three times over, so the vectorization has paid off. More profiling would be needed to say if the most time-consuming portion is the SVDs or the indexing but overall I consider it a success to implement a graphics paper that has been cited more than 700 times in 99 lines of Python.

Mar 09, 2019

Tsai Camera Calibration

Projection of known 3D points into a pinhole camera can be modeled by the following equation:

\begin{align*} \begin{bmatrix} \hat{u}_c \\ \hat{v}_c \\ \hat{w}_c \end{bmatrix} = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{xx} & r_{xy} & r_{xz} & t_x \\ r_{yx} & r_{yy} & r_{yz} & t_y \\ r_{zx} & r_{zy} & r_{zz} & t_z \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \\ = K E x \end{align*}

The matrix \(K\) contains the camera intrinsics and the matrix \(E\) contains the camera extrinsics. The final image coordinates are obtained by perspective division: \(u_c = \hat{u}_c/\hat{w}_c\) and \(v_c = \hat{v}_c/\hat{w}_c\).

Estimating \(K\) and \(E\) jointly can be difficult because the equations are non-linear and may converge slowly or to a poor solution. Tsai's algorithm provides a way to obtain initial estimates of the parameters for subsequent refinement. It requires a known calibration target that can be either planar or three-dimensional.

The key step is in estimating the rotation and x-y translation of the extrinsics \(E\) in such a way that the unknown focal length is eliminated. Once the rotation is known, the focal-length and z translation can be computed.

This process requires the optical center \([u_0, v_0]\) which is also unknown. However, given reasonable quality optics this is often near the image center so the center can be used as an initial estimate.

This is the starting point for the estimation.

Estimating (most of) the Extrinsics

This is divided into two cases, one for planar targets and one for 3D targets.

Case 1: 3D Non-planar Target:

For calibration targets with 8+ points that span 3 dimensions. Points in the camera frame prior to projection are:

\begin{equation*} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} r_{xx} x_w + r_{xy} y_w + r_{xz} z_w + t_x \\ r_{yx} x_w + r_{yy} y_w + r_{yz} z_w + t_y \\ r_{zx} x_w + r_{zy} y_w + r_{zz} z_w + t_z \end{bmatrix} \end{equation*}

The image points relative to the optical center are then:

\begin{equation*} \begin{bmatrix} u_c - u_0 \\ v_c - v_0 \end{bmatrix} = \begin{bmatrix} u_c' \\ v_c' \end{bmatrix} = \begin{bmatrix} (f x_c + u_0 z_c)/z_c - u_0 \\ (f y_c + v_0 z_c)/z_c - v_0 \end{bmatrix} = \begin{bmatrix} f x_c/z_c \\ f y_c/z_c \end{bmatrix} \end{equation*}

The ratio of these two quantities gives the equation:

\begin{equation*} \frac{u_c'}{v_c'} = \frac{x_c}{y_c} \end{equation*}

or, after cross-multiplying and collecting terms to one side:

\begin{align*} v_c' x_c - u_c' y_c = 0 \\ [ v_c' x_w, v_c' y_w, v_c' z_w, -u_c' x_w, -u_c' y_w, -u_c' z_w, 1, -1] \begin{bmatrix} r_{xx} \\ r_{xy} \\ r_{xz} \\ r_{yx} \\ r_{yy} \\ r_{yz} \\ t_x \\ t_y \end{bmatrix} = 0 \end{align*}

Each point produces a single equation like the above which can be stacked to form a homogeneous system. With a zero right-hand-side, scalar multiples of the solution still satisfy the equations and a trivial all-zero solution exists. Calling the system matrix \(M\), the solution to this equation system is the eigenvector associated with the smallest eigenvalue of \(M^T M\). The computed solution will be proportional to the desired solution. It is also possible to fix one value, e.g. \(t_y=1\) and solve an inhomogeneous system but this may give problems if \(t_y \neq 0\) for the camera configuration.

The recovered \(r_x = [ r_{xx}, r_{xy}, r_{xz} ]\) and \(r_x = [ r_{yx}, r_{yy}, r_{yz} ]\) should be unit length. The scale of the solution can be recovered as the mean length of these vectors and used to inversely scale the translations \(t_x, t_y\).

Case 2: 2D Planar Targets:

For 2D targets, arbitrarily set \(z_w=0\), leading to:

\begin{equation*} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} r_{xx} x_w + r_{xy} y_w + t_x \\ r_{yx} x_w + r_{yy} y_w + t_y \\ r_{zx} x_w + r_{zy} y_w + t_z \end{bmatrix} \end{equation*}

Use the same process as before, a homogeneous system can be formed. It is identical to the previous one, except the \(r_{xz}\) and \(r_{yz}\) terms are dropped since \(z_w=0\). Each equation ends up being:

\begin{equation*} [ v_c' x_w, v_c' y_w, -u_c' x_w, -u_c' y_w, 1, -1] \begin{bmatrix} r_{xx} \\ r_{xy} \\ r_{yx} \\ r_{yy} \\ t_x \\ t_y \end{bmatrix} = 0 \end{equation*}

Solving this gives \(r_{xx}, r_{xy}, r_{zx}, r_{zy}, t_x, t_y\) up to a scalar \(k\). It does not provide \(r_{xz}, r_{yz}\). In addition, we know that the rows of the rotation submatrix are unit-norm and orthogonal, giving three equations:

\begin{align*} r_{xx}^2 + r_{xy}^2 + r_{xz}^2 = k^2 \\ r_{yz}^2 + r_{yy}^2 + r_{yz}^2 = k^2 \\ r_{xx} r_{yx} + r_{xy} r_{yy} + r_{xz} r_{yz} = 0 \end{align*}

After some algebraic manipulations the factor \(k^2\) can be found as:

\begin{equation*} k^2 = \frac{1}{2}\left( r_{xx}^2 + r_{xy}^2 + r_{yx}^2 + r_{yy}^2 + \sqrt{\left( (r_{xx} - r_{yy})^2 + (r_{xy} + r_{yx})^2\right)\left( (r_{xx}+r_{yy})^2 + (r_{xy} - r_{yx})^2 \right) } \right) \end{equation*}

This allows \(r_{xz}, r_{yz}\) to be found as:

\begin{align*} r_{xz} = \pm\sqrt{ k^2 - r_{xx}^2 - r_{xy}^2 } \\ r_{yz} = \pm\sqrt{ k^2 - r_{yz}^2 - r_{yy}^2 } \end{align*}

Estimate Remaining Intrinsics and Focal Length

Estimating the focal length and z-translation follows a similar process. Using the definitions above we have:

\begin{align*} u_c' = f \frac{x_c}{z_c} \\ v_c' = f \frac{y_c}{z_c} \end{align*}

Cross-multiplying and collecting gives:

\begin{align*} (r_{xx} x_w + r_{xy} y_w + r_{xz} z_w + t_x) f - u_c' t_z = u_c' (r_{zx} x_w + r_{zy} y_w + r_{zz} z_w) \\ (r_{yx} x_w + r_{yy} y_w + r_{yz} z_w + t_y) f - v_c' t_z = v_c' (r_{zx} x_w + r_{zy} y_w + r_{zz} z_w) \end{align*}

Solving this system gives the desired \(f, t_x\) values.

Feb 09, 2019

Extract Image Patches in Python

There are a number of imaging algorithms that work with patch representations. Non-local means, dictionary learning and Gaussian Mixture Models are some examples.

In order to use these, you need to be able to extract image patches and reconstruct images efficiently. There are probably faster methods but the following does the trick for me without much in the way of dependencies:

import numpy

def extract_grayscale_patches( img, shape, offset=(0,0), stride=(1,1) ):
    """Extracts (typically) overlapping regular patches from a grayscale image

    Changing the offset and stride parameters will result in images
    reconstructed by reconstruct_from_grayscale_patches having different
    dimensions! Callers should pad and unpad as necessary!

        img (HxW ndarray): input image from which to extract patches

        shape (2-element arraylike): shape of that patches as (h,w)

        offset (2-element arraylike): offset of the initial point as (y,x)

        stride (2-element arraylike): vertical and horizontal strides

        patches (ndarray): output image patches as (N,shape[0],shape[1]) array

        origin (2-tuple): array of top and array of left coordinates
    px, py = numpy.meshgrid( numpy.arange(shape[1]),numpy.arange(shape[0]))
    l, t = numpy.meshgrid(
        numpy.arange(offset[0],img.shape[0]-shape[0]+1,stride[0]) )
    l = l.ravel()
    t = t.ravel()
    x = numpy.tile( px[None,:,:], (t.size,1,1)) + numpy.tile( l[:,None,None], (1,shape[0],shape[1]))
    y = numpy.tile( py[None,:,:], (t.size,1,1)) + numpy.tile( t[:,None,None], (1,shape[0],shape[1]))
    return img[y.ravel(),x.ravel()].reshape((t.size,shape[0],shape[1])), (t,l)

def reconstruct_from_grayscale_patches( patches, origin, epsilon=1e-12 ):
    """Rebuild an image from a set of patches by averaging

    The reconstructed image will have different dimensions than the
    original image if the strides and offsets of the patches were changed
    from the defaults!

        patches (ndarray): input patches as (N,patch_height,patch_width) array

        origin (2-tuple): top and left coordinates of each patch

        epsilon (scalar): regularization term for averaging when patches
            some image pixels are not covered by any patch

        image (ndarray): output image reconstructed from patches of
            size ( max(origin[0])+patches.shape[1], max(origin[1])+patches.shape[2])

        weight (ndarray): output weight matrix consisting of the count
            of patches covering each pixel
    patch_width  = patches.shape[2]
    patch_height = patches.shape[1]
    img_width    = numpy.max( origin[1] ) + patch_width
    img_height   = numpy.max( origin[0] ) + patch_height

    out = numpy.zeros( (img_height,img_width) )
    wgt = numpy.zeros( (img_height,img_width) )
    for i in range(patch_height):
        for j in range(patch_width):
            out[origin[0]+i,origin[1]+j] += patches[:,i,j]
            wgt[origin[0]+i,origin[1]+j] += 1.0

    return out/numpy.maximum( wgt, epsilon ), wgt

if __name__ == '__main__':
    import cv2
    import time
    import matplotlib.pyplot as plt

    img = cv2.imread('lena.png')[:,:,::-1]

    start = time.time()
    p, origin = extract_grayscale_patches( img[:,:,2], (8,8), stride=(1,1) )
    end = time.time()
    print( 'Patch extraction took: {}s'.format(numpy.round(end-start,2)) )
    start = time.time()
    r, w = reconstruct_from_grayscale_patches( p, origin )
    end = time.time()
    print('Image reconstruction took: {}s'.format(numpy.round(end-start,2)) )
    print( 'Reconstruction error is: {}'.format( numpy.linalg.norm( img[:r.shape[0],:r.shape[1],2]-r ) ) )

    plt.subplot( 131 )
    plt.imshow( img[:,:,2] )
    plt.title('Input image')
    plt.subplot( 132 )
    plt.imshow( p[p.shape[0]//2] )
    plt.title('Central patch')
    plt.subplot( 133 )
    plt.imshow( r )
    plt.title('Reconstructed image')

Remove the end bit to have only numpy as a dependency.

Feb 01, 2019

Assignment Problems

Assignment problems occur in many fields that deal with resource allocation. There are many variants, however the simplest involves \(N\) agents indexed by \(i\) which must perform \(N\) tasks indexed by \(j\). The cost for agent \(i\) to perform task \(j\) is encoded in a cost matrix \(C\) that is weighted component-wise by assignments from an assignment matrix \(W_{ij}\) to form the objective.

It's common to have the requirements that each task must be completed and that no agent can perform more than one task. Often it is required that each agent perform a single task to completion but I'm not considering that case here.

\begin{align*} \newcommand{\argmin}{{\bf\mbox{argmin}}} \newcommand{\st}{{\bf\mbox{subject to:}}} \newcommand{\gap}{\hspace{0.25cm}} W = \argmin\gap& &\sum_{ij} C_{ij} W_{ij} \\ \st\gap& & \sum_i W_{ij} = 1\gap\forall j\\ & & \sum_j W_{ij} = 1\gap\forall i\\ & & W_{ij} \geq 0 \gap\forall i,j \end{align*}

The problem can be solved using the Hungarian algorithm but I thought I'd give it a try with ADMM to see how well it works.

The probability simplex constraints are a pain but it's pretty easy to solve this by introducing splitting variables \(A, B\) each equal to \(W\). This gives the following modified problem:

\begin{align*} W = \argmin\gap& &\sum_{ij} C_{ij} W_{ij} + \sum_i \mathcal{I}_\Delta(A_{i*}) + \sum_j \mathcal{I}_\Delta(B_{*j}) \\ \st\gap& & W - A = 0 \\ & & W - B = 0 \end{align*}

The function \(\mathcal{I}_\Delta(v)\) is the indicator function for probability simplex constraints on the rows of \(V\):

\begin{equation*} \mathcal{I}_\Delta(v) = \begin{cases} 0 & \gap\mbox{if } v_i \geq 0 \mbox{ and } \sum_j v_i = 1 \gap\forall i \\ \infty &\gap\mbox{otherwise} \end{cases} \end{equation*}

The proximal operator for this functions requires Euclidean projection onto the non-negativity and sum contraints. It's possible to alternately project on the non-negativity and sum contraints but this converges slowly. However, an \(O(N \log N)\) algorithm exists and can be implemented in Python as below:

def prox_probability_simplex( v ):
    """Algorithm of https://eng.ucmerced.edu/people/wwang5/papers/SimplexProj.pdf"""
    j = numpy.arange(1,len(v)+1)
    u = numpy.sort(v)[::-1]
    y = numpy.cumsum( u )
    rho = numpy.argmax((u+(1.0-y)/j>0)*j)+1
    off = (1-y[rho-1])/rho
    ret = numpy.maximum( v+off, 0.0 )
    return ret

Neglecting constants w.r.t \(W, A, B\), the augmented Lagrangian of this problem is:

\begin{equation*} \mathcal{L}(W,A,U_A,B,U_B) = \sum_{ij} C_{ij} W_{ij} + \sum_i \mathcal{I}_\Delta(A_{i*}) + \sum_j \mathcal{I}_\Delta(B_{*i}) + \frac{\rho}{2}\| W - A + U_A \|_2^2 + \frac{\rho}{2}\| W - A + U_A \|_2^2 \end{equation*}

ADMM consists of alternating minimizations for the solution \(W\) and the splitting variables \(A\) and \(B\). After each set of updates, the (scaled) Lagrange multipliers \(U_A, U_B\) are updated. The objective terms parallelize really cleanly, the first and last two terms parallelize per-component of \(W\). The \(\mathcal{I}_\Delta(.)\) terms each parallelize independently over rows and columns of the splitting variables.

The resulting algorithm is as follows. I have not parallelized the solves although the \(W\) solve is vectorized at least:

def assignment_problem( C, W=None, rho=0.1, thresh=0.55, early_stop=True, num_iterations=10000 ):
    """Applies an ADMM algorithm to the assignment problem

    This algorithm will only solve the problem at the limit of iterations. Early
    stopping or running too few iterations could give a non-optimal or invalid
    matching (i.e. not a permutation of agents to tasks)

    That said, it often does okay!

        C (NxN matrix): Input cost matric mapping agents (row indices) to tasks
            (column indices). Invalid assignments should be indicated with very
            large positive entries

        W (NxN matrix): Initial assignments to warm-start the process or None
            to start from all zeros

        rho (scalar): Lagrangian augmentation weight

        thresh (scalar): Threshold above which assignments are treated as hard
            for the purposes of returned assignments

        early_stop (boolean): enable early stopping in the case that thresholding
            the assignment matrix by thresh produces a valid assignment. This
            does not necessarily produce an optimal solution!

        num_iterations (int): maximum number of iterations to perform, may be
            fewer if early_stop == True


        assignment (N-element integer array): assignments from agents to task
            where each element assignment[i]=v maps agent i to task v

        assignment matrix (NxN matrix): solution variables to warm-start if
            a valid solution was not found

        loss (scalar): the achieved loss/cost of the solution

        iter (integer): number of iterations performed
    if W is None:
        W = numpy.zeros_like(C)

    A = W.copy()
    B = W.copy()
    Ua = W-A
    Ub = W-B

    loss = numpy.sum(C*W)
    for iter in range( num_iterations ):
        # update primary variables, trivially parallelizes per component
        W = numpy.maximum( (rho*(A-Ua)+rho*(B-Ub)-C)/(2.0*rho), 0.0 )

        # update row constraints, trivially parallizes by row
        for i in range(A.shape[0]):
            A[i,:] = prox_probability_simplex( W[i,:] + Ua[i,:] )
        Ua += (W - A)

        # update column constraints, trivially parallelizes by column
        for i in range(A.shape[1]):
            B[:,i] = prox_probability_simplex( W[:,i] + Ub[:,i] )
        Ub += (W - B)

        tloss = numpy.sum(C*W)
        rowsum = numpy.sum(W > 0.55, axis=0)
        colsum = numpy.sum(W > 0.55, axis=1)
        if early_stop and numpy.max(rowsum) == 1 and numpy.min(rowsum) == 1 and numpy.max(colsum) == 1 and numpy.min(colsum) == 1:
            return numpy.argmax( W > thresh, axis=1 ), W, tloss, iter

    return numpy.argmax( W > thresh, axis=1 ), W, tloss, iter

def check_validity( x ):
    """Checks that a solution from the above is a permutation"""
    return numpy.linalg.norm( numpy.sort(x) - numpy.arange(len(x)) ) > 1e-8

It's worth noting a few things:

  • With the splitting and ADMM, this algorithm will only exactly solve the problem in the limit of iteration count.
  • Early stopping can reduce the iterations needed to achieve a valid, low-cost matching by 10-50X depending on the problem but this matching may not be optimal. For some test problems with 10x10 cost matrices having standard normal entries this happened about 1% of the time. The achieved loss values are typically within 1-2% of each other.
  • Early stopping does always generate identical output running the algorithm for many more iterations. Subsets of agents/tasks where multiple matchings produce close results tend to be hard to resolve.

Jan 28, 2019

Dictionary Learning

Dictionary learning consists of jointly solving for both a dictionary \(D\), the columns of which store features and weights \(X\) which store coefficients applied to \(D\) required to best approximate a collection of input datapoints stored as columns of \(Y\). The weights \(X\) are required to be sparse, i.e. have relatively few non-zero entries:

\begin{equation*} \newcommand{\gap}{\hspace{0.25cm}} \newcommand{\argmin}{{\bf\mbox{argmin}}} \newcommand{\st}{{\bf\mbox{subject to:}}} D, X = \argmin \frac{1}{2} \| Y - DX \|_F^2 + \lambda \| X \|_1 \end{equation*}

An additional constraint must be added to \(D\) because, without one, the 1-norm in the objective can be made arbitrarily small by allowing \(D\) to grow arbitrarily large. Typically this constraint is that columne of \(D\) have unit 2-norm.

This is a non-convex problem, but it can be solved relatively well by iterating weight and dictionary updates, leaving the other fixed. Ignoring the norm constraint, the dictionary update is a linear least-squares problem and can be solved as:

\begin{equation*} D = \left( \left( X X^T + \epsilon I \right)^{-1} X Y^T \right)^T \end{equation*}

After this, the columns can be normalized.

The weight update is a sparse-coding problem and can be solved using ADMM. The first step is to introduce a splitting variable that allows the Frobenius-norm and 1-norm to be handled separately:

\begin{align*} X = \argmin & & \frac{1}{2} \| Y - D X \|_F^2 + \lambda \| Z \|_1 \\ \st & & X - Z = 0 \end{align*}

Once in this form, the update algorithm for the weights can be written from inspection based on the link above using \(F(x) = \argmin \frac{1}{2}\| Y - D X \|_F^2\) and \(G(Z) = \lambda \| Z \|_1\).

\begin{align*} \newcommand{\prox}[2]{{\bf\mbox{prox}_{#1}}\left(#2\right)} X^{k+1} &=& \prox{F}{Z^k - U^k} \\ Z^{k+1} &=& \prox{\lambda G}{X^{k+1} + U^k} \\ U^{k+1} &=& U^k + X^{k+1} - Z^{k+1} \end{align*}

All that remains is to determine the proximal operators and initialization. The proximal operator for \(X\) is for a pretty conventional least-squares problem:

\begin{align*} \prox{F}{V} &=& \argmin\gap F(X) + \frac{1}{2}\| X - V \|_F^2 \\ &=& \argmin\gap \frac{1}{2} \| Y - D X \|_F^2 + \frac{1}{2} \| X - V \|_F^2 \\ &=& \left( D^T D + I \right)^{-1} \left( D^T Y + V \right) \end{align*}

The corresponding proximal operator for \(Z\) is for 'pure' 1-norm, which is just the soft-thresholding operator:

\begin{align*} \prox{\lambda G}{v} &=& \argmin\gap G(Z) + \frac{1}{2\lambda} \| Z - V \|_1 \\ &=& ( |V| - \lambda )_+ {\bf\mbox{sgn}}( V ) \end{align*}

If you're in doubt where either of these come from, consult the ADMM link above.

This leads to some interesting choices. Solving the split problem has two sets of variables, \(X\) and \(Z\). \(X\) minimizes the data term but \(Z\) enforces the constraints. Nominally, both of these should be equal but this only holds at the limit of iteration count, assuming the overall algorithm works at all (the problem is non-convex after all). Which should be used when updating \(D\)?

For a test problem involving around 5.5M points of 48 dof data, I found that using \(Z\) is vastly preferable. Using a dictionary with only 100 atoms (~2X overcomplete) yields reconstruction errors around 1% or less but the percentage of non-zeros in the coding weights is only around 25%. It settled relatively quickly. Using \(X\) on the other hand oscillated quite a bit and did not reach these levels of accuracy/sparsity.

I conclude that it's better to use a looser fit than slacker constraint satisfaction, for this problem at least. Presumably the dictionary/codebooks adapt to each other better when they are consistent. More 'compressible' problems may yield sparser solutions.

import numpy

def sparse_coding( Y, D, X, rho, num_iterations, Z=None, U=None ):
    if Z is None:
        Z = X.copy()
    if U is None:
        U = X - Z

    # precompute solve and part of RHS
    iDtD = numpy.linalg.inv( numpy.dot(D.transpose(),D) + numpy.eye(D.shape[1]) )
    DtY  = numpy.dot( D.transpose(), Y )

    for iter in range(num_iterations):
        print('    Sparse coding iteration [{}/{}]'.format(iter+1,num_iterations) )
        # primary update
        X = numpy.dot( iDtD, DtY + Z - U )

        # splitting variable update
        T = X + U
        Z = numpy.maximum( numpy.abs(T) - rho, 0.0)*numpy.sign(T)

        # lagrange multiplier update
        U = T - Z
    return X, Z, U

def dictionary_learning( Y, num_atoms, rho=0.001, num_outer_iterations=10, num_inner_iterations=10, epsilon=1e-8 ):
    # initialize the dictionary and weights
    X = numpy.random.standard_normal( (num_atoms, Y.shape[1]) )
    D = numpy.random.standard_normal( (Y.shape[0], num_atoms) )

    # outer loop to fit dictionary and weights
    Z = X.copy()
    U = X - Z
    for outer in range(num_outer_iterations):
        print( '  Outer iteration [{}/{}]'.format(outer+1,num_outer_iterations) )

        # dictionary update
        D = numpy.linalg.solve(
        for i in range(D.shape[1]):
            D[:,i] /= numpy.linalg.norm(D[:,i])

        # sparse coding weight update
        X, Z, U = sparse_coding( Y, D, X, rho, num_inner_iterations, Z, U )

        # print some stats
        print( '    ||Y-DX|| RMS error: {}'.format( numpy.sqrt(numpy.mean(numpy.square(Y - numpy.dot(D,Z)))) ) )
        print( '          mean(nnz(X)): {}'.format( numpy.mean( numpy.sum(numpy.abs(Z)>1e-4, axis=0) ) ) )

    # return dictionary and solution variables
    return D, Z
Next → Page 1 of 2