## Tsai Camera Calibration

Projection of known 3D points into a pinhole camera can be modeled by the following equation:

The matrix \(K\) contains the camera intrinsics and the matrix \(E\) contains the camera extrinsics. The final image coordinates are obtained by perspective division: \(u_c = \hat{u}_c/\hat{w}_c\) and \(v_c = \hat{v}_c/\hat{w}_c\).

Estimating \(K\) and \(E\) jointly can be difficult because the equations are non-linear and may converge slowly or to a poor solution. Tsai's algorithm provides a way to obtain initial estimates of the parameters for subsequent refinement. It requires a known calibration target that can be either planar or three-dimensional.

The key step is in estimating the rotation and x-y translation of the extrinsics \(E\) in such a way that the unknown focal length is eliminated. Once the rotation is known, the focal-length and z translation can be computed.

This process requires the optical center \([u_0, v_0]\) which is also unknown. However, given reasonable quality optics this is often near the image center so the center can be used as an initial estimate.

This is the starting point for the estimation.

## Estimating (most of) the Extrinsics

This is divided into two cases, one for planar targets and one for 3D targets.

**Case 1: 3D Non-planar Target**:

For calibration targets with 8+ points that span 3 dimensions. Points in the camera frame prior to projection are:

The image points relative to the optical center are then:

The ratio of these two quantities gives the equation:

or, after cross-multiplying and collecting terms to one side:

Each point produces a single equation like the above which can be stacked to form a homogeneous system. With a zero right-hand-side, scalar multiples of the solution still satisfy the equations and a trivial all-zero solution exists. Calling the system matrix \(M\), the solution to this equation system is the eigenvector associated with the smallest eigenvalue of \(M^T M\). The computed solution will be proportional to the desired solution. It is also possible to fix one value, e.g. \(t_y=1\) and solve an inhomogeneous system but this may give problems if \(t_y \neq 0\) for the camera configuration.

The recovered \(r_x = [ r_{xx}, r_{xy}, r_{xz} ]\) and \(r_x = [ r_{yx}, r_{yy}, r_{yz} ]\) should be unit length. The scale of the solution can be recovered as the mean length of these vectors and used to inversely scale the translations \(t_x, t_y\).

**Case 2: 2D Planar Targets**:

For 2D targets, arbitrarily set \(z_w=0\), leading to:

Use the same process as before, a homogeneous system can be formed. It is identical to the previous one, except the \(r_{xz}\) and \(r_{yz}\) terms are dropped since \(z_w=0\). Each equation ends up being:

Solving this gives \(r_{xx}, r_{xy}, r_{zx}, r_{zy}, t_x, t_y\) up to a scalar \(k\). It does not provide \(r_{xz}, r_{yz}\). In addition, we know that the rows of the rotation submatrix are unit-norm and orthogonal, giving three equations:

After some algebraic manipulations the factor \(k^2\) can be found as:

This allows \(r_{xz}, r_{yz}\) to be found as:

## Estimate Remaining Intrinsics and Focal Length

Estimating the focal length and z-translation follows a similar process. Using the definitions above we have:

Cross-multiplying and collecting gives:

Solving this system gives the desired \(f, t_x\) values.