
Derivation of Rotation and Translation Matrices for Face Euler‑Angle Estimation

The article explains how to derive rotation and translation matrices through three successive coordinate transformations—world to camera, camera to image plane, and image plane to pixel—to accurately estimate facial Euler angles for face‑recognition tasks.

New Oriental Technology

Background

In face‑recognition tasks, improving accuracy hinges on precise estimation of facial Euler angles, which requires mapping real‑world facial coordinates to image coordinates. This is typically done by establishing a relationship between facial key‑point coordinates and image key‑point coordinates to solve a rotation and translation matrix that provides a reliable transformation model.

Triple Coordinate Transformations

World Coordinate System → Camera Coordinate System

A point P with coordinates (Xw, Yw, Zw) in the world coordinate system (origin Ow) is transformed into the camera coordinate system by applying a rotation matrix R (3×3) and a translation vector T (3×1). Rotating about one axis at a time — first Z, then X, then Y — produces the individual rotation matrices R1, R2, and R3, which are multiplied together to obtain the overall rotation matrix.

From the diagram, the transformation from point P1 to P can be expressed, leading to the derivation of matrix R1 . Similar procedures yield matrices R2 and R3 .

Combining the three rotation matrices gives the final Euler‑angle rotation matrix.
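The composition of the three per-axis rotations can be sketched as follows; the axis order (Z, then X, then Y) and the angle names are assumptions made for illustration, since the article's diagrams are not reproduced here:

```python
import numpy as np

def euler_rotation(theta_z, theta_x, theta_y):
    """Combine rotations about the Z, X, and Y axes (R1, R2, R3 in
    the text) into the overall Euler-angle rotation matrix."""
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    R1 = np.array([[cz, -sz, 0],        # rotation about Z
                   [sz,  cz, 0],
                   [ 0,   0, 1]])
    R2 = np.array([[1,  0,   0],        # rotation about X
                   [0, cx, -sx],
                   [0, sx,  cx]])
    R3 = np.array([[ cy, 0, sy],        # rotation about Y
                   [  0, 1,  0],
                   [-sy, 0, cy]])
    return R3 @ R2 @ R1
```

Any such product of per-axis rotations is itself a proper rotation: orthogonal with determinant +1.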

Camera Coordinate System → Image‑Plane Coordinate System

The transformation from camera coordinates to image-plane coordinates uses the focal length f and the similar-triangle (perspective) projection of the pinhole camera model.

The resulting matrix relates camera coordinates to image‑plane coordinates (no subscript denotes image‑plane).
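A minimal sketch of this projection step, assuming the standard pinhole relation derived from similar triangles (x = f·Xc/Zc, y = f·Yc/Zc):

```python
def camera_to_image_plane(Xc, Yc, Zc, f):
    """Project a camera-frame point onto the image plane at focal
    length f. By similar triangles: x = f*Xc/Zc, y = f*Yc/Zc."""
    x = f * Xc / Zc
    y = f * Yc / Zc
    return x, y

# Example: a point 4 units in front of the camera, f = 2
# camera_to_image_plane(2.0, 1.0, 4.0, 2.0) -> (1.0, 0.5)
```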

Image‑Plane Coordinate System → Pixel Coordinate System

Each pixel measures x' mm in width and y' mm in height. Together with the principal-point offsets u0 and v0 (the optical centre expressed in pixel coordinates), these define the mapping equations.

The resulting transformation between image-plane coordinates and pixel coordinates (u, v) is expressed by the matrix below.
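The mapping just described can be sketched directly from the definitions above (x' and y' are the pixel width and height, u0 and v0 the principal-point offsets):

```python
def image_plane_to_pixel(x, y, px, py, u0, v0):
    """Convert image-plane coordinates (in mm) to pixel coordinates.
    px, py are the pixel width/height x', y' in mm; u0, v0 are the
    principal-point offsets: u = x/x' + u0, v = y/y' + v0."""
    u = x / px + u0
    v = y / py + v0
    return u, v

# Example: 0.01 mm pixels, principal point at (320, 240)
# image_plane_to_pixel(1.0, 0.5, 0.01, 0.01, 320, 240) -> (420.0, 290.0)
```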

Deriving Rotation and Translation Matrices

An additional scale factor K is introduced, leading to the equation shown below.

By assuming K = 1, three independent parameters a, b, c form an antisymmetric matrix S. Together with the identity matrix I, they parameterize the rotation via the relation R = (I + S)(I − S)⁻¹ (the Cayley transform).

Substituting S into the previous equation yields a set of linear equations that can be solved for the rotation matrix once at least three point correspondences are available. After obtaining R , the translation vector T is derived by plugging R back into the coordinate‑transformation equations.

Final Transformation Matrix

The complete transformation from world coordinates to pixel coordinates is summarized in the matrix below, where the first block represents the camera intrinsic parameters and the second block the extrinsic parameters.

The terms fx and fy express the focal length in pixel units (f divided by the pixel width x' and height y', respectively), reflecting the spatial resolution of the sensor.
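Chaining the three transformations gives the familiar pinhole projection [u, v, 1]ᵀ ∼ K [R | T] [Xw, Yw, Zw, 1]ᵀ, with K the intrinsic matrix and [R | T] the extrinsics. A minimal sketch under that standard model (parameter values are illustrative):

```python
import numpy as np

def project_world_to_pixel(P_w, R, T, fx, fy, u0, v0):
    """Map a world point to pixel coordinates: extrinsics (R, T) take
    world -> camera, intrinsics K take camera -> pixel."""
    K = np.array([[fx,  0, u0],
                  [ 0, fy, v0],
                  [ 0,  0,  1.0]])
    P_c = R @ np.asarray(P_w, dtype=float) + np.asarray(T, dtype=float)
    uvw = K @ P_c                       # homogeneous pixel coordinates
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Example: identity extrinsics, fx = fy = 200, principal point (320, 240)
# project_world_to_pixel((2, 1, 4), np.eye(3), (0, 0, 0), 200, 200, 320, 240)
# -> (420.0, 290.0)
```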

Conclusion

Through the three successive transformations, a closed‑form equation linking any world point to its pixel location is obtained, enabling the extraction of rotation and translation parameters and thus rapid estimation of facial displacement and Euler angles.

Tags: computer vision, face recognition, camera calibration, Euler angles, rotation matrix
Written by

New Oriental Technology

Practical internet development experience, tech sharing, knowledge consolidation, and forward-thinking insights.
