
Linear Transformations

Every matrix multiplication is a linear transformation -- a function that reshapes, rotates, or projects vectors while preserving linearity. This file covers rotation, reflection, scaling, shearing, affine transformations, degenerate (singular) maps, and how neural network layers chain these transformations.

  • A linear transformation (or linear map) is a function that takes a vector and produces another vector, while preserving addition and scaling. If \(T\) is linear, then:

    • \(T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})\)
    • \(T(c\mathbf{u}) = cT(\mathbf{u})\)
  • Every linear transformation can be represented as multiplication by a matrix. The matrix is the transformation. When you multiply a vector by a matrix, you are applying a linear transformation to it.

  • Think of a \(2 \times 2\) matrix as a machine that takes in 2D vectors and outputs new 2D vectors. The columns of the matrix tell you where the standard basis vectors \(\hat{\mathbf{i}}\) and \(\hat{\mathbf{j}}\) end up after the transformation. Everything else follows from linearity.

The columns of a matrix show where the basis vectors land

  • For example, if
\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]

then \(\hat{\mathbf{i}} = [1, 0]^T\) lands at \([2, 1]^T\) (column 1) and \(\hat{\mathbf{j}} = [0, 1]^T\) lands at \([1, 2]^T\) (column 2). Every other vector is a combination of these two, so its output follows automatically.

  • Multiplying two matrices can be thought of as applying one transformation after another. If \(B\) transforms vectors from one space and \(A\) transforms the result, then \(AB\) does both in sequence. In a game engine, rotating a character and then moving them forward is a different result from moving them first and then rotating, which is why matrix multiplication is not commutative.
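The order-dependence is easy to verify numerically. Translation is not itself a matrix (it is covered later under affine transformations), so this sketch uses a rotation and a scaling instead:

```python
import jax.numpy as jnp

theta = jnp.pi / 2
R = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
               [jnp.sin(theta),  jnp.cos(theta)]])  # rotate 90 degrees
S = jnp.array([[2.0, 0.0],
               [0.0, 1.0]])                          # stretch x by 2

v = jnp.array([1.0, 0.0])

# Rotate first, then scale: v points up, so the x-stretch does nothing.
print(S @ (R @ v))   # approx [0, 1]
# Scale first, then rotate: v stretches to [2, 0], then rotates up.
print(R @ (S @ v))   # approx [0, 2] -- a different result
```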

  • Rotation turns vectors by an angle \(\theta\) without changing their length. The vector stays the same size, it just points in a new direction.

Rotation preserves length but changes direction

  • In 2D, the rotation matrix is:
\[ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]
  • For \(\theta = 90°\):
\[ R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]

so \([1, 0]^T\) becomes \([0, 1]^T\). The vector pointing right now points up. Rotation matrices are orthogonal and always have determinant 1. When you rotate a photo on your phone, this is the exact matrix being applied to every pixel coordinate.
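Both claims -- the 90° rotation sends right to up, and rotation matrices are orthogonal with determinant 1 -- can be checked directly:

```python
import jax.numpy as jnp

theta = jnp.deg2rad(90.0)
R = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
               [jnp.sin(theta),  jnp.cos(theta)]])

v = jnp.array([1.0, 0.0])
print(R @ v)               # approx [0, 1]: right now points up

# Orthogonal: R^T R = I, and det(R) = 1
print(R.T @ R)             # approx the identity matrix
print(jnp.linalg.det(R))   # approx 1.0
```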

  • In 3D, there are separate rotation matrices for each axis. A robotic arm rotates each joint around a specific axis, and each joint is one rotation matrix. Rotation around the z-axis looks like the 2D case embedded in 3D:
\[ R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
  • Scaling stretches or shrinks vectors along each axis independently:
\[ S(s_x, s_y) = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \]

Scaling stretches each axis by a different factor

  • \(S(2, 1.5)\) doubles the x-component and multiplies the y-component by 1.5. Scaling by \(-1\) along an axis flips that component. A diagonal matrix is always a scaling transformation. When you resize an image to 50%, you are applying \(S(0.5, 0.5)\) to every pixel coordinate.
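A minimal check of \(S(2, 1.5)\) on the example vector \([3, 2]^T\):

```python
import jax.numpy as jnp

S = jnp.array([[2.0, 0.0],
               [0.0, 1.5]])

v = jnp.array([3.0, 2.0])
print(S @ v)      # [6. 3.]: x doubled, y scaled by 1.5

# Resizing an image to 50% applies S(0.5, 0.5) to every pixel coordinate.
half = jnp.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(half @ v)   # [1.5 1. ]
```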

  • Reflection flips vectors across an axis or line, like a mirror. Reflecting across the x-axis keeps the x-component and negates the y-component:

\[ \text{Ref}_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]

Reflection across the x-axis flips the y-component

  • For example, \([3, 2]^T\) becomes \([3, -2]^T\). When your phone flips a selfie horizontally so text reads correctly, it is applying a reflection matrix. Reflecting across the line \(y = x\) swaps the two components:
\[ \text{Ref}_{y=x} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]
  • Reflection matrices have determinant \(-1\), confirming they flip orientation.
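Both reflections, and their determinants, verified on the example \([3, 2]^T\):

```python
import jax.numpy as jnp

ref_x = jnp.array([[1.0,  0.0],
                   [0.0, -1.0]])   # reflect across the x-axis
ref_yx = jnp.array([[0.0, 1.0],
                    [1.0, 0.0]])   # reflect across the line y = x

v = jnp.array([3.0, 2.0])
print(ref_x @ v)    # [ 3. -2.]: y-component negated
print(ref_yx @ v)   # [2. 3.]: components swapped

print(jnp.linalg.det(ref_x))    # -1.0
print(jnp.linalg.det(ref_yx))   # -1.0
```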

  • Rotations and reflections are both rigid transformations: they preserve distances and angles. The matrices that represent them are orthogonal, and an orthogonal matrix always has determinant \(+1\) (rotation) or \(-1\) (reflection).

  • Shearing skews vectors along one axis proportionally to the other. A horizontal shear by factor \(k\):

\[ \text{Sh}_x(k) = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \]

Shearing slides the top sideways while the bottom stays fixed

  • Each point slides horizontally by \(k\) times its height. With \(k = 0.5\), a point at height 2 shifts right by 1. The bottom row stays put, the top row slides. This is how italic text works: upright letters are sheared so they slant to the right.
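The worked numbers from this bullet, checked in JAX:

```python
import jax.numpy as jnp

k = 0.5
shear = jnp.array([[1.0, k],
                   [0.0, 1.0]])

p = jnp.array([0.0, 2.0])      # a point at height 2
print(shear @ p)               # [1. 2.]: shifted right by k * height = 1

base = jnp.array([1.0, 0.0])   # a point on the bottom row (height 0)
print(shear @ base)            # [1. 0.]: stays put
```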

  • All of the above (rotation, scaling, reflection, shearing) are linear transformations. They keep the origin fixed and preserve straight lines. But what about translation (shifting everything by a fixed amount)?

  • Translation is not a linear transformation because it moves the origin. If you shift every point right by 3, the zero vector moves to \([3, 0]^T\), breaking linearity. To handle it, we use an affine transformation, which combines a linear transformation with a translation:

\[\mathbf{y} = A\mathbf{x} + \mathbf{t}\]
  • To represent this as a single matrix multiplication, we use homogeneous coordinates: add an extra 1 to every vector and use an \((n+1) \times (n+1)\) matrix:
\[ \begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \begin{bmatrix} A\mathbf{x} + \mathbf{t} \\ 1 \end{bmatrix} \]
  • Affine transformations preserve straight lines and parallelism, but not necessarily angles or lengths. Every object in a video game is positioned using affine transformations: rotate it, scale it, then place it at the right location, all encoded in a single matrix.
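A sketch of the homogeneous-coordinate trick, using a 90° rotation for \(A\) and \(\mathbf{t} = [3, 0]^T\) as illustrative values:

```python
import jax.numpy as jnp

theta = jnp.deg2rad(90.0)
A = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
               [jnp.sin(theta),  jnp.cos(theta)]])   # linear part
t = jnp.array([3.0, 0.0])                            # translation part

# Build the (n+1) x (n+1) matrix [[A, t], [0^T, 1]]
M = jnp.block([[A, t[:, None]],
               [jnp.zeros((1, 2)), jnp.ones((1, 1))]])

x = jnp.array([1.0, 0.0])
x_h = jnp.append(x, 1.0)   # homogeneous coordinates: append a 1

print(M @ x_h)             # approx [3, 1, 1]: rotated to [0, 1], then shifted
print(A @ x + t)           # same result, computed without the extra coordinate
```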

  • A degenerate transformation (singular matrix) collapses space into a lower dimension.

  • For example, the matrix

\[ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \]

maps every 2D vector onto a single line, because both columns point in the same direction. The determinant is zero, information is lost, and the transformation cannot be undone.
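Checking the collapse numerically -- every output lands on the line \(y = 2x\):

```python
import jax.numpy as jnp

D = jnp.array([[1.0, 2.0],
               [2.0, 4.0]])

print(jnp.linalg.det(D))   # 0.0 -- singular, cannot be inverted

# A few sample inputs: every image satisfies y = 2x
for v in [jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0]), jnp.array([3.0, -1.0])]:
    w = D @ v
    print(w, bool(w[1] == 2 * w[0]))
```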

  • Converting a colour image (3 values per pixel: red, green, blue) to grayscale (1 value per pixel) is a degenerate transformation: the colour information is permanently gone.
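As a sketch, one common grayscale formula (the ITU-R BT.601 luma weights; other weightings exist) is a \(1 \times 3\) matrix applied to each RGB pixel -- a rank-1 map from 3D to 1D:

```python
import jax.numpy as jnp

# BT.601 luma weights: gray = 0.299 R + 0.587 G + 0.114 B
G = jnp.array([[0.299, 0.587, 0.114]])

red   = jnp.array([1.0, 0.0, 0.0])
green = jnp.array([0.0, 1.0, 0.0])
print(G @ red)     # [0.299]
print(G @ green)   # [0.587]

# Many distinct colours map to the same gray value, so the original
# colour cannot be recovered: the transformation is not invertible.
```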

  • In ML, linear transformations are the core of neural networks. Data is represented as a matrix: a stack of vectors, each encoding the features of some object (a human, a plane, a piece of text, an image...anything!).

  • Each layer applies a matrix multiplication (a linear transformation). The details are covered in later chapters, where we explain how to structure this data and motivate neural networks properly.

  • However, the most widely used architectures today pass data almost exclusively through a stack of linear transformations. We call these Transformers.

  • Gemini, ChatGPT, Claude, Qwen, and DeepSeek, the best-performing AI systems in the world today, are all Transformers!

Coding Tasks (use Colab or a notebook)

  1. Apply a rotation matrix to a vector and plot both the original and rotated vector. Try different angles.

    import jax.numpy as jnp
    import matplotlib.pyplot as plt
    
    theta = jnp.pi / 3
    R = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
                   [jnp.sin(theta),  jnp.cos(theta)]])
    
    v = jnp.array([1.0, 0.0])
    v_rot = R @ v
    
    plt.figure(figsize=(5, 5))
    plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='red', label='original')
    plt.quiver(0, 0, v_rot[0], v_rot[1], angles='xy', scale_units='xy', scale=1, color='blue', label='rotated')
    plt.xlim(-1.5, 1.5); plt.ylim(-1.5, 1.5)
    plt.grid(True); plt.legend(); plt.gca().set_aspect('equal')
    plt.show()
    

  2. Apply a shearing transformation to a set of points forming a square and visualise the deformed shape.

    import jax.numpy as jnp
    import matplotlib.pyplot as plt
    
    square = jnp.array([[0,0],[1,0],[1,1],[0,1],[0,0]]).T
    
    k = 0.5
    shear = jnp.array([[1, k],
                       [0, 1]])
    sheared = shear @ square
    
    plt.figure(figsize=(6, 4))
    plt.plot(square[0], square[1], 'r-o', label='original')
    plt.plot(sheared[0], sheared[1], 'b-o', label='sheared')
    plt.grid(True); plt.legend(); plt.gca().set_aspect('equal')
    plt.show()