Function Approximation
Function approximation replaces complex functions with simpler ones that are close enough to be useful. This file covers linearisation, Taylor series, polynomial approximation, Fourier series, and the universal approximation theorem -- the theoretical backbone of why neural networks can learn arbitrary mappings.
-
Many functions we encounter are too complex to work with directly. Computing \(e^{0.1}\) on paper, predicting the trajectory of a satellite, etc. all involve functions that do not have simple closed-form answers.
-
Function approximation replaces a complicated function with a simpler one that is "close enough" over the region we care about.
-
The most natural approximation is a polynomial. Polynomials are just sums of powers of \(x\) with coefficients, and they are easy to evaluate, differentiate, and integrate.
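For instance, evaluating and differentiating a small polynomial takes only a couple of lines (a sketch using JAX's NumPy API; the polynomial here is arbitrary):

```python
import jax.numpy as jnp

# p(x) = 0.5x^2 + 3x + 2; polyval expects the highest-degree coefficient first
coeffs = jnp.array([0.5, 3.0, 2.0])
p_at_2 = jnp.polyval(coeffs, 2.0)       # 0.5*4 + 3*2 + 2 = 10

# The power rule gives p'(x) = x + 3, which is just another (smaller) polynomial
dcoeffs = jnp.array([1.0, 3.0])
dp_at_2 = jnp.polyval(dcoeffs, 2.0)     # 2 + 3 = 5
```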
-
But why do polynomials work so well as approximators? Consider what each power of \(x\) contributes.
- The constant term \(a_0\) sets the baseline value.
- The \(a_1 x\) term adds a slope.
- The \(a_2 x^2\) term adds curvature.
- Each higher power captures finer detail about the function's shape.
-
By choosing the right coefficients, we can match a function's value, slope, curvature, and higher-order behaviour at a point, one piece at a time.
-
With enough terms, the polynomial can mimic almost any smooth function.
-
The question becomes: how do we find the right coefficients?
-
Linearisation is the simplest approximation. Near a point \(x = a\), we replace the function with its tangent line:

\[L(x) = f(a) + f'(a)(x - a)\]
-
This is the first-order Taylor approximation. It says: start at the known value \(f(a)\), then adjust by the slope times the distance from \(a\).
-
For example, linearise \(\sin(x)\) at \(x = 0\): \(f(0) = 0\), \(f'(0) = \cos(0) = 1\), so \(L(x) = x\). Near zero, \(\sin(x) \approx x\). Try it: \(\sin(0.1) = 0.0998\ldots \approx 0.1\).
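Because JAX can differentiate functions for us, linearisation at any point is a two-line recipe (a sketch; the helper name `linearise` is our own):

```python
import jax
import jax.numpy as jnp

def linearise(f, a):
    """Tangent-line approximation L(x) = f(a) + f'(a)(x - a)."""
    fa, slope = f(a), jax.grad(f)(a)
    return lambda x: fa + slope * (x - a)

lin = linearise(jnp.sin, 0.0)
print(lin(0.1))        # 0.1, the tangent-line estimate
print(jnp.sin(0.1))    # 0.0998..., the true value
```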
-
But linearisation is only good very close to \(a\). Move further away and the approximation falls apart. To do better, we include higher-order terms.
-
The Taylor series represents a function as an infinite sum of polynomial terms, each capturing finer detail about the function's behaviour near a point \(a\):

\[f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots\]
-
Each successive term adds a correction. The first term matches the value, the second matches the slope, the third matches the curvature, and so on. The more terms we include, the larger the region where the approximation is accurate.
-
The \(n!\) in the denominator is not arbitrary. When you differentiate \((x - a)^n\) exactly \(n\) times, you get \(n!\). The factorial cancels this out, ensuring that the \(n\)-th derivative of the Taylor polynomial equals the \(n\)-th derivative of the original function at \(x = a\).
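This coefficient rule \(f^{(k)}(a)/k!\) can be implemented directly with repeated differentiation (a sketch; `taylor_poly` is a hypothetical helper, and calling `jax.grad` repeatedly is a simple but not especially efficient way to get higher derivatives):

```python
import jax
import jax.numpy as jnp
from math import factorial

def taylor_poly(f, a, n):
    """Degree-n Taylor polynomial of f at a, via coefficients f^(k)(a)/k!."""
    coeffs, g = [], f
    for k in range(n + 1):
        coeffs.append(float(g(a)) / factorial(k))
        g = jax.grad(g)        # differentiate once more for the next term
    return lambda x: sum(c * (x - a) ** k for k, c in enumerate(coeffs))

T3 = taylor_poly(jnp.sin, 0.0, 3)   # x - x^3/6 near zero
print(T3(0.5), float(jnp.sin(0.5)))
```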
-
A Maclaurin series is simply a Taylor series centred at \(a = 0\):

\[f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n\]

Some famous Maclaurin series:
- \(e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots\)
- \(\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\)
- \(\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots\)
-
Notice that \(\sin x\) has only odd powers (it is an odd function) and \(\cos x\) has only even powers (it is an even function). The alternating signs cause the approximation to oscillate around the true value, converging from both sides.
-
Let us approximate \(e^{0.5}\) using four terms: \(1 + 0.5 + \frac{0.25}{2} + \frac{0.125}{6} = 1 + 0.5 + 0.125 + 0.02083 \approx 1.6458\). The true value is \(1.6487\ldots\), so four terms already agree to two decimal places, with an error of about \(0.003\).
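The same arithmetic in code (a quick check, nothing more):

```python
from math import exp, factorial

x = 0.5
partial = sum(x**k / factorial(k) for k in range(4))  # terms k = 0..3
print(partial)   # ~1.6458, the four-term approximation
print(exp(x))    # ~1.6487, the true value
```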
-
Not every Taylor series converges everywhere. The radius of convergence tells us how far from the centre \(a\) the series gives valid results. Within that radius, the polynomial approximation can be made as accurate as we want by adding more terms. Outside it, the series diverges.
-
A power series is the general form: \(\sum_{n=0}^{\infty} a_n (x - c)^n\). Taylor series are power series where the coefficients are determined by derivatives. Other power series might be defined by some other rule. The ratio test determines convergence: compute \(\lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right|\). If this limit is \(L\), the radius of convergence is \(R = 1/L\) (and if \(L = 0\), the series converges for all \(x\)).
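As an illustration, take the Maclaurin series of \(\ln(1 + x)\), whose coefficients are \(a_n = (-1)^{n+1}/n\). The ratio \(|a_{n+1}/a_n| = n/(n+1)\) tends to \(1\), so \(R = 1\) (a sketch; we simply watch the ratio approach its limit numerically):

```python
def coeff_ratio(n):
    # |a_{n+1} / a_n| for a_n = (-1)^(n+1) / n, i.e. n / (n + 1)
    a_n = 1 / n
    a_next = 1 / (n + 1)
    return a_next / a_n

for n in [10, 100, 10_000]:
    print(n, coeff_ratio(n))   # approaches L = 1, so R = 1/L = 1
```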
-
When we truncate a Taylor series after \(n\) terms, we incur an error. The Lagrange remainder bounds this error:

\[R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!} (x - a)^{n+1}\]
-
Here \(c\) is some unknown point between \(a\) and \(x\). We do not know \(c\) exactly, but we can often bound \(|f^{(n+1)}(c)|\) to get a worst-case error estimate. The \((n+1)!\) in the denominator grows extremely fast, so the error shrinks rapidly as we add more terms (for functions within the radius of convergence).
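For the \(e^{0.5}\) example above, every derivative of \(e^x\) is \(e^x\) itself, and \(e^c \le e^{0.5} < 2\) for \(c\) between \(0\) and \(0.5\), so a crude worst-case bound is easy to check (a sketch, assuming the four-term truncation from earlier):

```python
from math import exp, factorial

x, n = 0.5, 3                                  # four terms = degree 3
approx = sum(x**k / factorial(k) for k in range(n + 1))
actual_error = abs(exp(x) - approx)
# |R_3| <= max|f^(4)(c)| * x^4 / 4!, and e^c < 2 on [0, 0.5]
bound = 2 * x ** (n + 1) / factorial(n + 1)
print(actual_error, bound)   # the actual error sits below the bound
```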
-
For a function of multiple variables, the Taylor expansion includes mixed partial derivatives. The second-order approximation of \(f(\mathbf{x})\) around a point \(\mathbf{a}\) is:

\[f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^\top (\mathbf{x} - \mathbf{a}) + \tfrac{1}{2} (\mathbf{x} - \mathbf{a})^\top \mathbf{H}(\mathbf{a}) (\mathbf{x} - \mathbf{a})\]
-
The first term is the value, the second uses the gradient (a vector, as we saw in multivariate calculus), and the third uses the Hessian matrix (which captures curvature). This connects our matrices chapter directly to calculus: the Hessian is a matrix of second derivatives that describes the shape of the function's surface.
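In JAX this quadratic model takes only a few lines, since `jax.grad` and `jax.hessian` give us the gradient and Hessian directly (a sketch; the function and expansion point are arbitrary):

```python
import jax
import jax.numpy as jnp

def f(v):
    x, y = v
    return jnp.sin(x) * jnp.exp(y)

a = jnp.array([0.3, 0.2])
g = jax.grad(f)(a)       # gradient vector at a
H = jax.hessian(f)(a)    # 2x2 matrix of second partial derivatives

def quad(v):
    """Second-order Taylor approximation of f around a."""
    d = v - a
    return f(a) + g @ d + 0.5 * d @ H @ d

v = jnp.array([0.35, 0.25])
print(quad(v), f(v))     # nearly identical this close to a
```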
-
This multivariate second-order approximation is the foundation of Newton's method and other second-order optimisation techniques, which we will see in the next file.
-
Beyond polynomials, there are other approximation methods worth knowing about:
- Spline interpolation: instead of one high-degree polynomial, use many low-degree polynomials stitched together smoothly. This avoids the wild oscillations that high-degree polynomials can produce.
- Fourier series: approximate periodic functions as sums of sines and cosines. Essential in signal processing and audio.
- Neural networks: universal function approximators. With enough neurons, they can approximate any continuous function to arbitrary accuracy. This is the theoretical justification for deep learning.
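To make the Fourier idea concrete, here are partial sums for a square wave, built from its known series \(\frac{4}{\pi}\sum_{k=0}^{\infty} \frac{\sin((2k+1)x)}{2k+1}\) (a sketch; more terms sharpen the edges, though small "Gibbs" ripples persist at the jumps):

```python
import jax.numpy as jnp

x = jnp.linspace(0, 2 * jnp.pi, 400)

def square_partial(n_terms):
    """First n_terms of the square wave's Fourier series."""
    return (4 / jnp.pi) * sum(jnp.sin((2 * k + 1) * x) / (2 * k + 1)
                              for k in range(n_terms))

approx = square_partial(25)
# approx is close to +1 on (0, pi) and -1 on (pi, 2*pi)
```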
-
A function is called "well-behaved" if it has properties that make approximation reliable: continuity (no jumps), differentiability (no sharp corners), smoothness (derivatives of all orders exist), and boundedness (outputs stay finite).
-
Polynomials, exponentials, and trigonometric functions are all well-behaved. The better-behaved a function is, the fewer Taylor terms you need for a good approximation.
Coding Tasks (use Colab or a notebook)
-
Approximate \(e^x\) using increasing numbers of Taylor terms and visualise how the approximation improves.
```python
import math

import jax.numpy as jnp
import matplotlib.pyplot as plt

x = jnp.linspace(-2, 3, 300)
plt.plot(x, jnp.exp(x), "k-", linewidth=2, label="eˣ (exact)")

colors = ["#e74c3c", "#3498db", "#27ae60", "#9b59b6"]
for n, color in zip([1, 2, 4, 8], colors):
    # Partial sum of the Maclaurin series: sum of x^k / k! for k = 0..n
    approx = sum(x**k / math.factorial(k) for k in range(n + 1))
    plt.plot(x, approx, color=color, linestyle="--", label=f"{n} terms")

plt.ylim(-2, 15)
plt.legend()
plt.title("Taylor approximation of eˣ")
plt.show()
```
-
Compute the Lagrange remainder to bound the error of approximating \(\sin(1)\) with different numbers of Taylor terms.
```python
import math

import jax.numpy as jnp

x = 1.0
exact = jnp.sin(x)
taylor = 0.0
for n in range(8):
    sign = (-1) ** n
    # n-th term of sin's Maclaurin series: (-1)^n x^(2n+1) / (2n+1)!
    taylor += sign * x ** (2 * n + 1) / math.factorial(2 * n + 1)
    error = abs(exact - taylor)
    # Lagrange bound: magnitude of the first omitted term, x^(2n+3) / (2n+3)!
    bound = x ** (2 * n + 3) / math.factorial(2 * n + 3)
    print(f"terms={n+1} approx={taylor:.10f} error={error:.2e} bound={bound:.2e}")
```
-
Compare linearisation vs quadratic Taylor approximation of \(\cos(x)\) near \(x = 0\). Plot both approximations alongside the true function and observe the range where each is accurate.
```python
import jax.numpy as jnp
import matplotlib.pyplot as plt

x = jnp.linspace(-3, 3, 300)
plt.plot(x, jnp.cos(x), "k-", linewidth=2, label="cos(x)")
plt.plot(x, jnp.ones_like(x), "--", color="#e74c3c", label="linear: 1")
plt.plot(x, 1 - x**2 / 2, "--", color="#3498db", label="quadratic: 1 - x²/2")
plt.plot(x, 1 - x**2 / 2 + x**4 / 24, "--", color="#27ae60", label="4th order")
plt.ylim(-2, 2)
plt.legend()
plt.title("Taylor approximations of cos(x)")
plt.show()
```