Finding where a function equals zero is one of the most common computational problems in mathematics. Newton’s method, also called the Newton-Raphson method, is an iterative algorithm that converges remarkably fast when it works. The idea is simple: approximate the function by its tangent line at the current guess, find where the tangent crosses zero, and use that as the next guess. Repeat until you are close enough.
Mathematically, if x_n is your current guess, the next guess is x_(n+1) = x_n - f(x_n) / f’(x_n). This requires evaluating both the function and its derivative at each step. The Newton-Raphson calculator implements this algorithm and shows you the convergence history step by step.
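The update rule can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual implementation; the function names and tolerance are chosen for the example:

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration: x_{n+1} = x_n - f(x_n) / f'(x_n).
    Returns the final estimate and the full iterate history."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:          # close enough to a root
            break
        x = x - fx / f_prime(x)
        history.append(x)
    return x, history

# Example: sqrt(2) as the positive root of f(x) = x^2 - 2.
root, steps = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

The returned history is what a convergence table shows: each entry is one tangent-line step.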
Newton’s method converges quadratically, meaning the number of correct digits roughly doubles with each iteration. Starting with one correct digit, you get two, then four, then eight. This is dramatically faster than the bisection method (which converges linearly, gaining a fixed amount of accuracy per step) or the secant method (which converges superlinearly, with order about 1.618). For well-behaved functions near a simple root, Newton’s method reaches machine precision in just a handful of iterations.
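Quadratic convergence is easy to observe numerically. A small sketch, using math.sqrt(2) as the reference root:

```python
import math

def newton_errors(f, f_prime, x0, true_root, n=4):
    """Record the absolute error after each Newton step."""
    x, errors = x0, []
    for _ in range(n):
        x = x - f(x) / f_prime(x)
        errors.append(abs(x - true_root))
    return errors

errors = newton_errors(lambda x: x * x - 2, lambda x: 2 * x, 1.0, math.sqrt(2))
# Each error is roughly the square of the one before it:
# about 8.6e-2, 2.5e-3, 2.1e-6, 1.6e-12.
```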
The method can fail in several ways. If the derivative is zero or very small, the tangent is nearly horizontal and the next guess shoots off to infinity. If the initial guess is far from the root, the method may converge to a different root or not converge at all. For functions with multiple roots, convergence depends on the starting point. Drawing a picture of the function and its tangent lines helps predict behavior.
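One classic failure mode is easy to reproduce. For f(x) = x³ − 2x + 2, which has a real root near x ≈ −1.77, starting at x₀ = 0 makes the iterates bounce between 0 and 1 forever. A minimal sketch (helper name illustrative):

```python
def newton_steps(f, f_prime, x0, n):
    """Run exactly n Newton steps and return every iterate."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))
    return xs

# f(x) = x^3 - 2x + 2: from x0 = 0 the method enters a 2-cycle
# and never reaches the real root near -1.77.
cycle = newton_steps(lambda x: x**3 - 2 * x + 2, lambda x: 3 * x**2 - 2, 0.0, 6)
# cycle == [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```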
Modified Newton methods handle cases where the derivative is expensive or impossible to compute. The secant method approximates the derivative using the two most recent function values. Quasi-Newton methods such as BFGS build up an approximation to the second-derivative (Hessian) information over multiple steps instead of computing it directly. These variants are essential in optimization, where computing exact second derivatives of a loss function with millions of parameters is prohibitively expensive.
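The secant idea is simply to replace f′(xₙ) with the slope of the line through the last two iterates. A minimal sketch (names and tolerances illustrative):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=60):
    """Secant method: approximate f'(x_n) by the slope through
    the last two iterates, so no derivative function is needed."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if abs(f1) < tol:
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1

r = secant(lambda x: x * x - 2, 1.0, 2.0)
```

Only one function evaluation is new per step, which is why the secant method often wins in wall-clock time even though it takes more iterations than Newton.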
In machine learning, gradient descent is a first-order optimization method related to Newton’s approach. Instead of finding roots, it minimizes a loss function by stepping in the direction opposite the gradient. Newton’s method itself can be used for optimization by finding roots of the gradient. Second-order methods like Newton’s method converge faster than gradient descent but require computing and inverting the Hessian matrix, which is expensive for high-dimensional problems.
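The contrast shows up clearly in one dimension, where "inverting the Hessian" is just dividing by the second derivative. A sketch under illustrative settings (the learning rate and step counts are arbitrary):

```python
def newton_minimize(grad, hess, x0, n=10):
    """Minimize by applying Newton's method to the gradient."""
    x = x0
    for _ in range(n):
        x = x - grad(x) / hess(x)
    return x

def gradient_descent(grad, x0, lr=0.1, n=200):
    """First-order alternative: small steps against the gradient."""
    x = x0
    for _ in range(n):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2 + 1: gradient 2(x - 3), Hessian (constant) 2.
grad = lambda x: 2 * (x - 3)
x_newton = newton_minimize(grad, lambda x: 2.0, x0=0.0, n=1)  # exact in one step
x_gd = gradient_descent(grad, x0=0.0)                          # many small steps
```

For this quadratic loss, Newton lands on the minimum in a single step while gradient descent approaches it geometrically; in high dimensions the single Newton step hides a costly linear solve.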
Our polynomial root finder also uses Newton’s method internally for polynomials of degree 3 and higher. It tries multiple starting guesses to find all roots, which works well in practice for most polynomials you encounter.
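The root finder's internals aren't reproduced here, but the multi-start idea can be sketched as follows; this simplified version finds real roots only, and the starting guesses and tolerances are illustrative:

```python
def poly_and_deriv(coeffs, x):
    """Horner evaluation of p(x) and p'(x); coeffs are highest-degree first."""
    p, dp = 0.0, 0.0
    for c in coeffs:
        dp = dp * x + p
        p = p * x + c
    return p, dp

def real_roots_multistart(coeffs, starts, tol=1e-10):
    """Run Newton from several starting guesses and deduplicate the results."""
    roots = []
    for x in starts:
        for _ in range(100):
            p, dp = poly_and_deriv(coeffs, x)
            if abs(dp) < 1e-14:      # flat tangent: abandon this start
                break
            step = p / dp
            x -= step
            if abs(step) < tol:
                break
        p, _ = poly_and_deriv(coeffs, x)
        if abs(p) < 1e-8 and all(abs(x - r) > 1e-6 for r in roots):
            roots.append(x)
    return sorted(roots)

# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
roots = real_roots_multistart([1, -6, 11, -6], starts=[-10, 0, 1.9, 2.1, 3.5, 10])
```

In practice production root finders also deflate found roots or work in the complex plane so no root is missed by an unlucky choice of starts.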
The fractal nature of Newton’s method in the complex plane is visually stunning. Coloring each point by which root Newton’s method converges to (starting from that point) produces intricate, infinitely detailed boundary patterns called Newton fractals. These images beautifully illustrate that even simple algorithms can exhibit extraordinarily complex behavior.
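The coloring rule is straightforward to sketch for the classic example f(z) = z³ − 1, whose three roots are the cube roots of unity. This computes the basin index for each sample point; a real image would use a fine grid and map indices to colors:

```python
import cmath

def basin_index(z, roots, max_iter=40, tol=1e-6):
    """Index of the root of z^3 = 1 that Newton's method reaches from z,
    or -1 if it fails to converge within max_iter steps."""
    for _ in range(max_iter):
        if abs(z) < 1e-12:            # f'(z) = 3z^2 vanishes at the origin
            return -1
        z = z - (z**3 - 1) / (3 * z**2)
        for i, r in enumerate(roots):
            if abs(z - r) < tol:
                return i
    return -1

cube_roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

# Sample a coarse grid of starting points; coloring each cell by its
# basin index yields the Newton fractal for z^3 - 1.
grid = [[basin_index(complex(x, y), cube_roots)
         for x in (-1.0, -0.5, 0.5, 1.0)]
        for y in (-1.0, -0.5, 0.5, 1.0)]
```

The intricate structure lives on the boundaries between basins: arbitrarily close starting points there converge to different roots.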