Integer Multiplication

Motivation

The grade-school algorithm for multiplying two \(n\)-digit integers takes \(\Theta(n^2)\) digit operations. For centuries this was assumed to be optimal — until Karatsuba showed in 1960 that, at the cost of a few extra additions and subtractions, four half-size multiplications can be replaced by three, giving \(\Theta(n^{\log_2 3}) \approx \Theta(n^{1.585})\) (Karatsuba and Ofman 1963; Kleinberg and Tardos 2005).

Karatsuba’s discovery is a foundational result in algorithms: the first proof that a “natural” \(\Theta(n^2)\) algorithm could be improved by exploiting algebraic identities. It set off a sequence of progressively faster algorithms (Toom–Cook, Schönhage–Strassen, and most recently Harvey & van der Hoeven’s \(O(n \log n)\) algorithm), culminating in a bound that matches the conjectured lower bound.

For practical big-integer arithmetic — cryptography, computer algebra, big-data analytics — Karatsuba is the algorithm of choice for moderate sizes (hundreds to a few thousand digits) and a building block for the truly fast algorithms used at very large sizes.

Problem

Multiply two \(n\)-digit non-negative integers \(X\) and \(Y\) in some base \(B\), producing the up-to-\(2n\)-digit product \(X \cdot Y\). The cost model counts single-digit additions and single-digit multiplications, each at unit cost.

We focus on the asymptotic dependence on \(n\); the choice of base affects only constants.

Examples

Schoolbook on two digits

Multiply \(X = 12\) and \(Y = 34\) in base \(10\). The grade-school algorithm computes every digit-pair product and sums with shifts:

Each pair \((X_i, Y_j)\) contributes \(X_i Y_j \cdot B^{i+j}\):

  • \(X_i = 2\), \(Y_j = 4\): product \(8\), shift \(B^0 = 1\), contribution \(8\)
  • \(X_i = 2\), \(Y_j = 3\): product \(6\), shift \(B^1 = 10\), contribution \(60\)
  • \(X_i = 1\), \(Y_j = 4\): product \(4\), shift \(B^1 = 10\), contribution \(40\)
  • \(X_i = 1\), \(Y_j = 3\): product \(3\), shift \(B^2 = 100\), contribution \(300\)

Sum: \(300 + 40 + 60 + 8 = 408 = 12 \cdot 34\). Four digit multiplications were needed for two-digit operands — in general, \(n^2\) digit products for \(n\)-digit operands, giving \(\Theta(n^2)\).
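The digit-by-digit procedure above can be sketched in Python. This is a minimal illustration on little-endian digit lists; the name `schoolbook_multiply` is the one used in the pseudocode later, not a library function:

```python
def schoolbook_multiply(x, y, base=10):
    """Grade-school multiplication on little-endian digit lists.

    Every pair (x[i], y[j]) contributes x[i] * y[j] * base**(i + j),
    so n-digit operands cost Theta(n^2) digit multiplications.
    """
    result = [0] * (len(x) + len(y))          # product has at most 2n digits
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            total = result[i + j] + xi * yj + carry
            result[i + j] = total % base      # keep one digit
            carry = total // base             # propagate the rest
        result[i + len(y)] += carry
    return result

# 12 * 34: little-endian digits [2, 1] and [4, 3]
print(schoolbook_multiply([2, 1], [4, 3]))    # [8, 0, 4, 0], i.e. 408
```

The nested loop makes the \(\Theta(n^2)\) cost visible: every digit of `x` meets every digit of `y` exactly once.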

Naive divide and conquer does not help

Split each operand at the middle:

\[ X = X_1 \cdot B^{n/2} + X_0, \qquad Y = Y_1 \cdot B^{n/2} + Y_0. \]

Expanding the product gives

\[ X \cdot Y = X_1 Y_1 \cdot B^{n} + (X_1 Y_0 + X_0 Y_1) \cdot B^{n/2} + X_0 Y_0, \]

so the work reduces to four multiplications of \(n/2\)-digit numbers plus \(\Theta(n)\) additions and shifts:

\[ T(n) = 4 \, T(n/2) + \Theta(n). \]

By the master theorem this is \(\Theta(n^{\log_2 4}) = \Theta(n^2)\) — the same as schoolbook. Splitting into halves on its own changes nothing because the recursion still has to do four sub-multiplications.
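The four-product expansion is easy to sanity-check numerically; a quick sketch on illustrative values:

```python
# Verify X*Y = X1*Y1*B^n + (X1*Y0 + X0*Y1)*B^(n/2) + X0*Y0 for one instance.
B, n = 10, 4
X, Y = 1234, 5678
X1, X0 = divmod(X, B ** (n // 2))        # X1 = 12, X0 = 34
Y1, Y0 = divmod(Y, B ** (n // 2))        # Y1 = 56, Y0 = 78
expanded = X1 * Y1 * B ** n + (X1 * Y0 + X0 * Y1) * B ** (n // 2) + X0 * Y0
assert expanded == X * Y                 # equal, but still four sub-products
```

The identity holds, but computing it this way still needs the four products \(X_1 Y_1\), \(X_1 Y_0\), \(X_0 Y_1\), \(X_0 Y_0\), which is exactly why the recurrence stays quadratic.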

Key Ideas

The four sub-products \(X_1 Y_1\), \(X_1 Y_0\), \(X_0 Y_1\), \(X_0 Y_0\) are not independent. The two cross terms appear together as the middle coefficient \(X_1 Y_0 + X_0 Y_1\), and one extra product captures their sum:

\[ (X_1 + X_0)(Y_1 + Y_0) = X_1 Y_1 + X_1 Y_0 + X_0 Y_1 + X_0 Y_0. \]

So if we compute

\[ A = X_1 Y_1, \qquad B = X_0 Y_0, \qquad C = (X_1 + X_0)(Y_1 + Y_0), \]

(note the clash of notation: in the formulas below a bare \(B\) denotes the sub-product \(X_0 Y_0\), while the base \(B\) always appears raised to a power), then the cross terms are recovered as \(C - A - B\), and

\[ X \cdot Y = A \cdot B^{n} + (C - A - B) \cdot B^{n/2} + B. \]

Three sub-multiplications instead of four, at the cost of two extra additions of \(n/2\)-digit numbers and two extra subtractions. The arithmetic overhead is \(\Theta(n)\) per recursive call — too small to dominate.
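The substitution can be spot-checked on random halves; a small sketch:

```python
import random

# C - A - B recovers the cross terms X1*Y0 + X0*Y1 without computing them.
for _ in range(1000):
    X1, X0, Y1, Y0 = (random.randrange(10 ** 9) for _ in range(4))
    A, B, C = X1 * Y1, X0 * Y0, (X1 + X0) * (Y1 + Y0)
    assert C - A - B == X1 * Y0 + X0 * Y1
```

Three multiplications appear on the middle line; the cross terms come out of additions and subtractions alone.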

The exponent drops from \(\log_2 4 = 2\) to \(\log_2 3 \approx 1.585\): the one multiplication saved per call compounds across \(\log_2 n\) levels of recursion, beating \(n^2\) by a polynomial factor.

Deriving the Solution

With three sub-multiplications and \(\Theta(n)\) overhead per level:

\[ T(n) = 3 \, T(n/2) + \Theta(n), \qquad T(1) = \Theta(1). \]

The master theorem (with \(a = 3\), \(b = 2\), \(c = 1\), so \(\log_b a = \log_2 3 > c\)) gives

\[ T(n) = \Theta(n^{\log_2 3}) \approx \Theta(n^{1.585}). \]

A direct argument: at level \(\ell\) of the recursion tree there are \(3^\ell\) subproblems of size \(n / 2^\ell\), so the per-level work grows geometrically as \(3^\ell \cdot \Theta(n / 2^\ell) = \Theta(n \cdot (3/2)^\ell)\). The geometric series across \(\log_2 n\) levels is dominated by its last term, the leaves, yielding \(\Theta(3^{\log_2 n}) = \Theta(n^{\log_2 3})\).
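The leaf-domination argument can be made concrete by counting base-case multiplications. A sketch, assuming \(n\) is a power of two and ignoring the extra carry digit in the middle recursive call:

```python
def leaf_mults(n, branch):
    """Single-digit multiplications performed by a divide-and-conquer scheme
    that spawns `branch` half-size subproblems (n a power of two)."""
    return 1 if n == 1 else branch * leaf_mults(n // 2, branch)

for n in (64, 1024):
    naive, kara = leaf_mults(n, 4), leaf_mults(n, 3)
    print(n, naive, kara)    # 64: 4096 vs 729; 1024: 1048576 vs 59049
```

With four branches the count is exactly \(n^2\); with three it is \(3^{\log_2 n} = n^{\log_2 3}\), already an order of magnitude smaller at a thousand digits.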

Algorithm

karatsuba(X, Y):
    # BASE is the digit base (written B in the math above)
    n = max(digit_length(X), digit_length(Y))
    if n <= n0:                      # base case: schoolbook
        return schoolbook_multiply(X, Y)

    m = n // 2
    X1, X0 = split(X, m)             # X = X1 * BASE^m + X0
    Y1, Y0 = split(Y, m)             # Y = Y1 * BASE^m + Y0

    A = karatsuba(X1, Y1)
    B = karatsuba(X0, Y0)
    C = karatsuba(X1 + X0, Y1 + Y0)
    middle = C - A - B               # = X1*Y0 + X0*Y1

    return A * BASE^(2m) + middle * BASE^m + B    # multiplying by BASE^k is a shift

The base-case threshold \(n_0\) is tuned per platform — typically tens of digits, where schoolbook’s smaller constants beat Karatsuba’s recursive overhead.

The sums \(X_1 + X_0\) and \(Y_1 + Y_0\) may carry into one extra digit, so the recursive call \(C\) is on operands of size \(n/2 + 1\) rather than exactly \(n/2\). This costs only constant factors and does not affect the asymptotics.
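A direct transcription of the pseudocode into Python, working in base \(10\) on Python's built-in integers. The native `*` stands in for the schoolbook base case, and `n0 = 32` is an untuned illustrative threshold:

```python
def karatsuba(x, y, n0=32):
    """Karatsuba multiplication of non-negative integers, base 10."""
    n = max(len(str(x)), len(str(y)))
    if n <= n0:
        return x * y                      # stand-in for schoolbook_multiply

    m = n // 2
    x1, x0 = divmod(x, 10 ** m)           # x = x1 * 10^m + x0
    y1, y0 = divmod(y, 10 ** m)           # y = y1 * 10^m + y0

    a = karatsuba(x1, y1, n0)
    b = karatsuba(x0, y0, n0)
    c = karatsuba(x1 + x0, y1 + y0, n0)   # operands may be m + 1 digits
    middle = c - a - b                    # = x1*y0 + x0*y1

    return a * 10 ** (2 * m) + middle * 10 ** m + b

x, y = 3 ** 200, 7 ** 180                 # ~96 and ~153 digits
assert karatsuba(x, y) == x * y
```

In a real implementation the operands would be digit arrays and the final line two shifted additions; Python ints are used here only to keep the sketch short.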

Walkthrough

Multiply \(X = 1234\) by \(Y = 5678\) in base \(10\). Split each four-digit number at the middle: \(X_1 = 12\), \(X_0 = 34\), \(Y_1 = 56\), \(Y_0 = 78\), with \(n = 4\) and \(m = 2\).

X = 1234:  X₁ = 12, X₀ = 34
Y = 5678:  Y₁ = 56, Y₀ = 78

A = X₁ · Y₁ = 12 × 56 = 672
B = X₀ · Y₀ = 34 × 78 = 2652
C = (X₁ + X₀)(Y₁ + Y₀) = 46 × 134 = 6164
middle = C − A − B = 6164 − 672 − 2652 = 2840

X · Y = A · 10⁴ + middle · 10² + B
      = 6720000 + 284000 + 2652
      = 7006652

Three half-size multiplications (A, B, C) plus a few additions yield the product; schoolbook would have required four. Verify: 1234 × 5678 = 7006652.

If we descended one more level, each two-digit multiplication would itself spawn three single-digit multiplications instead of four — the savings compound.

Correctness

Direct algebra. The identity

\[ A \cdot B^{n} + (C - A - B) \cdot B^{n/2} + B = X_1 Y_1 \cdot B^{n} + (X_1 Y_0 + X_0 Y_1) \cdot B^{n/2} + X_0 Y_0 \]

reduces to checking that \(C - A - B = X_1 Y_0 + X_0 Y_1\), which expands directly from the definition of \(C = (X_1 + X_0)(Y_1 + Y_0)\). Induction on \(n\) then carries the result through the recursion: assuming the recursive calls return the correct sub-products, the recombination is exact.

Complexity and Tradeoffs

The recurrence \(T(n) = 3 \, T(n/2) + \Theta(n)\) resolves to

\[ T(n) = \Theta(n^{\log_2 3}) \approx \Theta(n^{1.585}) \]

digit operations.

Auxiliary memory is \(\Theta(n)\) for the recursion stack and intermediate operand buffers.

Important tradeoffs:

  • Constant factors: Karatsuba does more additions and bookkeeping per level than schoolbook. In practice, schoolbook is faster for \(n \lesssim 30\) digits; the recursive code has to bottom out at a schoolbook base case.
  • Carry handling: \(X_1 + X_0\) can overflow into one extra digit, which the implementation must accommodate.
  • Sign handling: the formulation above assumes non-negative operands; signed multiplication is reduced to unsigned by tracking signs separately.
  • Asymptotic optimality: Karatsuba is not asymptotically optimal — \(O(n \log n)\) is achievable (Harvey and van der Hoeven 2021) — but it is the practical sweet spot for moderate sizes and the simplest of the sub-quadratic algorithms.

When to Use It

Karatsuba dominates schoolbook above a small threshold and remains competitive into the few-thousand-digit range. The broader pattern is the Karatsuba trick: when an algorithm’s recursive structure produces sub-products that share algebraic content, an extra evaluation can replace one of them.

Typical algorithm by operand size:

  • Single machine word: hardware multiplier — \(\Theta(1)\)
  • \(n \lesssim 30\) digits: schoolbook — \(\Theta(n^2)\), smallest constants
  • \(30 \lesssim n \lesssim{}\) a few thousand digits: Karatsuba — \(\Theta(n^{1.585})\)
  • Larger: Toom–Cook, then Schönhage–Strassen or other FFT-based methods
  • Extremely large (\(n \gtrsim 10^6\)): Harvey & van der Hoeven — \(\Theta(n \log n)\)

In practice, libraries such as GMP layer these algorithms by size, dispatching to whichever is fastest at a given \(n\).

Variants

  • Toom–Cook-\(k\). Generalize the split-into-two to split-into-\(k\). Evaluating the polynomial product at \(2k - 1\) points and interpolating gives \(T(n) = (2k-1) \, T(n/k) + \Theta(n)\) with cost \(\Theta(n^{\log_k(2k-1)})\).

    \(k = 2\) (Karatsuba): \(3\) multiplications, \(\Theta(n^{1.585})\)
    \(k = 3\) (Toom-3): \(5\) multiplications, \(\Theta(n^{1.465})\)
    \(k = 4\) (Toom-4): \(7\) multiplications, \(\Theta(n^{1.404})\)

    The constant factor grows with \(k\), so very large \(k\) is impractical.

  • Schönhage–Strassen (1971): \(\Theta(n \log n \log \log n)\) via FFT over a finite ring (Schönhage and Strassen 1971).

  • Fürer (2007): \(\Theta(n \log n \cdot 2^{O(\log^* n)})\) (Fürer 2007).

  • Harvey & van der Hoeven (announced 2019, published 2021): \(\Theta(n \log n)\), matching the conjectured lower bound (Harvey and van der Hoeven 2021).
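The exponents in the Toom–Cook table above come straight from the recurrence \(T(n) = (2k-1)\,T(n/k) + \Theta(n)\); a one-liner reproduces them:

```python
import math

# Toom-Cook-k: 2k - 1 products of size n/k gives exponent log_k(2k - 1).
for k in (2, 3, 4):
    print(k, 2 * k - 1, round(math.log(2 * k - 1, k), 3))
# 2 3 1.585
# 3 5 1.465
# 4 7 1.404
```

As \(k \to \infty\) the exponent \(\log_k(2k-1) \to 1\), but the interpolation constants grow, which is why the sequence is abandoned for FFT-based methods at large \(n\).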

References

Fürer, Martin. 2007. “Faster Integer Multiplication.” Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC ’07), 57–66. https://doi.org/10.1145/1250790.1250800.
Harvey, David, and Joris van der Hoeven. 2021. “Integer Multiplication in Time \(O(n \log n)\).” Annals of Mathematics 193 (2). https://doi.org/10.4007/annals.2021.193.2.4.
Karatsuba, Anatolii, and Yuri Ofman. 1963. “Multiplication of Multidigit Numbers on Automata.” Soviet Physics Doklady 7: 595–96. https://cir.nii.ac.jp/crid/1570572700575309312.
Kleinberg, Jon, and Éva Tardos. 2005. Algorithm Design. Pearson/Addison-Wesley. https://www.pearson.com/en-us/subject-catalog/p/algorithm-design/P200000003259.
Schönhage, A., and V. Strassen. 1971. “Schnelle Multiplikation Großer Zahlen.” Computing 7 (3-4): 281–92. https://doi.org/10.1007/bf02242355.