a canonicity proof via gradient induction
A canonical form for a language defines a representative among an equivalent class of terms. It can help identify equivalent terms in the language. Here I present a proof for the canonicity of the sum-over-product normal form for arithmetics, to demonstrate an interesting technique that I call induction over derivatives. A more catchy name I thought of is gradient induction.
First let's clarify what we mean by canonical form: when two expressions in a language, considered as programs, evaluate to the same result on all possible inputs, we say they are semantically equivalent. We therefore hope to find a “standard way” to write such expressions, so that when we rewrite any two expressions to the standard form, we can immediately tell if they are semantically equivalent by just looking at them. Such a standard form is considered canonical – we say that two terms, e1 and e2, share the same canonical form if and only if they are semantically equivalent, where semantically equivalent means the terms always compute the same result given same inputs. Formally:
canonicalize(e1) ≡ canonicalize(e2) ⇔ ∀ x . eval(e1, x) = eval(e2, x)
Here ≡ denotes syntactic equality, and = denotes semantic (value) equality. In our case, the expressions are in the language of arithmetics (+, *, x, ℝ):
Definition. An arithmetic expression is either a variable, a constant, the sum of two expressions, or the product of two expressions.
And our normal form is essentially the standard polynomials:
Definition. The sum-over-product normal form of an arithmetic expression is the sum of products of literals, where a literal is either a variable or a constant. Furthermore, we combine monomials that only differ in their coefficients, e.g. 2xy + 3xy is rewritten into 5xy.
One can rewrite an expression to SoP with the standard laws (associativity, commutativity & distributivity). That is, we keep pulling + up and merge terms. For example, the SoP canonical form of (x + z) (x + y) is x2 + xy + xz + yz.
Proposition. The sum-over-product normal form is canonical: Csop(e1) ≡ Csop(e2) ⇔ ∀ x . eval(e1, x) = eval(e2, x)
Proof. The left-to-right direction can be proven by structural induction over the SoP normal form syntax, together with the fact that the standard rewrite rules we use preserve the semantics of arithmetics.
I now prove the contrapositive of the backward direction:
Csop(e1) ≢ Csop(e2) ⇒ ∃ x . eval(e1, x) ≠ eval(e2, x)
There are two cases for Csop(e1) ≢ Csop(e2): 1. e1 and e2 differ in their constant term (e.g. e1 = 2xy + 4 and e2 = 3yz + 7), and 2. otherwise (e.g. e1 = 2xy + 4 and e2 = 3yz + 4). Note that we only look at the lonely constants, not the coefficients in other terms.
Case 1 is simple, since if two expressions have different constants they evaluate to different results (i.e the constants themselves) on all-zero inputs.
To prove case 2, I break down the goal into two steps:
Csop(e1) ≢ Csop(e2) ⇒ ∃ y . ∂ e1 / ∂ y ≠ ∂ e2 / ∂ y
∃ y . ∂ e1 / ∂ y ≠ ∂ e2 / ∂ y ⇒ ∃ x . eval(e1, x) ≠ eval(e2, x)
Recall that ≠ is semantic inequivalence.
The latter is simple: pick x1 and x2 that only differ in the y variable (from ∂ y above). Since the derivatives differ, we can always find a pair of x1 and x2 such that either eval(e1, x1) ≠ eval(e2, x1) or eval(e1, x2) ≠ eval(e2, x2).
To prove Csop(e1) ≢ Csop(e2) ⇒ ∃ y . ∂ e1 / ∂ y ≠ ∂ e2 / ∂ y, we perform induction over the derivatives of the expressions, with our original proof goal as the inductive hypothesis: Csop(e1) ≡ Csop(e2) ⇔ ∀ x . eval(e1, x) = eval(e2, x). Since Csop(e1) ≢ Csop(e2), we know ∃ y . ∂ e1 / ∂ y ≢ ∂ e2 / ∂ y (syntactically). Since the derivative of a canonical form is also canonical (not that obvious, but you'll see it after thinking a little harder), by our inductive hypothesis, ∃ y . ∂ e1 / ∂ y ≠ ∂ e2 / ∂ y (semantically).
The preceding induction is sound because taking the derivative makes any expression simpler, eventually bringing it to a constant. □
The main takeaway here is that, when we want to perform an inductive proof, we need every inductive step to make the terms simpler. Usually this is done by structural induction over the language definition; but when the language is differentiable, the derivative is another tool for inductively simplifying the terms. This can come handy for the PL researcher, since we now know a wide range of languages are differentiable – not just polynomials!
p.s. Haotian Jiang & Sorawee Porncharoenwase pointed out a much simpler proof: given two semantically equivalent arithmetic expressions, their difference is always zero. Therefore, the expression that represents the difference has infinitely many roots. According to the fundamental theorem of algebra, the two expressions must be the same polynomial, since otherwise their difference would be a none-zero polynomial and has finitely many roots.