Element-wise derivative of matrix inverse

This short blog post is about a formula that is useful in statistics, machine learning and control theory. In these fields, one often has a matrix $\mathbf{A}$ which is element-wise parametrized by a scalar $\theta$, so that the matrix entries are each scalar functions of the parameter:

$$ \mathbf{A}(\theta) = \begin{bmatrix} a_{11}(\theta) & a_{12}(\theta) &\ldots & a_{1j}(\theta) & \ldots & a_{1n}(\theta) \\ a_{21}(\theta) & a_{22}(\theta) &\ldots & a_{2j}(\theta) & \ldots & a_{2n}(\theta) \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ a_{i1}(\theta) & a_{i2}(\theta) &\ldots & a_{ij}(\theta) & \ldots & a_{in}(\theta) \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ a_{n1}(\theta) & a_{n2}(\theta) &\ldots & a_{nj}(\theta) & \ldots & a_{nn}(\theta) \\ \end{bmatrix} $$

We then naturally define the derivative $\dfrac{\partial \mathbf{A}}{\partial \theta} $ as the matrix with entries $\left[\dfrac{\partial a_{ij}}{\partial \theta} \right]_{ij}$. Assume that $\mathbf{A}^{-1}$ exists for a given value of $\theta$. To find the derivative $\dfrac{\partial \mathbf{A}^{-1}}{\partial \theta} $ of the inverse, we differentiate both sides of the equality

\begin{align} \mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_n \end{align}

However, we first need to figure out the derivative of a product $\mathbf{A}\mathbf{B}$.
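As a quick aside, the element-wise definition above is easy to check numerically. The sketch below is a minimal NumPy example, using an arbitrarily chosen matrix $\mathbf{A}(\theta)$ (not taken from the derivation); it compares the analytic entry-wise derivative with a central finite difference.

```python
import numpy as np

# Arbitrary example of an element-wise parametrized matrix A(theta).
def A(theta):
    return np.array([[theta,         theta**2],
                     [np.sin(theta), 1.0]])

# Its element-wise analytic derivative dA/dtheta, entry by entry.
def dA(theta):
    return np.array([[1.0,           2.0 * theta],
                     [np.cos(theta), 0.0]])

theta, h = 0.7, 1e-6

# Central finite difference, applied entry-wise, approximates dA/dtheta.
fd = (A(theta + h) - A(theta - h)) / (2.0 * h)

print(np.allclose(fd, dA(theta), atol=1e-6))  # True
```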

Derivative of a product of matrices

Suppose both $\mathbf{A}, \mathbf{B}$ depend on the parameter $\theta$. The $ij$-th entry of the product $\mathbf{A}\mathbf{B}$ is:

$$ \left[\mathbf{A}\mathbf{B} \right]_{ij} = \sum_{k=1}^n a_{ik}b_{kj} $$

We therefore only need to differentiate this expression for every entry of the product $\mathbf{A}\mathbf{B}$. This is a derivative of products of scalar functions, which is easy to evaluate:

\begin{align} \left[\dfrac{\partial}{\partial \theta}\mathbf{A}\mathbf{B} \right]_{ij} &= \dfrac{\partial}{\partial \theta}\sum_{k=1}^n a_{ik}b_{kj} \\ &= \sum_{k=1}^n \dfrac{\partial}{\partial \theta}(a_{ik}b_{kj}) \\ &= \sum_{k=1}^n \left( \dfrac{\partial a_{ik}}{\partial \theta}b_{kj} + a_{ik}\dfrac{\partial b_{kj}}{\partial \theta} \right) \\ & = \sum_{k=1}^n \dfrac{\partial a_{ik}}{\partial \theta}b_{kj} + \sum_{p=1}^n a_{ip}\dfrac{\partial b_{pj}}{\partial \theta} \\ & = \left[ \dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{B} + \mathbf{A} \dfrac{\partial \mathbf{B}}{\partial \theta}\right]_{ij} \\ \end{align}
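Here is a small numerical sanity check of this matrix product rule. The two parametrized matrices below are arbitrary examples chosen purely for illustration; the check compares a finite-difference derivative of $\mathbf{A}\mathbf{B}$ against $\dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{B} + \mathbf{A}\dfrac{\partial \mathbf{B}}{\partial \theta}$.

```python
import numpy as np

# Two arbitrary parametrized matrices; only their smooth dependence
# on theta matters, not their exact form.
def A(theta):
    return np.array([[np.cos(theta), theta],
                     [theta**2,      np.exp(theta)]])

def B(theta):
    return np.array([[theta**3,      1.0],
                     [np.sin(theta), theta]])

def dA(theta):
    return np.array([[-np.sin(theta), 1.0],
                     [2.0 * theta,    np.exp(theta)]])

def dB(theta):
    return np.array([[3.0 * theta**2, 0.0],
                     [np.cos(theta),  1.0]])

theta, h = 0.3, 1e-6

# Finite-difference derivative of the product AB ...
fd = (A(theta + h) @ B(theta + h) - A(theta - h) @ B(theta - h)) / (2.0 * h)

# ... versus the matrix product rule dA/dtheta B + A dB/dtheta.
product_rule = dA(theta) @ B(theta) + A(theta) @ dB(theta)

print(np.allclose(fd, product_rule, atol=1e-5))  # True
```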

Final derivation

We now apply this product rule to $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_n$. Since the identity matrix does not depend on $\theta$, its derivative is the zero matrix $\mathbf{0}_n$; left-multiplying by $\mathbf{A}^{-1}$ then isolates the derivative of the inverse:

$$ \dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{A}^{-1} + \mathbf{A}\dfrac{\partial \mathbf{A}^{-1}}{\partial \theta} = \mathbf{0}_n \iff \dfrac{\partial \mathbf{A}^{-1}}{\partial \theta} = \,-\mathbf{A}^{-1}\dfrac{\partial \mathbf{A}}{\partial \theta}\mathbf{A}^{-1} $$
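As a final sanity check, the closed form can be compared against a finite-difference approximation of the derivative of $\mathbf{A}^{-1}$. The parametrization of $\mathbf{A}(\theta)$ below is again just an arbitrary invertible example, not anything specific to the derivation.

```python
import numpy as np

# Arbitrary invertible example A(theta) and its element-wise derivative.
def A(theta):
    return np.array([[2.0 + np.sin(theta), theta],
                     [theta**2,            3.0 + np.cos(theta)]])

def dA(theta):
    return np.array([[np.cos(theta), 1.0],
                     [2.0 * theta,   -np.sin(theta)]])

theta, h = 0.5, 1e-6

# Finite-difference derivative of A^{-1} ...
fd = (np.linalg.inv(A(theta + h)) - np.linalg.inv(A(theta - h))) / (2.0 * h)

# ... versus the closed form  -A^{-1} (dA/dtheta) A^{-1}.
A_inv = np.linalg.inv(A(theta))
closed_form = -A_inv @ dA(theta) @ A_inv

print(np.allclose(fd, closed_form, atol=1e-6))  # True
```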