Defining the inverse covariance matrix as \(\Theta\) and the empirical covariance matrix computed from the data as \(S\), the problem is to maximize the following L1-penalized log-likelihood: $$\log \operatorname{det} \Theta-\operatorname{trace}(S \Theta)-\lambda\|\Theta\|_{1}.$$ In case you wonder where that comes from, consider the density of the multivariate Gaussian distribution, written in terms of the precision matrix \(\Theta=\Sigma^{-1}\): $$p(X)=(2 \pi)^{-\frac{p}{2}} \operatorname{det}(\Theta)^{\frac{1}{2}} \exp \left(-\frac{1}{2}(X-\mu)^{T} \Theta(X-\mu)\right).$$ Ignoring the normalizing constant, the log-likelihood of \(n\) samples is $$L(\mu, \Theta) \propto \frac{n}{2} \log \operatorname{det}(\Theta)-\frac{1}{2} \sum_{l=1}^{n}\left(X^{(l)}-\mu\right)^{T} \Theta\left(X^{(l)}-\mu\right),$$ which, writing \(\bar{X}\) for the sample mean and \(S=\frac{1}{n} \sum_{l=1}^{n}(X^{(l)}-\bar{X})(X^{(l)}-\bar{X})^{T}\), reduces to $$\frac{n}{2} \log \operatorname{det}(\Theta)-\frac{n}{2} \operatorname{trace}(S \Theta)-\frac{n}{2}\left(\bar{X}-\mu\right)^{T} \Theta\left(\bar{X}-\mu\right).$$ Maximizing over the mean gives \(\mu=\bar{X}\), which eliminates the last term; dropping the constant factor \(n/2\) and adding the L1 penalty, the problem takes the form $$\underset{\Theta}{\operatorname{argmax}}\; \log \operatorname{det} \Theta-\operatorname{trace}(S \Theta)-\lambda\|\Theta\|_{1}.$$ The problem is now formulated as a convex optimization problem, which means we can solve it exactly; the only question left is how to arrive at a clever solution.

Note that the L1 norm above can be expressed via its dual norm: $$\|\Theta\|_{1}=\max _{\|U\|_{\infty} \leq 1} \operatorname{trace}(\Theta U).$$ Please note that in the original papers, instead of the matrix norm definitions we are familiar with, \(\|\Theta\|_{1}\) here means the sum of the absolute values of all elements of \(\Theta\), and \(\|U\|_{\infty}\) the maximum absolute value over all elements of \(U\). So we can rewrite our objective as $$\underset{\Theta}{\operatorname{argmax}} \min _{\|U\|_{\infty} \leq \lambda} \log \operatorname{det}(\Theta)-\operatorname{trace}(\Theta(S+U)).$$

Now let us try to obtain the dual problem. Exchanging min and max and setting the gradient with respect to \(\Theta\) to zero (recall \(\nabla_{\Theta} \log \operatorname{det} \Theta=\Theta^{-1}\)), we have $$\Theta^{-1}-(S+U)=0, \text { i.e., } \Theta=(S+U)^{-1}.$$ Plugging this solution back into the objective, we have $$\min _{\|U\|_{\infty} \leq \lambda}-\log \operatorname{det}(S+U)-p$$ (remember the data is \(n \times p\); the trace term becomes \(\operatorname{trace}\left((S+U)^{-1}(S+U)\right)=\operatorname{trace}\left(I_{p}\right)=p\), hence the \(p\) here).

Defining \(W=S+U\), we arrive at the dual of the original inverse covariance estimation problem. This problem estimates the covariance matrix instead: $$\underset{W}{\operatorname{argmax}}\left\{\log \operatorname{det}(W) :\|W-S\|_{\infty} \leq \lambda\right\}.$$ We have travelled a long way to arrive at this form, which is easier to work with and much clearer. To be explicit about the plan: our final goal is an accurate inverse covariance matrix, but we will estimate the covariance matrix \(W\) first, and then invert the result to recover the inverse covariance matrix \(\Theta=W^{-1}\).
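To make the derivation concrete, here is a minimal numerical sketch (not part of the original derivation): it checks the element-wise dual-norm identity used above, solves the penalized problem directly with CVXPY, and then verifies the dual constraint \(\|W-S\|_{\infty} \leq \lambda\) at the optimum. The variable names (`p`, `n`, `lam`, `Theta`) and the choice of numpy/cvxpy are my own illustrative assumptions, not tools named in the text.

```python
# Sanity-check sketch for the L1-penalized log-likelihood and its dual.
# Assumes numpy and cvxpy are installed; all names here are illustrative.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p, n, lam = 5, 200, 0.1

# Empirical covariance S from an n x p data matrix.
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False, bias=True)

# 1) Dual-norm identity: ||M||_1 = max_{||U||_inf <= 1} trace(M U),
#    attained at U = sign(M) for symmetric M (element-wise norms, as in the text).
M = rng.standard_normal((p, p))
M = (M + M.T) / 2
assert np.isclose(np.abs(M).sum(), np.trace(M @ np.sign(M)))

# 2) Solve argmax_Theta  log det(Theta) - trace(S Theta) - lam * ||Theta||_1.
Theta = cp.Variable((p, p), symmetric=True)
objective = cp.log_det(Theta) - cp.trace(S @ Theta) - lam * cp.sum(cp.abs(Theta))
cp.Problem(cp.Maximize(objective)).solve()

# 3) Dual feasibility: W = Theta^{-1} should satisfy ||W - S||_inf <= lam
#    (up to solver tolerance), matching the dual problem derived above.
W = np.linalg.inv(Theta.value)
print("max |W - S| =", np.abs(W - S).max(), " lambda =", lam)
```

The direct CVXPY solve is only meant to verify the formulation on a small problem; the point of the derivation above is precisely that dedicated algorithms work on the dual covariance estimate \(W\) instead of solving the primal this way.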