Commit 93d23230 authored by nanored's avatar nanored
Browse files

end of hw3 + a small part of chapter 2

parent ca1c63c9
......@@ -31,10 +31,10 @@ If $f$ is strictly convex and we know an $m > 0$ such that:
$$ \forall x \in S, \; \nabla^2 f(x) \succeq m I $$
Then, using convexity, we have:
$$ f(y) \geqslant f(x) + \nabla f(x)^\trans (y - x) + \dfrac{m}{2} \| y - x \|_2^2 $$
Knowing that $f$ is bounded below by $p^*$, this equation can be rewritten as:
$$ f(x) - p^* \leqslant - \nabla f(x)^\trans (y - x) - \dfrac{m}{2} \| y - x \|_2^2 $$
Taking $y = x^*$, this equation can be rewritten as:
$$ f(x) - p^* \leqslant - \nabla f(x)^\trans (x^* - x) - \dfrac{m}{2} \| x^* - x \|_2^2 $$
We look for an upper bound of the right-hand side of this inequality. To do so, we write the equation stating that the gradient with respect to $y$ vanishes, and we obtain:
$$ y = x + \dfrac{1}{m} \nabla f(x) $$
$$ y = x - \dfrac{1}{m} \nabla f(x) $$
This finally gives:
\begin{equation}
\label{stopcont}
......@@ -104,4 +104,78 @@ Where $\epsilon_0$ depends on $x^{(0)}$, $m$ and $L$.
For $f : \R^n \rightarrow \R$ we say the same if $g(t) = f(x + tv)$ is self-concordant for every $v \in \R^n$ and $x \in \dom f$.
}
Note that self-concordant functions are stable under composition with affine functions.
\ No newline at end of file
Note that self-concordant functions are stable under composition with affine functions.
\sect{Barrier method}
We consider the following problem:
$$ \begin{array}{llr}
\min & f_0(x) \\
\text{s.t.} & f_i(x) \leqslant 0, & i = 1, ..., m \\
& Ax = b
\end{array} $$
Where the functions $f_i$ are twice differentiable and $A$ has full rank (i.e. $\rank A = p$). We assume that $p^*$ is finite and attained. Finally, we assume that the problem is strictly feasible. \\
We start by reformulating the problem in order to remove the inequality constraints:
$$ \begin{array}{llr}
\min & f_0(x) + \sum_{i = 1}^m I_-(f_i(x)) \\
\text{s.t.} & Ax = b
\end{array} $$
Where $I_-$ is the indicator function of $\R_-$, which is $0$ on nonpositive values and $+\infty$ on strictly positive values. The problem is that this indicator function is not differentiable. To remedy this we use approximations. We fix a constant $t > 0$ and define the following new problem:
$$ \begin{array}{llr}
\min & \displaystyle f_0(x) - \dfrac{1}{t} \sum_{i = 1}^m \log \left( - f_i(x) \right) \\
\text{s.t.} & Ax = b
\end{array} $$
The approximation gets better as $t$ tends to infinity.
\begin{center}
\begin{tikzpicture}[yscale=0.25, xscale=1.25]
\draw (-3, -5) rectangle (1, 10);
\node[below] at (-1, -5) {-1};
\node[below] at (-3, -5) {-3};
\node[below] at (1, -5) {1};
\node[left] at (-3, -5) {-5};
\node[left] at (-3, 10) {10};
\draw[dashed, thick] (-3, 0) node[left] {0} -- (0, 0) -- (0, 10);
\draw[thick, red, smooth, domain=-1.1:10, variable=\y] plot ({-exp(-\y)}, \y);
\draw[thick, blue, smooth, domain=-2.2:10, variable=\y] plot ({-exp(-0.5*\y)}, \y);
\draw[thick, green, smooth, domain=-4.4:10, variable=\y] plot ({-exp(-0.25*\y)}, \y);
\node[green] at (0.5, 8.5) {\small $t = 1/4$};
\node[blue] at (0.5, 7) {\small $t = 1/2$};
\node[red] at (0.5, 5.5) {\small $t = 1$};
\end{tikzpicture}
\end{center}
We denote by $\phi$ the approximation used:
$$ \phi(x) = - \sum_{i = 1}^m \log \left( - f_i(x) \right) $$
By the composition rules, $\phi$ is indeed convex. Moreover, it is twice differentiable, with:
$$ \nabla \phi(x) = - \sum_{i = 1}^m \dfrac{1}{f_i(x)} \nabla f_i(x) $$
$$ \nabla^2 \phi(x) = \sum_{i = 1}^m \left( \dfrac{1}{f_i(x)^2} \nabla f_i(x) \nabla f_i(x)^\trans - \dfrac{1}{f_i(x)} \nabla^2 f_i(x) \right) $$
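For instance, in the case of linear inequality constraints $f_i(x) = a_i^\trans x - b_i$, these expressions reduce to:
$$ \nabla \phi(x) = \sum_{i = 1}^m \dfrac{a_i}{b_i - a_i^\trans x} \qquad \nabla^2 \phi(x) = \sum_{i = 1}^m \dfrac{a_i a_i^\trans}{\left( b_i - a_i^\trans x \right)^2} $$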
We also denote by $x^*(t)$ the solution of this approximated problem. One can show that this solution exists and is unique. Moreover, we know that $x$ is optimal if and only if there exists $w$ such that:
\begin{equation}
t \nabla f_0(x) - \sum_{i = 1}^m \dfrac{1}{f_i(x)} \nabla f_i(x) + A^\trans w = 0 \qquad \text{and} \quad Ax = b
\label{opti_cond}
\end{equation}
We then define the following dual variables:
$$ \lambda_i^*(t) = - \dfrac{1}{t f_i(x^*(t))} \qquad \nu^*(t) = \dfrac{1}{t} w $$
Condition \eqref{opti_cond} then rewrites as:
$$ \nabla_x L(x^*(t), \lambda^*(t), \nu^*(t)) = 0 \qquad A x^*(t) = b $$
Where $L$ is the Lagrangian of the initial problem, i.e.:
$$ L(x, \lambda, \nu) = f_0(x) + \sum_{i = 1}^m \lambda_i f_i(x) + \nu^\trans (Ax - b) $$
This then gives us a bound on the quality of our approximation of $p^*$:
$$ p^* \geqslant g(\lambda^*(t), \nu^*(t)) = L(x^*(t), \lambda^*(t), \nu^*(t)) = f_0 \left( x^*(t) \right) - \dfrac{m}{t} $$
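Indeed, since $A x^*(t) = b$ and $\lambda_i^*(t) \, f_i(x^*(t)) = -1/t$ for every $i$, we have:
$$ L(x^*(t), \lambda^*(t), \nu^*(t)) = f_0 \left( x^*(t) \right) + \sum_{i = 1}^m \lambda_i^*(t) f_i(x^*(t)) + \nu^*(t)^\trans \left( A x^*(t) - b \right) = f_0 \left( x^*(t) \right) - \dfrac{m}{t} $$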
Moreover, this inequality shows convergence to an optimal solution as $t$ tends to infinity; in particular, taking $t \geqslant m / \epsilon$ guarantees $f_0(x^*(t)) - p^* \leqslant \epsilon$. \\
Finally, this allows us to define the following algorithm:
\begin{algorithm}[h]
\caption{Barrier method}
\KwIn{A strictly feasible starting point $x=x^{(0)}$, $t=t^{(0)} > 0$, $\mu > 1$ and $\epsilon > 0$}
\Repeat{$m/t < \epsilon$}{
$t \gets \mu t$ \\
\textit{Centering step}: compute $x^*(t)$ by minimizing $tf_0 + \phi$ subject to $Ax = b$. \\
$x \gets x^*(t)$
}
\end{algorithm}
At the end of this algorithm we thus have:
$$ f_0(x) - p^* \leqslant \epsilon $$
The number of centering iterations is:
$$ n_{step} = \left\lceil \dfrac{\log \left( m / \left( \epsilon t^{(0)} \right) \right)}{\log \mu} \right\rceil $$
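For instance, with the purely illustrative values $m = 100$, $t^{(0)} = 1$, $\epsilon = 10^{-6}$ and $\mu = 20$, this gives:
$$ n_{step} = \left\lceil \dfrac{\log 10^{8}}{\log 20} \right\rceil = 7 $$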
\ No newline at end of file
conv.png (79.1 KB)
%% Cell type:markdown id: tags:
# Convex Optimization - Homework 3
%% Cell type:code id: tags:
```
import numpy as np
import matplotlib.pyplot as pl  # pyplot is preferred over the legacy pylab interface
from time import time
```
%% Cell type:markdown id: tags:
## Question 2
%% Cell type:markdown id: tags:
Because $Q$ is always half the identity matrix, I did not include it in the parameters. Furthermore, $b$ is replaced by the scalar value $l = \lambda$, because $b$ is defined as the vector with $\lambda$ in each of its $2d$ coordinates.
%% Cell type:code id: tags:
```
### PARAMETERS ###
alpha = 0.5
beta = 0.8
##################
def f(p, v):
"""
Return the value of our exact objective function evaluated in v
"""
return v.dot(v/2 + p)
def approx(p, A, l, t, v):
"""
Return the value of our approximated objective function evaluated in v
"""
cons = l - A @ v
if cons.min() <= 0: return float('inf')
return t * v.dot(v/2 + p) - np.log(cons).sum()
def centering_step(p, A, l, t, v0, eps):
"""
    Return the sequence of variable iterates (v_i), i = 1, ..., n_epsilon
"""
d2, n = A.shape
# Sequence of variables iterates
vs = [v0]
    # Stack of outer products a_i a_i^T, useful for the Hessian computation
AAt = (A.reshape(d2, n, 1) @ A.reshape(d2, 1, n))
while True:
# variable iterate
v = vs[-1]
        # Constraint slacks lambda - a_i^T v, used in the gradient and the Hessian
denominator = l - A @ v
# gradient
grad = t * (v + p) + (A / denominator.reshape(d2, 1)).sum(0)
        # Stopping criterion: ||grad||^2 < 2*t*eps ensures f_t(v) - p*(t) < eps
if grad.dot(grad) < 2*t*eps: break
# Hessian
H = t * np.identity(n) + (AAt / denominator.reshape(d2, 1, 1)**2).sum(0)
        # Newton step (H = t*I plus a sum of PSD terms is positive definite, so a direct solve works)
        Delta = - np.linalg.solve(H, grad)
# Multiplier of Newton step
tau = 1
# Value of our function at v
fv = approx(p, A, l, t, v)
        # alpha times the directional derivative grad^T Delta (a negative quantity)
        gd = alpha * grad.dot(Delta)
        # Backtracking line search
while approx(p, A, l, t, v + tau*Delta) > fv + tau*gd:
tau *= beta
# Update of v
v = v + tau * Delta
vs.append(v)
return vs
```
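%% Cell type:markdown id: tags:
As a quick sanity check, the analytic gradient used in `centering_step` can be compared with a centered finite-difference gradient of `approx` on a tiny random instance; the sizes and values below are arbitrary.
%% Cell type:code id: tags:
```
# Sanity check of the gradient formula on a small random instance
np.random.seed(0)
n_s, d2_s = 5, 16                       # 5 variables, 16 constraints
A_s = np.random.randn(d2_s, n_s)
p_s = np.random.randn(n_s)
l_s, t_s = 10.0, 2.0
v_s = np.zeros(n_s)                     # strictly feasible since l_s > 0
den = l_s - A_s @ v_s
grad_analytic = t_s * (v_s + p_s) + (A_s / den.reshape(d2_s, 1)).sum(0)
h = 1e-6
grad_numeric = np.array([
    (approx(p_s, A_s, l_s, t_s, v_s + h*e) - approx(p_s, A_s, l_s, t_s, v_s - h*e)) / (2*h)
    for e in np.identity(n_s)
])
print("Max gradient error:", abs(grad_analytic - grad_numeric).max())
```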
%% Cell type:code id: tags:
```
def bar_method(p, A, l, v0, eps, mu):
"""
    Return the sequence of variable iterates (v_i), i = 1, ..., n_epsilon
"""
ti = time()
d2, n = A.shape
# Sequence of variables iterates and initial t_0
vs, t = [v0], 1
while len(vs)==1 or d2/t > 0.5*eps:
v = vs[-1]
vs += centering_step(p, A, l, t, v, 0.5*eps)
print(f"\rCentering step with t={t} done", end='')
t *= mu
print(f"\r[mu={mu}] Computation time: {time()-ti}s")
return vs
```
%% Cell type:markdown id: tags:
## Question 3
%% Cell type:code id: tags:
```
def get_dual(X, y):
"""
Return the parameters A and p of the dual from the parameters X and y of the primal
"""
A = np.concatenate((X.T, -X.T), axis=0)
p = y
return A, p
def get_w(X, y, v):
"""
Return the value of the primal variable w from the dual variable v
"""
return np.linalg.pinv(X) @ (y + v)
def primal_objective(X, y, l, w):
"""
Return the value of the primal objective function evaluated at w
"""
error = X @ w - y
return 0.5 * error.T @ error + l * abs(w).sum()
```
%% Cell type:code id: tags:
```
### PARAMETERS ###
n, d = 100, 1000
l = 10
mus = [2, 15, 50, 100]
##################
X = np.random.rand(n, d)
y = np.random.rand(n)
A, p = get_dual(X, y)
v0 = np.zeros(n)
eps = 1e-6 * n
vs = [bar_method(p, A, l, v0, eps, mu) for mu in mus]
```
%% Cell type:code id: tags:
```
fvs = [np.array([f(p, v) for v in vsm]) for vsm in vs]
best = min(min(a) for a in fvs)
nm = len(mus)
fig, ax = pl.subplots((nm+1)//2, 2, figsize=(15, 10))
print("=======================")
print("Wanted precision:", eps)
print("Best value found:", best)
print("=======================")
for i in range(nm):
nsteps = len(vs[i])
w = get_w(X, y, vs[i][-1])
print(f"mu = {mus[i]}:")
print(f"\tFinal precision: {fvs[i][-1]-best}")
print(f"\tNumber of steps: {nsteps}")
print(f"\tL1 norm of w: {abs(w).sum()}")
print(f"\tPrimal value: {primal_objective(X, y, l, w)}\n")
    ax[i//2, i%2].semilogy(range(nsteps), fvs[i]-best)
ax[i//2, i%2].set_title(f"Barrier method with mu={mus[i]}")
ax[i//2, i%2].set_xlabel("Number of iterations")
ax[i//2, i%2].set_ylabel("gap")
pl.show()
```
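%% Cell type:markdown id: tags:
A possible cross-check with `scikit-learn` (assuming it is installed): sklearn's `Lasso` minimizes $\frac{1}{2n}\|Xw - y\|_2^2 + \alpha \|w\|_1$, so with $\alpha = \lambda / n$ and no intercept its minimizer also minimizes our primal objective. By strong duality, its primal value should be close to minus the best dual value found above.
%% Cell type:code id: tags:
```
from sklearn.linear_model import Lasso

# sklearn scales the quadratic term by 1/n, hence alpha = l/n
lasso = Lasso(alpha=l/n, fit_intercept=False, max_iter=100000)
w_sk = lasso.fit(X, y).coef_
print("Primal value from sklearn's Lasso:", primal_objective(X, y, l, w_sk))
print("Minus the best dual value found:  ", -best)
```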
......@@ -31,6 +31,7 @@
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\rank}{rank}
\usetikzlibrary{arrows.meta}
\usetikzlibrary{arrows}
......@@ -128,7 +129,6 @@
\newcommand{\dom}{\text{\textbf{dom}}~}
\newcommand{\diag}{\text{\textbf{diag}}~}
\newcommand{\epi}{\text{\textbf{epi}}~}
\newcommand{\rank}{\text{\textbf{rank}}~}
\newcommand{\dist}{\text{\textbf{dist}}}
\newcommand{\pmat}[1]{\begin{pmatrix} #1 \end{pmatrix}}
\newcommand{\ind}[1]{\mathbbm{1}_{#1}}
......
......@@ -42,7 +42,7 @@
& = & - v^\trans y + \inf_{z, w} - v^\trans z + \dfrac{1}{2} \| z \|_2^2 + \left( X^\trans v \right)^\trans w + \lambda \| w \|_1 \\[3mm]
& = & - v^\trans y - \dfrac{1}{2} \| 2 v \|_2^{2 \, *} - \lambda \left\| - \dfrac{1}{\lambda} X^\trans v \right\|_1^*
\end{array} $$
In the previous homework I compute the conjugate of the square of 2-norm and the conjugate of the 1-norm. We have :
In the previous homework I computed the conjugate of the squared 2-norm and the conjugate of the 1-norm. We have:
$$ \| x \|_2^{2 \, *} = \frac{1}{4} \| x \|_2^2 \qquad \| x \|_1^* = \syst{ll}{
0 & \text{If } \| x \|_{\infty} \leqslant 1 \\
+ \infty & \text{Otherwise}
......@@ -56,7 +56,7 @@
$$ \forall i = 1, ..., d \; \left| \left( X^\trans v \right)_i \right| \leqslant \lambda \Leftrightarrow X^\trans v \preceq \lambda \one_d \text{ and } - X^\trans v \preceq \lambda \one_d $$
Then we introduce the matrix:
$$ A = \pmat{X^\trans \\ -X^\trans} \qquad \text{such that } A v \preceq \lambda \one_{2d} $$
We also remark that the following equality:
We also remark the following equality:
$$ \dfrac{1}{2} \| v \|_2^2 = \dfrac{1}{2} v^\trans I_n v = v^\trans Q v \qquad \text{with } Q = \dfrac{1}{2} I_n $$
The dual problem is obtained by maximizing $g$ which is equivalent to minimizing $-g$. Thus we obtain the following dual:
$$ (QP) \qquad \begin{array}{llr}
......@@ -67,5 +67,62 @@
\begin{center}
\fbox{$ \displaystyle Q = \dfrac{1}{2} I_n \quad p = y \quad A = \pmat{X^\trans \\ -X^\trans} \quad b = \lambda \one_{2d} $}
\end{center}
\section*{Question 2}
\paragraph{Gradient and Hessian}
To implement the centering step we need to compute the gradient and the Hessian matrix of our approximated objective function:
$$ f_t(v) = t \left( v^\trans Q v + p^\trans v \right) - \sum_{i = 1}^{2d} \log \left( b_i - a_i^\trans v \right) $$
Where $a_i$ is the $i$-th row of $A$. To simplify the expression, I replace $Q$ and $b$ by their values:
$$ f_t(v) = t \left( \dfrac{1}{2} v^\trans v + p^\trans v \right) - \sum_{i = 1}^{2d} \log \left( \lambda - a_i^\trans v \right) $$
Firstly, we obtain the following gradient:
\begin{center}
\fbox{$ \displaystyle \nabla f_t(v) = t (v + p) + \sum_{i = 1}^{2d} \dfrac{a_i}{\lambda - a_i^\trans v} $}
\end{center}
Then here is the Hessian matrix:
\begin{center}
\fbox{$ \displaystyle \nabla^2 f_t(v) = t I_n + \sum_{i = 1}^{2d} \dfrac{a_i a_i^\trans}{\left( \lambda - a_i^\trans v \right)^2 } $}
\end{center}
\paragraph{Newton stopping criterion}
We also need to know when we can stop Newton's method in order to reach a target precision $\epsilon$. To do so, we note the following inequality:
$$ \nabla^2 f_t(v) \succeq t I_n $$
which holds because the remaining sum of rank-one terms is positive semi-definite. We can therefore take $m = t$ as a strong convexity constant of $f_t$, and, as seen in class, we have the inequality:
$$ f_t(v) - p^*(t) \leqslant \dfrac{1}{2m} \| \nabla f_t(v) \|_2^2 $$
Then our stopping criterion inside Newton's method is:
\begin{center}
\fbox{$\displaystyle \| \nabla f_t(v) \|_2^2 \leqslant 2 m \epsilon = 2 t \epsilon$}
\end{center}
\paragraph{Initial point}
For $\lambda > 0$ it is easy to see that the point $0$ is strictly feasible, since $A \cdot 0 = 0 \prec \lambda \one_{2d}$. We therefore start from the initial value $v_0 = 0$.
\section*{Question 3}
First, here is a figure representing the gap at each iteration for different values of $\mu$:
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{conv.png}
\caption{Gap as a function of the number of iterations for several values of $\mu$}
\end{figure}
We can see that when $\mu$ increases, the number of iterations until convergence decreases. Consequently, the computation time also decreases with $\mu$: for $\mu = 2$ it was about 9s on my laptop, whereas for the three other values it was between 2.5s and 3.5s. \\
Finally, we can observe that at the end of the algorithm the gap is smaller when $\mu$ is small, but in every case the gap is indeed smaller than $\epsilon$, as required.
\paragraph{Primal variable}
In order to check the impact of $\mu$ on $w$, we need to compute $w$ from the dual variable $v$. To do so we recall the Lagrangian:
$$ L(w, z, v) = \dfrac{1}{2} \| z \|_2^2 + \lambda \| w \|_1 + v^\trans \left( Xw - y - z \right) $$
Differentiating it with respect to $z$ we obtain:
$$ \dfrac{\partial L}{\partial z} = z - v \quad \Rightarrow \quad z^* = v^* $$
Then we use the equality:
$$ Xw - y = z $$
We multiply on the left by $X^\trans$ to obtain:
$$ X^\trans X w^* = X^\trans (y + v^*) $$
Then, if the matrix $X^\trans X$ is invertible, we get:
\begin{center}
\fbox{$ \displaystyle w^* = \left( X^\trans X \right)^{-1} X^\trans (y + v^*) $}
\end{center}
Unfortunately, I have certainly done something wrong, because the primal value obtained with this $w$ differs from the dual value (it is about twice as large). I checked that the LASSO solver from the \verb|sklearn| library gives the same optimal value as my barrier method, so the barrier method itself seems right, but my formula for recovering $w$ from $v$ is probably wrong (note that here $n < d$, so $X^\trans X$ is not invertible, which may explain the issue). Thus I could not observe the effect of $\mu$ on $w$. I did observe that the L1 norm of $w$ slightly decreases with $\mu$, but since my formula for $w$ is probably wrong, this observation is not reliable.
\end{document}
\ No newline at end of file