Strong convergence of a modified proximal algorithm for solving the lasso

Abstract

As is well known, the proximal iterative method can be used to solve the lasso of Tibshirani (J. R. Stat. Soc., Ser. B 58:267-288, 1996). In this paper, we first propose a modified proximal iterative method, based on the viscosity approximation method, which achieves strong convergence; we then apply this method to solve the lasso.

1 Introduction

The lasso of Tibshirani [1] is formulated as the minimization problem

$$ \min_{x\in\mathbb{R}^{n}}\frac{1}{2}\|Ax-b \|_{2}^{2} \quad\mbox{subject to}\quad \| x\|_{1}\leq t, $$
(1.1)

where A is an \(m\times n\) (real) matrix, \(b\in\mathbb{R}^{m}\), and \(t\geq 0\) is a tuning parameter. The regularized minimization problem equivalent to (1.1) is

$$ \min_{x\in\mathbb{R}^{n}}\frac{1}{2}\|Ax-b \|_{2}^{2}+\gamma\|x\|_{1}, $$
(1.2)

where \(\gamma>0\) is a regularization parameter. Since the \(\ell_{1}\) norm promotes sparsity, which arises in practical problems such as image/signal processing and machine learning, the lasso has received much attention in recent years.

In fact, both (1.1) and (1.2) are equivalent to the basis pursuit (BP) of Chen et al. [2]:

$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1} \quad\mbox{subject to}\quad Ax=b. $$

So we mathematically study the inverse linear system in \(\mathbb{R}^{n}\):

$$ Ax=b, $$
(1.3)

where A is an \(m\times n\) matrix, \(b\in\mathbb{R}^{m}\) is the observed data, and \(x\in\mathbb{R}^{n}\) stands for the image of interest to be recovered in imaging science. When \(m\ll n\), the system (1.3) is underdetermined. Donoho [3], Candes, and others [4–6] pioneered the theory of compressed sensing, showing that under certain conditions the underdetermined system (1.3) determines a unique k-sparse solution. Cipra [7] pointed out that the \(\ell_{1}\) norm is ideal for sparse recovery, as it combines the parsimony of \(\ell_{0}\) with the computational efficiency of \(\ell_{2}\).

However, the error of measurements always results in an inaccuracy of the system (1.3):

$$Ax=b+\epsilon. $$

In this case, the BP (1.3) is reformulated as

$$ \min_{x\in\mathbb{R}^{n}}\|x\|_{1}\quad \mbox{subject to}\quad\|Ax-b \|_{2}\leq \epsilon, $$
(1.4)

where \(\epsilon>0\) is the tolerance level of errors and \(\|\cdot\|_{2}\) is the Euclidean norm on \(\mathbb{R}^{m}\). If we let \(Q:=B_{\epsilon}(b)\) be the closed ball in \(\mathbb{R}^{m}\) centered at b with radius ϵ, then (1.4) can be rewritten as

$$ \min_{x\in\mathbb{R}^{n}}\|x\|_{1} \quad \mbox{subject to}\quad Ax\in Q. $$
(1.5)

As Q is a nonempty, closed, and convex subset of \(\mathbb{R}^{m}\), we let \(P_{Q}\) be the projection from \(\mathbb{R}^{m}\) onto Q. The condition \(Ax\in Q\) is equivalent to the condition \(Ax-P_{Q}(Ax)=0\), so the problem (1.5) can be solved via

$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1} \quad\mbox{subject to }(I-P_{Q})Ax=0. $$

Applying the Lagrange method, we obtain the so-called Q-lasso:

$$ \min_{x\in\mathbb{R}^{n}}\frac{1}{2}\bigl\| (I-P_{Q})Ax \bigr\| _{2}^{2}+\gamma\|x\|_{1}, $$
(1.6)

where \(\gamma>0\) is a Lagrange multiplier.

The Q-lasso is connected with the split feasibility problem (SFP) of Censor and Elfving [8–10]. The SFP is mathematically formulated as the problem of finding a point x with the property:

$$ x\in C \quad\mbox{and}\quad Ax\in Q, $$
(1.7)

where C and Q are nonempty, closed, and convex subsets of \(\mathbb{R}^{n}\) and \(\mathbb{R}^{m}\), respectively. An equivalent minimization formulation of the SFP (1.7) is given as

$$\min_{x\in C}\frac{1}{2}\|Ax-P_{Q}Ax \|_{2}^{2}. $$

The \(\ell_{1}\) regularization is given as the minimization problem

$$ \min_{x\in C}\frac{1}{2}\|Ax-P_{Q}Ax \|_{2}^{2}+\gamma\|x\|_{1}, $$
(1.8)

where \(\gamma>0\) is a regularization parameter. If the constrained set C is taken to be the entire space \(\mathbb{R}^{n}\), then the problem (1.8) is equivalent to the problem (1.2).

Recently, Xu [11] exploited the following proximal algorithm:

$$ x_{n+1}= \bigl(\operatorname{prox}_{\lambda_{n}g} \circ\,(I- \lambda_{n}\nabla f) \bigr)x_{n} $$
(1.9)

to solve the lasso (1.1), and Alghamdi et al. [12] also discussed an iterative algorithm for solving the Q-lasso (1.6) via a proximal-gradient method. However, these iterative algorithms yield only weak convergence.

Recall that Moudafi [13] proposed the viscosity iterative method in 2000 as follows:

$$ x_{n+1}=\alpha_{n}f(x_{n})+(1- \alpha_{n})Tx_{n},\quad n\geq0, $$
(1.10)

where f is a contraction on a real Hilbert space, T is a nonexpansive mapping, and \(\{\alpha_{n}\}\) is a sequence in \((0,1)\). In 2004, Xu [14] proved that if \(\{\alpha_{n}\}\) satisfies certain conditions, the sequence generated by (1.10) converges strongly to a fixed point \(x^{*}\) of T, which is also the unique solution of the variational inequality

$$\bigl\langle (I-f)x^{*},x-x^{*} \bigr\rangle \geq0,\quad \mbox{for }x\in \operatorname{Fix}(T). $$

In this paper, based on the viscosity iterative algorithm (1.10), we propose a modified formulation of the proximal algorithm (1.9). It is proved that the algorithm we propose can obtain strong convergence. Then we also apply this algorithm to solve the lasso and Q-lasso.

2 Preliminaries

Let H be a Hilbert space and let \(\Gamma_{0}(H)\) denote the space of proper, lower semicontinuous, convex functions on H.

Definition 2.1

The proximal operator of \(\varphi\in\Gamma_{0}(H)\) is defined by

$$\operatorname{prox}_{\varphi}(x)=\arg\min_{v\in H} \biggl\{ \varphi(v)+\frac{1}{2}\|v-x\|^{2} \biggr\} ,\quad x\in H. $$

The proximal operator of order \(\lambda>0\) is defined as the proximal operator of λφ, that is,

$$\operatorname{prox}_{\lambda\varphi}(x)=\arg\min_{v\in H} \biggl\{ \varphi(v)+\frac{1}{2\lambda }\|v-x\|^{2} \biggr\} ,\quad x\in H. $$

Proposition 2.2

Let \(\varphi\in\Gamma_{0}(H)\) and \(\lambda\in(0,\infty)\).

  1. (i)

    \(\operatorname{prox}_{\lambda\varphi}\) is firmly nonexpansive (hence nonexpansive). Recall that a mapping \(T:H\rightarrow H\) is firmly nonexpansive if

    $$\|Tx-Ty\|^{2}\leq\langle Tx-Ty,x-y\rangle, \quad x,y\in H. $$
  2. (ii)

    \(\operatorname{prox}_{\lambda\varphi}=(I+\lambda\partial\varphi )^{-1}=J_{\lambda}^{\partial\varphi}\), the resolvent of the subdifferential ∂φ of φ.

Combettes and Wajs [15] showed that the proximal operator \(\operatorname{prox}_{\lambda\varphi}\) has a closed-form expression in some important cases; for example, if we take φ to be the norm of H, then

$$\operatorname{prox}_{\lambda\|\cdot\|}(x)= \left \{ \begin{array}{@{}l@{\quad}l} (1-\frac{\lambda}{\|x\|})x, &\mbox{if }\|x\|>\lambda,\\ 0, &\mbox{if }\|x\|\leq\lambda. \end{array} \right . $$

In particular, if \(H=\mathbb{R}\), then the above operator is reduced to the scalar soft-thresholding operator:

$$\operatorname{prox}_{\lambda|\cdot|}(x)=\operatorname{sgn}(x) \max \bigl\{ |x|-\lambda,0\bigr\} . $$
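For concreteness, the following is a minimal NumPy sketch of these two closed-form proximal operators (the vector shrinkage for the Hilbert-space norm and the scalar soft-thresholding operator); the function names are ours and chosen purely for illustration.

```python
import numpy as np

def prox_norm(x, lam):
    """Proximal operator of lam*||.||_2: shrink x toward 0 by lam in norm."""
    nx = np.linalg.norm(x)
    return (1.0 - lam / nx) * x if nx > lam else np.zeros_like(x)

def soft_threshold(x, lam):
    """Componentwise soft-thresholding: proximal operator of lam*|.|."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```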

Lemma 2.3

[11]

The proximal identity

$$\operatorname{prox}_{\lambda\varphi}x=\operatorname{prox}_{\mu\varphi} \biggl( \frac{\mu}{\lambda}x+ \biggl(1-\frac {\mu}{\lambda} \biggr)\operatorname{prox}_{\lambda\varphi}x \biggr) $$

holds for \(\varphi\in\Gamma_{0}(H)\), \(x\in H\), \(\lambda>0\), and \(\mu>0\).
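As a quick numerical sanity check of this identity, the following sketch verifies it for the particular choice \(\varphi=\|\cdot\|_{1}\), whose proximal operator is componentwise soft-thresholding (variable names are illustrative).

```python
import numpy as np

def prox_l1(x, lam):
    # proximal operator of lam*||.||_1 (componentwise soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
lam, mu = 0.7, 0.3

lhs = prox_l1(x, lam)
rhs = prox_l1((mu / lam) * x + (1.0 - mu / lam) * prox_l1(x, lam), mu)
print(np.allclose(lhs, rhs))  # expected output: True
```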

Recall that a function \(f:H\rightarrow\mathbb{R}\) is convex if

$$f \bigl((1-\lambda)x+\lambda y \bigr)\leq(1-\lambda)f(x)+\lambda f(y) $$

for all \(\lambda\in(0,1)\) and \(x,y\in H\). (Note that we consider finite-valued functions.)

The subdifferential of a convex function f is defined as the operator ∂f given by

$$ \partial f(x)= \bigl\{ \xi\in H:f(y)\geq f(x)+\langle\xi,y-x \rangle,y\in H \bigr\} . $$
(2.1)

The inequality in (2.1) is referred to as the subdifferential inequality of f at x. We say that f is subdifferentiable at x if \(\partial f(x)\) is nonempty. We know that for an everywhere finite-valued convex function f on H, f is everywhere subdifferentiable.

Example

  1. (i)

    If \(f(x)=|x|\) for \(x\in\mathbb{R}\), then \(\partial f(0)=[-1,1]\);

  2. (ii)

    if \(f(x)=\|x\|_{1}\), for \(x\in\mathbb{R}^{n}\), then \(\partial f(x)\) is given componentwise by

    $$\bigl(\partial f(x) \bigr)_{j}= \left \{ \begin{array}{@{}l@{\quad}l} \operatorname{sgn}(x_{j}), & \mbox{if }x_{j}\neq0,\\ \xi_{j}\in[-1,1],& \mbox{if }x_{j}=0, \end{array} \right . $$

    for \(1\leq j\leq n\).
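A short sketch of how one admissible subgradient of the \(\ell_{1}\) norm can be selected componentwise (we simply take \(\xi_{j}=0\) whenever \(x_{j}=0\), one admissible element of \([-1,1]\)):

```python
import numpy as np

def subgradient_l1(x):
    """Return one element of the subdifferential of ||.||_1 at x."""
    # sgn(x_j) where x_j != 0; np.sign returns 0 where x_j == 0,
    # which is an admissible choice from [-1, 1]
    return np.sign(x)

print(subgradient_l1(np.array([1.5, 0.0, -2.0])))  # [ 1.  0. -1.]
```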

Consider the unconstrained minimization problem:

$$ \min_{x\in H}f(x). $$
(2.2)

Proposition 2.4

Let f be an everywhere finite-valued convex function on H and let \(z\in H\). Suppose f is bounded below (i.e., \(\operatorname{inf}\{f(x):x\in H\}>-\infty\)). Then z is a solution of the minimization problem (2.2) if and only if it satisfies the first-order optimality condition:

$$0\in\partial f(z). $$

Lemma 2.5

[16]

Assume that \(\{a_{n}\}_{n=0}^{\infty}\) is a sequence of nonnegative real numbers such that

$$a_{n+1}\leq(1-\gamma_{n})a_{n}+\gamma_{n} \delta_{n}+\beta_{n},\quad n\geq0, $$

where \(\{\gamma_{n}\}_{n=0}^{\infty}\) and \(\{\beta_{n}\}_{n=0}^{\infty}\) are sequences in \((0,1)\) and \(\{\delta_{n}\}_{n=0}^{\infty}\) is a sequence in \(\mathbb{R}\) such that

  1. (i)

    \(\sum_{n=0}^{\infty}\gamma_{n}=\infty\);

  2. (ii)

    either \(\limsup_{n\rightarrow\infty}\delta_{n}\leq0\) or \(\sum_{n=0}^{\infty}\gamma_{n}|\delta_{n}|<\infty\);

  3. (iii)

    \(\sum_{n=0}^{\infty}\beta_{n}<\infty\).

Then \(\lim_{n\rightarrow\infty}a_{n}=0\).

Lemma 2.6

Let H be a Hilbert space. Then, for all \(x,y\in H\), we have the inequality

$$\|x+y\|^{2}\leq\|x\|^{2}+2\langle y,x+y\rangle. $$

Lemma 2.7

[17]

Let H be a Hilbert space, C a closed convex subset of H, and \(T:C\rightarrow C\) a nonexpansive mapping with \(\operatorname{Fix}(T)\neq\emptyset \). If \(\{x_{n}\}_{n=1}^{\infty}\) is a sequence in C weakly converging to x and if \(\{(I-T)x_{n}\}_{n=1}^{\infty}\) converges strongly to y, then \((I-T)x=y\). In particular, if \(y=0\), then \(x\in \operatorname{Fix}(T)\).

Lemma 2.8

[14]

Let H be a Hilbert space, \(h:H\rightarrow H\) a contraction with coefficient \(0<\rho<1\). Then

$$\bigl\langle x-y,(I-h)x-(I-h)y \bigr\rangle \geq(1-\rho)\|x-y\|^{2}, \quad x,y \in H. $$

That is, \(I-h\) is strongly monotone with coefficient \(1-\rho\).

We will use the notation ⇀ for weak convergence and → for strong convergence.

3 Strong convergence of proximal algorithms

Let H be a real Hilbert space and let \(\Gamma_{0}(H)\) denote, as before, the space of proper, lower semicontinuous, convex functions on H. Consider the following minimization problem:

$$ \min_{x\in H}f(x)+g(x), $$
(3.1)

where \(f,g\in\Gamma_{0}(H)\).

Proposition 3.1

[11]

Let \(f,g\in\Gamma_{0}(H)\). Let \(x^{*}\in H\) and \(\lambda>0\). Assume that f is finite-valued and differentiable on H. Then \(x^{*}\) is a solution to (3.1) if and only if \(x^{*}\) solves the fixed point equation:

$$x^{*}= \bigl(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f) \bigr)x^{*}. $$

Consider a mapping \(S_{t}\) on H defined by

$$ S_{t}(x)=th(x)+(1-t) \bigl(\operatorname{prox}_{\lambda_{t} g} \circ\,(I-\lambda_{t}\nabla f) \bigr)x, $$
(3.2)

where h is a contraction with coefficient \(0<\rho<1\), \(t\in (0,1)\), and \(0<\lambda_{t}\leq\frac{2}{L}\). Assume that ∇f is L-Lipschitzian.

Proposition 3.2

The mapping \(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f)\) is nonexpansive.

Proof

We show it in two cases.

Case 1: \(\lambda_{t}=\frac{2}{L}\). Since ∇f is L-Lipschitzian, it is \((1/L)\)-ism [18], i.e., \(\langle x-y,\nabla f(x)-\nabla f(y)\rangle\geq\frac{1}{L}\|\nabla f(x)-\nabla f(y)\|^{2}\); hence we have

$$\begin{aligned} &\bigl\| (I-\lambda_{t}\nabla f)x-(I-\lambda_{t}\nabla f)y \bigr\| ^{2} \\ &\quad=\biggl\| \biggl(I-\frac{2}{L}\nabla f \biggr)x- \biggl(I- \frac{2}{L}\nabla f \biggr)y\biggr\| ^{2} \\ &\quad= \biggl\langle x-y-\frac{2}{L} \bigl(\nabla f(x)-\nabla f(y) \bigr), x-y- \frac{2}{L} \bigl(\nabla f(x)-\nabla f(y) \bigr) \biggr\rangle \\ &\quad=\|x-y\|^{2}+\frac{4}{L^{2}}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2}-\frac {2}{L} \bigl\langle x-y,\nabla f(x)-\nabla f(y) \bigr\rangle \\ &\qquad{}-\frac{2}{L} \bigl\langle \nabla f(x)-\nabla f(y),x-y \bigr\rangle \\ &\quad\leq\|x-y\|^{2}+\frac{4}{L^{2}}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2}-\frac {2}{L}\cdot\frac{1}{L}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2} \\ &\qquad{}-\frac{2}{L}\cdot\frac{1}{L}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2} \\ &\quad\leq\|x-y\|^{2}. \end{aligned}$$

Hence,

$$\bigl\| (I-\lambda_{t}\nabla f)x-(I-\lambda_{t}\nabla f)y\bigr\| \leq\|x-y\|. $$

As the mapping \(\operatorname{prox}_{\lambda_{t}g}\) is nonexpansive, we get

$$\begin{aligned} &\bigl\| \bigl(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda _{t}\nabla f)\bigr)x-\bigl(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t} \nabla f)\bigr)y\bigr\| \\ &\quad\leq\bigl\| (I-\lambda_{t}\nabla f)x-(I-\lambda_{t}\nabla f)y\bigr\| \\ &\quad\leq\|x-y\|. \end{aligned}$$

Case 2: \(0<\lambda_{t}<\frac{2}{L}\). We follow the proof in [16]. Since ∇f is L-Lipschitzian, ∇f is \((1/L)\)-ism [18], which implies that \(\lambda_{t}\nabla f\) is \((1/\lambda_{t}L)\)-ism. So \(I-\lambda_{t}\nabla f\) is \((\lambda_{t}L/2)\)-averaged. Since the proximal mapping \(\operatorname{prox}_{\lambda_{t}g}\) is \((1/2)\)-averaged, the composite \(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f)\) is \(((2+\lambda _{t}L)/4)\)-averaged for \(0<\lambda_{t}<2/L\); in particular, the mapping \(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f)\) is nonexpansive. □

By Proposition 3.2 it is not hard to see that \(S_{t}\) is a contraction on H. For \(x,y\in H\), we have

$$\begin{aligned} &\bigl\| S_{t}(x)-S_{t}(y)\bigr\| \\ &\quad=\bigl\| t \bigl[h(x)-h(y) \bigr]+(1-t) \bigl[ \bigl(\operatorname{prox}_{\lambda_{t}g} \circ\,(I-\lambda_{t}\nabla f) \bigr)x- \bigl(\operatorname{prox}_{\lambda_{t}g} \circ\,(I-\lambda_{t}\nabla f) \bigr)y \bigr]\bigr\| \\ &\quad\leq t\rho\|x-y\|+(1-t)\|x-y\| \\ &\quad= \bigl(1-t(1-\rho) \bigr)\|x-y\|. \end{aligned}$$

Hence, \(S_{t}\) has a unique fixed point, which we denote by \(x_{t}\). Thus \(x_{t}\) is the unique solution of the fixed point equation

$$ x_{t}=t h(x_{t})+(1-t) \bigl( \operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f) \bigr)x_{t}. $$
(3.3)

We will use the following notation in Proposition 3.3 and Theorem 3.4:

$$\begin{aligned}& V_{t}=V_{\lambda_{t}}=\operatorname{prox}_{\lambda_{t}g}\circ\,(I- \lambda_{t}\nabla f); \\& V=V_{\lambda}=\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f). \end{aligned}$$

The properties of \(V_{t}\) and V are helpful for the following proof [19].

Proposition 3.3

Assume that the minimization problem (3.1) is consistent, and let S denote its solution set. Assume that \(\lambda_{t}\) is continuous with respect to t. Since \(0<\lambda_{t}\leq2/L\), we may assume that \(\lambda_{t_{j}}\rightarrow \lambda \) as \(t_{j}\rightarrow0\), with \(\lambda>0\). Let \(x_{t}\) be defined by (3.3). Then:

  1. (i)

    \(\{x_{t}\}\) is bounded for \(t\in(0, 1)\);

  2. (ii)

    \(\lim_{t\rightarrow0}\|x_{t}-(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda \nabla f))x_{t}\|=0\);

  3. (iii)

    \(x_{t}\) defines a continuous curve from \((0,1)\) into H.

Proof

(i) Take \(p\in S\); then \(p\in \operatorname{Fix}(V)\), and

$$\begin{aligned} &\|x_{t}-p\| \\ &\quad=\bigl\| th(x_{t})+(1-t)V_{t}x_{t}-p\bigr\| \\ &\quad\leq(1-t)\|V_{t}x_{t}-p\|+t\bigl\| h(x_{t})-p\bigr\| \\ &\quad=(1-t)\|V_{t}x_{t}-V_{t}p\|+t \bigl\| h(x_{t})-p\bigr\| \\ &\quad\leq(1-t)\|x_{t}-p\|+t\bigl\| h(x_{t})-p\bigr\| . \end{aligned}$$

It follows that

$$\begin{aligned} &\|x_{t}-p\| \\ &\quad\leq\bigl\| h(x_{t})-p\bigr\| \\ &\quad\leq\bigl\| h(x_{t})-h(p)\bigr\| +\bigl\| h(p)-p\bigr\| \\ &\quad\leq\rho\|x_{t}-p\|+\bigl\| h(p)-p\bigr\| . \end{aligned}$$

Hence, \(\|x_{t}-p\|\leq\frac{1}{1-\rho}\|h(p)-p\|\), so \(\{x_{t}\}\) is bounded, and so are \(\{(\operatorname{prox}_{\lambda g} \circ\,(I-\lambda\nabla f))x_{t}\}\) and \(\{h(x_{t})\}\).

(ii) By the definition of \(\{x_{t}\}\), we have, for any \(\{t_{j}\} \rightarrow0\),

$$\begin{aligned} &\bigl\| x_{t_{j}}- \bigl(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda \nabla f) \bigr)x_{t_{j}}\bigr\| \\ &\quad=\bigl\| t_{j}h(x_{t_{j}})+(1-t_{j})V_{t_{j}}x_{t_{j}}-Vx_{t_{j}} \bigr\| \\ &\quad\leq t_{j}\bigl\| h(x_{t_{j}})-V_{t_{j}}x_{t_{j}} \bigr\| + \| V_{t_{j}}x_{t_{j}}-Vx_{t_{j}}\| \end{aligned}$$

and

$$\begin{aligned} &\|V_{t_{j}}x_{t_{j}}-Vx_{t_{j}}\| \\ &\quad=\bigl\| \bigl(\operatorname{prox}_{\lambda_{t_{j}}g}\circ\,(I-\lambda_{t_{j}} \nabla f) \bigr)x_{t_{j}}- \bigl(\operatorname{prox}_{\lambda g} \circ\,(I- \lambda\nabla f) \bigr)x_{t_{j}}\bigr\| \\ &\quad=\biggl\| \operatorname{prox}_{\lambda g} \biggl(\frac{\lambda}{\lambda_{t_{j}}}(I- \lambda _{t_{j}}\nabla f)x_{t_{j}}+ \biggl(1- \frac{\lambda}{\lambda _{t_{j}}} \biggr) \bigl(\operatorname{prox}_{\lambda_{t_{j}} g}\circ\,(I- \lambda_{t_{j}}\nabla f) \bigr)x_{t_{j}} \biggr) \\ &\qquad{}- \bigl(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f) \bigr)x_{t_{j}}\biggr\| \\ &\quad\leq\biggl\| \frac{\lambda}{\lambda_{t_{j}}}(I-\lambda_{t_{j}}\nabla f)x_{t_{j}}+ \biggl(1-\frac{\lambda}{\lambda_{t_{j}}} \biggr) \bigl( \operatorname{prox}_{\lambda_{t_{j}} g} \circ\,(I-\lambda_{t_{j}}\nabla f) \bigr)x_{t_{j}}-(I-\lambda\nabla f)x_{t_{j}}\biggr\| \\ &\quad=\biggl\| \biggl(\frac{\lambda}{\lambda_{t_{j}}}-1 \biggr)x_{t_{j}}+ \biggl(1- \frac{\lambda }{\lambda_{t_{j}}} \biggr) \bigl(\operatorname{prox}_{\lambda_{t_{j}}g}\circ\,(I-\lambda _{t_{j}}\nabla f)\bigr)x_{t_{j}}\biggr\| \\ &\quad=\biggl|1-\frac{\lambda}{\lambda_{t_{j}}}\biggr|\cdot\| x_{t_{j}}-V_{t_{j}}x_{t_{j}} \|. \end{aligned}$$

Then we obtain

$$ \bigl\| x_{t_{j}}-\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)x_{t_{j}}\bigr\| \leq t_{j}\bigl\| h(x_{t_{j}})-V_{t_{j}}x_{t_{j}} \bigr\| +\biggl|1-\frac{\lambda}{\lambda _{t_{j}}}\biggr|\cdot\|x_{t_{j}}-V_{t_{j}}x_{t_{j}} \|. $$
(3.4)

Since \(\{x_{t}\}\), \(\{h(x_{t})\}\), and \(\{V_{t}x_{t}\}\) are bounded, and \(\lambda_{t_{j}}\rightarrow\lambda \) (\(t_{j}\rightarrow0\)), we can obtain by (3.4) that

$$\lim_{t_{j}\rightarrow0}\bigl\| x_{t_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I- \lambda \nabla f)x_{t_{j}}\bigr\| =0. $$

By the arbitrariness of \(t_{j}\), we get

$$\lim_{t\rightarrow0}\bigl\| x_{t}-\operatorname{prox}_{\lambda g}\circ\,(I- \lambda\nabla f)x_{t}\bigr\| =0. $$

(iii) For any given \(t,t_{0}\in(0,1)\),

$$\begin{aligned} &\|x_{t}-x_{t_{0}}\|\\ &\quad=\bigl\| th(x_{t})+(1-t)V_{t}x_{t}-t_{0}h(x_{t_{0}})-(1-t_{0})V_{t_{0}}x_{t_{0}} \bigr\| \\ &\quad\leq\bigl\| (t-t_{0})h(x_{t})+t_{0} \bigl(h(x_{t})-h(x_{t_{0}}) \bigr)\bigr\| \\ &\qquad{}+\bigl\| (1-t_{0})V_{t_{0}}x_{t}-(1-t_{0})V_{t_{0}}x_{t_{0}}+(1-t)V_{t}x_{t}-(1-t_{0})V_{t_{0}}x_{t} \bigr\| \\ &\quad\leq|t-t_{0}|\cdot\bigl\| h(x_{t})\bigr\| +t_{0}\rho \|x_{t}-x_{t_{0}}\| +(1-t_{0})\|x_{t}-x_{t_{0}} \| \\ &\qquad{}+\|V_{t}x_{t}-V_{t_{0}}x_{t}\|+ \|tV_{t}x_{t}-t_{0}V_{t_{0}}x_{t} \| \\ &\quad\leq|t-t_{0}|\cdot\bigl\| h(x_{t})\bigr\| +t_{0}\rho \|x_{t}-x_{t_{0}}\| +(1-t_{0})\|x_{t}-x_{t_{0}} \|+\biggl|1-\frac{\lambda_{t_{0}}}{\lambda _{t}}\biggr|\cdot\|x_{t}-V_{t}x_{t}\| \\ &\qquad{}+\|t_{0}V_{t}x_{t}-t_{0}V_{t_{0}}x_{t} \|+\| tV_{t}x_{t}-t_{0}V_{t}x_{t} \| \\ &\quad\leq \bigl(1-t_{0}(1-\rho) \bigr)\|x_{t}-x_{t_{0}} \|+(t_{0}+1)\biggl|1-\frac{\lambda _{t_{0}}}{\lambda_{t}}\biggr|\cdot\|x_{t}-V_{t}x_{t} \| \\ &\qquad{}+|t-t_{0}| \bigl(\bigl\| h(x_{t})\bigr\| + \|V_{t}x_{t} \| \bigr). \end{aligned}$$

Hence,

$$ \|x_{t}-x_{t_{0}}\|\leq\frac{(\|h(x_{t})\|+\|V_{t}x_{t}\|)}{t_{0}(1-\rho )}|t-t_{0}|+ \frac{(t_{0}+1)}{t_{0}(1-\rho)}\biggl|1-\frac{\lambda _{t_{0}}}{\lambda_{t}}\biggr|\cdot\|x_{t}-V_{t}x_{t} \|. $$
(3.5)

Since \(\lambda_{t}\) is continuous with respect to t, we can obtain by (3.5) that \(x_{t}\rightarrow x_{t_{0}}\) as \(t\rightarrow t_{0}\), that is, \(\{x_{t}\}\) defines a continuous curve from \((0,1)\) into H. □

Theorem 3.4

Let \(\{x_{t}\}\) be defined by (3.3). Assume that the minimization problem (3.1) is consistent, and let S denote its solution set. Assume that \(\lim_{t\rightarrow0}\lambda_{t}=\lambda>0\) and that \(\lambda _{t}\) is continuous with respect to t. Assume that ∇f is L-Lipschitzian. Then \(x_{t}\) converges as \(t\rightarrow0\) to a solution \(x^{*}\) of (3.1), which also solves the variational inequality

$$ \bigl\langle (I-h)x^{*}, \tilde{x}-x^{*} \bigr\rangle \geq0, \quad\tilde{x}\in S. $$
(3.6)

Proof

By Lemma 2.8 we know that \(I-h\) is strongly monotone, so the variational inequality (3.6) has only one solution. Below we use \(x^{*}\in S\) to denote the unique solution of (3.6).

To prove that \(x_{t}\rightarrow x^{*}\) (\(t\rightarrow0\)), we write, for a given \(\tilde{x}\in S\),

$$\begin{aligned}& x_{t}-\tilde{x}=t h(x_{t})+(1-t)V_{t}x_{t}- \tilde{x}=t \bigl(h(x_{t})-\tilde {x} \bigr)+(1-t) (V_{t}x_{t}- \tilde{x}); \\& \begin{aligned}[b] &\|x_{t}-\tilde{x}\|^{2} \\ &\quad=\langle x_{t}-\tilde{x},x_{t}-\tilde{x} \rangle \\ &\quad=(1-t)\langle V_{t}x_{t}-\tilde{x},x_{t}- \tilde{x}\rangle+t \bigl\langle h(x_{t})-\tilde{x},x_{t}- \tilde{x} \bigr\rangle \\ &\quad\leq(1-t)\|x_{t}-\tilde{x}\|^{2}+t \bigl\langle h(x_{t})-h(\tilde {x}),x_{t}-\tilde{x} \bigr\rangle +t \bigl\langle h(\tilde{x})-\tilde{x},x_{t}-\tilde {x} \bigr\rangle \\ &\quad\leq \bigl(1-t(1-\rho) \bigr)\|x_{t}-\tilde{x}\|^{2}+t \bigl\langle h(\tilde{x})-\tilde {x},x_{t}-\tilde{x} \bigr\rangle . \end{aligned} \end{aligned}$$

Hence,

$$ \|x_{t}-\tilde{x}\|^{2}\leq \frac{\langle h(\tilde{x})-\tilde {x},x_{t}-\tilde{x}\rangle}{1-\rho}. $$
(3.7)

Since \(\{x_{t}\}\) is bounded as \(t\rightarrow0\), we can take a sequence \(\{t_{j}\}\) in \((0,1)\) such that \(t_{j}\rightarrow0\) (\(j\rightarrow\infty\)) and \(x_{t_{j}}\rightharpoonup\bar{x}\). Since \(0<\lambda_{t}\leq \frac{2}{L}\), we may assume that \(\lambda_{t_{j}}\rightarrow\lambda\in (0,\frac{2}{L}]\) with \(\lambda>0\); then, by the proof of Proposition 3.2, \(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f)\) is nonexpansive. Applying Lemma 2.7 and Proposition 3.3(ii), we get \(\bar{x}\in S\). Taking \(\tilde{x}=\bar{x}\) in (3.7) then shows that \(x_{t_{j}}\rightarrow\bar{x}\) strongly.

Next, we show that \(\bar{x}\in S\) solves the variational inequality (3.6). Indeed, we notice that \(x_{t}\) solves the fixed point equation

$$\begin{aligned}& x_{t}=t h(x_{t})+(1-t) \bigl(\operatorname{prox}_{\lambda_{t}g} \circ\,(I-\lambda_{t}\nabla f) \bigr)x_{t}, \\& h(x_{t})=\frac{1}{t} \bigl[x_{t}-(1-t)V_{t}x_{t} \bigr], \\& (I-h)x_{t}=-\frac{1-t}{t}(I-V_{t})x_{t}. \end{aligned}$$

Since \(V_{t}\) is nonexpansive, \(I-V_{t}\) is monotone, so for any \(\tilde {x}\in S\),

$$\begin{aligned} &\bigl\langle (I-h)x_{t},x_{t}-\tilde{x}\bigr\rangle \\ &\quad=-\frac{1-t}{t} \bigl\langle (I-V_{t})x_{t},x_{t}- \tilde{x} \bigr\rangle \\ &\quad=-\frac{1-t}{t} \bigl\langle (I-V_{t})x_{t}-(I-V_{t}) \tilde{x},x_{t}-\tilde {x} \bigr\rangle -\frac{1-t}{t} \bigl\langle (I-V_{t})\tilde{x},x_{t}-\tilde{x} \bigr\rangle \\ &\quad\leq-\frac{1-t}{t} \bigl\langle (I-V_{t}) \tilde{x},x_{t}- \tilde{x} \bigr\rangle \\ &\quad=0. \end{aligned}$$

Taking the limit along \(t=t_{j}\rightarrow0\), we obtain

$$\bigl\langle (I-h)\bar{x},\bar{x}-\tilde{x} \bigr\rangle \leq0. $$

That is, x̄ solves the variational inequality (3.6); therefore \(\bar{x}=x^{*}\) by uniqueness. Since the weak cluster point x̄ was arbitrary, we conclude that \(x_{t}\rightarrow x^{*}\) as \(t\rightarrow0\). □

We now consider the following explicit algorithm: initialize \(x_{0}\in H\) and iterate

$$ x_{n+1}=\alpha_{n}h(x_{n})+(1- \alpha_{n}) \bigl(\operatorname{prox}_{\lambda_{n}g}\circ\, (I- \lambda_{n}\nabla f) \bigr)x_{n}, $$
(3.8)

where \(\{\alpha_{n}\}\) is a sequence in \((0,1)\), \(0<\lambda_{n}\leq\frac {2}{L}\), \(\liminf_{n\rightarrow\infty}\lambda_{n}>0\), and \(h:H\rightarrow H\) is a contraction with coefficient \(0<\rho<1\).
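A minimal sketch of iteration (3.8) in this abstract form, with the gradient, the proximal operator, and the contraction h supplied as callables; the particular choices \(\alpha_{n}=1/(n+2)\) and a constant \(\lambda_{n}=1/L\) are illustrative ones satisfying the conditions of Theorem 3.5 below, not prescriptions.

```python
import numpy as np

def viscosity_proximal(x0, grad_f, prox_g, h, L, n_iter=500):
    """Iteration (3.8): x_{n+1} = a_n*h(x_n) + (1-a_n)*prox_{lam_n g}(x_n - lam_n*grad_f(x_n))."""
    x = np.asarray(x0, dtype=float)
    lam = 1.0 / L                    # constant step size with 0 < lam <= 2/L
    for n in range(n_iter):
        alpha = 1.0 / (n + 2)        # alpha_n -> 0 and sum alpha_n = infinity
        y = prox_g(x - lam * grad_f(x), lam)
        x = alpha * h(x) + (1.0 - alpha) * y
    return x
```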

Theorem 3.5

Let \(f,g\in\Gamma_{0}(H)\). Assume that the minimization problem (3.1) is consistent and let S denote its solution set. Assume in addition that

  1. (C1)

    ∇f is L-Lipschitzian on H;

  2. (C2)

    \(\alpha_{n}\rightarrow0\);

  3. (C3)

    \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

  4. (C4)

    \(\sum_{n=0}^{\infty}|\alpha_{n+1}-\alpha_{n}|<\infty\);

  5. (C5)

    \(\sum_{n=0}^{\infty}|\lambda_{n+1}-\lambda_{n}|<\infty\).

Then the sequence \(\{x_{n}\}_{n=0}^{\infty}\) generated by (3.8) converges strongly to the point \(x^{*}\) defined in Theorem 3.4.

Proof

Put

$$\begin{aligned}& V_{n}=V_{\lambda_{n}}=\operatorname{prox}_{\lambda_{n}g}\circ\,(I- \lambda_{n}\nabla f); \\& V=V_{\lambda}=\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f). \end{aligned}$$

We then get \(x_{n+1}=\alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}\).

First we show that the sequence \(\{x_{n}\}_{n=0}^{\infty}\) is bounded. Indeed, we have, for \(\bar{x}\in S\),

$$\begin{aligned} &\|x_{n+1}-\bar{x}\| \\ &\quad=\bigl\| \alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}- \bar{x}\bigr\| \\ &\quad=\bigl\| \alpha_{n} \bigl(h(x_{n})-h(\bar{x}) \bigr)+ \alpha_{n} \bigl(h(\bar{x})-\bar {x} \bigr)+(1-\alpha_{n}) (V_{n}x_{n}-\bar{x})\bigr\| \\ &\quad\leq\alpha_{n}\rho\|x_{n}-\bar{x}\|+ \alpha_{n} \bigl\| h(\bar{x})-\bar{x}\bigr\| +(1-\alpha_{n}) \|x_{n}-\bar{x}\| \\ &\quad= \bigl(1-\alpha_{n}(1-\rho) \bigr)\|x_{n}-\bar{x}\|+ \alpha_{n}\bigl\| h(\bar{x})-\bar {x}\bigr\| \\ &\quad\leq \max \bigl\{ \|x_{n}-\bar{x}\|,\bigl\| h(\bar{x})-\bar{x}\bigr\| /(1- \rho) \bigr\} . \end{aligned}$$

So, an induction argument shows that

$$\|x_{n}-\bar{x}\|\leq \max \bigl\{ \|x_{0}-\bar{x}\|,\bigl\| h( \bar{x})-\bar{x}\bigr\| /(1-\rho) \bigr\} ,\quad n\geq0. $$

We next prove that \(\|x_{n+1}-x_{n}\|\rightarrow0\) as \(n\rightarrow \infty\). For the sake of simplicity, we assume that \(0< a\leq\lambda _{n}\leq b\leq\frac{2}{L}\).

We compute

$$\begin{aligned} &\|x_{n+1}-x_{n}\| \\ &\quad=\bigl\| \alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}- \alpha _{n-1}h(x_{n-1})-(1-\alpha_{n-1})V_{n-1}x_{n-1} \bigr\| \\ &\quad\leq\bigl\| \alpha_{n} \bigl[h(x_{n})-h(x_{n-1}) \bigr]+ \bigl[\alpha_{n}h(x_{n-1})-\alpha _{n-1}h(x_{n-1}) \bigr]\bigr\| \\ &\qquad{}+\bigl\| (1-\alpha_{n}) (V_{n}x_{n}-V_{n}x_{n-1})+(1- \alpha _{n})V_{n}x_{n-1}-(1-\alpha_{n-1})V_{n-1}x_{n-1} \bigr\| \\ &\quad\leq\alpha_{n}\rho\|x_{n}-x_{n-1}\|+| \alpha_{n}-\alpha_{n-1}|\bigl\| h(x_{n-1})\bigr\| +(1- \alpha_{n})\|x_{n}-x_{n-1}\| \\ &\qquad{}+(1-\alpha_{n})\|V_{n}x_{n-1}-V_{n-1}x_{n-1} \|+\|\alpha _{n}V_{n-1}x_{n-1}-\alpha_{n-1}V_{n-1}x_{n-1} \| \\ &\quad\leq \bigl(1-\alpha_{n}(1-\rho) \bigr)\|x_{n}-x_{n-1} \|+|\alpha_{n}-\alpha _{n-1}| \bigl(\bigl\| h(x_{n-1})\bigr\| + \|V_{n-1}x_{n-1}\| \bigr) \\ &\qquad{}+\|V_{n}x_{n-1}-V_{n-1}x_{n-1}\| \end{aligned}$$

and

$$\begin{aligned} \|V_{n}x_{n-1}-V_{n-1}x_{n-1}\| &\leq\biggl|1-\frac{\lambda_{n}}{\lambda_{n-1}}\biggr| \cdot\|x_{n-1}-V_{n}x_{n-1} \| \\ &\leq\frac{|\lambda_{n}-\lambda_{n-1}|}{a}\|x_{n-1}-V_{n}x_{n-1} \|, \end{aligned}$$

so we obtain

$$ \|x_{n+1}-x_{n}\|\leq \bigl(1- \alpha_{n}(1-\rho) \bigr)\|x_{n}-x_{n-1}\|+M\bigl(|\alpha _{n}-\alpha_{n-1}|+|\lambda_{n}- \lambda_{n-1}|\bigr), $$
(3.9)

where M is a constant with \(M\geq \sup_{n}\max\{\|h(x_{n-1})\|+\|V_{n-1}x_{n-1}\|,\| x_{n-1}-V_{n}x_{n-1}\|/a\}\); such an M exists because the sequences involved are bounded. By assumptions (C3)-(C5) in the theorem, we have \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\) and \(\sum_{n=1}^{\infty}(|\alpha_{n}-\alpha_{n-1}|+|\lambda_{n}-\lambda _{n-1}|)<\infty\). Hence, Lemma 2.5 is applicable to (3.9), and we conclude that \(\|x_{n+1}-x_{n}\|\rightarrow0\).

Since \(\{x_{n}\}\) is bounded, there exists a subsequence \(\{x_{n_{j}}\} \) such that \(x_{n_{j}}\rightharpoonup z\); below we prove that \(z\in S\). Since \(0< a\leq\lambda_{n}\leq\frac{2}{L}\), we may assume that \(\lambda_{n_{j}}\rightarrow\lambda\) with \(\lambda\geq a>0\). We have

$$ \bigl\| x_{n_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I- \lambda\nabla f)x_{n_{j}}\bigr\| \leq\|x_{n_{j}}-x_{n_{j+1}}\|+ \bigl\| x_{n_{j+1}}-\operatorname{prox}_{\lambda g}\circ\, (I-\lambda\nabla f)x_{n_{j}}\bigr\| . $$
(3.10)

We compute

$$\begin{aligned} &\bigl\| x_{n_{j+1}}-\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f)x_{n_{j}}\bigr\| \\ &\quad=\bigl\| \alpha_{n_{j}}h(x_{n_{j}})+(1-\alpha_{n_{j}}) \bigl( \operatorname{prox}_{\lambda _{n_{j}}g}\circ\,(I-\lambda_{n_{j}}\nabla f) \bigr)x_{n_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f)x_{n_{j}}\bigr\| \\ &\quad\leq\alpha_{n_{j}}\bigl\| h(x_{n_{j}})-Vx_{n_{j}}\bigr\| +(1- \alpha_{n_{j}})\| V_{n_{j}}x_{n_{j}}-Vx_{n_{j}}\| \end{aligned}$$

and

$$\begin{aligned} &\|V_{n_{j}}x_{n_{j}}-Vx_{n_{j}}\| \\ &\quad=\bigl\| \operatorname{prox}_{\lambda_{n_{j}}g}\circ\, (I-\lambda_{n_{j}}\nabla f)x_{n_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I-\lambda \nabla f)x_{n_{j}}\bigr\| \\ &\quad\leq\biggl\| \frac{\lambda}{\lambda_{n_{j}}}(I-\lambda_{n_{j}}\nabla f)x_{n_{j}}+ \biggl(1-\frac{\lambda}{\lambda_{n_{j}}} \biggr) \bigl( \operatorname{prox}_{\lambda _{n_{j}}g} \circ\,(I-\lambda_{n_{j}}\nabla f) \bigr)x_{n_{j}}-(I-\lambda\nabla f)x_{n_{j}}\biggr\| \\ &\quad=\biggl\| \biggl(\frac{\lambda}{\lambda_{n_{j}}}-1 \biggr)x_{n_{j}}+ \biggl(1- \frac{\lambda }{\lambda_{n_{j}}} \biggr)\operatorname{prox}_{\lambda_{n_{j}}g}(I- \lambda_{n_{j}}\nabla f)x_{n_{j}}\biggr\| \\ &\quad=\biggl|1-\frac{\lambda}{\lambda_{n_{j}}}\biggr|\cdot\| x_{n_{j}}-V_{n_{j}}x_{n_{j}} \| \\ &\quad\leq|\lambda_{n_{j}}-\lambda|\frac{\|x_{n_{j}}-V_{n_{j}}x_{n_{j}}\|}{a}. \end{aligned}$$

So we obtain

$$\begin{aligned} &\bigl\| x_{n_{j+1}}-\operatorname{prox}_{\lambda g}\circ\, (I- \lambda\nabla f)x_{n_{j}}\bigr\| \\ &\quad\leq\alpha_{n_{j}}\bigl\| h(x_{n_{j}})-Vx_{n_{j}} \bigr\| +(1- \alpha_{n_{j}})|\lambda _{n_{j}}-\lambda| \frac{\|x_{n_{j}}-V_{n_{j}}x_{n_{j}}\|}{a}. \end{aligned}$$
(3.11)

Combining (3.10) and (3.11) we get

$$\begin{aligned} &\bigl\| x_{n_{j}}-\bigl(\operatorname{prox}_{\lambda g} \circ\,(I- \lambda\nabla f)\bigr)x_{n_{j}}\bigr\| \\ &\quad\leq\|x_{n_{j}}-x_{n_{j+1}}\|+\alpha_{n_{j}} \bigl\| h(x_{n_{j}})-Vx_{n_{j}}\bigr\| +(1-\alpha_{n_{j}})| \lambda_{n_{j}}-\lambda|\frac{\| x_{n_{j}}-V_{n_{j}}x_{n_{j}}\|}{a}. \end{aligned}$$
(3.12)

Since \(\lambda_{n_{j}}\rightarrow\lambda\), and \(\alpha_{n}\rightarrow 0\), by (3.12) we get

$$ \bigl\| x_{n_{j}}- \bigl(\operatorname{prox}_{\lambda g} \circ\,(I-\lambda\nabla f) \bigr)x_{n_{j}}\bigr\| \rightarrow0. $$
(3.13)

By the proof of Theorem 3.4 we know that \(\operatorname{prox}_{\lambda g}\circ\, (I-\lambda\nabla f)\) is nonexpansive. It follows from Lemma 2.7 and (3.13) that \(z\in S\).

We next show that

$$ \limsup_{n\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n}-x^{*} \bigr\rangle \leq0, $$
(3.14)

where \(x^{*}\) is obtained in Theorem 3.4. Indeed, we may choose the subsequence \(\{x_{n_{j}}\}\) above so that it attains the limit superior in (3.14); letting \(j\rightarrow\infty\), we have

$$\limsup_{n\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n}-x^{*} \bigr\rangle =\lim _{j\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n_{j}}-x^{*} \bigr\rangle . $$

Since \(x_{n_{j}}\rightharpoonup z\) and \(z\in S\), it follows from (3.6) that

$$\limsup_{n\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n}-x^{*} \bigr\rangle = \bigl\langle h \bigl(x^{*} \bigr)-x^{*},z-x^{*} \bigr\rangle \leq0. $$

We finally show that \(x_{n}\rightarrow x^{*}\). We have

$$\begin{aligned} &\bigl\| x_{n+1}-x^{*}\bigr\| ^{2} \\ &\quad=\bigl\| \alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}-x^{*} \bigr\| ^{2} \\ &\quad=\bigl\| \alpha_{n} \bigl(h(x_{n})-h \bigl(x^{*} \bigr) \bigr)+(1-\alpha _{n}) \bigl(V_{n}x_{n}-x^{*} \bigr)+\alpha_{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr)\bigr\| ^{2} \\ &\quad\leq\bigl\| \alpha_{n} \bigl(h(x_{n})-h \bigl(x^{*} \bigr) \bigr)+(1-\alpha_{n}) \bigl(V_{n}x_{n}-x^{*} \bigr)\bigr\| ^{2}+2 \bigl\langle \alpha_{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr),x_{n+1}-x^{*} \bigr\rangle \\ &\quad\leq\alpha_{n}\bigl\| h(x_{n})-h \bigl(x^{*} \bigr) \bigr\| ^{2}+(1-\alpha_{n})\bigl\| V_{n}x_{n}-x^{*} \bigr\| ^{2}+2 \bigl\langle \alpha _{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr),x_{n+1}-x^{*} \bigr\rangle \\ &\quad\leq\alpha_{n}\rho^{2}\bigl\| x_{n}-x^{*} \bigr\| ^{2}+(1-\alpha_{n})\bigl\| x_{n}-x^{*} \bigr\| ^{2}+2 \bigl\langle \alpha _{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr),x_{n+1}-x^{*} \bigr\rangle \\ &\quad\leq \bigl(1-\alpha_{n} \bigl(1-\rho^{2} \bigr) \bigr) \bigl\| x_{n}-x^{*}\bigr\| ^{2}+2\alpha _{n} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n+1}-x^{*} \bigr\rangle . \end{aligned}$$

It then follows that

$$ \bigl\| x_{n+1}-x^{*}\bigr\| ^{2}\leq \bigl(1-\alpha _{n} \bigl(1-\rho^{2} \bigr) \bigr)\bigl\| x_{n}-x^{*}\bigr\| ^{2}+ \alpha_{n} \delta_{n}, $$
(3.15)

where \(\delta_{n}=2\langle h(x^{*})-x^{*},x_{n+1}-x^{*}\rangle\). Applying Lemma 2.5 to the inequality (3.15), together with (3.14), we get \(x_{n}\rightarrow x^{*}\) as \(n\rightarrow\infty\). □

4 An application of proximal algorithm to the lasso and Q-lasso

Take \(f(x)=\frac{1}{2}\|Ax-b\|_{2}^{2}\) and \(g(x)=\gamma\|x\|_{1}\); then the lasso (1.2) can be solved by the proximal algorithm (3.8). We have \(\nabla f(x)=A^{t}(Ax-b)\), and ∇f is Lipschitz continuous with constant \(L=\|A\|_{2}^{2}\):

$$\begin{aligned} \bigl\| A^{t}(Ax-b)-A^{t}(Ay-b)\bigr\| _{2} =\bigl\| A^{t}A(x-y)\bigr\| _{2} \leq\|A\|_{2}^{2}\|x-y\|_{2}. \end{aligned}$$
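As a quick numerical sanity check of this bound (a sketch on random data; np.linalg.norm(A, 2) returns the spectral norm \(\|A\|_{2}\)):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x, y = rng.standard_normal(50), rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2                 # L = ||A||_2^2
lhs = np.linalg.norm(A.T @ (A @ x) - A.T @ (A @ y))
print(lhs <= L * np.linalg.norm(x - y))       # expected: True
```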

Then the proximal algorithm (3.8) is equivalent to

$$ x_{n+1}=\alpha_{n}h(x_{n})+(1- \alpha_{n}) \operatorname{prox}_{\lambda_{n}\gamma\|\cdot \|_{1}} \bigl(x_{n}- \lambda_{n}A^{t}(Ax_{n}-b) \bigr). $$
(4.1)

Here we have, for \(\alpha>0\) and \(x=(x_{i})^{t}\in\mathbb{R}^{n}\),

$$\operatorname{prox}_{\alpha\|\cdot\|_{1}}(x)= \bigl(\operatorname{prox}_{\alpha|\cdot|}(x_{1}), \ldots,\operatorname{prox}_{\alpha|\cdot|}(x_{n}) \bigr)^{t}, $$

and \(\operatorname{prox}_{\alpha|\cdot|}(\beta)= \operatorname{sgn}(\beta)\max\{|\beta|-\alpha,0\}\) for \(\beta\in\mathbb{R}\).
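Putting the pieces together, here is a minimal sketch of algorithm (4.1) for the lasso; the contraction \(h(x)=x/2\), the constant step \(\lambda_{n}=1/\|A\|_{2}^{2}\), and \(\alpha_{n}=1/(n+2)\) are illustrative choices satisfying the conditions of Theorem 4.1 below, not the only admissible ones.

```python
import numpy as np

def lasso_viscosity(A, b, gamma, n_iter=2000):
    """Minimize 0.5*||Ax-b||_2^2 + gamma*||x||_1 via iteration (4.1)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    lam = 1.0 / L                              # 0 < lam <= 2/L
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x = np.zeros(A.shape[1])
    for n in range(n_iter):
        alpha = 1.0 / (n + 2)                  # alpha_n -> 0, sum = infinity
        grad = A.T @ (A @ x - b)               # gradient of the smooth term
        x = alpha * (0.5 * x) + (1.0 - alpha) * soft(x - lam * grad, lam * gamma)
    return x

# usage: x_hat = lasso_viscosity(np.random.randn(20, 50), np.random.randn(20), gamma=0.1)
```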

Theorem 4.1

Assume that

  1. (C1)

    \(0<\lambda_{n}\leq2/\|A\|_{2}^{2}\);

  2. (C2)

    \(\alpha_{n}\rightarrow0\);

  3. (C3)

    \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

  4. (C4)

    \(\sum_{n=0}^{\infty}|\alpha_{n+1}-\alpha_{n}|<\infty\);

  5. (C5)

    \(\sum_{n=0}^{\infty}|\lambda_{n+1}-\lambda_{n}|<\infty\).

Then the sequence \(\{x_{n}\}_{n=0}^{\infty}\) generated by (4.1) converges to a solution \(x^{*}\) of lasso (1.2), which also solves the variational inequality (3.6).

For the Q-lasso (1.6), we take \(f(x)=(1/2)\|(I-P_{Q})Ax\|_{2}^{2}\) and \(g(x)=\gamma\|x\|_{1}\). Since \(P_{Q}\) is \(\frac{1}{2}\)-averaged, \(I-P_{Q}\) is nonexpansive, so \(\nabla f(x)=A^{t}(I-P_{Q})Ax\) is Lipschitz continuous with constant \(L=\|A\| _{2}^{2}\):

$$\begin{aligned} &\bigl\| A^{t}(I-P_{Q})Ax-A^{t}(I-P_{Q})Ay \bigr\| _{2} \\ &\quad\leq\|A\|_{2}\bigl\| (I-P_{Q})Ax-(I-P_{Q})Ay \bigr\| _{2} \\ &\quad\leq\|A\|_{2}\|Ax-Ay\|_{2} \\ &\quad\leq\|A\|_{2}^{2}\|x-y\|_{2}. \end{aligned}$$
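When Q is taken to be the closed ball \(B_{\epsilon}(b)\) (as in Sect. 1), the projection \(P_{Q}\) and therefore \(\nabla f(x)=A^{t}(I-P_{Q})Ax\) have simple closed forms; the following sketch assumes that particular choice of Q.

```python
import numpy as np

def proj_ball(y, center, eps):
    """Projection of y onto the closed ball of radius eps around center."""
    d = y - center
    nd = np.linalg.norm(d)
    return y if nd <= eps else center + (eps / nd) * d

def grad_qlasso_smooth(A, x, b, eps):
    """Gradient of f(x) = 0.5*||(I - P_Q)Ax||_2^2 with Q = B_eps(b)."""
    Ax = A @ x
    return A.T @ (Ax - proj_ball(Ax, b, eps))
```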

Then the proximal algorithm (3.8) is reduced to the following algorithm for Q-lasso (1.6):

$$ x_{n+1}=\alpha_{n}h(x_{n})+(1- \alpha_{n}) \bigl[\operatorname{prox}_{\lambda_{n}\gamma\|\cdot \|_{1}}\circ\, \bigl(I- \lambda_{n}A^{t}(I-P_{Q})A \bigr) \bigr]x_{n}. $$
(4.2)

The convergence result of Theorem 3.5 then reads as follows for the Q-lasso (1.6).

Theorem 4.2

Assume that

  1. (C1)

    \(0<\lambda_{n}\leq2/\|A\|_{2}^{2}\);

  2. (C2)

    \(\alpha_{n}\rightarrow0\);

  3. (C3)

    \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

  4. (C4)

    \(\sum_{n=0}^{\infty}|\alpha_{n+1}-\alpha_{n}|<\infty\);

  5. (C5)

    \(\sum_{n=0}^{\infty}|\lambda_{n+1}-\lambda_{n}|<\infty\).

Then the sequence \(\{x_{n}\}_{n=0}^{\infty}\) generated by (4.2) converges to a solution \(x^{*}\) of Q-lasso (1.6), which is also a solution of the variational inequality (3.6).

5 Conclusion

  1. 1.

We modify the proximal-gradient algorithm by means of the viscosity approximation method, thereby obtaining strong convergence where the results in [11] give only weak convergence. We then apply our results to the lasso and the Q-lasso.

  2. 2.

Theorem 3.4 establishes the continuous (implicit) version of Theorem 3.5, which is not presented in [11].

  3. 3.

In our main results, we allow a wider range of step sizes (\(0<\lambda_{n}\leq2/L\)); that is, the condition in our main results is weaker than the corresponding conditions in [11] and [12].

References

  1. Tibshirani, R: Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B 58, 267-288 (1996)

  2. Chen, SS, Donoho, DL, Saunders, MA: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33-61 (1998)

  3. Donoho, D: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289-1306 (2006)

  4. Candes, EJ, Tao, T: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406-5425 (2006)

  5. Candes, EJ, Romberg, J, Tao, T: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(2), 1207-1223 (2006)

  6. Candes, EJ, Romberg, J, Tao, T: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489-509 (2006)

  7. Cipra, BA: \(l_{1}\)-magic. SIAM News. 39, 9 (2006)

  8. Censor, Y, Elfving, T: A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 8(2-4), 221-239 (1994)

  9. Censor, Y, Elfving, T, Kopf, N, Bortfeld, T: The multiple-sets split-feasibility problem and its applications for inverse problems. Inverse Probl. 21, 2071-2084 (2005)

  10. Censor, Y, Bortfeld, T, Martin, B, Trofimov, A: A unified approach for inversion problems in intensity-modulated radiation therapy. Phys. Med. Biol. 51, 2353-2365 (2006)

  11. Xu, HK: Properties and iterative methods for the lasso and its variants. Chin. Ann. Math., Ser. B 35(3), 1-18 (2014)

  12. Alghamdi, MA, Alghamdi, MA, Shahzad, N, Xu, HK: Properties and iterative methods for the Q-lasso. Abstr. Appl. Anal. 2013, Article ID 250943 (2013)

  13. Moudafi, A: Viscosity approximation methods for fixed-points problems. J. Math. Anal. Appl. 241, 46-55 (2000)

  14. Xu, HK: Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 298, 279-291 (2004)

  15. Combettes, PL, Wajs, VR: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168-1200 (2005)

  16. Xu, HK: Averaged mappings and the gradient-projection algorithm. J. Optim. Theory Appl. 150, 360-378 (2011)

  17. Goebel, K, Kirk, WA: Topics in Metric Fixed Point Theory. Cambridge Studies in Advanced Mathematics, vol. 28. Cambridge University Press, Cambridge (1990)

  18. Baillon, JB, Haddad, G: Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Isr. J. Math. 26, 137-150 (1977)

  19. Marino, G, Muglia, L: On the auxiliary mappings generated by a family of mappings and solutions of variational inequalities problems. Optim. Lett. 9, 263-282 (2015)

Acknowledgements

The authors are thankful to the anonymous referees for their valuable comments and suggestions, which helped to improve the presentation of this paper. The first author is supported by the Foundation of Tianjin Key Lab for Advanced Signal Processing and the Fundamental Research Funds for the Central Universities (No. 3122015L015).

Author information

Corresponding author

Correspondence to Ming Tian.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All the authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Cite this article

Tian, M., Gong, JY. Strong convergence of a modified proximal algorithm for solving the lasso. J Inequal Appl 2015, 161 (2015). https://doi.org/10.1186/s13660-015-0680-x
