Strong convergence of a modified proximal algorithm for solving the lasso

Abstract

As is well known, the proximal iterative method can be used to solve the lasso of Tibshirani (J. R. Stat. Soc., Ser. B 58:267-288, 1996). In this paper, we first propose a modified proximal iterative method, based on the viscosity approximation method, which achieves strong convergence; we then apply this method to solve the lasso.

1 Introduction

The lasso of Tibshirani [1] is formulated as the minimization problem

$$ \min_{x\in\mathbb{R}^{n}}\frac{1}{2}\|Ax-b \|_{2}^{2} \quad\mbox{subject to}\quad \| x\|_{1}\leq t, $$
(1.1)

where A is an \(m\times n\) (real) matrix, \(b\in\mathbb{R}^{m}\), and \(t\geq 0\) is a tuning parameter. The regularized minimization problem equivalent to (1.1) is

$$ \min_{x\in\mathbb{R}^{n}}\frac{1}{2}\|Ax-b \|_{2}^{2}+\gamma\|x\|_{1}, $$
(1.2)

where \(\gamma>0\) is a regularization parameter. Since the \(\ell_{1}\) norm promotes sparsity, which arises in practical problems such as image/signal processing and machine learning, the lasso has received much attention in recent years.

In fact, both (1.1) and (1.2) are equivalent to the basis pursuit (BP) of Chen et al. [2]:

$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1} \quad\mbox{subject to}\quad Ax=b. $$

So we mathematically study the inverse linear system in \(\mathbb{R}^{n}\):

$$ Ax=b, $$
(1.3)

where A is an \(m\times n\) matrix, \(b\in\mathbb{R}^{m}\) is the observed data, and \(x\in\mathbb{R}^{n}\) stands for the image of interest to be recovered in imaging science. When \(m\ll n\), the system (1.3) is underdetermined. Donoho [3], Candes, and others [4–6] pioneered the theory of compressed sensing, showing that under certain conditions the underdetermined system (1.3) determines a unique k-sparse solution. Cipra [7] pointed out that the \(\ell_{1}\) norm is ideal for sparse recovery, as it combines the parsimony of \(\ell_{0}\) with the computational efficiency of \(\ell_{2}\).

However, the error of measurements always results in an inaccuracy of the system (1.3):

$$Ax=b+\epsilon. $$

In this case, the BP (1.3) is reformulated as

$$ \min_{x\in\mathbb{R}^{n}}\|x\|_{1}\quad \mbox{subject to}\quad\|Ax-b \|_{2}\leq \epsilon, $$
(1.4)

where \(\epsilon>0\) is the tolerance level of errors and \(\|\cdot\|_{2}\) is the Euclidean norm on \(\mathbb{R}^{m}\). If we let \(Q:=B_{\epsilon}(b)\) be the closed ball in \(\mathbb{R}^{m}\) centered at b with radius ϵ, then (1.4) can be rewritten as

$$ \min_{x\in\mathbb{R}^{n}}\|x\|_{1} \quad \mbox{subject to}\quad Ax\in Q. $$
(1.5)

As Q is a nonempty, closed, and convex subset of \(\mathbb{R}^{m}\), we let \(P_{Q}\) be the projection from \(\mathbb{R}^{m}\) onto Q. The condition \(Ax\in Q\) is equivalent to the condition \(Ax-P_{Q}(Ax)=0\), so the problem (1.5) can be solved via

$$\min_{x\in\mathbb{R}^{n}}\|x\|_{1} \quad\mbox{subject to }(I-P_{Q})Ax=0. $$

Applying the Lagrange method, we obtain the so-called Q-lasso:

$$ \min_{x\in\mathbb{R}^{n}}\frac{1}{2}\bigl\| (I-P_{Q})Ax \bigr\| _{2}^{2}+\gamma\|x\|_{1}, $$
(1.6)

where \(\gamma>0\) is a Lagrange multiplier.

The Q-lasso is connected with the split feasibility problem (SFP) of Censor and Elfving [8–10]. The SFP is mathematically formulated as the problem of finding a point x with the property:

$$ x\in C \quad\mbox{and}\quad Ax\in Q, $$
(1.7)

where C and Q are nonempty, closed, and convex subsets of \(\mathbb{R}^{n}\) and \(\mathbb{R}^{m}\), respectively. An equivalent minimization formulation of the SFP (1.7) is given as

$$\min_{x\in C}\frac{1}{2}\|Ax-P_{Q}Ax \|_{2}^{2}. $$

The \(\ell_{1}\) regularization is given as the minimization problem

$$ \min_{x\in C}\frac{1}{2}\|Ax-P_{Q}Ax \|_{2}^{2}+\gamma\|x\|_{1}, $$
(1.8)

where \(\gamma>0\) is a regularization parameter. If the constrained set C is taken to be the entire space \(\mathbb{R}^{n}\), then the problem (1.8) is equivalent to the problem (1.2).

Recently, Xu [11] exploited the following proximal algorithm:

$$ x_{n+1}= \bigl(\operatorname{prox}_{\lambda_{n}g} \circ\,(I- \lambda_{n}\nabla f) \bigr)x_{n} $$
(1.9)

to solve the lasso (1.1), and Alghamdi et al. [12] also discussed an iterative algorithm for solving the Q-lasso (1.6) via a proximal-gradient method. However, these iterative algorithms yield only weak convergence.

Recall that Moudafi [13] proposed the viscosity iterative method in 2000 as follows:

$$ x_{n+1}=\alpha_{n}f(x_{n})+(1- \alpha_{n})Tx_{n},\quad n\geq0, $$
(1.10)

where f is a contraction on a real Hilbert space, T is a nonexpansive mapping, and \(\{\alpha_{n}\}\) is a sequence in \((0,1)\). In 2004, Xu [14] proved that if \(\{\alpha_{n}\}\) satisfies certain conditions, the sequence generated by (1.10) converges strongly to a fixed point \(x^{*}\) of T, which is also the unique solution of the variational inequality

$$\bigl\langle (I-f)x^{*},x-x^{*} \bigr\rangle \geq0,\quad \mbox{for }x\in \operatorname{Fix}(T). $$

In this paper, based on the viscosity iterative algorithm (1.10), we propose a modified formulation of the proximal algorithm (1.9). It is proved that the algorithm we propose can obtain strong convergence. Then we also apply this algorithm to solve the lasso and Q-lasso.

2 Preliminaries

Let H be a Hilbert space and let \(\Gamma_{0}(H)\) denote the space of proper, lower semicontinuous, convex functions on H.

Definition 2.1

The proximal operator of \(\varphi\in\Gamma_{0}(H)\) is defined by

$$\operatorname{prox}_{\varphi}(x)=\arg\min_{v\in H} \biggl\{ \varphi(v)+\frac{1}{2}\|v-x\|^{2} \biggr\} ,\quad x\in H. $$

The proximal operator of order \(\lambda>0\) is defined as the proximal operator of λφ, that is,

$$\operatorname{prox}_{\lambda\varphi}(x)=\arg\min_{v\in H} \biggl\{ \varphi(v)+\frac{1}{2\lambda }\|v-x\|^{2} \biggr\} ,\quad x\in H. $$

Proposition 2.2

Let \(\varphi\in\Gamma_{0}(H)\) and \(\lambda\in(0,\infty)\).

  1. (i)

    \(\operatorname{prox}_{\lambda\varphi}\) is firmly nonexpansive (hence nonexpansive). Recall that a mapping \(T:H\rightarrow H\) is firmly nonexpansive if

    $$\|Tx-Ty\|^{2}\leq\langle Tx-Ty,x-y\rangle, \quad x,y\in H. $$
  2. (ii)

    \(\operatorname{prox}_{\lambda\varphi}=(I+\lambda\partial\varphi )^{-1}=J_{\lambda}^{\partial\varphi}\), the resolvent of the subdifferential ∂φ of φ.

Combettes and Wajs [15] showed that the proximal operator \(\operatorname{prox}_{\lambda\varphi}\) has a closed-form expression in some important cases; for example, if we take φ to be the norm of H, then

$$\operatorname{prox}_{\lambda\|\cdot\|}(x)= \left \{ \begin{array}{@{}l@{\quad}l} (1-\frac{\lambda}{\|x\|})x, &\mbox{if }\|x\|>\lambda,\\ 0, &\mbox{if }\|x\|\leq\lambda. \end{array} \right . $$

In particular, if \(H=\mathbb{R}\), then the above operator is reduced to the scalar soft-thresholding operator:

$$\operatorname{prox}_{\lambda|\cdot|}(x)=\operatorname{sgn}(x) \max \bigl\{ |x|-\lambda,0\bigr\} . $$
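For concreteness, the following is a minimal NumPy sketch of these two closed-form proximal operators (the vector shrinkage for the Hilbert-space norm and the scalar soft-thresholding operator); the function names are ours and chosen purely for illustration.

```python
import numpy as np

def prox_norm(x, lam):
    """Proximal operator of lam*||.||_2: shrink x toward 0 by lam in norm."""
    nx = np.linalg.norm(x)
    return (1.0 - lam / nx) * x if nx > lam else np.zeros_like(x)

def soft_threshold(x, lam):
    """Componentwise soft-thresholding: proximal operator of lam*|.|."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```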

Lemma 2.3

[11]

The proximal identity

$$\operatorname{prox}_{\lambda\varphi}x=\operatorname{prox}_{\mu\varphi} \biggl( \frac{\mu}{\lambda}x+ \biggl(1-\frac {\mu}{\lambda} \biggr)\operatorname{prox}_{\lambda\varphi}x \biggr) $$

holds for \(\varphi\in\Gamma_{0}(H)\), \(x\in H\), \(\lambda>0\), and \(\mu>0\).
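As a quick numerical sanity check of this identity, the following sketch verifies it for the particular choice \(\varphi=\|\cdot\|_{1}\), whose proximal operator is componentwise soft-thresholding (variable names are illustrative).

```python
import numpy as np

def prox_l1(x, lam):
    # proximal operator of lam*||.||_1 (componentwise soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
lam, mu = 0.7, 0.3

lhs = prox_l1(x, lam)
rhs = prox_l1((mu / lam) * x + (1.0 - mu / lam) * prox_l1(x, lam), mu)
print(np.allclose(lhs, rhs))  # expected output: True
```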

Recall that a function \(f:H\rightarrow\mathbb{R}\) is convex if

$$f \bigl((1-\lambda)x+\lambda y \bigr)\leq(1-\lambda)f(x)+\lambda f(y) $$

for all \(\lambda\in(0,1)\) and \(x,y\in H\). (Note that we consider finite-valued functions.)

The subdifferential of a convex function f is defined as the operator ∂f given by

$$ \partial f(x)= \bigl\{ \xi\in H:f(y)\geq f(x)+\langle\xi,y-x \rangle,y\in H \bigr\} . $$
(2.1)

The inequality in (2.1) is referred to as the subdifferential inequality of f at x. We say that f is subdifferentiable at x if \(\partial f(x)\) is nonempty. We know that for an everywhere finite-valued convex function f on H, f is everywhere subdifferentiable.

Example

  1. (i)

    If \(f(x)=|x|\) for \(x\in\mathbb{R}\), then \(\partial f(0)=[-1,1]\);

  2. (ii)

    if \(f(x)=\|x\|_{1}\), for \(x\in\mathbb{R}^{n}\), then \(\partial f(x)\) is given componentwise by

    $$\bigl(\partial f(x) \bigr)_{j}= \left \{ \begin{array}{@{}l@{\quad}l} \operatorname{sgn}(x_{j}), & \mbox{if }x_{j}\neq0,\\ \xi_{j}\in[-1,1],& \mbox{if }x_{j}=0, \end{array} \right . $$

    for \(1\leq j\leq n\).
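A short sketch of how one admissible subgradient of the \(\ell_{1}\) norm can be selected componentwise (we simply take \(\xi_{j}=0\) whenever \(x_{j}=0\), one admissible element of \([-1,1]\)):

```python
import numpy as np

def subgradient_l1(x):
    """Return one element of the subdifferential of ||.||_1 at x."""
    # sgn(x_j) where x_j != 0; np.sign returns 0 where x_j == 0,
    # which is an admissible choice from [-1, 1]
    return np.sign(x)

print(subgradient_l1(np.array([1.5, 0.0, -2.0])))  # [ 1.  0. -1.]
```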

Consider the unconstrained minimization problem:

$$ \min_{x\in H}f(x). $$
(2.2)

Proposition 2.4

Let f be an everywhere finite-valued convex function on H and let \(z\in H\). Suppose f is bounded below (i.e., \(\operatorname{inf}\{f(x):x\in H\}>-\infty\)). Then z is a solution of the minimization problem (2.2) if and only if it satisfies the first-order optimality condition:

$$0\in\partial f(z). $$

Lemma 2.5

[16]

Assume that \(\{a_{n}\}_{n=0}^{\infty}\) is a sequence of nonnegative real numbers such that

$$a_{n+1}\leq(1-\gamma_{n})a_{n}+\gamma_{n} \delta_{n}+\beta_{n},\quad n\geq0, $$

where \(\{\gamma_{n}\}_{n=0}^{\infty}\) and \(\{\beta_{n}\}_{n=0}^{\infty}\) are sequences in \((0,1)\) and \(\{\delta_{n}\}_{n=0}^{\infty}\) is a sequence in \(\mathbb{R}\) such that

  1. (i)

    \(\sum_{n=0}^{\infty}\gamma_{n}=\infty\);

  2. (ii)

    either \(\limsup_{n\rightarrow\infty}\delta_{n}\leq0\) or \(\sum_{n=0}^{\infty}\gamma_{n}|\delta_{n}|<\infty\);

  3. (iii)

    \(\sum_{n=0}^{\infty}\beta_{n}<\infty\).

Then \(\lim_{n\rightarrow\infty}a_{n}=0\).

Lemma 2.6

Let H be a Hilbert space. Then, for all \(x,y\in H\), we have the inequality

$$\|x+y\|^{2}\leq\|x\|^{2}+2\langle y,x+y\rangle. $$

Lemma 2.7

[17]

Let H be a Hilbert space, C a closed convex subset of H, and \(T:C\rightarrow C\) a nonexpansive mapping with \(\operatorname{Fix}(T)\neq\emptyset \). If \(\{x_{n}\}_{n=1}^{\infty}\) is a sequence in C weakly converging to x and if \(\{(I-T)x_{n}\}_{n=1}^{\infty}\) converges strongly to y, then \((I-T)x=y\). In particular, if \(y=0\), then \(x\in \operatorname{Fix}(T)\).

Lemma 2.8

[14]

Let H be a Hilbert space, \(h:H\rightarrow H\) a contraction with coefficient \(0<\rho<1\). Then

$$\bigl\langle x-y,(I-h)x-(I-h)y \bigr\rangle \geq(1-\rho)\|x-y\|^{2}, \quad x,y \in H. $$

That is, \(I-h\) is strongly monotone with coefficient \(1-\rho\).

We will use the notation ⇀ for weak convergence and → for strong convergence.

3 Strong convergence of proximal algorithms

Let H be a real Hilbert space and let \(\Gamma_{0}(H)\) denote, as before, the space of proper, lower semicontinuous, convex functions on H. Consider the following minimization problem:

$$ \min_{x\in H}f(x)+g(x), $$
(3.1)

where \(f,g\in\Gamma_{0}(H)\).

Proposition 3.1

[11]

Let \(f,g\in\Gamma_{0}(H)\). Let \(x^{*}\in H\) and \(\lambda>0\). Assume that f is finite-valued and differentiable on H. Then \(x^{*}\) is a solution to (3.1) if and only if \(x^{*}\) solves the fixed point equation:

$$x^{*}= \bigl(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f) \bigr)x^{*}. $$

Consider a mapping \(S_{t}\) on H defined by

$$ S_{t}(x)=th(x)+(1-t) \bigl(\operatorname{prox}_{\lambda_{t} g} \circ\,(I-\lambda_{t}\nabla f) \bigr)x, $$
(3.2)

where h is a contraction with coefficient \(0<\rho<1\), \(t\in (0,1)\), and \(0<\lambda_{t}\leq\frac{2}{L}\). Assume that ∇f is L-Lipschitzian.

Proposition 3.2

The mapping \(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f)\) is nonexpansive.

Proof

We show it in two cases.

Case 1: \(\lambda_{t}=\frac{2}{L}\). Since ∇f is L-Lipschitzian, it is \((1/L)\)-ism [18], i.e., \(\langle x-y,\nabla f(x)-\nabla f(y)\rangle\geq\frac{1}{L}\|\nabla f(x)-\nabla f(y)\|^{2}\); hence we have

$$\begin{aligned} &\bigl\| (I-\lambda_{t}\nabla f)x-(I-\lambda_{t}\nabla f)y \bigr\| ^{2} \\ &\quad=\biggl\| \biggl(I-\frac{2}{L}\nabla f \biggr)x- \biggl(I- \frac{2}{L}\nabla f \biggr)y\biggr\| ^{2} \\ &\quad= \biggl\langle x-y-\frac{2}{L} \bigl(\nabla f(x)-\nabla f(y) \bigr), x-y- \frac{2}{L} \bigl(\nabla f(x)-\nabla f(y) \bigr) \biggr\rangle \\ &\quad=\|x-y\|^{2}+\frac{4}{L^{2}}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2}-\frac {2}{L} \bigl\langle x-y,\nabla f(x)-\nabla f(y) \bigr\rangle \\ &\qquad{}-\frac{2}{L} \bigl\langle \nabla f(x)-\nabla f(y),x-y \bigr\rangle \\ &\quad\leq\|x-y\|^{2}+\frac{4}{L^{2}}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2}-\frac {2}{L}\cdot\frac{1}{L}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2} \\ &\qquad{}-\frac{2}{L}\cdot\frac{1}{L}\bigl\| \nabla f(x)-\nabla f(y) \bigr\| ^{2} \\ &\quad\leq\|x-y\|^{2}. \end{aligned}$$

Hence,

$$\bigl\| (I-\lambda_{t}\nabla f)x-(I-\lambda_{t}\nabla f)y\bigr\| \leq\|x-y\|. $$

As the mapping \(\operatorname{prox}_{\lambda_{t}g}\) is nonexpansive, we get

$$\begin{aligned} &\bigl\| \bigl(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda _{t}\nabla f)\bigr)x-\bigl(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t} \nabla f)\bigr)y\bigr\| \\ &\quad\leq\bigl\| (I-\lambda_{t}\nabla f)x-(I-\lambda_{t}\nabla f)y\bigr\| \\ &\quad\leq\|x-y\|. \end{aligned}$$

Case 2: \(0<\lambda_{t}<\frac{2}{L}\). We follow the proof in [16]. Since ∇f is L-Lipschitzian, ∇f is \((1/L)\)-ism [18], which implies that \(\lambda_{t}\nabla f\) is \((1/\lambda_{t}L)\)-ism. So \(I-\lambda_{t}\nabla f\) is \((\lambda_{t}L/2)\)-averaged. Since the proximal mapping \(\operatorname{prox}_{\lambda_{t}g}\) is \((1/2)\)-averaged, the composite \(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f)\) is \(((2+\lambda _{t}L)/4)\)-averaged for \(0<\lambda_{t}<2/L\); in particular, the mapping \(\operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f)\) is nonexpansive. □

By Proposition 3.2 it is not hard to see that \(S_{t}\) is a contraction on H. For \(x,y\in H\), we have

$$\begin{aligned} &\bigl\| S_{t}(x)-S_{t}(y)\bigr\| \\ &\quad=\bigl\| t \bigl[h(x)-h(y) \bigr]+(1-t) \bigl[ \bigl(\operatorname{prox}_{\lambda_{t}g} \circ\,(I-\lambda_{t}\nabla f) \bigr)x- \bigl(\operatorname{prox}_{\lambda_{t}g} \circ\,(I-\lambda_{t}\nabla f) \bigr)y \bigr]\bigr\| \\ &\quad\leq t\rho\|x-y\|+(1-t)\|x-y\| \\ &\quad= \bigl(1-t(1-\rho) \bigr)\|x-y\|. \end{aligned}$$

Hence, \(S_{t}\) has a unique fixed point, which we denote by \(x_{t}\). Thus \(x_{t}\) is the unique solution of the fixed point equation

$$ x_{t}=t h(x_{t})+(1-t) \bigl( \operatorname{prox}_{\lambda_{t}g}\circ\,(I-\lambda_{t}\nabla f) \bigr)x_{t}. $$
(3.3)

We will use the following notation in Proposition 3.3 and Theorem 3.4:

$$\begin{aligned}& V_{t}=V_{\lambda_{t}}=\operatorname{prox}_{\lambda_{t}g}\circ\,(I- \lambda_{t}\nabla f); \\& V=V_{\lambda}=\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f). \end{aligned}$$

The properties of \(V_{t}\) and V are helpful for the following proof [19].

Proposition 3.3

Assume that the minimization problem (3.1) is consistent, and let S denote its solution set. Assume that \(\lambda_{t}\) is continuous with respect to t. Since \(0<\lambda_{t}\leq2/L\), we may assume that \(\lambda_{t_{j}}\rightarrow \lambda \) as \(t_{j}\rightarrow0\), with \(\lambda>0\). Let \(x_{t}\) be defined by (3.3). Then:

  1. (i)

    \(\{x_{t}\}\) is bounded for \(t\in(0, 1)\);

  2. (ii)

    \(\lim_{t\rightarrow0}\|x_{t}-(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda \nabla f))x_{t}\|=0\);

  3. (iii)

    \(x_{t}\) defines a continuous curve from \((0,1)\) into H.

Proof

(i) Take \(p\in S\); then \(p\in \operatorname{Fix}(V)\), and

$$\begin{aligned} &\|x_{t}-p\| \\ &\quad=\bigl\| th(x_{t})+(1-t)V_{t}x_{t}-p\bigr\| \\ &\quad\leq(1-t)\|V_{t}x_{t}-p\|+t\bigl\| h(x_{t})-p\bigr\| \\ &\quad=(1-t)\|V_{t}x_{t}-V_{t}p\|+t \bigl\| h(x_{t})-p\bigr\| \\ &\quad\leq(1-t)\|x_{t}-p\|+t\bigl\| h(x_{t})-p\bigr\| . \end{aligned}$$

It follows that

$$\begin{aligned} &\|x_{t}-p\| \\ &\quad\leq\bigl\| h(x_{t})-p\bigr\| \\ &\quad\leq\bigl\| h(x_{t})-h(p)\bigr\| +\bigl\| h(p)-p\bigr\| \\ &\quad\leq\rho\|x_{t}-p\|+\bigl\| h(p)-p\bigr\| . \end{aligned}$$

Hence, \(\|x_{t}-p\|\leq\frac{1}{1-\rho}\|h(p)-p\|\), so \(\{x_{t}\}\) is bounded, and so are \(\{(\operatorname{prox}_{\lambda g} \circ\,(I-\lambda\nabla f))x_{t}\}\) and \(\{h(x_{t})\}\).

(ii) By the definition of \(\{x_{t}\}\), we have, for any \(\{t_{j}\} \rightarrow0\),

$$\begin{aligned} &\bigl\| x_{t_{j}}- \bigl(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda \nabla f) \bigr)x_{t_{j}}\bigr\| \\ &\quad=\bigl\| t_{j}h(x_{t_{j}})+(1-t_{j})V_{t_{j}}x_{t_{j}}-Vx_{t_{j}} \bigr\| \\ &\quad\leq t_{j}\bigl\| h(x_{t_{j}})-V_{t_{j}}x_{t_{j}} \bigr\| + \| V_{t_{j}}x_{t_{j}}-Vx_{t_{j}}\| \end{aligned}$$

and

$$\begin{aligned} &\|V_{t_{j}}x_{t_{j}}-Vx_{t_{j}}\| \\ &\quad=\bigl\| \bigl(\operatorname{prox}_{\lambda_{t_{j}}g}\circ\,(I-\lambda_{t_{j}} \nabla f) \bigr)x_{t_{j}}- \bigl(\operatorname{prox}_{\lambda g} \circ\,(I- \lambda\nabla f) \bigr)x_{t_{j}}\bigr\| \\ &\quad=\biggl\| \operatorname{prox}_{\lambda g} \biggl(\frac{\lambda}{\lambda_{t_{j}}}(I- \lambda _{t_{j}}\nabla f)x_{t_{j}}+ \biggl(1- \frac{\lambda}{\lambda _{t_{j}}} \biggr) \bigl(\operatorname{prox}_{\lambda_{t_{j}} g}\circ\,(I- \lambda_{t_{j}}\nabla f) \bigr)x_{t_{j}} \biggr) \\ &\qquad{}- \bigl(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f) \bigr)x_{t_{j}}\biggr\| \\ &\quad\leq\biggl\| \frac{\lambda}{\lambda_{t_{j}}}(I-\lambda_{t_{j}}\nabla f)x_{t_{j}}+ \biggl(1-\frac{\lambda}{\lambda_{t_{j}}} \biggr) \bigl( \operatorname{prox}_{\lambda_{t_{j}} g} \circ\,(I-\lambda_{t_{j}}\nabla f) \bigr)x_{t_{j}}-(I-\lambda\nabla f)x_{t_{j}}\biggr\| \\ &\quad=\biggl\| \biggl(\frac{\lambda}{\lambda_{t_{j}}}-1 \biggr)x_{t_{j}}+ \biggl(1- \frac{\lambda }{\lambda_{t_{j}}} \biggr) \bigl(\operatorname{prox}_{\lambda_{t_{j}}g}\circ\,(I-\lambda _{t_{j}}\nabla f)\bigr)x_{t_{j}}\biggr\| \\ &\quad=\biggl|1-\frac{\lambda}{\lambda_{t_{j}}}\biggr|\cdot\| x_{t_{j}}-V_{t_{j}}x_{t_{j}} \|. \end{aligned}$$

Then we obtain

$$ \bigl\| x_{t_{j}}-\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)x_{t_{j}}\bigr\| \leq t_{j}\bigl\| h(x_{t_{j}})-V_{t_{j}}x_{t_{j}} \bigr\| +\biggl|1-\frac{\lambda}{\lambda _{t_{j}}}\biggr|\cdot\|x_{t_{j}}-V_{t_{j}}x_{t_{j}} \|. $$
(3.4)

Since \(\{x_{t}\}\), \(\{h(x_{t})\}\), and \(\{V_{t}x_{t}\}\) are bounded, and \(\lambda_{t_{j}}\rightarrow\lambda \) (\(t_{j}\rightarrow0\)), we can obtain by (3.4) that

$$\lim_{t_{j}\rightarrow0}\bigl\| x_{t_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I- \lambda \nabla f)x_{t_{j}}\bigr\| =0. $$

By the arbitrariness of \(t_{j}\), we get

$$\lim_{t\rightarrow0}\bigl\| x_{t}-\operatorname{prox}_{\lambda g}\circ\,(I- \lambda\nabla f)x_{t}\bigr\| =0. $$

(iii) For any given \(t,t_{0}\in(0,1)\),

$$\begin{aligned} &\|x_{t}-x_{t_{0}}\|\\ &\quad=\bigl\| th(x_{t})+(1-t)V_{t}x_{t}-t_{0}h(x_{t_{0}})-(1-t_{0})V_{t_{0}}x_{t_{0}} \bigr\| \\ &\quad\leq\bigl\| (t-t_{0})h(x_{t})+t_{0} \bigl(h(x_{t})-h(x_{t_{0}}) \bigr)\bigr\| \\ &\qquad{}+\bigl\| (1-t_{0})V_{t_{0}}x_{t}-(1-t_{0})V_{t_{0}}x_{t_{0}}+(1-t)V_{t}x_{t}-(1-t_{0})V_{t_{0}}x_{t} \bigr\| \\ &\quad\leq|t-t_{0}|\cdot\bigl\| h(x_{t})\bigr\| +t_{0}\rho \|x_{t}-x_{t_{0}}\| +(1-t_{0})\|x_{t}-x_{t_{0}} \| \\ &\qquad{}+\|V_{t}x_{t}-V_{t_{0}}x_{t}\|+ \|tV_{t}x_{t}-t_{0}V_{t_{0}}x_{t} \| \\ &\quad\leq|t-t_{0}|\cdot\bigl\| h(x_{t})\bigr\| +t_{0}\rho \|x_{t}-x_{t_{0}}\| +(1-t_{0})\|x_{t}-x_{t_{0}} \|+\biggl|1-\frac{\lambda_{t_{0}}}{\lambda _{t}}\biggr|\cdot\|x_{t}-V_{t}x_{t}\| \\ &\qquad{}+\|t_{0}V_{t}x_{t}-t_{0}V_{t_{0}}x_{t} \|+\| tV_{t}x_{t}-t_{0}V_{t}x_{t} \| \\ &\quad\leq \bigl(1-t_{0}(1-\rho) \bigr)\|x_{t}-x_{t_{0}} \|+(t_{0}+1)\biggl|1-\frac{\lambda _{t_{0}}}{\lambda_{t}}\biggr|\cdot\|x_{t}-V_{t}x_{t} \| \\ &\qquad{}+|t-t_{0}| \bigl(\bigl\| h(x_{t})\bigr\| + \|V_{t}x_{t} \| \bigr). \end{aligned}$$

Hence,

$$ \|x_{t}-x_{t_{0}}\|\leq\frac{(\|h(x_{t})\|+\|V_{t}x_{t}\|)}{t_{0}(1-\rho )}|t-t_{0}|+ \frac{(t_{0}+1)}{t_{0}(1-\rho)}\biggl|1-\frac{\lambda _{t_{0}}}{\lambda_{t}}\biggr|\cdot\|x_{t}-V_{t}x_{t} \|. $$
(3.5)

Since \(\lambda_{t}\) is continuous with respect to t, we can obtain by (3.5) that \(x_{t}\rightarrow x_{t_{0}}\) as \(t\rightarrow t_{0}\), that is, \(\{x_{t}\}\) defines a continuous curve from \((0,1)\) into H. □

Theorem 3.4

Let \(\{x_{t}\}\) be defined by (3.3). Assume that the minimization problem (3.1) is consistent, and let S denote its solution set. Assume that \(\lim_{t\rightarrow0}\lambda_{t}=\lambda>0\) and that \(\lambda _{t}\) is continuous with respect to t. Assume that ∇f is L-Lipschitzian. Then \(x_{t}\) converges as \(t\rightarrow0\) to a solution \(x^{*}\) of (3.1), which also solves the variational inequality

$$ \bigl\langle (I-h)x^{*}, \tilde{x}-x^{*} \bigr\rangle \geq0, \quad\tilde{x}\in S. $$
(3.6)

Proof

By Lemma 2.8 we know that \(I-h\) is strongly monotone, so the variational inequality (3.6) has only one solution. Below we use \(x^{*}\in S\) to denote the unique solution of (3.6).

To prove that \(x_{t}\rightarrow x^{*}\) (\(t\rightarrow0\)), we write, for a given \(\tilde{x}\in S\),

$$\begin{aligned}& x_{t}-\tilde{x}=t h(x_{t})+(1-t)V_{t}x_{t}- \tilde{x}=t \bigl(h(x_{t})-\tilde {x} \bigr)+(1-t) (V_{t}x_{t}- \tilde{x}); \\& \begin{aligned}[b] &\|x_{t}-\tilde{x}\|^{2} \\ &\quad=\langle x_{t}-\tilde{x},x_{t}-\tilde{x} \rangle \\ &\quad=(1-t)\langle V_{t}x_{t}-\tilde{x},x_{t}- \tilde{x}\rangle+t \bigl\langle h(x_{t})-\tilde{x},x_{t}- \tilde{x} \bigr\rangle \\ &\quad\leq(1-t)\|x_{t}-\tilde{x}\|^{2}+t \bigl\langle h(x_{t})-h(\tilde {x}),x_{t}-\tilde{x} \bigr\rangle +t \bigl\langle h(\tilde{x})-\tilde{x},x_{t}-\tilde {x} \bigr\rangle \\ &\quad\leq \bigl(1-t(1-\rho) \bigr)\|x_{t}-\tilde{x}\|^{2}+t \bigl\langle h(\tilde{x})-\tilde {x},x_{t}-\tilde{x} \bigr\rangle . \end{aligned} \end{aligned}$$

Hence,

$$ \|x_{t}-\tilde{x}\|^{2}\leq \frac{\langle h(\tilde{x})-\tilde {x},x_{t}-\tilde{x}\rangle}{1-\rho}. $$
(3.7)

Since \(\{x_{t}\}\) is bounded as \(t\rightarrow0\), we can take a sequence \(\{t_{j}\}\) in \((0,1)\) such that \(t_{j}\rightarrow0\) (\(j\rightarrow\infty\)) and \(x_{t_{j}}\rightharpoonup\bar{x}\). Since \(0<\lambda_{t}\leq \frac{2}{L}\), we may assume that \(\lambda_{t_{j}}\rightarrow\lambda\in (0,\frac{2}{L}]\) with \(\lambda>0\); then, by the proof of Proposition 3.2, \(\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f)\) is nonexpansive. Applying Lemma 2.7 and Proposition 3.3(ii), we get \(\bar{x}\in S\). Taking \(\tilde{x}=\bar{x}\) in (3.7) then shows that \(x_{t_{j}}\rightarrow\bar{x}\) strongly.

Next, we show that \(\bar{x}\in S\) solves the variational inequality (3.6). Indeed, we notice that \(x_{t}\) solves the fixed point equation

$$\begin{aligned}& x_{t}=t h(x_{t})+(1-t) \bigl(\operatorname{prox}_{\lambda_{t}g} \circ\,(I-\lambda_{t}\nabla f) \bigr)x_{t}, \\& h(x_{t})=\frac{1}{t} \bigl[x_{t}-(1-t)V_{t}x_{t} \bigr], \\& (I-h)x_{t}=-\frac{1-t}{t}(I-V_{t})x_{t}. \end{aligned}$$

Since \(V_{t}\) is nonexpansive, \(I-V_{t}\) is monotone, so for any \(\tilde {x}\in S\),

$$\begin{aligned} &\bigl\langle (I-h)x_{t},x_{t}-\tilde{x}\bigr\rangle \\ &\quad=-\frac{1-t}{t} \bigl\langle (I-V_{t})x_{t},x_{t}- \tilde{x} \bigr\rangle \\ &\quad=-\frac{1-t}{t} \bigl\langle (I-V_{t})x_{t}-(I-V_{t}) \tilde{x},x_{t}-\tilde {x} \bigr\rangle -\frac{1-t}{t} \bigl\langle (I-V_{t})\tilde{x},x_{t}-\tilde{x} \bigr\rangle \\ &\quad\leq-\frac{1-t}{t} \bigl\langle (I-V_{t}) \tilde{x},x_{t}- \tilde{x} \bigr\rangle \\ &\quad=0. \end{aligned}$$

Taking the limit along \(t=t_{j}\rightarrow0\), we obtain

$$\bigl\langle (I-h)\bar{x},\bar{x}-\tilde{x} \bigr\rangle \leq0. $$

That is, x̄ solves the variational inequality (3.6); therefore \(\bar{x}=x^{*}\) by uniqueness. Since the weak cluster point x̄ was arbitrary, we conclude that \(x_{t}\rightarrow x^{*}\) as \(t\rightarrow0\). □

We now consider the following explicit algorithm: initialize \(x_{0}\in H\) and iterate

$$ x_{n+1}=\alpha_{n}h(x_{n})+(1- \alpha_{n}) \bigl(\operatorname{prox}_{\lambda_{n}g}\circ\, (I- \lambda_{n}\nabla f) \bigr)x_{n}, $$
(3.8)

where \(\{\alpha_{n}\}\) is a sequence in \((0,1)\), \(0<\lambda_{n}\leq\frac {2}{L}\), \(\liminf_{n\rightarrow\infty}\lambda_{n}>0\), and \(h:H\rightarrow H\) is a contraction with coefficient \(0<\rho<1\).
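A minimal sketch of iteration (3.8) in this abstract form, with the gradient, the proximal operator, and the contraction h supplied as callables; the particular choices \(\alpha_{n}=1/(n+2)\) and a constant \(\lambda_{n}=1/L\) are illustrative ones satisfying the conditions of Theorem 3.5 below, not prescriptions.

```python
import numpy as np

def viscosity_proximal(x0, grad_f, prox_g, h, L, n_iter=500):
    """Iteration (3.8): x_{n+1} = a_n*h(x_n) + (1-a_n)*prox_{lam_n g}(x_n - lam_n*grad_f(x_n))."""
    x = np.asarray(x0, dtype=float)
    lam = 1.0 / L                    # constant step size with 0 < lam <= 2/L
    for n in range(n_iter):
        alpha = 1.0 / (n + 2)        # alpha_n -> 0 and sum alpha_n = infinity
        y = prox_g(x - lam * grad_f(x), lam)
        x = alpha * h(x) + (1.0 - alpha) * y
    return x
```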

Theorem 3.5

Let \(f,g\in\Gamma_{0}(H)\). Assume that the minimization problem (3.1) is consistent and let S denote its solution set. Assume in addition that

  1. (C1)

    ∇f is L-Lipschitzian on H;

  2. (C2)

    \(\alpha_{n}\rightarrow0\);

  3. (C3)

    \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

  4. (C4)

    \(\sum_{n=0}^{\infty}|\alpha_{n+1}-\alpha_{n}|<\infty\);

  5. (C5)

    \(\sum_{n=0}^{\infty}|\lambda_{n+1}-\lambda_{n}|<\infty\).

Then the sequence \(\{x_{n}\}_{n=0}^{\infty}\) generated by (3.8) converges strongly to the point \(x^{*}\) defined in Theorem 3.4.

Proof

Put

$$\begin{aligned}& V_{n}=V_{\lambda_{n}}=\operatorname{prox}_{\lambda_{n}g}\circ\,(I- \lambda_{n}\nabla f); \\& V=V_{\lambda}=\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f). \end{aligned}$$

We then get \(x_{n+1}=\alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}\).

First we show that the sequence \(\{x_{n}\}_{n=0}^{\infty}\) is bounded. Indeed, we have, for \(\bar{x}\in S\),

$$\begin{aligned} &\|x_{n+1}-\bar{x}\| \\ &\quad=\bigl\| \alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}- \bar{x}\bigr\| \\ &\quad=\bigl\| \alpha_{n} \bigl(h(x_{n})-h(\bar{x}) \bigr)+ \alpha_{n} \bigl(h(\bar{x})-\bar {x} \bigr)+(1-\alpha_{n}) (V_{n}x_{n}-\bar{x})\bigr\| \\ &\quad\leq\alpha_{n}\rho\|x_{n}-\bar{x}\|+ \alpha_{n} \bigl\| h(\bar{x})-\bar{x}\bigr\| +(1-\alpha_{n}) \|x_{n}-\bar{x}\| \\ &\quad= \bigl(1-\alpha_{n}(1-\rho) \bigr)\|x_{n}-\bar{x}\|+ \alpha_{n}\bigl\| h(\bar{x})-\bar {x}\bigr\| \\ &\quad\leq \max \bigl\{ \|x_{n}-\bar{x}\|,\bigl\| h(\bar{x})-\bar{x}\bigr\| /(1- \rho) \bigr\} . \end{aligned}$$

So, an induction argument shows that

$$\|x_{n}-\bar{x}\|\leq \max \bigl\{ \|x_{0}-\bar{x}\|,\bigl\| h( \bar{x})-\bar{x}\bigr\| /(1-\rho) \bigr\} ,\quad n\geq0. $$

We next prove that \(\|x_{n+1}-x_{n}\|\rightarrow0\) as \(n\rightarrow \infty\). For the sake of simplicity, we assume that \(0< a\leq\lambda _{n}\leq b\leq\frac{2}{L}\).

We compute

$$\begin{aligned} &\|x_{n+1}-x_{n}\| \\ &\quad=\bigl\| \alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}- \alpha _{n-1}h(x_{n-1})-(1-\alpha_{n-1})V_{n-1}x_{n-1} \bigr\| \\ &\quad\leq\bigl\| \alpha_{n} \bigl[h(x_{n})-h(x_{n-1}) \bigr]+ \bigl[\alpha_{n}h(x_{n-1})-\alpha _{n-1}h(x_{n-1}) \bigr]\bigr\| \\ &\qquad{}+\bigl\| (1-\alpha_{n}) (V_{n}x_{n}-V_{n}x_{n-1})+(1- \alpha _{n})V_{n}x_{n-1}-(1-\alpha_{n-1})V_{n-1}x_{n-1} \bigr\| \\ &\quad\leq\alpha_{n}\rho\|x_{n}-x_{n-1}\|+| \alpha_{n}-\alpha_{n-1}|\bigl\| h(x_{n-1})\bigr\| +(1- \alpha_{n})\|x_{n}-x_{n-1}\| \\ &\qquad{}+(1-\alpha_{n})\|V_{n}x_{n-1}-V_{n-1}x_{n-1} \|+\|\alpha _{n}V_{n-1}x_{n-1}-\alpha_{n-1}V_{n-1}x_{n-1} \| \\ &\quad\leq \bigl(1-\alpha_{n}(1-\rho) \bigr)\|x_{n}-x_{n-1} \|+|\alpha_{n}-\alpha _{n-1}| \bigl(\bigl\| h(x_{n-1})\bigr\| + \|V_{n-1}x_{n-1}\| \bigr) \\ &\qquad{}+\|V_{n}x_{n-1}-V_{n-1}x_{n-1}\| \end{aligned}$$

and

$$\begin{aligned} \|V_{n}x_{n-1}-V_{n-1}x_{n-1}\| &\leq\biggl|1-\frac{\lambda_{n}}{\lambda_{n-1}}\biggr| \cdot\|x_{n-1}-V_{n}x_{n-1} \| \\ &\leq\frac{|\lambda_{n}-\lambda_{n-1}|}{a}\|x_{n-1}-V_{n}x_{n-1} \|, \end{aligned}$$

so we obtain

$$ \|x_{n+1}-x_{n}\|\leq \bigl(1- \alpha_{n}(1-\rho) \bigr)\|x_{n}-x_{n-1}\|+M\bigl(|\alpha _{n}-\alpha_{n-1}|+|\lambda_{n}- \lambda_{n-1}|\bigr), $$
(3.9)

where M is a constant with \(M\geq \sup_{n}\max\{\|h(x_{n-1})\|+\|V_{n-1}x_{n-1}\|,\| x_{n-1}-V_{n}x_{n-1}\|/a\}\); such an M exists because the sequences involved are bounded. By assumptions (C3)-(C5) in the theorem, we have \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\) and \(\sum_{n=1}^{\infty}(|\alpha_{n}-\alpha_{n-1}|+|\lambda_{n}-\lambda _{n-1}|)<\infty\). Hence, Lemma 2.5 is applicable to (3.9), and we conclude that \(\|x_{n+1}-x_{n}\|\rightarrow0\).

Since \(\{x_{n}\}\) is bounded, there exists a subsequence \(\{x_{n_{j}}\} \) such that \(x_{n_{j}}\rightharpoonup z\); below we prove that \(z\in S\). Since \(0< a\leq\lambda_{n}\leq\frac{2}{L}\), we may assume that \(\lambda_{n_{j}}\rightarrow\lambda\) with \(\lambda\geq a>0\). We have

$$ \bigl\| x_{n_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I- \lambda\nabla f)x_{n_{j}}\bigr\| \leq\|x_{n_{j}}-x_{n_{j+1}}\|+ \bigl\| x_{n_{j+1}}-\operatorname{prox}_{\lambda g}\circ\, (I-\lambda\nabla f)x_{n_{j}}\bigr\| . $$
(3.10)

We compute

$$\begin{aligned} &\bigl\| x_{n_{j+1}}-\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f)x_{n_{j}}\bigr\| \\ &\quad=\bigl\| \alpha_{n_{j}}h(x_{n_{j}})+(1-\alpha_{n_{j}}) \bigl( \operatorname{prox}_{\lambda _{n_{j}}g}\circ\,(I-\lambda_{n_{j}}\nabla f) \bigr)x_{n_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I-\lambda\nabla f)x_{n_{j}}\bigr\| \\ &\quad\leq\alpha_{n_{j}}\bigl\| h(x_{n_{j}})-Vx_{n_{j}}\bigr\| +(1- \alpha_{n_{j}})\| V_{n_{j}}x_{n_{j}}-Vx_{n_{j}}\| \end{aligned}$$

and

$$\begin{aligned} &\|V_{n_{j}}x_{n_{j}}-Vx_{n_{j}}\| \\ &\quad=\bigl\| \operatorname{prox}_{\lambda_{n_{j}}g}\circ\, (I-\lambda_{n_{j}}\nabla f)x_{n_{j}}-\operatorname{prox}_{\lambda g}\circ\,(I-\lambda \nabla f)x_{n_{j}}\bigr\| \\ &\quad\leq\biggl\| \frac{\lambda}{\lambda_{n_{j}}}(I-\lambda_{n_{j}}\nabla f)x_{n_{j}}+ \biggl(1-\frac{\lambda}{\lambda_{n_{j}}} \biggr) \bigl( \operatorname{prox}_{\lambda _{n_{j}}g} \circ\,(I-\lambda_{n_{j}}\nabla f) \bigr)x_{n_{j}}-(I-\lambda\nabla f)x_{n_{j}}\biggr\| \\ &\quad=\biggl\| \biggl(\frac{\lambda}{\lambda_{n_{j}}}-1 \biggr)x_{n_{j}}+ \biggl(1- \frac{\lambda }{\lambda_{n_{j}}} \biggr)\operatorname{prox}_{\lambda_{n_{j}}g}(I- \lambda_{n_{j}}\nabla f)x_{n_{j}}\biggr\| \\ &\quad=\biggl|1-\frac{\lambda}{\lambda_{n_{j}}}\biggr|\cdot\| x_{n_{j}}-V_{n_{j}}x_{n_{j}} \| \\ &\quad\leq|\lambda_{n_{j}}-\lambda|\frac{\|x_{n_{j}}-V_{n_{j}}x_{n_{j}}\|}{a}. \end{aligned}$$

So we obtain

$$\begin{aligned} &\bigl\| x_{n_{j+1}}-\operatorname{prox}_{\lambda g}\circ\, (I- \lambda\nabla f)x_{n_{j}}\bigr\| \\ &\quad\leq\alpha_{n_{j}}\bigl\| h(x_{n_{j}})-Vx_{n_{j}} \bigr\| +(1- \alpha_{n_{j}})|\lambda _{n_{j}}-\lambda| \frac{\|x_{n_{j}}-V_{n_{j}}x_{n_{j}}\|}{a}. \end{aligned}$$
(3.11)

Combining (3.10) and (3.11) we get

$$\begin{aligned} &\bigl\| x_{n_{j}}-\bigl(\operatorname{prox}_{\lambda g} \circ\,(I- \lambda\nabla f)\bigr)x_{n_{j}}\bigr\| \\ &\quad\leq\|x_{n_{j}}-x_{n_{j+1}}\|+\alpha_{n_{j}} \bigl\| h(x_{n_{j}})-Vx_{n_{j}}\bigr\| +(1-\alpha_{n_{j}})| \lambda_{n_{j}}-\lambda|\frac{\| x_{n_{j}}-V_{n_{j}}x_{n_{j}}\|}{a}. \end{aligned}$$
(3.12)

Since \(\lambda_{n_{j}}\rightarrow\lambda\), and \(\alpha_{n}\rightarrow 0\), by (3.12) we get

$$ \bigl\| x_{n_{j}}- \bigl(\operatorname{prox}_{\lambda g} \circ\,(I-\lambda\nabla f) \bigr)x_{n_{j}}\bigr\| \rightarrow0. $$
(3.13)

By the proof of Theorem 3.4 we know that \(\operatorname{prox}_{\lambda g}\circ\, (I-\lambda\nabla f)\) is nonexpansive. It follows from Lemma 2.7 and (3.13) that \(z\in S\).

We next show that

$$ \limsup_{n\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n}-x^{*} \bigr\rangle \leq0, $$
(3.14)

where \(x^{*}\) is obtained in Theorem 3.4. Indeed, we may choose the subsequence \(\{x_{n_{j}}\}\) above so that it attains the limit superior in (3.14); letting \(j\rightarrow\infty\), we have

$$\limsup_{n\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n}-x^{*} \bigr\rangle =\lim _{j\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n_{j}}-x^{*} \bigr\rangle . $$

Since \(x_{n_{j}}\rightharpoonup z\) and \(z\in S\), it follows from (3.6) that

$$\limsup_{n\rightarrow\infty} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n}-x^{*} \bigr\rangle = \bigl\langle h \bigl(x^{*} \bigr)-x^{*},z-x^{*} \bigr\rangle \leq0. $$

We finally show that \(x_{n}\rightarrow x^{*}\). We have

$$\begin{aligned} &\bigl\| x_{n+1}-x^{*}\bigr\| ^{2} \\ &\quad=\bigl\| \alpha_{n}h(x_{n})+(1-\alpha_{n})V_{n}x_{n}-x^{*} \bigr\| ^{2} \\ &\quad=\bigl\| \alpha_{n} \bigl(h(x_{n})-h \bigl(x^{*} \bigr) \bigr)+(1-\alpha _{n}) \bigl(V_{n}x_{n}-x^{*} \bigr)+\alpha_{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr)\bigr\| ^{2} \\ &\quad\leq\bigl\| \alpha_{n} \bigl(h(x_{n})-h \bigl(x^{*} \bigr) \bigr)+(1-\alpha_{n}) \bigl(V_{n}x_{n}-x^{*} \bigr)\bigr\| ^{2}+2 \bigl\langle \alpha_{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr),x_{n+1}-x^{*} \bigr\rangle \\ &\quad\leq\alpha_{n}\bigl\| h(x_{n})-h \bigl(x^{*} \bigr) \bigr\| ^{2}+(1-\alpha_{n})\bigl\| V_{n}x_{n}-x^{*} \bigr\| ^{2}+2 \bigl\langle \alpha _{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr),x_{n+1}-x^{*} \bigr\rangle \\ &\quad\leq\alpha_{n}\rho^{2}\bigl\| x_{n}-x^{*} \bigr\| ^{2}+(1-\alpha_{n})\bigl\| x_{n}-x^{*} \bigr\| ^{2}+2 \bigl\langle \alpha _{n} \bigl(h \bigl(x^{*} \bigr)-x^{*} \bigr),x_{n+1}-x^{*} \bigr\rangle \\ &\quad\leq \bigl(1-\alpha_{n} \bigl(1-\rho^{2} \bigr) \bigr) \bigl\| x_{n}-x^{*}\bigr\| ^{2}+2\alpha _{n} \bigl\langle h \bigl(x^{*} \bigr)-x^{*},x_{n+1}-x^{*} \bigr\rangle . \end{aligned}$$

It then follows that

$$ \bigl\| x_{n+1}-x^{*}\bigr\| ^{2}\leq \bigl(1-\alpha _{n} \bigl(1-\rho^{2} \bigr) \bigr)\bigl\| x_{n}-x^{*}\bigr\| ^{2}+ \alpha_{n} \delta_{n}, $$
(3.15)

where \(\delta_{n}=2\langle h(x^{*})-x^{*},x_{n+1}-x^{*}\rangle\). Applying Lemma 2.5 to the inequality (3.15), together with (3.14), we get \(x_{n}\rightarrow x^{*}\) as \(n\rightarrow\infty\). □

4 An application of proximal algorithm to the lasso and Q-lasso

Take \(f(x)=\frac{1}{2}\|Ax-b\|_{2}^{2}\) and \(g(x)=\gamma\|x\|_{1}\); then the lasso (1.2) can be solved by the proximal algorithm (3.8). We have \(\nabla f(x)=A^{t}(Ax-b)\), and ∇f is Lipschitz continuous with constant \(L=\|A\|_{2}^{2}\):

$$\begin{aligned} \bigl\| A^{t}(Ax-b)-A^{t}(Ay-b)\bigr\| _{2} =\bigl\| A^{t}A(x-y)\bigr\| _{2} \leq\|A\|_{2}^{2}\|x-y\|_{2}. \end{aligned}$$
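As a quick numerical sanity check of this bound (a sketch on random data; np.linalg.norm(A, 2) returns the spectral norm \(\|A\|_{2}\)):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
x, y = rng.standard_normal(50), rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2                 # L = ||A||_2^2
lhs = np.linalg.norm(A.T @ (A @ x) - A.T @ (A @ y))
print(lhs <= L * np.linalg.norm(x - y))       # expected: True
```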

Then the proximal algorithm (3.8) is equivalent to

$$ x_{n+1}=\alpha_{n}h(x_{n})+(1- \alpha_{n}) \operatorname{prox}_{\lambda_{n}\gamma\|\cdot \|_{1}} \bigl(x_{n}- \lambda_{n}A^{t}(Ax_{n}-b) \bigr). $$
(4.1)

Here we have, for \(\alpha>0\) and \(x=(x_{i})^{t}\in\mathbb{R}^{n}\),

$$\operatorname{prox}_{\alpha\|\cdot\|_{1}}(x)= \bigl(\operatorname{prox}_{\alpha|\cdot|}(x_{1}), \ldots,\operatorname{prox}_{\alpha|\cdot|}(x_{n}) \bigr)^{t}, $$

and \(\operatorname{prox}_{\alpha|\cdot|}(\beta)= \operatorname{sgn}(\beta)\max\{|\beta|-\alpha,0\}\) for \(\beta\in\mathbb{R}\).
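Putting the pieces together, here is a minimal sketch of algorithm (4.1) for the lasso; the contraction \(h(x)=x/2\), the constant step \(\lambda_{n}=1/\|A\|_{2}^{2}\), and \(\alpha_{n}=1/(n+2)\) are illustrative choices satisfying the conditions of Theorem 4.1 below, not the only admissible ones.

```python
import numpy as np

def lasso_viscosity(A, b, gamma, n_iter=2000):
    """Minimize 0.5*||Ax-b||_2^2 + gamma*||x||_1 via iteration (4.1)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    lam = 1.0 / L                              # 0 < lam <= 2/L
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x = np.zeros(A.shape[1])
    for n in range(n_iter):
        alpha = 1.0 / (n + 2)                  # alpha_n -> 0, sum = infinity
        grad = A.T @ (A @ x - b)               # gradient of the smooth term
        x = alpha * (0.5 * x) + (1.0 - alpha) * soft(x - lam * grad, lam * gamma)
    return x

# usage: x_hat = lasso_viscosity(np.random.randn(20, 50), np.random.randn(20), gamma=0.1)
```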

Theorem 4.1

Assume that

  1. (C1)

    \(0<\lambda_{n}\leq2/\|A\|_{2}^{2}\);

  2. (C2)

    \(\alpha_{n}\rightarrow0\);

  3. (C3)

    \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

  4. (C4)

    \(\sum_{n=0}^{\infty}|\alpha_{n+1}-\alpha_{n}|<\infty\);

  5. (C5)

    \(\sum_{n=0}^{\infty}|\lambda_{n+1}-\lambda_{n}|<\infty\).

Then the sequence \(\{x_{n}\}_{n=0}^{\infty}\) generated by (4.1) converges to a solution \(x^{*}\) of lasso (1.2), which also solves the variational inequality (3.6).

For the Q-lasso (1.6), we take \(f(x)=(1/2)\|(I-P_{Q})Ax\|_{2}^{2}\) and \(g(x)=\gamma\|x\|_{1}\). Since \(P_{Q}\) is \(\frac{1}{2}\)-averaged, \(I-P_{Q}\) is nonexpansive, so \(\nabla f(x)=A^{t}(I-P_{Q})Ax\) is Lipschitz continuous with constant \(L=\|A\| _{2}^{2}\):

$$\begin{aligned} &\bigl\| A^{t}(I-P_{Q})Ax-A^{t}(I-P_{Q})Ay \bigr\| _{2} \\ &\quad\leq\|A\|_{2}\bigl\| (I-P_{Q})Ax-(I-P_{Q})Ay \bigr\| _{2} \\ &\quad\leq\|A\|_{2}\|Ax-Ay\|_{2} \\ &\quad\leq\|A\|_{2}^{2}\|x-y\|_{2}. \end{aligned}$$
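When Q is taken to be the closed ball \(B_{\epsilon}(b)\) (as in Sect. 1), the projection \(P_{Q}\) and therefore \(\nabla f(x)=A^{t}(I-P_{Q})Ax\) have simple closed forms; the following sketch assumes that particular choice of Q.

```python
import numpy as np

def proj_ball(y, center, eps):
    """Projection of y onto the closed ball of radius eps around center."""
    d = y - center
    nd = np.linalg.norm(d)
    return y if nd <= eps else center + (eps / nd) * d

def grad_qlasso_smooth(A, x, b, eps):
    """Gradient of f(x) = 0.5*||(I - P_Q)Ax||_2^2 with Q = B_eps(b)."""
    Ax = A @ x
    return A.T @ (Ax - proj_ball(Ax, b, eps))
```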

Then the proximal algorithm (3.8) is reduced to the following algorithm for Q-lasso (1.6):

$$ x_{n+1}=\alpha_{n}h(x_{n})+(1- \alpha_{n}) \bigl[\operatorname{prox}_{\lambda_{n}\gamma\|\cdot \|_{1}}\circ\, \bigl(I- \lambda_{n}A^{t}(I-P_{Q})A \bigr) \bigr]x_{n}. $$
(4.2)

The convergence result of Theorem 3.5 then reads as follows for the Q-lasso (1.6).

Theorem 4.2

Assume that

  1. (C1)

    \(0<\lambda_{n}\leq2/\|A\|_{2}^{2}\);

  2. (C2)

    \(\alpha_{n}\rightarrow0\);

  3. (C3)

    \(\sum_{n=0}^{\infty}\alpha_{n}=\infty\);

  4. (C4)

    \(\sum_{n=0}^{\infty}|\alpha_{n+1}-\alpha_{n}|<\infty\);

  5. (C5)

    \(\sum_{n=0}^{\infty}|\lambda_{n+1}-\lambda_{n}|<\infty\).

Then the sequence \(\{x_{n}\}_{n=0}^{\infty}\) generated by (4.2) converges to a solution \(x^{*}\) of Q-lasso (1.6), which is also a solution of the variational inequality (3.6).

5 Conclusion

  1. 1.

We modify the proximal-gradient algorithm by means of the viscosity approximation method, thereby obtaining strong convergence where the results in [11] give only weak convergence. We then apply our results to the lasso and the Q-lasso.

  2. 2.

Theorem 3.4 establishes the continuous (implicit) version of Theorem 3.5, which is not presented in [11].

  3. 3.

In our main results, we allow a wider range of step sizes (\(0<\lambda_{n}\leq2/L\)); that is, the condition in our main results is weaker than the corresponding conditions in [11] and [12].

References

  1. Tibshirani, R: Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B 58, 267-288 (1996)

  2. Chen, SS, Donoho, DL, Saunders, MA: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33-61 (1998)

  3. Donoho, D: Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289-1306 (2006)

  4. Candes, EJ, Tao, T: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406-5425 (2006)

  5. Candes, EJ, Romberg, J, Tao, T: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(2), 1207-1223 (2006)

  6. Candes, EJ, Romberg, J, Tao, T: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489-509 (2006)

  7. Cipra, BA: \(l_{1}\)-magic. SIAM News. 39, 9 (2006)

  8. Censor, Y, Elfving, T: A multiprojection algorithm using Bregman projections in a product space. Numer. Algorithms 8(2-4), 221-239 (1994)

  9. Censor, Y, Elfving, T, Kopf, N, Bortfeld, T: The multiple-sets split-feasibility problem and its applications for inverse problems. Inverse Probl. 21, 2071-2084 (2005)

  10. Censor, Y, Bortfeld, T, Martin, B, Trofimov, A: A unified approach for inversion problems in intensity-modulated radiation therapy. Phys. Med. Biol. 51, 2353-2365 (2006)

  11. Xu, HK: Properties and iterative methods for the lasso and its variants. Chin. Ann. Math., Ser. B 35(3), 1-18 (2014)

  12. Alghamdi, MA, Alghamdi, MA, Shahzad, N, Xu, HK: Properties and iterative methods for the Q-lasso. Abstr. Appl. Anal. 2013, Article ID 250943 (2013)

  13. Moudafi, A: Viscosity approximation methods for fixed-points problems. J. Math. Anal. Appl. 241, 46-55 (2000)

  14. Xu, HK: Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 298, 279-291 (2004)

  15. Combettes, PL, Wajs, VR: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168-1200 (2005)

  16. Xu, HK: Averaged mappings and the gradient-projection algorithm. J. Optim. Theory Appl. 150, 360-378 (2011)

  17. Goebel, K, Kirk, WA: Topics in Metric Fixed Point Theory. Cambridge Studies in Advanced Mathematics, vol. 28. Cambridge University Press, Cambridge (1990)

  18. Baillon, JB, Haddad, G: Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones. Isr. J. Math. 26, 137-150 (1977)

  19. Marino, G, Muglia, L: On the auxiliary mappings generated by a family of mappings and solutions of variational inequalities problems. Optim. Lett. 9, 263-282 (2015)

Acknowledgements

The authors are thankful to the anonymous referees for their valuable comments and suggestions, which helped to improve the presentation of this paper. The first author is supported by the Foundation of Tianjin Key Lab for Advanced Signal Processing and the Fundamental Research Funds for the Central Universities (No. 3122015L015).

Author information

Corresponding author

Correspondence to Ming Tian.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All the authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Cite this article

Tian, M., Gong, JY. Strong convergence of a modified proximal algorithm for solving the lasso. J Inequal Appl 2015, 161 (2015). https://doi.org/10.1186/s13660-015-0680-x
