- Research
- Open access
- Published:
PAC-Bayesian inequalities of some random variables sequences
Journal of Inequalities and Applications volume 2015, Article number: 244 (2015)
Abstract
In this paper, we establish some PAC-Bayesian inequalities for a sequence of random variables, which include conditionally symmetric random variables and locally square integrable martingales.
1 Introduction
In this paper, we establish several PAC-Bayesian inequalities. The PAC-Bayesian analysis is an abbreviation for the Probably Approximately Correct learning model and has been introduced a decade ago (Shawe-Taylor and Williamson [1]; Shawe-Taylor et al. [2]) and has made a significant contribution to the analysis and development of supervised learning methods, the random multiarmed bandits problem and so on. PAC-Bayesian analysis provides high probability bounds on the deviation of weighted averages of empirical means of sets of independent random variables from their expectations. It supplies generalization guarantees for many influential machine learning algorithms.
Shawe-Taylor and Williamson [1] established the PAC-Bayesian learning theorems. They showed that if one can find a ball of sufficient volume in a parameterized concept space, then the center of that ball has a low error rate. For non-i.i.d. data, application of PAC-Bayesian analysis was partially addressed only recently by Ralaivola et al. [3] and Lever et al. [4]. For a martingale, taking advantage of the martingale’s properties, Seldin et al. [5] obtained the PAC-Bayesian-Bernstein inequality and applied it to multiarmed bandits. Seldin et al. [6] presented a generalization of the PAC-Bayesian analysis. Their generalization makes it possible to consider model order selection simultaneously with the exploration-exploitation trade-off.
In the present paper, we continue to study the PAC-Bayesian inequalities for some random variable sequence. We concentrate on the conditionally symmetric random variables and the locally square integrable martingale. For the first case, we only assume the conditional symmetry of the random variable sequence, without any other dependent conditions and moment conditions. For the second case, the bounded condition in Seldin et al. [6] is weakened. The paper is divided as follows: In Section 2 we state the main results and make some remarks. Proofs of the main results are provided in Section 3.
2 Main results
In this section, we discuss the PAC-Bayesian inequalities for conditionally symmetric random variables and martingales. In order to present our main theorems, we give a few definitions. Let \(\mathbb{H}\) be an index (or a hypothesis) space, possibly uncountably infinite. Let \(\{X_{1}(h), X_{2}(h),\ldots: h\in\mathbb{H}\} \) be a sequence of variables adapted to an increasing sequence of σ-fields \(\{\mathcal{F}_{n}\}\), where \(\mathcal{F}_{n}=\{ X_{k}(h): 1\le k\le n \text{ and } h\in\mathbb{H}\}\) is a set of random sequences observed up to time n (the history).
Seldin et al. [6] obtained a PAC-Bayes-Bernstein inequality for martingale with finite jumps.
Theorem 2.1
(Seldin et al. [6])
Let \(\{X_{1}(h), X_{2}(h),\ldots: h\in\mathbb{H}\}\) be a set of martingale difference sequences adapted to an increasing sequence of σ-fields \(\mathcal{F}_{n}=\{X_{k}(h): 1\le k\le n \textit{ and } h\in\mathbb{H}\}\). Furthermore, let \(M_{n}(h)=\sum_{k=1}^{n}X_{k}(h)\) be martingales corresponding to the martingale difference sequences and let \(V_{n}(h)=\sum_{k=1}^{n}\mathbb{E}[X_{k}(h)^{2}|\mathcal{F}_{k-1}]\) be cumulative variances of the martingales. For a distribution ρ over \(\mathbb{H}\), define \(M_{n}(\rho) =\mathbb{E}_{\rho(h)}[M_{n}(h)]\) and \(V_{n}(\rho) =\mathbb{E}_{\rho(h)}[V_{n}(h)]\). Let \(\{C_{1}, C_{2}, \ldots\}\) be an increasing sequence set in advance, such that \(|X_{k}(h)|\le C_{k}\) for all h with probability 1. Let \(\{\mu_{1},\mu _{2},\ldots\}\) be a sequence of ‘reference’ (‘prior’) distributions over \(\mathbb{H}\), such that \(\mu_{n}\) is independent of \(\mathcal{F}_{n}\) (but can depend on n). Let \(\{\lambda_{1},\lambda_{2},\ldots\}\) be a sequence of positive numbers set in advance that satisfy \(\lambda_{k}\le C_{k}^{-1}\). Then for all possible distributions \(\rho_{n}\) over \(\mathbb {H}\) given n and for all n simultaneously with probability greater than \(1-\delta\),
where \(\operatorname{KL}(\mu\parallel\nu)\) is the KL-divergence (relative entropy) between two distributions μ and ν.
Seldin et al. [6] (see Theorem 2.1) considered the bounded martingale difference sequence and the parameters \((\lambda_{k})\) depend on the bounds \((1/C_{k})\). Since \(C_{k}\) is an increasing sequence, \(\lambda_{k}\) is a decreasing sequence. Furthermore, Seldin et al. [6] studied the deviation properties between the martingale and the conditional variance of the martingale. Based on the above works, in the present paper, we want to consider the conditionally symmetric random variables and the locally square integrable martingale. For the conditionally symmetric random variables, we can establish the PAC-Bayesian inequality without any dependent assumptions (for example, independence or being a martingale) and any moment assumptions for the sequences. For the locally square integrable martingale, we can remove the bounded restriction.
2.1 Conditionally symmetric random variables
Assume that \(\{X_{k}(h): k\ge1, h\in\mathbb{H}\}\) are conditionally symmetric with respect to \((\mathcal{F}_{n})\) (i.e., \(\mathcal{L}(X_{i}(h)|\mathcal{F}_{i})=\mathcal {L}(-X_{i}(h)|\mathcal{F}_{i})\)). Let
For a distribution ρ over \(\mathbb{H}\), define \(M_{n}(\rho) =\mathbb{E}_{\rho(h)}[M_{n}(h)]\) and \(V_{n}(\rho) =\mathbb{E}_{\rho (h)}[V_{n}(h)]\). We establish the PAC-Bayesian inequality between the partial sums and the total quadratic variation of the partial sums.
Theorem 2.2
Let \(\{\mu_{1},\mu_{2},\ldots\}\) be a sequence of ‘reference’ (‘prior’) distributions over \(\mathbb{H}\), such that \(\mu_{n}\) is independent of \(\mathcal{F}_{n}\) (but can depend on n). Then for all possible distributions \(\rho_{n}\) over \(\mathbb{H}\) given n and for all n simultaneously with probability greater than \(1-\delta\),
The following theorem gives an inequality in the sense of a self-normalized sequence.
Theorem 2.3
Under the assumptions in Theorem 2.2, then for all \(y>0\) and n simultaneously with probability greater than \(1-\frac{\delta}{2}\),
2.2 Martingales
Let \((M_{n}(h))\) be a locally square integrable real martingale adapted to the filtration \((\mathcal{F}_{n})\) with \(M_{0}=0\). The predictable quadratic variation and the total quadratic variation of \((M_{n}(h))\) are, respectively, given by
where \(\Delta M_{n}(h)=M_{n}(h)-M_{n-1}(h)\). For a distribution ρ over \(\mathbb{H}\), define
Theorem 2.4
Let \((M_{n}(h))\) be a locally square integrable real martingale adapted to a filtration \(\mathbb{F}=(\mathcal{F}_{n})\). Let \(\{\mu_{1},\mu_{2},\ldots\}\) be a sequence of ‘reference’ (‘prior’) distributions over \(\mathbb{H}\), such that \(\mu_{n}\) is independent of \(\mathcal{F}_{n}\) (but it can depend on n). Then for all possible distributions \(\rho_{n}\) over \(\mathbb{H}\) given n and for all n simultaneously with probability greater than \(1-\delta\),
Theorem 2.5
Under the assumptions in Theorem 2.4, for all \(y>0\) and n simultaneously with probability greater than \(1-\frac{\delta}{2}\),
3 The proofs of main results
Before giving the proofs of our results, we state the following basic lemmas.
Lemma 3.1
[7]
Let \(\{d_{i}\}\) be a sequence of variables adapted to an increasing sequence of σ-fields \(\{\mathcal{F}_{i}\}\). Assume that the \(\{d_{i}\}\)’s are conditionally symmetric. Then
is a supermartingale with mean ≤1, for all \(\lambda\in\mathbb{R}\).
Lemma 3.2
Under the assumptions of Lemma 3.1, for any \(y>0\), we have
Remark 3.1
Hitczenko [8] proved the above inequality for conditionally symmetric martingale difference sequences, and De la Peña [7] obtained the above inequality without the martingale difference assumption, hence without any integrability assumptions. Note that any sequence of real valued random variables \(X_{i}\) can be ‘symmetrized’ to produce an exponential supermartingale by introducing random variables \(X_{i}'\) such that
and we set \(d_{n}=X_{n}-X_{n}'\).
Proof
By using Fubini’s theorem, we have
where we used the fact
□
The following inequality is about a transformation of the measure inequality [9].
Lemma 3.3
For any measurable function \(\phi(h)\) on \(\mathbb{H}\) and any distributions \(\mu(h)\) and \(\rho(h)\) on \(\mathbb{H}\), we have
Proof of Theorem 2.2
Taking
then from Lemma 3.1 and Lemma 3.3, for all \(\rho_{n}\) and n simultaneously with probability greater than \(1-\frac{\delta}{2}\),
where the second equality is due to the fact that \(\mu_{n}\) is independent of \(\mathcal{F}_{n}\). By applying the same argument to martingales \(-M_{n}(h)\), we obtain the result that, with probability greater than \(1-\delta\),
□
Proof of Theorem 2.3
For all \(y>0\), taking
then from Lemma 3.2 and Lemma 3.3, for all \(\rho_{n}\) and n simultaneously with probability greater than \(1-\frac{\delta}{2}\),
□
In order to prove Theorem 2.4, we need to introduce the following lemma.
Lemma 3.4
Let \((M_{n})\) be a locally square integrable martingale. Putting
for all \(t\in \mathbb{R}\) and \(n\geq0\), denote
Then, for all \(t\in\mathbb{R}\), \((G_{n}(t))\) is a positive supermartingale with \(\mathbb{E}[G_{n}(t)]\leq1\).
Proof
Let X be a square integrable random variable with \(\mathbb{E}X=0\) and \(0<\sigma^{2}:=\mathbb{E}X^{2}<\infty\). Because of the basic inequality
we know
Then, for all \(t \in\mathbb{R}\) and \(n \geq0\), we obtain from (3.1)
Hence, we deduce that, for all \(t \in\mathbb{R}\),
As a result, for all \(t\in\mathbb{R}\), \(G_{n}(t)\) is a positive supermartingale, i.e., for all \(n\geq1\), \(\mathbb {E}[G_{n}(t)]\leq \mathbb{E}[G_{n-1}(t)]\), which implies that \(\mathbb{E}[G_{n}(t)]\leq \mathbb{E}[G_{0}(t)]=1\). □
Proof of Theorem 2.4
Taking
then from Lemma 3.3 and Lemma 3.4, for all \(\rho_{n}\) and n simultaneously with probability greater than \(1-\frac{\delta}{2}\),
where the second equality is due to the fact that \(\mu_{n}\) is independent of \(\mathcal{F}_{n}\). By applying the same argument to martingales \(-M_{n}(h)\), we obtain the result that, with probability greater than \(1-\delta\),
□
Proof of Theorem 2.5
For all \(y>0\), taking
and \(V_{n}(h)= [\frac{[M]_{n}(h)}{3} +\frac{2\langle M\rangle_{n}(h)}{3} ]\), from Lemma 3.2 and Lemma 3.3, we can get for all \(\rho_{n}\) and n simultaneously with probability greater than \(1-\frac {\delta}{2}\),
□
References
Shawe-Taylor, J, Williamson, RC: A PAC analysis of a Bayesian estimator. In: Proceedings of the International Conference on COLT, pp. 2-9 (1997)
Shawe-Taylor, J, Bartlett, PL, Williamson, RC, Anthony, M: Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inf. Theory 44(5), 1926-1940 (1998)
Ralaivola, L, Szafranski, M, Stempfel, G: Chromatic PAC-Bayes bounds for non-IID data: applications to ranking and stationary β-mixing processes. J. Mach. Learn. Res. 11, 1927-1956 (2010)
Lever, G, Laviolette, F, Shawe-Taylor, J: Distribution-dependent PAC-Bayes priors. In: Algorithmic Learning Theory. Lecture Notes in Computer Science, vol. 6331, pp. 119-133 (2010)
Seldin, Y, Laviolette, F, Cesa-Bianchi, N, Shawe-Taylar, J, Auer, P: PAC-Bayesian inequalities for martingales. IEEE Trans. Inf. Theory 58, 7086-7093 (2012)
Seldin, Y, Cesa-Bianchi, N, Auer, P, Laviolette, F, Shawe-Taylor, J: PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits. In: JMLR: Workshop and Conference Proceedings, vol. 26, pp. 98-111 (2012)
De la Peña, VH: A general class of exponential inequalities for martingales and ratios. Ann. Probab. 27(1), 537-564 (1999)
Hitczenko, P: Upper bounds for the \(L_{p}\)-norms of martingales. Probab. Theory Relat. Fields 86(2), 225-238 (1990)
Dupuis, P, Ellis, RS: A Weak Convergence Approach to the Theory of Large Deviations. Wiley Series in Probability and Statistics. Wiley, New York (2011)
Acknowledgements
This work is supported by IRTSTHN (14IRTSTHN023), NSFC (11471104), NCET (NCET-11-0945).
Author information
Authors and Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors contributed equally to the manuscript, and they read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Wang, Z., Shen, L., Miao, Y. et al. PAC-Bayesian inequalities of some random variables sequences. J Inequal Appl 2015, 244 (2015). https://doi.org/10.1186/s13660-015-0768-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13660-015-0768-3