Strong representation results of the Kaplan-Meier estimator for censored negatively associated data
Journal of Inequalities and Applications volume 2013, Article number: 340 (2013)
Abstract
In this paper, we discuss the strong convergence rates and strong representation of the Kaplan-Meier estimator and the hazard estimator based on censored data when the survival and the censoring times form negatively associated (NA) sequences. Under certain regularity conditions, strong convergence rates are established for both estimators, and each estimator is expressed as a mean of random variables plus a remainder whose almost sure order is given in Theorem 1.4.
MSC: 60F15, 60F05.
1 Introduction and main results
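Let $\{X_n; n \ge 1\}$ be a sequence of true survival times. The random variables (r.v.s) $X_n$ are not assumed to be mutually independent; it is assumed, however, that they have a common unknown continuous marginal distribution function (d.f.) $F(x) = P(X_n \le x)$ such that $F(0) = 0$. Let the r.v.s $X_i$ be censored on the right by the censoring r.v.s $Y_i$, so that one observes only $(Z_i, \delta_i)$, where
$$Z_i = \min(X_i, Y_i), \qquad \delta_i = I(X_i \le Y_i), \qquad i = 1, \ldots, n.$$
Here and in the sequel, $I(A)$ is the indicator random variable of the event A. In this random censorship model, the censoring times $Y_i$, $i = 1, \ldots, n$, are assumed to have the common distribution function $G$ such that $G(0) = 0$; they are also assumed to be independent of the r.v.s $X_i$. The problem at hand is that of drawing nonparametric inference about F based on the censored observations $(Z_i, \delta_i)$, $i = 1, \ldots, n$. For this purpose, define two stochastic processes on $[0, \infty)$ as follows:
$$N_n(t) = \sum_{i=1}^{n} I(Z_i \le t, \delta_i = 1),$$
the number of uncensored observations less than or equal to t, and
$$Y_n(t) = \sum_{i=1}^{n} I(Z_i \ge t),$$
the number of censored or uncensored observations greater than or equal to t. The following nonparametric estimator of F due to Kaplan and Meier [1] is widely used to estimate F on the basis of the data $(Z_i, \delta_i)$:
$$1 - \hat F_n(t) = \prod_{s \le t} \left(1 - \frac{dN_n(s)}{Y_n(s)}\right),$$
where $dN_n(s) = N_n(s) - N_n(s-)$.
Let L be the distribution of the $Z_i$'s. Since the sequences $\{X_n; n \ge 1\}$ and $\{Y_n; n \ge 1\}$ are independent, it follows that $1 - L(t) = (1 - F(t))(1 - G(t))$. The empirical d.f. of L is defined by
$$L_n(t) = n^{-1} \sum_{i=1}^{n} I(Z_i \le t),$$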
where .
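Define (possibly infinite) times $\tau_F$, $\tau_G$ and $\tau_L$ by
$$\tau_F = \inf\{y : F(y) = 1\}, \qquad \tau_G = \inf\{y : G(y) = 1\}, \qquad \tau_L = \inf\{y : L(y) = 1\}.$$
Then $\tau_L = \min(\tau_F, \tau_G)$. By setting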
and the empirical d.f. of is defined by
We have then
and
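Another question of interest in survival analysis is the estimation of the hazard function h, defined as follows when it is further assumed that F has a density f:
$$h(t) = \frac{f(t)}{\bar F(t)},$$
with $\bar F(t) = 1 - F(t)$. The quantity
$$\Lambda(t) = \int_0^t h(s)\,ds = -\log \bar F(t)$$
is called the cumulative hazard function. The empirical cumulative hazard function is given by
$$\hat\Lambda_n(t) = \int_0^t \frac{dN_n(s)}{Y_n(s)},$$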
where .
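Since $N_n$ is a step function, and $dN_n(Z_{(i)}) = \delta_{(i)}$, $Y_n(Z_{(i)}) = n - i + 1$, it can be easily seen that
$$\hat\Lambda_n(t) = \sum_{i:\, Z_{(i)} \le t} \frac{\delta_{(i)}}{n - i + 1}$$
and
$$1 - \hat F_n(t) = \prod_{i:\, Z_{(i)} \le t} \left(1 - \frac{\delta_{(i)}}{n - i + 1}\right),$$
where $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, \ldots, Z_n$, and $\delta_{(i)}$ is the concomitant of $Z_{(i)}$.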
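As an illustration of the order-statistic forms just described, the following minimal Python sketch evaluates the Kaplan-Meier estimate and the empirical cumulative hazard at a point t. It assumes no ties among the observed times, so that the observation of rank i has n - i + 1 observations at risk; an uncensored observation of rank i then contributes the factor 1 - 1/(n - i + 1) to the survival product and the term 1/(n - i + 1) to the cumulative hazard. The function name `km_and_hazard` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def km_and_hazard(z: Sequence[float], delta: Sequence[int], t: float) -> Tuple[float, float]:
    """Return (Kaplan-Meier estimate of F(t), empirical cumulative hazard at t).

    z     : observed times Z_i = min(X_i, Y_i)
    delta : censoring indicators, 1 if uncensored (X_i <= Y_i), else 0
    Assumes there are no ties among the observed times.
    """
    n = len(z)
    # Indices sorted by observed time: order[rank - 1] is the observation of that rank.
    order = sorted(range(n), key=lambda i: z[i])
    survival = 1.0      # running product for 1 - F_hat(t)
    cum_hazard = 0.0    # running sum for Lambda_hat(t)
    for rank, i in enumerate(order, start=1):
        if z[i] > t:
            break                     # only observations with Z_(rank) <= t contribute
        at_risk = n - rank + 1        # number of observations >= Z_(rank)
        if delta[i] == 1:             # censored observations contribute nothing
            survival *= 1.0 - 1.0 / at_risk
            cum_hazard += 1.0 / at_risk
    return 1.0 - survival, cum_hazard


# Hypothetical sample: five observations, two of them censored.
z_obs = [2.0, 3.5, 1.0, 5.0, 4.2]
d_obs = [1, 0, 1, 1, 0]
f_hat, lam_hat = km_and_hazard(z_obs, d_obs, t=4.0)
```

In this sample the two uncensored times not exceeding 4.0 have ranks 1 and 2, so the survival product is (1 - 1/5)(1 - 1/4) and the cumulative hazard is 1/5 + 1/4.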
There is extensive literature on the Kaplan-Meier estimator and the hazard estimator for censored independent observations. We refer to papers by Breslow and Crowley [2], Földes and Rejtö [3] and Gu and Lai [4]. Martingale methods for analyzing properties of the Kaplan-Meier estimator are described in the monograph by Gill [5]. However, censored dependent data appear in a number of applications. For example, repeated measurements in survival analysis follow this pattern; see Kang and Koehler [6] or Wei et al. [7]. In the context of censored time series analysis, Shumway et al. [8] considered (hourly or daily) measurements of the concentration of a given substance subject to some detection limits, thus being potentially censored from the right. Ying and Wei [9], Lecoutre and Ould-Saïd [10], Cai [11] and Liang and Uña-Álvarez [12] studied the convergence of the Kaplan-Meier estimator for stationary α-mixing data.
The main purpose of this paper is to study the strong convergence rates and strong representation of the Kaplan-Meier estimator and the hazard estimator based on censored data when the survival and the censoring times form negatively associated (NA) sequences (see the definition below). Under certain regularity conditions, we find strong convergence rates of the Kaplan-Meier and hazard estimators, and we express each estimator as a mean of random variables plus a remainder whose almost sure order is given in Theorem 1.4.
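Definition Random variables $X_1, X_2, \ldots, X_n$, $n \ge 2$, are said to be negatively associated (NA) if for every pair of disjoint subsets $A_1$ and $A_2$ of $\{1, 2, \ldots, n\}$,
$$\operatorname{cov}\bigl(f_1(X_i; i \in A_1), f_2(X_j; j \in A_2)\bigr) \le 0,$$
where $f_1$ and $f_2$ are increasing in every variable (or decreasing in every variable) and such that this covariance exists. A sequence of random variables $\{X_n; n \ge 1\}$ is said to be NA if every finite subfamily is NA.
Obviously, if $\{X_n; n \ge 1\}$ is a sequence of NA random variables, and $\{f_n; n \ge 1\}$ is a sequence of nondecreasing (or non-increasing) functions, then $\{f_n(X_n); n \ge 1\}$ is also a sequence of NA random variables.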
This definition was introduced by Joag-Dev and Proschan [13]. A statistical test depends greatly on sampling. Random sampling without replacement from a finite population is NA, but not independent. NA sampling has wide applications, for example in multivariate statistical analysis and reliability theory. Because of these applications, the limit behavior of NA random variables has received increasing attention recently. One can refer to Joag-Dev and Proschan [13] for fundamental properties, Matula [14] for the three series theorem, and Wu and Jiang [15, 16] for the strong convergence.
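The remark on sampling without replacement can be checked numerically in the simplest case. The sketch below enumerates all ordered pairs drawn without replacement from a small finite population and computes the exact covariance between the first and second draw. This covers only one pair of increasing functions (the identity on each draw), so it illustrates the NA inequality and does not prove it. The helper name `covariance_without_replacement` and the population values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import Sequence


def covariance_without_replacement(population: Sequence[int]) -> Fraction:
    """Exact covariance of (first draw, second draw) when two items are
    drawn without replacement, each ordered pair being equally likely."""
    # permutations works by position, so repeated values in the population are handled.
    pairs = list(permutations(population, 2))
    m = Fraction(len(pairs))
    mean_first = sum(Fraction(a) for a, _ in pairs) / m
    mean_second = sum(Fraction(b) for _, b in pairs) / m
    mean_product = sum(Fraction(a) * Fraction(b) for a, b in pairs) / m
    return mean_product - mean_first * mean_second


# Hypothetical population of four distinct values.
cov = covariance_without_replacement([1, 2, 3, 4])
```

For this population the covariance equals minus the population variance divided by N - 1, that is, -(5/4)/3 = -5/12, which is negative as the definition requires.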
We give two lemmas, which are helpful in proving our theorems.
Lemma 1.1 (Yang [17], Lemma 1)
Let be a sequence of negatively associated random variables with zero means and , a.s. (). Let be such that . Then, for all ,
Lemma 1.2 Let be a sequence of NA r.v.s with continuous d.f. F, and let be the empirical d.f. based on the segments . Then
Proof The proof is similar to that of Lemma 4 in Yang [17] and is omitted. □
Theorem 1.3 Let and be two sequences of NA random variables. Suppose that the sequences and are independent. Then, for any ,
and
here and in the sequel, .
For positive reals z and t, and δ taking value 0 or 1, let
where .
Theorem 1.4 Assume that the conditions of Theorem 1.3 hold. Then
and
where a.s. , .
2 Proofs
Proof of Theorem 1.3 It is easy to see from Property P7 of Joag-Dev and Proschan [13] that and are also two sequences of NA r.v.s. Therefore
and
follow from Lemma 1.2 and the fact that both and are empirical distribution functions of L and .
Now, by (1.1) and (1.2), let us write
Therefore, by the combination of equations (2.1) and (2.2), and , for , we obtain
Thus, (1.5) holds.
Now we prove (1.6). By (1.3) and (1.4),
Therefore, by combining the inequality , , and (2.1), for , , we get that
By (1.1),(1.6) and (2.4), using the Taylor expansion, , we obtain
Thence, the combination (1.5), (1.6) holds. This completes the proof of Theorem 1.3. □
Proof of Theorem 1.4 By (2.1),
Thus, by the combination of (2.3),
Noting that and is a step function, we get
Therefore, to prove (1.8), it suffices to prove that for . Let us divide the interval into subintervals , , where , and are such that . For , it is easy to check that
To estimate , we further subdivide each into subintervals , , where such that uniformly in i, j. Now, by (2.1) and , for , it follows that
For , , , let , . Then , and are NA sequences with , , , , .
Taking in Lemma 1.1, yields the following probability bound:
Using the bound and the Borel-Cantelli lemma, we deduce that a.s. The estimation of is similar noting that for all x and y. Therefore, by (2.6)-(2.9), (1.8) holds. (1.9) follows from (2.5) and (1.8). □
Authors’ information
Qunying Wu, Professor, Doctor, working in the field of probability and statistics.
References
1. Kaplan EL, Meier P: Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53: 457–481. 10.1080/01621459.1958.10501452
2. Breslow N, Crowley J: A large sample study of the life table and product limit estimates under random censorship. Ann. Stat. 1974, 2: 437–453. 10.1214/aos/1176342705
3. Földes A, Rejtö L: A LIL type result for the product limit estimator. Z. Wahrscheinlichkeitstheor. Verw. Geb. 1981, 56: 75–84. 10.1007/BF00531975
4. Gu MG, Lai TL: Functional laws of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation. Ann. Probab. 1990, 18: 160–189. 10.1214/aop/1176990943
5. Gill R: Censoring and Stochastic Integrals. Mathematical Centre Tracts 124. Math. Centrum, Amsterdam; 1980.
6. Kang SS, Koehler KJ: Modification of the Greenwood formula for correlated failure times. Biometrics 1997, 53: 885–899. 10.2307/2533550
7. Wei LJ, Lin DY, Weissfeld L: Regression analysis of multivariate incomplete failure times data by modelling marginal distributions. J. Am. Stat. Assoc. 1989, 84: 1064–1073.
8. Shumway RH, Azari AS, Johnson P: Estimating mean concentrations under transformation for environmental data with detection limits. Technometrics 1989, 31: 347–356.
9. Ying Z, Wei LJ: The Kaplan-Meier estimate for dependent failure time observations. J. Multivar. Anal. 1994, 50: 17–29. 10.1006/jmva.1994.1031
10. Lecoutre JP, Ould-Saïd E: Convergence of the conditional Kaplan-Meier estimate under strong mixing. J. Stat. Plan. Inference 1995, 44: 359–369. 10.1016/0378-3758(94)00084-9
11. Cai ZW: Estimating a distribution function for censored time series data. J. Multivar. Anal. 2001, 78: 299–318. 10.1006/jmva.2000.1953
12. Liang HY, Uña-Álvarez J: A Berry-Esseen type bound in kernel density estimation for strong mixing censored samples. J. Multivar. Anal. 2009, 100: 1219–1231. 10.1016/j.jmva.2008.11.001
13. Joag-Dev K, Proschan F: Negative association of random variables with applications. Ann. Stat. 1983, 11(1): 286–295. 10.1214/aos/1176346079
14. Matula PA: A note on the almost sure convergence of sums of negatively dependent random variables. Stat. Probab. Lett. 1992, 15: 209–213. 10.1016/0167-7152(92)90191-7
15. Wu QY, Jiang YY: A law of the iterated logarithm of partial sums for NA random variables. J. Korean Stat. Soc. 2010, 39(2): 199–206. 10.1016/j.jkss.2009.06.001
16. Wu QY, Jiang YY: Chover’s law of the iterated logarithm for NA sequences. J. Syst. Sci. Complex. 2010, 23(2): 293–302. 10.1007/s11424-010-7258-y
17. Yang SC: Consistency of nearest neighbor estimator of density function for negative associated samples. Acta Math. Appl. Sin. 2003, 26(3): 385–394.
Acknowledgements
Supported by the National Natural Science Foundation of China (11061012), the Program to Sponsor Teams for Innovation in the Construction of Talent Highlands in Guangxi Institutions of Higher Learning ([2011] 47), and the Support Program of the Guangxi China Science Foundation (2012GXNSFAA053010, 2013GXNSFDA019001).
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
QW conceived the study, and drafted and completed the manuscript. PC participated in the discussion of the manuscript. QW and PC read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Wu, Q., Chen, P. Strong representation results of the Kaplan-Meier estimator for censored negatively associated data. J Inequal Appl 2013, 340 (2013). https://doi.org/10.1186/1029-242X-2013-340
DOI: https://doi.org/10.1186/1029-242X-2013-340