Abstract
In this paper, we discuss the strong convergence rates and strong representation of the Kaplan-Meier estimator and the hazard estimator based on censored data when the survival and the censoring times form negatively associated (NA) sequences. Under certain regularity conditions, strong convergence rates are established for both estimators, and each estimator is expressed as the mean of random variables, with the remainder of order a.s.
MSC: 60F15, 60F05.
Keywords: NA sequence; random censorship model; Kaplan-Meier estimator; strong representation; strong convergence rate
1 Introduction and main results
Let $\{X_n; n\ge 1\}$ be a sequence of true survival times. The random variables (r.v.s) $X_n$ are not assumed to be mutually independent; it is assumed, however, that they have a common unknown continuous marginal distribution function (d.f.) $F(x)=P(X_n\le x)$ such that $F(0)=0$. Let the r.v.s $X_n$ be censored on the right by the censoring r.v.s $Y_n$, so that one observes only $(Z_n,\delta_n)$, where
$$Z_n=\min(X_n,Y_n),\qquad \delta_n=I(X_n\le Y_n).$$
Here and in the sequel, $I(A)$ is the indicator random variable of the event $A$. In this random censorship model, the censoring times $Y_n$, $n\ge 1$, are assumed to have the common distribution function $G(y)=P(Y_n\le y)$ such that $G(0)=0$; they are also assumed to be independent of the r.v.s $X_n$. The problem at hand is that of drawing nonparametric inference about $F$ based on the censored observations $(Z_i,\delta_i)$, $i=1,\dots,n$. For this purpose, define two stochastic processes on $[0,\infty)$ as follows:
$$N_n(t)=\sum_{i=1}^{n} I(Z_i\le t,\ \delta_i=1),$$
the number of uncensored observations less than or equal to $t$, and
$$Y_n(t)=\sum_{i=1}^{n} I(Z_i\ge t),$$
the number of censored or uncensored observations greater than or equal to $t$. The following nonparametric estimator of $F$ due to Kaplan and Meier [1] is widely used to estimate $F$ on the basis of the data $(Z_i,\delta_i)$, $i=1,\dots,n$:
$$1-\hat F_n(x)=\prod_{s\le x}\left(1-\frac{dN_n(s)}{Y_n(s)}\right),$$
where $dN_n(s)=N_n(s)-N_n(s-)$.
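To make the product-limit construction concrete, here is a minimal Python sketch that, at each distinct uncensored time, counts the uncensored observations at that time and the observations still at risk, and multiplies the resulting survival factors. The function name `kaplan_meier` and the toy data are assumptions made for illustration only; they do not come from the paper.

```python
from typing import List, Tuple


def kaplan_meier(z: List[float], delta: List[int]) -> List[Tuple[float, float]]:
    """Return (t, F_hat(t)) at each distinct uncensored time t.

    z     -- observed times Z_i = min(X_i, Y_i)
    delta -- censoring indicators (1 = uncensored, 0 = censored)
    """
    if len(z) != len(delta):
        raise ValueError("z and delta must have the same length")
    estimate = []
    survival = 1.0
    event_times = sorted({zi for zi, di in zip(z, delta) if di == 1})
    for t in event_times:
        # Jump of the uncensored-count process at t.
        events = sum(1 for zi, di in zip(z, delta) if zi == t and di == 1)
        # Number of observations (censored or not) that are >= t.
        at_risk = sum(1 for zi in z if zi >= t)
        survival *= 1.0 - events / at_risk
        estimate.append((t, 1.0 - survival))
    return estimate


# Hypothetical toy sample: five observations, two of them censored.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = [1, 0, 1, 1, 0]
km = kaplan_meier(z, delta)
```

In this toy sample the estimate jumps only at the uncensored times 1, 3 and 4; the censored observations at 2 and 5 enter only through the at-risk counts.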
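Let $L$ be the distribution of the $Z_i$'s, $L(x)=P(Z_i\le x)$. Since the sequences $\{X_n; n\ge 1\}$ and $\{Y_n; n\ge 1\}$ are independent, it follows that $1-L(x)=(1-F(x))(1-G(x))$. The empirical d.f. of $L$ is defined by
$$L_n(x)=\frac{1}{n}\sum_{i=1}^{n} I(Z_i\le x).$$
Define (possibly infinite) times $\tau_F$, $\tau_G$ and $\tau_L$ by
$$\tau_F=\inf\{y: F(y)=1\},\qquad \tau_G=\inf\{y: G(y)=1\},\qquad \tau_L=\inf\{y: L(y)=1\}=\tau_F\wedge\tau_G,$$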
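and let $F_*(x)=P(Z_1\le x,\ \delta_1=1)$ be the sub-distribution function of the uncensored observations. The empirical d.f. of $F_*$ is defined by
$$F_{*n}(x)=\frac{1}{n}\sum_{i=1}^{n} I(Z_i\le x,\ \delta_i=1).$$
We have then
$$N_n(t)=nF_{*n}(t)$$
and
$$Y_n(t)=n\bigl(1-L_n(t-)\bigr).$$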
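Another question of interest in survival analysis is the estimation of the hazard function $h$, defined as follows when it is further assumed that $F$ has a density $f$:
$$h(t)=\frac{f(t)}{1-F(t)}.$$
The function $\Lambda(t)=\int_0^t h(s)\,ds=-\log(1-F(t))$ is called the cumulative hazard function. The empirical cumulative hazard function is given by
$$\hat\Lambda_n(t)=\int_0^t\frac{dN_n(s)}{Y_n(s)}.$$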
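Since $N_n(t)$ is a step function, and $Y_n(Z_{(i)})=n-i+1$, $i=1,\dots,n$, it can be easily seen that
$$\hat\Lambda_n(t)=\sum_{i=1}^{n}\frac{\delta_{(i)}\,I(Z_{(i)}\le t)}{n-i+1}$$
and
$$1-\hat F_n(t)=\prod_{i:\,Z_{(i)}\le t}\left(1-\frac{\delta_{(i)}}{n-i+1}\right),$$
where $Z_{(1)}\le Z_{(2)}\le\cdots\le Z_{(n)}$ denote the order statistics of $Z_1,\dots,Z_n$, and $\delta_{(i)}$ is the concomitant of $Z_{(i)}$.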
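When there are no ties (as holds almost surely for continuous distributions), the $i$-th smallest observation has exactly $n-i+1$ observations at risk, so the empirical cumulative hazard and the Kaplan-Meier estimator can both be computed in a single pass over the sorted sample. The following Python sketch illustrates this; the function name `hazard_and_km` and the toy data are assumptions for this example and are not taken from the paper.

```python
import math
from typing import List, Tuple


def hazard_and_km(z: List[float], delta: List[int], t: float) -> Tuple[float, float]:
    """Return (cumulative hazard estimate, Kaplan-Meier estimate) at time t.

    Assumes there are no ties among the observed times z.
    """
    n = len(z)
    # Sort the observations; each indicator travels with its time
    # (the concomitant of the order statistic).
    pairs = sorted(zip(z, delta))
    cum_hazard = 0.0
    survival = 1.0
    for i, (zi, di) in enumerate(pairs, start=1):
        if zi > t:
            break
        at_risk = n - i + 1  # observations with time >= zi
        cum_hazard += di / at_risk
        survival *= 1.0 - di / at_risk
    return cum_hazard, 1.0 - survival


# Hypothetical toy sample, deliberately unsorted.
z = [3.0, 1.0, 4.0, 2.0, 5.0]
delta = [1, 1, 1, 0, 0]
lam_hat, f_hat = hazard_and_km(z, delta, t=4.0)

# Gap between -log(1 - F_hat) and the cumulative hazard estimate;
# it is nonnegative because -log(1 - x) >= x for 0 <= x < 1.
gap = -math.log(1.0 - f_hat) - lam_hat
```

The quantity `gap` is the difference between the two natural estimators of the cumulative hazard; it is well defined here because the largest observation in the toy sample is censored, so the estimated survival probability stays positive.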
There is extensive literature on the Kaplan-Meier estimator $\hat F_n$ and the hazard estimator $\hat\Lambda_n$ for censored independent observations. We refer to papers by Breslow and Crowley [2], Földes and Rejtö [3] and Gu and Lai [4]. Martingale methods for analyzing properties of $\hat F_n$ are described in the monograph by Gill [5]. However, censored dependent data appear in a number of applications. For example, repeated measurements in survival analysis follow this pattern; see Kang and Koehler [6] or Wei et al. [7]. In the context of censored time series analysis, Shumway et al. [8] considered (hourly or daily) measurements of the concentration of a given substance subject to some detection limits, thus being potentially censored from the right. Ying and Wei [9], Lecoutre and Ould-Saïd [10], Cai [11] and Liang and Uña-Álvarez [12] studied the convergence of $\hat F_n$ for stationary α-mixing data.
The main purpose of this paper is to study the strong convergence rates and strong representation of the Kaplan-Meier estimator and the hazard estimator based on censored data when the survival and the censoring times form NA sequences (see the following definition). Under certain regularity conditions, we find strong convergence rates of the Kaplan-Meier and hazard estimators, and we express each estimator as the mean of random variables, with the remainder of order a.s.
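Definition Random variables $X_1, X_2,\dots,X_n$, $n\ge 2$, are said to be negatively associated (NA) if for every pair of disjoint subsets $A_1$ and $A_2$ of $\{1,2,\dots,n\}$,
$$\operatorname{cov}\bigl(f_1(X_i;\ i\in A_1),\ f_2(X_j;\ j\in A_2)\bigr)\le 0,$$
where $f_1$ and $f_2$ are increasing for every variable (or decreasing for every variable) so that this covariance exists. A sequence of random variables $\{X_n; n\ge 1\}$ is said to be NA if every finite subfamily is NA.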
Obviously, if $\{X_n; n\ge 1\}$ is a sequence of NA random variables, and $\{f_n; n\ge 1\}$ is a sequence of nondecreasing (or nonincreasing) functions, then $\{f_n(X_n); n\ge 1\}$ is also a sequence of NA random variables.
This definition was introduced by Joag-Dev and Proschan [13]. Statistical tests depend greatly on the sampling scheme. Random sampling without replacement from a finite population is NA but not independent. NA sampling has wide applications in areas such as multivariate statistical analysis and reliability theory, and for this reason the limit behavior of NA random variables has received increasing attention. One can refer to Joag-Dev and Proschan [13] for fundamental properties, Matula [14] for the three series theorem, and Wu and Jiang [15,16] for the strong convergence.
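The simplest consequence of negative association is that any two coordinates have nonpositive covariance (take both functions in the definition to be the identity). For two draws without replacement from a finite population, this covariance can be computed exactly by enumerating all ordered pairs, as in the Python sketch below. The function name and the population are assumptions for illustration; the check covers only this one consequence and does not verify the full NA definition.

```python
from fractions import Fraction
from itertools import permutations
from typing import Sequence


def covariance_without_replacement(population: Sequence[int]) -> Fraction:
    """Exact covariance of the first and second draws without replacement."""
    # All equally likely ordered pairs of distinct positions in the population.
    pairs = list(permutations(population, 2))
    m = len(pairs)
    mean_first = Fraction(sum(a for a, _ in pairs), m)
    mean_second = Fraction(sum(b for _, b in pairs), m)
    mean_product = Fraction(sum(a * b for a, b in pairs), m)
    return mean_product - mean_first * mean_second


# Hypothetical population of size N = 5 with population variance 2.
cov = covariance_without_replacement([1, 2, 3, 4, 5])
```

For this population the result agrees with the classical formula $-\sigma^2/(N-1)$, where $\sigma^2$ is the population variance and $N$ the population size: the covariance is $-1/2$, strictly negative, whereas independent draws would give zero.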
We give two lemmas, which are helpful in proving our theorems.
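Lemma 1.1 (Yang [17], Lemma 1)
Let $\{X_n; n\ge 1\}$ be a sequence of negatively associated random variables with zero means and $|X_j|\le b_j$ a.s. ($j=1,2,\dots$). Let $t>0$ be such that $t\max_{1\le j\le n} b_j\le 1$. Then, for all $\varepsilon>0$,
$$P\left(\left|\sum_{j=1}^{n} X_j\right|>\varepsilon\right)\le 2\exp\left\{-t\varepsilon+t^2\sum_{j=1}^{n} EX_j^2\right\}.$$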
Lemma 1.2 Let $\{X_n; n\ge 1\}$ be a sequence of NA r.v.s with continuous d.f. $F$, and let $F_n$ be the empirical d.f. based on the segment $X_1,\dots,X_n$. Then
Proof Similar to the proof of Lemma 4 in Yang [17], we can prove Lemma 1.2. □
Theorem 1.3 Let $\{X_n; n\ge 1\}$ and $\{Y_n; n\ge 1\}$ be two sequences of NA random variables. Suppose that the sequences $\{X_n; n\ge 1\}$ and $\{Y_n; n\ge 1\}$ are independent. Then, for any $0<\tau<\tau_L$,
and
For positive reals z and t, and δ taking value 0 or 1, let
Theorem 1.4 Assume that the conditions of Theorem 1.3 hold. Then
and
2 Proofs
Proof of Theorem 1.3 It is easy to see from Property $P_7$ of Joag-Dev and Proschan [13] that and are also two sequences of NA r.v.s. Therefore
and
follow from Lemma 1.2 and the fact that both and are empirical distribution functions of L and .
Now, by (1.1) and (1.2), let us write
Therefore, by the combination of equations (2.1) and (2.2), and , for , we obtain
Thus, (1.5) holds.
Now we prove (1.6). By (1.3) and (1.4),
Therefore, by combining the inequality , , and (2.1), for , , we get that
By (1.1),(1.6) and (2.4), using the Taylor expansion, , we obtain
Hence, in combination with (1.5), (1.6) holds. This completes the proof of Theorem 1.3. □
Proof of Theorem 1.4 By (2.1),
Thus, by the combination of (2.3),
Noting that and is a step function, we get
Therefore, to prove (1.8), it suffices to prove that for . Let us divide the interval into subintervals , , where , and are such that . For , it is easy to check that
To estimate , we further subdivide each into subintervals , , where such that uniformly in i, j. Now, by (2.1) and , for , it follows that
For , , , let , . Then , and are NA sequences with , , , , .
Taking in Lemma 1.1, yields the following probability bound:
Using the bound and the Borel-Cantelli lemma, we deduce that a.s. The estimation of is similar, noting that for all x and y. Therefore, by (2.6)-(2.9), (1.8) holds, and (1.9) follows from (2.5) and (1.8). □
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
QW conceived of the study, and drafted and completed the manuscript. PC participated in the discussion of the manuscript. QW and PC read and approved the final manuscript.
Authors’ information
Qunying Wu, Professor, Doctor, working in the field of probability and statistics.
Acknowledgements
Supported by the National Natural Science Foundation of China (11061012), project supported by Program to Sponsor Teams for Innovation in the Construction of Talent Highlands in Guangxi Institutions of Higher Learning ([2011] 47), and the Support Program of the Guangxi China Science Foundation (2012GXNSFAA053010, 2013GXNSFDA019001).
References

[1] Kaplan, EL, Meier, P: Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958)
[2] Breslow, N, Crowley, J: A large sample study of the life table and product limit estimates under random censorship. Ann. Stat. 2, 437–453 (1974)
[3] Földes, A, Rejtö, L: A LIL type result for the product limit estimator. Z. Wahrscheinlichkeitstheor. Verw. Geb. 56, 75–84 (1981)
[4] Gu, MG, Lai, TL: Functional laws of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation. Ann. Probab. 18, 160–189 (1990)
[5] Gill, R: Censoring and Stochastic Integrals. Math. Centrum, Amsterdam (1980)
[6] Kang, SS, Koehler, KJ: Modification of the Greenwood formula for correlated failure times. Biometrics 53, 885–899 (1997)
[7] Wei, LJ, Lin, DY, Weissfeld, L: Regression analysis of multivariate incomplete failure times data by modelling marginal distributions. J. Am. Stat. Assoc. 84, 1064–1073 (1989)
[8] Shumway, RH, Azari, AS, Johnson, P: Estimating mean concentrations under transformation for environmental data with detection limits. Technometrics 31, 347–356 (1989)
[9] Ying, Z, Wei, LJ: The Kaplan-Meier estimate for dependent failure time observations. J. Multivar. Anal. 50, 17–29 (1994)
[10] Lecoutre, JP, Ould-Saïd, E: Convergence of the conditional Kaplan-Meier estimate under strong mixing. J. Stat. Plan. Inference 44, 359–369 (1995)
[11] Cai, ZW: Estimating a distribution function for censored time series data. J. Multivar. Anal. 78, 299–318 (2001)
[12] Liang, HY, Uña-Álvarez, J: A Berry-Esseen type bound in kernel density estimation for strong mixing censored samples. J. Multivar. Anal. 100, 1219–1231 (2009)
[13] Joag-Dev, K, Proschan, F: Negative association of random variables with applications. Ann. Stat. 11(1), 286–295 (1983)
[14] Matula, PA: A note on the almost sure convergence of sums of negatively dependent random variables. Stat. Probab. Lett. 15, 209–213 (1992)
[15] Wu, QY, Jiang, YY: A law of the iterated logarithm of partial sums for NA random variables. J. Korean Stat. Soc. 39(2), 199–206 (2010)
[16] Wu, QY, Jiang, YY: Chover’s law of the iterated logarithm for NA sequences. J. Syst. Sci. Complex. 23(2), 293–302 (2010)
[17] Yang, SC: Consistency of nearest neighbor estimator of density function for negative associated samples. Acta Math. Appl. Sin. 26(3), 385–394 (2003)