On the parametric maximum likelihood estimator for independent but non-identically distributed observations with application to truncated data

We investigate the parametric maximum likelihood estimator for truncated data when the truncation value differs across the observed individuals or items. We extend Lehmann's (1983) proof of the asymptotic properties of the parametric maximum likelihood estimator to the case of independent non-identically distributed observations. Two cases are considered: the number of distinct probability distribution functions that can be observed in the population from which the sample comes is either finite or infinite. Sufficient conditions for consistency and asymptotic normality are provided in both cases.


Introduction
Truncated data arise frequently in survival analysis and in astronomy. For instance, left-truncated data occur when one wants to estimate the luminosity of an astronomical object but can only detect objects that are sufficiently bright (Woodroofe, 1985). A variable X is said to be right-truncated (resp. left-truncated) by the truncation value t when X is observed only if its realization is smaller (resp. larger) than t. The truncation value t may differ across individuals or items. Let (x_1, t_1), (x_2, t_2), …, (x_n, t_n) be the n truncated observations, where x_i is the realization of X and t_i is the truncation value. All observed data meet the condition x_i ≤ t_i in the case of right-truncated data, or the condition x_i ≥ t_i in the case of left-truncated data. Right-truncated (resp. left-truncated) data on X consist of independent realizations of random variables whose respective distributions are the conditional distributions of X_i given {X_i ≤ t_i} (resp. {X_i ≥ t_i}), that is, with cumulative distribution function F(·; θ)/F(t_i; θ) (resp. (F(·; θ) − F(t_i; θ))/(1 − F(t_i; θ))) and probability distribution function f_i(·; θ) = f(·; θ)/F(t_i; θ) (resp. f_i(·; θ) = f(·; θ)/(1 − F(t_i; θ))). Consequently, if the truncation value differs across individuals or items, truncated data consist of independent but non-identically distributed observations. In this paper, we deal with this case.
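As a concrete illustration (a hypothetical example, not taken from the paper), consider left-truncated data from an exponential distribution with rate θ. By the memoryless property, the conditional density f(x; θ)/(1 − F(t_i; θ)) reduces to θ exp(−θ(x − t_i)) for x ≥ t_i, so the maximum likelihood estimator has the closed form θ̂_n = n / Σ_i (x_i − t_i) even though the observations are non-identically distributed. A minimal sketch:

```python
import numpy as np

# Left-truncated exponential illustration (hypothetical model, not from the paper).
# X ~ Exp(theta0); X_i is observed conditionally on {X_i >= t_i}, so its density is
#   f_i(x; theta) = f(x; theta) / (1 - F(t_i; theta)) = theta * exp(-theta * (x - t_i)),  x >= t_i.
rng = np.random.default_rng(0)
theta0 = 2.0
n = 10_000
t = rng.uniform(0.0, 3.0, size=n)              # individual truncation values t_i
x = t + rng.exponential(1.0 / theta0, size=n)  # draws from the conditional (left-truncated) law

# The log-likelihood is sum_i [log theta - theta * (x_i - t_i)]; setting its
# derivative to zero gives the closed-form MLE theta_hat = n / sum_i (x_i - t_i).
theta_hat = n / np.sum(x - t)
print(theta_hat)  # close to theta0 = 2.0
```

The point of the sketch is that each observation carries its own truncation value t_i, yet a single shared parameter θ is estimated from all of them, which is exactly the non-identically distributed setting studied here.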
Asymptotic properties of the parametric maximum likelihood estimator for independent and identically distributed observations in the multiparameter case have been explored by Chanda (1954) and Lehmann (1983). Chanda (1954) solved the normal equations to prove the consistency of this estimator, whereas Lehmann (1983) studied the sign of a function of the log-likelihood on a sphere centered at the true value of the parameter vector. While Bradley and Gart (1962) extended the proof of Chanda (1954) to independent but non-identically distributed observations, there is no such extension of the proof of Lehmann (1983). Other proofs of asymptotic properties exist, such as the proof using empirical processes exposed by Van der Vaart and Wellner (2000), but they lead to assumptions stated differently, which may not be easy to verify in specific situations. In the present article, we extend the proof of Lehmann (1983) to the case of independent but non-identically distributed observations. In their paper, Bradley and Gart (1962) considered two cases: the number of distinct probability distribution functions that can be observed in the population from which the sample comes is either finite or infinite. For the sake of generality, we consider both cases. In the case of an infinite number of distinct probability distribution functions, the assumptions that are sufficient conditions for consistency and asymptotic normality of the parametric maximum likelihood estimator are slightly different from those in the paper of Bradley and Gart (1962). The remainder of this paper presents the assumptions, the theorems and the proofs of the asymptotic properties of the parametric maximum likelihood estimator.

Asymptotic properties
In this paper, for a sequence of random variables indexed by n, convergence in probability is denoted as follows.

Infinite number of distinct probability distribution functions
Let (x_1, x_2, …, x_n) be the observations of n independent random variables with respective, not necessarily identical, probability distribution functions f_i(·; θ), for i = 1, …, n, where θ = (θ_1, θ_2, …, θ_j, …, θ_r) is a vector of unknown parameters shared by all these random variables. The vector θ belongs to Θ, an open subset of R^r. Let θ^0 = (θ^0_1, θ^0_2, …, θ^0_j, …, θ^0_r) be the true value of the parameter. Let S_i ⊂ R be the support of the probability distribution function f_i(·; θ); the support S_i must not depend on the vector of unknown parameters θ. As is well known, the likelihood of the sample is written L(x_1, x_2, …, x_n; θ) = ∏_{i=1}^n f_i(x_i; θ), and the maximum likelihood estimator is defined as θ̂_n = argmax_{θ∈Θ} L(x_1, x_2, …, x_n; θ). The normal equations are ∇_θ log L(x_1, x_2, …, x_n; θ) = 0, where ∇_θ is the gradient operator.
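To make the normal equations concrete, here is a sketch that reuses the hypothetical left-truncated exponential model (an illustration, not the paper's example). For f_i(x; θ) = θ exp(−θ(x − t_i)), the score is U(θ) = ∂/∂θ log L = n/θ − Σ_i (x_i − t_i), and solving the normal equation U(θ) = 0 by Newton-Raphson recovers the closed-form root n / Σ_i (x_i − t_i):

```python
import numpy as np

# Solving the normal equations numerically (illustrative sketch; the truncated
# exponential model and all numbers here are assumptions, not from the paper).
# For f_i(x; theta) = theta * exp(-theta * (x - t_i)), the score is
#   U(theta) = d/dtheta log L = n / theta - sum_i (x_i - t_i),
# and the normal equation U(theta) = 0 has the root theta = n / sum_i (x_i - t_i).
rng = np.random.default_rng(1)
n, theta0 = 5_000, 1.5
t = rng.uniform(0.0, 2.0, size=n)
x = t + rng.exponential(1.0 / theta0, size=n)
s = float(np.sum(x - t))

theta = 1.0                        # starting point for Newton-Raphson
for _ in range(50):
    score = n / theta - s          # U(theta)
    dscore = -n / theta**2         # U'(theta) = minus the observed information
    theta -= score / dscore        # Newton step on the normal equation

print(theta, n / s)  # the iterates converge to the closed-form root n / s
```

This is the one-parameter case (r = 1); with r > 1 parameters the same iteration runs on the gradient vector and the Hessian matrix of the log-likelihood.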
Remark 1. We assumed that the unknown parameter vector is shared by all the densities because this is the case for truncated data. However, the theorems and proofs remain valid when it is not.
Let us introduce a set of sufficient conditions for the following theorems.
Assumption 1. The maximum likelihood estimator is a solution of the normal equations.
Assumption 2. The normal equations have a unique root.
Remark 4. From Assumption 5 and Assumption 7, we already know that I_{θ^0} exists and is a positive semi-definite matrix.
Assumption 11. For all ε > 0, (1/n) ∑_{i=1}^n E[ ‖∇_θ log f_i(X_i; θ)|_{θ^0}‖^2 I{‖∇_θ log f_i(X_i; θ)|_{θ^0}‖ > ε √n} ] → 0 when n tends to +∞, where I{A} is the indicator of the set A.
Remark 5. Assumption 11 is the Lindeberg condition required for the multivariate central limit theorem for independent non-identically distributed observations (Feller, 1971).
The following theorem states the consistency of the parametric maximum likelihood estimator.
Let (ζ, ε) be a vector of arbitrary positive constants. The results above allow us to write three clusters of inequalities. For all ζ and all ε, there exists n_0 such that, for all n larger than n_0 and for all (j, p, q) ∈ {1, …, r}^3, the following probabilities are bounded by ε/(r(1 + r + r^2)). Let S denote the event involving these r(1 + r + r^2) inequalities. From the above upper bounds on the different probabilities, we get P(S*) < ε, where S* is the complement of S, and thus P(S) > 1 − ε.
The following theorem states the asymptotic normality of the parametric maximum likelihood estimator.
In matrix form, this is written as follows. The vectors ∇_θ log f_i(X_i; θ)|_{θ^0}, for all i = 1, …, n, are independent but not identically distributed, with zero mean and covariance matrix V_i(θ^0) = (V_{i,jp}(θ^0))_{1≤j,p≤r}. We know from Assumption 5 and Assumption 10 that the required moment and convergence conditions hold. So, from Assumption 11 and the multivariate central limit theorem for independent non-identically distributed random variables, we get the weak convergence of the normalized score vector evaluated at θ^0. From Assumptions 7-8, the consistency of the maximum likelihood estimator and Slutsky's theorem, one obtains, for all (j, p) ∈ {1, …, r}^2, the corresponding convergences in probability. These convergences in probability and the weak convergence of the normalized score vector yield the announced asymptotic normality.
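The asymptotic normality can be checked by simulation. The sketch below again uses the hypothetical left-truncated exponential model (an assumption, not the paper's example): there, the Fisher information of one observation is 1/θ^2 regardless of t_i, so the theorem suggests that √n (θ̂_n − θ^0) is approximately N(0, (θ^0)^2), and the standardized estimator should have mean close to 0 and standard deviation close to 1:

```python
import numpy as np

# Monte Carlo check of asymptotic normality (illustrative sketch under an
# assumed left-truncated exponential model, not taken from the paper).
# Each observation has density theta * exp(-theta * (x - t_i)) for x >= t_i;
# the Fisher information of one observation is 1 / theta^2, so
# sqrt(n) * (theta_hat - theta0) should be approximately N(0, theta0^2).
rng = np.random.default_rng(2)
theta0, n, n_rep = 2.0, 200, 2_000
z = np.empty(n_rep)
for k in range(n_rep):
    t = rng.uniform(0.0, 3.0, size=n)
    x = t + rng.exponential(1.0 / theta0, size=n)
    theta_hat = n / np.sum(x - t)                       # closed-form MLE
    z[k] = np.sqrt(n) * (theta_hat - theta0) / theta0   # standardized estimator

print(z.mean(), z.std())  # roughly 0 and 1 if the normal approximation holds
```

Note that each replication draws fresh truncation values t_i, so the simulated samples are genuinely non-identically distributed, matching the setting of the theorem.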

Finite number of distinct probability distribution functions
Let N be the number of distinct probability distribution functions that can be observed in the population from which the sample comes. For i = 1, …, N, let n_i be the number of observations with density f_i(·; θ), and let n = ∑_{i=1}^N n_i be the total number of observations. For i = 1, …, N, let μ_i = n_i/n be the proportion of observations with density f_i(·; θ). One can easily prove that there exist constants (λ_i)_{1≤i≤N} in ]0, 1[^N such that, for all i = 1, …, N, the proportion μ_i tends to λ_i when n tends to +∞. These constants satisfy ∑_{i=1}^N λ_i = 1.
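The finite case can be illustrated with the same hypothetical left-truncated exponential model (again an assumption, not the paper's example), taking N = 2 distinct truncation values sampled with fixed probabilities λ_1 and λ_2. The empirical proportions μ_i stabilize around λ_i as n grows, and the pooled MLE still estimates the shared parameter θ:

```python
import numpy as np

# Finite number of distinct densities (illustrative sketch; the two-group
# left-truncated exponential model below is an assumption, not from the paper).
# N = 2 truncation values; group i has n_i observations with density
#   f_i(x; theta) = theta * exp(-theta * (x - t_i)),   x >= t_i.
rng = np.random.default_rng(3)
theta0 = 2.0
lam = np.array([0.3, 0.7])         # limiting proportions lambda_i, summing to 1
t_values = np.array([0.5, 1.5])    # the N = 2 distinct truncation values
n = 4_000
groups = rng.choice(2, size=n, p=lam)   # mu_i = n_i / n tends to lambda_i
t = t_values[groups]
x = t + rng.exponential(1.0 / theta0, size=n)

mu = np.bincount(groups, minlength=2) / n    # empirical proportions mu_i
theta_hat = n / np.sum(x - t)                # MLE pooled over both groups
print(mu, theta_hat)
```

Here the log-likelihood decomposes into N group-wise sums weighted by the group sizes n_i, which is the structure exploited when n is replaced by N in the assumptions.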
Remark 6. Note that the case where there exists q such that λ_q = 0 (resp. λ_q = 1) corresponds to the case where there are in fact only N − 1 distinct distributions (resp. only one distribution).
Let Assumptions 1-5 be the same assumptions as in the previous case, except that n is replaced by N in Assumptions 3-5.
Remark 7. From Assumption 5, we already know that I_{θ^0} is a positive semi-definite matrix.
The asymptotic behavior of the parametric maximum likelihood estimator in this case is given in the following two theorems.