Conditional risk estimate for functional data under strong mixing conditions

We consider the problem of nonparametric estimation of the conditional hazard function for functional mixing data. More precisely, given a strictly stationary sequence of random variables Z_i = (X_i, Y_i), i ∈ N, we investigate a kernel estimate of the conditional hazard function of the univariate response variable Y_i given the functional variable X_i. The principal aim of this paper is to give the mean squared convergence rate and to prove the asymptotic normality of the proposed estimator.


Introduction
The statistical problems involved in the modeling of functional data have received increasing interest in the literature. The enthusiasm for this topic is linked to the many application areas in which the data are collected in the form of curves. Under this assumption, the statistical analysis focuses on a framework of infinite dimension for the data under study. This field of modern statistics has received much attention in the last 20 years, and it has been popularised in the book of Ramsay and Silverman [23]. This type of data appears in many fields of applied statistics: environmetrics [8], chemometrics [2], meteorological sciences [3], etc.
In this paper, we are interested in the nonparametric estimation of the conditional hazard function when the covariates are of functional nature.
The nonparametric estimation of the hazard and/or the conditional hazard function is quite important in a variety of fields, including medicine, reliability, survival analysis and seismology. The literature on this model in multivariate statistics is abundant. Historically, the hazard estimate was introduced by Watson and Leadbetter [30]; since then, several results have been added, see for example Roussas [26] (and the references therein for previous works).
From a theoretical point of view, a sample of functional data can be involved in many different statistical problems, such as for example: classification and principal components analysis (PCA) [5,6] or longitudinal studies, regression and prediction [2,7].
The recent monograph by Ferraty and Vieu [13] summarizes many of their contributions to the non-parametric estimation with functional data; among other properties, consistency of the conditional density, conditional distribution and regression estimates is established in the i.i.d. case as well as under dependence conditions (strong mixing). Almost complete rates of convergence are also obtained, and the different techniques are applied to several examples of functional data samples. Related work can be seen in the paper of Masry [19], where the asymptotic normality of the functional non-parametric regression estimate is proven under strong mixing dependence conditions for the sample data. For automatic smoothing parameter selection in the regression setting, see Rachdi and Vieu [22].
The literature is much more limited in the case where the data are of functional nature (curves). The first result in this context was given by Ferraty et al. [12], who established the almost complete convergence of the kernel estimate of the conditional hazard function in the i.i.d. case and under an α-mixing condition. Recently, Rabhi et al. [21] studied the mean quadratic convergence of this estimate in the i.i.d. case. More recently, Mahiddine et al. [18] gave a uniform version of the almost complete convergence rate in the i.i.d. case.
The estimation of the hazard function is a problem of considerable interest, especially to inventory theorists, medical researchers, logistics planners, reliability engineers and seismologists. The non-parametric estimation of the hazard function has been extensively discussed in the literature. Beginning with Watson and Leadbetter [30], there are many papers on these topics: Ahmad [1], Singpurwalla and Wong [27], etc. We can cite Quintela [20] for a survey.
When hazard rate estimation is performed with multiple variables, the result is an estimate of the conditional hazard rate for the first variable, given the levels of the remaining variables. Many references, practical examples and simulations in the case of nonparametric estimation using local linear approximations can be found in Spierdijk [28]. The main aim of this paper is to study, under general conditions, the asymptotic properties of the functional-data kernel estimate of the conditional hazard function introduced by Ferraty et al. [12]. More precisely, we treat the L²-convergence rate by giving the exact expression of the leading terms of the quadratic error. In addition, we establish the asymptotic normality of the constructed estimator. We point out that our asymptotic results are useful in several statistical problems, such as the choice of the smoothing parameters, the determination of confidence intervals, and risk analysis. The present work extends to the dependent case the results of Rabhi et al. [21], given in the functional i.i.d. case. We note that one of the main difficulties when dealing with functional variables lies in choosing an appropriate reference measure in infinite-dimensional spaces. The main feature of our approach is to build estimates and to derive their asymptotic properties without any notion of density for the functional variable X. This approach allows us to avoid the use of a reference measure in such functional spaces. In each of the sections described above, we will give general asymptotic results without assuming the existence of such a density, and each of these results will be discussed in relation to the earlier literature in the usual finite-dimensional case.
Our paper presents some asymptotic properties related to the nonparametric estimation of the conditional hazard function. In a functional data setting, the conditioning variable is allowed to take its values in some abstract semi-metric space. In this setting, Ferraty et al. [29] define nonparametric estimators of the conditional density and the conditional distribution. They give the rates of convergence (in an almost complete sense) to the corresponding functions in a dependence (α-mixing) context. In Rabhi et al. [21], the same properties are shown for an i.i.d. data sample. We extend their results to the dependent case by calculating the bias and variance of these estimates, and by establishing their asymptotic normality, considering a particular type of kernel for the functional part of the estimates. Because the hazard function estimator is naturally constructed from these two estimators, the same type of properties is easily derived for it. Our results are valid in a real (one- and multi-dimensional) response context.
The paper is organized as follows: in the next section we present our model. Section 3 is dedicated to fixing notation and hypotheses. We state our main results in Section 4. Section 5 is devoted to a discussion of the applicability of our asymptotic results to some statistical problems, such as the choice of the smoothing parameters, the determination of confidence intervals, and seismic risk analysis.

The model
Let Z_i = (X_i, Y_i), i ∈ N, be an F × R-valued measurable strictly stationary process, defined on a probability space (Ω, A, P), where (F, d) is a semi-metric space.
In the following, x will be a fixed point in F and N_x will denote a fixed neighborhood of x. We assume that a regular version of the conditional probability of Y given X exists. Moreover, we suppose that, for all z ∈ N_x, the conditional distribution function of Y given X = z, F^z(·), is 3 times continuously differentiable, and we denote by f^z its conditional density with respect to (w.r.t.) Lebesgue measure over R. In this paper, we consider the problem of the nonparametric estimation of the conditional hazard function defined, for all y ∈ R such that F^x(y) < 1, by

h^x(y) = f^x(y) / (1 − F^x(y)).
A natural way to estimate h^x is to estimate the conditional distribution function and the conditional density in the presence of the functional conditioning variable X, and to plug these estimates into the ratio above. Following Ferraty et al. [12], we define the kernel estimators

F̂^x(y) = Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) H(h_H^{-1}(y − Y_i)) / Σ_{i=1}^n K(h_K^{-1} d(x, X_i)),

f̂^x(y) = h_H^{-1} Σ_{i=1}^n K(h_K^{-1} d(x, X_i)) H'(h_H^{-1}(y − Y_i)) / Σ_{i=1}^n K(h_K^{-1} d(x, X_i)),

where K is a kernel, H is a given continuously differentiable distribution function with derivative H', and h_K = h_{K,n} (resp. h_H = h_{H,n}) is a sequence of positive real numbers. The conditional hazard estimator is then

ĥ^x(y) = f̂^x(y) / (1 − F̂^x(y)), for all y ∈ R such that F̂^x(y) < 1.

An estimator of the first derivative of the hazard function can be written through the first derivative of this estimator; it is therefore natural to construct an estimator of the derivative of h^x on the basis of these ideas, via an estimator of the derivative of the conditional density. The assumptions we will need on the parameters of the estimator, i.e. on K, H, H', h_H and h_K, are mildly restrictive. Indeed, on the one hand, they are not specific to the estimation of h^x (they are inherent to the estimation of F^x, f^x and its derivative), and on the other hand they are consistent with the assumptions usually made for functional variables. Our main purpose is to study the L²-consistency and the asymptotic normality of the nonparametric estimate ĥ^x of h^x when the random field (Z_i, i ∈ N) satisfies the mixing condition below.
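As an illustration, the estimator above can be sketched numerically. The following minimal Python sketch computes ĥ^x(y) from discretized curves; the L² semi-metric, the quadratic kernels and the bandwidth values are illustrative assumptions, not choices prescribed by the paper:

```python
import numpy as np

def cond_hazard_estimate(x, y, X_curves, Y, h_K, h_H):
    """Kernel estimate of the conditional hazard h^x(y) for functional data.

    X_curves : (n, p) array, each row a discretized curve X_i
    Y        : (n,) array of scalar responses
    x        : (p,) discretized curve at which we condition
    """
    # semi-metric d: L2 distance between discretized curves (illustrative choice)
    dist = np.sqrt(np.mean((X_curves - x) ** 2, axis=1))
    u = dist / h_K
    # asymmetric quadratic kernel K supported on [0, 1]
    K = np.where(u <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    denom = K.sum()
    if denom == 0.0:
        return np.nan  # no curve close enough to x at this bandwidth
    t = (y - Y) / h_H
    # H: integrated quadratic kernel (a smooth distribution function); H' its density
    H_cdf = np.where(t < -1, 0.0, np.where(t > 1, 1.0, 0.5 + 0.75 * t - 0.25 * t ** 3))
    H_pdf = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)
    F_hat = (K * H_cdf).sum() / denom           # conditional distribution estimate
    f_hat = (K * H_pdf).sum() / (denom * h_H)   # conditional density estimate
    return f_hat / max(1.0 - F_hat, 1e-12)      # hazard ratio f / (1 - F)
```

The estimator is a ratio of two weighted sums with common functional weights K(h_K^{-1} d(x, X_i)), which is what allows the analysis to proceed without a reference density for X.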

Notations and hypotheses
All along the paper, when no confusion is possible, we will denote by C and C' strictly positive generic constants. In order to establish our asymptotic results we need the following hypotheses, for all r > 0 and i ∈ N:

(H0) P(X ∈ B(x, r)) =: φ_x(r) > 0, where B(x, r) denotes the ball of center x and radius r;
(H1) (Z_i)_{i∈N} is an α-mixing sequence whose mixing coefficients exhibit an arithmetical decay;
(H2) the joint distribution of the pairs (X_i, X_j) is suitably controlled in terms of φ_x.

Note that (H0) can be interpreted as a concentration hypothesis acting on the distribution of the f.r.v. X, whereas (H2) concerns the behavior of the joint distribution of the pairs (X_i, X_j); in fact, this hypothesis amounts to assuming a control, for n large enough, of the joint small-ball probabilities in terms of φ_x(h). Concerning the kernel K, its derivative K' exists and is such that there exist two constants C and C' with −∞ < C < K'(t) < C' < 0 for 0 ≤ t ≤ 1. (H6) H has an even, bounded derivative supported on [0, 1]. (H7) There exist sequences of integers (u_n) and (v_n), increasing to infinity, such that (u_n + v_n) ≤ n, where q_n is the largest integer such that q_n(u_n + v_n) ≤ n.
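For completeness, the strong mixing coefficient referred to in (H1) is the standard one; a sketch of the definition, with F_1^k = σ(Z_1, ..., Z_k) and F_{k+n}^∞ = σ(Z_{k+n}, Z_{k+n+1}, ...):

```latex
\alpha(n) \;=\; \sup_{k\ge 1}\;
\sup_{A\in\mathcal{F}_1^{k},\; B\in\mathcal{F}_{k+n}^{\infty}}
\bigl|\,\mathbb{P}(A\cap B)-\mathbb{P}(A)\,\mathbb{P}(B)\,\bigr| .
```

The sequence is said to be arithmetically (algebraically) mixing of order a when α(n) ≤ C n^{−a} for some C > 0.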

Remarks on the assumptions
Remark 3.1. Assumption (H0) plays an important role in our methodology. It is known (for small h) as the "concentration hypothesis acting on the distribution of X" in infinite-dimensional spaces. This assumption is not at all restrictive and overcomes the problem of the non-existence of the probability density function. In many examples, around zero the small-ball probability φ_x(h) can be written approximately as the product of two independent functions ψ(x) and ϕ(h), as φ_x(h) = ψ(x)ϕ(h) + o(ϕ(h)). This idea was adopted by Masry [19], who reformulated that of Gasser et al. [14]. The increasing property of φ_x(·) implies that ζ_h^x(·) is bounded and hence integrable (all the more so, ζ_0^x(·) is integrable).
Without the differentiability of φ_x(·), this assumption has been used by many authors, where ψ(·) is interpreted as a probability density, while ϕ(·) may be interpreted as a volume parameter. In the case of finite-dimensional spaces, that is S = R^d, it can be seen that φ_x(h) behaves like C(d) h^d ψ(x), where C(d) is the volume of the unit ball in R^d. Furthermore, in infinite dimensions, there exist many examples fulfilling the decomposition mentioned above; we refer to Ferraty et al. [10]. The function ζ_h^x(·) which intervenes in Assumption (H4) is increasing for all fixed h. Its pointwise limit ζ_0^x(·) also plays a determinant role: it intervenes in all asymptotic properties, in particular in the asymptotic variance term. With simple algebra, it is possible to make this function explicit (with ζ_0(u) := ζ_0^x(u)) in the above examples. Assumption (H2) is classical and permits one to make the variance term negligible.
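To make the concentration function concrete, φ_x(h) can be approximated empirically by the proportion of sample curves falling in the ball B(x, h). A minimal sketch, where the L² semi-metric is an illustrative assumption:

```python
import numpy as np

def small_ball_probability(x, X_curves, h):
    """Empirical estimate of phi_x(h) = P(d(X, x) <= h) from sampled curves,
    using the L2 semi-metric between discretized curves."""
    dist = np.sqrt(np.mean((X_curves - x) ** 2, axis=1))
    return float(np.mean(dist <= h))
```

In R^d this proportion behaves like C(d) h^d ψ(x), in line with the finite-dimensional decomposition above; in infinite dimension it typically decays much faster in h, which is precisely why φ_x(h) drives the variance rates.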

Mean squared convergence
The first result concerns the L²-consistency of ĥ^x(y).
Proof. By using the same decomposition as in (Theorem 3.1, Rabhi et al. [21], p. 408), we show that the proof of Theorem 3.1 can be deduced from the following intermediate results. Remark 3.4. Observe that the result of this lemma permits us to write the corresponding rates under the hypotheses of Theorem 3.1.

Remark 3.5.
It is clear that the results of Lemmas 3.2 and 3.3 allow us to write the claimed expansion.

Asymptotic normality
This section contains results on the asymptotic normality of ĥ^x(y) and of the estimator of its derivative. Let us assume that h^x is sufficiently smooth (at least of class C²).
To obtain the asymptotic normality of the conditional estimates, we have to add the following assumptions: (H8) H is twice differentiable. (H9) The bandwidths h_H and h_K, the small-ball probability φ_z(h) and the arithmetical α-mixing coefficient of order a > 3 satisfy a joint rate condition.

Theorem 3.2. Assume that (H0)-(H9) hold. Then we have, for any x ∈ A,

(n h_H φ_x(h_K))^{1/2} (ĥ^x(y) − h^x(y) − B_n(x, y)) →^D N(0, σ_h²(x, y)),

where →^D denotes convergence in distribution.
Obviously, if one imposes some additional assumptions on the function φ_x(·) and on the bandwidth parameters h_K and h_H, we can improve our asymptotic normality result by removing the bias term B_n(x, y).

Corollary 3.1.
Under the hypotheses of Theorem 3.2, and if the bandwidth parameters h_K and h_H and the function φ_x(h_K) satisfy the appropriate additional rate conditions, then

(n h_H φ_x(h_K))^{1/2} (ĥ^x(y) − h^x(y)) / σ_h(x, y) →^D N(0, 1).

Proof of Theorem and Corollary. We consider the decomposition (3.3). For its first term, since the estimator ĥ^x(·) converges almost completely (a.co.) to h^x(·) uniformly over a fixed compact subset S of R^+, this term is negligible; the second term of (3.3) converges in distribution to N(0, 1).
Lemma 3.6. Under the hypotheses of Theorem 3.2

Applications
In this section we emphasize the potential impact of our work by studying its practical interest in some important statistical problems. Moreover, in order to show how easily our approach can be implemented in concrete cases, we discuss in the second part of this section the practical use of our model in risk analysis.
• On the choice of the bandwidth parameters: As for any kernel smoothing method, the choice of the bandwidth parameters plays a crucial role in the performance of the estimators. The mean quadratic error given in Theorem 3.1 is a basic ingredient for solving this problem. Usually, the ideal theoretical choices are obtained by minimizing this error.
Here, we have made its leading term explicit.
The smoothing parameters minimizing this leading term are then asymptotically optimal with respect to the L²-error. However, the practical use of this criterion requires some additional computational effort. More precisely, it requires the estimation of the unknown quantities Ψ_0, Φ_0, f^x(y) and F^x(y). Clearly, all these estimations can be obtained by using pilot estimators of the conditional distribution function F^x(y) and of the conditional density f^x(y). Such estimations are possible with kernel methods, with a separate choice of the bandwidth parameters for the two models. More precisely, for the conditional density, we propose to adapt to the functional case the bandwidth selectors studied by Bouraine et al. [6], while for the conditional distribution function we can use the cross-validation rule proposed by De Gooijer and Gannoun (2000) (in the vectorial case), where W_1, W_2 and W are suitable trimming functions and the sums run over the index sets I_{n,ς_n}^{k,l} = {i : |i − k| ≥ ς_n and |i − l| ≥ ς_n} and I_{n,ς_n}^i = {j : |j − i| ≥ ς_n}.
Of course, we can also adopt other selection methods, such as the parametric bootstrap proposed by Hall et al. [16] and Hyndman et al. [17] for, respectively, the conditional cumulative distribution function and the conditional density in the finite-dimensional case. Nevertheless, a data-driven method that avoids this additional computation would be very useful in practice and is one of the natural prospects of the present work.
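To illustrate, a leave-one-out cross-validation scan in the spirit of the De Gooijer-Gannoun rule can be sketched as follows; the quadratic kernels, the squared-error criterion and the omission of the trimming functions are illustrative simplifications, not the paper's exact rule:

```python
import numpy as np
from itertools import product

def smooth_cdf(t):
    # integrated quadratic kernel: a smooth distribution function H
    return np.where(t < -1, 0.0, np.where(t > 1, 1.0, 0.5 + 0.75 * t - 0.25 * t ** 3))

def cv_bandwidths(X, Y, hK_grid, hH_grid):
    """Leave-one-out CV score for the conditional-cdf kernel estimator,
    scanned over a bandwidth grid: compare the indicator 1{Y_i <= y} with
    the leave-one-out estimate F_hat^{X_i}(y), evaluated at y = Y_j."""
    D = np.sqrt(np.mean((X[:, None, :] - X[None, :, :]) ** 2, axis=2))  # semi-metric matrix
    I = (Y[:, None] <= Y[None, :]).astype(float)    # I[i, j] = 1{Y_i <= Y_j}
    best, best_score = None, np.inf
    for hK, hH in product(hK_grid, hH_grid):
        K = np.clip(1.0 - (D / hK) ** 2, 0.0, None)  # quadratic kernel weights
        np.fill_diagonal(K, 0.0)                     # leave observation i out of its own fit
        denom = K.sum(axis=1, keepdims=True)
        ok = denom[:, 0] > 0
        if not np.any(ok):
            continue                                 # bandwidth too small: no neighbors
        Hc = smooth_cdf((Y[None, :] - Y[:, None]) / hH)  # Hc[k, j] = H((Y_j - Y_k)/hH)
        F_hat = (K @ Hc)[ok] / denom[ok]
        score = np.mean((I[ok] - F_hat) ** 2)
        if score < best_score:
            best, best_score = (hK, hH), score
    return best, best_score
```

A complete implementation would add the trimming functions W, W_1, W_2 and restrict the sums to the index sets I_{n,ς_n}^{k,l} to handle the dependence, as described above.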
• Confidence intervals: The main application of Theorem 3.2 is the construction of confidence bands for the true value of h^x(y). As in the previous application, the practical use of our result requires the estimation of the quantity σ_h²(x, y). A plug-in estimate of the asymptotic variance σ_h²(x, y) can be obtained by using the estimators f̂^x(y) and F̂^x(y) of f^x(y) and F^x(y). Clearly, the function φ_x(·) does not appear in the calculation of the confidence interval, thanks to a simplification. More precisely, we obtain the following approximate (1 − ζ) confidence band for h^x(y):

ĥ^x(y) ± t_{1−ζ/2} σ̂_h(x, y) / (n h_H φ̂_x(h_K))^{1/2},

where t_{1−ζ/2} denotes the 1 − ζ/2 quantile of the standard normal distribution.
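As a numerical sketch, given plug-in values f̂^x(y), F̂^x(y) and an estimate φ̂_x(h_K), the band can be computed as follows; the variance form σ̂_h² = f̂/(1 − F̂)² is a simplifying assumption, omitting the kernel-dependent constants that a complete implementation must include:

```python
import math
from statistics import NormalDist

def hazard_confidence_band(f_hat, F_hat, n, h_H, phi_hat, zeta=0.05):
    """Approximate (1 - zeta) confidence band for h^x(y).

    Assumes (simplification) sigma_h^2 = f_hat / (1 - F_hat)^2; kernel
    constants are omitted and must multiply this term in real use.
    """
    h_hat = f_hat / (1.0 - F_hat)                  # plug-in hazard estimate
    sigma2 = f_hat / (1.0 - F_hat) ** 2            # plug-in asymptotic variance
    t = NormalDist().inv_cdf(1.0 - zeta / 2.0)     # standard normal quantile t_{1-zeta/2}
    half_width = t * math.sqrt(sigma2 / (n * h_H * phi_hat))
    return h_hat - half_width, h_hat + half_width
```

The band shrinks at the rate (n h_H φ_x(h_K))^{-1/2}, which makes explicit how the small-ball probability slows the convergence in infinite dimension.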

Appendix
In the following we will denote, for all i, the terms of the decomposition. Proof of Lemma 3.1. Firstly, for E[f̂_N^x(y)], we start by writing the expectation explicitly; the latter can be rewritten, by using a Taylor expansion under (H3). Let ψ_l(·, y) := ∂^l f^·(y)/∂y^l for l ∈ {0, 2}; since Φ_l(0) = 0, we obtain the stated bias expansion.
Similarly to Ferraty et al. [10], we show the corresponding bound. Secondly, concerning E[F̂_N^x(y)], we write an analogous expression by an integration by parts; the same steps used in studying E[f̂_N^x(y)] can then be followed. Proof of Lemma 3.2. For the first quantity, Var[f̂_N^x(y)], we begin by calculating the quantity Var[Γ_1(x)]. By using the same arguments as those used in the previous lemma, we obtain the variance bound. Now, let us focus on the covariance term. To do that, we need to calculate the asymptotic behavior of a quantity defined with c_n → ∞ as n → ∞.
For all i ≠ j, we use the fact that E[H_i(y)H_j(y) | (X_i, X_j)] = O(h_H^4). For J_{1,n}, by means of the integral computed above, hypotheses (H0), (H2) and (H5) imply the required bound. On the other hand, these covariances can be controlled by means of the usual Davydov-Rio covariance inequality for mixing processes (see Rio [25], formula 1.12a). Together with the arithmetical decay of the mixing coefficients in (H1), this inequality leads to the desired rate. Thus, by using the classical technique of Bosq [5], and owing to the right inequality in (H7b), we can conclude. Now, for F̂_N^x(y) (resp. F̂_D^x) we replace H'_i(y) by H_i(y) (resp. by 1) and we follow the same ideas, using the fact that H ≤ 1. This yields the proof.
Proof of Lemma 3.3. The proof follows the same steps as the previous lemma; we therefore keep the same notation. For the first term, under (H4), and by using similar arguments as those invoked in the proof of Lemma 3.2 (once again the boundedness of K and H, and the fact that (H1) and (H6) hold), the variance term is controlled. Moreover, the right part of (H7b), together with the Davydov-Rio inequality for mixing processes (Rio [25]), easily yields, for any c_n > 0, a bound for the sum of the covariances; it suffices then to take c_n = h^{−2}. From (5.4) and (5.5) we deduce the result, and the same arguments can be used to show the remaining statements.

Proof of Lemma 3.4. Let
Obviously, proving the asymptotic normality of (n σ_f²(x, y))^{−1/2} S_n is sufficient to prove this lemma. This is shown by the blocking method, where the random variables Λ_i are grouped into blocks of different sizes, defined below.
We consider the classical big- and small-block decomposition. We split the set {1, 2, . . . , n} into 2k_n + 1 subsets, with large blocks of size u_n and small blocks of size v_n, and put k_n := ⌊n/(u_n + v_n)⌋.
Assumption (H7)(ii) allows us to define the large block size u_n. Using Assumption (H7) and simple algebra, one can prove that v_n/u_n → 0, u_n/n → 0, and (n/u_n) α(v_n) → 0. (5.7) Now, let ϒ_j, ϒ'_j and ϒ''_j be defined as the sums of the Λ_i over the big blocks, the small blocks, and the remainder, respectively. Clearly, we can write S_n = S'_n + S''_n + S'''_n.
We prove (5.8)-(5.11) below, for every ε > 0. Expression (5.8) shows that the terms S''_n and S'''_n are negligible, while Equations (5.9) and (5.10) show that the ϒ_j are asymptotically independent, with the sum of their variances tending to σ_f²(x, y). Expression (5.11) is the Lindeberg-Feller condition for a sum of independent terms. The asymptotic normality of S_n is a consequence of Equations (5.8)-(5.11).
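The decomposition used above can be written out as follows (a sketch of the classical Bernstein blocking, with the block sizes u_n and v_n of Assumption (H7)):

```latex
S_n \;=\;
\underbrace{\sum_{j=0}^{k_n-1}\Upsilon_j}_{S'_n\ (\text{big blocks, size } u_n)}
\;+\;
\underbrace{\sum_{j=0}^{k_n-1}\Upsilon'_j}_{S''_n\ (\text{small blocks, size } v_n)}
\;+\;
\underbrace{\Upsilon''_{k_n}}_{S'''_n\ (\text{remainder})},
\qquad
\Upsilon_j=\sum_{i=j(u_n+v_n)+1}^{j(u_n+v_n)+u_n}\Lambda_i .
```

The small blocks separate the big blocks by at least v_n indices, so the mixing condition makes the big blocks asymptotically independent, while v_n/u_n → 0 makes the small blocks negligible.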
• Proof of (5.8). Because E(Λ_j) = 0 for all j, and by second-order stationarity, simple algebra gives k_n v_n/n ≅ (n/(u_n + v_n))(v_n/n) ≅ v_n/(u_n + v_n) ≅ v_n/u_n → 0 as n → ∞.
Using Equation (5.2), we have lim_{n→∞} Π_1/n = 0. (5.12) Now, let us turn to Π_2/n. We have Cov(Λ_{m_i+l_1}, Λ_{m_j+l_2}) with m_i = i(u_n + v_n) + v_n. As i ≠ j, we have |m_i − m_j + l_1 − l_2| ≥ u_n. It follows, with ϑ_n = n − k_n(u_n + v_n) and, by the definition of k_n, ϑ_n ≤ u_n + v_n, that (1/n) E[(S'''_n)²] ≤ ((u_n + v_n)/n) Var(Λ_1(x)) + 1/n, and by the definition of u_n and v_n we achieve the proof of (ii) of Equation (5.8). • Proof of (5.9). We make use of the lemma of Volkonskii and Rozanov (see the appendix in Masry [19]) and the fact that the process (Z_i) is strong mixing. Note that ϒ_a is F_{i_a}^{j_a}-measurable, with i_a = a(u_n + v_n) + 1 and j_a = a(u_n + v_n) + u_n; hence, with V_j = exp(itn^{−1/2}ϒ_j), we obtain a bound which goes to zero by the last part of Equation (5.7). Now we establish Equation (5.10).
• Proof of (5.10). Note that Var(S'_n) → σ_f²(x, y) follows from Equation (5.8), since Var(S_n) → σ_f²(x, y) (by the definition of the Λ_i and Equation (5.3)); all we have to prove is that the double sum of covariances in the last equation tends to zero. Using the same arguments as those previously used for Π_2 in the proof of the first term of Equation (5.8), replacing v_n by u_n, we get (1/n) Σ_{j=0}^{k_n−1} E(ϒ_j²) = (k_n u_n/n) Var(Λ_1) + o(1).
Making use of Assumptions (H5) and (H6), we obtain a bound which goes to zero as n goes to infinity by Equation (5.7). Then, for n large enough, the set {|ϒ_j| > ε(n σ_f²(x, y))^{1/2}} becomes empty; this completes the proof, and therefore that of the asymptotic normality of (n σ_f²(x, y))^{−1/2} S_n. Proof of Lemma 3.6. It is clear that the results of Lemma 3.1 and Lemma 3.2 permit us to control the bias. Moreover, the asymptotic variance of F̂_D^x − F̂_N^x given in Remark 3.5 allows us to obtain the normalization n h_H φ_x(h_K). By combining this result with the negligibility of the remainder term, we obtain the claimed result.
Proof of Lemma 3.7. The proof is based on the decomposition (3.1). Lemma 3.7 is therefore a consequence of a special case of Lemma 3.1 together with Lemma 3.4 (it suffices to replace f̂_N^x(y) and f^x(y) by their derivative counterparts), Remark 3.4 and Lemma 3.6.
ACKNOWLEDGEMENTS. The authors thank the Editor-in-Chief, the Associate Editor and the anonymous referees for their careful reading of the paper.