Divergence Measures Estimation and Its Asymptotic Normality Theory Using Wavelets Empirical Processes I

Amadou Diadié Ba; Gane Samb LO; Diam Ba

doi:10.2991/jsta.2018.17.1.12

Volume 17, Issue 1, March 2018, Pages 158 - 171

Authors

Amadou Diadié Ba, amadou-diadie.ba@edu.ugb.en

LERSTAD, Gaston Berger University, Saint-Louis, SENEGAL

Gane Samb LO, gane-samb.lo@ugb.edu.sn, gslo@aust.edu.ng

LERSTAD, Gaston Berger University, Saint-Louis, SENEGAL*, Associate Researcher, LASTA, Pierre et Marie University, Paris, FRANCE, Associate Professor, African University of Sciences and Technology, Abuja, NIGERIA

Diam Ba, diam.ba@edu.ugb.en

LERSTAD, Gaston Berger University, Saint-Louis, SENEGAL

* 1178, Evanston Drive, NW, Calgary, Canada, T3P 0J9

Received 8 May 2017, Accepted 21 December 2017, Available Online 31 March 2018.

Keywords: Divergence measures estimation; Asymptotic normality; Wavelet theory; wavelets empirical processes; Besov spaces
Abstract: We deal with the asymptotic normality theory of empirical divergence measures based on wavelets in a series of three papers. In this first paper, we provide the asymptotic theory of general ϕ-divergence measures, which include the most common divergence measures: the Renyi and Tsallis families and the Kullback-Leibler measure. Instead of using Parzen nonparametric estimators of the probability density functions whose discrepancy is estimated, we use the wavelets approach and the geometry of Besov spaces. One-sided and two-sided statistical tests are derived. This paper is devoted to the foundations of the general asymptotic theory and to the exposition of the main theoretical tools concerning the ϕ-forms, while proofs and further detailed and applied results will be given in the two subsequent papers, which deal with important key divergence measures and symmetrized estimators.
Copyright: Copyright © 2018, the Authors. Published by Atlantis Press.
Open Access: This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

1. Introduction

1.1. General Introduction

In this paper, we deal with the estimation of divergence measures, essentially by means of wavelet density estimation. Let 𝒫 be a class of probability measures on ℝ^d, d ≥ 1. A divergence measure on 𝒫 is a function

$$\mathcal{D}:\ \mathcal{P}^2\to\overline{\mathbb{R}},\qquad(\mathbb{Q},\mathbb{L})\mapsto\mathcal{D}(\mathbb{Q},\mathbb{L})\tag{1.1}$$

such that 𝒟(ℚ, ℚ) = 0 for any ℚ such that (ℚ, ℚ) lies in the domain of application of 𝒟.

The function 𝒟 is not necessarily an application, that is, it need not be defined on the whole of 𝒫². Even when it is, it is not always symmetric, nor does it have to be a metric. A great number of divergence measures are based on probability density functions (pdf). So let us suppose that any ℚ ∈ 𝒫 admits a pdf f_ℚ with respect to a σ-finite measure ν on (ℝ^d, ℬ(ℝ^d)), which is usually the Lebesgue measure λ_d (with λ₁ = λ) or a counting measure on ℝ^d.

We may present the following divergence measures.

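(1)
The $L_2^2$-divergence measure:
$$\mathcal{D}_{L_2}(\mathbb{Q},\mathbb{L})=\int_{\mathbb{R}^d}\big(f_{\mathbb{Q}}(x)-f_{\mathbb{L}}(x)\big)^2\,d\nu(x).\tag{1.2}$$
(2)
The family of Renyi divergence measures indexed by α ≠ 1, α > 0, known under the name of Renyi-α:
$$\mathcal{D}_{R,\alpha}(\mathbb{Q},\mathbb{L})=\frac{1}{\alpha-1}\log\left(\int_{\mathbb{R}^d}f_{\mathbb{Q}}^{\alpha}(x)\,f_{\mathbb{L}}^{1-\alpha}(x)\,d\nu(x)\right).\tag{1.3}$$
(3)
The family of Tsallis divergence measures indexed by α ≠ 1, α > 0, also known under the name of Tsallis-α:
$$\mathcal{D}_{T,\alpha}(\mathbb{Q},\mathbb{L})=\frac{1}{\alpha-1}\left(\int_{\mathbb{R}^d}f_{\mathbb{Q}}^{\alpha}(x)\,f_{\mathbb{L}}^{1-\alpha}(x)\,d\nu(x)-1\right).\tag{1.4}$$
(4)
The Kullback-Leibler divergence measure:
$$\mathcal{D}_{KL}(\mathbb{Q},\mathbb{L})=\int_{\mathbb{R}^d}f_{\mathbb{Q}}(x)\log\big(f_{\mathbb{Q}}(x)/f_{\mathbb{L}}(x)\big)\,d\nu(x).\tag{1.5}$$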
The latter, the Kullback-Leibler measure, may be interpreted as a limit case of both the Renyi family and the Tsallis family by letting α → 1. As well, for α near 1, the Tsallis family may be seen as derived from $\mathcal{D}_{R,\alpha}(\mathbb{Q},\mathbb{L})$ through the first-order expansion of the logarithm in the neighborhood of unity, $\log u\approx u-1$.
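To make the four definitions concrete, here is a minimal sketch in Python for the discrete case, where ν is the counting measure on a finite support and both probability mass functions are positive, so no integrability issue arises. The function names and the two example vectors are illustrative assumptions, not part of the paper. The loop over α shows numerically that the Renyi-α and Tsallis-α values approach the Kullback-Leibler value as α → 1.

```python
import math


def _power_sum(p, q, alpha):
    """Integral of f_Q^alpha * f_L^(1 - alpha) with respect to the counting measure."""
    return sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))


def l2_squared(p, q):
    """L_2^2 divergence, formula (1.2)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def renyi(p, q, alpha):
    """Renyi-alpha divergence, formula (1.3); requires alpha > 0 and alpha != 1."""
    return math.log(_power_sum(p, q, alpha)) / (alpha - 1.0)


def tsallis(p, q, alpha):
    """Tsallis-alpha divergence, formula (1.4); requires alpha > 0 and alpha != 1."""
    return (_power_sum(p, q, alpha) - 1.0) / (alpha - 1.0)


def kullback_leibler(p, q):
    """Kullback-Leibler divergence, formula (1.5)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Two illustrative probability mass functions on a three-point support.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

for alpha in (0.9, 0.99, 0.999):
    print(alpha, renyi(p, q, alpha), tsallis(p, q, alpha))
print("KL:", kullback_leibler(p, q))
print("KL(p, q) vs KL(q, p):", kullback_leibler(p, q), kullback_leibler(q, p))
```

The last line illustrates that 𝒟_KL is not symmetric in its two arguments.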

From this small sample of divergence measures, we may give the following remarks.

(a)
The $L_2^2$-divergence measure is an application on 𝒫², and its square root is a metric, where 𝒫 is the class of probability measures on ℝ^d such that
$$\int_{\mathbb{R}^d}f_{\mathbb{Q}}^2(x)\,d\nu(x)<+\infty.$$
(b)
For the Renyi and the Tsallis families, by contrast, we may face integrability problems and a lack of symmetry. From this short tour, we have to be cautious when speaking about divergence measures as applications and/or metrics. In the most general case, we have to consider the divergence measure between two specific probability measures as a number or a real parameter.

Originally, divergence measures came as extensions and developments of information theory, which was first set up for discrete probability measures. In such a situation, the boundedness of these discrete probability measures away from zero and from +∞ was guaranteed. That is, the following assumption holds:

Boundedness Assumption (BD). There exist two finite numbers $0<\kappa_1<\kappa_2<+\infty$ such that
$$\kappa_1\le f_{\mathbb{Q}},\ f_{\mathbb{L}}\le\kappa_2.\tag{1.6}$$
If Assumption (1.6) holds, we do not have to worry about integrability problems in the computations arising in the estimation theories, especially for the Tsallis, Renyi and Kullback-Leibler measures. This explains why Assumption (1.6) is systematically used in a great number of works on this topic, for example [Singh and Poczos (2014)], [Krishnamurthy et al. (2014)] and [Hall (1987)], to cite a few. Instead of Assumption (1.6), however, we use the following:
Modified Boundedness Condition. There exist $0<\kappa_1<\kappa_2<+\infty$ and a compact domain D, as large as possible, such that
$$\kappa_1\le f_{\mathbb{Q}}\mathbf{1}_D,\ f_{\mathbb{L}}\mathbf{1}_D\le\kappa_2\quad\text{on }D.\tag{1.7}$$
This means that the modified divergence measure, denoted by $\mathcal{D}^{(m)}$, is applied to the modified pdf's
$$f_{\mathbb{Q}}^{(m)}=D_1^{-1}f_{\mathbb{Q}}\mathbf{1}_D\quad\text{and}\quad f_{\mathbb{L}}^{(m)}=D_2^{-1}f_{\mathbb{L}}\mathbf{1}_D,$$
where D₁ and D₂ are the integrals of f_ℚ and f_𝕃 over D, respectively. Based on this technique, which we apply in case of integrability problems, we will suppose, when appropriate, that Assumption (1.6) holds on a compact set D.
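The truncation-and-renormalization step can be sketched as follows, assuming a one-dimensional density and a compact interval D = [a, b]. The helper names, the standard normal example and the choice D = [−2, 2] are hypothetical; D₁ is computed with a composite Simpson rule, which is only one convenient quadrature choice.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density, used here only as an example of f_Q."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def simpson(func, a, b, n=2000):
    """Composite Simpson rule for the integral of func over [a, b]; n must be even."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0


def modified_pdf(pdf, a, b):
    """Return (f_m, D_1) with f_m = f 1_D / D_1 on D = [a, b] and D_1 the integral of f over D."""
    mass = simpson(pdf, a, b)

    def f_m(x):
        return pdf(x) / mass if a <= x <= b else 0.0

    return f_m, mass


f_m, d1 = modified_pdf(normal_pdf, -2.0, 2.0)
# For this unimodal, symmetric example the bounds of (1.7) are attained at x = 2 and x = 0.
kappa_1, kappa_2 = f_m(2.0), f_m(0.0)
print(d1, simpson(f_m, -2.0, 2.0), kappa_1, kappa_2)
```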

Although we focus on the aforementioned divergence measures in this paper, it is worth mentioning that many others exist. Let us cite, for example, those named after Ali-Silvey or f-divergence [Topsoe (2000)], Cauchy-Schwarz, Jeffreys divergence (see [Evren (2012)]), Chernoff (see [Evren (2012)]) and Jensen-Shannon (see [Evren (2012)]). According to [Cichocki and Amari (2010)], there are more than a dozen different divergence measures in the literature. In a longer version of this paper (see [Ba et al. (2017)]), some important applications of them are highlighted with their references. The reader interested in this important review topic is referred to that paper.

In the next subsection, we describe the framework in which we place the estimation problems dealt with in this paper.

1.2. Statistical Estimation

The divergence measures may be applied to two statistical problems among others.

(A)
First, divergence measures may be used for a goodness-of-fit problem, as described here. Let X₁, X₂, ... be a sample from X with an unknown probability distribution ℙ_X, and suppose we want to test the hypothesis that ℙ_X is equal to a known and fixed probability ℙ₀. Theoretically, we can answer this question by estimating the divergence measure 𝒟(ℙ_X, ℙ₀) by a plug-in estimator $\mathcal{D}(\mathbb{P}_X^{(n)},\mathbb{P}_0)$ where, for each n ≥ 1, ℙ_X is replaced by an estimator $\mathbb{P}_X^{(n)}$ of the probability law, based on the sample X₁, X₂, ..., X_n, to be specified.

From there, an asymptotic theory of $\Delta_n=\mathcal{D}(\mathbb{P}_X^{(n)},\mathbb{P}_0)-\mathcal{D}(\mathbb{P}_X,\mathbb{P}_0)$ is needed to conclude.
(B)
Next, divergence measures may be used as a tool for comparing two distributions. We may have two samples and wonder whether they come from the same probability measure. Here, we may also have two different cases.
(B1)
In the first, we have two independent samples X₁, X₂, ... and Y₁, Y₂, ..., respectively from random variables X and Y. Here the estimated divergence $\mathcal{D}(\mathbb{P}_X^{(n)},\mathbb{P}_Y^{(m)})$, where n and m are the sizes of the available samples, is the natural estimator of 𝒟(ℙ_X, ℙ_Y), on which the statistical test of the hypothesis ℙ_X = ℙ_Y depends.
(B2)
But the data may also be paired, (X, Y), (X₁, Y₁), (X₂, Y₂), ..., that is, X_i and Y_i are measurements on the same case i = 1, 2, .... In such a situation, testing the equality of the margins ℙ_X = ℙ_Y should be based on an estimator $\mathbb{P}_{(X,Y)}^{(n)}$ of the joint probability law of the couple (X, Y), built from the paired observations (X_i, Y_i), i = 1, 2, ..., n.

We did not encounter the approach (B2) in the literature. In the (B1) approach, almost all the papers used the same sample size, with the exception of [Poczos and Jeff (2011)] for the double-size estimation problem. In our view, a study should rely on all the available data, since forcing the same sample size may lead to a loss of information: to apply such methods, one has to take the minimum of the two sizes and thus lose information. We suggest returning to the general case and studying the asymptotic theory of $\mathcal{D}(\mathbb{P}_X^{(n)},\mathbb{P}_Y^{(m)})$ based on samples X₁, X₂, ..., X_n and Y₁, Y₂, ..., Y_m. In this paper, we will systematically use arbitrary sample sizes.

In the context of situation (B1), there are several papers dealing with the estimation of divergence measures. As we are concerned in this paper with the weak laws of the estimators, our review of that problem returned only a few results. Instead, the literature presented us with many kinds of results on the almost-sure efficiency of the estimation, with rates of convergence and laws of the iterated logarithm, L^p (p = 1, 2) convergence, etc. To be precise, [Dhaker et al. (2016)] used recent techniques based on functional empirical processes to provide a series of interesting rates of convergence of the estimators in the case of the one-sided approach for the class of Renyi, Tsallis and Kullback-Leibler measures, to cite a few. Unfortunately, the authors did not address the problem of integrability, taking for granted that the divergence measures are finite. Although the results should be correct under the boundedness assumption (BD) described earlier, a new formulation in that frame would be welcome.

The paper of [Krishnamurthy et al. (2015)] does exactly what we want to do, except that it concentrates on the L²-divergence measure and uses the Parzen approach. Instead, we will handle the most general case of ϕ-divergence measures and will use wavelet probability density estimators.

In the context of situation (B1), we may first cite the works of [Krishnamurthy et al. (2014)] and [Singh and Poczos (2014)]. They both used divergence measures based on probability density functions and concentrated on Renyi-α, Tsallis-α and Kullback-Leibler. In the description of the results below, the estimated pdf's, f and g, usually belong to a periodic Hölder class of known smoothness s.

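Specifically, [Krishnamurthy et al. (2014)] defined Renyi and Tsallis estimators by correcting the plug-in estimator and established that, as long as $\mathcal{D}_{R,\alpha}(f,g)\ge c$ and $\mathcal{D}_{T,\alpha}(f,g)\ge c$ for some constant c > 0,

$$\mathbb{E}\left|\mathcal{D}_{R,\alpha}(f_n,g_n)-\mathcal{D}_{R,\alpha}(f,g)\right|\le c\left(n^{-1/2}+n^{-\frac{3s}{2s+d}}\right)\quad\text{and}\quad\mathbb{E}\left|\mathcal{D}_{T,\alpha}(f_n,g_n)-\mathcal{D}_{T,\alpha}(f,g)\right|\le c\left(n^{-1/2}+n^{-\frac{3s}{2s+d}}\right).$$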

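[Poczos and Jeff (2011)] used a k-nearest-neighbor approach to prove that if |α − 1| < k (α ≠ 1), then

$$\lim_{n,m\to\infty}\mathbb{E}\left[\mathcal{D}_{T,\alpha}(f_n,g_m)-\mathcal{D}_{T,\alpha}(f,g)\right]^2=0\quad\text{and}\quad\lim_{n,m\to\infty}\mathbb{E}\left(\mathcal{D}_{R,\alpha}(f_n,g_m)\right)=\mathcal{D}_{R,\alpha}(f,g).$$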

There has been recent interest in deriving convergence rates for divergence estimators ([Moon and Hero (2014)], [Krishnamurthy et al. (2014)]). The rates are typically expressed in terms of the smoothness s of the densities:

The estimator of [Liu et al. (2012)] converges at rate $n^{-\frac{s}{s+d}}$, achieving the parametric rate when s > d.

Similarly, [Sricharan et al. (2012)] showed that, when s > d, a k-nearest-neighbor style estimator achieves the rate $n^{-2/d}$ (in absolute error), ignoring logarithmic factors. In a follow-up work, the authors improved this result to $O(n^{-1/2})$ by using a set of weak estimators, but they required s > d orders of smoothness. One can also see [Singh and Poczos (2014)] and [Kallberg and Seleznjev (2012)] for other contributions.

The majority of the aforementioned articles worked with densities in Hölder classes, whereas our work applies to densities in Besov classes.

Here, we focus on divergence measures between probability laws that are absolutely continuous with respect to the Lebesgue measure. As well, our results apply to the approaches (A) and (B1) defined above. As a consequence, we estimate divergence measures by their plug-in counterparts, meaning that we replace the probability density functions (pdf) in the expression of the divergence measure by nonparametric estimators of the pdf's. From now on, we have, on our probability space, two independent sequences:

(-)
a sequence of independent and identically distributed random variables with common pdf $f_{\mathbb{P}_X}$:
$$X_1,X_2,\ldots\tag{1.8}$$
(-)
a sequence of independent and identically distributed random variables with common pdf $f_{\mathbb{P}_Y}$:
$$Y_1,Y_2,\ldots\tag{1.9}$$
To simplify the notation, we write
$$f=f_{\mathbb{P}_X}\quad\text{and}\quad g=f_{\mathbb{P}_Y}.$$
We focus on pdf estimates provided by the wavelets approach; the Parzen approach will be dealt with in a forthcoming study. So, we need to explain the frame in which we are going to express our results.

First, we wish to obtain general laws for an arbitrary functional of the form

$$J(f,g)=\int_D\phi\big(f(x),g(x)\big)\,dx,\tag{1.10}$$

where ϕ(x, y) is a measurable function of $(x,y)\in\mathbb{R}_+^2$ on which we will impose appropriate conditions. The results on the functional J(f, g), also known under the name of ϕ-divergence, will lead to those on the particular cases of the Renyi, Tsallis and Kullback-Leibler measures.
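As an illustration of (1.10), the sketch below evaluates J(f, g) by numerical quadrature for two choices of ϕ: ϕ(s, t) = s log(s/t), which gives the Kullback-Leibler measure, and ϕ(s, t) = (s − t)², which gives the L₂²-divergence. The Gaussian densities N(0, 1) and N(1, 1), the domain D = [−10, 11] and all function names are illustrative assumptions; D is taken wide enough that the values should match the untruncated closed forms 1/2 and (1 − e^{−1/4})/√π.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0


def phi_divergence(phi, f, g, a, b):
    """J(f, g) = integral over D = [a, b] of phi(f(x), g(x)) dx, as in (1.10)."""
    return simpson(lambda x: phi(f(x), g(x)), a, b)


def phi_kl(s, t):
    """phi for the Kullback-Leibler measure."""
    return s * math.log(s / t)


def phi_l2(s, t):
    """phi for the L_2^2 divergence."""
    return (s - t) ** 2


def f(x):
    return normal_pdf(x, 0.0)


def g(x):
    return normal_pdf(x, 1.0)


j_kl = phi_divergence(phi_kl, f, g, -10.0, 11.0)
j_l2 = phi_divergence(phi_l2, f, g, -10.0, 11.0)
print(j_kl, j_l2)
```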

Our results will be presented in a series of three papers. This paper is devoted to the foundations of the general asymptotic theory and to the exposition of the main theoretical tools concerning the ϕ-forms. The second paper will deal with important key divergence measures and symmetrized estimators. Finally, a third paper will focus on the proofs.

1.3. Wavelets estimation of pdf’s

To begin with the wavelets theory and its statistical applications, we recall that the wavelets setting involves two functions φ and ψ in L₂(ℝ), respectively called the father and mother wavelets, such that

$$\left\{\varphi(\cdot-k),\ 2^{j/2}\psi(2^j(\cdot)-k),\ j\ge0,\ k\in\mathbb{Z}\right\}$$

is an orthonormal basis of L₂(ℝ). We adopt the following notation, for j ≥ 0, k ∈ ℤ:

$$\varphi_{j,k}=2^{j/2}\varphi(2^j(\cdot)-k)\quad\text{and}\quad\psi_{j,k}=2^{j/2}\psi(2^j(\cdot)-k).$$

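Thus, any function f in L₂(ℝ) is characterized by its coordinates in this orthonormal basis, in the form

$$f=\sum_{k\in\mathbb{Z}}\alpha_{0,k}\varphi_{0,k}+\sum_{k\in\mathbb{Z}}\sum_{j\ge0}\beta_{j,k}\psi_{j,k}\tag{1.11}$$

with, for j ≥ 0, k ∈ ℤ,

$$\alpha_{0,k}=\int_{\mathbb{R}}f(t)\varphi_{0,k}(t)\,dt\quad\text{and}\quad\beta_{j,k}=\int_{\mathbb{R}}f(t)\psi_{j,k}(t)\,dt.$$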

For an easy introduction to the wavelets theory and its applications to statistics, see for instance [Hardle et al. (1998)], [Daubechies (1992)] and [Blatter (1998)]. In this paper we only mention the essential elements of this frame.

Based on the orthonormal basis defined above, the following kernel function is introduced:

$$K(x,y)=\sum_{k\in\mathbb{Z}}\varphi(x-k)\varphi(y-k),\qquad(x,y)\in\mathbb{R}^2.$$

For any fixed j ≥ 1, called a resolution level, we define

$$K_j(x,y)=2^jK(2^jx,2^jy)$$

and, for a measurable function h, we define the projection operator K_j of h onto the subspace V_j of L₂(ℝ) (spanned by the functions $2^{j/2}\varphi(2^j(\cdot)-k)$, k ∈ ℤ) by

$$\mathbb{R}\ni x\mapsto K_j(h)(x)=\int K_j(x,y)h(y)\,dy.$$

Therefore we can write, for all x ∈ ℝ,

$$K_j(h)(x)=2^j\int K(2^jx,2^jy)h(y)\,dy=2^j\int\sum_k\varphi(2^jx-k)\varphi(2^jy-k)h(y)\,dy.\tag{1.12}$$
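For intuition about (1.12), consider the Haar father wavelet φ = 1_[0,1), chosen here only because its kernel is explicit; it is not smooth, so it meets Assumption 1 only in the weakest case T = 0. Then K(x, y) equals 1 when ⌊x⌋ = ⌊y⌋ and 0 otherwise, and K_j(h)(x) is the average of h over the dyadic interval of length 2^{−j} containing x. A minimal sketch, with hypothetical function names and a midpoint rule for the inner integral:

```python
import math


def haar_kernel_j(x, y, j):
    """K_j(x, y) = 2^j K(2^j x, 2^j y) for the Haar father wavelet phi = 1_[0,1)."""
    scale = 2.0 ** j
    return scale if math.floor(scale * x) == math.floor(scale * y) else 0.0


def haar_projection(h, x, j, n_nodes=200):
    """K_j(h)(x): 2^j times the integral of h over the dyadic bin of length 2^-j containing x."""
    scale = 2.0 ** j
    left = math.floor(scale * x) / scale
    step = 1.0 / (scale * n_nodes)
    integral = sum(h(left + (i + 0.5) * step) for i in range(n_nodes)) * step
    return scale * integral


print(haar_kernel_j(0.3, 0.4, 2), haar_kernel_j(0.3, 0.6, 2))
print(haar_projection(lambda y: y, 0.3, 2))  # midpoint of the bin [0.25, 0.5)
for j in (2, 5, 10):
    # The projection error shrinks as the resolution level j grows.
    print(j, abs(haar_projection(math.sin, 0.3, j) - math.sin(0.3)))
```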

In the frame of this wavelets theory, for each n ≥ 1, we fix the resolution level depending on n and denoted by j = j_n, and we use the following estimator of the pdf f associated to X, based on the sample of size n from X, as defined in (1.8),

$$f_n(x)=\frac1n\sum_{i=1}^nK_{j_n}(x,X_i).\tag{1.13}$$

As well, in a two-sample problem, we will estimate the pdf g associated to Y, based on a sample of size n from Y, as defined in (1.9), by

$$g_n(x)=\frac1n\sum_{i=1}^nK_{j_n}(x,Y_i).\tag{1.14}$$

These estimators are known under the name of linear wavelet estimators.
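With the Haar father wavelet φ = 1_[0,1) (an illustrative choice, not one the paper commits to), the linear wavelet estimator (1.13) reduces to a histogram on the dyadic bins of length 2^{−j_n}. The sketch below assumes data in [0, 1), a Beta(2, 2) sample and a fixed level j_n = 3; these values and the function names are hypothetical. Bin counts are stored once, so each evaluation of f_n costs O(1).

```python
import math
import random


def haar_wavelet_estimator(sample, j):
    """f_n(x) = (1/n) sum_i K_j(x, X_i) for the Haar kernel: 2^j * (count in x's dyadic bin) / n."""
    scale = 2 ** j
    n = len(sample)
    counts = {}
    for xi in sample:
        k = math.floor(scale * xi)
        counts[k] = counts.get(k, 0) + 1

    def f_n(x):
        return scale * counts.get(math.floor(scale * x), 0) / n

    return f_n


random.seed(0)
sample = [random.betavariate(2, 2) for _ in range(5000)]
j_n = 3
f_n = haar_wavelet_estimator(sample, j_n)

# The estimate integrates to one: sum over the 2^j bins of (value * bin width).
mass = sum(f_n((k + 0.5) / 2 ** j_n) for k in range(2 ** j_n)) / 2 ** j_n
print(mass, f_n(0.55))  # the Beta(2, 2) density averaged over [0.5, 0.625) is 1.46875
```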

Before we give the main assumptions on the wavelets we are working with, we have to define the concept of weak differentiation. Denote by 𝒟(ℝ) the class of infinitely differentiable functions from ℝ to ℝ with compact support. A function f : ℝ → ℝ is weakly differentiable if and only if there exists a function g : ℝ → ℝ, locally integrable (on compact sets), such that, for any φ ∈ 𝒟(ℝ), we have

$$\int f(u)\varphi'(u)\,du=-\int g(u)\varphi(u)\,du.$$

In such a case, g is called the weak derivative of f and is denoted by f^[1]. If the first weak derivative itself has a weak derivative, and so forth up to the (p − 1)-th derivative, we get the p-th weak derivative f^[p]. We may now state the four assumptions we require on the wavelets.
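A standard example may help fix the definition (it is not taken from the paper): f(u) = |u| is not differentiable at 0, yet it is weakly differentiable with f^[1](u) = sign(u). Integrating by parts on each half-line, for any φ ∈ 𝒟(ℝ),

$$\begin{aligned}\int_{\mathbb{R}}|u|\,\varphi'(u)\,du&=\int_0^{\infty}u\,\varphi'(u)\,du-\int_{-\infty}^{0}u\,\varphi'(u)\,du\\&=-\int_0^{\infty}\varphi(u)\,du+\int_{-\infty}^{0}\varphi(u)\,du=-\int_{\mathbb{R}}\operatorname{sign}(u)\,\varphi(u)\,du,\end{aligned}$$

where the boundary terms vanish because φ has compact support and uφ(u) = 0 at u = 0.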

Assumption 1. The wavelets φ and ψ are bounded and have compact support, and either (i) the father wavelet φ has weak derivatives up to order T in L_p(ℝ) (1 ≤ p ≤ ∞), or (ii) the mother wavelet ψ associated to φ satisfies $\int x^m\psi(x)\,dx=0$ for all m = 0, ..., T.

Assumption 2. φ : ℝ → ℝ is of bounded p-variation for some 1 ≤ p < ∞ and vanishes on $(B_1,B_2]^c$ for some −∞ < B₁ < B₂ < ∞.

Wavelet generators with compact supports are available in the literature. We may cite those named after Daubechies, Coiflets and Symmlets (see [Hardle et al. (1998)]). The cited generators fulfill these two assumptions.

Under Assumption 2, the summation over k in (1.12) is finite, since only finitely many terms of the sum are non-zero (see [Giné and Nickl (2009)]).
Assumption 3. There exists a non-negative, symmetric and continuous function Φ(t) of t ∈ ℝ, with compact support 𝒦, such that
$$\forall(x,y)\in\mathbb{R}^2,\quad|K(x,y)|\le\Phi(x-y).$$
The fourth assumption concerns the choice of the resolution level. We fix once and for all an increasing sequence $(j_n)_{n\ge1}$ such that
Assumption 4. $\lim_{n\to+\infty}n^{-1/4}2^{j_n}=1$.

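Consequently, as n → ∞,
$$\frac{j_n2^{j_n}}{n}+2^{-tj_n}\approx\frac{\log n}{4\log2\;n^{3/4}}+n^{-t/4}\to0\ \ \forall t>0,\qquad\frac{j_n}{\log\log n}\to\infty\quad\text{and}\quad\sup_{n\ge n_0}(j_{2n}-j_n)=\frac14.\tag{1.15}$$
These conditions allow us to use the results of [Giné and Nickl (2009)].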

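We also denote

$$\begin{aligned}&a_n=\|f_n-f\|_\infty,\quad b_n=\|g_n-g\|_\infty,\quad c_n=a_n\vee b_n,\quad n\ge1,\\&c_{n,m}=a_n\vee b_m,\quad c^*_{n,m}=c_{n,m}\vee c_{m,n},\quad n\ge1,\ m\ge1,\end{aligned}\tag{1.16}$$

where $\|h\|_\infty$ stands for $\sup_{x\in D(h)}|h(x)|$, and D(h) is the domain of definition of h.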

In the sequel, we suppose that the densities f and g belong to the Besov space $\mathcal{B}^t_{\infty,\infty}(\mathbb{R})$. We will briefly discuss simple conditions under which our pdf's belong to such spaces.

Suppose that the densities f and g belong to $\mathcal{B}^t_{\infty,\infty}(\mathbb{R})$, that φ satisfies Assumption 2, and that φ and ψ satisfy Assumption 1. Then Theorem 3 of [Giné and Nickl (2009)] implies that the uniform errors a_n, b_n and c_n are of order

$$O\left(\frac{\log n}{4\log2\;n^{3/4}}+n^{-t/4}\right)$$

almost surely, so that they all converge to zero at this rate (with 0 < t < T).

In order to establish the asymptotic normality of the divergence estimators, we need the following key tool concerning the wavelets empirical process, denoted by $\mathbb{G}_{n,X}^w(h)$ for $h\in\mathcal{B}^t_{\infty,\infty}(\mathbb{R})$ and defined by

$$\mathbb{G}_{n,X}^w(h)=\sqrt{n}\,(\mathbb{P}_{n,X}^w-\mathbb{E}_X)(h),$$

where

$$\mathbb{P}_{n,X}^w(h)=\mathbb{P}_{n,X}(K_{j_n}(h))=\frac1n\sum_{i=1}^nK_{j_n}(h)(X_i)$$

and $\mathbb{E}_X(h)=\int h(x)f(x)\,dx$ denotes the expectation of the measurable function h with respect to the probability distribution ℙ_X. The superscript w refers to wavelets. We have

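$$\mathbb{G}_{n,X}^w(h)=\sqrt{n}\int\big(f_n(x)-f(x)\big)h(x)\,dx\tag{1.17}$$

since, by Fubini's theorem and the symmetry $K_{j_n}(x,y)=K_{j_n}(y,x)$,

$$\begin{aligned}\sqrt{n}\,(\mathbb{P}_{n,X}^w-\mathbb{E}_X)(h)&=\sqrt{n}\left(\frac1n\sum_{i=1}^nK_{j_n}(h)(X_i)-\int f(x)h(x)\,dx\right)\\&=\sqrt{n}\left(\frac1n\int\sum_{i=1}^nK_{j_n}(x,X_i)h(x)\,dx-\int f(x)h(x)\,dx\right)\\&=\sqrt{n}\int\left(\frac1n\sum_{i=1}^nK_{j_n}(x,X_i)-f(x)\right)h(x)\,dx\\&=\sqrt{n}\int\big(f_n(x)-f(x)\big)h(x)\,dx.\end{aligned}$$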
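The Fubini step can be checked numerically in the Haar case, φ = 1_[0,1), which is an illustrative choice while the identity itself holds for general kernels. The sketch computes the empirical mean (1/n)Σ K_j(h)(X_i) of the projected test function, then ∫ f_n(x) h(x) dx bin by bin, and finally the value of 𝔾^w_{n,X}(h) for X uniform on [0, 1) and h(x) = x, for which 𝔼_X(h) = 1/2. All function names, the sample size and the level j = 4 are hypothetical.

```python
import math
import random


def bin_integral(h, k, j, n_nodes=100):
    """Midpoint-rule approximation of the integral of h over [k 2^-j, (k + 1) 2^-j)."""
    width = 2.0 ** -j
    step = width / n_nodes
    return sum(h(k * width + (i + 0.5) * step) for i in range(n_nodes)) * step


def mean_of_projection(sample, h, j):
    """(1/n) sum_i K_j(h)(X_i), i.e. P^w_{n,X}(h) for the Haar kernel."""
    return sum(2 ** j * bin_integral(h, math.floor(2 ** j * xi), j) for xi in sample) / len(sample)


def integral_fn_times_h(sample, h, j):
    """Integral of f_n(x) h(x) dx, using that f_n equals 2^j * count_k / n on dyadic bin k."""
    n = len(sample)
    counts = {}
    for xi in sample:
        k = math.floor(2 ** j * xi)
        counts[k] = counts.get(k, 0) + 1
    return sum(2 ** j * c / n * bin_integral(h, k, j) for k, c in counts.items())


def identity(x):
    return x


random.seed(42)
sample = [random.random() for _ in range(2000)]
j = 4
lhs = mean_of_projection(sample, identity, j)
rhs = integral_fn_times_h(sample, identity, j)
g_n = math.sqrt(len(sample)) * (lhs - 0.5)  # G^w_{n,X}(h) with E_X(h) = 1/2
print(lhs, rhs, g_n)
```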

We are ready to give our results on the functional J introduced in Formula (1.10).

2. Results

2.1. Main Results

Here, we present a general asymptotic theory of a class of divergence measures estimators including the Renyi and Tsallis families and the Kullback-Leibler ones.

We gather them in the ϕ-divergence form, which gives a general frame from which we will derive a number of corollaries. Assumption (1.6) will be used in the particular cases to ensure that the divergence measure is finite, as mentioned at the beginning of the article. In the general results, however, Assumption (1.6) is part of the general conditions.

We begin by stating a result related to the wavelets empirical process, which serves as a general tool for establishing asymptotic normality and which we will use to establish the asymptotic normality of the divergence measure estimators.

Theorem 2.1.

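Let $(X_n)_{n\ge1}$ be defined as in (1.8), with $f\in\mathcal{B}^t_{\infty,\infty}(\mathbb{R})$, let f_n be defined as in (1.13) and $\mathbb{G}_{n,X}^w$ as in (1.17). Then, under Assumptions 1–3, for any bounded h defined on D and belonging to $\mathcal{B}^t_{\infty,\infty}(\mathbb{R})$, we have

$$\sigma_{h,n}^{-1}\,\mathbb{G}_{n,X}^w(h)\rightsquigarrow\mathcal{N}(0,1)\quad\text{as }n\to\infty,$$

where

$$\sigma_{h,n}^2=\mathbb{E}_X\big[(K_{j_n}(h)(X))^2\big]-\big(\mathbb{E}_X[K_{j_n}(h)(X)]\big)^2\to\mathbb{V}ar(h(X))\quad\text{as }n\to\infty.$$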
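A simple case where σ²_{h,n} can be computed exactly, under illustrative assumptions that are not in the paper: X uniform on [0, 1), h(x) = x and the Haar kernel (φ = 1_[0,1)) at level j. Then K_j(h)(X) is the midpoint of the dyadic bin containing X, which is uniformly distributed over the 2^j midpoints, so σ²_{h,n} = (1 − 4^{−j})/12, which increases to 𝕍ar(h(X)) = 1/12 as j grows. The sketch evaluates the variance directly from the definition; the function name is hypothetical.

```python
def sigma2_haar_uniform(j):
    """sigma_{h,n}^2 = E[(K_j(h)(X))^2] - (E[K_j(h)(X)])^2 for X ~ U[0, 1), h(x) = x, Haar kernel."""
    n_bins = 2 ** j
    # K_j(h)(X) is the midpoint of the dyadic bin containing X; each bin has probability 2^-j.
    midpoints = [(k + 0.5) / n_bins for k in range(n_bins)]
    second_moment = sum(mid * mid for mid in midpoints) / n_bins
    mean = sum(midpoints) / n_bins
    return second_moment - mean ** 2


for j in (1, 2, 4, 8):
    print(j, sigma2_haar_uniform(j), (1 - 4.0 ** -j) / 12, 1 / 12)
```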

Based on that result, which will be proved later, we now state our results on the functional J defined in (1.10), regarding its almost-sure and Gaussian asymptotic behavior. Let us begin with some notation. Assume that ϕ has continuous second-order partial derivatives, denoted as follows:

$$\phi_1^{(1)}(s,t)=\frac{\partial\phi}{\partial s}(s,t),\qquad\phi_2^{(1)}(s,t)=\frac{\partial\phi}{\partial t}(s,t)$$

and

$$\phi_1^{(2)}(s,t)=\frac{\partial^2\phi}{\partial s^2}(s,t),\qquad\phi_2^{(2)}(s,t)=\frac{\partial^2\phi}{\partial t^2}(s,t),\qquad\phi_{1,2}^{(2)}(s,t)=\phi_{2,1}^{(2)}(s,t)=\frac{\partial^2\phi}{\partial s\,\partial t}(s,t).$$

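Define the functions h_i, i = 1, ..., 4:

$$h_1(x)=\phi_1^{(1)}(f(x),g(x)),\qquad h_2(x)=\phi_2^{(1)}(f(x),g(x)),$$

$$h_3(x)=\phi_1^{(1)}(g(x),f(x))\quad\text{and}\quad h_4(x)=\phi_2^{(1)}(g(x),f(x)).$$

Set

$$A_1=\int_D|h_1(x)|\,dx\quad\text{and}\quad A_2=\int_D|h_2(x)|\,dx$$

and

$$A_3=\int_D|h_3(x)|\,dx\quad\text{and}\quad A_4=\int_D|h_4(x)|\,dx.$$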
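For the Kullback-Leibler case ϕ(s, t) = s log(s/t), one has ϕ₁^(1)(s, t) = log(s/t) + 1 and ϕ₂^(1)(s, t) = −s/t. The sketch below computes h₁, h₂, A₁ and A₂ for the illustrative pair f(x) = 1/2 + x and g(x) = 3/2 − x on D = [0, 1] (both densities lie between 1/2 and 3/2, so (BD) holds), checks the partial derivatives by central finite differences, and compares A₂ with its closed form 2 log 3 − 1. The densities and all names are assumptions made for the example.

```python
import math


def phi_kl(s, t):
    return s * math.log(s / t)


def phi1(s, t):
    """Partial derivative of phi_kl with respect to s."""
    return math.log(s / t) + 1.0


def phi2(s, t):
    """Partial derivative of phi_kl with respect to t."""
    return -s / t


def f(x):
    return 0.5 + x


def g(x):
    return 1.5 - x


def h1(x):
    return phi1(f(x), g(x))


def h2(x):
    return phi2(f(x), g(x))


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0


A1 = simpson(lambda x: abs(h1(x)), 0.0, 1.0)
A2 = simpson(lambda x: abs(h2(x)), 0.0, 1.0)
print(A1, A2, 2 * math.log(3) - 1)
```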

We require the following general conditions.

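C-A. All the constants A_i are finite.
C-h. All the functions h_i used in the theorems below are bounded and lie in a Besov space $\mathcal{B}^t_{\infty,\infty}$ for some t > 1/2.
C1-ϕ. The integral
$$\int_D\left\{\left|\phi_1^{(1)}(f(x),g(x))\right|+\left|\phi_2^{(1)}(f(x),g(x))\right|\right\}dx$$
is finite.
C2-ϕ. For any measurable sequences of functions $\delta_n^{(1)}(x)$, $\delta_n^{(2)}(x)$, $\rho_n^{(1)}(x)$ and $\rho_n^{(2)}(x)$ of x ∈ D converging uniformly to zero, that is,
$$\max_{i=1,2,\ j=1,2}\ \sup_{x\in D}\left\{|\delta_n^{(i)}(x)|+|\rho_n^{(j)}(x)|\right\}\to0\quad\text{as }n\to\infty,$$
we have, as n → ∞,
$$\int_D\phi_1^{(2)}\big(f(x)+\delta_n^{(1)}(x),g(x)\big)\,dx\to\int_D\phi_1^{(2)}\big(f(x),g(x)\big)\,dx,\tag{2.1}$$

$$\int_D\phi_2^{(2)}\big(f(x),g(x)+\delta_n^{(2)}(x)\big)\,dx\to\int_D\phi_2^{(2)}\big(f(x),g(x)\big)\,dx,\tag{2.2}$$
and
$$\int_D\phi_{1,2}^{(2)}\big(f(x)+\rho_n^{(1)}(x),g(x)+\rho_n^{(2)}(x)\big)\,dx\to\int_D\phi_{1,2}^{(2)}\big(f(x),g(x)\big)\,dx.\tag{2.3}$$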

Remark 2.1.

(a)
To check C-h, we may use criteria based on properties of Besov spaces related to higher-order differentiability, together with the fact that we work on compact sets, as will be seen in the second part of this paper or in the Appendix of [Ba et al. (2017)]. These techniques show that our results apply to all the usual distributions.
(b)
The conditions in C2-ϕ may be justified by the Dominated Convergence Theorem, the Monotone Convergence Theorem or other limit theorems. We could also express conditions on the general function ϕ under which these results hold true. Here, however, we choose to state the final results and then to check them in particular cases, in which we may use convergence theorems.

Based on (1.13) and (1.14), we will use the following estimators:

$$J(f_n,g)=\int_D\phi(f_n(x),g(x))\,dx,\qquad J(f,g_n)=\int_D\phi(f(x),g_n(x))\,dx\quad\text{and}\quad J(f_n,g_n)=\int_D\phi(f_n(x),g_n(x))\,dx.$$
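A minimal sketch of the plug-in estimators for the Kullback-Leibler choice ϕ(s, t) = s log(s/t), using the illustrative pair f(x) = 1/2 + x and g(x) = 3/2 − x on D = [0, 1] and Haar estimators (φ = 1_[0,1)) at level j = 3. Samples are drawn by inverting the cumulative distribution functions, and unequal sample sizes n ≠ m are used, as advocated in Section 1.2. The sample sizes, the level and all names are assumptions for illustration; the integral is approximated by a midpoint rule whose nodes never fall on dyadic bin boundaries. For this pair, the exact value is J(f, g) ≈ 0.17604.

```python
import math
import random


def haar_estimator(sample, j):
    """Haar linear wavelet estimator (1.13): 2^j * (count in the dyadic bin of x) / n."""
    scale = 2 ** j
    n = len(sample)
    counts = {}
    for xi in sample:
        k = math.floor(scale * xi)
        counts[k] = counts.get(k, 0) + 1
    return lambda x: scale * counts.get(math.floor(scale * x), 0) / n


def functional_j(phi, f, g, a=0.0, b=1.0, nodes=4096):
    """Midpoint-rule approximation of J(f, g) = integral over D = [a, b] of phi(f(x), g(x)) dx."""
    step = (b - a) / nodes
    return step * sum(phi(f(a + (i + 0.5) * step), g(a + (i + 0.5) * step)) for i in range(nodes))


def phi_kl(s, t):
    return s * math.log(s / t)


def f(x):
    return 0.5 + x


def g(x):
    return 1.5 - x


random.seed(1)
n, m, j = 20000, 15000, 3
# Inverse-cdf sampling: F(x) = (x + x^2) / 2 and G(y) = (3y - y^2) / 2 on [0, 1].
xs = [(-1.0 + math.sqrt(1.0 + 8.0 * random.random())) / 2.0 for _ in range(n)]
ys = [1.5 - math.sqrt(2.25 - 2.0 * random.random()) for _ in range(m)]
f_n, g_m = haar_estimator(xs, j), haar_estimator(ys, j)

true_value = functional_j(phi_kl, f, g)
print(true_value,
      functional_j(phi_kl, f_n, g),    # J(f_n, g)
      functional_j(phi_kl, f, g_m),    # J(f, g_m)
      functional_j(phi_kl, f_n, g_m))  # J(f_n, g_m)
```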

Here are our main results.

I - Statements of the main results.

The first concerns the almost sure efficiency of the estimators.

Theorem 2.2.

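Under Assumptions 1–3, C-A, C-h, C1-ϕ, C2-ϕ and (BD), we have

$$\limsup_{n\to+\infty}\frac{|J(f_n,g)-J(f,g)|}{a_n}\le A_1\quad\text{a.s.},\tag{2.4}$$

$$\limsup_{n\to+\infty}\frac{|J(f,g_n)-J(f,g)|}{b_n}\le A_2\quad\text{a.s.},\tag{2.5}$$

$$\limsup_{(n,m)\to(+\infty,+\infty)}\frac{|J(f_n,g_m)-J(f,g)|}{c_{n,m}}\le A_1+A_2\quad\text{a.s.},\tag{2.6}$$

where a_n, b_n and c_{n,m} are as in (1.16).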

The second concerns the asymptotic normality of the estimators.

Theorem 2.3.

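Under Assumptions 1–3, C-A, C-h, C1-ϕ, C2-ϕ and (BD), we have

$$\sqrt{n}\,\big(J(f_n,g)-J(f,g)\big)\rightsquigarrow\mathcal{N}\big(0,\mathbb{V}ar(h_1(X))\big)\quad\text{as }n\to+\infty,\tag{2.7}$$

$$\sqrt{n}\,\big(J(f,g_n)-J(f,g)\big)\rightsquigarrow\mathcal{N}\big(0,\mathbb{V}ar(h_2(Y))\big)\quad\text{as }n\to+\infty,\tag{2.8}$$

and, as n → +∞ and m → +∞,

$$\left(\frac{nm}{m\,\mathbb{V}ar(h_1(X))+n\,\mathbb{V}ar(h_2(Y))}\right)^{1/2}\big(J(f_n,g_m)-J(f,g)\big)\rightsquigarrow\mathcal{N}(0,1).\tag{2.9}$$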
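As an illustration of how (2.9) can be turned into the one-sided and two-sided tests mentioned in the abstract, the sketch below standardizes a two-sample estimate under a hypothesized value J₀ and converts it into normal p-values with math.erfc. It assumes that consistent estimates of 𝕍ar(h₁(X)) and 𝕍ar(h₂(Y)) are available (for instance, empirical variances of estimated h₁ and h₂ at the data points) and that these variances are positive at J₀; for the Kullback-Leibler choice, h₁ and h₂ are constant when f = g, so the limit in (2.9) degenerates at that null. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import math


def standardized_statistic(j_hat, j_null, var_h1, var_h2, n, m):
    """(n m / (m Var(h1(X)) + n Var(h2(Y))))^{1/2} (J(f_n, g_m) - J_0), as in (2.9)."""
    scale = math.sqrt(n * m / (m * var_h1 + n * var_h2))
    return scale * (j_hat - j_null)


def normal_p_values(z):
    """Return (two-sided p-value, upper one-sided p-value for H1: J(f, g) > J_0)."""
    two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    return two_sided, upper


z = standardized_statistic(j_hat=0.2, j_null=0.1, var_h1=1.0, var_h2=1.0, n=100, m=100)
print(z, normal_p_values(z))
```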

3. Comments and Announcements

In a second paper, we will give versions of our main results on specific and classical divergence measures. The references below, in general, will not be repeated in the two other papers.

Acknowledgment

The authors (1 & 2 & 3) acknowledge support from the World Bank Excellence Center (CEA-MITIC), which has continuously funded their research activities since 2014.

References

[1] AD Ba, GS Lo, and D Ba, Divergence Measures Estimation and Its Asymptotic Normality Theory Using Wavelets Empirical Processes, 2017. arXiv:1704.04536.

[2] H Dhaker, P Ngom, E Deme, and P Mendy, Kernel-Type Estimators of Divergence Measures and Its Strong Uniform Consistency, American Journal of Theoretical and Applied Statistics, Vol. 5, No. 1, 2016, pp. 13-22.

[3] I Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, 1992.

[4] F Topsoe, Some inequalities for information divergence and related measures of discrimination, IEEE Transactions on Information Theory, Vol. 46, 2000, pp. 1602-1609.

[5] A Evren, Some Applications of Kullback-Leibler and Jeffreys' Divergences in Multinomial Populations, Journal of Selcuk University Natural and Applied Science, Vol. 1, No. 4, 2012, pp. 48-58.

[6] A Cichocki and S Amari, Families of Alpha-, Beta- and Gamma-Divergences: Flexible and Robust Measures of Similarities, Entropy, Vol. 12, No. 6, 2010, pp. 1532-1568.

[7] P Hall, On Kullback-Leibler loss and density estimation, The Annals of Statistics, Vol. 15, No. 4, 1987, pp. 1491-1519.

[8] S Kullback and R Leibler, On information and sufficiency, The Annals of Mathematical Statistics, Vol. 22, No. 1, 1951, pp. 79-86.

[9] S Singh and B Poczos, Generalized Exponential Concentration Inequality for Rényi Divergence Estimation, Journal of Machine Learning Research, Carnegie Mellon University, Vol. 6, 2014.

[10] K Akshay, K Kirthevasan, B Poczos, and L Wasserman, Nonparametric Estimation of Rényi Divergence and Friends, Journal of Machine Learning Research, Workshop and Conference Proceedings, 32 (2014), Vol. 3, pp. 2.

[11] A Krishnamurthy, K Kandasamy, B Poczós, and L Wasserman, To appear, JMLR: W&CP, in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), San Diego, CA, USA, Vol. 38, 2015.

[12] KR Moon and AO Hero III, Ensemble estimation of multivariate f-divergence, IEEE International Symposium on Information Theory, 2014, pp. 356-360.

[13] B Poczós and S Jeff, On the estimation of α-Divergences, in International Conference on Artificial Intelligence and Statistics, 2011, pp. 609-617.

[14] H Liu, J Lafferty, and L Wasserman, Exponential concentration inequality for mutual information estimation, in Neural Information Processing Systems (NIPS), 2012.

[15] E Giné and R Nickl, Uniform limit theorems for wavelet density estimators, The Annals of Probability, Vol. 37, No. 4, 2009, pp. 1605-1646.

[16] W Hardle, G Kerkyacharian, D Picard, and A Tsybakov, Wavelets, Approximation, and Statistical Applications, Lecture Notes in Statistics, 1998.

[17] C Blatter, Wavelets, a Primer, A. K. Peters, Natick, MA, 1998.

[18] G Valiron, Théorie des fonctions, Masson, Paris Milan Melbourne, 1966.

[19] K Sricharan, D Wei, and AO Hero, Ensemble estimators for multivariate entropy estimation, 2012. arXiv:1203.5829.

[20] D Kallberg and O Seleznjev, Estimation of entropy-type integral functionals, 2012. arXiv:1209.2544.

[21] M Loève, Probability Theory I, 4th Edition, Springer, 1972.

Journal: Journal of Statistical Theory and Applications
Volume-Issue: 17 - 1
Pages: 158 - 171
Publication Date: 2018/03/31
ISSN (Online): 2214-1766
ISSN (Print): 1538-7887