A class of Bivariate SURE estimators in heteroscedastic hierarchical normal models

S.K. Ghoreishi

doi:10.2991/jsta.2018.17.2.11

<Previous Article In Issue

Download article (PDF)

Next Article In Issue>

Volume 17, Issue 2, June 2018, Pages 324 - 339

A class of Bivariate SURE estimators in heteroscedastic hierarchical normal models

Authors

S.K. Ghoreishiatty_ghoreishi@yahoo.com

Department of Statistics, Faculty of Sciences, University of Qom, Qom, Iran

Received 11 January 2017, Accepted 5 June 2017, Available Online 30 June 2018.

DOI: 10.2991/jsta.2018.17.2.11 How to use a DOI?
Keywords: Asymptotic univariate shrinkage estimators; Heteroscedasticity; Hierarchical models; Multivariate shrinkage estimator; Stein’s unbiased risk estimators
Abstract: In this paper, we first propose a class of bivariate shrinkage estimators based on Steins unbiased estimate of risk (SURE). Then, we study the effect of correlation coefficients on their performance. Moreover, under some mild assumptions on the model correlations, we set up the optimal asymptotic properties of our estimates when the number of vector means to be estimated grows. Furthermore, we carry out a simulation study to compare how various bivariate competing shrinkage estimators perform and analyze a real data set.
Copyright: Copyright © 2018, the Authors. Published by Atlantis Press.
Open Access: This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

1. Introduction

One of the most applicable class of models is the class of hierarchical models. It is well-known that the shrinkage estimates play an important role in hierarchical models. In this line, the James-Stein estimators, which combine the partial information of various sources, dominates some other estimators like the ordinary least squares estimators, Stein (1956) and James and Stein (1961). However, the good risk properties of shrinkage estimators make them appealing in many applicable disciplines. Undoubtedly, the basic works of James and Stein on shrinkage estimators became a foundation for the other progressing studies on hierarchical normal models. Stein (1962) described a hierarchical, empirical Bayes interpretation for the shrinkage estimators. Further studies have been done by Efron and Morris (1973), who tried to interpret these kinds of empirical Bayes estimators by several other competing parametric empirical Bayes estimators. Moreover, the homoscedastic (equal subpopulation variances) empirical Bayes interpretation of estimators as well as heteroscedastic (unequal subpopulation variances) ones has motivated various treatments of this problem in hierarchical normal models. For homoscedastic case, we can refer to Baranchik (1970), Strawderman (1971), Brown (1971, 1975) and Berger (1976) among the others, while the heteroscedastic situation have been addressed by a few authors. For more details see Hudson (1974), Xie et al. (2012), Ghoreishi and Meshkani (2014, 2015).

Following considerable work on univariate shrinkage estimators, the multivariate type of James-Stein shrinkage estimators have also been considered for random vectors of dimension r ≥ 3, which either shrink towards the origin vector or another given fixed vector, Lehmann and Casella (1998). James and Stein (1961) showed that the multiple homoscedastic shrinkage estimator of the population mean dominates the corresponding maximum likelihood estimator when r ≥ 3. It means that the multiple James and Stein shrinkage estimator always achieves lower MSE than the corresponding maximum likelihood estimator. Equivalently, the James and Stein multiple estimator makes the maximum likelihood estimator inadmissible when r ≥ 3.

From interpretation point of view, the James and Stein estimator, as an empirical Bayes, is based on the assumption that the population mean is a random vector with an unknown diagonal covariance matrix σ²I and this matrix is estimated from the data itself, Brown (2008), Brown and Greenshtein (2009) and Berger and Strawderman (2009). The main point on this approach is that it is achievable when the random vector of means has a dimension greater than or equal 3 and unfortunately it is not applicable for 2-dimensional random vectors.

The study of James and Stein (1961) on empirical Bayes shrinkage estimators have extended to the case of a general covariance matrix, i.e., where measurements are statistically dependent and may have different variances, see Strawderman (1971) and Bock (1975). They illustrated that the theoretical results on multivariate shrinkage estimator hold when r ≥ 3 and in addition the covariance matrix has the general form σ²D, which is known up to constant σ².

To our knowledge, all results on multivariate homoscedastic James-Stein shrinkage estimators hold for large dimensions r ≥ 3.

The outlines of the paper are:

1)
We first introduce an alternative class of bivariate shrinkage estimators.
2)
We study the correlation effect on their performance.
3)
Since the correlation structure is the biggest challenge in bivariate setting, and this phenomenon can seriously threaten the truth of theoretical results, under some mild assumptions on the model correlations, we show that our proposed estimators have optimal asymptotic properties.

The paper is organized as follows. In Section 2, we review some basic concepts and define necessary notations. The main results of this work, including the risk properties of our bivariate conditional heteroscedastic SURE estimators, are presented in Section 3. We carry out a simple simulation study to evaluate the performance of our proposed bivariate empirical Bayes shrinkage estimators in Section 4 and illustrate the theoretical results for a real data set in Section 5. Necessary technical proofs are given in the appendix.

2. Preliminaries

Let $Yi=(Y1iY2i)$ be the vectors, with 2 × 1 dimension, of normal random variables taken independently at points i = 1,…N. That is

(2.1)

$Yi~N2(θi,Ai).$

Moreover, The classical conjugate hierarchical model puts a prior of bivariate normal distribution on θ_i

$θi~N2(μ,Λ).$

Combination of two assumed model leads us to the following heteroscedastic hierarchical bivariate normal model

(2.2)

$Yi~N2(θi,Ai),θi~N2(μ,Λ), i=1,⋯,N,$

where Λ = diag{λ_i} is an unknown 2 × 2 diagonal matrix with positive entries, μ is a 2 × 1 unknown vector and A_is are (potentially) distinct covariance matrices either completely known or will be known by some plug-in robust estimators.

Application of Bayes theorem for model (2.2) gives us the bivariate posterior density

$θi~N2((Ai−1+Λ−1)−1(Ai−1Yi+Λ−1μ),(Ai−1+Λ−1)−1).$

According to this distribution, we consider the posterior means as an alternative class of bivariate shrinkage estimators of θ_i. That is,

(2.3)

$θˆiΛˆ,μˆ=(Ai−1+Λˆ−1)−1Ai−1Yi+(Ai−1+Λˆ−1)−1Λˆ−1μˆ,$

As one can see these estimators tend to shrink the sample vectors to the mean vector μ via some coefficients, having important roles to balance observations effect in estimation.

It is straightforward to check that the following relationship holds between the coefficients of estimators (2.3),

(2.4)

$(Ai−1+Λˆ−1)−1Ai−1=I2×2−(Ai−1+Λˆ−1)−1Λˆ−1,$

for identity matrix I_2×2. Moreover, we have

$(Ai−1+Λ−1)−1Ai−1=(I2×2+AiΛ−1)−1=(I2×2+(a1i2ρia1ia2iρia1ia2ia2i2)(1λ1001λ2))−1=(U11iΛ−U12iΛ−U21iΛU22iΛ)=UiΛ,$

and

$(Ai−1+Λ−1)−1Λ−1=(1−U11iΛU12iΛU21iΛ1−U22iΛ),$

where a_1i > 0, a_2i > 0 and

$U11iΛ=1+a2i2λ2(1+a1i2λ1)(1+a2i2λ2)−ρi2a1i2a2i2λ1λ2, U12iΛ=ρia1ia2iλ1(1+a1i2λ1)(1+a2i2λ2)−ρi2a1i2a2i2λ1λ2U21iΛ=ρia1ia2iλ2(1+a1i2λ1)(1+a2i2λ2)−ρi2a1i2a2i2λ1λ2, U22iΛ=1+a1i2λ1(1+a1i2λ1)(1+a2i2λ2)−ρi2a1i2a2i2λ1λ2.$

It is worth pointing out the following remark concerning our shrinkage estimators.

Remark 2.1.

Condition |ρ_i| → 0 is equivalent to assume separate shrinkage estimates for marginal variables. If λ₁, λ₂ → ∞, then our shrinkage estimates severely shrank to the observations. Moreover, if $|ρi|>min{a1i,a2i}max{a1i,a2i}$ , under some extreme values for the other parameters, our shrinkage estimates behave in an uncontrolled manner.

Since the heteroscedastic model, though more appropriate for practical applications, is less well studied in higher dimensions, and it is unclear what types of bivariate shrinkage estimators are superior in terms of the risk, in this work, we propose SURE approach under some mild conditions on correlation coefficients. For our purpose, we adopt the quadratic loss function

$lN(θ,θˆΛ,μ)=1N∑i=1N{θi−θˆiΛ,μ}′{θi−θˆiΛ,μ}.$

Let us denote the corresponding risk function and its unbiased estimator by $R(θ,θˆΛ,μ)$ and SURE(Λ, μ), respectively.

For Bivariate shrinkage estimators (2.3), we have

$lN(θ,θˆΛ,μ)=1N∑i=1N{(Ai−1+Λ−1)−1Ai−1(Yi−θi)+(Ai−1+Λ−1)−1Λ−1(μ−θi)}′× {(Ai−1+Λ−1)−1Ai−1(Yi−θi)−(Ai−1+Λ−1)−1Λ−1(θi−μ)}.$

Therefore, the corresponding risk function is obtained as follows

$R(θ,θˆΛ,μ)=E(lN(θ,θˆΛ,μ))=1N∑i=1N{tr(Ai−1(Ai−1+Λ−1)−2)+(θi−μ)′Λ−1(Ai−1+Λ−1)−2Λ−1(θi−μ)}.$

It is Straightforward to see that the Stein’s unbiased risk estimator of the $R(θ,θˆΛ,μ)$ is given by

$SURE(Λ,μ)=1N∑i=1N{tr [Ai−1(Ai−1+Λ−1)−2Ai−1−Λ−1(Ai−1+Λ−1)−2)Λ−1]Ai+(Yi−μ)′Λ−1(Ai−1+Λ−1)−2Λ−1(Yi−μ)},$

for more details see the appendix. In terms of U_11i, U_12i, U_21i, and U_22i, SURE(Λ, μ) can be rewritten as

$SURE(Λ,μ)=1N∑i=1N{[(1−U11i)2+U21i2](Y1i−μ1)2+[U12i2+(1−U22i)2](Y2i−μ2)2+ 2[(1−U11i)U12i+U21i(1−U22i)](Y1i−μ1)(Y2i−μ2)+ (1−2U11i)a1i2+2ρia1ia2i[U12i+U21i]+a2i2(1−2U22i)}.$

To estimate μ₁, μ₂, λ₁, and λ₂, one can minimize SURE(Λ, μ) with respect to these quantities. Setting $∂SURE(Λ,μ)∂μ1=0$ and $∂SURE(Λ,μ)∂μ2=0$ , give the solutions, satisfying

(2.5)

$1N∑i=1N[(1−U11i)2+U21i2]μ1+1N∑i=1N[(1−U11i)U12i+U21i(1−U22i)]μ2= 1N∑i=1N{[(1−U11i)2+U21i2]Y1i+[(1−U11i)U12i+U21i(1−U22i)]Y2i}$

and

(2.6)

$1N∑i=1N[(1−U11i)U12i+U21i(1−U22i)]μ1+1N∑i=1N[U12i2+(1−U22i)2]μ2= 1N∑i=1N{[(1−U11i)U12i+U21i(1−U22i)]Y1i+[U12i2+(1−U22i)2]Y2i}.$

In order to elicit the solutions, we assume (λ₁, λ₂) ∈ [0,b]², for relatively large real value b > 0. Then, we partition this cell into several tiny cells and apply (2.5) and (2.6) for (λ₁, λ₂) at each grid of these tiny cells and simultaneously compute μ₁, μ₂ and their corresponding SURE. The values of μ₁, μ₂, λ₁, and λ₂ that minimize the SURE function are considered the SURE estimates. So, the SURE approach leads us to the Stein’ unbiased risk estimators of Λ and μ which can obtain by minimizing SURE(Λ, μ) with respect to the Λ and μ, when the solutions exist. That is,

$(ΛˆSURE,μˆSURE)=min argΛ,μ SURE(Λ,μ).$

In this case, the corresponding SURE estimators are of the form

(2.7)

$θˆiΛˆSURE,μˆSURE=(Ai−1+ΛˆSURE−1)−1Ai−1Yi+(Ai−1+ΛˆSURE−1)−1ΛˆSURE−1μˆSURE.$

In the next section, we discus how our SURE estimators $θˆiΛSURE$ and μ_SURE perform in comparison with other competitive estimators resulting from other approaches including Empirical Bayes Method of Moment Estimators(EBMME) and Empirical Bayes Maximum Likelihood Estimators(EBMLE).

3. Asymptotic properties of the bivariate SURE estimators

In this section, we show that the proposed bivariate SURE estimators (2.7) have optimal asymptotic properties in comparison with their competitor bivariate estimators. In order to check how well SURE(Λ, μ) function approximates the loss function $lN(θi,θˆiΛ,μ)$ , we have

$SURE(Λ,μ)−lN(θ,θˆΛ,μ)=1N∑i=1N{Y1i2+Y2i2−a1i2−a2i2−θ1i2−θ2i2}−2N∑i=1N{U11iΛY1i2+U22iΛY2i2−(U12iΛ+U21iΛ)Y1iY2i−U11iΛY1iθ1i−U22iΛY2iθ2i+U12iΛY2iθ1i+U21iΛY1iθ2i−U11iΛa1i2−U22iΛa2i2+(U12iΛ+U21iΛ)ρia1ia2i}−2N∑i=1N{(μ1(1−U11iΛ)+μ2U21iΛ)(Y1i−θ1i)+(μ2(1−U22iΛ)+μ1U12iΛ)(Y2i−θ2i)}.$

By rearranging the terms of the above equation, one has

(3.1)

$SURE(Λ,μ)−lN(θ,θˆΛ,μ)=1N∑i=1N{Y1i2−a1i2−θ1i2}−2N∑i=1NU11iΛ{Y1i2−Y1iθ1i−a1i2}︸I1−2N∑i=1Nμ1(1−U11iΛ) (Y1i−θ1i)︸I2−2N∑i=1Nμ2U21iΛ(Y1i−θ1i)︸I3+[1N∑i=1N{Y2i2−a2i2−θ2i2}−2N∑i=1NU22iΛ{Y2i2−Y2iθ2i−a2i2}︸II1−2N∑i=1Nμ2(1−U22iΛ) (Y2i−θ2i)︸II2−2N∑i=1Nμ1U12iΛ(Y2i−θ2i)]︸II3+2N∑i=1NU12iΛ{Y1iY2i−Y2iθ1i−ρia1ia2i}︸III1+2N∑i=1NU21iΛ{Y1iY2i−Y1iθ2i−ρia1ia2i}.︸III2$

To investigate the risk properties of our bivariate SURE estimators, we need to use Lemma (2.1) in Li(1986). In our setting, it is applicable when the absolute values of $UiΛ$ entries are less than or equal to 1. Hence, the following lemma gives some mild conditions on correlation coefficients to guarantee the framework of Lemma (2.1) in Li(1986). Since the proof is straightforward, it is omitted.

Lemma 3.1.

For each t, the elements of $UiΛ$ , satisfy in

i)
$0≤U11iΛ≤1$ ,
ii)
$0≤U22iΛ≤1$ ,
iii)
If $|ρi|≤min{a1i,a2i}max{a1i,a2i}$ , then $|U12iΛ|≤1$ , $|U21iΛ|≤1$ , and $|U12iΛ+U21iΛ|≤1$ .

In Lemma (3.1)[iii], the upper bound for the absolute value of the correlation coefficients can be removed when a_1i = a_2i. Under this condition, both variables have the same variances and so the quantities $U12iΛ$ , $U21iΛ$ , and ρ_i have the same sign.

To proceed, we assume a homogeneous property for the correlation coefficients. It means either ρ_i ≥ 0, for all i = 1,2,⋯,N; or ρ_i ≤ 0, for all i = 1,2,⋯,N; which is a natural assumption for any relation between two assumed variables over sub-populations. For example, in a multinomial distribution, which can be approximated by a multivariate normal distribution, every two cell frequencies always has a negative correlation. With this assumption, the asymptotic property of our bivariate SURE function, SURE(Λ, μ), is given by the following theorem. The necessary conditions (C1-C3) are stated in the appendix.

Theorem 3.1.

For model (2.2), under the conditions C1-C3 and homogeneous property of the correlation coefficients, If $|ρi|≤min{a1i,a2i}max{a1i,a2i}$ , then

$supΛ,μ|SURE(Λ,μ)−lN(θ,θˆiΛ,μ)|→0$

in L¹ and in probability, as N → ∞.

Another class of bivariate shrinkage estimators, which have appealing asymptotic properties, is the shrinkage estimators that are shrunk toward the data grand mean. That is,

(3.2)

$θˆiΛˆ,Y¯=(Ai−1+Λˆ−1)−1Ai−1Yi+(Ai−1+Λˆ−1)−1Λˆ−1Y¯.$

Using quadratic loss function for these bivariate estimators, it is straightforward to see that

(3.3)

$SURE(Λ)−lN(θi,θˆiΛ,Y¯)=1N∑i=1N{Y1i2−a1i2−θ1i2}−2N∑i=1NU11iΛ{Y1i2−Y1iθ1i−a1i2}︸I1−2N∑i=1N(1−U11iΛ){Y¯1(Y1i−θ1i)−a1i2N}︸I2′−2N∑i=1NU21iΛ{Y¯2(Y1i−θ1i)−ρia1ia2iN}︸I3′+1N∑i=1N{Y2i2−a2i2−θ2i2}−2N∑i=1NU22iΛ{Y2i2−Y2iθ2i−a2i2}︸II1−2N∑i=1N(1−U22iΛ){Y¯2(Y2i−θ2i)−a2i2N)}︸II2′−2N∑i=1NU12iΛ{Y¯1(Y2i−θ2i)−ρia1ia2iN)}︸II3′+2N∑i=1NU12iΛ{Y1iY2i−Y2iθ1i−ρia1ia2i}︸III1+2N∑i=1NU21iΛ{Y1iY2i−Y1iθ2i−ρia1ia2i}.︸III2$

Therefore, the following theorem holds for (3.3).

Theorem 3.2.

For model (2.2), under the conditions C1-C3 and homogenous property for the correlation coefficients, If $|ρi|≤min{a1i,a2i}max{a1i,a2i}$ , then

$supΛ|SURE(Λ)−lN(θ,θˆiΛ,Y¯)|→0$

in L¹ and in probability, as N → ∞.

Remark 3.1.

Here, it is important to note that if |ρ_i| → 0, the upper bound of (A.1), in the appendix, tends to become sharper. This means that under uncorrelated situation the asymptotic convergence rate increases in Theorems (3.1) and (3.2).

4. Simulation study

In this section, we carry out a simple simulation to study how our bivariate shrinkage estimates perform in comparison with EBMME (Empirical Bayes Method of Moments Estimator) and EBMLE(Empirical Bayes Maximum Likelihood Estimator) bivariate estimators defined below.

•
Empirical Bayes method of moment estimators:
$θˆiΛˆEBMME,μˆEBMME=(Ai−1+ΛˆEBMME−1)−1Ai−1Yi+(Ai−1+ΛˆEBMME−1)−1ΛˆEBMME−1μˆEBMME,$
where $ΛˆEBMME$ and $μˆEBMME$ satisfy in
$∑i=1N(Ai+Λ)−12(Yi−μ)=0,∑i=1N{(Yi−μ)′(Ai+Λ)−1(Yi−μ)−2}=0,$
and the elements of $ΛˆEBMME$ are non-negative.
•
Empirical Bayes maximum likelihood estimators:
$θˆiΛˆEBMLE,μˆEBMLE=(Ai−1+ΛˆEBMLE−1)−1Ai−1Yi+(Ai−1+ΛˆEBMLE−1)−1ΛˆEBMLE−1μˆEBMLE,$
where the elements of $ΛˆEBMLE$ and $μˆEBMLE$ are obtained by maximizing the log-likelihood function
$l=cte−12∑i=1Nln{(a1i2+λ1) (a2i2+λ2)−ρi2a1i2a2i2}−∑i=1N12{(a1i2+λ1) (a2i2+λ2)−ρi2a1i2a2i2}{(a2i2+λ2) (Y1i−μ1)2+(a1i2+λ1) (Y2i−μ2)2−2ρia1ia2i(Y1i−μ1) (Y2i−μ2)}.$

For our simulation study, we first derive samples for

$Ai=(a1i2ρia1ia2iρia1ia2ia2i2)$

with marginal distributions

$a1i2~Unif(1,2)$ and

$a2i2~Unif(0.1,1)$ , and compute the corresponding correlation coefficients under three following scenarios.

•
Scenario I: we assume a large correlation coefficient for each i = 1,2,…,N. That is $ρi=0.95min(a1i,a2i)max(a1i,a2i)$ .
•
Scenario II: we assume a moderate correlation coefficient, $ρi=0.25min(a1i,a2i)max(a1i,a2i)$ .
•
Scenario III: A weak correlation coefficient is considered in this scenario, $ρi=0.05min(a1i,a2i)max(a1i,a2i)$ .

We also assume that θ_i are independently distributed according to

$θi~N2((00), (0.9000.9)).$

To measure the predictive power of different bivariate shrinkage estimators, we compute the limiting risk $limN→∞R(θ,θˆ)$ for different shrinkage estimators under three scenarios. This process is repeated a large number of times T = 10,000 to obtain an accurate estimate of the average risk for each estimator under three scenarios. Moreover, the sample size N is chosen to be 50 and 200. Table 1 shows the result for various approaches.

Sample Size	Method	Scenario I	Scenario II	Scenario III
N = 50	Oracle	0.3342	0.1021	0.0971
	SURE	0.3343	0.1044	0.0978
	EBMME	0.4558	0.3322	0.2320
	EBMLE	0.4865	0.3413	0.2403

N = 200	Oracle	0.1947	0.0540	0.0097
	SURE	0.1989	0.0553	0.0098
	EBMME	0.2271	0.0653	0.0187
	EBMLE	0.2430	0.0670	0.0188

Table 1.

The risk of various bivariate shrinkage estimates

Table 1 clearly confirms that although all bivariate shrinkage estimates (SURE, EBMLE and EBMME) have good performance in comparison with the corresponding Oracle risk estimates, our bivariate SURE estimates do better in terms of the risk value under Scenarios (I)-(III). That is, as our theoretical result indicates, the performance of the SURE estimators approaches that of the oracle risk estimators.

5. Application to a real dataset

Educational performance is one of the principle measures for evaluation of an educational center, like a university. It is usually considered through two extreme groups of the students who have either excellent performance (with grade point averages between [17,20]) or poor performance (with grade point averages less than 12) in comparison with the third (moderate) group (with grade point averages between [12,17)). In the following, we use E, M, and P notations which stand for groups of students with Excellent, Moderate, and Poor performances. For a given department indexed by i, let E_i, M_i, and P_i denote the number of students at each group and let n_i denote the number of total students in the corresponding department during the given semester. Whenever it is necessary to consider multiple semesters of educations, such as the two semesters in a given educational year, we will insert other subscripts, and use symbols such as E_ji, M_ji, and P_ji, to denote the number of students with excellent, moderate, and poor performances and total number n_ij for department i (= 1,⋯,N) within semester j (= 1,2).

For data involving N departments over two semesters, we focus on two extreme groups E and P, which are known as educational outcome of each department, Tinto (2010), and assume that

$(Eji,Bji)~multinomial(nji,pjiE,pjiP);i=1, ⋯ N; j=1,2,$

where unobserved parameters

$p jiE$ and

$p jiP$ are the corresponding proportions of groups E and P with the assumptions of

$p1iE=p2iE$ and

$p1iP=p2iP$ . Furthermore, the corresponding sample proportions are denoted by

$pˆ jiE=Ejinji$ and

$pˆ jiP=Pjinji$ . It is obvious that the sample proportions have approximately multivariate normal with vector mean

$(p jiE,p jiP)$ , and a covariance matrix that depends on the unknown values

$p jiE$ and

$p jiP$ .

For our purpose, we are interested in dealing with nearly bivariate normal variables such that a) having variances that depend only on the observed value of n and b) asymptotic performance of bivariate shrinkage estimators are robust with respect to plug-in estimates for the correlation coefficient. From Bartlett (1936, 1947), the standard variance-stabilizing transformations, $arcsin En$ and $arcsin Pn$ may do this goal fairly well for univariate case. However, in our bivariate setting, we prefer to use the transformations

$Y1=arcsin E+1/4n+1/2,$

and

$Y2=arcsin P+1/4n+1/2.$

For more details on these transformations, see Brown (2008). It is well-known that these transformations have marginal variance-stabilizing properties. That is:

$E(Y1):=arcsin pE, var(Y1)=14nE(Y2):=arcsin pP, var(Y2)=14n$

By applying Delta-method for the function arcsin $x$ , it is straightforward to see that the estimated correlation between Y₁ and Y₂ is given by

$ρˆY1,Y2=−(E+1/4) (P+1/4)(n−E+1/4) (n−P+1/4)+O(1n2)$

We would like to note that Lemma (3.1) holds for this correlation and the class of correlations given by (5.1). Furthermore, it is not difficult to numerically verify that the asymptotic performance of our proposed bivariate shrinkage estimators are robust with respect to a general class of the correlation between Y₁ and Y₂

(5.1)

$ρˆY1,Y2c=−(E+c) (P+c)(n−E+c) (n−P+c),$

for a positive constant c ≪ n. For more details on marginal transformation of type (5.1), see Brown (2008). If in equation (5.1) we put c = 0, the estimated correlation will be equal to the maximum likelihood estimate of correlation between the sample proportions. In summary, we approximately have

$Y=(Y1Y2)~N2( (arcsin pEarcsin pB),14n (1ρˆY1,Y2ρˆY1,Y21) )$

Let’s look at Educational data, Table 2, for 24 departments at University of Qom over the educational year 2014. Our primary focus in this research is to evaluate the students’ performance in the second semester, at each department, when the parameters are estimated based on the first-semester data. Since we use the information of an already-completed educational year, we can then validate our bivariate estimation performance by comparing the estimated values with the real values for the second semester. To evaluate an estimator $θˆ$ based on the transformation Y_2j, we measure the total bivariate sum of squared prediction errors(TSE) defined as

$TSE(θˆ)=∑i=1N(Y2i−θˆi)′(Y2i−θˆi)−∑i=1N12n2i.$

Department	Semester 1				Semester 2

	E	M	P	Total	E	M	P	Total
Statistics	5	16	1	22	4	13	4	21
Law	28	21	2	51	25	22	4	51
Mathematics	5	18	8	31	4	13	12	29
English literature	5	29	9	43	9	18	14	41
Arabic literature	23	15	3	41	14	16	9	41
Persian literature	24	29	3	56	11	30	10	51
Biology	4	7	2	13	2	9	1	12
Chemistry	14	28	7	49	8	31	9	48
Information	16	16	1	33	23	8	1	32
Economics	9	21	6	36	3	22	11	36
Education	20	21	3	44	13	25	6	44
Quranic Sciences	23	3	1	27	17	8	2	27
Computer Sciences	6	23	6	35	3	16	14	32
Jurisprudence	32	15	0	47	38	7	2	47
Philosophy	23	8	4	35	23	5	3	31
Physics	6	27	6	39	3	20	15	38
Business Management	20	18	6	44	20	16	8	44
Industrial Management	27	4	0	31	19	12	1	32
Architectural Engineering	13	5	1	19	1	16	2	19
Electrical Engineering	23	26	1	50	12	33	5	50
Industrial Engineering	4	13	2	19	6	12	1	19
Civil Engineering	25	28	8	61	13	40	8	61
Computer Engineering	18	24	2	44	15	27	1	43
Mechanical Engineering	16	11	3	30	8	17	5	30

Table 2.

Number of students according to their educational performance and department of study at each semester over educational year 2014 in University of Qom

Table 3 summarizes the numerical results of our bivariate shrinkage estimators in comparison with the other competing bivariate shrinkage estimators. The corresponding TSE values are reported. Acceptable performance of our bivariate SURE estimators, among the other estimators, are shown by Table 3. To numerically investigate the robust performance of our bivariate shrinkage estimators, with respect to little changes in correlation coefficient, we present a plot of TSE versus c in (5.1). As Figure 1 illustrates, TSE values of two approaches EBMME and EBMLE always dominate the corresponding values of SURE even for fairly large value of c.

c	0.25	1	2	5	10
SURE	0.101	0.079	0.064	0.035	0.021
EBMME	0.520	0.459	0.388	0.249	0.163
EBMLE	0.557	0.507	0.426	0.281	0.192

Table 3.

TSE for various c

Acknowledgements

I am grateful to Editor, Associate Editor and three referees for their constructive and insightful comments that helped greatly improve the quality of the paper.

Appendix

We first prove $E(SURE(Λ,μ))=R(θ,θˆΛ,μ)$ . It is easy to see

$E(SURE(Λ,μ))=1N∑i=1N{tr[Ai−1(Ai−1+Λ−1)−2Ai−1−Λ−1(Ai−1+Λ−1)−2)Λ−1]Ai +E[(Yi−μ)′Λ−1(Ai−1+Λ−1)−2Λ−1(Yi−μ)]}=1N∑i=1N{tr[Ai−1(Ai−1+Λ−1)−2Ai−1−Λ−1(Ai−1+Λ−1)−2)Λ−1]Ai +[θi−μ)′Λ−1(Ai−1+Λ−1)−2Λ−1(θi−μ)] +tr[Λ−1(Ai−1+Λ−1)−2)Λ−1Ai]}=1N∑i=1N{tr(Ai−1(Ai−1+Λ−1)−2)+(θi−μ)′Λ−1(Ai−1+Λ−1)−2Λ−1(θi−μ)}=R(θ,θˆΛ,μ).$

For establishing the asymptotic results, the following conditions are required,

C1)
For j = 1,2 and some δ > 0:
${lim supN→∞1N∑i=1Nσji4<∞,lim supN→∞1N∑i=1Nσji2θji2<∞,lim supN→∞1N∑i=1Na1i2θ2i2<∞,lim supN→∞1N∑i=1Na2i2θ1i2<∞.$
C2)

${lim supN→∞1N∑i=1Nθji2+δ<∞.lim supN→∞1N∑i=1Nθ1i<∞,lim supN→∞1N∑i=1Nθ2i<∞,$
C3)
Let |μ₁| ≤ max_t |Y_1i|, and |μ₂| ≤ max_t |Y_2i|. Since the sensible shrinkage estimators would attempt to shrink toward locations that lie inside the range of the data, these assumptions are included for technical reasons to facilitate the proof. These conditions imply that $E(max1≤i≤NYji2)=O(N2(2+δ*))$ , for j = 1,2, where δ* = min(1, δ), which is necessary to prove our theorems. For more details see Xie et al. (2012).

Proof of Theorem (3.1)

From (3.1), we have

$supΛ,μ|SURE(Λ,μ)−lN(θ,θˆiΛ,μ)|≤supΛ,μ|I1|+supΛ,μ|I2|+supΛ,μ|I3|+supΛ,μ|II1|+supΛ,μ|II2|+supΛ,μ|II3|+supΛ,μ|III1|+supΛ,μ|III2|.$

The L² convergence of the sup_Λ,μ |I₁| and sup_Λ,μ |I₂|, and L¹ convergence of sup_Λ,μ |I₂|, sup_Λ,μ |I₃|, sup_Λ,μ |II₂|, and sup_Λ,μ |II₃| immediately obtain from Theorems (3.1) and (5.1) in Xie et al. (2012) for univariate case. The proof of two last expressions, which consider interaction between two random variables, is given below. Without loss of generality, we assume wherever it is needed, the summands are rearranged according to decreasing order of their coefficients $U12tΛ$ . Therefore, we have

$supΛ,μ|III1|=supΛ|2N∑i=1NU12iΛ{Y1iY2i−Y2iθ1i−ρia1ia2i}|=supλ2>0 supλ1>0|2N∑i=1NU12iΛ{Y1iY2i−Y2iθ1i−ρia1ia2i}|.$

Under necessary conditions for correlation coefficients in Lemma (3.1)[iii], and using Lemma (2.1) in Li (1986), the last term is equal to

$sup{1≥U121Λ≥U122Λ,⋯,≥U12NΛ≥0}|2N∑i=1NU12iΛ{Y1iY2i−Y2iθ1i−ρia1ia2i}|≤max1≤t≤N|2N∑i=1t{Y1iY2i−Y2iθ1i−ρia1ia2i}|.$

Let $Mt=∑i=1t{Y1iY2i−Y2iθ1i−ρia1ia2i}$ . It is easy to see that M_t;t = 1,2,⋯ form a martingale process. Therefore the L² maximum inequality implies

(A.1)

$E(max1≤t≤NMt2)≤4E(MN2)=4E[∑i=1N{(Y1i−θ1i)Y2i−ρia1ia2i}]2=4E[∑i=1N{(Y1i−θ1i)2Y2i2+ρi2a1i2a2i2−2ρia1ia2i(Y1i−θ1i)Y2i}=4∑i=1N[(1+ρi2)a1i2a2i2+a1i2θ2i2]..$

Therefore, conditions (C1) and (C2) imply that $E(max1≤t≤N(2NMt)2)→0$ , which is equivalent to

$supΛ,μ|III1|=supΛ|2N∑i=1NU12iΛ{Y1iY2i−Y2iθ1i−ρia1ia2i}|→0 in L2 as N→∞.$

The same argument shows

$supΛ,μ|III2|=supΛ|2N∑i=1NU12iΛ{Y1iY2i−Y1iθ2i−ρia1ia2i}|→0 in L2 as N→∞.$

Proof of Theorem (3.2)

Form (3.3) it is obvious that

$supΛ|SURE(Λ)−lN(θi,θˆiΛ,Y¯)|≤supΛ|I1|+supΛ|I2′|+supΛ|I3′|+supΛII1+supΛ|II2′|+supΛ|II3′|+supΛ|III1|+supΛ|III2|.$

For univariate L² convergence of the sup_Λ |I₁| and sup_Λ |II₂|, L¹ convergence of $supΛ|I2′|$ and $supΛ|II2′|$ see Xie et al. (2012). The proof of L² convergence of sup_Λ |III₁| and sup_Λ |III₂| is discussed in Theorem (3.1). The proof of convergence of interaction terms $supΛ|I3′|$ and $supΛ|II3′|$ comes below. To prove the convergence of $supΛ|I3′|$ , we assume the summands are rearranged according to decreasing order of their coefficients. Therefore, we have

$supΛ|I3′|=supΛ|2N∑i=1NU21iΛ{Y¯2(Y1i−θ1i)−ρia1ia2iN}|≤supΛ|2N∑i=1NU21iΛY¯2(Y1i−θ1i)|+supΛ|2N∑i=1Nρia1ia2iN|≤supΛ|2N∑i=1NU21iΛ(Y1i−θ1i)||Y¯2|+supΛ|2N2∑t=1Na1ia2i|.$

Following the technique in the proof of Theorem (3.1), it is obvious to see that

$supΛ|2N∑i=1NU21iΛ(Y1i−θ1i)|→0 in L2.$

Moreover, $E(Y¯22)=1N2∑t=1Na2i2+(1N∑i=1Nθ2i)2$ , which is bounded by conditions C1-C3. So, by Cauchy-Schwartz inequality we have

$supΛ|2N∑i=1NU21iΛ(Y1i−θ1i)||Y¯2|→0 in L1.$

Therefore, we have

$supΛ|I3′|→0 in L1.$

The same results hold for $supΛ|II3′|$ and hence this completes the proof.

References

[1]AJ Baranchik, A Family of Minimax Estimators of the Mean of a Multivariate Normal Distribution, Ann. of Math. Statist, Vol. 41, 1970, pp. 642-645.

[2]J Berger, Admissible Minimax Estimation of a Multivariate Normal MeanWith Arbitrary Quadratic Loss, Ann. of Statist, Vol. 4, 1976, pp. 223-226.

[3]J Berger and WE Strawderman, Choice of Hierarchical Priors:Admissibility in Estimation of Normal Means, Ann. of Statist, Vol. 24, 1996, pp. 931-951.

[4]ME Bock, Minimax Estimators of the Mean of a Multivariate Normal Distribution, Ann. of Statist, Vol. 3, 1975, pp. 209-218.

[5]LD Brown, Admissible Estimators, Recurrent Diffusions, and Insoluble Boundary Value Problems, Ann. of Math. Statist, Vol. 42, 1971, pp. 855-903.

[6]LD Brown, Estimation With Incompletely Specified Loss Functions (theCaseWith Several Location Parameters), J. Amer. Statist. Assoc, Vol. 70, 1975, pp. 417-427.

[7]LD Brown, In-Season Prediction of Batting Average: A Field Test of Empirical Bayes and Bayes Methodologies, Ann. of Appl. Statist, Vol. 2, 2008, pp. 113-152.

[8]LD Brown and E Greenshtein, Nonparametric Empirical Bayes and Compound Decision Approaches to Estimation of a High-Dimensional Vector of Means, Ann. of Statist, Vol. 37, 2009, pp. 1685-1704.

[9]B Efron and C Morris, Stein’s Estimation Rule and Its Competitors: An Empirical Bayes Approach, J. Amer. Statist. Assoc, Vol. 68, 1973, pp. 117-130.

[10]SK Ghoreishi, Bayesian analysis of hierarchical heteroscedastic linear models using Dirichlet-Laplace priors, J. of Statistical Theory and Applications, Vol. 16, 2017, pp. 53-64.

[11]SK Ghoreishi and MR Meshkani, On SURE estimators in hierarchical models assuming heteroscedasticity for both levels of a two-level normal hierarchical model, J. of Multivariate Analysis, Vol. 132, 2014, pp. 129-137.

[12]SK Ghoreishi and A Mostafavinia, Shrinkage estimators for multi-level heteroscedastic hierarchical normal linear models, J. of Statistical Theory and Applications, Vol. 14, 2015, pp. 204-213.

[13]HM Hudson, Empirical Bayes Estimation, Technical Report No. 58, Department of Statistics, Stanford University, 1974.

[14]W James and CM Stein, Estimation With Quadratic Loss, in Proc. 4th Berkeley Symposium on Probability and Statistics (1961), Vol. I, pp. 367-379.

[15]EL Lehmann and G Casella, Theory of Point Estimation, 2nd Ed., Springer, New York, 1998.

[16]KC Li, Asymptotic optimality ofCL and Generalized Cross Validation in Ridge Regression with Application to Spline Smoothinge, Ann. of Statist, Vol. 14, 1986, pp. 1101-1112.

[17]CM Stein, Inadmissibility of the usual estimator for the mean of a multivariate distribution, in Proc. Third Berkeley Symp. Math. Statist. Prob. (1956), Vol. 1, pp. 197-206.

[18]CM Stein, Confidence Sets for the Mean of a Multivariate Normal Distribution (with discussion), J. Roy. Statist. Soc. Ser. B, Vol. 24, 1962, pp. 265-296.

[19]WE Strawderman, Proper Bayes Minimax Estimators of the Multivariate Normal Mean, Ann. Math. Statist, Vol. 42, 1971, pp. 385-388.

[20]X Xie, SC Kou, and LD Brown, SURE estimators for a Heteroscedastic Hierarchical Model, J. Amer. Statist. Assoc, Vol. 107, 2012, pp. 1465-1479.

<Previous Article In Issue

Download article (PDF)

Next Article In Issue>

Journal: Journal of Statistical Theory and Applications
Volume-Issue: 17 - 2
Pages: 324 - 339
Publication Date: 2018/06/30
ISSN (Online): 2214-1766
ISSN (Print): 1538-7887
DOI: 10.2991/jsta.2018.17.2.11 How to use a DOI?
Open Access: This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

Cite this article

ris enw bib

TY  - JOUR
AU  - S.K. Ghoreishi
PY  - 2018
DA  - 2018/06/30
TI  - A class of Bivariate SURE estimators in heteroscedastic hierarchical normal models
JO  - Journal of Statistical Theory and Applications
SP  - 324
EP  - 339
VL  - 17
IS  - 2
SN  - 2214-1766
UR  - https://doi.org/10.2991/jsta.2018.17.2.11
DO  - 10.2991/jsta.2018.17.2.11
ID  - Ghoreishi2018
ER  -

download .riscopy to clipboard