Interval Estimation of the Overlapping Coefficient of Two Exponential Distributions

Sibil Jose; Seemon Thomas; Thomas Mathew

doi:10.2991/jsta.d.190306.004

Volume 18, Issue 1, March 2019, Pages 26 - 32

Authors

Sibil Jose¹,*, Seemon Thomas², Thomas Mathew³

¹Department of Statistics, St. George’s College Aruvithura, Kottayam, Kerala, India.

²Department of Statistics, St. Thomas College Pala, Kottayam, Kerala, India.

³Department of Mathematics and Statistics, University of Maryland, Baltimore County, Maryland.

*Corresponding author. Email: sibiljose60@yahoo.com

Received 31 July 2017, Accepted 12 September 2018, Available Online 22 April 2019.

Keywords: Generalized pivotal quantity; one-parameter exponential distribution; two-parameter exponential distribution
Abstract: For the overlapping coefficient between two one-parameter or two-parameter exponential distributions, confidence intervals are developed using generalized pivotal quantities. The accuracy of the proposed solutions is assessed using estimated coverage probabilities, and the solutions are also compared with other approximate solutions. The results are illustrated with examples.
Copyright: © 2019 The Authors. Published by Atlantis Press SARL.
Open Access: This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

The overlapping coefficient (OVL) is defined as the common area under two probability density functions, and it is a measure of similarity between two populations. The value of the OVL ranges from 0 to 1, where a value of 0 indicates that there is no overlap and a value of 1 shows that the two populations are identical. It has wide applications as a similarity measure.

Let $f_1(x)$ and $f_2(x)$ be the probability density functions of two continuous populations. The OVL can be defined as

$$\mathrm{OVL} = \int \min\{f_1(x), f_2(x)\}\,dx, \tag{1}$$

following Weitzman [1].

Inference for the OVL has been investigated by several researchers under both normal and exponential distributions. Inman and Bradley [2] considered both interval estimation and hypothesis testing for the OVL under two normal distributions with equal variances. For normal distributions with equal means, but different variances, Mulekar and Mishra [3] investigated the computation of confidence intervals. For two one-parameter exponential distributions, Al-Saleh and Samawi [4] investigated the interval estimation of the OVL, and compared several methods, including the bootstrap and Taylor series approximation. It should be noted that Rom and Hwang [5] proposed the OVL as a measure of bioequivalence.

In this article, we consider both one-parameter and two-parameter exponential distributions, and investigate both interval estimation and hypothesis testing for the OVL using the concept of a generalized pivotal quantity (GPQ) due to Weerahandi [6–8]. This concept was earlier applied by Roy and Mathew [9] for the interval estimation of the reliability function of a two-parameter exponential distribution. In the context of inference concerning the OVL, Reiser and Faraggi [10] did use the GPQ idea for two normal distributions with equal variances. In the subsequent sections, we shall take up the interval estimation and hypothesis testing concerning the OVL for one-parameter exponential distributions, for two-parameter exponential distributions having a common scale parameter, and finally for general two-parameter exponential distributions that may not have any common parameters. The performance of the proposed GPQ approach will be assessed using simulations, and the methodology will be illustrated using examples.

2. ONE-PARAMETER EXPONENTIAL DISTRIBUTIONS

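Consider two one-parameter exponential distributions with pdf

$$f_i(x) = \begin{cases} \dfrac{1}{\theta_i}\exp\left(-\dfrac{x}{\theta_i}\right), & x > 0,\ \theta_i > 0, \\ 0, & \text{otherwise,} \end{cases}$$

for $i = 1, 2$, and let $\Delta$ be the OVL between them, as defined in (1). If $r = \theta_1/\theta_2$, then $\Delta$ simplifies to

$$\Delta = 1 - r^{\frac{1}{1-r}}\left|1 - \frac{1}{r}\right|, \tag{2}$$

where we assume that $r \neq 1$. If $r = 1$, the OVL is of course equal to one. The interval estimation of $\Delta$ is nontrivial since it is not a monotone function of $r$; see the plot of $\Delta$ given in Figure 1 in [4].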

Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent random samples of sizes $n_1$ and $n_2$ taken from $f_1(\cdot)$ and $f_2(\cdot)$, respectively. Let $\bar{X}$ and $\bar{Y}$ be the respective sample means. Now,

$$U = \frac{2 n_1 \bar{X}}{\theta_1} \quad \text{and} \quad V = \frac{2 n_2 \bar{Y}}{\theta_2}$$

are independent chi-square random variables with $2n_1$ and $2n_2$ degrees of freedom, respectively, so that $F = \frac{2 n_1 V}{2 n_2 U} \sim F_{2n_2, 2n_1}$, an F-distribution with df = $(2n_2, 2n_1)$. In order to derive a GPQ for $\Delta$, we recall that a GPQ is a function of the random variables, the observed data, and the unknown parameters, and it satisfies two conditions: (i) given the observed data, the GPQ has a distribution that is completely free of unknown parameters, and (ii) if the random variables are replaced by the corresponding observed values, the GPQ will simplify to a quantity that is free of any nuisance parameters; very often, the GPQ will simplify to the parameter of interest. Appropriate percentiles of the GPQ can be used to obtain confidence limits for the parameter of interest. We shall obtain a GPQ for $\Delta$ after deriving a GPQ for $r = \theta_1/\theta_2$. For this, let $\bar{x}$ and $\bar{y}$, respectively, denote the observed values of $\bar{X}$ and $\bar{Y}$. Define

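$$T_r = r\,\frac{\bar{x}}{\bar{y}}\,\frac{\bar{Y}}{\bar{X}} = \frac{\bar{x}}{\bar{y}}\,\frac{\bar{Y}/\theta_2}{\bar{X}/\theta_1} = \frac{\bar{x}}{\bar{y}}\,\frac{2 n_1 V}{2 n_2 U} = wF,$$

where $w = \bar{x}/\bar{y}$ and $F = \frac{2 n_1 V}{2 n_2 U} \sim F_{2n_2, 2n_1}$, as already noted. It is now easily verified that $T_r$ satisfies the two properties required of a GPQ, and it simplifies to $r$ when the random variables $\bar{X}$ and $\bar{Y}$ are replaced by the corresponding observed values $\bar{x}$ and $\bar{y}$. A GPQ for $\Delta$, say $T_\Delta$, is now given by

$$T_\Delta = 1 - (wF)^{\frac{1}{1-wF}}\left|1 - \frac{1}{wF}\right|. \tag{3}$$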

The percentiles of $T_\Delta$ provide confidence limits for $\Delta$. For example, if $T_{\Delta;0.95}$ is the 95th percentile of $T_\Delta$, then $T_{\Delta;0.95}$ is a 95% upper confidence limit for $\Delta$, referred to as a generalized upper confidence limit. A lower confidence limit, or a two-sided confidence interval, can be similarly obtained. Furthermore, for testing $H_0: \Delta = \Delta_0$ against $H_1: \Delta < \Delta_0$, a test procedure consists of rejecting $H_0$ if $T_{\Delta;0.95} < \Delta_0$, when we use a 5% significance level. We note that the required percentiles have to be numerically obtained. For example, in order to compute the 95th percentile of $T_\Delta$, we keep the observed data fixed (i.e., in Eq. (3) we keep $w$ fixed), and generate several values (say, 10,000 values) from the $F_{2n_2, 2n_1}$ distribution. This gives 10,000 values of $T_\Delta$ for a fixed value of $w$. The 95th percentile of these 10,000 values is an estimate of the 95th percentile of $T_\Delta$.
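The Monte Carlo recipe just described can be sketched in a few lines. The following is a minimal illustration in Python (the article's own computations were done in R), using only the standard library. The function names, the nearest-rank percentile rule, the seed, and the input values $\bar{x} = 12$, $\bar{y} = 20$, $n_1 = n_2 = 20$ are assumptions made for illustration and do not come from the article.

```python
import math
import random


def ovl_one_param(r):
    """Overlapping coefficient of Eq. (2) for the ratio r = theta1 / theta2."""
    if r == 1.0:
        return 1.0
    return 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)


def draw_f(df_num, df_den, rng):
    """One draw from F(df_num, df_den), built from two independent chi-squares."""
    chi_num = rng.gammavariate(df_num / 2.0, 2.0)  # chi-square with df_num df
    chi_den = rng.gammavariate(df_den / 2.0, 2.0)  # chi-square with df_den df
    return (chi_num / df_num) / (chi_den / df_den)


def percentile(sorted_values, p):
    """Empirical percentile by the nearest-rank rule (an assumed convention)."""
    idx = math.ceil(p * len(sorted_values)) - 1
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


def gpq_interval_delta(xbar, ybar, n1, n2, level=0.95, draws=10_000, seed=1):
    """Two-sided generalized confidence interval for Delta from Eq. (3)."""
    rng = random.Random(seed)
    w = xbar / ybar  # observed data are held fixed
    t_delta = sorted(
        ovl_one_param(w * draw_f(2 * n2, 2 * n1, rng)) for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    return percentile(t_delta, tail), percentile(t_delta, 1.0 - tail)


# Hypothetical observed sample means and sample sizes.
lower, upper = gpq_interval_delta(xbar=12.0, ybar=20.0, n1=20, n2=20)
print(f"95% generalized confidence interval for Delta: ({lower:.4f}, {upper:.4f})")
```

A one-sided limit follows the same pattern: the 95th percentile of the simulated values gives the generalized upper confidence limit used in the test of $H_0: \Delta = \Delta_0$.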

2.1. Simulation Study

In order to assess the performance of the GPQ-based confidence interval for $\Delta$, and to compare with other competing methods, we shall now report some simulation results on the coverage probability and expected lengths of two-sided confidence intervals for $\Delta$ using a 95% nominal level. Confidence intervals based on the following methods are included for comparison: (i) Taylor series approximation method, (ii) percentile bootstrap, and (iii) bootstrap t. Description of these competing methods and numerical results appear in [4], and we shall not give details of these methods. Without loss of generality, we have chosen $0 < r < 1$. For a few values of the sample sizes $(n_1, n_2)$, Table 1 gives the coverage probabilities and expected lengths of the 95% two-sided confidence intervals obtained by the different methods for three choices of $r$, namely 0.2, 0.5, and 0.8. The corresponding value of $\Delta$ is also given in the table. The percentile bootstrap and the GPQ method give comparable performance in terms of both coverage probability and expected lengths, even though conservatism is noted in a few cases. The other two methods, namely, the Taylor series approximation and the bootstrap t, generally exhibit unsatisfactory performance.

Δ (r)	Method	(20, 20) Length	(20, 20) Coverage	(20, 30) Length	(20, 30) Coverage	(50, 50) Length	(50, 50) Coverage	(50, 100) Length	(50, 100) Coverage
Δ = 0.465 (r = 0.2)	Taylor series	0.341	0.945	0.307	0.941	0.207	0.952	0.183	0.944
Δ = 0.465 (r = 0.2)	Percentile boot	0.335	0.948	0.303	0.949	0.210	0.950	0.182	0.948
Δ = 0.465 (r = 0.2)	Bootstrap t	0.465	0.990	0.418	0.983	0.341	0.988	0.278	0.981
Δ = 0.465 (r = 0.2)	GPQ	0.355	0.950	0.362	0.953	0.211	0.953	0.182	0.950
Δ = 0.75 (r = 0.5)	Taylor series	0.438	0.935	0.342	0.926	0.274	0.941	0.235	0.942
Δ = 0.75 (r = 0.5)	Percentile boot	0.384	0.973	0.362	0.954	0.266	0.952	0.233	0.948
Δ = 0.75 (r = 0.5)	Bootstrap t	0.536	0.933	0.487	0.954	0.317	0.970	0.264	0.983
Δ = 0.75 (r = 0.5)	GPQ	0.384	0.974	0.371	0.960	0.267	0.953	0.233	0.950
Δ = 0.918 (r = 0.8)	Taylor series	0.464	0.965	0.418	0.959	0.289	0.965	0.249	0.971
Δ = 0.918 (r = 0.8)	Percentile boot	0.323	0.973	0.302	0.968	0.220	0.975	0.199	0.965
Δ = 0.918 (r = 0.8)	Bootstrap t	0.452	0.801	0.405	0.833	0.288	0.835	0.255	0.920
Δ = 0.918 (r = 0.8)	GPQ	0.323	0.973	0.302	0.967	0.220	0.974	0.200	0.968

GPQ, generalized pivotal quantity; OVL, overlapping coefficient. Column headings give the sample sizes (n1, n2); "Length" is the expected length of the interval.

Table 1. Expected lengths and coverage probabilities of 95% two-sided confidence intervals for the OVL of two one-parameter exponential distributions.

The results in Table 1 are obtained using R based on 10,000 simulated samples. For the bootstrap methods, 1000 parametric bootstrap samples were used. Furthermore, the GPQ-based confidence interval from each simulated sample was computed using 10,000 values of the GPQ.
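The simulation design above can be sketched at reduced scale as follows. This is an illustrative Python sketch only: it uses 300 simulated samples and 1,000 GPQ values per sample instead of the 10,000 of each used for Table 1, so its estimates are noisier than the tabulated ones. The function names, seed, and parameter values ($\theta_1 = 1$, $\theta_2 = 2$, so $r = 0.5$) are assumptions chosen for illustration.

```python
import random


def ovl(r):
    """Eq. (2): OVL of two one-parameter exponentials, r = theta1 / theta2."""
    return 1.0 if r == 1.0 else 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)


def chi2(df, rng):
    """One chi-square draw with df degrees of freedom."""
    return rng.gammavariate(df / 2.0, 2.0)


def estimate_coverage(theta1, theta2, n1, n2, reps=300, draws=1000, seed=7):
    """Estimate coverage and mean length of the 95% GPQ interval for Delta."""
    rng = random.Random(seed)
    true_delta = ovl(theta1 / theta2)
    lo_idx, hi_idx = int(0.025 * draws), int(0.975 * draws) - 1
    hits, total_length = 0, 0.0
    for _ in range(reps):
        # The sum of n exponential(theta) lifetimes is Gamma(n, theta),
        # so the sample mean can be simulated directly.
        xbar = theta1 * rng.gammavariate(n1, 1.0) / n1
        ybar = theta2 * rng.gammavariate(n2, 1.0) / n2
        w = xbar / ybar
        t = sorted(
            ovl(w * (chi2(2 * n2, rng) / (2 * n2)) / (chi2(2 * n1, rng) / (2 * n1)))
            for _ in range(draws)
        )
        lower, upper = t[lo_idx], t[hi_idx]
        hits += lower <= true_delta <= upper
        total_length += upper - lower
    return hits / reps, total_length / reps


coverage, mean_length = estimate_coverage(theta1=1.0, theta2=2.0, n1=20, n2=20)
print(f"estimated coverage {coverage:.3f}, mean length {mean_length:.3f}")
```

Raising `reps` and `draws` toward 10,000 brings the design in line with the one described above, at a correspondingly higher running time.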

2.2. An Example

We shall now illustrate our methodology using an example taken from Lawless [11] on the lifetimes of steel specimens tested at 14 different stress levels (see Table G.4 in [11], p. 574). The lifetimes at stress levels 32.0 and 33.0 are noted to follow one-parameter exponential distributions. There are 24 observations in the first set and 20 observations in the second set. The maximum likelihood estimates of the parameters are $\hat{\theta}_1 = \bar{x} = 1399.875$ and $\hat{\theta}_2 = \bar{y} = 910.05$. The estimated value of $\Delta$ is 0.8428. We computed 95% confidence intervals for $\Delta$ using (i) Taylor series approximation, (ii) percentile bootstrap based on 1000 parametric bootstrap samples, (iii) bootstrap t based on 1000 parametric bootstrap samples, and (iv) the GPQ. The confidence intervals constructed using these methods are (0.5985, 1.0453), (0.5883, 0.9922), (0.5878, 1.1216), and (0.7503, 0.9960), respectively. We note that the first three methods give intervals that are somewhat similar; however, the GPQ method has produced an interval that is significantly shorter. Furthermore, the GPQ-based interval indicates that there is significant overlap between the two distributions. We also note that methods (i) and (iii) have resulted in confidence intervals with upper confidence limits exceeding one, even though the OVL cannot exceed one. One can of course truncate the confidence intervals to be within the parameter space, that is, the interval (0, 1).

3. TWO-PARAMETER EXPONENTIAL DISTRIBUTIONS WITH A COMMON SCALE PARAMETER

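The pdfs are now given by

$$f_i(x) = \begin{cases} \dfrac{1}{\beta}\exp\left(-\dfrac{x - \alpha_i}{\beta}\right), & \beta > 0,\ x > \alpha_i, \\ 0, & \text{otherwise,} \end{cases}$$

for $i = 1, 2$. The OVL in this case, denoted by $\rho$, is given by

$$\rho = \exp\left(-\frac{|\alpha_1 - \alpha_2|}{\beta}\right); \quad \alpha_1 \neq \alpha_2. \tag{4}$$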

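It is obvious that when $\alpha_1 = \alpha_2$, $\rho = 1$. Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be two independent random samples of sizes $n_1$ and $n_2$ taken from $f_1(\cdot)$ and $f_2(\cdot)$, respectively. Let $\bar{X}$ and $\bar{Y}$ be the respective sample means, and let $X_{(1)} = \min\{X_1, X_2, \ldots, X_{n_1}\}$ and $Y_{(1)} = \min\{Y_1, Y_2, \ldots, Y_{n_2}\}$. Define

$$U_1 = \beta^{-1}\,2n_1\left(X_{(1)} - \alpha_1\right), \quad U_2 = \beta^{-1}\,2n_2\left(Y_{(1)} - \alpha_2\right),$$
$$V_1 = \beta^{-1}\,2n_1\left(\bar{X} - X_{(1)}\right), \quad V_2 = \beta^{-1}\,2n_2\left(\bar{Y} - Y_{(1)}\right),$$
$$W = 2n_1\left(\bar{X} - X_{(1)}\right) + 2n_2\left(\bar{Y} - Y_{(1)}\right), \quad V = \beta^{-1} W = V_1 + V_2.$$

Then it is well known that $U_1$, $U_2$, and $V$ are independent, distributed as $\chi^2_2$, $\chi^2_2$, and $\chi^2_{2n_1 + 2n_2 - 4}$, respectively. Let $x_{(1)}$, $y_{(1)}$, and $w$ be the observed values of $X_{(1)}$, $Y_{(1)}$, and $W$, respectively. GPQs for $\beta$, $\alpha_1$, and $\alpha_2$, say $T_\beta$, $T_{\alpha_1}$, and $T_{\alpha_2}$, respectively, are given by

$$T_\beta = \frac{w}{V}, \quad T_{\alpha_1} = x_{(1)} - \frac{U_1}{2n_1}\,T_\beta \quad \text{and} \quad T_{\alpha_2} = y_{(1)} - \frac{U_2}{2n_2}\,T_\beta.$$

For $\rho$ given in (4), a GPQ, say $T_\rho$, is then obtained as

$$T_\rho = \exp\left(-\frac{|T_{\alpha_1} - T_{\alpha_2}|}{T_\beta}\right).$$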

The percentiles of $T_\rho$ can be used to obtain confidence intervals for $\rho$, and also to test hypotheses concerning $\rho$.
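As a rough illustration of how these percentiles might be computed, the sketch below simulates the GPQ for $\rho$ in Python using only the standard library. The function name, the percentile indexing, the seed, and the observed values (sample minima 2.10 and 1.05, $w = 160$, $n_1 = n_2 = 20$) are hypothetical and are not taken from the article.

```python
import math
import random


def gpq_interval_rho(x_min, y_min, w, n1, n2, level=0.95, draws=10_000, seed=1):
    """Two-sided generalized confidence interval for rho in Eq. (4).

    x_min, y_min : observed sample minima
    w            : observed 2*n1*(xbar - x_min) + 2*n2*(ybar - y_min)
    """
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        v = rng.gammavariate(n1 + n2 - 2, 2.0)  # chi-square, 2*n1 + 2*n2 - 4 df
        u1 = rng.gammavariate(1.0, 2.0)         # chi-square, 2 df
        u2 = rng.gammavariate(1.0, 2.0)         # chi-square, 2 df
        t_beta = w / v
        t_alpha1 = x_min - u1 * t_beta / (2 * n1)
        t_alpha2 = y_min - u2 * t_beta / (2 * n2)
        values.append(math.exp(-abs(t_alpha1 - t_alpha2) / t_beta))
    values.sort()
    tail = (1.0 - level) / 2.0
    lower = values[int(tail * draws)]
    upper = values[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical observed summaries for two samples of size 20.
lower, upper = gpq_interval_rho(x_min=2.10, y_min=1.05, w=160.0, n1=20, n2=20)
print(f"95% generalized confidence interval for rho: ({lower:.4f}, {upper:.4f})")
```

Only the two sample minima and the pooled quantity $w$ enter the computation, which mirrors the fact that the GPQs above depend on the data through these statistics alone.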

3.1. Simulation Study

The results in Table 2 were obtained using R for the parameter choices $\alpha_1 = 2$, $\alpha_2 = 1$, and $\beta = 2$, which yield the OVL value $\rho = 0.6065$. As was the case with the results in Table 1, we have used 10,000 simulated samples to estimate the coverage probabilities, and for each simulated sample 10,000 values of the GPQ were generated in order to compute the confidence limits. For the bootstrap methods, 10,000 parametric bootstrap samples were used. For both small and large samples, the GPQ-based confidence intervals generally maintain coverage closer to the nominal level than the bootstrap intervals. For small samples, the GPQ-based confidence intervals are also the shortest. Because the expression for the OVL depends on conditions on the parameters, the Taylor series approximation cannot be applied here.

(n1, n2)	GPQ coverage	GPQ length	Percentile boot coverage	Percentile boot length
(10, 10)	0.9477	0.4518	0.9284	0.4546
(20, 20)	0.9492	0.2587	0.9243	0.2732
(20, 30)	0.9500	0.2315	0.8567	0.2322
(30, 20)	0.9481	0.2187	0.9522	0.2399
(50, 50)	0.9524	0.1379	0.9341	0.1407
(50, 100)	0.9491	0.1136	0.8656	0.1139
(100, 50)	0.9518	0.1093	0.9494	0.1129
(100, 100)	0.9492	0.0909	0.9384	0.0921
(50, 200)	0.9513	0.0925	0.8015	0.0916
(200, 50)	0.9506	0.0880	0.9033	0.0911

GPQ, generalized pivotal quantity.

Table 2. Estimated coverage probabilities and expected lengths of confidence intervals for ρ in Eq. (4) using the GPQ and percentile bootstrap methods.

3.2. An Example

We shall once again take up the example that we considered in Section 2.2; however, now we shall consider the lifetimes of steel specimens corresponding to stress levels 37.5 and 38.0. There are 20 observations in each sample. It can be tested that the samples are from exponential distributions with the same scale parameter. The maximum likelihood estimates of the parameters $\alpha_1$, $\alpha_2$, and $\beta$ are 65, 59, and 54.45, respectively. The estimated value of $\rho$ is 0.8957. The 95% GPQ-based confidence interval is (0.7720, 0.9904) and the bootstrap confidence interval is (0.5998, 0.9822). We once again conclude that there is significant overlap between the two distributions. The GPQ-based interval is satisfactory in terms of maintaining the coverage probability and in terms of providing the shortest confidence interval.

4. TWO-PARAMETER EXPONENTIAL DISTRIBUTIONS WITH DIFFERENT LOCATION AND SCALE PARAMETERS

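Consider two exponential populations with pdfs

$$f_i(x; \alpha_i, \beta_i) = \begin{cases} \dfrac{1}{\beta_i}\exp\left(-\dfrac{x - \alpha_i}{\beta_i}\right), & \beta_i > 0,\ x > \alpha_i, \\ 0, & \text{otherwise,} \end{cases}$$

for $i = 1, 2$. The point of intersection of $f_1(\cdot)$ and $f_2(\cdot)$ is easily seen to be

$$x = \left(\frac{1}{\beta_1} - \frac{1}{\beta_2}\right)^{-1}\left[\ln\frac{\beta_2}{\beta_1} + \frac{\alpha_1}{\beta_1} - \frac{\alpha_2}{\beta_2}\right].$$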

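The overlap between the curves $f_1(\cdot)$ and $f_2(\cdot)$ can be calculated by considering the following four cases: (i) $\alpha_1 < \alpha_2$, $\beta_1 < \beta_2$; (ii) $\alpha_1 < \alpha_2$, $\beta_1 > \beta_2$; (iii) $\alpha_1 > \alpha_2$, $\beta_1 < \beta_2$; and (iv) $\alpha_1 > \alpha_2$, $\beta_1 > \beta_2$. The corresponding OVLs will be denoted by $\eta_i$, $i = 1, 2, 3, 4$, respectively, for the four different cases given above. The $\eta_i$'s are as follows:

Case 1: $\alpha_1 < \alpha_2$ and $\beta_1 < \beta_2$

$$\eta_1 = \begin{cases} 1 - e^{\frac{\alpha_1 - \alpha_2}{\beta_1 - \beta_2}}\left[\left(\dfrac{\beta_2}{\beta_1}\right)^{\frac{\beta_1}{\beta_1 - \beta_2}} - \left(\dfrac{\beta_2}{\beta_1}\right)^{\frac{\beta_2}{\beta_1 - \beta_2}}\right], & \alpha_2 - \alpha_1 < \beta_1 \ln\dfrac{\beta_2}{\beta_1}, \\ e^{-\frac{\alpha_2 - \alpha_1}{\beta_1}}, & \alpha_2 - \alpha_1 > \beta_1 \ln\dfrac{\beta_2}{\beta_1}. \end{cases}$$

Case 2: $\alpha_1 < \alpha_2$ and $\beta_1 > \beta_2$

$$\eta_2 = e^{-\frac{\alpha_2 - \alpha_1}{\beta_1}} - e^{\frac{\alpha_1 - \alpha_2}{\beta_1 - \beta_2}}\left[\left(\frac{\beta_2}{\beta_1}\right)^{\frac{\beta_2}{\beta_1 - \beta_2}} - \left(\frac{\beta_2}{\beta_1}\right)^{\frac{\beta_1}{\beta_1 - \beta_2}}\right].$$

Case 3: $\alpha_1 > \alpha_2$ and $\beta_1 < \beta_2$

$$\eta_3 = e^{-\frac{\alpha_1 - \alpha_2}{\beta_2}} - e^{\frac{\alpha_1 - \alpha_2}{\beta_1 - \beta_2}}\left[\left(\frac{\beta_2}{\beta_1}\right)^{\frac{\beta_1}{\beta_1 - \beta_2}} - \left(\frac{\beta_2}{\beta_1}\right)^{\frac{\beta_2}{\beta_1 - \beta_2}}\right].$$

Case 4: $\alpha_1 > \alpha_2$ and $\beta_1 > \beta_2$

$$\eta_4 = \begin{cases} 1 - e^{\frac{\alpha_1 - \alpha_2}{\beta_1 - \beta_2}}\left[\left(\dfrac{\beta_2}{\beta_1}\right)^{\frac{\beta_2}{\beta_1 - \beta_2}} - \left(\dfrac{\beta_2}{\beta_1}\right)^{\frac{\beta_1}{\beta_1 - \beta_2}}\right], & \alpha_2 - \alpha_1 > \beta_2 \ln\dfrac{\beta_2}{\beta_1}, \\ e^{-\frac{\alpha_1 - \alpha_2}{\beta_2}}, & \alpha_2 - \alpha_1 < \beta_2 \ln\dfrac{\beta_2}{\beta_1}. \end{cases}$$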

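The expression for the OVL, say $\eta$, can be written as a single expression, using indicator functions:

$$\eta = \eta_1\, I(\alpha_1 < \alpha_2,\ \beta_1 < \beta_2) + \eta_2\, I(\alpha_1 < \alpha_2,\ \beta_1 > \beta_2) + \eta_3\, I(\alpha_1 > \alpha_2,\ \beta_1 < \beta_2) + \eta_4\, I(\alpha_1 > \alpha_2,\ \beta_1 > \beta_2),$$

where $I(\cdot)$ denotes the indicator function. We note that the indicator functions are also parameter dependent.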
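Because the closed-form expressions are case dependent, a direct numerical evaluation of definition (1) is a convenient cross-check. The sketch below integrates $\min\{f_1, f_2\}$ with a midpoint rule in Python; the function names, the number of steps, and the truncation point (40 scale units beyond the larger location parameter) are assumptions made for illustration, not part of the article.

```python
import math


def exp_pdf(x, alpha, beta):
    """Two-parameter exponential density with location alpha and scale beta."""
    return math.exp(-(x - alpha) / beta) / beta if x > alpha else 0.0


def ovl_numeric(alpha1, beta1, alpha2, beta2, steps=200_000):
    """Midpoint-rule approximation of the integral of min(f1, f2) in Eq. (1)."""
    lower = min(alpha1, alpha2)
    # Truncate the upper limit where both tails are negligible (assumed cutoff).
    upper = max(alpha1, alpha2) + 40.0 * max(beta1, beta2)
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += min(exp_pdf(x, alpha1, beta1), exp_pdf(x, alpha2, beta2))
    return total * h


# alpha1 = 3, beta1 = 2, alpha2 = 2, beta2 = 4 falls under Case 3.
eta_case3 = ovl_numeric(3.0, 2.0, 2.0, 4.0)
# A common scale parameter reduces to Eq. (4): exp(-|2 - 1| / 2).
rho_common = ovl_numeric(2.0, 2.0, 1.0, 2.0)
print(f"{eta_case3:.4f} {rho_common:.4f}")
```

For these two parameter sets the numerical values agree, to about three decimals, with the closed forms $\eta_3$ and $\rho$, which evaluate to 0.6272 and 0.6065.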

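Consider two independent random samples of sizes $n_1$ and $n_2$ from $f_1(\cdot)$ and $f_2(\cdot)$, respectively. Let $\bar{X}$, $\bar{Y}$ be the respective sample means and $X_{(1)}$, $Y_{(1)}$ be the respective sample minima of the two samples. Then

$$U_1 = 2n_1\beta_1^{-1}\left(\bar{X} - X_{(1)}\right) \sim \chi^2_{2n_1 - 2}, \quad U_2 = 2n_2\beta_2^{-1}\left(\bar{Y} - Y_{(1)}\right) \sim \chi^2_{2n_2 - 2},$$
$$V_1 = 2n_1\beta_1^{-1}\left(X_{(1)} - \alpha_1\right) \sim \chi^2_2, \quad V_2 = 2n_2\beta_2^{-1}\left(Y_{(1)} - \alpha_2\right) \sim \chi^2_2,$$

where $U_1$, $U_2$, $V_1$, and $V_2$ are also independent. Also let $\bar{x}$, $\bar{y}$, $x_{(1)}$, and $y_{(1)}$ denote the observed values of $\bar{X}$, $\bar{Y}$, $X_{(1)}$, and $Y_{(1)}$, respectively. Then GPQs for $\beta_1$, $\beta_2$, $\alpha_1$, and $\alpha_2$ are given below:

$$T_{\beta_1} = \frac{2n_1}{U_1}\left(\bar{x} - x_{(1)}\right), \quad T_{\beta_2} = \frac{2n_2}{U_2}\left(\bar{y} - y_{(1)}\right),$$
$$T_{\alpha_1} = x_{(1)} - \frac{V_1}{2n_1}\,T_{\beta_1}, \quad T_{\alpha_2} = y_{(1)} - \frac{V_2}{2n_2}\,T_{\beta_2}.$$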

Since the OVL $\eta$ is a function of $\beta_1$, $\beta_2$, $\alpha_1$, and $\alpha_2$, a GPQ for $\eta$, say $T_\eta$, can be obtained by replacing each parameter by the corresponding GPQ in the expression for the OVL. As was noted in the previous sections, the percentiles of $T_\eta$ can be estimated by Monte Carlo simulation, and provide confidence limits for $\eta$.
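The substitution step can be sketched as follows, again in standard-library Python. Two simplifications here are assumptions of the sketch rather than the article's presentation: the four cases are reduced to two by swapping labels so that the first distribution has the smaller location parameter (the OVL is symmetric in the two populations), and equal scale parameters fall back to Eq. (4). The function names, the seed, and the observed summaries are hypothetical.

```python
import math
import random


def ovl_two_param(a1, b1, a2, b2):
    """Closed-form OVL of two exponentials with locations a_i and scales b_i."""
    if a1 > a2:  # Cases 3 and 4 are Cases 2 and 1 with the labels swapped
        a1, b1, a2, b2 = a2, b2, a1, b1
    d = a2 - a1
    if b1 == b2:
        return math.exp(-d / b1)
    ratio = b2 / b1
    p1 = ratio ** (b1 / (b1 - b2))
    p2 = ratio ** (b2 / (b1 - b2))
    if b1 < b2:  # Case 1
        if d >= b1 * math.log(ratio):
            return math.exp(-d / b1)
        return 1.0 - math.exp(-d / (b1 - b2)) * (p1 - p2)
    # Case 2
    return math.exp(-d / b1) - math.exp(-d / (b1 - b2)) * (p2 - p1)


def gpq_interval_eta(xbar, x_min, n1, ybar, y_min, n2,
                     level=0.95, draws=10_000, seed=1):
    """Two-sided generalized confidence interval for eta."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        u1 = rng.gammavariate(n1 - 1, 2.0)  # chi-square, 2*n1 - 2 df
        u2 = rng.gammavariate(n2 - 1, 2.0)  # chi-square, 2*n2 - 2 df
        v1 = rng.gammavariate(1.0, 2.0)     # chi-square, 2 df
        v2 = rng.gammavariate(1.0, 2.0)     # chi-square, 2 df
        t_b1 = 2 * n1 * (xbar - x_min) / u1
        t_b2 = 2 * n2 * (ybar - y_min) / u2
        t_a1 = x_min - v1 * t_b1 / (2 * n1)
        t_a2 = y_min - v2 * t_b2 / (2 * n2)
        values.append(ovl_two_param(t_a1, t_b1, t_a2, t_b2))
    values.sort()
    tail = (1.0 - level) / 2.0
    return values[int(tail * draws)], values[int((1.0 - tail) * draws) - 1]


# Hypothetical observed summaries for two samples of size 20.
lower, upper = gpq_interval_eta(xbar=5.1, x_min=3.1, n1=20,
                                ybar=6.2, y_min=2.2, n2=20)
print(f"95% generalized confidence interval for eta: ({lower:.4f}, {upper:.4f})")
```

Each simulated parameter set is routed to whichever case applies to it, which is how the parameter-dependent indicator functions enter the Monte Carlo computation.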

4.1. Simulation Study

Table 3 gives the estimated coverage probabilities and expected lengths of the (i) GPQ-based and (ii) percentile bootstrap confidence intervals for the OVL at the 95% nominal level, corresponding to the parameter choices $\alpha_1 = 3$, $\alpha_2 = 2$, $\beta_1 = 2$, and $\beta_2 = 4$; this choice yields the OVL value $\eta = 0.6272$. As before, we have used 10,000 simulated samples to estimate the coverage probabilities, and for each simulated sample 10,000 values of the GPQ were generated in order to compute the confidence limits. Also, 10,000 bootstrap samples were used. We note that the GPQ methodology has resulted in confidence intervals that maintain the coverage probabilities fairly accurately and that have the shortest length for small samples.

(n1, n2)	GPQ coverage	GPQ length	Percentile boot coverage	Percentile boot length
(10, 10)	0.9609	0.4518	0.9248	0.4546
(20, 20)	0.9547	0.2869	0.9616	0.3068
(20, 30)	0.9285	0.2577	0.9688	0.2844
(30, 20)	0.9605	0.2675	0.9697	0.2621
(50, 50)	0.9548	0.1840	0.9586	0.1757
(50, 100)	0.9439	0.1628	0.9493	0.1727
(100, 50)	0.9571	0.1480	0.9453	0.1435
(100, 100)	0.9528	0.1298	0.9505	0.1263
(50, 200)	0.9518	0.1649	0.9134	0.1577
(200, 50)	0.9565	0.1267	0.9143	0.1237

GPQ, generalized pivotal quantity; OVL, overlapping coefficient.

Table 3. Estimated coverage probabilities and expected lengths of 95% confidence intervals for the OVL η based on the GPQ and percentile bootstrap methods.

4.2. An Example

Continuing with the data set from [11] that we have used earlier, we shall now consider the lifetimes of steel specimens tested at stress levels 34 and 37.5. Each set consists of 20 observations. The two sets are exponentially distributed with different scale parameters. The maximum likelihood estimates of the parameters $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ are 146, 65, 329.8, and 57.8, respectively. The estimated value of the OVL is 0.2463. The 95% GPQ-based confidence interval is (0.1569, 0.4603) and the corresponding bootstrap confidence interval is (0.0551, 0.3142). Unlike what was noted earlier, the overlap between the two distributions is not that substantial.

5. DISCUSSION

In this brief note, we have presented a unified approach for computing confidence limits for the OVL of two exponential distributions, one-parameter, as well as two-parameter, using the concept of a GPQ. Through numerical results, we have demonstrated the accuracy of the proposed method in terms of meeting the coverage probability requirement. Furthermore, illustrative examples are provided. Though the proposed solutions have to be numerically obtained, the required computations are both simple and straightforward.

REFERENCES

1. M.S. Weitzman, Technical Paper No. 22, Department of Commerce, Bureau of Census, Washington, U.S.A., 1970.

2. H.F. Inman and E.L. Bradley, Environmetrics, Vol. 5, 1994, pp. 167-189.

3. M.S. Mulekar and S.N. Mishra, Comput. Stat. Data Anal., Vol. 34, 2000, pp. 121-137.

4. M.F. Al-Saleh and H.M. Samawi, J. Mod. Appl. Stat. Methods, Vol. 6, No. 2, 2007, pp. 503-516.

5. D.R. Rom and E. Hwang, Stat. Med., Vol. 15, No. 14, 1996, pp. 1489-1505.

6. S. Weerahandi, J. Am. Stat. Assoc., Vol. 88, 1993, pp. 899-905.

7. S. Weerahandi, Exact Statistical Methods for Data Analysis, Springer Series in Statistics, New York, 1994.

8. S. Weerahandi, Generalized Inference in Repeated Measures, Wiley Series in Probability and Statistics, New Jersey, 2004.

9. A. Roy and T. Mathew, J. Stat. Plan. Inf., Vol. 128, No. 2, 2005, pp. 509-517.

10. B. Reiser and D. Faraggi, Statistician, Vol. 48, No. 3, 1999, pp. 413-418.

11. J.F. Lawless, Statistical Models and Methods for Lifetime Data, Wiley Series in Probability and Statistics, New Jersey, 2003.

Journal: Journal of Statistical Theory and Applications
Volume-Issue: 18 - 1
Pages: 26 - 32
Publication Date: 2019/04/22
ISSN (Online): 2214-1766
ISSN (Print): 1538-7887