Interval Estimation of the Overlapping Coefficient of Two Exponential Distributions
- DOI: 10.2991/jsta.d.190306.004
- Keywords: Generalized pivotal quantity; one-parameter exponential distribution; two-parameter exponential distribution
For the overlapping coefficient between two one-parameter or two-parameter exponential distributions, confidence intervals are developed using generalized pivotal quantities. The accuracy of the proposed solutions is assessed using estimated coverage probabilities, and the solutions are also compared with other approximate solutions. The results are illustrated with examples.
- © 2019 The Authors. Published by Atlantis Press SARL.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
1. INTRODUCTION
The overlapping coefficient (OVL) is defined as the common area under two probability density functions, and it is a measure of similarity between two populations. The value of the OVL ranges from 0 to 1, where a value of 0 indicates that there is no overlap and a value of 1 shows that the two populations are identical. It has wide applications as a similarity measure.
Let $f_1(x)$ and $f_2(x)$ be the probability density functions of two continuous populations. Following Weitzman, the OVL can be defined as
$$\mathrm{OVL} = \int \min\{f_1(x), f_2(x)\}\,dx. \tag{1}$$
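The definition in (1) can be checked numerically. The following is a minimal sketch, not part of the original article: the helper names `ovl_numeric` and `exp_pdf` are hypothetical, and the midpoint rule and the truncation of the integration range are choices made here for illustration.

```python
import math


def ovl_numeric(f1, f2, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of min(f1, f2) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += min(f1(x), f2(x))
    return total * width


def exp_pdf(mean):
    """Density of an exponential distribution with the given mean (support x > 0)."""
    def pdf(x):
        return math.exp(-x / mean) / mean if x > 0 else 0.0
    return pdf
```

For two identical densities, `ovl_numeric(exp_pdf(1.0), exp_pdf(1.0), 0.0, 60.0)` should be close to 1, and for exponential means 1 and 2 the result should be close to 0.75, provided the upper limit is large enough for the neglected tail to be negligible.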
Inference for the OVL has been investigated by several researchers under both normal and exponential distributions. Inman and Bradley considered both interval estimation and hypothesis testing for the OVL under two normal distributions with equal variances. For normal distributions with equal means, but different variances, Mulekar and Mishra investigated the computation of confidence intervals. For two one-parameter exponential distributions, Al-Saleh and Samawi investigated the interval estimation of the OVL, and compared several methods, including the bootstrap and Taylor series approximation. It should be noted that Rom and Hwang proposed the OVL as a measure of bioequivalence.
In this article, we consider both one-parameter and two-parameter exponential distributions, and investigate both interval estimation and hypothesis testing for the OVL using the concept of a generalized pivotal quantity (GPQ) due to Weerahandi [6–8]. This concept was earlier applied by Roy and Mathew for the interval estimation of the reliability function of a two-parameter exponential distribution. In the context of inference concerning the OVL, Reiser and Faraggi used the GPQ idea for two normal distributions with equal variances. In the subsequent sections, we shall take up the interval estimation and hypothesis testing concerning the OVL for one-parameter exponential distributions, for two-parameter exponential distributions having a common scale parameter, and finally for general two-parameter exponential distributions that may not have any common parameters. The performance of the proposed GPQ approach will be assessed using simulations, and the methodology will be illustrated using examples.
2. ONE-PARAMETER EXPONENTIAL DISTRIBUTIONS
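Consider two one-parameter exponential distributions with pdf
$$f_i(x) = \frac{1}{\theta_i}\, e^{-x/\theta_i}, \quad x > 0,\ \theta_i > 0,$$
for i = 1, 2, and let $\Delta$ be the OVL between them, as defined in (1). If $\theta_1 \neq \theta_2$, then $\Delta$ simplifies to
$$\Delta = 1 - R^{1/(1-R)} \left| 1 - \frac{1}{R} \right|,$$
where $R = \theta_1/\theta_2$ and we assume that $R \neq 1$. If $R = 1$, the OVL is of course equal to one. The interval estimation of $\Delta$ is nontrivial since it is not a monotone function of $R$; see the plot of $\Delta$ given in Figure 1 in Al-Saleh and Samawi.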
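The closed form of the OVL for two one-parameter exponential distributions can be evaluated directly. The sketch below assumes the mean parametrization (densities with means theta1 and theta2) and writes R = theta1/theta2; the function name `ovl_one_param` is hypothetical and is not taken from the article.

```python
def ovl_one_param(r):
    """OVL of two exponential densities as a function of R = theta1 / theta2 (R > 0).

    Assumes the mean parametrization f_i(x) = exp(-x / theta_i) / theta_i.
    """
    if r <= 0:
        raise ValueError("R must be positive")
    if r == 1.0:
        # Identical distributions overlap completely.
        return 1.0
    return 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)
```

Under these assumptions `ovl_one_param(0.5)` is 0.75 and `ovl_one_param(0.8)` is about 0.918. The function takes the same value at R and 1/R, because swapping the two populations leaves the overlap unchanged; it rises to one as R increases to 1 and falls again beyond that, which is why percentiles of a GPQ for R cannot simply be mapped to confidence limits for the OVL.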
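Let $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ be two independent random samples of sizes $n_1$ and $n_2$ taken from $f_1$ and $f_2$, respectively. Let $\bar{X}_1$ and $\bar{X}_2$ be the respective sample means. Now, $2n_1\bar{X}_1/\theta_1$ and $2n_2\bar{X}_2/\theta_2$ are independent chi-square random variables with $2n_1$ and $2n_2$ degrees of freedom, respectively, so that
$$F = \frac{\bar{X}_1/\theta_1}{\bar{X}_2/\theta_2} \sim F_{2n_1,\,2n_2},$$
an F-distribution with df = $(2n_1, 2n_2)$. In order to derive a GPQ for the OVL $\Delta$, we recall that a GPQ is a function of the random variables, the observed data, and the unknown parameters, and it satisfies two conditions: (i) given the observed data, the GPQ has a distribution that is completely free of unknown parameters, and (ii) if the random variables are replaced by the corresponding observed values, the GPQ will simplify to a quantity that is free of any nuisance parameters; very often, the GPQ will simplify to the parameter of interest. Appropriate percentiles of the GPQ can be used to obtain confidence limits for the parameter of interest. We shall obtain a GPQ for $\Delta$ after deriving a GPQ for the ratio $R = \theta_1/\theta_2$. For this, let $\bar{x}_1$ and $\bar{x}_2$, respectively, denote the observed values of $\bar{X}_1$ and $\bar{X}_2$. Define
$$G_R = \frac{\bar{x}_1}{\bar{x}_2} \cdot \frac{\bar{X}_2/\theta_2}{\bar{X}_1/\theta_1} = \frac{\bar{x}_1}{\bar{x}_2} \cdot \frac{1}{F},$$
where $F \sim F_{2n_1,\,2n_2}$, as already noted. It is now easily verified that $G_R$ satisfies the two properties required of a GPQ, and it simplifies to $R$ when the random variables $\bar{X}_1$ and $\bar{X}_2$ are replaced by the corresponding observed values $\bar{x}_1$ and $\bar{x}_2$. A GPQ for $\Delta$, say $G_\Delta$, is now given by
$$G_\Delta = 1 - G_R^{1/(1-G_R)} \left| 1 - \frac{1}{G_R} \right|. \tag{3}$$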
The percentiles of the GPQ $G_\Delta$ provide confidence limits for the OVL $\Delta$. For example, if $G_{\Delta;0.95}$ is the 95th percentile of $G_\Delta$, then $G_{\Delta;0.95}$ is a 95% upper confidence limit for $\Delta$, referred to as a generalized upper confidence limit. A lower confidence limit, or a two-sided confidence interval, can be similarly obtained. Furthermore, for testing $H_0: \Delta \geq \Delta_0$ against $H_1: \Delta < \Delta_0$, a test procedure consists of rejecting $H_0$ if $G_{\Delta;0.95} < \Delta_0$, when we use a 5% significance level. We note that the required percentiles have to be numerically obtained. For example, in order to compute the 95th percentile of $G_\Delta$, we keep the observed data fixed (i.e., in Eq. (3) we keep $\bar{x}_1$ and $\bar{x}_2$ fixed), and generate several values (say, 10,000 values) from the $F_{2n_1,\,2n_2}$ distribution. This gives 10,000 values of $G_\Delta$ for a fixed value of $(\bar{x}_1, \bar{x}_2)$. The 95th percentile of these 10,000 values is an estimate of the 95th percentile of $G_\Delta$.
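The Monte Carlo step described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code (the article reports using R): it assumes the mean parametrization, builds the F variate as a ratio of scaled chi-square variates, and uses the hypothetical names `ovl_one_param` and `gpq_interval_one_param`. Only the Python standard library is used.

```python
import math
import random


def ovl_one_param(r):
    """OVL of two exponentials as a function of R = theta1 / theta2."""
    if r == 1.0:
        return 1.0
    return 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)


def gpq_interval_one_param(xbar1, xbar2, n1, n2, level=0.95, draws=10_000, seed=None):
    """Two-sided generalized confidence interval for the OVL from the observed means."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        # A chi-square variate with 2n df is Gamma(shape=n, scale=2).
        chi1 = rng.gammavariate(n1, 2.0)
        chi2 = rng.gammavariate(n2, 2.0)
        f = (chi1 / (2 * n1)) / (chi2 / (2 * n2))  # F with (2 n1, 2 n2) df
        g_r = (xbar1 / xbar2) / f                  # GPQ for R, observed means held fixed
        values.append(ovl_one_param(g_r))          # GPQ for the OVL
    values.sort()
    tail = (1.0 - level) / 2.0
    lower = values[int(math.floor(tail * (draws - 1)))]
    upper = values[int(math.ceil((1.0 - tail) * (draws - 1)))]
    return lower, upper
```

A call such as `gpq_interval_one_param(10.0, 12.0, 24, 20, seed=1)` (with made-up sample means) returns the empirical 2.5th and 97.5th percentiles of the simulated GPQ values. A one-sided limit would use a single percentile of the same sorted values.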
2.1. Simulation Study
In order to assess the performance of the GPQ-based confidence interval for $\Delta$, and to compare it with other competing methods, we shall now report some simulation results on the coverage probabilities and expected lengths of two-sided confidence intervals for $\Delta$ using a 95% nominal level. Confidence intervals based on the following methods are included for comparison: (i) the Taylor series approximation method, (ii) the percentile bootstrap, and (iii) the bootstrap-$t$. A description of these competing methods, along with numerical results, appears in Al-Saleh and Samawi, and we shall not give details of these methods. Since $\Delta$ depends on the parameters only through $R$, one of the two parameters can be fixed without loss of generality. For a few values of the sample sizes $(n_1, n_2)$, Table 1 gives the coverage probabilities and expected lengths of the 95% two-sided confidence intervals obtained by the different methods for three choices of $R$, namely 0.2, 0.5, and 0.8. The corresponding value of $\Delta$ is also given in the table. The percentile bootstrap and the GPQ method give comparable performance in terms of both coverage probability and expected length, even though conservatism is noted in a few cases. The other two methods, namely, the Taylor series approximation and the bootstrap-$t$, generally exhibit unsatisfactory performance.
Table 1. Expected lengths and coverage probabilities of 95% two-sided confidence intervals for the OVL of two one-parameter exponential distributions. For each method, the table reports the expected length and the coverage at sample sizes $(n_1, n_2)$ = (20, 20), (20, 30), (50, 50), and (50, 100).
GPQ, generalized pivotal quantity; OVL, overlapping coefficient.
The results in Table 1 are obtained using R based on 10,000 simulated samples. For the bootstrap methods, 1000 parametric bootstrap samples were used. Furthermore, the GPQ-based confidence interval from each simulated sample was computed using 10,000 values of the GPQ.
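A coverage study of this kind can be sketched in a few lines. The version below is a scaled-down illustration, not the code behind Table 1: it assumes the mean parametrization with theta2 fixed at 1 (so that theta1 = R), uses far fewer replications than the 10,000 reported above, and relies on the hypothetical helpers `ovl_one_param`, `gpq_limits`, and `estimate_coverage`.

```python
import math
import random


def ovl_one_param(r):
    """OVL of two exponentials as a function of R = theta1 / theta2."""
    if r == 1.0:
        return 1.0
    return 1.0 - r ** (1.0 / (1.0 - r)) * abs(1.0 - 1.0 / r)


def gpq_limits(xbar1, xbar2, n1, n2, rng, draws, level=0.95):
    """Two-sided GPQ limits for the OVL given the observed sample means."""
    values = []
    for _ in range(draws):
        # Ratio of scaled chi-squares gives an F variate with (2 n1, 2 n2) df.
        f = (rng.gammavariate(n1, 2.0) / (2 * n1)) / (rng.gammavariate(n2, 2.0) / (2 * n2))
        values.append(ovl_one_param((xbar1 / xbar2) / f))
    values.sort()
    tail = (1.0 - level) / 2.0
    return (values[int(math.floor(tail * (draws - 1)))],
            values[int(math.ceil((1.0 - tail) * (draws - 1)))])


def estimate_coverage(r, n1, n2, sims=500, draws=1000, seed=0):
    """Estimated coverage and mean length of the 95% GPQ interval at a given R."""
    rng = random.Random(seed)
    theta1, theta2 = r, 1.0  # assumption: theta2 = 1, hence theta1 = R
    target = ovl_one_param(r)
    hits, total_length = 0, 0.0
    for _ in range(sims):
        # The mean of n exponentials with mean theta is Gamma(shape=n, scale=theta/n).
        xbar1 = rng.gammavariate(n1, theta1 / n1)
        xbar2 = rng.gammavariate(n2, theta2 / n2)
        lo, hi = gpq_limits(xbar1, xbar2, n1, n2, rng, draws)
        hits += lo <= target <= hi
        total_length += hi - lo
    return hits / sims, total_length / sims
```

For example, `estimate_coverage(0.5, 20, 20)` returns a pair (estimated coverage, average length). With only a few hundred replications the coverage estimate carries a Monte Carlo standard error of roughly one percentage point, so it is a rough check and not a substitute for the full study.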
2.2. An Example
We shall now illustrate our methodology using an example taken from Lawless on the lifetimes of steel specimens tested at 14 different stress levels (see Table G.4 in Lawless, p. 574). The lifetimes at stress levels 32.0 and 33.0 are noted to follow one-parameter exponential distributions. There are 24 observations in the first set and 20 observations in the second set. The maximum likelihood estimates of $\theta_1$ and $\theta_2$ are the respective sample means, and the estimated value of $\Delta$ is 0.8428. We computed confidence intervals for $\Delta$ using (i) the Taylor series approximation, (ii) the percentile bootstrap based on 1000 parametric bootstrap samples, (iii) the bootstrap-$t$ based on 1000 parametric bootstrap samples, and (iv) the GPQ. The confidence intervals constructed using these methods are (0.5985, 1.0453), (0.5883, 0.9922), (0.5878, 1.1216), and (0.7503, 0.9960), respectively. We note that the first three methods give intervals that are somewhat similar; however, the GPQ method has produced an interval that is considerably shorter (length 0.2457, against lengths between 0.4039 and 0.5338 for the other three). Furthermore, the GPQ-based interval indicates that there is substantial overlap between the two distributions. We also note that methods (i) and (iii) have resulted in confidence intervals with upper confidence limits exceeding one, even though the OVL cannot exceed one. One can of course truncate the confidence intervals to be within the parameter space, that is, the interval (0, 1].
3. TWO-PARAMETER EXPONENTIAL DISTRIBUTIONS WITH A COMMON SCALE PARAMETER
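The pdfs are now given by
$$f_i(x) = \frac{1}{\theta}\, e^{-(x-\mu_i)/\theta}, \quad x > \mu_i,$$
for i = 1, 2. The OVL in this case, denoted by $\rho$, is given by
$$\rho = e^{-|\mu_1 - \mu_2|/\theta}. \tag{4}$$
It is obvious that when $\mu_1 = \mu_2$, $\rho = 1$. Let $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ be two independent random samples of sizes $n_1$ and $n_2$ taken from $f_1$ and $f_2$, respectively. Let $\bar{X}_1$ and $\bar{X}_2$ be the respective sample means, and let $X_{1(1)}$ and $X_{2(1)}$ be the respective sample minima. Define
$$U_1 = \frac{2n_1 (X_{1(1)} - \mu_1)}{\theta}, \quad U_2 = \frac{2n_2 (X_{2(1)} - \mu_2)}{\theta}, \quad V = \frac{2S}{\theta}, \quad S = n_1(\bar{X}_1 - X_{1(1)}) + n_2(\bar{X}_2 - X_{2(1)}).$$
Then it is well known that $U_1$, $U_2$, and $V$ are independent, distributed as $\chi^2_2$, $\chi^2_2$, and $\chi^2_{2(n_1+n_2-2)}$, respectively. Let $x_{1(1)}$, $x_{2(1)}$, and $s$ be the observed values of $X_{1(1)}$, $X_{2(1)}$, and $S$, respectively. GPQs for $\mu_1$, $\mu_2$, and $\theta$, say $G_{\mu_1}$, $G_{\mu_2}$, and $G_\theta$, respectively, are given by
$$G_\theta = \frac{2s}{V}, \quad G_{\mu_1} = x_{1(1)} - \frac{U_1}{2n_1}\, G_\theta, \quad G_{\mu_2} = x_{2(1)} - \frac{U_2}{2n_2}\, G_\theta.$$
For $\rho$ given in (4), a GPQ, say $G_\rho$, is then obtained as
$$G_\rho = e^{-|G_{\mu_1} - G_{\mu_2}|/G_\theta}.$$
The percentiles of $G_\rho$ can be used to obtain confidence intervals for $\rho$, and also to test hypotheses concerning $\rho$.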
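The construction for the common-scale case can be sketched in the same way. The code below is an illustration under assumptions, not the authors' implementation: it takes the OVL to be exp(-|mu1 - mu2| / theta) for location parameters mu1, mu2 and common scale theta, summarizes the data by the two sample minima and the pooled statistic s = n1 (xbar1 - min1) + n2 (xbar2 - min2), and uses the hypothetical name `gpq_interval_common_scale`.

```python
import math
import random


def gpq_interval_common_scale(min1, min2, s, n1, n2, level=0.95, draws=10_000, seed=None):
    """Two-sided generalized confidence interval for the OVL with a common scale.

    min1, min2: observed sample minima; s: observed pooled sum of deviations
    from the minima, n1 * (xbar1 - min1) + n2 * (xbar2 - min2).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        u1 = rng.gammavariate(1.0, 2.0)          # chi-square with 2 df
        u2 = rng.gammavariate(1.0, 2.0)          # chi-square with 2 df
        v = rng.gammavariate(n1 + n2 - 2, 2.0)   # chi-square with 2 (n1 + n2 - 2) df
        g_theta = 2.0 * s / v                    # GPQ for the common scale
        g_mu1 = min1 - u1 * g_theta / (2.0 * n1)  # GPQ for the first location
        g_mu2 = min2 - u2 * g_theta / (2.0 * n2)  # GPQ for the second location
        values.append(math.exp(-abs(g_mu1 - g_mu2) / g_theta))
    values.sort()
    tail = (1.0 - level) / 2.0
    lower = values[int(math.floor(tail * (draws - 1)))]
    upper = values[int(math.ceil((1.0 - tail) * (draws - 1)))]
    return lower, upper
```

If the scale estimate 54.45 reported in Section 3.2 is taken to equal s / (n1 + n2), so that s = 40 × 54.45 = 2178, then `gpq_interval_common_scale(65.0, 59.0, 2178.0, 20, 20)` gives an interval that can be compared with the GPQ interval reported there. Results vary slightly from run to run unless a seed is fixed.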
3.1. Simulation Study
The results in Table 2 were obtained using R for one choice of the parameters $\mu_1$, $\mu_2$, and $\theta$, which determines the value of the OVL $\rho$. As was the case with the results in Table 1, we have used 10,000 simulated samples to estimate the coverage probabilities, and for each simulated sample 10,000 values of the GPQ were generated in order to compute the confidence limits. For the bootstrap method, 10,000 parametric bootstrap samples were used. The GPQ-based confidence intervals provide better coverage than the bootstrap intervals for both small and large samples, and for small samples they are also the shortest. Because the expression for the OVL changes form depending on conditions on the parameters, the Taylor series approximation is not considered here.
Table 2. Estimated coverage probabilities of ρ in Eq. (4) using the GPQ and bootstrap methods.
GPQ, generalized pivotal quantity.
3.2. An Example
We shall once again take up the example that we considered in Section 2.2; however, now we shall consider the lifetimes of steel specimens corresponding to stress levels 37.5 and 38.0. There are 20 observations in each sample. It can be verified by a suitable test that the samples come from exponential distributions with the same scale parameter. The maximum likelihood estimates of the parameters $\mu_1$, $\mu_2$, and $\theta$ are 65, 59, and 54.45, respectively. The estimated value of $\rho$ is 0.8957. The GPQ-based confidence interval is (0.7720, 0.9904) and the bootstrap confidence interval is (0.5998, 0.9822). We once again conclude that there is substantial overlap between the two distributions. Consistent with the simulation results on coverage, the GPQ-based interval is also the shorter of the two.
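As a quick arithmetic check, assume the common-scale OVL has the form exp(-|mu1 - mu2| / theta), where mu1, mu2 denote the two location parameters and theta the common scale (notation adopted here). Plugging in the reported estimates reproduces the stated value:

```latex
\hat{\rho}
  = \exp\!\left(-\frac{|\hat{\mu}_1 - \hat{\mu}_2|}{\hat{\theta}}\right)
  = \exp\!\left(-\frac{|65 - 59|}{54.45}\right)
  = \exp(-0.1102)
  \approx 0.8957 .
```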
4. TWO-PARAMETER EXPONENTIAL DISTRIBUTIONS WITH DIFFERENT LOCATION AND SCALE PARAMETERS
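Consider two exponential populations with pdfs
$$f_i(x) = \frac{1}{\theta_i}\, e^{-(x-\mu_i)/\theta_i}, \quad x > \mu_i,$$
for i = 1, 2. When $\theta_1 \neq \theta_2$, the point of intersection of $f_1$ and $f_2$ is easily seen to be
$$x_0 = \frac{\theta_1 \theta_2 \ln(\theta_1/\theta_2) + \mu_2 \theta_1 - \mu_1 \theta_2}{\theta_1 - \theta_2}.$$
The overlap between the curves $f_1$ and $f_2$ can be calculated by considering the following four cases: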
(i) , (ii) , (iii) , and (iv) . The corresponding OVLs will be denoted by respectively, for the four different cases given above. The are as follows:
Case 1: and
Case 2: and
Case 3: and
Case 4: and
The OVL, say $\eta$, can be written as a single expression that combines the four cases by means of indicator functions. We note that the indicator functions are also parameter dependent.
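Consider two independent random samples of sizes $n_1$ and $n_2$ from $f_1$ and $f_2$, respectively. Let $\bar{X}_1$, $\bar{X}_2$ be the respective sample means and $X_{1(1)}$, $X_{2(1)}$ be the respective sample minima of the two samples. Then
$$U_i = \frac{2n_i (X_{i(1)} - \mu_i)}{\theta_i} \sim \chi^2_2, \quad V_i = \frac{2n_i (\bar{X}_i - X_{i(1)})}{\theta_i} \sim \chi^2_{2n_i - 2}, \quad i = 1, 2,$$
where $U_1$, $U_2$, $V_1$, and $V_2$ are also independent. Also let $\bar{x}_1$, $\bar{x}_2$, $x_{1(1)}$, and $x_{2(1)}$ denote the observed values of $\bar{X}_1$, $\bar{X}_2$, $X_{1(1)}$, and $X_{2(1)}$, respectively. Then GPQs for $\mu_1$, $\mu_2$, $\theta_1$, and $\theta_2$ are given below:
$$G_{\theta_i} = \frac{2n_i (\bar{x}_i - x_{i(1)})}{V_i}, \quad G_{\mu_i} = x_{i(1)} - \frac{U_i}{2n_i}\, G_{\theta_i}, \quad i = 1, 2.$$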
Since the OVL is a function of $\mu_1$, $\mu_2$, $\theta_1$, and $\theta_2$, a GPQ for $\eta$, say $G_\eta$, can be obtained by replacing each parameter by the corresponding GPQ in the expression for the OVL. As was noted in the previous sections, the percentiles of $G_\eta$ can be estimated by Monte Carlo simulation, and provide confidence limits for $\eta$.
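The substitution idea can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function `ovl_two_param` evaluates the integral in definition (1) in closed form for two shifted exponential densities and is not a transcription of the case-wise expressions above; the parameter GPQs follow the standard chi-square pivots for the sample minimum and for the mean minus the minimum; all function and argument names are hypothetical.

```python
import math
import random


def ovl_two_param(mu1, th1, mu2, th2):
    """Integral of min(f1, f2) for f_i(x) = exp(-(x - mu_i) / th_i) / th_i, x > mu_i."""
    # Relabel so that population "a" has the smaller location parameter.
    if mu1 <= mu2:
        mu_a, th_a, mu_b, th_b = mu1, th1, mu2, th2
    else:
        mu_a, th_a, mu_b, th_b = mu2, th2, mu1, th1
    start = math.exp(-(mu_b - mu_a) / th_a)  # mass of population a beyond mu_b
    if th_a == th_b:
        return start
    # Sign of log f_a - log f_b at the left end of the common support.
    h = math.log(th_b / th_a) - (mu_b - mu_a) / th_a
    if th_a < th_b and h <= 0:
        return start  # f_a lies below f_b on the whole common support
    # Crossing point of the two densities (lies to the right of mu_b here).
    x0 = (th_a * th_b * math.log(th_a / th_b) + mu_b * th_a - mu_a * th_b) / (th_a - th_b)
    surv_a = math.exp(-(x0 - mu_a) / th_a)
    surv_b = math.exp(-(x0 - mu_b) / th_b)
    if th_a < th_b:
        return (1.0 - surv_b) + surv_a  # f_b is the minimum up to x0, f_a afterwards
    return (start - surv_a) + surv_b    # f_a is the minimum up to x0, f_b afterwards


def gpq_interval_two_param(min1, xbar1, n1, min2, xbar2, n2,
                           level=0.95, draws=10_000, seed=None):
    """Two-sided generalized confidence interval for the OVL, general two-parameter case."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        gpqs = []
        for x_min, x_bar, n in ((min1, xbar1, n1), (min2, xbar2, n2)):
            u = rng.gammavariate(1.0, 2.0)    # chi-square with 2 df
            v = rng.gammavariate(n - 1, 2.0)  # chi-square with 2n - 2 df
            g_theta = 2.0 * n * (x_bar - x_min) / v
            g_mu = x_min - u * g_theta / (2.0 * n)
            gpqs.append((g_mu, g_theta))
        (g_mu1, g_th1), (g_mu2, g_th2) = gpqs
        values.append(ovl_two_param(g_mu1, g_th1, g_mu2, g_th2))
    values.sort()
    tail = (1.0 - level) / 2.0
    return (values[int(math.floor(tail * (draws - 1)))],
            values[int(math.ceil((1.0 - tail) * (draws - 1)))])
```

The inputs are the observed minimum and mean of each sample; the maximum likelihood estimates are then the minimum for the location and the mean minus the minimum for the scale. With equal scales `ovl_two_param` reduces to exp(-|mu1 - mu2| / theta), and with both locations at zero it reduces to the one-parameter expression.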
4.1. Simulation Study
Table 3 gives the estimated coverage probabilities of the (i) GPQ-based and (ii) percentile bootstrap confidence intervals for the OVL at the 95% nominal level, for one choice of the parameters $\mu_1$, $\mu_2$, $\theta_1$, and $\theta_2$, which determines the value of the OVL $\eta$. As before, we have used 10,000 simulated samples to estimate the coverage probabilities, and for each simulated sample 10,000 values of the GPQ were generated in order to compute the confidence limits. Also, 10,000 bootstrap samples were used. We note that the GPQ methodology has resulted in confidence intervals that maintain the coverage probabilities fairly accurately and that have the shortest length for small samples.
Table 3. Estimated coverage probabilities of the OVL coefficient η based on GPQ and bootstrap methods.
GPQ, generalized pivotal quantity; OVL, overlapping coefficient.
4.2. An Example
Continuing with the data set from Lawless that we have used earlier, we shall now consider the lifetimes of steel specimens tested at stress levels 34 and 37.5. Each set consists of 20 observations. The two sets are exponentially distributed with different scale parameters. The maximum likelihood estimates of the parameters $\mu_1$, $\mu_2$, $\theta_1$, and $\theta_2$ are 146, 65, 329.8, and 57.8, respectively. The estimated value of the OVL is 0.2463. The GPQ-based confidence interval is (0.1569, 0.4603) and the corresponding bootstrap confidence interval is (0.0551, 0.3142). Unlike what was noted earlier, the overlap between the two distributions is not that substantial.
In this brief note, we have presented a unified approach, based on the concept of a GPQ, for computing confidence limits for the OVL of two exponential distributions, both one-parameter and two-parameter. Through numerical results, we have demonstrated the accuracy of the proposed method in terms of meeting the coverage probability requirement. Furthermore, illustrative examples are provided. Though the proposed solutions have to be numerically obtained, the required computations are both simple and straightforward.
Cite this article
Sibil Jose, Seemon Thomas, Thomas Mathew. Interval Estimation of the Overlapping Coefficient of Two Exponential Distributions. Journal of Statistical Theory and Applications, 18(1), 26-32, 2019 (published 2019/04/22). ISSN 2214-1766. https://doi.org/10.2991/jsta.d.190306.004