An Asymptotic Two-Sided Test in a Family of Multivariate Distribution
- https://doi.org/10.2991/jsta.d.200511.001
- Keywords: Asymptotic two-sided test; Chi-squared distribution; Efficient algorithm; Multivariate distribution elements
In the present paper, a two-sided test in a family of multivariate distributions, defined through the Mahalanobis distance with a mean vector and positive definite scale matrix, is considered. First, a family of multivariate distributions is introduced; then a test statistic is computed using the likelihood ratio method. The distribution of the test statistic is proposed for different sample sizes and fixed dimension. We study the distribution approximation obtained from the likelihood ratio test; an efficient algorithm to compute the relevant density functions can be derived following Witkovský, J. Stat. Plan. Inference 94 (2001), 1-13. A simulation study of sizes and powers is presented to compare the performance of the tests and to show that the proposed distribution approximation is better than the classical one.
- © 2020 The Authors. Published by Atlantis Press SARL.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
In multivariate analysis, testing the independence of the elements of a random vector is an important topic of interest. We are interested in testing the independence of its elements based on a random sample drawn from this population. In classical multivariate analysis, the dimension is fixed or relatively small compared with the sample size, and the likelihood ratio test is an effective way to test the hypothesis of independence. Also, in multivariate distributions, testing the independence of grouped elements is a topic of interest. Testing independence of vector elements, such as in financial data, consumer data, modern manufacturing data and multimedia data, has always been a matter of interest. The likelihood ratio test method can be used for testing such hypotheses. When the dimension remains fixed and the sample size goes to infinity, classical theory states that the null distribution of the likelihood ratio statistic converges to a chi-squared distribution. Van der Laan and Bryan show that the sample mean of high-dimensional data can consistently estimate the population mean uniformly across dimensions for bounded random variables. In a major generalization, Kosorok and Ma consider uniform convergence for a range of univariate statistics constructed for each data dimension, including the marginal empirical distribution, sample mean and sample median. Fan et al. evaluated approximating the overall level of significance for testing of means. They demonstrate that the bootstrap can accurately approximate the overall level of significance when the marginal tests are performed based on the normal or the t-distributions. See also Fan et al. and Huang et al. for estimation and testing in semiparametric regression models.
Wang et al. proposed a novel high-dimensional nonparametric test for the population mean vector for a general class of multivariate distributions. They proved that the limiting null distribution of the proposed test is normal under mild conditions when the dimension is substantially larger than the sample size, studied the local power of the proposed test, and compared its relative efficiency with a modified Hotelling test for high-dimensional data. They further illustrated its application by an empirical analysis of a genomics data set. Li and Liu considered the problem of testing the complete independence of random variables when the dimension of the observations can be much larger than the sample size. They introduced a permutation test, and simulation results showed that for finite dimension and sample size the proposed test outperforms existing methods in various cases.
Schott developed a simpler test procedure, based on the sample correlation matrix, specifically designed for high-dimensional data. Schott also proposed a simple statistic for testing the equality of the covariance matrices of several multivariate normal populations when the dimension is large relative to the sample sizes. Huster and Li investigated testing the existence of the dependence function under the null hypothesis of asymptotic independence and presented two suitable test statistics; small simulation studies and an application to real data are given. The asymptotic null distribution of this statistic, as both the sample sizes and the number of variables go to infinity, is shown to be normal. For more information on testing covariance matrices in high-dimensional data see, for example, Ledoit et al. (2002), Bai et al., Chen et al., Jiang et al. and Jiang and Yang. Tsai et al. showed that the existing tests for asymptotic independence are sensitive to outliers, and proposed a robust test made stable under contamination through a shrinkage scheme. Thus, many of these procedures will only be reliable when the sample sizes are substantially larger than the dimension. A better approach in this high-dimensional setting is to use a procedure based on asymptotic theory in which both the dimension and the sample sizes approach infinity. Some examples of recent work on inference problems in this high-dimensional setting include Birke and Dette, Ledoit and Wolf, Fujikoshi, Schott and Srivastava. More results on ordered hypothesis tests, especially in multivariate normal distributions, are given in Bazyari and Pesarin, Bazyari, and Bazyari and Afshari.
Testing independence among the components of a p-variate normal vector has been studied by Jiang and Yang. Extending the results of Jiang and Yang (Theorem 2) to the case of more than one population, testing independence of the components of high-dimensional normal vectors is considered. We are interested in a two-sided test in a family of multivariate distributions. The same problem has been considered by Jiang and Yang, and Jiang and Qi (2013), when the number of partition blocks is fixed. The aim of this work is to extend the test to arbitrary partitions and to allow the number of blocks to change with the sample size.
The rest of the paper is organized as follows: In Section 2, we introduce a family of multivariate distributions consisting of a p-dimensional t-distribution with parameters μ (real location vector), Σ (real positive definite scale matrix) and ν (positive real degrees of freedom), defined through the Mahalanobis distance between x and μ, and derive an asymptotic test using the likelihood ratio test statistic. In Section 3, the asymptotic distribution of the test statistic for a two-sided test is given. In Section 4, a simulation study on the size and power of the tests is presented. Concluding remarks are given in Section 5. The complete source programs are written in statistical software.
2. A FAMILY OF MULTIVARIATE DISTRIBUTION
As a multivariate version of Jones' (2004) univariate construction defined in Anderson, Castillo and Sarabia (2006) have proposed multivariate distributions based on an enriching process, using a representation of a p-dimensional random vector with a given distribution due to Rosenblatt. Here a multivariate normal distribution is considered. Let the p-dimensional vector X be distributed as the family of multivariate distributions given in Marshall and Olkin. For example, one can consider a p-dimensional t-distribution with parameters μ (real location vector), Σ (p × p real positive definite scale matrix) and ν (positive real degrees of freedom), whose density is given by

f(x) = Γ((ν + p)/2) / [Γ(ν/2) (νπ)^(p/2) |Σ|^(1/2)] · [1 + δ(x, μ; Σ)/ν]^(−(ν + p)/2),

where δ(x, μ; Σ) = (x − μ)′ Σ^(−1) (x − μ) is the squared Mahalanobis distance between x and μ, and Γ(·) denotes the Gamma function. A difficulty with this standard representation of the t-distribution is that when Σ is diagonal the components have zero correlation but the marginal distributions are not statistically independent. Equivalently, the product of independent univariate t-distributions with the same degrees of freedom is not a standard multivariate t-distribution with a diagonal scale matrix. We will see that, in contrast, the multivariate generalization we propose has this property and contains the product of independent t-distributions as a particular case. Also, as mentioned by Kotz and Nadarajah, the standard t-distribution belongs to the class of elliptically contoured distributions (see for instance Fang et al. for a definition of elliptical distributions). We will see in the next section that our generalization allows a greater variety of shapes and, in particular, contours that are not necessarily elliptic. Note, however, that our proposal is different from the meta-elliptical distributions of Fang et al.
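As a concrete numerical check, the density above can be evaluated through its logarithm for stability. The following is a minimal sketch assuming the standard p-variate t parameterization just described; the function name `mvt_pdf` is ours, not from the paper's programs.

```python
import math
import numpy as np

def mvt_pdf(x, mu, Sigma, nu):
    """Density of the p-variate t-distribution with location mu, positive
    definite scale matrix Sigma and nu degrees of freedom, evaluated at x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = mu.size
    diff = x - mu
    delta = diff @ np.linalg.solve(Sigma, diff)   # squared Mahalanobis distance
    # log of Gamma((nu+p)/2) / [Gamma(nu/2) (nu*pi)^(p/2) |Sigma|^(1/2)]
    logc = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
            - 0.5 * p * math.log(nu * math.pi)
            - 0.5 * np.linalg.slogdet(Sigma)[1])
    return math.exp(logc - 0.5 * (nu + p) * math.log1p(delta / nu))
```

For p = 1 this reduces to the usual Student t density, and for large ν it approaches the multivariate normal density, which gives two easy sanity checks.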
Most of the work on multivariate scale mixtures of Gaussians has focused on studying different choices for the weight distribution; surprisingly little work, to our knowledge, has focused on the dimension of the weight variable W, which in most cases has been considered as univariate. The difficulty in considering multiple weights is the interpretation of such a multidimensional quantity. The extension we propose consists of introducing the eigendecomposition of the scale matrix, Σ = A D A′, where A is the matrix of eigenvectors of Σ and D the diagonal matrix of its eigenvalues. The matrix A determines the orientation of the Gaussian and D its shape. Such a parameterization has the advantage of allowing an intuitive incorporation of the multiple weight parameters, one per eigen-direction. The generalization we propose is therefore to define

p(x; μ, Σ, θ) = ∫ N_p(x; μ, A D Δ_w^(−1) A′) f_W(w; θ) dw,

where Δ_w = diag(w_1, …, w_p) and the integral runs over w in (0, ∞)^p.
We can then use one of the equivalent factorized expressions for this density, where [A′(x − μ)]_i denotes the i-th component of the vector A′(x − μ) and D_i the i-th diagonal element of the diagonal matrix D (or, equivalently, the i-th eigenvalue of Σ). It then follows from (1) that the density factorizes over the p eigen-directions.
The terms in the product then reduce to standard univariate scale mixtures. Another generative way to see this construction, which is useful for simulation, consists of simulating a p-dimensional Gaussian variable Z with mean zero and covariance matrix equal to the identity matrix, and considering p independent positive variables W_1, …, W_p with respective distributions f_{W_i}. Then the vector

X = μ + A D^(1/2) Δ_w^(−1/2) Z

follows one of the distributions below, depending on the choice of f_W. For example, setting each f_{W_i} to a Gamma distribution results in a multivariate generalization of a Pearson type VII distribution. Setting W_i ~ Gamma(ν_i/2, ν_i/2) leads to a generalization of the multivariate t-distribution. In Figure 1, we show some of the different shapes in a two-dimensional setting for different values of the weight parameters, with the location and scale parameters held fixed (additional examples are shown in Figure 1 of the Supplementary Materials).
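The generative construction just described can be sketched directly. The following minimal Python illustration uses our reading of the text, with per-axis weights W_i ~ Gamma(ν_i/2, rate ν_i/2); the function `sample_generalized_t` and the chosen parameter values are hypothetical examples, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_generalized_t(n, mu, A, D, nu):
    """Draw n vectors X = mu + A D^(1/2) diag(W)^(-1/2) Z with Z ~ N(0, I_p)
    and one independent weight W_i ~ Gamma(nu_i/2, rate nu_i/2) per axis.
    Equal nu_i recover (a rotation of) the classical multivariate t; unequal
    nu_i give different tail weight along each principal axis."""
    p = len(mu)
    nu = np.broadcast_to(np.asarray(nu, dtype=float), (p,))
    Z = rng.standard_normal((n, p))
    W = rng.gamma(shape=nu / 2, scale=2 / nu, size=(n, p))  # Gamma(nu/2, rate nu/2)
    Y = np.sqrt(np.diag(D)) * Z / np.sqrt(W)                # scale, then mix per axis
    return np.asarray(mu) + Y @ A.T

# Two-dimensional example: orientation matrix A (a rotation), shape matrix D.
theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D = np.diag([4.0, 1.0])
X = sample_generalized_t(20000, np.zeros(2), A, D, nu=[3.0, 30.0])
```

With unequal degrees of freedom per axis (here 3 and 30), scatter plots of X show the non-elliptical contours discussed in the text.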
2.1. An Asymptotic Test
We can write a vector of independent variables whose distributions are given by
In the t-distribution and Pearson VII distribution cases, the i-th term follows, respectively, a standard one-dimensional (1D) t-distribution and a standard 1D Pearson VII distribution. In the t-distribution case, a 1D marginal is then a linear combination of standard 1D t-distributions, for which in the general case no closed-form expression is available. However, an efficient algorithm to compute such pdfs can be derived following Witkovský. The derivation in Witkovský is based on the inversion formula for the characteristic function φ, which in the univariate case is

f(x) = (1/2π) ∫ e^(−isx) φ(s) ds,

with the integral taken over the real line.
We use the tdist R package of V. Witkovský, available at http://aiolos.um.savba.sk/~viktor/software.html, to plot the pdf of some marginals and compare it with 1D t-distributions. We also plot the histogram obtained by simulation to illustrate its consistency with the marginal pdf formula. The fact that the marginals are not in general t-distributions is a notable difference from other multivariate generalizations. Returning to the family of multivariate distributions, we partition the random vector X into k components as X = (X(1)′, …, X(k)′)′, where X(i) has dimension p_i and p_1 + … + p_k = p. Similarly, the mean vector and the covariance matrix are partitioned as μ = (μ(1)′, …, μ(k)′)′ and Σ = (Σ_ij), where Σ_ij is the (i, j)-th block of the covariance matrix.
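As an illustration of the characteristic-function inversion route, the pdf of a linear combination of independent t variables can be computed from the known Student-t characteristic function (which involves the modified Bessel function K). This sketch is a simple direct numerical inversion, not Witkovský's tdist algorithm; the function names are ours.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln, kv

def t_cf(s, nu):
    """Characteristic function of the standard Student t with nu degrees of
    freedom: phi(s) = K_{nu/2}(sqrt(nu)|s|) (sqrt(nu)|s|)^(nu/2)
                      / (Gamma(nu/2) 2^(nu/2 - 1))."""
    s = abs(s)
    if s == 0.0:
        return 1.0
    z = np.sqrt(nu) * s
    return kv(nu / 2, z) * np.exp((nu / 2) * np.log(z) - gammaln(nu / 2)
                                  - (nu / 2 - 1) * np.log(2.0))

def lincomb_t_pdf(x, coeffs, nus):
    """pdf at x of sum_j c_j T_j for independent T_j ~ t_{nu_j}, obtained by
    numerically inverting the product of characteristic functions:
    f(x) = (1/pi) * integral_0^inf cos(s x) * prod_j phi_j(c_j s) ds."""
    def integrand(s):
        val = np.cos(s * x)
        for c, nu in zip(coeffs, nus):
            val *= t_cf(c * s, nu)
        return val
    # phi decays like exp(-sqrt(nu) s), so truncating at s = 60 is safe here.
    return quad(integrand, 1e-9, 60.0, limit=300)[0] / np.pi
```

With a single coefficient equal to 1 this recovers the ordinary t density, which provides a direct check of the inversion.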
Given these conventions, the null hypothesis considered in this paper is that the components X(1), …, X(k) are mutually independently distributed, i.e., the density of X can be written as the product of the density functions of the X(i). Fixing the last density, the joint density can therefore be expressed as
For a vector of independent variables whose distributions are given above, this condition holds.
In fact we can write the following hypothesis:
Testing the null hypothesis against the alternative is a two-sided test. This test can be performed using different methods when the mean vectors and covariance matrices are unknown. To do this, let x_1, …, x_n be the observations on the vector X. Then the likelihood function of the observed data is given by
It can be verified (Anderson, Theorem 3.2.1) that the likelihood is maximized at the sample mean vector and the sample covariance matrix. It is easy to see that, under the hypothesis given in (4), the covariance matrix is equal to the block-diagonal matrix with diagonal blocks Σ_11, …, Σ_kk. Therefore, under the null hypothesis, we have
Hence, the likelihood ratio test statistic is obtained as
Note that the statistic in (5) exists only when the sample size exceeds the dimension; here, we suppose that n > p. By the general theory of likelihood ratio tests, the null distribution of the statistic converges to a chi-squared distribution as the sample size goes to infinity while the dimension is fixed, with degrees of freedom equal to the number of constraints imposed by the null hypothesis.
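For intuition, the classical normal-theory version of this likelihood ratio test can be sketched as follows. This is an illustrative Python implementation under the standard Gaussian block-independence setup, with the usual statistic −n log(|S| / ∏|S_ii|) and (p² − Σ p_i²)/2 degrees of freedom; the function names are ours, not the authors' program.

```python
import numpy as np
from scipy.stats import chi2

def block_independence_lrt(X, blocks):
    """-2 log likelihood ratio for H0: the coordinate groups in `blocks`
    are mutually independent (normal-theory statistic -n log(|S|/prod|S_ii|)),
    with its chi-square p-value on (p^2 - sum p_i^2)/2 degrees of freedom,
    valid for fixed p and large n."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)  # MLE of the covariance matrix
    logdet = np.linalg.slogdet(S)[1]
    logdet_blocks = sum(np.linalg.slogdet(S[np.ix_(b, b)])[1] for b in blocks)
    stat = -n * (logdet - logdet_blocks)    # nonnegative by Fischer's inequality
    df = (p ** 2 - sum(len(b) ** 2 for b in blocks)) // 2
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))           # independent coordinates: H0 true
stat, pval = block_independence_lrt(X, [[0, 1], [2], [3, 4]])
```

For the partition (2, 1, 2) of p = 5 the degrees of freedom are (25 − 9)/2 = 8, matching the fixed-dimension chi-square limit quoted above.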
3. ASYMPTOTIC DISTRIBUTION OF TEST STATISTIC
Under the null hypothesis given in (4), for the joint density of all sample random variables and the likelihood ratio statistic, the following result holds.
The standardized random variable converges in distribution to the standard normal distribution as the dimension goes to infinity, with the centering and scaling constants given below, provided the stated conditions hold and the given fraction converges as the dimension goes to infinity.
Part (a) holds for the term below, where D_i is the i-th diagonal element of the diagonal matrix D.
First, from Hardy et al., if real numbers a_1, …, a_m are all greater than −1 and are all positive or all negative, then ∏_{i=1}^m (1 + a_i) > 1 + Σ_{i=1}^m a_i. For the given quantities, taking suitable choices of the a_i, we see that
Therefore, the bound holds. Then, when the dimension goes to infinity, the fraction converges to the stated limit.
One can easily show that all of these conditions hold for the following p-dimensional t-distribution:
For the first case, the fraction converges to the stated limit as the dimension goes to infinity. For the second case, the limit is obvious. Now fix a value satisfying the required condition and set the corresponding quantity.
It is easy to see that the fraction converges to the stated limit as the dimension goes to infinity, together with the stated fact.
Letting the dimension go to infinity and fixing an index for which the condition holds, the term converges to the stated limit in one case and to 0 in the other; consequently, the product converges to the corresponding limit in each case.
Furthermore, for the p-dimensional t-distribution, the stated identity holds if and only if the fraction converges to the stated limit as the dimension goes to infinity. For real-valued density functions, the inequality also holds, which gives the first bound. Similarly, since the relevant term is an increasing function, the second bound follows, and therefore the limit holds. Using Theorem 11.2.3 of Muirhead, when the hypothesis is true, the h-th moment of the likelihood ratio statistic given in (4) is expressed through the multivariate gamma function: for a complex number z with Re(z) > (m − 1)/2, the multivariate gamma function (Muirhead, p. 62) is Γ_m(z) = π^(m(m−1)/4) ∏_{i=1}^m Γ(z − (i − 1)/2).
Moreover, the stated convergence holds as the dimension goes to infinity, and the equivalence holds if and only if the corresponding fraction converges as the dimension goes to infinity. On the other hand, the standardized random variable converges in distribution to the standard normal distribution as the dimension goes to infinity, and this completes the proof.
The proof of this part is similar to that of part (a).
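The multivariate gamma function appearing in the moment formula above can be evaluated directly from its definition (Muirhead, p. 62); a minimal sketch, with the function name `log_multivariate_gamma` being ours:

```python
import math

def log_multivariate_gamma(a, d):
    """log Gamma_d(a) = (d(d-1)/4) log(pi) + sum_{j=1}^d log Gamma(a + (1-j)/2),
    the normalizing function appearing in the moments of the likelihood ratio
    statistic; requires a > (d-1)/2 for all Gamma arguments to be positive."""
    return (d * (d - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, d + 1))
```

For d = 1 this reduces to the ordinary log-Gamma function, and it agrees with scipy's `multigammaln`.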
4. COMPARE THE PERFORMANCE OF TESTS
In this section, we compare the performance of the chi-square approximation and the normal approximation through a finite-sample simulation study. We plot the histograms of the chi-square statistics used for the chi-square approximation and compare them with their corresponding limiting chi-square curves. In this simulation, J_p stands for the p × p matrix whose entries are all equal to 1, and I_p is the p × p identity matrix. Also, without loss of generality, we suppose that all the population covariance matrices are equal to Σ = (1 − ρ)I_p + ρJ_p, the mean vectors are all equal to zero, and the nominal Type I error rate is 0.05. Furthermore, we consider the same partition of p for all distributions. For each combination of sample size, dimension, ρ and partition, using 20,000 replications from the multivariate normal distribution with mean vector 0 and covariance matrix Σ, the simulated values of size (ρ = 0) and power (ρ > 0) for the chi-square and normal approximations are calculated and given in Tables 1 and 2. Also, in Figure 2, for different values of the simulation parameters, the histograms of 20,000 simulated null values of the two statistics, with superimposed standard normal and chi-square curves, are pictured. We choose several sample sizes and dimensions, with the same partition of p for each distribution, as listed in the tables.
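The simulation design above can be mimicked on a small scale. The sketch below estimates the empirical size of the chi-square test under ρ = 0 (i.e. Σ = I_p); the helper names, the smaller replication count and the chosen n and p are our illustrative choices, not the authors' program.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2020)

def lrt_stat(X, blocks):
    # -n log(|S| / prod |S_ii|): normal-theory statistic for block independence.
    n = X.shape[0]
    S = np.cov(X, rowvar=False, bias=True)
    return -n * (np.linalg.slogdet(S)[1]
                 - sum(np.linalg.slogdet(S[np.ix_(b, b)])[1] for b in blocks))

def empirical_size(n, p, blocks, reps=2000, alpha=0.05):
    """Monte Carlo rejection rate of the chi-square test under H0 (rho = 0,
    so Sigma = I_p), mirroring the design of Table 1 on a smaller scale."""
    df = (p ** 2 - sum(len(b) ** 2 for b in blocks)) / 2
    crit = chi2.ppf(1 - alpha, df)
    rejections = sum(lrt_stat(rng.standard_normal((n, p)), blocks) > crit
                     for _ in range(reps))
    return rejections / reps

size_small_p = empirical_size(n=50, p=5, blocks=[[0, 1], [2], [3, 4]])
```

For moderate n and small p the estimated size stays near the nominal 0.05 level; the tables below show how it deteriorates as p grows relative to n.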
| ρ | p | Partition of p | Size under Chi-square Approximation | Size under Normal Approximation |
|---|---|---|---|---|
| 0 | 5 | 2, 1, 2 | 0.454 | 0.038 |
| 0 | 10 | 4, 3, 3 | 0.761 | 0.042 |
| 0 | 20 | 4, 5, 6, 5 | 0.923 | 0.052 |
| 0 | 20 | 5, 7, 8 | 0.636 | 0.051 |
| 0 | 45 | 17, 15, 13 | 0.801 | 0.061 |
| 0 | 50 | 15, 17, 18 | 0.745 | 0.050 |
| 0 | 95 | 15, 30, 30, 20 | 0.922 | 0.055 |
| 0 | 80 | 12, 24, 13, 21, 10 | 0.481 | 0.054 |
| 0 | 145 | 50, 70, 25 | 0.792 | 0.054 |

Table 1. The simulated values of size for the two tests under the null hypothesis (each partition sums to p).
| ρ | p | Partition of p | Power under Chi-square Approximation | Power under Normal Approximation |
|---|---|---|---|---|
| 0.05 | 5 | 2, 1, 2 | 0.750 | 0.895 |
| 0.05 | 10 | 4, 3, 3 | 0.739 | 0.812 |
| 0.05 | 20 | 4, 5, 6, 5 | 0.634 | 0.730 |
| 0.05 | 20 | 5, 7, 8 | 0.694 | 0.804 |
| 0.05 | 45 | 17, 15, 13 | 0.605 | 0.729 |
| 0.05 | 50 | 15, 17, 18 | 0.669 | 0.914 |
| 0.05 | 95 | 15, 30, 30, 20 | 0.504 | 0.823 |
| 0.05 | 80 | 12, 24, 13, 21, 10 | 0.610 | 0.751 |
| 0.05 | 145 | 50, 70, 25 | 0.592 | 0.714 |
| 0.6 | 5 | 2, 1, 2 | 0.653 | 0.870 |
| 0.6 | 10 | 4, 3, 3 | 0.624 | 0.813 |
| 0.6 | 20 | 4, 5, 6, 5 | 0.548 | 0.685 |
| 0.6 | 20 | 5, 7, 8 | 0.624 | 0.824 |
| 0.6 | 45 | 17, 15, 13 | 0.603 | 0.741 |
| 0.6 | 50 | 15, 17, 18 | 0.525 | 0.745 |
| 0.6 | 95 | 15, 30, 30, 20 | 0.511 | 0.655 |
| 0.6 | 80 | 12, 24, 13, 21, 10 | 0.603 | 0.771 |
| 0.6 | 145 | 50, 70, 25 | 0.590 | 0.723 |

Table 2. The simulated values of power for the two tests under the alternative hypothesis (each partition sums to p).
The plots in the top row of Figure 2 indicate that the histogram of the proposed statistic and the standard normal curve match better as the dimension becomes larger, while the plots in the bottom row show that the histogram of the classical statistic gradually moves away from the chi-square curve as the dimension grows.
From Tables 1 and 2, we infer that our normal approximation and the classical chi-square approximation are comparable for large sample sizes and small dimensions. Under the null hypothesis, as the dimension increases, the simulated size of the chi-square approximation tends to one, whereas the simulated size of our proposed method remains around the nominal 0.05 level.
From Table 2, we see that the power of the normal approximation is greater than the power of the chi-square approximation, and that for any fixed sample size the power of our normal approximation decreases as the dimension increases.
5. CONCLUDING REMARKS
In this paper, an asymptotic two-sided test in a family of multivariate distributions with mean vector and positive definite scale matrix was considered. Using the likelihood ratio method, a test statistic was computed and its asymptotic distribution proposed. We studied the distribution approximation computed using the likelihood ratio test; an efficient algorithm to compute the relevant density functions can be derived following Witkovský. A simulation study of sample sizes and powers showed that the proposed distribution approximation outperforms the classical one.
CONFLICT OF INTEREST
There is no conflict of interest to declare.
The authors gratefully thank the Editor-in-Chief and the referees for their constructive comments and recommendations, which helped improve the readability and quality of the paper.
Cite this article
TY - JOUR AU - Abouzar Bazyari AU - Mahmoud Afshari AU - Monjed H. Samuh PY - 2020 DA - 2020/05/26 TI - An Asymptotic Two-Sided Test in a Family of Multivariate Distribution JO - Journal of Statistical Theory and Applications SP - 162 EP - 172 VL - 19 IS - 2 SN - 2214-1766 UR - https://doi.org/10.2991/jsta.d.200511.001 DO - https://doi.org/10.2991/jsta.d.200511.001 ID - Bazyari2020 ER -