Journal of Statistical Theory and Applications

Volume 18, Issue 3, September 2019, Pages 244 - 258

A Multivariate Skew-Normal Mean-Variance Mixture Distribution and Its Application to Environmental Data with Outlying Observations

Authors
M. Tamandi1, *, N. Balakrishnan2, A. Jamalizadeh3, M. Amiri4
1Department of Statistics, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran
2Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, Canada
3Department of Statistics, Faculty of Mathematics and Computers, Shahid Bahonar University of Kerman, Kerman, Iran
4Department of Statistics, Faculty of Basic Sciences, University of Hormozgan, Bandar Abbas, Iran
*Corresponding author. Email: tamandi@vru.ac.ir
Received 21 February 2019, Accepted 14 June 2019, Available Online 3 July 2019.
DOI
10.2991/jsta.d.190617.001
Keywords
Multivariate distribution; Birnbaum–Saunders; ECM algorithm; Outliers; Mean-variance mixtures
Abstract

The presence of outliers, skewness, kurtosis, and dependency are well-known challenges when fitting distributions to many data sets. Developing multivariate distributions that can properly accommodate all these aspects has been the aim of several researchers. In this regard, we introduce here a new multivariate skew-normal mean-variance mixture based on the Birnbaum–Saunders distribution. The resulting model is a good alternative to some skewed distributions, especially the skew-t model. The proposed model is quite flexible in terms of tail behavior and skewness, and also displays good performance in the presence of outliers. For the determination of maximum likelihood estimates, a computationally efficient Expectation-Conditional-Maximization (ECM) algorithm is developed. The performance of the proposed estimation methodology is illustrated through Monte Carlo simulation studies as well as with some real-life examples.

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

Although the normal distribution has a central role in statistical analysis, the presence of outliers or atypical observations becomes problematic in some applications. This problem often occurs in fields such as the environmental and financial sciences. For example, in a random sample of variables measuring water quality or air pollution in a specific area, it is quite common to encounter outliers in the data. Similarly, outlying observations may be seen in the analysis of stock returns. For further details on this issue, one may refer to [1] and [2]. When data are noisy, fitting a suitable distribution does become a challenge. Adding a tail or skewness parameter to a normal distribution is a recognized way to accommodate possible outliers. Two well-known formulations that follow this approach are by Barndorff-Nielsen [3] and Azzalini [4].

Normal mean-variance mixture (NMVM) distributions were introduced by Barndorff-Nielsen [3]. Specifically, if Y is a p×1 random vector following an NMVM distribution, then Y can be represented as

Y=\xi+W\lambda+W^{1/2}Z, \tag{1}

where Z \sim N_p(0,\Sigma) and W>0 is a scalar-valued random variable that is independent of Z. The parameter vectors \xi and \lambda are in \mathbb{R}^p, and \Sigma is a positive definite matrix. The generalized hyperbolic (GH) distribution, introduced by Barndorff-Nielsen [5], is one of the prominent distributions in this class. This class, in addition, includes several important distributions, such as the skew Laplace [6] and normal inverse Gaussian (NIG) [3] distributions, as special cases. A random variable with GH distribution can be represented as an NMVM variable when W in (1) is a generalized inverse Gaussian (GIG) random variable.

A positive random variable W follows a GIG distribution, denoted by W \sim GIG(\kappa,\chi,\psi), if its probability density function (PDF) is given by

f_{\mathrm{GIG}}(w;\kappa,\chi,\psi)=\frac{(\psi/\chi)^{\kappa/2}\,w^{\kappa-1}}{2K_{\kappa}(\sqrt{\chi\psi})}\exp\left\{-\frac{1}{2}\left(w^{-1}\chi+w\psi\right)\right\},\qquad w>0,
where K_\kappa (with \kappa\in\mathbb{R}) denotes the modified Bessel function of the third kind, with the property K_\kappa=K_{-\kappa}. The parameters \chi and \psi are such that \chi\ge 0, \psi>0 if \kappa>0; \psi\ge 0, \chi>0 if \kappa<0; and \chi>0, \psi>0 if \kappa=0. This density is unimodal and contains the gamma and inverse gamma densities as special cases when \chi=0 and \psi=0, respectively.
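The GIG density above can be evaluated directly from the modified Bessel function. The following Python sketch (the helper name `dgig` is ours; NumPy/SciPy are assumed) checks numerically that the density integrates to one and that K_κ = K_{−κ}:

```python
import numpy as np
from scipy.special import kv            # modified Bessel function of the third kind, K_kappa
from scipy.integrate import quad

def dgig(w, kappa, chi, psi):
    """GIG(kappa, chi, psi) density for w > 0, as displayed above."""
    const = (psi / chi) ** (kappa / 2) / (2.0 * kv(kappa, np.sqrt(chi * psi)))
    return const * w ** (kappa - 1.0) * np.exp(-0.5 * (chi / w + psi * w))

# The density should integrate to one for admissible parameters:
total, _ = quad(dgig, 0, np.inf, args=(0.8, 1.2, 2.0))
```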

Let the random variable Y be as in (1), where W \sim GIG(\kappa,\chi,\psi). Then, the PDF of Y is given by

f_{GH_p}(y;\xi,\Sigma,\lambda,\kappa,\chi,\psi)=\frac{(\psi/\chi)^{\kappa/2}\left(\lambda^{\top}\Sigma^{-1}\lambda+\psi\right)^{p/2-\kappa}}{(2\pi)^{p/2}\,|\Sigma|^{1/2}\,K_{\kappa}(\sqrt{\chi\psi})}\exp\left\{(y-\xi)^{\top}\Sigma^{-1}\lambda\right\}\frac{K_{\kappa-p/2}\!\left(\sqrt{\left(\chi+Q(y)\right)\left(\lambda^{\top}\Sigma^{-1}\lambda+\psi\right)}\right)}{\left(\sqrt{\left(\chi+Q(y)\right)\left(\lambda^{\top}\Sigma^{-1}\lambda+\psi\right)}\right)^{p/2-\kappa}},
for y\in\mathbb{R}^p, where Q(y)=(y-\xi)^{\top}\Sigma^{-1}(y-\xi). The corresponding cumulative distribution function (CDF) is denoted by F_{GH_p}(\cdot\,;\xi,\Sigma,\lambda,\kappa,\chi,\psi). For further details about the GIG and GH distributions, see [7].

Other members of the NMVM class of distributions can be introduced similarly by choosing other suitable mixing random variables W. For example, let W be a Birnbaum–Saunders (BS) [8] random variable with shape parameter \alpha and scale parameter \beta, denoted by W \sim BS(\alpha,\beta). The PDF of W is given by

f_{BS}(w;\alpha,\beta)=\frac{w+\beta}{2\alpha\sqrt{\beta w^{3}}}\,\phi\!\left(\frac{1}{\alpha}\left(\sqrt{\frac{w}{\beta}}-\sqrt{\frac{\beta}{w}}\right)\right),\qquad w>0;\ \alpha,\beta>0.
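As a quick numerical check of the BS density (a sketch under the same notation; the helper name `dbs` is ours), one can verify that it integrates to one and that its mean matches E(W) = 1 + α²/2, the moment quoted later in Lemma A.2:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def dbs(w, alpha, beta):
    """Birnbaum-Saunders(alpha, beta) density for w > 0."""
    z = (np.sqrt(w / beta) - np.sqrt(beta / w)) / alpha
    return (w + beta) / (2.0 * alpha * np.sqrt(beta * w ** 3)) * norm.pdf(z)

alpha = 0.5
total, _ = quad(dbs, 0, np.inf, args=(alpha, 1.0))             # should be ~1
mean_w, _ = quad(lambda w: w * dbs(w, alpha, 1.0), 0, np.inf)  # should be ~1 + alpha^2/2
```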

Now, assuming W \sim BS(\alpha,1) in (1), Pourmousa et al. [9] presented an NMVM of BS (NMVBS) distribution. They studied some properties of this model and illustrated its application in autoregressive conditional heteroskedastic (ARCH) models.

To shift from symmetric distributions to asymmetric ones, Azzalini [4] proposed the univariate skew-normal (SN) distribution, which can be used as a substitute for the normal distribution in the modeling of data displaying some asymmetry. The multivariate version of the SN (MSN) distribution was introduced by Azzalini and Dalla Valle [10] and Arellano-Valle and Genton [11]; it extends the multivariate normal model by allowing a shape parameter to account for skewness.

A random vector Y is said to follow a p-variate MSN distribution with location vector \xi, scale matrix \Sigma, and skewness parameter vector \lambda\in\mathbb{R}^p, denoted by Y \sim SN_p(\xi,\Sigma,\lambda), if it has density

f_{MSN}(y;\xi,\Sigma,\lambda)=2\,\phi_p(y;\xi,\Sigma)\,\Phi\!\left(\lambda^{\top}\Sigma^{-1/2}(y-\xi)\right),
where \phi_p(\cdot\,;\xi,\Sigma) is the PDF of N_p(\xi,\Sigma) and \Phi stands for the CDF of the univariate standard normal distribution.

According to Arellano-Valle and Genton [11], the MSN distribution has a convenient stochastic representation as

Y \stackrel{d}{=} \xi+\Sigma^{1/2}\left(\delta|U_0|+\left(I_p-\delta\delta^{\top}\right)^{1/2}U_1\right), \tag{4}
where \delta=\lambda/\sqrt{1+\lambda^{\top}\lambda}, U_0\sim N(0,1), and U_1\sim N_p(0,I_p) independently.

If Z \sim SN_p(0,\Sigma,\lambda) is used in (1), then a skew-normal mean-variance mixture distribution is attained. Arslan [12] assumed that W follows a GIG distribution and studied the properties of the corresponding model.

The aim of this paper is to introduce a multivariate skew-normal mean-variance mixture based on BS (SNMVBS) distribution. The random vector Y\in\mathbb{R}^p is said to have an SNMVBS distribution if

Y=\xi+W\lambda_1+W^{1/2}Z, \tag{5}
where Z \sim SN_p(0,\Sigma,\lambda_2), W \sim BS(\alpha,1), and W and Z are independent. The parameter vectors \xi, \lambda_1, and \lambda_2 are in \mathbb{R}^p, and \Sigma is a positive definite matrix.
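Representation (5), combined with representation (4) for the SN component, yields a direct sampler for the SNMVBS distribution. The sketch below uses our own helper names and draws BS(α,1) through its standard normal transformation W = (αZ/2 + √((αZ/2)² + 1))²; it is an illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

def sqrtm_sym(M):
    """Symmetric square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def rsnmvbs(n, xi, Sigma, lam1, lam2, alpha):
    """Draw n variates via representation (5)."""
    p = len(xi)
    u = alpha * rng.standard_normal(n) / 2.0
    w = (u + np.sqrt(u ** 2 + 1.0)) ** 2                 # W ~ BS(alpha, 1)
    delta = lam2 / np.sqrt(1.0 + lam2 @ lam2)
    root = sqrtm_sym(np.eye(p) - np.outer(delta, delta))
    u0 = np.abs(rng.standard_normal(n))                  # |U_0|
    u1 = rng.standard_normal((n, p))                     # U_1
    z = (np.outer(u0, delta) + u1 @ root) @ sqrtm_sym(Sigma)   # Z ~ SN_p(0, Sigma, lam2)
    return xi + np.outer(w, lam1) + np.sqrt(w)[:, None] * z, w

xi = np.array([1.0, 1.0]); Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
lam1 = np.array([1.0, 1.0]); lam2 = np.array([1.0, 1.0]); alpha = 0.5
y, w = rsnmvbs(200_000, xi, Sigma, lam1, lam2, alpha)
```

With these parameter values, the sample mean of the mixing variable should be close to E(W) = 1 + α²/2 = 1.125.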

Tamandi et al. [13] studied the univariate SNMVBS distribution and discussed some of its properties. The modeling of correlated data in the presence of outliers provides a main motivation for extending the univariate SNMVBS to the multivariate setup. On the other hand, the SNMVBS distribution serves as an alternative to the skew-t (ST) model introduced by Azzalini and Capitanio [14]. The ST distribution is an SN mean-variance mixture distribution in which the mixing distribution is Gamma(\nu/2,\nu/2). But it is well known that there is no closed-form estimator for \nu in the ST model. Interestingly, the maximum likelihood (ML) estimates of the parameters in the SNMVBS model can be obtained by solving some simple linear equations. Moreover, the SNMVBS model extends the NMVBS distribution introduced by Pourmousa et al. [9].

The rest of this paper proceeds as follows. Section 2 introduces the multivariate SNMVBS distribution and then discusses some of its characteristics. Section 3 develops an ECM algorithm for estimating the parameters of SNMVBS distribution. In Section 4, asymptotic properties of the ML estimators are investigated using simulations. Section 5 illustrates some applications of the SNMVBS distribution in modeling environmental data. Finally, Section 6 provides some concluding remarks.

2. MULTIVARIATE SNMVBS DISTRIBUTION

Let Y be a random vector following the representation in (5), where W \sim BS(\alpha,1) with \alpha>0. Then, we say that Y follows an SNMVBS distribution, and we write Y \sim SNMVBS(\xi,\Sigma,\lambda_1,\lambda_2,\alpha) in short. It must be noted that if we set W \sim BS(\alpha,\beta), with a general \beta, then the corresponding model becomes nonidentifiable. For this reason, we assume \beta=1 to have an identifiable model with desirable properties. Using (5) and the known properties of the SN distribution, another stochastic representation of the SNMVBS random vector can be obtained as follows:

Y \mid W=w \sim SN_p\!\left(\xi+w\lambda_1,\ w\Sigma,\ \lambda_2\right),\qquad W \sim BS(\alpha,1). \tag{6}

Theorem 2.1.

If the random vector Y follows the representation in (6), then the PDF of Y is given by

f_{SNMVBS}(y)=f_{-1/2}(y)\,H_{-(p+1)/2}(a)+f_{1/2}(y)\,H_{-(p-1)/2}(a),\qquad y\in\mathbb{R}^p, \tag{7}
where f_{\kappa}(x)=f_{GH_p}(x;\xi,\Sigma,\lambda_1,\kappa,\alpha^{-2},\alpha^{-2}) and H_{\kappa}(x)=F_{GH_1}(x;0,1,b,\kappa,\delta_1,\delta_2). In addition, \delta_1=(y-\xi)^{\top}\Sigma^{-1}(y-\xi)+\alpha^{-2}, \delta_2=\lambda_1^{\top}\Sigma^{-1}\lambda_1+\alpha^{-2}, a=\lambda_2^{\top}\Sigma^{-1/2}(y-\xi), and b=\lambda_2^{\top}\Sigma^{-1/2}\lambda_1.

A detailed proof of this theorem is given in Appendix A. The notations in Theorem 2.1 will be used throughout the paper. Figure 1 displays some examples of the density function in (7). These figures present four different densities and the corresponding contour plots for \xi=0, \Sigma=\begin{pmatrix}1&0.5\\0.5&1\end{pmatrix}, and different values of \lambda_1, \lambda_2, and \alpha. We find that the skewness of the contours increases as a function of \lambda_2. These figures also show that the SNMVBS distribution can be a useful model in the presence of outliers when \alpha becomes large.
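In the univariate case (p = 1), the SNMVBS density can also be evaluated directly from its defining mixture ∫ f_SN(y|w) f_BS(w) dw, which is how the proof in Appendix A begins. The following sketch (helper names ours) confirms numerically that the mixture is a proper density:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def dbs(w, alpha):
    """Birnbaum-Saunders(alpha, 1) density."""
    z = (np.sqrt(w) - 1.0 / np.sqrt(w)) / alpha
    return (w + 1.0) / (2.0 * alpha * np.sqrt(w ** 3)) * norm.pdf(z)

def dsnmvbs1(y, xi, sigma2, l1, l2, alpha):
    """Univariate SNMVBS density computed as the mixture integral over W."""
    sd = np.sqrt(sigma2)
    def integrand(w):
        r = y - xi - w * l1
        return (2.0 * norm.pdf(r / (np.sqrt(w) * sd)) / (np.sqrt(w) * sd)
                * norm.cdf(l2 * r / (sd * np.sqrt(w))) * dbs(w, alpha))
    val, _ = quad(integrand, 0, np.inf)
    return val

total, _ = quad(lambda y: dsnmvbs1(y, 0.0, 1.0, 1.0, 2.0, 0.5), -10, 25, limit=200)
```

The nested quadrature is only for validation; Theorem 2.1 gives the closed form.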

Figure 1

Density functions and corresponding contours of skew-normal mean-variance mixture based on Birnbaum–Saunders (SNMVBS) distribution. From top to bottom: λ1=λ2=(0,0),α=0.3;λ1=(3,0),λ2=(0,3),α=0.3;λ1=(3,0),λ2=(0,3), α=1.7;λ1=(1,1),λ2=(1,1),α=1.7, respectively.

In the special case when \lambda_1=0, the PDF in (7) reduces to a scale mixture of two multivariate SN distributions with the BS model as the mixing distribution. Also, when \lambda_2=0, Y has the NMVBS distribution studied by Pourmousa et al. [9]. As mentioned in the Introduction, Arslan [12] introduced a mean-variance mixture of the SN distribution with W \sim GIG(\kappa,\chi,\psi) as the mixing distribution. Our motivation for introducing the SNMVBS distribution here is that it serves as an extension of Arslan's model under proper settings of the parameters, since the BS distribution is a mixture of two GIG distributions with special parameters [15]. Moreover, estimates of all parameters of the SNMVBS distribution can be obtained in closed form through some simple linear equations, whereas in Arslan's model the estimates of the GIG parameters must be obtained using numerical methods.

The mean vector and covariance matrix of Y \sim SNMVBS(\xi,\Sigma,\lambda_1,\lambda_2,\alpha) are obtained by iterated expectations on representation (6) and Part (ii) of Lemma A.2, presented in Appendix A. These are given by

E(Y)=\xi+\lambda_1\,E(W)+\sqrt{\tfrac{2}{\pi}}\,\Sigma^{1/2}\delta\,E(W^{1/2}),
\mathrm{Cov}(Y)=\lambda_1\lambda_1^{\top}\,\mathrm{Var}(W)+\Sigma\,E(W)-\tfrac{2}{\pi}\,\Sigma^{1/2}\delta\delta^{\top}\Sigma^{1/2}\,E^{2}(W^{1/2})+\sqrt{\tfrac{2}{\pi}}\left(\Sigma^{1/2}\delta\lambda_1^{\top}+\lambda_1\delta^{\top}\Sigma^{1/2}\right)\mathrm{Cov}(W,W^{1/2}),
where \delta=\lambda_2/\sqrt{1+\lambda_2^{\top}\lambda_2}.

Table 1 reports the upper bounds of the skewness and kurtosis coefficients for the univariate SNMVBS distribution under \xi=0, \sigma=1, and several different values of \alpha. It can be observed that the ranges of these measures for the SNMVBS model grow as the parameter \alpha increases. The multivariate skewness and kurtosis measures [16] for the multivariate SNMVBS cannot be obtained in closed form, but numerical computations show that the results are in agreement with the univariate version. Moreover, the upper bounds of skewness and kurtosis in the SNMVBS model are greater than those of the SN model. According to Azzalini and Capitanio [17], the approximate maximal values of the skewness and kurtosis coefficients of the SN model are 0.9905 and 0.869, respectively, while from Table 1, we find these values for the SNMVBS model when \alpha=40 to be 4.7859 and 30.349, respectively. It is useful to note that the skewness and kurtosis coefficients for the ST model with \nu=5 are 0.9905 and 23.108, respectively. Thus, the SNMVBS model proposed here provides considerably larger ranges for skewness and kurtosis than both the SN and ST models.

Measure \ α	0.5	1	2	4	8	10	15	20	30	40
Skewness 1.4547 2.6579 3.8459 4.4786 4.6996 4.7301 4.7619 4.7738 4.7826 4.7859
Kurtosis 6.8981 10.789 20.910 27.107 29.409 29.736 30.082 30.213 30.312 30.349
Table 1

Upper bounds of the measures of skewness and kurtosis coefficients for the univariate SNMVBS under ξ=0, σ=1, and different values of α.

From (4) and (5), we get

Y \stackrel{d}{=} \xi+W\lambda_1+W^{1/2}\Sigma^{1/2}\left(\frac{\lambda_2}{\sqrt{1+\lambda_2^{\top}\lambda_2}}\,|U_0|+\left(I_p+\lambda_2\lambda_2^{\top}\right)^{-1/2}U_1\right).

Setting \gamma=W^{1/2}\left(1+\lambda_2^{\top}\lambda_2\right)^{1/2}|U_0|, a further hierarchical representation of the SNMVBS distribution can be given as

Y \mid W=w,\gamma \sim N_p\!\left(\xi+w\lambda_1+\frac{\Sigma^{1/2}\lambda_2\,\gamma}{1+\lambda_2^{\top}\lambda_2},\ w\Sigma^{*}\right),
\gamma \mid W=w \sim HN\!\left(0,\ w\left(1+\lambda_2^{\top}\lambda_2\right)\right),
W \sim BS(\alpha,1), \tag{8}
where HN(\mu,\sigma^2) denotes the univariate half-normal distribution and \Sigma^{*}=\Sigma^{1/2}\left(I_p+\lambda_2\lambda_2^{\top}\right)^{-1}\Sigma^{1/2}.

The conditional distribution of \gamma, given Y=y and W=w, is given by

f(\gamma\mid y,w)=\frac{f(y,\gamma,w)}{f(y,w)}=\frac{f(y\mid\gamma,w)\,f(\gamma\mid w)\,f(w)}{\int_0^{\infty}f(y,\gamma,w)\,d\gamma}.

Using representation (8), after some algebra, we note that

\gamma \mid W=w, Y=y \sim HN\!\left(\lambda_2^{\top}u_2,\ w\right),
where u_2=\Sigma^{-1/2}\left(y-\xi-w\lambda_1\right), and
\mu_\gamma=\lambda_2^{\top}u_2,\qquad \sigma_\gamma^{2}=w.

Thus, we have

E\left(\gamma\mid W=w,Y=y\right)=\mu_\gamma+\sigma_\gamma\,\frac{\phi\!\left(\mu_\gamma/\sigma_\gamma\right)}{\Phi\!\left(\mu_\gamma/\sigma_\gamma\right)},
and so, by known properties of conditional expectations,
E(\gamma\mid y)=E\left\{E(\gamma\mid w,y)\mid y\right\},\qquad E\left(\gamma W^{-1}\mid y\right)=E\left\{W^{-1}E(\gamma\mid w,y)\mid y\right\}. \tag{10}

In Appendix A, we have established some theoretical properties and characteristics of SNMVBS distribution, including moment generating function, affine transformation, and conditional distributions. In the following theorem, conditional moments of W, given Y=y, are presented which will be used in subsequent sections.

Theorem 2.2.

Let Y \sim SNMVBS(\xi,\Sigma,\lambda_1,\lambda_2,\alpha) and W \sim BS(\alpha,1). Then

  1. The conditional PDF of W, given Y=y, is

     f(w\mid y)=p(y)\,\frac{\Phi\!\left(w^{-1/2}a-w^{1/2}b\right)}{H_{-(p+1)/2}(a)}\,f_{\mathrm{GIG}}\!\left(w;-\tfrac{p+1}{2},\delta_1,\delta_2\right)+\left(1-p(y)\right)\frac{\Phi\!\left(w^{-1/2}a-w^{1/2}b\right)}{H_{-(p-1)/2}(a)}\,f_{\mathrm{GIG}}\!\left(w;-\tfrac{p-1}{2},\delta_1,\delta_2\right), \tag{11}
     where
     p(y)=\frac{f_{-1/2}(y)\,H_{-(p+1)/2}(a)}{f_{SNMVBS}(y)};

  2. E\left(W^{n}\mid y\right)=\left(\frac{\delta_1}{\delta_2}\right)^{n/2}\left[p(y)\,\frac{H_{-(p+1)/2+n}(a)}{H_{-(p+1)/2}(a)}\,R_{-(p+1)/2,\,n}\!\left(\sqrt{\delta_1\delta_2}\right)+\left(1-p(y)\right)\frac{H_{-(p-1)/2+n}(a)}{H_{-(p-1)/2}(a)}\,R_{-(p-1)/2,\,n}\!\left(\sqrt{\delta_1\delta_2}\right)\right], \tag{12}
     for n=\pm 1,\pm 2,\ldots;

  3. E_m(y)=E\left\{W^{m}\,q\!\left(W^{-1/2}\lambda_2^{\top}U_2\right)\mid Y=y\right\}=\frac{1}{\sqrt{2\pi}\,f_{SNMVBS}(y)}\left(\frac{|\Sigma^{*}|}{|\Sigma|}\right)^{1/2}\left[f^{*}_{m+1/2}(y)\,R_{1/2,\,m}\!\left(\alpha^{-2}\right)+f^{*}_{m-1/2}(y)\,R_{-1/2,\,m}\!\left(\alpha^{-2}\right)\right], \tag{13}
     for m=\pm 1/2, where f^{*}_{\kappa}(y)=f_{GH_p}\!\left(y;\xi,\Sigma^{*},\lambda_1,\kappa,\alpha^{-2},\alpha^{-2}\right), q(x)=\phi(x)/\Phi(x), and U_2=\Sigma^{-1/2}\left(Y-\xi-W\lambda_1\right).

Proof.

The conditional PDF in (11) can be obtained by using Bayes' rule and some algebraic operations. The conditional expectation in (12) is obtained from Part (i) of the theorem and Part (ii) of Lemma A.1 in Appendix A. The conditional expectation in (13) is calculated directly.

To determine the parameter estimates of the SNMVBS distribution, we propose an Expectation-Maximization (EM)-type algorithm in the next section. The conditional expectations given in Theorem 2.2 become very useful for this purpose.

3. PARAMETER ESTIMATION

3.1. ECM Algorithm

Dempster et al. [18] introduced the EM algorithm as a versatile tool for ML estimation in models with missing data or latent variables. Simplicity of implementation and monotone convergence are the two most important features of the EM procedure. However, it is not directly applicable for estimating the parameters of the SNMVBS model because the M-step involves intractable computations. To avoid this problem, we use the Expectation-Conditional-Maximization (ECM) algorithm, suggested by Meng and Rubin [19], which replaces the M-step of the EM algorithm by a sequence of simpler conditional maximization steps.

Let y=(y_1,\ldots,y_n) be a random sample of size n, and let \gamma=(\gamma_1,\ldots,\gamma_n) and w=(w_1,\ldots,w_n) be the corresponding unobserved latent values. By using (8), we have the following hierarchical representation:

Y_i \mid \gamma_i, W_i=w_i \sim N_p\!\left(\xi+w_i\lambda_1+\frac{\Sigma^{1/2}\lambda_2\,\gamma_i}{1+\lambda_2^{\top}\lambda_2},\ w_i\Sigma^{*}\right),
\gamma_i \mid W_i=w_i \sim HN\!\left(0,\ w_i\left(1+\lambda_2^{\top}\lambda_2\right)\right),
W_i \sim BS(\alpha,1). \tag{14}

Then, under the hierarchical representation in (14), with \nu=\Sigma^{-1/2}\lambda_2, the complete-data log-likelihood function associated with y_c=(y^{\top},\gamma^{\top},w^{\top})^{\top} is given by

l_c(\theta\mid y_c)=c-\frac{n}{2}\log|\Sigma|-\frac{1}{2}\sum_{i=1}^{n}w_i^{-1}\left(y_i-\xi-w_i\lambda_1\right)^{\top}\Sigma^{-1}\left(y_i-\xi-w_i\lambda_1\right)-\frac{1}{2}\sum_{i=1}^{n}w_i^{-1}\left\{\gamma_i-\nu^{\top}\left(y_i-\xi-w_i\lambda_1\right)\right\}^{2}-n\log\alpha-\frac{1}{2\alpha^{2}}\sum_{i=1}^{n}\left(w_i+w_i^{-1}-2\right). \tag{15}

Now, to perform the ECM algorithm, we start with the E-step, given the current parameter estimate \hat\theta^{(k)}=\left(\hat\xi^{(k)},\hat\Sigma^{(k)},\hat\lambda_1^{(k)},\hat\lambda_2^{(k)},\hat\alpha^{(k)}\right). Then, we compute the expected value of l_c(\theta\mid y_c), defined as Q(\theta\mid\hat\theta^{(k)})=E\left\{l_c(\theta\mid y_c)\mid y,\hat\theta^{(k)}\right\}, which involves the conditional expectations

\hat q_{1i}^{(k)}=E\!\left(W_i^{-1}\mid y_i,\hat\theta^{(k)}\right),\quad \hat q_{2i}^{(k)}=E\!\left(W_i\mid y_i,\hat\theta^{(k)}\right),\quad \hat q_{3i}^{(k)}=E\!\left(\gamma_i W_i^{-1}\mid y_i,\hat\theta^{(k)}\right),\quad \hat q_{4i}^{(k)}=E\!\left(\gamma_i\mid y_i,\hat\theta^{(k)}\right). \tag{16}

The quantities \hat q_{1i}^{(k)} and \hat q_{2i}^{(k)} are obtained directly from (12). For \hat q_{3i}^{(k)} and \hat q_{4i}^{(k)}, one can use the conditional expectations presented in (10). So, the last two quantities in (16) can be obtained as

\hat q_{3i}^{(k)}=\lambda_2^{\top}\Sigma^{-1/2}\left\{\left(y_i-\xi\right)\hat q_{1i}^{(k)}-\lambda_1\right\}+M_1,
\hat q_{4i}^{(k)}=\lambda_2^{\top}\Sigma^{-1/2}\left\{\left(y_i-\xi\right)-\hat q_{2i}^{(k)}\lambda_1\right\}+M_2,
where M_1=E_{-1/2}(y_i) and M_2=E_{1/2}(y_i) can be obtained from (13), with all parameters evaluated at \hat\theta^{(k)}.

In summary, the ECM algorithm proceeds with the following steps:

E-step: Given the current value \theta=\hat\theta^{(k)}, calculate the Q-function given by

Q(\theta\mid\hat\theta^{(k)})=c-\frac{n}{2}\log|\Sigma|-n\log\alpha-\frac{1}{2}\sum_{i=1}^{n}\hat q_{1i}^{(k)}\left(y_i-\xi\right)^{\top}\Gamma\left(y_i-\xi\right)-\frac{1}{2}\sum_{i=1}^{n}\hat q_{2i}^{(k)}\lambda_1^{\top}\Gamma\lambda_1+\sum_{i=1}^{n}\hat q_{3i}^{(k)}\left(y_i-\xi\right)^{\top}\nu-\sum_{i=1}^{n}\hat q_{4i}^{(k)}\lambda_1^{\top}\nu+\frac{1}{2}\sum_{i=1}^{n}\left(y_i-\xi\right)^{\top}\Gamma\lambda_1+\frac{1}{2}\sum_{i=1}^{n}\lambda_1^{\top}\Gamma\left(y_i-\xi\right)-\frac{1}{2\alpha^{2}}\sum_{i=1}^{n}\left(\hat q_{1i}^{(k)}+\hat q_{2i}^{(k)}-2\right), \tag{17}
where \Gamma=\Sigma^{-1}+\nu\nu^{\top}.

CM-steps: Maximize (17) with respect to \xi, \Sigma, \lambda_1, \lambda_2, and \alpha to obtain the following closed-form expressions for the parameter updates:

\hat\alpha^{(k+1)}=\left\{\frac{1}{n}\sum_{i=1}^{n}\left(\hat q_{1i}^{(k)}+\hat q_{2i}^{(k)}-2\right)\right\}^{1/2},
\hat\xi^{(k+1)}=\frac{1}{\sum_{i=1}^{n}\hat q_{1i}^{(k)}}\left\{\sum_{i=1}^{n}\hat q_{1i}^{(k)}y_i-\hat\Gamma^{(k)-1}\hat\nu^{(k)}\sum_{i=1}^{n}\hat q_{3i}^{(k)}-n\hat\lambda_1^{(k)}\right\},
\hat\Sigma^{(k+1)}=\frac{1}{n}\left\{\sum_{i=1}^{n}\hat q_{1i}^{(k)}\left(y_i-\hat\xi^{(k+1)}\right)\left(y_i-\hat\xi^{(k+1)}\right)^{\top}+\hat\lambda_1^{(k)}\hat\lambda_1^{(k)\top}\sum_{i=1}^{n}\hat q_{2i}^{(k)}-\sum_{i=1}^{n}\left(y_i-\hat\xi^{(k+1)}\right)\hat\lambda_1^{(k)\top}-\sum_{i=1}^{n}\hat\lambda_1^{(k)}\left(y_i-\hat\xi^{(k+1)}\right)^{\top}\right\},
\hat\lambda_1^{(k+1)}=\frac{1}{\sum_{i=1}^{n}\hat q_{2i}^{(k)}}\left\{\sum_{i=1}^{n}\left(y_i-\hat\xi^{(k+1)}\right)-\hat\Gamma^{(k)-1}\hat\nu^{(k)}\sum_{i=1}^{n}\hat q_{4i}^{(k)}\right\},
\hat\nu^{(k+1)}=\frac{1}{n}\,\hat\Sigma^{(k+1)-1}\left\{\sum_{i=1}^{n}\hat q_{3i}^{(k)}\left(y_i-\hat\xi^{(k+1)}\right)-\sum_{i=1}^{n}\hat q_{4i}^{(k)}\hat\lambda_1^{(k+1)}\right\}. \tag{18}

We then update \hat\lambda_2^{(k+1)} by

\hat\lambda_2^{(k+1)}=\hat\Sigma^{(k+1)^{1/2}}\hat\nu^{(k+1)}.

The above procedure is repeated until either the maximum number of iterations is reached or a suitable stopping criterion is satisfied. Moreover, Appendix B details an approach for estimating the standard errors of the ML estimates using the observed information matrix.
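To make the CM-steps concrete, here is a minimal sketch (our own function, not the authors' implementation) of the closed-form updates for α and ξ, given the E-step quantities q̂₁ᵢ–q̂₃ᵢ; the remaining updates follow the same linear-algebra pattern:

```python
import numpy as np

def cm_step_alpha_xi(y, q1, q2, q3, lam1, Gamma, nu):
    """Closed-form CM updates for alpha and xi, given E-step quantities.
    y is an (n, p) data matrix; q1, q2, q3 are length-n vectors."""
    n = y.shape[0]
    alpha_new = np.sqrt(np.mean(q1 + q2 - 2.0))
    xi_new = (q1 @ y - np.linalg.solve(Gamma, nu) * q3.sum() - n * lam1) / q1.sum()
    return alpha_new, xi_new

y = np.array([[1.0, 2.0], [3.0, 4.0]])
a_new, xi_new = cm_step_alpha_xi(y, np.ones(2), 2.0 * np.ones(2), np.zeros(2),
                                 np.zeros(2), np.eye(2), np.ones(2))
```

With q̂₃ᵢ = 0 and λ̂₁ = 0, the ξ update reduces to a q̂₁-weighted mean of the observations, as expected.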

3.2. Computational Strategies for the Implementation

One of the attractive properties of EM-type algorithms is that successive iterations monotonically increase the likelihood value. So, a commonly used way to confirm convergence is to check whether the difference between two successive log-likelihood values is less than a user-specified tolerance. In this paper, we have used the Aitken acceleration method [20] to assess actual convergence, rather than a mere lack of progress [21]. Given the sequence of observed log-likelihood values \{l^{(k)}\}_{k\ge 0} in successive iterations, we first calculate the Aitken acceleration factor a^{(k)}=\left(l^{(k+1)}-l^{(k)}\right)/\left(l^{(k)}-l^{(k-1)}\right). This yields the asymptotic estimate of the log-likelihood

l_{\infty}^{(k+1)}=l^{(k)}+\frac{l^{(k+1)}-l^{(k)}}{1-a^{(k)}},
which can be determined at iteration k+1. The algorithm is then terminated if
\left|l_{\infty}^{(k+1)}-l^{(k)}\right|<\epsilon,
where \epsilon=10^{-6} is the default tolerance employed in our study.
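The Aitken-based stopping rule can be sketched as follows (a minimal illustration with our own function name):

```python
def aitken_stop(logliks, eps=1e-6):
    """Given at least three successive log-likelihood values, return the
    Aitken estimate of the asymptotic log-likelihood and whether to stop."""
    lkm1, lk, lkp1 = logliks[-3:]
    a = (lkp1 - lk) / (lk - lkm1)            # Aitken acceleration factor
    l_inf = lk + (lkp1 - lk) / (1.0 - a)     # asymptotic estimate
    return l_inf, abs(l_inf - lk) < eps

l_inf, stop = aitken_stop([8.0, 9.0, 9.5])   # a geometric approach to the limit 10
```

For a sequence converging geometrically to 10, the Aitken estimate recovers the limit exactly.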

Moreover, the implementation of EM-type algorithms requires initial values. Good initial values for the optimization process may speed up, or even enable, convergence [22]. In our computations, we used the following method for choosing initial values. For \xi and \Sigma, the sample mean and sample covariance matrix were used. The initial values for \lambda_1 and \lambda_2 were set to be vectors of 1's multiplied by a factor d, where d was drawn randomly from a uniform distribution on (0.5, 2). Lastly, for the initial value of \alpha, we chose an integer randomly from the interval between 1 and 10.

4. SIMULATION STUDY

To examine the performance of the proposed distribution as well as the estimation method, a small simulation study is carried out here. Through this empirical study, we examine the finite-sample properties of the ML estimators obtained by the ECM algorithm described in Section 3. For this purpose, Monte Carlo samples of size n=50, 100, 200, and 500 are generated from the SNMVBS model with true parameters \xi=(1,1)^{\top}, \mathrm{Vech}(\Sigma)=(1,0.5,1)^{\top}, \lambda_1=(1,1)^{\top}, \lambda_2=(1,1)^{\top}, and \alpha=2, where \mathrm{Vech}(D) is the vector of the p(p+1)/2 distinct elements of the p\times p symmetric matrix D. Each simulated data set was then fitted under the true model via the ECM algorithm according to the procedure described in Section 3.1. For each sample size, the experiment was replicated 300 times to assess the performance.

To examine the accuracies of parameter estimates, we computed the bias and the mean squared error (MSE), defined as

\mathrm{Bias}=\frac{1}{300}\sum_{i=1}^{300}\left(\hat\theta_i-\theta_{\mathrm{true}}\right)\qquad\text{and}\qquad \mathrm{MSE}=\frac{1}{300}\sum_{i=1}^{300}\left(\hat\theta_i-\theta_{\mathrm{true}}\right)^{2},
where \hat\theta_i denotes the estimate of a specific parameter in the ith replication.
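These two summary measures can be computed as follows (a trivial helper; the name is ours):

```python
import numpy as np

def bias_mse(estimates, theta_true):
    """Empirical bias and MSE of one parameter across replications."""
    e = np.asarray(estimates, dtype=float) - theta_true
    return e.mean(), (e ** 2).mean()

b, m = bias_mse([1.9, 2.1, 2.3], 2.0)
```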

Furthermore, we computed the standard deviations of the ML estimates across the 300 Monte Carlo samples (MC.SEs) and compared them with the average values of the approximate standard errors (A.SEs) obtained through the inverse of the observed information matrix, presented in Appendix B. Numerical results displayed in Tables 2 and 3 show the empirical consistency of the ML estimates, as the Bias and MSE values both shrink toward zero when n increases. Moreover, for small sample sizes, the differences between the A.SEs and MC.SEs of the shape parameters are quite large, but the information-based method offers a reasonably satisfactory approximation to the asymptotic covariance matrix of the ML estimates when the sample size is sufficiently large, say 200 or more.

	n = 50	n = 100
Parameter	Bias	MSE	MC.SE	A.SE	Bias	MSE	MC.SE	A.SE
ξ1 −0.0531 0.2074 0.2599 0.2959 −0.0189 0.1211 0.1704 0.1882
ξ2 −0.0532 0.1905 0.2896 0.2918 −0.0262 0.1237 0.1864 0.1890
d11 −0.0603 0.3948 0.2709 0.2945 −0.0321 0.2962 0.1943 0.1835
d12 0.1352 0.3642 0.1975 0.2705 −0.0863 0.2424 0.1379 0.1697
d22 −0.0728 0.4889 0.2821 0.2943 −0.0616 0.3016 0.2029 0.1840
λ11 −0.0625 0.2375 0.3359 0.4935 −0.0404 0.1784 0.2244 0.2886
λ12 −0.0715 0.2524 0.3028 0.4782 −0.0574 0.1801 0.2442 0.2874
λ21 0.0059 0.7948 0.7967 2.1740 −0.0279 0.5528 0.5978 1.2770
λ22 0.1124 0.8508 0.8944 2.1805 0.0856 0.5820 0.6199 1.2425
α 0.0331 0.3653 0.4025 0.4939 0.0431 0.2364 0.2341 0.3085
Table 2

Simulation results for assessing the asymptotic properties of parameter estimates and standard errors, when n = 50 and 100.

	n = 200	n = 500
Parameter	Bias	MSE	MC.SE	A.SE	Bias	MSE	MC.SE	A.SE
ξ1 −0.0086 0.0820 0.1135 0.1159 −0.0041 0.0498 0.0762 0.0807
ξ2 −0.0114 0.0882 0.1178 0.1183 −0.0059 0.0530 0.0706 0.0807
d11 −0.0121 0.2122 0.1293 0.1093 −0.0147 0.1352 0.0903 0.0789
d12 −0.0361 0.1682 0.0786 0.0958 −0.0255 0.1009 0.0544 0.0617
d22 −0.0160 0.2220 0.1407 0.1096 −0.0146 0.1424 0.0973 0.0786
λ11 −0.0137 0.1298 0.1396 0.1633 −0.0074 0.0745 0.0955 0.1098
λ12 −0.0247 0.1229 0.1493 0.1635 −0.0156 0.0737 0.1018 0.1093
λ21 −0.0244 0.3842 0.3746 0.7025 −0.0063 0.2333 0.2594 0.4891
λ22 0.0361 0.4014 0.3862 0.6913 0.0252 0.2653 0.2497 0.4874
α 0.0175 0.1667 0.1469 0.1792 0.0042 0.1016 0.1059 0.1229
Table 3

Simulation results for assessing the asymptotic properties of parameter estimates and standard errors, when n = 200 and 500.

5. APPLICATIONS

In this section, two applications of the SNMVBS distribution are presented. These examples come from environmental studies in a multivariate setting and contain some outliers. In both cases, the SNMVBS distribution provides a better fit to the data than the other skewed models mentioned earlier.

5.1. Wind Speed Data

In this example, we consider the wind speed dataset and fit the SNMVBS distribution in its multivariate form. The wind speed dataset consists of n=278 hourly average wind speeds collected at three meteorological towers: Goodnoe Hills (gh), Kennewick (kw), and Vansycle (vs). The towers lie approximately on a line, ordered from west to east. Azzalini and Genton [23] considered these data and fitted a trivariate ST model. They treated the observations as independent and identically distributed because tests demonstrated only weak serial correlation among the three stations. The heavy-tail behavior indicates the presence of extreme wind speeds. Arslan [6] fitted the multivariate skew-Laplace distribution to these data, studied the two-dimensional vector of wind speeds recorded at the locations gh and kw as a bivariate random sample, and noted that the estimates of the skewness parameter reveal apparent skewness in the data.

Here, we fit the SNMVBS distribution to these data as an alternative model. Figure 2 shows the classical and robust Mahalanobis distances of the bivariate data. The plots show some outliers in these data, and consequently a symmetric distribution will not be a good choice in this case. We take (gh, kw) as a two-dimensional vector and fit a bivariate SNMVBS distribution to it. Similarly, we also fit bivariate NIG, ST, and NMVBS distributions for the sake of comparison. As recommended by an anonymous referee, we also fitted the skew-Laplace distribution, known as the shifted asymmetric Laplace (SAL), introduced by Franczak et al. [24]. The SAL random variable can be represented by (1) when W is a standard exponential random variable with mean 1. For fitting the bivariate ST we used the package "sn", and for the NIG we used the package "ghyp", in R 3.5.1. Table 4 shows the results obtained; the row labeled \alpha reports \nu for the ST model and \bar\alpha for the NIG model.

Figure 2

Wind speed data: Classical and robust Mahalanobis distances.

Parameter	SNMVBS	ST	NMVBS	NIG	SAL
ξ1 25.41 (2.639) 28.48 (0.652) 32.68 (10.49) 33.80 (4.526) 23.07 (1.145)
ξ2 40.94 (7.584) 24.09 (1.018) 40.09 (14.06) 41.10 (5.527) 22.01 (2.730)
σ11 424.25 (41.25) 387.45 (5.397) 97.97 (31.09) 94.10 (24.03) 136.26 (18.47)
σ12 213.68 (40.57) 241.45 (3.590) 30.36 (39.15) 20.07 (25.26) 85.24 (19.01)
σ22 259.45 (43.36) 349.09 (2.822) 183.84 (52.54) 188.35 (38.59) 345.65 (35.35)
λ11 3.24 (2.521) - −18.33 (10.42) −21.06 (4.573) −10.32 (1.285)
λ12 −18.37 (7.180) - −23.95 (14.15) −27.07 (5.731) −7.97 (2.759)
λ21 −5.54 (0.448) −4.48 (0.180) - - -
λ22 −1.16 (0.347) −0.10 (0.299) - - -
α 0.38 (0.073) 17.62 (0.121) 0.41 (0.154) 5.08 (0.006) -
ℓ(θ̂)	−2228.43	−2237.54	−2244.43	−2244.35	−2257.33
AIC 4476.86 4491.08 4504.86 4504.71 4528.66
BIC 4513.14 4520.11 4533.88 4533.73 4554.05
HQIC 4491.37 4502.73 4516.55 4516.36 4538.85
Table 4

Estimation results and information criteria for wind speed data (gh, kw). Standard errors of ML estimates are given in parentheses.

Performance assessments for studied models were made on the adequacy of overall fitness in terms of the Akaike Information Criterion (AIC; [25]), Bayesian Information Criterion (BIC; [26]) and Hannan–Quinn information Criterion (HQIC; [27]), defined as

\mathrm{AIC}=2m-2\ell_{\max},\qquad \mathrm{BIC}=m\log n-2\ell_{\max},\qquad \mathrm{HQIC}=2m\log\log n-2\ell_{\max},
where m is the number of parameters and \ell_{\max} is the maximized log-likelihood value. As a general rule, lower values of the information criteria indicate a better-fitting model.
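These criteria are straightforward to compute; as a sanity check, the sketch below (function name ours) reproduces the SNMVBS column of Table 4, where the model has m = 10 free parameters and n = 278:

```python
import numpy as np

def info_criteria(loglik, m, n):
    """AIC, BIC, and HQIC as defined above."""
    return (2 * m - 2 * loglik,
            m * np.log(n) - 2 * loglik,
            2 * m * np.log(np.log(n)) - 2 * loglik)

aic, bic, hqic = info_criteria(-2228.43, 10, 278)   # SNMVBS fit of the wind speed data
```

Up to the rounding of the reported log-likelihood, these values match the 4476.86, 4513.14, and 4491.37 reported in Table 4.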

The AIC, BIC, and HQIC values in Table 4 show that the SNMVBS model provides the best fit for these data. Figure 3 presents the scatter plot of the wind data along with the contour plots of the fitted density and the fitted density function of the SNMVBS distribution. The plots reveal that the SNMVBS distribution can effectively capture the skewness and heavy tails present in the data.

Figure 3

Wind speed data: Scatter plot (gh, kw) with fitted density contours and the fitted density of the skew-normal mean-variance mixture based on Birnbaum–Saunders (SNMVBS) distribution.

5.2. Water Quality Data

Cook and Johnson [28] introduced a new distribution and investigated its application to water quality data. These data consist of the log-concentrations of seven chemical elements in 655 water samples collected near Grand Junction in western Colorado. Concentrations were measured for the following elements: Uranium (U), Lithium (Li), Cobalt (Co), Potassium (K), Cesium (Cs), Scandium (Sc), and Titanium (Ti). The data are available in the package "copula." Figure 4 shows that there are some potential outliers in these data. The parameter estimates are not shown here, but Figures 5 and 6 show the contour plots of pairs of variables with multivariate SNMVBS contours. Clearly, the SNMVBS distribution provides a good fit for these data. Table 5 presents the information criteria of some alternative models. From this table, we observe once again that the AIC, BIC, and HQIC of the SNMVBS model are the lowest among all considered models.

Figure 4

Water quality data: Classical and robust Mahalanobis distances.

Figure 5

Water quality data: Scatter plots with superimposed fitted skew-normal mean-variance mixture based on Birnbaum–Saunders (SNMVBS) density contours for some concentrations.

Figure 6

Water quality data: (Continued).

Criterion SNMVBS ST NMVBS NIG SAL
ℓ(θ̂)	1969.61	1923.66	1925.58	1926.93	1839.19
AIC −3839.20 −3761.33 −3765.16 −3767.86 −3594.39
BIC −3614.96 −3568.49 −3572.32 −3575.02 −3406.04
HQIC −3752.26 −3686.57 −3690.75 −3693.09 −3521.36
Table 5

Water quality data: maximum log-likelihood, AIC, BIC, and HQIC for different fitted models.

6. CONCLUDING REMARKS

We have introduced in this work a multivariate skew-normal mean-variance mixture distribution obtained by employing the BS as the mixing distribution. The properties of the SNMVBS distribution have been studied, and a feasible EM-type algorithm for estimating its parameters has been developed. We have demonstrated our methodology with a simulation study and have shown that the SNMVBS model outperforms other skew models on two data sets. The proposed distribution has some desirable properties. It offers a flexible class of distributions for modeling skewed and heavy-tailed data. It also provides a good fit to data containing outliers. Moreover, in contrast to the ST model, estimates of all parameters of the SNMVBS distribution can be obtained by solving some simple linear equations in the proposed ECM algorithm.

ACKNOWLEDGMENTS

We gratefully acknowledge the associate editor and three anonymous referees for their valuable comments and suggestions, which led to this greatly improved version of the article. We are also grateful to Prof. Tsung-I Lin for his help on the earlier version of the paper.

APPENDIX A. THEORETICAL PROPERTIES

The following lemmas are useful in establishing some properties of the skew-normal mean-variance mixture based on Birnbaum–Saunders (SNMVBS) distribution.

Lemma A.1.

If W \sim GIG(\kappa,\chi,\psi), then

  1. W^{-1} \sim GIG(-\kappa,\psi,\chi);

  2. E\left(W^{r}\right)=\left(\frac{\chi}{\psi}\right)^{r/2}R_{\kappa,\,r}\!\left(\sqrt{\chi\psi}\right);

  3. E_W\left\{\Phi\!\left(aW^{-1/2}-bW^{1/2}\right)\right\}=F(a),

where R_{\kappa,\,r}(x)=K_{\kappa+r}(x)/K_{\kappa}(x), a,b\in\mathbb{R}, \Phi is the cumulative distribution function (CDF) of the univariate standard normal distribution, and F is the CDF of GH_1(0,1,b,\kappa,\chi,\psi).

Parts (i) and (ii) of the above lemma are well-known properties of the generalized inverse Gaussian (GIG) distribution; see [7] for details. Part (iii) has been proved by [12].

Lemma A.2.

Let W \sim BS(\alpha,1). Then

  1. The probability density function (PDF) of W can be expressed as

     f_{BS}(w)=\frac{1}{2}\left\{f_{\mathrm{GIG}}\!\left(w;\tfrac{1}{2},\alpha^{-2},\alpha^{-2}\right)+f_{\mathrm{GIG}}\!\left(w;-\tfrac{1}{2},\alpha^{-2},\alpha^{-2}\right)\right\};

  2. E\left(W^{r}\right)=\frac{1}{2}\left\{R_{1/2,\,r}\!\left(\alpha^{-2}\right)+R_{-1/2,\,r}\!\left(\alpha^{-2}\right)\right\}.

In particular, we have E(W)=1+\tfrac{1}{2}\alpha^{2} and \mathrm{Var}(W)=\alpha^{2}\left(1+\tfrac{5}{4}\alpha^{2}\right).
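Part (ii) of Lemma A.2 with r = 1 must reproduce E(W) = 1 + α²/2, since K_{3/2}(x) = K_{1/2}(x)(1 + 1/x). A quick SciPy check of this identity:

```python
from scipy.special import kv   # modified Bessel function of the third kind, K_kappa

alpha = 0.7
x = alpha ** -2                # argument of R in Part (ii)
# E(W) = (R_{1/2,1}(x) + R_{-1/2,1}(x)) / 2; note R_{-1/2,1}(x) = K_{1/2}(x)/K_{-1/2}(x) = 1
ew = 0.5 * (kv(1.5, x) / kv(0.5, x) + kv(0.5, x) / kv(-0.5, x))
```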

The proof of Lemma A.2 and some other useful properties of the BS distribution can be found in Leiva [15], for example.

We now present the proof of Theorem 2.1.

Proof.

From (6), we have

f_Y(y)=\int_0^{\infty}f_{MSN}(y\mid w)\,f_{BS}(w)\,dw
=2\int_0^{\infty}\phi_p\!\left(y;\xi+w\lambda_1,w\Sigma\right)\Phi\!\left(w^{-1/2}\lambda_2^{\top}\Sigma^{-1/2}\left(y-\xi-w\lambda_1\right)\right)f_{BS}(w)\,dw
=\int_0^{\infty}\frac{w^{-(p+3)/2}\left(1+w\right)}{\alpha(2\pi)^{(p+1)/2}|\Sigma|^{1/2}}\,\Phi\!\left(w^{-1/2}a-w^{1/2}b\right)\exp\left\{-\frac{1}{2w}\left(y-\xi-w\lambda_1\right)^{\top}\Sigma^{-1}\left(y-\xi-w\lambda_1\right)-\frac{1}{2\alpha^{2}}\left(w^{-1}+w-2\right)\right\}dw
=\frac{\exp\left\{(y-\xi)^{\top}\Sigma^{-1}\lambda_1\right\}}{2(2\pi)^{p/2}|\Sigma|^{1/2}K_{1/2}\!\left(\alpha^{-2}\right)}\int_0^{\infty}w^{-(p+3)/2}\left(1+w\right)\Phi\!\left(w^{-1/2}a-w^{1/2}b\right)\exp\left\{-\frac{1}{2}\left(\delta_1 w^{-1}+\delta_2 w\right)\right\}dw
=f_{GH_p}\!\left(y;\xi,\Sigma,\lambda_1,-1/2,\alpha^{-2},\alpha^{-2}\right)E_{W_1}\!\left\{\Phi\!\left(W_1^{-1/2}a-W_1^{1/2}b\right)\right\}+f_{GH_p}\!\left(y;\xi,\Sigma,\lambda_1,1/2,\alpha^{-2},\alpha^{-2}\right)E_{W_2}\!\left\{\Phi\!\left(W_2^{-1/2}a-W_2^{1/2}b\right)\right\},
where W_1 \sim GIG\!\left(-\tfrac{p+1}{2},\delta_1,\delta_2\right) and W_2 \sim GIG\!\left(-\tfrac{p-1}{2},\delta_1,\delta_2\right). The proof is then completed using Part (iii) of Lemma A.1.

The following proposition presents the moment generating function (mgf) of Y.

Proposition A.1.

The mgf of Y \sim SNMVBS(\xi,\Sigma,\lambda_1,\lambda_2,\alpha) is given by

M_Y(s)=e^{s^{\top}\xi}\,\frac{K_{1/2}\!\left(\sqrt{\alpha^{-2}\left(\alpha^{-2}-2s^{\top}\lambda_1-s^{\top}\Sigma s\right)}\right)}{K_{1/2}\!\left(\alpha^{-2}\right)}\left[\left(\frac{\alpha^{-2}-2s^{\top}\lambda_1-s^{\top}\Sigma s}{\alpha^{-2}}\right)^{-1/4}H^{*}_{-1/2}\!\left(\delta^{\top}\Sigma^{1/2}s\right)+\left(\frac{\alpha^{-2}-2s^{\top}\lambda_1-s^{\top}\Sigma s}{\alpha^{-2}}\right)^{1/4}H^{*}_{1/2}\!\left(\delta^{\top}\Sigma^{1/2}s\right)\right],
where \delta=\lambda_2/\sqrt{1+\lambda_2^{\top}\lambda_2} and H^{*}_{\kappa}(x)=F_{GH_1}\!\left(x;0,1,0,\kappa,\alpha^{-2}-2s^{\top}\lambda_1-s^{\top}\Sigma s,\alpha^{-2}\right).

Proof.

Z \sim SN_p(\xi,\Sigma,\lambda) implies M_Z(s)=2\,e^{s^{\top}\xi+\frac{1}{2}s^{\top}\Sigma s}\,\Phi\!\left(\delta^{\top}\Sigma^{1/2}s\right). So, using (6), we have

M_Y(s)=E_W\left\{M_{Y\mid W}(s)\right\}=\frac{e^{s^{\top}\xi}}{2K_{1/2}\!\left(\alpha^{-2}\right)}\int_0^{\infty}w^{-3/2}\left(1+w\right)\Phi\!\left(w^{1/2}\delta^{\top}\Sigma^{1/2}s\right)\exp\left\{-\frac{1}{2}\left(\alpha^{-2}w^{-1}+\left(\alpha^{-2}-2s^{\top}\lambda_1-s^{\top}\Sigma s\right)w\right)\right\}dw.

The proof is completed by performing some algebraic calculations.

The next theorem shows that SNMVBS random vector is invariant under linear transformations. Also, the marginal distributions of SNMVBS are presented.

Theorem A.1.

Let Y \sim SNMVBS(\xi,\Sigma,\lambda_1,\lambda_2,\alpha). Then

  1. For any A\in\mathbb{R}^{q\times p} and b\in\mathbb{R}^{q}, the q-dimensional random vector V=AY+b is distributed as SNMVBS\!\left(A\xi+b,\,A\Sigma A^{\top},\,A\lambda_1,\,\lambda_2^{*},\,\alpha\right), where \lambda_2^{*}=\delta^{*}\left(1-\delta^{*\top}\delta^{*}\right)^{-1/2} with \delta^{*}=\left(A\Sigma A^{\top}\right)^{-1/2}A\Sigma^{1/2}\delta. Moreover, if q=p and A is nonsingular, then we get \lambda_2^{*}=\lambda_2;

  2. Suppose Y is partitioned as Y=\left(Y_1^{\top},Y_2^{\top}\right)^{\top} with dimensions p_1 and p_2=p-p_1, respectively. Accordingly, the location vector, skewness vector, and scale matrix are partitioned as

\xi=\left(\xi_1^{\top},\xi_2^{\top}\right)^{\top},\qquad \lambda_1=\left(\lambda_{11}^{\top},\lambda_{12}^{\top}\right)^{\top},\qquad \Sigma=\begin{pmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{pmatrix}.

Then, Y_1 \sim SNMVBS\!\left(\xi_1,\Sigma_{11},\lambda_{11},\Sigma_{11}^{1/2}\tilde\nu,\alpha\right), where

\tilde\nu=\frac{\nu_1+\Sigma_{11}^{-1}\Sigma_{12}\nu_2}{\sqrt{1+\nu_2^{\top}\Sigma_{22.1}\nu_2}},

with \Sigma_{22.1}=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} and \nu=\left(\nu_1^{\top},\nu_2^{\top}\right)^{\top}=\Sigma^{-1/2}\lambda_2.

Proof.

The proof of Part (i) is straightforward upon obtaining the mgf of V from Proposition A.1. Applying Part (i) with A=\left(I_{p_1}\ \ 0_{p_1\times p_2}\right) and p=p_1+p_2, Part (ii) can be proved.

Although the conditional distribution of Y_2, given Y_1, has no closed form, it is clear that it does not belong to the same class of distributions.

APPENDIX B. PROVISION OF STANDARD ERRORS

To compute the asymptotic covariance matrix of the maximum likelihood (ML) estimates, we use the method suggested by Basford et al. [29]. Let l_{ci}(\theta\mid y_i,\gamma_i,w_i) be the complete-data log-likelihood contributed by the single observation y_i. Then, the empirical information matrix is given by \hat I=\sum_{i=1}^{n}\hat s_i\hat s_i^{\top}, where \hat s_i is the individual score, defined by

\hat s_i=E_{\hat\theta}\left\{\frac{\partial l_{ci}\left(\theta\mid y_i,\gamma_i,w_i\right)}{\partial\theta}\ \Big|\ y_i\right\}.

The standard errors of ML estimates can then be found by calculating the square root of the diagonal elements of I^1.
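In matrix form, once the individual scores ŝᵢ are stacked into an n×q matrix, the computation is just the following (a generic sketch; the function name is ours):

```python
import numpy as np

def empirical_se(scores):
    """Standard errors from the empirical information matrix I = sum_i s_i s_i'."""
    S = np.asarray(scores, dtype=float)        # n x q matrix of individual scores
    info = S.T @ S
    return np.sqrt(np.diag(np.linalg.inv(info)))

se = empirical_se([[2.0, 0.0], [0.0, 3.0]])    # toy scores giving diagonal information
```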

The l_{ci}(\theta\mid y_i,\gamma_i,w_i) can be obtained from (15). Let d=\mathrm{Vech}(D) be the vector of the p(p+1)/2 distinct elements of the symmetric matrix D, where D=\hat\Sigma^{1/2}. Then, we have

\hat s_{i,\alpha}=-\frac{1}{\hat\alpha}+\frac{1}{\hat\alpha^{3}}\left(\hat q_{1i}+\hat q_{2i}-2\right),\qquad
\hat s_{i,\xi}=\hat\Gamma\left\{\hat q_{1i}\left(y_i-\hat\xi\right)-\hat\lambda_1\right\}-\hat q_{3i}\hat\nu,
\hat s_{i,d}=\mathrm{Vech}\left(-\left[2D^{-1}-\mathrm{Diag}\left(D^{-1}\right)\right]+\hat A_i+\hat A_i^{\top}-\mathrm{Diag}\left(\hat A_i\right)+\hat B_i+\hat B_i^{\top}-\mathrm{Diag}\left(\hat B_i\right)\right),
\hat s_{i,\lambda_1}=\hat\Gamma\left\{\left(y_i-\hat\xi\right)-\hat q_{2i}\hat\lambda_1\right\}-\hat q_{4i}\hat\nu,
\hat s_{i,\nu}=\hat q_{3i}\left(y_i-\hat\xi\right)-\hat q_{4i}\hat\lambda_1-\left[\hat q_{1i}\left(y_i-\hat\xi\right)\left(y_i-\hat\xi\right)^{\top}+\hat q_{2i}\hat\lambda_1\hat\lambda_1^{\top}-\left(y_i-\hat\xi\right)\hat\lambda_1^{\top}-\hat\lambda_1\left(y_i-\hat\xi\right)^{\top}\right]\hat\nu,
where
\hat A_i=D^{-1}\left[\hat q_{1i}\hat u_i\hat u_i^{\top}+\hat q_{2i}\hat v\hat v^{\top}-\hat u_i\hat v^{\top}+\hat q_{1i}\hat\lambda_2\hat\lambda_2^{\top}\hat u_i\hat u_i^{\top}+\hat q_{2i}\hat\lambda_2\hat\lambda_2^{\top}\hat v\hat v^{\top}-\hat\lambda_2\hat\lambda_2^{\top}\hat u_i\hat v^{\top}\right],
\hat B_i=\left[\hat q_{4i}\hat v\hat\lambda_2^{\top}-\hat u_i\hat v^{\top}-\hat q_{3i}\hat u_i\hat\lambda_2^{\top}-\hat u_i\hat v^{\top}\hat\lambda_2\hat\lambda_2^{\top}\right]D^{-1},

with \hat u_i=D^{-1}\left(y_i-\hat\xi\right) and \hat v=D^{-1}\hat\lambda_1. \mathrm{Diag}(A) is the diagonal matrix formed by the main diagonal elements of the square matrix A.

REFERENCES

4. A. Azzalini, Scand. J. Stat., Vol. 12, 1985, pp. 171-178.
5. O. Barndorff-Nielsen, Scand. J. Stat., Vol. 9, 1978, pp. 43-46.
7. W. Hu, PhD Thesis, Florida State University, 2005. https://diginole.lib.fsu.edu/islandora/object/fsu%3A181953
13. M. Tamandi, H. Negarestani, A. Jamalizadeh, and M. Amiri, JIRSS (to appear). http://jirss.irstat.ir/article-1-499-en.html
25. H. Akaike, in B.N. Petrov and F. Csaki (Eds.), 2nd International Symposium on Information Theory, Akadémiai Kiadó, Budapest, 1973, pp. 267-281.
29. K.E. Basford, D.R. Greenway, G.J. McLachlan, and D. Peel, Comput. Stat., Vol. 12, 1997, pp. 1-17.
ISSN (Online)
2214-1766
ISSN (Print)
1538-7887
