On Seemingly Unrelated Regression Model with Skew Error
- 10.2991/jsta.d.210126.002
- Seemingly unrelated regression; Endogenous variable; Exogenous variable; Skew-normal distribution
Sometimes, invoking a single causal relationship to explain the dependency between variables is not appropriate, particularly in some economic problems. Instead, two jointly related equations, where one of the explanatory variables is endogenous, can represent the actual inherent inter-relationship among the variables. Such models are called simultaneous equation models, of which the seemingly unrelated regression (SUR) model is a special case. Substantial progress has been made on statistical inference for the parameters of these models when the errors follow a normal distribution, but less research has been devoted to the case in which the error distribution is asymmetric. In this paper, statistical inference on the parameters of SUR models, assuming a skew-normal density for the errors, is tackled. Moreover, the results of the study are compared with those of other naive methodologies. The proposed model is utilized to analyze the income and expenditure of Iranian rural households in the year 2009.
- © 2021 The Authors. Published by Atlantis Press B.V.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
Most linear regression models rely on the relationship between a dependent variable and one or more explanatory variables. The main objective in treating these models is to estimate and predict the average value of the dependent variable given some explanatory variables. But in many cases, particularly in some economic problems, the causal relationship represented by a single equation is not appropriate. The drawback of such single-equation models is twofold: not only does the response variable depend on the explanatory variables, but the response variable also determines some of the explanatory variables. Generally, it can be argued that there are simultaneous or two-sided relationships between the response and some of the explanatory variables in these cases. Hence, separating the variables into explanatory and dependent does not make sense in real-life circumstances. In these situations, the number of equations will, naturally, be more than one. Precisely, there is an equation for every endogenous or dependent variable. Generally, following Haavelmo, when the dependent variable of a particular model is also an explanatory variable, one should use simultaneous equations models (SEMs). A particular case of these models is the seemingly unrelated regression (SUR) model.
Zellner was the pioneer in estimating the parameters of the SUR model using the generalized least squares method. Frequentist work on such models has been comparatively sparse, whereas much research has followed the Bayesian approach. The application of the Bayesian approach to the SUR model was first proposed by Zellner. Afterward, other methods for estimating the parameters were used, including the maximum likelihood method, the Bayesian method of moments, and the direct Monte Carlo method. The application of MCMC to the SUR model has appeared in many studies under various assumptions; examples include Percy, Chib and Greenberg, and Smith and Kohn. Recently, Zellner and Ando and also Zellner et al. have investigated the estimation of the parameters in the SUR model using a hierarchical Bayes approach through direct Monte Carlo and importance sampling techniques.
Another important aspect of SUR models that is worth studying is the type of distribution considered for the error term. It is quite common to assume a normal density for the errors. However, there are numerous examples in which the empirical distribution of the variables exhibits an asymmetric structure, so the normal distribution can no longer be used. In these situations, some transformation may be used to make the distribution of the data approximately normal. However, such transformations have their own drawbacks, including the bias of the estimator. Using asymmetric distributions that share key characteristics of the normal distribution has recently received significant attention in the literature. The skew-normal distribution is one of the important distributions proposed to tackle the asymmetric feature of data. Historically, the univariate skew-normal distribution was advocated by Azzalini. Then, Azzalini and Dalla Valle proposed the multivariate skew-normal distribution. Azzalini and Capitanio further studied the properties of this density. Several generalizations of this distribution have been presented by Balakrishnan, Genton, Gupta et al., and Arellano-Valle et al. Recently, Azzalini and Regoli have investigated some other properties of the skew-symmetric distribution. As a new line of research, we consider the SUR model allowing the error in the model to follow the skew-normal distribution. The estimation of the parameters using the maximum likelihood methodology is also treated. Intensive simulation studies are conducted to evaluate the proposed methods. An application of the model to real-life data is also given.
The present paper is organized as follows. A brief review of the SUR model is presented in Section 2. Then, a likelihood-based approach to estimating the parameters with a skew distribution for the errors in SUR models is discussed in Section 3. The simulation study, as well as the analysis of real data on the income and expenditure of Iranian rural households in the year 2009, is presented in Section 4. General conclusions are provided at the end. The proofs of some of the results are given in the Appendix.
2. SUR MODEL
Suppose is an matrix of explanatory variables and a column vector of parameters with the length . Furthermore, suppose there are equations corresponding with endogenous variables, a column vector with the length indicated by . Hence, the -th equation of a linear simultaneous system can be written as
Let us assume that, -vectors and consist of and , respectively, stacked vertically for fixed . Accordingly, the -vector is formed by stacking vertically. Then, the matrix will be of dimension , where . In fact, it is a block-diagonal matrix with diagonal blocks also for fixed with rank . In short, the notations can be summarized as follows:
Based on this notation model (2.1) can be rewritten as
Note that as a common assumption, we now consider where .
In the present study, we aim to estimate the parameters of this SUR model. This can be achieved via many parametric and nonparametric estimation procedures, including two-stage least squares (2SLS), three-stage least squares (3SLS), the generalized method of moments (GMM), limited information maximum likelihood (LIML), and full information maximum likelihood (FIML); see Anderson and Rubin, Theil, and Davidson and MacKinnon. In this paper we focus on FIML under the normal and skew-normal error assumptions. Moreover, a number of important statistical features pertaining to these models are provided.
Based upon the information provided so far, we can write down the likelihood function to estimate the parameters. As is common, it is preferred to use the logarithm of the likelihood, in which we write it as , in our problem. It is given by
Moreover, via invoking a simple computation, it can be shown that
So far, the estimators have been calculated under the assumption of normally distributed errors. However, if the distribution of the errors is asymmetric, specifically skew-normal, then obtaining the estimators is not as straightforward as above. To treat this case, we first briefly review the skew-normal distribution in the subsequent section. Then, the FIML estimators of the parameters are obtained under this assumption, while the model includes endogenous variables.
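As a rough illustration of normal-error estimation in a SUR system, the sketch below iterates Zellner-type feasible generalized least squares on a stacked two-equation system. It is not the paper's FIML derivation: it assumes that all regressors are exogenous, and the function name `sur_fgls`, the simulated data, and every numerical value are hypothetical choices made for this example.

```python
import numpy as np
from scipy.linalg import block_diag

def sur_fgls(ys, Xs, n_iter=50, tol=1e-10):
    """Iterated feasible GLS for a SUR system (assumes exogenous regressors).

    ys: list of g response vectors, each of length n
    Xs: list of g design matrices, each of shape (n, k_i)
    """
    g, n = len(ys), len(ys[0])
    y = np.concatenate(ys)                       # stacked response, length g*n
    X = block_diag(*Xs)                          # block-diagonal stacked design
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # equation-by-equation OLS start
    for _ in range(n_iter):
        resid = (y - X @ beta).reshape(g, n)     # row i holds residuals of equation i
        sigma = resid @ resid.T / n              # g x g error covariance estimate
        # The stacked errors have covariance Sigma kron I_n; use its inverse as weight
        W = np.kron(np.linalg.inv(sigma), np.eye(n))
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    resid = (y - X @ beta).reshape(g, n)
    return beta, resid @ resid.T / n

# Hypothetical two-equation example with correlated normal errors
rng = np.random.default_rng(0)
n = 500
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errs[:, 1]
beta_hat, sigma_hat = sur_fgls([y1, y2], [X1, X2])
```

The returned `beta_hat` stacks the coefficients of the first equation followed by those of the second, and `sigma_hat` estimates the cross-equation error covariance that distinguishes SUR from separate regressions.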
3. SUR MODELED WITH SKEW-NORMAL DISTRIBUTION
We first recall the definition and a few key properties of the skew-normal distribution, as given by Azzalini and Dalla Valle . Suppose is a -dimensional random variable, then it follows the multivariate skew-normal distribution if it is continuous with density function
The parameter plays a key role in representing the main features of density in (3.1). Since it controls the skewness of density, it is usually referred to as shape parameter or, also, skewness control. This density function is skewed to the right (left) for positive (negative) values of When , the distribution function (3.1) reduces to where is a zero vector of length
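The role of the shape parameter can be checked numerically. The sketch below assumes that the density in (3.1) has the standard Azzalini and Dalla Valle form, 2 φ_k(z; Ω) Φ(α'z), with φ_k the k-variate normal density and Φ the standard normal distribution function; the function name `msn_logpdf` and the grid values are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def msn_logpdf(z, omega, alpha):
    """Log of 2 * phi_k(z; omega) * Phi(alpha'z), evaluated row-wise on z."""
    z = np.atleast_2d(z)
    k = z.shape[1]
    log_phi = multivariate_normal(mean=np.zeros(k), cov=omega).logpdf(z)
    log_skew = norm.logcdf(z @ np.asarray(alpha, dtype=float))
    return np.log(2.0) + log_phi + log_skew

# Univariate check with shape parameter 3: the density integrates to one
# and its mean is positive, i.e. the mass is shifted to the right.
grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]
dens = np.exp(msn_logpdf(grid[:, None], [[1.0]], [3.0]))
total_mass = dens.sum() * dx
mean = (grid * dens).sum() * dx

# With a zero shape vector the density reduces to the normal density
pts = np.array([[0.3, -1.2], [1.0, 2.0]])
omega2 = np.array([[1.0, 0.4], [0.4, 2.0]])
reduced = msn_logpdf(pts, omega2, [0.0, 0.0])
normal = multivariate_normal(mean=np.zeros(2), cov=omega2).logpdf(pts)
```

Working on the log scale with `norm.logcdf` avoids underflow when α'z is strongly negative, which matters once this density is used inside a likelihood.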
Location and scale parameters can be also added to the skew-normal density of given in (3.1). Let us write
Now, suppose one is interested in the estimator of parameters in this model through the maximum likelihood approach. Then, corresponding logarithm of the likelihood function, say which is given by
By substituting this estimation into the expression in (3.5), one will obtain
We conduct some simulation studies using model (2.1) along with normal and skew-normal distributions in the following section. Moreover, we investigate the application of these methods in real-life data.
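To make the skew-normal likelihood approach concrete, the following is a minimal sketch for a two-equation system with exogenous regressors. It is not the estimator defined by the paper's equations: it assumes the error density 2 φ_2(e; Σ) Φ(α'e), maximizes the log-likelihood numerically with SciPy's BFGS quasi-Newton routine using finite-difference gradients, and uses hypothetical function names, starting values, and simulated data throughout.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, skew

def unpack(theta, k1, k2):
    """Split theta into (beta1, beta2, Cholesky factor of Sigma, alpha)."""
    b1, b2 = theta[:k1], theta[k1:k1 + k2]
    l11, l21, l22, a1, a2 = theta[k1 + k2:]
    # exp() keeps the diagonal positive, so Sigma = L L' is positive definite
    L = np.array([[np.exp(l11), 0.0], [l21, np.exp(l22)]])
    return b1, b2, L, np.array([a1, a2])

def neg_loglik(theta, y1, X1, y2, X2):
    """Negative log-likelihood under the error density 2*phi_2(e; Sigma)*Phi(alpha'e)."""
    b1, b2, L, alpha = unpack(theta, X1.shape[1], X2.shape[1])
    E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])   # n x 2 residuals
    W = np.linalg.solve(L, E.T).T                       # whitened residuals
    log_det = 2.0 * np.log(np.diag(L)).sum()            # log|Sigma|
    log_phi = -np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * (W ** 2).sum(axis=1)
    return -(np.log(2.0) + log_phi + norm.logcdf(E @ alpha)).sum()

def fit_sur_skew_normal(y1, X1, y2, X2):
    b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
    E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
    L = np.linalg.cholesky(E.T @ E / len(y1))
    # alpha = 0 at the OLS fit is a stationary point, so start away from zero
    alpha0 = np.where(skew(E, axis=0) >= 0, 1.0, -1.0)
    theta0 = np.concatenate(
        [b1, b2, [np.log(L[0, 0]), L[1, 0], np.log(L[1, 1])], alpha0])
    res = minimize(neg_loglik, theta0, args=(y1, X1, y2, X2), method="BFGS")
    return theta0, res

# Hypothetical data: skew-normal errors drawn by sign-flipping correlated normals
rng = np.random.default_rng(2)
n = 400
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
alpha = np.array([3.0, 3.0])
delta = sigma @ alpha / np.sqrt(1.0 + alpha @ sigma @ alpha)
cov = np.block([[np.ones((1, 1)), delta[None, :]], [delta[:, None], sigma]])
draws = rng.multivariate_normal(np.zeros(3), cov, size=n)
errs = draws[:, 1:] * np.where(draws[:, :1] > 0, 1.0, -1.0)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errs[:, 1]
theta0, res = fit_sur_skew_normal(y1, X1, y2, X2)
b1_hat, b2_hat, L_hat, alpha_hat = unpack(res.x, 2, 2)
```

The Cholesky parameterization keeps the covariance matrix valid without constraints, which lets an unconstrained quasi-Newton method be used; supplying analytic derivatives, as the Appendix does, would typically be faster and more stable than finite differences.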
4. SIMULATION STUDIES AND APPLICATION
Here, we outline our simulation study to evaluate the performance of the parameters estimation for the SUR models given in Section 2. Suppose we have the following model:
To further identify this model, we need to specify a distribution for . To start, let us assume , and are endogenous variables and , and are exogenous variables. To compare this model with an alternative, we also consider the case in which
We fix the parameter in our simulation studies as and
To initiate our simulation studies, we take the sample size equal to , in which using the two equations in (4.1) gives the total number of observations . Then, we generate data for times from the skew-normal distribution. Thereafter, the model is fitted by both maximum likelihood approaches (normal and skew-normal assumptions) as described in the previous sections. In particular, the parameters are estimated using either Equation (2.6) or Equation (3.10), depending on the distribution considered for the errors in the model.
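The data-generating step of such a simulation can be sketched as follows. Since the sample size, number of replications, and parameter values are not recoverable from the text above, every number here is a hypothetical stand-in, and equation-by-equation least squares is used only as a placeholder estimator. Errors are drawn with density 2 φ_2(e; Ω) Φ(α'e) by flipping the sign of correlated normal draws.

```python
import numpy as np

def sn_delta(omega, alpha):
    """delta = omega alpha / sqrt(1 + alpha' omega alpha)."""
    return omega @ alpha / np.sqrt(1.0 + alpha @ omega @ alpha)

def rmsn(n, omega, alpha, rng):
    """Draw n vectors with density 2 * phi_k(e; omega) * Phi(alpha'e)."""
    delta = sn_delta(omega, alpha)
    k = len(alpha)
    # Joint covariance of (x0, x) with var(x0) = 1 and cov(x0, x) = delta
    cov = np.block([[np.ones((1, 1)), delta[None, :]], [delta[:, None], omega]])
    draws = rng.multivariate_normal(np.zeros(k + 1), cov, size=n)
    # Keep x when x0 > 0 and use -x otherwise
    return draws[:, 1:] * np.where(draws[:, :1] > 0, 1.0, -1.0)

rng = np.random.default_rng(3)
n, n_rep = 100, 200                     # hypothetical sample size and replications
omega = np.array([[1.0, 0.5], [0.5, 1.0]])
alpha = np.array([3.0, 3.0])
beta1, beta2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])

estimates = np.empty((n_rep, 4))
for r in range(n_rep):
    X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
    X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
    errs = rmsn(n, omega, alpha, rng)
    y1 = X1 @ beta1 + errs[:, 0]
    y2 = X2 @ beta2 + errs[:, 1]
    b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
    estimates[r] = np.concatenate([b1, b2])

mc_mean = estimates.mean(axis=0)        # Monte Carlo mean of each estimate
mc_sd = estimates.std(axis=0, ddof=1)   # Monte Carlo standard deviation
error_mean = np.sqrt(2.0 / np.pi) * sn_delta(omega, alpha)
```

In this parameterization the skew-normal error has mean `error_mean` rather than zero, so a mean-regression fit recovers the slopes but shifts each intercept by the corresponding component of `error_mean`.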
The results of our simulation studies for both the normal and skew-normal cases are given in Table 1. The table is partitioned into two parts: the three left-hand panels report the results under the normal assumption for the error term, and the remaining panels on the right report those under the skew-normal assumption. The distributions are indicated by N (normal) and SN (skew-normal). The table reports the estimate, standard deviation (SD), and effect size (ES).
Table 1: The results of the SUR model fitted under the skew-normal and normal assumptions.
Based on the results in Table 1, the estimates for , , , and have small ES in both cases. The ES for the intercept is high regardless of which distribution is considered for the error term; however, it is higher in the normal model than in the skew-normal case. Overall, the estimates in the SN case are closer to the true parameter values used in the simulation. In general, when the response variables follow a skew-normal distribution in the SUR model, methods relying on the skew-normal density for the error lead to more accurate estimation than those relying on the normal density.
One notes that the likelihood ratio test for the null hypothesis can be considered as a criterion for a comparison in whether or not the skew-normal distribution should be considered. This test is given by
Table 2: Criteria for comparing the two methods of parameter estimation.
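A likelihood ratio comparison of the normal and skew-normal fits can be sketched as below. The sketch assumes that the null hypothesis sets all shape parameters to zero and that the usual chi-square approximation applies, with degrees of freedom equal to the number of shape parameters. The function name `lr_test` and the log-likelihood values are hypothetical.

```python
from scipy.stats import chi2

def lr_test(loglik_normal, loglik_skew, n_shape):
    """Likelihood ratio statistic and chi-square p-value for H0: shape vector = 0."""
    stat = 2.0 * (loglik_skew - loglik_normal)
    p_value = chi2.sf(stat, df=n_shape)   # upper-tail probability
    return stat, p_value

# Hypothetical maximized log-likelihoods for a two-equation model
stat, p_value = lr_test(loglik_normal=-100.0, loglik_skew=-95.0, n_shape=2)
```

Because the normal model is nested in the skew-normal model, the statistic is nonnegative whenever both likelihoods have been fully maximized; a negative value usually signals that the skew-normal optimization stopped early.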
We were interested in applying the proposed model to real-life data. To do this, we used the Iranian rural households income and expenditure data collected in the year 2009. It includes families from provinces. The main goal in the present paper is to survey the effects of some variables on the income and expenditure of Iranian rural households. In this study, these two variables are considered as endogenous variables and the other covariates are set as exogenous. Based on a general view and on consultation with experts in the Statistical Center of Iran, the following SUR was utilized to express the inter-relationship between rural households income and expenditure in Iran:
| Variable name | Abbreviation | Variable type | Coding |
|---|---|---|---|
| Number of literate in household | | Quantitative | – |
| Number of employees in household | | Quantitative | – |
| Number of people with income | | Quantitative | – |
| Private car | | Qualitative | 1: Use, 0: Nonuse |
| Internet | | Qualitative | 1: Use, 0: Nonuse |
| Gas | | Qualitative | 1: Use, 0: Nonuse |
| Mobile | | Qualitative | 1: Use, 0: Nonuse |
| Agriculture self-employment income | | Quantitative | – |
| Nonagriculture self-employment income | | Quantitative | – |
| Non-monetary other incomes | | Quantitative | – |
Table 3: Description of the variables utilized in model (4.4).
To initiate the analysis, the validity of the normality assumption for the response variables should be tested. We used the Kolmogorov–Smirnov (KS) test for this purpose. The KS test was significant with p-value , rejecting the null hypothesis of normality. For a visual inspection, the Q-Q plots of household income and expenditure are drawn in Figure 5; they show a departure from the univariate normal distribution for both variables. The contour plot in Figure 5 also demonstrates a departure from the bivariate normal distribution. It can be argued that some transformation, such as the logarithm, would be appropriate to make the density normal. However, the income variable includes some negative values, so this transformation cannot be applied. Instead, we preferred to use the skew-normal distribution for the errors and modeled rural household income and expenditure in Iran on that basis. Nonetheless, to have a baseline for comparison, the normal distribution was also considered for the errors in this example.
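A normality check of this kind can be sketched as follows. The survey data are not available here, so the `income` vector is a hypothetical skewed sample that, like the income variable described above, contains some negative values. The KS test is applied after standardizing with the sample mean and standard deviation, which makes the reported p-value only approximate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical right-skewed sample with some negative values (not the survey data)
income = rng.gamma(shape=2.0, scale=1.0, size=2000) - 0.5

# KS test against the standard normal after standardizing the sample
z = (income - income.mean()) / income.std(ddof=1)
ks_stat, ks_p = stats.kstest(z, "norm")

# Q-Q plot coordinates: theoretical normal quantiles against ordered sample values
(theo_q, sample_q), (slope, intercept, r) = stats.probplot(income, dist="norm")

has_nonpositive = bool((income <= 0).any())   # a log transform would be undefined
```

Plotting `sample_q` against `theo_q` gives the Q-Q plot; systematic curvature away from the line with the fitted `slope` and `intercept` indicates a departure from normality.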
The results of employing the aforementioned models in our example appear in Table 4. The table includes three panels. The first (second) panel shows the results for the first (second) equation of model (4.4). Confining ourselves to those estimates of the parameters that are significant at level, the results for the normal and skew-normal densities are provided in both panels. The last panel shows the estimates of the components of the covariance matrix and the shape parameters. A test was conducted to check whether or not the skewness parameter is equal to zero. This led to with . Since the test was significant at level, we conclude that the skewness parameter is not zero and that the skew-normal MLE is more effective than the normal MLE.
Table 4: The results of fitting the seemingly unrelated regression (SUR) model in (4.4) under the skew-normal and normal distribution assumptions for the response in the Iranian rural households income and expenditure data for the year 2009.
Based on the results given in the first panel of Table 4, the use of facilities (including the Internet, gas, and mobile) has a direct effect on household expenditure in Iran. In other words, using these facilities can increase household expenditure. It is also seen that family size, the numbers of literate members, employees, and people with income in the household, and age have a direct link with household expenditure. Moreover, regarding the second panel of Table 4, agriculture self-employment income, non-agriculture self-employment income, miscellaneous income, and non-monetary other incomes have a direct effect on household income.
Data with an asymmetric histogram are encountered in many examples, and a skew-normal distribution is often a suitable basis for constructing a model. The problem becomes harder when SEMs must be taken into account. Confining ourselves to the SUR model, which is a particular case of SEM, we discussed in this paper a method for estimating the parameters of this model when the response variables follow the skew-normal distribution. The performance of the proposed method was compared with an alternative in which the normal density is incorrectly assumed for the error. Then, we applied the methods discussed in this paper to real data. The results showed the superiority of our approach over methods relying on a normal distribution for the error. There is still room to extend the model in this paper. One possible option is to investigate the performance of the Bayesian approach on the SUR model with the skew-normal assumption for the error term. Moreover, how other skew distributions, such as the skew-t density, perform in SUR models is worth studying.
CONFLICTS OF INTEREST
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Support from the Center of Excellence in Analysis of Spatio-Temporal Correlated Data at Tarbiat Modares University is acknowledged.
Theorem: For any fixed matrix .
In Equation (3.8), is determined as follows:
Suppose is the -th observation from -th equation and is -vector and is a -vector. Also consider
Hence, where is the corresponding variable to . The last step for determining the derivative of is to set the matrix as
As it can be seen, the first column and -th row of is equal to . Finally, for all other observations, the corresponding derivative is given as:
As a general rule, the Hessian matrix is required if one is interested in utilizing the quasi-Newton algorithm. The relevant derivatives to construct such a matrix are as follows:
Notice that we used the property . The second derivative of subject to is straightforward. However, the computation of is too tough. To obtain this derivative, we applied formula (5.5) to get:
On getting (5.7), we employed the following equality in which is a non-singular matrix:
If both and are members of -th equation in a SUR, then; we have:
2SLS: Two-Stage Least Squares
3SLS: Three-Stage Least Squares
GMM: Generalized Method of Moments
LIML: Limited Information Maximum Likelihood
FIML: Full Information Maximum Likelihood
Cite this article
Omid Akhgari and Mousa Golalizadeh (2021). On Seemingly Unrelated Regression Model with Skew Error. Journal of Statistical Theory and Applications, 20(1), 97-110. ISSN 2214-1766. https://doi.org/10.2991/jsta.d.210126.002