# Journal of Statistical Theory and Applications

Volume 20, Issue 1, March 2021, Pages 97 - 110

# On Seemingly Unrelated Regression Model with Skew Error

Authors
1Department of Statistics, Amin University, Tehran, Iran
2Department of Statistics, Tarbiat Modares University, Tehran, Iran
Corresponding Author
Received 7 October 2018, Accepted 5 January 2021, Available Online 8 February 2021.
DOI
10.2991/jsta.d.210126.002
Keywords
Seemingly unrelated regression; Endogenous variable; Exogenous variable; Skew-normal distribution
Abstract

Sometimes a single causal relationship is not adequate to explain the dependency between variables, particularly in some economic problems. Instead, two jointly related equations, in which one of the explanatory variables is endogenous, can represent the actual inter-relationship among the variables. Such models are called simultaneous equation models, of which the seemingly unrelated regression (SUR) model is a special case. Substantial progress has been made on statistical inference for the parameters of these models when the errors follow a normal distribution, but less research has been devoted to the case in which the error distribution is asymmetric. In this paper, statistical inference on the parameters of the SUR model is tackled under a skew-normal density for the errors. The results are compared with those of the naive approach that assumes normal errors. The proposed model is used to analyze the income and expenditure of Iranian rural households in the year 2009.

Open Access

## 1. INTRODUCTION

Most linear regression models rely on the relationship between a dependent variable and one or more explanatory variables. The main objective in treating these models is to estimate and predict the average value of the dependent variable given the explanatory variables. In many cases, however, particularly in some economic problems, the causal relationship represented by a single equation is not appropriate: not only does the response variable depend on the explanatory variables, but the response variable also determines some of the explanatory variables. In other words, there are simultaneous, or two-sided, relationships between the response and some of the explanatory variables. Hence, separating the variables into explanatory and dependent ones does not make sense in such real-life circumstances. In these situations, the number of equations will naturally be more than one; precisely, there is one equation for every endogenous (dependent) variable. Following Haavelmo, when the dependent variable of one equation is an explanatory variable in another, one should use simultaneous equations models (SEMs). A particular case of these models is the seemingly unrelated regression (SUR) model.

Zellner was the first to estimate the parameters of the SUR model, using the generalized least squares method. Work on such models from the frequentist viewpoint has been relatively scarce, whereas there has been much research following the Bayesian approach. The application of the Bayesian approach to the SUR model was first proposed by Zellner. Afterward, other methods for estimating the parameters were used, including the maximum likelihood method, the Bayesian method of moments, and the direct Monte Carlo method. The application of MCMC to the SUR model has appeared in many studies under various assumptions; see, for example, Percy, Chib and Greenberg, and Smith and Kohn. More recently, Zellner and Ando and also Zellner et al. have investigated the estimation of the parameters in the SUR model using a hierarchical Bayes approach through the direct Monte Carlo and importance sampling techniques.

Another important aspect of SUR models that is worth studying is the type of distribution considered for the error term. It is quite common to assume a normal density for the errors. However, there are numerous examples in which the empirical distribution of the variables exhibits an asymmetric structure, so that the normal distribution can no longer be used. In these situations, some transformation may be applied to bring the distribution of the data closer to normality. However, such transformations have their own drawbacks, including bias of the estimator. Using asymmetric distributions that share many properties of the normal distribution has recently received significant attention in the literature. The skew-normal distribution is one of the important distributions proposed to handle asymmetry in data. Historically, the univariate skew-normal distribution was advocated by Azzalini. Then, Azzalini and Dalla Valle proposed the multivariate skew-normal distribution, and Azzalini and Capitanio further studied its properties. Several generalizations of this distribution have been presented by Balakrishnan, Genton, Gupta et al., and Arellano-Valle et al. More recently, Azzalini and Regoli have investigated some other properties of skew-symmetric distributions. As a new line of research, we consider the SUR model in which the error follows the skew-normal distribution. The estimation of the parameters by maximum likelihood is treated, simulation studies are conducted to evaluate the proposed method, and an application of the model to real-life data is given.

The present paper is organized as follows. A brief review of the SUR model is presented in Section 2. A likelihood-based approach to estimating the parameters of the SUR model with skew-normal errors is discussed in Section 3. The simulation study and the analysis of real data on the income and expenditure of Iranian rural households in the year 2009 are presented in Section 4. General conclusions are provided at the end. The proofs of some of the results are given in the Appendix.

## 2. SUR MODEL

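Suppose $X_t$ is an $n\times k_t$ matrix of explanatory variables and $\beta_t$ is a column vector of parameters of length $k_t$. Furthermore, suppose there are $g$ equations corresponding to $g$ endogenous variables, each a column vector of length $n$, denoted by $y_1,\dots,y_g$. Hence, the $t$-th equation of a linear simultaneous system can be written as

$$y_t = X_t\beta_t + u_t,\qquad t=1,\dots,g, \tag{2.1}$$
where
$$E(u_{ti})=0,\quad \operatorname{Var}(u_{ti})=\sigma_{tt},\quad \operatorname{Cov}(u_{ti},u_{si})=\sigma_{ts},\qquad t,s=1,2,\dots,g,\quad i=1,2,\dots,n. \tag{2.2}$$

Let the $g$-vectors $y_i$ and $u_i$ consist of $y_{ti}$ and $u_{ti}$, respectively, stacked vertically over $t$ for fixed $i$. Accordingly, the $k$-vector $\beta$ is formed by stacking the $\beta_t$ vertically. Then the matrix $X_i$ is of dimension $g\times k$, where $k=\sum_{t=1}^{g}k_t$. In fact, it is a block-diagonal matrix whose diagonal blocks $X_{ti}$, also for fixed $i$, are of dimension $1\times k_t$. In short, the notation can be summarized as follows:

$$y_i=\begin{pmatrix} y_{1i}\\ \vdots\\ y_{gi}\end{pmatrix}_{g\times 1},\quad u_i=\begin{pmatrix} u_{1i}\\ \vdots\\ u_{gi}\end{pmatrix}_{g\times 1},\quad \beta=\begin{pmatrix} \beta_1\\ \vdots\\ \beta_g\end{pmatrix}_{k\times 1},\quad X_i=\begin{pmatrix} X_{1i} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & X_{gi}\end{pmatrix}_{g\times k}. \tag{2.3}$$

Based on this notation, model (2.1) can be rewritten as

$$y_i = X_i\beta + u_i,\qquad i=1,\dots,n. \tag{2.4}$$

As a common assumption, we now take $u_i\sim N(0_g,\Sigma)$, where $\Sigma=\{\sigma_{ts}\}_{g\times g}$.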

In the present study, we aim to estimate the parameters of this SUR model. This can be achieved via many parametric and nonparametric estimation procedures, including 2SLS¹, 3SLS², GMM³, LIML⁴, and FIML⁵; see Anderson and Rubin, Theil, and Davidson and MacKinnon. In this paper we focus on FIML under normal and skew-normal error assumptions. Moreover, a number of important statistical features pertaining to these models are provided.

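Based on the information provided so far, we can write down the likelihood function used to estimate the parameters. As is common, we work with the logarithm of the likelihood, denoted by $l(\beta,\Sigma)$. With the observation index now written as $t$, it is given by

$$l(\beta,\Sigma) = -\frac{ng}{2}\log 2\pi - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{t=1}^{n}\left[(y_t-X_t\beta)^T\Sigma^{-1}(y_t-X_t\beta)\right], \tag{2.5}$$
and should be maximized to obtain the FIML estimators. It is straightforward to show (see, e.g., Anderson and Rubin) that the maximum likelihood estimators of the parameters satisfy
$$\hat\beta=\left[\sum_{t=1}^{n}X_t^T\Sigma^{-1}X_t\right]^{-1}\sum_{t=1}^{n}X_t^T\Sigma^{-1}y_t,\qquad \hat\Sigma=\frac{1}{n}\sum_{t=1}^{n}\left[(y_t-X_t\beta)(y_t-X_t\beta)^T\right], \tag{2.6}$$
where $\Sigma$ in the first expression and $\beta$ in the second are replaced by their current estimates, so that the two equations are solved iteratively.

Moreover, a simple computation shows that

$$\operatorname{Var}(\hat\beta)=\left[\sum_{t=1}^{n}X_t^T\Sigma^{-1}X_t\right]^{-1}. \tag{2.7}$$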
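The alternating updates in (2.6) can be illustrated with a minimal sketch for the smallest nontrivial case: two equations with one regressor each and no intercept, so that $X_t=\operatorname{diag}(x_{1t},x_{2t})$. The function name `fiml_sur_two_eq`, the toy data, and the fixed number of iterations are assumptions made for illustration only; this is not the authors' implementation.

```python
import random

def fiml_sur_two_eq(x1, x2, y1, y2, n_iter=50):
    """Alternate the two updates in (2.6) for y1 = b1*x1 + u1, y2 = b2*x2 + u2."""
    n = len(y1)
    s11, s12, s22 = 1.0, 0.0, 1.0            # start from Sigma = I (equation-wise OLS)
    b1 = b2 = 0.0
    for _ in range(n_iter):
        det = s11 * s22 - s12 * s12
        w11, w12, w22 = s22 / det, -s12 / det, s11 / det   # entries of Sigma^{-1}
        # With X_t = diag(x1[t], x2[t]): accumulate X_t' W X_t and X_t' W y_t
        a11 = a12 = a22 = c1 = c2 = 0.0
        for t in range(n):
            a11 += w11 * x1[t] * x1[t]
            a12 += w12 * x1[t] * x2[t]
            a22 += w22 * x2[t] * x2[t]
            c1 += x1[t] * (w11 * y1[t] + w12 * y2[t])
            c2 += x2[t] * (w12 * y1[t] + w22 * y2[t])
        d = a11 * a22 - a12 * a12
        b1 = (a22 * c1 - a12 * c2) / d        # first expression in (2.6)
        b2 = (a11 * c2 - a12 * c1) / d
        r1 = [y1[t] - b1 * x1[t] for t in range(n)]
        r2 = [y2[t] - b2 * x2[t] for t in range(n)]
        s11 = sum(r * r for r in r1) / n      # second expression in (2.6)
        s22 = sum(r * r for r in r2) / n
        s12 = sum(p * q for p, q in zip(r1, r2)) / n
    return (b1, b2), (s11, s12, s22)

# Toy data (hypothetical): true slopes 2 and -1, correlated normal errors
rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
e1 = [rng.gauss(0, 1) for _ in range(n)]
e2 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [2.0 * a + e for a, e in zip(x1, e1)]
y2 = [-1.0 * b + 0.5 * e + f for b, e, f in zip(x2, e1, e2)]
beta_hat, sigma_hat = fiml_sur_two_eq(x1, x2, y1, y2)
```

Starting from $\Sigma=I$ makes the first pass equal to equation-by-equation least squares; later passes reweight by the estimated error covariance.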

So far, the estimators have been calculated under the assumption of normal errors. However, if the distribution of the errors is asymmetric, specifically skew-normal, then obtaining the estimators is not as simple as above. To treat this case, we first briefly review the skew-normal distribution in the next section. Then, the FIML estimators of the parameters are obtained under this assumption, while the model includes endogenous variables.

## 3. SUR MODELED WITH SKEW-NORMAL DISTRIBUTION

We first recall the definition and a few key properties of the skew-normal distribution, as given by Azzalini and Dalla Valle. A $k$-dimensional random variable $Z$ follows the multivariate skew-normal distribution if it is continuous with density function

$$2\,\phi_k(z;\Psi)\,\Phi(\lambda^T z),\qquad z\in\mathbb{R}^k, \tag{3.1}$$
where $\phi_k(z;\Psi)$ is the $k$-dimensional normal density with zero mean vector and full-rank correlation matrix $\Psi$, $\Phi(\cdot)$ is the cumulative distribution function of the univariate standard normal distribution, and $\lambda$ is a $k$-dimensional column vector of constants. In short form, it is common to write $Z\sim SN_k(0_k,\Psi,\lambda)$.

The parameter $\lambda$ plays a key role in determining the main features of the density in (3.1). Since it controls the skewness of the density, it is usually referred to as the shape parameter or skewness parameter. The density is skewed to the right (left) for positive (negative) values of $\lambda$. When $\lambda=0$, the density (3.1) reduces to that of $N(0_k,\Psi)$, where $0_q$ denotes a zero vector of length $q$.
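As an illustration of (3.1), the following sketch evaluates the density for $k=2$ using only the standard library. The function names and the numerical values are hypothetical. The last lines use the identity $\Phi(a)+\Phi(-a)=1$, which implies that the densities at $z$ and $-z$ sum to $2\phi_2(z;\Psi)$.

```python
import math

def std_norm_cdf(x):
    """Univariate standard normal distribution function Phi."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bvn_pdf(z1, z2, rho):
    """Bivariate normal density with zero means, unit variances, correlation rho."""
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def sn2_pdf(z1, z2, rho, lam):
    """Density (3.1) for k = 2: 2 * phi_2(z; Psi) * Phi(lambda' z)."""
    return 2.0 * bvn_pdf(z1, z2, rho) * std_norm_cdf(lam[0] * z1 + lam[1] * z2)

# Hypothetical evaluation point and parameters
rho, lam = 0.3, (2.0, 3.0)
f_plus = sn2_pdf(0.5, 1.0, rho, lam)      # point in the direction of the skew
f_minus = sn2_pdf(-0.5, -1.0, rho, lam)   # mirror image of that point
f_normal = sn2_pdf(0.5, 1.0, rho, (0.0, 0.0))  # lambda = 0 gives the normal density
```

With positive $\lambda$ the mass moves toward points where $\lambda^T z>0$, so `f_plus` exceeds `f_minus`.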

Location and scale parameters can also be added to the skew-normal density of $Z$ given in (3.1). Let us write

$$Y=\xi+\omega Z, \tag{3.2}$$
where $\xi=(\xi_1,\dots,\xi_k)^T$ and $\omega=\operatorname{diag}(\omega_1,\dots,\omega_k)$ are the location and scale parameters, respectively. The components of $\omega$ are assumed to be all positive. The density function of $Y$ is then given by
$$2\,\phi_k(y-\xi;\Omega)\,\Phi\!\left(\lambda^T\omega^{-1}(y-\xi)\right), \tag{3.3}$$
where $\Omega=\omega\Psi\omega^T=\omega\Psi\omega$ is the scale matrix of $Y$ (it coincides with the covariance matrix of $Y$ only when $\lambda=0$). We use the standard notation $Y\sim SN_k(\xi,\Omega,\lambda)$ to indicate that $Y$ has the density in (3.3). To give a general graphical view of this density, we provide some plots for particular values of the parameters in (3.3). Figure 1 shows contour plots of the bivariate skew-normal density together with the histogram of each variable.

We are now in a position to concentrate on the estimators in a SUR model with skew-normal errors. Consider model (2.4), with the index $i$ changed to $t$, where
$$u_t=(u_{t1},\dots,u_{tg})^T\sim SN_g(0_g,\Sigma,\lambda),\qquad t=1,\dots,n. \tag{3.4}$$

Now, suppose one is interested in estimating the parameters of this model through the maximum likelihood approach. Then the corresponding logarithm of the likelihood function, say $\ell=l(\lambda,\beta,\Sigma)$, which is given by

$$l=l(\lambda,\beta,\Sigma)=n\log 2-\frac{ng}{2}\log(2\pi)-\frac{n}{2}\log|\Sigma|-\frac{1}{2}\sum_{t=1}^{n}\left[(y_t-X_t\beta)^T\Sigma^{-1}(y_t-X_t\beta)\right]+\sum_{t=1}^{n}\log\left[\Phi_1\!\left(\lambda^T\Sigma^{-1/2}u_t\right)\right], \tag{3.5}$$
needs to be maximized. If we regard $\eta=\Sigma^{-1/2}\lambda$ as a new parameter instead of $\lambda$, the parameters in (3.5) separate in the following sense: for fixed $\beta$ and $\eta$, maximization of $l$ with respect to $\Sigma$ is equivalent to maximizing the analogous function for the normal density with fixed $\beta$, which has a well-known solution (see, e.g., Mardia et al.) given by
$$\hat\Sigma(\beta)=V(\beta)=\frac{1}{n}\sum_{t=1}^{n}(y_t-X_t\beta)(y_t-X_t\beta)^T. \tag{3.6}$$

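By substituting this estimate into the expression in (3.5), one obtains

$$l(\eta,\beta)=C-\frac{n}{2}\log|V(\beta)|-\frac{ng}{2}+\sum_{t=1}^{n}\zeta_0(\eta^T u_t), \tag{3.7}$$
where $\zeta_0(x)=\log(2\Phi(x))$ and $\Phi$ is the $N(0,1)$ distribution function. To obtain the estimators of the remaining parameters, one needs to maximize $l(\eta,\beta)$, which is in fact the profile likelihood function, with respect to $\eta$ and $\beta$. The partial derivatives of $l(\eta,\beta)$ with respect to $\eta$ and $\beta$ can be written, respectively, as
$$
\begin{aligned}
\frac{\partial l(\eta,\beta)}{\partial\eta} &= \sum_{t=1}^{n}u_t\,\zeta_1(\eta^T u_t)=\sum_{t=1}^{n}(y_t-X_t\beta)\,\zeta_1[\eta^T(y_t-X_t\beta)],\\
\frac{\partial l(\eta,\beta)}{\partial\beta} &= -\frac{n}{2}\frac{\partial\log|V(\beta)|}{\partial\beta}-\sum_{t=1}^{n}X_t^T\eta\,\zeta_1[\eta^T(y_t-X_t\beta)]\\
&= -\frac{n}{2}\begin{pmatrix}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}\right)\\ \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_2}\right)\\ \vdots\\ \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}\right)\end{pmatrix}-\sum_{t=1}^{n}X_t^T\eta\,\zeta_1[\eta^T(y_t-X_t\beta)],
\end{aligned} \tag{3.8}
$$
where $\zeta_1(x)=\phi(x)/\Phi(x)$. The equations in (3.8) do not admit closed-form solutions, so a numerical maximization procedure is needed. There is a large literature on such numerical computations; see, for example, Robert and Casella. A common approach is to follow the quasi-Newton algorithm, which requires the second derivatives of the expression in (3.7). They are given as follows:
$$
\begin{aligned}
\frac{\partial^2 l(\eta,\beta)}{\partial\eta\,\partial\eta^T} &= \sum_{t=1}^{n}(y_t-X_t\beta)(y_t-X_t\beta)^T\zeta_2[\eta^T(y_t-X_t\beta)],\\
\frac{\partial^2 l(\eta,\beta)}{\partial\beta\,\partial\beta^T} &= -\frac{n}{2}\begin{pmatrix}\operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_1^2}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_2\partial\beta_1}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_k\partial\beta_1}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_1\partial\beta_k}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_2\partial\beta_k}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_k^2}\right)\end{pmatrix}\\
&\quad+\frac{n}{2}\begin{pmatrix}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}V^{-1}\frac{\partial V}{\partial\beta_1}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_2}V^{-1}\frac{\partial V}{\partial\beta_1}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}V^{-1}\frac{\partial V}{\partial\beta_1}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}V^{-1}\frac{\partial V}{\partial\beta_k}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_2}V^{-1}\frac{\partial V}{\partial\beta_k}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}V^{-1}\frac{\partial V}{\partial\beta_k}\right)\end{pmatrix}\\
&\quad+\sum_{t=1}^{n}X_t^T\eta\,\eta^T X_t\,\zeta_2[\eta^T(y_t-X_t\beta)],\\
\frac{\partial^2 l(\eta,\beta)}{\partial\beta^T\partial\eta} &= -\sum_{t=1}^{n}\left\{X_t^T\eta\,(y_t-X_t\beta)^T\zeta_2[\eta^T(y_t-X_t\beta)]+X_t^T\zeta_1[\eta^T(y_t-X_t\beta)]\right\},\\
\frac{\partial^2 l(\eta,\beta)}{\partial\eta^T\partial\beta} &= \left(\frac{\partial^2 l(\eta,\beta)}{\partial\beta^T\partial\eta}\right)^T,
\end{aligned} \tag{3.9}
$$
where $\zeta_2(x)=-\zeta_1(x)[x+\zeta_1(x)]$ is the derivative of $\zeta_1$. If $\Upsilon$ is the parameter of interest and $f$ is the function in which this parameter appears, the quasi-Newton update is
$$\Upsilon^{(k+1)}=\Upsilon^{(k)}-\left[(\nabla^2 f)^{(k)}\right]^{-1}(\nabla f)^{(k)}, \tag{3.10}$$
where the superscripts indicate the value at the corresponding iteration and, ignoring the superscript and ordering the parameter as $\Upsilon=(\beta^T,\eta^T)^T$,
$$\nabla f=\begin{pmatrix}\dfrac{\partial l(\eta,\beta)}{\partial\beta}\\[2ex] \dfrac{\partial l(\eta,\beta)}{\partial\eta}\end{pmatrix},\qquad \nabla^2 f=\begin{pmatrix}\dfrac{\partial^2 l(\eta,\beta)}{\partial\beta\,\partial\beta^T} & \dfrac{\partial^2 l(\eta,\beta)}{\partial\beta^T\partial\eta}\\[2ex] \dfrac{\partial^2 l(\eta,\beta)}{\partial\eta^T\partial\beta} & \dfrac{\partial^2 l(\eta,\beta)}{\partial\eta\,\partial\eta^T}\end{pmatrix}. \tag{3.11}$$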
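To make the roles of $\zeta_0$, $\zeta_1$, $\zeta_2$ and the update (3.10) concrete, the sketch below applies a Newton-type iteration to the $\eta$-part of (3.7) only, in the univariate case with the residuals $u_t$ held fixed. This is a deliberately reduced illustration, not the full joint update over $(\beta,\eta)$; the step-halving safeguard, the function names, and the simulated residuals (shape parameter 2, unit scale) are assumptions added here.

```python
import math
import random

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def zeta0(x):
    return math.log(2.0 * Phi(x))

def zeta1(x):
    """First derivative of zeta0."""
    return phi(x) / Phi(x)

def zeta2(x):
    """Second derivative of zeta0 (always negative)."""
    z = zeta1(x)
    return -z * (x + z)

def objective(eta, u):
    """eta-dependent part of (3.7) for scalar residuals u_t."""
    return sum(zeta0(eta * ut) for ut in u)

def newton_eta(u, eta=0.0, tol=1e-10, max_iter=100):
    """Newton-type update (3.10) for eta, with step halving as a safeguard."""
    for _ in range(max_iter):
        grad = sum(ut * zeta1(eta * ut) for ut in u)        # eta-score in (3.8)
        hess = sum(ut * ut * zeta2(eta * ut) for ut in u)   # eta-block in (3.9)
        step = -grad / hess
        base = objective(eta, u)
        while objective(eta + step, u) < base and abs(step) > tol:
            step *= 0.5
        eta += step
        if abs(step) < tol:
            break
    return eta

# Hypothetical residuals: univariate skew-normal draws with shape 2
rng = random.Random(0)
u = []
for _ in range(5000):
    x, t = rng.gauss(0, 1), rng.gauss(0, 1)
    u.append(x if t < 2.0 * x else -x)
eta_hat = newton_eta(u)
```

Because $\zeta_2<0$ everywhere, the $\eta$-part of the objective is concave, which is why a plain Newton step with a simple safeguard is usually sufficient in this reduced setting.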

We conduct some simulation studies using model (2.1) along with normal and skew-normal distributions in the following section. Moreover, we investigate the application of these methods in real-life data.

## 4. SIMULATION STUDIES AND APPLICATION

Here, we outline our simulation study to evaluate the performance of parameter estimation for the SUR models given in Section 2. Suppose we have the following model:

$$
\begin{aligned}
y_1 &= \beta_0+\beta_1 z_1+\beta_2 x_1+u_1,\\
y_2 &= \gamma_0+\gamma_1 z_1+\gamma_2 x_2+u_2.
\end{aligned} \tag{4.1}
$$

To fully specify this model, we need to choose a distribution for $(u_1,u_2)^T$. To start, let us assume $u=(u_1,u_2)^T\sim N(0_2,\Sigma)$, where $y_1$ and $y_2$ are endogenous variables and $z_1$, $x_1$, and $x_2$ are exogenous variables. To compare this model with an alternative, we also consider the case in which $u=(u_1,u_2)^T\sim SN(0_2,\Sigma,\lambda)$.

We fix the parameters in our simulation studies as $\beta=(6,-3,-4,9,3,-2)^T$, $\lambda=(2,3)^T$, and

$$\Sigma=\begin{pmatrix}12 & -2\\ -2 & 11\end{pmatrix}. \tag{4.2}$$

We take the sample size equal to 1000, so that the two equations in (4.1) yield 2000 observations in total. We then generate 1000 replicated data sets from the skew-normal distribution. Each data set is fitted by both maximum likelihood approaches (normal and skew-normal assumptions) as described in the previous sections. In particular, the parameters are estimated from either (2.6) or (3.10), depending on the distribution considered for the errors in the model.
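One replication of this design could be generated along the following lines. This is only a sketch under stated assumptions: the slope signs are taken as suggested by the estimates in Table 1, the exogenous variables are drawn as independent standard normals (the text does not specify their distribution), and the errors are drawn through the location-scale form (3.2) using the reflection representation of the skew-normal law, in which $Z=X$ if $T<\lambda^T X$ and $Z=-X$ otherwise, with $X\sim N_2(0,\Psi)$ and $T\sim N(0,1)$ independent. All names are hypothetical.

```python
import math
import random

BETA = (6.0, -3.0, -4.0)      # beta0, beta1, beta2 (signs as suggested by Table 1)
GAMMA = (9.0, 3.0, -2.0)      # gamma0, gamma1, gamma2
SIGMA = ((12.0, -2.0), (-2.0, 11.0))
LAM = (2.0, 3.0)

def draw_sn_errors(rng):
    """One draw u = omega * Z with Z ~ SN_2(0, Psi, lambda), following (3.2)."""
    w1, w2 = math.sqrt(SIGMA[0][0]), math.sqrt(SIGMA[1][1])
    rho = SIGMA[0][1] / (w1 * w2)                          # correlation in Psi
    a, b, t = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    z1, z2 = a, rho * a + math.sqrt(1.0 - rho * rho) * b   # (z1, z2) ~ N_2(0, Psi)
    if t >= LAM[0] * z1 + LAM[1] * z2:                     # reflect the draw
        z1, z2 = -z1, -z2
    return w1 * z1, w2 * z2

def simulate(n=1000, seed=1):
    """Rows (y1, y2, z1, x1, x2) from model (4.1); N(0,1) regressors are assumed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u1, u2 = draw_sn_errors(rng)
        y1 = BETA[0] + BETA[1] * z1 + BETA[2] * x1 + u1
        y2 = GAMMA[0] + GAMMA[1] * z1 + GAMMA[2] * x2 + u2
        rows.append((y1, y2, z1, x1, x2))
    return rows

data = simulate()
```

Under this parameterization the errors have a nonzero mean, which is one way a normal-error fit can absorb skewness into the intercepts.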

The results of our simulation studies for both the normal and skew-normal cases are given in Table 1. The table is partitioned into two parts: the three left-hand columns relate to the normal assumption for the error term and the three right-hand columns to the skew-normal assumption. The distributions are indicated by N (Normal) and SN (Skew-Normal). The table reports the estimate, the standard deviation (SD), and the effect size (ES), where ES equals the absolute difference between the estimate and the true parameter value.

| Parameter | N-ML Estimate | N-ML SD | N-ML ES | SN-ML Estimate | SN-ML SD | SN-ML ES |
|---|---|---|---|---|---|---|
| β0 | 9.552 | 0.602 | 3.552 | 5.736 | 0.313 | 0.264 |
| β1 | −3.001 | 0.018 | 0.001 | −3.003 | 0.011 | 0.003 |
| β2 | −4.002 | 0.067 | 0.002 | −3.982 | 0.023 | 0.018 |
| γ0 | 18.11 | 0.751 | 9.11 | 8.711 | 0.451 | 0.289 |
| γ1 | 3.007 | 0.063 | 0.007 | 3.007 | 0.029 | 0.007 |
| γ2 | −2.013 | 0.068 | 0.013 | −1.969 | 0.037 | 0.031 |
| σ11 | 20.55 | 4.849 | 8.55 | 12.47 | 1.059 | 0.474 |
| σ22 | 41.42 | 5.543 | 30.42 | 11.25 | 3.377 | 0.258 |
| σ12 | −9.57 | 1.414 | 7.577 | −1.72 | 1.175 | 0.28 |
| λ1 | | | | 2.210 | 0.691 | 0.210 |
| λ2 | | | | 3.211 | 1.080 | 0.211 |

Table 1: Results of the SUR model fitted under the normal and skew-normal assumptions.

Based on the results in Table 1, the estimates of β1, β2, γ1, and γ2 have small ES in both cases. The ES for the intercepts is larger than for the slopes regardless of which distribution is considered for the error term, but it is much higher in the normal model than in the skew-normal one. Overall, the estimates in the SN case are closer to the true parameter values used to generate the data. In general, when the response variables in the SUR model follow a skew-normal distribution, methods that rely on the skew-normal density for the error lead to more accurate estimation than methods based on the normal density.

The likelihood ratio test of the null hypothesis $\lambda=0$ can be used as a criterion for deciding whether or not the skew-normal distribution should be considered. The test statistic is given by

$$2\{\ell(\hat\beta,\hat\Sigma,\hat\lambda)-\ell(\hat\mu,\hat\Omega,0)\}, \tag{4.3}$$
where $\hat\beta$, $\hat\Sigma$, and $\hat\lambda$ denote the MLEs under the assumption of skew-normality (abbreviated SN-ML), and $\hat\mu$ and $\hat\Omega$ are the MLEs under the assumption of normality (abbreviated N-ML) for the errors. Following Casella and Berger, the statistic (4.3) asymptotically follows a $\chi^2_{df}$ distribution under the null hypothesis, where $df$ is the difference between the dimensions of the parameter under the alternative and null hypotheses. The logarithm of the likelihood and the AIC criterion for both methods appear in Table 2. As can be seen, the log-likelihood for SN-ML is higher than that of N-ML, and the AIC for SN-ML is lower than that of N-ML. Therefore, SN-ML outperforms N-ML in this study; that is, compared with N-ML, using the skew-normal density for the error term in the SUR model, fitted via (3.10), improves the accuracy and reduces the bias of the estimators. Here, the likelihood ratio test statistic was $\mathrm{LRT}=2\{\ell(\hat\beta,\hat\Sigma,\hat\lambda)-\ell(\hat\mu,\hat\Omega,0)\}=119.48$ with $df=2$. Hence, the test is significant at the 0.05 level, and it can be stated that the skewness parameter $\lambda$ is not zero. This supports our initial assumption of a skew-normal distribution for the error terms.

| Criteria | N-ML | SN-ML |
|---|---|---|
| AIC | 13143.32 | 13035.198 |
| Log likelihood | −6560.66 | −6508.599 |

Table 2: Criteria for comparing the two methods of estimating the model parameters.
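The significance statement for the likelihood ratio test can be checked directly, because the chi-square survival function with 2 degrees of freedom has the closed form $P(X>x)=e^{-x/2}$. The sketch below uses only the reported statistic and the standard library; the function and variable names are illustrative.

```python
import math

def chi2_sf_df2(x):
    """Survival function of the chi-square distribution with 2 df: exp(-x/2)."""
    return math.exp(-x / 2.0)

lrt_simulation = 119.48                  # LRT statistic reported for the simulation
critical_value = -2.0 * math.log(0.05)   # 5% critical value for df = 2
p_value = chi2_sf_df2(lrt_simulation)
reject_null = lrt_simulation > critical_value
```

The 5% critical value for 2 degrees of freedom is about 5.99, so any statistic of the reported magnitude rejects $\lambda=0$ under the asymptotic approximation.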

We applied the proposed model to real-life data, namely the Iranian rural households income and expenditure data collected in the year 2009, which include 13345 families from 32 provinces. The main goal is to study the effects of some variables on the income and expenditure of Iranian rural households. In this study, these two variables are considered endogenous and the other covariates are treated as exogenous. Based on a general view and on consultation with experts at the Statistical Center of Iran, the following SUR model was used to express the inter-relationship between rural household income and expenditure in Iran:

$$
\begin{aligned}
GH &= \beta_0+\sum_{i=1}^{4}\beta_{C_i}C_i+\sum_{i=1}^{5}\beta_{B_i}B_i+\beta_A A+\epsilon_1,\\
D &= \gamma_0+\sum_{i=1}^{4}\gamma_{D_i}D_i+\epsilon_2.
\end{aligned} \tag{4.4}
$$

A general description of the variables is provided in Table 3. Figures 2–4 present a graphical display of two important variables.

| Variable name | Abbreviation | Variable type | Coding |
|---|---|---|---|
| Households expenditure | GH | Quantitative | |
| Households income | D | Quantitative | |
| Family size | C1 | Quantitative | |
| Number of literate in household | C2 | Quantitative | |
| Number of employees in household | C3 | Quantitative | |
| Number of people with income | C4 | Quantitative | |
| Age | A | Quantitative | |
| Floor area | B1 | Quantitative | |
| Private car | B2 | Qualitative | 1: Use, 0: Nonuse |
| Internet | B3 | Qualitative | 1: Use, 0: Nonuse |
| Gas | B4 | Qualitative | 1: Use, 0: Nonuse |
| Mobile | B5 | Qualitative | 1: Use, 0: Nonuse |
| Agriculture self-employment income | D1 | Quantitative | |
| Nonagriculture self-employment income | D2 | Quantitative | |
| Miscellaneous income | D3 | Quantitative | |
| Non-monetary other incomes | D4 | Quantitative | |

Table 3: Description of the variables used in model (4.4).

To begin the analysis, the validity of the normality assumption for the response variables should be tested. We used the Kolmogorov–Smirnov (KS) test for this purpose. The KS test was significant with p-value <0.05, rejecting the null hypothesis of normality. For a visual inspection, the Q-Q plots of household income and expenditure are drawn in Figure 5. They show a departure from the univariate normal distribution for both variables. The contour plot in Figure 5 also demonstrates a departure from the bivariate normal distribution. It could be argued that a transformation, such as the logarithm, would be appropriate to make the distribution closer to normal. However, the income variable includes some negative values, so this transformation cannot be used. Instead, we preferred to use the skew-normal distribution for the errors and modeled rural household income and expenditure in Iran on this basis. Nonetheless, to have a baseline for comparison, the normal distribution was also considered for the errors in this example.
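A one-sample KS check of normality can be sketched as follows. The data here are synthetic right-skewed draws, not the household data, and the function names are hypothetical. Note also that plugging the sample mean and standard deviation into the reference normal distribution makes the classical asymptotic p-value only approximate, so this is an illustration of the mechanics rather than a definitive test procedure.

```python
import math
import random

def normal_cdf(x, mu, sd):
    return 0.5 * math.erfc(-(x - mu) / (sd * math.sqrt(2.0)))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical distribution function and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Kolmogorov limiting distribution: 2 * sum (-1)^(j-1) exp(-2 j^2 n d^2)."""
    lam2 = n * d * d
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam2) for j in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))

# Synthetic right-skewed sample (stand-in for a skewed response variable)
rng = random.Random(3)
sample = [rng.expovariate(1.0) for _ in range(2000)]
mu = sum(sample) / len(sample)
sd = math.sqrt(sum((x - mu) ** 2 for x in sample) / (len(sample) - 1))
d_stat = ks_statistic(sample, lambda x: normal_cdf(x, mu, sd))
p_val = ks_pvalue_asymptotic(d_stat, len(sample))
```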

The results of fitting the aforementioned models to our example appear in Table 4. The table includes three panels. The first (second) panel shows the results for the first (second) equation of model (4.4). Restricting attention to the parameter estimates that are significant at the 5% level, the results for the normal and skew-normal densities are provided in both panels. The last panel shows the estimates of the components of the covariance matrix and of the shape parameters. A test was conducted to check whether or not the skewness parameter $\lambda$ is equal to zero. This led to $\mathrm{LRT}=2\{\ell(\hat\beta,\hat\Sigma,\hat\lambda)-\ell(\hat\mu,\hat\Omega,0)\}=-24385.1-(-24444.4)=59.3$ with $df=2$. Since the test was significant at the 0.05 level, we conclude that the skewness parameter is not zero and that the skew-normal MLE is more effective than the normal MLE.

| Parameter | Estimate (N-ML) | Estimate (SN-ML) | Std. error (N-ML) | Std. error (SN-ML) |
|---|---|---|---|---|
| β0 | −1.50 | −1.34 | 0.047 | 0.006 |
| βC1 | 0.036 | 0.040 | 0.009 | 0.001 |
| βC2 | 0.082 | 0.046 | 0.010 | 0.002 |
| βC3 | 0.106 | 0.059 | 0.011 | 0.004 |
| βC4 | 0.061 | −0.024 | 0.014 | 0.004 |
| βB1 | 0.003 | 0.003 | 0.0006 | 0.0001 |
| βB2 | 0.004 | 0.002 | 0.0002 | 0.0005 |
| βB3 | 0.649 | 0.531 | 0.024 | 0.013 |
| βB4 | 0.689 | 0.490 | 0.051 | 0.032 |
| βB5 | 0.064 | 0.031 | 0.018 | 0.0085 |
| βA | 0.276 | 0.137 | 0.025 | 0.0064 |
| γ0 | 0 | −0.103 | 0.025 | 0.0038 |
| γD1 | 0.544 | 0.499 | 0.013 | 0.0039 |
| γD2 | 0.503 | 0.487 | 0.013 | 0.0039 |
| γD3 | 0.412 | 0.352 | 0.013 | 0.0039 |
| γD4 | 0.033 | 0.030 | 0.013 | 0.0038 |
| σ11 | 0.656 | 0.051 | 0.011 | 0.009 |
| σ21 | 0.084 | 0.009 | 0.009 | 0.001 |
| σ22 | 0.315 | 0.018 | 0.008 | 0.004 |
| λ1 | | 1.181 | | 0.104 |
| λ2 | | 0.869 | | 0.097 |

Table 4: Results of fitting the seemingly unrelated regression (SUR) model in (4.4) under the skew-normal and normal distribution assumptions for the response, for the Iranian rural households income and expenditure data of the year 2009.

Based on the results given in the first panel of Table 4, using facilities (including the Internet, gas, and mobile) has a direct effect on household expenditure in Iran. In other words, using these facilities can increase household expenditure. It is also seen that family size, the number of literate members, the number of employees, the number of people with income in the household, and age have a direct link with household expenditure. Moreover, regarding the second panel of Table 4, agriculture self-employment income, non-agriculture self-employment income, miscellaneous income, and non-monetary other incomes have a direct effect on household income.

## 5. CONCLUSION

Data with an asymmetric histogram are encountered in many applications, and skew-normal distributions are often a suitable basis for modeling them. The problem becomes harder when SEMs must be taken into account. Confining ourselves to the SUR model, which is a particular case of SEM, we discussed a method for estimating the parameters of this model when the response variables follow the skew-normal distribution. The performance of the proposed method was compared with the alternative in which the normal density is incorrectly assumed for the error. We then applied the methods discussed in this paper to real data. The results showed the superiority of our approach over methods relying on the normal distribution for the error. There is still room to extend the model. One possible option is to investigate the performance of the Bayesian approach for the SUR model with skew-normal errors. Moreover, how other skew distributions, such as the skew-t density, perform in SUR models is worth studying.

## CONFLICTS OF INTEREST

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

## Funding Statement

This work received support from the Center of Excellence in Analysis of Spatio Temporal Correlated Data at Tarbiat Modares University.

## APPENDIX

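Theorem: For any fixed $(p\times p)$ matrix $A>0$, the function

$$f(\Sigma)=|\Sigma|^{-n/2}\exp\left\{-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}A\right\} \tag{5.1}$$
is maximized over $\Sigma>0$ by $\Sigma=n^{-1}A$, and so $f(n^{-1}A)=|n^{-1}A|^{-n/2}e^{-np/2}$.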
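For intuition, the theorem can be checked numerically in the scalar case $p=1$, where $\log f(s)=-(n/2)\log s-a/(2s)$ and the claimed maximizer is $s=a/n$. The values of `a` and `n` and the grid below are arbitrary illustrative choices.

```python
import math

def log_f(s, a, n):
    """Logarithm of f in (5.1) for p = 1: -(n/2) log s - a / (2 s)."""
    return -0.5 * n * math.log(s) - a / (2.0 * s)

a, n = 30.0, 10
grid = [i / 1000.0 for i in range(1, 10001)]          # s in (0, 10]
s_best = max(grid, key=lambda s: log_f(s, a, n))      # grid maximizer
closed_form_max = -0.5 * n * math.log(a / n) - 0.5 * n  # log of |A/n|^{-n/2} e^{-n/2}
```

The grid maximizer lands at `a / n`, and the maximum agrees with the closed-form value stated in the theorem.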

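In Equation (3.8), $\partial V/\partial\beta_j$ is determined as follows.

Suppose $A_i=y_{ti}-X_{ti}\beta_i$ is the residual of the $t$-th observation from the $i$-th equation, where $\beta_i$ is a $k_i$-vector and $X_{ti}^T$ is a $k_i$-vector. Also consider

$$A_t=\begin{pmatrix}A_1 & A_2 & \cdots & A_g\\ 0 & A_2 & \cdots & A_g\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & A_g\end{pmatrix}_{g\times g},\quad \beta=\begin{pmatrix}\beta_1\\ \vdots\\ \beta_g\end{pmatrix}_{k\times 1},\quad X_t=\begin{pmatrix}X_{t1} & 0 & \cdots & 0\\ 0 & X_{t2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & X_{tg}\end{pmatrix}_{g\times k}, \tag{5.2}$$
where $k=\sum_{i=1}^{g}k_i$. The main goal is to obtain the derivative of $V$ with respect to the $j$-th scalar parameter of $\beta$, denoted $\beta_j$ (for $j=1,\dots,k$). To this end, we define a $k$-vector $a$ whose $j$-th element is 1 and whose other elements are all zero. Similarly, we define a $g$-vector $b$ whose $i$-th element is 1 and whose other elements are zero, where $i$ is the equation in which $\beta_j$ appears. Since $X_t^{(j)}$ appears only in the $i$-th equation, we define
$$a=\begin{pmatrix}0\\ \vdots\\ 1\\ \vdots\\ 0\end{pmatrix}_{k\times 1},\qquad b=\begin{pmatrix}0\\ \vdots\\ 1\\ \vdots\\ 0\end{pmatrix}_{g\times 1}. \tag{5.3}$$

Hence, $X_t^{(j)}=b^T X_t a$, where $X_t^{(j)}$ is the variable corresponding to $\beta_j$. The last step in determining the derivative of $V$ is to set the matrix $C_{tj}$ as

$$C_{tj}=-\begin{pmatrix}0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ X_t^{(j)} & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0\end{pmatrix}_{g\times g}A_t, \tag{5.4}$$
where the minus sign arises because $\partial A_i/\partial\beta_j=-X_t^{(j)}$.

The only nonzero element of the left-hand factor, located in its $i$-th row and first column, is $X_t^{(j)}$, so the $i$-th row of $C_{tj}$ equals $-X_t^{(j)}(A_1,\dots,A_g)$ and all its other rows are zero. Finally, averaging over all observations, the derivative is given by

$$\frac{\partial V}{\partial\beta_j}=\frac{1}{n}\sum_{t=1}^{n}\left(C_{tj}+C_{tj}^T\right). \tag{5.5}$$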
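The derivative in (5.5), including the sign of the residual derivative $\partial A_i/\partial\beta_j=-X_t^{(j)}$, can be compared with a finite-difference approximation. The sketch below does this for two equations with one regressor each; the data values and function names are made up for illustration.

```python
def residuals(beta, x1, x2, y1, y2):
    """Pairs (A_1, A_2) of residuals for each observation t."""
    return [(y1[t] - beta[0] * x1[t], y2[t] - beta[1] * x2[t]) for t in range(len(y1))]

def V(beta, x1, x2, y1, y2):
    """Residual covariance matrix V(beta) from (3.6) for g = 2."""
    r = residuals(beta, x1, x2, y1, y2)
    n = len(r)
    v11 = sum(a * a for a, _ in r) / n
    v12 = sum(a * b for a, b in r) / n
    v22 = sum(b * b for _, b in r) / n
    return [[v11, v12], [v12, v22]]

def dV_dbeta(j, beta, x1, x2, y1, y2):
    """Analytic derivative (5.5); beta_j belongs to equation i = j here."""
    r = residuals(beta, x1, x2, y1, y2)
    n = len(r)
    x = (x1, x2)[j]
    d = [[0.0, 0.0], [0.0, 0.0]]
    for t, A in enumerate(r):
        for m in range(2):
            # Row j of C_tj is -x_t * (A_1, A_2); accumulate C_tj + C_tj^T
            d[j][m] += -x[t] * A[m]
            d[m][j] += -x[t] * A[m]
    return [[v / n for v in row] for row in d]

# Made-up data and evaluation point
x1, x2 = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 1.5]
y1, y2 = [2.1, 3.9, 6.2, 7.8], [1.0, -0.5, 0.3, 2.2]
beta, h, j = [1.5, 0.4], 1e-5, 0
plus = V([beta[0] + h, beta[1]], x1, x2, y1, y2)
minus = V([beta[0] - h, beta[1]], x1, x2, y1, y2)
numeric = [[(plus[r][c] - minus[r][c]) / (2 * h) for c in range(2)] for r in range(2)]
analytic = dV_dbeta(j, beta, x1, x2, y1, y2)
```

Since $V$ is quadratic in $\beta$, the central difference agrees with the analytic derivative up to rounding error.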

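As a general rule, the Hessian matrix is required if one is interested in using the quasi-Newton algorithm. The relevant derivatives needed to construct this matrix are as follows:

$$
\begin{aligned}
\frac{\partial^2 l}{\partial\beta^T\partial\eta} &= \frac{\partial}{\partial\beta^T}\left[\sum_{t=1}^{n}(y_t-X_t\beta)\,\zeta_1\!\left(\eta^T(y_t-X_t\beta)\right)\right]\\
&= \sum_{t=1}^{n}\left[\frac{\partial}{\partial\beta^T}\,y_t\,\zeta_1\!\left(\eta^T(y_t-X_t\beta)\right)-\frac{\partial}{\partial\beta^T}\,X_t\beta\,\zeta_1\!\left(\eta^T(y_t-X_t\beta)\right)\right]\\
&= \sum_{t=1}^{n}\left[-X_t^T\eta\,y_t^T\zeta_2\!\left(\eta^T(y_t-X_t\beta)\right)-X_t^T\zeta_1\!\left(\eta^T(y_t-X_t\beta)\right)+X_t^T\eta\,\beta^T X_t^T\zeta_2\!\left(\eta^T(y_t-X_t\beta)\right)\right]\\
&= -\sum_{t=1}^{n}\left[X_t^T\eta\,(y_t-X_t\beta)^T\zeta_2\!\left(\eta^T(y_t-X_t\beta)\right)+X_t^T\zeta_1\!\left(\eta^T(y_t-X_t\beta)\right)\right].
\end{aligned} \tag{5.6}
$$

Notice that we used the property $\left(\frac{\partial^2}{\partial\eta^T\partial\beta}\right)^T=\frac{\partial^2}{\partial\beta^T\partial\eta}$. The second derivative of $l$ with respect to $\eta$ is straightforward. However, the computation of $\frac{\partial^2 l}{\partial\beta\,\partial\beta^T}$ is more involved. To obtain this derivative, we applied formula (5.5) to get

$$
\begin{aligned}
\frac{\partial^2 l}{\partial\beta\,\partial\beta^T} &= \frac{\partial}{\partial\beta}\left\{-\frac{n}{2}\left(\operatorname{tr}\!\left(V^{-1}\tfrac{\partial V}{\partial\beta_1}\right),\dots,\operatorname{tr}\!\left(V^{-1}\tfrac{\partial V}{\partial\beta_k}\right)\right)^T-\sum_{t=1}^{n}X_t^T\eta\,\zeta_1[\eta^T(y_t-X_t\beta)]\right\}\\
&= -\frac{n}{2}\frac{\partial}{\partial\beta}\left(\operatorname{tr}\!\left(V^{-1}\tfrac{\partial V}{\partial\beta_1}\right),\dots,\operatorname{tr}\!\left(V^{-1}\tfrac{\partial V}{\partial\beta_k}\right)\right)^T+\sum_{t=1}^{n}X_t^T\eta\,\eta^T X_t\,\zeta_2\!\left(\eta^T(y_t-X_t\beta)\right)\\
&= -\frac{n}{2}\begin{pmatrix}\frac{\partial}{\partial\beta_1}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}\right) & \frac{\partial}{\partial\beta_2}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}\right) & \cdots & \frac{\partial}{\partial\beta_k}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial}{\partial\beta_1}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}\right) & \frac{\partial}{\partial\beta_2}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}\right) & \cdots & \frac{\partial}{\partial\beta_k}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}\right)\end{pmatrix}+\sum_{t=1}^{n}X_t^T\eta\,\eta^T X_t\,\zeta_2\!\left(\eta^T(y_t-X_t\beta)\right)\\
&= -\frac{n}{2}\begin{pmatrix}\operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_1^2}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_2\partial\beta_1}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_k\partial\beta_1}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_1\partial\beta_k}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_2\partial\beta_k}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial^2 V}{\partial\beta_k^2}\right)\end{pmatrix}\\
&\quad+\frac{n}{2}\begin{pmatrix}\operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}V^{-1}\frac{\partial V}{\partial\beta_1}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_2}V^{-1}\frac{\partial V}{\partial\beta_1}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}V^{-1}\frac{\partial V}{\partial\beta_1}\right)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_1}V^{-1}\frac{\partial V}{\partial\beta_k}\right) & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_2}V^{-1}\frac{\partial V}{\partial\beta_k}\right) & \cdots & \operatorname{tr}\!\left(V^{-1}\frac{\partial V}{\partial\beta_k}V^{-1}\frac{\partial V}{\partial\beta_k}\right)\end{pmatrix}\\
&\quad+\sum_{t=1}^{n}X_t^T\eta\,\eta^T X_t\,\zeta_2[\eta^T(y_t-X_t\beta)].
\end{aligned} \tag{5.7}
$$

In obtaining (5.7), we employed the following equality, in which $F$ is a non-singular matrix:

$$\frac{\partial^2\log|F|}{\partial x_i\,\partial x_j}=\frac{\partial\operatorname{tr}\!\left(F^{-1}\frac{\partial F}{\partial x_j}\right)}{\partial x_i}=\operatorname{tr}\!\left(F^{-1}\frac{\partial^2 F}{\partial x_i\,\partial x_j}\right)-\operatorname{tr}\!\left(F^{-1}\frac{\partial F}{\partial x_i}F^{-1}\frac{\partial F}{\partial x_j}\right). \tag{5.8}$$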
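The step behind (5.8) can be spelled out with two standard matrix-calculus facts, stated here for a symmetric positive definite $F$ depending smoothly on the scalars $x_i$ and $x_j$:

$$\frac{\partial\log|F|}{\partial x_j}=\operatorname{tr}\!\left(F^{-1}\frac{\partial F}{\partial x_j}\right),\qquad \frac{\partial F^{-1}}{\partial x_i}=-F^{-1}\frac{\partial F}{\partial x_i}F^{-1}.$$

Differentiating the first expression with respect to $x_i$ by the product rule and substituting the second gives

$$\frac{\partial}{\partial x_i}\operatorname{tr}\!\left(F^{-1}\frac{\partial F}{\partial x_j}\right)=\operatorname{tr}\!\left(\frac{\partial F^{-1}}{\partial x_i}\frac{\partial F}{\partial x_j}\right)+\operatorname{tr}\!\left(F^{-1}\frac{\partial^2 F}{\partial x_i\,\partial x_j}\right)=\operatorname{tr}\!\left(F^{-1}\frac{\partial^2 F}{\partial x_i\,\partial x_j}\right)-\operatorname{tr}\!\left(F^{-1}\frac{\partial F}{\partial x_i}F^{-1}\frac{\partial F}{\partial x_j}\right).$$

With $F=V$ and the factor $-n/2$ from (3.7), the first trace produces the matrix of second derivatives of $V$ and the second trace produces the matrix of products of first derivatives, with opposite signs.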

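The components of the second matrix in the last expression of (5.7) are determined using (5.5). Assuming $\beta_j$ is a member of the $i$-th equation in the SUR model, we have

$$B=C_{tj}+C_{tj}^T=-\begin{pmatrix}0 & \cdots & X_t^{(j)}A_1 & \cdots & 0\\ 0 & \cdots & X_t^{(j)}A_2 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ X_t^{(j)}A_1 & \cdots & 2X_t^{(j)}A_i & \cdots & X_t^{(j)}A_g\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & X_t^{(j)}A_g & \cdots & 0\end{pmatrix}, \tag{5.9}$$
where all entries are zero except those in the $i$-th row and $i$-th column. For the diagonal elements of the first matrix in (5.7), $\beta_j$ is a member of the $i$-th equation, and so
$$\frac{\partial^2 V}{\partial\beta_j^2}=\begin{pmatrix}0 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 2\left(X_t^{(j)}\right)^2 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 0\end{pmatrix},\qquad j=1,\dots,k. \tag{5.10}$$

If both $\beta_j$ and $\beta_l$ are members of the $i$-th equation in the SUR model, then we have

$$\frac{\partial^2 V}{\partial\beta_j\,\partial\beta_l}=\begin{pmatrix}0 & \cdots & 0 & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 2X_t^{(j)}X_t^{(l)} & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 0\end{pmatrix},\qquad j\neq l=1,\dots,k, \tag{5.11}$$
where all entries are zero except the element in the $(i,i)$ position. If $\beta_j$ is a member of the $i$-th equation and $\beta_l$ is a member of the $m$-th equation, where $i\neq m$, then we have
$$\frac{\partial^2 V}{\partial\beta_j\,\partial\beta_l}=\begin{pmatrix}0 & \cdots & 0 & \cdots & 0\\ \vdots & & X_t^{(j)}X_t^{(l)} & & \vdots\\ 0 & X_t^{(j)}X_t^{(l)} & & \cdots & 0\\ \vdots & & \vdots & & \vdots\\ 0 & \cdots & 0 & \cdots & 0\end{pmatrix},\qquad j\neq l=1,\dots,k, \tag{5.12}$$
where all entries are zero except the $(i,m)$-th and $(m,i)$-th components. In (5.10) to (5.12) the matrices are written for a single observation $t$; as in (5.5), they are averaged over $t=1,\dots,n$ to obtain the derivatives of $V$.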

## Footnotes

1. Two-Stage Least Squares
2. Three-Stage Least Squares
3. Generalized Method of Moments
4. Limited Information Maximum Likelihood
5. Full Information Maximum Likelihood