Journal of Statistical Theory and Applications

Volume 20, Issue 1, March 2021, Pages 11 - 20

Restricted Empirical Likelihood Estimation for Time Series Autoregressive Models

Mahdieh Bayati1, S.K. Ghoreishi2, *, Jingjing Wu3
1Department of Statistics, Science and Research Branch, Azad University, I. R. Iran
2Department of Statistics, Faculty of Sciences, University of Qom, I. R. Iran
3Department of Mathematics and Statistics, University of Calgary, Canada
*Corresponding author. Email:
Corresponding Author
S.K. Ghoreishi
Received 3 June 2020, Accepted 18 January 2021, Available Online 3 February 2021.
10.2991/jsta.d.210121.001How to use a DOI?
Autoregressive models; EM algorithm; Empirical likelihood; Estimating equations

In this paper, we first illustrate the restricted empirical likelihood function, as an alternative to the usual empirical likelihood. Then, we use this quasi-empirical likelihood function as a basis for Bayesian analysis of AR(r) time series models. The efficiency of both the posterior computation algorithm, when the estimating equations are linear functions of the parameters, and the EM algorithm for estimating hyper-parameters is an appealing property of our proposed approach. Moreover, the competitive finite-sample performance of this proposed method is illustrated via both simulation study and analysis of a real dataset.

© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (


The empirical likelihood method is one of the most useful statistical tools that allows us to refer to under some regularity conditions. Initially, this statistical method was introduced by Thomas and Grunkemeier [1] and subsequently further developed by Owen [2,3].

Nowadays, many statisticians use this statistical method in different fields of applications. Owen [2,3] have used the empirical likelihood ratio statistic to test the assumptions assumed by nonparametric statistics. They have shown that this statistic has an asymptotic chi-square distribution. Moreover, they have used this statistic to construct a confidence region and to perform a hypothesis testing of parameters. Asymptotic properties and some necessary corrections of this statistic have been studied by DiCiccio and Romano [4] and Hall and Scala [5]. Furthermore, Qin and Lawless [6] has demonstrated that in the nonparametric scheme, the empirical likelihood, along with estimating equations, fits the data well. For more details, see Newey and Smith [7], Chen and Cui, [8,9]), and Qin and Lowless [6].

In many statistical methods, estimating equations are commonly used to estimate the parameters of assumed model. Among them, the most well-known ones include the maximum likelihood (ML) method, the least square method, and the moment method. In this regard, the empirical likelihood approach, by applying unbiasedness constraint on estimating functions, leads to the maximum empirical likelihood (MEL) estimation, which is obtained by maximizing the empirical likelihood function, under the unbiasedness imposed on estimating equations involving the unknown probability p1,p2,,pn. The works from Chen et al. [9], Hjort et al. [10], Tang and Leng [11], Leng and Tang [12], and Chang et al. [13] have shown that empirical likelihood estimators perform well when the dimension p of parameters and the number r of estimating equations grow slower than the sample size n, i.e. p=o(n) and r=o(n).

Tang and Leng [11], Leng and Tang [12], and Chang and Chen [13] have investigated the properties of empirical maximal likelihood estimators for high-dimensional data with sparse parameter vector. Practically, this is done by incorporating some suitable penalty functions. However, there are some challenges in employing empirical likelihood for large-scale data. Tsao [14] has shown that for a large enough n and fixed p, the probability of the corresponding confidence region containing true parameter values can be substantially smaller than its nominal value, which leads to an under-coverage problem.

In the meantime, the use of empirical likelihood method for time series data analysis has also been a topic of interest for researchers. For more details, interested readers are referred to Mykland [15], Kitamura [16], Chan and Ling [17], and Chan and Liu [18]. Monti [19] has developed the empirical likelihood method for the analysis of stationary time series. The author has used the M-estimator method, introduced by Whittle [21], for periodogram ordinates in time series models. Later, Yau [21] has extended the results in Monti [19] to long memory time series. Nordman and Lahiri [22] has argued that Monti's results were based on the assumption of normal distribution for the error term, and thus has extended the classical empirical likelihood to the one based on spectral distribution via Fourier transforms. Their results apply to both short and long memory dependencies; see Nordman and Lahiri [24] for details.

To the best of our knowledge, all the studies on empirical likelihood for time series rely on the unbiasedness property of estimating equations. But it seems that the unbiasedness condition alone, when data contains outliers, will produce misleading estimates for the parameters of interest (Bayati et al. [24]). In other words, in the usual empirical likelihood approach, features such as the dispersion of estimator functional, which depends on the dispersion of data itself, are not included in the statistical inference process. Therefore, it is necessary to use a pseudo-empirical likelihood that is a squaring function of the estimating equations. In this paper, for the first time we introduce the restricted empirical likelihood (REL) for the AR(r) time series models and present its Bayesian analysis. To our best knowledge, Bayati et al. [24] is the only one that has also used the idea of REL. Nevertheless, Bayati et al. [24] and this paper are very different in the sense that Bayati et al. [24] has studied the REL for independent observations while this paper investigates the REL for the more sophisticated case of dependent observations specifically from autoregressive models. In this paper, we not only prove the asymptotic normality of REL estimator for general models, but also give the detailed Bayesian analysis of AR(r) time series models using the REL and as well develop efficient algorithm for computing posterior and estimating hyper-parameters.

This article is organized as follows. After a glance at the empirical likelihood approach, we will illustrate the REL methodology in Section 2. The Bayesian analysis using the REL is discussed in Section 3. Bayesian analysis of AR(r) time series models, using the REL, are given in Section 4. A simulation study and an application to a real-world dataset are embedded in Section 5.


In this section, we first review the concept of empirical likelihood and then introduce the REL which is the main method proposed in this paper. The notations and definitions are provided along with other content.

2.1. Empirical Likelihood

Suppose x=(x1,x2,,xn)Rn are n independent observations obtained from the unknown distribution function f and they relate to the vector of parameters ϕ=(ϕ1,...,ϕr). Moreover, we assume that gj(x;ϕ), j=1,,r, are some estimating functions to estimate ϕ and they satisfy the unbiasedness conditions

where P={p1,,pn}. Moreover, let g(x;ϕ)=g1(x;ϕ),...,gr(x;ϕ).

In the usual empirical likelihood approach, the estimates of probabilities pi's are obtained by maximizing the empirical likelihood function

under conditions (1) and i=1npi=1. Equivalently,

So, it is straightforward to see that

where λ(ϕ)=λ1(ϕ),λ2(ϕ),,λr(ϕ) is the vector of Lagrange multipliers (see Owen [2] and Qin and Lawless [6]). So, the empirical likelihood function is given as

By using (4), the MEL estimator of ϕ is given by


There is a modified two-layer coordinate descent algorithm to compute ϕ̂EL (Tang and Wu [25]).

2.2. Restricted Empirical Likelihood

In previous subsection, the empirical likelihood function is given by (4). Now, by inequality abaa+bb2=a2+b22 for any arbitrary vectors a and b, we have


We hope that the value of ϕ that maximizes the left side of (5) approaches to the maximizing value of ϕ for the right side. The left side of (5) can be rewritten as

where w(ϕ)=12+λ(ϕ)2. Therefore, the pseudo-empirical maximum likelihood estimator (MLE) of ϕ is given by

Indeed, the above equation is equivalent to the case that the function i=1nlog1+w(ϕ)g(xi;ϕ)2 is optimized under the penalty function logw(ϕ). It gives us the idea that we can extend the expression (6) to a general form given by

where w=(w1,,wr) and g(x;ϕ)=g12(x;ϕ),,gr2(x;ϕ). Here, we assume H(w,ϕ) plays the role of a penalty function and define our REL, an alternative to the usual empirical likelihood, as

Practically, the REL LR(w,ϕ) has two appealing properties in contrast to L(ϕ) given in (4). First, we can assume in LR(w,ϕ) that the elements of vector w are independent of ϕ, while in L(ϕ) dependency of λ on θ is a basic assumption. This means that LR(w,ϕ) can be considered as a semi-parametric version of the usual EL function. Second, by assuming sparsity of vectors ϕ and w (to reduce the number of parameters and the number of the estimating equations respectively) and logREL being convex, we can use the REL method for high-dimensional data. For more details on the appealing properties of the REL LR(w,ϕ), readers are referred to Bayati et al. [24]. With these appealing properties, we propose the REL estimator of ϕ as given by


Under some regularity conditions on g and H, the REL estimator ϕ̂REL has an asymptotically multivariate normal distribution. This result is presented in following theorem with details. Its proof is straightforward and thus omitted to save space.

Theorem 2.1.

Under some well-known regularity conditions on g and H, i.e., 2g(x;ϕ)ϕiϕj< and 2H(w,ϕ)ϕiϕj<, when x1,x2,,xn are i.i.d., it holds that

and w is replaced with its estimate ŵREL.


Since ϕ̂ is the maximizer of i=1nlog1+wTg(xi;ϕ)+nH(w,ϕ), so we define


It is obvious that ϕ̂ is the MLE when Tn(ϕ) is log-likelihood function normalized by 1n. Therefore, the proof is the same as asymptotic normality of MLEs. Moreover, under the given regularity conditions, MLE ϕ̂ is constant, i.e., ϕ̂ϕ as n.

In the next section, we focus on Bayesian analysis of ϕ, considering the REL as the usual likelihood function.


Our main purpose in this paper is the Bayesian analysis of ϕ, viewing the REL as playing the role of a regular likelihood function. Based on this idea, we hope that the Bayesian estimation of ϕ can provide an acceptable approximation of the true values. For simplicity, we assume that the penalty function H(w,ϕ) is an additive function, i.e., H(w,ϕ)=H1(w)+H2(ϕ), and moreover assume enH(w,ϕ) is proportional to a multivariate density. Consequently, we assume that

are the corresponding prior densities. We have the following two points to make on the densities.

  1. We assume in (9) that the two vectors w and ϕ are independent. This assumption is often considered for simplicity and computational purposes.

  2. Unlike the usual way, the priors π1 and π2 are dependent of n. This is often assumed in Bayesian statistical analysis; for more details see Bhattacharya et al. [26] and Ghoreishi [27]. However, the functions H1 and H2 can also be chosen in such a way that the effect of sample size is ignorable.

Based on the above assumptions, using (8) and (9) along with observations D={x1,,xn}, the joint posterior density of (w,ϕ) is


Given the estimating functions g and the priors π1 and π2, one can estimate the desired parameters using the Bayesian approach. In practice, the marginal posterior distribution of parameters often do not have a closed form, therefore any inference is derived using samples via MCMC methods. However, in many statistical models in which the estimating functions are usually linear of the parameters, the conditional posterior distributions have the closed form and they are easy to sample using Gibbs sampling method. In the next section, we discuss this issue with more details for autoregressive models of order r, i.e., AR(r).


4.1. Basic Concepts

Autoregressive models show a random process where the output variable is a linear function of the same variable. Specifically, the AR(r) model is defined as

where ϕ1,ϕ2,,ϕr are the model parameters, and ϵt's are white noises with mean 0 and constant variance σ2. In our empirical approach, we assume the distribution of ϵt's is totally unknown. In practice, there are several methods to estimate the model coefficients. A popular approach is Yule-Walker's system of equations, i.e.,
are the estimating equations. Later in our Bayesian approach, we will use the weighed Yule-Walker's system of equations as our estimating equations.

As we see, the estimating equations (12) are linear functions of parameters ϕ1,ϕ2,,ϕr. Therefore, we hope that the posterior conditional distributions have a closed form when we use the REL function instead of a regular likelihood function. Moreover, in the autoregressive model (11), the conditional distribution of (xr+1,,xn) given (x1,,xr) is

where pt:=f(xt|ϕ,xt1,xt2,,xtr). On the other hand, the corresponding likelihood function is

Since the probabilities pt's are practically unknown, we try to estimate them here using our REL approach. We assume that the observation xt is weighted with its corresponding conditional density function value pt as the weight. This means that up to a normalizing constant, the estimating equations (12) satisfy

where P={pr+1,pr+2,,pn}. Finally, combining (8), (13) and (14) gives the corresponding REL as

4.2. Posterior Conditional Distributions

Without loss of generality, for Bayesian analysis using the REL (15), we assume wj,ϕj and sj are i.i.d. with the following common priors

where the additional variables sr+1,sr+2,,sn (the argument ur+1,,un in their respective distribution functions) appeared in equality

Then the conditional posterior distributions have closed forms given by

where gIG stands for the generalized inverse Gaussian distribution and

4.3. EM Algorithm for Estimating Hyper-parameters

One of the major challenges in sampling from the conditional posterior densities (17) is that the hyper-parameters β0 and σ02 are unknown in practice. However, there are several methods to estimate these quantities. In this subsection, we introduce an EM algorithm for this purpose.

From (15) and (16), the marginal density of the observations, x, is obtained as


Now consider the logarithm of the integrand in (18) as a function of β0 and σ02 given by


Our proposed EM algorithm for estimating β0 and σ0 has the following steps:

  1. Select the initial estimates β0(0) (>0) and σ02(0) (>0) and start with k=1.

  2. Given β0(k) (>0) and σ02(k) (>0), produce a sample of size N from the conditional distributions (17).

  3. E-step: Calculate the averaged I(β0,σ02) given by (19) using the generated samples in Step 2. Here, it should be noted that it is enough to calculate the mean of that expression in I(β0,σ02) which only depend on β0 and σ02.

  4. M-step: Update β0 and σ02 by maximizing the following two quantities with respect to β0 and σ02 respectively:


    The updated estimates at time (k+1) are obtained as


    Repeat Steps 2-4 until the algorithm converges.

4.4. Model Evaluation Criteria

In practice, there are several criteria in literature for evaluating the efficiency of time series models. One of those is the mean squared prediction error defined as MSPE=1nt=1n(xtx̂t)2. In this article, we will use a similar criterion to BIC. This criterion, for AR(r) model, is constructed based on the REL function, abbreviated as EBIC, defined as


In practice, a small EBIC value confirms the adequacy of the model, whereas a large value indicates the inadequacy of the model.


In this section, we first carry out a simulation study to confirm our theoretical results and then apply the proposed methods to a real dataset.

5.1. Simulation Study

Our simulation study focuses on the following autoregressive model of order 2:


For this model, we assume several scenarios for ϕ=(ϕ1,ϕ2) and n. In order for the AR(2) model to look like a nonparametric model, we assume that ϵt's are generated from the following a little more complex distribution (a combination of normal and Cauchy distributions)


Here, we run the corresponding Gibbs sampling scheme (12) for two sample sizes n=50 and 500. With the generated data under each scenario, we try to fit the following three models:

  • Model I:

    with estimating equation g(X,ϕ1)=Xt1(Xtϕ1Xt1).

  • Model II:

    with estimating equation g(X,ϕ2)=Xt2(Xtϕ1Xt2).

  • Model III:

    with estimating equations g1(X,ϕ1,ϕ2)=Xt1(Xtϕ1Xt1ϕ2Xt2) and g2(X,ϕ1,ϕ2)=Xt2(Xtϕ1Xt1ϕ2Xt2).

Moreover, for simplicity, we assume w:=w1=w2. We ran the sampling scheme (17) in order to generate 5000 samples from the marginal distributions of ϕ1 and ϕ2, and each time we fit Models I–III to the data. In addition to the estimates of (ϕ1,ϕ2) and their standard errors, the MSPE and EBIC criteria are also computed. The numerical results are presented in Table 1. As we can see from it, Model III fits the data much better than the other two competitor models.

(ϕ1,ϕ2) Method ϕ̂1 ϕ̂2 MSPE EBIC
(−0.5, −0.8) REL −0.319(0.127) - 50.941 10.693
n = 50 EL −0.221(0.191) - 54.627
(0.5, −0.8) REL −0.355(0.108) - 34.713 10.261
Model I EL −0.312(0.123) - 39.101
(−0.5, −0.8) REL −0.302(0.168) - 58.121 14.180
n = 500 EL −0.223(0.172) - 61.001
(0.5, −0.8) REL 0.368(0.143) - 33.975 13.990
EL 0.311(0.144) - 42.214
(−0.5, −0.8) REL - −0.146(0.221) 55.914 8.573
n = 50 EL - −0.121(0.243) 67.423
(0.5, −0.8) REL - −0.417(0.197) 39.128 7.857
Model II EL - 0.314(0.202) 44.121
(−0.5, −0.8) REL - −0.586(0.200) 56.842 10.000
n = 500 EL - −0.372(0.213) 71.001
(0.5, −0.8) REL - −0.222(0.209) 28.613 8.156
EL - 0.199(0.266) 37.813
(−0.5, −0.8) REL −0.547(0.032) −0.767(0.047) 20.542 4.705
n = 50 EL −0.570(0.035) −0.726(0.052) 22.117
(0.5, −0.8) REL 0.546(0.041) −0.642(0.032) 21.222 4.440
Model III EL 0.561(0.043) −0.689(0.033) 22.473
(−0.5, −0.8) REL −0.505(0.012) −0.805(0.014) 17.124 6.805
n = 500 EL −0.515(0.013) −0.821(0.017) 17.921
(0.5, −0.8) REL 0.491(0.011) −0.798(0.011) 16.978 6.508
EL 0.484(0.012) −0.811(0.012) 17.008
Table 1
The REL and EL estimates of (ϕ1,ϕ2) under various models (and their standard errors inside the parentheses) and the associated MSPE and EBIC.

5.2. Application

In this subsection, we analyze a real dataset on the Total Index of Consumer Goods and Services in urban areas of Iran for the period 1990–2017, extracted by The Central Bank of the Islamic Republic of Iran. The data are shown in Table 2 which demonstrates the presence of time dependency. A primary analysis reveals that an AR(2) model fits the data well after removing the trend. For this purpose, we first estimate the trend using a cubic polynomial given by

where the error terms ut's have an unknown distribution with mean 0 and variance σ2. With the cubic polynomial, we obtain the estimates α̂=0.639 and β̂=0.004, by fitting a linear regression model. Kolmogorov–Smirnove test statistic for ϵt is equal to 0.233 with Sig=0.000. This shows that the residuals do not have a normal distribution. Since the size of the data in not large enough, there is no strong evidence that
follow any particular distribution. Thus, we will use our proposed REL method to fit a time series model to ϵ̂t's. Using the posterior conditional distributions (17), the Bayesian estimates of ϕ1 and ϕ2 in model (20) are ϕ̂1=1.335 and ϕ̂2=0.465 respectively. The results indicate that the two-level model (20) fits the data well with the determination coefficient R2=0.996,MSPE=3.96, and EBIC=6.327.

Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
xt 1.00 1.20 1.50 1.80 2.50 3.70 4.50 5.30 6.30 7.50 8.50 9.40 10.90 12.60
xt̂ - - 1.14 1.51 1.87 2.79 4.22 4.92 5.85 7.09 8.55 9.70 10.86 12.92
Year 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
xt 14.50 16.10 18.00 21.30 26.70 29.50 33.20 40.30 52.60 70.90 81.90 91.70 100.00 109.6
xt̂ 15.04 17.39 19.31 21.85 26.19 32.76 34.96 39.65 48.56 62.91 82.94 90.53 100.02 108.18
Table 2
Total Index of Consumer Goods and Services and their fitted values in urban areas of Iran from 1990 to 2017.


In this paper, a Bayesian method using the REL has been proposed for fitting the AR(r) time series model. An useful algorithm for computing the posterior has been developed for this approach. Moreover, the competitive performance of the proposed method has been clearly demonstrated via both simulation studies and a real data application. It is important to note that this method can be easily applied to other general linear models.


5.P. Hall and B. La Scala, Internat. Statist. Review, Vol. 58, 1990, pp. 109-127.
Journal of Statistical Theory and Applications
20 - 1
11 - 20
Publication Date
ISSN (Online)
ISSN (Print)
10.2991/jsta.d.210121.001How to use a DOI?
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (

Cite this article

AU  - Mahdieh Bayati
AU  - S.K. Ghoreishi
AU  - Jingjing Wu
PY  - 2021
DA  - 2021/02/03
TI  - Restricted Empirical Likelihood Estimation for Time Series Autoregressive Models
JO  - Journal of Statistical Theory and Applications
SP  - 11
EP  - 20
VL  - 20
IS  - 1
SN  - 2214-1766
UR  -
DO  - 10.2991/jsta.d.210121.001
ID  - Bayati2021
ER  -