Goodness-of-fit tests for Weibull populations on the basis of records

Record data are used to reduce the time and cost of running experiments (Doostparast and Balakrishnan, 2010). It is important to check the adequacy of the models upon which inferences or actions are based (Lawless, 2003, Chapter 10, p. 465). Only a few works address goodness of fit based on record data. Smith (1988) proposed a form of residual for testing some parametric models. In most cases, however, the variation inherent in graphical summaries is substantial, even when the data are generated by the assumed model, and the eye cannot always determine whether features in a plot are within the bounds of natural random variation. Consequently, formal hypothesis tests are an important part of model checking (Lawless, 2003). In this paper, Kolmogorov-Smirnov and Cramér-von Mises type goodness-of-fit tests for record data are proposed, and a new weighted goodness-of-fit test is suggested. A Monte Carlo simulation study is conducted to derive the percentiles of the proposed statistics. Finally, some real data sets are analyzed to illustrate the results.


Introduction
In reliability, we are concerned primarily with test data in which the lifetimes of items that fail during the course of the test are recorded, or with variables related in some way to item lifetimes. If the actual lifetime of every item in the sample is recorded, the data are complete. To obtain complete data, it is necessary to continue the experiment until the last item on test or in service has failed. When even a few items in the sample have very long lifetimes, the experiment can go on for a very long time, well beyond the point at which the results are still of any interest or use.
In such situations, it may be desirable to terminate the study before all items under test have failed. When observation is discontinued before all items have failed, we obtain so-called censored data. A variety of forms of censored data arise in practice; see, for example, Balakrishnan and Cohen (1991) and Cohen (1991).
A form of censored data that is often encountered in applications is the so-called record data. As pointed out by Gulati and Padgett (1995), in industrial testing, meteorological data, and some other situations, measurements may be made sequentially and only values smaller (or larger) than all previous ones are recorded. Such data may be represented by $(\mathbf{r}, \mathbf{k}) := (r_1, k_1, r_2, k_2, \dots, r_m, k_m)$, where $r_i$ is the $i$-th record value, meaning a new minimum (or maximum), and $k_i$ is the number of trials following the observation of $r_i$ that are needed to obtain a new record value (or to exhaust the available observations). There are two sampling schemes for generating such record-breaking data:
• (Inverse sampling scheme) Items are presented sequentially and sampling is terminated when the $m$-th minimum is observed. In this case, the total number of items sampled is a random number, and $K_m$ is defined to be one for convenience.
• (Random sampling scheme) A random sample $Y_1, \dots, Y_n$ is examined sequentially and successive minimum values are recorded. In this setting, the number of records obtained, $N^{(n)}$, is random and, given a value of $m$, we have $\sum_{i=1}^{m} K_i = n$.
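The bookkeeping behind $(r_i, k_i)$ under the random sampling scheme can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `extract_lower_records` and the toy sample are hypothetical, and the sketch assumes that each $k_i$ counts the record observation itself together with the non-record trials that follow it, which is the convention under which $\sum_{i=1}^{m} K_i = n$.

```python
from typing import List, Sequence, Tuple


def extract_lower_records(sample: Sequence[float]) -> List[Tuple[float, int]]:
    """Return [(r_1, k_1), ..., (r_m, k_m)] for the successive minima of `sample`.

    Assumed convention: k_i counts the record r_i itself plus the trials after it
    that fail to set a new minimum, so the k_i sum to len(sample).
    """
    r_values: List[float] = []
    k_values: List[int] = []
    for x in sample:
        if not r_values or x < r_values[-1]:
            # x is a new minimum: open a new record with its own trial counted
            r_values.append(x)
            k_values.append(1)
        else:
            # x does not break the current record: one more trial for it
            k_values[-1] += 1
    return list(zip(r_values, k_values))


# Hypothetical toy sample of size n = 5
sample = [5.0, 7.0, 3.0, 4.0, 1.0]
records = extract_lower_records(sample)
total_trials = sum(k for _, k in records)  # equals len(sample)
```

Under the inverse sampling scheme the same loop would simply stop as soon as the $m$-th record is appended, which leaves $K_m = 1$.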
A random variable $X$ is said to have an exponential distribution, denoted by $X \sim Exp(\sigma)$, if its cumulative distribution function (cdf) is

$$F(x;\sigma) = 1 - \exp\{-x/\sigma\}, \quad x > 0,\ \sigma > 0, \tag{1}$$

and its probability density function (pdf) is

$$f(x;\sigma) = \frac{1}{\sigma}\exp\{-x/\sigma\}, \quad x > 0. \tag{2}$$

The exponential distribution is commonly used in many applied problems. It is a natural model for a variable that can take only positive values, such as the lifetime of a unit. In some situations, the Weibull distribution is more suitable than the exponential distribution (Nelson, 1985). The Weibull cdf, denoted by $W(\alpha, \sigma)$, is

$$F(x;\alpha,\sigma) = 1 - \exp\{-(x/\sigma)^{\alpha}\}, \quad x > 0,\ \alpha > 0,\ \sigma > 0, \tag{3}$$

and hence the pdf is

$$f(x;\alpha,\sigma) = \frac{\alpha}{\sigma}\left(\frac{x}{\sigma}\right)^{\alpha-1}\exp\{-(x/\sigma)^{\alpha}\}, \quad x > 0. \tag{4}$$

The scale parameter $\sigma$ is called the characteristic life because it is always the 63.2th percentile: $F(\sigma;\alpha,\sigma) = 1 - e^{-1} \approx 0.632$ for every $\alpha$. It determines the spread and has the same units as the failure times, for example hours, months, or cycles. The parameter $\alpha$ is a unitless pure number and determines the shape of the distribution. For $\alpha = 1$, the Weibull distribution is the exponential distribution. The Weibull distribution appears very frequently in practical problems in which the observed data represent minimal values. For example, the life of a capacitor is determined by the shortest-lived portion of its dielectric. For many parent populations with a limited left tail, the minimum of independent samples converges in distribution to a Weibull distribution (Lawless, 2003).

Researchers often like to make parametric assumptions on the underlying distribution. With this in mind, estimation of the mean of an exponential distribution based on record data has been treated by Samaniego and Whitaker (1986) and Doostparast (2009). Hoinkes and Padgett (1994) obtained the ML estimators from record-breaking data in the Weibull model. As pointed out by Lawless (2003, Chapter 10, p. 465), it is important to check the adequacy of the models upon which inferences or actions are based. The published literature on goodness of fit based on record data is sparse, with only a few works in this direction.
Informal methods of model checking emphasize graphical procedures such as probability and residual plots; for instance, Smith (1988) proposed a form of residual for testing some parametric models. In most cases, however, the variation inherent in graphical summaries is substantial, even when the data are generated by the assumed model, and the eye cannot always determine whether features in a plot are within the bounds of natural random variation. Consequently, formal hypothesis tests are an important part of model checking.
Motivated by this, the aim of this paper is to provide some methods for model checking on the basis of records. Specifically, suppose that the record data $\{R_1, K_1, \dots, R_m, K_m\}$ come from a population with parent cdf $F(\cdot)$. We consider testing

$$H_0: F(x) = 1 - \exp\{-(x/\sigma)^{\alpha}\}, \quad x > 0, \tag{5}$$

where $\alpha$ and $\sigma$ may be unknown positive constants. In other words, is the Weibull model adequate to fit the data? The rest of this article is organized as follows. Since the Weibull model has a wide variety of applications, the maximum likelihood estimates (MLEs) of its unknown parameters are obtained in Section 2. In Section 3, explicit expressions for the Kolmogorov-Smirnov (K-S) and Cramér-von Mises (C-M) goodness-of-fit statistics are derived, and we propose a new modified goodness-of-fit test that is more suitable than the K-S and C-M statistics for records. Critical values of these statistics are obtained by a simulation study. In Section 4, the exponential model is considered and a goodness-of-fit test of the exponential model against the Weibull alternative is obtained. Finally, some numerical examples are given to illustrate the results.

Fitting a Weibull model
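It can be shown that the likelihood function for the two sampling schemes is given by

$$L(\mathbf{r}, \mathbf{k}) = \prod_{i=1}^{m} f(r_i)\,[1 - F(r_i)]^{k_i - 1}. \tag{6}$$

Let us assume that the sequence of observations comes from the Weibull model $W(\alpha, \sigma)$. The corresponding likelihood function under either random or inverse sampling is obtained as

$$L(\alpha, \sigma) = \alpha^{m}\,\sigma^{-m\alpha}\left(\prod_{i=1}^{m} r_i\right)^{\alpha - 1}\exp\left\{-\sigma^{-\alpha}\sum_{i=1}^{m} k_i r_i^{\alpha}\right\}. \tag{7}$$

After taking logarithms, we have

$$\log L(\alpha, \sigma) = m\log\alpha - m\alpha\log\sigma + (\alpha - 1)\sum_{i=1}^{m}\log r_i - \sigma^{-\alpha}\sum_{i=1}^{m} k_i r_i^{\alpha}. \tag{8}$$

Throughout this paper, "log" denotes the natural logarithm. One can easily show, by taking derivatives, that the maximum of (8) for $m \ge 2$ is obtained by solving the equations

$$\hat\sigma = \left(\frac{1}{m}\sum_{i=1}^{m} k_i r_i^{\hat\alpha}\right)^{1/\hat\alpha} \tag{9}$$

and

$$\frac{1}{\hat\alpha} + \frac{1}{m}\sum_{i=1}^{m}\log r_i - \frac{\sum_{i=1}^{m} k_i r_i^{\hat\alpha}\log r_i}{\sum_{i=1}^{m} k_i r_i^{\hat\alpha}} = 0. \tag{10}$$

Equation (10) has no closed-form solution and must be solved numerically.

The Kolmogorov-Smirnov and Cramér-von Mises statistics, $D_n$ and $W_n^2$, are defined in (11) and (12), respectively, where $F_0(x)$ is the hypothesized model while $\hat F(x)$ is the corresponding nonparametric maximum likelihood estimate (NPMLE). On the basis of record data arising from a random sample of size $n$, Samaniego and Whitaker (1988) obtained the NPMLE given in (13), where $r_{(0)} \equiv 0$ and $r_{(1)} < r_{(2)} < \dots < r_{(m)}$ are the observed record values, ordered from smallest to largest, and $\{k_{(i)}\}$ are the induced order statistics corresponding to the ordered record values $\{r_{(i)}\}$, that is, $k_{(i)} = k_{m-i+1}$, $i = 1, 2, \dots, m$. As mentioned by Samaniego and Whitaker (1988), the NPMLE in (13) performs poorly when estimating the right tail of the actual distribution; we therefore suggest a new GOF statistic, $DS_n$, defined in (14). The basic idea of $DS_n$ is similar to that of the Anderson-Darling statistic: it measures the distance between $\hat F(x)$ and $F_0(x)$ in the left-tail region of $F_n(x)$ better than the C-M statistic in (12). Note that, on the basis of record data, the statistics $D_n$, $W_n^2$ and $DS_n$ are modified so that the supremum and the integral are taken over the range $y \le r_1$. Sufficiently large values of $D_n$, $W_n^2$ or $DS_n$ provide evidence against the hypothesized model. To calculate the test statistics, the following proposition is helpful.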
Proposition 3.1 Let $R_1, K_1, \dots, R_m, K_m$ be record data arising from a random sample of size $n$. Then the statistics $D_n$, $W_n^2$ and $DS_n$ simplify to the expressions (15), (16) and (17), respectively, where $r_{(0)} \equiv 0$, $x_{m+1} \equiv +\infty$ and, for $i = 1$, $\hat\Phi_1 \cdots \hat\Phi_{i-1} = 1$.
Proof. The proof of (15) is clear. Expression (16) follows by direct calculation. Similarly, one can show (17), and the desired result follows. ✷
Under $H_0$, the distribution of $D_n$, $W_n^2$ and $DS_n$ on the basis of record data does not depend on $F_0(y)$.
Thus, with $R'_i = (R_i/\sigma)^{\alpha}$, the data $R'_1, K_1, \dots, R'_m, K_m$ come from a random sample with common distribution function $W(1, 1)$. The ML estimates on the basis of $R'_1, K_1, \dots, R'_m, K_m$, denoted by $\hat\alpha'$ and $\hat\sigma'$, are obtained by solving (9) and (10) with $r_i$ replaced by $r'_i$. One can easily verify that $\hat\alpha = \alpha\hat\alpha'$. This implies that $\hat\sigma = \sigma(\hat\sigma')^{1/\alpha}$. Hence, the estimate of the Weibull distribution function is obtained as

$$\hat F_0(x; \hat\alpha, \hat\sigma) = 1 - \exp\{-(x/\hat\sigma)^{\hat\alpha}\} = 1 - \exp\{-(x'/\hat\sigma')^{\hat\alpha'}\}, \quad x' = (x/\sigma)^{\alpha}.$$

Similarly to Liao and Shimokawa (1999), this equation indicates that $\hat F_0(x; \hat\alpha, \hat\sigma)$ is independent of the "true values" of the parameters $\alpha$ and $\sigma$. This implies that $D_n$, $W_n^2$ and $DS_n$ do not depend on the "true values" of $\alpha$ and $\sigma$ when the parameters are estimated by the MLEs. The desired result follows. ✷

Proposition 3.2 clarifies that the distribution of $D_n$, $W_n^2$ and $DS_n$ on the basis of record data can be calculated via simulation, without loss of generality, by using a Weibull distribution with $\alpha = \sigma = 1$. Let $D_{n,\gamma}$, $W^2_{n,\gamma}$ and $DS_{n,\gamma}$ denote the $\gamma$-th quantiles of the distributions of $D_n$, $W_n^2$ and $DS_n$ on the basis of record data, respectively. Each test rejects the null hypothesis $H_0: F(x) = 1 - \exp\{-(x/\sigma)^{\alpha}\}$ at size $\gamma$ if the GOF statistic used exceeds its corresponding $(1 - \gamma)$-th quantile. Table 1 presents simulated critical values obtained by a Monte Carlo method. For this task, $M = 100{,}000$ record samples are generated, and the values of $D_n$, $W_n^2$ and $DS_n$ are calculated and sorted in increasing order. The critical values of $D_n$, $W_n^2$ and $DS_n$ for several significance levels are then read off the ordered values.
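The Monte Carlo procedure just described can be sketched as follows. This is a hedged illustration under stated assumptions, not the code used to produce Table 1: the names `extract_lower_records`, `simulate_quantile` and `number_of_records` are hypothetical, the random sampling scheme is assumed, and `number_of_records` is only a placeholder statistic. In practice the callable passed as `statistic` would fit the MLEs to each simulated record sample and evaluate $D_n$, $W_n^2$ or $DS_n$.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Records = List[Tuple[float, int]]


def extract_lower_records(sample: Sequence[float]) -> Records:
    """Successive minima r_i with counts k_i (k_i includes the record itself)."""
    r_values: List[float] = []
    k_values: List[int] = []
    for x in sample:
        if not r_values or x < r_values[-1]:
            r_values.append(x)
            k_values.append(1)
        else:
            k_values[-1] += 1
    return list(zip(r_values, k_values))


def simulate_quantile(statistic: Callable[[Records], float], n: int,
                      gamma: float, M: int = 10_000, seed: int = 0) -> float:
    """Empirical (1 - gamma)-th quantile of `statistic` over M record samples.

    Each sample of size n is drawn from W(1, 1), i.e. the standard exponential,
    which suffices when the statistic is parameter-free under H0.
    """
    rng = random.Random(seed)
    values = sorted(
        statistic(extract_lower_records([rng.expovariate(1.0) for _ in range(n)]))
        for _ in range(M)
    )
    index = max(0, min(M - 1, math.ceil((1.0 - gamma) * M) - 1))
    return values[index]


def number_of_records(records: Records) -> float:
    """Placeholder statistic (hypothetical); stands in for D_n, W_n^2 or DS_n."""
    return float(len(records))


critical_value = simulate_quantile(number_of_records, n=30, gamma=0.05, M=2000)
```

Fixing the seed makes the simulated critical values reproducible, and a smaller $M$ than the 100,000 used for Table 1 is chosen here only to keep the sketch fast.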

GOF for exponential model
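As mentioned earlier, the model $W(\alpha, \sigma)$ reduces to the $Exp(\sigma)$ model when $\alpha = 1$. Therefore, testing the hypothesis $H_0: X \sim Exp(\sigma)$ against the alternative $H_1: X \sim W(\alpha, \sigma)$ is equivalent to testing $H_0: \alpha = 1$ against the alternative $H_1: \alpha \neq 1$. We could not find a UMP test of size $\gamma$ ($0 < \gamma < 1$) for this hypothesis testing problem, and we leave it as an open problem. We therefore use the generalized likelihood ratio (GLR) procedure to test these hypotheses. From (3), (4) and (7), the likelihood ratio statistic for testing $H_0: \alpha = 1$ against the alternative $H_1: \alpha \neq 1$ is given by

$$\Lambda = \frac{L(1, \hat\sigma_0)}{L(\hat\alpha, \hat\sigma)} = \frac{\hat\sigma^{\,m\hat\alpha}}{\hat\alpha^{m}\,\hat\sigma_0^{\,m}\left(\prod_{i=1}^{m} R_i\right)^{\hat\alpha - 1}},$$

where $\hat\alpha$, obtained by solving equation (10), and the corresponding $\hat\sigma$ are the maximum likelihood estimates of $\alpha$ and $\sigma$ under $H_1$, while $\hat\sigma_0$ is the ML estimate of $\sigma$ under $H_0$, obtained by maximizing (7) at $\alpha = 1$ and given by $\hat\sigma_0 = \sum_{i=1}^{m} K_i R_i / m$.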
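The quantities entering the GLR statistic can be computed numerically along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the record likelihood $\prod_{i} f(r_i)[1 - F(r_i)]^{k_i - 1}$ with the Weibull density, profiles out $\sigma$, and finds $\hat\alpha$ by bisection on the resulting score, which is strictly decreasing in $\alpha$ when the record values are not all equal. The function names, the bracketing interval and the four record pairs in the usage lines are hypothetical illustrations and are not copied from Tables 3, 5 or 6.

```python
import math
from typing import Sequence, Tuple


def profile_score(alpha: float, r: Sequence[float], k: Sequence[int]) -> float:
    """Derivative in alpha of the profile log-likelihood, divided by m."""
    weights = [ki * ri ** alpha for ri, ki in zip(r, k)]
    weighted_log = sum(w * math.log(ri) for w, ri in zip(weights, r)) / sum(weights)
    return 1.0 / alpha + sum(math.log(ri) for ri in r) / len(r) - weighted_log


def weibull_record_mle(r: Sequence[float], k: Sequence[int],
                       lo: float = 1e-3, hi: float = 50.0,
                       tol: float = 1e-10) -> Tuple[float, float]:
    """MLEs (alpha, sigma) from records; the bracket [lo, hi] is an assumption."""
    if profile_score(lo, r, k) <= 0.0 or profile_score(hi, r, k) >= 0.0:
        raise ValueError("root of the score is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if profile_score(mid, r, k) > 0.0:
            lo = mid  # score still positive: the root lies to the right
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    sigma = (sum(ki * ri ** alpha for ri, ki in zip(r, k)) / len(r)) ** (1.0 / alpha)
    return alpha, sigma


def log_likelihood(alpha: float, sigma: float,
                   r: Sequence[float], k: Sequence[int]) -> float:
    """Weibull log-likelihood for record data (r_i, k_i)."""
    m = len(r)
    return (m * math.log(alpha) - m * alpha * math.log(sigma)
            + (alpha - 1.0) * sum(math.log(ri) for ri in r)
            - sum(ki * (ri / sigma) ** alpha for ri, ki in zip(r, k)))


def glr_statistic(r: Sequence[float], k: Sequence[int]) -> float:
    """-2 ln(Lambda) for H0: alpha = 1 against H1: alpha != 1."""
    alpha_hat, sigma_hat = weibull_record_mle(r, k)
    sigma_0 = sum(ki * ri for ri, ki in zip(r, k)) / len(r)  # MLE under alpha = 1
    return 2.0 * (log_likelihood(alpha_hat, sigma_hat, r, k)
                  - log_likelihood(1.0, sigma_0, r, k))


# Hypothetical record pairs from a sample of size n = 24 (illustration only)
r_obs = [50.0, 44.0, 22.0, 3.0]
k_obs = [1, 3, 2, 18]
alpha_hat, sigma_hat = weibull_record_mle(r_obs, k_obs)
statistic = glr_statistic(r_obs, k_obs)
```

Bisection is used instead of Newton's method because it needs only the sign of the score and cannot overshoot; any other one-dimensional root finder would serve equally well.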

Proposition 4.1
When $\sigma$ is unknown, the critical region of the GLR test of level $\gamma$ for testing $H_0: \alpha = 1$ against the alternative $H_1: \alpha \neq 1$ is given by $\Lambda \le C^\star$, where $\Lambda$ is the likelihood ratio statistic above, $\hat\alpha$ is the maximum likelihood estimate of $\alpha$ under $H_1$, and $C^\star$ is obtained from the size restriction $P_{H_0}(\Lambda \le C^\star) = \gamma$. Under $H_0$, it can be shown that $-2\ln\Lambda$ has an asymptotic chi-square distribution with one degree of freedom as the sample size $n$ goes to infinity; thus $C^\star \approx \exp\{-\tfrac{1}{2}\chi^2_{1,1-\gamma}\}$, where $\chi^2_{v,p}$ is the $p$-th quantile of a chi-square distribution with $v$ degrees of freedom.

Example 1
Table 2 shows the times (in minutes) between 48 consecutive telephone calls to a company's switchboard, as presented by Castillo et al. (2005). Assuming that the times between consecutive telephone calls follow the exponential distribution $Exp(\sigma)$, Castillo et al. (2005) obtained the MLE of $\sigma$ based on the complete data as $\hat\sigma_C = 0.934$. The corresponding record data, obtained from these complete data, are presented in Table 3. Under the $Exp(\sigma)$ model, the MLE of $\sigma$ on the basis of the record data is $\hat\sigma_0 = 1.022$, while under the $W(\alpha, \sigma)$ model, from (9) and (10), the MLEs of $\alpha$ and $\sigma$ are $\hat\alpha = 1.1815$ and $\hat\sigma = 0.8181$, respectively. To calculate the GOF statistics, Table 4 is useful. From Table 4, we conclude that $D_n = 0.6979$, $W_n^2 = 5.5140$ and $DS_n = 8.8604$. Letting $\gamma = 0.05$, from Table 1, none of the three tests rejects the Weibull model for these data.

Table 4: GOF from times between 48 consecutive calls. Columns: $i$, $r_i$, $k_i$, $r_{(i)}$, $k_{(i)}$, $\hat\Phi_i$, $\bar F_n(r_{(i)}) = \hat\Phi_1 \cdots \hat\Phi_i$, $\bar F_0(r_{(i)}) = \exp\{-(r_{(i)}/\hat\sigma)^{\hat\alpha}\}$.

Figure 1: Likelihood function (7) on the basis of data in Table 3.

For testing the exponential model against the Weibull alternative, the GLR statistic gives $-2\ln\Lambda = 0.3896630654$, with a p-value of 0.5324766591. This supports the exponential assumption of Castillo et al. (2005). A graph of the likelihood function is given in Figure 1.
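The chi-square approximation used for the p-value and for $C^\star$ can be checked with the Python standard library alone. The sketch below is an illustration with hypothetical function names; it relies on two standard facts for one degree of freedom, namely that the upper tail probability of a chi-square variable at $x$ equals $\mathrm{erfc}(\sqrt{x/2})$ and that its $(1-\gamma)$-th quantile is the square of the standard normal $(1-\gamma/2)$-th quantile.

```python
import math
from statistics import NormalDist


def chi2_1_sf(x: float) -> float:
    """Upper tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def glr_critical_value(gamma: float) -> float:
    """Approximate C* = exp(-chi2_{1,1-gamma} / 2) for the region Lambda <= C*."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)  # z**2 is the chi-square quantile
    return math.exp(-0.5 * z * z)


# p-value for the GLR statistic reported in Example 1
p_value = chi2_1_sf(0.3896630654)

# Approximate critical value of Lambda at gamma = 0.05
c_star = glr_critical_value(0.05)
```

Comparing $\Lambda$ with `c_star` and comparing `p_value` with $\gamma$ lead to the same decision, since both are monotone transformations of $-2\ln\Lambda$.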

Example 2
Samaniego and Whitaker (1986) presented record data arising from successive failure times of air conditioning units in Boeing aircraft; the sample for plane 7914 consists of $n = 24$ failure times. The data are given in Table 5.

Figure 2: Likelihood function (7) on the basis of data in Table 5.

The GLR statistic is $-2\ln\Lambda = 1.580279376$, which gives a p-value of 0.2087204561. This supports the exponential assumption of Samaniego and Whitaker (1986). A graph of the likelihood function is given in Figure 2.

Example 3
Samaniego and Whitaker (1988) simulated a random sample of size $n = 30$ from the $W(\alpha = 4, \sigma = 1)$ model; the record data arising from this sample are presented in Table 6.

Figure 3: Likelihood function (7) on the basis of data in Table 6.

The GLR statistic is $-2\ln\Lambda = 7.911804336$, which gives a p-value of 0.0049113232. This indicates a departure from the exponential assumption. A graph of the likelihood function is given in Figure 3.

Concluding Remarks
In this paper, Kolmogorov-Smirnov and Cramér-von Mises type goodness-of-fit tests, as well as a new weighted statistic, were proposed for record data. These statistics were used to test the fit of the Weibull model. We suggest the following procedure for analyzing record data. The first step is to test the Weibull model using the GOF tests proposed in Section 3. If the Weibull model is accepted, apply the GLR test of Section 4 to test for exponentiality. If the exponential model is accepted, use the statistical procedures for record data arising from an exponential model; see Samaniego and Whitaker (1986), Arnold et al. (1998), Doostparast (2009) and Doostparast and Balakrishnan (2010). If exponentiality is rejected, one can use the results of Hoinkes and Padgett (1994). If the Weibull model is rejected, one can use the nonparametric results of Samaniego and Whitaker (1988).

Following Samaniego and Whitaker (1988), one can also consider the problem in which the available data arise from $L$ sequences of random variables. More precisely, assume that $L$ independent samples $Y_{i1}, Y_{i2}, \dots, Y_{i,n_i}$, $1 \le i \le L$, each of size $n_i$, are obtained sequentially from $F$. The resulting records are $R_{i1}, K_{i1}, \dots, R_{im_i}, K_{im_i}$ for $i = 1, 2, \dots, L$, where $K_{im_i} = n_i - \sum_{j=1}^{m_i - 1} K_{ij}$. Similarly to (13), the NPMLE of the survival function at a point $t$ is obtained from the pooled records, where $m^\star = \sum_{i=1}^{L} m_i$, $\{r_{(i)}, i = 1, 2, \dots, m^\star\}$ are the ordered observed record values in the $L$ samples combined, and $\{k_{(i)}, i = 1, 2, \dots, m^\star\}$ are the induced order statistics for the associated $k_{ij}$. To assess the impact of $L$ on the power of the GOF tests, one can conduct a simulation study.