Received 7 July 2020, Accepted 21 December 2020, Available Online 4 January 2021.
1. INTRODUCTION
For an r×r contingency table with the same row and column ordinal classifications, let X and Y denote the row and column variables, respectively. Also let Pr (X=i,Y=j)=pij(1≤i,j≤r). The symmetry (S) model (Bowker, [1]) is defined by
$$p_{ij} = p_{ji} \quad (i<j);$$
see also Bishop et al. ([2], p.282). The S model indicates a structure of symmetry of the probabilities {pij} with respect to the main diagonal of the table. The global symmetry (GS) model (Read, [3]) is defined by
$$\delta_U = \delta_L,$$
where
$$\delta_U = \sum\sum_{i<j} p_{ij} \ (= \Pr(X<Y)), \qquad \delta_L = \sum\sum_{i>j} p_{ij} \ (= \Pr(X>Y)).$$
The conditional symmetry (CS) model (Read, [3]; McCullagh, [4]) is defined by
$$p_{ij} = \delta\, p_{ji} \quad (i<j),$$
where δ is an unknown parameter; see also Agresti ([5], p.361) and Tomizawa [6]. We note that the CS model is also expressed as
$$\Pr(X=i, Y=j \mid X<Y) = \Pr(X=j, Y=i \mid X>Y) \quad (i \neq j).$$
So, the CS model indicates the CS. A special case of this model obtained by putting δ=1 is the S model. Read [3] gave the theorem that the S model holds if and only if both the GS and CS models hold.
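Read's decomposition can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the function names `delta_upper_lower` and `cs_delta` and the 3×3 table are illustrative assumptions. It computes δU and δL, checks whether the CS structure pij=δpji holds, and combines the two checks into a check of S.

```python
def delta_upper_lower(p):
    """Return (delta_U, delta_L) = (Pr(X<Y), Pr(X>Y)) for a square table p."""
    r = len(p)
    d_u = sum(p[i][j] for i in range(r) for j in range(r) if i < j)
    d_l = sum(p[i][j] for i in range(r) for j in range(r) if i > j)
    return d_u, d_l


def cs_delta(p, tol=1e-12):
    """Return delta if p_ij = delta * p_ji for all i < j (CS holds), else None."""
    r = len(p)
    d_u, d_l = delta_upper_lower(p)
    if d_l <= 0:
        return None
    # Under CS, summing p_ij = delta * p_ji over i < j gives delta = delta_U / delta_L.
    delta = d_u / d_l
    ok = all(abs(p[i][j] - delta * p[j][i]) <= tol
             for i in range(r) for j in range(i + 1, r))
    return delta if ok else None


# Illustrative table (hypothetical): every upper cell is twice its mirror cell.
p = [[0.20, 0.10, 0.06],
     [0.05, 0.20, 0.04],
     [0.03, 0.02, 0.30]]
d_u, d_l = delta_upper_lower(p)
delta = cs_delta(p)
gs_holds = abs(d_u - d_l) < 1e-12
s_holds = gs_holds and delta is not None  # S holds iff both GS and CS hold
print(delta, gs_holds, s_holds)
```

In this table CS holds with δ close to 2, but GS fails because δU≠δL, so S fails as well.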
Wall and Lienert [7] introduced the point symmetry (PS) model, defined by
$$p_{ij} = p_{i^*j^*} \quad (1 \le i, j \le r),$$
where i∗ = r+1−i and j∗ = r+1−j. This model indicates a structure of PS of the probabilities {pij} with respect to the center cell when r is odd, or the center point when r is even, in square tables. Kurakami et al. [8] considered the "another point symmetry" (APS) model, defined by
$$p_{ij} = p_{i^*j^*} \quad (1 \le i, j \le r;\ i+j \ne r+1).$$
The APS model has fewer restrictions than the PS model because it excludes the restrictions imposed on the reverse diagonal probabilities. Kurakami et al. [8] also considered the reverse global symmetry (RGS) model, defined by
$$\Delta_U = \Delta_L,$$
where
$$\Delta_U = \sum\sum_{i+j<r+1} p_{ij} \ (= \Pr(X+Y<r+1)), \qquad \Delta_L = \sum\sum_{i+j>r+1} p_{ij} \ (= \Pr(X+Y>r+1)).$$
Tomizawa [9] considered the conditional point symmetry (CPS) model defined by
$$p_{ij} = \tau\, p_{i^*j^*} \quad (i+j<r+1),$$
where τ is an unknown parameter. The CPS model indicates that
$$\Pr(X=i, Y=j \mid X+Y<r+1) = \Pr(X=i^*, Y=j^* \mid X+Y>r+1).$$
Kurakami et al. [8] gave the theorem that the APS model holds if and only if the RGS and CPS models hold. For more details on contingency tables analysis, see also Rao [10], Mosteller [11] and Wang [12].
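The reflected indices and the decomposition of APS into RGS and CPS can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names `reverse_deltas` and `cps_tau` and the 3×3 table are hypothetical, and indices are kept 1-based inside the loops to match the notation i∗=r+1−i.

```python
def reverse_deltas(p):
    """Return (Delta_U, Delta_L) = (Pr(X+Y<r+1), Pr(X+Y>r+1)), 1-based indices."""
    r = len(p)
    idx = range(1, r + 1)
    d_u = sum(p[i - 1][j - 1] for i in idx for j in idx if i + j < r + 1)
    d_l = sum(p[i - 1][j - 1] for i in idx for j in idx if i + j > r + 1)
    return d_u, d_l


def cps_tau(p, tol=1e-12):
    """Return tau if p_ij = tau * p_{i*j*} for all i+j < r+1 (CPS holds), else None."""
    r = len(p)
    d_u, d_l = reverse_deltas(p)
    if d_l <= 0:
        return None
    # Under CPS, summing p_ij = tau * p_{i*j*} over i+j < r+1 gives tau = Delta_U / Delta_L.
    tau = d_u / d_l
    for i in range(1, r + 1):
        for j in range(1, r + 1):
            if i + j < r + 1:
                i_star, j_star = r + 1 - i, r + 1 - j  # point-reflected cell
                if abs(p[i - 1][j - 1] - tau * p[i_star - 1][j_star - 1]) > tol:
                    return None
    return tau


# Illustrative table (hypothetical): each cell above the reverse diagonal
# is twice its point-reflected cell; reverse-diagonal cells are unrestricted.
p = [[0.20, 0.10, 0.16],
     [0.06, 0.20, 0.03],
     [0.10, 0.05, 0.10]]
d_u, d_l = reverse_deltas(p)
tau = cps_tau(p)
rgs_holds = abs(d_u - d_l) < 1e-12
aps_holds = rgs_holds and tau is not None  # APS holds iff both RGS and CPS hold
print(tau, rgs_holds, aps_holds)
```

Here CPS holds with τ close to 2, while RGS and therefore APS fail.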
When a model does not hold, we are interested in measuring the degree of departure from it. Tomizawa [13], Tomizawa [14] and Tomizawa and Saitoh [15] considered measures of the degree of departure from S, GS and CS, respectively. Tomizawa and Saitoh [15] proved that the measure of departure from S equals the sum of the measures of departure from GS and CS. In this paper, we propose measures of the degree of departure from APS, RGS and CPS, and show that the measure of departure from APS equals the sum of the measures of departure from RGS and CPS.
Section 2 proposes new measures of the degree of departure from APS, RGS and CPS (denoted by ΦAPS, ΦRGS and ΦCPS), and shows that the value of ΦAPS equals the sum of the values of ΦRGS and ΦCPS. Section 3 gives approximate standard errors and large-sample confidence intervals for the proposed measures. Section 4 describes the relationship between the proposed measures and the likelihood ratio statistic. Section 5 gives an example. Section 6 provides some concluding remarks.
2. MEASURES FROM MODELS AND DECOMPOSITION OF MEASURE
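We assume that pij+pi∗j∗>0 for 1≤i,j≤r; i+j≠r+1. Let Δ=ΔU+ΔL and
$$p^c_{ij} = \frac{p_{ij}}{\Delta} \quad (1 \le i, j \le r;\ i+j \ne r+1).$$
We propose the following measure of the degree of departure from the APS model:
$$\Phi_{APS} = \frac{1}{\log 2}\, I_{APS},$$
where
$$I_{APS} = \sum\sum_{i+j \ne r+1} p^c_{ij} \log\frac{p^c_{ij}}{p^{APS}_{ij}}, \qquad p^{APS}_{ij} = \frac{p^c_{ij} + p^c_{i^*j^*}}{2}.$$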
Note that IAPS is the Kullback–Leibler information between {pijc} and {pijAPS}. The measure ΦAPS has characteristics that, (i) 0≤ΦAPS≤1, (ii) ΦAPS=0 if and only if pij=pi∗j∗ for i+j<r+1, and (iii) ΦAPS=1 if and only if pij=0 (then pi∗j∗>0) or pi∗j∗=0 (then pij>0) for i+j<r+1.
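Next, assume that ΔU+ΔL>0. Let
$$\Delta^c_U = \frac{\Delta_U}{\Delta}, \qquad \Delta^c_L = \frac{\Delta_L}{\Delta}.$$
We propose the following measure of the degree of departure from the RGS model:
$$\Phi_{RGS} = \frac{1}{\log 2}\, I_{RGS},$$
where
$$I_{RGS} = \Delta^c_U \log\frac{\Delta^c_U}{1/2} + \Delta^c_L \log\frac{\Delta^c_L}{1/2}.$$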
Note that IRGS is the Kullback–Leibler information between {ΔUc,ΔLc} and {1∕2,1∕2}. The measure ΦRGS has characteristics that, (i) 0≤ΦRGS≤1, (ii) ΦRGS=0 if and only if ΔU=ΔL, and (iii) ΦRGS=1 if and only if ΔU=0 (then ΔL>0) or ΔL=0 (then ΔU>0).
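Moreover, assuming that ΔU>0, ΔL>0, and pij+pi∗j∗>0 for 1≤i,j≤r; i+j≠r+1, we propose the following measure of the degree of departure from the CPS model:
$$\Phi_{CPS} = \frac{1}{\log 2}\, I_{CPS},$$
where
$$I_{CPS} = \sum\sum_{i+j \ne r+1} p^c_{ij} \log\frac{p^c_{ij}}{p^{CPS}_{ij}}, \qquad p^{CPS}_{ij} = \begin{cases} \dfrac{\Delta_U}{\Delta}\,(p^c_{ij} + p^c_{i^*j^*}) & (i+j<r+1), \\[2mm] \dfrac{\Delta_L}{\Delta}\,(p^c_{ij} + p^c_{i^*j^*}) & (i+j>r+1). \end{cases}$$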
Note that ICPS is the Kullback–Leibler information between {pijc} and {pijCPS}.
We obtain the theorem as follows:
Theorem 1.
The value of ΦAPS is equal to the sum of the value of ΦRGS and the value of ΦCPS.
Proof.
We see that
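$$\Phi_{APS} - \Phi_{CPS} = \frac{1}{\log 2}\left[\sum\sum_{i+j<r+1} p^c_{ij}\left(\log\frac{p^c_{ij}}{p^{APS}_{ij}} - \log\frac{p^c_{ij}}{p^{CPS}_{ij}}\right) + \sum\sum_{i+j>r+1} p^c_{ij}\left(\log\frac{p^c_{ij}}{p^{APS}_{ij}} - \log\frac{p^c_{ij}}{p^{CPS}_{ij}}\right)\right].$$
For i+j<r+1, we see
$$\log\frac{p^c_{ij}}{p^{APS}_{ij}} - \log\frac{p^c_{ij}}{p^{CPS}_{ij}} = \log\frac{\Delta^c_U}{1/2}. \qquad (1)$$
For i+j>r+1, we see
$$\log\frac{p^c_{ij}}{p^{APS}_{ij}} - \log\frac{p^c_{ij}}{p^{CPS}_{ij}} = \log\frac{\Delta^c_L}{1/2}. \qquad (2)$$
From equations (1) and (2), we see
$$\Phi_{APS} - \Phi_{CPS} = \frac{1}{\log 2}\left[\sum\sum_{i+j<r+1} p^c_{ij} \log\frac{\Delta^c_U}{1/2} + \sum\sum_{i+j>r+1} p^c_{ij} \log\frac{\Delta^c_L}{1/2}\right] = \frac{1}{\log 2}\left[\Delta^c_U \log\frac{\Delta^c_U}{1/2} + \Delta^c_L \log\frac{\Delta^c_L}{1/2}\right] = \Phi_{RGS}.$$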
The proof is completed. □
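The intermediate step behind equation (1) in the proof can be written out explicitly; it uses only the definitions of the APS and CPS reference probabilities from Section 2 and the identity ΔU∕Δ=ΔUc. For i+j<r+1,

$$\log\frac{p^c_{ij}}{p^{APS}_{ij}} - \log\frac{p^c_{ij}}{p^{CPS}_{ij}} = \log\frac{p^{CPS}_{ij}}{p^{APS}_{ij}} = \log\frac{\Delta^c_U\,(p^c_{ij} + p^c_{i^*j^*})}{\tfrac{1}{2}\,(p^c_{ij} + p^c_{i^*j^*})} = \log\frac{\Delta^c_U}{1/2}.$$

The case i+j>r+1 is identical with ΔLc in place of ΔUc. The final equality of the proof then follows because the conditional probabilities on each side of the reverse diagonal sum to the corresponding conditional total:

$$\sum\sum_{i+j<r+1} p^c_{ij} = \frac{\Delta_U}{\Delta} = \Delta^c_U, \qquad \sum\sum_{i+j>r+1} p^c_{ij} = \frac{\Delta_L}{\Delta} = \Delta^c_L.$$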
Thus, ΦCPS is expressed as ΦCPS=ΦAPS−ΦRGS. All measures can exist under the conditions of {pij+pi∗j∗>0}, ΔU>0 and ΔL>0. Then, we obtain 0≤ΦAPS≤1 and 0≤ΦRGS<1 (note that ΦRGS≠1 because of both ΔU>0 and ΔL>0). Since ΦCPS≥0, we obtain 0≤ΦCPS≤1. Besides, (i) ΦCPS=0 if and only if there is a structure of CPS in the square table, and (ii) ΦCPS=1 if and only if ΦAPS=1 and ΦRGS=0; i.e., pij=0 (then pi∗j∗>0) or pi∗j∗=0 (then pij>0) for i+j<r+1 and ΔU=ΔL.
Consider the artificial probabilities in Table 1. In Table 1a, pi∗j∗=0 (and pij>0) for all i+j<r+1, and ΔL=0 (and ΔU>0). Since the degrees of departure from APS and from RGS are then at their maximum, the values of ΦAPS and ΦRGS are both 1. Because the condition ΔL>0 is not satisfied in Table 1a, the value of ΦCPS is not defined. In Table 1b, pij=0 or pi∗j∗=0 for all i+j<r+1. Since the degree of departure from APS is at its maximum, the value of ΦAPS is 1. Also, ΔU=ΔL(=0.3) in Table 1b, so the value of ΦRGS is 0. From Theorem 1, the value of ΦCPS is 1. In Table 1c, pij=3pi∗j∗ for all i+j<r+1; thus, the CPS model holds. The value of ΦCPS is 0, and the values of ΦAPS and ΦRGS are both 0.189.
(a)

| X \ Y | (1) | (2) | (3) | (4) | Total |
|---|---|---|---|---|---|
| (1) | 0.1 | 0.1 | 0.1 | 0.1 | 0.4 |
| (2) | 0.1 | 0.1 | 0.1 | 0 | 0.3 |
| (3) | 0.1 | 0.1 | 0 | 0 | 0.2 |
| (4) | 0.1 | 0 | 0 | 0 | 0.1 |
| Total | 0.4 | 0.3 | 0.2 | 0.1 | 1 |

(b)

| X \ Y | (1) | (2) | (3) | (4) | Total |
|---|---|---|---|---|---|
| (1) | 0.1 | 0 | 0.1 | 0.1 | 0.3 |
| (2) | 0 | 0 | 0.1 | 0 | 0.1 |
| (3) | 0.1 | 0.1 | 0.1 | 0.1 | 0.4 |
| (4) | 0.1 | 0 | 0.1 | 0 | 0.2 |
| Total | 0.3 | 0.1 | 0.4 | 0.2 | 1 |

(c)

| X \ Y | (1) | (2) | (3) | (4) | Total |
|---|---|---|---|---|---|
| (1) | 0.075 | 0.075 | 0.075 | 0.1 | 0.325 |
| (2) | 0.075 | 0.075 | 0.1 | 0.025 | 0.275 |
| (3) | 0.075 | 0.1 | 0.025 | 0.025 | 0.225 |
| (4) | 0.1 | 0.025 | 0.025 | 0.025 | 0.175 |
| Total | 0.325 | 0.275 | 0.225 | 0.175 | 1 |

Table 1. Artificial probabilities.
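The values quoted for Table 1 can be checked numerically. The sketch below is an illustration rather than the authors' implementation: the function name `departure_measures` and the constants `TABLE_1A`, `TABLE_1B` and `TABLE_1C` are assumptions, the convention 0·log 0=0 is used for zero cells, and ΦCPS is returned as `None` when ΔU or ΔL is zero. Each measure is computed directly from its Kullback–Leibler definition in Section 2, so that Theorem 1 is checked rather than assumed.

```python
import math

TABLE_1A = [[0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.0],
            [0.1, 0.1, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0]]
TABLE_1B = [[0.1, 0.0, 0.1, 0.1], [0.0, 0.0, 0.1, 0.0],
            [0.1, 0.1, 0.1, 0.1], [0.1, 0.0, 0.1, 0.0]]
TABLE_1C = [[0.075, 0.075, 0.075, 0.1], [0.075, 0.075, 0.1, 0.025],
            [0.075, 0.1, 0.025, 0.025], [0.1, 0.025, 0.025, 0.025]]


def departure_measures(p):
    """Return (Phi_APS, Phi_RGS, Phi_CPS); Phi_CPS is None if Delta_U or Delta_L is 0."""
    r = len(p)
    # With 0-based indices the reverse diagonal is i + j == r - 1.
    cells = [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]
    d_u = sum(p[i][j] for i, j in cells if i + j < r - 1)
    d_l = sum(p[i][j] for i, j in cells if i + j > r - 1)
    d = d_u + d_l
    cps_defined = d_u > 0 and d_l > 0
    log2 = math.log(2)

    i_aps = 0.0
    i_cps = 0.0
    for i, j in cells:
        pc = p[i][j] / d  # conditional probability p^c_ij
        if pc == 0:
            continue  # convention: 0 * log 0 = 0
        pair = pc + p[r - 1 - i][r - 1 - j] / d  # p^c_ij + p^c_{i*j*}
        i_aps += pc * math.log(pc / (pair / 2))
        if cps_defined:
            share = (d_u if i + j < r - 1 else d_l) / d
            i_cps += pc * math.log(pc / (share * pair))
    i_rgs = sum((x / d) * math.log((x / d) / 0.5) for x in (d_u, d_l) if x > 0)

    phi_cps = i_cps / log2 if cps_defined else None
    return i_aps / log2, i_rgs / log2, phi_cps


for name, table in (("1a", TABLE_1A), ("1b", TABLE_1B), ("1c", TABLE_1C)):
    print(name, departure_measures(table))
```

For Tables 1b and 1c the three returned values satisfy ΦAPS=ΦRGS+ΦCPS up to rounding error, in line with Theorem 1.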
3. APPROXIMATE CONFIDENCE INTERVALS FOR MEASURES
Let nij denote the observed frequency in the ith row and jth column of the table (1≤i,j≤r). Assuming that a multinomial distribution applies to the r×r table, we consider approximate standard errors and large-sample confidence intervals for ΦAPS, ΦRGS and ΦCPS using the delta method, descriptions of which are given by Bishop et al. ([2], Sec. 14.6) and Agresti ([5], Sec. 12.1). The sample version of ΦAPS (ΦRGS, ΦCPS), i.e., Φ̂APS (Φ̂RGS, Φ̂CPS), is given by ΦAPS (ΦRGS, ΦCPS) with {pij} replaced by {p̂ij}, where p̂ij=nij∕n and $n = \sum_{i=1}^{r}\sum_{j=1}^{r} n_{ij}$. Using the delta method, each of $\sqrt{n}(\hat\Phi_{APS} - \Phi_{APS})$, $\sqrt{n}(\hat\Phi_{RGS} - \Phi_{RGS})$ and $\sqrt{n}(\hat\Phi_{CPS} - \Phi_{CPS})$ has asymptotically (as n→∞) a normal distribution with mean zero and the corresponding variance:
$$\sigma^2[\Phi_{APS}] = \frac{1}{\Delta^2}\left[\sum\sum_{i+j \ne r+1} p_{ij}\,\Omega_{ij}^2 - \Delta\,\Phi_{APS}^2\right],$$
where
$$\Omega_{ij} = \frac{1}{\log 2}\log\frac{2p_{ij}}{p_{ij} + p_{i^*j^*}},$$
$$\sigma^2[\Phi_{RGS}] = \frac{\Delta_U \Delta_L}{(\log 2)^2 \Delta^3}\left(\log\frac{\Delta_U}{\Delta_L}\right)^2,$$
and
$$\sigma^2[\Phi_{CPS}] = \frac{1}{\Delta^2}\left[\sum\sum_{i+j \ne r+1} p_{ij}\,\Psi_{ij}^2 - \Delta\,\Phi_{CPS}^2\right],$$
where
$$\Psi_{ij} = \begin{cases} \dfrac{1}{\log 2}\log\dfrac{\Delta\, p_{ij}}{\Delta_U\,(p_{ij} + p_{i^*j^*})} & (i+j<r+1), \\[2mm] \dfrac{1}{\log 2}\log\dfrac{\Delta\, p_{ij}}{\Delta_L\,(p_{ij} + p_{i^*j^*})} & (i+j>r+1). \end{cases}$$
Let $\hat\sigma^2[\Phi_{APS}]$ denote $\sigma^2[\Phi_{APS}]$ with {pij} replaced by {p̂ij}. Then $\hat\sigma[\Phi_{APS}]/\sqrt{n}$ is an estimated approximate standard error for Φ̂APS, and $\hat\Phi_{APS} \pm z_{p/2}\,\hat\sigma[\Phi_{APS}]/\sqrt{n}$ is an approximate 100(1−p) percent confidence interval for ΦAPS, where $z_{p/2}$ is the percentage point of the standard normal distribution corresponding to a two-tail probability equal to p. Approximate confidence intervals for ΦRGS and ΦCPS are obtained in a similar way.
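As a rough check of these formulas, the following sketch plugs the counts of Table 2 (Section 5) into the variance expressions for ΦAPS and ΦRGS, taking the standard error as the square root of the estimated variance divided by n. The function names `aps_estimate` and `rgs_estimate`, the constant `TABLE2`, and the normal quantile 1.959964 for a 95% interval are assumptions made for this illustration. Zero cells are skipped because their contribution vanishes in the limit.

```python
import math

# Counts from Table 2 (rows: right eye grade, columns: left eye grade).
TABLE2 = [
    [1429, 249, 25, 20],
    [185, 660, 124, 64],
    [23, 114, 221, 149],
    [22, 40, 130, 1291],
]
LOG2 = math.log(2)


def aps_estimate(counts, z=1.959964):
    """Return (estimate of Phi_APS, standard error, confidence interval)."""
    r = len(counts)
    n = sum(sum(row) for row in counts)
    p = [[c / n for c in row] for row in counts]
    # With 0-based indices the reverse diagonal is i + j == r - 1.
    cells = [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]
    delta = sum(p[i][j] for i, j in cells)
    s1 = s2 = 0.0  # sums of p_ij * Omega_ij and p_ij * Omega_ij^2
    for i, j in cells:
        if p[i][j] > 0:
            omega = math.log(2 * p[i][j] / (p[i][j] + p[r - 1 - i][r - 1 - j])) / LOG2
            s1 += p[i][j] * omega
            s2 += p[i][j] * omega ** 2
    phi = s1 / delta
    variance = (s2 - delta * phi ** 2) / delta ** 2
    se = math.sqrt(variance / n)
    return phi, se, (phi - z * se, phi + z * se)


def rgs_estimate(counts):
    """Return (estimate of Phi_RGS, standard error); assumes Delta_U, Delta_L > 0."""
    r = len(counts)
    n = sum(sum(row) for row in counts)
    d_u = sum(counts[i][j] for i in range(r) for j in range(r) if i + j < r - 1) / n
    d_l = sum(counts[i][j] for i in range(r) for j in range(r) if i + j > r - 1) / n
    d = d_u + d_l
    phi = sum((x / d) * math.log(2 * x / d) for x in (d_u, d_l)) / LOG2
    variance = d_u * d_l / (LOG2 ** 2 * d ** 3) * math.log(d_u / d_l) ** 2
    return phi, math.sqrt(variance / n)


phi_aps, se_aps, ci_aps = aps_estimate(TABLE2)
phi_rgs, se_rgs = rgs_estimate(TABLE2)
print(round(phi_aps, 3), round(se_aps, 3), [round(x, 3) for x in ci_aps])
print(round(phi_rgs, 3), round(se_rgs, 3))
```

Rounded to three decimals, the results agree with the Table 2 entries reported in Table 4 for Φ̂APS and Φ̂RGS.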
4. RELATIONSHIPS BETWEEN MEASURE AND LIKELIHOOD RATIO STATISTIC
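Let $G^2_{APS}$ denote the likelihood ratio chi-squared statistic for testing the goodness-of-fit of the APS model, i.e.,
$$G^2_{APS} = 2n \sum_{i=1}^{r}\sum_{j=1}^{r} \hat p_{ij} \log\frac{\hat p_{ij}}{\hat p^{APS}_{ij}},$$
where
$$\hat p^{APS}_{ij} = \begin{cases} \dfrac{1}{2}(\hat p_{ij} + \hat p_{i^*j^*}) & (i+j \ne r+1), \\[2mm] \hat p_{ij} & (i+j = r+1). \end{cases}$$
Note that $\{\hat p^{APS}_{ij}\}$ are the maximum likelihood estimates of {pij} under the APS model. It follows that the estimated measure Φ̂APS is equal to $G^2_{APS}/n^\dagger$, where $n^\dagger = (2\log 2)\sum\sum_{i+j \ne r+1} n_{ij}$.
Next, let $G^2_{RGS}$ denote the likelihood ratio chi-squared statistic for testing the goodness-of-fit of the RGS model, i.e.,
$$G^2_{RGS} = 2n \sum_{i=1}^{r}\sum_{j=1}^{r} \hat p_{ij} \log\frac{\hat p_{ij}}{\hat p^{RGS}_{ij}},$$
where
$$\hat p^{RGS}_{ij} = \begin{cases} \dfrac{\hat\Delta_U + \hat\Delta_L}{2\hat\Delta_U}\,\hat p_{ij} & (i+j<r+1), \\[2mm] \dfrac{\hat\Delta_U + \hat\Delta_L}{2\hat\Delta_L}\,\hat p_{ij} & (i+j>r+1), \\[2mm] \hat p_{ij} & (i+j = r+1). \end{cases}$$
Note that $\{\hat p^{RGS}_{ij}\}$ are the maximum likelihood estimates of {pij} under the RGS model. It follows that the estimated measure Φ̂RGS is equal to $G^2_{RGS}/n^\dagger$.
Moreover, let $G^2_{CPS}$ denote the likelihood ratio chi-squared statistic for testing the goodness-of-fit of the CPS model, i.e.,
$$G^2_{CPS} = 2n \sum_{i=1}^{r}\sum_{j=1}^{r} \hat p_{ij} \log\frac{\hat p_{ij}}{\hat p^{CPS}_{ij}},$$
where
$$\hat p^{CPS}_{ij} = \begin{cases} \dfrac{\hat\Delta_U}{\hat\Delta}\,(\hat p_{ij} + \hat p_{i^*j^*}) & (i+j<r+1), \\[2mm] \dfrac{\hat\Delta_L}{\hat\Delta}\,(\hat p_{ij} + \hat p_{i^*j^*}) & (i+j>r+1), \\[2mm] \hat p_{ij} & (i+j = r+1), \end{cases}$$
and $\hat\Delta_U$, $\hat\Delta_L$ and $\hat\Delta$ denote ΔU, ΔL and Δ with {pij} replaced by {p̂ij}, respectively. Note that $\{\hat p^{CPS}_{ij}\}$ are the maximum likelihood estimates of {pij} under the CPS model. It follows that the estimated measure Φ̂CPS is equal to $G^2_{CPS}/n^\dagger$.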
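The identity between the estimated measure and the rescaled likelihood ratio statistic can be verified numerically for the APS model. The sketch below is illustrative only: `g2_aps`, `phi_aps_hat` and `n_dagger` are hypothetical names, and the counts are those of Table 2 in Section 5. The statistic is computed from counts and fitted values, while the measure is computed separately from its definition in Section 2.

```python
import math

# Counts from Table 2 (rows: right eye grade, columns: left eye grade).
TABLE2 = [
    [1429, 249, 25, 20],
    [185, 660, 124, 64],
    [23, 114, 221, 149],
    [22, 40, 130, 1291],
]


def off_reverse_diagonal(r):
    """Cells with i + j != r + 1 in 1-based terms (0-based: i + j != r - 1)."""
    return [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]


def g2_aps(counts):
    """Likelihood ratio statistic 2 * sum n_ij * log(n_ij / m_ij) under APS."""
    r = len(counts)
    g2 = 0.0
    for i, j in off_reverse_diagonal(r):  # reverse-diagonal cells are fitted exactly
        if counts[i][j] > 0:
            fitted = (counts[i][j] + counts[r - 1 - i][r - 1 - j]) / 2
            g2 += 2 * counts[i][j] * math.log(counts[i][j] / fitted)
    return g2


def phi_aps_hat(counts):
    """Estimated Phi_APS from its Kullback-Leibler definition."""
    r = len(counts)
    cells = off_reverse_diagonal(r)
    n_off = sum(counts[i][j] for i, j in cells)
    total = 0.0
    for i, j in cells:
        pc = counts[i][j] / n_off
        if pc > 0:
            pair = (counts[i][j] + counts[r - 1 - i][r - 1 - j]) / n_off
            total += pc * math.log(pc / (pair / 2))
    return total / math.log(2)


def n_dagger(counts):
    """(2 log 2) times the number of observations off the reverse diagonal."""
    r = len(counts)
    return 2 * math.log(2) * sum(counts[i][j] for i, j in off_reverse_diagonal(r))


g2 = g2_aps(TABLE2)
print(g2, g2 / n_dagger(TABLE2), phi_aps_hat(TABLE2))
```

The last two printed values coincide up to floating-point error, which is the relationship stated for the APS model; the RGS and CPS cases can be checked in the same way with their own fitted values.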
5. EXAMPLE
Consider the data in Tables 2 and 3, taken from Tomizawa [16]. Table 2 is constructed from the data on the unaided distance vision of 4746 students aged 18 to about 25, including about 10% women, in the Faculty of Science and Technology, Science University of Tokyo in Japan, examined in April 1982. Table 3 is constructed from the data on the unaided distance vision of 3168 pupils aged 6-12, including about half girls, at elementary schools in Tokyo, Japan, examined in June 1984. In Tables 2 and 3, the row variable is the right eye grade and the column variable is the left eye grade, with the categories ordered from the lowest grade (1) to the highest grade (4). For Tables 2 and 3, we are interested in whether the various point-symmetry models hold. For example, when the RGS model does not hold, the probability that the sum of the right eye grade and the left eye grade is 4 or less is not equal to the probability that it is 6 or more. When a model does not hold, we are interested in measuring and comparing the degrees of departure from the model for Tables 2 and 3. Table 4 gives the estimates of the measures ΦAPS, ΦRGS and ΦCPS, the estimated approximate standard errors for Φ̂APS, Φ̂RGS and Φ̂CPS, and the approximate 95% confidence intervals for ΦAPS, ΦRGS and ΦCPS.
| Right eye grade \ Left eye grade | Lowest (1) | Second (2) | Third (3) | Highest (4) | Total |
|---|---|---|---|---|---|
| Lowest (1) | 1429 | 249 | 25 | 20 | 1723 |
| Second (2) | 185 | 660 | 124 | 64 | 1033 |
| Third (3) | 23 | 114 | 221 | 149 | 507 |
| Highest (4) | 22 | 40 | 130 | 1291 | 1483 |
| Total | 1659 | 1063 | 500 | 1524 | 4746 |

Table 2. Unaided distance vision of 4746 students aged 18 to about 25, including about 10% women, in the Faculty of Science and Technology, Science University of Tokyo in Japan, examined in April 1982; from Tomizawa [16].
| Right eye grade \ Left eye grade | Lowest (1) | Second (2) | Third (3) | Highest (4) | Total |
|---|---|---|---|---|---|
| Lowest (1) | 92 | 16 | 7 | 12 | 127 |
| Second (2) | 15 | 75 | 42 | 10 | 142 |
| Third (3) | 5 | 33 | 138 | 96 | 272 |
| Highest (4) | 10 | 21 | 126 | 2470 | 2627 |
| Total | 122 | 145 | 313 | 2588 | 3168 |

Table 3. Unaided distance vision of 3168 pupils aged 6-12, including about half girls, at elementary schools in Tokyo, examined in June 1984; from Tomizawa [16].
| Applied data | Measure | Estimated measure | Standard error | Confidence interval |
|---|---|---|---|---|
| Table 2 | Φ̂APS | 0.049 | 0.005 | (0.038, 0.059) |
| | Φ̂RGS | 0.017 | 0.003 | (0.010, 0.023) |
| | Φ̂CPS | 0.032 | 0.004 | (0.023, 0.041) |
| Table 3 | Φ̂APS | 0.693 | 0.016 | (0.662, 0.724) |
| | Φ̂RGS | 0.640 | 0.017 | (0.607, 0.674) |
| | Φ̂CPS | 0.053 | 0.008 | (0.037, 0.069) |

Table 4. Estimates of ΦAPS, ΦRGS and ΦCPS, estimated approximate standard errors for Φ̂APS, Φ̂RGS and Φ̂CPS, and approximate 95% confidence intervals for ΦAPS, ΦRGS and ΦCPS, applied to Tables 2 and 3.
From Table 4, comparing the confidence intervals for ΦAPS indicates that the degree of departure from APS is greater in Table 3 than in Table 2. The same can be said about the degree of departure from RGS. However, the degrees of departure from CPS in Tables 2 and 3 cannot be clearly ordered, because the two confidence intervals overlap: the values in the confidence interval for Table 3 are not always greater than the values in the confidence interval for Table 2.
6. CONCLUDING REMARKS
The measures ΦAPS, ΦRGS and ΦCPS always range between 0 and 1 independent of the dimension r and sample size n. So, these measures may be useful for comparing the degrees of departure from APS, RGS and CPS in several tables, respectively.
As is well known, in general, the absolute value of the correlation coefficient between two variables is theoretically 0 or more and 1 or less. However in many actual data, the estimated absolute value of the correlation coefficient is 0 or more and less than 1. Similarly, each of the proposed measures theoretically ranges between 0 and 1. However, when the value of the proposed measure is 1, it has some structures of probability zero. We note that in many actual data, the estimated value of the measures is 0 or more and less than 1.
The measure ΦAPS expresses how far the departure from the APS model lies toward the maximum departure from APS defined in Section 2. Similarly, the measure ΦRGS (ΦCPS) expresses how far the departure from the RGS model (the CPS model) lies toward the maximum departure from RGS (CPS) defined in Section 2. We note that the definitions of the three models and of the corresponding maximum departures are different; that is, each measure serves a different purpose. Also, from Theorem 1, note that the values of the three measures are related to each other.
The CPS model imposes no restriction on the reverse diagonal cell probabilities {pii∗}. Thus, the structure of CPS based on the probabilities {pij}, i.e., pij∕pi∗j∗=τ (i+j<r+1), is also expressed as $p^c_{ij}/p^c_{i^*j^*} = \tau$, using the conditional cell probabilities $\{p^c_{ij}\}$, i+j≠r+1. So, it seems natural that a measure of the degree of departure from CPS, and its range, should not depend on the reverse diagonal cell probabilities. In the sample versions, it may seem to many readers that both $G^2_{CPS}/n$ and Φ̂CPS are reasonable measures for representing the degree of departure from CPS. However, Φ̂CPS rather than $G^2_{CPS}/n$ would be useful for comparing the degree of departure from CPS in several tables, because the range of $G^2_{CPS}/n$ depends on the reverse diagonal proportions, i.e., $0 \le G^2_{CPS}/n \le n^\dagger/n \ [= (2\log 2)(1 - \sum_{i=1}^{r} n_{ii^*}/n)]$, whereas Φ̂CPS always ranges between 0 and 1 regardless of the reverse diagonal proportions. For a similar reason, Φ̂CPS may also be preferable to $G^2_{CPS}$ for such comparisons. The same can be said about Φ̂APS and Φ̂RGS.
Note that the three proposed measures cannot be used to test the goodness-of-fit of each model. Also note that the three measures have different purposes, so it is meaningless to compare their values with one another. Kurakami et al. [8] gave the orthogonality of the likelihood ratio chi-squared statistics for testing the goodness-of-fit, i.e., $G^2_{APS} = G^2_{CPS} + G^2_{RGS}$. We note that Theorem 1 corresponds to the population version of this orthogonality.
We could extend the proposed measure ΦAPS to the power-divergence type measure, as
ΦAPS(λ)=λ(λ+1)2(2λ−1)IAPS(λ)forλ>−1,
where
IAPS(λ)=∑∑i+j≠r+1pijcpijcpijAPSλ−1,
and the value at λ=0 is taken to be the limit as λ→0. Thus, ΦAPS(0) is identical to the measure ΦAPS. Similarly, we could extend the proposed measures ΦRGS (ΦCPS) to power-divergence type measures. For details of the power-divergence, see Read and Cressie ([17], p.15). However, for any λ (λ≠0), we note that the value of the power-divergence type measure ΦAPS(λ) is not equal to the sum of the value of the measure ΦRGS(λ) and the value of the measure ΦCPS(λ).
CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest regarding the publication of this paper.
AUTHORS' CONTRIBUTIONS
All authors contributed equally to the writing of this paper. All authors have read and agreed to the published version of the manuscript.
Funding Statement
There is no funding of this paper.
ACKNOWLEDGMENTS
The authors would like to thank the editor and the three referees for their helpful comments.
REFERENCES
2. Y.M.M. Bishop, S.E. Fienberg, and P.W. Holland, Discrete Multivariate Analysis: Theory and Practice, The MIT Press, Cambridge, 1975.
5. A. Agresti, Categorical Data Analysis, Wiley, New York, 1990.
7. K.D. Wall and G.A. Lienert, Biom. J., Vol. 18, 1976, pp. 259-264.
10. C.R. Rao, Bull. Int. Stat. Instit., Vol. 34, 1954, pp. 90-97.
12. Y.J. Wang, Commun. Stat. Simul. C., Vol. 41, 2012, pp. 32-43.
15. S. Tomizawa and K. Saitoh, Calcutta Stat. Assoc. Bull., Vol. 49, 1999, pp. 32-39.
17. T.R.C. Read and N.A.C. Cressie, Goodness-of-Fit Statistics for Discrete Multivariate Data, Springer-Verlag, New York, 1988.