Journal of Statistical Theory and Applications

Volume 19, Issue 4, December 2020, Pages 526 - 533

Measure of Departure from Point Symmetry and Decomposition of Measure for Square Contingency Tables

Authors
Kiyotaka Iki1, *, Sadao Tomizawa2
1Faculty of Economics, Nihon University, Chiyoda-ku, Tokyo, 101-8360, Japan
2Faculty of Science and Technology, Tokyo University of Science, Noda City, Chiba, 278-8510, Japan
*Corresponding author. Email: iki.kiyotaka@nihon-u.ac.jp
Corresponding Author
Kiyotaka Iki
Received 7 July 2020, Accepted 21 December 2020, Available Online 4 January 2021.
DOI
10.2991/jsta.d.201223.001
Keywords
Conditional symmetry; Global symmetry; Kullback–Leibler information; Measure; Point symmetry; Square contingency table
Abstract

For square contingency tables with ordered categories, Tomizawa, Biometrical J. 28 (1986), 387–393, considered the conditional point symmetry model. Kurakami et al., J. Stat. Adv. Theory Appl. 17 (2017), 33–42, considered the another point symmetry model and the reverse global symmetry model. The present paper proposes Kullback–Leibler information type measures to represent the degree of departure from each of these models. It also proves that the measure for the another point symmetry model is equal to the sum of the measures for the reverse global symmetry model and for the conditional point symmetry model.

Copyright
© 2021 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

For an $r \times r$ contingency table with the same row and column ordinal classifications, let $X$ and $Y$ denote the row and column variables, respectively. Also let $\Pr(X=i, Y=j) = p_{ij}$ $(1 \le i, j \le r)$. The symmetry (S) model (Bowker, [1]) is defined by

$$p_{ij} = p_{ji} \quad (i < j);$$

see also Bishop et al. ([2], p.282). The S model indicates a structure of symmetry of the probabilities $\{p_{ij}\}$ with respect to the main diagonal of the table. The global symmetry (GS) model (Read, [3]) is defined by

$$\delta_U = \delta_L,$$
where
$$\delta_U = \sum_{i<j} p_{ij} \; (= \Pr(X<Y)), \qquad \delta_L = \sum_{i>j} p_{ij} \; (= \Pr(X>Y)).$$

The conditional symmetry (CS) model (Read, [3]; McCullagh, [4]) is defined by

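$$p_{ij} = \delta\, p_{ji} \quad (i < j),$$
where $\delta$ is an unknown parameter; see also Agresti ([5], p.361) and Tomizawa [6]. We note that the CS model is also expressed as
$$\Pr(X=i, Y=j \mid X<Y) = \Pr(X=j, Y=i \mid X>Y) \quad (i < j).$$

Thus the CS model indicates a structure of conditional symmetry: given that an observation falls off the main diagonal, the cells above the diagonal have the same conditional distribution as their mirror images below it. The S model is the special case of this model obtained by putting $\delta = 1$. Read [3] proved that the S model holds if and only if both the GS and CS models hold.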
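As an illustration of Read's decomposition, the following minimal Python sketch checks the S, GS and CS conditions on a table of cell probabilities. The function name `check_s_gs_cs`, the tolerance `tol` and the 3×3 table `p_example` are hypothetical and were chosen only for this illustration; they do not come from the paper.

```python
def check_s_gs_cs(p, tol=1e-12):
    """Return (S holds, GS holds, CS holds) for a square table of probabilities."""
    r = len(p)
    # Pair each upper-triangle cell p_ij (i < j) with its mirror image p_ji.
    pairs = [(p[i][j], p[j][i]) for i in range(r) for j in range(i + 1, r)]
    delta_u = sum(a for a, _ in pairs)  # Pr(X < Y)
    delta_l = sum(b for _, b in pairs)  # Pr(X > Y)
    s_holds = all(abs(a - b) <= tol for a, b in pairs)
    gs_holds = abs(delta_u - delta_l) <= tol
    # CS: p_ij / delta_U = p_ji / delta_L, written without division.
    cs_holds = all(abs(a * delta_l - b * delta_u) <= tol for a, b in pairs)
    return s_holds, gs_holds, cs_holds


# Hypothetical table: every upper-triangle cell is twice its mirror image,
# so CS holds with delta = 2 while GS (and hence S) fails.
p_example = [
    [0.20, 0.20, 0.10],
    [0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10],
]
print(check_s_gs_cs(p_example))
```

In this example CS holds but GS does not, so S fails, which is consistent with the statement that S requires both GS and CS.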

Wall and Lienert [7] introduced the point symmetry (PS) model, defined by

$$p_{ij} = p_{i^* j^*} \quad (1 \le i, j \le r),$$
where $i^* = r+1-i$ and $j^* = r+1-j$. This model indicates a structure of point symmetry of the probabilities $\{p_{ij}\}$ with respect to the center cell when $r$ is odd, or the center point when $r$ is even, in square tables. Kurakami et al. [8] considered the another point symmetry (APS) model, defined by
$$p_{ij} = p_{i^* j^*} \quad (1 \le i, j \le r;\ i+j \ne r+1).$$

The APS model has fewer restrictions than the PS model because it excludes the restrictions imposed on the reverse diagonal probabilities. Kurakami et al. [8] also considered the reverse global symmetry (RGS) model, defined by

$$\Delta_U = \Delta_L,$$
where
$$\Delta_U = \sum_{i+j<r+1} p_{ij} \; (= \Pr(X+Y<r+1)), \qquad \Delta_L = \sum_{i+j>r+1} p_{ij} \; (= \Pr(X+Y>r+1)).$$

Tomizawa [9] considered the conditional point symmetry (CPS) model defined by

$$p_{ij} = \tau\, p_{i^* j^*} \quad (i+j < r+1),$$
where $\tau$ is an unknown parameter. The CPS model indicates that
$$\Pr(X=i, Y=j \mid X+Y<r+1) = \Pr(X=i^*, Y=j^* \mid X+Y>r+1).$$

Kurakami et al. [8] proved that the APS model holds if and only if both the RGS and CPS models hold. For more details on contingency table analysis, see also Rao [10], Mosteller [11] and Wang [12].
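The relationship among the three models can be checked numerically. The sketch below is an illustration only: the function name `check_aps_rgs_cps` and the tolerance `tol` are assumptions, and the probabilities are those of Table 1c in Section 2. Indices are 0-based in the code, so the paper's condition $i+j<r+1$ becomes `i + j < r - 1`.

```python
def check_aps_rgs_cps(p, tol=1e-12):
    """Return (APS holds, RGS holds, CPS holds) for a square table."""
    r = len(p)
    # Pair each cell above the reverse diagonal with its point-symmetric
    # partner (i*, j*) = (r - 1 - i, r - 1 - j) in 0-based indexing.
    pairs = [
        (p[i][j], p[r - 1 - i][r - 1 - j])
        for i in range(r) for j in range(r) if i + j < r - 1
    ]
    big_u = sum(a for a, _ in pairs)  # Delta_U = Pr(X + Y < r + 1)
    big_l = sum(b for _, b in pairs)  # Delta_L = Pr(X + Y > r + 1)
    aps = all(abs(a - b) <= tol for a, b in pairs)
    rgs = abs(big_u - big_l) <= tol
    # CPS: p_ij / Delta_U = p_i*j* / Delta_L, written without division.
    cps = all(abs(a * big_l - b * big_u) <= tol for a, b in pairs)
    return aps, rgs, cps


# Probabilities of Table 1c, where p_ij = 3 p_i*j* for i + j < r + 1.
table_1c = [
    [0.075, 0.075, 0.075, 0.100],
    [0.075, 0.075, 0.100, 0.025],
    [0.075, 0.100, 0.025, 0.025],
    [0.100, 0.025, 0.025, 0.025],
]
print(check_aps_rgs_cps(table_1c))
```

For Table 1c the CPS condition is met while RGS is not, so APS fails, in line with the decomposition stated above.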

When a model does not hold, we are interested in measuring the degree of departure from it. Tomizawa [13], Tomizawa [14] and Tomizawa and Saitoh [15] considered measures of the degree of departure from S, GS and CS, respectively. Tomizawa and Saitoh [15] proved that the measure for S is equal to the sum of the measure for GS and the measure for CS. We are now interested in proposing measures of the degree of departure from APS, RGS and CPS, and in showing that the measure for APS is equal to the sum of the measure for RGS and the measure for CPS.

Section 2 proposes new measures of the degree of departure from APS, RGS and CPS (denoted by $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$), and shows that the value of $\Phi_{\mathrm{APS}}$ is equal to the sum of the values of $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$. Section 3 gives approximate standard errors and large-sample confidence intervals for the proposed measures. Section 4 describes the relationship between the proposed measures and the likelihood ratio statistics. Section 5 gives an example. Section 6 provides some concluding remarks.

2. MEASURES FROM MODELS AND DECOMPOSITION OF MEASURE

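We assume that $p_{ij} + p_{i^* j^*} > 0$ for $1 \le i, j \le r$; $i+j \ne r+1$. Let $\Delta = \Delta_U + \Delta_L$ and

$$p^c_{ij} = \frac{p_{ij}}{\Delta} \quad (1 \le i, j \le r;\ i+j \ne r+1).$$

We propose the following measure of the degree of departure from the APS model:

$$\Phi_{\mathrm{APS}} = \frac{1}{\log 2}\, I_{\mathrm{APS}},$$
where
$$I_{\mathrm{APS}} = \sum_{i+j \ne r+1} p^c_{ij} \log \frac{p^c_{ij}}{p^{\mathrm{APS}}_{ij}}, \qquad p^{\mathrm{APS}}_{ij} = \frac{p^c_{ij} + p^c_{i^* j^*}}{2}.$$

Note that $I_{\mathrm{APS}}$ is the Kullback–Leibler information between $\{p^c_{ij}\}$ and $\{p^{\mathrm{APS}}_{ij}\}$. The measure $\Phi_{\mathrm{APS}}$ has the properties that (i) $0 \le \Phi_{\mathrm{APS}} \le 1$, (ii) $\Phi_{\mathrm{APS}} = 0$ if and only if $p_{ij} = p_{i^* j^*}$ for $i+j<r+1$, and (iii) $\Phi_{\mathrm{APS}} = 1$ if and only if $p_{ij} = 0$ (then $p_{i^* j^*} > 0$) or $p_{i^* j^*} = 0$ (then $p_{ij} > 0$) for $i+j<r+1$.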
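The definition of the APS measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `phi_aps` is hypothetical, the code assumes the usual convention $0 \log 0 = 0$, and it uses the probabilities of Tables 1b and 1c as inputs. Indices are 0-based, so the reverse diagonal is `i + j == r - 1`.

```python
import math


def phi_aps(p):
    """Kullback-Leibler type measure of departure from APS (sketch)."""
    r = len(p)
    # Each cell off the reverse diagonal is paired with its point-symmetric
    # partner (r - 1 - i, r - 1 - j).
    cells = [
        (p[i][j], p[r - 1 - i][r - 1 - j])
        for i in range(r) for j in range(r) if i + j != r - 1
    ]
    delta = sum(a for a, _ in cells)  # Delta = Delta_U + Delta_L
    info = 0.0
    for a, b in cells:
        if a > 0:  # assumed convention: 0 * log(0) = 0
            # p^c_ij / p^APS_ij simplifies to 2 p_ij / (p_ij + p_i*j*).
            info += (a / delta) * math.log(2 * a / (a + b))
    return info / math.log(2)


table_1b = [
    [0.1, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.0, 0.1, 0.0],
]
table_1c = [
    [0.075, 0.075, 0.075, 0.100],
    [0.075, 0.075, 0.100, 0.025],
    [0.075, 0.100, 0.025, 0.025],
    [0.100, 0.025, 0.025, 0.025],
]
print(round(phi_aps(table_1b), 3), round(phi_aps(table_1c), 3))
```

Under these assumptions the sketch gives the maximum value 1 for Table 1b and approximately 0.189 for Table 1c, the values reported later in this section.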

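Next, assume that $\Delta_U + \Delta_L > 0$. Let

$$\Delta^c_U = \frac{\Delta_U}{\Delta}, \qquad \Delta^c_L = \frac{\Delta_L}{\Delta}.$$

We propose the following measure of the degree of departure from the RGS model:

$$\Phi_{\mathrm{RGS}} = \frac{1}{\log 2}\, I_{\mathrm{RGS}},$$
where
$$I_{\mathrm{RGS}} = \Delta^c_U \log \frac{\Delta^c_U}{1/2} + \Delta^c_L \log \frac{\Delta^c_L}{1/2}.$$

Note that $I_{\mathrm{RGS}}$ is the Kullback–Leibler information between $\{\Delta^c_U, \Delta^c_L\}$ and $\{1/2, 1/2\}$. The measure $\Phi_{\mathrm{RGS}}$ has the properties that (i) $0 \le \Phi_{\mathrm{RGS}} \le 1$, (ii) $\Phi_{\mathrm{RGS}} = 0$ if and only if $\Delta_U = \Delta_L$, and (iii) $\Phi_{\mathrm{RGS}} = 1$ if and only if $\Delta_U = 0$ (then $\Delta_L > 0$) or $\Delta_L = 0$ (then $\Delta_U > 0$).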
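The RGS measure depends on the table only through $\Delta_U$ and $\Delta_L$, which the following sketch makes explicit. The function name `phi_rgs` is hypothetical, the convention $0 \log 0 = 0$ is assumed, and the inputs are the artificial probabilities of Tables 1a, 1b and 1c.

```python
import math


def phi_rgs(p):
    """Kullback-Leibler type measure of departure from RGS (sketch)."""
    r = len(p)
    cells = [(i, j) for i in range(r) for j in range(r)]
    # 0-based indices: i + j < r - 1 corresponds to the paper's i + j < r + 1.
    d_u = sum(p[i][j] for i, j in cells if i + j < r - 1)
    d_l = sum(p[i][j] for i, j in cells if i + j > r - 1)
    delta = d_u + d_l
    if delta <= 0:
        raise ValueError("the measure requires Delta_U + Delta_L > 0")
    info = 0.0
    for d in (d_u, d_l):
        share = d / delta  # Delta_U^c or Delta_L^c
        if share > 0:  # assumed convention: 0 * log(0) = 0
            info += share * math.log(share / 0.5)
    return info / math.log(2)


table_1a = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
table_1b = [
    [0.1, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.0, 0.1, 0.0],
]
table_1c = [
    [0.075, 0.075, 0.075, 0.100],
    [0.075, 0.075, 0.100, 0.025],
    [0.075, 0.100, 0.025, 0.025],
    [0.100, 0.025, 0.025, 0.025],
]
for name, table in (("1a", table_1a), ("1b", table_1b), ("1c", table_1c)):
    print(name, round(phi_rgs(table), 3))
```

The three tables cover the extreme and intermediate cases: maximum departure in Table 1a, no departure in Table 1b, and an intermediate value of about 0.189 in Table 1c.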

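Moreover, assuming that $\Delta_U > 0$, $\Delta_L > 0$, and $p_{ij} + p_{i^* j^*} > 0$ for $1 \le i, j \le r$; $i+j \ne r+1$, we propose the following measure of the degree of departure from the CPS model:

$$\Phi_{\mathrm{CPS}} = \frac{1}{\log 2}\, I_{\mathrm{CPS}},$$
where
$$I_{\mathrm{CPS}} = \sum_{i+j \ne r+1} p^c_{ij} \log \frac{p^c_{ij}}{p^{\mathrm{CPS}}_{ij}}, \qquad p^{\mathrm{CPS}}_{ij} = \begin{cases} \dfrac{\Delta_U}{\Delta}\left(p^c_{ij} + p^c_{i^* j^*}\right) & (i+j<r+1), \\[2ex] \dfrac{\Delta_L}{\Delta}\left(p^c_{ij} + p^c_{i^* j^*}\right) & (i+j>r+1). \end{cases}$$

Note that $I_{\mathrm{CPS}}$ is the Kullback–Leibler information between $\{p^c_{ij}\}$ and $\{p^{\mathrm{CPS}}_{ij}\}$.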
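A corresponding sketch for the CPS measure follows. As before, the function name `phi_cps` is hypothetical and the convention $0 \log 0 = 0$ is assumed. The code raises an error when $\Delta_U = 0$ or $\Delta_L = 0$, since the measure is defined only when both are positive.

```python
import math


def phi_cps(p):
    """Kullback-Leibler type measure of departure from CPS (sketch)."""
    r = len(p)
    off = [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]
    d_u = sum(p[i][j] for i, j in off if i + j < r - 1)
    d_l = sum(p[i][j] for i, j in off if i + j > r - 1)
    if d_u <= 0 or d_l <= 0:
        raise ValueError("the measure requires Delta_U > 0 and Delta_L > 0")
    delta = d_u + d_l
    info = 0.0
    for i, j in off:
        a, b = p[i][j], p[r - 1 - i][r - 1 - j]
        side = d_u if i + j < r - 1 else d_l
        if a > 0:  # assumed convention: 0 * log(0) = 0
            # p^c_ij / p^CPS_ij simplifies to Delta p_ij / (side (p_ij + p_i*j*)).
            info += (a / delta) * math.log(delta * a / (side * (a + b)))
    return info / math.log(2)


table_1b = [
    [0.1, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.0, 0.1, 0.0],
]
table_1c = [
    [0.075, 0.075, 0.075, 0.100],
    [0.075, 0.075, 0.100, 0.025],
    [0.075, 0.100, 0.025, 0.025],
    [0.100, 0.025, 0.025, 0.025],
]
print(round(phi_cps(table_1b), 3), round(phi_cps(table_1c), 3))
```

For Table 1b the sketch returns the maximum value 1, and for Table 1c, where the CPS model holds, it returns 0 up to floating-point rounding.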

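We obtain the following theorem.

Theorem 1.

The value of $\Phi_{\mathrm{APS}}$ is equal to the sum of the values of $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$; that is, $\Phi_{\mathrm{APS}} = \Phi_{\mathrm{RGS}} + \Phi_{\mathrm{CPS}}$.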

Proof.

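We see that

$$\Phi_{\mathrm{APS}} - \Phi_{\mathrm{CPS}} = \frac{1}{\log 2}\left[\sum_{i+j<r+1} p^c_{ij}\left(\log\frac{p^c_{ij}}{p^{\mathrm{APS}}_{ij}} - \log\frac{p^c_{ij}}{p^{\mathrm{CPS}}_{ij}}\right) + \sum_{i+j>r+1} p^c_{ij}\left(\log\frac{p^c_{ij}}{p^{\mathrm{APS}}_{ij}} - \log\frac{p^c_{ij}}{p^{\mathrm{CPS}}_{ij}}\right)\right].$$

For $i+j<r+1$, we see

$$\log\frac{p^c_{ij}}{p^{\mathrm{APS}}_{ij}} - \log\frac{p^c_{ij}}{p^{\mathrm{CPS}}_{ij}} = \log\frac{p^{\mathrm{CPS}}_{ij}}{p^{\mathrm{APS}}_{ij}} = \log\frac{\Delta^c_U}{1/2}. \tag{1}$$

For $i+j>r+1$, we see

$$\log\frac{p^c_{ij}}{p^{\mathrm{APS}}_{ij}} - \log\frac{p^c_{ij}}{p^{\mathrm{CPS}}_{ij}} = \log\frac{p^{\mathrm{CPS}}_{ij}}{p^{\mathrm{APS}}_{ij}} = \log\frac{\Delta^c_L}{1/2}. \tag{2}$$

From equations (1) and (2), and since $\sum_{i+j<r+1} p^c_{ij} = \Delta^c_U$ and $\sum_{i+j>r+1} p^c_{ij} = \Delta^c_L$, we see

$$\Phi_{\mathrm{APS}} - \Phi_{\mathrm{CPS}} = \frac{1}{\log 2}\left[\sum_{i+j<r+1} p^c_{ij}\log\frac{\Delta^c_U}{1/2} + \sum_{i+j>r+1} p^c_{ij}\log\frac{\Delta^c_L}{1/2}\right] = \frac{1}{\log 2}\left[\Delta^c_U\log\frac{\Delta^c_U}{1/2} + \Delta^c_L\log\frac{\Delta^c_L}{1/2}\right] = \Phi_{\mathrm{RGS}}.$$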

The proof is completed. □
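Theorem 1 can also be checked numerically. The sketch below computes the three measures from the observed counts of Table 2 in Section 5, so the measures are the sample versions with proportions in place of probabilities. The function name `departure_measures` is hypothetical, and the code assumes that both sides of the reverse diagonal contain at least one observation.

```python
import math


def departure_measures(n):
    """Return (Phi_APS, Phi_RGS, Phi_CPS) computed from a table of counts."""
    r = len(n)
    off = [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]
    total = sum(n[i][j] for i, j in off)
    upper = sum(n[i][j] for i, j in off if i + j < r - 1)
    # share[True] plays the role of Delta_U^c, share[False] of Delta_L^c.
    share = {True: upper / total, False: (total - upper) / total}
    aps = cps = 0.0
    for i, j in off:
        a, b = n[i][j], n[r - 1 - i][r - 1 - j]
        if a > 0:  # assumed convention: 0 * log(0) = 0
            ratio_aps = 2 * a / (a + b)                        # p^c / p^APS
            ratio_cps = a / (share[i + j < r - 1] * (a + b))   # p^c / p^CPS
            aps += a / total * math.log(ratio_aps)
            cps += a / total * math.log(ratio_cps)
    rgs = sum(s * math.log(2 * s) for s in share.values() if s > 0)
    return aps / math.log(2), rgs / math.log(2), cps / math.log(2)


# Observed frequencies of Table 2 (rows: right eye grade, columns: left eye grade).
table_2 = [
    [1429, 249, 25, 20],
    [185, 660, 124, 64],
    [23, 114, 221, 149],
    [22, 40, 130, 1291],
]
aps, rgs, cps = departure_measures(table_2)
print(round(aps, 3), round(rgs, 3), round(cps, 3))
print(abs(aps - (rgs + cps)) < 1e-12)
```

The APS value and the sum of the RGS and CPS values agree up to floating-point rounding, as the theorem states.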

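Thus, $\Phi_{\mathrm{CPS}}$ is expressed as $\Phi_{\mathrm{CPS}} = \Phi_{\mathrm{APS}} - \Phi_{\mathrm{RGS}}$. All three measures exist under the conditions $\{p_{ij} + p_{i^* j^*} > 0\}$, $\Delta_U > 0$ and $\Delta_L > 0$. Then we obtain $0 \le \Phi_{\mathrm{APS}} \le 1$ and $0 \le \Phi_{\mathrm{RGS}} < 1$ (note that $\Phi_{\mathrm{RGS}} \ne 1$ because both $\Delta_U > 0$ and $\Delta_L > 0$). Since $\Phi_{\mathrm{CPS}} \ge 0$, we obtain $0 \le \Phi_{\mathrm{CPS}} \le 1$. Besides, (i) $\Phi_{\mathrm{CPS}} = 0$ if and only if there is a structure of CPS in the square table, and (ii) $\Phi_{\mathrm{CPS}} = 1$ if and only if $\Phi_{\mathrm{APS}} = 1$ and $\Phi_{\mathrm{RGS}} = 0$; i.e., $p_{ij} = 0$ (then $p_{i^* j^*} > 0$) or $p_{i^* j^*} = 0$ (then $p_{ij} > 0$) for $i+j<r+1$, and $\Delta_U = \Delta_L$.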

Consider the artificial probabilities in Table 1. In Table 1a, $p_{i^* j^*} = 0$ (then $p_{ij} > 0$) for all $i+j<r+1$, and $\Delta_L = 0$ (then $\Delta_U > 0$). Since the degrees of departure from APS and from RGS are both the largest possible, the values of $\Phi_{\mathrm{APS}}$ and $\Phi_{\mathrm{RGS}}$ are both 1. Because the condition $\Delta_L > 0$ does not hold in Table 1a, the value of $\Phi_{\mathrm{CPS}}$ is not defined. In Table 1b, $p_{ij} = 0$ or $p_{i^* j^*} = 0$ for all $i+j<r+1$. Since the degree of departure from APS is the largest possible, the value of $\Phi_{\mathrm{APS}}$ is 1. Also, in Table 1b, $\Delta_U = \Delta_L\,(= 0.3)$, so the value of $\Phi_{\mathrm{RGS}}$ is 0. From Theorem 1, the value of $\Phi_{\mathrm{CPS}}$ is 1. In Table 1c, $p_{ij} = 3p_{i^* j^*}$ for all $i+j<r+1$, so the CPS model holds. The value of $\Phi_{\mathrm{CPS}}$ is 0, and the values of $\Phi_{\mathrm{APS}}$ and $\Phi_{\mathrm{RGS}}$ are both 0.189.

(a)
                        Y
X        (1)      (2)      (3)      (4)      Total
(1)      0.1      0.1      0.1      0.1      0.4
(2)      0.1      0.1      0.1      0        0.3
(3)      0.1      0.1      0        0        0.2
(4)      0.1      0        0        0        0.1
Total    0.4      0.3      0.2      0.1      1

(b)
                        Y
X        (1)      (2)      (3)      (4)      Total
(1)      0.1      0        0.1      0.1      0.3
(2)      0        0        0.1      0        0.1
(3)      0.1      0.1      0.1      0.1      0.4
(4)      0.1      0        0.1      0        0.2
Total    0.3      0.1      0.4      0.2      1

(c)
                        Y
X        (1)      (2)      (3)      (4)      Total
(1)      0.075    0.075    0.075    0.1      0.325
(2)      0.075    0.075    0.1      0.025    0.275
(3)      0.075    0.1      0.025    0.025    0.225
(4)      0.1      0.025    0.025    0.025    0.175
Total    0.325    0.275    0.225    0.175    1

Table 1. Artificial probabilities.

3. APPROXIMATE CONFIDENCE INTERVALS FOR MEASURES

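Let $n_{ij}$ denote the observed frequency in the $i$th row and $j$th column of the table $(1 \le i, j \le r)$. Assuming that a multinomial distribution applies to the $r \times r$ table, we consider approximate standard errors and large-sample confidence intervals for $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$ using the delta method, descriptions of which are given by Bishop et al. ([2], Sec. 14.6) and Agresti ([5], Sec. 12.1). The sample version of $\Phi_{\mathrm{APS}}$ ($\Phi_{\mathrm{RGS}}$, $\Phi_{\mathrm{CPS}}$), i.e., $\hat{\Phi}_{\mathrm{APS}}$ ($\hat{\Phi}_{\mathrm{RGS}}$, $\hat{\Phi}_{\mathrm{CPS}}$), is given by $\Phi_{\mathrm{APS}}$ ($\Phi_{\mathrm{RGS}}$, $\Phi_{\mathrm{CPS}}$) with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, where $\hat{p}_{ij} = n_{ij}/n$ and $n = \sum_{i=1}^{r}\sum_{j=1}^{r} n_{ij}$. Using the delta method, each of $\sqrt{n}(\hat{\Phi}_{\mathrm{APS}} - \Phi_{\mathrm{APS}})$, $\sqrt{n}(\hat{\Phi}_{\mathrm{RGS}} - \Phi_{\mathrm{RGS}})$ and $\sqrt{n}(\hat{\Phi}_{\mathrm{CPS}} - \Phi_{\mathrm{CPS}})$ has asymptotically (as $n \to \infty$) a normal distribution with mean zero and the corresponding variance given by

$$\sigma^2[\Phi_{\mathrm{APS}}] = \frac{1}{\Delta^2}\left[\sum_{i+j \ne r+1} p_{ij}\,\Omega_{ij}^2 - \Delta\,\Phi_{\mathrm{APS}}^2\right],$$
where
$$\Omega_{ij} = \frac{1}{\log 2}\log\frac{2p_{ij}}{p_{ij} + p_{i^* j^*}},$$
$$\sigma^2[\Phi_{\mathrm{RGS}}] = \frac{\Delta_U \Delta_L}{(\log 2)^2 \Delta^3}\left(\log\frac{\Delta_U}{\Delta_L}\right)^2,$$
and
$$\sigma^2[\Phi_{\mathrm{CPS}}] = \frac{1}{\Delta^2}\left[\sum_{i+j \ne r+1} p_{ij}\,\Psi_{ij}^2 - \Delta\,\Phi_{\mathrm{CPS}}^2\right],$$
where
$$\Psi_{ij} = \begin{cases} \dfrac{1}{\log 2}\log\dfrac{\Delta\, p_{ij}}{\Delta_U (p_{ij} + p_{i^* j^*})} & (i+j<r+1), \\[2ex] \dfrac{1}{\log 2}\log\dfrac{\Delta\, p_{ij}}{\Delta_L (p_{ij} + p_{i^* j^*})} & (i+j>r+1). \end{cases}$$

Let $\hat{\sigma}^2[\Phi_{\mathrm{APS}}]$ denote $\sigma^2[\Phi_{\mathrm{APS}}]$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$. Then $\hat{\sigma}[\Phi_{\mathrm{APS}}]/\sqrt{n}$ is an estimated approximate standard error for $\hat{\Phi}_{\mathrm{APS}}$, and $\hat{\Phi}_{\mathrm{APS}} \pm z_{p/2}\,\hat{\sigma}[\Phi_{\mathrm{APS}}]/\sqrt{n}$ is an approximate $100(1-p)$ percent confidence interval for $\Phi_{\mathrm{APS}}$, where $z_{p/2}$ is the percentage point of the standard normal distribution corresponding to a two-tail probability equal to $p$. Approximate confidence intervals for $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$ are obtained in a similar way.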
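The variance formulas above translate directly into code. The following sketch is an illustration under stated assumptions rather than the authors' program: the function name `measures_with_ci` and its `level` parameter are hypothetical, every cell off the reverse diagonal is assumed to have a positive count, and the input is the observed table of Table 2 from Section 5.

```python
import math
from statistics import NormalDist


def measures_with_ci(n, level=0.95):
    """Map each model to (estimate, standard error, lower, upper)."""
    r = len(n)
    n_all = sum(sum(row) for row in n)
    off = [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]
    p = {(i, j): n[i][j] / n_all for i, j in off}  # sample proportions
    d_u = sum(p[i, j] for i, j in off if i + j < r - 1)
    d_l = sum(p[i, j] for i, j in off if i + j > r - 1)
    delta = d_u + d_l
    log2 = math.log(2)

    def omega(i, j):  # Omega_ij
        a, b = p[i, j], p[r - 1 - i, r - 1 - j]
        return math.log(2 * a / (a + b)) / log2

    def psi(i, j):  # Psi_ij
        a, b = p[i, j], p[r - 1 - i, r - 1 - j]
        side = d_u if i + j < r - 1 else d_l
        return math.log(delta * a / (side * (a + b))) / log2

    phi_aps = sum(p[c] * omega(*c) for c in off) / delta
    phi_cps = sum(p[c] * psi(*c) for c in off) / delta
    phi_rgs = (d_u * math.log(2 * d_u / delta)
               + d_l * math.log(2 * d_l / delta)) / (delta * log2)

    var_aps = (sum(p[c] * omega(*c) ** 2 for c in off)
               - delta * phi_aps ** 2) / delta ** 2
    var_cps = (sum(p[c] * psi(*c) ** 2 for c in off)
               - delta * phi_cps ** 2) / delta ** 2
    var_rgs = d_u * d_l * math.log(d_u / d_l) ** 2 / (log2 ** 2 * delta ** 3)

    z = NormalDist().inv_cdf(0.5 + level / 2)  # z_{p/2} with p = 1 - level
    result = {}
    for name, phi, var in (("APS", phi_aps, var_aps),
                           ("RGS", phi_rgs, var_rgs),
                           ("CPS", phi_cps, var_cps)):
        se = math.sqrt(var / n_all)
        result[name] = (phi, se, phi - z * se, phi + z * se)
    return result


table_2 = [
    [1429, 249, 25, 20],
    [185, 660, 124, 64],
    [23, 114, 221, 149],
    [22, 40, 130, 1291],
]
for model, values in measures_with_ci(table_2).items():
    print(model, [round(v, 3) for v in values])
```

Rounded to three decimals, the output should be close to the Table 2 rows of Table 4; small differences in the last digit could arise from rounding.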

4. RELATIONSHIPS BETWEEN MEASURE AND LIKELIHOOD RATIO STATISTIC

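Let $G^2_{\mathrm{APS}}$ denote the likelihood ratio chi-squared statistic for testing the goodness-of-fit of the APS model, i.e.,

$$G^2_{\mathrm{APS}} = 2n\sum_{i=1}^{r}\sum_{j=1}^{r} \hat{p}_{ij}\log\frac{\hat{p}_{ij}}{\hat{p}^{\mathrm{APS}}_{ij}},$$
where
$$\hat{p}^{\mathrm{APS}}_{ij} = \begin{cases} \dfrac{1}{2}\left(\hat{p}_{ij} + \hat{p}_{i^* j^*}\right) & (i+j \ne r+1), \\[2ex] \hat{p}_{ij} & (i+j = r+1). \end{cases}$$

Note that $\{\hat{p}^{\mathrm{APS}}_{ij}\}$ are the maximum likelihood estimates of $\{p_{ij}\}$ under the APS model. The estimated measure $\hat{\Phi}_{\mathrm{APS}}$ is then equal to $G^2_{\mathrm{APS}}/n^*$, where $n^* = (2\log 2)\sum_{i+j \ne r+1} n_{ij}$.

Next, let $G^2_{\mathrm{RGS}}$ denote the likelihood ratio chi-squared statistic for testing the goodness-of-fit of the RGS model, i.e.,

$$G^2_{\mathrm{RGS}} = 2n\sum_{i=1}^{r}\sum_{j=1}^{r} \hat{p}_{ij}\log\frac{\hat{p}_{ij}}{\hat{p}^{\mathrm{RGS}}_{ij}},$$
where
$$\hat{p}^{\mathrm{RGS}}_{ij} = \begin{cases} \dfrac{\hat{\Delta}_U + \hat{\Delta}_L}{2\hat{\Delta}_U}\,\hat{p}_{ij} & (i+j<r+1), \\[2ex] \dfrac{\hat{\Delta}_U + \hat{\Delta}_L}{2\hat{\Delta}_L}\,\hat{p}_{ij} & (i+j>r+1), \\[2ex] \hat{p}_{ij} & (i+j = r+1). \end{cases}$$

Note that $\{\hat{p}^{\mathrm{RGS}}_{ij}\}$ are the maximum likelihood estimates of $\{p_{ij}\}$ under the RGS model. The estimated measure $\hat{\Phi}_{\mathrm{RGS}}$ is then equal to $G^2_{\mathrm{RGS}}/n^*$.

Moreover, let $G^2_{\mathrm{CPS}}$ denote the likelihood ratio chi-squared statistic for testing the goodness-of-fit of the CPS model, i.e.,

$$G^2_{\mathrm{CPS}} = 2n\sum_{i=1}^{r}\sum_{j=1}^{r} \hat{p}_{ij}\log\frac{\hat{p}_{ij}}{\hat{p}^{\mathrm{CPS}}_{ij}},$$
where
$$\hat{p}^{\mathrm{CPS}}_{ij} = \begin{cases} \dfrac{\hat{\Delta}_U}{\hat{\Delta}}\left(\hat{p}_{ij} + \hat{p}_{i^* j^*}\right) & (i+j<r+1), \\[2ex] \dfrac{\hat{\Delta}_L}{\hat{\Delta}}\left(\hat{p}_{ij} + \hat{p}_{i^* j^*}\right) & (i+j>r+1), \\[2ex] \hat{p}_{ij} & (i+j = r+1), \end{cases}$$
and $\hat{\Delta}_U$, $\hat{\Delta}_L$ and $\hat{\Delta}$ denote $\Delta_U$, $\Delta_L$ and $\Delta$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, respectively. Note that $\{\hat{p}^{\mathrm{CPS}}_{ij}\}$ are the maximum likelihood estimates of $\{p_{ij}\}$ under the CPS model. The estimated measure $\hat{\Phi}_{\mathrm{CPS}}$ is then equal to $G^2_{\mathrm{CPS}}/n^*$.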
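The link between the likelihood ratio statistics and the estimated measures can be illustrated with the counts of Table 2. In the sketch below the function name `lr_statistics` is hypothetical; the fitted values are written in terms of counts (that is, $n$ times the maximum likelihood estimates given above), and cells on the reverse diagonal are skipped because their fitted and observed values coincide.

```python
import math


def lr_statistics(n):
    """Return ({model: G^2}, n_star) for the APS, RGS and CPS models."""
    r = len(n)
    off = [(i, j) for i in range(r) for j in range(r) if i + j != r - 1]
    n_u = sum(n[i][j] for i, j in off if i + j < r - 1)
    n_l = sum(n[i][j] for i, j in off if i + j > r - 1)
    n_off = n_u + n_l
    g2 = {"APS": 0.0, "RGS": 0.0, "CPS": 0.0}
    for i, j in off:
        a, b = n[i][j], n[r - 1 - i][r - 1 - j]
        side = n_u if i + j < r - 1 else n_l
        # Fitted counts under each model for cell (i, j).
        fitted = {
            "APS": (a + b) / 2,
            "RGS": a * n_off / (2 * side),
            "CPS": side * (a + b) / n_off,
        }
        if a > 0:  # assumed convention: 0 * log(0) = 0
            for model, m in fitted.items():
                g2[model] += 2 * a * math.log(a / m)
    n_star = 2 * math.log(2) * n_off
    return g2, n_star


table_2 = [
    [1429, 249, 25, 20],
    [185, 660, 124, 64],
    [23, 114, 221, 149],
    [22, 40, 130, 1291],
]
g2, n_star = lr_statistics(table_2)
for model in ("APS", "RGS", "CPS"):
    print(model, round(g2[model], 2), round(g2[model] / n_star, 3))
```

Dividing each statistic by $n^*$ should give the estimates reported for Table 2 in Table 4, and the APS statistic equals the sum of the RGS and CPS statistics up to rounding.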

5. EXAMPLE

Consider the data in Tables 2 and 3, taken from Tomizawa [16]. Table 2 is constructed from the data on the unaided distance vision of 4746 students aged 18 to about 25, about 10% of whom were women, of the Faculty of Science and Technology, Science University of Tokyo in Japan, examined in April 1982. Table 3 is constructed from the data on the unaided distance vision of 3168 pupils aged 6-12, about half of whom were girls, at elementary schools in Tokyo, Japan, examined in June 1984. In Tables 2 and 3 the row variable is the right eye grade and the column variable is the left eye grade, with the categories ordered from the lowest grade (1) to the highest grade (4). For Tables 2 and 3, we are interested in whether the various point symmetry models hold. For example, when the RGS model does not hold, the probability that the sum of the right eye grade and the left eye grade is 4 or less is not equal to the probability that it is 6 or above. When a model does not hold, we are interested in measuring and comparing the degrees of departure from the model for Tables 2 and 3. Table 4 gives the estimates of the measures $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$, the estimated approximate standard errors for $\hat{\Phi}_{\mathrm{APS}}$, $\hat{\Phi}_{\mathrm{RGS}}$ and $\hat{\Phi}_{\mathrm{CPS}}$, and the approximate 95% confidence intervals for $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$.

                                   Left eye grade
Right eye grade    Lowest (1)    Second (2)    Third (3)    Highest (4)    Total
Lowest (1)         1429          249           25           20             1723
Second (2)         185           660           124          64             1033
Third (3)          23            114           221          149            507
Highest (4)        22            40            130          1291           1483
Total              1659          1063          500          1524           4746

Table 2. Unaided distance vision of 4746 students aged 18 to about 25, including about 10% women, in the Faculty of Science and Technology, Science University of Tokyo in Japan, examined in April 1982; from Tomizawa [16].

                                   Left eye grade
Right eye grade    Lowest (1)    Second (2)    Third (3)    Highest (4)    Total
Lowest (1)         92            16            7            12             127
Second (2)         15            75            42           10             142
Third (3)          5             33            138          96             272
Highest (4)        10            21            126          2470           2627
Total              122           145           313          2588           3168

Table 3. Unaided distance vision of 3168 pupils aged 6-12, including about half girls, at elementary schools in Tokyo, examined in June 1984; from Tomizawa [16].

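Applied data    Measure                         Estimated measure    Standard error    Confidence interval
Table 2         $\hat{\Phi}_{\mathrm{APS}}$     0.049                0.005             (0.038, 0.059)
                $\hat{\Phi}_{\mathrm{RGS}}$     0.017                0.003             (0.010, 0.023)
                $\hat{\Phi}_{\mathrm{CPS}}$     0.032                0.004             (0.023, 0.041)
Table 3         $\hat{\Phi}_{\mathrm{APS}}$     0.693                0.016             (0.662, 0.724)
                $\hat{\Phi}_{\mathrm{RGS}}$     0.640                0.017             (0.607, 0.674)
                $\hat{\Phi}_{\mathrm{CPS}}$     0.053                0.008             (0.037, 0.069)

Table 4. Estimates of $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$, estimated approximate standard errors for $\hat{\Phi}_{\mathrm{APS}}$, $\hat{\Phi}_{\mathrm{RGS}}$ and $\hat{\Phi}_{\mathrm{CPS}}$, and approximate 95% confidence intervals for $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$, applied to Tables 2 and 3.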

From Table 4, comparing the confidence intervals for $\Phi_{\mathrm{APS}}$ indicates that the degree of departure from APS is greater in Table 3 than in Table 2. The same can be said about the degree of departure from RGS. However, it may not be possible to rank the degrees of departure from CPS in Tables 2 and 3, because the two confidence intervals overlap: the values in the confidence interval for Table 3 are not all greater than the values in the confidence interval for Table 2.
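The comparison described above amounts to checking whether two confidence intervals overlap. The short sketch below applies that check to the intervals reported in Table 4; the helper name `intervals_overlap` and the dictionary layout are hypothetical, and the overlap criterion is a simple reading of the argument in the text rather than a formal test.

```python
def intervals_overlap(a, b):
    """Return True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Approximate 95% confidence intervals copied from Table 4.
ci_table_2 = {"APS": (0.038, 0.059), "RGS": (0.010, 0.023), "CPS": (0.023, 0.041)}
ci_table_3 = {"APS": (0.662, 0.724), "RGS": (0.607, 0.674), "CPS": (0.037, 0.069)}

overlap = {
    model: intervals_overlap(ci_table_2[model], ci_table_3[model])
    for model in ci_table_2
}
print(overlap)
```

Only the CPS intervals overlap, which is why the APS and RGS departures can be ranked between the two tables while the CPS departures cannot.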

6. CONCLUDING REMARKS

The measures $\Phi_{\mathrm{APS}}$, $\Phi_{\mathrm{RGS}}$ and $\Phi_{\mathrm{CPS}}$ always range between 0 and 1, independent of the dimension $r$ and the sample size $n$. So these measures may be useful for comparing the degrees of departure from APS, RGS and CPS, respectively, across several tables.

As is well known, the absolute value of the correlation coefficient between two variables theoretically lies between 0 and 1 inclusive. However, in many actual data sets, the estimated absolute value of the correlation coefficient is at least 0 and less than 1. Similarly, each of the proposed measures theoretically ranges between 0 and 1, but the value 1 is attained only when the table has some structure of zero probabilities. We note that in many actual data sets, the estimated value of each measure is at least 0 and less than 1.

The measure $\Phi_{\mathrm{APS}}$ is used to measure to what degree the departure from the APS model moves toward the maximum departure from APS defined in Section 2. Similarly, the measure $\Phi_{\mathrm{RGS}}$ ($\Phi_{\mathrm{CPS}}$) is used to measure to what degree the departure from the RGS model (the CPS model) moves toward the maximum departure from RGS (CPS) defined in Section 2. We note that the definitions of the three models and the corresponding maximum departures are different; that is, the purpose of using each measure is different. Also, from Theorem 1, note that the values of the three measures are related to each other.

The CPS model imposes no restriction on the reverse diagonal cell probabilities $\{p_{ii^*}\}$. Thus, the structure of CPS based on the probabilities $\{p_{ij}\}$, i.e., $p_{ij}/p_{i^* j^*} = \tau$ $(i+j<r+1)$, is also expressed as $p^c_{ij}/p^c_{i^* j^*} = \tau$, using the conditional cell probabilities $\{p^c_{ij}\}$, $i+j \ne r+1$. So it seems natural that the measure of the degree of departure from CPS and its range do not depend on the reverse diagonal cell probabilities. In the sample versions, it may seem to many readers that both $G^2_{\mathrm{CPS}}/n$ and $\hat{\Phi}_{\mathrm{CPS}}$ are reasonable measures of the degree of departure from CPS. However, $\hat{\Phi}_{\mathrm{CPS}}$ rather than $G^2_{\mathrm{CPS}}/n$ would be useful for comparing the degree of departure from CPS in several tables, because the range of $G^2_{\mathrm{CPS}}/n$ depends on the reverse diagonal proportions, i.e., $0 \le G^2_{\mathrm{CPS}}/n \le n^*/n \; [= (2\log 2)(1 - \sum_{i=1}^{r} n_{ii^*}/n)]$, whereas $\hat{\Phi}_{\mathrm{CPS}}$ always ranges between 0 and 1 without depending on the reverse diagonal proportions. For a similar reason, $\hat{\Phi}_{\mathrm{CPS}}$ may also be preferable to $G^2_{\mathrm{CPS}}$ for such comparisons. The same can be said about $\hat{\Phi}_{\mathrm{APS}}$ and $\hat{\Phi}_{\mathrm{RGS}}$.
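To illustrate how the upper bound of $G^2_{\mathrm{CPS}}/n$ varies with the reverse diagonal, the sketch below evaluates $(2\log 2)(1 - \sum_{i} n_{ii^*}/n)$ for the two observed tables of Section 5. The function name `g2_over_n_upper_bound` is hypothetical.

```python
import math


def g2_over_n_upper_bound(n):
    """Upper bound n*/n of G^2/n, which depends on the reverse diagonal counts."""
    r = len(n)
    n_all = sum(sum(row) for row in n)
    # 0-based indices: the reverse diagonal cell in row i is column r - 1 - i.
    reverse_diag = sum(n[i][r - 1 - i] for i in range(r))
    return 2 * math.log(2) * (1 - reverse_diag / n_all)


table_2 = [
    [1429, 249, 25, 20],
    [185, 660, 124, 64],
    [23, 114, 221, 149],
    [22, 40, 130, 1291],
]
table_3 = [
    [92, 16, 7, 12],
    [15, 75, 42, 10],
    [5, 33, 138, 96],
    [10, 21, 126, 2470],
]
bound_2 = g2_over_n_upper_bound(table_2)
bound_3 = g2_over_n_upper_bound(table_3)
print(round(bound_2, 4), round(bound_3, 4))
```

The two bounds differ (roughly 1.30 for Table 2 and 1.34 for Table 3), so values of $G^2/n$ from the two tables are not on a common scale, whereas the proposed measures always lie between 0 and 1.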

Note that the three proposed measures cannot be used to test the goodness-of-fit of each model. Also note that the three measures have different purposes, so it is meaningless to compare their values with one another. Kurakami et al. [8] gave the orthogonality of the likelihood ratio chi-squared statistics for testing the goodness-of-fit, i.e., $G^2_{\mathrm{APS}} = G^2_{\mathrm{CPS}} + G^2_{\mathrm{RGS}}$. We note that Theorem 1 corresponds to the population version of this orthogonality.

We could extend the proposed measure ΦAPS to the power-divergence type measure, as

ΦAPS(λ)=λ(λ+1)2(2λ1)IAPS(λ)forλ>1,
where
IAPS(λ)=i+jr+1pijcpijcpijAPSλ1,
and the value at $\lambda = 0$ is taken to be the limit as $\lambda \to 0$. Thus, $\Phi^{(0)}_{\mathrm{APS}}$ is identical to the measure $\Phi_{\mathrm{APS}}$. Similarly, we could extend the proposed measures $\Phi_{\mathrm{RGS}}$ ($\Phi_{\mathrm{CPS}}$) to power-divergence type measures. For details of the power-divergence, see Read and Cressie ([17], p.15). However, for any $\lambda$ $(\lambda \ne 0)$, we note that the value of the power-divergence type measure $\Phi^{(\lambda)}_{\mathrm{APS}}$ is not equal to the sum of the value of the measure $\Phi^{(\lambda)}_{\mathrm{RGS}}$ and the value of the measure $\Phi^{(\lambda)}_{\mathrm{CPS}}$.

CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest regarding the publication of this paper.

AUTHORS' CONTRIBUTIONS

All authors contributed equally to the writing of this paper. All authors have read and agreed to the published version of the manuscript.

Funding Statement

This paper received no funding.

ACKNOWLEDGMENTS

The authors would like to thank the editor and the three referees for their helpful comments.

REFERENCES

2. Y.M.M. Bishop, S.E. Fienberg, and P.W. Holland, Discrete Multivariate Analysis: Theory and Practice, The MIT Press, Cambridge, 1975.
5. A. Agresti, Categorical Data Analysis, Wiley, New York, 1990.
7. K.D. Wall and G.A. Lienert, Biom. J., Vol. 18, 1976, pp. 259-264.
10. C.R. Rao, Bull. Int. Stat. Instit., Vol. 34, 1954, pp. 90-97.
12. Y.J. Wang, Commun. Stat. Simul. C., Vol. 41, 2012, pp. 32-43.
15. S. Tomizawa and K. Saitoh, Calcutta Stat. Assoc. Bull., Vol. 49, 1999, pp. 32-39.
17. T.R.C. Read and N.A.C. Cressie, Goodness-of-Fit Statistics for Discrete Multivariate Data, Springer-Verlag, New York, 1988.
Publication Date
2021/01/04
ISSN (Online)
2214-1766
ISSN (Print)
1538-7887
