Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)

Mining data quality rules based on T-dependence

Authors
Toon Boeckling, Antoon Bronselaer, Guy De Tré
Corresponding Author
Toon Boeckling
Available Online August 2019.
DOI
10.2991/eusflat-19.2019.28
Keywords
Data Quality, Pattern Mining, Consistency, Triangular Norms
Abstract

Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to find edit rules automatically using the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, in which stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if it shows a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently using frequent pattern trees. Experimental results show a weak to medium correlation between the rank orders of edit rules obtained under T_M and T_P, indicating that the semantics of these kinds of dependencies are different.
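
To make the idea of T-lift concrete, the following is a minimal sketch, not the authors' implementation. It assumes that T-lift is the joint relative frequency of two values divided by the t-norm of their marginal relative frequencies, so that the product t-norm T_P recovers the traditional lift and the minimum t-norm T_M gives the variant mentioned above. The abstract does not state the formula, so this definition, the function names (`t_lift`, `t_min`, `t_prod`) and the toy records are all assumptions made for illustration.

```python
# Hypothetical sketch: T-lift of a pair of attribute values under a t-norm.
# Assumed definition: T-lift = P(a and b) / T(P(a), P(b)).

def t_min(x, y):
    """Minimum t-norm T_M."""
    return min(x, y)

def t_prod(x, y):
    """Product t-norm T_P (yields the traditional lift)."""
    return x * y

def t_lift(rows, item_a, item_b, t_norm):
    """Return the assumed T-lift of two (attribute, value) items.

    rows   -- list of dicts, one per record
    item_a -- (attribute, value) pair
    item_b -- (attribute, value) pair
    Returns None when the t-norm of the marginals is zero.
    """
    n = len(rows)
    attr_a, val_a = item_a
    attr_b, val_b = item_b
    p_a = sum(1 for r in rows if r[attr_a] == val_a) / n
    p_b = sum(1 for r in rows if r[attr_b] == val_b) / n
    p_ab = sum(
        1 for r in rows if r[attr_a] == val_a and r[attr_b] == val_b
    ) / n
    denom = t_norm(p_a, p_b)
    if denom == 0:
        return None
    return p_ab / denom

# Toy data (invented for illustration): no child holds a licence.
rows = (
    [{"age": "child", "license": "no"}] * 4
    + [{"age": "adult", "license": "yes"}] * 4
    + [{"age": "adult", "license": "no"}] * 2
)

for name, t_norm in (("T_M", t_min), ("T_P", t_prod)):
    never = t_lift(rows, ("age", "child"), ("license", "yes"), t_norm)
    rare = t_lift(rows, ("age", "adult"), ("license", "no"), t_norm)
    print(name, never, round(rare, 3))
```

In this sketch a combination whose T-lift is close to zero, such as a child holding a licence, would be a candidate edit rule. The pair (adult, no licence) scores differently under T_M and T_P, which illustrates why the two t-norms can rank candidate rules in different orders. The threshold for a "strong" negative correlation is left open here because the abstract does not give one.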

Copyright
© 2019, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
Series
Atlantis Studies in Uncertainty Modelling
Publication Date
August 2019
ISSN
2589-6644

Cite this article

TY  - CONF
AU  - Toon Boeckling
AU  - Antoon Bronselaer
AU  - Guy De Tré
PY  - 2019/08
DA  - 2019/08
TI  - Mining data quality rules based on T-dependence
BT  - Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
PB  - Atlantis Press
SP  - 184
EP  - 191
SN  - 2589-6644
UR  - https://doi.org/10.2991/eusflat-19.2019.28
DO  - 10.2991/eusflat-19.2019.28
ID  - Boeckling2019/08
ER  -