Proceedings of the Fourth International Workshop on Knowledge Discovery, Knowledge Management and Decision Support

A Complex Mining Process about Air Quality

Authors
Mihaela Juganaru-Mathieu, Silvia González Brambila
Corresponding Author
Mihaela Juganaru-Mathieu
Available Online October 2013.
DOI
10.2991/.2013.35
Keywords
Knowledge Discovery, Text Mining, Data Mining, XML format
Abstract

In this paper we present a mining project that extracts knowledge from public documents concerning air pollution. Our collection contains annual reports about air quality, acid rain, and climatological conditions in the greater Mexico City area. These reports contain reliable data and are produced by the Department of Environment. They are in a printable format (.pdf files) with page numbers, a table of contents, text, numerical information in tables, and images. A human reader cannot read the whole collection and understand its content within a relatively short period (a few days or weeks). An automated toolbox able to extract knowledge, quickly retrieve important terms, and answer exact questions about precise climate parameters would be an important help for readers. We describe our project, which is based on a text and data mining process. The aims of this complex process are to extract frequent temporal patterns, to extract association rules, and to integrate some simple information retrieval tools. In parallel, data mining techniques will be used to detect the types of data that recur in every report and then to extract a numerical datamart containing climatological data structured by month, year, and geographical area. The datamart will also be analyzed. The main steps of our mining process are: preparing the documents (cleaning; removing images, tables of contents, and footnotes), transforming them into structured documents (in an XML format with a precise DTD), indexing, applying various mining algorithms and methods, visualising the results, and validating the knowledge. We also think that our methodology will apply to other collections of the same category: reliable data and information presented in large periodical reports.
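
The structuring and indexing steps named in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the element names (report, section, title, p), the helper functions report_to_xml and build_index, and the sample sentences are all hypothetical, and the paper's actual DTD is not reproduced here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def report_to_xml(year, sections):
    """Wrap cleaned report text in a simple XML tree.

    `sections` is a list of (title, paragraphs) pairs. The element names
    are hypothetical and do not follow the DTD used in the paper.
    """
    root = ET.Element("report", attrib={"year": str(year)})
    for title, paragraphs in sections:
        sec = ET.SubElement(root, "section")
        ET.SubElement(sec, "title").text = title
        for text in paragraphs:
            ET.SubElement(sec, "p").text = text
    return root


def build_index(reports):
    """Map each lowercased term to the (year, section title) pairs where it occurs."""
    index = defaultdict(set)
    for root in reports:
        year = int(root.get("year"))
        for sec in root.iter("section"):
            title = sec.findtext("title")
            for p in sec.iter("p"):
                for token in (p.text or "").lower().split():
                    term = token.strip(".,;:()")  # drop surrounding punctuation
                    if term:
                        index[term].add((year, title))
    return index


# Made-up sample content, for illustration only.
reports = [
    report_to_xml(2010, [("Ozone", ["Ozone levels were measured monthly."])]),
    report_to_xml(2011, [("Acid rain", ["Acid rain and ozone were monitored."])]),
]
index = build_index(reports)
print(sorted(index["ozone"]))
```

Looking up a term returns the years and sections that mention it, which is the kind of simple retrieval tool the abstract mentions. Pattern and association-rule mining would operate on the same structured documents but are outside this sketch.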

Copyright
© 2013, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the Fourth International Workshop on Knowledge Discovery, Knowledge Management and Decision Support
Series
Advances in Intelligent Systems Research
Publication Date
October 2013
ISBN
10.2991/.2013.35
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Mihaela Juganaru-Mathieu
AU  - Silvia González Brambila
PY  - 2013/10
DA  - 2013/10
TI  - A Complex Mining Process about Air Quality
BT  - Proceedings of the Fourth International Workshop on Knowledge Discovery, Knowledge Management and Decision Support
PB  - Atlantis Press
SP  - 290
EP  - 294
SN  - 1951-6851
UR  - https://doi.org/10.2991/.2013.35
DO  - 10.2991/.2013.35
ID  - Juganaru-Mathieu2013/10
ER  -