Proceedings of the 2024 5th International Conference on Big Data and Informatization Education (ICBDIE 2024)

Research on the Construction Technology of University Data Resource Catalogs Based on Machine Learning

Authors
Ying Zhang (1, 2), Ying Guo (1, 2, *), Shangxu Liu (1, 2), Xiaohan Yang (1, 2), Bowen Sun (1, 2)
(1) Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
(2) Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
*Corresponding author. Email: guoying@sdas.org
Corresponding Author
Ying Guo
Available Online 7 May 2024.
DOI
10.2991/978-94-6463-417-4_3
Keywords
Data Catalog; Big Data; Data Governance; Natural Language Processing
Abstract

In university information system construction, departments build their business systems independently, which leads to heterogeneous data characteristics and information silos. This paper studies how to construct a unified data resource catalog for universities. Under the national strategy for the digitalization of education and the requirements for university informatization, a clear and orderly data resource catalog is crucial: it helps build a comprehensive digital architecture, improves the use of data value, and supports data sharing and decision-making. Traditional data integration faces several challenges: integration tools interfere with business systems, metadata cannot be synchronized in real time, data standards are inconsistent across business systems, and metadata lacks semantic information. To address these issues, this paper proposes a method for constructing a university data resource catalog based on a Hudi lakehouse and details the key work in four areas: data lake research, the design of university data mapping dictionaries, column semantic recognition methods, and data resource catalog construction technology. The method overcomes the problems of connection interference, metadata change perception, and column semantic recognition, and establishes a unified data resource catalog for universities. The result is expected to support university data management and governance, promote data sharing and utilization, and serve as a reference for the future operational models and informatization construction of universities.
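As a rough illustration of how a data mapping dictionary and column semantic recognition can feed a catalog, the sketch below maps raw column names onto standard data elements. It is not the authors' method: plain string similarity from the Python standard library stands in for the paper's recognition model, and the dictionary entries, the 0.8 threshold, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical mapping dictionary: standard data element -> known column aliases.
# The entries are illustrative assumptions, not taken from the paper.
MAPPING_DICTIONARY = {
    "student_id": ["xh", "stu_no", "student_number"],
    "student_name": ["xm", "stu_name", "name"],
    "department_code": ["yxdm", "dept_code", "college_id"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recognize_column(column_name: str, threshold: float = 0.8):
    """Return the best-matching standard element, or None if below threshold."""
    best_element, best_score = None, 0.0
    for element, aliases in MAPPING_DICTIONARY.items():
        for candidate in [element, *aliases]:
            score = similarity(column_name, candidate)
            if score > best_score:
                best_element, best_score = element, score
    return best_element if best_score >= threshold else None


def build_catalog(tables: dict) -> list:
    """Build one catalog entry per column; 'element' is None if unrecognized."""
    return [
        {"table": table, "column": column, "element": recognize_column(column)}
        for table, columns in tables.items()
        for column in columns
    ]
```

Columns that fall below the threshold are kept in the catalog with no element assigned, so they can be reviewed by hand or passed to a stronger recognizer instead of being silently dropped.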

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2024 5th International Conference on Big Data and Informatization Education (ICBDIE 2024)
Series
Advances in Intelligent Systems Research
Publication Date
7 May 2024
ISBN
978-94-6463-417-4
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Ying Zhang
AU  - Ying Guo
AU  - Shangxu Liu
AU  - Xiaohan Yang
AU  - Bowen Sun
PY  - 2024
DA  - 2024/05/07
TI  - Research on the Construction Technology of University Data Resource Catalogs Based on Machine Learning
BT  - Proceedings of the 2024 5th International Conference on Big Data and Informatization Education (ICBDIE 2024)
PB  - Atlantis Press
SP  - 14
EP  - 29
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-417-4_3
DO  - 10.2991/978-94-6463-417-4_3
ID  - Zhang2024
ER  -