Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)

Empowering Global Health with AI: Using NLP to Extract Medicinal Plants and Disease-fighting Compounds from PubMed

Authors
Rehan Khan1, *, Preenon Bagchi1, Krutanjali Patil1
1Institute of Biosciences and Technology, MGM University, Chht. Sambhajinagar, India
*Corresponding author. Email: khanrehan9395@gmail.com
Corresponding Author
Rehan Khan
Available Online 17 November 2023.
DOI
10.2991/978-94-6463-294-1_7
Keywords
Natural Language Processing; PubMed; Python; Disease; Artificial Intelligence
Abstract

PubMed is a free database maintained by the National Library of Medicine (NLM) at the National Institutes of Health (NIH) in the United States. It contains more than 30 million citations and abstracts of biomedical literature and other scientific publications from around the world, including work on medicinal plants and phytocompounds. In this study, Natural Language Processing (NLP) and the Natural Language Toolkit (NLTK) are used to extract information on medicinal plants and disease-fighting compounds from PubMed, with the aim of empowering global health research. The methodology was a Python-based NLP pipeline with several stages: text pre-processing, named entity recognition (NER), and relationship extraction. Text pre-processing involved cleaning and formatting the abstracts to remove irrelevant information and standardize the text. NER was performed using NLP libraries to identify chemical compounds and disease targets. Relationship extraction used the NLTK to identify co-occurring terms and analyze their relationships based on their context and proximity. The results showed that NLP and the NLTK can be powerful tools for extracting and analyzing information on medicinal plants and disease-fighting compounds from PubMed. The code developed in this study can be used to automate the extraction of key information from a large number of scientific articles, saving researchers time and effort. The results also showed that this approach can be used to identify relationships between different plants, compounds, and diseases, providing insights that may not be apparent through manual analysis.
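
The three stages named in the abstract (pre-processing, entity recognition, and co-occurrence-based relationship extraction) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a dictionary lookup in place of a trained NER model and a regular-expression sentence splitter in place of the NLTK tokenizer, so that it runs with the Python standard library alone. The lexicons, function names, and sample sentences are hypothetical and were written for this illustration; they are not PubMed records.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicons for illustration; the paper does not list its vocabularies.
PLANTS = {"curcuma longa", "azadirachta indica"}
COMPOUNDS = {"curcumin", "azadirachtin"}
DISEASES = {"diabetes", "malaria", "inflammation"}


def preprocess(abstract):
    """Collapse whitespace, lowercase, and split the text into sentences."""
    text = re.sub(r"\s+", " ", abstract).strip().lower()
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def find_terms(sentence, lexicon):
    """Dictionary-based stand-in for NER: return lexicon terms found as whole words."""
    return {t for t in lexicon if re.search(rf"\b{re.escape(t)}\b", sentence)}


def cooccurrences(abstracts):
    """Count (plant or compound, disease) pairs that share a sentence."""
    counts = Counter()
    for abstract in abstracts:
        for sentence in preprocess(abstract):
            agents = find_terms(sentence, PLANTS | COMPOUNDS)
            diseases = find_terms(sentence, DISEASES)
            for pair in product(sorted(agents), sorted(diseases)):
                counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy sentences written for this sketch, not real abstracts.
    sample = [
        "Curcumin from Curcuma longa   reduced inflammation. "
        "Azadirachtin was tested against malaria.",
        "Curcumin was studied in diabetes and inflammation.",
    ]
    for pair, n in cooccurrences(sample).most_common():
        print(pair, n)
```

Sentence-level co-occurrence is the simplest proxy for the "context and proximity" analysis the abstract mentions. A window measured in tokens, or a trained biomedical NER model, could replace the corresponding function without changing the overall structure.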

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)
Series
Advances in Health Sciences Research
Publication Date
17 November 2023
ISBN
978-94-6463-294-1
ISSN
2468-5739

Cite this article

TY  - CONF
AU  - Rehan Khan
AU  - Preenon Bagchi
AU  - Krutanjali Patil
PY  - 2023
DA  - 2023/11/17
TI  - Empowering Global Health with AI: Using NLP to Extract Medicinal Plants and Disease-fighting Compounds from PubMed
BT  - Proceedings of the International Conference on Advances in Nano-Neuro-Bio-Quantum (ICAN 2023)
PB  - Atlantis Press
SP  - 72
EP  - 86
SN  - 2468-5739
UR  - https://doi.org/10.2991/978-94-6463-294-1_7
DO  - 10.2991/978-94-6463-294-1_7
ID  - Khan2023
ER  -