Proceedings of the 2016 International Conference on Biological Engineering and Pharmacy (BEP 2016)

A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization

Authors
Mingyue SHENG, Wei DU, Yuan TIAN, Yanchun LIANG
Corresponding Author
Mingyue SHENG
Available Online December 2016.
DOI
10.2991/bep-16.2017.14
Keywords
feature selection; samples localization; cancer classification
Abstract

Developing an efficient and robust feature selection method for gene expression profile data, which typically comprise thousands of genes but only a small number of samples, is an important and active research topic. At present, most feature selection methods build their models on all samples of a gene expression dataset and consider neither the influence of outlier samples nor the distribution of the samples. Moreover, cancer is well known to be a heterogeneous disease: cancer tissue samples from the same organ can belong to many subtypes with different molecular characteristics. Models should therefore be constructed from samples that share the same genetic characteristics. In this article, we propose a novel and efficient feature selection approach based on localized samples to extract gene signatures more accurately. For each target sample, we pick out the nearest samples within a certain range and obtain the best localized samples by constructing a sample-sample similarity network. First, we calculate the Euclidean distance between the central sample and the other samples using gene expression values. Second, we establish co-expression networks by selecting the top nearest samples and form a steady-state probability network by applying the Random Walk with Restart (RWR) method. Finally, by dividing this network and comparing five selection strategies, we obtain the localized samples that give the best cancer classification. We applied our method to six datasets covering different cancer types. With SVM classifiers under leave-one-out cross-validation (LOOCV), the average accuracies of the top 100 genes are 95.46%, 94.01%, 96.20%, 99.79%, 99.08% and 99.37%, respectively. These results indicate that the proposed method performs well on these datasets and is effective and applicable.
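The sample localization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity weight 1 / (1 + distance), the restart probability of 0.3, the neighbour count k, and the rule of keeping the top-ranked samples are all assumptions made for illustration, and the five selection strategies compared in the paper are not reproduced here. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_similarity_network(samples, k):
    """Link each sample to its k nearest samples (symmetric adjacency matrix).

    The edge weight 1 / (1 + distance) is an assumed similarity measure.
    """
    n = len(samples)
    dist = [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = (j for j in range(n) if j != i)
        for j in sorted(others, key=lambda j: dist[i][j])[:k]:
            weight = 1.0 / (1.0 + dist[i][j])
            adj[i][j] = weight
            adj[j][i] = weight
    return adj


def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * e, with W the column-normalised adjacency."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
         for i in range(n)]
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        new_p = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * e[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def localized_samples(samples, target, k=2, top=3, restart=0.3):
    """Return indices of the `top` samples ranked by steady-state probability."""
    adj = knn_similarity_network(samples, k)
    p = random_walk_with_restart(adj, target, restart)
    return sorted(range(len(samples)), key=lambda i: p[i], reverse=True)[:top]


# Toy data: two well-separated groups of three samples, two "genes" each.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
local = localized_samples(toy, target=0)
```

On this toy data the walk started at sample 0 cannot leave its own group, so the localized set consists of the three samples in that group. In a full pipeline, gene ranking and SVM classification under LOOCV would then be run on the localized samples only.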

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2016 International Conference on Biological Engineering and Pharmacy (BEP 2016)
Series
Advances in Biological Sciences Research
Publication Date
December 2016
ISSN
2468-5747

Cite this article

TY  - CONF
AU  - Mingyue SHENG
AU  - Wei DU
AU  - Yuan TIAN
AU  - Yanchun LIANG
PY  - 2016/12
DA  - 2016/12
TI  - A Novel Feature Selection Method for Gene Expression Data Based on Samples Localization
BT  - Proceedings of the 2016 International Conference on Biological Engineering and Pharmacy (BEP 2016)
PB  - Atlantis Press
SP  - 63
EP  - 68
SN  - 2468-5747
UR  - https://doi.org/10.2991/bep-16.2017.14
DO  - 10.2991/bep-16.2017.14
ID  - SHENG2016/12
ER  -