Hybrid Efficient Genetic Algorithm for Big Data Feature Selection Problems

Mohammed, Tareq Abed; Bayat, Oguz; Uçan, Osman N.; Alhayali, Shaymaa

doi:10.1007/s10699-019-09588-6

Published: 01 March 2019

Foundations of Science, Volume 25, pages 1009–1025 (2020)


Abstract

Because of the huge amount of data being generated from different sources, analyzing these data and extracting useful information from them has become a very complex task. The difficulty of big data optimization problems comes from many factors, such as the high number of features and the presence of missing data. Feature selection has become an important step in many data mining and machine learning algorithms: it reduces the dimensionality of the optimization problem and improves the performance of classification or clustering algorithms. In this paper, a set of hybrid and efficient genetic algorithms is proposed to solve the feature selection problem when the data have a large number of features. The proposed algorithms use a new gene-weighted mechanism that adaptively classifies the features into strongly relevant features, weak or redundant features, and unstable features during the evolution of the algorithm. Based on this classification, the proposed algorithms give strong features high priority and weak features low priority when generating new candidate solutions. At the same time, the proposed algorithms concentrate more on unstable features, which sometimes appear in and sometimes disappear from the best solutions of the population. The performance of the proposed algorithms is investigated using different datasets and feature selection algorithms. The results show that the proposed algorithms can outperform the other feature selection algorithms and effectively enhance classification performance on the tested datasets.
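The abstract does not give the details of the gene-weighted mechanism, so the following Python sketch is only one possible reading of it, not the authors' implementation. It assumes that a solution is a 0/1 mask over features, that a feature's weight is the fraction of the best ("elite") solutions that select it, and that fixed thresholds separate strong, weak, and unstable features. The function names, the thresholds (0.8 and 0.2), and the sampling probabilities are all hypothetical.

```python
import random


def classify_features(elite, strong_thr=0.8, weak_thr=0.2):
    """Label each feature by how often the elite solutions select it.

    elite is a list of equal-length 0/1 masks. The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    n_features = len(elite[0])
    # Gene weight (assumed): fraction of elite solutions that select feature j.
    freq = [sum(sol[j] for sol in elite) / len(elite) for j in range(n_features)]
    labels = []
    for f in freq:
        if f >= strong_thr:
            labels.append("strong")    # selected by almost every elite solution
        elif f <= weak_thr:
            labels.append("weak")      # selected by almost no elite solution
        else:
            labels.append("unstable")  # appears in some elite solutions only
    return freq, labels


def sample_candidate(labels, rng, p_strong=0.9, p_weak=0.1, p_unstable=0.5):
    """Draw a new 0/1 mask, biased by the feature labels.

    Strong features are switched on with high probability and weak ones
    with low probability. Unstable features are sampled near 0.5 so that
    the search keeps testing them (an assumed way to "concentrate" on them).
    """
    probs = {"strong": p_strong, "weak": p_weak, "unstable": p_unstable}
    return [1 if rng.random() < probs[label] else 0 for label in labels]


if __name__ == "__main__":
    # Four hypothetical elite solutions over four features.
    elite = [
        [1, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 0, 1, 0],
        [1, 0, 0, 1],
    ]
    freq, labels = classify_features(elite)
    print(freq)    # selection frequency per feature
    print(labels)  # strong / weak / unstable per feature
    print(sample_candidate(labels, random.Random(0)))
```

In a full genetic algorithm, a sampler like this would sit alongside crossover and mutation, and the labels would be recomputed as the elite set changes from generation to generation. How the paper actually combines these steps is not stated in the abstract.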




Author information

Authors and Affiliations

Altinbas University College of Engineering, Istanbul, Turkey
Tareq Abed Mohammed, Oguz Bayat, Osman N. Uçan & Shaymaa Alhayali
Kirkuk University College of Science, Kirkuk, Iraq
Tareq Abed Mohammed


Corresponding author

Correspondence to Oguz Bayat.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Mohammed, T.A., Bayat, O., Uçan, O.N. et al. Hybrid Efficient Genetic Algorithm for Big Data Feature Selection Problems. Found Sci 25, 1009–1025 (2020). https://doi.org/10.1007/s10699-019-09588-6


Published: 01 March 2019
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10699-019-09588-6
