Highly Accurate Spam Detection with the Help of Feature Selection and Data Transformation

  • Written by Ghadeer
  • Update: 02/01/2023

Hidayet Takci

Computer Engineering Department, Sivas Cumhuriyet University, Turkey

Fatema Nusrat

Computer Science Department, Asian University for Women, Bangladesh

Abstract: The amount of spam is increasing rapidly as email grows in popularity, which has created a need to filter spam emails. To date, many knowledge-based, learning-based, and clustering-based methods have been developed for this purpose. This study targets machine-learning-based spam detection using the C4.5, ID3, RndTree, C-Support Vector Classification (C-SVC), and Naïve Bayes algorithms. In addition, feature selection and data transformation methods were applied to increase spam detection success. Experiments were performed on the UC Irvine Machine Learning Repository (UCI) spambase dataset, and the results were compared in terms of accuracy, Receiver Operating Characteristic (ROC) analysis, and classification speed. In the accuracy comparison, the C-SVC algorithm gave the highest accuracy at 93.13%, followed by the RndTree algorithm. In the ROC analysis, the RndTree algorithm gave the best Area Under Curve (AUC) value of 0.999, while the C4.5 algorithm gave the second-best result. The most successful methods in terms of classification speed were the Naïve Bayes and RndTree algorithms. The experiments showed that feature selection and data transformation methods increased spam detection success. The data transformation that increased classification success the most was the binary transformation, and the most effective feature selection method was forward selection.

Keywords: Internet security, prediction methods, feature selection, data conversion, spam detection.

Received July 2, 2021; accepted September 28, 2022

https://doi.org/10.34028/iajit/20/1/4
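
The two preprocessing steps named in the abstract, binary transformation and forward selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold of 0, the Bernoulli Naïve Bayes classifier with Laplace smoothing, the use of a held-out validation set as the selection criterion, and the tiny toy dataset are all assumptions chosen to keep the example self-contained. The column meanings (frequency of "free", "meeting", and "the") are hypothetical stand-ins for spambase-style word-frequency features.

```python
import math

def binarize(rows, threshold=0.0):
    """Binary transformation: 1 if a feature value exceeds the threshold, else 0.
    The threshold of 0 is an assumption, not a value taken from the paper."""
    return [[1 if v > threshold else 0 for v in row] for row in rows]

def train_bernoulli_nb(X, y, features):
    """Fit a Bernoulli Naive Bayes model on the given feature columns,
    using Laplace (add-one) smoothing for the per-class probabilities."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(y)
        probs = {f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2) for f in features}
        model[c] = (prior, probs)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior; ties go to the smaller label."""
    best, best_score = None, -math.inf
    for c, (prior, probs) in sorted(model.items()):
        score = math.log(prior)
        for f, p in probs.items():
            score += math.log(p if x[f] else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def forward_selection(X_train, y_train, X_val, y_val):
    """Greedy forward selection: repeatedly add the single feature that most
    improves validation accuracy, and stop when no candidate improves it."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X_train[0])))
    while remaining:
        scored = []
        for f in remaining:
            model = train_bernoulli_nb(X_train, y_train, selected + [f])
            scored.append((accuracy(model, X_val, y_val), f))
        # Prefer higher accuracy; break ties by the lower column index.
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

# Toy data (hypothetical): columns are frequencies of "free", "meeting", "the".
# Labels: 1 = spam, 0 = not spam.
train_raw = [[0.5, 0.0, 1.2], [1.1, 0.0, 0.0], [0.3, 0.0, 2.0],
             [0.0, 0.7, 1.5], [0.0, 0.2, 0.0], [0.0, 1.0, 0.9]]
train_y = [1, 1, 1, 0, 0, 0]
val_raw = [[0.9, 0.0, 0.4], [0.0, 0.3, 0.8], [0.2, 0.0, 0.0], [0.0, 0.5, 0.0]]
val_y = [1, 0, 1, 0]

X_train, X_val = binarize(train_raw), binarize(val_raw)
selected, best_acc = forward_selection(X_train, train_y, X_val, val_y)
print(selected, best_acc)  # prints: [0] 1.0
```

On this toy data the search keeps only column 0, because adding further columns does not improve validation accuracy. On the real spambase dataset, the selected subset and the resulting accuracy would depend on the classifier and on the evaluation protocol, neither of which this sketch reproduces.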
