A Data Science Approach to Cancer Patient Classification Using Support Vector Machine and Random Forest

Authors

DOI:

https://doi.org/10.22441/collabits.v3i1.37642

Keywords:

Cancer Patient Data, Support Vector Machine, Random Forest, Data Science

Abstract

The increasing availability of healthcare data has encouraged the application of data science and machine learning techniques in medical research. Cancer patient datasets contain numerical demographic and clinical attributes that can be utilized for classification tasks; however, complex feature relationships and limited feature relevance remain key challenges. This study aims to analyze cancer patient data and compare the performance of Support Vector Machine and Random Forest algorithms for gender classification. The dataset used in this study consists of numerical features, including patient age, tumor size, number of examined lymph nodes, number of positive lymph nodes, body mass index, and survival duration measured in months. The research methodology includes data preprocessing, exploratory data analysis, model development, and performance evaluation. Feature normalization and data splitting are applied to ensure a fair comparison between models, while exploratory analysis is conducted to examine data distribution and relationships among variables. Both classification models are trained under identical experimental settings and evaluated using accuracy as the primary performance metric. The results indicate that both algorithms can classify cancer patients with satisfactory accuracy. Support Vector Machine demonstrates slightly better performance compared to Random Forest, suggesting its effectiveness in handling numerical data with complex decision boundaries. The findings highlight the importance of appropriate algorithm selection and feature utilization in healthcare data analysis.
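The workflow described in the abstract (normalization, a shared data split, two classifiers trained under identical settings, accuracy as the metric) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, and the feature names, the synthetic placeholder data, the 80/20 stratified split, the RBF kernel, and the forest size are all assumptions, since the abstract does not specify them. The labels are random, so the printed accuracies say nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical column names mirroring the six numerical attributes in the abstract.
FEATURES = [
    "age", "tumor_size", "nodes_examined",
    "nodes_positive", "bmi", "survival_months",
]


def make_synthetic_patients(n_samples=400, seed=0):
    """Generate placeholder data; this is NOT the study's dataset."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.normal(60, 12, n_samples),     # age (years)
        rng.gamma(2.0, 15.0, n_samples),   # tumor size (assumed mm)
        rng.integers(1, 40, n_samples),    # examined lymph nodes
        rng.integers(0, 10, n_samples),    # positive lymph nodes
        rng.normal(26, 4, n_samples),      # body mass index
        rng.integers(1, 120, n_samples),   # survival duration (months)
    ])
    y = rng.integers(0, 2, n_samples)      # random binary label standing in for gender
    return X, y


def compare_models(X, y, seed=42):
    """Train both classifiers on one shared split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # The scaler sits inside each pipeline so it is fitted on training data only.
    models = {
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": make_pipeline(
            StandardScaler(),
            RandomForestClassifier(n_estimators=100, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_patients()
    for name, acc in compare_models(X, y).items():
        print(f"{name}: accuracy = {acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the normalization step, and reusing the same split and seed for both models is one way to keep the comparison fair. Random Forest does not need scaled inputs; the scaler is kept there only so that both models see identical preprocessing.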

Author Biographies

Devi Dwi Anggraini, Universitas Mercu Buana

Student in the Information Systems Study Program, Faculty of Computer Science, Universitas Mercu Buana, Indonesia.

Mutiara Rizky Salsabila, Universitas Mercu Buana

Student in the Information Systems Study Program, Faculty of Computer Science, Universitas Mercu Buana, Indonesia.

Keisya Rizkia Kamila, Universitas Mercu Buana

Student in the Information Systems Study Program, Faculty of Computer Science, Universitas Mercu Buana, Indonesia.

Published

2026-02-23

How to Cite

[1] D. D. Anggraini, M. R. Salsabila, K. R. Kamila, and Y. S. Sari, “A Data Science Approach to Cancer Patient Classification Using Support Vector Machine and Random Forest,” Collabits, vol. 3, no. 1, pp. 72–79, Feb. 2026.

Section

Articles
