Hoax Detection in Indonesian-Language News About COVID-19

Amanda Tabitha Bulan Panjaitan; Ibnu Santoso

doi:10.22441/format.2021.v10.i1.007


Keywords:

web scraping, data mining, classification, feature engineering, hoax

Abstract

Technological advances bring users many conveniences, but they also accelerate the spread of false news on the internet. False news, commonly called hoaxes, is misleading and dangerous because it distorts people's perception by presenting false information as truth. A hoax may aim to influence readers with false information so that they act on its content. An intelligent system is therefore needed that can quickly classify news circulating on the internet so that readers are not misled. This study begins by scraping news articles that have already been labeled as either hoax or valid. The dataset is split into training data and test data. Pre-processing consists of case folding, tokenizing, filtering, and stemming. The study then compares classification performance with and without feature engineering. The accuracy results show that applying feature engineering improves the accuracy of all five classification methods. Random forest with feature engineering achieves the reported accuracy of 96.05%.
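The four pre-processing steps named in the abstract (case folding, tokenizing, filtering, stemming) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stopword list, the suffix list, and the minimum stem length are hypothetical placeholders, and `naive_stem` is a toy suffix stripper, not a published Indonesian stemming algorithm. A real pipeline would typically use a dedicated Indonesian stemmer and a complete stopword list.

```python
import re

# HYPOTHETICAL stopword list: a few common Indonesian function words,
# for illustration only. A real system would use a complete stopword list.
STOPWORDS = {
    "yang", "dan", "di", "ke", "dari", "ini", "itu",
    "untuk", "dengan", "pada", "adalah",
}

# HYPOTHETICAL suffix list for a toy stemmer. This is NOT a published
# Indonesian stemming algorithm: it strips at most one suffix and never prefixes.
SUFFIXES = ("kan", "an", "i", "nya", "lah", "kah")
MIN_STEM_LEN = 4  # assumed minimum stem length so short words are left alone


def case_fold(text: str) -> str:
    """Case folding: convert all characters to lowercase."""
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Tokenizing: keep runs of lowercase ASCII letters/digits, dropping punctuation.

    Assumes the input has already been case-folded.
    """
    return re.findall(r"[a-z0-9]+", text)


def filter_tokens(tokens: list[str]) -> list[str]:
    """Filtering: remove stopwords."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token: str) -> str:
    """Stemming (toy): strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Apply case folding, tokenizing, filtering, and stemming, in that order."""
    tokens = tokenize(case_fold(text))
    return [naive_stem(t) for t in filter_tokens(tokens)]
```

For example, `preprocess("Vaksin COVID-19 itu berbahaya dan menyebarkan virus!")` returns `['vaksin', 'covid', '19', 'berbahaya', 'menyebar', 'virus']`: the stopwords *itu* and *dan* are filtered out and the suffix *-kan* is stripped, while the prefix *meny-* is kept because the toy stemmer only handles suffixes.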




Published

2021-02-08

How to Cite

A. T. B. Panjaitan and I. Santoso, “Deteksi Hoaks Pada Berita Berbahasa Indonesia Seputar COVID-19”, FORMAT, vol. 10, no. 1, pp. 76–85, Feb. 2021.


Issue

Vol 10 No 1 (2021)

Section

Articles

License

The copyright to this article is transferred to Universitas Mercu Buana (UMB) if and when the article is accepted for publication. The undersigned hereby transfers any and all rights in and to the paper including without limitation all copyrights to UMB. The undersigned hereby represents and warrants that the paper is original and that he/she is the author of the paper, except for material that is clearly identified as to its original source, with permission notices from the copyright owners where required. The undersigned represents that he/she has the power and authority to make and execute this assignment.

We declare that this paper has not been published in the same form elsewhere.

Furthermore, I/We hereby transfer the unlimited rights of publication of the above-mentioned paper as a whole to UMB. The copyright transfer covers the right to reproduce and distribute the article, including reprints, translations, photographic reproductions, microform, electronic form (offline, online) or any other reproductions of similar nature.

The corresponding author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors. This agreement is to be signed by at least one of the authors who have obtained the assent of the co-author(s) where applicable. After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the authors listed will not be accepted.

Retained Rights/Terms and Conditions

Although authors are permitted to re-use all or portions of the Work in other works, this does not include granting third-party requests for reprinting, republishing, or other types of re-use.

Our articles are licensed under CC BY-NC.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.