THE EFFECT OF COMBINING PREPROCESSING METHODS, VECTORIZERS, AND ALGORITHMS ON THE PERFORMANCE OF CYBERBULLYING TEXT CLASSIFICATION

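Salim Athari, Student ID (NIM): 22206051023 (2025). PENGARUH PENERAPAN KOMBINASI METODE PREPROCESSING, VECTORIZER, DAN ALGORITMA TERHADAP KINERJA KLASIFIKASI TEKS PERUNDUNGAN SIBER [The Effect of Combining Preprocessing Methods, Vectorizers, and Algorithms on the Performance of Cyberbullying Text Classification]. Master's thesis, UIN SUNAN KALIJAGA YOGYAKARTA.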

Files:
22206051023_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf (Chapters I and IV or V, and bibliography; Published Version, 2MB)
22206051023_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf (Chapter II up to the chapter before the last; Published Version, 1MB; restricted to registered users)

Abstract

Cyberbullying is a serious digital threat that calls for automated detection. Machine Learning (ML) and Natural Language Processing (NLP) offer solutions, but their performance depends heavily on the combination of preprocessing method, vectorizer, and classification algorithm. Because existing work lacks a holistic approach, this study systematically compares these combinations to identify the optimal configuration for cyberbullying text classification. The preprocessing methods examined are stopword removal, stemming, and lemmatization; the vectorizers are one-hot encoding, Bag-of-Words (BoW), and TF-IDF; and the classification algorithms are Logistic Regression, SVM, KNN, Decision Tree, and Naive Bayes. The goal is to find the most effective combination according to standard performance metrics. The study takes a comparative, quantitative, experimental approach using a labeled English-language cyberbullying dataset: the data are preprocessed, then vectorized, and then used to train each classification algorithm. Performance is evaluated with accuracy, precision, recall, and F1-score. This holistic comparison identified the best-performing configuration among all the combinations tested.

The results show that preprocessing significantly improves performance, with lemmatization generally outperforming the other preprocessing methods. One-hot encoding yielded the lowest results, while TF-IDF was the most effective vectorizer. Among the algorithms, SVM and Logistic Regression performed best, with SVM performing especially well in combination with TF-IDF. The optimal combination, stopword removal with TF-IDF and SVM, achieved an F1-score of 0.863. The lowest result among the combinations, 0.155, was obtained with no preprocessing, one-hot encoding, and KNN. These findings emphasize the importance of appropriate preprocessing, an informative vectorizer, and a robust algorithm for effective cyberbullying detection.
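The experimental design described in the abstract can be illustrated with a minimal, standard-library Python sketch. It rests on several assumptions that the abstract does not state: a toy stopword list, whitespace tokenization, scikit-learn's smoothed IDF formula, and four preprocessing options (none, or one of the three methods applied singly). The option labels such as `stopword_removal` and all function names are hypothetical. The sketch shows how the three vectorizers encode the same stopword-filtered tokens, how the F1-score is computed for the positive (cyberbullying) class, and how the combination grid is enumerated. Stemming, lemmatization, and the five classifiers are only named in the grid, not implemented; in practice they would typically come from libraries such as NLTK and scikit-learn.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy stopword list; the thesis's actual list is not stated.
STOPWORDS = {"you", "are", "a", "the", "is", "so"}

# Assumed option sets: each preprocessing step is applied singly (or not at all).
PREPROCESSING = ["none", "stopword_removal", "stemming", "lemmatization"]
VECTORIZERS = ["one_hot", "bow", "tfidf"]
ALGORITHMS = ["logistic_regression", "svm", "knn", "decision_tree", "naive_bayes"]

# Every (preprocessing, vectorizer, algorithm) triple to be trained and scored.
GRID = list(product(PREPROCESSING, VECTORIZERS, ALGORITHMS))


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simple)."""
    return text.lower().split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the toy STOPWORDS set."""
    return [t for t in tokens if t not in STOPWORDS]


def build_vocab(docs):
    """Sorted vocabulary so that vector positions are deterministic."""
    return sorted({t for doc in docs for t in doc})


def one_hot(doc, vocab):
    """1 if the term occurs in the document at all, otherwise 0."""
    present = set(doc)
    return [1 if t in present else 0 for t in vocab]


def bag_of_words(doc, vocab):
    """Raw term counts."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


def tf_idf(doc, vocab, docs):
    """Raw count times smoothed IDF: ln((1 + N) / (1 + df)) + 1.

    This is scikit-learn's default IDF formula, shown here without its final
    L2 row normalisation; the thesis's exact weighting is not stated.
    """
    n = len(docs)
    counts = Counter(doc)
    weights = []
    for t in vocab:
        df = sum(1 for d in docs if t in d)
        weights.append(counts[t] * (math.log((1 + n) / (1 + df)) + 1))
    return weights


def f1_score(y_true, y_pred, positive=1):
    """F1-score for the positive (cyberbullying) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    raw = ["you are so stupid", "the game is so fun", "stupid stupid loser"]
    docs = [remove_stopwords(tokenize(d)) for d in raw]
    vocab = build_vocab(docs)  # ['fun', 'game', 'loser', 'stupid']
    print(len(GRID))  # 60
    print(one_hot(docs[2], vocab))  # [0, 0, 1, 1]
    print(bag_of_words(docs[2], vocab))  # [0, 0, 1, 2]
    print(tf_idf(docs[2], vocab, docs))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Comparing the outputs for the third document shows what each vectorizer keeps: one-hot encoding records only whether a term is present, BoW adds the counts (`stupid` appears twice), and TF-IDF additionally down-weights terms that occur in many documents (here `stupid` appears in two of the three documents, `loser` in only one).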

Item Type: Thesis (Master's)
Additional Information / Supervisor: Dr. Agung Fatwanto, S.Si., M.Kom.
Uncontrolled Keywords: Cyberbullying, Text Classification, Preprocessing, Vectorizer, Algorithm
Subjects: 000 Computer Science, Information Science, and General Works > 000 General Works > 004 Data Processing, Computer Science, Informatics Engineering
Divisions: Faculty of Science and Technology > Informatics (Master's Program)
Depositing User: Muh Khabib, SIP.
Date Deposited: 16 Sep 2025 14:34
Last Modified: 16 Sep 2025 14:34
URI: http://digilib.uin-suka.ac.id/id/eprint/72940
