eprintid: 72940
rev_number: 10
eprint_status: archive
userid: 12460
dir: disk0/00/07/29/40
datestamp: 2025-09-16 07:34:18
lastmod: 2025-09-16 07:34:18
status_changed: 2025-09-16 07:34:18
type: thesis
metadata_visibility: show
contact_email: muh.khabib@uin-suka.ac.id
creators_name: Salim Athari, NIM.: 22206051023
title: PENGARUH PENERAPAN KOMBINASI METODE PREPROCESSING, VECTORIZER, DAN ALGORITMA TERHADAP KINERJA KLASIFIKASI TEKS PERUNDUNGAN SIBER (The Effect of Applying Combinations of Preprocessing Methods, Vectorizers, and Algorithms on Cyberbullying Text Classification Performance)
ispublished: pub
subjects: 004.
divisions: S2_inf
full_text_status: restricted
keywords: Cyberbullying, Text Classification, Preprocessing, Vectorizer, Algorithm
note: Dr. Agung Fatwanto, S.Si., M.Kom.
abstract: Cyberbullying is a serious digital threat that calls for automated detection. Machine Learning (ML) and Natural Language Processing (NLP) offer solutions, but their performance depends heavily on the combination of preprocessing method, vectorizer, and classification algorithm. Because existing work lacks a holistic comparison of these components, this study systematically compares their combinations to identify the optimal configuration for cyberbullying text classification. The preprocessing methods are stopword removal, stemming, and lemmatization; the vectorizers are one-hot encoding, Bag-of-Words (BoW), and TF-IDF; and the classification algorithms are Logistic Regression, SVM, KNN, Decision Tree, and Naive Bayes. The goal is to find the most effective combination according to standard performance metrics. The study takes a comparative, quantitative, experimental approach on a labeled English-language cyberbullying dataset: the text is preprocessed, vectorized, and then used to train each classification algorithm, and performance is evaluated with accuracy, precision, recall, and F1-score. This holistic comparison identifies which combination yields the highest results. The results show that preprocessing significantly improves performance, with lemmatization generally outperforming the other preprocessing methods. One-hot encoding yielded the lowest results, while TF-IDF was the most effective vectorizer. Among the algorithms, SVM and Logistic Regression performed best, with SVM performing especially well in combination with TF-IDF. The optimal combination, stopword removal with TF-IDF and SVM, achieved an F1-score of 0.863. The lowest result, 0.155, came from using no preprocessing, one-hot encoding, and KNN. These findings underline the importance of appropriate preprocessing, an informative vectorizer, and a robust algorithm for effective cyberbullying detection.
date: 2025-08-16
date_type: published
pages: 80
institution: UIN SUNAN KALIJAGA YOGYAKARTA
department: Faculty of Science and Technology (FAKULTAS SAINS DAN TEKNOLOGI)
thesis_type: masters
thesis_name: other
citation: Salim Athari, NIM.: 22206051023 (2025) PENGARUH PENERAPAN KOMBINASI METODE PREPROCESSING, VECTORIZER, DAN ALGORITMA TERHADAP KINERJA KLASIFIKASI TEKS PERUNDUNGAN SIBER. Masters thesis, UIN SUNAN KALIJAGA YOGYAKARTA.
document_url: https://digilib.uin-suka.ac.id/id/eprint/72940/1/22206051023_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf
document_url: https://digilib.uin-suka.ac.id/id/eprint/72940/2/22206051023_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf
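The abstract compares three vectorizers: one-hot encoding, Bag-of-Words, and TF-IDF. The minimal pure-Python sketch below is not the thesis's code. It shows how each vectorizer turns the same tokenized text into numbers. The toy documents and function names are hypothetical. The TF-IDF variant here, raw term count times the unsmoothed idf ln(N/df), is also an assumption, because the abstract does not state which weighting was used. Libraries such as scikit-learn's `TfidfVectorizer` default to a smoothed idf with L2 row normalization.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Sorted list of every distinct token across the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def one_hot(docs, vocab):
    """Binary presence: 1 if the term appears in the document at all, else 0."""
    rows = []
    for doc in docs:
        present = set(doc)
        rows.append([1 if term in present else 0 for term in vocab])
    return rows


def bag_of_words(docs, vocab):
    """Raw term counts per document (Bag-of-Words)."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tf_idf(docs, vocab):
    """Term count times unsmoothed idf = ln(N / df).

    Assumption: this textbook variant is for illustration only; the thesis's
    exact TF-IDF settings are not given in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] * idf[term] for term in vocab])
    return rows


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenized (no preprocessing applied).
    docs = [["you", "are", "stupid"], ["you", "are", "kind"], ["stupid", "stupid", "troll"]]
    vocab = build_vocab(docs)  # ['are', 'kind', 'stupid', 'troll', 'you']
    print(one_hot(docs, vocab)[2])       # [0, 0, 1, 1, 0]
    print(bag_of_words(docs, vocab)[2])  # [0, 0, 2, 1, 0]
    print([round(v, 3) for v in tf_idf(docs, vocab)[2]])  # [0.0, 0.0, 0.811, 1.099, 0.0]
```

In the third document, one-hot encoding records the repeated "stupid" as 1. BoW keeps the count of 2, and TF-IDF also down-weights terms that occur in many documents. This loss of frequency information is one plausible reason why one-hot encoding ranked lowest, though the abstract does not state it.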
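The abstract evaluates every combination with accuracy, precision, recall, and F1-score. The minimal sketch below shows how these four metrics follow from the confusion counts. It assumes binary labels, with 1 marking a cyberbullying post, and the function name and toy labels are hypothetical. The abstract does not say whether the dataset is binary or multi-class. For multi-class data the per-class scores would also need averaging (for example macro or weighted), and the abstract does not say which averaging the thesis used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class.

    Assumption: binary labels where `positive` marks a cyberbullying post.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when the positive class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical gold labels
    y_pred = [1, 0, 0, 0, 1, 0, 1, 0]  # hypothetical classifier output
    # accuracy 0.625, precision ≈ 0.667, recall 0.5, F1 ≈ 0.571
    print(binary_metrics(y_true, y_pred))
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both reasonably high. This makes it a natural headline metric for ranking combinations.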