eprintid: 58350
rev_number: 10
eprint_status: archive
userid: 12243
dir: disk0/00/05/83/50
datestamp: 2023-05-05 07:34:23
lastmod: 2023-05-05 07:34:23
status_changed: 2023-05-05 07:34:23
type: thesis
metadata_visibility: show
contact_email: muchti.nurhidaya@uin-suka.ac.id
creators_name: Nawwab Zia Ajnaden, NIM.: 18106050042
title: PENERAPAN ALGORITMA RANDOM UNDER DAN OVER
SAMPLING UNTUK MENGATASI CLASS IMBALANCE
DALAM KLASIFIKASI TOPIK FORUM
ispublished: pub
subjects: TB
divisions: jur_tinf
full_text_status: restricted
keywords: balanced accuracy; classification; class imbalance; resampling
note: Supervisor: Nurochman, S.Kom., M.Kom.
abstract: Classifying user-generated content in social media applications
can benefit both developers and users. Factors such as limited time for
feature development and limited data sources at a given time can leave the
classes of a label in the dataset unevenly represented (class imbalance).
Resampling techniques such as random over-sampling and random under-sampling
are one way to address this problem.
This research compares three models, each using a naive Bayes classifier, in
classifying two datasets: post data, and combined post and comment data. The
three models are a model trained without resampling, a model with Random Over
Sampling (ROS), and a model with Random Under Sampling (RUS). All of the data
are labeled with a total of 12 topic classes.
The results show that balanced accuracy increased in all models equipped
with resampling (without resampling: 0.5792 and 0.5078; ROS: 0.6148 and
0.5570; RUS: 0.6040 and 0.5225, for post data and combined post and comment
data, respectively). F1-score and precision improved only in models trained
on post data. The F1-scores were: without resampling, 0.5780 and 0.5315; ROS,
0.6027 and 0.5260; RUS, 0.5754 and 0.4711 (for post data and combined post and
comment data, respectively). The precision values were: without resampling,
0.5816 and 0.5774; ROS, 0.6027 and 0.5155; RUS, 0.5754 and 0.4653 (for post
data and combined post and comment data, respectively). However, all models
with resampling improved recall (without resampling: 0.5792 and 0.5078; ROS:
0.6148 and 0.5570; RUS: 0.6040 and 0.5225, for post data and combined post
and comment data, respectively), which corresponds to an increase in the
number of predictions for some minority classes (both true-positive and
false-negative predictions).
date: 2023-01-10
date_type: published
pages: 53
institution: UIN SUNAN KALIJAGA YOGYAKARTA
department: FAKULTAS SAINS DAN TEKNOLOGI
thesis_type: skripsi
thesis_name: other
citation: Nawwab Zia Ajnaden, NIM.: 18106050042 (2023) PENERAPAN ALGORITMA RANDOM UNDER DAN OVER SAMPLING UNTUK MENGATASI CLASS IMBALANCE DALAM KLASIFIKASI TOPIK FORUM. Skripsi thesis, UIN SUNAN KALIJAGA YOGYAKARTA.
document_url: https://digilib.uin-suka.ac.id/id/eprint/58350/1/18106050042_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf
document_url: https://digilib.uin-suka.ac.id/id/eprint/58350/2/18106050042_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf