eprintid: 74875
rev_number: 10
eprint_status: archive
userid: 12460
dir: disk0/00/07/48/75
datestamp: 2026-01-09 02:16:35
lastmod: 2026-01-09 02:16:35
status_changed: 2026-01-09 02:16:35
type: thesis
metadata_visibility: show
contact_email: muh.khabib@uin-suka.ac.id
creators_name: Deny Setiawan, NIM.: 23206051013
title: OPTIMASI LARGE LANGUAGE MODEL: INDOBERT DALAM KLASIFIKASI ANAMNESIS MEDIS BERDASARKAN SNOMED-CT
ispublished: pub
subjects: 004.
divisions: S2_inf
full_text_status: restricted
keywords: Anamnesis, Klasifikasi, IndoBERT, NLP, SNOMED-CT
note: Dr. Agus Mulyanto, S.Si., M.Kom., ASEAN Eng.
abstract: Classifying medical entities from patient anamnesis records is essential for producing structured clinical data aligned with standardized medical terminologies such as SNOMED-CT. However, the free-text format of clinical notes often contains high linguistic variability, requiring a Natural Language Processing (NLP) approach. This study aims to optimize the IndoBERT model through three stages of fine-tuning to classify medical entities in neurological outpatient anamnesis records. The dataset consists of 500 manually labeled records categorized into medical entities such as symptoms, body parts, medical history, and time expressions. The fine-tuning process was conducted using a learning rate of 5e-5, a batch size of 4, and three training epochs, with varying hyperparameter combinations across three model versions (V1, V2, and V3). Evaluation results show progressive performance improvement as the hyperparameters were refined. The model’s accuracy increased from 0.856 (V1) to 0.870 (V3), while the weighted F1-score improved from 0.844 to 0.850, indicating greater stability and precision in medical entity classification. Although the macro F1-score slightly decreased from 0.740 to 0.731, this was primarily due to class imbalance within the dataset. Overall, the findings demonstrate that stepwise fine-tuning of IndoBERT effectively enhances model performance for medical Named Entity Recognition (NER) tasks. The best-performing model (V3) exhibits reliable capability in identifying key medical entities within Indonesian-language anamnesis texts, suggesting its strong potential for supporting clinical data standardization and analysis in healthcare information systems.
date: 2025-11-03
date_type: published
pages: 90
institution: UIN SUNAN KALIJAGA YOGYAKARTA
department: FAKULTAS SAINS DAN TEKNOLOGI
thesis_type: masters
thesis_name: other
citation: Deny Setiawan, NIM.: 23206051013 (2025) OPTIMASI LARGE LANGUAGE MODEL: INDOBERT DALAM KLASIFIKASI ANAMNESIS MEDIS BERDASARKAN SNOMED-CT. Masters thesis, UIN SUNAN KALIJAGA YOGYAKARTA.
document_url: https://digilib.uin-suka.ac.id/id/eprint/74875/1/23206051013_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf
document_url: https://digilib.uin-suka.ac.id/id/eprint/74875/2/23206051013_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf
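
The abstract reports a concrete fine-tuning configuration (learning rate 5e-5, batch size 4, three epochs) for IndoBERT on a medical NER task. The sketch below illustrates how such a setup is commonly expressed with the Hugging Face Transformers Trainer; it is not the thesis code. The checkpoint name, the BIO-style entity labels (symptoms, body parts, medical history, time expressions), and the toy anamnesis sentence are illustrative assumptions, and the full text is restricted, so the actual tag set and data are not reproduced here.

    # Minimal fine-tuning sketch, assuming a public IndoBERT checkpoint and an
    # illustrative BIO tag set; only the hyperparameters come from the abstract.
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              TrainingArguments, Trainer,
                              DataCollatorForTokenClassification)
    from datasets import Dataset

    MODEL_NAME = "indobenchmark/indobert-base-p1"  # assumed checkpoint
    labels = ["O", "B-SYMPTOM", "I-SYMPTOM", "B-BODY_PART", "I-BODY_PART",
              "B-MEDICAL_HISTORY", "I-MEDICAL_HISTORY", "B-TIME", "I-TIME"]
    label2id = {l: i for i, l in enumerate(labels)}
    id2label = {i: l for l, i in label2id.items()}

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_NAME, num_labels=len(labels),
        id2label=id2label, label2id=label2id)

    # Toy example standing in for the 500 manually labeled anamnesis records.
    raw = {"tokens": [["pasien", "mengeluh", "nyeri", "kepala", "sejak", "dua", "hari"]],
           "ner_tags": [[0, 0, 1, 3, 0, 7, 8]]}

    def tokenize_and_align(batch):
        # Align word-level labels with WordPiece sub-tokens (fast tokenizer required).
        enc = tokenizer(batch["tokens"], is_split_into_words=True,
                        truncation=True, padding=True)
        all_labels = []
        for i, word_labels in enumerate(batch["ner_tags"]):
            prev, ids = None, []
            for wid in enc.word_ids(batch_index=i):
                if wid is None:
                    ids.append(-100)            # special/pad tokens ignored by the loss
                elif wid != prev:
                    ids.append(word_labels[wid])
                else:
                    ids.append(-100)            # label only the first sub-token of a word
                prev = wid
            all_labels.append(ids)
        enc["labels"] = all_labels
        return enc

    dataset = Dataset.from_dict(raw).map(tokenize_and_align, batched=True,
                                         remove_columns=["tokens", "ner_tags"])

    args = TrainingArguments(output_dir="indobert-anamnesis",
                             learning_rate=5e-5,             # abstract: 5e-5
                             per_device_train_batch_size=4,  # abstract: batch size 4
                             num_train_epochs=3)             # abstract: three epochs

    trainer = Trainer(model=model, args=args, train_dataset=dataset,
                      data_collator=DataCollatorForTokenClassification(tokenizer))
    trainer.train()

Under this kind of setup, the accuracy and weighted/macro F1 figures quoted in the abstract would be computed on a held-out split with the -100 positions excluded; the macro average is the metric most sensitive to the class imbalance the abstract mentions.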