OPTIMIZING LARGE LANGUAGE MODELS: INDOBERT FOR MEDICAL ANAMNESIS CLASSIFICATION BASED ON SNOMED-CT

Deny Setiawan, NIM. 23206051013 (2025). Optimasi Large Language Model: IndoBERT dalam Klasifikasi Anamnesis Medis Berdasarkan SNOMED-CT [Optimizing Large Language Models: IndoBERT for Medical Anamnesis Classification Based on SNOMED-CT]. Masters thesis, UIN Sunan Kalijaga Yogyakarta.

Files:
- 23206051013_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf — Published Version, 5MB
- 23206051013_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf — Published Version, 11MB, restricted to registered users only

Abstract

Classifying medical entities from patient anamnesis records is essential for producing structured clinical data aligned with standardized medical terminologies such as SNOMED-CT. However, the free-text format of clinical notes often contains high linguistic variability, requiring a Natural Language Processing (NLP) approach. This study aims to optimize the IndoBERT model through three stages of fine-tuning to classify medical entities in neurological outpatient anamnesis records. The dataset consists of 500 manually labeled records categorized into medical entities such as symptoms, body parts, medical history, and time expressions. The fine-tuning process was conducted using a learning rate of 5e-5, a batch size of 4, and three training epochs, with varying hyperparameter combinations across three model versions (V1, V2, and V3). Evaluation results show progressive performance improvement as the hyperparameters were refined. The model’s accuracy increased from 0.856 (V1) to 0.870 (V3), while the weighted F1-score improved from 0.844 to 0.850, indicating greater stability and precision in medical entity classification. Although the macro F1-score slightly decreased from 0.740 to 0.731, this was primarily due to class imbalance within the dataset. Overall, the findings demonstrate that stepwise fine-tuning of IndoBERT effectively enhances model performance for medical Named Entity Recognition (NER) tasks. The best-performing model (V3) exhibits reliable capability in identifying key medical entities within Indonesian-language anamnesis texts, suggesting its strong potential for supporting clinical data standardization and analysis in healthcare information systems.
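The divergence reported above — weighted F1 improving while macro F1 slips — follows directly from class imbalance: weighted F1 averages per-class F1 scores by support, whereas macro F1 averages all classes equally, so a poorly predicted minority entity class drags macro F1 down without much affecting the weighted score. A minimal pure-Python sketch (using hypothetical entity labels, not the thesis dataset) illustrates the effect:

```python
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating it as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_weighted_f1(y_true, y_pred):
    """Return (macro F1, weighted F1) over the classes in y_true."""
    labels = sorted(set(y_true))
    support = Counter(y_true)
    f1s = {lab: f1_per_class(y_true, y_pred, lab) for lab in labels}
    macro = sum(f1s.values()) / len(labels)                      # equal weight per class
    weighted = sum(f1s[lab] * support[lab] for lab in labels) / len(y_true)
    return macro, weighted


# Toy imbalanced data: 9 majority-class records, 1 minority-class record
# that the model misses entirely.
y_true = ["SYMPTOM"] * 9 + ["TIME"]
y_pred = ["SYMPTOM"] * 10
macro, weighted = macro_weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

On this toy data the majority class scores F1 ≈ 0.947 and the missed minority class scores 0, so weighted F1 stays high (≈ 0.853) while macro F1 collapses (≈ 0.474) — the same mechanism, in milder form, behind the V1-to-V3 gap between the two metrics.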

Item Type: Thesis (Masters)
Additional Information / Supervisor: Dr. Agus Mulyanto, S.Si., M.Kom., ASEAN Eng.
Uncontrolled Keywords: Anamnesis, Classification, IndoBERT, NLP, SNOMED-CT
Subjects: 000 Computer Science, Information, and General Works > 000 General Works > 004 Data Processing, Computer Science, Informatics Engineering
Divisions: Faculty of Science and Technology > Informatics (Masters)
Depositing User: Muh Khabib, SIP.
Date Deposited: 09 Jan 2026 09:16
Last Modified: 09 Jan 2026 09:16
URI: http://digilib.uin-suka.ac.id/id/eprint/74875
