%A NIM.: 23206051013 Deny Setiawan
%O Dr. Agus Mulyanto, S.Si., M.Kom., ASEAN Eng.
%T OPTIMASI LARGE LANGUAGE MODEL: INDOBERT DALAM KLASIFIKASI ANAMNESIS MEDIS BERDASARKAN SNOMED-CT
%X Classifying medical entities from patient anamnesis records is essential for producing structured clinical data aligned with standardized medical terminologies such as SNOMED-CT. However, the free-text format of clinical notes exhibits high linguistic variability, requiring a Natural Language Processing (NLP) approach. This study aims to optimize the IndoBERT model through three stages of fine-tuning to classify medical entities in neurological outpatient anamnesis records. The dataset consists of 500 manually labeled records categorized into medical entities such as symptoms, body parts, medical history, and time expressions. Fine-tuning was conducted with a learning rate of 5e-5, a batch size of 4, and three training epochs, with varying hyperparameter combinations across three model versions (V1, V2, and V3). Evaluation results show progressive performance improvement as the hyperparameters were refined. The model's accuracy increased from 0.856 (V1) to 0.870 (V3), while the weighted F1-score improved from 0.844 to 0.850, indicating greater stability and precision in medical entity classification. Although the macro F1-score decreased slightly from 0.740 to 0.731, this was primarily due to class imbalance within the dataset. Overall, the findings demonstrate that stepwise fine-tuning of IndoBERT effectively enhances model performance on medical Named Entity Recognition (NER) tasks. The best-performing model (V3) reliably identifies key medical entities within Indonesian-language anamnesis texts, suggesting strong potential for supporting clinical data standardization and analysis in healthcare information systems.
%K Anamnesis, Klasifikasi, IndoBERT, NLP, SNOMED-CT
%D 2025
%I UIN SUNAN KALIJAGA YOGYAKARTA
%L digilib74875