eprintid: 74875
rev_number: 10
eprint_status: archive
userid: 12460
dir: disk0/00/07/48/75
datestamp: 2026-01-09 02:16:35
lastmod: 2026-01-09 02:16:35
status_changed: 2026-01-09 02:16:35
type: thesis
metadata_visibility: show
contact_email: muh.khabib@uin-suka.ac.id
creators_name: Deny Setiawan, NIM.: 23206051013
title: OPTIMASI LARGE LANGUAGE MODEL: INDOBERT DALAM KLASIFIKASI ANAMNESIS MEDIS BERDASARKAN SNOMED-CT
ispublished: pub
subjects: 004.
divisions: S2_inf
full_text_status: restricted
keywords: Anamnesis, Klasifikasi, IndoBERT, NLP, SNOMED-CT
note: Dr. Agus Mulyanto, S.Si., M.Kom., ASEAN Eng.
abstract: Classifying medical entities from patient anamnesis records is essential for producing structured clinical data aligned with standardized medical terminologies such as SNOMED-CT. However, the free-text format of clinical notes often contains high linguistic variability, requiring a Natural Language Processing (NLP) approach. This study aims to optimize the IndoBERT model through three stages of fine-tuning to classify medical entities in neurological outpatient anamnesis records. The dataset consists of 500 manually labeled records categorized into medical entities such as symptoms, body parts, medical history, and time expressions. The fine-tuning process was conducted using a learning rate of 5e-5, a batch size of 4, and three training epochs, with varying hyperparameter combinations across three model versions (V1, V2, and V3). Evaluation results show progressive performance improvement as the hyperparameters were refined. The model’s accuracy increased from 0.856 (V1) to 0.870 (V3), while the weighted F1-score improved from 0.844 to 0.850, indicating greater stability and precision in medical entity classification. Although the macro F1-score slightly decreased from 0.740 to 0.731, this was primarily due to class imbalance within the dataset. Overall, the findings demonstrate that stepwise fine-tuning of IndoBERT effectively enhances model performance for medical Named Entity Recognition (NER) tasks. The best-performing model (V3) exhibits reliable capability in identifying key medical entities within Indonesian-language anamnesis texts, suggesting its strong potential for supporting clinical data standardization and analysis in healthcare information systems.
date: 2025-11-03
date_type: published
pages: 90
institution: UIN SUNAN KALIJAGA YOGYAKARTA
department: FAKULTAS SAINS DAN TEKNOLOGI
thesis_type: masters
thesis_name: other
citation: Deny Setiawan, NIM.: 23206051013 (2025) OPTIMASI LARGE LANGUAGE MODEL: INDOBERT DALAM KLASIFIKASI ANAMNESIS MEDIS BERDASARKAN SNOMED-CT. Masters thesis, UIN SUNAN KALIJAGA YOGYAKARTA.
document_url: https://digilib.uin-suka.ac.id/id/eprint/74875/1/23206051013_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf
document_url: https://digilib.uin-suka.ac.id/id/eprint/74875/2/23206051013_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf
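
The abstract reports a concrete fine-tuning configuration (learning rate 5e-5, batch size 4, three epochs) for IndoBERT on a medical NER task. The sketch below illustrates how such a setup is commonly expressed with the Hugging Face Transformers Trainer; it is not the thesis code. The checkpoint name, the BIO-style entity labels (symptoms, body parts, medical history, time expressions), and the toy anamnesis sentence are illustrative assumptions, and the full text is restricted, so the actual tag set and data are not reproduced here.

    # Minimal fine-tuning sketch, assuming a public IndoBERT checkpoint and an
    # illustrative BIO tag set; only the hyperparameters come from the abstract.
    from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                              TrainingArguments, Trainer,
                              DataCollatorForTokenClassification)
    from datasets import Dataset

    MODEL_NAME = "indobenchmark/indobert-base-p1"  # assumed checkpoint
    labels = ["O", "B-SYMPTOM", "I-SYMPTOM", "B-BODY_PART", "I-BODY_PART",
              "B-MEDICAL_HISTORY", "I-MEDICAL_HISTORY", "B-TIME", "I-TIME"]
    label2id = {l: i for i, l in enumerate(labels)}
    id2label = {i: l for l, i in label2id.items()}

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_NAME, num_labels=len(labels),
        id2label=id2label, label2id=label2id)

    # Toy example standing in for the 500 manually labeled anamnesis records.
    raw = {"tokens": [["pasien", "mengeluh", "nyeri", "kepala", "sejak", "dua", "hari"]],
           "ner_tags": [[0, 0, 1, 3, 0, 7, 8]]}

    def tokenize_and_align(batch):
        # Align word-level labels with WordPiece sub-tokens (fast tokenizer required).
        enc = tokenizer(batch["tokens"], is_split_into_words=True,
                        truncation=True, padding=True)
        all_labels = []
        for i, word_labels in enumerate(batch["ner_tags"]):
            prev, ids = None, []
            for wid in enc.word_ids(batch_index=i):
                if wid is None:
                    ids.append(-100)            # special/pad tokens ignored by the loss
                elif wid != prev:
                    ids.append(word_labels[wid])
                else:
                    ids.append(-100)            # label only the first sub-token of a word
                prev = wid
            all_labels.append(ids)
        enc["labels"] = all_labels
        return enc

    dataset = Dataset.from_dict(raw).map(tokenize_and_align, batched=True,
                                         remove_columns=["tokens", "ner_tags"])

    args = TrainingArguments(output_dir="indobert-anamnesis",
                             learning_rate=5e-5,             # abstract: 5e-5
                             per_device_train_batch_size=4,  # abstract: batch size 4
                             num_train_epochs=3)             # abstract: three epochs

    trainer = Trainer(model=model, args=args, train_dataset=dataset,
                      data_collator=DataCollatorForTokenClassification(tokenizer))
    trainer.train()

Under this kind of setup, the accuracy and weighted/macro F1 figures quoted in the abstract would be computed on a held-out split with the -100 positions excluded; the macro average is the metric most sensitive to the class imbalance the abstract mentions.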