KOMPARASI PERFORMA LARGE LANGUAGE MODELS UNTUK TUGAS PERINGKASAN TEKS BERBAHASA INDONESIA

R. Abdullah Hammami, NIM.: 22206051019 (2025) KOMPARASI PERFORMA LARGE LANGUAGE MODELS UNTUK TUGAS PERINGKASAN TEKS BERBAHASA INDONESIA. Masters thesis, UIN SUNAN KALIJAGA YOGYAKARTA.

Files:
- 22206051019_BAB-I_IV-atau-V_DAFTAR-PUSTAKA.pdf (Published Version, 6MB, open access)
- 22206051019_BAB-II_sampai_SEBELUM-BAB-TERAKHIR.pdf (Published Version, 14MB, restricted to registered users only)

Abstract

The rapid growth of online information, coupled with low reading interest and heterogeneous literacy levels in Indonesia, necessitates concise, accurate, and context-sensitive automatic summarization. Given Indonesian’s low-resource status, systematic evaluation of locally adapted models is warranted. This study compares four Indonesian-capable large language models—Gemma2 9B CPT Sahabat-AI v1 Instruct, Llama3 8B CPT Sahabat-AI v1 Instruct, Gemma-SEA-LION-v3-9B-IT, and Llama-SEA-LION-v3-8B-IT—on news summarization to identify the most suitable model for practical use. We employ a benchmarking protocol on the IndoSum test subset (3,762 articles), comprising preprocessing (token reconstruction and punctuation cleanup), prompt design, 8-bit quantized inference, and automated evaluation with ROUGE (1/2/L; precision, recall, F1), BLEU, METEOR, and BERTScore. Inference is executed in four batches to meet computational constraints, and evaluation is standardized across models. Llama3 8B CPT Sahabat-AI v1 Instruct achieves the most balanced performance: ROUGE F1 42.05% (precision 42.27%; recall 42.68%), BLEU 25.10%, and BERTScore P/R/F1 88.68%/88.43%/88.54%. Gemma2 9B CPT Sahabat-AI v1 Instruct excels in coverage with ROUGE recall 48.23%, ROUGE F1 39.50%, BLEU 22.70%, METEOR 47.20%, and BERTScore 86.78%/89.17%/87.95%. SEA-LION models perform lower: Gemma-SEA-LION-v3-9B-IT (ROUGE P/R/F1 25.77%/37.58%/30.37%; BLEU 12.65%; METEOR 37.72%; BERTScore 84.63%/87.36%/85.97%) and Llama-SEA-LION-v3-8B-IT (ROUGE 25.22%/33.84%/28.71%; BLEU 11.06%; METEOR 34.57%; BERTScore 84.46%/86.80%/85.61%). Overall, the Indonesian-optimized (Sahabat-AI) models are superior and more stable. Llama3 8B is preferable when balancing precision, coverage, and structural consistency; Gemma2 9B is better when recall and semantic alignment with the source are prioritized.
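The evaluation above rests on n-gram overlap metrics such as ROUGE. As an illustration only, here is a minimal pure-Python sketch of ROUGE-1 (unigram precision, recall, and F1); the thesis reports ROUGE-1/2/L computed with a standard evaluation package, and the toy Indonesian sentence pair below is an invented example, not data from the study:

```python
from collections import Counter

def rouge1_scores(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 precision, recall, and F1.

    Simplified re-implementation for illustration; tokenization
    here is plain whitespace splitting on lowercased text.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram matches: each shared word counts at most
    # min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy example: a system summary vs. a reference summary.
scores = rouge1_scores(
    "pemerintah umumkan kebijakan baru hari ini",
    "pemerintah mengumumkan kebijakan baru pada hari ini",
)
# precision ≈ 0.833, recall ≈ 0.714, f1 ≈ 0.769
```

ROUGE-2 and ROUGE-L follow the same precision/recall/F1 pattern but match bigrams and longest common subsequences, respectively; BERTScore replaces exact word matching with contextual-embedding similarity, which is why the SEA-LION models score much closer to the Sahabat-AI models on BERTScore than on ROUGE.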

Item Type: Thesis (Masters)
Additional Information / Supervisor: Dr. Agung Fatwanto, S.Si., M.Kom.
Uncontrolled Keywords: Text Summarization (Peringkasan Teks), Fine-tuning, Gemma2, LLaMA3
Subjects: 000 Computer Science, Information Science, and General Works > 000 General Works > 004 Data Processing, Computer Science, Informatics Engineering
Divisions: Fakultas Sains dan Teknologi > Informatika (S2)
Depositing User: Muh Khabib, SIP.
Date Deposited: 16 Sep 2025 14:24
Last Modified: 16 Sep 2025 14:24
URI: http://digilib.uin-suka.ac.id/id/eprint/72937
