LawRAG: Retrieval-Augmented Generation for Judicial Case Law: An Embedding Model Benchmark
Abstract
This paper presents LawRAG, a Retrieval-Augmented Generation (RAG) system for legal question answering over judicial case law in the Australian legal domain. The framework integrates legal document corpora, optimized vector embeddings, and a state-of-the-art large language model to produce authoritative, contextually grounded responses. Unlike prior work focused on statutory texts, LawRAG addresses the nuanced structure of court judgments through a parent document retrieval strategy, which preserves critical legal context and improves factual accuracy. We evaluate multiple embedding models on a curated legal QA dataset and identify GTE-large as the most reliable encoder: it achieves a BERTScore of 0.8476 and the highest answer relevancy (0.7444). The system’s Dockerized implementation offers a fully reproducible pipeline for judicial case law analysis and a reference point for contextual retrieval in legal AI applications.
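As an illustration of the parent document retrieval idea mentioned above, the following is a minimal sketch and not the LawRAG implementation. It assumes the common form of the technique: small child chunks are embedded and searched, and the full parent document they came from is returned as context. The `embed` function is a toy bag-of-words stand-in for a real encoder such as GTE-large, and all names, the chunk size, and the two sample judgments are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a real encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(parents, chunk_words=8):
    """Split each parent document into child chunks that remember their parent id."""
    index = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            index.append((parent_id, embed(chunk)))
    return index


def retrieve_parents(query, index, parents, top_k=1):
    """Rank child chunks by similarity, then return the distinct parent texts."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    seen = []
    for parent_id, _ in ranked:
        if parent_id not in seen:
            seen.append(parent_id)
        if len(seen) == top_k:
            break
    return [(pid, parents[pid]) for pid in seen]


# Hypothetical sample judgments, invented for illustration only.
parents = {
    "case_a": "The court held that the employer breached its duty of care. "
              "Damages were awarded to the injured worker for negligence.",
    "case_b": "The appeal concerned a contract for the sale of land. "
              "The court found the contract void for uncertainty of terms.",
}
index = build_index(parents)
results = retrieve_parents("Was the contract for sale of land void?", index, parents)
```

The query is matched against short chunks, which keeps the similarity search precise, while the generator receives the whole parent text, so reasoning that spans several passages of a judgment is not cut off at a chunk boundary.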