Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation

Li, Bryan; Luo, Jiaming; Briakou, Eleftheria; Cherry, Colin

arXiv:2503.05010 (cs)

[Submitted on 6 Mar 2025]

Abstract: While large language models (LLMs) have been increasingly adopted for machine translation (MT), their performance in specialist domains such as medicine and law remains an open challenge. Prior work has shown that LLMs can be domain-adapted at test time by retrieving targeted few-shot demonstrations or terminologies for inclusion in the prompt. Meanwhile, for general-purpose LLM MT, recent studies have found some success in generating similarly useful domain knowledge from an LLM itself, prior to translation. Our work studies domain-adapted MT with LLMs through a careful prompting setup, finding that demonstrations consistently outperform terminology, and that retrieval consistently outperforms generation. We find that generating demonstrations with weaker models can close the gap with a larger model's zero-shot performance. Given the effectiveness of demonstrations, we perform detailed analyses to understand their value. We find that domain-specificity is particularly important, and that the popular multi-domain benchmark tests adaptation to a particular writing style more so than to a specific domain.
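
The retrieval setting described in the abstract (selecting few-shot demonstrations for inclusion in the prompt) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the token-overlap scorer stands in for whatever retriever the paper actually uses, and the function names, prompt template, language pair, and toy sentence pairs are hypothetical rather than taken from the paper.

```python
from collections import Counter


def overlap(a: str, b: str) -> int:
    """Count whitespace tokens (lowercased, with multiplicity) shared by a and b."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())


def retrieve_demonstrations(source, pool, k=2):
    """Return the k (source, target) pairs whose source side overlaps most with `source`.

    Token overlap is a stand-in similarity measure, not the paper's retriever.
    """
    return sorted(pool, key=lambda pair: overlap(source, pair[0]), reverse=True)[:k]


def build_prompt(source, demonstrations, src_lang="English", tgt_lang="German"):
    """Assemble a few-shot translation prompt (hypothetical template)."""
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in demonstrations:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    lines.append(f"{src_lang}: {source}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Hypothetical toy pool of in-domain translation pairs.
pool = [
    ("The patient was given aspirin.", "Dem Patienten wurde Aspirin verabreicht."),
    ("The court dismissed the appeal.", "Das Gericht wies die Berufung zurück."),
    ("The patient has a fever.", "Der Patient hat Fieber."),
]

query = "The patient was given ibuprofen."
demos = retrieve_demonstrations(query, pool, k=2)
prompt = build_prompt(query, demos)
print(prompt)
```

In the generation setting that the abstract contrasts with retrieval, the `demos` list would instead be produced by an LLM before translation; the prompt assembly step could stay the same.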

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2503.05010 [cs.CL]
	(or arXiv:2503.05010v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.05010

Submission history

From: Bryan Li
[v1] Thu, 6 Mar 2025 22:23:07 UTC (368 KB)
