LLM-as-a-judge is a practice that uses a [[Large Language Model|large language model]] to evaluate a given input, mimicking the assessment an expert human evaluator would provide. In this way, [[Large Language Model|LLMs]] are used as substitutes for human evaluation, serving as a rough [[Error Metrics|error metric]]. There is no formal definition of LLM-as-a-judge; some examples of it in use are:

- Evaluating how similar, correct, or complete a text is with respect to a reference text (see the sketch after this list),
- Using an [[Large Language Model|LLM]] considered "good enough" to evaluate the output of a different [[Large Language Model|LLM]] on a task, with no ground truth for reference.
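
A minimal sketch of the first pattern (reference-based scoring), using the OpenAI chat completions API; the model name, rubric wording, and 1-5 scale are illustrative assumptions rather than a standard, and any capable [[Large Language Model|LLM]] and scoring scheme could be substituted:

```python
# Minimal LLM-as-a-judge sketch: score a candidate answer against a
# reference answer. Model name, rubric, and 1-5 scale are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an impartial judge. Rate how well the candidate
answer matches the reference answer for correctness and completeness.
Reply with a single integer from 1 (poor) to 5 (excellent).

Reference answer:
{reference}

Candidate answer:
{candidate}"""

def judge(reference: str, candidate: str) -> int:
    """Ask the judge model for a 1-5 score of candidate vs. reference."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed "good enough" judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(reference=reference,
                                                  candidate=candidate)}],
        temperature=0,  # keep scoring as deterministic as possible
    )
    return int(response.choices[0].message.content.strip())

score = judge(reference="The capital of France is Paris.",
              candidate="Paris is France's capital city.")
print(score)  # e.g. 5
```

The same structure covers the second pattern by dropping the reference answer from the prompt and asking the judge to rate the candidate output on its own.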