"""**Evaluation** chains for grading LLM and Chain outputs. This module contains off-the-shelf evaluation chains for grading the output of LangChain primitives such as language models and chains. **Loading an evaluator** To load an evaluator, you can use the :func:`load_evaluators ` or :func:`load_evaluator ` functions with the names of the evaluators to load. .. code-block:: python from langchain.evaluation import load_evaluator evaluator = load_evaluator("qa") evaluator.evaluate_strings( prediction="We sold more than 40,000 units last week", input="How many units did we sell last week?", reference="We sold 32,378 units", ) The evaluator must be one of :class:`EvaluatorType `. **Datasets** To load one of the LangChain HuggingFace datasets, you can use the :func:`load_dataset ` function with the name of the dataset to load. .. code-block:: python from langchain.evaluation import load_dataset ds = load_dataset("llm-math") **Some common use cases for evaluation include:** - Grading the accuracy of a response against ground truth answers: :class:`QAEvalChain ` - Comparing the output of two models: :class:`PairwiseStringEvalChain ` or :class:`LabeledPairwiseStringEvalChain ` when there is additionally a reference label. - Judging the efficacy of an agent's tool usage: :class:`TrajectoryEvalChain ` - Checking whether an output complies with a set of criteria: :class:`CriteriaEvalChain ` or :class:`LabeledCriteriaEvalChain ` when there is additionally a reference label. 
- Computing semantic difference between a prediction and reference:
  :class:`EmbeddingDistanceEvalChain` or between two predictions:
  :class:`PairwiseEmbeddingDistanceEvalChain`
- Measuring the string distance between a prediction and reference:
  :class:`StringDistanceEvalChain` or between two predictions:
  :class:`PairwiseStringDistanceEvalChain`

**Low-level API**

These evaluators implement one of the following interfaces:

- :class:`StringEvaluator`: Evaluate a prediction string against a reference
  label and/or input context.
- :class:`PairwiseStringEvaluator`: Evaluate two prediction strings against
  each other. Useful for scoring preferences, measuring similarity between two
  chain or LLM agents, or comparing outputs on similar inputs.
- :class:`AgentTrajectoryEvaluator`: Evaluate the full sequence of actions
  taken by an agent.

These interfaces enable easier composability and usage within a higher level
evaluation framework.

"""  # noqa: E501
from langchain.evaluation.agents import TrajectoryEvalChain
from langchain.evaluation.comparison import (
    LabeledPairwiseStringEvalChain,
    PairwiseStringEvalChain,
)
from langchain.evaluation.criteria import (
    Criteria,
    CriteriaEvalChain,
    LabeledCriteriaEvalChain,
)
from langchain.evaluation.embedding_distance import (
    EmbeddingDistance,
    EmbeddingDistanceEvalChain,
    PairwiseEmbeddingDistanceEvalChain,
)
from langchain.evaluation.exact_match.base import ExactMatchStringEvaluator
from langchain.evaluation.loading import load_dataset, load_evaluator, load_evaluators
from langchain.evaluation.parsing.base import (
    JsonEqualityEvaluator,
    JsonValidityEvaluator,
)
from langchain.evaluation.parsing.json_distance import JsonEditDistanceEvaluator
from langchain.evaluation.parsing.json_schema import JsonSchemaEvaluator
from langchain.evaluation.qa import ContextQAEvalChain, CotQAEvalChain, QAEvalChain
from langchain.evaluation.regex_match.base import RegexMatchStringEvaluator
from langchain.evaluation.schema import (
    AgentTrajectoryEvaluator,
    EvaluatorType,
    PairwiseStringEvaluator,
    StringEvaluator,
)
from langchain.evaluation.scoring import (
    LabeledScoreStringEvalChain,
    ScoreStringEvalChain,
)
from langchain.evaluation.string_distance import (
    PairwiseStringDistanceEvalChain,
    StringDistance,
    StringDistanceEvalChain,
)

__all__ = [
    "EvaluatorType",
    "ExactMatchStringEvaluator",
    "RegexMatchStringEvaluator",
    "PairwiseStringEvalChain",
    "LabeledPairwiseStringEvalChain",
    "QAEvalChain",
    "CotQAEvalChain",
    "ContextQAEvalChain",
    "StringEvaluator",
    "PairwiseStringEvaluator",
    "TrajectoryEvalChain",
    "CriteriaEvalChain",
    "Criteria",
    "EmbeddingDistance",
    "EmbeddingDistanceEvalChain",
    "PairwiseEmbeddingDistanceEvalChain",
    "StringDistance",
    "StringDistanceEvalChain",
    "PairwiseStringDistanceEvalChain",
    "LabeledCriteriaEvalChain",
    "load_evaluators",
    "load_evaluator",
    "load_dataset",
    "AgentTrajectoryEvaluator",
    "ScoreStringEvalChain",
    "LabeledScoreStringEvalChain",
    "JsonValidityEvaluator",
    "JsonEqualityEvaluator",
    "JsonEditDistanceEvaluator",
    "JsonSchemaEvaluator",
]
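The `StringEvaluator` contract described in the docstring (score a `prediction`, optionally against a `reference`, and return a result dict) can be illustrated with a standard-library-only sketch. `SimpleStringDistanceEvaluator` below is a hypothetical toy analogue, not the exported `StringDistanceEvalChain`, and it requires no LLM:

```python
from difflib import SequenceMatcher


class SimpleStringDistanceEvaluator:
    """Toy analogue of a string-distance evaluator: scores a prediction
    against a reference with a similarity ratio in [0.0, 1.0]."""

    def evaluate_strings(self, *, prediction: str, reference: str) -> dict:
        # SequenceMatcher.ratio() returns 1.0 for identical strings and
        # approaches 0.0 as the strings diverge.
        score = SequenceMatcher(None, prediction, reference).ratio()
        return {"score": score}


evaluator = SimpleStringDistanceEvaluator()
result = evaluator.evaluate_strings(
    prediction="We sold more than 40,000 units last week",
    reference="We sold 32,378 units",
)
# result is a dict like {"score": <float in [0.0, 1.0]>}
```

The real evaluators in this module follow the same `evaluate_strings` shape but may return richer result dicts (e.g. reasoning text) and, for LLM-graded chains such as `QAEvalChain`, call a language model under the hood.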