POST
/v1/eval

Launch an eval

Launch an evaluation. This is the API equivalent of the Eval function built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, a task function, and scoring functions. The API then runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.
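As a sketch of what a call to this endpoint looks like from Python, the helper below assembles the request body described in this reference and sends it with the standard library. The IDs are placeholders, and the BRAINTRUST_API_KEY environment variable is an assumption; this is illustrative, not part of the SDK.

```python
import json
import os
import urllib.request

def build_eval_request(project_id, dataset_id, task_function_id, score_function_ids):
    """Assemble the JSON body for POST /v1/eval (dataset_id / function_id variants)."""
    return {
        "project_id": project_id,
        "data": {"dataset_id": dataset_id},
        "task": {"function_id": task_function_id},
        "scores": [{"function_id": fid} for fid in score_function_ids],
    }

def launch_eval(body):
    """POST the body to the Eval API and return the parsed JSON response."""
    req = urllib.request.Request(
        "https://api.braintrust.dev/v1/eval",
        data=json.dumps(body).encode(),
        headers={
            # API key is read from the environment; see the Authorization section.
            "Authorization": f"Bearer {os.environ['BRAINTRUST_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```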


Authorization
Required
Bearer <token>

Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key in the Braintrust organization settings page.

In: header


Request Body

Eval launch parameters

project_id
Required
string

Unique identifier for the project to run the eval in

data
Required
Any properties in dataset_id, project_dataset_name

The dataset to use

task
Required
Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt

The function to evaluate

scores
Required
array<Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt & unknown>

The functions to score the eval on

experiment_name
string

An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.

metadata
object

Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.

stream
boolean

Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
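If you set stream to true, you will need to read events off the response rather than a single JSON body. The parser below is a hedged sketch: it assumes a server-sent-events wire format (event:/data: lines separated by blank lines), and the event names in the usage are illustrative, not taken from this reference.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE-formatted lines."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            # A blank line terminates one event.
            if event or data:
                yield event, "\n".join(data)
            event, data = None, []
```

With stream set to true you would expect exactly two events from this iterator: one when the experiment starts and one carrying the final summary.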

Status code  Description
200          Eval launch response
curl -X POST "https://api.braintrust.dev/v1/eval" \
  -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "project_id": "string",
  "data": {
    "dataset_id": "string"
  },
  "task": {
    "function_id": "string",
    "version": "string"
  },
  "scores": [
    {
      "function_id": "string",
      "version": "string"
    }
  ],
  "experiment_name": "string",
  "metadata": {
    "property1": null,
    "property2": null
  },
  "stream": true
}'

Summary of an experiment

{
  "project_name": "string",
  "experiment_name": "string",
  "project_url": "http://example.com",
  "experiment_url": "http://example.com",
  "comparison_experiment_name": "string",
  "scores": {
    "property1": {
      "name": "string",
      "score": 1,
      "diff": -1,
      "improvements": 0,
      "regressions": 0
    },
    "property2": {
      "name": "string",
      "score": 1,
      "diff": -1,
      "improvements": 0,
      "regressions": 0
    }
  },
  "metrics": {
    "property1": {
      "name": "string",
      "metric": 0,
      "unit": "string",
      "diff": 0,
      "improvements": 0,
      "regressions": 0
    },
    "property2": {
      "name": "string",
      "metric": 0,
      "unit": "string",
      "diff": 0,
      "improvements": 0,
      "regressions": 0
    }
  }
}
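A common thing to do with the summary above is to check whether anything regressed against the comparison experiment before promoting a change. The helper below walks the scores and metrics maps from the example response; field names follow that example, and this is a sketch rather than part of the SDK.

```python
def has_regressions(summary):
    """Return True if any score or metric in an eval summary reports regressions."""
    entries = list(summary.get("scores", {}).values()) + \
              list(summary.get("metrics", {}).values())
    return any((entry.get("regressions") or 0) > 0 for entry in entries)
```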
