Launch an eval
Launch an evaluation. This is the API equivalent of the Eval function built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, a task function, and scoring functions. The API then runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.
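As a quick orientation, here is a minimal sketch of launching an eval over HTTP with only the required fields. It assumes the endpoint is POST https://api.braintrust.dev/v1/eval and that a BRAINTRUST_API_KEY environment variable holds your API key; all IDs are hypothetical placeholders, and the field names come from the parameters documented below.

```ts
// Minimal sketch: launch an eval with required fields only.
// Assumptions: the endpoint is POST https://api.braintrust.dev/v1/eval,
// and BRAINTRUST_API_KEY holds your API key. All IDs are placeholders.
const response = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    project_id: "<project id>",                        // project to run the eval in
    data: { dataset_id: "<dataset id>" },              // the dataset to use
    task: { function_id: "<task function id>" },       // the function to evaluate
    scores: [{ function_id: "<scorer function id>" }], // scoring functions
  }),
});

// With stream unset or false, the body is the evaluation's summary.
console.log(await response.json());
```

Per the description above, the returned results include a link to the created experiment.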
Authorization
Required Bearer <token>
Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header
on your HTTP request. You can create an API key in the Braintrust organization settings page.
In: header
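As a small sketch, attaching that header in TypeScript might look like the following; the BRAINTRUST_API_KEY environment variable name is an assumption, not part of the API.

```ts
// Sketch: build the Authorization header from an (assumed) environment variable.
const headers: Record<string, string> = {
  Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY ?? ""}`,
};
```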
Request Body
Eval launch parameters
project_id
Required string
Unique identifier for the project to run the eval in
data
Required Any properties in dataset_id, project_dataset_name
The dataset to use
task
Required Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt
The function to evaluate
scores
Required array<Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt & unknown>
The functions to score the eval on
experiment_name
string
An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.
metadata
object
Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.
stream
boolean
Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
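Putting the optional parameters together, here is a hedged sketch of a streaming launch. The endpoint URL and environment variable are the same assumptions as in the sketch above; the experiment_name and metadata values are illustrative. The exact wire encoding of the two streamed events is not specified here, so the sketch simply prints the raw chunks.

```ts
// Hedged sketch: launch an eval with stream: true and read the raw response
// stream. The docs promise two events (experiment started, experiment
// completed); their wire encoding is not specified here, so this just
// dumps each decoded chunk. URL and env var are assumptions.
const res = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    project_id: "<project id>",
    data: { dataset_id: "<dataset id>" },
    task: { function_id: "<task function id>" },
    scores: [{ function_id: "<scorer function id>" }],
    experiment_name: "nightly-regression", // illustrative; suffixed if it conflicts
    metadata: { model: "gpt-4o" },         // illustrative experiment-level metadata
    stream: true,                          // two events instead of a final summary
  }),
});

// Read the streamed body chunk by chunk.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}
```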
Response

| Status code | Description |
| --- | --- |
| 200 | Eval launch response (a summary of the experiment) |