Open sourcing the AI proxy

Ankur Goyal
27 November 2023

Last week, we released the Braintrust AI Proxy, a new, free way to access LLaMA 2, Mistral, OpenAI, Anthropic, and many other models behind the OpenAI protocol, with built-in caching and API key management.

Folks immediately started reaching out about running the proxy in production. We firmly believe that code on the critical path to production should be open source, so we're excited to announce that the proxy's source code is now available on GitHub under the MIT license.

Deployment options

You can continue to access the proxy, for free, by using the hosted version at https://braintrustproxy.com. It's hosted on Cloudflare Workers and end-to-end encrypts cached data using 256-bit AES-GCM encryption. For more details, see the documentation or source code.
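
For example, here's a minimal sketch of pointing an existing OpenAI client at the hosted proxy. It assumes the OpenAI Python SDK (v1), that the proxy exposes the OpenAI protocol under the /v1 path, and the environment variable name is just a placeholder.

import os

from openai import OpenAI

# Point the standard OpenAI client at the hosted proxy instead of api.openai.com.
client = OpenAI(
    base_url="https://braintrustproxy.com/v1",  # hosted proxy (path assumed)
    api_key=os.environ["OPENAI_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
print(response.choices[0].message.content)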

The repository also contains instructions for deploying the proxy to Vercel Edge Functions, Cloudflare Workers, or AWS Lambda, or running it as a plain old Express server.

Benchmarks

I ran some quick benchmarks from my in-laws' place in California and from an EC2 machine in US East (N. Virginia) to compare performance across the deployment options (code). The AWS Lambda functions are deployed in us-east-1, and aws-pc is AWS Lambda with provisioned concurrency.

In-laws (CA)

$ python proxy_benchmark.py -n 100
cloudflare: AVG: 57.98ms,   MIN: 42.39ms,   MAX: 258.04ms
vercel:     AVG: 82.05ms,   MIN: 54.65ms,   MAX: 326.60ms
aws:        AVG: 131.95ms,  MIN: 103.64ms,  MAX: 722.90ms
aws-pc:     AVG: 145.10ms,  MIN: 109.22ms,  MAX: 1704.05ms

EC2 (US East N. Virginia)

$ python proxy_benchmark.py -n 100
cloudflare: AVG: 32.23ms,   MIN: 20.15ms,   MAX: 283.90ms
vercel:     AVG: 55.72ms,   MIN: 25.03ms,   MAX: 512.94ms
aws:        AVG: 43.91ms,   MIN: 22.20ms,   MAX: 130.78ms
aws-pc:     AVG: 68.13ms,   MIN: 24.46ms,   MAX: 973.50ms

As you can see, Cloudflare and Vercel are consistently very fast, and AWS Lambda in US East suffers (as expected) when measured from CA. I was surprised that AWS Lambda with provisioned concurrency was slower than without. Perhaps I misconfigured something...

Additional features

Along with the open source release, the proxy contains a number of useful built-in features.

Caching

The proxy automatically caches responses from the model provider if you set a seed value or temperature=0. Seeds are a new feature in the OpenAI API that lets you create reproducible results, but most model providers do not yet support them. The proxy automatically handles that for you.
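
As a sketch of what opts a request into the cache (same hypothetical client setup as above; the seed value is arbitrary):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://braintrustproxy.com/v1",  # hosted proxy (path assumed)
    api_key=os.environ["OPENAI_API_KEY"],
)

# A fixed seed (or temperature=0) makes the request cacheable; repeating the
# identical call should be served from the proxy's cache.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the proxy in one sentence."}],
    seed=42,
)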

API key management

You can add API keys across providers as secrets in Braintrust, and use a single API key to access all of them. This is a great way to manage your API keys in one place, and share them with your team.
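 
For illustration, here's a sketch of reaching two different providers with a single key. It assumes the provider secrets are already saved in Braintrust; the model names and environment variable are placeholders.

import os

from openai import OpenAI

# One Braintrust API key stands in for every provider secret configured in Braintrust.
client = OpenAI(
    base_url="https://braintrustproxy.com/v1",  # hosted proxy (path assumed)
    api_key=os.environ["BRAINTRUST_API_KEY"],
)

for model in ["gpt-3.5-turbo", "claude-2"]:  # placeholder model names
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)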

Load balancing

You can now add multiple keys and organizations as secrets in Braintrust, and the proxy will automatically load balance across them for you. This is a simple way to add resiliency across OpenAI accounts or providers (e.g. OpenAI and Azure).

Azure OpenAI

You can access Azure's OpenAI endpoints through the proxy, with vanilla OpenAI drivers, by configuring Azure endpoints in Braintrust. If you configure both OpenAI and Azure endpoints, the proxy will automatically load balance between them.

[Screenshot: configuring provider secrets in Braintrust]

Replicate Lifeboat

We now support the meta/llama-2-70b-chat model through the Replicate lifeboat, including tool use. Simply enter your Replicate API key in Braintrust to get started.
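
A minimal sketch, assuming your Replicate API key is already saved in Braintrust and that the model is addressed by its Replicate name (the /v1 path and environment variable are assumptions):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://braintrustproxy.com/v1",  # hosted proxy (path assumed)
    api_key=os.environ["BRAINTRUST_API_KEY"],
)

# Route the request to the Replicate lifeboat by naming the Replicate model.
response = client.chat.completions.create(
    model="meta/llama-2-70b-chat",
    messages=[{"role": "user", "content": "What is an AI proxy?"}],
)
print(response.choices[0].message.content)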

What's next

We have an exciting roadmap ahead for the proxy, including more advanced load balancing/resiliency features, support for more models/providers, and deeper integrations into Braintrust.

If you have any feedback or want to collaborate, send us an email at info@braintrustdata.com or join our Discord.