
Universal components

Our boilerplate provides you with the best open-source technology to support your LLMs.
You can use it as building blocks for your own solutions, or start from the prepared templates.

LLMs supported in our boilerplate

DeepSeek
You can serve DeepSeek models fully privately on your own servers. Our pick for the best open-source reasoning LLM.
Qwen 3
Alibaba's best. Our pick for the best instruction-following open-source LLM.
Phi-4
Microsoft's best open-source model. Our pick for the best general-purpose, cost-effective open-source LLM.
Llama
Meta's finest research. Models of all sizes, for all use cases.
Mistral
Pioneers of open-source LLMs.
Uncensored 18+ models
Sometimes you need to go beyond "I'm sorry, Dave. I'm afraid I can't do that."
Other open-source models
Many more open-source models with different parameter counts and quantization levels.
OpenAI
OpenAI, if it suits your needs better. Our pick for the best proprietary models.
Anthropic
The best alternative to OpenAI models. Probably the best coding model on the market.

Extended context window supported

Up to 1M-token context length
Some open-source models are trained to work well even with very long contexts.
RAG
Retrieval-augmented generation
We have practical use cases for RAG built on Qdrant: retrieve the most relevant information to enhance your prompt before querying an LLM.
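For illustration, a minimal sketch of that retrieval step, assuming a local Qdrant instance with an already populated "docs" collection (payloads holding a "text" field) and sentence-transformers embeddings; these names are ours, not part of the boilerplate:

```python
# Minimal RAG sketch: embed the question, fetch the closest chunks
# from Qdrant, and prepend them to the prompt.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")   # local Qdrant
encoder = SentenceTransformer("all-MiniLM-L6-v2")    # example embedder

def build_rag_prompt(question: str, top_k: int = 3) -> str:
    # Vector search over the (assumed) "docs" collection.
    hits = client.query_points(
        collection_name="docs",
        query=encoder.encode(question).tolist(),
        limit=top_k,
    ).points
    # Assumed payload layout: each point carries a "text" field.
    context = "\n\n".join(hit.payload["text"] for hit in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```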
Multi-step pipelines
We have ready-to-use techniques to query LLMs multiple times with different goals before finalizing the result.
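As one example of such a pipeline, a draft-critique-revise loop sketched against any OpenAI-compatible endpoint (vLLM and Ollama both expose one); the URL and model name below are placeholders:

```python
# Three LLM calls with different goals: draft, critique, revise.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="qwen3",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_with_review(question: str) -> str:
    draft = ask(question)                                    # step 1: draft
    critique = ask(f"List flaws in this answer:\n{draft}")   # step 2: critique
    return ask(                                              # step 3: revise
        f"Question: {question}\nDraft: {draft}\n"
        f"Critique: {critique}\nWrite an improved final answer."
    )
```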

Experimentation

Ready-to-use templates
15 handpicked templates right out of the box. Not hundreds - only the ones you will actually use.
Templates over prompts
Treat your prompts, and every line in them, as data. Keeping all the logic inside the template means fewer changes to the actual codebase.
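For example, a prompt template sketched with Jinja2 (our choice for illustration; the boilerplate's own templating may differ), where the code only supplies data and the wording and logic live in the template:

```python
from jinja2 import Template

# All prompt wording and branching lives here, not in the code.
SUMMARIZE = Template(
    "You are a {{ role }}.\n"
    "Summarize the text below in {{ n_sentences }} sentences.\n"
    "{% if audience %}Write for {{ audience }}.{% endif %}\n\n"
    "{{ text }}"
)

prompt = SUMMARIZE.render(
    role="technical editor",
    n_sentences=2,
    audience="executives",
    text="...",  # the document to summarize
)
```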
Experiment tracking
Store and easily display experiment results via MLflow.
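A minimal MLflow tracking sketch; the experiment, run, parameter, and metric names are illustrative:

```python
import mlflow

prompt_text = "Summarize the text below in 2 sentences. ..."

mlflow.set_experiment("prompt-evaluation")
with mlflow.start_run(run_name="qwen3-template-v2"):
    # Record which model and template produced this result.
    mlflow.log_params({"model": "qwen3", "template": "summarize_v2"})
    # Example evaluation score for this run.
    mlflow.log_metric("answer_relevance", 0.87)
    # Keep the exact prompt as an artifact for later inspection.
    mlflow.log_text(prompt_text, "prompt.txt")
```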

Production

Ollama
The easiest way to use open-source LLMs.
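A sketch with the official ollama Python client, assuming `ollama serve` is running and the model has been pulled (e.g. `ollama pull llama3`):

```python
import ollama

# One chat call against the locally served model.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```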
Llama.cpp
Fast inference in pure C/C++, with Python bindings available.
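A sketch with the llama-cpp-python bindings; the GGUF path is a placeholder, and any quantized model file works:

```python
from llama_cpp import Llama

# Load a local quantized model (path is an assumption).
llm = Llama(model_path="./models/qwen3-8b-q4_k_m.gguf", n_ctx=8192)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantization in one line."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```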
vLLM
An awesome LLM inference engine. Our pick for the best way to productionize your LLMs.
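A sketch of vLLM's offline inference API; the model name is an example, and any Hugging Face model you have access to works:

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM handles batching and KV-cache management.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Write a haiku about GPUs."], params)
print(outputs[0].outputs[0].text)
```

For serving, the same engine also exposes an OpenAI-compatible HTTP server via `vllm serve`, which is the usual path to production.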
0 to 1 to N

Set up your own LLM

Don't waste time choosing the right stack or figuring out how to evaluate prompts and models.