Set up a self-hosted large language model with LiteLLM

LiteLLM is an OpenAI-compatible proxy server. It simplifies integration with different large language models (LLMs) by exposing them all through the OpenAI API spec, so you can switch between LLMs without changing how your application calls them.
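
For example, a client talks to the proxy with standard OpenAI-format requests regardless of which backend model is configured. The following is a minimal, hypothetical sketch that assumes the proxy from the setup below is running at http://localhost:4000 and exposes a model named codegemma; adjust the endpoint and model name to your configuration.

    # Hypothetical OpenAI-format request to a LiteLLM proxy assumed to run
    # at http://localhost:4000 with a model named "codegemma" configured.
    curl --request 'POST' \
      'http://localhost:4000/v1/chat/completions' \
      --header 'Content-Type: application/json' \
      --data '{
        "model": "codegemma",
        "messages": [
          {"role": "user", "content": "Write a Python function that prints Hello, World."}
        ]
      }'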

LiteLLM architecture diagram: shows how GitLab sends requests to the AI Gateway when set up with the LiteLLM OpenAI proxy server. The client sends a request to GitLab, GitLab creates a prompt and sends it to the AI Gateway, the AI Gateway performs an API request to the AI model using the OpenAI format, LiteLLM translates and forwards the request to the model-provider-specific format, Ollama responds to the prompt, and the response is forwarded back through LiteLLM, the AI Gateway, and GitLab to the client.

On Kubernetes

In Kubernetes environments, you can install Ollama with a Helm chart or by following the example in the official documentation.
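
As a rough sketch of the Helm route, the commands below only illustrate the workflow; the repository URL and chart name are placeholders, because the chart you use depends on your environment. Check the chart's own documentation for supported values.

    # Hypothetical Helm workflow for installing Ollama on Kubernetes.
    # Replace <chart-repository-url> and <chart-name> with the chart you use.
    helm repo add ollama <chart-repository-url>
    helm repo update
    helm install ollama ollama/<chart-name> --namespace ollama --create-namespace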

Example setup with LiteLLM and Ollama

  1. Pull and serve the model with Ollama:

    ollama pull codegemma:2b
    ollama serve
    
  2. Create the LiteLLM proxy configuration that routes requests for the generic codegemma model name to a specific model version served by Ollama. In this example, requests are routed to codegemma:2b, which Ollama serves at http://localhost:11434:

    # config.yaml
    model_list:
    - model_name: codegemma
      litellm_params:
        model: ollama/codegemma:2b
        api_base: http://localhost:11434
    
  3. Run the proxy. By default, the LiteLLM proxy listens on port 4000, which is the model_endpoint used in the next step (see the verification sketch after this list):

    litellm --config config.yaml
    
  4. Send a test code completion request to the AI Gateway, which listens on http://localhost:5052 in this example:

    curl --request 'POST' \
      'http://localhost:5052/v2/code/completions' \
      --header 'accept: application/json' \
      --header 'Content-Type: application/json' \
      --data '{
        "current_file": {
          "file_name": "app.py",
          "language_identifier": "python",
          "content_above_cursor": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>",
          "content_below_cursor": ""
        },
        "model_provider": "litellm",
        "model_endpoint": "http://127.0.0.1:4000",
        "model_name": "codegemma",
        "telemetry": [],
        "prompt_version": 2,
        "prompt": ""
      }' | jq

     The response is similar to the following:

    {
       "id": "id",
       "model": {
          "engine": "litellm",
          "name": "text-completion-openai/codegemma",
          "lang": "python"
       },
       "experiments": [],
       "object": "text_completion",
       "created": 1718631985,
       "choices": [
          {
             "text": "print(\"Hello, World!\")",
             "index": 0,
             "finish_reason": "length"
          }
       ]
    }
    
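Before sending requests through the AI Gateway, you can check each layer directly. This sketch assumes the defaults used above: Ollama listening on port 11434 and the LiteLLM proxy on port 4000. If you configured a LiteLLM master key, add the corresponding Authorization header to the proxy request.

    # Confirm Ollama has the pulled model available (lists local models).
    curl http://localhost:11434/api/tags
    
    # Confirm the LiteLLM proxy exposes the configured model name.
    curl http://localhost:4000/v1/models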

Example setup for Codestral with Ollama

When serving the Codestral model through Ollama, an additional step is required to make Codestral work with both code completions and code generations.

  1. Pull the Codestral model:

    ollama pull codestral
    
  2. Edit the default template used for Codestral so that Ollama passes the raw prompt through unchanged, and save the result as a new codestral model (a matching LiteLLM configuration sketch follows these steps):

    ollama run codestral
    
    >>> /set template {{ .Prompt }}
    Set prompt template.
    >>> /save codestral
    Created new model 'codestral'
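
To route requests to the saved model, the LiteLLM proxy configuration follows the same pattern as the codegemma example above. This sketch assumes Ollama serves the saved codestral model at http://localhost:11434 and that the AI Gateway requests it under the codestral model name; adjust both to match your setup, then run the proxy with the same litellm --config config.yaml command as before.

    # config.yaml
    model_list:
    - model_name: codestral
      litellm_params:
        model: ollama/codestral
        api_base: http://localhost:11434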