Status | Authors | Coach | DRIs | Owning Stage | Created |
---|---|---|---|---|---|
proposed | @sean_carroll | @jessieay | @susie.bee, @m_gill | devops ai-powered | 2024-03-29 |
Self-Hosted Model Deployment
This Blueprint describes support for customer self-deployments of Mistral LLMs as a backend for GitLab Duo features, as an alternative to the default Vertex or Anthropic models offered on GitLab Dedicated and .com. This initiative supports both internet-connected and air-gapped GitLab deployments.
Motivation
Self-hosted LLMs allow customers to manage the end-to-end transmission of requests to enterprise-hosted LLM backends for GitLab Duo features, keeping all requests within their enterprise network. By default, GitLab provides Google Vertex AI and Anthropic as LLM backends, hosted externally to GitLab. GitLab Duo feature developers are able to access other LLM choices via the AI Gateway. More details on model and region information can be found here.
Goals
Self-hosted models serve sophisticated customers capable of managing their own LLM infrastructure. GitLab provides the option to connect supported models to GitLab Duo features; model-specific prompts and feature support are provided by the self-hosted models feature. This gives customers:
- Choice of LLM models
- Ability to keep all data and request/response logs within their own domain
- Ability to select specific GitLab Duo Features for their users
- Non-reliance on the .com AI Gateway
Non-Goals
Other features that are goals of the Custom Models group, and which may overlap with this work in the future, are explicitly out of scope for the current iteration of this blueprint. These include:
- Local Models
- RAG
- Fine Tuning
- GitLab-managed hosting of open-source models, other than the currently supported third-party models.
- Bring Your Own API Key (BYOK)
Proposal
GitLab will provide support for specific LLMs hosted in a customer’s infrastructure. The customer will self-host the AI Gateway, and self-host one or more LLMs from a predefined list. Customers will then configure their GitLab instance for specific models by LLM feature. A different model can be chosen for each GitLab Duo feature.
This feature is accessible at the instance level and is intended for use in GitLab Self-Managed instances.
Self-Hosted Model Deployment is a GitLab Duo Enterprise Add-on.
Design and implementation details
Component Architecture
Diagram Notes
- User request: A GitLab Duo Feature is accessed from one of three possible starting points (Web UI, IDE or Git CLI). The IDE communicates directly with the AI Gateway.
- LLM Serving Config: The existence of a customer-hosted model along with its connectivity information is declared in GitLab Rails and exposed to the AI Gateway with an API.
- GitLab Duo Feature Configuration: For each supported GitLab Duo feature, a user may select a supported model and the associated prompts are automatically loaded.
- Prompt Retrieval: GitLab Rails chooses and processes the correct prompt(s) based on the GitLab Duo feature and model being used.
- Model Routing: The AI Gateway routes the request to the correct external AI endpoint. The current default for GitLab Duo features is either Vertex or Anthropic. If a self-hosted model is used, the AI Gateway must route to the correct customer-hosted model’s endpoint. The customer-hosted model server details form the LLM Serving Config and are retrieved from GitLab Rails via an API call; they may be cached in the AI Gateway.
- Model API interface: Each model server has its own endpoint signature, and the AI Gateway needs to communicate using the right one. We will support commonly used model-serving formats such as the OpenAI API spec. A sketch of these last two steps follows this list.
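To make the Model Routing and Model API interface steps concrete, here is a minimal sketch of how a self-hosted AI Gateway might fetch the LLM Serving Config from GitLab Rails, cache it, and forward a request to a customer-hosted model that speaks the OpenAI-compatible format. The endpoint path `/api/v4/ai/serving_config`, the config field names, and the `requests`-based client are illustrative assumptions, not the shipped implementation.

```python
import time

import requests

GITLAB_URL = "https://gitlab.example.com"  # assumption: the customer's GitLab instance
CACHE_TTL = 300  # seconds; illustrative cache window

_cache = {"config": None, "fetched_at": 0.0}


def get_serving_config(token: str) -> dict:
    """Fetch the LLM Serving Config from GitLab Rails, with a simple TTL cache.

    The /api/v4/ai/serving_config path is hypothetical; the real API shape
    is defined by GitLab Rails.
    """
    if _cache["config"] and time.time() - _cache["fetched_at"] < CACHE_TTL:
        return _cache["config"]
    resp = requests.get(
        f"{GITLAB_URL}/api/v4/ai/serving_config",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    _cache.update(config=resp.json(), fetched_at=time.time())
    return _cache["config"]


def route_completion(feature: str, prompt: str, token: str) -> str:
    """Route a GitLab Duo request to the customer-hosted model configured
    for `feature`, using the OpenAI-compatible chat-completions format."""
    model = get_serving_config(token)["features"][feature]
    resp = requests.post(
        f"{model['endpoint']}/v1/chat/completions",  # OpenAI API spec path
        json={
            "model": model["name"],
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```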
Configuration
Configuration is set at the GitLab instance level. For each GitLab Duo feature, a drop-down list with the following options will be presented:
- Self-Hosted Model 1
- Self-Hosted Model n
- Feature Inactive
In the initial implementation a single self-hosted model will be supported, but this will be expanded to a number of GitLab-defined models. A sketch of the resulting per-feature selection follows.
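For illustration only, the drop-down outcome could be modeled as a mapping from GitLab Duo feature to the selected backend. The feature keys and the `FEATURE_INACTIVE` sentinel below are assumptions, not the real configuration schema.

```python
# Hypothetical per-feature model selection, mirroring the drop-down above.
FEATURE_INACTIVE = None  # "Feature Inactive" disables the feature entirely

duo_feature_models = {
    "code_completion": "self_hosted_model_1",  # e.g. a Mistral deployment
    "code_generation": "self_hosted_model_1",
    "duo_chat": FEATURE_INACTIVE,              # feature switched off
}


def model_for_feature(feature: str):
    """Return the configured model identifier, or None if the feature is inactive."""
    return duo_feature_models.get(feature, FEATURE_INACTIVE)
```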
AI Gateway Deployment
Customers will be required to deploy a local instance of the AI Gateway in their own infrastructure. The initial means to do this is via Docker container, as described in this issue.
Self-hosted Runway will be the preferred delivery mechanism for deploying the AI Gateway. Future options, in order of preference, are:
- Runway discussion
- Kubernetes deployment issue
- Omnibus packaging issue
It should be noted that deployment by Docker container is a temporary measure only, and will be superseded by the three options listed above.
Prompt Support
For each supported model and supported GitLab Duo feature, prompts will be developed and evaluated by GitLab. They will be baked into the Rails monolith source code.
When the standard prompts are migrated into either the AI Gateway or a prompt template repository (direction is to be determined), the prompts supporting self-hosted models will also be migrated.
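One way to picture this per-model prompt support is a registry keyed by (GitLab Duo feature, model), as in the minimal sketch below. The keys and template strings are illustrative placeholders; the real prompts are developed and evaluated by GitLab.

```python
# Hypothetical prompt registry keyed by (GitLab Duo feature, model family).
# Template text is placeholder only.
PROMPTS = {
    # Mistral instruction format
    ("code_generation", "mistral-7b"):
        "<s>[INST] Write code for the following task:\n{task} [/INST]",
    # Anthropic Claude-2 Human/Assistant format
    ("code_generation", "claude-2"):
        "\n\nHuman: Write code for the following task:\n{task}\n\nAssistant:",
}


def build_prompt(feature: str, model: str, **params) -> str:
    """Select and render the prompt template for the given feature and model."""
    return PROMPTS[(feature, model)].format(**params)


# Example: build_prompt("code_generation", "mistral-7b", task="reverse a list")
```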
Supported LLMs
The initially supported models are:

- Mistral AI 7B v0.1
- Mixtral 8x22B

This list will expand in the near future, but the overall architecture will remain the same. Installation instructions will be added to the Developer documentation (issue).
GitLab Duo Feature Support
Feature | Default Model | Mistral AI 7B v0.1 | Mixtral 8x22B |
---|---|---|---|
GitLab Duo Chat | Anthropic Claude-2, Vertex AI Codey textembedding-gecko | Not planned | Not planned |
Code Completion | Vertex AI Codey code-gecko | ✅ | ✅ |
Code Generation | Anthropic Claude-2 | ✅ | ✅ |
Git Suggestions | Vertex AI Codey codechat-bison | Not planned | Not planned |
Discussion Summary | Vertex AI Codey text-bison | Not planned | Not planned |
Issue Description Generation | Anthropic Claude-2 | Not planned | Not planned |
Test Generation | Anthropic Claude-2 | Not planned | Not planned |
Merge request template population | Vertex AI Codey text-bison | Not planned | Not planned |
Suggested Reviewers | GitLab In-House Model | Not planned | Not planned |
Merge request summary | Vertex AI Codey text-bison | Not planned | Not planned |
Code review summary | Vertex AI Codey text-bison | Not planned | Not planned |
Vulnerability explanation | Vertex AI Codey text-bison (Anthropic Claude-2 if degraded performance) | Not planned | Not planned |
Vulnerability resolution | Vertex AI Codey code-bison | Not planned | Not planned |
Code explanation | Vertex AI Codey codechat-bison | Not planned | Not planned |
Root cause analysis | Vertex AI Codey text-bison | Not planned | Not planned |
Value stream forecasting | GitLab In-House Model | Not planned | Not planned |
The Suggested Reviewers and Value stream forecasting models are Convolutional Neural Networks (CNNs) developed in-house by GitLab.
LLM-hosting
Customers will self-manage LLM hosting. For Mistral, GitLab recommends following the Mistral Self-Deployment documentation.
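As a purely illustrative example (not GitLab's prescribed setup), a Mistral model can be served locally with an inference engine such as vLLM; customers should follow the Mistral documentation referenced above for supported deployment options.

```python
# Illustrative only: serving a Mistral model locally with vLLM
# (https://docs.vllm.ai). Follow Mistral's own self-deployment docs
# for production setups.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["<s>[INST] Write a Python function that reverses a list. [/INST]"],
    params,
)
print(outputs[0].outputs[0].text)
```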
GitLab Duo License Management
The Self-Managed GitLab Rails application will self-issue a token (using the same process as for .com) that the local AI Gateway can verify, ensuring that cross-service communication is secure. Details
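The token flow could resemble the following sketch, in which GitLab Rails signs a short-lived JWT with a private key and the local AI Gateway verifies it with the matching public key. The claims, lifetimes, and use of the PyJWT library are assumptions for illustration, not the actual implementation.

```python
import time

import jwt  # PyJWT

# Illustrative only: in practice, key material comes from the instance's
# own key management, not local PEM files.
PRIVATE_KEY = open("rails_private.pem").read()  # held by GitLab Rails
PUBLIC_KEY = open("rails_public.pem").read()    # distributed to the AI Gateway


def issue_token() -> str:
    """GitLab Rails self-issues a short-lived token (hypothetical claims)."""
    now = int(time.time())
    return jwt.encode(
        {"iss": "gitlab-self-managed", "aud": "ai-gateway",
         "iat": now, "exp": now + 3600},
        PRIVATE_KEY,
        algorithm="RS256",
    )


def verify_token(token: str) -> dict:
    """The local AI Gateway verifies the token before serving a request."""
    return jwt.decode(
        token, PUBLIC_KEY, algorithms=["RS256"], audience="ai-gateway"
    )
```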
System Architectures
At this time only a single system architecture is supported. See the Out of Scope section for a discussion of alternatives.
Self-Managed GitLab with self-hosted AI Gateway
This system architecture supports an internet-connected GitLab instance and AI Gateway, and can also be run in an air-gapped environment. Customers install a self-managed AI Gateway within their own infrastructure. The long-term vision for such installations is Runway, but until that is available a Docker-based install will be supported.
Self-Managed customers who deploy a self-managed AI Gateway will only be able to access self-hosted models at this time. Future work around Bring Your Own API Key (BYOK) may change that.
Development Environment
Engineering documentation will be produced on how to develop this feature; work is in progress.
Out of scope
- It would be possible to support customer self-hosted models within a customer’s infrastructure for dedicated or .com customers, but this is not within scope at this time.
- Support for models other than those listed in the Supported LLMs section above.
- Support for modified models.
Out of scope System Architectures
There are no plans to support these system architectures at this time; this could change if there is sufficient customer demand.
Self-Managed GitLab with .com AI Gateway
In this out-of-scope architecture, a self-managed customer continues to use the .com-hosted AI Gateway, but points back to self-managed models.
.com GitLab with .com AI Gateway
In this out-of-scope architecture, .com customers point to self-managed models. This topology might be desired if a specific model produced better-quality results for a given feature, or if customers could improve response latency by using their own model-serving infrastructure.
GitLab Dedicated
Support will not be provided for Dedicated customers to use a self-hosted AI Gateway and self-hosted models. Dedicated customers who use GitLab Duo features can access them via the .com AI Gateway. If there is customer demand for self-managed models for Dedicated customers, this can be considered in the future.
Externally hosted models
Customers are expected to self-host their models; externally hosted models are out of scope.