Code Suggestions data usage
Code Suggestions is powered by a generative AI model.
Your personal access token enables a secure API connection to GitLab.com or to your GitLab instance. This API connection securely transmits a context window from your IDE/editor to the GitLab AI Gateway, a GitLab-hosted service. The gateway calls the large language model APIs, and the generated suggestion is then transmitted back to your IDE/editor.
GitLab selects best-in-class large language models for specific tasks. We use Google Vertex AI Code Models and Anthropic Claude for Code Suggestions.
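The request flow described in this section can be pictured with a small in-process simulation. This is only an illustrative sketch: `request_suggestion`, `ai_gateway`, and `stub_model` are hypothetical names that do not correspond to any GitLab API, the token check is a placeholder for real authentication, and no network call is made.

```python
from typing import Callable


def stub_model(prompt: str) -> str:
    """Stand-in for a large language model API (assumption, not a real vendor call)."""
    return f"# suggestion based on {len(prompt)} characters of context"


def ai_gateway(token: str, context: str,
               model: Callable[[str], str] = stub_model) -> str:
    """Hypothetical gateway: check the token, call the model, return its output."""
    if not token:
        raise PermissionError("a personal access token is required")
    return model(context)


def request_suggestion(token: str, context: str) -> str:
    """IDE/editor side: send the context window and receive the suggestion."""
    return ai_gateway(token, context)


suggestion = request_suggestion("example-token", "def main():\n    ")
```

The model is passed in as a callable so that the gateway step stays independent of any particular vendor, which mirrors the idea that different models can be selected for different tasks.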
Telemetry
For self-managed instances that have enabled Code Suggestions and for SaaS accounts, we collect aggregated or de-identified first-party usage data through our Snowplow collector. This usage data includes the following metrics:
- Language the code suggestion was in (for example, Python)
- Editor being used (for example, VS Code)
- Number of suggestions shown, accepted, rejected, or that had errors
- Duration of time that a suggestion was shown
- Prompt and suffix lengths
- Model used
- Number of unique users
- Number of unique instances
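To make the shape of these metrics concrete, the following sketch models one usage event and reduces a batch of events to aggregate counts. It is an assumption-laden illustration, not the actual Snowplow schema: the `SuggestionEvent` class, its field names, and the opaque `user_hash` and `instance_hash` identifiers are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestionEvent:
    """Hypothetical de-identified event; field names are illustrative only."""
    language: str        # for example, "python"
    editor: str          # for example, "vscode"
    outcome: str         # "shown", "accepted", "rejected", or "error"
    shown_ms: int        # how long the suggestion was displayed
    prompt_length: int
    suffix_length: int
    model: str
    user_hash: str       # opaque identifier, used only for unique counts
    instance_hash: str   # opaque identifier, used only for unique counts


def aggregate(events):
    """Reduce events to counts that carry no per-user detail."""
    return {
        "outcomes": dict(Counter(e.outcome for e in events)),
        "unique_users": len({e.user_hash for e in events}),
        "unique_instances": len({e.instance_hash for e in events}),
    }


events = [
    SuggestionEvent("python", "vscode", "accepted", 1200, 480, 60, "model-a", "u1", "i1"),
    SuggestionEvent("python", "vscode", "rejected", 300, 512, 0, "model-a", "u1", "i1"),
    SuggestionEvent("go", "neovim", "accepted", 900, 350, 40, "model-b", "u2", "i1"),
]
summary = aggregate(events)
```

Note that the event records lengths and counts only; no prompt text or code content appears in any field.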
Inference window context
Code Suggestions runs inference against the currently opened file, the content before and after the cursor, the filename, and the extension type. For more information on possible future context expansion to improve the quality of suggestions, see epic 11669.
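As a rough illustration of the inputs listed above, the sketch below assembles a context window from a file's text and a cursor offset. The function name `build_context`, the dictionary keys, and the `max_chars` limit are assumptions made for this example; the actual payload format and size limits are not documented here.

```python
import os


def build_context(path: str, text: str, cursor: int, max_chars: int = 2000) -> dict:
    """Collect the inputs named above; max_chars is an assumed size limit."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= cursor <= len(text):
        raise ValueError("cursor is outside the file")
    prefix = text[:cursor][-max_chars:]   # keep the text nearest the cursor
    suffix = text[cursor:][:max_chars]
    return {
        "filename": os.path.basename(path),
        "extension": os.path.splitext(path)[1],
        "prefix": prefix,
        "suffix": suffix,
    }


source = "def add(a, b):\n    return \n"
cursor = source.index("return ") + len("return ")
context = build_context("src/math_utils.py", source, cursor)
```

Here the prefix ends immediately after `return `, so a model receiving this context would be asked to complete the expression at that position.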
Training data
GitLab does not train generative AI models based on private (non-public) data. The vendors we work with also do not train models based on private data.
For more information on GitLab Code Suggestions data sub-processors, see:
- Google Vertex AI Codey APIs data governance and responsible AI.
- Anthropic Claude’s constitution.