If you use GitHub Copilot Free, Pro, or Pro+, your code is being used to train AI models starting April 24. Not just your public repos. Your interaction data, which includes snippets from whatever you're working on, including private repositories.
And it's opt-out, not opt-in. Meaning it's already enabled unless you go turn it off.
What Actually Changed
GitHub quietly updated their Copilot interaction data usage policy on March 25. The key change: they will now use interaction data from individual Copilot users to train and improve AI models.
Interaction data includes:
- Code snippets you accept or modify from Copilot suggestions
- The code context around your cursor when Copilot activates
- Comments and documentation in your files
- File names and repository structure
- Your navigation patterns
- Chat conversations with Copilot
- Thumbs up/down feedback
That's a lot more than "we look at what you accept to improve suggestions." That's your file structure, your naming conventions, the comments you write, the way you navigate your codebase. For anyone working on proprietary software, this is your company's intellectual property flowing into a training pipeline.
The Private Repo Problem
Here's the part that caught my attention. When you use Copilot in a private repository, it sends code context to generate suggestions. That's expected. But now that context, your private code that was sent for suggestion generation, can be used for model training.
GitHub is careful to say they don't access private repositories "at rest." But the moment you open a file and Copilot activates, that code becomes interaction data. And interaction data is now fair game.
This creates a weird middle ground. Your private repo is still private. But the code you actually work on in that repo is not. The distinction between "stored code" and "interaction data" is doing a lot of heavy lifting in this policy.
The Opt-Out Problem
The setting is opt-out by default. GitHub frames the enabled state as "You will have access to the feature," making data donation sound like a perk rather than a cost. Multiple users on Hacker News reported finding the setting enabled when they expected it to be disabled. At least one person claimed it re-enabled itself after being turned off, though that hasn't been confirmed.
You can disable it in Settings > Copilot > Privacy. But how many of the millions of Copilot Free users will actually do that? The default state is what matters for the vast majority of users, and the default is "send your data."
Enterprise and Business accounts are excluded from this policy, which is smart from a legal perspective. But it creates an asymmetric situation: if an individual developer contributes to a corporate repo using a personal Copilot Free account, that interaction data could still enter the training pipeline even if the company has enterprise protections.
The Credential Risk Nobody Mentions
This one genuinely concerns me. Copilot processes whatever is in your working directory. There is no built-in mechanism to exclude sensitive files. If you have API keys, database credentials, or secrets in your project (even if they're in .env files that are gitignored), Copilot can still see them during active use because it reads your local files, not just what's in git.
Now that interaction data is used for training, there's a theoretical path from your local .env file to a training dataset. GitHub would presumably filter these out, but the policy doesn't explicitly address credential handling in interaction data.
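One lightweight mitigation is to audit your working directory for credential-shaped strings before opening it in a Copilot-enabled editor. The sketch below is illustrative, not exhaustive: the regex patterns and function names are my own, and a real setup should use a dedicated secret scanner rather than this.

```python
import re
from pathlib import Path

# Heuristic patterns for common credential formats (illustrative, not exhaustive).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"(?i)(api[_-]?key|secret|password)\s*[=:]\s*['\"]?[\w\-/+]{8,}"),
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
]

def find_likely_secrets(text: str) -> list[str]:
    """Return substrings of `text` that look like hard-coded credentials."""
    hits: list[str] = []
    for pattern in SECRET_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits

def scan_directory(root: str) -> dict[str, list[str]]:
    """Scan every file under `root`, including gitignored ones like .env,
    since Copilot reads local files rather than the git index."""
    findings: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            hits = find_likely_secrets(path.read_text(errors="ignore"))
        except OSError:
            continue  # unreadable file; skip
        if hits:
            findings[str(path)] = hits
    return findings
```

Anything this flags is a candidate for moving into an external secret manager or environment injection at deploy time, so it never sits in a file Copilot can read.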
What Mario Rodriguez Said
GitHub's VP of Product, Mario Rodriguez, explained the reasoning: incorporating interaction data showed "meaningful improvements, including increased acceptance rates in multiple languages." They tested this internally with Microsoft employee data first, saw positive results, and decided to expand it to all individual users.
The technical argument makes sense. Models improve with more diverse training data. The question isn't whether this makes Copilot better. It's whether users were adequately informed and given meaningful choice.
What You Should Do
If you want to opt out: go to Settings > Copilot > Privacy and disable the setting that allows your data to be used for model training.
If you're working on anything proprietary or sensitive, this is worth the 30 seconds. The improvement to Copilot's suggestions is marginal for you as an individual. The risk to your company's IP is not.
For teams, if you're not on Business or Enterprise, you should be. Individual accounts on corporate repos are the gap in this policy, and it's a gap that could matter.
The Bigger Picture
This isn't unique to GitHub. Every AI tool faces the same tension: models need data to improve, and the best data comes from real usage. The question is always about consent, defaults, and transparency.
GitHub's approach, opt-out by default with a settings toggle, is the same playbook every major platform uses. It's legal. It's probably GDPR-compliant if they handle consent flows correctly in the EU. But it's not what informed consent looks like.
The developers who read the blog post and find the setting are fine. The millions who don't read blog posts and don't check their privacy settings are the ones this policy is designed around. And that asymmetry is the whole point.
