Tomato Soup

How to Set Up a Fully Local AI in Visual Studio

Blueprint-style illustration: a laptop with Visual Studio open, a neural network diagram beneath, and a context menu highlighting 'Generate Function...' to suggest AI-assisted coding.

VA Intelligence is Visual Assist’s local AI feature. It runs Gemma3 on your GPU through Ollama, both bundled and managed by Visual Assist itself. No API keys, no cloud round-trip, no telemetry pipeline waiting to be audited by your security team.

By the end of this guide you’ll have two features wired up inside Visual Studio: Explain with AI and Change Code with AI. If you’ve poked at Copilot or Cursor and bounced off the “your code leaves the building” question, this is the alternative that keeps it all local on your device.


Before you start

A note on Visual Studio version: Visual Assist runs on Visual Studio 2017, 2019, 2022, and 2026. New features are tested against the newest release, so we recommend the latest stable version of Visual Studio. If you’re still on an older version, the install steps below are the same, but we can’t promise the same polish.

A note on GPU: A supported GPU is required. CPU-only is technically possible but slow enough that we don’t recommend it. See the GPU requirements section at the end for details.


Step 1 — Enable VA Intelligence

VA Intelligence ships disabled. Nothing about it — not Ollama, not the model weights, not a background process — exists on your machine until you opt in.

  1. In Visual Studio 2026, open VAssistX -> Visual Assist Options -> VA Intelligence.
  2. Check Enable VA Intelligence.
  3. Click Setup VA Intelligence.

A confirmation dialog appears showing what’s about to be installed (Ollama 0.11.8 and Gemma3:12b, ~9 GB), the minimum hardware spec, and a link to the Whole Tomato AI questions page.

  4. Click Proceed to install Ollama and Gemma.
  5. The download starts. You can close the Options dialog and keep working; reopen it any time to check progress.
  6. When the install finishes, you’ll see a green check and the message VA Intelligence is set up and ready to go. No Visual Studio restart required.

If you ever want to reclaim the disk space, Remove Files in the same panel uninstalls everything VA Intelligence put on the machine. Free Memory clears the model out of RAM and VRAM until your next request — useful when you want your GPU back for a build or a game.


Step 2 — Explain with AI

The first feature: select code, ask Visual Assist what it does, and get an answer grounded in your actual file rather than a generic Stack Overflow average.

  1. Select a symbol, expression, or block in your C++ file.
  2. Open the VA Quick Actions menu with Ctrl + Shift + Q.
  3. Choose Explain ‘<symbol>’ with AI.

  4. The explanation appears in a panel beside your code. Gemma3 runs locally, so the first response on a cold model takes a few seconds; subsequent requests are faster.


What gets sent: your selection plus a bounded amount of surrounding context that VA collects from the open file. Your full project, build configuration, and anything outside that window stay where they are.
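To make "a bounded amount of surrounding context" concrete, here is a minimal Python sketch of one way such a window could be cut from an open file. It is an illustration only: the function name `bounded_context`, the per-side limit `max_context_lines`, and the line-based windowing are all assumptions, not a description of how Visual Assist actually collects context.

```python
def bounded_context(lines, sel_start, sel_end, max_context_lines=20):
    """Split a file into (before, selection, after) around a selection.

    sel_start and sel_end are 0-based, inclusive line indexes.
    At most max_context_lines lines are kept on each side (a hypothetical
    limit); everything further away is left out of the request.
    """
    if not 0 <= sel_start <= sel_end < len(lines):
        raise ValueError("selection is outside the file")

    # Clamp the window to the file boundaries.
    before_start = max(0, sel_start - max_context_lines)
    after_end = min(len(lines), sel_end + 1 + max_context_lines)

    before = lines[before_start:sel_start]
    selection = lines[sel_start:sel_end + 1]
    after = lines[sel_end + 1:after_end]
    return before, selection, after


# Example: a 100-line file, lines 50-52 selected, 5 lines of context per side.
file_lines = [f"line {i}" for i in range(100)]
before, selection, after = bounded_context(file_lines, 50, 52, max_context_lines=5)
```

Whatever the real limits are, the property that matters is the same: the request is built from a slice of one file, not from the project.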


Step 3 — Change Code with AI

The second feature: tell Visual Assist what you want to change, review the diff, accept or reject. Nothing gets applied without your say-so.

  1. Select the code you want to change.
  2. Press Ctrl + Shift + Q for Quick Actions.
  3. Choose Change Code with AI.
  4. In the Describe the change prompt, write a specific instruction.
    “Rename this parameter to snake_case” is better than “clean this up.” Gemma3 is a capable model, but it rewards precision.

  5. Press Ctrl + Enter to run.
  6. VA shows a side-by-side diff: your original on the left, the proposed change on the right. Use Accept to apply, Reject to drop it, or Regenerate to try again.


The diff is the contract. VA Intelligence will never modify a file without showing you the change first. There is no “apply automatically” toggle.
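The review-before-apply flow can be sketched in a few lines of Python using the standard library's `difflib`. This is not Visual Assist's implementation; the helper names `propose_change` and `review`, the file name `example.cpp`, and the sample C++ snippet are hypothetical, and a unified diff stands in for VA's side-by-side view.

```python
import difflib


def propose_change(original: str, proposed: str, filename: str = "example.cpp") -> str:
    """Return a unified diff between the original and the proposed text."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{filename} (original)",
        tofile=f"{filename} (proposed)",
    )
    return "".join(diff)


def review(original: str, proposed: str, accepted: bool) -> str:
    """Return the text that should end up in the file.

    The proposal is used only when the reviewer explicitly accepts it;
    rejecting leaves the original untouched.
    """
    return proposed if accepted else original


# Hypothetical before/after for a parameter-renaming instruction.
original = "int area(int Width, int Height) {\n    return Width * Height;\n}\n"
proposed = "int area(int width, int height) {\n    return width * height;\n}\n"

diff_text = propose_change(original, proposed)
print(diff_text)
```

The design point is that the diff is produced first and the file content changes only as a result of an explicit decision.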


How it all fits together

Visual Studio calls Visual Assist. Visual Assist calls Ollama, which calls Gemma3. The result comes back the same way. Every box in that chain lives on your machine.
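For a sense of what the last hop in that chain can look like, here is a hedged Python sketch of a request to a local Ollama server over its HTTP API. It assumes a stock Ollama install listening on its default address, `http://localhost:11434`; whether Visual Assist's bundled Ollama is reachable there is not confirmed by this guide. The prompt wording and the function names are also hypothetical.

```python
import json
import urllib.request

# Assumption: stock Ollama default address. VA's bundled instance may differ.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_explain_payload(code: str, model: str = "gemma3:12b") -> dict:
    """Build the JSON body for a single, non-streaming generate request."""
    # Hypothetical prompt; the prompt Visual Assist uses is not documented here.
    prompt = "Explain what the following C++ code does:\n\n" + code
    return {"model": model, "prompt": prompt, "stream": False}


def explain(code: str, url: str = OLLAMA_URL) -> str:
    """Send the request to the local server and return the generated text."""
    body = json.dumps(build_explain_payload(code)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # With "stream": False, Ollama replies with one JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `explain("int x = 0;")` only works while a local Ollama server is running with the model installed. The only network address involved is `localhost`.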


Try it on your codebase

The whole point of running locally is that you can point it at the code you’re not allowed to share with anyone else. Open the file you’d never paste into ChatGPT or Claude and use Explain with AI to find out what the weirdest function in there is doing.

If you don’t have Visual Assist yet, the 30-day trial includes VA Intelligence.

Try Visual Assist
30-day free trial · No credit card


Want to know more?

Why local? The design behind VA Intelligence

VA Intelligence is the opposite of how most AI coding tools have been built. Most assume the model is the product and your code is the input. We assume your code is the product, and the model is a tool you reach for when you want it.

That posture shapes everything. The model is local because a model that ships your code somewhere else has already made the call for you. The features are opt-in because AI should be a thing you trigger, not a thing that follows you around. The diff is mandatory because non-deterministic output on a million-line C++ codebase is a reason to keep the human in the loop, not a quirk to wave past.

More context-aware features are coming. The line above is the part that won’t move.

GPU requirements and troubleshooting

VA Intelligence needs a GPU with 12 GB of VRAM or more. NVIDIA cards on recent drivers (531+) and AMD cards with ROCm are both supported. See Ollama’s GPU support docs for the full compatibility matrix.
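If you want to check an NVIDIA card against those numbers before installing, a small Python script can read them from `nvidia-smi`. This is a sketch under stated assumptions: it covers NVIDIA only, the helper names are hypothetical, and the VRAM threshold is set a little below 12 × 1024 MiB as an allowance for cards that report slightly less than their advertised size (that allowance is our assumption, not an official figure).

```python
import subprocess

# Assumption: small allowance below 12 * 1024 MiB for reporting differences.
MIN_VRAM_MIB = 12_000
# Minimum NVIDIA driver major version mentioned in this guide.
MIN_DRIVER_MAJOR = 531


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line of the form 'name, memory.total, driver_version'."""
    # rsplit keeps the GPU name intact even if it contains a comma.
    name, memory_mib, driver = [field.strip() for field in line.rsplit(",", 2)]
    return {
        "name": name,
        "vram_mib": int(memory_mib),
        "driver_major": int(driver.split(".")[0]),
    }


def meets_spec(gpu: dict) -> bool:
    """True if the GPU has enough VRAM and a recent enough driver."""
    return gpu["vram_mib"] >= MIN_VRAM_MIB and gpu["driver_major"] >= MIN_DRIVER_MAJOR


def query_gpus() -> list:
    """Ask nvidia-smi for every installed GPU (requires the NVIDIA driver)."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,driver_version",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]
```

Running `[meets_spec(gpu) for gpu in query_gpus()]` on the target machine gives a quick yes/no per card. Ollama's own compatibility matrix remains the authority on which cards are supported.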

CPU-only mode is technically possible, but the inference latency is high enough that the feature stops being interactive. If your workstation doesn’t meet spec, the most common workaround is to keep Visual Studio on the workstation and run Ollama on a separate desktop machine with a capable GPU. That’s outside the scope of this guide.

For any other setup issue, the VA Intelligence FAQ is where we keep current answers, or drop us a note.
