
Hemant Bhatt
Zero-Cost Code Assistance: Setting Up Local Completions in VSCode

Introduction
Your GPU isn't just for gaming anymore—it's your ticket to private, lightning-fast code completions without spending a dime. These local completions offer three game-changing benefits:
- 100% Free - No subscription fees or credit limits
- Completely Private - Your code never leaves your machine
- Remarkably Fast - Instant suggestions without the network lag
Ollama
Before we get started with the code suggestions, you will need to install Ollama. Head over to Ollama.com and download and install the latest version for your OS.
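Once the installer finishes, you can confirm the CLI is available from a terminal:

```sh
# Check that the Ollama CLI is on your PATH
ollama --version

# List locally installed models (empty on a fresh install)
ollama list
```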
Choose Your AI Companion
Once installed, we need to pick an LLM model. For simplicity's sake, I recommend qwen2.5-coder:0.5b. It's like starting with a reliable sedan before test-driving sports cars. You can always explore other models later when you're feeling adventurous. To install it, run the command below in a terminal or command prompt:

```sh
ollama run qwen2.5-coder:0.5b
```
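The run command downloads the model and then drops you into an interactive chat session. If you only want to download the model for Continue to use later, pulling it is enough:

```sh
# Download the model without starting an interactive session
ollama pull qwen2.5-coder:0.5b
```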
Install and Setup Continue
Continue is the Visual Studio Code extension that will let you connect your Qwen coder model and use it in the editor. Go ahead and install it.
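If you prefer the terminal, VS Code's CLI can install extensions by their marketplace ID. I believe the ID below is the one used on the marketplace, but double-check it there if the command fails:

```sh
# Install the Continue extension from the command line
code --install-extension Continue.continue
```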
Once installed, click the Continue icon in the left sidebar of VS Code. During setup, choose Local Assistant. Now it is time to connect your model to this assistant: click the local assistant and then the settings icon.
A config JSON file will open; add the following key-value pair to it. Here we are specifying that for code completions, we want Continue to use the qwen2.5-coder:0.5b model served by the Ollama provider:

```json
"tabAutocompleteModel": {
  "title": "Qwen code autocomplete",
  "provider": "ollama",
  "model": "qwen2.5-coder:0.5b"
},
```
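For context, here is a minimal sketch of how that entry sits inside the full config file. The surrounding keys are illustrative; your config.json will likely contain more:

```json
{
  "models": [],
  "tabAutocompleteModel": {
    "title": "Qwen code autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:0.5b"
  }
}
```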
That’s it! Your free autocomplete is ready to be tested.
[Demo: code completions from the local model in action]
Qwen Coder
Qwen2.5-Coder comes in six variants, each with a different parameter size: 0.5B, 1.5B, 3B, 7B, 14B, and 32B.
You can take a practical approach by starting with the smallest parameter model and working your way up, as shown below. How well these models run on your system directly depends on your hardware capabilities. The goal is to find the Goldilocks zone: a model large enough to provide useful code completions, but not so large that it slows your system to a crawl.
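Stepping up is a single pull away; swap the tag for whichever size you want to try, then update the "model" field in your config to match:

```sh
# Try a larger variant; adjust the tag to the size your hardware can handle
ollama pull qwen2.5-coder:7b
```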
Chat
To configure the chat model, you have to add a new key-value pair to the config JSON. Open the config JSON the same way you did previously, add a 'models' entry to it if it does not exist, and add the Qwen coder model to it. The 'models' key is for configuring the chat models, whereas the 'tabAutocompleteModel' key we saw previously is for configuring the autocomplete model.

```json
"models": [
  {
    "title": "QWEN coder 0.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:0.5b"
  }
]
```

Setting up a remote chat model is equally easy: just click the '+ Add chat model' button and choose the model and provider. For chat-related tasks, I would recommend a powerful model provider like Gemini, OpenAI, or Anthropic. Although you have to pay for the tokens, the results will far exceed those of a local model.
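If you would rather edit the config by hand than use the button, a remote chat model is just another entry in the models array. This is a sketch assuming an OpenAI key; the exact fields for your provider may differ, so check Continue's docs:

```json
"models": [
  {
    "title": "QWEN coder 0.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:0.5b"
  },
  {
    "title": "GPT-4o",
    "provider": "openai",
    "model": "gpt-4o",
    "apiKey": "YOUR_OPENAI_API_KEY"
  }
]
```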
Conclusion
As you grow more comfortable with your local setup, consider experimenting with different models to find the perfect balance between intelligence and performance for your specific hardware. The world of local AI assistants is rapidly evolving, with new and improved models being released regularly.