With AI-based code completion being the latest rage, I wanted to check out the basics of local capabilities that are available as an alternative to Cursor's hosted services. My primary prerequisite is that it must have a native plugin for Visual Studio Code. This search brought me to Llama.VSCode, which (among other things) allows you to do code completion via an underlying Llama.cpp server that utilizes your local GPU. Since I have an RTX 5070 Ti, I'm going to use the FIM Qwen 7B model. The setup is pretty straightforward, unless you are like me and do your development work inside a WSL session. This post details how to accomplish code completion from the WSL environment.
First, we're going to install llama.cpp, which can be done via their releases page or, in my case on the host Windows 11 box, by using the winget command.
winget install llama.cpp
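
If you want to sanity-check the install first, the binary can report its version and build info (you may need a fresh shell before your PATH picks it up):
llama-server --version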

Once installed, you'll need to reload your shell (PowerShell) and start llama-server with your desired model, allowing it to listen on all IP addresses.
llama-server --fim-qwen-7b-default --host 0.0.0.0

The download process can take a while, but once it's completed your output should show the instance listening. Be mindful of your firewall rules here; you might need to open the port up.
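For example, if Windows Defender Firewall is blocking inbound connections, a rule like the following (run from an elevated PowerShell prompt) should open things up. Port 8012 is what my instance ended up listening on, so adjust if yours differs:
New-NetFirewallRule -DisplayName "llama-server" -Direction Inbound -Protocol TCP -LocalPort 8012 -Action Allow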
Next, we're going to install the llama-vscode plugin into the WSL-connected Visual Studio Code instance.
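If you prefer the command line, you can also install it from a terminal inside your WSL session, which puts it in the WSL-connected instance directly. The extension ID below is what I believe it is listed under in the Marketplace, so double-check it against what you see there:
code --install-extension ggml-org.llama-vscode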

Out of the box, local usage of an environment is very easy: you simply click on llama.vscode in the bottom right of the VS Code window and hit "Select/start env". However, since we're not running a "local" instance on the WSL VM, we instead need to add a new Completion Model as shown below.

The process is simple:
- Name = QWEN 7b
- Local Start Command = omitted
- Endpoint = http://<host IP>:8012 (see the note after this list for finding the host IP from inside WSL)
- Model Name = omitted
- API Key = False
After following the wizard you will be prompted to review and save the settings.
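
From inside WSL (with the default NAT networking; mirrored mode behaves differently), the Windows host is typically reachable at the WSL default gateway address. A quick sketch for finding it and confirming the server is reachable, using llama-server's /health endpoint:
# Windows host IP as seen from inside WSL (default NAT networking)
HOST_IP=$(ip route show default | awk '{print $3}')
# llama-server exposes a /health endpoint we can use to confirm connectivity
curl http://$HOST_IP:8012/health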

At this point auto-completions will work; however, the settings are not saved and will not survive if you close out VS Code.
You can also use the same model for chat sessions; all you need to do is repeat the above steps, but for a Chat Model.
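If you want to sanity-check the chat side before wiring it into the plugin, llama-server also serves an OpenAI-compatible chat endpoint on the same port (substitute your host IP as before):
curl http://<host IP>:8012/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Say hello"}]}'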
To save our Completion Model, we need to select the "Envs" setting and add a new Env that references it.

This pops up an awkward dialog in the lower left corner, where you need to hit the "Compl" button and select your newly created Completion Model. If you have configured a Chat Model as well, you'll need to add it here too. After you do this, you will also need to click the "Add Env" button to save your changes.

At this point your changes are now saved and will survive future sessions. You can enable the auto-completion functionality by selecting your new environment alongside all the other defaults.

Now, you are free to enjoy the various benefits offered by your auto-complete functionality…

As well as using the chat functionality.
