
Offline AI - Integrating Local LLMs with VSCode

Introduction

In the age of AI, developers are increasingly looking for ways to integrate powerful language models into their workflows. While cloud-based solutions like OpenAI’s GPT models are popular, there’s a growing demand for offline and private alternatives. This guide walks you through integrating a locally running LLM (DeepSeek R1 Distill LLaMA) with the Continue extension in Visual Studio Code.


Why Offline Mode is Useful

Running AI models locally offers several advantages:

  1. Privacy: Sensitive codebases and data never leave your machine, ensuring compliance with privacy policies and regulations.
  2. Speed: Local models eliminate latency caused by network requests, providing near-instant responses.
  3. Cost Efficiency: Avoid recurring API costs by leveraging your local hardware.
  4. Customization: Fine-tune models to your specific needs without relying on external providers.
  5. Offline Access: Work uninterrupted, even without an internet connection.

For developers who value control and independence, offline AI development is a game-changer.


The full details are on GitHub at continue-local-llm-integration.

🐳 Step 1: Run the LLM in Docker

To get started, you’ll need to run the DeepSeek R1 Distill LLaMA model locally using Docker. The runtime exposes an OpenAI-compatible API, which provides a robust foundation for offline AI development.

Steps:

  1. Pull the DeepSeek model from Docker Hub.
  2. Start the container and ensure it exposes an endpoint like:

    http://localhost:12434/engines/v1/chat/completions
  3. Test the setup using the provided test.ps1 script from the repository; a minimal sketch of that check appears below.
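
If you are running the model through Docker Model Runner (which, by default, serves OpenAI-compatible models on port 12434 under the /engines/v1 path), the whole step looks roughly like the sketch below. The ai/deepseek-r1-distill-llama tag is an assumption; match it to whatever you actually pull, and treat test.ps1 as the authoritative check.

    # Pull the model (assumes Docker Model Runner; adjust the tag as needed)
    docker model pull ai/deepseek-r1-distill-llama

    # Send a minimal OpenAI-style chat request to verify the endpoint responds
    $body = @{
        model    = "ai/deepseek-r1-distill-llama"   # hypothetical tag; match your pulled model
        messages = @(@{ role = "user"; content = "Say hello in one sentence." })
    } | ConvertTo-Json -Depth 5

    Invoke-RestMethod -Uri "http://localhost:12434/engines/v1/chat/completions" `
        -Method Post -ContentType "application/json" -Body $body

A successful setup returns a JSON response with a choices array containing the model’s reply.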

🧠 Step 2: Install the Continue Extension

The Continue extension for Visual Studio Code is a powerful tool that brings AI-assisted coding to your IDE. It supports OpenAI-compatible APIs, making it an ideal choice for integrating local LLMs.

Installation:

  1. Open Visual Studio Code.
  2. Navigate to the Extensions Marketplace.
  3. Search for “Continue” and install the extension.
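
Alternatively, if the code CLI is on your PATH, you can install the extension from a terminal. The Continue.continue identifier below is the extension’s marketplace ID; if the command fails, search the marketplace to confirm it:

    # Install the Continue extension via the VS Code CLI
    code --install-extension Continue.continue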

⚙️ Step 3: Configure Continue

To connect the Continue extension to your locally running LLM, you’ll need to update its configuration.

  1. Locate the sample-config\config.yaml file in the repository.
  2. Copy the file to %userprofile%\.continue on your machine.
  3. Update the configuration to point to your local LLM endpoint:

    api_url: http://localhost:12434/engines/v1/chat/completions
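
For reference, a complete model entry in Continue’s config.yaml might look like the sketch below. Exact key names vary between Continue versions (recent releases use apiBase on a models entry, and the client appends /chat/completions itself), so treat this as illustrative and defer to the sample-config in the repository:

    models:
      - name: DeepSeek R1 Distill (local)   # display name; anything you like
        provider: openai                    # talks to any OpenAI-compatible API
        model: ai/deepseek-r1-distill-llama # hypothetical tag; match your local model
        apiBase: http://localhost:12434/engines/v1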

⚽ Step 4: Start Coding with AI

Once everything is set up, you can start leveraging the power of local AI models directly in your IDE. The Continue extension provides features like code completion, refactoring suggestions, and more.


Conclusion

By running models offline, you gain privacy, speed, and control over your development environment. That said, running models locally consumes significant resources, at least on my machine; hopefully the experience is smoother on yours.

Credits

Banner image from BoliviaInteligente

Dushyant
