Notes: Local LLM (Ollama) Co-pilot Alternative

Over the holidays and in the first week of 2025, I muddled through setting up an alternative to Copilot using Ollama on my Mac and Continue as the UI in VS Code. I was thinking about writing an article about it and posting it here. However, this article on the IBM Developer blog, "Build a local AI co-pilot using IBM Granite Code, Ollama, and Continue," is very thorough. I'd say it covers pretty much everything, though I cannot vouch for their models, simply because I have not yet tried them (see the update below). Here are some of the settings from the Continue config I'm using right now:

{
  "models": [
    {
      "title": "Llama3 Chat",
      "model": "llama3-8b",
      "provider": "ollama"
    },
    {
      "title": "Autodetect",
      "model": "AUTODETECT",
      "provider": "ollama"
    },
    {
      "title": "StarCoder2:15b",
      "model": "starcoder2:15b",
      "provider": "ollama"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2:3b",
    "model": "StarCoder2:3b",
    "provider": "ollama"
  },
  "embeddingsProvider": {
    "title": "Nomic Embed Text",
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
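
For this config to work, each model referenced above has to be pulled into Ollama first. Here's a rough sketch of what that looks like on the command line (the tags match the ones in my config; run ollama list afterwards to confirm what you have):

ollama pull llama3:8b          # chat model
ollama pull starcoder2:15b     # bigger code model, for chat-style questions
ollama pull starcoder2:3b      # small, fast model for tab autocomplete
ollama pull nomic-embed-text   # embeddings model for codebase indexing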

I started on this journey of discovery because of a confluence of events. First, GitHub made a serious pitch to get me to pay for Copilot, sending me lots of e-mails about an "expiring coupon". I did some research and figured out that "expiring coupon" meant they wanted me to start paying for Copilot. Then I read this post on Simon Willison's blog: I can now run a GPT-4 class model on my laptop. I was inspired to try my hand at the same, but discovered that the model Willison was using was still way too big for my Mac. Then GitHub started pinging me that "Copilot is free," so I gave up on local AI and turned Copilot back on… until I figured out that "free" for Copilot meant "free for a limited time."

So I gave it a week's worth of discovery time, and I found some articles on models that would be better suited for a Copilot-esque LLM-powered coding assistant. After a bunch of experimentation, I settled on llama3:8b for my chat model and starcoder2:3b for my autocomplete model. This has worked out fairly well: it feels pretty much like working with Copilot, which isn't as much of a compliment as you'd imagine, because Copilot (and autocomplete in general) can be pretty annoying at times.

Future experimentation

I plan to try out the IBM models to see if they work any better, and I'll update this article with my thoughts on that. I'd also like to experiment with writing system prompts using Continue's actions feature; a rough sketch of what that might look like is below.
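
If I'm reading Continue's docs right, custom slash commands are the simplest entry point for this kind of prompt-writing. Something like the following could go in the same config file (the "explain" command and its prompt wording are just a placeholder I sketched out, not settings I've tested):

{
  "customCommands": [
    {
      "name": "explain",
      "description": "Explain the highlighted code",
      "prompt": "{{{ input }}}\n\nExplain what the code above does, step by step, in plain English."
    }
  ]
}

With that in place, highlighting some code and typing /explain in the chat panel should run the prompt through whichever model is selected.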

UPDATE: the IBM models are really nice

First, I must send a shout-out to my colleague at UCSF, Eric Guerin, who first brought the IBM blog post to my attention. I've tried out the IBM models, and they feel much faster than the configuration I was using before (with StarCoder and Nomic Embed Text). I invite you to play around with different models and see what fits best for your work, but in the end, I think I'll stick with the IBM models. For the curious, here is roughly what the switch looks like in my config.
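
The granite-code tags below come from the Ollama library by way of the IBM post; treat this as a sketch to adapt rather than exact settings, since the right model sizes depend on your hardware:

{
  "models": [
    {
      "title": "Granite Code 8B",
      "model": "granite-code:8b",
      "provider": "ollama"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Granite Code 3B",
    "model": "granite-code:3b",
    "provider": "ollama"
  }
}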