Copilot on blog.iankulin.com

Where I'm up to with AI for coding

Mon, 03 Mar 2025 00:00:00 +0000

There’s still plenty of controversy about LLMs for coding, and not without reason. But I thought I’d run through what I’ve tried, and where I’ve landed for using AI. I’ll also cover what the pitfalls are, where it’s useful and how it’s changed my practice.

Issues

Training data

The training data for large language models generally is problematic. There’s no doubt that they have been trained on copyrighted material. With code it’s slightly less murky since there is a high availability of good quality open source data with attached licenses to train models on. No doubt this includes code written by people who don’t approve of it being used by AI, but I think the popular reading of most open source licenses is that using it for training is fine.

Accuracy

Another area where AI code is better than other AI use is in verifiability. It’s possible to write good tests to verify a lot of software behaviour. This somewhat negates the problem of hallucinations.

Energy Use

Energy use is an issue I don’t really have an answer for. When IT companies are investigating owning their own power stations, that’s a clear sign that this is a problem the experts expect to get worse rather than better. I’ve lived through so many IT bubbles now that I’m sure the hype around AI will die down somewhat, and in a few years there won’t be VC money for adding AI to products to make them worse. Hopefully, AI will be left running only in the areas where it’s genuinely helpful, like most of the previous IT fashions.

I also have a growing suspicion that we might have got to the end of the performance gains of making models bigger. Surely by now all of the data that can be gobbled up has been, and the improvements seem to be coming in smaller steps. I imagine future gains won’t involve making models bigger, but integrating them into tasks more effectively or building them to be more focused.

Nevertheless, for the moment, the power usage, especially for training, and especially now that the US energy mix looks like it’s moving away from renewables, is my main concern about AI use.

Leaking Data

Another issue is leaking data. This does not overly affect me since I open source my code anyway, but anyone using it in a real job would have to follow policy on this, which in most cases would be - don’t use it. There is a problem related to the AI vacuuming up all its context from everything in your projects that does worry me. Because I’m so comfortable in VS Code and git, I keep all my work notes as markdown and manage them in VS Code, and I also use plain text accounting (Beancount). I don’t want any of that data heading out into the AI behemoths, so I’m constantly turning the plugins off and on.

It is possible to use local models, especially if you’re on a Mac. I’ve used Ollama with the Continue plugin for code completion and kept my data to myself. More about this experience later.

What I’ve tried

I used GitHub Copilot for the trial period and was so impressed with it I paid for the service for a couple of months. This was mainly for code completion although I did use the chat a bit - it just wasn’t as comfortable in the editor.

I switched to Codeium after hearing Kevin Hou on a Syntax episode. For code, this seems right on par with my (now outdated) experience of GitHub Copilot. Copilot did seem a bit better at figuring things out from the context though. For example, my plain text accounting format is probably not in the training data for either service, and when I let them, both would produce suggestions in the correct format, but Copilot’s were better: it would suggest an expense was for fuel if the payee was a petrol station that appeared elsewhere in my current file.

I then discovered Ollama, and with an M1 MacBook it’s a really simple matter to just pull models down and play with them. Mostly at the command line, but I did use Open WebUI a bit for a more ChatGPT-like experience. I played around with trying to do RAG via Open WebUI but with poor results.

Using Ollama (which provides a REST-style API to your models) I switched to the Continue VS Code plugin so I could do code completion locally. This worked fine, but it was a bit slower than Copilot or Codeium. Only by a bit, but the difference was that it was thinking slower than me, so I would have to wait for it, whereas with the big online services I was constantly typing over their suggestions. So I gave up on it. If my current M1 MacBook dies I’ll buy an M4 and try this again.
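For anyone who hasn’t poked at that API directly, here is a minimal sketch of what a call looks like from Python, using only the standard library. It assumes Ollama is running on its default local port and that a model has already been pulled. The model name `llama3` and the function names are placeholders for illustration, not anything Continue itself uses.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_completion_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint.

    The default model name is a placeholder; use whichever model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local model and return the generated text."""
    request = build_completion_request(prompt, model)
    with urllib.request.urlopen(request) as reply:
        # With streaming off, the reply is one JSON object with a "response" field.
        return json.loads(reply.read())["response"]


# Example call (needs a running Ollama instance, so it is left commented out):
# print(complete("Write a JavaScript function that reverses a string."))
```

Nothing in this leaves the machine, which is the whole attraction of the local setup.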

I have used, and continue to use, a combination of Claude, ChatGPT, V0, and DeepSeek Coder in the web browser chat modes. In fact, this is probably my main use. I don’t pay for any of them (thank you venture capitalists) and just move across to a different one when I run out of free queries.

Most of this use is the sort of questions you might ask your mates at work - how would you tackle this? what’s a good library for this? what do you think of this approach? can you have a look over my code and suggest improvements? Working in webchat mode reduces the context available (compared to your entire project) but I’ve grown to actually prefer the tight control it gives me when I’m asking specific code questions.

How I use it now

I use Codeium via its VS Code plugin for code completion. Sometimes this is amazing - it spits out what’s in your head, and follows your naming conventions etc. Other times it doesn’t and I just keep typing.

What it’s really good at is anything repetitive. I especially love it for tests, once I’ve written a couple of tests against edge cases in my code, it gets the flavour of what I want and starts writing good ones, including some I wouldn’t have thought of which is gold. This is often a tab, tab, tab, exercise.
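As an illustration of that pattern (not code from one of my projects), here is a sketch of a table of edge cases around a hypothetical `slugify` helper. The function and the cases are made up for the example. The point is the shape: once the first couple of rows exist, each further row is exactly the kind of thing a completion tool can propose.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated slug (hypothetical helper)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyEdgeCases(unittest.TestCase):
    # Each row is (input title, expected slug). The repetitive shape is what
    # makes further rows easy for an assistant to suggest.
    CASES = [
        ("Hello World", "hello-world"),
        ("  padded  ", "padded"),
        ("", ""),
        ("!!!", ""),
        ("Node/Express vs SvelteKit", "node-express-vs-sveltekit"),
        ("already-a-slug", "already-a-slug"),
    ]

    def test_edge_cases(self):
        for title, expected in self.CASES:
            with self.subTest(title=title):
                self.assertEqual(slugify(title), expected)
```

Saved to a file, this runs with `python -m unittest`, and every suggested row still gets checked against the real function, so a bad suggestion shows up as a failing case rather than slipping through.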

I spend a lot of time in long form conversations in the web interfaces of the major chatbots. Usually this is quite fruitful. I often get it to generate code, or to add behaviours to code I’ve given it, which I then transfer over manually. If it gets into a muddle, I usually clear its memory and start a new chat or move over to a different service. Having the wrong ideas or code in the context seems to lead to a chain of stupider and stupider attempts to fix the symptoms of a problem rather than going back and identifying it. It’s possible that my fresh explanation of what I’m trying to do, the code I’ve got, and what the issue is, is also helpful in this restart.

How it’s changed my style

With any tool, using it well involves understanding its strengths and leaning into them. AI is no different, and here are the things I do to help it help me, or things that it’s made possible.

The first change has just been to improve my craft in ways I should have been doing anyway, but which you can let slide as a solo developer. This is stuff like clear comments, thoughtful descriptive names, and good separation of ideas. This helps the AI as much as it would help someone reviewing your code, or future you when you come back to maintain it. I also like my files to be smaller than I used to. 500 lines is a guideline for me.
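A guideline like that is easy to check mechanically. The following is a rough sketch, not a tool I’m claiming to use: the function names, the file suffixes and the decision to skip `node_modules` are all assumptions made for the example.

```python
from pathlib import Path

MAX_LINES = 500  # the guideline mentioned above; adjust to taste


def count_lines(path: Path) -> int:
    """Count the lines in a text file without loading it all into memory."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def long_files(root: Path, suffixes=(".js", ".py"), limit: int = MAX_LINES):
    """Return (path, line_count) pairs for source files longer than the limit."""
    found = []
    for path in sorted(root.rglob("*")):
        # Assumption: dependencies live in node_modules and should be ignored.
        if "node_modules" in path.parts:
            continue
        if path.is_file() and path.suffix in suffixes:
            lines = count_lines(path)
            if lines > limit:
                found.append((path, lines))
    return found
```

Calling `long_files(Path("."))` from a project root gives a short list of candidates for splitting up.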

I already liked old and popular tech before, but now I really like it. Think of the difference in the size of the training corpus for Node/Express vs the latest iteration of SvelteKit v2. You just get better answers and suggestions for things the AI knows better.

The last change is that I’m much more likely to change to an appropriate library or technology. The annoying friction of not knowing the exact syntax for things disappears since the AI can generate code with correct syntax for me. It makes my programming skills much more portable. Of course you need to invest in some of the high level understandings to know what you should want to do, but once you know that, you don’t need to know what to type to achieve that in the way you did a couple of years ago.

I’m sure I should know better how to regex, and to remember the common ffmpeg or rsync flags, but I’m never going back to spend time on those jobs!

Using LLMs for coding

Mon, 01 Jul 2024 00:00:00 +0000

This post looks at the context for some of my thinking about AI for supporting software development, and where I’ve landed on it for the time being.

The landscape

I briefly wrote about ChatGPT’s coding ability at the end of 2022. The wide availability of this tool marked the beginning of what I think can fairly be described as a revolution. The controversies that have crystallised since have not dampened my amazement at this step forward in what compute can do, especially around natural language processing.

The next big news in this story was Microsoft’s launch of GitHub Copilot. In business terms this was a brilliant move - owning the most popular code editor, and leveraging the world’s biggest collection of public code to create a product that millions of people are prepared to pay $10 a month for can only be regarded as a success.

At the same time as Microsoft established a new revenue stream, LLMs have been an exciting area of open source growth, especially the excellent Python libraries and the tools in the LangChain ecosystem.

It’s not all rainbows and unicorns though - there are a few valid points that AI skeptics have coalesced around.

Training data - although this is a bigger issue for general models (where masses of web content has been vacuumed up) than it is for code, it is still an issue. If a model is trained on some non-permissively licensed code, and the generative AI I’m using includes that code in a commit, then a license, or at least some ethics have been breached.
Quality (1) - You can see from the feature images in many of the posts in this blog during my MidJourney enthusiasm that generative AI is not perfect. Before I abandoned them I started to prefer the mangled writing and fingers of the engines, but no one wants the software equivalent of mangled fingers in their codebases. I suspect this particular aspect of the quality of the code will probably have a technological solution - we’re in the very early days after all.
Quality (2) - A trickier quality problem is people writing code using AI where they do not fully understand the code they are committing. I imagine this is going to be a growing issue for projects, especially anything with a profit motive such as bug bounties. Projects have mechanisms like code reviews and pull requests, but if submissions can be low-effort and checking them is high-effort, that asymmetry is going to be painful.
Poisoned well - As the amount of AI code in codebases increases, and AI is then trained on those codebases, this will quickly become a snake eating its tail as AI is training itself on its own code. If allowed, this would tend to slowly evolve future codebases to use techniques favoured by early coding LLMs. The current amount of machine influenced code on GitHub is definitely not 41% but it must be some, and is likely to increase, so this is a factor that will need some thought.
Exfiltrating code - if you use an external LLM, such as GitHub Copilot, to write commercial code, who can see your code? Since it’s being transmitted to the AI in order to make autocomplete suggestions, the answer is Microsoft, or some other company. How does that intersect with your company’s policies? I assume, based on the questions I’ve asked Copilot over the last year, that I’d never be considered for a coding job at Microsoft :-)

I, for one, welcome our new robot overlords

In an industry particularly known for excessive hype-cycles, it’s important to critically examine what we’re doing, but for the moment, I’ve landed on the position that these are good tools for me to use. Here’s my thinking.

My situation is that I’m a very experienced developer, with solid expertise in several languages and programming paradigms, and with a degree that was strong in looking at the meta level of languages and software development processes, but I’ve got no professional experience in modern languages. Because of this, a lot of my process has been knowing what I wanted to do, using Google or Stack Overflow to figure out the mechanics of that in whatever language I’m using, then translating that into the context of the code I’m working on. Generative AI fits extremely well into that need - instead of jumping into a browser window to look something up, I’m just writing a descriptive comment of my intentions, then tabbing through the suggestions to choose an approach.

My particular style is also well suited to these tools - I like clear, simple to reason about code. If I can write a pure function for something, I do. I like to break my code up into separated concerns with clear interfaces, and I don’t prematurely optimise. I use descriptive variable, function and object names. I like to work with established, well documented languages and popular libraries, and I prefer to reduce external dependencies. All of these habits make it easier for an AI assistant to access the context of what I’m doing, and therefore to make better quality suggestions.

My journey

I started out using ChatGPT 3 then 3.5 as a sort of super-google/stack-overflow eliminator.

Then with the public launch of GitHub Copilot, I trialed that in VSCode and it was a great experience. I guess they didn’t invent the idea for the greyed out auto-complete suggestion you can tab to accept, but it feels like a natural way to work with this stuff.

I paid for Copilot for a couple of months. But then I heard about Codeium, probably on Syntax, which is free for individual developers (for now - thank you VC funding). I haven’t done any careful comparisons, but it’s definitely of the same order. I suspect Copilot is doing something better with the local context. For example I use a plain text accounting system called Beancount in VSCode. Copilot is able to understand these transactions and make much more useful suggestions than Codeium. I assume this is just inferred from my local files since there would not be much training data for them, and it suggests the correct accounts based on the payees, which must be from local context.

I’ve probably done more work with Codeium, 80% of it on JavaScript, than with Copilot. It’s definitely a workable solution and a great choice if you want a Copilot type experience without paying for it, or have questions about Microsoft’s training data.

More recently I’ve started playing with local models to avoid the problem of exfiltrating my code - I strongly feel I can’t use AI assisted coding with client code if I don’t know what’s happening to it. If I can run a local model, that problem is avoided.

I code on an early M1 MacBook, so Ollama is an easy-to-use choice. I’ve tried llama3 and codeqwen1.5 in the terminal for a bit, but missed the ChatGPT web experience. To get that back, I’ve been running Open WebUI in a Docker container.

More recently, I’ve installed the Continue VSCode extension that allows those Ollama managed models to work in VSCode, including the auto-suggestions (following Dave Gray’s blog post). I’ve got a few long flights coming up over the next week, so it will be good to be able to work offline with that help.

I haven’t really done more than play with CodeQwen in VSCode via Continue so far, but my initial impression is that it’s comparable to Copilot, although the extra second of waiting for auto-suggestions did make me look up M3 Max MacBook pricing. Logic tells you that a 4GB model on a MacBook is going to be less capable than the giant GPT-4 powered Copilot, but this comparison suggests the difference is not an order of magnitude (although the model size is). From limited playing around in small JavaScript codebases, they seem similar, with the local model just being a bit slower.

If this is a revolution, it’s one we’re at the start of, and I certainly reserve the right to change my mind about AI assistance in coding, but I suspect it’s our future and I’m excited at the productivity boost it currently gives me working in languages I’m new to.