Will open source LLMs win?
I have always liked the original long form version of 'Jack of all trades': master of none, but oftentimes better than a master of one. I may not be a deep expert in a lot of things, but I'll learn enough to ask questions that connect the dots.
I've been reading about Large Language Model (LLM) parameter sizes, training tokens and loss curves. Some of it made me go 'Er, whu?', but persevering with it raised an interesting question.
Most of the original models were huge, but they weren't trained on that much data. To use an analogy: the time they spent in school was very short. Some of the newer models aren't as big, but they've spent 'longer in school', and they're almost as good as the earlier, larger models. People are surprised I'm running Llama 3 8B on a laptop ('What, like ChatGPT?') with no connection to the Internet (pic below). It's just under 5GB and runs well. It's not as pretty, and doesn't flirt with me, but as it's free... I'm grand with that.
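For the curious, here's roughly what that looks like: a minimal sketch using the llama-cpp-python library. The GGUF file name and settings are from my setup, so treat them as illustrative assumptions rather than a recipe.

```python
# Minimal sketch: running a quantised Llama 3 8B entirely offline
# with llama-cpp-python. Assumes you've already downloaded a GGUF
# file. A 4-bit quant of an 8B model is roughly 8e9 params * ~0.5
# bytes/param ~= 4GB, plus overhead -- hence "just under 5GB".
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # local path on my laptop
    n_ctx=4096,    # context window
    n_threads=8,   # CPU threads
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain loss curves in one paragraph."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```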
Why is that interesting?
Well, if you've got a smaller model that's been trained and trained until it performs well, you may not need lots of datacentres and compute power burning through planet-killing resources.
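To put rough numbers on the 'longer in school' idea: the Chinchilla scaling work suggested around 20 training tokens per parameter as compute-optimal, and Meta say Llama 3 8B was trained on over 15 trillion tokens. A back-of-envelope sketch (the figures are public approximations, not gospel):

```python
# Back-of-envelope: how far past "compute-optimal" a small model
# can be trained. Figures are rough public numbers, not exact.
params = 8e9                      # Llama 3 8B parameter count
chinchilla_tokens = 20 * params   # ~20 tokens/param rule of thumb
actual_tokens = 15e12             # Meta cite 15T+ training tokens

print(f"Chinchilla-optimal: {chinchilla_tokens / 1e9:.0f}B tokens")
print(f"Actual:             {actual_tokens / 1e12:.0f}T tokens")
print(f"Overtrained by:     ~{actual_tokens / chinchilla_tokens:.0f}x")
# ~94x more schooling than the rule of thumb suggests --
# which is why a small model can punch well above its weight.
```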
The other side of that coin is that it's harder to monetise. You don't need to (or can't) sell metered API access or a monthly subscription to your expensive datacentres. You also don't get to suck up user data.
That also puts an interesting spin (for me) on the corporate dramas in the AI world: people leaving, coming back, and then others leaving. It also helps me understand why there's been more of a consumer push in the space: market share, brand recognition, partnerships and so on. Product people always want to monetise. Nothing wrong with that, unless it holds back beneficial progress.
For an old-school comparison, though, I'm wondering whether open source efforts will win out here, the same way Apache won out over Windows NT back in the day. Apache put a rocket booster under the world wide web. Not Windows NT.
An open source LLM package that you can install and (easily) point at your own data would really shake things up. At the moment you'd still need serious compute for the training stage. But things change.
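One way the 'point at your own data' idea already works today, with no retraining at all, is retrieval: embed your documents, fish out the relevant ones, and hand them to the local model as context. A minimal sketch, assuming the sentence-transformers library for embeddings and the local model from the earlier snippet; the documents here are made up for illustration.

```python
# Minimal sketch of "point a local LLM at your own data" via
# retrieval, no retraining needed. Assumes sentence-transformers
# is installed; the document texts are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Q3 sales were up 12% in the EMEA region.",
    "The office closes at 17:30 on Fridays.",
    "Llama 3 8B runs offline on a laptop.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k docs most similar to the question (cosine similarity)."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "When does the office shut on Friday?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
# Feed `prompt` to the local model from the earlier sketch, e.g.:
# llm(prompt, max_tokens=64)
```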
We'll see.