Understanding LLMs

Rather than just criticise the hype around Large Language Models (LLMs… I refuse to say AI), it’s better to get beyond that and gain a deeper understanding. Can local data be pointed at a local model, mitigating some of the InfoSec concerns? A bit of reading helps, but it’s always good to ‘do’ as well.

You can run a local open source LLM. For data, I have ten years of fitness tracker exports. Can I point the model at that?

How to get a local model:

GGML-AI (http://ggml.ai): a little more complex to deploy; you need to be at least comfortable with GitHub.

GPT4ALL (https://gpt4all.io): super easy to use, with a selection of models to choose from, and it warns you if a model is going to send data externally. If you want to play, start with this one; a minimal sketch of what that looks like in code follows below.
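
For a sense of what “pointing a model at your own data” looks like in practice, here’s a minimal sketch using GPT4ALL’s Python bindings (pip install gpt4all). The model file name and the inlined fitness data are assumptions for illustration only; swap in whatever model and export you actually have.

```python
# Minimal sketch: ask a local GPT4All model a question about local data.
# Assumptions: the gpt4all Python package is installed, and the model name
# below is available (GPT4All downloads it on first use if it isn't).
from gpt4all import GPT4All

# A small model keeps this runnable on a laptop (example name, not a recommendation).
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

# Stand-in for a slice of fitness tracker data; in reality this would be
# read from a local CSV export, and none of it leaves the machine.
fitness_snippet = (
    "2023-06-01: 9,412 steps, resting HR 58\n"
    "2023-06-02: 12,880 steps, resting HR 57\n"
    "2023-06-03: 4,103 steps, resting HR 61\n"
)

prompt = (
    "Here is a sample of my fitness tracker data:\n"
    f"{fitness_snippet}\n"
    "In one sentence, what trend do you see in the step counts?"
)

print(model.generate(prompt, max_tokens=100))
```

Stuffing raw data straight into the prompt only works for tiny samples; anything bigger hits the context window, which is exactly where the retraining / fine-tuning / embedding question below comes in.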

Early lessons?

➡️ It’s VERY hardware- and compute-intensive. I’m only running the small 7-billion-parameter models, so how does that scale with more users of a service? What are the running costs of on-prem vs cloud, and the overall costs?

➡️ Adding custom data: now you get into retraining, fine-tuning or embeddings, or a combination (the embedding route is sketched after this list). It’s an interesting new field and not simple. How would it work for an ever-evolving data set? GPT-3 used 284,000kWh of energy to be trained (wow), but how does that translate to operational running? Models must evolve; they can’t just keep getting bigger.

➡️ Confirms to me that you need a REALLY good use case before jumping into this, plus a solid understanding of the project’s overall costs against the benefits it will deliver (so… the basics).

➡️ If you’re really going to do this, bring in experts.
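
To make the “embedding” option above concrete, here’s a rough sketch of retrieval over local data: embed chunks of the fitness log, find the chunks most similar to a question, and hand only those to the local model. The sentence-transformers and GPT4All model names are illustrative assumptions; any locally-run embedding model and LLM would do.

```python
# Rough sketch of retrieval-augmented prompting over local data.
# Assumptions: sentence-transformers and gpt4all are installed; the model
# names below are examples only.
import numpy as np
from sentence_transformers import SentenceTransformer
from gpt4all import GPT4All

# 1. Chunk the local data (here, one line of fitness log per chunk).
chunks = [
    "2014-03-10: 7,201 steps, resting HR 63",
    "2019-11-02: 15,340 steps, resting HR 55",
    "2023-06-03: 4,103 steps, resting HR 61",
]

# 2. Embed the chunks once; this runs entirely on the local machine.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

# 3. Embed the question and pick the most similar chunks (cosine similarity,
#    which is a plain dot product on normalised vectors).
question = "What was my resting heart rate like in 2019?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
scores = chunk_vecs @ q_vec
top = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

# 4. Ask the local LLM, grounding it in only the retrieved chunks.
llm = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # example model name
prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}\nAnswer briefly."
print(llm.generate(prompt, max_tokens=80))
```

The appeal of this route over retraining is that the underlying data set can keep evolving: new days of data just mean new embeddings, with no change to the model itself.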
