Understanding LLMs
Rather than just criticise the hype around Large Language Models (LLMs… I refuse to say AI), it’s good to try to get beyond it and gain a deeper understanding. Can local data be pointed at a local model, which would mitigate some of the Infosec concerns? A bit of reading helps, but it’s always good to ‘do’ as well.
You can run a local open-source LLM. For data, I have ten years of fitness tracker history. Can I point the model at that?
How to get a local model:
GGML (http://ggml.ai): a little more complex to deploy; you need to be at least comfortable with GitHub.
GPT4All (https://gpt4all.io): super easy to use, with a selection of models to choose from. It warns you if a model is going to send data externally. If you want to play, start here (a minimal sketch follows below).
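As a rough idea of how little code a local run takes, here is a minimal sketch using the GPT4All Python bindings. The model filename is only an example; substitute whichever model you have downloaded through the GPT4All app or library.

```python
# Minimal local-LLM sketch using the GPT4All Python bindings (pip install gpt4all).
from gpt4all import GPT4All

# Loads (and downloads, if missing) a small ~7-billion-parameter model to run locally.
# Example model file; yours will differ depending on what you have installed.
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

# Everything stays on this machine: the prompt is processed by the local model.
with model.chat_session():
    reply = model.generate(
        "Summarise the trade-offs of running an LLM locally.",
        max_tokens=200,
    )
    print(reply)
```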
Early lessons?
➡️ It’s VERY hardware- and compute-intensive. I’m only running the small 7-billion-parameter models. How does that scale with more users of a service? What are the running costs of on-prem vs cloud, and the overall costs?
➡️ Adding custom data: now you get into retraining, fine-tuning or embeddings, or a combination (a rough embeddings-style sketch follows after this list). It’s an interesting new field and not simple. How would that work for an ever-evolving data set? GPT-3 used 284,000 kWh of energy to be trained (wow), but how would that translate to operational running? Models must evolve; they can’t just get bigger.
➡️ Confirms to me that you need a REALLY good use case before jumping into this, plus a good understanding of the project’s overall costs against the benefits it will deliver (so… the basics).
➡️ If you’re really going to do this, bring in experts.
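To make the ‘embeddings’ option above concrete, here is a minimal retrieval-style sketch: the tracker entries are embedded locally, the most relevant ones are found for a question, and only those are handed to the local LLM as context. The file name, column layout, embedding library (sentence-transformers) and model files are all assumptions for illustration, not a definitive recipe.

```python
# Minimal embeddings/retrieval sketch: embed the data locally, retrieve the most
# relevant rows for a question, and pass only those to a local LLM as context.
# Assumptions: a CSV called fitness.csv with 'date' and 'summary' columns,
# sentence-transformers for local embeddings, and GPT4All for generation.
import csv
import numpy as np
from sentence_transformers import SentenceTransformer
from gpt4all import GPT4All

# 1. Load the tracker data as one text line per day (hypothetical format).
with open("fitness.csv", newline="") as f:
    rows = [f"{r['date']}: {r['summary']}" for r in csv.DictReader(f)]

# 2. Embed every row locally; nothing leaves the machine.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
row_vectors = embedder.encode(rows, normalize_embeddings=True)

# 3. A small local model for answering questions (example model file).
llm = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

def ask(question: str, k: int = 5) -> str:
    """Retrieve the k most similar rows and ask the local model about them."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = row_vectors @ q_vec  # cosine similarity (vectors are normalised)
    context = "\n".join(rows[i] for i in np.argsort(scores)[-k:])
    prompt = (f"Using only this fitness data:\n{context}\n\n"
              f"Answer the question: {question}")
    return llm.generate(prompt, max_tokens=300)

print(ask("How did my average step count change over the last year?"))
```

The appeal of this approach for an ever-evolving data set is that new entries only need to be embedded as they arrive; the model itself is not retrained or fine-tuned.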