Not a surprise
This is not surprising, or even news. People have been having a lot of fun bypassing these guardrails for a while, and prompt injection is a very real problem.
“The UK’s AI Safety Institute (AISI) said systems it had tested were “highly vulnerable” to jailbreaks, a term for text prompts designed to elicit a response that a model is supposedly trained to avoid issuing. The AISI said it had tested five unnamed large language models (LLM) – the technology that underpins chatbots – and circumvented their safeguards with relative ease, even without concerted attempts to beat their guardrails”
The earliest jailbreaks needed nothing more than a prompt starting with ‘Pretend…’, ‘Imagine…’ or ‘Theoretically…’.
Open models, deployed internally and tuned on internal data, are the safest way forward if you’re concerned about keeping data secure (IMO).
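To make that concrete, here’s a minimal sketch of what “deployed internally” can look like in practice, assuming the Hugging Face transformers library and an open-weight model (the model name below is just a placeholder, not a recommendation): prompts and completions stay on your own hardware instead of going to a third-party API.

```python
# Minimal sketch: run an open-weight model entirely on local infrastructure
# using Hugging Face transformers. The model name is a placeholder -- swap in
# whichever open model your organisation has vetted (and fine-tuned on
# internal data).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder open-weight model
    device_map="auto",                           # use local GPU(s) if available
)

# Nothing here leaves your own machines: no external API calls, no prompt
# or completion data sent to a vendor.
result = generator(
    "Summarise the attached internal policy document in three bullet points.",
    max_new_tokens=200,
)
print(result[0]["generated_text"])
```

This doesn’t make the model any harder to jailbreak, but it does mean the blast radius of a successful prompt injection is limited to data you already control.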