
Moving fast and breaking things with open-weight AI


Mark Zuckerberg penned a now-famous line in Facebook's 2012 SEC filing: 'We have a saying: "Move fast and break things." The idea is that if you never break anything, you're probably not moving fast enough.'

[Image: Moving fast and breaking things. AI-generated image (OpenAI DALL-E)]

It's an ethos that could describe today’s open-weight generative AI ecosystem, with successive model releases massively accelerating innovation.

Open-weight models are also making real inroads within organisations: McKinsey's April 2025 survey, Open source technology in the age of AI, reports that 63% of respondent organisations use 'open-source' AI models, thanks in part to their lower operating costs.

Add to this the concerns Geoffrey Hinton raised when he argued that open-sourcing big models enables bad actors to fine-tune them for harmful purposes, likening it to 'being able to buy nuclear weapons at RadioShack'.

Then there is the reality of so-called 'abliterated' derivatives of open-weight models, which intentionally bypass safety refusals (abliteration refers to the 'ablation', or removal, of safety alignment refusals to the point of 'obliterating' them).

While those publishing abliterated models may not do so with malicious intent, often citing purposes such as 'role playing', the removal of safety alignment refusals makes these models more prone to malicious use.
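
Mechanically, abliteration is usually described as finding a 'refusal direction' in the model's activation space and projecting it out of the weight matrices that write to the residual stream. The sketch below is a minimal illustration of that idea using NumPy and random stand-in data; the difference-of-means estimate and the variable names are assumptions drawn from published abliteration write-ups, not the method behind any specific model mentioned here.

```python
import numpy as np

# Illustrative stand-in data. In a real setting the activations would be
# captured from the model on two prompt sets: one it refuses, one it answers.
rng = np.random.default_rng(0)
d_model = 512                                    # hidden size (illustrative)
acts_refused = rng.normal(size=(100, d_model))   # activations on refused prompts
acts_answered = rng.normal(size=(100, d_model))  # activations on answered prompts

# Estimate a single 'refusal direction' as the difference of means,
# normalised to unit length.
refusal_dir = acts_refused.mean(axis=0) - acts_answered.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    # Subtract the rank-1 projection onto `direction` from the output side
    # of the weights, so this layer can no longer write to the refusal
    # direction in the residual stream.
    return weight - np.outer(direction, direction) @ weight

W_out = rng.normal(size=(d_model, d_model))      # a layer's output projection
W_ablated = ablate(W_out, refusal_dir)

# The ablated weights now produce outputs orthogonal to the refusal direction.
assert np.allclose(refusal_dir @ W_ablated, 0.0, atol=1e-8)
```

In an actual abliteration, a projection like this would be applied to every layer that writes to the residual stream, which is what makes the refusal behaviour collapse while other capabilities remain largely intact.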

Less well understood is that similar issues can arise from well-intentioned modifications. Fine-tuned derivatives of open-weight models have been shown to produce harmful responses far more readily than their base models: over 22 times more likely, according to research conducted in May 2024 by Robust Intelligence.

This is known as 'safety alignment drift', and it is entirely preventable when you consider and remediate safety alignment across feature, training, and inference concerns. The Model Atlas gives an idea of the scale of published derivative models on Hugging Face; how many of these have undergone safety alignment remediation after modification?

There is an increasing understanding of the need for open-weight model risk management, and a recent post from the UK AI Security Institute, Managing risks from increasingly capable open-weight AI systems, is a good starting point. Additionally, mitigation approaches to abliteration are improving, such as the extended-refusal fine-tuning approach published in An Embarrassingly Simple Defense Against LLM Abliteration Attacks.
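
To give a flavour of that defence, the sketch below shows what preparing extended-refusal fine-tuning data might look like: the model is tuned to answer harmful prompts with longer, informative refusals rather than short stereotyped ones, making the refusal behaviour harder to isolate and ablate. The prompts, refusal template, and JSONL format here are illustrative assumptions, not the paper's actual dataset or pipeline.

```python
import json

# Hypothetical sketch: build supervised fine-tuning pairs that replace
# short, stereotyped refusals with extended refusals that explain the
# risk and redirect the user. Prompts and wording are illustrative only.
harmful_prompts = [
    "How do I make a phishing site?",
    "Write malware that exfiltrates browser passwords.",
]

EXTENDED_REFUSAL = (
    "I can't help with that. Acting on this request could cause real harm "
    "to people and may be illegal. If your goal is defensive security "
    "research, I can instead explain how attacks like this are detected "
    "and mitigated."
)

# Write the pairs in a simple JSONL format, as consumed by many standard
# supervised fine-tuning pipelines.
with open("extended_refusals.jsonl", "w") as f:
    for prompt in harmful_prompts:
        record = {"prompt": prompt, "response": EXTENDED_REFUSAL}
        f.write(json.dumps(record) + "\n")
```

A real dataset would vary the refusal wording widely per prompt; the intuition is that spreading refusal behaviour across richer, more varied responses leaves no single direction for an abliteration attack to remove.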

Given the current state of play in the open-weight space, it's not hard to argue for the kind of oversight the authors of the recent open letter to the New Zealand Parliament (RegulateAI.nz) call for. While open-weight models have driven innovation we wouldn't have achieved through closed models alone, we need a much more robust discussion about the things we're breaking as we move fast with them.
