Goodfire says it removed German from a small language model with just four tokens

Goodfire AI says it managed to remove a language model's ability to generate German text by fine-tuning it on just four German tokens, a result the company presented as an early demonstration of its model-editing approach.

In a thread posted on X, the startup said the experiment was carried out during a one-day hackathon using its Silico product. The company described the target as a 67-million-parameter language model and said it altered the model by tuning a single scalar factor on one internal weight subcomponent.

The work was framed as an exploration of what Goodfire calls parameter decomposition. The company said the method breaks a model's weight matrices into smaller, interpretable parts that activate sparsely, meaning only a few of them are in use on any given input. That decomposition, Goodfire said, can make it possible to locate and modify specific behaviors in a model more precisely than standard fine-tuning.
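To make the idea concrete, here is a minimal Python sketch under loudly stated assumptions. It treats a decomposition as a set of rank-one subcomponents u vᵀ whose scaled sum rebuilds a weight matrix, and it counts a subcomponent as "active" when its input direction responds strongly to an input. The toy 3×3 matrix, the activation threshold and the choice of component 2 as a "German-specific" piece are all hypothetical; this is not Goodfire's Silico code or its published method.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Component = Tuple[Vector, Vector]  # (u, v): contributes the rank-one matrix u v^T


def outer(u: Vector, v: Vector) -> Matrix:
    """Return the rank-one matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def reconstruct(components: List[Component], scales: List[float]) -> Matrix:
    """Rebuild W = sum_c scales[c] * u_c v_c^T from the (hypothetical) components."""
    n_rows, n_cols = len(components[0][0]), len(components[0][1])
    w = [[0.0] * n_cols for _ in range(n_rows)]
    for (u, v), s in zip(components, scales):
        piece = outer(u, v)
        for i in range(n_rows):
            for j in range(n_cols):
                w[i][j] += s * piece[i][j]
    return w


def matvec(w: Matrix, x: Vector) -> Vector:
    """Apply the weight matrix to an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def active_components(components: List[Component], x: Vector, threshold: float = 0.5) -> List[int]:
    """Indices of components whose input direction v_c responds to x (sparse activation)."""
    return [c for c, (_, v) in enumerate(components)
            if abs(sum(vi * xi for vi, xi in zip(v, x))) > threshold]


# Toy weight built from three components; component 2 stands in for a
# hypothetical "German-specific" subcomponent.
components = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 1.0, 0.0]),
    ([0.0, 0.0, 1.0], [0.0, 0.0, 1.0]),
]
original = reconstruct(components, [1.0, 1.0, 1.0])
edited = reconstruct(components, [1.0, 1.0, 0.0])  # the single-scalar edit

german_like = [0.0, 0.0, 1.0]   # hypothetical input that routes through component 2
english_like = [1.0, 0.0, 0.0]  # hypothetical input that routes through component 0

print(active_components(components, german_like))  # [2]
print(matvec(original, german_like))               # [0.0, 0.0, 1.0]
print(matvec(edited, german_like))                 # [0.0, 0.0, 0.0]
print(matvec(edited, english_like))                # [1.0, 0.0, 0.0]
```

Setting one scale to zero silences that subcomponent's contribution, while inputs handled by the other subcomponents produce the same outputs as before; that is the intuition behind an edit with limited spillover. In a real model the components would be learned rather than hand-picked, they would overlap far more than in this toy case, and a tuned scale need not be exactly zero.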

A targeted edit with limited spillover

Goodfire said it chose German because it appeared to be the model's strongest non-English language. After applying its edit, the company said the model could no longer predict German text in the way it had before.

The startup compared the result with LoRA (low-rank adaptation) fine-tuning, a more common adaptation technique. According to Goodfire, its method matched LoRA's ability to suppress German while using far fewer tokens. It also claimed the edit had less impact on other languages. In the company's account, some LoRA runs harmed French, Spanish, Italian and, in some cases, English, while its own approach left those languages largely intact.
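Goodfire's comparison is about training data (tokens), not parameter counts, but it may help to see what each approach actually trains. LoRA freezes a weight matrix and learns a low-rank update ΔW = BA, while the edit Goodfire describes tunes one scalar. The sketch below only counts trainable parameters; the layer shape and rank are hypothetical values chosen for illustration, not figures from Goodfire's experiment.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns B (d_out x rank) and A (rank x d_in); the update is delta_W = B @ A."""
    return d_out * rank + rank * d_in


def scalar_edit_trainable_params() -> int:
    """A single scale factor on one subcomponent, as Goodfire describes its edit."""
    return 1


# Hypothetical layer shape and LoRA rank, for illustration only.
d_in, d_out, rank = 512, 512, 8
print("LoRA:", lora_trainable_params(d_in, d_out, rank))  # LoRA: 8192
print("scalar edit:", scalar_edit_trainable_params())       # scalar edit: 1
```

Fewer trainable parameters does not by itself explain the token savings or the reduced spillover Goodfire reports; the sketch only shows how much smaller the search space of a one-scalar edit is.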

Goodfire also said the interpretability of its decomposition approach helped it refine the experiment. The company said it first tried tuning the top 16 components associated with German, but inspection of their labels showed that many were connected to foreign languages more broadly. That prompted the team to narrow the edit to a single German-specific component, which it said improved precision.

The company presented the result as an example of how interpretability could support more predictable model editing. It also suggested that once a model has been decomposed and interpreted, that up-front investment could be reused across many future editing tasks.

Part of a broader interpretability push

The German-removal demo fits into Goodfire's larger push around interpretability tools and model control. The company has been promoting Silico as a platform for inspecting model internals, debugging failures and shaping model behavior. It has also used recent posts to highlight related research into neural geometry, data debugging and model behavior analysis.

Goodfire said the experiment was an early demo rather than a finished product claim. It pointed readers to a longer technical explanation on LessWrong and invited researchers to request access to Silico.

The thread also included a correction noting that one plot in the original post contained a small error. Goodfire said the bars in a chart of off-target effects were displayed slightly above their true means because of a plotting bug, and it later shared a corrected version.

While the demo was narrow in scope, it adds to a growing body of work aimed at making AI systems more editable and easier to understand. The company’s pitch is that if researchers can identify the internal pieces responsible for a behavior, they may be able to alter that behavior without broadly disturbing the rest of the model.