Goodfire, a San Francisco startup, says its paid tool Silico is the first off-the-shelf product to cover the full pipeline from dataset design through training, aimed mainly at open-source models rather than closed systems like ChatGPT or Gemini.
Silico combines mechanistic interpretability with AI agents to map neurons, trace computational pathways, and tune behaviours; Goodfire says this can reduce hallucinations and make model development more precise.
Demonstrations included flipping an AI-disclosure answer nine times out of ten and isolating a Qwen 3 neuron tied to moral-dilemma framing, though outside researchers cautioned that the approach remains imperfect.
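The article does not describe Silico's internals, but the general technique it builds on, intervening on individual neurons to change behaviour, can be sketched with standard open-source tooling. Below is a minimal, hypothetical example of ablating one hidden unit in an open model via a forward hook; the model name, layer index, and neuron index are illustrative assumptions, not the neuron Goodfire reported, and this is not Goodfire's actual API.

```python
# Hypothetical sketch of neuron-level intervention on an open-source model.
# Not Goodfire's Silico tooling; model, layer, and neuron are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B"        # small open model as a stand-in
LAYER, NEURON, SCALE = 12, 123, 0.0  # assumed indices; SCALE=0.0 ablates

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def tune_neuron(module, inputs, output):
    # Scale one hidden unit's activation at every token position.
    output[..., NEURON] = output[..., NEURON] * SCALE
    return output

# Hook the MLP of one transformer block, then generate with the edit active.
handle = model.model.layers[LAYER].mlp.register_forward_hook(tune_neuron)
prompt = tok("Are you an AI?", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**prompt, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # restore the unmodified model
```

In practice, identifying which neuron to target is the hard part; the interpretability work is in the mapping, while the intervention itself, as above, is a one-line change to an activation.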
Is 'neuron surgery' for AI a true science, or a precise form of alchemy that transparent models will soon make obsolete?
When anyone can edit an AI's ethics, how do we prevent this control from being used to create more manipulative systems?