
Neuron to Graph: Interpreting Language Model Neurons at Scale
May 31, 2023 · We propose Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron's behaviour from the dataset it was trained on and translates it into an interpretable …
Neuron2Graph
We propose Neuron to Graph (N2G), a tool which takes a neuron and its dataset examples, and automatically distills the neuron's behaviour on those examples to an interpretable graph.
neurons. Our method easily scales to build graph representations for all neurons in a 6-layer Transformer model using a single Tesla T4 GPU, allowing for wide usability. We release the …
onally, inspecting individual neurons by hand is time-consuming and unlikely to scale to entire models. To overcome these challenges, we present Neuron to Graph (N2G), which …
Neuron to Graph: Interpreting Language Model Neurons at Scale
Neuron graph for an in-context learning neuron that activates on repeated token sequences. Identified by searching the graph representations for neurons which frequently have a …
Neuron to Graph: Interpreting Language Model Neurons at Scale
May 31, 2023 · This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within LLMs, to make them more …
Neuron to Graph: Interpreting Neurons in Large Language Models
Nov 28, 2023 · Neuron to Graph (N2G) is an innovative method that utilizes an interpretable graph to automatically extract a neuron’s behavior from the dataset it was trained on.
… and their contribution to the network. This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within LLMs, to make them more …
Neuron to Graph: Interpreting Language Model Neurons at Scale
This paper introduces Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron's behaviour from the dataset it was trained on and translates it into an interpretable …