How do LLMs use their depth?
Best AI papers explained - A podcast by Enoch H. Kang
The research paper explores how Large Language Models (LLMs) use their depth during inference, proposing a "Guess-then-Refine" framework to explain layer-wise prediction dynamics. The authors use the TunedLens method to trace intermediate representations, revealing that early layers act as "statistical guessers": lacking sufficient contextual information, they promote high-frequency tokens as initial predictions. As processing continues through deeper layers, these initial guesses undergo massive contextual refinement and become contextually appropriate tokens. The study further demonstrates "Complexity-Aware Depth Use": LLMs resolve simpler predictions, such as function words, in shallower layers, while reserving deeper layers for more complex computations like recalling multi-token facts or reasoning through constrained-choice tasks.
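The layer-wise decoding described above can be sketched in a few lines. This is a minimal, self-contained illustration of the tuned-lens idea only: each layer's hidden state is passed through a learned per-layer affine "translator" and then the model's shared unembedding matrix, yielding a token distribution per layer. All weights and dimensions below are random placeholders chosen for illustration, not values from any real model; in the actual method the translators are trained so that each layer's decoded distribution matches the model's final output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- placeholders, not real model sizes.
d_model, vocab, n_layers = 16, 8, 4

# Hypothetical stand-ins: random hidden states and weights.
hidden = rng.normal(size=(n_layers, d_model))   # one hidden state per layer
W_U = rng.normal(size=(d_model, vocab))         # shared unembedding matrix

# Per-layer affine translators (A, b), initialized near the identity,
# as the trained translators would be in the tuned-lens setup.
translators = [
    (np.eye(d_model) + 0.1 * rng.normal(size=(d_model, d_model)),
     0.1 * rng.normal(size=d_model))
    for _ in range(n_layers)
]

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def tuned_lens(layer):
    """Decode one layer's hidden state into a vocabulary distribution."""
    A, b = translators[layer]
    logits = (hidden[layer] @ A + b) @ W_U
    return softmax(logits)

# Trace the prediction across depth: with trained translators, early
# layers would favor high-frequency "guess" tokens and later layers
# would shift mass onto the contextually refined token.
layerwise = [tuned_lens(layer) for layer in range(n_layers)]
for layer, p in enumerate(layerwise):
    print(f"layer {layer}: top token = {int(p.argmax())}, p = {p.max():.3f}")
```

Comparing `argmax` tokens across layers is exactly the kind of trace the paper uses to show early "statistical guesses" giving way to refined predictions in deeper layers.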
