Splitting Adam (2024)

This paper effectively "splits" the Adam algorithm into two distinct components to study them: it isolates the stochastic direction (the sign of the gradient) from the adaptive step size (the relative variance).

Adam is also shown to minimize a specific form of sharpness, namely the trace of the square root of the Hessian, which differs fundamentally from how SGD behaves.

4. Better Embeddings with Coupled Adam

Published in 2025, this paper "splits" the problem of anisotropy in LLM embeddings. It argues that Adam's second moment causes word representations to become narrow and directional (anisotropic), and it proposes Coupled Adam to fix this specific side effect.
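The sign/variance split described for Splitting Adam can be made concrete for a single coordinate. This is a minimal sketch, assuming that Adam's first moment `m` estimates E[g] and its second moment `v` estimates E[g²], so that v = m² + σ² and the relative variance is η² = σ²/m². Under that assumption the Adam direction m/√v factors into sign(m) times 1/√(1 + η²). The function names are hypothetical, and the moments come from a fixed batch of gradient samples rather than Adam's exponential moving averages.

```python
import math


def split_adam_step(m: float, v: float) -> tuple[float, float, float]:
    """Split one coordinate of the Adam direction m / sqrt(v) into
    (sign, magnitude, relative variance).

    Assumes m estimates E[g] and v estimates E[g^2], so v = m^2 + sigma^2.
    Then m / sqrt(v) = sign(m) * 1 / sqrt(1 + eta^2), with eta^2 = sigma^2 / m^2.
    """
    sign = math.copysign(1.0, m) if m != 0 else 0.0
    # sigma^2 = v - m^2; clamp at 0 in case the estimates violate v >= m^2
    variance = max(v - m * m, 0.0)
    # Relative variance; infinite when the mean gradient is exactly zero
    eta2 = variance / (m * m) if m != 0 else math.inf
    # Adaptive step size: near 1 for low-noise coordinates, near 0 for noisy ones
    magnitude = 1.0 / math.sqrt(1.0 + eta2)
    return sign, magnitude, eta2


def sample_moments(grads: list[float]) -> tuple[float, float]:
    """Plain sample estimates of E[g] and E[g^2] (no moving averages)."""
    n = len(grads)
    return sum(grads) / n, sum(g * g for g in grads) / n


if __name__ == "__main__":
    quiet = [0.9, 1.1, 1.0, 0.8, 1.2]    # mean 1.0, low spread
    noisy = [3.0, -2.0, 1.0, -1.0, 4.0]  # mean 1.0, high spread
    for name, grads in (("quiet", quiet), ("noisy", noisy)):
        m, v = sample_moments(grads)
        sign, mag, eta2 = split_adam_step(m, v)
        # sign * magnitude reproduces the Adam direction m / sqrt(v)
        print(name, sign, round(mag, 3), round(eta2, 3), round(m / math.sqrt(v), 3))
```

Both gradient lists have the same mean, so the sign component agrees; only the magnitude shrinks as the relative variance grows, which is the separation between direction and adaptive step size.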
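To make the sharpness measure concrete: for a positive semidefinite Hessian H with eigenvalues λᵢ, tr(H^{1/2}) = Σ√λᵢ, whereas the plain trace is tr(H) = Σλᵢ. The sketch below is restricted to symmetric 2×2 Hessians so that it needs only the standard library, and it shows that the two measures can rank the same pair of minima in opposite orders. The helper names and the two example minima are hypothetical, and tr(H) appears only as a familiar point of comparison; the text above does not say which sharpness measure SGD reduces.

```python
import math


def sym2_eigenvalues(a: float, b: float, c: float) -> tuple[float, float]:
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def trace_h(a: float, b: float, c: float) -> float:
    """tr(H): sum of the eigenvalues (equal to a + c)."""
    return sum(sym2_eigenvalues(a, b, c))


def trace_sqrt_h(a: float, b: float, c: float) -> float:
    """tr(H^{1/2}) for a PSD H: sum of the square roots of the eigenvalues.
    Tiny negative eigenvalues caused by rounding are clamped to 0."""
    return sum(math.sqrt(max(lam, 0.0)) for lam in sym2_eigenvalues(a, b, c))


if __name__ == "__main__":
    # Hypothetical minima: one sharp along a single direction, one moderately curved in both
    minima = {
        "one sharp direction": (9.0, 0.0, 0.0),
        "two moderate directions": (4.0, 0.0, 4.0),
    }
    for name, (a, b, c) in minima.items():
        print(f"{name}: tr(H) = {trace_h(a, b, c)}, tr(H^1/2) = {trace_sqrt_h(a, b, c)}")
```

Here tr(H) favors the second minimum (8 < 9) while tr(H^{1/2}) favors the first (3 < 4), so an optimizer biased toward small tr(H^{1/2}) can prefer different minima than one biased toward small tr(H).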
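One way to make "narrow and directional" measurable is the average pairwise cosine similarity between embedding vectors, a common anisotropy measure. The second half is a hypothetical sketch of what coupling the second moment could look like. It assumes, beyond what the text above states, that Coupled Adam replaces each token's second-moment estimate with its average over the vocabulary, so that all embedding rows share the same per-dimension step size. Check the paper for the actual update rule; all function names here are illustrative, and bias correction is omitted.

```python
import math


def mean_pairwise_cosine(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all distinct pairs of vectors.
    Values near 1 mean the vectors point in nearly the same direction
    (a narrow cone, i.e. anisotropic); values near 0 mean no shared direction."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    total, pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            total += dot / (norms[i] * norms[j])
            pairs += 1
    return total / pairs


def coupled_second_moment(v_rows: list[list[float]]) -> list[list[float]]:
    """HYPOTHETICAL coupling rule (an assumption, not taken from the source):
    replace every vocabulary row's second-moment estimate with the average
    over all rows, separately for each embedding dimension."""
    n_rows, dim = len(v_rows), len(v_rows[0])
    avg = [sum(row[d] for row in v_rows) / n_rows for d in range(dim)]
    return [list(avg) for _ in range(n_rows)]


def embedding_adam_update(E, m_rows, v_rows, lr=1e-3, eps=1e-8, coupled=False):
    """One Adam-style update of an embedding matrix E (rows = tokens).
    Bias correction and the moment updates themselves are omitted."""
    v_used = coupled_second_moment(v_rows) if coupled else v_rows
    return [
        [e - lr * m / (math.sqrt(v) + eps) for e, m, v in zip(e_row, m_row, v_row)]
        for e_row, m_row, v_row in zip(E, m_rows, v_used)
    ]


if __name__ == "__main__":
    E = [[0.0, 0.0], [0.0, 0.0]]
    m_rows = [[1.0, 1.0], [1.0, 1.0]]      # identical first moments
    v_rows = [[1.0, 1.0], [100.0, 100.0]]  # very different second moments
    print(embedding_adam_update(E, m_rows, v_rows))                # rows move by different amounts
    print(embedding_adam_update(E, m_rows, v_rows, coupled=True))  # rows move identically
    print(mean_pairwise_cosine([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]))  # close to 1
```

With identical first moments, the uncoupled update moves the two rows by different amounts only because their second moments differ; under the assumed coupling that difference disappears.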
