- formatting
- images
- links
- math
- code
- blockquotes
- external-services
-
Mamba and State Space Models: The Sequence Modelling Revolution
State Space Models and Mamba's input-selective mechanism — linear-time sequence modelling that rivals Transformers on long sequences.
-
Mixture of Experts: Scaling AI Without Breaking the Bank
How Mixture-of-Experts architectures let language models reach trillion-parameter scale while keeping per-token compute tractable.
-
Flash Attention: Making Transformers Faster Than Ever
A deep dive into Flash Attention — the IO-aware exact attention algorithm that makes training large language models dramatically faster while using far less memory.
-
In-Context Learning: How LLMs Learn Without Gradient Updates
The mysterious emergent ability of large language models to perform new tasks from just a handful of examples in the prompt — no gradient updates required.
-
Knowledge Distillation: Teaching Small Models to Think Big
How knowledge distillation, pruning, and quantization compress state-of-the-art models into deployable systems while retaining most of their capability.