- formatting
- images
- links
- math
- code
- blockquotes
- external-services
- Mixture of Experts: Scaling AI Without Breaking the Bank
  How Mixture-of-Experts architectures let language models reach trillion-parameter scale while keeping per-token compute tractable.
- Mamba and State Space Models: The Sequence Modelling Revolution
  State Space Models and Mamba's input-selective mechanism: linear-time sequence modelling that rivals Transformers on long sequences.
- Flash Attention: Making Transformers Faster Than Ever
  A deep dive into Flash Attention, the IO-aware exact attention algorithm that makes training large language models dramatically faster while using far less memory.
- Sparse Spatio-Temporal Attention (SSTA)
  LWM-Temporal: introducing the first physics-informed efficient attention mechanism.
- Large Wireless Model (LWM)
  LWM: introducing the first foundation model for wireless channels.