al-folio

a simple whitespace theme for academics

a distill-style blog post

an example of a distill-style blog post and main elements

25 min read · 2021

a post with code

an example of a blog post with some code

4 min read · 2015

Mixture of Experts: Scaling AI Without Breaking the Bank

How Mixture-of-Experts architectures let language models reach trillion-parameter scale while keeping per-token compute tractable.

7 min read · April 01, 2026

2026 · moe scaling llm efficiency sparse · foundation-models

Mamba and State Space Models: The Sequence Modelling Revolution

State Space Models and Mamba's input-selective mechanism — linear-time sequence modelling that rivals Transformers on long sequences.

7 min read · April 01, 2026

2026 · ssm mamba recurrence linear sequence · foundation-models

Flash Attention: Making Transformers Faster Than Ever

A deep dive into Flash Attention — the IO-aware exact attention algorithm that makes training large language models dramatically faster while using far less memory.

7 min read · April 01, 2026

2026 · attention transformers efficiency hardware · foundation-models

Sparse Spatio-Temporal Attention (SSTA)

LWM-Temporal: introducing the first physics-informed efficient attention mechanism

1 min read · November 28, 2025

2025 · lwm-temporal · foundation-models

Large Wireless Model (LWM)

LWM: introducing the first foundation model for wireless channels

8 min read · November 27, 2025

2025 · lwm · foundation-models