al-folio

a simple whitespace theme for academics

a distill-style blog post

an example of a distill-style blog post and main elements

25 min read · 2021

a post with code

an example of a blog post with some code

4 min read · 2015

LoRA and QLoRA: Fine-Tuning 70 B Models on a Consumer GPU

LoRA, QLoRA, and the PEFT ecosystem — how the intrinsic dimensionality hypothesis lets us fine-tune billion-parameter models on a single GPU.

5 min read · April 27, 2026

2026 · lora qlora peft fine-tuning efficiency · efficiency

RoPE and ALiBi: Giving Transformers Unlimited Memory

How RoPE, ALiBi, and YaRN enable language models to handle context windows from 4 k to over 1 million tokens.

8 min read · April 26, 2026

2026 · rope positional-encoding long-context transformers · foundation-models

Vision Transformers: How Attention Conquered Computer Vision

From patch embeddings to DINOv2 — the complete story of how Transformers revolutionized computer vision.

7 min read · April 25, 2026

2026 · vit vision patches self-supervised dino mae · foundation-models

Diffusion Models: The Probabilistic Engine Behind Generative AI

A rigorous but accessible walkthrough of DDPM, score matching, and latent diffusion — the mathematical backbone of Stable Diffusion and DALL·E.

6 min read · April 24, 2026

2026 · diffusion generative score-matching ddpm · generative-ai

RLHF and DPO: Teaching Language Models to Be Helpful and Harmless

The complete alignment pipeline: from SFT, through RLHF with PPO, to Direct Preference Optimization, which removes the separate reward model entirely.

7 min read · April 23, 2026

2026 · rlhf dpo alignment safety preference · alignment