efficiency | Sadjad Alikhani

May 03, 2026	Speculative Decoding: 3× Faster LLM Inference for Free
Apr 27, 2026	LoRA and QLoRA: Fine-Tuning 70 B Models on a Consumer GPU
Apr 21, 2026	Mixture of Experts: Scaling AI Without Breaking the Bank
Apr 20, 2026	Flash Attention: Making Transformers Faster Than Ever
Apr 13, 2026	Speculative Decoding: 3× Faster LLM Inference for Free
Apr 07, 2026	LoRA and QLoRA: Fine-Tuning 70 B Models on a Consumer GPU
Apr 01, 2026	Mixture of Experts: Scaling AI Without Breaking the Bank
Apr 01, 2026	Flash Attention: Making Transformers Faster Than Ever