Yash Datta — Read the Source
Read the Source takes one mechanism behind modern AI at a time and works it all the way down: the full theory, a runnable implementation, its safety and alignment properties, and what it takes to run it in production. Each piece should leave you understanding why the mechanism works and where it breaks.
From the latest post
Streaming softmax: the recurrence that makes FlashAttention work
A two-state recurrence (log-sum-exp with a running-max correction factor) turned 200K-token context windows from a hardware fantasy into a routine training run.
softmax([1, 2, 3.0]) = [0.090, 0.245, 0.665]
Drag the third value; the three weights recompute live. This is a figure from the article itself, not a mock-up.
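The recurrence the post describes can be sketched in a few lines: keep a running max and a running sum, rescaling the sum whenever the max moves. This is a minimal illustration, not the article's code; the function and variable names are my own.

```python
import math

def streaming_softmax(xs):
    """Two-state online softmax: a running max m and a running sum s
    of exp(x - m), updated in a single pass over the inputs."""
    m = float("-inf")  # running max
    s = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        # When the max moves, rescale the old sum into the new frame,
        # then fold in the current term.
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # Emit the normalized weights using the final (m, s).
    return [math.exp(x - m) / s for x in xs]

print(streaming_softmax([1, 2, 3.0]))  # ≈ [0.090, 0.245, 0.665]
```

Because each update only needs the previous `(m, s)` pair, the normalizer can be computed block by block without ever materializing the full score vector, which is the property FlashAttention exploits.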
All writing →

Papers
- JavelinGuard: Low-Cost Transformer Architectures for LLM Security
First author. Low-cost transformer architectures for detecting malicious intent in LLM interactions.
- DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs
Originated the core idea; led and executed by Justin Albrethsen.
New pieces by email — saucam.substack.com.