Writing a deep-research agent from scratch
Real talk on the 'secret final boss' of building agents: the hostile web.

NVSHMEM from scratch with RDMA, PCIe topology, GPUDirect RDMA, and CUDA IPC; demystifies GPU networking internals.
Systems engineers, CUDA developers, researchers working on distributed LLM training and MoE systems
NVIDIA NVSHMEM official docs · DeepEP MoE dispatch paper · NCCL design documentation
Reimplementing FA2 in CuTe from scratch is a masterclass in GPU kernel optimization.
BitTorrent for LLMs with on-device PII scanning before prompts leave your machine.
Direct2D GPU PDF renderer with CPU fallback, but alpha-stage and Windows-only.
Heterogeneous GPU pooling for training when Vast.ai only handles rentals.
Full x86 OS with GUI, networking, and processes—built via vibe-coding with Claude.