Mozilla Telemetry: In-depth Data Pipeline (via) Detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams) but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a “tee” server written in Lua and OpenResty which lets them route some traffic to alternative backend.
Recent articles
- Trying out llama.cpp's new vision support - 10th May 2025
- Saying "hi" to Microsoft's Phi-4-reasoning - 6th May 2025
- Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (also LLM 0.25) - 5th May 2025