Posts for: #Ai-Systems

The Three Truths of Data-Oriented Development: Lessons from Production AI Systems

Mike Acton’s 2014 CppCon talk on data-oriented design fundamentally changed how I approach software engineering. After building AI systems serving millions of users, these principles have proven even more critical in production environments where data volume, transformation pipelines, and hardware constraints dominate success metrics.

Rather than frame these as “lies to avoid,” I’ve found greater value in articulating them as positive truths to embrace. These three principles have guided every production system I’ve architected, particularly in AI/ML contexts where data-oriented thinking isn’t optional—it’s fundamental.

[Read more]

Clarity: AI-Powered Team Transparency Through Text

Distributed teams lose visibility into what everyone is doing. Managers interrupt with status requests. Developers context-switch to update multiple systems. Jira tickets don’t reflect reality. Confluence pages go stale. Git commits tell only part of the story.

Clarity solves this by using AI to synthesize status from all these sources into readable text that humans actually want to read.

The Architecture

Model Context Protocol (MCP) servers pull data from Jira, Confluence, and Git into a shared context window. AI processes this nightly, generating personalized daily status reports in Markdown. Each team member reviews their pre-baked summary via chat, CLI, or any text interface. Management gets real-time access to these reports and generated dashboards.
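The aggregation step can be sketched in a few lines of Python. The `fetch_*` helpers below are hypothetical stand-ins for the MCP servers, and a real pipeline would hand the combined context to an LLM for synthesis rather than concatenating it verbatim:

```python
from datetime import date

# Hypothetical fetchers standing in for the MCP servers; each returns
# plain-text activity lines for one person from one source.
def fetch_jira(user):
    return [f"{user}: moved PROJ-123 to In Review"]

def fetch_confluence(user):
    return [f"{user}: updated the release runbook page"]

def fetch_git(user):
    return [f"{user}: pushed 3 commits to a feature branch"]

def nightly_report(user):
    """Assemble one team member's pre-baked daily status as Markdown."""
    sections = {
        "Jira": fetch_jira(user),
        "Confluence": fetch_confluence(user),
        "Git": fetch_git(user),
    }
    lines = [f"# Daily status: {user} ({date.today():%Y-%m-%d})"]
    for source, items in sections.items():
        lines.append(f"\n## {source}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)
```

Because the output is plain Markdown, the same report renders in chat, in a CLI pager, or on a dashboard without any per-surface formatting code.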

[Read more]

FFmpeg for AI Training Data: Jump-Cut Automation

Video data represents one of the richest sources for training AI models, from action recognition to content moderation systems. However, raw video often contains significant noise - dead air, redundant frames, and irrelevant segments. Here’s a production-tested approach to automated video processing that has streamlined our training data preparation.

The Challenge: Extracting Signal from Video Noise

When building datasets for video understanding models, we frequently encounter:

  • Long pauses that add no informational value
  • Redundant segments that can skew model training
  • Inconsistent formats that break processing pipelines
  • Massive file sizes that inflate storage costs

The solution? Automated jump-cut processing with intelligent backup strategies.
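As a sketch of the approach, the pipeline can be driven from Python: run FFmpeg’s `silencedetect` filter to locate dead air, invert those windows into keep-spans, back up the original before any destructive step, then re-encode with `select`/`aselect`. The thresholds and helper names here are illustrative, not exact production values:

```python
import re
import shutil
import subprocess
from pathlib import Path

def detect_silence(path, noise="-30dB", min_dur=2.0):
    """Run FFmpeg's silencedetect filter; return (start, end) pairs.

    FFmpeg writes filter logs to stderr, so we parse the
    silence_start/silence_end lines out of it.
    """
    result = subprocess.run(
        ["ffmpeg", "-i", str(path),
         "-af", f"silencedetect=noise={noise}:d={min_dur}",
         "-f", "null", "-"],
        capture_output=True, text=True,
    )
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", result.stderr)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", result.stderr)]
    return list(zip(starts, ends))

def invert_silences(silences, total_duration):
    """Convert detected silence windows into the spans worth keeping."""
    keep, cursor = [], 0.0
    for start, end in silences:
        if start > cursor:
            keep.append((cursor, start))
        cursor = end
    if cursor < total_duration:
        keep.append((cursor, total_duration))
    return keep

def jump_cut(src, dst, keep_spans):
    """Back up the source, then re-encode only the keep-spans."""
    shutil.copy2(src, Path(src).with_suffix(".bak.mp4"))  # backup first
    expr = "+".join(f"between(t,{s:.2f},{e:.2f})" for s, e in keep_spans)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-vf", f"select='{expr}',setpts=N/FRAME_RATE/TB",
         "-af", f"aselect='{expr}',asetpts=N/SR/TB",
         str(dst)],
        check=True,
    )
```

The `setpts`/`asetpts` rewrites are what make this a jump cut rather than a freeze: after `select` drops frames, timestamps are regenerated so the kept segments play back-to-back.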

[Read more]

Conventions That Scale: File Naming Standards in Production AI Systems

In production AI systems processing millions of files daily, naming conventions aren’t trivial details - they’re critical infrastructure decisions. In a recent incident, a junior engineer renamed all uppercase files to lowercase, causing our data pipeline to miss critical configuration files for three hours. It was a sharp reminder of why understanding and respecting established conventions matters.

The Power of Visual Hierarchy

Traditional Unix systems established uppercase filenames for important files - README, Makefile, LICENSE. This wasn’t arbitrary; it leveraged the ASCII sorting order where uppercase letters precede lowercase, creating natural visual hierarchy in terminal listings.
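The effect is easy to verify: in ASCII, uppercase letters (codes 65-90) sort before lowercase (97-122), so a byte-order sort surfaces the important files first. A quick sketch with illustrative file names:

```python
# ASCII/byte-order sort, as a C-locale `ls` would produce it:
# uppercase names float to the top of the listing.
files = ["data.csv", "readme.txt", "README", "Makefile", "LICENSE", "app.py"]

for name in sorted(files):
    print(name)
# LICENSE, Makefile, README come first; app.py, data.csv, readme.txt follow.
```

Note that many modern systems default to locale-aware collation (e.g. `LC_COLLATE=en_US.UTF-8`), which interleaves cases and hides this hierarchy - one more reason renaming conventions shouldn’t be changed casually.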

[Read more]