Conventions That Scale: File Naming Standards in Production AI Systems
In production AI systems that process millions of files daily, naming conventions aren’t trivial details; they’re critical infrastructure decisions. In a recent incident, a junior engineer renamed all uppercase files to lowercase, and our data pipeline missed critical configuration files for three hours. The incident showed why understanding and respecting established conventions matters.
The Power of Visual Hierarchy#
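Unix projects have long used uppercase or capitalized filenames for important files such as README, Makefile, and LICENSE. This wasn’t arbitrary: in ASCII order, uppercase letters sort before lowercase ones, so these files rise to the top of terminal listings and create a natural visual hierarchy. The effect depends on byte-order sorting (for example, the C locale); many modern locales collate case-insensitively, which interleaves the names instead.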
$ ls -la
total 1824
drwxr-xr-x 24 mleng staff 768 Jun 2 10:15 .
drwxr-xr-x 18 mleng staff 576 Jun 2 09:30 ..
-rw-r--r-- 1 mleng staff 1067 Jun 2 09:45 LICENSE
-rw-r--r-- 1 mleng staff 2456 Jun 2 10:15 Makefile
-rw-r--r-- 1 mleng staff 8923 Jun 2 10:00 README.md
drwxr-xr-x 12 mleng staff 384 Jun 2 09:30 configs/
drwxr-xr-x 45 mleng staff 1440 Jun 2 10:10 data/
-rw-r--r-- 1 mleng staff 456 Jun 2 09:30 requirements.txt
drwxr-xr-x 23 mleng staff 736 Jun 2 10:05 scripts/
drwxr-xr-x 18 mleng staff 576 Jun 2 09:50 src/
Critical files surface immediately, reducing cognitive load when navigating complex projects.
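The sorting behaviour behind this listing can be reproduced in a few lines. This is a minimal sketch using a hypothetical list of names modeled on the listing above; it contrasts byte-order sorting with a case-insensitive sort similar to what locale-aware tools often show.

```python
# Hypothetical file names modeled on the listing above.
names = ["src", "README.md", "configs", "Makefile", "data",
         "LICENSE", "requirements.txt", "scripts"]

# Byte-order (ASCII) sort: every uppercase letter sorts before any lowercase one.
ascii_order = sorted(names)

# Case-insensitive sort: uppercase names are interleaved with the rest.
folded_order = sorted(names, key=str.lower)

print(ascii_order)
# ['LICENSE', 'Makefile', 'README.md', 'configs', 'data', 'requirements.txt', 'scripts', 'src']
print(folded_order)
# ['configs', 'data', 'LICENSE', 'Makefile', 'README.md', 'requirements.txt', 'scripts', 'src']
```

If a listing on your machine looks like the second ordering, the visual hierarchy only appears when sorting is forced to byte order.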
Conventions in Modern AI Projects#
Configuration Files#
In ML projects, configuration files often control critical behavior:
CONFIG.yaml # Global configuration - always uppercase
EXPERIMENTS.yaml # Experiment definitions - uppercase for visibility
hyperparameters.yaml # Specific run parameters - lowercase
This hierarchy communicates importance through naming alone.
Data Pipeline Files#
DATA_MANIFEST.json # Critical data inventory
SCHEMA.json # Data schema definition
PIPELINE.yaml # Main pipeline configuration
transform_01.py # Individual transformation scripts
transform_02.py
The uppercase files define the system; lowercase files implement it.
Model Artifacts#
models/
├── BASELINE.onnx # Reference model
├── PRODUCTION.onnx # Current production model
├── experiment_001.onnx # Experimental variants
├── experiment_002.onnx
└── experiment_003.onnx
Critical models stand out immediately in listings and logs.
Real-World Impact#
Case Study: Missing Configuration Incident#
Our data pipeline scans for configuration files using glob patterns:
from pathlib import Path

import yaml  # PyYAML

class ConfigurationError(Exception):
    """Raised when a required configuration file is missing."""

def load_critical_configs():
    """Load all critical configuration files"""
    configs = {}
    # Original implementation - relies on uppercase convention
    for config_file in Path('.').glob('[A-Z]*.yaml'):
        configs[config_file.stem] = yaml.safe_load(config_file.read_text())
    if 'CONFIG' not in configs:
        raise ConfigurationError("Missing CONFIG.yaml")
    return configs
When the uppercase files were renamed, the [A-Z]*.yaml pattern stopped matching them. The pipeline silently skipped configuration loading and processed data with defaults, corrupting 10,000 training samples before anyone detected the problem.
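One way to make this kind of loader less fragile is to name the required files explicitly and resolve them case-insensitively, so a rename either still resolves or fails loudly. The following is a sketch under stated assumptions, not the pipeline's actual code: `resolve_required_configs` and `MissingConfigError` are hypothetical names, and parsing the resolved files is left to the caller.

```python
from pathlib import Path


class MissingConfigError(Exception):
    """Hypothetical error type used by this sketch."""


def resolve_required_configs(directory, required=("CONFIG.yaml",)):
    """Map each required name to the file actually on disk, ignoring case.

    Raises MissingConfigError listing every required file that is absent.
    If two files differ only by case, the one listed last by the OS wins.
    """
    by_folded = {p.name.lower(): p for p in Path(directory).iterdir() if p.is_file()}
    resolved, missing = {}, []
    for name in required:
        path = by_folded.get(name.lower())
        if path is None:
            missing.append(name)
        else:
            resolved[name] = path
    if missing:
        raise MissingConfigError(f"Missing required config files: {', '.join(missing)}")
    return resolved
```

Because the required names are listed rather than discovered through a pattern, a file that disappears from the match set becomes an error instead of a silent omission.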
Terminal-First Development#
Many AI engineers spend most of their working day in terminals: SSH sessions to GPU clusters, kubectl exec into containers, tmux sessions monitoring training. In these environments, visual conventions matter:
# Quick identification of important files
$ ls | head -5
CHANGELOG.md
CONTRIBUTING.md
LICENSE
README.md
SECURITY.md
# Versus everything mixed together
$ ls | head -5
api.py
changelog.md
client.py
config.py
contributing.md
The first layout enables faster navigation and reduces errors.
Modern Tool Compatibility#
Git and Version Control#
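Git tracks filenames case-sensitively, but case-only renames behave differently depending on the filesystem underneath:
# macOS (case-insensitive filesystem by default), older Git releases
$ git mv readme.md README.md
fatal: destination exists, source=readme.md, destination=README.md
# Workaround: rename through a temporary name
$ git mv readme.md readme.tmp && git mv readme.tmp README.md
# Linux (case-sensitive filesystem)
$ git mv readme.md README.md
# Works fine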
Understanding these differences prevents cross-platform issues in diverse teams.
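A related cross-platform hazard is two tracked paths that differ only by case: they coexist on Linux but collide on a case-insensitive checkout. A minimal sketch for detecting them follows; `find_case_collisions` is a hypothetical helper, and the input is assumed to be a plain list of repository paths (for example, the lines printed by `git ls-files`).

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that are identical once case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    # Sort inside and across groups so the result is deterministic.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


collisions = find_case_collisions(["README.md", "readme.md", "src/api.py"])
print(collisions)  # [['README.md', 'readme.md']]
```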
Docker and Containers#
Container builds can fail due to case mismatches:
# Dockerfile
# Fails if the file in the build context is named CONFIG.yaml
COPY Config.yaml /app/config.yaml
Consistent conventions prevent these build failures.
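On a case-insensitive host, an ordinary existence check will not reveal this mismatch, because Config.yaml and CONFIG.yaml resolve to the same file. A sketch of an exact-case check that compares each path component against the directory listing is shown below; `exists_with_exact_case` is a hypothetical helper, not part of Docker or any library.

```python
import os


def exists_with_exact_case(root, relative_path):
    """True only if every component of relative_path matches a directory
    entry under root with exactly the same spelling and case."""
    current = root
    for part in relative_path.replace("\\", "/").split("/"):
        if part in ("", "."):
            continue
        try:
            entries = os.listdir(current)  # names as stored on disk
        except (FileNotFoundError, NotADirectoryError):
            return False
        if part not in entries:
            return False
        current = os.path.join(current, part)
    return True
```

Running such a check over the source paths referenced in a Dockerfile before building could surface the mismatch on a developer laptop rather than in CI.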
IDE Integration#
Modern IDEs sort files differently:
- VS Code: Case-insensitive by default, configurable
- IntelliJ: Respects system settings
- Vim/Emacs: Follow terminal conventions
Teams must agree on conventions that work across all tools.
Establishing Team Conventions#
Documentation-Driven Conventions#
Create explicit standards in your project:
# File Naming Conventions
## Uppercase Files
- README.md - Project documentation
- LICENSE - Legal information
- CHANGELOG.md - Version history
- SECURITY.md - Security policies
- CONFIG.* - Global configuration
- SCHEMA.* - Data schemas
## Lowercase Files
- All source code files (.py, .js, .go)
- Test files (*_test.py)
- Local configurations (.env)
- Script files (*.sh)
## Special Cases
- Makefile - Always capitalized (Make convention)
- Dockerfile - Always capitalized (Docker convention)
- Jenkinsfile - Always capitalized (Jenkins convention)
Automated Enforcement#
Use pre-commit hooks to enforce conventions:
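# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: check-uppercase-files
        name: Verify uppercase conventions
        entry: python scripts/check_naming.py
        language: python
        # The script scans the repository itself, so run it on every
        # commit and do not pass staged filenames to it.
        pass_filenames: false
        always_run: true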
# scripts/check_naming.py
import sys
from pathlib import Path

REQUIRED_UPPERCASE = ['README.md', 'LICENSE', 'CHANGELOG.md']
FORBIDDEN_UPPERCASE = ['*.py', '*.js', '*.go']


def check_conventions(root='.'):
    errors = []
    # Compare against names as stored on disk: Path.exists() cannot tell
    # README.md from readme.md on a case-insensitive filesystem.
    names = {p.name for p in Path(root).iterdir()}
    for required in REQUIRED_UPPERCASE:
        variants = [n for n in names if n.lower() == required.lower() and n != required]
        for variant in sorted(variants):
            errors.append(f"{variant} should be named {required}")
    # Top-level source files must not start with an uppercase letter
    for pattern in FORBIDDEN_UPPERCASE:
        for file in Path(root).glob(pattern):
            if file.name[0].isupper():
                errors.append(f"{file} should be lowercase")
    return errors


if __name__ == '__main__':
    errors = check_conventions()
    if errors:
        print("Naming convention violations:")
        for error in errors:
            print(f"  - {error}")
        sys.exit(1)
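The documented rules, including the special cases, can also be expressed as a pure function over file names, which is easier to unit-test than a filesystem scan. This is a sketch: the rule sets mirror the convention document above, and `naming_violation` is a hypothetical helper rather than part of the hook.

```python
from pathlib import PurePath

# Illustrative rule sets taken from the convention document above.
UPPERCASE_STEMS = {"README", "LICENSE", "CHANGELOG", "SECURITY", "CONFIG", "SCHEMA"}
SPECIAL_NAMES = {"Makefile", "Dockerfile", "Jenkinsfile"}
LOWERCASE_SUFFIXES = {".py", ".js", ".go", ".sh"}


def naming_violation(path):
    """Return a message if the file name breaks the conventions, else None."""
    name = PurePath(path).name
    stem, suffix = PurePath(name).stem, PurePath(name).suffix

    # Special cases have exactly one accepted spelling.
    for special in SPECIAL_NAMES:
        if name.lower() == special.lower():
            return None if name == special else f"{name} should be spelled {special}"

    # Source files are checked before stems so config.py is not mistaken for CONFIG.*.
    if suffix.lower() in LOWERCASE_SUFFIXES:
        return None if name == name.lower() else f"{name} should be lowercase"

    if stem.upper() in UPPERCASE_STEMS:
        return None if stem.isupper() else f"{name} should use an uppercase stem"

    return None
```

For example, `naming_violation("readme.md")` reports a violation while `naming_violation("config.py")` returns `None`, because the source-file rule takes precedence over the CONFIG.* rule.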
Migration Strategies#
When joining projects with established conventions:
1. Understand Before Changing#
# Analyze existing patterns (skipping Git's own metadata)
$ find . -type f -name '[A-Z]*' -not -path './.git/*' | head -20
$ git log --follow -- README.md # Check history
2. Communicate Changes#
If changes are necessary, document the reasoning:
## RFC: Lowercase Migration
### Problem
- Cross-platform inconsistencies
- Tool compatibility issues
### Proposed Solution
- Migrate all files to lowercase
- Update all references
- Add redirects for documentation
### Impact
- 423 files to rename
- 89 import statements to update
- CI/CD pipeline modifications
3. Gradual Migration#
For large codebases, migrate incrementally:
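#!/bin/bash
# Staged migration script
set -euo pipefail
# Phase 1: Non-critical files (lowercase the basename only, keep directory names)
find . -type f -name '[A-Z]*.txt' -not -path './.git/*' -print0 |
while IFS= read -r -d '' file; do
  dir=$(dirname "$file")
  base=$(basename "$file" | tr '[:upper:]' '[:lower:]')
  git mv "$file" "$dir/$base"
done
# Phase 2: Documentation (with redirects)
# Note: the redirect stub needs a case-sensitive filesystem; on a
# case-insensitive one, README.md and readme.md are the same file.
for doc in README.md CHANGELOG.md; do
  lower=$(echo "$doc" | tr '[:upper:]' '[:lower:]')
  git mv "$doc" "$lower"
  echo "See $lower" > "$doc"
  git add "$doc"
done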
# Phase 3: Critical configs (with compatibility period)
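For the compatibility period in Phase 3, one possible approach is to have loaders accept both the new and the legacy name for a while and warn when the legacy one is used. The sketch below assumes a migration from CONFIG.yaml to config.yaml; `resolve_config_path` and both default names are illustrative, not taken from the pipeline.

```python
import warnings
from pathlib import Path


def resolve_config_path(directory, new_name="config.yaml", legacy_name="CONFIG.yaml"):
    """Prefer the new name, fall back to the legacy name with a warning."""
    directory = Path(directory)
    # Use names as stored on disk so the check is exact on any filesystem.
    actual_names = {p.name for p in directory.iterdir()}
    if new_name in actual_names:
        return directory / new_name
    if legacy_name in actual_names:
        warnings.warn(
            f"{legacy_name} is deprecated; rename it to {new_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return directory / legacy_name
    raise FileNotFoundError(f"Neither {new_name} nor {legacy_name} found in {directory}")
```

Once the warnings stop appearing in logs, the fallback branch can be removed and the migration closed out.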
Best Practices for AI Teams#
1. Document Conventions Early#
Include naming standards in project initialization:
# cookiecutter template for ML projects
{{cookiecutter.project_name}}/
├── README.md # Always uppercase
├── CONTRIBUTING.md # Always uppercase
├── Makefile # Standard capitalization
├── requirements.txt # Always lowercase
└── src/ # Always lowercase
    └── {{cookiecutter.module_name}}/ # Always lowercase
2. Consider Tool Ecosystem#
Different ML tools have different expectations:
- MLflow: Lowercase for artifacts
- Kubeflow: Lowercase for resources
- Airflow: snake_case for DAGs
- DVC: Lowercase for DVC files
3. Prioritize Consistency#
Whether uppercase or lowercase, consistency matters most:
# Inconsistent (avoid)
README.md
changelog.md
LICENSE.txt
Security.MD
# Consistent uppercase
README.md
CHANGELOG.md
LICENSE.txt
SECURITY.md
# Consistent lowercase
readme.md
changelog.md
license.txt
security.md
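Consistency in this sense can be checked mechanically. The sketch below classifies each file name's stem as upper, lower, or mixed and reports whether a set of names sticks to one style; `casing_style` and `is_consistent` are hypothetical helpers, and the check looks only at the stem, so an extension such as `.MD` is not evaluated.

```python
from pathlib import PurePath


def casing_style(name):
    """Classify the stem of a file name as 'upper', 'lower', or 'mixed'."""
    stem = PurePath(name).stem
    if stem.isupper():
        return "upper"
    if stem.islower():
        return "lower"
    return "mixed"  # mixed case, or no letters at all


def is_consistent(names):
    """True when every name uses the same casing style."""
    return len({casing_style(n) for n in names}) <= 1


print(is_consistent(["README.md", "changelog.md", "LICENSE.txt", "Security.MD"]))  # False
print(is_consistent(["README.md", "CHANGELOG.md", "LICENSE.txt", "SECURITY.md"]))  # True
print(is_consistent(["readme.md", "changelog.md", "license.txt", "security.md"]))  # True
```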
Naming conventions become infrastructure decisions in production systems processing millions of files. The uppercase convention provides visual hierarchy that aids navigation and reduces errors. These conventions affect pipeline reliability, cross-platform compatibility, and system maintainability. Before changing established conventions, understand their purpose.