My infrastructure: Ridge
Ridge is a self-hosted infrastructure for managing end-to-end Data Science and AI projects. Selected Ridge services are now reachable over the internet, on request and behind authentication, at eduardodefilippis.com/ridge.
• How Access Works: Traffic enters through a Cloudflare Tunnel (cloudflared), so no inbound ports are open, the server's IP is never exposed, and the connection is encrypted under a zero-trust model. It then reaches Traefik, which acts as a reverse proxy and routes requests dynamically based on Docker labels.
• Services:
- MLflow: experiment tracking and model registry
- MinIO: S3-compatible object storage for artifacts
- VS Code Server: browser-based IDE running as 3 isolated instances with native access to the NVIDIA GPU
- Gitea: self-hosted Git repository with code review
- Actions Runner: CI/CD pipelines integrated with Gitea
- Coolify: self-hosted PaaS for application deployment
- Metrics collection: node-exporter, cAdvisor, nvidia-exporter, and Traefik metrics, which together gather system, container, GPU, and network data
- Prometheus: data aggregation with a 30-day retention policy
- Grafana: dashboards and alerting
- OpenProject: project management with Gantt charts
- Mattermost: team chat and boards
- MediaWiki: knowledge base and technical documentation
- Memos: quick notes and personal capture
• Storage & Backup: Data is physically segregated, with SSDs holding configurations and compose files and HDDs holding volumes, artifacts, and databases. Automatic nightly backups covering 9 services are pushed to iCloud via rclone.
• Foundation: The entire stack runs on bare-metal Ubuntu 24.04 LTS with Docker Engine (ridge-net bridge network) and NVIDIA GPU access with CUDA. Provisioning is handled entirely by Ansible, and security relies on an nftables firewall and Fail2Ban.
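
To make the "dynamic routing via Docker labels" step more concrete, here is a minimal sketch of the labels Traefik reads from a container to build a route. It is an illustration, not Ridge's actual configuration: the helper `traefik_labels`, the router name `mlflow`, the hostname `mlflow.example.com`, port `5000`, and the entrypoint name `web` are all assumptions. Only the `ridge-net` network name comes from the description above.

```python
def traefik_labels(
    name: str,
    host: str,
    port: int,
    entrypoint: str = "web",     # assumed entrypoint name; defined in Traefik's static config
    network: str = "ridge-net",  # bridge network named in the Foundation section
) -> dict[str, str]:
    """Build the Docker labels Traefik's Docker provider uses to route `host` to a container port."""
    return {
        # Opt the container in (needed when exposedByDefault is disabled).
        "traefik.enable": "true",
        # Tell Traefik which Docker network to use to reach the container.
        "traefik.docker.network": network,
        # Router: match requests by Host header.
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints": entrypoint,
        # Service: the container port that receives the traffic.
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


if __name__ == "__main__":
    # Hypothetical service; print the labels in the key=value form used in compose files.
    for key, value in traefik_labels("mlflow", "mlflow.example.com", 5000).items():
        print(f"{key}={value}")
```

With labels like these on each container, adding a new service needs no change to Traefik itself: the proxy watches Docker and picks up the route when the container starts.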