This document outlines the DevOps best practices applied to the observability stack configuration.

General practices:

- `unless-stopped` ensures services restart automatically.
- A dedicated network (`monitoring-network`) for service communication.
- Sensitive values are supplied through `.env` files or secret management.
- Read-only mounts (`:ro`).

Named volumes:

- `prometheus-data` - Stores time-series data (30-day retention)
- `grafana-data` - Stores dashboards, users, datasources
- `grafana-logs` - Separate log storage
- `alertmanager-data` - Stores alert state
- `loki-data` - Stores log data

Service ports and health endpoints:

- Grafana: port `3030:3000` (host:container); health check on the `/api/health` endpoint
- Prometheus: port `9090:9090`; health check on the `/-/healthy` endpoint; uses `host.docker.internal` for scraping host services
- Alertmanager: port `9093:9093`; health check on the `/-/healthy` endpoint
- Loki: port `3100:3100`; health check on the `/ready` endpoint

Network layout:

    ┌─────────────────────────────────────────┐
    │       monitoring-network (bridge)       │
    │                                         │
    │ ┌──────────┐ ┌──────────┐ ┌──────────┐  │
    │ │Prometheus│ │   Loki   │ │ Grafana  │  │
    │ └──────────┘ └──────────┘ └──────────┘  │
    │      │            │            │        │
    │ ┌──────────┐ ┌──────────┐               │
    │ │ Alertmgr │ │ Promtail │               │
    │ └──────────┘ └──────────┘               │
    └─────────────────────────────────────────┘
                         │
                         │ (via host.docker.internal)
                         ▼
    ┌─────────────────────────────────────────┐
    │              Host Machine               │
    │         Backend: localhost:8080         │
    └─────────────────────────────────────────┘

All services implement health checks:

    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider <endpoint> || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s  # 30s to 40s, depending on the service

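
The retry behaviour of the health check above can be approximated from the host when scripting against the stack. The following is a minimal sketch, not part of the Compose configuration: `wait_healthy` is a hypothetical helper name, and the probe command, retry count, and interval are passed in by the caller.

```shell
# wait_healthy PROBE [RETRIES] [INTERVAL]
# Hypothetical helper: runs PROBE (a shell command string) up to RETRIES
# times, sleeping INTERVAL seconds between attempts.
# Returns 0 on the first successful probe, 1 if every attempt fails.
wait_healthy() {
  probe=$1
  retries=${2:-3}     # default mirrors "retries: 3"
  interval=${3:-30}   # default mirrors "interval: 30s"
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if sh -c "$probe" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_healthy "wget --no-verbose --tries=1 --spider http://localhost:9090/-/healthy" 3 30` would probe Prometheus through its host port, assuming `wget` is installed on the host.
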
| Service | Host Port | Container Port | Purpose |
|---|---|---|---|
| Grafana | 3030 | 3000 | Web UI (avoids conflict with Next.js) |
| Prometheus | 9090 | 9090 | Metrics collection |
| Alertmanager | 9093 | 9093 | Alert management |
| Loki | 3100 | 3100 | Log aggregation API |
| Promtail | - | - | Internal log collection |

Grafana starts after its datasources:

    depends_on:
      - prometheus  # Grafana needs Prometheus datasource
      - loki        # Grafana needs Loki datasource


Environment variables:

- `GRAFANA_ADMIN_USER` - Grafana admin username
- `GRAFANA_ADMIN_PASSWORD` - Grafana admin password
- `GRAFANA_LOG_LEVEL` - Logging level (info, warn, error)
- `SLACK_WEBHOOK_URL` - Slack integration
- `SMTP_*` - Email notifications
- `WEBHOOK_URL` - Custom webhook
- `PAGERDUTY_SERVICE_KEY` - PagerDuty integration

Backup commands:

    # Backup Grafana data
    docker run --rm -v monitoring_grafana-data:/data -v "$(pwd)":/backup \
      alpine tar czf /backup/grafana-backup-$(date +%Y%m%d).tar.gz /data

    # Backup Prometheus data
    docker run --rm -v monitoring_prometheus-data:/data -v "$(pwd)":/backup \
      alpine tar czf /backup/prometheus-backup-$(date +%Y%m%d).tar.gz /data

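
Since both backups follow the same pattern, they can be wrapped in one function. This is a sketch under a few assumptions: `backup_volume` is a hypothetical helper name, the `DOCKER` variable is an illustrative override that allows a dry run (`DOCKER=echo`), and the data volume is mounted read-only (`:ro`), which the original commands do not do.

```shell
# backup_volume VOLUME NAME
# Hypothetical helper: archives the Docker volume VOLUME into
# ./NAME-backup-YYYYMMDD.tar.gz in the current directory.
# DOCKER is an illustrative override, e.g. DOCKER=echo prints the command
# instead of running it.
backup_volume() {
  volume=$1
  name=$2
  archive="${name}-backup-$(date +%Y%m%d).tar.gz"
  # Assumption: the data volume is mounted read-only, since the backup
  # only needs to read from it.
  ${DOCKER:-docker} run --rm \
    -v "${volume}:/data:ro" \
    -v "$(pwd):/backup" \
    alpine tar czf "/backup/${archive}" /data
}
```

With this in place, `backup_volume monitoring_grafana-data grafana` and `backup_volume monitoring_prometheus-data prometheus` would produce the same archive names as the commands above.
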

To update the stack, take it down, pull new images, and start it again:

    docker compose -f monitoring/docker-compose.observability.yml down
    docker compose -f monitoring/docker-compose.observability.yml pull
    docker compose -f monitoring/docker-compose.observability.yml up -d

To inspect running services:

    # All services
    docker compose -f monitoring/docker-compose.observability.yml ps

    # Individual service logs
    docker compose -f monitoring/docker-compose.observability.yml logs <service>


    # Health check manually
    curl http://localhost:3030/api/health   # Grafana
    curl http://localhost:9090/-/healthy    # Prometheus
    curl http://localhost:9093/-/healthy    # Alertmanager
    curl http://localhost:3100/ready        # Loki

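
The four manual checks can also be run in one pass. The sketch below assumes the host ports listed in this document; `check_all` is a hypothetical helper name, and its optional argument replaces the default `curl` probe, which is mainly useful for testing the loop itself.

```shell
# check_all [PROBE]
# Hypothetical helper: probes each health endpoint and prints one line per
# service. PROBE is a command that receives the URL as its last argument;
# it defaults to a quiet curl that fails on HTTP errors.
# Returns 0 if every service responded, 1 otherwise.
check_all() {
  probe=${1:-"curl -fsS --max-time 10 -o /dev/null"}
  failed=0
  while read -r name url; do
    # stdin is redirected so the probe cannot consume the endpoint list.
    if $probe "$url" </dev/null >/dev/null 2>&1; then
      echo "ok   $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done <<EOF
grafana http://localhost:3030/api/health
prometheus http://localhost:9090/-/healthy
alertmanager http://localhost:9093/-/healthy
loki http://localhost:3100/ready
EOF
  return "$failed"
}
```

Running `check_all` with no arguments prints an `ok` or `FAIL` line for each service, and the exit status can be used in scripts or CI steps.
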

    # Check port usage
    lsof -i :3030
    lsof -i :9090
    lsof -i :9093
    lsof -i :3100

    # Kill the process on a port if needed (replace PORT with the port number)
    kill -9 $(lsof -t -i:PORT)