Overview
Nekotopia operates a comprehensive monitoring infrastructure to ensure service reliability, performance optimisation, and proactive issue detection. Our telemetry stack is built on industry-standard open-source tools, providing real-time visibility into network health and VPN service performance.
Monitoring Stack
Core Components
| Component | Purpose | Description |
|---|---|---|
| Prometheus | Metrics Database | Time-series collection and storage with a powerful query language (PromQL) |
| Grafana | Visualisation | Dashboards, alerting, and data exploration |
| MKTXP | MikroTik Exporter | Exports RouterOS metrics (interfaces, queues, firewall, wireless) |
| Node Exporter | Host Metrics | CPU, memory, disk, network stats from Linux servers |
| cAdvisor | Container Metrics | Docker container resource usage and performance |
| pmacct | Flow Analysis | NetFlow/IPFIX traffic accounting and analysis |
Architecture
The monitoring system follows a pull-based collection model with centralised storage and visualisation.
Data Collection Flow
1. Data Sources generate metrics:
- MikroTik Router (network stats, queues, firewall)
- Linux Hosts (system resources)
- Docker Containers (application metrics)
2. Exporters expose metrics in Prometheus format:
- MKTXP → MikroTik metrics on :9436
- Node Exporter → Host metrics on :9100
- cAdvisor → Container metrics on :8080
3. Prometheus scrapes all exporters every 15 seconds and stores 30 days of time-series data.
4. Grafana queries Prometheus and displays dashboards for operators.
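The flow above can be sketched in a few lines of Python. This is a minimal illustration rather than Nekotopia's actual configuration: the hostnames are hypothetical, while the ports, the 15-second interval, and the 30-day retention come from the steps above. `/api/v1/query` is Prometheus's standard instant-query endpoint, which a client such as Grafana would call.

```python
from urllib.parse import urlencode

# Ports are taken from the exporter list above; the hostnames are hypothetical.
EXPORTERS = {
    "mktxp": ("router-exporter.internal", 9436),
    "node": ("host1.internal", 9100),
    "cadvisor": ("host1.internal", 8080),
}
SCRAPE_INTERVAL_S = 15   # Prometheus scrape interval
RETENTION_DAYS = 30      # Prometheus retention window


def scrape_url(name: str) -> str:
    """URL that Prometheus would pull for the named exporter."""
    host, port = EXPORTERS[name]
    return f"http://{host}:{port}/metrics"


def samples_per_series() -> int:
    """Upper bound on samples kept per time series over the retention window."""
    return RETENTION_DAYS * 24 * 60 * 60 // SCRAPE_INTERVAL_S


def instant_query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL against the Prometheus HTTP API."""
    return f"{prometheus_base}/api/v1/query?{urlencode({'query': promql})}"
```

With these figures, each series holds at most 172,800 samples (30 days at one sample every 15 seconds).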
Visual Flow
┌──────────────────────────────────────────────────────────────────┐
│                           DATA SOURCES                           │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐        │
│   │  MikroTik   │     │    Linux    │     │   Docker    │        │
│   │   Router    │     │    Hosts    │     │ Containers  │        │
│   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘        │
│          │                   │                   │               │
│          ▼                   ▼                   ▼               │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐        │
│   │    MKTXP    │     │Node Exporter│     │  cAdvisor   │        │
│   │    :9436    │     │    :9100    │     │    :8080    │        │
│   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘        │
└──────────┼───────────────────┼───────────────────┼───────────────┘
           │                   │                   │
           │ :9436/metrics     │ :9100/metrics     │ :8080/metrics
           ▼                   ▼                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                            PROMETHEUS                            │
│                        (Metrics Database)                        │
│                                                                  │
│   • Scrapes exporters every 15 seconds                           │
│   • Stores 30 days of metrics                                    │
│   • Evaluates alerting rules                                     │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               │ :9090/api
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│                             GRAFANA                              │
│                           (Dashboards)                           │
│                                                                  │
│   • VPN User Statistics           • Router Health                │
│   • Bandwidth by Tier             • Container Resources          │
│   • Traffic Analysis              • Alert Management             │
└──────────────────────────────────────────────────────────────────┘
NetFlow Traffic Analysis
For detailed traffic analysis, NetFlow data follows a separate path:
┌─────────────┐   NetFlow    ┌─────────────┐   metrics    ┌─────────────┐
│  MikroTik   │ ───────────▶ │   pmacct    │ ───────────▶ │ Prometheus  │
│   Router    │      v9      │  Collector  │              │             │
└─────────────┘              └──────┬──────┘              └─────────────┘
                                    │
                                    │ Aggregated flow data
                                    ▼
                             ┌─────────────┐
                             │   Grafana   │
                             │  (Traffic   │
                             │  Analysis)  │
                             └─────────────┘
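To illustrate what "aggregated flow data" means in this path, the sketch below sums bytes per source address over a handful of flow records and ranks the heaviest senders. The record fields and sample values are hypothetical and do not reflect pmacct's actual output schema, which depends on how its aggregation is configured.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Flow(NamedTuple):
    # Hypothetical, simplified flow record; real NetFlow v9 records carry more fields.
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    octets: int  # bytes transferred in this flow


def bytes_by_source(flows: Iterable[Flow]) -> Counter:
    """Sum transferred bytes per source IP address."""
    totals: Counter = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.octets
    return totals


def top_talkers(flows: Iterable[Flow], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n source addresses with the highest byte totals."""
    return bytes_by_source(flows).most_common(n)


# Made-up sample data using private and documentation address ranges.
SAMPLE_FLOWS = [
    Flow("10.8.0.2", "192.0.2.10", 443, "tcp", 1200),
    Flow("10.8.0.3", "192.0.2.53", 53, "udp", 300),
    Flow("10.8.0.2", "192.0.2.11", 443, "tcp", 800),
    Flow("10.8.0.4", "192.0.2.12", 443, "tcp", 500),
]

print(top_talkers(SAMPLE_FLOWS, 2))  # [('10.8.0.2', 2000), ('10.8.0.4', 500)]
```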
Key Metrics Collected
Network Metrics
| Metric | Description |
|---|---|
| Interface Throughput | Bytes in/out per interface |
| Packet Rates | Packets per second, errors, drops |
| Queue Statistics | Per-user bandwidth enforcement |
| Firewall Counters | Rule hit counts, blocked connections |
| WireGuard Peers | Handshake status, data transfer |
System Metrics
| Metric | Description |
|---|---|
| CPU/Memory | Server and router resource usage |
| Disk I/O | Storage performance |
| Container Health | Docker service status |
| Process Monitoring | Critical service uptime |
Traffic Analysis
| Metric | Description |
|---|---|
| Flow Records | Source/destination IP, ports, protocols |
| User Bandwidth | Per-VPN-peer traffic accounting |
| Top Talkers | Highest bandwidth consumers |
| Protocol Distribution | Traffic breakdown by application |
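Throughput and per-user bandwidth figures are usually derived from monotonically increasing byte counters rather than stored as rates. The sketch below shows that calculation for two samples, treating a decrease as a counter reset. It is a simplified illustration of the idea behind PromQL's `rate()`, not a reimplementation of it, and the function name and sample values are made up.

```python
from typing import Tuple

# (unix timestamp in seconds, counter value in bytes)
Sample = Tuple[float, float]


def bytes_per_second(prev: Sample, curr: Sample) -> float:
    """Average byte rate between two counter samples."""
    t0, v0 = prev
    t1, v1 = curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A counter that goes backwards is assumed to have reset to zero
    # (for example after an exporter restart), so only the post-reset
    # value is counted as traffic.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / elapsed


# Two scrapes 15 seconds apart: 15,000 bytes transferred.
print(bytes_per_second((0, 1000), (15, 16000)))  # 1000.0
```

Multiplying the result by 8 gives bits per second, which is the unit most bandwidth dashboards display.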
Access
Grafana dashboards are available to administrators at the internal monitoring endpoint. User-facing statistics are exposed through the Nekotopia dashboard where appropriate.