Why scaling is different for MQTT
MQTT is connection-heavy. A broker can manage hundreds of thousands of long-lived connections, but each one holds memory for socket buffers and session state, and costs CPU for keep-alive traffic even when the client is otherwise idle.
Scaling is less about request rate and more about concurrent sessions, retained messages, and session state.
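A back-of-the-envelope estimate makes the point concrete. The sketch below is only a sizing aid: the function name and every input (per-connection overhead, retained payload size, queued messages per session) are assumptions to be replaced with figures measured on your own broker.

```python
def estimate_broker_memory_mib(
    connections: int,
    per_connection_kib: float,
    retained_messages: int,
    avg_retained_bytes: int,
    queued_per_session: int = 0,
    avg_queued_bytes: int = 0,
) -> float:
    """Rough broker memory footprint in MiB.

    All inputs are assumptions; measure them on a real broker under load.
    """
    connection_bytes = connections * per_connection_kib * 1024
    retained_bytes = retained_messages * avg_retained_bytes
    # Messages held for offline persistent sessions.
    queued_bytes = connections * queued_per_session * avg_queued_bytes
    return (connection_bytes + retained_bytes + queued_bytes) / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical fleet: 200k devices, 20 KiB each, 50k retained topics.
    mib = estimate_broker_memory_mib(200_000, 20.0, 50_000, 256)
    print(f"estimated footprint: {mib:.0f} MiB")
```

Note that request rate does not appear anywhere in the formula. The footprint grows with the number of sessions and the state held for them, not with how often they publish.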
Vertical scaling first
Start with a single, well-sized broker. It keeps the system simple and avoids the complexity of clustered state.
Tune connection limits, keep-alive intervals, and retained-message storage before moving to a cluster.
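Keep-alive tuning is a trade-off between background load and how quickly dead connections are detected. The sketch below assumes every client is idle and sends one PINGREQ per keep-alive interval. The helper names are illustrative. The 1.5x factor comes from the MQTT specification, which requires the server to disconnect a client it has not heard from within one and a half times the keep-alive period.

```python
def idle_ping_rate(connections: int, keep_alive_s: int) -> float:
    """Approximate PINGREQ packets per second from an idle fleet."""
    if keep_alive_s < 0:
        raise ValueError("keep_alive_s must be non-negative")
    if keep_alive_s == 0:
        # A keep-alive of 0 disables the mechanism in MQTT.
        return 0.0
    return connections / keep_alive_s


def dead_connection_timeout_s(keep_alive_s: int) -> float:
    """Time after which the broker must drop a silent client (1.5 x keep-alive)."""
    return 1.5 * keep_alive_s


if __name__ == "__main__":
    for keep_alive in (30, 60, 300):
        rate = idle_ping_rate(200_000, keep_alive)
        timeout = dead_connection_timeout_s(keep_alive)
        print(f"keep-alive {keep_alive}s: {rate:.0f} pings/s, timeout {timeout:.0f}s")
```

With 200,000 idle clients and a 60-second keep-alive, the broker handles roughly 3,333 pings per second before any application traffic. Lengthening the interval reduces that load but delays detection of half-open connections.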
Clustering strategies
Some brokers support shared-nothing clusters with consistent hashing; others replicate session state. The choice affects latency, durability, and failure handling.
Understand how the broker propagates subscriptions and retained messages across nodes. That cross-node propagation is often the performance bottleneck.
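To illustrate the consistent-hashing approach, here is a minimal sketch of a hash ring that maps client IDs to nodes. It is not how any particular broker implements clustering. The class name, the virtual-node count, and the choice of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping client IDs to broker nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to even out the spread.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._keys = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, client_id: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First ring point clockwise from the client's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(client_id))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("sensor-0042"))
```

The property that matters for failure handling is that removing a node only remaps the clients that node owned. Clients on the surviving nodes keep their placement, so their session state does not have to move.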
Session persistence
Persistent sessions allow clients to resume subscriptions and receive messages they missed while offline. In a cluster, that session state must live in shared storage or be replicated between nodes.
Decide which clients truly need persistence. Not every device needs a durable session.
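One way to make that decision explicit is a small policy function that maps a device's needs to connection settings. The sketch below uses MQTT 5 terms (Clean Start and Session Expiry Interval). In MQTT 3.1.1 the equivalent knob is the single Clean Session flag. The `SessionPolicy` type and the function name are hypothetical, not part of any client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionPolicy:
    clean_start: bool       # MQTT 5 Clean Start flag
    session_expiry_s: int   # MQTT 5 Session Expiry Interval, in seconds


def session_policy(needs_missed_messages: bool, max_offline_s: int = 0) -> SessionPolicy:
    """Pick session settings for a device class (illustrative policy only)."""
    if max_offline_s < 0:
        raise ValueError("max_offline_s must be non-negative")
    if not needs_missed_messages:
        # Telemetry-only devices: no state kept once the connection drops.
        return SessionPolicy(clean_start=True, session_expiry_s=0)
    # Devices that must receive missed messages: keep the session, but bound
    # its lifetime so abandoned sessions do not accumulate on the broker.
    return SessionPolicy(clean_start=False, session_expiry_s=max_offline_s)


if __name__ == "__main__":
    print(session_policy(False))
    print(session_policy(True, max_offline_s=24 * 3600))
```

Bounding the expiry interval matters in a cluster, because every durable session is state that has to be stored or replicated.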
Bridging and sharding
For large deployments, you can shard by region or customer and bridge brokers for cross-region flows. Sharding limits the blast radius of a failure to a single region or customer.
Bridges add latency and complexity. Use them when geographic distribution or tenancy requires isolation.
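A sharded layout can be sketched as a routing table plus a rule for which topics are allowed to cross a bridge. The hostnames, region codes, and the `global/` topic prefix below are placeholders invented for the example. Real bridge configuration is broker-specific.

```python
# Hypothetical shard map: one broker endpoint per region.
REGION_BROKERS = {
    "eu": "mqtt-eu.example.internal",
    "us": "mqtt-us.example.internal",
}

# Only topics under these prefixes are forwarded between regions (assumption).
BRIDGED_PREFIXES = ("global/",)


def broker_for(region: str) -> str:
    """Return the broker endpoint a device in `region` should connect to."""
    try:
        return REGION_BROKERS[region]
    except KeyError:
        raise ValueError(f"no broker shard for region {region!r}") from None


def crosses_bridge(topic: str) -> bool:
    """True if messages on `topic` should be forwarded to other regions."""
    return topic.startswith(BRIDGED_PREFIXES)


if __name__ == "__main__":
    print(broker_for("eu"))
    print(crosses_bridge("global/firmware/announce"))
    print(crosses_bridge("eu/plant-7/temperature"))
```

Keeping the bridged prefix list short is the design choice here. Most traffic stays inside its shard, and only the flows that genuinely need to cross regions pay the bridge's latency.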
Operational observability
Monitor connection churn, subscription count, retained size, and message rates. These metrics predict when a cluster will saturate.
Scale gradually and load test with realistic device behavior, including reconnect storms.
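Reconnect storms are worth modeling explicitly, because a fleet that retries on a fixed interval hits the broker in synchronized waves. A common mitigation on the client side is exponential backoff with random jitter. The sketch below shows the "full jitter" variant. The function name, base delay, and cap are assumptions rather than values from any client library.

```python
import random


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 300.0, rng=random) -> float:
    """Delay before reconnect attempt number `attempt` (0-based), in seconds.

    Exponential backoff with full jitter: a uniform random delay between 0 and
    min(cap_s, base_s * 2**attempt).
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Clamp the exponent so very long outages cannot overflow the float.
    ceiling = min(cap_s, base_s * (2 ** min(attempt, 32)))
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for attempt in range(6):
        print(f"attempt {attempt}: wait {reconnect_delay(attempt, rng=rng):.2f}s")
```

A load test that drives simulated devices with this kind of policy, and another with no backoff at all, shows how much headroom the broker needs for the worst-behaved clients in the fleet.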
Failure modes and recovery
Plan for node failure. What happens to persistent sessions, retained messages, and in-flight QoS 1/2 data?
Test failover procedures and validate that clients reconnect gracefully without unexpected data loss.
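A failover test needs a pass/fail check, not just a visual inspection of logs. One simple approach, sketched below, is to tag each test message with an application-level ID in the payload, kill a node mid-run, and compare what was sent with what subscribers received. The function name and the report format are assumptions for illustration.

```python
from collections import Counter


def failover_report(sent_ids, received_ids):
    """Compare published message IDs with those received across a failover."""
    sent = set(sent_ids)
    counts = Counter(received_ids)
    received = set(counts)
    return {
        # Published but never delivered: unacceptable for QoS 1 and QoS 2.
        "lost": sorted(sent - received),
        # Delivered more than once: allowed for QoS 1, a defect for QoS 2.
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
        # Delivered but never published by this test run.
        "unexpected": sorted(received - sent),
    }


if __name__ == "__main__":
    report = failover_report(sent_ids=[1, 2, 3, 4], received_ids=[1, 2, 2, 4])
    print(report)
```

Duplicates are expected under QoS 1, which guarantees at-least-once delivery, so subscribers must tolerate them. Any entry under "lost" for QoS 1 or QoS 2 traffic indicates that in-flight or queued session data did not survive the node failure.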
