For fun and curiosity, I recently decided to build a little console app to run some benchmarks across the various transports.
It was prompted by finishing up the NATS implementation. I knew that NATS is fast, so I assumed the Nimbus transport provider would demonstrate that, but I wanted to know more.
Two interesting things came from this little exercise.
- I found a bug!
- Yes, NATS is really really fast.
The bug
Well, more of an “opportunity for improvement” than a bug, but in my AMQP transport I had a very naive (aka inefficient) approach of creating a new queue or topic Producer on every message send.
It got the transport working, and I never went back to revisit it once all my tests were passing. Turns out that’s a huge overhead.
So now I have a pool of senders, cached by the destination key, so repeat sends and publishes are fast. It took the AMQP transport from quite slow to very, very fast.
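The caching idea can be sketched roughly like this. It's a minimal illustration in Python rather than the actual Nimbus code: `Producer` and `ProducerCache` are made-up names, and the real AMQP producer setup is replaced with a stand-in object.

```python
import threading
from typing import Callable, Dict, List


class Producer:
    """Stand-in for an expensive-to-create transport producer (hypothetical)."""

    def __init__(self, destination: str) -> None:
        self.destination = destination
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class ProducerCache:
    """Creates one producer per destination key and reuses it for later sends."""

    def __init__(self, factory: Callable[[str], Producer]) -> None:
        self._factory = factory
        self._producers: Dict[str, Producer] = {}
        self._lock = threading.Lock()
        self.created = 0  # how many producers were actually built

    def get(self, destination: str) -> Producer:
        # The lock keeps two threads from building a producer for the same key.
        with self._lock:
            producer = self._producers.get(destination)
            if producer is None:
                producer = self._factory(destination)
                self._producers[destination] = producer
                self.created += 1
            return producer


cache = ProducerCache(Producer)
for i in range(1000):
    cache.get("queue.orders").send(f"message {i}")
cache.get("topic.events").send("something happened")
```

A thousand sends to the same queue build a single producer here, instead of a thousand as in the naive version.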
So I’ll call that a win for performance testing.
NATS is Fast
When we added the Redis transport, we were blown away by how much faster it was than Azure Service Bus. ASB from the average Australian internet connection back then wasn’t setting any records, and Redis seemed incredible. That speed came at the expense of any kind of persistence or reliability.
What the results show is that NATS is incredibly quick. With the persistent JetStream option enabled, it’s still almost twice as fast as Redis (4,677 vs 2,522 msg/s), and using the non-persistent NATS Core, it’s almost as fast as our In Memory transport (26,467 vs 28,260 msg/s), which we just use for testing. Amazing numbers.
Other takeaways
To its credit, ActiveMQ is also now a touch faster than Redis, and it has a much better persistence story.
So Redis has been a great thing for Nimbus, particularly in environments where you already have it and don’t want another piece of infrastructure. But it’s not the performance king anymore.
The database transports are not fast, which is to be expected. There’s a polling loop in there which means the first message latency can be quite high. These transports exist for a world where we don’t want to add any other messaging infrastructure, and there are a lot of scenarios where that is fine.
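To see why a polling loop hurts first message latency, here's a tiny back-of-the-envelope sketch. The 100 ms interval is a made-up number for illustration, not the interval the database transports actually use, and the function ignores the cost of the query itself.

```python
import math


def wait_until_next_poll_ms(arrival_ms: float, poll_interval_ms: float) -> float:
    """How long a message sits in the table before the next poll picks it up.

    Assumes polls fire at 0, poll_interval_ms, 2 * poll_interval_ms, and so on.
    """
    next_poll_ms = math.ceil(arrival_ms / poll_interval_ms) * poll_interval_ms
    return next_poll_ms - arrival_ms


# A message written just after a poll waits almost the whole interval.
just_missed = wait_until_next_poll_ms(arrival_ms=1.0, poll_interval_ms=100.0)
# A message written just before a poll is picked up almost immediately.
just_in_time = wait_until_next_poll_ms(arrival_ms=99.0, poll_interval_ms=100.0)
```

So even with an idle database, the wait before delivery ranges anywhere from nothing up to a full polling interval.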
However, for throughput, Postgres does incredibly well under load, landing within a whisker of Redis (2,471 vs 2,522 msg/s). Is there anything that Postgres can’t do well?
Another nice, if not unexpected, outcome: every message that got sent was received. I’d be concerned if that wasn’t the case, though.
Conclusion
Here are some of the numbers. I didn’t run the SQL Server one for the same number of iterations as the others, as it would have taken a while, although its throughput number would probably have gone up a bit over a longer run. The tests only do a simple Send and Receive on a Queue too, and in a single process.
It’s not hugely scientific, but it definitely achieved more than I set out to do.
| Transport | Messages | Total Time (ms) | Msg/s | Latency Min (ms) | Latency p99 (ms) | Latency Max (ms) |
|---|---|---|---|---|---|---|
| InProcess | 100000 | 3539 | 28260 | 0.02 | 1.75 | 88.82 |
| Redis | 100000 | 39656 | 2522 | 0.36 | 1.06 | 83.92 |
| Nats | 100000 | 3778 | 26467 | 0.17 | 3.15 | 114.25 |
| Nats - JetStream | 100000 | 21382 | 4677 | 0.13 | 0.8 | 101.96 |
| ActiveMQ | 100000 | 36681 | 2726 | 0.27 | 0.88 | 98.47 |
| Postgres | 100000 | 40470 | 2471 | 46.79 | 5619.45 | 5696.61 |
| SqlServer | 10000 | 16760 | 597 | 35.75 | 6407.37 | 6545.28 |
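
For anyone wanting to sanity-check the table, the Msg/s column is just messages divided by total time in seconds. The sketch below reproduces that arithmetic in Python and shows one way a p99 could be calculated from raw latency samples. The nearest-rank method is an assumption for illustration, not necessarily how the benchmark app calculates it.

```python
import math
from typing import List


def throughput_per_second(messages: int, total_ms: float) -> float:
    """Messages per second from a message count and total elapsed milliseconds."""
    return messages / (total_ms / 1000.0)


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Figures taken from the table above.
redis_rate = round(throughput_per_second(100000, 39656))
sql_server_rate = round(throughput_per_second(10000, 16760))

# Made-up latency samples of 1..100 ms, purely to exercise the percentile helper.
example_latencies = [float(ms) for ms in range(1, 101)]
example_p99 = percentile(example_latencies, 99)
```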