Need 4 speed – the Orion story

Car photo by Michal Zacharzewski

People often ask: “Tell me, how fast is your platform?” and I’d say: “Well, it depends”. If I was in a good mood, I’d respond “100 000 messages / second”. That is something we have built and run for a customer, but with different protocols and a different type of traffic.

The reason for my answer is that if there is no performance requirement, there is really no reason to speculate how fast it needs to be. And the performance/resource balance needs to be tuned based on a Requirement, not a wish or a dream. In any case, after some time I got tired of this, so our team put on some lab coats and we started cooking. Orion soup this time.

The Goal

Our goal was clear: eke out as much performance from Orion as possible and build the 1000 msg/sec platform WITHOUT any buffering, aggregation or similar tricks. From the depths of the Internet a research paper popped up stating that mad Swedish scientists had reached 2000+ messages/sec.

The Setup

As a starting point we have a 6 core VM with SSD. On top of that bad boy we put a Docker swarm and run the services there.

We decided to load Orion directly, as we have no idea what tricks the IoT Agents are up to, and in our use cases we do not (often) benefit from IoT Agents.

The Disappointment

The Swedes had used a Mongo sharded cluster, so we set one up too. After some experimentation with shard keys, it was a clear disappointment. We could not load up Orion, we could not get Orion to use more than 4% of a single CPU, and the sharded cluster was loading only one of the shards. In the beginning we got about 15 messages/sec throughput. Something was definitely off.

Crazy experiment hat on, and at the end of the day we got 277 msg/sec. Ok, not bad but not good either. And this was with (nuts) 10 scaled Orion instances and one Mongo instance. At this point Mongo was maxing out its CPU usage and the VM was maxing out on CPU as well.

A few words about the architecture: the “load machine” sits inside the deployment in another Docker container, so we are not bothering with TLS termination, which is usually something you have to do IRL. The load machine sends its requests directly to Orion.
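To make the setup a bit more concrete, here is a minimal sketch of what a direct-to-Orion load generator could look like. This is not the script we used in the tests. The service name `orion`, the attribute `temperature`, the worker count and the helper names are all assumptions for illustration; the only things taken from Orion itself are the default port 1026 and the NGSI v2 attribute update endpoint. The entities are assumed to exist already.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Orion is reachable under the service name "orion" on its default port.
ORION_URL = "http://orion:1026"


def build_update(entity_id, value, base_url=ORION_URL):
    """Build one NGSI v2 attribute update for an entity that already exists."""
    # "temperature" is a hypothetical attribute name used only for this sketch.
    body = json.dumps({"temperature": {"value": value, "type": "Number"}}).encode()
    return urllib.request.Request(
        f"{base_url}/v2/entities/{entity_id}/attrs",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send one request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def run_load(requests, sender=send, workers=16):
    """Send all requests concurrently; return (messages per second, statuses)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(sender, requests))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(statuses) / elapsed, statuses
```

Calling `run_load([build_update(f"sensor{i}", 21.5) for i in range(1000)])` would then report the throughput for one batch. Each message is its own HTTP request, so nothing is buffered or aggregated on the client side.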

Search for Excellence

Time to put the reading glasses on. A few additional options surfaced: indexing and reqMutexPolicy. By changing reqMutexPolicy to “none”, we got about 6 times more perf out of one Orion (we had discarded the sharded cluster at this point), a jump from 13 msg/sec to 100 msg/sec. Yay! \o/ But still far away from 1000.
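For the curious, reqMutexPolicy is a command line option of the Orion `contextBroker` binary. A small sketch of how the start command for a container could be assembled follows. The helper name `orion_args` and the database host `mongo` are assumptions, and depending on your Orion version the database connection flag may differ (newer releases use a URI style option instead of `-dbhost`), so check the CLI documentation for the version you run.

```python
# Values accepted by -reqMutexPolicy according to the Orion CLI documentation.
VALID_POLICIES = ("all", "read", "write", "none")


def orion_args(db_host="mongo", req_mutex_policy="none"):
    """Return the argument list for starting Orion with a given mutex policy.

    db_host is a hypothetical service name for the MongoDB container.
    """
    if req_mutex_policy not in VALID_POLICIES:
        raise ValueError(f"unknown reqMutexPolicy: {req_mutex_policy!r}")
    return [
        "contextBroker",
        "-fg",                      # stay in the foreground, as containers expect
        "-dbhost", db_host,         # assumption: older style DB host flag
        "-reqMutexPolicy", req_mutex_policy,
    ]


print(" ".join(orion_args()))
```

The default policy is “all”, so leaving the flag out keeps the request mutex in place.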

Another point was that the Orion from Sweden had used 2-8 cores. At the high end, that is more than we have for all our processes combined. Well, we have what we have. Next step: indexing.

The Result

Poking at the indexing. There was a post on Stack Overflow (thank you Internet!) which mentioned that indexing will help performance. Ok, so apply indexing. Running the tests again. If my chair did not have arm rests, I would have dropped to the floor. My jaw certainly did. With Orion using one core and Mongo using one core, we got 1000+ messages/second! (You need to do a bit of math and sum up all the msg/sec numbers.)
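For reference, applying indexes of this kind can be sketched in a few lines of Python. The exact fields we indexed are not listed in this post, so the field list below is an assumption based on the entity identifier fields that Orion's performance tuning documentation suggests indexing. The helper name is hypothetical as well, and it works with any object that offers a pymongo style `create_index` method.

```python
# Assumption: index the entity identifier fields of Orion's "entities" collection.
ASSUMED_INDEX_FIELDS = ("_id.id", "_id.type", "_id.servicePath")


def ensure_entity_indexes(collection, fields=ASSUMED_INDEX_FIELDS):
    """Create one ascending index per field and return the index names."""
    return [collection.create_index(field) for field in fields]


# Usage with pymongo (not run here; host and database names are assumptions):
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://mongo:27017")
#   print(ensure_entity_indexes(client["orion"]["entities"]))
```

Creating an index that already exists is a no-op in MongoDB, so a helper like this can be run again after a redeploy.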

After this performance tuning session, I was not worried at all. If we need more, we add more. The Context Broker can be scaled horizontally by deploying several instances. Scaling out the MongoDB database means a sharded cluster, which divides a database into shards that can run on separate machines. But I know we are not done yet. The next mountain to conquer is the historical data storage. More on that later.

If you think we did something stupid here, fight me in the comments.

Shout-out to Will Freitag from omp computer gmbh for developing a kick ass Python script to use in the tests!