Testing the Azure Eventgrid response time

What is Azure Eventgrid

Azure Eventgrid is a new technology in Azure, aimed at connecting different applications, much like other integration technologies such as Azure Service Bus or Logic Apps. However, Eventgrid wants to turn traditional integration patterns on their head. Traditionally, you poll for data until data arrives. An event-based model is the other way around: you do not poll, you wait for another system to send you an event. That event might contain all the necessary data, or just enough for you to ask for the new data. An example might be in order.

Say you have a database that does some heavy number-crunching. You need the crunched data. The database exposes a service (or a stored procedure) for you to get the data once it’s done. In a traditional integration you would poll that service once every x minutes to get the data as soon as possible. In an event-based integration, the database sends an event to the Eventgrid when the number crunching is done. That event tells you to get the data. No polling is needed.

This is not new. It can be done using a simple Logic App that the database calls to send the event. So why use Azure Eventgrid? Logic Apps can do so much more and are therefore not as cheap. They might also not be quick enough if you need to handle a lot of events with very low latency. This is where Eventgrid fits in.

For more information about Eventgrid, including very nice usable demos and capabilities like routing and filtering, read this post by Eldert Grootenboer.

What kind of performance?

What do you want out of Eventgrid? I would like it to forward events quickly, with low latency, even if there are long waits between events. In other words, I want it to react quickly when I send an event, even after long periods of inactivity. I decided to test this. Does Azure Eventgrid have the

I would like the response time and forwarding time to be “short enough” and consistent. Not like “half a second, 2 seconds, half a second, one minute”.

The test design

First a short disclaimer: The Eventgrid service is in preview, which means that response times and availability are not covered by any SLA. The test is not meant to find the maximum speed but to find out whether Azure Eventgrid has consistent response times.

Here is a picture of the communication architecture:

The flow

A command line C# program, running on an Azure VM, sends custom events to the Eventgrid. The Eventgrid forwards each event (using a subscription) to an Azure Function that writes timestamp information into a database. All resources are in West US 2. Both timestamps were in UTC to avoid any time zone problems.

The sending application

The C# program worked like a product database might work: when a product is changed, an event is sent. The program waited a random number of seconds between sends to simulate my imagined workload. People are not consistent. The program sent 5 messages at a time, with a random pause of 1 to 600 seconds between sends.

The message consisted of a light data body and I used the eventTime property to mark the start time of the flow.
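As a rough illustration, a batch of such events could be built as shown below. This is a sketch in Python, not the C# program used in the test. The field names follow the Event Grid event schema (id, subject, eventType, eventTime, data, dataVersion), but the subject, the event type, the data body and the function names are made-up placeholders. Posting the batch to the topic endpoint is left out.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def build_event(product_id, now=None):
    """Build one custom event; eventTime marks the start of the flow (UTC)."""
    now = now or datetime.now(timezone.utc)
    return {
        "id": str(uuid.uuid4()),
        "subject": f"products/{product_id}",   # hypothetical subject
        "eventType": "Product.Changed",        # hypothetical event type
        "eventTime": now.isoformat(),          # start time of the flow
        "data": {"productId": product_id},     # light body, placeholder content
        "dataVersion": "1.0",
    }


def build_batch(size=5, now=None):
    """Build the events that are sent together on one occasion."""
    return [build_event(n, now) for n in range(1, size + 1)]


def next_pause_seconds(rng=random):
    """Random wait before the next send, 1 to 600 seconds inclusive."""
    return rng.randint(1, 600)


if __name__ == "__main__":
    print(json.dumps(build_batch(), indent=2))
    print("next send in", next_pause_seconds(), "seconds")
```

Using a timezone-aware UTC timestamp for eventTime keeps the start time comparable with whatever the receiver records, regardless of the local settings on the VM.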

The Azure Function

To make sure the function would not be the bottleneck, I used the App Service Plan option and scaled it to two instances. The function code was written in csx (not compiled). It simply received the event message, read the starting timestamp, added its own timestamp to act as “time received” and then saved both to the Azure SQL Server database.
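The logic of the function can be sketched roughly as follows. This is Python, not the csx code that ran in the test, and the function name handle_events is hypothetical. It assumes the request body is a JSON array of events and that eventTime is an ISO 8601 string with at most six fractional digits.

```python
import json
from datetime import datetime, timezone


def parse_event_time(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def handle_events(body, now=None):
    """Return one (sent, received) pair of UTC datetimes per event in the body."""
    received = now or datetime.now(timezone.utc)   # the "time received" stamp
    rows = []
    for event in json.loads(body):                 # assumed: body is a JSON array
        sent = parse_event_time(event["eventTime"])
        rows.append((sent, received))
    return rows
```

Each returned pair corresponds to one row to save in the database. Taking the received timestamp once, before any parsing or database work, keeps the function's own processing time out of the measurement.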

If you think this might be inefficient I can say that when I did initial bulk testing (200+ messages per second) I flooded the Azure SQL Server database, but the Azure Functions were fine.

The database

It was a simple Azure SQL database with a simple table consisting of three columns: ID, EventSentTime and EventReceivedTime.
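A table like this can be tried out locally with the sketch below. It uses an in-memory SQLite database as a stand-in for Azure SQL, and the table name EventLog, the column types and the save_row helper are assumptions, since only the three column names are known.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for the Azure SQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EventLog (                      -- hypothetical table name
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        EventSentTime TEXT NOT NULL,             -- ISO 8601, UTC
        EventReceivedTime TEXT NOT NULL          -- ISO 8601, UTC
    )
    """
)


def save_row(sent_iso, received_iso):
    """Store one sent/received timestamp pair and return the new row ID."""
    cursor = conn.execute(
        "INSERT INTO EventLog (EventSentTime, EventReceivedTime) VALUES (?, ?)",
        (sent_iso, received_iso),
    )
    conn.commit()
    return cursor.lastrowid
```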

Test Execution

The test ran between 2017-09-13 07:31 UTC and 2017-09-13 10:04 UTC. During that time a total of 110 events were sent on a total of 24 occasions.

The test results

The overall results are good! The Eventgrid lives up to my expectations of quickly responding and sending messages even after long periods of inactivity.

Timestamp trouble

Sadly, the timestamps did not line up. Due to different clocks on the VM and in Azure Functions I got negative numbers, as low as -240 milliseconds (ms). With that and a maximum time of 1304 ms, the results do not lend themselves to statistics.
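The effect of a clock difference can be sketched with a few lines of Python. The numbers are made up for illustration and are not taken from the test, and the sketch assumes the offset between the two clocks stays constant, which real clocks do not guarantee.

```python
def measured_ms(true_ms, skew_ms):
    """Measured latency = true latency + (receiver clock - sender clock)."""
    return [t + skew_ms for t in true_ms]


def spread(values):
    """Difference between the slowest and the fastest measurement."""
    return max(values) - min(values)


true_ms = [40, 75, 500]   # made-up true latencies in ms
skew_ms = -100            # made-up: receiver clock runs 100 ms behind the sender

measured = measured_ms(true_ms, skew_ms)
print(measured)                           # [-60, -25, 400]
print(spread(true_ms), spread(measured))  # 460 460
```

Under that assumption, a clock offset shifts every measurement by the same amount. The absolute numbers become unreliable, and can even go negative, while the difference between fast and slow deliveries is unchanged.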

In conclusion

Even with the timestamp trouble, there is a clear pattern: The reaction times are quick (the whole flow took about 500 ms to execute after a longer period of inactivity), and consistent, exactly what I wanted out of Azure Eventgrid. I am looking forward to being able to use this technology in production.

Further study

I would like to try running more instances of the messaging program.