Load testing experience with Node.js applications

Introduction

One fine autumn day, I decided to monitor the state of the applications we were developing.

I regularly log in to pm2 and ask myself whether there is a memory leak today. Every time, the application has absorbed more memory. I log in every day, on holidays and at lunch, and the situation never changes. It is a tough standoff whose outcome is unclear, and each side expects to win. But now I am grimly determined: a reload command alone cannot remedy the situation. Perhaps my decision was influenced by a couple of articles on load testing that I had read earlier on Habr.

Before going into battle with the memory leaks, I decided to run load tests both before and after fixing them. After all, if there is a leak, the application will cope with load worse and worse over time.

This article describes our experience of load testing a Node.js stack. As the test subject, we used an integration Node.js solution developed by JazzTeam for a European VoIP provider.

JazzTeam solution

A few words about the tested applications

Let’s start with a description of our two experimental applications. The applications are independent of each other. The testing approach described below was chosen based on their features:

The first application is called Integration-API. This is a standard server with a large number of routes. The application sends REST requests to third-party servers and communicates with the database.
The second application is called Core. Its peculiarity is that it is a WebSocket client with elaborate logic for processing incoming events. Events are divided into several categories, and no two events are identical. The application communicates with the database and sends REST requests to third-party servers.

Analysis of available tools

Load testing can be carried out in various ways. In fact, any tool that can flood our server with requests can be used. You can use JMeter or Yandex.Tank, or write a Node.js script that uses a load-generation library. There are many good options; the choice depends only on your preferences, your capabilities and the realities of the project.

We opted for testing with Node.js. Not having to install and configure an additional environment was an advantage for us: testing, like development, is done in a single language. I just created an empty Node.js application and installed a library. All that remained was to write a request body, and we were ready for testing. I recommend autocannon as the library. It is highly configurable and easy to work with. There are also Artillery and loadtest, but autocannon seemed easier to us. Another important advantage is that it can be combined with Clinic.js to produce high-quality metrics.

It should be noted that JMeter, Yandex.Tank, Apache Bench and wrk2 are also good tools for this purpose, but autocannon is written in Node.js. It provides a similar (or sometimes higher) load and is very easy to install on Windows, Linux and macOS. For our applications, preparing and configuring the other tools would have taken more time. If you need a quick install-and-run tool, then in our experience autocannon is the best option.
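As a rough illustration of how little setup is involved, the sketch below builds an options object for autocannon's programmatic API. The helper name buildLoadOptions, the target URL and the payload are hypothetical and not taken from the tested project; the option names (url, connections, duration, method, headers, body) are the ones autocannon documents.

```javascript
// Hypothetical helper: builds the options object passed to autocannon().
// The URL and payload below are placeholders, not values from the real project.
function buildLoadOptions({ url, connections = 10, durationSec = 30, payload }) {
  return {
    url,
    connections,            // number of concurrent connections
    duration: durationSec,  // length of the run in seconds
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(payload),
  };
}

const options = buildLoadOptions({
  url: 'http://localhost:3000/api/example',
  payload: { hello: 'world' },
});

// With autocannon installed (npm install autocannon), the run itself would be:
//   const autocannon = require('autocannon');
//   autocannon(options).then((result) => console.log(result.requests.average));
console.log(options);
```

The same run can also be started from the command line, but keeping the options in a script makes it easier to generate request bodies programmatically.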

I want to say a few words about why load tests are needed. In my opinion, such testing is essential for applications in the final stage of development. The most important advantage is that we know what to expect from the application. It will not come as a surprise when the application stops working at the height of the working day because of high load. At the same time, nothing prevents us from improving the application so that this moment arrives a couple of years later, or never.

A good question is how long a load test should run. Operating systems usually perform scheduled tasks once a day or more often: backups, log rotation, and other background operations that can affect the performance of both the whole server and our application instance. Therefore, a single load-testing session should ideally last at least a day. In this article, however, I will not run such a long test. Why? The application is still under development and definitely has memory leaks, so the results obtained would become irrelevant once the leaks are eliminated.

Organization of load testing

A script was prepared for load testing. Its objective is to create the most plausible events for shooting. However, the script itself should be as lightweight and fast as possible. If the logic of the application allows, I would recommend preparing the data first and only then starting to shoot. The preparation of events should not slow down the shooting process. It is worth remembering that this is a load test, not a stress test.
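One way to keep event preparation out of the hot path is to generate and serialize all request bodies before the first shot. The sketch below assumes a made-up event shape (id, type, createdAt); the real events of the project are not shown in the article.

```javascript
// Hypothetical event shape; the real project's events are not shown in the article.
function prepareBodies(count) {
  const bodies = new Array(count);
  for (let i = 0; i < count; i++) {
    bodies[i] = JSON.stringify({ id: i, type: 'example', createdAt: 1700000000000 + i });
  }
  return bodies;
}

// All the expensive work (object creation, serialization) happens here,
// before the first request is sent.
const bodies = prepareBodies(10000);

// During the run, picking the next body is a cheap array lookup.
let cursor = 0;
function nextBody() {
  const body = bodies[cursor];
  cursor = (cursor + 1) % bodies.length;
  return body;
}
```

The trade-off is memory: all bodies are held in RAM for the duration of the run, which is usually acceptable for a test script.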

Decide in advance what to do with external dependencies. Third-party requests are easy to mock (simulate). With the database, you should think about what is more important to you. It can also be mocked. But to maintain the integrity of the experiment, you can use Docker and start every test run from a prepared database in a container. If there is time and opportunity, it is advisable to keep the test conditions realistic.
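A third-party service can be mocked with nothing more than Node's built-in http module. The following is a minimal sketch, not the mock used in the project: the handler name, the response body and the port in the comment are all assumptions.

```javascript
const http = require('http');

// Counts calls so the test script can check how many outbound requests were made.
let hits = 0;

// Handler for the stand-in third-party service: accepts anything, returns 200.
function mockThirdParty(req, res) {
  hits += 1;
  req.resume(); // discard the request body, we do not need it
  res.writeHead(200, { 'content-type': 'application/json' });
  res.end(JSON.stringify({ ok: true }));
}

const mockServer = http.createServer(mockThirdParty);
// To use it, start the server and point the application's third-party URL at it
// (port 4000 is a placeholder):
//   mockServer.listen(4000);
```

Because the mock answers immediately, the measured throughput excludes third-party latency, which is worth keeping in mind when interpreting the results.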

Tests are run with one npm/yarn command. Also, do not forget that the tested application must be running and must respond to each shot.

Note that each application was tested as a single instance. Load balancers, proxy servers, etc. were not taken into account.

Features of Core application testing

To test this application, we decided to cheat. Since it is a WebSocket client, it cannot be tested directly with autocannon, which generates HTTP load.

The application listens to one event, which means that it is not difficult to add an HTTP endpoint that feeds events into the processing logic for testing. Perhaps this makes the experiment “dirty”, but we are ready to sacrifice the “purity” of the experiment.
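Such a test-only entry point could look roughly like the sketch below. It wraps an existing event handler in an HTTP request handler; handleEvent stands in for whatever function the WebSocket client normally calls, and the status codes and port are assumptions made for the example.

```javascript
const http = require('http');

// Wraps an existing event handler in an HTTP request handler.
// `handleEvent` stands in for the function the WebSocket client normally calls.
function createEntryPoint(handleEvent) {
  return async function onRequest(req, res) {
    // Collect the request body.
    const chunks = [];
    for await (const chunk of req) chunks.push(chunk);

    let event;
    try {
      event = JSON.parse(Buffer.concat(chunks).toString('utf8'));
    } catch {
      res.writeHead(400); // body is not valid JSON
      res.end();
      return;
    }

    try {
      await handleEvent(event);
      res.writeHead(202); // event accepted and processed
    } catch {
      res.writeHead(500); // the handler itself failed
    }
    res.end();
  };
}

// Example wiring with a placeholder handler; meant for test builds only.
const server = http.createServer(createEntryPoint(async (event) => { /* process event */ }));
// server.listen(3001); // placeholder port
```

Responding only after handleEvent resolves means the load tool measures the full processing time of each event rather than just the time to accept it.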

Benefits of this solution:

We don’t waste time looking for the perfect solution: after spending a couple of minutes, we are ready to shoot our next application.
Autocannon is very flexible, so we can make the load as similar as possible to the work of the WebSocket server simply by establishing a single connection. Another advantage of this tool is request body customization. The peculiarity of the application is that if you use one event for shooting, further copies of this event will not be processed, so all events must be different. With a small script, we prepared the events with which we will shoot our application.
We can use the combination of autocannon and Clinic.js to analyze our application and get high-quality metrics.
In the future, we will use this solution to load our application as quickly as possible and get a plausible dump to search for a memory leak.
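The single-connection setup with distinct events might be expressed as follows. This is a sketch under assumptions: the event fields, the path /test/event and the URL are invented for illustration, while connections, amount and requests are documented autocannon options.

```javascript
// Hypothetical event factory: every event gets a distinct id so that the
// application does not discard it as a copy of an earlier one.
function makeEvents(count) {
  return Array.from({ length: count }, (_, i) => ({ id: `evt-${i}`, type: 'example' }));
}

function buildCoreOptions(url, events) {
  return {
    url,
    connections: 1,         // one connection, like a single WebSocket link
    amount: events.length,  // stop once every event has been sent exactly once
    requests: events.map((event) => ({
      method: 'POST',
      path: '/test/event',  // placeholder path of the test entry point
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(event),
    })),
  };
}

const coreOptions = buildCoreOptions('http://localhost:3001', makeEvents(1000));
// Run with: require('autocannon')(coreOptions)
```

Here amount is used instead of duration so that the request list is never replayed; for a run of a fixed length, enough events would have to be prepared to cover the whole duration.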

On the output side, the application sends REST requests to third-party applications. It is not interested in what happens next, and the responses from third-party applications are not processed. Therefore, these requests are easy to mock.

Let’s proceed to testing. We will carry out testing in two modes:

chaotic;
stable.

Chaotic method

Load events are unstructured and random, generated by the script; the probability that an event passes the full processing cycle is low. Due to the peculiarities of the project, there may be events that do not need to be processed, and they are discarded at one of the verification stages. This mode shows the largest number of requests the application can process without going through its entire logic, which means that a considerable part of the asynchronous and expensive operations is skipped.

The value of these results for load testing is low, because they do not reflect how the application really works. However, this mode is easier to create, use and configure. This means that we can build metrics faster and start finding and eliminating performance problems. It is also useful for creating a dump of the initial stages of processing, where a memory leak is most likely. If the accuracy of load testing is not important and we only need fast diagnostics, we do not have to spend time preparing realistic data for shooting.
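A generator for the chaotic mode can be very small. In the sketch below the event types and the callId field are hypothetical, chosen only because the project belongs to the VoIP domain; the article does not list the real event categories.

```javascript
// Hypothetical event types; the article does not list the real categories.
const EVENT_TYPES = ['call.start', 'call.ring', 'call.answer', 'call.end'];

function randomItem(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Events are independent of each other: random type, random call id,
// so most of them will not form a complete, processable sequence.
function generateChaoticEvents(count) {
  const events = [];
  for (let i = 0; i < count; i++) {
    events.push({
      id: i,                                    // unique per event
      type: randomItem(EVENT_TYPES),
      callId: Math.floor(Math.random() * 1e6),  // rarely matches another event
    });
  }
  return events;
}
```

Since nothing links one event to another, most of them would be rejected early by validation, which is exactly the behaviour this mode relies on.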

So, let’s start shooting our app. We will carry out several runs of different durations. The events are presented in the diagram in Fig. 1, the results are presented in Table 1. Only a small part of the events could pass all the stages of processing.

Fig. 1. Diagram of events distribution for the first test mode.

Table 1. Results of the chaotic method of Core application testing.

The results cannot be called excellent, and it becomes obvious that the application weakens over time. During the first hour under load, the application lost 24 requests/sec in performance; over 3 hours, it had already lost 56. Since this is an average over three hours of work, the real performance in the last hour is much lower than the average. We know the average and the maximum values, so we can calculate the approximate minimum. If we assume that throughput declines roughly linearly, the average is the midpoint between the maximum and the minimum. Why bother? It helps us understand that we should optimize the application now rather than leave this issue for later.

(460+x)/2=404 => x=348 requests/sec.

where:

460 requests/sec. − the maximum number of requests. The result for the first 30 seconds was taken as this value;
x − the minimum number of requests, i.e. the approximate value at the end of the test;
404 requests/sec. − the average number of requests for 3 hours.
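The same arithmetic as a small helper. It relies on the assumption that throughput falls linearly over the run, so that the average is the midpoint between the maximum and the final rate; the function name is ours.

```javascript
// Assumes throughput falls linearly from `maxRate` to the final rate,
// so the average over the run is the midpoint of the two:
// (maxRate + finalRate) / 2 = averageRate  =>  finalRate = 2 * averageRate - maxRate
function estimateFinalRate(maxRate, averageRate) {
  return 2 * averageRate - maxRate;
}

const finalRate = estimateFinalRate(460, 404);
console.log(finalRate);        // 348
console.log(460 - finalRate);  // 112 requests/sec lost by the end of the run
```

If the decline is not linear, for example if it accelerates as memory fills up, the real final rate would be lower than this estimate.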

Test results already show that the application is not ready for a constant load. After three hours of work under constant load, the app’s performance dropped by roughly 110 requests/sec (from 460 to an estimated 348).

However, we would like to see a picture close to the real load. The so-called stable method of application loading is used for this purpose.

Stable method

With this approach, all load events are chained. Every event goes through the full processing cycle, and no database request is skipped. This is a fairly realistic way to load the application. However, we advise leaving out events that occur infrequently if they require less processing effort than the main functionality. Writing the event-generation script is more complicated than for the first method, so we advise you to think the events through in advance. At the same time, we can see how much shooting with realistic data affects the result. So, let’s start shooting our app. The runs are similar to those described for the previous method. The results are presented in Table 2.

Compared with the chaotic method, the results differ by almost a factor of two. Thus, we have shown that the results of load testing depend on the input load. Of course, this depends on the features of the project: with a simple, unelaborate event handler, the results may be almost the same.
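For comparison, a generator for the stable mode has to emit events in an order the application can fully process. The lifecycle below (start, ring, answer, end) is a hypothetical chain used for illustration only; the real chain is project-specific.

```javascript
// Hypothetical call lifecycle; the real event chain is project-specific.
const CALL_LIFECYCLE = ['call.start', 'call.ring', 'call.answer', 'call.end'];

// Every call produces the full chain in order, so each event finds the
// state left by the previous one and passes the whole processing cycle.
function generateStableEvents(callCount) {
  const events = [];
  let id = 0;
  for (let callId = 0; callId < callCount; callId++) {
    for (const type of CALL_LIFECYCLE) {
      events.push({ id: id++, type, callId });
    }
  }
  return events;
}
```

The order is only preserved on the wire if the events are sent over a single connection; with several parallel connections, events of one call could overtake each other.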

Unfortunately, if the application is exposed to prolonged load, the leak and the resulting loss of performance can cause it to collapse over time. In the future, we can analyze the application using Clinic.js. It is a great suite of tools for collecting different metrics and analyzing application performance. For example, it will not be difficult to find the code that loads the application most heavily, find out in which part of the product there is a memory leak, etc.
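Even before reaching for Clinic.js, a crude first signal of a leak can be obtained with Node's built-in process.memoryUsage(). This is not the method used in the article, only a sketch: it samples heapUsed over time and fits a straight line, and the function names are ours.

```javascript
// Least-squares slope of heapUsed over time, in bytes per second.
// `samples` is an array of { t: seconds, heapUsed: bytes }.
function heapGrowthRate(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanH = samples.reduce((sum, p) => sum + p.heapUsed, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.t - meanT) * (p.heapUsed - meanH);
    den += (p.t - meanT) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// Collects one sample; inside the application this could be called from setInterval.
function takeSample(startMs) {
  return {
    t: (Date.now() - startMs) / 1000,
    heapUsed: process.memoryUsage().heapUsed,
  };
}
```

A slope that stays clearly positive over a long run under steady load suggests a leak, although garbage-collection cycles make short series noisy, so this cannot replace a proper heap profile.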

Features of Integration-API application testing

The Integration-API application is a regular Node.js server that processes requests over HTTPS. The application has many request handlers; each handler interacts with the database.

If you have an application with a large number of routes, you can write a test so that autocannon shoots all routes at once. However, (in our humble opinion) you should not rush into that. You probably have routes that are used much more often than others. We will test those, because such routes account for most of the load and are more likely to cause performance problems in the future.

Still, this depends on the features of your application, so it is not always good to neglect some handlers.

After analyzing the application logs, we planned load distribution. The results are presented in the diagram in Fig. 2.
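Deriving the load distribution from logs can be as simple as counting requests per route. The sketch below assumes a made-up access-log format of "METHOD PATH STATUS" per line; the real log format and routes of the project are not given in the article.

```javascript
// Assumes a hypothetical access-log line format: "<METHOD> <PATH> <STATUS>".
function routeShares(logLines) {
  const counts = new Map();
  for (const line of logLines) {
    const [method, path] = line.trim().split(/\s+/);
    if (!method || !path) continue; // skip blank or malformed lines
    const route = `${method} ${path}`;
    counts.set(route, (counts.get(route) || 0) + 1);
  }
  const total = [...counts.values()].reduce((a, b) => a + b, 0);
  // Most frequently used routes first.
  return [...counts.entries()]
    .map(([route, count]) => ({ route, share: count / total }))
    .sort((a, b) => b.share - a.share);
}

const sampleLog = [
  'POST /calls 200',
  'POST /calls 200',
  'GET /users 200',
  'POST /calls 500',
];
console.log(routeShares(sampleLog));
// [ { route: 'POST /calls', share: 0.75 }, { route: 'GET /users', share: 0.25 } ]
```

The routes at the top of such a list are the natural first candidates for a dedicated load test.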

We decided not to reinvent the wheel and to apply the load to the two most important handlers separately. At the moment, the application does not experience a constant load, and even receiving 100 requests per second is very unlikely.

However, to be sure, we will conduct a load test. Tables 3 and 4 show the results:

Table 3. Integration-API application test results, handler 1.

The results for the handler that works with the database and communicates with a third-party resource are quite acceptable for us. Still, the fact that we are satisfied with a result does not by itself make it a good one.

Conclusion

We were able to conduct load testing of our applications. We now know how each application behaves under a constant load and what to expect from it. I think this is very important information for a developer. It will also help to find bottlenecks in the architecture in the future.

Properly organized load testing should always be carried out. Otherwise, there is a risk of finding yourself in hot water during routine operation, and it will take time to remedy the situation. This article gives only a rough description of load testing; the aim was to show its importance. After all, it gives us not only information about the number of users the server can withstand, but also an understanding of whether there are problems in the application itself. Thanks to testing, we learned that the application has problems with memory leaks.

It is worth adding that integrating load testing into CI/CD is an excellent solution for a project. After each stage of development, we then always have up-to-date information about the behavior of our application under load.
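In a CI/CD pipeline, the load test needs a pass/fail criterion. The sketch below checks a result object against thresholds; the field names follow autocannon's result object (requests.average, latency.p99, errors, non2xx), while the function name and the limit values are placeholders to be tuned per project.

```javascript
// `result` follows the shape of autocannon's result object.
// `limits` holds project-specific thresholds (placeholder names and values).
function checkThresholds(result, limits) {
  const failures = [];
  if (result.requests.average < limits.minRequestsPerSec) {
    failures.push(
      `throughput ${result.requests.average} req/sec is below ${limits.minRequestsPerSec}`
    );
  }
  if (result.latency.p99 > limits.maxP99Ms) {
    failures.push(`p99 latency ${result.latency.p99} ms is above ${limits.maxP99Ms}`);
  }
  // Connection errors plus responses outside the 2xx range.
  const failed = result.errors + result.non2xx;
  if (failed > 0) {
    failures.push(`${failed} failed requests`);
  }
  return failures;
}

// In a CI script, a non-empty list would fail the build:
//   const failures = checkThresholds(result, { minRequestsPerSec: 400, maxP99Ms: 200 });
//   if (failures.length > 0) { console.error(failures.join('\n')); process.exitCode = 1; }
```

Results on shared CI runners tend to fluctuate, so thresholds usually need some headroom to avoid flaky builds.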

We would like to note the following advantages of the chosen path:

Functionality and simplicity − in a short time we were ready to start testing.
Possibility to use additional libraries for performance analysis.
Possibility to quickly get a plausible dump.

However, the results cannot be called as accurate as possible for several reasons:

Testing was carried out on a regular computer, where a copy of the application was run. The server on which the production version of the application is installed is more powerful. Results under prolonged load may differ greatly.
The script that generates the events fired at the Core application also consumes resources on the same machine, which slightly worsens the results.
The applications were tested separately. In production, they run on a single server. In theory, with the maximum load on both applications, the results would be lower, because they share the server's resources.
The Core application is a WebSocket client, but we tested it as a regular HTTPS application. Even with a perfect setup, this introduces some measurement error.

Links for further study

https://github.com/mcollina/autocannon: a library for load generation.
https://clinicjs.org/: a suite of tools for detailed analysis of the application under load. It can be used together with autocannon.