Message queue: How to deal with tens of thousands of order requests per second during a spike?


Since the beginning of the course, I have walked you through the three goals of high-concurrency system design: performance, availability, and scalability. On the performance side, we have focused mainly on query performance, and spent a lot of space on the distributed transformation of the database and on the principles and usage of various caches. The reason is that most scenarios we encounter are read-heavy and write-light, especially in the early stages of a system.

For example, a community system starts with only a small number of seed users producing content, while most users are just “watching” what others are saying. At this stage the overall traffic is relatively small, and write traffic may account for only one percent of it, so even if the overall QPS reaches 10,000, there are only about 100 write requests per second. Optimizing write performance at this point is simply not cost-effective.

However, as the business grows, you may run into scenarios with highly concurrent write requests, of which the spike (a flash sale) is the most typical. Suppose your mall is planning a spike event that starts at 00:00 on the 5th and is limited to the first 200 buyers. Just before the spike starts, your back-end monitoring will show users frantically refreshing the app or browser to make sure they see the product as early as possible.

At this point, you are still facing high read traffic, so what countermeasures do you have?

Because users are querying a small set of product data, which is hot data, you can use caching to stop as many requests as possible at the upper cache layers. Whatever can be made static, such as the mall's pictures and videos, should be made static so that requests hit the CDN node cache, which reduces the query volume and bandwidth load on the web servers. Web servers such as Nginx can access distributed cache nodes directly, which keeps requests from reaching business servers such as Tomcat.

Of course, you can also add rate-limiting policies, such as discarding repeated requests from the same user, the same IP, or the same device within a short time.
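To make the idea of discarding repeated requests concrete, here is a minimal sketch. It assumes a single process that keeps its state in memory; a real deployment would more likely keep this state in a shared cache. The class name RepeatRequestFilter and its parameters are made up for illustration.

```python
import time


class RepeatRequestFilter:
    """Discards repeated requests from the same key within a short window.

    The key can be a user ID, an IP, or a device ID (all hypothetical here).
    """

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the filter can be tested
        self.last_accepted = {}     # key -> time of the last accepted request

    def allow(self, key):
        now = self.clock()
        last = self.last_accepted.get(key)
        if last is not None and now - last < self.window_seconds:
            return False            # repeated request inside the window: discard
        self.last_accepted[key] = now
        return True
```

A request handler would call allow(user_id) first and return early when it gets False, so the discarded request never reaches the cache or the database.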

In these ways, you keep as many requests as possible away from the database.

With the read pressure somewhat relieved, the spike starts on time at 00:00. Users immediately ask the e-commerce system to create orders and deduct inventory. These write operations go straight to the database without passing through the cache. Within 1 second, 10,000 database connections arrive at the same time, and the system’s database is on the verge of collapse. You urgently need a solution that can handle this level of concurrent write requests, and that is when you think of message queues.

How I understand message queues

You probably already know something about message queues, so explaining the concepts is not the focus of this lesson. Here I will just share my own view of them. Over my years of work, I have always regarded the message queue as a container for temporarily storing data, a tool that balances the difference in processing time between low-speed systems and high-speed systems. Let me give you a vivid example.

In ancient times, courtiers often went to see the emperor to report on national affairs and wait for his decision. But there were many ministers, and if they all approached the emperor at the same time, each talking over the others, the emperor would certainly be overwhelmed. So later, when courtiers arrived at Wumen (the Meridian Gate), they had to wait for the emperor to summon them into the hall one by one to discuss state affairs, which eased the pressure on the emperor. You can think of Wumen as a container that temporarily holds courtiers, which is what we call a message queue.

In fact, you will see the shadow of the message queue in some components:

In the Java thread pool, we will use a queue to temporarily store submitted tasks, waiting for idle threads to process these tasks;

In the operating system, the bottom half of interrupt handling also uses work queues to implement deferred execution;

When we implement an RPC framework, we also write requests received from the network into a queue, and then start several worker threads to process them.

… In short, queues are a common component in system design.
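The pattern shared by these components, a queue in front of a fixed set of workers, can be sketched as follows. This is only an illustration using Python's standard queue and threading modules; the function name run_workers and the doubling step that stands in for real work are assumptions, not part of any framework mentioned above.

```python
import queue
import threading


def run_workers(tasks, worker_count):
    """Push tasks into a queue and let a fixed number of threads drain it."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()      # blocks until a task is available
            if task is None:             # sentinel: no more work for this worker
                return
            with results_lock:
                results.append(task * 2)  # stand-in for real processing

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)             # producers only enqueue and move on
    for _ in threads:
        task_queue.put(None)             # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

The producer never waits for a task to finish; it only waits for the queue to accept the task, which is what lets a fast producer sit in front of slower workers.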

So how do we use the message queue to solve the problem in the spike scenario? Next, let’s take a concrete example to see the role of the message queue in the spike scenario.

Cut the peak write traffic in the spike scenario

As mentioned earlier, in the spike scenario the write traffic to the database is very high for a short time, so following our previous thinking, the data should be split across databases and tables. If you have already done that, then you need to add more databases to cope with the higher write traffic. However, both splitting databases and tables and adding more databases are complicated, because you need to migrate the data in the database, and that takes days or even weeks.

In the spike scenario, highly concurrent write requests are neither continuous nor frequent; they exist only for a few seconds, or a dozen seconds, after the spike starts. Spending days or even weeks expanding the database to cope with a ten-second write peak, and then spending several more days shrinking it after the spike, is clearly not worth the cost.

Therefore, our idea is to temporarily store the spike requests in the message queue. The business server then responds to the user with “the spike result is being calculated” and releases its system resources to process other users’ requests.

We start several queue handlers in the background that consume messages from the message queue and then perform logic such as checking inventory and placing orders. Because only a limited number of queue handler threads are running, the number of concurrent requests that reach the back-end database is limited. Requests can pile up temporarily in the message queue, and once the inventory is exhausted, the requests still accumulated in the queue can be discarded.
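The flow just described can be sketched in a few lines. This is a simplified, in-process illustration rather than a production design: queue.Queue stands in for a real message queue, a lock-protected counter stands in for the database inventory check, and the name run_spike is made up for this example.

```python
import queue
import threading


def run_spike(request_user_ids, stock, handler_count):
    """Buffer spike requests in a queue and drain them with a few handlers."""
    pending = queue.Queue()
    for user_id in request_user_ids:
        pending.put(user_id)        # the web tier only enqueues and returns

    remaining = stock
    db_lock = threading.Lock()      # stand-in for the database's own locking
    winners, discarded = [], []

    def handler():
        nonlocal remaining
        while True:
            try:
                user_id = pending.get_nowait()
            except queue.Empty:
                return              # queue drained: this handler is done
            with db_lock:
                if remaining > 0:
                    remaining -= 1  # check inventory and place the order
                    winners.append(user_id)
                else:
                    discarded.append(user_id)  # sold out: drop the request

    threads = [threading.Thread(target=handler) for _ in range(handler_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return winners, discarded
```

However many requests are enqueued, at most handler_count of them touch the inventory at any moment, which is the limit on database concurrency that the paragraph above relies on.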

This is the most important role of the message queue in the spike system: peak clipping and valley filling, that is, smoothing out short traffic peaks. Accumulation does delay requests temporarily, but as long as we monitor the backlog in the message queue and add queue handlers when it exceeds a certain threshold, the delay stays short, and spike users can tolerate a short wait to learn the result.

Note that I am talking about a “short-term” delay. If you do not publish the spike results to users for a long time, they may suspect that your spike event is rigged. Therefore, when using message queues to cope with peak traffic, you need to evaluate the queue processing time, the size of the front-end write traffic, and the database processing capacity, and then decide how many queue handlers to deploy for the magnitude you expect.

For example, if you have 1,000 spike products and a single purchase request takes 500ms to process, processing them one by one takes 500s in total. If you deploy 10 queue handlers, the processing time for the spike requests drops to 50s, which means a user waits at most 50s to see the result of the spike, which is acceptable. In this setup only 10 requests reach the database concurrently, which does not put much pressure on it.
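The estimate above is simple arithmetic, and it can be handy to keep it as a small helper while sizing a deployment. The sketch below assumes the handlers run fully in parallel and that every request takes the same time; the function names are hypothetical.

```python
import math


def drain_time_seconds(requests, per_request_ms, handlers):
    """Time to process all requests when handlers work in parallel."""
    return requests * per_request_ms / 1000 / handlers


def handlers_needed(requests, per_request_ms, target_seconds):
    """Smallest number of handlers that finishes within target_seconds."""
    return math.ceil(requests * per_request_ms / 1000 / target_seconds)
```

With the numbers from the example, drain_time_seconds(1000, 500, 10) gives 50.0, and handlers_needed(1000, 500, 50) gives 10.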

Streamline business processes in spike requests through asynchronous processing

In fact, when a large number of write requests “attack” your e-commerce system, in addition to the main peak-cutting and valley-filling role, message queues can also implement asynchronous processing to simplify business processes in spike requests and improve the system’s performance.

Think about it: in the spike scenario just mentioned, processing a purchase request takes 500ms. If you analyze the entire purchase process, you will find that it contains both main business logic and secondary business logic. For example, the main process generates the order and deducts the inventory, while the secondary process may issue coupons to the user after a successful order and increase the user’s points.

If it takes 50ms to issue coupons and 50ms to increase user points, then by moving those two operations to another queue handler, the main process is shortened to 400ms, a 20% reduction in processing time, and the time to process these 1,000 items becomes 400s. If we still want users to see the spike results within 50s, we only need to deploy 8 queue handlers.
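The split between the main process and the secondary process might look like the following minimal sketch. It uses an in-process queue.Queue as a stand-in for a real message queue, and the names place_order and secondary_worker, as well as the 10 points awarded per order, are assumptions made up for illustration.

```python
import queue
import threading


def place_order(user_id, orders, secondary_queue):
    """Main process: create the order, then hand off the secondary work."""
    orders.append(user_id)         # stand-in for creating the order and deducting inventory
    secondary_queue.put(user_id)   # coupons and points are handled asynchronously


def secondary_worker(secondary_queue, coupons, points):
    """Secondary process: issue coupons and add points off the main path."""
    while True:
        user_id = secondary_queue.get()
        if user_id is None:        # sentinel: stop the worker
            return
        coupons.append(user_id)    # stand-in for issuing a coupon
        points[user_id] = points.get(user_id, 0) + 10  # 10 is an arbitrary value


if __name__ == "__main__":
    orders, coupons, points = [], [], {}
    secondary_queue = queue.Queue()
    worker = threading.Thread(
        target=secondary_worker, args=(secondary_queue, coupons, points)
    )
    worker.start()
    place_order("user-1", orders, secondary_queue)
    secondary_queue.put(None)
    worker.join()
    print(orders, coupons, points)
```

The main process returns as soon as the message is enqueued, so its processing time no longer includes the coupon and points steps.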

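After moving some business steps to asynchronous processing, the deployment structure of our spike system also changes: issuing coupons and increasing points run in their own queue handlers, separate from the handlers that generate orders and deduct inventory.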

Decoupling to achieve loose coupling between spike system modules

In addition to asynchronous processing and peak clipping and valley filling, another role of message queues in the spike system is decoupling.

For example, the data team tells you that they want to collect activity statistics after the spike event to analyze the popularity of the event products, the characteristics of the buyers, and users’ satisfaction with the spike interaction. This means we need to send a large amount of data to the data team, so what do we do?

One idea is to make synchronous calls over HTTP or RPC: the data team provides an interface and we push the spike data to it in real time. But this approach has two problems:

The coupling of the overall system is relatively strong. When the interface of the data team fails, the availability of the spike system will be affected.

When the data system needs new fields, the parameters of the interface have to change, so the spike system must be changed along with it.

At this point, we can consider using message queues to reduce the direct coupling between the business system and the data system.

After the spike system generates a purchase record, it first sends all of the data to the message queue. The data team then subscribes to the topic of this message queue, receives the data, and does its own filtering and processing.

After the spike system is decoupled in this way, the failure of the data system will not affect the spike system. At the same time, when the data system needs new fields, it only needs to parse the messages in the message queue and get the required data.
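As a rough illustration of this decoupling, the sketch below uses a tiny in-process Topic class as a stand-in for a real message-queue topic. The class, the extract_fields helper, and the field names in the usage note are all hypothetical; the point is only that the producer publishes the full record once and each consumer picks the fields it wants.

```python
import json
import queue


class Topic:
    """In-process stand-in for a message-queue topic with many subscribers."""

    def __init__(self):
        self.subscriber_queues = []

    def subscribe(self):
        inbox = queue.Queue()
        self.subscriber_queues.append(inbox)
        return inbox

    def publish(self, message):
        payload = json.dumps(message)      # the producer sends the full record
        for inbox in self.subscriber_queues:
            inbox.put(payload)             # every subscriber gets its own copy


def extract_fields(payload, wanted):
    """Consumer side: keep only the fields this consumer cares about."""
    data = json.loads(payload)
    return {name: data[name] for name in wanted if name in data}
```

If the spike system publishes a record with order_id, user_id, sku, and price, the data team can call extract_fields(payload, ["sku", "user_id"]) today and add "price" to that list later without any change on the publishing side.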

Asynchronous processing, decoupling, and peak clipping are the main roles that message queues play in the design of spike systems. Asynchronous processing simplifies the steps in a business process and improves system performance; peak clipping and valley filling smooth out the peak traffic reaching the spike system so that business logic can be processed at a steady pace; decoupling separates the spike system from the data system so that changes in one do not affect the other.

If you want to improve your system's write performance, reduce coupling, and withstand highly concurrent write traffic, then you can consider using message queues.

Lesson Summary

In this lesson, I drew on my own practical experience to walk you through the role of message queues in the design of high-concurrency systems, along with some caveats. The points you need to understand are as follows:

Peak clipping and valley filling are the main functions of message queues, but they will cause delays in request processing.

Asynchronous processing is a powerful tool for improving system performance, but you need to define the boundary between synchronous and asynchronous processes. There is also a risk of message loss, so we need to consider how to ensure that messages arrive.

Decoupling can improve the robustness of your overall system.

Of course, you should know that although message queues solve the problems above, they also increase the complexity of the system. For example, in the business process mentioned above, where is the boundary between the synchronous process and the asynchronous process? Will messages be lost or duplicated? How can the delay of a request be reduced? Will the order in which messages are received affect the normal execution of the business process? Do messages need to be redelivered if processing fails? These are all issues we need to consider. I will use the next two lessons to address the two main ones: how to deal with the loss and duplication of messages, and how to reduce message delay.

Introducing message queues also introduces new problems that require new solutions. This is both the challenge and the unique charm of system design, and it is through these challenges that we keep improving our technical and system design abilities.

Think time

In today’s class, I covered the role of message queues in the design of highly concurrent systems. In what scenarios do you use message queues during development? You are welcome to share your experience with me in the comments.

Finally, thank you for reading. If you found this article useful, you are also welcome to share it with more friends.
