Taskpool: A Hunch Turned Performance Improvement

Asterisk is made up of many modules, each of which typically uses APIs provided by Asterisk itself or by other libraries. One that has been in use for the last 10 years is the threadpool API.

Threadpool

The threadpool API provides common methods for a configurable pool of threads that can be used to do work. The work itself is arbitrary, which allows the pool to be used in different ways. The two heaviest users have been PJSIP and Stasis, which build their threading models on it and focus heavily on dispatching work to be done asynchronously. The threadpool is convenient for this because it can be configured to have a fixed number of threads or to grow and shrink as needed.

Based on trends I’ve seen for a while (such as posts and comments about the stasis/pool and stasis/pool-control taskprocessors), I had a hunch that the cost of using the threadpool API was high. After looking at the implementation I became more convinced of this. The threadpool API has a single queue of work for all tasks and uses a second, separate management queue to manage the pool itself. Each push of a task to the threadpool therefore results in two tasks being queued instead of one: the actual task to be worked, and a task to manage the threadpool (grow it, or wake up a waiting thread). If the threadpool can’t respond fast enough to tasks being added, or if there is a fixed number of threads allowed, then the single queue of work can grow and grow. This is what causes the stasis/pool taskprocessor messages that some people have seen.
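To make the cost described above concrete, here is a minimal toy model in C. It is not the Asterisk threadpool implementation and none of these names exist in Asterisk; they are hypothetical and only count queue entries. It shows two things: every push touches two queues, and a shared queue drained by a fixed number of threads builds a backlog whenever pushes outpace it.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not the Asterisk threadpool API. */
struct toy_threadpool {
    size_t work_queued;    /* entries on the single shared work queue */
    size_t control_queued; /* entries on the separate management queue */
};

/* Pushing one task queues the task itself plus one management task. */
static void toy_threadpool_push(struct toy_threadpool *pool)
{
    pool->work_queued++;    /* the actual task to be worked */
    pool->control_queued++; /* "grow the pool" or "wake a waiting thread" */
}

/* Total queue entries created by pushing n tasks. */
static size_t toy_threadpool_queue_ops(size_t n)
{
    struct toy_threadpool pool = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        toy_threadpool_push(&pool);
    }
    return pool.work_queued + pool.control_queued;
}

/* Backlog left on the shared work queue after a number of rounds in which
 * producers push a fixed amount and a fixed set of threads drains a fixed
 * amount. */
static size_t toy_backlog_after(size_t ticks, size_t pushed_per_tick,
                                size_t drained_per_tick)
{
    size_t backlog = 0;

    for (size_t t = 0; t < ticks; t++) {
        backlog += pushed_per_tick;
        /* Threads can never drain more than is currently queued. */
        backlog -= (backlog < drained_per_tick) ? backlog : drained_per_tick;
    }
    return backlog;
}
```

In this model, pushing 1,000 tasks creates 2,000 queue entries, and pushing 5 tasks per round while only 3 are drained leaves the backlog growing by 2 every round.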

This led me to write some efficiency tests to see how much work the threadpool implementation can do. In the efficiency tests the work is minimal, which helps isolate the results to just the time spent in the threadpool implementation. There are two tests. The first pushes a large number of tasks that then get executed by threadpool threads. The second uses multiple serializers, whose tasks are also executed by threadpool threads. Both tests run for 30 seconds with 50 threads and aim to flood the threadpool with as much work as possible over that time.

Running the efficiency tests gave the following results on my development system:

Threadpool Push: 22,809,319 tasks executed (760,310 per second)

Threadpool Serializer Push: 16,590,999 tasks executed (553,033 per second)

As you can see, serializer usage incurs a penalty of approximately 27% on my system.
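The per-second rates and the 27% figure follow directly from the totals above and the 30-second run length. A small sketch of that arithmetic in C (the function names are my own, chosen for illustration):

```c
/* Average tasks executed per second over a run of the given length. */
static double tasks_per_second(double tasks, double seconds)
{
    return tasks / seconds;
}

/* Fraction of throughput lost when tasks go through serializers,
 * relative to pushing them directly. */
static double serializer_penalty(double direct_tasks, double serialized_tasks)
{
    return (direct_tasks - serialized_tasks) / direct_tasks;
}
```

For example, `tasks_per_second(22809319.0, 30.0)` is about 760,310, and `serializer_penalty(22809319.0, 16590999.0)` is about 0.2726, which rounds to the 27% quoted above.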

These results and usage allowed me to better understand the activity in the threadpool. I came to the conclusion that the threadpool API is really good for medium to long running tasks, but is inefficient for short to medium duration tasks. The overhead of managing the threadpool itself just isn’t worth it.

Taskpool

I decided to come up with an API specifically focused on short to medium duration tasks, taking into account all I had learned from threadpool as well as usage of tasks in general. This new API is called taskpool.

Taskpool is effectively a pool of taskprocessors that can be used to execute tasks, but it shares the configurability of the threadpool API. It can have a fixed number of taskprocessors or it can grow and shrink as needed. A substantial difference from threadpool is that there is no single queue of work. Instead, when a task is pushed it is immediately allocated to a specific taskprocessor and queued directly on it. There is also no separate management queue that manages the taskpool. If the taskpool needs to grow, this is done when pushing a task. If it needs to shrink, this is done asynchronously in the background using a periodic scheduled item.
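The push path described above can be sketched as a toy model in C. This is not the Asterisk taskpool implementation, and the names are hypothetical. In particular, the selection policy (pick the shortest queue) and the growth condition (grow when every taskprocessor already has work queued) are assumptions made for illustration; the text above only says that a task is assigned to one taskprocessor on push and that growth happens on push.

```c
#include <stddef.h>

#define TOY_MAX_PROCESSORS 8

/* Toy model only: hypothetical names, not the Asterisk taskpool API. */
struct toy_taskpool {
    size_t depth[TOY_MAX_PROCESSORS]; /* queued tasks per taskprocessor */
    size_t count;                     /* taskprocessors currently in the pool */
    size_t max;                       /* upper bound on the pool size */
};

static void toy_taskpool_init(struct toy_taskpool *pool, size_t initial,
                              size_t max)
{
    /* Clamp the sizes so the pool always has between 1 and
     * TOY_MAX_PROCESSORS taskprocessors. */
    if (max < 1) {
        max = 1;
    }
    if (max > TOY_MAX_PROCESSORS) {
        max = TOY_MAX_PROCESSORS;
    }
    if (initial < 1) {
        initial = 1;
    }
    if (initial > max) {
        initial = max;
    }
    for (size_t i = 0; i < TOY_MAX_PROCESSORS; i++) {
        pool->depth[i] = 0;
    }
    pool->count = initial;
    pool->max = max;
}

/* Push one task and return the index of the taskprocessor that received it. */
static size_t toy_taskpool_push(struct toy_taskpool *pool)
{
    size_t target = 0;

    /* Assumed policy: choose the taskprocessor with the shortest queue. */
    for (size_t i = 1; i < pool->count; i++) {
        if (pool->depth[i] < pool->depth[target]) {
            target = i;
        }
    }

    /* Growth happens inline on push (assumed trigger: all queues non-empty). */
    if (pool->depth[target] > 0 && pool->count < pool->max) {
        target = pool->count++;
    }

    /* Queue directly on the chosen taskprocessor: no shared work queue and
     * no management task. */
    pool->depth[target]++;
    return target;
}
```

With a pool initialized to 1 taskprocessor and a maximum of 3, four pushes with nothing draining land on taskprocessors 0, 1, 2, and then 0 again. Shrinking is left out of the sketch because it happens in the background rather than on the push path.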

You might immediately think “but taskprocessors have issues!”. I want to set the record straight. Taskprocessors themselves are actually pretty efficient. The users of the taskprocessor API are really the fundamental cause of taskprocessor issues. If they are slow or don’t use taskprocessors efficiently, then a symptom is taskprocessor issues.

As part of implementing taskpool I wrote efficiency tests that are equivalent to the threadpool efficiency tests to give a comparison. Running these tests on my development system yielded the following results:

Taskpool Push: 61,255,016 tasks executed (2,041,833 per second), 2.69x as many tasks as threadpool

Taskpool Serializer Push: 48,997,697 tasks executed (1,633,256 per second), 2.95x as many tasks as threadpool

For taskpool the serializer usage penalty is actually less, approximately 20%.
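The multipliers quoted above are plain ratios of the taskpool totals to the threadpool totals. A short sketch of that calculation in C (function names are my own):

```c
/* How many times as many tasks the improved run executed. */
static double speedup(double baseline_tasks, double improved_tasks)
{
    return improved_tasks / baseline_tasks;
}

/* The same comparison expressed as "percent more" than the baseline. */
static double percent_more(double baseline_tasks, double improved_tasks)
{
    return (speedup(baseline_tasks, improved_tasks) - 1.0) * 100.0;
}
```

For the plain push test, `speedup(22809319.0, 61255016.0)` is about 2.69, or roughly 169% more tasks. For the serializer test, `speedup(16590999.0, 48997697.0)` is about 2.95.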

These results are only from my system. Others who ran the tests for comparison have seen even better results, as high as 20x. The numbers depend heavily on the system itself and its CPU, but no matter what, it is an improvement for everyone. This is a great demonstration of how the way taskprocessors are used can make a huge difference in performance. Throughout all of this, no changes have been made to taskprocessors themselves. I explored doing so but did not find any changes that would improve them.

So Where Is Taskpool Being Used Right Now?

The current major user of taskpool is Stasis, which uses it for message dispatching as of Asterisk 20.17.0, 22.7.0, and 23.1.0. It has been changed over internally to use taskpool, and the only user-facing change is that the configuration now refers to taskpool (threadpool options are still supported). For power users who have examined “core show taskprocessors”, the output has also changed for Stasis: the pool and pool-control taskprocessors have been removed and replaced with stasis taskpool ones.

Early reports have shown that this has noticeably reduced CPU usage (by anywhere from 20% to 30%) and solved performance issues for people, which is great to hear. If you’re interested in using it, all you have to do is update to one of the versions listed above or newer. There’s nothing else to do.
