Delays compared to priority scheduling

In an underload situation, the thread scheduler doesn't delay ready-to-run threads, but the highest-priority thread might not run if the thread scheduler is balancing budgets.

In very unlikely cases, a large window size can cause some scheduler partitions to experience runtime delays, but these delays are always less than what would occur without adaptive partitioning thread scheduling. There are two cases where this can occur.

Case 1

If a scheduler partition's budget is budget milliseconds, then the delay is never longer than:

window_size - smallest_budget + largest_budget milliseconds

This upper bound is only ever reached when low-budget and low-priority scheduler partitions interact with two other scheduler partitions in a specific way, and then only when all threads in the system are ready to run for very long intervals. This maximum possible delay has an extremely low chance of occurring.

For example, given these scheduler partitions:

  • Partition A: 10% share; always ready to run at priority 10
  • Partition B: 10% share; when it runs, it runs at priority 20
  • Partition C: 80% share; when it runs, it runs at priority 30

This delay happens when the following occurs:

  • Let B and C sleep for a long time. A will run opportunistically and eventually run for 100 milliseconds (the size of the averaging window).
  • Then B wakes up. It has both available budget and a higher priority, so it runs. Let's call this time Ta, since it's the last time partition A ran. Since C continues to sleep, B runs opportunistically.
  • At Ta + 90 milliseconds, partition A has just paid back all the time it opportunistically used (the window size minus partition A's budget of 10%). Normally, it would run on the very next tick because that's when it would next have a budget of 1 millisecond, and B is over budget.
  • But let's say that, by coincidence, C chooses to wake at that exact time. Because it has budget and a higher priority than A, it runs. It proceeds to run for another 80 milliseconds, which is when it runs out of budget.
  • Only now, at Ta + 90 ms + 80 ms, or 170 milliseconds later, does A get to run again.
Note:
This scenario can't occur unless a high-priority partition wakes up exactly when a lower-priority partition just finishes paying back its opportunistic run time.

Case 2

Still rare, but more common, is a delay of window_size - budget milliseconds, which may occur to low-budget scheduler partitions whose priorities are, on average, equal to those of other partitions.

With a typical mix of thread priorities, when ready to run, each scheduler partition typically experiences a maximum delay of much less than window_size milliseconds.

For example, let's suppose we have these scheduler partitions:

  • Partition A: 10% share; always ready to run at priority 10
  • Partition B: 90% share; always ready to run at priority 20, except that every 150 milliseconds it sleeps for 50 milliseconds

This delay occurs when the following happens:

  • When partition B sleeps, partition A is already at its budget limit of 10 milliseconds (10% of the window size).
  • But then A runs opportunistically for 50 milliseconds, which is when B wakes up. Let's call that time Ta, the last time partition A ran.
  • B runs continuously for 90 milliseconds, which is when it exhausts its budget. Only then does A run again; this is 90 milliseconds after Ta.

However, this pattern occurs only if the 10% application never suspends (which is exceedingly unlikely), and if there are no threads of other priorities (also exceedingly unlikely).

Approximating the delays

Because these scenarios are complicated, and the maximum delay time is a function of the partition shares, we approximate this rule by saying that the maximum ready-queue delay time is twice the window size.

Note:
If you change the tick size of the system at runtime, do so before defining the window size of the partition thread scheduler, because QNX Neutrino converts the window size from milliseconds to clock ticks for internal use.

The practical way to verify that your scheduling delays are correct is to load your system with stress loads, and use the System Profiler tool from the IDE to monitor the delays. The aps command lets you change budgets dynamically, so you can quickly confirm that you have the correct configuration of budgets.
