DragonFly kernel List (threaded) for 2006-05
Re: Load balancing between CPUs
:As a part of my lottery scheduler work, I'm trying to figure out
:how to implement load balancing between CPUs.
:As I understand it, the goal is equal load of all CPUs. Periodically
:we obtain number of ready-to-execute processes on each CPU (per-cpu
:runqueue length) and 'push' processes from high-loaded CPUs to
:low-loaded. This scheme is quite simple. The problem is CPU affinity,
:which we didn't count.
:Here is what I'm thinking about:
: - when new process created via fork(), usched 'pushed' it
: to the first most low-loaded CPU;
You will want to run it on the same cpu as the parent initially.
Then just let it migrate naturally, or push actively when it does
The reason is that the system does not know as of when the fork()
is issued whether the thread is forking a new instance or
whether it is doing a fork/exec to run another program. If you
actively push the newly forked process to another cpu, you
instantly blow the caches on both cpus.
: - add to the struct lwp field 'lastcpu' to track the CPU
: on which this lwp was scheduled last time;
That is probably best left to the per-lwp scheduler data
(the lwp_usdata pointer). The current bsd4 scheduler stores
its last cpu field there already.
: - followed Matt's mail on Sat, 26 Nov 2005, add to the
: struct lwp field 'desiredcpu', which can be setup by
: usched's setcpumask();
: - balancing scheme: first we try to find on most high-loaded
: CPU lwp with 'desiredcpu' equal to most low-loaded and then
: 'pushed' it on success. If we can't, doing the same but
: with 'lastcpu' field. If both previous are failed, pick up
: a random lwp and 'push' it to the most low-loaded CPU, breaking
: cpu affinity.. Do this step until runqueues on both CPUs
: will be equal (+/- 1).
It is probably best to just schedule the thread on the last cpu
it ran on, and then let the scheduler move the thread to another
cpu if it doesn't get cpu 'quickly enough' (whatever we define that
:With this scheme we balance only 2 CPUs at one balancing cycle. Maybe
:we need to do this for more than one CPU pair..
Doing the migration out of band also makes balancing across multiple
cpus easier, since you can construct a dedicated algorithm to do
the balancing which has access to a lot more statistical information
rather than attempting to figure it out when the thread is scheduled
The reason you want to do this is because it is virtually impossible
to characterize how much cpu the target thread will need before it
goes to sleep again. If you schedule the target thread to another
cpu when it would otherwise only require a few cycles of cpu before
going to sleep again, you again blow both cpu's caches and add a lot
of overhead that winds up being wasted.
For example, when the scheduler does a normal thread switch and must
select the 'next' thread to run on the current cpu, it could also
check other threads in the run queue that haven't been able to run
yet and determine whether it makes sense to migrate some of them.
Alternatively, one could have a 'helper' thread that runs a certain
number of times a second, like 10 times a second, on each cpu, which
implements a more sophisticated balancing algorithm. The advantage of
this is that the algorithm then takes a small fixed amount of cpu that
is entirely independant of the actual machine load, and you do not
clutter the normal 'critical' switching path with all sorts of code.
:Eventually, we will have (I hope) fully-supported 1:1 threading. So,
:maybe we need to do some effort on scheduling different threads of
:one multithreaded process on different CPUs?
:Feedback and comments are appreciated.