DragonFly BSD
DragonFly kernel List (threaded) for 2010-08
[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]

Re: kqueue selective wakeup mechanism

From: "Samuel J. Greear" <sjg@xxxxxxxxxxxx>
Date: Fri, 6 Aug 2010 10:30:05 -0600

On Wed, Aug 4, 2010 at 6:06 AM, Samuel J. Greear <sjg@evilcode.net> wrote:
> As part of my Summer of Code work I have been looking at implementing
> selective wakeups in one form or another. Currently (and probably
> since the beginning of time) when some processes or threads are
> waiting on a descriptor in one of select/poll/kevent, all waiting
> processes/threads are woken up  when it is available, and that
> functionality is now mandatory.
> Quite a few ideas were considered but the only one that really makes
> sense right now is an opt-in mechanism in the form of a kqueue flag,
> EV_WAKEONE. The idea here is to be able to simplify a userspace
> server, such as a webserver, to allow it to reasonably block in
> kevent(2) and cut out a lot of the need or desire to do descriptor
> passing and and cross-process/thread synchronization around kevent(2)
> or accept(2). While at the same time improving performance, we aren't
> needlessly waking a large number of processes or threads.
> I looked at a large number of webservers in an attempt to find one
> that operated in roughly this manner, but to no avail. So to perform
> an adequate test I wrote one (well, not really, its the minimum I
> could get away with and have apachebench come away happy):
> http://pastie.org/1074861 - This server forks n processes (see for
> loop) and all of them block in kevent waiting for new connections and
> readiness of existing connections. This server isn't really useful
> beyond the scope of this test, it does no parsing and effectively
> relies on the socket buffers working ideally, but it should be
> sufficient to illustrate the performance differences between waking up
> n listeners and waking up 1.
> Here http://pastie.org/1074873 is the output of apachebench run
> against the test server in two modes, the first used the traditional
> kqueue flags and races to accept(2), ignoring errors there. The second
> used EV_WAKEONE and reported accept(2) errors (there were none). As
> you can see the second test was approximately 16% faster on a 2 core
> SMP test machine.
> For reference, I have the results of lighttpd and apache1.3 subjected
> to a nearly identical workload on the same machine
> http://pastie.org/1074889
> The kernel implementation of this functionality is rather simple:
> http://pastie.org/1074878
> Any thoughts on any of this would be appreciated.
> Best,
> Sam

hsu suggested I should see how this fares with the sample servers in
Stevens UNIX Network Programming.

There was a sample server that forked n children and blocked in
select(2), to demonstrate the effect of select collisions (racing to
accept(2)). I created two variants of this server, one that was a
direct port to kq/kevent, and another that was identical to this that

I noticed on the first run that server02l-kone (the EV_WAKEONE
variant) had a single process doing all of the work on my 2 core SMP
test box, so modified the selective wakeup code slightly to push
events that are woken up and carry the EV_WAKEONE option to the end of
the list. We shall call it mkII, it is included below.

# serv00: traditional concurrent server: use as base level
# serv01: one fork per client (traditional concurrent server).
# serv02: prefork, no locking; works on BSD-derived systems
#       but not on SVR4-derived systems.
# serv02l: prefork, no locking, block in select instead of accept to see
#       select collisions; works on BSD-derived systems but not on SVR4.
# serv02l-k: same as serv02l but with kevent instead of select
# serv02l-kone: same as serv02l-k but with EV_WAKEONE
# serv03: prefork, file locking using fcntl().  Similar to Apache server.
# serv04: prefork, file locking using pthread locking.
# serv05: prefork, descrptor passing to children.  Similar to NSCA server.
# serv06: one thread per client.
# serv07: prethread with mutex locking around accept().
# serv08: prethread with only main thread doing accept().
# serv09: prethread with no locking around accept().

Server name     serv00        serv01	    serv02      serv02l
Average runtime 176597.14     768679.27     136438.55    141523.42

Server name     serv02l-k     serv02l-kone  serv03      serv04
Average runtime 141191.04     139346.81     153423.5    136863.83

Server name     serv05        serv06        serv07      serv08
Average runtime 161074.53     168190.29     159819.51   180385.9

Server name     serv09
Average runtime 145117.12

All results are at a concurrency of 8 with 100,000 requests per client.
All times are in ms (lower is better)

serv02 and serv02l-kone, concurrency of 32, 10,000 requests
55889.63     55821.46  (averages)

serv02 and serv02l-kone, concurrency of 128, 10,000 requests
231933.41    226900.81 (averages)

As you can see (yes, the numbers are rough, but pretty consistent), it
is competitive with blocking in accept(2) at low concurrencies and
beats it under high concurrencies (which probably means our accept(2)
could use some work).

corecode has suggested that probably I should try a variant that
instead of re-ordering the list, simply wakes up the first entry in
the list that actually happens to be sleeping. I may try that as

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index 65ff246..324e1b4 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1120,12 +1120,41 @@ filter_event(struct knote *kn, long hint)
 knote(struct klist *list, long hint)
-       struct knote *kn;
+       struct knote *kn, *temp_kn, *list_tail;
+       struct klist wnotes;
+       short wakeone[EVFILT_SYSCOUNT];
+       short wake_filter;
+       SLIST_INIT(&wnotes);
+       memset(&wakeone, 0, sizeof(wakeone));

-       SLIST_FOREACH(kn, list, kn_next)
-               if (filter_event(kn, hint))
-                       KNOTE_ACTIVATE(kn);
+       list_tail = SLIST_FIRST(list);
+       SLIST_FOREACH_MUTABLE(kn, list, kn_next, temp_kn) {
+               /*
+                * Activate the knote, but only once for a given
+                * filter type when EV_WAKEONE is set.
+                */
+               wake_filter = kn->kn_filter + EVFILT_SYSCOUNT;
+               if (wakeone[wake_filter] == 0 ||
+                   !(kn->kn_flags & EV_WAKEONE)) {
+                       if (filter_event(kn, hint)) {
+                               KNOTE_ACTIVATE(kn);
+                               if (kn->kn_flags & EV_WAKEONE) {
+                                       wakeone[wake_filter] = 1;
+                                       SLIST_REMOVE(list, kn, knote, kn_next);
+                                       SLIST_INSERT_HEAD(&wnotes, kn, kn_next);
+                               }
+                       }
+               }
+               if (SLIST_NEXT(kn, kn_next) == NULL)
+                       list_tail = kn;
+       }
+       while (!SLIST_EMPTY(&wnotes)) {
+               kn = SLIST_FIRST(&wnotes);
+               SLIST_REMOVE_HEAD(&wnotes, kn_next);
+               SLIST_INSERT_AFTER(list_tail, kn, kn_next);
+       }

diff --git a/sys/sys/event.h b/sys/sys/event.h
index f0d20b7..cf76cfb 100644
--- a/sys/sys/event.h
+++ b/sys/sys/event.h
@@ -81,6 +81,7 @@ struct kevent {
 /* flags */
 #define EV_ONESHOT     0x0010          /* only report one occurrence */
 #define EV_CLEAR       0x0020          /* clear event state after reporting */
+#define EV_WAKEONE     0x0040          /* only wake a single WAKEONE
listener */

 #define EV_SYSFLAGS    0xF000          /* reserved by system */
 #define EV_FLAG1       0x2000          /* filter-specific flag */

Again, comments appreciated. Thanks.

[Date Prev][Date Next]  [Thread Prev][Thread Next]  [Date Index][Thread Index]