Keyword substitution: kv
Default branch: MAIN
Do not return an EINVAL error for certain abort and disconnect cases. Otherwise an async close() by the other end can cause our close() to return EINVAL.
Increase sockbuf send and receive buffers to 57344 bytes. In particular, note that we want to use buffer limits that allow lo0's mtu of 16384 to be fully utilized.
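The relationship between the two numbers can be checked directly. This is a minimal sketch that uses only the figures quoted in the entry above; the macro and function names are made up for illustration and are not kernel identifiers, and the entry itself does not spell out how the size was chosen.

```c
#include <assert.h>

#define EX_SOCKBUF_BYTES 57344	/* buffer size quoted in the entry */
#define EX_LO0_MTU       16384	/* lo0 MTU quoted in the entry */

/* Number of whole MTU-sized packets that fit in one buffer. */
static int
ex_full_packets(int bufsize, int mtu)
{
	return bufsize / mtu;
}

/* Bytes left over after the last whole packet. */
static int
ex_leftover_bytes(int bufsize, int mtu)
{
	return bufsize % mtu;
}

static void
ex_bufsize_usage(void)
{
	/* 57344 = 56 * 1024 = 3 * 16384 + 8192 */
	assert(EX_SOCKBUF_BYTES == 56 * 1024);
	assert(ex_full_packets(EX_SOCKBUF_BYTES, EX_LO0_MTU) == 3);
	assert(ex_leftover_bytes(EX_SOCKBUF_BYTES, EX_LO0_MTU) == 8192);
}
```

So a single buffer holds three full lo0-sized packets with half a packet to spare.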
do early copyin / delayed copyout for socket options
* Fix some cases where NULL was used but 0 was meant (and vice versa). * Remove some bogus casts of NULL to (void *).
tcp_output_dispatch() is only used by the SMP kernel. Noticed-by: swildner@
In TCP PRU_CONNECT handling, install inp's route with the route entry on the correct CPU: If the TCP connection's target port is not tcp_thread's port on which TCP PRU_CONNECT is processed, then - In tcp_connect(), the route entry installed in inp's route by in_pcbladdr() is freed, so the next fix could take effect. - In tcp_usr_connect(), tcp_output() is dispatched to the connection's target port to be called, so the route entry on the connection's target CPU will be installed in inp's route. Approved-by: dillon@
Add lwkt_sleep() to formalize a shortcut numerous bits of code have been using for a while, which is to directly deschedule oneself and switch away. This method of blocking requires a direct lwkt_schedule() call to reschedule the thread and is primarily used by the message port abstraction. Change the psignal code to check TDF_SINTR in the thread flags instead of checking MSGPORTF_WAITING in the thread's private message port. The lwkt_waitmsg() and lwkt_waitport() functions use the same msgport backend function (mp_waitport). Separate the backend into two functions, mp_waitport and mp_waitmsg, and allow tsleep flags to be passed in instead of flagging interruptability in the lwkt_msg flags. Optimize the lwkt_waitmsg() backends - in the fully synchronous critical path case no critical sections or spinlocks are required at all.
LWKT message ports contain a number of function pointers which abstract their backend operation. * Add a new function, mp_getport(), which takes over the functionality of lwkt_getport(). * Formalize the default backend and rename it the 'thread' port backend, used when a message port will only be drained by a single thread. This backend is able to use critical sections and IPI messages to handle races. * Fix a small timing window in the thread port backend where replying to a synchronous message request from a different cpu may fail to wake up the originator who is waiting for the message completion. * Abstract out the message port initialization code and clean up related code pollution. * Add a new backend called the 'spin' port backend. This backend can be used if a message port might be drained by several different threads. For example, this would allow us to use a message port as part of a file pointer / file descriptor construct. * Add a boot-time tunable, lwkt.use_spin_port (defaults to off) which forces spin ports to be used instead of thread ports for the per-thread message port. This is used only for debugging.
* Greatly reduce the complexity of the LWKT messaging and port abstraction. Significantly reduce the overhead of the subsystem. * The message abort algorithm has been rewritten. It now sends a separate message to issue the abort instead of trying to requeue the original message. This also means the TAILQ embedded in the lwkt_msg structure can be used by unrelated code during processing of the message. * Numerous MSGF_ flags have been removed, and all the LWKT msg/port algorithms have been rewritten and simplified. The message structure is now only touched by the current owner in all situations. * Numerous structural fields have been removed. In particular, the fields used for message abort sequencing have been simplified and we do not try to embed a 'command' field in the base LWKT message any more. * Clean up the netmsg abstraction, which is used all over the network stack. Instead of trying to overload fields in lwkt_msg we now simply extend the base lwkt_msg into struct netmsg. The function dispatch now takes a netmsg and returns void (before we had to return EASYNC), and we no longer need weird casts. Accept/connect message aborts are now greatly simplified.
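The "extend the base message" idea from the entry above can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: every `ex_` name, the field layout, and the flag value are hypothetical, and the synchronous call is simulated by invoking the dispatch function directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EX_MSGF_DONE 0x01		/* hypothetical flag value */

/* Hypothetical base message: only what the messaging layer needs. */
struct ex_lwkt_msg {
	int ms_error;
	int ms_flags;
};

struct ex_netmsg;
typedef void (*ex_dispatch_t)(struct ex_netmsg *);

/* The network message extends the base by embedding it first. */
struct ex_netmsg {
	struct ex_lwkt_msg nm_lmsg;
	ex_dispatch_t nm_dispatch;	/* takes a netmsg, returns void */
	int nm_port;			/* hypothetical payload */
};

/* Completion is reported by replying, not by a return value. */
static void
ex_replymsg(struct ex_lwkt_msg *msg, int error)
{
	msg->ms_error = error;
	msg->ms_flags |= EX_MSGF_DONE;
}

static void
ex_connect_dispatch(struct ex_netmsg *nm)
{
	ex_replymsg(&nm->nm_lmsg, nm->nm_port > 0 ? 0 : EINVAL);
}

/* Simulated synchronous send: run the handler, then read the reply. */
static int
ex_domsg(struct ex_netmsg *nm)
{
	nm->nm_lmsg.ms_flags = 0;
	nm->nm_dispatch(nm);
	assert(nm->nm_lmsg.ms_flags & EX_MSGF_DONE);
	return nm->nm_lmsg.ms_error;
}
```

Because the base message is the first member, a pointer to the extended structure is also a valid pointer to the base, so no field overloading or casting tricks are needed in the handler.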
Give the sockbuf structure its own header file and supporting source file. Move all sockbuf-specific functions from kern/uipc_socket2.c into the new kern/uipc_sockbuf.c and move all the sockbuf-specific structures from sys/socketvar.h to sys/sockbuf.h. Change the sockbuf structure to only contain those fields required to properly manage a chain of mbufs. Create a signalsockbuf structure to hold the remaining fields (e.g. selinfo, mbmax, etc). Change the so_rcv and so_snd structures in the struct socket from a sockbuf to a signalsockbuf. Remove the recently added sorecv_direct structure which was being used to provide a direct mbuf path to consumers for socket I/O. Use the newly revamped sockbuf base structure instead. This gives mbuf consumers direct access to the sockbuf API functions for use outside of a struct socket. This will also allow new API functions to be added to the sockbuf interface to ease the job of parsing data out of chained mbufs.
Convert all pr_usrreqs structure initializations to the .name = data format.
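The ".name = data" format is C99 designated initialization. A minimal sketch of the difference, using a cut-down hypothetical table (`struct ex_usrreqs` and its handlers are invented for illustration and are not the real pr_usrreqs layout):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a protocol request table. */
struct ex_usrreqs {
	int (*pru_attach)(int proto);
	int (*pru_detach)(void);
	int (*pru_connect)(int port);
};

static int
ex_attach(int proto)
{
	return proto;
}

static int
ex_connect(int port)
{
	return port + 1;
}

/* Positional style: order must match the struct declaration exactly. */
static const struct ex_usrreqs ex_positional = {
	ex_attach, NULL, ex_connect
};

/* Designated style: members are named, order is free, gaps are zeroed. */
static const struct ex_usrreqs ex_designated = {
	.pru_connect = ex_connect,
	.pru_attach = ex_attach,
};

static void
ex_usrreqs_usage(void)
{
	assert(ex_designated.pru_attach == ex_positional.pru_attach);
	assert(ex_designated.pru_connect == ex_positional.pru_connect);
	assert(ex_designated.pru_detach == NULL);
}
```

The designated form keeps working if members are later added to or reordered in the structure, which is the usual reason for such a conversion.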
Remove weird license clause which has expired.
Made jails IPv6 aware and support more than one IP address. Based-on: Pawel Jakub Dawidek mijail patches. Reviewed-by: Simon 'corecode' Schubert, Thomas E. Spanjaard, et al.
Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
* Remove (void) casts for discarded return values. * Put function types on separate lines. * Ansify function definitions. * Remove __P. In-collaboration-with: Alexey Slynko <firstname.lastname@example.org>
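The four cleanups listed above can be shown on one small function. This is an illustrative sketch only; `ex_add` is a made-up function, and the old form is kept in a comment because K&R definitions are no longer accepted by current C standards.

```c
#include <assert.h>

/*
 * Before: __P() prototype, K&R definition, (void) cast at the call site.
 *
 *	static int ex_add __P((int, int));
 *
 *	static int ex_add(a, b)
 *		int a, b;
 *	{
 *		return a + b;
 *	}
 *
 *	(void)ex_add(1, 2);
 */

/* After: plain ANSI prototype, no __P() wrapper. */
static int	ex_add(int a, int b);

/* After: return type on its own line, parameters typed in place. */
static int
ex_add(int a, int b)
{
	return a + b;
}

static void
ex_ansify_usage(void)
{
	ex_add(1, 2);			/* discarded result, no (void) cast */
	assert(ex_add(2, 3) == 5);
}
```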
Remove spl*() calls from netinet, replacing them with critical sections. A slight rearrangement of COMMON_START() in tcp_usrreq.c was necessary to ensure that the inp is loaded after entering the critical section.
Apply same bug fix as last commit to IPV6. Reported-by: Jeffrey Hsu
Fix a bug in the distributed PCB wildcardhash code for TCP. For the SMP case both the INP_WILDCARD and INP_WILDCARD_MP flags must be set. The insertion code was calling in_pcbinswildcardhash_oncpu() instead of in_pcbinswildcardhash() for the current-cpu case, which leaves the INP_WILDCARD flag unset. The wildcard deletion code calls various oncpu routines which remove the wildcard from the other cpu's hash tables, then finally calls in_pcbdetach()->in_pcbremlist() on the originating cpu but this fails to delete the inp because INP_WILDCARD was not set. This bug caused the TCP stack to get seriously confused because wildcard entries with stale inp pointers wind up being left in the hash table. The bug causes a mix of ignored connection requests (not even an RST), refused connection requests, successful connection requests, and crashes. Reported-by: Peter Avalos <email@example.com>
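The failure mode described above can be modelled in a few lines. This is a simplified single-table sketch, assuming hypothetical flag values and `ex_` helpers that only mirror the roles the entry gives to in_pcbinswildcardhash_oncpu(), in_pcbinswildcardhash(), and in_pcbremlist(); it is not the kernel code.

```c
#include <assert.h>

#define EX_INP_WILDCARD    0x1	/* hypothetical value */
#define EX_INP_WILDCARD_MP 0x2	/* hypothetical value */

struct ex_inp {
	int flags;
};

/* Stand-in for the originating cpu's wildcard hash table. */
struct ex_hash {
	int entries;
};

/* Per-cpu insert: touches the table only, never the flags. */
static void
ex_inswildcard_oncpu(struct ex_inp *inp, struct ex_hash *h)
{
	(void)inp;
	h->entries++;
}

/* Full insert: per-cpu insert plus the flag the removal path checks. */
static void
ex_inswildcard(struct ex_inp *inp, struct ex_hash *h)
{
	ex_inswildcard_oncpu(inp, h);
	inp->flags |= EX_INP_WILDCARD;
}

/* Removal on the originating cpu: skipped when the flag is missing. */
static void
ex_remlist(struct ex_inp *inp, struct ex_hash *h)
{
	if (inp->flags & EX_INP_WILDCARD) {
		h->entries--;
		inp->flags &= ~EX_INP_WILDCARD;
	}
}

static void
ex_wildcard_usage(void)
{
	struct ex_hash h = { 0 };
	struct ex_inp buggy = { EX_INP_WILDCARD_MP };
	struct ex_inp fixed = { EX_INP_WILDCARD_MP };

	ex_inswildcard_oncpu(&buggy, &h);	/* the call the bug used */
	ex_remlist(&buggy, &h);
	assert(h.entries == 1);			/* stale entry left behind */

	h.entries = 0;
	ex_inswildcard(&fixed, &h);		/* the corrected call */
	ex_remlist(&fixed, &h);
	assert(h.entries == 0);
}
```

In the real stack the stale entry still points at a freed inp, which is what produces the mixed symptoms listed in the entry.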
Now that 'so_pcb' is properly declared as a 'void *', remove a layer of indirection and directly use 'so->so_pcb' in place of 'sotoinpcb(so)'.
Clean up the routing and networking code before I parallelize routing.
Correct a bug where incoming connections do not properly initialize the inflight bandwidth calculator. Reorg the code a bit, removing random initialization elsewhere and putting it all in one place. Add an idle check and a pure-ack check. Reported-by: Dan Nelson <firstname.lastname@example.org>
Cache a pointer to the last mbuf in the sockbuf for faster insertion. Update it on sockbuf insertion and deletion and on user reads. Add a new sbappendstream() function that inserts in constant time. Use it for TCP.
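The technique is an ordinary singly linked queue with a cached tail pointer. A minimal sketch, assuming hypothetical `ex_buf` and `ex_queue` types in place of the real mbuf and sockbuf structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an mbuf and a sockbuf. */
struct ex_buf {
	struct ex_buf *next;
	int len;
};

struct ex_queue {
	struct ex_buf *head;
	struct ex_buf *tail;	/* cached pointer to the last buffer */
	int cc;			/* total bytes queued */
};

/* Constant-time append: no walk from head to find the end. */
static void
ex_append(struct ex_queue *q, struct ex_buf *b)
{
	b->next = NULL;
	if (q->tail == NULL)
		q->head = b;
	else
		q->tail->next = b;
	q->tail = b;
	q->cc += b->len;
}

/* Removal must keep the cache valid, including when the queue empties. */
static struct ex_buf *
ex_remove_head(struct ex_queue *q)
{
	struct ex_buf *b = q->head;

	if (b == NULL)
		return NULL;
	q->head = b->next;
	if (q->head == NULL)
		q->tail = NULL;
	q->cc -= b->len;
	b->next = NULL;
	return b;
}
```

The cost of the faster append is that every path that removes buffers has to update the cached pointer, which is why the entry mentions deletion and user reads as well as insertion.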
We have to replicate listening IPv6 sockets in the wildcard table because they're also used to match incoming IPv4 connections.
Fix a NULL pointer dereference panic that occurs when the TCP protocol stack races against userland while closing a tcp connection. It is possible for userland to queue a disconnect request but for the protocol stack to then receive a packet that causes it to call tcp_drop()->tcp_close() which also disconnects the inpcb from the tcpcb. When the protocol stack then processes the disconnect request it hits the panic because the inpcb no longer has a tcpcb connected to it. The bug generally only occurred on SMP systems where the latency in inter-cpu communication opens up the window of opportunity for the bug to occur. Panic-Reported-by: Adam K Kirchhoff <email@example.com>
Add a state to sanity check tcp_close() to make sure it is not called twice. Add a 'cpu' field to the inpcb so the cpu owning a pcb can be made well-known, for use in later assertions as we move closer to removing the BGL. Fix a bug in the closing of listen sockets. The inp wildcard hash table removal was being done asynchronously with the freeing of the inp, which could lead to problems. Instead of sending messages in parallel to all tcp protocol threads to remove the wildcard hash we instead chain a single message through all tcp protocol threads to remove the hash, then detach the inp at the end of the chain. There is still an issue with the socket being ripped out from under other protocol threads which might be trying to accept connections on behalf of the listen socket which must be resolved before the BGL can be removed (among other things).
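The chained removal for listen sockets can be sketched as a single-threaded simulation. Everything here is an assumption made for illustration: the cpu count, the `ex_` names, and the use of a direct function call to stand in for forwarding one message from one protocol thread to the next.

```c
#include <assert.h>

#define EX_NCPUS 4	/* hypothetical cpu count */

struct ex_listen_inp {
	int in_hash[EX_NCPUS];	/* 1 while cpu N's table references us */
	int detached;
};

/* Final step: only legal once no per-cpu table references the inp. */
static void
ex_detach(struct ex_listen_inp *inp)
{
	int cpu;

	for (cpu = 0; cpu < EX_NCPUS; cpu++)
		assert(inp->in_hash[cpu] == 0);
	inp->detached = 1;
}

/* One hop of the chain, as run by the protocol thread for 'cpu'. */
static void
ex_remwildcard_chain(struct ex_listen_inp *inp, int cpu)
{
	inp->in_hash[cpu] = 0;
	if (cpu + 1 < EX_NCPUS)
		ex_remwildcard_chain(inp, cpu + 1);	/* forward the message */
	else
		ex_detach(inp);				/* end of the chain */
}

static void
ex_chain_usage(void)
{
	struct ex_listen_inp inp = { { 1, 1, 1, 1 }, 0 };

	ex_remwildcard_chain(&inp, 0);
	assert(inp.detached == 1);
}
```

Because only one message exists, the detach cannot run until the last hop has finished, which is the ordering the parallel version could not guarantee.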
Add the standard DragonFly copyright notice to go along with mine. Approved by: Matt
Update some of my copyright notices before we officially publish DragonFlyBSD in Release 1.0.
Put snd_recover in the same cache line as snd_una. Make room in the snd_una cache line by coalescing the t_force field into t_flags and moving snd_up into the old t_force slot.
Fix IPV6 listen(). It was simply a matter of a missing in_pcbinswildcardhash() call. Submitted-by: Jeffrey Hsu <firstname.lastname@example.org> Reported-by: "Erik P. Skaalerud" <email@example.com> and others
Remember if an inpcb was entered into the wildcard table to save some cycles when a connection is closed.
Replicate the TCP listen table to give each cpu its own copy.
Use a message structure off the stack for a synchronous call.
The temporary message allocated to execute a connect request is not optional (M_NOWAIT -> M_INTWAIT).
Fix a netmsg memory leak in the ARP code. Adjust all ms_cmd function dispatches to return a proper error code. Reported-by: multiple people
Revamp the initial lwkt_abortmsg() support to normalize the abstraction. Now a message's primary command is always processed by the target even if an abort is requested before the target has retrieved the message from the message port. The message will then be requeued and the abort command copied into lwkt_msg_t->ms_cmd. Thus the target is always guaranteed to see the original message and then a second, abort message (the same message with ms_cmd = ms_abort) regardless of whether the abort was requested before or after the target retrieved the original message. ms_cmd is now an opaque union. LWKT makes no assumptions as to its contents. The NET code now stores nm_handler in ms_cmd as a function vector, and nm_handler has been removed from all netmsg structures. The ms_cmd function vector support nominally returns an integer error code which is intended to support synchronous/asynchronous optimizations in the future (to bypass message queueing and dequeueing in those situations where they can be bypassed, without messing up the messaging abstraction). The connect() predicate for which signal/abort support was added in the last commit now uses the new abort mechanism. Instead of having the handler function check whether a message represents an abort or not, a different handler vector is stored in ms_abort and run when an abort is processed (making for an easy separation of function). The large netmsg switch has been replaced by individual function vectors using the new ms_cmd function vector support. This will soon be removed entirely in favor of direct assignment of LWKT-aware PRU vectors to the message's command vector. NOTE ADDITIONAL: eventually the SYSCALL, VFS, and DEV interfaces will use the new message opaque ms_cmd 'function vector' support instead of a command index. Work by: Matthew Dillon and Jeffrey Hsu
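The ordering guarantee in the entry above (original command first, then the same message again with ms_cmd = ms_abort) can be sketched as follows. This is a simplified single-threaded model: the `ex_` types, the bookkeeping fields, and the EINTR return from the abort handler are assumptions for illustration, and the requeue is simulated by a second direct call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_msg;
typedef int (*ex_cmd_t)(struct ex_msg *);

struct ex_msg {
	ex_cmd_t ms_cmd;	/* vector run when the target sees the message */
	ex_cmd_t ms_abort;	/* separate vector run for an abort */
	int abort_requested;
	int order;		/* bumped on every handler call */
	int cmd_seen_at;	/* bookkeeping for the example only */
	int abort_seen_at;
};

static int
ex_connect_cmd(struct ex_msg *m)
{
	m->cmd_seen_at = ++m->order;
	return 0;
}

static int
ex_connect_abort(struct ex_msg *m)
{
	m->abort_seen_at = ++m->order;
	return EINTR;		/* hypothetical result of an aborted connect */
}

/* Target side: the primary command always runs before any abort. */
static int
ex_port_drain(struct ex_msg *m)
{
	int error = m->ms_cmd(m);

	if (m->abort_requested) {
		m->abort_requested = 0;
		m->ms_cmd = m->ms_abort;	/* "requeue" as an abort message */
		error = m->ms_cmd(m);
	}
	return error;
}
```

Keeping the abort logic in its own vector means the primary handler never has to test whether it is being called for an abort.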
Don't need opt_tcp_input.h for TCP_DISTRIBUTED_TCBINFO anymore.
get rid of TCP_DISTRIBUTED_TCBINFO, it only added confusion.
Send connects to the right processor.
Add header file to pull in the setting of the TCP_DISTRIBUTED_TCBINFO option.
Per-cpu tcbinfos aren't ready for prime time yet. The tcbinfo is assigned at tcp_attach time, but there is insufficient information available at this time to select the hash table and the wrong one gets assigned N-1 out of N times on MP systems (N = number of cpus), causing outgoing tcp connections to fail. Add an option, TCP_DISTRIBUTED_TCBINFO, so MP-safe tcbinfo distribution can continue to be developed without impacting users.
Only enter into wildcard hash table if bind succeeds.
Only enter wildcard sockets into the wildcard hash table.
Partition the TCP connection table.
Once we distribute socket protocol processing requests to different processors, we no longer have a process context to refer to, so eliminate the use of curproc in soreserve() by passing the sockbuf resource limit all the way down from the system call code to sbreserve(). Eliminate the use of curproc in unp_attach() by passing down the fields it needs from the proc structure. Define a pru_attach_info structure to hold the information the attach usrreq function requires. The thread argument to in_pcballoc() is unused, so we don't need to pass a thread structure down to in_pcballoc().
Split out wildcarded sockets from the connection hash table.
If IPv6 doesn't need old-style prototypes, maybe it's time we took them out of IPv4's code.
Register keyword removal Approved by: Matt Dillon
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread pointers instead of process pointers as arguments, similar to what FreeBSD-5 did. Note however that ultimately both APIs are going to be message-passing which means the current thread context will not be useable for creds and descriptor access.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 220.127.116.11