Up to [DragonFly] / src / sys / conf
Make ath(4) compilable into the kernel and add it to LINT. Adapted-from: FreeBSD
Add acpi_video(4) - a driver for ACPI video extensions. Obtained-from: FreeBSD with modifications
Add ACPI support module for IBM/Lenovo Thinkpad laptops. Work in progress, but already usable on most Thinkpads. Obtained-from: FreeBSD with modifications
Move acpi_toshiba.c, it's not pc32 specific.
Add some methods to ACPI to handle embedded controllers and device matching. Obtained-from: FreeBSD
Dispatch ipfw control to netisr0. To avoid a possible dangling netmsg handler, create ip_fw2_glue.c, which will be built if inet is built. IPFW_LOADED is checked again after netmsg's handler is running, since an ipfw unload netmsg may be processed before this ipfw control netmsg.
Remove some old driver remains.
Add jme(4)
Move em(4) from MD conf/files to MI conf/files
Add jmphy.c
MFC: Enable building of hammer.ko.
Make HAMMER build and work as a module and extend hammer(5)'s SYNOPSIS accordingly.
HAMMER 61D/Many: Mirroring features
* Split PFS ioctls into their own source file.
* Add additional PFS/mirroring directives: pfs-upgrade, pfs-downgrade, and finish implementing pfs-destroy. (Yes, that means you can change the master/slave mode for a PFS now).
* Consolidate some of the B-Tree deletion code.
* Fix another sync_lock deadlock.
Introduce experimental MPLS over ethernet support. Add 'options MPLS' to the kernel config file to enable it. This modification increases the footprint of each route in the FIB by 12 bytes, used to hold up to 3 label operations per route. Hints-from: Ayame, NiSTswitch implementations. Reviewed-by: dillon@, sephe@, hsu@, hasso@.
Replace the bwillwrite() subsystem to make it more fair to processes.
* Add new API functions, bwillread(), bwillwrite(), bwillinode() which the kernel calls when it intends to read, write, or make inode modifications.
* Redo the backend. Add bd_heatup() and bd_wait(). bd_heatup() heats up the buf_daemon, starting it flushing before we hit any blocking conditions (similar to the previous algorithm).
* The new bwill*() blocking functions no longer introduce escalating delays to keep the number of dirty buffers under control. Instead they take a page from HAMMER and estimate the load caused by the caller, then wait for a specific number of dirty buffers to complete their write I/O's before returning. If the buffers can be retired quickly these functions will return more quickly.
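A minimal sketch of the "wait for a specific number of dirty buffers" idea described above, written in Python rather than kernel C. The class and method names (BufTracker, reserve, write_done, wait_until) are hypothetical and are not the kernel's bd_wait() interface. How the kernel estimates the caller's load is not shown in this log, so the count is simply passed in.

```python
import threading


class BufTracker:
    """Toy model of a write-completion counter (hypothetical, not kernel code)."""

    def __init__(self):
        self.completed = 0                 # total buffer writes retired so far
        self.cond = threading.Condition()

    def reserve(self, count):
        """Return the completion count a caller adding `count` load must wait for."""
        with self.cond:
            return self.completed + count

    def write_done(self, n=1):
        """Record that n dirty buffers finished their write I/O and wake waiters."""
        with self.cond:
            self.completed += n
            self.cond.notify_all()

    def wait_until(self, target, timeout=None):
        """Block until `target` writes have completed; return False on timeout."""
        with self.cond:
            return self.cond.wait_for(lambda: self.completed >= target, timeout)
```

In this model a caller waits only for its own share of writes, so if buffers retire quickly the wait ends quickly, instead of sleeping for a fixed escalating delay.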
Fix a typo so that old netgraph builds again.
Add files and options lines for NETGRAPH7
HAMMER 59A/Many: Mirroring related work (and one bug fix).
* BUG FIX: Fix a bug in directory hashkey generation. The iterator could
sometimes conflict with a key already on-disk and interfere with a pending
deletion. The chance of this occurring was minuscule but not 0. Now fixed.
The fix also revamps the directory iterator code, moving it all to one
place and removing it from two other places.
* PRUNING CHANGE: The pruning code no longer shifts the create_tid and
delete_tid of adjacent records to fill gaps. This means that historical
queries must either use snapshot softlinks or use a fine-grained
transaction id greater than the most recent snapshot softlink.
Fine-grained historical access still works up to the first snapshot
softlink.
* Clean up the cursor code responsible for acquiring the parent node.
* Add the core mirror ioctl read/write infrastructure. This work is still
in progress.
- ioctl commands
- pseudofs enhancements, including st_dev munging.
- mount options
- transaction id and object id conflictless allocation
- initial mirror_tid recursion up the B-Tree (not finished)
- B-Tree mirror scan optimizations to skip sub-hierarchies that do not
need to be scanned (requires mirror_tid recursion to be 100% working).
HAMMER 53E/Many: Performance tuning
* Change the code which waits for reclaims to drain to be more in line with the new bwillwrite(). Impose a dynamic delay instead of blocking outright.
* Move the hammer_inode_waitreclaims() call from hammer_vop_open() to hammer_get_inode(), and only call it when we would otherwise have to create a new inode.
* Sort HAMMER's file list in conf/files.
HAMMER 43/Many: Remove records from the media format, plus other stuff * Get rid of hammer_record_ondisk. As HAMMER has evolved the need for a separate record structure has devolved into trivialities. Originally the idea was to have B-Tree nodes referencing records and data. The B-Tree elements were originally intended to be throw-away and the on-media records were originally intended to be the official representation of the data and contained additional meta-information such as the obj_id of a directory entry and a few additional fields related to the inode. But once the UNDO code went in and it became obvious that the B-Tree needed to be tracked (undo-wise) along with everything else, the need for an official representation of the record as a separate media structure essentially disappeared. Move the directory-record meta-data into the directory-entry data and move the inode-record meta-data into the inode-record data. As a single exception move the atime field to the B-Tree element itself (it replaces what used to be the record offset), in order to continue to allow atime updates to occur without requiring record rewrites. With these changes records are no longer needed at all, so remove the on-media record structure and all the related code. * The removal of the on-media record structure also greatly improves performance. * B-Tree elements are now the official on-media record. * Fix a race in the extraction of the root of the B-Tree. * Clean up the in-memory record handling API. Instead of having to construct B-Tree leaf elements we can simply embed one in the in-memory record structure (struct hammer_record), and in the inode.
Add hammer_flusher.c, to make kernels with options HAMMER build.
Add a driver for Omnikey CardMan 4040 smartcard reader - cmx(4). Obtained-from: FreeBSD
Add fairq to altq. Fairq is a fair queueing algorithm with bandwidth
prioritization and a bandwidth delimiter (hogs) to allow low bandwidth
buckets to jump the round robin. This fairq algorithm is currently unweighted
but traffic can still be classified with the global priority model. For
each queue traffic is normally round robined by taking a packet from each
bucket in turn.
This feature is primarily intended for edge routers and egress points with
bandwidth constrictions.
* Hogs feature allows low bandwidth buckets to burst. Low bandwidth can
mean, e.g. an interactive shell or even simply ack traffic, without
the need to explicitly classify it. Bandwidth is managed on a per-bucket
basis.
* Prioritization feature allows minimum guaranteed bandwidths based on
service classifications. e.g. VOIP, web, mail, PtP, etc.
* Weighted fairq not implemented (beyond using classification into priority
queues), but the circular bucket design makes it a fairly easy task if
someone wants to do it.
* Add ALTQ_MBUF_STATE_HASHED and generate a hash of the connection state
in the mbuf header for any packet that you have set 'keep state' for
in pf. This is done in PF and is needed by fairq to bucketize
'connections'.
* Add the fairq implementation and a new ALTQ_FAIRQ kernel build option.
* Simple example included below.
ports="{ 25, 80 }"
altq on vke0 fairq bandwidth 500Kb queue { normal, bulk }
queue bulk priority 1 bandwidth 100Kb \
fairq(buckets 64, hogs 25Kb) qlimit 50
queue normal priority 2 bandwidth 400Kb \
fairq(buckets 64, hogs 25Kb, default) qlimit 50
pass out on vke0 inet proto tcp from any to any \
keep state queue normal
pass out on vke0 inet proto tcp from any to any port $ports \
keep state queue bulk
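The per-queue round robin over buckets described above can be sketched as follows. This is an illustrative Python model, not the ALTQ code: the FairqSketch class, its method names, and the use of conn_hash modulo the bucket count to pick a bucket are assumptions. The hogs and priority features are not modelled.

```python
from collections import deque


class FairqSketch:
    """Toy bucketized round robin (hypothetical; hogs/priority not modelled)."""

    def __init__(self, nbuckets=64):
        self.buckets = [deque() for _ in range(nbuckets)]
        self.next_bucket = 0               # where the next scan starts

    def enqueue(self, conn_hash, packet):
        # Packets of one connection always land in the same bucket.
        self.buckets[conn_hash % len(self.buckets)].append(packet)

    def dequeue(self):
        # Take one packet from the next non-empty bucket, then move on.
        n = len(self.buckets)
        for i in range(n):
            idx = (self.next_bucket + i) % n
            if self.buckets[idx]:
                self.next_bucket = (idx + 1) % n
                return self.buckets[idx].popleft()
        return None                        # all buckets are empty
```

With three packets queued for one connection and one for another, the second connection's packet is sent after the first packet of the busy connection rather than after all three.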
DRM update to git snapshot from 2008-01-04.
Fix collision in conf/files, add hammer_signal.c.
Remove ipw(4) from LINT building and module building
HAMMER 32/many: Record holes, initial undo API, initial reblocking code * Add code to record recent 'holes' created by the blockmap allocator due to the requirement that data blocks not cross a 16K hammer buffer boundary, in order to try to fill in the gaps with smaller chunks of data when possible. Currently a hole is not added for blockmap frees. It is questionable whether it is a good idea to do it for frees or not, because it can interfere with the reblock code's attempt to completely free a big block. * Add a reblocking ioctl which scans the B-Tree and reblocks leaf nodes, records, and data in partially empty big blocks to try to free up the entire big block. Incomplete (needs to reblock internal B-Tree nodes and doesn't yet, needs a low-free-space mode which focuses on freeing a single large block). * Add the API infrastructure required to implement the undo records, and implement the initial undo code (sans ordering requirements for writes). Incomplete.
- Embed ether vlan tag in mbuf packet header. Add an mbuf flag to mark that this field is valid.
- Hide ifvlan after the above change; drivers that support hardware vlan tagging only need to check ether_vlantag in the mbuf packet header.
- Convert all drivers that support hardware vlan tagging to use the vlan tag field in the mbuf packet header.
Obtained-from: FreeBSD
Change the vlan/parent serializer releasing/holding sequences into mbuf dispatching. There are several reasons to do so:
- Avoid excessive vlan interface serializer releasing/holding
- Touching parent interface if_snd without holding parent's serializer is unsafe
- vlan's parent may disappear or be changed after vlan's serializer is released
# This dispatching could be further optimized by packing all mbufs into one
# netmsg using m_nextpkt to:
# - Amortize netmsg sending cost
# - Reduce the time that parent interface spends on serializer releasing/holding
- Add an entry for iwl(4)
- Add iwl(4) in LINT
Add an experimental driver for NICs using Silan Microelectronics' SC92301 chip, some of which seem to be known as Rsltek [sic] 8139D. This is a port of Silan's own FreeBSD 4.7 driver which was written by one 'gaoyonghong'. It's up to the point where it works with the "Noganet KN-8139D" product, but it still gives occasional errors/warnings on the console. Also, some areas need to be brought more up to date. Therefore, the if_sln.ko module is built, but the driver is not yet in GENERIC. Tested-by: Damian Vicino <dvicino@dc.uba.ar>
Clean up remains of the umsm(4) -> ugensa(4) renaming.
HAMMER 28/many: Implement zoned blockmap
* Implement a zoned blockmap. Separate B-Tree nodes, records, small blocks of data, and large blocks of data into their own zones. Use 8MB large blocks, 32-byte blockmap entry structures, and two layers to support 59 bits (512 petabytes).
* Create a temporary freeblock allocator so the blockmap can be tested. It just allocates sequentially and asserts when it hits the end of the volume. This will be replaced with a real freeblock allocator soon.
* Clean up some of the mess I created from the temporary fifo mechanism that had been put in-place to test the major rewiring in 27.
* Adjust newfs_hammer. The 'hammer' utility has not yet been adjusted (it can't decode blockmaps yet but will soon).
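One way to arrive at the 59-bit figure, assuming each of the two layers is a table of 32-byte entries that fills exactly one 8MB large block. That layer size is an assumption made for this sketch; the commit message does not spell it out. The function name is hypothetical.

```python
def blockmap_capacity(block_bytes, entry_bytes, layers):
    """Bytes addressable when each layer is one block full of entries (assumed)."""
    entries_per_block = block_bytes // entry_bytes
    return entries_per_block ** layers * block_bytes


LARGE_BLOCK = 8 * 1024 * 1024              # 8MB large block = 2**23 bytes
capacity = blockmap_capacity(LARGE_BLOCK, 32, 2)

# 2**18 entries per layer, two layers, 2**23 bytes per block: 18 + 18 + 23 = 59
print(capacity.bit_length() - 1)           # number of address bits
print(capacity // 2**50)                   # size in units of 2**50 bytes
```

Under that assumption the result is 2^59 bytes, which is 512 units of 2^50 bytes and matches the "59 bits (512 petabytes)" stated above.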
HAMMER 27/many: Major surgery - change allocation model
After getting stuck on the recovery code and highly suboptimal write performance issues, remove the super-cluster/cluster and radix tree bitmap infrastructure and replace it with a circular FIFO.
* Nothing is localized yet with this major surgery commit, which means radix nodes, hammer records, file data, and undo fifo elements are all being written to a single fifo. These elements will soon get their own abstracted fifos (and in particular, the undo elements will get a fixed-sized circular fifo and be considered temporary data).
* No sequence numbers or transaction spaces are generated yet.
* Create a 'hammer_off_t' type (64 bits). This type reserves 4 bits for a zone. Zones which encode volume numbers reserve another 8 bits, giving us a 52 bit byte offset able to represent up to 4096 TB per volume. Zones which do not encode volume numbers have 60 bits available for an abstracted offset, resulting in a maximum filesystem size of 2^60 bytes (1 MTB). Up to 15 zones can be encoded. As of this commit only 2 zones are implemented to wrap up existing functionality.
* Adjust the B-Tree to use full 64 bit hammer offsets. Have one global B-Tree for the entire filesystem. The tree is no longer per-cluster.
* Scrap the recovery and spike code. Scrap the cluster and super-cluster code. Scrap those portions of the B-Tree code that dealt with spikes. Scrap those portions of the IO subsystem that dealt with marking a cluster open or closed.
* Expand the hammer_modify_*() functions to include a data range and add UNDO record generation. Do not implement buffer ordering dependencies yet (ordering issues are going to change radically with the FIFO model).
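The 4/8/52 bit split of the 64-bit offset described above can be sketched as below. The field order (zone in the top 4 bits, volume number in the next 8, byte offset in the low 52) is an assumption made for illustration, as are the function names. The commit message gives only the field widths.

```python
ZONE_BITS = 4
VOL_BITS = 8
OFF_BITS = 64 - ZONE_BITS - VOL_BITS       # 52-bit byte offset


def encode(zone, vol, offset):
    """Pack zone, volume and offset into one 64-bit value (assumed layout)."""
    if not 0 <= zone < (1 << ZONE_BITS):
        raise ValueError("zone out of range")
    if not 0 <= vol < (1 << VOL_BITS):
        raise ValueError("volume out of range")
    if not 0 <= offset < (1 << OFF_BITS):
        raise ValueError("offset out of range")
    return (zone << (VOL_BITS + OFF_BITS)) | (vol << OFF_BITS) | offset


def decode(value):
    """Split a 64-bit value back into (zone, vol, offset)."""
    zone = value >> (VOL_BITS + OFF_BITS)
    vol = (value >> OFF_BITS) & ((1 << VOL_BITS) - 1)
    offset = value & ((1 << OFF_BITS) - 1)
    return zone, vol, offset
```

A 52-bit offset covers 2^52 bytes, which is 4096 units of 2^40 bytes and corresponds to the "4096 TB per volume" figure in the commit.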
HAMMER 25/many: Add an ioctl API for HAMMER.
* Add HAMMERIOC_PRUNE - a command which will scan a range of inode numbers and prune them according to the supplied list. This is a preliminary implementation.
* Add HAMMERIOC_GETHISTORY - a command which scans the history for a particular file or a particular file offset within a file and displays it.
Nuke the following outdated drivers:
ray(4): FH only wireless NIC driver
awi(4): 802.11 (read: not even 802.11b) and FH wireless NIC driver
gx(4): replaced by em(4) and unmaintained for quite a long time
Sync mly(4) with FreeBSD. Obtained-from: FreeBSD
Add hammer_recover.c for kernel builds w/ HAMMER.
- Split if_clone.c out of if.c, license in if.c is attached to if_clone.c
- Split if_clone.h out of if.h and if_var.h, license of if_var.h is attached to if_clone.h
- Staticize some variables and functions in if_clone.c
- if_clonereq is the only userland visible structure related to this commit; it is kept in if.h for now, so userland applications won't be aware of this commit. It will be moved to net/if_clone.h
No functional changes.
# if_clone.c is subject to change to support clone creation with
# additional parameters.
Remove the xrpu driver as FreeBSD did ~1 month ago.
Import msk(4) which supports Marvell Yukon II based NICs (both gigE and fastE) Obtained-from: FreeBSD (yongari@FreeBSD.org) Tested-by: me, swildner@, Ferruccio Zamuner <nonsolosoft@diff.org> # Hardware vlan tagging, hardware checksum offload and jumbo frame support # are still missing as of this commit.
HAMMER 3/many - more core infrastructure. * Add an in-memory B-Tree node abstraction * Add an in-memory record abstraction. * Put the B-Tree cursor code in its own source file. * Fill in more of the VOP code. * Do a major clean-up of all in-memory structures and some on-disk structures. All the major in-memory structures now use similarly named functions. * Move inter-cluster link from a B-Tree leaf node to a B-Tree internal node, giving us a left and right boundary to play with. This simplifies the algorithms by quite a bit. * Allow the B-Tree to be unbalanced by moving the sub-type from the B-Tree node header to the B-Tree element structure. * Revamp the I/O infrastructure, in particular allow B-Tree nodes to be held passively. * Implement a flexible B-Tree node cache. References into the B-Tree can be cached by inodes. If the related buffer is flushed by the system, the related cache pointers will be cleared.
Rewrite of the CAM error recovery code. Some of the major changes include: - The SCSI error handling portion of cam_periph_error() has been broken out into a number of subfunctions to better modularize the code that handles the hierarchy of SCSI errors. As a result, the code is now much easier to read. - String handling and error printing has been significantly revamped. We now use sbufs to do string formatting instead of using printfs (for the kernel) and snprintf/strncat (for userland) as before. There is a new catchall error printing routine, cam_error_print() and its string-based counterpart, cam_error_string() that allow the kernel and userland applications to pass in a CCB and have errors printed out properly, whether or not they're SCSI errors. Among other things, this helped eliminate a fair amount of duplicate code in camcontrol. We now print out more information than before, including the CAM status and SCSI status and the error recovery action taken to remedy the problem. Obtained-from: FreeBSD
CPU localize dummynet(4) step 1/2
CPU ip_dn_cpu CPU n1
+--------------------------+ +---------------------+
| netisr | | |
| | | | |
| +<---------------dn_descX----[ip_fw_dn_io_ptr()] |
| | | | |
| [ip_dn_io_ptr(dn_descX)] | | |
| | | | |
| | | | |
| | | | |
| [transmit_event() begin | | |
| +----------------dn_descY------>[ip_output()] |
| : | | |
| : | | |
| : | +---------------------+
| : |
| : |
| : | CPU n2
| : | +---------------------+
| : | | |
| +----------------dn_descZ------>[ip_input()] |
| : | | |
| transmit_event() end] | +---------------------+
| | |
+--------------------------+
NOTE: transmit_event() is triggered by dummynet systimer on CPU ip_dn_cpu
- Add flow id field, which is packet filter independent, in dummynet
descriptor, so that we can record the flow id related information on the
originator's stack. In this way, dummynet descriptor and its associated
mbuf could be dispatched to different thread for further processing.
- Add packet filter private data and private data unreference function
pointer in dummynet descriptor.
- The whole dummynet descriptor is allocated and filled by the packet filter
(only ipfw(4) currently), so things like the route entry reference are
updated on the CPU to which they belong.
- All packets are dispatched to netisr on CPU ip_dn_cpu to be queued on the
target flow queue. Netisr on CPU ip_dn_cpu is also where various dummynet
events are processed.
- DUMMYNET_LOADED is not checked before dispatching a packet; it is checked
in netisr before the packet is handed to dummynet. This paves the way for
step 2/2.
- ip_{output,input}/ether_{demux,output_frame} are no longer called directly
in dummynet; they are called after the packet is dispatched back to the originator
CPU, so that ip_input() will be called on the same CPU (as determined by
ip_mport()) and things like route entry reference will be updated on the
CPU to which it belongs.
- If the packet is to be dispatched back to ip_output(), the recorded route
entry is checked to make sure that it is still up.
- Dummynet descriptor and its associated mbuf are freed on their originator CPU.
- Reference count the ipfw(4) rule if it is going to be associated with a
dummynet descriptor, so we would not have a dangling rule pointer if the
rule was deleted when the dummynet descriptor was in transit state.
Suggested-by: dillon@
- If ipfw(4) is compiled and loaded as module, reference count the ipfw(4)
module usage, if a rule is associated with a dummynet descriptor.
- Add net/dummynet/ip_dummynet_glue.c, which contains various netisr dispatch
functions. This file will be compiled into kernel if 'options INET' is set,
so that we will not have a dangling function pointer in a dummynet
descriptor that is in transit.
- Add DUMMYNET_MBUF_TAGGED mbuf fw_flag, which may be used later.
- Nuke dummynet's dependency on ipfw(4).
Some aac(4) cleanup:
* Activate aac_debug.c. Fix AAC_DEBUG and add it to LINT.
* Staticise stuff which is used only locally.
* Remove unused functions.
* Constify an argument of aac_print_fib().
Fix LINT build. Reported-by: swildner
Add uchcom(4) - the driver for WinChipHead CH341/CH340 chips. Obtained-from: NetBSD
Add moscom(4) - the driver for MosChip Semiconductor MCS7703 USB to serial chips. Obtained-from: OpenBSD
HAMMER part 2/many. * Implement most of the I/O infrastructure and internal HAMMER tracking structures for volumes, super-clusters, clusters, and buffers. * Flesh out the B-Tree code and add an iterator. * Implement a good chunk of the vnops, but no modifying operations yet. * Implement passive filesystem buffer tracking which allows a struct buf to remain associated with internal HAMMER data structures and also provides a reverse path whereby the filesystem buffer cache drives garbage collection of internal HAMMER data structures. Use the augmented bio_ops to facilitate this. * Skeleton for transactions, spikes, and object allocation & management.
Break-out the standard UNIX uid/gid tests for VOP_ACCESS into a helper file. The code was basically taken from UFS and the helper file retains the appropriate copyright. This way HAMMER can call the procedure without us needing to add the University copyright to HAMMER sources.
Add a HAMMER kernel build option, add a VFS type for HAMMER, add a file type for key-access (database) files.
Add et(4), which supports Agere ET1310 based Ethernet chips (PCIe only)
This chip supports two RX rings: one is currently used for packets whose size is smaller than 110 bytes, the other is used for all other packet sizes.
Its RX interrupt moderation is quite similar to what bge(4) does: there are two control variables, one controls how many packets should be received, the other controls the RX interrupt delay. RX interrupt moderation is achieved through the interaction of these two variables.
Its TX interrupt moderation is more straightforward than RX's ;), you can tell the hardware which TX segment should trigger an interrupt.
It also has a hardware timer, which is set to 1Hz currently to prevent if_watchdog() from (mis)firing.
I didn't figure out how to add polling(4) support for this chip; its RX state ring simply stops working if interrupts are disabled. However, its hardware timer may be used to mimic polling(4) support.
The missing features of the driver as of this commit:
- Hardware checksum
- Hardware vlan tagging
- Jumbo buffer support
Hopefully, they will be added later.
Add TruePHY (will any vendors name their PHY as FalsePHY one day?) support into miibus(4) for Agere ET1011C PHY, which is used by et(4). The data sheet says model is 1 for ET1011C, while my testing card's model is 4; it may be just a variant.
lm(4) and it(4) drivers for hardware sensors used in many motherboards. Ported from OpenBSD to FreeBSD by Constantine A. Murenin <mureninc at gmail.com>. Obtained-from: OpenBSD via FreeBSD GSoC 2007 project
Coretemp(4) driver for Intel Core on-die digital thermal sensor with patch from Constantine A. Murenin <mureninc at gmail.com> to make it use hw.sensors framework. Obtained-from: FreeBSD with modifications from Constantine A. Murenin
Hardware sensors framework originally developed in OpenBSD and ported to FreeBSD by Constantine A. Murenin <mureninc at gmail.com>. Obtained-from: OpenBSD via FreeBSD GSoC 2007 project
Update the agp(4) code to the latest one from FreeBSD HEAD. This brings in a lot of newer hardware support. Obtained-from: FreeBSD
Switch ipfw from ipfw1 to ipfw2. Approved-by: dillon@ Submitted-by: Gary Allan <dragonfly@gallan.plus.com> (w/ modification)
GC even more remains of the recent old driver removal.
Fix LINT: Remove cm(4) and fla(4) remains.
Nuke FDDI support.
Nuke token ring support. This also means one blob less in DragonFly.
Nuke ARCnet support.
Add umsm(4) driver for EVDO and UMTS modems with Qualcomm MSM chipsets. Obtained-from: OpenBSD
Bring CARP into the tree. CARP = Common Address Redundancy Protocol, which allows an IP address to hot switch to backup machine(s) when the master goes offline. Submitted-by: Baptiste Ritter <baptiste.ritter@ulp.u-strasbg.fr>, Jonathan, and Nicolas Testing-by: Thomas Nikolajsen, Gergo Szakal Obtained-from: OpenBSD, NetBSD, and FreeBSD
Fix uslcom path.
Add uark(4) driver which supports Arkmicro Technologies ARK3116 chip found in some USB to serial adapters. Obtained-from: OpenBSD
Add uslcom(4) driver which provides support for USB devices based on Silicon Laboratories CP120x USB-UART bridges. Obtained-from: OpenBSD
Part 1/many USERFS support. Bring in the initial userfs infrastructure. Add syslink-based mount support. Most of the VOPs are still just dummy wrappers with this commit. USERFS is not yet linked into the build.
Add an ISA attachment to the aic7xxx driver to handle 284X controllers. This was sleeping in my tree and was somehow forgotten earlier.
Synchronize all changes made in HEAD to date with the 1.10 release branch.
* usbdevs update
* header file fixes
* vinum root
* vinum device I/O fixes
* MD fixes
* New PCI ids for netif rum and ural
* New USB uplcom ids
* linux exec memory leak
* devclass ordering fixes (sound devices)
* rate-limited kprintf support (filesystem full console spams)
* msdosfs fixes
* Manual page work
Add infrastructure to locate a disk device by name by scanning the disk list. Note: This doesn't work with the 'vn' device yet but it does work with ccd. Add a VFS with a dummy mount which is capable of synthesizing vnodes for devices. Add infrastructure that allows easy opening and closing of a device-backed vnode.
Update cardbus/pccard support. The original patch was done by joerg@; I seemed to "maintain" it for quite a long time :P Obtained-from: FreeBSD Tested-by: many (intermittently tho)
Repo-copy numerous files from sys/emulation/posix4 to sys/sys and sys/kern and adjust the build to suit. posix scheduling is here to stay. Submitted-by: Joe Talbott <josepht@cstone.net>
Move all the code related to handling the current 32 bit disklabel to subr_disklabel32.c. Move the header file from sys/disklabel.h to sys/disklabel32.h. Rename all the related structures and constants and retire 'struct disklabel'.
Redo the sys/disklabel.h header file to implement a generic disklabel abstraction. Modify kern/subr_diskslice.c to use this abstraction, with some shims for the ops dispatch at the moment which will be cleaned up later.
Adjust all auxiliary code that directly accesses 32 bit disklabels to use the new structure and constant names.
Remove the snoop-adjust code. The kernel would snoop reads and writes to the disklabel area via the raw slice device (e.g. ad0s1) and convert the disklabel from the in-core format to the on-disk format and vice versa. The reads and writes made by disklabel -r and the kernel's own internal readdisklabel and writedisklabel code used the snooping.
Rearrange the kernel's internal code to manually convert the disklabel when reading and writing. Rearrange the /sbin/disklabel program to do the same when the -r option is used. Have the disklabel program also check which DragonFly OS it is running under so it can be run on older systems. Note that the disklabel binary prior to these changes will NOT operate on the disklabel properly if running on a NEW kernel.
Introduce skeleton files for 64 bit disklabel support.
* Add a missing KMODDEP to ng_eiface and hook it into the build. [*] * Add a ng_eiface(4) manual page from FreeBSD-4 [*] and add a reference to it in netgraph(4). * Add a NETGRAPH_EIFACE kernel config option. * Sync libnetgraph with our node types. [*] Submitted-by: Nuno-Antunes <nuno.antunes@gmail.com>
Bring in the latest sound changes from RELENG_6. Obtained-from: FreeBSD
Import the kernel GPT and UUID header files from FreeBSD, and bring in kern_uuid.c from FreeBSD.
Add support for Broadcom NetXtreme II GigE. Jumbo buffer support is missing currently, which will be added later. Thank David Christensen <davidch@broadcom.com> for sending us two sample NICs. Thank dillon@ for providing a blazing fast machine and environment to test the driver. Also thank Walter <wa1ter@myrealbox.com> very much, who contacted Broadcom for me :) Obtained-from: FreeBSD (w/ modification)
Implement SYSREF - structural reference counting, allocation, and sysid management subsystem. * Structural reference count management, including creation and termination sequencing (e.g. where the structure might be temporarily referenced during termination). * Allocation. It uses an objcache backend for optimal allocation, deallocation, and memory recovery. * Sysid assignment and red-black tree indexing. It does this in the objcache CTOR and DTOR so it costs us absolutely nothing in the resource allocation / deallocation critical path. sysids will be reused unless they are externally accessed.
Move syslink_desc to sys/syslink_rpc.h so kernel code does not need to #include sys/syslink.h. Add a kernel config option 'SYSLINK' to build with kern_syslink.c, so it can be worked on (read: broken) without interfering with other developer's kernel builds. Add a shims file for the syslink() system call for kernels not built with kern_syslink.c. The shims file can be used generally for this purpose.
Give the sockbuf structure its own header file and supporting source file. Move all sockbuf-specific functions from kern/uipc_socket2.c into the new kern/uipc_sockbuf.c and move all the sockbuf-specific structures from sys/socketvar.h to sys/sockbuf.h.
Change the sockbuf structure to only contain those fields required to properly manage a chain of mbufs. Create a signalsockbuf structure to hold the remaining fields (e.g. selinfo, mbmax, etc). Change the so_rcv and so_snd structures in the struct socket from a sockbuf to a signalsockbuf.
Remove the recently added sorecv_direct structure which was being used to provide a direct mbuf path to consumers for socket I/O. Use the newly revamped sockbuf base structure instead. This gives mbuf consumers direct access to the sockbuf API functions for use outside of a struct socket. This will also allow new API functions to be added to the sockbuf interface to ease the job of parsing data out of chained mbufs.
Add subr_alist.c. This is a bitmap allocator that works very similarly to subr_blist.c (swap allocator), but with added considerations. 1. All allocations must be in powers of 2. 2. All allocations will be aligned to the allocation size. 3. No allocation size limit (blist was limited to 32 blocks per allocation) Like the blist allocator, the alist is arranged in a linear array suitable for direct mapping onto a storage medium. A dataspace of 2^31-1 blocks may be represented. Approximately 3 bits of kernel memory is used per block. This allocator will be used by HAMMER and ANVIL (filesystem and filesystem storage manager), and by syslink route nodes to chop out individual addresses and subnets. We may also use this allocator to improve the allocation of physical memory.
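The first two constraints listed above (power-of-2 sizes, alignment to the allocation size) can be illustrated with a flat bitmap. This is a sketch only: the AlistSketch class and its methods are hypothetical, and it uses a plain linear scan rather than whatever layout subr_alist.c actually uses.

```python
class AlistSketch:
    """Toy power-of-2, size-aligned block allocator over a flat bitmap."""

    def __init__(self, nblocks):
        self.used = [False] * nblocks

    def alloc(self, count):
        """Return the first free block index aligned to `count`, or None."""
        if count <= 0 or count & (count - 1):
            raise ValueError("count must be a power of 2")
        # Only consider start positions that are multiples of count.
        for start in range(0, len(self.used) - count + 1, count):
            if not any(self.used[start:start + count]):
                self.used[start:start + count] = [True] * count
                return start
        return None

    def free(self, start, count):
        """Mark `count` blocks starting at `start` as free again."""
        self.used[start:start + count] = [False] * count
```

Because every allocation starts at a multiple of its own size, a request for 4 blocks made after a 1-block allocation at index 0 lands at index 4, not index 1.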
Remove the hostcache code which has been inactive since 1998.
Nuke old TX rate control algorithm coming with ral(4).
- Define 802.11 modulation types as 'enum ieee80211_modtype'.
- Expose ieee80211_rate2modtype() for public use.
- Add definition for DIFS, slot time and contention window.
- Add an additional field in TX rate control state structure, so drivers can
give hints to TX rate control algorithms about their capabilities.
- Add Sample TX rate control support:
http://www.pdos.lcs.mit.edu/papers/jbicket-ms.pdf
It is factored out and adapted from the one in ath(4).
- In ieee80211_ratectl.h, expose only IEEE80211_RATECTL_{ONOE,AMRR,SAMPLE}
for user space program.
- Teach ifconfig(8) to show and set Sample TX rate control algorithm.
- Fix a node leakage on rt2560_tx_mgt() error handling path.
- Support Onoe and Sample TX rate control algorithm in 2560 part of
ral(4), and use Sample TX rate control algorithm as the default TX
rate control algorithm. [*]
- Make ral(4) depend on wlan_ratectl_{onoe,sample}.
- Hook Sample TX rate control algorithm into GENERIC and LINT.
# [*]
# If Sample TX rate control algorithm is used, I get almost 100~200%
# UDP_STREAM netperf TX performance boost over the original TX rate
# control algorithm in open/noisy environments, and +200~500Kbits/s
# UDP_STREAM netperf TX performance boost under good conditions.
Remove ancient SimOS support.
Change kinfo_proc interface between kernel and userland. Before, we were embedding a struct proc (among others) into struct kinfo_proc. Every time we change implementation details in the kernel, userland has to be adapted (recompiled). In preparation for the coming LWP changes this interface has been reworked.
Now kinfo_proc is a structure which does not depend on other structures in the kernel which are subject to change. Instead, the routines fill_kinfo_proc and fill_kinfo_lwp copy all values which are of interest between the kernel structure and the stable kinfo_proc structure.
Furthermore, this change adds infrastructure to export LWP-specific data. If userland requests LWP data, it sets the flag KERN_PROC_FLAG_LWP in the sysctl oid. This leads to multiple kinfo_procs being exported. If not set, the first LWP will be used. This is how FreeBSD does it, and it seems easy and simple. Note that userland was not yet adjusted to actually request LWPs and aggregate this information if necessary. Besides, the kernel does not yet have more than one LWP per process anyway.
This introduces a new file, kern/kern_kinfo.c, which is shared between kernel and libkvm. This was done to avoid and remove code duplication. Now kvm_getprocs constructs a complete struct proc, including pointers, and then calls fill_kinfo_proc to do its job.
In-collaboration-with: Thomas E. Spanjaard <tgen@netphreax.net>
Update ACPI build wrappers to use new ACPI-CA code. * many fixes in ACPI-CA code (see changes.txt for detail) * enable interpreter slack code relaxed checking on AML code to get fewer warnings * use OS implementation of spinlock and cache object: OSL cache code by: Jeffrey Hsu <hsu@dragonflybsd.org> Fix to semaphore and locking code: Simon 'corecode' Schubert <corecode@fs.ei.tum.de> * added a few debugging knobs(on make command line): ACPI_DEBUG_LOCKS=yes to activate debugging code for AcpiOs*Lock() ACPI_DEBUG_MEMMAP=yes to activate debugging code for AcpiOs*MapMemory()
* Sync with FreeBSD-RELENG_6. * Add some devices to LINT. * Do some cleanup in sys/conf/files. OK-by: corecode
Say hello to a sound system update from FreeBSD. This includes the long awaited Intel High Definition Audio (HDA) a.k.a. Azalia support. The generic sound support module has been renamed to sound.ko and the "everything included" module is called snd_driver.ko now. Apart from that, everything should continue working as normal, just better.
Repo copy machine/pc32/i386/mem.c to kern/kern_memio.c and separate out the (few) machine-dependent parts. This file primarily controls access to /dev/zero, /dev/null, /dev/random, and kernel memory, and does not belong in a machine-dependent directory.
Make umct compilable into the kernel and add it to LINT.
- Hook rum(4) and ural(4) into GENERIC and LINT - Hook rum(4) and ural(4) into module building - Enable wlan_ratectl_onoe in GENERIC. It is required by rum(4) and ural(4) - Add a commented-out entry in GENERIC for wlan_ratectl_amrr - Enable rtw(4) in GENERIC again
Initial import of the port of the new(er) FreeBSD ATA code.
Note this code has not yet been hooked into the build as such, unless you (unwisely) specify the devices in your kernel config according to sys/conf/files. The modules are also excluded from the module build due to not having a SUBDIR entry in sys/dev/disk/Makefile. The PCI code isn't yet operational, pending a patch for sys/bus/pci/pci.c that I will send to kernel@ shortly. It short-circuits lazy resource allocation for PCI ATA controllers in legacy mode (i.e. on legacy ISA ATA addresses, which are not configured in the PCI BARs).
The userland utility used to control nata ('natacontrol') and documentation will follow later. Also, be aware only nata, natapci, natadisk and natapicd have seen testing on real hardware so far. nataraid, natausb and natacam are probably not compilable yet, I need to clean those up.
- Add stge(4) for Sundance/Tamarack TC9021 Gigabit Ethernet chip.
It supports following cards:
o Antares Microsystems Gigabit Ethernet
o ASUS NX1101 Gigabit Ethernet
o D-Link DGE-550T Gigabit Ethernet
o IC Plus IP1000A Gigabit Ethernet
o Sundance ST-2021 Gigabit Ethernet
o Sundance ST-2023 Gigabit Ethernet
o Sundance TC9021 Gigabit Ethernet
o Tamarack TC9021 Gigabit Ethernet
- Add PHY module for IC Plus IP1000A integrated PHY, which may be used
by some on-board stge(4)
- Hook stge(4) into GENERIC and LINT
Obtained-from: FreeBSD (yongari@freebsd.org)
MFC 1.142: null_subr.c doesn't exist anymore.
nullfs_subr doesn't exist anymore.
Move the code that eats certain PNP IDs into a ISA bus-specific file.
Add skeleton procedures for the vmspace_*() series of system calls which will be used by virtual kernels to implement processes.
- Port rtw(4) from NetBSD, which supports various RealTek 8180 chip based wireless NIC. - Put NetBSD 802.11 duration related structures and functions in rtw.c and rtwvar.h, and rename them to rtw_xxxx. - Fix various ieee80211_node leakages in TX path. - Use spare RX DMA map to recover from bus_dmamap_load_mbuf() failure. - Utilize TX rate control algorithm framework in our 802.11 layer, support Onoe TX rate control algorithm. - Hook rtw(4) into module building. - Hook rtw(4) into GENERIC and LINT. Thank David Young and many other people for their work on this driver. Tested with a Linksys WPC11 ver.4
Implement a generic TX rate control algorithm framework in 802.11 layer. It is highly modularized so TX rate control algorithms can be added with ease. Only limited interfaces are exported for driver to use, so most of the WiFi drivers can be converted without too much trouble. It does not affect WiFi drivers which are unaware of the new framework yet. Also, the new framework allows TX rate control algorithm to be changed without touching the 802.11 state machine or reinitializing WiFi devices. Two TX rate control algorithms are factored out from ath(4) driver: 1) Onoe TX rate control algorithm, which is suitable for almost any kind of WiFi NIC driver, especially for 11b devices. (*) 2) AMRR TX rate control algorithm, which should _only_ be used by the WiFi NIC which supports multi-rate retry. More information of this TX rate control algorithm is available at: http://www-sop.inria.fr/rapports/sophia/RR-5208.html In order to use the framework, individual WiFi driver needs to do the following: 1) Tell the framework, which TX rate control algorithms it supports and which one to be used as the default, by setting up ieee80211com.ic_ratectl. 2) Call ieee80211_ratectl_newstate() in driver's own newstate() function. 3) When setting up hardware TX descriptors, which normally contain TX rate related fields, instead of accessing ieee80211_node.ni_txrate directly, call ieee80211_ratectl_findrate() to get a rate set from the framework. 4) When TX completes, feed TX state (e.g. failure, number of retries) to the framework by calling ieee80211_ratectl_tx_complete(). Teach ifconfig(8) to print and set the TX rate control algorithm. # (*) There is no formal paper for this algorithm, but the following two papers # briefly introduce this TX rate control algorithm: # http://www-sop.inria.fr/rapports/sophia/RR-5208.html # http://www.pdos.lcs.mit.edu/papers/jbicket-ms.pdf
Add an entry for nfe(4)
Bring in the initial cut of the Cache Coherency Management System module.
Add a sysctl kern.ccms_enable for testing. CCMS operations are disabled by
default.
The comment below describes the whole enchilada. Only basic locking has
been implemented in this commit.
CCMS is a dual-purpose cache management layer based around offset ranges.
#1 - Threads on the local machine can obtain shared, exclusive, and modifying
range locks. These work kinda like lockf locks and the kernel will use
them to enforce UNIX I/O atomicity rules.
#2 - The BUF/BIO/VM system can manage the cache coherency state for offset
ranges. That is, Modified/Exclusive/Shared/Invalid (and two more
advanced states).
These cache states do not represent the state of data we have cached.
Instead they represent the best case state of data we are allowed
to cache within the range.
The cache state for a single machine (i.e. no cluster), for every
CCMS data set, would simply be 'Exclusive' or 'Modified' for the
entire 64 bit offset range.
The way this works in general is that the locking layer is used to enforce
UNIX I/O atomicity rules locally and to generally control access on the local
machine. The cache coherency layer would maintain the cache state for
the object's entire offset range. The local locking layer would be used
to prevent demotion of the underlying cache state, and modifications to the
cache state might have the side effect of communicating with other machines
in the cluster.
Take a typical write(). The offset range in the file would first be locked,
then the underlying cache coherency state would be upgraded to Modified.
If the underlying cache state is not compatible with the desired cache
state then communication might occur with other nodes in the cluster in
order to gain exclusive access to the cache elements in question so they
can be upgraded to the desired state. Once upgraded, the range lock
prevents downgrading until the operation completes. This of course can
result in a deadlock between machines and deadlocks would have to be dealt
with.
Likewise, if a remote machine needs to upgrade its representation of
the cache state for a particular file it might have to communicate with
us in order to downgrade our cache state. If a remote machine
needs an offset range to be Shared then we have to downgrade our
cache state for that range to Shared or Invalid. This might have side
effects on us such as causing any dirty buffers or VM pages to be flushed
to disk. If the remote machine needs to upgrade its cache state to
Exclusive then we have to downgrade ours to Invalid, resulting in a
flush and discard of the related buffers and VM pages.
Both range locks and range-based cache state are stored using a common
structure called a CST, in a red-black tree. All operations are
approximately N*LOG(N). CCMS uses a far superior algorithm to the one
that the POSIX locking code (lockf) has to use.
It is important to note that layer #2 cache state is fairly persistent
while layer #1 locks tend to be ephemeral. To prevent too much
fragmentation of the data space the cache state for adjacent elements
may have to be actively merged (either upgraded or downgraded to match).
The buffer cache and VM page caches are naturally fragmentary, but we
really do not want the CCMS representation to be too fragmented. This
also gives us the opportunity to predispose our CCMS cache state so
I/O operations done on the local machine are not likely to require
communication with other hosts in the cluster. The cache state as
stored in CCMS is a superset of the actual buffers and VM pages cached
on the local machine.
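The layer #1 range locks described above can be illustrated with a minimal sketch of the conflict test. Assumptions: all names (demo_range_lock and friends) are hypothetical, only shared and exclusive locks are modeled (the modifying type is left out), and a plain array scan stands in for the red-black tree of CSTs that the commit actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_lock_type { DEMO_SHARED, DEMO_EXCLUSIVE };

/* Hypothetical range lock over an inclusive byte offset range. */
struct demo_range_lock {
	uint64_t		beg;	/* first offset covered */
	uint64_t		end;	/* last offset covered (inclusive) */
	enum demo_lock_type	type;
};

/* Two inclusive ranges overlap if each begins before the other ends. */
static bool
demo_ranges_overlap(const struct demo_range_lock *a,
    const struct demo_range_lock *b)
{
	return (a->beg <= b->end && b->beg <= a->end);
}

/*
 * Return true if the request conflicts with any held lock.  Overlapping
 * shared locks are compatible; any overlap involving an exclusive lock
 * is a conflict.
 */
static bool
demo_lock_conflicts(const struct demo_range_lock *held, size_t nheld,
    const struct demo_range_lock *req)
{
	assert(req->beg <= req->end);
	for (size_t i = 0; i < nheld; i++) {
		if (!demo_ranges_overlap(&held[i], req))
			continue;
		if (held[i].type == DEMO_EXCLUSIVE ||
		    req->type == DEMO_EXCLUSIVE)
			return (true);
	}
	return (false);
}
```

A write() in this model would request an exclusive lock on its offset range and block (or retry) while demo_lock_conflicts() returns true.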
Remove the coda fs. It hasn't worked in a long time.
Add structures and skeleton code for a new system call called syslink() which will support the kernel syslink API. This is the link protocol that will be used for user<->kernel (e.g. user VFS) and kernel<->kernel (cluster) communications. Syslink-based protocols will be used for DEV, VFS, CCMS, and other cluster-related operations.
Sync MII support with NetBSD/OpenBSD:
- Standard conforming GMII support:
1) replace mii_media_add() with mii_phy_add_media().
2) ukphy has generic GMII support now, thus retire nvphy.
- Factor common code of PHY modules out into mii_physubr.c, notably
mii_phy_{set_media, tick, update}().
In order to support this refactoring:
1) mii_softc.{mii_reset,mii_status} function pointers are added, which are
used to reset PHY modules and get PHY modules' status.
2) mii_softc.mii_anegticks is added, which is used by PHY modules to tell
mii_phy_tick() how often auto-negotiation should happen. Two commonly
used values are defined as MII_ANEGTICKS and MII_ANEGTICKS_GIGE.
mii_softc.mii_anegticks is set to MII_ANEGTICKS by default.
- Add mii_softc.mii_media_status and rename mii_softc.mii_active to
mii_softc.mii_media_active. Now changes in either one of them will cause
MIIBUS_STATCHG() to be invoked.
- For PHY modules that utilize mii_phy_add_media(), ifmedia_entry.ifm_data
no longer stores value of BMCR. It stores an index of mii_media_table[],
which stores BMCR, ANAR and GTCR.
- Replace slightly different PHY modules detach routines with ukphy_detach().
- Use OUI and MODEL id array + mii_phy_match() in PHY modules probe routines,
instead of original large `if, else if' or `switch' code segment.
- Support more OUIs and MODELs in individual PHY module.
- Make the usage of `mii' and `sc' stack variable more consistent. `mii'
refers to miibus softc, while `sc' refers to PHY module softc.
- Nuke no longer used functions' definition and declaration.
- Regen miidevs.h
Following PHY modules were tested:
acphy(dc), brgphy(bge), e1000phy(nv,sk), exphy(xl), inphy(fxp), rgephy(re),
rlphy(rl), ruephy(rue), ukphy(nv,vr,...)
MII generic code is mainly synced with NetBSD.
Individual PHY modules are mainly synced with OpenBSD.
Tested-by: swildner, corecode
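The table-driven probe mentioned above (an OUI and MODEL id array in place of a large `if, else if' chain) might look roughly like this. It is a sketch only: struct demo_phydesc, demo_phys[] and demo_phy_match() are hypothetical stand-ins, the ids are placeholders, and the real mii_phy_match() takes different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PHY descriptor; a NULL name terminates the table. */
struct demo_phydesc {
	uint32_t	oui;
	uint32_t	model;
	const char	*name;
};

/* Placeholder ids, not real vendor OUIs or models. */
static const struct demo_phydesc demo_phys[] = {
	{ 0x00aaaa, 0x01, "Demo 10/100 PHY" },
	{ 0x00aaaa, 0x02, "Demo gigabit PHY" },
	{ 0x00bbbb, 0x10, "Other vendor PHY" },
	{ 0, 0, NULL }
};

/* Walk the table and return the matching entry, or NULL if none matches. */
static const struct demo_phydesc *
demo_phy_match(const struct demo_phydesc *tbl, uint32_t oui, uint32_t model)
{
	assert(tbl != NULL);
	for (; tbl->name != NULL; tbl++) {
		if (tbl->oui == oui && tbl->model == model)
			return (tbl);
	}
	return (NULL);
}
```

Supporting another PHY then means adding one table row instead of another branch in the probe routine.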
MASSIVE reorganization of the device operations vector. Change cdevsw to dev_ops. dev_ops is a syslink-compatible operations vector structure similar to the vop_ops structure used by vnodes. Remove a huge number of instances where a thread pointer is still being passed as an argument to various device ops and other related routines. The device OPEN and IOCTL calls now take a ucred instead of a thread pointer, and the CLOSE call no longer takes a thread pointer.
- Replace lnc(4) driver with NetBSD's le(4), which gives us better performance, especially for VMWare users. - Sync lnc(4) manpage. Submitted-by: Bill Marquette <bill.marquette@gmail.com> Manpage-reviewed-and-adjusted-by: swildner
Remove OLDBRIDGE
Replace the random number generator with an IBAA generator for /dev/random and a L15 generator for /dev/urandom. Submitted-by: Robin J Carey Modified-from-original: Syntax adjusted to DragonFly kernel norms, and I integrated the original code into the DragonFly kernel random number generator API. Also added a TSC-based seed on top of nanouptime.
Remove the asynchronous system call interface sendsys/waitsys. It was an idea before its time.
Remove LWKT reader-writer locks (kern/lwkt_rwlock.c). Remove lwkt_wait queues (only RW locks used them). Convert remaining uses of RW locks to LOCKMGR locks. In recent months lockmgr locks have been simplified to the point where we no longer need a lighter-weight fully blocking lock. The removal also simplifies lwkt_schedule() in that it no longer needs a special case to deal with wait lists.
Further isolate the user process scheduler data by moving more variables from the globaldata structure to the scheduler module(s). Make the user process scheduler MP safe. Make the LWKT 'pull thread' (to a different cpu) feature MP safe. Streamline the user process scheduler API. Do a near complete rewrite of the BSD4 scheduler. Remote reschedules (reschedules to other cpus), cpu pickup of queued processes, and locality of reference handling should make the new BSD4 scheduler a lot more responsive. Add a demonstration user process scheduler called 'dummy' (kern/usched_dummy.c). Add a kenv variable 'kern.user_scheduler' that can be set to the desired scheduler on boot (i.e. 'bsd4' or 'dummy'). NOTE: Until more of the system is taken out from under the MP lock, these changes actually slow things down slightly. Buildworlds are about ~2.7% slower.
Move all the resource limit handling code into a new file, kern/kern_plimit.c. Add spinlocks for access, and mark getrlimit and setrlimit as being MPSAFE. Document how LWPs will have to be handled - basically we will have to unshare the resource structure once we start allowing multiple LWPs per process, but we can otherwise leave it in the proc structure.
Mop up remains of the ibcs2/streams/svr4 removal: * Remove streams(4) and svr4(4) manual pages. * Add associated modules and their manual pages to the list of files to be removed upon 'make upgrade'. * Remove IBCS2 and SPX_HACK options. * Change M_ZOMBIE definition back to static. * Fix miscellaneous references & comments.
- Add ral(4) for Ralink RT2500/RT2501/RT2600 chip based wireless NIC - Add ral(4) to GENERIC and LINT - Add man page for ral(4) Reviewed-by: swildner Thank Damien Bergamini for his work on this driver For RT2500: - Fix a ieee80211_node leakage - Due to the inter-dependency nature of DONE/(ENCRYPT|DECRYPT) intr, reap desc rings twice if one of them comes. This change gives me ~17.6% TX performance boost on my ASUS WL-107G (WPA is used here): Original way of TX/RX intr processing ------------------------------------------------------------ Client connecting to sephe-test, TCP port 5001 TCP window size: 32.5 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.2.14 port 1063 connected with 192.168.2.254 port 5001 [ 3] 0.0- 5.0 sec 10.2 MBytes 17.1 Mbits/sec [ 3] 5.0-10.0 sec 9.95 MBytes 16.7 Mbits/sec [ 3] 10.0-15.0 sec 9.67 MBytes 16.2 Mbits/sec [ 3] 15.0-20.0 sec 10.1 MBytes 17.0 Mbits/sec [ 3] 20.0-25.0 sec 10.2 MBytes 17.1 Mbits/sec [ 3] 25.0-30.0 sec 10.0 MBytes 16.8 Mbits/sec [ 3] 30.0-35.0 sec 9.91 MBytes 16.6 Mbits/sec [ 3] 35.0-40.0 sec 10.3 MBytes 17.2 Mbits/sec [ 3] 40.0-45.0 sec 9.87 MBytes 16.6 Mbits/sec [ 3] 45.0-50.0 sec 9.94 MBytes 16.7 Mbits/sec [ 3] 50.0-55.0 sec 10.2 MBytes 17.2 Mbits/sec [ 3] 55.0-60.0 sec 9.73 MBytes 16.3 Mbits/sec [ 3] 0.0-60.0 sec 120 MBytes 16.8 Mbits/sec Adapted way of TX/RX intr processing ------------------------------------------------------------ Client connecting to sephe-test, TCP port 5001 TCP window size: 32.5 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.2.14 port 1062 connected with 192.168.2.254 port 5001 [ 3] 0.0- 5.0 sec 11.8 MBytes 19.8 Mbits/sec [ 3] 5.0-10.0 sec 11.5 MBytes 19.4 Mbits/sec [ 3] 10.0-15.0 sec 11.1 MBytes 18.7 Mbits/sec [ 3] 15.0-20.0 sec 12.0 MBytes 20.1 Mbits/sec [ 3] 20.0-25.0 sec 12.6 MBytes 21.2 Mbits/sec [ 3] 25.0-30.0 sec 11.7 MBytes 19.6 Mbits/sec [ 3] 30.0-35.0 sec 12.3 MBytes 20.7 Mbits/sec [ 3] 
35.0-40.0 sec 11.9 MBytes 19.9 Mbits/sec [ 3] 40.0-45.0 sec 11.9 MBytes 19.9 Mbits/sec [ 3] 45.0-50.0 sec 12.2 MBytes 20.4 Mbits/sec [ 3] 50.0-55.0 sec 12.1 MBytes 20.2 Mbits/sec [ 3] 55.0-60.0 sec 12.3 MBytes 20.7 Mbits/sec [ 3] 0.0-60.0 sec 143 MBytes 20.0 Mbits/sec Obtained-from: FreeBSD
- Add ciphy for PHY modules produced by Cicada Semiconductor - Add vge(4) for VIA VT612x GigE, which may have ciphy as its PHY module - Add vge(4) into GENERIC and LINT - Add man page for vge(4) Thank Bill Paul for his work on this driver. Thank Sascha Wildner for preparing the man page. Obtained-from: FreeBSD Tested-by: herrgard <herrgard@gmail.com> NOTE: Although polling(4) is claimed to be supported by this driver, it does not work that well with vge(4) (extremely slow, ~7000ms for ping, as reported by herrgard).
Sync 802.11 support with FreeBSD6:
"it includes completed 802.11g, WPA, 802.11i, 802.1x, WME/WMM, AP-side
power-save, crypto plugin framework, authenticator plugin framework,
and access control plugin framework."
Reorganize the layout of netproto/802_11: put generic 802.11 layer, crypto
modules, authentication module and access control module into their own
directories. Header files are still in their original place.
Nuke all of the mutexing in generic 802.11, reorganize ieee80211_node table
scanning a little bit.
Rename FreeBSD's m_append() to ieee80211_mbuf_append(), rename FreeBSD's
m_unshare() to ieee80211_mbuf_clone() and put them into
netproto/802_11/wlan/ieee80211_dragonly.c
They are not generic enough for public use, at least for now.
Pointed-out-by: hsu
Expose ieee80211_add_{ssid, xrates, rates}() which are used by acx(4)
Keep using opencrypto's AES implementation for 802.11 CCMP crypto module
Sync ifconfig(8)'s 802.11 support with FreeBSD6
Update acx(4) and ndis(4) for the new 802.11 support
Sync iwi(4), ipw(4), wi(4) and ray(4) with FreeBSD6
For iwi(4):
- Fix ieee80211_node leakage
- Use a bitmap instead of FreeBSD's "unit number allocator" to allocate IBSS node
Add generic 802.11 layer and crypto modules into GENERIC and LINT,
authentication module and access module are only added to LINT
Unhook awi(4) from GENERIC and LINT temporarily, since as of this commit it
is broken :( It will be fixed sometime later.
Thank Sam Leffler and many other people for their work on 802.11 support.
Thank Andrew Atrens and Adrian Michael Nida for submitting the patch.
Thank all the people that helped testing 802.11 patches for this commit
Based-on-Patch-Submitted-by:
Andrew Atrens <atrens@nortelnetworks.com>
Adrian Michael Nida <nida@musc.edu>
Tested-by:
Thomas Schlesinger <schlesinger@netcologne.de>
Johannes Hofmann <Johannes.Hofmann@gmx.de>
Andrew Thompson <andrew@hijacked.us>
Erik Wikström <erik-wikstrom@telia.com>
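The bitmap-based IBSS node allocation mentioned for iwi(4) above can be sketched as below. This is an assumption-laden illustration, not the driver's code: struct demo_node_map, DEMO_MAX_NODES and both functions are hypothetical, and the 32-slot limit is chosen only so the map fits in one word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NODES	32	/* hypothetical limit, one bit per node slot */

struct demo_node_map {
	uint32_t	bits;	/* bit i set means slot i is in use */
};

/* Claim the lowest free slot; return its index, or -1 if the map is full. */
static int
demo_node_alloc(struct demo_node_map *m)
{
	assert(m != NULL);
	for (int i = 0; i < DEMO_MAX_NODES; i++) {
		if ((m->bits & (UINT32_C(1) << i)) == 0) {
			m->bits |= UINT32_C(1) << i;
			return (i);
		}
	}
	return (-1);
}

/* Release a slot so a later allocation can reuse it. */
static void
demo_node_free(struct demo_node_map *m, int i)
{
	assert(m != NULL && i >= 0 && i < DEMO_MAX_NODES);
	m->bits &= ~(UINT32_C(1) << i);
}
```

For a small, fixed number of slots this needs no memory allocation and no separate allocator state beyond the word itself.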
UMAPFS has been disabled (and non-working) for a long time. Scrap it entirely.
Split kern/vfs_journal.c. Leave the low level journal support code in kern/vfs_journal.c and move all the mount-based journaling code and journaling VNOPS to kern/vfs_jops.c. This is in preparation for utilizing the core journaling protocol for userland VFS support.
Transplant all the UFS ops that EXT2 used to call into the EXT2 tree and reconnect it to the build. Recent BUF/BIO work made most of the UFS tree incompatible with EXT2FS. Reported-by: Csaba Henk <csaba.henk@creo.hu>
Remove owi(4) (old wi) driver and adjust related bits. It's been replaced by the more generic wi(4) driver since Sep 5, 2004.
- Import driver[acx(4)] for TI acx100/acx111 based WiFi NIC. - Import user space utility[acxcontrol(8)] to load firmware and show driver statistics. - Add acx(4) and acxcontrol(8) man pages. - Build acx(4) as module only, since it needs firmware to work. - Add an entry for acx(4) in LINT. This driver is known to work with following hardware: D-Link DWL-520+ D-Link DWL-650+ D-Link DWL-G520+ D-Link DWL-G650+ Although both infrastructure mode and adhoc mode are supported, it may not work well in adhoc mode. PBCC based rate, 22Mbits/s, is not supported yet. acxcontrol(8) and man pages are written by Sascha Wildner. He also kindly helped debugging and testing the driver. Thank you, Sascha! The meaning and layout of hardware registers are based on the reverse engineering work done by people at acx100.sourceforge.net Thank them for their great work! This driver is initially based on acx100 developed by people at wlan.kewl.org Thank them for their nice work.
Remove NQNFS support. The mechanisms are too crude to co-exist with upcoming cache coherency management work and the original implementation hacked up the NFS code pretty severely. Move nqnfs_clientd() out of nfs_nqlease.c to a new file, nfs_kerb.c, and rename it nfs_clientd().
MFC commit from long ago: Build aicasm as a host tool. This allows us to compile 1.2-Release on 1.4+ again.
MFC - add rgephy to the build.
hook rgephy
Bring in if_bridge from Open-/Net-/FreeBSD Based-on-patch-by: Andrew Atrens Reviewed-and-locking-corrected-by: dillon and sephe
Adjust sources to accommodate the repo copy of our bridging code: sys/net/bridge was copied to sys/net/oldbridge
Continue work on our pluggable scheduler abstraction. Implement a system call to set the scheduler for the current process (and future children), and add an abstraction for scheduler registration. Submitted-by: Sergey Glushchenko <deen@smz.com.ua>
ICU/APIC cleanup part 5/many. Start migrating the ICU and APIC interrupt interfaces to a new machine level interrupt ABI. This ABI will eventually be tied into the BUS architecture. Move INTRDIS/INTREN to the new API: machintr_intrdis(irq) and machintr_intren(irq). Get rid of ithread_unmask(). Have the interrupt thread code call machintr_intrdis(irq) directly.
Add an mii_flags field to the attach arguments, to make it easier to create custom MII drivers without having to duplicate all of the generic attach/detach code. Add a simplified custom driver for nvidia/marvell which simply sets the flag which allows the generic mii code to probe for GiGE.
Add iwi (fixes building of LINT).
Actually hook up ipw. Forgotten since 2005-03-06.
Port rue(4) from FreeBSD:
dev/usb/if_rue.c rev 1.14, 1.20
dev/usb/if_ruereg.h rev 1.3
dev/mii/ruephy.c rev 1.1.4.1
dev/mii/ruephyreg.h rev 1.1.4.1
modules/rue/Makefile rev 1.2
This driver supports:
RealTek RTL8150 based USB ethernet devices
- Melco LUA-KTX
- GREEN HOUSE GH-USB100B
- Billionton ThumbLAN USBKR2-100B
Changes to FreeBSD version:
- use hw.rue as sysctl node instead of hw.usb.rue
- cleanup rue_attach() code path
- RUE_{LOCK, UNLOCK}() ==> crit_{enter, exit}()
- get rid of qdat
- nuke rue_softc.{rue_info, rue_unit}
- use callout_*()
- use m_getcl() instead of MGETHDR() and MCLGET()
Thanks Shunsuke Akiyama <akiyama@FreeBSD.org> and others for rue(4)
Lack of rue(4) in our base was first noticed by swildner,
thanks to him for the reminder
Approved-by: dillon
Tested-by: me (with a Billionton ThumbLAN USBKR2-100B)
Give the kernel a native NFS mount rpc capability for mounting NFS roots by splitting off the mount rpc code from the BOOTP code. The loader is no longer required to pass the nfs root mount file handle to the kernel. Pure tftp-based loaders with no knowledge of NFS can now pass a NFS root mount path to the kernel without having to pass a resolved NFS file handle. This change allows kernels booted from tftp loaders to have an NFS root without having to specify BOOTP (which sometimes doesn't work properly when done from both the loader and from the kernel).
A machine-independent spinlock implementation. It has the advantages of
1. being written in C except for the most low-level atomic swap primitive,
which is universally supported on current processor architectures
2. having a very small inlined memory footprint for spin_lock(),
with the slow-path deferred to a subroutine call
3. only requiring a bus-locked operation for lock acquisition,
and not requiring a bus-locked operation for lock release
4. doing a non-bus-locked check first in the spin loop to
reduce bus contention
5. doing exponential backoff in the uncommon contested case, which
Sun has found to reduce bus contention by a factor of 5 or more
Reviewed by: Matt Dillon
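The properties listed above can be illustrated with a minimal test-and-test-and-set lock. This is a sketch under stated assumptions, not the committed code: it uses C11 <stdatomic.h> rather than the kernel's own atomic swap primitive, every demo_* name is hypothetical, and the backoff cap of 1024 iterations is an arbitrary choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_spinlock {
	atomic_int	locked;		/* 0 = free, 1 = held */
};

static void
demo_spin_init(struct demo_spinlock *sp)
{
	atomic_init(&sp->locked, 0);
}

/* Single atomic swap; succeeds if the previous value was 0. */
static bool
demo_spin_trylock(struct demo_spinlock *sp)
{
	return (atomic_exchange_explicit(&sp->locked, 1,
	    memory_order_acquire) == 0);
}

/* Contested slow path, kept out of line so the fast path stays small. */
static void
demo_spin_lock_contested(struct demo_spinlock *sp)
{
	unsigned backoff = 1;

	for (;;) {
		/* Spin on a plain load first; retry the swap only when free. */
		while (atomic_load_explicit(&sp->locked,
		    memory_order_relaxed) != 0) {
			for (volatile unsigned i = 0; i < backoff; i++)
				;	/* busy-wait for the backoff period */
			if (backoff < 1024)
				backoff <<= 1;	/* exponential backoff */
		}
		if (demo_spin_trylock(sp))
			return;
	}
}

static inline void
demo_spin_lock(struct demo_spinlock *sp)
{
	if (!demo_spin_trylock(sp))
		demo_spin_lock_contested(sp);
}

/* Release is a plain store with release ordering, not an atomic swap. */
static inline void
demo_spin_unlock(struct demo_spinlock *sp)
{
	assert(atomic_load_explicit(&sp->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&sp->locked, 0, memory_order_release);
}
```

The uncontested path costs one swap to acquire and one store to release; the load-only inner loop keeps waiting cpus from issuing further swaps while the lock is held.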
Make struct dirent contain a full 64bit inode. Allow more than 255 byte filenames by increasing d_namlen to 16bit. Remove UFS specific macros from sys/dirent.h, programs which really need them should include vfs/ufs/dir.h. MAXNAMLEN should not be used, but replaced by NAME_MAX. To keep the impact for older BSD code small, d_ino and d_fileno are kept in the old meaning when __BSD_VISIBLE is defined, otherwise the POSIX version d_ino is used. This will be changed later to always define only d_ino and make d_fileno a compatibility macro for __BSD_VISIBLE. d_name is left with hard-coded 256 byte space, this will be changed at some point in the future and doesn't affect the ABI. Programs should correctly allocate space themselves, since the maximum directory entry length can be > 256 bytes. For allocating dirents (e.g. for readdir_r), _DIRENT_RECLEN and _DIRENT_DIRSIZ should be used. NetBSD has chosen the same names. Revamp the compatibility code to always use a local kernel buffer and write out the entries. This will be changed later by passing down the output function to vop_readdir, eliminating the redundant copy. Change NFS and CD9660 to use vop_write_dirent, for CD9660 ensure that the buffers are big enough by prepending char arrays of the right size. Tested-by & discussed-with: dillon
Get rid of 4.x-derived acpi code: - move most part of /sys/dev/acpica5/Makefile.inc into /sys/conf/acpi.mk - rename SYSACPICA_DIR, OSACPI_DIR to ACPICA_DIR and ACPI_MI_DIR, make them relative to $S (or $SYSDIR) so as they can be shared between kernel and modules builds - remove 4.x ACPI lines from, and add acpica5 lines to, /sys/conf/files* - make LINT to use `device acpi' instead of older `device acpica' - adjust ACPI driver build wrapper and ACPI tools to use /sys/conf/acpi.mk
Add a new system config directive called "nonoptional" which specifies files based on options which MUST be specified in the kernel config. Make the option to maintain compatibility with DragonFly 1.2 (and older 1.3 kernels) non-optional. Older versions of config will not recognize the new option and generate a reasonable error, rather than blindly compile a kernel without 1.2 support, hopefully prompting people to recompile their config program rather than post a bug report.
Make nlink_t 32bit and ino_t 64bit. Implement the old syscall numbers for *stat by wrapping the new syscalls and truncation of the values. Add a hack for boot2 to keep ino_t 32bit, otherwise we would have to link the 64bit math code in and that would most likely overflow boot2. Bump libc major to annotate changed ABI and work around a problem with strip during installworld. strip is dynamically linked and doesn't play well with the new libc otherwise. Support for 64bit inode numbers is still incomplete, because the dirent limited to 32bit. The checks for nlink_t have to be redone too.
Do not compile the kernel with the stack protector. I've decided to tolerate the stack protector as a default for user programs, but not for the kernel. The stack protector has known bugs and I frankly believe that it is far more likely that we would hit one of its bugs versus it actually finding a stack overflow in the kernel.
Tie SCTP into the kernel, this includes adding a new syscall (sctp_peeloff). Obtained from: KAME
Move PCCARD attachment into separate file. Use the NEWCARD helper functions for accessing the CIS instead of doing it manually. Submitted-by: Sepherosa Ziehau <sepherosa@gmail.com> Obtained-from: FreeBSD
Fix typo: netgarph -> netgraph.
Associate a userland scheduler control structure with every process and call userland scheduling functions through that structure. Note that the proc structure already had a field reserved for this purpose so it actually doesn't change size. The child of a fork() inherits the parent's userland scheduler control structure pointer. Move uio_yield() to a scheduler-independent file, and do some minor cleanups of already #ifdef'd out code. Repo-Rename usched_4bsd.c to usched_bsd4.c, so the file matches the function prefixes I want to use. Believe it or not, this should not represent any operational code changes other than changing some previously direct function calls into indirect calls through the new p_usched field in the process structure.
Repo-copy kern_switch.c to usched_4bsd.c, remove kern_switch.c, and point the kernel build at usched_4bsd.c in preparation for creating a switchable userland scheduling API.
Dispose of support for IBM's Micro Channel architecture (MCA).
Generic cache of pre-initialized objects. It uses per-cpu caches for MP-safety, was designed to be NUMA-aware, and works on top of any storage allocator. The object cache is largely inspired by the object cache portion of Sun's slab allocator.
More cleanups, add the API implementation to select the system clock.
Get rid of bus_{disable,enable}_intr(), it wasn't generic enough for
our needs.
Implement some generic atomic.h functions to aid in the implementation of
a low level mutex.
Implement a generic low level sleep-mutex serializer, kern/lwkt_serialize.c.
The serializer is designed to be a replacement for SPL calls but may also
be used for other very low level work (e.g. lockmgr interlocks).
Add a serializer argument to BUS_SETUP_INTR(). When non-NULL, the interrupt
handler will no longer be protected by an SPL so e.g. spl*() will no
longer protect against that device's interrupts.
The IF queueing and dequeueing mechanisms may no longer depend on outside
SPL state because network driver interrupt handlers are no longer required to
enter splnet(). Use critical sections for the moment. The IFQ and
IFF_OACTIVE interactions are not yet MP safe.
Build aicasm as host program, not via world's compiler.
Split pcm into the generic framework (pcm) and the sound cards (snd). Add support for choosing single devices by the same name as the corresponding module. E.g. device snd_ich gives the AC97 support, device "snd_sb8" gives the SoundBlaster 8 support.
Generic firmware support. Currently implemented is loading from /etc/firmware, support kernel builtin images and kernel modules will follow later. Written-by: Johannes Hofmann and Joerg Sonnenberger
Import ALTQ support from KAME. This is based on the FreeBSD 4 snapshot. This includes neither the ALTQ3 compat code nor the !DragonFly defines. The macros have been replaced with inline functions in net/ifq_var.h. This also renames pkthdr.pf_flags as it is intended as general flag bit. Currently supported are ppp(4), sppp(4), tun(4) and wi(4), more drivers are coming later. Reviewed-by: corecode, dillon, hsu Comments-from: hmp
Remove GPLed fpemulation, old rp, old awe and pcic. dgb is still in, until the persons having the hardware decide that digi(4) works for them. Correct spelling of deprecation.
Add missing kern_umtx.c to sys/conf/files. Noticed-by: David Rhodus
Remove wx(4). It's been superseded by em(4).
VFS messaging/interfacing work stage 10/99: Start adding the journaling, range locking, and (very slightly) cache coherency infrastructure. Continue cleaning up the VOP operations vector. Expand on past commits that gave each mount structure its own set of VOP operations vectors by adding additional vector sets for journaling or cache coherency operations. Remove the vv_jops and vv_cops fields from the vnode operations vector in favor of placing those vop_ops directly in the mount structure. Reorganize the VOP calls as a double-indirect and add a field to the mount structure which represents the current vnode operations set (which will change when e.g. journaling is turned on or off). This creates the infrastructure necessary to allow us to stack a generic journaling implementation on top of a filesystem. Introduce a hard range-locking API for vnodes. This API will be used by high level system/vfs calls in order to handle atomicity guarantees. It is a prerequisite for: (1) being able to break I/O's up into smaller pieces for the vm_page list/direct-to-DMA-without-mapping goal, (2) to support the parallel write operations on a vnode goal, (3) to support the clustered (remote) cache coherency goal, and (4) to support massive parallelism in dispatching operations for the upcoming threaded VFS work. This commit represents only infrastructure and skeleton/API work.
Add the basics of libkcore. Switch pstat to use kcore/kinfo backing, defaulting to kcore for now.
There is enough demand for Kip Macy's checkpointing code to warrant permanent integration into the kernel. Add a fixed system call, sys_checkpoint(2), to support the checkpt(1) utility as well as user programs which want to install their own signal handler (SIGCKPT).
Implement SACK.
Sync with FreeBSD-current: - Split dcons core and OS dependent part. - Use dcons buffer passed by loader(8). - Invalidate dcons buffer on shutdown.
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things. This patch also fixes FS
corruption due to prior vfs work in head. In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths. With
this patch there appear to be no serious problems even after two days
of continuous testing.
* VX lock all VOP_CLOSE operations.
* Fix two NFS issues. There was an incorrect assertion (found by
David Rhodus), and the nfs_rename() code was not properly
purging the target file from the cache, resulting in Stale file
handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue. Programs which open("/dev/tty", ...) and
then run the TIOCNOTTY ioctl were causing the system to lose track
of the open count, preventing the tty from properly detaching.
This is actually a very old BSD bug, but it came out of the woodwork
in DragonFly because I am now attempting to track device opens
explicitly.
* Gets rid of the vnode interlock. The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
(A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
and VOP_RECLAIM). New guarantees are in place with regard to vnode
ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE: a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
directories are now held so the leaf test should no longer be
necessary.
* Splits the over-large vfs_subr.c into three additional source
files, broken down by major function (locking, mount related,
filesystem syncer).
* Changes splvm() protection to a critical-section in a number of
places (bleedover from another patch set which is also about to be
committed).
Known issues not yet resolved:
* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
pass to make vp->v_lock mandatory and to clean up the few remaining
inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
filesystem.
* Only UFS and NFS have been well tested
* NFS is not properly timing out namecache entries, causing changes made
on the server to not be properly detected on the client if the client
already has a negative-cache hit for the filename in question.
Testing-by: David Rhodus <sdrhodus@gmail.com>,
Peter Kadau <peter.kadau@tuebingen.mpg.de>,
walt <wa1ter@myrealbox.com>,
others
VFS messaging/interfacing work stage 6/99. Populate and maintain the namecache pointers previously attached to struct filedesc, giving the new lookup code a base from which to work. Implement the new lookup API (it is not yet being used by anything) and augment the namecache API to handle the new functions, in particular adding cache_setvp() to resolve an unresolved namecache entry into a positive or negative hit and set various flags. Note that we do not yet cache symlink data but we could very easily. The new API is greatly simplified. Basically nlookups need only return a locked namecache pointer (guaranteeing namespace atomicity). Related vnodes are not locked. Both the leaf and governing directory vnodes can be extracted from the returned namecache pointer. namecache pointers may also represent negative hits, which means that their namespace locking feature serves to reserve a filename that has not yet been created (e.g. open+create, rename). The kernel is still using the old API as of this commit. This commit is primarily introducing the management infrastructure required to actually start writing code to use the new API. VOP_RESOLVE() has been added, along with a default function which falls back to VOP_LOOKUP()/VOP_CACHEDLOOKUP(). This VOP function is not yet being used as of this commit. This VOP will be responsible for taking an unresolved but locked namecache structure (hence the namespace is locked), and actually doing the directory lookup. But unlike the far more complex VOP_LOOKUP()/VOP_CACHEDLOOKUP() API the VOP_RESOLVE() API only needs to attach a vnode (or NULL if the entry does not exist) to the passed-in namecache structure. It is likely that timeouts, e.g. for NFS, will also be attached via this API. This commit does not implement any of the cache-coherency infrastructure but keeps this future requirement in mind in its design.
Hooks to build dcons(4)/dcons_crom(4).
Add KTR, a facility that logs kernel events to help debugging. You can access the logged information with ddb. If KTR_VERBOSE is defined a string will also be printed with printf() to your console. See ktr(4) and ktr(9) for information on how to use KTR. Obtained from: FreeBSD
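The event logging idea can be illustrated with a small userland sketch. It assumes a fixed-size circular buffer in which the newest event overwrites the oldest; the names trace_buf, trace_log() and trace_recent() and the buffer size are hypothetical and are not taken from ktr(4) or ktr(9).

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_ENTRIES 8          /* hypothetical size; must be a power of 2 */

struct trace_entry {
	const char *fmt;         /* static string describing the event */
	unsigned long arg;       /* one event argument */
};

struct trace_buf {
	struct trace_entry ring[TRACE_ENTRIES];
	unsigned int next;       /* total number of events logged so far */
};

/* Record an event, overwriting the oldest entry once the ring is full. */
static void
trace_log(struct trace_buf *tb, const char *fmt, unsigned long arg)
{
	struct trace_entry *te;

	assert(tb != NULL);
	te = &tb->ring[tb->next & (TRACE_ENTRIES - 1)];
	te->fmt = fmt;
	te->arg = arg;
	tb->next++;
}

/* Return the i'th most recent event (0 = newest), or NULL if it is gone. */
static const struct trace_entry *
trace_recent(const struct trace_buf *tb, unsigned int i)
{
	if (i >= tb->next || i >= TRACE_ENTRIES)
		return NULL;
	return &tb->ring[(tb->next - 1 - i) & (TRACE_ENTRIES - 1)];
}
```

A debugger-side reader would walk trace_recent() from 0 upward until it returns NULL. The sketch ignores locking and per-cpu buffers.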
Kernel part of PF Ported-by: - Max Laier (original patch set, FreeBSD PF maintainer) - Devon O'Dell, Simon 'corecode' Schubert (integration and DragonFly specific changes) In contrast to FreeBSD and OpenBSD, use direct flags in pkthdr instead of m_tags. This reduces allocation and processing overhead. Keep the IP header in Host Byte Order like the rest of the tree assumes. Module support has a memory leak for vm_zones when unloading pf.ko.
Import the new wi(4) driver based on the generic 802.11 layer. Obtained-from: FreeBSD
Save current version of wi(4) as owi before switching to generic 802.11 version. Small modifications to allow kernels with both devices included. if_owi.c: copy of if_wi.c, 1.15 if_owi_pccard.c: copy of if_wi_pccard.c, 1.5 if_owi_pci.c: copy of if_wi_pci.c, 1.4 if_wireg.h: copy of if_wireg.h, 1.3 if_wivar.h: copy of if_wivar.h, 1.4 owi_hostap.c: copy of wi_hostap.c, 1.9 wi_hostap.h: copy of wi_hostap.h, 1.2
VFS messaging/interfacing work stage 1/99. This stage replaces the old dynamic VFS descriptor and inlined wrapper mess with a fixed structure and fixed procedural wrappers. Most of the work is straightforward except for vfs_init, which was basically rewritten (and greatly simplified). It is my intention to make the vop_*() call wrappers eventually handle range locking and cache coherency issues as well as implementing the direct call -> messaging interface layer. The call wrappers will also handle API translation as we shift the APIs over to new, more powerful mechanisms in order to allow the work to be incrementally committed. This is the first stage of what is likely to be a huge number of stages to modernize the VFS subsystem.
Add re(4) to GENERIC.
Import generic 802.11 layer. Choose netproto/802_11 instead of net80211 as source location. Use token API instead of mutexing. The locking heavily depends on atomic operations and needs additional work. Use POSIX int types. Add dev/wi/if_wavelan_ieee.h as netproto/802_11/if_wavelan_ieee.h, since this is used by all wireless interfaces and, apart from the variable and constant naming, is mostly wi(4) independent. Obtained-from: FreeBSD
Add RC4 to the crypto module / device. This will be used by the generic 802.11 layer.
ugenbuf is associated with the 'ugen' device, not the 'ugenbuf' device. Reported-by: "GeekGod" <GeekGod@GeekGod.com>
Julian Elischer posted an interesting proof-of-concept to freebsd-current regarding UGEN's use of a 1K stack buffer for bulk IO issues. The small block size resulted in unnecessarily slow performance with certain devices. Implement a fix along the lines described. Create a simple ugen buffer allocator abstraction and a one-entry cache to avoid unnecessary malloc/free sequences. Allow the block size to be set with a sysctl and default it to 16K. Not much uses UGEN. Camera software, mainly. The change appears to slightly improve s10sh transfer performance from my Canon 10D.
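A buffer allocator with a one-entry cache can be sketched in userland C as follows. This is only an illustration of the idea, assuming malloc()/free() in place of the kernel allocator and a fixed 16K block size; buf_alloc(), buf_release() and cached_buf are hypothetical names, and the sketch is not thread safe.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_SIZE (16 * 1024)     /* the default block size named above */

static void *cached_buf;         /* the single cached buffer, or NULL */

/* Hand out the cached buffer if there is one, else allocate a fresh one. */
static void *
buf_alloc(void)
{
	void *buf = cached_buf;

	if (buf != NULL) {
		cached_buf = NULL;
		return buf;
	}
	return malloc(BUF_SIZE);
}

/* Keep one released buffer around for reuse; free any beyond that. */
static void
buf_release(void *buf)
{
	assert(buf != NULL);
	if (cached_buf == NULL)
		cached_buf = buf;
	else
		free(buf);
}
```

With a single transfer in flight at a time, every buf_alloc() after the first is satisfied from the cache and no malloc/free pair is needed.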
Get rid of the PFIL_HOOKS option, integrate pfil in the system permanently. (previously the packet filters couldn't even be kldload'd without PFIL_HOOKS).
Add the MSFBUF API. MSFBUFs are like SFBUFs but they manage ephemeral multi-page mappings instead of single-page mappings. MSFBUFs have the same caching and page invalidation optimizations that SFBUFs have and are considered to be SMP-friendly. Whereas XIO manages pure page lists, MSFBUFs manage KVA mappings of pure page lists. This initial commit just gets the basic API operational. The roadmap for future work includes things like better interactions with third-party XIOs, mapping user buffers into the kernel (extending the xio_init_ubuf() API into the MSFBUF API), and allowing higher level subsystems to pass previously released MSFBUFs as a hint to speed-up regeneration. We also need to come up with a way to overload additional sets of MSFBUFs representing smaller chunks of memory on top of the same KVA space in order to efficiently use our KVA reservation when dealing with subsystems like the buffer cache. MSFBUFs will eventually replace the KVA management in the BUF/BIO, PIPE, and other subsystems which create fake linear mappings with pbufs. The general idea for BUF/BIO will be to use XIO and MSFBUFs to avoid KVA mapping file data through the nominal I/O path. XIO will be the primary I/O buffer mechanism while MSFBUFs will be used when things like UFS decide they need a temporary mapping. This is a collaborative work between Hiten Pandya <hmp@leaf.dragonflybsd.org> and Matthew Dillon <dillon@backplane.com>.
async syscall work: The async syscall code got dated by recent LWKT changes, set mp_abort_port and clear MSGF_DONE as appropriate. If a system call returns EASYNC, record the message in p->p_sysmsgq so we can run them down in exit1(). In exit1(), run down any asynch system calls that are still running. Note that this commit does not implement abort support (yet). Get rid of lwkt_port->mp_refs, it was not being used and it is likely never going to be used (reference counting LWKT is hazardous anyway since it doesn't really fit the access model). Add lwkt_checkmsg() to support some of the rearranged async syscall code.
Update bktr(4) to FreeBSD current's version. This most importantly includes a new msp driver based on the Linux Brooktree driver. Add support for Terratec TValue submitted by Patrick Mauritz <oxygene@studentenbude.ath.cx>. The ioctl headers are moved into the MI dev/ tree, symlinks for compatibility are added in a separate commit.
Add in kernel config file options that were forgotten on last commit.
Trash the vmspace_copy() hacks that CAPS was previously using. No other subsystem uses these hacks and the new XIO mechanism is far, far superior.
Hook XIO up to the kernel build.
Adjust the Makefiles to move the iconv files to libiconv, and add it to the module build.
Merge the kernel part of UDF support from FreeBSD 5. This doesn't include the iconv hooks and makes use of M_WAITOK everywhere.
Dispatch upper-half protocol request handling.
Add bfe(4) support from FreeBSD. Initial code submitted by Peter Avalos <pavalos@theshell.com>. Changes to the FreeBSD version: - make the code consistent w.r.t. style(9) - remove some unused entries from bfe_softc - use the PCI IDs from pcidevs.h - use BUS_DMA_WAITOK since the allocation is done in bfe_attach before the interrupt is registered and sleeping is therefore safe - fix some warnings in the code about signed/unsigned comparisons
Split off the PCI-PCI bridge and the PCI-ISA bridge code from pcisupport.c. This just moves code around; there are no functional changes.
Synchronize a bunch of things from FreeBSD-5 in preparation for the new ACPICA driver support. * Bring in a lot of new bus and pci DEV_METHODs from FreeBSD-5 * split apic.h into apicreg.h and apicio.h * rename INTR_TYPE_FAST -> INTR_FAST and move the #define * rename INTR_TYPE_EXCL -> INTR_EXCL and move the #define * rename some PCIR_ registers and add additional macros from FreeBSD-5 * note: new pcib bus call, host_pcib_get_busno() imported. * kern/subr_power.c no longer optional. Other changes: * machine/smp.h machine smp/smptests.h can now be #included unconditionally, and some APIC_IO vs SMP separation has been done as well. * gd_acpi_id and gd_apic_id added to machine/globaldata.h prep for new ACPI code. Despite all the changes, the generated code should be virtually the same. These were mostly additions which the pre-existing code does not (yet) use.
Remove duplicate line for if_ray
* Remove ufs_disksubr.c from kernel build files.
Split the IPIQ messaging out of lwkt_thread.c and move it to its own file, lwkt_ipiq.c. Add a MI synchronous cpu rendezvous API lwkt_cpusync_*(). This API allows the kernel to synchronize an operation across any number of cpus. Multiple cpus can initiate synchronization operations simultaneously without creating a deadlock. The API utilizes the IPI messaging core and guarantees that other synchronization and IPI messaging operations will continue to work during any given synchronization op. The API is a spin-blocking API, meaning that it will not switch threads and can be used by mainline code, interrupts, and other sensitive code. This API is intended to replace smp_rendezvous(), Xcpustop, and other hardwired IPI ops. It will also be used to fix our TLB shootdown code. As of this commit the API has not yet been connected to anything and has been tested only a little.
Move <machine/in_cksum.h> to <sys/in_cksum.h>. This file is now platform independent. If we want to add extreme machine specialization later on then sys/in_cksum.h will #include machine/in_cksum.h. Move i386/i386/in_cksum.c to netinet/in_cksum.c. Note that netinet/in_cksum.c already existed but was not used by the build system at all. The move overwrites it. The new in_cksum.c is a portable, complete rewrite which references core assembly (procedure call) to do 32-bit-aligned work. See also i386/i386/in_cksum2.s.
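For reference, the Internet checksum that in_cksum computes can be written portably in a few lines. The sketch below is a plain byte-at-a-time version of the ones-complement sum from RFC 1071, not the code from this commit (which hands aligned 32-bit work to assembly); inet_cksum() is a hypothetical name and it works on a flat buffer rather than an mbuf chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones-complement checksum over a flat buffer (RFC 1071).
 * The result is the numeric value of the 16-bit checksum word as it
 * would appear in network byte order.
 */
static uint16_t
inet_cksum(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint64_t sum = 0;

	assert(data != NULL || len == 0);

	/* Add the data up as big-endian 16-bit words. */
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	/* A trailing odd byte is padded with a zero byte. */
	if (len == 1)
		sum += (uint32_t)p[0] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

For the example bytes 00 01 f2 03 f4 f5 f6 f7 used in RFC 1071, the folded sum is 0xddf2 and the function returns its complement, 0x220d.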
Sync if_ed with FreeBSD current
Initial backport of NEWCARD from FreeBSD 5. The included version is from end of November 2002 with the exception of bus/pccard/pccarddevs which is from November 2003. Thanks to Warner Losh and the other folks for NEWCARD.
Split the lwkt_token code out of lwkt_thread.c. Give it its own file. No operational changes.
* Add kern_systimer.c to the kernel build process.
Resident executable support stage 1/4: Add kernel bits and syscall support for in-kernel caching of vmspace structures. The main purpose of this feature is to make it possible to run dynamically linked programs as fast as if they were statically linked, by vmspace_fork()ing their vmspace and saving the copy in the kernel, then using that whenever the program is exec'd.
CAPS IPC library stage 1/3: The core CAPS IPC code, providing system calls to create and connect to named rendezvous points. The CAPS interface implements a many-to-1 (client:server) capability and is totally self contained. The messaging is designed to support single and multi-threading, synchronous or asynchronous (as of this commit: polling and synchronous only). Message data is 100% opaque and so while the intention is to integrate it into a userland LWKT messaging subsystem, the actual system calls do not depend on any LWKT structures. Since these system calls are experimental and may contain root holes, they must be enabled via the sysctl kern.caps_enabled.
Add pcib_if.m
* Add in support for the IBM ServeRAID controller. Port done and sent in by: TONETANI Tomokazu <ghwt+dragonfly-kernel@les.ath.cx>
Move the FreeBSD 2.2 and 3.x PCI compatibility code into pci_compat.c and let it depend on COMPAT_OLDPCI. Adjust LINT accordingly.
Import the libkern/fnmatch code from FreeBSD-5. Submitted-by: Max Laier <max@love2party.net>
Bring in the entire FreeBSD-5 USB infrastructure. As of this commit my USB camera, Hard Drive, Mouse, and Sony memory key all work and I can even unplug and replug them in without crashing the port. Not all drivers and subsystems compile as of this commit, but the ones that do not are very close.
Pull the sf_buf routines and structures out into its own files in anticipation of wider future use. Requested and reviewed by: dillon
Add -fstack-protector support for the kernel.
http://www.trl.ibm.com/projects/security/ssp/
Submitted-by: Ryan Dooley <dooleyr@missouri.edu>
Add strlcpy and strlcat to libkern
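The semantics of the two functions are easy to get wrong, so here is a minimal sketch of them. The sketch_ prefix is only there to avoid clashing with a libc that already provides strlcpy()/strlcat(); this is not the libkern source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing at most size bytes including the NUL.
 * Returns strlen(src), so truncation occurred if the result >= size.
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t srclen = strlen(src);

	if (size != 0) {
		size_t n = srclen < size - 1 ? srclen : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}

/*
 * Append src to dst, where size is the full size of the dst buffer.
 * Returns the length the combined string would have had without truncation.
 */
static size_t
sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t dstlen = 0;

	/* Find the end of dst without reading past size bytes. */
	while (dstlen < size && dst[dstlen] != '\0')
		dstlen++;
	if (dstlen == size)
		return size + strlen(src);
	return dstlen + sketch_strlcpy(dst + dstlen, src, size - dstlen);
}
```

Callers detect truncation by comparing the return value against the buffer size, which is the main advantage over strncpy() and strncat().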
PCI compat cleanup, part 1. This brings in the LNC and VX drivers from FreeBSD-5. They are not the newest versions, just new enough to use newbus, not the PCI compat code. Submitted-by: Joerg Sonnenberger <joerg@britannica.bec.de>
Add PFIL_HOOKS functionality. This allows us to plug in many firewalling architectures by using/having generic hooks in the networking code.
Add the MPIPE subsystem. This subsystem is used for 'pipelining' fixed-size allocations. Pipelining is used to avoid lack-of-resource deadlocks by still allowing resource allocations to 'block' by guaranteeing that an already in-progress operation will soon free memory that will be immediately used to satisfy the blocked resource. Adjust the ATAold code to use the new mechanism and remove the code that tried to back-off into PIO mode when resources were lacking.
Implement an upcall mechanism to support userland LWKT. This mechanism will allow multiple processes sharing the same VM space (aka clone/threading) to send each other what are basically IPIs. Two new system calls have been added, upc_register() and upc_control(). Documentation is forthcoming. The upcalls are nicely abstracted and a program can register as many as it wants up to the kernel limit (which is 32 at the moment). The upcalls will be used for passing asynch data from kernel to userland, such as asynch syscall message replies, for thread preemption timing, software interrupts, IPIs between virtual cpus (e.g. between the processes that are sharing the single VM space).
Rework the logic in the kernel config files. Remove all build magic for the module interface methods from the 'files' file and move it to the Makefile where it belongs. Based on: Makefile.i386 1.181 and files 1.348 Tested with: LINT
Split mmap(). Move ovadvise(), ogetpagesize() and ommap() to new file 43bsd/43bsd_vm.c. http://gomerbud.com/daver/patches/dragonfly/syscall-separation-15.diff
Move ogethostname(), osethostname(), ogethostid(), osethostid(), and
oquota() to the 43bsd emulation subtree.
Change o{get,set}hostname() to use kernel_sysctl() instead of
userland_sysctl().
Network threading stage 1/3: netisrs are already software interrupts, which means they already run in their own thread. This commit creates multiple supporting threads for netisrs rather than just one and code has been added to begin routing packets to particular threads based on their content. Eventually this will lead to us being able to isolate and serialize PCBs in particular threads. The tail end of the ip_input path's protocol dispatch, the UIPC (user entry) code, and listen socket have not been covered yet and still need to be serialized. A new debugging sysctl, net.inet.ip.mthread_enable, has been added. It defaults to 1. If you set this sysctl to 0 netisr processing will revert to the prior single-threaded behavior. Submitted-by: Jeffrey Hsu <hsu@FreeBSD.org> Additional-work-by: dillon
Variant symlink support stage 1/2: Implement support for storing and retrieving system-specific, user-specific, and process-specific variables.
add cmpdi2 and ucmpdi2 to conf/files to fix LINT build.
Split wait4(), setrlimit(), getrlimit(), statfs(), fstatfs(), chdir(),
open(), mknod(), link(), symlink(), unlink(), lseek(), access(), stat(),
lstat(), readlink(), chmod(), chown(), lchown(), utimes(), lutimes(),
futimes(), truncate(), rename(), mkdir(), rmdir(), getdirentries(),
getdents().
Trash the 4.3BSD numeric filesystem type support in mount().
Move ocreat(), olseek(), otruncate(), ostat(), olstat(), owait(),
ogetrlimit(), and osetrlimit() to the 43bsd subtree and reimplement
using split syscalls. Move ogetdirentries() to the subtree without
change because it is such a mess.
Convince linux_waitpid(), linux_wait(), linux_setrlimit(),
linux_old_getrlimit(), and linux_getrlimit() to use split syscalls.
The file kern/vfs_syscalls.c is now completely free of COMPAT_43 code.
I believe that execve() is the only pending split before I can tackle
stackgap usage in the linux emulator's CHECKALT{EXIST,CREAT}() macros.
Add nForce AGP support, taken from FreeBSD with some minor changes to get it to work with DragonFly.
Remove the FreeBSD 3.x signal code. This includes osendsig(), osigreturn() and a couple of structures that these syscalls depended on. Split the sigaction(), sigprocmask(), sigpending(), sigsuspend(), sigaltstack() and kill() syscalls. Move the 4.3BSD signal syscalls osigvec(), osigblock(), osigsetmask(), osigstack() and okillpg() to the 43bsd subtree. I'm not too sure if these will even work with the FreeBSD-4 signal trampoline code, but they do compile and link. Implement linux_signal(), linux_rt_sigaction(), linux_sigprocmask(), linux_rt_sigprocmask(), linux_sigpending(), linux_kill(), linux_sigaction(), linux_sigsuspend(), linux_rt_sigsuspend(), linux_pause(), and linux_sigaltstack() with the new in-kernel syscalls. This patch kills 7 stackgap allocations in the Linuxolator.
Create the kern_fstat() and kern_ftruncate() in-kernel syscalls. Implement fstat(), nfstat() and ftruncate() using the in-kernel syscalls. Move ofstat() and oftruncate() to the 43bsd emulation tree and implement with in-kernel syscalls. Create the linux_ftruncate() syscall in the linux emulation layer. This replaces a direct use of oftruncate() in the linux syscall map. Rewrite linux_newfstat() and linux_fstat64() with the in-kernel syscalls.
Entirely remove the old kernel malloc and kmem_map code. The slab allocator is now mandatory. Also remove the related conf options, USE_KMEM_MAP and NO_SLAB_ALLOCATOR.
Second contigmalloc() cleanup: * Move the contigmalloc/vm_contig_pg API into its own file, vm_contig.c. * Give contigmalloc1() a more sensible name to reflect its purpose, contigmalloc_map().
Augment falloc() to support thread-only file pointers (with no integer file descriptor or process). Add new generic 'easy to use' fp_*() kernel functions which operate on file pointers. This will greatly ease in-kernel functions which must open, perform I/O, and close files. Adopted from: other kernel sources and Kip Macy's checkpoint code.
Looks like we can't have comments on the same lines that are being parsed.
* Intel ACPI 20030228 distribution with local DragonFly changes. * OSPM ACPI driver. Note that this driver does not include support for PCI interrupt routing or enumeration of ISA bridges or Host to PCI bridges. While functional on some machines, this driver should be considered experimental and should be tested prior to being deployed in a production environment. Original work done by John Baldwin Sponsored by: The Weather Channel
Centralize if queue handling. Original patch against FreeBSD submitted by Jonathan Lemon. Reviewed by Matt Dillon.
Create an emulation/43bsd directory and move the recently modified compatibility syscalls there. Any future work on the COMPAT_43 code should be split from the rest of the kernel and moved here. Everything in the kernel that explicitly uses the osockaddr structure has been modified to include "emulation/43bsd/43bsd_socket.h". There was one case where struct osockaddr was used in userland, talk/talkd. This commit has a temporary fix for talk/talkd.
Initial cleanup work to make NETNS compile again before someone tries to remove it. 8-) Also fix a few small bugs and try to make the code do the right thing.
SLAB ALLOCATOR Stage 1. This brings in a slab allocator written from scratch
by yours truly. A detailed explanation of the allocator is included but
first, other changes:
* Instead of having vm_map_entry_insert*() and friends allocate the
vm_map_entry structures a new mechanism has been emplaced whereby
the vm_map_entry structures are reserved at a higher level, then
expected to exist in the free pool in deep vm_map code. This preliminary
implementation may eventually turn into something more sophisticated that
includes things like pmap entries and so forth. The idea is to convert
what should be low level routines (VM object and map manipulation)
back into low level routines.
* vm_map_entry structures are now per-cpu cached, which is integrated into
the reservation model above.
* The zalloc 'kmapentzone' has been removed. We now only have 'mapentzone'.
* There were race conditions between vm_map_findspace() and actually
entering the map_entry with vm_map_insert(). These have been closed
through the vm_map_entry reservation model described above.
* Two new kernel config options now work. NO_KMEM_MAP has been fleshed out
a bit more and a number of deadlocks related to having only the kernel_map
now have been fixed. The USE_SLAB_ALLOCATOR option will cause the kernel
to compile-in the slab allocator instead of the original malloc allocator.
If you specify USE_SLAB_ALLOCATOR you must also specify NO_KMEM_MAP.
* vm_poff_t and vm_paddr_t integer types have been added. These are meant
to represent physical addresses and offsets (physical memory might be
larger than virtual memory, for example Intel PAE). They are not heavily
used yet but the intention is to separate physical representation from
virtual representation.
SLAB ALLOCATOR FEATURES
The slab allocator breaks allocations up into approximately 80 zones based
on their size. Each zone has a chunk size (alignment). For example, all
allocations in the 1-8 byte range will allocate in chunks of 8 bytes. Each
size zone is backed by one or more blocks of memory. The size of these
blocks is fixed at ZoneSize, which is calculated at boot time to be between
32K and 128K. The use of a fixed block size allows us to locate the zone
header given a memory pointer with a simple masking operation.
The slab allocator operates on a per-cpu basis. The cpu that allocates a
zone block owns it. free() checks the cpu that owns the zone holding the
memory pointer being freed and forwards the request to the appropriate cpu
through an asynchronous IPI. This request is not currently optimized but it
can theoretically be heavily optimized ('queued') to the point where the
overhead becomes inconsequential. As of this commit the malloc_type
information is not MP safe, but the core slab allocation and deallocation
algorithms, not including having to allocate the backing block,
*ARE* MP safe. The core code requires no mutexes or locks, only a critical
section.
Each zone contains N allocations of a fixed chunk size. For example, a
128K zone can hold approximately 16000 or so 8 byte allocations. The zone
is initially zero'd and new allocations are simply allocated linearly out
of the zone. When a chunk is freed it is entered into a linked list and
the next allocation request will reuse it. The slab allocator heavily
optimizes M_ZERO operations at both the page level and the chunk level.
The slab allocator maintains various undocumented malloc quirks such as
ensuring that small power-of-2 allocations are aligned to their size,
and malloc(0) requests are also allowed and return a non-NULL result.
kern_tty.c depends heavily on the power-of-2 alignment feature and ahc
depends on the malloc(0) feature. Eventually we may remove the malloc(0)
feature.
PROBLEMS AS OF THIS COMMIT
NOTE! This commit may destabilize the kernel a bit. There are issues
with the ISA DMA area ('bounce' buffer allocation) due to the large backing
block size used by the slab allocator and there are probably some deadlock
issues due to the removal of kmem_map that have not yet been resolved.
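The zone lookup by masking and the chunk reuse list described under SLAB ALLOCATOR FEATURES above can be sketched in userland C. This is a simplified illustration, not the committed allocator: it assumes a fixed 128K zone obtained from C11 aligned_alloc(), keeps the zone header at the start of the block, and leaves out per-cpu ownership, IPI forwarding and M_ZERO handling. All names (struct zone, zone_create(), zone_alloc(), zone_of(), zone_free()) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ZONE_SIZE (128 * 1024)               /* fixed power-of-2 block size */
#define ZONE_MASK ((uintptr_t)ZONE_SIZE - 1)

struct chunk {
	struct chunk *next;      /* link used only while the chunk is free */
};

struct zone {
	size_t chunk_size;       /* size of every allocation in this zone */
	size_t next_index;       /* next never-used chunk (linear allocation) */
	size_t nchunks;          /* total chunk slots in the block */
	struct chunk *free_list; /* chunks that have been freed */
};

/* Allocate a ZONE_SIZE-aligned block and place the header at its start. */
static struct zone *
zone_create(size_t chunk_size)
{
	struct zone *z;

	assert(chunk_size >= sizeof(struct chunk));
	assert(chunk_size % sizeof(void *) == 0);
	z = aligned_alloc(ZONE_SIZE, ZONE_SIZE);
	if (z == NULL)
		return NULL;
	z->chunk_size = chunk_size;
	/* Skip the chunk slots that the header itself occupies. */
	z->next_index = (sizeof(*z) + chunk_size - 1) / chunk_size;
	z->nchunks = ZONE_SIZE / chunk_size;
	z->free_list = NULL;
	return z;
}

/* Reuse a freed chunk if possible, else carve the next one linearly. */
static void *
zone_alloc(struct zone *z)
{
	struct chunk *c = z->free_list;

	if (c != NULL) {
		z->free_list = c->next;
		return c;
	}
	if (z->next_index == z->nchunks)
		return NULL;
	return (char *)z + z->next_index++ * z->chunk_size;
}

/* Locate the zone header from any pointer inside the zone by masking. */
static struct zone *
zone_of(const void *ptr)
{
	return (struct zone *)((uintptr_t)ptr & ~ZONE_MASK);
}

/* Free needs only the pointer: the header is found by masking. */
static void
zone_free(void *ptr)
{
	struct zone *z = zone_of(ptr);
	struct chunk *c = ptr;

	c->next = z->free_list;
	z->free_list = c;
}
```

The masking trick works only because every block has the same power-of-2 size and is aligned to that size, which is why the commit message stresses the fixed ZoneSize.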
kernel tree reorganization stage 1: Major cvs repository work (not logged as
commits) plus a major reworking of the #include's to accommodate the
relocations.
* CVS repository files manually moved. Old directories left intact
and empty (temporary).
* Reorganize all filesystems into vfs/, most devices into dev/,
sub-divide devices by function.
* Begin to move device-specific architecture files to the device
subdirs rather than throwing them all into, e.g. i386/include
* Reorganize files related to system busses, placing the related code
in a new bus/ directory. Also move cam to bus/cam though this may
not have been the best idea in retrospect.
* Reorganize emulation code and place it in a new emulation/ directory.
* Remove the -I- compiler option in order to allow #include file
localization, rename all config generated X.h files to use_X.h to
clean up the conflicts.
* Remove /usr/src/include (or /usr/include) dependencies during the
kernel build, beyond what is normally needed to compile helper
programs.
* Make config create 'machine' softlinks for architecture specific
directories outside of the standard <arch>/include.
* Bump the config rev.
WARNING! after this commit /usr/include and /usr/src/sys/compile/*
should be regenerated from scratch.
Move the backtrace() function from kern_subr.c to kern_debug.c. All debugging related kernel functions, and syscalls should be added into this file. Discussed with: Matt (about kern_debug.c)
DEV messaging stage 2/4: In this stage all DEV commands are now being funneled through the message port for action by the port's beginmsg function. CONSOLE and DISK device shims replace the port with their own and then forward to the original. FB (Frame Buffer) shims supposedly do the same thing but I haven't been able to test it. I don't expect instability in mainline code but there might be easy-to-fix problems, and some drivers still need to be converted. See primarily: kern/kern_device.c (new dev_*() functions and inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c for the high points. In this stage all DEV messages are still acted upon synchronously in the context of the caller. We cannot create a separate handler thread until the copyin's (primarily in ioctl functions) are made thread-aware. Note that the messaging shims are going to look rather messy in these early days but as more subsystems are converted over we will begin to use pre-initialized messages and message forwarding to avoid having to constantly rebuild messages prior to use. Note that DEV itself is a mess owing to its 4.x roots and will be cleaned up in subsequent passes. e.g. the way sub-devices inherit the main device's cdevsw was always a bad hack and it still is, and several functions (mmap, kqfilter, psize, poll) return results rather than error codes, which will be fixed since now we have a message to store the result in :-)
This is the initial implementation of the LWKT messaging infrastructure. Messages are sent to message ports and typically replied to a message port embedded in the originating thread's thread structure (td_msgport). The port functions match up and optimize client sync/asynch requests versus target synch/asynch responses. In this initial implementation a port must be owned by a particular thread, and we use *asynch* IPI messaging to forward queueing and dequeueing operations to the correct cpu. Most of the IPI overhead will be absorbed by the fact that these same IPIs also tend to schedule the threads in question, which on the correct cpu (which is the one it will be on) costs nothing. Message ports have in-context dispatch functions for initiating, aborting, and replying to a message which can be overridden and will queue by default. This code compiles but is as yet unreferenced, and almost certainly needs more work.
threaded interrupts 1: Rewrite the ICU interrupt code, splz, and doreti code. The APIC code hasn't been done yet. Consolidate many interrupt thread related functions into MI code, especially software interrupts. All normal interrupts and software interrupts are now threaded, and I'm almost ready to deal with interrupt-thread-only preemption. At the moment I run interrupt threads in a critical section and probably will continue to do so until I can make them MP safe.
Add kern/lwkt_rwlock.c -- reader/writer locks. Clean up the process exit & reaping interlock code to allow context switches to occur. Clean up and make operational the lwkt_block/signaling code.
thread stage 8: add crit_enter(), per-thread cpl handling, fix deferred interrupt handling for critical sections, add some basic passive token code, and blocking/signaling code. Add structural definitions for additional LWKT mechanisms. Remove asleep/await. Add generation number based xsleep/xwakeup. Note that when exiting the last crit_exit() we run splz() to catch up on blocked interrupts. There is also some #if 0'd code that will cause a thread switch to occur 'at odd times'... primarily wakeup()-> lwkt_schedule()->critical_section->switch. This will be useful for testing purposes down the line. The passive token code is mostly disabled at the moment. Its primary use will be under SMP and its primary advantage is very low overhead on UP and, if used properly, should also have good characteristics under SMP.
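The critical-section behavior noted in the entry above (interrupts are deferred while inside a critical section, and the outermost exit catches up on them) can be sketched as a user-space toy. The `toy_*` names, the bitmask representation, and the structure layout are assumptions for illustration only; the real crit_enter(), crit_exit(), and splz() operate on the kernel's thread and interrupt state.

```c
#include <assert.h>

/* Hypothetical per-thread state; not the kernel's struct thread. */
struct toy_thread {
    int      crit_count;  /* critical section nesting depth */
    unsigned pending;     /* bitmask of interrupts deferred while nested */
    unsigned handled;     /* bitmask of interrupts actually dispatched */
};

static void toy_crit_enter(struct toy_thread *td)
{
    td->crit_count++;
}

/* Stand-in for splz(): dispatch everything that was deferred. */
static void toy_splz(struct toy_thread *td)
{
    td->handled |= td->pending;
    td->pending = 0;
}

/* Only the outermost exit catches up on blocked interrupts. */
static void toy_crit_exit(struct toy_thread *td)
{
    assert(td->crit_count > 0);
    if (--td->crit_count == 0 && td->pending != 0)
        toy_splz(td);
}

/* An interrupt arriving inside a critical section is recorded, not run. */
static void toy_interrupt(struct toy_thread *td, int irq)
{
    if (td->crit_count > 0)
        td->pending |= 1u << irq;
    else
        td->handled |= 1u << irq;
}
```

Nesting is why a counter is used rather than a flag: an inner exit must leave interrupts deferred until the outer section also exits.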
thread stage 7: Implement basic LWKTs, use a straight round-robin model for the moment. Also continue consolidating the globaldata structure so both UP and SMP use it with more commonality. Temporarily match user processes up with scheduled LWKTs on a 1:1 basis. Eventually user processes will have LWKTs, but they will not all be scheduled 1:1 with the user process's runnability. With this commit work can potentially start to fan out, but I'm not ready to announce yet.
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids. Most ids have been removed from !lint sections and moved into comment sections.
import from FreeBSD RELENG_4 1.340.2.137