DragonFly: src/sys/dev/netif/re
Keyword substitution: kv
Default branch: MAIN
Put the printing of unknown hardware IDs back under bootverbose, mainly to suppress it on boxes with rl(4).
Try recollecting RX/TX descriptors if we are going to switch back to TX/RX interrupts. There seems to be a race between turning on the TX/RX interrupts and the hardware asserting them. This fixes the poor performance that I have seen when using single-threaded remote cpdup.
- Move RX filter configuration from re_init() into re_setmulti() - IFF_BROADCAST will never be set/cleared - Fix SIOCSIFFLAGS support; it was really annoying that each time I ran tcpdump, the NIC reinitialized itself.
Add pcie_set_max_readrq() to avoid code duplication between various network device drivers.
Add m_devpad() to avoid code duplication in various network device drivers
- In re_stop(), call re_reset(), which is supposed to stop the TX/RX engines. - In re_reset(), don't touch 0x82 (a magic CSR), which seems to be 8110/8169 specific. Write 1 to it on the attach path. According-to: RealTek r8169-6.007.00 - For certain chips (looks like all MAC2 chips), RE_CMD_RESET will not stop the TX/RX engines; a separate command (RE_CMD_STOPREQ) must be issued before RE_CMD_RESET. According-to: RealTek r8168-8.008.00
- Pack boolean fields into re_softc.re_flags - Nuke some unused fields in re_softc
- Move PCIe chip detection into re_probe() - Panic if the passed in "max read request size" exceeds limit
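The "max read request size" handled by pcie_set_max_readrq() and checked against a limit here is the 3-bit Max_Read_Request_Size field at bits 14:12 of the PCIe Device Control register, where code 0 means 128 bytes and code 5 means 4096 bytes. The following is a minimal sketch of that encoding, assuming hypothetical helpers mrrs_encode() and mrrs_apply(); it is not DragonFly's pcie_set_max_readrq(), and it returns an error where the driver would panic.

```c
#include <stdint.h>

/* Max_Read_Request_Size lives in bits 14:12 of PCIe Device Control. */
#define PCIE_DEVCTL_MRRS_SHIFT 12
#define PCIE_DEVCTL_MRRS_MASK  (0x7u << PCIE_DEVCTL_MRRS_SHIFT)

/*
 * Hypothetical helper: map a size in bytes to its 3-bit code
 * (128 -> 0, 256 -> 1, 512 -> 2, ..., 4096 -> 5). Returns -1 for
 * sizes that are not a power of two in [128, 4096]; this is where
 * the driver code described above would panic instead.
 */
static int
mrrs_encode(uint32_t size)
{
	int code = 0;
	uint32_t s;

	for (s = 128; s <= 4096; s <<= 1, code++) {
		if (s == size)
			return code;
	}
	return -1;
}

/* Return devctl with only its MRRS field replaced; unchanged on error. */
static uint16_t
mrrs_apply(uint16_t devctl, uint32_t size)
{
	int code = mrrs_encode(size);

	if (code < 0)
		return devctl;
	devctl &= (uint16_t)~PCIE_DEVCTL_MRRS_MASK;
	devctl |= (uint16_t)(code << PCIE_DEVCTL_MRRS_SHIFT);
	return devctl;
}
```

Masking before OR-ing keeps the other Device Control bits intact, which matters because the same register also carries payload-size and error-reporting enables.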
Try recollecting TX descriptors when we are short of them in re_start()
There are 4 fields in re_hwrev
- Don't claim that a 7422 MTU is supported by various 8111/8169 chips (PCI devices);
a 6144 MTU works reliably.
Set the MTU above 6144 (6 * 1024) on these chips and run the following test:
netperf -H host -l 30 -t UDP_STREAM -- -m (mtu-28)
All kinds of weirdness will pop up on the test box.
- Set max supported MTU to 9216 for 8168D.
Obtained-from: Realtek r8168-8.008.00
- Set max supported MTU to 6144 for non-8168D GigE chips.
- Cleanup jumbo frame/MTU size related macros.
# As usual, 8169(with 88E1000 PHY) does not seem to work well with any jumbo
# frame size
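The "-m (mtu-28)" argument in the test above sizes each UDP payload so that a single IPv4 datagram exactly fills the MTU: 20 bytes of IPv4 header (no options) plus 8 bytes of UDP header. A small sketch of that arithmetic together with the per-chip MTU limits quoted in this entry; the enum and function names are hypothetical.

```c
/* Header sizes behind "-m (mtu-28)". */
#define IPV4_HDR_LEN 20	/* IPv4 header without options */
#define UDP_HDR_LEN   8

/* Hypothetical chip classes; the limits are the ones quoted in this entry. */
enum re_chip_class { RE_CHIP_8168D, RE_CHIP_OTHER_GIGE };

/* Largest UDP payload that fits in one unfragmented IPv4 packet. */
static int
udp_payload_for_mtu(int mtu)
{
	return mtu - IPV4_HDR_LEN - UDP_HDR_LEN;
}

/* Max supported MTU: 9216 for 8168D, 6144 for the other GigE chips. */
static int
re_max_mtu(enum re_chip_class c)
{
	return c == RE_CHIP_8168D ? 9216 : 6144;
}
```

With a standard 1500-byte MTU this gives the familiar 1472-byte UDP payload; at the 6144 limit it gives 6116.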
Fix hardware vlan tagging support by setting vlan information on all TX descriptors for multi-segment packets. # Even with this fix in place, 8169 still does not work reliably with vlan. # Certain packets are never seen on the wire; maybe caused by the trailing # ether frame CRC generated by the hardware?
Fix re_ioctl SIOCSIFCAP support, so that VLAN_HWTAGGING and VLAN_MTU could be turned off.
Correct jumbo frame support for 8168C/CP/D. These newer chips use an ancient design, which does _not_ support gathering RX. An even worse aspect of the new chips' design is that it is not compatible with the old ones: the buffer length field in the RX descriptor seems to be completely ignored by the hardware. This means host memory will be trashed by the hardware if the driver uses gathering RX. Allocate a jumbo buffer pool for these chips and configure the "max RX packet size" register according to the MTU.
Adjust max read request size according to MTU; 512 seems to be the only value that works with jumbo frames without "watchdog timeout" during UDP_STREAM netperf tests.
According to wpaul's comment, 8139C+ only supports 64 TX/RX descriptors
Add hardware csum offload support for MAC style 2 chips, which include 8102E, 8102EL, 8168C, 8168CP and 8168D. Obtained-from: RealTek r8101-1.009.00 r8168-8.008.00 Add RE_C_AUTOPAD capability to indicate that the hardware can correctly pad short ether frames. Turn it on for newer versions of 8168B (0x38000000 and 0xb8000000) and MAC style 2 chips; manually padding short UDP packets for newer versions of 8168B will result in incorrect UDP csum, while manually padding short ICMP packets for MAC style 2 chips will result in both incorrect IP header csum and incorrect IP length (o_O)
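Whether the driver has to pad a short frame comes down to the 60-byte minimum: the 64-byte minimum Ethernet frame less the 4-byte CRC that the NIC appends. A minimal sketch of the decision, assuming a hypothetical HYP_C_AUTOPAD bit standing in for RE_C_AUTOPAD; it only computes the pad length and does not model the checksum side effects described above.

```c
#include <stdint.h>

#define ETHER_MIN_LEN 64	/* minimum frame, including CRC */
#define ETHER_CRC_LEN  4	/* CRC appended by the NIC */

/* Hypothetical capability bit standing in for RE_C_AUTOPAD. */
#define HYP_C_AUTOPAD 0x1u

/*
 * Number of zero bytes the driver must append to a frame of 'len'
 * bytes (CRC excluded) before handing it to the NIC. Chips that pad
 * short frames themselves get nothing added by the driver.
 */
static int
tx_pad_bytes(uint32_t caps, int len)
{
	int min = ETHER_MIN_LEN - ETHER_CRC_LEN;	/* 60 bytes */

	if ((caps & HYP_C_AUTOPAD) || len >= min)
		return 0;
	return min - len;
}
```

For example, a 42-byte ICMP echo frame (14 ether + 20 IP + 8 ICMP) needs 18 bytes of padding unless the chip advertises autopad.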
- Adjust PCI latency timer on all types of chips - Adjust PCI cache line size for 8110/8169 chips - For certain revisions of 8101E, reading the MAC address from IDRx may not work; read it from EEPROM instead - Add a comment that adjusting config1 and config5 may cause unrecoverable disaster Obtained-from: RealTek Linux drivers
0x28000000 is 8168D according to Realtek r8168-8.008.00 driver
Add some PHY fixups before we do mii_phy_probe() Obtained-from: Realtek BSD driver v176
Bring in some PCI register settings from RealTek BSD driver v176. Disable the PCI register configuration for "style 2 MAC", add comment about it.
- Read ethernet address from IDRx registers. Obtained-from: RealTek BSD driver v176 This eliminates the need to read/config EEPROM. Put EEPROM related functions under RE_USE_EEPROM; disabled by default - Maintain re_softc size no matter what kernel options we are using - Remove RE_DISABLE_HWCSUM; we could do it by clearing RE_C_HWCSUM
re_softc.re_swcum_lim is applied to ethernet frame without trailing CRC, so it should include the size of ether header.
- Nuke re_type, add RE_C_8139CP to indicate the chip is 8139C+ - Change hardware revision mask from 0x7cc00000 to 0xfc800000 Obtained-from: Realtek BSD driver v176 - Convert MAC mode to MAC version and save MAC version in softc Obtained-from: Realtek BSD driver v176 - Add hardware revision 0x34800000(8102E) and 0x28000000(chip name is unknown) Obtained-from: Realtek BSD driver v176
Rework re_probe()
Rename some HWREV
Use a hardware timer to simulate interrupt moderation. Old devices will no longer be livelocked when receiving on a GigE link. Newer devices also gain a well-controlled interrupt rate. If the hardware supports interrupt moderation (e.g. 8168B, 8168C), you could also use hardware-based interrupt moderation; however, due to a lack of necessary information, it does not work as reliably as simulated interrupt moderation. It is _not_ recommended currently. By default, the simulated interrupt moderation timer is set to 75us for PCI-E devices and 125us for PCI devices.
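With simulated moderation, the timer period is the tuning knob: it bounds how often the rings are serviced. The sketch below converts a period into timer ticks, assuming the counter advances once per bus clock cycle (the related "set hardware timer according to bus clock" entries suggest this, but it is an assumption here), and shows the interrupt-rate ceiling each period implies. All names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: the timer counter increments once per bus clock cycle,
 * so a period in microseconds maps to usec * MHz ticks.
 */
static uint32_t
im_usec_to_ticks(uint32_t usec, uint32_t bus_mhz)
{
	return usec * bus_mhz;
}

/* Default simulated moderation periods quoted in this entry. */
static uint32_t
im_default_usec(int is_pcie)
{
	return is_pcie ? 75 : 125;
}

/* Upper bound on timer-driven interrupts per second for a period. */
static uint32_t
im_max_rate(uint32_t usec)
{
	return 1000000u / usec;
}
```

A 125us period on a 33MHz PCI bus, for instance, would be 4125 ticks and caps timer-driven interrupts at 8000 per second.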
- According to Realtek's BSD driver v176, we could always write to MISSEDPKT - Use pci_get_pciecap_ptr() to decide whether a given chip is PCI-E or not - Rename re_flags to re_caps; we will need a real re_flags soon
- It does not make sense to disable TX interrupt moderation - Add field in softc to store RX related interrupt bits This cleanup eases upcoming changes.
- Nuke interrupt bit definitions that don't apply to 8169 - Don't test the TX desc unavailable bit in re_intr, since it is never enabled
- Rearrange comment - Reduce RX im timer from 125us to 50us
Add RX interrupt moderation support for PCI-E GigaE chips. The interrupt moderation register position is obtained from Realtek's BSD driver v176. The meaning of the IM register bits is partially reverse engineered: RX timer position and unit. This kind of interrupt moderation does not work on PCI GigaE chips.
re_freebufmem() may be re-entered, so set the mbuf tag to NULL after it is destroyed.
- Set hardware timer according to bus clock. Adjust hardware timer to 8000HZ - For PCI-E device, increase "max read request size" from default value (512) to 4096. With 512 TX descriptors, this change gives me additional +80-90Mbps during netperf stream tests on an 8168C.
Get bus clock, which will be used to fix broken TCTR setting (hardware timer, interrupt moderation related)
Set ifq maxlen according to number of TX descriptors
Add tunable for RX/TX descriptor count
Don't assume that RE_RX_DESC_CNT and RE_TX_DESC_CNT are always same
- For relatively newer parts (8168B), setting MTPS (max transmit packet size) according to the MTU makes jumbo frame + TX csum offloading work. However, for old ones (8169), setting MTPS does not have much effect. - Reduce max jumbo frame size from 9018 to 7440 (according to DS) - Fix MTU setting in re_ioctl
Free sysctl tree during detach
Transmit csum offload does not work at all on certain hardware revisions once the frame length exceeds a certain threshold (different parts seem to have different thresholds). Borrow code from ip_output to do software csum if transmit csum offloading is enabled and the frame length exceeds the hardware's threshold. 8169, 8169S, 8169SB and 8168B are tested, while 8169S and 8169SB do not seem to have this bug.
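The software fallback computes the standard Internet checksum (RFC 1071), and the switch is a simple length test against a per-chip limit. A rough userspace sketch of both pieces, assuming a hypothetical need_sw_csum() predicate and an in_cksum_buf() helper; neither is the kernel's in_cksum() or the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over 'len' bytes in network byte order. */
static uint16_t
in_cksum_buf(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)((p[0] << 8) | p[1]);
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)(p[0] << 8);	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries */
	return (uint16_t)~sum;
}

/*
 * Hypothetical decision: keep hardware offload for frames up to the
 * per-chip limit (assumed measured without the trailing CRC) and
 * fall back to software checksumming above it.
 */
static int
need_sw_csum(int hw_csum_enabled, int frame_len, int swcsum_lim)
{
	return hw_csum_enabled && frame_len > swcsum_lim;
}
```

The checksum test vector below is the worked example from RFC 1071, whose one's-complement sum is 0xddf2.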
Print hardware revision during attach
- Don't subtract ETHER_ALIGN from the fragment length; we don't do m_adj(ETHER_ALIGN) in re_newbuf() - If recollection of one fragment of a multi-fragment packet fails, we will drop the consecutive fragments of this packet. - All of the TX descs in the TX ring could be used; there is no need to reserve RE_TXDESC_SPARE TX descs
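The fragment-dropping rule amounts to a small state machine over the RX ring: once a refill fails, every remaining fragment up to the end-of-packet descriptor is discarded, and the next fragment starts a new packet. A minimal sketch, assuming a hypothetical flattened descriptor with just an EOF bit and a refill-succeeded bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one completed RX descriptor. */
struct rx_frag {
	bool eof;	/* last fragment of a packet */
	bool buf_ok;	/* replacement buffer could be allocated */
};

/*
 * Count packets delivered from n completed fragments. A failed refill
 * marks the current packet for dropping; dropping continues through
 * the fragment marked EOF, then the next fragment begins a new packet.
 */
static size_t
rx_count_delivered(const struct rx_frag *f, size_t n)
{
	size_t delivered = 0;
	bool dropping = false;

	for (size_t i = 0; i < n; i++) {
		if (!f[i].buf_ok)
			dropping = true;
		if (f[i].eof) {
			if (!dropping)
				delivered++;
			dropping = false;
		}
	}
	return delivered;
}
```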
Factor out re_free_rxchain()
Rework allocation/freeing of DMA resources
If RX/TX ring initialization failed, then stop re(4) and return
Rework re_newbuf() and re_encap()
- Instead of using the magic number 4, define it as RE_TXDESC_SPARE - Clear if_timer only if all TX descs are free - Clear IFF_OACTIVE only if more than RE_TXDESC_SPARE TX descs are free
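The two conditions in this entry can be read as predicates over the TX ring's free count. A minimal sketch of the rules as of this commit, assuming a hypothetical tx_ring struct and HYP_TXDESC_SPARE constant standing in for RE_TXDESC_SPARE; the wrapping index helper mirrors what a RE_DESC_INC-style macro does.

```c
#include <stdbool.h>

#define HYP_TXDESC_SPARE 4	/* stands in for RE_TXDESC_SPARE */

/* Hypothetical TX ring bookkeeping. */
struct tx_ring {
	int cnt;	/* total TX descriptors */
	int free;	/* descriptors currently free */
};

/* Output may resume (IFF_OACTIVE cleared) only with spare room left. */
static bool
tx_can_clear_oactive(const struct tx_ring *r)
{
	return r->free > HYP_TXDESC_SPARE;
}

/* The watchdog (if_timer) is disarmed only once the ring is drained. */
static bool
tx_can_clear_timer(const struct tx_ring *r)
{
	return r->free == r->cnt;
}

/* Advance a ring index with wraparound. */
static int
tx_desc_inc(int idx, int cnt)
{
	return (idx + 1) % cnt;
}
```

Keeping the watchdog armed until the ring is fully empty means a stuck transmit with even one outstanding descriptor still triggers if_watchdog().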
Add support for "RealTek 8102EL PCIe 10/100baseTX". Checksum support doesn't work yet for this card so disable hardware checksumming. Submitted-by: "Mitja Horvat" <pinkfluid@gmail.com>
Always enable ETHER_INPUT_CHAIN support
Remove the '2' suffix from ether_input_chain and vlan_input; their counterparts have been gone for a long time.
Nuke INTR_NETSAFE
Remove useless assignment. Found-by: LLVM/Clang Static Analyzer
Switch to ETHER_INPUT2 on ethernet input path by default:
- Nuke old ether_input_chain and ether_demux_chain
- Nuke old vlan_input
- Nuke ETHER_INPUT2 kernel option
- Adjust comment about functions on old ether input path
- Adjust NIC drivers that are aware of ETHER_INPUT2
vlan(4):
Clearing of ifnet.if_vlantrunks is now protected in the following way
trunks = ifp->if_vlantrunks;
ifp->if_vlantrunks = NULL;
netmsg_service_sync();
kfree(trunks);
Users of ifnet.if_vlantrunks have already been adjusted to be aware of this.
bridge(4):
Clearing of ifnet.if_bridge is now protected in the following way
ifp->if_bridge = NULL;
netmsg_service_sync();
Users of ifnet.if_bridge have already been adjusted to be aware of this.
carp(4):
Remove the LK_NOWAIT lockmgr lock flags; using LK_NOWAIT was actually a
workaround for the lockmgr lock being used in the NIC's interrupt routine
(i.e. old ether_input)
Dragonfly-bug: <http://bugs.dragonflybsd.org/issue957>
ipflow:
- Now per-cpu ipflow hash table installs its own ipflow entry instead of
having ipflow entry duplicated onto each cpu
- Remove the serializer parameter to ipflow_fastforward()
- Comment out ipflow_fastforward() in ef(4) and ppp(4), they need to be
changed to fit the current ipflow cpu localization model
- Serialize re_{resume,suspend}()
- Add serializer assertion in all major NIC driver interfaces
Make re(4) aware of ETHER_INPUT_CHAIN and ETHER_INPUT2
Unify vlan_input() and vlan_input_tag(): - For device drivers that support hardware vlan tag extraction, mbuf's M_VLANTAG is turned on and vlan tag is saved in mbuf.m_pkthdr.ether_vlantag - At the very beginning of ether_input_chain(), if the packet's ether type is vlan and hardware does not extract vlan tag, vlan_ether_decap() is called to do software vlan tag extraction. - Instead of BPF_MTAP(), ETHER_BPF_MTAP() is used in ether_input_chain() to deliver possible vlan tagging information to the bpf listeners. - Ether header is restored before calling vlan_input(), so under most cases, extra ether header copy is avoided. vlan_input() does nothing more than finding vlan interface and looping back the packet to ether_input_chain() with vlan interface as input interface. Ideas-from: FreeBSD
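The software path described here (vlan_ether_decap() when the hardware did not extract the tag) boils down to reading the 802.1Q TCI that follows ethertype 0x8100, stashing it in the packet header with a valid flag, and removing the 4-byte tag. A rough sketch on a flat buffer, assuming a hypothetical struct pkt in place of an mbuf and HYP_M_VLANTAG in place of M_VLANTAG; it is not the kernel's vlan_ether_decap().

```c
#include <stdint.h>
#include <string.h>

#define HYP_ETHERTYPE_VLAN 0x8100
#define HYP_M_VLANTAG      0x1	/* stands in for the M_VLANTAG mbuf flag */

/* Hypothetical flat packet standing in for an mbuf with a pkthdr. */
struct pkt {
	uint8_t  data[1518];
	int      len;
	int      flags;
	uint16_t ether_vlantag;	/* full 16-bit TCI; VLAN ID is low 12 bits */
};

/*
 * If no tag was extracted by hardware and the ethertype at offset 12
 * is 0x8100, save the TCI, mark it valid, and close the 4-byte gap so
 * the frame looks untagged. Returns 1 if a tag was removed.
 */
static int
vlan_sw_decap(struct pkt *p)
{
	uint16_t type;

	if ((p->flags & HYP_M_VLANTAG) || p->len < 18)
		return 0;
	type = (uint16_t)((p->data[12] << 8) | p->data[13]);
	if (type != HYP_ETHERTYPE_VLAN)
		return 0;
	p->ether_vlantag = (uint16_t)((p->data[14] << 8) | p->data[15]);
	p->flags |= HYP_M_VLANTAG;
	/* Slide the inner ethertype and payload over the tag. */
	memmove(&p->data[12], &p->data[16], (size_t)(p->len - 16));
	p->len -= 4;
	return 1;
}
```

After this step a packet looks the same whether the tag was stripped by hardware or software, which is what lets the rest of the input path use a single vlan_input().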
Reduce ifnet.if_serializer contention on output path:
- Push ifnet.if_serializer holding down into each ifnet.if_output implementation
- Add a serializer into ifaltq, which is used to protect send queue instead of
its parent's if_serializer. This change has the following implications:
o On output path, enqueueing packets and calling ifnet.if_start are decoupled
o In device drivers, poll->dev_encap_ok->dequeue operation sequence is no
longer safe, instead dequeue->dev_encap_fail->prepend should be used
This serializer will be held by using lwkt_serialize_adaptive_enter()
- Add altq_started field into ifaltq, which is used to interlock the calling
of its parent's if_start, to reduce ifnet.if_serializer contention.
if_devstart(), a helper function which utilizes ifaltq.altq_started, is added
to reduce code duplication in ethernet device drivers.
- Add if_cpuid into ifnet. This field indicates on which CPU device driver's
interrupt will happen.
- Add ifq_dispatch(). This function will try to hold ifnet.if_serializer in
order to call ifnet.if_start. If this attempt fails, this function will
schedule ifnet.if_start to be called on CPU located by ifnet.if_start_cpuid
if_start_nmsg, which is per-CPU netmsg, is added to ifnet to facilitate
ifnet.if_start scheduling. ifq_dispatch() is called by ether_output_frame()
currently
- Use ifq_classic_ functions, if altq is not enabled
- Fix bugs in various device drivers' if_start implementations
- Add ktr for ifq classic enqueue and dequeue
- Add ktr for ifnet.if_start
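The dequeue->dev_encap_fail->prepend sequence can be sketched as an if_start loop that takes a packet off the queue before trying to encapsulate it and puts it back at the head on failure, so ordering is preserved. A minimal sketch, assuming a hypothetical list-based send queue instead of ifaltq and treating "encap fails" as "no free TX descriptor" (one per packet).

```c
#include <stddef.h>

/* Hypothetical singly linked send queue standing in for ifaltq. */
struct qpkt {
	struct qpkt *next;
	int          len;
};

struct sendq {
	struct qpkt *head;
};

static struct qpkt *
sq_dequeue(struct sendq *q)
{
	struct qpkt *m = q->head;

	if (m != NULL) {
		q->head = m->next;
		m->next = NULL;
	}
	return m;
}

static void
sq_prepend(struct sendq *q, struct qpkt *m)
{
	m->next = q->head;
	q->head = m;
}

/*
 * if_start-style loop: dequeue first, then try to encapsulate; if the
 * TX ring is full, return the packet to the head of the queue.
 * Returns the number of packets handed to the ring.
 */
static int
start_sketch(struct sendq *q, int *ring_free)
{
	int sent = 0;
	struct qpkt *m;

	while ((m = sq_dequeue(q)) != NULL) {
		if (*ring_free == 0) {	/* encap failed */
			sq_prepend(q, m);
			break;
		}
		(*ring_free)--;
		sent++;
	}
	return sent;
}
```

Because the driver owns the packet between dequeue and encap, nothing enqueued concurrently under the ifaltq serializer can make it hand a different packet to the hardware than the one it checked.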
Add basic support for 8111C; hardware checksum offload does not seem to work on 8111C yet.
Print unknown hardware version.
Another round of typo fixes (mostly in messages).
Add ETHER_BPF_MTAP() which will call vlan_ether_ptap() for packets whose vlan tagging is offloaded to NIC. Obtained-from: FreeBSD
- Embed ether vlan tag in mbuf packet header. Add an mbuf flag to mark that this field is valid. - Hide ifvlan after the above change; drivers that support hardware vlan tagging only need to check ether_vlantag in the mbuf packet header. - Convert all drivers that support hardware vlan tagging to use the vlan tag field in the mbuf packet header. Obtained-from: FreeBSD Change the vlan/parent serializer releasing/holding sequences into mbuf dispatching. There are several reasons to do so: - Avoid excessive vlan interface serializer releasing/holding - Touching parent interface if_snd without holding parent's serializer is unsafe - vlan's parent may disappear or be changed after vlan's serializer is released # This dispatching could be further optimized by packing all mbufs into one # netmsg using m_nextpkt to: # - Amortize netmsg sending cost # - Reduce the time that parent interface spends on serializer releasing/holding
Add a new csum flag to tell the IP defragmenter that csum_data does _not_ contain a valid IP fragment payload checksum. This flag is only intended to be used by the IP defragmenter. Currently only bce(4), bge(4) and ti(4) provide a valid IP fragment payload checksum. Turn on the new csum flag for the rest of the drivers, which support hardware TCP/UDP checksum offload but hard-wire csum_data to 0xffff, to avoid bypassing verification of the defragmented payload's checksum. Discussed-with: dillon@, hsu@ Approved-by: dillon@
PCI-E re(4) needs multi hash in reverse order. Add comment about it. Reported-by: Dennis den Brok <d.den.brok@uni-bonn.de> Obtained-from: NetBSD (tsutsui@netbsd.org)
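A sketch of what "reverse order" can mean in practice, modeled on how FreeBSD's re(4) handles it (an assumption here, not taken from this log): the top six bits of the big-endian Ethernet CRC of the address select one of 64 filter bits, and on PCIe parts the two 32-bit words are swapped and byte-reversed before being written to MAR0/MAR4. The function names are hypothetical; crc32_be() follows the bit-at-a-time BSD ether_crc32_be() algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian Ethernet CRC-32, bit at a time (BSD ether_crc32_be style). */
static uint32_t
crc32_be(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		uint8_t c = buf[i];

		for (int bit = 0; bit < 8; bit++, c >>= 1) {
			uint32_t carry = ((crc & 0x80000000u) ? 1u : 0u) ^ (c & 1u);

			crc <<= 1;
			if (carry)
				crc = (crc ^ 0x04c11db6u) | carry;
		}
	}
	return crc;
}

static uint32_t
bswap32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0xff00u) |
	    ((x << 8) & 0xff0000u) | (x << 24);
}

/*
 * Set one address's bit in the 64-bit hash filter: hash values 0-31
 * go to word 0, 32-63 to word 1. For PCIe parts the words are assumed
 * swapped and byte-reversed before being written to MAR0/MAR4.
 */
static void
mc_hash(const uint8_t addr[6], int pcie, uint32_t mar[2])
{
	uint32_t hashes[2] = { 0, 0 };
	uint32_t h = crc32_be(addr, 6) >> 26;

	if (h < 32)
		hashes[0] |= 1u << h;
	else
		hashes[1] |= 1u << (h - 32);

	if (pcie) {
		mar[0] = bswap32_sketch(hashes[1]);
		mar[1] = bswap32_sketch(hashes[0]);
	} else {
		mar[0] = hashes[0];
		mar[1] = hashes[1];
	}
}
```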
Add support for a new revision of the RealTek 8168B/8111B called SPIN3. Requested-by: d.den.brok@uni-bonn.de (Dennis den Brok)
Nuke "is is" stammering.
Yet another RTL8110SC Obtained-from: FreeBSD (remko@freebsd.org)
PCIe re(4) can't handle TCP csum offloading well if short packets are padded by the driver, which was intended to fix a PCI re(4) csum offloading bug. It turns out both PCI and PCIe re(4) _can_ handle short packet TCP csum offloading without the driver's interference, so padding for short TCP packets is avoided. Obtained-from: FreeBSD (wpaul@freebsd.org) Tested-by: Joe Talbott <josepht@cstone.net> RTL8101E(PCIe) me RTL8169S(PCI) RTL8169SB(PCI)
By default do not enable hardware csum on PCIe re(4), which trashes packets intermittently if csum offload is enabled. This problem does not seem to plague PCI re(4). The pattern of trashed packets is not yet identified. From the tcpdump information provided by Joe, the packets' size should not be the direct cause. Hardware bug? Reported-by: Joe Talbott <josepht@cstone.net> (RTL8101E, PCIe re(4)) # Same problem is reported to FreeBSD by two RTL8168B(PCIe re(4)) users.
- Don't call m_adj() to make the RX buffer's _payload_ longword aligned, because some re(4) chips (e.g. RTL8101E) require RX buffers to be 8-byte aligned. This change shows no noticeable performance change. Reported-by: Joe Talbott <josepht@cstone.net> - Avoid writing extra hardware registers by writing 2 bytes to IDR4 instead of writing 4 bytes, because: 1) the extra two registers after IDR5 are reserved. 2) accessing arpcom.ac_enaddr[6,7] should be invalid. - Add a flag field in re_softc and re_hwrev. Currently only one flag, RE_F_HASMPC, is defined. This flag is used to indicate whether the hardware has the MPC register or not, so we can avoid writing to MPC's position if that position is reserved. - Move descriptor ring address setup before RX/TX enabling, since some re(4) chips (e.g. RTL8101E) will try accessing the descriptor ring immediately after RX/TX is enabled, which results in intermittent kernel panics or system hangs. Paniced-by: Joe Talbott <josepht@cstone.net> - Avoid calling re_init() if hw.reX.tx_moderation is changed but the NIC is not up yet. - Const-ify global hardware id arrays and nuke an unused macro while I'm here. Thanks to Joe Talbott <josepht@cstone.net> for helping debug and providing valuable information (esp. locating the problematic RX/TX enabling :) Thanks to dillon@ for providing debugging hints. Tested-by: Joe Talbott <josepht@cstone.net> (RTL8101E) swildner@ (onboard RTL8169S) (*) me (RTL8169S) # (*) swildner@'s card is still half broken even after this commit :\
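The IDR change can be illustrated by programming a 6-byte station address with one 4-byte write at IDR0 and one 2-byte write at IDR4, so bytes 6 and 7 (reserved) are never written and nothing past enaddr[5] is read. A minimal sketch, assuming a hypothetical byte-array register window and little-endian csr_write_4()/csr_write_2() stand-ins for the real bus-space accessors.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register window: offsets 0-5 are IDR0-IDR5, 6-7 reserved. */
static uint8_t regs[8];

static void
csr_write_4(int off, uint32_t v)	/* little-endian 32-bit write */
{
	for (int i = 0; i < 4; i++)
		regs[off + i] = (uint8_t)(v >> (8 * i));
}

static void
csr_write_2(int off, uint16_t v)	/* little-endian 16-bit write */
{
	regs[off] = (uint8_t)v;
	regs[off + 1] = (uint8_t)(v >> 8);
}

/* One 4-byte write to IDR0 plus one 2-byte write to IDR4. */
static void
set_station_addr(const uint8_t enaddr[6])
{
	uint32_t lo = (uint32_t)enaddr[0] | (uint32_t)enaddr[1] << 8 |
	    (uint32_t)enaddr[2] << 16 | (uint32_t)enaddr[3] << 24;
	uint16_t hi = (uint16_t)(enaddr[4] | enaddr[5] << 8);

	csr_write_4(0, lo);
	csr_write_2(4, hi);
}
```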
Sync re(4) with FreeBSD: - Add support for RealTek 8169SC/8110SC and RTL8101E devices. The latter is a PCIe 10/100 chip. - Add support for RealTek RTL8168(B?) - Fix EEPROM reading code - Disable diagnostic code in re_attach() by default. It is almost useless and has caused much trouble. - Manually pad small IP datagrams to work around a hardware checksum offload bug [1]. Enable IP/TCP/UDP checksum offload after this fix. - Work around a hardware TX bug in some PCIe re(4) devices: the TX command, which is issued when there is transmission in progress, will get lost [2]. So at the end of re_txeof(), if there are still packets sitting in the TX ring, we kick the TX engine again. - Add a sysctl hw.reX.tx_moderation to turn TX moderation on/off. It is on by default. - Move softc related structs from if_rereg.h into newly created if_revar.h Thank Bill Paul (wpaul@freebsd.org) and many other people for their work on this driver. # # [1] Detailed description of this bug is at: # FreeBSD dev/re/if_re.c rev1.70 by wpaul@freebsd.org # # [2] Detailed description of this bug is at: # FreeBSD dev/re/if_re.c rev1.71 by wpaul@freebsd.org #
Do a major clean-up of the BUSDMA architecture. A large number of essentially machine-independent drivers use the structures and definitions in machine-dependent directories that are really machine-independent in nature. Split <machine/bus_dma.h> into machine-dependent and machine-independent parts and make the primary access run through <sys/bus_dma.h>. Remove <machine/bus.h>, <machine/bus_memio.h> and <machine/bus_pio.h>. The optimizations related to bus_memio.h and bus_pio.h made a huge mess, introduced machine-specific knowledge into essentially machine-independent drivers, and required specific #include file orderings to do their job. They may be reintroduced in some other form later on. Move <machine/resource.h> to <sys/bus_resource.h>. The contents of the file are machine-independent or can be made a superset across many platforms. Make <sys/bus.h> include <sys/bus_dma.h> and <sys/bus_resource.h> and include <sys/bus.h> where necessary. Remove all #include's of <machine/resource.h> and <machine/bus.h>. That is, make the BUSDMA infrastructure integral to I/O-mapped and memory-mapped accesses to devices and remove a large chunk of machine-specific dependencies from drivers. bus_if.h and device_if.h are now required to be present when using <sys/bus.h>.
Add support for Linksys EG1032 rev.3 GigE Obtained-from: FreeBSD (jhb@freebsd.org)
- Use RE_RX_LIST_SIZE instead of RE_TX_LIST_SIZE while dealing with RX DMA resources, though currently RE_RX_LIST_SIZE == RE_TX_LIST_SIZE Obtained-from: FreeBSD (jmg@freebsd.org) - Use BUS_DMASYNC_PREWRITE instead of BUS_DMASYNC_PREWRITE|BUS_DMASYNC_PREREAD; the latter does not apply to DragonFly
Add support for Corega CG-LAPCIGT Gigabit Ethernet (8169S) Obtained-from: FreeBSD
Rename malloc->kmalloc, free->kfree, and realloc->krealloc. Pass 1
Use pcidevs.h.
MFC serializer fixes by Sepherosa Ziehau. Primarily reorder the call to ether_ifdetach and ieee80211_ifdetach and make the calls without holding the serializer to avoid a panic.
{ether,ieee80211}_ifdetach() can't be called with serializer being held, since
they will go through code which tries to hold serializer again, e.g.
ether_ifdetach() -> if_detach() -> in_control()
So in various NICs' xxx_detach():
- Move bus_teardown_intr() under "(device_is_attached())", whenever it is
applicable. Since it is not possible that intrhandle is NULL here, nuke
original "(intrhandle != NULL)". This can:
1) Avoid holding serializer, if xxx_attach() fails
2) Release serializer ASAP
3) Ease following tasks
- Hold serializer only for xxx_stop()(or similar functions which stops NIC) and
bus_teardown_intr()
- Call {ether,ieee80211}_ifdetach() after serializer is released
Other changes:
- Serialize xxx_detach() for awi(4), ep(4), sn(4) and xe(4)
- Release serializer before returning from {ed_pccard,ray}_detach()
- Make ipw(4)'s ipw_detach() suitable for error handling, adjust ipw_attach()
accordingly
- Fix a bug in ex_pccard_detach(): instead of if_detach(), ether_ifdetach()
should be used here
- For ndis(4), "ifp->serializer" ==> "ifp->if_serializer"
Reported-by: esmith <esmith@postmark.net>
Discussed-with: dillon and joerg
Partially-Reviewed-by: dillon and joerg
Reported-by: Steve Mynott <steve.mynott@gmail.com> and me
Add support for DLink 528(T) Gigabit cards. Submitted-by: Gary Allan <dragonfly@gallan.plus.com> Taken-from: FreeBSD
Make all network interrupt service routines MPSAFE part 1/3. Replace the critical section that was previously used to serialize access with the LWKT serializer. Integrate the serializer into the IFNET structure. Note that kern.intr_mpsafe must be set to 1 for network interrupts to actually run MPSAFE. Also note that any interrupts shared with other non-MP drivers will cause all drivers on that interrupt to run with the Big Giant Lock. Network interrupt - Each network driver then simply passes that serializer to bus_setup_intr() so only a single serializer is required to process the entire interrupt path. LWKT serialization support is already 100% integrated into the interrupt subsystem so it will already be held as of when the registered interrupt procedure is called. Ioctl and if_* functions - All callers of if_* functions (such as if_start, if_ioctl, etc) now obtain the IFNET serializer before making the call. Thus all of these entry points into the driver will now be serialized. if_input - All code that calls if_input now ensures that the serializer is held. It will either already be held (when called from a driver), or the serializer will be wrapped around the call. When packets are forwarded or bridged between interfaces, the target interface serializer will be dropped temporarily to avoid a deadlock. Device Driver access - dev_* entry points into certain pseudo-network devices now obtain and release the serializer. This had to be done on a device-by-device basis (but there are only a few such devices). Thanks to several people for helping test the patch, in particular Sepherosa Ziehau.
Fix the design of ifq_dequeue/altq_dequeue by adding an mbuf pointer and requiring that a polled mbuf be passed as an argument to the dequeue function. Assert that the passed argument matches the mbuf that is actually dequeued. Also remove assignments of the return value from ifq_dequeue() in such cases which implied that the mbuf might be different when, in fact, it had better not be.
- Move DEVICE_POLLING from opt_global.h to opt_polling.h(newly added),
so that polling(4) can be enabled in modules that are not built
during kernel building
- Add opt_polling.h to files that depend on DEVICE_POLLING
- Change related netif modules' Makefile to enable polling(4) support
- Add comment in net/if_var.h to prevent DEVICE_POLLING related
incompatibilities from being introduced
Suggested-by: dillon
NOTE: As of this commit, any file that will depend on DEVICE_POLLING
*must* include opt_polling.h at its beginning
With-helps-from: joerg
Reviewed-by: dillon, submit@
Remove the INTR_TYPE_* flags. The interrupt type is no longer used to figure out which spl*() set an interrupt belongs to, because, well, spl's no longer exist.
For bge(4), dc(4), lge(4), ndis(4), nge(4), pcn(4), re(4), sis(4), sk(4), ti(4) - Do not start the tx engine or set if_timer if there is nothing to be sent - Let if_watchdog() kick if_start(). This may avoid a possible race (in the future) between testing/setting if_timer and calling if_watchdog(). Only bge(4), re(4), sk(4) and ti(4) require this change. The rest of the drivers affected by this commit already have this in place. Discussed-with: joerg Reviewed-by: joerg
Convert to critical sections. No need to protect the interrupt from racing against itself.
The header type of a mbuf doesn't change when appended onto a chain.
Rewrite the polling code. Instead of trying to do fancy polling enablement from inside the IF interrupt itself, which creates a headache in the code, simply allow IFF_POLLING to be set and cleared via ifconfig. This greatly simplifies both the networking code and the polling code and allows polling to be enabled and disabled at will on a per-network-interface basis. * Drivers no longer have to have polling checks in the interrupt path. * An if_poll function vector has been added. Polling is supported if the driver initializes the vector. * Registration command added to the poll function command list. * Driver code for registration and deregistration is now greatly simplified. The kernel polling code no longer randomly turns off the polling bit if an interface goes down or is reset. Remove IFCAP_POLLING, it serves no purpose. Fix a couple of bugs in the serializer code. Add a warning in nexus_setup_intr if a driver tries to specify a serializer and an SPL. A driver can specify one or the other, not both. Convert the EM driver to use the new serializer API instead of SPLs. Add ifconfig poll and ifconfig -poll support to ifconfig, and fix bugs in the rtsock code that only returned the low 16 bits of the interface flags so ifconfig properly reports when polling mode is turned on for an interface. NOTE to people using polling. You must first enable polling via kern.polling.enable, and then may specify the 'poll' directive in ifconfig to enable it on a per interface basis. If IFF_POLLING refuses to be set, the device does not support polling.
Get rid of bus_{disable,enable}_intr(), it wasn't generic enough for
our needs.
Implement some generic atomic.h functions to aid in the implementation of
a low level mutex.
Implement a generic low level sleep-mutex serializer, kern/lwkt_serialize.c.
The serializer is designed to be a replacement for SPL calls but may also
be used for other very low level work (e.g. lockmgr interlocks).
Add a serializer argument to BUS_SETUP_INTR(). When non-NULL, the interrupt
handler will no longer be protected by an SPL so e.g. spl*() will no
longer protect against that device's interrupts.
The IF queueing and dequeueing mechanisms may no longer depend on outside
SPL state because network driver interrupt handlers are no longer required to
enter splnet(). Use critical sections for the moment. The IFQ and
IFF_OACTIVE interactions are not yet MP safe.
Fix a bug introduced earlier. We can't put packets back into the queue with ALTQ, so we have to handle the case of errors after m_defrag has been called directly. We also have to remove the packets from the queue before we free them to avoid races. The same applies to calling bpf_mtap, which has to be done on the defragmented packet.
ALTQ support.
Import ALTQ support from KAME. This is based on the FreeBSD 4 snapshot. This includes neither the ALTQ3 compat code nor the !DragonFly defines. The macros have been replaced with inline functions in net/ifq_var.h. This also renames pkthdr.pf_flags as it is intended as general flag bit. Currently supported are ppp(4), sppp(4), tun(4) and wi(4), more drivers are coming later. Reviewed-by: corecode, dillon, hsu Comments-from: hmp
Disable hardware checksum support by default; it produces packet corruption. It is unclear why the corruption occurs, but certain fragmented packets consistently reproduce it, so there's a good chance that there are alignment or length requirements that we don't know about, or just pure hardware brokenness with certain packets. The checksumming can be turned on again with ifconfig for testing purposes.
Extract from Aggelos's packet dumps:
CLIENT: (RE0)
11:43:27.261710 192.168.2.2.183764104 > 192.168.2.4.nfs: 1472 write [|nfs] (frag 8031:1480@0+)
11:43:27.261718 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@1480+)
11:43:27.261729 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@2960+)
11:43:27.261743 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@4440+)
11:43:27.261756 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@5920+)
11:43:27.261767 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@7400+)
11:43:27.261781 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@8880+)
11:43:27.261793 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@10360+)
11:43:27.261807 192.168.2.2 > 192.168.2.4: udp (frag 8031:4@11840)
SERVER: (RL0)
13:56:59.783671 192.168.2.2.183764104 > 192.168.2.4.nfs: 1472 write [|nfs] (frag 8031:1480@0+)
13:56:59.783785 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@1480+)
13:56:59.783915 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@2960+)
13:56:59.784037 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@4440+)
13:56:59.784159 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@5920+)
13:56:59.784283 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@7400+)
13:56:59.784407 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@8880+)
13:56:59.784527 192.168.2.2 > 192.168.2.4: udp (frag 8031:1480@10360+)
13:56:59.784532 0.0.0.0 > 0.0.2.4: udp (frag 8031:4@11840)
Reported-by: aoiko@cc.ece.ntua.gr (Aggelos)
Note-Also: also turned off in FreeBSD /usr/src/sys/dev/re/if_re.c:1.37
Release the correct resource in re_detach; this is PCI_LOIO now. Remove a RE_DESC_INC from re_rxeof, a left-over from the while loop.
Forced commit to annotate the (unrelated) changes from the last commit. RealTek doesn't seem to support memory-mapped IO for re(4), the card generates an interrupt storm under pretty low load. Therefore change re(4) to the slower port-mapped IO.
Change (almost) all references to tqh_first and tqe_next and tqe_prev to the correct TAILQ macros. Exceptions are contrib/ipfilter, which will be handled separately, and dev/misc/labpc, which does some very weird things and therefore needs much more care.
Unify the input handling of the low-level network stack by introducing a new field if_input in struct ifnet. Initialize if_input and if_output in the low-level _ifattach routines. Make the _output and _input routines static; they are now called via (*ifp->if_input) and (*ifp->if_output) accordingly. The exception is ether_input, which is still used with the second argument, the pointer to the Ethernet header, instead of always taking it from the mbuf. Move the if_attach and bpfattach from the devices into fddi_ifattach, atm_ifattach. Remove the first argument to VLAN_INPUT_TAG, the pointer to the Ethernet header. Expect it at the beginning of the mbuf. Adjust the network drivers for the changed API. Exceptions are wl(4), le(4), ie(4), el(4), ed(4) and de(4), because they use an on-stack Ethernet header. Another exception is the ATM stack, which uses a fourth argument to atm_input. Inspired-by: NetBSD net/if.h, rev 1.36
Don't init sc->re_timer twice.
Add re(4) as kernel module. After some feedback, this will be added to the GENERIC. Obtained-from: FreeBSD