DragonFly BSD

CVS log for src/sys/kern/kern_device.c


Keyword substitution: kv
Default branch: MAIN


Revision 1.27
Mon Jul 23 18:59:50 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
CVS tags: HEAD, DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12, DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10, DragonFly_Preview
Changes since revision 1.26: +1 -1 lines
The disk layer must not inherit the D_TRACKCLOSE flag from the underlying
device as this will confuse the disk layer's tracking of opens and closes.

This bug caused the disk layer to lose track of which slices and partitions
were open when a slice or partition was opened multiple times.
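The fix described above comes down to masking one flag off when the disk layer derives its own flags from the underlying device. A minimal sketch, assuming made-up flag values and a hypothetical helper named disk_layer_flags() (neither is taken from the kernel source):

```c
#include <assert.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define D_DISK		0x0002u
#define D_TRACKCLOSE	0x0008u

/*
 * Derive the flags for the disk layer's own device from the flags of the
 * underlying raw device.  D_TRACKCLOSE is masked off so the disk layer
 * keeps doing its own open/close accounting.
 */
static unsigned int
disk_layer_flags(unsigned int raw_flags)
{
	unsigned int flags = raw_flags & ~D_TRACKCLOSE;

	assert((flags & D_TRACKCLOSE) == 0);
	return (flags);
}
```

All other flags pass through unchanged; only the close-tracking behavior is decided by the disk layer itself.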

Revision 1.26
Thu May 17 03:01:59 2007 UTC (7 years, 5 months ago) by dillon
Branches: MAIN
Changes since revision 1.25: +6 -0 lines
Add dev_drefs() - return the number of references on a cdev_t

Revision 1.25
Tue May 15 22:44:14 2007 UTC (7 years, 5 months ago) by dillon
Branches: MAIN
Changes since revision 1.24: +1 -1 lines
* The diskslice abstraction now stores offsets/sizes as 64 bit quantities.
  (NOTE: DOS partition tables and standard disklabels can't handle 64 bit
  sector numbers yet).  For future pluggable disklabel/partitioning schemes.

* The kernel panic / kernel core API is now 64 bits.

* The VN device now uses 64 bit sector numbers and can handle block devices
  up to what is supported by the filesystem (typically 8TB).  This change
  was made primarily so we can test future disklabel / partition table
  support.

* Pass 64 bit LBAs to various block devices and to the SCSI layer.

* Check for and assert 32 bit overflow conditions in various places, instead
  of wrapping.

Revision 1.24
Wed May 9 00:53:34 2007 UTC (7 years, 5 months ago) by dillon
Branches: MAIN
Changes since revision 1.23: +115 -39 lines
Give the device major / minor numbers their own separate 32 bit fields
in the kernel.  Change dev_ops to use a RB tree to index major device
numbers and remove the 256 device major number limitation.

Build a dynamic major number assignment feature into dev_ops_add() and
adjust ASR (which already had a hand-rolled one), and MFS to use the
feature.  MFS at least does not require any filesystem visibility to
access its backing device.  Major device numbers >= 256 are used for
dynamic assignment.

Retain filesystem compatibility for device numbers that fall within the
range that can be represented in UFS or struct stat (which is a single
32 bit field supporting 8 bit major numbers and 24 bit minor numbers).

Revision 1.23
Sun Apr 29 06:11:19 2007 UTC (7 years, 5 months ago) by dillon
Branches: MAIN
Changes since revision 1.22: +1 -1 lines
Remove unneeded references to sys/syslink.h.  Get syslink_desc from
sys/syslink_rpc.h

Revision 1.22
Sat Dec 23 00:35:04 2006 UTC (7 years, 10 months ago) by swildner
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Changes since revision 1.21: +7 -7 lines
Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

Revision 1.21
Tue Sep 26 18:57:13 2006 UTC (8 years ago) by dillon
Branches: MAIN
Changes since revision 1.20: +2 -0 lines
Follow up to kern_conf.c 1.16.  We can't just ignore the ops comparison; it
is needed to keep user-invisible devices user-invisible.  Add a flag so
hashdev() knows when it can ignore the comparison and when it can't.

Revision 1.20
Sun Sep 10 01:26:39 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Changes since revision 1.19: +19 -19 lines
Change the kernel dev_t, representing a pointer to a specinfo structure,
to cdev_t.  Change struct specinfo to struct cdev.  The name 'cdev' was taken
from FreeBSD.  Remove the dev_t shim for the kernel.

This commit generally removes the overloading of 'dev_t' between userland and
the kernel.

Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being
properly converted to a userland dev_t.

Revision 1.19
Tue Sep 5 00:55:45 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Changes since revision 1.18: +3 -3 lines
Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 1

Revision 1.18
Fri Jul 28 02:17:40 2006 UTC (8 years, 2 months ago) by dillon
Branches: MAIN
Changes since revision 1.17: +495 -432 lines
MASSIVE reorganization of the device operations vector.  Change cdevsw
to dev_ops.  dev_ops is a syslink-compatible operations vector structure
similar to the vop_ops structure used by vnodes.

Remove a huge number of instances where a thread pointer is still being
passed as an argument to various device ops and other related routines.
The device OPEN and IOCTL calls now take a ucred instead of a thread pointer,
and the CLOSE call no longer takes a thread pointer.

Revision 1.17
Sun Apr 30 17:22:17 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Changes since revision 1.16: +2 -1 lines
Replace the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF
b_flags with a separate b_cmd field.  Use b_cmd to test for I/O completion
as well (getting rid of B_DONE in the process).  This further simplifies
the setup required to issue a buffer cache I/O.

Remove a redundant header file, bus/isa/i386/isa_dma.h and merge any
discrepancies into bus/isa/isavar.h.

Give ISADMA_READ/WRITE/RAW their own independent flag definitions instead of
trying to overload them on top of B_READ, B_WRITE, and B_RAW.  Add a
routine isa_dmabp() which takes a struct buf pointer and returns the ISA
dma flags associated with the operation.

Remove the 'clear_modify' argument to vfs_busy_pages().  Instead,
vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses
it to determine the action it must take.

Revision 1.16
Fri Feb 17 19:18:06 2006 UTC (8 years, 8 months ago) by dillon
Branches: MAIN
Changes since revision 1.15: +34 -3 lines
Make the entire BUF/BIO system BIO-centric instead of BUF-centric.  Vnode
and device strategy routines now take a BIO and must pass that BIO to
biodone().  All code which previously managed a BUF undergoing I/O now
manages a BIO.

The new BIO-centric algorithms allow BIOs to be stacked, where each layer
represents a block translation, completion callback, or caller or device
private data.  This information is no longer overloaded within the BUF.
Translation layer linkages remain intact as a 'cache' after I/O has completed.

The VOP and DEV strategy routines no longer make assumptions as to which
translated block number applies to them.  They use the block number in the
BIO specifically passed to them.

Change the 'untranslated' constant to NOOFFSET (for bio_offset), and
(daddr_t)-1 (for bio_blkno).  Rip out all code that previously set the
translated block number to the untranslated block number to indicate
that the translation had not been made.

Rip out all the cluster linkage fields for clustered VFS and clustered
paging operations.  Clustering now occurs in a private BIO layer using
private fields within the BIO.

Reformulate the vn_strategy() and dev_dstrategy() abstraction(s).  These
routines no longer assume that bp->b_vp == the vp of the VOP operation, and
the dev_t is no longer stored in the struct buf.  Instead, only the vp passed
to vn_strategy() (and related *_strategy() routines for VFS ops), and
the dev_t passed to dev_dstrategy() (and related *_strategy() routines for
device ops) is used by the VFS or DEV code.  This will allow an arbitrary
number of translation layers in the future.

Create an independent per-BIO tracking entity, struct bio_track, which
is used to determine when I/O is in-progress on the associated device
or vnode.

NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold
the fields describing the data buffer, resid, and error state.
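The stacking idea described above, where each layer carries its own block translation, completion callback, and private data, can be sketched roughly as follows. The struct and function names are hypothetical and heavily simplified; in particular, walking every layer's callback from one loop is a simplification made for illustration, not a description of how biodone() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layer; field names are illustrative only. */
struct bio_layer {
	struct bio_layer *prev;			/* layer that pushed this one */
	int64_t		  offset;		/* translation for this layer */
	void		(*done)(struct bio_layer *); /* completion callback */
	void		 *info;			/* layer-private data */
};

/* Push a new layer below 'upper', translating its offset by 'base'. */
static void
bio_layer_push(struct bio_layer *upper, struct bio_layer *lower, int64_t base,
    void (*done)(struct bio_layer *), void *info)
{
	assert(upper != NULL && lower != NULL);
	lower->prev = upper;
	lower->offset = upper->offset + base;
	lower->done = done;
	lower->info = info;
}

/* Complete I/O on the lowest layer, then walk back up the stack. */
static int
bio_layer_done(struct bio_layer *bio)
{
	int called = 0;

	for (; bio != NULL; bio = bio->prev) {
		if (bio->done != NULL) {
			bio->done(bio);
			called++;
		}
	}
	return (called);
}

/* Example callback: counts completions in the int that 'info' points to. */
static void
count_done(struct bio_layer *bio)
{
	(*(int *)bio->info)++;
}
```

Because each layer keeps its own offset, a strategy routine only needs to look at the layer it was handed, which is the property the commit message emphasizes.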

Major-testing-by: Stefan Krueger

Revision 1.15
Wed Mar 23 02:50:53 2005 UTC (9 years, 7 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable, DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4, DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Changes since revision 1.14: +18 -3 lines
Because destroy_all_dev() checks the mask/match against the device's si_udev,
which is a field combining both major and minor numbers, we must mask off the
major bits (to retain only the minor bits) from the 'mask' variable in order
to allow a generic -1 to be passed as the mask.  Otherwise we will not match
anything.
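A small sketch of the masking issue described above. The helper name udev_matches() and the bit layout (major in bits 8-15 of the combined number) are assumptions made for illustration; the point is only that the major bits are stripped from the mask before comparing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: major in bits 8-15 of the combined number. */
#define UDEV_MAJOR_BITS	0x0000ff00U
#define UDEV_MINOR_BITS	0xffff00ffU

/*
 * Return nonzero if the minor portion of 'udev' satisfies mask/match.
 * The major bits are removed from the mask first, so a generic mask of
 * -1 compares minor numbers only.
 */
static int
udev_matches(uint32_t udev, uint32_t mask, uint32_t match)
{
	mask &= UDEV_MINOR_BITS;
	assert((mask & UDEV_MAJOR_BITS) == 0);
	return ((udev & mask) == (match & mask));
}
```

Without the first line of the function, a device with major 4 and minor 3 would never match (mask = -1, match = 3), because the major bits in the combined number would take part in the comparison.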

Revamp a good chunk of the documentation to try to make the major/minor
number masking issues clear.

Bug-found-by: Chuck Tuffli <chuck_tuffli@agilent.com>

Revision 1.14
Mon Feb 21 18:56:05 2005 UTC (9 years, 8 months ago) by dillon
Branches: MAIN
Changes since revision 1.13: +6 -1 lines
Track the last read and last write timestamp at the device level and modify
the stat code to retrieve the information.  This is so devices such as ttys
report the correct access/modified time for the 'w' and related utilities.
NOTE: the inode still needs to be updated at CLOSE time to record the last
accessed and modified times persistently, and this is not yet occurring.

This is necessary because device read/write now bypasses the filesystem VOP
code.

Revision 1.13
Wed Sep 15 03:21:03 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004
Changes since revision 1.12: +1 -1 lines
Don't complain when a cdevsw with non-zero refs is being removed if it still
has links to other mask/match sets.  Add misc comments to the code.

Revision 1.12
Wed Sep 15 01:48:09 2004 UTC (10 years, 1 month ago) by dillon
Branches: MAIN
Changes since revision 1.11: +5 -7 lines
Improve error reporting when the cdevsw code detects problems.

Revision 1.11
Wed May 19 22:52:58 2004 UTC (10 years, 5 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap13Sep2004, DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Changes since revision 1.10: +173 -262 lines
Device layer rollup commit.

* cdevsw_add() is now required.  cdevsw_add() and cdevsw_remove() may specify
  a mask/match indicating the range of supported minor numbers.  Multiple
  cdevsw_add()'s using the same major number, but distinctly different
  ranges, may be issued.  All devices that failed to call cdevsw_add() before
  now do.

* cdevsw_remove() now automatically marks all devices within its supported
  range as being destroyed.

* vnode->v_rdev is no longer resolved when the vnode is created.  Instead,
  only v_udev (a newly added field) is resolved.  v_rdev is resolved when
  the vnode is opened and cleared on the last close.

* A great deal of code was making rather dubious assumptions with regard
  to the validity of devices associated with vnodes, primarily due to
  the persistence of a device structure due to being indexed by (major, minor)
  instead of by (cdevsw, major, minor).  In particular, if you run a program
  which connects to a USB device and then you pull the USB device and plug
  it back in, the vnode subsystem will continue to believe that the device
  is open when, in fact, it isn't (because it was destroyed and recreated).

  In particular, note that all the VFS mount procedures now check devices
  via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev
  is NULL prior to the first open.

* The disk layer's device interaction has been rewritten.  The disk layer
  (i.e. the slice and disklabel management layer) no longer overloads
  its data onto the device structure representing the underlying physical
  disk.  Instead, the disk layer uses the new cdevsw_add() functionality
  to register its own cdevsw using the underlying device's major number,
  and simply does NOT register the underlying device's cdevsw.  No
  confusion is created because the device hash is now based on
  (cdevsw,major,minor) rather than (major,minor).

  NOTE: This also means that underlying raw disk devices may use the entire
  device minor number instead of having to reserve the bits used by the disk
  layer, and also means that we can (theoretically) stack a fully
  disklabel-supported 'disk' on top of any block device.

* The new reference counting scheme prevents the stale device associations
  described above by associating a device with a cdevsw and disconnecting
  the device from its cdevsw when the cdevsw is removed.  Additionally, all
  udev2dev() lookups run through the cdevsw mask/match and only successfully
  find devices still associated with an active cdevsw.

* Major work on MFS:  MFS no longer shortcuts vnode and device creation.  It
  now creates a real vnode and a real device and implements real open and
  close VOPs.  Additionally, due to the disk layer changes, MFS is no longer
  limited to 255 mounts.  The new limit is 16 million.  Since MFS creates a
  real device node, mount_mfs will now create a real /dev/mfs<PID> device
  that can be read from userland (e.g. so you can dump an MFS filesystem).

* BUF AND DEVICE STRATEGY changes.  The struct buf contains a b_dev field.
  In order to properly handle stacked devices we now require that the b_dev
  field be initialized before the device strategy routine is called.  This
  required some additional work in various VFS implementations.  To enforce
  this requirement, biodone() now sets b_dev to NODEV.  The new disk layer
  will adjust b_dev before forwarding a request to the actual physical
  device.

* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.
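The mask/match registration and lookup rules in the list above can be sketched with a small table. Everything here (struct ops_reg, ops_lookup(), the 'active' field) is hypothetical and stands in for the real cdevsw bookkeeping; it only illustrates several registrations sharing one major number with distinct minor ranges, and lookups failing once a registration is removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration record; not the kernel's cdevsw structure. */
struct ops_reg {
	const char *name;
	uint32_t    major;
	uint32_t    mask;	/* which minor bits are compared */
	uint32_t    match;	/* required value of those bits */
	int	    active;	/* cleared when the registration is removed */
};

/*
 * Find the active registration that owns (major, minor), or NULL.
 * A removed registration no longer resolves any device.
 */
static const struct ops_reg *
ops_lookup(const struct ops_reg *regs, size_t n, uint32_t major,
    uint32_t minor)
{
	assert(regs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (regs[i].active && regs[i].major == major &&
		    (minor & regs[i].mask) == regs[i].match)
			return (&regs[i]);
	}
	return (NULL);
}
```

With two entries for the same major, one matching minors with bit 0x80 clear and one with it set, each minor number resolves to exactly one registration.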

Testing by: lots of people, but David Rhodus found the most egregious bugs.

Revision 1.10
Thu May 13 23:49:23 2004 UTC (10 years, 5 months ago) by dillon
Branches: MAIN
Changes since revision 1.9: +1 -19 lines
device switch 1/many: Remove d_autoq, add d_clone (where d_autoq was).

d_autoq was used to allow the device port dispatch to mix old-style synchronous
calls with new style messaging calls within a particular device.  It was never
used for that purpose.

d_clone will be more fully implemented as work continues.  We are going to
install d_port in the dev_t (struct specinfo) structure itself and d_clone
will be needed to allow devices to 'revector' the port on a minor-number
by minor-number basis, in particular allowing minor numbers to be directly
dispatched to distinct threads.  This is something we will be needing later
on.

Revision 1.9
Tue Apr 20 01:52:22 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Changes since revision 1.8: +7 -3 lines
Revamp the initial lwkt_abortmsg() support to normalize the abstraction.  Now
a message's primary command is always processed by the target even if an
abort is requested before the target has retrieved the message from the
message port.  The message will then be requeued and the abort command copied
into lwkt_msg_t->ms_cmd.  Thus the target is always guaranteed to see the
original message and then a second, abort message (the same message with
ms_cmd = ms_abort) regardless of whether the abort was requested before
or after the target retrieved the original message.

ms_cmd is now an opaque union.  LWKT makes no assumptions as to its contents.
The NET code now stores nm_handler in ms_cmd as a function vector, and
nm_handler has been removed from all netmsg structures.

The ms_cmd function vector support nominally returns an integer error code
which is intended to support synchronous/asynchronous optimizations in the
future (to bypass messaging queueing and dequeueing in those situations
where they can be bypassed, without messing up the messaging abstraction).

The connect() predicate for which signal/abort support was added in the last
commit now uses the new abort mechanism.  Instead of having the handler
function check whether a message represents an abort or not, a different
handler vector is stored in ms_abort and run when an abort is processed
(making for an easy separation of function).
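The ms_cmd / ms_abort arrangement described above can be sketched as an opaque union that holds either a command index or a function vector, with the abort vector copied over the command when an abort is processed. The type names, the ms_state field, and the two example handlers are hypothetical; 4 is just an arbitrary nonzero error code.

```c
#include <assert.h>
#include <stddef.h>

struct msg;
typedef int (*msg_func_t)(struct msg *);

/* Opaque to the messaging layer: either an index or a function vector. */
union msg_cmd {
	int		cm_op;
	msg_func_t	cm_func;
};

struct msg {
	union msg_cmd	ms_cmd;		/* primary command */
	union msg_cmd	ms_abort;	/* handler to run on abort */
	int		ms_state;	/* example payload */
};

/* Run the message's current command vector. */
static int
msg_run(struct msg *m)
{
	assert(m->ms_cmd.cm_func != NULL);
	return (m->ms_cmd.cm_func(m));
}

/* Process an abort: copy ms_abort into ms_cmd, then run it. */
static int
msg_run_abort(struct msg *m)
{
	m->ms_cmd = m->ms_abort;
	return (msg_run(m));
}

/* Example handlers standing in for a connect() request and its abort. */
static int
connect_start(struct msg *m)
{
	m->ms_state = 1;
	return (0);
}

static int
connect_abort(struct msg *m)
{
	m->ms_state = -1;
	return (4);
}
```

The target runs the original command first and the abort second, and neither handler has to test whether it is looking at an abort.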

The large netmsg switch has been replaced by individual function vectors
using the new ms_cmd function vector support.  This will soon be removed
entirely in favor of direct assignment of LWKT-aware PRU vectors to the
message's command vector.

NOTE ADDITIONAL: eventually the SYSCALL, VFS, and DEV interfaces will use
the new message opaque ms_cmd 'function vector' support instead of a
command index.

Work by: Matthew Dillon and Jeffrey Hsu

Revision 1.8
Sat Mar 6 19:40:28 2004 UTC (10 years, 7 months ago) by dillon
Branches: MAIN
Changes since revision 1.7: +1 -1 lines
Simplify LWKT message initialization semantics to reduce API confusion.

Clean up netisr messaging to provide more uniform error handling and to use
lwkt_replymsg() unconditionally for both async/auto-free and sync messages
as the abstraction intended.  This also fixes a reply/free race.

Revision 1.7
Mon Nov 24 20:46:01 2003 UTC (10 years, 10 months ago) by dillon
Branches: MAIN
Changes since revision 1.6: +2 -2 lines
More LWKT messaging cleanups.  Isolate the default port functions by making
them static and rename lwkt_init_port() to lwkt_initport() to conform with
lwkt_initmsg().

Revision 1.6
Thu Nov 20 06:05:30 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Changes since revision 1.5: +2 -2 lines
This is a major cleanup of the LWKT message port code.  The messaging code
is getting closer to being directly usable by userland.  With these changes
message/port operations are now far better abstracted than they were before.

    * Stale fields have been removed from struct lwkt_msg.
    * lwkt_abortmsg() has been revamped to make it easier to support.
    * lwkt_waitmsg has been converted to a port function.
    * mp_*port() function fields have been renamed for better readability.
    * ms_cleanupmsg has been removed from struct lwkt_msg.
    * Union sysmsg is now struct sysmsg.
    * A copyout function has been added to struct sysmsg.
    * The system calls have been regenerated.

Revision 1.5
Sat Aug 23 16:58:36 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Changes since revision 1.4: +2 -0 lines
Allow a NULL dev to be passed to _devsw().  This should close any remaining
kernel panics related to nonexistent devices.

Revision 1.4
Tue Aug 12 02:36:15 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Changes since revision 1.3: +1 -1 lines
Syscall messaging 4: Further expand the kernel-version of the syscall message.
The (in-kernel) syscall message is now arranged:

    struct blah_args {
	sysmsg
	usrmsg
	... syscall arguments ...
    }

Original system calls copyin() just the arguments and then initialize sysmsg
and go.  Syscall messages copyin() usrmsg+arguments and then initialize sysmsg
as appropriate and go.
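The two copy-in paths described above differ only in where copying starts inside the in-kernel structure. A rough sketch, assuming placeholder contents for sysmsg and usrmsg, two made-up syscall arguments, and memcpy() standing in for copyin():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder contents; the real structures are not shown in the log. */
struct sysmsg { int sm_result; };	/* kernel-side bookkeeping */
struct usrmsg { int um_flags; };	/* header supplied by userland */

struct blah_args {
	struct sysmsg sysmsg;
	struct usrmsg usrmsg;
	int	      fd;		/* ... syscall arguments ... */
	long	      nbytes;
};

/* Original system call path: copy in just the arguments. */
static void
copyin_args(struct blah_args *ka, const void *uargs)
{
	size_t off = offsetof(struct blah_args, fd);

	assert(ka != NULL && uargs != NULL);
	memcpy((char *)ka + off, uargs, sizeof(*ka) - off);
}

/* Syscall message path: copy in usrmsg plus the arguments. */
static void
copyin_msg(struct blah_args *ka, const void *umsg)
{
	size_t off = offsetof(struct blah_args, usrmsg);

	assert(ka != NULL && umsg != NULL);
	memcpy((char *)ka + off, umsg, sizeof(*ka) - off);
}
```

In both cases sysmsg is never copied from userland; it is initialized by the kernel afterwards.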

Further detail work for EASYNC support.  Implement td_msgport as a reply port
and start working on an async capability for the nanosleep() system call.

NOTE: Preliminary system call messaging can be tested using the suite of
programs in /usr/src/test/sysmsg.

NOTE: Work is still in progress and you can crash the system, so use of
MSGF_ASYNC for messaging system calls is currently restricted to root.

Also fixed a bug in the syscall module helper code in sys/sysent.h, which
might have been causing the linux problems (or might not have).

All system call headers had to be regenerated to deal with the structural
changes.

Revision 1.3
Thu Jul 24 23:52:38 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Changes since revision 1.2: +2 -2 lines
Syscall messaging work 2: Continue with the implementation of sendsys(),
using int 0x81.  This entry point will be responsible for sending system
call messages or waiting for messages / port activity.

With this commit system call messages can be run through 0x81 but at the
moment they will always run synchronously. Here's the core interface
code for IA32:

    static __inline int
    sendsys(void *port, void *msg, int msgsize)
    {
	int error;
	__asm __volatile("int $0x81" : "=a"(error) :
			"a"(port), "c"(msg), "d"(msgsize) : "memory");
	return(error);
    }

Performance versus a direct system call is currently excellent considering
that this is my initial attempt.

                600MHz C3       1.2GHz P3 x2 (SMP)

getuid()        1300 ns          909 ns
getuid_msg()    1700 ns         1077 ns

Revision 1.2
Wed Jul 23 02:30:20 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
Changes since revision 1.1: +2 -2 lines
LINT pass.  Clean up missed proc->thread conversions and get rid of warnings.

Revision 1.1
Tue Jul 22 17:03:33 2003 UTC (11 years, 3 months ago) by dillon
Branches: MAIN
DEV messaging stage 2/4: In this stage all DEV commands are now being
funneled through the message port for action by the port's beginmsg function.
CONSOLE and DISK device shims replace the port with their own and then
forward to the original.  FB (Frame Buffer) shims supposedly do the same
thing but I haven't been able to test it.  I don't expect instability
in mainline code but there might be easy-to-fix problems, and some drivers
still need to be converted.  See primarily: kern/kern_device.c (new dev_*() functions and
inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c
for the high points.
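The funnel-and-shim arrangement described above can be sketched as a port with a begin-message function pointer, where a shim installs its own port and forwards to the original. All names here are hypothetical (only the idea of a per-port beginmsg function comes from the text), and the doubling in raw_beginmsg() is a stand-in for real device work.

```c
#include <assert.h>
#include <stddef.h>

struct dev_msg {
	int cmd;
	int result;
};

struct dev_port {
	int (*mp_beginmsg)(struct dev_port *, struct dev_msg *);
	struct dev_port *orig;	/* original port, used by a shim */
	int nmsgs;		/* messages seen by this port */
};

/* Every device command is funneled through the port's begin function. */
static int
dev_sendmsg(struct dev_port *port, struct dev_msg *msg)
{
	assert(port->mp_beginmsg != NULL);
	return (port->mp_beginmsg(port, msg));
}

/* The underlying device acts on the message synchronously. */
static int
raw_beginmsg(struct dev_port *port, struct dev_msg *msg)
{
	port->nmsgs++;
	msg->result = msg->cmd * 2;
	return (0);
}

/* A shim (console, disk) sees the message first, then forwards it. */
static int
shim_beginmsg(struct dev_port *port, struct dev_msg *msg)
{
	port->nmsgs++;
	return (dev_sendmsg(port->orig, msg));
}
```

Callers only ever talk to the port they were given, so a shim can be inserted without changing them.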

In this stage all DEV messages are still acted upon synchronously in the
context of the caller.  We cannot create a separate handler thread until
the copyin's (primarily in ioctl functions) are made thread-aware.

Note that the messaging shims are going to look rather messy in these early
days but as more subsystems are converted over we will begin to use
pre-initialized messages and message forwarding to avoid having to constantly
rebuild messages prior to use.

Note that DEV itself is a mess owing to its 4.x roots and will be cleaned
up in subsequent passes.  e.g. the way sub-devices inherit the main device's
cdevsw was always a bad hack and it still is, and several functions
(mmap, kqfilter, psize, poll) return results rather than error codes, which
will be fixed since now we have a message to store the result in :-)
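The last point, returning an error code and storing the result in the message, can be sketched with psize as the example. Both functions and the struct are hypothetical illustrations; the nblocks parameter stands in for whatever the driver knows about the device size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical message with room for the result. */
struct psize_msg {
	long result;
};

/* Old style: the size and the failure indication share one return value. */
static long
psize_old(long nblocks)
{
	return (nblocks < 0 ? -1 : nblocks);
}

/* Message style: return an error code, store the result in the message. */
static int
psize_send(long nblocks, struct psize_msg *msg)
{
	assert(msg != NULL);
	if (nblocks < 0)
		return (ENXIO);
	msg->result = nblocks;
	return (0);
}
```

With the second form a caller can distinguish the failure reason from the value, which the overloaded return value cannot express.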
