DragonFly BSD

CVS log for src/sys/dev/disk/ccd/ccd.c


Keyword substitution: kv
Default branch: MAIN


Revision 1.50: download - view: text, markup, annotated - select for diffs
Tue Nov 6 03:50:02 2007 UTC (6 years, 9 months ago) by dillon
Branches: MAIN
CVS tags: HEAD, DragonFly_RELEASE_2_0_Slip, DragonFly_RELEASE_2_0, DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12, DragonFly_Preview
Diff to: previous 1.49: preferred, unified
Changes since revision 1.49: +1 -1 lines
Convert the global 'bioops' into per-mount bio_ops.  For now we also have
to have a per buffer b_ops as well since the controlling filesystem cannot
be located from information in struct buf (b_vp could be the backing store
so that can't be used).  This change allows HAMMER to use bio_ops.

Change the ordering of the bio_ops.io_deallocate call so it occurs before
the buffer's B_LOCKED is checked.  This allows the deallocate call to set
B_LOCKED to retain the buffer in situations where the target filesystem
is unable to immediately disassociate the buffer.  Also keep VMIO intact
for B_LOCKED buffers (in addition to B_DELWRI buffers).

HAMMER will use this feature to keep buffers passively associated with
other filesystem structures and thus be able to avoid constantly brelse()ing
and getblk()ing them.

Revision 1.49: download - view: text, markup, annotated - select for diffs
Wed Jul 11 23:42:16 2007 UTC (7 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10
Diff to: previous 1.48: preferred, unified
Changes since revision 1.48: +13 -2 lines
Use I/O size limits in underlying devices to govern I/O chunk
sizes.  This fixes issues with NATA which does not break up large
requests like the old ATA driver did.
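
As a rough illustration of the idea (not the actual ccd.c change), the sketch below splits a transfer into sub-requests no larger than a device's maximum I/O size. The names next_chunk, count_chunks, and dev_max_io are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of the next sub-request, bounded by the
 * underlying device's maximum I/O size (dev_max_io is an assumed name).
 */
static size_t
next_chunk(uint64_t remaining, size_t dev_max_io)
{
	assert(dev_max_io > 0);
	return (remaining < dev_max_io) ? (size_t)remaining : dev_max_io;
}

/* Count how many sub-requests a transfer of 'total' bytes is split into. */
static unsigned
count_chunks(uint64_t total, size_t dev_max_io)
{
	unsigned n = 0;

	while (total > 0) {
		total -= next_chunk(total, dev_max_io);
		n++;
	}
	return n;
}
```

Under these assumptions, a 256KB request against a device limited to 64KB per I/O is issued as four sub-requests instead of one oversized request.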

Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>

Revision 1.48: download - view: text, markup, annotated - select for diffs
Tue Jun 19 19:09:46 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.47: preferred, unified
Changes since revision 1.47: +14 -6 lines
The fstype was not being properly tested for a CCD uuid.

Correct a bug when generating an interleave table for very large disk
arrays (> 2TB).  A size variable was 32 bits instead of 64 bits.
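
The class of bug described above can be sketched as follows. This is an illustration only: the 512-byte sector size and all function names are assumptions, and the code is not taken from ccd.c.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE_SKETCH 512u	/* assumed sector size */

/* Sector count kept in 64 bits: correct for any array size. */
static uint64_t
sectors_u64(uint64_t bytes)
{
	return bytes / SECTOR_SIZE_SKETCH;
}

/* Sector count squeezed into 32 bits: silently wraps at 2^32 sectors. */
static uint32_t
sectors_u32_wrapping(uint64_t bytes)
{
	return (uint32_t)sectors_u64(bytes);
}

/* Same narrowing, but asserting instead of wrapping. */
static uint32_t
sectors_u32_checked(uint64_t bytes)
{
	uint64_t n = sectors_u64(bytes);

	assert(n <= UINT32_MAX);
	return (uint32_t)n;
}
```

With 512-byte sectors, 2^32 sectors is exactly 2TB, so a 3TB array wraps to a 1TB sector count in the 32-bit version while the 64-bit version stays correct.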

Revision 1.47: download - view: text, markup, annotated - select for diffs
Tue Jun 19 06:07:54 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +2 -2 lines
Make some adjustments to clean up structural field names.  Add type and
storage uuid's to the partinfo structure for the DIOCGPART ioctl and
load the fields up for GPT slices and disklabel64 partitions.

Revision 1.46: download - view: text, markup, annotated - select for diffs
Sun Jun 17 23:50:15 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +1 -0 lines
Disklabel separation work - Generally shift all disklabel-specific
procedures for the kernel proper to a new source file, subr_disklabel32.c.
Move the DTYPE_ and FS_ defines out of sys/disklabel.h and into a new
header file, sys/dtype.h.

Make adjustments to the uuids file, renaming "DragonFly Label" to
"DragonFly Label32" and creating a "DragonFly Label64" uuid.

Revision 1.45: download - view: text, markup, annotated - select for diffs
Sun Jun 17 03:51:14 2007 UTC (7 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +3 -3 lines
Implement (non-bootable) GPT support.  If a PMBR partition type is detected
the rest of the MBR is ignored and the GPT partition table will be parsed
into slices.  GPT partition 0 will be s0, GPT partition 1 will be s1, etc.
Bootable support is forthcoming.

Remove support for COMPATIBILITY_SLICE when an MBR/GPT table is present.  That
is, the COMPATIBILITY_SLICE (s0) will still point to the dangerously
dedicated disklabel or be synthesized for a CD, but it will no longer point
to the 'first BSD slice' in a real MBR or GPT table.  For GPT tables
slice 0 (s0) will point at GPT partition #0, slice 1 (s1) at
GPT partition #1, etc.

Redo the reserved sector handling code.  There is now a single reserved
sector count instead of separate fields for the slice layer and disklabel
layer.

Redo the disklabel snooping code.  Note that you cannot run an old
/sbin/disklabel in raw (-r) mode with a new OS because the old disklabel
will not turn on snooping.  For now the on-disk format remains the same,
but more changes may be forthcoming (after discussion).  I would like to
get rid of the snooping entirely.

Add kuuid_is_nil() and use it to ignore unset GPT partitions.

Revision 1.44: download - view: text, markup, annotated - select for diffs
Tue May 22 21:28:56 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +0 -3 lines
Remove unused define.

Revision 1.43: download - view: text, markup, annotated - select for diffs
Thu May 17 03:20:07 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +168 -330 lines
Remove the roll-your-own disklabel from CCD.  Use the kernel disk manager
for disklabel support instead.

Make CCD a real disk device rather than a fake one.  NOTE: All /dev/ccd*
devices have changed and must be remade.

Introduce DSO_COMPATMBR.  This forces an MBR sector to be reserved in front
of a disklabel even when the target disk does not have slices.  It is used
by the CCD and VN devices to keep the disklabel aligned the same way it has
been historically.

Implement 64 bit block addressing for CCD.

Implement a new filesystem type "ccd", and require that the devices backing
the CCD use that filesystem type for safety.

Fix a bug in DIOCGPART where the partinfo->media_blocks was not being
set properly for partitions.

Revision 1.42: download - view: text, markup, annotated - select for diffs
Wed May 16 05:20:13 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +1 -0 lines
Continue untangling the disklabel.  Add sector index reservation fields
to the diskslice and partinfo structures.  These fields will replace the
hardcoded LABELSECTOR constant and also help manage reserved areas in
the disklabel.

Revision 1.41: download - view: text, markup, annotated - select for diffs
Tue May 15 22:44:04 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +3 -2 lines
* The diskslice abstraction now stores offsets/sizes as 64 bit quantities.
  (NOTE: DOS partition tables and standard disklabels can't handle 64 bit
  sector numbers yet).  For future pluggable disklabel/partitioning schemes.

* The kernel panic / kernel core API is now 64 bits.

* The VN device now uses 64 bit sector numbers and can handle block devices
  up to what is supported by the filesystem (typically 8TB).  This change
  was made primarily so we can test future disklabel / partition table
  support.

* Pass 64 bit LBAs to various block devices and to the SCSI layer.

* Check for and assert 32 bit overflow conditions in various places, instead
  of wrapping.

Revision 1.40: download - view: text, markup, annotated - select for diffs
Tue May 15 17:50:52 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +23 -11 lines
Continue untangling the disklabel.  Reorganize struct partinfo and the
DIOCGPART ioctl to extract the required information directly, and fix
the DIOCGPART ioctl direction so userland can use it.

This removes numerous disklabel references, particularly from the filesystem
code which was doing silly indirections just to figure out the sector size.

NOTE: The absolute byte offset of the slice or partition (relative to the
base of the raw disk) is also made available, but is not currently used
by the kernel.

Revision 1.39: download - view: text, markup, annotated - select for diffs
Sun May 6 19:23:21 2007 UTC (7 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +1 -1 lines
Use SYSREF to reference count struct vnode.  v_usecount is now
v_sysref(.refcnt).  v_holdcnt is now v_auxrefs.  SYSREF's termination state
(using a negative reference count from -0x40000000+) now places the vnode in
a VCACHED or VFREE state and deactivates it.  The vnode is now assigned a
64 bit unique id via SYSREF.

vhold() (which manipulates v_auxrefs) no longer reactivates a vnode and
is explicitly used only to track references from auxiliary structures
and references to prevent premature destruction of the vnode.  vdrop()
will now only move a vnode from VCACHED to VFREE on the 1->0 transition
of v_auxrefs if the vnode is in a termination state.

vref() will now panic if used on a vnode in a termination state.  vget()
must now be used to explicitly reactivate a vnode.  These requirements
existed before but are now explicitly asserted.

vlrureclaim() and allocvnode() should now interact a bit better.  In
particular, vlrureclaim() will do a better job of finding vnodes to flush
and transition from VCACHED to VFREE, and allocvnode() will do a better
job finding vnodes to reuse without getting blocked by a flush.

allocvnode now uses a real VX lock to sequence vnodes into VRECLAIMED.  All
vnode special state processing now uses a VX lock.

Vnodes are now able to be slowly returned to the memory pool when
kern.maxvnodes is reduced at run time.

Various initialization elements have been moved to CTOR/DTOR and are
no longer in the critical path, improving performance.  However, since
SYSREF uses atomic_cmpset_int() (aka cmpxchgl), which reduces performance
somewhat, overall performance tends to be about the same.
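
A minimal user-space sketch of a compare-and-set reference count with a negative termination state, in the spirit of the description above. It uses C11 atomics rather than the kernel's atomic_cmpset_int(), and the structure, function names, and exact transitions are assumptions for illustration, not the SYSREF implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed termination marker, following the value quoted in the log. */
#define REF_TERMINATING_SKETCH	(-0x40000000)

struct refobj_sketch {
	atomic_int refcnt;
};

/* Take a reference only while the object is active (count > 0). */
static bool
ref_get(struct refobj_sketch *o)
{
	int cur = atomic_load(&o->refcnt);

	while (cur > 0) {
		/* On failure 'cur' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &cur, cur + 1))
			return true;
	}
	return false;	/* terminating or already dead */
}

/*
 * Drop a reference.  The 1->0 transition is replaced by a jump into the
 * negative termination range.  Returns true if this call did that.
 */
static bool
ref_put(struct refobj_sketch *o)
{
	int cur = atomic_load(&o->refcnt);

	for (;;) {
		assert(cur > 0);
		int next = (cur == 1) ? REF_TERMINATING_SKETCH : cur - 1;
		if (atomic_compare_exchange_weak(&o->refcnt, &cur, next))
			return cur == 1;
	}
}
```

In this sketch a failed ref_get() plays the role of the vref() panic mentioned above: once the count is in the termination range, an ordinary reference can no longer reactivate the object.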

Revision 1.38: download - view: text, markup, annotated - select for diffs
Fri Dec 22 23:26:16 2006 UTC (7 years, 8 months ago) by swildner
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +34 -34 lines
Rename printf -> kprintf in sys/ and add some defines where necessary
(files which are used in userland, too).

Revision 1.37: download - view: text, markup, annotated - select for diffs
Sun Sep 10 01:26:33 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +10 -10 lines
Change the kernel dev_t, representing a pointer to a specinfo structure,
to cdev_t.  Change struct specinfo to struct cdev.  The name 'cdev' was taken
from FreeBSD.  Remove the dev_t shim for the kernel.

This commit generally removes the overloading of 'dev_t' between userland and
the kernel.

Also fix a bug in libkvm where a kernel dev_t (now cdev_t) was not being
properly converted to a userland dev_t.

Revision 1.36: download - view: text, markup, annotated - select for diffs
Tue Sep 5 03:48:10 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +4 -4 lines
Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 2

Revision 1.35: download - view: text, markup, annotated - select for diffs
Tue Sep 5 00:55:37 2006 UTC (7 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +21 -21 lines
Rename malloc->kmalloc, free->kfree, and realloc->krealloc.  Pass 1

Revision 1.34: download - view: text, markup, annotated - select for diffs
Sat Aug 12 00:26:17 2006 UTC (8 years ago) by dillon
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +1 -1 lines
VNode sequencing and locking - part 3/4.

VNode aliasing is handled by the namecache (aka nullfs), so there is no
longer a need to have VOP_LOCK, VOP_UNLOCK, or VOP_ISSLOCKED as 'VOP'
functions.  Both NFS and DEADFS have been using standard locking functions
for some time and are no longer special cases.  Replace all uses with
native calls to vn_lock, vn_unlock, and vn_islocked.

We can't have these as VOP functions anyhow because of the introduction of
the new SYSLINK transport layer, since vnode locks are primarily used to
protect the local vnode structure itself.

Revision 1.33: download - view: text, markup, annotated - select for diffs
Fri Jul 28 02:17:35 2006 UTC (8 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +48 -58 lines
MASSIVE reorganization of the device operations vector.  Change cdevsw
to dev_ops.  dev_ops is a syslink-compatible operations vector structure
similar to the vop_ops structure used by vnodes.

Remove a huge number of instances where a thread pointer is still being
passed as an argument to various device ops and other related routines.
The device OPEN and IOCTL calls now take a ucred instead of a thread pointer,
and the CLOSE call no longer takes a thread pointer.

Revision 1.32: download - view: text, markup, annotated - select for diffs
Sat May 6 02:43:02 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +4 -4 lines
The thread/proc pointer argument in the VFS subsystem originally existed
for...  well, I'm not sure *WHY* it originally existed when most of the
time the pointer couldn't be anything other than curthread or curproc or
the code wouldn't work.  This is particularly true of lockmgr locks.

Remove the pointer argument from all VOP_*() functions, all fileops functions,
and most ioctl functions.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Fri May 5 21:15:06 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +1 -1 lines
Simplify vn_lock(), VOP_LOCK(), and VOP_UNLOCK() by removing the thread_t
argument.  These calls now always use the current thread as the lockholder.
Passing a thread_t to these functions has always been questionable at best.

Revision 1.30: download - view: text, markup, annotated - select for diffs
Thu May 4 18:32:19 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +30 -31 lines
Block devices generally truncate the size of I/O requests which go past EOF.
This is exactly what we want when manually reading or writing a block device
such as /dev/ad0s1a, but is not desired when a VFS issues I/O ops on
filesystem buffers.  In such cases, any EOF condition must be considered an
error.

Implement a new filesystem buffer flag B_BNOCLIP, which getblk() and friends
automatically set.  If set, block devices are guaranteed to return an error
if the I/O request is at EOF or would otherwise have to be clipped to EOF.
Block devices further guarantee that b_bcount will not be modified when this
flag is set.

Adjust all block device EOF checks to use the new flag, and clean up the code
while I'm there.  Also, set b_resid in a couple of degenerate cases where
it was not being set.

Revision 1.29: download - view: text, markup, annotated - select for diffs
Thu May 4 08:00:59 2006 UTC (8 years, 4 months ago) by y0netan1
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +2 -0 lines
Don't forget to replicate b_cmd, which has been split off of b_flags.

Revision 1.28: download - view: text, markup, annotated - select for diffs
Wed May 3 20:44:46 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +15 -7 lines
- Clarify the definitions of b_bufsize, b_bcount, and b_resid.
- Remove unnecessary assignments based on the clarified fields.
- Add additional checks for premature EOF.

b_bufsize is only used by buffer management entities such as getblk() and
other vnode-backed buffer handling procedures.  b_bufsize is not required
for calls to vn_strategy() or dev_dstrategy().  A number of other subsystems
use it to track the original request size.

b_bcount is the I/O request size, but b_bcount is allowed to be truncated
by the device chain if the request encompasses EOF (such as on a raw disk
device).  A caller which needs to record the original buffer size versus
the EOF-truncated buffer can compare b_bcount after the I/O against a
recorded copy of the original request size.  This copy can be recorded in
b_bufsize for unmanaged buffers (malloced or getpbuf()'d buffers).

b_resid is always relative to b_bcount, not b_bufsize.  A successful read
that is truncated to the device EOF will thus have a b_resid of 0 and a
truncated b_bcount.
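
The relationship between the three fields can be illustrated with a small stand-alone model. The structure below only borrows the field names; sim_truncated_read and the helpers are hypothetical and simulate a device rather than calling one.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the three fields discussed above (not the kernel's struct buf). */
struct buf_sketch {
	size_t b_bufsize;	/* here: recorded copy of the original request */
	size_t b_bcount;	/* request size, may be truncated at EOF */
	size_t b_resid;		/* bytes NOT transferred, relative to b_bcount */
};

/* Simulate a successful read where only 'avail' bytes exist before EOF. */
static void
sim_truncated_read(struct buf_sketch *bp, size_t avail)
{
	if (bp->b_bcount > avail)
		bp->b_bcount = avail;	/* device chain truncates the request */
	bp->b_resid = 0;		/* the truncated request fully completed */
}

/* Bytes actually transferred. */
static size_t
bytes_done(const struct buf_sketch *bp)
{
	assert(bp->b_resid <= bp->b_bcount);
	return bp->b_bcount - bp->b_resid;
}

/* How much of the original request was lost to EOF truncation. */
static size_t
eof_shortfall(const struct buf_sketch *bp)
{
	assert(bp->b_bcount <= bp->b_bufsize);
	return bp->b_bufsize - bp->b_bcount;
}
```

The caller detects truncation by comparing b_bcount against its recorded original size, not by looking at b_resid, which is 0 for a successful truncated read.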

Revision 1.27: download - view: text, markup, annotated - select for diffs
Sun Apr 30 17:22:16 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +3 -3 lines
Replace the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF
b_flags with a separate b_cmd field.  Use b_cmd to test for I/O completion
as well (getting rid of B_DONE in the process).  This further simplifies
the setup required to issue a buffer cache I/O.

Remove a redundant header file, bus/isa/i386/isa_dma.h and merge any
discrepancies into bus/isa/isavar.h.

Give ISADMA_READ/WRITE/RAW their own independent flag definitions instead of
trying to overload them on top of B_READ, B_WRITE, and B_RAW.  Add a
routine isa_dmabp() which takes a struct buf pointer and returns the ISA
dma flags associated with the operation.

Remove the 'clear_modify' argument to vfs_busy_pages().  Instead,
vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses
it to determine the action it must take.

Revision 1.26: download - view: text, markup, annotated - select for diffs
Fri Apr 28 16:33:59 2006 UTC (8 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +13 -12 lines
Get rid of pbgetvp() and pbrelvp().  Instead fold the B_PAGING flag directly
into getpbuf() (the only type of buffer that pbgetvp() could be called on
anyway).  Change related b_flags assignments from '=' to '|='.

Get rid of remaining dependencies on b_vp.  vn_strategy() now relies solely
on the vp passed to it as an argument.  Remove buffer cache code that sets
b_vp for anonymous pbuf's.

Add a stopgap 'vp' argument to vfs_busy_pages().  This is only really needed
by NFS and the clustering code due to the severely hackish nature of the
NFS and clustering code.

Fix a bug in the ext2fs inode code where vfs_busy_pages() was being called
on B_CACHE buffers.  Add an assertion to vfs_busy_pages() to panic if it
encounters a B_CACHE buffer.

Revision 1.25: download - view: text, markup, annotated - select for diffs
Mon Apr 3 02:02:32 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +5 -4 lines
A number of structures related to UFS and QUOTAS have changed name.

dinode -> ufs1_dinode
dqblk -> ufs_dqblk (and other quota related structures)

In addition, a large number of UFS related structures and procedures have
been prefixed with 'ufs_' to allow us to split off EXT2FS.

ufs_daddr_t has been moved out of sys/types.h and into vfs/ufs/dinode.h.

The #ifndef header file checks for UFS have been normalized.

Revision 1.24: download - view: text, markup, annotated - select for diffs
Fri Mar 24 18:35:32 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +23 -24 lines
Major BUF/BIO work commit.  Make I/O BIO-centric and specify the disk or
file location with a 64 bit offset instead of a 32 bit block number.

* All I/O is now BIO-centric instead of BUF-centric.

* File/Disk addresses universally use a 64 bit bio_offset now.  bio_blkno
  no longer exists.

* Stackable BIO's hold disk offset translations.  Translations are no longer
  overloaded onto a single structure (BUF or BIO).

* bio_offset == NOOFFSET is now universally used to indicate that a
  translation has not been made.  The old (blkno == lblkno) junk has all
  been removed.

* There is no longer a distinction between logical I/O and physical I/O.

* All driver BUFQs have been converted to BIOQs.

* BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*,
  and findblk all now take and/or return 64 bit byte offsets instead
  of block numbers.  Note that BMAP now returns a byte range for the before
  and after variables.
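
A small sketch of the "translate once, cache in the next layer" pattern that the NOOFFSET convention enables. The structure, the sentinel's definition, and translate_once are hypothetical stand-ins, not the kernel's struct bio or its API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel meaning "no translation has been made yet". */
#define NOOFFSET_SKETCH	((int64_t)-1)

/* Minimal model of a stacked BIO: one 64 bit byte offset per layer. */
struct bio_sketch {
	int64_t offset;			/* byte offset at this layer */
	struct bio_sketch *next;	/* next (lower) translation layer */
};

/*
 * Fill in the lower layer's offset from the upper layer's offset plus a
 * base (for example a slice start), but only if it is not already cached.
 */
static int64_t
translate_once(struct bio_sketch *bio, int64_t base)
{
	struct bio_sketch *nbio = bio->next;

	assert(nbio != NULL && bio->offset != NOOFFSET_SKETCH);
	if (nbio->offset == NOOFFSET_SKETCH)
		nbio->offset = bio->offset + base;
	return nbio->offset;
}
```

Because each layer keeps its own offset, the untranslated value is never overwritten, and a second call simply returns the cached translation.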

Revision 1.23: download - view: text, markup, annotated - select for diffs
Wed Mar 8 17:14:11 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +24 -14 lines
Struct buf's cannot simply be bcopy'd any more due to linkages in the
BIOs.  The CCD code was also zeroing its custom bufs after initializing
them.  This fixes the bugs and cleans it up a bit.

Reported-by: YONETANI Tomokazu <qhwt+dfly@les.ath.cx>

Revision 1.22: download - view: text, markup, annotated - select for diffs
Fri Feb 17 19:17:55 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +78 -55 lines
Make the entire BUF/BIO system BIO-centric instead of BUF-centric.  Vnode
and device strategy routines now take a BIO and must pass that BIO to
biodone().  All code which previously managed a BUF undergoing I/O now
manages a BIO.

The new BIO-centric algorithms allow BIOs to be stacked, where each layer
represents a block translation, completion callback, or caller or device
private data.  This information is no longer overloaded within the BUF.
Translation layer linkages remain intact as a 'cache' after I/O has completed.

The VOP and DEV strategy routines no longer make assumptions as to which
translated block number applies to them.  They use the block number in the
BIO specifically passed to them.

Change the 'untranslated' constant to NOOFFSET (for bio_offset), and
(daddr_t)-1 (for bio_blkno).  Rip out all code that previously set the
translated block number to the untranslated block number to indicate
that the translation had not been made.

Rip out all the cluster linkage fields for clustered VFS and clustered
paging operations.  Clustering now occurs in a private BIO layer using
private fields within the BIO.

Reformulate the vn_strategy() and dev_dstrategy() abstraction(s).  These
routines no longer assume that bp->b_vp == the vp of the VOP operation, and
the dev_t is no longer stored in the struct buf.  Instead, only the vp passed
to vn_strategy() (and related *_strategy() routines for VFS ops), and
the dev_t passed to dev_dstrategy() (and related *_strategy() routines for
device ops) is used by the VFS or DEV code.  This will allow an arbitrary
number of translation layers in the future.

Create an independent per-BIO tracking entity, struct bio_track, which
is used to determine when I/O is in-progress on the associated device
or vnode.

NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold
the fields describing the data buffer, resid, and error state.

Major-testing-by: Stefan Krueger

Revision 1.21: download - view: text, markup, annotated - select for diffs
Sun Dec 11 01:54:07 2005 UTC (8 years, 8 months ago) by swildner
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_4_Slip, DragonFly_RELEASE_1_4
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +14 -35 lines
* Ansify function definitions.

* Minor style cleanup.

Submitted-by: Alexey Slynko <slynko@tronet.ru>

Revision 1.20: download - view: text, markup, annotated - select for diffs
Wed Aug 3 16:36:33 2005 UTC (9 years, 1 month ago) by hmp
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +1 -1 lines
BUF/BIO cleanup 3/99:

Retire the B_CALL flag in favour of checking the bp->b_iodone pointer
directly, thus simplifying the BUF interface even more.

Move scattered B_UNUSED* flag space definitions into one place, that
is below the rest of the definitions.

Revision 1.19: download - view: text, markup, annotated - select for diffs
Fri Jun 3 21:56:23 2005 UTC (9 years, 3 months ago) by swildner
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +12 -11 lines
Remove spl*() in disk/{ata,buslogic,ccd} and replace them with
critical sections.

Revision 1.18: download - view: text, markup, annotated - select for diffs
Fri Nov 12 00:09:03 2004 UTC (9 years, 9 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable, DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +17 -16 lines
VFS messaging/interfacing work stage 9/99: VFS 'NEW' API WORK.

NOTE: unionfs and nullfs are temporarily broken by this commit.

* Remove the old namecache API.  vfs_cache_lookup(), cache_lookup(),
  cache_enter(), namei(), and lookup() are all gone.  VOP_LOOKUP() and
  VOP_CACHEDLOOKUP() have been collapsed into a single non-caching
  VOP_LOOKUP().

* Complete the new VFS CACHE (namecache) API.  The new API is able to
  supply topological guarantees and is able to reserve namespaces,
  including negative cache spaces (whether the target name exists or not),
  which the new API uses to reserve namespace for things like NRENAME
  and NCREATE (and others).

* Complete the new namecache API.  VOP_NRESOLVE, NLOOKUPDOTDOT, NCREATE,
  NMKDIR, NMKNOD, NLINK, NSYMLINK, NWHITEOUT, NRENAME, NRMDIR, NREMOVE.
  These new calls take (typically locked) namecache pointers rather than
  combinations of directory vnodes, file vnodes, and name components.  The
  new calls are *MUCH* simpler in concept and implementation.  For example,
  VOP_RENAME() has 8 arguments while VOP_NRENAME() has only 3 arguments.

  The new namecache API uses the namecache to lock namespaces without having
  to lock the underlying vnodes.  For example, this allows the kernel
  to reserve the target name of a create function trivially.  Namecache
  records are maintained BY THE KERNEL for both positive and negative hits.

  Generally speaking, the kernel layer is now responsible for resolving
  path elements.  NRESOLVE is called when an unresolved namecache record
  needs to be resolved.  Unlike the old VOP_LOOKUP, NRESOLVE is simply
  responsible for associating a vnode to a namecache record (positive hit)
  or telling the system that it's a negative hit, and not responsible for
  handling symlinks or other special cases or doing any of the other
  path lookup work, much unlike the old VOP_LOOKUP.

  It should be particularly noted that the new namecache topology does not
  allow disconnected namecache records.  In rare cases where a vnode must
  be converted to a namecache pointer for new API operation via a file handle
  (i.e. NFS), the cache_fromdvp() function is provided and a new API VOP,
  VOP_NLOOKUPDOTDOT() is provided to allow the namecache to resolve the
  topology leading up to the requested vnode.  These and other topological
  guarantees greatly reduce the complexity of the new namecache API.

  The new namei() is called nlookup().  This function uses a combination
  of cache_n*() calls, VOP_NRESOLVE(), and standard VOP calls to resolve the
  supplied path, deal with symlinks, and so forth, in a nice small compact
  compartmentalized procedure.

* The old VFS code is no longer responsible for maintaining namecache records,
  a function which was mostly ad hoc cache_purge()s occurring before the VFS
  actually knows whether an operation will succeed or not.

  The new VFS code is typically responsible for adjusting the state of
  locked namecache records passed into it.  For example, if NCREATE succeeds
  it must call cache_setvp() to associate the passed namecache record with
  the vnode representing the successfully created file.  The new requirements
  are much less complex than the old requirements.

* Most VFSs still implement the old API calls, albeit somewhat modified
  and in particular the VOP_LOOKUP function is now *MUCH* simpler.  However,
  the kernel now uses the new API calls almost exclusively and relies on
  compatibility code installed in the default ops (vop_compat_*()) to
  convert the new calls to the old calls.

* All kernel system calls and related support functions which used to do
  complex and confusing namei() operations now do far less complex and
  far less confusing nlookup() operations.

* SPECOPS shortcutting has been implemented.  User reads and writes now go
  directly to supporting functions which talk to the device via fileops
  rather than having to be routed through VOP_READ or VOP_WRITE, saving
  significant overhead.  Note, however, that these only really affect
  /dev/null and /dev/zero.

  Implementing this was fairly easy, we now simply pass an optional
  struct file pointer to VOP_OPEN() and let spec_open() handle the
  override.

SPECIAL NOTES: It should be noted that we must still lock a directory vnode
LK_EXCLUSIVE before issuing a VOP_LOOKUP(), even for simple lookups, because
a number of VFS's (including UFS) store active directory scanning information
in the directory vnode.  The legacy NAMEI_LOOKUP cases can be changed to
use LK_SHARED once these VFS cases are fixed.  In particular, we are now
organized well enough to actually be able to do record locking within a
directory for handling NCREATE, NDELETE, and NRENAME situations, but it hasn't
been done yet.

Many thanks to all of the testers and in particular David Rhodus for
finding a large number of panics and other issues.

Revision 1.17: download - view: text, markup, annotated - select for diffs
Tue Oct 12 19:20:30 2004 UTC (9 years, 10 months ago) by dillon
Branches: MAIN
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +2 -2 lines
VFS messaging/interfacing work stage 8/99: Major reworking of the vnode
interlock and other miscellaneous things.  This patch also fixes FS
corruption due to prior vfs work in head.  In particular, prior to this
patch the namecache locking could introduce blocking conditions that
confuse the old vnode deactivation and reclamation code paths.  With
this patch there appear to be no serious problems even after two days
of continuous testing.

* VX lock all VOP_CLOSE operations.
* Fix two NFS issues.  There was an incorrect assertion (found by
  David Rhodus), and the nfs_rename() code was not properly
  purging the target file from the cache, resulting in Stale file
  handle errors during, e.g. a buildworld with an NFS-mounted /usr/obj.
* Fix a TTY session issue.  Programs which open("/dev/tty" ,...) and
  then run the TIOCNOTTY ioctl were causing the system to lose track
  of the open count, preventing the tty from properly detaching.
  This is actually a very old BSD bug, but it came out of the woodwork
  in DragonFly because I am now attempting to track device opens
  explicitly.
* Gets rid of the vnode interlock.  The lockmgr interlock remains.
* Introduced VX locks, which are mandatory vp->v_lock based locks.
* Rewrites the locking semantics for deactivation and reclamation.
  (A ref'd VX lock'd vnode is now required for vgone(), VOP_INACTIVE,
  and VOP_RECLAIM).  New guarantees are in place with regard to vnode
  ripouts.
* Recodes the mountlist scanning routines to close timing races.
* Recodes getnewvnode to close timing races (it now returns a
  VX locked and refd vnode rather than a refd but unlocked vnode).
* Recodes VOP_REVOKE- a locked vnode is now mandatory.
* Recodes all VFS inode hash routines to close timing holes.
* Removes cache_leaf_test() - vnodes representing intermediate
  directories are now held so the leaf test should no longer be
  necessary.
* Splits the over-large vfs_subr.c into three additional source
  files, broken down by major function (locking, mount related,
  filesystem syncer).

* Changes splvm() protection to a critical-section in a number of
  places (bleedover from another patch set which is also about to be
  committed).

Known issues not yet resolved:

* Possible vnode/namecache deadlocks.
* While most filesystems now use vp->v_lock, I haven't done a final
  pass to make vp->v_lock mandatory and to clean up the few remaining
  inode based locks (nwfs I think and other obscure filesystems).
* NullFS gets confused when you hit a mount point in the underlying
  filesystem.
* Only UFS and NFS have been well tested
* NFS is not properly timing out namecache entries, causing changes made
  on the server to not be properly detected on the client if the client
  already has a negative-cache hit for the filename in question.

Testing-by: David Rhodus <sdrhodus@gmail.com>,
	    Peter Kadau <peter.kadau@tuebingen.mpg.de>,
	    walt <wa1ter@myrealbox.com>,
	    others

Revision 1.16: download - view: text, markup, annotated - select for diffs
Wed May 19 22:52:41 2004 UTC (10 years, 3 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004, DragonFly_Snap13Sep2004, DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +10 -9 lines
Device layer rollup commit.

* cdevsw_add() is now required.  cdevsw_add() and cdevsw_remove() may specify
  a mask/match indicating the range of supported minor numbers.  Multiple
  cdevsw_add()'s using the same major number, but distinctly different
  ranges, may be issued.  All devices that failed to call cdevsw_add() before
  now do.

* cdevsw_remove() now automatically marks all devices within its supported
  range as being destroyed.

* vnode->v_rdev is no longer resolved when the vnode is created.  Instead,
  only v_udev (a newly added field) is resolved.  v_rdev is resolved when
  the vnode is opened and cleared on the last close.

* A great deal of code was making rather dubious assumptions with regards
  to the validity of devices associated with vnodes, primarily due to
  the persistence of a device structure due to being indexed by (major, minor)
  instead of by (cdevsw, major, minor).  In particular, if you run a program
  which connects to a USB device and then you pull the USB device and plug
  it back in, the vnode subsystem will continue to believe that the device
  is open when, in fact, it isn't (because it was destroyed and recreated).

  In particular, note that all the VFS mount procedures now check devices
  via v_udev instead of v_rdev prior to calling VOP_OPEN(), since v_rdev
  is NULL prior to the first open.

* The disk layer's device interaction has been rewritten.  The disk layer
  (i.e. the slice and disklabel management layer) no longer overloads
  its data onto the device structure representing the underlying physical
  disk.  Instead, the disk layer uses the new cdevsw_add() functionality
  to register its own cdevsw using the underlying device's major number,
  and simply does NOT register the underlying device's cdevsw.  No
  confusion is created because the device hash is now based on
  (cdevsw,major,minor) rather than (major,minor).

  NOTE: This also means that underlying raw disk devices may use the entire
  device minor number instead of having to reserve the bits used by the disk
  layer, and also means that we can (theoretically) stack a fully
  disklabel-supported 'disk' on top of any block device.

* The new reference counting scheme prevents this by associating a device
  with a cdevsw and disconnecting the device from its cdevsw when the cdevsw
  is removed.  Additionally, all udev2dev() lookups run through the cdevsw
  mask/match and only successfully find devices still associated with an
  active cdevsw.

* Major work on MFS:  MFS no longer shortcuts vnode and device creation.  It
  now creates a real vnode and a real device and implements real open and
  close VOPs.  Additionally, due to the disk layer changes, MFS is no longer
  limited to 255 mounts.  The new limit is 16 million.  Since MFS creates a
  real device node, mount_mfs will now create a real /dev/mfs<PID> device
  that can be read from userland (e.g. so you can dump an MFS filesystem).

* BUF AND DEVICE STRATEGY changes.  The struct buf contains a b_dev field.
  In order to properly handle stacked devices we now require that the b_dev
  field be initialized before the device strategy routine is called.  This
  required some additional work in various VFS implementations.  To enforce
  this requirement, biodone() now sets b_dev to NODEV.  The new disk layer
  will adjust b_dev before forwarding a request to the actual physical
  device.

* A bug in the ISO CD boot sequence which resulted in a panic has been fixed.

Testing by: lots of people, but David Rhodus found the most egregious bugs.
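
The mask/match range described for cdevsw_add() in the 1.16 log message can be pictured with a small user-space sketch. It assumes that a registration covers a minor number when the minor's masked bits equal the match value; the struct and function names (range_entry, range_covers, range_lookup) are hypothetical and are not the kernel's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one registration; not the kernel's cdevsw. */
struct range_entry {
    uint32_t mask;     /* which minor-number bits are compared */
    uint32_t match;    /* required value of those bits */
    const char *name;  /* label for illustration only */
};

/* Assumption: a minor is in range when its masked bits equal 'match'. */
static bool
range_covers(const struct range_entry *e, uint32_t minor)
{
    return (minor & e->mask) == e->match;
}

/* Return the first registration covering 'minor', or NULL if none does. */
static const struct range_entry *
range_lookup(const struct range_entry *tab, size_t n, uint32_t minor)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (range_covers(&tab[i], minor))
            return &tab[i];
    }
    return NULL;
}
```

With two entries that share a mask of 0xf0 and use match values 0x00 and 0x10, the ranges are distinct, which mirrors the requirement that registrations under one major number not overlap.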

Revision 1.15: download - view: text, markup, annotated - select for diffs
Thu May 13 23:49:15 2004 UTC (10 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +1 -1 lines
device switch 1/many: Remove d_autoq, add d_clone (where d_autoq was).

d_autoq was used to allow the device port dispatch to mix old-style synchronous
calls with new style messaging calls within a particular device.  It was never
used for that purpose.

d_clone will be more fully implemented as work continues.  We are going to
install d_port in the dev_t (struct specinfo) structure itself and d_clone
will be needed to allow devices to 'revector' the port on a minor-number
by minor-number basis, in particular allowing minor numbers to be directly
dispatched to distinct threads.  This is something we will be needing later
on.

Revision 1.14: download - view: text, markup, annotated - select for diffs
Mon Mar 15 01:10:43 2004 UTC (10 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +4 -14 lines
The cam_sim structure was being deallocated unconditionally by device
driver detach routines.  The problem with this is that part of the CAM
bus structure may still be active (for example, with pending timeout()'s),
and even though the bus, target, and device is freed, since the sim IS
freed any accesses through the sim will hit 0xdeadc0de.  This case most often
occurs with USB UMASS devices.

The CAM_XPT and CAM_SIM layer has been revamped.  CAM_DEV_UNCONFIGURED is now
accounted for in the device->refcount, and the cam_sim structure is now
ref-counted as well.  Additionally, the cam_simq* code which handles the
device queues has been revamped to refcount as well, so shared device queues
(raid and multi-channel devices) are not free()'d before all references have
gone away.

scsi_low free'd its cam_sim twice.  Fixed.

USB was improperly using M_NOWAIT.  All M_NOWAIT instances have been renamed
to M_INTWAIT.
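
The reference-counting approach described in the 1.14 log message can be illustrated with a minimal user-space sketch. It is not the CAM code: shared_obj, obj_alloc, obj_hold, and obj_release are hypothetical names, the freed_flag field exists only so the release can be observed, and the sketch is single-threaded with no locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared object standing in for something like a SIM. */
struct shared_obj {
    int refcount;     /* number of outstanding holders */
    int *freed_flag;  /* illustration only: set to 1 when freed */
};

/* Allocate an object that starts with one reference held by the caller. */
static struct shared_obj *
obj_alloc(int *freed_flag)
{
    struct shared_obj *o = malloc(sizeof(*o));

    if (o == NULL)
        return NULL;
    o->refcount = 1;
    o->freed_flag = freed_flag;
    return o;
}

/* Take an additional reference, e.g. for a pending timeout. */
static void
obj_hold(struct shared_obj *o)
{
    assert(o->refcount > 0);
    o->refcount++;
}

/* Drop a reference; the object is freed only when the last one goes away. */
static void
obj_release(struct shared_obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        *o->freed_flag = 1;
        free(o);
    }
}
```

A detach routine that calls obj_release() while another holder still has a reference leaves the object intact, which is the property the log message describes for shared device queues.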

Revision 1.13: download - view: text, markup, annotated - select for diffs
Mon Mar 1 06:33:13 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +2 -2 lines
Newtoken commit.  Change the token implementation as follows:  (1) Obtaining
a token no longer enters a critical section.  (2) tokens can be held through
scheduler switches and blocking conditions and are effectively released and
reacquired on resume.  Thus tokens serialize access only while the thread
is actually running.  Serialization is not broken by preemptive interrupts.
That is, interrupt threads which preempt do not release the preempted thread's
tokens.  (3) Unlike spl's, tokens will interlock w/ interrupt threads on
the same or on a different cpu.

The vnode interlock code has been rewritten and the API has changed.  The
mountlist vnode scanning code has been consolidated and all known races have
been fixed.  The vnode interlock is now a pool token.

The code that frees unreferenced vnodes whose last VM page has been freed has
been moved out of the low level vm_page_free() code and moved to the
periodic filesystem syncer code in vfs_msync().

The SMP startup code and the IPI code have been cleaned up considerably.
Certain early token interactions on AP cpus have been moved to the BSP.

The LWKT rwlock API has been cleaned up and turned on.

Major testing by: David Rhodus

Revision 1.12: download - view: text, markup, annotated - select for diffs
Tue Sep 23 05:03:40 2003 UTC (10 years, 11 months ago) by dillon
Branches: MAIN
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +1 -1 lines
namecache work stage 1: namespace cleanups.  Add a NAMEI_ prefix to
CREATE, LOOKUP, DELETE, and RENAME.  Add a CNP_ prefix to all the name
lookup flags (nd_flags) e.g. ISDOTDOT->CNP_ISDOTDOT.

Revision 1.11: download - view: text, markup, annotated - select for diffs
Wed Aug 27 10:35:16 2003 UTC (11 years ago) by rob
Branches: MAIN
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +15 -15 lines
remove __P() from this directory

Revision 1.10: download - view: text, markup, annotated - select for diffs
Thu Aug 7 21:54:29 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +1 -1 lines
kernel tree reorganization stage 1: Major cvs repository work (not logged as
commits) plus a major reworking of the #include's to accommodate the
relocations.

    * CVS repository files manually moved.  Old directories left intact
      and empty (temporary).

    * Reorganize all filesystems into vfs/, most devices into dev/,
      sub-divide devices by function.

    * Begin to move device-specific architecture files to the device
      subdirs rather than throwing them all into, e.g. i386/include

    * Reorganize files related to system busses, placing the related code
      in a new bus/ directory.  Also move cam to bus/cam though this may
      not have been the best idea in retrospect.

    * Reorganize emulation code and place it in a new emulation/ directory.

    * Remove the -I- compiler option in order to allow #include file
      localization, rename all config generated X.h files to use_X.h to
      clean up the conflicts.

    * Remove /usr/src/include (or /usr/include) dependencies during the
      kernel build, beyond what is normally needed to compile helper
      programs.

    * Make config create 'machine' softlinks for architecture specific
      directories outside of the standard <arch>/include.

    * Bump the config rev.

    WARNING! after this commit /usr/include and /usr/src/sys/compile/*
    should be regenerated from scratch.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Thu Aug 7 21:16:52 2003 UTC (11 years ago) by dillon
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +1 -1 lines
kernel tree reorganization stage 1: Major cvs repository work (not logged as
commits) plus a major reworking of the #include's to accommodate the
relocations.

    * CVS repository files manually moved.  Old directories left intact
      and empty (temporary).

    * Reorganize all filesystems into vfs/, most devices into dev/,
      sub-divide devices by function.

    * Begin to move device-specific architecture files to the device
      subdirs rather than throwing them all into, e.g. i386/include

    * Reorganize files related to system busses, placing the related code
      in a new bus/ directory.  Also move cam to bus/cam though this may
      not have been the best idea in retrospect.

    * Reorganize emulation code and place it in a new emulation/ directory.

    * Remove the -I- compiler option in order to allow #include file
      localization, rename all config generated X.h files to use_X.h to
      clean up the conflicts.

    * Remove /usr/src/include (or /usr/include) dependencies during the
      kernel build, beyond what is normally needed to compile helper
      programs.

    * Make config create 'machine' softlinks for architecture specific
      directories outside of the standard <arch>/include.

    * Bump the config rev.

    WARNING! after this commit /usr/include and /usr/src/sys/compile/*
    should be regenerated from scratch.

Revision 1.8: download - view: text, markup, annotated - select for diffs
Mon Jul 21 05:50:28 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +7 -6 lines
DEV messaging stage 1/4: Rearrange struct cdevsw and add a message port
and auto-queueing mask.  The mask will tell us which message functions
can be safely queued to another thread and which still need to run in the
context of the caller.   Primary configuration fields (name, cmaj, flags,
port, autoq mask) are now at the head of the structure.  Function vectors,
which may eventually go away, are at the end.  The port and autoq fields
are non-functional in this stage.

The old BDEV device major number support has also been removed from cdevsw,
and code has been added to translate the bootdev passed from the boot code
(the boot code has always passed the now defunct block device major numbers
and we obviously need to keep that compatibility intact).

Revision 1.7: download - view: text, markup, annotated - select for diffs
Sat Jul 19 21:14:19 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +2 -2 lines
Remove the priority part of the priority|flags argument to tsleep().  Only
flags are passed now.  The priority was a user scheduler thingy that is not
used by the LWKT subsystem.  For process statistics assume sleeps without
P_SINTR set to be disk-waits, and sleeps with it set to be normal sleeps.

This commit should not contain any operational changes.

Revision 1.6: download - view: text, markup, annotated - select for diffs
Thu Jun 26 05:55:11 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +5 -8 lines
proc->thread stage 5:  BUF/VFS clearance!  Remove the ucred argument from
vop_close, vop_getattr, vop_fsync, and vop_createvobject.  These VOPs can
be called from multiple contexts so the cred is fairly useless, and UFS
ignores it anyway.  For filesystems (like NFS) that sometimes need a cred
we use proc0.p_ucred for now.

This removal also removed the need for a 'proc' reference in the related
VFS procedures, which greatly helps our proc->thread conversion.

bp->b_wcred and bp->b_rcred have also been removed, and for the same reason.
It makes no sense to have a particular cred when multiple users can
access a file.  This may create issues with certain types of NFS mounts
but if it does we will solve them in a way that doesn't pollute the
struct buf.

Revision 1.5: download - view: text, markup, annotated - select for diffs
Wed Jun 25 03:55:47 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +26 -23 lines
proc->thread stage 4: rework the VFS and DEVICE subsystems to take thread
pointers instead of process pointers as arguments, similar to what FreeBSD-5
did.  Note however that ultimately both APIs are going to be message-passing
which means the current thread context will not be useable for creds and
descriptor access.

Revision 1.4: download - view: text, markup, annotated - select for diffs
Mon Jun 23 17:55:30 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +10 -19 lines
proc->thread stage 2: MAJOR revamping of system calls, ucred, jail API,
and some work on the low level device interface (proc arg -> thread arg).
As -current did, I have removed p_cred and incorporated its functions
into p_ucred.  p_prison has also been moved into p_ucred and adjusted
accordingly.  The jail interface tests now use ucreds rather than processes.

The syscall(p,uap) interface has been changed to just (uap).  This is inclusive
of the emulation code.  It makes little sense to pass a proc pointer around
which confuses the MP readability of the code, because most system call code
will only work with the current process anyway.  Note that eventually
*ALL* syscall emulation code will be moved to a kernel-protected userland
layer because it really makes no sense whatsoever to implement these
emulations in the kernel.

suser() now takes no arguments and only operates with the current process.
The process argument has been removed from suser_xxx() so it now just takes
a ucred and flags.

The sysctl interface was adjusted somewhat.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Thu Jun 19 01:55:03 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +2 -1 lines
thread stage 5: Separate the inline functions out of sys/buf.h, creating
sys/buf2.h (A methodology that will continue as time passes).  This solves
inline vs struct ordering problems.

Do a major cleanup of the globaldata access methodology.  Create a
gcc-cacheable 'mycpu' macro & inline to access per-cpu data.  Atomicity is not
required because we will never change cpus out from under a thread, even if
it gets preempted by an interrupt thread, because we want to be able to
implement per-cpu caches that do not require locked bus cycles or special
instructions.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:28:23 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:54:07 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.73.2.1
