DragonFly BSD

CVS log for src/sys/sys/buf.h

Keyword substitution: kv
Default branch: MAIN


Revision 1.51.2.1: download - view: text, markup, annotated - select for diffs
Thu Sep 25 01:44:54 2008 UTC (5 years, 10 months ago) by dillon
Branches: DragonFly_RELEASE_2_0
CVS tags: DragonFly_RELEASE_2_0_Slip
Diff to: previous 1.51: preferred, unified; next MAIN 1.52: preferred, unified
Changes since revision 1.51: +8 -1 lines
MFC numerous features from HEAD.

* Bounce buffer fixes for physio.
* Disk flush support in scsi and nata subsystems.
* Dead bio handling

Revision 1.54: download - view: text, markup, annotated - select for diffs
Fri Aug 29 20:08:37 2008 UTC (5 years, 11 months ago) by dillon
Branches: MAIN
CVS tags: HEAD
Diff to: previous 1.53: preferred, unified
Changes since revision 1.53: +2 -1 lines
Add BUF_CMD_FLUSH support - issue flush command to mass storage device.

Revision 1.53: download - view: text, markup, annotated - select for diffs
Sun Aug 10 20:03:15 2008 UTC (6 years ago) by dillon
Branches: MAIN
Diff to: previous 1.52: preferred, unified
Changes since revision 1.52: +2 -0 lines
Implement a bounce buffer for physio if the buffer passed from userland
is not at least 16-byte aligned.

Reported-by: "Steve O'Hara-Smith" <steve@sohara.org>, and others

Revision 1.52: download - view: text, markup, annotated - select for diffs
Thu Jul 17 23:55:24 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Preview
Diff to: previous 1.51: preferred, unified
Changes since revision 1.51: +4 -0 lines
Code documentation only: Describe B_NOCACHE

Revision 1.51: download - view: text, markup, annotated - select for diffs
Mon Jul 14 03:08:58 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_2_0
Diff to: previous 1.50: preferred, unified
Changes since revision 1.50: +8 -1 lines
Kernel support for HAMMER:

* Add another type to the bio->bio_caller_info1 union

* Add two new flags to getblk(), used by the cluster code.

  GETBLK_SZMATCH	- Tell getblk() to fail and return NULL if a
			  pre-existing buffer's size does not match
			  the requested size (this prevents getblk()
			  from doing a potentially undesired bwrite()
			  sequence).

  GETBLK_NOWAIT		- Tell getblk() to use a non-blocking lock.

* pop_bio() now returns the previous BIO (or NULL if there is no previous
  BIO).  This allows HAMMER to chain bio_done()'s

* Fix a bug in cluster_read().  The cluster code's read-ahead at the
  end could go past the caller-specified limit and force a block to
  the wrong block size.
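
A hedged sketch of how cluster-style code might use the two flags above.
GETBLK_SZMATCH and GETBLK_NOWAIT are taken from the log; the getblk()
argument order (the blkflags parameter introduced in revision 1.42 below)
is assumed:

    struct buf *bp;

    /*
     * Non-blocking lookup that refuses to reuse a buffer whose size
     * differs from 'blksize'; getblk() returns NULL instead of doing
     * a potentially unwanted bwrite() of the mismatched buffer.
     */
    bp = getblk(vp, loffset, blksize, GETBLK_SZMATCH | GETBLK_NOWAIT, 0);
    if (bp == NULL)
            return;                 /* skip read-ahead for this block */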

Revision 1.50: download - view: text, markup, annotated - select for diffs
Sat Jun 28 23:45:19 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.49: preferred, unified
Changes since revision 1.49: +1 -1 lines
Fix hopefully all possible deadlocks that can occur when mixed block sizes
are used with the buffer cache.  The fix is simply to base the limiting
and flushing code on a byte count rather then a buffer count.

This will allow UFS to utilize a greater number of dirty buffers and
will cause HAMMER to use fewer.  This also makes tuning the buffer cache
a whole lot easier.

Revision 1.49: download - view: text, markup, annotated - select for diffs
Sat Jun 28 17:59:47 2008 UTC (6 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.48: preferred, unified
Changes since revision 1.48: +2 -1 lines
Replace the bwillwrite() subsystem to make it more fair to processes.

* Add new API functions, bwillread(), bwillwrite(), bwillinode() which
  the kernel calls when it intends to read, write, or make inode
  modifications.

* Redo the backend.  Add bd_heatup() and bd_wait().  bd_heatup() heats up
  the buf_daemon, starting it flushing before we hit any blocking conditions
  (similar to the previous algorithm).

* The new bwill*() blocking functions no longer introduce escalating delays
  to keep the number of dirty buffers under control.  Instead they take a page
  from HAMMER and estimate the load caused by the caller, then wait for a
  specific number of dirty buffers to complete their write I/O's before
  returning.  If the buffers can be retired quickly these functions will
  return more quickly.
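
A sketch of the intended call pattern in a write path, assuming the reworked
bwillwrite() takes a rough byte-count estimate of the caller's load (the log
only says the load is estimated per caller):

    /*
     * Heat up buf_daemon and, if too many dirty buffers are already
     * outstanding, wait for a proportional amount of write I/O to
     * retire before dirtying more data.
     */
    bwillwrite(uio->uio_resid);     /* may block under heavy dirty-buffer load */
    /* ... proceed with the write, dirtying buffers via bdwrite() etc. ... */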

Revision 1.48: download - view: text, markup, annotated - select for diffs
Thu Jun 19 23:27:36 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.47: preferred, unified
Changes since revision 1.47: +5 -1 lines
Miscellaneous performance adjustments to the kernel

* Add an argument to VOP_BMAP so VFSs can discern the type of operation
  the BMAP is being done for.

* Normalize the variable name denoting the blocksize to 'blksize' in
  vfs_cluster.c.

* Fix a bug in the cluster code where a stale bp->b_error could wind up
  getting returned when B_ERROR is not set.

* Do not B_AGE cluster bufs.

* Pass the block size to both cluster_read() and cluster_write() instead
  of those routines getting the block size from
  vp->v_mount->mnt_stat.f_iosize.  This allows different areas of a file
  to use a different block size.

* Properly initialize bp->b_bio2.bio_offset to doffset in cluster_read().
  This fixes an issue where VFSs were making an extra, unnecessary call
  to BMAP.

* Do not recycle vnodes on the free list until numvnodes has reached
  desiredvnodes.  Vnodes were being recycled when their resident page count
  had dropped to zero, but this is actually too early as the VFS may cache
  important information in the vnode that would otherwise require a number
  of I/O's to re-acquire.  This mainly helps HAMMER (whose inode lookups are
  fairly expensive).

* Do not VAGE vnodes.

* Remove the minvnodes test.  There is no reason not to load the vnode cache
  all the way through to its max.

* buf_cmd_t visibility for the new BMAP argument.
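
A hedged sketch of the extended BMAP call from the first item above; the
argument order and the use of a buf_cmd_t constant here are assumptions
layered on what the log states:

    off_t doffset;
    int runp, runb;
    int error;

    /*
     * Translate logical offset 'loffset' to a device offset, telling
     * the VFS the mapping is on behalf of a read so it can size
     * read-ahead appropriately.
     */
    error = VOP_BMAP(vp, loffset, &doffset, &runp, &runb, BUF_CMD_READ);
    if (error == 0 && doffset != NOOFFSET)
            bp->b_bio2.bio_offset = doffset;        /* cache the translation */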

Revision 1.47: download - view: text, markup, annotated - select for diffs
Thu Jun 12 23:26:36 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.46: preferred, unified
Changes since revision 1.46: +16 -2 lines
Reimplement B_AGE.  Have it cycle the buffer in the queue twice instead of
placing buffers at the head of the queue (which causes them to be run-down
backwards).  Leave B_AGE set through the write cycle and have the bufdaemon
set the flag when flushing dirty buffers.  B_AGE no longer affects the
ordering of the actual write and is allowed to slide through to the clean
queue when the write completes.

Revision 1.46: download - view: text, markup, annotated - select for diffs
Mon Jun 9 16:54:45 2008 UTC (6 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.45: preferred, unified
Changes since revision 1.45: +1 -0 lines
Add an extern for hidirtybuffers.

Reported-by: Michael Neumann <mneumann@ntecs.de>

Revision 1.45: download - view: text, markup, annotated - select for diffs
Tue May 6 00:14:11 2008 UTC (6 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.44: preferred, unified
Changes since revision 1.44: +2 -1 lines
Keep track of the number of buffers undergoing I/O, and include that number
in calculations involving numdirtybuffers.  This prevents the kernel from
believing that there are only a few dirty buffers when, in fact, all the
dirty buffers are running IOs.

Revision 1.44: download - view: text, markup, annotated - select for diffs
Tue Apr 22 18:46:52 2008 UTC (6 years, 4 months ago) by dillon
Branches: MAIN
Diff to: previous 1.43: preferred, unified
Changes since revision 1.43: +2 -8 lines
Fix some IO sequencing performance issues and reformulate the strategy
we use to deal with potential buffer cache deadlocks.  Generally speaking
try to remove roadblocks in the vn_strategy() path.

* Remove buf->b_tid (HAMMER no longer needs it)

* Replace IO_NOWDRAIN with IO_NOBWILL, requesting that bwillwrite() not
  be called.  Used by VN to try to avoid deadlocking.  Remove B_NOWDRAIN.

* No longer block in bwrite() or getblk() when we have a lot of dirty
  buffers.   getblk() in particular needs to be callable by filesystems
  to drain dirty buffers and we don't want to deadlock.

* Improve bwillwrite() by having it wake up the buffer flusher at 1/2 the
  dirty buffer limit but not block, and then block if the limit is reached.
  This should smooth out flushes during heavy filesystem activity.

Revision 1.43: download - view: text, markup, annotated - select for diffs
Tue Feb 5 07:58:41 2008 UTC (6 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_12_Slip, DragonFly_RELEASE_1_12
Diff to: previous 1.42: preferred, unified
Changes since revision 1.42: +1 -0 lines
HAMMER 25/many: Pruning code

* Add b_tid to struct buf so dirty buffer cache buffers can be tagged with
  a transaction id to try to retain consistency when doing as-of queries
  on files that change size (so the data records have a TID <= the inode
  record).  This is also an issue when a file is created and immediately
  written to.  This may be temporary, a more sophisticated solution is needed.

* Fix a bug in the special handling of create_tid for as-of queries
  in btree_search().  An assignment was off by one, causing historical
  queries to not be able to find bits of data here and there.

* Freeze the transaction id for newly created inodes until the initial
  inode record is laid down on disk, so the transaction id matches the
  transaction id of the related directory entry.

* Major work on the pruning code.   When pruning the tree to a particular
  granularity the create_tid and delete_tid of related records must be
  aligned to that granularity in order to avoid creating 'holes' at
  various time points.

  This requires some serious B-Tree manipulation because the right-hand
  boundary may need to be shifted when the create_tid of an existing
  record is forward aligned.  This work is still in progress but it works
  in basic testing.

  Prune the tree in the reverse direction instead of in the forward
  direction.  This keeps the B-Tree consistent when we have to adjust
  the right-hand boundary to accommodate the realignment of create_tid.

Revision 1.42: download - view: text, markup, annotated - select for diffs
Thu Jan 10 07:34:03 2008 UTC (6 years, 7 months ago) by dillon
Branches: MAIN
Diff to: previous 1.41: preferred, unified
Changes since revision 1.41: +15 -1 lines
Fix buffer cache deadlocks by splitting dirty buffers into two categories:
Light weight dirty buffers and heavy weight dirty buffers.  Add a second
buffer cache flushing daemon to deal with the heavy weight dirty buffers.

Currently only HAMMER uses the new feature, but it can also easily be used
by UFS in the future.

Buffer cache deadlocks can occur in low memory situations where the buffer
cache tries to flush out dirty buffers and deadlocks when the act of
flushing a dirty buffer requires additional buffers to be acquired.  Because
there was only one buffer flushing daemon, a deadlock on a heavy weight buffer
prevented any further buffer flushes, whether light or heavy weight, and
wound up deadlocking the entire system.

Giving the heavy weight buffers their own daemon solves the problem by
allowing light weight buffers to continue to be flushed even if a stall
occurs on a heavy weight buffer.  The number of dirty heavy weight buffers
is limited to ensure that enough light weight buffers are available.

This is primarily implemented by changing getblk()'s mostly unused slpflag
parameter to a new blkflags parameter and adding a new buffer cache queue
called BQUEUE_DIRTY_HW.
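
A sketch of how a filesystem could mark a metadata buffer heavy weight when
acquiring it.  Only the blkflags parameter and the BQUEUE_DIRTY_HW queue are
stated in the log; the GETBLK_BHEAVY flag name is an assumption:

    /*
     * Acquire a buffer whose flush may itself require more buffers
     * (e.g. metadata).  Marking it heavy weight (flag name assumed)
     * routes it to BQUEUE_DIRTY_HW and the second flusher daemon
     * when it is later dirtied.
     */
    bp = getblk(devvp, dev_offset, bufsize, GETBLK_BHEAVY, 0);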

Revision 1.41: download - view: text, markup, annotated - select for diffs
Wed Nov 7 00:46:38 2007 UTC (6 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.40: preferred, unified
Changes since revision 1.40: +6 -0 lines
Add bio_ops->io_checkread and io_checkwrite - a read and write pre-check
which gives HAMMER a chance to set B_LOCKED if the kernel wants to write out
a passively held buffer.

Change B_LOCKED semantics slightly.  B_LOCKED buffers will not be written
until B_LOCKED is cleared.  This allows HAMMER to hold off B_DELWRI writes
on passively held buffers.
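
A sketch of a filesystem-supplied write pre-check.  The two field names come
from the log; the callback signature and the helper it calls are assumptions:

    /*
     * Hypothetical pre-check: if the filesystem still holds this
     * buffer passively, set B_LOCKED so the kernel defers the write
     * rather than flushing the buffer out from under the filesystem.
     */
    static int
    myfs_io_checkwrite(struct buf *bp)
    {
            if (myfs_buf_is_passively_held(bp)) {   /* illustrative helper */
                    bp->b_flags |= B_LOCKED;
                    return (1);                     /* ask the caller to skip it */
            }
            return (0);
    }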

Revision 1.40: download - view: text, markup, annotated - select for diffs
Tue Nov 6 20:06:24 2007 UTC (6 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.39: preferred, unified
Changes since revision 1.39: +1 -0 lines
Add regetblk() - reacquire a buffer lock.  The buffer must be B_LOCKED or
must be interlocked with bio_ops.  Used by HAMMER.

Further changes to B_LOCKED buffers.  A B_LOCKED|B_DELWRI buffer will be
placed on the dirty queue and then returned to the locked queue once the
I/O completes.  That is, B_LOCKED does not interfere with B_DELWRI
operation.

Revision 1.39: download - view: text, markup, annotated - select for diffs
Tue Nov 6 03:49:59 2007 UTC (6 years, 9 months ago) by dillon
Branches: MAIN
Diff to: previous 1.38: preferred, unified
Changes since revision 1.38: +4 -15 lines
Convert the global 'bioops' into per-mount bio_ops.  For now we also have
to have a per buffer b_ops as well since the controlling filesystem cannot
be located from information in struct buf (b_vp could be the backing store
so that can't be used).  This change allows HAMMER to use bio_ops.

Change the ordering of the bio_ops.io_deallocate call so it occurs before
the buffer's B_LOCKED is checked.  This allows the deallocate call to set
B_LOCKED to retain the buffer in situations where the target filesystem
is unable to immediately disassociate the buffer.  Also keep VMIO intact
for B_LOCKED buffers (in addition to B_DELWRI buffers).

HAMMER will use this feature to keep buffers passively associated with
other filesystem structures and thus be able to avoid constantly brelse()ing
and getblk()ing them.

Revision 1.38: download - view: text, markup, annotated - select for diffs
Fri Jul 28 02:17:41 2006 UTC (8 years ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_8_Slip, DragonFly_RELEASE_1_8, DragonFly_RELEASE_1_10_Slip, DragonFly_RELEASE_1_10
Diff to: previous 1.37: preferred, unified
Changes since revision 1.37: +2 -3 lines
MASSIVE reorganization of the device operations vector.  Change cdevsw
to dev_ops.  dev_ops is a syslink-compatible operations vector structure
similar to the vop_ops structure used by vnodes.

Remove a huge number of instances where a thread pointer is still being
passed as an argument to various device ops and other related routines.
The device OPEN and IOCTL calls now take a ucred instead of a thread pointer,
and the CLOSE call no longer takes a thread pointer.

Revision 1.21.2.1: download - view: text, markup, annotated - select for diffs
Mon Jun 5 14:51:29 2006 UTC (8 years, 2 months ago) by dillon
Branches: DragonFly_RELEASE_1_4
CVS tags: DragonFly_RELEASE_1_4_Slip
Diff to: previous 1.21: preferred, unified; next MAIN 1.22: preferred, unified
Changes since revision 1.21: +1 -0 lines
Add some diagnostic messages to try to catch a ufs_dirbad panic before it
happens.

MFC: Reorder BUF_UNLOCK() - it must occur after b_flags is modified, not
before.

A newly created non-VMIO buffer is now marked B_INVAL.  Callers of getblk()
now always clear B_INVAL before issuing a READ I/O or when clearing or
overwriting the buffer.  Before this change, a getblk() (getnewbuf),
brelse(), getblk() sequence on a non-VMIO buffer would result in a buffer
with B_CACHE set yet containing uninitialized data.

MFC: B_NOCACHE cannot be set on a clean VMIO-backed buffer as this will
destroy the VM backing store, which might be dirty.

MFC: Reorder vnode_pager_setsize() calls to close a race condition.

Revision 1.37: download - view: text, markup, annotated - select for diffs
Thu May 25 19:31:14 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_RELEASE_1_6_Slip, DragonFly_RELEASE_1_6
Diff to: previous 1.36: preferred, unified
Changes since revision 1.36: +1 -1 lines
Fix several buffer cache issues related to B_NOCACHE.

* Do not set B_NOCACHE when calling vinvalbuf(... V_SAVE).  This will
  destroy dirty VM backing store associated with clean buffers before
  the VM system has a chance to check for and flush them.

  Taken-from: FreeBSD

* Properly set B_NOCACHE when destroying buffers related to truncated data.

* Fix a bug in vnode_pager_setsize() that was recently introduced.
  v_filesize was being set before a new/old size comparison, causing a
  file truncation to not destroy related VM pages past the new EOF.

* Remove a bogus B_NOCACHE|B_DIRTY test in brelse().  This was originally
  intended to be a B_NOCACHE|B_DELWRITE test which then cleared B_NOCACHE,
  but now that B_NOCACHE operation has been fixed it really does indicate that
  the buffer, its contents, and its backing store are to be destroyed, even
  if the buffer is marked B_DELWRI.

  Instead of clearing B_NOCACHE when B_DELWRITE is found to be set, clear
  B_DELWRITE when B_NOCACHE is found to be set.

  Note that B_NOCACHE is still cleared when bdirty() is called in order to
  ensure that data is not lost when softupdates and other code do a
  'B_NOCACHE + bwrite' sequence.  Softupdates can redirty a buffer in its
  io completion hook and a write error can also redirty a buffer.

* The VMIO buffer rundown seems to have morphed into a state where the
  distinction between NFS and non-NFS buffers can be removed.  Remove
  the test.

Revision 1.36: download - view: text, markup, annotated - select for diffs
Sun May 21 03:43:47 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.35: preferred, unified
Changes since revision 1.35: +5 -3 lines
Clean up more #include files.  Create an internal __boolean_t so two or
three sys/ header files don't have to juggle the type.  Use
_KERNEL_STRUCTURES in various pieces of user code that delve into kvm.

Reported-by: Rumko <rumcic@gmail.com>, walt <wa1ter@myrealbox.com>

Revision 1.35: download - view: text, markup, annotated - select for diffs
Thu May 4 18:32:23 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.34: preferred, unified
Changes since revision 1.34: +15 -18 lines
Block devices generally truncate the size of I/O requests which go past EOF.
This is exactly what we want when manually reading or writing a block device
such as /dev/ad0s1a, but is not desired when a VFS issues I/O ops on
filesystem buffers.  In such cases, any EOF condition must be considered an
error.

Implement a new filesystem buffer flag B_BNOCLIP, which getblk() and friends
automatically set.  If set, block devices are guaranteed to return an error
if the I/O request is at EOF or would otherwise have to be clipped to EOF.
Block devices further guarantee that b_bcount will not be modified when this
flag is set.

Adjust all block device EOF checks to use the new flag, and clean up the code
while I'm there.  Also, set b_resid in a couple of degenerate cases where
it was not being set.
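
A sketch of the block-device side of the rule; the struct buf/bio fields are
standard, but the exact error handling is an assumption:

    /*
     * Request reaches past the media EOF.  Buffers obtained through
     * getblk() carry B_BNOCLIP and must fail outright; raw requests
     * without the flag are clipped as before.
     */
    if (bio->bio_offset + bp->b_bcount > media_size) {
            if (bp->b_flags & B_BNOCLIP) {
                    bp->b_error = EINVAL;           /* errno choice assumed */
                    bp->b_flags |= B_ERROR;
                    biodone(bio);
                    return;
            }
            bp->b_bcount = media_size - bio->bio_offset;
    }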

Revision 1.34: download - view: text, markup, annotated - select for diffs
Wed May 3 20:44:49 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.33: preferred, unified
Changes since revision 1.33: +23 -8 lines
- Clarify the definitions of b_bufsize, b_bcount, and b_resid.
- Remove unnecessary assignments based on the clarified fields.
- Add additional checks for premature EOF.

b_bufsize is only used by buffer management entities such as getblk() and
other vnode-backed buffer handling procedures.  b_bufsize is not required
for calls to vn_strategy() or dev_dstrategy().  A number of other subsystems
use it to track the original request size.

b_bcount is the I/O request size, but b_bcount is allowed to be truncated
by the device chain if the request encompasses EOF (such as on a raw disk
device).  A caller which needs to record the original buffer size verses
the EOF-truncated buffer can compare b_bcount after the I/O against a
recorded copy of the original request size.  This copy can be recorded in
b_bufsize for unmanaged buffers (malloced or getpbuf()'d buffers).

b_resid is always relative to b_bcount, not b_bufsize.  A successful read
that is truncated to the device EOF will thus have a b_resid of 0 and a
truncated b_bcount.
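
For an unmanaged (malloced or getpbuf()'d) buffer, the bookkeeping described
above might look like this sketch:

    int bytes_done, bytes_clipped;

    bp->b_bcount  = xfer_size;      /* I/O request size; a device may clip it */
    bp->b_bufsize = xfer_size;      /* private copy of the original size */

    /* ... issue the I/O via vn_strategy()/dev_dstrategy() and wait ... */

    bytes_done    = bp->b_bcount - bp->b_resid;    /* b_resid is relative to b_bcount */
    bytes_clipped = bp->b_bufsize - bp->b_bcount;  /* amount truncated at device EOF */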

Revision 1.33: download - view: text, markup, annotated - select for diffs
Sun Apr 30 20:23:25 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.32: preferred, unified
Changes since revision 1.32: +1 -2 lines
Remove buf->b_saveaddr, assert that vmapbuf() is only called on pbuf's.  Pass
the user pointer and length to vmapbuf() rather than having it try to pull
the information out of the buffer.  vmapbuf() is now responsible for setting
b_data, b_bufsize, and b_bcount.

Also fix a bug in cam_periph_mapmem().  The procedure was failing to unmap
earlier vmapped bufs if later vmapbuf() calls in the loop failed.

Revision 1.32: download - view: text, markup, annotated - select for diffs
Sun Apr 30 18:25:36 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.31: preferred, unified
Changes since revision 1.31: +10 -20 lines
Remove b_xflags.  Fold BX_VNCLEAN and BX_VNDIRTY into b_flags as
B_VNCLEAN and B_VNDIRTY.  Remove BX_AUTOCHAINDONE and recode the
swap pager to use one of the caller data fields in the BIO instead.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Sun Apr 30 17:22:17 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.30: preferred, unified
Changes since revision 1.30: +15 -7 lines
Replace the buffer cache's B_READ, B_WRITE, B_FORMAT, and B_FREEBUF
b_flags with a separate b_cmd field.  Use b_cmd to test for I/O completion
as well (getting rid of B_DONE in the process).  This further simplifies
the setup required to issue a buffer cache I/O.

Remove a redundant header file, bus/isa/i386/isa_dma.h and merge any
discrepancies into bus/isa/isavar.h.

Give ISADMA_READ/WRITE/RAW their own independent flag definitions instead of
trying to overload them on top of B_READ, B_WRITE, and B_RAW.  Add a
routine isa_dmabp() which takes a struct buf pointer and returns the ISA
dma flags associated with the operation.

Remove the 'clear_modify' argument to vfs_busy_pages().  Instead,
vfs_busy_pages() asserts that the buffer's b_cmd is valid and then uses
it to determine the action it must take.
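
The driver-side effect is dispatching on the command field instead of testing
B_READ/B_WRITE bits; a sketch, with the default-case handling purely
illustrative:

    switch (bp->b_cmd) {
    case BUF_CMD_READ:
            /* program the controller for a read */
            break;
    case BUF_CMD_WRITE:
            /* program the controller for a write */
            break;
    default:
            bp->b_error = EOPNOTSUPP;
            bp->b_flags |= B_ERROR;
            break;
    }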

Revision 1.30: download - view: text, markup, annotated - select for diffs
Fri Apr 28 16:34:01 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.29: preferred, unified
Changes since revision 1.29: +1 -3 lines
Get rid of pbgetvp() and pbrelvp().  Instead fold the B_PAGING flag directly
into getpbuf() (the only type of buffer that pbgetvp() could be called on
anyway).  Change related b_flags assignments from '=' to '|='.

Get rid of remaining dependencies on b_vp.  vn_strategy() now relies solely
on the vp passed to it as an argument.  Remove buffer cache code that sets
b_vp for anonymous pbuf's.

Add a stopgap 'vp' argument to vfs_busy_pages().  This is only really needed
by NFS and the clustering code due to the severely hackish nature of the
NFS and clustering code.

Fix a bug in the ext2fs inode code where vfs_busy_pages() was being called
on B_CACHE buffers.  Add an assertion to vfs_busy_pages() to panic if it
encounters a B_CACHE buffer.

Revision 1.29: download - view: text, markup, annotated - select for diffs
Fri Apr 28 06:13:55 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.28: preferred, unified
Changes since revision 1.28: +3 -3 lines
Get rid of the remaining buffer background bitmap code.  It's been turned
off for a while, and it represents a fairly severe hack to the buffer
cache code that just complicates further development.

Revision 1.28: download - view: text, markup, annotated - select for diffs
Fri Apr 28 00:24:46 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.27: preferred, unified
Changes since revision 1.27: +1 -1 lines
Remove the buffer cache's B_PHYS flag.  This flag was originally used as
part of a severe hack to treat buffers containing 'user' addresses
differently, in particular by using b_offset instead of b_blkno.  Now that
buffer cache buffers only HAVE b_offset (b_*blkno is gone for good), there
is literally no difference between B_PHYS I/O and non-B_PHYS I/O once
the buffer has been handed off to the device.

Revision 1.27: download - view: text, markup, annotated - select for diffs
Thu Apr 27 23:28:34 2006 UTC (8 years, 3 months ago) by dillon
Branches: MAIN
Diff to: previous 1.26: preferred, unified
Changes since revision 1.26: +2 -1 lines
Move most references to the buffer cache array (buf[]) to kern/vfs_bio.c.
Implement a procedure which scans all buffers, called scan_all_buffers().
Cleanup unused debugging code referencing buf[].

Revision 1.26: download - view: text, markup, annotated - select for diffs
Sat Mar 25 21:46:38 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.25: preferred, unified
Changes since revision 1.25: +2 -2 lines
Clean up the extended lookup features in the red-black tree code.

Revision 1.25: download - view: text, markup, annotated - select for diffs
Fri Mar 24 18:35:33 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.24: preferred, unified
Changes since revision 1.24: +20 -26 lines
Major BUF/BIO work commit.  Make I/O BIO-centric and specify the disk or
file location with a 64 bit offset instead of a 32 bit block number.

* All I/O is now BIO-centric instead of BUF-centric.

* File/Disk addresses universally use a 64 bit bio_offset now.  bio_blkno
  no longer exists.

* Stackable BIO's hold disk offset translations.  Translations are no longer
  overloaded onto a single structure (BUF or BIO).

* bio_offset == NOOFFSET is now universally used to indicate that a
  translation has not been made.  The old (blkno == lblkno) junk has all
  been removed.

* There is no longer a distinction between logical I/O and physical I/O.

* All driver BUFQs have been converted to BIOQs.

* BMAP, FREEBLKS, getblk, bread, breadn, bwrite, inmem, cluster_*,
  and findblk all now take and/or return 64 bit byte offsets instead
  of block numbers.  Note that BMAP now returns a byte range for the before
  and after variables.
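
A sketch of the byte-offset convention after this change; the bread()
signature shown is assumed from the description (offset and size in bytes,
no block numbers):

    struct buf *bp;
    int error;

    /* read one logical block at byte offset 'loffset' */
    error = bread(vp, loffset, blksize, &bp);
    if (error) {
            brelse(bp);
            return (error);
    }

    if (bp->b_bio2.bio_offset == NOOFFSET) {
            /* device translation not cached yet; a BMAP would fill it in */
    }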

Revision 1.24: download - view: text, markup, annotated - select for diffs
Sun Mar 5 18:38:36 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.23: preferred, unified
Changes since revision 1.23: +8 -9 lines
Replace the global buffer cache hash table with a per-vnode red-black tree.
Add a B_HASHED b_flags bit as a sanity check.  Remove the invalhash junk
and replace with assertions in several cases where the buffer must already
not be hashed.  Get rid of incore() and gbincore() and replace with a new
function called findblk().

Merge the new RB management with bgetvp(), the two are now fully integrated.

Previous work has turned reassignbuf() into a mostly degenerate call; simplify
its arguments and functionality to match.  Remove an unnecessary reassignbuf()
call from the NFS code.  Get rid of pbreassignbuf().

Adjust the code in several places where it was assumed that calling
BUF_LOCK() with LK_SLEEPFAIL after previously failing with LK_NOWAIT
would always fail.  This code was used to sleep before a retry.  Instead,
if the second lock unexpectedly succeeds, simply issue an unlock and retry
anyway.

Testing-by: Stefan Krueger <skrueger@meinberlikomm.de>

Revision 1.23: download - view: text, markup, annotated - select for diffs
Thu Mar 2 19:26:17 2006 UTC (8 years, 5 months ago) by dillon
Branches: MAIN
Diff to: previous 1.22: preferred, unified
Changes since revision 1.22: +0 -1 lines
buftimespinlock is utterly useless since the spinlock is released
within lockmgr().  The only real problem was with lk_prio, which no longer
exists, so get rid of the spin lock and document the remaining passive
races.

Revision 1.22: download - view: text, markup, annotated - select for diffs
Fri Feb 17 19:18:07 2006 UTC (8 years, 6 months ago) by dillon
Branches: MAIN
Diff to: previous 1.21: preferred, unified
Changes since revision 1.21: +53 -48 lines
Make the entire BUF/BIO system BIO-centric instead of BUF-centric.  Vnode
and device strategy routines now take a BIO and must pass that BIO to
biodone().  All code which previously managed a BUF undergoing I/O now
manages a BIO.

The new BIO-centric algorithms allow BIOs to be stacked, where each layer
represents a block translation, completion callback, or caller or device
private data.  This information is no longer overloaded within the BUF.
Translation layer linkages remain intact as a 'cache' after I/O has completed.

The VOP and DEV strategy routines no longer make assumptions as to which
translated block number applies to them.  They use the block number in the
BIO specifically passed to them.

Change the 'untranslated' constant to NOOFFSET (for bio_offset), and
(daddr_t)-1 (for bio_blkno).  Rip out all code that previously set the
translated block number to the untranslated block number to indicate
that the translation had not been made.

Rip out all the cluster linkage fields for clustered VFS and clustered
paging operations.  Clustering now occurs in a private BIO layer using
private fields within the BIO.

Reformulate the vn_strategy() and dev_dstrategy() abstraction(s).  These
routines no longer assume that bp->b_vp == the vp of the VOP operation, and
the dev_t is no longer stored in the struct buf.  Instead, only the vp passed
to vn_strategy() (and related *_strategy() routines for VFS ops), and
the dev_t passed to dev_dstrategy() (and related *_strategy() routines for
device ops) are used by the VFS or DEV code.  This will allow an arbitrary
number of translation layers in the future.

Create an independent per-BIO tracking entity, struct bio_track, which
is used to determine when I/O is in-progress on the associated device
or vnode.

NOTE: Unlike FreeBSD's BIO work, our struct BUF is still used to hold
the fields describing the data buffer, resid, and error state.

Major-testing-by: Stefan Krueger
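
A hedged sketch of a translation layer under the new scheme.  push_bio() is
implied by the pop_bio() change in revision 1.51 above; the callback wiring
is an assumption:

    struct bio *nbio;

    /*
     * Push a new BIO layer holding the translated device offset, hook
     * a private completion callback, and forward it to the lower
     * level.  The caller's own BIO above this one is left untouched.
     */
    nbio = push_bio(bio);
    nbio->bio_offset = device_offset;               /* translated location */
    nbio->bio_done   = myfs_strategy_done;          /* illustrative callback */
    vn_strategy(devvp, nbio);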

Revision 1.21: download - view: text, markup, annotated - select for diffs
Sat Nov 19 17:19:48 2005 UTC (8 years, 9 months ago) by dillon
Branches: MAIN
Branch point for: DragonFly_RELEASE_1_4
Diff to: previous 1.20: preferred, unified
Changes since revision 1.20: +4 -2 lines
Convert the lockmgr interlock from a token to a spinlock.  This fixes a
problem on SMP boxes where the MP lock would unexpectedly lose atomicity for
a short period of time due to token acquisition.

Add a tsleep_interlock() call which takes advantage of tsleep()'s cpu
locality of reference to provide a helper function which allows us to
atomically spin_unlock() and tsleep() in an MP safe manner with only
a critical section.  Basically all it does is set a cpumask bit for the
ident hash index to cause other cpus issuing a wakeup to notify our cpu.
Any actual wakeup occurring during the race period after the spin_unlock
but before the tsleep() call will be delayed by the critical section
until after the tsleep has queued the thread.

Cleanup some unused junk in vm_map.h.

Revision 1.20: download - view: text, markup, annotated - select for diffs
Fri Aug 12 00:17:26 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.19: preferred, unified
Changes since revision 1.19: +1 -1 lines
Move bio_lblkno (logical blockno in a file) field back to its rightful
place, which is in struct buf.  Lower levels have no knowledge of this
little critter.

Suggested-by: 	dillon

Revision 1.19: download - view: text, markup, annotated - select for diffs
Tue Aug 9 05:11:42 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.18: preferred, unified
Changes since revision 1.18: +5 -5 lines
Whitespace cleanup.

Revision 1.18: download - view: text, markup, annotated - select for diffs
Mon Aug 8 16:53:12 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.17: preferred, unified
Changes since revision 1.17: +0 -1 lines
Move the bswlist symbol into vm/vm_pager.c because PBUFs are its only
consumer.

The PBUF abstraction is just a clever hack; this code will be redone
at some point so this measure is temporary.

Revision 1.17: download - view: text, markup, annotated - select for diffs
Mon Aug 8 01:25:31 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.16: preferred, unified
Changes since revision 1.16: +16 -12 lines
BUF/BIO cleanup 7/99:

First attempt at separating low-level information from BUF structure into
the new BIO structure.  The latter will be used to represent the actual
I/O underlying the buffer cache, other subsystems and device drivers.

Other information from the BUF structure will be moved eventually once
its place in the grand scheme is determined.  For now, preprocessor macros
have been added to reduce widespread changes; this is a temporary measure
by all means until more of the BIO and BUF API is formalised.

Remove compatibility preprocessor macros in the AAC driver because our
BUF/BIO system is mutating; not to mention they were getting in the way.

NB the name BIO has been used because it's quite appropriate and known
among kernel developers from other operating system groups, be it BSD or
Linux.

This change should not have any operational effect (famous last words).

Reviewed by:	Matthew Dillon <dillon@dragonflybsd.org>

Revision 1.16: download - view: text, markup, annotated - select for diffs
Thu Aug 4 16:28:30 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.15: preferred, unified
Changes since revision 1.15: +3 -5 lines
Put unused flag space definitions back to their original position in
order to avoid confusion.

Requested-by: 	Matt

Revision 1.15: download - view: text, markup, annotated - select for diffs
Wed Aug 3 16:55:11 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.14: preferred, unified
Changes since revision 1.14: +1 -1 lines
Bring name of an unused flag field in line with the rest.

Revision 1.14: download - view: text, markup, annotated - select for diffs
Wed Aug 3 16:36:33 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.13: preferred, unified
Changes since revision 1.13: +5 -3 lines
BUF/BIO cleanup 3/99:

Retire the B_CALL flag in favour of checking the bp->b_iodone pointer
directly, thus simplifying the BUF interface even more.

Move scattered B_UNUSED* flag space definitions into one place, that
is, below the rest of the definitions.

Revision 1.13: download - view: text, markup, annotated - select for diffs
Wed Aug 3 04:59:53 2005 UTC (9 years ago) by hmp
Branches: MAIN
Diff to: previous 1.12: preferred, unified
Changes since revision 1.12: +0 -13 lines
BUF/BIO cleanup 2/99:

Localise buffer queue information into kern/vfs_bio.c; it should not be
messed with outside of the named file.  Convert the QUEUE_* #defines
into enum bufq_type and prefix the names with 'B'.

Move vfs_bufstats() from kern/vfs_syscalls.c into kern/vfs_bio.c since
that's where it should really belong, at least until its use is cleaned up.

Move bufqueues extern from sys/buf.h into kern/vfs_bio.c as it shouldn't
be messed with by anything else.  It was only sitting in sys/buf.h
because of vfs_bufstats().

Note the change to initpbuf() is acceptable since they are a hack anyway,
not to mention that the said function and friends should probably reside
in kern/vfs_bio.c.

Revision 1.12: download - view: text, markup, annotated - select for diffs
Fri Apr 15 19:08:13 2005 UTC (9 years, 4 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_Stable
Diff to: previous 1.11: preferred, unified
Changes since revision 1.11: +9 -3 lines
Implement Red-Black trees for the vnode clean/dirty buffer lists.

Implement ranged fsyncs and adjust the syncer to use the new capability.
This capability will also soon be used to replace the write_behind
heuristic.  Rewrite the fsync code for all VFSs to use the new APIs
(generally simplifying them).

Get rid of B_WRITEINPROG, it is no longer useful or needed.
Get rid of B_SCANNED, it is no longer useful or needed.

Rewrite the NFS 2-phase commit protocol to take advantage of the new
Red-Black tree topology.

Add RB_SCAN() for callback-scanning of Red-Black trees.  Give RB_SCAN
the ability to track the 'next' scan node and automatically fix it up
if the callback directly or indirectly or through blocking indirectly
deletes nodes in the tree while the scan is in progress.

Remove most related loop restart conditions, they are no longer necessary.

Disable filesystem background bitmap writes.  This really needs to be
solved a different way and the concept does not work well with red-black
trees.
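
A sketch of the callback-scan API on the new per-vnode trees.  The tree and
field names follow the buffer-cache conversion described here; the callback
return-value convention (0 to continue) is assumed:

    /* count the dirty buffers attached to a vnode */
    static int
    count_dirty_bp(struct buf *bp, void *data)
    {
            ++*(int *)data;
            return (0);             /* keep scanning */
    }

    static int
    count_dirty_buffers(struct vnode *vp)
    {
            int count = 0;

            RB_SCAN(buf_rb_tree, &vp->v_rbdirty_tree, NULL,
                    count_dirty_bp, &count);
            return (count);
    }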

Revision 1.11: download - view: text, markup, annotated - select for diffs
Sat Jul 17 01:45:37 2004 UTC (10 years, 1 month ago) by hmp
Branches: MAIN
CVS tags: DragonFly_Snap29Sep2004, DragonFly_Snap13Sep2004, DragonFly_RELEASE_1_2_Slip, DragonFly_RELEASE_1_2
Diff to: previous 1.10: preferred, unified
Changes since revision 1.10: +0 -2 lines
BUF/BIO stage 2:

	o Remove remaining source references to b_caller2 and b_driver2
	  field members of the BUF structure.

	o Remove b_caller2 and b_driver2 field members from the BUF
	  structure.

Discussed-with:  	Matthew Dillon <dillon@apollo.backplane.com>

Revision 1.10: download - view: text, markup, annotated - select for diffs
Fri Jul 16 02:01:17 2004 UTC (10 years, 1 month ago) by hmp
Branches: MAIN
Diff to: previous 1.9: preferred, unified
Changes since revision 1.9: +1 -1 lines
Annotate the b_xio field member of the BUF structure.

Revision 1.9: download - view: text, markup, annotated - select for diffs
Wed Jul 14 03:10:17 2004 UTC (10 years, 1 month ago) by hmp
Branches: MAIN
Diff to: previous 1.8: preferred, unified
Changes since revision 1.8: +6 -2 lines
BUF/BIO work, for removing the requirement of KVA mappings for I/O
requests.

Stage 1 of 8:

	o Replace the b_pages member of the BUF structure with an embedded
	  XIO (b_xio).  The XIO will be used for managing the BUF's page
	  lists.

	o Initialize the XIO at two main (only) points: 1) the pbuf code,
	  which is used by the NFS code to create a temporary buffer; and
	  2) bufinit(9), which is used by the rest of the BUF/BIO consumers.

Discussed-with: 	Matthew Dillon <dillon@apollo.backplane.com>,

Revision 1.8: download - view: text, markup, annotated - select for diffs
Mon Feb 16 19:09:31 2004 UTC (10 years, 6 months ago) by dillon
Branches: MAIN
CVS tags: DragonFly_1_0_REL, DragonFly_1_0_RC1, DragonFly_1_0A_REL
Diff to: previous 1.7: preferred, unified
Changes since revision 1.7: +1 -1 lines
Make buftimetoken an extern so it is not declared as a common variable.
Modules were compiling with their own local copy of buftimetoken rather
than linking against the kernel's buftimetoken, causing modules to crash.

Reported-by: Adam K Kirchhoff <adamk@voicenet.com>

Revision 1.7: download - view: text, markup, annotated - select for diffs
Wed Aug 20 07:31:21 2003 UTC (11 years ago) by rob
Branches: MAIN
Diff to: previous 1.6: preferred, unified
Changes since revision 1.6: +57 -57 lines
__P() != wanted; begin removal.  In order to preserve white space this needs
to be done by hand, as I accidentally killed a source tree that I had gotten
this far on.  I'm committing this now; LINT and GENERIC both build with
these changes, and there are many more to come.
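
The mechanical shape of the change, shown on a made-up prototype (whitespace
is preserved, hence the hand editing):

    /* before: K&R-compatibility macro wrapping the argument list */
    void    example_done __P((struct buf *, int));

    /* after: plain ANSI prototype, column layout untouched */
    void    example_done (struct buf *, int);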

Revision 1.6: download - view: text, markup, annotated - select for diffs
Tue Jul 22 17:03:34 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.5: preferred, unified
Changes since revision 1.5: +8 -1 lines
DEV messaging stage 2/4: In this stage all DEV commands are now being
funneled through the message port for action by the port's beginmsg function.
CONSOLE and DISK device shims replace the port with their own and then
forward to the original.  FB (Frame Buffer) shims supposedly do the same
thing but I haven't been able to test it.   I don't expect instability
in mainline code but there might be easy-to-fix problems, and some drivers still need
to be converted.  See primarily: kern/kern_device.c (new dev_*() functions and
inherits cdevsw code from kern/kern_conf.c), sys/device.h, and kern/subr_disk.c
for the high points.

In this stage all DEV messages are still acted upon synchronously in the
context of the caller.  We cannot create a separate handler thread until
the copyin's (primarily in ioctl functions) are made thread-aware.

Note that the messaging shims are going to look rather messy in these early
days but as more subsystems are converted over we will begin to use
pre-initialized messages and message forwarding to avoid having to constantly
rebuild messages prior to use.

Note that DEV itself is a mess owing to its 4.x roots and will be cleaned
up in subsequent passes.  e.g. the way sub-devices inherit the main device's
cdevsw was always a bad hack and it still is, and several functions
(mmap, kqfilter, psize, poll) return results rather than error codes, which
will be fixed since now we have a message to store the result in :-)

Revision 1.5: download - view: text, markup, annotated - select for diffs
Sun Jul 6 21:23:54 2003 UTC (11 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.4: preferred, unified
Changes since revision 1.4: +2 -2 lines
MP Implementation 1/2: Get the APIC code working again, sweetly integrate the
MP lock into the LWKT scheduler, replace the old simplelock code with
tokens or spin locks as appropriate.  In particular, the vnode interlock
(and most other interlocks) are now tokens.  Also clean up a few curproc/cred
sequences that are no longer needed.

The APs are left in degenerate state with non IPI interrupts disabled as
additional LWKT work must be done before we can really make use of them,
and FAST interrupts are not managed by the MP lock yet.  The main thing
for this stage was to get the system working with an APIC again.

buildworld tested on UP and 2xCPU/MP (Dell 2550)

Revision 1.4: download - view: text, markup, annotated - select for diffs
Thu Jun 26 05:55:19 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: PRE_MP
Diff to: previous 1.3: preferred, unified
Changes since revision 1.3: +4 -7 lines
proc->thread stage 5:  BUF/VFS clearance!  Remove the ucred argument from
vop_close, vop_getattr, vop_fsync, and vop_createvobject.  These VOPs can
be called from multiple contexts so the cred is fairly useless, and UFS
ignores it anyway.  For filesystems (like NFS) that sometimes need a cred
we use proc0.p_ucred for now.

This removal also removed the need for a 'proc' reference in the related
VFS procedures, which greatly helps our proc->thread conversion.

bp->b_wcred and bp->b_rcred have also been removed, and for the same reason.
It makes no sense to have a particular cred when multiple users can
access a file.  This may create issues with certain types of NFS mounts
but if it does we will solve them in a way that doesn't pollute the
struct buf.

Revision 1.3: download - view: text, markup, annotated - select for diffs
Thu Jun 19 01:55:07 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.2: preferred, unified
Changes since revision 1.2: +2 -152 lines
thread stage 5: Separate the inline functions out of sys/buf.h, creating
sys/buf2.h (A methodology that will continue as time passes).  This solves
inline vs struct ordering problems.

Do a major cleanup of the globaldata access methodology.  Create a
gcc-cacheable 'mycpu' macro & inline to access per-cpu data.  Atomicity is not
required because we will never change cpus out from under a thread, even if
it gets preempted by an interrupt thread, because we want to be able to
implement per-cpu caches that do not require locked bus cycles or special
instructions.

Revision 1.2: download - view: text, markup, annotated - select for diffs
Tue Jun 17 04:28:58 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.1: preferred, unified
Changes since revision 1.1: +1 -0 lines
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

Revision 1.1: download - view: text, markup, annotated - select for diffs
Tue Jun 17 02:55:49 2003 UTC (11 years, 2 months ago) by dillon
Branches: MAIN
CVS tags: FREEBSD_4_FORK
import from FreeBSD RELENG_4 1.88.2.10
