File:  [DragonFly] / src / share / man / man4 / vinum.4
Revision 1.2
Tue Jun 17 04:36:59 2003 UTC by dillon
Branches: MAIN
CVS tags: HEAD
Add the DragonFly cvs id and perform general cleanups on cvs/rcs/sccs ids.  Most
ids have been removed from !lint sections and moved into comment sections.

    1: .\"  Hey, Emacs, edit this file in -*- nroff-fill -*- mode
    2: .\"-
    3: .\" Copyright (c) 1997, 1998
    4: .\"	Nan Yang Computer Services Limited.  All rights reserved.
    5: .\"
    6: .\"  This software is distributed under the so-called ``Berkeley
    7: .\"  License'':
    8: .\"
    9: .\" Redistribution and use in source and binary forms, with or without
   10: .\" modification, are permitted provided that the following conditions
   11: .\" are met:
   12: .\" 1. Redistributions of source code must retain the above copyright
   13: .\"    notice, this list of conditions and the following disclaimer.
   14: .\" 2. Redistributions in binary form must reproduce the above copyright
   15: .\"    notice, this list of conditions and the following disclaimer in the
   16: .\"    documentation and/or other materials provided with the distribution.
   17: .\" 3. All advertising materials mentioning features or use of this software
   18: .\"    must display the following acknowledgement:
   19: .\"	This product includes software developed by Nan Yang Computer
   20: .\"      Services Limited.
   21: .\" 4. Neither the name of the Company nor the names of its contributors
   22: .\"    may be used to endorse or promote products derived from this software
   23: .\"    without specific prior written permission.
   24: .\"
   25: .\" This software is provided ``as is'', and any express or implied
   26: .\" warranties, including, but not limited to, the implied warranties of
   27: .\" merchantability and fitness for a particular purpose are disclaimed.
   28: .\" In no event shall the company or contributors be liable for any
   29: .\" direct, indirect, incidental, special, exemplary, or consequential
   30: .\" damages (including, but not limited to, procurement of substitute
   31: .\" goods or services; loss of use, data, or profits; or business
   32: .\" interruption) however caused and on any theory of liability, whether
   33: .\" in contract, strict liability, or tort (including negligence or
   34: .\" otherwise) arising in any way out of the use of this software, even if
   35: .\" advised of the possibility of such damage.
   36: .\"
   37: .\" $FreeBSD: src/share/man/man4/vinum.4,v 1.22.2.9 2002/04/22 08:19:35 kuriyama Exp $
   38: .\" $DragonFly: src/share/man/man4/vinum.4,v 1.2 2003/06/17 04:36:59 dillon Exp $
   39: .\"
   40: .Dd October 5, 1999
   41: .Dt vinum 4
   42: .Os
   43: .Sh NAME
   44: .Nm vinum
   45: .Nd Logical Volume Manager
   46: .Sh SYNOPSIS
   47: .Cd "kldload vinum"
   48: .Cd "kldload Vinum"
   49: .Sh DESCRIPTION
   50: .Nm
   51: is a logical volume manager inspired by, but not derived from, the Veritas
   52: Volume Manager.  It provides the following features:
   53: .Bl -bullet
   54: .It
   55: It provides device-independent logical disks, called \fIvolumes\fP.  Volumes are
   56: not restricted to the size of any disk on the system.
   57: .It
   58: The volumes consist of one or more \fIplexes\fP, each of which contains the
   59: entire address space of a volume.  This represents an implementation of RAID-1
   60: (mirroring).  Multiple plexes can also be used for:
   61: .\" XXX What about sparse plexes?  Do we want them?
   62: .if t .sp
   63: .Bl -bullet
   64: .It
   65: Increased read throughput.
   66: .Nm
   67: will read data from the least active disk, so if a volume has plexes on multiple
   68: disks, more data can be read in parallel.
   69: .Nm
   70: reads data from only one plex, but it writes data to all plexes.
   71: .It
   72: Increased reliability.  By storing plexes on different disks, data will remain
   73: available even if one of the plexes becomes unavailable.  In comparison with a
   74: RAID-5 plex (see below), using multiple plexes requires more storage space, but
   75: gives better performance, particularly in the case of a drive failure.
   76: .It
   77: Additional plexes can be used for on-line data reorganization.  By attaching an
   78: additional plex and subsequently detaching one of the older plexes, data can be
   79: moved on-line without compromising access.
   80: .It
   81: An additional plex can be used to obtain a consistent dump of a file system.  By
   82: attaching an additional plex and detaching at a specific time, the detached plex
   83: becomes an accurate snapshot of the file system at the time of detachment.
   84: .\" Make sure to flush!
   85: .El
   86: .It
   87: Each plex consists of one or more logical disk slices, called \fIsubdisks\fP.
   88: Subdisks are defined as a contiguous block of physical disk storage.  A plex may
   89: consist of any reasonable number of subdisks (in other words, the real limit is
   90: not the number, but other factors, such as memory and performance, associated
   91: with maintaining a large number of subdisks).
   92: .It
   93: A number of mappings between subdisks and plexes are available:
   94: .Bl -bullet
   95: .It
   96: \fIConcatenated plexes\fP\| consist of one or more subdisks, each of which
   97: is mapped to a contiguous part of the plex address space.
   98: .It
   99: \fIStriped plexes\fP\| consist of two or more subdisks of equal size.  The plex
  100: address space is mapped in \fIstripes\fP, integral fractions of the subdisk
  101: size.  Consecutive plex address space is mapped to stripes in each subdisk in
  102: .if n turn.
  103: .if t \{\
  104: turn.
  105: .ig
  106: .\" FIXME
  107: .br
  108: .ne 1.5i
  109: .PS
  110: move right 2i
  111: down
  112: SD0: box
  113: SD1: box
  114: SD2: box
  115: 
  116: "plex 0" at SD0.n+(0,.2)
  117: "subdisk 0" rjust at SD0.w-(.2,0)
  118: "subdisk 1" rjust at SD1.w-(.2,0)
  119: "subdisk 2" rjust at SD2.w-(.2,0)
  120: .PE
  121: ..
  122: .\}
  123: The subdisks of a striped plex must all be the same size.
  124: .It
  125: \fIRAID-5 plexes\fP\| require at least three equal-sized subdisks.  They
  126: resemble striped plexes, except that in each stripe, one subdisk stores parity
  127: information.  This subdisk changes in each stripe: in the first stripe, it is the
  128: first subdisk, in the second it is the second subdisk, etc.  In the event of a
  129: single disk failure,
  130: .Nm
  131: will recover the data based on the information stored on the remaining subdisks.
  132: This mapping is particularly suited to read-intensive access.  The subdisks of a
  133: RAID-5 plex must all be the same size.
  134: .\" Make sure to flush!
  135: .El
  136: .It
  137: .Nm Drives
  138: are the lowest level of the storage hierarchy.  They represent disk special
  139: devices.
  140: .It
  141: .Nm
  142: offers automatic startup.  Unlike UNIX file systems,
  143: .Nm
  144: volumes contain all the configuration information needed to ensure that they are
  145: started correctly when the subsystem is enabled.  This is also a significant
  146: advantage over the Veritas\(tm File System.  This feature regards the presence
  147: of the volumes.  It does not mean that the volumes will be mounted
  148: automatically, since the standard startup procedures with
  149: .Pa /etc/fstab
  150: perform this function.
  151: .El
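.Pp
The striped and RAID-5 mappings described above can be sketched in a few lines
of Python.  This is an illustration only and is not part of
.Nm :
the function names are hypothetical, offsets are in arbitrary blocks, and the
RAID-5 function models only the parity rotation stated above.  It assumes that
the rotation wraps around after the last subdisk, and it does not model where
the data blocks of a RAID-5 stripe are placed.
.Bd -literal -offset indent
```python
def striped_location(plex_offset, stripe_size, nsubdisks):
    """Map a plex offset to (subdisk index, offset within that subdisk).

    Consecutive stripes of the plex address space go to each subdisk in turn.
    """
    stripe = plex_offset // stripe_size   # stripe number, counted across the plex
    within = plex_offset % stripe_size    # position inside that stripe
    subdisk = stripe % nsubdisks          # subdisks are used in turn
    row = stripe // nsubdisks             # stripes already placed on that subdisk
    return subdisk, row * stripe_size + within


def raid5_parity_subdisk(stripe_row, nsubdisks):
    """Return the subdisk assumed to hold parity for one stripe of a RAID-5 plex."""
    if nsubdisks < 3:
        raise ValueError("a RAID-5 plex needs at least three subdisks")
    # Assumption: the rotation wraps around after the last subdisk.
    return stripe_row % nsubdisks


if __name__ == "__main__":
    # Four subdisks with 128-block stripes.
    for offset in (0, 128, 256, 384, 512, 600):
        print(offset, striped_location(offset, 128, 4))
    print([raid5_parity_subdisk(row, 3) for row in range(5)])
```
.Ed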
  152: .Sh KERNEL CONFIGURATION
  153: .Nm
  154: is currently supplied as a kernel loadable module (kld), and does not require
  155: configuration.  As with other klds, it is absolutely necessary to match the kld
  156: to the version of the operating system.  Failure to do so will cause
  157: .Nm
  158: to issue an error message and terminate.
  159: .Pp
  160: It is possible to configure
  161: .Nm
  162: in the kernel, but this is not recommended.  To do so, add this line to the
  163: kernel configuration file:
  164: .Bd -literal -offset indent
  165: pseudo-device	vinum
  166: .Ed
  167: .Pp
  168: .Ss DEBUG OPTIONS
  169: The current version of
  170: .Nm ,
  171: both the kernel module and the user program
  172: .Xr vinum 8 ,
  173: include significant debugging support.  It is not recommended to remove
  174: this support at the moment, but if you do you must remove it from both the
  175: kernel and the user components.  To do this, edit the files
  176: .Pa /usr/src/sbin/vinum/Makefile
  177: and
  178: .Pa /usr/src/sys/modules/vinum/Makefile
  179: and edit the CFLAGS variable to remove the -DVINUMDEBUG option.  If you have
  180: configured
  181: .Nm
  182: into the kernel, either specify the line
  183: .Bd -literal -offset indent
  184: options		VINUMDEBUG
  185: .Ed
  186: .Pp
  187: in the kernel configuration file or remove the -DVINUMDEBUG option from
  188: .Pa /usr/src/sbin/vinum/Makefile
  189: as described above.
  190: .Pp
  191: If the VINUMDEBUG variables do not match,
  192: .Xr vinum 8
  193: will fail with a message
  194: explaining the problem and what to do to correct it.
  195: .Pp
  196: .Nm
  197: was previously available in two versions: a freely available version which did
  198: not contain RAID-5 functionality, and a full version including RAID-5
  199: functionality, which was available only from Cybernet Systems Inc.  The present
  200: version of
  201: .Nm
  202: includes the RAID-5 functionality.
  203: .Sh RUNNING VINUM
  204: .Nm
  205: is part of the base
  206: .Fx
  207: system.  It does not require installation.
  208: To start it, start the
  209: .Nm
  210: program, which will load the kld if it is not already present.
  211: Before using
  212: .Nm ,
  213: it must be configured.  See
  214: .Xr vinum 8
  215: for information on how to create a
  216: .Nm
  217: configuration.
  218: .Pp
  219: Normally, you start a configured version of
  220: .Nm
  221: at boot time.  Set the variable
  222: .Ar start_vinum
  223: in
  224: .Pa /etc/rc.conf
  225: to
  226: .Ar YES
  227: to start
  228: .Nm
  229: at boot time.
  230: .Pp
  231: If
  232: .Nm
  233: is loaded as a kld (the recommended way), the
  234: .Nm
  235: .Ar stop
  236: command will unload it.  You can also do this with the
  237: .Nm kldunload
  238: command.
  239: .Pp
  240: The kld can only be unloaded when idle, in other words when no volumes are
  241: mounted and no other instances of the
  242: .Nm
  243: program are active.  Unloading the kld does not harm the data in the volumes.
  244: .Ss CONFIGURING AND STARTING OBJECTS
  245: Use the
  246: .Xr vinum 8
  247: utility to configure and start
  248: .Nm
  249: objects.
  250: .Sh IOCTL CALLS
  251: .Pa ioctl
  252: calls are intended for the use of the
  253: .Nm
  254: configuration program only.  They are described in the header file
  255: .Pa /sys/sys/vinumio.h .
  256: .Ss DISK LABELS
  257: Conventional disk special devices have a
  258: .Em disk label
  259: in the second sector of the device.  See
  260: .Xr disklabel 5
  261: for more details.  This disk label describes the layout of the partitions within
  262: the device.
  263: .Nm
  264: does not subdivide volumes, so volumes do not contain a physical disk label.
  265: For convenience,
  266: .Nm
  267: implements the ioctl calls DIOCGDINFO (get disk label), DIOCGPART (get partition
  268: information), DIOCWDINFO (write partition information) and DIOCSDINFO (set
  269: partition information).  DIOCGDINFO and DIOCGPART refer to an internal
  270: representation of the disk label which is not present on the volume.  As a
  271: result, the
  272: .Fl r
  273: option of
  274: .Xr disklabel 8 ,
  275: which reads the
  276: .if t ``raw disk'',
  277: .if n "raw disk",
  278: will fail.
  279: .Pp
  280: In general,
  281: .Xr disklabel 8
  282: serves no useful purpose on a vinum volume.  If you run it, it will show you
  283: three partitions, a, b and c, all the same except for the fstype, for example:
  284: .br
  285: .ne 1i
  286: .Bd -literal
  287: 3 partitions:
  288: #        size   offset    fstype   [fsize bsize bps/cpg]
  289:   a:     2048        0    4.2BSD     1024  8192     0   # (Cyl.    0 - 0)
  290:   b:     2048        0      swap                        # (Cyl.    0 - 0)
  291:   c:     2048        0    unused        0     0         # (Cyl.    0 - 0)
  292: .Ed
  293: .Pp
  294: .Nm
  295: ignores the DIOCWDINFO and DIOCSDINFO ioctls, since there is nothing to change.
  296: As a result, any attempt to modify the disk label will be silently ignored.
  297: .Sh MAKING FILE SYSTEMS
  298: Since
  299: .Nm
  300: volumes do not contain partitions, the names do not need to conform to the
  301: standard rules for naming disk partitions.  For a physical disk partition, the
  302: last letter of the device name specifies the partition identifier (a to h).
  303: .Nm
  304: volumes need not conform to this convention, but if they do not,
  305: .Nm newfs
  306: will complain that it cannot determine the partition.  To solve this problem,
  307: use the
  308: .Fl v
  309: flag to
  310: .Nm newfs .
  311: For example, if you have a volume
  312: .Pa concat ,
  313: use the following command to create a ufs file system on it:
  314: .Pp
  315: .Bd -literal
  316:   # newfs -v /dev/vinum/concat
  317: .Ed
  318: .Pp
  319: .Sh OBJECT NAMING
  320: .Nm
  321: assigns default names to plexes and subdisks, although they may be overridden.
  322: We do not recommend overriding the default names.  Experience with the
  323: .if t Veritas\(tm
  324: .if n Veritas(tm)
  325: volume manager, which allows arbitrary naming of objects, has shown that this
  326: flexibility does not bring a significant advantage, and it can cause confusion.
  327: .sp
  328: Names may contain any non-blank character, but it is recommended to restrict
  329: them to letters, digits and the underscore character.  The names of volumes,
  330: plexes and subdisks may be up to 64 characters long, and the names of drives may
  331: be up to 32 characters long.  When choosing volume and plex names, bear in mind
  332: that automatically generated plex and subdisk names are longer than the name
  333: from which they are derived.
  334: .Bl -bullet
  335: .It
  336: When
  337: .Xr vinum 8
  338: creates or deletes objects, it creates a directory
  339: .Pa /dev/vinum ,
  340: in which it makes device entries for each volume.  It also creates the
  341: subdirectories
  342: .Pa /dev/vinum/plex
  343: and
  344: .Pa /dev/vinum/sd ,
  345: in which it stores device entries for the plexes and subdisks.  In addition, it
  346: creates two more directories,
  347: .Pa /dev/vinum/vol
  348: and
  349: .Pa /dev/vinum/drive ,
  350: in which it stores hierarchical information for volumes and drives.
  351: .It
  352: In addition,
  353: .Nm
  354: creates three super-devices,
  355: .Pa /dev/vinum/control ,
  356: .Pa /dev/vinum/Control
  357: and
  358: .Pa /dev/vinum/controld .
  359: .Pa /dev/vinum/control
  360: is used by
  361: .Xr vinum 8
  362: when it has been compiled without the VINUMDEBUG option,
  363: .Pa /dev/vinum/Control
  364: is used by
  365: .Xr vinum 8
  366: when it has been compiled with the VINUMDEBUG option,
  367: and
  368: .Pa /dev/vinum/controld
  369: is used by the
  370: .Nm
  371: daemon.  The two control devices for
  372: .Xr vinum 8
  373: are used to synchronize the debug status of kernel and user modules.
  374: .It
  375: Unlike
  376: .Nm UNIX
  377: drives,
  378: .Nm
  379: volumes are not subdivided into partitions, and thus do not contain a disk
  380: label.  Unfortunately, this confuses a number of utilities, notably
  381: .Nm newfs ,
  382: which normally tries to interpret the last letter of a
  383: .Nm
  384: volume name as a partition identifier.  If you use a volume name which does not
  385: end in the letters
  386: .Ar a
  387: to
  388: .Ar c ,
  389: you must use the
  390: .Fl v
  391: flag to
  392: .Nm newfs
  393: in order to tell it to ignore this convention.
  394: .\"
  395: .It
  396: Plexes do not need to be assigned explicit names.  By default, a plex name is
  397: the name of the volume followed by the letters \f(CW.p\fR and the number of the
  398: plex.  For example, the plexes of volume
  399: .Ar vol3
  400: are called
  401: .Ar vol3.p0 ,
  402: .Ar vol3.p1
  403: and so on.  These names can be overridden, but it is not recommended.
  404: .br
  405: .It
  406: Like plexes, subdisks are assigned names automatically, and explicit naming is
  407: discouraged.  A subdisk name is the name of the plex followed by the letters
  408: \f(CW\&.s\fR and a number identifying the subdisk.  For example, the subdisks of
  409: plex
  410: .Ar vol3.p0
  411: are called
  412: .Ar vol3.p0.s0 ,
  413: .Ar vol3.p0.s1
  414: and so on.
  415: .br
  416: .It
  417: By contrast,
  418: .Nm drives
  419: must be named.  This makes it possible to move a drive to a different location
  420: and still recognize it automatically.  Drive names may be up to 32 characters
  421: long.
  422: .El
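.Pp
The default naming rules above can be expressed as a short Python sketch.  It
is an illustration under stated assumptions, not code from
.Nm :
the helper names and the constant are hypothetical, and the length check simply
applies the 64 character limit quoted earlier in this section.
.Bd -literal -offset indent
```python
MAX_OBJECT_NAME = 64  # limit for volume, plex and subdisk names quoted in the text


def default_plex_name(volume, plex_number):
    """Default plex name: the volume name, then .p and the plex number."""
    return "{}.p{}".format(volume, plex_number)


def default_subdisk_name(plex, subdisk_number):
    """Default subdisk name: the plex name, then .s and the subdisk number."""
    return "{}.s{}".format(plex, subdisk_number)


def fits_name_limit(name, limit=MAX_OBJECT_NAME):
    """Check a generated name against the length limit."""
    return len(name) <= limit


if __name__ == "__main__":
    plex = default_plex_name("vol3", 0)
    subdisk = default_subdisk_name(plex, 1)
    print(plex, subdisk, fits_name_limit(subdisk))
```
.Ed
.Pp
Because generated names grow by a suffix at each level, a volume name close to
the limit can yield plex or subdisk names that exceed it.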
  423: .Pp
  424: EXAMPLE
  425: .Pp
  426: Assume the
  427: .Nm
  428: objects described in the section CONFIGURATION FILE in
  429: .Xr vinum 8 .
  430: The directory
  431: .Pa /dev/vinum
  432: looks like:
  433: .Bd -literal -offset indent
  434: # ls -lR /dev/vinum
  435: total 5
  436: crwxr-xr--  1 root  wheel   91,   2 Mar 30 16:08 concat
  437: crwx------  1 root  wheel   91, 0x40000000 Mar 30 16:08 control
  438: crwx------  1 root  wheel   91, 0x40000001 Mar 30 16:08 controld
  439: drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 drive
  440: drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 plex
  441: drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 rvol
  442: drwxrwxrwx  2 root  wheel       512 Mar 30 16:08 sd
  443: crwxr-xr--  1 root  wheel   91,   3 Mar 30 16:08 strcon
  444: crwxr-xr--  1 root  wheel   91,   1 Mar 30 16:08 stripe
  445: crwxr-xr--  1 root  wheel   91,   0 Mar 30 16:08 tinyvol
  446: drwxrwxrwx  7 root  wheel       512 Mar 30 16:08 vol
  447: crwxr-xr--  1 root  wheel   91,   4 Mar 30 16:08 vol5
  448: 
  449: /dev/vinum/drive:
  450: total 0
  451: crw-r-----  1 root  operator    4,  15 Oct 21 16:51 drive2
  452: crw-r-----  1 root  operator    4,  31 Oct 21 16:51 drive4
  453: 
  454: /dev/vinum/plex:
  455: total 0
  456: crwxr-xr--  1 root  wheel   91, 0x10000002 Mar 30 16:08 concat.p0
  457: crwxr-xr--  1 root  wheel   91, 0x10010002 Mar 30 16:08 concat.p1
  458: crwxr-xr--  1 root  wheel   91, 0x10000003 Mar 30 16:08 strcon.p0
  459: crwxr-xr--  1 root  wheel   91, 0x10010003 Mar 30 16:08 strcon.p1
  460: crwxr-xr--  1 root  wheel   91, 0x10000001 Mar 30 16:08 stripe.p0
  461: crwxr-xr--  1 root  wheel   91, 0x10000000 Mar 30 16:08 tinyvol.p0
  462: crwxr-xr--  1 root  wheel   91, 0x10000004 Mar 30 16:08 vol5.p0
  463: crwxr-xr--  1 root  wheel   91, 0x10010004 Mar 30 16:08 vol5.p1
  464: 
  465: /dev/vinum/sd:
  466: total 0
  467: crwxr-xr--  1 root  wheel   91, 0x20000002 Mar 30 16:08 concat.p0.s0
  468: crwxr-xr--  1 root  wheel   91, 0x20100002 Mar 30 16:08 concat.p0.s1
  469: crwxr-xr--  1 root  wheel   91, 0x20010002 Mar 30 16:08 concat.p1.s0
  470: crwxr-xr--  1 root  wheel   91, 0x20000003 Mar 30 16:08 strcon.p0.s0
  471: crwxr-xr--  1 root  wheel   91, 0x20100003 Mar 30 16:08 strcon.p0.s1
  472: crwxr-xr--  1 root  wheel   91, 0x20010003 Mar 30 16:08 strcon.p1.s0
  473: crwxr-xr--  1 root  wheel   91, 0x20110003 Mar 30 16:08 strcon.p1.s1
  474: crwxr-xr--  1 root  wheel   91, 0x20000001 Mar 30 16:08 stripe.p0.s0
  475: crwxr-xr--  1 root  wheel   91, 0x20100001 Mar 30 16:08 stripe.p0.s1
  476: crwxr-xr--  1 root  wheel   91, 0x20000000 Mar 30 16:08 tinyvol.p0.s0
  477: crwxr-xr--  1 root  wheel   91, 0x20100000 Mar 30 16:08 tinyvol.p0.s1
  478: crwxr-xr--  1 root  wheel   91, 0x20000004 Mar 30 16:08 vol5.p0.s0
  479: crwxr-xr--  1 root  wheel   91, 0x20100004 Mar 30 16:08 vol5.p0.s1
  480: crwxr-xr--  1 root  wheel   91, 0x20010004 Mar 30 16:08 vol5.p1.s0
  481: crwxr-xr--  1 root  wheel   91, 0x20110004 Mar 30 16:08 vol5.p1.s1
  482: 
  483: /dev/vinum/vol:
  484: total 5
  485: crwxr-xr--  1 root  wheel   91,   2 Mar 30 16:08 concat
  486: drwxr-xr-x  4 root  wheel       512 Mar 30 16:08 concat.plex
  487: crwxr-xr--  1 root  wheel   91,   3 Mar 30 16:08 strcon
  488: drwxr-xr-x  4 root  wheel       512 Mar 30 16:08 strcon.plex
  489: crwxr-xr--  1 root  wheel   91,   1 Mar 30 16:08 stripe
  490: drwxr-xr-x  3 root  wheel       512 Mar 30 16:08 stripe.plex
  491: crwxr-xr--  1 root  wheel   91,   0 Mar 30 16:08 tinyvol
  492: drwxr-xr-x  3 root  wheel       512 Mar 30 16:08 tinyvol.plex
  493: crwxr-xr--  1 root  wheel   91,   4 Mar 30 16:08 vol5
  494: drwxr-xr-x  4 root  wheel       512 Mar 30 16:08 vol5.plex
  495: 
  496: /dev/vinum/vol/concat.plex:
  497: total 2
  498: crwxr-xr--  1 root  wheel   91, 0x10000002 Mar 30 16:08 concat.p0
  499: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 concat.p0.sd
  500: crwxr-xr--  1 root  wheel   91, 0x10010002 Mar 30 16:08 concat.p1
  501: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 concat.p1.sd
  502: 
  503: /dev/vinum/vol/concat.plex/concat.p0.sd:
  504: total 0
  505: crwxr-xr--  1 root  wheel   91, 0x20000002 Mar 30 16:08 concat.p0.s0
  506: crwxr-xr--  1 root  wheel   91, 0x20100002 Mar 30 16:08 concat.p0.s1
  507: 
  508: /dev/vinum/vol/concat.plex/concat.p1.sd:
  509: total 0
  510: crwxr-xr--  1 root  wheel   91, 0x20010002 Mar 30 16:08 concat.p1.s0
  511: 
  512: /dev/vinum/vol/strcon.plex:
  513: total 2
  514: crwxr-xr--  1 root  wheel   91, 0x10000003 Mar 30 16:08 strcon.p0
  515: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 strcon.p0.sd
  516: crwxr-xr--  1 root  wheel   91, 0x10010003 Mar 30 16:08 strcon.p1
  517: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 strcon.p1.sd
  518: 
  519: /dev/vinum/vol/strcon.plex/strcon.p0.sd:
  520: total 0
  521: crwxr-xr--  1 root  wheel   91, 0x20000003 Mar 30 16:08 strcon.p0.s0
  522: crwxr-xr--  1 root  wheel   91, 0x20100003 Mar 30 16:08 strcon.p0.s1
  523: 
  524: /dev/vinum/vol/strcon.plex/strcon.p1.sd:
  525: total 0
  526: crwxr-xr--  1 root  wheel   91, 0x20010003 Mar 30 16:08 strcon.p1.s0
  527: crwxr-xr--  1 root  wheel   91, 0x20110003 Mar 30 16:08 strcon.p1.s1
  528: 
  529: /dev/vinum/vol/stripe.plex:
  530: total 1
  531: crwxr-xr--  1 root  wheel   91, 0x10000001 Mar 30 16:08 stripe.p0
  532: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 stripe.p0.sd
  533: 
  534: /dev/vinum/vol/stripe.plex/stripe.p0.sd:
  535: total 0
  536: crwxr-xr--  1 root  wheel   91, 0x20000001 Mar 30 16:08 stripe.p0.s0
  537: crwxr-xr--  1 root  wheel   91, 0x20100001 Mar 30 16:08 stripe.p0.s1
  538: 
  539: /dev/vinum/vol/tinyvol.plex:
  540: total 1
  541: crwxr-xr--  1 root  wheel   91, 0x10000000 Mar 30 16:08 tinyvol.p0
  542: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 tinyvol.p0.sd
  543: 
  544: /dev/vinum/vol/tinyvol.plex/tinyvol.p0.sd:
  545: total 0
  546: crwxr-xr--  1 root  wheel   91, 0x20000000 Mar 30 16:08 tinyvol.p0.s0
  547: crwxr-xr--  1 root  wheel   91, 0x20100000 Mar 30 16:08 tinyvol.p0.s1
  548: 
  549: /dev/vinum/vol/vol5.plex:
  550: total 2
  551: crwxr-xr--  1 root  wheel   91, 0x10000004 Mar 30 16:08 vol5.p0
  552: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 vol5.p0.sd
  553: crwxr-xr--  1 root  wheel   91, 0x10010004 Mar 30 16:08 vol5.p1
  554: drwxr-xr-x  2 root  wheel       512 Mar 30 16:08 vol5.p1.sd
  555: 
  556: /dev/vinum/vol/vol5.plex/vol5.p0.sd:
  557: total 0
  558: crwxr-xr--  1 root  wheel   91, 0x20000004 Mar 30 16:08 vol5.p0.s0
  559: crwxr-xr--  1 root  wheel   91, 0x20100004 Mar 30 16:08 vol5.p0.s1
  560: 
  561: /dev/vinum/vol/vol5.plex/vol5.p1.sd:
  562: total 0
  563: crwxr-xr--  1 root  wheel   91, 0x20010004 Mar 30 16:08 vol5.p1.s0
  564: crwxr-xr--  1 root  wheel   91, 0x20110004 Mar 30 16:08 vol5.p1.s1
  565: .Ed
  566: .Pp
  567: In the case of unattached plexes and subdisks, the naming is reversed.  Subdisks
  568: are named after the disk on which they are located, and plexes are named after
  569: the subdisk.
  570: .\" XXX
  571: .Nm This mapping is still to be determined.
  572: .Ss OBJECT STATES
  573: .Pp
  574: Each
  575: .Nm
  576: object has a \fIstate\fR associated with it.
  577: .Nm
  578: uses this state to determine the handling of the object.
  579: .Pp
  580: .Ss VOLUME STATES
  581: Volumes may have the following states:
  582: .sp
  583: .Bl -hang -width 14n
  584: .It Li down
  585: The volume is completely inaccessible.
  586: .It Li up
  587: The volume is up and at least partially functional.  Not all plexes may be
  588: available.
  589: .El
  590: .Ss "PLEX STATES"
  591: Plexes may have the following states:
  592: .sp
  593: .ne 1i
  594: .Bl -hang -width 14n
  595: .It Li referenced
  596: A plex entry which has been referenced as part of a volume, but which is
  597: currently not known.
  598: .It Li faulty
  599: A plex which has gone completely down because of I/O errors.
  600: .It Li down
  601: A plex which has been taken down by the administrator.
  602: .It Li initializing
  603: A plex which is being initialized.
  604: .sp
  605: The remaining states represent plexes which are at least partially up.
  606: .It Li corrupt
  607: A plex entry which is at least partially up.  Not all subdisks are available,
  608: and an inconsistency has occurred.  If no other plex is uncorrupted, the volume
  609: is no longer consistent.
  610: .It Li degraded
  611: A RAID-5 plex entry which is accessible, but one subdisk is down, requiring
  612: recovery for many I/O requests.
  613: .It Li flaky
  614: A plex which is really up, but which has a reborn subdisk which we don't
  615: completely trust, and which we don't want to read if we can avoid it.
  616: .It Li up
  617: A plex entry which is completely up.  All subdisks are up.
  618: .El
  619: .sp 2v
  620: .Ss "SUBDISK STATES"
  621: Subdisks can have the following states:
  622: .sp
  623: .ne 1i
  624: .Bl -hang -width 14n
  625: .It Li empty
  626: A subdisk entry which has been created completely.  All fields are correct, and
  627: the disk has been updated, but the data on the disk is not valid.
  628: .It Li referenced
  629: A subdisk entry which has been referenced as part of a plex, but which is
  630: currently not known.
  631: .It Li initializing
  632: A subdisk entry which has been created completely and which is currently being
  633: initialized.
  634: .sp
  635: The following states represent invalid data.
  636: .It Li obsolete
  637: A subdisk entry which has been created completely.  All fields are correct, the
  638: config on disk has been updated, and the data was valid, but since then the
  639: drive has been taken down, and as a result updates have been missed.
  640: .It Li stale
  641: A subdisk entry which has been created completely.  All fields are correct, the
  642: disk has been updated, and the data was valid, but since then the drive has
  643: crashed and updates have been lost.
  644: .sp
  645: The following states represent valid, inaccessible data.
  646: .It Li crashed
  647: A subdisk entry which has been created completely.  All fields are correct, the
  648: disk has been updated, and the data was valid, but since then the drive has gone
  649: down.  No attempt has been made to write to the subdisk since the crash, so the
  650: data is valid.
  651: .It Li down
  652: A subdisk entry which was up, which contained valid data, and which was taken
  653: down by the administrator.  The data is valid.
  654: .It Li reviving
  655: The subdisk is currently in the process of being revived.  We can write but not
  656: read.
  657: .sp
  658: The following states represent accessible subdisks with valid data.
  659: .It Li reborn
  660: A subdisk entry which has been created completely.  All fields are correct, the
  661: disk has been updated, and the data was valid, but since then the drive has gone
  662: down and up again.  No updates were lost, but it is possible that the subdisk
  663: has been damaged.  We won't read from this subdisk if we have a choice.  If this
  664: is the only subdisk which covers this address space in the plex, we set its
  665: state to up under these circumstances, so this status implies that there is
  666: another subdisk to fulfil the request.
  667: .It Li up
  668: A subdisk entry which has been created completely.  All fields are correct, the
  669: disk has been updated, and the data is valid.
  670: .El
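.Pp
The grouping of subdisk states above can be summarized as a lookup table.  The
following Python sketch is an illustration only.  The group labels are taken
from the text, except for the label of the first group, which the text leaves
unnamed and which is an assumption here.  The function names are hypothetical.
.Bd -literal -offset indent
```python
SUBDISK_STATE_GROUPS = {
    # Label assumed for illustration: the text does not name this first group.
    "not yet usable": ("empty", "referenced", "initializing"),
    "invalid data": ("obsolete", "stale"),
    "valid, inaccessible data": ("crashed", "down", "reviving"),
    "accessible, valid data": ("reborn", "up"),
}


def subdisk_group(state):
    """Return the group a subdisk state belongs to."""
    for group, states in SUBDISK_STATE_GROUPS.items():
        if state in states:
            return group
    raise ValueError("unknown subdisk state: " + state)


def preferred_for_read(state):
    """True only for up: a reborn subdisk is avoided when there is a choice."""
    return state == "up"


if __name__ == "__main__":
    for state in ("stale", "reviving", "reborn", "up"):
        print(state, "->", subdisk_group(state), preferred_for_read(state))
```
.Ed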
  671: .sp 2v
  672: .Ss "DRIVE STATES"
  673: Drives can have the following states:
  674: .sp
  675: .ne 1i
  676: .Bl -hang -width 14n
  677: .It Li referenced
  678: At least one subdisk refers to the drive, but it is not currently accessible to
  679: the system.  No device name is known.
  680: .It Li down
  681: The drive is not accessible.
  682: .It Li up
  683: The drive is up and running.
  684: .El
  685: .sp 2v
  686: .Sh BUGS
  687: .Bl -enum
  688: .It
  689: .Nm
  690: is a new product.  Bugs can be expected.  The configuration mechanism is not yet
  691: fully functional.  If you have difficulties, please look at the section
  692: DEBUGGING PROBLEMS WITH VINUM before reporting problems.
  693: .It
  694: Kernels with the
  695: .Nm
  696: pseudo-device appear to work, but are not supported.  If you have trouble with
  697: this configuration, please first replace the kernel with a non-Vinum
  698: kernel and test with the kld module.
  699: .It
  700: Detection of differences between the version of the kernel and the kld is not
  701: yet implemented.
  702: .It
  703: The RAID-5 functionality is new in
  704: .Fx 3.3 .
  705: Some problems have been
  706: reported with
  707: .Nm
  708: in combination with soft updates, but these are not reproducible on all
  709: systems.  If you are planning to use
  710: .Nm
  711: in a production environment, please test carefully.
  712: .El
  713: .Sh DEBUGGING PROBLEMS WITH VINUM
  714: Solving problems with
  715: .Nm
  716: can be a difficult affair.  This section suggests some approaches.
  717: .Ss Configuration problems
  718: .Pp
  719: It is relatively easy (too easy) to run into problems with the
  720: .Nm
  721: configuration.  If you do, the first thing you should do is stop configuration
  722: updates:
  723: .if t .ps -3
  724: .if t .vs -3
  725: .Bd -literal
  726: # \fBvinum setdaemon 4\fP
  727: .Ed
  728: .if t .vs
  729: .if t .ps
  730: .Pp
  731: This will stop updates and any further corruption of the on-disk configuration.
  732: .Pp
  733: Next, look at the on-disk configuration with the
  734: .Nm vinum dumpconfig
  735: command, for example:
  736: .if t .ps -3
  737: .if t .vs -3
  738: .Bd -literal
  739: # \fBvinum dumpconfig\fP
  740: Drive 4:        Device /dev/da3h
  741:                 Created on crash.lemis.com at Sat May 20 16:32:44 2000
  742:                 Config last updated Sat May 20 16:32:56 2000
  743:                 Size:        601052160 bytes (573 MB)
  744: volume obj state up
  745: volume src state up
  746: volume raid state down
  747: volume r state down
  748: volume foo state up
  749: plex name obj.p0 state corrupt org concat vol obj
  750: plex name obj.p1 state corrupt org striped 128b vol obj
  751: plex name src.p0 state corrupt org striped 128b vol src
  752: plex name src.p1 state up org concat vol src
  753: plex name raid.p0 state faulty org disorg vol raid
  754: plex name r.p0 state faulty org disorg vol r
  755: plex name foo.p0 state up org concat vol foo
  756: plex name foo.p1 state faulty org concat vol foo
  757: sd name obj.p0.s0 drive drive2 plex obj.p0 state reborn len 409600b driveoffset 265b plexoffset 0b
  758: sd name obj.p0.s1 drive drive4 plex obj.p0 state up len 409600b driveoffset 265b plexoffset 409600b
  759: sd name obj.p1.s0 drive drive1 plex obj.p1 state up len 204800b driveoffset 265b plexoffset 0b
  760: sd name obj.p1.s1 drive drive2 plex obj.p1 state reborn len 204800b driveoffset 409865b plexoffset 128b
  761: sd name obj.p1.s2 drive drive3 plex obj.p1 state up len 204800b driveoffset 265b plexoffset 256b
  762: sd name obj.p1.s3 drive drive4 plex obj.p1 state up len 204800b driveoffset 409865b plexoffset 384b
  763: .Ed
  764: .if t .vs
  765: .if t .ps
  766: .Pp
  767: The configuration on all disks should be the same.  If this is not the case,
  768: please save the output to a file and report the problem.  There is probably
  769: little that can be done to recover the on-disk configuration, but if you keep a
  770: copy of the files used to create the objects, you should be able to re-create
  771: them.  The
  772: .Cm create
  773: command does not change the subdisk data, so this will not cause data
  774: corruption.  You may need to use the
  775: .Cm resetconfig
  776: command if you have this kind of trouble.
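.Pp
As a rough sketch, the first two steps above could be wrapped in a small
.Xr sh 1
function.
The function name and the default output directory are assumptions made for
this example; only the
.Cm setdaemon 4
and
.Cm dumpconfig
commands come from the text above.

```shell
# Hypothetical helper: freeze configuration updates, then save the
# on-disk configuration to a timestamped file for a later problem report.
# Usage: snapshot_vinum_config [output-directory]
snapshot_vinum_config() {
    outdir=${1:-/var/tmp}               # assumed default location
    out="$outdir/vinum-dumpconfig.$(date +%Y%m%d%H%M%S)"
    # Stop configuration updates first; keep stdout clean for the file name.
    vinum setdaemon 4 >&2 || return 1
    # Save the on-disk configuration exactly as dumpconfig prints it.
    vinum dumpconfig > "$out" || return 1
    echo "$out"                         # print the name of the saved file
}
```

The saved file can then be attached to a problem report if the configuration
differs between disks.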
  777: .Ss Kernel Panics
  778: .Pp
  779: To analyse a panic which you suspect comes from
  780: .Nm ,
  781: you will need to build a debug kernel.  See the online handbook at
  782: .Pa /usr/share/doc/en/books/developers-handbook/kerneldebug.html
  783: (if installed) or
  784: .Pa http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
  785: for more details of how to do this.
  786: .Pp
  787: Perform the following steps to analyse a
  788: .Nm
  789: problem:
  790: .Bl -enum
  791: .It
  792: Copy the files
  793: .Pa /usr/src/sys/modules/vinum/.gdbinit.crash ,
  794: .Pa /usr/src/sys/modules/vinum/.gdbinit.kernel ,
  795: .Pa /usr/src/sys/modules/vinum/.gdbinit.serial ,
  796: .Pa /usr/src/sys/modules/vinum/.gdbinit.vinum
  797: and
  798: .Pa /usr/src/sys/modules/vinum/.gdbinit.vinum.paths
  799: to the directory in which you will be performing the analysis, typically
  800: .Pa /var/crash .
  801: .It
  802: Make sure that you build the
  803: .Nm
  804: module with debugging information.  The standard
  805: .Pa Makefile
  806: builds a module with debugging symbols by default.  If the version of
  807: .Nm
  808: in
  809: .Pa /modules
  810: does not contain symbols, you will not get an error message, but the stack trace
  811: will not show the symbols.  Check the module before starting
  812: .Nm gdb :
  813: .Bd -literal
  814: $ file /modules/vinum.ko
  815: /modules/vinum.ko: ELF 32-bit LSB shared object, Intel 80386,
  816:   version 1 (FreeBSD), not stripped
  817: .Ed
  818: .Pp
  819: If the output shows that
  820: .Pa /modules/vinum.ko
  821: is stripped, you will have to find a version which is not.  Usually this will be
  822: either in
  823: .Pa /usr/obj/sys/modules/vinum/vinum.ko
  824: (if you have built
  825: .Nm
  826: with a
  827: .Cm make world )
  828: or
  829: .Pa /usr/src/sys/modules/vinum/vinum.ko
  830: (if you have built
  831: .Nm
  832: in this directory).  Modify the file
  833: .Pa .gdbinit.vinum.paths
  834: accordingly.
  835: .It
  836: Either take a dump or use remote serial
  837: .Cm gdb
  838: to analyse the problem.  To analyse a dump, say
  839: .Pa /var/crash/vmcore.5 ,
  840: link
  841: .Pa /var/crash/.gdbinit.crash
  842: to
  843: .Pa /var/crash/.gdbinit
  844: and enter:
  845: .Bd -literal
  846: # cd /var/crash
  847: # gdb -k kernel.debug vmcore.5
  848: .Ed
  849: .Pp
  850: This example assumes that you have installed the correct debug kernel at
  851: .Pa /var/crash/kernel.debug .
  852: If not, substitute the correct name of the debug kernel.
  853: .Pp
  854: To perform remote serial debugging,
  855: link
  856: .Pa /var/crash/.gdbinit.serial
  857: to
  858: .Pa /var/crash/.gdbinit
  859: and enter:
  860: .Bd -literal
  861: # cd /var/crash
  862: # gdb -k kernel.debug
  863: .Ed
  864: .Pp
  865: In this case, the
  866: .Pa .gdbinit
  867: file performs the functions necessary to establish the connection.  The remote
  868: machine must already be in debug mode: enter the kernel debugger and select
  869: .Nm gdb .
  870: The serial
  871: .Pa .gdbinit
  872: file expects the serial connection to run at 38400 bits per second; if you run
  873: at a different speed, edit the file accordingly (look for the
  874: .Ar remotebaud
  875: specification).
  876: .Pp
  877: The following example shows a remote debugging session using the
  878: .Cm debug
  879: command of
  880: .Xr vinum 8 :
  881: .if t .ps -3
  882: .if t .vs -3
  883: .Bd -literal
  884: GDB 4.16 (i386-unknown-freebsd), Copyright 1996 Free Software Foundation, Inc.
  885: Debugger (msg=0xf1093174 "vinum debug") at ../../i386/i386/db_interface.c:318
  886: 318                 in_Debugger = 0;
  887: #1  0xf108d9bc in vinumioctl (dev=0x40001900, cmd=0xc008464b, data=0xf6dedee0 "",
  888:     flag=0x3, p=0xf68b7940) at
  889:     /usr/src/sys/modules/Vinum/../../dev/Vinum/vinumioctl.c:102
  890: 102             Debugger ("vinum debug");
  891: (kgdb) bt
  892: #0  Debugger (msg=0xf0f661ac "vinum debug") at ../../i386/i386/db_interface.c:318
  893: #1  0xf0f60a7c in vinumioctl (dev=0x40001900, cmd=0xc008464b, data=0xf6923ed0 "",
  894:       flag=0x3, p=0xf688e6c0) at
  895:       /usr/src/sys/modules/vinum/../../dev/vinum/vinumioctl.c:109
  896: #2  0xf01833b7 in spec_ioctl (ap=0xf6923e0c) at ../../miscfs/specfs/spec_vnops.c:424
  897: #3  0xf0182cc9 in spec_vnoperate (ap=0xf6923e0c) at ../../miscfs/specfs/spec_vnops.c:129
  898: #4  0xf01eb3c1 in ufs_vnoperatespec (ap=0xf6923e0c) at ../../ufs/ufs/ufs_vnops.c:2312
  899: #5  0xf017dbb1 in vn_ioctl (fp=0xf1007ec0, com=0xc008464b, data=0xf6923ed0 "",
  900:       p=0xf688e6c0) at vnode_if.h:395
  901: #6  0xf015dce0 in ioctl (p=0xf688e6c0, uap=0xf6923f84) at ../../kern/sys_generic.c:473
  902: #7  0xf0214c0b in syscall (frame={tf_es = 0x27, tf_ds = 0x27, tf_edi = 0xefbfcff8,
  903:       tf_esi = 0x1, tf_ebp = 0xefbfcf90, tf_isp = 0xf6923fd4, tf_ebx = 0x2,
  904:       tf_edx = 0x804b614, tf_ecx = 0x8085d10, tf_eax = 0x36, tf_trapno = 0x7,
  905:       tf_err = 0x2, tf_eip = 0x8060a34, tf_cs = 0x1f, tf_eflags = 0x286,
  906:       tf_esp = 0xefbfcf78, tf_ss = 0x27}) at ../../i386/i386/trap.c:1100
  907: #8  0xf020a1fc in Xint0x80_syscall ()
  908: #9  0x804832d in ?? ()
  909: #10 0x80482ad in ?? ()
  910: #11 0x80480e9 in ?? ()
  911: .Ed
  912: .if t .vs
  913: .if t .ps
  914: .Pp
  915: When entering from the debugger, it's important that the source of frame 1
  916: (listed by the
  917: .Pa .gdbinit
  918: file at the top of the example) contains the text:
  919: .if t .ps -3
  920: .if t .vs -3
  921: .Bd -literal
  922: Debugger ("vinum debug");
  923: .Ed
  924: .if t .vs
  925: .if t .ps
  926: .Pp
  927: This is an indication that the address specifications are correct.  If you get
  928: some other output, your symbols and the kernel module are out of sync, and the
  929: trace will be meaningless.
  930: .El
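.Pp
The file handling in steps 1 and 3 above can be sketched as a shell function.
This is an illustration only: the function name is hypothetical, the
directories default to the ones named in the steps, and a symbolic link is
assumed to be an acceptable way to
.Dq link
the chosen file to
.Pa .gdbinit .

```shell
# Hypothetical helper: copy the vinum .gdbinit files into the analysis
# directory and select one of them as .gdbinit.
# Usage: prepare_gdbinit crash|serial [source-dir] [analysis-dir]
prepare_gdbinit() {
    mode=$1
    src=${2:-/usr/src/sys/modules/vinum}
    dir=${3:-/var/crash}
    case "$mode" in
        crash|serial) ;;
        *) echo "usage: prepare_gdbinit crash|serial [src] [dir]" >&2
           return 2 ;;
    esac
    # Step 1: copy all five .gdbinit files listed above.
    for f in crash kernel serial vinum vinum.paths; do
        cp "$src/.gdbinit.$f" "$dir/" || return 1
    done
    # Step 3: link the file for the chosen mode to .gdbinit,
    # replacing any link left over from an earlier session.
    ln -sf ".gdbinit.$mode" "$dir/.gdbinit"
}
```

After
.Ql prepare_gdbinit crash ,
the
.Nm gdb
invocation shown in step 3 can be run from the analysis directory.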
  931: .Pp
  932: For an initial investigation, the most important information is the output of
  933: the
  934: .Cm bt
  935: (backtrace) command above.
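.Pp
The symbol check from step 2 above can also be scripted.
The sketch below only looks for the words
.Dq not stripped
in the
.Xr file 1
output, as in the example in that step; the function names are hypothetical.

```shell
# Hypothetical helper: decide from a file(1) description whether an
# object still carries its symbols ("not stripped").
describes_unstripped() {
    case "$1" in
        *"not stripped"*) return 0 ;;
        *)                return 1 ;;
    esac
}

# Usage: module_has_symbols [module-path]
# The default path is the one used in the text; adjust it as needed.
module_has_symbols() {
    describes_unstripped "$(file "${1:-/modules/vinum.ko}")"
}
```

For example,
.Ql module_has_symbols /usr/obj/sys/modules/vinum/vinum.ko
succeeds only if that copy of the module is not stripped.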
  936: .Ss Reporting problems with Vinum
  937: .Pp
  938: If you find any bugs in
  939: .Nm ,
  940: please report them to Greg Lehey <grog@lemis.com>.  Supply the following
  941: information:
  942: .Pp
  943: .Bl -bullet
  944: .It
  945: The output of the
  946: .Nm
  947: .Cm list
  948: command.
  949: .It
  950: Any messages printed in
  951: .Pa /var/log/messages .
  952: All such messages will be identified by the text
  953: .Nm
  954: at the beginning.
  955: .It
  956: If you have a panic, a stack trace as described above.
  957: .El
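.Pp
The first two items in the list above could be gathered with a sketch like the
following.
The function names and the default output file are hypothetical, and the
filter assumes that
.Nm
messages appear in the log as
.Dq vinum:
after the usual
.Xr syslogd 8
prefix; check this against your own log before relying on it.

```shell
# Hypothetical filter: print only the log lines that mention vinum.
# Assumption: the messages contain the text "vinum:".
# Usage: vinum_messages [log-file]
vinum_messages() {
    grep 'vinum:' "${1:-/var/log/messages}"
}

# Hypothetical collector: write the "vinum list" output and the vinum
# log messages to one file that can be attached to a problem report.
# Usage: collect_vinum_report [output-file] [log-file]
collect_vinum_report() {
    out=${1:-/var/tmp/vinum-report.txt}
    msgs=${2:-/var/log/messages}
    {
        echo "== vinum list =="
        vinum list
        echo "== messages =="
        vinum_messages "$msgs" || echo "(no vinum messages found)"
    } > "$out"
}
```

A stack trace, if there was a panic, still has to be added by hand.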
  958: .Sh AUTHORS
  959: .An Greg Lehey Aq grog@lemis.com .
  960: .Sh HISTORY
  961: .Nm
  962: first appeared in
  963: .Fx 3.0 .
  964: The RAID-5 component of
  965: .Nm
  966: was developed by Cybernet Inc.
  967: .Pq Pa www.cybernet.com
  968: for its NetMAX product.
  969: .Sh SEE ALSO
  970: .Xr disklabel 5 ,
  971: .Xr disklabel 8 ,
  972: .Xr newfs 8 ,
  973: .Xr vinum 8