
3. Setup and Installation

  1. Q: What is the best way to configure Software RAID?
    A: I keep rediscovering that file-system planning is one of the more difficult Unix configuration tasks. To answer your question, I can describe what we did.

    We planned the following setup:

    • two EIDE disks, 2.1 gig each.
      disk partition mount pt.  size    device
        1      1       /        300M   /dev/hda1
        1      2       swap      64M   /dev/hda2
        1      3       /home    800M   /dev/hda3
        1      4       /var     900M   /dev/hda4
      
        2      1       /root    300M   /dev/hdc1
        2      2       swap      64M   /dev/hdc2
        2      3       /home    800M   /dev/hdc3
        2      4       /var     900M   /dev/hdc4
                          
      
    • Each disk is on a separate controller (& ribbon cable). The theory is that a controller failure and/or ribbon failure won't disable both disks. Also, we might possibly get a performance boost from parallel operations over two controllers/cables.

    • Install the Linux kernel on the root (/) partition /dev/hda1. Mark this partition as bootable.

    • /dev/hdc1 will contain a ``cold'' copy of /dev/hda1. This is NOT a raid copy, just a plain old copy-copy. It's there just in case the first disk fails; we can use a rescue disk, mark /dev/hdc1 as bootable, and use that to keep going without having to reinstall the system. You may even want to put /dev/hdc1's copy of the kernel into LILO to simplify booting in case of failure. (A sketch of making this copy follows this list.)

      The theory here is that in case of severe failure, I can still boot the system without worrying about raid superblock-corruption or other raid failure modes & gotchas that I don't understand.

    • /dev/hda3 and /dev/hdc3 will be mirrored as /dev/md0.

    • /dev/hda4 and /dev/hdc4 will be mirrored as /dev/md1.

    • We picked /var and /home to be mirrored, and in separate partitions, using the following logic:

      • / (the root partition) will contain relatively static, non-changing data: for all practical purposes, it will be read-only without actually being marked & mounted read-only.

      • /home will contain ``slowly'' changing data.

      • /var will contain rapidly changing data, including mail spools, database contents and web server logs.

      The idea behind using multiple, distinct partitions is that if, for some bizarre reason, whether it is human error, power loss, or an operating system gone wild, corruption is limited to one partition. In one typical case, power is lost while the system is writing to disk. This will almost certainly lead to a corrupted filesystem, which will be repaired by fsck during the next boot. Although fsck does its best to make the repairs without creating additional damage, it can be comforting to know that any such damage has been limited to one partition. In another typical case, the sysadmin makes a mistake during rescue operations, leading to erased or destroyed data. Partitions can help limit the repercussions of the operator's errors.

    • Other reasonable choices for partitions might be /usr or /opt. In fact, /opt and /home would make great choices for RAID-5 partitions, if we had more disks. A word of caution: DO NOT put /usr in a RAID-5 partition. If a serious fault occurs, you may find that you cannot mount /usr, and that you want some of the tools on it (e.g. the networking tools, or the compiler). With RAID-1, if a fault has occurred and you can't get RAID to work, you can at least mount one of the two mirrors. You can't do this with any of the other RAID levels (RAID-5, striping, or linear append).
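
    Regarding the ``cold'' copy of /dev/hda1 mentioned above, one hedged way to refresh it, assuming the device names from the layout above, a quiet system (or a rescue boot), and /dev/hdc1 not mounted while the copy runs; a file-level copy onto a freshly formatted /dev/hdc1 is also an option:

        # raw block-for-block copy of the root partition onto its cold spare
        dd if=/dev/hda1 of=/dev/hdc1 bs=1024k
        # afterwards, check the copy so it starts life clean
        e2fsck -f /dev/hdc1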

    So the complete answer to the question is:

    • Install the OS on disk 1, partition 1. Do NOT mount any of the other partitions.

    • Install RAID per instructions.

    • Configure md0 and md1 (a command sketch follows this list).

    • Convince yourself that you know what to do in case of a disk failure! Discover sysadmin mistakes now, and not during an actual crisis. Experiment! (We turned off power during disk activity; this proved to be ugly but informative.)

    • Do some ugly mount/copy/unmount/rename/reboot scheme to move /var over to /dev/md1. Done carefully, this is not dangerous.

    • Then enjoy!
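
    As promised above, a minimal sketch of configuring md0 and md1 for the layout described here. It uses only the command forms shown elsewhere in this FAQ (mdadd device components; mdrun -p1 device for the RAID-1 personality); exact options vary between raidtools releases, so check your man pages:

        mdadd /dev/md0 /dev/hda3 /dev/hdc3   # bind the two /home partitions into md0
        mdrun -p1 /dev/md0                   # start md0 as a RAID-1 mirror
        mke2fs /dev/md0                      # create a fresh filesystem on the mirror

        mdadd /dev/md1 /dev/hda4 /dev/hdc4   # bind the two /var partitions into md1
        mdrun -p1 /dev/md1
        mke2fs /dev/md1
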
  2. Q: What is the difference between the mdadd, mdrun, etc. commands and the raidadd, raidrun, etc. commands?
    A: The names of the tools have changed as of the 0.5 release of the raidtools package. The md naming convention was used in the 0.43 and older versions, while raid is used in 0.5 and newer versions.

  3. Q: I want to run RAID-linear and RAID-0 on my stock 2.0.34 kernel. I don't want to apply the raid patches, since they are not needed for RAID-0/linear. Where can I get the raid-tools to manage these?
    A: This is a tough question, indeed, as the newest raid tools package needs to have the RAID-1,4,5 kernel patches installed in order to compile. I am not aware of any pre-compiled, binary version of the raid tools that is available at this time. However, experiments show that the raid-tools binaries, when compiled against kernel 2.1.100, seem to work just fine in creating a RAID-0/linear partition under 2.0.34. A brave soul has asked for these, and I've temporarily placed the binaries mdadd, mdcreate, etc. at http://linas.org/linux/Software-RAID/. You must get the man pages, etc. from the usual raid-tools package.

  4. Q: Can I stripe/mirror the root partition (/)? Why can't I boot Linux directly from the md disks?
    A: Both LILO and Loadlin need a non-striped, non-mirrored partition to read the kernel image from. If you want to stripe/mirror the root partition (/), then you'll want to create an unstriped, unmirrored partition to hold the kernel(s). Typically, this partition is named /boot. Then you either use the initial ramdisk support (initrd), or patches from Harald Hoyer < HarryH@Royal.Net> that allow a striped partition to be used as the root device. (These patches are now a standard part of recent 2.1.x kernels.)

    There are several approaches that can be used. One approach is documented in detail in the Bootable RAID mini-HOWTO: ftp://ftp.bizsystems.com/pub/raid/bootable-raid.

    Alternately, use mkinitrd to build the ramdisk image; a sketch follows, and Edward Welbon describes a complete setup below.
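
    A rough illustration, assuming a Red Hat-style mkinitrd and the usual LILO keywords; the image path, kernel version, and any options needed to pull the md/raid modules into the initrd are distribution-specific, so treat this as a sketch only:

        # build an initial ramdisk image for the installed kernel
        mkinitrd /boot/initrd-2.0.34.img 2.0.34

        # /etc/lilo.conf fragment pointing the kernel at the initrd and at
        # the md device that is to become the root filesystem:
        #   image=/boot/vmlinuz-2.0.34
        #       label=linux
        #       initrd=/boot/initrd-2.0.34.img
        #       root=/dev/md0
        lilo        # re-run lilo so the new configuration takes effect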

    Edward Welbon < welbon@bga.com> writes:

    • ... all that is needed is a script to manage the boot setup. To mount an md filesystem as root, the main thing is to build an initial file system image that has the needed modules and md tools to start md. I have a simple script that does this.
    • For boot media, I have a small, cheap SCSI disk (170MB; I got it used for $20). This disk runs on an AHA1452, but it could just as well be an inexpensive IDE disk on the native IDE. The disk need not be very fast since it is mainly for boot.
    • This disk has a small file system which contains the kernel and the file system image for initrd. The initial file system image has just enough stuff to allow me to load the raid SCSI device driver module and start the raid partition that will become root. I then do an
      echo 0x900 > /proc/sys/kernel/real-root-dev
                    
      
      (0x900 is for /dev/md0) and exit linuxrc. The boot proceeds normally from there.
    • I have built most support as a module except for the AHA1452 driver that brings in the initrd filesystem. So I have a fairly small kernel. The method is perfectly reliable, I have been doing this since before 2.1.26 and have never had a problem that I could not easily recover from. The file systems even survived several 2.1.4[45] hard crashes with no real problems.
    • At one time I had partitioned the raid disks so that the initial cylinders of the first raid disk held the kernel and the initial cylinders of the second raid disk held the initial file system image. Instead, I made the initial cylinders of the raid disks swap, since they are the fastest cylinders (why waste them on boot?).
    • The nice thing about having an inexpensive device dedicated to boot is that it is easy to boot from and can also serve as a rescue disk if necessary. If you are interested, you can take a look at the script that builds my initial ram disk image and then runs LILO.
      http://www.realtime.net/~welbon/initrd.md.tar.gz
      It is current enough to show the picture. It isn't especially pretty, and it could certainly build a much smaller filesystem image for the initial ram disk. It would be easy to make it more efficient. But it uses LILO as is. If you make any improvements, please forward a copy to me. 8-)
  5. Q: I have heard that I can run mirroring on top of striping. Is this true? Can I run mirroring over the loopback device?
    A: Yes, but not the reverse. That is, you can put a stripe over several disks, and then build a mirror on top of this. However, striping cannot be put on top of mirroring.
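
    A hedged sketch of such a layering, using only the command forms shown in this FAQ; the device names are placeholders, and whether your raidtools release accepts md devices as components, as well as the exact -p spelling for the striping personality, should be checked against your man pages:

        mdadd /dev/md0 /dev/hda3 /dev/hdb3   # first stripe set
        mdrun -p0 /dev/md0                   # assumed RAID-0 (striping) personality flag
        mdadd /dev/md1 /dev/hdc3 /dev/hdd3   # second stripe set
        mdrun -p0 /dev/md1
        mdadd /dev/md2 /dev/md0 /dev/md1     # mirror layered on top of the two stripes
        mdrun -p1 /dev/md2                   # RAID-1 personality, as used elsewhere here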

    A brief technical explanation is that the linear and stripe personalities use the ll_rw_blk routine for access. The ll_rw_blk routine maps disk devices and sectors, not blocks. Block devices can be layered one on top of the other; but devices that do raw, low-level disk accesses, such as ll_rw_blk, cannot.

    Currently (November 1997) RAID cannot be run over the loopback devices, although this should be fixed shortly.

  6. Q: I have two small disks and three larger disks. Can I concatenate the two smaller disks with RAID-0, and then create a RAID-5 array out of that and the larger disks?
    A: Currently (November 1997), for a RAID-5 array, no. Currently, one can do this only for a RAID-1 on top of the concatenated drives.

  7. Q: What is the difference between setting up RAID-1 and RAID-5 with two disks?
    A: There is no difference in storage capacity. Nor can disks be added to either array to increase capacity (see the question below for details).

    RAID-1 offers a performance advantage for reads: the RAID-1 driver uses distributed-read technology to simultaneously read two sectors, one from each drive, thus doubling read performance.

    The RAID-5 driver, although it contains many optimizations, does not currently (September 1997) realize that the parity disk is actually a mirrored copy of the data disk. Thus, it serializes data reads.

  8. Q: How can I guard against a two-disk failure?
    A: Some of the RAID algorithms do guard against multiple disk failures, but these are not currently implemented for Linux. However, the Linux Software RAID can guard against multiple disk failures by layering an array on top of an array. For example, nine disks can be used to create three raid-5 arrays. Then these three arrays can in turn be hooked together into a single RAID-5 array on top. In fact, this kind of a configuration will guard against a three-disk failure. Note that a large amount of disk space is ''wasted'' on the redundancy information.

        For an NxN raid-5 array,
        N=3, 5 out of 9 disks are used for parity (=55%)
        N=4, 7 out of 16 disks
        N=5, 9 out of 25 disks
        ...
        N=9, 17 out of 81 disks (=~20%)
                
    
    In general, an MxN array will use M+N-1 disks for parity. The least amount of space is "wasted" when M=N.
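
    A quick shell check of that formula; the numbers match the table above:

        M=3; N=3
        PARITY=`expr $M + $N - 1`
        TOTAL=`expr $M \* $N`
        echo "parity: $PARITY of $TOTAL disks"   # prints: parity: 5 of 9 disks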

    Another alternative is to create a RAID-1 array with three disks. Note that since all three disks contain identical data, 2/3 of the space is ''wasted''.

  9. Q: I'd like to understand how it'd be possible to have something like fsck: if the partition hasn't been cleanly unmounted, fsck runs and fixes the filesystem by itself more than 90% of the time. Since the machine is capable of fixing it by itself with ckraid --fix, why not make it automatic?

    A: This can be done by adding lines like the following to /etc/rc.d/rc.sysinit:

        mdadd /dev/md0 /dev/hda1 /dev/hdc1 || {
            ckraid --fix /etc/raid.usr.conf
            mdadd /dev/md0 /dev/hda1 /dev/hdc1
        }
                
    
    or
        mdrun -p1 /dev/md0
        if [ $? -gt 0 ] ; then
                ckraid --fix /etc/raid1.conf
                mdrun -p1 /dev/md0
        fi
                
    
    Before presenting a more complete and reliable script, let's review the theory of operation.

    Gadi Oxman writes: In an unclean shutdown, Linux might be in one of the following states:

    Suppose we were using a RAID-1 array. In (2a), it might happen that before the crash, a small number of data blocks were successfully written only to some of the mirrors, so that on the next reboot, the mirrors will no longer contain the same data.

    If we were to ignore the mirror differences, the raidtools-0.36.3 read-balancing code might choose to read the above data blocks from any of the mirrors, which will result in inconsistent behavior (for example, the output of e2fsck -n /dev/md0 can differ from run to run).

    Since RAID doesn't protect against unclean shutdowns, usually there isn't any ''obviously correct'' way to fix the mirror differences and the filesystem corruption.

    For example, by default ckraid --fix will choose the first operational mirror and update the other mirrors with its contents. However, depending on the exact timing at the crash, the data on another mirror might be more recent, and we might want to use it as the source mirror instead, or perhaps use another method for recovery.

    The following script provides one of the more robust boot-up sequences. In particular, it guards against long, repeated ckraid's in the presence of uncooperative disks, controllers, or controller device drivers. Modify it to reflect your config, and copy it to rc.raid.init. Then invoke rc.raid.init after the root partition has been fsck'ed and mounted rw, but before the remaining partitions are fsck'ed. Make sure the current directory is in the search path.

        mdadd /dev/md0 /dev/hda1 /dev/hdc1 || {
            rm -f /fastboot             # force an fsck to occur  
            ckraid --fix /etc/raid.usr.conf
            mdadd /dev/md0 /dev/hda1 /dev/hdc1
        }
        # if a crash occurs later in the boot process,
        # we at least want to leave this md in a clean state.
        /sbin/mdstop /dev/md0
    
        mdadd /dev/md1 /dev/hda2 /dev/hdc2 || {
            rm -f /fastboot             # force an fsck to occur  
            ckraid --fix /etc/raid.home.conf
            mdadd /dev/md1 /dev/hda2 /dev/hdc2
        }
        # if a crash occurs later in the boot process,
        # we at least want to leave this md in a clean state.
        /sbin/mdstop /dev/md1
    
        mdadd /dev/md0 /dev/hda1 /dev/hdc1
        mdrun -p1 /dev/md0
        if [ $? -gt 0 ] ; then
            rm -f /fastboot             # force an fsck to occur  
            ckraid --fix /etc/raid.usr.conf
            mdrun -p1 /dev/md0
        fi
        # if a crash occurs later in the boot process,
        # we at least want to leave this md in a clean state.
        /sbin/mdstop /dev/md0
    
        mdadd /dev/md1 /dev/hda2 /dev/hdc2
        mdrun -p1 /dev/md1
        if [ $? -gt 0 ] ; then
            rm -f /fastboot             # force an fsck to occur  
            ckraid --fix /etc/raid.home.conf
            mdrun -p1 /dev/md1
        fi
        # if a crash occurs later in the boot process,
        # we at least want to leave this md in a clean state.
        /sbin/mdstop /dev/md1
    
        # OK, just blast through the md commands now.  If there were
        # errors, the above checks should have fixed things up.
        /sbin/mdadd /dev/md0 /dev/hda1 /dev/hdc1
        /sbin/mdrun -p1 /dev/md0
        
        /sbin/mdadd /dev/md1 /dev/hda2 /dev/hdc2
        /sbin/mdrun -p1 /dev/md1
    
                
    
    In addition to the above, you'll want to create a rc.raid.halt which should look like the following:

        /sbin/mdstop /dev/md0
        /sbin/mdstop /dev/md1
                
    
    Be sure to modify both rc.sysinit and init.d/halt to include this everywhere that filesystems get unmounted before a halt/reboot. (Note that rc.sysinit unmounts and reboots if fsck returned with an error.)
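
    For instance, a one-line hook of the sort meant here; the exact path to the script and the layout of your init scripts will differ by distribution:

        # in init.d/halt (and in the reboot-on-fsck-failure path of rc.sysinit),
        # just before the filesystems are unmounted:
        /etc/rc.d/rc.raid.halt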

  10. Q: Can I set up one half of a RAID-1 mirror with the single disk I have now, and then add the other disk later?
    A: With the current tools, no, not in any easy way. In particular, you cannot just copy the contents of one disk onto another, and then pair them up. This is because the RAID drivers use a glob of space at the end of the partition to store the superblock. This decreases the amount of space available to the file system slightly; if you just naively try to force a RAID-1 arrangement onto a partition with an existing filesystem, the raid superblock will overwrite a portion of the file system and mangle data. Since the ext2fs filesystem scatters files randomly throughout the partition (in order to avoid fragmentation), there is a very good chance that some file will land at the very end of the partition long before the disk is full.

    If you are clever, I suppose you can calculate how much room the RAID superblock will need, and make your filesystem slightly smaller, leaving room for it when you add it later. But then, if you are this clever, you should also be able to modify the tools to do this automatically for you. (The tools are not terribly complex.)

    Note: A careful reader has pointed out that the following trick may work; I have not tried or verified this: Do the mkraid with /dev/null as one of the devices. Then mdadd -r with only the single, true disk (do not mdadd /dev/null). The mkraid should have successfully built the raid array, while the mdadd step just forces the system to run in "degraded" mode, as if one of the disks had failed.

