GNU/Linux Drive, Partition, and File System Management

Storage drives store data via blocks.

Note: If you are not familiar with the GNU/Linux command line interface, review the Conventions page before proceeding.

Data Block Addressing

A drive's data blocks are addressed via different addressing schemes.

Cylinder-Head-Sector (CHS)

Traditional hard disk drives were described by their drive geometry, which consisted of Cylinder-Head-Sector (CHS). This drive geometry was used to address hard disk drive data blocks.

Specifically, CHS referred to:

  • Cylinders per head (i.e., the number of tracks on one side of each platter for each head aligned through the stack of platters)
  • Number of read/write heads
  • Sectors per track

An illustration helps demonstrate how the CHS components fit together:

Hard Drive Geometry
"Hard Drive Geometry" by Henry Mühlpfordt is licensed under a CC BY-SA 4.0 license

Modern hard disks have more sectors on the outside cylinders than on those near the center of the hard disk platters, so as drives grew in size and their geometry became more complicated, the CHS model was no longer suitable. Regardless, every disk still claims a fabricated disk geometry that matches the disk capacity, in order to preserve backwards compatibility with older system firmware (e.g., BIOS) and DOS.

Logical Block Addressing (LBA)

The replacement for the CHS model is Logical Block Addressing (LBA). Under LBA, disk data blocks are located by an integer index (e.g., 0, 1, 2) and each logical block address identifies a single data block.
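
To illustrate how the two addressing schemes relate, the following is a minimal sketch of the conventional CHS-to-LBA conversion, LBA = (C * HPC + H) * SPT + (S - 1), where HPC is the number of heads per cylinder and SPT is the number of sectors per track. The function name chs_to_lba and the 16-head, 63-sector geometry are assumptions chosen for this example, not values read from a real drive.

```shell
#!/bin/sh
# Convert a CHS tuple to an LBA index.
# Usage: chs_to_lba CYLINDER HEAD SECTOR HEADS_PER_CYLINDER SECTORS_PER_TRACK
# CHS sectors are numbered from 1, so 1 is subtracted from the sector number.
chs_to_lba() {
    echo $(( ($1 * $4 + $2) * $5 + ($3 - 1) ))
}

# Hypothetical geometry: 16 heads per cylinder, 63 sectors per track.
chs_to_lba 0 0 1 16 63   # first sector of the disk: 0
chs_to_lba 0 1 1 16 63   # first sector under the second head: 63
chs_to_lba 1 0 1 16 63   # first sector of the second cylinder: 1008
```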

Partitions

A partition is a logical part of a disk. Drives are divided into partitions because they help organize the contents of a disk according to the kind and use of the data contained within them.

Partition Table

A drive's partition table specifies where on the disk the Linux kernel can find partitions. Once the kernel has located the partitions, the content of the partition table is not used again until it is next read, usually during the next system boot.

Partitioning Schemes

There are two dominant methods for partitioning a drive.

Master Boot Record (MBR)

The Master Boot Record (MBR) scheme is the older way of partitioning disks. Specifically, the MBR is the first sector (i.e., address 0) of a drive (often, this also means the first 512 bytes of a drive, since many drives use a 512-byte sector size).

The MBR is also a boot sector (i.e., the part of a boot medium that contains special information concerning system start). The MBR only exists on partitioned drives, but is located outside of any partitions.

Master Boot Record Anatomy
"MBR (Master Boot Record) Anatomy" by ScotXW is licensed under a CC BY-SA 3.0 license

The MBR partition table entries usually store the starting sector number of a partition (you can find this in /sys/block/ex_device/ex_partition/start, e.g., /sys/block/sda/sda1/start), as well as the length of a partition (in sectors). The MBR partition table also contains each partition type, which loosely describes the type of data management structure that might appear on the partition.

GNU/Linux MBR Partition Types

Hexadecimal Value  Partition Type
83                 Linux data
82                 Linux swap space
86                 Redundant Array of Independent Disks (RAID) superblock
8E                 Linux Logical Volume Management (LVM)
E8                 Linux Unified Key Setup (LUKS)
EE                 Protective partition for GUID Partition Table (GPT)-partitioned disk
FD                 RAID superblock with autodetection
FE                 Linux LVM (old style)

The MBR scheme has limitations:

  • A maximum partition size of 2 tebibytes.
  • Only four Primary partitions allowed.
  • No redundancy. There is only one copy of the MBR partition table.
  • No data integrity checks or protections.
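
The 2 tebibyte limit follows from the 32-bit sector fields used by MBR partition table entries. A minimal sketch of the arithmetic, assuming 512-byte sectors (the function name max_mbr_bytes is made up for this example):

```shell
#!/bin/sh
# Largest size that can be described with a 32-bit sector count.
# Usage: max_mbr_bytes SECTOR_SIZE_IN_BYTES
max_mbr_bytes() {
    echo $(( (1 << 32) * $1 ))
}

bytes=$(max_mbr_bytes 512)
echo "$bytes bytes"            # 2199023255552 bytes
echo "$(( bytes >> 40 )) TiB"  # 2 TiB
```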

GUID Partition Table (GPT)

The GUID Partition Table (GPT) scheme is the replacement for MBR. A GPT ensures that every partition on a drive has its own Globally Unique Identifier (GUID).

GPT has several advantages:

  • A maximum partition size of 8 zebibytes.
  • Allows a nearly unlimited number of partitions (in practice, this will be limited by your operating system).
  • Has redundancy. A backup copy of the GPT is stored at the end of the drive.
  • Has data integrity checks and protections. Cyclic Redundancy Check (CRC) values are stored in the GPT.
  • Is backwards compatible with MBR. The first sector (address 0) of a GPT disk remains reserved for a protective MBR, which designates the whole disk as partitioned from a MBR point of view.

The second sector (address 1) of a GPT disk contains the GPT header, which stores management information for the whole disk. Usually, partitioning information is contained in the third and subsequent sectors, but can be stored anywhere on a disk. The minimum size for the area containing the partition information is 16 kibibytes.

It is common to have the first partition start one mebibyte after the start of the disk (assuming 512-byte sectors, this would be at sector 2048).
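
The sector number for a given byte offset depends on the sector size. A small sketch of that calculation (offset_to_sector is a hypothetical helper name; the 4096-byte case is added only for comparison):

```shell
#!/bin/sh
# Usage: offset_to_sector OFFSET_IN_BYTES SECTOR_SIZE_IN_BYTES
offset_to_sector() {
    echo $(( $1 / $2 ))
}

one_mib=$(( 1024 * 1024 ))
offset_to_sector "$one_mib" 512    # 2048
offset_to_sector "$one_mib" 4096   # 256
```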

GUID Partition Table Scheme
"GUID Partition Table Scheme" by Kbolino is licensed under a CC BY-SA 2.5 license

Partition table entries are at least 128 bytes long. Two 16-byte GUIDs are used for the partition type and the partition itself, and 8 bytes each are used for the starting and ending block numbers. Another 8 bytes are used for attributes and 72 bytes are used for the partition name.
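
As a quick check, the field sizes above add up to the 128-byte entry size, and entries of that size fill the minimum 16 kibibyte partition information area with 128 entries. The variable names in this sketch are illustrative only:

```shell
#!/bin/sh
# Field sizes (in bytes) of one GPT partition table entry.
type_guid=16
partition_guid=16
first_block=8
last_block=8
attributes=8
name=72

entry_size=$(( type_guid + partition_guid + first_block + last_block + attributes + name ))
echo "entry size: $entry_size bytes"

# Number of entries that fit into the minimum 16 KiB area.
table_bytes=$(( 16 * 1024 ))
entries=$(( table_bytes / entry_size ))
echo "entries: $entries"
```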

Partition Type GUIDs

Globally Unique Identifier (GUID)     Partition Type
00000000-0000-0000-0000-000000000000  Unused entry
024DEE41-33E7-11D3-9D69-0008C781F39F  MBR partition scheme
C12A7328-F81F-11D2-BA4B-00A0C93EC93B  Extensible Firmware Interface (EFI) system partition
21686148-6449-6E6F-744E-656564454649  Basic Input/Output System (BIOS) boot partition

Some partition type GUIDs are specific to GNU/Linux.

GNU/Linux Partition Type GUIDs

Globally Unique Identifier (GUID)     Partition Type
0FC63DAF-8483-4772-8E79-3D69D8477DE4  Linux file system data
A19D880F-05FC-4D3B-A006-743F0F84911E  RAID partition
44479540-F297-41B2-9AF7-D131D5F0458A  Root partition (x86)
4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709  Root partition (x86-64)
69DAD710-2CE4-4E3C-B16C-21A1D49ABED3  Root partition (32-bit ARM)
B921B045-1DF0-41C3-AF44-4C6F280D3FAE  Root partition (64-bit ARM/AArch64)
BC13C2FF-59E6-4262-A352-B275FD6F7172  /boot/ partition
0657FD6D-A4AB-43C4-84E5-0933C84B4F4F  Swap partition
E6D6D379-F507-44C2-A23C-238F2A3DF928  LVM partition
933AC7E1-2EB4-4F13-B844-0E14E2AEF915  /home/ partition
3B8F8425-20E0-4F3B-907F-1A25A76F98E8  /srv/ (server data) partition
7FFEC5C9-2D00-49B7-8941-3EA10A5586B7  Plain dm-crypt partition
CA7D7CCB-63ED-4C53-861C-1742536059CC  LUKS partition
8DA63339-0007-60C0-C436-083AC8230908  Reserved

GNU/Linux can use GPT-partitioned disks, but your GNU/Linux installer and bootloader also need to support GPT if you want to create or boot from GPT-partitioned disks, respectively.

Partition Commands

fdisk

To list the partition tables for all drives currently attached to a GNU/Linux system, enter # fdisk -l (the -l option is short for --list):

# fdisk -l
Disk /dev/vda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x9381a638

Device     Boot  Start      End  Sectors  Size Id Type
/dev/vda1  *      2048   499711   497664  243M 83 Linux
/dev/vda2       501758 41940991 41439234 19.8G  5 Extended
/dev/vda5       501760 41940991 41439232 19.8G 8e Linux LVM


Disk /dev/mapper/tails--vg-root: 15.8 GiB, 16907239424 bytes, 33021952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/tails--vg-swap_1: 4 GiB, 4286578688 bytes, 8372224 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

If you are only interested in the partition table for one or more specific drives, you can pass their device files (e.g., /dev/sda) to fdisk as arguments:

# fdisk -l ex_device_file...

By default, the units of measure are sectors, but this can be changed by passing a different unit as an argument to the -u ex_units (--units=ex_units) option.
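
Because the listing is in sectors, the Start, End, Sectors, and Size columns can be cross-checked with simple arithmetic. The sketch below uses the /dev/vda1 values from the sample output above and assumes the 512-byte sector size it reports (partition_sectors is a hypothetical helper name):

```shell
#!/bin/sh
# Usage: partition_sectors START END
# The End column is inclusive, hence the + 1.
partition_sectors() {
    echo $(( $2 - $1 + 1 ))
}

sectors=$(partition_sectors 2048 499711)
size_mib=$(( sectors * 512 / 1024 / 1024 ))
echo "sectors: $sectors"      # 497664, matching the Sectors column
echo "size: $size_mib MiB"    # 243 MiB, matching the Size column
```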

The partition table for a specific device can be interactively manipulated like so:

# fdisk ex_device_file

Afterwards, you will be placed at the Command prompt for fdisk:

# fdisk '/dev/vda'

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help):

fdisk lets you partition hard disks according to the MBR or GPT schemes. On an empty (unpartitioned) disk, fdisk will, by default, create a MBR partition table, but you can change this afterwards.

These are some of the commands that fdisk supports in interactive mode:

m
Help. Displays fdisk commands.
p, print
Print the partition table.
n
Create a new partition.
d
Delete a partition.
t
Set a partition type. You will need to enter the code for the partition type as a hexadecimal number.
Enter L to list the supported partition types.
w
Write changes to disk and quit fdisk.
q
Quit fdisk without saving changes.

You can generate a GPT-style partition table by pressing g (any existing MBR partition table will be overwritten in the process). The created GUID will apply to the entire disk (for GPT disks, there are also GUIDs for the partition type and the partition itself).

Afterwards, you can use the n command to create partitions the usual way (the menu will look slightly different than when working with a MBR disk). The partition type selection is different, as well, because it involves GUIDs, rather than two-digit hexadecimal numbers.

When done, write your changes to disk and quit fdisk by pressing w.

After storing the partition table, fdisk tries to get the Linux kernel to re-read the new partition table. This works well with new or unused disks, but fails if a partition on the disk is in use (e.g., as a mounted file system). As a result, if you re-partition the disk that contains the / file system, the changes only take effect after a system reboot.

When working with fdisk, it is usually wise to adhere to its suggested defaults, unless you have compelling reasons to do otherwise.

cfdisk is a screen-oriented alternative to fdisk.

gdisk

gdisk is similar to fdisk and its elementary functions correspond to those of fdisk. To view all gdisk commands, enter ?. As with fdisk, you can quit gdisk (without saving your changes) by pressing q.

There are a few features of gdisk that make it unique:

  • gdisk can be used to convert a MBR-partitioned medium to a GPT-partitioned medium. This presupposes that there is enough space at the start and end of the medium for GPT partition tables. You will need to ensure that the last 33 sectors of the medium (assuming 512-byte sectors) are not assigned to a partition. To do this conversion, you usually only need to provide the device file to gdisk as an argument and follow the prompts.
  • gdisk can be used to convert a GPT-partitioned medium to a MBR-partitioned medium. To do the conversion, you must use the r command in gdisk to change the menu to recovery and transformation options and select the g command there (convert GPT into MBR and exit).

sfdisk and sgdisk

sfdisk and sgdisk are non-interactive partitioning programs for MBR and GPT disks, respectively, which makes them suitable for automation. They can also be used to create a backup copy of partitioning information and either print it as a table or store it to media. This is particularly useful for disks with logical partitions (the partitioning information for logical partitions is not stored in the MBR, but inside the extended partition that contains them).
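
As a rough sketch of the backup use case, sfdisk can dump a partition table with its --dump option and read the dump back on standard input to restore it. The example below practises on a scratch image file rather than a real drive; the file names come from mktemp and the single-partition layout is an assumption made for this example. It is skipped when sfdisk is not installed.

```shell
#!/bin/sh
# Back up and restore a partition table with sfdisk, using a scratch
# image file in place of a real drive.
if command -v sfdisk >/dev/null 2>&1; then
    img=$(mktemp)
    backup=$(mktemp)

    # 64 MiB of zeros to act as an empty "disk".
    dd if=/dev/zero of="$img" bs=1M count=64 2>/dev/null

    # Create an MBR table with one Linux partition starting at sector 2048.
    printf 'label: dos\nstart=2048, type=83\n' | sfdisk --quiet "$img"

    # Back up: dump the table in a format that sfdisk can read back in.
    sfdisk --dump "$img" > "$backup"
    cat "$backup"

    # Restore: feed the dump back to sfdisk.
    sfdisk --quiet "$img" < "$backup"

    rm -f "$img"
fi
```

On a real system, the image file would be replaced by a device file (e.g., /dev/sda), the commands would need root privileges, and the dump should be stored on a different drive.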

parted

To list the partition tables for all drives currently attached to a GNU/Linux system, enter # parted -l (the -l option is short for --list):

# parted -l
Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/tails--vg-swap_1: 4287MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system     Flags
 1      0.00B  4287MB  4287MB  linux-swap(v1)


Model: Linux device-mapper (linear) (dm)
Disk /dev/mapper/tails--vg-root: 16.9GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  16.9GB  16.9GB  ext4


Model: Virtio Block Device (virtblk)
Disk /dev/vda: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system  Flags
 1      1049kB  256MB   255MB   primary   ext2         boot
 2      257MB   21.5GB  21.2GB  extended
 5      257MB   21.5GB  21.2GB  logical                lvm

If you are only interested in the partition table for a specific drive, you can pass its device file (e.g., /dev/sda) to parted as an argument and use the print command to view its partition table:

# parted ex_device_file print

The default units for parted are in MB (megabytes).

parted can be used interactively or on the command line (e.g., parted '/dev/sdb' mkpart primary ext4 316MB 421MB). Unlike fdisk, parted applies the changes you request immediately.

The partition table for a specific device can be interactively manipulated like so:

# parted ex_device_file

# parted '/dev/vda'
GNU Parted 3.2
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)

Afterwards, you will be placed at the (parted) command line. Use the help command to display a list of parted commands.

A disk partition table is created with the mktable command:

mktable ex_table_type

ex_table_type can be:

  • aix
  • amiga
  • bsd
  • dvh
  • gpt
  • loop (raw disk access)
  • msdos (MBR)
  • pc98
  • sun

The command format for creating a partition is:

mkpart ex_partition_type ex_file_system_label ex_file_system_type ex_start_point ex_end_point

For example:

mkpart primary my_system ext4 3 8GB

For ex_partition_type, you can enter the values:

  • primary
  • extended
  • logical

A partition type only needs to be specified for msdos and dvh partition tables.

ex_file_system_label is required for GPT partition tables.

ex_file_system_type is optional. Available options are:

  • btrfs
  • ext2
  • ext3
  • ext4
  • fat16
  • fat32
  • hfs
  • hfs+
  • linux-swap
  • ntfs
  • reiserfs
  • xfs

For ex_start_point and ex_end_point, you can use the following units:

  • B (bytes)
  • compact (megabytes for input and a human-friendly form for output)
  • chs (cylinders, heads, sectors)
  • cyl (cylinders)
  • GB
  • GiB
  • kB
  • MB
  • MiB
  • s (sectors)
  • TB
  • TiB
  • % (percentage of device size)

If you want to use all the remaining space on a disk for the partition you are creating, you can use 100% as the end point.

parted commands include:

name
Set the name (label) of a GPT partition.
p, print
View the partition table for a drive.
print all
Output the partition tables of all block devices.
print devices
List all available block devices.
print free
Display free (unallocated) space on a drive.
q
Quit parted.
rm
Remove unwanted partitions.
select
Select a new drive.

If you accidentally remove a partition, parted can help you find it again with the rescue command. You need to provide rescue with the approximate location on the disk where the partition used to be (e.g., rescue 200MB 350MB). For the rescue command to work, there must be a file system on the partition.

GParted (GNOME Partition Editor) is a graphical version of parted.

lsblk and blkid

To list all block devices, along with their device node names/partitions, major/minor numbers, sizes, types (i.e., disk, partition), and mount points, use the lsblk command:

lsblk -a

The -a (--all) option ensures that empty devices and RAM disk devices are also listed.

lsblk's output can be customized with the -o (--output) option, which takes a comma-separated list of columns (block device attributes) to display. Prefixing the list with a + adds the columns to the default output instead of replacing it.

For example, the following adds block device file system labels and Universally Unique Identifiers (UUIDs) to lsblk's output:

lsblk -a -o +LABEL,UUID

Block device file system labels and UUIDs can also be viewed with the # blkid command.

iostat

The iostat command reports CPU statistics and I/O statistics for devices and partitions. The first report covers the time since the system was booted, for all CPUs and devices. When multiple reports are requested, each subsequent report covers the time since the previous report.

iostat is available as part of the sysstat package and will likely need to be installed from your distribution's repository (e.g., # apt install sysstat, # dnf install sysstat).

$ iostat
Linux 4.19.0-8-amd64 (tails)    11/09/2020  _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.22    0.10    5.22    0.90    0.50   76.06

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda             183.65      7005.18      6400.90    1031162     942213
dm-0            225.58      6921.96      6397.93    1018913     941776
dm-1              3.45        22.39         7.28       3296       1072

You can pass specific devices to the command as arguments in a space-separated list to only see results for those devices (e.g., iostat '/dev/sda' '/dev/sdb').

iostat is used for monitoring system I/O device loading by observing the time the devices are active in relation to their average transfer rates.

This command generates three types of reports:

  1. CPU Utilization
  2. Device Utilization
  3. Network File System

CPU Utilization

For multiprocessor systems, the CPU values are global averages among all processors.

The format for this report is:

%user
Percentage of CPU utilization that occurred while executing at the user level.
%nice
Percentage of CPU utilization that occurred while executing at the user level with nice priority.
%system
Percentage of CPU utilization that occurred while executing at the system (kernel) level.
%iowait
Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%steal
Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%idle
Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

Device Utilization

This report provides statistics on a per physical device or partition basis.

The format for this report will usually show at least the following fields (more fields may be shown, depending on the flags used with the command):

Device
The device (or partition) name.
tps
Number of transfers per second that were issued to the device. A transfer is an I/O request to the device.
kB_read/s
The amount of data read from the device expressed in kilobytes per second.
kB_wrtn/s
The amount of data written to the device expressed in kilobytes per second.
kB_read
The total number of kilobytes read from the device.
kB_wrtn
The total number of kilobytes written to the device.
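
The fields can also be combined. For example, dividing the sum of kB_read/s and kB_wrtn/s by tps gives a rough average amount of data per transfer. The following sketch does this with awk for device lines copied from the sample report above; the function name avg_kb_per_transfer is an assumption, and the calculation is an illustration rather than something iostat prints in this report.

```shell
#!/bin/sh
# Average kilobytes per transfer = (kB_read/s + kB_wrtn/s) / tps.
# Reads device lines on standard input in the column order:
#   Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
avg_kb_per_transfer() {
    awk '$2 > 0 { printf "%s %.1f\n", $1, ($3 + $4) / $2 }'
}

# Device lines copied from the sample report.
printf '%s\n' \
    'vda 183.65 7005.18 6400.90 1031162 942213' \
    'dm-1 3.45 22.39 7.28 3296 1072' |
    avg_kb_per_transfer
```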

Network File System

This report provides statistics for each mounted network file system.

The report shows the following fields:

Filesystem
The hostname of the Network File System (NFS) server followed by a colon and by the directory name where the network file system is mounted.
rkB_nor/s
The number of kilobytes read by applications via the read(2) system call interface.
wkB_nor/s
The number of kilobytes written by applications via the write(2) system call interface.
rkB_dir/s
The number of kilobytes read from files opened with the O_DIRECT flag.
wkB_dir/s
The number of kilobytes written to files opened with the O_DIRECT flag.
rkB_svr/s
The number of kilobytes read from the server by the NFS client via an NFS READ request.
wkB_svr/s
The number of kilobytes written to the server by the NFS client via an NFS WRITE request.
ops/s
The number of operations that were issued to the file system per second.
rops/s
The number of read operations that were issued to the file system per second.
wops/s
The number of write operations that were issued to the file system per second.

iotop

The iotop command can be used to watch I/O usage information output by the kernel and display a table of current I/O usage by processes or threads on the system. Most likely, you will need to install this command from your GNU/Linux distribution's repository (e.g., # apt install iotop, # dnf install iotop).

# iotop
Total DISK READ:         0.00 B/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                       
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
    4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
    6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H-kblockd]
    8 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
    9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
   10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
   11 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
   12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
   14 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/0]
   15 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/1]
   16 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
   17 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
   19 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H-kblockd]
   20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kdevtmpfs]
   21 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [netns]
   22 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kauditd]
   23 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [khungtaskd]
   24 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [oom_reaper]
   25 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [writeback]
 keys:  any: refresh  q: quit  i: ionice  o: active  p: procs  a: accum                     
  sort:  r: asc  left: SWAPIN  right: COMMAND  home: TID  end: COMMAND  

ionice

When run without arguments, the ionice command reports the current I/O scheduling class and priority for the current shell.

$ ionice
none: prio 0

A process can be in one of three scheduling classes:

  1. Idle - A program running with idle I/O priority will only get disk time when no other program has asked for disk I/O for a defined grace period.
  2. Best-effort - This is the effective scheduling class for any process that has not asked for a specific I/O priority.
  3. Realtime - The RT scheduling class is given first access to the disk, regardless of what else is going on in the system.

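For a process that has not asked for a specific I/O priority, the priority within the best-effort class is derived from its CPU nice value. The formula for I/O Priority is: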

I/O Priority = (cpu_nice_value + 20) / 5
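
A small sketch of this formula using integer division, as shell arithmetic does (the function name io_priority is made up for this example):

```shell
#!/bin/sh
# Usage: io_priority CPU_NICE_VALUE
# CPU nice values range from -20 (highest priority) to 19 (lowest).
io_priority() {
    echo $(( ($1 + 20) / 5 ))
}

io_priority -20   # 0
io_priority 0     # 4
io_priority 19    # 7
```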

The following command can be used to query the current I/O scheduling class and priority for a process:

ionice -p ex_process_id...

Multiple processes can be passed to the -p (--pid) option as a space-separated list.

You can set the I/O scheduling class and priority for a running process that your user owns like so:

ionice -c ex_class -n ex_priority_level -p ex_process_id

The -c (--class) option can take 0 (none), 1 (real-time), 2 (best-effort), and 3 (idle) as arguments. For the real-time and best-effort classes, the -n (--classdata) option takes a priority level from 0 (highest) to 7 (lowest). For processes owned by other users, you will need root privileges in order to run the above command (e.g., # ionice -c ex_class -n ex_priority_level -p ex_process_id).

Files as Storage Media

GNU/Linux is able to treat files like storage media, i.e., you can partition them, generate file systems, and generally treat the partitions on a file as if they were partitions on a real drive.

For example, you can create an empty file to work with using the dd command:

dd if='/dev/zero' of='example.img' bs=1M count=1024

The above command creates a .img file from null bytes (i.e., if='/dev/zero'). Several dd operands are used in the above command:

if=ex_file
Read from ex_file instead of from standard input.
of=ex_file
Write to ex_file instead of standard output.
bs=ex_bytes
Read and write up to ex_bytes bytes at a time (the default is 512 bytes). This overrides the ibs and obs operands.
count=ex_integer
Copy only ex_integer input blocks.
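
The size of the resulting file is bs multiplied by count. A minimal sketch that creates a much smaller scratch image and confirms its size, assuming GNU dd and GNU stat (the temporary file name comes from mktemp):

```shell
#!/bin/sh
# Create a 4 MiB scratch image and confirm that its size is bs * count.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=4 2>/dev/null

size=$(stat -c %s "$img")   # GNU stat: file size in bytes
echo "$size"                # 4194304 (4 * 1024 * 1024)

rm -f "$img"
```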

Before you can use the file, you have to ensure that there are device files available for the partitions (unlike with real storage media, this is not automatic for simulated storage media in files), as well as a device file for the example.img file as a whole (i.e., a loop device).

losetup

You can create a loop device with the losetup command. Loop devices make block-oriented devices from files:

# losetup -f ex_file

For example, to create a loop device for the example.img file that we created with dd, run # losetup -f 'example.img'.

losetup creates device filenames of the form /dev/loopn, where n is a number associated with a device (e.g., /dev/loop3). The -f (--find) option makes losetup find the first unused loop device.

Use losetup's -a (--all) option to show the status of all loop devices. This command can be used to verify which /dev/loopn device our example.img file is tied to:

# losetup -a
/dev/loop0: [64768]:262458 (/home/amnesia/example.img)

Above, we can confirm that example.img is associated with /dev/loop0.

kpartx

Once you have assigned your disk image to a loop device (e.g., /dev/loop0) and created partitions on the newly created loop device, you can create device files for its partition(s). This is done using the kpartx command, which makes partitions on loop devices accessible via device files. You may need to install this command from your distribution's repository (e.g., # apt install kpartx).

# kpartx -av ex_device_file

The -a option adds partition mappings and the -v option specifies verbose output.

The following example creates a partition device file from the single partition on the /dev/loop0 device:

# kpartx -av '/dev/loop0'
add map loop0p1 (253:2): 0 2093056 linear 7:0 2048

The created device files appear in the /dev/mapper/ directory (e.g., /dev/mapper/loop0p1). Now, you can create file systems on these partitions and mount them (e.g., # mount -o loop '/dev/mapper/loop0p1' '/mnt/') to your system's directory structure.

The device files for the partitions can be removed with the -d option:

# kpartx -dv ex_device_file

For example, to remove the /dev/mapper/loop0p1 file, we can run:

# kpartx -dv '/dev/loop0'
del devmap : loop0p1

An unused loop device can be released using the losetup command with the -d option:

# losetup -d ex_device_file...

The # losetup -D command releases all loop devices simultaneously.

Logical Volume Management (LVM)

Logical Volume Management (LVM) is a more flexible way of allocating space on drives than traditional partitioning schemes. Specifically, it can concatenate, stripe together, or combine partitions (or whole disks) into larger virtual partitions that you can re-size or move, without causing system disruptions.

In the LVM paradigm, Physical Volumes (i.e., disks and disk partitions) are broken up into Physical Extents.

Physical Volumes are combined into a Volume Group. A Volume Group can consist of Physical Volumes from different disks, but that is generally not recommended (i.e., if one disk fails, the entire Volume Group will be lost).

Logical Volumes are made from Physical Extents from one or more Physical Volumes in a Volume Group. A Logical Volume can be changed in size while the system is running, as long as your Volume Group has unused space available. The file system being used must support this, as well. ext3 and ext4 file systems allow you to increase their size while the file system is mounted, but you must unmount them to reduce the file system size.

When creating Logical Volumes, you can cause their storage space to be spread across several physical disks (i.e., striping) or multiple copies of their data to be stored in several places within the Volume Group at the same time (i.e., mirroring). The former may decrease retrieval time, while increasing the danger of losing data whenever any of the disks in the Volume Group fail. The latter may reduce the risk of losing data, while increasing access times. In reality, a hardware or software Redundant Array of Independent Disks (RAID) may be preferable.

If a disk in your Volume Group experiences problems, you can migrate the Logical Volumes from that disk to another in your Volume Group. Afterwards, you can withdraw the problematic disk, install a new disk, add that disk to the Volume Group, and migrate the Logical Volumes back.

With LVM, you can also take snapshots, which you can use for backup copies without having to take your system offline for hours. You can freeze the current state of a Logical Volume on another new Logical Volume (which is often done swiftly) and then make a copy of that new Logical Volume whenever you want, while normal operations continue on the old Logical Volume.

The snapshot Logical Volume only needs to be big enough to hold the amount of changes to the original Logical Volume you expect to accrue while the backup is being made (plus a sensible safety margin), since only the changes are being stored inside of the new snapshot Logical Volume. Snapshots can be writable, as well.

On GNU/Linux, LVM is a special application of the previously mentioned device mapper (/dev/mapper/), a system component enabling the flexible use of block devices. The device mapper also provides other useful features (e.g., encrypted disks and space-saving storage provisioning for virtual servers).

File Systems

A file system is a method of storing and finding files on a disk. File systems should be placed within a partition.

A networking file system can be thought of as a grouping of lower-level file systems of varying types.

Object Types

A GNU/Linux file system has six different types of objects (files):

  1. Plain files - Includes texts, graphics, sound files, executable programs, etc.
  2. Directories - Help structure storage. Essentially, a directory is a table providing filenames and associated inode numbers.
  3. Symbolic links - Contain a path specification redirecting access to the link to a different file.
  4. Device files - Serve as interfaces to arbitrary devices. Every read or write access to such a file is redirected to the corresponding device.
  5. FIFOs - Like pipes, FIFOs (First In, First Out; also referred to as named pipes) allow the direct communication between processes without using intermediate files. One process opens the FIFO for writing and another opens it for reading. Unlike pipes, FIFOs are real files and have access rights (i.e., associated file system permissions).
  6. UNIX-domain sockets - Like FIFOs, UNIX-domain sockets are a method of inter-process communication (IPC). Basically, they use the same programming interface as network communications across TCP/IP, but only work for communication on the same computer. Unlike FIFOs, UNIX-domain sockets allow bidirectional communication.
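
As an illustration of the FIFO object type from the list above, the following sketch creates a FIFO with mkfifo, writes to it from a background process, and reads from it in the foreground. The directory and file names are placeholders created with mktemp.

```shell
#!/bin/sh
dir=$(mktemp -d)
fifo="$dir/example.fifo"
mkfifo "$fifo"

# A FIFO is a real file: ls shows it with file type "p" and permissions.
ls -l "$fifo"

# Writer: a background process opens the FIFO for writing.
echo 'hello through the FIFO' > "$fifo" &

# Reader: the foreground shell opens the same FIFO for reading.
read -r message < "$fifo"
echo "$message"

wait
rm -r "$dir"
```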

Filenames

For GNU/Linux filenames, you can use any characters except for the / and the zero byte (i.e., the character with the ASCII value of 0). The / character serves to separate directories from files and directories from other directories in pathnames. Also, the / is used to denote the root directory (i.e., the root of the directory tree).

Any kind of special character (i.e., a character special to the shell) must be escaped with a \ or surrounded in single quotes in order to avoid misinterpretation by the shell. Also, what happens to special characters depends on your system's locale settings, since there is no general standard for representing characters exceeding the ASCII character set. It is probably best to avoid using special characters in filenames to maintain the greatest cross-system compatibility.

GNU/Linux file systems are case-sensitive. There is no universal upper limit on the length of a filename; the maximum depends on the file system being used. A typical upper limit is 255 bytes.

All characters from the following set can be freely used in filenames:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789+-._

In general, filenames should always start with a letter or digit.
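
One possible way to check a name against the character set and the first-character guideline above is a small shell function. is_portable_name is a hypothetical helper written for this example, not a standard command.

```shell
# is_portable_name NAME
# Succeeds if NAME uses only A-Z, a-z, 0-9, and + - . _ and starts with
# a letter or digit.
is_portable_name() (
    LC_ALL=C                # keep the bracket ranges limited to ASCII
    export LC_ALL
    case $1 in
        '')                  exit 1 ;;  # empty name
        [!A-Za-z0-9]*)       exit 1 ;;  # first character is not a letter or digit
        *[!A-Za-z0-9+._-]*)  exit 1 ;;  # contains a character outside the set
        *)                   exit 0 ;;
    esac
)

for name in 'report-2024.txt' '.hidden' 'my file.txt' '-rf'; do
    if is_portable_name "$name"; then
        printf '%s: ok\n' "$name"
    else
        printf '%s: avoid\n' "$name"
    fi
done
```

Only report-2024.txt passes. The other names are legal on GNU/Linux, but they either start with a character other than a letter or digit or contain a space.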

.. refers to a directory's parent directory (e.g., for /home/amnesia/, .. refers to /home/). . refers to the current directory.

GNU/Linux does not use suffixes to characterize a file's type, even though it is still wise to use them to make it easier to identify file content. Also, certain programs insist on their input files having specific suffixes (e.g., the C compiler gcc).

File System Components

A traditional GNU/Linux file system has two primary components:

  1. A pool of data blocks where you can store data
  2. A database system that manages the data pool (i.e., the inode table)

The database is centered around the inode data structure. An inode is a set of data that describes a particular file, including its type, permissions, and where in the data pool the file data resides. It describes everything about a file, except the filename.

Inodes are identified by numbers listed in the inode table. For any ext2/3/4 file system, you start at inode number 2, i.e., the root inode. You can tell that you are in the root inode if you look at its data pool representation and you see that it does not have a parent directory (i.e., ..).

Filenames and directories are also implemented in inodes. A directory inode contains a list of filenames and corresponding links to other inodes. Essentially, a directory is a table mapping filenames to inode numbers (this mapping is also referred to as a dentry, which is short for directory entry).

Various file system objects require inodes, but no data block (e.g., device files, FIFOs, short symbolic links). Therefore, you can run out of inodes before running out of data blocks.

You can view the inode information for an object with the stat and ls -i commands.
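
Because a filename is only a directory entry that points at an inode, two names can refer to the same inode. The following sketch illustrates this with a hard link; the file names are arbitrary.

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/original"
ln "$workdir/original" "$workdir/alias"   # hard link: a second name for the same inode

inode_of() {
    # Print the inode number that `ls -i` reports for "$1".
    ls -di "$1" | awk '{print $1}'
}

original_inode=$(inode_of "$workdir/original")
alias_inode=$(inode_of "$workdir/alias")
link_count=$(ls -l "$workdir/original" | awk '{print $2}')

printf 'original=%s alias=%s links=%s\n' \
    "$original_inode" "$alias_inode" "$link_count"

rm -r "$workdir"
```

Both names report the same inode number, and the link count in the second column of ls -l is 2.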

A file system label is a piece of arbitrary text of up to 16 characters that is placed in a file system's superblock. The superblock is the block at the beginning of the file system that contains information about the structure of the file system.

A file system Universally Unique Identifier (UUID) is automatically generated when a file system is created. You can determine a file system's UUID using the lsblk -o +UUID or blkid commands, or by looking in /dev/disk/by-uuid/.

The Virtual File System

The Linux kernel uses the Virtual File System (VFS) switch, which abstracts file system operations (e.g., the opening and closing of files, the reading and writing of data) to enable the coexistence of different file system implementations in GNU/Linux distributions. The VFS allows client applications to access different types of real file systems in a uniform way.

Character and Block Devices

GNU/Linux distinguishes between character devices and block devices.

A character device provides or processes single characters.

A block device treats data in blocks. In ls -l's output, character devices are labeled with a c and block devices are labeled with a b.

Each device has a major and minor number. The major number specifies the device's type and governs which kernel driver is in charge of the device. The minor number is used by the driver to distinguish between similar or related devices, or to denote the various partitions of a disk.
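
As an illustration, the type and the major and minor numbers of a device file can be read without root privileges. This sketch assumes the stat command from GNU coreutils, whose %t and %T format sequences print the major and minor numbers in hexadecimal, and it uses /dev/null as an example device.

```shell
device=/dev/null

# First character of the mode string: c = character device, b = block device.
dev_type=$(ls -l "$device" | cut -c1)

# Convert the hexadecimal values printed by stat to decimal.
major=$(printf '%d' "0x$(stat -c '%t' "$device")")
minor=$(printf '%d' "0x$(stat -c '%T' "$device")")

printf '%s: type=%s major=%s minor=%s\n' "$device" "$dev_type" "$major" "$minor"
```

On a typical Linux system, /dev/null is reported as a character device with major number 1 and minor number 3. The same two numbers appear, separated by a comma, in place of the file size in ls -l's output.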

Usually, device names for Serial Advanced Technology Attachment (SATA) mass storage are /dev/sda, /dev/sdb, etc., in the order the devices are recognized. Partitions are numbered /dev/sda1, /dev/sda2, etc.

If /dev/sda is partitioned according to the MBR scheme, /dev/sda1 through /dev/sda4 correspond to the primary partitions (possibly including an extended partition), while logical partitions are numbered starting with /dev/sda5, regardless of whether there are four primary partitions on the disk or fewer. This scheme allows for up to 15 partitions per device.

The s in /dev/sda derives from Small Computer System Interface (SCSI), and GNU/Linux manages almost all mass storage devices as SCSI devices.

On older Parallel Advanced Technology Attachment (PATA) disks, a different mechanism is used, which directly accesses the IDE controllers inside the computer. The two drives connected to the first controller are called /dev/hda and /dev/hdb, and the ones connected to the second controller are called /dev/hdc and /dev/hdd. Partitions on PATA disks are also numbered (e.g., /dev/hda1, /dev/hda2). This scheme allows for up to 63 partitions per device.

The Filesystem Hierarchy Standard (FHS)

The Filesystem Hierarchy Standard (FHS) defines the directory structure and contents in GNU/Linux distributions. It is maintained by the Linux Foundation.

According to the FHS, the following directories are required:

/bin/
Essential command binaries for all users.
/boot/
Static files of the bootloader. Kernel image, kernel system map (System.map-ex_kernel), kernel configuration file (config-ex_kernel), and boot files (e.g., initrd.img-ex_kernel or initramfs-ex_kernel.img, grub or grub2 directory).
/dev/
Device files. These files are also known as device nodes and form the interface between the shell and the device drivers inside the kernel. They have no content and refer to a driver inside the kernel via device numbers.
/etc/
Host-specific system configuration files.
/lib/
Essential shared libraries and kernel modules (i.e., /lib/modules/). This directory contains drivers, file systems, the packet filter infrastructure, and network protocols that the kernel can load and unload on-the-fly.
/media/
Mount points for removable media (newer systems use /run/media/).
/mnt/
Temporary mount points for the system administrator.
/opt/
Add-on application software packages.
/run/
Data relevant to running processes.
/sbin/
Essential system command binaries (e.g., binaries needed for boot).
/srv/
Data for services provided by the system.
/tmp/
Temporary files. Typically cleared during boot.
/usr/
Secondary hierarchy. This directory contains shareable read-only programs/data that are not host-specific and do not vary with time.
/var/
Variable data (e.g., data associated with logging, Web services, FTP services, print queues, low-level package manager databases like /var/lib/dpkg/, or the NTP driftfile). /var/tmp/ is not cleared during boot.

The two most common directories to give their own partition are /home/ and /var/. Secondary candidates for their own partitions are /boot/, /tmp/, and /usr/.

The /usr/ subdirectories are:

/usr/bin/
Non-essential user command binaries. This is the primary directory of executable commands on the system.
/usr/include/
Header files (i.e., declarations of functions and data types) for C programs.
/usr/lib/
Non-essential shared libraries.
/usr/local/
For use by the system administrator when locally installing software, and when the software needs to be safe from being overwritten when the system is updated. This directory may be used for programs and data that are shareable among a group of hosts, but not found in /usr/.
/usr/sbin/
Non-essential system command binaries.
/usr/share/
Shared data used by applications (e.g., /usr/share/zoneinfo/). Generally architecture-independent.
/usr/src/
Source code, usually for the Linux kernel.

A non-FHS directory that only exists on ext file systems is lost+found. Files whose inodes are not referenced from any directory (e.g., because they were unlinked and their name was erased, or because the file system was corrupted during a kernel panic or power failure) are placed here, using the inode number as the filename. They can be moved elsewhere from here.

The file system checker creates links to the files in the lost+found directory on the same file system. This way, the system administrator can determine if the file is recoverable and where the file really belongs.

Pseudo File Systems

Pseudo file systems (also referred to as synthetic file systems) are hierarchical interfaces to non-file objects that appear as regular files in a real file system's directory tree.

The GNU/Linux pseudo file systems are:

procfs (mounted on /proc/)
Exposes kernel information to user space. Can be used to view information about system hardware. Each process has a directory in /proc/.
sysfs (mounted on /sys/)
Gives information about the system and hardware. Can be used to alter system parameters and for debugging purposes. /sys/ is a more structured successor to /proc/.
tmpfs (mounted on /run/)
Stores files in virtual memory (i.e., the page cache and, when memory runs short, swap space) instead of on disk.

The Extended File System (ext)

The extended file system (ext) was the first file system specifically created for the Linux kernel.

ext2 pushed various size limits and added support for file access time, modification time, and change time values. Most data structures contained surplus space, which enabled the use of important extensions, like Access Control Lists (ACLs) and extended attributes.

ext3 added journaling, the ability to enlarge mounted file systems, and more efficient data structures for directories with many entries. Usually, it is possible to access ext3 file systems as ext2 file systems (i.e., mounting an ext3 file system as an ext2 file system; new features cannot be used) and vice-versa.

Journaling

A journaling file system considers every write access to the disk as a transaction that must be completely performed, or not at all. By definition, the file system is consistent before and after a transaction is performed. Every transaction is first written into a special area of the file system called the journal (you can view a file system's journal using the debugfs command, e.g., run # debugfs, then open ex_partition_file, then logdump).

If a transaction has been entirely written to the journal, it is marked as complete and is official. If the system crashes, a journaling file system does not need to undergo a complete file system check. Instead, the journal is considered and any transactions marked complete are transferred to the actual file system. Transactions not marked complete are discarded.

ext3 provides a choice between three operating modes:

  1. Writing metadata and data to the journal before writing it to the file system (mount option data=journal).
  2. Writing data blocks directly to disk and then metadata to the journal (mount option data=ordered). If a file is being written or appended to and there is a disruption, the journal will indicate that the new file or appended data has not been committed, and it will be purged by the cleanup process. Files being overwritten can become corrupted because the original version of the file is not stored, and the resulting file will end up in an intermediate state (i.e., the new data never completely made it to the disk, and the old data is not stored anywhere).
  3. No restrictions, i.e., only metadata is saved to the journal, but it can be journaled either before or after the data is written to disk (mount option data=writeback).

The default mode is data=ordered.

ext4

Instead of maintaining the data blocks of individual files as lists of block numbers, ext4 uses extents, i.e., groups of physically contiguous blocks on disk. This leads to greater efficiency, but makes file systems using extents incompatible with ext3. Specifically, ext3 and ext2 file systems can be mounted as ext4 file systems and will benefit from internal improvements in ext4. However, ext4 file systems cannot be accessed as ext3 or ext2 file systems.

Other ext4 improvements include:

  • When data is written, actual blocks on the disk are assigned as late as possible, which helps prevent fragmentation.
  • User programs can advise the operating system of how large a file is going to be, which can be used to assign contiguous file space and mitigate fragmentation.
  • Checksums are used to safeguard the journal, which increases reliability and avoids some problems when the journal is replayed after a system crash.
  • Various optimizations of internal data structures, which increases the speed of consistency checks.
  • Timestamps now carry nanosecond resolution and roll over in 2446, rather than in 2038.
  • Some size limits have increased, e.g., directories may now contain 64,000 or more subdirectories (previously 32,000), files can be as large as 16 TiB (tebibytes), and file systems as large as 1 EiB (exbibytes).

File System Creation Commands

There are several commands for creating file systems on GNU/Linux.

mkfs

A file system can be created with the mkfs command:

# mkfs.ex_file_system_type ex_partition_file

Above, ex_file_system_type can be:

  • btrfs
  • cramfs
  • ext2
  • ext3
  • ext4
  • fat
  • hfsplus
  • minix
  • msdos
  • ntfs
  • vfat
  • xfs

ext file systems can be created with the mke2fs command, as well (mkfs.ext commands are just symbolic links to the mke2fs command):

# mke2fs ex_partition_file

If you do not specify the file system type with mke2fs's -t option or use the -j option (which creates the file system with an ext3 journal), the default file system of ext2 will be used. The default parameters that the mke2fs command uses are specified in /etc/mke2fs.conf.

For example, the following command creates an ext4 file system on the sda2 partition:

# mke2fs -j -t ext4 '/dev/sda2'

mke2fs options include:

-b ex_block_size
Determine the block size. Typical values are 1024, 2048, or 4096.
-c
Check the device for damaged blocks before creating the file system.
-F
Format file system objects that are not block device files (e.g., a cdrom.img file). This is a shortcut to the previously mentioned Files as Storage Media approach.
# dd if='/dev/zero' of='example.img' bs=1M count=1024
# mke2fs -F 'example.img'
# mount -o loop 'example.img' '/mnt/'
# cp ex_file '/mnt/'
# umount '/mnt/'
-i ex_inode_density
Set the inode density (i.e., bytes per inode). The value must be a multiple of the block size. The minimum value is 1024. The default value is the current block size.
-j
Create a file system with an ext3 journal. The default journal parameters will be used to create an appropriately sized journal.
-m ex_data_block_percentage
Set the percentage of data blocks reserved for the root user. The default is 5%.
-N ex_inode_number
Specify how many inodes are created on the file system.
-S
Cause mke2fs to rewrite the superblocks (the superblock is the block at the beginning of the file system that contains information about the structure of the file system) and group descriptors, and leave the inodes intact. This is useful when trying to address a corrupted superblock. e2fsck should be run after this option is used.
-t ex_file_system_type
Specify the file system type. Accepted values are ext2, ext3, and ext4.
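
To get a feel for the -i option described above, the number of inodes can be estimated by dividing the file system size by the bytes-per-inode value. This is only a rough sketch: estimate_inodes is a hypothetical helper, and the figure that mke2fs actually produces will differ somewhat because inodes are allocated per block group.

```shell
# estimate_inodes SIZE_BYTES BYTES_PER_INODE
# Rough inode count for a file system created with `mke2fs -i BYTES_PER_INODE`.
estimate_inodes() {
    echo $(( $1 / $2 ))
}

one_gib=$(( 1024 * 1024 * 1024 ))
estimate_inodes "$one_gib" 16384   # one inode per 16 KiB -> 65536
estimate_inodes "$one_gib" 4096    # one inode per 4 KiB  -> 262144
```

A smaller bytes-per-inode value suits file systems that will hold many small files, at the cost of more space being set aside for the inode table.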

genisoimage

The genisoimage command is used to create an ISO file system from a directory tree:

# genisoimage -o ex_image.iso ex_directory...

Multiple directories can be passed to genisoimage as a space-separated list.

Likely, genisoimage will need to be installed from your GNU/Linux distribution's repository:

# apt install genisoimage (Debian)

# dnf install genisoimage (Fedora)

mknod and mkfifo

The mknod command creates special node files:

# mknod ex_device_name ex_device_type ex_major_number ex_minor_number

ex_device_name is the name you are giving to the device node.

ex_device_type can be:

  • b (block)
  • c (character)
  • u (character unbuffered)
  • p (named pipe, FIFO)

ex_major_number and ex_minor_number must be specified for device types b, c, and u, and omitted for device type p.

An alternative method to create a named pipe is to use the mkfifo command:

# mkfifo ex_named_pipe...
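
A minimal sketch of two processes communicating through a named pipe follows. The FIFO name and the message are made up for illustration, and no root privileges are needed.

```shell
workdir=$(mktemp -d)
fifo="$workdir/messages"
mkfifo "$fifo"

# Writer: runs in the background and blocks until a reader opens the FIFO.
printf 'hello through the pipe\n' > "$fifo" &

# Reader: opens the same FIFO and receives the writer's line.
read -r received < "$fifo"
wait

printf 'received: %s\n' "$received"
rm -r "$workdir"
```

No data is stored in the FIFO file itself; the kernel passes the bytes directly from the writing process to the reading process.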

mdadm

mdadm can be used to create Redundant Array of Independent Disks (RAID) sets from separate disks.

# mdadm --zero-superblock ex_device_file_1 ex_device_file_2...
# mdadm --create ex_raid_array --level=ex_level \
    --raid-devices=ex_number_of_devices ex_device_file_1 ex_device_file_2...

The --zero-superblock option overwrites any existing valid multiple device (MD) superblocks on the specified devices with zeros. --create creates a new array, --level sets the RAID level, and --raid-devices specifies the number of active devices in the array.

You may need to install the mdadm command from your distribution's repository (e.g., # apt install mdadm).

mkswap

A swap partition or file can be made with the mkswap command:

# mkswap ex_partition_file

You can use files as swap space, too. Initially, the file must be filled with null bytes (i.e., zero bytes), e.g., # dd if='/dev/zero' of='swapfile' bs=1M count=256.

mkswap writes administrative data to the partition/file you pass to the command as an argument. Afterwards, you will need to enable (activate) the swap partition/file with the swapon command:

# swapon ex_partition_file...

# swapon ex_swap_file...

The partition's file system label or UUID can be used instead of ex_partition_file by using swapon's -L ex_file_system_label or -U ex_uuid options, respectively.

To deactivate a swap partition/file, use the swapoff command:

# swapoff ex_partition_file...

# swapoff ex_swap_file...

GNU/Linux uses swap space to store part of the contents of system Random Access Memory (RAM). Therefore, the effective amount of working memory available to your system is greater than the amount of RAM in your computer.

You can operate up to 32 swap partitions in parallel. The maximum size depends on the computer's architecture. If you have several disks, you should spread your swap space across all of them, which should noticeably increase speed.

Swap space should be about the same size as the computer's RAM, up to a maximum of 8 GiB or so. The system usually takes care of activating/deactivating swap partitions, as long as you put the partitions in the /etc/fstab file.

To deactivate and then reactivate all swap space listed in /etc/fstab, run:

# swapoff -a && swapon -a

The -a (--all) option ensures that all devices marked as swap in /etc/fstab are targeted, except for those with the noauto option.

GNU/Linux can prioritize swap space. This is worth doing if the disks containing your swap space have different speeds. Run man 8 swapon for more information.
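
The sizing rule of thumb mentioned above (about the same size as RAM, up to roughly 8 GiB) can be expressed as a small calculation. suggest_swap_kib is a hypothetical helper, and the 8 GiB cap is taken from the guideline in this section rather than from any hard limit.

```shell
# suggest_swap_kib RAM_KIB
# Apply the rule of thumb: same size as RAM, capped at 8 GiB.
suggest_swap_kib() {
    cap_kib=$(( 8 * 1024 * 1024 ))          # 8 GiB expressed in KiB
    if [ "$1" -gt "$cap_kib" ]; then
        echo "$cap_kib"
    else
        echo "$1"
    fi
}

suggest_swap_kib $(( 4 * 1024 * 1024 ))     # 4 GiB of RAM  -> 4194304
suggest_swap_kib $(( 32 * 1024 * 1024 ))    # 32 GiB of RAM -> 8388608
```

On a running system, the amount of RAM in KiB can be read from the MemTotal line of /proc/meminfo and passed to the function.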

File System Mounting Commands

Mounted file system information can be examined, in increasing detail, by:

  1. Examining the output of the mount command.
  2. Running less /etc/mtab to display the contents of the mount table file, which lists the currently mounted file systems.
  3. Running less /proc/mounts.
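
The mount table can also be queried from a script. The following sketch looks up the file system type for a mount point; fs_type_of is a hypothetical helper, and the sample data stands in for the live /proc/mounts so that the example produces the same output on any machine.

```shell
# fs_type_of MOUNT_POINT [MOUNTS_FILE]
# Print the file system type mounted on MOUNT_POINT. Fields in
# /proc/mounts are: device, mount point, type, options, dump, pass.
# If several file systems are stacked on one mount point, the last wins.
fs_type_of() {
    awk -v target="$1" \
        '$2 == target { type = $3 } END { if (type != "") print type }' \
        "${2:-/proc/mounts}"
}

# Self-contained demonstration using sample data instead of the live table.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sda2 / ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
/dev/sda1 /boot ext2 rw,noatime 0 0
EOF

root_type=$(fs_type_of / "$sample")
boot_type=$(fs_type_of /boot "$sample")
printf '/ is %s, /boot is %s\n' "$root_type" "$boot_type"
rm "$sample"
```

Calling fs_type_of with only a mount point (e.g., fs_type_of /) reads the real /proc/mounts.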

mount

In principle, it is possible to directly access data stored on a medium through a device file. However, well-known file-management commands can only access files via the directory tree. To use these commands, storage media must be made a part of the directory tree (i.e., mounted) using their device files. This is done with the mount command.

The place in a directory tree where a file system is to be mounted is called a mount point. This can be any directory. The directory does not have to be empty, but you will not be able to access the original directory content while another file system is mounted over it. The original directory content will reappear once the file system is unmounted. You should not use important system directories as mount points.

A file system can be manually mounted like so:

# mount ex_partition_file ex_mount_point

ex_partition_file can be replaced with the file system's UUID (with the use of the -U ex_uuid option) or the file system's label (with the use of the -L ex_file_system_label option). If the partition in question is in the /etc/fstab file, you can provide mount with just the partition file or the mount point, instead of both.

Optionally, you can specify the file system type with the -t ex_file_system_type option, but the Linux kernel can usually figure this out for itself. If mount cannot figure out what to do with a type specification, it tries to delegate the job to the appropriate /sbin/mount.ex_type command (e.g., mount.exfat, mount.fuse, mount.ntfs).

For the -t option, auto means that mount should try to determine the file system's type. To do this, mount tries the libblkid and libvolume_id libraries, which are able to determine which type of file systems exist on a device. If unsuccessful, mount uses the /etc/filesystems file. If /etc/filesystems does not exist, the /proc/filesystems file is utilized. /proc/filesystems is also read if /etc/filesystems ends with a line containing just an asterisk (*).

The Linux kernel generates /proc/filesystems dynamically based on those file systems for which it contains drivers (i.e., in /lib/modules/). /etc/filesystems is useful if you want to specify an order for mount's guesswork that deviates from the one resulting from /proc/filesystems, which you cannot influence.
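
When mount falls back to /proc/filesystems, it skips the entries flagged nodev, since those file systems are not backed by a block device. A sketch of that filtering is shown below; disk_filesystems is a hypothetical helper and the sample file only imitates the format of /proc/filesystems.

```shell
# disk_filesystems [FILE]
# List file system types that need a block device, i.e., lines in
# /proc/filesystems that are not flagged with "nodev".
disk_filesystems() {
    awk '$1 != "nodev" { print $1 }' "${1:-/proc/filesystems}"
}

# Sample data in the same two-column, tab-separated layout.
sample=$(mktemp)
printf 'nodev\tsysfs\nnodev\tproc\n\text4\nnodev\ttmpfs\n\tvfat\n' > "$sample"

candidates=$(disk_filesystems "$sample" | tr '\n' ' ')
printf 'block-device file systems: %s\n' "$candidates"
rm "$sample"
```

Running disk_filesystems without an argument lists the candidates for the running kernel, in the order mount would try them.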

mount options include:

-a, --all
Mount all file systems listed in /etc/fstab (except for those whose line contains the noauto keyword). File systems are mounted following their order in /etc/fstab.
-l, --show-labels
Add file system labels in the mount output.
-L ex_file_system_label, --label ex_file_system_label
Mount the partition that has the specified label.
-n, --no-mtab
Mount without writing in /etc/mtab. This is required when /etc is mounted on a read-only file system.
-o ex_mount_options, --options ex_mount_options
Use the specified mount options provided to this option as a comma-separated list (e.g., noatime,nodev,nosuid).
-r, --read-only
Mount the file system as read-only.
-t ex_file_system_type, --types ex_file_system_type
Specify the file system type, which is passed to this option as an argument.
Common types include ext2, ext3, ext4, xfs, btrfs, vfat, sysfs, proc, nfs and cifs. Run less /proc/filesystems for a complete list of file systems.
-U ex_file_system_uuid, --uuid ex_file_system_uuid
Mount the partition with the specified UUID.
-w, --rw, --read-write
Mount the file system as read/write.

File system independent options that can be used with mount's -o option include:

async
Enable asynchronous input/output (I/O). Changes are cached and written when the system is not busy.
atime
Specify that the file access time is updated in the file's inode.
auto
The file system is automatically mounted on boot.
defaults
Implement the default options of rw,suid,dev,exec,auto,nouser,async.
dev
Interpret character and block special devices (i.e., device files) on the file system.
exec
Allow binaries on the file system to be executed.
noatime
Specify that the file access time of files is not updated in the file's inode. Usually, this provides better overall file system performance.
noauto
The file system is not automatically mounted on boot.
nodev
Do not interpret character and block special devices (i.e., device files) on the file system.
noexec
Prevent binaries on the file system from being executed.
nosuid
Block the use of SUID and SGID bits.
nouser
Only the root user can mount the file system.
ro
Mount a file system in read-only mode.
rw
Mount a file system in read/write mode.
suid
Enable the use of set user ID (SUID) and set group ID (SGID) bits.
sync
Enable synchronous input/output (I/O). Changes are immediately written. This is useful for removable media.
user
Allow a regular user to mount the file system. Only the user who mounted the file system (or the root user) can unmount it.
users
Allow any regular user to mount and unmount the file system.

For more file system independent options and file system specific options, run man 8 mount.

/etc/fstab

/etc/fstab is a system configuration file that lists which file systems should be mounted on which mount points by the mount command at boot time.

The /etc/fstab format is:

ex_device ex_mount_point ex_file_system_type ex_mount_options ex_dump_order ex_fsck_order

ex_device can be a file system UUID (UUID=ex_UUID) or label (LABEL=ex_label), a local partition /dev/ file (e.g., /dev/sda1), or a Network File System (NFS) share.

If ex_fsck_order is set to 0, the associated device will not be checked by the fsck command.
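
To make the six-field layout concrete, the following sketch splits sample /etc/fstab lines into their fields and reports whether fsck would check each file system. The UUID, devices, and mount points are made up for illustration.

```shell
# Write a sample fstab; every value in it is hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# device                                  mount  type  options         dump fsck
UUID=0a1b2c3d-1111-2222-3333-444455556666 /      ext4  defaults        0    1
/dev/sda3                                 /home  ext4  noatime,nosuid  0    2
/dev/sdb1                                 /data  xfs   noauto,user     0    0
EOF

# Split each non-comment line into its six whitespace-separated fields.
summary=$(
    while read -r device mount_point fs_type options dump_order fsck_order; do
        case $device in
            ''|'#'*) continue ;;            # skip blank lines and comments
        esac
        if [ "$fsck_order" -eq 0 ]; then
            checked='not checked by fsck'
        else
            checked="fsck order $fsck_order"
        fi
        printf '%s on %s (%s): %s\n' "$mount_point" "$device" "$fs_type" "$checked"
    done < "$sample"
)
printf '%s\n' "$summary"
rm "$sample"
```

The root file system conventionally gets fsck order 1 and other checked file systems get 2, so that the root file system is checked first.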

mount Examples
Mount an ext4 File System as Read/Write With the SUID and SGID Bits Disabled

# mount -t ext4 -o nosuid,rw '/dev/sda2' '/mnt/'

Mount a Remote File System as a One-Time Mount

# mount pro_server_952:'/pro_site' '/mnt/'

Mount a Read-Only Optical Drive

# mount -t iso9660 '/dev/sr0' '/mnt/'

Mount a Read/Write Optical Drive

# mount -t iso9660 '/dev/sg0' '/mnt/'

Mount an ISO Image as Read-Only

# mount -t iso9660 -o ro "${HOME}/Downloads/tails.iso" '/mnt/'

Mount a USB Device

# mount -t auto '/dev/sdc' '/mnt/'

Create a tmpfs File System

The mount command can also be used to create a tmpfs file system:

# mount -t tmpfs -o size=ex_size,mode=ex_mode tmpfs ex_mount_point

The size option is used to specify an upper limit of the file system and is given in bytes, which are rounded up to entire pages. The size may have a k, m, or g suffix for kibibytes, mebibytes, and gibibytes, respectively. The size may also have a % suffix to limit this instance to a percentage of physical RAM. If no size is specified, the default of size=50% is used.

The mode option is used to set the initial permissions of the tmpfs root directory.
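
The rounding of the size option to entire pages can be illustrated with a short calculation. round_up_to_pages is a hypothetical helper, and the 4096-byte page size is an assumption; the getconf PAGESIZE command reports the actual value on a given system.

```shell
# round_up_to_pages SIZE_BYTES PAGE_SIZE
# Round a requested tmpfs size up to a whole number of memory pages.
round_up_to_pages() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

page_size=4096                          # assumed page size in bytes
round_up_to_pages 5000 "$page_size"     # 5000 bytes need 2 pages -> 8192
round_up_to_pages 8192 "$page_size"     # already page-aligned    -> 8192
```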

tmpfs is an implementation of a RAM Disk Filesystem, which stores files not on disk, but in a system's virtual memory. Therefore, files can be accessed more quickly, but seldom used files can still be moved to swap space.

Virtual memory is a memory management technique that makes it seem to each process that it has the entirety of the system's memory to itself. A memory page is a fixed-length contiguous block of virtual memory that describes a single entry in the page table. It is the smallest unit of data for memory management in a virtual memory system.

umount

A file system can be unmounted with the umount command:

# umount ex_partition_file...

Alternatively, you can provide a file system's mount point to the umount command to unmount it:

# umount ex_mount_point...

File System Management Commands

df

You can list file systems, their sizes, space used, space available, used percentage, and mount point with the df (disk free) command:

df -h

The -h option is short for --human-readable and makes the command's output clearer.

To limit df's output to local file systems, add the -l (--local) option:

df -hl

To list file systems, the number of inodes they support, the inodes used, the inodes free, used inode percentage, and mount point, use the df -i command (-i is short for --inodes).
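
For use in scripts, df's output is easier to parse with the -P option, which selects the POSIX output format and keeps each file system on a single line. The following sketch reports which file system holds the current directory and how much space is available on it. It assumes that the mount point does not contain spaces.

```shell
path=.

# Columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
available_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
mounted_on=$(df -Pk "$path" | awk 'NR == 2 { print $6 }')

printf '%s is on the file system mounted at %s (%s KiB available)\n' \
    "$path" "$mounted_on" "$available_kib"
```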

fsck

The fsck (file system consistency check) command checks and repairs GNU/Linux file systems. Before using fsck on a file system, you should ensure that it has been unmounted.

Without any arguments, fsck will serially check all file systems listed in /etc/fstab:

# fsck

The above command is equivalent to # fsck -As.

fsck can use a partition's device file, or a file system's mount point, label, or UUID as an argument. It can also take multiple arguments as a space-separated list (e.g., # fsck ex_partition_file_1 ex_partition_file_2).

Like the mkfs command, fsck points to specific programs that actually perform the command's function (e.g., fsck.ext4 actually points to e2fsck, which can handle file system checking for all ext file systems).

fsck options include:

-A
Cause fsck to check all file systems mentioned in /etc/fstab by the fsck checking order listed in the sixth column of /etc/fstab. Parallel checks for file systems with the same checking order are allowed.
-C
Include a status bar in the command output.
-N
Display what fsck would do without actually doing it (i.e., perform a dry run).
-R
When the -A option is also used, skip the root file system.
-s
Inhibit parallel checking of multiple file systems (i.e., serialize fsck operations).
-t ex_file_system_type...
Specify the type(s) of file system to be checked. A comma-separated list of file systems and options are provided to this option as an argument (e.g., # fsck -t ext3,ext4,opts=ro '/dev/sdc1'). If the -A option is also specified, only file systems that match the provided argument list will be checked in /etc/fstab.
Run man 8 fsck for more information on this option's implementation.
-V
Output verbose messages about the check run.

Besides its own options, additional file system specific options that fsck does not understand are passed to the file system specific checkers (e.g., fsck.ext2). These options should occur after the name of the file system to be checked and after a -- separator (e.g., # fsck '/dev/sdb1' -- -D).

File system specific options for ext file systems (fsck.ext2, fsck.ext3, fsck.ext4, e2fsck) include:

-b ex_superblock_number
Use ex_superblock_number instead of using the normal superblock. If an alternative superblock is provided and the file system in question is not opened read-only, e2fsck will ensure that the primary superblock is appropriately updated upon completion of the file system check. This can be used to restore a damaged superblock.
You can obtain a superblock number for an ext file system using the dumpe2fs command (e.g., # dumpe2fs '/dev/sda1' | grep 'superblock'). A superblock can be restored with the # e2fsck -b ex_superblock_number -f ex_partition_file command (the -f option, short for --force, forces file system checking, even if it seems clean).
The superblock is the block at the beginning of the file system that contains information about the structure of the file system. It contains information pertaining to the file system as a whole (e.g., when it was last mounted or unmounted, and whether it was unmounted cleanly or left in an unclean state by a system crash).
The superblock also normally points to other parts of the management data structures (e.g., where the inodes or the free/unoccupied block lists are to be found and which parts of the medium are available for data). Most GNU/Linux file systems keep backup copies of the superblock at various locations in the partition.
-B ex_block_size
Force e2fsck to look for the superblock only at the specified block size, instead of searching for it at various block sizes. You can determine the current block size for an ext file system with the # tune2fs -l ex_partition_file command.
-c
Search the file system for bad blocks using the badblocks program. If bad blocks are found, they are added to the bad block inode to prevent them from being allocated to a file or directory.
-D
Optimize directories in the file system, either by reindexing them (if the file system supports directory indexing), or by sorting and compressing directories for smaller directories, or for file systems using traditional linear directories.
-f
Force checking even if the file system seems clean.
-l ex_file
Read the list of bad blocks from ex_file and mark these blocks as used. The format of the ex_file is the same as the one generated by the badblocks program.
-p
Automatically repair (preen) the file system.
-v
Verbose mode.
-y
Answer yes to any questions.

e2fsck performs the following steps:

  • The command line arguments are checked.
  • The program checks whether the file system in question is mounted.
  • The file system is opened.
  • The superblock is checked for readability.
  • The data blocks are checked for errors.
  • The superblock information on inodes, blocks, and sizes is compared with the current system state.
  • Directory entries are checked against inodes.
  • Every data block that is marked used is checked for existence and whether it is referred to exactly once by some inode.
  • The number of links within directories is checked with the inode link counters (must match).
  • The total number of blocks must equal the number of free blocks, plus the number of used blocks.

At program termination, fsck/e2fsck passes information about the file system state to the shell.

fsck Error Codes

Error Code  Description
0           No errors
1           File system errors corrected
2           System should be rebooted
4           File system errors left uncorrected
8           Operation error
16          Usage or syntax error
32          fsck canceled by user request
128         Shared library error

If several file systems are being checked, the exit status of fsck is the logical OR of the exit statuses for each file system that is checked.
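
Because the exit status is a bitwise combination of the individual codes, it can be decoded by testing each bit in turn. The following is a minimal sketch; decode_fsck_status is a hypothetical helper and its descriptions are copied from the list of error codes above.

```shell
# decode_fsck_status STATUS
# Print one line for every exit-code bit that is set in STATUS.
decode_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo 'No errors'
        return 0
    fi
    for entry in \
        '1:File system errors corrected' \
        '2:System should be rebooted' \
        '4:File system errors left uncorrected' \
        '8:Operation error' \
        '16:Usage or syntax error' \
        '32:fsck canceled by user request' \
        '128:Shared library error'
    do
        bit=${entry%%:*}                    # number before the colon
        if [ $(( status & bit )) -ne 0 ]; then
            echo "${entry#*:}"              # description after the colon
        fi
    done
}

decode_fsck_status 5    # 5 = 1 | 4
```

An exit status of 5 therefore means that errors were corrected on one file system and left uncorrected on another. After a real run, the status can be passed along with decode_fsck_status "$?".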

fsck Examples
Check All File Systems in /etc/fstab (Parallel Checks Allowed)

# fsck -A

To include a status bar and verbose output, run:

# fsck -ACV

Perform a Dry Run on a File System

# fsck -N ex_partition_file

Non-Interactively Repair an ext File System

# fsck ex_partition_file -- -p

Flush an ext3 or ext4 File System Journal

# fsck ex_partition_file -- -fy

Or, you can directly use the e2fsck command:

# e2fsck -fy ex_partition_file

dumpe2fs

Superblock and block group information can be displayed with the dumpe2fs command.

# dumpe2fs ex_partition_file

This command makes visible the internal management data structures of an ext file system. It can also be used to view the locations of a file system's superblocks. Use the grep command to filter dumpe2fs's output for the term superblock to display these locations (e.g., # dumpe2fs '/dev/sda1' | grep 'superblock').
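If you only need the block numbers of the backup superblocks, the filtering can go one step further. The sketch below assumes that dumpe2fs prints lines of the form "Backup superblock at 32768, Group descriptors at ...". backup_superblocks is a hypothetical helper name.

```shell
# Print the block number of every backup superblock found in dumpe2fs
# output read from the standard input.
# backup_superblocks is a hypothetical helper name.
backup_superblocks() {
    # Matching lines are assumed to look like:
    #   Backup superblock at 32768, Group descriptors at 32769-32769
    sed -n 's/^[[:space:]]*Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}
```

A possible use is # dumpe2fs ex_partition_file | backup_superblocks. One of the printed block numbers can then be passed to e2fsck with its -b option if the primary superblock is damaged.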

tune2fs

The tune2fs command can be used to change formatting parameters of a file system after it has been created. This command should only be used on a file system that is not mounted for writing.

# tune2fs ex_options ex_partition_file

Useful tune2fs options include:

-c ex_max_mount_counts
Set the maximum number of times the file system may be mounted between two routine file system checks. ex_max_mount_counts is an integer. Older versions of the mke2fs command set a default of around 30, while recent versions disable mount-count-based checks by default. The values 0 and -1 both mean that the mount count is ignored.
-C ex_mount_count
Set the current mount count. ex_mount_count is an integer. This can be used to force a file system check during the next system boot by setting it to a value larger than the current maximum set using the -c option.
-e ex_error_behavior
Determines the behavior of the system when a file system error is detected. ex_error_behavior can be continue (go on as normal), remount-ro (disallow further writing to the file system), or panic (force a kernel panic).
-g ex_group
Set the group whose members can use the reserved file system blocks. ex_group can be a group name or group ID (GID).
-i ex_interval_between_checks
Set the maximum time between two routine file system checks. ex_interval_between_checks is an integer followed by an optional unit: d (days, the default), w (weeks), or m (months). A value of 0 disables time-based checking.
-j
Add an ext3 journal to the file system. You will need to adjust /etc/fstab manually, if necessary.
-l
Display superblock information, including the current values of the parameters that can be set via this program. This can be used to determine the current blocks per group.
-L ex_file_system_label
Set a file system label (up to 16 characters).
-m ex_reserved_blocks_percentage
Set the percentage of data blocks reserved for the root user (or the user specified using the -u option). The default value is 5.
-O ex_feature,...
Set or clear the indicated file system features. Multiple features can be given as a comma-separated list, and a feature prefixed with a caret (^) is cleared. Run man 5 ext4 for details.
-u ex_user
Set the user that can use the reserved file system blocks. ex_user can be a username or user ID (UID).
-U ex_uuid
Set the UUID for a file system.
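To see how the -c and -C values interact, you can read both counters from the output of tune2fs -l. The sketch below assumes that output contains "Mount count:" and "Maximum mount count:" lines. mounts_until_check is a hypothetical helper name.

```shell
# Read tune2fs -l output on the standard input and report how many mounts
# remain before a routine check is triggered.
# mounts_until_check is a hypothetical helper name.
mounts_until_check() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0 }
        END {
            if (max <= 0)
                print "mount-count checks disabled"
            else if (count >= max)
                print "check due at next mount"
            else
                print (max - count) " mounts remaining"
        }
    '
}
```

A possible use is # tune2fs -l ex_partition_file | mounts_until_check.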

debugfs

The debugfs command is an interactive file system debugger for ext file systems.

# debugfs

At the debugfs: prompt, you can enter the ? command to see a list of the available commands.

If you know the file system that you want to operate on, you can pass it to debugfs as an argument:

# debugfs ex_partition_file

Some of debugfs's most useful commands include:

?
List available commands.
cd
Change working directory.
close
Close the file system.
features
Set/print the superblock features.
ls
List directory.
logdump
Dump the contents of the file system journal.
open
Open a file system.
params
Show debugfs parameters.
pwd
Show working directory.
stats
Show superblock statistics.
undelete
Undelete a just deleted file.
q
Quit debugfs.

debugfs only allows writing to a file system if you call it with the -w option.

sync

The sync command synchronizes cached writes to persistent storage.

When run without arguments, sync flushes all cached file data for all mounted file systems to disk, regardless of which user runs it.

The sync command can also be used in conjunction with other commands to clear the various caches on a GNU/Linux system. This can be useful when you have changed how a system handles caching and want to confirm that your customizations are active.

Clear Cache Commands
# sync; echo 1 > '/proc/sys/vm/drop_caches'
Clear PageCache.
# sync; echo 2 > '/proc/sys/vm/drop_caches'
Clear dentries and inodes.
# sync; echo 3 > '/proc/sys/vm/drop_caches'
Clear PageCache, dentries, and inodes.
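The three commands above differ only in the value that is written, so they can be wrapped in one function that validates its argument. This is a minimal sketch. drop_caches is a hypothetical function name, and DROP_CACHES_FILE is an assumed override that exists only so the function can be tried against an ordinary file.

```shell
# Hypothetical wrapper around the commands above. Must be run as root when
# it writes to the real /proc/sys/vm/drop_caches file.
drop_caches() {
    level=$1
    # DROP_CACHES_FILE is an assumed override, useful for a dry run.
    target=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}
    case $level in
        1|2|3) ;;
        *) echo 'usage: drop_caches 1|2|3' >&2; return 2 ;;
    esac
    # Flush dirty data first, because only clean cache entries can be dropped.
    sync
    echo "$level" > "$target"
}
```

Calling drop_caches 3 as root would then be equivalent to the last command in the list above.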

Quotas

Quotas can be placed on the number of blocks (i.e., disk space) and the number of inodes (i.e., number of files) that a user is allowed to consume. These quotas are assigned for users per file system.

Both hard and soft limits can be set. A user may exceed a soft limit temporarily, for a period that you define as the grace period (dropping back below the soft limit resets the associated grace period). A user may never exceed a hard limit.
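The relationship between usage and the two limits can be sketched as a small function. quota_state is a hypothetical name, the values are plain integers (e.g., blocks or inodes), and a limit of 0 is treated as "no limit", which is the convention used by the quota tools.

```shell
# Classify a usage value against a soft and a hard limit.
# quota_state is a hypothetical helper. A limit of 0 means "no limit".
quota_state() {
    usage=$1
    soft=$2
    hard=$3
    if [ "$hard" -gt 0 ] && [ "$usage" -ge "$hard" ]; then
        echo 'hard limit reached'
    elif [ "$soft" -gt 0 ] && [ "$usage" -gt "$soft" ]; then
        # Allowed only until the grace period expires.
        echo 'over soft limit'
    else
        echo 'within limits'
    fi
}
```

For example, quota_state 110 100 120 reports a user who is over the soft limit but still below the hard limit.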

To set up quotas, you must:

  • Install the quota software, if necessary.
  • Mark those file systems where quotas should be enforced by including the usrquota mount option (grpquota for groups) either after the fact for a mounted file system (e.g., # mount -o remount,usrquota '/dev/sda2' '/home') or (preferably) by permanently including the option in the file system's entry in /etc/fstab (e.g., /dev/sda2 /home ext4 defaults,usrquota 0 2).

You can also set up group quotas applying to all members of a group together. For this to be effective, you must limit all groups that the users in question are members of. Otherwise, they could circumvent their quota by changing groups.
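To confirm which /etc/fstab entries carry a quota mount option, you can inspect the fourth field of each entry. The following is a minimal sketch. quota_mount_points is a hypothetical helper name, and it only recognizes the plain usrquota and grpquota option names.

```shell
# Print the mount point of every entry in an fstab-style file whose mount
# options include usrquota or grpquota.
# quota_mount_points is a hypothetical helper. It reads /etc/fstab unless
# another file is given as the first argument.
quota_mount_points() {
    awk '
        $1 ~ /^#/ || NF < 4 { next }    # skip comments and incomplete lines
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] == "usrquota" || opts[i] == "grpquota") {
                    print $2
                    break
                }
            }
        }
    ' "${1:-/etc/fstab}"
}
```

With the example entry shown above (/dev/sda2 /home ext4 defaults,usrquota 0 2), the function would print /home.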

Quota Commands

There are several commands that can be used to manage quotas on GNU/Linux file systems.

quotacheck

The quotacheck command creates or verifies the aquota.user database file (aquota.group for groups) in a partition's root directory. For verification, the command examines each file system, builds a table of current disk usage, and compares this table against that recorded in the disk quota file for the file system. If any inconsistencies are detected, both the quota file and the current system copy of the incorrect quotas are updated.

To initialize or verify all quotas for all file systems, you can run:

# quotacheck -agmuv

The options used above are:

-a, --all
Check all mounted non-Network File System (NFS) file systems in /etc/mtab.
-g, --group
Only group quotas listed in /etc/mtab or on the specified file systems are to be checked.
-m, --no-remount
Do not try to remount file systems read-only.
-u, --user
Only user quotas listed in /etc/mtab or on the specified file systems are to be checked.
-v, --verbose
quotacheck reports its operation as it progresses.

It is advisable to periodically execute quotacheck during normal system operations in order to clean the database. Quotas should be turned off before running this command.

edquota

A user's quota can be edited with the edquota command and its -u (--user) option:

# edquota -u ex_username

A user quota can be set using a different user's quota setting (i.e., a prototype user) like so (where -p is short for --prototype):

# edquota -p ex_prototype_user ex_username

A grace period can be set for all users with the -t option (--edit-period):

# edquota -t

After running this command, your user's editor should open the quota configuration file to let you specify a grace period.

quotaon

Quotas can be turned on for a file system with the quotaon command:

# quotaon -guv ex_file_system_mount_point

Group and user quotas are enabled with the -g (--group) and -u (--user) options, respectively, and the -v (--verbose) option ensures verbose command output.

To have quotas turned on for all automatically mounted non-NFS file systems in /etc/fstab with quotas, add the -a (--all) option.

quotaoff

Quotas are turned off with the quotaoff command:

# quotaoff -guv ex_file_system_mount_point

The -guv and -a options for quotaoff serve the same purpose as for quotaon.

quota

Quota information can be viewed with the quota command.

View Quotas in Place for the Current User

quota -v, quota --verbose

View Quotas in Place for a Specific User's Account

# quota -v ex_username

Display Group Quotas for the Current User's Groups

quota -gv, quota --group --verbose

View Quotas in Place for a Specific Group

# quota -gv ex_group_name

Display File Systems Whose Soft Quota Has Been Exceeded for the Current User

quota -q, quota --quiet

repquota

To display the hard and soft limits for all users, as well as each user's usage and grace period, you can use the repquota command with the -a (--all) option:

# repquota -a

The grace column contains the remaining days of the grace period if the soft quota has been exceeded.

Network File System (NFS)

The Network File System (NFS) supports UNIX, UNIX-like, and Windows systems.

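To start the NFS daemons, you can run the following command (the unit name varies by distribution; for example, it is nfs-server on recent Red Hat-based systems and nfs-kernel-server on Debian-based systems):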

# systemctl start nfs

To view the directories and permissions that a GNU/Linux system is willing to share over NFS, run:

less '/etc/exports'

After modifying this file, you will need to run # exportfs -av or # systemctl restart nfs. The former option is preferable because the latter halts NFS for a short time while starting it up again.
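Each non-comment line in /etc/exports starts with the exported directory, followed by the clients and their options, e.g., /srv/share 192.168.1.0/24(rw,sync). The directory and network in that example are hypothetical. As a minimal sketch, the exported directories can be listed as follows. exported_directories is a hypothetical helper name, and it assumes that paths do not contain spaces.

```shell
# Print the directory of every entry in an exports-style file.
# exported_directories is a hypothetical helper. It reads /etc/exports
# unless another file is given, and assumes paths without spaces.
exported_directories() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        { print $1 }
    ' "${1:-/etc/exports}"
}
```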

To ensure that NFS starts when the system is booted, run # systemctl enable nfs.

dd

The dd (data definition) command reads data block by block from an input file and writes it unchanged to an output file. The data type and the type of files in question are inconsequential.

# dd ex_operands...

dd operands include:

bs=ex_bytes
Read and write up to ex_bytes at a time. The default is 512 bytes.
count=ex_integer
Copy only ex_integer input blocks.
if=ex_file
Read from ex_file instead of the standard input.
of=ex_file
Write to ex_file instead of the standard output.
oflag=ex_flags
Write as per the comma-separated symbol list (e.g., append, direct, directory; run man 1 dd for more flag information).
status=ex_level
The ex_level of information to print to the standard error. none suppresses everything except error messages, noxfer suppresses the final transfer statistics, and progress shows periodic transfer statistics.
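The effect of the bs and count operands can be tried safely on scratch files, without root privileges. The sketch below assumes GNU dd (for the 1K suffix and status=none). dd_copy_demo is a hypothetical function name.

```shell
# Demonstrate bs= and count= on temporary files.
# dd_copy_demo is a hypothetical helper name.
dd_copy_demo() {
    src=$(mktemp)
    dst=$(mktemp)
    # Write four 1 KiB blocks of zero bytes to the source file.
    dd if=/dev/zero of="$src" bs=1K count=4 status=none
    # Copy only the first two 1 KiB blocks to the destination file.
    dd if="$src" of="$dst" bs=1K count=2 status=none
    # Print the size in bytes of the source and of the copy.
    echo "$(( $(wc -c < "$src") )) $(( $(wc -c < "$dst") ))"
    rm -f "$src" "$dst"
}

dd_copy_demo
```

The function prints the two file sizes, which should be 4096 and 2048 bytes: count limits the copy to two input blocks of bs bytes each.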

dd Command Examples

Copy a Partition to a File

# dd \
    if=ex_partition_file \
    of=ex_file.dump \
    bs=8M \
    status=progress \
    oflag=direct &&
    sync

Restore a Partition File to a Partition on a New Drive

# dd \
    if=ex_file.dump \
    of=ex_partition_file \
    bs=8M \
    status=progress \
    oflag=direct &&
    sync

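The new drive's geometry and partition table must match those of the old drive. At a minimum, the target partition must be at least as large as the partition that the dump was taken from.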

Back Up a Drive's Master Boot Record

# dd if=ex_device_file of=ex_mbr_file.dump bs=512 count=1

This command copies the first 512 bytes of the drive, which is where the MBR is stored. Also, keep in mind that the MBR does not contain partitioning information for logical partitions. If you use logical partitions, you should use a program like sfdisk to save the entire partitioning scheme.
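Before relying on such a backup, you may want to confirm that the dump looks like an MBR. A valid MBR ends with the two-byte boot signature 0x55 0xAA at offset 510. The following is a minimal sketch of that check. has_mbr_signature is a hypothetical helper name.

```shell
# Succeed if the file given as the first argument carries the MBR boot
# signature (bytes 0x55 0xAA at offset 510).
# has_mbr_signature is a hypothetical helper name.
has_mbr_signature() {
    # Read the two bytes at offset 510 and format them as hexadecimal.
    signature=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null |
        od -An -tx1 | tr -d ' \n')
    [ "$signature" = '55aa' ]
}
```

For example, has_mbr_signature ex_mbr_file.dump && echo 'signature found' prints a message only when the signature is present. The check says nothing about whether the partition entries themselves are sensible.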

Restore a Drive's Master Boot Record

# dd if=ex_mbr_file.dump of=ex_device_file bs=512 count=1

Create a Live USB Key From an ISO Image File

# dd \
    if=ex_iso_file \
    of=ex_device_file \
    bs=8M \
    status=progress \
    oflag=direct &&
    sync

Create a Backup Image of a Drive

# dd \
    if=ex_device_file \
    of=ex_backup.img \
    bs=8M \
    status=progress \
    oflag=direct &&
    sync

Restore a Backup of a Drive

# dd \
    if=ex_backup.img \
    of=ex_device_file \
    bs=8M \
    status=progress \
    oflag=direct &&
    sync

Clone a Drive

# dd \
    if=ex_source_device_file \
    of=ex_target_device_file \
    bs=8M \
    status=progress \
    oflag=direct &&
    sync
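After any of the copies above, it is worth confirming that the data was written correctly. The sketch below compares an image file with its target byte by byte, limited to the length of the image so that a larger target device does not cause a false mismatch. verify_image is a hypothetical helper name, and the -n option assumes GNU cmp.

```shell
# Compare an image file with a target file or device, looking only at as
# many bytes as the image contains.
# verify_image is a hypothetical helper name.
verify_image() {
    image=$1
    target=$2
    # Size of the image in bytes.
    size=$(( $(wc -c < "$image") ))
    if cmp -s -n "$size" "$image" "$target"; then
        echo 'identical'
    else
        echo 'different'
        return 1
    fi
}
```

A possible use is # verify_image ex_backup.img ex_device_file after restoring a backup. Reading a whole drive back takes roughly as long as writing it did.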

Documentation

You can find more information on the commands discussed above by examining the Linux User's Manual, either at the command line or online.
