How to Manage Compression and Archiving on GNU/Linux

Managing compression and archiving tasks is a substantive part of administering a GNU/Linux system.

Note: If you are not familiar with the GNU/Linux command line interface, review the Conventions page before proceeding.

gzip

gzip (GNU zip) is one of the most widely used compression/decompression utilities on GNU/Linux systems. Common alternatives are bzip2 and xz, which are generally slower but usually achieve better compression ratios.

gzip Compression

gzip replaces each uncompressed file with a compressed one whose name ends in the .gz suffix.

gzip ex_file...

Shell filename patterns can be used with gzip (e.g., gzip *.txt replaces every .txt file in the current directory with a compressed version).

These are some of gzip's most helpful options:

-c
Writes the compressed file to the standard output. The original file remains unmodified.
-k
Keep an uncompressed version of the file that is being compressed.
-l
Displays important information about the compressed file (i.e., compressed size, uncompressed size, ratio, uncompressed name). This option takes one or more compressed files as arguments.
-r
Instructs gzip to act recursively when acting on a directory.
-S ex_suffix
Use the ex_suffix suffix passed instead of .gz. This option must also be used for decompression when dealing with non-.gz files.
-v
Output the name and percentage reduction for each file compressed or decompressed.
-1...-9
Specify a compression level. -1 is the fastest and compresses the least. -9 is the slowest and compresses the most. -6 is used by default.
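
As a rough illustration of several of the options above, the following sketch compresses a throwaway file inside a temporary directory. The file names (ex_file.txt, ex_best.gz) and the scratch directory are assumptions made for the example, not part of any real setup.

```shell
# Sketch only: ex_file.txt, ex_best.gz, and the scratch directory are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Build a highly compressible sample file (5000 identical lines).
yes 'the quick brown fox' | head -n 5000 > ex_file.txt

# -k keeps ex_file.txt alongside the new ex_file.txt.gz.
gzip -k ex_file.txt

# -9 asks for the best compression; -c writes the result to standard output,
# so ex_file.txt is left untouched and the shell redirects the output.
gzip -9 -c ex_file.txt > ex_best.gz

# -l reports compressed size, uncompressed size, ratio, and name for each file.
gzip -l ex_file.txt.gz ex_best.gz
```

Because -c never touches the original, it is a convenient way to choose the output name yourself.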

There are special alternatives for certain commands that are designed to work with gzip-compressed files. These are:

  • zcat
  • zless
  • zdiff
  • zgrep
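
A minimal sketch of two of these helpers, assuming a small sample file named ex_words.txt (a hypothetical name) in a temporary directory:

```shell
# Sketch only: ex_words.txt is a hypothetical sample file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'alpha\nbeta\ngamma\n' > "$workdir/ex_words.txt"
gzip "$workdir/ex_words.txt"          # leaves only ex_words.txt.gz

# zcat prints the decompressed contents to the standard output.
zcat "$workdir/ex_words.txt.gz"

# zgrep searches the compressed file much as grep would search the original.
match=$(zgrep 'bet' "$workdir/ex_words.txt.gz")
echo "zgrep matched: $match"
```

Neither command writes a decompressed copy to disk, which is the main reason to prefer them over decompressing first.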

gzip Decompression

To decompress one or more files that were compressed using gzip, you can use the command's -d option or the gunzip command.

gzip -d ex_file...

gunzip ex_file...
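
The round trip can be checked with a short sketch like the one below. The file names and the .ex suffix are assumptions chosen for illustration; the second half shows why -S has to be repeated when decompressing a file that does not end in .gz.

```shell
# Sketch only: file names and the ".ex" suffix are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'line one\nline two\n' > ex_file.txt
cp ex_file.txt ex_reference.txt

# Default suffix: compress, then restore with gunzip.
gzip ex_file.txt              # ex_file.txt    -> ex_file.txt.gz
gunzip ex_file.txt.gz         # ex_file.txt.gz -> ex_file.txt
cmp ex_file.txt ex_reference.txt && echo 'default suffix: contents match'

# Custom suffix: -S must be given for both directions.
gzip -S .ex ex_file.txt       # ex_file.txt    -> ex_file.txt.ex
gzip -d -S .ex ex_file.txt.ex # ex_file.txt.ex -> ex_file.txt
cmp ex_file.txt ex_reference.txt && echo 'custom suffix: contents match'
```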

tar

The tar command (short for tape archive) is used to manage archives. Its general syntax is:

tar ex_options ex_archive

tar has three main operations that act on its arguments. Each operation has a corresponding letter, which is conventionally written first in the list of specified options:

  1. -c Create an archive
  2. -t Tell the contents of an archive
  3. -x Extract the contents of an archive

When tar's -f option (short for --file) is bundled with other short options (e.g., -cvf), it must come last in the bundle. This option tells tar that the next word on the command line is the plain file archive or device file archive to create or act upon. When creating or extracting archives, tar does not send anything to the standard output unless the -v option (short for --verbose) is used.

In tar's traditional option style, where the bundle is written without a leading dash, arguments for tar options come after the options bundle and are matched, in order, to the parameter-taking options within the bundle. If arguments are provided using absolute path names, tar automatically stores them as relative path names (i.e., leading /s are removed). This avoids issues when unpacking archives on other systems.
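
The handling of absolute path names can be observed with a sketch like this one. The directory and file names are hypothetical, and the exact wording of any notice tar prints on the standard error may differ between tar versions.

```shell
# Sketch only: ex_dir, ex_file.txt, and ex.tar are hypothetical names.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/ex_dir"
echo 'hello' > "$workdir/ex_dir/ex_file.txt"

# Archive the file by its absolute path. tar may print a notice on the
# standard error saying that it is removing the leading "/".
tar -cf "$workdir/ex.tar" "$workdir/ex_dir/ex_file.txt"

# The stored member name no longer starts with "/".
member=$(tar -tf "$workdir/ex.tar")
echo "stored as: $member"
```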

When tar writes files to an archive, it writes them one after another, and annotates the files with additional information (e.g., date, access permissions, owner). tar archives can contain both files and whole directory hierarchies.

By default, tar acts recursively.

These are some of the most useful tar options:

--delete ex_archive_member
Delete from the archive. This option is supplied with the name(s) of archive members to be removed as arguments, and only works for uncompressed archives.
-M
Create/list/extract multi-volume archive. Short for --multi-volume.
-r ex_file
Append ex_file file(s) to the archive. If a file with the same name as the to-be-appended file already exists in the archive, it will not be overwritten, as the new version is appended to the end of the archive. Short for --append.
-u ex_file
Append ex_file file(s) that are newer than the corresponding copy inside of the archive. Newer files do not replace their old archive copies, but instead are appended to the end of the archive. If a file is not yet in the archive, it is added to it. Short for --update.
-v
Verbosely list processed files. Short for --verbose.
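
The append behavior of -r can be seen in a small sketch. The file names are assumptions, and the -C option (change to a directory before extracting) is a standard tar option that is not otherwise covered on this page.

```shell
# Sketch only: ex_note.txt, ex.tar, and ex_out are hypothetical names.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo 'version 1' > ex_note.txt
tar -cf ex.tar ex_note.txt

# Change the file and append it again. The archive now holds two members
# with the same name; the older one is not overwritten.
echo 'version 2' > ex_note.txt
tar -rf ex.tar ex_note.txt
tar -tvf ex.tar

# Extract into a separate directory. Members are written in archive order,
# so the later copy is the one left on disk.
mkdir ex_out
tar -xf ex.tar -C ex_out
cat ex_out/ex_note.txt
```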

Like find, tar is a complex command that can be better understood by viewing examples.

tar Archive Creation

tar -cf 'foobar.tar' 'foobar'

The example above creates an archive called foobar.tar from the foobar directory that is located inside of the current directory.

When creating a tar archive, run your tar command from the parent directory of the directory that you wish to archive. For the example above, if the foobar directory were located at /home/amnesia/foobar, the command would be run from the /home/amnesia directory. After the command completes, you would see the archive at /home/amnesia/foobar.tar.

If you want your archive to be compressed as well, add the -z option, which compresses the archive with gzip:

tar -czf 'foobar.tar.gz' 'foobar'
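
To get a feel for what -z buys you, the following sketch archives the same directory with and without compression and prints both sizes. The sample content is deliberately repetitive, so the difference will be larger than for typical data.

```shell
# Sketch only: the foobar directory here is a throwaway created for the demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir foobar
yes 'some repetitive sample text' | head -n 5000 > foobar/foo_1.txt

tar -cf foobar.tar foobar        # plain archive
tar -czf foobar.tar.gz foobar    # gzip-compressed archive

plain_size=$(wc -c < foobar.tar)
gzip_size=$(wc -c < foobar.tar.gz)
echo "foobar.tar:    $plain_size bytes"
echo "foobar.tar.gz: $gzip_size bytes"
```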

tar Archive Telling

$ tar -tf '/home/amnesia/foobar.tar'
foobar/
foobar/foo_4.txt
foobar/foo_3.txt
foobar/foobar_2/
foobar/foobar_2/foo_4.txt
foobar/foobar_2/foo_3.txt
foobar/foobar_2/foo_1.txt
foobar/foobar_2/foo_2.txt
foobar/foobar_2/foo_5.txt
foobar/foo_1.txt
foobar/foo_2.txt
foobar/foo_5.txt

The example above displays (tells) the contents of the /home/amnesia/foobar.tar archive. The same command would work if the archive were compressed (e.g., tar -tf '/home/amnesia/foobar.tar.gz').

tar Archive Extraction

tar -xf '/home/amnesia/foobar.tar'

The example above extracts the contents of the /home/amnesia/foobar.tar archive into the current directory. As with archive telling, the same command would work if the archive were compressed (e.g., tar -xf '/home/amnesia/foobar.tar.gz').

When you extract a tar archive, it is best to do so from a directory other than the directory where you created the archive. For example, the /tmp directory is a good location to quickly extract archives to.

You can give specific file or directory names to extract. In this case, only the file or directory specified will be extracted from the archive, e.g., tar -xf '/home/amnesia/foobar.tar' 'foobar/foo_4.txt' or tar -xf '/home/amnesia/foobar.tar' 'foobar/foobar_2/'. If multiple objects with the same name are in the archive, the latest one (by its order in the archive) is extracted.
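
Selective extraction into a scratch location might look like the sketch below. The directory layout is made up for the example, and -C (a standard tar option not otherwise covered here) is used so that the extraction happens away from the directory where the archive was created.

```shell
# Sketch only: the foobar tree and ex_scratch are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p foobar/foobar_2
echo 'one' > foobar/foo_1.txt
echo 'two' > foobar/foobar_2/foo_2.txt
tar -cf foobar.tar foobar

# Extract a single member into ex_scratch; nothing else is unpacked.
mkdir ex_scratch
tar -xf foobar.tar -C ex_scratch 'foobar/foobar_2/foo_2.txt'
find ex_scratch -type f
```

The member name has to match the path as it appears in the tar -tf listing.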

cpio

Another popular archiving utility on GNU/Linux is cpio. The initrd.img file used to help boot a Debian GNU/Linux system is a gzip-compressed cpio archive.

cpio has three operating modes:

  1. Copy-out: Read a list of filenames from the standard input and create an archive containing these files on the standard output. Uses the -o option.
  2. Copy-in: Read the archive from the standard input or from a file (--file) and extract files from it (uses the -i option), or list its contents to the standard output (uses the -t option).
  3. Copy-pass (pass-through): Read a list of filenames from the standard input and copy them to the specified directory. Uses the -p option.

In some ways, cpio operates like tar. For example, it appends objects to the end of an archive, without overwriting any extant objects in the archive with the same name. cpio archives can contain both files and whole directory hierarchies. Also, when creating or extracting archives, cpio does not send anything to the standard output unless the -v option (short for --verbose) is used.

Unlike tar, cpio needs its input and output specified via redirection or a pipe. Also, cpio is not recursive (i.e., it will not include the contents of any subdirectories of the directory that you are archiving), which is why you may see it paired with the find command, which is recursive.

The following are example commands for how to create, tell, and extract cpio archives, as well as how to copy files from one directory to another with cpio.

cpio Archive Creation

$ ls | cpio -o > '/home/amnesia/foobar/foobar.cpio'
1 block

The above command creates an archive called foobar.cpio from the /home/amnesia/foobar directory. When creating a cpio archive, you should run your command from inside of the directory that you wish to archive. For the above example, the foobar directory was located at /home/amnesia/foobar, so the command was run from the /home/amnesia/foobar directory.

You can compress your cpio archive by piping the cpio output through gzip:

$ ls | cpio -o | gzip > '/home/amnesia/foobar/foobar.cpio.gz'
1 block

As previously mentioned, unlike tar, cpio is not recursive. If you want to make a cpio archive of a directory that contains subdirectories (i.e., a directory tree), you can pair it with the find command.

$ find . | cpio -o > '/home/amnesia/foobar/foobar.cpio'
2 blocks

Above, the find command uses the current directory (.) as the directory tree to begin its recursive search, and then pipes the filename paths it returns to the cpio command.
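
One variation, sketched here under assumed names, is to write the archive outside of the directory being archived, so that the archive file does not end up listed inside itself. The script skips quietly if cpio is not installed.

```shell
# Sketch only: the foobar tree is a throwaway created for the demo.
command -v cpio >/dev/null 2>&1 || { echo 'cpio is not installed' >&2; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/foobar/foobar_2"
echo 'one' > "$workdir/foobar/foo_1.txt"
echo 'two' > "$workdir/foobar/foobar_2/foo_2.txt"

# Run find from inside the directory so that stored names are relative,
# but write the archive one level up, outside of the tree being archived.
cd "$workdir/foobar" || exit 1
find . | cpio -o > "$workdir/foobar.cpio"

# List the members; files in the subdirectory are included.
members=$(cpio -t < "$workdir/foobar.cpio")
echo "$members"
```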

cpio Archive Telling

$ cpio -t < '/home/amnesia/foobar/foobar.cpio'
foo_1.txt
foo_2.txt
foo_3.txt
foo_4.txt
foo_5.txt
foobar.cpio
1 block

To tell the contents of a gzip-compressed cpio archive, you need to first decompress the archive, and then pipe the output to cpio:

$ gzip -cd '/home/amnesia/foobar/foobar.cpio.gz' | cpio -t
foo_1.txt
foo_2.txt
foo_3.txt
foo_4.txt
foo_5.txt
foobar.cpio.gz
1 block

Above, gzip's -d option decompresses the archive and its -c option sends the result to the standard output, which is then piped to cpio -t.

cpio Archive Extraction

$ cpio -i < '/home/amnesia/foobar/foobar.cpio'
1 block
$ ls
foo_1.txt  foo_2.txt  foo_3.txt  foo_4.txt  foo_5.txt  foobar.cpio

Above, the contents of the foobar.cpio archive are extracted into the current directory.

As with tar, when you extract a cpio archive, it is best to do so from a directory other than the directory where you created the archive (e.g., /tmp). If you are going to extract a cpio archive that contains a directory tree, you can add the -d option to create leading directories where needed.
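
The effect of -d is easiest to see when the archive contains only files and no directory entries, as in this sketch. All names are hypothetical, and the script skips quietly if cpio is not installed.

```shell
# Sketch only: the foobar tree and ex_scratch are hypothetical.
command -v cpio >/dev/null 2>&1 || { echo 'cpio is not installed' >&2; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/foobar/foobar_2" "$workdir/ex_scratch"
echo 'two' > "$workdir/foobar/foobar_2/foo_2.txt"

# Archive regular files only, so no directory entries are stored.
(cd "$workdir/foobar" && find . -type f | cpio -o > "$workdir/foobar.cpio")

# Extract somewhere else. -d creates foobar_2/ because the archive itself
# does not contain an entry for that directory.
cd "$workdir/ex_scratch" || exit 1
cpio -id < "$workdir/foobar.cpio"
find . -type f
```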

You can extract files from a gzip-compressed cpio archive like so:

$ gzip -cd '/home/amnesia/foobar/foobar.cpio.gz' | cpio -i
1 block
$ ls
foo_1.txt  foo_2.txt  foo_3.txt  foo_4.txt  foo_5.txt  foobar.cpio.gz

Copying Files With cpio

cpio's copy-pass mode is used to copy files from one directory to another, combining its copy-in and copy-out modes without creating an archive.

$ ls | cpio -p '/home/amnesia/foobar_copy'
0 blocks
$ ls '/home/amnesia/foobar_copy'
foo_1.txt  foo_2.txt  foo_3.txt  foo_4.txt  foo_5.txt
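
To copy a whole directory tree rather than a flat list of files, copy-pass mode can be combined with find and the -d option, roughly as sketched below. The directory names are assumptions, and the script skips quietly if cpio is not installed.

```shell
# Sketch only: foobar and foobar_copy are hypothetical directories.
command -v cpio >/dev/null 2>&1 || { echo 'cpio is not installed' >&2; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/foobar/foobar_2" "$workdir/foobar_copy"
echo 'one' > "$workdir/foobar/foo_1.txt"
echo 'two' > "$workdir/foobar/foobar_2/foo_2.txt"

# find supplies the file names; -p copies them and -d creates any
# subdirectories that are missing at the destination.
cd "$workdir/foobar" || exit 1
find . -type f | cpio -pd "$workdir/foobar_copy"

find "$workdir/foobar_copy" -type f
```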

zip

A third utility related to archiving on GNU/Linux is zip. Unlike gzip and tar/cpio, which separately handle compression and archiving, respectively, zip does both.

zip ex_archive.zip ex_file...

zip knows two methods for adding files to an archive:

  1. stored: the file was stored without compression
  2. deflated: the file was compressed (and the percentage states how much the file was compressed)

zip automatically chooses which method to use, unless you disable compression using the -0 option.

Unlike tar and cpio, zip automatically overwrites existing content in an archive that has the same name as a new file that you are adding to the archive. Like cpio, zip is not recursive by default, but, like gzip, it can act recursively with its -r option.

Helpful zip options include:

-0
Disable compression.
-d ex_archive_entry
Delete ex_archive_entry entry (or entries) from archive.
-FS
File system sync. Synchronizes an archive with the file system (i.e., updates the archive by adding files to the archive only if the file mentioned on the command line is newer than a pre-existing object of the same name in the archive, and deletes files from the archive that have not been named on the command line). Short for --filesync.
-r
Instructs zip to act recursively when acting on a directory.
-u
Update the archive by adding objects to the archive only if the object mentioned on the command line is newer than a pre-existing object of the same name in the archive.
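
The overwrite behavior and the -d option can be tried out with a sketch like the following. The file names are hypothetical. The -q (quiet) option of zip and the -p (print an entry to the standard output) and -Z1 (list entry names only) options of unzip are not otherwise covered on this page. The script skips quietly if zip or unzip is missing.

```shell
# Sketch only: ex_a.txt, ex_b.txt, and ex.zip are hypothetical names.
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 ||
  { echo 'zip or unzip is not installed' >&2; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo 'version 1' > ex_a.txt
echo 'keep me' > ex_b.txt
zip -q ex.zip ex_a.txt ex_b.txt

# Adding a file whose name is already in the archive replaces that entry.
echo 'version 2, now longer' > ex_a.txt
zip -q ex.zip ex_a.txt
unzip -p ex.zip ex_a.txt

# -d deletes an entry from the archive; ex_a.txt is the only one left.
zip -q -d ex.zip ex_b.txt
unzip -Z1 ex.zip
```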

Here are example commands for how to create, tell, and extract zip archives.

Create zip Archive of All Files in the Current Directory

$ zip 'foobar.zip' *
  adding: foo_1.txt (stored 0%)
  adding: foo_2.txt (stored 0%)
  adding: foo_3.txt (stored 0%)
  adding: foo_4.txt (stored 0%)
  adding: foo_5.txt (stored 0%)

Above, every file in the current directory (specified by the * search pattern) is added to a foobar.zip archive. The command was run from inside of the directory to be archived, i.e., the /home/amnesia/foobar directory.

Create zip Archive of All Files in a Directory Tree

In the prior example, if /home/amnesia/foobar contained any subdirectories, their content would not have been included in the final foobar.zip archive. You can change this by using zip's -r option.

$ zip -r 'foobar.zip' 'foobar'
  adding: foobar/ (stored 0%)
  adding: foobar/foo_3.txt (stored 0%)
  adding: foobar/foo_2.txt (stored 0%)
  adding: foobar/foo_1.txt (stored 0%)
  adding: foobar/foobar_2/ (stored 0%)
  adding: foobar/foobar_2/foo_3.txt (stored 0%)
  adding: foobar/foobar_2/foo_2.txt (stored 0%)
  adding: foobar/foobar_2/foo_1.txt (stored 0%)
  adding: foobar/foobar_2/foo_5.txt (stored 0%)
  adding: foobar/foobar_2/foo_4.txt (stored 0%)
  adding: foobar/foo_5.txt (stored 0%)
  adding: foobar/foo_4.txt (stored 0%)

As with tar, when creating archives of directory trees, you should run your zip command from the parent directory of the directory that you wish to archive. For the example above, you would run the command from the /home/amnesia directory, i.e., the parent directory of the /home/amnesia/foobar directory.

Alternatively, you can do this:

$ find 'foobar' | zip -@ 'foobar.zip'
  adding: foobar/ (stored 0%)
  adding: foobar/foo_3.txt (stored 0%)
  adding: foobar/foo_2.txt (stored 0%)
  adding: foobar/foo_1.txt (stored 0%)
  adding: foobar/foobar_2/ (stored 0%)
  adding: foobar/foobar_2/foo_3.txt (stored 0%)
  adding: foobar/foobar_2/foo_2.txt (stored 0%)
  adding: foobar/foobar_2/foo_1.txt (stored 0%)
  adding: foobar/foobar_2/foo_5.txt (stored 0%)
  adding: foobar/foobar_2/foo_4.txt (stored 0%)
  adding: foobar/foo_5.txt (stored 0%)
  adding: foobar/foo_4.txt (stored 0%)

Above, the find command is used to recursively traverse the /home/amnesia/foobar directory and pass along the path name of each file to the zip command via a pipe (|). -@ tells zip to take its file list from the standard input, instead of as a list of arguments on the command line.

The find 'foobar' | zip -@ 'foobar.zip' command would be run from the /home/amnesia directory, i.e., the parent directory of the /home/amnesia/foobar directory. This use case is similar to how the find command can be used with the cpio command, as previously discussed. This kind of construction allows you to utilize the powerful flexibility of find to create custom filters for the kinds of files you want to include in your archive.
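
As one possible filter, the sketch below archives only the .txt files of a directory tree and leaves out everything else. The tree and the ex_debug.log file are invented for the example, zip's -q option and unzip's -Z1 option (list entry names only) are not otherwise covered on this page, and the script skips quietly if zip or unzip is missing.

```shell
# Sketch only: the foobar tree and ex_debug.log are hypothetical.
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 ||
  { echo 'zip or unzip is not installed' >&2; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p foobar/foobar_2
echo 'one' > foobar/foo_1.txt
echo 'two' > foobar/foobar_2/foo_2.txt
echo 'noise' > foobar/foobar_2/ex_debug.log

# find selects regular files ending in .txt; zip reads the names from
# the standard input because of -@.
find foobar -type f -name '*.txt' | zip -q -@ foobar.zip

entries=$(unzip -Z1 foobar.zip)
echo "$entries"
```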

zip Archive Telling

You can view the contents of a zip archive by using zip's -sf option (short for --show-files).

$ zip -sf 'foobar.zip'
Archive contains:
  home/amnesia/foobar/
  home/amnesia/foobar/foo_3.txt
  home/amnesia/foobar/foo_2.txt
  home/amnesia/foobar/foo_1.txt
  home/amnesia/foobar/foobar_2/
  home/amnesia/foobar/foobar_2/foo_3.txt
  home/amnesia/foobar/foobar_2/foo_2.txt
  home/amnesia/foobar/foobar_2/foo_1.txt
  home/amnesia/foobar/foobar_2/foo_5.txt
  home/amnesia/foobar/foobar_2/foo_4.txt
  home/amnesia/foobar/foo_5.txt
  home/amnesia/foobar/foo_4.txt
Total 12 entries (0 bytes)

The above example displays the contents of the foobar.zip archive in the current directory.

zip Archive Decompression and Extraction

$ unzip '/home/amnesia/foobar.zip'
Archive:  /home/amnesia/foobar.zip
   creating: foobar/
 extracting: foobar/foo_3.txt        
 extracting: foobar/foo_2.txt        
 extracting: foobar/foo_1.txt        
   creating: foobar/foobar_2/
 extracting: foobar/foobar_2/foo_3.txt  
 extracting: foobar/foobar_2/foo_2.txt  
 extracting: foobar/foobar_2/foo_1.txt  
 extracting: foobar/foobar_2/foo_5.txt  
 extracting: foobar/foobar_2/foo_4.txt  
 extracting: foobar/foo_5.txt        
 extracting: foobar/foo_4.txt

If the object that you are about to decompress and extract already exists in the current directory, unzip offers to ignore it in the archive, rename the object from the archive, or overwrite the existing object in the current directory with the archive copy.

If you only want to decompress and extract specific objects in the archive, you can do so by specifying them as arguments on the command line:

$ unzip '/home/amnesia/foobar.zip' 'foobar/foo_3.txt'
Archive:  /home/amnesia/foobar.zip
 extracting: foobar/foo_3.txt

If you want to decompress and extract multiple zip archives in the same directory, you can do the following:

$ unzip '*.zip'
Archive:  foobar.zip
 extracting: foo_1.txt               
 extracting: foo_2.txt               
 extracting: foo_3.txt               
 extracting: foo_4.txt               
 extracting: foo_5.txt               

Archive:  foobar_2.zip
 extracting: foo_10.txt              
 extracting: foo_6.txt               
 extracting: foo_7.txt               
 extracting: foo_8.txt               
 extracting: foo_9.txt               

Archive:  foobar_3.zip
 extracting: foo_11.txt              
 extracting: foo_12.txt              
 extracting: foo_13.txt              
 extracting: foo_14.txt              
 extracting: foo_15.txt              

3 archives were successfully processed.

Above, *.zip needs to be enclosed in quotes ('') because if the shell performs filename expansion, unzip will interpret one archive as the target to operate on (foobar.zip), and the other two zip archives (foobar_2.zip and foobar_3.zip) as entries you would like to decompress/extract from the target.

Documentation

You can learn more about gzip, tar, cpio, and zip by examining their man pages on the command line, or online.
