Managing compression and archiving tasks is a substantive part of administering a GNU/Linux system.
Note: If you are not familiar with the GNU/Linux command line interface, review the Conventions page before proceeding.
gzip (GNU zip) is one of the popular compression/decompression utilities available on GNU/Linux systems (alternatives, such as bzip2 and xz, are slower but compress more efficiently).
gzip replaces uncompressed file(s) with compressed versions ending in the .gz extension.
Search patterns can be used with gzip (e.g., gzip *.txt will replace all .txt files in the current directory with compressed versions).
These are some of gzip's most helpful options:
- -c (--stdout): Writes the compressed file to the standard output. The original file remains unmodified.
- -k (--keep): Keeps an uncompressed version of the file that is being compressed.
- -l (--list): Displays important information about the compressed file (i.e., compressed size, uncompressed size, ratio, uncompressed name). This option is passed a compressed file(s) as an argument.
- -r (--recursive): Causes gzip to act recursively when acting on a directory.
- -S ex_suffix (--suffix ex_suffix): Uses the ex_suffix suffix instead of .gz. This option must also be used for decompression when dealing with non-.gz suffixes.
- -v (--verbose): Outputs the name and percentage reduction for each file compressed or decompressed.
- -# (where # is a digit from 1 to 9): Specifies a compression factor.
  - -1 is the quickest and worst compression.
  - -9 is the slowest and best compression.
  - -6 is used by default.
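The options above can be tried out with a short script like the following. It is a minimal sketch that assumes GNU gzip 1.6 or later (for -k); the file names and the .demo suffix are hypothetical and live in a throwaway directory.

```shell
#!/bin/sh
# Sketch: try several gzip options in a scratch directory.
# notes.txt, best.gz, other.txt and the .demo suffix are hypothetical.
set -eu

demo=$(mktemp -d)          # scratch directory, left in place for inspection
cd "$demo"

# A small, highly compressible sample file (1000 identical lines).
yes 'hello gzip' | head -n 1000 > notes.txt

# -k (--keep): create notes.txt.gz and keep notes.txt as well.
gzip -k notes.txt

# -l (--list): compressed size, uncompressed size, ratio and name.
gzip -l notes.txt.gz

# -c (--stdout) with -9 (best compression): the compressed data goes to
# standard output, redirected here to best.gz; notes.txt is untouched.
gzip -c -9 notes.txt > best.gz

# -S (--suffix) with -v (--verbose): compress other.txt to other.txt.demo
# and report the percentage reduction.
cp notes.txt other.txt
gzip -v -S .demo other.txt

# gzip does not recognize .demo by itself, so decompression needs -S too.
gzip -d -S .demo other.txt.demo
```

The .demo suffix was chosen because gzip does not recognize it on its own, which is exactly the case where -S is needed again on decompression.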
gzip also provides special alternatives to certain commands that are designed to work with gzip-compressed files, such as zcat, zless, zgrep, and zdiff (the gzip-aware counterparts of cat, less, grep, and diff).
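As a sketch of how these helpers behave, the script below uses zcat and zgrep, which ship with GNU gzip, on a hypothetical log.txt file. Neither command writes a decompressed copy to disk.

```shell
#!/bin/sh
# Sketch: read and search a gzip-compressed file without first
# decompressing it to disk. log.txt is a hypothetical file name.
set -eu

demo=$(mktemp -d)
cd "$demo"

printf 'alpha\nbeta\ngamma\n' > log.txt
gzip log.txt                 # replaces log.txt with log.txt.gz

zcat log.txt.gz              # like cat: prints the decompressed text
zgrep 'beta' log.txt.gz      # like grep: prints the lines matching beta
```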
To decompress a file(s) that was compressed using gzip, you can use the command's -d option (short for --decompress) or the gunzip command:
gzip -d ex_file...
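A minimal round trip, sketched with two hypothetical files, shows that both forms restore the original file and remove the .gz copy:

```shell
#!/bin/sh
# Sketch: compress two hypothetical files, then restore one with
# gzip -d and the other with gunzip.
set -eu

demo=$(mktemp -d)
cd "$demo"

printf 'some text\n' > a.txt
cp a.txt b.txt
gzip a.txt b.txt        # produces a.txt.gz and b.txt.gz

gzip -d a.txt.gz        # equivalent to: gzip --decompress a.txt.gz
gunzip b.txt.gz         # gunzip behaves like gzip -d
```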
The tar command (short for tape archive) is used to manage archives. Its general syntax is:
tar ex_options ex_archive
tar has three main methods that act on its arguments. Each method has a corresponding letter that must be the first letter in the list of specified options:
- -c: Create an archive
- -t: Tell (list) the contents of an archive
- -x: Extract the contents of an archive
If you use the -f option (short for --file), it must come last in the options bundle. This option tells tar that the next word on the command line is the plain file archive or device file archive to create or act upon. When creating or extracting archives, tar does not send anything to the standard output unless the -v option (short for --verbose) is used.
Arguments to parameter-taking tar options come after the options bundle, and are matched, in turn, to the corresponding options within the bundle (e.g., in tar -cf 'foobar.tar' 'foobar', 'foobar.tar' is the argument of -f). If arguments are provided using absolute path names, tar automatically stores them as relative path names (i.e., leading /s are removed). This avoids issues when unpacking archives on other systems.
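The path-stripping behavior can be observed with the sketch below, which assumes GNU tar and archives a hypothetical file by its absolute path inside a scratch directory:

```shell
#!/bin/sh
# Sketch: archive a file by its absolute path and check what tar stores.
set -eu

demo=$(mktemp -d)                  # an absolute path such as /tmp/tmp.XXXX
mkdir "$demo/src"
printf 'x\n' > "$demo/src/file.txt"

# -c creates; -f takes the next word ("$demo/abs.tar") as the archive.
# GNU tar notes on standard error that it is removing the leading "/".
tar -cf "$demo/abs.tar" "$demo/src/file.txt"

# The stored member name is relative, e.g. tmp/tmp.XXXX/src/file.txt.
tar -tf "$demo/abs.tar"
```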
When tar writes files to an archive, it writes them one after another, and annotates the files with additional information (e.g., date, access permissions, owner). tar archives can contain both files and whole directory hierarchies.
By default, tar acts recursively on directories.
These are some of the most useful tar options:
- --delete: Delete from the archive. This option is supplied with the name(s) of archive members to be removed as arguments, and only works for uncompressed archives.
- -M (--multi-volume): Create/list/extract a multi-volume archive.
- -r (--append) ex_file: Append ex_file file(s) to the archive. If a file with the same name as the to-be-appended file already exists in the archive, it will not be overwritten, as the new version is appended to the end of the archive.
- -u (--update) ex_file: Append ex_file file(s) that are newer than the corresponding copy inside of the archive. Newer files do not replace their old archive copies, but instead are appended to the end of the archive. If a file is not yet in the archive, it is appended as well.
- -v (--verbose): Verbosely list processed files.
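The following sketch exercises -r, -v and --delete on a hypothetical docs.tar archive. It assumes GNU tar, since --delete is a GNU extension that some other tar implementations lack.

```shell
#!/bin/sh
# Sketch: append to and delete from an uncompressed tar archive.
# docs.tar, notes.txt and other.txt are hypothetical names.
set -eu

demo=$(mktemp -d)
cd "$demo"

printf 'v1\n' > notes.txt
printf 'keep\n' > other.txt
tar -cf docs.tar notes.txt other.txt     # create the archive

printf 'v2\n' > notes.txt
tar -rf docs.tar notes.txt               # -r: append a second notes.txt
tar -tvf docs.tar                        # notes.txt is now listed twice

tar --delete -f docs.tar other.txt       # remove the other.txt member
tar -tf docs.tar
```

-u is left out of the sketch because whether it appends depends on file modification times, which can fall within the same second in a quick script like this.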
tar is a complex command that can be better understood by viewing examples.
tar -cf 'foobar.tar' 'foobar'
The example above creates an archive called foobar.tar from the foobar directory that is located inside of the current directory.
When creating a tar archive, run your tar command from the parent directory of the directory that you wish to create an archive from. For the example above, if the foobar directory was located at /home/amnesia/foobar, the command would be run from the /home/amnesia directory. After command completion, you would see a foobar.tar archive at /home/amnesia/foobar.tar.
If you also want your archive to be compressed, add the -z option, which adds compression via gzip:
tar -czf 'foobar.tar.gz' 'foobar'
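A self-contained way to try -z is sketched below. It builds a hypothetical foobar tree in a scratch directory, runs tar from that tree's parent directory, and extracts the result into a separate directory. It assumes a tar (such as GNU tar) that detects gzip compression automatically when reading an archive file.

```shell
#!/bin/sh
# Sketch: create, list and extract a gzip-compressed tar archive.
set -eu

demo=$(mktemp -d)
cd "$demo"                          # parent directory of foobar/

mkdir -p foobar/foobar_2
printf 'one\n' > foobar/foo_1.txt
printf 'two\n' > foobar/foobar_2/foo_2.txt

tar -czf foobar.tar.gz foobar       # -z: compress the archive via gzip
tar -tf foobar.tar.gz               # members are listed as foobar/...

mkdir extracted                     # extract into a different directory
cd extracted
tar -xf ../foobar.tar.gz
```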
$ tar -tf '/home/amnesia/foobar.tar'
foobar/
foobar/foo_4.txt
foobar/foo_3.txt
foobar/foobar_2/
foobar/foobar_2/foo_4.txt
foobar/foobar_2/foo_3.txt
foobar/foobar_2/foo_1.txt
foobar/foobar_2/foo_2.txt
foobar/foobar_2/foo_5.txt
foobar/foo_1.txt
foobar/foo_2.txt
foobar/foo_5.txt
The example above displays (tells) the contents of the /home/amnesia/foobar.tar archive. The same command would work if the archive was compressed (e.g., tar -tf '/home/amnesia/foobar.tar.gz').
tar -xf '/home/amnesia/foobar.tar'
The example above extracts the contents of the /home/amnesia/foobar.tar archive into the current directory. As with telling, the same command would work if the archive was compressed (e.g., tar -xf '/home/amnesia/foobar.tar.gz').
When you extract a tar archive, it is best to do so from a directory other than the directory where you created the archive. For example, the /tmp directory is a good location for quickly extracting archives.
You can give specific file or directory names to extract. In this case, only the specified file or directory is extracted from the archive, e.g., tar -xf '/home/amnesia/foobar.tar' 'foobar/foo_4.txt' or tar -xf '/home/amnesia/foobar.tar' 'foobar/foobar_2/'. If multiple objects with the same name are in the archive, each copy is extracted in turn and overwrites the previous one, so you end up with the latest one (by its order in the archive).
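The sketch below, assuming GNU tar and hypothetical file names, extracts a single member from an archive that holds two copies of foobar/foo_4.txt (the second added with -r). It extracts into a separate restore directory and leaves foo_5.txt in the archive.

```shell
#!/bin/sh
# Sketch: extract one member from an archive holding two copies of it.
set -eu

demo=$(mktemp -d)
cd "$demo"

mkdir foobar
printf 'old\n' > foobar/foo_4.txt
printf 'five\n' > foobar/foo_5.txt
tar -cf foobar.tar foobar

printf 'new\n' > foobar/foo_4.txt
tar -rf foobar.tar foobar/foo_4.txt    # append a second copy

mkdir restore
cd restore
tar -xf ../foobar.tar 'foobar/foo_4.txt'
cat foobar/foo_4.txt                    # the later copy wins
```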
Another popular archiving utility on GNU/Linux is cpio. For example, the initrd.img file used to help boot a Debian GNU/Linux system is a (typically compressed) cpio archive.
cpio has three operating modes:
- Copy-out: Read a list of filenames from the standard input and create an archive containing these files on the standard output. Uses the -o option (short for --create).
- Copy-in: Read the archive from the standard input or from a file (-F, short for --file) and extract files from it (uses the -i option; short for --extract), or list its contents to the standard output (uses the -t option; short for --list).
- Pass-through: Read a list of filenames from the standard input and copy them to the specified directory. Uses the -p option (short for --pass-through).
In some ways, cpio operates like tar. For example, it appends objects to the end of an archive, without overwriting any extant objects in the archive with the same name. cpio archives can contain both files and whole directory hierarchies. Also, when creating or extracting archives, cpio does not list the names of the files it processes unless the -v option (short for --verbose) is used.
cpio needs its input and output specified via redirection or a pipe. Also, cpio is not recursive (i.e., it archives only the filenames it is given, so it will not include the contents of any subdirectories of the directory from which you create your cpio archive), which is why you may see it paired with the find command, which is recursive.
The following are example commands for how to create, tell, and extract cpio archives, as well as how to copy files from one directory to another with cpio:
$ ls | cpio -o > '/home/amnesia/foobar/foobar.cpio'
1 block
The above command creates an archive called foobar.cpio from the /home/amnesia/foobar directory. When creating a cpio archive, you should run your command from inside of the directory that you wish to archive. For the above example, the foobar directory was located at /home/amnesia/foobar, so the command was run from the /home/amnesia/foobar directory.
You can compress your cpio archive by piping the cpio output through gzip:
$ ls | cpio -o | gzip > '/home/amnesia/foobar/foobar.cpio.gz'
1 block
As previously mentioned, unlike tar, cpio is not recursive. If you want to make a cpio archive of a directory that contains subdirectories (i.e., a directory tree), you can pair it with the find command:
$ find . | cpio -o > '/home/amnesia/foobar/foobar.cpio'
2 blocks
Here, the find command uses the current directory (.) as the directory tree to begin its recursive search, and then pipes the filename paths it returns to the cpio command.
You can tell the contents of a cpio archive with the -t option:
$ cpio -t < '/home/amnesia/foobar/foobar.cpio'
foo_1.txt
foo_2.txt
foo_3.txt
foo_4.txt
foo_5.txt
foobar.cpio
1 block
To tell the contents of a gzip-compressed cpio archive, you need to first decompress the archive, and then pipe the output to cpio:
$ gzip -cd '/home/amnesia/foobar/foobar.cpio.gz' | cpio -t
foo_1.txt
foo_2.txt
foo_3.txt
foo_4.txt
foo_5.txt
foobar.cpio.gz
1 block
Here, gzip's -c option is used to send the decompressed data to the standard output, which is then piped to cpio.
To extract the contents of a cpio archive, use the -i option:
$ cpio -i < '/home/amnesia/foobar/foobar.cpio'
1 block
$ ls
foo_1.txt foo_2.txt foo_3.txt foo_4.txt foo_5.txt foobar.cpio
Above, the contents of the foobar.cpio archive are extracted into the current directory.
As with tar, when you extract a cpio archive, it is best to do so from a directory other than the directory where you created the archive (e.g., /tmp). If you are going to extract a cpio archive that contains a directory tree, you can add the -d option to create leading directories where needed.
You can extract files from a gzip-compressed cpio archive like so:
$ gzip -cd '/home/amnesia/foobar/foobar.cpio.gz' | cpio -i
1 block
$ ls
foo_1.txt foo_2.txt foo_3.txt foo_4.txt foo_5.txt foobar.cpio.gz
cpio's pass-through (copy-pass) mode is used to copy files from one directory to another, combining its copy-out and copy-in steps without creating an archive.
$ ls | cpio -p '/home/amnesia/foobar_copy'
0 blocks
$ ls '/home/amnesia/foobar_copy'
foo_1.txt foo_2.txt foo_3.txt foo_4.txt foo_5.txt
A third utility related to archiving on GNU/Linux is zip. Unlike gzip on the one hand and tar or cpio on the other, which separately handle compression and archiving, respectively, zip does both. Its basic syntax is:
zip ex_archive.zip ex_file...
zip uses two methods for adding files to an archive:
- stored: the file is stored without compression
- deflated: the file is compressed (and the percentage states how much the file was compressed)
zip automatically chooses which method to use, unless you disable compression using the -0 option.
zip automatically overwrites an existing entry in an archive that has the same name as a new file that you are adding to the archive. Like cpio, zip is not recursive, but it can act recursively with its -r option (i.e., like tar).
Some useful zip options include:
- -0: Disable compression.
- -d ex_archive_entry: Delete the ex_archive_entry entry (or entries) from the archive.
- -FS (--filesync): File system sync. Synchronizes an archive with the file system (i.e., updates the archive by adding files to the archive only if the file mentioned on the command line is newer than a pre-existing object of the same name in the archive, and deletes files from the archive that have not been named on the command line).
- -r (--recurse-paths): Causes zip to act recursively when acting on a directory.
- -u (--update): Update the archive by adding objects to the archive only if the object mentioned on the command line is newer than a pre-existing object of the same name in the archive.
Here are example commands for how to create, tell, and extract zip archives.
$ zip 'foobar.zip' *
adding: foo_1.txt (stored 0%)
adding: foo_2.txt (stored 0%)
adding: foo_3.txt (stored 0%)
adding: foo_4.txt (stored 0%)
adding: foo_5.txt (stored 0%)
Above, every file in the current directory (specified by the * search pattern) is added to a foobar.zip archive. The command was run from inside of the directory to be archived, i.e., the /home/amnesia/foobar directory.
In the prior example, if /home/amnesia/foobar contained any subdirectories, their content would not have been included in the final foobar.zip archive. You can change this by using the -r option:
$ zip -r 'foobar.zip' 'foobar'
adding: foobar/ (stored 0%)
adding: foobar/foo_3.txt (stored 0%)
adding: foobar/foo_2.txt (stored 0%)
adding: foobar/foo_1.txt (stored 0%)
adding: foobar/foobar_2/ (stored 0%)
adding: foobar/foobar_2/foo_3.txt (stored 0%)
adding: foobar/foobar_2/foo_2.txt (stored 0%)
adding: foobar/foobar_2/foo_1.txt (stored 0%)
adding: foobar/foobar_2/foo_5.txt (stored 0%)
adding: foobar/foobar_2/foo_4.txt (stored 0%)
adding: foobar/foo_5.txt (stored 0%)
adding: foobar/foo_4.txt (stored 0%)
As with tar, when creating archives of directory trees, you should run your zip command from the parent directory of the directory that you wish to create an archive from. For the example above, you would run the command from the /home/amnesia directory, i.e., the parent directory of the /home/amnesia/foobar directory.
Alternatively, you can do this:
$ find 'foobar' | zip -@ 'foobar.zip'
adding: foobar/ (stored 0%)
adding: foobar/foo_3.txt (stored 0%)
adding: foobar/foo_2.txt (stored 0%)
adding: foobar/foo_1.txt (stored 0%)
adding: foobar/foobar_2/ (stored 0%)
adding: foobar/foobar_2/foo_3.txt (stored 0%)
adding: foobar/foobar_2/foo_2.txt (stored 0%)
adding: foobar/foobar_2/foo_1.txt (stored 0%)
adding: foobar/foobar_2/foo_5.txt (stored 0%)
adding: foobar/foobar_2/foo_4.txt (stored 0%)
adding: foobar/foo_5.txt (stored 0%)
adding: foobar/foo_4.txt (stored 0%)
Here, the find command is used to recursively traverse the /home/amnesia/foobar directory and pass along the path name of each file to the zip command via a pipe (|). The -@ option tells zip to take its file list from the standard input, instead of as a list of arguments on the command line.
The find 'foobar' | zip -@ 'foobar.zip' command would be run from the /home/amnesia directory, i.e., the parent directory of the /home/amnesia/foobar directory. This use case is similar to how the find command can be used with the cpio command, as previously discussed. This kind of construction lets you use the flexibility of find to create custom filters for the kinds of files you want to include in your archive.
You can view the contents of a zip archive by using the -sf option (short for --show-files):
$ zip -sf 'foobar.zip'
Archive contains:
home/amnesia/foobar/
home/amnesia/foobar/foo_3.txt
home/amnesia/foobar/foo_2.txt
home/amnesia/foobar/foo_1.txt
home/amnesia/foobar/foobar_2/
home/amnesia/foobar/foobar_2/foo_3.txt
home/amnesia/foobar/foobar_2/foo_2.txt
home/amnesia/foobar/foobar_2/foo_1.txt
home/amnesia/foobar/foobar_2/foo_5.txt
home/amnesia/foobar/foobar_2/foo_4.txt
home/amnesia/foobar/foo_5.txt
home/amnesia/foobar/foo_4.txt
Total 12 entries (0 bytes)
The above example displays the contents of the foobar.zip archive in the current directory.
To decompress and extract a zip archive, use the unzip command:
$ unzip '/home/amnesia/foobar.zip'
Archive: /home/amnesia/foobar.zip
creating: foobar/
extracting: foobar/foo_3.txt
extracting: foobar/foo_2.txt
extracting: foobar/foo_1.txt
creating: foobar/foobar_2/
extracting: foobar/foobar_2/foo_3.txt
extracting: foobar/foobar_2/foo_2.txt
extracting: foobar/foobar_2/foo_1.txt
extracting: foobar/foobar_2/foo_5.txt
extracting: foobar/foobar_2/foo_4.txt
extracting: foobar/foo_5.txt
extracting: foobar/foo_4.txt
If the object that you are about to decompress and extract already exists in the current directory, unzip offers to skip it, to rename the object from the archive, or to overwrite the existing object in the current directory with the archive copy.
If you only want to decompress and extract specific objects in the archive, you can do so by specifying them as arguments on the command line:
$ unzip '/home/amnesia/foobar.zip' 'foobar/foo_3.txt'
Archive: /home/amnesia/foobar.zip
extracting: foobar/foo_3.txt
If you want to decompress and extract multiple zip archives in the same directory, you can do the following:
$ unzip '*.zip'
Archive: foobar.zip
extracting: foo_1.txt
extracting: foo_2.txt
extracting: foo_3.txt
extracting: foo_4.txt
extracting: foo_5.txt
Archive: foobar_2.zip
extracting: foo_10.txt
extracting: foo_6.txt
extracting: foo_7.txt
extracting: foo_8.txt
extracting: foo_9.txt
Archive: foobar_3.zip
extracting: foo_11.txt
extracting: foo_12.txt
extracting: foo_13.txt
extracting: foo_14.txt
extracting: foo_15.txt
3 archives were successfully processed.
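Here, *.zip needs to be surrounded by quotes ('') because, if the shell performs filename expansion, unzip will interpret the first archive (foobar.zip) as the target to operate on, and the other two zip archives (foobar_2.zip and foobar_3.zip) as entries you would like to decompress/extract from the target. When the pattern is quoted, unzip receives it unexpanded and processes each matching archive in turn.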
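The difference quoting makes can be seen without running unzip at all. In the sketch below (three hypothetical empty files), printf shows the argument list that a command would receive with and without quotes:

```shell
#!/bin/sh
# Sketch: show how the shell expands an unquoted *.zip pattern.
set -eu

demo=$(mktemp -d)
cd "$demo"
touch foobar.zip foobar_2.zip foobar_3.zip   # empty placeholder files

# Unquoted: the shell expands the pattern into three separate arguments,
# so a command such as unzip would see three names.
printf '[%s]\n' *.zip

# Quoted: the pattern reaches the command as one literal argument.
printf '[%s]\n' '*.zip'
```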