Managing compression and archiving tasks is a substantive part of administering a GNU/Linux system.
Note: If you are not familiar with the GNU/Linux command line interface, review the Conventions page before proceeding.
gzip
gzip
(GNU zip) is one of the most popular compression/decompression utilities available on GNU/Linux systems. Alternatives include bzip2 and xz, which are generally slower but usually achieve better compression ratios.
gzip Compression
gzip
replaces each uncompressed file with a compressed one ending in the .gz suffix.
gzip ex_file...
Search patterns can be used with gzip
(e.g., gzip *.txt
will replace all .txt files in the current directory with compressed versions).
These are some of gzip
's most helpful options:
-c
,--stdout
,--to-stdout
- Writes the compressed file to the standard output. The original file remains unmodified.
-k
,--keep
- Keep an uncompressed version of the file that is being compressed.
-l
,--list
- Displays important information about the compressed file (i.e., compressed size, uncompressed size, ratio, uncompressed name). This option takes one or more compressed files as arguments.
-r
,--recursive
- Instructs gzip to act recursively when acting on a directory.
-S ex_suffix
,--suffix ex_suffix
- Use the ex_suffix suffix instead of .gz. This option must also be used for decompression when dealing with non-.gz files.
-v
,--verbose
- Output the name and percentage reduction for each file compressed or decompressed.
-1...-9
- Specify a compression level. -1 is the quickest but compresses the least, -9 is the slowest but compresses the most, and -6 is used by default.
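As a rough illustration of the options above, the following sketch compresses a throwaway file in a temporary directory. The file names (ex_file.txt, ex_file_fast.gz) and the scratch directory are made up for this example, and -k assumes a reasonably recent version of gzip.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so that no real files are touched
# (all names below are hypothetical).
workdir=$(mktemp -d)
cd "$workdir"

# Create a highly repetitive sample file so that it compresses well.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > ex_file.txt

# -k keeps the original, -v reports the reduction, -9 asks for the best compression.
gzip -k -v -9 ex_file.txt

# -c writes to the standard output instead; the original file is left alone.
gzip -c -1 ex_file.txt > ex_file_fast.gz

# -l lists the compressed size, uncompressed size, ratio, and name.
gzip -l ex_file.txt.gz

original_size=$(wc -c < ex_file.txt)
compressed_size=$(wc -c < ex_file.txt.gz)
echo "original: $original_size bytes, compressed: $compressed_size bytes"
```

Without -k or -c, gzip would have removed ex_file.txt after compressing it.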
There are special alternatives for certain commands that are designed to work with gzip
-compressed files. These are:
zcat
zless
zdiff
zgrep
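A minimal sketch of two of these helpers, assuming the zcat and zgrep scripts that normally ship with GNU gzip are installed. The file name ex_notes.txt is hypothetical.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Create and compress a small sample file (only ex_notes.txt.gz remains).
printf 'alpha\nbeta\ngamma\n' > ex_notes.txt
gzip ex_notes.txt

# zcat prints the decompressed content without creating a file.
first_line=$(zcat ex_notes.txt.gz | head -n 1)
echo "first line: $first_line"

# zgrep searches inside the compressed file; -c counts the matching lines.
matches=$(zgrep -c 'a$' ex_notes.txt.gz)
echo "lines ending in a: $matches"
```

In both cases the compressed file stays on disk unchanged; the decompressed data only exists in the pipe.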
gzip Decompression
To decompress files that were compressed using gzip
, you can use the command's -d
(--decompress
) option or the gunzip
command.
gzip -d ex_file...
gunzip ex_file...
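The round trip can be checked with a short sketch like the one below. It assumes GNU gzip; the file names and the .zz suffix are invented for the example.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'some sample text\n' > ex_file.txt
cp ex_file.txt ex_reference.txt   # untouched copy for comparison

# Compress, then restore with gzip -d.
gzip ex_file.txt
gzip -d ex_file.txt.gz
cmp ex_file.txt ex_reference.txt && echo "gzip -d round trip OK"

# Compress again, then restore with gunzip, which behaves the same way.
gzip ex_file.txt
gunzip ex_file.txt.gz
cmp ex_file.txt ex_reference.txt && echo "gunzip round trip OK"

# With a custom suffix, the same -S value is needed when decompressing.
gzip -S .zz ex_file.txt
gunzip -S .zz ex_file.txt.zz
cmp ex_file.txt ex_reference.txt && echo "custom suffix round trip OK"
```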
tar
The tar
command (short for tape archive) is used to manage archives. Its general syntax is:
tar ex_options ex_archive
tar
has three main methods that act on its arguments. Each method has a corresponding letter that must be the first letter in the list of specified options:
-c
Create an archive
-t
Tell (list) the contents of an archive
-x
Extract the contents of an archive
When using tar
's -f
(--file
) option, it must come last among the tar
options. This option tells tar
that the next word on the command line is the plain file archive or device file archive to create or act upon. When creating or extracting archives, tar
does not send anything to the standard output, unless the -v
option (short for --verbose
) is used.
Arguments for tar
options come after the options bundle, and are matched to the corresponding parameter-taking options within the bundle, in turn. If arguments are provided using absolute path names, tar
automatically stores them as relative path names (i.e., leading /
s are removed). This avoids issues when unpacking archives on other systems.
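The handling of absolute path names can be observed with a small experiment. This is only a sketch, assuming GNU tar; the file and archive names are hypothetical.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/ex_file.txt"

# Archive the file by its absolute path. GNU tar prints a notice about
# removing the leading "/" on standard error, which is silenced here.
tar -cf "$workdir/ex_archive.tar" "$workdir/ex_file.txt" 2>/dev/null

# List the archive: the stored member name no longer starts with "/".
member=$(tar -tf "$workdir/ex_archive.tar")
echo "stored as: $member"
```

Because the member name is relative, extracting this archive recreates the path below the current directory instead of writing to the original absolute location.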
When tar
writes files to an archive, it writes them one after another, and annotates the files with additional information (e.g., date, access permissions, owner). tar archives can contain both files and whole directory hierarchies.
By default, tar
acts recursively.
These are some of the most useful tar
options:
--delete ex_archive_member
- Delete from the archive. This option is supplied with the name(s) of archive members to be removed as arguments, and only works for uncompressed archives.
-M
,--multi-volume
- Create/list/extract multi-volume archive.
-r ex_file
,--append ex_file
- Append the ex_file file(s) to the end of the archive. If a member with the same name already exists in the archive, it is not overwritten; the new version is simply added after it.
-u ex_file
,--update ex_file
- Append the ex_file file(s) only if they are newer than the corresponding copy inside the archive. Newer files do not replace their old archive copies, but are instead appended to the end of the archive. A file that is not yet in the archive is appended as well.
-v
,--verbose
- Verbosely list processed files.
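The append and delete behavior described above can be sketched as follows. The example assumes GNU tar (other implementations may not support --delete), and the file names are made up.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'version 1\n' > ex_a.txt
printf 'other\n' > ex_b.txt
tar -cf ex_archive.tar ex_a.txt ex_b.txt

# -r appends: the archive now holds two members named ex_a.txt,
# because the old copy is not overwritten.
printf 'version 2\n' > ex_a.txt
tar -rf ex_archive.tar ex_a.txt
copies=$(tar -tf ex_archive.tar | grep -c '^ex_a\.txt$')
echo "copies of ex_a.txt after -r: $copies"

# --delete removes a member by name (uncompressed archives only).
tar --delete -f ex_archive.tar ex_b.txt
remaining=$(tar -tf ex_archive.tar | grep -c '^ex_b\.txt$' || true)
echo "copies of ex_b.txt after --delete: $remaining"

# -v adds a detailed listing of what is left.
tar -tvf ex_archive.tar
```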
Like find, tar
is a complex command that can be better understood by viewing examples.
tar Archive Creation
tar -cf 'foobar.tar' 'foobar/'
The example above creates an archive called foobar.tar
from the foobar/
directory that is located inside of the current directory.
When creating a tar archive, run your tar
command from the parent directory of the directory that you wish to create an archive from. For the example above, if the foobar
directory was located at /home/amnesia/foobar/
, the command would be run from the /home/amnesia/
directory. After command completion, you would see a foobar.tar
archive at /home/amnesia/foobar.tar
.
If you want your archive to be compressed as well, add the -z
option, which adds compression via gzip
:
tar -czf 'foobar.tar.gz' 'foobar/'
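To try both commands without touching real data, a sketch along these lines may help. The temporary parent directory stands in for /home/amnesia/, and the sample files are invented for the example.

```shell
#!/bin/sh
set -eu

# Hypothetical layout: $parent/foobar/ stands in for /home/amnesia/foobar/.
parent=$(mktemp -d)
mkdir -p "$parent/foobar/foobar_2"
printf 'one\n' > "$parent/foobar/foo_1.txt"
printf 'two\n' > "$parent/foobar/foobar_2/foo_2.txt"

# Run tar from the parent directory so that member names start with "foobar/".
cd "$parent"
tar -cf 'foobar.tar' 'foobar/'
tar -czf 'foobar.tar.gz' 'foobar/'

# gzip -t checks the integrity of the compressed archive.
gzip -t 'foobar.tar.gz' && echo "foobar.tar.gz is valid gzip data"

# Both archives list the same members.
plain_list=$(tar -tf 'foobar.tar')
gz_list=$(tar -tf 'foobar.tar.gz')
echo "$plain_list"
```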
tar Archive Telling
$ tar -tf '/home/amnesia/foobar.tar'
foobar/
foobar/foo_4.txt
foobar/foo_3.txt
foobar/foobar_2/
foobar/foobar_2/foo_4.txt
foobar/foobar_2/foo_3.txt
foobar/foobar_2/foo_1.txt
foobar/foobar_2/foo_2.txt
foobar/foobar_2/foo_5.txt
foobar/foo_1.txt
foobar/foo_2.txt
foobar/foo_5.txt
The example above displays (tells) the contents of the /home/amnesia/foobar.tar
archive. The same command would work if the archive were compressed (e.g., tar -tf '/home/amnesia/foobar.tar.gz'
).
tar Archive Extraction
tar -xf '/home/amnesia/foobar.tar'
The example above extracts the contents of the /home/amnesia/foobar.tar
archive into the current directory. As with archive telling, the same command would work if the archive were compressed (e.g., tar -xf '/home/amnesia/foobar.tar.gz'
).
When you extract a tar archive, it is best to do so from a directory other than the directory where you created the archive. For example, the /tmp/
directory is a good location to quickly extract archives to.
You can give specific file or directory names to extract. In this case, only the file or directory specified will be extracted from the archive, e.g., tar -xf '/home/amnesia/foobar.tar' 'foobar/foo_4.txt'
or tar -xf '/home/amnesia/foobar.tar' 'foobar/foobar_2/'
. If multiple objects with the same name are in the archive, the latest one (by its order in the archive) is extracted.
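The following sketch extracts a whole archive and then a single member, each into its own scratch directory. All directory names (src, out_all, out_one) are hypothetical stand-ins for locations such as /tmp/.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/src/foobar/foobar_2" "$workdir/out_all" "$workdir/out_one"
printf 'four\n' > "$workdir/src/foobar/foo_4.txt"
printf 'one\n' > "$workdir/src/foobar/foobar_2/foo_1.txt"

# Build a small archive to extract from.
cd "$workdir/src"
tar -cf "$workdir/foobar.tar" 'foobar/'

# Extract everything, from a directory other than the one holding the sources.
cd "$workdir/out_all"
tar -xf "$workdir/foobar.tar"

# Extract only one member by naming it after the archive.
cd "$workdir/out_one"
tar -xf "$workdir/foobar.tar" 'foobar/foo_4.txt'

# Show what ended up in each output directory.
find "$workdir/out_all" "$workdir/out_one" -type f | sort
```

Note that the member name must be given exactly as it appears in the output of tar -tf, including the leading foobar/ component.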
cpio
Another popular archiving utility on GNU/Linux is cpio
. The initrd.img
file used to help boot a Debian GNU/Linux system is a gzip
-compressed cpio
archive.
cpio
has three operating modes:
- Copy-out: Read a list of filenames from the standard input and create an archive containing these files on the standard output. Uses the -o (--create) option.
- Copy-in: Read the archive from the standard input or from a file (--file) and extract files from it (uses the -i option; short for --extract), or list its contents to the standard output (uses the -t option).
- Pass-through: Read a list of filenames from the standard input and copy them to the specified directory. Uses the -p (--pass-through) option.
In some ways, cpio
operates like tar
. For example, it appends objects to the end of an archive, without overwriting any extant objects in the archive with the same name. cpio archives can contain both files and whole directory hierarchies. Also, when creating or extracting archives, cpio
does not send anything to the standard output unless the -v
(--verbose
) option is used.
Unlike tar
, cpio
needs input and output specified via redirection or a pipe. Also, cpio
is not recursive, which is why you may see it paired with the find
command, which is recursive (i.e., cpio
will not include the contents of any subdirectories of the directory from which you create your cpio archive).
The following are example commands for how to create, tell, and extract cpio archives, as well as how to copy files from one directory to another with cpio
.
cpio Archive Creation
$ ls | cpio -o > '/home/amnesia/foobar/foobar.cpio'
1 block
The above command creates an archive called foobar.cpio
from the /home/amnesia/foobar/
directory. When creating a cpio archive, you should run your command from inside of the directory that you wish to archive. For the above example, the foobar
directory was located at /home/amnesia/foobar/
, so the command was run from the /home/amnesia/foobar/
directory.
You can compress your cpio archive by piping the cpio
output through gzip
:
$ ls | cpio -o | gzip > '/home/amnesia/foobar/foobar.cpio.gz'
1 block
As previously mentioned, unlike tar
, cpio
is not recursive. If you want to make a cpio archive of a directory that contains subdirectories (i.e., a directory tree), you can pair it with the find
command.
$ find . | cpio -o > '/home/amnesia/foobar/foobar.cpio'
2 blocks
Above, the find
command uses the current directory (.
) as the directory tree to begin its recursive search, and then pipes the filename paths it returns to the cpio
command.
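A self-contained sketch of the find and cpio pairing is shown below. It assumes GNU cpio and skips itself if cpio is not installed. Unlike the commands above, it writes the archive outside of the directory being archived, so the archive does not end up listed inside itself; that choice, like the file names, is only for this illustration.

```shell
#!/bin/sh
set -eu

cpio_demo() {
    workdir=$(mktemp -d)
    mkdir -p "$workdir/foobar/foobar_2"
    printf 'one\n' > "$workdir/foobar/foo_1.txt"
    printf 'two\n' > "$workdir/foobar/foobar_2/foo_2.txt"

    # Run find from inside the directory to archive, but write the
    # archive one level up. The block count on stderr is silenced.
    cd "$workdir/foobar"
    find . | cpio -o > "$workdir/foobar.cpio" 2>/dev/null

    # List the members to confirm that the subdirectory was included.
    listing=$(cpio -t < "$workdir/foobar.cpio" 2>/dev/null)
    echo "$listing"
}

if command -v cpio >/dev/null 2>&1; then
    cpio_demo
else
    echo "cpio is not installed; skipping" >&2
fi
```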
cpio Archive Telling
$ cpio -t < '/home/amnesia/foobar/foobar.cpio'
foo_1.txt
foo_2.txt
foo_3.txt
foo_4.txt
foo_5.txt
foobar.cpio
1 block
To tell the contents of a gzip
-compressed cpio archive, you need to first decompress the archive, and then pipe the output to cpio
:
$ gzip -cd '/home/amnesia/foobar/foobar.cpio.gz' | cpio -t
foo_1.txt
foo_2.txt
foo_3.txt
foo_4.txt
foo_5.txt
foobar.cpio.gz
1 block
Above, gzip
's -c
option is used to send the command's output to the standard output, which is then piped to cpio -t
.
cpio Archive Extraction
$ cpio -i < '/home/amnesia/foobar/foobar.cpio'
1 block
$ ls
foo_1.txt foo_2.txt foo_3.txt foo_4.txt foo_5.txt foobar.cpio
Above, the contents of the foobar.cpio
archive are extracted into the current directory.
As with tar
, when you extract a cpio archive, it is best to do so from a directory other than the directory where you created the archive (e.g., /tmp/
). If you are going to extract a cpio archive that contains a directory tree, you can add the -d
option to create leading directories where needed.
You can extract files from a gzip
-compressed cpio archive like so:
$ gzip -cd '/home/amnesia/foobar/foobar.cpio.gz' | cpio -i
1 block
$ ls
foo_1.txt foo_2.txt foo_3.txt foo_4.txt foo_5.txt foobar.cpio.gz
Copying Files With cpio
cpio
's pass-through (copy-pass) mode is used to copy files from one directory to another, combining its copy-out and copy-in modes without creating an archive.
$ ls | cpio -p '/home/amnesia/foobar_copy/'
0 blocks
$ ls '/home/amnesia/foobar_copy/'
foo_1.txt foo_2.txt foo_3.txt foo_4.txt foo_5.txt
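For a directory tree, pass-through mode can be combined with find and the -d option mentioned earlier. The sketch below assumes GNU cpio and skips itself if cpio is missing; the directory names are hypothetical.

```shell
#!/bin/sh
set -eu

cpio_copy_demo() {
    workdir=$(mktemp -d)
    mkdir -p "$workdir/foobar/foobar_2" "$workdir/foobar_copy"
    printf 'one\n' > "$workdir/foobar/foo_1.txt"
    printf 'two\n' > "$workdir/foobar/foobar_2/foo_2.txt"

    cd "$workdir/foobar"
    # -p copies the listed files; -d creates leading directories where needed.
    find . -type f | cpio -pd "$workdir/foobar_copy" 2>/dev/null

    # Show the copied tree.
    find "$workdir/foobar_copy" -type f | sort
}

if command -v cpio >/dev/null 2>&1; then
    cpio_copy_demo
else
    echo "cpio is not installed; skipping" >&2
fi
```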
zip
A third utility related to archiving on GNU/Linux is zip
. Unlike gzip
and tar
/cpio
, which separately handle compression and archiving, respectively, zip
does both.
zip ex_archive.zip ex_file...
zip
knows two methods for adding files to an archive:
- stored means that the file was stored without compression
- deflated denotes compression (and the percentage states how much the file was compressed)
zip
automatically chooses which method to use, unless you disable compression using the -0
option.
Unlike tar
and cpio
, zip
automatically overwrites existing content in an archive that has the same name as a new file that you are adding to the archive. Like cpio
, zip
is not recursive, but can act recursively with its -r
option (i.e., like gzip
).
Helpful zip
options include:
-0
- Disable compression.
-d ex_archive_entry
,--delete ex_archive_entry
- Delete the ex_archive_entry entry (or entries) from the archive.
-FS
,--filesync
- File system sync. Synchronizes an archive with the file system (i.e., updates the archive by adding files to the archive only if the file mentioned on the command line is newer than a pre-existing object of the same name in the archive, and deletes files from the archive that have not been named on the command line).
-r
,--recurse-paths
- Instructs zip to act recursively when acting on a directory.
-u
,--update
- Update the archive by adding objects to the archive only if the object mentioned on the command line is newer than a pre-existing object of the same name in the archive.
Here are example commands for how to create, tell, and extract zip archives.
Create zip Archive of All Files in the Current Directory
$ zip 'foobar.zip' *
adding: foo_1.txt (stored 0%)
adding: foo_2.txt (stored 0%)
adding: foo_3.txt (stored 0%)
adding: foo_4.txt (stored 0%)
adding: foo_5.txt (stored 0%)
Above, every file in the current directory (specified by the *
search pattern) is added to a foobar.zip
archive. The command was run from inside of the directory to be archived, i.e., the /home/amnesia/foobar
directory.
Create zip Archive of All Files in a Directory Tree
In the prior example, if /home/amnesia/foobar/
contained any subdirectories, their content would not have been included in the final foobar.zip
archive. You can change this by using zip
's -r
option.
$ zip -r 'foobar.zip' 'foobar/'
adding: foobar/ (stored 0%)
adding: foobar/foo_3.txt (stored 0%)
adding: foobar/foo_2.txt (stored 0%)
adding: foobar/foo_1.txt (stored 0%)
adding: foobar/foobar_2/ (stored 0%)
adding: foobar/foobar_2/foo_3.txt (stored 0%)
adding: foobar/foobar_2/foo_2.txt (stored 0%)
adding: foobar/foobar_2/foo_1.txt (stored 0%)
adding: foobar/foobar_2/foo_5.txt (stored 0%)
adding: foobar/foobar_2/foo_4.txt (stored 0%)
adding: foobar/foo_5.txt (stored 0%)
adding: foobar/foo_4.txt (stored 0%)
As with tar
, when creating archives of directory trees, you should run your zip
command from the parent directory of the directory that you wish to create an archive from. For the example above, you would run the command from the /home/amnesia/
directory, i.e., the parent directory of the /home/amnesia/foobar/
directory.
Alternatively, you can do this:
$ find 'foobar/' | zip -@ 'foobar.zip'
adding: foobar/ (stored 0%)
adding: foobar/foo_3.txt (stored 0%)
adding: foobar/foo_2.txt (stored 0%)
adding: foobar/foo_1.txt (stored 0%)
adding: foobar/foobar_2/ (stored 0%)
adding: foobar/foobar_2/foo_3.txt (stored 0%)
adding: foobar/foobar_2/foo_2.txt (stored 0%)
adding: foobar/foobar_2/foo_1.txt (stored 0%)
adding: foobar/foobar_2/foo_5.txt (stored 0%)
adding: foobar/foobar_2/foo_4.txt (stored 0%)
adding: foobar/foo_5.txt (stored 0%)
adding: foobar/foo_4.txt (stored 0%)
Above, the find
command is used to recursively traverse the /home/amnesia/foobar/
directory and pass along the path name of each file to the zip
command via a pipe (|
). -@
tells zip
to take its file list from the standard input, instead of as a list of arguments on the command line.
The find 'foobar/' | zip -@ 'foobar.zip'
command would be run from the /home/amnesia/
directory, i.e., the parent directory of the /home/amnesia/foobar/
directory. This use case is similar to how the find
command can be used with the cpio
command, as previously discussed. This kind of construction allows you to utilize the powerful flexibility of find
to create custom filters for the kinds of files you want to include in your archive.
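As one possible filter, the sketch below archives only the .txt files of a directory tree and leaves a log file out. It assumes the Info-ZIP zip and unzip commands and skips itself if either is missing; the file names, the -q (quiet) option, and the use of unzip -Z1 to print one member name per line are choices made for this example.

```shell
#!/bin/sh
set -eu

zip_demo() {
    parent=$(mktemp -d)
    mkdir -p "$parent/foobar/foobar_2"
    printf 'one\n' > "$parent/foobar/foo_1.txt"
    printf 'two\n' > "$parent/foobar/foobar_2/foo_2.txt"
    printf 'log\n' > "$parent/foobar/debug.log"

    # Run from the parent directory, as with the previous examples.
    cd "$parent"

    # Only pass the .txt files to zip; -@ reads the file list from stdin.
    find 'foobar/' -name '*.txt' | zip -q -@ 'foobar_txt.zip'

    # Print the member names, one per line.
    listing=$(unzip -Z1 'foobar_txt.zip')
    echo "$listing"
}

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    zip_demo
else
    echo "zip or unzip is not installed; skipping" >&2
fi
```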
zip Archive Telling
You can view the contents of a zip archive by using zip
's -sf
(--show-files
) option.
$ zip -sf 'foobar.zip'
Archive contains:
home/amnesia/foobar/
home/amnesia/foobar/foo_3.txt
home/amnesia/foobar/foo_2.txt
home/amnesia/foobar/foo_1.txt
home/amnesia/foobar/foobar_2/
home/amnesia/foobar/foobar_2/foo_3.txt
home/amnesia/foobar/foobar_2/foo_2.txt
home/amnesia/foobar/foobar_2/foo_1.txt
home/amnesia/foobar/foobar_2/foo_5.txt
home/amnesia/foobar/foobar_2/foo_4.txt
home/amnesia/foobar/foo_5.txt
home/amnesia/foobar/foo_4.txt
Total 12 entries (0 bytes)
The above example displays the contents of the foobar.zip
archive in the current directory.
zip Archive Decompression and Extraction
$ unzip '/home/amnesia/foobar.zip'
Archive: /home/amnesia/foobar.zip
creating: foobar/
extracting: foobar/foo_3.txt
extracting: foobar/foo_2.txt
extracting: foobar/foo_1.txt
creating: foobar/foobar_2/
extracting: foobar/foobar_2/foo_3.txt
extracting: foobar/foobar_2/foo_2.txt
extracting: foobar/foobar_2/foo_1.txt
extracting: foobar/foobar_2/foo_5.txt
extracting: foobar/foobar_2/foo_4.txt
extracting: foobar/foo_5.txt
extracting: foobar/foo_4.txt
If the object that you are about to decompress and extract already exists in the current directory, unzip
offers to ignore it in the archive, rename the object from the archive, or overwrite the existing object in the current directory with the archive copy.
If you only want to decompress and extract specific objects in the archive, you can do so by specifying them as arguments on the command line:
$ unzip '/home/amnesia/foobar.zip' 'foobar/foo_3.txt'
Archive: /home/amnesia/foobar.zip
extracting: foobar/foo_3.txt
If you want to decompress and extract multiple zip archives in the same directory, you can do the following:
$ unzip '*.zip'
Archive: foobar.zip
extracting: foo_1.txt
extracting: foo_2.txt
extracting: foo_3.txt
extracting: foo_4.txt
extracting: foo_5.txt
Archive: foobar_2.zip
extracting: foo_10.txt
extracting: foo_6.txt
extracting: foo_7.txt
extracting: foo_8.txt
extracting: foo_9.txt
Archive: foobar_3.zip
extracting: foo_11.txt
extracting: foo_12.txt
extracting: foo_13.txt
extracting: foo_14.txt
extracting: foo_15.txt
3 archives were successfully processed.
Above, *.zip
needs to be enclosed in quotes (''
) because if filename expansion occurs, unzip
will interpret one archive as the target to operate on (foobar.zip
), and the other two zip archives (foobar_2.zip
and foobar_3.zip
) as entries you would like to decompress/extract from the target.
Documentation
You can learn more about gzip
, tar
, cpio
, and zip
by examining their man pages on the command line, or online.