
Various File and Disk Usage Utilities

Submitted to

Dr. Leangsuksun

Computer Science Department

Louisiana Tech University

Ruston, LA

By

Tamor Ursin

Computer Science

11 May 2004

Introduction

What are file and disk usage utilities? File utilities can be organized into three major categories: file management, file information, and file compression. File management utilities help manage files on the Linux operating system. File information utilities report the information about files or programs that the user requests. Finally, file compression utilities compress and decompress files.

Disk usage utilities provide four main commands which allow users to check usage for memory, space, and file systems. Throughout this paper I will further discuss and give examples of two major utilities from each of the file management, file information, and file compression categories, as well as the four disk usage utilities.

File Utilities

File Management

Two major file management utilities are rsync and chown. Rsync is a program that behaves in much the same way that rcp does, but it has many more options and uses the rsync remote-update protocol to greatly increase file transfer speed when the destination file already exists. The rsync protocol allows rsync to transfer only the differences between two sets of files across the network link, using an efficient search algorithm. Rsync's specialty is synchronizing file trees across networks; however, it can also be used on a single computer.

Some of the major additional features of rsync include support for copying links, devices, owners, groups, and permissions. Secondly, rsync supports exclude and exclude-from options similar to those of GNU tar. Lastly, rsync has a CVS exclude mode for ignoring the same files that CVS would ignore. For example, suppose you have a directory called source and you want to create a backup in destination. This is how the process would be accomplished:

rsync -a source/ destination
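
The idea of sending only the differences can be sketched in a few lines of Python. This is a simplified illustration, not rsync's actual implementation: the real program pairs a fast rolling checksum with a strong checksum, while this sketch simply recomputes a SHA-256 hash at every offset. The names BLOCK, signatures, delta, and patch are hypothetical and chosen only for this example.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow by hand


def signatures(old: bytes) -> dict:
    """Map the hash of each fixed-size block of the old file to its block index."""
    sigs = {}
    for i in range(0, len(old), BLOCK):
        sigs.setdefault(hashlib.sha256(old[i:i + BLOCK]).digest(), i // BLOCK)
    return sigs


def delta(new: bytes, sigs: dict) -> list:
    """Describe `new` as ('copy', block_index) and ('data', literal_bytes) steps."""
    out, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + BLOCK]
        # Only full-size chunks are compared against the old file's blocks.
        idx = sigs.get(hashlib.sha256(chunk).digest()) if len(chunk) == BLOCK else None
        if idx is not None:
            if literal:
                out.append(("data", bytes(literal)))
                literal = bytearray()
            out.append(("copy", idx))
            pos += BLOCK
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        out.append(("data", bytes(literal)))
    return out


def patch(old: bytes, steps: list) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    parts = []
    for kind, value in steps:
        parts.append(old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value)
    return b"".join(parts)


old = b"the quick brown fox."
new = b"the quick red brown fox."
d = delta(new, signatures(old))
print(d)
```

Only the "data" steps would have to cross the network; the "copy" steps refer to blocks the destination already holds.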

The second major utility is chown, which changes the owner and group of a file. The underlying chown system call changes the owner of the file specified by path or by fd. Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member, while the super-user may change the group arbitrarily. If the owner or group is specified as -1, then that ID is not changed. When the owner or group of an executable file is changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this should also happen when root does the chown; the Linux behavior depends on the kernel version. In the case of a non-group-executable file (with a clear S_IXGRP bit), the S_ISGID bit indicates mandatory locking and is not cleared by a chown.

A practical use of the chown command is to change the owner and group of a file at the same time:

chown owner:group filename

A second example changes the owner and group of a directory and every file beneath it at the same time, using the recursive option:

chown -R owner:group directory
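
The rule that an ID of -1 is left unchanged can be tried from Python, whose os.chown wraps the same system call. This is a minimal sketch that assumes a Unix-like system; the helper names change_group_only and chown_tree are hypothetical, and the use of lchown for entries inside the tree is a choice made for this sketch rather than a statement about how chown -R treats symbolic links.

```python
import os
import tempfile


def change_group_only(path: str, gid: int) -> None:
    """Change only the group; -1 tells chown to leave the owner unchanged."""
    os.chown(path, -1, gid)


def chown_tree(top: str, uid: int, gid: int) -> None:
    """Rough sketch of a recursive chown over `top` and everything below it."""
    os.chown(top, uid, gid)
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            # lchown changes a symbolic link itself, not the file it points to
            os.lchown(os.path.join(dirpath, name), uid, gid)


with tempfile.TemporaryDirectory() as top:
    path = os.path.join(top, "example.txt")
    open(path, "w").close()
    before = os.stat(path)
    # Re-assigning our own effective IDs needs no special privilege.
    chown_tree(top, os.geteuid(), os.getegid())
    # Here both IDs end up as -1, so the call changes nothing.
    change_group_only(path, -1)
    after = os.stat(path)

print(before.st_uid, after.st_uid, after.st_gid)
```

Changing the owner to a different user would fail with a permission error unless the script runs as the super-user, which matches the rules described above.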

File Information

The strace utility traces the system calls and signals of a program. In the simplest case strace runs the specified command until it exits. It intercepts and records the system calls which are called by a process and the signals which are received by a process. The name of each system call, its arguments, and its return value are printed on standard error or to the file specified with the -o option.

Strace is a useful diagnostic, instructional, and debugging tool. System administrators, diagnosticians, and trouble-shooters find it invaluable for solving problems with programs for which the source is not readily available, since the programs do not need to be recompiled in order to trace them. Students, hackers, and the overly curious will find that a great deal can be learned about a system and its system calls by tracing even ordinary programs. Programmers find that since system calls and signals are events that happen at the user/kernel interface, a close examination of this boundary is very useful for bug isolation, sanity checking, and attempting to capture race conditions. The two forms of the command are:

1. strace [ -dffhiqrtttTvxx ] [ -acolumn ] [ -eexpr ] ... [ -ofile ] [ -ppid ] ... [ -sstrsize ] [ -uusername ] [ -Evar=val ] ... [ -Evar ] ... [ command [ arg ... ] ]

2. strace -c [ -eexpr ] ... [ -Ooverhead ] [ -Ssortby ] [ command [ arg ... ] ]

The second utility I am introducing is head, which has a twin named tail. By default, head prints the first 10 lines of each file to standard output. With more than one file, it precedes each with a header giving the file name. Arguments that are mandatory for long options are mandatory for short options too. Head has four main options: first, to print the first SIZE bytes; second, to print the first NUMBER lines instead of the first 10; third, to never print headers giving file names; and finally, the verbose option, which always prints headers giving file names. The corresponding options are:

  1. -c, --bytes=SIZE
  2. -n, --lines=NUMBER
  3. -q, --quiet, --silent
  4. -v, --verbose
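
To make the four options concrete, here is a small Python sketch of what head does. It only approximates the real utility; the function names head_lines, head_bytes, and head are hypothetical, and the sample file is created just for the demonstration.

```python
import itertools
import os
import tempfile


def head_lines(path: str, number: int = 10) -> list:
    """Like `head -n NUMBER`: return the first NUMBER lines (default 10)."""
    with open(path, "rb") as f:
        return list(itertools.islice(f, number))


def head_bytes(path: str, size: int) -> bytes:
    """Like `head -c SIZE`: return the first SIZE bytes."""
    with open(path, "rb") as f:
        return f.read(size)


def head(paths: list, number: int = 10, quiet: bool = False, verbose: bool = False) -> str:
    """Join the heads of several files, adding `==> name <==` headers when needed."""
    # Headers appear for more than one file unless -q is given; -v forces them.
    show_headers = verbose or (len(paths) > 1 and not quiet)
    pieces = []
    for path in paths:
        body = b"".join(head_lines(path, number)).decode()
        pieces.append(f"==> {path} <==\n{body}" if show_headers else body)
    return "\n".join(pieces) if show_headers else "".join(pieces)


# Build a 12-line sample file so the default of 10 lines is visible.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {i}\n" for i in range(1, 13)))
sample = tmp.name

first_ten = head_lines(sample)
first_two = head_lines(sample, 2)
first_bytes = head_bytes(sample, 6)
with_header = head([sample], number=1, verbose=True)
os.remove(sample)
print(with_header)
```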

File Compression

There are four major types of Linux file compression utilities. The two that will be emphasized are bzip2 and gzip. Using the bzip2 utilities is as simple as using any other command-line tool. There are switches to use with the main command, but typical usage will be without switches. The most important thing to remember is that bzip2 compresses and bunzip2 decompresses. If you have a file named todays_payroll and you need this file compressed with bzip2, run the command bzip2 todays_payroll, which will result in the file todays_payroll.bz2. To decompress the new file, run the command bunzip2 todays_payroll.bz2, and the original file will appear intact.

Unlike bzip2, the gzip compression utilities use Lempel-Ziv coding (LZ77). This compression technique is based on numerically indexing character string segments, based on their first appearance in a file, and then replacing those strings with numeric values in future occurrences. The algorithm is complex, and it does not offer an enormous upside in file size reduction on short inputs. A 14-character test string, abaabaaabbabb, that I compressed using Lempel-Ziv, dropped to 13 characters, 0a0b1a2b1ab45.
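
The indexing scheme described above can be sketched in Python. The sketch follows the dictionary-building member of the Lempel-Ziv family (usually called LZ78), because that is the variant that numbers string segments on first appearance; gzip itself uses LZ77 back-references combined with Huffman coding, so this is an illustration of the idea and not gzip's algorithm. The function names and the (index, character) pair notation are assumptions made for this example.

```python
def lz78_encode(text: str) -> list:
    """Encode text as (phrase_index, next_character) pairs; index 0 is the empty phrase."""
    phrases = {}   # phrase -> number assigned at its first appearance
    output = []
    current = ""
    for ch in text:
        if current + ch in phrases:
            current += ch          # keep extending a phrase we have seen before
        else:
            output.append((phrases.get(current, 0), ch))
            phrases[current + ch] = len(phrases) + 1
            current = ""
    if current:
        # Input ended in the middle of a known phrase: emit its index alone.
        output.append((phrases[current], ""))
    return output


def lz78_decode(pairs: list) -> str:
    """Rebuild the text by replaying the same phrase numbering."""
    phrases = [""]
    for index, ch in pairs:
        phrases.append(phrases[index] + ch)
    return "".join(phrases[1:])


pairs = lz78_encode("abaabaaabbabb")
print(pairs)
```

On an input this short the list of pairs is about as long as the input itself, which is consistent with the small saving reported for the test string; the scheme pays off only when longer phrases repeat.
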
I compressed a 34-MB file with bzip2 down to 11 MB; gzip compressed the file to 12 MB but took only about half the time. Bzip2 rearranges blocks of data in such a way as to make the overall file smaller, while gzip simply makes each repeated string smaller by replacement. Because gzip does not have quite the compression ratio of bzip2, yet is able to compress much faster, gzip is best suited for on-the-fly compression where size is not an issue. Other than speed, gzip holds one other benefit over bzip2: it is able to work with multiple formats. Where bzip2 mainly handles files with the .bz2 extension, gunzip can decompress .gz, .Z, and .tgz files, as well as .zip archives that contain a single member.
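
A comparison like the one above can be repeated with Python's standard gzip and bz2 modules, which implement the same formats as the command-line tools. This is a rough sketch: the sample data is made up for the example, and the sizes and timings it prints will differ from the 34-MB file discussed in the text.

```python
import bz2
import gzip
import time


def compare(data: bytes) -> dict:
    """Compress data with gzip and bzip2, recording output size and elapsed time."""
    results = {}
    for name, module in (("gzip", gzip), ("bzip2", bz2)):
        start = time.perf_counter()
        packed = module.compress(data)
        elapsed = time.perf_counter() - start
        # Make sure the round trip restores the original bytes.
        assert module.decompress(packed) == data
        results[name] = {"bytes": len(packed), "seconds": elapsed}
    return results


# Hypothetical, highly repetitive sample input.
sample = b"payroll record: employee, hours, rate\n" * 20000
report = compare(sample)
for name, stats in report.items():
    print(f"{name}: {stats['bytes']} bytes in {stats['seconds']:.4f} s")
```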

The bzip2recover utility (part of bzip2) can recover data from .bz2 files damaged by transmission errors or faulty media. It should only be used on larger .bz2 files, because the larger the file, the more recoverable blocks it will contain. To attempt recovery, run the command bzip2recover file_name. Each recovered block is written to its own file, whose name has a leading rec00001 (where 00001 is the number of the extracted block).

Both gzip and gunzip have a number of switches that can be passed to the command. The three most useful switches are:

  • -N: This always saves the original file name and time stamp.
  • -r: This recursively compresses a directory.
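  • -c: This writes the compressed output to standard output and leaves the original file unchanged, which makes it possible to concatenate compressed files.

The -c switch must be used with caution, because redirecting with > overwrites the target file while >> appends to it. Concatenating two files requires two steps:

  1. Step 1: gzip -c file1 > file.gz
  2. Step 2: gzip -c file2 >> file.gz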
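
The difference between overwriting and appending can be checked with Python's gzip module, which reads concatenated gzip members as one stream. This is a minimal sketch; the helper names gzip_overwrite and gzip_append and the sample contents are hypothetical.

```python
import gzip
import os
import tempfile


def gzip_overwrite(source: bytes, archive_path: str) -> None:
    """Like `gzip -c file > archive`: replace whatever the archive held."""
    with open(archive_path, "wb") as out:
        out.write(gzip.compress(source))


def gzip_append(source: bytes, archive_path: str) -> None:
    """Like `gzip -c file >> archive`: add one more gzip member at the end."""
    with open(archive_path, "ab") as out:
        out.write(gzip.compress(source))


with tempfile.TemporaryDirectory() as top:
    archive = os.path.join(top, "file.gz")

    # Step 1 then step 2 with an append: both files survive.
    gzip_overwrite(b"first\n", archive)
    gzip_append(b"second\n", archive)
    with gzip.open(archive, "rb") as f:
        appended = f.read()

    # Using an overwrite for the second step loses the first file.
    gzip_overwrite(b"first\n", archive)
    gzip_overwrite(b"second\n", archive)
    with gzip.open(archive, "rb") as f:
        overwritten = f.read()

print(appended, overwritten)
```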

Disk Usage Utilities

There are four main commands associated with disk usage: df, du, stat, and sync. Df reports disk usage for all mounted drives. Du reports disk usage for the current directory and all of its subdirectories, and du -h shows the results in human-readable form (KB and MB). Stat displays information about the specified file(s). Sync writes data buffered in memory out to disk.

With no arguments, df reports the space used and available on all currently mounted filesystems (of all types). Otherwise, df reports on the filesystem containing each argument file. Normally the disk space is printed in units of 1024 bytes, but this can be overridden. Non-integer quantities are rounded up to the next higher unit.

If an argument file is a disk device file containing a mounted filesystem, df shows the space available on that filesystem rather than on the filesystem containing the device node. GNU df does not attempt to determine the disk usage on unmounted filesystems, because on most kinds of systems doing so requires extremely nonportable, intimate knowledge of filesystem structures.
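
The unit handling described above, 1024-byte units with fractions rounded up, can be sketched with Python's shutil.disk_usage. This is only an approximation of df for a single path; the function names to_1k_blocks and df are hypothetical, and the path "/" is an assumption that holds on Unix-like systems.

```python
import shutil


def to_1k_blocks(size_in_bytes: int) -> int:
    """Convert bytes to 1024-byte units, rounding any fraction up."""
    return (size_in_bytes + 1023) // 1024


def df(path: str = "/") -> dict:
    """Report total, used, and free space of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total": to_1k_blocks(usage.total),
        "used": to_1k_blocks(usage.used),
        "free": to_1k_blocks(usage.free),
    }


report = df("/")
print(f"1K-blocks: {report['total']}  used: {report['used']}  available: {report['free']}")
```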

With no arguments, du reports the disk space used by the current directory. Normally the disk space is printed in units of 1024 bytes, but this can be overridden. Non-integer quantities are rounded up to the next higher unit.

The syntax for this command is:

du [option]... [file]...
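
A much-simplified version of what du does can be written as a directory walk in Python. This sketch adds up apparent file sizes rather than allocated disk blocks, so its totals will not match du exactly; the function names du and human and the sample files are hypothetical.

```python
import os
import tempfile


def du(top: str) -> int:
    """Sum the apparent sizes in bytes of all files under `top`.

    Symbolic links are counted by their own size and are not followed.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def human(size: float) -> str:
    """Format a byte count with a unit suffix, in the spirit of `du -h`."""
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    with open(os.path.join(top, "a.bin"), "wb") as f:
        f.write(b"x" * 5000)
    with open(os.path.join(top, "sub", "b.bin"), "wb") as f:
        f.write(b"y" * 300)
    total = du(top)

print(total, human(total))
```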

On BSD systems, du reports sizes that are half the correct values for files that are NFS-mounted from HP-UX systems. On HP-UX systems, it reports sizes that are twice the correct values for files that are NFS-mounted from BSD systems. This is due to a flaw in HP-UX; it also affects the HP-UX du program.

Stat reports all available information about the given files. It can also be used to report information about the filesystems on which the given files are located. If the files are links, stat can also give information about the files the links point to.
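
The three kinds of information mentioned above (the file itself, the target of a link, and the filesystem) map onto three calls in Python's os module. This is a minimal sketch that assumes a Unix-like system where symbolic links can be created; the file names are made up for the example.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as top:
    target = os.path.join(top, "target.txt")
    link = os.path.join(top, "link")
    with open(target, "w") as f:
        f.write("hello\n")
    os.symlink(target, link)

    link_info = os.lstat(link)     # information about the link itself
    target_info = os.stat(link)    # follows the link, like `stat -L`
    fs_info = os.statvfs(target)   # the containing filesystem, like `stat -f`

print("link is a symlink:", stat.S_ISLNK(link_info.st_mode))
print("target size in bytes:", target_info.st_size)
print("filesystem block size:", fs_info.f_bsize)
```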

Sync writes any data buffered in memory out to disk. This can include modified superblocks, modified inodes, and delayed reads and writes. This must be implemented by the kernel; the sync program does nothing but exercise the sync system call.

The kernel keeps data in memory to avoid doing (relatively slow) disk reads and writes. This improves performance, but if the computer crashes, data may be lost or the filesystem corrupted as a result. Sync ensures everything in memory is written to disk.
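
The same system call is available from Python as os.sync, alongside os.fsync for flushing a single file. The sketch below assumes a Unix-like system; the helper name write_durably and the sample data are hypothetical.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the kernel to push this one file out to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move Python's own buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write this file's data to disk


with tempfile.TemporaryDirectory() as top:
    path = os.path.join(top, "ledger.dat")
    write_durably(path, b"balance=100\n")
    os.sync()                  # like the sync command: flush all buffered data
    with open(path, "rb") as f:
        stored = f.read()

print(stored)
```

Calling os.fsync on one file is usually cheaper than a full os.sync, which affects every filesystem on the machine.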

Conclusion

File utilities are organized into three major categories: file management, file information, and file compression. I have shown that file management utilities help manage files on the Linux operating system. File information utilities report the information about files or programs that the user requests. Finally, file compression utilities compress and decompress files.

Disk usage utilities provide four main commands which allow users to check usage for memory, space, and file systems. Throughout this paper, I presented two major utilities from each of the file management, file information, and file compression categories, and four major utilities from the disk usage category.