CSCI 4630
Spring 2000
Programming assignment 4

Due: April 19, 5:00pm.

The assignment

Implement the following in C or C++ under Unix.

Write a program that counts the number of bytes used by all files in a given directory, including files in its subdirectories, their subdirectories, etc. The directory name should be put on the command line that starts the program. For example, if your program is called space and you want to know how many bytes are used by files in directory mydir, you would type

   space mydir

Only count bytes used by ordinary files, not by subdirectories themselves. (Count the files in the subdirectories, but ignore the space occupied by the directory entries.)

If there are two hard links to the same file, then your program should count the space twice. It should not attempt to avoid counting a multiply-linked file more than once.

Do not traverse soft links. If you encounter a soft link, just ignore it.

Although you can ask how many blocks are allocated to a file and compute the number of physical bytes used (including wasted space at the end of the file), do not do that. Instead, count only the actual number of bytes in each file, ignoring wasted space on the disk.

Be careful about the directories "." and "..". Directory "." points back to the current directory, and ".." points to the parent directory. Do not go up to the parent directory!
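
One way to honor this rule is to check each entry name against the two special names before doing anything else with it. The following is only a sketch; the helper name is_dot_entry is an assumption made for illustration and is not part of any library.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if name is "." or "..", and 0 otherwise. */
int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}
```

Note that a name such as ".profile" merely begins with a dot and is an ordinary entry, so the test compares the whole name rather than just its first character.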

Testing your program

Do not turn in an untested program. To test it, create a test directory and put a few files in it. Be sure to include some subdirectories as well. You will want to be able to create links. Command
  ln old new
creates a file called new that is hard-linked to existing file old. You cannot make a hard link to a directory or across file systems. Command
  ln -s old new
creates a soft link, also called a symbolic link, called new, referring to existing file or directory old. You can do a symbolic link across file systems. That is, you can refer to a file that resides in a different file system or different device from the link.

To check a symbolic link, use
  ls -l lnk
where lnk is the name of the link.

Tools

Reading a directory

You can get the files in a directory using the opendir and readdir calls. If variable dfd has type DIR*, then you can open directory dirname using
    dfd = opendir(dirname);
Declare a variable of type struct dirent*. A call to readdir(dfd) gets the next entry from the open directory dfd, and returns a pointer to a value of type struct dirent. So if you write
    struct dirent* dir;
    ...
    dir = readdir(dfd);
then dir is set to point to a structure that contains information about one entry in the directory. The dirent structure has fields providing information about the entry. One of them is the d_name field. To get the name of the file, use dir->d_name.

After you have read all of the entries in a directory, the next call to readdir returns NULL. At that point, you should close the open directory, using closedir(dfd).

You will need to include header file dirent.h to use these types and functions. See the manual page for readdir for more detail.
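
The calls described above might be combined as in the following sketch, which only prints the entry names in one directory. The function name list_directory and its return convention are assumptions made for illustration.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: prints every entry name in dirname, one per line.
   Returns the number of entries read, or -1 if the directory
   could not be opened. */
int list_directory(const char *dirname)
{
    DIR *dfd = opendir(dirname);
    struct dirent *dir;
    int count = 0;

    if (dfd == NULL) {
        perror(dirname);
        return -1;
    }
    /* readdir returns NULL once every entry has been read. */
    while ((dir = readdir(dfd)) != NULL) {
        printf("%s\n", dir->d_name);
        count++;
    }
    closedir(dfd);
    return count;
}
```

On most systems the names printed include "." and "..", so code that descends into subdirectories needs to skip them.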

Getting file attributes

The lstat system call provides status information about a file. Create a variable (say, statusinfo) of type struct stat and another (say, statres) of type int. Then, to find out information about a file called filename, do
   statres = lstat(filename, &statusinfo);
If the file exists, lstat returns 0 and puts information about the file into the statusinfo structure. If the file does not exist, or the call fails for some other reason, lstat returns -1. See the manual page for lstat for a complete description of the available information. Here are some of the pieces of information that you get.

  1. S_ISDIR(statusinfo.st_mode) is nonzero if the file is a directory.

  2. S_ISLNK(statusinfo.st_mode) is nonzero if the file is a soft link.

  3. statusinfo.st_size is the file size, in bytes.

To use the lstat function, you will need to include header file sys/stat.h.
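
As a rough illustration of the three pieces of information listed above, the following sketch reports what lstat says about a single file. The function name report_file and the wording of its messages are assumptions, not part of the assignment.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: reports what lstat says about filename.
   Returns the size in bytes, or -1 if lstat fails. */
long report_file(const char *filename)
{
    struct stat statusinfo;
    int statres = lstat(filename, &statusinfo);

    if (statres == -1) {
        perror(filename);   /* for example, the file does not exist */
        return -1;
    }
    if (S_ISDIR(statusinfo.st_mode)) {
        printf("%s: directory\n", filename);
    } else if (S_ISLNK(statusinfo.st_mode)) {
        printf("%s: soft link\n", filename);
    } else {
        printf("%s: neither a directory nor a soft link\n", filename);
    }
    printf("%s: %ld bytes\n", filename, (long) statusinfo.st_size);
    return (long) statusinfo.st_size;
}
```

Keep in mind that a name obtained from readdir is relative to the directory being read, so lstat needs a path such as mydir/name (or the program must first change into that directory) for the lookup to succeed.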

Remark. There is a related function called stat. The difference between stat and lstat is that, when the file is a symbolic link, stat returns information about the file that the link refers to, while lstat returns information about the link itself.
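
To see the difference for yourself, a small experiment along the following lines may help. It is only a sketch; the function name sizes_seen and its output parameters are assumptions made for illustration.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: stores the sizes that stat and lstat report for name.
   Returns 0 on success, or -1 if either call fails. */
int sizes_seen(const char *name, long *via_stat, long *via_lstat)
{
    struct stat followed;    /* filled in by stat, which follows links */
    struct stat unfollowed;  /* filled in by lstat, which does not */

    if (stat(name, &followed) == -1 || lstat(name, &unfollowed) == -1) {
        perror(name);
        return -1;
    }
    *via_stat = (long) followed.st_size;
    *via_lstat = (long) unfollowed.st_size;
    return 0;
}
```

For an ordinary file the two sizes agree. For a soft link made with ln -s, the first is the size of the file referred to and the second is the size of the link itself, which is why this assignment uses lstat.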

Getting the command line

The command line is provided as an array of strings (type char*). If you use main program heading
  int main(int argc, char** argv)
then argc is the number of parts in the command line (including the command name) and argv is an array of the parts. For example, if you use command line
   space mydir
then argc is 2 and argv contains two strings, "space" and "mydir".
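
A program that expects exactly one directory name might check argc before touching argv[1]. The following sketch shows one way to do that; the function name directory_argument and the usage message are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the directory name from the command line,
   or NULL (after printing a usage message) if the number of parts is wrong. */
const char *directory_argument(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s directory\n",
                argc > 0 ? argv[0] : "space");
        return NULL;
    }
    return argv[1];
}
```

A main function with the heading shown above would pass its own argc and argv straight through, and exit with a nonzero status if the result is NULL.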