Monday, December 10, 2007

UNIX I/O

Device Terminology
A peripheral device is piece of hardware accessed by a computer system. Common peripheral devices include disks, tapes, CD-ROMs, screens, keyboards, printers, mouse devices and network interfaces. User programs perform control and I/O to these devices through system calls to operating system modules called device drivers. A device driver hides the details of device operation and protects the device from unauthorized use. Devices of the same type may vary substantially in their operation, so to be usable, even a single-user machine needs device drivers. Some operating systems provide pseudodevice drivers to simulate devices such as terminals. Pseudoterminals, for example, simplify the handling of remote login to computer systems over a network or a modem line.

Some operating systems provide specific system calls for each type of supported device, requiring the systems programmer to learn a complex set of calls for device control. UNIX has greatly simplified the programmer device interface by providing uniform access to most devices through five functions—open, close, read, write and ioctl. All devices are represented by files, called special files, that are located in the /dev directory. Thus, disk files and other devices are named and accessed in the same way. A regular file is just an ordinary data file on disk. A block special file represents a device with characteristics similar to a disk. The device driver transfers information from a block special device in blocks or chunks, and usually such devices support the capability of retrieving a block from anywhere on the device. A character special file represents a device with characteristics similar to a terminal. The device appears to represent a stream of bytes that must be accessed in sequential order.
Reading and Writing
UNIX provides sequential access to files and other devices through the read and write functions. The read function attempts to retrieve nbyte bytes from the file or device represented by fildes into the user variable buf. You must actually provide a buffer that is large enough to hold nbyte bytes of data. (A common mistake is to provide an uninitialized pointer, buf, rather than an actual buffer.)

SYNOPSIS

#include

ssize_t read(int fildes, void *buf, size_t nbyte);
POSIX

If successful, read returns the number of bytes actually read. If unsuccessful, read returns –1 and sets errno. The following table lists the mandatory errors for read.

errno
cause

ECONNRESET
read attempted on a socket and connection was forcibly closed by its peer

EAGAIN
O_NONBLOCK is set for file descriptor and thread would be delayed

EBADF
fildes is not a valid file descriptor open for reading

EINTR
read was terminated due to receipt of a signal and no data was transferred

EIO
process is a member of a background process group attempting to read from its controlling terminal and either process is ignoring or blocking SIGTTIN or process group is orphaned

ENOTCONN
read attempted on socket that is not connected

EOVERFLOW
the file is a regular file, nbyte is greater than 0, and the starting position exceeds offset maximum

ETIMEDOUT
read attempted on socket and transmission timeout occurred

EWOULDBLOCK
file descriptor is for socket marked O_NONBLOCK and no data is waiting to be received (EAGAIN is alternative)



A read operation for a regular file may return fewer bytes than requested if, for example, it reached end-of-file before completely satisfying the request. A read operation for a regular file returns 0 to indicate end-of-file. When special files corresponding to devices are read, the meaning of a read return value of 0 depends on the implementation and the particular device. A read operation for a pipe returns as soon as the pipe is not empty, so the number of bytes read can be less than the number of bytes requested. (Pipes are a type of communication buffer discussed in Chapter 6.) When reading from a terminal, read returns 0 when the user enters an end-of-file character. On many systems the default end-of-file character is Ctrl-D.

The ssize_t data type is a signed integer data type used for the number of bytes read, or –1 if an error occurs. On some systems, this type may be larger than an int. The size_t is an unsigned integer data type for the number of bytes to read.

Example 4.1
The following code segment reads at most 100 bytes into buf from standard input.

char buf[100];
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, 100);

This code does no error checking.

The file descriptor, which represents a file or device that is open, can be thought of as an index into the process file descriptor table. The file descriptor table is in the process user area and provides access to the system information for the associated file or device.

When you execute a program from the shell, the program starts with three open streams associated with file descriptors STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. STDIN_FILENO and STDOUT_FILENO are standard input and standard output, respectively. By default, these two streams usually correspond to keyboard input and screen output. Programs should use STDERR_FILENO, the standard error device, for error messages and should never close it. In legacy code standard input, standard output and standard error are represented by 0, 1 and 2, respectively. However, you should always use their symbolic names rather than these numeric values. Section 4.6 explains how file descriptors work.

Exercise 4.2
What happens when the following code executes?

char *buf;
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, 100);

Answer:

The code segment, which may compile without error, does not allocate space for buf. The result of read is unpredictable, but most probably it will generate a memory access violation. If buf is an automatic variable stored on the stack, it is not initialized to any particular value. Whatever that memory happens to hold is treated as the address of the buffer for reading.

The readline function of Program 4.1 reads bytes, one at a time, into a buffer of fixed size until a newline character ('\n') or an error occurs. The function handles end-of-file, limited buffer size and interruption by a signal. The readline function returns the number of bytes read or –1 if an error occurs. A return value of 0 indicates an end-of-file before any characters were read. A return value greater than 0 indicates the number of bytes read. In this case, the buffer contains a string ending in a newline character. A return value of –1 indicates that errno has been set and one of the following errors occurred.

An error occurred on read.

At least one byte was read and an end-of-file occurred before a newline was read.

nbytes-1 bytes were read and no newline was found.

Upon successful return of a value greater than 0, the buffer contains a string ending in a newline character. If readline reads from a file that does not end with a newline character, it treats the last line read as an error. The readline function is available in the restart library, of Appendix B.

Program 4.1 readline.c
The readline function returns the next line from a file.

#include
#include

int readline(int fd, char *buf, int nbytes) {
int numread = 0;
int returnval;

while (numread < nbytes - 1) {
returnval = read(fd, buf + numread, 1);
if ((returnval == -1) && (errno == EINTR))
continue;
if ( (returnval == 0) && (numread == 0) )
return 0;
if (returnval == 0)
break;
if (returnval == -1)
return -1;
numread++;
if (buf[numread-1] == '\n') {
buf[numread] = '\0';
return numread;
}
}
errno = EINVAL;
return -1;
}

Example 4.3
The following code segment calls the readline function of Program 4.1 to read a line of at most 99 bytes from standard input.

int bytesread;
char mybuf[100];

bytesread = readline(STDIN_FILENO, mybuf, sizeof(mybuf));

Exercise 4.4
Under what circumstances does the readline function of Program 4.1 return a buffer with no newline character?

Answer:

This can only happen if the return value is 0 or –1. The return value of 0 indicates that nothing was read. The return of –1 indicates some type of error. In either case, the buffer may not contain a string.

The write function attempts to output nbyte bytes from the user buffer buf to the file represented by file descriptor fildes.

SYNOPSIS

#include

ssize_t write(int fildes, const void *buf, size_t nbyte);
POSIX

If successful, write returns the number of bytes actually written. If unsuccessful, write returns –1 and sets errno. The following table lists the mandatory errors for write.

errno
cause

ECONNRESET
write attempted on a socket that is not connected

EAGAIN
O_NONBLOCK is set for file descriptor and thread would be delayed

EBADF
fildes is not a valid file descriptor open for writing

EFBIG
attempt to write a file that exceeds implementation-defined maximum; file is a regular file, nbyte is greater than 0, and starting position exceeds offset maximum

EINTR
write was terminated due to receipt of a signal and no data was transferred

EIO
process is a member of a background process group attempting to write to controlling terminal, TOSTOP is set, process is neither blocking nor ignoring SIGTTOU and process group is orphaned

ENOSPC
no free space remaining on device containing the file

EPIPE
attempt to write to a pipe or FIFO not open for reading or that has only one end open (thread may also get SIGPIPE), or write attempted on socket shut down for writing or not connected (if not connected, also generates SIGPIPE signal)

EWOULDBLOCK
file descriptor is for socket marked O_NONBLOCK and write would block (EAGAIN is alternative)



Exercise 4.5
What can go wrong with the following code segment?

#define BLKSIZE 1024
char buf[BLKSIZE];

read(STDIN_FILENO, buf, BLKSIZE);
write(STDOUT_FILENO, buf, BLKSIZE);

Answer:

The write function assumes that the read has filled buf with BLKSIZE bytes. However, read may fail or may not read the full BLKSIZE bytes. In these two cases, write outputs garbage.

Exercise 4.6
What can go wrong with the following code segment to read from standard input and write to standard output?

#define BLKSIZE 1024
char buf[BLKSIZE];
ssize_t bytesread;

bytesread = read(STDIN_FILENO, buf, BLKSIZE);
if (bytesread > 0)
write(STDOUT_FILE, buf, bytesread);

Answer:

Although write uses bytesread rather than BLKSIZE, there is no guarantee that write actually outputs all of the bytes requested. Furthermore, either read or write can be interrupted by a signal. In this case, the interrupted call returns a –1 with errno set to EINTR.

Program 4.2 copies bytes from the file represented by fromfd to the file represented by tofd. The function restarts read and write if either is interrupted by a signal. Notice that the write statement specifies the buffer by a pointer, bp, rather than by a fixed address such as buf. If the previous write operation did not output all of buf, the next write operation must start from the end of the previous output. The copyfile function returns the number of bytes read and does not indicate whether or not an error occurred.

Example 4.7 simplecopy.c
The following program calls copyfile to copy a file from standard input to standard output.

#include
#include

int copyfile(int fromfd, int tofd);

int main (void) {
int numbytes;

numbytes = copyfile(STDIN_FILENO, STDOUT_FILENO);
fprintf(stderr, "Number of bytes copied: %d\n", numbytes);
return 0;
}

Exercise 4.8
What happens when you run the program of Example 4.7?

Answer:

Standard input is usually set to read one line at a time, so I/O is likely be entered and echoed on line boundaries. The I/O continues until you enter the end-of-file character (often Ctrl-D by default) at the start of a line or you interrupt the program by entering the interrupt character (often Ctrl-C by default). Use the stty -a command to find the current settings for these characters.

Program 4.2 copyfile1.c
The copyfile.c function copies a file from fromfd to tofd.

#include
#include
#define BLKSIZE 1024

int copyfile(int fromfd, int tofd) {
char *bp;
char buf[BLKSIZE];
int bytesread, byteswritten;
int totalbytes = 0;

for ( ; ; ) {
while (((bytesread = read(fromfd, buf, BLKSIZE)) == -1) &&
(errno == EINTR)) ; /* handle interruption by signal */
if (bytesread <= 0) /* real error or end-of-file on fromfd */
break;
bp = buf;
while (bytesread > 0) {
while(((byteswritten = write(tofd, bp, bytesread)) == -1 ) &&
(errno == EINTR)) ; /* handle interruption by signal */
if (byteswritten <= 0) /* real error on tofd */
break;
totalbytes += byteswritten;
bytesread -= byteswritten;
bp += byteswritten;
}
if (byteswritten == -1) /* real error on tofd */
break;
}
return totalbytes;
}

Exercise 4.9
How would you use the program of Example 4.7 to copy the file myin.dat to myout.dat?

Answer:

Use redirection. If the executable of Example 4.7 is called simplecopy, the line would be as follows.

simplecopy < myin.dat > myout.dat

The problems of restarting read and write after signals and of writing the entire amount requested occur in nearly every program using read and write. Program 4.3 shows a separate r_read function that you can use instead of read when you want to restart after a signal. Similarly, Program 4.4 shows a separate r_write function that restarts after a signal and writes the full amount requested. For convenience, a number of functions, including r_read, r_write, copyfile and readline, have been collected in a library called restart.c. The prototypes for these functions are contained in restart.h, and we include this header file when necessary. Appendix B presents the complete restart library implementation.

Program 4.3 r_read.c
The r_read.c function is similar to read except that it restarts itself if interrupted by a signal.

#include
#include

ssize_t r_read(int fd, void *buf, size_t size) {
ssize_t retval;

while (retval = read(fd, buf, size), retval == -1 && errno == EINTR) ;
return retval;
}

Program 4.4 r_write.c
The r_write.c function is similar to write except that it restarts itself if interrupted by a signal and writes the full amount requested.

#include
#include

ssize_t r_write(int fd, void *buf, size_t size) {
char *bufp;
size_t bytestowrite;
ssize_t byteswritten;
size_t totalbytes;

for (bufp = buf, bytestowrite = size, totalbytes = 0;
bytestowrite > 0;
bufp += byteswritten, bytestowrite -= byteswritten) {
byteswritten = write(fd, bufp, bytestowrite);
if ((byteswritten) == -1 && (errno != EINTR))
return -1;
if (byteswritten == -1)
byteswritten = 0;
totalbytes += byteswritten;
}
return totalbytes;
}

The functions r_read and r_write can greatly simplify programs that need to read and write while handling signals.

Program 4.5 shows the readwrite function that reads bytes from one file descriptor and writes all of the bytes read to another one. It uses a buffer of size PIPE_BUF to transfer at most PIPE_BUF bytes. This size is useful for writing to pipes since a write to a pipe of PIPE_BUF bytes or less is atomic. Program 4.6 shows a version of copyfile that uses the readwrite function. Compare this with Program 4.2.

Program 4.5 readwrite.c
A program that reads from one file descriptor and writes all the bytes read to another file descriptor.

#include
#include "restart.h"
#define BLKSIZE PIPE_BUF

int readwrite(int fromfd, int tofd) {
char buf[BLKSIZE];
int bytesread;

if ((bytesread = r_read(fromfd, buf, BLKSIZE)) == -1)
return -1;
if (bytesread == 0)
return 0;
if (r_write(tofd, buf, bytesread) == -1)
return -1;
return bytesread;
}

Program 4.6 copyfile.c
A simplified implementation of copyfile that uses r_read and r_write.

#include
#include "restart.h"
#define BLKSIZE 1024

int copyfile(int fromfd, int tofd) {
char buf[BLKSIZE];
int bytesread, byteswritten;
int totalbytes = 0;

for ( ; ; ) {
if ((bytesread = r_read(fromfd, buf, BLKSIZE)) <= 0)
break;
if ((byteswritten = r_write(tofd, buf, bytesread)) == -1)
break;
totalbytes += byteswritten;
}
return totalbytes;
}

The r_write function writes all the bytes requested and restarts the write if fewer bytes are written. The r_read only restarts if interrupted by a signal and often reads fewer bytes than requested. The readblock function is a version of read that continues reading until the requested number of bytes is read or an error occurs. Program 4.7 shows an implementation of readblock. The readblock function is part of the restart library. It is especially useful for reading structures.

Program 4.7 readblock.c
A function that reads a specific number of bytes.

#include
#include

ssize_t readblock(int fd, void *buf, size_t size) {
char *bufp;
size_t bytestoread;
ssize_t bytesread;
size_t totalbytes;

for (bufp = buf, bytestoread = size, totalbytes = 0;
bytestoread > 0;
bufp += bytesread, bytestoread -= bytesread) {
bytesread = read(fd, bufp, bytestoread);
if ((bytesread == 0) && (totalbytes == 0))
return 0;
if (bytesread == 0) {
errno = EINVAL;
return -1;
}
if ((bytesread) == -1 && (errno != EINTR))
return -1;
if (bytesread == -1)
bytesread = 0;
totalbytes += bytesread;
}
return totalbytes;
}

There are only three possibilities for the return value of readblock. The readblock function returns 0 if an end-of-file occurs before any bytes are read. This happens if the first call to read returns 0. If readblock is successful, it returns size, signifying that the requested number of bytes was successfully read. Otherwise, readblock returns –1 and sets errno. If readblock reaches the end-of-file after some, but not all, of the needed bytes have been read, readblock returns –1 and sets errno to EINVAL.

Example 4.10
The following code segment can be used to read a pair of integers from an open file descriptor.

struct {
int x;
int y;
} point;
if (readblock(fd, &point, sizeof(point)) <= 0)
fprintf(stderr, "Cannot read a point.\n");

Program 4.8 combines readblock with r_write to read a fixed number of bytes from one open file descriptor and write them to another open file descriptor.

Program 4.8 readwriteblock.c
A program that copies a fixed number of bytes from one file descriptor to another.

#include "restart.h"

int readwriteblock(int fromfd, int tofd, char *buf, int size) {
int bytesread;

bytesread = readblock(fromfd, buf, size);
if (bytesread != size) /* can only be 0 or -1 */
return bytesread;
return r_write(tofd, buf, size);
}

Processes in UNIX

Process Identification

UNIX identifies processes by a unique integral value called the process ID. Each process also has a parent process ID, which is initially the process ID of the process that created it. If this parent process terminates, the process is adopted by a system process so that the parent process ID always identifies a valid process.

The getpid and getppid functions return the process ID and the parent process ID, respectively. The pid_t is an unsigned integer type that represents a process ID.

SYNOPSIS

#include

pid_t getpid(void);
pid_t getppid(void) ;
POSIX

Neither the getpid nor the getppid functions can return an error.

Example 3.1 outputPID.c

The following program outputs its process ID and its parent process ID. Notice that the return values are cast to long for printing since there is no guarantee that a pid_t will fit in an int.

#include 
#include

int main (void) {
printf("I am process %ld\n", (long)getpid());
printf("My parent is %ld\n", (long)getppid());
return 0;
}

System administrators assign a unique integral user ID and an integral group ID to each user when creating the user's account. The system uses the user and group IDs to retrieve from the system database the privileges allowed for that user. The most privileged user, superuser or root, has a user ID of 0. The root user is usually the system administrator.

A UNIX process has several user and group IDs that convey privileges to the process. These include the real user ID, the real group ID, the effective user ID and the effective group ID. Usually, the real and effective IDs are the same, but under some circumstances the process can change them. The process uses the effective IDs for determining access permissions for files. For example, a program that runs with root privileges may want to create a file on behalf of an ordinary user. By setting the process's effective user ID to be that of this user, the process can create the files "as if" the user created them. For the most part, we assume that the real and effective user and group IDs are the same.

The following functions return group and user IDs for a process. The gid_t and uid_t are integral types representing group and user IDs, respectively. The getgid and getuid functions return the real IDs, and getegid and geteuid return the effective IDs.

SYNOPSIS

#include

gid_t getegid(void);
uid_t geteuid(void);
git_t getgid(void);
uid_t getuid(void);
POSIX

None of these functions can return an error.

Example 3.2 outputIDs.c

The following program prints out various user and group IDs for a process.

#include 
#include

int main(void) {
printf("My real user ID is %5ld\n", (long)getuid());
printf("My effective user ID is %5ld\n", (long)geteuid());
printf("My real group ID is %5ld\n", (long)getgid());
printf("My effective group ID is %5ld\n", (long)getegid());
return 0;
}

Process State

The state of a process indicates its status at a particular time. Most operating systems allow some form of the states listed in Table 3.1. A state diagram is a graphical representation of the allowed states of a process and the allowed transitions between states. Figure 3.1 shows such a diagram. The nodes of the graph in the diagram represent the possible states, and the edges represent possible transitions. A directed arc from state A to state B means that a process can go directly from state A to state B. The labels on the arcs specify the conditions that cause the transitions between states to occur.

Figure 3.1. State diagram for a simple operating system.

graphics/03fig01.gif

While a program is undergoing the transformation into an active process, it is said to be in the new state. When the transformation completes, the operating system puts the process in a queue of processes that are ready to run. The process is then in the ready or runnable state. Eventually the component of the operating system called the process scheduler selects a process to run. The process is in the running state when it is actually executing on the CPU.

Table 3.1. Common process states.

state

meaning

new

being created

running

instructions are being executed

blocked

waiting for an event such as I/O

ready

waiting to be assigned to a processor

done

finished

A process in the blocked state is waiting for an event and is not eligible to be picked for execution. A process can voluntarily move to the blocked state by executing a call such as sleep. More commonly, a process moves to the blocked state when it performs an I/O request. As explained in Section 1.2, input and output can be thousands of times slower than ordinary instructions. A process performs I/O by requesting the service through a library function that is sometimes called a system call. During the execution of a system call, the operating system regains control of the processor and can move the process to the blocked state until the operation completes.

A context switch is the act of removing one process from the running state and replacing it with another. The process context is the information that the operating systems needs about the process and its environment to restart it after a context switch. Clearly, the executable code, stack, registers and program counter are part of the context, as is the memory used for static and dynamic variables. To be able to transparently restart a process, the operating system also keeps track of the process state, the status of program I/O, user and process identification, privileges, scheduling parameters, accounting information and memory management information. If a process is waiting for an event or has caught a signal, that information is also part of the context. The context also contains information about other resources such as locks held by the process.

The ps utility displays information about processes. By default, ps displays information about processes associated with the user. The -a option displays information for processes associated with terminals. The -A option displays information for all processes. The -o option specifies the format of the output.

SYNOPSIS ps [-aA] [-G grouplist] [-o format]...[-p proclist] [-t termlist] [-U userlist] POSIX Shells and Utilities
Example 3.3

The following is sample output from the ps -a command.

>% ps -a PID TTY TIME CMD 20825 pts/11 0:00 pine 20205 pts/11 0:01 bash 20258 pts/16 0:01 telnet 20829 pts/2 0:00 ps 20728 pts/4 0:00 pine 19086 pts/12 0:00 vi

The POSIX:XSI Extension provides additional arguments for the ps command. Among the most useful are the full (-f) and the long (-l) options. Table 3.2 lists the fields that are printed for each option. An (all) in the option column means that the field appears in all forms of ps.

Example 3.4

The execution of the ps -la command on the same system as for Example 3.3 produced the following output.

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD 8 S 4228 20825 20205 0 40 20 ? 859 ? pts/11 0:00 pine 8 S 4228 20205 19974 0 40 20 ? 321 ? pts/11 0:01 bash 8 S 2852 20258 20248 0 40 20 ? 328 ? pts/16 0:01 telnet 8 O 512 20838 18178 0 50 20 ? 134 pts/2 0:00 ps 8 S 3060 20728 20719 0 40 20 ? 845 ? pts/4 0:00 pine 8 S 1614 19086 18875 0 40 20 ? 236 ? pts/12 0:00 vi

Table 3.2. Fields reported for various options of the ps command in the POSIX:XSI Extension.

header

option

meaning

F

-l

flags (octal and additive) associated with the process

S

-l

process state

UID

-f, -l

user ID of the process owner

PID

(all)

process ID

PPID

-f, -l

parent process ID

C

-f, -l

processor utilization used for scheduling

PRI

-l

process priority

NI

-l

nice value

ADDR

-l

process memory address

SZ

-l

size in blocks of the process image

WCHAN

-l

event on which the process is waiting

TTY

(all)

controlling terminal

TIME

(all)

cumulative execution time

CMD

(all)

command name (arguments with -f option)


UNIX Process Creation and fork

A process can create a new process by calling fork. The calling process becomes the parent, and the created process is called the child. The fork function copies the parent's memory image so that the new process receives a copy of the address space of the parent. Both processes continue at the instruction after the fork statement (executing in their respective memory images).

SYNOPSIS #include pid_t fork(void); POSIX

Creation of two completely identical processes would not be very useful. The fork function return value is the critical characteristic that allows the parent and the child to distinguish themselves and to execute different code. The fork function returns 0 to the child and returns the child's process ID to the parent. When fork fails, it returns –1 and sets the errno. If the system does not have the necessary resources to create the child or if limits on the number of processes would be exceeded, fork sets errno to EAGAIN. In case of a failure, the fork does not create a child.

Example 3.5 simplefork.c

In the following program, both parent and child execute the x = 1 assignment statement after returning from fork.

#include #include int main(void) { int x; x = 0; fork(); x = 1; printf("I am process %ld and my x is %d\n", (long)getpid(), x); return 0; }

Before the fork of Example 3.5, one process executes with a single x variable. After the fork, two independent processes execute, each with its own copy of the x variable. Since the parent and child processes execute independently, they do not execute the code in lock step or modify the same memory locations. Each process prints a message with its respective process ID and x value.

The parent and child processes execute the same instructions because the code of Example 3.5 did not test the return value of fork. Example 3.6 demonstrates how to test the return value of fork.

Example 3.6 twoprocs.c

After fork in the following program, the parent and child output their respective process IDs.

#include #include #include int main(void) { pid_t childpid; childpid = fork(); if (childpid == -1) { perror("Failed to fork"); return 1; } if (childpid == 0) /* child code */ printf("I am child %ld\n", (long)getpid()); else /* parent code */ printf("I am parent %ld\n", (long)getpid()); return 0; }

The original process in Example 3.6 has a nonzero value of the childpid variable, so it executes the second printf statement. The child process has a zero value of childpid and executes the first printf statement. The output from these processes can appear in either order, depending on whether the parent or the child executes first. If the program is run several times on the same system, the order of the output may or may not always be the same.

Exercise 3.7 badprocessID.c

What happens when the following program executes?

#include #include #include int main(void) { pid_t childpid; pid_t mypid; mypid = getpid(); childpid = fork(); if (childpid == -1) { perror("Failed to fork"); return 1; } if (childpid == 0) /* child code */ printf("I am child %ld, ID = %ld\n", (long)getpid(), (long)mypid); else /* parent code */ printf("I am parent %ld, ID = %ld\n", (long)getpid(), (long)mypid); return 0; }

Answer:

The parent sets the mypid value to its process ID before the fork. When fork executes, the child gets a copy of the process address space, including all variables. Since the child does not reset mypid, the value of mypid for the child does not agree with the value returned by getpid.

Program 3.1 creates a chain of n processes by calling fork in a loop. On each iteration of the loop, the parent process has a nonzero childpid and hence breaks out of the loop. The child process has a zero value of childpid and becomes a parent in the next loop iteration. In case of an error, fork returns –1 and the calling process breaks out of the loop. The exercises in Section 3.8 build on this program.

Figure 3.2 shows a graph representing the chain of processes generated for Program 3.1 when n is 4. Each circle represents a process labeled by its value of i when it leaves the loop. The edges represent the is-a-parent relationship. AB means process A is the parent of process B.

Figure 3.2. Chain of processes generated by Program 3.1 when called with a command-line argument of 4.

graphics/03fig02.gif

Program 3.1 simplechain.c

A program that creates a chain of n processes, where n is a command-line argument.

#include #include #include int main (int argc, char *argv[]) { pid_t childpid = 0; int i, n; if (argc != 2){ /* check for valid number of command-line arguments */ fprintf(stderr, "Usage: %s processes\n", argv[0]); return 1; } n = atoi(argv[1]); for (i = 1; i < childpid =" fork())" class="docExampleTitle">Exercise 3.8

Run Program 3.1 for large values of n. Will the messages always come out ordered by increasing i?

Answer:

The exact order in which the messages appear depends on the order in which the processes are selected by the process scheduler to run. If you run the program several times, you should notice some variation in the order.

Exercise 3.9

What happens if Program 3.1 writes the messages to stdout, using printf, instead of to stderr, using fprintf?

Answer:

By default, the system buffers output written to stdout, so a particular message may not appear immediately after the printf returns. Messages to stderr are not buffered, but instead written immediately. For this reason, you should always use stderr for your debugging messages.

Program 3.2 creates a fan of n processes by calling fork in a loop. On each iteration, the newly created process breaks from the loop while the original process continues. In contrast, the process that calls fork in Program 3.1 breaks from the loop while the newly created process continues for the next iteration.

Program 3.2 simplefan.c

A program that creates a fan of n processes where n is passed as a command-line argument.

#include #include #include int main (int argc, char *argv[]) { pid_t childpid = 0; int i, n; if (argc != 2){ /* check for valid number of command-line arguments */ fprintf(stderr, "Usage: %s processes\n", argv[0]); return 1; } n = atoi(argv[1]); for (i = 1; i < childpid =" fork())" class="docText">Figure 3.3 shows the process fan generated by Program 3.2 when n is 4. The processes are labeled by the value of i at the time they leave the loop. The original process creates n–1 children. The exercises in Section 3.9 build on this example.

Figure 3.3. Fan of processes generated by Program 3.2 with a command-line argument of 4.

graphics/03fig03.gif

Exercise 3.10

Explain what happens when you replace the test

(childpid = fork()) <= 0

of Program 3.2 with

(childpid = fork()) == -1

Answer:

In this case, all the processes remain in the loop unless the fork fails. Each iteration of the loop doubles the number of processes, forming a tree configuration illustrated in Figure 3.4 when n is 4. The figure represents each process by a circle labeled with the i value at the time it was created. The original process has a 0 label. The lowercase letters distinguish processes that were created with the same value of i. Although this code appears to be similar to that of Program 3.1, it does not distinguish between parent and child after fork executes. Both the parent and child processes go on to create children on the next iteration of the loop, hence the population explosion.

Exercise 3.11

Run Program 3.1, Program 3.2, and a process tree program based on the modification suggested in Exercise 3.10. Carefully examine the output. Draw diagrams similar to those of Figure 3.2 through Figure 3.4, labeling the circles with the actual process IDs. Use to designate the is-a-parent relationship. Do not use large values of the command-line argument unless you are on a dedicated system. How can you modify the programs so that you can use ps to see the processes that are created?

Answer:

In their current form, the programs complete too quickly for you to view them with ps. Insert the sleep(30); statement immediately before return in order to have each process block for 30 seconds before exiting. In another command window, continually execute ps -l. Section 3.4 explains why some of the processes may report a parent ID of 1 when sleep is omitted.

Figure 3.4. Tree of processes produced by the modification of Program 3.2 suggested in Exercise 3.10.

graphics/03fig04.gif

The fork function creates a new process by making a copy of the parent's image in memory. The child inherits parent attributes such as environment and privileges. The child also inherits some of the parent's resources such as open files and devices.

Not every parent attribute or resource is inherited by the child. For instance, the child has a new process ID and of course a different parent ID. The child's times for CPU usage are reset to 0. The child does not get locks that the parent holds. If the parent has set an alarm, the child is not notified when the parent's alarm expires. The child starts with no pending signals, even if the parent had signals pending at the time of the fork.

Although a child inherits its parent's process priority and scheduling attributes, it competes for processor time with other processes as a separate entity. A user running on a crowded time-sharing system can obtain a greater share of the CPU time by creating more processes. A system manager on a crowded system might restrict process creation to prevent a user from creating processes to get a bigger share of the resources.

Programs, Processes and Threads

How a Program Becomes a Process

A program is a prepared sequence of instructions to accomplish a defined task. To write a C source program, a programmer creates disk files containing C statements that are organized into functions. An individual C source file may also contain variable and function declarations, type and macro definitions (e.g., typedef) and preprocessor commands (e.g., #ifdef, #include, #define). The source program contains exactly one main function.

Traditionally, C source filenames have a .c extension, and header filenames have a .h extension. Header files usually only contain macro and type definitions, defined constants and function declarations. Use the #include preprocessor command to insert the contents of a header file into the source.

The C compiler translates each source file into an object file. The compiler then links the individual object files with the necessary libraries to produce an executable module. When a program is run or executed, the operating system copies the executable module into a program image in main memory.

A process is an instance of a program that is executing. Each instance has its own address space and execution state. When does a program become a process? The operating system reads the program into memory. The allocation of memory for the program image is not enough to make the program a process. The process must have an ID (the process ID) so that the operating system can distinguish among individual processes. The process state indicates the execution status of an individual process. The operating system keeps track of the process IDs and corresponding process states and uses the information to allocate and manage resources for the system. The operating system also manages the memory occupied by the processes and the memory available for allocation.

When the operating system has added the appropriate information in the kernel data structures and has allocated the necessary resources to run the program code, the program has become a process. A process has an address space (memory it can access) and at least one flow of control called a thread. The variables of a process can either remain in existence for the life of the process (static storage) or be automatically allocated when execution enters a block and deallocated when execution leaves the block (automatic storage). Appendix A.5 discusses C storage classes in detail.

A process starts with a single flow of control that executes a sequence of instructions. The processor program counter keeps track of the next instruction to be executed by that processor (CPU). The CPU increments the program counter after fetching an instruction and may further modify it during the execution of the instruction, for example, when a branch occurs. Multiple processes may reside in memory and execute concurrently, almost independently of each other. For processes to communicate or cooperate, they must explicitly interact through operating system constructs such as the filesystem, pipes shared memory or a network .

Threads and Thread of Execution

When a program executes, the value of the process program counter determines which process instruction is executed next. The resulting stream of instructions, called a thread of execution, can be represented by the sequence of instruction addresses assigned to the program counter during the execution of the program's code.

Example 2.1

Process 1 executes statements 245, 246 and 247 in a loop. Its thread of execution can be represented as 2451, 2461, 2471, 2451, 2461, 2471, 2451, 2461, 2471 . . . , where the subscripts identify the thread of execution as belonging to process 1.

The sequence of instructions in a thread of execution appears to the process as an uninterrupted stream of addresses. From the point of view of the processor, however, the threads of execution from different processes are intermixed. The point at which execution switches from one process to another is called a context switch.

Example 2.2

Process 1 executes its statements 245, 246 and 247 in a loop as in Example 2.1, and process 2 executes its statements 10, 11, 12 . . . . The CPU executes instructions in the order 2451, 2461, 2471, 2451, 2461, [context-switch instructions], 102, 112, 122, 132, [context-switch instructions], 2471, 2451, 2461, 2471 . . . . Context switches occur between 2461 and 102 and between 132 and 2471. The processor sees the threads of execution interleaved, whereas the individual processes see uninterrupted sequences.

A natural extension of the process model allows multiple threads to execute within the same process. Multiple threads avoid context switches and allow sharing of code and data. The approach may improve program performance on machines with multiple processors. Programs with natural parallelism in the form of independent tasks operating on shared data can take advantage of added execution power on these multiple-processor machines. Operating systems have significant natural parallelism and perform better by having multiple, simultaneous threads of execution. Vendors advertise symmetric multiprocessing support in which the operating system and applications have multiple undistinguished threads of execution that take advantage of parallel hardware.

A thread is an abstract data type that represents a thread of execution within a process. A thread has its own execution stack, program counter value, register set and state. By declaring many threads within the confines of a single process, a programmer can write programs that achieve parallelism with low overhead. While these threads provide low-overhead parallelism, they may require additional synchronization because they reside in the same process address space and therefore share process resources. Some people call processes heavyweight because of the work needed to start them. In contrast, threads are sometimes called lightweight processes.

Layout of a Program Image

After loading, the program executable appears to occupy a contiguous block of memory called a program image. shows a sample layout of a program image in its logical address space [112]. The program image has several distinct sections. The program text or code is shown in low-order memory. The initialized and uninitialized static variables have their own sections in the image. Other sections include the heap, stack and environment.


An activation record is a block of memory allocated on the top of the process stack to hold the execution context of a function during a call. Each function call creates a new activation record on the stack. The activation record is removed from the stack when the function returns, providing the last-called-first-returned order for nested function calls.

The activation record contains the return address, the parameters (whose values are copied from the corresponding arguments), status information and a copy of some of the CPU register values at the time of the call. The process restores the register values on return from the call represented by the record. The activation record also contains automatic variables that are allocated within the function while it is executing. The particular format for an activation record depends on the hardware and on the programming language.

In addition to the static and automatic variables, the program image contains space for argc and argv and for allocations by malloc. The malloc family of functions allocates storage from a free memory pool called the heap. Storage allocated on the heap persists until it is freed or until the program exits. If a function calls malloc, the storage remains allocated after the function returns. The program cannot access the storage after the return unless it has a pointer to the storage that is accessible after the function returns.

Static variables that are not explicitly initialized in their declarations are initialized to 0 at run time. Notice that the initialized static variables and the uninitialized static variables occupy different sections in the program image. Typically, the initialized static variables are part of the executable module on disk, but the uninitialized static variables are not. Of course, the automatic variables are not part of the executable module because they are only allocated when their defining block is called. The initial values of automatic variables are undetermined unless the program explicitly initializes them.

Exercise 2.3

Use ls -l to compare the sizes of the executable modules for the following two C programs. Explain the results.

Version 1: largearrayinit.c

int myarray[50000] = {1, 2, 3, 4};

int main(void) {
myarray[0] = 3;
return 0;
}

Version 2: largearray.c

int myarray[50000];

int main(void) {
myarray[0] = 3;
return 0;
}

Answer:

The executable module for Version 1 should be about 200,000 bytes larger than that of Version 2 because the myarray of Version 1 is initialized static data and is therefore part of the executable module. The myarray of Version 2 is not allocated until the program is loaded in memory, and the array elements are initialized to 0 at that time.

Static variables can make a program unsafe for threaded execution. For example, the C library function readdir and its relatives described in Section 5.2 use static variables to hold return values. The function strtok discussed in Section 2.6 uses a static variable to keep track of its progress between calls. Neither of these functions can be safely called by multiple threads within a program. In other words, they are not thread-safe. External static variables also make code more difficult to debug because successive invocations of a function that references a static variable may behave in unexpected ways. For these reasons, avoid using static variables except under controlled circumstances. Section 2.9 presents an example of when to use variables with static storage class.

Although the program image appears to occupy a contiguous block of memory, in practice, the operating system maps the program image into noncontiguous blocks of physical memory. A common mapping divides the program image into equal-sized pieces, called pages. The operating system loads the individual pages into memory and looks up the location of the page in a table when the processor references memory on that page. This mapping allows a large logical address space for the stack and heap without actually using physical memory unless it is needed. The operating system hides the existence of such an underlying mapping, so the programmer can view the program image as logically contiguous even when some of the pages do not actually reside in memory.

Library Function Calls

We introduce most library functions by a condensed version of its specification, and you should always refer to the man pages for more complete information.

The summary starts with a brief description of the function and its parameters, followed by a SYNOPSIS box giving the required header files and the function prototype. (Unfortunately, some compilers do not give warning messages if the header files are missing, so be sure to use lint as described in Appendix A to detect these problems.) The SYNOPSIS box also names the POSIX standard that specifies the function. A description of the function return values and a discussion of how the function reports errors follows the SYNOPSIS box. Here is a typical summary.

The close function deallocates the file descriptor specified by fildes.

SYNOPSIS

#include

int close(int fildes);
POSIX

If successful, close returns 0. If unsuccessful, close returns –1 and sets errno. The following table lists the mandatory errors for close.

errno

cause

EBADF

fildes is not valid

EINTR

close was interrupted by a signal

This book's summary descriptions generally include the mandatory errors. These are the errors that the standard requires that every implementation detect. We include these particular errors because they are a good indication of the major points of failure. You must handle all errors, not just the mandatory ones. POSIX often defines many other types of optional errors. If an implementation chooses to treat the specified condition as an error, then it should use the specified error value. Implementations are free to define other errors as well. When there is only one mandatory error, we describe it in a sentence. When the function has more than one mandatory error, we use a table like the one for close.

Traditional UNIX functions usually return –1 (or sometimes NULL) and set errno to indicate the error. The POSIX standards committee decided that all new functions would not use errno and would instead directly return an error number as a function return value. We illustrate both ways of handling errors in examples throughout the text.

Example 2.4

The following code segment demonstrates how to call the close function.

int fildes;

if (close(fildes) == -1)
perror("Failed to close the file");

The code assumes that the unistd.h header file has been included in the source. In general, we do not show the header files for code segments.

The perror function outputs to standard error a message corresponding to the current value of errno. If s is not NULL, perror outputs the string (an array of characters terminated by a null character) pointed to by s and followed by a colon and a space. Then, perror outputs an error message corresponding to the current value of errno followed by a newline.

SYNOPSIS

#include

void perror(const char *s);
POSIX:CX

No return values and no errors are defined for perror.

Example 2.5

The output produced by Example 2.4 might be as follows.

Failed to close the file: invalid file descriptor

The strerror function returns a pointer to the system error message corresponding to the error code errnum.

SYNOPSIS

#include

char *strerror(int errnum);
POSIX:CX

If successful, strerror returns a pointer to the error string. No values are reserved for failure.

Use strerror to produce informative messages, or use it with functions that return error codes directly without setting errno.

Example 2.6

The following code segment uses strerror to output a more informative error message when close fails.

int fildes;

if (close(fildes) == -1)
fprintf(stderr, "Failed to close file descriptor %d: %s\n",
fildes, strerror(errno));

The strerror function may change errno. You should save and restore errno if you need to use it again.

Example 2.7

The following code segment illustrates how to use strerror and still preserve the value of errno.

int error;
int fildes;

if (close(fildes) == -1) {
error = errno; /* temporarily save errno */
fprintf(stderr, "Failed to close file descriptor %d: %s\n",
fildes, strerror(errno));
errno = error; /* restore errno after writing the error message */
}

Correctly handing errno is a tricky business. Because its implementation may call other functions that set errno, a library function may change errno, even though the man page doesn't explicitly state that it does. Also, applications cannot change the string returned from strerror, but subsequent calls to either strerror or perror may overwrite this string.

Another common problem is that many library calls abort if the process is interrupted by a signal. Functions generally report this type of return with an error code of EINTR. For example, the close function may be interrupted by a signal. In this case, the error was not due to a problem with its execution but was a result of some external factor. Usually the program should not treat this interruption as an error but should restart the call.

Example 2.8

The following code segment restarts the close function if a signal occurs.

int error;
int fildes;

while (((error = close(fildes)) == -1) && (errno == EINTR)) ;
if (error == -1)
perror("Failed to close the file"); /* a real close error occurred */

The while loop of Example 2.8 has an empty statement clause. It simply calls close until it either executes successfully or encounters a real error. The problem of restarting library calls is so common that we provide a library of restarted calls with prototypes defined in restart.h. The functions are designated by a leading r_ prepended to the regular library name. For example, the restart library designates a restarted version of close by the name r_close.

Example 2.9

The following code segment illustrates how to use a version of close from the restart library.

#include "restart.h"     /* user-defined library not part of standard */
int fildes;

if (r_close(fildes) == -1)
perror("Failed to close the file"); /* a true close error occurred

Technology's Impact on Programs

Terminology of Change

Computer power has increased exponentially for nearly fifty years [73] in many areas including processor, memory and mass-storage capacity, circuit density, hardware reliability and I/O bandwidth. The growth has continued in the past decade, along with sophisticated instruction pipelines on single CPUs, placement of multiple CPUs on the desktop and an explosion in network connectivity.

The dramatic increases in communication and computing power have triggered fundamental changes in commercial software.

  • Large database and other business applications, which formerly executed on a mainframe connected to terminals, are now distributed over smaller, less expensive machines.

  • Terminals have given way to desktop workstations with graphical user interfaces and multimedia capabilities.

  • At the other end of the spectrum, standalone personal computer applications have evolved to use network communication. For example, a spreadsheet application is no longer an isolated program supporting a single user because an update of the spreadsheet may cause an automatic update of other linked applications. These could graph the data or perform sales projections.

  • Applications such as cooperative editing, conferencing and common whiteboards facilitate group work and interactions.

  • Computing applications are evolving through sophisticated data sharing, realtime interaction, intelligent graphical user interfaces and complex data streams that include audio and video as well as text.

These developments in technology rely on communication, concurrency and asynchronous operation within software applications.

Asynchronous operation occurs because many computer system events happen at unpredictable times and in an unpredictable order. For example, a programmer cannot predict the exact time at which a printer attached to a system needs data or other attention. Similarly, a program cannot anticipate the exact time that the user presses a key for input or interrupts the program. As a result, a program must work correctly for all possible timings in order to be correct. Unfortunately, timing errors are often hard to repeat and may only occur once every million executions of a program.

Concurrency is the sharing of resources in the same time frame. When two programs execute on the same system so that their execution is interleaved in time, they share processor resources. Programs can also share data, code and devices. The concurrent entities can be threads of execution within a single program or other abstract objects. Concurrency can occur in a system with a single CPU, multiple CPUs sharing the same memory, or independent systems running over a network. A major job of a modern operating system is to manage the concurrent operations of a computer system and its running applications. However, concurrency control has also become an integral part of applications. Concurrent and asynchronous operations share the same problems—they cause bugs that are often hard to reproduce and create unexpected side effects.

Communication is the conveying of information by one entity to another. Because of the World Wide Web and the dominance of network applications, many programs must deal with I/O over the network as well as from local devices such as disks. Network communication introduces a myriad of new problems resulting from unpredictable timings and the possibility of undetected remote failures.

The remainder of this chapter describes simplified examples of asynchronous operation, concurrency and communication. The buffer overflow problem illustrates how careless programming and lack of error checking can cause serious problems and security breaches. This chapter also provides a brief overview of how operating systems work and summarizes the operating system standards that are used in the book.

Time and Speed

Operating systems manage system resources: processors, memory and I/O devices including keyboards, monitors, printers, mouse devices, disks, tapes, CD-ROMs and network interfaces. The convoluted way operating systems appear to work derives from the characteristics of peripheral devices, particularly their speed relative to the CPU or processor. Table 1.1 lists typical processor, memory and peripheral times in nanoseconds. The third column shows these speeds slowed down by a factor of 2 billion to give the time scaled in human terms. The scaled time of one operation per second is roughly the rate of the old mechanical calculators from fifty years ago.

Table 1.1. Typical times for components of a computer system. One nanosecond (ns) is 10–9 seconds, one microsecond (ms) is 10–6 seconds, and one millisecond (ms) is 10–3 seconds.

item

time

scaled time in human terms (2 billion times slower)

processor cycle

0.5 ns

(2 GHz)

1

second

cache access

1 ns

(1 GHz)

2

seconds

memory access

15 ns

30

seconds

context switch

5,000 ns

(5 ms)

167

minutes

disk access

7,000,000 ns

(7 ms)

162

days

quantum

100,000,000 ns

(100 ms)

6.3

years

Disk drives have improved, but their rotating mechanical nature limits their performance. Disk access times have not decreased exponentially. The disparity between processor and disk access times continues to grow; as of 2003 the ratio is roughly 1 to 14,000,000 for a 2-GHz processor. The cited speeds are a moving target, but the trend is that processor speeds are increasing exponentially, causing an increasing performance gap between processors and peripherals.

The context-switch time is the time it takes to switch from executing one process to another. The quantum is roughly the amount of CPU time allocated to a process before it has to let another process run. In a sense, a user at a keyboard is a peripheral device. A fast typist can type a keystroke every 100 milliseconds. This time is the same order of magnitude as the process scheduling quantum, and it is no coincidence that these numbers are comparable for interactive timesharing systems.

Exercise 1.1

A modem is a device that permits a computer to communicate with another computer over a phone line. A typical modem is rated at 57,600 bps, where bps means "bits per second." Assuming it takes 8 bits to transmit a byte, estimate the time needed for a 57,600 bps modem to fill a computer screen with 25 lines of 80 characters. Now consider a graphics display that consists of an array of 1024 by 768 pixels. Each pixel has a color value that can be one of 256 possible colors. Assume such a pixel value can be transmitted by modem in 8 bits. What compression ratio is necessary for a 768-kbps DSL line to fill a screen with graphics as fast as a 57,600-bps modem can fill a screen with text?

Answer:

Table 1.2 compares the times. The text display has 80 x 25 = 2000 characters so 16,000 bits must be transmitted. The graphics display has 1024 x 768 = 786,432 pixels so 6,291,456 bits must be transmitted. The estimates do not account for compression or for communication protocol overhead. A compression ratio of about 29 is necessary!


Table 1.2. Comparison of time estimates for filling a screen.

modem type

bits per second

time needed to display

text

graphics

1979 telephone modem

300

1 minute

6 hours

1983 telephone modem

2,400

6 seconds

45 minutes

current telephone modem

57,600

0.28 seconds

109 seconds

current DSL modem

768,000

0.02 seconds

8 seconds

Multiprogramming and Time Sharing

Observe from Table 1.1 that processes performing disk I/O do not use the CPU very efficiently: 0.5 nanoseconds versus 7 milliseconds, or in human terms, 1 second versus 162 days. Because of the time disparity, most modern operating systems do multiprogramming. Multiprogramming means that more than one process can be ready to execute. The operating system chooses one of these ready processes for execution. When that process needs to wait for a resource (say, a keystroke or a disk access), the operating system saves all the information needed to resume that process where it left off and chooses another ready process to execute. It is simple to see how multiprogramming might be implemented. A resource request (such as read or write) results in an operating system request (i.e., a system call). A system call is a request to the operating system for service that causes the normal CPU cycle to be interrupted and control to be given to the operating system. The operating system can then switch to another process.

Exercise 1.2

Explain how a disk I/O request might allow the operating system to run another process.

Answer:

Most devices are handled by the operating system rather than by applications. When an application executes a disk read, the call issues a request for the operating system to actually perform the operation. The operating system now has control. It can issue commands to the disk controller to begin retrieving the disk blocks requested by the application. However, since the disk retrieval does not complete for a long time (162 days in relative time), the operating system puts the application's process on a queue of processes that are waiting for I/O to complete and starts another process that is ready to run. Eventually, the disk controller interrupts the CPU instruction cycle when the results are available. At that time, the operating system regains control and can choose whether to continue with the currently running process or to allow the original process to run.

UNIX does timesharing as well as multiprogramming. Timesharing creates the illusion that several processes execute simultaneously, even though there may be only one physical CPU. On a single processor system, only one instruction from one process can be executing at any particular time. Since the human time scale is billions of times slower than that of modern computers, the operating system can rapidly switch between processes to give the appearance of several processes executing at the same time.

Consider the following analogy. Suppose a grocery store has several checkout counters (the processes) but only one checker (the CPU). The checker checks one item from a customer (the instruction) and then does the next item for that same customer. Checking continues until a price check (a resource request) is needed. Instead of waiting for the price check and doing nothing, the checker moves to another checkout counter and checks items from another customer. The checker (CPU) is always busy as long as there are customers (processes) ready to check out. This is multiprogramming. The checker is efficient, but customers probably would not want to shop at such a store because of the long wait when someone has a large order with no price checks (a CPU-bound process).

Now suppose that the checker starts a 10-second timer and processes items for one customer for a maximum of 10 seconds (the quantum). If the timer expires, the checker moves to another customer even if no price check is needed. This is timesharing. If the checker is sufficiently fast, the situation is almost equivalent to having one slower checker at each checkout stand. Consider making a video of such a checkout stand and playing it back at 100 times its normal speed. It would look as if the checker were handling several customers simultaneously.

Exercise 1.3

Suppose that the checker can check one item per second (a one-second processor cycle time in Table 1.1). According to this table, what would be the maximum time the checker would spend with one customer before moving to a waiting customer?

Answer:

The time is the quantum that is scaled in the table to 6.3 years. A program may execute billions of instructions in a quantum—a bit more than the number of grocery items purchased by the average customer.

If the time to move from one customer to another (the context-switch time) is small compared with the time between switches (the CPU burst time), the checker handles customers efficiently. Timesharing wastes processing cycles by switching between customers, but it has the advantage of not wasting the checker resources during a price check. Furthermore, customers with small orders are not held in abeyance for long periods while waiting for customers with large orders.

The analogy would be more realistic if instead of several checkout counters, there were only one, with the customers crowded around the checker. To switch from customer A to customer B, the checker saves the contents of the register tape (the context) and restores it to what it was when it last processed customer B. The context-switch time can be reduced if the cash register has several tapes and can hold the contents of several customers' orders simultaneously. In fact, some computer systems have special hardware to hold many contexts at the same time.

Multiprocessor systems have several processors accessing a shared memory. In the checkout analogy for a multiprocessor system, each customer has an individual register tape and multiple checkers rove the checkout stands working on the orders for unserved customers. Many grocery stores have packers who do this.

UNIX Standards

Not too long ago, two distinct and somewhat incompatible "flavors" of UNIX, System V from AT&T and BSD from Berkeley coexisted. Because no official standard existed, there were major and minor differences between the versions from different vendors, even within the same flavor. Consequently, programs written for one type of UNIX would not run correctly or sometimes would not even compile under a UNIX from another vendor.

The IEEE (Institute of Electronic and Electrical Engineers) decided to develop a standard for the UNIX libraries in an initiative called POSIX. POSIX stands for Portable Operating System Interface and is pronounced pahz-icks, as stated explicitly by the standard. IEEE's first attempt, called POSIX.1, was published in 1988. When this standard was adopted, there was no known historical implementation of UNIX that would not have to change to meet the standard. The original standard covered only a small subset of UNIX. In 1994, the X/Open Foundation published a more comprehensive standard called Spec 1170, based on System V. Unfortunately, inconsistencies between Spec 1170 and POSIX made it difficult for vendors and application developers to adhere to both standards.

In 1998, after another version of the X/Open standard, many additions to the POSIX standard, and the threat of world-domination by Microsoft, the Austin Group was formed. This group included members from The Open Group (a new name for the X/Open Foundation), IEEE POSIX and the ISO/IEC Joint Technical Committee. The purpose of the group was to revise, combine and update the standards. Finally, at the end of 2001, a joint document was approved by the IEEE and The Open Group. The ISO/IEC approved this document in November of 2002. This specification is referred to as the Single UNIX Specification, Version 3, or IEEE Std. 1003.1-2001, POSIX. In this book we refer to this standard merely as POSIX.

Each of the standards organizations publishes copies of the standard. Print and electronic versions of the standard are available from IEEE and ISO/IEC. The Open Group publishes the standard on CD-ROM. It is also freely available on their web site [89]. The copy of the standard published by the IEEE is in four volumes: Base Definitions [50], Shell and Utilities [52], System Interfaces [49] and Rationale [51] and is over 3600 pages in length.

The code for this book was tested on three systems: Solaris 9, Redhat Linux 8 and Mac OS 10.2. Table 1.3 lists the extensions of POSIX discussed in the book and the status of implementation of each on the tested systems. This indication is based on the man pages and on running the programs from the book, not on any official statement of compliance.

Table 1.3. POSIX extensions supported by our test systems.

code

extension

Solaris 9

Redhat 8

Mac OS 10.2

AIO

asynchronous input and output

yes

yes

no

CX

extension to the ISO C standard

yes

yes

yes

FSC

file synchronization

yes

yes

yes

RTS

realtime signals extension

yes

yes

no

SEM

semaphores

yes

unnamed only

named only

THR

threads

yes

almost

yes

TMR

timers

yes

yes

no

TPS

thread execution scheduling

yes

yes

yes

TSA

thread stack address attribute

no

no

no

TSF

thread-safe functions

yes

strtok_r only

yes

XSI

XSI extension

yes

yes

timers, getsid, ftok, no IPC

_POSIX_VERSION

199506

199506

198808

A POSIX-compliant implementation must support the POSIX base standard. Many of the interesting aspects of POSIX are not part of the base standard but rather are defined as extensions to the base standard. Table E.1 of Appendix E gives a complete list of the extensions in the 2001 version of POSIX. Appendix E applies only to implementations that claim compliance with the 2001 version base standard. These implementations set the symbol _POSIX_VERSION defined in unistd.h to 200112L. As of the writing of this book, none of the systems we tested used this value. Systems that support the previous version of POSIX have a value of 199506L. Differences between the 1995 and 2001 standards for features supported by both are minor.

The new POSIX standard also incorporates the ISO/IEC International Standard 9899, also referred to as ISO C. In the past, minor differences between the POSIX and ISO C standards have caused confusion. Often, these differences were unintentional, but differences in published standards required developers to choose between them. The current POSIX standard makes it clear that any differences between the published POSIX standard and the ISO C standard are unintentional. If any discrepancies occur, the ISO C standard takes precedence.