Monday, December 10, 2007

Processes in UNIX

Process Identification

UNIX identifies processes by a unique integral value called the process ID. Each process also has a parent process ID, which is initially the process ID of the process that created it. If this parent process terminates, the process is adopted by a system process so that the parent process ID always identifies a valid process.

The getpid and getppid functions return the process ID and the parent process ID, respectively. The pid_t type is a signed integer type that represents a process ID.

SYNOPSIS

#include <unistd.h>

pid_t getpid(void);
pid_t getppid(void);
POSIX

Neither the getpid nor the getppid functions can return an error.

Example 3.1 outputPID.c

The following program outputs its process ID and its parent process ID. Notice that the return values are cast to long for printing since there is no guarantee that a pid_t will fit in an int.

#include <stdio.h>
#include <unistd.h>

int main (void) {
   printf("I am process %ld\n", (long)getpid());
   printf("My parent is %ld\n", (long)getppid());
   return 0;
}

System administrators assign a unique integral user ID and an integral group ID to each user when creating the user's account. The system uses the user and group IDs to retrieve from the system database the privileges allowed for that user. The most privileged user, superuser or root, has a user ID of 0. The root user is usually the system administrator.

A UNIX process has several user and group IDs that convey privileges to the process. These include the real user ID, the real group ID, the effective user ID and the effective group ID. Usually, the real and effective IDs are the same, but under some circumstances the process can change them. The process uses the effective IDs for determining access permissions for files. For example, a program that runs with root privileges may want to create a file on behalf of an ordinary user. By setting the process's effective user ID to be that of this user, the process can create the file "as if" the user created it. For the most part, we assume that the real and effective user and group IDs are the same.

The following functions return group and user IDs for a process. The gid_t and uid_t are integral types representing group and user IDs, respectively. The getgid and getuid functions return the real IDs, and getegid and geteuid return the effective IDs.

SYNOPSIS

#include <unistd.h>

gid_t getegid(void);
uid_t geteuid(void);
gid_t getgid(void);
uid_t getuid(void);
POSIX

None of these functions can return an error.

Example 3.2 outputIDs.c

The following program prints out various user and group IDs for a process.

#include <stdio.h>
#include <unistd.h>

int main(void) {
   printf("My real user ID is %5ld\n", (long)getuid());
   printf("My effective user ID is %5ld\n", (long)geteuid());
   printf("My real group ID is %5ld\n", (long)getgid());
   printf("My effective group ID is %5ld\n", (long)getegid());
   return 0;
}

Process State

The state of a process indicates its status at a particular time. Most operating systems allow some form of the states listed in Table 3.1. A state diagram is a graphical representation of the allowed states of a process and the allowed transitions between states. Figure 3.1 shows such a diagram. The nodes of the graph in the diagram represent the possible states, and the edges represent possible transitions. A directed arc from state A to state B means that a process can go directly from state A to state B. The labels on the arcs specify the conditions that cause the transitions between states to occur.

Figure 3.1. State diagram for a simple operating system.


While a program is undergoing the transformation into an active process, it is said to be in the new state. When the transformation completes, the operating system puts the process in a queue of processes that are ready to run. The process is then in the ready or runnable state. Eventually the component of the operating system called the process scheduler selects a process to run. The process is in the running state when it is actually executing on the CPU.

Table 3.1. Common process states.

state      meaning
new        being created
running    instructions are being executed
blocked    waiting for an event such as I/O
ready      waiting to be assigned to a processor
done       finished

A process in the blocked state is waiting for an event and is not eligible to be picked for execution. A process can voluntarily move to the blocked state by executing a call such as sleep. More commonly, a process moves to the blocked state when it performs an I/O request. As explained in Section 1.2, input and output can be thousands of times slower than ordinary instructions. A process performs I/O by requesting the service through a library function that is sometimes called a system call. During the execution of a system call, the operating system regains control of the processor and can move the process to the blocked state until the operation completes.

A context switch is the act of removing one process from the running state and replacing it with another. The process context is the information that the operating system needs about the process and its environment to restart it after a context switch. Clearly, the executable code, stack, registers and program counter are part of the context, as is the memory used for static and dynamic variables. To be able to transparently restart a process, the operating system also keeps track of the process state, the status of program I/O, user and process identification, privileges, scheduling parameters, accounting information and memory management information. If a process is waiting for an event or has caught a signal, that information is also part of the context. The context also contains information about other resources such as locks held by the process.

The ps utility displays information about processes. By default, ps displays information about processes associated with the user. The -a option displays information for processes associated with terminals. The -A option displays information for all processes. The -o option specifies the format of the output.

SYNOPSIS

ps [-aA] [-G grouplist] [-o format]...[-p proclist]
   [-t termlist] [-U userlist]
POSIX Shells and Utilities

Example 3.3

The following is sample output from the ps -a command.

% ps -a
  PID TTY      TIME CMD
20825 pts/11   0:00 pine
20205 pts/11   0:01 bash
20258 pts/16   0:01 telnet
20829 pts/2    0:00 ps
20728 pts/4    0:00 pine
19086 pts/12   0:00 vi

The POSIX:XSI Extension provides additional arguments for the ps command. Among the most useful are the full (-f) and the long (-l) options. Table 3.2 lists the fields that are printed for each option. An (all) in the option column means that the field appears in all forms of ps.

Example 3.4

The execution of the ps -la command on the same system as for Example 3.3 produced the following output.

F S  UID   PID  PPID C PRI NI ADDR  SZ WCHAN TTY     TIME CMD
8 S 4228 20825 20205 0  40 20    ? 859     ? pts/11  0:00 pine
8 S 4228 20205 19974 0  40 20    ? 321     ? pts/11  0:01 bash
8 S 2852 20258 20248 0  40 20    ? 328     ? pts/16  0:01 telnet
8 O  512 20838 18178 0  50 20    ? 134       pts/2   0:00 ps
8 S 3060 20728 20719 0  40 20    ? 845     ? pts/4   0:00 pine
8 S 1614 19086 18875 0  40 20    ? 236     ? pts/12  0:00 vi

Table 3.2. Fields reported for various options of the ps command in the POSIX:XSI Extension.

header   option    meaning
F        -l        flags (octal and additive) associated with the process
S        -l        process state
UID      -f, -l    user ID of the process owner
PID      (all)     process ID
PPID     -f, -l    parent process ID
C        -f, -l    processor utilization used for scheduling
PRI      -l        process priority
NI       -l        nice value
ADDR     -l        process memory address
SZ       -l        size in blocks of the process image
WCHAN    -l        event on which the process is waiting
TTY      (all)     controlling terminal
TIME     (all)     cumulative execution time
CMD      (all)     command name (arguments with -f option)


UNIX Process Creation and fork

A process can create a new process by calling fork. The calling process becomes the parent, and the created process is called the child. The fork function copies the parent's memory image so that the new process receives a copy of the address space of the parent. Both processes continue at the instruction after the fork statement (executing in their respective memory images).

SYNOPSIS

#include <unistd.h>

pid_t fork(void);
POSIX

Creation of two completely identical processes would not be very useful. The return value of fork is the critical characteristic that allows the parent and the child to distinguish themselves and to execute different code. The fork function returns 0 to the child and returns the child's process ID to the parent. When fork fails, it returns –1 and sets errno. If the system does not have the necessary resources to create the child or if limits on the number of processes would be exceeded, fork sets errno to EAGAIN. When fork fails, it does not create a child.

Example 3.5 simplefork.c

In the following program, both parent and child execute the x = 1 assignment statement after returning from fork.

#include <stdio.h>
#include <unistd.h>

int main(void) {
   int x;

   x = 0;
   fork();
   x = 1;
   printf("I am process %ld and my x is %d\n", (long)getpid(), x);
   return 0;
}

Before the fork of Example 3.5, one process executes with a single x variable. After the fork, two independent processes execute, each with its own copy of the x variable. Since the parent and child processes execute independently, they do not execute the code in lock step or modify the same memory locations. Each process prints a message with its respective process ID and x value.

The parent and child processes execute the same instructions because the code of Example 3.5 did not test the return value of fork. Example 3.6 demonstrates how to test the return value of fork.

Example 3.6 twoprocs.c

After fork in the following program, the parent and child output their respective process IDs.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
   pid_t childpid;

   childpid = fork();
   if (childpid == -1) {
      perror("Failed to fork");
      return 1;
   }
   if (childpid == 0)                              /* child code */
      printf("I am child %ld\n", (long)getpid());
   else                                           /* parent code */
      printf("I am parent %ld\n", (long)getpid());
   return 0;
}

The original process in Example 3.6 has a nonzero value of the childpid variable, so it executes the second printf statement. The child process has a zero value of childpid and executes the first printf statement. The output from these processes can appear in either order, depending on whether the parent or the child executes first. If the program is run several times on the same system, the order of the output may or may not always be the same.

Exercise 3.7 badprocessID.c

What happens when the following program executes?

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
   pid_t childpid;
   pid_t mypid;

   mypid = getpid();
   childpid = fork();
   if (childpid == -1) {
      perror("Failed to fork");
      return 1;
   }
   if (childpid == 0)                              /* child code */
      printf("I am child %ld, ID = %ld\n", (long)getpid(), (long)mypid);
   else                                           /* parent code */
      printf("I am parent %ld, ID = %ld\n", (long)getpid(), (long)mypid);
   return 0;
}

Answer:

The parent sets the mypid value to its process ID before the fork. When fork executes, the child gets a copy of the process address space, including all variables. Since the child does not reset mypid, the value of mypid for the child does not agree with the value returned by getpid.

Program 3.1 creates a chain of n processes by calling fork in a loop. On each iteration of the loop, the parent process has a nonzero childpid and hence breaks out of the loop. The child process has a zero value of childpid and becomes a parent in the next loop iteration. In case of an error, fork returns –1 and the calling process breaks out of the loop. The exercises in Section 3.8 build on this program.

Figure 3.2 shows a graph representing the chain of processes generated for Program 3.1 when n is 4. Each circle represents a process labeled by its value of i when it leaves the loop. The edges represent the is-a-parent relationship. A → B means process A is the parent of process B.

Figure 3.2. Chain of processes generated by Program 3.1 when called with a command-line argument of 4.


Program 3.1 simplechain.c

A program that creates a chain of n processes, where n is a command-line argument.

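#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (int argc, char *argv[]) {
   pid_t childpid = 0;
   int i, n;

   if (argc != 2){   /* check for valid number of command-line arguments */
      fprintf(stderr, "Usage: %s processes\n", argv[0]);
      return 1;
   }
   n = atoi(argv[1]);
   for (i = 1; i < n; i++)
      if (childpid = fork())       /* parent (or failed fork) leaves the loop */
         break;

   /* each process reports its loop index and IDs on stderr */
   fprintf(stderr, "i:%d  process ID:%ld  parent ID:%ld  child ID:%ld\n",
           i, (long)getpid(), (long)getppid(), (long)childpid);
   return 0;
}

Exercise 3.8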

Run Program 3.1 for large values of n. Will the messages always come out ordered by increasing i?

Answer:

The exact order in which the messages appear depends on the order in which the processes are selected by the process scheduler to run. If you run the program several times, you should notice some variation in the order.

Exercise 3.9

What happens if Program 3.1 writes the messages to stdout, using printf, instead of to stderr, using fprintf?

Answer:

By default, the system buffers output written to stdout, so a particular message may not appear immediately after the printf returns. Messages to stderr are not buffered, but instead written immediately. For this reason, you should always use stderr for your debugging messages.

Program 3.2 creates a fan of n processes by calling fork in a loop. On each iteration, the newly created process breaks from the loop while the original process continues. In contrast, the process that calls fork in Program 3.1 breaks from the loop while the newly created process continues for the next iteration.

Program 3.2 simplefan.c

A program that creates a fan of n processes where n is passed as a command-line argument.

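#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (int argc, char *argv[]) {
   pid_t childpid = 0;
   int i, n;

   if (argc != 2){   /* check for valid number of command-line arguments */
      fprintf(stderr, "Usage: %s processes\n", argv[0]);
      return 1;
   }
   n = atoi(argv[1]);
   for (i = 1; i < n; i++)
      if ((childpid = fork()) <= 0)   /* child (or failed fork) leaves the loop */
         break;

   /* each process reports its loop index and IDs on stderr */
   fprintf(stderr, "i:%d  process ID:%ld  parent ID:%ld  child ID:%ld\n",
           i, (long)getpid(), (long)getppid(), (long)childpid);
   return 0;
}

Figure 3.3 shows the process fan generated by Program 3.2 when n is 4. The processes are labeled by the value of i at the time they leave the loop. The original process creates n–1 children. The exercises in Section 3.9 build on this example.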

Figure 3.3. Fan of processes generated by Program 3.2 with a command-line argument of 4.


Exercise 3.10

Explain what happens when you replace the test

(childpid = fork()) <= 0

of Program 3.2 with

(childpid = fork()) == -1

Answer:

In this case, all the processes remain in the loop unless the fork fails. Each iteration of the loop doubles the number of processes, forming a tree configuration illustrated in Figure 3.4 when n is 4. The figure represents each process by a circle labeled with the i value at the time it was created. The original process has a 0 label. The lowercase letters distinguish processes that were created with the same value of i. Although this code appears to be similar to that of Program 3.1, it does not distinguish between parent and child after fork executes. Both the parent and child processes go on to create children on the next iteration of the loop, hence the population explosion.

Exercise 3.11

Run Program 3.1, Program 3.2, and a process tree program based on the modification suggested in Exercise 3.10. Carefully examine the output. Draw diagrams similar to those of Figure 3.2 through Figure 3.4, labeling the circles with the actual process IDs. Use → to designate the is-a-parent relationship. Do not use large values of the command-line argument unless you are on a dedicated system. How can you modify the programs so that you can use ps to see the processes that are created?

Answer:

In their current form, the programs complete too quickly for you to view them with ps. Insert the sleep(30); statement immediately before return in order to have each process block for 30 seconds before exiting. In another command window, continually execute ps -l. Section 3.4 explains why some of the processes may report a parent ID of 1 when sleep is omitted.

Figure 3.4. Tree of processes produced by the modification of Program 3.2 suggested in Exercise 3.10.


The fork function creates a new process by making a copy of the parent's image in memory. The child inherits parent attributes such as environment and privileges. The child also inherits some of the parent's resources such as open files and devices.

Not every parent attribute or resource is inherited by the child. For instance, the child has a new process ID and of course a different parent ID. The child's times for CPU usage are reset to 0. The child does not get locks that the parent holds. If the parent has set an alarm, the child is not notified when the parent's alarm expires. The child starts with no pending signals, even if the parent had signals pending at the time of the fork.

Although a child inherits its parent's process priority and scheduling attributes, it competes for processor time with other processes as a separate entity. A user running on a crowded time-sharing system can obtain a greater share of the CPU time by creating more processes. A system manager on a crowded system might restrict process creation to prevent a user from creating processes to get a bigger share of the resources.
