Your First C Program Using Fork System Call

By default, C programs have no concurrency or parallelism: only one task happens at a time, and each line of code is executed sequentially. But sometimes you have to read a file or, even worse, a socket connected to a remote computer, and this takes a really long time from the computer’s point of view. It generally takes less than a second, but remember that a single CPU core can execute 1 or 2 billion instructions during that time.

So, as a good developer, you will be tempted to instruct your C program to do something more useful while waiting. That’s where concurrent programming comes to your rescue (and makes your computer unhappy because it has to work more).

Here, I’ll show you the Linux fork system call, one of the safest ways to do concurrent programming.

Can concurrent programming be unsafe?

Yes, it can. For example, there’s another way, called multithreading. It has the benefit of being lighter, but it can really go wrong if you use it incorrectly. If your program, by mistake, reads a variable and writes to the same variable at the same time from two threads, it will become incoherent, and the bug is almost undetectable: one of a developer’s worst nightmares.

As you will see below, fork copies the memory, so it’s not possible to have such problems with variables. Also, fork makes an independent process for each concurrent task. Due to these safety measures, launching a new concurrent task with fork is slower than with multithreading: approximately 5x, although the exact ratio depends on the system and on how much memory the process uses. As you can see, that’s not much for the benefits it brings.

Now, enough explanation: it’s time to test your first C program using the fork call.

The Linux fork example

Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    pid_t forkStatus;

    forkStatus = fork();

    /* Child... */
    if (forkStatus == 0) {
        printf("Child is running, processing.\n");
        sleep(5);
        printf("Child is done, exiting.\n");

    /* Parent... */
    } else if (forkStatus != -1) {
        printf("Parent is waiting...\n");

        wait(NULL);
        printf("Parent is exiting...\n");

    /* fork() failed: no child was created */
    } else {
        perror("Error while calling the fork function");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

I invite you to compile and run the code above. But if you want to see what the output looks like and you are too “lazy” to compile it (after all, maybe you are a tired developer who has compiled C programs all day long), you can find the output below, along with the command I used to compile it:

$ gcc -std=c89 -Wpedantic -Wall forkSleep.c -o forkSleep -O2
$ ./forkSleep
Parent is waiting...
Child is running, processing.
Child is done, exiting.
Parent is exiting...

Please don’t be afraid if your output isn’t 100% identical to mine. Remember that running things at the same time means that tasks run in no predefined order. In this example, you might see that the child is running before the parent is waiting, and there’s nothing wrong with that. In general, the ordering depends on the kernel version, the number of CPU cores, the programs currently running on your computer, etc.

OK, now let’s get back to the code. Before the line with fork(), this C program is perfectly normal: one line executes at a time, and there’s only one process for this program (if there were a small delay before fork(), you could confirm that in your task manager).

After the fork(), there are now 2 processes that can run in parallel. First, there’s a child process. This process is the one that has been created by fork(). This child process is special: it hasn’t executed any of the lines of code above the line with fork(). Instead of starting at the beginning of the main function, it starts running at the point where fork() returns.

What about the variables declared before fork?

Well, Linux fork() is interesting because it smartly answers this question. Variables and, in fact, all the memory of the C program are copied into the child process.

Let me define what fork does in a few words: it creates a clone of the process calling it. The 2 processes are almost identical: all variables contain the same values, and both processes continue with the code just after fork(). However, after the cloning, they are separated. If you update a variable in one process, the other process won’t have its variable updated. It’s really a clone, a copy: the processes share almost nothing. It’s really useful: you can prepare a lot of data, then fork() and use that data in all clones.

The separation starts when fork() returns a value. The original process (called the parent process) gets the process ID of the cloned process. On the other side, the cloned process (called the child process) gets 0. If fork() fails, no child is created and the parent gets -1. Now you should start to understand why I’ve put if/else if statements after the fork() line. Using the return value, you can instruct the child to do something different from what the parent is doing, and believe me, it’s useful.

On one side, in the example code above, the child is doing a task that takes 5 seconds and prints a message. To imitate a process that takes a long time, I use the sleep function. Then, the child exits successfully.

On the other side, the parent prints a message, waits until the child exits and finally prints another message. The fact that the parent waits for its child is important. As it’s an example, the parent spends most of its time waiting for its child. But I could have instructed the parent to do any kind of long-running task before telling it to wait. This way, it would have done useful work instead of waiting. After all, this is why we use fork(), no?

However, as I said above, it’s really important that the parent waits for its children. And it’s important because of zombie processes.

Why waiting is important

Parents generally want to know when their children have finished their processing. For example, you want to run tasks in parallel, but you certainly don’t want the parent to exit before the children are done, because if that happened, the shell would give back a prompt while the children have not finished yet, which is weird.

The wait function blocks until one of the child processes terminates. If a parent calls fork() 10 times, it also needs to call wait() 10 times, once for each child created.

But what happens if the parent calls the wait function when all children have already exited? That’s where zombie processes come in.

When a child exits before the parent calls wait(), the Linux kernel lets the child exit but keeps a ticket recording that the child has exited, along with its exit status. Then, when the parent calls wait(), the kernel finds the ticket, deletes it, and wait() returns immediately, because the parent needs to know that the child has finished. This ticket is called a zombie process.

That’s why it’s important that the parent calls wait(): if it doesn’t, zombie processes remain in the kernel’s process table, and the Linux kernel can only keep a limited number of processes, zombies included. Once the limit is reached, your computer is unable to create any new process, and you will be in very bad shape: even to kill a process, you may need to create a new process. For example, if you want to open your task manager to kill a process, you can’t, because your task manager needs a new process. Even worse, you cannot kill a zombie process: it has already exited, so it only disappears once its parent waits for it.

That’s why calling wait is important: it allows the kernel to clean up the child process instead of piling up a list of terminated processes. And what if the parent exits without ever calling wait()?

Fortunately, when the parent terminates, its children, zombies included, are adopted by the init process (PID 1, or a designated subreaper process), which calls wait() for them. Therefore, when a parent exits, the remaining zombie processes linked to it are cleaned up shortly afterwards. Zombie processes are really only useful to let parent processes find out that a child terminated before the parent called wait().

Now, here are some safety rules that let you make the best use of fork without any problem.

Simple rules to have fork working as intended

First, if you know multithreading, please don’t fork a program that uses threads. In fact, avoid mixing multiple concurrency technologies in general. fork() is designed for ordinary single-threaded C programs: the child contains only a copy of the thread that called fork(), not the others.

Second, avoid calling open or fopen on files before fork(). Open files are one of the few things shared, not cloned, between parent and child. If you read 16 bytes in the parent, the read cursor moves forward by 16 bytes both in the parent and in the child. Worse, if the child and the parent write bytes to the same file at the same time, the parent’s bytes can be mixed with the child’s!

To be clear, apart from STDIN, STDOUT and STDERR, you really don’t want to share any open files with clones.

Third, be careful with sockets. Sockets are also shared between parent and children. This is useful to listen on a port and then have multiple child workers ready to handle new client connections. However, if you use it wrongly, you will get in trouble.

Fourth, if you want to call fork() within a loop, do it with extreme care. Let’s take this code:

/* DO NOT COMPILE THIS */
const int targetFork = 4;
pid_t forkResult;

for (int i = 0; i < targetFork; i++) {
    forkResult = fork();
    /* ... */
}

If you read the code, you might expect it to create 4 children. But it will instead create 15 of them, for 16 processes in total. That’s because the children also execute the loop, so they, in turn, call fork(), and the number of processes doubles at each iteration. When the loop is infinite, it’s called a fork bomb, and it is one of the ways to slow down a Linux system so much that it no longer works and needs a reboot. In a nutshell, keep in mind that Clone Wars aren’t only dangerous in Star Wars!

Now that you have seen how a simple loop can go wrong, how do you use loops with fork()? If you need a loop, always check fork’s return value:

const int targetFork = 4;
pid_t forkResult;
int i = 0;

do {
    forkResult = fork();
    /* ... */

    i++;

/* The child (0) and a failed fork (-1) both leave the loop */
} while ((forkResult != 0 && forkResult != -1) && (i < targetFork));

Conclusion

Now it’s time for you to do your own experiments with fork()! Try novel ways to save time by spreading tasks across multiple CPU cores, or do some background processing while you wait for a file to be read!

Don’t hesitate to read the manual pages via the man command. You will learn precisely how fork() works, what errors you can get, etc. And enjoy concurrency!

About the author

On Air Netlines

With more than a decade of experience in computer science, I use Linux technology every day for hobby and business purposes. Recently, I have been working in the Google Cloud Platform environment in order to bring flexibility to all my software systems. Reach me at: onairnetlines.com