[Kos-dev] Threads and fork()

Fri Oct 31 16:37:06 CET 2003

Hi,

Here is an excerpt from a forum I found through Google. It is a reply by a
certain Loic Demaigne:

-----------------------------------
Hello Peteris,

     > > OTOH using fork() with threads is usually a bad, bad,
     > > very bad idea!
  >
  > Please elaborate.

So you want to know more about fork() and Pthreads? Here's a
simplified story.

Prior to the advent of Pthreads, fork() was really simple and well
understood. A process was in reality a single thread, and forking had
a well defined meaning: creating a new process whose image after the
fork() equals the one prior to the fork().

But then, Pthreads came. From a Pthreads point of view, a process is a
collection of threads. So the first natural question that arose was:
what does it mean when a thread calls fork()? Does it mean that only
the thread calling fork() exists in the child (this is called "fork
one")? Or does it mean that the whole process must be forked, i.e.
that all threads exist in the child ("fork all")?

The POSIX guys opted for the "fork one" semantics, because it was
"simpler". Fine. But this approach had a major problem: the data
invariants across the fork() are not necessarily protected. Indeed,
recall that fork() duplicates the whole memory space, including
mutexes in their current locking state, but only the calling thread:
other threads are not running in the child process. Thus, if a mutex
is locked by a thread other than the thread calling fork(), that mutex
will remain locked forever in the child process, possibly blocking
the execution of the child process.
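A minimal C sketch of this hazard, assuming Linux with glibc (compile with -pthread). The names data_lock, holder and child_sees_locked_mutex are hypothetical. A helper thread takes data_lock and keeps it while the main thread forks; in the child, pthread_mutex_trylock() reports EBUSY because the owning thread was not copied. Strictly speaking, POSIX only allows async-signal-safe calls in such a child, so this demonstrates the problem rather than being portable code, and the function is meant to be called once.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting some shared data. */
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake so that fork() happens while the helper owns data_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static int holder_ready = 0;
static int may_release = 0;

static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&data_lock);  /* take the lock and keep it for a while */
    pthread_mutex_lock(&state_lock);
    holder_ready = 1;
    pthread_cond_broadcast(&state_cond);
    while (!may_release)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);
    pthread_mutex_unlock(&data_lock);
    return NULL;
}

/* Returns 1 if the child found data_lock locked (EBUSY), 0 otherwise. */
int child_sees_locked_mutex(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, holder, NULL);
    assert(rc == 0);
    (void)rc;

    /* Wait until the helper thread owns data_lock. */
    pthread_mutex_lock(&state_lock);
    while (!holder_ready)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        /* Child: only the forking thread was copied, so nobody will ever
           unlock data_lock here; a blocking lock would hang forever. */
        _exit(pthread_mutex_trylock(&data_lock) == EBUSY ? 1 : 0);
    }

    /* Parent: the helper still exists here and can release the lock. */
    pthread_mutex_lock(&state_lock);
    may_release = 1;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_lock);
    pthread_join(t, NULL);

    int status;
    pid_t w = waitpid(pid, &status, 0);
    assert(w == pid && WIFEXITED(status));
    (void)w;
    return WEXITSTATUS(status);
}
```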

To address that problem, the POSIX guys introduced the famous
pthread_atfork(). The function pthread_atfork() allows you to install
fork handlers, in order to ensure the proper locking/unlocking of
mutexes across fork(). So far, so good. The Pthreads interface, aka
POSIX 1003.1c-1995, was released...
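A minimal sketch of the usual pthread_atfork() pattern for a single mutex, assuming a POSIX system such as Linux/glibc; table_lock, prepare, in_parent, in_child, install_handlers and atfork_demo are hypothetical names. The prepare handler takes the lock just before fork(), so no other thread can be halfway through an update, and both the parent and child handlers release it afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting a shared table. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent just before fork(): wait until no thread holds the lock. */
static void prepare(void) { pthread_mutex_lock(&table_lock); }

/* Runs in the parent after fork() returns: release the lock again. */
static void in_parent(void) { pthread_mutex_unlock(&table_lock); }

/* Runs in the child after fork(): its only thread is a copy of the one
   that took the lock in prepare(), so it may release it. */
static void in_child(void) { pthread_mutex_unlock(&table_lock); }

static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;

static void install_handlers(void)
{
    int rc = pthread_atfork(prepare, in_parent, in_child);
    assert(rc == 0);
    (void)rc;
}

/* Returns 1 if the child could take table_lock after fork(), 0 otherwise. */
int atfork_demo(void)
{
    pthread_once(&handlers_once, install_handlers);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        /* in_child() has already run, so table_lock is free here. */
        _exit(pthread_mutex_trylock(&table_lock) == 0 ? 1 : 0);
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    assert(w == pid && WIFEXITED(status));
    (void)w;
    return WEXITSTATUS(status);
}
```

With several mutexes, the prepare handler would take them in one fixed order so that the handlers themselves cannot deadlock against other threads.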

Some time after, David Butenhof (one of the Pthreads masters) noticed
a big issue that had sadly been overlooked by the standard. Namely,
fork() is in the list of async-signal-safe functions, i.e. the
functions that can be safely called in signal handlers. And that's a
problem, because if you install fork handlers, they might call the
pthread_mutex_* functions, and none of the Pthreads interface is
async-signal-safe (for efficiency reasons).

This issue caused a lot of headaches for the POSIX guys, and if I
understood correctly, there is no portable way to ensure that the data
invariants are protected across a fork(). A rule of thumb stated by
the POSIX guys is what Paul explained:

"[Do not call] any async-signal-unsafe functions between the return
from fork() and the subsequent exec. Otherwise, the behavior might be
undefined."

(As a matter of fact, this is a sufficient condition to ensure
portability, but not a necessary one. It might be weakened.)
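A minimal sketch of that rule of thumb in C, assuming a POSIX system; spawn_and_wait and the handled_signals list are hypothetical. Between fork() and execve() the child only calls functions from the async-signal-safe list (sigemptyset(), sigaction(), sigprocmask(), execve(), _exit()). Signals are blocked around fork() so that none of the parent's handlers can run in the child before it has reset them to SIG_DFL, which also touches on item 4 of the mini-guide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical: the signals this program installs its own handlers for. */
static const int handled_signals[] = { SIGINT, SIGTERM, SIGUSR1 };

/* Runs path with argv in a child process and returns the child's exit
   status, or -1 on error. Between fork() and execve() the child calls
   only async-signal-safe functions. */
int spawn_and_wait(const char *path, char *const argv[])
{
    sigset_t all, old;
    sigfillset(&all);
    /* Block signals in the calling thread; the child inherits this mask,
       so no inherited handler can run before it is reset below. */
    int rc = pthread_sigmask(SIG_BLOCK, &all, &old);
    assert(rc == 0);
    (void)rc;

    pid_t pid = fork();
    if (pid == 0) {
        struct sigaction dfl;
        dfl.sa_handler = SIG_DFL;
        sigemptyset(&dfl.sa_mask);
        dfl.sa_flags = 0;
        for (size_t i = 0; i < sizeof handled_signals / sizeof handled_signals[0]; i++)
            sigaction(handled_signals[i], &dfl, NULL);
        sigprocmask(SIG_SETMASK, &old, NULL);  /* the child has one thread */
        execve(path, argv, environ);
        _exit(127);  /* exec failed; _exit() skips atexit handlers and stdio flushing */
    }

    pthread_sigmask(SIG_SETMASK, &old, NULL);  /* parent: restore the mask */
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, with argv set to {"sh", "-c", "exit 3", NULL}, spawn_and_wait("/bin/sh", argv) should return 3; here 127 is used to signal that the exec itself failed.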

==========================================
A Mini-guide regarding fork() and Pthreads
==========================================

1- You DO NOT WANT to do that.

2- If you need to fork(), then whenever possible fork() all your
children before starting any threads.

3- If you need to fork() from a thread, then immediately exec a new
program in the child process.

4- Take care about reentrancy with respect to POSIX signals between
the fork() and the exec*().
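A minimal sketch of item 2, assuming a POSIX system; prefork_then_threads, worker_main, parent_thread and MAX_WORKERS are hypothetical. All workers are forked while the parent still has a single thread, so each child may use malloc() and other functions that are not async-signal-safe; threads are started in the parent only afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16  /* hypothetical upper bound */

/* Hypothetical worker job. The process was single-threaded when this
   child was forked, so calling malloc() here is safe. */
static int worker_main(int id)
{
    int *slot = malloc(sizeof *slot);
    if (slot == NULL)
        return 255;
    *slot = id;
    int result = *slot;
    free(slot);
    return result;
}

/* Placeholder for the parent's threaded work. */
static void *parent_thread(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Forks nworkers children first, then starts a thread in the parent.
   Returns the number of workers that exited with their own id. */
int prefork_then_threads(int nworkers)
{
    pid_t pids[MAX_WORKERS];
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);

    /* Step 1: fork every worker while the parent has only one thread. */
    for (int i = 0; i < nworkers; i++) {
        pids[i] = fork();
        assert(pids[i] >= 0);
        if (pids[i] == 0)
            _exit(worker_main(i));
    }

    /* Step 2: only now start threads in the parent. */
    int done = 0;
    pthread_t t;
    int rc = pthread_create(&t, NULL, parent_thread, &done);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);

    /* Step 3: reap the workers and check their exit codes. */
    int ok = 0;
    for (int i = 0; i < nworkers; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i]
            && WIFEXITED(status) && WEXITSTATUS(status) == i)
            ok++;
    }
    return ok;
}
```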

Conclusion:

As David Butenhof humorously put it:
"it is not polite to say to a thread: fork you!" ;-)

Regards,
Loic.
--------------------------------

There you go, that already gives some information on the subject. In the end, our
decision ("if there is more than one thread => no fork()") is not so stupid after all!

Thomas
-- 
Thomas Petazzoni
thomas.petazzoni at enix.org - Jabber: kostom at sourcecode.de
http://www.enix.org/~thomas/