Novell Doc: NDK: Libraries for C (LibC), Volume 2

49.1 Thread Migration

You can move a thread from one application to another. This movement of threads is called thread migration. This concept is specific to NetWare and is not found in other operating systems such as Linux or Windows.

The best type of migration occurs when a thread calls into and comes back out of a function in an NLM library that was written to support this functionality. The multiprocessor kernel (NetWare 5.x and above) and the LibC functions give NetWare the latest multithreaded technology and techniques, but only if your code follows good programming practices.

A thread that becomes inoperative while it holds a lock or other resource denies that resource to other threads. This is an important issue for NLM applications because they call freely into one another, especially in kernel mode (ring 0). In NetWare, when this situation occurs at unload and involves the system console thread, the console becomes permanently inoperative.

The solution to this cancellation problem is to use both implicit cancellation points, which the library maintains, and explicit cancellation points, which the application maintains through polling. The application must know where its threads might wander and whether they might be indiscriminately cancelled. You should explicitly disable cancellation in uncertain circumstances (calls into foreign NLM applications, etc.) and re-enable cancellation once you are sure it is safe.

See the following sections for more about migration and cancellation issues.

49.1.1 Latency

In LibC, a thread cannot kill itself. However, one thread can kill another thread by calling pthread_cancel on a thread whose cancellation type is PTHREAD_CANCEL_DEFERRED. This call does not immediately kill the thread but instead marks it for death. The marked thread continues running until it reaches a cancellation point, at which time the library puts it to sleep and kills it. The interval between the time that the marking occurs and the time that the thread reaches its cancellation point is called latency.
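The latency described above can be observed with standard POSIX threads on a general-purpose system. The following is a minimal sketch, not NetWare-specific code: the names worker and run_latency_demo and the flag variables are made up for illustration, and it assumes that the host's pthread_cancel and pthread_testcancel behave as POSIX describes. The worker spins in a loop that contains no cancellation point, so it keeps running after it has been marked and ends only when it reaches pthread_testcancel.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative flags shared by the two threads (all names are hypothetical). */
static atomic_int started;             /* worker is running                   */
static atomic_int marked;              /* pthread_cancel has returned         */
static atomic_int steps_after_mark;    /* work done while marked for death    */
static atomic_int passed_cancel_point; /* stays 0 if the cancel is honored    */

static void *worker(void *arg)
{
    int old_type;

    (void)arg;
    /* Deferred is the POSIX default; it is set here only to be explicit. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    atomic_store(&started, 1);

    /* Busy loop with no cancellation point: a pending cancel cannot act here. */
    while (!atomic_load(&marked))
        ;

    atomic_fetch_add(&steps_after_mark, 1); /* still alive although marked    */
    pthread_testcancel();                   /* cancellation point: ends here  */
    atomic_store(&passed_cancel_point, 1);  /* not reached when cancelled     */
    return NULL;
}

/* Returns 1 if the worker kept running after it was marked and then ended
 * at the cancellation point, 0 otherwise. */
int run_latency_demo(void)
{
    pthread_t thread;
    void *result = NULL;
    int rc;

    atomic_store(&started, 0);
    atomic_store(&marked, 0);
    atomic_store(&steps_after_mark, 0);
    atomic_store(&passed_cancel_point, 0);

    rc = pthread_create(&thread, NULL, worker, NULL);
    assert(rc == 0);
    while (!atomic_load(&started))
        sched_yield();

    rc = pthread_cancel(thread);  /* marks the worker; does not stop it yet   */
    assert(rc == 0);
    atomic_store(&marked, 1);     /* latency window: marked but still running */

    rc = pthread_join(thread, &result);
    assert(rc == 0);
    (void)rc;

    return result == PTHREAD_CANCELED
        && atomic_load(&steps_after_mark) == 1
        && atomic_load(&passed_cancel_point) == 0;
}
```

In this sketch the latency is the span between the pthread_cancel call and the worker's arrival at pthread_testcancel. A longer stretch of code without a cancellation point would lengthen it.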

When pthread_cancel is called on a thread whose cancellation type is PTHREAD_CANCEL_ASYNCHRONOUS, the specified thread has been cancelled by the time the call returns. However, the call might not return immediately, because a thread is not always in a state that permits cancellation. The library causes the pthread_cancel call to block until the specified thread reaches a cancellation point.

49.1.2 Cancellation Points

A cancellation point is an explicit point at which a thread can be safely suspended or killed. An application that calls into LibC is protected against incorrect behavior by the placement of these cancellation points and by the library's knowledge about each thread and its lock state. For example, when a thread acquires a lock of any sort, it is marked against cancellation. Even though another thread might mark the thread to be killed, that will not happen until the thread releases all the locks that it holds. LibC does not honor cancellation requests when it knows that the situation is unsafe.

Most LibC functions that perform complex operations (or are likely to block or call through into NetWare or other low-level components) check for pending cancellation upon entry and, to reduce latency, again upon exit.
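The rule described above (a cancellation request is honored only at a cancellation point, and only when the thread holds no locks) can be modeled as a small state machine. This is an illustrative model only, not LibC's implementation: the model_thread type and every model_ function are hypothetical names invented for this sketch, and the real library's bookkeeping is not described in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread bookkeeping kept by the library in this model. */
typedef struct {
    int  locks_held;      /* locks the library knows this thread holds   */
    bool cancel_pending;  /* another thread has marked this one          */
    bool cancelled;       /* the library has honored the request         */
} model_thread;

/* Another thread marks this one for cancellation. */
void model_mark_for_cancel(model_thread *t)
{
    t->cancel_pending = true;
}

/* Cancellation point: honors a pending request only if no locks are held.
 * Returns true if the thread is (now) cancelled. */
bool model_cancel_point(model_thread *t)
{
    if (!t->cancelled && t->cancel_pending && t->locks_held == 0)
        t->cancelled = true;
    return t->cancelled;
}

/* Acquiring a lock marks the thread against cancellation. */
void model_lock(model_thread *t)
{
    t->locks_held++;
}

/* Releasing the last lock makes the thread cancellable again. */
void model_unlock(model_thread *t)
{
    assert(t->locks_held > 0);
    t->locks_held--;
}

/* A library function that checks for pending cancellation on entry and,
 * to reduce latency, again on exit. Returns true if the body was reached. */
bool model_library_call(model_thread *t, void (*body)(model_thread *))
{
    if (model_cancel_point(t))  /* entry check */
        return false;
    if (body != NULL)
        body(t);
    model_cancel_point(t);      /* exit check */
    return true;
}
```

In this model, a thread that is marked while it holds a lock passes through model_library_call unharmed. Once it has called model_unlock, the next entry check cancels it.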

Unfortunately, other NLM applications might not follow these LibC practices. If you allow your application threads to migrate between NLM applications (a long-standing practice on NetWare), then resource management becomes your responsibility. For example, your application might call into foreign components under NetWare that can acquire locks or other resources that are not managed by LibC. To prevent your thread from being killed while it holds those resources, you must perform explicit cancellation control by calling one of the following NetWare-specific functions:

Function	Description
nxCancelCheck	Permits the library to cancel the calling thread if it has been marked by another thread for a cancellation operation.
nxCancelDisable	Protects the calling thread from being cancelled until nxCancelEnable is called.
nxCancelEnable	Frees the calling thread to be cancelled.

The following sections illustrate some of the issues involved with thread migration and cancellation points.

49.1.3 Cancellation Points in Functions

The following graphic shows an NLM calling into LibC and illustrates the potential cancellation points.

Figure 49-1 Cancellation Points

In this graphic, each lowercase “x” marks a cancellation point of the function being called. Two functions are shown with checks both on entry and on exit, another with a check only on entry, and the last with a check only on exit. NLM A calls only into LibC.

Because everything that NLM A's threads do is known to LibC, you do not need to take any special cancellation precautions. For example, if a mutex is acquired by a given thread, thread X, that thread is marked against cancellation. Should another thread, thread Y, call pthread_cancel on thread X, the cancel won't actually occur until after thread X has called pthread_mutex_unlock.

49.1.4 Deferring Cancellation During Thread Migration

In the following graphic, NLM A calls into NLM B, and NLM B calls (on A's thread) into libc.nlm.

Figure 49-2 Thread Migration

While in NLM B, NLM A's thread might acquire a mutex or semaphore. Unless the design of NLM A included specific details about NLM B's behavior and its designers knew for certain that B would not acquire a lock, NLM A's thread could be suspended or killed while it holds that lock. Because LibC knows nothing about the lock that B acquired, it does not disable cancellation for NLM A's thread. If NLM B is an important kernel service, both NLM A and NLM B could experience serious problems.

The solution is for NLM A to explicitly mark its thread as not-safe-to-cancel by calling nxCancelDisable, thereby deferring cancellation. When this thread returns from the called function in NLM B to NLM A, NLM A calls nxCancelEnable to restore the thread's ability to be cancelled. If the thread was marked for suspension or death by one of its sibling threads in NLM A while cancellation was disabled, the thread can be cancelled the next time a LibC call offers a cancellation checkpoint. (This deferral capability applies only to LibC because it refuses to stave off suspension or death of a thread that is cancel-deferred.)

NLM A's thread could also explicitly check to see if it was cancelled while in NLM B and allow itself to be cancelled by calling nxCancelCheck (polling).
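The disable, call, enable, check sequence described above can be sketched with the standard POSIX counterparts pthread_setcancelstate and pthread_testcancel, which play roles similar to nxCancelDisable, nxCancelEnable, and nxCancelCheck. This is an analogy under stated assumptions, not NetWare code: foreign_call is a hypothetical stand-in for a function exported by NLM B, foreign_lock stands for a lock that the threading library cannot see, and the exact signatures of the nx functions are not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int foreign_lock;   /* lock the threading library knows nothing about */
static atomic_int inside_foreign; /* the thread has migrated into the foreign code  */
static atomic_int cancel_sent;    /* a sibling thread has requested cancellation    */

/* Hypothetical stand-in for a function exported by NLM B. */
static void foreign_call(void)
{
    atomic_store(&foreign_lock, 1);       /* acquire the foreign lock */
    atomic_store(&inside_foreign, 1);
    while (!atomic_load(&cancel_sent))
        ;                                 /* wait until the cancel is pending */
    pthread_testcancel();                 /* would end the thread here if enabled */
    atomic_store(&foreign_lock, 0);       /* release the foreign lock */
}

static void *migrating_thread(void *arg)
{
    int old_state, ignored;

    (void)arg;
    /* Counterpart of nxCancelDisable: defer cancellation before migrating. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    foreign_call();
    /* Counterpart of nxCancelEnable: restore the previous state on return. */
    pthread_setcancelstate(old_state, &ignored);
    /* Counterpart of nxCancelCheck: honor any request that arrived meanwhile. */
    pthread_testcancel();
    return NULL;                          /* not reached when a cancel is pending */
}

/* Returns 1 if the thread was cancelled only after the foreign lock had
 * been released, 0 otherwise. */
int run_foreign_call_demo(void)
{
    pthread_t thread;
    void *result = NULL;
    int rc;

    atomic_store(&foreign_lock, 0);
    atomic_store(&inside_foreign, 0);
    atomic_store(&cancel_sent, 0);

    rc = pthread_create(&thread, NULL, migrating_thread, NULL);
    assert(rc == 0);
    while (!atomic_load(&inside_foreign))
        sched_yield();

    rc = pthread_cancel(thread);          /* request arrives while the lock is held */
    assert(rc == 0);
    atomic_store(&cancel_sent, 1);

    rc = pthread_join(thread, &result);
    assert(rc == 0);
    (void)rc;

    return result == PTHREAD_CANCELED && atomic_load(&foreign_lock) == 0;
}
```

If the first pthread_setcancelstate call were removed, the thread in this sketch would end inside foreign_call with foreign_lock still set, which is the failure the paragraphs above describe.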

49.1.5 Foreign Thread Cancellation

In the following graphic, a thread originating in NLM A migrates to NLM B, from which it migrates to NLM C and then into LibC.

Figure 49-3 Foreign Thread Cancellation

In this case, if NLM B exports interfaces for use by other NLM applications, any external code (NLM A) that calls into NLM B must know whether B's function acquires locks or migrates the calling thread into NLM C or any other NLM. You cannot assume that NLM B is safe merely because it is a LibC NLM; you must also certify that NLM C is correct. However, the thread of a foreign non-LibC NLM can safely call into NLM C (which only calls into libc.nlm) because libc.nlm cannot perform cancellation on a foreign thread.