Novell Documentation: NetWare 6 - NetWare Integrated Kernel

NetWare Integrated Kernel

The core of the NetWare operating system is the integrated kernel. MPK (multiprocessing kernel) manages threads, schedules processor resources, handles interrupts and exceptions, and manages access to memory and the I/O subsystems.

For explanations and descriptions, see the following:

The Run Queue

Load Balancing

Concurrency and Parallelism

Pre-emption

Platform Support Module

The Run Queue

A thread is a stream of control that can execute its instructions independently. A simpler definition is that a thread is a unit of execution. It is not the code itself.

HINT: For more information on threads, see Bil Lewis and Daniel J. Berg's Threads Primer: A Guide to Multithreaded Programming (© Sun Microsystems, Inc.). Their definition of "thread" has been quoted above.

The kernel maintains a data structure called the run queue which contains threads that are in a state of readiness. In a uniprocessor system, there is only one run queue from which the processor can pick up threads for execution.

In a multiprocessing system where more than one processor is available, there is more than one possible solution. The distribution of threads to multiple processors could be handled by a global run queue (all processors sharing a single run queue), by per-processor run queues (also known as local run queues or distributed run queues), or by some combination of both.

To compare the two approaches:

Global run queue. This approach to distributing threads has the advantage of automatic load balancing. The reason is that no processor remains idle as long as the run queue has threads ready. However, the global solution has the drawback of becoming a bottleneck as the number of processors increases in a system, although under certain scheduling policies (such as a real time scheduling policy), a global queue might be necessary.

Per-processor run queue. This approach has the advantage of being able to exploit cache affinity, where threads are preferentially scheduled on the processor on which they last ran. In addition, this approach does not have the bottleneck problem associated with the global run queue approach.

With local queues, however, it becomes necessary to ensure that the load on the processors (the number of threads in each queue) does not become severely imbalanced. A load balancing mechanism is required to handle load imbalances so that threads do not pile up at one processor while another processor remains idle.

The NetWare kernel uses the per-processor run queue. As implemented, a processor can pick up threads for execution only from its local run queue. This makes the NetWare scheduler highly scalable compared to an implementation using a global run queue. To address load imbalance, NetWare uses a sophisticated load balancing algorithm.
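The per-processor arrangement described above can be illustrated with a minimal sketch. This is not NetWare code; the Processor class, its method names, and the thread names are hypothetical, and load is modeled simply as the number of ready threads in a queue.

```python
from collections import deque


class Processor:
    """One CPU with its own local run queue (illustrative model only)."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.run_queue = deque()  # ready threads, oldest first

    def enqueue(self, thread_name):
        # A thread that becomes ready is appended to this CPU's local queue.
        self.run_queue.append(thread_name)

    def pick_next(self):
        # The processor looks only at its own queue. It returns None (idle)
        # even if other processors still have ready threads.
        return self.run_queue.popleft() if self.run_queue else None

    @property
    def load(self):
        # Load is modeled as the number of ready threads in the queue.
        return len(self.run_queue)


cpus = [Processor(0), Processor(1)]
for name in ("t1", "t2", "t3"):
    cpus[0].enqueue(name)

first_on_cpu0 = cpus[0].pick_next()  # "t1"
first_on_cpu1 = cpus[1].pick_next()  # None: CPU 1 is idle
loads = [cpu.load for cpu in cpus]   # [2, 0]
```

In this sketch, processor 1 sits idle while processor 0 still has two ready threads, which is the imbalance that a load balancing mechanism has to correct.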

Load Balancing

Two important requirements of any load balancing scheme are stability (not overreacting to small load imbalances) and the ability to distribute the processing load quickly.

The NetWare scheduler handles the stability requirement by using a threshold. The threshold determines how much load imbalance is permitted in the system before the load balancing mechanism kicks in.

A low threshold value tolerates less load imbalance than a higher value, so it could lead to excessive thread movement due to frequent load balancing. A higher value tolerates more load imbalance, with the result that the load balancing mechanism is triggered less often. An optimum threshold value prevents excessive thread migration while still addressing load imbalance when needed.

The NetWare scheduler periodically calculates the system-wide load and the mean load and uses the latter to compare loads and to determine whether an individual processor is overloaded or underloaded.

The load balancing threshold and the calculated mean load are then used to determine the high and low trigger loads.

A processor is overloaded when its load exceeds the high trigger load.

A processor is underloaded when its load is below the low trigger load.

In this situation, the scheduler then moves threads from the overloaded processor to the underloaded processor with the result that the loads are balanced. See the following figure for an illustration of the relationship between the mean load, the load balancing threshold, and the high and low trigger loads.

Figure 1. Mean Load, Threshold, and High and Low Trigger Loads

Without the margin provided by the threshold, threads would constantly move from one processor to another, thereby compromising the productivity of the system.
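The relationship between the mean load, the threshold, and the trigger loads can be sketched as follows. The documentation does not give the exact formula, so this example assumes that the high and low trigger loads lie one threshold above and below the mean. The function names and the sample queues are hypothetical.

```python
def trigger_loads(loads, threshold):
    """Return (mean, low_trigger, high_trigger) for per-processor loads.

    Assumption: the triggers sit `threshold` above and below the mean.
    """
    mean = sum(loads) / len(loads)
    return mean, mean - threshold, mean + threshold


def balance_once(queues, threshold):
    """Move threads from overloaded to underloaded queues; return move count."""
    loads = [len(q) for q in queues]
    _, low, high = trigger_loads(loads, threshold)
    moved = 0
    for src in queues:
        for dst in queues:
            # Migrate only while src is above the high trigger
            # and dst is below the low trigger.
            while len(src) > high and len(dst) < low:
                dst.append(src.pop())
                moved += 1
    return moved


# Loads 8, 1, 3 give a mean of 4; with threshold 1 the triggers are 3 and 5.
queues = [["a", "b", "c", "d", "e", "f", "g", "h"], ["i"], ["j", "k", "l"]]
moved = balance_once(queues, threshold=1)
final_loads = [len(q) for q in queues]  # [6, 3, 3] after 2 moves
```

With a threshold of 4 instead, the same starting loads fall inside the margin (triggers 0 and 8) and no threads move, which is the stability behavior the threshold is meant to provide.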

WARNING: Although the threshold is configurable, we strongly recommend that you retain the preset optimum value. If, after careful consideration, you decide to modify the threshold, test the new value on an isolated system before applying it in a production environment. If you modify the threshold, remember that you can always reset it to the optimum value.

You can modify the threshold through NetWare Remote Manager. For details, see Setting the Load Balancing Threshold.

Concurrency and Parallelism

NetWare has always had threads (typically referred to as processes), but until NetWare 6 it has not exploited the potential for parallelism in multithreaded code. Multithreading enables multiple paths of parallel execution through the code. A software developer identifies tasks that can be performed concurrently (tasks that do not depend on being performed in a fixed sequence) and provides the mechanisms for assigning tasks to multiple threads and for appropriate synchronization to protect data shared by the threads.

In a uniprocessor environment, multithreaded code allows threads to run concurrently. This means that one or more threads are active on the same processor. The threads appear to run at the same time, although they do not actually do so. One thread, for example, can be blocking while another thread is executing code. They are perceived as executing simultaneously because processors are very fast and time quanta are very small.

On the other hand, it is the availability of hardware systems with multiple processors that makes it possible to have multiple threads actually running at exactly the same time on different processors. When threads execute simultaneously on multiple processors, they are running in parallel. Multithreaded code allows more efficient processor utilization by exploiting parallelism.
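The developer's side of this (independent tasks assigned to multiple threads, with synchronization protecting shared data) can be illustrated with a generic sketch. It uses Python's standard threading module rather than any NetWare API, and the task and variable names are hypothetical.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Independent task: increment the shared counter n times."""
    global counter
    for _ in range(n):
        # The lock serializes updates so that no increment is lost.
        with counter_lock:
            counter += 1


# Four tasks with no required ordering among them, one thread each.
threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# After all threads are joined, counter is 40000.
```

Whether the four threads run concurrently on one processor or in parallel on several depends on the runtime and the hardware. The code is the same in both cases, which is the point of writing it to be multithreaded.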

With NetWare 6, applications can be written to exploit the parallelism available in multiprocessor (MP) hardware and the support for parallelism in the server operating system. Your system will benefit from the performance gains and scaling that server applications such as GroupWise® provide.

NOTE: A single binary supports both uniprocessor and multiprocessor systems precisely because the NetWare kernel is multithreaded.

Pre-emption

NetWare allows for pre-emption of threads, within constraints. New NetWare modules can be written to be pre-emptible.

NOTE: Earlier versions of NetWare implemented a non-pre-emptible round-robin (first-in, first-out) scheduling policy where threads were scheduled to run in the order that they entered the run queue. On a uniprocessor system, NetWare is fast and very efficient.

For an application to exploit pre-emption, the code must be explicitly written to be pre-emptible. Critical section boundaries might be marked by calling scheduler API functions that signal a critical code section. Critical sectioning is used to keep program data in a consistent state and to prevent code that doesn't lend itself to concurrent execution from executing concurrently. A thread cannot be pre-empted when it is in a critical section.

By default, if an application thread is running, it will not be pre-empted until all of the following conditions are met:

The code where the thread is running must be pre-emptible. This is indicated by a flag set in the module's NLM™ file format. When the code is loaded into memory, the memory pages are flagged as pre-emptible.

The thread cannot be in a critical section of the code.

The thread has run long enough to qualify for pre-emption. The scheduler checks the elapsed time with every tick.
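The three conditions above can be read as a single test that the scheduler applies on each tick. The following is a minimal sketch, not the NetWare scheduler: the ThreadState class, its field names, and the MIN_TICKS_BEFORE_PREEMPT value are hypothetical, since the documentation does not state how long a thread must run before it qualifies.

```python
# Hypothetical quantum; the real value is not given in the documentation.
MIN_TICKS_BEFORE_PREEMPT = 2


class ThreadState:
    """Illustrative per-thread state consulted on each timer tick."""

    def __init__(self, code_is_preemptible, in_critical_section, ticks_run):
        self.code_is_preemptible = code_is_preemptible  # pages flagged at load
        self.in_critical_section = in_critical_section  # critical section active
        self.ticks_run = ticks_run                      # ticks since scheduled


def may_preempt(t):
    """Return True only when all three conditions hold."""
    return (
        t.code_is_preemptible
        and not t.in_critical_section
        and t.ticks_run >= MIN_TICKS_BEFORE_PREEMPT
    )


eligible = may_preempt(ThreadState(True, False, 3))    # True
in_critical = may_preempt(ThreadState(True, True, 3))  # False
legacy_code = may_preempt(ThreadState(False, False, 9))  # False
```

A thread running code that was never flagged as pre-emptible fails the first check no matter how long it runs, which matches the default behavior for modules that were not written to be pre-emptible.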

Support for pre-emption provides:

An execution environment that allows simplified application development. Developers can rely on the scheduler to handle pre-emption.

A mechanism that prevents ill-behaved modules from monopolizing the processor.

The kernel itself is not pre-emptible.

Platform Support Module

Besides NetWare, all that is necessary to enable multiprocessing on a multiprocessor computer is the Platform Support Module (PSM) for your specific hardware platform. No other modules are required.

NOTE: Use the PSMs that load during installation. The PSMs for NetWare 4.11 SMP.NLM do not work for NetWare 5 or later.

The PSM is an interrupt abstraction for multiple processors. As a device driver for the processors, it shields NetWare from hardware-dependent or platform-specific details. It enables secondary processors to be brought online and taken offline.

During installation, NetWare detects multiple processors by reading the MP configuration table in BIOS and then determines which of the available NetWare Platform Support Modules (PSMs) matches the MP hardware platform.

The network administrator then has the option to load the PSM or to run NetWare on Processor 0 only. The installation program will modify the STARTUP.NCF file to load the PSM whenever the server is started.

Novell® provides MPS14.PSM, which supports any hardware platform that complies with the Intel* Multiprocessor Specification 1.1 and 1.4. Compaq* also provides a PSM for its system requirements. Contact Tricord* for information on their PSM for NetWare.

In addition to scalability, NetWare multiprocessing offers these benefits:

Backward compatibility for applications written for NetWare 4.11 SMP.NLM. In fact, NetWare supports everything written to CLIB in previous releases. Older application code is simply funneled to Processor 0.

An integrated multiprocessing kernel (MPK) that also supports uniprocessing. One binary runs on both uniprocessor and multiprocessor hardware platforms.

Kernel support for pre-emption. (Applications must be explicitly enabled by their developers to take advantage of the kernel's pre-emption support.)

Platform support for MPK in a single platform support module.