AppNote: ZENworks 6.5 Server Management: Achieving Proactive Network Health Monitoring

Novell Cool Solutions: AppNote
By Shailaja Yadawad

Posted: 26 Aug 2004
 

Shailaja Yadawad
Engineering Manager

Abstract: This AppNote enables you to comprehend the working principles of Health Reports, a feature of ZENworks 6.5 Server Management. With this understanding you will be able to customize the health report configuration to best suit your requirements and leverage the feature more effectively.

Contents

1.0 Introduction
2.0 Health Report Generation
2.1 Health Report Configuration
2.1.1 Profile Configuration
2.1.2 Report Configuration
2.2 Health Data Gathering and Health Information Calculation
2.2.1 Server List Creation
2.2.2 Health Calculation
2.3 Health Report Writing
2.4 Health Report Next Execution Scheduling
2.5 Health Report Viewing
3.0 Customizing Your Environment Health Report Configuration
3.1 Generating the Health Report for a Specific Set of Servers
3.2 Changing the Publish Directory
3.3 Modifications to the Attribute Property of the Profile
4.0 Troubleshooting the Health Reports
4.1 MMS database
4.2 Trending Agents
4.3 Discontinuity in the Generation of Health Reports
4.4 Issues with Health Report Viewing
5.0 Conclusion
Appendix
Profile-Agent-Service (Table-1)
Health Report Profile Details (Table-2)
1.0 Introduction

ZENworks 6.5 Server Management adds capabilities to an already robust health reporting solution. Using the Health Report feature of Management and Monitoring Services (MMS), the system administrator can gather and view information about the overall health of managed servers or of the network. Health Reports enable you to configure reports for problem-solving, proactive planning, and budgeting as you shape your network. The Health Report data is exported into an HTML file and can be viewed in any browser over either an intranet or the Internet.

New to ZENworks 6.5 Server Management is the ability to monitor the health of Linux servers. Also new is the ability to group servers in custom containers and generate Health Reports at the custom container level. This gives you much greater control to monitor all the servers that you care about in a single report.

2.0 Health Report Generation

Health Report generation consists of three main activities:

  • Health Report Configuration
  • Health Report Generation
  • Health Report Viewing

2.1 Health Report Configuration

Health Report Configuration involves Profile Configuration and Report Configuration.

2.1.1 Profile Configuration
ZENworks Server Management (ZSM) enables users to compute the health of managed servers or the health of the entire network. Health Report configuration is characterized by a profile. Profiles are formed by selecting a set of attributes representing historical information of MIB variables of the corresponding ZSM trending agents (see Appendix, table-1). Attributes in a given profile are limited to those instrumented by the corresponding trending agent. Table-2 in the Appendix gives a list of attributes associated with each profile.

Profile for Server Health -- This type of profile is used to determine server health.
Profiles are provided for each of the operating system platforms supported by ZENworks for Servers:

  • NetWare Server Profile
  • Microsoft Windows Profile
  • Linux Server Profile

Profile for Network Health -- This type of profile is used to determine network health.
Profiles are provided for the following networks:

  • Ethernet Network Profile
  • Token Ring Network Profile
  • FDDI Network Profile

The system administrator can create new profiles based on the standard profiles. In the new profile, the administrator may enable or disable attributes, change weights for specified attributes, and change the publish directory for saving the generated reports.

    2.1.2 Report Configuration
    Report configuration allows the administrator to create, modify, or delete reports. Health Reports are configured by associating a Profile, Schedule, and Scope.

    The administrator can associate the report with any available profile (standard or custom) through the "Profile" option. The report can be configured with a daily, weekly, or monthly schedule.

    Unlike profile and schedule configuration, changing the report scope does not happen explicitly. Report scope is configured when a user selects an entity and invokes the "Properties" option to configure the profile and schedule attributes. Any of the following entities can be configured as report scope:

  • Page
  • Segment
  • System (or custom) Atlas
  • Custom Container

    Any change (addition, deletion, or update) to the report configuration is stored in the ZENworks Management and Monitoring Services (MMS) database.

    2.2 Health Data Gathering and Health Information Calculation

    Health report generation is triggered by the report schedule. The number of servers selected for report generation depends on the configured (or selected) report scope.

    The final step, Update next execution time, indicates the completion of the current iteration of report generation. The health report component updates the execution time for the next iteration.

    2.2.1 Server List Creation
    A list of the servers (or agents) is formed by querying the MMS database using the profile and scope from the report configuration. The scope determines the container from which servers are requested. The request is then filtered by the requested profile type, which corresponds to a service running on the server (see table-1). The result is a list of the servers under the selected container that run the requested service.

    For example, if the Report is configured at the "System Atlas Level" with NetWare Profile, a list of agents is formed by querying all the servers running "ManageWise_NetWareManagementAgentService" in the selected Atlas scope.
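
The filtering described above can be sketched in Python as follows. This is only an illustration: the `Server` record, the in-memory `servers` list, and the `build_server_list` function are hypothetical stand-ins for the MMS database query, which this AppNote does not document. Only the profile-to-service mapping is taken from table-1 in the Appendix.

```python
from dataclasses import dataclass, field

# Profile-to-service mapping, as listed in table-1 of the Appendix.
PROFILE_SERVICE = {
    "NetWare Profile": "ManageWise_NetWareManagementAgentService",
    "Windows Profile": "ManageWise_WindowsNTManagementAgentService",
    "Ethernet Profile": "CIM_EthernetProtocolService",
    "Token Ring Profile": "CIM_TokenRingProtocolService",
    "FDDI Profile": "ManageWise_FDDIService",
    "Linux Profile": "ManageWise_LinuxManagementAgentService",
}


@dataclass
class Server:
    """Hypothetical record standing in for a node stored in the MMS database."""
    name: str
    container: str                       # scope (atlas, segment, ...) holding the node
    services: set = field(default_factory=set)


def build_server_list(servers, scope, profile):
    """Return the names of servers in `scope` that run the profile's service."""
    service = PROFILE_SERVICE[profile]
    return [s.name for s in servers
            if s.container == scope and service in s.services]


# Illustrative data only.
servers = [
    Server("NW1", "System Atlas", {"ManageWise_NetWareManagementAgentService"}),
    Server("WIN1", "System Atlas", {"ManageWise_WindowsNTManagementAgentService"}),
    Server("NW2", "Custom Atlas", {"ManageWise_NetWareManagementAgentService"}),
]
selected = build_server_list(servers, "System Atlas", "NetWare Profile")
print(selected)
```

With this sample data, only the NetWare server inside the requested scope is selected; the Windows server and the NetWare server in the other atlas are filtered out.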

    2.2.2 Health Calculation
    The gathered Health Data is subjected to a predefined formula in order to arrive at comprehensive health information. Health is calculated in the following manner:

    Overall Health Calculation
    For each attribute included in overall health calculation, sample values based on the schedule specified while generating the report are collected, normalized and assigned a weight.

    Each attribute may have a weight, as configured in the respective profile. Each attribute sample is multiplied by the corresponding weight using the following formula (where value is the particular sample after normalization, attributeWeight is the weight associated with the attribute, and TotalWeight is the total weight of all the attributes used in the health calculation):

    Value = value * attributeWeight / TotalWeight
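
As a worked illustration of the weighting formula, the sketch below applies it to a few made-up samples. The function name, the attribute values, and the weights are hypothetical. The final step, which sums the weighted values into one overall figure, is an assumption; the text above does not state how the weighted values are combined.

```python
def weighted_values(samples, weights):
    """Apply Value = value * attributeWeight / TotalWeight to each attribute.

    samples: attribute name -> normalized sample value
    weights: attribute name -> weight configured in the profile
    """
    total_weight = sum(weights[name] for name in samples)
    if total_weight == 0:
        raise ValueError("at least one attribute needs a non-zero weight")
    return {name: value * weights[name] / total_weight
            for name, value in samples.items()}


# Illustrative numbers only; they do not come from a real report.
samples = {"CPU Utilization": 80.0, "Cache Hits": 90.0, "Volume Free Space": 50.0}
weights = {"CPU Utilization": 2, "Cache Hits": 1, "Volume Free Space": 1}

contributions = weighted_values(samples, weights)
# Assumption: the overall figure is the sum of the weighted values.
overall = sum(contributions.values())
print(contributions, overall)
```

Because CPU Utilization carries half of the total weight in this example, it contributes half of its normalized value to the result, while the other two attributes contribute a quarter each.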

    Additional Health Information

    The other values displayed in Health Reports are based on the following calculations:

    Minimum Value = Minimum value of all values in a given sample
    Maximum Value = Maximum value of all values in a given sample
    Average Value = Sum of all Values / number of Samples


    Trend is calculated using the following least-squares Slope formula (where n is the number of samples, x is the time at which each sample was captured, y is the corresponding sample value, and Sum() denotes the sum over all n samples):

    Slope = (n * Sum(x * y) - Sum(x) * Sum(y)) / (n * Sum(x * x) - Sum(x) * Sum(x))

    If Slope > 0, then the trend is increasing
    If Slope < 0, then the trend is decreasing
    If Slope = 0, then the trend is steady

    Intercept = (Sum(y) - Slope * Sum(x)) / n
    Next Week Projection or Next Month Projection = Slope * time + Intercept (where time is the Report Schedule Time)
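
The trend and projection calculations can be sketched in Python. The sketch assumes that the Slope and Intercept formulas are the standard least-squares expressions, with each x and y term summed over all n samples. The function names and the sample data are hypothetical.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept for the samples (xs[i], ys[i])."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("all samples share the same time value")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def trend_direction(slope):
    """Classify the trend by the sign of the slope."""
    if slope > 0:
        return "increasing"
    if slope < 0:
        return "decreasing"
    return "steady"


def projection(slope, intercept, time):
    """Projected value at `time`: Slope * time + Intercept."""
    return slope * time + intercept


# Illustrative samples: one reading per time unit, rising by 2 each step.
xs = [0, 1, 2, 3]
ys = [10.0, 12.0, 14.0, 16.0]
slope, intercept = linear_trend(xs, ys)
print(slope, intercept, trend_direction(slope), projection(slope, intercept, 7))
```

For these samples the fitted line has a slope of 2 and an intercept of 10, so the trend is increasing and the projection at time 7 is 24.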


    The threshold displayed on the health report is the threshold setting of the agent from which the health data is collected.

    % Uptime Calculation

    The %Uptime information is available for NetWare server reports only, because the data required to calculate it is currently instrumented only on NetWare. Agents on Windows and Linux servers do not instrument this data, so %Uptime is shown without any values for those platforms.

    Following is a typical health report. The profile chosen is NetWare with a daily schedule and an "Atlas" scope.


    Figure 1 - Typical Health Report

    2.3 Health Report Writing

    Health Reports write health data in a predefined format to the publish directory. The data is split across several files (e.g., index.txt, agentCache, scope.txt, health.tbl, trendCache.txt, timeticks.txt), and the main health data is stored in CSV format. These files are read-only. Do not modify their contents; otherwise, viewing the health report will fail.

    2.4 Health Report Next Execution Scheduling

    Health Reports update the next execution time of the Report upon receiving the Health Data response from all the servers in the server list. Based on the configured schedule of the Report, the next execution time of the report is calculated and will be placed into the MMS scheduler. The scheduler triggers the report generation at the next execution time.
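
The rescheduling logic can be sketched as follows. This is a minimal illustration, not the MMS scheduler: the function names are hypothetical, and the handling of monthly schedules (same day of the following month, clamped to the last day of shorter months) is an assumption, since the exact rule is not described here.

```python
import calendar
from datetime import datetime, timedelta


def all_responses_received(server_list, responded):
    """True once every server in the list has answered the health request."""
    return set(server_list) <= set(responded)


def next_execution(last_run, schedule):
    """Return the next run time for a 'daily', 'weekly', or 'monthly' schedule."""
    if schedule == "daily":
        return last_run + timedelta(days=1)
    if schedule == "weekly":
        return last_run + timedelta(weeks=1)
    if schedule == "monthly":
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        # Assumption: clamp to the last day of shorter months.
        day = min(last_run.day, calendar.monthrange(year, month)[1])
        return last_run.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown schedule: {schedule!r}")


# The next run is computed only after all servers have responded.
server_list = ["NW1", "NW2"]
responded = ["NW2", "NW1"]
if all_responses_received(server_list, responded):
    upcoming = next_execution(datetime(2004, 8, 26, 2, 0), "daily")
    print(upcoming)
```

Gating the update on `all_responses_received` mirrors the behavior described above: the next execution time is set only after every server in the list has answered.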

    2.5 Health Report Viewing

    The Health Reports are designed to be persistent (i.e., generated reports are available for future viewing at any time). The Health Report data is stored in several different files, and when the user views the report, the health information is built from the existing files.

    Health Report viewing is implemented as an applet to allow viewing through a web browser. The applet runs only with Java plug-in 1.3.1_01, which is downloaded if it is not present on the system.

    The Health Report Applet is invoked by the Administrator through the index.html file placed in the Publish Directory.

    Health Reports have the following components:

    • Left pane tree to allow individual server report selection for viewing. This is built by reading the information present in index.txt, scope.txt, and agentcache.txt.
    • Overall summary for all servers in the report. This is built by reading the information present in index.txt and health.tbl in the publish directory.
    • Summary for an agent's report. This is built by reading the information present in index.txt and health.tbl in the agent_<number> directory.
    • % Uptime information. This is obtained from the timeTicks.txt file.
    • Overall Health Graph and Individual Trend Graph. These are built by reading the information from trendCache.txt and temp_<number>.csv files.
    • Error report (in case of absence of health data). This is built using information from the error.err file.

    When the applet is initialized, it builds the left pane tree by reading the information present in index.txt, scope.txt, and agentcache.txt. After the applet is run, it listens to the tree selection events and updates the right pane as per the selected node on the left pane. The right pane is created by reading the information from index.txt, health.tbl, error.err, timeTicks.txt, trendCache.txt, temp_<number>.csv files.
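
When a report does not display correctly, a read-only check of the publish directory can show which files are absent. The sketch below is a hypothetical helper, not part of the product. The file names are the ones mentioned in this section, their exact spelling and case on disk may differ, and it assumes for simplicity that all of them sit directly in the publish directory. The files under agent_<number> directories and the temp_<number>.csv files are left out.

```python
import tempfile
from pathlib import Path

# Files read for each report component, as named in this section.
COMPONENT_FILES = {
    "left pane tree": ["index.txt", "scope.txt", "agentcache.txt"],
    "overall summary": ["index.txt", "health.tbl"],
    "% uptime": ["timeTicks.txt"],
    "trend graphs": ["trendCache.txt"],
}


def missing_files(publish_dir):
    """Map each component to the files it needs that are absent from publish_dir.

    The function only checks for existence; it never opens or changes a file.
    """
    root = Path(publish_dir)
    report = {}
    for component, names in COMPONENT_FILES.items():
        absent = [name for name in names if not (root / name).is_file()]
        if absent:
            report[component] = absent
    return report


# Demonstration against a throwaway directory, not a real publish directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("index.txt", "scope.txt", "health.tbl"):
        (Path(tmp) / name).write_text("")
    result = missing_files(tmp)
print(result)
```

The helper deliberately avoids touching the report files, in line with the warning that they must not be modified.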

    When the print button is pressed, the printer properties are obtained from the Administrator. As per the page settings, the right pane is built again with the pane width corresponding to selected paper type (e.g. A4 paper). Then the right pane contents are printed page-by-page taking care of paper height and chart height.

    3.0 Customizing Your Environment Health Report Configuration

    3.1 Generating the Health Report for a Specific Set of Servers

    Create a custom map (or atlas). Create a custom container within the custom map.


    Figure 2 - Create Custom Map

    Add the desired servers to the newly created container.


    Figure 3 - Generate Health Report for the desired list of servers

    Generate the health report with the desired profile on the newly created container.

    3.2 Changing the Publish Directory

    The default location for storing all the health report files is:
    <ZSM install directory>:\zenworks\mms\MWServer\bin

    The admin can change the directory to any desired location. Please be mindful of the data storage consumption of generated health reports. Following are the steps to generate reports using a changed publish directory:

    1. Right-click the site server.
    2. Choose the Properties option.
    3. Go to the "Health Profile" tab.
    4. Select a profile of interest, for example, the NetWare profile.
    5. Click Edit. You will see the screen below. Change the publish directory and save the changes.

    Figure 4 - Changing publish directory

    3.3 Modifications to the Attribute Property of the Profile

    The user can modify the list of attributes considered for health calculation by using the "In Health Calculation" check box in the profile tab. Following are the properties of a given attribute and the ways to modify their values:

    Plotting trend graphs
    The user can request trend graphs for any of the attributes in the given profile by enabling the "Show trend graph" check box.

    Changing the List of Attributes for Calculating Health
    Refer to the second column of table-2 in the Appendix. This column lists the attributes that can be added to or removed from the health calculation by enabling or disabling the "In Health Calculation" option. For the attributes that have "In Health Calculation" selected, a weight between 0 and 10000 can be assigned.
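
The attribute properties described in this section can be modeled with a small sketch. The `AttributeSetting` class and its field names are hypothetical and do not reflect how the profile is actually stored. Only the two check boxes and the 0 to 10000 weight range come from the text above.

```python
from dataclasses import dataclass

MIN_WEIGHT, MAX_WEIGHT = 0, 10000  # weight range stated in this section


@dataclass
class AttributeSetting:
    """Hypothetical model of one attribute row in the profile tab."""
    name: str
    in_health_calculation: bool = False
    show_trend_graph: bool = False
    weight: int = 0

    def set_weight(self, weight):
        """Assign a weight; only valid for attributes in the health calculation."""
        if not self.in_health_calculation:
            raise ValueError(f"{self.name}: enable 'In Health Calculation' first")
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(
                f"{self.name}: weight must be between {MIN_WEIGHT} and {MAX_WEIGHT}")
        self.weight = weight


# Illustrative values only.
cpu = AttributeSetting("CPU Utilization", in_health_calculation=True,
                       show_trend_graph=True)
cpu.set_weight(500)
print(cpu)
```

Rejecting out-of-range weights at assignment time keeps an invalid value from reaching the health calculation.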


    Figure 5 - Changing the attribute list in the "Health Calculation"
    4.0 Troubleshooting the Health Reports

    Here are some of the common trouble spots for Health Reports and the workarounds:

    4.1 MMS database

    It is very important to run "Network Discovery", and it is recommended that you configure Health Reports only after your network has been completely discovered. If you cannot see the Health Reports (either data or error report) for the expected list of servers, check whether the MMS database has been populated correctly with your actual network, or whether discovery is still in progress. If you rebuild the topology, the health report configuration is lost and you will have to configure the health reports again.

    Every profile is associated with an attribute of a server (or a node) in the MMS database (see table-1 in the Appendix). Whenever you get an error report instead of a Health Report, check whether the required service has been discovered on that node. Right-click the server (or node), and then:

  • Click the Properties option
  • Click the Computer Attributes tab
  • Look for the service based on the profile (see table-1 in the Appendix) in the Service option in the pane
  • If the service is not found, run discovery to correct the situation

    Figure 6 - Checking for the monitored services for a given server

    4.2 Trending Agents

    The trending agent plays a vital role in the Health Reports feature, because Health Reports ultimately fetch the raw health (or trend) information from it. Health Reports run on the MMS site server and contact the trending agent on each server for which the Health Report is configured. A Health Report may generate errors for several reasons, including:

  • The server may not be running
  • The server may be up, but the trending agent may not be loaded
  • The trending agent may be loaded but not configured to collect the requested health or trend data. This happens when trending has only just started collecting history, or when trending data collection has been disabled.

    4.3 Discontinuity in the Generation of Health Reports

    Sometimes reports are generated for the first few days, but generation stops after a few runs. For example, suppose you request the report at the System Atlas (or map) level, and the System Atlas (the report scope) contains N nodes. For each node, the following sequence of operations should take place.

    Update next execution time - This step indicates the completion of current iteration of report generation. The health report component updates the execution time for the next iteration of report generation.

    As shown in the diagram above, every health request must have a corresponding response, indicating either that data is available or that there is no data. If, for whatever reason, the health report component does not receive a response to a request, it stops further processing for that particular Health Report configuration. This situation indicates a serious issue with network traffic or high server utilization, and is usually very rare.

    4.4 Issues with Health Report Viewing

    If you click index.html and your browser hangs, the most probable cause is that the health report files have been tampered with. It is extremely difficult to recover from this situation. One thing you can do is back up your report files to a new directory and delete all the files from the publish directory. You will not be able to view the old health reports, but you should be able to view reports generated after the cleanup.

    If you were expecting a Health Report but instead were seeing the error report for a given server, here is what you can do to locate the problem area:

    • Find the node of interest by right-clicking the atlas where the server is located and selecting "Find".
    • In the Find search box, enter the name or the IP address of the server.
    • Double-click the node found in the search result window.
    • The node is selected in the right pane.
    • Right-click the selected node and select the "View" option.
    • Click "Trend".


    Figure 7 - Checking for the Trend data by selecting the individual server using "Trend View"

    If you can see the trend graphs but cannot see the Health Report generated for this particular server, this indicates a communication problem between the Health Report components residing on the site server and the trending agent residing on the server of interest.

    5.0 Conclusion

    ZENworks for Servers Health Reports provide information about the overall health of a specified network entity. This component interfaces with many other components to get the raw health data and then uses that data to compute the comprehensive health information. Users can customize the health-gathering configuration to suit their network requirements.

    Appendix

    Each Health Report profile represents a ZENworks management agent. Table-1 shows the mapping between the profile, the agent, and the service.

    Profile-Agent-Service (Table-1)

    Profile             Agent                                Service
    ------------------  -----------------------------------  ------------------------------------------
    NetWare Profile     Server Management Agent for NetWare  ManageWise_NetWareManagementAgentService
    Windows Profile     Server Management Agent for Windows  ManageWise_WindowsNTManagementAgentService
    Ethernet Profile    Traffic Analysis Agent               CIM_EthernetProtocolService
    Token Ring Profile  Traffic Analysis Agent               CIM_TokenRingProtocolService
    FDDI Profile        Traffic Analysis Agent               ManageWise_FDDIService
    Linux Profile       Advanced Trending Agent for Linux    ManageWise_LinuxManagementAgentService

    Health Report Profile Details (Table-2)

    For each profile type, the first list gives the attributes that can be included in the health calculation, and the second list gives the total attribute set available (for all these attributes you can request a trend graph).

    NetWare Server Profile

      Attributes that can be included in the health calculation:
        Logged in users
        Cache Buffers
        Cache Hits
        CPU Utilization
        Volume Free Space
        Free redirection area

      Total attribute set available:
        Logged in users (avg#)
        Connection (avg#)
        File system reads (#/min)
        File system writes (#/min)
        File system reads (KB/min)
        File system writes (KB/min)
        LSL packets received (#/min)
        LSL packets transmitted (#/min)
        NCP requests (#/min)
        CPU utilization (%)
        Cache Buffers (%)
        Code and data memory (%)
        Allocated memory (%)
        Code and data memory (%)
        Allocation memory (%)
        Dirty packet receive buffers (%)
        Packets received on a NIC (#/min)
        Packets transmitted on a NIC (#/min)
        KBytes received on a NIC (KB/min)
        KBytes transmitted on a NIC (KB/min)
        Ready jobs in a queue (avg.KB)
        Wait time in a queue (sec)
        Volume free space (%)
        Cache hits (%)
        Free redirection area (%)
        Running processes (avg.#)
        Dropped packets (avg.#)
        Packet receive buffers (avg.#)

    Microsoft Windows Profile

      Attributes that can be included in the health calculation:
        Cache Hits
        CPU Utilization
        Disk Free Space
        Available Memory
        Logged in users

      Total attribute set available:
        CPU Utilization (%)
        Total Kbytes transmitted and received (KB/min)
        Kbytes transmitted (KB/min)
        Logged in users (avg#)
        Failed logon attempts (avg#)
        Connection (avg#)
        Available memory (%)
        File system reads (#/min)
        File system writes (#/min)
        File system reads (KB/min)
        File system writes (KB/min)
        Cache hits (%)
        Disk transfer time (micro sec)
        Disk Free Space (%)
        Packets received on a NIC (#/min)
        Packets transmitted on a NIC (#/min)
        KBytes received on a NIC (KB/min)
        KBytes transmitted on a NIC (KB/min)
        Ready items in a queue (#)
        Queue length (avg#)
        Ready KBytes in a queue (avg KB)

    Ethernet Network Profile

      Attributes that can be included in the health calculation:
        Total Errors
        Network Utilization
        CRC error packets
        Undersized packets
        Oversized packets
        Fragmented packets
        Jabbers

      Total attribute set available:
        Network Utilization (%)
        Total Bytes (Bytes/sec)
        Total packets (#/sec)
        Good packets (#/sec)
        Total errors (#/sec)
        Broadcast packets (#/sec)
        Multicast packets (#/sec)
        Unicast packets (#/sec)
        CRC error packets (#/sec)
        Undersized packets (#/sec)
        Oversized packets (#/sec)
        Fragmented packets (#/sec)
        Jabbers (#/sec)

    Token Ring Network Profile

      Attributes that can be included in the health calculation:
        Network Utilization
        Total Errors

      Total attribute set available:
        Network Utilization (%)
        Data Bytes (Bytes/s)
        Data packets (#/sec)
        Broadcast packets (#/sec)
        Multicast packets (#/sec)
        Unicast packets (#/sec)
        MAC Bytes (Bytes/sec)
        MAC packets (#/sec)
        Ring purges (#/sec)
        Claim tokens (#/sec)
        Beacons (#/sec)
        Total errors (#/sec)
        Ring poll failures (#/sec)
        Line errors (#/sec)
        Internal errors (#/sec)
        Burst errors (#/sec)
        AC errors (#/sec)
        Abort delimiters errors (#/sec)
        Lost frame errors (#/sec)
        Received congestions errors (#/sec)
        Frequency errors (#/sec)
        Token errors (#/sec)
        Frame copy errors (#/sec)

    FDDI Network Profile

      Attributes that can be included in the health calculation:
        Network Utilization
        CRC error packets
        Undersized packets
        Oversized packets
        Lost frame errors

      Total attribute set available:
        Network utilization (%)
        Total Bytes (Bytes/sec)
        Total Packets (#/sec)
        Echo frame received (#/sec)
        Total errors (#/sec)
        Broadcast packets (#/sec)
        Multicast packets (#/sec)
        Unicast packets (#/sec)
        MAC Bytes (Bytes/sec)
        MAC packets (#/sec)
        SMT Bytes (Bytes/sec)
        SMT packets (#/sec)
        Claim tokens (#/sec)
        Ring Wraps (#/sec)
        Elasticity buffer errors (#/sec)
        CRC error packets (#/sec)
        Undersized packets (#/sec)
        Oversized packets (#/sec)
        Beacons (#/sec)
        Lost frame errors (#/sec)
        Frame not copied (#/sec)

    Linux Server Profile

      Attributes that can be included in the health calculation:
        Processor Utilization
        Logged in Users

      Total attribute set available:
        Processor Utilization (%)
        Disk Reads (#)
        Disk Writes (#)
        Disk Block Reads (#)
        Disk Block Writes (#)
        Logged in Users (#)



    © 2014 Novell