dgtest.job

This job demonstrates downloading a file from the datagrid.

Usage

> zos login --user vmmanager
Please enter current password for 'vmmanager':
Logged into grid as vmmanager

> zos jobinfo --detail dgtest
Jobname/Parameters    Attributes
------------------    ----------
dgtest                 Desc: This job demonstrates downloading from the Datagrid

    multicast        Desc: Whether to download using multicast or unicast
                     Type: Boolean
                   Default: false

    filename         Desc: The filename to download from the Datagrid
                     Type: String
                  Default: None! Value must be specified

Description

Demonstrates usage of the datagrid to download a file stored on the Grid Management Server (GMS) to a node. For additional background information, see Section 3.1, Defining the Datagrid.

Because it typically grows quite large, the physical location of the GMS root directory is important. Use the following procedure to determine the location of the datagrid in the Orchestrator server console:

  1. Select the grid id on the left in the Orchestrator Explorer window.

  2. Click the Constraints/Facts tab.

    The read-only fact (matrix.datagrid.root) points to this location by default:

    /var/opt/novell/zenworks/zos/server
    

    The top level directory name is dataGrid.

    Contents of the GMS can be seen with the Console command:

    > zos dir grid:///
         <DIR>        Dec-6-2007 6:55 installs
         <DIR>        Dec-6-2007 6:55 jobs
         <DIR>        Dec-6-2007 22:01 users
         <DIR>        Dec-6-2007 6:55 vms
         <DIR>        Dec-6-2007 6:56 warehouse
    

Job Files

The files that make up the dgtest job include:

dgtest                                          # Total: 208 lines
|-- dgtest.jdl                                  #  158 lines
`-- dgtest.policy                               #   50 lines

dgtest.jdl

  1  """
  2  Example usage of DataGrid to download a file stored on the GMS Server to a node.
  3  
  4  Setup:
  5     Before running the job, you must:
  6        (1) Create a dgtest resource group using the management console.
  7        (2) Copy a suitable file into the GMS Server datagrid
  8        (3) Modify the dgtest policy with the filename to download
  9                (to not use the default test file).
 10  
 11     For example, use the following matrix command to copy the file 'suse-9-flat.vmdk'
 12     into the deployment area for the job 'dgtest'
 13        >matrix mkdir grid:///images
 14  
 15        >matrix copy suse-9-flat.vmdk grid:///images/
 16  
 17     To verify the file is there:
 18        >matrix dir grid:///images
 19  
 20  
 21  To start job, after the above setup steps are complete:
 22        >matrix run dgtest filename=suse-9-flat.vmdk
 23  
 24  """
 25  import os,sys,time
 26  
 27  #
 28  # Add to the 'examples' group on deployment
 29  #
 30  if __mode__ == "deploy":
 31     try:
 32        jobgroupname = "examples"
 33        jobgroup = getMatrix().getGroup(TYPE_JOB, jobgroupname)
 34        if jobgroup == None:
 35           jobgroup = getMatrix().createGroup(TYPE_JOB, jobgroupname)
 36        jobgroup.addMember(__jobname__)
 37     except:
 38        exc_type, exc_value, exc_traceback = sys.exc_info()
 39        print "Error adding %s to %s group: %s %s" % (__jobname__, jobgroupname, exc_type, exc_value)
 40  
 41  
 42  class test(Job):
 43  
 44     def job_started_event(self):
 45        filename = self.getFact("jobargs.filename")
 46        print "Starting Datagrid Test Job."
 47        print "Filename: %s" % (filename)
 48  
 49        rg = None
 50        try:
 51           rg = getMatrix().getGroup("resource","dgtest")
 52        except:
 53           # no such group
 54           pass
 55  
 56        if rg == None:
 57           self.fail("The resource group 'dgtest' was not found. It is required for this job.")
 58           return
 59  
 60        members = rg.getMembers()
 61        count = 0
 62        for resource in members:
 63           if resource.getFact("resource.online") == True and \
 64              resource.getFact("resource.enabled") == True:
 65              count += 1
 66  
 67        memo = "Scheduling Datagrid Test on %d Joblets" % (count)
 68        self.setFact("jobinstance.memo",memo)
 69        print memo
 70        self.schedule(testnode,count)
 71  
 72  
 73  class testnode(Joblet):
 74  
 75     def joblet_started_event(self):
 76        jobletnum = self.getFact("joblet.number")
 77        print "Running datagrid test joblet #%d" % (jobletnum)
 78        filename = self.getFact("jobargs.filename")
 79        multicast = self.getFact("jobargs.multicast")
 80  
 81        # Test download a file from server job directory
 82        dg_url = "grid:///images/" + filename
 83  
 84        # Create an instance of the JDL DataGrid object
 85        # This object is used to manage DataGrid operations
 86        dg = DataGrid()
 87  
 88        # Set to always force a download.
 89        dg.setCache(False)
 90  
 91        # Set whether to use multicast or unicast
 92        # If set to True, then the following  4 multicast
 93        # options are applicable
 94        dg.setMulticast(multicast)
 95  
 96        # how long to wait for a quorum (milliseconds)
 97        #dg.setMulticastWait( 10000 )
 98  
 99        # Number of receivers that constitute a quorum
100        #dg.setMulticastQuorum(4)
101  
102        # Requested data rate in bytes per second. 0 means use default
103        #dg.setMulticastRate(0)
104  
105        # Min number of receivers
106        #dg.setMulticastMin(1)
107  
108        if multicast:
109           mode = "multicast"
110        else:
111           mode = "unicast"
112  
113        memo = "Starting %s download of file: %s" % (mode,dg_url)
114        self.setFact("joblet.memo",memo)
115        print memo
116  
117        # Destination defaults to Node's Joblet dir.
118        # Change this path to go to any other local filesystem.
119        # e.g. to store in /tmp:
120        #    dest = "/tmp/" + filename
121        dest = filename
122        try:
123           dg.copy(dg_url,dest)
124        except:
125           exc_type, exc_value, exc_traceback = sys.exc_info()
126           retryUnicast = False
127           if multicast == True:
128              # If node's OS, NIC does not fully support multicast,
129              # then the node will timeout waiting for broadcasts.
130              # Note the error and fallback to unicast
131              if exc_type != None and len(str(exc_type)) > 0:
132                 msg = str(exc_type)
133                 index = msg.find("Multicast receive timed out")
134                 retryUnicast = index != -1
135  
136           if retryUnicast:
137              memo = "Multicast timeout. Fallback to unicast"
138              self.setFact("joblet.memo",memo)
139              print memo
140              dg.setMulticast(False)
141              dg.copy(dg_url,dest)
142           else:
143              raise exc_type,exc_value
144  
145        if os.path.exists(dest):
146           print dg_url + " downloaded successfully."
147  
148           # Show directory listing of downloaded file to job log
149           if self.getFact("resource.os.family") == "windows":
150              cmd = "dir %s" % (dest)
151           else:
152              cmd = "ls -lsart %s" % (dest)
153  
154           system(cmd)
155        else:
156           raise RuntimeError, "Datagrid copy() failed"
157  
158        print "Datagrid test completed"

dgtest.policy

  1  <policy>
  2  
  3     <jobargs>
  4  
  5        <!--
  6           Name of file that is stored in the Datagrid area to
  7           download to the resource.
  8  
  9           A value for this fact is assigned when
 10           using the 'zos run' command.
 11        -->
 12        <fact name="filename"
 13              type="String"
 14              description="The filename to download from the Datagrid"
 15              />
 16  
 17        <fact name="multicast"
 18              type="Boolean"
 19              description="Whether to download using multicast or unicast"
 20              value="false" />
 21  
 22     </jobargs>
 23  
 24     <job>
 25        <fact name="description"
 26              type="String"
 27              value="This job demonstrates downloading from the Datagrid" />
 28  
 29        <!-- limit to one per host -->
 30        <fact name="joblet.maxperresource"
 31              type="Integer"
 32              value="1" />
 33     </job>
 34  
 35  
 36     <!--
 37        This job will only run on resources in the "dgtest" resource group.
 38  
 39        You must create a Resource Group named 'dgtest' using the management
 40        console and populate the new group with resources that you wish to have
 41        participate in the datagrid test.
 42     -->
 43     <constraint type="resource" reason="No resources are in the dgtest group" >
 44  
 45        <contains fact="resource.groups" value="dgtest"
 46           reason="Resource is not in the dgtest group" />
 47  
 48     </constraint>
 49  
 50  </policy>

Classes and Methods

Definitions:

Job

A representation of a running job instance.

Joblet

Defines execution on the resource.

MatrixInfo

A representation of the matrix grid object, which provides operations for retrieving and creating grid objects in the system. MatrixInfo is retrieved using the built-in getMatrix() function. Write capability is dependent on the context in which getMatrix() is called. For example, in a joblet process on a resource, creating new grid objects is not supported.

GroupInfo

A representation of Group grid objects. Operations include retrieving the group member lists and adding/removing from the group member lists, and retrieving and setting facts on the group.

test

Class test (line 42 in dgtest.jdl) is derived from the Job class.

testnode

Class testnode (line 73 in dgtest.jdl) is derived from the Joblet class.

Job Details

dgtest.job can be broken down into the following parts:

Policy

In the dgtest.policy file, the <jobargs/> section (lines 3-22) describes the filename and multicast jobargs and the default setting for multicast, and the <job/> section (lines 24-33) describes static facts (Section 5.1.2, Facts). You must assign the filename argument when executing this example. The value is only the name of the file in the “images” area of the GMS. For example, for grid:///images/disk.img, assign disk.img to the argument. The file must already be in the GMS file system so that it can be fetched and delivered to the remote nodes used in this example.
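The relationship between the filename jobarg and the datagrid URL can be sketched in plain Python. This is an illustration that runs outside Orchestrator, not JDL: the helper names grid_url_for and jobarg_for are hypothetical, and only the "grid:///images/" prefix is taken from line 82 of dgtest.jdl.

```python
IMAGES_PREFIX = "grid:///images/"


def grid_url_for(filename):
    """Build the datagrid URL the joblet downloads, as on line 82 of dgtest.jdl."""
    return IMAGES_PREFIX + filename


def jobarg_for(url):
    """Return the value to pass as the filename jobarg for a URL in the images area."""
    if not url.startswith(IMAGES_PREFIX):
        raise ValueError("not in the images area: %s" % url)
    return url[len(IMAGES_PREFIX):]


print(jobarg_for("grid:///images/disk.img"))
print(grid_url_for("disk.img"))
```

Because the joblet always prepends the images prefix, passing a full grid URL as the jobarg would produce a doubled path, which is why only the bare file name is assigned.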

To populate the GMS, use the zos copy command. For example, for a file named suse-9-flat.vmd in the current directory, use the following commands:

> zos mkdir grid:///images
> zos copy suse-9-flat.vmd grid:///images/

The multicast jobarg is a Boolean, defaulted to false so that unicast is used for transport. Set this value to true to use multicast transport for delivery of the file.

The policy also describes a resource.groups constraint (lines 36-48 in dgtest.policy). (For more information, see Constraints). The constraint requires a resource group named dgtest, and that group should have member nodes. Consequently, you must create this resource group using the Orchestrator server console and assign it some nodes to run this example successfully.
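The effect of the constraint can be pictured with a small plain-Python sketch. It assumes that the contains element tests whether the list-valued fact includes the given value; the function name satisfies_contains, the resource named thaw, and the group named all are hypothetical and appear only for illustration.

```python
def satisfies_contains(facts, fact_name, value):
    """Assumed semantics of <contains>: the named list fact includes value."""
    return value in facts.get(fact_name, [])


# Hypothetical fact sets; melt and freeze are the node names seen in the job log.
resources = {
    "melt": {"resource.groups": ["all", "dgtest"]},
    "freeze": {"resource.groups": ["all", "dgtest"]},
    "thaw": {"resource.groups": ["all"]},
}

eligible = sorted(
    name for name, facts in resources.items()
    if satisfies_contains(facts, "resource.groups", "dgtest")
)
print(eligible)
```

Under that assumption, only the resources that belong to the dgtest group remain candidates for the joblets.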

zosadmin deploy

When the Orchestrator server deploys a job for the first time (see Section 7.5, Deploying Jobs), the job JDL files are executed in a special deploy mode. Looking at dgtest.jdl, you might notice that when the job is deployed (line 30), either via the Orchestrator console or the zosadmin deploy command, it attempts to find the examples jobgroup (lines 32-33), creates it if it is missing (lines 34-35), and adds the dgtest job to the group (line 36).

If this step fails for some reason, the exception is caught (line 37) and a message is printed (lines 38-39) that contains the job name, group name, exception type, and value.
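The find-or-create pattern used at deployment can be tried outside Orchestrator with a minimal sketch. FakeMatrix and FakeGroup are hypothetical in-memory stand-ins for the objects returned by getMatrix() and getGroup(); the string "job" is a placeholder for the TYPE_JOB constant, and returning None for a missing group is an assumption based on the check on line 34.

```python
class FakeGroup:
    """Hypothetical stand-in for a group grid object."""

    def __init__(self):
        self.members = []

    def addMember(self, name):
        # Adding the same member twice leaves a single entry in this sketch.
        if name not in self.members:
            self.members.append(name)


class FakeMatrix:
    """Hypothetical stand-in for the object returned by getMatrix()."""

    def __init__(self):
        self.groups = {}

    def getGroup(self, group_type, name):
        # Assumed behavior: None when the group does not exist.
        return self.groups.get((group_type, name))

    def createGroup(self, group_type, name):
        group = FakeGroup()
        self.groups[(group_type, name)] = group
        return group


def add_job_to_group(matrix, jobname, groupname="examples"):
    """Find the job group, create it if missing, then add the job to it."""
    group = matrix.getGroup("job", groupname)
    if group is None:
        group = matrix.createGroup("job", groupname)
    group.addMember(jobname)
    return group


matrix = FakeMatrix()
add_job_to_group(matrix, "dgtest")
add_job_to_group(matrix, "dgtest")  # a redeploy reuses the existing group
print(matrix.getGroup("job", "examples").members)
```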

job_started_event

In dgtest.jdl, the test class (line 42) defines only the required job_started_event (line 44) method. This method runs on the Orchestrator server when the job is run to launch the joblets.

When job_started_event is executed, it gets the name of the file assigned to the jobargs.filename fact and prints useful tracing information (lines 45-47). It then tries to find the resource group named dgtest. If the resource group doesn’t exist, the job calls fail() with a message that informs the user and returns without scheduling any joblets (lines 49-58).

After finding the dgtest group, the job gets the member list and counts how many nodes are both online and enabled (lines 60-65). After setting the memo line in the Console (lines 67-69), the job schedules count testnode joblets (line 70).
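The counting step in lines 60-65 can be exercised outside Orchestrator with a minimal sketch. It is plain Python rather than JDL: FakeResource and count_schedulable are hypothetical names, and only the getFact() call shape and the two fact names are taken from the listing.

```python
class FakeResource:
    """Hypothetical stand-in for a resource grid object; not part of the JDL API."""

    def __init__(self, facts):
        self._facts = facts

    def getFact(self, name):
        # Mirrors the getFact() call shape used in dgtest.jdl.
        return self._facts[name]


def count_schedulable(members):
    """Count members that are both online and enabled (dgtest.jdl lines 60-65)."""
    count = 0
    for resource in members:
        if resource.getFact("resource.online") and resource.getFact("resource.enabled"):
            count += 1
    return count


members = [
    FakeResource({"resource.online": True, "resource.enabled": True}),
    FakeResource({"resource.online": True, "resource.enabled": False}),
    FakeResource({"resource.online": False, "resource.enabled": True}),
]
print("Scheduling Datagrid Test on %d Joblets" % count_schedulable(members))
```

A group that exists but has no online, enabled members yields a count of zero, which matches the "Scheduling Datagrid Test on 0 Joblets" log line shown in the Configure and Run section.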

joblet_started_event

In dgtest.jdl, the testnode class (line 73) defines only the required joblet_started_event (line 75) method. This method runs on the Orchestrator agent nodes when scheduled by a Job class.

The joblet_started_event prints some trace information (lines 76-77), gets the name of the file to transfer (line 78) and the mode of transfer (line 79), and creates the grid URL for the file (line 82).

A DataGrid is instantiated (line 86), set not to cache (line 89), and set to use the multicast jobarg (line 94). The next four settings, which control multicast behavior, are commented out (lines 97, 100, 103, and 106). See setMulticastWait, setMulticastQuorum, setMulticastRate, and setMulticastMin.

The joblet prints a memo line for the Orchestrator console (lines 108-115), sets the location for the file on the local node (line 121), and tries to transfer the file from the datagrid (line 123).

If the datagrid copy at line 123 fails for some reason, the exception handler (lines 124-143) provides a retry mechanism. The information about why the exception occurred is fetched first (line 125).

The variable retryUnicast (line 126) is set to False and is set to True only if the failed download attempt used multicast transport and the exception type contains the string "Multicast receive timed out" (lines 127-134). If the timed-out string is not found, find() returns -1, so the comparison on line 134 leaves retryUnicast set to False. With this logic, a unicast attempt is made only when multicast fails because of a timeout.

If you get to line 136 from a multicast copy that timed out, a memo for the Orchestrator console is set and printed to the log (lines 137-139), multicast is turned off with setMulticast(False) (line 140), and another copy from the datagrid is attempted (line 141).

If you get to line 136 from a failed unicast copy, or from a multicast copy that failed for any other reason, the exception is raised again (line 143) and the joblet fails.
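The fallback decision in lines 126-143 can be sketched in plain Python outside Orchestrator. The names should_retry_unicast, copy_with_fallback, and flaky_copy are hypothetical, and the copy argument is a stand-in callable rather than the JDL DataGrid object. One deliberate difference: this sketch searches the exception message, whereas the listing searches str(exc_type).

```python
TIMEOUT_MARKER = "Multicast receive timed out"


def should_retry_unicast(multicast, error_text):
    """Return True only for a multicast attempt whose error mentions the timeout."""
    if not multicast:
        return False
    # str.find() returns -1 when the marker is absent, so the result is False then.
    return error_text.find(TIMEOUT_MARKER) != -1


def copy_with_fallback(copy, url, dest, multicast):
    """Try copy(); on a multicast timeout retry once with unicast, else re-raise.

    'copy' is a hypothetical callable copy(url, dest, multicast).
    Returns the transport mode that succeeded.
    """
    try:
        copy(url, dest, multicast)
        return "multicast" if multicast else "unicast"
    except Exception as exc:
        if should_retry_unicast(multicast, str(exc)):
            copy(url, dest, False)
            return "unicast"
        raise


def flaky_copy(url, dest, multicast):
    # Simulated transport: multicast always times out, unicast succeeds.
    if multicast:
        raise RuntimeError("Multicast receive timed out")


print(copy_with_fallback(flaky_copy, "grid:///images/disk.img", "disk.img", True))
```

Matching on message text is fragile, because a change in the error wording silently disables the fallback. The sketch keeps that approach only because it mirrors the listing.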

Configure and Run

> zos run dgtest filename=suse-9-flat.vmd
JobID: vmmanager.dgtest.323

Looks like it ran successfully; let’s see what the log says:

> zos log vmmanager.dgtest.323
Starting Datagrid Test Job.
Filename: suse-9-flat.vmd
Job 'vmmanager.dgtest.323' terminated because of failure. Reason: The resource group 'dgtest' was not found. It is required for this job.

There is no resource group. Using the Orchestrator console, create the resource group dgtest and run the job again:

> zos run dgtest filename=suse-9-flat.vmd
JobID: vmmanager.dgtest.324

> zos log vmmanager.dgtest.324
Starting Datagrid Test Job.
Filename: suse-9-flat.vmd
Scheduling Datagrid Test on 0 Joblets

NOTE: No joblets were scheduled because there are no active nodes in the group.

Using the Orchestrator console, populate the dgtest group with nodes that are both online and enabled, and run the job again:

> zos run dgtest filename=suse-9-flat.vmd
JobID: vmmanager.dgtest.325

> zos log vmmanager.dgtest.325
Starting Datagrid Test Job.
Filename: suse-9-flat.vmd
Scheduling Datagrid Test on 2 Joblets
[freeze] Running datagrid test joblet #0
[freeze] Starting unicast download of file: grid:///images/suse-9-flat.vmd
[freeze] Traceback (innermost last):
[freeze]   File "dgtest.jdl", line 143, in joblet_started_event
[freeze] copy() failed: DataGrid file "/images/suse-9-flat.vmd" does not exist.
Job 'vmmanager.dgtest.325' terminated because of failure. Reason: Job failed because of too many joblet failures (job.joblet.maxfailures = 0)
[melt] Running datagrid test joblet #1
[melt] Starting unicast download of file: grid:///images/suse-9-flat.vmd
[melt] Traceback (innermost last):
[melt]   File "dgtest.jdl", line 143, in joblet_started_event
[melt] copy() failed: DataGrid file "/images/suse-9-flat.vmd" does not exist.

Because the path and the file in the DataGrid are missing, we need to create and populate them:

> zos mkdir grid:///images
Directory created.

> zos copy suse-9-flat.vmd grid:///images/
suse-9-flat.vmd copied.

> zos run dgtest filename=suse-9-flat.vmd
JobID: vmmanager.dgtest.326

> zos log vmmanager.dgtest.326
Starting Datagrid Test Job.
Filename: suse-9-flat.vmd
Scheduling Datagrid Test on 2 Joblets
[melt] Running datagrid test joblet #1
[melt] Starting unicast download of file: grid:///images/suse-9-flat.vmd
[melt] grid:///images/suse-9-flat.vmd downloaded successfully.
[melt] 16732 -rw-r--r-- 1 root root 17108462 Dec 21 21:32 suse-9-flat.vmd
[melt] Datagrid test completed
[freeze] Running datagrid test joblet #0
[freeze] Starting unicast download of file: grid:///images/suse-9-flat.vmd
[freeze] grid:///images/suse-9-flat.vmd downloaded successfully.
[freeze] 16732 -rw-r--r-- 1 root root 17108462 Dec 21 21:31 suse-9-flat.vmd
[freeze] Datagrid test completed

Finally, the file is deployed from the dataGrid and copied successfully. However, you will not find it if you look for it on the agent after the joblet is finished. By default, the file is deployed only for the joblet’s lifetime into a directory for the joblet, like the following:

/var/opt/novell/zenworks/zos/agent/node.default/melt/vmmanager.dgtest.326.0

So, for a more permanent demonstration, see lines 118-120 in dgtest.jdl. Uncomment line 120 and comment out line 121 to store your file in the /tmp directory and have it continue to exist on the agent after the joblet executes completely.
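The choice between the two destinations can be expressed as a small helper, shown here as a plain-Python sketch. The function choose_dest and its keep_dir parameter are hypothetical and are not part of dgtest.jdl, which simply concatenates "/tmp/" and the file name.

```python
import os


def choose_dest(filename, keep_dir=None):
    """Return the copy destination for a downloaded file.

    With keep_dir unset, the bare file name is returned, which the listing
    treats as relative to the joblet directory. With keep_dir set, the file
    is placed under that directory instead.
    """
    if keep_dir is None:
        return filename
    return os.path.join(keep_dir, filename)


print(choose_dest("suse-9-flat.vmd"))
print(choose_dest("suse-9-flat.vmd", "/tmp"))
```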