Script Metrics

In addition to metrics based on regular expressions, Analyzer now supports metrics based on scripts. Scripts allow much more powerful tests, which can depend not only on the value being tested but also on other attributes in the record, including attributes with multiple values.

For example, a script could enforce a telephone number format based on the country attribute. Another script might test for a numeric value within a prescribed range, flagging values outside the range.

This version of Analyzer supports the ECMAScript (the standard on which JavaScript is based), Python, and Ruby scripting languages.

The same script may be used both to test and to clean data, depending on whether it is applied as a test script or as a cleaning script. See the documentation under Data Cleaning for more information.
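
As a rough illustration of one script serving both purposes, the sketch below normalizes whitespace. The variables "aimValue", "aimReturnStatus" and "aimReturnValue" are described under Accessing Attributes within a Script. The wrapper function "runWhitespaceScript" is hypothetical and only stands in for Analyzer so the sample can run on its own. Treating an already-clean value as a pass is an assumption, not documented behavior.

```javascript
// Hypothetical stand-in for Analyzer: in the product, aimValue is supplied to
// the script and the aimReturn* variables are read back after it finishes.
function runWhitespaceScript(aimValue) {
  var aimReturnStatus = true;
  var aimReturnValue = aimValue;   // default: leave the value unmodified

  if (aimValue != null) {
    var original = String(aimValue);
    // Cleaned form: single spaces inside, no leading or trailing whitespace.
    var cleaned = original.replace(/\s+/g, " ").trim();

    aimReturnValue = cleaned;                   // relevant when cleaning
    aimReturnStatus = (cleaned === original);   // relevant when testing (assumed rule)
  }

  return { status: aimReturnStatus, value: aimReturnValue };
}
```

Only the body of the function would go into the metric itself.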

Creating a Script Metric

A Script Metric is created, edited, copied, or deleted in the same way as a RegEx metric. Select the "Metric Type: Script" radio button at the bottom of the dialog, then choose a scripting language and enter the script.

Creating a new Script Metric


Copying an Existing Script

An easy way to create a new script is to copy an existing script, changing the Name, Description, Syntax and Format fields as needed, then editing the script itself. A predefined script metric named "Data Range" may be used for this purpose. One advantage of this approach is that the predefined script contains comments describing how to access attributes within the script and how to return Pass/Fail status or cleaned values.

Accessing Attributes within a Script

The value to be tested is passed to the script as the script variable "aimValue". The script should set the boolean variable "aimReturnStatus" to true or false before returning. True indicates that the value passed the test, and false indicates failure. For example, a very simple script to test that values don't exceed 64 characters might look like this:

   aimReturnStatus = true;
   aimReturnValue = aimValue;
   
   if (aimValue != null && aimValue.length > 64)
      aimReturnStatus = false;

Note that for an empty cell (i.e., an unpopulated attribute value), "aimValue" contains null, so the script must handle this case.
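
The numeric range test mentioned earlier can be sketched in the same style. The limits 0 and 100, the choice to let empty cells pass, and the wrapper function "runRangeScript" are all assumptions made for illustration; the wrapper only stands in for Analyzer so the sample can run on its own. This is not the predefined "Data Range" metric.

```javascript
// Hypothetical stand-in for Analyzer, which normally supplies aimValue and
// reads aimReturnStatus / aimReturnValue after the script finishes.
function runRangeScript(aimValue) {
  var MIN = 0;     // assumed lower limit
  var MAX = 100;   // assumed upper limit

  var aimReturnStatus = true;
  var aimReturnValue = aimValue;   // leave the value unmodified

  // An empty cell arrives as null; here it is allowed to pass (assumption).
  if (aimValue != null) {
    var text = String(aimValue).trim();
    var number = Number(text);

    // Fail on blank text, non-numeric text, or a number outside the range.
    if (text === "" || isNaN(number) || number < MIN || number > MAX)
      aimReturnStatus = false;
  }

  return { status: aimReturnStatus, value: aimReturnValue };
}
```

Only the body of the function would go into the metric itself.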

If the script needs to access other attributes in the record in order to test the value in question, it may use these script variables:

   aimValue             - the current value being tested
   aimArrayValue        - array of all values in the current cell
   aim_<attrName>       - the value of each attribute in the dataset
   aimArray_<attrName>  - array of values for each attribute

   aimReturnStatus      - boolean variable.  Set to true if value passed test, false if failed.
   aimReturnValue       - Used for data cleaning. Set to the original value to leave it unmodified.

For example, if there is a "Country" attribute in the dataset, it may be accessed in the script as the variable "aim_Country". If the attribute is empty, "aim_Country" contains null. If the attribute has multiple values, "aim_Country" contains the first value.
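
A telephone number test that depends on the country could be sketched as follows. The country values "US" and "FR", both patterns, and the decision to pass values whose country is empty or unrecognized are illustrative assumptions. The wrapper function "runPhoneScript" is hypothetical and only stands in for Analyzer so the sample can run on its own.

```javascript
// Hypothetical stand-in for Analyzer, which normally supplies aimValue and
// aim_Country and reads the aimReturn* variables after the script finishes.
function runPhoneScript(aimValue, aim_Country) {
  // Assumed formats, keyed by assumed values of the Country attribute.
  var formats = {
    "US": /^\d{3}-\d{3}-\d{4}$/,     // e.g. 555-123-4567
    "FR": /^0\d( \d{2}){4}$/         // e.g. 01 23 45 67 89
  };

  var aimReturnStatus = true;
  var aimReturnValue = aimValue;     // leave the value unmodified

  // Either variable is null when its cell is empty.
  if (aimValue != null && aim_Country != null) {
    var country = String(aim_Country);
    // Only test countries with a known format; others pass (assumption).
    if (Object.prototype.hasOwnProperty.call(formats, country))
      aimReturnStatus = formats[country].test(String(aimValue));
  }

  return { status: aimReturnStatus, value: aimReturnValue };
}
```

Only the body of the function would go into the metric itself.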

If the script needs to look at all values of a multi-valued attribute, it can use "aimArrayValue" for an array of the values in the current cell, or "aimArray_<attrName>" for an array of all the values of any attribute in the dataset. If a cell is empty, the corresponding array variable is a zero-length array.

In this example, the metric reports a failure for every value in a cell that holds more than one value.

   aimReturnValue = aimValue;
   // Pass only when the current cell holds at most one value.
   aimReturnStatus = aimArrayValue.length <= 1;
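
A script can also loop over the array. The sketch below fails any value that occurs more than once in its own cell. It assumes the array elements compare as ordinary JavaScript strings, and the wrapper function "runDuplicateScript" is hypothetical, standing in for Analyzer so the sample can run on its own.

```javascript
// Hypothetical stand-in for Analyzer, which normally supplies aimValue and
// aimArrayValue and reads the aimReturn* variables after the script finishes.
function runDuplicateScript(aimValue, aimArrayValue) {
  var aimReturnStatus = true;
  var aimReturnValue = aimValue;   // leave the value unmodified

  // Count how often the current value appears in the current cell.
  // For an empty cell the array has length 0, so the loop does nothing.
  var occurrences = 0;
  for (var i = 0; i < aimArrayValue.length; i++) {
    if (aimArrayValue[i] === aimValue)
      occurrences++;
  }

  if (aimValue != null && occurrences > 1)
    aimReturnStatus = false;

  return { status: aimReturnStatus, value: aimReturnValue };
}
```

Only the body of the function would go into the metric itself.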

Alternate Method to Edit a Script

The script may be modified in the small multi-line text box of the "User Defined Metric" dialog. Alternatively, use Eclipse's built-in JavaScript editor, which provides a full-screen, color-coded editor. Open the Navigator view from the menu (Window, Show View, Navigator). In the desired project, browse to Analyzer, Toolbox, Metrics, and press F5 to refresh. Right-click the .js file and open it with the JavaScript Editor. After saving the file, switch to a Data Browser tab and reapply the modified script.

Editing a Script


Applying a Script Metric

A script metric is applied to a column in the same way as a RegEx or internal metric. Right-click a cell in the desired column, select "Apply Metric to Column", then select the desired metric, whether it is a script, RegEx, or internal metric. Script metrics may also be used in the Analysis section, as with other metrics.

Applying a Script