9.3 Customizing the File Types Filter

The File Types filter considers the MIME types and perceived types defined in the Windows Registry on the Dynamic File Services server, the File Types configuration file (DswFileTypes.cfg), and the MIME Types configuration file (DswMimeTypes.cfg). This section explains how these sources are used and how to customize the configuration files to improve the effectiveness of the File Types filter.

9.3.1 Viewing MIME Types and Perceived Types for Installed Applications in the Windows Registry

The Windows Registry contains information about each installed application’s file extensions and the related perceived type or MIME type. The types vary, depending on what applications are installed on a server. Before you use the File Types filter, ensure that you understand what file types are in use on the server and what file extensions are associated with them in the server’s Windows Registry.

An application maps file extensions to content types by adding an entry in the server’s Windows Registry under the HKEY_CLASSES_ROOT\<file_extension> key. For example:

HKEY_CLASSES_ROOT\.gif
Content Type = "image/gif"

Content types are also listed in the Windows Registry under the HKEY_CLASSES_ROOT\MIME\Database\Content Type key, with one subkey for each <type>/<subtype>.

An application can also specify a PerceivedType value for the file extensions that it uses.
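To check what a particular server has registered for an extension, the values can be read programmatically. The following is a minimal Python sketch that uses the standard winreg module. The function name registry_types is illustrative only, and the sketch assumes the values are stored under the names Content Type and PerceivedType. On a system other than Windows, the function returns an empty dictionary.

```python
import sys


def registry_types(extension):
    """Return the Content Type and PerceivedType values registered for an extension."""
    found = {}
    if sys.platform != "win32":
        return found  # the Windows Registry is not available on this platform

    import winreg  # standard library module, present only on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, extension) as key:
            for name in ("Content Type", "PerceivedType"):
                try:
                    value, _value_type = winreg.QueryValueEx(key, name)
                    found[name] = value
                except FileNotFoundError:
                    pass  # this value is not defined for the extension
    except FileNotFoundError:
        pass  # the extension has no key under HKEY_CLASSES_ROOT
    return found


print(registry_types(".gif"))
```

The result depends on the applications installed on the server, so run the check on the Dynamic File Services server itself rather than on a workstation.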

Common file types and a list of the file extensions that are typically associated with them are provided in Table 9-4. The list is not intended to be exhaustive.

Table 9-4 Common MIME Types and Their Associated File Extensions

MIME Type                     Sample of the Associated File Extensions
application                   .accdb, .ai, .ani, .csv, .doc, .docx, .gz, .odp, .odt, .pdf, .pps, .ppt, .pptx, .xls, .zip
audio                         .aiff, .mid, .mmv, .mp3, .wav, .wma
compressed (perceived type)   .arc, .cab, .zip
image                         .bmp, .gif, .jpeg, .jpg, .png, .tiff
message                       .mht, .mhtml, .nws
model                         .iges, .mesh, .vrml
system                        .386, .chk
text                          .css, .htm, .html, .rtf, .sgm, .sgml, .txt, .xml, .xms
video                         .avi, .mp3, .mp4, .mpeg, .qt, .wmv

9.3.2 Configuring File Extensions and Categories for the File Types Filter

Perceived file types serve a function similar to that of the standard MIME types, except that they refer to broad categories of file formats rather than to specific file types. They are defined by use and general acceptance.

Dynamic File Services uses the C:\Program Files\Dynamic File Services\DswFileTypes.cfg file to associate well-known file extensions with Microsoft perceived types.

Each line in the DswFileTypes.cfg file contains a file extension and its perceived type or category in the following format:

.<file_extension>/<file_type_category>

For example:

.doc/document
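The entry format can be illustrated with a short Python sketch that reads entries into a dictionary. The function name parse_file_types is hypothetical. The sketch also assumes that blank lines can be skipped and that extensions are compared without regard to case. Neither behavior is confirmed for the product itself.

```python
def parse_file_types(lines):
    """Map each file extension (with its leading dot) to its category."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        extension, separator, category = line.partition("/")
        if not separator or not extension.startswith(".") or not category:
            raise ValueError(
                f"not in .<file_extension>/<file_type_category> form: {line!r}"
            )
        mapping[extension.lower()] = category  # assumption: case-insensitive extensions
    return mapping


sample = [".doc/document", ".pdf/document", ".gif/image"]
print(parse_file_types(sample))
```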

You can customize the file to add, remove, or modify the file extension entries. This allows you to define new file extensions and categories, or to associate a file extension with your preferred category. Ensure that each file extension is associated with only one category. The entries can be listed in any order.
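Because each file extension should belong to only one category, it can help to check a revised file for conflicts before you save it. The following Python sketch reports extensions that appear with more than one category. The function name conflicting_extensions is illustrative, and the case-insensitive comparison is an assumption.

```python
from collections import defaultdict


def conflicting_extensions(lines):
    """Return extensions that are assigned to more than one category."""
    categories = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if "/" not in line:
            continue  # ignore blank or malformed lines in this sketch
        extension, _, category = line.partition("/")
        categories[extension.lower()].add(category.lower())
    return {ext: sorted(cats) for ext, cats in categories.items() if len(cats) > 1}


entries = [".doc/document", ".zip/compressed", ".zip/application"]
print(conflicting_extensions(entries))
```

An empty result means that no extension is listed under two different categories.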

To customize the DswFileTypes.cfg file:

  1. Log in to the Dynamic File Services server as a user with Administrator privileges.

  2. In a file browser, navigate to the folder where you installed Dynamic File Services.

    The default installation location is C:\Program Files\Dynamic File Services.

  3. Make a backup copy of the default DswFileTypes.cfg file, and give the backup a different name.

  4. Open the original DswFileTypes.cfg file (the working copy) in a text editor.

  5. Add, remove, or modify the definitions.

    Place each entry on a separate line in the following format. The entries can appear in any order.

    .<file_extension>/<file_type_category>
    

    For example:

    .ext/example
    
  6. Save the file.

    The revised definitions are applied for the next run of a file types policy.
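The procedure above can also be scripted. The following Python sketch keeps a backup, replaces any existing entry for the extension, and saves the file. The function name add_file_type and the .bak suffix are hypothetical, and the sketch assumes that the file is plain text in the system default encoding.

```python
import shutil
from pathlib import Path


def add_file_type(cfg_path, extension, category, backup_suffix=".bak"):
    """Back up the configuration file, then add or replace one extension entry."""
    cfg = Path(cfg_path)
    # Keep a copy of the current file under a different name.
    shutil.copy2(cfg, cfg.with_name(cfg.name + backup_suffix))

    # Drop blank lines and any existing entry for the same extension.
    kept = [
        line
        for line in cfg.read_text().splitlines()
        if line.strip()
        and line.strip().partition("/")[0].lower() != extension.lower()
    ]
    kept.append(f"{extension}/{category}")  # one entry per line
    cfg.write_text("\n".join(kept) + "\n")  # save the revised file
```

For example, calling add_file_type with the path to DswFileTypes.cfg, the extension .ext, and the category example adds the .ext/example entry shown in step 5. Run the script as a user with Administrator privileges, because the default installation folder is protected.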

After you modify the file, you can expect the following changes in the policies and wizards:

  • New or modified file extensions are considered in future policy runs for the specified category.

  • New or modified categories appear as options in the File Types dialog box when you create or modify a file types policy.

    For example, if you add the following line to the DswFileTypes.cfg file, other appears as a valid category the next time you edit or add file types for a policy:

    .new/other
    
  • If a file extension is removed from the file and it is not otherwise defined in the server’s Windows Registry, a file types policy does not move files with that extension.

  • If all instances of a category are removed from the file and the category is not otherwise defined as a MIME type or perceived type in the server’s Windows Registry, the category no longer appears as an option in the File Types dialog box when you create or modify a file types policy.

    For existing policies, the category is considered an invalid category in future policy runs, and a file types policy does not move files based on the category.
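The effect of removing the last entry for a category can be sketched in Python. The function name categories is illustrative. The sketch considers only the entries in the configuration file and does not model categories that the Windows Registry contributes.

```python
def categories(lines):
    """Return the distinct categories named in DswFileTypes.cfg-style entries."""
    found = set()
    for raw in lines:
        _extension, separator, category = raw.strip().partition("/")
        if separator and category:
            found.add(category)
    return sorted(found)


before = [".doc/document", ".new/other"]
after = [".doc/document"]  # the only entry for the 'other' category was removed
print(categories(before))
print(categories(after))
```

In this sketch, other is no longer returned after its only entry is removed, which corresponds to the category disappearing from the File Types dialog box.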

9.3.3 Configuring MIME Types and Categories for the Content Filter

Dynamic File Services uses the C:\Program Files\Dynamic File Services\DswMimeTypes.cfg file to associate well-known MIME content types with file extensions.

Each line in the DswMimeTypes.cfg file contains a MIME content type and file extension in the following format:

<mime_type>/<mime_subtype>:.<file_extension>

For example:

application/msword:.doc
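As an illustration of the format, the following Python sketch splits one entry into its MIME type, subtype, and file extension. The function name parse_mime_entry is hypothetical. The sketch splits at the last colon, on the assumption that the extension itself never contains one.

```python
def parse_mime_entry(line):
    """Split '<mime_type>/<mime_subtype>:.<file_extension>' into its parts."""
    content_type, separator, extension = line.strip().rpartition(":")
    mime_type, slash, mime_subtype = content_type.partition("/")
    if not (
        separator and slash and mime_type and mime_subtype
        and extension.startswith(".")
    ):
        raise ValueError(f"unexpected entry: {line!r}")
    return mime_type, mime_subtype, extension


print(parse_mime_entry("application/msword:.doc"))
```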

When the file content is used to determine the type, the file content filter reads the file content from each file, ignoring the file’s extension.

Continuing the example, if a file’s content type is the MIME type/subtype of application/msword, the file is treated the same as a .doc file. The filter looks up the .doc file extension in the DswFileTypes.cfg file to determine its assigned category, such as .doc/document. The file is moved if that category is specified in the file types policy.
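The two-step lookup can be sketched in Python as follows. The function name category_for_content is illustrative. The sketch models only the two configuration files, not any fallback to the Windows Registry, and it is not a description of the product's internal implementation.

```python
def category_for_content(content_type, mime_entries, file_type_entries):
    """Resolve a detected MIME type to a category through its mapped extension."""
    # DswMimeTypes.cfg-style entries: '<type>/<subtype>:.<ext>'
    mime_to_ext = dict(
        e.strip().rsplit(":", 1) for e in mime_entries if ":" in e
    )
    # DswFileTypes.cfg-style entries: '.<ext>/<category>'
    ext_to_category = dict(
        e.strip().split("/", 1) for e in file_type_entries if "/" in e
    )
    extension = mime_to_ext.get(content_type)
    if extension is None:
        return None  # the MIME type is not mapped to any extension
    return ext_to_category.get(extension)


print(category_for_content(
    "application/msword",
    ["application/msword:.doc"],
    [".doc/document"],
))
```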

You can customize the DswMimeTypes.cfg file to add, remove, or modify the MIME type definitions. This allows you to define new MIME types and associate them with file extensions, or to associate a file extension with your preferred MIME type. Ensure that each MIME type is associated with only one file extension. The entries can be listed in any order. Ensure that each new or modified MIME type is mapped to a file extension in the DswMimeTypes.cfg file, and that the file extension is mapped to a file type category in the DswFileTypes.cfg file.
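Both consistency requirements can be checked before you save the files. The following Python sketch lists extensions that appear in MIME entries but have no category, and MIME types that appear more than once. The function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
from collections import Counter


def unmapped_extensions(mime_entries, file_type_entries):
    """List extensions used in MIME entries that have no category entry."""
    categorized = {
        e.strip().split("/", 1)[0].lower() for e in file_type_entries if "/" in e
    }
    used = {e.strip().rsplit(":", 1)[1].lower() for e in mime_entries if ":" in e}
    return sorted(used - categorized)


def repeated_mime_types(mime_entries):
    """List MIME types that are mapped by more than one entry."""
    counts = Counter(
        e.strip().rsplit(":", 1)[0].lower() for e in mime_entries if ":" in e
    )
    return sorted(mime for mime, count in counts.items() if count > 1)


mime = ["application/msword:.doc", "other/x-new:.new", "application/msword:.dot"]
types = [".doc/document"]
print(unmapped_extensions(mime, types))
print(repeated_mime_types(mime))
```

Two empty lists indicate that every MIME type is mapped once and that every mapped extension has a category.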

To customize the DswMimeTypes.cfg file:

  1. Log in to the Dynamic File Services server as a user with Administrator privileges.

  2. In a file browser, navigate to the folder where you installed Dynamic File Services.

    The default installation location is C:\Program Files\Dynamic File Services.

  3. Make a backup copy of the default DswMimeTypes.cfg file, and give the backup a different name.

  4. Open the original DswMimeTypes.cfg file (the working copy) in a text editor.

  5. Add, remove, or modify the definitions.

    Place each entry on a separate line in the following format. The entries can appear in any order.

    <mime_type>/<mime_subtype>:.<file_extension>
    

    For example:

    other/x-new:.new
    
  6. Save the file.

  7. Edit the DswFileTypes.cfg file to ensure that the file extension is assigned to a category, then save the file.

    For example:

    .new/custom
    

After you modify the files, you can expect the following changes in the policies and wizards:

  • Files with a content type that matches a new or modified MIME type can be moved by content filter policies that use the category mapped to the MIME type’s file extension.

  • If a MIME type definition is removed from the file and it is not otherwise defined in the server’s Windows Registry, a content filter policy does not move files of that content type.

See the following examples for how the MIME Type settings affect your file types policies:

Example 1: New MIME Type

If a file is not being moved as expected, you can assess its file content type by using the Apache Tika open source application as described in Section 9.3.4, Using Apache Tika to Find the MIME Type of a File. If Tika returns a new file content type, you should modify the DswMimeTypes.cfg file to add a new <mime_type>/<mime_subtype>:.<file_extension> entry, then add the file extension and category to the DswFileTypes.cfg file.

Assume that Tika returns a content type of other/x-new for a file with the .new extension. You modify the DswMimeTypes.cfg file to add the following line:

other/x-new:.new

You modify the DswFileTypes.cfg file to add an entry for the new file extension with a category called custom:

.new/custom

The next time that you edit or create a file types policy, the custom category appears as an option in the File Types dialog box. If you select the option, the category is applied the next time you run the policy.

Example 2: Files with No Extensions

In this example, the following line is in the DswFileTypes.cfg file:

.pdf/document

The following line is in the DswMimeTypes.cfg file:

application/pdf:.pdf

Suppose that a file named unknown has no extension, but it is really a PDF file.

You create a file types policy with document as one of the categories. The following outcome is expected, depending on whether you enable the Use file content to determine type option:

  • Do not enable Use File Content to Determine Type: The filter matches the category based on a file’s file extension. Since the file has no file extension, the file does not match the document category.

  • Enable Use File Content to Determine Type: The filter looks at the file content to determine the file’s type and returns application/pdf as its MIME type. Based on the matching entry in the DswMimeTypes.cfg file, the unknown file is treated as if it had a .pdf extension. Since the .pdf file extension is mapped to the document category, the filter matches the file to the category and moves the file.
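The two outcomes in this example can be reproduced with a small Python sketch. The dictionaries hold only the two entries from this example, the function name matches_category is illustrative, and the detected content type is passed in rather than computed.

```python
import os

FILE_TYPES = {".pdf": "document"}         # from DswFileTypes.cfg in this example
MIME_TYPES = {"application/pdf": ".pdf"}  # from DswMimeTypes.cfg in this example


def matches_category(file_name, detected_type, category, use_content):
    """Return True if the file would match the category under the chosen option."""
    if use_content:
        # The content decides; the file name and its extension are ignored.
        extension = MIME_TYPES.get(detected_type)
    else:
        # Only the extension in the file name is considered.
        extension = os.path.splitext(file_name)[1].lower() or None
    return extension is not None and FILE_TYPES.get(extension) == category


print(matches_category("unknown", "application/pdf", "document", use_content=False))
print(matches_category("unknown", "application/pdf", "document", use_content=True))
```

The first call returns False because the file name has no extension. The second call returns True because the content type resolves to .pdf and then to the document category.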

Example 3: Unexpected MIME Types

Assume that Apache Tika returns a file content type of application/x-document-special for a file with no extension that you believe is a document. You modify the DswMimeTypes.cfg file to add the following line, which maps the MIME type to the .doc file extension:

application/x-document-special:.doc

Since .doc is already mapped to the document category with the following line in the DswFileTypes.cfg file, you do not need to modify that file.

.doc/document

9.3.4 Using Apache Tika to Find the MIME Type of a File

To use the File Types filter effectively with the Use file content to determine type option, you should know the MIME type of the files you are trying to move. Choose a representative file, then use any available utility to determine its actual MIME type based on its content. Use this information to choose the appropriate File Types category in the policy.

Apache Tika 1.1 provides a standalone, runnable application jar (tika-app-1.1.jar) that you can use to discover the MIME type for a file. This Tika application combines the core and parser libraries into a single runnable jar with a GUI and a command line interface.

To download the Tika 1.1 application jar file to a Dynamic File Services server:

  1. In a Web browser, go to the Apache Tika download site, click the tika-app-1.1.jar file name link, then select a mirror site to use for the download.

    Apache Tika 1.1 is the version used by Dynamic File Services 2.1. If a newer Tika version is shown on the Tika Downloads page, follow the link to the Tika archives, then download the Tika 1.1 version of the application file.

  2. Ensure that a version of Java is installed on the Windows server.

For information about running the Tika 1.1 application with Java, see Using Tika as a Command Line Utility on the Getting Started with Apache Tika documentation Web page.

To use the Apache Tika application jar file with Java to determine the MIME type of a sample file:

  1. On the Dynamic File Services server, open an Administrator Command Prompt console (right-click the Command Prompt icon, then select Run as Administrator).

  2. At the prompt, navigate to the folder where you downloaded the Apache Tika application.

  3. Start the Apache Tika application in GUI mode by entering:

    java -jar tika-app-1.1.jar -g
    
  4. Drag and drop the sample file on the Apache Tika GUI.

  5. Select View > Metadata to view the file’s metadata information.

    The Content-Type line shows the file’s MIME type based on content. For example, the metadata for a spreadsheet file might look like the following:

    Content-Length: 15898
    Content-Type: application/vnd.oasis.opendocument.spreadsheet
    Creation-Date: 2011-09-20T10:31:00.43
    Edit-Time: P38DT9H18M1S
    Object-Count: 0
    Table-Count: 3
    date: 2012-01-04T16:49:10.35
    editing-cycles: 44
    generator: LibreOffice/3.4$Win32 LibreOffice_project/340m1$Build-1219
    nbObject: 0
    nbTab: 3
    resourceName: Intlab IPaddress assignments.ods
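If you need to check many sample files, reading the Content-Type value can be scripted instead of using the GUI. The following Python sketch extracts the value from metadata text in the Name: value form shown above. The helper tika_metadata assumes that java is on the PATH and that this Tika build accepts the --metadata command line option. Verify the option with the application's --help output before relying on it. Both function names are hypothetical.

```python
import subprocess


def content_type_from_metadata(text):
    """Return the Content-Type value from 'Name: value' metadata lines, or None."""
    for line in text.splitlines():
        name, separator, value = line.strip().partition(":")
        if separator and name.strip().lower() == "content-type":
            return value.strip()
    return None


def tika_metadata(sample_path, jar="tika-app-1.1.jar"):
    """Run the Tika application without the GUI and return its metadata text.

    Assumption: the jar is in the current folder and supports --metadata.
    """
    result = subprocess.run(
        ["java", "-jar", jar, "--metadata", sample_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


sample = """Content-Length: 15898
Content-Type: application/vnd.oasis.opendocument.spreadsheet
Table-Count: 3"""
print(content_type_from_metadata(sample))
```

The returned value is the MIME type to look for, or to add, in the DswMimeTypes.cfg file.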