Using XML for Enterprise Email Discovery - Part 3
Novell Cool Solutions: Feature
By Messaging Architects
Posted: 31 Oct 2006
Demystifying XML: XML vs SQL for Enterprise Email Discovery - Part 3
Review Part 1 of the Series, Anatomy of an Email Record
Review Part 2 of the Series, What You Need to Know about XML Data
Messaging Architects uses XML to store the data managed by GWArchive, its enterprise-class message retention and compliance solution. Crawlers, or archive agents, harvest messages by searching for new mail and converting each message to an XML record. The records are then indexed by our enterprise-class indexing engines to provide fast, accurate search and retrieval across hundreds of thousands or even millions of individual email records.
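The harvest-convert-index flow described above can be sketched in a few lines of Python. This is only an illustration, not GWArchive's actual record format or indexing engine: the element names (`message`, `from`, `subject`, `body`), the function names, and the sample messages are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def message_to_xml(msg_id, sender, subject, body):
    """Convert one email into an XML record (hypothetical schema)."""
    record = ET.Element("message", id=msg_id)
    ET.SubElement(record, "from").text = sender
    ET.SubElement(record, "subject").text = subject
    ET.SubElement(record, "body").text = body
    return ET.tostring(record, encoding="unicode")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(xml_records):
    """Build an inverted index: term -> set of message ids containing it."""
    index = defaultdict(set)
    for xml_text in xml_records:
        record = ET.fromstring(xml_text)
        msg_id = record.get("id")
        for field in record:
            for term in tokenize(field.text or ""):
                index[term].add(msg_id)
    return index


# Hypothetical sample messages, standing in for harvested mail.
records = [
    message_to_xml("m1", "alice@example.com", "Quarterly audit",
                   "Please review the audit report."),
    message_to_xml("m2", "bob@example.com", "Lunch",
                   "Are you free at noon?"),
]
index = build_index(records)
```

In this sketch the XML records are only read once, when the index is built. A lookup such as `index["audit"]` is answered from the index alone, which mirrors the idea of treating the XML store as a data source rather than as the thing being queried.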
GWArchive does not treat the XML repository as a database, only as a data source. As mentioned above, it is the index built over the data that makes quick and easy record retrieval possible, both by relevancy and by whether a term is present at all.
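The difference between "the term exists" and "the record is relevant" can be shown with a deliberately simple scoring scheme. The sketch below ranks records by how often the query terms occur in them; this term-count scoring is an assumption chosen for brevity, not the ranking method used by GWArchive's search engine, and the function names and sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(documents, query):
    """Return ids of matching documents, most relevant first.

    Relevance here is simply the number of times the query terms
    occur in the document; documents with no occurrence are omitted.
    """
    query_terms = tokenize(query)
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical message bodies keyed by message id.
documents = {
    "m1": "Audit schedule: the audit starts Monday.",
    "m2": "The audit is done.",
    "m3": "Lunch at noon?",
}
results = rank(documents, "audit")
```

Both `m1` and `m2` contain the term, so a pure existence test would return them without any order; the scoring step puts `m1` first because it mentions the term twice.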
GWArchive's choice to use XML instead of a database repository is quite revolutionary in the context of email archiving. Most archiving solutions on the market today opt for a database repository because they lack a high-performance search engine and must therefore rely on the database's own search engine for fast record search and retrieval. In other words, the choice is not about the database; it is about searchability.
Yet while database search engines are useful for some types of data, electronic records retrieval often requires fuzzy logic searches or keyword searches based on wordlists. These types of unstructured queries often impose additional overhead on database search engines, which are not necessarily the best choice for electronic records retrieval.
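To make the two query types concrete, here is a minimal Python sketch of a wordlist search and a fuzzy search over a message body. It uses the standard library's `difflib` for approximate matching, which is an assumption made for illustration rather than the technique any particular archiving product uses; the function names, the 0.8 similarity cutoff, and the sample text are likewise hypothetical.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wordlist_hits(text, wordlist):
    """Return the wordlist entries that appear in the text, sorted."""
    tokens = set(tokenize(text))
    return sorted(tokens & {word.lower() for word in wordlist})


def fuzzy_hits(text, term, cutoff=0.8):
    """Return words in the text that approximately match the term.

    The cutoff is a similarity ratio between 0 and 1; 0.8 is an
    arbitrary choice that tolerates small misspellings.
    """
    tokens = sorted(set(tokenize(text)))
    return difflib.get_close_matches(term.lower(), tokens, n=5, cutoff=cutoff)


# Hypothetical message body with a misspelled keyword.
body = "Please keep the merger confidental until the filing."
watch_list = ["confidential", "merger", "lawsuit"]

exact = wordlist_hits(body, watch_list)
approximate = fuzzy_hits(body, "confidential")
```

The wordlist search finds only "merger", because the message misspells "confidential"; the fuzzy search still recovers the misspelled word. A search that relies on exact matching alone would miss that record.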
To overcome databases' traditional inability to find meaning in the textual components of structured data, enterprise-class search engines offer the advantage of being able to handle both structured and unstructured content. GWArchive already comes integrated with a robust search engine that allows for all kinds of advanced searches. As soon as an email message is brought into the archive repository, it is indexed to meet the enterprise's future needs for data accessibility in the context of electronic discovery or internal audits.
The Future of XML Technology
As organizations move forward and information access within the enterprise becomes increasingly important, federated search engines are being used to access information even from traditional relational database systems. The ability of these systems to "crawl" the enterprise and automatically capture and index information, along with their ability to present unstructured information in an organized and easily understandable format, makes them a focal point for consolidated searching of information within the organization.
Scalability of these systems is not an issue, as we have seen with companies such as Google or Yahoo, which provide fast access to information on the web and have recently even started delivering an enterprise search appliance, entirely without the use of SQL queries.
RDBMS and SQL will continue to play an important part in the enterprise for the access and management of structured data. At the same time, XML as a technology is proving flexible and scalable enough to meet the demands for search and fast retrieval of the growing volumes of unstructured data that reside in corporate systems. XML has overcome its initial limitations and the growing pains that accompany any emergent technology. Reputable industry analyst groups that track developments and innovations in the IT space, such as Gartner, confirm that XML is an important enterprise data source by referencing IBM's Viper DB2 product, which stores XML data natively and supports SQL queries against it.
While databases provide excellent access to structured data, enterprise search is the emerging giant in providing "humanistic" access to data in an age where data discovery for legal and compliance purposes has become so critical. Teaming XML storage with enterprise search provides the best of both worlds: a technology-neutral storage format ready to use as a data source with most applications, and a search capability that provides intuitive and granular discovery of all types of structured and unstructured data.
About the Author:
Greg Smith, MCNE and MCNI, has been working in the high-technology field for more than 15 years, predominantly with Novell Platinum integrators and resellers. Greg Smith is one of the main designers of Messaging Architects' GWArchive, the only GroupWise-native email retention solution included in the Gartner Magic Quadrant for active archiving. In his current position as Director of Professional Services at Messaging Architects, he brings his networking and messaging expertise to a company that specializes in GroupWise enhancements and product development. Greg has been active in the area of public speaking, giving technical presentations at GroupWise Advisor Summits, as well as at Novell BrainShare.
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com