What Makes Nalytics Different?
Hi. Having introduced my colleagues and myself in the previous blog, it’s time to get down to business and talk about Nalytics, our unique Search and Discovery Platform for ‘unstructured’ data. This blog introduces the fundamental index structures at the core of Nalytics.
I don’t use words like ‘unique’ lightly; Nalytics really is unique. Its index isn’t a simple index of words pointing to the documents they occur in. It’s an index of words pointing to the sentences they occur in, which point to the paragraphs they occur in, which point to the sections they occur in, which point to the documents they occur in! Those of you familiar with hardware design and manufacturing will recognise this as a Bill of Materials, and indeed it is! It’s a recursive structure which enables some very powerful capabilities that a simple word-to-document index could never support. It’s also fair to claim that, given this structure, we need no longer talk about ‘unstructured data’ – this is real structure!
At the core of Nalytics is a Multi-Level Structured Text Index with a recursive parent-child structure, in which each node is an instance of a recognized unit of textual structure such as a document, section, paragraph, sentence or word. A unit can be both a parent (for example, a paragraph is a parent of a sentence) and a child (for example, a paragraph is a child of a document). The leaf level in the index is a word, which has no children.
This is depicted in the object model below:
The index is built using a technique that optimizes use of memory in a managed way. This allows us to create a Multi Level Structured Text Index from large sets of source documents very rapidly.
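To make the parent-child structure described above more concrete, here is a minimal Python sketch. It is not the Nalytics implementation: the class names `Unit` and `MultiLevelIndex`, the regular-expression sentence splitter and the in-memory postings dictionary are all assumptions made purely for illustration.

```python
import re
from collections import defaultdict


class Unit:
    """One node in the hierarchy: document, section, paragraph, sentence or word."""

    def __init__(self, level, label, parent=None):
        self.level = level
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Labels from this unit up to its document, innermost first."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return labels


class MultiLevelIndex:
    """Hypothetical index: each word points to the sentences that contain it."""

    def __init__(self):
        self.documents = []
        self.postings = defaultdict(list)  # word -> list of sentence Units

    def add_document(self, name, sections):
        """`sections` is a list of sections; each section is a list of paragraph strings."""
        doc = Unit("document", name)
        self.documents.append(doc)
        for s, paragraphs in enumerate(sections, 1):
            sec = Unit("section", f"section {s}", doc)
            for p, text in enumerate(paragraphs, 1):
                para = Unit("paragraph", f"paragraph {p}", sec)
                # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
                sentences = [t for t in re.split(r"(?<=[.!?])\s+", text.strip()) if t]
                for n, sentence_text in enumerate(sentences, 1):
                    sent = Unit("sentence", f"sentence {n}", para)
                    words = re.findall(r"[a-z0-9']+", sentence_text.lower())
                    for word in dict.fromkeys(words):  # distinct words, order kept
                        Unit("word", word, sent)       # leaf: never given children
                        self.postings[word].append(sent)
        return doc

    def find(self, word, level="sentence"):
        """Return the distinct units at `level` that contain `word`."""
        hits = []
        for sent in self.postings.get(word.lower(), []):
            node = sent
            while node is not None and node.level != level:
                node = node.parent  # walk up: sentence -> paragraph -> section -> document
            if node is not None and node not in hits:
                hits.append(node)
        return hits


index = MultiLevelIndex()
index.add_document("report.txt", [
    ["The pump failed. The valve held.", "Engineers replaced the pump."],
    ["Costs rose sharply."],
])
for hit in index.find("pump"):
    print(" < ".join(hit.path()))
```

Because every hit carries its chain of parents, the same lookup can be answered at any level: `index.find("pump", level="paragraph")` returns the two paragraphs involved, while `level="document"` returns the single document. A simple word-to-document index could only answer the last of these.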
Although the current system works with textual units, any units of information with similarly recursive parent-child relationships between their levels could be indexed using this structure. Files containing movies, for example, may be broken down into scenes, each of which could be further broken down into the entities depicted and the dialogue spoken within the scene.
Nalytics Metadata Index
At the same time as documents are parsed and their textual content is added to the text index, the metadata for each document is also obtained. This includes the metadata held within the document itself, together with any metadata supplied by the users of the source system, such as custom metadata tags in SharePoint. The metadata is added to a separate metadata index, which indexes each metadata item name and its corresponding value. This facilitates faceted search, whereby a full text search returns only those documents that fulfill a combination of metadata criteria selected by the user. It is also possible to search for documents based solely on metadata criteria.
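As a rough illustration of how a name-and-value metadata index can support faceted search, here is a small Python sketch. The `MetadataIndex` class, the `faceted_search` function and the sample documents are hypothetical and are not taken from Nalytics. The sketch also assumes metadata values are simple hashable values such as strings.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical index mapping each (item name, value) pair to document ids."""

    def __init__(self):
        self.all_docs = set()
        self.by_item = defaultdict(set)  # (name, value) -> set of document ids

    def add(self, doc_id, metadata):
        self.all_docs.add(doc_id)
        for name, value in metadata.items():
            self.by_item[(name, value)].add(doc_id)

    def match(self, criteria):
        """Document ids that satisfy every name/value criterion."""
        result = set(self.all_docs)
        for item in criteria.items():
            result &= self.by_item.get(item, set())
        return result


def faceted_search(text_hits, metadata_index, criteria):
    """Keep only the full text hits whose metadata meets all the criteria."""
    allowed = metadata_index.match(criteria)
    return [doc_id for doc_id in text_hits if doc_id in allowed]


meta = MetadataIndex()
meta.add("a.docx", {"author": "Kim", "department": "Legal"})
meta.add("b.docx", {"author": "Kim", "department": "Sales"})
meta.add("c.pdf", {"author": "Lee", "department": "Legal"})

# Stand-in for the result of a full text search (an assumption for this example).
text_hits = ["a.docx", "b.docx", "c.pdf"]

print(faceted_search(text_hits, meta, {"author": "Kim", "department": "Legal"}))
print(sorted(meta.match({"department": "Legal"})))  # metadata-only search
```

Keeping the metadata in its own index means a metadata-only search never touches the text index, and a faceted search reduces to intersecting two sets of document ids.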