Technology Advanced Collaboration Workgroups and Communities

Structured and unstructured data analytics

ElasticSearch, Spark and related analytical and machine learning platforms deliver Information Retrieval and Natural Language Processing (NLP) capabilities that can organize, query and socialize content that is formally published, web crawled, user-generated or operationally created in structured, unstructured or semi-structured formats.

Key capabilities

- Faceted search for large heterogeneous content bodies
- High performance log processing and analytics dashboards
- Behavior, sentiment and reputation analysis

Underlying algorithms and services:

- Machine Learning based Information Retrieval (IR/ML)
- Natural Language Processing (NLP)
- Document Understanding
- Graph-based Reasoning
- Bayesian reasoning
- Support Vector Machines (SVM)
- Belief networks
- Document Classifiers
- Business Intelligence data mining

Combining big data technologies with group intelligence software and topic-based knowledge exchanges opens up a new realm: information retrieval, text analytics and machine learning pre-digest vast amounts of structured and unstructured data, which can then be fed continually into collaborative knowledge workflows in a semantically accessible and familiar form.

ElasticSearch (ES), Hadoop, Spark and Python can integrate information retrieval (IR), text analytics (TA) and machine learning (ML) capabilities with existing business platforms, including collaboration suites, Business Intelligence software, CRM, SFA, legal systems and content management (CMS) applications.

Making mountains of content understandable

Unstructured Data Analysis

Capabilities include semantic analysis of unstructured or “semi-structured” text content such as web pages, documents, social media, research papers, reports, medical records, work logs and forms, RDF triplestores, and any free-form text:

- Sentiment analysis (evaluating the sentiment of the author of a document)
- Named Entity Recognition (parsing out significant references to real world objects)
- Document classification (different ways of categorizing and classifying documents)
- Relevance recognition (determining how relevant a document is to a given topic)
- Paragraph Gisting (extracting the core meaning of a paragraph)
- Ontological search (recognizing similarities from context)
- Semantic filtering (recognizing what a reference is about from context)
- Auto-generation of tags to add to the search space of user-generated content
- Auto-generation of links between documents
- Associative retrieval of documents

Structured Data Analysis

Semantic analysis of structured data, such as that found in relational, B.I. or flat databases, is supported with the following capabilities:

- Faceted search (allowing repetitive searches to filter the results of prior searches)
- Linked Data Analysis (across structured data sets)
- Data Mining on Big Data collections (pattern matching and selection)
- Predictive Analytics (locating and identifying trends in structured data)
- Data Record Classification (naïve Bayes, k-nearest neighbor)
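
Faceted search, as described in the list above, can be illustrated with a small in-memory sketch. The records, field names and helper functions here are hypothetical and only show the idea of repeatedly narrowing prior results while recomputing the counts per facet value; a search engine would compute these counts on the server side.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "type": "report", "year": 2014, "region": "EU"},
    {"id": 2, "type": "report", "year": 2015, "region": "US"},
    {"id": 3, "type": "memo", "year": 2015, "region": "US"},
    {"id": 4, "type": "memo", "year": 2015, "region": "EU"},
]


def apply_filters(records, filters):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in filters.items())]


def facet_counts(records, fields):
    """Count how many records fall under each value of each facet field."""
    return {f: Counter(r[f] for r in records if f in r) for f in fields}


# First search: no filters yet, so every facet value is on offer.
results = apply_filters(RECORDS, {})
print(facet_counts(results, ["type", "year", "region"]))

# Each refinement filters the results of the prior search.
results = apply_filters(results, {"year": 2015})
results = apply_filters(results, {"region": "US"})
print([r["id"] for r in results])  # [2, 3]
print(facet_counts(results, ["type"]))
```

Recomputing the counts after each refinement is what lets a user see how many records remain under every remaining facet value before choosing the next filter.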

Knowledge Workflows Driven by Machine Intelligent Information Retrieval

In a traditional information retrieval application, documents are indexed with TF/IDF weighting and queries are then run against the index to produce a ranked list of search results.

Wikipedia on TF/IDF search technology
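
The traditional index-then-query flow can be sketched in a few lines of plain Python. This is a simplified TF/IDF variant for illustration only, not the scoring formula of any particular search engine; the sample documents, the whitespace tokenizer and the smoothing term `+ 1.0` are assumptions.

```python
import math
from collections import Counter

# Hypothetical sample corpus.
DOCS = {
    "d1": "elasticsearch indexes documents for full text search",
    "d2": "spark processes large data sets in memory",
    "d3": "search engines rank documents by relevance",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Return per-document term frequencies and corpus-wide IDF weights."""
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(docs)
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return tf, idf


def search(query, tf, idf):
    """Score each document as the sum of tf * idf over the query terms."""
    scores = {}
    for doc_id, counts in tf.items():
        score = sum(counts[t] * idf.get(t, 0.0) for t in tokenize(query))
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tf, idf = build_index(DOCS)
ranked = search("full text search", tf, idf)
print([doc_id for doc_id, _ in ranked])  # ['d1', 'd3']
```

Terms that occur in few documents ("full", "text") receive a higher IDF weight than common ones ("search"), which is why `d1` outranks `d3` for this query.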

ElasticSearch can support a very wide range of TF/IDF applications and can also do the opposite: index a large set of queries (for example rules, business logic, metadata structures, exploratory categorization routines) and then run incoming documents and structured content against the indexed queries.

This so-called "reverse indexing" approach makes it possible to quickly parse and process a large number of heterogeneous documents, papers, research notes, transaction records, annotations, social media content, and unstructured or semi-structured text records, looking for categories, topics, tags, and fuzzy emergent patterns.
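
The idea of storing queries and running documents against them can be shown with a toy in-memory matcher. Everything here is hypothetical: the query names, the rule that a query matches when all of its terms appear in the document, and the helper functions. A real engine supports far richer query types.

```python
# Hypothetical stored queries: each name maps to terms that must all appear.
STORED_QUERIES = {
    "topic:claims-fraud": {"claim", "fraud"},
    "topic:market-risk": {"market", "risk"},
    "tag:elasticsearch": {"elasticsearch"},
}


def tokenize(text):
    """Lowercase the words and strip trailing punctuation."""
    return {w.strip(".,;:!?").lower() for w in text.split()}


def match_queries(document, stored_queries):
    """Return the names of all stored queries that the document satisfies."""
    terms = tokenize(document)
    return sorted(
        name for name, required in stored_queries.items() if required <= terms
    )


doc = "Analysts flagged a suspicious claim as possible fraud."
print(match_queries(doc, STORED_QUERIES))  # ['topic:claims-fraud']
```

Because the queries are data rather than code, a category or tagging rule written by one knowledge worker is applied automatically to every document that arrives later.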

More importantly, the engine helps capture these reusable queries and share the intelligence they encode among knowledge workers.

In IR/TA terms, the underlying mechanism is called percolation.

ElasticSearch reverse indexing document percolation
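
In ElasticSearch itself, percolation is expressed through a `percolator` field type and a `percolate` query. The sketch below only builds the JSON request bodies as Python dictionaries; it assumes a recent ElasticSearch version with typeless mappings, and the field names `query` and `body` as well as the sample rule are hypothetical. Sending the bodies (through the REST API or a client library) is left out on purpose.

```python
import json

# Index mapping: "query" stores the registered queries,
# "body" is the text field those queries are written against.
mapping = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    }
}

# A stored query, indexed like any other document (hypothetical rule).
stored_query = {"query": {"match": {"body": "fraud claim"}}}

# Percolate request: which stored queries match this incoming document?
percolate_request = {
    "query": {
        "percolate": {
            "field": "query",
            "document": {
                "body": "Analysts flagged a suspicious claim as possible fraud."
            },
        }
    }
}

for name, body in [
    ("create index", mapping),
    ("register query", stored_query),
    ("percolate document", percolate_request),
]:
    print(name)
    print(json.dumps(body, indent=2))
```

The response to the percolate request lists the stored queries that matched, so the identifier given to each registered query can double as a category, topic or tag name.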

With the Python API to ElasticSearch, we can write software that extends these IR/TA capabilities with probabilistic machine learning, creating new forms of social knowledge sharing applications in finance, insurance, intelligence, marketing, publishing, e-commerce and healthcare.

Next Steps