Voquette Company Confidential
SCORE
Presentation Overview
• Industry Requirements
• Capabilities
• System Architecture and Technologies
• Examples and Scenarios
• Measures (Quality, Performance, Scalability, Robustness)
• Deployment Information
• Questions & Answers: What if
• Business Development Issues
• Milestones and Schedules
1. The Problem: massive, disparate information
• Multiple isolated sources of intelligence information (FBI, CIA, etc.) that are not shared or integrated
• Large variety (format, media) of open source, partner, FAA and IC information
2. The Difficulty: inability to have timely actionable info
• Amount of data too overwhelming to use constructively
• Manual methods of aggregating data are not scalable
=> Lack of a “complete picture” to make decisions
• Inability to make timely, accurate and actionable conclusions based on information at hand
3. The Solution: Voquette’s Semantic Technology
• Technology to analyze and integrate data from disparate sources to provide a near-real-time, reliable, scalable and actionable solution for intelligence and security applications
Intelligence Content Management Challenges
1. Aggregation
• Feed handlers/Agents that understand content representation and media semantics
• Push-pull, Web-DB-Files, Structured-Semi-structured-Unstructured data of different
types from proprietary, partner and open source
2. Homogenization and Enhancement
• Enterprise-wide common and customizable view (information organization)
• Domain model, taxonomy/classification, metadata standards
• Semantic metadata (created automatically if possible)
• Semantic associations/inferences (link analysis)
3. Semantic Applications (in near real-time)
• Search, personalization, alerts, knowledge browsing/inference for improved
relevance, intelligent personalization, customization
New Technical Challenges in Enterprise Content Management
Voquette’s Unique Capabilities
• Semantics (understanding of content and user needs)
• Extreme relevance
• Knowledge inferencing (semantic associations)
• Near real-time
• Multiple applications/usage patterns (not just search)
• Automation
• Scalability in all aspects
Voquette Semantic Technology System Architecture
• Distributed agents that automatically extract relevant semantic metadata from structured and unstructured content
• Fast main-memory-based query engine with APIs and XML output
• CACS provides automatic classification (w.r.t. WorldModel) from unstructured text and extracts contextually relevant metadata
• Distributed agents that automatically extract/mine knowledge from trusted sources
• Toolkit to design and maintain the Knowledgebase
• Knowledgebase represents the real-world instantiation (entities and relationships) of the WorldModel
• WorldModel specifies the enterprise’s normalized view of information (ontology)
Workflow Process
• WorldModel™ (Domain Model), Taxonomy/Classification,
Knowledge base schema
• Classifiers
• Knowledge and Content Extraction Agents
• Automated or human-supervised run-time
(for classification and metadata enhancement, knowledge base
maintenance)
• Semantic Applications
All components support incremental extensions.
Technological Innovation
• Semantic approach (classification/taxonomy, domain model, entities and relationships)
[All components]
• Semantic associations/ knowledge inferences
• Classification committee (multiple technologies, rather than one size fits all) [CACS]
• Scalability throughout with distributed architecture and implementation (number of
content and knowledge sources, indexing, etc.)
• Main-memory implementation, incremental checkpointing [SSE]
Example
Domain: Intelligence
Sub-domain: People, Org, Places
(Other Sub-domains: Financing, Methods & Training, Materials)
What is it? WorldModel™: Template infrastructure to organize and index content contextually
What does it consist of? Domains (categories) and domain-specific attributes, with geo-spatial and temporal info
Setting up a Terrorist Intelligence WorldModel™
Terrorism Intelligence WorldModel™ (simplified): Group, Person, Event, Bank, Attack Material, Name Alias, Alias Email Address, Location, Time
What are the information pieces of possible interest?
(that can be modeled as WorldModel™ attributes)
• Groups: Nationalist, Terrorist, Political groups
• Person: Terrorist, Suicide Bomber, Hijacker, Personality
• Event: Flight hijacking, WTC Crash, Kidnapping, Terrorist training
• Bank: Swiss bank, Belgian bank (where groups have accts)
• Attack Material: Knives, Plastic Explosives, RDX, AK47 Gun
• Name Alias: Aliases of terrorists (Osama BL = Usama BL)
• Alias Email Addresses: Email addresses for alias names
• Location: Location related to event of interest
• Time: Date/time related to event of interest
Intelligence WorldModel™
What is it? Extractor Agents: Intelligent software robots that work on structured content and automatically extract metadata that is relevant and meaningful to the domain/sub-domain at hand
How do they work?
• Intelligence extractor agents use the Intelligence WorldModel™ definition for meaningful metadata extraction from trusted Intelligence content
• Extractor agents exploit the structure of Intelligence content and automatically “pick up” meaningful Intelligence metadata (as defined in the WorldModel™)
Example: Extractor Agent for CIA confidential content
• Pick up syntax metadata
• Pick up group name
• Pick up person
• Pick up attack material
• Pick up bank name
• Pick up location/date/time
• Pick up name aliases
Metadata extracted follows the Terrorism Intelligence WorldModel™: Group, Person, Event, Bank, Attack Material, Name Alias, Alias Email Address, Location, Time
Intelligence Extractor Agents
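As a rough illustration of how an extractor agent might exploit the structure of a source, the sketch below maps labelled fields in a structured document onto WorldModel™ attributes. The field labels, the `FIELD_MAP` table and the `extract` function are hypothetical and are not taken from Voquette's product; real agents would be written per source.

```python
import re

# Hypothetical mapping from one source's field labels to WorldModel attributes.
# A real agent would carry one such mapping per content source.
FIELD_MAP = {
    "org": "Group",
    "suspect": "Person",
    "weapon": "Attack Material",
    "bank": "Bank",
    "place": "Location",
    "when": "Time",
}

# Matches lines of the form "Label: value".
LINE = re.compile(r"^\s*(\w+)\s*:\s*(.+?)\s*$")


def extract(document: str) -> dict:
    """Pick up only the fields that the WorldModel defines."""
    metadata = {}
    for line in document.splitlines():
        match = LINE.match(line)
        if match and match.group(1).lower() in FIELD_MAP:
            metadata[FIELD_MAP[match.group(1).lower()]] = match.group(2)
    return metadata


sample = """Org: Al Qaeda
Suspect: Bin Laden
Place: New York, USA
When: 0903, 9/11/01
Note: free text the agent ignores"""

extracted = extract(sample)
print(extracted)
```

Fields that the WorldModel does not define (the `Note` line here) are simply skipped, which is one way to keep the extracted metadata consistent across sources.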
What is it? Knowledge Base: Network of Intelligence objects (significant pieces of information) and a representation of the real-world relationships (associations) between them
Example relationships (entity types, with an instance of each):
• Group originated in Country (“Al Qaeda” originated in “Afghanistan”)
• Group works with Group (“Irish IRA” works with “Colombian Group”)
• Person works for Group (“Nabil Almarabh” works for “Al Qaeda”)
• Person leads Group (“Bin Laden” leads “Al Qaeda”)
• Person has alias Alias (“Bin Laden” has alias “Mohammed”)
• Alias has email Alias Email Address (“Mohammed” has email “[email protected]”)
• Person involved in Event (“Bin Laden” involved in “WTC Crash”)
• Event occurred at Location (“WTC Crash” occurred at “New York, USA”)
• Event occurred at Time (“WTC Crash” occurred at “0903, 9/11/01”)
• Group accounts in Bank (“Al Qaeda” accounts in “Swiss bank”)
Diagram: Terrorism Intelligence WorldModel™ (Group, Person, Event, Bank, Attack Material, Name Alias, Alias Email Address, Location, Time)
Diagram: WorldModel™ Intelligence Knowledge Base Definition, linking Group, Person, Alias, EmailAdd, Event, Location, Time, Country and Bank through the relationships Account in, Has alias, Has email, Involved in, Occurred at, Works for/leads, Originated in, Is funded by/works with
Intelligence Knowledge Base
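A knowledge base of this kind can be pictured as a set of (subject, relationship, object) triples. The sketch below is one minimal way to store such triples and look up what is directly associated with an entity; the `KnowledgeBase` class and its method names are illustrative assumptions, and the sample triples are the examples given on the slide.

```python
from collections import defaultdict


class KnowledgeBase:
    """Toy triple store: entities linked by named relationships."""

    def __init__(self):
        self.outgoing = defaultdict(list)  # subject -> [(relationship, object)]
        self.incoming = defaultdict(list)  # object  -> [(relationship, subject)]

    def add(self, subject, relationship, obj):
        self.outgoing[subject].append((relationship, obj))
        self.incoming[obj].append((relationship, subject))

    def related(self, entity):
        """Entities one association away from `entity`, in either direction."""
        forward = {obj for _, obj in self.outgoing.get(entity, [])}
        backward = {subj for _, subj in self.incoming.get(entity, [])}
        return sorted(forward | backward)


kb = KnowledgeBase()
kb.add("Al Qaeda", "originated in", "Afghanistan")
kb.add("Al Qaeda", "accounts in", "Swiss bank")
kb.add("Bin Laden", "leads", "Al Qaeda")
kb.add("Bin Laden", "has alias", "Mohammed")
kb.add("Bin Laden", "involved in", "WTC Crash")
kb.add("WTC Crash", "occurred at", "New York, USA")

print(kb.related("Al Qaeda"))
print(kb.related("Bin Laden"))
```

Keeping both directions indexed makes "what is associated with X" a single lookup regardless of which side of the relationship X sits on.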
What is it? CACS: Module that categorizes content and automatically creates metadata for that content
How does it work? Uses a hybrid of statistical, machine learning and Intelligence knowledge-base techniques
Input: structured or unstructured Intelligence content
Information exchange with the Intelligence Knowledge Base Definition for metadata creation (entity types Group, Person, Alias, EmailAdd, Event, Location, Time, Country, Bank and their relationships)
Example output
• Event: Pentagon Attack
• Metadata extracted: Terrorist Group: Al Qaeda; Person: Bin Laden; Location: Washington, USA; Time: 0918 hrs; Affiliation Country: Afghanistan; Allied Group: Saudi Misaal; Person Alias: Mohammed
Application in Intelligence: CACS could be trained to intelligently process Intelligence content to classify the content piece as a terrorism-related event (WTC Crash, Flight hijacking, etc.)
Categorization and Auto-Cataloging System (CACS)
What is it? Semantic Engine: Fast main-memory-based front-end query engine that enables the end user to retrieve highly relevant and personalized content via custom APIs
Features and Functionality
• Minimal input from security agent – system intelligent enough to provide all possible relevant content to security agent (type in “Bin Laden” and get all relevant information on him and other items related to him)
• Applications: Search, personalization, alerts, notifications, directory
Diagram: a confidential agent submits a user query to the Semantic Engine (backed by Content Enhancement Technology), and highly relevant content is returned through Search, Personalization, Directory, Alerts/Notifications, Intelligent Inference, Analyst WorkBench and Custom Apps.
Intelligence Semantic Engine
Scenario 1: Intelligent Analysis of Confidential Email
• Information underlined in blue marks important metadata elements automatically picked up by the Intelligence extractor agents
• Information shown in red boxes marks names of terrorists (stored in our Knowledge Base) that are also automatically picked up by the Intelligence extractor agents
• CACS can determine by content analysis that this email contains “Terrorist Meeting” information
• Intelligent inferencing is possible due to semantic associations of the Knowledge Base
Picked up from explicit mention in the email: “Mohamed Atta met with Abdulaziz Alomari”
Voquette Knowledge Associations (as drawn): Mohamed Atta works for Al Qaeda; Abdulaziz Alomari works for Saudi Misaal; Al Qaeda originated in Afghanistan; Saudi Misaal originated in Saudi Arabia
Inference: Al Qaeda and Saudi Misaal have possibly started working together as allied groups
Inference: Afghanistan and Saudi Arabia have groups that probably collaborate; look for other relationships
Scenario 1: Intelligent Analysis of Email (Contd.)
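The inference on this slide amounts to a simple rule: if two people are reported to have met and each works for a different group, the groups are flagged as possibly allied, and likewise for the countries the groups originated in. The sketch below encodes that rule under stated assumptions: the two lookup tables hold only the slide's illustrative associations (read off the diagram layout, not independently verified), and `infer_from_meeting` is a hypothetical name.

```python
# Associations as drawn on the slide; illustrative only.
works_for = {
    "Mohamed Atta": "Al Qaeda",
    "Abdulaziz Alomari": "Saudi Misaal",
}
originated_in = {
    "Al Qaeda": "Afghanistan",
    "Saudi Misaal": "Saudi Arabia",
}


def infer_from_meeting(person_a, person_b):
    """Derive tentative group-level and country-level links from one meeting."""
    inferences = []
    group_a = works_for.get(person_a)
    group_b = works_for.get(person_b)
    if group_a and group_b and group_a != group_b:
        inferences.append((group_a, "possibly works with", group_b))
        country_a = originated_in.get(group_a)
        country_b = originated_in.get(group_b)
        if country_a and country_b and country_a != country_b:
            inferences.append(
                (country_a, "has groups that probably collaborate with groups in", country_b)
            )
    return inferences


for inference in infer_from_meeting("Mohamed Atta", "Abdulaziz Alomari"):
    print(inference)
```

The results are deliberately phrased as tentative ("possibly", "probably"), matching the slide's wording; they are leads for an analyst to follow up, not conclusions.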
Scenario 2: Analyst Workbench
• Voquette’s Semantic Technology enables highly relevant and comprehensive
terrorist research
• Example: A security agent wishes to perform research on “Bin Laden” (as he is the prime suspect)
• News/Information directly about Bin Laden is retrieved (content that mentions his name explicitly)
• News/Information on Al Qaeda is retrieved (Bin Laden ↔ Al Qaeda association in KB)
• News/Information on WTC Crash is retrieved (WTC Crash ↔ Bin Laden association in KB)
• News/Information on Mohammed is retrieved (Mohammed ↔ Bin Laden “alias” association in KB)
• News/Information (intelligence) on Afghanistan is retrieved (Al Qaeda ↔ Afghanistan association in KB)
• News/Information (intelligence) on Swiss bank is retrieved (Al Qaeda ↔ Swiss bank association in KB)
• Combined, this correlated information is extremely valuable in bringing multiple actionable perspectives and points of view together on one screen
• Result: Less time spent, faster and much better decision making, more security!
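The retrieval behaviour in this scenario can be read as query expansion over Knowledge Base associations: the query entity is expanded to its neighbours (and their neighbours), and any content tagged with one of those entities is returned. The following is a minimal sketch under that reading; the `associations` table, the two-hop `depth` default and the sample `documents` are assumptions made for illustration.

```python
# Direct associations, taken from the slide's examples.
associations = {
    "Bin Laden": ["Al Qaeda", "WTC Crash", "Mohammed"],
    "Al Qaeda": ["Afghanistan", "Swiss bank"],
}

# Hypothetical content items, each tagged with entity metadata.
documents = {
    "doc1": {"Swiss bank"},
    "doc2": {"Weather"},
    "doc3": {"Bin Laden"},
}


def expand(query, depth=2):
    """Collect the query entity plus everything within `depth` associations."""
    seen = {query}
    frontier = [query]
    for _ in range(depth):
        frontier = [
            neighbour
            for entity in frontier
            for neighbour in associations.get(entity, [])
            if neighbour not in seen
        ]
        seen.update(frontier)
    return seen


def search(query):
    """Return ids of documents tagged with the query or an associated entity."""
    terms = expand(query)
    return sorted(doc for doc, tags in documents.items() if tags & terms)


print(sorted(expand("Bin Laden")))
print(search("Bin Laden"))
```

With two hops, the expansion from “Bin Laden” reaches Al Qaeda, WTC Crash, Mohammed, Afghanistan and Swiss bank, which corresponds to the six retrieval bullets above.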
Diagram labels: Syntax Metadata, Semantic Metadata, “led by”, “Same entity”, Human-assisted inference
Knowledge Inferencing Workflow
Analyst Usage Scenarios/Interfaces for Knowledge Inference
Analysts can possibly use:
• Search
• Knowledge Base Browser / Directory
• Personalization/Alerts
• APIs for custom applications
All options support Reference Pages, Semantic Associations,
Knowledge-based browsing
Intelligence Analyst Browsing Scenario
Core Competencies of Voquette’s Semantic Technology
Content Aggregation, Integration and Normalization
• Create a Customized WorldModel™ (domain model with customized domain attributes)
• Content Aggregation and integration from multiple sources, formats and media (text/audio/video)
• Support push or pull delivery/ingestion of content
• Patented extractor agent technology
• Metadata extraction from structured, semi-structured and unstructured text (fully automated)
• Automatically homogenize content feed tags (fully automated)
Categorization and Auto-Cataloging
• Automatically categorize structured and unstructured text
• Create contextually relevant semantic metadata from unstructured text (fully automated)
• Uniquely uses a hybrid of statistical, machine learning and knowledge-base techniques for classification
Content Enhancement using Knowledge Base
• Create and maintain a customized Knowledge Base for any domain
• Automatically create content tags based on the text itself (fully automated)
• Automatically enhance content tags based on information outside of the text (fully automated) by exploiting the Knowledge Base
• Provide the end user not only with the relevant content they asked for, but also with relevant content they did not explicitly ask for but need to know
Core Competencies of Voquette’s Semantic Technology
Semantic Engine
• Fast, main-memory-based Semantic Engine
• Response time on the order of tens of milliseconds
• Performance: 1 million queries per hour per server
• Real Time Indexing (stories indexed for search/personalization within a minute)
• Near real-time search/personalization of new content and breaking news
• Information retrieval based on quality and not quantity
• Semantic Applications: Search, Directory, Personalization, Alert, Notifications, Custom enterprise applications
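To make the idea of a main-memory engine with real-time indexing concrete, here is a minimal sketch of an in-memory index over attribute/value metadata that accepts new assets one at a time. It is an assumption-laden toy, not Voquette's engine: the `MetadataIndex` class, its methods and the sample assets are all hypothetical.

```python
from collections import defaultdict


class MetadataIndex:
    """In-memory index from (attribute, value) pairs to asset ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, asset_id, metadata):
        """Index one asset immediately; no batch rebuild is needed."""
        for attribute, value in metadata.items():
            self.postings[(attribute, value)].add(asset_id)

    def query(self, **conditions):
        """Return ids of assets that satisfy every attribute=value condition."""
        matches = [self.postings.get(pair, set()) for pair in conditions.items()]
        return set.intersection(*matches) if matches else set()


index = MetadataIndex()
index.add("a1", {"Company": "Cisco Systems, Inc.", "Ticker": "CSCO", "Topic": "Company News"})
index.add("a2", {"Company": "Convera", "Ticker": "CNVR", "Topic": "Company News"})

print(index.query(Topic="Company News"))
print(index.query(Topic="Company News", Ticker="CSCO"))
```

Because each `add` only touches the postings for that asset's metadata, newly arrived content becomes searchable as soon as it is indexed, which is the property the "real-time indexing" bullet describes.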
SCORE Implementation Architecture
• Distributed agents that automatically extract relevant semantic metadata from structured and unstructured content
• Fast main-memory-based query engine with APIs and XML output
• CACS provides automatic classification (w.r.t. WorldModel) from unstructured text and extracts contextually relevant metadata
• Distributed agents that automatically extract/mine knowledge from trusted sources
• Toolkit to design and maintain the Knowledgebase
• Knowledgebase represents the real-world instantiation (entities and relationships) of the WorldModel
• WorldModel specifies the enterprise’s normalized view of information (ontology)
Example
Domain: Financial Services
Sub-domain: Equity Market
(other potential sub-domains: Fixed Income, Mutual Funds, …)
Semantic Metadata
Syntax Metadata
Content Enhancement Workflow
1. Extractor Agent for Bloomberg
• Scans text for analysis
• Metadata extracted automatically
• Creates asset (index) out of extracted metadata
• Asset, Syntax Metadata: Producer: BusinessWire; Source: Bloomberg; Date: Sept. 10 2001; Location: San Jose, CA; URL: http://bloomberg.com/1.htm; Media: Text
• Asset, Semantic Metadata: Company: Cisco Systems, Inc.
2. Categorization & Auto-Cataloging System (CACS)
• Scans text for analysis
• Classifies document into pre-defined category/topic
• Appends topic metadata to asset
• Asset, Syntax Metadata: unchanged
• Asset, Semantic Metadata: Company: Cisco Systems, Inc.; Topic: Company News
3. Knowledge Base
• Leverages knowledge to enhance metatagging
• Knowledge Base entries: Company: Cisco Systems; Ticker: CSCO; Exchange: NASDAQ; Industry: Telecomm.; Sector: Computer Hardware; Executives: John Chambers (CEO of); Competition: Nortel Networks (Competes with); Headquarters: San Jose
• Enhanced Content Asset (indexed), Syntax Metadata: unchanged
• Enhanced Content Asset (indexed), Semantic Metadata: Company: Cisco Systems, Inc.; Topic: Company News; Ticker: CSCO; Exchange: NASDAQ; Industry: Telecomm.; Sector: Computer Hardware; Executive: John Chambers; Competition: Nortel Networks; Headquarters: San Jose, CA
Other diagram labels: XML Feed, Semantic Engine
Content Asset Index Evolution
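The evolution of the asset can be sketched as a dictionary that gains semantic fields at each stage. The example below covers only the last step, where Knowledge Base facts about the already-tagged company are merged into the asset; the `enhance` function and the dictionary layout are assumptions for illustration, while the field values are those shown on the slide.

```python
# Knowledge Base facts keyed by company name (values from the slide).
knowledge_base = {
    "Cisco Systems, Inc.": {
        "Ticker": "CSCO",
        "Exchange": "NASDAQ",
        "Industry": "Telecomm.",
        "Sector": "Computer Hardware",
        "Executive": "John Chambers",
        "Competition": "Nortel Networks",
        "Headquarters": "San Jose, CA",
    }
}

# Asset as it looks after the extractor agent and CACS have run.
asset = {
    "syntax": {
        "Producer": "BusinessWire",
        "Source": "Bloomberg",
        "Date": "Sept. 10 2001",
        "Location": "San Jose, CA",
        "URL": "http://bloomberg.com/1.htm",
        "Media": "Text",
    },
    "semantic": {"Company": "Cisco Systems, Inc.", "Topic": "Company News"},
}


def enhance(asset, kb):
    """Return a copy of the asset with Knowledge Base facts added."""
    enhanced = {
        "syntax": dict(asset["syntax"]),
        "semantic": dict(asset["semantic"]),
    }
    facts = kb.get(asset["semantic"].get("Company"), {})
    for key, value in facts.items():
        # Never overwrite metadata that was extracted from the text itself.
        enhanced["semantic"].setdefault(key, value)
    return enhanced


enhanced_asset = enhance(asset, knowledge_base)
print(enhanced_asset["semantic"])
```

Using `setdefault` is a design choice made here so that text-derived metadata takes precedence over Knowledge Base facts; the slides do not say how conflicts are resolved.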
What is it? WorldModel™: Template infrastructure to organize and index content contextually
What does it consist of? Domains (categories) and domain-specific attributes
Examples
Equity WorldModel™ Definition
• Domain: Equity
• Equity-specific attributes: Company, Ticker, Industry, Sector, Executive, Headquarters
Sports WorldModel™ Definition
• Domain: Sports
• Sports-specific attributes: Sport Name, Location
• Sub-Domain: Golf
• Golf-specific attributes: Golfer, Tourney, Golf Course
• Sub-Domain: Football
• Football-specific attributes: Player, Team, League, Coach
Voquette WorldModel™
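One way to represent a WorldModel™ with sub-domains is a small tree in which each node carries its own attributes. The sketch below assumes that a sub-domain also inherits the attributes of its parent domain; the slides do not state this explicitly, so treat the inheritance rule, the `Domain` class and its method names as illustrative assumptions.

```python
class Domain:
    """A WorldModel domain or sub-domain with its specific attributes."""

    def __init__(self, name, attributes, parent=None):
        self.name = name
        self.own_attributes = list(attributes)
        self.parent = parent

    def attributes(self):
        """Own attributes plus (by assumption) those inherited from the parent."""
        inherited = self.parent.attributes() if self.parent else []
        return inherited + self.own_attributes


sports = Domain("Sports", ["Sport Name", "Location"])
golf = Domain("Golf", ["Golfer", "Tourney", "Golf Course"], parent=sports)
football = Domain("Football", ["Player", "Team", "League", "Coach"], parent=sports)

print(golf.attributes())
print(football.attributes())
```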
What is it? Extractor Agents: Intelligent software robots that work on structured content and automatically extract metadata that is relevant and meaningful to the domain/sub-domain at hand
How do they work?
• Extractor agents use the WorldModel™ definition for metadata extraction
• Extractor agents exploit the structure of content and automatically “pick up” meaningful metadata
• Write once, extract permanently; schedulable according to needs
• Can work on Web content, feeds, XML, corporate databases, etc.
• Extractor agents are specific to the structure of the content at hand
Example: Extractor Agent for CNNfn, using the Equity WorldModel™ (Company, Ticker, Industry, Sector, Executive, Headquarters)
• Pick up syntax metadata
• Pick up company
• Pick up ticker
• Pick up industry
• Pick up sector
• Pick up executives
• Pick up headquarters
Voquette Extractor Agents
What is it? Knowledge Base: Network of entity objects (significant pieces of information) and a representation of the real-world relationships (associations) between them
What does it consist of? Entities (person, location, organization, etc.) and Entity-Relationships
How does it work?
• Structured closely to the structure of the WorldModel™
• Entity and relationship template definitions for the domain at hand
• Works with knowledge extractor agents to collect instances of entities from trusted sources
• Automatically creates relationships between instances using type definitions
Equity WorldModel™: Company, Ticker, Industry, Sector, Executive, Headquarters
Equity Knowledge Base Definition
• Entity types: Company, Ticker, Exchange, Industry, Sector, Executives, Headquarters
• Relationship types: CEO of, Belongs to, Trades on, Represented by, Located at
Knowledge Base (example instances)
• Company: Cisco Systems
• Ticker: CSCO
• Exchange: NASDAQ
• Industry: Telecomm.
• Sector: Computer Hardware
• Executives: John Chambers (CEO of)
• Competition: Nortel Networks (Competes with)
• Headquarters: San Jose
Voquette Knowledge Base
What is it? CACS: Module that categorizes content and automatically creates metadata for that content
How does it work? Uses a hybrid of statistical, machine learning and knowledge-base techniques
Features
• Core competency: not only categorizes, but also catalogs (extracts metadata)
• Unique solution for semantic metadata extraction from unstructured content
• Flexibly adaptable for diverse domains
Input: structured content or unstructured content
Information exchange with the Equity Knowledge Base Definition for metadata creation (entity types Company, Ticker, Exchange, Industry, Sector, Executives, Headquarters; relationships CEO of, Belongs to, Trades on, Represented by, Located at)
Example output
• Topic: Company News
• Metadata extracted: Company: Convera; Ticker: CNVR; Exchange: NASDAQ; Industry: Content Management; Sector: Computer Software; Headquarters: Vienna, VA; Executives: Ronald Whittier
Voquette Categorization and Auto-Cataloging System (CACS)
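The hybrid approach (elsewhere in the deck called a "classification committee") can be pictured as several independent classifiers voting on a topic. The sketch below uses three deliberately trivial stand-ins for the statistical, knowledge-base and pattern-based members; none of them reflects how CACS is actually implemented, and all function names, keyword lists and topics other than “Company News” are hypothetical.

```python
import re
from collections import Counter


def keyword_vote(text):
    """Stand-in for a statistical classifier: counts topic keywords."""
    topics = {
        "Company News": ("announced", "acquire", "appointed"),
        "Earnings": ("quarter", "revenue", "profit"),
    }
    lowered = text.lower()
    scores = {t: sum(lowered.count(w) for w in words) for t, words in topics.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def knowledge_base_vote(text):
    """Stand-in for a knowledge-base technique: a known executive suggests company news."""
    known_executives = {"Ronald Whittier", "John Chambers"}
    return "Company News" if any(name in text for name in known_executives) else None


def amount_vote(text):
    """Stand-in for a third member: dollar amounts hint at an earnings story."""
    return "Earnings" if re.search(r"\$\d", text) else None


def committee(text, voters):
    """Majority vote over the members that express an opinion."""
    votes = Counter(v for v in (voter(text) for voter in voters) if v is not None)
    return votes.most_common(1)[0][0] if votes else None


story = "Convera executive Ronald Whittier announced a new product."
topic = committee(story, [keyword_vote, knowledge_base_vote, amount_vote])
print(topic)
```

Letting members abstain (return `None`) keeps a weak classifier from diluting the vote on content it has no signal for.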
Semantic Engine
What is it? Semantic Engine: Fast main-memory-based front-end query engine that enables the end user to retrieve highly relevant and personalized content via custom APIs
Features and Functionality
• Minimal input from user – system intelligent enough to provide only relevant content to user
• Deep levels of personalization
• Applications: Search, personalization, alerts, notifications, directory, routing, syndication
• Custom applications: Research Dashboard (demo)
Diagram: end users submit a user query to the Semantic Engine (backed by Content Enhancement Technology), and highly relevant content is returned through Search, Personalization, Directory, Alerts/Notifications, Syndication, Dashboard and Custom Apps.
Voquette Semantic Engine
• Focused relevant content organized by topic (semantic categorization)
• Automatic content aggregation from multiple content providers and feeds
• Related relevant content not explicitly asked for (semantic associations)
• Competitive research inferred automatically
• Automatic 3rd party content integration
Semantic Application Example – Research Dashboard
COMTEX Content Enhancement - Value-added metatagging
• COMTEX Tagging: limited tagging (mostly syntactic)
• Content “Enhancement”: rich semantic metatagging
• Value-added Voquette Semantic Tagging: relevant metatags added by Voquette to existing COMTEX tags:
– Private companies
– Type of company
– Industry affiliation
– Sector
– Exchange
– Company Execs
– Competitors
COMTEX Content Enhancement - Tag Normalization
• Voquette Knowledge Base: Company name: Merrill Lynch & Co.
• Source A Document: <company_name=Merrill Lynch, Inc.>
• Source A Document with normalized tag: <company_name=Merrill Lynch & Co.>
• Source B Document: <company_name=Merrill Lynch Corp.>
• Source B Document with normalized tag: <company_name=Merrill Lynch & Co.>
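A simple way to picture tag normalization is a lookup that reduces each company name to a key and maps it to the canonical name held in the Knowledge Base. The sketch below strips a few common corporate suffixes to build that key; the suffix list, the `CANONICAL` table and the function names are assumptions for illustration, and a production system would more likely rely on curated alias lists.

```python
import re

# Canonical names as held in the Knowledge Base, keyed by a simplified name.
CANONICAL = {"merrill lynch": "Merrill Lynch & Co."}

# Trailing corporate suffixes to ignore when matching (illustrative list).
SUFFIX = re.compile(r"[,\s]+(&\s*co\.?|inc\.?|corp\.?)$", re.IGNORECASE)

# Tag syntax as shown on the slide.
TAG = re.compile(r"<company_name=([^>]*)>")


def normalize(name):
    """Map a company-name variant to its canonical form, if one is known."""
    key = SUFFIX.sub("", name.strip()).lower()
    return CANONICAL.get(key, name)


def normalize_tags(document):
    """Rewrite every company_name tag in the document to the canonical name."""
    return TAG.sub(lambda m: f"<company_name={normalize(m.group(1))}>", document)


source_a = "Source A Document <company_name=Merrill Lynch, Inc.>"
source_b = "Source B Document <company_name=Merrill Lynch Corp.>"
print(normalize_tags(source_a))
print(normalize_tags(source_b))
```

Names with no canonical entry are returned unchanged, so unknown companies pass through without being altered.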
Technology: Manual
• Classification: Yes
• Metadata: Yes
• Features and Advantages: Intelligent, adaptable to changing business needs, high levels of accuracy, rapid integration and deployment, minimal upfront investment
• Disadvantages and Limitations: Extremely slow, high cost of maintenance and ownership; may not be possible to scale with very high volume; difficult to have uniformity across humans
Technology: Information Retrieval/Document Indexing
• Classification: No
• Metadata: No
• Features and Advantages: Keyword-based search
• Disadvantages and Limitations: Typically poor relevance if used alone on a large data set
Technology: Clustering
• Classification: May be
• Metadata: N/A
• Features and Advantages: User/Enterprise does not need to give taxonomy
• Disadvantages and Limitations: Many clusters might be meaningless; broad commercial success not yet demonstrated
Technology: Lexical/Natural language (NLP)
• Classification: N/A
• Metadata: No
• Features and Advantages: Often better than keyword-based search; natural language querying/phrases; good for summarizing a document
• Disadvantages and Limitations: Does not help beyond search and summarization; generally cannot associate one document with another (no inferencing)
Technology: Rules-based
• Classification: Yes
• Metadata: No
• Features and Advantages: Works well with complex taxonomies, high consistency
• Disadvantages and Limitations: Intelligence bounded, high cost of maintenance, high computation cost and possible scalability limitations
Classification & Extraction Technology Comparisons
Technology: Machine Learning/AI (Bayesian, HMM, Neural Network)
• Classification: Yes
• Metadata: No
• Features and Advantages: User/Enterprise can define taxonomy; combined with indexing, can lead to better keyword-based search by limiting search to a node in the taxonomy; broad variety of technology choices and good experience in applying the technology
• Disadvantages and Limitations: User needs to provide training set; retraining needed if taxonomy is changed; success dependent on training; usually handles unstructured documents/data only, not structured or semi-structured content
Technology: Thesaurus, Reference data, (Ontology)
• Classification: N/A
• Metadata: Limited
• Features and Advantages: Metadata limited to terms in reference data or ontology
• Disadvantages and Limitations: How is reference data kept up to date? Context is limited and applications are limited to narrow areas; sometimes “one size fits all”, good for Web search but not necessarily for Enterprise applications; power of relationships missing
Technology: Domain Model and Information Extractors
• Classification: Yes
• Metadata: Yes
• Features and Advantages: For structured and semi-structured data (Feeds, Web sites); Domain model allows user/enterprise to define contextually relevant metadata; allows more precise query formulation (attribute-value); homogenization/integration; semantic search
• Disadvantages and Limitations: Need substantial toolkit support for writing extraction, mapping heterogeneous sources to uniform domain model
Technology: Knowledge Base (Entities/Classes plus Relationships)
• Classification: Enhances
• Metadata: Enhances
• Features and Advantages: Extremely powerful, especially when combined with Domain Model; automatic metadata enhancement; very highly relevant search; beyond search (personalization, semantic associations)
• Disadvantages and Limitations: Requires creation and maintenance of knowledge base and access to trusted sources for mining/synthesizing knowledge
Classification & ExtractionTechnology Comparisons (Contd.)
Activity: Categorization of Web pages
• Traditional Effort: 50 pages/day/editor
• CET Effort: 1,000 pages/day (with human supervision) [at least an order of magnitude higher without supervision]
• Comments: Much higher quality metadata generation, in addition to higher quantity
Activity: Metatagging of news feeds
• Traditional Effort: 10-20 feeds (syntactic + semantic metadata); 100 feeds (syntactic metadata)
• CET Effort: 5,000-10,000 feeds/day (fully automatic)
• Comments: No human supervision needed
Activity: Metatagging of internal/enterprise research content
• Traditional Effort: 50-100 assets/day/research editor
• CET Effort: 500-1,000 assets (with human supervision)
• Comments: Human supervision supports higher quality metadata
Activity: Metatagging of content from multiple internal or external sources
• Traditional Effort: Content editors using internally developed tools typically manage 1 to 5 sources
• CET Effort: Single person can supervise automatic tagging of content from 20-50 sources
ROI Comparative Effort Chart
Deployment System Architecture
Toolkits (Workstation): NT (any system supporting JVM)
• Knowledge Base Toolkit
• Extractor Toolkit
• Scales with: more developers, more sources
Enterprise S/W (Server): Linux/Solaris
• Categorization and Auto Cataloging System
• Semantic Engine
• WorldModel™ Knowledge Base
• Scales with: higher performance, redundancy, more content
Measures
• Quality
– Categorization accuracy: Around 90% (domain and training dependent)
– Metadata extraction: limited only by WorldModel™ and KB (for which we have automated maintenance support)
– Relevance: near 100% (unlike IR techniques, typical precision/recall limitations do not apply when we have metadata)
• Scalability
– Millions of documents per server (for Semantic Engine)
– Unlimited number of documents due to distributed index seamlessly
spanning multiple servers
– Few to hundreds of content sources (distributed SW agents)
Measures (Continued)
• Performance
– Inclusion of new content source: 2 to 8 hrs
– Building WorldModel™ and Knowledge Base: 2 to 8 weeks per domain
for an effort leading to useful results (approx. 1 million entities and
relationships)
– Extraction – several documents per second (processing time)
– Near real-time search/personalization of new content and breaking
news (sub-minute, due to incremental indexing)
– 1 million queries per hour per server, or 1 to 10s of ms query
response/inference time due to main-memory indexing/data structures
• Robustness
– Semantic Engine has not needed to be rebooted for over 400 days!
– Many other engineering solutions (HW/SW redundancy) to meet any
SLA
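For a sense of scale, the quoted throughput of 1 million queries per hour per server can be converted into queries per second and an average time budget per query. The short calculation below does only that arithmetic; it assumes queries are spread evenly over the hour and handled one at a time, which is a simplification.

```python
queries_per_hour = 1_000_000

# Evenly spread over the 3,600 seconds in an hour.
queries_per_second = queries_per_hour / 3600

# Average time available per query if they are processed one after another.
budget_ms = 1000 / queries_per_second

print(f"{queries_per_second:.0f} queries/second")
print(f"{budget_ms:.1f} ms per query")
```

An average budget of a few milliseconds per query sits within the 1 to 10 ms end of the quoted response-time range, so the two figures are consistent with each other under this simple model.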
Voquette vs. The Rest: Reading and Classification (pages read and classified)
• Per Minute: Voquette 600 - 10,000 (batch mode); Average Human 1
• Per Hour: Voquette 36,000 - 600,000; Average Human 60
• Per Day: Voquette 864,000 - 14.4 Million; Average Human 480
• Per Year: Voquette 315 Million - 5.2 Billion; Average Human 120,000
Voquette vs. The Rest: Reading, Classification, Metadata Extraction, Normalization, Enhancement (pages read, classified, metadata extracted, normalized and enhanced)
• Per Minute: Voquette 30; Average Human 1
• Per Hour: Voquette 1,800; Average Human 60
• Per Day: Voquette 43,200; Average Human 480
• Per Year: Voquette 16 Million; Average Human 120,000
Quantitative Measures
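The per-hour, per-day and per-year figures follow from the per-minute rates by straightforward scaling. The sketch below makes the implied assumptions explicit: round-the-clock operation (24 hours, 365 days) for the software, and an 8-hour day over 250 working days for a human. Those schedules are inferred from the numbers rather than stated in the deck, and `scale` is a hypothetical helper.

```python
def scale(per_minute, hours_per_day=24, days_per_year=365):
    """Scale a per-minute rate up to per-hour, per-day and per-year totals."""
    per_hour = per_minute * 60
    per_day = per_hour * hours_per_day
    per_year = per_day * days_per_year
    return per_hour, per_day, per_year


# Classification only, low and high ends of the batch-mode range.
low = scale(600)
high = scale(10_000)

# Full pipeline (classification, extraction, normalization, enhancement).
full_pipeline = scale(30)

# A human reader: 8-hour days, 250 working days a year (inferred schedule).
human = scale(1, hours_per_day=8, days_per_year=250)

for label, figures in [("low", low), ("high", high),
                       ("full pipeline", full_pipeline), ("human", human)]:
    print(label, figures)
```

At the high end, 10,000 pages per minute scales to roughly 5.3 billion pages per year.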
Voquette Specifications: Semantic Engine & Knowledge Base Specs
• Queries per hour per server: 1 Million
• Query Response Time (lightly loaded server): 1 to 10 ms
• Query Response Time (heavily loaded server): 100 to 200 ms
• Semantic associations created per hour: 10,000
• Semantic associations per domain: Over 1 million
Quantitative Comparison (Continued)