Search Methods for Multidimensional Data


Abstract

NASA and its partners produce staggering amounts of data: petabytes per day for both earth and space science, with some missions producing over a petabyte/day individually. Examining all the data by hand is clearly impossible; progress has been made in automated discovery, but little has been done to enable the user to search for potential items of interest directly. The goal of our research is to address this gap.

Strategic Alignment:
OCT Roadmap TA11 (Intelligent Data Understanding, Data Lifecycle elements)
National Aero R&D Plan (organization and mining of safety data elements)

Initial TRL: 3

Martian Images
Partner: Planetary Data System
Customer: Scientists

• Search for images by metadata
• Scientists enter desired values
• Tradeoffs over matching values are used to rank images
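
The ranking idea in the bullets above can be illustrated with a minimal sketch. It assumes a linear falloff around each desired value and a weighted average as the tradeoff; the field names (`emission_angle`, `resolution_m`), tolerances, and weights are hypothetical and are not taken from the Planetary Data System.

```python
from dataclasses import dataclass


@dataclass
class Preference:
    target: float     # value the scientist would like to see
    tolerance: float  # distance at which the match score reaches zero
    weight: float     # relative importance in the tradeoff


def match_score(value, pref):
    """Linear falloff: 1.0 at the target, 0.0 at or beyond the tolerance."""
    return max(0.0, 1.0 - abs(value - pref.target) / pref.tolerance)


def rank_images(images, prefs):
    """Return (score, image_id) pairs sorted from best to worst match."""
    total_weight = sum(p.weight for p in prefs.values())
    scored = []
    for image_id, metadata in images.items():
        weighted = sum(
            p.weight * match_score(metadata[name], p) for name, p in prefs.items()
        )
        scored.append((weighted / total_weight, image_id))
    return sorted(scored, reverse=True)


# Hypothetical metadata records and desired values.
images = {
    "img_a": {"emission_angle": 5.0, "resolution_m": 6.0},
    "img_b": {"emission_angle": 20.0, "resolution_m": 0.5},
    "img_c": {"emission_angle": 2.0, "resolution_m": 1.0},
}
prefs = {
    "emission_angle": Preference(target=0.0, tolerance=30.0, weight=1.0),
    "resolution_m": Preference(target=0.5, tolerance=10.0, weight=2.0),
}
ranking = rank_images(images, prefs)
```

No image is excluded outright here; an image that misses one desired value can still rank well if it matches the more heavily weighted one.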

SEARCH METHODS FOR MULTIDIMENSIONAL DATA

Shawn Wolfe
Nikunj Oza, PhD
Yi Zhang (UCSC), PhD

Solution

Our approach is to adapt elements from several methods to address the problem.

• Relevance estimation concepts from information retrieval instead of strict constraint application.
• Multidimensional utility function from utility theory.
• Query refinement by data mining explicit and implicit user feedback.
• Multiattribute query specifications from database systems.
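
One way query refinement from explicit feedback might work is sketched below. This is a simple illustrative heuristic, not the method used in this research: an attribute's weight in the utility function grows when results marked relevant match it better, on average, than results marked not relevant. The attribute names and the `learning_rate` parameter are hypothetical.

```python
def refine_weights(weights, feedback, learning_rate=0.5):
    """Adjust attribute weights from explicit relevance feedback.

    weights:  dict mapping attribute name to its current weight
    feedback: list of (per_attribute_match_scores, is_relevant) pairs,
              where match scores lie in [0, 1]
    """
    relevant = [scores for scores, ok in feedback if ok]
    rejected = [scores for scores, ok in feedback if not ok]
    if not relevant or not rejected:
        return dict(weights)  # nothing to contrast, keep weights unchanged

    updated = {}
    for name, weight in weights.items():
        mean_relevant = sum(s[name] for s in relevant) / len(relevant)
        mean_rejected = sum(s[name] for s in rejected) / len(rejected)
        # Attributes that separate relevant from rejected results gain weight.
        updated[name] = max(0.0, weight + learning_rate * (mean_relevant - mean_rejected))

    total = sum(updated.values())
    if total == 0.0:
        return dict(weights)
    return {name: w / total for name, w in updated.items()}


# Hypothetical feedback: the user cares about altitude more than speed.
weights = {"altitude": 0.5, "speed": 0.5}
feedback = [
    ({"altitude": 0.9, "speed": 0.4}, True),
    ({"altitude": 0.8, "speed": 0.6}, True),
    ({"altitude": 0.2, "speed": 0.5}, False),
]
refined = refine_weights(weights, feedback)
```

Implicit feedback (for example, which results a user opens) could feed the same update with a smaller learning rate, though that choice is an assumption rather than something the poster specifies.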

Problem

No existing technology adequately supports a user who wants to search for data in today’s vast data sets:

• The strict query interpretation in databases often over-restricts or under-restricts the results.
• Data mining does not support ad hoc search as it is not user directed.
• Information retrieval is applied to text, not data.

As a result, the user misses important items in the data.
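
The over-restriction problem can be shown with a small sketch that contrasts a strict filter with a graded relevance score. The records, field names, and scale values below are hypothetical, and the linear closeness function is only one possible relaxation of a constraint.

```python
# Hypothetical records; record 1 narrowly misses both limits.
records = [
    {"id": 1, "altitude_ft": 10100, "speed_kt": 251},
    {"id": 2, "altitude_ft": 9900, "speed_kt": 249},
    {"id": 3, "altitude_ft": 30000, "speed_kt": 450},
]


def strict_query(records, max_altitude, max_speed):
    """Database-style filter: a record either satisfies every constraint or is dropped."""
    return [
        r["id"]
        for r in records
        if r["altitude_ft"] <= max_altitude and r["speed_kt"] <= max_speed
    ]


def closeness(value, limit, scale):
    """1.0 when within the limit, decaying linearly to 0.0 at limit + scale."""
    excess = max(0.0, value - limit)
    return max(0.0, 1.0 - excess / scale)


def ranked_query(records, max_altitude, max_speed):
    """Relevance-style search: every record gets a score, best first."""
    scored = []
    for r in records:
        score = 0.5 * closeness(r["altitude_ft"], max_altitude, 2000.0) \
            + 0.5 * closeness(r["speed_kt"], max_speed, 50.0)
        scored.append((score, r["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


strict_hits = strict_query(records, max_altitude=10000, max_speed=250)
ranked_hits = ranked_query(records, max_altitude=10000, max_speed=250)
```

The strict query silently drops record 1, while the ranked query keeps it near the top with a score just below a perfect match, so the user can still see it.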

Systems Operations Data
Partner: System-Wide Safety and Assurance Technologies
Customer: Operations and Maintenance

• Search begins with prototype of anomaly
• System ranks candidates by variation from prototype
• System automatically expands on user’s initial specification.
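
Ranking by variation from a prototype could be sketched as follows. The sketch assumes that "variation" is measured as Euclidean distance after dividing each attribute by its standard deviation across the candidates; the poster does not state the actual measure, and the sensor names and values are hypothetical.

```python
import math
import statistics


def rank_by_prototype(candidates, prototype):
    """Return (distance, candidate_id) pairs, closest to the prototype first."""
    names = list(prototype)

    # Scale each attribute by its spread so that units do not dominate the distance.
    spread = {}
    for name in names:
        sd = statistics.pstdev([c[name] for c in candidates.values()])
        spread[name] = sd if sd > 0 else 1.0

    ranked = []
    for candidate_id, values in candidates.items():
        distance = math.sqrt(
            sum(((values[n] - prototype[n]) / spread[n]) ** 2 for n in names)
        )
        ranked.append((distance, candidate_id))
    return sorted(ranked)


# Hypothetical anomaly prototype and candidate records.
prototype = {"egt_c": 720.0, "vibration": 0.9}
candidates = {
    "flight_1": {"egt_c": 600.0, "vibration": 0.2},
    "flight_2": {"egt_c": 710.0, "vibration": 0.8},
    "flight_3": {"egt_c": 650.0, "vibration": 0.5},
}
ranked = rank_by_prototype(candidates, prototype)
```

Without the scaling step, the temperature attribute would dominate the distance simply because its numeric range is larger.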

Safety Reports
Partner: Aviation Safety Reporting System
Customer: Safety Analysts

• Search over different field types (numbers, categories, text)
• Fine-tune performance by data mining past queries and results
• Users can also provide explicit feedback to refine results
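
Searching over mixed field types might combine one similarity function per type, as in the sketch below. The field names are hypothetical and do not reflect the actual Aviation Safety Reporting System schema; the Jaccard word overlap for text and the unweighted average are illustrative choices only.

```python
def numeric_similarity(a, b, value_range):
    """1.0 for equal numbers, falling linearly to 0.0 across value_range."""
    return max(0.0, 1.0 - abs(a - b) / value_range)


def category_similarity(a, b):
    """Categories either match or they do not."""
    return 1.0 if a == b else 0.0


def text_similarity(a, b):
    """Jaccard overlap between the lowercase word sets of two texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def report_score(query, report):
    """Average the per-field similarities into one relevance score."""
    parts = [
        numeric_similarity(query["altitude_ft"], report["altitude_ft"], 40000.0),
        category_similarity(query["flight_phase"], report["flight_phase"]),
        text_similarity(query["narrative"], report["narrative"]),
    ]
    return sum(parts) / len(parts)


# Hypothetical query and reports.
query = {
    "altitude_ft": 3000,
    "flight_phase": "approach",
    "narrative": "unstable approach go around",
}
report_1 = {
    "altitude_ft": 2500,
    "flight_phase": "approach",
    "narrative": "crew flew unstable approach and executed go around",
}
report_2 = {
    "altitude_ft": 35000,
    "flight_phase": "cruise",
    "narrative": "turbulence encounter in cruise",
}
scores = {"report_1": report_score(query, report_1),
          "report_2": report_score(query, report_2)}
```

Per-field weights could replace the plain average, and those weights are a natural target for tuning from past queries or explicit feedback.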

Contact:
Shawn Wolfe
Intelligent Systems (TI)
(650) 604-4760
<[email protected]>

Novelty/Contribution

• Decrease difficulty of finding important data
• Utilize strengths from multiple technologies
• Combine human and machine intelligence