MySQL Users Conf.04-19-2005
MIT Lincoln Laboratory1
Real-Time Sensor Data Warehouse Architecture Using MySQL Database
Jacob Nikom
MIT Lincoln Laboratory
The MySQL Users Conference 2005
19 April 2005
This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract# F19628-00-C-0002. Opinions, interpretations, recommendations, and conclusions are those of the author and are not necessarily endorsed by the United States Government.
Outline
• Introduction
• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
• Designing ROCC DMA using CIF architecture
• Summary
Outline
• Introduction
– Reagan Test Site (RTS) and its instrumentation
– What is RTS Operations Coordination Center (ROCC)?
– ROCC primary operations
– ROCC logical component block diagram
– ROCC modernization
– New ROCC Data Management Architecture
• Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
• Designing ROCC DMA based on CIF architecture
• Summary
Reagan Test Site (RTS) and its Instrumentation
• The Reagan Test Site (RTS) range instrumentation
– Multiple RF sensors collecting data in several regions of electromagnetic spectrum
– Multiple optical sensors collecting objects’ metrics and spectral characteristics
– Telemetry systems capable of tracking multiple targets
– Mobile and fixed ground safety instrumentation
What is RTS Operations Coordination Center (ROCC)?
[Diagram: Current DMA. Components: Sensors, Network, Flat Files, Data Analysis Algorithms, Decision Algorithms, Displays]
• RTS instrumentation is controlled by the ROCC
• ROCC primary operations
– Executes the prepared scenario for the acquisition session
– Manages the data flow from multiple sensors
– Processes the acquired data
– Provides operator displays to track and predict the path of space objects
– Stores the acquired data for later analysis and reporting
– Facilitates training and simulation of performed activities
What kind of system is ROCC?
Feedback control system block diagram
• Control is the process of making a system variable adhere to a particular value, called the reference value
• A system designed to follow a changing reference is called a tracking control system
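[Block diagram: the comparator subtracts the feedback signal b(t) from the reference input r(t) to form the error signal e(t). In the forward path, the controller turns e(t) into the actuating signal m(t), which drives the plant; the plant output is the controlled variable c(t). In the feedback path, the feedback processor converts c(t) into b(t).]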
ROCC is a tracking control system following the predefined reference input
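The loop in the block diagram can be sketched as a short discrete-time simulation. This is only an illustration, not the ROCC implementation: the proportional controller, the integrator plant, the unity feedback path, and the names `simulate_tracking` and `gain` are all assumptions made for the example.

```python
def simulate_tracking(reference, gain=0.5, initial_output=0.0):
    """Run a discrete-time feedback loop and return c[k] after each step.

    Assumed (hypothetical) building blocks:
      - feedback processor: unity feedback, b = c
      - controller: proportional, m = gain * e
      - plant: integrator, c_next = c + m
    """
    c = initial_output
    outputs = []
    for r in reference:
        b = c            # feedback path: feedback signal b(t)
        e = r - b        # comparator: error signal e(t) = r(t) - b(t)
        m = gain * e     # controller: actuating signal m(t)
        c = c + m        # plant: controlled variable c(t)
        outputs.append(c)
    return outputs


# The reference input steps to 10 and holds; the loop should converge to it.
reference = [10.0] * 20
outputs = simulate_tracking(reference)
final_error = abs(reference[-1] - outputs[-1])
print(f"first output {outputs[0]}, final error {final_error:.2e}")
```

With a gain of 0.5 the error halves at every step, so the controlled variable closes in on the reference geometrically.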
Current ROCC DMA Block Diagram
[Block diagram. Components: Planning, Reference Data, Sensors, Simulation, Data Plant, Automatic Real-Time Processing & Analysis (Tracking, Fusion, Classification, Identification, Trajectory Estimation), Manual Processing & Analysis (Displays, Voice, Operators), Output Data, Report / Data Analysis. Loop: tactical decision control loop.]
• ROCC controls the data acquisition, analysis and distribution processes
• Maximizes the quality of delivered data over a specified time
ROCC Modernization
• Obsolete system hardware
– Old central processors and boards are no longer supported
– Not enough computational power to perform new tasks
– Old components and interfaces are incompatible with modern technology
• Aging system software
– Centralized monolithic architecture
– Flat files for storing data
– Use of old procedural languages
– Alphanumeric displays
• Modernized system
– Industry standard 32/64-bit Xeon or Opteron servers
– Software vendor independence: Linux and Java
– Database-based storage
– Distributed architecture using publish/subscribe paradigm
– Graphical user interface for visualization tools
– Targeted dataflow rates: 5 MB/s (sustained), 10 MB/s (peak)
– Data accumulation rate: 1 TB/year
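As a rough consistency check on the targeted rates, the arithmetic below estimates how much acquisition time per year corresponds to 1 TB. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and that every stored byte arrives at exactly the quoted rate; the slides state neither, so the result is only an order-of-magnitude figure.

```python
SUSTAINED_RATE_BYTES_PER_S = 5 * 10**6   # 5 MB/s sustained (decimal MB assumed)
PEAK_RATE_BYTES_PER_S = 10 * 10**6       # 10 MB/s peak (decimal MB assumed)
YEARLY_VOLUME_BYTES = 10**12             # 1 TB/year (decimal TB assumed)

# Time needed to accumulate one year's volume at each rate.
seconds_at_sustained = YEARLY_VOLUME_BYTES / SUSTAINED_RATE_BYTES_PER_S
hours_at_sustained = seconds_at_sustained / 3600
hours_at_peak = YEARLY_VOLUME_BYTES / PEAK_RATE_BYTES_PER_S / 3600

print(f"{seconds_at_sustained:.0f} s = {hours_at_sustained:.1f} h at the sustained rate")
print(f"{hours_at_peak:.1f} h at the peak rate")
```

Under these assumptions, 1 TB/year corresponds to roughly 56 hours of acquisition at the sustained rate, which suggests the sensors stream at full rate only during acquisition sessions rather than continuously.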
New Data Management Architecture
• ROCC data management challenges
– Support powerful high-precision instrumentation with near real-time response
– Support an intensive and costly data collection process involving many human operators, with a high level of reliability
– Support data analysis leading to changes in the data acquisition environment
– Handle a wide range of transaction types, from simple real-time record reads and inserts to complex multidimensional analytical queries
– Manage a combination of streaming data with traditional structures
– Provide request management, configuration management and data quality management capabilities
• Search for new data management architecture
– New system represents conceptual change from the old architecture
– Instrumentation and Control software traditionally concentrates on algorithm development and lacks good data architecture
– Need for framework supporting “analysis – decision – execution” paradigm
– Enterprise software is a leading implementer of distributed architecture and publish/subscribe paradigm
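The publish/subscribe paradigm mentioned above can be illustrated with a minimal in-process sketch. The `MessageBus` class, the topic names, and the message layout are hypothetical; the real system is described as distributed, whereas this example only shows the decoupling between producers and consumers.

```python
class MessageBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = {}  # topic name -> list of handler callables

    def subscribe(self, topic, handler):
        """Register a callable to receive every message published on a topic."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
ods_rows = []        # stands in for a database writer
display_queue = []   # stands in for an operator display

bus.subscribe("tracks", ods_rows.append)
bus.subscribe("tracks", display_queue.append)

# The publisher does not know who consumes the message.
delivered = bus.publish("tracks", {"track_id": 7, "range_m": 150000.0})
undelivered = bus.publish("telemetry", {"frame": 1})
print(delivered, undelivered)
```

The publisher only names a topic, so new consumers (a second display, an archive writer) can be added without touching the producing code.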
Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture
– What is Corporate Information Factory (CIF)?
– CIF data flow diagram
– CIF data
– CIF layers
– CIF logical component block diagram
• Designing ROCC data management architecture using CIF architecture
• Summary
What is Corporate Information Factory (CIF) ? *
• Information ecosystem is a model of corporate information processing
– “CIF is the physical embodiment of the notion of an information ecosystem”
• CIF consists of the following components
– External world
– Applications
– An integration and transformation layer (I & T layer)
– An operational data store (ODS)
– A data warehouse (DW) with current and historical detailed data
– One or more data marts
– An internet and intranet
– A metadata repository
– An exploration and data mining warehouse
– Alternative (secondary) storage
– Decision support system (DSS)
• CIF approach could be used for modeling information processing in any organization (“forest vs. trees” view)
* W.H. Inmon, Claudia Imhoff, Ryan Sousa, “Corporate Information Factory”, 2nd edition, Wiley, 2000
CIF Data Flow Diagram
[Data flow diagram, labels grouped by layer:
External world: enterprise transactions, Internet, Enterprise Resource Planning (ERP), external data
Application layer: CRM (tx), eComm (tx), ERP (tx), BI (tx)
Integration & Transform layer: data acquisition, reference data, historical reference data
Operational layer: ODS, operational reports
Warehouse layer: DW (raw detailed data), primary storage management, metadata management, alternative storage
Report & Analysis layer: data delivery, exploration warehouse, data mining warehouse, statistical analysis, DSS applications, data marts (Finance, Sales, Marketing, Accounting), eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt)]
CRM = Customer Relationship Management
BI = Business Intelligence
CIF Data
• External data
– Data is defined outside the corporation and may contain erroneous, redundant, or unnecessary items
– Data format is defined outside the corporation, so reformatting may be required
• Reference data
– Standardizes commonly used names for important and frequently used information
– Allows consistent interpretation of corporate data across departments
– Can serve as aliases for common, frequently referenced names
• Historical data
– Volume of data: the longer the history, the more data
– Usefulness of data: recent data is more useful than older data
– Granularity of data: older data is likely to be used at the summary level
[Figure: corporate timeline (ancient history, recent history, most current activity, immediate future) showing which data the DW, the ODS, and the applications hold]
CIF Layers
• Application layer
– Interacting directly with end user
– Gathering detailed transaction data
– Auditing and adjusting data
– Editing data
• Integration and transformation layer
– Combining non-integrated data from multiple applications
– Transforming external data into corporate data
– Creating appropriate metadata
– Mathematical transformation
– Reformatting and resequencing
[Figure: application-layer transaction systems CRM (tx), eComm (tx), ERP (tx), BI (tx)]
CIF Layers (Continued)
• Operational layer
– Subject-oriented
– Integrated
– Volatile
– Current-valued
– Detailed
– Normalized
• Warehouse layer
– Subject-oriented
– Integrated
– Nonvolatile
– Time-variant
– Comprised of both summary and detailed data
– Summary data optimized for Report & Analysis queries
– Normalized and de-normalized data
• Report & Analysis layer
– Statistical analysis
– Exploration reporting
– Data mining reporting
– DSS analysis and reporting: Finance, Sales, Marketing, Accounting
[Figure: ODS; Data Warehouse; reports eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt); Statistics]
CIF Logical Component Block Diagram
[Block diagram. Components: Corporate Goals, Reference Data, Data Plant, Applications, Operational Data Store, Real-time DSS, Data Warehouse, Long-term DSS, Output Data, Corporate Report. Loops: tactical decision control loop, strategic decision control loop.]
• The system controls the corporation's resources using real-time and long-term DSS
• Maximizes the expected profit of the corporation over a specified time
Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture (DMA)
• Designing ROCC DMA using CIF architecture
– ROCC data flow diagram
– ROCC data
– ROCC layers
– ROCC logical component block diagram
– Database selection
– Three dangers of database design
• Summary
ROCC Data Flow Diagram
[Data flow diagram.
Layers: External world, Integration & Transform layer, Operational layer, Warehouse layer, Report & Analysis layer.
Components: four RIBs, multicast middleware, data acquisition, sensor control data, reference data, operational data, ODS, DSS applications (Best Choice, Smoother, Data Fusion, Classifier), Quick Look reports, short-term reporting & analysis, DW, archived data, secondary storage, data mining warehouse, long-term reporting & analysis, Space, data marts (Planning, …, Post overview, BET, Impact, Bias modeling).]
RIB = ROCC Interface Box
ROCC Data
• External data
– Data is defined outside of ROCC. Could have erroneous, redundant, or unnecessary items
– Data format is defined outside of ROCC. Reformatting or object conversion could be required
• Reference data
– Comprise geophysics models and constants necessary for external data interpretation
– Comprise common locations, sensor names, names of computers and programs
– Comprise the user names, passwords, access rights and privileges
• Historical data
– Operational data being migrated to the warehouse become historical data
– Detailed historical data are used to produce summarized historical data
– Historical data are only inserted, never updated
• Planning data
– Comprise configuration data for the sensors’ acquisition procedures
– Comprise ROCC software components’ configuration data (XML format)
– Comprise data to plan specific activities to acquire space objects’ coordinates
ROCC Layers
• External world
– Simultaneous output from multiple sensors up to 10 MB/s
– Capable of producing data autonomously
– Capable of working under the guidance of DSS applications
– Produces data as streams with considerable output rates
• Integration and transformation layer
– Plays a vitally important role in reconciling the incoming external data content and format with the internal data requirements
– Converts incoming data into appropriate Java objects
– Creates necessary metadata
– Mathematical transformation
– Reformatting and resequencing
[Figure labels: Feedback from DSS applications; RIB (four boxes)]
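The integration and transformation steps listed above (object conversion, mathematical transformation, resequencing) might look like the following sketch. The slides describe Java objects; Python is used here only for brevity. The binary record layout, the `Measurement` fields, and the kilometre-to-metre conversion are hypothetical, since the slides do not specify any sensor format.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire format: big-endian double timestamp (s), then three
# 32-bit floats: range (km), azimuth (deg), elevation (deg).
RECORD = struct.Struct(">dfff")


@dataclass(frozen=True)
class Measurement:
    time_s: float
    range_m: float
    az_deg: float
    el_deg: float


def decode(packet: bytes) -> Measurement:
    """Convert one external record into an internal object (object conversion)."""
    time_s, range_km, az_deg, el_deg = RECORD.unpack(packet)
    # Mathematical transformation: normalize to a common physical unit.
    return Measurement(time_s, range_km * 1000.0, az_deg, el_deg)


def integrate(packets):
    """Decode a batch of packets and resequence them by timestamp."""
    return sorted((decode(p) for p in packets), key=lambda m: m.time_s)


# Two packets that arrive out of order.
packets = [
    RECORD.pack(2.0, 151.0, 45.5, 30.0),
    RECORD.pack(1.0, 150.0, 45.0, 30.5),
]
measurements = integrate(packets)
print(measurements)
```

Doing the unit conversion and ordering at this layer means everything downstream can assume one set of units and time-ordered data.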
ROCC Layers (continued)
• Operational layer
– Subject-oriented: focuses on basic transaction processing. Inserts and reads the streams of integrated and transformed sensor data (tracks, IDs, control blocks, etc.)
– Integrated: physical unification and cohesiveness
  • Uniform key structures
  • Table naming conventions
  • Common physical units and coordinate systems
  • Data layouts and metadata
– Volatile: ODS data can be updated (replaced) as a normal part of processing. After the acquisition session is done, the data are moved to the DW
– Current-valued: ODS data values relate to the current event (the current acquisition session). For the next mission the ODS will be updated and its content will be moved to the DW (data migration)
– Detailed: the ODS contains the inserted values of the published sensor objects and is not expected to hold summary data
– Normalized: the ODS contains normalized data
– Decision Support System applications: make real-time operational decisions such as ID assignment and sensor allocation
[Figure: ODS and the DSS applications (Best Choice, Smoother, Data Fusion, Classifier)]
ROCC ODS Specifics
• Data streams of objects
– Streams of measurements usually don’t have very complex structures
– Object-relational mapping is straightforward and not computationally intensive
• Indices
– High-speed insertion rules out the use of indices
– The relatively small size of the ODS makes it possible to work without indices
– Indices do exist in the DW
• Real-time DSS feedback
– Could control the sensors, which in turn influences the input data
– Typical analytical applications assume that the data producer does not change during the query
[Figure: Primary System and Secondary System, each with an ODS, connected over the network to the DW and the Archive System]
Additional benefits
• Necessary operations could be performed during the copying
• Two operational databases could be used in parallel right after the acquisition
• Fault-tolerance (primary and secondary ODS)
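The ODS life cycle described on these slides (index-free inserts during a session, then migration into an indexed DW) can be sketched as follows. The example uses Python's built-in SQLite purely as a stand-in so that it runs anywhere; the presentation's system uses MySQL. Table names, column names, and the `record`/`migrate` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ODS table: no indices, so inserts stay cheap during acquisition.
    CREATE TABLE ods_track (session_id INTEGER, time_s REAL, range_m REAL);
    -- DW table: same layout, indexed for later analytical queries.
    CREATE TABLE dw_track  (session_id INTEGER, time_s REAL, range_m REAL);
    CREATE INDEX dw_track_session ON dw_track (session_id, time_s);
""")


def record(session_id, time_s, range_m):
    """Insert one measurement into the ODS during an acquisition session."""
    conn.execute(
        "INSERT INTO ods_track VALUES (?, ?, ?)", (session_id, time_s, range_m)
    )


def migrate(session_id):
    """Move a finished session from the ODS to the DW in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO dw_track SELECT * FROM ods_track WHERE session_id = ?",
            (session_id,),
        )
        conn.execute("DELETE FROM ods_track WHERE session_id = ?", (session_id,))


record(1, 0.0, 150000.0)
record(1, 0.1, 149950.0)
migrate(1)

ods_count = conn.execute("SELECT COUNT(*) FROM ods_track").fetchone()[0]
dw_count = conn.execute("SELECT COUNT(*) FROM dw_track").fetchone()[0]
print(ods_count, dw_count)
```

Running the copy and the delete in one transaction keeps a session from being lost or duplicated if the migration is interrupted.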
ROCC Layers (continued)
• Historical (data warehouse) layer
– Subject-oriented: organized like the ODS around major ROCC entities, but focused on the modeling and analysis of data
– Integrated: data migrated into the DW from the ODS are integrated with the rest of the DW data
– Time-variant: every datum in the data warehouse is identified with a particular time period. Summarized data are correct only for the period with which the corresponding detailed data are identified
– Non-volatile: there are no updates in the warehouse, only inserts. The past cannot be changed, only expanded
– Comprised of both summary and detailed data: once detailed data migrate from the ODS into the DW, they become part of history. In addition to the detailed historical data, the DW contains summary data, pre-calculated to reduce analytical query times
– ROCC DW specifics: the ROCC DW does not use a multidimensional data model yet, only summarized tables
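A pre-calculated summary table of the kind described for the DW can be sketched as below. SQLite again stands in for MySQL so the example is self-contained; the table layout, the sensor names, and the choice of summary columns (time span, point count, mean SNR) are assumptions, not the ROCC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detailed historical data: inserted once, never updated.
    CREATE TABLE dw_track (session_id INTEGER, sensor TEXT, time_s REAL, snr_db REAL);
    -- Summary data: one row per session and sensor, tagged with its time period.
    CREATE TABLE dw_track_summary (
        session_id INTEGER, sensor TEXT,
        start_s REAL, end_s REAL, n_points INTEGER, mean_snr_db REAL);
""")

conn.executemany(
    "INSERT INTO dw_track VALUES (?, ?, ?, ?)",
    [
        (1, "radar_a", 0.0, 20.0),
        (1, "radar_a", 1.0, 22.0),
        (1, "radar_b", 0.5, 15.0),
    ],
)

# Pre-calculate the summary once so analytical queries can skip the detail rows.
conn.execute("""
    INSERT INTO dw_track_summary
    SELECT session_id, sensor, MIN(time_s), MAX(time_s), COUNT(*), AVG(snr_db)
    FROM dw_track
    GROUP BY session_id, sensor
""")

summary = conn.execute("""
    SELECT sensor, start_s, end_s, n_points, mean_snr_db
    FROM dw_track_summary ORDER BY sensor
""").fetchall()
print(summary)
```

Each summary row carries its own start and end time, which reflects the time-variant property: the aggregate is valid only for the period covered by its detail rows.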
ROCC Layers (continued)
• Analysis and Reporting layer
– Continuous automatic monitoring of sensor metric performance
– Example: Angle Bias Modeling using ROCC Data Warehouse
• What is Angle Bias Modeling?
– Creation of a mathematical model to describe differences between reported and actual antenna pointing positions
[Figure: raw pointing information, bias (Δ), and corrected pointing information (adjusted pointing using biases)]
[Diagram. Components: Sensor Control System, RIB, ODS, Data Warehouse, Bias Modeling Application. Labeled flows: sensor data collection, storing sensor data streams, real-time queries, data migration, analytical queries, bias model coefficients.]
Angle Bias Modeling using ROCC Data Warehouse
Organization of Sensor-Specific Summary Track Data in the Warehouse
– Observed Data: Time, Range, Az, El, Iono Corr, Tropo Corr, SNR
– Truth Data (time-aligned and in sensor coordinates): Range, Az, El
– Residual Data: Delta Rng, Delta Az, SNR, Source
Bias Modeling Application Data Flow
[Diagram. Inputs from the Data Warehouse: Observed Data, Atmospheric Data, Truth Data. Steps: Generate Residuals, Residual Data, Multivariate Regression, Bias Model Analytic Equation, Bias Model Coefficients. Outputs: Report, Sensor Control System. Loop: strategic decision control loop.]
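The residual-then-regression flow named on this slide can be sketched in a few lines. The actual bias model equation is not given in the presentation, so the model used here (a constant plus sine and cosine terms in azimuth) is purely an assumption, as are the function names and the synthetic data. The fit uses ordinary least squares via the normal equations.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_bias_model(truth_az_deg, observed_az_deg):
    """Fit residual = c0 + c1*sin(az) + c2*cos(az) (hypothetical model form)."""
    # Step 1: generate residuals (observed minus truth).
    residuals = [o - t for o, t in zip(observed_az_deg, truth_az_deg)]
    # Step 2: multivariate regression via the normal equations (X^T X) c = X^T y.
    rows = [
        [1.0, math.sin(math.radians(a)), math.cos(math.radians(a))]
        for a in truth_az_deg
    ]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, residuals)) for i in range(3)]
    return solve(xtx, xty)


# Synthetic, noise-free data generated from known coefficients.
true_coeffs = (0.02, 0.01, -0.005)
truth_az = [float(a) for a in range(0, 360, 30)]
observed_az = [
    t
    + true_coeffs[0]
    + true_coeffs[1] * math.sin(math.radians(t))
    + true_coeffs[2] * math.cos(math.radians(t))
    for t in truth_az
]
fitted = fit_bias_model(truth_az, observed_az)
print([round(c, 6) for c in fitted])
```

In a setup like the one on the slide, the fitted coefficients would be what gets reported and handed back to the sensor control system; here they simply reproduce the values used to generate the synthetic data.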
ROCC Logical Component Block Diagram
[Block diagram. Components: Planning, Reference Data, Sensors, Simulation, Data Plant, Operational Data Store, Tactical real-time DSS, Tracking, Fusion, Classification, Identification, Trajectory Estimation, Displays, Voice, Operators, Data Warehouse, Strategic long-term DSS, Bias Modeling, Sensor Comparison, Operators, Output Data, Report / Data Analysis. Loops: tactical decision control loop, strategic decision control loop.]
• ROCC controls the RTS resources using tactical and strategic DSS
• Maximizes the quality of collected data over a specified time
Database Selection
Comparison criteria (qualitative values)

Criterion                  MySQL     Oracle    DB2 (IBM)  SQL Server (Microsoft)  PostgreSQL
Speed                      High      High      High       High                    Low
Sophistication             Moderate  High      High       High                    High
Reliability                High      High      High       Moderate                Low
Administration simplicity  High      Low       Low        Moderate                High
Standardization            High      Moderate  Moderate   Moderate                Moderate
Savings                    High      Low       Low        Low                     High

• The same server should work adequately for both ODS and DW
• Deficiency in sophistication could be mitigated by custom programming
Three dangers of ROCC DMA design
• “Balkanization” of data
– Different groups of data have different designs
– Attempts to fit data definitions into the requirements of an existing tool
– Increases the maintenance cost in the long run
• Dialectism
– Usage of specific database dialects
– Deviation from existing SQL standards
– Locks the user in to a specific vendor
• “Dirty” repository design
– Part of the data is stored in the database, another (closely related) part in the file system
– Duplication of data between database and file system
– Increases the maintenance cost
Outline
• Introduction
• Corporate Information Factory (CIF) for Data Management Architecture
• Designing ROCC data management architecture using CIF Architecture
• Summary
Summary
• Modernization of the ROCC calls for a new type of data management architecture
– New high-performance hardware
– Significant increase in the volumes of data generated and managed
– Introduction of new services
• CIF satisfies the requirements
– Designed to support large-scale information systems
– Effectively manages different types of information queries
– Provides flexibility in distributing data between multiple producers and consumers
• ODS and DW represent two types of repositories for information requests
– ODS supports near real-time storage requirements and targeted, low-granularity queries
– DW is used for complex queries against summary-level data
• ODS and DW are parts of different control loops
– ODS provides information for tactical decisions about near real-time data acquisition
– DW delivers feedback for strategic decisions leading to system improvements
• MySQL is a good fit for ODS and DW databases
– Good performance for fast queries in ODS
– Capable of storing large amounts of data in DW
– Simple installation and licensing allow many independent servers to run inside one system, used as ODS, DW, data marts, etc.
– Excellent Java support allows seamless integration with the rest of the software