Upload
trantram
View
214
Download
0
Embed Size (px)
Citation preview
This Conference brought to you bywww.ttcus.com
Linkedin/Group:Technology Training Corporation
Technology Training Corporation
@Techtrain
Corporationwww.ttcus.com
Optimizing Data StorageOptimizing Data Storage and Management for Bi D t i th F d lBig Data in the Federal GovernmentGreg Gardner, PhD, CISSPColonel, Infantry, U.S. Army (Retired)Chief Architect Defense and IntelChief Architect, Defense and Intel SolutionsNetApp
2
Conference Agendas…and Baseball Lineupsand Baseball Lineups
No. 1, or lead off, hitter: Traditionally, lead off hitters were fast and reached base often. SES
No. 2 hitter: Managers often used a hitter with exceptional bat control in the number 2 spot. RADM
No. 3 hitter: This spot usually is reserved for the team's best hitter, who often hits for a high
d till h IDAaverage and still has power. No. 4, or cleanup, hitter: This batter is usually the
t ' bi t h th t
IDA
team's biggest home-run threat.
3NetApp Confidential - Internal Use Only
NIST
Conference Agendas…and Baseball Lineupsand Baseball Lineups
No. 7 and 8 hitters: These players often are weak hitters who are more important to the team forhitters who are more important to the team for their defensive capabilities. The No. 7 hitter might have a little more pop in his bat that can drive in p pthe players hitting in front of him.
The No. 8 batter is often light-hitting, but ideally AARON
g g ycan reach base on walks.
This allows the manager to use the No. 9 hitter to GREG
sacrifice, making his otherwise sure out into one that at least can advance the baserunner.
/ G
4NetApp Confidential - Internal Use Only
ALAN / GERARD
All This Data and Not Even to the “Peak”
Peak of Inflated ExpectationsVISIBILITY
Slope of Enlightenment
Plateau of Productivity
Technology TriggerTrough of Disillusionment
Slope of Enlightenment
Estimated size of the35 Zettabytes 5 Billion
Smart phones
TIME
Estimated size of the digital universe in 2020
Smart phones
30 BillionPieces of new content to
80%UnstructuredPieces of new content to
Facebook per month
5
Unstructureddata
Data Analytics PlatformPlatform
Intelligent Search Real Time Search Index
UserRules Engine
Man
agem
ent
Big Data Analytics Scale Out Platform Fault Tolerant
N1 N2 N3
Parallel Analytics Platform
Visualization
M Data ArchivalN4 NN…
Nodes
Data CollectorsData
C Data
Hub Location
CollectorsData CollectorsData
CollectorsData CollectorsAgent
Adapters
Agent
Adapters
©2012 Zaloni, Inc. All Rights Reserved.
Edge Location Edge Location
Coke’s Smart Dispenser Monitoring and Analytics
Micro-dispensing Technology to serve 100+ drinks
Cartridges store concentrated ingredients in dispenser and are RFID enabled.enabled.
RFID chips detect supplies and call-home
Other call home data: Daily debug, Diagnostics, Product usage
100,000 Freestyles in 2012
©2012 Zaloni, Inc. All Rights Reserved.
Coke Freestyle Hadoop Analytical Architecture
Collector
Scale out Real time
Data collection
.
.Dispenser
Collector..
Collector
.Dispenser
Agent Collector
.
Analytics
Local MaintainerHDFS
N1 N2 N3 NN…
Hadoop ClusterHBase
Analytics PlatformRules Engine
Visualization N1 N2 N3 NN…
Lucene IndexIntelligent Search
Visualization
Regional Analyst / Manager
©2012 Zaloni, Inc. All Rights Reserved.
National Sales Director
Local Data Capture for Machine Maintenance and Stockage
©2012 Zaloni, Inc. All Rights Reserved.
Regional Metrics and Intelligent Search
©2012 Zaloni, Inc. All Rights Reserved.
National Level Metrics and Visualization
©2012 Zaloni, Inc. All Rights Reserved.
Same model for MilitMilitaryConditions Based M i tMaintenance(CBM)??
CBM+ Hadoop Analytical Architecture
Scale out Real time
Data Bn Level lid tiEdge-T Collector collection consolidation
(BCCS Stack)
Edge-T Collector
Analytics
Crew ChiefAccessing 2410
HDFS
N1 N2 N3 NN…
Hadoop ClusterHBase
Analytics PlatformRules Engine
Visualization
MechanicAccessing LOGSA
& Parts N1 N2 N3 NN…
Lucene IndexIntelligent Search
Visualization& Parts
©2012 Zaloni, Inc. All Rights Reserved.
Avn Bde A/S4 Briefing Cdr on Maint Status
Local Data Capture / Retreival of M i t d L i ti l i f tiMaintenance and Logistical information
Insert AILA Network slideData
Condition BasedMaintenance (CBM)Maintenance (CBM)
PREVENTIVEPREVENTIVE INDICATORSINDICATORS DIAGNOSTICSDIAGNOSTICS PROGNOSTICSPROGNOSTICS ONON--CONDITIONCONDITIONPREVENTIVEPREVENTIVE INDICATORSINDICATORS DIAGNOSTICSDIAGNOSTICS PROGNOSTICSPROGNOSTICS ONON--CONDITIONCONDITION
• Reactive Maintenance
• Time Based
• Digital Source Collector Installation• Knowledge Development• Digital Source Collector Installation• Knowledge Development
• Proactive Maintenance
• ‘On Condition’
CBM Program The Purpose of Army Maintenance is to Generate Combat Power.
• Time Based Inspection/Overhaul
• Fault Diagnosis• Remaining Useful Life Calculation• Inspection Targeting
• Fault Diagnosis• Remaining Useful Life Calculation• Inspection Targeting
• On ConditionInspection/Overhaul
CBM Program Objectives:
• Decrease Maintenance Burden on the Soldier
Main Rotor HubMain Rotor ShaftBifilarsSwashplate ASSYSwashplate Guide
M/R SpindleSpherical BearingM/R Spindle Tie RodControl HornM/R Shaft ExtenderM/R D
T/R BladeT/R Pitch Change HornPC Links
T/R Pitch Change ShaftT/R Pitch Change Bearing
The Purpose of Army Maintenance is to Generate Combat Power.AR 750-1
• Increase Platform Availability and Readiness
• Enhance Safety
Swashplate Bearing**PC Links
M/R Damper
Tail Rotor GB**Retention Plates (2)Gearshaft
Viscous Bearings (4)**
M/R BladeM/R Blade Tip CapM/R Blade Expandable PinM/R Blade Cuff
T/R Driveshafts (7)**• Reduce Operations &
Support (O&S) Costs
Key CBM Enablers• Digital Source Collectors M i T i i M d l Engine (No 1 & 2) Oil C l A i l F
Intermediate GB**VibrationAbsorbers (2)
Digital Source Collectors• Flight Line Diagnostics• Data Fusion/Analysis
Currently Monitored by Vibration
Main Transmission ModuleAccessory ModuleGenerators (2)Hydraulic Pumps (2)Planetary Carrier
Engine (No. 1 & 2)Driveshafts (2)**Input Modules (2)
Oil Cooler Axial FanOil Cooler Fan Bearing**APU
This is NOT industry best practice…
Aviation Logistics Data
*2410IMMC
OSMISCEAC*OLR
IMMCCOBRAIMMC
*SOF/ASMAMCOM SAFETY
DailyYearly Daily
Quarterly
Weekly Yearly
Data sources currently being
displayed in FMAS marked with *
FMDR
*ACEIMMC
*AWRPIMSOL
Daily
*AMTracksAMCOM SAFETY
Daily
*LAR’SIMMC
Daily
*RIDB/Gold Book
*WOLFLOGSA
PIMSOL
MonthlyTOP
Live LinkDaily
*ACLOCPIMSOL
Daily
*RMISMonthly
RIDB/Gold BookLOGSA
Daily
TIER
Live Link Data Subscribers
Selected Data AccessDaily
*LMPLOGSA
Daily
RMISPIMSOL
CDDBLOGSA
D ilULLS HUMS
MDATPCMCIA
C d
SubscribersAED
Q-Tec
*PQDRAED
Monthly
18
Laptops(PC/QC/Shops)
Daily
BATTALION SERVERS
MDAT Card
DailyAC Laptops
What’s The Problem?
I tiIncentives
Culture
19NetApp Confidential - Internal Use Only
Big Data ABCs
Analytics for extremelylarge datasets
GainInsight
Big
KeepEverything
Go Fast
gData
Performance for data intensive workloads
Secure boundlessdata storageg
20NetApp Confidential – Limited Use
Big Data Market Requirements
Distributed compute farms with standard software
Very high streaming bandwidth
Boundaryless containers (10’s PBs)standard software
Simpler systems, faster to deploy
Optimized balance
bandwidth Dense
configurations -GB/s/Rack Unit
(10 s PBs) Simple access to
dynamic datasets New access Optimized balance
between compute & storage
Reliable data
Large data sets (containers)
Simpler system
New access technology(CDMI) with legacy application interfaces
(metadata)p y
configurations Highly efficient, self-managing systems
21
Big Content Solutions
Solution Workloads/Environment
File Services Multiapplication/client workloads Frequent read/write access
Enterprise ContentRepositories
Infinite container Active archive
Distributed Content Repositories
Large, multisite repositoryP li b d d t tRepositories Policy-based data management
What Is Apache Hadoop?An open-source software framework that: Supports data-intensive distributed Name Node /
HDFS
ppapplications
Enables applications to work with thousands of nodes and petabytes
Name Node /JobTracker
thousands of nodes and petabytes of data
Runs on a collection of commodity, h d thi
Map ReduceData Nodes /TaskTracker
shared-nothing servers Has two key components:
– HDFS. Hadoop Distributed File System
:
p y– MapReduce. Programming model for
processing and generating large datasetsData Nodes /TaskTracker
NetApp Confidential - Internal and Partner Use Only 23
HadoopTarget MarketsTarget Markets
Full Motion Video (FMV)Full Motion Video (FMV)– Government agencies building multi-petabyte video capture/playback solutions from
satellites, drones, etc.
Media Content Management (MCM) Media Content Management (MCM)– Broadcast and cable networks video w/ ingest/playback active repositories
Seismic Processing Solution (SPS)– Upstream Energy customers building Seismic Processing systems
Open Solution for Hadoop (OSH)– Orgs building large Hadoop clusters requiring high bandwidth and densityOrgs. building large Hadoop clusters requiring high bandwidth and density
HPC Solution for Lustre (Lustre)– Government and research institutes building HPC solutions
24NetApp Confidential - Internal Use Only
Vertical ApplicationsEnterprise Sales reports
C t
Government Event
monitoring
Retail Sales reports
T t
Financial Sales reports
F d Customer analysis
Cost trends Quality
monitoring Terrorism
monitoring Data mining
Target marketing
New product rollout
Fraud detection
Targeted marketing Quality
controlData mining
Internet E-commerce
rollout Customer
loyalty Customer
marketing Customer data Compliance
Web sites Data mining Customer data
buying habits Inventory
predictions New customer
acquisition
NetApp Confidential - Internal and Partner Use Only 25
Enterprise-Class Features with Hadoop
Dynamic Disk Pools SSD Cache and Hybrid Storage Enhanced Snapshot CopiesDynamic Disk Pools6x faster rebuild times (minutes, not days) andcontinuous high performance during drive rebuild
SSD Cache and Hybrid StorageExpedited access to “hot” data through automated, real-time caching to SSD; mix and match SSDs and HDDs
Enhanced Snapshot CopiesMore precise recovery point objectives and faster recovery
Encrypted Drive SupportExtended security enhancements for compliance
Enterprise Replication Cost-effective enterprise-class disaster recovery of data with
Thin ProvisioningImprove storage utilization up to 35%
26
and regulationsFC and IP replication and eliminate overprovisioning
Use the ESG ReportValidated for the EnterpriseValidated for the Enterprise
Improved the performance of the H d l t b 62%Hadoop cluster by 62%
Completed jobs under drive f il th 200% f tfailure more than 200% faster
Delivered linear performance scalability as data nodes setsscalability as data nodes, sets expanded
The Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment.
ESG 2012
27
- ESG, 2012
Use Case Example: NetApp Auto Support
Phone home data representing information about the status NetApp storage controllers
Correlate disk latency (hot) with disk type
g
disk type– 24 billion records – 4 weeks to run query– Hadoop implementation 10.5 hours
Bug detection through pattern matchingmatching– 240 billion records – Too large to
run– Hadoop implementation 18 hours
28
NIF: Big Science Needs Big Data
With a private cloud, NIF was able to:
• Deliver nonstop, 24/7 availability for critical data
• Cut planned downtime by 60 hours• Cut planned downtime by 60 hours per year to maximize facility availability
• Reduce latency by 97% to provide data to scientists within 15 minutes of an experimentan experiment
• Enable secure multi-tenancy to protect sensitive scientific and
© 2014 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only29
government data
State-of-the-Science in HPC Today
16 M / 20 P k P t FLOPS (20 1015)
LLNL Sequoia - designed to be the fastest supercomputer and storage combination on the planet
16 Max / 20 Peak PetaFLOPS (20 x 1015) 1.6 Million Cores 1.6 PB main memoryy > 1.3 Terabyte/sec using:
– 55 PB of usable storage – Lustre Cluster Parallel File SystemLustre Cluster Parallel File System– QDR (40 Gigabit) Infiniband
~8 MW of PowerSi l ti f Simulations for – Nuclear Weapons Viability – Counter Terrorism Now #1 of the Top 500 HPC
Systems in the world – Energy Security – Climate Change
30Press Release: http://www.netapp.com/us/company/news/news-rel-20110928-990734.html
Systems in the world.http://top500.org.
Big Data in HR: Some Fundamental PrinciplesPrinciples
Transparency PrivacyPrivacy “Do No Harm” Validity and Verification Validity and Verification Security
31NetApp Confidential - Internal Use Only
Big Data: Transforming theDesign Philosophy of Future InternetDesign Philosophy of Future Internet
32NetApp Confidential - Internal Use Only
Remember: With Big Data it’s all aboutWith Big Data, it s all about…
I tiIncentives
Culture
33NetApp Confidential - Internal Use Only
34