Foundation
Columnar Storage
Speeds query time by reading only necessary data
Compression
Lowers costly I/O to boost overall performance
MPP Scale-out
Provides high scalability on clusters with no name node or other single point of failure
Distributed Query
Any node can initiate the queries and use other nodes for work. No single point of failure
Projections
Combine high availability with special optimizations for query performance
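The first two foundation points (read only the needed columns, compress to cut I/O) can be illustrated with a minimal sketch. The sample rows, the `columns` dictionary, and `run_length_encode` are hypothetical names made up for illustration. Run-length encoding stands in as one example of a compression scheme, since the slides do not say which encodings are applied.

```python
from itertools import groupby

# Hypothetical sample rows (state, score); not taken from a real table.
rows = [("NY", 100), ("NY", 200), ("CA", 120), ("CA", 120), ("KY", 160)]

# Column layout: each attribute is stored in its own list, so a query
# can read one attribute without touching the others.
columns = {
    "state": [r[0] for r in rows],
    "score": [r[1] for r in rows],
}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# An "average score" query reads only the score column.
average_score = sum(columns["score"]) / len(columns["score"])

# A column with long runs of repeated values shrinks under run-length encoding.
encoded_state = run_length_encode(columns["state"])
```

In a row layout the same average would require scanning every field of every row. Here the five state values also collapse to three (value, count) pairs.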
Ongoing Commitment to Innovation
Flex Tables (Schema on Read)
SQL on Hadoop
Kafka Support
Fast ORC Reader
Live Aggregate Projections
Geospatial & Social Analytics
Fast Parquet Reader
In-database ML
Innovation Timeline
2013 2014 2015 2016 2017
Google Cloud Platform
Query S3 Data Lake
Columnar Store
Aggressive Data Compression
MPP Architecture
HA Architecture
ANSI SQL Compliant
Java, Python, R APIs
ACID Compliance
No Single Point of Failure
Management Console
Database Designer
Projections and Optimizations
Foundation
Cascading Resource Pools
Directed Queries
Dynamic Workload Management
Big Flat Tables
Parallel Loading
Text Analytics
Amazon AWS
MS Azure
S3 Connector
Analyze in the Right Place
In-Database Machine Learning & Advanced Analytics
Freedom from Underlying Infrastructure
Strong, Reliable Performance at Exabyte Scale
The Industry’s Only Infrastructure-Agnostic, Unified Advanced Analytics Platform for All Your Data
Use as a database or a query engine
Use as a database AND a query engine
Choose your own storage
Use your preferred file format
ROS
A History of Separation and Integration
Vertica in Enterprise Mode: On-Premises
Vertica in Enterprise Mode: AWS, Azure, Google Clouds
Vertica in Eon Mode: Amazon Web Services
SINGLE UNIFIED ENGINE
Vertica Database
Vertica in Eon Mode Opens Up a New World of Analytic Possibilities
• Next generation of analytics architecture
• Separation of compute and storage
• Elastic scaling
• Maximizes cloud economics
• Supports dynamic workloads
• Simplifies database operations
• Opens up next generation automation and analytic workloads
Diagram labels: Amazon S3 (Vertica ROS Storage); three Amazon EC2 nodes, each with a Depot; Database Administrator
Workload Isolation
Diagram labels: Compute capacity (x1, x2, x4) by day of the week (Sun to Sat); workloads: Marketing, Data Science, Dashboard
Storage Disruption is Beyond Public Clouds
Gartner says, by 2021, more than 80% of enterprise data will be stored in scale-out storage systems in enterprise and cloud data centers, up from 30% today.
The number of solutions supporting object storage APIs (primarily the Amazon S3 API) is growing at an incredible pace and now stands at more than 4,000 different products.
Vertica by the Hour on AWS Marketplace
Easy-to-consume, all-in-one hourly pricing per node enables anyone to:
• Start small and grow on the fly
• Unlimited data size
• Employ OPEX vs. CAPEX spending
• Support included
Frictionless Consumption
In Vertica’s Management Console (MC), a GUI Web admin tool:
• Added query execution functionality
• Included a Catalog size growth chart
Increased MC’s awareness of and utility for the Cloud:
• Implemented AWS Provisioning and Management of a Vertica Cluster and DB for the Cloud
• Included option for using IAM authentication in MC S3 Load UI
• Added screens showing how data is sharded across Eon nodes, along with Depot path and state
Have You Tried The Vertica Management Console Lately?
Challenges
Processing Power
Data Movement
Scalability – small data to big data
Incremental costs
Security
Data integrity
Machine Learning in Production
Vertica bridges the gap between Machine Learning as a science project and production deployment
Vertica ML algorithms – available today, built to scale
Linear regression
K-means
Logistic regression
Naive Bayes
Random Forest
SVM
Predict customer retention
Forecast sales revenues
Customer segmentation
Predict sensor failure
Classify gene expression data for drug discovery
Refine keywords to improve Click Through Rate (CTR)
Business Understanding
Data Analysis & Understanding
Data Preparation
Modeling
Evaluation
Deployment
Machine Learning
Speed
ANSI SQL
Scalability
Massively Parallel Processing
Deploy Anywhere
Outlier Detection
Normalization
Imbalanced Data Processing
Sampling
Missing Value Imputation
And More…
Support Vector Machines
Random Forests
Logistic Regression
Linear Regression
Ridge Regression
Naive Bayes
Cross Validation
And More…
Model-level Stats
ROC Tables
Error Rate
Lift Table
Confusion Matrix
R-Squared
MSE
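The evaluation measures listed above have simple definitions, and a small sketch may help make them concrete. This is plain Python for illustration only, not Vertica's in-database functions. The helper names and the sample labels are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Return (tp, fp, fn, tn) counts for binary labels coded as 0 or 1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def error_rate(actual, predicted):
    """Fraction of predictions that differ from the actual label."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def mse(actual, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical classifier output.
labels = [1, 0, 1, 1, 0]
guesses = [1, 0, 0, 1, 1]
matrix = confusion_matrix(labels, guesses)
rate = error_rate(labels, guesses)

# Hypothetical regression output.
observed = [1.0, 2.0, 3.0]
fitted = [1.0, 2.0, 4.0]
fit_mse = mse(observed, fitted)
fit_r2 = r_squared(observed, fitted)
```

The confusion matrix and error rate apply to classifiers such as logistic regression or Naive Bayes. MSE and R-squared apply to regression models.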
In-Database Scoring
Speed
Scale
Security
Pattern Matching
Date/Time Algebra
Window/Partition
Date Type Handling
Sequences
And More…
Sessionize
Time Series
Statistical Summary
Vertica Machine Learning Process Flow
HEALTH DATA, COMM, ACTIVITY, CONTENT, CONTEXT, IDENTITY, ASSET DATA, RELATIONSHIP, ePORTFOLIO, GOVERNMENT RECORDS
Sensitive data explosion – type and scale
PERSONAL DATA
Citizenship
Corporate Board of Directors
Law Enforcement Records
Public Records
Legal Name
Births
Deaths
Marriages
Divorces
Property Ownership
Academic
Exams
Student Projects
Transcripts
Degrees
Employment
Reviews
Actions
Promotions
Continuing Education
Virtual Goods
Identifiers
Domain Names
Handles (Twitter, etc.)
Objects
Gifts
Currencies
Financial Data
Income
Expenses
Transactions
Accounts
Tax Info
Assets
Liabilities
Insurance
Credit Rating
Physical Goods Digital Records
Real Estate
Vehicles
Personal Effects
Art
Appliances
Contacts
Address Book
Communications
Call Logs
Messaging Logs
Social Networks
Family Genealogy
Demographic
Age
Sex
Address
Profession
Identifiers
Name
User-names
e-Mail Addresses
Phone Numbers
Nick Names
Persons
Device IDs
IP addresses
Bluetooth IDs
SSID
IMEI
SIM
Interests
Declared
Likes
Favorites
Preferences
Location
Current
Planned Future
Past
People
Copresent
Physical World
Digital World
Interlaced With
Events
Calendar Data
Event Data from Web Services
Objects
Copresent
Physical World
Digital World
Interlaced With
Private Documents
Word Processing
Spreadsheets
Project Plans
Presentations
Consumer Media
Books
Photos
Videos
Podcasts
Music
Audio Books
Games
Software/Apps
Browser
Clicks
Keystrokes
Sites Visited
Queries
Bookmarks
Client Apps
Physical World
Eating
Drinking
Driving
Shopping
Sleeping
Operating System
Presence
Availability
Channels
Text
SMS
IM/Chat
Attachment
Body
Status Updates
Social Media
Videos
Podcasts
Photos
Shared
Produced Music
Links
Bookmarks
Speech
Voice Calls
Voice Mails
Insurance
Claims
Payments
Coverage
Personal
Tracking Devices
Activity Records
Genetic Code
Patient
Prescriptions
Diagnosis
Device Logs
Measurement
5,200 GB of data for every person by 2020!
Computerworld, 12/2012
HEALTH DATA
Insurance: Claims, Payments, Coverage
Personal: Tracking Devices, Activity Records, Genetic Code
Patient: Prescriptions, Diagnosis, Device Logs, Measurement
IDENTITY
Demographic: Age, Sex, Address, Profession
Identifiers: Name, User-names, e-Mail Addresses, Phone Numbers, Nick Names, Persons, Device IDs, IP addresses, Bluetooth IDs, SSID, IMEI, SIM
Interests
California Consumer Privacy Act (CCPA)
New York State Department of Financial Services (NYDFS)
Health Insurance Portability and Accountability Act of 1996 (HIPAA)
Gramm-Leach-Bliley Act (GLBA)
Children’s Online Privacy Protection Act of 1998 (COPPA)
Defense Federal Acquisition Regulation Supplement, Controlled Unclassified Information (DFARS-CUI)
Hundreds more among 50 states and territories…
Data Protection is now the Law
* Source: Data Protection and Privacy in 26 Jurisdictions Worldwide, Law Business Research Ltd.
EU: General Data Protection Regulation (GDPR)
Australia: Privacy Act of 1988 (Privacy Act)
Japan: Act on the Protection of Personal Information (APPI)
China: 2017 Cyber Security Law
Canada: Personal Information Protection and Electronic Documents Act (PIPEDA)
South Korea: Personal Information Protection Act (PIPA)
Hundreds more across the world…
Before: All applications and users have access to data
Analysts, Help Desk, DBAs, Malicious User
HR Application, ETL Tool, Mainframe App, Malware
Name | SSNs | Credit Card # | Street Address | Customer ID | State | Score
James Potter | 385-12-1199 | 3712 3456 7890 1001 | 1279 Farland Avenue | G8199143 | NY | 100
Ryan Johnson | 857-64-4190 | 5587 0806 2212 0139 | 111 Grant Street | S3626248 | NY | 200
Carrie Young | 761-58-6733 | 5348 9261 0695 2829 | 4513 Cambridge Court | B0191348 | CA | 120
Brent Warner | 604-41-6687 | 4929 4358 7398 4379 | 1984 Middleville Road | G8888767 | CA | 120
Anna Berman | 416-03-4226 | 4556 2525 1285 1830 | 2893 Hamilton Drive | S9298273 | KY | 160
After: Format-preserving encryption at the field level
Analysts, Help Desk, DBAs, Malicious User
Payments App, Malware
Name | SSNs | Credit Card # | Street Address | Customer ID | State | Score
Kwfdv Cqvzgk | 161-82-1199 | 3712 3488 7865 1001 | 2890 Ykzbpoi Clpppn | G7202483 | NY | 100
Veks Iounrfo | 200-79-4190 | 5587 0876 5467 0139 | 406 Cmxto Osfalu | S0928254 | NY | 200
Pdnme Wntob | 095-52-6733 | 5348 9212 3456 2829 | 1498 Zejojtbbx Pqkag | B7265029 | CA | 120
Eskfw Gzhqlv | 178-17-6687 | 4929 4356 7432 4379 | 8261 Saicbmeayqw Yotv | G3951257 | CA | 120
Jsfk Tbluhm | 525-25-4226 | 4556 2598 7643 1830 | 8412 Wbbhalhs Ueyzg | S6625294 | KY | 160
NIST Standard FF1 preserves format and length of data at the source upon creation
ETL Tool, HR Application
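What "preserves format and length" means can be checked mechanically against the sample values in the Before and After tables. The sketch below does not implement NIST FF1 or any encryption. The helpers `shape` and `same_format` are hypothetical and only compare character classes position by position.

```python
def shape(value):
    """Map each character to its class: digit -> 9, upper -> A, lower -> a."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)  # separators such as '-' and ' ' are kept as-is
    return "".join(out)


def same_format(original, protected):
    """True when both values have the same length and character classes."""
    return shape(original) == shape(protected)


# Value pairs copied from the first row of the Before and After tables:
# SSN, credit card number, customer ID.
pairs = [
    ("385-12-1199", "161-82-1199"),
    ("3712 3456 7890 1001", "3712 3488 7865 1001"),
    ("G8199143", "G7202483"),
]
all_preserved = all(same_format(a, b) for a, b in pairs)
```

Because the protected value has the same shape as the original, downstream applications and column definitions that validate on format can keep working without schema changes.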