Oracle Big Data Cloud Service
Presented by: Mandeep Kaur Sandhu, Senior Oracle DBA
Download these slides from: mandysandhu.com
• Introduction to Big Data
• Oracle Big Data deployment models
• Oracle Big Data Cloud Service
• Core principles
• Access and admin tasks
• Data management tools
• Event Hub
• Conclusion
2
Goals
3
What is Big Data??
The three Vs of big data:
• Volume – terabytes to zettabytes
• Velocity – batch to streaming data
• Variety – structured to structured and unstructured
• Big data is a term describing large or complex datasets
• Traditional data processing systems fail to analyse this data
• Big data analysis identifies the value in the data
An open-source software platform for distributed storage and processing – highly scalable, reliable and available
4
What is Hadoop??
Hadoop
• Logically distributed file system
• Framework for processing
• Designed to run on small or large machines for parallel processing
• Allows resource growth
• Avoids vendor lock-in
HDFS and MapReduce
HDFS stores the data in the cluster
• NameNode
• DataNode
5
Two Components
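The NameNode/DataNode split can be sketched as a toy in-memory model. This is purely illustrative (the node names `dn1`–`dn3` and block IDs are made up; the file path reuses the example used later in these slides), not a real HDFS API:

```python
# NameNode keeps only metadata: which blocks make up a file and which
# DataNodes hold replicas of each block. DataNodes hold the block bytes.
namenode = {
    "/user/mandy/bigdata01.csv": [("blk_1", ["dn1", "dn2"]),
                                  ("blk_2", ["dn2", "dn3"])],
}
datanodes = {
    "dn1": {"blk_1": b"row1,row2;"},
    "dn2": {"blk_1": b"row1,row2;", "blk_2": b"row3"},
    "dn3": {"blk_2": b"row3"},
}

def read_file(path):
    # A client asks the NameNode for block locations, then fetches each
    # block from the first listed DataNode replica.
    return b"".join(datanodes[replicas[0]][block_id]
                    for block_id, replicas in namenode[path])
```

Because the data lives on several DataNodes, losing one replica does not lose the block – the NameNode simply points clients at another replica.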
Programming model for processing large data sets
• Map – takes a set of data and converts it into another set of data
• Reduce – takes the output of Map as input and combines it into a smaller set
MapReduce
6
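The Map and Reduce phases above can be sketched in plain Python. This word-count toy only illustrates the programming model, not the Hadoop MapReduce API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: convert each input line into (word, 1) pairs
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: combine the mapper output into a smaller set of (word, count)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cloud", "data lake"]
print(reduce_phase(map_phase(lines)))
# {'big': 2, 'data': 2, 'cloud': 1, 'lake': 1}
```

In real Hadoop, many mappers and reducers run in parallel across the cluster, with a shuffle step grouping each key's pairs before the reduce.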
7
Oracle Big Data Deployment Models
• Oracle Big Data Cloud at Customer (BDCC) – the Oracle Big Data Cloud Service model delivered in your data centre, behind your firewall
• Oracle Big Data Appliance X6 – an on-premises engineered system designed to deliver predictable Hadoop infrastructure
• Oracle Big Data Cloud Service (BDCS) – Oracle public cloud infrastructure with cluster nodes and data sources
Operational efficiency
• Out-of-the-box installation
• Automated cluster management
• Cloudera Manager
Security
• Data is encrypted – at rest and in motion
• Authorization and authentication
• Network firewall
Versatility
• Cloudera distribution – Apache Hadoop Enterprise Data Hub
• Install and operate third-party software
8
BDCS - Core Principles
Highly efficient cluster management
• Fault tolerant – HA Hadoop infrastructure
• Fully tested Hadoop upgrades
Cluster nodes
• A cluster is a collection of nodes
• Permanent nodes
• Edge nodes
• Compute nodes
9
BDCS - Features
• Master or data node
• Lasts for the lifetime of the cluster
• Each node has:
• 32 OCPUs
• 256 GB RAM
• 48 TB storage
• Full Cloudera distribution – licence and support
10
Permanent Nodes
• Empty nodes – OS and disk only
• Hadoop client configs
• Interface between the Hadoop cluster and the outside network
• Permanent node
Note: no DataNode role
11
Edge Nodes
• CPU and memory
• No disks
• Temporary nodes
• An existing cluster is needed before compute nodes can be added
• The cluster can be extended with up to 15 compute nodes
• No HDFS data
12
Compute Nodes
• Oracle Linux 6 and Oracle Java – JDK 8
• Cloudera Enterprise (Data Hub Edition)
• CDH 5.x with support for YARN and MR2
• Cloudera Impala
• HBase
• Cloudera Search
• Apache Spark
• Oracle R Distribution
• Oracle Big Data Spatial and Graph
13
BDCS – Included Software
Oracle Big Data SQL Cloud Service
• Unified SQL access
• Dedicated instances
14
BDCS – Additional Component
• Log in to Oracle Cloud
• Choose Oracle Big Data Cloud Service
• Starter Pack 1 → 3 nodes
• Additional nodes – added later
• Big Data SQL node
15
Oracle BDCS – Service Instance
• Go to the Oracle Big Data service instance
• Create a service cluster
• Provide tags and an instance name
16
Oracle BDCS – Service Cluster
• Select the Big Data Appliance system – service instance
• SSH keys
17
Oracle BDCS – Service Cluster
Starter Pack 1 → 3 instances
Lowest IP address → master node
18
Oracle BDCS – Admin page
• Connect via SSH as the opc user
• CLI – bdacli
• Overall information about the cluster
19
Oracle BDCS – Connect
• Open the Cloudera console
• Username/password
20
Access Cloudera console
• Add nodes in one-node increments – up to 60 nodes in total
• Four permanent Hadoop nodes – allows additional edge nodes
• Extend or shrink the service
21
Administrative Tasks
• Open the Cloudera console – Hue
• Same account details as Cloudera Manager
• Add a group
• Add a user
• Upload a file
22
Hue – Group/user and File upload
• GUI-based console
• Login username – bigdatamgr
• Explore jobs and stored data
• Usage and health of the cluster
• YARN jobs
23
Big Data Manager Console
• Zeppelin notebooks – interactive analysis using R and Python
24
Oracle Big Data Manager – Notebook
odcp
• Command-line tool for copying large files
• Takes the input and splits it into chunks
• Uses Spark to provide parallel transfer
Examples:
odcp hdfs:///user/mandy/bigdata01.csv hdfs:///user/mandy/bigdata01.csv_copy
odcp hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
odcp hdfs:///user/mandy/bigdata01.csv s3://aserver/bigdata01.csv_copy
odcp s3://user/mandy/bigdata01.csv s3://mandy01/bigdata01.csv_copy
25
Data Management - odcp
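odcp's split-into-chunks, parallel-transfer approach can be sketched in plain Python. This is only an illustration of the idea (odcp itself distributes the work with Spark; the chunk size and worker count here are made up):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(src, dst, offset, length):
    # Copy one chunk of the source into the destination at the same offset
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))

def parallel_copy(src, dst, chunk_size=4 * 1024 * 1024, workers=4):
    size = os.path.getsize(src)
    with open(dst, "wb") as f:   # pre-allocate the destination file
        f.truncate(size)
    # Transfer the chunks concurrently; each worker handles one offset
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_chunk, src, dst, off, chunk_size)
                   for off in range(0, size, chunk_size)]
        for fut in futures:
            fut.result()         # surface any copy errors
```

Because the chunks are independent, they can be transferred in any order and in parallel – the same property odcp exploits across Spark executors.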
odiff
• Oracle distribution diff – compares large data sets
• Compatible with the Cloudera distribution
• Minimum block size to compare – 5 MB
• Maximum – 2 GB
Examples:
/usr/bin/odiff hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
/usr/bin/odiff -V hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
/usr/bin/odiff -d hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy
26
Data Management - odiff
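The block-by-block comparison that odiff performs can be sketched as follows. This is an illustrative stand-in, not odiff itself; the default block size mirrors the 5 MB minimum mentioned above:

```python
def diff_blocks(path_a, path_b, block_size=5 * 1024 * 1024):
    # Compare two files block by block; return the byte offsets of
    # blocks that differ (including any trailing length mismatch).
    diffs = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break
            if a != b:
                diffs.append(offset)
            offset += block_size
    return diffs
```

Comparing in fixed-size blocks keeps memory bounded regardless of file size, and lets a distributed implementation compare blocks in parallel.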
bda-oss-admin
• Manages data and resources
• Can set environment variables
• Configures the cluster with a storage provider
Examples:
bda-oss-admin --cm-username admin --cm-password abce1234
bda-oss-admin restart_cluster
#!/bin/bash
export CM_ADMIN="my_CM_admin_username"
27
Data Management
bdm-cli
• Big Data Manager command-line interface to copy data and manage copy jobs
• Mirrors the odcp commands
bdm-cli copy
bdm-cli create_job
28
Data Management – bdm-cli
Direct ingest into Oracle BDCS
• Common ingests from the customer data centre use Flume or ETL work
• SCP (SSH protocol)
• VPN and FastConnect
29
Data ingest options
• Open-source stream processing
• Real-time streaming
• High-throughput, low-latency platform
30
Apache Kafka
• Stream processing – IoT, anomaly detection
• Data integration – data lakes, HDFS, object storage
• Log aggregation – click streams, server logs
• Messaging – traditional apps, microservices
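One reason Kafka combines high throughput with per-key ordering is that keyed messages are hashed to a fixed partition, so each key's messages stay in order within one partition while partitions scale out. A sketch of the idea (Kafka's Java client actually uses a murmur2 hash; CRC32 is used here only as a deterministic stand-in, and the key name is made up):

```python
import zlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    # Kafka's default partitioner hashes the message key; CRC32 here is
    # just a deterministic stand-in for the real murmur2 hash.
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land in the same partition,
# which preserves per-key ordering within that partition.
p1 = assign_partition(b"sensor-42", 8)
p2 = assign_partition(b"sensor-42", 8)
assert p1 == p2 and 0 <= p1 < 8
```

Messages without a key are instead spread across partitions for load balance, trading ordering for throughput.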
• Fully managed streaming data platform
• Provides the world's most popular message broker (Kafka)
• Flexible – available in fully managed and dedicated deployment options; elastic horizontally and vertically
• Access – REST API access, SSH access to the Kafka cluster
31
Oracle Event Hub Cloud Service
• Start your big data journey now
• Build and populate a data lake
• Help the business solve problems by using data
• Register for an Oracle Cloud free trial
https://cloud.oracle.com/tryit
32
Conclusion
Thank you for your time!!
Follow and subscribe.
Blog: mandysandhu.com | Twitter: @mandysandhu14 | LinkedIn: kaurmandeep88