Hadoop Summit
BATCH IS BACK: CRITICAL FOR AGILE APPLICATION ADOPTION
Robby Dick, Solution Marketing, BMC
What Developers, Engineers and Ops Need
– Workflow creation should be intuitive for programmers
– Rich support for databases, HDFS, common file formats and applications
– Simple access to logs and output
– Resilient to failures, graceful error handling
– Support for DevOps methodologies and tools
– Testing supported out of the box
– Auditing, reporting, compliance, complex dependencies and event triggers, sophisticated scheduling, SLA management …
What they get
– Oozie, Cron, Task Scheduler
– Jenkins, Chef, Puppet, etc.
– Chronos, Marathon
– Oracle Enterprise Manager, SQL Agent
– Falcon, NiFi, Talend Scheduler
– Informatica Workflow Manager
– SWF, Data Pipeline, Azure Scheduler
– Databricks Jobs
– Azkaban, Luigi, Airflow …
How are they delivering?
Changing … Almost EVERYTHING!
– Agile – code faster
– DevOps – deploy faster
– Containerization – isolation/insulation, scalability
– Cloud – instant infrastructure
– Hadoop – distributed file system and unlimited scale
– NoSQL – novel, flexible view of data
Except for Batch
#!/bin/ksh
cd /home/bmcU1ser/ftp_race_source
sftp -b /dev/stdin -o Cipher=blowfish -o Compression=yes -o BatchMode=yes \
  -o IdentityFile=/export/home/user/.ssh/id_rsa -o Port=22 \
  bmcUs1ser@hou-hadoop-mstr 1>sftp.log 2>&1 <<ENDSFTP
if [ -f /home/bmcU1ser/ftp_race_target/daily_shipment_log ]; then
  exit 1
else
  put daily_shipment_log /home/bmcU1ser/ftp_race_target
fi
quit
ENDSFTP
rc=$?
if [[ $rc != 0 ]]; then
  print "***Error occurred...$rc" `date "+%Y-%m-%d-%H.%M.%S"`
  if [[ -f /home/bmcU1ser/ftp_race_target/daily_shipment_log ]]; then
    rm /home/bmcU1ser/ftp_race_target/daily_shipment_log
  fi
else
  mv /home/bmcU1ser/ftp_race_source/daily_shipment_log /home/bmcU1ser/ftp_race_source/old/daily_shipment_log
  print "***Successful transfer...$rc" `date "+%Y-%m-%d-%H.%M.%S"`
fi
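The script above bakes its error handling into a one-shot run: any transient network hiccup means a failed job and manual cleanup. A minimal sketch of the retry loop such hand-written transfer scripts end up reimplementing (assumption: `do_transfer` is a made-up stand-in for the sftp call above, not part of the slide):

```shell
#!/bin/sh
# Sketch only: do_transfer is a hypothetical stand-in for the sftp invocation.
do_transfer() {
  # placeholder for: sftp -b /dev/stdin ... bmcUs1ser@hou-hadoop-mstr
  return 0
}

retries=3
attempt=1
rc=1
while [ "$attempt" -le "$retries" ]; do
  if do_transfer; then
    rc=0
    break                      # transfer succeeded, stop retrying
  fi
  echo "transfer failed, attempt $attempt of $retries" >&2
  attempt=$((attempt + 1))
  sleep 1                      # naive fixed back-off before the next attempt
done
echo "final rc=$rc"
```

Every team reinvents some version of this loop; a scheduler with built-in retry and notification policies removes it from the script entirely.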
#!/usr/bin/sh
# Sample SQL*Plus reporting script
set pagesize 0 linesize 80 feedback off

SELECT 'The database ' || instance_name ||
       ' has been running since ' || to_char(startup_time, 'HH24:MI MM/DD/YYYY')
FROM v$instance;

SELECT 'There are ' || count(status) ||
       ' data files with a status of ' || status
FROM dba_data_files
GROUP BY status
ORDER BY status;

SELECT 'The total storage used by the data files is ' ||
       sum(bytes)/1024/1024 || ' MB'
FROM dba_data_files;
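A report like this still needs a wrapper to run it on a schedule. A hedged sketch of the typical cron-side invocation (assumption: the queries above are saved as `db_health_report.sql`, and `DB_CONNECT` is an invented variable, not from the slide):

```shell
#!/bin/sh
# Hypothetical wrapper: connect string and file names are illustrative only.
DB_CONNECT=${DB_CONNECT:-"scott/tiger@ORCL"}
if command -v sqlplus >/dev/null 2>&1; then
  # -s suppresses the SQL*Plus banner so only the report lines are captured
  sqlplus -s "$DB_CONNECT" @db_health_report.sql > db_health_report.out 2>&1
else
  echo "sqlplus not installed; skipping report"
fi
```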
#!/usr/bin/env bash
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
. "$bin"/../libexec/hadoop-config.sh
# set the hadoop command and the path to the hadoop jar
HADOOP_CMD="${HADOOP_PREFIX}/bin/hadoop --config $HADOOP_CONF_DIR"
# find the hadoop jar
HADOOP_JAR=''
# find under HADOOP_PREFIX (tar ball install)
HADOOP_JAR=`find ${HADOOP_PREFIX} -name 'hadoop-*.jar' | head -n1`
# if it is not found, look under /usr/share/hadoop (rpm/deb installs)
if [ "$HADOOP_JAR" == '' ]
then
  HADOOP_JAR=`find /usr/share/hadoop -name 'hadoop-*.jar' | head -n1`
fi
# if it is still empty then don't run the tests
if [ "$HADOOP_JAR" == '' ]
then
  echo "Did not find hadoop-*.jar under '${HADOOP_PREFIX}' or '/usr/share/hadoop'"
  exit 1
fi
# dir where to store the data on hdfs; relative to the user's home dir on hdfs
PARENT_DIR="validate_deploy_`date +%s`"
TERA_GEN_OUTPUT_DIR="${PARENT_DIR}/tera_gen_data"
TERA_SORT_OUTPUT_DIR="${PARENT_DIR}/tera_sort_data"
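The slide cuts the script off after the output directories are defined. A hedged sketch of the continuation those variables imply, i.e. a TeraGen/TeraSort smoke test of the cluster (assumption: row count and jar fallback name are invented; the slide does not show this part):

```shell
#!/bin/sh
# Sketch only: defaults below are placeholders for the values the real script sets.
HADOOP_CMD=${HADOOP_CMD:-hadoop}
HADOOP_JAR=${HADOOP_JAR:-hadoop-examples.jar}
PARENT_DIR=${PARENT_DIR:-validate_deploy}
TERA_GEN_OUTPUT_DIR="${PARENT_DIR}/tera_gen_data"
TERA_SORT_OUTPUT_DIR="${PARENT_DIR}/tera_sort_data"
# compose the two MapReduce jobs: generate rows, then sort them
TERA_GEN_CMD="$HADOOP_CMD jar $HADOOP_JAR teragen 10000 $TERA_GEN_OUTPUT_DIR"
TERA_SORT_CMD="$HADOOP_CMD jar $HADOOP_JAR terasort $TERA_GEN_OUTPUT_DIR $TERA_SORT_OUTPUT_DIR"
if command -v hadoop >/dev/null 2>&1; then
  $TERA_GEN_CMD && $TERA_SORT_CMD
else
  echo "hadoop not on PATH; would run:"
  echo "  $TERA_GEN_CMD"
  echo "  $TERA_SORT_CMD"
fi
```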
#!/usr/bin/bash
# Sample pmcmd script
# Check if the service is alive
pmcmd pingservice -sv testService -d testDomain
if [ "$?" != 0 ]; then
  # handle error
  echo "Could not ping service"
  exit
fi
# Get service properties
pmcmd getserviceproperties -sv testService -d testDomain
if [ "$?" != 0 ]; then
  # handle error
  echo "Could not get service properties"
  exit
fi
# Get task details for session task "s_testSessionTask" of workflow
# "wf_test_workflow" in folder "testFolder"
pmcmd gettaskdetails -sv testService -d testDomain -u Administrator -p adminPass \
  -folder testFolder -workflow wf_test_workflow s_testSessionTask
if [ "$?" != 0 ]; then
  # handle error
  echo "Could not get details for task s_testSessionTask"
  exit
fi
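After the health checks, a script like this typically goes on to launch the workflow itself. A hedged sketch of that next step (assumption: same testService/testDomain as above; the `startworkflow` call is illustrative and not shown on the slide):

```shell
#!/bin/sh
# Sketch only: wraps a hypothetical pmcmd startworkflow call behind a guard so
# it can be read (and sourced) without Informatica installed.
start_workflow() {
  pmcmd startworkflow -sv testService -d testDomain \
    -u Administrator -p adminPass -folder testFolder -wait wf_test_workflow
}
if command -v pmcmd >/dev/null 2>&1; then
  start_workflow || { echo "Could not start workflow wf_test_workflow"; exit 1; }
else
  echo "pmcmd not installed; skipping"
fi
```

Note how much of the script is boilerplate error handling around each pmcmd call; this is exactly the glue code a scheduler is meant to absorb.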
Enterprise scale Workflow Scheduling
That works with your ENTIRE Ecosystem
Monitoring Workflows
Open Platform with Application Integrator
Design Tool Application Hub+
Single Platform for Batch Scheduling
Self Service
SLA and Service Management
Reporting, Analytics, Auditing, Security and Compliance
Comprehensive Scheduling
Dynamic Workload Optimization
Automation of file transfers
Seamless integration in the DevOps/CI/CD Toolchain
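A minimal sketch of what "jobs as code" can look like in this kind of toolchain, using Control-M Automation API-style JSON (assumption: the folder, job, script, and user names here are invented for illustration):

```json
{
  "ShipmentFolder": {
    "Type": "Folder",
    "DailyShipmentTransfer": {
      "Type": "Job:Command",
      "Command": "transfer_daily_shipment.sh",
      "RunAs": "batchuser"
    }
  }
}
```

Definitions like this can live in version control alongside application code and be validated and deployed from a CI/CD pipeline rather than edited by hand in a scheduling console.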
Powerful. Simple. Proven.
BMC Control-M Workload Automation
© copyright 2015 BMC Software, Inc.