Computing Cloud - Jordi Torres


Cloud Computing
AWS: a practical example

May 2012
Hugo Pérez
UPC

Index

● Introduction
● Infrastructure
● Development and Results
● Conclusions

Introduction

To learn more about AWS services, the MapReduce process, the public data available from Twitter, and the way to interact with it, I developed a small example using:

AWS Infrastructure:
- Elastic Compute Cloud (EC2)
- Elastic Block Store (EBS)
- Elastic IP
- Simple Storage Service (S3)

AWS Tools:
- Management Console
- CloudWatch
- Elastic MapReduce (EMR)

Twitter Search API

1. Introduction 2. Infrastructure 3. Development and Results 4. Conclusions


Creating AWS Account

- Go to http://aws.amazon.com
- Sign up as a new user.
- Enter name, email and password.
- Enter contact details.
- Enter payment data.
- Confirm a PIN through a phone call.
- Wait for the confirmation.
- Wait a few minutes until the account is active (less than 10 minutes in this case).

Creating EC2

- Go to AWS Management Console -> EC2 Dashboard.
- Create a new instance.
- Choose the AMI (Amazon Machine Image) to install: Ubuntu Server 12.04.
- Define the number and type of instances. In this case, 1 Micro instance with these characteristics: disk: 8 GB (EBS), RAM: 600 MB, CPU: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz.
- Define the instance details, such as shutdown behavior and user data.
- Define tags: user-friendly names to manage the resources.
- Create a key pair to connect securely to the instance.
- Configure the firewall.
- Review the configuration.
- Check the details from the Management Console.
- You can also monitor the instance, create alarms and configure detailed monitoring.

Creating Elastic IP

Now you can access the instance over ssh using this name: ec2-23-23-187-119.compute-1.amazonaws.com

To simplify access, you can create an Elastic IP address. Once the Elastic IP is created, you should associate it with the instance.

Creating S3

- Define the bucket name and region. The region should be the same as that of the EC2 instance to optimize for latency. AWS gives 5 GB free.
- Set permissions to grant Authenticated Users access to list the S3 bucket.

Creating Billing Alarm

- First you have to enable this function.
- Then define the parameters: recipients and threshold.

CloudWatch

Besides the alarm, you can check the estimated charges through CloudWatch. Through CloudWatch you can also query different kinds of metrics.


Installing EMR CLI

Connect to the server:

    $ ssh -i awskey.pem ubuntu@23.21.252.15

Install the Amazon Elastic MapReduce Ruby Client:

    $ mkdir elastic-mapreduce-cli
    $ cd elastic-mapreduce-cli
    $ wget http://elasticmapreduce.s3.amazonaws.com/elastic-mapreduce-ruby.zip
    $ unzip elastic-mapreduce-ruby.zip

Configuring credentials:

    $ vi credentials.json

    {
      "access_id": "[Your AWS Access Key ID]",
      "private_key": "[Your AWS Secret Access Key]",
      "keypair": "[Your key pair name]",
      "key-pair-file": "[The path and name of your PEM file]",
      "log_uri": "[A path to a bucket you own on Amazon S3, such as, s3n://mylog-uri/]",
      "region": "[The Region of your job flow, either us-east-1, us-west-2, us-west-1, eu-west-1, ap-northeast-1, ap-southeast-1, or sa-east-1]"
    }

You can get the AWS Access Key ID and the AWS Secret Access Key by signing in to your account at http://aws.amazon.com and opening the Access Credentials section.

It is recommended to create a new key pair for the exercise. I created it from the Management Console and copied it to the EC2 instance.

I saved all the parameters in the file:

    ubuntu@ip-10-195-195-175:~/elastic-mapreduce-cli$ more credentials.json
    {
      "access_id": "HPVAJFNULSZULY5NWHPV",
      "private_key": "65xBzYVzV7THPVYWW2LcYN0roVwK1I+nxJ+BNHPV",
      "keypair": "mapReduce",
      "key-pair-file": "/home/ubuntu/mapReduce.pem",
      "log_uri": "s3n://mylog-uri-hpv/",
      "region": "us-east-1"
    }
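A quick sanity check of credentials.json before running the client can catch a missing field early. The following is a minimal sketch in Python 3 and is not part of the EMR client: the function name `missing_credential_fields` is hypothetical, and the list of required keys is simply the six fields that appear in the credentials file of this exercise.

```python
import json

# The six keys that appear in the credentials.json of this exercise.
REQUIRED_KEYS = (
    "access_id",
    "private_key",
    "keypair",
    "key-pair-file",
    "log_uri",
    "region",
)


def missing_credential_fields(text):
    """Return the required keys that are absent or empty in the JSON text.

    Hypothetical helper for illustration; it only inspects the file content
    and never contacts AWS.
    """
    data = json.loads(text)
    return [key for key in REQUIRED_KEYS if not data.get(key)]


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    sample = '{"access_id": "EXAMPLE", "region": "us-east-1"}'
    print(missing_credential_fields(sample))
```

An empty list means every expected field is present. The check says nothing about whether the keys are valid for AWS.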

Basics EMR CLI

Basic commands of the EMR CLI:

    $ ./elastic-mapreduce --help
    $ ./elastic-mapreduce --create
    $ ./elastic-mapreduce --list
    $ ./elastic-mapreduce --describe --jobFlow [JobFlowID]
    $ ./elastic-mapreduce -j JobFlowID --stream
    $ ./elastic-mapreduce --terminate JobFlowID

Mapper

The mapper script, the classic word counter:

    #!/usr/bin/python
    import sys
    import re

    def main(argv):
        # A word starts with a letter, followed by letters or digits
        pattern = re.compile("[a-zA-Z][a-zA-Z0-9]*")
        for line in sys.stdin:
            for word in pattern.findall(line):
                # The "LongValueSum:" prefix asks the aggregate reducer to sum the values per key
                print("LongValueSum:" + word.lower() + "\t" + "1")

    if __name__ == "__main__":
        main(sys.argv)
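To check the mapper without launching a job flow, its output can be aggregated locally. The sketch below (Python 3) is a stand-in under stated assumptions, not Hadoop's aggregate reducer: `map_line` mirrors the mapper's regular expression, and `sum_long_values` (a hypothetical name) only imitates what the LongValueSum prefix asks for, adding the values per key.

```python
import re
from collections import Counter

# Same pattern and prefix as the mapper script.
WORD = re.compile("[a-zA-Z][a-zA-Z0-9]*")
PREFIX = "LongValueSum:"


def map_line(line):
    """Yield the records the mapper would print for one input line."""
    for word in WORD.findall(line):
        yield PREFIX + word.lower() + "\t" + "1"


def sum_long_values(records):
    """Local imitation of the aggregation step: sum the values per key."""
    totals = Counter()
    for record in records:
        key, value = record.split("\t", 1)
        if key.startswith(PREFIX):
            totals[key[len(PREFIX):]] += int(value)
    return dict(totals)


if __name__ == "__main__":
    lines = ["Cloud computing on AWS", "cloud storage"]
    records = [rec for line in lines for rec in map_line(line)]
    for word, count in sorted(sum_long_values(records).items()):
        print(word + "\t" + str(count))
```

Running a few sample lines through both functions gives a quick idea of the word counts the job flow should write to S3.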

Using Twitter API

To generate the input data, run a simple query against Twitter:

    http://search.twitter.com/search.json?q=cloud%20computing&rpp=5&include_entities=true&result_type=mixed

Parameters:
- q: the search pattern, here "cloud computing"
- rpp: results per page, here 5
- include_entities: if true, the result includes urls, media and hashtags
- result_type:
  - mixed: include both popular and real-time results in the response
  - recent: return only the most recent results in the response
  - popular: return only the most popular results in the response
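The query string can also be assembled programmatically, which avoids encoding the search pattern by hand. This is a minimal sketch in Python 3: the function name `build_search_url` and its defaults are assumptions made for illustration, the endpoint and parameter names are the ones in the query of this exercise, and the code only builds the URL without sending any request.

```python
from urllib.parse import quote, urlencode

# Endpoint as it appears in the query of this exercise.
SEARCH_URL = "http://search.twitter.com/search.json"


def build_search_url(query, rpp=5, include_entities=True, result_type="mixed"):
    """Assemble the search URL; the function name is hypothetical."""
    params = [
        ("q", query),
        ("rpp", rpp),
        ("include_entities", "true" if include_entities else "false"),
        ("result_type", result_type),
    ]
    # quote_via=quote encodes spaces as %20, as in the query shown in the text.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("cloud computing"))
```

Changing `result_type` to "recent" or "popular" selects the other two behaviours described in the parameter list.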

Transfer the result to S3:

    $ s3curl.pl --id=personal --put=cloudcomputing -- http://s3.amazonaws.com/mylog-uri-hpv/entradas/cloudcomputing

Exec EMR

    $ ./elastic-mapreduce --create --stream --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py --input s3://mylog-uri-hpv/entradas/cloudcomputing --output s3://mylog-uri-hpv/salidas/cloudcomputing --reducer aggregate

    $ ./elastic-mapreduce --list --active
    j-3EBJ6MT4FBM80     STARTING                                                Development Job Flow
       PENDING        Example Streaming Step

    $ ./elastic-mapreduce --list --active
    j-3EBJ6MT4FBM80     RUNNING        ec2-23-20-6-34.compute-1.amazonaws.com   Development Job Flow
       RUNNING        Example Streaming Step
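When the same job flow is launched several times with different inputs, it can help to build the command line from its parts. The following Python 3 sketch is only an illustration: `build_streaming_command` is a hypothetical helper, it uses only the flags that appear in the command of this exercise, and it does not execute anything.

```python
def build_streaming_command(mapper, input_path, output_path, reducer="aggregate"):
    """Return the argument list for the streaming job flow of this exercise.

    Hypothetical helper: it only arranges the flags shown in the exercise
    (--create, --stream, --mapper, --input, --output, --reducer).
    """
    return [
        "./elastic-mapreduce",
        "--create",
        "--stream",
        "--mapper", mapper,
        "--input", input_path,
        "--output", output_path,
        "--reducer", reducer,
    ]


if __name__ == "__main__":
    bucket = "s3://mylog-uri-hpv"
    args = build_streaming_command(
        "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
        bucket + "/entradas/cloudcomputing",
        bucket + "/salidas/cloudcomputing",
    )
    print(" ".join(args))
```

Keeping the arguments as a list makes it possible to hand them to `subprocess.run` from the `elastic-mapreduce-cli` directory without worrying about shell quoting.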

Exec EMR

- Monitoring from the Management Console
- Provisioning on demand
- Monitoring graphs

Results EMR

- Results on S3


Conclusions

The software development model is completely new:
- The purchase process is eliminated.
- The installation process is becoming easier.
- The role of the system administrator (sysadmin, DBA, etc.) is disappearing.
- The developer can focus on business logic.
- AWS provides not only the infrastructure but also the development platform.

The Twitter API is well documented and easy to use.

This model is available to a company of any size.

The free usage tier covers all the hardware components used in this exercise (EC2, EBS, Elastic IP, S3) except for one small EC2 instance that is used on demand in the MapReduce process. The total charge for the development of this exercise was USD $0.45.


Thanks