Google Storage, Bigquery and Prediction APIs Patrick Chanezon, Developer Advocate, Cloud @chanezon, [email protected] Sao Paulo, October 29th 2010 Developer Day Google 2010 Friday, October 29, 2010

GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

DESCRIPTION

Google is expanding our storage products by introducing Google Storage for Developers. It offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as the advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.

Citation preview

Page 1: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage, Bigquery and Prediction APIs
Patrick Chanezon, Developer Advocate, Cloud
@chanezon, [email protected]
Sao Paulo, October 29th 2010

Google Developer Day 2010

Friday, October 29, 2010

Page 2: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Mobile Agenda for GDD

http://bit.ly/mgddbr


Page 3: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Agenda

• Google Storage for Developers
• Prediction API
• BigQuery

Page 4: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

What is cloud computing?

Infrastructure…
Platform…
Software…
… as a Service

Page 5: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

What is cloud computing?

IaaS: Infrastructure…
PaaS: Platform…
SaaS: Software…
… as a Service

Page 6: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google's Cloud Offerings

SaaS:
1. Google Apps
2. Third party Apps: Google Apps Marketplace
3. ________

PaaS: Google App Engine

IaaS: Google Storage, Prediction API, BigQuery

Page 7: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google's Cloud Offerings

SaaS:
1. Google Apps
2. Third party Apps: Google Apps Marketplace
3. ________ (Your Apps)

PaaS: Google App Engine

IaaS: Google Storage, Prediction API, BigQuery

Page 8: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage for Developers
Store your data in Google's cloud

Page 9: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

What Is Google Storage?

• Store your data in Google's cloud
  o any format, any amount, any time
• You control access to your data
  o private, shared, or public
• Access via Google APIs or 3rd party tools/libraries

Page 10: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Sample Use Cases

• Static content hosting
  o e.g. static html, images, music, video
• Backup and recovery
  o e.g. personal data, business records
• Sharing
  o e.g. share data with your customers
• Data storage for applications
  o e.g. used as storage backend for Android, App Engine, Cloud based apps
• Storage for Computation
  o e.g. BigQuery, Prediction API

Page 11: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage Benefits

• High Performance and Scalability: backed by Google infrastructure
• Strong Security and Privacy: control access to your data
• Easy to Use: get started fast with Google & 3rd party tools

Page 12: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage Technical Details

• RESTful API
  o Verbs: GET, PUT, POST, HEAD, DELETE
  o Resources: identified by URI
  o Compatible with S3
• Buckets
  o Flat containers, i.e. no bucket hierarchy
• Objects
  o Any type
  o Size: 100 GB / object
• Access Control for Google Accounts
  o For individuals and groups
• Two Ways to Authenticate Requests
  o Sign request using access keys
  o Web browser login
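The "sign request using access keys" option can be illustrated with a minimal Python sketch. The slides do not give the signing scheme. Because they call the API S3-compatible, the sketch assumes an S3-style string to sign (verb, Content-MD5, Content-Type, date, resource) hashed with HMAC-SHA1. The function name, the key value and the resource path are hypothetical, and the exact header format Google Storage expects is not shown here.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Return a base64 HMAC-SHA1 signature over an S3-style string to sign.

    Assumption: the layout below follows the S3-style scheme; the slides
    only say that requests are signed with access keys.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values, for illustration only.
signature = sign_request(
    secret_key="not-a-real-secret",
    verb="GET",
    resource="/mybucket/myobject",
    date="Fri, 29 Oct 2010 12:00:00 GMT",
)
print(signature)
```

The signature changes whenever the verb, date, resource or secret changes, so a server holding the same secret can check that the request was not altered.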

Page 13: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Performance and Scalability

• Objects of any type and 100 GB / Object
• Unlimited numbers of objects, 1000s of buckets
• All data replicated to multiple US data centers
• Leveraging Google's worldwide network for data delivery
• Only you can use bucket names with your domain names
• “Read-your-writes” data consistency
• Range Get
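"Range Get" refers to fetching only part of an object with the standard HTTP Range header. The Python sketch below builds such a request without sending it. The host name and object path are placeholders, not the real Google Storage endpoint.

```python
import urllib.request


def range_header(first_byte, last_byte):
    """Format an HTTP Range header value; both byte positions are inclusive."""
    return f"bytes={first_byte}-{last_byte}"


def build_range_get(url, first_byte, last_byte):
    """Build (but do not send) a GET request for one slice of an object."""
    return urllib.request.Request(
        url,
        headers={"Range": range_header(first_byte, last_byte)},
        method="GET",
    )


# Placeholder URL: request the first 1024 bytes of a large object.
request = build_range_get("https://storage.example.com/mybucket/big-object", 0, 1023)
print(request.get_method(), request.full_url, request.get_header("Range"))
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes, which is useful for resuming downloads of objects near the 100 GB size mentioned above.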

Page 14: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Security and Privacy Features

• Key-based authentication
• Authenticated downloads from a web browser
• Sharing with individuals
• Group sharing via Google Groups
• Access control for buckets and objects
• Set Read/Write/List permissions

Page 15: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Tools

• Google Storage Manager
• gsutil

Page 16: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage usage within Google

• Haiti Relief Imagery
• USPTO data
• Partner Reporting
• Google BigQuery
• Google Prediction API

Page 17: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Some Early Google Storage Adopters

Page 18: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage - Pricing

• Storage
  o $0.17/GB/Month
• Network
  o Upload: $0.10/GB
  o Download: $0.30/GB APAC, $0.15/GB Americas / EMEA
• Requests
  o PUT, POST, LIST: $0.01 / 1,000 Requests
  o GET, HEAD: $0.01 / 10,000 Requests
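As a worked example of the price list, the Python sketch below estimates a monthly bill. The rates come from the slide. The usage figures are made up, and the function ignores any free preview quota.

```python
# Rates from the pricing slide, in US dollars.
STORAGE_PER_GB_MONTH = 0.17
UPLOAD_PER_GB = 0.10
DOWNLOAD_PER_GB = {"APAC": 0.30, "Americas/EMEA": 0.15}
WRITE_REQUESTS_PER_1000 = 0.01   # PUT, POST, LIST
READ_REQUESTS_PER_10000 = 0.01   # GET, HEAD


def monthly_cost(stored_gb, upload_gb, download_gb, region, write_requests, read_requests):
    """Estimate one month's charge in dollars, rounded to cents."""
    total = (
        stored_gb * STORAGE_PER_GB_MONTH
        + upload_gb * UPLOAD_PER_GB
        + download_gb * DOWNLOAD_PER_GB[region]
        + write_requests / 1000 * WRITE_REQUESTS_PER_1000
        + read_requests / 10000 * READ_REQUESTS_PER_10000
    )
    return round(total, 2)


# Hypothetical usage: 100 GB stored, 10 GB uploaded, 50 GB downloaded in the
# Americas, 20,000 write requests and 1,000,000 read requests.
# 17.00 + 1.00 + 7.50 + 0.20 + 1.00 = 26.70
print(monthly_cost(100, 10, 50, "Americas/EMEA", 20000, 1000000))
```

With these figures storage and download dominate the bill, while a million GET requests add only one dollar.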

Page 19: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage - Availability

• Limited preview in US* currently
  o 100GB free storage and network per account
  o Sign up for wait list at http://code.google.com/apis/storage/

* Non-US preview available on case-by-case basis

Page 20: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Storage Summary

• Store any kind of data using Google's cloud infrastructure
• Easy to Use APIs
• Many available tools and libraries
  o gsutil, Google Storage Manager
  o 3rd party: Boto, CloudBerry, CyberDuck, JetS3t, …

Page 21: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google Prediction API
Google's prediction engine in the cloud

Page 22: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Introducing the Google Prediction API

• Google's sophisticated machine learning technology
• Available as an on-demand RESTful HTTP web service

Page 23: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

How does it work?

1. TRAIN: The Prediction API finds relevant features in the sample data during training.

"english" The quick brown fox jumped over the lazy dog.
"english" To err is human, but to really foul things up you need a computer.
"spanish" No hay mal que por bien no venga.
"spanish" La tercera es la vencida.

2. PREDICT: The Prediction API later searches for those features during prediction.

? To be or not to be, that is the question.
? La fe mueve montañas.
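The train/predict idea can be illustrated with a toy word-overlap classifier in Python, using the sentences from the slide. This is not the Prediction API's algorithm, which the slides do not describe. It is a deliberately simple stand-in in which the "features" are just the words seen for each label.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def train(examples):
    """Map each label to the set of words seen in its training sentences."""
    features = defaultdict(set)
    for label, text in examples:
        features[label].update(tokenize(text))
    return dict(features)


def predict(model, text):
    """Return the label whose known words match the most tokens of the text."""
    tokens = tokenize(text)
    scores = {
        label: sum(token in words for token in tokens)
        for label, words in model.items()
    }
    return max(scores, key=scores.get)


EXAMPLES = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]
MODEL = train(EXAMPLES)

print(predict(MODEL, "To be or not to be, that is the question."))  # english
print(predict(MODEL, "La fe mueve montañas."))                      # spanish
```

With only four training sentences the toy model relies on a handful of shared words ("to", "is", "the", "la"), which shows why more training data improves predictions.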

Page 24: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

A virtually endless number of applications...

• Customer Sentiment
• Transaction Risk
• Species Identification
• Message Routing
• Legal Docket Classification
• Suspicious Activity
• Work Roster Assignment
• Recommend Products
• Political Bias
• Uplift Marketing
• Email Filtering
• Diagnostics
• Inappropriate Content
• Career Counseling
• Churn Prediction
• ... and many more ...

Page 25: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

A Prediction API Example

Automatically categorize and respond to emails by language

• Customer: ACME Corp, a multinational organization
• Goal: Respond to customer emails in their language
• Data: Many emails, tagged with their languages
• Outcome: Predict language and respond accordingly

Page 26: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Using the Prediction API

A simple three step process...

1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions

Page 27: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Step 1: Upload

Upload your training data to Google Storage

• Training data: outputs and input features
• Data format: comma separated value format (CSV), result in first column

"english","To err is human, but to really ..."
"spanish","No hay mal que por bien no venga."
...

Upload to Google Storage

gsutil cp ${data} gs://yourbucket/${data}
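A training file in the format shown above can be produced with Python's standard csv module. The sketch assumes every field is quoted, as in the slide's sample rows. The function name is illustrative.

```python
import csv
import io


def training_csv(examples):
    """Serialize (label, text) pairs as CSV with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in examples:
        writer.writerow([label, text])
    return buffer.getvalue()


examples = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("spanish", "No hay mal que por bien no venga."),
]
print(training_csv(examples), end="")
```

Letting the csv module do the quoting matters once the text contains commas or double quotes, because it escapes them so that each row still has exactly two columns.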

Page 28: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Step 2: Train

Create a new model by training on data

To train a model:

POST prediction/v1.1/training?data=mybucket%2Fmydata

Training runs asynchronously. To see if it has finished:

GET prediction/v1.1/training/mybucket%2Fmydata

{"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}
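In the two calls above, the bucket and object name are joined with a slash and URL-encoded, so "/" becomes "%2F". The Python sketch below builds both request lines. The helper names are illustrative, and authentication and the HTTP transport are left out.

```python
from urllib.parse import quote

API_PREFIX = "/prediction/v1.1"


def model_id(bucket, object_name):
    """URL-encode "bucket/object" so that the slash becomes %2F."""
    return quote(f"{bucket}/{object_name}", safe="")


def training_request(bucket, object_name):
    """Return the (verb, path) pair that starts training."""
    return ("POST", f"{API_PREFIX}/training?data={model_id(bucket, object_name)}")


def status_request(bucket, object_name):
    """Return the (verb, path) pair that checks whether training has finished."""
    return ("GET", f"{API_PREFIX}/training/{model_id(bucket, object_name)}")


print(training_request("mybucket", "mydata"))
print(status_request("mybucket", "mydata"))
```

Because training is asynchronous, a client would issue the POST once and then repeat the GET until the reply reports the model information.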

Page 29: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Step 3: Predict

Apply the trained model to make predictions on new data

POST prediction/v1.1/query/mybucket%2Fmydata/predict

{ "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}

Page 30: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Step 3: Predict

Apply the trained model to make predictions on new data

POST prediction/v1.1/query/mybucket%2Fmydata/predict

{ "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}

{ "data" : {
    "kind" : "prediction#output",
    "outputLabel" : "French",
    "outputMulti" : [
      {"label":"French", "score": x.xx},
      {"label":"English", "score": x.xx},
      {"label":"Spanish", "score": x.xx} ]}}
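The request and reply shapes above can be handled with Python's json module. The sketch builds the request body and picks the highest-scoring label from a reply. The scores in the sample reply are invented, since the slide only shows x.xx placeholders.

```python
import json


def build_request_body(text):
    """Wrap one text input in the request structure shown on the slide."""
    return json.dumps({"data": {"input": {"text": [text]}}})


def best_label(reply):
    """Return the label with the highest score in the "outputMulti" list."""
    candidates = reply["data"]["outputMulti"]
    return max(candidates, key=lambda entry: entry["score"])["label"]


# Hypothetical reply: the scores are made up for illustration.
sample_reply = json.loads("""
{"data": {"kind": "prediction#output",
          "outputLabel": "French",
          "outputMulti": [{"label": "French", "score": 0.90},
                          {"label": "English", "score": 0.06},
                          {"label": "Spanish", "score": 0.04}]}}
""")

print(build_request_body("J'aime X! C'est le meilleur"))
print(best_label(sample_reply))
```

In the slide's reply, "outputLabel" already names the top label. Reading "outputMulti" as well lets a caller apply its own threshold before acting on a prediction.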

Page 31: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Step 3: Predict

Apply the trained model to make predictions on new data

import httplib

# put new data in JSON format
params = { ... }
header = {"Content-Type" : "application/json"}

conn = httplib.HTTPConnection("www.googleapis.com")
conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)

# read the response body rather than printing the response object
print conn.getresponse().read()

Page 32: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Prediction API Capabilities

Data
• Input Features: numeric or unstructured text
• Output: up to hundreds of discrete categories

Training
• Many machine learning techniques
• Automatically selected
• Performed asynchronously

Access from many platforms:
• Web app from Google App Engine
• Apps Script (e.g. from Google Spreadsheet)
• Desktop app

Page 33: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Prediction API v1.1 - new features

• Updated Syntax
• Multi-category prediction
  o Tag entry with multiple labels
• Continuous Output
  o Finer grained prediction rankings based on multiple labels
• Mixed Inputs
  o Both numeric and text inputs are now supported

Can combine continuous output with mixed inputs

Page 34: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Google BigQuery
Interactive analysis of large datasets in Google's cloud

Page 35: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Introducing Google BigQuery

– Google's large data ad hoc analysis technology
  • Analyze massive amounts of data in seconds
– Simple SQL-like query language
– Flexible access
  • REST APIs, JSON-RPC, Google Apps Script

Page 36: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Why BigQuery?

Working with large data is a challenge

Page 37: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Many Use Cases ...

• Spam
• Trends Detection
• Web Dashboards
• Network Optimization
• Interactive Tools

Page 38: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Key Capabilities of BigQuery

• Scalable: Billions of rows
• Fast: Response in seconds
• Simple: Queries in SQL
• Web Service
  o REST
  o JSON-RPC
  o Google Apps Script

Page 39: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Using BigQuery

Another simple three step process...

1. Upload: Upload your raw data to Google Storage
2. Import: Import raw data into BigQuery table
3. Query: Perform SQL queries on table

Page 40: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Writing Queries

Compact subset of SQL
  o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;

Common functions
  o Math, String, Time, ...

Additional statistical approximations
  o TOP
  o COUNT DISTINCT
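The clause order listed above can be made concrete with a small Python helper that assembles a query string. The table and column names are hypothetical. The helper only concatenates text, so it is a sketch for trusted input and not a safe query builder.

```python
def build_query(columns, table, where=None, group_by=None, order_by=None, limit=None):
    """Assemble SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;"""
    parts = ["SELECT " + ", ".join(columns), "FROM " + table]
    if where:
        parts.append("WHERE " + where)
    if group_by:
        parts.append("GROUP BY " + group_by)
    if order_by:
        parts.append("ORDER BY " + order_by)
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts) + ";"


# Hypothetical table "revisions" with columns "title" and "language".
print(build_query(
    ["title", "COUNT(*)"],
    "revisions",
    where="language = 'en'",
    group_by="title",
    order_by="title",
    limit=10,
))
# SELECT title, COUNT(*) FROM revisions WHERE language = 'en' GROUP BY title ORDER BY title LIMIT 10;
```

Optional clauses are simply skipped, so the shortest valid query is a plain SELECT ... FROM ...;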

Page 41: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

BigQuery via REST

GET /bigquery/v1/tables/{table name}

GET /bigquery/v1/query?q={query}

Sample JSON Reply:
{
  "results": {
    "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] },
    "rows": [
      {"f":[{"v":"2949"}, ...]},
      {"f":[{"v":"5387"}, ...]},
      ...
    ]
  }
}

Also supports JSON-RPC
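The Python sketch below shows how a client might encode a query into the GET path and turn a reply into rows keyed by field id. It assumes that "fields" arrives as a plain JSON list, since the sample reply is abbreviated. The table name in the query is hypothetical.

```python
import json
from urllib.parse import quote


def query_path(sql):
    """URL-encode a query for GET /bigquery/v1/query?q={query}."""
    return "/bigquery/v1/query?q=" + quote(sql, safe="")


def rows_as_dicts(reply):
    """Pair each row's values ("f" -> "v") with the field ids, in order."""
    results = reply["results"]
    names = [field["id"] for field in results["fields"]]
    return [
        dict(zip(names, (cell["v"] for cell in row["f"])))
        for row in results["rows"]
    ]


# Assumed reply shape, modeled on the abbreviated sample in the slide.
sample_reply = json.loads("""
{"results": {"fields": [{"id": "COUNT(*)", "type": "uint64"}],
             "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}]}}
""")

print(query_path("SELECT COUNT(*) FROM t;"))
print(rows_as_dicts(sample_reply))
```

Values arrive as strings even for numeric types such as uint64, so a caller would convert them using the "type" reported for each field.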

Page 42: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Security and Privacy

Standard Google Authentication
• Client Login
• OAuth
• AuthSub

HTTPS support
• protects your credentials
• protects your data

Relies on Google Storage to manage access

Page 43: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Large Data Analysis Example

Wikimedia Revision History

Wikimedia Revision history data from:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z

Page 44: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Large Data Analysis Example

Wikimedia Revision History

Wikimedia Revision history data from:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z

Page 45: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Using BigQuery Shell

Python DB API 2.0 + B. Clapper's sqlcmd
http://www.clapper.org/software/python/sqlcmd/

Page 46: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

BigQuery from a Spreadsheet

Page 47: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

BigQuery from a Spreadsheet

Page 48: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Prediction API and BigQuery Demo: Tagger

Input Data: http://delic.io.us/chanezon
– 6000 urls, 14000 tags in 6 years

Analyze my delicious tags
– use delicious API to get all tagged urls
– cleanup data, resize (100Mb limit)
– PUT data in Google storage
– Define table
– analyze

Predict how I would tag a technology article
– input is tag,url,text
– send new url and text
– get predicted tag
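The "resize" step can be sketched in Python as trimming tag,url,text rows until the serialized CSV fits a byte budget. How the demo actually resized its data is not stated. The sketch assumes the limit applies to the UTF-8 size of the file, and the rows are invented.

```python
import csv
import io


def rows_within_limit(rows, max_bytes):
    """Keep leading rows while the encoded CSV stays within max_bytes.

    Returns the kept rows and the number of bytes they occupy.
    """
    kept = []
    used = 0
    for row in rows:
        buffer = io.StringIO()
        csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
        size = len(buffer.getvalue().encode("utf-8"))
        if used + size > max_bytes:
            break
        kept.append(row)
        used += size
    return kept, used


# Hypothetical rows in the demo's tag,url,text layout.
rows = [
    ("python", "http://example.com/a", "text a"),
    ("cloud", "http://example.com/b", "text b"),
]
kept, used = rows_within_limit(rows, max_bytes=50)
print(len(kept), used)
```

Measuring the encoded size of each row, rather than its character count, keeps the estimate correct for text with accented characters.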

Page 49: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Guessing Subreddits with Prediction API

Nick Johnson’s blog
– http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
– 42,753 submissions, for a week
– 63% accuracy, to categorize new submissions

Page 50: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Recap

• Google Storage
  o High speed data storage on Google Cloud
• Prediction API
  o Google's machine learning technology able to predict outcomes based on sample data
• BigQuery
  o Interactive analysis of very large data sets
  o Simple SQL query language access

Page 51: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

More information

• Google Storage for Developers
  o http://code.google.com/apis/storage
• Prediction API
  o http://code.google.com/apis/prediction
• BigQuery
  o http://code.google.com/apis/bigquery

Page 52: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs

Mobile Agenda for GDD

http://bit.ly/mgddbr


Page 53: GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
