DESCRIPTION
Google is expanding our storage products by introducing Google Storage for Developers. It offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as the advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.
Google Storage, BigQuery and Prediction APIs
Patrick Chanezon, Developer Advocate, Cloud
@chanezon, [email protected] Paulo, October 29th 2010
Developer DayGoogle 2010
Friday, October 29, 2010
Mobile Agenda for GDD
http://bit.ly/mgddbr
Agenda
• Google Storage for Developers
• Prediction API
• BigQuery
What is cloud computing?
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)
Google's Cloud Offerings
• SaaS: 1. Google Apps, 2. Third party Apps: Google Apps Marketplace, 3. Your Apps
• PaaS: Google App Engine
• IaaS: Google Storage, Prediction API, BigQuery
Google Storage for Developers
Store your data in Google's cloud
What Is Google Storage?
• Store your data in Google's cloud
o any format, any amount, any time
• You control access to your data
o private, shared, or public
• Access via Google APIs or 3rd party tools/libraries
Sample Use Cases
• Static content hosting, e.g. static html, images, music, video
• Backup and recovery, e.g. personal data, business records
• Sharing, e.g. share data with your customers
• Data storage for applications, e.g. used as storage backend for Android, App Engine, Cloud based apps
• Storage for Computation, e.g. BigQuery, Prediction API
Google Storage Benefits
• High Performance and Scalability: Backed by Google infrastructure
• Strong Security and Privacy: Control access to your data
• Easy to Use: Get started fast with Google & 3rd party tools
Google Storage Technical Details
• RESTful API
o Verbs: GET, PUT, POST, HEAD, DELETE
o Resources: identified by URI
o Compatible with S3
• Buckets
o Flat containers, i.e. no bucket hierarchy
• Objects
o Any type
o Size: 100 GB / object
• Access Control for Google Accounts
o For individuals and groups
• Two Ways to Authenticate Requests
o Sign request using access keys
o Web browser login
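The slide says only that requests can be signed with access keys and that the API is compatible with S3. A minimal sketch of what S3-style signing could look like, assuming an HMAC-SHA1 signature over a newline-joined canonical string. The canonical string layout, the `sign_request` helper, and the credentials are assumptions for illustration, not the documented Google Storage scheme.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Return a base64 HMAC-SHA1 signature over an S3-style canonical string.

    The field order below is an assumption borrowed from S3-style signing;
    check the Google Storage documentation for the exact rules.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical secret, bucket, and object names.
signature = sign_request("my-secret", "GET", "/mybucket/myobject",
                         "Fri, 29 Oct 2010 12:00:00 GMT")
print(signature)
```

The signature would normally travel in an authorization header next to the public access key, so the server can recompute it and compare.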
Performance and Scalability
• Objects of any type and 100 GB / Object
• Unlimited numbers of objects, 1000s of buckets
• All data replicated to multiple US data centers
• Leveraging Google's worldwide network for data delivery
• Only you can use bucket names with your domain names
• “Read-your-writes” data consistency
• Range Get
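Range GET matters for large objects because a client can fetch them in pieces. A small sketch of how the standard HTTP `Range` header values for such a chunked download might be computed; the `byte_ranges` helper and the sizes are hypothetical.

```python
def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values covering total_size bytes in order."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        yield f"bytes={start}-{end}"


# Hypothetical 10-byte object fetched 4 bytes at a time.
print(list(byte_ranges(10, 4)))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```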
Security and Privacy Features
• Key-based authentication
• Authenticated downloads from a web browser
• Sharing with individuals
• Group sharing via Google Groups
• Access control for buckets and objects
• Set Read/Write/List permissions
Tools
• Google Storage Manager
• gsutil
Google Storage usage within Google
• Haiti Relief Imagery
• USPTO data
• Partner Reporting
• Google BigQuery
• Google Prediction API
Some Early Google Storage Adopters
Google Storage - Pricing
o Storage: $0.17/GB/Month
o Network
Upload: $0.10/GB
Download: $0.30/GB APAC, $0.15/GB Americas / EMEA
o Requests
PUT, POST, LIST: $0.01 / 1,000 Requests
GET, HEAD: $0.01 / 10,000 Requests
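To see how these prices combine, here is a rough monthly estimate as a sketch. The prices come from the slide; the `monthly_cost` helper and the workload figures are hypothetical, and the free preview quota is ignored.

```python
# Prices copied from the slide (US dollars).
STORAGE_PER_GB_MONTH = 0.17
UPLOAD_PER_GB = 0.10
DOWNLOAD_PER_GB = {"APAC": 0.30, "Americas/EMEA": 0.15}
WRITE_REQUEST_PRICE = 0.01 / 1000    # PUT, POST, LIST
READ_REQUEST_PRICE = 0.01 / 10000    # GET, HEAD


def monthly_cost(stored_gb, uploaded_gb, downloaded_gb, region,
                 write_requests, read_requests):
    """Estimate one month's bill; ignores the free preview quota."""
    return (stored_gb * STORAGE_PER_GB_MONTH
            + uploaded_gb * UPLOAD_PER_GB
            + downloaded_gb * DOWNLOAD_PER_GB[region]
            + write_requests * WRITE_REQUEST_PRICE
            + read_requests * READ_REQUEST_PRICE)


# Hypothetical workload: 500 GB stored, 50 GB uploaded, 200 GB downloaded
# in the Americas, 100,000 writes and 2,000,000 reads.
cost = monthly_cost(500, 50, 200, "Americas/EMEA", 100_000, 2_000_000)
print(f"${cost:.2f}")  # $123.00
```

In this example storage dominates ($85), followed by download traffic ($30); requests add only $3.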
Google Storage - Availability
• Limited preview in US* currently
o 100GB free storage and network per account
o Sign up for wait list at http://code.google.com/apis/storage/
* Non-US preview available on case-by-case basis
Google Storage Summary
• Store any kind of data using Google's cloud infrastructure
• Easy to Use APIs
• Many available tools and libraries
o gsutil, Google Storage Manager
o 3rd party: Boto, CloudBerry, CyberDuck, JetS3t, …
Google Prediction API
Google's prediction engine in the cloud
Introducing the Google Prediction API
• Google's sophisticated machine learning technology
• Available as an on-demand RESTful HTTP web service
How does it work?
1. TRAIN: The Prediction API finds relevant features in the sample data during training.
"english" The quick brown fox jumped over the lazy dog.
"english" To err is human, but to really foul things up you need a computer.
"spanish" No hay mal que por bien no venga.
"spanish" La tercera es la vencida.
2. PREDICT: The Prediction API later searches for those features during prediction.
? To be or not to be, that is the question.
? La fe mueve montañas.
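The train-then-predict idea can be illustrated with a toy classifier that treats lowercase words as features and picks the label with the most overlap. This is only a sketch for intuition: the slides do not describe the Prediction API's actual learning method, and the `tokenize`, `train`, and `predict` helpers are made up for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word features."""
    return re.findall(r"\w+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    features = defaultdict(Counter)
    for label, text in examples:
        features[label].update(tokenize(text))
    return features


def predict(features, text):
    """Return the label whose training words overlap most with the text."""
    words = tokenize(text)
    scores = {label: sum(counts[w] for w in words)
              for label, counts in features.items()}
    return max(scores, key=scores.get)


training_data = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]
model = train(training_data)
print(predict(model, "To be or not to be, that is the question."))  # english
print(predict(model, "La fe mueve montañas."))                      # spanish
```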
A virtually endless number of applications...
• Customer Sentiment
• Transaction Risk
• Species Identification
• Message Routing
• Legal Docket Classification
• Suspicious Activity
• Work Roster Assignment
• Recommend Products
• Political Bias
• Uplift Marketing
• Email Filtering
• Diagnostics
• Inappropriate Content
• Career Counseling
• Churn Prediction
... and many more ...
A Prediction API Example
Automatically categorize and respond to emails by language
• Customer: ACME Corp, a multinational organization
• Goal: Respond to customer emails in their language
• Data: Many emails, tagged with their languages
• Outcome: Predict language and respond accordingly
Using the Prediction API
A simple three step process...
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Step 1: Upload
Upload your training data to Google Storage
• Training data: outputs and input features
• Data format: comma separated value format (CSV), result in first column
"english","To err is human, but to really ..."
"spanish","No hay mal que por bien no venga."
...
Upload to Google Storage
gsutil cp ${data} gs://yourbucket/${data}
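A training file in this shape, with the label in the first column and every field quoted, could be produced with Python's standard `csv` module. This is a sketch: the `write_training_csv` helper is hypothetical, and the rows reuse the sample sentences from the slides.

```python
import csv
import io


def write_training_csv(rows, out):
    """Write (label, text) pairs as CSV with the label in the first column."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in rows:
        writer.writerow([label, text])


rows = [
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
]
buffer = io.StringIO()
write_training_csv(rows, buffer)
print(buffer.getvalue(), end="")
```

Writing to a real file instead of `io.StringIO` gives the `${data}` file that the `gsutil cp` command uploads.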
Step 2: Train
Create a new model by training on data
To train a model:
POST prediction/v1.1/training?data=mybucket%2Fmydata
Training runs asynchronously. To see if it has finished:
GET prediction/v1.1/training/mybucket%2Fmydata
{"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}
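The `mybucket%2Fmydata` part of both paths is the bucket and object name with the slash percent-encoded. A short sketch of building the two paths, assuming the v1.1 layout shown on the slide; the `training_paths` helper is hypothetical.

```python
from urllib.parse import quote


def training_paths(bucket, obj, version="v1.1"):
    """Return the (start-training, check-status) paths for a data object."""
    data_id = quote(f"{bucket}/{obj}", safe="")  # "/" becomes "%2F"
    start = f"prediction/{version}/training?data={data_id}"
    status = f"prediction/{version}/training/{data_id}"
    return start, status


start, status = training_paths("mybucket", "mydata")
print("POST", start)   # POST prediction/v1.1/training?data=mybucket%2Fmydata
print("GET ", status)  # GET  prediction/v1.1/training/mybucket%2Fmydata
```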
Step 3: Predict
Apply the trained model to make predictions on new data
POST prediction/v1.1/query/mybucket%2Fmydata/predict
{ "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}
Response:
{ "data" : { "kind" : "prediction#output", "outputLabel":"French", "outputMulti" :[ {"label":"French", "score": x.xx}, {"label":"English", "score": x.xx}, {"label":"Spanish", "score": x.xx}]}}
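A client needs to build the request body and pick the best label out of the reply. The sketch below assumes the response shape shown on the slide; the scores in `sample_response` are invented because the slide elides them, and the `build_query` and `best_label` helpers are hypothetical.

```python
import json


def build_query(text):
    """Build the JSON request body shown on the slide for one text input."""
    return json.dumps({"data": {"input": {"text": [text]}}})


def best_label(response_body):
    """Return (label, score) for the highest scoring entry in outputMulti."""
    output = json.loads(response_body)["data"]
    top = max(output["outputMulti"], key=lambda item: item["score"])
    return top["label"], top["score"]


# Hypothetical response: the scores are made up for illustration.
sample_response = json.dumps({"data": {
    "kind": "prediction#output",
    "outputLabel": "French",
    "outputMulti": [
        {"label": "French", "score": 0.9},
        {"label": "English", "score": 0.07},
        {"label": "Spanish", "score": 0.03},
    ]}})

print(build_query("J'aime X! C'est le meilleur"))
print(best_label(sample_response))  # ('French', 0.9)
```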
Step 3: Predict
Apply the trained model to make predictions on new data
import http.client

# put new data in JSON format (a JSON string such as the request body above)
params = { ... }
header = {"Content-Type" : "application/json"}

conn = http.client.HTTPConnection("www.googleapis.com")
conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)

# read and print the response body
print(conn.getresponse().read())
Prediction API Capabilities
Data
• Input Features: numeric or unstructured text
• Output: up to hundreds of discrete categories
Training
• Many machine learning techniques
• Automatically selected
• Performed asynchronously
Access from many platforms:
• Web app from Google App Engine
• Apps Script (e.g. from Google Spreadsheet)
• Desktop app
Prediction API v1.1 - new features
• Updated Syntax
• Multi-category prediction
o Tag entry with multiple labels
• Continuous Output
o Finer grained prediction rankings based on multiple labels
• Mixed Inputs
o Both numeric and text inputs are now supported
Can combine continuous output with mixed inputs
Google BigQuery
Interactive analysis of large datasets in Google's cloud
Introducing Google BigQuery
– Google's large data ad hoc analysis technology
• Analyze massive amounts of data in seconds
– Simple SQL-like query language
– Flexible access
• REST APIs, JSON-RPC, Google Apps Script
Why BigQuery?
Working with large data is a challenge
Many Use Cases ...
• Spam
• Trends Detection
• Web Dashboards
• Network Optimization
• Interactive Tools
Key Capabilities of BigQuery
• Scalable: Billions of rows
• Fast: Response in seconds
• Simple: Queries in SQL
• Web Service
o REST
o JSON-RPC
o Google Apps Script
Using BigQuery
Another simple three step process...
1. Upload: Upload your raw data to Google Storage
2. Import: Import raw data into BigQuery table
3. Query: Perform SQL queries on table
Writing Queries
Compact subset of SQL
o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;
Common functions
o Math, String, Time, ...
Additional statistical approximations
o TOP
o COUNT DISTINCT
BigQuery via REST
GET /bigquery/v1/tables/{table name}
GET /bigquery/v1/query?q={query}
Sample JSON Reply:
{ "results": {
    "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] },
    "rows": [
      {"f":[{"v":"2949"}, ...]},
      {"f":[{"v":"5387"}, ...]},
      ... ] } }
Also supports JSON-RPC
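Because the query travels in the `q` parameter, it has to be URL-encoded, and the reply pairs each row's `f`/`v` cells with the field list. A sketch of both steps follows. The table name `mytable`, the sample reply, and the helpers are hypothetical, and `fields` is assumed to be a plain JSON list, since the reply on the slide is abbreviated.

```python
import json
from urllib.parse import urlencode


def query_path(sql):
    """Return the REST path for a query, with the SQL URL-encoded into q."""
    return "/bigquery/v1/query?" + urlencode({"q": sql})


def rows_as_dicts(reply_body):
    """Pair each row's values with the field ids from the reply."""
    results = json.loads(reply_body)["results"]
    names = [field["id"] for field in results["fields"]]
    return [dict(zip(names, (cell["v"] for cell in row["f"])))
            for row in results["rows"]]


# Hypothetical reply modeled on the sample above, with the elided parts dropped.
sample_reply = json.dumps({"results": {
    "fields": [{"id": "COUNT(*)", "type": "uint64"}],
    "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}],
}})

print(query_path("SELECT COUNT(*) FROM mytable;"))
print(rows_as_dicts(sample_reply))
```

Note that the values arrive as strings in the sample reply, so a caller would convert them according to the field's `type`.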
Security and Privacy
Standard Google Authentication
• Client Login
• OAuth
• AuthSub
HTTPS support
• protects your credentials
• protects your data
Relies on Google Storage to manage access
Large Data Analysis Example
Wikimedia Revision History
Wikimedia Revision history data from:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
Using BigQuery Shell
Python DB API 2.0 + B. Clapper's sqlcmd
http://www.clapper.org/software/python/sqlcmd/
BigQuery from a Spreadsheet
Prediction API and BigQuery Demo: Tagger
Input Data: http://delic.io.us/chanezon
– 6000 urls, 14000 tags in 6 years
Analyze my delicious tags
– use delicious API to get all tagged urls
– cleanup data, resize (100Mb limit)
– PUT data in Google storage
– Define table
– analyze
Predict how I would tag a technology article
– input is tag,url,text
– send new url and text
– get predicted tag
Guessing Subreddits with Prediction API
Nick Johnson’s blog
– http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
– 42,753 submissions, for a week
– 63% accuracy, to categorize new submissions
Recap
• Google Storage
o High speed data storage on Google Cloud
• Prediction API
o Google's machine learning technology able to predict outcomes based on sample data
• BigQuery
o Interactive analysis of very large data sets
o Simple SQL query language access
More information
• Google Storage for Developers
o http://code.google.com/apis/storage
• Prediction API
o http://code.google.com/apis/prediction
• BigQuery
o http://code.google.com/apis/bigquery