Predicting the future with Google Prediction API Talks #32


This is my presentation from Talks #32 - http://www.softbinator.ro/events/talks-32/ The original slides are in Romanian and English. For discussion, you can join the Facebook group Talks by Softbinator: https://www.facebook.com/groups/talks.by.softbinator/


Page 1: Predicting the future with Google Prediction API

Predicting the future with Google Prediction API

Talks #32

Page 2: Predicting the future with Google Prediction API
Page 3: Predicting the future with Google Prediction API

RESTful API and flexible input:

• Asynchronous cloud-based training, automatic model selection and tuning, and the ability to add training data on the fly.
• Numeric or text input that can output hundreds of discrete categories or continuous values.

Page 4: Predicting the future with Google Prediction API

Great, so what do we do now?

The same thing we do every night, Pinky: TRY TO TAKE OVER THE WORLD!

Page 5: Predicting the future with Google Prediction API

Does that take any money?

• Well… It’s free. Within reason :D
• 1.0 requests/second/user
• 100 requests/day
• 5 MB of training data per day
• 100 streaming updates per day
• Lifetime cap (20k predictions), so after 20k predictions you have to start paying...
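A minimal client-side sketch, in Python, of staying under the free limits listed above (1 request per second, 100 requests per day). The `FreeTierBudget` class and its method names are hypothetical helpers, not part of any Google library, and the clock is passed in so the logic can be checked without waiting.

```python
import time


class FreeTierBudget:
    """Tracks the free-tier limits from the slide: 1 request/second, 100 requests/day.

    Hypothetical helper for illustration; not part of any Google client library.
    """

    def __init__(self, per_second=1.0, per_day=100, clock=time.monotonic):
        self.min_interval = 1.0 / per_second  # seconds required between two requests
        self.per_day = per_day
        self.clock = clock
        self.last_request = None  # time of the most recent allowed request
        self.day_start = None     # start of the current 24-hour window
        self.used_today = 0

    def try_acquire(self):
        """Return True and record the request if it fits both limits, else False."""
        now = self.clock()
        # Open a new 24-hour window on first use or once the old one has expired.
        if self.day_start is None or now - self.day_start >= 86400:
            self.day_start = now
            self.used_today = 0
        if self.used_today >= self.per_day:
            return False
        if self.last_request is not None and now - self.last_request < self.min_interval:
            return False
        self.last_request = now
        self.used_today += 1
        return True
```

Calling `try_acquire()` before each prediction request and skipping (or sleeping) when it returns `False` keeps a script inside the quota. At 100 requests per day, the 20k lifetime cap would last 200 days.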

Page 6: Predicting the future with Google Prediction API

Great, so what do I get for my MONEY? X(

• $10, with 10k predictions per month free
• 10k free streaming updates
• Max training upload of 2.5 GB (via Google Cloud Storage)

Page 7: Predicting the future with Google Prediction API

How do I get started?

• Glad you asked!

• You need to create a new project in the Google API Console and enable:
• Google Prediction API
• Google Cloud Storage API (requires billing ON, meaning it wants your card)

Page 8: Predicting the future with Google Prediction API
Page 9: Predicting the future with Google Prediction API

Great, any documentation to read?

• Yes!
• But it totally sucks. (Everything under Tools and Resources has broken links…)
• But the Hello World example works. Yippee!

Page 10: Predicting the future with Google Prediction API

Great, I got things done, now what?

• Now we train on the CSV, if we have one.
• If not, we build it.

Page 11: Predicting the future with Google Prediction API

Great, what should my CSV look like?

"like","Am castigat la loto si vreau sa dau tuturor hosting gratuity forever","bucuresti","loto"
"dislike","Doi caini maidanezi au muscat 3 pisici clonate si au murit.","bucuresti","venim"

[output], [feature1], [feature2], [feature3]

Output = output. Hahaha.

Feature = input. It can be numeric / text / whatever.

And NO HEADERS in the CSV.

And at most 2.5 GB.

Eh, if you are on the free tier of Google Prediction, 4 MB to be exact.
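One way to produce a file in this shape is to let a CSV library do the quoting. The sketch below uses Python's standard `csv` module with `QUOTE_ALL` so every field is wrapped in quotation marks and no header row is written; the `to_training_csv` function name is an assumption made for illustration.

```python
import csv
import io


def to_training_csv(rows):
    """Serialize (output, feature1, feature2, ...) rows as fully quoted CSV, no header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# The two sample rows from the slide: output first, then the features.
rows = [
    ("like", "Am castigat la loto si vreau sa dau tuturor hosting gratuity forever", "bucuresti", "loto"),
    ("dislike", "Doi caini maidanezi au muscat 3 pisici clonate si au murit.", "bucuresti", "venim"),
]
training_csv = to_training_csv(rows)
print(training_csv, end="")
```

Quotation marks inside a field are doubled by the writer, so tweet text containing quotes does not break the row structure.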

Page 12: Predicting the future with Google Prediction API

Great, can you show us one?

Page 13: Predicting the future with Google Prediction API

That’s one ugly Excel, not a CSV

NEVER USE EXCEL! It does not output *content**quotation_mark**comma**quotation_mark**content*

Uploading the file to Google Drive and exporting it from their Spreadsheet does not work either.

So, go for OpenOffice!

Page 14: Predicting the future with Google Prediction API

So? Now what?

• Upload the CSV to Google Cloud Storage.

Page 15: Predicting the future with Google Prediction API
Page 16: Predicting the future with Google Prediction API

500 training examples = 18 sec

Page 17: Predicting the future with Google Prediction API
Page 18: Predicting the future with Google Prediction API
Page 19: Predicting the future with Google Prediction API

476 instances? Shouldn’t it be 500?

Page 20: Predicting the future with Google Prediction API

Let’s see some fresh meat. I mean tweet. Lol

Page 21: Predicting the future with Google Prediction API

So, how well does Google Prediction API predict?

• A guy wanted to run some tests / examples:
• http://blog.notdot.net/2010/06/Trying-out-the-new-Prediction-API
• Training on movie/book reviews to try and predict the score given based on the text
• Training on product descriptions to try and predict their rating
• Training on Reddit submissions to try and predict the subreddit a new submission belongs in

Page 22: Predicting the future with Google Prediction API

Guessing subreddits with the Prediction API

• He had: 75MB of JSON-encoded data, comprising 72,986 submissions
• He picked the 20 subreddits with the most submissions in that time period
• This subset made up 42,753 submissions, or about 58% of the original.
• Submissions were randomly split into either the training set (98%) or the validation set (2%):
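A random 98% / 2% split can be sketched as follows. This assumes an independent random draw per submission; the original blog post's method may differ, and the integers below are placeholders standing in for the real submissions.

```python
import random


def split_submissions(submissions, validation_fraction=0.02, seed=0):
    """Randomly assign each submission to the training or the validation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    training, validation = [], []
    for item in submissions:
        if rng.random() < validation_fraction:
            validation.append(item)
        else:
            training.append(item)
    return training, validation


# Placeholder data: integers standing in for the 42,753 submissions.
training, validation = split_submissions(range(42753))
print(len(training), len(validation))
```

With a per-item draw the validation set is only approximately 2% of the data, so its exact size varies with the seed.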

Page 23: Predicting the future with Google Prediction API

Reddit              Submissions
reddit.com          14578
pics                4157
AskReddit           3375
reportthespammers   3258
politics            3162
funny               2176
WTF                 1773
gaming              1367
worldnews           938
videos              849
atheism             834
Music               833
technology          732
trees               703
comics              639
nsfw                611
circlejerk          600
news                567
environment         537
DoesAnybodyElse     537

Page 24: Predicting the future with Google Prediction API

After training, Google estimated a success rate of 61%.

So? How did it do?

484 of 857 predicted correctly: 56%, not far off the system's own estimate.
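The comparison is simple arithmetic, shown here as a small Python check. The variable names are illustrative only.

```python
correct = 484       # validation submissions predicted correctly
total = 857         # validation submissions in total
estimated = 0.61    # success rate the service reported after training

measured = correct / total
gap = estimated - measured
print(f"measured {measured:.1%}, estimated {estimated:.0%}, gap {gap:.1%}")
```

The measured rate comes out at roughly 56.5%, about 4.5 percentage points below the estimate.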

Page 25: Predicting the future with Google Prediction API

Where’s the problem?

• People are the problem, not Google Prediction API
• The users assigned the categories incorrectly. NEVER TRUST THE USER!

Page 26: Predicting the future with Google Prediction API

Anyway, back to the matter at hand

• Data Harvesting (from Twitter)
• Phirehose - https://github.com/fennb/phirehose - a PHP interface to the Twitter Streaming API
• What have I gathered?

Page 27: Predicting the future with Google Prediction API
Page 28: Predicting the future with Google Prediction API

1.3 GB of Twitter #bigdata harvested. Hehe

Page 29: Predicting the future with Google Prediction API

I took 500 examples (but the more, the better)

I put them into Excel and split them into 3 buckets (0, 1, 2)

0 = Dislike = I don't like it
1 = Fav = I only fav it
2 = Reshare = I both fav and retweet it

Save to CSV, upload, TRAIN.

Page 30: Predicting the future with Google Prediction API
Page 31: Predicting the future with Google Prediction API

So, how does this help us?

The interesting part is that, although we have 3 values (0, 1 or 2), it will return a float between 0 and 2, so a result of 1.563212 is entirely possible!
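Because the score is continuous, it has to be mapped back to a decision. A minimal sketch in Python, assuming cutoffs halfway between the buckets (0.5 and 1.5); both the thresholds and the `action_for_score` name are assumptions, not something the talk specifies.

```python
def action_for_score(score, fav_threshold=0.5, reshare_threshold=1.5):
    """Map a score in [0, 2] to the action of the nearest bucket.

    The thresholds are assumed midpoints between buckets 0, 1 and 2.
    """
    if score >= reshare_threshold:
        return "fav+retweet"  # closest to bucket 2
    if score >= fav_threshold:
        return "fav"          # closest to bucket 1
    return "ignore"           # closest to bucket 0


print(action_for_score(1.563212))  # prints "fav+retweet"
```

Stricter cutoffs would make the bot act less often, which may be preferable when the training set is small.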

Page 32: Predicting the future with Google Prediction API

What did I use for the cool Twitter bot follower-gathering application, blah blah?

• The PHP library for Google Prediction, specifically serviceAccount.php
• It's broken!

$result = $service->trainedmodels->predict($id, $input);

It has to be:

$result = $service->trainedmodels->predict($project, $id, $input);

Page 33: Predicting the future with Google Prediction API

What else?

• Twitter API Exchanger - https://github.com/J7mbo/twitter-api-php

Page 34: Predicting the future with Google Prediction API

So what exactly do we do?

• From the database, we take the latest tweet
• We see what score it gets
• If it gets a good score, we fav / retweet it.
• That's it.
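The steps above can be sketched as a single function, written here in Python for brevity. The four callables are hypothetical stand-ins for the database lookup, the Prediction API call and the Twitter API calls, and the `good_score` cutoff of 1.5 is an assumption.

```python
def process_latest_tweet(fetch_latest_tweet, predict_score, fav, retweet, good_score=1.5):
    """Score the latest stored tweet and fav / retweet it if the score is good.

    All four callables are hypothetical stand-ins; good_score is an assumed cutoff.
    """
    tweet = fetch_latest_tweet()
    if tweet is None:
        return None  # nothing new in the database
    score = predict_score(tweet)
    if score >= good_score:
        fav(tweet)
        retweet(tweet)
    return score


# Usage with stubs in place of the real database, model and Twitter client.
actions = []
score = process_latest_tweet(
    fetch_latest_tweet=lambda: "some tweet text",
    predict_score=lambda tweet: 1.8,  # stubbed prediction
    fav=lambda tweet: actions.append(("fav", tweet)),
    retweet=lambda tweet: actions.append(("retweet", tweet)),
)
print(score, actions)
```

In a long-running script, this step would be called in a loop with a pause between iterations.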

Page 35: Predicting the future with Google Prediction API

Huge recap?

• New Google Project
• Enable Google Prediction & Google Cloud Storage
• Upload your training CSV
• Make Predictions via API EXPLORER
• Download PHP Library for Google Prediction & Twitter Library
• Fix Google Library
• Put it all together in one PHP file
• Run it, put a sleep, make it run forever lol.