DESCRIPTION
A presentation I gave at geekcamp 09 (http://geekcamp.my) collating some notes I have been gathering on designing and implementing web APIs / services. Note: This presentation draws on a lot of existing content online. I have attempted to ensure that the sources' licences allow reuse and that all sources are duly attributed. If any attribution is missing or any content is misused, please contact me and I will rectify it.
Web API Do's and Don'ts
Mohanaraj Gopala Krishnan
Geekcamp '09
http://www.flickr.com/photos/clarkece/3504299410/
06/15/09 2
Questions for you
Anyone built web APIs? What for? What type? SOAP, XML-RPC
Anyone used web APIs? Which ones?
What type of clients? GUI, CLI, web browser (mobile/desktop), RIA platforms
Web API / Service simple taxonomy
A way for a client to manipulate data on a server via HTTP
RESTful
Uses HTTP as an application protocol
RPC
Uses HTTP as a transport protocol – reimplements many of the features on top
RPC-Rest hybrids
Applies RESTful principles some of the time
Read only, read write
Type of clients that use the web API
Caveat emptor
I am no expert; these are some ideas and lessons I would like to share
It's all about balance
What “should” and what “can” will sometimes be at odds
Making conscious choices
Biased towards RESTful approach vs RPC approach
I believe it's better
Authentication and authorization
Authentication: who are you?
Authorization: what can you do?
Do use a form of delegated authorization
Allows use of existing website login system
Authentication handled by existing system – a token passed back to client
Future client requests include token in some form
Ensures authentication credentials don't need to be stored on the client
All you store is some form of token not the actual authentication credentials
Risk mitigation
How? Use OAuth
http://www.hueniverse.com/hueniverse/2007/12/where-are-your.html
Limitations
Complicated compared with Basic auth or a cookie-based system
Use a standard like OAuth – libraries for both client and server readily available
There are usability issues: the “dance” confuses users
True – usability is being studied and worked on
Users are getting more familiar with the idea through education
"We have enough fast, insecure systems. We don't need another" -- Bruce Schneier and Niels Ferguson, from "Practical Cryptography"
Image from: http://geekz.co.uk/schneierfacts/fact/77
Do require signing of API requests
Each request carries a checksum to guard against tampering
A hash is generated from the request parameters and a secret key, and is included in the request
If the hash also covers a timestamp and a nonce, it mitigates replay attacks as well
Not necessary to resort to HTTPS all the time
Again – free with OAuth :)
The hash is included either in the URL or in the Authorization header (preferred)
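A minimal sketch of the signing idea above, assuming HMAC-SHA256 over the sorted, URL-encoded parameters. This is a simplification for illustration and is not OAuth's actual signature base string; the names sign_request, verify_request, timestamp, nonce and the 300 second window are all assumptions.

```python
import hashlib
import hmac
import secrets
import time
from urllib.parse import urlencode


def _digest(params, secret_key):
    # Sort parameters so client and server hash the same canonical string.
    canonical = urlencode(sorted(params.items()))
    return hmac.new(secret_key, canonical.encode(), hashlib.sha256).hexdigest()


def sign_request(params, secret_key, timestamp=None, nonce=None):
    """Return a copy of params with timestamp, nonce and api_sig added."""
    signed = dict(params)
    signed["timestamp"] = str(int(time.time()) if timestamp is None else timestamp)
    signed["nonce"] = nonce or secrets.token_hex(8)
    signed["api_sig"] = _digest(signed, secret_key)
    return signed


def verify_request(params, secret_key, seen_nonces, max_age=300, now=None):
    """Check the signature first, then reject stale or replayed requests."""
    received = dict(params)
    signature = received.pop("api_sig", "")
    if not hmac.compare_digest(signature, _digest(received, secret_key)):
        return False  # parameters were tampered with, or the key is wrong
    now = time.time() if now is None else now
    if abs(now - int(received.get("timestamp", 0))) > max_age:
        return False  # outside the accepted time window
    if received.get("nonce") in seen_nonces:
        return False  # this nonce was already used: a replay
    seen_nonces.add(received.get("nonce"))
    return True
```

The server has to remember nonces only for as long as the timestamp window lasts, which keeps the replay check cheap.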
Limitations
Unless using XHR, browsers don't easily allow setting headers, so the hash must go in the URI query string
The hash then becomes cache busting, as the URL keeps changing
Complicates the client and server implementation
Use OAuth, message signing is central to its operation
Most people salt their hash. Bruce salt and peppers his.
http://geekz.co.uk/schneierfacts/fact/671
Do put users in control of their authorizations
Ensure that users can view and revoke authorizations at any point
Ensure that the authorization level is reasonably fine-grained
Client indicates level of access during initial authorization process, user must approve and can tweak later
Not part of the OAuth standard – but easily extended
Google implements this via a scope parameter
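To make the points above concrete, here is a small in-memory sketch of scoped grants that a user can list and revoke. The class name, method names and scope strings such as "tasks:read" are hypothetical; a real service would persist grants and tie them into its OAuth implementation.

```python
import secrets


class AuthorizationStore:
    """In-memory record of which client holds which token, with what scopes."""

    def __init__(self):
        self._grants = {}  # token -> (user, client, set of scopes)

    def grant(self, user, client, scopes):
        """Issue a token after the user approves the requested scopes."""
        token = secrets.token_urlsafe(16)
        self._grants[token] = (user, client, set(scopes))
        return token

    def allows(self, token, scope):
        """True only if the token exists and was granted this scope."""
        grant = self._grants.get(token)
        return grant is not None and scope in grant[2]

    def list_for_user(self, user):
        """What the user would see on a 'manage authorizations' page."""
        return sorted((c, sorted(s)) for u, c, s in self._grants.values() if u == user)

    def revoke(self, user, client):
        """Drop every token this user granted to this client."""
        self._grants = {t: g for t, g in self._grants.items() if g[:2] != (user, client)}
```

Because the client only ever holds a token, revoking it does not require the user to change their password.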
Calling the API
How do clients access your API?
How do you define:
What is being operated on? (noun)
What is the operation? (verb)
Don't treat all HTTP methods the same
All methods are treated as “send this over”
You end up encoding the method (verb) into the URL
Don't treat all HTTP methods the same
The methods have semantics – choose the right one
GET, HEAD – safe
No side effects that the client is responsible for
PUT, DELETE – idempotent
Repeated calls – no different from the first
POST – neither safe nor idempotent
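The classification above can be written down directly, and it is what lets a client decide whether a timed-out request may be repeated. This is a rough sketch: the helper names and the callable-based send interface are assumptions, not part of any particular HTTP library.

```python
SAFE_METHODS = {"GET", "HEAD"}
# Every safe method is also idempotent; PUT and DELETE are idempotent only.
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def is_safe(method):
    return method.upper() in SAFE_METHODS


def can_retry_after_timeout(method):
    return method.upper() in IDEMPOTENT_METHODS


def send_with_retry(method, send, attempts=3):
    """Call send(); repeat on timeout only when the method is idempotent."""
    tries = attempts if can_retry_after_timeout(method) else 1
    for attempt in range(tries):
        try:
            return send()
        except TimeoutError:
            if attempt == tries - 1:
                raise  # out of attempts, or POST: let the caller decide
```

A POST that times out is surfaced to the caller immediately, since repeating it could create a second resource.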
To get a list of incomplete tasks
Currently
HTTP method: GET/POST
http://api.rememberthemilk.com/services/rest/?format=json&auth_token=...&filter=status%3Aincomplete&list_id=123&api_sig=...&api_key=...&method=rtm.tasks.getList
Instead
HTTP method: GET
http://api.rememberthemilk.com/services/rest/users/mohangk/lists/123/tasks.json?filter=status%3Aincomplete&auth_token=...&api_sig=...&api_key=...
To add a task to a list
Currently
HTTP method: GET/POST
http://api.rememberthemilk.com/services/rest/?format=json&auth_token=...&filter=status%3Aincomplete&list_id=123&api_sig=...&api_key=...&method=rtm.tasks.add&name=Get+geek+camp+presentation+done!
Instead
HTTP method: POST
http://api.rememberthemilk.com/services/rest/users/mohangk/lists/123/tasks.json?auth_token=...&api_sig=...&api_key=...
Request body: name=Get+geek+camp+presentation+done!
To update a task on a list
Currently
HTTP method: GET/POST
http://api.rememberthemilk.com/services/rest/?format=json&auth_token=...&filter=status%3Aincomplete&list_id=123&task_id=789&api_sig=...&api_key=...&method=rtm.tasks.setName&name=Get+geek+camp+presentation+done+TONIGHT!
Instead
HTTP method: PUT
http://api.rememberthemilk.com/services/rest/users/mohangk/lists/123/tasks/789?auth_token=...&api_sig=...&api_key=...
Request body: name=Get+geek+camp+presentation+done+TONIGHT!
To delete a task on a list
Currently
HTTP method: GET/POST
http://api.rememberthemilk.com/services/rest/?format=json&auth_token=...&filter=status%3Aincomplete&list_id=123&task_id=789&api_sig=...&api_key=...&method=rtm.tasks.delete
Instead
HTTP method: DELETE
http://api.rememberthemilk.com/services/rest/users/mohangk/lists/123/tasks/789?auth_token=...&api_sig=...&api_key=...
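The four "Instead" examples share one pattern: the noun lives in the URL and the verb is the HTTP method. A sketch of that mapping follows. The URL layout is the one proposed on these slides, not Remember The Milk's real API, and the function name task_request is hypothetical; authentication parameters are left out for brevity.

```python
from urllib.parse import urlencode

BASE = "http://api.rememberthemilk.com/services/rest"


def task_request(action, user, list_id, task_id=None, **fields):
    """Return (method, url, body) for an operation on a list's tasks."""
    collection = f"{BASE}/users/{user}/lists/{list_id}/tasks"
    if action == "list":
        query = urlencode(fields)  # e.g. a filter goes in the query string
        return "GET", f"{collection}.json" + (f"?{query}" if query else ""), None
    if action == "add":
        return "POST", f"{collection}.json", urlencode(fields)
    if action == "update":
        return "PUT", f"{collection}/{task_id}", urlencode(fields)
    if action == "delete":
        return "DELETE", f"{collection}/{task_id}", None
    raise ValueError(f"unknown action: {action}")
```

Note that no action name appears in any URL: the same task URL serves both update and delete, distinguished only by the method.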
Simplifications
Some Remember The Milk artifacts have been dropped: taskseries, timelines
POST and PUT would normally send the complete representation; here it is reduced to the name parameter
Why bother?
Allows for reliable client requests over an unreliable network
Recovering from timeouts based on the HTTP method
Clients can safely prefetch data
Google web accelerator
Leverage the fact that HTTP clients/ libraries inherently understand how to use your resource
Not a new concept
Important when you are not in control of the client
Heterogeneous systems
E.g. browsers
Provide UI-specific handling: warn before trying to re-POST
Do not cache POST
Do not make POSTs bookmarkable
They understand the semantics of POST
E.g. Python's httplib2
Comes with built-in private caching features, like a browser
Why bother?
Caching
Shared caches – corporate proxies, CDNs – Don't cache POSTs
Private caches – Wrong semantics – will lose out on conditional GETs
Bots
Don't click on links they aren't supposed to
http://thedailywtf.com/Articles/The_Spider_of_Doom.aspx
Limitations
HTML4 only does GET & POST
Tunnel PUT and DELETE over POST via a hidden variable
Rails does this using the _method parameter
XHR implements PUT, DELETE, HEAD
HTML5 implements them as well
Blocked by some overzealous corporate proxies
Happening less as awareness is rising
http://thedailywtf.com/Articles/The_Spider_of_Doom.aspx
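The tunnelling workaround can be sketched as a small server-side helper that works out the effective method before routing. The _method field name follows the Rails convention mentioned above; the function name and the dict-based form argument are assumptions.

```python
TUNNELLED_METHODS = {"PUT", "DELETE"}


def effective_method(http_method, form):
    """Honour a hidden _method field, but only when it arrives over POST."""
    override = form.get("_method", "").upper()
    if http_method.upper() == "POST" and override in TUNNELLED_METHODS:
        return override
    # Never let a GET be upgraded: that would make a safe method unsafe.
    return http_method.upper()
```

Restricting the override to POST keeps GET safe, so crawlers and prefetchers still cannot trigger a delete by following a link.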
Representations
What do you send to and receive from clients?
What is the view of your data/resource?
What the client receives when you do a GET
What the client sends when doing a PUT or POST
Do give your representation serious thought
JSON or XML tend to be the first picks
http://api.rememberthemilk.com/services/rest/?format=json&auth_token=...&filter=status%3Aincomplete&list_id=123&api_sig=...&api_key=...&method=rtm.tasks.getList
{
  "rsp": {
    "stat": "ok",
    "tasks": {
      "list": {
        "id": "588855",
        "taskseries": {
          "id": "41587591",
          "created": "2009-05-29T12:32:35Z",
          "modified": "2009-05-29T12:33:03Z",
          "name": "Get the geekcammp presentation done!",
          "source": "js",
          "task": {
            "id": "59163255",
            "due": "2009-05-28T16:00:00Z",
            "has_due_time": "0",
            "added": "2009-05-29T12:32:35Z",
            "completed": "",
            "deleted": "",
            "priority": "N",
            "postponed": "10",
            "estimate": "18 hours"
          }
        }
      }
    }
  }
}
Do give your representation serious thought
XML serialisation
The format parameter value is “rest”, but it really has nothing to do with REST
<?xml version="1.0" encoding="utf-16"?>
<rsp stat="ok">
  <tasks>
    <list id="588855">
      <taskseries id="41587591" created="2009-05-29T12:32:35Z" modified="2009-05-29T12:47:28Z" name="Get the geekcammp presentation done!" source="js" url="" location_id="">
        <tags />
        <participants />
        <notes />
        <task id="59163255" due="2009-05-29T16:00:00Z" has_due_time="0" added="2009-05-29T12:32:35Z" completed="" deleted="" priority="N" postponed="0" estimate="18 hours" />
      </taskseries>
    </list>
  </tasks>
</rsp>
http://api.rememberthemilk.com/services/rest/?format=rest&auth_token=...&filter=status%3Aincomplete&list_id=588855&api_sig=...&api_key=...&method=rtm.tasks.getList
Do give your representation serious thought
“Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”).”
http://roy.gbiv.com/untangled/2008/paper-tigers-and-hidden-dragons
http://www.flickr.com/photos/x180/1403558269/
Different representation options
Returning an image to represent data points that have changed
“Any black-and-white GIF or PNG image is just a sparse bit array, and they have the nice side-effect of being easy to visualize. We can define our representation of 1 million Flickr users to be a 1000×1000 pixel black-and-white image”
http://roy.gbiv.com/untangled/2008/paper-tigers-and-hidden-dragons
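As a rough illustration of the bit-array idea in the quote, the sketch below keeps one bit per user in a 1000×1000 grid and serialises it as a binary PBM image. PBM is swapped in here only because it can be written with the standard library; the quote itself talks about GIF or PNG. The function names and the index-to-pixel mapping are assumptions.

```python
WIDTH = HEIGHT = 1000  # one pixel per user, 1 million users in total


def make_bitmap():
    """All-zero bit array: no user has changed yet."""
    return bytearray(WIDTH * HEIGHT // 8)


def mark_changed(bitmap, index):
    """Set the bit for user number `index` (row-major, most significant bit first)."""
    bitmap[index // 8] |= 0x80 >> (index % 8)


def has_changed(bitmap, index):
    return bool(bitmap[index // 8] & (0x80 >> (index % 8)))


def to_pbm(bitmap):
    """Serialise as a binary PBM (P4) image; a set bit is a black pixel."""
    # 1000 is a multiple of 8, so rows need no padding in the P4 format.
    return b"P4\n%d %d\n" % (WIDTH, HEIGHT) + bytes(bitmap)
```

The whole representation is 125,000 bytes before compression, regardless of how many users changed.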
Different representation options
How about name-value pairs?
Great for simple data structures
Used on the web for forms: application/x-www-form-urlencoded
What about CSV?
Great for data rows
GET http://example.com/queue/1
Instead of
<queue> <delivery-area>Klang</delivery-area> <delay>65</delay></queue>
Why not just return
delivery-area=Klang&delay=65
http://www.tbray.org/ongoing/When/200x/2009/01/29/Name-Value-Pairs
http://code.google.com/apis/visualization/documentation/dev/implementing_data_source.html#responseformat
GET http://example.com/salesOrders/2345
Instead of
<salesOrder id='A'> <item sku='123' cost='4.50'/> <item sku='456' cost='9.50'/></salesOrder>
Why not just return
salesOrder, sku, item
A, 123, 4.50
B, 456, 9.50
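Both lighter formats are a few lines with Python's standard library. A sketch, assuming the field names from the examples above; the helper names to_form and to_csv are hypothetical.

```python
import csv
import io
from urllib.parse import parse_qs, urlencode


def to_form(record):
    """Serialise a flat mapping as application/x-www-form-urlencoded."""
    return urlencode(record)


def to_csv(header, rows):
    """Serialise rows of data as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


queue = to_form({"delivery-area": "Klang", "delay": 65})
# A client can read the name-value form back with parse_qs.
parsed = parse_qs(queue)
```

Both formats lose nesting, which is exactly why they suit only simple records and tabular rows.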
Different representation options
Pick the representations that fit your needs best
Support more than one if possible
JSON should be one of them
Indicate the different response types by setting the right Content-Type in the response header
Let the client determine the response type
Accept headers
Encode type into the URL
“As always, the point is to think more about resource design and representation when building RESTful systems, and hopefully you will add both bit-vectors and Bloom filters to your toolkit.” - Joe Gregorio
http://bitworking.org/news/380/bloom-filter-resources
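The two ways of letting the client pick a response type can be combined: a type encoded in the URL wins, otherwise the Accept header is consulted. This is a simplified sketch; the extension table, the default and the function name are assumptions, and media-range wildcards other than */* are not handled.

```python
EXTENSIONS = {".json": "application/json", ".xml": "application/xml", ".csv": "text/csv"}


def choose_content_type(path, accept_header, default="application/json"):
    """Pick a response type from the URL extension, else the Accept header."""
    for extension, content_type in EXTENSIONS.items():
        if path.endswith(extension):
            return content_type
    supported = set(EXTENSIONS.values())
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        media, _, params = item.strip().partition(";")
        quality = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Highest q first; ties keep the order the client listed them in.
            candidates.append((-quality, position, media.strip()))
    for _, _, media in sorted(candidates):
        if media in supported:
            return media
        if media == "*/*":
            return default
    return default
```

Whichever type is chosen should then be echoed back in the response's Content-Type header.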
Limitation
Browsers give no access to the request Accept header to indicate to the server what content type to return
"Accept: text/javascript, text/html, application/xml, text/xml, */*"
Indicate the content type within the URL
XHR allows you to access the request headers
Browser responding to unknown Content-Type
Non-trivial to add
End up tunneling through a generic Content-Type
application/json
application/xml
Do put URLs into resource representations
Instead of putting unique id, use URLs to point to other resources
Client code implementations easier - no need to figure out how to construct URLs
Reduces coupling
Refactor the server easily: URIs are treated as opaque
Corollary
Ensure that all critical resources in the system are addressable
Unique ID
<?xml version="1.0" encoding="utf-16"?>
<rsp stat="ok">
<tasks>
<list id="588855">
<taskseries id="41587591" created="2009-05-29T12:32:35Z" modified="2009-05-29T12:47:28Z" name="Get the geekcammp presentation done!" source="js" url="" location_id="">
<tags />
<participants />
<notes />
URL
<?xml version="1.0" encoding="utf-16"?>
<rsp stat="ok">
<tasks>
<list href="http://example.com/username/lists/588855" name="Personal list" rel="up">
<taskseries id="41587591" created="2009-05-29T12:32:35Z" modified="2009-05-29T12:47:28Z" name="Get the geekcammp presentation done!" source="js" url="" location_id="">
<tags />
<participants />
<notes />
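The same idea in a JSON-style representation might look like the sketch below. The list URL follows the example above; the tasks path beneath it, the field names and both function names are assumptions made for illustration.

```python
BASE_URL = "http://example.com"


def task_representation(username, list_id, task):
    """Server side: embed URLs instead of bare ids."""
    list_url = f"{BASE_URL}/{username}/lists/{list_id}"
    return {
        "href": f"{list_url}/tasks/{task['id']}",
        "name": task["name"],
        "list": {"href": list_url, "rel": "up"},
    }


def parent_list_url(representation):
    """Client side: follow the link as given, never rebuild it from ids."""
    return representation["list"]["href"]
```

If the server later moves lists to a different path, only task_representation changes; clients that follow href values keep working.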
Caching
Shared and private caches
They require different treatment
Do build for client side caching
This comes baked into HTTP – if used right
HTTP client libraries that implement caching will get this for free as well
Python's httplib2
Include
Last-Modified and ETag in response headers
Clients will be able to do conditional GETs
Properly timed Expires and Cache-Control: max-age headers
Allow well-behaved clients to query the server only when necessary
Caching lets clients push the job of maintaining application state down to the HTTP client library
Good implementations will ensure that
Resources that have been fetched are cached and not refetched
Any non-safe method carried out on a resource invalidates the cache
Ideally, application code will not need to manage caching or cache invalidation manually
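On the server side, supporting conditional GETs comes down to sending validators and answering 304 when the client's copy is still current. A simplified sketch follows: the function names and the 60 second max-age are assumptions, and If-None-Match is compared as a single exact tag, ignoring lists and weak validators.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(body):
    """Derive a validator from the representation's bytes."""
    return '"%s"' % hashlib.sha1(body).hexdigest()


def conditional_get(request_headers, body, last_modified, max_age=60):
    """Return (status, headers, body); 304 with no body if the client is current."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    fresh = False
    if if_none_match is not None:
        fresh = if_none_match == etag  # the ETag takes precedence when present
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
            fresh = since >= int(last_modified)
        except (TypeError, ValueError):
            fresh = False  # unparseable date: send the full response
    if fresh:
        return 304, headers, b""
    return 200, headers, body
```

A caching client library stores the ETag and Last-Modified values and replays them on the next request without any application code being involved.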
Error handling
How do you tell the client that a request is successful?
How do you relay the nature/reason for failures?
Authentication failures
Missing parameters
Resource not available
Don't reinvent HTTP status codes
Tendency to reinvent HTTP status codes within the response body
JSON
XML
HTTP has status codes
39 official HTTP response codes
2xx, 3xx, 4xx, 5xx
7 really good ones
200 ("OK") - Everything's fine. The document in the entity-body, if any, is a representation of some resource.
400 ("Bad Request") - There's a problem on the client side. The document in the entity-body, if any, is an error message. Hopefully the client can understand the error message and use it to fix the problem.
500 ("Internal Server Error") - There's a problem on the server side. The document in the entity-body, if any, is an error message. The error message probably won't do much good, since the client can't fix a server problem.
301 ("Moved Permanently") - Sent when the client triggers some action that causes the URI of a resource to change. Also sent if a client requests the old URI.
404 ("Not Found") and 410 ("Gone") - Sent when the client requests a URI that doesn't map to any resource. 404 is used when the server has no clue what the client is asking for. 410 is used when the server knows there used to be a resource there, but there isn't anymore.
409 ("Conflict") - Sent when the client tries to perform an operation that would leave one or more resources in an inconsistent state.
From RESTful Web Services, Appendix B
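One way to avoid reinventing status codes in the response body is to map application errors onto HTTP statuses in a single place. A sketch, in which the exception classes and the respond helper are hypothetical; 401 is included for the authentication failures mentioned earlier, although it is not one of the codes listed above.

```python
from http import HTTPStatus


class MissingParameter(Exception): pass
class NotAuthenticated(Exception): pass
class ResourceNotFound(Exception): pass
class ResourceGone(Exception): pass
class EditConflict(Exception): pass


STATUS_FOR_ERROR = {
    MissingParameter: HTTPStatus.BAD_REQUEST,    # 400: client must fix the request
    NotAuthenticated: HTTPStatus.UNAUTHORIZED,   # 401: credentials missing or invalid
    ResourceNotFound: HTTPStatus.NOT_FOUND,      # 404: no resource at this URI
    ResourceGone: HTTPStatus.GONE,               # 410: it existed once, no longer
    EditConflict: HTTPStatus.CONFLICT,           # 409: would leave inconsistent state
}


def respond(handler):
    """Run a handler and translate its outcome into (status code, body)."""
    try:
        body = handler()
    except tuple(STATUS_FOR_ERROR) as error:
        status = STATUS_FOR_ERROR[type(error)]
        return status.value, f"{status.phrase}: {error}"
    except Exception:
        # Anything unexpected is the server's fault, not the client's.
        status = HTTPStatus.INTERNAL_SERVER_ERROR
        return status.value, status.phrase
    return HTTPStatus.OK.value, body
```

The body can still carry a human-readable explanation, but a client only has to look at the status code to decide what to do next.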
Example HTTP response with HTTP status codes
Summary
Use HTTP the way it was meant to be used
There is little difference between a good API and a good website
http://www.sc.ehu.es/siwebso/KZCC/Oracle_10g_Documentacion/server.101/b12170/arch.gif
http://technofriends.in/2008/12/14/understanding-content-delivery-networks/
http://www.flickr.com/photos/sermoa/2785192486/