Designing and Implementing Web Data Services in Perl

Michael McClennen

[Diagram: Client, Server, and Data Store. The Client sends a Request to the Server and receives a Response.]

What is "REST"?

• REST is a set of architectural principles for the World Wide Web
• Developed by Roy Fielding, one of the Web's principal architects
• Stands for "REpresentational State Transfer"
• No consensus about exactly what it means in practice

REST: original principles

• Separation of client and server by a uniform interface
• Intermediate servers (i.e. proxies or caches) may be interposed arbitrarily
• All client-server interactions are stateless
• Data is composed of resources, each identified by a URI
• Server sends a representation of a resource
• Clients can manipulate the resource by means of the representation
• Representations are self-describing
• Client state transitions depend upon information embedded in representations (HATEOAS)

REST: in practice

1. One protocol layer, generally HTTP
   – no extra layers (such as SOAP) on top of it
   – headers and status codes are used as designed
2. Resources are identified by URIs
   – individual resources
   – all resources matching particular criteria
3. Client-server interactions are stateless
   – with the possible exception of authentication

[Diagram: the Client exchanges HTTP Requests and HTTP Responses with the Web Data Service (API) on the Server. The service sends Queries and Operations to the Data Store and receives Results.]

Web Data Service (API)

• Parse HTTP requests
• Validate parameters
• Talk to the backend data store
• Assemble representations of data
• Serialize representations in JSON, XML, …
• Set HTTP response headers
• Generate appropriate error messages
• Provide documentation about itself

What makes a good Web Data Service, from the point of view of the USER?

• Well designed
• Well documented
• Flexible
• Consistent
• Responsive

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Perl&aplimit=50&format=json

“List 50 pages whose title starts with ‘Perl’, in JSON format”
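A request URL like the one above is just a base URL plus a list of name=value pairs. The following is a minimal sketch of assembling it in plain Perl. The helper names `uri_escape_simple` and `build_request_url` are made up for this illustration, and the escaping assumes ASCII input only.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
# Assumption: the input is plain ASCII (no wide characters).
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Join a base URL and an ordered list of key/value pairs into a request URL.
sub build_request_url {
    my ($base_url, @pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_simple($key) . '=' . uri_escape_simple($value);
    }
    return $base_url . '?' . join('&', @parts);
}

my $url = build_request_url(
    'http://en.wikipedia.org/w/api.php',
    action  => 'query',
    list    => 'allpages',
    apfrom  => 'Perl',
    aplimit => 50,
    format  => 'json',
);
print "$url\n";
```

Passing the parameters as an ordered list rather than a hash keeps them in the order shown on the slide.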

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=50                            Specify size of result set
format=json                           Specify result format

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=50                            Specify size of result set
format=xml                            Specify result format

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=xml                            Specify result format

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=foobar                           Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=xml                            Specify result format

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=foobar                           Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=json                           Specify result format

Example: Wikipedia API

http://en.wikipedia.org/w/api.php?    Base URL
action=query                          Specify type of operation
list=allpages                         Specify operation
apfrom=Perl                           Query parameter
aplimit=5                             Specify size of result set
format=json                           Specify result format
foo=bar                               * bad parameter *

Example: Wikipedia API

http://en.wikipedia.org/w/api.php     Base URL only

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/feed/find?v=1.0&q=Perl

“List all feeds whose title contains ‘Perl’”

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/    Base URL
feed/find?                                    Specify operation
q=Perl                                        Query parameter
v=1.0                                         Protocol version

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/feed/load?v=1.0&q=http://www.perl.com/pub/atom.xml&num=10

“Show the most recent 10 entries from the feed http://www.perl.com/pub/atom.xml”

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
v=1.0                                         Protocol version
num=10                                        Size of result set

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
v=1.0                                         Protocol version
num=NOMNOMNOM                                 * bad value *

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
v=1.0                                         Protocol version
numm=10                                       * bad parameter *

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/    Base URL
feed/load?                                    Specify operation
q=http://www.perl.com/pub/atom.xml            Query parameter
                                              * missing version *

Example: Google Feed API

https://ajax.googleapis.com/ajax/services/    Base URL only

What makes a good Web Data Service CODEBASE, from the point of view of the programmer?

• Easy to implement
• Easy to document
• Easy to maintain
• Low overhead

Basic data service procedure

1. Parse URL
2. Determine operation and result format
3. Validate and clean the parameter values
4. Get data from the backend (using param. vals.)
5. Serialize the data in the selected format
6. Set HTTP response headers appropriately
7. If anything goes wrong, generate an error response
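The seven steps above can be sketched as one small request handler. This is an illustration in plain Perl with only core modules, not how Web::DataService is implemented. The operation table, the in-memory records, the `limit` parameter, and the "operation.format" URL shape are all assumptions made for the example.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical backend: operation name => code returning a list of records.
my %OPERATION = (
    'taxa/list' => sub {
        my ($params) = @_;
        my @all  = ({ name => 'Felis' }, { name => 'Canis' }, { name => 'Ursus' });
        my $last = $params->{limit} - 1;
        $last = $#all if $last > $#all;
        return [ @all[0 .. $last] ];
    },
);

# Hypothetical serializers: format name => [ content type, code ].
my %FORMAT = (
    json => [ 'application/json',
              sub { JSON::PP->new->canonical->encode({ records => $_[0] }) } ],
    txt  => [ 'text/plain',
              sub { join('', map { $_->{name} . "\n" } @{ $_[0] }) } ],
);

sub handle_request {
    my ($url) = @_;
    my ($status, $type, $body) = eval {
        # 1. Parse the URL into a path and a query string.
        my ($path, $query) = $url =~ m{^/([^?]*)\??(.*)$}
            or die "400 Malformed URL\n";
        # 2. Determine operation and result format ("operation.format").
        my ($op, $format) = $path =~ m{^(.+)\.(\w+)$}
            or die "400 No format specified\n";
        my $run = $OPERATION{$op}   or die "404 Unknown operation '$op'\n";
        my $fmt = $FORMAT{$format}  or die "415 Unknown format '$format'\n";
        # 3. Validate and clean the parameter values.
        my %params = (limit => 10);
        for my $pair (grep { length } split /&/, $query) {
            my ($key, $value) = split /=/, $pair, 2;
            die "400 Unknown parameter '$key'\n" unless $key eq 'limit';
            die "400 Bad value for 'limit'\n"
                unless defined $value && $value =~ /^[1-9][0-9]*$/;
            $params{$key} = $value;
        }
        # 4. Get data from the backend.
        my $records = $run->(\%params);
        # 5 and 6. Serialize, and report the matching content type.
        my ($content_type, $serialize) = @$fmt;
        return (200, $content_type, $serialize->($records));
    };
    # 7. If anything went wrong, generate an error response.
    if (!defined $status) {
        my $error = $@ || "500 Unknown error\n";
        ($status, $body) = $error =~ /^(\d{3}) (.*)/ ? ($1, $2) : (500, 'Server error');
        $type = 'text/plain';
    }
    return { status => $status, content_type => $type, body => $body };
}
```

Calling `handle_request('/taxa/list.json?limit=2')` returns a hash with `status`, `content_type`, and `body`. A misspelled parameter or a non-numeric `limit` produces a 400 instead of being silently ignored.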

Introducing Web::DataService

• On CPAN as Web::DataService
• Built on top of Dancer
• You define operations, parameter rules, output blocks, and it handles the rest
• Complete enough for real use
• Documentation still incomplete
• Needs collaborators, testers, users

Important early decisions

1. Which framework to use
2. How to validate parameter values
3. How to organize your parameter space
4. How to handle output formats
5. How to implement the response procedure
6. How to handle versioning
7. How to report errors
8. How to handle documentation

Decisions that can wait

• Which HTTP server to use
• Which backend framework to use
• Strategies for caching and other performance enhancements

Plan for these from the start:

• Multiple output formats
• Multiple output vocabularies
• Multiple protocol versions
• Auto-generated documentation

Decision 1: which framework?

• Dancer 1
• Dancer 2
• Mojolicious
• Web::DataService

Decision 2: parameter values

• How will the parameter values be validated and cleaned?

• Recommendation: use HTTP::Validate

    define_ruleset('1.1:taxa:specifier' =>
        { param => 'name', valid => \&TaxonData::validNameSpec, alias => 'taxon_name' },
            "Return information about the most fundamental taxonomic name",
            "matching this string. The C<%> and C<_> characters may be used",
            "as wildcards.",
        { param => 'id', valid => POS_VALUE, alias => 'taxon_id' },
            "Return information about the taxonomic name corresponding to this",
            "identifier.",
        { at_most_one => ['name', 'id'] },
            "You may not specify both C<name> and C<id> in the same query.");
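To make concrete what a ruleset like this is expected to enforce, here is a hand-rolled sketch in plain Perl. It does not use or reproduce the HTTP::Validate API. The rule table, the function name `check_taxon_params`, and the pattern standing in for `TaxonData::validNameSpec` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical rules: parameter name => [ alias, validator ].
# The name pattern is a guess; the real validNameSpec is not shown in the slides.
my %RULE = (
    name => [ 'taxon_name', sub { $_[0] =~ /^[A-Za-z%_. ]+$/ } ],
    id   => [ 'taxon_id',   sub { $_[0] =~ /^[1-9][0-9]*$/ } ],
);

# Map both the parameter names and their aliases to the canonical name.
my %CANONICAL;
for my $param (keys %RULE) {
    my ($alias) = @{ $RULE{$param} };
    $CANONICAL{$param} = $CANONICAL{$alias} = $param;
}

# Returns (\%clean_values, \@errors) for a hash of raw request parameters.
sub check_taxon_params {
    my (%raw) = @_;
    my (%clean, @errors);
    for my $key (sort keys %raw) {
        my $param = $CANONICAL{$key};
        if (!defined $param) {
            push @errors, "unknown parameter '$key'";
        }
        elsif (!$RULE{$param}[1]->($raw{$key})) {
            push @errors, "bad value '$raw{$key}' for '$key'";
        }
        else {
            $clean{$param} = $raw{$key};
        }
    }
    # Counterpart of the at_most_one rule.
    push @errors, "you may not specify both 'name' and 'id'"
        if exists $clean{name} && exists $clean{id};
    return (\%clean, \@errors);
}
```

The cleaned values are keyed by canonical name, so the rest of the code never needs to know which alias the client used.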

Decision 3: parameter space

• How will users specify which operation to do?
   – http://exmpl.com/service/some/thing ? …
   – http://exmpl.com/service ? op=something & …
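The two URL styles above lead to slightly different parsing code. Here is a rough sketch of each in plain Perl, using the placeholder host from the slide. The function names are invented for this example, and percent-decoding is left out for brevity.

```perl
use strict;
use warnings;

# Parse "a=1&b=2" into a list of key/value pairs (no percent-decoding).
sub parse_query {
    my ($query) = @_;
    my %params;
    for my $pair (grep { length } split /&/, $query // '') {
        my ($key, $value) = split /=/, $pair, 2;
        $params{$key} = $value // '';
    }
    return %params;
}

# Style 1: the operation is the path below the service root.
sub operation_from_path {
    my ($url) = @_;
    my ($op, $query) = $url =~ m{^http://exmpl\.com/service/([^?]+)\??(.*)$}
        or return;
    return ($op, +{ parse_query($query) });
}

# Style 2: the operation is the value of the "op" parameter.
sub operation_from_param {
    my ($url) = @_;
    my ($query) = $url =~ m{^http://exmpl\.com/service\?(.*)$}
        or return;
    my %params = parse_query($query);
    my $op = delete $params{op};
    return unless defined $op;
    return ($op, \%params);
}
```

In the first style the operation can never be missing once the URL matches. In the second style a request without `op` has to be handled as a separate error case.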

Decision 4: output formats

• How will users specify the output format?
   – http://exmpl.com/service/something.json ? …
   – http://exmpl.com/service ? … & format=json …

• Recommendation: separate the definition of output fields from output formats

    $ds->define_block('1.1:taxa:basic' =>
        { output => 'taxon_no', dwc_name => 'taxonID', com_name => 'oid' },
            "A positive integer that uniquely identifies this taxonomic name",
        { output => 'record_type', com_name => 'typ', com_value => 'txn',
          dwc_value => 'Taxon', value => 'taxon' },
            "The type of this record. By vocabulary:", "=over",
            "=item pbdb", "taxon", "=item com", "txn", "=item dwc", "Taxon",
            "=back",
        { set => 'rank', if_vocab => 'pbdb,dwc', lookup => \%RANK_STRING },
        { output => 'rank', dwc_name => 'taxonRank', com_name => 'rnk' },
            "The rank of this taxon, ranging from subspecies up to kingdom",
        { output => 'taxon_name', dwc_name => 'scientificName', com_name => 'nam' },
            "The scientific name of this taxon",
        { output => 'common_name', dwc_name => 'vernacularName', com_name => 'nm2' },
            "The common (vernacular) name of this taxon, if any",
        { set => 'attribution', if_field => 'a_al1', from_record => 1,
          code => \&generateAttribution },
        … );

• Web::DataService provides:
   – Web::DataService::Plugin::JSON.pm
   – Web::DataService::Plugin::XML.pm
   – Web::DataService::Plugin::Text.pm
   – you can add your own

• Output is delegated to the appropriate module based on the selected format
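The idea of keeping field definitions apart from output formats can be illustrated without the library. In this sketch (plain Perl, core modules only) the field names come from the block definition above. The data structures and the functions `select_output` and `render` are hypothetical and do not describe the Web::DataService plugin interface.

```perl
use strict;
use warnings;
use JSON::PP ();

# Field definitions, independent of any output format: the record key plus
# the output name under each vocabulary.
my @FIELDS = (
    { key => 'taxon_no',   pbdb => 'taxon_no',   dwc => 'taxonID',        com => 'oid' },
    { key => 'rank',       pbdb => 'rank',       dwc => 'taxonRank',      com => 'rnk' },
    { key => 'taxon_name', pbdb => 'taxon_name', dwc => 'scientificName', com => 'nam' },
);

# Apply a vocabulary: returns (\@column_names, \@rows_of_values).
sub select_output {
    my ($vocab, @records) = @_;
    my @names = map { $_->{$vocab} } @FIELDS;
    my @keys  = map { $_->{key} } @FIELDS;
    my @rows  = map { [ @{$_}{@keys} ] } @records;
    return (\@names, \@rows);
}

# Serializers know nothing about fields; they only receive names and rows.
my %SERIALIZER = (
    json => sub {
        my ($names, $rows) = @_;
        my @objects;
        for my $row (@$rows) {
            my %object;
            @object{@$names} = @$row;
            push @objects, \%object;
        }
        return JSON::PP->new->canonical->encode(\@objects);
    },
    tsv => sub {
        my ($names, $rows) = @_;
        return join '', map { join("\t", @$_) . "\n" } $names, @$rows;
    },
);

# Delegate to the serializer for the selected format.
sub render {
    my ($format, $vocab, @records) = @_;
    my $serialize = $SERIALIZER{$format} or die "Unknown format '$format'\n";
    return $serialize->(select_output($vocab, @records));
}
```

Adding a new format means adding one entry to `%SERIALIZER`, and adding a new vocabulary means adding one key to each field definition. Neither change touches the other.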

Decision 5: procedure

• How will you handle the basic request-response procedure?

• Recommendation: specify a set of attributes for each operation, and use a single body of code to handle operation execution

    $ds->define_path({ path => 'taxa',
                       class => 'TaxonData',
                       output => '1.1:taxa:basic',
                       doc_title => 'Taxonomic names' });

    $ds->define_path({ path => 'taxa/single',
                       allow_format => 'json,csv,tsv,txt,xml',
                       allow_vocab => 'com,pbdb,dwc',
                       method => 'get',
                       doc_title => 'Single taxon' });

    $ds->define_path({ path => 'taxa/list',
                       allow_format => 'json,csv,tsv,txt,xml',
                       allow_vocab => 'com,pbdb,dwc',
                       method => 'list',
                       doc_title => 'Lists of taxa' });
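The definitions above read as if 'taxa/single' and 'taxa/list' pick up `class` and `output` from 'taxa'. The slides do not say so explicitly, so treat that as an assumption. Under it, "a single body of code" could look roughly like the sketch below. Everything in it (`define_path_sketch`, `path_attributes`, `execute_path`, and the stub `TaxonData` class) is hypothetical plain Perl, not the library's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the TaxonData class named in the definitions.
{
    package TaxonData;
    sub new  { return bless {}, shift }
    sub get  { return 'one taxon' }
    sub list { return 'many taxa' }
}

my %PATH;    # path => attributes given directly for that path

sub define_path_sketch {
    my ($attrs) = @_;
    my %copy = %$attrs;
    my $path = delete $copy{path};
    $PATH{$path} = \%copy;
    return;
}

# Merge attributes from the outermost ancestor down to the path itself,
# so that more specific definitions override inherited ones.
sub path_attributes {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    my %merged;
    for my $depth (0 .. $#parts) {
        my $ancestor = join '/', @parts[0 .. $depth];
        %merged = (%merged, %{ $PATH{$ancestor} || {} });
    }
    return \%merged;
}

# The single body of code that executes any operation.
sub execute_path {
    my ($path, $format) = @_;
    my $attrs  = path_attributes($path);
    my $method = $attrs->{method} or die "404 '$path' is not an operation\n";
    my %allowed = map { $_ => 1 } split /,/, $attrs->{allow_format};
    die "415 Format '$format' not allowed for '$path'\n" unless $allowed{$format};
    return $attrs->{class}->new->$method();
}

define_path_sketch({ path => 'taxa', class => 'TaxonData', output => '1.1:taxa:basic' });
define_path_sketch({ path => 'taxa/single', allow_format => 'json,csv,tsv,txt,xml', method => 'get' });
define_path_sketch({ path => 'taxa/list',   allow_format => 'json,csv,tsv,txt,xml', method => 'list' });
```

With this layout, adding an operation means adding one more attribute set. `execute_path` itself does not change.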

Decision 6: versioning

• How will users specify which protocol version?
   – http://exmpl.com/service/some/thing ? … & v=1.0
   – http://exmpl.com/service1.0/some/thing ? …

• Recommendation: make your users specify a version from the very beginning
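A small sketch of what "make your users specify a version" might mean in code: accept either URL style, and refuse requests that name no version or an unknown one. The supported version list and the function name `extract_version` are assumptions for this example.

```perl
use strict;
use warnings;

# Assumed set of protocol versions this service still answers to.
my %SUPPORTED_VERSION = map { $_ => 1 } qw(1.0 1.1);

# Returns (version, path without the version) or dies with a 400-style message.
sub extract_version {
    my ($path, %params) = @_;
    my $version;
    if ($path =~ s{^/service(\d+\.\d+)/}{/service/}) {
        $version = $1;              # style: /service1.0/some/thing
    }
    elsif (defined $params{v}) {
        $version = $params{v};      # style: /service/some/thing?v=1.0
    }
    die "400 You must specify a protocol version\n"
        unless defined $version;
    die "400 Unsupported protocol version '$version'\n"
        unless $SUPPORTED_VERSION{$version};
    return ($version, $path);
}
```

Rejecting unversioned requests from day one means a later version 2.0 can change behavior without breaking clients that never said which version they expected.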

Decision 7: error reporting

• Recommendation: report errors in JSON if that format was selected

• Recommendation: use the HTTP status codes
   – 400 Bad Request
   – 404 Not Found
   – 415 Unsupported Media Type
   – 500 Internal Server Error

• Recommendation: if your code throws an exception, report a generic message
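These three recommendations fit together in one error-formatting routine. The sketch below is plain Perl with the core JSON::PP module. The convention that deliberate errors are thrown as "CODE message" strings, the JSON field names, and the function name `error_response` are assumptions made for illustration.

```perl
use strict;
use warnings;
use JSON::PP ();

my %REASON = (
    400 => 'Bad Request',
    404 => 'Not Found',
    415 => 'Unsupported Media Type',
    500 => 'Internal Server Error',
);

# Build an error response in the format the client asked for.
sub error_response {
    my ($format, $error) = @_;
    # Errors raised on purpose look like "404 some message". Anything else
    # is an unexpected exception, so its details are replaced by a generic
    # message rather than sent to the client.
    my ($code, $message) = $error =~ /^(400|404|415) (.+)/
        ? ($1, $2)
        : (500, 'A server error occurred');
    if (($format // '') eq 'json') {
        my $body = JSON::PP->new->canonical->encode(
            { status_code => $code + 0, errors => [$message] });
        return { status => $code, content_type => 'application/json', body => $body };
    }
    return {
        status       => $code,
        content_type => 'text/plain',
        body         => "$code $REASON{$code}: $message\n",
    };
}
```

A real service would also write the original exception text to its own log before discarding it.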

Decision 8: documentation

• Recommendation: auto-generate documentation as much as possible

• Recommendation: a request using the base URL with no parameters should return the main documentation page

Other recommendations

• Recommendation: know the HTTP protocol
   – Status codes (400, 404, 500, 301, etc.)
   – CORS ("Access-Control-Allow-Origin")
   – Cache-Control
   – Content-Type

Final example

• The Paleobiology Database Navigator
   – http://paleobiodb.org/navigator

• Based on the Paleobiology Database API
   – http://paleobiodb.org/data1.1/

Call for collaboration

• Please let me know if you are interested in:
   – Using Web::DataService
   – Testing Web::DataService
   – Helping to further develop Web::DataService

mmcclenn@geology.wisc.edu
