Scripting EPrints. Light on syntax object->function(arg1, arg2) Incomplete Designed to give...

Preview:

Citation preview

Scripting EPrints

Light on syntax object->function(arg1, arg2)

Incomplete Designed to

give you a feel for the EPrints data model introduce you to the most significant

objects how they relate to one another their most common methods

act as a jumping off point for exploring

About This Talk

EPrints modules have embedded documentation

Extract it using perldoc perldoc perl_lib/EPrints/EPrint.pm

Finding Documentation

EPrints 3.0

This talk based on EPrints 2.3 series 3.0 API still being finalised

tidies up object hierarchy resolves some of 2.3’s naming clashes

lots of extra functionality but core data model remains the same

EPrints 3.0 is fully back-compatible 2.3 scripts will work with EPrints 3.0

1. Data EPrints, Users, Documents, Subjects,

Subscriptions

2. Data collections DataSets, MetaFields

3. Searching your data SearchExpressions

4. Scripting your archive Archives, Session

Roadmap

EPrints, Users, Documents, Subjects, Subscriptions

1. Data

Data Model Sketch

EPrint

Data Model Sketch

EPrint

Document

Document

PDF

HTMLHTML

HTML

all documents

Data Model Sketch

EPrint

Document

Document

User

PDF

HTMLHTML

HTML

owner

all documents

Data Model Sketch

EPrint

Document

Document

User

PDF

HTMLHTML

HTML

owner

owned eprints

all documents

EPrint

Data Model Sketch

EPrint

Document

Document

User

Subscription Subscription

PDF

HTMLHTML

HTML

owner

owned eprints

all documents

subscriptions

EPrint

Data Model Sketch

EPrint

Document

Document

User

Subscription Subscription

PDF

HTMLHTML

HTML

Subject

owner

owned eprints

all documents

subscriptions

EPrint

Data Model Sketch

EPrint

Document

Document

User

Subscription Subscription

PDF

HTMLHTML

HTML

Subject

Subject

Subject

parent

child

owner

owned eprints

all documents

subscriptions

EPrint

Data Model Sketch

EPrint

Document

Document

User

Subscription Subscription

PDF

HTMLHTML

HTML

Subject

Subject

Subject

parent

child

owner

owned eprints

all documents

posted eprints

subscriptions

EPrint

EPrint

An EPrint object represents a single deposit in your EPrints archive

has some metadata fields has one or more documents is owned by a user

EPrint

new(session, id) create an EPrint object for an existing

deposit create(session, dataset, data)

create a new EPrint object

More on sessions and datasets later!

Creating EPrints

EPrint is a subclass of DataObj DataObj provides common methods

for accessing metadata rendering XHTML output

Introducing DataObj

Inherited from DataObj

get_id get_url(staff)

get the URL of an EPrint e.g. URL to the abstract page of an eprint in

the archive if staff is true then returns the URL to the

staff view, which shows more detail get_type()

get the EPrint type e.g. article, book, thesis, conference

paper...

get_value(fieldname) get the value of the named field

set_value(fieldname, value) set the value of the named field Remember to call commit() to make

changes in database! is_set(fieldname)

true if the named field has a value

Inherited from DataObj

remove() erase the eprint and any associated

records/files from the database and filesystem

this should only be called on EPrints in the "inbox" or "buffer" datasets

commit() commit any changes made to the

database datestamp()

set the last modified date to today

EPrint Methods

move_to_deletion() transfer the eprint to the deletion dataset should only be called on eprints in the

archive dataset

See also: move_to_inbox() move_to_buffer() move_to_archive()

Moving EPrints Around

generate_static() generate the static abstract page for the

eprint in a multi-language archive this will

generate a page in each language

Rendering EPrints

render_citation(style) create an XHTML citation for the EPrint if style is set then use the named citation

style defined in citations-en.xml

render_citation_link(style) as above, but citation is linked to the

EPrint’s abstract page

Rendering - Inherited from DataObj

render_value(fieldname, showall)

get an XHTML fragment containing the rendered version of the value of the named field

in the current language if showall is true then all languages are

rendered usually used for staff viewing (checking) data

Rendering - Inherited from DataObj

Rendering Tips

Most rendering methods return XHTML

but not a string! XML Node objects

DocumentFragment, Element, TextNode...

In your scripts, build a document tree from these nodes

e.g. node1->appendChild(node2) then flatten it to a string Why? It’s easier to manipulate a tree

than to manipulate a large string

XML Node objects are not part of EPrints

XML::DOM or XML::GDOME libraries explore these libraries using perldoc

XHTML is good for building Web pages

but not so good for command line output!

use tree_to_utf8() extracts a string from the result of any

rendering method tree_to_utf8( eprint->render_citation)

More Rendering Tips

get_user() get a User object representing the user

to whom the EPrint belongs get_all_documents()

get a list of all the Document objects associated with the EPrint

We will look at these objects next...

Navigating to Related Objects

User

A User object represents a single registered user

Also a subclass of DataObj inherits metadata access methods

get_url get_type get_value set_value is_set

inherits rendering methods render_citation render_citation_link render_value

Also has commit and remove inherited from DataObj in 3.0

new(session, id) create a User object from an existing

user record user_with_email(session, email) user_with_username(session, username)

create_user(session, access_level)

create a new User

Creating Users

User Accessors

get_editable_eprints() get a list of EPrints that the user can edit

get_owned_eprints(dataset) get a list of EPrints owned by the user in

the dataset is_owner(eprint)

true if the user is the owner of the EPrint get_subscriptions()

get a list of Subscriptions associated with the user

A single document associated with an eprint

may actually contain one or more physical files

PDF = 1 file HTML + images = many files

Another subclass of DataObj

Document

new(session, docid) create a Document object from an

existing record create(session, eprint)

create a new Document object for the given EPrint

Creating a Document Object

get_eprint() get the EPrint object the document is

associated with local_path()

get the full path of the directory where the document is stored in the filesystem

files() get a list of (filename, file size) pairs

Document Accessors

get_main() set_main(main_file)

get/set the ‘main’ file for the document e.g. if the document is multipage HTML with

images, the main file needs to be set to the top index.html file

when rendering document links, EPrints always links to the main file in the document

set_format(format) sets the document format

Main File and Format

Adding Files to Documents

upload(filehandle, filename) uploads the contents of the given file

handle adds the file to the document (using the

given filename) add_file(file, filename)

adds a file to the document (using the given filename)

file is the full path to the file

upload_url(url) grab file(s) from given URL in the case of HTML, only relative links

will be followed add_archive(file, format)

add files from a .zip or .tar.gz archive remove_file(filename)

remove the named file from the Document

Adding Files to Documents

A single subject from the subject hierarchy

Another subclass of DataObj

Subject

new(session, subjectid) create a Subject object from an existing

subject create(session, id, name, parent, depositable)

create a new Subject depositable specifies whether or not

users can deposit eprints in the subject

Creating Subjects

children() get a list of Subjects which are the

children of the subject get_parents()

get a list of Subjects which are the parents of the subject

subject_label(session, subject_tag)

get the full label of a subject, including parents

Subject Accessors

count_eprints(dataset) get the number of eprints associated

with the subject posted_eprints(dataset)

get a list of EPrints associated with the subject

Subject Accessors

render_with_path(session, topsubjid)

get a DocumentFragment containing the subject path

example of a subject path:H Social Sciences > HD Industries. Land use.

Labor > HD28 Management. Industrial Management

Rendering Subjects

A stored search which is performed every day/week/month on behalf of a user

get_user() get the User who owns the subscription

Another subclass of DataObj

Subscription

new(session, id) create a Subscription object from an

existing subscription create(session, userid)

create a new Subscription object for the given user

Creating Subscriptions

send_out_subscription() search for new items matching the

subscription settings email them to the user owning the

subscription

Processing Subscriptions

DataObj Hierarchy

So Far..

We’ve looked at individual data objects

but an EPrints archive holds many eprints and documents, has many registered users etc.

how do we access them collectively? We’ve seen the get_value and set_value methods for metadata

but an archive’s metadata is configurable so how do we know what metadata fields

an EPrint, User etc. has? how do we access properties of the fields?

DataSets and MetaFields

2. Data Collections

A collection of data items Tells us all the possible types in the

collection e.g. EPrints may be article, thesis

Tells us the fields in each type e.g. article has title, authors, publication... e.g. conference_item has title, authors,

event_title, event_date..

Can also tell us all the fields that apply to a dataset

title, authors, publication, event_title..

Dataset

ArchiveMetadataFieldsConfig.pm fields in each dataset additional system fields defined in

EPrint.pm, User.pm etc.

metadata-types.xml types in each dataset fields that apply to each type

Dataset Configuration

Datasets in EPrints

archive EPrints that are live in the main archive

buffer EPrints that have been submitted for editorial

approval deletion

EPrints that have been deleted from the archive inbox

EPrints which users are still working on eprint

All EPrints from archive, buffer, deletion and inbox

user all registered Users

subject all Subjects in the subject tree

document the Documents belonging to all EPrints

in the archive subscription

the Subscriptions which Users have requested

Datasets in EPrints

id() get the id of the dataset

count(session) get the number of items in the dataset

get_item_ids(session) get a list of ids of the items in the

dataset

DataSet Accessors

Datasets and MetaFields

Many Dataset methods return MetaField objects

A MetaField is a single field in a dataset tells us properties of the field

e.g. name, type, input_rows, maxlength, multiple etc.

configured in ArchiveMetadataFieldsConfig.pm but not the field value

the value is specific to the individual EPrint, User etc.

e.g. eprint->get_value(“title”)

get_name() get the field name

get_type() get the field type

get_property(name) set_property(name, value)

get/set the named property to the given value

MetaField Methods

MetaField Type Hierarchy

has_field(fieldname) true if the dataset has a field of that

name get_field(fieldname)

get a MetaField object describing the named field

DataSet Accessors

get_fields() get a list of MetaFields belonging to the dataset

get_types() get a list of all types in the dataset e.g. EPrint types: article, book, book_section,

conference_item, monograph, patent, thesis, other

e.g. User types: user, editor, admin get_type_name(session, type)

get a string containing a human-readable name for the specified type in current language

DataSet Accessors

get_type_fields(type) get a list of MetaFields belonging to the

given type get_required_type_fields(type)

get a list of the MetaFields which are required for the given type

field_required_in_type(field, type)

true if given field is required in given type

DataSet Accessors

render_name(session) get an XHTML fragment containing the

name of the dataset in the language of the current session

render_type_name(session, type) get an XHTML fragment containing the

name of the given type in the language of the session

Rendering DataSets

render_name(session) get an XHTML fragment containing the

name of the field in the current language e.g. from phrases-en.xml: <ep:phrase ref="eprint_fieldname_title"> Title</ep:phrase>

Rendering MetaFields

render_input_field(session, value)

get some XHTML containing input controls that will allow a user to input data to the field

value is the default value

Rendering MetaFields

render_help(session, type) get some XHTML containing help text for a user

inputting some data for the field if an optional type is specified then specific

help for that type will be used if available e.g. from phrases-en.xml:

<ep:phrase ref="eprint_fieldhelp_title">The title of the item.</ep:phrase>

<ep:phrase ref="eprint_fieldhelp_title.book">The title of the book, usually found on the title page.</ep:phrase>

Rendering MetaFields

We know how to access data objects in EPrints

EPrint, User, Document ...

We know how to access collections of these objects

Datasets MetaFields

Now, how do we search for items?

So Far...

Searching Your Archive

SearchExpressions

The conditions of a single search new(data)

create a new search expression from the given data

se = new SearchExpression( session => session,dataset => dataset,custom_order => “title” )

sorted by title, ascending

SearchExpression

add_field(metafield, value) add a new search field with the given

value (search text) to the search expression

if the search field already exists in the search expression, its value is replaced

Adding Search Fields

Example: full text search searchexp->add_field(

dataset->get_field(“title”),“routing”,“IN”,“ALL” )

Adding Search Fields

Example: full text search matches word in title OR abstract searchexp->add_field(

[ ds->get_field(“title”),dataset->get_field(“abstract”) ],

“routing”,“IN”,“ALL” )

Adding Search Fields

Example: date range search searchexp->add_field(

dataset->get_field(“date”),“2000-2004”,“EQ”,“ALL” )

Adding Search Fields

serialise() get a text representation of the search

expression, for persistent storage from_string(string)

unserialises the contents of string but only into the fields already existing in

the SearchExpression

Serialising Searches

render_description() get some XHTML describing the current

parameters of the search expression render_search_form(help)

render an input form for the search expression

if help is true then this also renders the help for each search field in current language

Rendering SearchExpressions

Carry out a search using: perform_search()

The results can then be accessed: count()

get the number of results get_records(offset, count) get_ids(offset, count)

get a list of DataObjs (e.g. EPrint, User) representing the result set, or just their ids

optionally specify a range of results to return from result set using count and offset

Processing Results

map(function, args) using get_records to get results uses a

lot of memory if there are 1000s of results

apply the function to each result without overhead

function is called with args: (session, dataset, dataobj, args)

the DataSet object also has a map function

creates a SearchExpression over dataset sets allow_blank = 1

passes args to searchexp->map

Processing Results

Aside: Lists in EPrints 3.0

In EPrints 3.0, searches return a List ordered collection of DataObjs

In fact, any 2.3 function which returns a list (array) of DataObjs returns a List in 3.0

list->reorder( neworder ) list->union( list2 ) list->intersect( list2 ) list->remainder( list2 )

map over items in the list even arbitrarily constructed ones

Scripting Your Repository

Archives and Sessions

One EPrints installation can host multiple archives

An Archive object is a single EPrints archive

access archive-specific configuration

Don’t confuse the Archive object with the archive DataSet!

archive->get_dataset(“archive”) renamed Repository in 3.0

Archive

get_id() get the id string of the archive.

get_conf(key, subkeys) get a named configuration setting probably set in ArchiveConfig.pm get_conf( "stuff", "en", "foo" )

Archive Accessors

call(cmd, params) calls the subroutine named cmd specified

in the archive configuration (ArchiveConfig.pm etc.) with the given parameters and returns the result

can_call(cmd) true if the named cmd exists in the

archive configuration

lets you delegate processing to “user” space

Calling Archive Subs

Session

Not a session in the traditional Web sense not stateful (although it might be in future!)

3.0 introduces cookie-based authentication global object which provides access to:

current language generic rendering functions CGI parameters (input from forms etc.) http request

Always create a session object at the beginning of your script

don’t forget to terminate it at the end

new(mode, param) set mode to 0 for online session (CGI

script) uses language from cookie, http headers, or

default language

set mode to 1 for offline session (cmd line script)

param is the id of archive, uses default language

terminate() terminate session, performing necessary

cleanup

Creating & Ending a Session

Web Page Building Blocks

make_doc_fragment() create an empty XHTML document fill it with things!

make_text(text) create an XML TextNode

make_element(name, attrs) create an XHTML Elementmake_element("p", align => "right")<p align=”right” />

render_link(uri, target) create an XHTML linklink = session-> render_link("foo.html", "frame1")

link->appendChild(session-> make_text("Foo"))

<a href=”foo.html” target=”frame1”>Foo</a>

Web Page Building Blocks

Many methods for building input forms, including:

render_form(method, dest) render_option_list(params) render_hidden_field(name, value) render_upload_field(name) render_action_buttons(buttons) ...

Web Page Building Blocks

build_page(title, body) wraps your XHTML document in the

archive template send_page()

flatten page and send it to the user

Web Pages

change_lang(langid) change the session language to the given

language ID phrase(phraseid, inserts)

get given phrase (as a string) in the current language

looks up phraseid in language-specific phrase file

e.g. phrases-en.xml

lets look at an example of the inserts parameter...

Language

Language

html_phrase(phraseid, inserts) render an XHTML phrase in the current

languagesession->html_phrase( 'link_to_google', link => session-> render_link(“http://www.google.com”))

gets phrase link_to_google from phrases-en.xml

<ep:phrase><ep:pin ref="link">Search Google</ep:pin></ep:phrase>

<a href="http://www.google.com">Search Google</a>

User Input

have_parameters true if parameters (POST or GET) were passed

to the CGI script (e.g. from an input form) param(name)

get the value of a parameter passed to the CGI script

get_action_button() get the id of the button the user pressed

client get the name of the user’s browser

Navigating the API

Recommended