Upload
lethuy
View
232
Download
0
Embed Size (px)
Citation preview
Introduction to the extensible NoSQL DBMS Amos
Tore RischInformation Technology
Uppsala University2015-11-18
What is Amos?• A prototype DBMS developed at UU and LiU• Functional NoSQL query language AmosQL
– Functions and types used for data representation and queries– Functions natural for numerical data, e.g. for datamining– Based on domain calculus, i.e. variables bound to objects, not
rows in tables as in SQL, tuple calculus.• Amos is extensible
– User defined foreign functions can be implemented in e.g. C or Java
– User defined indexes can be implemented in C or Java– External DBMSs can be queried, e.g MongoDB and MySQL– Query optimizer can be extended– Several query languages are supported, e.g. SQL-92, SparQL– Key enabling technology: extensible engine
What is Amos II?• Amos based on main-memory representation of
database– Fast and easy to extend with main-memory indexes– Embeddable in other system– Storage managers (e.g. MongoDB, MySQL) as external back-
ends • Amos runs on PCs under Windows and Linux• Highly parallel data stream management system (DSMS)
on top of Amos: SCSQ, SVALI• Applications: Science, industry, etc.
– E.g. space physics, particle physics, industrial equipment monitoring
The functional data model allows defining Amos schema exactly following above EER-diagram
The functional data model
OBJECTS
FUNCTIONS TYPES
classify
belong torelate
participate in
constrain
defined over
Objects• Regard database as collection of objects• AmosQL Example:
create Person instances :tore;• creates new object, say OID[1145], and binds temporary
environment variable :tore to it. – NOTICE: Environment variables disappear when transaction finished!
• The DBMS manages the objects. • Surrogate type objects have unique object identifiers (OIDs), e.g.
OID[1145]– Explicit creation and deletion, e.g.:
delete :tore;• Objects belong to one or more types where one type is the most
specific type– The type Person is the most specific type of OID[1145]
Objects, non-surrogate types
• No OIDs – no create or delete statement– Created when mentioned– Automatically deleted by garbage collector when no longer used
• Built-in atomic types, literals: String, e.g. ‘This is a string’ Integer, e.g. 1234Real, e.g. 2.34Boolean, e.g. trueTimeval, e.g. |2008-10-13/18:11:43|
• Collection types:Bag, e.g. bag(“Tore”,”Kjell”,”Thanh”)Vector, e.g. {1,2,3}Record, e.g. {“Name”:”Tore”, “Dept”:”IT”}Tuple, e.g. (“Tore”,”IT”)
TypesExamples of type definitions:create type Person;create type Student under Person;create type Instructor under Person;create type TA under Student, Instructor;create type Dept;create type Course;
Object
Function Type Literal Collection UserObject
Number Bag Vector PersonDept Course
Student Instructor
TA
Bag of Person
Types• Classification of objects
– Objects are grouped by the types to which they belong (are instances of)
• Organized in type/subtype hierarchy– Defines that OIDs of one type is subset of OIDs of other types– Objects of user-defined types are subtypes of type UserObject
• Types very useful for classifying data, i.e. for meta-data• Each surrogate type has an extent which is the set of
objects having that type in its type set.– Literal and surrogate types have no extent
• Types and functions are objects tooof types named Type and Function
Functions
• Examples of function definitions:create function name(Person p) -> Charstring nm
as stored;create function name(Dept) -> Charstring as stored;create function name(Course) -> Charstring as stored;create function dept(Person) -> Dept as stored;create function teacher(Course) -> Instructor as stored;create function teaches(Instructor i)-> Bag of Course cas /* Inverse of teacher */
select cwhere teacher(c)= i;
create function instructorNamed(Charstring nm)-> Instructor p
as select pwhere name(p) = nm;
Function partsA function has two parts• Signature:
– Name and types or arguments and resultsname(Person p) -> Charstring nteacher(Course c) -> Instructorplus(Number x, Number y) -> Number r (infix ‘x+y’)children(Person m, Person f) -> Bag of Person cparents2(Person p) -> (Person m, Person f)
• Body:– Specifies how to compute outputs from valid inputs
• Usually inverse too– Usually non-procedural specifications
• i.e. no side effects, except for procedural functions.
Signatures represent ER-diagrams
create type Course;create type Student;create type Instructor;create function enrolled(Student)->Bag of Course as stored;create function teacher(Course)->Instructor as stored;
Student
Course
Instructor
enrolled teacher
n
n
n
1
Kinds of functions• Stored functions
– E..g. name(), dept(), teacher(), enrolled()– Store data values– C.f. object attributes, relational tables
• Derived functions– E.g. teaches(), instructorNamed()– Defined as query– C.f. object methods, relational views
• Foreign functions– Implemented in regular programming language (e.g. C or Java)– Computations, statistics, numeric, accessing files and external systems– Basic built-in functions, e.g. plus()– C.f. object methods, relational UDFs (User Defined Functions)
• Procedural functions (functional stored procedures)– May have side effects like updating the database– Should not be used in queries– C.f. object methods, relational stored procedures
Populating and querying the database
• Use the create statement to create objects and set function valuescreate Dept(name) instances :it (‘IT’);create Instructor(name, dept) instances
:tr (’Tore’, :it),:ko (’Kjell’,:it);
create Course(name,teacher) instances(‘Databases’,:tr), (‘Data Mining’,:ko), (‘ETrade’,:ko);• Equivalent formulation using update command set:create Instructor instances :tr, :ko;set name(:tr) = ’Tore’;set dept(:tr) = :it;set name(:ko) = ‘Kjell’;set dept(:ko) = :it;etc.• Examples of functional expressions, the simplest queries:sqrt(2)+3;instructorNamed('Tore');name(dept(instructorNamed('Tore')));
Queries• Select statements:select <expressions>from <variable declarations>where <restrictions on declared variables>;
• Example:select name(p), name(dept(p))from Person pwhere p = instructorNamed(’Tore’);
=>("Tore",”IT”)NOTICE that variables such as p are bound of objects from any domain, domainrelational calculus, which is different from SQL where variable can be bound only torows in tables, tuple relational calculus!select p, name(p) from Person p;=>(#[OID 1440],"Tore")(#[OID 1441],"Kjell")(#[OID 1442],"Thanh")
Function Composition• Function composition simplifies queries that traverse function graph, Daplex
semantics:name(teaches(instructorNamed('Kjell')));=>"ETrade""Data Mining"
• Daplex semantics: name applied on each element in bag returned from teaches, a form of path specification over functions.
• Equivalent more SQLish flattened query:select nfrom Charstring n, Instructor i, Course cwhere n = name(c)and c in teaches(i)and i = instructorNamed(’Kjell’);
• Notice that literal types like Charstring can be used in from clause.
Aggregate Functions• An aggregate function is a function that coerces some value to a single unit,
a bag, before it is called.• ‘Bagged’ arguments are not ‘flattened’ for aggregate functions (no Daplex
semantics for aggregate functions).
count(teaches(instructorNamed(‘Kjell’)));=>2• Signature of aggregate function count and sum:count(Bag of Object) -> Integer sum(Bag of Number) -> Number
• Nested queries, local bags:sum(select count(teaches(p)) from Instructor p);=>3
Quantifiers• Existential and universal quantification over subqueries supported through
two built-in functions:notany(Bag of Object) -> Booleansome(Bag of Object) -> Boolean
• some() tests if there exists some element in the bag• notany() tests if there does not exist any element in the bag (negation)
• Example:select name(p) from instructor p where notany(name(teaches(p))="Data Mining");
⇒ "Tore“• Quantifiers required for relational completeness
Grouping values in bags• Similar to SQL’s group by with variables bound from any domain:select <grouped result>from <variable declarations>where <restrictions on declared variables>group by <group keys>;
• For example:select name(i), count(c) <= Aggregate functions and group keysfrom Course c, Instructor iwhere teaches(i)=cgroup by name(i); <= Group key=> ("K. Orsborn",2)("T. Risch",1)
Ordering values• Similar to SQL’s order by with variables bound from any domain:select <grouped result>…order by <sort key>;
• For example:select name(i), count(c)from Course c, Instructor iwhere teaches(i)=cgroup by name(i)order by count(c); <= sort key
=> ("T. Risch",1)("K. Orsborn",2)
Top-k queries• Similar to SQL’s order by with variables bound from any domain:select <grouped result>…stop after <sort key>;
• For example:select name(i), count(c)from Course c, Instructor iwhere teaches(i)=cgroup by name(i)order by count(c)limit 1; => ("T. Risch",1)
Bags and scans• Can assign query result to bag variable:set :b = iota(1,1000000);count(:b);-> 1000000• Can iterate over very large bags using scans:open :s1 on iota(1,1000000000000);fetch :s1;-> 1fetch :s1;-> 2and so on for a while…
Updating bag-valued functions• Adding elements to bag-valued functions:add hobbies(personNamed(’T.J.M. Risch’)) = ‘Painting’;add hobbies(personNamed(’T.J.M. Risch’)) = ‘Fishing’;add hobbies(personNamed(’T.J.M. Risch’)) = ‘Sailing’;
hobbies(personNamed(’T.J.M. Risch’));‘Painting’‘Fishing’‘Sailing’• Removing elements from bag-valued functions:remove hobbies(personNamed(’T.J.M. Risch’)) = ‘Fishing’;hobbies(personNamed(’T.J.M. Risch’));‘Painting’‘Sailing’
Procedural functions• E.g. to encapsulate database updates (constructors):create function newInstructor(Charstring nm, Charstringdept) -> Instructor ias begin
create Instructor instances i;set name(i)=nm;set dept(i)=dept;return i;
end;• Iterative update:create function removeOverloadedInstructors(Integer th)
->Bag of Charstringas for each Instructor i
where count(select teaches(i)) >= thbegin return name(i);
delete i;end;
• loop and while statements also, if-then-else, scans
Vectors• The datatype Vector stores ordered sequences of objects of any type, e.g. numerical vectors, tuples.• Vector declarations can be parameterized by declaring
Vector of <type>For example:create type Segment;create function start(Segment)->Vector of Real as stored;create function stop(Segment)->Vector of Real as stored;create type Polygon;create function segments(Polygon)->Vector of Segment
as stored;• Vector values have system provided constructors by {..}:create Segment instances :s1, :s2;set start(:s1)={1.1,2.3};set stop(:s1)={2.3,4.6};set start(:s2)={2.3,4.6};set stop(:s2)={2.8,5.3};create Polygon instances :p1;set segments(:p1)={:s1,:s2};
Vector functions and queries• Functions on vectors can be definedcreate function square(Number r) -> Number as r * r;create function length(Segment s) -> Real
as euclid(start(s),stop(s));create function length(Polygon p) -> Realas sum(select length(s)
from Segment s
where s in segments(p));
• Queries, e.g.:length(:s1);count(select s from Segment s where length(s)>1.34);
vselect start(s) from Polygon p, Segment s, Integer i
where s = segments(p)[i]order by i;
Download and run Amos• sa.amos release by streamanalyze.com can be downloaded from: http://streamanalyze.com/sa-amos/
• Cheat sheet:http://www.it.uu.se/research/group/udbl/amos/doc/AmosCheats.html
• Object-Oriented database modeling tutorial:http://www.it.uu.se/research/group/udbl/amos/doc/tut.pdf
• Previous Uppsala University version, Amos II:http://www.it.uu.se/research/group/udbl/amos/