Mnesia - An Industrial DBMS with Transactions, Distribution and a Logical Query Language

Hans Nilsson and Claes Wikström

fhans,[email protected]

Computer Science Laboratory

Ericsson Telecom AB

S-125 26 Stockholm, Sweden

Abstract

Mnesia is a full DBMS made for the needs of the telecommunications industry. It has distributed transactions, fast real-time lookups, crash recovery and a logical query language. The DBMS is written in the functional language Erlang, which is also the intended application language. It already has real users developing products.

1 Introduction

Mnesia is a multiuser distributed DBMS (Database Management System) made specifically for industrial telecommunications applications written in the symbolic language Erlang [AVWW95].

Telecommunications applications have needs that differ from the features provided by traditional DBMSs. The applications we now see implemented in Erlang need a mixture of a broad range of features. The most extreme among those are:

1. fast soft real-time key-value lookup

2. complicated non-real-time queries, mainly for operation and maintenance

3. distributed data due to distributed applications

4. high fault-tolerance

5. dynamic reconfiguration

6. variable-length records

In the current first version of Mnesia we primarily use well-known algorithms. In following versions we will begin to experiment with alternative solutions.

Many telecommunications applications are structured in such a way that there are two different sets of data: one which is crucial for traffic, and one which contains maintenance data. The performance requirements on the traffic data are severe. This means that

1. The traffic data must be kept in memory at all times.

2. The traffic data must be organized in such a way that when an external stimulus arrives at the system, all data necessary to process that stimulus can be located very efficiently.

The maintenance data usually contains a copy of the traffic data, although structured in a different way. Traditionally, the management of traffic data in telecommunications applications has not been performed by a general-purpose DBMS, but rather by code tailor-made for each application. One of the design goals for Mnesia was to provide a DBMS where the requirements on both the traffic data and the maintenance data are met.

The rest of this paper is organized as follows: Section 2 is an Erlang introduction and section 3 is an overview of Mnesia. Some usage of Mnesia is described in section 4, notes on the future are given in section 5, and conclusions in section 6.

2 Erlang

Erlang [AVWW95] is a symbolic functional language intended for large real-time applications, mainly in the telecommunications area.

Functions may have many "clauses" and the correct one is selected by pattern matching. Visibility is controlled by explicit export declarations in modules.

Processes are lightweight and created explicitly. They communicate by asynchronous message passing. Messages are received selectively using pattern matching. When running on UNIX there is only one UNIX process, namely the Erlang node. This node has its own internal process handling, so the Erlang processes should not be mistaken for UNIX processes.

Erlang nodes can be run on different types of machines and operating systems, and the Erlang processes can communicate across a network with processes on other nodes. This does not influence how programs are written, except if the message passing time is critical and the physical transport medium is slow. No extra protocol specifications are necessary. The message passing is competitive with other network communication methods [Wik94].
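As an illustration of explicit process creation, asynchronous sends and selective receive, here is a minimal sketch. The module name msg_sketch, the message formats and the small key-value server are hypothetical and chosen only for this example; they are not taken from Mnesia.

```erlang
-module(msg_sketch).
-export([demo/0]).

%% A tiny server process that keeps its state in a list of {Key, Value} pairs.
loop(Store) ->
    receive
        {put, Key, Value} ->
            loop(lists:keystore(Key, 1, Store, {Key, Value}));
        {get, From, Key} ->
            %% Reply asynchronously to the requesting process.
            From ! {reply, self(), lists:keyfind(Key, 1, Store)},
            loop(Store);
        stop ->
            ok
    end.

demo() ->
    Pid = spawn(fun() -> loop([]) end),      % explicit process creation
    Pid ! {put, line_1, busy},               % asynchronous send
    Pid ! {get, self(), line_1},
    %% Selective receive: only a reply from Pid matches this pattern.
    Result = receive
                 {reply, Pid, R} -> R
             after 1000 ->
                 timeout
             end,
    Pid ! stop,
    Result.
```

The same code works unchanged if Pid refers to a process on another Erlang node, which is the location independence described above.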

There are two kinds of support for error recovery in the language: exceptions, and special exit messages propagated to linked processes when a process dies. Those messages can be caught by the other processes to activate some recovery plan, for instance restarting the dying process.

Erlang programs communicate with hardware and with programs written in other languages through ports. Ports behave similarly to processes in that they can receive messages from processes and send messages back to them. The difference from processes is that ports are connected either to non-Erlang software or to hardware.
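The exit-message mechanism for linked processes can be sketched as follows. This is a minimal illustration, assuming a hypothetical worker that crashes on request; the module name restart_sketch and the message formats are invented for the example.

```erlang
-module(restart_sketch).
-export([demo/0]).

%% Hypothetical worker: answers pings and dies when told to crash.
worker() ->
    receive
        crash -> exit(simulated_fault);
        {ping, From} -> From ! pong, worker()
    end.

demo() ->
    %% Trapping exits turns exit signals from linked processes into messages.
    Old = process_flag(trap_exit, true),
    Pid1 = spawn_link(fun worker/0),
    Pid1 ! crash,
    Result =
        receive
            {'EXIT', Pid1, Reason} ->
                %% Recovery plan: start a replacement for the dead worker.
                Pid2 = spawn_link(fun worker/0),
                Pid2 ! {ping, self()},
                Answer = receive pong -> pong after 1000 -> timeout end,
                %% Clean up so the sketch leaves no process behind.
                unlink(Pid2),
                exit(Pid2, kill),
                {restarted_after, Reason, Answer}
        after 1000 ->
            no_exit_message
        end,
    process_flag(trap_exit, Old),
    Result.
```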

Example 1 The append function, which returns its two argument lists appended, is written in Erlang as:

    append([H|T], Z) -> [H|append(T,Z)];
    append([], X) -> X.

□

3 Mnesia

3.1 Features

The DBMS Mnesia provides:

a. storage of full Erlang terms of variable length:

(i) primary memory for fast access (with optional disk backup)

(ii) disk-only for large data sets

b. a logic query language

c. a fast program interface for very simple queries

d. distributed transactions and query evaluation

e. crash recovery

f. replication of data on separate nodes

g. location transparency: applications are written without knowledge of where and how the tables are located

h. run-time schema alteration

Those features are how Mnesia fulfills the requirements given in the introduction. The mapping from the requirements to the features can be pictured in a table:

Mnesia feature    Requirement 1 2 3 4 5 6
a(i)              x x
a(ii)             x
b                 x
c                 x
d                 x
e                 x
f                 x x x x
g                 x
h                 x

Requirement and feature mapping

The feature list above is to be seen as a list of options for the Mnesia user. For data which is access-time critical but need not survive a crash, the user selects primary memory storage only. It is possible to dump the contents of a primary memory table to disc at regular intervals. Data which must survive a crash and has high requirements on access time can be configured in such a way that the data is always kept in RAM and all write operations on the data are logged to disc. The system will then checkpoint the data and truncate the log at regular intervals. Data which is not performance critical can be kept on disc only.

An advanced feature of Mnesia is that none of the system activities, such as checkpointing, dumping tables and schema reconfigurations, ever stop the system. For example, while a table is being copied to a remote node, the table is still available for write operations to all applications.

3.2 Data model and schema

Data is organized in tables. A table has a set of properties, for example:

• on which node it is located

• where it is replicated (= a list of zero or more nodes)

• storage (RAM, RAM with Disc-copy or Disc-only)

• if the system should maintain any indices

The table definitions and the table properties are collected in a schema which may be altered at run time.
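As a sketch of how table properties and run-time schema changes look in code, the following uses the Mnesia API as found in present-day Erlang/OTP; the paper does not spell out the API, and the version described here may differ. The subscriber record and the module name schema_sketch are hypothetical.

```erlang
-module(schema_sketch).
-export([demo/0]).

%% Hypothetical table layout; the first field is the key.
-record(subscriber, {number, state, services}).

demo() ->
    ok = mnesia:start(),
    %% Table properties: attributes, storage type and replica nodes.
    %% ram_copies keeps the table in primary memory only. The options
    %% disc_copies (RAM plus disc log) and disc_only_copies need a disc
    %% schema created with mnesia:create_schema/1 before mnesia:start/0.
    {atomic, ok} = mnesia:create_table(subscriber,
        [{attributes, record_info(fields, subscriber)},
         {ram_copies, [node()]}]),
    %% Run-time schema alteration: add an index while the table is in use.
    {atomic, ok} = mnesia:add_table_index(subscriber, state),
    mnesia:table_info(subscriber, storage_type).
```

Listing more than one node in the storage option replicates the table, and mnesia:add_table_copy/3 adds a replica to a table that already exists.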

Each individual field in a row of a table may be arbitrarily complex and contain compound Erlang terms including object code, lambda expressions and regular data. This is one of the features which make Mnesia suitable for many telecommunications problems where the single most important requirement on the DBMS is that an individual lookup is very fast. This property makes data modeling very flexible, and it allows for a data model where all the data necessary for, e.g., a call in a telephony switch is bundled in a single record.

3.3 Concurrency control and crash recovery

Operations on a database are grouped into transactions which are distributed over the network. All completed transactions are logged, as are the 'dirty' operations with which a time-critical application can bypass the transaction system.
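A dirty operation can be sketched as below, assuming the dirty_read and dirty_write functions of present-day Erlang/OTP Mnesia, which may differ from the interface of the version described here. The route table and the module name dirty_sketch are hypothetical.

```erlang
-module(dirty_sketch).
-export([demo/0]).

%% Hypothetical routing table: dialled prefix -> outgoing trunk.
-record(route, {prefix, trunk}).

demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(route,
        [{attributes, record_info(fields, route)},
         {ram_copies, [node()]}]),
    %% Dirty operations take no locks and run outside any transaction,
    %% so they are fast but give no isolation guarantees.
    ok = mnesia:dirty_write(#route{prefix = "08", trunk = stockholm}),
    [#route{trunk = Trunk}] = mnesia:dirty_read(route, "08"),
    Trunk.
```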

The lock manager uses a multitude of traditional techniques. Locking is dynamic, and each lock is acquired by a transaction when needed. Regular two-phase locking [EGLT76] is used, and deadlock prevention is traditional wait-die [RSL78]. The timestamps for the wait-die algorithm are acquired from Lamport clocks [Lam78] maintained by the transaction manager on each node. When a transaction is restarted, it keeps its Lamport timestamp, which makes Mnesia livelock free as well. The lock manager also implements multi-granularity locking [GLPT75].
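The wait-die rule with Lamport timestamps can be sketched in a few lines. This is not Mnesia's lock manager; it is a minimal illustration that assumes timestamps are represented as {Counter, Node} tuples (an assumption made here so that Erlang term ordering gives a total order) and uses the hypothetical module name wait_die_sketch.

```erlang
-module(wait_die_sketch).
-export([tick/1, merge/2, on_conflict/2]).

%% Lamport clock: advance on a local event.
tick(Clock) -> Clock + 1.

%% Lamport clock: on receiving a message stamped Remote, move past both clocks.
merge(Local, Remote) -> max(Local, Remote) + 1.

%% Wait-die: a requester older than the lock holder (smaller timestamp) may
%% wait; a younger requester dies, that is, it aborts and is restarted.
%% Timestamps are assumed to be {Counter, Node} tuples.
on_conflict(Requester, Holder) when Requester < Holder -> wait;
on_conflict(_Requester, _Holder) -> die.
```

Because a restarted transaction keeps its original timestamp, it eventually becomes the oldest transaction competing for a lock and is then allowed to wait rather than die, which is why the scheme is free from livelock.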

Traditional two-phase commit [Gre78] is used by the transaction manager when a transaction is finished.

Since Mnesia runs on top of distributed Erlang, the implementation is greatly simplified. In a distributed application there are separate Erlang nodes running on (usually) different machines. Erlang transparently takes care of the communication between processes, which may be on separate nodes. Processes and nodes can easily be started, supervised and stopped by processes on other nodes. This makes many communication implementation problems disappear for Mnesia as well as for applications.

Mnesia has (at least) one process on each participating node. Those processes take care of updates, transactions, emergency reconfiguration at node failure as well as normal maintenance reconfigurations.

The user API to the transaction system is straightforward and easy to use. The programmer presents the transaction system with a lambda expression, i.e. a closure, which is then executed by the transaction system.
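A transaction written as a closure might look as follows. The sketch assumes the transaction, read, write and abort functions of present-day Erlang/OTP Mnesia, which may differ from the version described here; the phone_line table, the connect function and the module name txn_sketch are hypothetical.

```erlang
-module(txn_sketch).
-export([demo/0]).

%% Hypothetical table: one row per telephone line.
-record(phone_line, {number, state}).

%% Returns a closure that connects two lines if both are idle.
connect(A, B) ->
    fun() ->
        %% Write locks are acquired when the rows are read.
        [LineA] = mnesia:read(phone_line, A, write),
        [LineB] = mnesia:read(phone_line, B, write),
        case {LineA#phone_line.state, LineB#phone_line.state} of
            {idle, idle} ->
                ok = mnesia:write(LineA#phone_line{state = {busy, B}}),
                ok = mnesia:write(LineB#phone_line{state = {busy, A}});
            _ ->
                %% Abort undoes everything done inside the closure.
                mnesia:abort(line_busy)
        end
    end.

demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(phone_line,
        [{attributes, record_info(fields, phone_line)},
         {ram_copies, [node()]}]),
    {atomic, ok} = mnesia:transaction(fun() ->
        ok = mnesia:write(#phone_line{number = 1001, state = idle}),
        ok = mnesia:write(#phone_line{number = 1002, state = idle})
    end),
    First = mnesia:transaction(connect(1001, 1002)),
    Second = mnesia:transaction(connect(1001, 1002)),
    {First, Second}.
```

The first call commits and the second is aborted because both lines are already busy. The closure may be executed more than once if the transaction is restarted, so it should have no side effects outside the database.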

3.4 Queries

The query construction is integrated with the Erlang language through list comprehensions. This removes the impedance mismatch that a solution like Embedded SQL or some embedded kind of Datalog [Ull89] would give with a functional language like Erlang. The use of list comprehensions in connection with relational DBMSs has been known for a long time [Bre88].

Example 2 The Erlang list comprehension:

    [E.name || E <- table(employee),
               D <- table(department),
               E.name = D.boss,
               E.sex = female]

extracts a list of the names of all female bosses in a database. This is equivalent to the Datalog expression:

    employee(Name, _, _, female, _, _),
    department(_, _, _, Name)

□

We think that the straightforward translation between list comprehensions and Datalog makes a nice and clean connection.
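For comparison, the query of Example 2 can be sketched with the qlc module of present-day Erlang/OTP. This is an assumption about a later interface, not the syntax of the Mnesia version described here. Apart from name, sex and boss, the record fields are hypothetical and were chosen only to match the arities of the Datalog expression; the module name query_sketch is also hypothetical.

```erlang
-module(query_sketch).
-include_lib("stdlib/include/qlc.hrl").
-export([demo/0, female_bosses/0]).

%% Hypothetical layouts: only name, sex and boss appear in the example.
-record(employee, {name, emp_no, salary, sex, phone, room_no}).
-record(department, {id, name, site, boss}).

%% Names of all female employees who are the boss of some department.
female_bosses() ->
    Q = qlc:q([E#employee.name ||
                  E <- mnesia:table(employee),
                  D <- mnesia:table(department),
                  E#employee.name =:= D#department.boss,
                  E#employee.sex =:= female]),
    {atomic, Names} = mnesia:transaction(fun() -> qlc:e(Q) end),
    Names.

demo() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(employee,
        [{attributes, record_info(fields, employee)}]),
    {atomic, ok} = mnesia:create_table(department,
        [{attributes, record_info(fields, department)}]),
    Rows = [#employee{name = "Eva", emp_no = 1, sex = female},
            #employee{name = "Bo", emp_no = 2, sex = male},
            #employee{name = "Anna", emp_no = 3, sex = female},
            #department{id = 1, name = "Lab", boss = "Eva"},
            #department{id = 2, name = "Sales", boss = "Bo"}],
    {atomic, ok} = mnesia:transaction(fun() ->
        lists:foreach(fun mnesia:write/1, Rows)
    end),
    female_bosses().
```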

A logical query language was chosen for two reasons. The first is that one of the authors has a background in logic programming, and the second is that we will need to make an SQL interface in the future. We think that with a logical language as the target, the SQL translation will be somewhat simpler.

Cursors are available to users. If the querying user process runs on a node on a multi-CPU machine, the database query can sometimes be parallelized, and it will always run in parallel with the user process.

3.5 About the Implementation

All of Mnesia is written in Erlang except for a special key-value dictionary using linear hashing [Lar88]. This dictionary was added to the Erlang implementation to get fast key lookups.

The majority of our present users' queries are non-recursive. Therefore we have chosen to separate the recursive and the non-recursive parts of the queries. In the first version we make the non-recursive query evaluation efficient, and in a later version the recursion will be more efficient than today.

Minimal cycles in the query flow graph are collapsed into one node, giving a non-cyclic graph corresponding to a non-recursive query with explicit recursion operators. This non-recursive query is optimized and compiled using standard methods like partial evaluation and goal reordering guided by statistics about the database. Note that there is no optimization of recursive parts (yet). They are simply put last in the query.

The query is evaluated using the operator technique from relational DBMSs [Gra93]. The recursive parts are evaluated by SLG resolution [WC93, WCS93, CW93]. This is currently implemented by a naive, straightforward interpreter.

The operator technique maps extremely well onto the Erlang processes and messages. As a spin-off effect, queries are automatically parallelized when run on a multi-CPU computer.
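How query operators can map onto processes and messages is sketched below. This is not Mnesia's query evaluator; it is a minimal illustration with one process per operator, where the module name operator_sketch, the {row, Row} and eos message formats, and the sample rows are all hypothetical.

```erlang
-module(operator_sketch).
-export([demo/0]).

%% Scan operator: sends every row to the next operator, then end-of-stream.
scan(Rows, Next) ->
    lists:foreach(fun(Row) -> Next ! {row, Row} end, Rows),
    Next ! eos.

%% Select operator: forwards only the rows that satisfy Pred.
select(Pred, Next) ->
    receive
        {row, Row} ->
            case Pred(Row) of
                true -> Next ! {row, Row};
                false -> ok
            end,
            select(Pred, Next);
        eos ->
            Next ! eos
    end.

%% The consumer collects rows until end-of-stream.
collect(Acc) ->
    receive
        {row, Row} -> collect([Row | Acc]);
        eos -> lists:reverse(Acc)
    end.

demo() ->
    Self = self(),
    Rows = [{anna, female}, {bo, male}, {eva, female}],
    IsFemale = fun({_Name, Sex}) -> Sex =:= female end,
    %% Each operator runs in its own process, so on a multi-CPU machine
    %% the stages can execute in parallel without extra code.
    Select = spawn(fun() -> select(IsFemale, Self) end),
    spawn(fun() -> scan(Rows, Select) end),
    collect([]).
```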

4 Projects

Mnesia is currently used in several development projects within Ericsson. Some of these projects are small-scale prototype projects and some are large-scale product projects. Mnesia is being used for both traffic data and maintenance data in various ways, and the applications include pure switch controllers, Intelligent Network controllers as well as office/telephony/internet applications.

5 The Future

Primarily we will evaluate Mnesia with the applications that are now being written. We will compare the performance with commercial DBMSs as well as with modern research DBMSs. Preliminary performance measurements are promising.

Mnesia has no consistency checking at updates today. We will therefore add integrity constraint checking in the near future.

In the area of query processing there are two obvious continuations, namely making an efficient SLG implementation and optimizing recursive queries.

We will try other transaction and locking algorithms, especially real-time algorithms.

Some of our users need an SQL interface. This will be provided, and probably also an ODBC interface. This will enable external PC applications to access the DBMS.

Another problem which has not been addressed is the problem of partitioned networks. If the network between two Mnesia nodes fails, but the nodes themselves continue to operate, we have a problematic situation.

6 Experiences and Conclusions

By combining well-known algorithms with the symbolic high-level language Erlang it is possible to write a full industrial DBMS with only a fraction of the manpower normally required. In doing so we have also obtained a good platform for research and some real-world applications for benchmarking.

References

[AVWW95] Joe Armstrong, Robert Virding, Claes Wikström, and Mike Williams. Concurrent Programming in ERLANG. Prentice Hall, second edition, 1995.

[Bre88] P. T. Breuer. Applicative query languages. Technical report, Cambridge University Engineering Dept, 1988.

[CW93] A. W. Chen and D. S. Warren. Query evaluation under the well-founded semantics. In Proc. ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Sys., page 168, Washington, DC, May 1993.

[EGLT76] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11):624-633, November 1976.

[GLPT75] J. N. Gray, R. A. Lorie, G. R. Putzolu, and I. L. Traiger. Granularity of locks and degrees of consistency in a shared database. Research Report RJ1654, IBM, September 1975.

[Gra93] Goetz Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73-170, June 1993.

[Gre78] J. N. Gray. Notes on database operating systems. In Operating Systems: An Advanced Course, Lecture Notes in Computer Science, volume 60, pages 393-481. Springer-Verlag, Berlin, 1978.

[Lam78] L. Lamport. Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978.

[Lar88] P.-Å. Larson. Dynamic hash tables. Communications of the ACM, 31(4), 1988.

[RSL78] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis. System level concurrency control for distributed databases. ACM Transactions on Database Systems, 3(2):178-198, June 1978.

[Ull89] Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, volume 2. Computer Science Press, 1989. ISBN 0-7167-8162-X.

[WC93] David S. Warren and Weidong Chen. Towards effective evaluation of general logic programs. Technical report, Computer Science Department (SUNY) at Stony Brook, 1993.

[WCS93] D. S. Warren, W. Chen, and T. Swift. Efficient computation of queries under the well-founded semantics. Technical Report 93-CSE-33, Southern Methodist University, 1993.

[Wik94] Claes Wikström. Distributed computing in Erlang. In First International Symposium on Parallel Symbolic Computation, Sep 1994.