With the advent of multicore CPUs, cloud computing and Big Data, we are currently observing changes that will eventually lead information technology into a whole new era, and we need to search for programming language paradigms that match it. Will Functional Programming languages (FPLs) be the game changer?
Functional Programming for computing clouds
Joerg Fritsch, NATO CI Agency
School of Computer Science & Informatics, Cardiff University, 24 October 2012
2
Agenda
• Essentials of Functional Programming Languages
• Vision and requirements of computing clouds
• Haskell: a Pure Functional Programming Language
• Some innocent code: working with immutable data
• Gaps between the cloud computing vision and FPLs
3
Functional Programming Languages
• Based on the λ-calculus
• Declarative
• Functions are declared; they describe the relation between input and output
• Functions always evaluate to the same value for a given argument (“free of side effects”).
• Variables are assigned once.
• Functional PLs that by default exclude destructive modifications (to data structures) are called “pure.”
4
λ - calculus
• Alonzo Church, 1930s
• Small grammar
• Parts of the grammar reappear in Lisp and Haskell syntax
• Can express everything that is computable
• No state!
5
Pure FPLs
• Functions can be composed, curried, etc.
• Pure functions that do not depend on each other's results can be executed in parallel
• The compiler can make the code fit for multicore, e.g. by re-arranging the order of function execution or by inlining.
• The runtime can cache function evaluations.
• IO is a beast that disturbs these concepts & needs to be tamed (for example with a monad).
• Every Haskell program is an action in the IO monad (main :: IO ()).
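A minimal sketch of composition, currying and the separation of pure code from IO. All names (scale, double, doubleAfterIncrement, report) are illustrative and do not come from the slides.

```haskell
-- Illustrative names only; none of them come from the slides.

-- A curried function: supplying one argument yields a new function.
scale :: Int -> Int -> Int
scale factor x = factor * x

-- Partial application of the curried function.
double :: Int -> Int
double = scale 2

-- Composition: increment first, then double.
doubleAfterIncrement :: Int -> Int
doubleAfterIncrement = double . (+ 1)

-- IO is kept apart from the pure code: this action only prints
-- what the pure functions computed.
report :: IO ()
report = print (map doubleAfterIncrement [1, 2, 3])
```

Only report has an IO type; the three functions above it cannot perform side effects, which is what gives the compiler and runtime room to reorder or cache them.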
6
Pure FPLs (continued)
• Haskell
• Clean
• Go
• F#
• ML / OCaml
• Lisp / Scheme
• Scala
• Clojure
• XSLT
• Erlang
• SQL
• Mathematica
7
Vision & requirements of cloud computing
• Clouds will need to support scalable programs.
• “Any” application scaled through distribution over parallel (multicore) hardware.
• Applications with high concurrency are good candidates for parallelization.
8
Elasticity in Computing Clouds (now)
• Duplication!
• IaaS
– Duplicate VMs including OS.
• PaaS
– Duplicate language App Servers (e.g. JVM, Rails) or RTS and guest code.
– Duplicate app execution engine (that is, a component of the PaaS platform).
• (Virtualized) Load Balancers are the glue. “Clustering”
• Concurrency is the enabler for parallelization.
• MapReduce sold as a separate capability.
• Multi-tenancy always supported.
9
Elasticity in Computing Clouds (continued)
10
Elasticity in Computing Clouds (in the future?)
Legacy/IaaS
• Currently prevailing
• Unit of scale = OS, VM, Runtime
• Duplication of units
Future/PaaS
• Borders of building blocks are dissolved
• Unit of scale = (Green) thread?
• Requires new software, new programming languages, new designs.
11
Haskell
• Named after Haskell Brooks Curry (1900 - 1982). Combinatory logic (1930s).
• Born as the Haskell 1.0 standard in 1990 (at approximately the same time as Erlang)
• Haskell 98 is the most prominent definition so far
12
Haskell (continued)
• Is a pure functional PL
• Has a static type system
• Is lazy
• Function composition and currying mimic mathematical functions
• Has monads (related to category theory)
• Is sometimes mind-boggling
13
What does Haskell bring to the table?
Haskell
• Functional
• Strong types
• Inherently parallel
• Immutable by default
• Lazy evaluation
• Code maintainability
14
Functions
• Functions are data as well
• Functions consist of way less code than objects
• Higher order functions
• return is a function name
• Function signatures declare constraints (types) and computational strategies.
adder :: [Int] -> Int        -- type signature: fun_name :: input_type -> output_type
adder [] = 0                 -- define output for the empty list
adder (x:xs) = x + adder xs  -- use some fancy recursion
15
Immutability of Data
• The consequences are huge. There is more data than you think. For example a counter: c = c + 1;
• Haskell implementation of “counters” depends on what you need to achieve.
• Common to use Map and Fold (aka reduce)
• Eventually counters represent some sort of state. Use the state monad: Control.Monad.State
• Haskell is by default pure. Mutable data structures can be used (Data.IORef, Data.Judy) but are seen as “not idiomatic”.
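The two routes to a “counter” can be sketched as follows, assuming the task is to count the even numbers in a list (a made-up task; countEvens and countEvensIO are hypothetical names). The sketch uses only the base package; the Control.Monad.State route is left out because it needs an additional library.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (foldl')

-- Pure "counter": the running count is the accumulator of a fold,
-- so nothing is ever updated in place.
countEvens :: [Int] -> Int
countEvens = foldl' step 0
  where
    step count x
      | even x    = count + 1
      | otherwise = count

-- Mutable counter in IO, the less idiomatic route via Data.IORef.
countEvensIO :: [Int] -> IO Int
countEvensIO xs = do
  ref <- newIORef 0
  mapM_ (\x -> if even x then modifyIORef' ref (+ 1) else return ()) xs
  readIORef ref
```

Both versions return the same count; the difference is that the second one is forced into IO by its type, while the first stays a pure function.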
16
Immutability of Data (continued)
• Data.IORef is part of the base package.
• The function unsafePerformIO can “subvert” the type system and allows any kind of mutable state.
• A large number of Haskell modules make use of it!
– Randomness & Encryption, GUIs, …
• Is immutability over-emphasized?
17
Immutability of Data: There is no list
Lists are built on top of cons cells.
Cons cells contain pairs of values.
Example: cons (:) and append (++) to a “list”.
[1,2,3,4] = 1:2:3:4:[] = 1 : (2 : (3 : (4 : [] ) ) )
cons (:)
0:1:2:3:4:[] = 0 : (1 : (2 : (3 : (4 : [] ) ) ) )
Result is a new cons cell holding 0 plus a pointer to the previous list. Runs in O(1) time. This is also called “sharing”.
append (++)
[1,2,3,4] ++ [5] = 1 : (2 : (3 : (4 : (5 : []) ) ) )
Costly operation: the left-hand list is taken apart recursively and rebuilt cell by cell. The result is a new data structure; the original list is left unmodified. Runs in O(n) time.
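The cost difference between cons and append becomes visible when both are written out by hand. The sketch below uses the hypothetical names prepend and append; append is meant to behave like the Prelude's (++).

```haskell
-- Prepending allocates one cons cell; the old list is shared, not copied.
prepend :: a -> [a] -> [a]
prepend x xs = x : xs

-- A hand-written append (same behaviour as the Prelude's ++): it walks
-- the left list and rebuilds each of its cells, while the right list
-- is reused as the shared tail.
append :: [a] -> [a] -> [a]
append []       ys = ys
append (x : xs) ys = x : append xs ys
```

prepend does a constant amount of work, whereas append recurses once per element of its left argument.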
18
Data.Map.Fold (Map Reduce)
• Fold
adder :: [Int] -> Int
adder xs = foldr (+) 0 xs  -- reduce a list using (+)
• Data structures can be folded from the left or from the right: foldl, foldr, foldM
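A short sketch of how the three folds differ. Subtraction is used because, unlike (+), it makes the nesting direction visible; the names rightNested, leftNested and divideAll are illustrative.

```haskell
import Control.Monad (foldM)

-- foldr nests to the right: 1 - (2 - (3 - 0))
rightNested :: Int
rightNested = foldr (-) 0 [1, 2, 3]

-- foldl nests to the left: ((0 - 1) - 2) - 3
leftNested :: Int
leftNested = foldl (-) 0 [1, 2, 3]

-- foldM threads a monad through the fold; here Maybe aborts the
-- whole reduction on a division by zero.
divideAll :: Int -> [Int] -> Maybe Int
divideAll = foldM safeDiv
  where
    safeDiv _   0 = Nothing
    safeDiv acc d = Just (acc `div` d)
```

rightNested evaluates to 2 and leftNested to -6. divideAll 100 [2, 5] gives Just 10, while any zero in the list turns the result into Nothing.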
19
Strong Type System
• All monomorphic types are part of the category of Haskell types, “Hask”. Maps between types are the functions in Haskell.
• Data types can be tainted. E.g. IO Int, Maybe Int
• Type system supports safety and correctness. Haskell code is reasonably easy to test.
• At the beginning the type system frequently gets in your way.
• Maintainability: I am often positively surprised how many changes to my existing code work at the first compilation (once I get the types right).
• Definition of own types, type classes, etc. lays the foundation for great flexibility.
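A small sketch of a user-defined type, a user-defined type class and a “tainted” result type. Shape, HasArea, safeHead and headOrZero are hypothetical names chosen for illustration.

```haskell
-- A user-defined type and type class (hypothetical names).
data Shape = Circle Double | Rectangle Double Double

class HasArea a where
  area :: a -> Double

instance HasArea Shape where
  area (Circle r)      = pi * r * r
  area (Rectangle w h) = w * h

-- The Maybe in the result type "taints" the value: callers have to
-- handle the empty-list case before they can use the Int.
safeHead :: [Int] -> Maybe Int
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Unwrapping forces an explicit decision about the failure case.
headOrZero :: [Int] -> Int
headOrZero xs = maybe 0 id (safeHead xs)
```

Code that forgets the Nothing case and uses the result of safeHead as a plain Int is rejected at compile time, which is one way the type system supports correctness.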
20
(Parametric) Polymorphism: Type Variables
adder :: Num a => [a] -> a
adder xs = foldr (+) 0 xs  -- reduce a list using (+)
• More powerful types of polymorphism: type classes, kinds, … .
• With extensions such as UndecidableInstances the type system is Turing complete & allows manipulations exceeding those of most other PLs
• Type classes & type level programming
21
Lazy Evaluation
• Lazy evaluation, “call-by-need”.
• Partially the paradigm that makes immutable data structures workable (see also “sharing”).
• Risk of space leaks
• Opens up a door to “infinity”: infinite lists [1, …], Fibonacci numbers, primes, e, … & to new strategies in AI (Hughes, 1990).
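The “door to infinity” can be sketched with three infinite lists. The definitions are common textbook formulations rather than anything taken from the slides, and the prime sieve is deliberately simple, not efficient.

```haskell
-- An infinite list; only the demanded prefix is ever computed.
naturals :: [Integer]
naturals = [1 ..]

-- Fibonacci numbers defined in terms of themselves.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Primes by trial division (simple, not efficient).
primes :: [Integer]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []
```

Because of call-by-need, take 10 fibs terminates even though fibs has no end; asking for the length of any of these lists would not.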
22
Do Cloud Computing and FPLs match?
• Asynchronous operations
– Immutable data. Shared nothing. Message passing (e.g. actors) available to re-synchronize processes. STM is more manageable than locks.
• Parallel, multi- & many-core support
– FPLs are inherently parallel. Functions, closures, currying. Declarative. Compiler has freedom to re-arrange “everything”.
• Elasticity & large scale operations
– Elasticity is left to the developer or to the “app engine”. Code easily testable & maintainable.
• Secure, multi-tenancy, confidentiality
– No. “Safe Haskell” may be a good start.
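A minimal sketch of the shared-nothing, message-passing style using green threads. It assumes MVar from the base package as the mailbox; this is a stand-in for illustration, not an actor library and not STM. The names worker and sumChunks are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- A worker shares no state with its parent: it receives an immutable
-- input and sends its result back as a message.
worker :: [Int] -> MVar Int -> IO ()
worker chunk mailbox = putMVar mailbox $! sum chunk

-- Start one lightweight thread per chunk, then collect all replies.
sumChunks :: [[Int]] -> IO Int
sumChunks chunks = do
  mailboxes <- mapM spawn chunks
  partials  <- mapM takeMVar mailboxes
  return (sum partials)
  where
    spawn chunk = do
      mailbox <- newEmptyMVar
      _ <- forkIO (worker chunk mailbox)
      return mailbox
```

Because the input chunks are immutable, the workers need no locks; the only synchronization point is the reply message.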
23
Multi Tenancy: Safe Haskell
• Released to the public in early 2012.
• Vision: tenants upload code (e.g. a worker) that gets compiled and executed as a plugin by a Haskell app-engine.
• Plugin concept based on the library System.Eval.Haskell
• New language extensions to allow secure code only: -XSafe, -XTrustworthy, -XUnsafe
• Eventually based on type safety.
24
Safe Haskell (continued)
• Two routes to decide what is trusted:
• -XSafe = trust inferred by the compiler, limiting Haskell to a (small) subset.
• In PaaS subsets and restrictions are “normal”. Think Java on the Google App Engine.
• -XTrustworthy = trust decided by a person. Not a powerful security concept?
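What tenant code under -XSafe might look like, as a sketch: the pragma form of the flag is placed at the top of the module, and tenantWorker is a hypothetical name for the uploaded worker.

```haskell
{-# LANGUAGE Safe #-}

-- With the Safe pragma GHC rejects this code if it imports an unsafe
-- module, so unsafePerformIO and similar escape hatches are unavailable.

-- A hypothetical tenant-supplied worker: a pure function whose type
-- shows that it cannot perform IO.
tenantWorker :: [Int] -> Int
tenantWorker = sum . map (* 2)
```

The hosting side can then rely on the type [Int] -> Int as a guarantee rather than a convention, which is the sense in which the concept rests on type safety.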
25
Issues
• There is no obvious way to match functions to threads.
• Threads are more related to sequential programming (with shared memory) than to FP. Think CSP.
• Many programs have to parallelize relatively small computations with high inter-dependency.
• Message passing & actors are also a poor fit for distributing small computations.
• Function composition is … sequential execution!
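One existing way to hand small pure computations to the runtime instead of mapping them to threads by hand is GHC's par and pseq, shown here as a sketch. parSum is a hypothetical name, and whether this pays off for computations this small is exactly the open question raised above.

```haskell
import GHC.Conc (par, pseq)

-- Sum a list by evaluating the two halves as independent computations.
-- `par` only hints that `left` may be evaluated in parallel (a spark);
-- the runtime decides whether a core actually picks it up.
parSum :: [Int] -> Int
parSum xs = left `par` (right `pseq` (left + right))
  where
    (front, back) = splitAt (length xs `div` 2) xs
    left  = sum front
    right = sum back
```

The result is the same with or without parallel evaluation; only the evaluation order is left to the runtime.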
26
Issues (continued)
• When a computation is moved to a remote node, little is known about cost of transport and state (e.g. load of the remote node). Multi-tenancy!
• Cost model required.
• (Network) protocols are the most prominent cost center.
• It is extremely unlikely that commercial clouds will use “niche” hardware or proprietary protocols.
• Protocol design will need to be simple and lightweight.
• Protocols in distributed environments will orchestrate and coordinate. Basis for a DS coordination language?
27
Amdahl’s Law
• Possible to calculate the speed improvement when n% of the code is parallel.
• Unknown under what conditions the law holds.
• Relatively small influences have huge adverse effects:
– code that has a parallel portion of 95% results in a speed improvement of factor six on an 8 core CPU.
– code that has a parallel portion of 75% results in a speed improvement of factor three on an 8 core CPU.
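The two figures can be reproduced from the usual statement of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of cores. The function name amdahlSpeedup is an illustrative choice.

```haskell
-- Amdahl's law: p is the parallel fraction of the program (0..1),
-- n is the number of cores.
amdahlSpeedup :: Double -> Int -> Double
amdahlSpeedup p n = 1 / ((1 - p) + p / fromIntegral n)
```

amdahlSpeedup 0.95 8 is roughly 5.9 and amdahlSpeedup 0.75 8 is roughly 2.9, which the bullets round to factor six and factor three.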
28
Amdahl’s Law (continued)
29
If Amdahl’s law holds, then …
We had better go on & develop sequential code, because inefficiencies and overhead add up:
• Compiler
• Runtime
• Competition for the CPU cores & resources
• By the way: (OS) threads are costly to provision, here elastic may become plastic!
30
Thank You.
31
Spare Slides
32
Mythical Walk-Through
Quantum Field Theory
Jones Polynomial
Knot Theory
Category of Tangles
Category Theory
“Hask”, Category of Haskell Types (and maps)
Haskell
33
OOP
• “OOP is eliminated entirely from the introductory curriculum [of Carnegie Mellon University], because it is both anti-modular and anti-parallel by its very nature, and hence unsuitable for a modern CS curriculum.” (Harper, 2011)
34
Common Claims & Expectations
• FPLs may let us get away with less duplication.
• FPLs are inherently parallel
• FPLs are inherently thread safe
• FPLs are inherently modular
• FPLs are easily testable and maintainable.