
Haskell for the Real World


Slides from a talk I gave at an ACCU meeting in Mountain View, California, on September 10, 2008.


Page 1: Haskell for the Real World

Haskell for the Real World

Bryan O’Sullivan


Page 2: Haskell for the Real World

Real World

• The hardest problems in modern software

• Reliability

• Modularity

• Performance

• Concurrency


Page 3: Haskell for the Real World

Haskell

• Decades of work in academia

• Vehicle for leading-edge research

• “Breaking out” this decade


Page 4: Haskell for the Real World

Real World + Haskell

• Fast native-code compiler (GHC)

• Debugger, code coverage, profiling

• 750+ open source packages

• Mostly BSD-licensed, one-click install

• Friendly, active user community

• #haskell 12th biggest on freenode


Page 5: Haskell for the Real World

Code You Can Believe In

Bryan O’Sullivan, Don Stewart & John Goerzen

Real World Haskell


Page 6: Haskell for the Real World

Language philosophy

• “Multi-paradigm”, but opinionated

• Carefully chosen defaults:

• Pure and functional

• Static strong typing

• Lazy evaluation


Page 7: Haskell for the Real World

Pure and functional

• Data is immutable

• Code is a function of its visible inputs

• Consequences

• Easier to build, read, test, scale

• Many classes of bug are eliminated


Page 8: Haskell for the Real World

Static strong typing

• All types are known at compile time

• The compiler infers types

• No need to keyboard them in

• Not to be confused with the static type systems of more familiar languages


Page 9: Haskell for the Real World

Static strong typing

• Consequences

• Data conversions are explicit

• Many bugs caught by the compiler

• We don’t pay a “keyboard tax” for safety
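A minimal sketch of inference and explicit conversion in practice. The names `pairUp` and `average` are illustrative and do not come from the talk.

```haskell
-- Hypothetical examples; the names are illustrative, not from the talk.

-- No signature is written here. The compiler infers
--   pairUp :: a -> b -> (a, b)
pairUp x y = (x, y)

-- Conversions are explicit: `sum xs` and `length xs` are both Int,
-- so each is converted with fromIntegral before floating-point division.
average :: [Int] -> Double
average [] = 0
average xs = fromIntegral (sum xs) / fromIntegral (length xs)
```

Writing `sum xs / length xs` without the conversions is rejected at compile time, which is the kind of bug the slide refers to.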


Page 10: Haskell for the Real World

Lazy evaluation

• Defer work until needed

• Consequences

• Improves modularity

• Helps with reasoning and code reuse
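One way to picture the modularity point is a producer that knows nothing about how much of its output will be used. This is a small sketch with made-up names (`squares`, `firstEvenSquares`), not code from the talk.

```haskell
-- Illustrative names, not from the talk.

-- Producer: an infinite list of squares. Nothing is computed until demanded.
squares :: [Integer]
squares = map (^ 2) [1 ..]

-- Consumer: decides how much of the producer's output is needed.
firstEvenSquares :: Int -> [Integer]
firstEvenSquares n = take n (filter even squares)
```

Because evaluation is deferred, `firstEvenSquares 3` only computes squares up to 36, even though `squares` has no end.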


Page 11: Haskell for the Real World

The k-minima problem

• Find the k least elements in a list

• Conventional solutions are complicated

• Haskell solution uses laziness


Page 12: Haskell for the Real World

Lazy k-minima

• The “take” function extracts the first k elements from a list

• The “sort” function sorts a list

k_minima k list = take k (sort list)
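As a self-contained sketch, the definition above only needs `sort` from `Data.List`. The type signature is an addition for readability; the compiler would infer the same type without it.

```haskell
import Data.List (sort)

-- The slide's definition, with the import it needs.
-- The signature is added here; it is what the compiler infers.
k_minima :: Ord a => Int -> [a] -> [a]
k_minima k list = take k (sort list)
```

For example, `k_minima 3 [9, 4, 7, 1, 8, 2]` evaluates to `[1,2,4]`. If `k` exceeds the length of the list, the whole sorted list comes back.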


Page 13: Haskell for the Real World

How does this work?

• “sort” doesn’t completely sort the list

• Only enough to give the k least elements demanded by the caller

• That extra work to sort the rest of the list?

• Never happens!


Page 14: Haskell for the Real World

Algebraic data types

• Powerful and ubiquitous

• Unifying key concepts of data structuring

• enum

• union

• struct


Page 15: Haskell for the Real World

The enum-like view

• A type can have several constructors

data Bool = False | True

data Colour = Red | Green | Blue | Violet
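A short sketch of how such an enum-like type is typically used. The `deriving` clause and the helper names `allColours` and `isPrimary` are additions for illustration.

```haskell
-- The Colour type from the slide, with derived instances added for illustration.
data Colour = Red | Green | Blue | Violet
    deriving (Show, Eq, Enum, Bounded)

-- Every constructor, in declaration order.
allColours :: [Colour]
allColours = [minBound .. maxBound]

-- Pattern matching selects a branch per constructor.
isPrimary :: Colour -> Bool
isPrimary Violet = False
isPrimary _      = True
```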


Page 16: Haskell for the Real World

The union-like view

data PhoneNumber
    = Home [Digit]
    | Work [Digit]

• Major bonus: we know at runtime which constructor was used
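The runtime tag is what pattern matching inspects. In this sketch, `Digit` is assumed to be a synonym for `Int` (the talk does not define it), and `describe` is a hypothetical helper.

```haskell
-- Assumption: the talk does not define Digit; a type synonym stands in for it.
type Digit = Int

data PhoneNumber
    = Home [Digit]
    | Work [Digit]

-- Pattern matching recovers which constructor built the value.
describe :: PhoneNumber -> String
describe (Home ds) = "home: " ++ concatMap show ds
describe (Work ds) = "work: " ++ concatMap show ds
```

Unlike a C union, there is no way to read a `Work` number as if it were a `Home` number by mistake.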


Page 17: Haskell for the Real World

The struct-like view

data Tree a
    = Node (Tree a) (Tree a)
    | Leaf a
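Functions over such a type usually follow its shape, with one equation per constructor. The helpers `leaves` and `depth` below are illustrative and not part of the talk.

```haskell
data Tree a
    = Node (Tree a) (Tree a)
    | Leaf a

-- Collect the leaf values from left to right.
leaves :: Tree a -> [a]
leaves (Leaf x)   = [x]
leaves (Node l r) = leaves l ++ leaves r

-- Number of nodes on the longest path from the root down to a leaf.
depth :: Tree a -> Int
depth (Leaf _)   = 1
depth (Node l r) = 1 + max (depth l) (depth r)
```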


Page 18: Haskell for the Real World

Algebraic data types

data JSON = JObject [(String,JSON)]
          | JArray [JSON]
          | JString String
          | JNumber Double
          | JBoolean Bool
          | JNull
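To show how the three views combine, here is a rough sketch of a renderer for this type. The function `render` is hypothetical, and it leans on `show` for strings and numbers, which only approximates real JSON escaping and number formatting.

```haskell
import Data.List (intercalate)

data JSON = JObject [(String, JSON)]
          | JArray [JSON]
          | JString String
          | JNumber Double
          | JBoolean Bool
          | JNull

-- Simplification: `show` applies Haskell's string escaping and number
-- formatting, which only approximate JSON's rules.
render :: JSON -> String
render (JObject fields) = "{" ++ intercalate "," (map field fields) ++ "}"
  where
    field (key, value) = show key ++ ":" ++ render value
render (JArray items)   = "[" ++ intercalate "," (map render items) ++ "]"
render (JString s)      = show s
render (JNumber n)      = show n
render (JBoolean True)  = "true"
render (JBoolean False) = "false"
render JNull            = "null"
```

Each equation handles one constructor, and the recursive constructors (`JObject`, `JArray`) simply call `render` on their contents.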


Page 19: Haskell for the Real World

Typeclasses

• Ad-hoc polymorphism

• A function’s behaviour depends on the type it is used at

• How do we express this in a non-OO language?


Page 20: Haskell for the Real World

Checking for equality

• How do we express this idea?

• A type whose values can be compared for equality

class Eq a where
    (==) :: a -> a -> Bool


Page 21: Haskell for the Real World

Testing for equality

• Define this function:

• “is a value present in a list?”

• Desired:

• One definition for all types that can be compared for equality

elem :: Eq a => a -> [a] -> Bool


Page 22: Haskell for the Real World

One definition of elem

elem k (x:xs)
    | k == x    = True
    | otherwise = elem k xs
elem k []       = False
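To try this definition in a file or in GHCi, the Prelude's own `elem` has to be hidden first. The sketch below is the slide's definition plus that import and the signature from the previous slide.

```haskell
import Prelude hiding (elem)

-- One definition that works for every type with an Eq instance.
elem :: Eq a => a -> [a] -> Bool
elem k (x:xs)
    | k == x    = True
    | otherwise = elem k xs
elem _ []       = False
```

The same function then works on numbers, characters, or any user-defined type with an `Eq` instance, for example `elem 3 [1, 2, 3]` or `elem 'z' "haskell"`.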


Page 23: Haskell for the Real World

Instances

• How do we compare JSON values?

instance Eq JSON where
    JNumber a == JNumber b = a == b

etc.
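One plausible way to fill in the remaining cases is sketched below. The talk leaves them out, so these equations are an assumption. In particular, comparing `JObject` fields as lists makes equality sensitive to key order.

```haskell
data JSON = JObject [(String, JSON)]
          | JArray [JSON]
          | JString String
          | JNumber Double
          | JBoolean Bool
          | JNull

-- Assumed completion of the instance; the talk only shows the JNumber case.
instance Eq JSON where
    JObject a  == JObject b  = a == b   -- order-sensitive comparison of fields
    JArray a   == JArray b   = a == b
    JString a  == JString b  = a == b
    JNumber a  == JNumber b  = a == b
    JBoolean a == JBoolean b = a == b
    JNull      == JNull      = True
    _          == _          = False    -- different constructors never match
```

Writing `deriving Eq` on the data declaration would generate an equivalent instance automatically.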


Page 24: Haskell for the Real World

Gene sequencing

• Splice e.g. mouse DNA into E. coli

• Replicate

• Extract DNA fragments

• ... now what?


Page 25: Haskell for the Real World

Contamination

• Fragments of target and E. coli genes mixed

• Must filter out the E. coli fragments

• How to identify them quickly?

• Standard solution: BLAST


Page 26: Haskell for the Real World

Filtering in Haskell

• Our solution: 75 lines of Haskell

• 5x faster than BLAST


Page 27: Haskell for the Real World

Development time

• 2 days: develop application

• 2 days: speed up app by 5x

• 2 hours: knock out 3 bugs found by QuickCheck

• 17 seconds: index human chromosome 20

• 5 minutes: check it for 100,000 E. coli fragments


Page 28: Haskell for the Real World

What helped?

• Great libraries

• “bio” handles biological sequences

• “bloomfilter” for fast indexing

• “bytestring” provides efficient I/O

• “QuickCheck” for randomized testing


Page 29: Haskell for the Real World

What helped?

• Laziness

• Generate all k-length E. coli sequences

allKWords k list = map (take k) (tails list)

• List generated on demand

• Constant space overhead
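The one-liner above can be run as written once `tails` is imported from `Data.List`. One detail worth noticing: near the end of the input, `tails` yields suffixes shorter than `k`, so the last few words are short. The variant `fullKWords` is a hypothetical addition that keeps only full-length words; whether the original application needed it is not stated in the talk.

```haskell
import Data.List (tails)

-- The slide's definition. Near the end of the input the suffixes are
-- shorter than k, so the last few "words" are short (the final one is empty).
allKWords :: Int -> [a] -> [[a]]
allKWords k list = map (take k) (tails list)

-- Hypothetical variant that keeps only words of exactly k elements.
-- It stays lazy: words are still produced one at a time on demand.
fullKWords :: Int -> [a] -> [[a]]
fullKWords k list = takeWhile ((== k) . length) (allKWords k list)
```

For example, `allKWords 3 "ACGTA"` gives `["ACG","CGT","GTA","TA","A",""]`, while `fullKWords 3 "ACGTA"` gives `["ACG","CGT","GTA"]`.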


Page 30: Haskell for the Real World

What helped?

• Native code compilation

• Indexing and I/O at C speeds

• Mature profiling tools

• Found opportunities for 5x speedup


Page 31: Haskell for the Real World

What helped?

• QuickCheck is a life saver

• Generated random test cases for us

• If a test failed, provided a test case

• Found and fixed 3 gnarly bugs in 2 hours

• Would take days with traditional testing
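The talk does not show its actual tests, so as a rough illustration here are two properties of the kind QuickCheck checks, written against the `k_minima` function from earlier in the deck. The property names are made up. The block depends only on the standard library; with QuickCheck installed, running `quickCheck prop_least` in GHCi would exercise a property with randomly generated arguments.

```haskell
import Data.List (sort)

k_minima :: Ord a => Int -> [a] -> [a]
k_minima k list = take k (sort list)

-- Property: the result has k elements, capped by the input length
-- and never negative.
prop_length :: Int -> [Int] -> Bool
prop_length k xs = length (k_minima k xs) == max 0 (min k (length xs))

-- Property: every returned element is <= every element that was left out.
prop_least :: Int -> [Int] -> Bool
prop_least k xs = all (\m -> all (m <=) rest) result
  where
    result = k_minima k xs
    rest   = drop k (sort xs)
```

A property is an ordinary function returning `Bool`, so it can also be checked by hand on specific inputs, such as `prop_least 2 [5, 1, 4, 2]`.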


Page 32: Haskell for the Real World

Parallelism

• Next step: run the code in parallel

• Expect 2 days of work needed

• Change maybe 10 lines of code

• The challenge: find the right 10 lines

• Functional programming does not give us parallelism for free ... yet


Page 33: Haskell for the Real World

Concurrency

• Threaded programming is a nightmare

• Locks and condition variables do not scale

• Fundamental problem:

• Combining correct threaded functions does not give a correct threaded program


Page 34: Haskell for the Real World

Software transactions

• A new approach to threaded programming

• Concurrent updates from multiple threads are atomic and isolated

• Like treating shared memory as a database


Page 35: Haskell for the Real World

Interesting features

• The type system prevents us from doing unsafe operations inside a transaction

• We can compose pieces of transactional code into a larger unit

• This still runs as one transaction

• This preserves correctness!
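Composition can be sketched with GHC's `Control.Concurrent.STM` module (from the `stm` package that ships with GHC). The account example and the names `Account`, `deposit`, `withdraw`, and `transfer` are hypothetical and are not taken from the talk.

```haskell
import Control.Concurrent.STM

-- Hypothetical account type: a balance held in a transactional variable.
type Account = TVar Int

deposit :: Account -> Int -> STM ()
deposit acct amount = do
    balance <- readTVar acct
    writeTVar acct (balance + amount)

-- Blocks (by retrying the transaction) until the account holds enough funds.
withdraw :: Account -> Int -> STM ()
withdraw acct amount = do
    balance <- readTVar acct
    check (balance >= amount)
    writeTVar acct (balance - amount)

-- Two smaller pieces of transactional code composed into a larger unit.
-- Run with `atomically`, the whole transfer is a single transaction.
transfer :: Account -> Account -> Int -> STM ()
transfer from to amount = do
    withdraw from amount
    deposit to amount
```

A caller runs it as `atomically (transfer a b 30)`. Because these functions live in the `STM` type rather than `IO`, the compiler rejects any attempt to perform arbitrary I/O inside them, which is the type-system guarantee the first bullet describes.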


Page 36: Haskell for the Real World

Code You Can Believe In

Bryan O’Sullivan, Don Stewart & John Goerzen

Real World Haskell

Online now, free:

book.realworldhaskell.org

In stores in November

~700 pages of good stuff
