Amis Consulting LLP

1977-1981 : Research in Combustion/Fluids.

1983-1991 : Scientific computing image processing

1992-1997 : UK Healthcare / Imperial College

1997-2003 : Dotcom Boom (and bust) !!!

2003- : Financial Systems.

Currently involved in High Performance Computing and (of course) Big Data as well as all the other stuff.

Worked with a variety of technologies

● Languages (in anger):
– Fortran / C / Ada / Perl / Python / Lisp / Java / PHP / Groovy / NodeJS
– … our GOTO languages remain C and Perl but ???
● Back-ends:
– Unix (not just Linux) and Windows (so some .NET)
● Databases:
– Both relational and NoSQL (Redis, Mongo, Neo4j)
● Moving into the cloud:
– AWS: Map-reduce, Redshift, Google App Server

Then along came R …

● At King's in the late 2000s
● Interest was in HPC (mainly CUDA) applied to financial systems.
● Started using Matlab but was looking for a similar type of package for personal/company usage.
● GNU Octave and R both fitted the bill; R won, at the time.
● Looked at (and was impressed by) Python

History

● Gang of “four”:
– Jeff Bezanson, Viral Shah
– Stefan Karpinski, Alan Edelman
● Started at MIT in 2010
● First release February 2012
● Still actively maintained by G4
● MIT using Julia in courses (on YouTube)

What happened to Ada?

● Designed 1977/83 for the US DoD in order to supersede the hundreds of languages the DoD used.
● The DoD mandated its use in 1987.
● Dropped the mandate in 1997.
● Still used in air traffic control systems such as iFacts, GNATS.
● Nearest meetup group is in Stockholm.

Runners and Riders

Current field:

1. Runners: Matlab, R, Python

2. Riders: C/C++, Java

3. Outsiders: Scala, Clojure

4. Non-starter: Perl

What makes a good Data Science Language? (1)

● Be a general purpose language with a sizable user community and an array of general purpose libraries, including good GUI libraries, networking and web frameworks.

● Be free, open-source and platform independent.

● Be fast and efficient.

● Have a good, well-designed library for scientific computing, including non-uniform random number generation and linear algebra.

● Have a strong type system, and be statically typed with good compile-time type checking and type safety.

● Have reasonable type inference.

● Have a REPL for interactive use.

What makes a good Data Science Language? (2)

● Have good tool support - including build tools, doc tools, testing tools, and an intelligent IDE.

● Have excellent support for functional programming, including support for immutability and immutable data structures and “monadic” design

● Allow imperative programming for occasions where it makes sense.

● Be designed with concurrency and parallelism in mind, having excellent language and library support for building really scalable concurrent and parallel applications.

● Have excellent built-in data capabilities.

● Have comprehensive math and statistical routines.

Comparison with Matlab

● Julia syntax is similar to Matlab but its construction is purposely very different.
● Matlab has only one data structure (the matrix) and is optimised for matrix operations. Other native computations can be very slow.
● The focus on matrices led to some important differences in MATLAB’s design compared to general-purpose programming languages such as Julia.
● Julia uses similar matrix syntax to Matlab but also incorporates list comprehensions.
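
To make the last point concrete, here is a minimal sketch in current Julia syntax showing Matlab-style matrix literals next to list comprehensions. The variable names and values are invented for the illustration.

```julia
# Matlab-style matrix literal: spaces separate columns, semicolons separate rows.
A = [1 2; 3 4]

# Matrix operations read much as they would in Matlab.
B = A' * A          # transpose of A times A
C = A .^ 2          # element-wise square

# A list comprehension builds an array from an expression and a range.
squares = [i^2 for i in 1:5]

# Two ranges give a matrix: here a 3x3 multiplication table.
table = [i * j for i in 1:3, j in 1:3]

println(B)
println(C)
println(squares)
println(table)
```

The comprehension form has no direct Matlab counterpart, where the same result would usually be built with a loop or a vectorised expression.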

Comparison with R

● Origins as open-source clone of S+.
● Still seen as a “statistical” DSL.
● R is single threaded and hard to speed up.
● Introduced the data frame structure which is also present in Julia
● Julia also has an RDatasets package.
● R has very good graphics and data visualisation support.
● Julia has a Google group: julia-stats.
● Julia can call R modules using the Rif package.

Comparison with Python

● Python now seen by many as the Data Science language.
● Strength lies in its community support.
● Modules such as numpy, scipy, matplotlib and pandas are very powerful.
● Speed up using PyPy
● Mature frameworks such as Django
● Julia approach is co-operation not confrontation, via PyCall and also IJulia (IPython)

What makes Julia special?

● It is written in Julia, apart from a small core, and the code is available to look at.
● The designers are data scientists and not tied to companies such as Google (Go) or Mozilla (Rust).
● It has been designed for parallelism / distributed computation
● It takes every opportunity to cooperate rather than confront.
● Julia intends to combine the best from MATLAB, R and Python into one language that is to be consistent, well designed and fast.
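
As a rough illustration of the design for parallelism / distributed computation mentioned above, here is a minimal sketch using the Distributed standard library of current Julia releases. The worker count, the function name `mean_of_randoms` and the workload sizes are assumptions made for this example, not anything taken from the slides.

```julia
using Distributed

# Start two local worker processes (an arbitrary count for this sketch).
addprocs(2)

# Define the function on every process so that the workers can call it.
@everywhere function mean_of_randoms(n)
    total = 0.0
    for _ in 1:n
        total += randn()
    end
    return total / n
end

# pmap hands the eight calls out to the workers and collects the results in order.
results = pmap(mean_of_randoms, fill(100_000, 8))
println(length(results))

# Shut the workers down again.
rmprocs(workers())
```

The same pattern scales from local processes to remote machines by changing how the workers are added, which is why it is shown here rather than a thread-based version.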

Special features

• Easy installation
• JIT compilation
• Built-in package manager
• Coroutines and green threads
• Multiple dispatch
• Dynamic type system
• Meta programming with Lisp-like macros
• Call C functions directly
• Call Python functions: (PyCall)
• Best-of-breed C and Fortran libraries
• Unicode support
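
Of the features listed, multiple dispatch is probably the least familiar to Matlab, R and Python users, so a small sketch may help. The types `Instrument`, `Stock` and `Bond` and the functions `describe` and `combine` are hypothetical names invented for this illustration; the syntax is that of current Julia releases.

```julia
# Hypothetical types, invented for this illustration.
abstract type Instrument end

struct Stock <: Instrument
    price::Float64
end

struct Bond <: Instrument
    face::Float64
    rate::Float64
end

# One generic function with a separate method per argument type.
describe(x::Stock) = "stock"
describe(x::Bond) = "bond"

# Dispatch looks at the types of ALL arguments, not just the first one.
combine(a::Stock, b::Stock) = "two stocks"
combine(a::Stock, b::Bond) = "stock hedged with a bond"
combine(a::Instrument, b::Instrument) = "some other pair"   # generic fallback

s = Stock(100.0)
b = Bond(1000.0, 0.05)

println(describe(s))
println(combine(s, s))
println(combine(s, b))
println(combine(b, s))   # no specific method, so the fallback is chosen
```

The most specific matching method wins, so new types and new combinations can be added later without editing the existing methods.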

Modules and packages

● Julia has its own built-in package manager
● There are (now) 250+ packages.
● These include:
– Statistics
– Graphics
– System tools
– Database
– Web and Cloud
– Simulation
● It's quite easy to add your own package (via GitHub)

100+ contributors, 1000+ mailing list subscribers, 175+ packages

AWS, ArgParse, BSplines, Benchmark, BinDeps, BioSeq, BloomFilters, Cairo, Calculus, Calendar, Cartesian, Catalan, ChainedVectors, ChemicalKinetics, Clang, Clp, ClusterManagers, Clustering, Codecs, CoinMP, Color, Compose, ContinuedFractions, Cpp, Cubature, Curl, DICOM, DWARF, DataFrames, DataStructures, Datetime, Debug, DecisionTree, Devectorize, DictUtils, DictViews, DiscreteFactor, Distance, Distributions, DualNumbers, ELF, Elliptic, Example, ExpressionUtils, FITSIO, FactCheck, FastaIO, FastaRead, FileFind, FunctionalCollections, FunctionalUtils, GLFW, GLM, GLPK, GLUT, GSL, GZip, Gadfly, Gaston, GeoIP, GeometricMCMC, GetC, GoogleCharts, Graphs, Grid, Gtk, Gurobi, HDF5, HDFS, HTTP, HTTPClient, Hadamard, HttpCommon, HttpParser, HttpServer, HypothesisTests, ICU, ImageView, Images, ImmutableArrays, IniFile, Iterators, Ito, JSON, JudyDicts, JuliaWebRepl, KLDivergence, LIBSVM, Languages, LazySequences, LibCURL, LibExpat, LinProgGLPK, Loss, MAT, MATLAB, MCMC, MDCT, MLBase, MNIST, MarketTechnicals, MathProg, MathProgBase, Meddle, Memoize, Meshes, Metis, MixedModels, Monads, Mongo, Mongrel2, Morsel, Mustache, NHST, NIfTI, NLopt, Named, NetCDF, NumericExtensions, NumericFunctors, ODBC, ODE, OpenGL, OpenSSL, Optim, Options, PLX, PTools, PatternDispatch, Phylo, Phylogenetics, Polynomial, Profile, ProgressMeter, ProjectTemplate, PyCall, PyPlot, PySide, Quandl, QuickCheck, RDatasets, REPL, RNGTest, RPMmd, RandomMatrices, Readline, Regression, Resampling, Rif, Rmath, RobustStats, Roots, SDE, SDL, SVM, SemidefiniteProgramming, SimJulia, SimpleMCMC, Sims, Sodium, Soundex, Sqlite, Stats, StrPack, Sundials, SymPy, TOML, Terminals, TextAnalysis, TextWrap, TimeModels, TimeSeries, Tk, TopicModels, TradingInstrument, Trie, URLParse, UTF16, Units, ValueDispatch, WAV, WebSockets, Winston, YAML, ZMQ, Zlib

Julia does have graphics!

● Winston (Standard 2D graphics)

● Gadfly (Like 'ggplot2')

● Gaston (Uses gnuplot as graphics engine)

● PyPlot (Uses Python's matplotlib)

● Plotly (http://plot.ly/api)

Simulated Stock Market

    julia> plothist(randn(100000), 100)
    julia> plot(cumsum(randn(10000)))

What’s missing?

● Cached package loading
– At present all modules are compiled on the fly
– Preloading would reduce startup times
● Better database connectivity
– Uses ODBC
– Simple d/b support via SQLite
– No native Oracle, MySQL or PostgreSQL
● More comprehensive NoSQL support
– Packages for Mongo, Redis.
– JSON package helps with CouchDB, Neo4j

Familiar syntax for Matlab/Octave users

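    using LinearAlgebra, Statistics   # provide tr, std and mean in Julia 1.x

    function randmatstat(t; n=10)
        v = zeros(t)
        w = zeros(t)
        for i = 1:t
            a = randn(n,n)
            b = randn(n,n)
            c = randn(n,n)
            d = randn(n,n)
            P = [a b c d]             # n x 4n: blocks side by side
            Q = [a b; c d]            # 2n x 2n: blocks in a 2x2 grid
            v[i] = tr((P'*P)^4)       # tr was called trace in early Julia releases
            w[i] = tr((Q'*Q)^4)
        end
        std(v)/mean(v), std(w)/mean(w)
    end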

Simulating an Asian Option

    using Winston          # plotting package providing FramedPlot and Curve

    S0  = 100;             # Spot price
    K   = 102;             # Strike price
    r   = 0.05;            # Risk free rate
    q   = 0.0;             # Dividend yield
    v   = 0.2;             # Volatility
    tma = 0.25;            # Time to maturity
    T   = 100;             # Number of time steps
    dt  = tma/T;           # Time increment

    S = zeros(Float64,T); S[1] = S0;
    dW = randn(T)*sqrt(dt);
    for t = 2:T
        S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t])
    end
    x = collect(1.0:T);    # one x value per time step
    p = FramedPlot(title = "Random Walk, drift 5%, volatility 20%")
    add(p, Curve(x,S,color="red"))
    display(p)
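
The slide plots a single path; pricing the option needs many paths averaged together. The following is a minimal sketch of that step in current Julia syntax, reusing the update rule and parameter values above. It assumes an arithmetic-average call payoff discounted at the risk free rate, which the slides do not state, and the function names `average_price` and `asian_call`, the seed and the path count are choices made for this example.

```julia
using Random

# Simulate one path with the update rule from the slide and return the
# arithmetic average of the T prices along it.
function average_price(S0, r, q, v, tma, T)
    dt = tma / T
    S = S0
    total = S0
    for t in 2:T
        dW = randn() * sqrt(dt)
        S *= 1 + (r - q - 0.5 * v^2) * dt + v * dW + 0.5 * v^2 * dW^2
        total += S
    end
    return total / T
end

# Monte Carlo estimate: discounted mean of max(average - K, 0) over npaths paths.
# The call payoff is an assumption of this sketch.
function asian_call(S0, K, r, q, v, tma, T, npaths)
    payoff_sum = 0.0
    for _ in 1:npaths
        payoff_sum += max(average_price(S0, r, q, v, tma, T) - K, 0.0)
    end
    return exp(-r * tma) * payoff_sum / npaths
end

Random.seed!(1234)   # fixed seed so that repeated runs give the same estimate
price = asian_call(100.0, 102.0, 0.05, 0.0, 0.2, 0.25, 100, 100_000)
println(price)
```

Keeping the running price in a scalar rather than an array avoids allocating a vector per path, which matters once the path count is large.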

Random Walk on Julia Studio

Going further …

● Start with the julialang.org website
● Install Julia and read the documentation
● Look at the training material
– http://julialang.org/teaching/
● Try the Julia Studio
● Read/subscribe to Google-groups sites
– julia-users, julia-stats, julia-opt, julia-dev
● Join the LJuUG
– http://www.meetup.com/London-Julia-User-Group

My Benchmarks

Language      Timing (c = 1)   Asian Option
C             1.0              1.681
Julia         1.41             1.680
Python (v3)   32.67            1.671
R             154.3            1.646
Octave        789.3            1.632

Results for 100,000 runs of 100 steps, (c ~ 0.73 s)

Samsung RV711 laptop with an i5 processor and 4 GB RAM running CentOS 6.5 (Final)
