
Machine Learning: F# and Accord.NET

Alena Hall

• Software architect, MS in Computer Science

• Member of F# Software Foundation Board of Trustees

• Researcher of mathematical and theoretical abstractions applicable to modern programming concepts

• Speaker and active software engineering community member

@lenadroid

Machine Learning

• Why machine learning?

• What is the data?

• How?

Questions

Data Questions.

Data reality :\

Path to grasping machine learning and data science…

Contents

• Multiple Linear Regression

• Logistic Regression Classification

• K-Means Clustering

• What’s next?

F# for machine learning and data science!

Why F#?

1. Exploratory programming, interactive environment

2. Functional programming, referential transparency

3. Data pipelines

4. Algebraic data types and pattern matching

5. Strong typing, type inference, Type Providers

6. Units of measure

7. Concurrent, distributed and cloud programming

Data pipelines
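The slide's code is not part of this extract. A minimal sketch of an F# data pipeline, where `|>` feeds each intermediate result into the next function; the rental counts are made up purely for illustration:

```fsharp
// Hypothetical daily bike-rental counts, used only for illustration.
let rentals = [ 120; 85; 240; 310; 95; 180 ]

// Each |> passes the result of the previous step into the next function.
let busyDaysAverage =
    rentals
    |> List.filter (fun count -> count > 100)   // keep days with more than 100 rentals
    |> List.map float                            // convert to float so we can average
    |> List.average                              // mean of the remaining days

printfn "Average rentals on busy days: %.1f" busyDaysAverage
```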

Algebraic data types

// Discriminated Union
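The original discriminated-union example is missing. A small illustrative one follows; the `Feature` type and its cases are hypothetical, chosen to resemble the kind of input a model might receive:

```fsharp
// A discriminated union: every value is exactly one of the named cases,
// and each case can carry its own data (or none).
type Feature =
    | Numeric of float        // e.g. a temperature reading
    | Categorical of string   // e.g. a season name
    | Missing                 // no value recorded

let observations = [ Numeric 21.5; Categorical "summer"; Missing; Numeric 18.0 ]

// Union values get structural equality automatically.
let missingCount = observations |> List.filter (fun f -> f = Missing) |> List.length
printfn "Missing values: %d" missingCount
```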

Pattern matching
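Pattern matching pairs naturally with discriminated unions, and the compiler warns when a case is not handled. A sketch using a hypothetical `Feature` type (defined here so the snippet stands alone); the numeric encoding of categories is an arbitrary choice for illustration:

```fsharp
type Feature =
    | Numeric of float
    | Categorical of string
    | Missing

// Turn any feature into a number; every union case must be covered.
let toNumber feature =
    match feature with
    | Numeric value -> value
    | Categorical "summer" -> 1.0   // literal pattern on the case's payload
    | Categorical _ -> 0.0          // any other category
    | Missing -> 0.0                // hypothetical default for missing values

let encoded =
    [ Numeric 21.5; Categorical "summer"; Categorical "winter"; Missing ]
    |> List.map toNumber
// encoded = [21.5; 1.0; 0.0; 0.0]
```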

Type Providers

Units of measure
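Units of measure let the compiler check dimensions, and they are erased after compilation, so they cost nothing at run time. A minimal sketch with hypothetical `km` and `h` units:

```fsharp
[<Measure>] type km   // kilometres
[<Measure>] type h    // hours

let distance = 12.0<km>
let duration = 0.5<h>

// The compiler infers the unit of the result: float<km/h>.
let speed = distance / duration

// let invalid = distance + duration   // compile error: km and h do not match

// Dividing by 1.0<km/h> gives a plain float for printing.
printfn "Average speed: %.1f km/h" (speed / 1.0<km/h>)
```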

Linear Regression

How to predict?

1. Make a guess.

2. Measure how wrong the guess is.

3. Fix the error.

Make a guess!

MATH

Make a guess? What does it mean?...

Hypothesis / guess:

weights
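The formula on this slide did not survive extraction. In the notation commonly used for linear regression (an assumption here: θ for the weights, n features, and a constant x_0 = 1 so that θ_0 acts as the intercept), the hypothesis is a weighted sum of the inputs:

```latex
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
            = \sum_{j=0}^{n} \theta_j x_j = \theta^{T} x, \qquad x_0 = 1
```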

Find out our mistake…

Cost function / mistake function:

… and minimize it:
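A minimal plain-F# sketch of the mistake function, assuming the usual squared-error cost J(θ) = 1/(2m) · Σ (h(x) − y)² over m training examples (the slide's exact formula is not in this extract). The names `hypothesis` and `cost` are illustrative and are not Accord.NET APIs:

```fsharp
// Linear hypothesis: theta.[0] is the intercept; x holds the n feature values.
let hypothesis (theta: float[]) (x: float[]) =
    theta.[0] + Array.fold2 (fun acc t xi -> acc + t * xi) 0.0 theta.[1..] x

// Squared-error cost: J(theta) = 1/(2m) * sum over examples of (h(x) - y)^2.
let cost (theta: float[]) (xs: float[][]) (ys: float[]) =
    let m = float ys.Length
    let squaredErrors = Array.map2 (fun x y -> pown (hypothesis theta x - y) 2) xs ys
    Array.sum squaredErrors / (2.0 * m)

// Illustrative check: h(x) = 1 + 2x fits the points (1, 3) and (2, 5) exactly.
let theta = [| 1.0; 2.0 |]
let xs = [| [| 1.0 |]; [| 2.0 |] |]
let perfect = cost theta xs [| 3.0; 5.0 |]    // 0.0
let offByOne = cost theta xs [| 4.0; 5.0 |]   // (3 - 4)^2 / (2 * 2) = 0.25
```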

Mistake function looks like…

Global minimums

How to reduce the mistake?

Update each slope parameter until the mistake function reaches its minimum:

Simultaneously

Alpha: learning rate

Derivative: direction of movement
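The slide's update formula is missing. Assuming the squared-error cost below, with m training examples, a linear hypothesis h_θ, and x_0 = 1, the standard gradient-descent step looks like this. "Simultaneously" means every θ_j is computed from the current θ before any of them is overwritten:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
          = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)},
          \qquad j = 0, 1, \dots, n
```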

Fix the error

Multiple Linear Regression

X [ ] – Predictors: statistical data about bike rentals for previous years or months.

Y – Output: the number of bike rentals we should expect today or on some other day in the future.

* Y is not nominal; here it comes from a continuous numerical range.

Make a guess!

Fix the error

Multiple linear regression: Bike rentals demand

“Talk is cheap. Show me the code.”
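The demo code is not part of this extract. As a stand-in, here is a plain-F# sketch of multiple linear regression trained with batch gradient descent. It does not use Accord.NET, and instead of the bike-rental data it uses a small synthetic dataset generated from y = 1 + 2·x1 + 3·x2, so the learned weights can be checked against known values. The learning rate 0.1 and the 5000 iterations are arbitrary choices for this toy data:

```fsharp
// Prediction: theta.[0] is the intercept, theta.[1..] are the feature weights.
let predict (theta: float[]) (x: float[]) =
    theta.[0] + Array.fold2 (fun acc t xi -> acc + t * xi) 0.0 theta.[1..] x

// One batch gradient-descent step for the squared-error cost.
// Every new parameter is computed from the old theta (a simultaneous update).
let step (alpha: float) (xs: float[][]) (ys: float[]) (theta: float[]) =
    let m = float ys.Length
    let errors = Array.map2 (fun x y -> predict theta x - y) xs ys
    theta
    |> Array.mapi (fun j t ->
        // x_0 = 1 for the intercept, otherwise feature j - 1
        let feature (x: float[]) = if j = 0 then 1.0 else x.[j - 1]
        let gradient = Array.fold2 (fun acc e x -> acc + e * feature x) 0.0 errors xs / m
        t - alpha * gradient)

// Run a fixed number of iterations starting from all-zero weights.
let train (alpha: float) (iterations: int) (xs: float[][]) (ys: float[]) =
    let initial : float[] = Array.zeroCreate (xs.[0].Length + 1)
    (initial, [ 1 .. iterations ]) ||> List.fold (fun theta _ -> step alpha xs ys theta)

// Synthetic data from y = 1 + 2*x1 + 3*x2 (illustrative only).
let xs = [| [| 0.0; 0.0 |]; [| 1.0; 0.0 |]; [| 0.0; 1.0 |]; [| 1.0; 1.0 |]; [| 2.0; 1.0 |] |]
let ys = xs |> Array.map (fun x -> 1.0 + 2.0 * x.[0] + 3.0 * x.[1])

let theta = train 0.1 5000 xs ys
printfn "Learned weights: %A" theta   // should approach [|1.0; 2.0; 3.0|]
```

A prediction for a new day would then be `predict theta [| x1; x2 |]`, with one predictor value per feature.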

What to remember?

1. Simplest regression algorithm

2. Very fast: each prediction is a single weighted sum of the features

3. Good at numerical data with lots of features

4. Output from a continuous numerical range

5. Linear hypothesis

6. Uses gradient descent

Linear Regression

Logistic Regression

Hypothesis function

Estimated probability that Y = 1 on input X
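The hypothesis formula is missing from the extract. The standard logistic-regression hypothesis (assumed here, with the same θ and x notation as linear regression) passes the linear combination through the logistic (sigmoid) function g, which maps any real number into the interval (0, 1):

```latex
h_\theta(x) = g(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}} = P(y = 1 \mid x; \theta)
```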

Mistake function

The mistake function gives the cost for a single training example

h(x)

Full mistake function

1. Uses the principle of maximum likelihood estimation.

2. We minimize it the same way as with linear regression
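The formulas for these slides are missing. The standard log-loss (assumed here) matches the description: the per-example cost grows without bound as a confident prediction turns out wrong, and the full cost is its average over m examples, which is the negative log-likelihood. Its gradient has the same form as for linear regression, only with the sigmoid hypothesis, which is why the same gradient-descent update applies:

```latex
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
  -\log\left(h_\theta(x)\right) & \text{if } y = 1 \\
  -\log\left(1 - h_\theta(x)\right) & \text{if } y = 0
\end{cases}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```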

“Talk is cheap. Show me the code.”

Logistic Regression Classification Example
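The classification example's code is not in the extract. Here is a plain-F# sketch (no Accord.NET calls) of binary logistic regression trained with gradient descent on a tiny, linearly separable toy dataset. The data, the learning rate, the iteration count, and the 0.5 decision threshold are all illustrative assumptions:

```fsharp
// Logistic (sigmoid) function: maps any real number into (0, 1).
let sigmoid z = 1.0 / (1.0 + exp (-z))

// h(x) = sigmoid(theta0 + theta1*x1 + ...): estimated probability that y = 1.
let probability (theta: float[]) (x: float[]) =
    sigmoid (theta.[0] + Array.fold2 (fun acc t xi -> acc + t * xi) 0.0 theta.[1..] x)

// Gradient descent on the log-loss; the update has the same form as for linear regression.
let train (alpha: float) (iterations: int) (xs: float[][]) (ys: float[]) =
    let m = float ys.Length
    let step (theta: float[]) =
        let errors = Array.map2 (fun x y -> probability theta x - y) xs ys
        theta
        |> Array.mapi (fun j t ->
            // x_0 = 1 for the intercept, otherwise feature j - 1
            let feature (x: float[]) = if j = 0 then 1.0 else x.[j - 1]
            t - alpha * (Array.fold2 (fun acc e x -> acc + e * feature x) 0.0 errors xs / m))
    let initial : float[] = Array.zeroCreate (xs.[0].Length + 1)
    (initial, [ 1 .. iterations ]) ||> List.fold (fun theta _ -> step theta)

// Predict class 1 when the estimated probability is at least 0.5.
let classify theta x = if probability theta x >= 0.5 then 1 else 0

// Toy data (illustrative): label 1 when the single feature is large.
let xs = [| [| 0.5 |]; [| 1.0 |]; [| 1.5 |]; [| 3.5 |]; [| 4.0 |]; [| 4.5 |] |]
let ys = [| 0.0; 0.0; 0.0; 1.0; 1.0; 1.0 |]

let theta = train 0.5 2000 xs ys
let predictions = xs |> Array.map (classify theta)
printfn "Predictions: %A" predictions
```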

What to remember?

• Classification algorithm

• Relatively small number of predictors

• Uses the logistic (sigmoid) function as the hypothesis

• Has a convex cost function

• Uses gradient descent for correcting the mistake

Logistic Regression

At this point…

Machine Learning

What society thinks I do…

What other programmers think I do…

What I really do is…

K-Means

Clustering
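Only the K-Means title survives here. A minimal plain-F# sketch of the standard K-Means procedure (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop changing. Initializing the centroids with chosen data points and the toy 2-D data are assumptions for illustration:

```fsharp
// Squared Euclidean distance between two points.
let distanceSq (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b

// Index of the centroid closest to point p.
let nearest (centroids: float[][]) (p: float[]) =
    centroids |> Array.mapi (fun i c -> (i, distanceSq p c)) |> Array.minBy snd |> fst

// Coordinate-wise mean of a non-empty group of points.
let mean (points: float[][]) =
    let dims = points.[0].Length
    Array.init dims (fun d -> points |> Array.averageBy (fun p -> p.[d]))

// Assign, update, and repeat until the centroids no longer change.
let rec kMeans (points: float[][]) (centroids: float[][]) =
    let assignments = points |> Array.map (nearest centroids)
    let updated =
        centroids
        |> Array.mapi (fun i c ->
            let members =
                Array.zip points assignments
                |> Array.filter (fun (_, a) -> a = i)
                |> Array.map fst
            // Keep the old centroid if no point was assigned to it.
            if members.Length = 0 then c else mean members)
    if updated = centroids then (centroids, assignments)
    else kMeans points updated

// Toy data: two visibly separate groups of 2-D points.
let points =
    [| [| 1.0; 1.0 |]; [| 1.5; 2.0 |]; [| 1.0; 1.5 |]
       [| 8.0; 8.0 |]; [| 9.0; 8.5 |]; [| 8.5; 9.0 |] |]

let centroids, assignments = kMeans points [| points.[0]; points.[3] |]
printfn "Assignments: %A" assignments
printfn "Centroids: %A" centroids
```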

What’s next?

I’m Lena, @lenadroid

Thank you!

What if it doesn’t work?

• Try more data

• Try more features

• Try fewer features

• Try feature combinations

• Try polynomial features

• …

Algorithm debugging tips

What else can go wrong?

Ideally... the hypothesis will just fit the data

Underfitting … Overfitting

• Regularization…?

• Too big a regularization parameter? -> underfitting: the line is over-smoothed

• Too small a regularization parameter? -> overfitting: the model is too optimized for the training data

Try out different values for the regularization parameter.
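The slides do not show the regularized cost. Assuming the common L2 (ridge) penalty added to the squared-error cost, with the intercept θ_0 left unpenalized, the regularization parameter λ (called `regParam` below, a hypothetical name) scales how strongly large weights are punished: a large value pushes the weights toward zero (underfitting), and a small one lets the model chase the training data (overfitting). A plain-F# sketch that evaluates the cost for a few candidate values:

```fsharp
// Linear prediction: theta.[0] is the intercept, theta.[1..] are the weights.
let predict (theta: float[]) (x: float[]) =
    theta.[0] + Array.fold2 (fun acc t xi -> acc + t * xi) 0.0 theta.[1..] x

// L2-regularized squared-error cost:
// J = 1/(2m) * (sum of (h(x) - y)^2 + regParam * sum of theta_j^2 for j >= 1).
let regularizedCost (regParam: float) (theta: float[]) (xs: float[][]) (ys: float[]) =
    let m = float ys.Length
    let squaredErrors = Array.map2 (fun x y -> pown (predict theta x - y) 2) xs ys |> Array.sum
    let penalty = theta.[1..] |> Array.sumBy (fun t -> t * t)   // intercept not penalized
    (squaredErrors + regParam * penalty) / (2.0 * m)

// Illustrative data that h(x) = 1 + 2x fits exactly, so only the penalty remains.
let theta = [| 1.0; 2.0 |]
let xs = [| [| 1.0 |]; [| 2.0 |] |]
let ys = [| 3.0; 5.0 |]

let costs = [ 0.0; 1.0; 10.0 ] |> List.map (fun regParam -> regularizedCost regParam theta xs ys)
// costs = [0.0; 1.0; 10.0], i.e. regParam * 2^2 / (2 * 2)
```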
