Machine Learning Intro Session

Identifying use cases – without Google

• Anything repetitive (classifying digits, gestures, road conditions, e-mail contents)
• Capturing best practices

Source: http://www.digital-photo-secrets.com/tip/2879/which-is-best-spot-center-weight-or-matrix-metering/

The Nikon D700 has a 1,005-pixel RGB (red, green, blue) sensor that measures the intensity of the light and the color of a scene. The camera then compares the information to information from 30,000 images stored in its database. The D700 determines the exposure settings based on the findings from the comparison. Simplified, it works like this: You're photographing a portrait outdoors, and the sensor detects that the light in the center of the frame is much dimmer than the edges. The camera takes this information along with the focus distance and compares it to the ones in the database. The images in the database with similar light and color patterns and subject distance tell the camera that this must be a close-up portrait with flesh tones in the center and sky in the background. From this information, the camera decides to expose primarily for the center of the frame although the background may be over or underexposed.

Source: http://my.safaribooksonline.com/book/photography/9780470413203/nikon-d700-essentials/metering_modes

• Note the effort on data collection.
• Need for synthetic data.

• Confusion Matrix

Precision is the fraction of retrieved instances that are relevant.
Recall is the fraction of relevant instances that are retrieved.
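The two definitions above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the spam/not-spam labels are made up for the example and do not come from the session's demos.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a two-class confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1          # retrieved and relevant
        elif guess == positive:
            fp += 1          # retrieved but not relevant
        elif truth == positive:
            fn += 1          # relevant but missed
        else:
            tn += 1          # correctly ignored
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = tp / retrieved, recall = tp / relevant."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here one spam message is missed and one normal message is flagged, so both precision and recall come out at 3 of 4.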

Confusion Matrix

Sample code

• Demo 1 – Simple fit() & predict()

• Demo 2 – With Cross Validation

• Demo 3 – Use a pickled Classifier
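The original demo code is not part of this text, so what follows is a stand-in sketch of the three ideas (fit and predict, cross validation, reloading a pickled classifier). It assumes a toy nearest-centroid classifier written from scratch; the class `NearestCentroid`, the helper `cross_val_accuracy`, and the data are all hypothetical and are not the classifier used in the session.

```python
import pickle
from collections import defaultdict


class NearestCentroid:
    """Toy classifier: predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # One centroid (column-wise mean) per class
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


def cross_val_accuracy(make_model, X, y, k=3):
    """k-fold cross validation: hold out every k-th sample in turn."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        X_test = [X[i] for i in sorted(test_idx)]
        y_test = [y[i] for i in sorted(test_idx)]
        preds = make_model().fit(X_train, y_train).predict(X_test)
        scores.append(sum(p == t for p, t in zip(preds, y_test)) / len(y_test))
    return scores


# Made-up, well-separated data with two classes
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]]
y = ["a", "a", "a", "b", "b", "b"]

# Demo 1 style: train once, then predict unseen points
clf = NearestCentroid().fit(X, y)
predictions = clf.predict([[1.0, 1.0], [5.0, 5.0]])

# Demo 2 style: estimate accuracy on held-out folds
scores = cross_val_accuracy(NearestCentroid, X, y, k=3)

# Demo 3 style: serialize the trained classifier and restore it later
restored = pickle.loads(pickle.dumps(clf))

print(predictions, scores, restored.predict([[4.9, 5.0]]))
```

In practice the pickled bytes would be written to a file with `pickle.dump` and loaded in a separate process, which avoids retraining every time a prediction is needed.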

PCA – Hum Dus, Humara Ek! ("We are ten, ours is one!")

• Why? Computationally efficient. A pre-processing step when the number of features is large.

• Up to 10x reduction in the number of features, with little loss of information.

• Demo 4 – Original # of features 1850. Features used 150
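A minimal sketch of PCA as a pre-processing step, assuming NumPy is available. It does not reproduce Demo 4 (the 1850-feature data set is not included here); instead it uses made-up data with 50 correlated features driven by 5 hidden factors, and the helper names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return feature means, top principal directions, and variance kept."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = singular_values ** 2
    kept = explained[:n_components].sum() / explained.sum()
    return mean, vt[:n_components], kept


def pca_transform(X, mean, components):
    """Project centered samples onto the principal directions."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
# Made-up data: 200 samples, 50 features that are mixtures of 5 hidden factors
hidden = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = hidden @ mixing + 0.01 * rng.normal(size=(200, 50))

mean, components, kept = pca_fit(X, n_components=5)
X_small = pca_transform(X, mean, components)
print(X.shape, "->", X_small.shape, f"fraction of variance kept: {kept:.4f}")
```

Because the 50 features here are almost exact mixtures of 5 factors, 5 components retain nearly all of the variance. Real data is rarely this clean, so the fraction kept is worth checking before choosing the number of components.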

Clustering

Recommendation Engines

• Where the money is – 75% sales

• Don’t make money on hardware – Amazon

• User based – based on user similarity – Collaborative Filtering

• Item based – “Users who bought X also bought Y”

• Demo 5
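The item-based idea ("Users who bought X also bought Y") can be sketched as a simple co-occurrence count over purchase baskets. This is an illustrative sketch only, not the code behind Demo 5; the function names and the baskets are made up.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_also_bought(baskets):
    """For each item, count how often every other item shares a basket with it."""
    also = defaultdict(Counter)
    for basket in baskets:
        # set() so a repeated item in one basket is only counted once
        for x, y in permutations(set(basket), 2):
            also[x][y] += 1
    return also


def recommend(also, item, n=3):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in also[item].most_common(n)]


# Made-up purchase history, one list per customer order
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "sd_card", "bag"],
    ["tripod", "bag"],
]

also = build_also_bought(baskets)
print("Bought with camera:", recommend(also, "camera", n=1))
```

Raw counts favor popular items; a production system would usually normalize them (for example with cosine similarity between item purchase vectors), which this sketch leaves out.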

Anomaly Detection

So anomaly detection doesn't know what anomalies look like, but knows what they don't look like!

Very small number of positive examples

Error Analysis: Example - data center monitoring. Features

x1 = memory use
x2 = number of disk accesses/sec
x3 = CPU load
x4 = network traffic

We suspect CPU load and network traffic grow linearly with one another.
If a server is serving many users, CPU load is high and network traffic is high.
The failure case is an infinite loop, so CPU load grows but network traffic stays low.

New feature: x5 = CPU load / network traffic
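A small sketch of why this ratio helps, using made-up readings. The function name and the `eps` guard against division by zero are assumptions added for the example.

```python
def cpu_network_ratio(cpu_load, network_traffic, eps=1e-9):
    """Engineered feature: CPU load divided by network traffic."""
    # eps keeps the division defined when traffic is exactly zero
    return cpu_load / (network_traffic + eps)


# Made-up readings in arbitrary units
busy_ratio = cpu_network_ratio(75, 76)    # many users: both values high
stuck_ratio = cpu_network_ratio(75, 15)   # infinite loop: CPU high, traffic low

print(f"busy server ratio:  {busy_ratio:.2f}")
print(f"stuck server ratio: {stuck_ratio:.2f}")
```

The busy server's ratio stays close to 1, while the stuck server's ratio is several times larger, so a single threshold on the new feature separates the two cases even though the CPU load alone is identical.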

The multivariate Gaussian algorithm is aware of “covariance”, so it can capture the correlation between features such as CPU load and network traffic.
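To illustrate what covariance awareness buys, here is a minimal two-feature sketch in plain Python. It fits a 2-D Gaussian to made-up (CPU load, network traffic) readings from healthy servers and scores two new servers. The data, the function names, and the restriction to two features (which allows a closed-form 2x2 inverse) are all assumptions for the example.

```python
import math


def fit_gaussian_2d(points):
    """Estimate the mean vector and 2x2 covariance of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n  # covariance term
    return (mx, my), (sxx, sxy, syy)


def density(point, mean, cov):
    """Multivariate Gaussian density, specialized to two features."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    m = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * m) / (2 * math.pi * math.sqrt(det))


# Made-up healthy servers: CPU load and network traffic rise together
normal = [(10, 15), (20, 14), (30, 36), (40, 35),
          (50, 55), (60, 52), (70, 76), (80, 73)]
mean, cov = fit_gaussian_2d(normal)

busy_server = (75, 76)    # high CPU, high network: busy but healthy
stuck_server = (75, 15)   # high CPU, low network: possible infinite loop

p_busy = density(busy_server, mean, cov)
p_stuck = density(stuck_server, mean, cov)
print(f"p(busy)  = {p_busy:.3e}")
print(f"p(stuck) = {p_stuck:.3e}")
```

Both readings of the stuck server (75 and 15) fall inside the range seen on healthy servers, so checking each feature separately would not flag it. The joint density is tiny only because the covariance term encodes that the two features normally move together. An anomaly would be flagged when the density falls below a threshold chosen on labeled examples.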