Recitation for BigData Jay Gu Jan 10 HW1 preview and Java Review


Recitation for BigData

Jay Gu, Jan 10

HW1 preview and Java Review

Outline

• HW1 preview
• Review of Java basics
• An example of gradient descent for linear regression in Java

HW1 Preview

On a dataset of size ~1 million.

• Warm up exercise

• Stochastic Gradient Descent for Logistic Regression

• SGD with Hashing Kernel

• Extra credit: Personalized Logistic Regression

Starter Code

– Class for parsing the input file and iterating over the dataset.

Dataset dataset = new Dataset(your_path, is_training, size);
while (dataset.hasNext()) {
    DataInstance d = dataset.next();
    … some action on d …
}

Starter Code

public class DataInstance {
    int clicks;       // number of clicks, -1 if it is testing data.
    int impressions;  // number of impressions, -1 if it is testing data.

    // Feature of the session
    int depth;        // depth of the session.
    int[] query;      // List of token ids in the query field

    // Feature of the ad
    ….

    // Feature of the user
    ….
}

Starter Code

import java.util.Map;

public class Weights {
    double w0;
    /*
     * query.get(123) will return the weight for the feature:
     * "token 123 in the query field".
     */
    Map<Integer, Double> query;
    Map<Integer, Double> title;
    Map<Integer, Double> keyword;
    Map<Integer, Double> description;
    double wPosition;
    double wDepth;
    double wAge;
    double wGender;
}
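
As a rough illustration of how such sparse weight maps might be used, the sketch below scores one instance by summing only the weights of the tokens it contains. The class `SparseScoreSketch` and the helpers `score` and `sigmoid` are hypothetical names, not part of the starter code, and only the query field is shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of the starter code.
final class SparseScoreSketch {
    // Intercept plus the weights of the tokens present in the query field.
    static double score(double w0, Map<Integer, Double> queryWeights, int[] queryTokens) {
        double s = w0;
        for (int token : queryTokens) {
            // A token with no stored weight has an implicit weight of 0.
            s += queryWeights.getOrDefault(token, 0.0);
        }
        return s;
    }

    // Logistic function, mapping a score to a probability.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        Map<Integer, Double> query = new HashMap<>();
        query.put(123, 0.5);
        query.put(7, -0.25);
        int[] tokens = {123, 7, 999}; // token 999 has no stored weight
        double z = score(0.1, query, tokens);
        System.out.println("score = " + z + ", p = " + sigmoid(z));
    }
}
```

The cost of scoring is proportional to the number of tokens in the instance, not to the size of the feature space.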

BigData is often sparse

Be as lazy as you can …

Update only when necessary…

Avoid O(d): Sparse and lazy update

• Although the feature space d is huge, each data point only has a few tokens.
  – Only update what is changed.

• But even so, regularization should be applied to all d weights at each step.
  – Delay and batch the regularization.
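
One possible way to delay and batch the regularization is sketched below, assuming L2 regularization with a constant step size: each weight remembers the step at which it was last brought up to date, and the skipped shrinkage factors are applied in one go when the feature is next touched. The class `LazyL2Sketch` and its members are hypothetical names, and the loss gradients are supplied by the caller rather than computed here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lazy L2 regularization; assumes a constant step size.
final class LazyL2Sketch {
    final Map<Integer, Double> w = new HashMap<>();         // sparse weights
    final Map<Integer, Integer> lastStep = new HashMap<>(); // step at which each weight was last up to date
    final double eta;    // step size
    final double lambda; // L2 regularization strength
    int t = 0;           // number of SGD steps taken so far

    LazyL2Sketch(double eta, double lambda) {
        this.eta = eta;
        this.lambda = lambda;
    }

    // Apply all the shrinkage that feature j skipped, up to and including step upTo.
    private double catchUp(int j, int upTo) {
        int skipped = upTo - lastStep.getOrDefault(j, 0);
        double wj = w.getOrDefault(j, 0.0) * Math.pow(1.0 - eta * lambda, skipped);
        w.put(j, wj);
        lastStep.put(j, upTo);
        return wj;
    }

    // One SGD step touching only the features in tokens;
    // grad[i] is the loss gradient for tokens[i], computed by the caller.
    void step(int[] tokens, double[] grad) {
        t++;
        for (int i = 0; i < tokens.length; i++) {
            double wj = catchUp(tokens[i], t);
            w.put(tokens[i], wj - eta * grad[i]);
        }
    }

    // Current weight of feature j, with any pending shrinkage applied.
    double weight(int j) {
        return w.containsKey(j) ? catchUp(j, t) : 0.0;
    }

    public static void main(String[] args) {
        LazyL2Sketch model = new LazyL2Sketch(0.1, 0.5);
        model.step(new int[] {0}, new double[] {1.0});
        model.step(new int[] {1}, new double[] {2.0});
        model.step(new int[] {0, 2}, new double[] {-1.0, 0.5});
        System.out.println(model.weight(0) + " " + model.weight(1) + " " + model.weight(2));
    }
}
```

Each step costs time proportional to the number of tokens in the instance, while the weights read through `weight` match what shrinking every weight at every step would give.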

Java Review

Not required but good to know: Interface, Inheritance, Access Modifier, I/O, …

• Language: Class, Object, variable, method
• Data Structure: Java Collections
  – Array
  – List: ArrayList
  – Map: HashMap

Class

public class DataInstance {
    // Members or fields
    // Feature of the session
    int[] query ….
    // Feature of the ad
    int[] title …

    // Constructor
    DataInstance(String line, … ) {
        // parse the line, and set the field
    }

    // Method
    public void print() {
        System.out.println("title: ");
        for (int token : title)
            System.out.print(token + "\t");
    }
}

Object

• DataInstance data = new DataInstance(line, … );

• int clicks = data.clicks;

• data.print();

Collections

• Array: fixed length, most compact
  – int[] tokens
  – double[] weights

• ArrayList: grows dynamically (capacity is enlarged geometrically when full; about 1.5x in OpenJDK rather than doubling)
  – ArrayList<DataInstance>

• HashMap: expected constant-time key-value lookup; grows dynamically, uses more memory
  – HashMap<K, V>
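
A small sketch of the three structures in use is given below. The class `CollectionsSketch` and the helper `countTokens` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting an array, an ArrayList and a HashMap.
final class CollectionsSketch {
    // Count how often each token id occurs, using a HashMap as a sparse counter.
    static Map<Integer, Integer> countTokens(int[] tokens) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int token : tokens) {
            counts.merge(token, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] tokens = {5, 9, 5};          // array: length is fixed at creation
        double[] weights = new double[3];  // primitive entries start at 0.0

        List<int[]> batch = new ArrayList<>(); // grows as elements are added
        batch.add(tokens);

        Map<Integer, Integer> counts = countTokens(tokens);
        System.out.println(tokens.length + " " + weights[0] + " "
                + batch.size() + " " + counts.get(5));
    }
}
```

Note that collections hold boxed types such as `Integer` and `Double` rather than primitives, which is part of why a HashMap uses more memory than an array.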

Variables

• “Everything” in Java is an Object
  – Except for primitive types: int, double

• All object variables are references (pointers) to the Object

• Methods pass arguments by value; for an object variable, the value passed is the reference
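
The points above can be checked with a short sketch. The class and method names (`PassByValueSketch`, `reassign`, `mutate`, `increment`) are made up for this illustration.

```java
// Hypothetical sketch of pass-by-value semantics for primitives and references.
final class PassByValueSketch {
    static void reassign(int[] arr) {
        arr = new int[] {99}; // rebinds only the method's local copy of the reference
    }

    static void mutate(int[] arr) {
        arr[0] = 99; // changes the array that the caller's reference also points to
    }

    static void increment(int x) {
        x++; // changes only the local copy of the primitive
    }

    public static void main(String[] args) {
        int[] a = {1};
        reassign(a);   // a[0] is still 1
        mutate(a);     // a[0] is now 99
        int n = 1;
        increment(n);  // n is still 1
        System.out.println(a[0] + " " + n);
    }
}
```

A method can therefore modify an object it was handed, such as a weight map, but it cannot make the caller's variable point to a different object.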

Example: SGD for linear regression

• Demo
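
The demo itself is not reproduced in these slides. As a stand-in, here is a minimal sketch of SGD for linear regression with squared loss, assuming dense features, a constant step size and no regularization. The class `SgdLinearRegressionSketch`, the method `fit` and the toy data are all hypothetical.

```java
import java.util.Random;

// Hypothetical sketch, not the demo from the recitation.
final class SgdLinearRegressionSketch {
    // Fit y ~ w.x + b by SGD on the loss 0.5 * (prediction - y)^2.
    // Returns an array of length d + 1: the d weights followed by the bias.
    static double[] fit(double[][] xs, double[] ys, double eta, int epochs, long seed) {
        int d = xs[0].length;
        double[] w = new double[d + 1];
        Random rng = new Random(seed);
        for (int e = 0; e < epochs; e++) {
            for (int k = 0; k < xs.length; k++) {
                int i = rng.nextInt(xs.length); // pick one example uniformly at random
                double pred = w[d];
                for (int j = 0; j < d; j++) pred += w[j] * xs[i][j];
                double err = pred - ys[i];      // derivative of the loss w.r.t. the prediction
                for (int j = 0; j < d; j++) w[j] -= eta * err * xs[i][j];
                w[d] -= eta * err;              // the bias behaves like a feature fixed at 1
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 2x + 1 without noise.
        double[][] xs = new double[11][1];
        double[] ys = new double[11];
        for (int i = 0; i < 11; i++) {
            xs[i][0] = i / 10.0;
            ys[i] = 2.0 * xs[i][0] + 1.0;
        }
        double[] w = fit(xs, ys, 0.1, 500, 42L);
        System.out.println("slope = " + w[0] + ", intercept = " + w[1]);
    }
}
```

On this noise-free toy data the estimates should approach a slope of 2 and an intercept of 1. The logistic regression in HW1 follows the same loop structure with a different prediction and sparse features.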