uts.edu.au
UTS CRICOS PROVIDER CODE: 00099F
VARIOUS FLAVOURS OF BIG DATA IN COMPUTER VISION, BANKING AND TEACHING
Massimo Piccardi
BIG DATA: SAMPLE SIZE
Data can be big because they are many
e.g., AT&T’s trillion phone records
BIG DATA: DIMENSIONALITY
But data can also be big because each sample contains many values (dimensions)
e.g., a spam classification dataset with 16 trillion features [Weinberger 2009]
Large dimensionality → large model, overfitting risk
[Figure: dimensions x1, x2, x3, …, xi, …, xj, …, xD, with second-order terms E[xi xj]]
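To make the link between dimensionality and model size concrete, here is a minimal sketch that counts how many distinct second-order terms E[xi xj] exist for D dimensions. It assumes, purely for illustration, a model that keeps one parameter per pair (i, j) with i ≤ j; the function name `second_order_terms` is hypothetical and not from the talk.

```python
def second_order_terms(d: int) -> int:
    """Number of distinct pairs (i, j) with i <= j among d dimensions."""
    return d * (d + 1) // 2


# Illustrative sizes only; the count grows quadratically with D.
for d in (10, 1_000, 1_000_000):
    print(f"D = {d:>9,}: {second_order_terms(d):,} second-order terms")
```

With far fewer samples than parameters, such a model can fit noise, which is the overfitting risk noted on the slide.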
BIG DATA: STRUCTURE
Or data can be big because their values are structured
Structure = factor graph (or others)
COMPUTER VISION: RECOGNISING ACTIONS
An interesting problem: recognising actions from still frames
superpixels: small, homogeneous regions in the image
recognition as relationships!
OUR APPROACH:
latent objects: “sky”, “road”, “desktop”, “coffee mug”…
OUR APPROACH
• Such a complex graph can be solved as a linear model!
• The graph is “flattened” into a one-dimensional array and scored linearly, as the inner product of a weight vector w with that array
• With a relaxed SVM solver, we have obtained an average precision of 72% on the Stanford-40 benchmark, a jump of 17 percentage points over the state of the art
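The idea of flattening a graph and scoring it linearly can be sketched as follows. This is a simplified illustration, not the model from the talk: it assumes the latent labels are already fixed, and the feature values, the weight vector `w`, and the helper names `flatten_graph` and `linear_score` are all hypothetical.

```python
from typing import Dict, List, Tuple


def flatten_graph(node_feats: Dict[str, List[float]],
                  edge_feats: Dict[Tuple[str, str], List[float]]) -> List[float]:
    """Concatenate node and edge features in a fixed (sorted) order."""
    flat: List[float] = []
    for name in sorted(node_feats):
        flat.extend(node_feats[name])
    for pair in sorted(edge_feats):
        flat.extend(edge_feats[pair])
    return flat


def linear_score(w: List[float], flat: List[float]) -> float:
    """Inner product of the weight vector with the flattened array."""
    if len(w) != len(flat):
        raise ValueError("weight vector and flattened array differ in length")
    return sum(wi * xi for wi, xi in zip(w, flat))


# Toy graph: two nodes and one relationship between them (made-up numbers).
nodes = {"person": [1.0, 0.0], "mug": [0.5, 2.0]}
edges = {("person", "mug"): [1.0]}

flat = flatten_graph(nodes, edges)
w = [0.2, -0.1, 0.4, 0.3, 1.5]  # hypothetical learned weights
score = linear_score(w, flat)
print(flat, score)
```

Keeping a fixed ordering when concatenating matters: each position in the flattened array must always correspond to the same weight.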
COMPUTING THE SOLUTION
• Despite using a powerful computer cluster, training this model over 5,000 images takes over a month
• Where from here?
parallel solvers
AMPLAB, MLBASE, APACHE SPARK…
• MLbase: approximate, efficient SVM solution [Kraska 2013]
• 100x faster than Hadoop
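As a rough illustration of what a parallel solver does, the sketch below takes data-parallel subgradient steps on a linear SVM objective: each shard computes its hinge-loss subgradient independently, and the partial results are combined. It is not the MLbase or Spark algorithm; the toy data, the parameters `lam` and `lr`, and the function names are assumptions, and Python threads merely stand in for cluster workers.

```python
from concurrent.futures import ThreadPoolExecutor


def hinge_subgradient(w, shard):
    """Sum of hinge-loss subgradients over one shard of (x, y) pairs, y in {-1, +1}."""
    g = [0.0] * len(w)
    for x, y in shard:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # only margin violations contribute
            for k, xk in enumerate(x):
                g[k] -= y * xk
    return g


def parallel_svm_step(w, shards, lam=0.01, lr=0.1):
    """One subgradient step; per-shard subgradients are computed concurrently."""
    n = sum(len(s) for s in shards)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: hinge_subgradient(w, s), shards))
    # Combine: L2 regularisation term plus the averaged hinge subgradient.
    grad = [lam * wi + sum(p[k] for p in partials) / n for k, wi in enumerate(w)]
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Toy, linearly separable data split into two shards (made-up numbers).
data = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
shards = [data[:2], data[2:]]

w = [0.0, 0.0]
for _ in range(50):
    w = parallel_svm_step(w, shards)
print(w)
```

The per-shard work is independent, so the same structure maps onto separate machines, with only the small gradient vectors exchanged at each step.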
TEACHING MACHINE LEARNING
• Started in 2004 with informal lecture series for doctoral students
• Flipped class last year (with an MS Silverlight player, interactive & mobile-friendly)
• Teaching ML to industry audiences:
not sated by the bird’s eye view, very keen on the technical detail and how it actually all works
DATA SCIENCE: COMMONWEALTH BANK
FOLLOW UP?
Prof. Massimo Piccardi
Global Big Data Technology Centre
University of Technology, Sydney
http://www.bdt.uts.edu.au/