33
Initiatives to build ML communities around big data generators Dr Jason Rigby 24 th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore

Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Initiatives to build ML communities around big data generators

Dr Jason Rigby

24th Oct 2017

Untapped Data

NVIDIA AI Conference, Singapore

Page 2: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

INSIGHT

Lens

CAPTURE

Light Source Samples

ANALYSIS

Filters

Visualisation

techniques and

infrastructure

HPC and cloud

Workbenches

The instruments

Page 3: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Awareness

Understanding the potential

Page 4: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Data ingestion and storage at Monash University

The data deluge

0.

6.25

12.5

18.75

25.

2014 2015 2016 2017 2018 2019 2020

Re

sear

ch D

ata

Sto

rage

(P

eta

byt

es)

ManagedArchive

Tape and offsite storage

① AS M2 Eiger(Monash share)

② Titan Krios

③ Lattice LightSheet

④ MR PET

Fast Data ProcessingHPC storage

Processing and Access

Local Cloud Storage

Store.Monash

AS IMBLAS XFM

⑤ Titan Artica

⑥ FEI SEM

⑦ MBI CT

⑧ 9.4 MR

⑨ Next Generation Materials EM

⑩ Future CryoEMand beamlines

Page 5: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Targeting the long tail

Capability Maturity

Data collected,

stored and

analysed with

little or no

automation

Some processes

automated, but

analysis is largely ad-

hoc and workflows not

well defined

All processes are

automated from data

capture and analysis

Page 6: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

4% PFA 37°C 10 mins1% Triton 37°C 5 mins

4% PFA and 0.5% Triton37°C 5 mins

1% Triton 37°C 5 mins4% PFA 37°C 10 mins

and are best placed to analyse it

Researchers know their data

Which image is correct?Courtesy of Dr Donna Whelan, Monash University

Page 7: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

and are best placed to analyse it

Researchers know their data

Shatsky, Maxim, et al. Journal of structural biology 166.1 (2009): 67-78.

Richard Henderson PNAS 2013;110:18037-18041

Is your data noise or the real deal?

Page 8: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

and are best placed to analyse it

Researchers know their data

ASPREE Investigator Group. Contemporary clinical trials 36.2 (2013): 555-564.

WHO criteria for anemia

Can you use the data for your intended purpose?

Page 9: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

With great data comes great responsibility.

Researchers must be included in ML advancements to drive innovation in their discipline.

Page 10: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Access

Lowering barriers to technology

Page 11: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Stage 2 upgrade

MASSIVE 3

1,600 Intel Haswell CPU-cores

2,448 Intel Skylake CPU-cores

NVIDIA GPU coprocessors for data processing and visualisation:

48 NVIDIA Tesla K80

40 NVIDIA Pascal P100

60 NVIDIA V100

2 NVIDIA DGX1-V

8 NVIDIA Grid K1 GPUs for medium and low end visualisation

A 1.15 petabyte Lustre parallel file system

A 2 petabyte Lustre parallel file system upgrade

100 Gb/s Ethernet Mellanox Spectrum

Supplied by Dell, Mellanox and NVIDIA

0

0.5

1

1.5

2

2.5

3

3.5

K80 P100

Caffe neuro image segmentation task

Page 12: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

@vintage_computer_museum NCI supercomputer

Page 13: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens
Page 14: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens
Page 15: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Our web service for modernising HPC access

SSH Authz

• 3rd party applications are authorised to access HPC resources on behalf of the user

• Applications never access passwords

• Web services can move files, interact with the job queue

https://github.com/monash-merc/ssh-authz

• Maps institutional account to UNIX• Issues short-lived certificates

Page 16: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Running a remote desktop in the browser

Strudel Web Demo

MASSIVE Launcher > ScienTific Remote Desktop Launcher (Strudel) > Strudel Web

Page 17: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Our institutional instrument data repository

Store.Monash

Data Storage

InstrumentsComputation &

Analysis

Data Sharing & Publication

Image Capture & Management

MyTardis

MyTardis API

Atom Provider

DICOM Uploader

Metadata

Bio-formats

Web Portal

SFTP

Galaxy interface

Checkout & Push-to

“MyData”

Instrument App

SquashFS

RIF-CS

DOI Minting

SWIFT/S3

Facility View

AAF

SaltStack

Docker Container

Facilities/Labs: 13Instruments: 55

Experiments: 3,874Datasets: 35,578

Users: 968Data: 100+ TB

Downloads: 13,000+Data downloaded: 9TB

FlowCoreMonash Micro Imaging

CryoEM MBI XMFIGMBPF Rosa Lab Kitchen Lab

Page 18: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Moving data from storage to compute

Store.Monash Push-To Demo

Page 19: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

DIGITS and Jupyter on-demand: Work in progress

Supporting HPC-based web apps

Page 20: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Acceleration

Bringing skills to researchers

Page 21: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Enabling ML capabilities is a partnership between technology providers and researchers

Page 22: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

ASPREE

ASPirin in Redicing Events in the Elderly

2,411 16,703+ = 19,114

Page 23: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Autocoding concomitant medications

ASPirin in Redicing Events in the Elderly

Free-text data entered Manual coding to ATC

130,104 drugs reported in free-text

Page 24: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

A many:one mapping problem – drug names can be:

• Generic

• Trade names

• Misspelt generics

• Misspelt trade names

• Free-text with additional information (e.g. “…taken 3x daily”)

• Differ between countries

Autocoding concomitant medications

ASPirin in Redicing Events in the Elderly

ATC Code: N02BE01Many can be autocoded with

an RNN text classifier!

Page 25: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Autocoding concomitant medications

ASPirin in Redicing Events in the Elderly

acetaminophen atorvastatin

Page 26: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Extracting image identifiers from retinal scans

ASPirin in Redicing Events in the Elderly

AB42)00Z Timer: 11:20:57.00

99% Accuracy reidentifyingmislabelled images

Applied to 7000+ unlabeled images

AB42)00Z Timer: 11:20:57.00

Page 27: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Identifying the state of the glottis in rabbit kittens

Preterm respiratory development

Work in progress:~70% accuracy in correctly identifying the state of the glottis

Courtesy of Dr Marcus Kitchen & group,Monash University

Page 28: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

A containerised stack of curated DL and ML applications

Data science toolbox

https://github.com/monash-merc/data-sci-singularity

Page 29: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Providing hands-on access to educational materials for upskilling

Workshops and training

Page 30: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Providing hands-on access to educational materials for upskilling

Workshops and training

Page 31: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Providing hands-on access to educational materials for upskilling

Workshops and training

Attendees include:

• Monash staff and students

• Neighboring universities

• Government

• Other non-university research institutions

Page 32: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Modernising the delivery of data-centric curricula

Workshops and training

Master of Public Health: Practical Data Management

Page 33: Untapped Data - Nvidiaimages.nvidia.com/content/APAC/events/ai... · big data generators Dr Jason Rigby 24th Oct 2017 Untapped Data NVIDIA AI Conference, Singapore. INSIGHT Lens

Take-home messages

• The quantity and variety of data is increasing

• The most qualified people to analyse their data using ML techniques should be the researchers themselves

• Cultivating ML communities is a partnership between technology and service providers and the researchers

• Empowerment through outreach and training is essential to raise awareness and generate ideas

• Ease of access will increase technology uptake amongst the “long tail”