Improving Reproducible Deep Learning Workflows with DeepDIVA
M. Alberti1*, V. Pondenkandath1*, L. Vögtlin1, M. Würsch1,2, R. Ingold1, M. Liwicki1,3
*Equal contribution
1 DIVA Group, University of Fribourg, Switzerland
2 IIT, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Switzerland
3 EISLAB Machine Learning, Luleå University of Technology, Sweden
Reproducibility Crisis: Trust or Verify?
Joelle Pineau, “Reproducible, Reusable, and Robust Reinforcement Learning”,
invited talk @NeurIPS 2018, Montreal, Canada
No possibility to verify
No possibility to extend
Lots of overhead created
Leads to no trust in scientific results
Why Is This a Problem?
Ensure reproducibility
Of your own experiments
Of other people’s experiments
Promote open-source code
Make it easy to have “good enough” code
Enable code trustworthiness
How To Make Steps Forward?
Open-Source
Python framework
Built on top of PyTorch
Makes your life easier for:
Reproducing your own and other people’s experiments
Provides boilerplate code for:
Common deep learning scenarios
Handling time-consuming everyday problems
Documentation & Tutorial available
How We Contribute: DeepDIVA
Reproducing Your Own Experiments
Short-term, or work in progress
Long-term, or finished work
Kilometres of poor or incomplete log files
Stochasticity in the process
Short-term Reproducibility Dangers
Meaningful logging
Saving all run parameters and command line args
Providing concise coloured logs
Deterministic runs
Seeding the pseudo-random number generators: Python, NumPy and PyTorch
Disabling cuDNN (NVIDIA CUDA Deep Neural Network library) when necessary
How DeepDIVA Ensures Short-term Reproducibility
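The seeding step listed on this slide can be sketched as follows. This is a minimal illustration, not DeepDIVA's actual code: the function name `seed_everything` and its `disable_cudnn` flag are hypothetical, and NumPy and PyTorch are seeded only when they are installed.

```python
import random


def seed_everything(seed: int, disable_cudnn: bool = False) -> None:
    """Seed the Python, NumPy and PyTorch pseudo-random number generators.

    Hypothetical helper for illustration; not part of the DeepDIVA API.
    """
    # Python's built-in generator.
    random.seed(seed)

    # NumPy's global generator, if NumPy is available.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch CPU and GPU generators, if PyTorch is available.
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        if disable_cudnn:
            # cuDNN kernels can be non-deterministic; turning the
            # backend off trades speed for repeatability.
            torch.backends.cudnn.enabled = False
    except ImportError:
        pass
```

Calling `seed_everything(42)` once at the start of a run, before any data loading or model creation, is the usual pattern; seeding later leaves earlier random draws uncontrolled.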
Poor (or non-existent!) use of version control
Bad programming habits that die hard
Silent data modifications
Long-term Reproducibility Dangers
Git status
Linking every run to a specific commit in Git
Allowing this feature to be disabled for dev purposes
Copy code
Copying the entire running code in the output folder
Data Integrity Management
Footprint of the data in a JSON file using SHA-1 hashes
How DeepDIVA Ensures Long-term Reproducibility
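A data footprint of the kind mentioned on this slide could look like the sketch below: one SHA-1 hash per file, stored in a JSON file and compared later to detect silent modifications. The function names and the JSON layout (relative path mapped to hash) are assumptions made for illustration, not DeepDIVA's actual format.

```python
import hashlib
import json
from pathlib import Path


def file_sha1(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_footprint(root: str) -> dict:
    """Map every file under root (by relative path) to its SHA-1 hash."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): file_sha1(path)
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def save_footprint(root: str, out_file: str) -> None:
    """Write the footprint of root to a JSON file."""
    with open(out_file, "w") as handle:
        json.dump(dataset_footprint(root), handle, indent=2, sort_keys=True)


def changed_files(root: str, footprint_file: str) -> list:
    """List files that were added, removed or modified since the footprint."""
    with open(footprint_file) as handle:
        saved = json.load(handle)
    current = dataset_footprint(root)
    return sorted(
        name
        for name in saved.keys() | current.keys()
        if saved.get(name) != current.get(name)
    )
```

The footprint file should be stored outside the dataset folder; otherwise it would be hashed as part of the data on the next comparison.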
Reproducing Other People’s Experiments
Given a paper, try to replicate the results and observations
In order to reproduce an experiment one needs:
Git repository URL
Git commit identifier (full SHA)
List of command line arguments used
The data
Reproducing Other People’s Experiments
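The four items on this slide can be captured in a small record that is written next to the results of each run. The sketch below is an assumption about how such a record might look; `RunRecord`, `git_output` and `capture_run_record` are hypothetical names, and the Git values are `None` when the code does not run inside a Git repository.

```python
import json
import subprocess
import sys
from dataclasses import asdict, dataclass, field
from typing import List, Optional


def git_output(*args: str) -> Optional[str]:
    """Run a git command and return its output, or None if it fails."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


@dataclass
class RunRecord:
    """Everything a third party needs to re-run an experiment."""
    repository_url: Optional[str]
    commit_sha: Optional[str]  # full SHA, not the abbreviated form
    command_line: List[str] = field(default_factory=list)
    data_footprint_file: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


def capture_run_record(data_footprint_file: Optional[str] = None) -> RunRecord:
    """Collect repository URL, commit and arguments of the current run."""
    return RunRecord(
        repository_url=git_output("config", "--get", "remote.origin.url"),
        commit_sha=git_output("rev-parse", "HEAD"),
        command_line=list(sys.argv),
        data_footprint_file=data_footprint_file,
    )
```

With such a record, reproducing a run amounts to cloning `repository_url`, checking out `commit_sha`, verifying the data against the footprint, and re-issuing `command_line`.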
Productivity Out-of-the-box
Making your life easier: do not reinvent the wheel!
“One click away” Deep Learning Scenarios
“when the data is ready the task is solved”
Download a dataset with a click
Natural images, medical images, historical documents, …
Split your dataset
Train, Validation and Test splits
Analyse the data
Mean/std and class distributions
Ensure data integrity
Compare the footprints
Prepare Your Data
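The splitting and analysis steps above can be illustrated with a few lines of standard-library Python. This is a simplified sketch under stated assumptions: the function names are hypothetical, samples are arbitrary Python objects, and the mean/std is computed over a flat list of numbers rather than per image channel.

```python
import random
from collections import Counter
from statistics import fmean, pstdev


def split_dataset(samples, train=0.7, val=0.15, seed=0):
    """Shuffle reproducibly and cut into train, validation and test splits."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local generator, fixed seed
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder becomes the test split
    )


def class_distribution(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def mean_and_std(values):
    """Return the mean and population standard deviation of the values."""
    values = list(values)
    return fmean(values), pstdev(values)
```

Using a fixed seed for the shuffle means the same split is obtained on every machine, which matters as much for reproducibility as seeding the training itself.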
Real-time Visualizations
TensorBoard (from TensorFlow)
Confusion Matrix
Features Visualization
Weight Histograms
Performance Evaluation
Let machine learning find the best values
No expensive grid or random search
Automatic Hyper-Parameter Optimization
Be A Part Of It
Getting Started With DeepDIVA
No Setup Time
From source on Ubuntu (or other flavours of Linux)
Docker Image Coming Soon
Documentation
Online and in the code
Tutorials
Learn new features efficiently
Fork It
Extensive and modular for easy modifications
How To Use It
Make Your Experiment Reproducible
bit.ly/DeepDIVA