AUTOMATIC IMAGING AND CLASSIFICATION OF FISH EGGS AND ZOOPLANKTON USING MACHINE LEARNING ALGORITHMS
B. FOREST, M. TARDIVEL, M. LUNVEN, J. PERCHOC, E. ANTAJAN, G. GUYADER, M.M. DANIELOU, S. LE MESTRE, P. BOURRIAU, M. SOURISSEAU, P. PETITGAS
M. HURET, J.B. ROMAGNAN, F. COLAS
IFREMER
Context
Plankton and fish eggs are traditionally characterized by observation under a binocular microscope.
This is a slow method that requires specialists.
It is not always possible to carry out the analysis on board.
Back in the lab:
Hundreds of samples to analyse,
Only the fish eggs are counted systematically,
Zooplankton studies are required to get a complete description.
05/06/2018 2
Commercially available systems
Zooscan (Hydroptic, France):
Not on board
FlowCAM (Fluid Imaging, USA):
The biggest flow cell is 2 mm thick with a 2× objective.
Magnification is too large!
Available software
• E. Antajan (Ifremer, France) and S. Gasperini (LOV, UPMC, France)
• Stand-alone software
• Tanagra library (R. Rakotomalala, U Lyon2)
• Several classification algorithms:
k-nearest neighbour,
Support Vector Machine for linear and non-linear Classification,
Ball Vector Machine,
Decision tree algorithm
Random Forest
Partial Least Squares Discriminant Analysis
Multilayer Perceptron neural network.
• P. Grosjean (Écologie Numérique des Milieux Aquatiques, Mons University, Belgium)
• R environment (now new GUI)
• R library
• Random forest
• New algorithm for simplifying the validation: only the suspect images are validated.
Zoo/Phytoimage
Now: EcoTaxa
• Developed by the Villefranche Oceanology Observatory, the Roscoff Biological Station and Oceanomics.
• Web application for plankton classification
• Random forest and now deep learning capabilities for classification,
• Highly ergonomic validation interface.
• Stand-alone application.
Objectives
• Developing an in-flow imaging system coupled to supervised classification software.
• Organisms from 0.1 to 10 mm wide,
• Easy and fast operation (less than 10 min),
• Quantitative (about 100% of organisms),
• Classification as accurate as possible to limit the validation time.
1- Acquisition
• Grab images of all the organisms
2- Image Processing
• Segmentation of the images,
• Morphological parameters are calculated,
• Duplicates are detected,
• Thumbnails and a text file are generated
3- Training
• Defining a set of images for each class (Learning set),
• If needed, optimizing the neural network.
4- Classification
• Each image is assigned to a class defined in the learning set.
5- Validation
• Check the classification
The ZooCAM hardware
Acquisition
Digitization at 1 L/min
The sample is diluted in 5 L
Analysis in 5 min
Less than 10 min, including rinsing and filling of the stirring system.
The ZooCAM software
Image processing
• Images of individual organisms are extracted by thresholding,
• 48 morphological parameters are calculated:
• Size: area, equivalent spherical diameter, perimeter, maximal and minimal Feret diameter…
• Shape: circularity, convex hull area and perimeter compared to the object area and size…
• Gray level distribution: 1st, 2nd and 3rd quartile, kurtosis, skewness,
• Area of the holes, number of sub-objects when changing the thresholding…
• Duplicates are identified,
• Thumbnails and a text file containing the metadata and morphological parameters are generated.
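The extraction and measurement steps above can be illustrated with a minimal sketch. It is not the ZooCAM implementation: the function names (`extract_objects`, `measure`), the threshold value and the toy image are assumptions made for illustration, and only a handful of the 48 parameters are computed. The perimeter is approximated by counting pixel edges that face the background, which overestimates it for rounded objects.

```python
import math

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity

def extract_objects(image, threshold):
    """Find connected regions darker than `threshold` and measure each one.

    image: list of rows of gray levels (0 = black, 255 = white).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] >= threshold:
                continue
            # Flood fill to collect all pixels of one object.
            stack, pixels = [(y0, x0)], []
            seen[y0][x0] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            objects.append(measure(image, pixels))
    return objects

def measure(image, pixels):
    """Compute a few size, shape and gray-level parameters for one object."""
    area = len(pixels)
    member = set(pixels)
    # Perimeter estimate: number of pixel edges facing the background.
    perimeter = sum(
        (y + dy, x + dx) not in member
        for y, x in pixels
        for dy, dx in NEIGHBOURS
    )
    grays = sorted(image[y][x] for y, x in pixels)
    return {
        "area": area,
        "esd": 2.0 * math.sqrt(area / math.pi),  # diameter of a disc of equal area
        "perimeter": perimeter,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "median_gray": grays[len(grays) // 2],
    }

# Toy image: one 2 x 2 dark object and one isolated dark pixel.
image = [
    [255, 255, 255, 255, 255],
    [255,  40,  50, 255, 255],
    [255,  60,  30, 255, 255],
    [255, 255, 255, 255,  90],
]
objects = extract_objects(image, threshold=128)
```

Each dictionary in `objects` corresponds to one line of the text file of morphological parameters; in practice a library routine would replace the hand-written flood fill.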
Classification: Random Forest
• Until 2018, classification was done with the Random Forest (RF) algorithm.
• Based on decision trees.
• Random forest builds many (at least 500) randomized decision trees from independent subsets of the learning set.
• The class is determined by voting.
• More accurate and stable prediction than a single decision tree.
• Efficient at relatively low computing cost.
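The principle of building many randomized trees and letting them vote can be sketched as follows. This is a deliberately simplified stand-in, not the classifier used with the ZooCAM: each tree here is a depth-one tree (a single threshold on one randomly chosen parameter), whereas a real random forest grows deeper trees and draws a random subset of parameters at every split. The function names, the two parameters (circularity, mean gray level) and the six-image learning set are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Find the best single-threshold split on one feature (depth-one tree)."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        low = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        high = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not low or not high:
            continue
        low_class = Counter(low).most_common(1)[0][0]
        high_class = Counter(high).most_common(1)[0][0]
        errors = sum(l != low_class for l in low) + sum(h != high_class for h in high)
        if best is None or errors < best[0]:
            best = (errors, threshold, low_class, high_class)
    if best is None:
        # No usable split: always predict the majority class of the sample.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return (feature, best[1], best[2], best[3])

def train_forest(rows, labels, n_trees=500, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest

def classify(forest, row):
    """Each tree votes for a class; the most frequent vote wins."""
    votes = Counter(
        low if row[feature] <= threshold else high
        for feature, threshold, low, high in forest
    )
    return votes.most_common(1)[0][0]

# Hypothetical learning set: (circularity, mean gray level) per image.
rows = [(0.95, 180), (0.90, 170), (0.92, 175), (0.40, 90), (0.35, 100), (0.45, 95)]
labels = ["egg", "egg", "egg", "copepod", "copepod", "copepod"]
forest = train_forest(rows, labels)
prediction = classify(forest, (0.30, 85))
```

Because every tree sees a different resampled subset, individual trees can be wrong, but the vote over 500 of them is stable.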
[Figure: example decision tree for an unknown organism. First split on circularity (high/low), then a split on mean gray level (high/low) in each branch.]
Some results after validation
Zooscan/ZooCAM
Colas F. et al. The ZooCAM, a new in-flow imaging system for fast onboard counting, sizing and classification of fish eggs and metazooplankton. Progress in Oceanography (2017), in press.
Egg stages
• RF works well but some classifications remain difficult.
• New machine learning algorithms are being tested:
1. Deep Neural Network (DNN),
2. Convolutional Neural Network (CNN).
DNN
• Morphological parameters = input parameters,
• Output parameters = probability of belonging to each class,
• Each node:
1. linearly combines the inputs from all the nodes of the previous layer (weights, bias),
2. applies an activation function to the result (tanh, sigmoid, rectified linear...),
3. transmits the result to the next layer.
• Training = optimizing the weights and biases to minimize the errors on a learning set (back-propagation).
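What each node computes can be shown with a short forward-pass sketch. The network shape, weights and input values below are invented for illustration, and the softmax output layer is an assumption (the slides only state that the outputs are class probabilities). Training by back-propagation is not shown; in practice the weights and biases would come from it.

```python
import math

def dense(inputs, weights, biases, activation):
    """One layer: every node linearly combines all inputs, then applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)

def identity(z):
    """No activation: used before the softmax."""
    return z

def softmax(scores):
    """Turn raw output scores into class probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(features, layers):
    """Propagate the morphological parameters through all layers."""
    values = features
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return softmax(values)

# Hypothetical network: 2 inputs -> 3 hidden nodes (tanh) -> 2 classes.
layers = [
    ([[0.5, -1.0], [1.5, 0.2], [-0.7, 0.9]], [0.0, 0.1, -0.1], math.tanh),
    ([[1.0, -0.5, 0.3], [-1.0, 0.5, -0.3]], [0.0, 0.0], identity),
]
probabilities = forward([0.9, 0.4], layers)
```

The predicted class is the one with the highest value in `probabilities`.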
Comparison RF/DNN
[Figure: two bar charts of classification error per class, with series "Error RF" and "Error DNN". x-axis: Class; y-axis: Error (0 to 120 in the first chart, 0 to 160 in the second).
Classes in the first chart: 00_Artefact, 00_Bubble, 00_Bubble_large, 00_Detritus, 00_Fiber, 00_Halosphera, 00_Paint_rust, 03_Copepoda_large, 03_Copepoda_small, 04_Malacostraca_large, 05_Cladocera, 13_Engraulis_egg_1, 13_Engraulis_egg_2_3, 13_Engraulis_egg_4_6, 13_Engraulis_egg_7_8, 13_Engraulis_egg_9_11, 13_Fish_egg, 13_Sardine_egg_1, 13_Sardine_egg_2_3, 13_Sardine_egg_4_6, 13_Sardine_egg_7_8, 13_Sardine_egg_9_11, 13_Sardine_egg_dam.
Classes in the second chart: 00_Bubble, 00_Bubble_large, 00_Detritus, 00_Fiber, 00_Halosphera, 00_Paint_rust, 03_Copepoda_large, 03_Copepoda_small, 04_Malacostraca_large, 05_Cladocera, 13_Engraulis_egg_1, 13_Engraulis_egg_2_3, 13_Engraulis_egg_4_6, 13_Engraulis_egg_7_8, 13_Engraulis_egg_9_11, 13_Sardine_egg_dam.]
• Using DNN gives a small gain for fish egg stages,
• But:
• The learning set was good for fish eggs and not for the other classes,
• The parameters were calculated for Random Forest.
CNN
• Image = input,
• Output parameters = probability of belonging to each class,
• The input image undergoes a series of:
– convolutions by a series of filters,
– sub-sampling (pooling),
• Training = optimizing the CNN parameters to minimize the errors on a learning set.
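The two operations applied to the input image, convolution by a filter and sub-sampling by pooling, can be sketched on a toy image. The filter values and the image below are invented for illustration; in a trained CNN the filter coefficients are the parameters optimized on the learning set, and many filters and layers are stacked.

```python
def convolve(image, kernel):
    """Valid 2-D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Sub-sample by keeping the maximum of each size x size block."""
    return [
        [
            max(feature_map[y + i][x + j]
                for i in range(size) for j in range(size))
            for x in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for y in range(0, len(feature_map) - size + 1, size)
    ]

# Toy 5 x 5 image with a vertical dark-to-bright edge.
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]
edge_filter = [[-1, 1], [-1, 1]]  # hypothetical filter responding to vertical edges
feature_map = convolve(image, edge_filter)  # 4 x 4 map, strong response at the edge
pooled = max_pool(feature_map)              # 2 x 2 map after pooling
```

Pooling halves the resolution while keeping the strongest filter responses, which is why the image size fed to the network matters.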
Comparison RF/DNN/CNN
Learning
• RF: a few hundred
• DNN: a few hundred
• CNN: several thousand
Accuracy (auto-learning)
• RF: greater than 98%
• DNN: greater than 98%
• CNN: greater than 96%
Accuracy (real sample)
• RF: depends on the sample and class of interest.
• DNN: depends on the sample and class of interest.
• CNN: depends on the sample and class of interest.
Resource info, learning
• RF: memory for the decision trees. 650 trees, 23 classes: less than 40 s, memory: 6 GB.
• DNN: 3 layers and a few tens of nodes per layer, a few thousand epochs, a few hours per learning run. Network settings saved (a few 100 kB).
• CNN: on CPU: low memory, memory access, several tens of hours. On GPU: video memory, several hours. Network settings saved (a few 100 kB).
Resource info, classification
• RF: fast (40 s per forest + 15 s per classification)
• DNN: very fast (less than 15 s including moving files)
• CNN: very fast (less than 15 s including moving files)
Disadvantages
• RF: morphological parameters must be calculated
• DNN: morphological parameters must be calculated
• CNN: resizing the image => loss of information? Huge training set
Advantages
• RF: robust for plankton samples
• DNN: network settings can be saved, so classification is very fast
• CNN: network settings can be saved, so classification is very fast. Good for cut organisms
Conclusion
• RF and DNN provided almost equivalent results,
• CNN seemed more accurate according to EcoTaxa. It is worth investigating, but it needs several thousand images and GPU computation!
• Validation is still necessary!