
Research Paper

Identifying multiple plant diseases using digital image processing

Jayme Garcia Arnal Barbedo*, Luciano Vieira Koenigkan, Thiago Teixeira Santos

Embrapa Agricultural Informatics, Campinas, SP, Brazil

Article info

Article history:

Received 23 September 2015

Received in revised form 3 March 2016

Accepted 28 March 2016

Published online 29 April 2016

Keywords:

Automatic disease recognition

Visible symptoms

Colour transformations

* Corresponding author. E-mail address: [email protected] (J.G.A. Barbedo).

The gap between the current capabilities of image-based methods for automatic plant disease identification and the real-world needs is still wide. Although advances have been made on the subject, most methods are still not robust enough to deal with a wide variety of diseases and plant species. This paper proposes a method for disease identification based on colour transformations, colour histograms and a pairwise-based classification system. Its performance was tested using a large database containing images of symptoms belonging to 82 different biotic and abiotic stresses, affecting the leaves of 12 different plant species. The wide variety of images used in the tests made it possible to carry out an in-depth investigation of the main advantages and limitations of the proposed algorithm. A comparison with other algorithms is also presented, together with some possible solutions for the main challenges that still prevent this kind of tool from being adopted in practice.

© 2016 IAgrE. Published by Elsevier Ltd. All rights reserved.

Nomenclature

c        Number of disease classes
CD       Correlation differences
CMYK     Cyan-Magenta-Yellow-Key colour space
D        Set of all diseases
HSV      Hue-Saturation-Value colour space
L*a*b*   Colour space with L* representing lightness and a* and b* representing colour-opponent dimensions
L_d      Likelihood that the symptoms were produced by disease d
M        Final segmentation mask
M1, M2, M3, M4  Basic binary segmentation masks
Ma, Mb   Intermediate binary segmentation masks
MPixels  Millions of pixels
r1, r2   Deviation of each pixel from a purely green hue towards red and blue
RGB      Red-Green-Blue colour space
ROI      Region of interest
v1, v2   Correlation difference vectors
X_{c,d}  Cross-correlation between intensity and reference histograms, considering channel c and disease d
ε        Arbitrarily small value that aims at avoiding divisions by zero

1. Introduction

The timely diagnosis of plant diseases is as important as it is challenging. Although human sight and cognition are remarkably powerful in identifying and interpreting patterns, the visual assessment of plant diseases, being a subjective task, is subject to psychological and cognitive phenomena that may lead to bias, optical illusions and, ultimately, to error. Ambiguities may be resolved by laboratory analysis; however, this is a process that is often time consuming and expensive. Additionally, many producers around the world do not have access to technical advice from rural extension services, making their crops especially vulnerable to yield losses and further problems caused by plant diseases.

Considerable effort has been made in the search for methods to improve the reliability and speed of the process, which inevitably involves some kind of automation. Most of the methods proposed so far try to explore imaging technologies to achieve this goal (Barbedo, 2013). Among the most widely used imaging techniques are fluorescence (Bauriegel, Giebel, & Herppich, 2010; Belin, Rousseau, Boureau, & Caffier, 2013; Kuckenberg, Tartachnyk, & Noga, 2009; Lins, Belasque, & Marcassa, 2009; Rodríguez-Moreno et al., 2008), multispectral and hyperspectral imaging (Barbedo, Tibola, & Fernandes, 2015; Mahlein, Steiner, Hillnhütter, Dehne, & Oerke, 2012; Oberti et al., 2014; Polder, van der Heijden, van Doorn, & Baltissen, 2014; Zhang et al., 2014), and conventional photographs in the visible range (Barbedo, 2014; Clément, Verfaille, Lormel, & Jaloux, 2015; Kruse et al., 2014; Phadikar, Sil, & Das, 2013; Pourreza, Lee, Etxeberria, & Banerjee, 2015; Zhou, Kaneko, Tanaka, Kayamori, & Shimizu, 2015). The latter is the least expensive and most accessible technology, as the prices of digital cameras continue to drop and most mobile devices include cameras that provide images with acceptable quality.

Thus, it is no surprise that methods for automatic plant

disease diagnosis based on visible range digital images have

received special attention. However, although there have been

advances, those are mostly limited to cases in which the

conditions, both in terms of disease manifestation and image

capture, are tightly controlled. As a result, there is a lack of

methods that can be used under the real, uncontrolled conditions found in the field. The reasons for this are discussed in

depth in Barbedo (2016).

This paper presents a new digital image-based method for automatic disease identification. This method, which is based on colour transformations, intensity histograms and a pairwise-based classification system, was designed specifically to operate under uncontrolled conditions and to deal with a large number of diseases. Additionally, new diseases can be included without changing the component of the system that has already been trained, making the process straightforward. This method was tested with a large, unconstrained set of leaf images containing symptoms belonging to 74 diseases, 4 pests and 4 abiotic disorders, affecting 12 different plant species. The images containing symptoms that were not produced by diseases were included because those are also important sources of diagnostic confusion, making the database more comprehensive. The images were captured under a wide variety of conditions regarding lighting, angle of capture, stage of development of the disease and leaf maturity. No constraint was enforced during the captures, and no image was removed from the dataset, no matter how far from ideal the capture conditions were. As a result, the method was stressed to its limits, revealing a wealth of information about the challenge of disease identification when several diseases are considered. This allowed an in-depth analysis of the challenges that are expected to be faced in practice, as discussed here and, in more detail, in Barbedo (2016).

2. Material and methods

2.1. Image dataset

As mentioned before, the database used in this work contains images of 82 different disorders distributed over 12 plant species: Common Bean (Phaseolus vulgaris L.), Cassava (Manihot esculenta), Citrus (Citrus sp.), Coconut Tree (Cocos nucifera), Coffee (Coffea sp.), Corn (Zea mays), Cotton (Gossypium hirsutum), Grapevines (Vitis sp.), Passion Fruit (Passiflora edulis), Soybean (Glycine max), Sugarcane (Saccharum spp.) and Wheat (Triticum aestivum). The images were captured using a variety of digital cameras and mobile devices, with resolutions ranging from 1 to 24 MPixels. About 15% of the images were captured under controlled conditions, either by transporting the detached leaves to laboratories or by placing the leaves inside closed dark boxes with an opening for lighting and image capture. The remaining 85% of the images were captured under real conditions, with the leaves attached to the host plant, at several experimental fields of the Brazilian Agricultural Research Corporation (Embrapa). For these, no constraint regarding resolution, field of view or capture conditions was enforced during the image capture. This decision aimed at producing an image database that closely reproduces the conditions and situations that the proposed method will have to deal with if used in practice by producers with little or no knowledge of imaging techniques. All images were stored in the 8-bit RGB format. Table 1 shows how the database is distributed in terms of plant species and disorders.

2.2. Image analysis procedure

Figure 1 shows the general structure of the proposed algorithm for the analysis of the symptoms. As can be seen, the algorithm was divided into three main blocks: basic processing, training (performed only once) and core. Each block is detailed in the following subsections.

The implementation of the algorithm included a graphical interface to guide the user through the process. Figure 2 shows the interface, with an image of southern corn leaf blight symptoms as an example.

2.2.1. Basic processing

The first task was the segmentation of the leaf containing the symptoms in order to remove the background. If the leaf is isolated from the background by some kind of screen, the task is trivial; however, this was not the case for many of the images used in this work.


Table 1 – Image database composition with plant diseases and their hosts.

Specimen Disorder # Samples

Common Bean Anthracnose 14

Cercospora leaf spot 2

Angular mosaic 3

Common bacterial blight 25

Rust 2

Hedylepta indicata 5

Target leaf spot 23

Bacterial brown spot 2

Web blight 7

Powdery mildew 8

Bean golden mosaic 9

Phytotoxicity 7

Cassava Bacterial blight 17

White leaf spot 8

Cassava common mosaic 1

Cassava vein mosaic 11

Cassava ash 1

Blight leaf spot 1

Citrus Algal spot 5

Alternaria brown spot 2

Canker 8

Sooty mold 3

Leprosis 13

Bacterial spot 5

Greasy spot 8

Scab 2

Coconut tree Coconut scale 5

Bipolaris leaf spot 2

Lixa grande 31

Lixa pequena 33

Cylindrocladium leaf spot 5

Whitefly 2

Phytotoxicity 2

Corn Anthracnose leaf blight 7

Maize bushy stunt 3

Tropical corn rust 14

Southern corn rust 15

Scab 3

Southern corn leaf blight 43

Phaeosphaeria Leaf Spot 31

Diplodia leaf streak 6

Brown spot 8

Northern corn leaf blight 46

Coffee Leaf miner 12

Brown eye spot 35

Leaf rust 17

Bacterial blight 31

Blister spot 8

Brown leaf spot 21

Cotton Seedling disease complex 32

Myrothecium leaf spot 27

Areolate mildew 36

Grapevines Bacterial canker 10

Rust 8

Isariopsis leaf spot 1

Downy mildew 17

Powdery mildew 15

Fanleaf degeneration 2


Passion fruit Anthracnose 2

Cercospora leaf spot 4

Scab 1

Bacterial blight 21

Septoria spot 5

Woodiness 19

Soybean Bacterial blight 56

Cercospora leaf blight 2

Rust 65

Phytotoxicity 23

Soybean Mosaic 22

Target spot 62

Myrothecium leaf spot 2

Downy mildew 46

Powdery mildew 76

Brown spot 20

Sugarcane Orange rust 18

Ring spot 43

Red rot 49

Red stripe 4

Wheat Wheat blast 14

Leaf rust 24

Tan spot 2

Powdery mildew 35

Total 1335

Fig. 1 – Structure of the algorithm to identify plant diseases by digital image processing.


Fig. 2 – User interface for the proposed algorithm.


As a result, the Guided Active Contour (GAC) method (Cerutti, Tougne, Mille, Vacavant, & Coquin, 2011) was selected for this task, due to its good performance in the tests performed by Grand-Brochier, Vacavant, Cerutti, Bianchi, and Tougne (2013). No other preprocessing techniques (histogram thresholding, contrast enhancement, etc.) were applied to the images because they did not have any positive impact on the results.

The second step of the algorithm is the symptom segmentation, which begins with the calculation of two ratios for each pixel in the image, r1 = R/(G + ε) and r2 = B/(G + ε), where R, G and B are the pixel values of the red, green and blue channels of the RGB representation, respectively, and ε is an arbitrarily small value that aims at avoiding divisions by zero. The values of r1 and r2 measure, respectively, the deviation of each pixel from a purely green hue towards red and blue; that is, the smaller their values, the greener the pixel and, in theory, the healthier that part of the leaf. Four binary segmentation masks (M1 to M4) are then generated by applying the following rules, with i and j being the coordinates of the pixels:

- M1(i,j) = 1 if r1(i,j) > 1, and 0 otherwise;
- M2(i,j) = 1 if r2(i,j) > 1, and 0 otherwise;
- M3(i,j) = 1 if r1(i,j) > 0.9, and 0 otherwise;
- M4(i,j) = 1 if r2(i,j) > 0.67, and 0 otherwise.

Those masks are, in turn, combined into two intermediate masks: Ma = M1 || M2 and Mb = M3 & M4, where || and & represent the Boolean logic operators "or" and "and". Ma highlights darker symptoms ranging from yellow to dark brown, while Mb highlights bright symptoms. The final segmentation mask is obtained by M = Ma || Mb, which is applied to the original image, effectively isolating the symptoms. It is important to notice that most symptoms do not have clear boundaries, rather gradually fading into healthy tissue. The thresholds applied to M1 to M4 were selected in such a way that the central part of the symptoms and most of the fading region are considered. The placement of the boundaries may be changed by adjusting those thresholds accordingly.
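To make the masking step concrete, the following is a minimal sketch, assuming NumPy arrays and an 8-bit RGB image whose background has already been removed (function and variable names are ours, not the paper's):

```python
import numpy as np

def symptom_mask(rgb, eps=1e-6):
    """Build the final symptom mask M from the four basic masks M1-M4.

    rgb: H x W x 3 uint8 array with the background already removed.
    Returns a boolean H x W array where True marks symptomatic pixels.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)

    # Deviation from a purely green hue towards red (r1) and blue (r2).
    r1 = r / (g + eps)
    r2 = b / (g + eps)

    # Basic binary masks with the thresholds given in the text.
    m1, m2 = r1 > 1.0, r2 > 1.0
    m3, m4 = r1 > 0.9, r2 > 0.67

    ma = m1 | m2        # darker symptoms, from yellow to dark brown
    mb = m3 & m4        # bright symptoms
    return ma | mb      # final mask M
```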

In the next step, the isolated symptoms and lesions,

which are in the RGB format, are transformed to the HSV,

L*a*b* and CMYK colour spaces. In other words, the three

original colour channels are arithmetically manipulated to

generate ten new colour channels (H, S, V, L, a, b, C, M, Y and

K). Each one of those channels has different characteristics

that may be more or less suitable for identifying each kind of

symptom. Figure 3 shows a mosaic containing the original

symptom image (obtained from the complete image shown

in Fig. 2) and the greyscale representation of the ten newly

generated channels.
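The paper does not specify which conversion formulas were used; one possible sketch of this colour-space expansion uses scikit-image for the HSV and L*a*b* conversions and a simple textbook formula for CMYK:

```python
import numpy as np
from skimage import color

def ten_channels(rgb):
    """Expand a float RGB image (values in [0, 1]) into the ten channels
    H, S, V, L*, a*, b*, C, M, Y and K used by the method."""
    hsv = color.rgb2hsv(rgb)   # H, S, V
    lab = color.rgb2lab(rgb)   # L*, a*, b*

    # Naive RGB -> CMYK conversion (one of several common definitions).
    k = 1.0 - rgb.max(axis=-1)
    denom = np.where(k < 1.0, 1.0 - k, 1.0)   # avoid dividing by zero on black pixels
    c = (1.0 - rgb[..., 0] - k) / denom
    m = (1.0 - rgb[..., 1] - k) / denom
    y = (1.0 - rgb[..., 2] - k) / denom

    return np.dstack([hsv[..., 0], hsv[..., 1], hsv[..., 2],
                      lab[..., 0], lab[..., 1], lab[..., 2],
                      c, m, y, k])   # H x W x 10
```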

At this point, the algorithm was divided into two branches,

training and core. The core part uses the parameters and

values determined in the training part to perform the disease

identification. In other words, the first branch is used only

when some training is necessary (for example if a new disease

is to be included), while the second branch is the actual disease classifier, being the one with the potential to be used in

practice.

2.2.2. Training

Approximately 70% of the images in the database were used in the training, with the remaining 30% being used in the tests.

This proportion was kept for all plant species considered in

this work.

As commented before, many of the diseases present quite

similar visual symptoms. One way to deal with classification

problems for which the classes are not well defined is to divide

the problem containing c classes into c(c−1)/2 binary (or two-class) problems, an approach known as pairwise classification (Park & Fürnkranz, 2007). The principles of pairwise classification, with some adaptations, are adopted here.

Considering the example of corn, in which ten diseases were

considered, this resulted in 45 possible pairs of diseases. The

main objective of the training stage was to determine which

colour channel provided the best results for each pair of

diseases.

The first step in the training part of the algorithm was to

generate, for each disease and each colour channel, a 100-bin

reference histogram combining the data contained in all corresponding images in the training set. Those histograms

aimed at capturing the general behaviour of each disease for

all colour channels considered in this work. It is important to

highlight that the success of those reference histograms at

capturing the basic characteristics of a given disease is closely

linked to the uniformity of the intensity histograms of the

corresponding individual images. In other words, if the characteristics of the symptoms of a given disease vary significantly from one image to another in a given colour channel,

the resulting reference histogram will reflect that by trying to

fit all images, but not quite succeeding for any of them.

Because of that, a measurement of how reliable a given reference histogram is, the consistency value, was created.

Since most of the reference histograms were discarded in the

following steps, the consistency values were calculated only

at the end of the training process.

The colour channels whose reference histograms correlated the least for each pair of diseases were taken as the ones with the best discriminative capabilities for those pairs. Again considering corn as an example, when the pair anthracnose-bushy stunt was considered, the correlations between the ten reference histograms of anthracnose and their counterparts for bushy stunt were calculated, with channel H yielding the lowest correlation and, consequently, having the best discriminative capabilities for this pair.


Fig. 3 – Example of representation of the symptoms (corn's Phaeosphaeria Leaf Spot) in all ten colour channels considered in this work. The letters in the upper left corners indicate the respective colour channels. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Table 2 shows the channels with the best discriminative capabilities for each pair of corn diseases.

At this point, only the reference histograms with best

discriminative capabilities were kept.

The final step in the training stage was the calculation of

the consistency values. The cross-correlations between each

selected reference histogram and the histograms of all corresponding images in the training set were calculated and

averaged. The closer the resulting value was to one, the more

consistent was the colour channel for that disease, and hence

the stronger the results based on it. For corn, since there were

45 pairs of diseases, and the calculations were performed for

both diseases in each pair, 90 consistency values were stored.

It is important to remark that if a new disease was to be

included, the retraining would require only determining the

reference histograms for the new disease, and then investigating which channels would best distinguish the new disease

from each of the original ones.
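To summarise the training stage, the sketch below (our own illustration; the data structures and helper names such as `train` and `samples` are hypothetical, and "cross-correlation" is read here as the normalised correlation coefficient at zero lag) builds the 100-bin reference histograms, selects the least-correlated channel for each disease pair, and computes the consistency values. It assumes the ten channel values of the segmented symptom pixels have already been extracted and scaled to [0, 1]:

```python
import numpy as np
from itertools import combinations

N_BINS = 100

def histogram(values):
    """100-bin normalised intensity histogram of one colour channel."""
    h, _ = np.histogram(values, bins=N_BINS, range=(0.0, 1.0))
    return h / (h.sum() + 1e-12)

def correlation(h1, h2):
    """Correlation between two histograms (our reading of the paper's
    cross-correlation: the correlation coefficient at zero lag)."""
    return float(np.corrcoef(h1, h2)[0, 1])

def train(samples):
    """samples: disease -> list of (n_pixels, 10) arrays holding the
    ten-channel values of each training image's symptom pixels in [0, 1]."""
    # Reference histograms: disease -> (10, N_BINS) array.
    refs = {d: np.array([histogram(np.concatenate([img[:, ch] for img in imgs]))
                         for ch in range(10)])
            for d, imgs in samples.items()}

    # For each disease pair, keep the channel whose reference histograms
    # correlate the least, i.e. discriminate best.
    best = {(d1, d2): int(np.argmin([correlation(refs[d1][ch], refs[d2][ch])
                                     for ch in range(10)]))
            for d1, d2 in combinations(sorted(refs), 2)}

    # Consistency: average correlation between each selected reference
    # histogram and the individual training images of the same disease.
    consistency = {}
    for (d1, d2), ch in best.items():
        for d in (d1, d2):
            corrs = [correlation(histogram(img[:, ch]), refs[d][ch])
                     for img in samples[d]]
            consistency[(d, (d1, d2))] = float(np.mean(corrs))
    return refs, best, consistency
```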

2.2.3. Core

The core of the algorithm is the part where the actual disease identification is performed.


Table 2 – Channel with the best discriminative capabilities for each pair of corn diseases. Columns follow the same order as the rows.

Anthrac.              –  H  K  b  V  K  L  V  K  V
Bushy stunt           H  –  M  b  M  M  M  M  M  M
Tropical rust         K  M  –  S  K  Y  b  H  H  Y
Southern corn rust    b  b  S  –  S  S  Y  Y  M  Y
Scab                  V  M  K  S  –  a  K  a  H  Y
S. corn leaf blight   K  M  Y  S  a  –  Y  Y  H  Y
Phaeosp. Leaf Spot    L  M  b  Y  K  Y  –  a  H  H
Dip. leaf streak      V  M  H  Y  a  Y  a  –  H  Y
Ph. brown spot        K  M  H  M  H  H  H  H  –  Y
N. Leaf Blight        V  M  Y  Y  Y  Y  H  Y  Y  –


After an image goes through the basic part of the algorithm, the intensity histograms for all 10 resulting channels are calculated. In the following, the cross-correlations X_{c,d} between those intensity histograms and the reference ones are calculated, where c indicates the colour channel and d indicates the disease, resulting in 100 correlation values.

In the next step, each pair of diseases is analysed as an independent problem. For each pair, the two corresponding cross-correlations X_{c,d} are selected, where c is given by Table 2 and d corresponds to the two diseases in that pair. The following correlation differences are then calculated:

CD_{d1} = X_{c_{d1,d2}, d1} − X_{c_{d1,d2}, d2}    (1)

CD_{d2} = X_{c_{d1,d2}, d2} − X_{c_{d1,d2}, d1}    (2)

where d1 and d2 are the first and second diseases in the pair, respectively, and c_{d1,d2} is the colour channel corresponding to the (d1,d2) disease pair. The larger the correlation difference CD for a given disease, the stronger the indication that the symptoms are more closely related to that disease, and vice versa. CD_{d1} and CD_{d2} are then stored in the correlation difference vectors v1 and v2. The same procedure is repeated for all pairs of diseases, so the vector corresponding to each disease will have nine correlation difference values.

The next step is the calculation of the likelihood that the

symptoms were produced by each of the diseases, according

to:

L_d = [ Σ_{i∈D, i≠d} v_{d,i} · c_{d,i} ] / [ Σ_{i∈D, i≠d} c_{d,i} ]    (3)

where L is the likelihood, d indicates the current disease, D is

the set of all diseases, and c are the consistency values

calculated in the training part. The index (d,i) indicates that

the value corresponds to the pair containing the current disease d and disease i, with i ∈ D.

Finally, all diseases are ranked from the highest to the

lowest likelihood. Due to the nature of the calculations, likelihood values larger than one and smaller than zero are

possible, in which case they are rounded to one and zero,

respectively. An interpretation of this ranking and the corresponding likelihood values is presented in Section 2.3.
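A condensed sketch of this core step (again our own illustration, reusing the hypothetical `histogram` and `correlation` helpers and the `refs`, `best` and `consistency` structures from the training sketch above):

```python
import numpy as np

def rank_diseases(pixels, refs, best, consistency):
    """Rank diseases for one segmented symptom image.

    pixels: (n_pixels, 10) array of ten-channel symptom values in [0, 1].
    Returns a list of (disease, likelihood) sorted from most to least likely.
    """
    diseases = sorted(refs)

    # Intensity histogram of the image in each of the ten channels, and its
    # cross-correlation X[(ch, d)] with every reference histogram.
    img_hists = [histogram(pixels[:, ch]) for ch in range(10)]
    X = {(ch, d): correlation(img_hists[ch], refs[d][ch])
         for d in diseases for ch in range(10)}

    # Correlation differences (Eqs. 1 and 2), accumulated per disease and
    # weighted by the consistency values (Eq. 3).
    num = {d: 0.0 for d in diseases}
    den = {d: 0.0 for d in diseases}
    for (d1, d2), ch in best.items():
        cd1 = X[(ch, d1)] - X[(ch, d2)]
        for d, cd in ((d1, cd1), (d2, -cd1)):
            w = consistency[(d, (d1, d2))]
            num[d] += cd * w
            den[d] += w

    # Likelihoods clipped to [0, 1] and sorted from highest to lowest.
    likelihood = {d: min(1.0, max(0.0, num[d] / den[d])) for d in diseases}
    return sorted(likelihood.items(), key=lambda kv: kv[1], reverse=True)
```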

2.3. Test setup and validation

As stated before, approximately 30% of the images in the

database were used in the tests. Each image was processed

following the diagram shown in Fig. 1, excluding the training

part, which was performed prior to the tests. The training and

test sets were defined randomly. This means that at least

some of the images in the test set may have been captured

under conditions that were not considered in the training.

This was done deliberately: because the image database used in this work does not cover all possible practical conditions, it was important to determine how robust the algorithm is when faced with new situations.

The results presented for each plant consist of a confusion matrix built considering the disease with the highest likelihood. The confusion matrix reveals both the accuracy of the algorithm (main diagonal) and which diseases have a higher degree of similarity, increasing the error rates. Each confusion matrix has an associated error analysis table, which takes all misclassifications and counts the number of times the correct disease appears in each position of the ranking. This aims to qualify the mistakes made by the algorithm: if the correct disease was classified in the second place of the ranking, for example, this means that the algorithm almost got it right, and some kind of flexible classification could be applied to relativize the results, for example by considering that any disease whose calculated likelihood is above a certain threshold may be the correct one, even if it is not the first in the ranking; on the other hand, if the correct disease was placed last in the ranking, this means that the algorithm provided a completely wrong estimation.
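Such a flexible rule could be as simple as the filter below over the ranking produced by the core sketch above (the function name and threshold value are purely illustrative):

```python
def candidate_diseases(ranking, threshold=0.8):
    """Keep every disease whose likelihood is at least the threshold."""
    return [disease for disease, likelihood in ranking if likelihood >= threshold]
```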

A comparison with other algorithms and methods proposed in the literature is also presented, always taking into consideration that none of those methods was developed to deal with a number of diseases as large as that considered in this work. Since the segmentation of the leaves and symptoms is not a trivial task, and eventual segmentation flaws may lead to classification errors, a manual version of the proposed algorithm was also tested in order to quantify the impact of segmentation errors.

Table 4 – Distribution of the errors on the disease ranking for common bean.

Position 2 3 4 5 6 7 8 9 10 11 12

% each position 43 21 4 8 11 11 0 2 0 0 0

Table 5 – Confusion matrix for cassava diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6

1 52.9 0.0 0.0 17.6 11.8 17.6

2 12.5 37.5 0.0 12.5 0.0 37.5

3 0.0 0.0 100.0 0.0 0.0 0.0

4 9.1 0.0 63.6 27.3 0.0 0.0

5 0.0 0.0 0.0 0.0 100.0 0.0

6 0.0 0.0 0.0 0.0 0.0 100.0

Legend: 1. Bacterial blight; 2. White leaf spot; 3. Cassava common mosaic; 4. Cassava vein mosaic; 5. Cassava ash; 6. Blight leaf spot.

Table 6 – Distribution of the errors on the disease ranking for cassava.

Position 2 3 4 5 6

% each position 52 14 29 5 0

Table 7 – Confusion matrix for citrus diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6 7 8

1 40.0 0.0 20.0 0.0 0.0 20.0 20.0 0.0

2 0.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0

3 0.0 0.0 50.0 0.0 25.0 0.0 12.5 12.5

4 33.3 0.0 0.0 66.7 0.0 0.0 0.0 0.0

5 16.7 0.0 33.3 0.0 41.7 0.0 8.3 0.0

6 20.0 0.0 0.0 0.0 0.0 80.0 0.0 0.0

7 0.0 0.0 25.0 0.0 12.5 0.0 62.5 0.0

8 0.0 0.0 50.0 0.0 0.0 0.0 0.0 50.0

Legend: 1. Algal spot; 2. Alternaria brown spot; 3. Canker; 4. Sooty mold; 5. Leprosis; 6. Bacterial spot; 7. Greasy spot; 8. Scab.

3. Results and discussion

3.1. Results by plant species

This section presents the results for each plant species, in alphabetical order. The comparison with other methods and the general discussions are

presented in the next subsections.

Table 3 shows the confusion matrix obtained for common

beans, and Table 4 shows the respective ranking distribution

for the correct diseases when not ranked first.

The overall accuracy of the algorithm for common beans

was 50%, which is a relatively good result given that 12

different diseases were considered. In addition, for almost two thirds of the errors the correct disease was placed in one of the first three positions of the ranking, meaning that the algorithm provided reasonably good estimates in more than 80%

of the cases.

Table 5 shows the confusion matrix obtained for cassava,

and Table 6 shows the respective ranking distribution for the

correct diseases when not ranked first.

The overall accuracy of the algorithm for cassava was 46%,

with most errors coming from the similarities between the

viral diseases and between white and blight leaf spots. The

correct disease was ranked first or second in 74% of the

cases.

Table 7 shows the confusion matrix obtained for citrus

trees, and Table 8 shows the respective ranking distribution

for the correct diseases when not ranked first.

The overall accuracy of the algorithm for citrus was 56%,

which is also a good result with eight different diseases being

considered. Additionally, the correct disease was ranked first

to third in 89% of the cases.

Table 9 shows the confusion matrix obtained for coconut

trees, and Table 10 shows the respective ranking distribution

for the correct diseases when not ranked first.

Table 3 – Confusion matrix for common bean diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6 7 8 9 10 11 12

1 15.4 0.0 7.7 38.5 0.0 0.0 0.0 23.1 0.0 0.0 0.0 15.4

2 0.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3 0.0 0.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

4 4.0 0.0 0.0 60.0 0.0 0.0 0.0 28.0 0.0 0.0 0.0 8.0

5 0.0 0.0 0.0 0.0 50.0 0.0 0.0 0.0 50.0 0.0 0.0 0.0

6 0.0 0.0 0.0 0.0 0.0 80.0 0.0 20.0 0.0 0.0 0.0 0.0

7 0.0 0.0 8.7 26.1 0.0 0.0 52.2 0.0 0.0 0.0 0.0 13.0

8 0.0 0.0 50.0 0.0 0.0 0.0 0.0 50.0 0.0 0.0 0.0 0.0

9 0.0 14.3 0.0 0.0 0.0 28.6 14.3 0.0 42.9 0.0 0.0 0.0

10 0.0 0.0 12.5 12.5 0.0 0.0 0.0 12.5 0.0 62.5 0.0 0.0

11 0.0 0.0 55.6 11.1 0.0 0.0 11.1 11.1 0.0 0.0 11.1 11.1

12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 42.9 0.0 57.1

Legend: 1. Anthracnose; 2. Cercospora leaf spot; 3. Angular mosaic; 4. Common bacterial blight; 5. Rust; 6. Hedylepta indicata; 7. Target leaf spot; 8. Bacterial brown spot; 9. Web blight; 10. Powdery mildew; 11. Bean golden mosaic; 12. Phytotoxicity.


Table 8 – Distribution of the errors on the disease ranking for citrus trees.

Position 2 3 4 5 6 7 8

% each position 45 30 0 10 10 5 0

Table 9 – Confusion matrix for coconut tree diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6 7

1 80.0 0.0 0.0 0.0 0.0 20.0 0.0

2 0.0 100.0 0.0 0.0 0.0 0.0 0.0

3 0.0 0.0 100.0 0.0 0.0 0.0 0.0

4 9.7 0.0 0.0 64.5 25.8 0.0 0.0

5 0.0 0.0 6.1 18.2 72.7 3.0 0.0

6 0.0 0.0 0.0 20.0 0.0 80.0 0.0

7 50.0 0.0 0.0 0.0 0.0 0.0 50.0

Legend: 1. Aspidiotus destructor; 2. Bipolaris; 3. Lixa grande; 4. Lixa pequena; 5. Cylindrocladium leaf spot; 6. Whitefly; 7. Phytotoxicity.

Table 10 – Distribution of the errors on the disease ranking for coconut trees.

Position 2 3 4 5 6 7

% each position 74 4 9 0 13 0

Table 11 – Confusion matrix for coffee diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6

1 50.0 33.3 0.0 8.3 8.3 0.0

2 11.4 51.4 8.6 5.7 20.0 2.9

3 11.8 0.0 52.9 0.0 35.3 0.0

4 0.0 3.2 0.0 51.6 6.5 38.7

5 12.5 0.0 25.0 0.0 62.5 0.0

6 0.0 4.8 0.0 33.3 4.8 57.1

Legend: 1. Leaf miner; 2. Brown eye spot; 3. Leaf rust; 4. Bacterial blight; 5. Blister spot; 6. Brown leaf spot.

Table 12 – Distribution of the errors on the disease ranking for coffee.

Position 2 3 4 5 6

% each position 55 17 17 9 2


The overall accuracy of the algorithm for coconut trees was

71%, and the correct disease was ranked first or second in 92%

of the cases, which is a very good result for seven diseases.

Table 11 shows the confusion matrix obtained for coffee,

and Table 12 shows the respective ranking distribution for the

correct diseases when not ranked first.

The overall accuracy of the algorithm for coffee was 53%,

and the correct disease was ranked first or second in 80% of

the cases. The results for all coffee diseases were similar, with

accuracies ranging from 50 to 65%.

Table 13 shows the confusion matrix obtained for corn, and

Table 14 shows the respective ranking distribution for the

correct diseases when not ranked first.

The overall accuracy of the algorithm for corn was 40%,

and the correct disease was placed in one of the first three

positions of the ranking in 78% of the cases. The relatively

poor performance of the algorithm for corn when compared

with other species is due to a number of factors: large number of diseases, many images captured under very poor conditions, and many diseases with very similar characteristics.

Table 15 shows the confusion matrix obtained for cotton,

and Table 16 shows the respective ranking distribution for the

correct diseases when not ranked first.

The overall accuracy of the algorithm for cotton was 76%,

and the correct disease was placed first or second in 91% of the cases. With only three diseases to consider, the algorithm was able to provide a better performance, although Myrothecium leaf spot could not be properly characterized using

the images present in the database, causing the high error

rates observed for this disease.

Table 17 shows the confusion matrix obtained for grapevines, and Table 18 shows the respective ranking distribution

for the correct diseases when not ranked first.

The overall accuracy of the algorithm for grapevines was

58%, and the correct disease was ranked first to third in 79% of

the cases. The algorithm was not able to characterize rust properly using the images present in the database.

Table 19 shows the confusion matrix obtained for passion

fruit, and Table 20 shows the respective ranking distribution

for the correct diseases when not ranked first.

The overall accuracy of the algorithm for passion fruit was

56%, and the correct disease was ranked first to third in 90% of

the cases. The algorithm failed to correctly model and identify

Cercospora spot.

Table 21 shows the confusion matrix obtained for soybean,

and Table 22 shows the respective ranking distribution for the

correct diseases when not ranked first.

The overall accuracy of the algorithm for soybean was 58%,

and the correct disease was placed in one of the first three

positions of the ranking in 88% of the cases. The results for

soybean plants were consistently good, especially considering

that 10 diseases were considered. The only exception was

brown spot, which had characteristics too similar to other

diseases in the database.

Table 23 shows the confusion matrix obtained for sugarcane, and Table 24 shows the respective ranking distribution

for the correct diseases when not ranked first.

The overall accuracy of the algorithm for sugarcane was

59%, and the correct disease was ranked first or second in 78%

of the cases. The results for sugarcane were poorer than expected, since only four diseases were considered. This was probably due to the imbalance in the number of images

available for each disease, which caused the algorithm to

become biased.

Table 25 shows the confusion matrix obtained for wheat,

and Table 26 shows the respective ranking distribution for the

correct diseases when not ranked first.

The overall accuracy of the algorithm for wheat was 70%,

and the correct disease was ranked first or second in 83% of

the cases. The algorithm successfully captured the characteristics of all diseases except rust, due to its mild features in

the images present in the database.


Table 13 – Confusion matrix for corn diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6 7 8 9 10

1 28.6 0.0 0.0 0.0 28.6 0.0 42.9 0.0 0.0 0.0

2 0.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3 0.0 0.0 21.4 0.0 7.1 0.0 42.9 7.1 21.4 0.0

4 0.0 0.0 0.0 13.3 60.0 13.3 6.7 6.7 0.0 0.0

5 0.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0 0.0 0.0

6 9.3 2.3 0.0 4.7 32.6 23.3 9.3 0.0 2.3 16.3

7 0.0 0.0 0.0 0.0 29.0 0.0 67.7 0.0 0.0 3.2

8 0.0 0.0 0.0 0.0 50.0 0.0 0.0 33.3 0.0 16.7

9 0.0 0.0 25.0 0.0 25.0 12.5 12.5 0.0 25.0 0.0

10 2.2 2.2 8.7 0.0 4.3 2.2 13.0 17.4 2.2 47.8

Legend: 1. Anthracnose leaf blight; 2. Maize bushy stunt; 3. Tropical corn rust; 4. Southern corn rust; 5. Scab; 6. Southern corn leaf blight; 7. Phaeosphaeria Leaf Spot; 8. Diplodia leaf streak; 9. Brown spot; 10. Northern corn leaf blight.

Table 14 – Distribution of the errors on the disease ranking for corn.

Position 2 3 4 5 6 7 8 9 10

% each position 31 9 6 11 7 5 13 17 1

Table 15 – Confusion matrix for cotton diseases. The grey shades indicate the correct classifications.

1 2 3

1 100.0 0.0 0.0

2 66.7 14.8 18.5

3 0.0 0.0 100.0

Legend: 1. Seedling disease complex; 2. Myrothecium leaf spot; 3. Areolate mildew.

Table 16 – Distribution of the errors on the disease ranking for cotton.

Position 2 3

% each position 65 35

Table 17 – Confusion matrix for grapevine diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6

1 90.0 0.0 0.0 10.0 0.0 0.0

2 12.5 12.5 25.0 0.0 0.0 50.0

3 0.0 0.0 100.0 0.0 0.0 0.0

4 35.3 0.0 0.0 41.2 17.6 5.9

5 6.7 0.0 6.7 6.7 80.0 0.0

6 0.0 0.0 0.0 50.0 0.0 50.0

Legend: 1. Bacterial canker; 2. Rust; 3. Isariopsis leaf spot; 4. Downy mildew; 5. Powdery mildew; 6. Fanleaf degeneration.

Table 18 – Distribution of the errors on the disease ranking for grapevines.

Position 2 3 4 5 6

% each position 32 18 27 9 14

Table 19 – Confusion matrix for passion fruit diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6

1 100.0 0.0 0.0 0.0 0.0 0.0

2 25.0 25.0 0.0 25.0 0.0 25.0

3 0.0 0.0 100.0 0.0 0.0 0.0

4 10.0 0.0 0.0 55.0 30.0 5.0

5 0.0 0.0 0.0 40.0 60.0 0.0

6 0.0 10.5 21.1 5.3 5.3 57.9

Legend: 1. Anthracnose; 2. Cercospora spot; 3. Scab; 4. Bacterial blight; 5. Septoria spot; 6. Woodiness.

Table 20 – Distribution of the errors on the disease ranking for passion fruit.

Position 2 3 4 5 6

% each position 41 36 14 9 0


3.2. Comparison with other methods

A direct comparison with other methods found in the literature is difficult, because almost all of them were designed to deal with only a few diseases of specific plant species. However, in order to support such a comparison, however imperfect, two other methods were implemented based on the information contained in the publications describing the algorithms (Camargo & Smith, 2009; Phadikar et al., 2013). In order to verify how much of the error rate observed for the algorithm is due to problems in the automatic segmentation of the leaf and symptoms, the results obtained when performing the segmentations manually are also presented.

Table 27 presents the accuracies observed for each plant species using each algorithm.

3.3. Discussion

The overall accuracy of the algorithm was 58%, and the correct

disease was ranked in the top two or three diseases in about

80% of the cases. Individual accuracies varied from 40% for

corn, to 76% for cotton. Several factors played a role in the

observed error rates, as discussed in the following paragraphs.


Table 21 – Confusion matrix for soybean diseases. The grey shades indicate the correct classifications.

1 2 3 4 5 6 7 8 9 10

1 48.2 0.0 0.0 16.1 0.0 16.1 7.1 1.8 0.0 10.7

2 0.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3 1.5 0.0 69.2 0.0 9.2 16.9 3.1 0.0 0.0 0.0

4 17.4 0.0 0.0 73.9 0.0 0.0 4.3 4.3 0.0 0.0

5 0.0 0.0 9.1 0.0 54.5 27.3 9.1 0.0 0.0 0.0

6 3.2 0.0 8.1 0.0 4.8 45.2 30.6 4.8 0.0 3.2

7 0.0 0.0 0.0 0.0 0.0 50.0 50.0 0.0 0.0 0.0

8 0.0 0.0 0.0 0.0 0.0 0.0 26.1 71.7 2.2 0.0

9 1.3 0.0 0.0 0.0 0.0 0.0 1.3 32.9 64.5 0.0

10 10.0 0.0 0.0 10.0 15.0 15.0 30.0 5.0 0.0 15.0

Legend: 1. Bacterial blight; 2. Cercospora leaf blight; 3. Rust; 4. Phytotoxicity; 5. Soybean Mosaic; 6. Target spot; 7. Myrothecium leaf spot; 8. Downy mildew; 9. Powdery mildew; 10. Brown spot.

Table 22 – Distribution of the errors on the disease ranking for soybean.

Position 2 3 4 5 6 7 8 9 10

% each position 54 17 15 6 3 3 2 1 0

Table 23 – Confusion matrix for sugarcane diseases. The grey shades indicate the correct classifications.

1 2 3 4

1 83.3 16.7 0.0 0.0

2 23.3 53.5 4.7 18.6

3 50.0 0.0 50.0 0.0

4 14.3 22.4 8.2 55.1

Legend: 1. Orange rust; 2. Ring spot; 3. Red rot; 4. Red stripe.

Table 24 – Distribution of the errors on the disease ranking for sugarcane.

Position 2 3 4

% each position 45 45 10

Table 25 – Confusion matrix for wheat diseases. The grey shades indicate the correct classifications.

1 2 3 4

1 90.0 10.0 0.0 0.0

2 50.0 25.0 25.0 0.0

3 0.0 0.0 100.0 0.0

4 10.5 0.0 5.3 84.2

Legend: 1. Wheat blast; 2. Leaf rust; 3. Tan spot; 4. Powdery mildew.

Table 26 – Distribution of the errors on the disease ranking for wheat.

Position 2 3 4

% each position 38 38 24


The number of diseases considered certainly played a role,

although not as important as expected: the correlation between the number of diseases and the error rates was close to 60%.

Two other factors played a seemingly more important role:

the similarity between diseases and variations in the image

capture conditions.

All plant species considered had diseases with considerable similarity among them. This is an intrinsic and unavoidable challenge associated with disease diagnosis, a problem that affects not only computer programs, but also human experts. It was observed that, in many cases, the images in the database did not carry enough information to allow a given disease to be properly discriminated. Thus, although the problem is unavoidable, more complete and representative databases may minimize its effects.

Initial versions of the algorithm included features for

capturing shape (e.g. circularity, eccentricity, perimeter, etc.),

size and texture (Gray-Level Co-Occurrence Matrix, Local Binary Pattern) information. However, the method worked consistently better when those were not included. The problem with using shape and size information is that those may

vary considerably as the disease evolves into more severe

stages, which greatly reduces their discriminative capabilities.

In the case of texture features, they were more sensitive to

capture condition variations than anticipated, significantly

reducing their effectiveness.

Capture conditions play a very important role. Most methods are tested with images either captured in the laboratory, or captured in the field with certain precautions to avoid the presence of artefacts too difficult to be dealt with by the algorithm. Because this research aimed at testing the algorithm under conditions as close as possible to those that can be expected in the field, those kinds of precautions were not adopted. As a result, some very complicated conditions were ubiquitous throughout the images. Among those, two caused some serious difficulties in the context of this work: specular lighting, and shadowed and illuminated areas present simultaneously. Specular lighting, which is a high-intensity reflection that occurs at certain angles of view, effectively washes out any distinctive features that might be located on that part of the leaf. This effect can usually be avoided by simply altering the angle of capture and/or the position of the leaf. Also, symptoms located in areas illuminated directly by the sun will have significantly different characteristics from those in shadowed regions. If both are present simultaneously, this poses a significant challenge for the algorithm, and the error rates rise.


Table 27 – Accuracies obtained using different algorithms.

Plant species   Proposed   Proposed (manual)   Phadikar et al.   Camargo and Smith

Bean 50% 59% 48% 51%

Cassava 46% 54% 44% 39%

Citrus 56% 66% 50% 51%

Coconut tree 71% 61% 59% 63%

Coffee 53% 50% 53% 49%

Corn 40% 71% 32% 40%

Cotton 76% 74% 69% 69%

Grapevines 58% 60% 62% 55%

Passion fruit 56% 62% 52% 47%

Soybean 58% 62% 56% 58%

Sugarcane 59% 59% 47% 51%

Wheat 70% 54% 68% 68%

Overall 58% 63% 53% 53%

Fig. 4 – Example of an image with strong specular reflections and several light/shadow transitions. The corn leaf in the image is affected by Southern corn leaf blight.


Figure 4 shows an example of an image containing both specular reflections and light/shadow effects. Other factors, such as angle of capture, equipment used in the capture and image compression, had a minor impact on the overall results.

The images in the database contained leaves at various

stages of maturity, which caused some greenness variation

among the samples. The effect of this on the performance of

the algorithm was negligible, especially in comparison with

capture conditions. Only leaves with very advanced degrees of

senescence, in which the leaf's hue tends towards yellow,

caused some problems. However, there were very few images

presenting those characteristics. A more in-depth discussion

about this issue can be found in Barbedo (2016).

As can be seen in Table 27, the version of the algorithm

in which the segmentations are performed manually had a

slightly better performance. Interestingly, for some plant

species (coconut tree and coffee) the automatic version was

actually more accurate. The automatic leaf segmentation

used in this work (Cerutti et al., 2011) had almost no negative impact on the results. There are two main reasons for

this. First, the boundaries between the leaves and the

background were well defined in most of the images, even

when no screen was used to isolate the leaves. Second, most

errors experienced by the segmentation algorithm were

false negatives, that is, regions of the actual leaf were

removed in the process. As long as at least some healthy

tissue and symptoms remain, this type of error will have

little impact on the classification. The automatic symptom

segmentation, on the other hand, had a more important

negative impact due to its relatively high sensitivity to poor

capture conditions. In the majority of the cases, however,

the symptom segmentation provided by the automatic algorithm was good.

The methods proposed by Camargo and Smith (2009) and

Phadikar et al. (2013) performed slightly worse than the

proposed algorithm; however, their accuracy was surprisingly high given the limited scope under which they were

originally developed. Those methods tended to fail when

the capture conditions were not ideal, not being as robust as

the proposed method. Another advantage of the proposed

algorithm is its simplicity when compared to its counterparts, both in terms of implementation and computational

complexity.

Most methods in the literature report classification accuracies between 50% and 90%. Those achieving higher accuracies were usually tested with fewer diseases and, in most

cases, only one plant species. Those methods may not hold

such a good performance when more diseases are added or

other plant species are considered. More considerations about

this problem can be found in Barbedo (2016).

All the results shown in this section seem to point to a

ceiling in the accuracy that may be achieved by image-based

methods for automatic plant disease recognition. This is not

surprising, especially considering that even human experts

may fail under certain conditions. Thus, even with very tight

constraints, many challenges still remain. In particular, some

disorders may produce visually similar symptoms, being

almost impossible to distinguish using only digital images in the visible spectrum.

In some cases, the use of other spectral bands, such as infrared, may provide enough information to distinguish between those disorders. However, this may greatly increase the costs involved in capturing the images, and most mobile telecommunication devices are not capable of capturing images in those additional bands, which again may prevent many potential users from adopting the technology. Also, it is important to notice that some ambiguities cannot be resolved even using several spectral bands.

As a result, it seems that a complete diagnosis system

should include other modules capable of providing more information about the problem at hand. This will almost

certainly cause the system to no longer be fully automatic, but

this additional information may be necessary for a reliable

diagnosis. A possible hybrid system would couple an automatic image-based module with an expert system, which is a

computer system that emulates the decision-making ability of

a human expert (Jackson, 1999). In this case, the automatic

module would be responsible for narrowing down the set of

possible diseases.

The proposed algorithm fits such a task well, as it would be possible to apply a threshold to the assigned likelihood values that would define which diseases could possibly be present. If only one possible disease remains after this process, the expert system is not applied. However, if multiple diseases are considered as possible candidates, the expert system would be activated and the questions presented to the user would target specifically those diseases, trying to find clues that would help narrow down the search to only one possibility. If, after all this, the symptoms still cannot be reliably identified, the system could direct the user to a plant pathologist capable of performing a deeper investigation into the problem.

4. Conclusions

This paper presented a new digital image-based algorithm for

automatic plant disease identification. The algorithm was

designed to deal with several diseases, and to be easily

retrained as new diseases are included. Its histogram-based

structure makes it reasonably robust to the conditions under

which the images were captured.

Tests have shown that there is still room for improvement.

Some actions may be taken during the capture in order to

avoid many of the problems observed, such as avoiding

specular reflections and light/shadow combinations, and

using equipment with good optics and resolution. Factors like the large number of existing disorders, the heterogeneity of symptoms associated with the same disease, and symptom similarities between different disorders may require the adoption of hybrid approaches combining image processing, expert systems and other information-gathering techniques, which may be the best hope for overcoming at least some of the limitations found in practice.

Future work will focus on three fronts. First, new images will be added both for the diseases already present in the database and for diseases and other disorders that were not considered in this work. As discussed before, this will be a permanent effort, as it is unlikely that the full range of disorders and their variations will be entirely represented any time in the near future. The second front is directly related to the first, as the proposed algorithm will be continuously upgraded as new images and disorders become available. Finally, a hybrid approach combining the image-based algorithm with an expert system will be investigated as a means of overcoming some of the limitations revealed by this work. The database and the latest implementation of the algorithm will be made available at <https://www.agropediabrasilis.cnptia.embrapa.br/web/digipathos> as soon as copyright and license issues are resolved.

Acknowledgements

The authors would like to thank Fapesp (proc. 2013/06884-8) and Embrapa (SEG 03.13.00.062.00.00) for funding. The authors would also like to thank Bernardo de Almeida Halfeld-Vieira, Rodrigo Véras da Costa, Kátia de Lima Nechet, Claudia Vieira Godoy, Murillo Lobo Junior, Flávia Rodrigues Alves Patrício, Viviane Talamini, Luiz Gonzaga Chitarra, Saulo Alves Santos de Oliveira, Alessandra Keiko Nakasone Ishida, José Maurício Cunha Fernandes, Fábio Rossi Cavalcanti, Daniel Terao and Francislene Angelotti for capturing the images used in this work.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.biosystemseng.2016.03.012.

References

Barbedo, J. G. A. (2013). Digital image processing techniques for detecting, quantifying and classifying plant diseases. SpringerPlus, 2(660).

Barbedo, J. G. A. (2014). An automatic method to detect and measure leaf disease symptoms using digital image processing. Plant Disease, 98, 1709–1716.

Barbedo, J. G. A. (2016). A review on the main challenges in automatic plant disease identification based on visible range images. Biosystems Engineering, 144, 52–60.

Barbedo, J. G. A., Tibola, C. S., & Fernandes, J. M. C. (2015). Detecting Fusarium head blight in wheat kernels using hyperspectral imaging. Biosystems Engineering, 131, 65–76.

Bauriegel, E., Giebel, A., & Herppich, W. B. (2010). Rapid Fusarium head blight detection on winter wheat ears using chlorophyll fluorescence imaging. Journal of Applied Botany and Food Quality, 83(2), 196–203.

Belin, É., Rousseau, D., Boureau, T., & Caffier, V. (2013). Thermography versus chlorophyll fluorescence imaging for detection and quantification of apple scab. Computers and Electronics in Agriculture, 90, 159–163.

Camargo, A., & Smith, J. S. (2009). Image pattern classification for the identification of disease causing agents in plants. Computers and Electronics in Agriculture, 66, 121–125.

Cerutti, G., Tougne, L., Mille, J., Vacavant, A., & Coquin, D. (2011). Guiding active contours for tree leaf segmentation and identification. In Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF).

Clément, A., Verfaille, T., Lormel, C., & Jaloux, B. (2015). A new colour vision system to quantify automatically foliar discolouration caused by insect pests feeding on leaf cells. Biosystems Engineering, 133, 128–140.

Grand-Brochier, M., Vacavant, A., Cerutti, G., Bianchi, K., & Tougne, L. (2013). Comparative study of segmentation methods for tree leaves extraction. In Proceedings of the International Workshop on Video and Image Ground Truth in Computer Vision Applications. Article No. 7.

Jackson, P. (1999). Introduction to expert systems (3rd ed.). Addison-Wesley.

Kruse, O. M. O., Prats-Montalbán, J. M., Indahl, U. G., Kvaal, K., Ferrer, A., & Futsaether, C. M. (2014). Pixel classification methods for identifying and quantifying leaf surface injury from digital images. Computers and Electronics in Agriculture, 108, 155–165.

Kuckenberg, J., Tartachnyk, I., & Noga, G. (2009). Detection and differentiation of nitrogen-deficiency, powdery mildew and leaf rust at wheat leaf and canopy level by laser-induced chlorophyll fluorescence. Biosystems Engineering, 103(2), 121–128.

Lins, E. C., Belasque, J., Jr., & Marcassa, L. G. (2009). Detection of citrus canker in citrus plants using laser induced fluorescence spectroscopy. Precision Agriculture, 10, 319–330.

Mahlein, A. K., Steiner, U., Hillnhütter, C., Dehne, H. W., & Oerke, E. C. (2012). Hyperspectral imaging for small-scale analysis of symptoms caused by different sugar beet diseases. Plant Methods, 8(1), 3.

Oberti, R., Marchi, M., Tirelli, P., Calcante, A., Iriti, M., & Borghese, A. N. (2014). Automatic detection of powdery mildew on grapevine leaves by image analysis: optimal view-angle range to increase the sensitivity. Computers and Electronics in Agriculture, 104, 1–8.

Park, S.-H., & Fürnkranz, J. (2007). Efficient pairwise classification. In Proceedings of the 17th European Conference on Machine Learning (ECML-07) (pp. 658–665).

Phadikar, S., Sil, J., & Das, A. K. (2013). Rice diseases classification using feature selection and rule generation techniques. Computers and Electronics in Agriculture, 90, 76–85.

Polder, G., van der Heijden, G. W. A. M., van Doorn, J., & Baltissen, T. A. H. M. C. (2014). Automatic detection of tulip breaking virus (TBV) in tulip fields using machine vision. Biosystems Engineering, 117, 35–42.

Pourreza, A., Lee, W. S., Etxeberria, E., & Banerjee, A. (2015). An evaluation of a vision-based sensor performance in Huanglongbing disease identification. Biosystems Engineering, 130, 13–22.

Rodríguez-Moreno, L., Pineda, M., Soukupová, J., Macho, A. P., Beuzón, C. R., Barón, M., et al. (2008). Early detection of bean infection by Pseudomonas syringae in asymptomatic leaf areas using chlorophyll fluorescence imaging. Photosynthesis Research, 96, 27–35.

Zhang, J., Yuan, L., Pu, R., Loraamm, R. W., Yang, G., & Wang, J. (2014). Comparison between wavelet spectral features and conventional spectral features in detecting yellow rust for winter wheat. Computers and Electronics in Agriculture, 100, 79–87.

Zhou, R., Kaneko, S., Tanaka, F., Kayamori, M., & Shimizu, M. (2015). Image-based field monitoring of Cercospora leaf spot in sugar beet by robust template matching and pattern recognition. Computers and Electronics in Agriculture, 116, 65–79.