DSpace Institution DSpace Repository http://dspace.org Computer Science thesis 2021-02 ENSET YIELD PREDICTION MODEL USING ARTIFICIAL NEURAL NETWORK: IN CASE OF WOLAYTA ZONE, ETHIOPIA FASIKA, LACHORE http://ir.bdu.edu.et/handle/123456789/12378 Downloaded from DSpace Repository, DSpace Institution's institutional repository

ENSET YIELD PREDICTION MODEL USING ARTIFICIAL NEURAL NETWORK: IN CASE OF WOLAYTA ZONE, ETHIOPIA
BAHIR DAR UNIVERSITY
SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES
FACULTY OF COMPUTING
By
Fasika Lachore Laba
A thesis submitted to the School of Research and Graduate Studies of Bahir Dar Institute of Technology, BDU, in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering in the Faculty of Computing.
Advisor Name: Mekuanint Agegnehu (PhD)
DECLARATION
I, the undersigned, declare that this thesis comprises my own work. In compliance with internationally accepted practices, I have acknowledged and referenced all materials used in this work. I understand that non-adherence to the principles of academic honesty and integrity, or misrepresentation or fabrication of any idea, data, fact or source, will constitute sufficient ground for disciplinary action by the University and can also evoke penal action from sources that have not been properly cited or acknowledged.
Name of the student_______________________________ Signature _____________
Date of submission: ________________
Place: Bahir Dar
This thesis has been submitted for examination with my approval as a university
advisor.
School of Research and Graduate Studies
Faculty of Computing
THESIS APPROVAL SHEET
The following graduate faculty members certify that this student has successfully presented
the necessary written final thesis and oral presentation for partial fulfillment of the thesis
requirements for the Degree of Master of Science in Software Engineering.
Approved By:
ACKNOWLEDGEMENTS
First of all, I would like to express my deepest gratitude to my sponsors Mona L. Jordan, Raymond Beck, Sr Sofie OP, Mother General of the Dominican Sisters of St Catherine of Siena, Sr Annaliza Hipolito OP, Sr Cecilia OP, Sr Jennifer Abasolo OP, and all the sisters of St Catherine of Siena in Bahir Dar for their commitment to helping me until the completion of my study. This study would not have been possible without their continual support. Special thanks go to Mama Nancy and Liyod in America for providing me with a new laptop before the beginning of this course, which was the most important requirement for my entire study.
I would also like to express my sincere thanks to my passionate and capable advisor Mekuanint Agegnehu (PhD) for his continuous support of my MSc research from the very start. I will always be grateful for his patience, motivation, enthusiasm, and immense knowledge of my area of study. His supervision was very helpful throughout my research.
My sincere thanks also go to the researchers Zerihun Yemataw (PhD), Mr Yasin Goa, Mr Tesfaye Dejene, Mr Mikiyas Yeshitela, Mr Henok, Mr Worku, Mr Genene and Mr Mesgana for providing ideas on the Enset Yield Prediction model, for offering me the chance to visit the Areka Agricultural Research Center (AARC), the national center of excellence for Enset research, and for providing the most important part of this research, the historical data.
I extend my thanks and appreciation to Mr Abiyot and his coworkers, who helped a great deal in preparing the primary data for this research, not only by preparing the data but also by providing the materials needed to collect the primary data in the field at the national Enset research center in Areka-Wolaita. It is also an honor to thank Mr Yohannes Doboche, Fr Birhanu Lemma and Mr Minasie for giving me privileged access to the very advanced library and Internet facilities at their workplaces, Dubbo Catholic Primary School and Areka National Enset Research Center.
My gratitude also goes to my family members: my wife Monica Haile and her family, in particular Birkie Haile, Solomon Seyoum and the children; my parents Lachore Laba and Theressa Yohannes; my aunt Sr Waje Yohannes; my big brother Tesfaye and his wife Woinshet Zemachu; and my younger siblings Ashenaf L, Tinsae L, Lukas L, Peteros L, Etenesh L and Senait L, for supporting me in many ways.
I am very thankful to H.E. Bishop Lesanuchristos Mathewos, Fr Groum Tesfaye SJ, Fr Abenet Abebe CM, Fr Michael Mungi SJ, Br Paul Klonzo SJ and Br Ayele Shalamo SJ, the two Jesuit scholars, Sr Trufatu Beshir and all my brothers and sisters in Christ at church for your prayers, care and guidance, and for being a model for every success in my life. Your spiritual follow-up and your ways of life have made me the Fasika I am today.
Last but not least, I would like to extend my thanks to my instructors, advisors, faculty leadership, colleagues, classmates, friends and mentors for your inspiration, insightful comments and remarks, which helped me identify the study area and the research title, “ENSET YIELD PREDICTION MODEL USING ARTIFICIAL NEURAL NETWORK: IN CASE OF WOLAYTA, ETHIOPIA”.
ABSTRACT
Enset (Ensete ventricosum) is a crop on which approximately 20% of Ethiopia's total population depends for food security. Yield prediction is a major issue that remains to be solved based on available data. It is crucial for planning the food security of the population of a district or even of the whole country.
Enset yield estimation models that account for inter-clonal, age-group and harvesting-time differences and that non-destructively predict the different yield products (kocho in two forms, bulla, amicho and fiber) of an Enset plant are still lacking. Although Enset has five yields/products, the existing Enset yield estimation model is of limited use because it only works for 'kocho' yield estimation; prediction models for 'bulla', 'amicho' and fiber yields have not yet been developed. In this research, we used an Artificial Neural Network model to predict the amounts of the different Enset yields (kocho in two forms, bulla, amicho and fiber) of an Enset plant. The objective of this research is to design and develop an Enset Yield Prediction Model Using Machine Learning Algorithms: In Case of Wolayta, Ethiopia.
The Enset data are numerical and were collected from Areka Agricultural Research Center over 7 years (2013 to 2019). An exhaustive study is performed on the given dataset and algorithms. The research approach has five phases: data gathering, data pre-processing, implementation of the prediction model, model training and, finally, model evaluation.
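The five phases above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the data are synthetic stand-ins for the Areka records, the three input columns and the five output columns (standing in for kocho in two forms, bulla, amicho and fiber) are hypothetical, and scikit-learn's `MLPRegressor` stands in for the thesis's own network configuration.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Phase 1, data gathering: synthetic stand-in for the field records.
# Hypothetical inputs: plant height, pseudostem circumference, plant age.
n = 300
X = np.column_stack([
    rng.uniform(2.0, 6.0, n),
    rng.uniform(1.0, 3.0, n),
    rng.uniform(3.0, 8.0, n),
])
# Five hypothetical outputs standing in for kocho (two forms), bulla, amicho and fiber.
base = X[:, 0] * X[:, 1] ** 2 + 0.5 * X[:, 2]
Y = np.column_stack([base * k for k in (1.0, 0.8, 0.2, 0.3, 0.1)])
Y = Y + rng.normal(0.0, 0.1, Y.shape)

# Phase 2, pre-processing: hold out a test set, then fit the scalers on training data only.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(Y_train)

# Phases 3 and 4, model implementation and training: one MLP with five output neurons.
model = MLPRegressor(hidden_layer_sizes=(32, 16), solver="lbfgs", max_iter=2000, random_state=0)
model.fit(x_scaler.transform(X_train), y_scaler.transform(Y_train))

# Phase 5, evaluation: R^2 averaged over the five outputs, in the original units.
Y_pred = y_scaler.inverse_transform(model.predict(x_scaler.transform(X_test)))
test_r2 = r2_score(Y_test, Y_pred)
print(f"test R^2 averaged over five yields: {test_r2:.3f}")
```

Fitting the scalers on the training split only keeps information from the held-out records out of training, so the phase 5 score reflects unseen data.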
We built an MLP-ANN, an RF model and an Ensemble MLP-ANN, evaluated their performance, and compared the results based on the errors generated. The comparison of the three models is as follows: the RF model achieved R2, MSE and RMSE of 0.81, 0.176 and 0.419 respectively; the MLP-ANN model achieved 0.857, 0.14 and 0.374 respectively; and the newly proposed EMLP-ANN model achieved 0.92, 0.077 and 0.277 respectively.
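The three metrics reported above can be computed as follows. This is a minimal sketch of the standard definitions (R2 = 1 - SS_res/SS_tot, MSE as the mean squared residual, RMSE as the square root of MSE); the function name `regression_metrics` and the small example arrays are hypothetical. Because RMSE is the square root of MSE, each reported MSE/RMSE pair can also be cross-checked; they agree to within the rounding of the published figures.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, MSE, RMSE) for arrays of observed and predicted yields."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))                    # mean squared error
    rmse = float(np.sqrt(mse))                              # same units as the yield
    ss_res = float(np.sum(residuals ** 2))                  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, rmse


# Small hypothetical example.
r2, mse, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(f"R2={r2:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}")  # R2=0.980, MSE=0.025, RMSE=0.158

# Cross-check: sqrt(MSE) against the RMSE reported for each model.
reported = [("RF", 0.176, 0.419), ("MLP-ANN", 0.14, 0.374), ("EMLP-ANN", 0.077, 0.277)]
for name, reported_mse, reported_rmse in reported:
    print(f"{name}: sqrt(MSE)={np.sqrt(reported_mse):.3f}, reported RMSE={reported_rmse}")
```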
The study results show that using a stacking ensemble of MLP-ANNs yields better predictions. These results could be further improved by combining more advanced machine learning algorithms.
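A stacking ensemble of MLPs can be sketched with scikit-learn's `StackingRegressor`: several MLPs (level-0 learners) are trained, and a level-1 learner combines their out-of-fold predictions. This is an illustrative sketch only, assuming synthetic data, three hypothetical MLP architectures and a linear final estimator; the thesis's actual EMLP-ANN configuration is not given in this section.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 3))  # hypothetical scaled vegetative inputs
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + rng.normal(0.0, 0.05, 400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Level-0 learners: MLPs that differ in architecture and random seed.
base_learners = [
    (f"mlp_{i}", MLPRegressor(hidden_layer_sizes=h, solver="lbfgs", max_iter=2000, random_state=i))
    for i, h in enumerate([(16,), (32,), (16, 8)])
]

# Level-1 learner: a linear model fitted on 5-fold out-of-fold MLP predictions.
ensemble = StackingRegressor(estimators=base_learners, final_estimator=LinearRegression(), cv=5)
ensemble.fit(X_train, y_train)

stacked_r2 = r2_score(y_test, ensemble.predict(X_test))
print(f"stacked R^2 on held-out data: {stacked_r2:.3f}")
```

Using out-of-fold predictions for the level-1 learner prevents it from simply trusting whichever MLP overfits the training data most.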
Keywords: Enset Yield Prediction, ANN, MLP Neural Network, Ensemble-ANN, Backpropagation, Ensete ventricosum, Enset, KOCHO, BULLA, FIBER, AMICHO
1.5.1. General Objective
1.5.2. Specific objectives
1.7. Significance of the study
1.8. Thesis Organization
2.1.1. Neuron (Node)
2.3. Artificial Neural Network in Agriculture
2.4. Crop Yield Prediction
2.5. Background of Enset (Ensete ventricosum (Welw.) Cheesman)
2.7. Enset (Ensete ventricosum) varieties in Wolaita Zone
2.8. Health Benefits of Enset
2.9. Approaches to this Study/Related works
2.10. Summary of Related Works
2.11. Gaps in Previous Study
2.12. Recommendation from previous Study
CHAPTER THREE
3. METHODOLOGY
3.1.3. Designing and Development
3.1.4. Demonstration
3.1.5. Evaluation
3.1.6. Communication
3.3. Data Analysis
3.3.1. Data Collection
3.3.2. Data Description
3.4. Data Preprocessing
3.4.2. Data integration
3.4.3. Data cleaning
3.6. Prediction Model Skeleton
3.7. Proposed Enset Yield Prediction Model
3.8. Ensemble Multilayer Perceptron Neural Network (EMLP-NN)
3.8.1. Input Layer
3.11.1. Python 3.7
3.11.3. NumPy
3.11.4. Anaconda
3.11.5. Pandas
3.11.6. Scikit-learn (sklearn)
3.11.7. TensorFlow
3.11.8. Keras
CHAPTER FOUR
4.1. Experimental Simulation for the Model
4.1.1. Training phase
4.1.2. Testing phase
4.3.2. R-squared (R2) as Model Evaluation Metrics
CHAPTER FIVE
5.1. Conclusions
5.2. Contributions
5.3. Recommendations
SNNPRS Southern Nations, Nationalities and Peoples Regional State
GDP Gross Domestic Product
ANN Artificial Neural Network
BGCORM Corm weight before grating (kg)
BWT Bulla Weight (kg)
EMLP-ANN Ensemble Multilayer Perceptron Artificial Neural Network
RF Random Forest
R2 R-Squared
Figure 2-2: Operations at one neuron in ANN
Figure 2-3: Multilayer Artificial Neural Network (Patterson & Gibson, 2017)
Figure 2-4: Flow Chart for Back Propagation Algorithm (Kim & Seo, 2018)
Figure 3-1: Data Preprocessing Stages (Kung et al., 2016)
Figure 3-2: Tabular representation of Dataset
Figure 3-3: Visualized Input dataset
Figure 3-4: Visualized output dataset
Figure 3-5: Skeleton for Yield Prediction Model for Enset
Figure 3-6: MLP-ANN Architecture
Figure 3-7: Proposed Ensemble MLP-ANN Model
Figure 3-8: Learning Process Artificial Neural Network
Figure 3-9: Process flow diagram
Figure 3-10: Spyder python development environment
Figure 4-1: Optimization of Backpropagation algorithm
Figure 4-2: Training and validating loss of MLP-ANN model
Figure 4-3: Graphical view of losses of MLP-ANN Model
Figure 4-4: System generated Architecture for Ensemble MLP-ANN Model
Figure 4-5: Graphical view of losses of EMLP-ANN Model
Figure 4-6: Graphical view of losses of EMLP-ANN Model
Table 3-1: Quantitative parameters of the Enset data
Table 3-2: Input Parameters of the Enset
Table 3-3: Output Parameters of the Enset
Table 3-4: List of Material used to collect primary data
Table 4-1: Evaluation of Enset Yields prediction models
Table 4-2: Evaluation of models in terms of individual yield
CHAPTER ONE
1. INTRODUCTION
1.1. Background
Agriculture is the main source of national income for most developing countries (Mohan & Patil, 2017b). Agriculture in Ethiopia is the largest component of its economy and employs the majority of the Ethiopian population, most of whom are smallholder farmers practicing subsistence farming on less than one hectare of land. Ethiopian agriculture is subsistence-oriented and depends on unpredictable and erratic rainfall.
The agricultural system is very complex because it involves large amounts of data arising from many factors. Many techniques and approaches have been used to identify interactions between the factors affecting yield and crop performance. Applying neural networks to non-linear and complex systems is promising (Bejo & Mustaffha, 2014).
Agriculture is the livelihood of more than 90% of the population in rural areas. Enset is an essential element of the Wolayita food economy and acts as a staple or co-staple food. Where land is very scarce and cereal harvests are consequently low, high-yielding Enset offers some opportunity for food security. Enset is also popular because of its drought-resistant properties (Zengele, 2017).
Agricultural management needs simple and accurate estimation techniques to predict yields in the planning process (S. S. Dahikar, Extc, & College, 2015). Most farmers rely on their long-term experience with particular crops in the field to expect a higher yield in the next harvesting period. Two important approaches to yield prediction have been distinguished: the first uses traditional mathematical models, and the second applies artificial intelligence.
for Ethiopia that ensures year-round food and feed security, traditional medicine and fiber
(Brandt & Mccabe, 1997). The Enset cultivation system is economically viable and well adapted to Ethiopian agricultural systems. Every part of the plant can be used in one way or another. Farmers often say that Enset is their food, clothing, house, bed, cattle feed and plate (Tsegaye & Struik, 2003).
Enset (Ensete ventricosum), commonly known as the Ethiopian banana, Abyssinian banana, false banana, Enset or Ensete, is a herbaceous species of flowering plant in the banana family Musaceae. Enset is the main crop of a sustainable and indigenous cropping system in Ethiopia that ensures food security for several million people (Yesuf & Hunduma, 2012a). Approximately 20% of Ethiopia's total population depends on Enset for food security. Moreover, different parts of Enset are widely used as feed, fiber and construction material (Yesuf & Hunduma, 2012a).
According to (Ayalew & Yeshitila, 2011), Enset has three major food products, commonly known as Kocho, Bulla and Amicho. Kocho is a fermented product made from the scraped parenchymatic tissue of the leaf sheaths and the pulverized corm. Bulla is made by dehydrating the juice arising from the mixture of scraped parenchymatic tissue of leaf sheaths, pulverized corm and grated stalk of the inflorescence. Amicho is the stripped corm of younger Enset plants, which is boiled and eaten. Apart from its multipurpose use, the Enset plant has cultural and socioeconomic value, mainly in the southern and south-western parts of Ethiopia.
Yield prediction is a very important issue in agriculture. Every farmer is interested in knowing how much yield to expect. In the past, yield prediction was based on the farmer's experience with a particular field and crop. Yield prediction is a major issue that remains to be solved based on available data (Manjula, 2017).
Assessing the usable yield of Enset, however, is difficult because of complicated production methods and processing procedures. Enset is a perennial, and the vegetatively propagated planting material is transplanted yearly through several nurseries until it is finally planted in a part of the field where it matures until harvest (Tsegaye & Struik, 2001a).
According to (Zerihun et al., 2016), Areka Agricultural Research Centre, Ethiopia, hosts the coordination of the National Enset Improvement Program and is situated in the heart of one of the major Enset-producing areas of the country.
Agriculture, as the backbone of many developing economies (especially Ethiopia), provides a substantial portion of their Gross Domestic Product (GDP) (Manjula, 2017). Thus, the ability to obtain reasonably accurate yield predictions before harvest is important, since timely interventions can take place if low yields are predicted.
Better predictions can be achieved through models that consider the factors affecting crop growth and yield in a year of interest. Accurate information about the history of crop yield is important for making decisions related to agricultural risk management. Crop yield prediction is therefore an important agricultural problem, and this research focuses on the development of a prediction model that can be used to predict Enset yield.
Some previous works predict certain Enset yields, such as kocho, using regression models. According to (Bejo & Mustaffha, 2014), combining advanced technology with agriculture to improve crop yield has recently attracted growing interest. (Bejo & Mustaffha, 2014) add that ANN has become a popular method among authors because of its prediction, forecasting and classification abilities in the biological sciences.
Several previous researchers have applied such methods: (B. & Louella, 2018) developed Bitter Melon Crop Yield Prediction using Machine Learning Algorithm, (Kung, Kuo, et al., 2016) developed an Accuracy Analysis Mechanism for Agriculture Data Using the Ensemble Neural Network Method, and (Mohan & Patil, 2017a) designed a model for Crop Cost Forecasting using an Artificial Neural Network with the feed-forward back-propagation method. Others, such as (Ramesh & Vishnu, 2015), (Manjula, 2017), (Prasad, Chai, Singh, & Kafatos, 2006), (Sahle, Yeshitela, & Saito, 2018), and (Haile, 2014b), used machine learning and data mining algorithms to design and analyze crop prediction models.
1.2. Research Motivation
The motivation behind this study was the study area itself, particularly AARC, which hosts national research on Enset. The researcher comes from this area, where farmers' lives depend almost entirely on Enset products. Crop yield prediction is a general problem: farmers are curious to know how much yield they can expect, yet Enset yield estimation models are not yet available. This motivated the researcher to design a model for Enset yield prediction using machine learning algorithms.
1.3. Problem Statement
The value of a crop yield prediction model, as described by (Menaka, 2017), lies in improving crop marketing and planning, crop field-level investment, production planning, crop production inputs and crop field operations, and in mitigating negative soil impact. Although many crops are cultivated in Ethiopia, Enset among them, the existing Enset yield prediction model is not sufficiently predictive. According to (Yesuf & Hunduma, 2012a), attempts were made to develop a regression model that predicts Enset yield non-destructively and with better precision, simplifying yield evaluation in experiments and easing the difficulty of estimating kocho yield when assessing the production balance in the country's Enset-producing regions. However, Enset yield also has a non-linear relationship with critical input parameters that the regression model does not consider. Hence, in this study a non-linear model, an Artificial Neural Network (ANN), is used to predict Enset yields more accurately than the regression model.
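The argument that a non-linear model can capture relationships a linear regression misses can be illustrated on synthetic data. In this sketch the hump-shaped relationship between a single hypothetical input and the output is an assumption chosen only to make the contrast visible; it is not a claim about how Enset yield actually behaves, and the network settings are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
# Hypothetical scaled input and a hump-shaped (non-linear) response plus noise.
x = rng.uniform(0.0, 1.0, size=(300, 1))
y = 4 * x[:, 0] * (1 - x[:, 0]) + rng.normal(0.0, 0.05, 300)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.25, random_state=2)

# A straight line cannot follow the rise and fall of the response.
linear = LinearRegression().fit(x_tr, y_tr)
# A small MLP with a tanh hidden layer can approximate the curve.
ann = MLPRegressor(hidden_layer_sizes=(16,), activation="tanh", solver="lbfgs",
                   max_iter=2000, random_state=2).fit(x_tr, y_tr)

r2_linear = r2_score(y_te, linear.predict(x_te))
r2_ann = r2_score(y_te, ann.predict(x_te))
print(f"linear R^2 = {r2_linear:.2f}, ANN R^2 = {r2_ann:.2f}")
```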
Several researchers have developed models to estimate Enset yield in Ethiopia. To reduce confusion around yield and production estimates, (Tsegaye & Struik, 2001b) developed a linear model for predicting Enset plant yield and assessing kocho production in Ethiopia. At the Areka Agricultural Research Center in Southern Ethiopia, (Tsegaye & Struik, 2003) investigated the kocho yield of Enset, in terms of weight and energy, under different crop establishment methods. By considering different clones and using multiple regression models, (Haile, 2014b) further developed simple linear models and investigated fermented unsqueezed kocho as a function of Enset plant height and pseudostem circumference measurements (Sahle et al., 2018).
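A linear model of the kind described above, with kocho yield as a function of plant height and pseudostem circumference, can be fitted by ordinary least squares. This is a minimal sketch on synthetic data; the measurement ranges and the coefficients used to generate the data are hypothetical and are not the published estimates.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
height = rng.uniform(2.0, 6.0, n)         # hypothetical plant height
circumference = rng.uniform(1.0, 3.0, n)  # hypothetical pseudostem circumference
# Hypothetical "true" relationship, used only to generate example data.
kocho = 5.0 + 4.0 * height + 10.0 * circumference + rng.normal(0.0, 0.5, n)

# Design matrix with an intercept column: kocho ~ b0 + b1*height + b2*circumference
A = np.column_stack([np.ones(n), height, circumference])
coef, *_ = np.linalg.lstsq(A, kocho, rcond=None)
b0, b1, b2 = coef
print(f"kocho ~ {b0:.2f} + {b1:.2f}*height + {b2:.2f}*circumference")
```

Such a model is additive by construction: each unit of height adds the same amount of kocho regardless of circumference, which is the limitation the ANN approach is meant to relax.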
According to (Struik, 2003), yield data on Enset are very scarce. (Struik, 2003) adds that there is also a lack of knowledge of the physiological parameters of Enset that determine crop growth, and of how these parameters develop and affect growth under field conditions in which other factors are highly variable.
The study by (Yesuf & Hunduma, 2012a) suggested that Enset yield estimation models that account for inter-clonal, age-group, agro-ecological and harvesting-time differences and that non-destructively predict the different yield products (kocho, bulla, amicho and fiber) of an Enset plant are still lacking.
The existing Enset yield estimation model is of limited use because it only works for 'kocho' yield estimation; models for 'bulla', 'amicho' and fiber yields have not yet been developed. As (Yuvaraj, 2016) stated, one difficulty in the prediction process is that most of the essential parameters necessary for accurate prediction are not considered.
In summary, this research develops an Artificial Neural Network model for Enset yield prediction that accounts for inter-clonal, age-group and harvesting-time differences in order to predict the different yield products (kocho, bulla, amicho and fiber) of an Enset plant.
1.4. Research Questions
More specifically, this study addresses the following four research questions:
1. What are the parameters necessarily used to build Enset Yield Prediction model using
machine learning algorithms?
2. What are the appropriate methods and techniques for Enset data processing and
structuring?
3. How can neural networks be integrated to improve the performance of the Enset Yield Prediction model on learning problems?
4. What is the performance of the newly developed Enset Yield Prediction model?
1.5. Objective of the study
In this section, the general and specific objectives of this research are described.
1.5.1. General Objective
The general objective of this research is to design and develop an Enset Yield Prediction Model using machine learning algorithms to predict Enset yields (KOCHO in two forms, BULLA, CORM (AMICHO) and FIBER, the by-product) from vegetative parameters of the Enset plant.
1.5.2. Specific objectives
In order to achieve the overall stated objective of this research work, we have formulated the
following specific objectives:
To explore the existing yield prediction models for the Enset plant.
To identify the vegetative determinant factors of Enset yields.
To apply machine learning algorithms to Enset data analysis and structuring.
To design and develop an Enset yield prediction model.
To evaluate the performance of the newly developed model.
1.6. Scope of the study and Limitation
The main scope of this study is to develop an Enset yield prediction model using machine learning algorithms. The data for this study were collected from Areka Agricultural Research Center in Wolaita Zone. The study is limited to designing an Enset yield prediction model that could be used as a baseline for implementation. The researchers did not go further to implement the proposed Enset yield prediction model because of limited time and financial resources.
1.7. Significance of the study
The proposed study is significant in the following respects.
This research helps farmers and growers make planting decisions, set appropriate food reserve levels and improve risk management of crop-related derivatives.
It saves farmers waiting time, since with Enset it takes a long time after harvesting to know the amounts of the products.
The study contributes to increasing farmers' income by predicting the amounts of the yields before they are collected.
It also helps farmers and the agriculture sector address food security challenges and plan the next planting.
The study could help the research center in Areka understand Enset yield prediction with respect to inter-clonal, age-group and harvesting-time differences.
It could also help maximize Enset yield, since selecting the appropriate Enset to plant plays a vital role.
The output of this research could help future scholars who want to do research in the same area and serve as a base for further improving the model or the techniques used in this study.
1.8. Thesis Organization
The remaining part of the thesis is organized as follows. Chapter two presents a literature review covering the definition of Artificial Neural Networks, ANN applications in agriculture, crop yield prediction and analysis, Enset benefits and varieties, recommendations from previous studies, and gaps identified in previous studies. Chapter three discusses the methodology used to accomplish this thesis, including data collection, data preprocessing, the proposed model design and prediction methods. Chapter four discusses the results of the designed model and analyzes and interprets the experimental results. Chapter five summarizes the conclusions and recommendations of the study for future work.
CHAPTER TWO
2. LITERATURE REVIEW
This chapter gives a brief overview of Artificial Neural Networks and their application in agriculture. The algorithms and approaches used to design network models for predicting Enset yields are reviewed, together with other recent related work.
2.1. Overview of the Artificial Neural Network
An Artificial Neural Network (ANN) is a machine learning model that tries to solve problems in the same way as the human brain does. Instead of biological neurons, an ANN uses artificial neurons, also known as perceptrons. In the human brain, neurons are connected through axons, while in an ANN, weight matrices represent the connections between artificial neurons. Information travels through neurons along the connections between them; from one neuron, the information travels to all neurons connected to it (Jukic, Saracevic, & Subasi, 2020). By adjusting the weights between neurons, the system can be trained from input examples.
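The weighted connections described above can be sketched for a single layer of artificial neurons: each neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function. The sigmoid activation and the weight values below are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))


def layer_forward(x, W, b):
    """One layer of artificial neurons: output i is sigmoid(sum_j W[i, j] * x[j] + b[i])."""
    return sigmoid(W @ x + b)


x = np.array([0.5, -1.0, 2.0])       # three input signals
W = np.array([[0.2, 0.4, -0.1],      # weights on the connections into neuron 0
              [0.0, -0.3, 0.5]])     # weights on the connections into neuron 1
b = np.array([0.1, -0.2])            # one bias per neuron
out = layer_forward(x, W, b)
print(out)  # approximately [0.401 0.750]
```

Each row of `W` holds the connection weights into one neuron, so one matrix product computes every neuron in the layer; training changes only `W` and `b`.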
Artificial Neural Network (ANN) technology is a group of computer algorithms designed to simulate neurological processing, processing information and producing outcomes similar to human thinking in learning, decision making and problem solving. The uniqueness of ANN is its ability to deliver desirable results even from incomplete or historical data, without the need for a structured experimental design, through modeling and pattern recognition. It learns from data through repetition with suitable learning models, similarly to humans, without explicit programming.
It achieves this through processing elements that receive the user-given inputs, transform them through a function, and provide an output. Moreover, the present output of an ANN is the combined effect of the data collected from previous inputs and the current responsiveness of the system (Ankith & Damodharan, 2018).
Artificial Neural Networks, which are nonlinear data-driven approaches as opposed to the above model-based nonlinear methods, are capable of performing nonlinear modeling without a priori knowledge about the relationships between input and output variables. Thus, they are a more general and flexible modeling tool for forecasting. The idea of using ANNs for forecasting is not new; owing to their ability to tackle complex computational problems, they are increasingly applied to solve practical problems.
The main advantage of ANNs is that tasks are solved by presenting input signals from which the network learns and recognizes patterns. For this reason, an ANN is sometimes preferred over complex algorithms or rule-based programming for solving various tasks. As defined by (Khatib, 2011), an Artificial Neural Network (ANN) is a mathematical model designed to train, visualize, and validate neural network models. It was developed following the recognition of the way the human brain computes. (Khatib, 2011) also added that an ANN resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
Artificial neural networks (ANNs) are biologically inspired computer programs designed to
simulate the way in which the human brain processes information. ANNs gather their
knowledge by detecting the patterns and relationships in data and learn (or are trained) through
experience, not from programming. An ANN is formed from hundreds of single units, artificial
neurons or processing elements (PE), connected with coefficients (weights), which constitute
the neural structure and are organized in layers. The power of neural computations comes from
connecting neurons in a network.
Figure 2-1: Structure of artificial neuron
An Artificial Neural Network (ANN) is a network of artificial neurons based on the biological processes of the human brain (Mishra, Mishra, & Santra, 2016). The benefits of neural network (NN) models are their simplicity of application and the robustness of their results, and NN models have developed into a powerful approach that can approximate any nonlinear input-output mapping function (Safa, Samarasinghe, & Nejat, 2020).
The types of artificial neural networks depend on the architecture, the neuron activation function, loops in the architecture, the learning algorithm, and other attributes. Some types of artificial neural networks are also capable of learning without human interaction (Jukic et al., 2020). Owing to its documented ability to model any function, the multilayer perceptron (MLP) trained with backpropagation (BP) has been selected to develop apparatus, process, and product prediction models (Coast & Safa, 2015).
2.1.1. Neuron (Node)
It is the basic unit of a neural network. A neuron receives a certain number of inputs and a bias value. When a signal (value) arrives, it is multiplied by a weight value. In this research we have 11 inputs, so each neuron that receives them has 11 weight values, plus a bias, which are adjusted during training. An ANN consists of very simple and highly interconnected processors called neurons. A neuron is an information-processing unit that is fundamental to the operation of a neural network, and it consists of weights and an activation function (Kim & Seo, 2018).
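$$ y = f\left( \sum_{i=1}^{n} w_i x_i + b \right) \qquad (2.1) $$
where x_i are the n inputs (n = 11 in this study), w_i their weights, b the bias, and f the activation function.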
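Equation (2.1) can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: the function name `neuron`, the choice of tanh as the activation, and the input, weight and bias values are all hypothetical, chosen only to show the weighted sum, the bias and the activation for 11 inputs. The second call shows that, when every input is zero, the bias alone still drives the neuron.

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Hypothetical single neuron: activation(sum_i w_i * x_i + b), as in Eq. (2.1)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum plus bias
    return activation(z)

# Illustrative values only: 11 scaled inputs, every weight 0.1, no bias
x = [0.5] * 11
w = [0.1] * 11
print(neuron(x, w, bias=0.0))  # equals tanh(0.55)

# With all inputs zero, the output is determined by the bias alone: tanh(0.3)
print(neuron([0.0] * 11, w, bias=0.3))
```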
2.1.2. Connections
It connects a neuron in one layer to a neuron in another layer or in the same layer. A connection always has a weight value associated with it, and the goal of training is to update these weight values to decrease the loss (error). The weight parameters on the links between neurons are determined by the training algorithm. The weights are the most important parameters, acting as the memory of the ANN, and the activation function provides the network with nonlinear mapping capability (Kim & Seo, 2018).
2.1.3. Bias (Offset)
It is an extra input to a neuron that is always 1 and has its own connection weight. This ensures that even when all the inputs are zero, the neuron can still be activated through its bias weight.
2.1.4. Activation Function (Transfer Function)
Activation functions are used to introduce non-linearity into neural networks. Squashing functions such as the sigmoid and the hyperbolic tangent also compress their input into a smaller range.
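Common activation functions can be compared with the short sketch below. The choice of sigmoid, tanh and ReLU is illustrative only; this section does not state which activation functions the thesis model uses. ReLU is included to show a nonlinearity that does not squash its input.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real z into (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    """Hyperbolic tangent: squashes any real z into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: 0 for negative z, z otherwise (nonlinear but not squashing)."""
    return max(0.0, z)

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):.4f}  relu={relu(z):.1f}")
```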
The artificial neural network is organized into multiple layers, where each layer contains
multiple neurons. The information inside of the network travels from input layers to the output
layers. Between input and output layers, the artificial neural network can have zero or more
hidden layers. The number of layers and the number of neurons inside the artificial neural network are called the architecture of the neural network (Jukic et al., 2020).
Typically, a minimum of three layers, namely the input layer, the hidden layer and the output layer, is required to develop an ANN system (Bejo & Mustaffha, 2014). A network’s architecture can be defined by the number of neurons, the number of layers and the types of connections between layers.
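A forward pass through a fully connected multilayer network can be sketched as below. The 11-8-5 layer sizes are an assumption for illustration (11 inputs and 5 yield outputs as in this study, with a hypothetical hidden layer of 8 neurons); the random initialization and the tanh-hidden/linear-output activations are also assumptions, not the thesis configuration.

```python
import math
import random

def dense_forward(x, weights, biases, activation):
    """Fully connected layer: neuron j outputs activation(sum_i W[j][i] * x[i] + b[j])."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (one common choice among many)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def identity(z):
    return z

def mlp_forward(x, layers):
    """Forward propagation: tanh in hidden layers, linear output layer for regression."""
    h = x
    for k, (weights, biases) in enumerate(layers):
        act = identity if k == len(layers) - 1 else math.tanh
        h = dense_forward(h, weights, biases, act)
    return h

# Hypothetical 11-8-5 architecture: 11 input traits, 8 hidden neurons, 5 yield outputs
rng = random.Random(0)
sizes = [11, 8, 5]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
y = mlp_forward([0.5] * 11, layers)
print(len(y), y)
```

Each layer is stored as a (weights, biases) pair, so the architecture is fully described by the `sizes` list, mirroring the text's definition of architecture as the number of layers and the number of neurons in each.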
Figure 2-3: Multilayer Artificial Neural Network (Patterson & Gibson, 2017)
Figure 2-3 above shows a feed-forward artificial neural network, also known as a multilayer perceptron. It receives inputs from the external world and consists of an input layer, hidden layers and an output layer, from which the network's output emerges. The input layer passes the signals to the hidden layers, which perform the computations and send the result to the output layer.
An ANN consists of computing units called neurons that are connected to each other in a complex communication network, much as the brain carries out highly complex computations through its network of neurons. A multilayer perceptron can be trained with a variety of learning techniques.
In an artificial neural network, the smallest building block is the perceptron, which has multiple weighted inputs, a bias input, and an activation function. Propagating the signal from the input to the output of the network is called forward propagation, while propagating the error signal from the output back towards the input is called backpropagation.
The most popular algorithm for training artificial neural networks is the backpropagation algorithm (Jukic et al., 2020).
Backpropagation uses gradient descent on the weights of the connections to minimize the error at the output of the network. It is the best-known and most easily understood training algorithm (Patterson & Gibson, 2017). Each layer can consist of a different number of neurons, and each layer is fully connected to the next layer.
The quality of the prediction is determined by finding correct values for the weights and biases. The process of fine-tuning the weights and biases from the input data is known as training the neural network. Each iteration of the training process has the following steps:
Calculating the predicted output, known as feedforward.
Updating the weights and biases, known as backpropagation.
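The two training steps listed above (feedforward, then backpropagation with a gradient-descent update) can be sketched in plain Python for a network with one tanh hidden layer and one linear output. This is a simplified illustration under stated assumptions: batch gradient descent on the mean squared error, a hypothetical learning rate, hidden-layer size and number of epochs, and a synthetic toy data set. It is not the Keras training setup used in the thesis.

```python
import math
import random

def train_mlp(data, n_hidden=4, lr=0.1, epochs=1000, seed=1):
    """Batch gradient descent for a 1-hidden-layer MLP (tanh hidden units, linear output).

    data: list of (x, y) pairs, where x is a list of floats and y a float.
    Returns the learned parameters and the training MSE recorded at each epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(data)
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        sse = 0.0
        for x, y in data:
            # Step 1, feedforward: hidden activations and predicted output
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(n_hidden)]
            y_hat = sum(w * hj for w, hj in zip(W2, h)) + b2
            err = y_hat - y
            sse += err * err
            # Step 2, backpropagation: chain-rule gradients of 0.5 * err**2
            gb2 += err
            for j in range(n_hidden):
                gW2[j] += err * h[j]
                delta = err * W2[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)**2
                gb1[j] += delta
                for i in range(n_in):
                    gW1[j][i] += delta * x[i]
        history.append(sse / n)  # MSE before this epoch's update
        # Gradient-descent update with gradients averaged over the data set
        b2 -= lr * gb2 / n
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
    return (W1, b1, W2, b2), history

# Synthetic toy data (not Enset data): y = 0.5*a - 0.3*b on a small grid
data = [([a, b], 0.5 * a - 0.3 * b) for a in (-1.0, -0.5, 0.0, 0.5, 1.0) for b in (-1.0, 0.0, 1.0)]
params, history = train_mlp(data)
print(f"training MSE: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

All gradients in an epoch are computed with the weights from before the update, which is what makes this batch (rather than stochastic) gradient descent.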
Figure 2-4: Flow Chart for Back Propagation Algorithm (Kim & Seo, 2018)
2.2. Ensemble Artificial Neural Network
Ensemble machine learning techniques are algorithms that combine the outputs of multiple learners to achieve better performance (Jukic et al., 2020). Ensemble learning is a machine learning paradigm in which multiple models (learners) are trained to solve the same problem. By using multiple learners, the generalization ability of an ensemble can be much better than that of a single learner. Ensemble learning algorithms are meta-algorithms that combine several machine learning algorithms into one predictive model in order to decrease variance or bias, or to improve predictions.
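The simplest way to combine learners is to average their predictions. In the sketch below, the three base learners are hypothetical fixed functions standing in for trained networks. Their errors at x = 1 (+0.3, -0.2 and -0.1 against the target 2x) happen to cancel, which shows how averaging can reduce error; in general averaging reduces variance but does not guarantee a lower error than every member.

```python
from statistics import mean

def ensemble_average(models, x):
    """Simple averaging ensemble: the prediction is the mean of the members' predictions."""
    return mean(model(x) for model in models)

# Hypothetical base learners approximating the target y = 2x, each with its own error
models = [
    lambda x: 2.0 * x + 0.3,  # overestimates
    lambda x: 2.0 * x - 0.2,  # underestimates
    lambda x: 1.9 * x,        # slightly wrong slope
]

x = 1.0
target = 2.0 * x
for i, m in enumerate(models, start=1):
    print(f"learner {i}: error = {m(x) - target:+.2f}")
print(f"ensemble: error = {ensemble_average(models, x) - target:+.2f}")
```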
2.3. Artificial Neural Network in Agriculture
This section reviews the current state of the applications of Artificial Neural Networks in agriculture. Agricultural production in Ethiopia is characterized by a subsistence orientation, low productivity, a low level of technology and inputs, a lack of infrastructure and market institutions, and extreme vulnerability to rainfall variability (Bhanose, Bogawar, Dhotre, & Gaidhani, 2016). According to (Bhanose et al., 2016), agricultural production is dominated by smallholder households, which produce more than 90% of the agricultural output and cultivate more than 90% of the total cropped land.
2.4. Crop Yield Prediction
Crop yield prediction is a major problem. Farmers are interested in knowing how much produce they can expect. Traditionally, farmers estimate this from long experience with specific yields, plants and weather conditions, and they are concerned directly with yield prediction rather than with crop prediction (Bhanose et al., 2016). Extending this view, (Manjula, 2017) also defined yield prediction as an important agricultural problem: every farmer always wants to know how much yield to expect.
According to (Manjula, 2017), agricultural yield depends primarily on weather conditions, pests and the planning of harvest operations. Accurate information about the history of crop yields is important for making decisions related to agricultural risk management (Manjula, 2017).
Crop yield determination is a crucial function in planning for food security of the population of
a district or even of the whole country. Agriculture, as the backbone of many developing
economies like Ethiopia, provides a substantial portion of their Gross Domestic Product (GDP).
Thus, the possibility to obtain yield estimates with reasonable accuracy prior to harvest is
important, since timely interventions can take place in case low yields are predicted.
2.5. Background of Enset (Ensete ventricosum (Welw.) Cheesman)
Ensete ventricosum (Welw.) Cheesman is a major food security crop in Southern Ethiopia, where it was originally domesticated (Zengele, 2017). As stated by (Ayalew & Yeshitila, 2011), the three major products used as food are commonly known as Kocho, Bulla and Amicho. Kocho is a fermented product made from the scraped parenchymatic tissue of the leaf sheaths and the pulverized corm. Bulla is made by dehydrating the juice arising from the mixture of scraped parenchymatic tissue of the leaf sheaths, pulverized corm and grated stalk of the inflorescence. Amicho is the stripped corm of younger Enset plants, which is boiled and consumed in the same way as Irish potato, sweet potato and cassava (Ayalew & Yeshitila, 2011).
In the Southern Nations, Nationalities and Peoples' Regional State (SNNPRS), the 1994 estimates show that 300,000 hectares of Enset were projected to yield almost 10 tons per hectare. The Enset planting economy is one of the major agricultural activities in SNNPRS, and the region accounts for over 80% of the country's Enset production (Zengele, 2017). Including the Enset production of the Oromiya Region (Oromia) and the national root crop production would have placed the estimated Enset and root crop production at more than one quarter of the total cereal and pulse production of Ethiopia. This would have created, on paper, a food surplus situation in Ethiopia that could endanger the food security of numerous communities that are in fact food-deficit areas (Zengele, 2017).
2.6. Enset and Wolaita Background
Wolaita Zone is one of the major Enset production areas in the SNNPRS, Ethiopia. Based on the 2007 Census conducted by the Central Statistical Agency of Ethiopia (CSA), the Zone has a total population of 1,501,112; with an area of 4,208.64 square kilometers, Wolaita has a population density of 356.67 persons per square kilometer. Of this population, 172,514 (11.49%) are urban inhabitants and a further 1,196 (0.08%) are pastoralists.
Geographically, Wolaita Zone is located at about 7° 00' N latitude and 37° 45' E longitude, at the edge of the East African Great Rift Valley. The inhabitants of Wolaita Zone are primarily Wolaita ethno-linguistic communities speaking the Omotic Wolaita language, Wolaitato Donaa. The Wolaita are predominantly agriculturalists, practicing mixed crop-livestock production and living in permanent settlements. Within their landholdings, community members maintain fruit orchards, nurseries, medicinal plants, vegetables, root and tuber crops, ornamentals and spices, as well as open areas for raising domestic animals (S. Dahikar & Rode, 2014).
The agriculture of the Wolaita people is based on Enset, locally known as uutta. The Wolaita are regarded as 'the Enset people' or 'the people of Enset culture' because of the strong interlink between Enset cultivation and the local food and material culture of the people (Zengele, 2017). As indicated in the Regional Statistical Abstract cited by (Zengele, 2017), the area covered by Enset production in the Zone is 5,400 hectares, with an estimated annual production of 2,032,656 quintals.
2.7. Enset (Ensete ventricosum) varieties in Wolaita Zone
Enset cultivation is the center of the cropping system on which the entire farming system is based, and the crop is the major source of food security and livelihood in the Wolaita community (Zengele, 2017). A study by (Haile, 2014a) found that farmers in the study area identify different Enset (Ensete ventricosum) vernaculars/clones, which have their own names and are uniformly spread across the study zone. Enset clones are very diverse in the area, ranging from 2 to more than 50 clones, and each farmer keeps a different number of Enset varieties on his farm.
Farmers give a vernacular name to each clone. They differentiate one clone from another phenotypically by looking at the color (dark green, light green, brown, light brown, red, pinkish, etc.) of the petiole, midrib and leaf sheath, the angle of leaf orientation, the size and color of the leaves, and the circumference and length of the pseudostem (tall, medium, short, very short, etc.). Almost all the farmers in the area grow many Enset clones in mixtures that are used for different purposes (Haile, 2014a).
The Wolaita hold a great repository of Enset landrace diversity in their home gardens. Wolaita agricultural systems maintain a greater level of intra-specific diversity in Enset than in any other crop species. This diversity is maintained in the home-garden (darkuwa) ring as poly-varietal perennial plantations, without crop rotation or land fallowing.
A study by (Tsegaye & Struik, 2001b) indicated that there are 55 morphologically diverse Enset clones known to the Wolaita people. However, a review done two years later by (S. Dahikar & Rode, 2014) at the then Areka Research Station showed that there were 77 Enset accessions in the Wolaita administrative region (Tsegaye & Struik, 2001a).
Eight years later, a similar study by the Areka Agricultural Research Center (AARC) in 2012 indicated that, of all the landraces known to the Wolaita farming communities, only 35 are represented in the national ex situ Enset collection of AARC. This implies that 42 Enset landraces (77 minus 35) have either been genetically eroded or have not been recorded well. The results of different researchers indicate a decreasing trend in the maintenance of Enset landrace diversity in Wolaita: some landrace genotypes have become rare, and many more are no longer cultivated.
Recent research by (Olango, Tesfaye, Catellani, & Pè, 2014) indicated that Enset landraces with 67 different vernacular names were under cultivation. Of these, 31 landraces were found in the lowland and 52 in each of the highland and midland agro-ecologies, and 22 were shared across all three agro-ecologies. In general, many of the landraces identified by vernacular names showed a narrow and unique distribution pattern, whereas 39 (41%) of the landraces known to the Wolaita community were reported by at least 3 of the 5 kebeles (Haile, 2014a).
Previous studies showed that the genetic diversity of Enset has been decreasing over time. This may be because farmers give priority to some selected clones, because of genetic erosion, or because of the limited sample sizes used by researchers. Combined, the results of the different researchers identify and name a total of 95 Enset landrace vernacular names known to the Wolaita farming communities (Haile, 2014a) and (Olango et al., 2014).
2.8. Health Benefits of Enset
Some Enset varieties are believed to have medicinal value and are used as such by Enset-growing communities. For example, in the Areka area, a variety called sweete is strongly recommended for treating people with bone problems, possibly because it contains high levels of calcium and phosphorus. Even in the central highlands and in cities where Enset is not a staple, bulla is fed to mothers who have given birth to help them regain strength and recover quickly. Atmit (gruel) is also made and given to people who have caught a cold. Different Enset varieties are reported to have medicinal and religious (ritual) significance for prevention, healing, and other therapeutic purposes (Bekele, Diro, Yeshitla, Agricultural, & Agricultural, 2013).
2.9. Approaches to this Study/Related works
A study on Enset yield prediction for kocho was done by (Haile, 2014b). Its main objectives were to develop multiple regression models that take into account a large number of samples, different Enset clones ranging from low to high yielders, and other vegetative parameters, and to construct a more precise model that predicts kocho and fiber yields non-destructively from the linear dimensions of Enset plants. According to (“Emergencies Unit for Ethiopia,” 1996), data from a sample of 67 Enset plants showed a positive relationship between the measurements of plant pseudostem girth and height and the plant's kocho yield. However, the Enset yield predictor model previously developed by (“Emergencies Unit for Ethiopia,” 1996) did not take different types of clones into account, and it predicts no yield for very small and very high plants.
Multiple Linear Regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is sometimes termed the predictand, and the independent variables are called predictors (Ramesh & Vishnu, 2015).
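A multiple linear regression can be fitted by ordinary least squares, sketched below by solving the normal equations with Gaussian elimination. This is one standard way to estimate MLR coefficients, not necessarily the procedure used by the cited studies. The predictor names (plant height and pseudostem circumference, which Haile (2014b) reports as the best kocho-yield predictors) are used for illustration only; the data and coefficients are synthetic, generated from y = 1 + 2·x1 + 3·x2 so that the fit can be checked.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk via the normal equations.

    X: list of rows of predictor values; y: list of responses. Returns [b0, b1, ..., bk].
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    # Normal equations: (A^T A) b = A^T y
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        v[c], v[piv] = v[piv], v[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p):
                M[r][k] -= f * M[c][k]
            v[r] -= f * v[c]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (v[i] - sum(M[i][k] * b[k] for k in range(i + 1, p))) / M[i][i]
    return b

def predict_mlr(b, row):
    """Apply fitted coefficients to one row of predictors."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))

# Synthetic, noise-free data: columns are (plant height, pseudostem circumference)
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
y = [1.0 + 2.0 * h + 3.0 * c for h, c in X]
b = fit_mlr(X, y)
print([round(v, 6) for v in b])  # recovers intercept 1 and slopes 2 and 3
```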
(Haile, 2014b) developed multiple regression models that predict kocho yield from the linear dimensions of Enset while considering different clones. An attempt was also made to estimate fiber yield from measurements of the vegetative parameters, but none of the regression relationships gave a significant result.
The experiment was carried out at the Areka Agricultural Research Center on-station site on a total of 328 Enset clones from the six major Enset-growing areas of Southern Ethiopia. Plant height and pseudostem circumference were found to be the best non-destructive predictors of Enset kocho yield (Haile, 2014b). Kocho assessment is a difficult task because Enset is a multi-year crop that is transplanted from nursery to nursery and then to the main field at ever wider spacing.
According to (Yesuf & Hunduma, 2012a), attempts were also made to develop a regression model that predicts Enset yield non-destructively. Such a model offers better precision, simplifies yield evaluation in experiments, and resolves the difficulty of estimating kocho yield when assessing the production balance in the Enset-producing regions of the country.
2.10. Summary of Related Works
Research Methods and
width measurements an Enset, kocho
yield regression model done for
kocho yield prediction.
fermented unsqueezed kocho yield
scheme was implemented in order to
improve the cost prediction accuracy
of crop.
farmers, which gives the analysis of
rice production based on available
data? Different
simplifying yield evaluation in
experiments and also solve
Enset yield estimation
models accounting the
plant non-destructively
are still lacking.
The already developed Enset yield estimation model only works for 'kocho' yield estimation; models
for 'bulla', 'amicho' and fiber yield estimations are not yet developed.
Table 2-1: Summary of related works
2.11. Gaps in Previous Study
The Enset yield predictor model previously developed by (“Emergencies Unit for Ethiopia,” 1996) did not take different types of clones into account, even though Enset plants collected from the six major Enset growing areas show various growth and yielding abilities; the model also predicts no yield for very small and very high plants.
Enset yield estimation models that account for inter-clonal, age-group, agro-ecological, and harvesting-time differences in order to predict the different yield products (kocho, bulla, amicho, and fiber) of an Enset plant non-destructively are still lacking. The already developed Enset yield estimation model has limited use, because the sample clones were considered at a single location and it only works for 'kocho' yield estimation. Models for 'bulla', 'amicho' and fiber yield estimation have not yet been developed (Yesuf & Hunduma, 2012a).
2.12. Recommendation from previous Study
For the future, it is recommended that fiber yield be estimated from the linear dimensions of the Enset plant using many samples from a specific Enset clone with similar fiber content (Haile, 2014b). Enset yield estimation models accounting for inter-clonal, age-group, agro-ecological, and harvesting-time differences should be developed to predict the different yield products (kocho, bulla, amicho, and fiber) of an Enset plant non-destructively (Yesuf & Hunduma, 2012a).
The ensemble neural network (ENN) method is based on back-propagation networks (BPNs). The ENN mechanism randomly generates a plurality of neural networks, each with a different architecture; for instance, the numbers of hidden layers and of hidden-layer neurons are generated randomly (Kung et al., n.d.). According to (Shahhosseini, Hu, & Archontoulis, 2020), stacked generalization aims to minimize the generalization error of some ML models by performing at least one more level of learning, using the outputs of the ML base models as inputs and the actual response values of part of the data set (the training data) as outputs.
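Stacked generalization can be sketched with two base models and a linear meta-learner. Everything here is an assumption for illustration: the base models are fixed hypothetical functions (in practice they would be trained networks such as MLP-ANNs), the meta-learner is a least-squares linear combination without an intercept, and the hold-out data are synthetic. The point carried over from the definition above is that the meta-learner is fitted on base-model predictions for data the base models were not trained on, with the observed responses as its targets.

```python
def fit_stacker(base_preds, y):
    """Least-squares meta-learner: weights (w1, w2) minimizing sum (w1*p1 + w2*p2 - y)^2.

    base_preds: list of (p1, p2) base-model predictions on hold-out data.
    """
    s11 = sum(p1 * p1 for p1, _ in base_preds)
    s22 = sum(p2 * p2 for _, p2 in base_preds)
    s12 = sum(p1 * p2 for p1, p2 in base_preds)
    t1 = sum(p1 * yi for (p1, _), yi in zip(base_preds, y))
    t2 = sum(p2 * yi for (_, p2), yi in zip(base_preds, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("base-model predictions are collinear")
    # Solve the 2x2 normal equations with Cramer's rule
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Level 0: hypothetical base models (stand-ins for trained networks) for the target y = x
base_models = [lambda x: x + 1.0, lambda x: x - 1.0]

# Level 1: fit the meta-learner on hold-out inputs and their observed responses
holdout_x = [1.0, 2.0, 3.0, 4.0]
holdout_y = [1.0, 2.0, 3.0, 4.0]
preds = [tuple(m(x) for m in base_models) for x in holdout_x]
w1, w2 = fit_stacker(preds, holdout_y)

def stacked_predict(x):
    """Final prediction: meta-learner weights applied to the base-model outputs."""
    return w1 * base_models[0](x) + w2 * base_models[1](x)

print(w1, w2, stacked_predict(10.0))
```

Here the meta-learner learns equal weights of 0.5, so the opposite biases of the two base models cancel in the stacked prediction.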
3.1. Research Methodology
For this research work, we followed the Design Science Research Methodology, a type of information technology research methodology that focuses on evaluating the performance of the outcome. It is a research paradigm in which the creation of a new artifact and the evaluation of that artifact are the key contributions.
In this research, we used the process model described by (Peffers et al., 2007), (Offermann, Levina, & Schönherr, 2009), and (Rossi, Hui, & Bragge, 2006), which has six phases: problem identification and motivation, definition of the objectives for a solution, design and development, demonstration, evaluation, and communication. This methodology was used to achieve the objective of our research and to answer the research questions formulated in the statement of the problem.
3.1.1. Problem identification and motivation
In this phase, the research problem is identified and the motivation arising from it is defined. The problem definition is used to develop an artifact that provides a solution. Literature was reviewed to acquire knowledge about the state of the problem and the importance of its solution: works that support our research were reviewed, the gaps in related research were analyzed, and how we fill those gaps is presented. We also reviewed several previous related journal articles, books, and other materials, as well as relevant documentation about tools and techniques for model design and development.
3.1.2. Objectives for the solution
The objectives of a solution are inferred from the problem definition or specification. Much literature was reviewed to establish the state of the problem, the state of current solutions and the state of the art. The objective of the research is to solve the stated problem by developing an Enset yield prediction model.
3.1.3. Designing and Development
In this phase, the solution is designed and developed. This activity includes determining the desired functionality of the artifact and its architecture, and then developing the actual model. Keras (with TensorFlow as the backend) is used for designing the RF, the MLP-ANN and the stacked ensemble of MLP-ANNs, and Python is used for writing the required source code.
3.1.4. Demonstration
The developed system is demonstrated by simulating how the developed model predicts Enset yields (Kocho in two forms, Bulla, Amicho, and Fiber, the byproduct) from historical Enset data. We used the Anaconda distribution and the Spyder editor with the Python language to develop the model.
3.1.5. Evaluation
The developed system is evaluated to measure how well it supports a solution to the problem. To evaluate the system rigorously, the testing datasets were fed into the developed model, and the model was then evaluated by comparing its output with the observed data using the coefficient of determination (R2), the mean squared error (MSE), and the root mean squared error (RMSE).
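The three evaluation measures named above can be written out as follows. The formulas are the standard definitions; the observed and predicted yields in the example are made-up numbers for illustration, not results from this study.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same units as the yield (e.g. kg/plant)."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

observed = [10.0, 12.0, 14.0, 16.0]   # hypothetical observed yields (kg/plant)
predicted = [11.0, 12.0, 13.0, 16.0]  # hypothetical model predictions
print(mse(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

For these numbers, MSE = 2/4 = 0.5, RMSE = √0.5 ≈ 0.707 and R2 = 1 - 2/20 = 0.9, since the observed mean is 13 and the total sum of squares is 20.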
3.1.6. Communication
The researchers will communicate with AARC on the further implementation of the proposed model. Communication with the local agriculture sector will also be considered, to obtain financing for improving the model and implementing it for the use of farmers. Other scholars in the areas of agriculture and engineering will be contacted to explore ways of improving the model and turning the system into a gadget. The zonal agriculture office will also be a target sector for discussing ways of using the model in the sector and by farmers. The researchers will also communicate with entrepreneurs, investors and NGOs working on improving crop yields for the betterment of farmers.
3.2. Rationale of the Research
This study follows a quantitative research approach. In a quantitative research approach, the data obtained from different sources are collected and analyzed in a structured way, using computational, statistical, and mathematical tools to derive results. This approach was used because the study began with data collection based on document analysis and physical observation of the study site, and the important stakeholders helped in providing the necessary data for the study. In addition, secondary data were extracted through the literature review.
The input and output data needed for this study were collected from the Areka Agricultural Research Center. The data contain 11 input parameters, including the following:
Maturity time
Plant height
Pseudostem height
Pseudostem circumference
Leaf number
Leaf length
Leaf width
and the 5 output parameters are:
Corm weight before grating (Amicho)
Bulla Weight (Bulla)
Fiber Weight (Fiber)
Fermented unsqueezed kocho (Kocho-Unsqueezed)
Fermented squeezed kocho (Kocho-Squeezed)
These are used for processing in the ANN yield prediction model. Based on discussions with experts and researchers in the area of study, in addition to the literature review, the parameters considered for the research are those to which the outputs are most sensitive. These parameters are used as the inputs and outputs of our newly developed yield prediction model. The ANN-based model is applied to the collected vegetative and agronomic parameters of Enset to predict the yields, namely fermented unsqueezed KOCHO, fermented squeezed KOCHO, Bulla, Amicho/Corm and Fiber, the byproduct.
3.3. Data Analysis
3.3.1. Data Collection
We collected both primary and secondary data in Wolaita Zone from the Areka Agricultural Research Center (AARC). As stated by (Beyene Teklu Mellisse, Descheemaeker, Giller, Abebe, & van de Ven, 2018), AARC is located at 7° 09' N latitude and 37° 47' E longitude, at an elevation of 1750 to 1820 masl. Based on five years of data, the average annual rainfall is 1615.2 mm, with minimum/maximum mean air temperatures of 13.9 °C/25.6 °C and 63% relative humidity. The soil is a silty clay loam with a pH of 5.2. The research center has been serving the area for about 25 years and carries out Enset research at the national level as its center of excellence for Enset (Ensete ventricosum).
After reviewing many documents, talking to experts in the area of the study, and conducting field observations, and as stated by (Yemataw, Chala, Ambachew, Grant, & Tesfaye, 2017), we identified 15 quantitative parameters used for Enset yield prediction. Enset has five (5) yields: KOCHO in two forms (Kocho-Unsqueezed and Kocho-Squeezed), AMICHO/CORM, BULLA and FIBER, the byproduct. As stated by (Yemataw et al., 2017), and supported by (Haile, 2014b) and (B. T. Mellisse, Descheemaeker, Mourik, & van de Ven, 2017), Enset yield prediction depends significantly on vegetative parameters/agronomic characteristics, namely these 15 quantitative traits. In this study, we introduced one additional parameter: the weight of the central shoot after the inflorescence is removed, measured before grating. We added it because it correlates with one of the Enset yields, Bulla, whose prediction was not attempted in previous studies. Therefore, 16 quantitative parameters were used for this study.
For primary data, we used physical site observation at the Areka Agricultural Research Center (AARC), the national center of excellence for Enset research. The 16 quantitative parameters identified were recorded from 36 different selected Enset clones that are well suited to the particular area of AARC. The 36 clones were selected for sample collection based on their availability in the study area, and the quantitative measurements were carried out on all 36 clones.
3.3.2. Data Description
The data for this research were collected from the Areka Agricultural Research Center (AARC), the national center of excellence for Enset (Ensete ventricosum). As listed in Table 3-1 below, the Enset (Ensete ventricosum) data for this research comprise 16 quantitative parameters. These quantitative parameters were selected after talking to many experts and researchers, such as Dr Zerihun Yemataw and Dr Yasin Goa, and to technical teams in the area of the study.
The procedure started with the identification of the quantitative parameters used throughout the research. After in-depth discussion with the research center, it was agreed that Enset (Ensete ventricosum) yield prediction would use the following quantitative parameters.
No Quantitative trait Code Description
1 Maturity time MT Number of years from transplanting up to harvesting.
2 Plant height (m) PLHT Plant height was measured prior to harvesting by measuring it from ground level to the tip of the longest leaf using a tape meter.
3 Pseudostem height (m) PSHT Pseudostem height was measured prior to harvesting by measuring it from ground level to the start of the leaf petiole using a tape meter.
4 Pseudostem circumference (m) … … to the middle height point using a tape meter.
5 Leaf number LFNO Leaf number was taken prior to harvesting by counting all the fully expanded green leaves.
6 Leaf length (m) LFL Leaf length was taken prior to harvesting by measuring from the end point of the petiole to the tip of the leaf along the midrib using a tape meter.
7 Leaf width (m) LFWTH Leaf width was measured prior to harvesting at the widest middle point using a tape meter.
8 Leaf sheath number LFSTHNO … from each plant at harvest.
9 Leaf sheath weight before decortication … … and measured before decortication.
10 … … … and measured after decortication.
11 Central shoot weight … The weight of the central shoot after the inflorescence is removed, measured before grating.
12 Corm weight before grating … … removal and prior to grating.
13 Bulla weight (kg) BWT The weight of the dehydrated mixture of scraped parenchymatic tissue of the leaf sheaths, pulverized corm and grated stalk of the inflorescence.
14 Fiber weight (kg) FYield Fiber yield was measured by weighing all the fiber left soon after decorticating the leaf sheaths.
15 Fermented unsqueezed kocho yield (kg/plant) … The unfermented kocho is left in the pit for some time, usually 30 days, for fermentation.
16 Fermented squeezed kocho yield (kg/plant) … … applying human force to remove as much of its water as possible.
Table 3-1: Quantitative parameters of the Enset data
3.3.3. Independent and Dependent Variables
The input data (the independent variables) and the output data (the dependent variables) for this research were obtained from the Areka Agricultural Research Center, whose center of excellence at the national level is Enset. Eleven input parameters and five output parameters are used for the ANN processing. The inputs chosen are those to which the outputs are most sensitive (Yemataw, Mohamed, Diro, Addis, & Blomme, 2014). The 11 input parameters are listed in Table 3-2 below:
Table 3-2: Input Parameters of the Enset
No | Quantitative trait | Code | Description
1 | Maturity time | MT | Number of years from transplanting up to harvesting.
2 | Plant height (m) | PLHT | Measured from ground level to the tip of the longest leaf at flowering.
3 | Pseudostem height (m) | PSHT | Measured from ground level to the start of the petioles.
4 | Pseudostem circumference (m) | PSCIR | Measured at the middle height of the pseudostem.
5 | Leaf number | LFNO | Number of 50% green and 50% unrolled leaves.
6 | Leaf length (m) | LFL | Length of every functional leaf from the end of the petiole to the tip of the leaf; the mean was used for analysis.
7 | Leaf width (m) | LFWTH | Width at the widest part of every functional leaf blade just below the flag leaf; the mean was used for analysis.
8 | Leaf sheath number | LFSTH NO | Number of decorticable leaf sheaths obtained from each plant at harvest.
9 | Leaf sheath weight before decortication (kg) | LFSTH BD | Weight of all leaf sheaths of each plant, measured before decortication.
10 | Leaf sheath weight after decortication (kg) | LFSTH AD | Weight of the pulp of each plant, measured after decortication.
11 | Central shoot weight (kg) | SBG | Weight of the central shoot after the inflorescence is removed, measured before grating.

There are five output parameters, listed in Table 3-3: corm weight, bulla weight, fiber weight, fermented unsqueezed kocho yield and fermented squeezed kocho yield (Yemataw et al., 2014).

Table 3-3: Output Parameters of the Enset
No | Quantitative trait | Code | Description
1 | Corm yield | CORM | Weight of the corm after root removal and prior to grating.
2 | Fiber yield | FY | Measured weight of the fiber.
3 | Bulla yield | BWT | Measured weight of the bulla.
4 | Fermented unsqueezed kocho yield | | Kocho yield after fermentation, before squeezing.
5 | Fermented squeezed kocho yield | SQKOCHO | Fermented kocho squeezed by applying human force to remove as much water as possible.

3.4. Data Preprocessing
Data preprocessing is a technique used to convert raw data into a clean data set. In our research, the data collected from AARC were preprocessed to make them suitable for designing a good artificial neural network model. Real-world data are often incomplete, inconsistent and/or lacking in certain behaviors or trends, and are likely to contain many errors (García, Ramírez-gallego, Luengo, Benítez, & Herrera, 2016). Data preprocessing is a proven way of resolving such issues. As described by Alasadi (2017) and Kung et al. (2016), data preprocessing proceeds in three stages: data integration, data cleaning and data transformation.
Figure 3-1: Data Preprocessing Stages (Kung et al., 2016)
3.4.2. Data integration
We collected the raw data and stored them in one place where data cleaning could be performed: 2520 rows of data with 16 columns. To account for differences between the Enset clones, we recorded maturity time as the number of years from transplanting up to harvesting. Plant height, measured from ground level to the tip of the longest leaf at flowering, is very significant when designing a prediction model for kocho and fiber. Pseudostem height, measured in meters from ground level to the start of the petioles, is also an important factor determining Enset yields.
Pseudostem circumference, measured at the middle height of the Enset pseudostem, is another determinant of kocho yield. Leaf length also helps determine Enset yields: all functional leaves were measured from the end of the petiole to the tip of the leaf, and their mean was taken for analysis. For leaf width, in meters, we measured the widest part of all functional leaf blades just below the flag leaf and took their mean for analysis. Leaf sheath number is also an important factor; we counted all decorticable leaf sheaths obtained from each plant at harvest. Leaf sheath weight before decortication is another factor affecting Enset yields: it is the weight, in kilograms, of all leaf sheaths of each plant, measured before decortication.
Leaf sheath weight after decortication, in kilograms, is another factor: the weight of the pulp of each plant, measured after decortication. Central shoot weight before grating, in kilograms, is the main determinant of bulla yield; it is the weight of the central shoot after the inflorescence is removed, measured before grating.
3.4.3. Data cleaning
In data preprocessing, this step is used to fill in missing values (attribute or class values). In our case, the collected data were already summarized, and no missing values occurred.
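As a minimal sketch, the absence of missing values could be confirmed with pandas after loading the data. The CSV text below is a tiny hypothetical stand-in with only a few of the columns and with one gap added on purpose so that the check has something to find. The helper name `report_missing` is illustrative and does not come from the thesis. On the real file, one would pass its path to `pd.read_csv` instead and expect a shape of (2520, 16) with zero missing cells.

```python
import io

import pandas as pd

# Tiny hypothetical CSV standing in for the prepared Enset file
# (the real file has 2520 rows x 16 columns); one PSHT value is left blank on purpose.
csv_text = """MT,PLHT,PSHT,CORM
5,4.2,1.9,20.5
6,5.1,2.3,25.0
5,3.9,,18.2
"""
df = pd.read_csv(io.StringIO(csv_text))


def report_missing(frame):
    """Return the number of missing cells per column (all zeros means nothing to impute)."""
    return frame.isna().sum()


missing = report_missing(df)
print(df.shape)
print(missing[missing > 0])  # lists only the columns that contain gaps
```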
3.4.4. Data transformation and Standardization:
Estimates of the mean and standard deviation of a dataset can be more robust to new data than its minimum and maximum. Standardization also keeps the data internally consistent, so that every feature is expressed on the same footing. Once standardization is done, all features have a mean of zero and a standard deviation of one, and thus the same scale.
Standardizing a dataset involves rescaling the distribution of values so that the mean of the observed values is 0 and the standard deviation is 1. Subtracting the mean from the data is called centering, and dividing by the standard deviation is called scaling. The method is therefore sometimes called “center scaling”.
The most straightforward and common data transformation is to center and scale the predictor variables. To center a predictor variable, its average value is subtracted from all of its values, so that the centered predictor has a mean of zero. Similarly, to scale the data, each value of the predictor variable is divided by its standard deviation. Scaling coerces the values to have a common standard deviation of one.
In machine learning this technique is called feature scaling, and it is very important when designing a neural network model. There are two common approaches to this kind of data transformation: normalization and standardization. We used standardization, through the StandardScaler class of scikit-learn, which rescales each feature so that its distribution is centered on 0 with a standard deviation of 1. The resulting values are z-scores in standard-normal units. StandardScaler only shifts and rescales the values, however; it does not make a skewed feature normally distributed.
In short, it makes the mean 0 and scales the data to unit variance. Its main purpose is to bring the numeric columns of the dataset to a common scale without distorting the relative differences between values (García et al., 2016). It is applied to both the independent and the dependent variables of the data. It can also speed up the computations of the learning algorithm (Zhu, 2016).
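$$z = \frac{x - \mu}{\sigma}, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2} \qquad (3.3)$$
where x is a value of a feature, μ and σ are that feature's mean and standard deviation, and N is the number of records.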
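The standardization formula above can be applied column by column in NumPy, as sketched below. The toy matrix is hypothetical and is not drawn from the AARC data, and the helper name `standardize` is illustrative. The sketch uses the population standard deviation (ddof=0), which matches what scikit-learn's StandardScaler computes.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: (x - mean) / std, using the population std (ddof=0)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)        # per-feature mean
    sigma = X.std(axis=0)      # per-feature population standard deviation
    sigma[sigma == 0] = 1.0    # guard: leave constant columns unscaled
    return (X - mu) / sigma, mu, sigma


# Hypothetical toy values (plant height in m, leaf number), not the AARC data
X = np.array([[4.1, 12.0],
              [5.3, 15.0],
              [3.8, 10.0],
              [6.0, 17.0]])
Z, mu, sigma = standardize(X)
print(Z.mean(axis=0))  # each column mean is now ~0
print(Z.std(axis=0))   # each column standard deviation is now 1
```

To transform new or test records, one would reuse the returned `mu` and `sigma` rather than recompute them. That way every record is scaled with the same statistics.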
In this step of model development, the prepared Enset data in CSV format were loaded into the Python programming environment (Python 3.7 with the Spyder editor, in the Anaconda distribution). The 2520 records, covering 7 years of data for 36 Enset clones, were split into two datasets: 70% of the records went into the training dataset and 30% into the testing dataset. The data were also divided into the input dataset, the independent variables, and the output dataset, the dependent variables.
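The 70/30 split and the separation of inputs from outputs can be sketched in NumPy as shown below. This is not the thesis's script. The random demo matrix only mimics the 2520 x 16 shape of the dataset. The column order, with the 11 input columns first and the 5 outputs last, is a hypothetical assumption. The fixed seed only makes the shuffle reproducible.

```python
import numpy as np

N_INPUTS, N_OUTPUTS = 11, 5  # 11 input traits, 5 yield outputs (Tables 3-2 and 3-3)


def split_train_test(data, test_fraction=0.3, seed=42):
    """Shuffle rows, hold out test_fraction of them, and separate inputs from outputs.

    Assumes (hypothetically) that the first 11 columns are inputs and the last 5 are outputs.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_test = round(test_fraction * len(data))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X = data[:, :N_INPUTS]
    Y = data[:, N_INPUTS:N_INPUTS + N_OUTPUTS]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]


# Random stand-in with the same shape as the Enset table: 2520 records x 16 columns
demo = np.random.default_rng(0).normal(size=(2520, 16))
X_train, X_test, Y_train, Y_test = split_train_test(demo)
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
# (1764, 11) (756, 11) (1764, 5) (756, 5)
```

With 2520 records, the split yields 1764 training rows and 756 testing rows.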
3.5. Tabular Visualization of Data
In this section we analyze the data for the seven years from 2005 up to April 2011 E.C. (Ethiopian calendar), summarized using tables and graphs. After preprocessing, the Enset records cover 2013 to 2019 in the Gregorian calendar. There are 2520 records with 16 features, of which 11 are input parameters and 5 are output parameters (targets).
Figure 3-2: Tabular representation of Dataset
Figure 3-3: Visualized Input dataset
3.6. Prediction Model Skeleton
Figure 3-5: Skeleton for Yield Prediction Model for Enset
As Figure 3-5 shows, the skeleton of the prediction model is divided into five working zones:
1. Data zone: all data-related activities, namely data collection, data analysis and data preprocessing.
2. Training/learning zone: once the data activities are complete, the model is trained using machine learning training algorithms.
3. Evaluation zone: the trained model is evaluated for its performance.
4. Prediction zone: after evaluation, the model is used to predict Enset yields, which is the goal of this research.
5. Final/output zone: the predictions are output, namely the Enset yields (kocho in two forms, bulla, amicho and fiber, the byproduct).
3.7. Proposed Enset Yield Prediction Model
This section describes the model developed for Enset yield prediction. A multilayer perceptron (MLP) with at least one hidden layer, a nonlinear activation function and enough hidden neurons can approximate any continuous function to arbitrary accuracy. MLPs are often applied to supervised learning problems: they train on a set of input-output pairs and learn to model the correlation (or dependencies) between those inputs and outputs.
In our study we use an Ensemble Multilayer Perceptron Neural Network model (EMLP-NN) whose sub-models have two hidden layers. Stacked generalization is an ensemble technique that combines the predictions from multiple trained models.
Stacked generalization aims to minimize the generalization error of some ML models by performing at least one more level of learning. This extra level uses the outputs of the base ML models as inputs, and the actual response values of part of the dataset (the training data) as outputs (Shahhosseini et al., 2020). Studies such as Kung et al. (n.d.) show that the Ensemble Neural Network (ENN) method performs better than traditional back-propagation neural networks and multiple regression analysis.
Kim and Seo (2018) describe backpropagation as the most common and standard training algorithm. Its central idea is that the errors of the hidden-layer neurons are determined by propagating the error of the output-layer neurons backwards. We trained the network by backpropagation with the Adam optimizer, which performed better in our experiments than other optimizers such as SGD and RMSprop.
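For reference, a single Adam update step can be sketched as shown below. The sketch follows the standard formulation with its usual default hyperparameters (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The toy objective f(x) = x² and the larger learning rate in the demo are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2015) of parameters theta given their gradient."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Toy check: minimize f(x) = x^2, whose gradient is 2x, starting from x = 5
x = np.array(5.0)
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
print(float(x))  # close to the minimum at 0
```

The per-parameter division by the square root of `v_hat` is what distinguishes Adam from plain SGD with momentum. Each weight gets its own adaptive step size.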
3.8. Ensemble Multilayer Perceptron Neural Network (EMLP-NN)
We developed a feed-forward back-propagation neural network with three kinds of layers: an input layer, hidden layers and an output layer. The network is used to predict Enset yields, particularly KOCHO in two forms (unsqueezed KOCHO and squeezed KOCHO), BULLA, AMICHO/CORM and FIBER, the byproduct. This architecture was chosen because it can accommodate large input data and solve highly complex problems.
3.8.1. Input Layer
This is the first layer of the neural network. It takes input signals (values) and passes them on to the next layer. It does not apply any operation to the input signals and has no weights or biases associated with it. In our research the network has 11 input signals: MT, PLHT, PSHT, PSCIR, LFNO, LFL, LFWTH, LFSTH NO, LFSTH BD, LFSTH AD and SBG.
3.8.2. Hidden Layers
In neural networks, a hidden layer is located between the input and output layers. It applies weights to its inputs and passes the weighted sum through an activation function to produce its output. Hidden layers have neurons (nodes) that apply different transformations to the input data.
In our research there are 2 hidden layers: 11 neurons in the first hidden layer and 5 neurons in the second, which passes its values on to the output layer. We compared different hidden-layer structures and found that this choice of layers and neurons gave the best results among those compared. The layers are fully connected: every neuron in one layer is connected to each neuron in the next layer. In short, the hidden layers perform nonlinear transformations of the inputs entered into the network.
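The forward pass through the 11-11-5-5 topology of Sections 3.8.1 to 3.8.3 can be sketched in NumPy as follows. The ReLU hidden activations, the linear output layer and the random initial weights are assumptions made for illustration, since this part of the thesis does not state the activation functions. The input batch is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYER_SIZES = [11, 11, 5, 5]  # inputs, hidden layer 1, hidden layer 2, outputs

# Small random weights and zero biases for each pair of consecutive layers
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n_out) for n_out in LAYER_SIZES[1:]]


def relu(z):
    return np.maximum(z, 0.0)


def forward(X):
    """Propagate a batch of standardized inputs (n, 11) to yield predictions (n, 5)."""
    a = X
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b
        # Hidden layers apply a nonlinearity; the output layer stays linear for regression
        a = relu(z) if i < len(weights) - 1 else z
    return a


X_batch = rng.normal(size=(4, 11))  # four hypothetical standardized records
Y_pred = forward(X_batch)
print(Y_pred.shape)  # (4, 5)
```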
3.8.3. Output Layer
This is the last layer of the network, and it receives its input from the last hidden layer. It produces the desired number of output values. In this research the output layer has 5 neurons, one for each output: corm (amicho), fiber (the byproduct), bulla and kocho in two forms (fermented squeezed kocho and fermented unsqueezed kocho).
Figure 3-7: Proposed Ensemble MLP-ANN Model
Figure 3-7 above shows the proposed ensemble ANN model, EMLP-ANN, which we designed using a technique called stacking. In stacking, the predictions of multiple sub-models are combined by a meta-learner that learns how best to combine them, rather than having each sub-model contribute equally. Combining multiple models into a single one is referred to as ensemble modeling (Kim & Seo, 2018). As Kim and Seo (2018) add, applying an ensemble technique involves two steps. The first step is creating the individual ensemble members. The second step is combining the outputs of the ensemble members to produce the most appropriate output.
Random forests are an ensemble learning method for regression that constructs a multitude of decision trees at training time and outputs their mean (average) prediction. A random forest is usually trained with the “bagging” method, whose general idea is that a combination of learning models improves the overall result. In our study we used a random forest, an ensemble of decision trees, to combine the MLP-ANNs.
When splitting a node, a random forest searches for the best feature among a random subset of features rather than among all features. This produces a wide diversity of trees, which generally results in a better model.
Our proposed EMLP-ANN model works in two steps. The first step trains the MLP-ANN sub-models on the training dataset so that they can make predictions from it. These sub-models (base models) are trained with the backpropagation algorithm.
The second step trains a single model, the meta-learner, which takes the outputs of the base models as its inputs and learns to make predictions from them. In our study, the random forest algorithm is used as the meta-learner that best combines the predictions of the sub-models. The outputs of the sub-models are merged by simple concatenation, and the averaged result goes to the output layer. Together, these steps form the larger Ensemble Multilayer Perceptron Artificial Neural Network that predicts Enset yields (KOCHO in two forms, BULLA, AMICHO/CORM and FIBER, the byproduct).
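The two-step stacking scheme described above can be sketched with scikit-learn as shown below. This is a hedged illustration, not the thesis's implementation, and it makes several assumptions:
- The data are random and hypothetical.
- Using three sub-models that differ only in their random seed is an arbitrary choice.
- The meta-learner's training inputs are built from out-of-fold predictions (via `cross_val_predict`). This is standard stacking practice, but the thesis does not state it.

The sub-models use the 11-5 hidden-layer topology of Section 3.8.2 with the Adam solver. A random forest then combines their concatenated outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical standardized data: 300 records, 11 inputs, 5 yield outputs
X = rng.normal(size=(300, 11))
Y = np.tanh(X[:, :5]) + 0.1 * rng.normal(size=(300, 5))
X_train, X_test, Y_train, Y_test = X[:210], X[210:], Y[:210], Y[210:]

# Level 0: several MLP sub-models (hidden layers of 11 and 5 neurons, Adam solver)
base_models = [MLPRegressor(hidden_layer_sizes=(11, 5), solver="adam",
                            max_iter=1000, random_state=seed) for seed in range(3)]

# Out-of-fold predictions on the training set become the meta-learner's inputs,
# so the meta-learner never sees predictions a sub-model made on its own training rows
meta_train = np.hstack([cross_val_predict(m, X_train, Y_train, cv=5) for m in base_models])

for m in base_models:  # refit each sub-model on the full training set
    m.fit(X_train, Y_train)


def stack_features(models, X):
    """Concatenate the (n, 5) predictions of every sub-model into one (n, 5 * k) array."""
    return np.hstack([m.predict(X) for m in models])


# Level 1: a random forest meta-learner combines the concatenated sub-model outputs
meta = RandomForestRegressor(n_estimators=200, random_state=0)
meta.fit(meta_train, Y_train)
Y_pred = meta.predict(stack_features(base_models, X_test))
print(Y_pred.shape)  # (90, 5)
```

Fitting the meta-learner on out-of-fold predictions keeps it from over-trusting sub-model outputs that are overfitted to the training rows.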
3.9. Back propagation algorithm
Backpropagation is the most widely used algorithm for training artificial neural networks. It is based on an optimization method called gradient descent (Jukic et al., 2020): backpropagation computes the gradient of the loss with respect to the weights, and gradient descent uses that gradient to update them. The algorithm is widely recommended for predictive analysis. The desired outputs are compared with the outputs the system actually produces. The system is then tuned by adjusting its connection weights to narrow the difference between the two as much as possible.
The algorithm gets its name because the error is propagated backwards, from the output layer towards the input layer, to update the weights. Backpropagation requires a known, desired output for each input value in order to calculate the gradient of the loss function. For this reason, it is usually classified as a type of supervised machine learning.
In a nutshell, the algorithm has three phases (Jukic et al., 2020): forward propagation, error calculation and weight update. An input sample is propagated through the artificial neural network and its output is calculated. This output is compared with the expected output, and the error is calculated. The error is then propagated backwards, and the weights in all layers are updated to minimize it. This is repeated for every input sample, and the whole process is repeated until the network's mean squared error reaches the desired level.
Backpropagation is the most common and standard training algorithm (Kim & Seo, 2018). As described by Ranjeet and Armstrong (2014), it proceeds in the following steps:
Step 1: Randomly initialize the weights to small numbers close to 0, but not 0.
Step 2: Provide the input data sets in the input layer, together with the desired outcomes.
Step 3: Forward propagation: from left to right, the neurons are activated, with the impact of each neuron's activation limited by the weights, until the predicted result is obtained.
Step 4: Compute the error between the actual and the desired outcomes.
Step 5: Backpropagation: from right to left, amend the weights according to their contribution to the error.
Step 6: Compare the error with the tolerance.
Step 7: If the error is still higher than the tolerance, return to Step 2 with the updated weights; otherwise stop. One pass of the whole training set through the ANN makes an epoch.
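The steps above can be sketched as a full-batch training loop in NumPy for the 11-11-5-5 network of Section 3.8. The sketch rests on the following assumptions, none of which come from the thesis:
- tanh hidden activations and a linear output layer;
- mean squared error as the loss;
- plain gradient descent (rather than Adam) with a learning rate of 0.05;
- initial weights scaled by 1/sqrt(fan-in);
- random, hypothetical data.

It is meant only to make the initialize, forward, error, backward and tolerance steps concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [11, 11, 5, 5]
# Initialize: small random weights close to (but not exactly) zero, scaled by fan-in
W = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
     for n_in, n_out in zip(sizes[:-1], sizes[1:])]
B = [np.zeros(n_out) for n_out in sizes[1:]]

# Inputs and desired outcomes: hypothetical standardized data with a learnable relation
X = rng.normal(size=(200, 11))
Y = X @ rng.normal(scale=0.3, size=(11, 5))


def forward(X):
    """Forward propagation: return every layer's activation (tanh hidden, linear output)."""
    acts = [X]
    for i in range(len(W)):
        z = acts[-1] @ W[i] + B[i]
        acts.append(np.tanh(z) if i < len(W) - 1 else z)
    return acts


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


lr, tolerance, losses = 0.05, 1e-3, []
for epoch in range(2000):                  # one full pass over the data = one epoch
    acts = forward(X)
    loss = mse(acts[-1], Y)                # error between actual and desired outcomes
    losses.append(loss)
    if loss < tolerance:                   # stop once the error is below the tolerance
        break
    # Back-propagate from the output layer towards the first hidden layer
    delta = 2.0 * (acts[-1] - Y) / Y.size  # dLoss/dz for the linear output layer
    for i in reversed(range(len(W))):
        grad_W = acts[i].T @ delta
        grad_B = delta.sum(axis=0)
        if i > 0:                          # through tanh: derivative is 1 - tanh^2
            delta = (delta @ W[i].T) * (1.0 - acts[i] ** 2)
        W[i] -= lr * grad_W                # update only after delta used the old W[i]
        B[i] -= lr * grad_B
print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f} after {len(losses)} epochs")
```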
In the name of the algorithm, the term “feed forward” describes how the neural network works and recalls patterns, and the term “back propagation” describes how this kind of neural network is trained. The network receives its inputs through the neurons in the input layer, and its output is given by the neurons in the output layer (Šastný, Konený, & Trenz, 2011).
Next, the network calculates a loss function to estimate the loss (or error), which measures how good or bad the prediction is.
Once forward propagation has completed, we get an output value, which is the predicted value. To calculate the error, we compare the predicted value with the actual output value using a loss function. This loss information is then propagated backwards. Starting from the output layer, it reaches all the neurons in the hidden layers that contribute to the output. Finally, we calculate the derivative of the error with respect to each weight in the neural network.
Loss function/cost function: the loss function computes the error for a single training example, and the cost function is the average of the loss functions over the entire training set.
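The formulas below assume mean squared error as the loss; this section mentions mean square error but does not fix the exact form. For N training records and K = 5 outputs, the per-record loss and the cost can then be written as:

$$\mathcal{L}_i = \frac{1}{K}\sum_{k=1}^{K}\left(y_{ik} - \hat{y}_{ik}\right)^2, \qquad J = \frac{1}{N}\sum_{i=1}^{N}\mathcal{L}_i$$

Here y_ik is the measured value of output k for record i, and ŷ_ik is the network's prediction for it.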
Figure 3-8: Learning Process Artificial Neural Network
3.10. Process flow diagram
Figure 3-9: Process flow diagram
Figure 3-9 describes the Enset yield prediction model. The model has four core phases. These are:
Data preprocessing,
Prediction
In data preproces