BIG DATA: BIG DATA ANALYSIS: is it a solution to understand big problems? Rice program (Agrobiodiversity) & Big Data expert group (DAPA)

Big data ciat april_2014_dj_et_slideshare

DESCRIPTION

Presenters: Daniel Jiménez (Leader of the Big Data expert group, DAPA) & Edgar Torres (Leader, Rice Program, AGROBIODIVERSITY)

Title: BIG DATA: BIG DATA ANALYSIS: is it a solution to understand Big Problems? The case of yield variation of rice in Colombia

Cukier and Mayer-Schönberger (2013) stated: “As the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing information will help us to make sense of our world in ways we are just starting to appreciate”. We subscribe to this view: in agriculture we now have the capacity to capture, analyze, store and share agricultural information in ways that 10 years ago were considered science fiction. The amount and variety of agricultural data generated by multiple individuals and organizations using a huge range of techniques and technologies is growing exponentially. We believe that the next agricultural (r)evolution will come from the development of innovation systems that harness agricultural data from multiple sources to generate new knowledge that will increase agricultural productivity, moving beyond blanket technological solutions towards a system of dynamic site-specific management that is sensitive and responsive to climate, soil and local socio-economic conditions. In this seminar, CIAT's researchers will share how several databases that were collected for different purposes and shared by FEDEARROZ (the country-wide association of rice growers in Colombia) have been used to obtain important insights that help FEDEARROZ manage rice more efficiently at the site-specific level.

http://marafris.ciat.cgiar.org:8080/Webinars/Bluejeans/2014-24-04%20-%20Daniel_Jimenez_Edgar_Torres.mp4

Mayer-Schönberger, V., Cukier, K., 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think.

Page 1: Big data ciat april_2014_dj_et_slideshare

BIG DATA: BIG DATA ANALYSIS: is it a solution to understand big problems?

Rice program (Agrobiodiversity) & Big Data expert group (DAPA)

Page 2: Big data ciat april_2014_dj_et_slideshare

Computational models are tailored to the analysis of the data rather than data to a particular methodology, as researchers have done for over a century

Applying the principles of Big Data to research in agriculture

• Big Data refers to things that one can do at a large scale that cannot be done at a smaller one to extract new insights

• Sometimes informing is better than explaining: looking for patterns or associations

• Approaching “N=All”

• Adding value to secondary databases

Big Data (Foreign Affairs magazine / McKinsey's High Tech)… Cukier and Mayer-Schönberger (2013)

Page 3: Big data ciat april_2014_dj_et_slideshare

How?

• Using ICTs to collect data (Android apps), analyze it (traditional and machine learning techniques) and share it (in a way that facilitates decision making at different levels and for different users)

• Analytical approaches tailored to the analysis of the data rather than data to a particular methodology, as researchers have done for over a century

• Development of tools as part of a close dialogue with end-users

Page 4: Big data ciat april_2014_dj_et_slideshare

How?

Climate + Soil + Crop management (including varieties) = Productivity/ha

% ? + % ? + % ? = To explain (100 %)

Maximizing productivity in agricultural systems. Working with secondary databases

• To identify the combinations of factors that lead to high and low productivity (empirical approaches – machine learning)

• Within the framework “Convenio MADR-CIAT” climate change project – Adaptation strategy

Page 5: Big data ciat april_2014_dj_et_slideshare

[Figure: Trends in Rice Production, Harvested Area and Yield in Colombia, 1990-2012. Series: Area, Production, Yield. Axes: thousands of tons or ha (0 to 3,500) and t/ha (0.0 to 6.0).]

The problem: in Colombia, farm-level yields have fallen significantly since 2009

Source USDA-PSD

Page 6: Big data ciat april_2014_dj_et_slideshare

And what are the causes for this yield reduction?

We can see similar problems in Central America, Ecuador, Peru and Venezuela: yield reductions that are causing heavy losses to rice farmers.

No single factor is responsible: drought, high minimum temperatures, low light, high humidity, bacteria, mites, fungi, lack of adaptation, etc.

low yields are caused by Burkholderia glumae!

Page 7: Big data ciat april_2014_dj_et_slideshare

Misdiagnosis, wrong treatments and excessive pesticide applications are causing other problems (Hoja Blanca)

Not eco-efficient

And to make matters worse, farmers want a “magical cure”

Page 8: Big data ciat april_2014_dj_et_slideshare

Reducing stress from lack of water: water harvesting

Better agronomy: key points, crop rotation and regulations

Improved cultivars: increasing yield potential, protecting yield, adding value

Trait Discovery

Gene Discovery & Marker Applications

Germplasm Enhancement

Elite Breeding

Inbreds & Hybrids

Is something missing here?

How can we manage this problem?

Page 9: Big data ciat april_2014_dj_et_slideshare

AMTEC: Massive Adoption of Technology

OBJECTIVES

To jointly transfer the technology available for crop management.

To increase productivity and reduce production costs, with the least environmental impact, in a context of social responsibility

To aim for competitiveness and profitability of rice farmers in Colombia

TECHNOLOGY TRANSFER

Field days

Planning and good management practices

Visits to research centers

Demonstration trials

Cost reduction

Page 10: Big data ciat april_2014_dj_et_slideshare

AMTEC Results from 2012 and 2013… Source Fedearroz

Agronomy helps a lot!

2012

2013

Page 11: Big data ciat april_2014_dj_et_slideshare

Gene discovery

Emerging pathogen: Burkholderia glumae, producing grain sterility

Sources of tolerance identified

Tolerant genotype showing 60% less damage than susceptible genotypes

Molecular markers are being developed to speed up the transfer of this trait into elite germplasm

Susceptible Tolerant (field evaluation)

Page 12: Big data ciat april_2014_dj_et_slideshare

Breeding pipeline

Trait Discovery; Gene Discovery & Marker Applications; Germplasm Enhancement; Elite Breeding

low light tolerance; nitrogen use efficiency; water use efficiency; high yield potential; panicle blight tolerance

recombinant populations; CSSL; NAM; iBridges; software

introgressed lines; RS population; training population for GS

yield potential; grain quality; lodging resistance

• QTL mapping • QTL validation • functional marker identification

• MABC • recurrent selection • genomic selection

• inbred FLAR • CIRAD & hybrids-HIAAL • MET

• trait value characterization • screening methods • donor identification • population development • sequencing • gene validation

Page 13: Big data ciat april_2014_dj_et_slideshare

TECHNOLOGY TRANSFER (25 agronomists)

RESEARCH: BREEDING AND AGRONOMY (45 researchers)

Breeding (Conventional 7)

Agronomy (Physiology 3, Phytopathology 1, Soils 2, Water 2, Crop Management 26, Biotech 3, Weeds 1)

ECONOMICS (7 officials)

Updated Socio-economic studies

Our strategic partner for Rice Research in Colombia

Page 14: Big data ciat april_2014_dj_et_slideshare

Databases: plenty of information

National Survey
• Purpose: Keep the crop sector updated
• N = 738 cropping events

Harvesting records
• Purpose: Technical research (crop management, soils, breeding, biotechnology, physiology)
• N = 3,193 cropping events

“Data is no longer regarded as static, with its usefulness finished once the purpose for which it was collected is achieved”

Information on: planting and harvesting dates, productivity, grain humidity, variety, cropping system

Zones: Caribbean, Andean (Tolima), Plains (Llanos)

Page 15: Big data ciat april_2014_dj_et_slideshare

Adding value to secondary databases. The case of information on cropping events of rice in Colombia

Planting date experiments (field trials)
• Purpose: Technical research on the best sowing date
• N = 272 cropping events

Adding value to secondary databases… but first, merging databases: a challenging task!

Climate
• About 27 weather stations

Page 16: Big data ciat april_2014_dj_et_slideshare

Letting the data speak

“Before Big Data, our analysis was usually limited to testing a small number of hypotheses that we defined well before we even collected the data. When we let the data speak, we can make connections that we had never thought existed”

Cukier and Mayer-Schönberger (2013)

Page 17: Big data ciat april_2014_dj_et_slideshare

Sowing Harvest

a cropping event in rice = 120 days

Climate series for all variables

Crop

time

Hypothesis: yield variation is associated with climate
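To test a hypothesis like this, each cropping event has to be paired with the weather it experienced between sowing and harvest. The following is a minimal sketch of that step, not the presenters' actual pipeline: the 120-day cycle comes from the slide, while the function name `climate_features`, the variable names (`tmin`, `tmax`, `rain`) and the weather values are hypothetical and synthetic.

```python
from datetime import date, timedelta
from statistics import mean

CYCLE_DAYS = 120  # cropping-event length stated on the slide


def climate_features(sowing, daily, cycle_days=CYCLE_DAYS):
    """Summarize daily weather over one cropping event.

    `daily` maps a date to a dict of weather variables. This layout and
    the variable names are assumptions made for illustration only.
    """
    window = [sowing + timedelta(days=i) for i in range(cycle_days)]
    records = [daily[d] for d in window if d in daily]
    if not records:
        raise ValueError("no weather records inside the cropping window")
    return {
        "tmin_mean": mean(r["tmin"] for r in records),
        "tmax_mean": mean(r["tmax"] for r in records),
        "rain_total": sum(r["rain"] for r in records),
        "days_covered": len(records),
    }


# Synthetic weather series (200 days), not real station data.
start = date(2010, 1, 1)
daily = {
    start + timedelta(days=i): {
        "tmin": 22.0,
        "tmax": 32.0,
        "rain": 5.0 if i % 2 == 0 else 0.0,
    }
    for i in range(200)
}

features = climate_features(date(2010, 1, 11), daily)
print(features)
```

The same function could be called once per phenological stage by passing a shorter window, which would give stage-level climate variables for each cropping event.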

Page 18: Big data ciat april_2014_dj_et_slideshare

Letting the data speak

Multivariate analysis for Saldaña (research station, Andean zone): cropping events (2007 to 2012)

FEDEARROZ 733: 27% of productivity variation explained

Lagunas: 47% of productivity variation explained

Varieties perform differently under identical climatic conditions

FEDEARROZ 733

N = 189

N = 63

Cimarrón Barinas

Page 19: Big data ciat april_2014_dj_et_slideshare

Letting the data speak

Climate and analysis based on phenological stages in Saldaña (research station), Andean zone, 2007–2012 (N = about 800 cropping events, irrigated rice)

• The crop sector can suggest the best planting date to farmers
• Assessing the same approach in other stations (environments): new insights for future breeding
• Adaptation strategy for climate change

Climate accounts for 30% to 40% of production variability in irrigated rice
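The slides do not say how the share of variability "accounted for" by climate was computed. One common reading is the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below illustrates that reading with a plain least-squares line standing in for the multivariate and machine learning models used in the study; the helper names and the four data points are invented for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(observed, predicted):
    """Share of the variance in `observed` explained by `predicted`."""
    y_bar = mean(observed)
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Invented numbers: one climate variable and a yield per cropping event.
climate = [1.0, 2.0, 3.0, 4.0]
yields = [2.0, 1.0, 4.0, 3.0]

slope, intercept = fit_line(climate, yields)
predicted = [slope * c + intercept for c in climate]
r2 = r_squared(yields, predicted)
print(f"variance explained: {r2:.0%}")
```

Under this reading, a value such as 0.36 means that 36% of the variation in yield is associated with the climate variable, leaving the remainder to soil, management, variety and noise.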

Page 20: Big data ciat april_2014_dj_et_slideshare

Letting the data speak

Climate and analysis based on phenological stages in the Colombian Plains, 2007–2012 (N = about 500 cropping events, upland rice)

• Rainfall is a critical driving factor for upland rice during grain filling and panicle initiation

• Machine learning (MLP)

Again, climate accounts for 30% to 40% of production variability in upland rice

Page 21: Big data ciat april_2014_dj_et_slideshare

Letting the data speak

Climate and analysis based on phenological stages in the Colombian Plains, 2007–2012 (N = about 200 cropping events, upland rice, variety F174)

• Temperature is a critical driving factor for variety F174 (upland rice) during grain filling
• Machine learning (MLP)

This time climate explained more than 40% of production variability in upland rice, variety F174

Page 22: Big data ciat april_2014_dj_et_slideshare

Case study : working with secondary databases: Seasonal forecast, niñ@s & Big Data. Rice in Colombia (Pompeya- Llanos)

What is likely to happen in March-April-May 2014?

We generated 24 clusters based on more than 500 cropping events

• Seasonal forecast + (data) Best technologies + Big Data analysis = Better adaptive responses to CC and CV

Cluster 7

Rice variety    Productivity (kg/ha)    Cropping events
F174            4,564                   31
FORTALEZA       3,543                   17
F2000           4,977                   8
LAGUNAS         5,052                   6
MOCARI          4,604                   6
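A table like this can be turned into a recommendation by ranking varieties within the cluster. The sketch below uses the Cluster 7 figures from the slide and assumes each productivity value is a per-variety mean over its cropping events. The function names and the minimum-events threshold of 10 are hypothetical choices for illustration, not part of the original analysis.

```python
# Cluster 7 figures from the slide: (variety, productivity in kg/ha, cropping events).
CLUSTER_7 = [
    ("F174", 4564, 31),
    ("FORTALEZA", 3543, 17),
    ("F2000", 4977, 8),
    ("LAGUNAS", 5052, 6),
    ("MOCARI", 4604, 6),
]


def weighted_mean_productivity(rows):
    """Cluster-wide mean productivity, weighting each variety by its event count."""
    total_events = sum(events for _, _, events in rows)
    return sum(kg_ha * events for _, kg_ha, events in rows) / total_events


def best_variety(rows, min_events=1):
    """Highest-productivity variety among those with at least `min_events` events."""
    supported = [row for row in rows if row[2] >= min_events]
    if not supported:
        raise ValueError("no variety meets the minimum number of cropping events")
    return max(supported, key=lambda row: row[1])[0]


cluster_mean = weighted_mean_productivity(CLUSTER_7)
top_any = best_variety(CLUSTER_7)
# Hypothetical threshold: ignore varieties backed by fewer than 10 events.
top_supported = best_variety(CLUSTER_7, min_events=10)

print(f"cluster mean: {cluster_mean:.0f} kg/ha")
print(f"best overall: {top_any}; best with >= 10 events: {top_supported}")
```

Requiring a minimum number of events trades a higher headline yield for more evidence: the top variety by raw productivity is backed by only 6 cropping events, whereas F174 is backed by 31.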

Page 23: Big data ciat april_2014_dj_et_slideshare

What can we do with these results?

FLAR and CIAT Rice Breeders
• Better understanding of yield and its formation under changing, complex, and extremely variable conditions
• New breeding objectives such as low light tolerance, pattern of biomass accumulation, etc.
• Better definition of environments

FEDEARROZ
• Reduce pesticide applications, since it has been demonstrated that there are other factors behind the yield variation
• Establish planting dates and new crop systems based on crop rotation
• Establish a dynamic system for crop management based on short-term prediction to manage the risk associated with changing conditions

CGIAR
• Expand this experience to other crops and areas
• Understand the importance of FARMERS' ORGANIZATIONS in achieving impact
• Interesting concept for CCAFS, GRiSP, MAIZE and others

Page 24: Big data ciat april_2014_dj_et_slideshare

Concluding remarks and perspectives

• The analytical approach used demonstrated that variation in rice productivity can be associated with climate (30–45%)

• Internal cooperation between research areas within CIAT, combined with external cooperation with FEDEARROZ, is a powerful combination. Multidisciplinary work is also key!

• As long as the information is available, the approach can be applied to any other region or crop

• CCAFS is keen to integrate – CN selected CSMS (CIAT-FLAR-IRRI)

• Start collaborations with the yield gap taskforce

• Encourage other partners in LAC to collect information and be part of this idea (e.g. the strategy of FLAR), and add value to information that has already been collected

Jimenez, Daniel (CIAT)
For next time, say more about the optimal ranges; there is something written about this in the thesis. On the results by site, the effect of radiation may be a well-known factor, but the fact that different varieties at the same site respond differently to temperature, radiation, accumulated degree-days, etc. needs more emphasis. The other point is that we give a fast answer: in 8 months of crunching data we reached results similar to those achieved through traditional field trial experimentation.
Page 25: Big data ciat april_2014_dj_et_slideshare

Modern information technology, Big Data, Site-specific Management/Agriculture, digital soil mapping, Terra I and Bio-informatics are already here…

A new Ageekulture can be regarded as complementary to CIAT’s traditional research in order to fulfill the center's mission

Concluding remarks and perspectives

Page 26: Big data ciat april_2014_dj_et_slideshare

THANK YOU!!!

• Patricia Guzman – FEDEARROZ
• Nestor Gutierrez – FEDEARROZ
• Jose Levis – FEDEARROZ
• Gabriel Garces – FEDEARROZ

• Andy Jarvis (CC expert)
• Edgar Torres (Rice Breeder)
• Daniel Jiménez (Agronomist)
• Camila Rebolledo (Plant Physiologist)
• Sylvain Delerce (Agronomist / Math background)
• Hugo Dorado (Statistician)
• Armando Muñoz (Biologist)
• Victor Patiño (Statistician)
• Juan Felipe Rodriguez (The computer science component)

MADR, FEDEARROZ, CCAFS, GRiSP