Work-package 6 Statistical integration Allard de Wit & Raymond van der Wijngaart

Work-package 6Statistical integration

Allard de Wit & Raymond van der Wijngaart

Content of this workshop

Work package description About crop yield forecasting Yield forecasting models Statistical forecasting assumptions The CGMS Statistical Toolbox

Work package description

Statistical integration for an E-Agriculture service

To integrate the crop yield indicators generated by modelling platforms and those derived from remote sensing information. The indicators generated from the previous work-packages will be ingested into a statistic analysis platform which has to integrate different statistical approaches, to be regionally flexible and specific and to be reproducible for a posterior evaluation.

Work package description

Objectives Implement the CST for crop yield forecasting for

the target regions Load database with yield predictors derived

from crop modelling and/or remote sensing based approaches

Organize a workshop for piloting of the CST and improve understanding and experience with the tool by the end users.

Raise awareness and gain experience with the CST which should lead to improved yield forecasts.

Work package descriptionNo Work Package title Typ

eLeader Person-

monthsstart end

6 Statistic integration

Alterra 11 24 36

6.1 Statistic platform design

RTD Alterra 6.5 24 36

6.2 Statistic platform pilot

RTD INRA 4.5 30 36

No Deliverable name Nature Dissemination

Delivery date

6.1.1 CST setup for target regions

Report Public 36

6.2.1 Piloting and workshop report

Report Public 36

About crop yield forecasting

Estimates of crop yield or production for the current season before the harvest

For administrative regions Often calibrated against regional statistical

data Continuous “assimilation” of data as the

growing season progresses Improve accuracy during the growing season Better then a baseline forecast (i.e. average

or trend)

About crop yield forecasting

“the art of identifying the factors that determine the spatial and inter-annual variability of crop yields” (René Gommes, FAO 2003).

About those factors“ … AND IF IT GETS ENOUGH RAIN, AND SUN, AND IF IT ISN'T KILLED BY HAIL, AND IF IT ISN’T DAMAGED BY FROST, AND IF WE CAN GET IT OFF BEFORE IT’S COVERED BY SNOW, AND IF WE GET IT TO THE ELEVATORS, AND IF THE TRAINS ARE RUNNING, AND IF THE GRAIN HANDLERS AREN’T IN STRIKE, AND IF … “

Type of forecasting systems

Judgement, based on stakeholders that reach consensus on the expected yield given all available information

Statistical, based on functional relationships between a crop yield indicator and the crop yield statistics (e.g. time trend models and/or CGMS simulation result)

Combinations of the above (European MARS system)

Parameterize the model in time or in space?

Build a time-series model for one region and multiple years

Build a spatial model for multiple regions and one year

Combine the two above (even more difficult!)

Reasons for preferring a time-series modelSeveral effects: Socio-economic factors differ between

regions (example: Germany 7.24 ton/ha, Poland 3.44

ton/ha). Crop yields often show an upward (or

downward) trend over the years Magnitude of simulated year-to-year

variability in crop yield differs from variability in regional statistics.

Statistical forecasting models

Parametric models: Regression analyses: (multiple -) regression

between crop yield statistics and crop indicators Non-parametric models:

Scenario analyses: Find similar years and use these to forecast

Neural networks: train a neural network to recognize yield-indicator relationships

Statistical forecasting assumptions

Uses time-series of historic statistics and crop yield indicators

Parameterizes a forecasting model explaining the relation using a best fit criterion (mean squared error)

Crop yield = f( time-trend + indicators(i1, i2, …) ) Uses (multiple) linear regression The model parameters are derived at several time

steps during the growing season (i.e. each dekad) Forecast model is then applied in prognostic mode

to forecast the current season’s yield


0

1970

year

simulated yield

observed yield

trend

2010 2011

simulated indicator

forecasted yield

2000

Long term technological trend CGMS predicts deviation from trend caused by weather


Advantages: Simple, understandable Hypothesis testing (statistical significance) Provides models with predictive power

Example of regression analyses for winter-wheat in Hebei province

Winter-wheat yield -- Hebei province

2.5

3

3.5

4

4.5

5

5.5

6

1992 1994 1996 1998 2000 2002 2004

Year

Yie

ld [

ton

/ha

]

R 2̂ = 0.521

Time-trend analyses

Strange value, discarded!

Compare residuals with CGMS resultsHebei winter-wheat - Deviations from the trend

-0.5

-0.4-0.3

-0.2-0.1

0

0.10.2

0.30.4

0.5

1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004

Year

Yie

ld a

no

ma

ly [

ton

/ha

]

Hebei winter-wheat - CGMS pot. yield storage organs

0

1000

2000

3000

4000

5000

6000

1994 1995 1996 1998 2000 2001 2002 2003

Year

Bio

mas

s [k

g/h

a]

Fit linear model through yield anomaliesResiduals vs CGMS simulation results

-0.5

-0.25

0

0.25

0.5

3000 3500 4000 4500 5000 5500

CGMS simulated yield [kg/ha]

Yie

ld a

no

mal

y [t

on

/ha]

R2 = 0.601

Build and test complete forecasting model

Our complete forecasting model is specified by:

1. The time trend model (R2 = 0.521) +2. The yield anomaly model as a function of

the CGMS simulation results (R2 = 0.601)

The CGMS statistical toolbox (CST)

Manual analyses is error prone CST does several analyses: time trend analyses,

(multiple) regression analyses and scenario analyses

Each model is tested whether it improves prediction beyond the trend only

Hypothesis testing for determining significance of results

Lists the models that were analyzed

Lists the statistical indicators of each

model

Lists the t-values of each component in each model

(significance testing)

Coloring of according to

statistical significance and sign of the model

coefficient

Documents

Work-package 6 Statistical integration Allard de Wit & Raymond van der Wijngaart