BoM / CAWCR. Text Generation in the Next-Gen Forecast System (GFE) J Bally & T Leeuwenburg

BoM / CAWCR.

Text Generation in the Next-Gen Forecast System (GFE)

J Bally & T Leeuwenburg


Background & Drivers.... Next-Gen Forecast System

Better use of NWP models

Systematic forecast process

Temporal and spatial detail

Can verify everything

Efficiency gains

Many new services: grids, graphics and text all from the same weather database


Nowcast: TIFS (objects) On-the-fly, shallow, slot filling


Text Generation… introduction

Most sophisticated meteorological text generation system ???

Large jump from “slot filling” systems (TIFS, TC, Scribe etc)

Text as a network of nodes

Goal directed multi-pass processing

64,000 lines of Python - > 15 p-yr development


Text Generation : example goals

Try for <= three weather sub-phrases (2 for wind etc.)

Describe the weather trends, rather than a sequence

Describe changes in weather only if the impact differs substantially

Try for elegant sentence structure; split out unusual weather types if they are not part of the trend

Must-goals (guarantees) vs should-goals

……….etc etc


Text Generation…multi-pass processing


Text Generation.... overview

Information representation

Data Gathering

Information Processing and Document Planning

Mapping to Words ( Surface Realisation )

Post Processing


Information Representation: Scalars, Vectors, Weather……

[Figure panels: PoP, Sky, Weather, Temp / Wind]


Information Representation: Hazards

Hazards


Data Gathering.... Grid sampling

Use statistics for scalars and vectors:

Statistics                                                          Element
30th percentile wind speed, 90th percentile wind speed              Wind Phrase
25th and 75th percentile wind directions centred on average dir     Wind Phrase
90th percentile, 10th percentile                                    Sea Height
90th percentile, 10th percentile                                    Swell Height
25th and 75th percentile swell directions centred on average dir    Swell Direction
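The statistics listed above could be computed along the following lines. This is a minimal sketch, not the GFE code: the function names, the nearest-rank percentile method, the reading of "centred on average dir" as offsets from a circular mean, and the sample values are all assumptions.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def mean_direction(dirs_deg):
    """Circular mean of directions in degrees, in [0, 360)."""
    s = sum(math.sin(math.radians(d)) for d in dirs_deg)
    c = sum(math.cos(math.radians(d)) for d in dirs_deg)
    return math.degrees(math.atan2(s, c)) % 360

def direction_range(dirs_deg, lo=25, hi=75):
    """Percentiles of direction, taken on signed offsets from the circular mean
    so that samples either side of north (e.g. 350 and 10) stay together."""
    centre = mean_direction(dirs_deg)
    offsets = [((d - centre + 180) % 360) - 180 for d in dirs_deg]
    return ((centre + percentile(offsets, lo)) % 360,
            (centre + percentile(offsets, hi)) % 360)

speeds = [8, 10, 12, 12, 15, 18, 20, 22, 25, 30]   # knots, made-up sample
print(percentile(speeds, 30), percentile(speeds, 90))   # 12 25
print(direction_range([350, 355, 0, 5, 10]))            # roughly (355, 5)
```

Working on offsets from the mean avoids the wrap-around problem that a plain percentile of compass bearings would have near north.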

What about weather and hazards?

How to summarise a bit of patchy rain, isolated severe thunderstorms and raised dust?

Let's concentrate on the weather ..........


Data Gathering.... Grid sampling, e.g. 3 hr time slices

[Figure: map of one time slice with areas labelled NoWx, Sct SH -, Wide SH m, Patchy RA m and Sct TS n; the key is bracketed into "Isolated Showers" and "Isolated Thunderstorms".]

Key            Number of Points*   Percentage
Wide SH m      10,000              10%
Sct SH -       34,533              35%
Patchy RA m    7,644               8%
Sct TS n       10,000              10%
No Weather     45,000              45%

Reported coverage = Σ (internal coverage × grid point count) / total points
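The reported-coverage formula can be sketched as below. The fractional value attached to each coverage term is hypothetical (the slides do not give them), as are the names; the total of 100,000 points is inferred from the table, where 10,000 points correspond to 10%.

```python
# Hypothetical fraction of an area actually affected, per coverage term.
# These numbers are assumptions for illustration only.
INTERNAL_COVERAGE = {"Isol": 0.15, "Sct": 0.4, "Patchy": 0.4, "Wide": 0.8}

def reported_coverage(areas, total_points):
    """areas: iterable of (coverage_term, grid_point_count) for one weather type."""
    weighted = sum(INTERNAL_COVERAGE[term] * count for term, count in areas)
    return weighted / total_points

# Showers in the example slice: widespread over 10,000 points, scattered over 34,533.
showers = [("Wide", 10_000), ("Sct", 34_533)]
print(round(reported_coverage(showers, 100_000), 3))   # 0.218
```

Under these assumed weights, showers cover about 22% of the area overall; the coverage term that such a value maps to in the final text depends on thresholds the slides do not state.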


Data Gathering.... Grid sampling

[Figure: the same time-slice map, with areas labelled NoWx, Sct SH -, Wide SH m, Patchy RA m and Sct TS n.]

Reported coverage = Σ (internal coverage × grid point count) / total points

Reported intensity = Σ (intensity contribution × grid point count) / total affected grid points

A similar calculation collapses similar weather types (SH / DZ / RA).
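A sketch of the reported-intensity calculation follows. The numeric contribution assigned to each intensity code and the nearest-value mapping back to a code are assumptions made for illustration; the slides give only the weighted-average form.

```python
# Hypothetical numeric contribution per intensity code
# ("-" light, "m" moderate, "+" heavy).
INTENSITY_VALUE = {"-": 1.0, "m": 2.0, "+": 3.0}

def reported_intensity(areas):
    """areas: iterable of (intensity_code, grid_point_count) for one weather type.
    Returns the point-weighted mean intensity, or None if nothing is affected."""
    areas = list(areas)
    affected = sum(count for _, count in areas)
    if affected == 0:
        return None
    weighted = sum(INTENSITY_VALUE[code] * count for code, count in areas)
    return weighted / affected

def intensity_code(value):
    """Map an averaged value back to the nearest intensity code."""
    return min(INTENSITY_VALUE, key=lambda code: abs(INTENSITY_VALUE[code] - value))

showers = [("m", 10_000), ("-", 34_533)]   # moderate and light shower areas
avg = reported_intensity(showers)
print(round(avg, 2), intensity_code(avg))   # 1.22 -
```

Note that the denominator is the number of affected points, not the total, so large weather-free areas do not dilute the intensity.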


Data Gathering.... Grid sampling

[Figure: the same time-slice map, with areas labelled NoWx, Sct SH -, Wide SH m, Patchy RA m and Sct TS n.]

Filtering the Weather List

Wx Types              Coverage Threshold
SN, SNSH, SL, SLSH    2.5% of total area
TS, FG, MI            5% of total area
FR                    5% of the area below 500 m
All other types       15% of the total area
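The threshold table can be applied as a simple lookup. In this sketch the names are hypothetical, the comparison is assumed to be inclusive, and the caller is assumed to have already computed the FR coverage against the area below 500 m rather than the total area.

```python
# Minimum coverage (as a fraction of the reference area) for a weather type
# to be kept in the list, taken from the table above.
THRESHOLDS = {
    "SN": 0.025, "SNSH": 0.025, "SL": 0.025, "SLSH": 0.025,
    "TS": 0.05, "FG": 0.05, "MI": 0.05,
    "FR": 0.05,   # reference area is the area below 500 m
}
DEFAULT_THRESHOLD = 0.15   # all other types

def filter_weather(coverage_by_type):
    """Keep only the types whose coverage fraction meets their threshold."""
    return {wx: cov for wx, cov in coverage_by_type.items()
            if cov >= THRESHOLDS.get(wx, DEFAULT_THRESHOLD)}

sample = {"SH": 0.22, "RA": 0.08, "TS": 0.06, "SN": 0.01}
print(filter_weather(sample))   # {'SH': 0.22, 'TS': 0.06}
```

The lower thresholds for snow, thunderstorms and fog mean that higher-impact weather survives filtering even when it affects only a small part of the area.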


Information Processing....

Embedded Local Effect >
Winds: Easterly 10 to 20 knots decreasing to 10 to 15 knots around midday then increasing to 15 to 20 knots during the afternoon, locally up to 30 knots in the east.
Seas: Below 0.5 metres increasing to 0.5 to 1 metres by early evening, locally up to 1.5 metres in the east.

Forecast-Split Local Effect >
In the east:
Winds: Easterly 10 to 20 knots increasing to 20 to 30 knots during the afternoon.
Seas: 0.5 to 1 metres, increasing up to 1.5 metres by early evening.
Elsewhere:
Easterly 10 to 20 knots decreasing to 10 to 15 knots around midday then increasing to 15 to 20 knots during the afternoon.
Seas: Below 0.5 metres increasing to 0.5 to 1 metres by early evening.


Information Processing....

Check for Local Effects …. Scalar Metrics

Element         Stat   Scale Value / Consideration   Embedded Value At 0.5
Wind DIR        AVG    135 deg                       90 deg
Wind SPEED      MAX    15 kt                         10 kt
Sea HEIGHT      MAX    1.5 m                         1.0 m
Swell DIR       AVG    135 deg                       90 deg
Swell HEIGHT    MAX    1.5 m                         1.0 m


Information Processing....

Check for Local Effects

Name         Wind Speed   Wind Dir   Sea   Swell Height   Swell Dir   Avg
East-West    0            0          0     2              2           0.8
Far West     1            1          2     2              2           1.6
Far East     0            0          0     0              0           0
Inshore      3.0          0          0     0              0           0.6
Offshore     4.0          0          0     0              0           0.8
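The Avg column in the table is the plain mean of the five per-element scores, which the short sketch below reproduces. How the average is then used (for example, picking the highest-scoring candidate) is an assumption here, as are the names.

```python
# Scores per candidate local-effect area, in the column order
# (wind speed, wind dir, sea, swell height, swell dir), copied from the table.
SCORES = {
    "East-West": (0, 0, 0, 2, 2),
    "Far West":  (1, 1, 2, 2, 2),
    "Far East":  (0, 0, 0, 0, 0),
    "Inshore":   (3.0, 0, 0, 0, 0),
    "Offshore":  (4.0, 0, 0, 0, 0),
}

def average_score(scores):
    """Mean of the per-element local-effect scores."""
    return sum(scores) / len(scores)

for name, scores in SCORES.items():
    print(f"{name}: {average_score(scores):.1f}")

# Assumed selection rule: report the candidate with the highest average.
best = max(SCORES, key=lambda name: average_score(SCORES[name]))
print(best)   # Far West
```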


Information Processing.... Pre-Process Weather......

Arrange statistics in time order; Combine where appropriate, maintaining ranges; Separate co-reportable types

Subphrases before preProcessWx:

0-3am  3-6am  6-9am     9-noon    noon-3pm  3-6pm  6-9pm  9-night
NoWx   NoWx   SH+ + DU  SH+ + TS  SH+ + TS  SHm    SHm    NoWx

Subphrases after preProcessWx:

0-3am  3-6am  6-9am     9-noon    noon-3pm  3-6pm  6-9pm  9-night
NoWx   NoWx   |-------------- (SH+, SHm) --------------|
                        |------ TS ------|
              |- DU -|
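The separation of co-reportable types into time spans might look like the sketch below. It is an illustration of the idea only: the data layout, the function name and the use of slice indices (0 for 0-3am through 7 for 9-night) are assumptions, and an empty slice stands for NoWx.

```python
def combine_slices(slices):
    """slices: one list per time slice of (wx_type, intensity) pairs.

    Returns (wx_type, first_slice, last_slice, intensities) for every unbroken
    run of slices in which a type occurs, keeping the range of intensities seen.
    """
    runs = []
    open_runs = {}   # wx_type -> index in runs of the run still being extended
    for i, entries in enumerate(slices):
        present = {}
        for wx, intensity in entries:
            present.setdefault(wx, []).append(intensity)
        for wx, intensities in present.items():
            if wx in open_runs:
                kind, start, _, seen = runs[open_runs[wx]]
                merged = seen + [x for x in intensities if x not in seen]
                runs[open_runs[wx]] = (kind, start, i, merged)
            else:
                open_runs[wx] = len(runs)
                runs.append((wx, i, i, list(intensities)))
        # A type that is absent from this slice ends its run.
        open_runs = {wx: idx for wx, idx in open_runs.items() if wx in present}
    return runs

day = [[], [],
       [("SH", "+"), ("DU", None)],
       [("SH", "+"), ("TS", None)],
       [("SH", "+"), ("TS", None)],
       [("SH", "m")], [("SH", "m")],
       []]
for run in combine_slices(day):
    print(run)
```

For the example day this yields showers over slices 2 to 6 with the intensity range (+, m), dust in slice 2 only, and thunderstorms over slices 3 to 4, matching the bars on the slide.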


Information Processing.... Simplify Weather......

Collapse Ranges

0-3am 3-6am 6-9am 9-noon noon-3pm 3-6pm 6-9pm 9-night

Subphrases with ranges (after preProcessWx):   NoWx  (SHm, RAm)  (RAm, RA+)  SH-
Subphrases with ranges collapsed:              NoWx  SHm  RA+  SH-


Information Processing.... Merge Weather …. telling little white lies

0-3am 3-6am 6-9am 9-noon noon-3pm 3-6pm 6-9pm 9-night

Subphrases before mergeOverlap:   NoWx SH + TS SH
Subphrases after mergeOverlap:    NoWx SH + TS

Subphrases before mergeGap:       NoWx Isol SH- NoWx Sct SH- Areas RAm
Subphrases after mergeGap:        NoWx Isol SH- Sct SH- Areas RAm
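The mergeGap step can be pictured as dropping a short weather-free gap between two weather subphrases. The sketch below is one hedged reading of that idea: the function name, the gap-length limit, the choice to give the gap's time to the preceding subphrase, and the slice counts in the example are all assumptions, since the slide does not show spans.

```python
def merge_gap(subphrases, max_gap=1):
    """subphrases: list of (label, n_slices) in time order.

    A NoWx entry of at most max_gap slices that sits between two weather
    subphrases is removed and its time is added to the preceding subphrase.
    """
    merged = []
    last = len(subphrases) - 1
    for i, (label, length) in enumerate(subphrases):
        is_gap = (label == "NoWx" and length <= max_gap
                  and 0 < i < last
                  and subphrases[i - 1][0] != "NoWx"
                  and subphrases[i + 1][0] != "NoWx")
        if is_gap:
            prev_label, prev_len = merged[-1]
            merged[-1] = (prev_label, prev_len + length)
        else:
            merged.append((label, length))
    return merged

before = [("NoWx", 2), ("Isol SH-", 1), ("NoWx", 1), ("Sct SH-", 2), ("Areas RAm", 2)]
print(merge_gap(before))
```

The leading NoWx period is kept because it is not enclosed by weather; only the one-slice gap between the two shower subphrases is absorbed, which is the "little white lie" the slide title refers to.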


Recall …multi-pass processing


Information Processing....

Have we tried every processing step enough?

Have we achieved our goals for level of detail?

Can Adjust Detail by…..

Looking for more local effects?.. Split forecast?

More aggressive sub-phrase combining

Coarser sampling strategy

Start again !
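The "adjust detail and start again" loop can be sketched as follows. This is a toy illustration of goal-directed multi-pass processing, not the GFE algorithm: the functions, the block sizes, and the use of coarser sampling as the only adjustment are assumptions, and the should-goal checked is the "at most three weather sub-phrases" goal from the example goals.

```python
from collections import Counter

def build_plan(slices, block):
    """Sample the slices in blocks of `block`, keep the most common label in
    each block, then join consecutive equal labels into one sub-phrase."""
    plan = []
    for start in range(0, len(slices), block):
        label = Counter(slices[start:start + block]).most_common(1)[0][0]
        if not plan or plan[-1] != label:
            plan.append(label)
    return plan

def generate(slices, max_subphrases=3, block_sizes=(1, 2, 4)):
    """Retry with progressively coarser sampling until the should-goal
    (no more than max_subphrases sub-phrases) is met or options run out."""
    plan, used = [], None
    for block in block_sizes:
        plan, used = build_plan(slices, block), block
        if len(plan) <= max_subphrases:
            break
    return plan, used

day = ["NoWx", "SH-", "NoWx", "SH-", "SHm", "RAm", "RAm", "RAm"]
print(generate(day))   # (['NoWx', 'SHm', 'RAm'], 2)
```

The first pass yields six sub-phrases, which fails the goal, so the second pass restarts with two-slice blocks and succeeds with three.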


Mapping to words.... Process Trends….

Recognise and Summarise trends

Subphrases before ProcessTrends:

0-3am  3-6am  6-9am  9-noon  noon-3pm  3-6pm  6-9pm  9-night
NoWx   NoWx   NoWx   SH-     SHm       RAm    RAm    RAm

Subphrases after ProcessTrends:

NoWx   SH- developing > ..skip.. > increasing to RAm
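Trend recognition of this kind can be sketched as: collapse repeated slices, test whether the sequence rises steadily, and if so report only the onset and the end point. The severity ranking, function names and phrase wording below are assumptions chosen to reproduce the example, not the system's actual rules.

```python
# Hypothetical severity ranking, used only to decide whether a sequence rises.
SEVERITY = {"NoWx": 0, "SH-": 1, "SHm": 2, "RAm": 3, "RA+": 4}

def collapse(slots):
    """Join consecutive identical slots into one entry."""
    out = []
    for wx in slots:
        if not out or out[-1] != wx:
            out.append(wx)
    return out

def summarise_rising_trend(slots):
    """Reduce a steadily rising sequence to its onset and end point;
    any other sequence is returned collapsed but otherwise unchanged."""
    seq = collapse(slots)
    ranks = [SEVERITY[wx] for wx in seq]
    rising = all(a < b for a, b in zip(ranks, ranks[1:]))
    if not rising or len(seq) < 3:
        return seq
    phrases = []
    if seq[0] == "NoWx":
        phrases += ["NoWx", f"{seq[1]} developing"]
    else:
        phrases.append(seq[0])
    phrases.append(f"increasing to {seq[-1]}")   # intermediate steps are skipped
    return phrases

day = ["NoWx", "NoWx", "NoWx", "SH-", "SHm", "RAm", "RAm", "RAm"]
print(summarise_rising_trend(day))
# ['NoWx', 'SH- developing', 'increasing to RAm']
```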


Mapping to words....

Connectors: Increasing / Decreasing, Becoming / Tending, Developing / Clearing

Winds W to NW’ly at 15 to 25 knots tending W to SW’ly then increasing to 30 knots.

Isolated showers developing during the morning then increasing to heavy widespread rain…..


Mapping to words....

Time reporting: Transition (change verbs), Over-time (nouns), Mixed (trend verbs)

Winds W to NW’ly at 15 to 25 knots tending W to SW’ly around noon then increasing to 30 knots.

Morning Fog. Isolated showers developing during the afternoon then increasing to widespread rain…


Post Processing....

Post-Process Phrases

- string replacements to cover limitations

- “band-aid”… e.g.

Early frost. Early fog. >> Early frost and fog.

Remove repeated words, e.g.

W to NW’ly winds becoming NW’ly
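Post-processing of this kind is typically a handful of regular-expression substitutions over the finished text. The sketch below is an assumption-laden illustration: the patterns and function names are invented here, the first reproduces the frost/fog example, and the second handles only immediately repeated words, which is likely simpler than what the system does.

```python
import re

def merge_early_phrases(text):
    """Join two consecutive 'Early X.' sentences into 'Early X and Y.'"""
    return re.sub(r"\bEarly (\w+)\. Early (\w+)\.", r"Early \1 and \2.", text)

def drop_repeated_words(text):
    """Collapse a word that is immediately repeated: 'to to' -> 'to'."""
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)

def post_process(text):
    """Apply the string fixes in a fixed order."""
    for fix in (merge_early_phrases, drop_repeated_words):
        text = fix(text)
    return text

print(post_process("Early frost. Early fog. Winds increasing to to 30 knots."))
# Early frost and fog. Winds increasing to 30 knots.
```

Keeping such fixes in one ordered list makes each "band-aid" easy to find and remove once the underlying generation step is improved.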


Example District Forecast... inc local effects


Products: all forecasts are in XML ...


QC.. with some help from our testing infrastructure ...


Change Management ....

Importance of specifications

Agreed? policies

Big change in the role of forecasters

Forecaster edits for style and/or substance

Change management


The End

Text Generation in the GFE