Big Data on Crossrail
John Brzeski, CH2M Hill
15th June 2016, BGA Annual Conference
Good afternoon everyone.
I'm here to talk to you about Big Data on Crossrail.
Project/What data we have
What is Big Data?
Examples and Challenges
How managed and the Future
First I'd like to start by telling you about the project and the data we have. Then I'll attempt to explain what I think Big Data is, and we can then see whether it is possible to call our data Big Data.
I'll show you some examples and explain some of the challenges we have faced along the way.
Finally, I'd like to share with you what I think the future of this data is, something which I believe has the potential to be very interesting.
1
Crossrail
Im sure everyone in the room knows about Crossrail
Scope is - Data in Central section
TBM, SCL, Shafts & Portals
My role to manage data from all
Brief
UCIMS
I'm sure everyone in the room knows all about Crossrail, so I won't go into too much detail on the project itself. The scope of this presentation is to explain how we have handled the data relating to ground movements caused by the Crossrail construction activities in the central section, which consists of 42km of twin-bore TBM tunnels, five new underground stations excavated using SCL methods, and various associated portals and access shafts.
My role on the project was to manage the data being generated by the main works contractors with regard to construction progress and instrumentation and monitoring (I&M). The brief for this part of the project was to provide the client with a centralised database of monitoring and construction information so that interested parties could view and download data. This was to be known as the Underground Construction Information Management System, or more colloquially as UCIMS.
2
The data
I&M
100,000 sensors
Up to sub-hourly readings
InSAR
850,000 Reflectors
6 readings per quarter
Construction Data
TBM, SCL, grouting, excavations, piling, dwalls
200,000 separate events
Asset Information
Utilities, buildings, structures
16,000 assets
Approx. 1 billion records
Mapping
Aerial Images
Ok, so what data do we have?
Our data
Cool representation of our data points
Blue sensors
Yellow construction
Grey InSAR
Others: Assets and Mapping
Total
A hell of a lot, of all types, from many sources. Europe's biggest civil engineering project is reflected in the scale of the data project.
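To put the headline figure in context, here is a minimal back-of-envelope sketch. The 100,000 sensors and the maximum rate of one reading per 15 minutes come from the slides; the three-year monitoring window is my assumption for illustration, not a project figure.

```python
# Back-of-envelope check of the "~1 billion records" figure.
# Sensor count and maximum reading rate are from the slides;
# the three-year window is an assumed illustration.
sensors = 100_000
readings_per_day = 24 * 60 // 15          # 96 readings per sensor per day
years = 3                                  # assumption

max_readings = sensors * readings_per_day * 365 * years
print(f"{max_readings:,}")                 # readings if every sensor ran at peak rate
```

At the peak rate this would exceed ten billion readings; in practice most instruments read far less often, which is consistent with a total of around one billion records.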
3
Is this Big Data?
1 billion readings
Sensors, satellite monitoring, building information, construction events
Generated at max frequencies of 1 reading every 15 minutes
Erroneous values common in I&M
Decision making tool for construction process control
5 Vs
Volume
Variety
Velocity
Veracity
Value
It's a hell of a lot, but can we call it Big Data?
We have to say what Big Data is first.
This nice chap is a bit of a guru.
He uses the 5 Vs to describe the nature of Big Data.
They are not really quantitative.
It's not about the data itself, it's about making use of it.
For us
Is this big data? Yes (has the potential)
Are we managing to get the most VALUE out of the huge VOLUME of data given its VARIETY, VELOCITY and VERACITY?
4
A word (or two) on data format
So, if we're saying we have 1 billion records, what about variety?
This refers to the format
Every system needs to be flexible
Need to be prescriptive
AGS 3.1 was chosen. It gave us the framework to conform with the British Standard for geotechnical data management.
Needed some tweaks
The innovation was the construction groups (see animation).
Specs being reviewed
Used for the last 4 years.
Mixed reception but raised the bar on CRL
In terms of the Five Vs this is really the variety
Its important for any system to be able to cope with various data formats and there are so many out there for different data types. When dealing with multiple contractors it is important to establish a format and more importantly, structure, that all can adhere to in order to use the data effectively.
Crossrail adopted AGS 3.1 for this format. Whilst AGS 3.1 does include the monitoring spec, it did not encompass everything we needed so we added instrument types and readings fields.
The construction data groups were devised in consultation with the main works contractors and with reference to Crossrail specifications. We made an AGS group for each construction type as explained on the previous slide.
The CRL AGS specifications are currently under review by the AGS, but we have used them over the last 4 years to standardise the type of data and metadata we wanted.
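To make the group/heading/data structure concrete, here is a minimal sketch of reading an AGS-style file. The descriptor keywords and the `MONG` group shown follow the generic AGS4 convention and are illustrative only; the tailored CRL AGS 3.1 spec differs in its details.

```python
import csv
import io
from collections import defaultdict

def parse_ags(text):
    """Illustrative parser for a simplified AGS-style file, where each
    row's first field is a descriptor (GROUP, HEADING or DATA).
    This is a sketch of the idea, not the actual CRL schema."""
    groups = defaultdict(list)
    headings = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        tag, fields = row[0], row[1:]
        if tag == "GROUP":
            current = fields[0]               # start of a new data group
        elif tag == "HEADING":
            headings[current] = fields        # column names for this group
        elif tag == "DATA":
            groups[current].append(dict(zip(headings[current], fields)))
    return dict(groups)

sample = '"GROUP","MONG"\n"HEADING","MONG_ID","MONG_TYPE"\n"DATA","INST-001","Settlement point"\n'
print(parse_ags(sample))
```

The point of a prescriptive format like this is that every contractor's file can be loaded by the same code into the same database tables.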
5
Data veracity
Knocks !
Spikes !
Gaps/Noise !
Too much !
Now on to the trustworthiness of our data
The veracity
Some issues in I&M: spikes, knocks, gaps, noise, delays, too much!
Example
Eliminating in live system
The interesting thing is that I'm often asked to sanitise the data.
Understand they exist and develop strategies to see past.
Others in the industry working on this.
What I will say is that we have gone through a continual QA process with the data. Everything is present and correct and in the right position.
Eliminating these errors in a live monitoring system is not a realistic objective; it is more about understanding that they can exist and developing tools to see past them. One of the biggest challenges I faced on Crossrail was getting consumers to approach the analysis of the data from a bigger data perspective. As we all know, monitoring is all about identifying trends and comparing observed behaviours to events. This is not generally easy by viewing series of individual graphs, then looking up events elsewhere, then comparing this to other environmental factors. Everything needs to be in one place so that relationships can be explored, and data should be aggregated to see trends.
In terms of stakeholder assurance, the most successful contracts were those who could take all the data being generated and distil it down to the pertinent points without missing information.
The challenge is to have a system that can still perform the required analysis with the spurious data included without it skewing results. I have often been asked to remove spurious data from the database but it is sometimes valuable to keep it as it shows what type of errors you can expect to see in the data and you can then work on algorithms to identify and discount them. Something I know that others in the industry are also working on.
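As a sketch of the "identify and discount" idea, a simple rolling-median filter can flag suspect readings without deleting them from the database. The window size and threshold here are illustrative assumptions, not the algorithms actually used on Crossrail:

```python
import statistics

def flag_spikes(readings, window=5, threshold=3.0):
    """Flag (rather than remove) readings that deviate from the rolling
    median of their neighbours by more than a threshold.
    Window and threshold values are illustrative assumptions."""
    half = window // 2
    flags = []
    for i, value in enumerate(readings):
        neighbours = readings[max(0, i - half): i + half + 1]
        median = statistics.median(neighbours)
        flags.append(abs(value - median) > threshold)
    return flags

series = [0.1, 0.2, 0.1, 9.8, 0.2, 0.3, 0.2]   # one obvious spike
print(flag_spikes(series))                      # only the spike is flagged
```

Keeping the spurious values alongside the flags means the error patterns stay available for refining the detection rules later, which is exactly why wholesale sanitising can destroy value.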
6
Initial implementation
13000 files per day
2000 downloads a day
500 plots a day
So how did we manage all of this on Crossrail?
The contract for doing this was let along with the Route Wide I&M contract
This is what was delivered. Some screenshots
Lancaster Gate
TBM
Sensor
Profile
Stats
Ramping down
Served construction phase and fulfilled brief
Use evolving
Damage claims
Single source of truth
Not scalable, so value not fully realised
7
What next?
Value
Design
Academic Research
Future Projects
Construction Control
Stakeholder Assurance
Data Management
Damage Assessments
So, given the scaling issue, what have we done?
Rebuilding of the database from source AGS
Focus has been on connectivity & scalability
Scalability goes both ways
APIs for dashboard
See TCR example
API for connection via Esri ArcMap and other GIS
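As an illustration of the connectivity idea, a dashboard or GIS client would typically pull readings by building a query like the one below. The host, endpoint and parameter names are entirely hypothetical; the real UCIMS API is not described in this talk.

```python
from urllib.parse import urlencode

BASE = "https://example.org/ucims/api"   # hypothetical host, for illustration

def readings_url(instrument_id, start, end):
    """Build the query URL a dashboard or GIS client might issue.
    Parameter names are illustrative, not the actual UCIMS API."""
    query = urlencode({"instrument": instrument_id, "from": start, "to": end})
    return f"{BASE}/readings?{query}"

print(readings_url("INST-001", "2016-01-01", "2016-01-31"))
```

Exposing the data through a stable API like this, rather than through one fixed web front end, is what lets dashboards, ArcMap and other GIS tools all consume the same single source of truth.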
Conclusion
In conclusion, the most important V is Value.
Clear that a lot of value has been delivered
Always the case that data is forgotten
Aim is to make data open
Data.gov.uk: BGS, OS, TfL
Challenges
Making the data available to all would ensure the full value is extracted
Some of the values to be gained shown on slide
I'd love to hear from you if you have experience in this.
8
9