Spatial Data Integrator - Software Presentation and Use Cases

  • 1. Spatial Data Integrator: software presentation and use cases. National Geographic Community Meeting Day. Ministry of Ecology and Sustainable Development, Ministry of Agriculture. mathieu.rajerison

2. Summary

Software presentation

  • General aspects

3. Place of an ETL inside a data infrastructure
4. The different interface elements of SDI

Demonstration: joining data and managing rejects

  • Configuring the access and creating the schemas

5. Connecting the components inside the workspace
6. Configuring the tMap component
7. Executing the job

Use cases

  • Scheduling the aggregation of different sources of data

8. Merging layers
9. Chaining the quality checking of layers
10. Migrating data to PostgreSQL/PostGIS
11. Other applications

Conclusion

  • Some other functionalities

12. Links

13. 1- Software presentation

14. General aspects

  • Open-source ETL (Extract, Transform and Load) software created by Camptocamp

15. Based on Talend Open Studio
16. It adds a spatial layer to TOS (Talend Open Studio) through geospatial access and processing components
17. Developed in Java: Eclipse environment, uDig elements, GeoTools library, Java Topology Suite, Sextante
18. Place of an ETL in a data infrastructure [Diagram; recoverable labels: Dashboards, Portal]
19. The interface elements: the map window. This window displays geographic data. It is useful for checking the results of a processing step. The window comes from the uDig software.
20. The tool: the business modeler. The business modeler is used to model the job processes. It lets a wide audience take part in designing the data flow and follow the progress of development, without requiring any computer skills. Modelling in this window has no impact on job execution.
21. The interface elements: the repository metadata tab. The repository contains, among other things, the metadata section, which stores the data access parameters. The image shows the different types of data sources. Note that geographic data is not configured in the metadata section (the demonstration covers this).
22. The interface elements: the graphical workspace. The main window is where you create your jobs: you pick components and drop them here. There are different types of relations between components; they are not detailed in this presentation.
23. The interface elements: the components palette. The palette is a kind of toolbox that contains the different components. Spatial Data Integrator adds the geo part to it. The palette is extensible thanks to the contributions of developers; as the software is open source, you can develop your own components.
24. The interface elements: the configuration tab. The bottom window is where you configure the behaviour of each component. It also lets you set the execution parameters of your job.
25. 2- Demonstration: how to manage outer joins
26. Configuring the data access and creating the schemas. The first step consists in configuring the access to your data source.
27. Connecting the components inside the workspace. You place the components in the workspace and connect them.
28. Configuring the tMap component. Here, the city name links the two tables. Two output flows are generated: one for the inner join results, one for the outer join results.
29. The job execution. The job can now be executed. There are two execution modes: statistics mode displays the number of rows for each flow; traces mode displays the content of each flow. Both modes run in streaming.
30. Going further: detecting similarities between rows. Here, we use a fuzzy logic component named tFuzzyMatch. It detects the similarities between rows coming from two different flows. It can be useful to see which rows of a reference (lookup) table best match the outer join results.
31. 3- Use cases
32. Scheduling the aggregation of data. A geographic web portal requires periodically joining data from different sources. Here, the source is an Access database fed by users. We associate its entries with the city objects. [Diagram; recoverable labels: WMS, Access, SHP, BDCARTO, Map Server, Sybase, XML, ..., Client part, SCP, SHP]
33. Scheduling the aggregation of data: SDI task scheduler, crontab for Linux environments, Windows Task Scheduler
34. Merging layers. Imagine a data infrastructure where geographic layers are spread over as many files as there are cities: one file per city. The job merges all these files into a single table. [Diagram; recoverable labels: SHP5, SHP4, SHP3, SHP2, SHP1, SHP]
35. Merging layers
36. Chaining the Quality Control of Digitized Documents. After digitizing a large mass of data, we must run a complete control on it. The geometry of the objects and their attributes must be checked. This task is very time-consuming with the usual mapping software.
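The control of attributes and geometry described for the digitized documents can be pictured as a chain of independent checks, each of which reports errors. The following is a minimal, hypothetical Java sketch of that idea, not the actual SDI or Talend components: the Feature class, the Check interface, the "city" attribute and the closed-ring test are all assumptions made for illustration, and a real job would rely on geospatial components rather than this simplified geometry test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class QualityChain {

    /** Hypothetical feature: attribute values plus one polygon ring of {x, y} points. */
    static final class Feature {
        final Map<String, String> attributes;
        final double[][] ring;

        Feature(Map<String, String> attributes, double[][] ring) {
            this.attributes = attributes;
            this.ring = ring;
        }
    }

    /** One control step: returns an error message, or null when the feature passes. */
    interface Check {
        String apply(Feature feature);
    }

    /** Structure control: the attribute must exist. */
    static Check hasAttribute(String name) {
        return f -> f.attributes.containsKey(name) ? null : "missing attribute '" + name + "'";
    }

    /** Content control: when the attribute exists, its value must not be blank. */
    static Check notBlank(String name) {
        return f -> {
            String value = f.attributes.get(name);
            return (value == null || !value.trim().isEmpty()) ? null : "empty value for '" + name + "'";
        };
    }

    /** Geometry control (simplified assumption): at least 4 points, first point equal to last. */
    static Check closedRing() {
        return f -> {
            double[][] r = f.ring;
            boolean closed = r.length >= 4
                    && r[0][0] == r[r.length - 1][0]
                    && r[0][1] == r[r.length - 1][1];
            return closed ? null : "ring is not closed";
        };
    }

    /** Runs every check on every feature and collects one report line per error. */
    static List<String> run(List<Feature> features, List<Check> checks) {
        List<String> report = new ArrayList<>();
        for (int i = 0; i < features.size(); i++) {
            for (Check check : checks) {
                String error = check.apply(features.get(i));
                if (error != null) {
                    report.add("feature " + i + ": " + error);
                }
            }
        }
        return report;
    }

    /** The chain used in this sketch: structure, then content, then geometry. */
    static List<Check> defaultChecks() {
        return Arrays.asList(hasAttribute("city"), notBlank("city"), closedRing());
    }

    /** Made-up sample data: one valid feature and two faulty ones. */
    static List<Feature> sample() {
        double[][] closed = {{0, 0}, {1, 0}, {1, 1}, {0, 0}};
        double[][] open = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
        return Arrays.asList(
                new Feature(Collections.singletonMap("city", "Lyon"), closed),
                new Feature(Collections.singletonMap("city", " "), closed),
                new Feature(Collections.<String, String>emptyMap(), open));
    }

    public static void main(String[] args) {
        for (String line : run(sample(), defaultChecks())) {
            System.out.println(line);
        }
    }
}
```

On the made-up sample, the report contains three lines: a content error for feature 1, then a structure error and a geometry error for feature 2. Keeping each control as a separate step means a new check can be appended to the chain without touching the others.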
[Diagram; control steps: checking the table structure, checking the content, checking the geometric compliance, comparison to the reference data]
37. Chaining the Quality Control of Digitized Documents. With a single click, SDI runs this series of controls. Reports list the errors related to the geometric compliance of objects or to attribute values. [Diagram: the same four control steps]
38. Chaining the Quality Control of Digitized Documents
39. Chaining the Quality Control of Digitized Documents. Job comparing the Urban Planning Project Map to the Cadastral Reference Data.
40. Chaining the Quality Control of Digitized Documents. tMap joining component, used function and result type:
  • row4.the_geom.symDifference(row2.the_geom): geometry
  • GeometryOperation.GETAREA(row4.the_geom.difference(row2.the_geom)): float
41. Migrating data into a PostgreSQL/PostGIS database. At a regional scale, we want to pool data and load it into a PostgreSQL/PostGIS database management system. [Diagram; recoverable labels: Folder tree, Relational Database System]
42. Migrating data into a PostgreSQL/PostGIS database
43. Other applications

  • Mass geometric processing: splitting or slicing objects using the objects of a different layer

44. Dividing an image into multiple images, each cut along a city contour and named after the city it was cut with
45. Using Talend with GDAL-OGR: conversion to other formats
46. Massive reprojections
47. Extending the possibilities by using auxiliary Java libraries
48. Conclusion

  • Drastically shortens the delay between collecting data and putting it to use

49. Makes it possible to migrate and consolidate spatial data infrastructures
50. Simplifies tasks that are usually time-consuming
51. Avoids errors due to the repetition of manual operations, and enhances the quality of controls
52. A very active community
53. New components are forthcoming
54. Some other functionalities

  • Can read multiple formats, among which GPX, WFS and "contemporary" standards: OpenStreetMap, GeoRSS

55. Multiple ways to access data: SCP, FTP, web services, POP
56. Automatic metadata creation: MEF, XML files for GeoNetwork
57. Raster processing using Sextante
58. Conclusion: links

Learn how to use Talend

  • General documentation, plus documentation dedicated to the components, covering multiple use cases

Learn how to use Spatial Data Integrator

  • A wiki

Meet the community of users

  • The Spatial Data Integrator forum, hosted by Talend