Data.gov: Review of New and Existing Applications
Brand K. Niemann, Rich W. LaValley, Dr. W. Chris Hardy
Presentation to the Data Architecture Subcommittee (DAS)
September 10, 2009
Advanced Concepts and Integrated Systems (ACIS)
SAIC
© 2008 Science Applications International Corporation. All rights reserved. SAIC and the SAIC logo are registered trademarks of Science Applications International Corporation in the U.S. and/or other countries.
Overview
• Review Data Sets
• Review and Demonstrate New and Existing Applications
• Feedback and Comments
Summary
Tools and Data
COTS & GOTS, Desktop & Web
Review a variety of applications from a variety of sources
Review Datasets and Application Sources
Data Sources
• http://www.data.gov/details/92 (June 2009)
Application Sources
• http://data-gov.tw.rpi.edu/wiki/Main_Page
• http://wiki.sunlightlabs.com/Main_Page
• http://www.gov2expo.com/gov2expo2009
By the Numbers
Data Sources
• www.data.gov: 788
• Data-gov.tw.rpi.edu (http://data-gov.tw.rpi.edu/wiki/Demos): Applications: 11; Converted to RDF: 16
• Apps for America 2 (http://sunlightlabs.com/): 46
• Gov2.0 Expo (http://www.gov2expo.com/gov2expo2009/public/schedule/presentations): 35 (5 categories)
• Other: https://analyzethe.us/, Palantir Government
See also the Data.gov dashboard: http://spreadsheets.google.com/pub?key=tchvwRko8_bEQ9c36b33fOA&gid=10
The Giant Warehouse of Data
[Chart: File Type Contributed]
The file types contributed influence the kinds of applications that are developed.
Review the Challenge (Open Government)
"Give us the tools and we can do it ourselves. Lend your hand and your coding skills." (Tim O'Reilly)
http://blip.tv/file/2552824
1. Be an organizer
2. Volunteer skills: developers parse a state, covering all 50 states
3. Provide specific results; work together
4. Visualize data
(Clay Johnson, Sunlight Labs)
http://blip.tv/file/2075676
5. Visually explore and interact with data to facilitate sense making (DAS, 9/10/2009)
Age of Visualization and Analysis
Emerging Trends in Data Visualization, July 30, 2009, DM Radio: http://www.information-management.com/dmradio/-10015788-1.html
Heat Maps, Tag Clouds, Concepts, Layers, Widgets, Dashboards, Sliders, Filters
View of data over time is a story
http://www.smartmoney.com/investing/bonds/the-living-yield-curve-7923/
The Yield Curve
View of data over time is a story (temporal and geospatial characteristics)
http://www.palantirtech.com/government/analysis-blog/uncovering-a-bot-net-exploring-router-data-using-palantir
Efficient Access of Data Sources
• Data Imaging
  • Direct, ad hoc extraction of selected data elements from a native file
  • Representation of the content of the extracted data as an integer matrix
    • Dates become integers in YYYYMMDD format
    • Time becomes the number of seconds after midnight
    • Character names/descriptions are assigned index values in a table
    • Numerical values are expressed as integers with an understood base
• Benefits
  • Minimal configuration overhead for data handling
  • Significant compression of working files without loss of content
  • Substantial acceleration of data retrieval and analysis, achieved by:
    • Reducing tests to integer (one-word) compares
    • Exploiting matrix-based processing efficiencies
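The data-imaging steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the four-column image layout (date, time, router index, fault index), the `IndexTable` conversion table, the choice of the second (event) timestamp column, and the reading of "understood base" as a fixed decimal scale factor are all hypothetical. It packs dates as YYYYMMDD integers, times as seconds after midnight, and strings as table indexes, then answers an error-count query with one integer compare per record, using three records taken from the router example that follows.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def encode_date(dt: datetime) -> int:
    """Date as a single integer in YYYYMMDD form (2007-01-07 -> 20070107)."""
    return dt.year * 10000 + dt.month * 100 + dt.day


def encode_time(dt: datetime) -> int:
    """Time of day as the number of seconds after midnight."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second


def encode_number(value: float, scale: int) -> int:
    """Numeric value as an integer with an understood base.

    Assumption: the base is a fixed scale factor, so 12.34 with
    scale=100 is stored as 1234.
    """
    return round(value * scale)


class IndexTable:
    """Hypothetical conversion table: each distinct string gets an integer index."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}
        self.values: List[str] = []

    def encode(self, text: str) -> int:
        if text not in self.index:
            self.index[text] = len(self.values)
            self.values.append(text)
        return self.index[text]

    def decode(self, i: int) -> str:
        return self.values[i]


# Hypothetical image layout: one row per record, all integers.
DATE, TIME, ROUTER, FAULT = range(4)


def image_row(timestamp: str, router: str, fault: str,
              routers: IndexTable, faults: IndexTable) -> List[int]:
    """Extract selected elements of one native record into an integer row."""
    dt = datetime.strptime(timestamp, "%m/%d/%Y %H:%M:%S")
    return [encode_date(dt), encode_time(dt),
            routers.encode(router), faults.encode(fault)]


def error_counts_by_router(image: List[List[int]], fault: int) -> Counter:
    """Count records per router index for one fault code via integer compares."""
    return Counter(row[ROUTER] for row in image if row[FAULT] == fault)


if __name__ == "__main__":
    routers, faults = IndexTable(), IndexTable()
    # Three records from the router example: event time, serial number, fault code.
    image = [
        image_row("1/1/2007 3:36:00", "00-08-74-52-83-98", "4Z55", routers, faults),
        image_row("1/7/2007 12:31:00", "00-08-74-52-73-79", "4Z55", routers, faults),
        image_row("1/7/2007 17:36:00", "00-08-74-52-73-79", "4Z55", routers, faults),
    ]
    counts = error_counts_by_router(image, faults.encode("4Z55"))
    for router_id, n in counts.most_common():
        print(routers.decode(router_id), n)
```

Only the conversion tables hold strings; every test inside the query loop compares two integers, which is the property the Benefits bullets attribute to data imaging.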
Efficient Access of Data Sources
Example: 479,921 records / 69,588,548 bytes
Labeled columns include Date/Time of X-mission, Router Serial Number, and Fault Code
1/7/2007 7:00:00 6 1 1/7/2007 6:29:00 00-08-74-36-37-21 29.224.42.199 80 1545 9S26
1/1/2007 4:00:00 3 2 1/1/2007 3:36:00 00-08-74-52-83-98 129.224.42.199 443 1970 4Z55
1/2/2007 14:01:00 14 3 1/2/2007 14:01:00 00-08-74-52-73-79 129.251.240.179 443 1073 2J89
1/7/2007 0:01:00 0 1 1/7/2007 0:01:00 00-08-74-52-83-98 129.251.240.179 80 7095 4Q66
1/7/2007 22:00:00 21 1 1/7/2007 21:01:00 00-08-74-06-36-24 129.3.1.91 22 2014 7X44
1/5/2007 8:01:00 8 6 1/5/2007 8:01:00 00-08-74-08-66-92 129.3.1.91 443 5821 5G49
1/7/2007 13:00:00 12 1 1/7/2007 12:31:00 00-08-74-52-73-79 129.40.42.144 22 1605 4Z55
1/7/2007 18:00:00 17 1 1/7/2007 17:36:00 00-08-74-52-73-79 129.40.42.144 443 922 4Z55
1/5/2007 12:01:00 12 6 1/5/2007 12:01:00 00-08-74-52-83-98 129.66.124.144 80 2825 3D39
1/5/2007 6:01:00 6 6 1/5/2007 6:01:00 00-08-74-06-36-24 129.9.137.79 21 3653 1G29
. . .
Data Elements of Interest: 21,596,488 bytes
Image Generation: 44.5 secs
Image Size:
- 7,678,736 bytes plus 12,592 bytes in conversion tables
- 9:1 compression over the total data set
- 2.8:1 compression of the data sought
Query for Error Counts by Router:
- Direct: more than 1 minute
- Matrix-Based: 9.7 secs
- Image-Based: 1.2 secs
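The compression ratios above follow from the byte counts on this slide, counting the conversion tables as part of the image; the speedups are derived here from the query timings, treating "more than 1 minute" as a 60-second lower bound.

```latex
\begin{aligned}
S_{\text{image}} &= 7{,}678{,}736 + 12{,}592 = 7{,}691{,}328 \text{ bytes} \\
\frac{69{,}588{,}548}{7{,}691{,}328} &\approx 9.05 \quad (\approx 9{:}1,\ \text{total data set}) \\
\frac{21{,}596{,}488}{7{,}691{,}328} &\approx 2.81 \quad (\approx 2.8{:}1,\ \text{data sought}) \\
\frac{9.7\ \text{s}}{1.2\ \text{s}} &\approx 8.1 \quad (\text{image-based vs.\ matrix-based}) \\
\frac{60\ \text{s}}{1.2\ \text{s}} &= 50 \quad (\text{lower bound, image-based vs.\ direct})
\end{aligned}
```

On these figures the image-based query is roughly 8 times faster than the matrix-based query and more than 50 times faster than direct access.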
-- Neighborhood to Live
• Crime in the US 1998-2007 (Source: FBI, Data.gov): Tableau 5 application; New Application; State
  --Related: Are you Safe? (http://www.areyousafedc.com/): Existing Application; City
  --Related: Every Block (http://dc.everyblock.com/crime/by-offense/theft/): Existing Application; City
  --Related: Density of Firearms / Death Rate (http://www.datamasher.org/mash-ups/test-123#table-tab): Existing Application; State
-- Purchasing a Car, Planning a Vacation
• Fuel Efficient Cars (Source: www.fueleconomy.gov): Heat Map Explorer (COTS); New Application; Federal
• Hurricane data 1990-2006 (Related: www.nhc.noaa.gov/): Tableau 5 (COTS); New Application; Federal
See other examples: http://www.tableausoftware.com/learning/examples
Discussion and Feedback
-- Other (Backup)
• World Copper Smelters (Source: http://tin.er.usgs.gov/copper/output/copper-fLD.kml, Data.gov): Existing Application
  [Figure: World Copper Smelters]
• USGS Oil and Gas Assessment Database (Source: http://energy.cr.usgs.gov/oilgas/wep, Data.gov): Existing Application
  [Figure: World Petroleum Assessment]
-- Emerging Technologies (Backup)