https://portal.futuregrid.org
Cyberinfrastructure Supporting Social Science
Cyberinfrastructure Workshop
October 16 2012 Chicago
Geoffrey [email protected]
Informatics, Computing and Physics
Indiana University Bloomington
Goal of Day
• Come up with a few (3-5) projects that advance Social Sciences Cyberinfrastructure
• Choose so that together they cover spectrum of characteristics
Characteristics
A B C …. Z
Project 1 X X X
Project 2 X X X
…..
Project N X X
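The coverage goal behind this matrix can be checked mechanically. The following is a minimal sketch, assuming each project is recorded as the set of characteristics it is marked with; the project names, the letters, and the `uncovered` helper are hypothetical placeholders and do not reproduce the marks on the slide.

```python
# Hypothetical project-by-characteristic marks; illustrative values only,
# not the actual X positions from the slide.
projects = {
    "Project 1": {"A", "B", "C"},
    "Project 2": {"C", "D", "E"},
    "Project N": {"E", "Z"},
}
characteristics = {"A", "B", "C", "D", "E", "Z"}


def uncovered(projects, characteristics):
    """Return the characteristics that no project is marked with, sorted."""
    covered = set().union(*projects.values())
    return sorted(characteristics - covered)


# An empty list means the chosen projects together cover the whole spectrum.
print(uncovered(projects, characteristics))
```

If the list is non-empty, it names the characteristics that still need a project.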
Data Type
• What is large? #Collections v. Collection Size v. #Users
• “Big (Social) Science” v. Long Tail
• # rows v. # columns v. time dependence
• Structured (defined) v. unstructured (inferred/discovered) metadata
• Granularity of metadata
• Data modality: Streaming, video, image, text, “binary”
– Vector space or not (genomics, network)
• Distributed v. centralized data (production/storage/processing)
• Complex objects v. tables
• Observed v. simulation or modeling
Data Nature (“ilities”)
• Open data
• Sharable Data
• Publication model / Data citation models?
– DOI or Handler
• Reproducibility
• Sustainability
• Standards
• Management
• Integration
• Dramatic change in next 10 years
• Data availability as in Public Windy Grid
Mining/Analyzing data
• Access: role of Community comments, crowd sourcing
• Processing: “Simple” statistics, Linkage software, data visualization, GIS, analytics (SVM, LDA, Clustering ...); (new) management tools
• Data Mining (discovering the unexpected) v. Data Analysis (discovering with excellence the ~expected)
• Modeling for data components and regression
• More data v. more/better algorithms (in simulation, algorithm advances ~ as important as machine advances)
• Programming model: Excel, SQL, R, SPSS, Other Scripting, MapReduce, "Fortran/C++/Java", Libraries, workflow, portal/gateway
• Open software & sustainability of it
Security & Privacy
• Support sharing
• The law
• Risk of identification, harm from disclosure
• Differential Privacy and nifty obfuscation ideas
• IRB
• Federated Identity
• Enclave
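To make the Differential Privacy item concrete, here is a minimal sketch of one standard technique, the Laplace mechanism applied to a counting query. The slides do not say which mechanism was intended, so this choice is an assumption; the function names, the `epsilon` parameter, and the sample records are likewise hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d.
    exponential draws, each with rate 1/scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records matching predicate.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical survey records: (respondent_id, age).
records = [(1, 34), (2, 51), (3, 47), (4, 29), (5, 62)]
noisy = private_count(records, lambda r: r[1] >= 45, epsilon=1.0)
print(round(noisy, 2))
```

Smaller values of `epsilon` add more noise, which lowers the risk of identification at the cost of accuracy.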
The Infrastructure
• Repository/Archive v. Active (compute + storage) data
• Bring Computing to data
• Commercial Clouds v. XSEDE v. University
• Local v. cloud v. department/university
• Distributed (Federated) clouds as collections distributed
• DropBox, Google docs, Skype etc. v. customized
• Generality of DuraCloud, Dataverse, DataUp etc.
• Tool repository/library
• Cloudbursting (public-private hybrid cloud)
• Connectivity to cloud (can be addressed by I2?)
• Backup v. Main Home
Other Characteristics
• Satisfying NSF Data Management requirements
• Breadth of applicability of solutions
• # Organizations collaborating on project
• Interdisciplinary collaborations
• Data (science) Curricula
• Relation to issues in other fields
• Support and Governance
• Industry ahead of Academia