https://portal.futuregrid.org

Cyberinfrastructure Supporting Social Science

Cyberinfrastructure Workshop, October 16 2012, Chicago

Geoffrey Fox, gcf@indiana.edu

Informatics, Computing and Physics, Indiana University Bloomington




Goal of Day

• Come up with a few (3-5) projects that advance Social Sciences Cyberinfrastructure
• Choose so that together they cover spectrum of characteristics

Characteristics

A B C …. Z

Project 1 X X X

Project 2 X X X

…..

Project N X X


Data Type

• What is large? #Collections v. Collection Size v. #Users
• “Big (Social) Science” v Long Tail
• # rows v # columns v time dependence
• Structured (defined) v unstructured (inferred/discovered) metadata
• granularity of metadata
• Data modality: Streaming, video, image, text, “binary”
  – vector space or not (genomics, network)
• distributed v centralized data (production/storage/processing)
• Complex objects v. tables
• Observed v. simulation or modeling


Data Nature (“ilities”)

• Open data
• Sharable Data
• Publication model / Data citation models?
  – DOI or Handle
• Reproducibility
• Sustainability
• Standards
• Management
• Integration
• Dramatic change in next 10 years
• Data availability as in Public Windy Grid


Mining/Analyzing data

• Access: role of Community comments, crowd sourcing
• Processing: “Simple” statistics, Linkage software, data visualization, GIS, analytics (SVM, LDA, Clustering ...); (new) management tools
• Data Mining (discovering the unexpected) v. Data Analysis (discovering with excellence the ~expected)
• Modeling for data components and regression
• More data v more/better algorithms (in simulation, algorithm advances ~ as important as machine advances)
• Programming model: Excel, SQL, R, SPSS, Other Scripting, MapReduce, "Fortran/C++/Java", Libraries, workflow, portal/gateway

• Open software & sustainability of it


Security & Privacy

• Support sharing
• The law
• Risk of identification, harm from disclosure
• Differential Privacy and nifty obfuscation ideas
• IRB
• Federated Identity
• Enclave


The Infrastructure

• Repository/Archive v. Active (compute + storage) data
• Bring Computing to data
• Commercial Clouds v. XSEDE v. University
• Local v. cloud v. department/university
• Distributed (Federated) clouds as collections distributed
• DropBox, Google docs, Skype etc. v customized
• Generality of DuraCloud, Dataverse, DataUp etc.
• Tool repository/library
• Cloudbursting (public-private hybrid cloud)
• Connectivity to cloud (can be addressed by I2?)
• Backup v Main Home


Other Characteristics

• Satisfying NSF Data Management requirements
• Breadth of applicability of solutions
• # Organizations collaborating on project
• Interdisciplinary collaborations
• Data (science) Curricula
• Relation to issues in other fields
• Support and Governance
• Industry ahead of Academia