Applied Data Analysis Lab – a profile Dr. Lukasz Bolikowski ICM, University of Warsaw December 2014

A profile of Applied Data Analysis Lab (ADA Lab)


Page 1

Applied Data Analysis Lab – a profile

Dr. Łukasz Bolikowski

ICM, University of Warsaw

December 2014

Page 2

ADA Lab ⊆ ICM ⊆ UW

University of Warsaw (UW) is one of the top Polish higher education establishments.

Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) is a supercomputing and research data centre within the University of Warsaw.

Applied Data Analysis Lab (ADA Lab) is a research group within the ICM.

Page 3

ADA Lab’s Scope of Interest

Legal Text Mining

Business Data Mining

Training & Outreach

Scholarly PDF Mining

Map of Science

Persistent IDs

Data Anonymization

Scalable Text and Data Mining

Informatics for Open Science

Page 4

Legal Text Mining

Building a judgment analysis system for Poland.

Integrating data from common courts, the Supreme Administrative Court, the Supreme Court, and the Constitutional Tribunal.

Planning a larger European project with similar goals (Horizon 2020; currently building a consortium and defining scope).

Page 5

Business Data Mining

Leveraging high demand for data science skills.

For-profit projects with business partners.

Usually can’t discuss details due to NDAs.

Our favourite toolset:

R for data understanding and modelling
Apache Spark for analysing larger data sets
D3 for information visualization
CRISP-DM (Cross-Industry Standard Process for Data Mining) for managing our projects

Page 6

Training and Outreach

“Web-Scale Data Mining and Processing” (course at Polish Academy of Sciences)

“Introduction to Text Mining” (course at Warsaw School of Data Analysis organised by ICM)

Internal trainings on Hadoop and Spark

Presentations at Big Data conferences (target audience: business partners)

Workshops and internships for talented youth (in collaboration with Polish Children’s Fund)

Page 7

Scholarly PDF Mining

Extracting metadata, bibliographic references, and full text from scholarly PDFs. Research direction: semantic annotation of paragraphs, sentences, phrases.

CERMINE is open-source software (AGPL license) with users worldwide: OpenAIRE.eu, Paperity.org, Public Knowledge Project.

Interfaces for humans and for machines (RESTful API).

Try CERMINE at: http://cermine.ceon.pl/
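A minimal sketch of how a client might call the machine interface, using only the Python standard library. The slide gives only the base URL; the `extract.do` path and the `application/pdf` content type are assumptions taken from the project's public README and may have changed since. The function names are hypothetical.

```python
import urllib.request

# Assumed endpoint: the slide only states http://cermine.ceon.pl/ ;
# the "extract.do" path is an assumption and should be checked against
# the current CERMINE documentation.
CERMINE_EXTRACT_URL = "http://cermine.ceon.pl/extract.do"


def build_extract_request(pdf_bytes: bytes,
                          url: str = CERMINE_EXTRACT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a PDF as its body."""
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def extract_metadata(pdf_path: str, timeout: float = 60.0) -> str:
    """Send a local PDF to the service and return the response body as text."""
    with open(pdf_path, "rb") as handle:
        request = build_extract_request(handle.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `extract_metadata("article.pdf")` would return whatever the service responds with; building the request separately from sending it makes the request easy to inspect offline.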

Page 8

Map of Science

A comprehensive map of academia. Mining available documents and data sets in order to reconstruct the graph of relations between: people, documents, institutions, topics, funding sources.

Final result: a publicly available data set.

Why? Better understanding of science. Cool features in digital libraries and research information systems.

Elements of the map currently developed in OpenAIRE and OCEAN projects.
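The graph of relations described above could be represented roughly as follows. This is an illustrative sketch only, not the data model used in OpenAIRE or OCEAN; the class names, relation labels, and sample nodes are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five entity kinds named on the slide.
NODE_KINDS = {"person", "document", "institution", "topic", "funding"}


@dataclass(frozen=True)
class Node:
    kind: str  # one of NODE_KINDS
    key: str   # hypothetical local identifier


class ScienceGraph:
    """Undirected, labelled graph over typed nodes."""

    def __init__(self):
        # Maps each node to a set of (relation label, neighbouring node).
        self._edges = defaultdict(set)

    def relate(self, source: Node, relation: str, target: Node) -> None:
        for node in (source, target):
            if node.kind not in NODE_KINDS:
                raise ValueError(f"unknown node kind: {node.kind}")
        self._edges[source].add((relation, target))
        self._edges[target].add((relation, source))

    def neighbours(self, node: Node, kind=None) -> set:
        """Return adjacent nodes, optionally restricted to one kind."""
        return {other for _, other in self._edges.get(node, ())
                if kind is None or other.kind == kind}


def coauthors(graph: ScienceGraph, person: Node) -> set:
    """People who share at least one document with the given person."""
    found = set()
    for document in graph.neighbours(person, "document"):
        found |= graph.neighbours(document, "person")
    found.discard(person)
    return found
```

A derived relation such as co-authorship then falls out of a two-step traversal (person to document to person), which is the kind of feature a digital library could build on such a map.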

Page 9

Persistent IDs

To achieve long-term preservation of research artifacts, we need an identifier minting and management scheme that can outlive the organization managing the scheme.

We are developing a distributed scheme based on public-key cryptography and P2P networking (a lot in common with Bitcoin).
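The slide does not describe the scheme's design, so the following is only a sketch of one idea commonly used in such systems (and in Bitcoin addresses): deriving an identifier from a hash of a public key, so that anyone can check who controls it without asking a central registry. The `pid:` prefix and function names are hypothetical, and the public key is treated as opaque bytes because key generation and signatures need a cryptography library outside the standard library.

```python
import base64
import hashlib

PREFIX = "pid:"  # hypothetical namespace prefix, not from the slides


def mint_identifier(public_key: bytes) -> str:
    """Derive an identifier from the SHA-256 hash of a public key."""
    digest = hashlib.sha256(public_key).digest()
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return PREFIX + encoded


def is_controlled_by(identifier: str, public_key: bytes) -> bool:
    """Check that an identifier was derived from the given public key."""
    return identifier == mint_identifier(public_key)
```

Because the identifier depends only on the key, no organization has to stay alive to vouch for it; a holder of the matching private key could prove ownership by signing records about the identified artifact.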

Page 10

Data Anonymization

Privacy-preserving research data publication is a cross-cutting issue: it applies to various types of data analysed at ICM, including legal judgments, medical records, and social network activity.
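The slides do not say which anonymization techniques the lab uses. As one illustration of what a pre-publication check can look like, the sketch below tests k-anonymity, a widely used criterion under which every combination of quasi-identifier values must be shared by at least k records. The field names and sample records are hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing quasi-identifier values."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


# Hypothetical, already-generalised medical records.
sample = [
    {"age_band": "30-39", "region": "Mazowieckie", "diagnosis": "A"},
    {"age_band": "30-39", "region": "Mazowieckie", "diagnosis": "B"},
    {"age_band": "40-49", "region": "Pomorskie", "diagnosis": "A"},
    {"age_band": "40-49", "region": "Pomorskie", "diagnosis": "C"},
]
```

With these records, `is_k_anonymous(sample, ["age_band", "region"], 2)` holds, while asking for k = 3 fails because each group has only two members.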

Page 11

Thank you for your attention. Let’s stay in touch!

adalab.icm.edu.pl/blog

twitter.com/adalab_icm

linkedin.com/in/bolikowski

twitter.com/bolikowski


Page 12

License

© 2014 ICM, University of Warsaw. Some rights reserved. This presentation is available under a CC BY 3.0 license. Materials from the following sources were used:

https://www.flickr.com/photos/86530412@N02/8213432552 (p. 4, CC BY 2.0)
https://www.flickr.com/photos/124247024@N07/13903385550 (p. 5, CC BY-SA 2.0)
https://www.flickr.com/photos/genista/228006200 (p. 6, CC BY-SA 2.0)
https://www.flickr.com/photos/bohman/210977249 (p. 9, CC BY 2.0)
https://www.flickr.com/photos/hyku/368912557 (p. 10, CC BY 2.0)