DISA A Calculus of Culture | Circumventing the Black Box of Culture Analytics, Nanning, 21-22 March 2017 Koraljka Golub, [email protected] Based on presentation by Welf Löwe, [email protected] Linnaeus University Center of Excellence for Data Intensive Sciences and Applications https://lnu.se/disa


Page 1: 2017 Welf DISA@LNU-intro v1 - gxu.edu.cn...DISA A Calculus of Culture | Circumventing the Black Box of Culture Analytics, Nanning, 21-22 March 2017 Koraljka Golub, koraljka.golub@lnu.se

DISA

A Calculus of Culture | Circumventing the Black Box of Culture Analytics, Nanning, 21-22 March 2017

Koraljka Golub, koraljka.golub@lnu.se

Based on presentation by Welf Löwe, [email protected]

Linnaeus University Center of Excellence for Data Intensive Sciences and Applications

https://lnu.se/disa


Big Data, Information, and Knowledge

Big Data: Analyzing data sets and streams to gain knowledge about technical, scientific, sociological, or economic phenomena.

• Data: symbols, signals, bits & bytes, words, numbers, tokens…

• Information: data interpreted in a context, i.e. meaning

• Knowledge: actionable information, i.e. insights that allow for controlling processes and making predictions


Creating knowledge from data: Collaborative effort

Credit: EU CROS portal


What is “Big” today and tomorrow?

• Challenging quantities that go beyond the capability of humans and commonly used software tools
• A constantly moving target
• Ranging from a few dozen terabytes (10^12) to many petabytes (10^15) today
• Challenging qualities, as data come in varying and complex formats, consisting of all types of structured and unstructured data
• My computer has ½ terabyte = 512 gigabytes of disk space = 1 year of music, 20 days of video, 150,000 photos, 10 days of movies


Turn data into knowledge to create value

Credit: PLOS Biology, 7 July 2015

Projection for 2025, 1 petabyte = 1024 terabytes
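The storage figures quoted on these slides (½ terabyte = 512 gigabytes, 1 petabyte = 1024 terabytes) follow the binary convention, where each unit is 1024 times the previous one. A minimal sketch of that arithmetic follows; the constant and function names are illustrative assumptions, not part of the presentation. Under the decimal convention (10^12 bytes per terabyte), ½ terabyte would instead be 500 gigabytes.

```python
# Binary multiples, matching the slides' "1 petabyte = 1024 terabytes".
STEP = 1024
GIGABYTE = STEP ** 3
TERABYTE = STEP ** 4
PETABYTE = STEP ** 5


def convert(amount: float, from_unit: int, to_unit: int) -> float:
    """Convert an amount between two units, both given in bytes per unit."""
    return amount * from_unit / to_unit


half_tb_in_gb = convert(0.5, TERABYTE, GIGABYTE)
pb_in_tb = convert(1, PETABYTE, TERABYTE)
print(half_tb_in_gb, pb_in_tb)
```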


Big Data – profile of LNU

• DISA@LNU:
  - Multidisciplinary fundamental and applied Big Data research
  - Linnaeus University Center (ca 70 MSEK from LNU alone until 2021)

• DISA@IEC:
  - Applied Big Data research and innovation with industry
  - IEC is an ICT cluster organizing collaborations of academia and industry (ca 6 MSEK from EU, LNU, Regions of Kalmar and Kronoberg, Tunga fordon, TEC until 2018)

• iSchools:
  - Effort in Big Data related courses and education programs


IT Core – Foundations and Technologies

Transforming Big Data to Information to Knowledge
• Signal processing: signal analysis and theories with application to direct and inverse problems
• Statistical analysis: collection, analysis, interpretation, presentation, and organization of (sparse, high-dimensional) data
• Machine learning: classification, estimation, prediction based on data
• Visual analytics: analytical reasoning about data facilitated by interactive visual interfaces

Coping with variety, velocity, and volume of Big Data
• Composition, adaptation: building systems from (adapted) components
• Self-adaptation: let systems adapt based on observations of their state and environment
• Future Internet: the Internet as a distributed, mobile data collection and computing platform (Cloud Computing, Internet of Things)
• Parallel & high-performance computing: scale with data processing and storage


Big Data Application Areas

1. Astrophysics

2. Wood and Building Technologies

3. Engineering of Smarter Systems

4. Software and Information Quality

5. Digital Humanities: English Linguistics, Media & Journalism, Library and Information Science

6. Computational Social Sciences

7. eHealth


eHealth: Improving our health systems

What if we could gain actionable knowledge from all European health registers?

Credit: BigApple Horizon 2020 proposal.


Accelerating DNA analysis

What if we could …
• Reduce time for pediatricians to scan and analyze the entire genome of a critically ill infant?
• Explore differences and similarities of all organisms and their evolutionary relationship?

GenBank dataset
• 188’372’017 DNA sequences (Oct 2015)
• One human DNA sequence ~3 gigabytes

Our DNA sequence analysis application*
• Finds patterns in large-scale DNA sequences
• Speedup 10x compared to baseline on regular PC

Credit: http://www.biol.unt.edu/

* Suejb Memeti, Sabri Pllana: Accelerating DNA Sequence Analysis using Intel Xeon Phi, ISPA 2015 IEEE.
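To make "finds patterns in DNA sequences" concrete, here is a minimal sketch of exact pattern search over a sequence string. It is a plain sequential scan for illustration only and is not the accelerated method of the cited application; the function name and the sample sequence are hypothetical.

```python
def find_pattern(sequence: str, pattern: str) -> list[int]:
    """Return the start offsets of all occurrences of pattern, overlaps included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also reported.
        start = sequence.find(pattern, start + 1)
    return positions


# Hypothetical toy sequence; real inputs are gigabytes long.
sample = "ACGTACGTGACGA"
hits = find_pattern(sample, "ACG")
print(hits)
```

A scan like this processes one sequence on one core; reaching the speedups reported on the slide requires parallel hardware and algorithms beyond this sketch.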


Understanding language

What if we could…
• Analyze, for instance, what people in Sweden talk about in public?
• Explore the relationship between language and thought – public sentiment, consumer trends, opinions, …
• Look in real time through a window to the world

The Nordic Tweet Stream initiative (NTS)
• A robot to monitor the geocoded Tweet stream in five Nordic countries
• Strict ethical guidelines: focus on macro-level patterns – not individuals
• Text and data mining tools to analyze big and complex data sets
• User-generated content – what do we talk about on a really big scale?
• A consortium of linguists and computer scientists
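A minimal sketch of what macro-level analysis of a geocoded stream could look like: word frequencies are aggregated per country and no per-user information is kept. The `Tweet` record, its fields, and the sample data are hypothetical and do not describe the actual NTS pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    country: str  # hypothetical country code derived from the geotag
    text: str


def word_counts_by_country(tweets: list[Tweet]) -> dict[str, Counter]:
    """Aggregate lower-cased word frequencies per country (no user-level data)."""
    counts: dict[str, Counter] = {}
    for tweet in tweets:
        words = tweet.text.lower().split()
        counts.setdefault(tweet.country, Counter()).update(words)
    return counts


# Invented sample records for illustration.
sample = [
    Tweet("SE", "Fika nu"),
    Tweet("SE", "fika igen"),
    Tweet("FI", "hei maailma"),
]
counts = word_counts_by_country(sample)
print(counts["SE"].most_common(1))
```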


[Diagram: application areas arranged around the IT Core]

Applications in Engineering, Sciences & Humanities: Astrophysics, Wood/Building Technologies, Software & Info Quality, Media & Journalism, English Linguistics, eHealth, Comput. Social Sciences, Library & Info Science

IT Core: Self-adaptation, Visual analytics, Distributed Computing, Future Internet, Parallel & high-performance computing, Signal processing, Statistical analysis, Machine learning, Composition & adaptation


Ongoing Big Data collaborations

Collaboration with industry
• Common research projects with Atlas Copco, Ericsson, IBM, Intel, Meltwater, NVIDIA, Sigma Technology, Telia, Vattenfall, Yaskawa, …
• Interests across industries

Collaboration with branch organizations
• IT industry: IEC, SwedSoft
• Building and construction: Smart Housing Småland, GodaHus
• Heavy vehicle industry: Tunga fordon
• Manufacturing: LTC, TEC

Collaboration with public sector
• Active project involvement of and financial support from the regional municipalities
• eHälsomyndigheten established in Kalmar, largely due to the activities at the eHealth Institute
• Collaboration with Strålsäkerhetsmyndigheten


New Big Data collaborations

• Offer towards industry, the public sector, and interested LNU researchers
• Various research competences that we can contribute
• Many ways of collaboration
  - High-Performance Computing Center
  - Data and Text Streaming Platform (social media and web data)
  - Thesis and seed projects
  - Digitalization courses and seminars for industry (with IKEA)
  - Development of and contribution to educational programs
  - Workshops for developing realistic research and innovation projects with external funding
  - Research and innovation projects together with LNU
  - …


High-Performance Computing Center (HPCC)

• The HPCC will be a high-performance computing platform providing advanced computing and storage infrastructure and knowledgeable scientific and technical staff.

• It will provide services to scientists and to the regional industry.

• The platform complements the large-scale computing and storage infrastructures that are available at the national level, e.g., the Swedish National Infrastructure for Computing (SNIC).

• Core exists: cluster with 20 servers, two accelerated single-node systems; suitable for experimentation with limited-size problems.

• DISA invests ca. 3 MSEK, but more funding will be needed

• Contact [email protected]


Thesis and Seed projects

• Collaborative innovation and research in Big Data with Master’s and PhD students and senior researchers involved

• Explore ideas and jump start collaboration projects
• Multidisciplinary collaboration between industry, the public sector and LNU
• Short start up time, 3-6 month activities, budget of max 200 KSEK
• Why
  - Idea testing and prototyping
  - Explorative study
  - Get in contact with students as (future) customers
  - Get in contact with teachers as part of the marketing strategy
  - Assessment of students for hiring
  - IT support for internal development and innovation
  - Pilot for a larger research project

• Contact [email protected]


Research and innovation projects together with LNU

• Requirements as for seed projects (multidisciplinary and/or industry-academia)
• Longer planning and decision time (ca. 1 year)
• Longer project time (3-5 years)
• Higher funding (3-30 MSEK), but usually in-kind contributions required
• Why
  - LNU: keep or get back control over your research subject when there is a paradigm shift towards data-driven scientific approaches
  - Industry: do not miss benefitting from the data you have access to and getting necessary R&D resources for free

• Contact [email protected]