Big Data

• What is Big Data?

• Recently much good science, whether physical, biological, or social, has been forced to confront - and has often benefited from - the Big Data phenomenon.

• Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115)

Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.

Big data spans four dimensions:

Volume, Velocity, Variety, and Veracity

• The first 3Vs definition is widely used by Gartner and much of the industry

• The new V “Veracity” is introduced by some organizations

• Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information.

– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis

– Convert 350 billion annual meter readings to better predict power consumption

• Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

– Scrutinize 5 million trade events created each day to identify potential fraud

– Analyze 500 million daily call detail records in real-time to predict customer churn faster

• Variety: Big data is any type of data: structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

– Monitor hundreds of live video feeds from surveillance cameras to target points of interest

– Exploit the 80% data growth in images, video and documents to improve customer satisfaction

• Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions.

– How can you act upon information if you don’t trust it?

– Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Where Does Big Data Come From?

• Our Data-driven World

– Science: databases from astronomy, genomics, environmental data, transportation data, …

– Humanities and Social Sciences: scanned books, historical documents, social interactions data, new technology like GPS, …

– Business & Commerce: corporate sales, stock market transactions, census, airline traffic, …

– Entertainment: Internet images, Hollywood movies, MP3 files, …

– Medicine: MRI & CT scans, patient records, …

Usage Example in Big Data

US 2012 Election

- Orca big-data app (although ORCA suffered many failures)
- YouTube channel (23,700 subscribers and 26 million page views)

- Predictive modeling
- mybarackobama.com: drive traffic to other campaign sites
- Facebook page (33 million “likes”)
- YouTube channel (240,000 subscribers and 246 million page views)
- A contest to dine with Sarah Jessica Parker
- Every single night, the team ran 66,000 computer simulations
- Amazon Web Services

Usage Example in Big Data (cont.)

Data analysis predictions for the US 2012 election

Drew Linzer, June 2012: 332 for Obama, 206 for Romney

Nate Silver, FiveThirtyEight blog: predicted that Obama had an 86% chance of winning; predicted all 50 states correctly

Sam Wang, Princeton Election Consortium: put the probability of Obama’s re-election at more than 98%

Meanwhile, the media continued to report the race as very tight
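The nightly simulations and win probabilities mentioned above can be illustrated with a small Monte Carlo sketch. This is not the method used by any campaign or forecaster named here; it is a minimal illustration that assumes each state is an independent coin flip. The function name `simulate_win_probability`, the toy map, and every number in it are hypothetical.

```python
import random


def simulate_win_probability(states, votes_needed, n_sims=66_000, seed=0):
    """Estimate the chance that a candidate collects at least votes_needed.

    states: list of (votes, win_probability) pairs, one per state.
    Assumption: states are independent, which real forecasts do not assume
    (polling errors are usually correlated across states).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(n_sims):
        # Award a state's votes when a uniform draw falls below its probability.
        total = sum(votes for votes, p in states if rng.random() < p)
        if total >= votes_needed:
            wins += 1
    return wins / n_sims


# Hypothetical toy map: five "states" with made-up votes and probabilities.
toy_states = [(10, 0.9), (8, 0.6), (6, 0.5), (4, 0.3), (3, 0.1)]
toy_majority = 16  # more than half of the 31 toy votes

if __name__ == "__main__":
    print(simulate_win_probability(toy_states, toy_majority))
```

More simulations narrow the sampling noise of the estimate, but they cannot correct poor input probabilities.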

Big Challenge in Big Data

• How to convert big data into usable information by identifying patterns and deviations from those patterns?

• The big data challenge requires talent

– People highly skilled in programming and data analysis who can extract meaningful information and insights
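As one simple illustration of "identifying patterns and deviations from those patterns", the sketch below treats the mean of a series as the pattern and flags values that lie far from it. The function name `find_deviations`, the threshold of 2 standard deviations, and the sample readings are assumptions made for this example; real systems typically use more robust methods.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample standard
    deviations away from the mean of `values`."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical meter readings with one obvious outlier at the end.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

if __name__ == "__main__":
    print(find_deviations(readings))
```

A single extreme value inflates both the mean and the standard deviation, so median-based rules are often preferred on noisy data.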

Big Data Techniques and Technologies

• A/B testing
• Association rule learning
• Classification
• Cluster analysis
• Crowdsourcing
• Data fusion and data integration
• Data mining
• Ensemble learning
• Genetic algorithms
• Machine learning
• Natural language processing
• Neural networks
• Network analysis
• Optimization
• Pattern recognition
• Predictive modeling
• Regression
• Sentiment analysis
• Signal processing
• Spatial analysis
• Statistics
• Supervised learning
• Simulation
• Time series analysis
• Unsupervised learning
• Visualization
• …
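To make the first technique in the list concrete, here is a minimal sketch of an A/B test using a two-proportion z-test. The function name `ab_test` and the conversion counts are hypothetical, and the normal approximation assumes reasonably large samples in both groups.

```python
from math import erfc, sqrt


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test.

    conv_a, conv_b: number of conversions in variants A and B.
    n_a, n_b: number of visitors shown variants A and B.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("test undefined when the pooled rate is 0 or 1")
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 convert on A, 150 of 1000 on B.
    z, p = ab_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone; it says nothing about whether the difference is large enough to matter commercially.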

• Common Skill Sets

– Data analysis is the cornerstone

– Education and experience in data analysis, business analytics, mathematics, statistics, quantitative skills

Big Questions about Big Data

• What happens in a world of radical transparency, with data widely available?

• If you could test all your decisions, how would that change the way you compete?

• How would your business change if you used big data for widespread, real time customization?

• How can big data augment or even replace management?

• Could you create a new business model based on data?

• …

Related Careers in Big Data

• Data scientist

– Often at the top of the big data hierarchical chart

– Typically proven professionals who possess deep analytical talent

• Data architect

– Computer programmers who are skilled in working with undefined data and disparate types of data

• Data visualizer

– Professionals who are able to translate data into information that people can effectively use

• Data change agent

– Uses data analytics to recommend and drive changes within an organization

• Data engineer and operator

– Designers, builders and managers of big data systems

Job Opportunities in Big Data

• Source: McKinsey

• There will be a shortage of the talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

• The Big Data industry is worth more than $100 billion and is growing at almost 10% a year (roughly twice as fast as the software business)

Demand for Deep Analytical Talent in US

IS Relevant Courses

• IS 410: Introduction to Database Design

– Discuss the process of database development, including data modeling, database design, and database implementation

• IS 420: Database Application Development

– Offer hands-on experience for developing client/server database applications using a major database management system

• IS 427: Introduction to Artificial Intelligence: Concepts and Applications

– Provide an introduction to, and hands-on experience with, several Artificial Intelligence (AI) techniques

• IS 428: Data Mining Techniques and Applications

– Learn both how data mining techniques work and how to apply data mining to various business and organizational contexts