HTTP://WWW.ANALYTICS-MAGAZINE.ORG | JULY/AUGUST 2013
DRIVING BETTER BUSINESS DECISIONS

ADVENT OF THE DATA SCIENTIST
WHY IT’S THE ‘SEXIEST JOB OF THE 21ST CENTURY’

Executive Edge: FICO chief analytics officer Andrew Jennings on what makes a good data scientist

ALSO INSIDE:
• Analytics & BPM: ‘SMAC’ delivers one-two punch
• Predictive Analytics: Harnessing the power of big data
• Forecasting & Optimization: Perfect technology for complex sourcing

Data Analytics and Performance



WWW.INFORMS.ORG | ANALYTICS-MAGAZINE.ORG

INSIDE STORY

Big bang theory of analytics

FICO recently published an eye-popping infographic called “The Analytics Big Bang” that, according to an accompanying press release, “traces predictive analytics from the dawn of the computer age in the 1940s through the present day, and cites compelling evidence indicating that the analytics industry is at an inflection point.”

The compelling evidence includes these nuggets:

• Sales of analytics software grew from $11 billion to $35 billion between 2000 and 2012.
• The number of data scientist job posts jumped 15,000 percent from 2011 to 2012.
• 2.5 quintillion bytes of big data are created each day, enabling analytics to become more insightful, precise and predictive than at any point in history.

“Predictive analytics is becoming the defining technology of the early 21st century,” says Andrew Jennings, FICO’s chief analytics officer and head of FICO Labs, which produced the infographic. “You can trace the evolution over the past few decades, but we’ve now reached a tipping point where the convergence of big data, cloud computing and analytic technology is leading to massive innovation and market disruption. We foresee predictive analytics being used to solve previously unsolvable problems, and bringing enormous value to businesses, governments and people.”

The explosive growth in the demand for analytics and data scientists has created an interesting problem for managers like Jennings: What makes a good data scientist and how do you find one? Jennings addresses the question in his Executive Edge column in this issue of Analytics magazine. Jennings details four key skills and traits to look for when building an analytics team and notes that, “It’s a great time to be a data scientist, but a tricky time to hire one.” ❙

– PETER HORNER, [email protected]


CONTENTS | JULY/AUGUST 2013

FEATURES

30 ANALYTICS & BPM
By Malcolm Ross
‘SMAC’ delivers a much-needed combination punch for peak customer experience.

38 PREDICTIVE ANALYTICS
By Eric Siegel
Harnessing the power of big data and the priceless collection of experience within it.

44 FORECASTING & OPTIMIZATION
By Arne Andersson
The development of optimization: the perfect technology for complex sourcing.

50 UTILITIES DUST OFF FORECASTING PLAYBOOK
By Tao Hong and Alyssa Farrell
Smart grid data brings challenges and opportunities for the power industry.

58 FORENSIC ANALYTICS
By Priti Ravi
Combating a growing pandemic of corporate crime, from identity theft to insider trading.

64 ANALYTIC COMMUNICATION
By Evan S. Levine
Fundamental principles include clarity, transparency, integrity and humility.



REGISTER FOR A FREE SUBSCRIPTION: http://analytics.informs.org

INFORMS BOARD OF DIRECTORS

President: Anne G. Robinson, Verizon Wireless
President-Elect: Stephen M. Robinson, University of Wisconsin-Madison
Past President: Terry Harrison, Penn State University
Secretary: Brian Denton, University of Michigan
Treasurer: Nicholas G. Hall, Ohio State University
Vice President-Meetings: William “Bill” Klimack, Chevron
Vice President-Publications: Eric Johnson, Dartmouth College
Vice President-Sections and Societies: Paul Messinger, University of Alberta
Vice President-Information Technology: Bjarni Kristjansson, Maximal Software
Vice President-Practice Activities: Jack Levis, UPS
Vice President-International Activities: Jionghua “Judy” Jin, Univ. of Michigan
Vice President-Membership and Professional Recognition: Ozlem Ergun, Georgia Tech
Vice President-Education: Joel Sokol, Georgia Tech
Vice President-Marketing, Communications and Outreach: E. Andrew “Andy” Boyd, University of Houston
Vice President-Chapters/Fora: Olga Raskina, Con-way Freight

INFORMS OFFICES

www.informs.org • Tel: 1-800-4INFORMS
Executive Director: Melissa Moore
Meetings Director: Teresa V. Cryan
Marketing Director: Gary Bennett
Communications Director: Barry List
Headquarters: INFORMS (Maryland), 5521 Research Park Drive, Suite 200, Catonsville, MD 21228
Tel.: 443.757.3500 • E-mail: [email protected]

ANALYTICS EDITORIAL AND ADVERTISING

Lionheart Publishing Inc., 506 Roswell Street, Suite 220, Marietta, GA 30060 USA
Tel.: 770.431.0867 • Fax: 770.432.6969

President & Advertising Sales: John Llewellyn, [email protected], Tel.: 770.431.0867, ext. 209
Editor: Peter R. Horner, [email protected], Tel.: 770.587.3172
Art Director: Lindsay Sport, [email protected], Tel.: 770.431.0867, ext. 223
Advertising Sales: Sharon Baker, [email protected], Tel.: 813.852.9942

Analytics (ISSN 1938-1697) is published six times a year by the Institute for Operations Research and the Management Sciences (INFORMS), the largest membership society in the world dedicated to the analytics profession. For a free subscription, register at http://analytics.informs.org. Address other correspondence to the editor, Peter Horner, [email protected]. The opinions expressed in Analytics are those of the authors, and do not necessarily reflect the opinions of INFORMS, its officers, Lionheart Publishing Inc. or the editorial staff of Analytics. Analytics copyright ©2013 by the Institute for Operations Research and the Management Sciences. All rights reserved.

DEPARTMENTS

2 Inside Story
8 Executive Edge
14 Analyze This!
20 Viewpoint
26 Forum
72 INFORMS audio, video presentations
76 Five-Minute Analyst
80 Thinking Analytically


EXECUTIVE EDGE

What makes a good data scientist?

BY ANDREW JENNINGS

At the heart of a successful deployment is still human intelligence. Hiring the right people is crucial. Four things to look for when building an analytics team.

Companies in every industry from retail to banking are leveraging big data to improve the customer experience and enhance their bottom lines. Big data – high volume, high velocity (real time) and high variety (structured and unstructured) data – is transforming the way we live and conduct business across all industries and all aspects of daily life.

This has created a talent gap for qualified data scientists. And this is not purely a Silicon Valley tech phenomenon. Gartner estimates that big data will generate six million new U.S. jobs in the next three years, including non-technical roles (see CNNMoney).

Last October a Harvard Business Review article called data scientist “the sexiest job of the 21st century,” and Indeed.com reported that job postings for analytic scientists have jumped 15,000 percent between the summer of 2011 and 2012. McKinsey & Company predicted a 50 percent to 60 percent shortfall in analytic scientists in the United States by 2018. Gartner echoed this sentiment, predicting that only one-third of 4.4 million global big data jobs will be filled by 2015.

Prior to 2000, the analytics function, outside of a few places like retail banking, was relegated to the finance or IT department. Now, many companies are


hiring autonomous analytics teams that work across departments. There is no magic to leveraging big data in pursuit of solutions to business problems. Yes, there is the technology – sophisticated predictive analytics, for example – but at the heart of a successful deployment is still human intelligence. Hiring the right people is crucial.

So what makes a good data scientist? What qualities should a company look for when recruiting and interviewing candidates?

I’ve been with FICO for 20 years, and the company itself has been hiring data scientists (by any name) since 1956. We’ve hired some of the best – and probably a few who should never have been let near a data set. Here’s what we believe you should do when building your own analytics team.

1. Find people who are focused on solving problems, not just boosting model performance curves. Math skills are important, but the point of leveraging big data analytics is solving business problems. It’s coming up with answers to challenges that will actually be useful in the real world. It means answering specific questions in ways that will be helpful to the bottom line. For example, key questions would include: What decision are we looking to improve? How will we measure the improvement? How do we make that decision today? What are the deployment constraints? And so on. These are all practical questions before one gets to the data and the statistical techniques, which are generally the things that attract all the media attention.

One example of a big data challenge that seems to resonate universally and helps highlight the importance of these questions is customer attrition. Most businesses are focused on retaining their best customers. Aside from thorny questions like what does “best” mean, there are other important questions such as, how far ahead of the potential attrition event does the prediction need to be made? In other words, how does one construct the problem to allow time between the prediction indicating an attrition risk and the delivery of some action and that action having enough time to be effective? These are business context questions that need to be answered long before a data analyst can be effective.
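The lead-time framing is the crux of the attrition example. A minimal sketch, in Python with invented field names and window lengths (purely hypothetical, not FICO’s method), of how one might label training data so that a prediction arrives early enough for a retention action to matter:

```python
from datetime import date, timedelta

# Hypothetical illustration: label each customer snapshot so the model
# learns to predict attrition LEAD_TIME before it happens, leaving room
# for a retention action to take effect. Both windows are invented.
LEAD_TIME = timedelta(days=60)   # time needed for an offer to work
WINDOW = timedelta(days=30)      # width of the outcome window

def label_snapshot(snapshot_date, churn_date):
    """Return 1 if the customer churns inside the outcome window that
    starts LEAD_TIME after the snapshot, else 0."""
    if churn_date is None:
        return 0
    start = snapshot_date + LEAD_TIME
    return 1 if start <= churn_date < start + WINDOW else 0

# A customer observed on Jan 1 who churns on Mar 15 is a positive
# example (inside the 60- to 90-day window); one who churns next week
# is not -- there would be no time left to intervene.
print(label_snapshot(date(2013, 1, 1), date(2013, 3, 15)))  # 1
print(label_snapshot(date(2013, 1, 1), date(2013, 1, 8)))   # 0
```

The point of the sketch is that the business question (how much lead time does an intervention need?) is encoded in the labels before any statistical technique is chosen.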

2. Make sure they can talk with people who don’t hold Ph.D.s. Data scientists are not simply good problem solvers; they are also good at helping to identify the right problems to solve and framing the questions in such a way as to yield meaningful answers. The challenges whose solutions have the most value to


an organization are not easy to solve, and they often require a non-mathematical mindset. How can we make changes for the better? Where do we even start?

Some data scientists are abstract thinkers who are technical and academic. And then there is a rare breed, those data scientists who can think and conceptualize and communicate to a business audience. Given some of the key questions above, ideally you want an individual who is business-savvy and well-versed enough on the larger strategy that he or she can have a discussion with the business user. If you could only choose one person, this profile would be the perfect package, but these individuals are hard to find.

In a team context, the trade-off between best-in-class technical skills and strong communicators who can help translate highly technical information into language that a business user can understand is a trade-off worth making. Also, going in reverse, those same people need to be able to translate a business need into an analytics investigation.

Ideally, even if the back-office analytics folks won’t speak to clients, you want them to want to, because that indicates that they’re thinking of things from a client perspective, not just a technology perspective. There are some data scientists who will never want to move beyond an R&D role, and for these folks, communications may seem less important – but then again, don’t you want them to be able to justify their work, explain its benefits and author white papers?

3. Put more emphasis on skills and mindset than degrees. Clearly, a strong background in numerical science is a necessity. Not all candidates need to be a Ph.D. in mathematics or operations research; they may be electrical engineers or sociologists. I have become less concerned about those specifics and far more attuned to the mindset. Good data scientists are not only technically sound, with attention to detail, but they are also inquisitive and open-minded; they question everything that they find. They ask tough questions of the data and equally of the veracity of the conclusions. Big

EXECUTIVE EDGE


data doesn’t guarantee the right answer. People still need to think about getting to the right answer.

Increasingly, the effective data scientist needs to be able to automate. This means that they need to be comfortable with writing scripts and code to make their work efficient, mixing and matching tools, and have the ability to absorb new techniques.

From a long-term career perspective, one of the big opportunities is that data science can lead in any number of directions. Some end up in sales, finance or executive management. Others start off in more traditional corporate roles and slowly gravitate toward jobs that are more heavily steeped in predictive analytics. A broad skill set always comes in handy, and there will ultimately be a range of opportunities where an analytic mindset can be applied effectively. Being inquisitive goes a long way.

For those looking to transition to a data analytics role from, for example, a financial or economics background, basic programming skills are important. Being able to manipulate data and think logically will impress hiring managers. They will likely want to see a demonstrated ability to learn a programming language, and link various concepts via code. There is obviously a need for individuals who are well-versed in big data programming frameworks such as Hadoop and statistical programming languages such as R.

4. Use your current analysts to sniff out the real data scientists from the pretenders. As more and more candidates start self-identifying as data scientists, sorting through them all has become more challenging. When screening and interviewing data scientists, having one or more involved in the process, someone who really knows what he or she is doing, is an obvious part of the recruitment process. This is particularly important for those hiring managers with a traditional business background who may not know the right questions to ask.

Some candidates will of course over-represent their background and experience. They may claim to have run a full analytics process but really only have been involved in part of it. You don’t want to hire someone


who says they’re a modeling superstar but in fact has been specializing in data cleansing. In all the buzz, analytics has become a broadly and loosely used term. They may know the lingo, but if they are not familiar with how a whole analytic project is put together, the knowledge gaps may be too great to overcome.

A word of caution: The best analytic teams will embrace diversity of experience and skills. Just like any other hiring situation, you always need to guard against hiring people that “look just like the people you already have.”

We’re entering a new age of analytic competition. It’s a great time to be a data scientist, but a tricky time to hire one. Every candidate will claim mad math skills – your job is to appraise those while also looking for the problem-solvers, the communicators and the skills that will make your data scientists a more valuable part of your whole organization. ❙

Andrew Jennings is the chief analytics officer of FICO. He is a member of INFORMS.


ANALYZE THIS!

Silicon Valley’s ‘serial entrepreneurs’

BY VIJAY MEHROTRA

A dozen or so years after the bursting of the Internet bubble, Silicon Valley is once again in the spotlight as a symbol of its times. The press coverage of Napster founder and former Facebook President Sean Parker’s $10 million wedding has brought cheers from some, jeers from others, and a series of spirited rejoinders from its perpetrator [1]. George Packer’s recent article in the New Yorker [2] shines a somewhat harsh light on the region (the money quote: “… after decades in which the country has become less and less equal, Silicon Valley is one of the most unequal places in America”). And as Somini Sengupta recently reported in the New York Times [3], despite the best efforts of various governments around the world to lure technically talented young people to their shores with visas and funding opportunities, the Silicon Valley dream of quick wealth and enduring fame continues to exert an extremely powerful pull on their imaginations (one aspiring Indian entrepreneur describes the region as “the N.B.A. of the start-up world”).

Analytics are increasingly central to the lore, and the lure, of Silicon Valley. A great deal of leading-edge research on data analysis and modeling continues


to happen on the Stanford campus, and many analytic innovations have roots that can be traced back to this research. More recently, as the launching pad for Internet search engines, online social networks and mobile application development, an increasing number of Silicon Valley companies have featured intelligent use of huge volumes of data as part of their “value proposition.” In turn, the development of Hadoop and the much-ballyhooed “Big Data” revolution have largely happened in response to the explosion of data resulting from the needs of search, social and mobile platforms.

One recent Saturday night, I took a trip down to Silicon Valley from my home in Oakland. While the drive took less than an hour, the cultural distance is startling: the New York Times has referred to Oakland as a “rust belt town” [4], and the city’s role in today’s technologically enabled global economy is primarily as a prominent physical node (because of its large container port).

I drop in to a sports bar to see an old friend, an analytics professional who has been working for more than a decade in the world of online advertising. He brings me up to speed on recent developments: the Hadoop-enabled platform that his group has been building and using for a couple of years lets them utilize more of the data that their network captures and enables them to control decisions on a much more granular level than their previous optimization platform.

He also expresses some frustrations: “Senior management depends on our algorithms to drive revenue, and there are a lot of people tasked to make sure the targets are actually met. But they don’t understand – and don’t really try to – what the models are actually doing, so every time we seem to be heading for a quarterly revenue shortfall there’s some kind of fire drill where a lot of silly ideas get thrown around by people who don’t know what they are talking about, because they are afraid of looking dumb in front of executives that are demanding answers from them.”

As I’m leaving, he points out that most of those people have MBAs, “so keep doing what you are doing with that consulting class.” [5]

At my next destination, over drinks at a pleasant birthday party, I meet a supply chain manager who admits to an obsession with relentlessly squeezing out costs. My attempts to engage him in a discussion of the broader impact of global


supply chains, including the impact of the recent tragedy in Bangladesh [6], are moderately successful, and he quickly mentions regular supplier audits, best practices and various other programs that his company promotes on their website. He clearly knows the right things to say, but for some reason, I still leave this conversation feeling like he’s a lot more focused on his bonus plan than his supplier scorecards.

Finally, I end up at a somewhat upscale dinner with an interesting collection of technology professionals, most of whom I’m just meeting for the first time. I quickly notice that several of them are wearing attractive, elegantly designed gadgets to track blood sugar levels, heart rates, blood pressure and other health-related data. This observation triggers a friendly debate about which glucose meter is the best and most technologically advanced (several of us are from India, which makes us three times more likely than white Americans to be diagnosed as diabetics [7]). As for me, since being diagnosed with type 2 diabetes several years ago, I basically use the only one for which my insurance company is willing to provide test strips.

Our host listens to this discussion with a faintly amused look. From previous conversations, I know that he is working hard on a health-related start-up company. He and his colleagues, most with computer science and engineering backgrounds, have been scouring a number of publicly available databases searching for



correlations and opportunities. They have also been furiously reading trade publications and research literature, one of them even taking a class on endocrinology, to develop the background needed to generate better hypotheses to investigate. The guys heading this thing up are what Silicon Valley folks call “serial entrepreneurs,” and they have successfully sold a couple of companies already. I’m not sure what they will end up doing with this venture – neither do they, frankly – but I’m pleased to see that they are focused on using data to improve the state of our public health, and I wouldn’t bet against them doing something significant.

A quarter century after arriving there as a young and naïve graduate student, my own feelings about Silicon Valley are decidedly mixed. It is certainly no accident that I no longer live or work in the tech industry echo chamber [8], and in talking to the supply chain executive and listening to the jousting about whose glucose monitor was the most techno-chic, I was reminded of some of the reasons. But it is also no accident that I still live nearby, thereby keeping my ringside seat at the circus, and that I look for reasons to visit there often. The vast majority of the young people who continue to flock in droves to Silicon Valley are not actually going to change the world much. And yet collectively its denizens have had an astonishing impact on our world – and on the world of applied analytics – and there is seemingly no end in sight.

Anyway, I’ll be heading back down again next month. Who knows what I might find down there next time? ❙

Vijay Mehrotra ([email protected]) is an associate professor in the Department of Analytics and Technology at the University of San Francisco’s School of Management. He is also an experienced analytics consultant and entrepreneur, an angel investor in several successful analytics companies and a longtime member of INFORMS.


REFERENCES

1. See, for example, http://news.cnet.com/8301-1023_3-57589288-93/sean-parker-on-his-wedding-redwoods-and-death-threats/

2. http://www.newyorker.com/reporting/2013/05/27/130527fa_fact_packer

3. http://www.nytimes.com/2013/06/06/technology/wishing-you-and-your-start-up-were-here.html

4. http://www.nytimes.com/2012/08/05/magazine/oakland-occupy-movement.html?pagewanted=all

5. For more about my MBA course, see http://analytics-magazine.org/may-june-2013/798-analyze-this-course-puts-students-in-the-analytics-game

6. See, for example, http://www.ft.com/cms/s/0/5bd48c1a-b7e2-11e2-9f1a-00144feabdc0.html#axzz2WUz50Bbi

7. http://forecast.diabetes.org/news/indian-ethnicity-tied-higher-diabetes-risk

8. See, for example, http://bits.blogs.nytimes.com/2013/06/02/disruptions-the-echo-chamber-of-silicon-valley/


VIEWPOINT

O.R. vs. analytics … and now data science?

BY BRIAN KELLER

In a 2010 survey [1], members of the Institute for Operations Research and the Management Sciences (INFORMS) were asked to compare operations research (O.R.) and analytics. Thirty percent of the respondents stated, “O.R. is a subset of analytics,” 29 percent stated, “analytics is a subset of O.R.,” and 28 percent stated, “advanced analytics is the intersection of O.R. and analytics.” The remaining 13 percent were split between “analytics and O.R. are separate fields” (7 percent) and “analytics is the same as O.R.” (6 percent).

The emergence of data science only adds to the confusion. Is data science just another clever marketing term popularized by the math illuminati?

INFORMS has developed working definitions of both O.R. and analytics through surveys of INFORMS members and Analytics magazine readers. O.R. is the “application of advanced analytical methods to help make better decisions.” Analytics is the “scientific process of transforming data into insight for better decision-making.”

DATA SCIENCE: AN EMERGING FIELD

Data science is an emerging field with no standard definition yet. An early description can be found


in “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics” [2]. I think of data science as an interdisciplinary field combining mathematics, statistics and computer science to create products based on data. The delivery of data products is the key idea. More on that later.

Indeed, the definitions for each sound similar. Differences begin to emerge when looking at O.R., analytics and data science in terms of the focus of the discipline and types of techniques applied.

Operations research tends to focus on the solution of a specific problem using a defined set of methods and techniques [3]. Classic examples of O.R. include facility location problems, scheduling and deciding how many lines should be opened at a service center, which are all problem-solution focused. Techniques tend to be model-driven in which analysts select a reasonable model, fit the model parameters to the data and analyze results. Based on survey data in “ASP: The Art and Science of Practice” [3], the top O.R. quantitative skills are optimization, decision analysis and simulation.

Analytics tends to go beyond solving a single problem and focuses on overall business impact [3]. Classic examples of analytics include business intelligence to summarize operations and customer segmentation for improved marketing and sales. The same survey identified the top analytics quantitative skills as statistics, data visualization, data management and data mining.

Data science tends to focus on data as a product. For example, Amazon records your searches, correlates them with other users and offers you suggestions on what you might like to buy. Those suggestions are data products that personalize the world’s biggest market, which drives sales. Google Now presents the results of your search before you even think to search for the information. Google Now is a data product that increases use of Google services, which delivers added revenue to Google.

Amazon product recommendations and Google Now may sound like an analytic, which focuses quantitative effort on a broader business impact. However, the results of data science are not just competitive advantages; results of data science are the products of the company. The data is the product.
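Amazon’s and Google’s actual systems are proprietary; purely to illustrate the flavor of a data product, here is a toy item-to-item co-occurrence recommender in Python (invented data, not any company’s real algorithm):

```python
from collections import Counter
from itertools import combinations

# Toy illustration: recommend items that co-occur with a user's
# purchases across everyone's order histories. The baskets are invented.
orders = [
    {"book", "lamp"},
    {"book", "lamp", "desk"},
    {"desk", "chair"},
    {"book", "chair"},
]

# Count how often each ordered pair of items appears in the same basket.
co = Counter()
for basket in orders:
    for a, b in combinations(sorted(basket), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def recommend(owned, k=2):
    """Rank items the user doesn't own by co-occurrence with what they do."""
    scores = Counter()
    for item in owned:
        for (a, b), n in co.items():
            if a == item and b not in owned:
                scores[b] += n
    return [item for item, _ in scores.most_common(k)]

print(recommend({"book"}))  # co-occurrence favors 'lamp'
```

The output of the model is not a report for an analyst; the ranked list itself is what the user sees, which is the sense in which the data is the product.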


Creating data products requires a strong sense of creativity and diverse perspectives of thought. As such, data scientists hail from a variety of academic backgrounds including O.R., statistics, computer science, engineering, biology and physics. The common themes across data scientists are creativity, curiosity to ask bigger questions, and skills in data analysis and programming.

Data science often relies on combining multiple types of data together for analysis. Some data may be company proprietary; other data is available from one of the many public data sets available on the Web. These data sets often are too large to analyze using desktop tools, have missing or erroneous data, vary in structure across data sets, and may be lacking structure entirely (e.g., free-form text in maintenance repair logs). The combination of data size and structure adds an additional challenge on top of data analysis – the data itself becomes part of the problem.
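As a hypothetical illustration of that challenge, the sketch below (invented records and field names) normalizes data merged from two sources with different schemas, missing values and erroneous entries:

```python
# Hypothetical records merged from two sources with different structure,
# missing values and erroneous entries -- typical pre-analysis cleanup.
raw = [
    {"id": "1", "spend": "250.0", "region": "US"},
    {"id": "2", "spend": "n/a"},                               # missing region, bad spend
    {"customer_id": 3, "total_spend": 99.5, "region": " eu "},  # other schema
]

def normalize(rec):
    """Map either schema onto one shape; flag fields we can't salvage."""
    cid = rec.get("id") or rec.get("customer_id")
    spend = rec.get("spend", rec.get("total_spend"))
    try:
        spend = float(spend)
    except (TypeError, ValueError):
        spend = None                 # keep the record, flag the bad field
    region = (rec.get("region") or "unknown").strip().upper()
    return {"id": int(cid), "spend": spend, "region": region}

clean = [normalize(r) for r in raw]
print(clean[1])  # {'id': 2, 'spend': None, 'region': 'UNKNOWN'}
```

Most of the code does no analysis at all, which is the point: the data itself is part of the problem.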

LEVERAGING DIVERSE SKILLS

Because so much effort of data science work falls on parsing, cleaning and managing the data, data scientists often must leverage diverse software development skills. One project may use Python for data acquisition and parsing, R for exploratory analysis, Hadoop for data storage and Map Reduce via


Java for production analytics, with results delivered through Ruby on Rails. Analytics practitioners share in many of the data management challenges of data scientists, although usually at a smaller scale. In contrast, O.R. applications tend to focus on problem solution, and O.R. analysts usually use fewer tools during a project.

Visualization is key to the success of data science projects since the information must be consumable to users. Who would want to use Google Now if it presented results in a table with p-values? Similarly, analytics practitioners value data visualization, whereas visualization is much less important to O.R. practitioners [3].

Analysis techniques may also differ with the large amounts of data collected. O.R. and analytics approaches generally assume a model and then fit the model to the data. The large amounts of data collected in many data science projects enable an alternative, model-free, data-driven approach. For example, language translation was dominated by manual, rule-driven approaches until an increase in storage and compute power enabled the storage and processing of large bilingual text corpora, from which statistical models could infer the translation rules.
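The data-driven idea can be caricatured in a few lines: count which target-language words co-occur with each source word across aligned sentence pairs, and let the counts pick a translation. The toy corpus below is invented; a real system would need millions of sentence pairs and far subtler statistics.

```python
from collections import Counter, defaultdict

# Toy parallel corpus of (English, Spanish) sentence pairs.
pairs = [
    ("the house", "la casa"),
    ("a house", "una casa"),
    ("the cat", "el gato"),
    ("a cat", "un gato"),
    ("the white house", "la casa blanca"),
]

# Count how often each source word co-occurs with each target word.
cooc = defaultdict(Counter)
for en, es in pairs:
    for e in en.split():
        for s in es.split():
            cooc[e][s] += 1

def translate_word(word):
    """Pick the target word that most often co-occurs with `word`."""
    return cooc[word].most_common(1)[0][0]

print(translate_word("house"))  # "casa"
print(translate_word("cat"))    # "gato"
```

No grammar rules are written anywhere; the mapping is inferred entirely from the aligned examples, which is the essence of the model-free approach the text describes.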

Duolingo [4], a free language-learning website, has created a data product based on a data-driven approach. As users progress through lessons, they help translate websites and documents. In other lessons, users vote on the correctness of translations. Statistical models based on user skill choose the best translations of documents that others have submitted to be translated for a fee.

O.R., analytics and data science are closely related – all apply math to gain insights – and the fuzzy descriptions of the three disciplines above have boundaries as porous as the borders of countries in the European Union. However, just as a person in Germany is most likely a German (although he or she could be French or Italian), an O.R./analytics/data science practitioner will most likely fit the description outlined in this article. ❙

Brian Keller ([email protected]), Ph.D., is a data science practitioner and lead associate at Booz Allen Hamilton. He is a member of INFORMS.

REFERENCES

1. Matthew Liberatore, Wenhong Luo, “INFORMS and the Analytics Movement: The View of the Membership,” Interfaces, Vol. 41, No. 6, November-December 2011, pp. 578-589.

2. W. S. Cleveland, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,” ISI Review, Vol. 69, pp. 21-26, 2001.

3. Matthew Liberatore, Wenhong Luo, “ASP: The Art and Science of Practice,” Interfaces, Vol. 43, No. 2, pp. 194-197, March-April 2013.

4. www.duolingo.com


FORUM

Oil & gas producers need to tame the gusher … of data

BY WARREN WILSON

Throughout the century and a half since the dawn of the commercial petroleum industry, oilmen have always hoped for the gusher – the big find that would spew enough oil to make them rich. There have always been far more “dry holes” than gushers, however, and the proportion only gets worse as oil and gas become harder to find and more difficult to produce. One thing the industry has in abundance today, however, is data. And just like some of the biggest oil discoveries, the data gusher offers huge promise if it can be tamed. However, the exploration and production (E&P) industry, and the IT companies that support it, have a lot of work ahead to derive maximum value from their growing troves of data.

The first oil wells didn’t rely on data at all. They were sited where oil seeped from the ground naturally, and the challenge – not simple, but not data-dependent – was figuring out how to dig or drill down to the source.

Data grew more important as the industry grew, reserves became harder to exploit and drilling technology evolved. Drillers began keeping paper records of what occurred during each work shift or “tour.” They tracked basic metrics such as the number of feet drilled per hour or day, obstacles encountered, injuries sustained.

Over the years recordkeeping has become steadily more thorough and sophisticated. Today the E&P industry has vastly better tools for every aspect of oilfield operations including: three-dimensional maps of subsurface geological structures and hydrocarbon reservoirs; graphs or “logs” of the wells’ downhole conditions (temperature, pressure, porosity, permeability, etc.); and records of injuries and environmental incidents. New equipment is often fitted with sensors that produce steady streams of data about temperature, vibration and other parameters that indicate whether the asset is operating as it should, or whether it needs service, repair or replacement.
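As a rough illustration of how such a sensor stream can signal the need for service, here is a minimal sketch that flags a reading sitting far outside the recent norm. The readings and threshold are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical vibration readings (mm/s) from one piece of equipment;
# a real feed would arrive continuously from the field.
readings = [2.1, 2.0, 2.2, 2.1, 2.3, 2.0, 2.2, 5.9]

def needs_service(window, new_value, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations
    away from the recent window's mean."""
    mu, sigma = mean(window), stdev(window)
    return abs(new_value - mu) > threshold * sigma

baseline, latest = readings[:-1], readings[-1]
print(needs_service(baseline, latest))  # True: the 5.9 reading is anomalous
```

Production systems use far richer models than a z-score, but the principle is the same: the stream itself tells you when an asset is drifting away from healthy behavior.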

Yet E&P companies still find themselves in the same position as their predecessors and their counterparts in other industries: They are awash in data, but they don’t have all the insight they need. For example, they have copious real-time data from individual wells, but they do not have a good handle on the dynamics of complex unconventional reservoirs. Similarly, they have access to vastly better analytical tools than ever before, but few companies can claim to have fully optimized field operations, production or asset management.

ANALYTICS TOOLS MUST EVOLVE

E&P companies’ drilling programs have always relied primarily on historical data that describe a given region and its history. Extrapolation and interpolation suggest where to drill next – if you drill between two producing wells, your odds of success are relatively good. Outside known reservoir boundaries, the odds fall off dramatically; so-called “wildcat” wells face a much greater risk of producing no return on many millions of dollars invested. That has always been the biggest gamble in oil exploration – even when armed with the best historical, descriptive data available, you still have to spend large sums up front to drill the well before finding out if your data is any good.

Today the industry’s data and analytics needs are changing, and so are the tools at its disposal. Having found and developed most of the world’s “easy” oil and gas reserves, E&P companies are venturing into even more remote locations and extracting hydrocarbons from unconventional sources. They are tapping into shale rock so impermeable that it won’t give up its gas and oil without first being “fractured” with water, sand and chemicals injected under high pressure. They are tapping sand deposits that contain oil so viscous that it won’t flow without first being diluted with solvents or softened by steam.


These unconventional sources pose unique challenges that require greater precision and real-time analytics – for example, to keep drill bits positioned precisely where they need to be within the shale “pay zone” and to control the placement, composition and pressure of fracking fluids to yield optimal results.

At the same time, E&P companies increasingly need predictive analytics tools, for a variety of purposes. They need to better understand how current production methods will affect long-term yield. They need more accurate predictions of asset behavior to improve continuity in drilling and production while minimizing the costs of spare equipment or service crews. Software vendors are increasingly offering analytics tools that address such needs.

But prediction is just one step forward. The next is so-called “prescriptive” analytics that go beyond merely predicting future behavior, to recommend the best course of action to achieve a desired result. Such capabilities are quite rare today and at a very early stage of development. But they promise to bring new levels of performance, not just in E&P but in many other industries as well, because they track the results of their recommendations and feed those results back into the prescriptive algorithms to produce (in theory, at least) better and better recommendations over time.
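That feedback idea can be sketched as a simple learning loop: recommend an action, record the observed result, and let the accumulated outcomes steer later recommendations (an epsilon-greedy scheme). The action names and the outcome log below are hypothetical treatments for an unconventional well, not anything from the article.

```python
import random

# Running tallies per action: successes observed and times tried.
wins = {"steam_soak": 0, "solvent_dilute": 0, "refracture": 0}
trials = {a: 0 for a in wins}

def record(action, payoff):
    """Feed an observed outcome back into the running tallies."""
    wins[action] += payoff
    trials[action] += 1

def recommend(epsilon=0.1):
    """Usually pick the best success rate so far; occasionally explore."""
    if random.random() < epsilon or not any(trials.values()):
        return random.choice(list(wins))
    return max(wins, key=lambda a: wins[a] / trials[a] if trials[a] else 0.0)

# Replay a hypothetical log of (action, outcome) field results.
history = [("steam_soak", 0), ("solvent_dilute", 1), ("refracture", 1),
           ("refracture", 1), ("steam_soak", 0), ("refracture", 1),
           ("solvent_dilute", 0), ("refracture", 0)]
for action, outcome in history:
    record(action, outcome)

print(recommend(epsilon=0.0))  # exploits the best observed rate: refracture
```

Each recommendation’s result flows back into `record`, so the policy improves as evidence accumulates, which is exactly the closed loop prescriptive analytics promises.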

COMPLEX NEEDS DRIVE DEMAND FOR NEW SOLUTIONS

Despite all the potential benefits of advanced analytics in oilfield operations, their adoption is still at a very early stage. That is true of most industries, simply because the technologies themselves are new. Software vendors today are evangelizing their predictive capabilities; only a tiny number yet offer “prescriptive” analytics. Indeed, the term itself is not yet widely known.

An additional factor in IT adoption in E&P is the industry’s insularity. It is a world unto itself, an industry that operates largely outside of everyday view and speaks a specialized language that few outsiders understand. It is an industry driven by geologists and engineers who tend to regard IT as a mere support tool for existing operations.

The idea that IT, and particularly analytical software, can provide strategic and competitive advantage is not widely held. But until it is, E&P companies will not find or produce as much oil and gas as they could, nor will they manage their operations as efficiently and safely as they could. E&P companies owe it to themselves, and to their shareholders, to broaden their traditional view of IT and consider the strategic advantage it can provide. IT vendors, for their part, must state their value propositions in terms that E&P engineers can understand. They must explain, in plain language, how IT can help E&P companies find and extract more oil. ❙

Warren Wilson ([email protected]) leads Ovum’s Energy team, focusing primarily on IT for upstream Oil & Gas. He has been an analyst for 14 years. He joined Ovum in 2006 when Ovum acquired his former employer, Summit Strategies. At Summit his primary area of responsibility was mobile business applications. On joining Ovum, his research focus shifted to business management applications such as enterprise resource planning, supply chain management, and analytics. Before becoming an IT analyst, Wilson had been a reporter and editor for U.S. newspapers including the Casper (Wyoming) Star-Tribune, where he covered oil & gas and other energy industries. He majored in geology at Carleton College in Northfield, Minn., and later worked in the oilfield as a roughneck and well logger.


SOCIAL, MOBILE, ANALYTICS & CLOUD

Analytics & BPM

BY MALCOLM ROSS

‘SMAC’ delivers a much-needed combination punch for peak customer experience.

In some circles, business process management (BPM) has developed an unfortunate reputation, particularly among general business media. From the media’s perspective, the negative reputation is implicit in the words themselves: “process” implies rigidity, and “management” implies slowness. Business that’s rigid and slow is antithetical to success in the age of social, mobile and cloud technologies – and it doesn’t address the importance of analytics at all. BPM as a phrase contains everything that may sound bad about old-fashioned enterprise software.

The truth is, it is precisely the advent of social, mobile and cloud that makes business process management, when combined with sophisticated data analytics, the cornerstone of success in this new age.

The vocabulary of IT is increasingly embracing social, mobile, analytics and cloud – or its catchier acronym “SMAC.” All of these words have one thing in common. They are all ways to engage customers, whether internal or external, to do several important things:

• to create meaningful interpretations of information,
• to adapt quickly and nimbly to even small changes in the business environment,
• to improve the customer experience, and most importantly,
• to do it all more quickly than ever before.

SYSTEMS OF RECORD VERSUS SYSTEMS OF ENGAGEMENT

Understanding how analytics works to support intelligent business process and provide peak customer experience means taking a hard look at the two camps of enterprise data-related software.

The industry analyst firm Forrester and Forrester senior analyst Clay Richardson draw a distinction between “systems of record” and “systems of engagement.”

ERP software, big data repositories and other similar technologies comprise what Richardson refers to as systems of record. They record data according to organizational function, and they offer a fairly static representation of a company’s performance – essentially a snapshot, or in the best case, a dashboard.

Systems of engagement sit in front of systems of record. They provide the rules for creating customer-facing responses to the data stored in those systems of record. That includes rules for interacting with customers in the mobile, social and cloud environment. The rules themselves are based on an analysis of the information in the systems of record.

Almost by definition, business process management can be seen as a system of engagement. It’s how workers engage with the people and data they need to make informed business choices, quickly, easily, measurably and routinely.

Some BPM systems, it can be argued, still have one foot in the systems of record camp, simply because they record business rules and processes. At the high end, these systems may even include specific functionality that allows companies to look at their business as a collection of records – essentially a snapshot of information related to particular work transactions.

It’s when SMAC enters the picture, though, that the distinction between the two systems emerges. The introduction of methods of engagement such as social, mobile, analytics and cloud puts a picture frame around Forrester’s contention that BPM is much more than a system of record, and is, in fact, a sophisticated system of engagement.

The benefit of this type of analytically supported system of engagement is that it creates routines (business processes) for the best possible customer experience. It is a means to that end, not an end in itself. With SMAC, systems of engagement offer businesses a way to make sense of all the means by which workers connect to their business data, their colleagues’ brainpower and their customers.

Any organization that uses BPM to understand data, create business rules that can be adapted almost on the fly, and apply those rules through various media such as social and mobile technology, is ultimately creating the best possible customer experience it can offer. Everything else is just a delivery mechanism.

In particular, social networking – whether accessed through mobile networks or through the cloud – offers a truly meaningful customer experience only when filtered through the lens of analytically supported business processes. Social media without the context of work is nothing more than water cooler talk.

It’s the overlap of work with social engagement (call it “worksocial”) that really creates a compelling case for business success. And implicit in the notion of work is the analytical component.

SMAC DELIVERS


The worksocial model applied to data analysis and BPM delivers a modern and more consistent customer experience, increasing organizational productivity and operational efficiency for a more agile business. Real-time analytics make it possible to reach informed decisions and take action from anywhere, at any time. Organizations can quickly respond to new market opportunities, and enterprise social software finally has a real business purpose.

Now that we’ve covered some background theory of how SMAC relates to analytics and business process, let’s look at what it means in the real world.

ANALYTICS AND PROCESS IN ACTION – PRACTICAL APPLICATIONS

For analytics to be actionable, analysis must work in conjunction with process. Business intelligence and analytics yield intelligent business process management.

By having more real-time intelligence and analytics directly feeding automated processes via mobile, social and cloud mechanisms, an organization can see trends, issue actions and measure the results through reports delivered by enterprise social media.

The application of SMAC to business intelligence and process essentially creates a definition of what a peak customer experience looks like to any one company, all the way up to what someone should do when actually interacting with the customer. Again, keep in mind that both internal and external customers benefit. Following are only a few examples of tangible benefits:

Field service management and operational efficiency. One of the largest wind turbine companies in the United States and the world, EDP Renewables has 28 separate, geographically dispersed wind farms, each populated with massive 300-foot-tall turbine generators.


The company’s wind farms generate 3.3 gigawatts of green wind energy – more power than created by the Hoover Dam.

The energy market fluctuates greatly. The margin on a watt of energy produced versus a watt sold is constantly in flux, depending on the weather and usage patterns. Identifying areas where the market is very good, such as where demand for climate control is high, might dictate prioritization of that equipment for maintenance.

EDP uses this type of real-time streaming data analytics information to help prioritize how the company manages and maintains its wind turbine assets. Weather patterns and related North American weather events are streamed into the organization’s business process management application as big data information. By analyzing that information in context with data on turbine maintenance issues, EDP can anticipate weather patterns to identify the potential energy output – and the potential price of that energy – for particular farms. This is essential for prioritizing the remediation of turbine issues to maximize the profitability of the company.

With thousands of wind turbine components from a variety of vendors, EDP also needs to be able to analyze the relative quality of the turbines and compare their performance over time against the cost to repair and maintain these pieces of equipment. They are constantly reviewing their vendors in this analysis. By correlating factors such as time between repairs from one vendor to another, EDP can make appropriate assessments on the prioritization and maintenance of its equipment.
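One way to sketch the prioritization logic described above is to rank outstanding repairs by the revenue at risk: lost output times the forecast price over the expected repair time. All figures and field names here are invented for illustration, not EDP data.

```python
# Hypothetical open repair issues across wind farms.
issues = [
    {"farm": "Plains A", "mw_lost": 3.0, "price_mwh": 62.0, "repair_h": 10},
    {"farm": "Coast B",  "mw_lost": 1.5, "price_mwh": 95.0, "repair_h": 8},
    {"farm": "Mesa C",   "mw_lost": 2.0, "price_mwh": 40.0, "repair_h": 12},
]

def revenue_at_risk(issue):
    """Expected revenue lost if the repair waits its full duration."""
    return issue["mw_lost"] * issue["price_mwh"] * issue["repair_h"]

# Dispatch crews to the costliest outage first.
queue = sorted(issues, key=revenue_at_risk, reverse=True)
for issue in queue:
    print(issue["farm"], revenue_at_risk(issue))
```

Feeding live price forecasts into `price_mwh` is what turns this from a static report into the real-time prioritization the article describes.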

Quality control. A leading global beverage provider uses business analytics in mobile applications to assist in inspecting individual stores for quality control to optimize the customer experience. With tens of thousands of stores all over the world, the company has dramatically accelerated the inspection process and the return of inspection reports that identify areas for improvement.

Inspectors in the field are able to perform complete inspections using an iPad, with immediate tabulation of individual store issues, as well as regional and supplier trends. In any given location, field inspectors can examine the store for factors such as equipment and service quality, customer experience, signage and cleanliness. The real-time data crunching that produces scoring data enables inspectors to then sit with store managers and discuss action plans, all in a single inspection visit. The company calculates that this process acceleration saves it more than 30,000 hours annually in inspection times, directly improving the quality of the customer experience more rapidly and consistently.

Customer satisfaction. Online retail giant Amazon does not source every product it sells. When you purchase something from Amazon, that product may actually be sourced from a department store, specialty merchant or other retail organization. When a customer receives an incorrect shipment or an incorrect charge, Amazon refers to this error as a “price purchase variance.” As the intermediary, Amazon is responsible for resolving that variance.

Because of the sheer quantity of transactions conducted through its website, Amazon has considerable familiarity and expertise in handling these types of exceptions. In some cases in the past, Amazon would have simply absorbed the difference in cost or the cost of correcting the transaction problem. Price purchase variance settlements were easily costing the company millions of dollars a year.

Amazon began developing a process environment that would take in the massive volume of exception handling and apply business rules to it, to automate decision logic. They can see how the variances are coming in, look at the past performance of other vendors, determine the vendors’ track record of fulfilling products and determine whether there is a history of that type of transaction problem.

Using statistical analysis of price purchase variance frequency, Amazon now applies business rules to automatically push the issue back to the sourcing vendor and request resolution – whether auto approval, make good or other tasks for the vendors. Through the same system, they can initiate communication with the customers to assure them that Amazon is aware of the problem and is working to resolve it.
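A rule layer of this kind might look roughly like the following sketch. The thresholds, routing statuses and vendor records are all hypothetical, not Amazon’s actual logic.

```python
# Invented vendor history: order counts and past variances.
vendor_history = {
    "acme":   {"orders": 400, "variances": 4},
    "globex": {"orders": 120, "variances": 30},
}

def route_variance(vendor, amount):
    """Decide how to handle a price purchase variance with simple rules."""
    h = vendor_history[vendor]
    rate = h["variances"] / h["orders"]
    if amount < 5.0:
        return "auto_approve"          # below the cost of handling it
    if rate > 0.10:
        return "push_back_to_vendor"   # chronic offender: request resolution
    return "manual_review"             # reliable vendor, unusual case

print(route_variance("acme", 3.50))    # auto_approve
print(route_variance("globex", 25.00)) # push_back_to_vendor
print(route_variance("acme", 25.00))   # manual_review
```

The payoff is that the statistical profile of each vendor, not a human case worker, makes the routine routing decision; people only see the cases the rules cannot settle.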

Social engagement for collaboration. Social media for business involves both external and internal aspects. In consumer social platforms (Twitter, Facebook, LinkedIn, etc.), there clearly can be a business integration element, including brand management and customer relationship management.

There’s also an internal component to social media for business. This aspect focuses on developing a project-based social collaboration framework for the open exchange of data and knowledge within the enterprise. In some cases, there may also be a hybrid of external and internal elements. Processes can be designed to read external social media feeds, identify exceptions or trends, and bring those things into an automated environment to set up response mechanisms (particularly important for brand and reputation management).

By applying social media in the context of work – the “worksocial” concept mentioned earlier in this article – organizations can openly share both collaborations and conversations related to the analysis at hand. Whether it is an individual case of turbine maintenance, store inspection, invoice exception or any other process, someone else involved in the same type of task can see that information and openly share related data or process recommendations and strategies. This improves business decision-making and the customer experience as well.

CONCLUSION

Peak customer experience has long been in need of a healthy SMAC.

Social, mobile, analytics and cloud capability fuels the ways in which business processes can be automated. Organizations can connect their workers to the job at hand, letting them all make use of whatever information they need to deliver the best result to customers. And SMAC can improve the quality of the user experience at the speed of the Internet, bringing together many users, using many devices, across many different networks, in many different places everywhere, all at once. ❙

Malcolm Ross is VP of Product Marketing for Appian. He can be reached at [email protected].


COMPETITIVE EDGE

Predictive analytics

BY ERIC SIEGEL

Harnessing the power of big data.

Every day’s a struggle. I’ve faced some tough challenges such as which surgery to get, how to invest for my business and even how to deal with identity theft. With so much stuff coming at me from all angles, daily prosperity relies on spam filters, Internet search engines, and personalized music and movie recommendations. My mailbox wonders why companies still don’t know me well enough to send less junk mail.

These predicaments matter. They can make or break your day, year or life. But what do they all have in common?

These challenges – and many others like them – are best addressed with prediction. Will the patient’s outcome from surgery be positive? Will the credit applicant turn out to be a fraudster? Will the investment fail? Will the customer respond if mailed a brochure?

There’s another angle. Beyond benefiting you and me as individuals, prediction bestows power upon an organization: Big business secures a competitive stronghold by predicting the future destiny and value of individual assets.

For example, in the mid-1990s, Chase Bank witnessed a windfall predicting mortgage outcome. By driving millions of transactional decisions with predictions about the future payment behavior of homeowners, Chase bolstered mortgage portfolio management, curtailing risk and boosting profit.

INTRODUCING ... THE CLAIRVOYANT COMPUTER

Making such predictions poses a tough challenge. Each prediction depends on multiple factors: the various characteristics known about each patient, each homeowner and each e-mail that may be spam. How shall we attack the intricate problem of putting all these pieces together for each prediction?

The solution is machine learning: computers automatically discovering patterns and developing new knowledge by furiously feeding on modern society’s greatest and most potent unnatural resource – data.

Data can seem like such dry, uninteresting stuff. It’s a vast, endless regiment of recorded facts and figures. It’s the unsalted, flavorless residue deposited en masse as businesses churn away.

But the truth is that today’s big data embodies a priceless collection of experience from which to learn. Every medical procedure, credit application, Facebook post, movie recommendation, fraudulent act, spammy e-mail and purchase of any kind is encoded as data and warehoused. This veritable Big Bang delivers a plethora of examples so great in number only a computer could manage to learn from them.

This learning process discovers and builds on insightful gems such as:

• Early retirement decreases your life expectancy.
• Online daters more consistently rated as attractive receive less interest.
• Vegetarians miss fewer flights.
• Local crime increases after public sporting events.

Figure 1: The learning process.

Machine learning develops predictive capabilities with a form of number-crunching, a trial-and-error learning process that builds upon statistics and computer science. In commercial, industrial and government applications – in the real-world usage of machine learning to predict – it’s known as: Predictive analytics — Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.
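In miniature, “learning from experience (data) to predict” can look like the sketch below, which labels a new case by its closest historical example (a 1-nearest-neighbor rule). The churn features, figures and labels are invented for illustration.

```python
# Past customers: (dropped_calls, monthly_bill) -> observed outcome.
past = [
    ((8, 95.0), "cancel"),
    ((7, 80.0), "cancel"),
    ((1, 45.0), "stay"),
    ((0, 60.0), "stay"),
    ((2, 50.0), "stay"),
]

def predict(features):
    """Label a new case like its closest historical example."""
    def dist(a, b):
        # Squared Euclidean distance (features left unscaled for brevity).
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(past, key=lambda example: dist(example[0], features))
    return nearest[1]

print(predict((6, 85.0)))  # resembles past cancellations: "cancel"
print(predict((1, 55.0)))  # resembles past stayers: "stay"
```

Real systems use richer models and thousands of features, but the shape is the same: recorded experience in, a per-individual prediction out.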

APPLIED PREDICTION

“The powerhouse organizations of the Internet era, which include Google and Amazon ... have business models that hinge on predictive models based on machine learning.”

– Professor Vasant Dhar, Stern School of Business, NYU

Every important thing a person does is valuable to predict, including: consume, work, love, procreate, vote, mess up, commit a crime and die. Here are some examples:

• Prediction drives the coupons you get at the grocery cash register. U.K. grocery giant Tesco predicts which discounts will be redeemed in order to target more than 100 million personalized coupons annually at cash registers across 13 countries. This increased coupon redemption rates by a factor of 3.6 over previous methods.

• Predicting mouse clicks pays off massively: websites predict which ad you’ll click in order to instantly choose which ad to show, driving millions in new-found revenue.

• Netflix awarded $1 million to a team of scientists who best improved their recommendation system’s ability to predict which movies you will like.

• Obama was re-elected in 2012 with the help of voter prediction. The campaign predicted which voters would be positively persuaded by campaign contact, and which would be inadvertently influenced to vote adversely. Acting on these predictions was shown to successfully convince more voters to choose Obama than traditional campaign targeting.

• The leading career-focused social network, LinkedIn, predicts your job skills.

• Online dating leaders Match.com, OkCupid and eHarmony predict which prospect on your screen would be the best bet at your side.

• Target predicts customer pregnancy in order to market relevant products accordingly. Nothing foretells consumer need like predicting the birth of a new consumer.

• Student essay grade prediction has been developed for automatic grading. The system grades as accurately as human graders.

• Wireless carriers predict how likely it is you will cancel and defect to a competitor – possibly before you have even conceived a plan to do so – based on factors such as dropped calls, your phone usage, billing information and whether your contacts have already defected.

• Wikipedia predicts which of its editors, who work for free to keep this priceless asset alive, are going to discontinue their valuable service.

• Allstate Insurance tripled the accuracy of predicting bodily injury liability from car crashes based on the characteristics of the insured vehicle. This could be worth an estimated $40 million annually to the company.

• At Stanford University, a machine learned to diagnose breast cancer better than human doctors by discovering an innovative method that considers a greater number of factors in a tissue sample.

• Researchers predict your risk of death in surgery based on aspects of you and your condition in order to inform medical decisions.

• Crime-predicting computers help decide who belongs in prison. To assist with parole and sentencing decisions, officials in states such as Oregon and Pennsylvania consult prognostic machines that assess the risk a convict will offend again.

Organizations of all kinds benefit by applying predictive analytics, since there's ample room for operational improvement; organizations are intrinsically inefficient and wasteful on a grand scale. Marketing casts a wide net; "junk mail" is marketing money wasted and trees felled to print unread brochures. An estimated 80 percent of all e-mail is spam. Risky debtors are given too much credit. Applications for government benefits are backlogged and delayed.

With predictive analytics, millions of decisions a day determine whom to call, mail, approve, test, diagnose, warn, investigate, incarcerate, set up on a date and medicate. By answering this mountain of smaller questions, predictive analytics combats financial risk, fortifies healthcare, conquers spam, toughens crime-fighting, boosts sales and may in fact answer the biggest question of all: How can we improve the effectiveness of all these massive functions across business, government, healthcare, non-profit and law enforcement work? ❙

Eric Siegel, Ph.D., is the founder of Predictive Analytics World (www.pawcon.com) and the author of “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die,” from which this article was adapted with permission of the publishers, Wiley. Upcoming Predictive Analytics World conferences will be held in Boston, San Francisco, Chicago, Washington, D.C., Berlin and London. For more information about predictive analytics, see the Predictive Analytics Guide. ©2013 Eric Siegel. All rights reserved.


SUPPLY CHAIN MANAGEMENT

Forecasting and optimization

Beyond supply chain planning: the development of optimization in complex sourcing.

BY ARNE ANDERSSON

Optimization is the perfect technology for sourcing since it deals with selecting the best element that meets specified criteria from some set of available alternatives, i.e., "finding the best deal." Its use is becoming more widespread in industry as data handling, processing power and solvers have improved to the extent that there is no other way to handle the levels of data and complexity in sourcing projects that are run today.

What constitutes the "best deal" is a discussion in its own right, but the person or organization doing the buying will determine it. If it is a commodity with a clearly defined specification, it could well be a question of the lowest price. If, however, it is a service that is being sourced, "softer" criteria will more likely need to be met, so price is only one factor that will be considered. As more criteria are introduced, the complexity increases, and it is the ability to handle complexity that has seen a dramatic change in the way large organizations approach sourcing.

Fifty years ago the telephone and a notepad were the tools available for sourcing, so the levels of complexity were relatively low. Spreadsheets and e-mail dramatically increased the levels of complexity that could be handled, but even these techniques pale into insignificance compared with the levels of complexity that are handled by online sourcing platforms today. A typical "buying event" today will have thousands of items, and tens of thousands of offers from hundreds of suppliers. Even the simplest event will have potentially millions of combinations of goods and suppliers, so optimization is the only way to analyze this level of data.
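The scale argument can be made concrete with a toy event. The figures below are illustrative (they are not from the article), and real platforms use optimization solvers rather than enumeration, but the sketch shows how quickly the search space explodes:

```python
from itertools import product

# Hypothetical toy sourcing event: 3 items, each offered by 4 suppliers.
# All prices are illustrative, not taken from the article.
prices = [
    [10, 12, 11, 13],  # supplier prices for item 0
    [20, 19, 22, 21],  # supplier prices for item 1
    [5, 6, 4, 7],      # supplier prices for item 2
]

# Every possible award picks one supplier per item: 4^3 = 64 candidates
# here, but 10 suppliers and 1,000 items would give 10^1000.
assignments = list(product(range(4), repeat=len(prices)))
print(len(assignments))  # 64

# Brute-force "best deal" on the toy data.
best = min(assignments,
           key=lambda a: sum(prices[i][s] for i, s in enumerate(a)))
cost = sum(prices[i][s] for i, s in enumerate(best))
print(best, cost)  # (0, 1, 2) 33
```

Without side constraints the optimum decomposes into the cheapest supplier per item, but real business rules (volume discounts, supplier-count limits) couple the items together and remove that shortcut — which is exactly where solvers earn their keep.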

HANDLING COMPLEXITY

The ability to handle large amounts of data has also seen sourcing change from a one-way process, where suppliers are asked to make offers for individual items based on the sourcing companies' criteria, to a process of collecting an array of information from suppliers and analyzing the information collected in order to find the best solution. This combinatorial approach allows suppliers to express their strengths by creating their own groups of items so they can make their most competitive offers. Trade Extensions, for example, carried out the first online combinatorial auction in February 2001 when it worked with Volvo on a packaging tender, which involved 600 items, 15 suppliers and had a value of approximately $15 million.

To show how far the levels of complexity have increased, a U.S. bank recently used the platform to allocate a spend of $1 billion sourcing the materials to produce and deliver two billion items of direct mail after collecting 400,000 bids from more than 100 suppliers for 65,000 items. This level of complexity is commonplace nowadays, and many recent projects take the complexity to another level by integrating sourcing and planning.

MOVING BEYOND SOURCING

Companies that have become familiar with using the technology for bid collection and analysis now realize that the software can be configured to solve any constraint-driven challenge. For example, one of our customers is using the platform to define the manufacturing process of its products stage by stage. This customer has numerous manufacturing sites and even more assets at its disposal. In this case assets are manufacturing equipment that is owned and operated by external suppliers. Each asset has been qualified by the company to perform a certain operation for each product, so the challenge is to optimize the manufacturing process to ensure that each product goes through the correct number of processes required to produce the finished product using only qualified assets while taking into account the various costs – raw material, production, transport, warehousing, inventory, etc. It's a simple concept and a classic optimization challenge, and it is made more complex by introducing further constraints. For example, it is possible to increase the number of operations individual assets can perform on different products, but this qualification process costs time and money and there is a qualification budget that cannot be exceeded.

To identify the most appropriate assets to use, the manufacturer optimizes its production based on monthly demand forecasts for each product and per market. It is an incredibly complex system in terms of data, but optimization transforms the data into tangible information that the business uses to determine its day-to-day operations. And because the data is continually updated, it essentially creates a dynamic model of the supply chain on which further analysis can be carried out. For example: What happens if there is a natural disaster that completely closes site Y? What happens if there is a 15 percent wage increase in China? What happens if "Supplier X" goes bust? If the data is handled in the correct way, there are no limits on the "what if?" questions companies can ask, so they can see the impact of any proposed changes before implementation.

Figure 1: Asset optimization based on monthly forecasts for each product and market, taking into account operational constraints. The model balances raw material costs (the precise mix of raw materials for a product can vary between sites), production costs (external suppliers have contracted prices for each operation and product), supply chain costs (freight, warehousing/inventory/capital, taxes and duty) and initiative costs (the cost of changing or moving production, and the qualification cost for approving an asset to perform an additional operation).
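A deliberately tiny sketch (with made-up prices, products and sites) shows the mechanics of such a "what if?" run: solve the model, close a site, solve again and compare the answers.

```python
# Toy "what if?" analysis (all figures are made up): re-optimize the
# same sourcing model after removing a site, and compare total cost.
prices = {  # production cost of each product at each site
    ("p1", "siteX"): 10, ("p1", "siteY"): 9, ("p1", "siteZ"): 12,
    ("p2", "siteX"): 20, ("p2", "siteY"): 18, ("p2", "siteZ"): 19,
}
products = ["p1", "p2"]

def optimize(sites):
    # Brute-force stand-in for the solver: cheapest open site per product.
    choice = {p: min(sites, key=lambda s: prices[(p, s)]) for p in products}
    return choice, sum(prices[(p, s)] for p, s in choice.items())

base_choice, base_cost = optimize(["siteX", "siteY", "siteZ"])
print(base_choice, base_cost)  # both products go to siteY, total 27

# What happens if a natural disaster completely closes site Y?
dis_choice, dis_cost = optimize(["siteX", "siteZ"])
print(dis_cost - base_cost)  # 2 -- the cost impact of losing site Y
```

The same pattern scales: keep the data model current, edit one assumption (a site, a wage rate, a supplier), re-solve, and read the delta before committing to any change.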

FLEXIBILITY CREATES COMPLEXITY

The flexibility that is provided to organizations in terms of analysis creates its own problems, and a large proportion of the research that we are carrying out at the moment deals with improving our optimization software, both in terms of capability and user friendliness. There are many factors to consider.

First of all, one must always show respect for complexity. The type of mathematical problems that need to be tackled are known in the scientific world as NP-complete. Simply put, it is impossible to give any guarantees for how long they will take to solve. Therefore, we have carefully developed our skills and experience in how to properly handle hard optimization problems in practice. In our experience, with a proper re-formulation or relaxation of the hardest problems, there is basically always a working solution available, with or without tweaking.

As an example, consider the following business rule: "For each product, we want two suppliers, but no individual supplier should be awarded less than 20 percent." This seems like a quite natural rule. Let us re-formulate it slightly: "For each product, no supplier is awarded more than 80 percent, and the total number of suppliers is at most two." Are the two rules identical? No; there are some subtle differences. For example, if there is only one available supplier for a product, the first rule would create an infeasible problem, while the second could still be handled. But, more importantly, the difference in execution time on a solver may be very large when these rules are combined with other rules. And, by helping clients re-formulate rules in this way, we can bring significant assistance in tackling the most challenging problem instances.
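A toy feasibility check makes the difference concrete. Everything here is our own illustration: the 10 percent grid, the rule predicates and, in particular, the assumption that under the second rule a shortfall may go to a fallback such as an incumbent supplier (the article does not specify that detail).

```python
from itertools import product

STEP = 10  # percentage granularity for this toy enumeration

def rule1(alloc):
    # "For each product, we want two suppliers, but no individual
    #  supplier should be awarded less than 20 percent."
    awarded = [a for a in alloc if a > 0]
    return len(awarded) == 2 and min(awarded) >= 20

def rule2(alloc):
    # "No supplier is awarded more than 80 percent, and the total
    #  number of suppliers is at most two."
    awarded = [a for a in alloc if a > 0]
    return len(awarded) <= 2 and all(a <= 80 for a in alloc)

def satisfiable(rule, n_suppliers, full_award=True):
    # Enumerate percentage splits. full_award=False encodes our own
    # assumption that a shortfall may be covered by a fallback.
    for alloc in product(range(0, 101, STEP), repeat=n_suppliers):
        total = sum(alloc)
        ok_total = (total == 100) if full_award else (0 < total <= 100)
        if ok_total and rule(alloc):
            return True
    return False

# With two available suppliers, both formulations can be satisfied.
print(satisfiable(rule1, 2), satisfiable(rule2, 2))  # True True

# With one available supplier, rule 1 can never hold (it demands two
# suppliers), while rule 2's 80 percent cap becomes satisfiable once
# a partial award to a fallback is allowed.
print(satisfiable(rule1, 1), satisfiable(rule2, 1, full_award=False))  # False True
```

The execution-time point from the article is separate: even when two formulations admit the same solutions, they can present very different structures to a solver once combined with other rules.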

Another example where much care is needed is related to numeric precision. It is not uncommon that very large numbers are mixed with very small numbers in the same sourcing/optimization project (e.g., when a retailer sources products where volumes differ by several orders of magnitude between different product categories). However, the small numbers are just as significant as the large numbers and they cannot be ignored.
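A short sketch shows how small values can be silently lost when they are accumulated alongside much larger ones in double-precision arithmetic (the spend figures are illustrative):

```python
import math

big_item = 1e16               # one very large spend line (illustrative)
small_items = [1.0] * 10_000  # 10,000 small spend lines of 1.0 each

# Naive left-to-right accumulation: near 1e16 a double's resolution is
# 2.0, so each individual 1.0 is rounded away and silently dropped.
naive = big_item
for x in small_items:
    naive += x
print(naive == big_item)  # True -- 10,000 units of spend have vanished

# Compensated summation keeps the small lines:
exact = math.fsum([big_item] + small_items)
print(exact - big_item)  # 10000.0
```

Solvers face the analogous issue through their feasibility and optimality tolerances, which is one reason rescaling the model's units so coefficients have comparable magnitudes is a common modeling recommendation.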

We also have to remember that we are working with people in sourcing departments and not computer scientists from academic institutions, and often users will create impossible or illogical queries to solve. Therefore, helping users to identify conflicting rules and constraints is of great importance. Not only may we face conflicting rules, but sometimes it may be very hard to understand why a particular solution is the optimal one. For example, we may ask ourselves why one particular supplier is not included in the optimal solution, and access to good automatic explanations is of vital importance. Such an explanation could be, "Supplier X not allocated because of Rule Y," or "Not allocated because price is too high," etc.

Alongside the challenge of formulating the correct query is the practical problem of computing power. Because the queries are NP-complete problems and it is impossible to predict how long they will take to solve, they can tie up a significant amount of computer resources. The Trade Extensions platform solves this by dynamically allocating computer resources over the cloud. While it's commonplace for "the cloud" to be used for data storage, using it for data processing is still quite rare, yet it allows an unlimited number of complex queries to be solved simultaneously.

Figure 2: Optimization used in conjunction with large-scale data and effective reporting is transforming sourcing and moving into areas beyond supply chain planning and asset optimization.

CONCLUSION

Optimization is transforming sourcing and its influence on other areas of business is only going to increase. Data handling, equation definition, solvers and reporting are improving all the time, so the number of people and organizations able to access these incredibly sophisticated tools will grow and optimization applications will only be limited by individuals' creativity. ❙

Arne Andersson co-founded Trade Extensions (www.tradeextensions.com) in June 2000. He is one of the world’s leading experts in designing algorithms for optimization, and he has published more than 50 articles in major scientific journals. In 2012 he became a member of the Royal Swedish Academy of Engineering Sciences. Previously, Andersson was a professor of computer science at Uppsala University, the oldest university in Sweden (founded in 1477) and one of the highest ranked universities in Europe.


LOAD FORECASTING

Utilities dust off the forecasting playbook

Smart grid data brings challenges and opportunities for power industry.

BY TAO HONG AND ALYSSA FARRELL

The age-old business of forecasting is once again a hot topic of conversation at utilities. As the business needs shift to a more proactive management style, analytics that give insight into the future – whether customer adoption of electric vehicles (EVs) over the next five years or tomorrow's wind power generation – are in demand.

For load forecasters specifically, the scrutiny is intensifying. Previously, utilities didn't get many questions about the accuracy of their load forecasts during the regulatory rate case approval process. But now, new rate cases are harder and harder to approve. In this environment, utilities need more defensible forecasts to secure regulatory approval.

Under pressure to demonstrate a return on smart grid investments, utilities are using the data they collect from smart meters and other smart grid devices to better understand customers, design demand response (DR) programs, make buying and selling decisions on the energy market, and increase the reliability of the grid. Forecasting plays a key role in each of these areas, from modeling future load growth to predicting the impact of DR.

Forecasting is also becoming more critical to the operations of a utility because of the increasing penetration of distributed energy resources, EVs and energy-efficient appliances. Previously, when forecasting electricity demand, utilities didn't have to worry about electric vehicles or solar panels on rooftops or wind farms because these technologies were not present in significant enough numbers to have any real effect. Now, however, they're increasing in prevalence and therefore increasing the challenge of accurately forecasting electricity demand.

Advanced Metering Infrastructure (AMI) is the primary technology that offers forecasters more timely and granular data for load analysis and forecasting. With AMI, the utility has two-way communication with the meter (electricity, water or gas), and it gets readings back in an automated fashion in real time, which means that all the data about energy consumption, down to the meter level, can be more granular than ever before.

Figure 1: Ten years of hourly electric load of a U.S. utility at the corporate level. As millions of smart meters are being installed, utilities will see more and more hourly or even sub-hourly load series at the household level. The data brings both challenges and opportunities to the utility industry.

AN EXPANDING ROLE FOR UTILITY FORECASTERS

For the vast majority of the electricity grid, energy consumption is mainly driven by weather, human activities and the interactions among those variables. In the past, if utilities could predict temperature and properly model seasonal behaviors, they would arrive at a pretty decent forecast. Now, utilities with renewable generation resources may need to forecast cloud cover or wind speed.

For example, as cloud cover increases, solar photovoltaic output goes down. This means the net demand on the remaining system will increase under the same loading condition. The opposite is true for wind. As wind speed increases in a region, the output from wind farms increases and net demand on the system is reduced. Unfortunately, making predictions about cloud cover and wind speed and direction is significantly more challenging than predicting temperature. The high volatility of wind and solar makes today's load forecasting much more complicated than before.
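The net-demand relationship can be sketched with a deliberately simplified toy model (the numbers are illustrative, not utility data): the grid must serve metered load minus behind-the-meter renewable generation.

```python
# Toy model (illustrative numbers): net demand is consumer load minus
# weather-dependent solar and wind generation.
def net_demand(load_mw, solar_capacity_mw, cloud_cover,
               wind_capacity_mw, wind_output_fraction):
    solar_mw = solar_capacity_mw * (1.0 - cloud_cover)  # clouds cut solar
    wind_mw = wind_capacity_mw * wind_output_fraction   # from a power curve
    return load_mw - solar_mw - wind_mw

# The same 1,000 MW of consumer load under two weather scenarios:
clear_breezy = net_demand(1000, 200, cloud_cover=0.25,
                          wind_capacity_mw=300, wind_output_fraction=0.5)
overcast_calm = net_demand(1000, 200, cloud_cover=0.75,
                           wind_capacity_mw=300, wind_output_fraction=0.125)
print(clear_breezy, overcast_calm)  # 700.0 912.5
```

Identical consumption, yet the system must be ready for over 200 MW of swing driven purely by weather — which is why cloud-cover and wind-speed forecast errors now feed directly into load forecast errors.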

In addition, EV charging is quite difficult to model. If EV owners regularly charged their batteries in the evening hours, that would be a predictable load to forecast, but human behavior is erratic. We come home early some days, stay late, go out for dinner, work from home, etc. The volatility in demand that is introduced by these new technologies is putting new pressures on utility forecasters.

BRIDGING THE CULTURAL AND TECHNICAL DIVIDE

As a fundamental problem in the utility industry, forecasting finds its applications across several departments of a utility, such as planning, operations, marketing and customer services. Many utility forecasting teams are siloed, sitting in different departments. Some utilities have an analytics center of excellence that serves multiple business needs. When they are centralized, these resources communicate better with each other and build collaborative forecasts that tend to have higher overall accuracy. If they are siloed, the consistency and quality of the data is sometimes sporadic.


Figure 2: One week of solar generation (kW) at five-minute intervals. There is no solar generation at night. During the daytime, solar generation can be very volatile and difficult to predict. The utility industry needs advanced forecasting and optimization techniques to operate the power grid under reliability, economic and environmental constraints.


Siloed forecasting teams may use different data, customized tools and have access to less computing power than if they were centralized.

Just like organizational differences, the business pressures faced by each utility are also unique. Large utilities tend to feel the pain of renewable and distributed energy resources more than smaller utilities. For example, several utilities in California provide power to urban areas that have high penetrations of renewable energy. On the other hand, municipals and co-ops also care about improving their forecasting processes because many of them deployed smart meters even before the larger investor-owned utilities.

Because municipal utilities and cooperatives are city- or member-owned, they have the incentive to understand their customers better so that they can more accurately contract the right amount of power to meet demand. When they do this well, they can pass on the savings directly to their customers. Old Dominion Electric Cooperative (ODEC) credits advanced forecasting capabilities with enabling four rate decreases in just one year [1].

UTILITY FORECASTING KEYS TO SUCCESS

Because forecasts are having an increasingly significant impact on business decisions, it is important to highlight several keys to success. One of the authors (Tao Hong) discussed three skills of the ideal energy forecaster in his blog [2]. First, forecasters need to maintain a close relationship with the business. The forecast provides no value unless people on the business side know how to use it. In addition, forecasters need broad analytical skills to understand basic statistics, and they need technical skills to master the tool set available to them. Finally, but most importantly, forecasters need to be honest and true to their forecasting methodology and not allow themselves to be swayed by internal politics. Forecasting results should be data-driven, not tweaked to meet some personal agenda.

To improve energy forecasting in the future, each utility needs centralized forecasting teams that provide analytical services for most of the business units across the utility. A team consists of people with diverse backgrounds, including electrical engineers, economists, statisticians, meteorologists, social scientists, operations research specialists, information management specialists, software programmers and business liaisons. Ideally, these people with their diverse skill sets all have access to quality data and are not technology constrained, so they can perform complex calculations across many models in a very short time frame. They have rigorous, traceable forecasts and comprehensive documentation. By working closely with the business side, the liaisons within the forecasting team help the business units improve data-driven decision-making. ❙

Tao Hong is the head of Energy Forecasting at SAS, where he oversees research and development, consulting, education, marketing and sales of SAS solutions for energy forecasting. He is the author of the blog Energy Forecasting (http://blog.drhongtao.com). He is also the chair of IEEE Working Group on Energy Forecasting, general chair of the Global Energy Forecasting Competition and an adjunct instructor at the Institute of Advanced Analytics at North Carolina State University. Alyssa Farrell leads global industry marketing for SAS' business within the energy sector, including utilities, oil and gas. She also has responsibility for SAS' Sustainability Solutions and works with customers around the world to understand best practices and solutions for managing their business with environmental responsibility in mind. She participates in the Green Tech Council of the North Carolina Technology Association.


COMBATING CORPORATE FRAUD

Forensic Analytics

Adapting to a growing pandemic.

BY PRITI RAVI

Corporate fraud has manifested itself in diverse forms repeatedly around the world. From identity thefts and insider trading to more sophisticated e-crimes and misrepresentation of financial information, the spectrum of fraudulent activities is huge and hence a tough challenge to overcome. Today, even as corporations ramp up fraud-detection efforts, the incidence of various frauds has been on the rise.

Consider these recent instances:

• Medicare made about $35 million worth of payouts to an organized group of more than 50 people who allegedly stole personal information, including Social Security numbers of 2,900 Medicare patients, and billed Medicare for unperformed medical services using phantom clinics.

• Epsilon, the world's largest provider of permission-based e-mail marketing, announced that millions of individual e-mail addresses were exposed in an attack on its servers, affecting a large number of brands on whose behalf Epsilon sends marketing e-mails to customers.

• CC Avenue, an Indian firm that validates payments made over certain e-commerce websites, faced chargebacks from a number of customers for e-transactions that they apparently did not make when a Web service provider posed as both the seller and the buyer by using credit card information he had pilfered to make purchases from his own website.

While such experiences may have induced a sense of urgency in many organizations to establish a basic framework for fraud management, without a sophisticated and intelligent monitoring and fraud detection system in place, most corporations are still struggling to cope with fraud detection as a discipline.

Corporate fraud is a multi-industry global phenomenon. One in five companies in Western Europe highlighted a significant increase in fraud in 2010. A similar pattern was observed in Latin America and the Middle East and Africa [1]. On the other side of the world, Asia is reported to have the highest number of employees who do not know what to classify as misconduct [2], clearly deterring the expansion plans of Western firms into the emerging markets. Further, the 2010 annual report of the Reserve Bank of India indicates a near doubling in the two-year average cost of fraud between 2007-2008 and 2009-2010 (Figure 1).

FRAUD DETECTION TECHNIQUES NEED TO EVOLVE

With corporate frauds estimated to cost 5 percent of global revenue every year [3], fraud detection has been on the radar of companies over the last few years. However, certain myths regarding fraud detection have also become pervasive, and they are often cited as grounds for not deviating from traditional fraud detection. The first step toward managing frauds is to debunk such myths.

Figure 1: Frauds in the Indian banking sector. Source: RBI Annual Report 2010.


Myth 1: Fraud detection needs investments only in risk-prone disciplines such as financial accounting.

Some of the biggest scams that shook the world – the Enron scandal, the WorldCom bankruptcy and the Barings PLC collapse – are certainly financial in nature. However, the incidence of some lesser-known, non-financial schemes such as supply chain frauds and e-crimes has subjected various firms to huge financial losses, making them equally grave concern areas.

Myth 2: Internal audits and whistle-blower policies are adequate fraud detection techniques.

Approximately 50 percent of firms surveyed by KPMG in 2010 indicate that they rely on internal audits to detect fraud, while about 25 percent indicate that they rely on tips/whistleblowers. However, the use of more sophisticated techniques such as data analytics can help detect frauds faster. For instance, the use of link analysis, a technique that identifies the connections and network of a fraudster, could have helped detect the 50-person Medicare fraud described earlier. Similarly, a Web server survey [4] could have helped brands minimize their losses from the Epsilon data theft, while an anomaly-based machine-learning system could have helped detect the credit card fraud faced by CC Avenue.

Figure 2: Corporate fraud is widespread. Source: Kroll Global Fraud Report 2010.

Myth 3: Fraud prevention through efficient security measures renders fraud detection unnecessary.

Although prevention techniques such as holographs on banknotes, Internet security systems for credit card transactions and subscriber identity module cards for mobile phones, or predictive analytic techniques (such as profiling potential fraudsters), are leading methods used to contain fraud, fraudsters are increasingly adaptive. Analytics-based fraud detection techniques can identify frauds that have passed through the prevention system.

USING FORENSIC ANALYTICS TO DETECT CORPORATE CRIMES

The process of fraud management goes beyond fraud detection, as illustrated in Figure 3.

In a typical fraud management system, once a fraud is detected, suitable alarms are raised that are then scrutinized to confirm the incidence of a fraud before any further action for resolution is warranted.

Forensic analytics, on the other hand, encapsulates a diverse set of techniques used to identify data-based anomalies and to use such outlier trends to detect/predict the occurrence of frauds. Although a subset of the analytics discipline, forensic analytics differs from general analytics in the following ways:

• Forensic analytics is extremely data heavy – it needs to learn from every fraudulent and regular (non-fraudulent) activity and hence cannot use a sample of data as general analytics does.

Figure 3: Process of fraud management.


• Forensic analytics requires human intervention – the cost of a misclassified fraud and the investigation therein is extremely high in most industries, and hence an alarm raised by forensic analytics is usually subjected to further human scrutiny and resolution.

FORENSIC ANALYTIC TECHNIQUES

Although statistical techniques for forensic analytics are varied, they have a common theme: comparing the observed values with expected values. The expected values, in turn, could be derived using multiple techniques – starting from simple numerical summaries (graphical summaries) to more sophisticated behavior profiling or anomaly-based modeling techniques to obtain suspicion scores.

Statistical tools for fraud detection can be either supervised or unsupervised. Supervised methods use both fraudulent and non-fraudulent data records to construct models, while unsupervised methods identify outlier (potentially fraudulent) records that can then be analyzed more closely.
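A minimal unsupervised sketch of the observed-versus-expected idea (the transactions and threshold are illustrative; production systems profile behavior per account, merchant and time window, and every alarm still goes to a human investigator):

```python
import statistics

def suspicion_scores(amounts):
    # Unsupervised observed-vs-expected scoring: distance of each value
    # from the mean, in standard-deviation units (a plain z-score).
    mu = statistics.mean(amounts)
    sigma = statistics.stdev(amounts)
    return [abs(x - mu) / sigma for x in amounts]

# Illustrative transaction amounts; the last one is a planted outlier.
txns = [120, 95, 130, 110, 105, 98, 125, 4000]
scores = suspicion_scores(txns)

# Raise alarms above a threshold; a human investigator takes it from there.
THRESHOLD = 2.0
alarms = [t for t, s in zip(txns, scores) if s > THRESHOLD]
print(alarms)  # [4000]
```

A supervised system would instead fit a classifier on labeled fraudulent and non-fraudulent records; the unsupervised route shown here needs no labels, which is why it is the usual starting point when confirmed fraud cases are scarce.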

CONCLUSION

With an increase in the number of fraudulent activities in the recent past, a robust fraud management system is increasingly being seen as a must-have across the globe and across industries. Forensic analytics offers a collective set of techniques to make data-driven decisions to combat fraud. Ranging from simple rule-based techniques to complex self-learning and predictive algorithms such as neural networks, forensic analytics can be used for both prevention and detection of various types of frauds. It is a complex and adaptive approach, which could well become the norm in fraud management in the coming decade. ❙

Priti Ravi is a senior manager with Mu Sigma, specializing in providing analytics-driven advisory services to some of the largest retail, pharmaceutical and technology clients spread across the United States. She has more than eight years of experience in the corporate sector. Ravi completed the Post Graduate Programme in Management from the Indian School of Business, specializing in marketing and finance.


REFERENCES

1. Ernst & Young, 11th Global Fraud Survey, 2009-2010.

2. CEB's Compliance and Ethics Leadership Council, 2009.

3. ACFE 2010 Global Fraud Survey.

4. A Web server survey is a service provided by vendors with access to host names, domain names and first-page content of websites; it can check for the occurrence of a brand's trademarks or commonly used phrases.

5. RBI Annual Report, 2010.

6. Kroll Global Fraud Report, 2010.

7. Deloitte Airline Fraud Report, 2010.


Fundamental principles of analytic communication

BY EVAN S. LEVINE

SOFT SKILLS


Analytic ideas and findings are often surprising, subtle and technically complex. These qualities can make them challenging to communicate, regardless of the audience. On the other hand, analysts have a great deal of freedom over the manner in which they communicate ideas and findings – some overarching, general principles can help analysts make decisions in this regard.

These sorts of principles are useful because communication advice for analysts is fragmented, primarily by medium. We use data visualization books to help us build plots, slide construction guides to help us build presentations, and Web manuals to help us build websites. The advice specific to these media is very useful, but establishing overarching principles helps analysts make decisions regarding how to organize communication materials by keeping a small set of objectives in mind.

Four principles apply to all analytic communication, regardless of audience or medium. These principles are: clarity, transparency, integrity and humility. (A similar list of principles for excellence in analytic communication appears in Markel, 2012 [1].)

Whenever you are faced with a de-sign decision for a communication prod-uct, return to these principles and they will guide you to a good solution.

An alternative frame for these principles is to think of them as fundamental objectives (as the term is used in decision analysis) for the analytic communication process. Some alternatives will solely impact one of the objectives; for example, sometimes an analyst can improve the clarity of a plot by changing the color that the lines are drawn with. On the other hand, sometimes alternatives will involve tradeoffs between the objectives; those decisions are generally more difficult, and which alternative is preferred can depend on the audience or the medium.

Let’s discuss each of the principles in more depth.

CLARITY

Clarity is the expression of an analytic concept in a manner that is simple, direct, efficient and effective. Extraneous lines, words, colors and other markings are minimized so the key idea can be placed front and center. At the same time, the concept must be expressed in a manner that is understandable and not oversimplified; minimization should not be taken so far that the finding disappears or key data features are lost.

Consider how experts on analytic communication in various media make recommendations that maximize clarity:

• In data visualization, clarity is exemplified by Tufte's first two principles in the theory of data graphics (Tufte, 2001 [2]) – "above all else show the data" and "maximize the data-ink ratio." In other words, when making a data visualization, don't add tick marks, gridlines or decorative elements unless they actually convey information. At the same time, don't eliminate any markings that impart information about the data.

• Iliinsky and Steele (2011 [3]), in their book on data visualization, are expressing the desire for clarity when they recommend “function first, suave second.” (Personally, I would put suave even lower on the list.)

• In his guide to slide presentations, Reynolds describes the Zen concept of simplicity as one of his guiding principles (Reynolds 2008 [4]). Reynolds’ simplicity is similar to what I’ve called clarity, as evidenced by his advice that “simplicity can be obtained through the careful reduction of the nonessential,” as long as it also “gets to the essence of an issue.”

• In their classic book on style in writing, Strunk and White (1959 [5]) stress the importance of clarity by emphasizing the repercussions when an author fails to achieve it: “Muddiness is not merely a destroyer of prose, it is also a destroyer of life, of hope: death on the highway caused by a badly worded road sign, heartbreak among lovers caused by a misplaced phrase in a well-intentioned letter, anguish of a traveler expecting to be met at a railroad station and not being met because of a slipshod telegram. Think of the tragedies that are rooted in ambiguity, and be clear!”



• One of the rules of journalism is not to “bury the lead (intro).” The writer should put the most important finding at the front of the story, stated directly. If the important finding is placed deeper in the story, the audience is more likely to miss it.

In summary, when communicating analytic ideas and findings, clarity means that you should maximize efficiency, whether measured through words, lines or colors, while still conveying your thoughts forcefully, definitively and, most importantly, understandably. (Scientists can think about clarity as maximizing the signal-to-noise ratio of communication.)

TRANSPARENCY

Transparent analytic communication explains to the audience the method by which the findings were derived, accessibly and to an appropriate depth. In addition to presenting the methodology, part of transparency is ensuring that the audience understands the assumptions that underlie the analysis. This is important because if the assumptions are violated (whether through further study, natural change or some other means), or even if the audience just doesn't accept the assumptions, there will be implications for the findings. Transparency is most appropriately applied to the entirety of an analytic communication package, as opposed, for example, to a single plot inside of a technical document.

Some examples of transparency in action include:

• A journal article that describes the methodology behind an interesting result. In many fields, the author of an article is required to publish his or her methods to the level of detail required for another researcher to replicate every step.

• A financial analyst making a presentation who discloses his data sources and the techniques by which he processed them.

• An analyst who publishes on the Web the raw documents and code used in his new text mining method.

• A scientist speaking to the general public who, given the limited time allotted for her presentation, refers the audience to a freely available white paper for those who would like more technical detail.

Why is it important for analytic communication to be transparent? Shouldn't an analyst only care that the findings are communicated correctly?

First of all, one of the benefits of doing analysis is that there is a logical line of reasoning that leads to a finding; analysts don't need to rely on the assertion of findings without support. This gives analytics a competitive advantage versus other types of arguments, such as "gut-based" reasoning or subject matter expertise, and that advantage should only be squandered for very good reason. In other words, transparency builds the audience's confidence in the findings.

Secondly, part of our responsibility as analysts is to expose our line of reasoning to questions and comments. Sometimes this feedback reveals errors, oversights or forgotten assumptions. These corrections should be welcomed, because in the long run they result in better analysis. In most cases, however, an analyst will have a ready answer to a question or a comment because he or she has spent more time thinking about the data and findings than the audience. Answering questions directly also increases the audience's confidence in the findings.

Finally, transparency helps to spur on other analysis. This occurs because revealing the methodology behind a finding can give a member of the audience an idea to solve a problem they've encountered or, even if transparency doesn't spark an immediate idea, it can add another tool to the analytic toolbox of the audience members. The audience can also more easily recognize other opportunities to apply analytics, sometimes bringing business and collaborative opportunities to the analyst.

The transparency communication objective demonstrates a benefit of keeping the line of reasoning as simple as possible – simple methodologies are easier to explain, and if you can't explain the methodology in a way that the audience will understand, few people will believe the findings.

INTEGRITY

Analytic communication with integrity conveys defensible and sound conclusions driven by the data, while properly representing the limitations and uncertainty in the findings.

As analysts, our first responsibility is to ensure that we are communicating the data and findings accurately and honestly. However, it can be tempting to exaggerate the implications of the data, because, in all likelihood, no one will look at your data as thoroughly as you will. Analysts engage in exaggeration for many different reasons, whether to further their career, to please the audience or simply to make a stronger argument than the data support. It is important to understand that this temptation is common; as analysts, we spend inordinate amounts of time and energy focused on our work, and to be tempted by a larger reward is entirely natural.

However, it is counterproductive to engage in this kind of overreach, whatever the reasoning behind it. In the long run this kind of behavior will have a negative effect on your career, particularly in the opinion of other analysts. Additionally, as analysts, we study real phenomena, and our techniques are designed to reveal real insight regarding these phenomena. Even if your colleagues can't tell that you've gone too far, eventually the phenomena you are studying will show the truth.

In addition to communicating the data and findings accurately, the presentation of limitations and uncertainty in analytic communication is integral to integrity. This information allows the audience to use the findings responsibly – if limitations and uncertainty are not presented or are minimized, the audience is likely to apply the findings in regimes beyond which they are valid. It also facilitates comparisons between different analysts' findings.

Integrity is connected to the concept of "epistemological modesty," a complicated-sounding phrase that describes a simple idea. Roughly, analysts who demonstrate epistemological modesty do not overstate the findings and certainty of their work because they recognize that the real world is quite complex and often difficult to understand and model. Analysis can break down in very surprising ways even if you've carefully accounted for the known sources of uncertainty. Keep this in mind when communicating findings.

A good example of the concept of integrity in action can be found in data visualization. When making plots, it is easy to exaggerate the trend you are trying to demonstrate by adjusting the axes in improper ways or by showing only selected subsets of the data. (This behavior is common in situations where the analysis was carried out to support a predetermined position – it's often seen in politics.) Tufte (2001 [2]) expresses integrity by arguing that "graphical excellence begins with telling the truth about the data." The analyst should present the data in such a way that the audience leaves with an accurate and complete impression.

HUMILITY

By humility in analytic communication, I mean that we should strive to remove the analyst from the message. In writing, Strunk and White (1959 [5]) recommend that authors "Place yourself in the background. Write in a way that draws the reader's attention to the sense and substance of the writing, rather than to the mood and temper of the author." In analytic communication, too often the audience takes away the idea that the analyst is some kind of super-genius, that analytical work is inaccessible, or that they could never carry out their own analyses. These perceptions are detrimental to the future of our profession; analytics is a young field, and in order to grow we need to attract people and business by making ourselves and our work as accessible as possible. Furthermore, the data and the conclusions drawn from it should speak for themselves – if you find yourself needing to rely on your authority as an analyst, that's a sign that you may be overreaching.

We can communicate with humility by not encouraging a "cult of personality" around the analyst. For example, you can talk about mistakes that you made in the initial pass through the analysis or ways you feel the findings are difficult for you to understand. Discussing these sorts of things won't hurt the audience's opinion of you; in fact, it will actually improve it, because they will find you more relatable. Furthermore, they'll also think you are smart, in the way that we think great teachers are smart – good analytic communication requires a great deal of intelligence!

CONCLUSION

These basic principles can help guide our decision-making when it comes to communicating analytics. However, I don't mean to imply that there is one right answer to communication decisions. Even with the constraints imposed by the principles there is still plenty of room for individual style, unique voices and elegant solutions. ❙

Evan S. Levine ([email protected]) is the lead for analytics at the New York City Police Department's Counterterrorism Bureau. Previously, he served as chief scientist at the Department of Homeland Security's Office of Risk Management and Analysis. Levine is the author of the textbook "Applying Analytics: A Practical Introduction," forthcoming from Taylor & Francis. This article is an excerpt from the book and is reprinted with permission. The views expressed in the article are those of the author and do not necessarily represent those of his employer. Levine is a member of INFORMS.

REFERENCES

1. Markel, Mike, 2012, "Technical Communication," Bedford/St. Martin's, Boston, Mass.

2. Tufte, Edward R., 2001, "The Visual Display of Quantitative Information," Graphics Press, Cheshire, Conn.

3. Iliinsky, Noah, and Steele, Julie, 2011, "Designing Data Visualizations," O'Reilly Media, Sebastopol, Calif.

4. Reynolds, Garr, 2008, "Presentation Zen: Simple Ideas on Presentation Design and Delivery," New Riders, Berkeley, Calif.

5. Strunk, William, and White, E.B., 1959, "The Elements of Style," Macmillan Company, New York, N.Y.


LEARNING RESOURCES

SCIENCE OF BETTER: PODCASTS

Gain insights from experts on how math, analytics and operations research affect organizations like yours in these 20-30 minute podcasts conducted by INFORMS Director of Communications Barry List. Visit www.scienceofbetter.org/podcast.

Brian Keller, Booz Allen Hamilton – "Hadoop Anyone?" Recorded May 24, 2013.

Don Kleinmuntz, Strata Decision Technology – "Healthcare Analytics: Hospitals and Obamacare." Recorded May 2, 2013.

Carrie Beam, Carrie Beam Consulting – "Soft Skills for Lone Wolves." Recorded April 19, 2013.

Atanu Basu, Ayata – "5 Pillars of Prescriptive Analytics." Recorded April 5, 2013.

Stephen Budiansky, author – "Blackett: WWII's Indispensable Man." Recorded March 21, 2013.

Matthew Liberatore and Wenhong Luo, Villanova University – "Bright Contrast in Roles of OR and Analytics." Recorded March 12, 2013.

Brett R. Gordon, Columbia University, and Wesley R. Hartmann – "Advertising & Presidential Campaigns." Recorded February 22, 2013.

Michael Gualtieri, Forrester Research – "Forrester on Predictive Analytics Solutions." Recorded February 7, 2013.

Maksim Tsvetovat, Deepmile Networks & George Mason University – "Sentiment Analytics: What Does the Blogosphere Think?" Recorded January 25, 2013.

INFORMS' Library of Audio and Video Presentations


Ralph Keeney, Duke University – "Brainstorming with Ralph Keeney." Recorded January 10, 2013.

Gary Lilien, Penn State University – "Marketing Analytics: A Must for Retailers and Manufacturers." Recorded November 28, 2012.

Michael Schroeck, IBM – "Big Data: Extracting the Value." Recorded November 11, 2012.

Arnold Barnett, MIT's Sloan School – "Terror Goes to Ground." Recorded October 24, 2012.

Allan Lichtman, American University, and Sheldon Jacobson, University of Illinois – "Forecasting the U.S. Presidential Election." Recorded September 21, 2012.

Michael Fry, University of Cincinnati, and Jeffrey Ohlmann, University of Iowa – "More than Moneyball." Recorded July 6, 2012.

Chrysanthos Dellarocas, Boston University – "The Pay per Click Paradox." Recorded June 8, 2012.

Gary Cokins, SAS consultant – "Mystery of Dying Industry Giants." Recorded May 24, 2012.

Wally Hopp and Roman Kapuscinski – "Does American Manufacturing Have a Future?" Recorded May 11, 2012.

U.S. Army Major Rob Dees – "Measure of a Soldier." Recorded April 27, 2012.

Theresa Kushner, senior director of customer intelligence at Cisco – "Marketing Analytics at Cisco." Recorded March 30, 2012.

Sheldon Jacobson, University of Illinois Urbana-Champaign – "March Madness O.R. Style." Recorded March 16, 2012.

Renee Adams, University of New South Wales, and Patricia Funk, Universitat Pompeu Fabra and Barcelona Graduate School of Economics – "Beyond the Glass Ceiling." Recorded March 2, 2012.

Plus archival podcasts back to 2009.


INFORMS VIDEO LEARNING CENTER

View these free, on-demand presentations, complete with slides, from INFORMS' renowned meetings and conferences. Visit https://www.informs.org/Apply-Operations-Research-and-Analytics/INFORMS-Video-Learning-Center

INFORMS ANALYTICS CONFERENCE 2013

2013 Edelman Award Presentations

Dutch Delta Commissioners: Economically Efficient Standards to Protect the Netherlands against Flooding

Operations Research Transforms Baosteel’s Operations

Optimizing Chevron’s Refineries

Dell’s Channel Transformation –Leveraging Operations Research to Unleash Potential across the Value Chain

Kroger Uses Simulation –Optimization to Improve Pharmacy Inventory Management

McKesson: A Holistic Supply Chain Management Solution

Keynote Presentations

Betting the Company: The Role of Analytics by Jerry Allyne, Boeing Commercial Airplanes

Achieving Social Success: How Data and Analytics Guide the Social Business Journey by Sandy Carter, IBM

Analytics Process Presentations

Effective Use of Business Analytics by Kathy Lange, SAS

Hospital Business Analytics in an Era of Healthcare Reform by Don Kleinmuntz, Strata Decision Technology

Stop Sacred Cows before they Stop Analytics! by Jake Breeden, Breeden Ideas

Optimizing as if People Matter by Steve Sashihara, Princeton Consultants

Integrated Analytics in Transportation and Logistics by Ted Gifford, Schneider National


INFORMS ANALYTICS CONFERENCE 2012

2012 Edelman Award Presentations

Chris Goossens, TNT; Hein Fleuren, Tilburg University; Davide Chiotti, TNT; Marco Hendriks, TNT – "Supply Chain-Wide Optimization at TNT Express"

Frederic Deschamps, Carlson; Pelin Pekgun, JDA Software; Suresh Acharya, JDA Software; Kathleen Mallery, Carlson – "Carlson Rezidor Hotel Group Maximizes Revenue through Improved Demand Management and Price Optimization"

Greg Burel, CDC; Eva K. Lee, Georgia Institute of Technology – "Centers for Disease Control and Prevention: Advancing Public Health and Medical Preparedness with Operations Research"

Sofia Archontaki, Takis Varelas, Iraklis Lazakis and Evangelos Chatzis, Danaos – "Operations Research in Ship Management: Maximizing Fleet-Wide Revenue Routing at Danaos"

Suresh Subramanian, Prasanna Dhore, Girish Srinivasan and David Hill, HP – "Hewlett-Packard: Transformation of HP's Business Model through Advanced Analytics and Operations Research"

Karl Kempf, Intel; Feryal Erhun, Stanford University; Robert Bruck, Intel – "Optimizing Capital Investment Decisions at Intel Corporation"

Panel discussion: Diego Klabjan, Northwestern University; Thomas Olavson, Google; Blake Johnson, Stanford University; Daniel Graham, Teradata; Michael Zeller, CEO, Zementis, Inc. – "Innovation and Big Data"

Plus archival videos of other great talks:

INFORMS Annual Meeting 2012

INFORMS Annual Meeting 2011

INFORMS Analytics Conference 2011

INFORMS Annual Meeting 2010

INFORMS Practice Conference 2010

INFORMS Annual Meeting 2009


FIVE-MINUTE ANALYST

Carnival Game

BY HARRISON SCHRAMM

I was captivated by this recent headline: "Man loses life savings, wins giant banana" [1]. After disbelief subsided, I wondered – as is my habit – what five minutes of analysis might tell us.

In the news story, a man is determined to win a video game console by playing a carnival game where balls are tossed into tubs. He did not win; in fact, he lost around $2,600. There are (at least) three subproblems to consider:

1. How much to wager: the player's point of view. Given that the game console was worth approximately $100 and was not unique to the carnival, we quickly conclude that if the object was to obtain a game console, the best course of action may be not to go to the carnival at all. In order to assess whether or not to play, we would have to know our probability of success in a carnival game. While this is something we do not know, we may make a few basic assumptions.

Suppose that the wholesale value of the prize is $100. If the game costs $5 to play, we can be almost certain that an individual "on the street" has less than a 5 percent chance of winning on a single try – otherwise the "house" would never make a profit. In order to make a reasonable profit, the real odds of the game are probably lower, in the neighborhood of 1 percent. We assure ourselves, saying that we are better at tubs than the average person. If this were really the case, we should play the game some predetermined number of times, with a predetermined maximum loss (10 plays at a cost of $50 seems reasonable), and simply walk away when we reach our predetermined threshold.
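These assumptions make the expected-value arithmetic trivial to check; the numbers below ($5 ticket, $100 prize, roughly 1 percent odds) are the ones assumed above, not facts about any real carnival.

```python
# Expected value of a single play under the assumptions above:
# a $5 ticket, a prize worth about $100 wholesale, and a true win
# probability in the neighborhood of 1 percent.
cost_per_play = 5.00
prize_value = 100.00
p_win = 0.01

ev_per_play = p_win * prize_value - cost_per_play
print(ev_per_play)  # -4.0: each play costs the player $4 on average

# The predetermined budget of 10 plays therefore loses $40 in
# expectation, against a prize available in stores for about $100.
print(10 * -ev_per_play)  # 40.0
```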

Human beings are notoriously bad at walking away. The reason is that we (mis-)count the money spent as "invested" instead of properly "lost."

The more we play and are unsuccessful, the more evidence we gather that we are bad at the game. For example, if we have tossed the ball twice and missed both times, I would have a point estimate of 0 percent probability of success, but could construct a 95 percent confidence interval of my true probability of success as the interval 0-75 percent. If we have tossed the ball 20 times and missed, the upper bound of that interval has shrunk to about 12 percent.

2. How much to allow wagered: the carnival's point of view. The house could simply allow a player to lose as much money as he wishes. The longer the game goes on, the more money the carnival makes. There are two reasons why this "greedy" approach – both mathematically and in the usual sense – may not be optimal:

a. If the player becomes very upset, he could bring unwanted things outside of the wager into play. In this case it was law enforcement, but we might imagine a rough cousin.

b. If we take too much of a player's money at once, he or she will never come back. If we space out their losses over time, and allow them to win every so often, then they may lose more in the long run.

Figure 1: A giant Rasta banana.

Figure 2: Exceptionally optimistic (95 percent upper bound) estimate of success as a function of the number of unsuccessful trials. The longer we play without success, the more certain we should be that we aren't very good at carnival games! This chart was computed by solving 1 − (1 − p)^N = B for p, i.e., p = 1 − e^(ln(1 − B)/N) = 1 − (1 − B)^(1/N).
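The caption's formula can be evaluated directly; this sketch (the function name is mine) reproduces the curve's behavior at the two points discussed in the text.

```python
# 95 percent upper bound on the per-toss success probability after
# n consecutive misses, solving 1 - (1 - p)**n = B for p.
def p_upper(n_misses, confidence=0.95):
    return 1 - (1 - confidence) ** (1 / n_misses)

print(round(p_upper(2), 3))   # two misses: the bound is still near 0.78
print(round(p_upper(20), 3))  # twenty misses: the bound collapses to the low teens
```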


Both points reflect that a carnival is different than a casino. For one thing, an individual losing $2,600 at a casino is not newsworthy! People find the idea of large losses in a gambling house more palatable than at a carnival. For a carnival, it would seem that the maximum amount to take from a person would be some small multiple of the maximum prize, in this case $200 to $300 (and throw in the prize to boot).

3. How to lose all of your money at the carnival (or anywhere else). Reason that the probability of winning the prize is non-zero (it is); then use the properties of the geometric random variable to conclude that, given enough tries, you will certainly win the prize. Armed with this (mis-)information, increase the stakes "double or nothing" until your eventual win.

The reason this does not work is that the player has not considered how much he will lose before his eventual win. This result is true in general [2]. Consider that our contestant lost $300, then lost the rest in a few rounds of "double or nothing" – in this case, three.

In order for "double or nothing" to be a viable gambling strategy, the cumulative distribution of winning has to grow faster than you are hemorrhaging money, and your losses are exponential in that they grow like 2^N. At the carnival, for "double or nothing" to work, the player needs to be relatively confident that he will be successful on at least one toss before losing it all. In order to have a 50 percent chance of being successful at least once in three rolls, the player would need to be approximately 20 percent certain that he would be successful on any one roll. Not comforting, given that his point estimate of winning when he began playing "double or nothing" was zero.
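Both claims are easy to verify numerically. This sketch uses a hypothetical $5 opening stake (the contestant's actual stakes aren't reported round by round); the per-toss probability needed for a coin-flip's chance of one success in three tries comes out near 20 percent, as stated.

```python
# Probability of at least one success in n independent tries.
def p_at_least_one(p, n):
    return 1 - (1 - p) ** n

# For a 50 percent chance of winning at least once in three tries,
# the per-toss probability must be 1 - 0.5**(1/3), about 20.6 percent.
p_needed = 1 - 0.5 ** (1 / 3)
print(round(p_needed, 3))  # 0.206

# Meanwhile the money at risk doubles each round: with a hypothetical
# $5 opening stake, three rounds of "double or nothing" put $35 in play.
stakes = [5 * 2 ** k for k in range(3)]  # [5, 10, 20]
print(sum(stakes))  # 35
```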

“Double or Nothing” is a very good way to lose all of your money very fast.

Bonus footnote: When I was young, there was a pizza establishment in my hometown that had a video game where for 50 cents, you could win prizes, the “grand prize” being lunch. I had a high school classmate who through several weeks of effort learned the game well enough that he could win lunch every day. He got lunch for 50 cents each day for about a week before the proprietors grew tired and removed the game! ❙

Harrison Schramm ([email protected]) is an operations research professional in the Washington, D.C., area, and he is a member of INFORMS.


REFERENCES

1. http://newsfeed.time.com/2013/05/01/man-loses-life-savings-playing-carnival-game-wins-giant-banana/

2. A very nice explanation involving Casanova can be found on p. 333 of "Probability and Random Processes" by Grimmett and Stirzaker, Oxford University Press, 2001.


John Toczek is the director of Decision Support and Analytics for ARAMARK Corporation in the Global Risk Management group. He earned his bachelor of science degree in chemical engineering from Drexel University (1996) and his master's degree in operations research from Virginia Commonwealth University (2005). He is a member of INFORMS.

Self-driving cars are cars that can drive themselves without a human behind the wheel. This technology should be available in the not-too-distant future. New algorithms will need to be developed to route these cars so that they get their passengers to their desired destinations efficiently.

Figure 1 shows 10 people in need of transportation. Their current location (the pick-up point) is indicated by the person icon and their desired destination (the drop-off location) is indicated by the building icon. The purple arrow indicates the path from the pickup location to the drop-off location. Your job is to order the passengers so that they are picked up in an order that minimizes the total distance travelled by the self-driving car.

The car can start at any pickup point. You may only carry one person at a time. The car does not need to return to its starting point after the last person is dropped off. There is only one self-driving car available for use. Use the Pythagorean theorem to calculate the distance between cells. For example, the distance between the person nearest the lower left corner and her drop-off point is 3.162 km.

QUESTION: What is the minimum distance the car must travel in order to transport all of the passengers from their pick-up points to their drop-off points?

Send your answer to [email protected] by Sept. 15. The winner, chosen randomly from correct answers, will receive a Magic 8 Ball. Past questions can be found at puzzlor.com. ❙

Self-driving car

BY JOHN TOCZEK

THINKING ANALYTICALLY

Figure 1: How far must the car travel to get everyone where they want to be?
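Since the actual pick-up and drop-off coordinates live only in Figure 1, here is a brute-force sketch of the puzzle's structure using made-up grid points for three passengers. (The real instance's 10 passengers give 10! = 3,628,800 pickup orderings, still small enough to enumerate.)

```python
from itertools import permutations
from math import hypot

# Hypothetical (pickup, dropoff) grid coordinates for three passengers;
# the real puzzle's points are given only in the magazine's figure.
passengers = [((0, 0), (1, 3)), ((4, 1), (2, 2)), ((5, 5), (3, 4))]

def dist(a, b):
    # Pythagorean distance between cells, as the puzzle specifies.
    return hypot(a[0] - b[0], a[1] - b[1])

def route_length(order):
    total, pos = 0.0, None
    for pickup, dropoff in order:
        if pos is not None:
            total += dist(pos, pickup)   # drive empty to the next pickup
        total += dist(pickup, dropoff)   # carry one passenger at a time
        pos = dropoff
    return total

# The car may start at any pickup and needs no return leg, so the
# route length depends only on the order of service.
best = min(permutations(passengers), key=route_length)
print(round(route_length(best), 3))
```

As a sanity check on the distance function, `dist((0, 0), (1, 3))` evaluates to 3.162, the same figure the puzzle quotes for the person nearest the lower left corner.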