rediscoveringBI
After the Big Data Party
April 2013, Issue 7
A Radiant Advisors Publication
The Big Data Honeymoon: Over Already?
BI and Big Data: Bringing Them Together
Big Data vs. Data Management: A Zero-Sum Scenario
BI's Big Question: Has the Bubble Burst?
Editor in Chief: Lindy Ryan, lindy.ryan@radiantadvisors.com
Contributor: Dr. Barry Devlin, barry@9sight.com
Contributor: Krish Krishnan, rkrish1124@yahoo.com
Contributor: John O’Brien, john.obrien@radiantadvisors.com
Distinguished Writer: Stephen Swoyer, stephen.swoyer@gmail.com
Art Director: Brendan Ferguson, brendan.ferguson@radiantadvisors.com
For More Information: info@radiantadvisors.com
From the Editor
“Big data” is being routinely paired with proportionally “big” descriptors:
innovative, revolutionary, and even (in this editor’s humble opinion,
the mother-of-all-big-descriptors) transformative. With the inexorable
momentum of cyber-journalism keeping it affixed atop industry head-
lines, big data has indeed earned itself quite the reputation, complete
with a stalwart following of industry pundits, papers, and conferences.
In fact, the whole big-data-thing has mutated into a sort-of larger-
than-life caricature of promise and possibility – and, of course, power.
Let’s face it: big data is the Incredible Hulk of BI -- gargantuan, brilliant,
and, yes, sometimes even a bit hyper-aggressive. For many a mild-mannered,
Bruce Banner-esque data analyst out there, big data is, for
better or worse, the remarkably-regenerative, impulsive alter ego of the
industry, eager to show with brute force just how much we can really do
with all our data – what the tangible extent of all that big data power
is, so to speak.
Yet, as with the debut of Stan Lee’s destructive antihero in
The Incredible Hulk #1, in early 2013 we still haven’t really begun to see
what big data can do. Not even close.
In this month’s edition of RediscoveringBI, authors Dr. Barry Devlin, Krish
Krishnan, Stephen Swoyer, and John O’Brien each explore different fac-
ets of that very construct, asking has the big data bubble actually burst,
or is its honeymoon phase just over? How much is just hype? And, how
much is only a precursor to what we’ll continue to see buzzing around
and reinventing the industry?
What’s next for BI’s Incredible Hulk antihero, Big Data?
Lindy Ryan
Editor in Chief
Radiant Advisors
rediscoveringBI April 2013, issue 7
SPOTLIGHT
The Honeymoon is Over for Big Data
Big data, it turns out, means precisely nothing and imprecisely anything you want it to mean.
[By Dr. Barry Devlin]

Bringing BI and Big Data Together
Three things that make a “big” difference when implementing big data.
[By John O’Brien]

Twilight of the (DM) Idols
Big data is already a disruptive force: at once democratizing, reconfiguring, and destructive.
[By Stephen Swoyer]

Has the Big Data Bubble Burst?
The BI industry is abuzz with one new question: is big data done?
[By Krish Krishnan]

EDITOR’S PICK
Big Data Revolution
Are we becoming no more than sentient founts of data? Mayer-Schönberger and Cukier put the pulse back in the big data conversation.
[By Lindy Ryan]

Big Impact: Big Data
Two use cases for big data are having a big impact, at least from a data management perspective.
A Kludge Too Far?
[By Stephen Swoyer]
OPINION: Letters to the Editor
On: Time for An Architectural Reckoning

Evolution vs. Revolution
This is an excellent, thought-provoking article. I believe
that you are correct in the assertion that an Architectural
Reckoning is underway. In fact, I believe it has been underway
for at least 10 years.
To focus on technology in general and Hadoop in particular is,
however, to miss the point. The Reckoning is being driven by
the intersection of business needs and technology advances.
Both sides can be summed up as “faster and smarter” – and
they are mutually reinforcing. I call this the “biz-tech ecosys-
tem”. On the technology side, Hadoop is and will be part of
it. So will a range of other data management technologies,
including relational databases, for sure. And I believe that
the various approaches – column, in-memory and more – will
be combined into a hybrid approach more powerful than
any RDBMS we have today. And we will need that, because
I am certain that the Data Warehouse – in a new, more cir-
cumscribed role, but central to consistency and reliability
for information that must be of high quality – will continue
to thrive. (And many thanks for the historical positioning of
Paul’s and my paper from 1988!) I call this new role “Core
Business Information.”
As you also pointed out, it’s not just about data management.
What is happening is also changing application development,
as well as process and business modeling and implementa-
tion. Collaborative and social computing are also vital com-
ponents of the mix. So, yes, an inter-disciplinary approach will
be needed – not just within IT but across the business-IT
divide.
We are also in somewhat of a positive feedback loop – and as
anyone who has ever put a microphone in front of speakers
knows, the result rapidly becomes very unpleasant. So, we do
need to step back from the hype of big data and recognize the
dangers as well as the opportunities.
My bottom line: yes, we are in a time of Architectural
Reckoning (is this the same as a Paradigm Shift?) but continu-
ity of thinking and a mindset of evolution rather than revolu-
tion are vital. I’m trying to capture this in my long-awaited (by
me, anyway) second book.
- Dr. Barry Devlin (Editor's Note: See The Honeymoon is Over for
Big Data by Dr. Barry Devlin in this month's issue)
Augmentation of Traditional DWs
I totally agree with Claudia that “not all analytics now belong
inside the BI architecture” and that we are in a “very disrup-
tive period of a lot of new technologies flooding in to busi-
ness intelligence.” I also am not actually that far away from
the position Scott Davis takes: I agree that “Hadoop is a huge-
ly transformational technology.” I just think for the short- to
medium-term Hadoop et al are going to augment, rather than
replace, traditional data warehouses. Will Hadoop replace a
traditional data warehouse database in the long term? Only if
it adds a lot of database-like features, and then the argument
becomes a lot less interesting – something akin to the “Will
Ingres/Informix/Sybase replace Oracle?” debate of yesteryear.
My main concern is how customers are going to embrace this
new data landscape, rather than if they are going to. How are
organizations going to build a data landscape that includes
Teradata, Aster, and Hadoop? How are they going to manage
Analysis Services cubes and a smattering of legacy Oracle
data warehouses?
Data warehouses currently take too long to build and are too
hard to change. The new architectural changes are going to
make things worse, not better.
Yes, WhereScape does have a stake in the game – although
not in the status quo. Regardless of the platform, design, and
technology, the need to deliver quickly without compromise
remains the same. Who wants to manually build out a mul-
tiple platform data warehouse? A data warehouse automation
environment (such as WhereScape RED) helps simplify the
approach, and I believe is a key piece of the new architecture.
- Michael Whitehead (Editor's Note: Michael Whitehead is the
CEO and Founder of WhereScape)
Agile and flexible -- those might well be the mantras of Modern Data Platforms. As
organizations look to harness the latest advances in analytics and integration technolo-
gies, the focus turns quite sharply to architecture: the right data platform can empower
companies to harness everything from Big Data to real-time, all without sacrificing data
quality and governance.
Register for this free Webcast to catch a preview of SPARK!: Modern Data Platforms, a
three-day seminar series to be held in Austin, TX, from April 29 - May 1. The seminar
will feature a tag-team of experts from Radiant Advisors and The Bloor Group, who will
provide detailed instruction on the range of activities associated with modernizing and
evolving robust data platforms. John O'Brien of Radiant will focus on Rediscovering BI,
while Dr. Robin Bloor of The Bloor Group will discuss the Event-Driven Architecture.
Attendees of the Webcast will receive a discount code for $150 off the in-person seminar.
[Cover image: rediscoveringBI, Issue 6, March 2013: “Shifting Gears with Modern BI Architectures.” A Radiant Advisors Publication.]
http://www.bigdatabootcamp.net
Don't miss Radiant Advisors' John O'Brien as he keynotes the upcoming Big Data Boot Camp.
May 21-22, New York Hilton
John will offer perspective into the dynamics and current issues being encountered in today's Big Data analytic implementations as well as the most important and strategic technologies currently emerging to meet the needs of the "Big Data Paradigm."
Join John and other Big Data experts as they converge upon New York and be sure to save an extra $100 off the early bird rate by using this link. Early bird registration ends April 19.
Have something to say? Send your letters to the editor at lindy.ryan@radiantadvisors.com
Upcoming Webinar: Inside Analysis
Modern Data Platforms
Inside Analysis with Dr. Robin Bloor and John O'Brien, hosted by Eric Kavanagh
April 17, 3:00 PM CST
Spotlight
The Honeymoon is Over for Big Data
Dr. Barry Devlin

BIG DATA IS tumbling into the
“Trough of Disillusionment,”
according to Gartner’s Svetlana
Sicular. If you fear that this
means the end of the road for big data,
see Mark Beyer’s (co-lead for Gartner
big data research) remedial education
on the meaning of Gartner’s Hype Cycle
curve, although they might have chosen
a less alarmist phrase!
Let me put it another way: the big data
honeymoon is over. Let’s quickly review
the history of the romance before look-
ing to the future of the relationship.
For commercial computing, big data
“dating” really began in the mid-2000s,
when technical people in the bur-
geoning web business began to con-
sider new ways to handle the exploding
amounts and types of data generated
by web usage. Before then, big data
had been the dream -- or nightmare,
actually -- of the scientific community
where, from genetics to astrophysics,
instrumentation was spewing data. In
early 2008, the commercial romance
of big data really began to get serious
when Hadoop, the yellow poster ele-
phant child of big data, was accepted as
a top-level Apache project. The market-
ing paparazzi began stalking the couple
soon after and, true to paparazzi nature,
have been publishing a stream of outrageous claims and pictures ever since. By
2012, a shotgun wedding with business was hastily arranged. By then the gloss
had begun to wear off and the honeymoon was washed out in a brief trip to
Atlantic City at the height of a super storm.
Enough Of The Past: Let’s Look Forward!
Big data does offer real and realizable business benefits,
but there is one major issue: what actually is big data? The
“volume, variety, and velocity” nomenclature, claimed by Doug
Laney from a 2001 Meta Group research note, is useful short-
hand at best. In reality, each attribute opens up a question of
how far on any scale must data be in order to be called big
-- how vast, how disparate, how fast? Furthermore, what combi-
nation of these three factors should be used in making a call?
Big data, it turns out, means precisely nothing and imprecisely
anything the Mad Men want it to mean. And, with the various
additional “v-words” vaunted by vendors, the value vanishes.
(Oops, I veered into the v-v-verge there!)
The extent of this terminology problem was made clear in
a big data survey conducted last fall by EMA and myself.
Participants were those who declared they were investigating
or implementing big data projects, yet almost a third of
respondents classed the data source for their projects as
“process-mediated data” -- data originating from traditional
operational systems. My conclusion: the term big data has
passed its use-by date.
Big data and “small data” are conceptually one and the same:
just data, all data. Or, to be more semantically correct, all
information, as I’ll explain in a new book later this year. (Editor's
Note: Business Unintelligence: Via Analytics, Big Data and
Collaboration to Innovative Business Insight will be published in
Q3 2013 by Technics Publications).
To be clear, I don’t consider that big data has taken us into
a dead end. Rather, it has usefully exposed the fact that our
traditional business intelligence (BI) view of the information
available to and used by business is woefully inadequate. It
has caused me to revisit many underlying assumptions about
information and I now see that there exist three domains of
information that future business intelligence/analytics must
handle, as shown in the accompanying figure: human-sourced
information, process-mediated data, and machine-generated
data. These domains are fundamentally different in
their usage characteristics and in the demands they place on
technology. The terms are largely self-explanatory, but more
information can be found in my white paper. (See Barry Devlin’s
The Big Data Zoo - Taming the Beasts: The need for an integrated
platform for enterprise information).
The bottom line is that we need a new architecture for infor-
mation -- all of it and its entire life cycle in business.
The Biz-Tech Ecosystem
Both challenges and opportunities emerge as we shift the
view from IT to business.
The biggest challenge in the big data/analytics scene is the
alleged dearth of so-called “data scientists.” How different
are data scientists from the power users we’ve known in BI
for decades? Arguably, the only substantive difference is deep
statistical skill. The other characteristics mentioned -- data
munging, business acumen, and storytelling -- are all com-
mon to power users. Statistics, however, is a very specialized
skill that should, in principle, be tightly supervised to ensure
valid and proper application. The phrase “lies, damn lies, and
statistics” indicates the problem: statistics are far too easy
to misuse -- deliberately or otherwise. Moreover, we seem
to have blindly accepted an assertion that the exponential
growth in data volumes implies a similar growth in hidden
nuggets of useful business knowledge. This is unlikely to be
true. Most of the good examples of business value coming
from big data illustrate this. Real value emerges from a new
type or new combination of data; growth in volumes leads to
incremental increases in value, at best.
These challenges aside, a focus on novel (big) data use does
drive opportunities for new businesses, business models, or,
simply, ways to compete. A useful, cross-industry categoriza-
tion (courtesy of IBM) of these opportunities is:
• Big Data Exploration: analyze “big data” to identify new
business opportunities
• Enhanced 360° View of the Customer: incorporate human-
sourced information sources, such as call center logs and
social media, into traditional CRM approaches
• Security and Intelligence Extension: lower risk, detect fraud,
and monitor cyber security in real-time, machine-generated
data
• Operations Analysis: analyze and use machine-generated
data to drive immediate business results
• Data Warehouse Augmentation: increase operational effi-
ciency by integrating big data with BI
This focus on (big) data is but the latest stage in the evolu-
tion of what I call the biz-tech ecosystem -- the symbiotic
relationship between business and IT that drives all suc-
cessful, modern businesses. Every business advance worth
mentioning in the past twenty years has had technology, and
almost always information technology, at its core. On the other
hand, many of the advances in IT have been driven by business
demands. The relative roles of business and IT people
may change as the process evolves, but that process is set to
continue. And, at its heart are the collection, creation, and use
of information, as opposed to data -- big or small -- as mandatory,
core competencies of modern business.
Dr. Barry Devlin is Founder and Principal
of 9sight Consulting, and is among the
foremost authorities on business insight
and big data. He is a widely respected
analyst, consultant, lecturer, and author.
Editor’s Pick
Big Data Revolution
Lindy Ryan

WHILE IT’S INARGUABLE that the phenomenon
known as “big data” is rapidly reknitting the very
fabric of our lives, what we are just now begin-
ning to see and to understand – to appreciate
– is how.
Yet, so often our conversations about big data focus on these
“how’s” in the abstract – on its benefits, potentials, and oppor-
tunities, and likewise, its risks, challenges, and implications –
that we overlook the simpler, more primordial question: what’s
not changing?
It’s a simple question that requires a simple answer. Us. Sure,
we can assert that we’re becoming more data-dependent. We
generate more data: last month, social media giant Twitter
blogged [1] that its more than 200 million active users generate
more than 400 million tweets per day. We consume more data: a
now-outdated University of California report [2] calculated that
American households collectively consumed 3.6 zettabytes of
information in 2008. Are we – the data-generating organisms
that we are – becoming no more than sentient founts of data?
In Big Data: A Revolution That Will Transform How We Live,
Work, and Think, authors Viktor Mayer-Schönberger and
Kenneth Cukier effectively put the pulse back in the Big Data
Conversation: “big data is not an ice-cold world of algorithms
and automatons. . .we [must] carve out a place for the human:
to reserve space for intuition, common sense, and serendipity
to ensure that they are not crowded out by data and machine-
made answers.”
In our brief email exchange, Mayer-Schönberger elaborated
a bit more on this idea. “[We] try to understand the (human)
dimension between input and output,” he noted. “Not through
the jargon-laden sociology of big data, but through what we
believe is the flesh and blood of big data as it is done right
now.”
With the elegance of an Oxford University professor and
The Economist’s data editor – Mayer-Schönberger and Cukier,
respectively – Big Data’s authors remind us that it is our
human traits of “creativity, intuition, and intellectual ambi-
tion” that should be fostered in this brave new world of
big data. That the inevitable “messiness” of big data can
be directly correlated to the inherent “messiness” of being
human. And, most important, that the evolution of big data
as a resource and tool derives from (is a function of) the dis-
tinctly human capacities of instinct, accident, and error, which
manifest, even if unpredictably, in greatness. In that greatness
is progress.
That – progress – is the intrinsic value of big data. It’s what’s
so compelling about Big Data (both the book and the thing
itself): it’s not always about the inputs or outputs, but the
space – or, what Mayer-Schönberger calls the “black box” – of
in-between.
[1] http://blog.twitter.com/2013/03/celebrating-twitter7.html
[2] How Much Information? http://hmi.ucsd.edu/howmuchinfo.php
Lindy Ryan is Editor in Chief
of Radiant Advisors.
Big Data is available on Amazon and
the Radiant Advisors eBookshelf
www.radiantadvisors.com/ebookshelf
Features
Has the Big Data Bubble Burst?
[The BI industry is abuzz with one new question: is big data done?]
Krish Krishnan

RECENT ARTICLES IN leading business publications, a hype-cycle presentation by Gartner, and a number of blogs have all startled the world of big data by asking one “big” question: are we done? Did the big data bubble burst even quicker than the “dot com” bubble?

Has the big data bubble burst?

The answer is: not really. If anything, the market for infrastructure is booming, with more vendors distributing commercial versions of open source software (like Hadoop and NoSQL databases). We are seeing the evolution of new consulting practices focused on analytics and – perhaps most important – traditional database vendors have all either embraced or announced support for big data platforms. So, what is the basis of this notion of failure or disappointment around the big data space?

The Promised Land
By 2004, Google had published papers describing MapReduce and the Google File System, which started a flurry of activity building platforms aimed at solving scalability problems. One of these projects was “Nutch,” a parallel search engine on the open source platform. The team at Nutch succeeded in building the infrastructure that attracted Yahoo to sponsor and incubate the project under its commercial name: Hadoop. Submitted to open source in 2009, Hadoop quickly gained notoriety as the panacea for all data scalability problems. Since then it has become a viable platform for large-scale computing needs and has been adopted as a data storage and processing platform at many companies across the world. Subsequently, the last four years have also seen the evolution of NoSQL databases and multiple additional technologies on the Hadoop framework.

The Reality
Hadoop’s early adopters did not fully understand the complexities of the platform until they began implementing the technology, and this lack of understanding inevitably has spurred a sense of failure (or disappointment).

Among the potential gaps not understood clearly by adopters:

One size does not fit all: Big data technologies were developed to solve the problems of extreme scalability and sustained performance. While these technologies have certainly overcome the traditional limitations of database-oriented data processing, the same techniques cannot be directly extended to solve every problem in that realm.

MapReduce skill availability: To effectively use most of the big data platforms one has to be able to write some amount of MapReduce code; however, this is an area where skills are evolving and (still) scarce.

Programming dependence: Many corporations are unable to adjust to the idea of having teams design and develop code (or data processing) – much like application software development. Standardization of programming techniques for big data is still maturing.

Business case: Most early adopters did not have a robust business case, or, in many cases, the right business case to implement on these platforms. The lack of an end-state solution -- or usage and ROI expectations -- has led to longer development and implementation cycles.

Hype: Continued hype about the technology has caused unrest amongst executives, line of business owners, IT, and business users, leading to often misunderstood capabilities of the platform as well as incorrect ROI or TCO expectations.

But wait: it is not “all over” for big data. Rather, we have come to the point in time where the reality of the platform – and how to drive its adoption within corporations – has started settling down. The big data bubble is alive and well; in fact, it’s even progressing in the right direction.

How to Integrate Big Data
As corporations begin to see beyond the hype of big data, everyone from the executive sponsor to the implementation team is beginning to recognize the need to dig a better foundation for integrating big data. There are a few subtle yet invaluable pointers in this process:

1. Build the business case and keep it simple
2. Create a data discovery environment that can be used by line of business experts
3. Identify the data and patterns that are needed to create a robust foundation for analytics
4. Create the initial analytics based on the data discovery
5. Visualize the data in a mash-up platform using semantic data integration techniques
6. Get the business users to use the outcomes
7. Gain adoption of the users
8. Create a roadmap for the larger program

While the overall process of big data integration seems
closely aligned to the integration of any other project, there
are key differences that can define the success of the big data
bubble in your corporation: data discovery, data analysis, and
data visualization. These three integral pillars will clearly
identify the basis of how to implement big data and monetize
such an exercise.
The Future
Several technology providers have announced their support
of big data platforms, including DataStax (Cassandra), Intel,
Microsoft, EMC, and HP (Hadoop), 10gen (MongoDB), and
Cray (YARC Graph Analytics DB). These vendors -- along with
existing vendors -- will undoubtedly continue to provide more
options and solution platforms for deploying and integrating
big data technologies within the enterprise platform.
The big data bubble has not burst; it is still only beginning
and will be reaching various levels of maturity over the
following years. There are many layers of complexities and
intricacies that need to be defined and formalized, but this is
where the evolution and opportunities exist.
Krish Krishnan is a globally recognized
expert in the strategy, architecture, and
implementation of big data. His new
book Data Warehousing in the Age of Big
Data will be released in August 2013.
Features
Twilight of the (DM) Idols
[Big Data vs. Data Management]
Stephen Swoyer

SOME IN THE INDUSTRY ARE already writing epitaphs for big data. Others – a prominent market watcher comes to mind – argue that big data, like so many technologies or trends before it, is simply conforming to well-established patterns: following a period of hype, it’s undergoing a correction. It’s regressing toward a mean.

That was fast.

This doesn’t concern us. Big data is an epistemic shift. It’s going to transform how we know and understand – how we perceive – the world. What’s meant by the term “big data” is a force for destabilizing and reordering existing configurations – much as the Bubonic Plague, or Black Death, was for the Europe of the late-medieval period. It’s an unsettling analogy, but it underscores an important point: the phenomenon of big data, like that of the Black Death, is indifferent to the hopes, prayers, expectations, or innumerate prognostications of human actors. It’s inevitable. It’s going to happen. It’s going to change everything.

Even as the epitaphs are flying, the magic quadrants being plotted, and the opinions mongering, big data is changing (chiefly by challenging) the status quo. This is particularly the case with respect to the domain of data management (DM) and its status quo. Here, big data is already a disruptive force: at once democratizing, reconfiguring, and destructive. We’ll consider its reordering effect through the prism of Hadoop, which, in the software development and data management worlds, has to a real degree become synonymous with what’s meant by “big data.”

The Citadel of Data Management
Big data has been described as a wake-up call for data management
(DM) practitioners.
If we’re grasping for analogies, the big data phenomenon
seems less like a wake-up call than ... a grim tableau straight
out of 14th-century France.
This was the time of the Black Death, which was to function as
an enormous force for social destabilization and reordering. It
was also the time of the Hundred Years’ War, which was fought
between England and France on French soil. The manpower
shortage of the invading English was exacerbated by the virulence
of the Plague, which historians estimate killed between
one- and two-thirds of the European population. Outmanned
– and outwoman-ed, for that matter, once Joan of Arc erupted
onto the scene – the English resorted to a time-tested tactic:
the chevauchée. The logic of the chevauchée is fiendishly
simple: Edward III’s English forces were resource-constrained;
they enjoyed neither the manpower nor the defensive advantages
– e.g., castles, towers, or city walls – that accrued (by
default) to the French. The English achieved their best out-
comes in pitched battle; the French, on the other hand, were
understandably reluctant to relinquish their fortifications,
fixed or otherwise.
The challenge for the English was to draw them out to fight.
Enter the chevauchée. It describes the “tactic” of rampag-
ing and pillaging – among other, far more horrific practices
– in the comparatively defenseless French countryside. Left
unchecked, the depredations of the chevauchée could ultimately
constitute a threat to a ruler’s hegemony: fealty counts
for little if it doesn’t at least afford one protection from other
would-be conquerors.
As a tactical tool, the chevauchée succeeded by challenging
the legitimacy of a ruling power.
Hadoop has had a similar effect. For the last two decades,
the data management (DM) or data warehousing (DW) Powers
That Be have been holed up in their fortified castles, dictating
terms of access – dictating terms of ingest; dictating time-
tables and schedules, almost always to the frustration of the
line of business, to say nothing of other IT stakeholders.
Though Hadoop wasn’t conceived tactically, its adoption and
growth have had a tactical aspect.
By running amok in the countryside, pillaging, burning, and
destroying stuff – or, by offering an alternative to the data
warehouse-driven BI model – the Hottentots of Hadoop have
managed to drag the Lords of DM into open battle.
At last year’s Strata + Hadoop World confab in New York, NY,
a representative with a prominent data integration (DI) ven-
dor shared the story of a frustrated customer that it says had
developed – perforce – an especially ambitious project focus-
ing on Hadoop.
The salient point, this vendor representative indicated, was
that the business and IT stakeholders behind the project saw
in Hadoop an opportunity to upend the power and authority of
the rival DM team. “It’s almost like a coup d’etat for them,” he
said, explaining that both business stakeholders and software
developers were exasperated by the glacial pace of the DM
team’s responsiveness. “[T]hey asked how long it would take to
get source connectivity [for a proposed application and] they
were told nine months. Now they just want to go around them
[i.e., the data management group],” this representative said.
“[T]hey basically want Hadoop to be their new massive data
warehouse.”
The Zero-Sum Scenario
This zero-sum scenario sets up a struggle for information
management supremacy. It proposes to isolate DM altogether;
eventually it would starve the DM group out of existence. It
views DM not as a potential partner for compromise, but as a
zero-sum adversary.
It’s an extremist position, to be sure; it nevertheless brings into
focus the primary antagonism that exists between software-
development and data-management stakeholders. This antag-
onism must be seen as a factor in the promotion of Hadoop as
a general-purpose platform for enterprise data management.
Hadoop was created to address the unprecedented challenges
associated with developing and managing data-intensive
distributed applications. The impetus and momentum behind
Hadoop originated with Web or distributed application devel-
opers. To some extent, Hadoop and other big data technology
projects are still largely programmer-driven efforts. This has
implications for their use on an enterprise-wide scale, because
software developers and data management practitioners have
very different worldviews. Both groups are accustomed to talk-
ing past one another. Each suspects the other of giving short
shrift to its concerns or requirements.
“Big data is an epistemic shift. It’s going to transform how we know and understand – how we perceive – the world.”
In short, both groups resent one another. This resentment
isn’t symmetrical, however; there’s a power imbalance. For a
quarter century now, the DM group hasn’t just managed data
-- it’s been able to dictate the terms and conditions of access
to the data that it manages. In this capacity, it’s been able to
impose its will on multiple internal constituencies: not only
on software developers, but on line-of-business stakehold-
ers, too. The irony is that the per-
ceived inflexibility and unrespon-
siveness – the seeming indifference
– of DM stakeholders has helped to
bring together two other nominally
antagonistic camps; in their resent-
ment of DM, software developers
and the line of business have been
able to find common cause.
Few would deny that stakeholders
jealously guard their fiefdoms. This
is as true of software developers
and the line of business as it is of
their counterparts in the DM world.
Part of the problem is that DM
is viewed as an unreasonable or
uncompromising stakeholder: e.g.,
DM practitioners have been unable
to meaningfully communicate the
logic of their policies; they’ve like-
wise been reluctant – or in some cases, unwilling – to revise
these policies to address changing business requirements. In
addition, they’ve been slow to adopt technologies or meth-
ods that promise to reduce latencies or which propose to
empower line-of-business users. Finally, DM practitioners are
fundamentally uncomfortable with practices – such as ana-
lytic discovery, with its preference for less-than-consistent
data – which don’t comport with data management best
practices.
Hadoop and Big Data in Context
That’s where the zero-sum animus comes from. It explains
why some in business and IT champion Hadoop as a
technology to replace – or at the very
least, to displace – the DM status quo. There’s a much more
pragmatic way of looking at what’s going on, however.
This is to see Hadoop in context – i.e., at the nexus of two
related trends: viz., a decade-plus, bottom-up insurgency,
and a sweeping (if still coalescing) big data epistemic shift.
The two are related. Think back to the Bubonic Plague, which
had a destabilizing effect on the late-Medieval social order.
The depredations of the Plague effectively wiped out many
of the practices, customs, and (not to put too fine a point on
it) human stakeholders that might otherwise have contested
destabilization.
The Plague, then, cleared away the ante-status quo, creating
the conditions for change and transformation. Big data has
had a similar effect in data management – chiefly by raising
questions about the warehouse’s ability to accommodate
disruptions (e.g., new kinds of data and new analytic use
cases) for which it wasn’t designed. Simply by claiming to
be Something New, big data raised questions about the DM
status quo.
This challenge was exploited by
well-established insurgent cur-
rents inside both the line of busi-
ness and IT. The former has been
fighting an insurgency against IT
for decades; however, in an age
of pervasive mobility, BYOD, social
collaboration, and (specific to the
DM space) analytic discovery, this
insurgency has taken on new force
and urgency.
IT, for its part, has grappled with
insurgency in its own ranks: the
agile movement, which most in
DM associate with project manage-
ment, began as a software develop-
ment initiative; it explicitly bor-
rowed from the language of politi-
cal revolution – the seminal agile
document is the “Manifesto for Agile Software Development,”
published in 2001 by Kent Beck and 16 co-authors – in
championing an alternative to software development’s top-
down, deterministic status quo.
Agility and insurgency have been slower to catch on in DM.
Nevertheless, insurgent pressure from both the line of busi-
ness and IT is forcing DM stakeholders (and the vendors who
nominally service them) to reassess both their strategies and
their positions.
However far-fetched, the possibility of a Hadoop-led chevau-
chée in the very heart of its enterprise fiefdom – with aid
and comfort from a line-of-business class that DM has too
often treated more as peasants than as enfranchised citizens
– snagged the attention of data management practitioners.
Big time.
Reinvention
The Hadoop chevauchée got the attention of DM practitioners
for another reason.
In its current state, Hadoop is no more suited for use as a
general-purpose, all-in-one platform for reporting, discovery,
and analysis than is the data warehouse. (See Sidebar: A Kludge Too Far?)
Given the maturity of the DW, Hadoop is arguably much less
suited for this role. For all of its shortcomings, the data ware-
house is an inescapably pragmatic solution; DM practitioners
learned what works chiefly by figuring out (Continued on p. 21)
Sidebar: Big Data: Big Impact
[Stephen Swoyer]
The most common big data use cases tend to be less sexy
than mundane.
In fact, two use cases for which big data is today having a
Big Impact have decidedly sexy implications, at least from
a data management (DM) perspective.
Both use cases address long-standing DM problems;
both likewise anticipate issues specific to the age of big
data. The first involves using big data technologies to
super-charge ETL; the second, as a landing zone – i.e., a
general-purpose virtual storage locker – for all kinds of
information.
Of the two, the first is the more mature: IT technologists
have been talking up the potential of super-charged ETL
almost from the beginning.
Back then, this was framed largely in terms of MapReduce,
the mega-scale parallel processing algorithm popular-
ized by Google. Five years on, the emphasis has shifted
to Hadoop itself as a platform for massively parallel ETL
processing.
The rub is that performing stuff other than map and
reduce operations across a Hadoop cluster is kind of a
kludge. (See sidebar: A KLUDGE TOO FAR?.)
However, because ETL processing can be broken down
into sequential map and reduce operations, data integra-
tion (DI) vendors have managed to make it work. Some DI
players – e.g., Informatica, Pervasive Software, SyncSort,
and Talend, among others – market ETL products for
Hadoop. Both Informatica and Talend – along with ana-
lytic specialist Pentaho Inc. – use Hadoop MapReduce to
perform ETL operations. Pervasive and SyncSort, on the
other hand, tout libraries that they say can be used as
MapReduce replacements. The result, both vendors claim,
is ETL processing that’s (a) faster than vanilla Hadoop
MapReduce and (b) orders of magnitude faster than tradi-
tional enterprise ETL.
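To make the "map and reduce" framing concrete, here is a minimal sketch in plain Python, not Hadoop code and not any vendor's product. It assumes a hypothetical feed of comma-delimited sales records; the function names and sample data are invented for illustration. The map step does the extract-and-transform work on one record at a time, and the reduce step aggregates everything that shares a key.

```python
from collections import defaultdict

# Hypothetical raw input: one comma-delimited sales record per line.
RAW_LINES = [
    "2013-03-01,austin,19.99",
    "2013-03-01,AUSTIN ,5.01",
    "2013-03-02,new york,12.50",
    "bad record",
]

def map_record(line):
    """Extract and transform: parse one line, clean it, emit (key, value) pairs."""
    parts = line.split(",")
    if len(parts) != 3:
        return []  # drop malformed rows instead of failing the whole job
    _date, city, amount = parts
    return [(city.strip().lower(), float(amount))]

def reduce_city(city, amounts):
    """Aggregate every value that was emitted for one key."""
    return city, round(sum(amounts), 2)

def run_job(lines):
    grouped = defaultdict(list)
    for line in lines:                       # map phase
        for key, value in map_record(line):
            grouped[key].append(value)       # shuffle: group values by key
    # reduce phase: one call per distinct key
    return dict(reduce_city(k, v) for k, v in sorted(grouped.items()))

totals = run_job(RAW_LINES)
print(totals)
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is the property the ETL vendors are exploiting.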
This stuff is available now. In the last 12 calendar months,
both Informatica and Talend announced “big data” ver-
sions of their ETL technologies for Hadoop MapReduce;
Pervasive and SyncSort have marketed Hadoop-able ver-
sions of their own ETL tools (DataRush and DMExpress,
respectively) for slightly longer. In every case, big data
ETL tools abstract the complexity of Hadoop: ETL work-
flows are designed in a GUI design studio; the tools them-
selves generate jobs in the form of Java code, which can
be fed into Hadoop.
Just because the technology’s available doesn’t mean
there’s demand for it.
Parallel processing ETL technologies have been available
for decades; not everybody needs or can afford them,
however. David Inbar, senior director of big data products
with Pervasive, concedes that demand for mega-scale ETL
processing used to be specialized.
At the same time, he says, usage patterns are changing;
analytic practices and methods are changing. So, too, is
the concept of analytic scale: scaling from gigabyte-sized
data sets to dozens or hundreds of terabytes – to say
nothing of petabytes – is an increase of several orders
of magnitude. In the emerging model, rapid iteration is
the thing; this means being able to rapidly prepare and
crunch data sets for analysis.
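As a quick check on the "several orders of magnitude" claim, the arithmetic can be sketched as follows. The sketch assumes decimal units (1 GB = 10^9 bytes) purely for simplicity; the function name is made up for illustration.

```python
import math

# Decimal storage units, assumed here for simplicity.
GB = 10**9
TB = 10**12
PB = 10**15

def orders_of_magnitude(small, large):
    """Number of powers of ten separating two data set sizes."""
    return math.log10(large / small)

gb_to_100tb = orders_of_magnitude(GB, 100 * TB)
gb_to_pb = orders_of_magnitude(GB, PB)
print(f"1 GB -> 100 TB: {gb_to_100tb:.1f} orders of magnitude")
print(f"1 GB -> 1 PB:   {gb_to_pb:.1f} orders of magnitude")
```

Under these assumptions, a jump from one gigabyte to a hundred terabytes is about five orders of magnitude, and a jump to a petabyte is about six.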
Nor is analysis a one-and-done affair, says Inbar: it’s itera-
tive.
“What really matters is not so much if it uses MapReduce
code or if it uses some other code; what really matters is
does it perform and does it save you operational money –
and can you actually iterate and discover patterns in the
first place faster than you would be able to otherwise?” he
asks. “It’s always possible to write custom code to get stuff
done. Ultimately it’s a relatively straightforward [proposi-
tion]: [manually] stringing together SQL code [for tradi-
tional ETL] or Java code [for Hadoop] can work, but it’s not
going to carry you forward.”
However, one of the data warehouse’s (DW) biggest selling
points is also its biggest limiting factor.
The DW is a schema-mandatory platform. It’s most comfortable
speaking SQL. It uses a kludge – i.e., the binary large
object (BLOB) – to accommodate unstructured, semi-struc-
tured, or non-traditional data-types. Hadoop, by contrast, is
a schema-optional platform.
For this reason, many in DM conceive of Hadoop as a virtual
storage locker for big data.
“You can drop any old piece of data on it without having to
do any of the upfront work of modeling the data and trans-
forming it [to conform to] your data model,” explains Rick
Glick, vice president of technology and architecture with
analytic discovery specialist ParAccel. “You can do that [i.e.,
transform and conform] as you move the data over.”
At a recent industry event, several vendors – viz.,
Hortonworks, ParAccel, and Teradata – touted Hadoop as
a point of ingest for all kinds of information. This “landing
zone” scenario is something that customers are adopting
right now, says Pervasive’s Inbar; it has the potential to be
the most common use case for Hadoop in the enterprise.
“Before you can do all of the amazing/glamorous/ground-
breaking analytical work … and innovation, you do actually
have to land and ingest and provision the data,” he argues.
“Hadoop and HDFS are wonderful in that they let you [store
data] without having predefined what it is you think you’re
going to get out of it. Traditionally, the data warehouse
requires you to predefine what you think you’re going to
get out of it in the first place.”
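The contrast between a schema-mandatory warehouse and a schema-optional landing zone can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Hadoop or warehouse code: the class names, the two-column data model, and the sample records are all hypothetical.

```python
import json

REQUIRED_COLUMNS = ("customer_id", "amount")  # hypothetical warehouse data model

class SchemaMandatoryStore:
    """Warehouse-style: a record must fit the model before it is stored."""
    def __init__(self):
        self.rows = []

    def load(self, record):
        missing = [c for c in REQUIRED_COLUMNS if c not in record]
        if missing:
            raise ValueError(f"record rejected, missing columns: {missing}")
        self.rows.append({c: record[c] for c in REQUIRED_COLUMNS})

class LandingZone:
    """Landing-zone style: store the raw payload now, decide on structure later."""
    def __init__(self):
        self.raw = []

    def land(self, payload):
        self.raw.append(payload)  # no modeling or validation up front

    def read(self, columns):
        """Apply a structure only at read time, for this one question."""
        for payload in self.raw:
            try:
                record = json.loads(payload)
            except ValueError:
                continue  # not JSON; a different reader could still use it
            if isinstance(record, dict):
                yield {c: record.get(c) for c in columns}

zone = LandingZone()
zone.land('{"customer_id": 7, "amount": 19.99, "clickstream": ["home", "cart"]}')
zone.land('{"customer_id": 8}')
zone.land("free-form log line")

projected = list(zone.read(("customer_id", "amount")))

warehouse = SchemaMandatoryStore()
try:
    warehouse.load({"customer_id": 8})
    rejected = False
except ValueError:
    rejected = True
```

The landing zone accepts all three payloads and defers the question of structure; the warehouse-style store refuses the incomplete record at load time. Neither behavior is wrong, they simply put the modeling cost in different places.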
Sidebar: A Kludge Too Far?
[Stephen Swoyer]
The problem with MapReduce – to invoke a shopworn
cliché – is that it’s a hammer.
From its perspective, any and every distributed processing
task wants and needs to be nailed. If Hadoop is to be a
useful platform for general-purpose parallel processing,
it must be able to perform operations other than synchro-
nous map and reduce jobs.
The problem is that MapReduce and Hadoop are tightly
coupled: the former has historically functioned as paral-
lel processing yin to the Hadoop Distributed File System’s
storage yang.
Enter the still-incubating Apache YARN project (YARN is
a backronym for “Yet Another Resource Negotiator”), which
aims to decouple Hadoop from MapReduce.
Right now, Hadoop’s Job Tracker facility performs two
functions: resource management and job scheduling;
YARN breaks Job Tracker into two discrete daemons.
From a DM perspective, this will make it possible to
perform asynchronous operations in Hadoop; it will also
enable pipelining, which – to the extent it’s possible in
Hadoop today – is typically supported by vendor-specific
libraries.
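The separation of duties described above can be illustrated with a toy model. To be clear, this is not YARN's API: the class names, the slot-based accounting, and the sample job are assumptions made up for this sketch. One object tracks cluster capacity and nothing else; a second object plans the tasks of a single job with whatever capacity it is granted.

```python
class ResourceManager:
    """Tracks cluster capacity only; knows nothing about job logic."""
    def __init__(self, total_slots):
        self.free_slots = total_slots

    def allocate(self, wanted):
        granted = min(wanted, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, count):
        self.free_slots += count

class JobScheduler:
    """Plans the tasks of one job using whatever slots it is granted."""
    def __init__(self, name, tasks):
        self.name = name
        self.pending = list(tasks)
        self.done = []

    def run_round(self, resources):
        granted = resources.allocate(len(self.pending))
        batch, self.pending = self.pending[:granted], self.pending[granted:]
        self.done.extend(task() for task in batch)  # run this round's tasks
        resources.release(granted)                  # hand the slots back
        return granted

cluster = ResourceManager(total_slots=2)
etl = JobScheduler("etl", [lambda: "extract", lambda: "transform", lambda: "load"])

rounds = 0
while etl.pending:
    etl.run_round(cluster)
    rounds += 1
```

Because the scheduler only asks the resource manager for capacity, a different kind of scheduler (one that is not built around map and reduce steps) could share the same cluster without changing the resource-tracking side.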
YARN’s been a long time coming, however: it’s part of
the Hadoop 2.0 framework, which is still in development.
Given what’s involved, some in DM say YARN’s going to
need seasoning before it can be used to manage mission-
critical, production workloads.
That said, YARN is hugely important to Hadoop. It has
support from all of the Hadoop Heavies: Cloudera, EMC,
Hortonworks, Intel, MapR, and others.
“It feels like it’s been coming for quite a while,” concedes
David Inbar, senior director of big data products with data
integration specialist Pervasive Software. “All of the play-
ers … are in favor of it. Customers are going to need it. If
as a sysadmin you don’t have a unified view of everything
that’s running and consum[ing] resources in your environ-
ment, that’s going to be suboptimal,” Inbar continues. “So
YARN is a mechanism that’s going to make it easier to
manage [Hadoop clusters]. It’s also going to open up the
Hadoop distributed data and processing framework to a
wider range of compute engines and paradigms.”
3 Ways to Bring BI and Big Data Together
John O’Brien
[Three things that make a “big” difference when implementing big data.]
Perhaps your organization is hearing the buzz
about big data and business analytics creating value,
transforming businesses, and gaining new insights. Or,
perhaps you’ve spent some time and resources during
the past year reading publications or attending industry
events, or even launched a small scale “big data pilot” exper-
iment. In any case, if you’re at the early stages of your com-
pany’s journey into big data, there are some important con-
versations to keep in mind as you continue your path to
bringing business intelligence (BI) and your company’s big
data together.
1. Big Data and the Business Intelligence Program
For the most part, big data environments are those that adopt
Apache’s Hadoop or one of its variants (like Cloudera, MapR,
or Hortonworks) or the NoSQL databases (like MongoDB,
Cassandra, or HBase with Hadoop). These data stores have
massive scalability and unstructured data flexibility at the
best price. No longer reserved for the biggest IT shops, the
democratization of big
data comes from Hadoop’s ability to enable any company to
affordably and easily exploit big data sets, and sometimes go
even further with Cloud implementations. Gleaning insights
from these vast data sets requires a completely different type
of data platform and programming framework for creating
insightful analytic routines.
Analytics is not new to BI: the ability to execute statistical
models and identify hidden patterns and clusters of data
has long allowed for better business decision-making and
predictions. What these new BI analytic capabilities have
in common is that they work beyond the capabilities of SQL
statements that govern relational database management
systems to execute embedded algorithms. No longer are we
constrained to sample data sets; advanced analytic tools can
now execute their algorithms in parallel at the data layer. For
many years, data has been extracted from data warehouses
into flat files to be executed outside the RDBMS by data min-
ing software packages (like SPSS, SAS, and Statistica). Both
traditional capabilities -- reporting and dimensional analysis
-- have always been needed, along with what is now being
called “Analytics” in today’s BI programs.
Big data analytics are another one of the several BI capabili-
ties required by the business. And, even when big data is not
so “big” there are other reasons why Hadoop and NoSQL are
better solutions than RDBMSs, or cubes. Most common is when
working with the data is beyond the capabilities of SQL and
tends to be more programmatic. The second most common
is when the data being captured is constantly changing or has an
unknown structure, such that a database schema is difficult to
maintain. In this scenario, schema-less Hadoop and key value
data stores are a clear solution. Another is when the data
needs to be stored in various data types, such as documents,
images, videos, sounds, or other non-record like data (think,
for example, about the metadata to be extracted from a photo
image, like date, time, geo-coding, technical photography data,
meta-tags, and perhaps even names of people from facial rec-
ognition). Most company big data environments today are less
than ten terabytes and fewer than eight nodes in the Hadoop
cluster because of the other “non-bigness” requirements.
2. Data Platform = Big Data + Data Warehouse
You might have already discussed what to do now that you
have both a Hadoop and data warehouse system. Should the
data warehouse be moved into Hadoop, or should you link
them? Do you provide a semantic layer over both of them for
users or between the data stores?
Most companies are moving forward recognizing that both
environments serve different purposes, but are part of a com-
plete BI data platform. The traditional hub and spoke archi-
tecture of data warehouses and data marts is evolving into a
modern data platform of three tiers: big data Hadoop, analytic
databases, and the traditional RDBMS. Industry analysts are
contemplating whether this is a two-tier or three-tier data
platform, especially given the expected maturing of Hadoop
in the coming years; however, it is safe to say that analytic
databases will be the cornerstone of modern BI data platforms
for years to come.
The analytic database tier is really for highly-optimized or
highly-specialized workloads -- such as columnar, MPP, and in-
memory (or vector based) -- for analytic performance, or text
analytics and graph databases for highly-specialized analytic
capabilities. Big data governance and analytic lifecycles would
encompass semantic and analytic discoveries made in Hadoop,
combined with traditional reference data, and then be migrated
and productionized in a more controlled, monitored -- and
accessible -- analytics tier.
Stephen Swoyer is a technology
writer with more than 15 years of
experience. His writing has focused
on business intelligence and data
warehousing for almost a decade.
what doesn’t work. The genealogy of the data warehouse is
encoded in a double-helix of intertwined lineages: the first is
a lineage of failure; the second, a lineage of success born of
this failure. The latter has been won – at considerable cost –
at the expense of the former. A common DM-centric critique
of Hadoop (and of big data in general) is that some of its sup-
porters want to throw out the old order and start from scratch.
As with the chevauchée – which entailed the destruction of
infrastructure, agricultural sustenance, and formative social
institutions – many in DM (rightly) see in this a challenge to
an entrenched order or configuration.
They likewise see the inevitability of avoidable mistakes –
particularly to the extent that Hadoop developers are con-
temptuous of or indifferent to the finely-honed techniques,
methods, and best practices of data management.
“Reinvention is exactly it, … [but] they aren’t inventing data
management technology. They don’t understand data manage-
ment at all,” argues industry veteran Mark Madsen, a principal
with information management consultancy Third Nature Inc.
Madsen is by no means a Hadoop hater; he notes that, as a
schema-optional platform, Hadoop seems tailor-made for the
age of big data: it can function as a virtual warehouse – i.e.,
as a general-purpose storage area – for information of any
and every kind.
The DW is schema-mandatory; its design is predicated on
a pair of best-of-all-possible-worlds assumptions: firstly,
that data and requirements can be known and modeled in
advance; secondly, that requirements won’t significantly
change. For this very reason, the data warehouse will never be
a good general-purpose storage area. Madsen takes issue with
Hadoop’s promotion as an information management platform-
of-all-trades.
Proponents who tout such a vision “understand data process-
ing. They get code, not data,” he argues. “They write code and
focus on that, despite the data being important. Their ethos
is around data as the expendable item. They think [that] code
[is greater than or more important than] data, or maybe [they]
believe that [even though they say] the opposite. So they do
not understand managing data, data quality, why some data is
more important than other data at all times, while other data
is variable and/or contextual. They build systems that pre-
sume data, simply source and store it, then whack away at it.”
The New Pragmatism
Initially, interest in Hadoop took the form of dismissive
assessments.
A later move was to co-opt some of the key technologies
associated with Hadoop and big data: almost five years ago,
for example, Aster Data Systems Inc. and Greenplum Software
(both companies have since been acquired by Teradata
and EMC, respectively) introduced in-database support for
MapReduce, the parallel processing algorithm that search
giant Google had first helped to popularize, and which Yahoo
helped to democratize – in the guise of Hadoop. Aster and
Greenplum effectively excised MapReduce from Hadoop and
implemented it (as one algorithm among others) inside their
massively parallel processing (MPP) database engines; this
gave them the ability to perform mapping/reducing opera-
tions across their MPP clusters, on top of their own file sys-
tems. Hadoop and its Hadoop Distributed File System (HDFS)
were nowhere in the mix.
It was, however, a big part of the backstory. Let’s turn the clock
back just a bit more, to early-2008, when Greenplum made a
move which hinted at what was to come – announcing API-
level support for Hadoop and HDFS. In this way, Greenplum
positioned its MPP appliance as a kind of choreographer for
external MapReduce jobs: by writing to its Hadoop API, devel-
opers could schedule MapReduce jobs to run on Hadoop and
HDFS. The resulting data, data sets, or analysis could then be
recirculated back to the Greenplum RDBMS.
Today, this is one of the schemes by which many in DM
would like to accommodate Hadoop and big data. The differ-
ence, at least relative to half a decade ago, is a kind of frank
acceptance of the inevitability – and, to some extent, of the
desirability – of platform heterogeneity. Part of this has to do
with the “big” in big data: as volumes scale into the double-
or triple-digit terabyte -- or even into the petabyte – range,
technologists in every IT domain must reassess what they’re
doing and where they’re doing it, along with just how they
expect to do it in a timely and cost-effective manner. Bound
up with this is acceptance of the fact that DM can no longer
simply dictate terms: that it must become more responsive to
the concerns and requirements of line-of-business stakehold-
ers, as well as to those of its IT peers; that it must open itself
up to new types of data, new kinds of analytics, new ways of
doing things.
“The overall strategy is one of cooperative computing,”
explains Rick Glick, vice president of technology and archi-
tecture with analytic discovery specialist ParAccel Inc. “When
you’re dealing with terabytes or petabytes [of data], the chal-
lenge is that you want to move as little of it as possible. If
you’ve got these other [data processing] platforms, you inevi-
tably say, ‘Where is the cheapest place to do it?’” This means
proactively adopting technologies or methods that help to
promote agility, reduce latency, and empower line-of-business
users. This means running the “right” workloads in the “right”
place, with “right” being understood as a function of both
timeliness and cost-effectiveness.
(Continued from p12)
3. Determining Access
Apache “Hive” is sometimes called the “data warehouse appli-
cation on top of Hadoop” as it enables a more generalized
access capability for everyday users with its familiar HiveQL (HQL)
format that SQL-familiar users can understand. Hive provides a
semantic layer that allows for the definition of familiar tables
and columns mapped to key-value pairs found in Hadoop. With
virtual tables and columns in place, Hive users can write HQL
to access data within the Hadoop environment.
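The idea of a semantic layer, in which a table definition is laid over raw files and queries are answered by applying that definition at read time, can be sketched in plain Python. This is an illustration of the concept only, not Hive's implementation or syntax; the table definition, file contents, and function names are hypothetical.

```python
# Hypothetical raw file contents as they might sit in Hadoop: tab-delimited text.
RAW_FILE = "7\taustin\t19.99\n8\tnew york\t12.50\n9\taustin\t3.00\n"

# The "semantic layer": a table definition that names and types each field.
ORDERS_TABLE = {
    "columns": [("customer_id", int), ("city", str), ("amount", float)],
    "delimiter": "\t",
}

def scan(table, raw_text):
    """Apply the table definition to raw lines at read time."""
    for line in raw_text.splitlines():
        fields = line.split(table["delimiter"])
        yield {name: cast(value)
               for (name, cast), value in zip(table["columns"], fields)}

def select(table, raw_text, columns, where=lambda row: True):
    """Rough stand-in for SELECT <columns> FROM <table> WHERE <predicate>."""
    return [tuple(row[c] for c in columns)
            for row in scan(table, raw_text) if where(row)]

austin_orders = select(ORDERS_TABLE, RAW_FILE, ("customer_id", "amount"),
                       where=lambda row: row["city"] == "austin")
print(austin_orders)
```

The raw file is never rewritten; only the definition gives it columns. That is why the same data can be exposed to SQL-familiar users without first being loaded into a relational schema.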
More recent is the release of “HCatalog,” which is making
its way into the Apache Hadoop project. HCatalog is the
semantic layer component similar to Hive, and allows for the
definition of virtual tables and columns for communication with
any application, not just HiveQL. Last summer, data visualization
tool Tableau allowed users to work with and visualize Hadoop
data for the first time via HCatalog. Today, many analytic data-
bases are allowing users to work with tables that are views to
HCatalog and Hadoop data. Some vendors also choose to lever-
age Hive as access to Hadoop data by leveraging its semantic
layer and converting user SQL statements into HQL statements.
Expect more BI vendors to follow suit and enable their own con-
nectivity to Hadoop.
There are emerging new agile analytic development methodolo-
gies and processes that enable the iterative and agile nature of
analytics in big data environments for discovery, then couple
that with data governance procedures to properly move the
analytic models to a faster analytic database with operational
controls and access. In this model, companies can store big data
cheaply until its value can be determined, and then move it to
the appropriate production and valued data platform tier. This
could be a map-reduce extract to a relational database data
mart (or cube), or this could be executing the analytic program
in an MPP, columnar, or in-memory high-performance database.
More to Come
While big data has come a long way in just a short amount of
time, it still has a long road ahead as an industry, as a maturing
technology, and as best practices are realized and shared. Don’t
compare your company with mega e-commerce companies (like
Yahoo, Facebook, Google, or LinkedIn) that have lived and breathed big
data as part of their mission-critical core business functions
for many years already. Rather, think of your company as the
other 99% of companies -- small and large -- found in every
industry exploring opportunities to unlock the hidden value in
big data on their own. These companies typically already have a
BI program underway, but now must grapple with the challenge
of maintaining BI delivery from structured operational data
combined with the new integration of big data platforms for
business analysts, customers, and internal consumers.
John O’Brien is the Principal and CEO
of Radiant Advisors, a strategic advisory
and research firm that delivers innova-
tive thought-leadership, publications,
and industry news.
About Radiant Advisors
Research... Advise... Develop...
Radiant Advisors is a strategic advisory and research firm that networks with industry experts to deliver innovative thought-leadership, cutting-edge publications, and in-depth industry research.
Visit www.radiantadvisors.com
Follow us on Twitter! @radiantadvisors