8/2/2019 Big Data Analytical Tools
March 2012
White Paper: Evaluating Big Data Analytical Capabilities for Government Use
CTOlabs.com
Inside:
The Big Data Tool Landscape
Big Data Tool Evaluation Criteria
More Resources
A White Paper providing context and guidance you can use
Evaluating Big Data Analytical Capabilities for Government
This paper, produced by the analysts and researchers of CTOlabs.com, proposes ten criteria for
evaluating analytical tools, focused on capabilities in the emerging Big Data space. The methods and
models here can help you select the best capability for your mission needs.
Executive Summary
The need for sensemaking across large and growing data stores has given rise to new approaches
to data infrastructure, including the use of capabilities like Apache Hadoop. Hadoop overcomes
traditional limitations of storage and compute by delivering capabilities that run on commodity
hardware and can leverage any data type. Hadoop enables scalability to the largest of data sets in a
very cost-effective way, making it the infrastructure of choice for organizations seeking to make sense
of their growing data stores. Its ability to store data without a data model means information can be
leveraged without first knowing what questions will be asked of the data, making this a system
with far more agility than legacy databases.
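To make the "no data model up front" point concrete, here is a minimal Python sketch of the idea usually called schema-on-read. It does not use Hadoop; it assumes, purely for illustration, that raw records arrive as JSON lines, and `RAW_STORE` and `read_with_schema` are hypothetical names. Records are kept exactly as received, and each question applies its own projection only when the data is read.

```python
import json
from typing import Any, Iterable

# Raw records are kept exactly as received; no schema is declared when storing.
# "ms" is an illustrative time-on-page field in milliseconds.
RAW_STORE: list[str] = [
    '{"type": "visit", "page": "/benefits", "ms": 420}',
    '{"type": "visit", "page": "/contact", "ms": 130}',
    '{"type": "complaint", "topic": "wait times"}',
]


def read_with_schema(raw: Iterable[str], fields: list[str]) -> list[dict[str, Any]]:
    """Apply a schema at read time by parsing each record and projecting fields.

    A record that lacks a requested field yields None for it, so records of a
    new shape can be stored without breaking existing queries.
    """
    rows = []
    for line in raw:
        record = json.loads(line)
        rows.append({f: record.get(f) for f in fields})
    return rows


# A question that was not anticipated when the data was stored:
# average time on page for visit records.
visits = [r for r in read_with_schema(RAW_STORE, ["type", "page", "ms"]) if r["type"] == "visit"]
avg_ms = sum(r["ms"] for r in visits) / len(visits)
print(avg_ms)  # 275.0
```

The complaint record shares no fields with the visits, yet it sits in the same store; a later question about complaints would simply request `["type", "topic"]`.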
The core capability of Hadoop has now grown to include a full framework of tools that include a data
warehouse infrastructure (Hive), parallel computation capabilities (Pig), scalable distributed databases able to store large tables (HBase), scalable means of distributing data (HDFS) and tools for rapidly
importing and managing data and coordinating the infrastructure (like Sqoop, Flume, Oozie and
ZooKeeper). The use of this framework of Hadoop tools has given rise to a new wave of innovation in
sensemaking over large quantities of data and has laid the foundation for dramatic growth in new
analytical tools that can operate over these Big Data infrastructures.
Over the last several years, organizations that wanted to leverage this Hadoop framework wrote
their own analytical capabilities to ride on top of the infrastructure. Now a new trend has emerged:
organizations can turn to commercial vendors who offer analytical packages that ride on top of the
Hadoop framework. This positive trend makes it easier to deliver advanced Big Data solutions to end
users. The right tool can enable more agile use of your organization's data stores, and can do so quickly.
The right tool can also make Big Data analytics so easy that end users can form their own queries and
generate their own answers. This development is particularly exciting for knowledge-based
government organizations seeking to empower their workforce with up-to-date insights.
A White Paper for the Government IT Community
This paper provides a framework meant to help in your evaluation of Big Data analytical tools. We review ten factors we believe should be paramount in your evaluations of Big Data analytical software
packages. We present these factors in a way you should find easily tailorable to your organizational
needs.
Ten Evaluation Factors
The ten factors we believe should be at the forefront of your decision are:
Mission Functionality/Capability
Ease of Use/Interface
Architecture Approach
Data Architecture
Analytical Models
Licensing
Security and Enterprise Governance
Partner Ecosystem
Deployment Models
Health of the Firm
We expand on these factors below.
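One way to tailor these factors, sketched here in Python as an assumption rather than a method this paper prescribes, is a weighted scorecard. The weights, the 0-5 rating scale and the `mission_floor` cutoff are all hypothetical placeholders; the cutoff encodes the point made below that if a tool lacks the mission capability you need, no other factor matters.

```python
import math

MISSION = "Mission Functionality/Capability"

# Hypothetical weights for the ten factors; tailor them to your own mission.
WEIGHTS: dict[str, float] = {
    MISSION: 0.25,
    "Ease of Use/Interface": 0.15,
    "Architecture Approach": 0.10,
    "Data Architecture": 0.10,
    "Analytical Models": 0.10,
    "Licensing": 0.08,
    "Security and Enterprise Governance": 0.08,
    "Partner Ecosystem": 0.05,
    "Deployment Models": 0.05,
    "Health of the Firm": 0.04,
}


def score_tool(ratings: dict[str, float], weights: dict[str, float] = WEIGHTS,
               mission_floor: float = 3.0) -> float:
    """Return a weighted score on the same 0-5 scale as the per-factor ratings.

    A tool rated below mission_floor on mission functionality scores 0.0,
    because without the needed capability no other factor matters.
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"unrated factors: {sorted(missing)}")
    if ratings[MISSION] < mission_floor:
        return 0.0
    return sum(weights[f] * ratings[f] for f in weights)


tool_a = {factor: 4.0 for factor in WEIGHTS}  # solid across the board
tool_b = {**tool_a, MISSION: 2.0}             # polished, but lacks the core capability
print(round(score_tool(tool_a), 2))  # 4.0
print(score_tool(tool_b))            # 0.0
```

Adjusting `WEIGHTS` is where the tailoring happens; the gate on mission functionality keeps a polished tool that cannot do the job from outscoring one that can.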
Mission Functionality/Capability: This may be the most important factor in deciding which Big
Data analytical tools you decide to leverage in your infrastructure. If the Big Data analytical package
you are selecting does not have the capability you expect and need, then no other factor matters. The
importance of this factor in evaluating solutions means you should have a well-thought-out vision
you can articulate for your desired capability. For example, do you need a system that can analyze
all types of unstructured and structured data? Do you need a solution that enables collaboration
between analysts? Or one that has a focus on extracting knowledge from existing data stores? Do you
want a system that just works in the back office of an IT shop, or one that supports missions by
empowering end users?
Ease of Use/Interface: One of the first questions you should ask when choosing Big Data analytical
tools is who will use them. Do you want to increase the capabilities of your data scientists
to dig deeper into new questions? Do you want to increase the power of your analysts? Or are you
hoping to push analytical capabilities out to your entire workforce? Giving more of your enterprise
access to Big Data solutions leads to a more informed and agile workforce and reduces the IT
bottleneck. The same tools that help intelligence analysts map networks can help web developers
evaluate visitor activity on a website, help the citizen-facing parts of your organization understand
citizen requirements and trends, and help HR keep track of workflows and workloads. But these capabilities
only help if your workforce is willing and able to use them. A powerful tool that takes several
specialized degrees or requires specific expertise such as SQL to use will obviously have both limited
impact and limited usage; more broadly, many non-IT professionals demand walk-up usability
from their information management software. Often entire departments use only a small fraction of
the capabilities that powerful analytics provide because those capabilities are intimidating or hard to access.
Interface matters as much for specialists as it does for your less tech-savvy employees. Your analytics
should let analysts pose and answer questions across all data quickly and organically, so that the tools
become an extension of the analyst's thought process. A natural interface can be more important than
any individual function. With smooth and efficient interactions between tools and users, analysts
and decision-makers make more and better decisions faster, which is the ultimate goal of analytics.
Architecture Approach: Some solutions require you to establish entire architectures just to support
them. This is not a good approach. Other solutions are their own stand-alone islands and expect you
to get all data into their closed system before they can perform analysis. This might be acceptable for some missions, but
in most cases you will want systems that work with your existing enterprise architecture and are able
to securely move data in and out of the analytical tool.
Your architecture should also help drive the interface into the capability. In most cases, every user in
your organization will have a browser on their device already. Shouldn't that be the interface into all
your new analytical capabilities as well? Bottom line here: The solution you choose should work with
your architecture and should not force you to re-engineer. Expect the new solution to integrate well
with what you already have.
Data Architecture: Common standards for data are already key foundational components of most
organizations' IT strategies. But integration of new tools can be complicated, requiring extensive set-up
and configuration to extract, transform and load (ETL) data from multiple sources. Tools that require large teams of programmers to build ETL access into existing data stores will not have the agility
required to take advantage of new data sources or to accommodate shifting mission needs or new
business plans. Look for Big Data analytical tools that do not require complex data mappings and time-consuming schema development that lock your architecture into a fixed way of working.
Look for tools that are designed to work with any type of data (they should be data source agnostic).
Systems that force data to be collected again and imported into their local store in set formats and
indices designed only for that system's use are sub-optimal and will limit your ability to perform your
mission with the flexibility you want. Seek a capability designed to add new data sources quickly,
without a need for engineers to design and activate each new data feed. Demand integration
without limits.
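A data-source-agnostic design usually means a thin connector layer: adding a source means adding one small reader, not re-engineering the store or the analysis. The Python sketch below assumes CSV and JSON-lines text as example sources; `register`, `ingest` and the connector registry are hypothetical names, not any product's API.

```python
import csv
import io
import json
from typing import Callable, Iterator

# A connector turns raw text from one kind of source into plain dict records.
Connector = Callable[[str], Iterator[dict]]

CONNECTORS: dict[str, Connector] = {}


def register(kind: str):
    """Decorator that adds a connector to the registry under a source kind."""
    def wrap(fn: Connector) -> Connector:
        CONNECTORS[kind] = fn
        return fn
    return wrap


@register("csv")
def read_csv(text: str) -> Iterator[dict]:
    # Header row supplies the field names; no up-front schema mapping.
    yield from csv.DictReader(io.StringIO(text))


@register("jsonl")
def read_jsonl(text: str) -> Iterator[dict]:
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def ingest(kind: str, text: str) -> list[dict]:
    """Pull records from any registered source; downstream analysis is unchanged."""
    try:
        connector = CONNECTORS[kind]
    except KeyError:
        raise ValueError(f"no connector registered for {kind!r}") from None
    return list(connector(text))


records = ingest("csv", "name,city\nAda,Arlington\n") + ingest("jsonl", '{"name": "Bob", "city": "Reston"}\n')
print([r["city"] for r in records])  # ['Arlington', 'Reston']
```

Supporting a new feed means writing and registering one more reader function; nothing that consumes `ingest()` has to change.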
Analytical Models: Analytical systems designed to help with complex issues use ontologies. These
are ways of reflecting associations and meanings. Ontologies are sometimes called the world views of
an organization, since they reflect concepts in the environment that the group is dealing with. Simple,
basic systems can be found that use a single ontology. These are adequate only as long as the problem
you analyze never changes. Multi-ontology systems enable you to see different perspectives
and manage policy by namespace. Multi-ontology systems also better enable discovery of new
conclusions. The ability to have multiple models allows multiple issues to be worked, and lets multiple
organizations make use of the same tool. This lowers overall cost and speeds return on investment.
Bottom line here: Do not select a tool that forces you to lock in on a particular analytical model.
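The namespace idea can be illustrated with a toy store that keeps several ontologies side by side and applies an access policy per namespace. This is a hypothetical Python sketch (the class, namespaces, concepts and roles are invented for illustration), not how any particular product represents ontologies; richer formal languages such as OWL exist for this purpose.

```python
from collections import defaultdict


class OntologyStore:
    """Several ontologies side by side, each in its own namespace.

    Each namespace maps a concept to the concepts associated with it, and
    carries its own set of roles allowed to read it (policy by namespace).
    """

    def __init__(self) -> None:
        self._links: dict[str, defaultdict[str, set[str]]] = {}
        self._readers: dict[str, set[str]] = {}

    def add_namespace(self, ns: str, readers: set[str]) -> None:
        self._links[ns] = defaultdict(set)
        self._readers[ns] = set(readers)

    def associate(self, ns: str, a: str, b: str) -> None:
        # Associations are symmetric and live only in the given namespace.
        self._links[ns][a].add(b)
        self._links[ns][b].add(a)

    def related(self, concept: str, role: str) -> dict[str, set[str]]:
        """Each readable namespace's view of a concept, keyed by namespace."""
        return {
            ns: set(links[concept])
            for ns, links in self._links.items()
            if role in self._readers[ns] and concept in links
        }


store = OntologyStore()
store.add_namespace("logistics", readers={"analyst", "planner"})
store.add_namespace("intel", readers={"analyst"})
store.associate("logistics", "Port X", "Fuel depot")
store.associate("intel", "Port X", "Vessel Y")
print(store.related("Port X", "analyst"))  # {'logistics': {'Fuel depot'}, 'intel': {'Vessel Y'}}
print(store.related("Port X", "planner"))  # {'logistics': {'Fuel depot'}}
```

The same concept carries different associations in each namespace, and a role sees only the perspectives its policy allows; a single-ontology design would force both views into one model.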
Licensing: User organizations should, to the greatest extent possible, push for licensing that is economical, flexible and predictable. For many analytical tools, licensing by number of users
is a common approach. Some tools license based on the number of processors, servers or cores,
so you can be stuck with a high cost even if no one is using the tool. You want systems from
companies that are motivated to serve users, so licenses that reflect the analytics actually used, regardless of
processors or users, are the most flexible and are generally the best for this type of tool. Per-user licensing, for example,
can slow a project down if the mission team needs to expand drastically in a short period of time and more licenses must be acquired. User licenses are also, in most cases, acquired for longer
periods of time than a mission lasts, so when you compare options, usage-based licensing can be significantly
less expensive to start and to maintain. You should also watch for other licenses that are hidden
when you buy a Big Data tool. For example, are you also required to buy an Oracle or Sybase license?
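The licensing argument is essentially arithmetic, so a small cost model can help compare offers. Every price below, the 64-core cluster and the assumption that per-user seats are bought for the peak headcount and held for the whole term are hypothetical; substitute figures from actual quotes.

```python
from dataclasses import dataclass


@dataclass
class Month:
    users: int    # analysts actively using the tool that month
    queries: int  # analytic jobs actually run that month


# Hypothetical figures, for illustration only.
CORES = 64
PER_CORE_MONTH = 50.0   # owed for every core, whether or not anyone uses the tool
PER_USER_MONTH = 200.0  # owed for every seat held
PER_QUERY = 2.0         # owed only for analytics actually run


def per_core_cost(months: list[Month]) -> float:
    # Fixed fee tied to hardware: idle months cost the same as busy ones.
    return len(months) * CORES * PER_CORE_MONTH


def per_user_cost(months: list[Month]) -> float:
    # Assumes seats are bought for the peak headcount and held for the whole term.
    seats = max(m.users for m in months)
    return len(months) * seats * PER_USER_MONTH


def usage_cost(months: list[Month]) -> float:
    # Pay only for the analytics actually used.
    return sum(m.queries for m in months) * PER_QUERY


# A four-month term: small team, sudden surge, wind-down, then the mission ends.
term = [Month(5, 300), Month(40, 2500), Month(5, 200), Month(0, 0)]
for name, fn in [("per core", per_core_cost), ("per user", per_user_cost), ("usage", usage_cost)]:
    print(name, fn(term))
# per core 12800.0
# per user 32000.0
# usage 6000.0
```

With these invented inputs, usage-based pricing is cheapest and per-user pricing costs most because of the one-month surge; with steady, heavy use the ranking can flip, which is why it is worth modeling your own expected usage. Any hidden third-party database license would be an additional fixed line item on top of these figures.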
Security and Governance: Enterprises require authentication, authorization, auditing and other
governance of tools for effective oversight of mission support and for assured reliability. Expect the
capability you choose to have options for LDAP/Active Directory integration, role-based access with
delegation, integrated encryption methods and strong audit capabilities. Tools working with Hadoop
clusters should be able to run in the secure areas of your network that hold the Hadoop master and slave nodes.
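The governance features listed above can be illustrated in miniature. This hypothetical Python sketch shows role-based access, delegation limited to roles the grantor already holds, and an audit log that records denials as well as approvals. It does not integrate with LDAP/Active Directory or perform any encryption; `grant()` simply stands in for roles that a directory service would supply.

```python
from datetime import datetime, timezone


class AccessControl:
    """Toy role-based access control with delegation and an append-only audit log."""

    def __init__(self, role_perms: dict[str, set[str]]) -> None:
        self.role_perms = role_perms               # role -> permissions it grants
        self.user_roles: dict[str, set[str]] = {}  # user -> roles held
        self.audit: list[tuple[str, str, str, str, bool]] = []

    def _log(self, user: str, action: str, target: str, allowed: bool) -> None:
        # Every decision is recorded with a UTC timestamp, including denials.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append((stamp, user, action, target, allowed))

    def grant(self, user: str, role: str) -> None:
        # Stand-in for roles synced from a directory service; not logged here.
        self.user_roles.setdefault(user, set()).add(role)

    def delegate(self, grantor: str, grantee: str, role: str) -> bool:
        """Let a user pass on a role, but only one they already hold."""
        allowed = role in self.user_roles.get(grantor, set())
        if allowed:
            self.grant(grantee, role)
        self._log(grantor, "delegate", f"{role} -> {grantee}", allowed)
        return allowed

    def check(self, user: str, permission: str) -> bool:
        allowed = any(permission in self.role_perms.get(role, set())
                      for role in self.user_roles.get(user, set()))
        self._log(user, "check", permission, allowed)
        return allowed


ac = AccessControl({"analyst": {"query"}, "admin": {"query", "load_data"}})
ac.grant("alice", "analyst")
print(ac.delegate("alice", "bob", "analyst"))  # True
print(ac.delegate("bob", "carol", "admin"))    # False: bob does not hold admin
print(ac.check("bob", "query"))                # True
print(ac.check("bob", "load_data"))            # False
print(len(ac.audit))                           # 4
```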
Partner and Legacy Ecosystem: Your legacy IT infrastructure comes from a wide range of firms. Any
organization of size will have software that operates over datastores from companies like Oracle,
Microsoft, Sybase, MySQL, IBM, Cloudera and countless others. And analytical tools from a wide
range of vendors are also in your ecosystem. This means any Big Data capability you pick should have
great flexibility in working with others in the ecosystem. Your Big Data solution must be able to work
with anyone. So the Big Data capabilities you pick should be designed to enable customization and
extension. This includes an ability to change ontologies, change interfaces, change data sources and
change the other tools that it interfaces to.
Deployment Models: The capabilities you acquire should be able to run without a large contractor
staff. Specialists are frequently required to install a capability, and some level of services and support
to your team can be expected, but if you must buy a large number of engineers to keep the Big
Data tools running, then you have not really bought a solution. You have bought the capability plus
engineers, and the cost of that will eat you alive. If you are told that engineers are required, it should
raise further alarms. Will there always have to be a wizard behind the curtain?
Health of the Firm: Who are you buying your capability from? Are they a user-focused organization that cares and will be with you long term? This can be hard to evaluate, but it is worth some homework.
What if the firm you are dealing with has a great reputation, as Enron did before its collapse? How would you
know as a potential user whether the firm has the ethics and abilities you require? Is this firm having trouble
staying afloat? If you are relying on the company for support, you may lose your investment if it closes
its doors. This is why the Federal Acquisition Regulation requires the government to conduct market research.
Never skip that step! Research the capability itself and the firm you are doing business
with.
Concluding Thoughts
There are many other criteria you may want to consider for evaluating Big Data analytical tools, but the
ten above are key for ensuring long-term mission success. We also believe it is important to speak with
others who have used the tools you are evaluating, so you can benefit from their lessons learned.
This is especially important in the current budget environment.
More Reading
For more on federal IaaS technology and policy issues, visit:
CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.
CTOlabs.com - A reference for research and reporting on all IT issues.
Carahsoft.com - Offering Big Data solutions for Government.
About the Authors
Ryan Kamauf is the lead technology research analyst at Crucial Point LLC, focusing on disruptive
technologies of interest to enterprise technologists. He is also a writer at CTOvision.com. Contact Ryan
Bob Gourley is CTO and founder of Crucial Point LLC and editor in chief of CTOvision.com. He is a former
federal CTO. Contact Bob at [email protected]
For More Information
If you have questions or would like to discuss this report, please contact me. As an advocate for better
IT in government, I am committed to keeping the dialogue open on technologies, processes and best
practices that will keep us moving forward.
Contact: Bob Gourley
703-994-0549
All information/data © 2011 CTOLabs.com.