Big Data Analytical Tools



    March 2012

White Paper: Evaluating Big Data Analytical Capabilities For Government Use

    CTOlabs.com

    Inside:

    The Big Data Tool Landscape

    Big Data Tool Evaluation Criteria

    More Resources

    A White Paper providing context and guidance you can use


Evaluating Big Data Analytical Capabilities for Government

    This paper, produced by the analysts and researchers of CTOlabs.com, proposes ten criteria for

    evaluating analytical tools, focused on capabilities in the emerging Big Data space. The methods and

    models here can help you select the best capability for your mission needs.

    Executive Summary

The need for sensemaking across large and growing data stores has given rise to new approaches to data infrastructure, including the use of capabilities like Apache Hadoop. Hadoop overcomes traditional limitations of storage and compute by delivering capabilities that run on commodity hardware and can leverage any data type. Hadoop enables scalability to the largest of data sets in a very cost-effective way, making it the infrastructure of choice for organizations seeking to make sense of their growing data stores. Its ability to store data without a data model means information can be leveraged without first knowing what questions will be asked of the data, making this a system with far more agility than legacy databases.

The core capability of Hadoop has now grown to include a full framework of tools: a data warehouse infrastructure (Hive), a high-level language for parallel computation (Pig), a scalable distributed database able to store large tables (HBase), a scalable distributed file system (HDFS), and tools for rapidly importing and managing data and coordinating the infrastructure (like Sqoop, Flume, Oozie and Zookeeper). This framework of Hadoop tools has given rise to a wave of innovation in sensemaking over large quantities of data and has laid the foundation for dramatic growth of new analytical tools that operate over these Big Data infrastructures.
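The programming model at the heart of this framework is MapReduce, which Hive queries and Pig scripts ultimately compile down to. As a minimal illustration (a local Python simulation, not a real Hadoop job; the function names are ours), here is the classic word-count pattern: mappers emit key-value pairs, the framework sorts and groups them by key, and reducers aggregate each group:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit a (word, 1) pair for each word, as a Hadoop streaming mapper would
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Aggregate all counts for one key, as a streaming reducer would
    return (word, sum(counts))

def run_job(lines):
    # Simulate the shuffle/sort phase: sort mapper output, group by key
    pairs = sorted(p for line in lines for p in mapper(line))
    return dict(reducer(key, (c for _, c in group))
                for key, group in groupby(pairs, key=itemgetter(0)))

counts = run_job(["big data tools", "big data analytics"])
# counts -> {"analytics": 1, "big": 2, "data": 2, "tools": 1}
```

On a real cluster the mappers and reducers run in parallel across commodity nodes over data stored in HDFS, which is what gives the model its scalability.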

Over the last several years, organizations that wanted to leverage this Hadoop framework wrote their own analytical capabilities to ride on top of the infrastructure. Now a new trend has emerged: organizations can turn to commercial vendors who offer analytical packages that ride on top of the Hadoop framework. This positive trend makes it easier to deliver advanced Big Data solutions to end users. The right tool can enable more agile use of your organization's data stores, and can do so quickly. The right tool can also make Big Data analytics so easy that end users can form their own queries and generate their own responses. This development is particularly exciting to knowledge-based government organizations seeking to empower their workforce with up-to-date insights.


A White Paper for the Government IT Community

This paper provides a framework meant to help in your evaluation of Big Data analytical tools. We review ten factors we believe should be paramount in your evaluations of Big Data analytical software packages. We present these factors in a way you should find easily tailorable to your organizational needs.

    Ten Evaluation Factors

    The ten factors we believe should be at the forefront of your decision are:

Mission Functionality/Capability

Ease of Use/Interface

Architecture Approach

Data Architecture

Analytical Models

Licensing

Security and Enterprise Governance

Partner Ecosystem

Deployment Models

Health of the Firm

    We expand on these factors below.

Mission Functionality/Capability: This may be the most important factor in deciding which Big Data analytical tools you leverage in your infrastructure. If the Big Data analytical package you are selecting does not have the capability you expect and need, then no other factor matters. The importance of this factor means you should have a well-thought-out vision you can articulate for your desired capability. For example, do you need a system that can analyze all types of unstructured and structured data? Do you need a solution that enables collaboration between analysts? Or one that focuses on extracting knowledge from existing data stores? Do you want a system that just works in the back office of an IT shop, or one that supports missions by empowering end users?

Ease of Use/Interface: One of the first questions you should ask when choosing Big Data analytical tools is who you intend to have use them. Do you want to increase the capabilities of your data scientists to dig deeper into new questions? Do you want to increase the power of your analysts? Or are you hoping to push analytical capabilities out to your entire workforce? Giving more of your enterprise access to Big Data solutions leads to a more informed and agile workforce and reduces the IT


bottleneck. The same tools that help intelligence analysts map networks can help web developers evaluate guest activity on a website, help the citizen-facing parts of your organization understand citizen requirements and trends, and help HR keep track of workflows and loads. But these capabilities only help if your workforce is willing and able to use them. A powerful tool that takes several specialized degrees, or requires specific expertise such as SQL, will obviously have both limited impact and limited usage; more broadly, many non-IT professionals demand walk-up usability from their information management software. Often entire departments use only a small fraction of the capabilities that powerful analytics provide, because those capabilities are intimidating or hard to access.

Interface matters as much for specialists as it does for your less tech-savvy employees. Your analytics should be able to pose and answer questions across all data quickly and organically, so that they become an extension of the analyst's thought process. A natural interface can be more important than any individual functionality. With smooth and efficient interactions between tools and users, analysts and decision-makers make more and better decisions faster, which is the ultimate goal of analytics.

Architecture Approach: Some solutions require you to establish entire architectures just to support them. This is not a good approach. Other solutions are stand-alone islands that expect you to move all data into their closed system before they can do analysis. This might be acceptable for some missions, but in most cases you will want systems that work with your existing enterprise architecture and can securely move data in and out of the analytical tool.

Your architecture should also help drive the interface into the capability. In most cases, every user in your organization will already have a browser on their device. Shouldn't that be the interface into all your new analytical capabilities as well? Bottom line here: the solution you choose should work with your architecture and should not force you to re-engineer. Expect the new solution to integrate well with what you already have.

Data Architecture: Common standards for data are already key foundational components of most organizations' IT strategies. But integration of new tools can be complicated, requiring extensive set-up and configuration to extract, transform and load data from multiple sources. Tools that require large teams of programmers to build ETL accesses into existing data stores are not going to have the agility required to take advantage of new data sources or to accommodate shifting mission needs or new


business plans. Look for Big Data analytical tools that do not require complex data mappings and schema development, which are time consuming and lock your architecture into a fixed way of working. Look for tools that are designed to work with any type of data (they should be data-source agnostic). Systems that force data to be collected again and imported into their local store, in set formats and indices designed only for that system's use, are sub-optimal and will limit your ability to perform your mission with the flexibility you want. Seek a capability with a designed-in ability to add new data fast, without a need for engineers to design and activate the new data feed. Demand integration without limits.
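To make the "data-source agnostic" idea concrete, here is a minimal sketch (our own illustration in Python; the field names and record shapes are hypothetical) of an ingest step that accepts records in whatever shape they arrive and defers schema decisions until query time, the schema-on-read approach described above:

```python
import csv
import io
import json

def ingest(record):
    # Normalize a record of unknown shape into a plain dict, without
    # imposing a fixed schema up front ("schema on read").
    if isinstance(record, dict):           # already structured (e.g. from an API)
        return dict(record)
    text = record.strip()
    if text.startswith("{"):               # a JSON object serialized as text
        return json.loads(text)
    # Fall back to treating the record as a headerless CSV row
    values = next(csv.reader(io.StringIO(text)))
    return {f"field_{i}": v for i, v in enumerate(values)}

# Three differently shaped records land in one common store
store = [ingest(r) for r in (
    '{"agency": "DOT", "requests": 42}',
    "DOT,42,open",
    {"agency": "GSA", "requests": 7},
)]
```

The point of the sketch is that adding a new feed means adding one normalization branch, not a schema redesign or an engineering project per data source.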

Analytical Models: Analytical systems designed to help with complex issues use ontologies: ways of reflecting associations and meanings. Ontologies are sometimes called the world views of an organization, since they reflect the concepts in the environment the group is dealing with. Simple, basic systems can be found that use a single ontology. These are fine as long as the problem you analyze will never change. Multi-ontology systems enable you to see different perspectives, manage policy by namespace, and better enable discovery of new conclusions. The ability to have multiple models allows multiple issues to be worked, and multiple organizations can make use of the same tool, which lowers overall cost and speeds return on investment. Bottom line here: do not select a tool that forces you to lock in on a particular analytical model.
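A small sketch may help show what "multiple ontologies over the same data" means in practice. In this illustration (entirely hypothetical namespaces and field names, not drawn from any particular product), each namespace maps raw field names to the concepts one community of users reasons about, so the same raw record can be viewed through either world view:

```python
# Hypothetical ontologies: each namespace maps raw field names to the
# concepts that one community of users works with.
ONTOLOGIES = {
    "law_enforcement": {"subject": "person_of_interest", "loc": "incident_site"},
    "public_health": {"subject": "patient", "loc": "exposure_site"},
}

def view(record, namespace):
    # Re-interpret one raw record under a single ontology's world view;
    # fields the ontology does not cover pass through unchanged.
    mapping = ONTOLOGIES[namespace]
    return {mapping.get(key, key): value for key, value in record.items()}

raw = {"subject": "ID-7", "loc": "Grid 4B"}
le_view = view(raw, "law_enforcement")   # person_of_interest / incident_site
ph_view = view(raw, "public_health")     # patient / exposure_site
```

Because the ontology is data rather than code, adding a third perspective is a configuration change, which is the agility the paragraph above argues for.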

Licensing: User organizations should, to the greatest extent possible, push for licensing that is as economical, flexible and predictable as possible. For many analytical tools, a license based on the number of users is a common approach. Some tools license based on the number of processors, servers or cores, so you can be stuck with a high cost even if no one is using the tool. You want systems from companies that are motivated to serve users, so licenses that reflect actual analytics used, regardless of processors or users, are the most flexible and are generally the best for this type of tool. Per-user licensing has practical drawbacks as well: if the mission team needs to be drastically expanded in a short period of time, the project may slow down while more licenses are acquired, and user licenses are, in most cases, acquired for longer periods than the mission lasts. When you compare options, a usage-based license can therefore be significantly lower cost to start and to maintain. You should also be careful about hidden licenses that come along when you buy a Big Data tool. For example, are you also required to buy an Oracle or Sybase license?

Security and Governance: Enterprises require authentication, authorization, auditing and other governance of tools for effective oversight of mission support and for ensured reliability. Expect the capability you support to have options for LDAP/Active Directory integration, role-based access with delegation, integrated encryption methods and strong audit capabilities. Tools working with Hadoop


clusters should have an ability to run in the secure areas of your network that hold the Hadoop master and slave nodes.
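The role-based access and audit requirements above can be sketched in a few lines. This is our own minimal illustration (hypothetical roles and permissions; a real deployment would pull roles from LDAP/Active Directory rather than hard-coding them): every authorization decision is checked against the user's role and recorded for later audit.

```python
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping; in production this would come
# from LDAP/Active Directory group membership, not a hard-coded table.
ROLE_PERMISSIONS = {
    "analyst": {"query", "export"},
    "admin": {"query", "export", "configure"},
}

audit_log = []

def authorize(user, role, action):
    # Decide whether the role permits the action, and record the
    # decision (allowed or not) so auditors can reconstruct activity.
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user, "role": role, "action": action, "allowed": allowed,
    })
    return allowed

ok = authorize("jdoe", "analyst", "query")          # permitted
denied = authorize("jdoe", "analyst", "configure")  # outside the role
```

Note that denied attempts are logged as deliberately as permitted ones; an audit trail that only records successes is of little use to an oversight function.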

Partner and Legacy Ecosystem: Your legacy IT infrastructure comes from a wide range of firms. Any organization of size will have software that operates over data stores from companies like Oracle, Microsoft, Sybase, MySQL, IBM, Cloudera and countless others, and analytical tools from a wide range of vendors are also in your ecosystem. This means any Big Data capability you pick should have great flexibility in working with others in the ecosystem; your Big Data solution must be able to work with anyone. The Big Data capabilities you pick should therefore be designed to enable customization and extension, including an ability to change ontologies, change interfaces, change data sources and change the other tools it interfaces with.

Deployment Models: The capabilities you acquire should be able to run without a large contractor staff. Specialists are frequently required to install a capability, and some level of services and support to your team can be expected, but if you must buy a large number of engineers to keep the Big Data tools running then you really have not bought a solution. You have bought the capability plus engineers, and the cost of that will eat you alive. If you are told that engineers are required, it should raise other alarms. Will there always have to be a wizard behind the curtain?

Health of the Firm: Who are you buying your capability from? Are they a user-focused organization that cares and will be with you long term? This can be hard to evaluate, but it is worth some homework. What if the firm you are dealing with has the great reputation of a pre-crash Enron? How would you know, as a potential user, whether the firm has the ethics and abilities you require? Is the firm having trouble staying afloat? If you are relying on the company for support, you may lose your investment if it closes its doors. This is why the Federal Acquisition Regulations mandate market research. Never skip that step! Research both the capability itself and the firm you are doing business with.

    Concluding Thoughts

There are many other criteria you may want to consider when evaluating Big Data analytical tools, but the ten above are key to ensuring long-term mission success. We also believe it is important to speak with others who have used the tools you are evaluating, to get the benefit of their lessons learned. This is especially important in the current budget environment.


    More Reading

For more on federal technology and policy issues, visit:

CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.

CTOlabs.com - A reference for research and reporting on all IT issues.

Carahsoft.com - Offering Big Data solutions for Government.

    About the Authors

Ryan Kamauf is the lead technology research analyst at Crucial Point LLC, focusing on disruptive technologies of interest to enterprise technologists. He is also a writer at CTOvision.com. Contact Ryan at [email protected]

Bob Gourley is CTO and founder of Crucial Point LLC and editor-in-chief of CTOvision.com. He is a former federal CTO. Contact Bob at [email protected]


For More Information

    If you have questions or would like to discuss this report, please contact me. As an advocate for better

    IT in government, I am committed to keeping the dialogue open on technologies, processes and best

    practices that will keep us moving forward.

Contact: Bob Gourley

    [email protected]

    703-994-0549

All information/data © 2011 CTOLabs.com.