

This report reviews Datameer's user-focused Big Data solution in the context of government missions.


May 2012

White Paper: Datameer’s User-Focused Big Data Solutions

CTOlabs.com

Inside:
• Overview of the Big Data Framework
• Datameer’s Approach
• Consideration for Deployment

A White Paper providing context and guidance you can use


Datameer: Bringing Big Data To All

This paper, produced by the analysts and researchers of CTOlabs.com, provides an overview of Datameer, one of the pioneers of the Big Data movement. Datameer provides end-user-focused capabilities that enable self-service and real-time interaction with data.

Executive Summary

Datameer provides a complete analytics platform built around its users. Users see familiar interfaces and interactive visualizations that are easy to manipulate, supported by a back end that integrates all of the enterprise’s data resources. The result: powerful Big Data solutions reach users in a way that lets them interact directly with data instead of being forced to work through development teams.

Background

Apache Hadoop is the leading open-source Big Data platform, with an ecosystem of software to inexpensively store, process, and analyze almost any type of information from any source. Hadoop is renowned for its ability to run on commodity hardware, and it enables fast, distributed analysis that runs in parallel across the servers in a cluster. Hadoop is reliable, managing and healing itself. It scales linearly, working as well with one terabyte of data across three nodes as it does with petabytes of data across thousands. It is affordable, costing much less per terabyte to store and process data than traditional alternatives. And it is agile, allowing users to load raw data into the system and apply a “schema on read” approach, which structures the data according to how it is requested.
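To make the “schema on read” idea concrete, the following is a minimal sketch in plain Python rather than Hadoop itself. The record layout, the field names, and the `read_with_schema` helper are assumptions invented for illustration. Raw records are stored untouched, and each question applies its own schema only when the data is read.

```python
import json

# Raw records are kept exactly as they arrived; nothing is validated on load.
# The fields shown here are hypothetical.
RAW_STORE = [
    '{"user": "alice", "bytes": "512", "ts": "2012-05-01T10:00:00"}',
    '{"user": "bob", "bytes": "2048"}',
    '{"user": "carol", "bytes": "128", "country": "DE"}',
]

def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields and cast them, tolerating gaps."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }

# Two different questions impose two different schemas on the same raw data.
traffic_schema = {"user": str, "bytes": int}
geo_schema = {"user": str, "country": str}

traffic_rows = list(read_with_schema(RAW_STORE, traffic_schema))
geo_rows = list(read_with_schema(RAW_STORE, geo_schema))
total_bytes = sum(row["bytes"] for row in traffic_rows)
```

Because no structure is fixed when the data is loaded, a new question only needs a new schema dictionary, not a reload of the data.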

The Challenges of Hadoop

Enterprises face many common challenges when implementing Hadoop. Until the arrival of end-user-focused solutions like Datameer, Hadoop clusters had to be integrated, programmed, and queried by programming specialists or data scientists. This works for firms like Google, LinkedIn, Facebook, and Twitter, which can hire scores of computer scientists and data engineers, but in most firms the analyst needs the help of additional programmers to build a query. Datameer’s approach has changed this by bringing powerful graphical user interface tools and easy-to-implement Big Data solutions directly to the user. Businesses and government agencies can rely on Datameer to tap into all data sources while giving users who are unfamiliar with Hadoop programming paradigms tools they already know. The analysts who can benefit the most from Big Data now have a tool purpose-built for their needs.

The Datameer Approach

Datameer was designed to let analysts and other Big Data end-users benefit from Hadoop. Datameer is the first business intelligence and analytics platform built natively on Hadoop to allow end-user analysis and correlation of structured, semi-structured, and unstructured data of any size. Datameer runs on all major Hadoop distributions and integrates into existing IT infrastructure with point-and-click deployment. It can be easily deployed over any Hadoop cluster, whether in-house or in public cloud environments like those at Amazon or Rackspace. Datameer integrates with legacy technologies and data streams, including existing business intelligence data warehouses, transactional databases, and other analytic stores. It also works with newer NoSQL technologies.

With Datameer, you can integrate, analyze, and visualize data of any volume, variety, and velocity, enabling numerous Big Data use cases. It works well for large-scale data mining and text analysis because it can import massive amounts of data in parallel. Datameer can find correlations across structured and unstructured data such as phone records, social media, and text for pattern detection by joining any type of data at any size. Datameer can also help in cyber security monitoring, for example by importing large numbers of log files from disparate servers and analyzing them together for anomalous behavior.
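The log-monitoring use case can be sketched in a few lines of plain Python. This is only an illustration of the idea of analyzing logs from several servers together; it is not Datameer functionality. The log format, the server names, and the threshold rule (a count at least three times the median) are all assumptions made for the example.

```python
from collections import Counter
from statistics import median

# Hypothetical log lines from two servers; the "<ip> <event>" format is invented.
LOGS = {
    "web01": [
        "10.0.0.5 LOGIN_FAIL",
        "10.0.0.5 LOGIN_OK",
        "10.0.0.9 LOGIN_FAIL",
        "10.0.0.9 LOGIN_FAIL",
    ],
    "web02": [
        "10.0.0.9 LOGIN_FAIL",
        "10.0.0.9 LOGIN_FAIL",
        "10.0.0.7 LOGIN_FAIL",
    ],
}

def failed_logins_by_ip(logs):
    """Count failed logins per source IP across every server's log."""
    counts = Counter()
    for lines in logs.values():
        for line in lines:
            ip, event = line.split()
            if event == "LOGIN_FAIL":
                counts[ip] += 1
    return counts

def anomalous_ips(counts, factor=3):
    """Flag IPs whose failure count is at least `factor` times the median count."""
    baseline = median(counts.values())
    return sorted(ip for ip, n in counts.items() if n >= factor * baseline)

counts = failed_logins_by_ip(LOGS)
suspects = anomalous_ips(counts)
```

In this sample, the address 10.0.0.9 fails only twice on each server, which would not stand out in either log alone. It is flagged only once the two logs are counted together.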


Datameer and Zvents

“Zvents is the leading online platform for the discovery and promotion of local entertainment, including concerts, movies, restaurants, theaters, and more. We connect 35 million monthly uniques with over 140,000 local promoters, via a network of over 300 branded media partners, creating the largest local entertainment audience on the Web. Data is critical to any web business, and gaining rapid insights in the fast-moving world of live events is even more critical. Datameer has enabled us to leverage our considerable investments in “Big Data” technology, including Hadoop and Hypertable, to rapidly discover actionable business insights that enable us to better serve our users and our advertisers. Datameer has given us a scalable, flexible, cost-effective way to structure and analyze terabytes of behavioral click data, driving new product initiatives like “Top 40,” our new trending list of hot events at top40.zvents.com.”

Ethan Stock, CEO and Founder, Zvents Inc.


Datameer can do all of this because it is a complete analytics platform that supports data integration, analysis, visualization, and security while focusing on the data analysts and scientists who turn raw information into intelligence. To bring data of all sorts together, Datameer provides wizard-based data integration with over 20 pre-built connectors. These provide immediate access to common data sources, including relational databases such as Teradata, Greenplum, Vertica, Oracle, DB2, Microsoft SQL Server, and MySQL, along with file formats and feeds such as CSV, fixed-length, JSON, XML, Mbox, Apache log files, and Twitter. Datameer also has connectors for the Hadoop Distributed File System and the Hadoop database systems Hive and HBase.

For The Analyst

For analysis, Datameer provides a familiar spreadsheet user interface that requires no programming to design end-to-end data processing pipelines. Datameer provides over 200 pre-built functions for exploring and discovering complex relationships. These range from basics such as aggregation to advanced functions for text analysis, mathematics, bioinformatics, engineering, and statistics.

Once users integrate and analyze their data, they can visualize the results using simple drag-and-drop wizards for creating visualizations and dashboards. An extensive library of widgets, including tables, charts, graphs, and maps, lets users choose the visualization that will best help them understand the results.

With Datameer, analysts and data scientists can focus on what they do best, getting insights from data, instead of writing code. Datameer automatically compiles a workbook of spreadsheets into efficient Hadoop MapReduce execution plans; it then monitors their progress, status, and throughput to detect problems. For users who want to go deeper, it offers open APIs for custom data integration, analytics, and visualizations.
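For readers unfamiliar with MapReduce, the sketch below shows in plain Python how a spreadsheet-style request such as “group by category, sum clicks” breaks down into map, shuffle, and reduce steps. It illustrates the general MapReduce pattern only; it is not Datameer’s compiler or the Hadoop API, and the sample rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical click records, as they might appear in a worksheet.
ROWS = [
    {"category": "concerts", "clicks": 3},
    {"category": "movies", "clicks": 5},
    {"category": "concerts", "clicks": 4},
]

def map_phase(rows):
    """Emit one (key, value) pair per input row."""
    for row in rows:
        yield row["category"], row["clicks"]

def shuffle_phase(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single aggregate."""
    return {key: sum(values) for key, values in groups.items()}

clicks_by_category = reduce_phase(shuffle_phase(map_phase(ROWS)))
```

On a real cluster the map and reduce steps run in parallel on many machines. The point of a spreadsheet front end is that the analyst states only the grouping and the aggregate, and never writes these phases by hand.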

Datameer and Nurago

“As a result of using Datameer, nurago is better able to help our clients identify and analyze patterns in behavioral data of panel members. Datameer helps us as a market research vendor to scale for our most granular data requirements and greatly simplifies the integration of multiple sources. In addition, Datameer makes reporting on big data analytics directly accessible to our analysts so that they don’t need to turn to developers for their requirements.”

Nikolaus Pohle, CTO of nurago


A White Paper For The Federal IT Community

As a result, Datameer provides powerful, agile analytics to support your organization’s mission. Adding new data sources is quick and easy. Because it runs on Hadoop, Datameer scales storage and computation with the cluster and does not require pre-defined data models, so usage is never constrained by up-front system design. By focusing on the end-user, Datameer also removes the need for users to have deep technical expertise in Hadoop. This lets any analyst, from any domain, at any site, and with any skillset contribute, because Big Data analytics are delivered in spreadsheets that can be accessed and updated instantly worldwide. Lastly, by simplifying the process and removing the IT bottleneck, Datameer removes limitations in “time-to-trigger”, letting users develop and run Hadoop-based analytics jobs in minutes.

Datameer in Government

Government agencies have been leveraging the Big Data movement to directly support many government missions. Agencies have been using Big Data approaches in missions supporting Healthcare, Education, Environmental Research, Law Enforcement, Defense, Intelligence and numerous other activities. Early adopters in these communities have been leading the way in open source solutions and contributions back to the broader community. Solutions have been deployed throughout the federal space including on most publicly facing government web properties.

The initial government foray into Big Data has in many ways mirrored the Big Data movement in industry. Even today, most government missions can only be served by teams of data scientists and engineers. Few, if any, user-centered Big Data approaches are in use in the government.

We believe that is about to change. With Datameer available to every government knowledge worker through a browser, citizen services and mission support can be delivered in new, highly efficient and effective ways.


Datameer and Attributor

“Attributor’s selection of Datameer was driven by our need to quickly provide analytics to our clients. Datameer’s ease-of-use, seamless integration with Cloudera’s CDH, HBase and MySQL and ability to correlate structured and unstructured data on day one has already saved us both time and money in running thousands of analytics jobs for our users.”

Matt Robinson, President and COO, Attributor


Concluding Thoughts

Government agencies have already established visions and goals for Big Data approaches that serve their missions, and Datameer has a proven user-focused approach to leveraging all organizational data for analysis. For both reasons, we believe Datameer is poised for rapid growth in the federal sector.

A logical next step for most government agencies, and for the systems integrators, architects, and engineers that support them, is to begin a proof-of-concept activity to see firsthand how Datameer can work in their environment.

More Reading

For more on federal Big Data technology and policy issues, visit:

• CTOvision.com - A blog for enterprise technologists with a special focus on Big Data.
• CTOlabs.com - A reference for research and reporting on all IT issues.
• Carahsoft.com - Offering Big Data solutions for Government.
• GovernmentBigDataForum.com - Join the Government Big Data Forum.
• J.mp/ctonews - Sign up for the Government Big Data Newsletter.
• Datameer.com - Visit for more on how Datameer works and to arrange a proof of concept.


About the Authors

Bob Gourley is CTO and founder of Crucial Point LLC and editor in chief of CTOvision.com. He is a former federal CTO. His career included service in operational intelligence centers around the globe, where his focus was operational all-source intelligence analysis. He was the first director of intelligence at DoD’s Joint Task Force for Computer Network Defense, served as director of technology for a division of Northrop Grumman, and spent three years as the CTO of the Defense Intelligence Agency. Bob serves on numerous government and industry advisory boards.

Contact Bob at [email protected]

Alexander Olesker is a technology research analyst at Crucial Point LLC, focusing on disruptive technologies of interest to enterprise technologists. He writes at http://ctovision.com. Alex is a graduate of the Edmund A. Walsh School of Foreign Service at Georgetown University with a degree in Science, Technology, and International Affairs. He researches and writes on developments in technology and government best practices for CTOvision.com and CTOlabs.com, and has written numerous whitepapers on these subjects. Alex has worked or interned in early childhood education, private intelligence, law enforcement, and academia. He has contributed to numerous publications on technology, international affairs, and security, and has lectured at Georgetown and in the Netherlands. Alex is also the founder and primary contributor of an international security blog that has been quoted and featured by numerous pundits and by the War Studies blog of King’s College, London. Alex is a fluent Russian speaker and proficient in French.

Contact Alex at [email protected]


For More Information

If you have questions or would like to discuss this report, please contact me. As an advocate for better IT in government, I am committed to keeping the dialogue open on technologies, processes and best practices that will keep us moving forward.

Contact: Bob Gourley [email protected]

All information/data ©2011 CTOLabs.com.