
Daum Communications Case Study

DESCRIPTION

Daum evaluated solutions that could address the limitations in the resource-intensive analysis required by Hadoop and the NoSQL database management systems. To meet the data analysis requirements for its search engine and the Internet services businesses, the company selected Pivotal Greenplum Database, which connects to Hadoop and enables the co-processing of both structured and unstructured data within a single solution. To learn more, visit pivotal.io/big-data/pivotal-greenplum-database.


DAUM COMMUNICATIONS Using big data analytics to understand and predict user behavior

ESSENTIALS

Industry

Telecommunications

Company Size

2,000+ employees

Business Challenges

• Reduced responsiveness due to inability to perform realtime analysis

• Increased complexity from NoSQL database management systems

• Reliance on resource-intensive data analysis

• Reduced capability to make ad-hoc queries on unstructured data

Solution

• EMC VNX unified storage

• Pivotal Greenplum Database

OVERVIEW Daum Communications (Daum) is one of the leading providers of Korean-language online services, including the news and information portal Daum.net, web-based email service Hanmail.net, and the Daum Cafe online community. Headquartered in Jeju Island, the company provides mobile web services, search marketing, and electronic mapping. It also sells online advertising products through Daum.net. Daum is the second largest web portal service provider in terms of daily visits in Korea and has operating centers in Seoul and on Jeju Island.

Through its extensive range of Internet services and sale of online advertising products, Daum generates vast amounts of unstructured data. The company has one of the largest Apache Hadoop clusters in Korea, and analyzes its data to gain critical competitive information in a number of areas, including user preferences and behavior, search rankings, and advertisement targeting.

COMPLEX ENVIRONMENT IMPEDES DATA ANALYSIS Facing intense domestic and global competition from a number of search engines that are growing market share across desktop and mobile searches, Daum’s businesses needed to make faster and better decisions to protect the company’s 20 percent share of the Korean search market.

The company needed to analyze and make immediate decisions on its vast data stores by extracting knowledge from its data in real time. But Daum was more interested in solving analytic problems than in exploring the relationships between data that traditional relational database systems expose. As a result, Daum was storing data in Hadoop, on the Hadoop Distributed File System (HDFS), and was using non-relational systems such as the Cassandra NoSQL database and the Storm stream processor to provide greater speed in performing Big Data analytics on unstructured data. This solution landscape presented the company with serious challenges.

“Performing ad-hoc and multidimensional queries and analysis through Hadoop on our unstructured data proved difficult,” says Jun-Sik Eom, Team Manager, Data Technology Department, Daum Communications. “We were restricted in the speed of data analysis due to the batch processing of both unstructured and structured data, which meant we relied heavily on the capability of our developers. Data analysis of complex forms was also challenging in the NoSQL database.”

Because Daum’s data must be constantly reviewed, the company sought a solution that would enable employees to perform high-speed queries on the data residing in Hadoop. Additionally, Daum wanted to improve access through tools that were already familiar to developers and database administrators.

CUSTOMER PROFILE


Benefits

• Increased data loading and processing speeds

• Improved accuracy in generating search results and predicting user behavior

• Increased efficiency by performing rapid queries on the data

• Reduced expenditures through improved scalability

PIVOTAL GREENPLUM DATABASE ENABLES HIGH-SPEED ANALYSIS OF UNSTRUCTURED DATA Daum evaluated solutions that could address the limitations in the resource-intensive analysis required by Hadoop and the NoSQL database management systems. To meet the data analysis requirements for its search engine and Internet services businesses, the company selected Pivotal Greenplum Database, which connects to Hadoop and enables the co-processing of both structured and unstructured data within a single solution.

“We were attracted to Pivotal Greenplum Database because of the advantage it had in mixing the merits of database, data warehouse, and business intelligence,” says Eom. “We can now use a single platform to run high-speed analytic queries on our most appropriate data stores.”


DELIVERING NEW BUSINESS INSIGHTS FROM REALTIME ANALYSIS To support its efforts to gain market share, Daum is using Pivotal Greenplum Database to provide improved services and search accuracy to its users. Through realtime data gathering and analysis of Internet searches and user behavior within its various online services, the company can better predict future behavior and demand.

Daum can now run multiple queries, both in real time and over time as user patterns and knowledge emerge, thanks to the massively parallel processing (MPP) architecture of Pivotal Greenplum Database, which enables fast data loading and high-speed queries on the data. In addition to performing real-time weblog analysis, the company can re-analyze data that has already been processed and draw meaningful results from the different interpretations. Pivotal helped Daum achieve an increased depth of knowledge, which is just as critical as breadth in terms of delivering services.
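The general idea behind an MPP query can be illustrated with a small Python sketch: rows are spread across segments by a distribution key, each segment aggregates its own rows independently, and the partial results are merged. This is a toy model only, not Greenplum code. The weblog rows, the choice of user_id as distribution key, and every function name here are hypothetical, and Python threads stand in for what would be separate segment servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical weblog rows: (user_id, service, search_term)
WEBLOG = [
    (1, "search", "weather"),
    (2, "cafe", "recipes"),
    (3, "search", "news"),
    (4, "mail", "inbox"),
    (5, "search", "weather"),
    (6, "cafe", "travel"),
]


def distribute(rows, num_segments):
    """Assign each row to a segment using its distribution key (user_id)."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[row[0] % num_segments].append(row)
    return segments


def segment_count(rows):
    """Local aggregation that one segment runs on its own rows only."""
    return Counter(service for _, service, _ in rows)


def parallel_count_by_service(rows, num_segments=3):
    """Scatter rows to segments, aggregate each one concurrently, merge."""
    segments = distribute(rows, num_segments)
    # Threads are only a stand-in for independent segment servers.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(segment_count, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    print(parallel_count_by_service(WEBLOG))
```

Because each segment works only on its own share of the rows, adding segments reduces the work per segment, which is the intuition behind the fast loading and querying described above.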

ELIMINATING ROADBLOCKS TO SPEEDY QUERYING Performing ad-hoc queries on the data stored in NoSQL databases from the Pivotal Greenplum Database means administrators can use familiar SQL commands to perform massive and multidimensional analysis. This reduces the company’s reliance on finding specialist NoSQL and Hadoop skill sets, and minimizes the workload for employees.
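As a rough illustration of what an ad-hoc, multidimensional query in familiar SQL looks like, the sketch below groups a small weblog table by whichever dimensions the analyst picks. It uses Python's built-in sqlite3 module as a stand-in, not Pivotal Greenplum Database, and the table, columns, sample rows, and the visits_by helper are all hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for an analytic SQL store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weblog (day TEXT, service TEXT, device TEXT, visits INTEGER)"
)
# Hypothetical sample rows, invented for this illustration.
conn.executemany(
    "INSERT INTO weblog VALUES (?, ?, ?, ?)",
    [
        ("2013-12-01", "search", "mobile", 120),
        ("2013-12-01", "search", "desktop", 200),
        ("2013-12-01", "cafe", "mobile", 80),
        ("2013-12-02", "search", "mobile", 150),
        ("2013-12-02", "cafe", "desktop", 60),
    ],
)

ALLOWED_DIMENSIONS = {"day", "service", "device"}


def visits_by(connection, *dimensions):
    """Sum visits grouped by the chosen dimensions (an ad-hoc GROUP BY)."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError(f"dimensions must be drawn from {sorted(ALLOWED_DIMENSIONS)}")
    cols = ", ".join(dimensions)  # safe: names were checked against a whitelist
    query = (
        f"SELECT {cols}, SUM(visits) FROM weblog "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    print(visits_by(conn, "service"))
    print(visits_by(conn, "service", "device"))
```

The same question can be re-asked along a different combination of dimensions by changing only the arguments, with no new batch job to write.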

“One of the most important elements in effectively using Big Data is securing the right people,” says Eom. “We used to struggle with having the resources needed to perform queries, which greatly reduced our processing efficiency. Today, instead of performing queries on the NoSQL systems, we collect the data residing in Hadoop and NoSQL, and then save it in Pivotal Greenplum Database to execute the analysis.”


ENABLING CONTINUOUS PROCESSING WHILE REDUCING COSTS Because Pivotal Greenplum Database is available as a software-only distribution, Daum can run the data warehouse on any of its existing x86 servers running Hadoop. This ensures scalability while eliminating the need for Daum to purchase new data center infrastructure. Pivotal Greenplum Database enables gNet for Hadoop, a parallel communications transport, to access the Hadoop cluster and query the data efficiently using Hadoop servers rather than those running Pivotal Greenplum Database.

“By using our existing x86 servers, we were able to reduce expenditures and expand capacity through linear scalability,” Eom explains. “We have continuous processing across Pivotal Greenplum and Hadoop nodes. As the data increases, we can conveniently expand our capacity just by adding standard x86 servers.”

LEARN MORE To learn more about Pivotal products, services and solutions, visit gopivotal.com.

CONTACT US To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller—or visit us at www.EMC.com.


EMC2, EMC and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. GoPivotal, Pivotal, and the Pivotal logo are registered trademarks or trademarks of GoPivotal, Inc, in the United States and other jurisdictions. All other trademarks used herein are the property of their respective owners. © Copyright 2013 EMC Corporation. All rights reserved. Published in the USA. 12/13 Customer Profile H12705. EMC believes the information in this document is accurate as of its publication date. The information is subject to change without notice.