

Big Data Security Challenges: Hadoop Perspective

Gayatri Kapil, Alka Agrawal, Raees Ahmad Khan

Department of Information Technology
Babasaheb Bhimrao Ambedkar University (A Central University), Lucknow, India

October 14, 2018

Abstract

With the exponential growth of big data, it has become increasingly vulnerable and exposed to malicious attacks. These attacks can damage the essential qualities of privacy, integrity, and availability of information systems. To deal with these malicious intentions, it is necessary to develop effective security mechanisms. This paper first describes Hadoop, its components, and its current security mechanisms, and then analyzes its security problems and risks. In addition, some important aspects of big data Hadoop security and privacy are presented to increase trust and safety. Finally, based on the preceding discussion, the paper concludes with Hadoop security challenges.

Key Words: Hadoop, MapReduce, HDFS, Hadoop Components, Hadoop Security, Data Encryption, HDFS Encryption.


International Journal of Pure and Applied Mathematics
Volume 120 No. 6 2018, 1033-1050
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue http://www.acadpubl.eu/hub/


1 Introduction

A challenge that is gradually emerging in the development of big data technology is how to create new business opportunities for all major industries. Nowadays almost everything is becoming digital, and a major concern for the IT industry is to securely store and process the data produced by many different sources. Many IT organizations also still struggle to convert data generated from different sources, including unstructured data, into a usable format that can be processed and used in other applications. Hadoop has emerged as a solution to almost all big data problems. Because big data differs from other data in terms of volume, velocity, variety, and value [1], processing it has become difficult for most government and business applications. Because of the huge volume of big data, traditional methods for managing, extracting, and analysing it are not very useful, as they may not provide accurate results for decision making. Its management in real time has therefore become a major research concern. To use big data in a managed way, researchers and practitioners have explored various tools and techniques. Big data is thus a moving target that requires more attention to capture, curate, handle, and process.

Initially, data volumes were small enough to be handled easily by an RDBMS, but RDBMS tools now fail to manage big data. To overcome this, the Apache Software Foundation developed Apache Hadoop. It is one of the most widely used technologies that can handle large volumes of data and provide high-speed access to the data within an application. It is used for distributing, processing, and running applications over large datasets. It is a Java-based tool that uses a master-slave architecture to handle large volumes of continuous data arriving at high speed from sources such as events, emails, social media, and external feeds [1]. It is also an easily available tool for storing and processing large volumes of data, and it is used by large companies such as Google, Yahoo, and Facebook [2]. About 63% of various communities and organizations use Hadoop to manage huge amounts of structured, semi-structured, and unstructured data [2].

Several enterprises and organizations increasingly rely on Hadoop, with trust and confidence, for storing and processing their data. However, the security and protection of the stored data is lacking in Hadoop, and this is the major limitation of the platform. To understand this, consider the security of storage and processing systems, which are now very popular. Because storage and processing nodes often exchange data, there is a risk of privacy and security breaches, and protecting this data requires a strong security mechanism. For these reasons, this paper presents details about Hadoop and its components, which remain the essential framework for processing and managing large data, and then discusses some existing mechanisms for increasing security and privacy.

The rest of the paper is organized as follows. Section 2 describes the architecture of Hadoop, including its components and how it stores, processes, and manages big data. Big data security challenges and existing Hadoop security mechanisms are discussed in Section 3 and Section 4 respectively. Important aspects to consider when using big data with Hadoop, including security and privacy measures, are enumerated in Section 5. Finally, the authors conclude their work in Section 6.

2 Hadoop: Big Processing Solution

Apache Hadoop is an open source platform that introduced a new, easy way of storing and processing data. It was influenced by Google's published papers, which described its approach to handling the barrage of data. Consequently, it has become the basic standard for storing, processing, and analysing enormous amounts of data, in the range of terabytes and petabytes [3]. Hadoop provides the processing services of expensive hardware on affordable, industry-standard servers, which can store and process data without practical limits. Using straightforward programming models, it processes data ranging from gigabytes to petabytes produced by a series of computers. The scenario has now changed: data is growing rapidly, and an RDBMS can no longer perform efficiently because of the large volume of data. Vikram S. et al. have defined big data in terms of five dimensions, namely volume, velocity, variety, value, and complexity, and have also suggested the basic idea of handling big data with the help of Hadoop architecture components such as the name node, data node, edge node, and HDFS [17]. The authors also introduced issues faced by different users of big data, i.e. data privacy and search analysis, which require urgent research attention. In [16], the authors discussed the relation between the key components of Hadoop, i.e. MapReduce and the Hadoop Distributed File System. MapReduce is used for large-scale distributed computation, whereas the Hadoop Distributed File System is used to store all input data and the generated data for further applications. The discussion of HDFS is further divided into three categories, i.e. software architecture, bottleneck portability limitations, and portability assumptions.

A typical Hadoop architecture is shown in Figure-1.

Figure-1: Hadoop Architecture

2.1 Hadoop Distributed File System (HDFS): Storage of Data

HDFS has been developed using a distributed file system design. It is highly fault tolerant, holds large amounts of data, and provides easy access to that data. HDFS is a core component of the Hadoop architecture, used to store the input and output data of applications. HDFS is a block-structured file system [2-3]. The current default block size is 128 MB (previously 64 MB), and the default replication factor is 3. Both block size and replication factor are configurable parameters. An individual file is divided into fixed-size blocks, with the following characteristics. 1) Blocks are stored in a cluster of one or more machines with enough storage capacity, and each data node manages the data storage of its own system. 2) HDFS is responsible for recovering from data node failures and distributes data across the data nodes in groups. 3) HDFS plays an administrative role in adding nodes to or removing nodes from a cluster, as shown in Figure-2.

Figure-2: HDFS Storage
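The effect of the block size and replication factor described in this subsection can be illustrated with a little arithmetic. The following is a minimal sketch, not part of Hadoop itself: the function name hdfs_block_layout and the 1000 MB example file are assumptions made for illustration, and the defaults are the 128 MB block size and replication factor of 3 quoted in the text.

```python
MB = 1024 * 1024

def hdfs_block_layout(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, size of the last block, raw bytes stored cluster-wide).

    Hypothetical helper for illustration; it only mirrors the arithmetic implied
    by a fixed block size and a replication factor.
    """
    if file_size_bytes <= 0:
        return 0, 0, 0
    # Ceiling division: a file that does not fill its last block still needs that block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block holds only the remaining bytes.
    last_block = file_size_bytes - (num_blocks - 1) * block_size_bytes
    # Every block is stored `replication` times on different data nodes.
    raw_bytes = file_size_bytes * replication
    return num_blocks, last_block, raw_bytes

blocks, last, raw = hdfs_block_layout(1000 * MB)
print(blocks, last // MB, raw // MB)  # 8 104 3000
```

Under these assumptions, a 1000 MB file occupies eight blocks (seven full blocks and one of 104 MB) and consumes about 3000 MB of raw cluster storage once replicated.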

HDFS Terminologies

Name Node
The name node is the core part of the Hadoop system. If the name node crashes, the entire Hadoop system goes down. The name node manages the file system namespace and stores the metadata information along with the locations of the data blocks.

Secondary Name Node
The secondary name node is responsible for copying and merging the namespace image and the edit log. If the name node crashes, the namespace image stored in the secondary name node can be used to restart the name node. The secondary name node thus serves as a checkpointing backup for the name node.

Data Node
A data node stores blocks of data and retrieves them. The data nodes also periodically report block information to the name node.
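The merge of the namespace image and the edit log performed by the secondary name node can be pictured with a toy model. This is only a sketch under simplifying assumptions: the namespace image is represented as a plain dictionary from file path to block identifiers, the edit log as a list of (operation, path, blocks) tuples, and the function name checkpoint is hypothetical. The real on-disk formats are considerably richer.

```python
def checkpoint(fsimage, edit_log):
    """Apply logged operations to a copy of the namespace image and return the new image."""
    merged = dict(fsimage)  # leave the previous image untouched
    for op, path, blocks in edit_log:
        if op == "create":
            merged[path] = list(blocks)
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged

# Toy namespace image: file path -> block identifiers.
fsimage = {"/logs/a.txt": ["blk_1", "blk_2"]}
# Toy edit log recorded since the last checkpoint.
edit_log = [
    ("create", "/logs/b.txt", ["blk_3"]),
    ("delete", "/logs/a.txt", []),
]
new_image = checkpoint(fsimage, edit_log)
print(new_image)  # {'/logs/b.txt': ['blk_3']}
```

Once the merged image exists, the edit log can be truncated, which is why a restarted name node can recover from the latest image without replaying every past operation.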

2.2 MapReduce: Distributed Data Processing Framework

Hadoop MapReduce is a Java-based implementation of the MapReduce programming paradigm introduced by Google, in which data from the HDFS store is processed [2-3]. In the MapReduce paradigm, each job has a user-defined map phase and reduce phase. The input data set is split into independent chunks that are processed in a completely parallel manner, and the results are used for user consumption or further processing. HDFS is the storage system for both the input and the output of a MapReduce job. The main components of MapReduce are described as follows: 1) The Job Tracker is the master of the system; it manages the jobs and the resources in the cluster, known as Task Trackers. 2) Task Trackers are the slaves deployed on each machine. They are responsible for running the map and reduce tasks as instructed by the Job Tracker. 3) The Job History Server is a daemon that provides historical information about completed applications, as shown in Figure-3.

MapReduce Process

Figure-3: MapReduce Process

Step 1: Input data in the form of image, video, or text files is converted into <Key1 (K1), Value1 (V1)> pairs by the input record reader.
Step 2: The output (K1, V1) is converted into Key2 and Value2 (K2, V2) by the Mapper.
Step 3: The second-stage output (K2, V2) is converted into (K2, list(V2)) with the help of shuffle and sort techniques.
Step 4: The Reducer takes (K2, list(V2)) and generates the output (K3, V3).
Step 5: The final output is generated by the Output Record Writer, which takes the output of the Reducer (K3, V3) as its input.
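The five steps above can be sketched in a few lines of single-process Python using the classic word-count job. This is an illustrative sketch only: it runs in memory rather than on a cluster, the function names (record_reader, mapper, shuffle_and_sort, reducer, run_job) are assumptions chosen to mirror the step names, and word count is a stand-in example that the paper itself does not specify.

```python
from collections import defaultdict

def record_reader(lines):
    # Step 1: turn raw input into (K1, V1) = (line index, line text).
    for index, line in enumerate(lines):
        yield index, line

def mapper(k1, v1):
    # Step 2: emit (K2, V2) = (word, 1) for every word in the line.
    for word in v1.lower().split():
        yield word, 1

def shuffle_and_sort(pairs):
    # Step 3: group all V2 values under their K2 key, ordered by key.
    grouped = defaultdict(list)
    for k2, v2 in pairs:
        grouped[k2].append(v2)
    return sorted(grouped.items())

def reducer(k2, values):
    # Step 4: collapse (K2, list(V2)) into (K3, V3) = (word, total count).
    return k2, sum(values)

def run_job(lines):
    mapped = [pair for k1, v1 in record_reader(lines) for pair in mapper(k1, v1)]
    return [reducer(k2, values) for k2, values in shuffle_and_sort(mapped)]

# Step 5: the "output record writer" here is simply print.
result = run_job(["big data", "big hadoop"])
print(result)  # [('big', 2), ('data', 1), ('hadoop', 1)]
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data across the network; the data flow between the stages is the same as in this sketch.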

2.3 Other Hadoop Components

Hadoop is neither a single tool nor a programming language. It is a software library written in Java, used for processing large amounts of data in a distributed environment. Hadoop on its own cannot provide all the services or facilities required to process big data. Its ecosystem is a set of tools that help process large data sets, ranging in size from gigabytes to petabytes, simultaneously. Hadoop is an Apache project that provides many facilities, such as MapReduce for parallel computing. However, much more is needed if one wants to build a recommendation engine over big data, run clustering algorithms over big data, or get near-real-time access to big data. To meet these requirements, more components of the Hadoop ecosystem have to be added. Apache Pig, Hive, HBase, HDFS, MapReduce, Mahout, Oozie, ZooKeeper, and Sqoop are some of the components that, when combined with core Hadoop, make the ecosystem a much more scalable and robust solution.

Table-1: Hadoop Components


3 Big Data: Hadoop Security Challenge

To achieve high-quality performance in terms of availability and scalability, IT organizations depend on Hadoop and its components. Amazon uses it to build its product search indices and to process millions of sessions. Facebook uses it for data warehousing, log processing, and recommendation systems [8]. Hadoop and its components are used in the cloud space for customer projects. Twitter also uses it to manage the data generated on its website daily. The New York Times uses it for video and image analysis. Other users include IBM, Firm, LinkedIn, and the University of Freiburg [15].

The Hadoop ecosystem is evolving to satisfy the needs of many organizations, researchers, and governments. At present, some organizations and enterprises analyze the information and location data collected from customers in different areas. They later organize the collected data for marketing activities, so personal data can be disclosed when customer data is analyzed. This has created a new target for hackers and other cyber criminals. This data, which was previously used by organizations, is extremely valuable and is subject to privacy laws and regulations. Consequently, these companies need security mechanisms to protect their data and privacy. The demand for data scientists and for stronger security and privacy therefore continues to rise in order to protect users' personal information.

4 Hadoop Security

Initially, when Hadoop was created, security issues were not a top priority [23]. The developers' only aim was to build a system for distributed and parallel processing of huge amounts of data. To address the resulting problems, Hadoop needs strong security for protecting sensitive information [23]. Later on, several mechanisms were proposed to secure Hadoop clusters. Authorization, authentication, encryption, and key management are the available and feasible pillars for securing a Hadoop cluster. Firstly, Hadoop distributions performed much of the integration and setup work with central security services such as Active Directory or LDAP through the Apache Knox Gateway [23]. Knox is a system that provides a single point of authentication and access for Hadoop services. It provides access to the Hadoop cluster over HTTP/HTTPS and eliminates the risks of SSH access to edge nodes. Mechanisms for securing communication between the various nodes include Kerberos and the Simple Authentication and Security Layer (SASL). Authentication based on hashing has also been implemented, using the SHA-256 hashing technique [23]. The user authenticates to the name node by sending a hash value. The name node then compares the hash value sent by the user with the one it generates itself.
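The hash comparison described at the end of the paragraph above can be sketched as follows. This is a simplified illustration, not the scheme from [23] and not how Kerberos-based Hadoop authentication works: the function names client_digest and name_node_verify, the shared secret, and the per-session nonce are all assumptions introduced here. The nonce is added so that a captured hash cannot simply be replayed, and the comparison uses a constant-time function.

```python
import hashlib
import hmac
import os

def client_digest(secret: bytes, nonce: bytes) -> str:
    # The user hashes the nonce together with the credential using SHA-256.
    return hashlib.sha256(nonce + secret).hexdigest()

def name_node_verify(stored_secret: bytes, nonce: bytes, received: str) -> bool:
    # The name node generates its own hash and compares it with the one received.
    expected = hashlib.sha256(nonce + stored_secret).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, received)

nonce = os.urandom(16)  # hypothetical per-session challenge issued by the name node
ok = name_node_verify(b"s3cret", nonce, client_digest(b"s3cret", nonce))
bad = name_node_verify(b"s3cret", nonce, client_digest(b"wrong", nonce))
print(ok, bad)  # True False
```

A production design would normally use a keyed construction such as HMAC and avoid storing the credential in plain form on the server; the sketch only shows the "both sides hash, then compare" idea.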

Secondly, HDFS offers 'transparent' encryption embedded within the Hadoop file system. This means data is encrypted as it is stored in the file system, transparently, without modification to the applications that use the cluster. This is an important feature for supporting tenant data privacy in multi-tenant clusters. HDFS encryption can be used with Hadoop's Key Management Service (KMS) or integrated with third-party key management services. Researchers and practitioners have proposed various encryption schemes for HDFS to secure stored data and data in transit.

Lei Xu et al. [27] presented CL-PRE, a certificateless proxy re-encryption scheme for secure data sharing with the public cloud. CL-PRE uniquely integrates identity-based public keys into proxy re-encryption. It eliminates the key escrow problem of traditional identity-based encryption and does not require certificates to guarantee the authenticity of public keys. M. Li et al. [28] proposed a new cloud architecture, MyCloud, which supports user-configured privacy protection in cloud environments without relying on cryptographic solutions. MyCloud first de-privileges the cloud provider and then enables user-configured privacy protection. It also reduces the TCB size to minimize the attack surface of the cloud platform. S. Park and Y. Lee [26] proposed a secure Hadoop architecture that adds encryption and decryption functions to HDFS. Secure HDFS was implemented by adding an AES encrypt/decrypt class to CompressionCodec in Hadoop.

Yuan Tian [25] presented an overview of big data and discussed its security issues. He also summarized several ways to improve the security of big data, including a security hardening methodology with an attribute relation graph, an attribute selection methodology, a content-based access control model, and a scalable multidimensional anonymization approach. He further proposed an intelligent security model for enhancing big data security that is capable of real-time data collection and threat analysis. The model detects a threat before a security intrusion occurs in the system. A new security model has been developed for GHadoop, an extension of the Hadoop MapReduce framework. Several security methods are provided to protect GHadoop from traditional attacks; for instance, public key cryptography and SSL (Secure Sockets Layer) have been used [23]. A cloud-oriented, storage-efficient dynamic access control scheme has been developed, which uses ciphertext based on CP-ABE and a symmetric encryption algorithm (such as AES) [53]. An encryption method using the AES and OTP algorithms has also been proposed and integrated into Hadoop to improve file encryption and decryption performance.

4.1 Comparison of Existing Approaches/Methods

Thus, security technology and other methods are always essential. Some potential methods and techniques, along with their advantages and limitations, are shown in Table-2.

Table-2: Methods/Approaches Used, Advantages, and Limitations of Some Recent Papers


5 Important Aspects of Big Data Security and Privacy: Hadoop Perspective

Some important aspects of security and privacy in Hadoop are mentioned below:

• Hadoop is gaining popularity at the enterprise level. It is a reliable and cost-effective big data storage and processing platform compared with competing software. However, there are risks associated with it, for example the risk of data leakage while data is transferred over the network from a Hadoop client to a data node. Some Hadoop distributors, such as IBM, Cloudera, and Hortonworks [29-30], claim to provide security for their clients' data. Even if their claims are true, not everyone can afford to use a specialized distribution. Information security is now a fundamental right. For a highly secure Hadoop environment, there should be open frameworks available to everyone. The sensitive data of enterprises is stored in the cloud and all services are accessed through the Internet, which means organizations face many problems related to data leakage and security.

• To build an infrastructure that is cost-effective and efficiently scalable, cloud providers have to understand customers' requirements at all levels. To do so, they need to share storage devices and physical resources among multiple users. This is known as multi-tenancy. But sharing resources means that the resources are exposed to attackers. If a customer and an attacker use the same physical devices, the attacker can easily gain access to the customer's data unless proper security measures are implemented.

• Companies do not have direct control over their data [29-30]; they can never know whether their data is being used by someone else. Since there are no transparent mechanisms for monitoring the resources directly, many security issues arise.

• Since customers have to share physical resources with other customers and do not have direct control over their data, they rely on the cloud providers' trust mechanisms as an alternative to transparent control over their data and cloud resources. By assuring customers that their operations are certified as compliant with organizational safeguards and standards, cloud providers can build their customers' confidence.

• Privacy and security have always been two distinct domains of concern, yet they are usually discussed together because security is required in order to provide privacy. Enterprises need to be sure that their sensitive data is not being accessed by cloud providers and is not being shared with a third party in return for money, which is a serious threat to customers' privacy. Security and privacy standards such as those of the International Organization for Standardization (ISO) [29-30] have evolved, and service providers are required to comply with these standards to fully safeguard their clients' data assets. This has resulted in very protective data security enforcement within enterprises, including service providers as well as clients.

Earlier, data was safely confined in isolated clusters or data silos, where security wasn't an issue. But surrounded by an ever-growing ecosystem of tools and applications, Hadoop evolved into Big Data as-a-Service (BDaaS) and took to the cloud [29-30]. While these innovations have served to democratize data and bring Hadoop into the mainstream, they have also created new security concerns for organizations that now struggle to scale security in step with Hadoop's rapid technological advances. For this reason, new security approaches need to be explored for securing sensitive information in Hadoop clusters and big data in the cloud.

6 Conclusion

It can be inferred that, in Hadoop security research, the techniques explored so far are not sufficient, as big data is now gradually becoming involved in every field. More privacy and security approaches are therefore needed, and further exploration is required to identify the importance of security wherever big data is used. Furthermore, more secure and faster methods of keeping data secure need to be found. There is also a need to focus on application security rather than device security, so as to provide both reactive and proactive protection.


References

[1] Oguntimilehin A., Ademola E.O., A Review of Big Data Management, Benefits and Challenges, Journal of Emerging Trends in Computing and Information Sciences, vol. 5, pp. 433-437, June 2014.

[2] T. White, MapReduce and the Hadoop distributed file system, in Hadoop: The Definitive Guide, 1st edition, O'Reilly Media, Inc., Yahoo Press, 2012.

[3] D. Borthakur, The Hadoop distributed file system: architecture and design, Hadoop Project Website [online]. Available: http://hadoop.apache.org/ core/docs/current/hdfs design.pdf

[4] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, R. Murthy, Hive - A Warehousing Solution Over a MapReduce Framework, In Proc. of Very Large Data Bases, vol. 2, pp. 1626-1629, 2009.

[5] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, The Hadoop Distributed File System, Yahoo!, Sunnyvale, California, USA, IEEE, 2010.

[6] Harshawardhan S. Bhosale, Devendra P. Gadekar, A Review Paper on Big Data and Hadoop, International Journal of Scientific and Research Publication, vol. 4, 2014.

[7] Deepika P, Anantha Raman G R, A Study of Hadoop-Related Tools and Techniques, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 5, pp. 160-164, 2015.

[8] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers, Big data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute, 2012.

[9] Gang Zhao, A Query Processing Framework based on Hadoop, International Journal of Database Theory and Application, vol. 7, pp. 261-272, 2014.


[10] ZooKeeper, Apache Software Foundation project home page, https://zookeeper.apache.org

[11] Apache Mahout, http://mahout.apache.org

[12] Apache Sqoop, https://sqoop.apache.org/

[13] Apache Ambari, https://ambari.apache.org

[14] C.L. Philip Chen, Chun-Yang Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on Big Data, Information Sciences, vol. 275, pp. 314-347, 2014.

[15] Jeffrey Shafer, Scott Rixner, and Alan L. Cox, The Hadoop Distributed Filesystem: Balancing Portability and Performance, Performance Analysis of Systems & Software (ISPASS), IEEE International Symposium, pp. 122-133, 2010.

[16] S. Vikram Phaneendra & E. Madhusudhan Reddy, Big Data - solutions for RDBMS problems - A survey, In 12th IEEE/IFIP Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan, Apr 2013).

[17] Mark Troester (2013), Big Data Meets Big Data Analytics, www.sas.com/resources/.../ WR46345.pdf, retrieved 10/02/14.

[18] http://www.bmcsoftware.in/guides/hadoop-ecosystem.html

[19] http://www.dezyre.com/article/recap-of-hadoop-nes-for-january-2018/373

[20] Stephen Kaisler, Frank Armour, J. Alberto Espinosa and William Money, Big Data: Issues and Challenges Moving Forward, 46th Hawaii International Conference on System Sciences, pp. 995-1003, 2013.

[21] Mark Troester (2013), Big Data Meets Big Data Analytics, www.sas.com/resources/.../ WR46345.pdf, retrieved 10/02/14.


[22] Hadeer Mahmoud, Abdelfatah Hegazy, Mohamed H. Khafagy, An approach for big data security based on Hadoop distributed file system, International Conference on Innovative Trends in Computer Engineering (ITCE), 2018, DOI: 10.1109/ITCE.2018.8316608.

[23] Masoumeh Rezaei Jam, Leili Mohammad Khahli, Mohammad Kazem Akbari, A Survey on Security of Hadoop, International Conference on Computer and Knowledge Engineering (ICCKE), 2014, DOI: 10.1109/ICCKE.2014.6993455.

[24] Youngho Song, Young-Sung Shin, Miyoung Jang, Jae-Woo Chang, Design and Implementation of HDFS Data Encryption Scheme using ARIA Algorithm on Hadoop, IEEE International Conference on Big Data and Smart Computing (BigComp), 2017, DOI: 10.1109/BIGCOMP.2017.7881720.

[25] Yuan Tian, Towards the Development of Best Data Security for Big Data, Communication and Network, Scientific Research Publishing Inc., vol. 9, pp. 291-301, 2017.

[26] Seonyoung Park and Youngseok Lee, Secure Hadoop with Encrypted HDFS, J.J. Park et al. (Eds.): GPC 2013, LNCS 7861, pp. 134-141, Springer, Berlin, Heidelberg.

[27] Lei Xu, X. Wu and X. Zhang, CL-PRE: a certificateless proxy re-encryption scheme for secure data sharing with public cloud, Proc. of 2012 ACM Symposium on Information, Computer and Communications Security (ASIACCS12), pp. 87-88, 2012.

[28] Min Li, Wang Zang, Kun Bai, Men Yu, and Peng Liu, MyCloud: Supporting User-Configured Privacy Protection in Cloud Computing, In Proceedings of ACM ACSAC, pp. 59-68, 2013.

[29] Securing Hadoop: Security Recommendation for Hadoop Environments, at https://securosis.com/assets/library/reports/Securing HadoopFinal V2.pdf


[30] Raj R. Parmar, Sudipta Roy, Debnath Bhattacharyya, Samir Kumar Bandyopadhyay, and Tai-Hoon Kim, Large-Scale Encryption in the Hadoop Environment: Challenges and Solutions, https://ieeexplore.ieee.org/document/7922533/
