BDA Technologies & Selected Case Studies
Ettikan Kandasamy Karuppiah (Ph.D), Principal Researcher & Director of Accelerative Technologies Lab, MIMOS Berhad
SEMINAR INTERNET COMPUTING TECHNOLOGY, Theme: "Delivering Values From Hyperconnectivities"
19th January 2015, 2.00-2.45pm @ Bilik Serbaguna 1, MAMPU


Big Data Analytics in a Glance
Big data is defined by the high volume, velocity, variety, veracity and value of data generated every second, minute, hour and day by devices, humans, etc.
- VOLUME (growing data): 90% of the world's data was generated over the last 2 years
- VELOCITY (increasing data): 175,000 tweets per second
- VARIETY (broadening data): 80% of the world's data is unstructured (text, geospatial, audio, video)
- VERACITY: establishing the [...] of big data sources. Big Data technology allows us to establish quality and accuracy, especially in unstructured data
- Turning big data into VALUE: economic benefits, government benefits, societal benefits

Big Data Computing in ICT Sector
- The Malaysian ICT services sub-sector has huge potential growth, with a projected share of 35% in the nation's Digital Economy in 2020... Requires Transformative Platform
- Software Solutions and Support is the Key GDP Contributor
- Source: MDEC, as taken from APeJ Big Data MaturityScape Assessment 2013 by IDC

MIMOS BigData Technologies R&D
(Roadmap diagram; the labels below are listed in the order they appear on the slide.)
- Business Value: Data Modeling & Visualization for PDRM Workforce Planning & GPGPU Data Security Library
- Established work on General Purpose Graphics Processing Unit (GPGPU) for text manipulation, Hadoop trainings
- MultiCore Java Compiler
- Acquire / Train: conducted workshop and Hadoop programming training for the Malaysian research community
- Collaboration / R&D
- MiAccLib Cleansing, MiAccLib Finance
- Data Cleansing Engine for PERKESO & Data Warehouse for PERKESO
- MiAccLib Algo/Map
- nVidia COE for GPGPU established
- MiAccLib Crypto
- Sentiment Analysis Model & Data Modeling & Data Warehouse for PIK MOH & GPGPU Video Data Analytics Library
- R&D: Data Encryption/Decryption for National Data Protection
- MiAccLib Video
- GPU Accelerated Libraries for Data Cleansing & Financial Risk Modeling
- MiAccLib BigData
- Accelerated Libraries for Database Accelerator Library (Galactica)
- GE13 Electoral Roll Analysis with Hadoop & GPU
- MiAccLib Cleansing
- ESRI Inc/US MoU established
- Acquire / Train: Intel Malaysia/US MoU, AMD Malaysia/US/Europe MoU
- High Risk Profiling, Illicit, Taxable & Drugs Detection (PoC)
- MiAccLib Image
- RM10 -> Foundation & Early Adaptation for Heterogeneous Computing
- RM11 -> Maturation & Progressive Deployment of Scalable Heterogeneous Computing
- Assisting both Government & Private Sector needs: Private Sector to Go Global; National Public Sector (Source: MDeC)
2014 MIMOS Berhad. All Rights Reserved.

DECISIONS REQUESTED
FCC is requested to:
1. Take note of data science upskilling for civil servants
2. Take note of MAMPU developing the Government Open Data framework by 2015
3. Endorse the DG Lab on BDA to identify use cases and pilot projects that address societal wellbeing
4. Take note of MIMOS defining and developing the Big Data technology platform for Government by 2015
5. Mandate opening up of all relevant data (Open/Non-Open) to the DG Lab on BDA for the pilot projects
Data classification labels shown on the slide: Rahsia Besar | Rahsia | Sulit | Terhad | Terbuka
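The five classification labels on the Decisions Requested slide can be read as an ordered scale. The sketch below is a minimal illustration, not MAMPU's or MIMOS's actual mechanism: it assumes that only data labelled Terbuka is eligible for open publication and that a sandbox such as the DG Lab is granted a clearance level. The `Dataset` type, the function names, the English glosses in the comments, and the sample catalogue (including the Sulit label on the welfare entry) are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Labels from the slide, ordered from open to most restricted."""
    TERBUKA = 0       # open
    TERHAD = 1        # restricted
    SULIT = 2         # confidential
    RAHSIA = 3        # secret
    RAHSIA_BESAR = 4  # top secret


@dataclass(frozen=True)
class Dataset:
    name: str                       # hypothetical dataset name
    classification: Classification


def publishable_as_open_data(datasets):
    """Names of datasets that carry the Terbuka label."""
    return [d.name for d in datasets
            if d.classification is Classification.TERBUKA]


def visible_to_lab(datasets, clearance):
    """Names of datasets at or below the lab's assumed clearance level."""
    return [d.name for d in datasets if d.classification <= clearance]


# Hypothetical catalogue; the labels assigned here are illustrative only.
catalogue = [
    Dataset("transport_timetables", Classification.TERBUKA),
    Dataset("essential_goods_prices", Classification.TERBUKA),
    Dataset("welfare_case_files", Classification.SULIT),
]

open_names = publishable_as_open_data(catalogue)
lab_names = visible_to_lab(catalogue, Classification.SULIT)
```

Keeping the labels in an `IntEnum` makes the "at or below this level" check a plain comparison, so the open-data filter and the sandbox filter share one scale.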

Policy: Opening Up Non-Sensitive Government Data
- Policy for all government agencies to open up data categorised under Terbuka
- E.g. non-sensitive data like meteorology, transport timetables and pricing of essential goods, based on Open Data criteria

Technology: Developing BDA Open Innovation Platform
- An open-innovation platform between Government, businesses and Rakyat to improve e-participation and user satisfaction
- Prioritization through the development of high-impact, low-cost, demand-driven life-event solutions
- POCs, pilots & apps
- Secure environment (sandbox) for Government Data

BDA DG (Digital Government) LAB
- Expertise, Community Data, Government Data, Project Sponsor
- Sector-specific use cases / life-events: e.g. Welfare, Education, Healthcare, Transportation
- BDA Technology Platform
- DATA (Community, Government) -> OUTCOMES
- Open Data: Data.gov.my

BDA Technology Platform Strategy
Research & Development on key data extraction, processing & analytics components
Key values:
i. National Data Sovereignty
ii. Trusted Data
iii. Secured Data
Localized entity (i.e. MIMOS, Cybersecurity)
Components:
- Data Extraction
- Data Staging: Cleansing, Harmonisation, Anonymisation
- Data DB Store
- Data Model & Analytics
- Data Visualization
- Security
- Infrastructure Management
- Traceability
Focus areas:
- Machine Learning, Malaysian context (BM, English, Chinese, Tamil)
- Accelerated Computing
- Secured Cloud Services
- Visualization, Malaysian perspective

BDA Technology Platform Strategy (MIMOS technologies)
- MIMOS products: Mi-Cloud, Mi-Harmony, Mi-UAP, Mi-Mobile, Mi-MOCHA, Mi-Helio, Mi-Morphe, Mi-Harvester, Mi-CLIP, Mi-Doc, Mi-Scrambler, Mi-Portal, Mi-BIS, Mi-ARMC, Mi-Trust, Mi-SP (Video Analytics), Mi-STP, Mi-Target, Mi-HPDW, Mi-AccLytics, Mi-DSS, Mi-AccLib, Mi-Trace, Mi-ROSS, Mi-DW, Mi-Market, Galactica
- Customization; 3rd Party Systems & Hardware
- Platform layers: Data Security, Data Extraction, Data Staging (Cleansing, Harmonisation, Anonymisation), Data DB Store, Data Visualization, Data Model & Analytics, Security, Data Management, Infrastructure Management, Traceability
- Data Source: Structured + Open Linked Data, Unstructured
- Applications

Extracting Value from Data
Stages, from data sharing down to data sources:
- Data Sharing: Data Visualization; Scrambled database & Datamarts
- Data Anonymisation: Granular Primary Database; Published Data Marts
- Harmonisation: Data Harmonisation; Harmonisation Terminologies
- Cleansing: Data Cleansing; Data Correction
- Staging Data: Data Harvesting from Unstructured Data Sources and Structured Data Sources
Supporting MIMOS components:
- Virtualized Platform & Integrity Manager: Mi-CLOUD + Mi-Mocha
- Unstructured Data Collector: Mi-Clip
- Data Harmonisation: Mi-Harmony + Mi-Semantics
- Detect, Correction, Exception: Mi-Morphe + Mi-AccLib
- Data Anonymisation: Mi-Scramble + Mi-Crypto + MiAccLib
- Authentication & Authorization: Mi-UAP, Mi-ARMC
- Data Warehouse Platform (Mi-Galactica, Mi-AccConnect, Mi-HPDW)
- Data Modeling
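The staging steps named above (cleansing, harmonisation, anonymisation) can be sketched as three small functions applied in sequence. This is only an illustration under stated assumptions, not how Mi-Morphe, Mi-Harmony or Mi-Scramble work: the field names, the terminology table and the sample records are made up, and the anonymisation step is a keyed hash (strictly speaking pseudonymisation) chosen for brevity.

```python
import hashlib
import hmac

# Hypothetical harmonisation terminology: source spellings -> one agreed term.
STATE_TERMS = {
    "kl": "Kuala Lumpur",
    "k. lumpur": "Kuala Lumpur",
    "pulau pinang": "Penang",
}


def cleanse(record):
    """Trim whitespace and turn empty strings into None (missing value)."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[key] = value
    return cleaned


def harmonise(record):
    """Replace known variants of the 'state' field with the agreed term."""
    state = record.get("state")
    if state is not None:
        record = {**record, "state": STATE_TERMS.get(state.lower(), state)}
    return record


def anonymise(record, secret_key):
    """Replace 'national_id' with a keyed SHA-256 digest (a pseudonym)."""
    national_id = record.get("national_id")
    if national_id is not None:
        digest = hmac.new(secret_key, national_id.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        record = {**record, "national_id": digest}
    return record


def stage(records, secret_key):
    """Run cleansing, then harmonisation, then anonymisation on each record."""
    return [anonymise(harmonise(cleanse(r)), secret_key) for r in records]


# Made-up sample records for illustration only.
raw = [
    {"national_id": " 800101-14-5555 ", "state": "KL", "income": "3200"},
    {"national_id": "790202-07-1234", "state": " pulau pinang ", "income": ""},
]
staged = stage(raw, secret_key=b"demo-key-not-for-production")
```

Cleansing runs first so that the harmonisation lookup and the keyed hash both see trimmed values; the same identifier then always maps to the same pseudonym, which keeps records joinable after anonymisation.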
Extracting Value from Data: analytics and visualisation components
- Data Statistics: Mi-AccStat
- Sentiment Analytics: Mi-Intelligence; Mi-NLP
- Data Visualization: Mi-HELIO; Mi-BIS
- Data Analytics: Mi-Portal
- Social Network Analytics: Mi-Visualitic
- Knowledge Harvester (LOD): Mi-Harvester
- Data Analytics: Mi-HPDW
- Data Analytics: Mi-Target

Technology Challenges Ahead (11th Malaysia Plan)
- Existing MIMOS products: Mi-Cloud, Mi-Harmony, Mi-UAP, Mi-Mobile, Mi-MOCHA, Mi-Helio, Mi-Morphe, Mi-Harvester, Mi-CLIP, Mi-Doc, Mi-Scrambler, Mi-Portal, Mi-BIS, Mi-ARMC, Mi-Trust, Mi-SP (Video Analytics), Mi-STP, Mi-Target, Mi-HPDW, Mi-AccLytics, Mi-DSS, Mi-AccLib, Mi-Trace, Mi-ROSS, Mi-DW, Mi-Market, Galactica
- New Platforms & Revisions
- NEWER channels of consumption (e.g. omni-channel data market)
- NEWER sources of data (e.g. high-speed streams)
- NEWER methods of visualization (e.g. multi-dimensional view)
- NEWER paradigms of computing (e.g. Docker)
- Technology Pull / Technology Push

Big Data Moving Forward
- IoA: Internet of Anything
- II: Industrial Internet
- IoE: Internet of Everything
- IoT: Internet of Things
- Software Defined Network, Big Data Processing, Mobile Systems, Wearables, Cloud Computing
- Cyber-biological systems, Cyber-physical systems, Internet of Humans

Open Platform & BDA Middleware Architecture
(Architecture diagram; component labels recovered from the slide, grouping approximate.)
- Data sources: Structured, Semi-structured & Unstructured Data Sources; Open Linked Data; Web & Social Media; RDBMS; Files
- Data Extraction: Flume, Sqoop, Kafka, Mi-Clip, Mi-Harvester, Mi-Morphe
- Data Staging: Data Cleansing (Mi-Morphe, Mi-AccLib); Data Anonymisation (Mi-Scramble, Mi-Crypto, Mi-AccLib); Data Harmonization (Mi-Harmony)
- Data Model: Mi-HPDW
- Data Storage: RDBMS, Galactica FS, HDFS, NoSQL, Galactica, Hadoop, Data warehouse / Data mart, Mi-HPDW
- Infrastructure: Mi-Cloud, Mi-Mocha, Galactica, YARN
- Query and connectors: Mi-AccConnect, Pig, Hive, Impala, Shark, Galactica Connector, Apache Drill, Spark/Shark, Hue, Cloudera Search & Solr, RDF Graph DB
- Data Analytics Tools (Machine Learning): R, Mahout, ML-Lib (Spark), Mi-NLP, Mi-AccStat
- Data Visualisation: Mi-Helio, Mi-BIS, Mi-Portal
- Data Security: Mi-UAP
- Data Management: Cloudera Manager/Falcon, ZooKeeper, Oozie, Sentry
- Other labels on the diagram: Mi-Target, GIS, Mi-Intelligence, Mi-Trust, Mi-Visualitics
- Legend: MIMOS Solution | 3rd Party Solution

MIMOS BigData Stack With Reference to Hadoop Stack
Data sources type: RDBMS | Streaming (Twitter, logs, etc.) | NoSQL
Stream: Spark | Kafka | Spring XD & Storm
Search: Cloudera Search & Solr
Application Program Interface: Thrift | REST | Java API | Avro
Management: YARN (resource management) | Big Data Orchestration Engine/Layer | ZooKeeper (configuration and synchronization) | Oozie (workflow scheduler) | Cloudera Manager | Management for Lustre
Storage: HDFS | HPDW-Storage | Galactica FS | NoSQL (HBase) | Distributed Database (Cassandra) | RDBMS (PostgreSQL, MySQL)
Visualization: Mi-Helio | Mi-Portal | Mi-BIS (Mi-AccConnect) | 3rd Party Apps
Batch Query: MapReduce v2 | Pig | Hive
Real Time Query: Mi-BIS with Impala through Mi-AccConnect | Hue | Galactica | Apache Drill | Spark/Shark | HPDW-BigData DB
Machine Learning: Mi-BIS (Weka) | Accstats (R and Cloudera C++) | ML-LIB (Spark) | Revolution R, Weka
Processing: Mi-Morphe | Morphlines | Mi-Acclib MapReduce v2 (Accelerated ETL) | HPDW Data Model Plugin (for MiMorphe v3/Pentaho)
Analytics: Simulator | Planning Tool | Predictive, Prescriptive | Prediction Algorithm | Mi-BIS (Mi-Accstats) | Mi-BIS (Data Mining) | Revolution R | 3rd Party | GIS 3rd party
Security and Authentication: Sentry | Mi-UAP | Mi-ARMC | Mi-Trust
Data Management: Sqoop | Flume
Hardware: Multi & Many Cores Processors (CPU + GPU)
Legend: Complete 3rd Party | 3rd Party & MIMOS Offering | MIMOS Technologies | 3rd Party Technologies

Proof of Concepts: Selected Use Cases

Proof of Concepts: Mixed Scenario (Technology Capabilities)

Challenges to be Addressed During Initial Roll-Outs

Data Challenges (Stage 1)
- Data is stored in partial & distributed locations
- Data comes in both digital and non-digital formats, some of it paper-based
- Incomplete data sets (Q issues)
- Cleanliness of the data: missing values (random, non-random), CR, noise
- Cleaning while maintaining integrity & value
- Extracting the features
- Data in plural languages (at least English & Malay)
- Structured data has longer historical value to be acquired
- Data storage media & format for extraction and usage
- How to authenticate the key values? Where is the reference point?
- For unstructured data (e.g. social media), current technology is adequate to support the pre-processing and analytics, with some local challenges
- Who is the data owner?
- How to ensure the security level of the data for sharing?
- PDP compliance confusion
- More to be shared by visiting MIMOS Lab

Analytics Challenges (Stage 2)
- Tools are available, but the right approach is still critical for evaluation
- Which are the best/right algorithms to be used?
- Can you identify the right domain expert within the organization?
- Who are the local domain experts to be consulted for the methods/algorithms selection?
- You may not have a data scientist in a specific gov. organization, but how to form an analytics team (external + internal)?
- What exactly are the data owner's business needs? Why do they need to do this? It is a headache for them; best to leave the data to rest in peace!!
- Which data to include and which to exclude, and what to anonymize? There is a concern over meaning/trend extraction
- Plurality of languages & interpretation accuracy
- Semantification of the language-specific analytics
- Bottlenecks to be identified, and an accelerated approach required for the specific processing
- Agile is the best way

Results Challenges (Stage 3)
- Visualization of the results in a simple, actionable and communicable form
- How to handle continuously changing analytics (and the results) due to:
  - New data inclusion
  - New domain expert inclusion
  - New additional factors to be considered
- Who validates the results?
- How to translate results to value for the (gov) organization?
- How to translate the value to actions?
- How to follow up on the 2nd cycle of activities?

Benefiting Humanity Through Technology
Thank You