“Big Data: Big Costs, Enormous Risks and Whopping Opportunities” - Is your archive half empty, half full or are you afraid to look? …www.matrixconnexions.com… Matrix Connexions Consultancy and Advisory Services Dr Michael R. Taylor – Managing Consultant [email protected] Mobile: 07595 359 506


  • “Big Data: Big Costs, Enormous Risks and Whopping Opportunities” - Is your archive half empty, half full or are you afraid to look?

    …www.matrixconnexions.com…

    Matrix Connexions Consultancy and Advisory Services

    Dr Michael R. Taylor – Managing Consultant

    [email protected]

    Mobile: 07595 359 506

  • Presentation Outline:

    1. Data becomes BIG DATA

    2. Data Structures

    3. Data Usage – STRUCTURED and UNSTRUCTURED

    4. Meteoric rise of UNSTRUCTURED Data

    5. Big Costs, Enormous Risks

    6. Information Management Imperatives

    7. Whopping Opportunities

    OPTIONAL OVER LUNCH – ‘Big Data Jargon’ and the ‘Big Data Quick Step’

    Public Sector Group - IRMS

  • * Data becomes BIG DATA

    Gartner on BIG DATA: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

    • Volume

    • Velocity and

    • Variety

  • What is Creating this BIG DATA?

    SMART phones and TABLET computers generate UNSTRUCTURED data (including sound and video files)

    •  SMART phone ownership: 72% of UK population in 2013, expected to be 98% by 2014/5

    •  TABLET ownership: 22% of UK population in 2013, expected to be 50% by 2014/5

    •  More than 50% of adult population using social media (source: UK Office for National Statistics)

    Gartner on BIG DATA: “The size, complexity of formats and speed of delivery exceeds the capabilities of traditional data management technologies”

  • What is Creating this BIG DATA?

    Nordicana 2014: 1st February 2014, Old Truman Brewery, London, E1

    SMART phone photo e-mailed to 5 friends and Facebook: 10 MByte storage

    -  each friend forwards to 2 other friends: additional 20 MByte storage

    -  add iCloud and other back-up storage: >100 MByte storage required for 1 photo!
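
The fan-out arithmetic on this slide can be sketched in a few lines of Python. This is an illustration only: the 2 MByte size per copy is an assumption inferred from the slide's own figures (5 recipients, 10 MByte), the Facebook copy is ignored, and the function name is hypothetical. The last line estimates how many further copies back-up services would need to add before the total passes the slide's 100 MByte mark.

```python
# Assumption: one copy of the photo occupies about 2 MByte
# (inferred from "5 friends = 10 MByte"; not stated on the slide).
MB_PER_COPY = 2

def messaging_storage_mb(friends=5, forwards_per_friend=2, mb_per_copy=MB_PER_COPY):
    """Return (direct, forwarded) storage in MByte for one shared photo."""
    direct = friends * mb_per_copy                           # first hop
    forwarded = friends * forwards_per_friend * mb_per_copy  # second hop
    return direct, forwarded

direct, forwarded = messaging_storage_mb()
subtotal = direct + forwarded

# Smallest number of extra (back-up) copies that pushes the total above 100 MByte.
extra_copies_to_pass_100 = (100 - subtotal) // MB_PER_COPY + 1

print(direct, forwarded, subtotal, extra_copies_to_pass_100)
```

Under these assumptions the two messaging hops account for 30 MByte, so the remainder of the slide's figure has to come from cloud and back-up copies.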

  • What is Creating this BIG DATA?

    Nordicana 2014: 1st February 2014, Old Truman Brewery, London, E1

    3 photos use more storage capacity than a 1979 250 MByte Hard Drive!

    500 SMART phone users taking only 3 photos each would typically require 600 Hard Drives of 1979 vintage to store the 150 GBytes of DATA generated!

    (1000 Megabytes = 1 Gigabyte, 1000 Gigabytes = 1 Terabyte, 1000 Terabytes = 1 Petabyte)
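
The hard drive comparison can be checked with a short calculation. The sketch below assumes 100 MByte of total storage per photo (the lower bound quoted on the previous slide) and the decimal units given above; the function and parameter names are hypothetical.

```python
MB_PER_GB = 1000  # decimal units, as defined on the slide

def drives_needed(users=500, photos_per_user=3, mb_per_photo=100, drive_mb=250):
    """Return (total storage in GByte, number of 250 MByte drives required)."""
    total_mb = users * photos_per_user * mb_per_photo
    drives = -(-total_mb // drive_mb)  # ceiling division
    return total_mb / MB_PER_GB, drives

total_gb, drives = drives_needed()
print(total_gb, drives)
```

With the default values this reproduces the slide's figures of 150 GBytes and 600 drives.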

  • * Data Structures

    Gartner on Dark Data: “DARK DATA are the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).

    …organizations often retain dark data for compliance purposes only. Storing and securing (dark) data typically incurs more expense (and sometimes greater risk) than value.”

  • Public Sector Data in 2014

    It is frequently DISPARATE and

    •  90% of all digital data has been created in the last two years

  • Different Data Structures - These categories do not have 100% support but are possibly the most used.

    Structured Data:

     Data that resides in a fixed field within a record or file is called structured data, for example relational databases and spreadsheets.

    Unstructured Data:

     Data (mainly in the form of text) that can't readily be classified, particularly webpages, PDF files, PowerPoint, emails, blog entries, wikis and word processing documents.

    Semi-Structured Data:

     This is data that is a cross between the two and may also include visual and acoustic files. It has been referred to as the ‘Duck Billed Platypus of Data’.
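
The three categories can be illustrated with a small Python sketch. All sample values are hypothetical, and treating a tagged record with a free-text field as semi-structured is one common reading of the term rather than a definition taken from the slide.

```python
import csv
import io
import json

# Structured: every value sits in a fixed, named field of a record.
structured = io.StringIO("case_id,borough,date\n1001,Camden,2014-02-01\n")
rows = list(csv.DictReader(structured))

# Semi-structured: named tags exist, but fields can vary and may hold free text.
semi_structured = json.loads(
    '{"id": 1001, "tags": ["report"], "notes": "Free text added by the case officer."}'
)

# Unstructured: free text with no predefined fields; meaning must be inferred.
unstructured = "Please send the report we discussed at the meeting last week."

print(rows[0]["borough"])           # a field can be addressed by name
print(sorted(semi_structured))      # the tags are discoverable from the data itself
print(len(unstructured.split()))    # only generic measures such as word count apply
```

The practical difference is how much of the content can be addressed by name before any analysis takes place.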

  • * Data Usage – Scientific Analysis

    Scientific Discoveries Based on Analysis of STRUCTURED DATA

    1960/70s Physics Research at RRE Malvern Using Transistor Computer

    - 24K words of core store (100K Byte Memory)

    - Paper tape for Input and Output of Data

    - Mag tape for Data storage.

    •  ‘White hot heat of technology’ - lab notes and all DATA records very tightly controlled (lasers, LCDs and superconductors)

    •  Failure to lodge DATA with Physics Registry a disciplinary offence.

  • Data Usage – Predictive Analysis

    Predictive Flight Performance Based on STRUCTURED DATA

    1970/80s NASA Space Exploration using General Purpose Computers (GPC)

     - 400K Byte of storage

     - upgraded to 1 M Byte in 1991

    •  BUT physical and memory constraints meant little capacity for mission DATA

    •  HP 41 Programmable Calculators used for mission specific DATA and onboard experiments. Paul Fisher’s anti-gravity pens used to record UNSTRUCTURED DATA!

    1972 GPC as fitted to Challenger

  • Data Usage – Visualisation of Unstructured Text

    Intelligence Operations Based on the Visualisation of UNSTRUCTURED DATA

    1990s Metropolitan Police Intelligence System (MCRAC)

    •  32 independent Borough based Criminal Intelligence Data Bases

    •  Joined up 32 Data Bases as a POC Demo

    •  3 significant investigations solved in an afternoon!

  • Data Usage – Assessment, Visualisation and Sharing of Unstructured Text

    2006 Sensitive Data sharing across Whitehall JIC Members

    ‘Joined Up’ Users of UNSTRUCTURED and Previously DARK DATA

    2014 UNSTRUCTURED DATA was not as DARK as it was in 2001!

  • Data Analytics

    Taylor M. R., August 2011, “Latent Semantic Indexing – Why Conceptual Search is Vital in the Analysis of Large Multi-Lingual Data Sets”. Digital Forensics Magazine, Issue 08, pp. 17-23.

    Key Message: Document Analytics is Complicated!

    Search Engines are not the ANSWER – value is not in the words but in what the words are being used to say!

    •  Dynamic Clustering

    •  Concept-based Categorization

    •  Conceptual Search

    •  Summarization

    •  Near Duplication Detection

    •  Language Analytics

    •  E-mail Analytics
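
Of the techniques listed, near duplication detection is the simplest to sketch. The example below uses word shingles and Jaccard similarity, a common textbook approach; it is not Latent Semantic Indexing and is not presented as the method used in the cited article. The sample documents, the 0.5 threshold and all function names are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.5, k=3):
    """Return pairs of document names whose shingle sets overlap above threshold."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    names = sorted(sets)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if jaccard(sets[x], sets[y]) >= threshold]

docs = {  # hypothetical sample documents
    "v1": "the quarterly budget report is attached for your review",
    "v2": "the quarterly budget report is attached for your approval",
    "other": "minutes of the planning committee meeting held on monday",
}
print(near_duplicates(docs))
```

Comparing every pair is only practical for small collections; it is used here to keep the idea visible, not as a recommendation for large archives.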

  • * Meteoric Rise of UNSTRUCTURED Data

    The Velocity and Volume of Data Creation is Overwhelming

     - dominated by UNSTRUCTURED DATA, the most difficult data to analyze!

     - 90% of all data will be UNSTRUCTURED data types by 2015

  • Some Data is Dark because it is siloed and inaccessible!

    Legacy IT Business Solutions were built to address specific business problems and the policies in force at the time of procurement.

    •  Proprietary

    •  Monolithic

    •  Slow and

    •  Expensive.

    Legacy systems are frequently inflexible and unable to share their siloed data – hence the rise of Dark Data!

    Less than 0.5% is analysed for any purpose.

  • 90% of Public Sector Departments Cannot Access the Right Data.

    Photo Credit: © Mark Richards 2012 / from The Human Face of Big Data.

    In 2012 more than 90% of Local Government Departments could not access & process the correct information to support their current business outcomes!

  • * BIG DATA, Big Costs

    BIG DATA repositories typically contain 85% of an organization's UNSTRUCTURED information resource.

    Documents, particularly e-mails, have a habit of growing exponentially

    •  Data repositories typically contain 40% DUPLICATIONS

    •  Duplicates cost between £5 and £80 per document

    •  Increased risk of accidental distribution

    •  Near Duplicates can create multiple versions of the ‘Truth’
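
A minimal sketch of how exact duplicates might be found and costed, assuming documents are available as raw bytes and that the £5 to £80 range applies to each surplus copy. Grouping by SHA-256 digest is a standard technique chosen for illustration; the file names, contents and function names are hypothetical.

```python
import hashlib
from collections import defaultdict

COST_LOW_GBP, COST_HIGH_GBP = 5, 80  # per-duplicate range quoted on the slide

def find_exact_duplicates(documents):
    """Group document names by the SHA-256 digest of their content."""
    groups = defaultdict(list)
    for name, content in documents.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

def duplicate_cost_range(groups):
    """Count surplus copies (all but one per group) and price them low and high."""
    surplus = sum(len(g) - 1 for g in groups)
    return surplus, surplus * COST_LOW_GBP, surplus * COST_HIGH_GBP

documents = {  # hypothetical repository: 2 of 5 files (40%) are surplus copies
    "report.doc": b"Annual report 2013",
    "report_copy.doc": b"Annual report 2013",
    "minutes.doc": b"Minutes of meeting",
    "minutes_backup.doc": b"Minutes of meeting",
    "agenda.doc": b"Agenda for next meeting",
}
groups = find_exact_duplicates(documents)
print(groups, duplicate_cost_range(groups))
```

Hashing only catches byte-identical copies; near duplicates, which the slide warns can create multiple versions of the ‘Truth’, need a similarity measure instead.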

  • Why are Duplicates so Damaging?

    Because holding duplicate records is like driving a car with a DIRTY WINDSCREEN:

    Duplicates:

    •  Devalue your analysis

    •  Slow down system response

    •  Create unnecessary storage costs

    •  Compromise compliance with information legislation

    •  Cause unintentional policy violations.

  • BIG DATA, Enormous Risks

    BIG DATA repositories create Enormous Risks – mobile devices contain >80% of an organization's IP.

    Some security related issues:

    •  Big Data sources can create policy noncompliance.

    •  Is your data stored securely and who is authorised to access it?

    •  Who has accessed it already?

    •  Is the data correctly Protectively Marked?

  • * Information Management Imperatives

    Photo Credit: © Jason Grow 2012

    •  Identify the information required to improve business outcomes – develop an Information Strategy.

    •  Manage the data they have and who has access to it – implement effective Information Management.

    •  Leverage Unstructured Data through Big Data analytics.

    All of the above simultaneously!

  • * Whopping Opportunities for the Public Sector

    Public Sector managers that leverage their ‘BIG DATA’ assets will be tomorrow’s leaders!

    •  BIG DATA tools provide powerful insights and the ability to predict trends.

    •  The measurable outcomes for these Public Sector Departments are increased efficiency and effectiveness.

    •  Whopping opportunities arise from combining and analyzing data from multiple sources so you can take the right action, at the right time and in the right place.

  • Public Sector Opportunities

    Identify process ‘risk and pain points’ that would be removed by leveraging your Big (Disparate & Dark) Data.

    •  Manage your Big DATA

        •  Remove duplicates

        •  Remove non-compliant files

    •  Protect your Big DATA

        •  Check protective marking

        •  Re-label if necessary

        •  Move sensitive Data to secure storage

  • Public Sector Opportunities

    •  FREEDOM OF INFORMATION (FOI). Is there a business case to automate the FOI process?

    •  FRAUD. Would it be beneficial to identify multiple identities and fraudulent claim patterns?

    “Think Big, Start Small, Deliver Benefits Incrementally.”

  • THANK YOU

    Please feel free to contact me if you would like to discuss aspects of this presentation.

    Dr Michael R. Taylor

    Managing Consultant

    Matrix Connexions

    michael.taylor@matrixconnexions

    Tel: +44(0) 7595 359 506