18
Unified Analytics through Apache Lens @Inmobi Amareshwari Sriramadasu

UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Unified  Analytics  through  Apache  Lens @Inmobi

Amareshwari  Sriramadasu

Page 2: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Agenda

Introduction  to  Apache  

LensArchitecture

How  is  it  used  at  Inmobi?

Challenges  faced Roadmap

Page 3: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Analytics  Evolution

Analytics  2.0 Analytics  3.0

Data  warehouse

HadoopNOSQL  DBs

RDBMS

Data  warehouse

Hadoop  +  Spark  +  Data  warehouse  +  NOSQL  DBs

Analytics  1.0

Page 4: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Apache  Lens

• Platform  to  enable  multi-­‐dimensional  queries  in  a  unified  way  over  datasets  stored  in  multiple  warehouses

• OLAP  Cube  abstraction• Data  discovery  by  providing  single  metadata  layer• Unified  access  to  data  by  integrating  Hive  with  other  

traditional  warehouses• Queries  get  pushed  to  where  data  resides

Page 5: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Lens  Architecture

Page 6: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Where  does  it  fix  in  ecosystem?

Page 7: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Use  cases  @Inmobi

• Operational– Reporting– Close  of  books

• Decision  making– Real  time  decisions– Recommendations

• Explorative– Adhocanalysis– Debugging

Page 8: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

How  is  Lens  used  at  Inmobi?

Unified  schema  across  HDFS  and  columnar  DB  through  OLAP

Adhoc querying  on  unified  data

Real  time  data  queryable with  Druid  driver  (In  progress)

Experiment  with  Hive  and  SparkSQL in  production  (In  progress)

Scheduling  reports  (Soon)

Defacto data  access  layer  at  Inmobi (Soon)

Page 9: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

OLAP  Data  model

Storage Cube

Fact  Table• Physical  Fact  tables

Dimension

Dimension  Table• Physical  Dimension  tables

Page 10: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Example  query

cube  select  cityname,  revenue  from  testcube where  timerange_in(dt,  '2015-­‐03-­‐10-­‐03’,  '2015-­‐03-­‐12-­‐03’)

SELECT  (citytable.name),  sum((testcube.revenue)) FROM  c2_testfact  testcubeINNER  JOIN  c1_citytable  citytable ON  ((testcube.cityid)=  (citytable.id))WHERE  ((  testcube.dt='2015-­‐03-­‐10-­‐03')  OR  (testcube.dt='2015-­‐03-­‐10-­‐04')  OR  (testcube.dt='2015-­‐03-­‐10-­‐05')  OR  (testcube.dt='2015-­‐03-­‐10-­‐06')  OR  (testcube.dt='2015-­‐03-­‐10-­‐07')  OR  (testcube.dt='2015-­‐03-­‐10-­‐08')  OR  (testcube.dt='2015-­‐03-­‐10-­‐09')  OR  (testcube.dt='2015-­‐03-­‐10-­‐10')  OR  (testcube.dt='2015-­‐03-­‐10-­‐11')  OR  (testcube.dt='2015-­‐03-­‐10-­‐12')  OR  (testcube.dt='2015-­‐03-­‐10-­‐13')  OR  (testcube.dt='2015-­‐03-­‐10-­‐14')  OR  (testcube.dt='2015-­‐03-­‐10-­‐15')  OR  (testcube.dt='2015-­‐03-­‐10-­‐16')  OR  (testcube.dt='2015-­‐03-­‐10-­‐17')  OR  (testcube.dt='2015-­‐03-­‐10-­‐18')  OR  (testcube.dt='2015-­‐03-­‐10-­‐19')  OR  (testcube.dt='2015-­‐03-­‐10-­‐20')  OR  (testcube.dt='2015-­‐03-­‐10-­‐21')  OR  (testcube.dt='2015-­‐03-­‐10-­‐22')  OR  (testcube.dt='2015-­‐03-­‐10-­‐23')  OR  (testcube.dt='2015-­‐03-­‐11')  OR  (testcube.dt='2015-­‐03-­‐12-­‐00')  OR  (testcube.dt='2015-­‐03-­‐12    -­‐01')  OR  (testcube.dt='2015-­‐03-­‐12-­‐02')   )AND  (citytable.dt =  'latest')  GROUP  BY(citytable.name)

Page 11: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Unifying  schema

Multiple  expressions

De-­‐normalized  variables

Join  chains

Page 12: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

JoinChains explained

SalesCube  :adv_id,adv_cidpub_id,pub_cidrq_cid,

Impressions_served

Country:   id,  name

Advertiser   :id,  cid

Publisher:   id,cid

adv_id=advertizer.id

pub_id=publisher.id

rq_cid=  country.id

Publisher.cid=  country.id

Advertiser.cid=  country.id

pub_cid=country.id

adv_cid=country.id

Page 13: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Other  challenges

Partition  timeline

Query  submission  throttling

DB  Resources

Query  statistics

Bridge  tables

Page 14: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Bridge  Tables  explained

ID Name Gender

1 Ram M

2 John M

3 Suma F

User

ID Name

1 Football

2 Cricket

3 Basketball

SportsUser  ID Sport ID

1 1

1 2

2 1

2 2

2 3

User  Interests

User  Id Revenue

1 100

2 50

Fact  Table User’s  sport  interest

Revenue

Football 150

Cricket 150

BasketBall 50

User’s  sport  interest

Revenue

Football,  Cricket 100

Football,  Cricket,  BasketBall

50

Report1

Report2

Page 15: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Lens  features  added  in  previous  releases

• Zeppelin  integration• Pluggable  user  rewriter• Elastic  search  driver• Query  submission  throttling• Saved  queries  and  parameterization• Instrumentation  for  all  api

Page 16: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Roadmap

Immediate

Move  to  Apache  Hive  dependencyAbility  to  load  multiple  instances  of  same  driver  classDruid  driver

Medium  Term

Distributed  server  deploymentEstimate  query  execution   timesAuthorization  across  all  services  and  storagesAdd  scheduler   serviceMake  it  suitable  to  integrate  with  BI  tools

Long  term

Query  cachingMetastore UIAdministrator  consoleAutomatic  roll  up  suggestions  on  hot  datasets

Page 17: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Stay  Involved

Its  a  TLP  now!

Web  site

• http://lens.apache.org/

Source  repo  

• https://git-­‐wip-­‐us.apache.org/repos/asf/lens.git

Mailing  lists

[email protected][email protected]

Page 18: UnifiedAnalytics’through Apache’ Lens@Inmobi · Druid’driver Medium’Term Distributed’server’deployment Estimate’query’execution’times Authorization’across’all’services’

Thank  You!

• Questions?