39
Unlocking Hadoop for Your Rela4onal DB Kathleen Ting | @kate_ting Technical Account Manager, Cloudera | Sqoop PMC Member BigData.be April 4, 2014

Unlocking Hadoop!for!Your!Relaonal!DB! - BigData.bebigdata.be/wp-content/uploads/2014/04/140404... · Whatis! Sqoop? • Apache!TopOLevel!Project • SQl!to!hadOOP! • Tool!to!transfer!datafrom!

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

           Unlocking  Hadoop  for  Your  Rela4onal  DB  

         Kathleen Ting | @kate_ting Technical Account Manager, Cloudera | Sqoop PMC Member BigData.be April 4, 2014      

Who  Am  I?  

•  Started  3  yr  ago  as  1st  Cloudera  Support  Eng  •  Now  manages  Cloudera’s  2  largest  customers  

•  Sqoop  CommiJer,  PMC  Member  •  Co-­‐Author  of  the  Apache  Sqoop  Cookbook  

What  is  Sqoop?  

•  Apache  Top-­‐Level  Project  •  SQl  to  hadOOP  •  Tool  to  transfer  data  from  rela4onal  databases  

•  Teradata,  MySQL,  PostgreSQL,  Oracle,  Netezza  

•  To/From  Hadoop  ecosystem  •  HDFS  (text,  sequence  file),  Hive,  HBase,  Avro  

3

Why  Sqoop?  

•  Efficient/Controlled  resource  u4liza4on  •  Concurrent  connec4ons,  Time  of  opera4on  

•  Datatype  mapping  and  conversion  •  Automa4c,  and  User  override  

• Metadata  propaga4on  •  Sqoop  Record  •  Hive  Metastore  •  Avro  

Agenda  

Sqoop  1  •  Sqoop  1  Architecture  •  Sqoop  1  Command  Line  •  Sqoop  1  Examples  •  Sqoop  1  Challenges  •  Troubleshoo4ng  Sqoop  1  •  Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Sqoop  2  •  Sqoop  2  Architecture  •  Sqoop  2  Design  Goals  •  Sqoop  2  UI  in  Hue  Resources  

Agenda  

Sqoop  1  •  Sqoop  1  Architecture  •  Sqoop  1  Command  Line  •  Sqoop  1  Examples  •  Sqoop  1  Challenges  •  Troubleshoo4ng  Sqoop  1  •  Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Sqoop  2  •  Sqoop  2  Architecture  •  Sqoop  2  Design  Goals  •  Sqoop  2  UI  in  Hue  Resources  

Sqoop  1  Architecture  

7

Sqoop  1  Command  Line  

sqoop TOOL PROPS ARG [-- EXTRA] •  TOOL:  import,  export  •  PROPS

•  Hadoop  (java)  proper4es  •  -Dwhatever.whenever=yes

•  ARG •  Generic  SQOOP  arguments  •  --table, --connect,  ...  

•  EXTRA •  connector  specific  •  --schema (PostgreSQL  and  Microsoa  SQL  Server)  

Sqoop  1  Example  

sqoop import \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop --password sqoop \ --table cities

sqoop export \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop --password sqoop \ --table cities \ --export-dir /temp/cities

Sqoop  1  Challenges  

•  Cryp4c,  contextual  command  line  arguments  •  Security  concerns  •  Type  mapping  is  not  clearly  defined  •  Client  needs  access  to  Hadoop  binaries/configura4on  and  database  

•  JDBC  model  is  enforced  

10

Troubleshoo4ng  Sqoop  1  

•  Versions:  Sqoop,  Hadoop,  OS,  JDBC  •  Console  log  aaer  running  with  the  --verbose flag  

•  Capture  the  en4re  output  via  sqoop import … &> sqoop.log •  En4re  Sqoop  command  including  the  op4ons-­‐file  if  applicable  •  Expected  output  and  actual  output  •  Table  defini4on  •  Small  input  data  set  that  triggers  the  problem  

•  Especially  with  export,  malformed  data  is  oaen  the  culprit  •  Hadoop  task  logs  

•  Oaen  the  task  logs  contain  further  informa4on  describing  the  problem  •  Permissions  on  input  files  

Troubleshoo4ng  Sqoop  1  

Imported  table  has  more  rows  than  source  table?  •  Data  contains  char  used  as  Hive’s  delimiters  

•  Clean  up  data  •  --hive-drop-import-delims

•  Removes  \n, \t, and \01 char

•  --hive-delims-replacement “SPECIAL” •  Replaces  \n, \t, and \01  char  with  string  SPECIAL

•  Not  restricted  to  Hive  -­‐  any  import  job  using  text  files  •  Ensure  output  files  have  one  line  per  imported  row  

Agenda  

Sqoop  1  •  Sqoop  1  Architecture  •  Sqoop  1  Command  Line  •  Sqoop  1  Examples  •  Sqoop  1  Challenges  •  Troubleshoo4ng  Sqoop  1  •  Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Sqoop  2  •  Sqoop  2  Architecture  •  Sqoop  2  Design  Goals  •  Sqoop  2  UI  in  Hue  Resources  

Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Protec4ng  Your  Password  

sqoop import \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop \ --table cities \ -P

sqoop import \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop \ --table cities \ --password-file my-sqoop-password

Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Sqoop  Works  on  CLI  Not  in  Oozie  

Character parameter '|' has multiple characters; only the first will be used.

Got error creating database manager: java.io.IOException:

No manager for connect string: "jdbc:teradata...”

Sqoop  Works  on  CLI  Not  in  Oozie  

sqoop import --password "spEci@l\$" \ –connect 'jdbc:x:/yyy;db=sqoop’

•  Remove  all  escaping  that  you’ve  added  for  the  shell  •  Use  <arg>  vs  <command>  tags  as  content  is  considered  to  be  one  parameter  

•  Put  all  -­‐D  parameters  into  configura4on  sec4on  •  Install  driver  into  workflow’s  lib/  directory  or  shared  ac4on  library  /user/oozie/share/lib/sqoop/  

Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Choosing  Proper  Connector  

•  JDBC  driver  is  dependency  for  all  three  connectors  

•  Sqoop  automa4cally  chooses  most  op4mal  connector  (OraOoop,  built-­‐in,    

       Generic  JDBC  Connector)  •  Or  explicitly  chose:    --connection-manager com.quest.oraoop.OraOopConnManager

Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Overriding  Type  Mapping  

-­‐-­‐map-­‐column-­‐java  parameter  •  comma  separated  list  of  key-­‐value  pairs  

•  key  =  exact  column  name  •  value  =  target  Java  type    

sqoop import \

--map-column-java \

c1=Float,c2=String,c3=String ...

Agenda  

Sqoop  1  •  Sqoop  1  Architecture  •  Sqoop  1  Command  Line  •  Sqoop  1  Examples  •  Sqoop  1  Challenges  •  Troubleshoo4ng  Sqoop  1  •  Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Sqoop  2  •  Sqoop  2  Architecture  •  Sqoop  2  Design  Goals  •  Sqoop  2  UI  in  Hue  Resources  

Sqoop  2  Architecture  

25

Sqoop  2  Design  Goals  

•  Security  and  Separa4on  of  Concerns  •  Role  based  access  and  use  

•  Ease  of  extension  •  No  low-­‐level  Hadoop  knowledge  needed    •  No  func4onal  overlap  between  Connectors  

•  Ease  of  Use  •  Uniform  func4onality  •  Domain  specific  interac4ons  

Sqoop  2  UI  in  Hue  

•  Troubleshoo4ng  •  sqoop.log  file  is  located  in  @LOGDIR@  and  the  rest  should  be  in  server/logs/*  

•  Look  for  catalina.out,  catalina.log,  localhost-­‐*.log  

28

29

30

31

32

33

34

35

36

37

Agenda  

Sqoop  1  •  Sqoop  1  Architecture  •  Sqoop  1  Command  Line  •  Sqoop  1  Examples  •  Sqoop  1  Challenges  •  Troubleshoo4ng  Sqoop  1  •  Common  Sqoop  1  Issues  

•  Protec4ng  Your  Password  •  Sqoop  Works  on  CLI  Not  in  Oozie  •  Choosing  Proper  Connector  •  Overriding  Type  Mapping  

Sqoop  2  •  Sqoop  2  Architecture  •  Sqoop  2  Design  Goals  •  Sqoop  2  UI  in  Hue  Resources  

Resources  

39

Sqoop 2 http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop2/

Sqoop 1