Introduction to Database Systems

CISC437/637, Lecture #1
Ben Carterette
2/9/10
Copyright © Ben Carterette

Physical and logical organization of databases. Data retrieval languages, relational database languages, security and integrity, concurrency, distributed databases.



Database Systems

• The overview in 5 Ws (and one H):
  – What is a database? What is a database management system (DBMS)?
  – Why use databases? Why study them?
  – Who works with databases?
  – How does a DBMS work?
  – Where and when did databases originate?


What is a Database?

• A database is a collection of data
  – Usually large quantities of interrelated data
    • E.g. student records, faculty records, courses, classrooms, payrolls, …
• A database management system (DBMS) is a software system designed to store and manage data


Why Use a DBMS?

“So a bunch of text files on disk can be a database. I’ll just process them with Python. Why do I need to learn about DBMS software?”

• Data too large to fit in memory; files too big for random access on disk
• Arbitrarily complex queries that must be answered quickly
• Many users accessing data concurrently
• Some users need different access permissions


Why Use a DBMS?

• Data independence
• Efficient access
• Integrity and security
• Access administration
• Concurrent access
• Application development time


Why Not Use a DBMS?

• DBMSs are large, complex programs designed for very general data needs and workloads; not always optimal for specialized tasks
• Application may need to manipulate data in ways not supported by DBMS
• Security, concurrent access, crash recovery may not be critical
• Example: web search


Why Study Databases?

• Multibillion-dollar industry, second only to operating systems
• Databases form backbone of many information-centric applications
  – Using computation to create and understand information
• Implementing and understanding DBMS incorporates knowledge from every area of CS
  – Systems, theory, artificial intelligence


Applications of Databases

• Electronic commerce and banking
  – Amazon, eBay, PayPal
  – Integrating vast catalogs and accounts, high security
• Social networking
  – Facebook, Twitter
  – Analyzing flow of information through large, tightly-connected networks


Applications of Databases

• Sensor networks
  – GPS, RFID, …
  – Often supports mission-critical applications
  – Response to failures and trust are important
• Bioinformatics, health informatics
  – Gene Ontology, PubMed, …
  – Requires data integration, pattern matching, approximate matching, ranking, automatic inference


Who Works With Databases?

• DBMS programmers actually implement the DBMS software
• Database administrators design storage requirements, handle security, ensure graceful recovery, tune database performance
• Applications programmers write software that interacts with a database
• End users use the software written by applications programmers


How Does a DBMS Work?

• This is the focus of the course
• Today: a brief overview of the topics that will be covered

1. Data Models
2. Database Queries
3. Transaction Management
4. DBMS Structure


Data Models

• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data using a given model
• The relational data model is the most commonly used
  – Relations (tables of records) are the main concept
  – Every relation has a schema that describes the record fields/table columns
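To make "relation" and "schema" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Students relation, its fields, and the sample records are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the Students relation and its fields are hypothetical.
conn = sqlite3.connect(":memory:")

# The schema names the relation and describes each field (column) and its type.
conn.execute("""
    CREATE TABLE Students (
        sid   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        gpa   REAL
    )
""")

# Each record (row) stored in the relation follows the schema.
conn.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Grace", 3.7)],
)

# PRAGMA table_info reports the schema: one row per column (index 1 = name, 2 = type).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]
records = conn.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()
```

Here `columns` holds the schema (field names and types) and `records` holds the table's rows, which mirrors the split between a relation's description and its contents.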


Levels of Abstraction

• Physical schema describes the specific files used to store a relation on disk
• Conceptual schema defines the logical structure of relations
• Views or external schema describe how users see the data


[Figure: levels of abstraction, showing View 1, View 2, and View 3, the Conceptual Schema, and the Physical Schema]


Data Independence

• Using an external schema does not require knowledge of conceptual schema
  – Logical data independence
• Using a conceptual schema does not require knowledge of physical schema
  – Physical data independence
• In other words, applications are insulated from how data is structured and stored
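The two kinds of independence can be sketched with sqlite3. In this illustration the HonorRoll view, the Students relation, the added email column, and the index name are all hypothetical; the point is only that the application function keeps working unchanged after a logical change and a physical change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: a hypothetical Students relation.
conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(1, "Ada", 3.9), (2, "Grace", 3.2)])

# External schema: a view exposing only what one group of users should see.
conn.execute("CREATE VIEW HonorRoll AS SELECT name FROM Students WHERE gpa >= 3.5")

def honor_roll(connection):
    """Application code: it queries the view and never mentions Students."""
    rows = connection.execute("SELECT name FROM HonorRoll ORDER BY name")
    return [name for (name,) in rows]

before = honor_roll(conn)

# Logical change: the conceptual schema gains a column; the view hides it.
conn.execute("ALTER TABLE Students ADD COLUMN email TEXT")
after_logical_change = honor_roll(conn)

# Physical change: an index alters how records are located, not what they mean.
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")
after_physical_change = honor_roll(conn)
```

All three calls return the same answer, even though the table's structure and its storage-level access path both changed between them.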


Database Queries

• Queries are questions asked of the data
• A query language specifies how queries are posed in a specific data model
  – The language consists of keywords and operators for manipulating relations, called the data manipulation language (DML)
• Formulating a query does not require knowledge of physical schema
• Allows fast application development
  – Embed DML in high-level language like Java, C, Python
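As an illustration of embedding DML in a high-level language, the sketch below issues SQL statements from Python through the standard sqlite3 module. The Courses relation, its rows, and the credit values are made-up sample data, not facts about any real catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid TEXT PRIMARY KEY, title TEXT, credits INTEGER)")

# DML statements (INSERT, UPDATE, SELECT) manipulate the relation's records.
# The rows below are hypothetical sample data.
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("CISC437", "Database Systems", 3),
    ("CISC361", "Operating Systems", 3),
    ("CISC106", "General Computer Science", 4),
])
conn.execute("UPDATE Courses SET credits = ? WHERE cid = ?", (4, "CISC437"))

def courses_with_credits(connection, credits):
    """Pose a query from host-language code; '?' placeholders carry Python values."""
    rows = connection.execute(
        "SELECT cid FROM Courses WHERE credits = ? ORDER BY cid", (credits,)
    )
    return [cid for (cid,) in rows]

four_credit = courses_with_credits(conn, 4)
```

The query says what is wanted (courses with a given number of credits) and nothing about files, pages, or indexes, which is why formulating it needs no knowledge of the physical schema.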


Concurrency Control

• Many databases are used by multiple users concurrently
  – Each user is manipulating relations in different ways
  – Simultaneous uses can result in inconsistencies
    • E.g. one is looking up vacancies while another is making a reservation
• DBMS ensures that these problems don’t happen
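The kind of inconsistency described above can be reproduced without real threads. The following is a small deterministic simulation, not DBMS code: the `run` function, the transaction names T1 and T2, and the read/write steps are assumptions made for illustration. Each booking transaction reads the vacancy count and later writes back a decremented value.

```python
def run(schedule, vacancies):
    """Apply read/write steps of booking transactions in schedule order.

    Returns (remaining vacancies, number of bookings made).
    """
    seen = {}      # the value each transaction last read
    bookings = 0
    for txn, action in schedule:
        if action == "read":
            seen[txn] = vacancies
        elif action == "write" and seen[txn] > 0:
            # The decision uses the (possibly stale) value read earlier.
            vacancies = seen[txn] - 1
            bookings += 1
    return vacancies, bookings

# One room left, two users trying to reserve it.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_result = run(serial, vacancies=1)
interleaved_result = run(interleaved, vacancies=1)
```

In the serial order only one booking succeeds. In the interleaved order both transactions read the same vacancy before either writes, so two bookings are recorded for a single room.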


Transaction Management

• A transaction is an atomic sequence of database actions (reads and writes)
• The complete execution of each transaction must leave the database in a consistent state if the database is consistent when it begins
  – Consistency means no logical conflicts
• User/application formulates integrity constraints for the DBMS to enforce
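A minimal sketch of an application-defined integrity constraint enforced by the DBMS, using sqlite3. The Accounts relation, the owners, and the `transfer` helper are hypothetical. The constraint says a balance may never be negative; a transaction that would violate it is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integrity constraint stated by the application: balances are never negative.
conn.execute("""
    CREATE TABLE Accounts (
        owner   TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(connection, src, dst, amount):
    """Move amount from src to dst as one transaction; True if it committed."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE owner = ?", (amount, dst))
            connection.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE owner = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = transfer(conn, "alice", "bob", 30)    # leaves both balances valid
rejected = transfer(conn, "alice", "bob", 500)   # would make alice's balance negative
balances = dict(conn.execute("SELECT owner, balance FROM Accounts"))
```

The rejected transfer had already credited bob before the debit failed, yet the final balances show no trace of it: the database moves from one consistent state to another or does not move at all.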


Scheduling Transactions

• DBMS ensures that execution of {T1, …, Tn} is equivalent to serial execution T1’, …, Tn’
  – Locks: before reading or writing, a transaction requests a lock on an object, and does nothing until DBMS grants lock. Locks are released after execution.
  – Use locks to force ordering of unordered transactions.
  – Deadlock: Ti has lock on object A and needs lock on object B. Tj has lock on object B and needs lock on object A.
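The slide does not say how a deadlock is noticed. One standard technique is to build a waits-for graph (an edge from each waiting transaction to the transaction holding the lock it needs) and look for a cycle. The sketch below assumes that approach; `find_deadlock`, `holds`, and `wants` are hypothetical names, and each transaction waits for at most one object to keep things short.

```python
def find_deadlock(waits_for):
    """Return the transactions on a cycle in the waits-for graph, or None."""
    visited = set()

    def visit(txn, path):
        if txn in path:                  # came back to a transaction on the current path
            return path[path.index(txn):]
        if txn in visited:               # already fully explored, no cycle through it
            return None
        visited.add(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other, path + [txn])
            if cycle:
                return cycle
        return None

    for txn in waits_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None

# The situation from the slide: Ti holds A and needs B, Tj holds B and needs A.
holds = {"A": "Ti", "B": "Tj"}     # object -> transaction holding its lock
wants = {"Ti": "B", "Tj": "A"}     # transaction -> object it is waiting for
waits_for = {t: [holds[obj]] for t, obj in wants.items() if holds.get(obj, t) != t}

deadlock = find_deadlock(waits_for)
no_deadlock = find_deadlock({"Ti": ["Tj"]})   # Ti waits, but Tj is not blocked
```

Once a cycle is found, aborting any one transaction on it breaks the cycle and lets the others proceed.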


Atomicity

• “All or nothing”: an atomic transaction is one that either completely finishes or does not happen at all
• DBMS needs to maintain atomicity even when it crashes in the middle of transactions
• Use a log to keep track of actions DBMS takes to execute transaction
  – Write-ahead log (WAL) enables this
• Transaction isn’t done until all of its actions are done


Write-Ahead Log

• The log consists of the following:
  – For write actions, the old data and the new data
  – A flag indicating whether the transaction was committed or aborted
• Transactions can be undone when commit not present
• Deadlocks can be resolved by aborting one transaction and allowing the other to continue
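A toy, in-memory sketch of the undo idea: every write is logged with its old and new value before the data changes, and recovery restores old values for transactions that have no commit record. The `write` and `recover` functions and the record layout are assumptions for illustration; a real DBMS also forces log records to disk and handles redo, which this sketch leaves out.

```python
db = {"A": 100, "B": 50}
log = []

def write(txn, key, new):
    """Write-ahead: append the log record first, then change the data."""
    log.append({"type": "write", "txn": txn, "key": key, "old": db[key], "new": new})
    db[key] = new

def commit(txn):
    log.append({"type": "commit", "txn": txn})

def recover(database, records):
    """Undo writes of transactions with no commit record, newest first."""
    committed = {rec["txn"] for rec in records if rec["type"] == "commit"}
    for rec in reversed(records):
        if rec["type"] == "write" and rec["txn"] not in committed:
            database[rec["key"]] = rec["old"]   # restore the old value from the log
    return database

# T1 runs to completion; T2 is interrupted (say, by a crash) before committing.
write("T1", "A", 70)
write("T1", "B", 80)
commit("T1")
write("T2", "A", 0)

state_before_recovery = dict(db)
recover(db, log)
```

After recovery, T1's committed changes remain and T2's partial write to A is gone, which is the "all or nothing" guarantee from the atomicity slide.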


DBMS Structure

• Layered architecture, each layer only aware of layer below it


[Figure: DBMS layers: Query optimization & execution; Relational operators; Files and access methods; Buffer management; Disk space management; DB. Shown alongside the layers: Recovery manager, Transaction manager, Lock manager, Concurrency control.]


When and Where

• Charles Bachman designed the Integrated Data Store at General Electric in the 1960s
• Basis for the network data model, a graph-based representation designed for exploration rather than querying
• Turing Award 1973, the first awarded for database work


When and Where

• Edgar Codd proposed relational data model in 1970 at IBM
• Quickly became the basis of commercial systems; strong theoretical foundation developed
• Turing Award 1981


When and Where

• Jim Gray made fundamental contributions to transaction management in the 80s and 90s
• Allowed DBMSs to scale to huge applications with thousands or millions of users
• Turing Award 1998

Summary

• DBMSs are used to maintain and query large amounts of data
• They allow concurrent access, recovery from failure, fast application development, security
• Levels of abstraction mean that one can work on one subproblem without knowing about others
• Huge industry and huge research area in CS
