LING 408/508: Programming for Linguists
Lecture 2, August 26th

(sandiway/ling508-15/lecture2.pdf)



Today’s  Topics  

• continuing on from last time …
• Homework 1


Administrivia
• No class on:
 – Monday September 7th (Labor Day)
 – Wednesday November 11th (Veterans Day)
 – the week after September 11th (out of town), plus Monday 21st
 – Monday October 12th


Introduction: data types
• What if you want to store even larger numbers than 32 bits?
 – Binary Coded Decimal (BCD)
 – 1 byte can code two digits (0-9 requires 4 bits)
 – 1 nibble (4 bits) codes the sign (+/-), e.g. hex C/D

[Figure: BCD nibbles, bit weights 2^3 2^2 2^1 2^0]
 0 = 0000, 1 = 0001, 9 = 1001
 2 0 1 4: 2 bytes (= 4 nibbles)
 + 2 0 1 4: 2.5 bytes (= 5 nibbles)
 sign nibbles: C = 1100 credit (+), D = 1101 debit (-)
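A minimal Python sketch of the BCD layout above, assuming the sign nibble comes first as in the slide's "+ 2 0 1 4" picture (many real packed-decimal formats, IBM's among them, put the sign in the last nibble instead). The function names `bcd_nibbles` and `bcd_hex` are hypothetical.

```python
def bcd_nibbles(n: int) -> list[str]:
    """Return the BCD nibbles for n, sign nibble first (as drawn on the slide)."""
    # Sign nibble: hex C (1100) = credit (+), hex D (1101) = debit (-)
    sign = "1101" if n < 0 else "1100"
    # Each decimal digit 0-9 fits in one 4-bit nibble
    digits = [format(int(d), "04b") for d in str(abs(n))]
    return [sign] + digits


def bcd_hex(n: int) -> str:
    """Write the same nibbles as hex digits, one hex digit per nibble."""
    return "".join(format(int(nib, 2), "X") for nib in bcd_nibbles(n))


if __name__ == "__main__":
    nibbles = bcd_nibbles(2014)
    print(nibbles)                     # ['1100', '0010', '0000', '0001', '0100']
    print(bcd_hex(2014))               # C2014
    print(len(nibbles) / 2, "bytes")   # 2.5 bytes
```

Reading the hex form "C2014" makes the design visible: each decimal digit stays readable as a hex digit, at the cost of wasting the codes A-F in every digit nibble.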


Introduction: data types

• Typically, 64 bits (8 bytes) are used to represent floating point numbers (double precision)
 – c = 2.99792458 × 10^8 (m/s)
 – coefficient: 52 bits (implied leading 1, therefore treat as 53)
 – exponent: 11 bits (not 2's complement: unsigned with bias 2^(11-1) - 1 = 1023)
 – sign: 1 bit (+/-)

C: float, double

(Wikipedia)

x86 CPUs have a built-in floating point coprocessor (x87) with 80-bit registers

e.g. probabilities
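To see the three fields of a double directly, here is a small sketch using Python's standard `struct` module to reinterpret the 8 bytes of a double as a 64-bit unsigned integer. The helper names `double_fields` and `from_fields` are hypothetical, and `from_fields` assumes a normal number (exponent field neither all 0s nor all 1s); it does not model the x87 80-bit format.

```python
import struct

BIAS = 2 ** (11 - 1) - 1  # 1023 for the 11-bit exponent


def double_fields(x: float) -> tuple[int, int, int]:
    """Split a 64-bit IEEE 754 double into (sign, exponent, fraction) fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias
    fraction = bits & ((1 << 52) - 1)    # 52 bits; the leading 1 is implied
    return sign, exponent, fraction


def from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal double from its fields."""
    significand = 1 + fraction / 2 ** 52  # restore the implied leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - BIAS)


if __name__ == "__main__":
    c = 2.99792458e8
    s, e, f = double_fields(c)
    print(s, e, e - BIAS, format(f, "052b"))
    print(from_fields(s, e, f) == c)     # True
```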


Introduction: data types

• Next time, we'll talk about the representation of characters (letters, symbols, etc.)


Example  1  

• Recall the speed of light:
• c = 2.99792458 × 10^8 (m/s)

1. Can a 4-byte integer be used to represent c exactly?
 – 4 bytes = 32 bits
 – 32 bits in 2's complement format
 – Largest positive number is 2^31 - 1 = 2,147,483,647
 – c = 299,792,458 is smaller, so yes
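A quick check of Example 1 in Python, as a sketch: `struct.pack(">i", ...)` stores a value as a big-endian 4-byte two's complement integer and raises `struct.error` if it does not fit. The names `fits_int32`, `INT32_MIN` and `INT32_MAX` are hypothetical helpers for illustration.

```python
import struct

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1   # 2,147,483,647


def fits_int32(n: int) -> bool:
    """True if n fits in a 4-byte two's complement integer."""
    return INT32_MIN <= n <= INT32_MAX


c = 299_792_458  # speed of light in m/s (an exact integer)

# Round-trip c through 4 bytes (">i" = big-endian signed 32-bit)
packed = struct.pack(">i", c)
(unpacked,) = struct.unpack(">i", packed)
print(fits_int32(c), len(packed), unpacked == c)   # True 4 True

# One past the largest positive value no longer fits
try:
    struct.pack(">i", INT32_MAX + 1)
except struct.error:
    print("2**31 does not fit in 4 bytes")
```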


Example  2  

• Recall the speed of light:
• c = 2.99792458 × 10^8 (m/s)

2. How much memory would you need to encode c using BCD notation?
 – 9 digits
 – each digit requires 4 bits (a nibble)
 – BCD notation includes a sign nibble
 – total is 9 + 1 = 10 nibbles = 5 bytes
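The counting in Example 2 as a tiny sketch (the function name `bcd_size` is hypothetical): one nibble per decimal digit plus one sign nibble, two nibbles per byte.

```python
import math


def bcd_size(n: int) -> tuple[int, float]:
    """Return (nibbles, bytes) needed for n in BCD with one sign nibble."""
    digits = len(str(abs(n)))     # one nibble per decimal digit
    nibbles = digits + 1          # plus the sign nibble
    return nibbles, nibbles / 2   # two nibbles per byte


c = 299_792_458
print(bcd_size(c))                    # (10, 5.0)

# If storage comes in whole bytes (an assumption about the layout),
# an odd nibble count rounds up, e.g. +2014 needs 3 bytes rather than 2.5
print(math.ceil(bcd_size(2014)[1]))   # 3
```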

 


Example  3  

• Recall the speed of light:
• c = 2.99792458 × 10^8 (m/s)

3. Can the 64-bit floating point representation (double) encode c without loss of precision?
 – Recall significand precision: 53 bits (52 explicitly stored)

 – 2^53 - 1 = 9,007,199,254,740,991
 – almost 16 digits, and c has only 9, so yes
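A sketch of Example 3 using `struct` to store a value in 8 bytes and read it back; `through_double` is a hypothetical helper name. The last line shows where the 53-bit limit starts to bite.

```python
import struct


def through_double(x: float) -> float:
    """Store x in 8 bytes (IEEE 754 double) and read it back."""
    return struct.unpack(">d", struct.pack(">d", x))[0]


c = 299_792_458
print(through_double(float(c)) == c)    # True: 9 digits fit easily

limit = 2 ** 53 - 1
print(f"{limit:,}", len(str(limit)))    # 9,007,199,254,740,991 16

# Past 2**53 not every integer is representable: the +1 is rounded away
print(float(2 ** 53 + 1) == 2 ** 53)    # True
```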


Example 4

• Recall the speed of light:
• c = 2.99792458 × 10^8 (m/s)

• The 32-bit floating point representation (float), sometimes called single precision, is composed of a 1-bit sign, an 8-bit exponent (unsigned with bias 2^(8-1) - 1 = 127), and a 23-bit coefficient (24 bits effective).

• Can it represent c without loss of precision?
 – 2^24 - 1 = 16,777,215
 – Nope: c needs 28 significant bits, more than the 24 available
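The same round trip with 4 bytes shows the loss in Example 4. This sketch uses the `">f"` (single precision) format of Python's `struct`; `through_float32` is a hypothetical name. The significant-bit count strips trailing zero bits, since those cost nothing in the coefficient.

```python
import struct


def through_float32(x: float) -> float:
    """Store x in 4 bytes (IEEE 754 single precision) and read it back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


c = 299_792_458
c32 = through_float32(float(c))
print(c32, c32 - c)                       # 299792448.0 -10.0

# Significant bits = total bits minus trailing zero bits (c & -c isolates the lowest 1 bit)
trailing_zeros = (c & -c).bit_length() - 1
print(c.bit_length() - trailing_zeros)    # 28
```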


Homework  1  

• For both solutions, show your work, i.e. how you derived your answer

• Pi (𝛑) is an irrational number: it can't be represented precisely!

(Wikipedia)


Homework  1  

1. Encode Pi as accurately as possible using both the 64-bit and 32-bit floating point representations.
 Instruction: draw the diagram and fill in the 1's and 0's.

2. How many decimal places of precision are provided by each of the 64-bit and 32-bit floating point representations?


Homework 1 Hints
• How to encode 1: (exp: 01111 = bias 01111 + 0 = 2^0, frac: 1000…; remember: there is an implicit leading 1, so this is 1.000… in binary)


Homework  1  Hints  

• How to encode 2: (exp: 10000 = bias 01111 + 1 = 2^1, frac: 1000…) = 10.00… in binary


Homework  1  Hints  

• How to encode 3: (exp: 10000 = bias 01111 + 1 = 2^1, frac: 1100…) = 11.000… in binary


Homework  1  Hints  

• How to encode 4: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1000…) = 100.0… in binary


Homework  1  Hints  

• How to encode 5: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1010…) = 101.0… in binary


Homework  1  Hints  

• How to encode 6: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1100…) = 110.0… in binary


Homework  1  Hints  

• How to encode 7: (exp: 10001 = bias 01111 + 10 = 2^2, frac: 1110…) = 111.0… in binary


Homework  1  Hints  

• How to encode 8: (exp: 10010 = bias 01111 + 11 = 2^3, frac: 1000…) = 1000.0… in binary


Homework  1  Hints  

• Decimal 3.5 is 1.11 × 2^1 = 11.1 in binary


Homework  1  Hints  

• Decimal 3.25 is 1.101 × 2^1 = 11.01 in binary


Homework  1  Hints  

• Decimal 3.125 is 1.1001 × 2^1 = 11.001 in binary
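The hints use a 5-bit exponent with bias 01111 (15), which matches the IEEE 754 half-precision layout (1 sign bit, 5 exponent bits, 10 stored fraction bits). Assuming that is the format the hints are modeled on, Python's `struct` format `">e"` lets you check each hint; `half_fields` is a hypothetical helper. Note that the stored fraction leaves out the implicit leading 1, so the hint "frac: 1010…" for 5 shows up here as `0100000000`.

```python
import struct


def half_fields(x: float) -> tuple[str, str, str]:
    """(sign, exponent, fraction) bits of x as a 16-bit IEEE 754 half float."""
    # ">e" packs a half-precision float; ">H" reads those 2 bytes as an unsigned int
    (bits,) = struct.unpack(">H", struct.pack(">e", float(x)))
    sign = bits >> 15
    exponent = (bits >> 10) & 0b11111    # 5 bits, bias 01111 (15)
    fraction = bits & 0b1111111111       # 10 stored bits, leading 1 implied
    return f"{sign:01b}", f"{exponent:05b}", f"{fraction:010b}"


for value in [1, 2, 3, 4, 5, 6, 7, 8, 3.5, 3.25, 3.125]:
    print(value, *half_fields(value))
```

The same approach with `">f"`/`">I"` or `">d"`/`">Q"` and the wider field widths covers the 32- and 64-bit formats.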


Homework  1  

• Due Friday night
 – (by midnight in my email inbox)

• Required format (for all homeworks unless otherwise specified):
 – Plain text or PDF formats only (no .doc, .docx etc.)
 – Single file only: cut and paste into one document (no multiple attachments)
 – Subject line: 408/508 Homework 1
 – First line: your full name


Introduction: data types
• How about letters, punctuation, etc.?
• ASCII
 – American Standard Code for Information Interchange
 – Based on English alphabet (upper and lower case) + space + digits + punctuation + control (Teletype Model 33)
 – Question: how many bits do we need?
 – 7 bits + 1 bit parity
 – Remember everything is in binary …

C: char

Teletype Model 33 ASR Teleprinter (Wikipedia)
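A small sketch of the "7 bits" answer: every ASCII character has a code below 2^7 = 128, so it can be written in 7 binary digits. `ascii_bits` is a hypothetical helper; `str.isascii()` needs Python 3.7 or later.

```python
def ascii_bits(text: str) -> list[str]:
    """7-bit binary code for each character; rejects non-ASCII input."""
    if not text.isascii():
        raise ValueError("not plain ASCII")
    return [format(ord(ch), "07b") for ch in text]


print(2 ** 7)              # 128 possible 7-bit codes
print(ascii_bits("Hi!"))
```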


Introduction: data types
Order is important in sorting!

0-9: there's a connection with BCD. Notice: codes 30 (hex) through 39 (hex); the low nibble of each code is the digit's BCD value
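Both points on this slide can be checked in a few lines, as a sketch: masking off the high nibble of a digit's code leaves its BCD value, and Python's default string sort compares code points, which for ASCII text is ASCII order. The word list is made up for illustration.

```python
# '0'..'9' are 0x30..0x39: high nibble 3, low nibble = BCD value of the digit
for d in "0123456789":
    code = ord(d)
    print(d, hex(code), format(code & 0x0F, "04b"))

# Code-point order puts digits before uppercase before lowercase,
# and compares character by character, so "10" sorts before "9"
words = ["banana", "Apple", "apple", "10", "9"]
print(sorted(words))   # ['10', '9', 'Apple', 'apple', 'banana']
```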


Introduction: data types
• Parity bit:
 – transmission can be noisy
 – a parity bit can be added to the ASCII code
 – can spot single-bit transmission errors
 – even/odd parity:
  • the receiver knows each byte should have an even (or odd) number of 1 bits
 – Example:
  • 0 (zero) is ASCII 30 (hex) = 0110000 (7 bits)
  • even parity: 01100000, odd parity: 01100001
 – Checking parity:
  • Exclusive or (XOR): basic machine instruction
 – A xor B is true if either A or B is true, but not both
 – Example:
  • (even parity 0) 01100000: xor bit by bit, keeping a running result
  • 0 xor 1 = 1, 1 xor 1 = 0, 0 xor 0 = 0, 0 xor 0 = 0, 0 xor 0 = 0, 0 xor 0 = 0, 0 xor 0 = 0
  • final result 0: an even number of 1s, so no error is detected

x86 assembly language:
 1. PF: even parity flag, set by arithmetic ops
 2. TEST: AND (don't store result), sets PF
 3. JP: jump if PF set
 Example:
  MOV  al, <char>
  TEST al, al
  JP   <location if even>
  <go here if odd>
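A minimal parity sketch in Python. Which bit position carries the parity is a convention; this sketch appends it as the last (lowest) bit to follow the slide's example. The helpers `add_parity` and `xor_all_bits` are hypothetical names; the loop mirrors the running XOR on the slide.

```python
def add_parity(code7: int, even: bool = True) -> int:
    """Append a parity bit (as the last bit, as on the slide) to a 7-bit code."""
    ones = bin(code7).count("1")
    # Choose the bit that makes the total number of 1s even (or odd)
    bit = ones % 2 if even else 1 - ones % 2
    return (code7 << 1) | bit


def xor_all_bits(byte: int) -> int:
    """XOR the 8 bits together: 0 means an even number of 1s."""
    result = 0
    for i in range(8):
        result ^= (byte >> i) & 1
    return result


zero_even = add_parity(0x30)                          # '0' is 0x30 = 0110000
print(format(zero_even, "08b"))                       # 01100000
print(format(add_parity(0x30, even=False), "08b"))    # 01100001
print(xor_all_bits(zero_even))                        # 0: parity OK
print(xor_all_bits(zero_even ^ 0b100))                # 1: one flipped bit is detected
```

Flipping two bits would leave the parity unchanged, which is why a single parity bit only spots single-bit errors.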


Introduction: data types
• UTF-8
 – standard in the post-ASCII world
 – backwards compatible with ASCII
 – (previously, different languages had multi-byte character sets that clashed)
 – Universal Character Set (UCS) Transformation Format 8-bit

(Wikipedia)


Introduction: data types

• Example:
 – あ Hiragana letter A: UTF-8: E38182
 – Byte 1: E = 1110, 3 = 0011
 – Byte 2: 8 = 1000, 1 = 0001
 – Byte 3: 8 = 1000, 2 = 0010
 – い Hiragana letter I: UTF-8: E38184

Shift-JIS (Hex): あ: 82A0 い: 82A2
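The byte values above can be reproduced with Python's built-in `utf-8` and `shift_jis` codecs. In the binary column, the leading `1110` marks the first byte of a 3-byte UTF-8 sequence and each `10` prefix marks a continuation byte; the remaining bits spell out the code point (U+3042 for あ).

```python
for ch in "あい":
    utf8 = ch.encode("utf-8")
    sjis = ch.encode("shift_jis")
    print(ch,
          f"U+{ord(ch):04X}",                  # Unicode code point
          utf8.hex().upper(),                  # UTF-8 bytes in hex
          [format(b, "08b") for b in utf8],    # the same bytes in binary
          sjis.hex().upper())                  # Shift-JIS bytes in hex
```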


Introduction: data types
• How can you tell what encoding your file is using?
• Detecting UTF-8
 – Microsoft:
  • the first three bytes in the file are EF BB BF
  • (not all software understands this; not everybody uses it)
 – HTML:
  • <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
  • (not always present)
 – Analyze the file:
  • Find invalid UTF-8 sequences: if any are found, the file is not UTF-8…
  • Interesting paper:
 – http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
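A rough sketch of the first and third checks above (the HTML `<meta>` check is left out). `sniff_utf8` is a hypothetical helper; it uses the standard `codecs.BOM_UTF8` constant and a strict `decode("utf-8")`, which raises `UnicodeDecodeError` on invalid sequences. It is not a full charset detector: passing the check only means the bytes are valid UTF-8, not that the author intended UTF-8.

```python
import codecs


def sniff_utf8(data: bytes) -> str:
    """Rough guess following the checks above (a sketch, not a full detector)."""
    if data.startswith(codecs.BOM_UTF8):     # EF BB BF
        return "UTF-8 (has BOM)"
    try:
        data.decode("utf-8")                 # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return "not UTF-8"
    return "valid UTF-8 (could still be plain ASCII or something else)"


print(sniff_utf8(b"\xef\xbb\xbfhello"))
print(sniff_utf8("あい".encode("utf-8")))
print(sniff_utf8("あい".encode("shift_jis")))   # 0x82 cannot start a UTF-8 sequence
```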


Introduction: data types
• Filesystem:
 – different on different computers: sometimes a problem if you mount filesystems across different systems

• Examples:
 – FAT32 (File Allocation Table): DOS, Windows, memory cards; limited to 4GB max file size
 – exFAT (Extended FAT): SD cards (> 4GB files)
 – NTFS (New Technology File System): Windows
 – ext4 (Fourth Extended Filesystem): Linux
 – HFS+ (Hierarchical File System Plus): Macs


Introduction: data types
• Filesystem:


• Files:
 – Name (Path from / root)
 – Type (e.g. .docx, .pptx, .pdf, .html, .txt)
 – Owner (usually the Creator)
 – Permissions (for the Owner, Group, or Everyone)
 – need to be opened (to read from or write to): open command in all programming languages
 – Mode: read/write/append
 – Binary/Text
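A sketch of the open modes in Python, using a throwaway directory from the standard `tempfile` module so nothing is left behind; the file name `notes.txt` is hypothetical. `newline=""` on the writes is a choice made here so that `\n` is written untranslated on every platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")    # hypothetical file name

    # Mode "w": create or overwrite; mode "a": append to the end
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("first line\n")
    with open(path, "a", encoding="utf-8", newline="") as f:
        f.write("second line\n")

    # Mode "r" (text): returns str, decoded from UTF-8
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # Mode "rb" (binary): returns the raw bytes, no decoding
    with open(path, "rb") as f:
        raw = f.read()

print(repr(text))
print(raw)
```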


Introduction: data types
• Text files:
 – text files have lines: how do we mark the end of a line?
 – End of line (EOL) control character(s):
  • LF 0x0A (Mac/Linux)
  • CR 0x0D (Old Macs)
  • CR+LF 0x0D0A (Windows)
 – End of file (EOF) control character:
  • (EOT) 0x04 (aka Control-D)

(binaryvision.nl)

some programming languages (e.g. C): NUL (0x00) is used to mark the end of a string
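A sketch that counts the three end-of-line conventions in raw bytes and imitates a C-style NUL-terminated read. Both helpers (`eol_counts`, `c_string`) are hypothetical names; `bytes.splitlines()` is the standard method and accepts all three conventions.

```python
def eol_counts(data: bytes) -> dict[str, int]:
    """Count each end-of-line convention in raw bytes."""
    crlf = data.count(b"\r\n")                  # Windows
    return {
        "CR+LF": crlf,
        "LF": data.count(b"\n") - crlf,         # Mac/Linux
        "CR": data.count(b"\r") - crlf,         # old Macs
    }


sample = b"unix\nold mac\rwindows\r\n"
print(eol_counts(sample))     # {'CR+LF': 1, 'LF': 1, 'CR': 1}
print(sample.splitlines())    # [b'unix', b'old mac', b'windows']


def c_string(buffer: bytes) -> bytes:
    """Read a string the C way: stop at the first NUL (0x00) byte."""
    return buffer.split(b"\x00", 1)[0]


print(c_string(b"hello\x00leftover"))   # b'hello'
```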