Data Structures and Algorithms Introduction, Basic data types www.mif.vu.lt/~algis

Data Structures and Algorithms - Vilniaus universitetasalgis/dsax/Data-structures-0.pdf · 2020-02-02 · •Data Structures and Algorithms: •A much more dramatic effect can be

Data Structures and Algorithms

Introduction, Basic data types

www.mif.vu.lt/~algis

Motto

• Intermediate-level survey course.

• Mandatory background: “Discrete Mathematics” and “Mathematics for Informatics”.

• Programming and problem solving with applications.

• Algorithm: method for solving a problem.

• Data structure: method to store information.

• Data Structures and Algorithms:

• A much more dramatic effect can be made on the performance of a program by changing to a better algorithm

Motto

The impact of algorithms is broad and far-reaching:

• Internet: Web search, packet routing, distributed file sharing

• Biology: Human genome project, protein folding

• Computers: Circuit layout, file system, compilers

• Computer graphics: Movies, video games, virtual reality

• Security: Cell phones, e-commerce, voting machines

• Multimedia: CD player, DVD, MP3, JPG, DivX, HDTV

• Transportation: Airline crew scheduling, map routing

• Physics: N-body simulation, particle collision simulation, etc.

The study of algorithms dates at least to Euclid (c. 300 BC).

• Some important algorithms were discovered by students!

Motto

• Why study algorithms? To be able to solve problems that could not otherwise be addressed.

• Example: Network connectivity [stay tuned]

Content

• 1. Introduction, computing model, von Neumann principles, data, abstract data types, data structures, basic data types

• 2. Sorting, internal sorting, quicksort

• 3. Merge sort, von Neumann sorting, external sorting

• 4. Abstract data types, stack, queue, programming of stack and queue

• 5. Heap, priority queue, priority queue by heap structure, lists, list programming, dynamic sets ADT

• 6. Hierarchical structures, binary search trees, tree allocation in memory

• 7. AVL trees, 2-3-4 trees, red-black trees

• 8. B-trees and other similar trees, Huffman algorithm for data compression

• 9. Hashing idea, hashing functions and tables, hashing procedures and algorithms, extendable hashing

• 10. Radix search algorithms, radix trees, radix algorithms, radix search

• 11. Patricia trees, splay trees, suffix tree

• 12. Text search

• 13. Analysis of algorithms

Introduction

• Informally, an algorithm is a well-defined computational procedure that takes some value (or set of values) as input and produces some value (or set of values) as output.

• An algorithm is thus a sequence of computational steps that transform the input into the output.

• An algorithm can also be viewed as a tool for solving a well-specified problem, involving computers.

• There are many points of view on algorithms. A good example is the famous Euclidean algorithm:

Introduction

“For integers x, y calculate the greatest common divisor gcd(x, y).”

A direct implementation of the algorithm looks like:

program euclid (input, output);
var x, y: integer;

{ Subtraction-based Euclid; expects u > 0 and v > 0 }
function gcd (u, v: integer): integer;
var t: integer;
begin
  repeat
    { keep the larger value in u }
    if u < v then
      begin t := u; u := v; v := t end;
    u := u - v
  until u = 0;
  gcd := v
end;

begin
  while not eof do
    begin
      readln (x, y);
      if (x > 0) and (y > 0) then writeln (x, y, gcd (x, y))
    end
end.

Introduction

• This algorithm has some special features:
  • it is applicable only to numbers;
  • it has to be changed whenever something in the environment changes, say, if the numbers are very long and do not fit into the size of a variable (numbers like 1000!).

• Algorithms in applications such as databases and information systems are usually understood in a slightly different way. Such an algorithm:
  • is repeated many times;
  • in various circumstances;
  • with different types of data.

Introduction

• 404800479988610197196058631666872994808558901323↵ 829669944590997424504087073759918823627727188732↵519779505950995276120874975462497043601418278094↵ 646496291056393887437886487337119181045825783647↵849977012476632889835955735432513185323958463075↵ 557409114262417474349347553428646576611667797396↵668820291207379143853719588249808126867838374559↵ 731746136085379534524221586593201928090878297308↵431392844403281231558611036976801357304216168747↵ 609675871348312025478589320767169132448426236131↵412508780208000261683151027341827977704784635868↵ 170164365024153691398281264810213092761244896359↵928705114964975419909342221566832572080821333186↵ 116811553615836546984046708975602900950537616475↵847728421889679646244945160765353408198901385442↵ 487984959953319101723355556602139450399736280750↵137837615307127761926849034352625200015888535147↵ 331611702103968175921510907788019393178114194545↵257223865541461062892187960223838971476088506276↵ 862967146674697562911234082439208160153780889893↵964518263243671616762179168909779911903754031274↵ 622289988005195444414282012187361745992642956581↵746628302955570299024324153181617210465832036786↵ 906117260158783520751516284225540265170483304226↵143974286933061690897968482590125458327168226458↵ 066526769958652682272807075781391858178889652208↵164348344825993266043367660176999612831860788386↵ 150279465955131156552036093988180612138558600301↵435694527224206344631797460594682573103790084024↵ 432438465657245014402821885252470935190620929023↵136493273497565513958720559654228749774011413346↵ 962715422845862377387538230483865688976461927383↵814900140767310446640259899490222221765904339901↵ 886018566526485061799702356193897017860040811889↵729918311021171229845901641921068884387121855646↵ 124960798722908519296819372388642614839657382291↵123125024186649353143970137428531926649875337218↵ 940694281434118520158014123344828015051399694290↵153483077644569099073152433278288269864602789864↵ 
321139083506217095002597389863554277196742822248↵757586765752344220207573630569498825087968928162↵ 753848863396909959826280956121450994871701244516↵461260379029309120889086942028510640182154399457↵ 156805941872748998094254742173582401063677404595↵741785160829230135358081840096996372524230560855↵ 903700624271243416909004153690105933983835777939↵410970027753472000000000000000000000000000000000↵ 000000000000000000000000000000000000000000000000↵000000000000000000000000000000000000000000000000↵ 000000000000000000000000000000000000000000000000↵000000000000000000000000000000000000000000000000↵ 000000000000000000000000

Von Neumann computing model

1943: ENIAC
Presper Eckert and John Mauchly -- first general-purpose electronic computer. Hard-wired program -- settings of dials and switches.

1944: Beginnings of EDVAC
Among other improvements, includes a program stored in memory.

1945: John von Neumann
Wrote a report on the stored-program concept, known as the First Draft of a Report on EDVAC.

Von Neumann computing model

The main principles of the “von Neumann machine” (or model):

• a memory, containing instructions and data all together

• a processing unit, for performing arithmetic and logical operations

• a control unit, for interpreting instructions

This was a revolutionary step:

• clearly, requiring hardware changes with each new programming operation was time-consuming, error-prone, and costly

Memory types: RAM

• RAM is typically volatile memory (meaning it doesn’t retain voltage settings once power is removed)

• RAM is an array of cells, each with a unique address

• A cell is the minimum unit of access.

• Originally, this was 8 bits taken together as a byte. Most of today’s computers are still byte-addressable, but data is usually read and written in word-sized units (typically 32 or 64 bits).

• RAM gets its name from its access performance. In RAM, theoretically, it takes the same amount of time to access any memory cell, regardless of its location within the memory bank (“random” access).

The ALU

• The third component in the von Neumann architecture is called the Arithmetic Logic Unit.

• This is the component that performs the arithmetic and logic operations for which we have been building parts.

• This component may be duplicated: a computer can have more than one ALU.

• The ALU is the “brain” of the computer.

• We also have to define basic rules for how to read and write data in memory; this leads to the so-called basic data types, or just data types.

The ALU

• It houses special memory locations called registers (a very fast, special kind of memory, which we have already considered).

• There may be more than one kind of fast memory: besides the registers there are caches (for example L1, L2, etc.).

• The ALU is important enough that we will come back to it later.

• For now, just realize that it contains the circuitry to perform addition, subtraction, multiplication and division, as well as logical comparisons (less than, equal to and greater than).

Data types or basic data types

Data types:

• A data type is a classification of a type of information, i.e., a rule for how to assign a value to bits or bytes in computer memory.

• Data types are essential to any computer programming language.

• Without them, it becomes very difficult to maintain information within a computer program.

• Different data types have different sizes in memory depending on the machine and the compiler.

Data types or basic data types

Data types:

• Integer

• Floating-point

• Character

• String

• Boolean

Constants

A constant is a value that never changes; constants are used less often than variables in programming. A constant can be:

• a number, like 25 or 3.6

• a character, like a or $

• a character string, like "this is a string"

Logical data types

Boolean types:

• Introduced by ALGOL 60

• Used to represent switches and flags in programs

• Range of values: two elements, one for “true” and one for “false”

• One popular exception (a language without a Boolean type) is C89, in which numeric expressions are used as conditionals (all operands with nonzero values are considered true, and zero is considered false)

• A Boolean value could be represented by a single bit, but it is often stored in the smallest efficiently addressable cell of memory, typically a byte

Basic data types - Boolean data

Data values: {false, true}

In some programming languages, for instance C/C++: false = 0, true = 1 (or any nonzero value)

Could store 1 value per bit, but usually a byte (or word) is used

Operations: and (&&), or (||), not (!)

 && | 0  1
 ---+------
  0 | 0  0
  1 | 0  1

 || | 0  1
 ---+------
  0 | 0  1
  1 | 1  1

  x | !x
 ---+----
  0 | 1
  1 | 0

Character data types

Character types:

• Char types are stored as numeric codings (ASCII / Unicode)

• Traditionally, the most commonly used coding was ASCII (American Standard Code for Information Interchange), a 7-bit code normally stored in an 8-bit byte

• An alternative, 16-bit coding: Unicode (UCS-2)

• Java was the first widely used language to use the Unicode character set. Since then, it has found its way into JavaScript, Python, Perl, C# and F#

• After 1991, the Unicode Consortium, in cooperation with the International Organization for Standardization (ISO), developed a 4-byte character code named UCS-4 or UTF-32, which is described in the ISO/IEC 10646 Standard, published in 2000

Basic data types - Character data

Characters are stored as numeric codes (ASCII, EBCDIC, Unicode): 1 byte for ASCII and EBCDIC, 2 bytes for Unicode in its UCS-2 form (see examples on p. 35).

Basic operation: comparison of chars to determine ==, <, >, etc. It uses their numeric codes (i.e., their ordinal values).

Integer data type

Integer:

• The most common primitive numeric data type is integer.

• The hardware of many computers supports several sizes of integers.

• These sizes of integers, and often a few others, are supported by some programming languages.

• Java includes four signed integer sizes: byte, short, int, and long.

• C++ and C# include unsigned integer types, which are types for integer values without signs.

Basic data types - Integer data

Nonnegative (unsigned) integer: type unsigned (and variations) in C++

Store its base-two representation in a fixed number w of bits (e.g., w = 16 or w = 32)

88 = 0000000001011000 (base 2)

Signed integer: type int (and variations) in C++

Store in a fixed number w of bits using one of the following representations:

Sign-magnitude representation

Save one bit (usually the most significant) for the sign (0 = +, 1 = –).

Use the base-two representation in the other bits.

 88 → 0 000000001011000
–88 → 1 000000001011000
      ↑ sign bit

Problems:

1. Cumbersome for arithmetic computations

2. There are two zeros in this scheme (both 0 and –0)

3. Incrementing the bit pattern of a negative number by one subtracts one from its value instead of adding one!

Complement representation (two's complement)

For non-negative n:
Use the ordinary base-two representation with leading (sign) bit 0

For negative n (–n):
(1) Find the w-bit base-2 representation of n
(2) Complement each bit
(3) Add 1

Example: –88
1. 88 as a 16-bit base-two number: 0000000001011000
2. Complement this bit string:     1111111110100111
3. Add 1:                          1111111110101000

Same as subtracting the number from 0!

Good for arithmetic computation

Bit-wise addition and multiplication tables:

  + | 0  1          x | 0  1
 ---+-------       ---+------
  0 | 0  1          0 | 0  0
  1 | 1  10         1 | 0  1

These work for both + and – integers.

5 + 7:
             111     ← carry bits
    0000000000000101
  + 0000000000000111
  = 0000000000001100

5 + –6:
    0000000000000101
  + 1111111111111010
  = 1111111111111111

Problems with integer representation

• Limited capacity -- a finite number of bits

• Overflow and Underflow:

Overflow

- addition or multiplication can exceed largest value permitted by storage scheme

Underflow

- subtraction or multiplication can exceed smallest value permitted by storage scheme

Not a perfect representation of (mathematical) integers:

• can only store a finite (sub)range of them

Floating-point data type

• Model real numbers, but only as approximations for most real values.

• On most computers, floating-point numbers are stored in binary, which exacerbates the problem.

• Another problem is the loss of accuracy through arithmetic operations.

• Languages for scientific use support at least two floating-point types; sometimes more (e.g., float and double).

• The collection of values that can be represented by a floating-point type is defined in terms of precision and range:
  • Precision: the accuracy of the fractional part of a value, measured as the number of bits.
  • Range: the range of fractions and exponents.

Floating-point data type

IEEE floating-point formats: (a) single precision, (b) double precision

How is real data represented?

Types float and double (and variations) in C++

Single precision (IEEE Floating-Point Format):

1. Write the binary representation in floating-point form:
   b1.b2b3... × 2^k, with each bi a bit and b1 = 1 (unless the number is 0).
   Here b1.b2b3... is the mantissa (or fractional part), 2 is the base, and k is the exponent.

Example: 22.625 = 10110.101 (base 2)
Floating-point form: 1.0110101 (base 2) × 2^4
Stored exponent: 4 + 127 = 131 (the exponent is stored with a bias of 127)

double: Exp: 11 bits, bias 1023; Mant: 52 bits

Round-off errors

Basic data types for C

C programming language:

char - smallest addressable unit of the machine that can contain basic character set. It is an integer type, actual type can be either signed or unsigned depending on the implementation.

signed char - same size as char, but guaranteed to be signed.

unsigned char - same size as char, but guaranteed to be unsigned.

short, short int, signed short, signed short int - short signed integer type, at least 16 bits in size.

Basic data types for C

unsigned, unsigned int - same as int, but unsigned.

long, long int, signed long, signed long int - long signed integer type, at least 32 bits in size.

unsigned long, unsigned long int - same as long, but unsigned.

long long, long long int, signed long long, signed long long int - long long signed integer type, at least 64 bits in size.

Basic data types for C

unsigned long long int - same as long long, but unsigned.

float - single precision floating-point type.

double - double precision floating-point type, actual properties unspecified (except minimum limits), however on most systems this is the IEEE 754 double-precision binary floating-point format

long double - extended precision floating-point type, actual properties unspecified.

Boolean type

Structures:

• struct birthday { char name[20]; int day; int month; int year; };

Basic data types for C

Array - array of N elements of type T

• int cat[10]; // array of 10 elements, each of type int

• int bob[]; // array of an unspecified number of 'int' elements

• int a[10][8]; // array of 10 elements, each of type 'array of 8 int elements'

• float f[][32]; // array of an unspecified number of 'array of 32 float elements'

Pointer - char *square; long *circle;

Unions - union types are special structures which allow access to the same memory using different type descriptions

Complex data types

Complex (numbers):

• Some languages support a complex type: e.g., Fortran and Python

• Each value consists of two floats: the real part and the imaginary part

• Literal form (in Python):

(7 + 3j) where 7 is the real part and 3 is the imaginary part

Decimal data types

Decimal:

• Most larger computers that are designed to support business applications have hardware support for decimal data types

• Decimal types store a fixed number of decimal digits, with the decimal point at a fixed position in the value

• These are the primary data types for business data processing and are therefore essential to COBOL

• Advantage: accuracy of decimal values

• Disadvantages: limited range since no exponents are allowed, and its representation wastes memory

Strings and their operations

Typical operations:

• Assignment

• Comparison (=, >, etc.)

• Concatenation

• Substring reference

• Pattern matching

C and C++ use char arrays to store char strings and provide a collection of string operations through a standard library whose header is string.h