Insight Through Computing 15. Strings Operations Subscripting Concatenation Search Numeric-String...

Preview:

Citation preview

Insight Through Computing

15. Strings

OperationsSubscriptingConcatenationSearchNumeric-String Conversions

Built-Ins: int2str,num2str, str2double

Insight Through Computing

Previous Dealings

N = input(‘Enter Degree: ’)

title(‘The Sine Function’)

disp( sprintf(‘N = %2d’,N) )

Insight Through Computing

A String is an Array of Characters

‘Aa7*>@ x!’

A a 7 * > @ x !

This string has length 9.

Insight Through Computing

Why are Stirngs Important?

1. Numerical Data often encoded as strings

2. Genomic calculation/search

Insight Through Computing

Numerical Data is Often Encoded in Strings

For example, a file containingIthaca weather data begins with the string

W07629N4226

Longitude: 76o 29’ WestLatitude: 42o 26’ North

Insight Through Computing

What We Would Like to Do

W07629N4226

Get hold of the substring ‘07629’

Convert it to floating format so thatit can be involved in numerical

calculations.

Insight Through Computing

Format Issues

9 as an IEEE floating point number:

9 as a character:

0100000blablahblah01001111000100010010

01000otherblablaDifferent Representation

Insight Through Computing

Genomic Computations

Looking for patterns in a DNA sequence:

‘ATTCTGACCTCGATC’ACCT

Insight Through Computing

Genomic Computations

Quantifying Differences:

ATTCTGACCTCGATCATTGCTGACCTCGAT

Remove?

Insight Through Computing

Working With Strings

Insight Through Computing

Strings Can Be Assignedto Variables

S = ‘N = 2’

N = 2;

S = sprintf(‘N = %1d’,N)

‘N = 2’

S

sprintf produces a formatted string using fprintf rules

Insight Through Computing

Strings Have a Length

s = ‘abc’;

n = length(s); % n = 3

s = ‘’; % the empty string

n = length(s) % n = 0

s = ‘ ‘; % single blank

n = length(s) % n = 1

Insight Through Computing

Concatenation

This: S = ‘abc’;

T = ‘xy’

R = [S T]

is the same as this: R = ‘abcxy’

Insight Through Computing

Repeated Concatenation

This: s = ‘’;

for k=1:5

s = [s ‘z’];

end

is the same as this:

z = ‘zzzzz’

Insight Through Computing

Replacing and AppendingCharacters

s = ‘abc’;s(2) = ‘x’ % s = ‘axc’

t = ‘abc’t(4) = ‘d’ % t = ‘abcd’

v = ‘’v(5) = ‘x’ % v = ‘ x’

Insight Through Computing

Extracting Substrings

s = ‘abcdef’;

x = s(3) % x = ‘c’

x = s(2:4) % x = ‘bcd’

x = s(length(s)) % x = ‘f’

Insight Through Computing

Colon Notation

s( : )

Starting Location

Ending Location

Insight Through Computing

Replacing Substrings

s = ‘abcde’;

s(2:4) = ‘xyz’ % s = ‘axyze’

s = ‘abcde’

s(2:4) = ‘wxyz’ % Error

Insight Through Computing

Question Time

s = ‘abcde’;

for k=1:3

s = [ s(4:5) s(1:3)];

end

What is the final value of s ?

A abcde B. bcdea C. eabcd D. deabc

Insight Through Computing

Problem: DNA Strand

x is a string made up of the characters‘A’, ‘C’, ‘T’, and ‘G’.

Construct a string Y obtained from x by replacinig each A by T, each T by A, each C by G, and each G by C

x: ACGTTGCAGTTCCATATGy: TGCAACGTCAAGGTATAC

Insight Through Computing

function y = Strand(x)

% x is a string consisting of

% the characters A, C, T, and G.

% y is a string obtained by

% replacing A by T, T by A,

% C by G and G by C.

Insight Through Computing

Comparing Strings

Built-in function strcmp

strcmp(s1,s2) is true if the strings s1 and s2 are identical.

Insight Through Computing

How y is Built Up

x: ACGTTGCAGTTCCATATGy: TGCAACGTCAAGGTATAC

Start: y: ‘’ After 1 pass: y: TAfter 2 passes: y: TGAfter 3 passes: y: TGC

Insight Through Computing

for k=1:length(x)

if strcmp(x(k),'A')

y = [y 'T'];

elseif strcmp(x(k),'T')

y = [y 'A'];

elseif strcmp(x(k),'C')

y = [y 'G'];

else

y = [y 'C'];

end

end

Insight Through Computing

A DNA Search Problem

Suppose S and T are strings, e.g.,

S: ‘ACCT’

T: ‘ATGACCTGA’

We’d like to know if S is a substring of T and if so, where is the first occurrance?

Insight Through Computing

function k = FindCopy(S,T)

% S and T are strings.

% If S is not a substring of T,

% then k=0.

% Otherwise, k is the smallest

% integer so that S is identical

% to T(k:k+length(S)-1).

Insight Through Computing

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(1:4)) False

Insight Through Computing

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(2:5)) False

Insight Through Computing

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(3:6)) False

Insight Through Computing

A DNA Search Problem

S: ‘ACCT’

T: ‘ATGACCTGA’

strcmp(S,T(4:7))) True

Insight Through Computing

Pseudocode

First = 1; Last = length(S);

while S is not identical to T(First:Last) First = First + 1;

Last = Last + 1;

end

Insight Through Computing

Subscript Error

S: ‘ACCT’

T: ‘ATGACTGA’

strcmp(S,T(6:9))

There’s a problem if S is not a substring of T.

Insight Through Computing

Pseudocode

First = 1; Last = length(s);

while Last<=length(T) && ... ~strcmp(S,T(First:Last))

First = First + 1;

Last = Last + 1;

end

Insight Through Computing

Post-Loop Processing

Loop ends when this is false:

Last<=length(T) && ...

~strcmp(S,T(First:Last))

Insight Through Computing

Post-Loop Processing

if Last>length(T) % No Match found k=0;else % There was a match k=First;end

The loop ends for one of two reasons.

Insight Through Computing

Numeric/StringConversion

Insight Through Computing

String-to-Numeric Conversion

An example…

Convention: W07629N4226

Longitude: 76o 29’ West Latitude: 42o 26’ North

Insight Through Computing

String-to-Numeric Conversion

S = ‘W07629N4226’

s1 = s(2:4);

x1 = str2double(s1);

s2 = s(5:6);

x2 = str2double(s2);

Longitude = x1 + x2/60

There are 60 minutes in a degree.

Insight Through Computing

Numeric-to-String Conversion

x = 1234;

s = int2str(x); % s = ‘1234’

x = pi;

s = num2str(x,’%5.3f’); % s =‘3.142’

Insight Through Computing

Problem

Given a date in the format ‘mm/dd’

specify the next day in the same format

Insight Through Computing

y = Tomorrow(x)

x y

02/28 03/01

07/13 07/14

12/31 01/01

Insight Through Computing

Get the Day and Month

month = str2double(x(1:2));

day = str2double(x(4:5));

Thus, if x = ’02/28’ then month is assignedthe numerical value of 2 and day is assigned the numerical value of 28.

Insight Through Computing

L = [31 28 31 30 31 30 31 31 30 31 30 31];

if day+1<=L(month)

% Tomorrow is in the same month

newDay = day+1;

newMonth = month;

Insight Through Computing

L = [31 28 31 30 31 30 31 31 30 31 30 31];

else

% Tomorrow is in the next month

newDay = 1;

if month <12

newMonth = month+1;

else

newMonth = 1;

end

Insight Through Computing

The New Day String

Compute newDay (numerical) and convert…

d = int2str(newDay);if length(d)==1 d = ['0' d];end

Insight Through Computing

The New Month String

Compute newMonth (numerical) and convert…

m = int2str(newMonth);

if length(m)==1;

m = ['0' m];

end

Insight Through Computing

The Final Concatenation

y = [m '/' d];

Insight Through Computing

Some other useful string functionsstr= ‘Cs 1112’;

length(str) % 7isletter(str) % [1 1 0 0 0 0 0]isspace(str) % [0 0 1 0 0 0 0]lower(str) % ‘cs 1112’upper(str) % ‘CS 1112’

ischar(str) % Is str a char array? True (1)strcmp(str(1:2),‘cs’) % Compare strings str(1:2) & ‘cs’. False (0)strcmp(str(1:3),‘CS’) % False (0)

Insight Through Computing

ASCII characters(American Standard Code for Information Interchange)

ascii code Character: :: :65 ‘A’66 ‘B’67 ‘C’: :90 ‘Z’: :

ascii code Character

: :: :48 ‘0’49 ‘1’50 ‘2’: :57 ‘9’: :

Insight Through Computing

Character vs ASCII code

str= ‘Age 19’

%a 1-d array of characters

code= double(str)

%convert chars to ascii values

str1= char(code)

%convert ascii values to chars

Insight Through Computing

Arithmetic and relational ops on characters

• ‘c’-‘a’ gives 2• ‘6’-‘5’ gives 1• letter1=‘e’; letter2=‘f’; • letter1-letter2 gives -1

• ‘c’>’a’ gives true• letter1==letter2 gives false

• ‘A’ + 2 gives 67• char(‘A’+2) gives ‘C’

Insight Through Computing

Example: toUpperWrite a function toUpper(cha) to convert character cha to upper case if cha is a lower case letter. Return the converted letter. If cha is not a lower case letter, simply return the character cha.

Hint: Think about the distance between a letter and the base letter ‘a’ (or ‘A’). E.g.,

a b c d e f g h …

A B C D E F G H …

Of course, do not use Matlab function upper!

distance = ‘g’-‘a’ = 6 = ‘G’-‘A’

Insight Through Computing

function up = toUpper(cha)% up is the upper case of character cha.% If cha is not a letter then up is just cha.

up= cha;

cha is lower case if it is between ‘a’ and ‘z’

Insight Through Computing

function up = toUpper(cha)% up is the upper case of character cha.% If cha is not a letter then up is just cha.

up= cha;

if ( cha >= 'a' && cha <= 'z' )

% Find distance of cha from ‘a’

end

Insight Through Computing

function up = toUpper(cha)% up is the upper case of character cha.% If cha is not a letter then up is just cha.

up= cha;

if ( cha >= 'a' && cha <= 'z' )

% Find distance of cha from ‘a’ offset= cha - 'a';

% Go same distance from ‘A’ end

Insight Through Computing

function up = toUpper(cha)% up is the upper case of character cha.% If cha is not a letter then up is just cha.

up= cha;

if ( cha >= 'a' && cha <= 'z' )

% Find distance of cha from ‘a’ offset= cha - 'a';

% Go same distance from ‘A’ up= char('A' + offset);end

Recommended