C Strings

Computer Organization I

CS@VT ©2005-2019 WD McQuain

String Representation in C

char Word[7] = "foobar";

C treats char arrays as a special case in a number of ways.

If storing a character string (to use as a unit), you must ensure that a special character, the string terminator '\0', is stored in the first unused cell.

Failure to understand and abide by this is a frequent source of errors.

There is no special type for (character) strings in C; rather, char arrays are used.

Word[0]  Word[1]  Word[2]  Word[3]  Word[4]  Word[5]  Word[6]
  'f'      'o'      'o'      'b'      'a'      'r'     '\0'
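To make the terminator rule concrete, here is a minimal sketch of a helper that fills a char array by hand and then stores '\0' in the first unused cell. The name fillRepeated() and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: writes count copies of ch into buf, then a terminator.
 * cap is the total number of cells in buf. Returns false, leaving buf
 * untouched, if count characters plus the terminator would not fit. */
bool fillRepeated(char* buf, size_t cap, char ch, size_t count) {
    assert(buf != NULL);
    if (count >= cap) return false;   // need count + 1 cells in all
    for (size_t i = 0; i < count; i++) {
        buf[i] = ch;
    }
    buf[count] = '\0';                // terminator goes in the first unused cell
    return true;
}
```

With char Word[7], a call such as fillRepeated(Word, sizeof Word, 'o', 6) succeeds and leaves the terminator in Word[6]; asking for 7 characters is refused because no cell would be left for the terminator.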

Some C String Library Functions

The C Standard Library includes the following function for copying blocks of memory:

void* memcpy(void* restrict s1, const void* restrict s2, size_t n);

Declared in string.h. Copies n bytes from the object pointed to by s2 into the object pointed to by s1.

If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.

memcpy() is potentially more efficient than a user-defined loop.

memcpy() may trigger a segfault error if:

- the destination region specified by s1 is not large enough to allow copying n bytes

- n bytes cannot be copied from the region specified by s2.
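One way to guard the first of these failure modes is to wrap the call in a size check. The sketch below is an illustration only; copyInts() and its parameters are hypothetical names, and the wrapper can check nothing beyond what the caller tells it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies n ints from src to dst, but only if the
 * caller-reported capacity of dst (in elements) is large enough.
 * The caller must still guarantee that src really holds n ints and that
 * the two arrays do not overlap; memcpy() cannot check either. */
bool copyInts(int* restrict dst, size_t dstCount,
              const int* restrict src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (n > dstCount) return false;       // would write past the end of dst
    memcpy(dst, src, n * sizeof(int));    // memcpy() counts bytes, not elements
    return true;
}
```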

The memcpy() Interface

The memcpy() interface employs a few interesting features:

void* memcpy(void* restrict s1, const void* restrict s2, size_t n);

void* says nothing about the data type to which s1 and s2 point, which makes sense since memcpy() deals with raw bytes of data.

restrict implies (more or less) that no other pointer in the same context points to the same target; here, restrict implies that s1 and s2 do not share the same target.

The implied guarantee cannot be verified by the compiler; this is of interest mainly to compiler writers.
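When the source and destination regions do overlap, the standard library's memmove() (also declared in string.h, but not covered in these notes) is the function to reach for. The sketch below uses a hypothetical helper, dropFirstChar(), to show a case where memcpy() would violate the restrict promise.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes the first character of a C string in place
 * by sliding the remaining characters (and the terminator) left by one.
 * Source and destination overlap, so memmove() is used, not memcpy(). */
void dropFirstChar(char* s) {
    assert(s != NULL);
    size_t len = strlen(s);
    if (len == 0) return;        // empty string: nothing to remove
    memmove(s, s + 1, len);      // len bytes = (len - 1) characters + '\0'
}
```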

More C String Library Functions

And, there are functions that support operations on C strings, including:

char* strcpy(char* restrict s1, const char* restrict s2);

Declared in string.h. Copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1.

If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.

strcpy() execution depends on several assumptions:

- the string pointed to by s2 is properly terminated by a null character
- the array pointed to by s1 is long enough to hold all the characters in the string pointed to by s2 and a terminator

strcpy() cannot verify either assumption and may produce serious errors if abused

C String Library Hazards

The memcpy() and strcpy() functions illustrate classic hazards of the C library.

If the target of the parameter s1 to memcpy() is smaller than n bytes, then memcpy() will attempt to write data past the end of the target, likely resulting in a logic error and possibly a runtime error. A similar issue arises with the target of s2: if it holds fewer than n bytes, memcpy() will read past its end.

The same issue arises with strcpy(), but strcpy() doesn't even take a parameter specifying the maximum number of bytes to be copied, so there is no way for strcpy() to even attempt to enforce any safety measures.

Worse, if the target of the parameter s2 to strcpy() is not properly 0-terminated, then the strcpy() function will continue copying until a 0-byte is encountered, or until a runtime error occurs. Either way, the effect will not be good.
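Since strcpy() cannot check the destination size, the caller can. The following is a minimal sketch, assuming the caller knows the true size of the destination array; copyChecked() is a hypothetical name, and the function still trusts that src is properly terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies src into dst only if dst, which has dstSize
 * cells, can hold every character of src plus the terminator.
 * Returns false and leaves dst untouched otherwise. */
bool copyChecked(char* dst, size_t dstSize, const char* src) {
    assert(dst != NULL && src != NULL);
    if (strlen(src) + 1 > dstSize) return false;   // no room for chars + '\0'
    strcpy(dst, src);
    return true;
}
```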

Safer Copying

char* strncpy(char* restrict s1, const char* restrict s2, size_t n);

Copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1.

If copying takes place between objects that overlap, the behavior is undefined.

If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written.

Returns the value of s1.

Of course, strncpy() must trust the caller that the array pointed to by s1 can hold at least n characters; otherwise errors may occur.

And, this still raises the hazard of an unreported truncation if s2 contains more than n characters that were to be copied to s1, and null termination of the destination is not guaranteed in that case.
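A common way to deal with both hazards is to copy at most one character fewer than the destination holds, store the terminator explicitly, and report truncation to the caller. The sketch below assumes the caller passes the true size of the destination; copyTruncated() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strncpy(): dst has dstSize cells (at least 1).
 * The result in dst is always terminated.
 * Returns true if all of src fit, false if the copy was truncated. */
bool copyTruncated(char* dst, size_t dstSize, const char* src) {
    assert(dst != NULL && src != NULL && dstSize > 0);
    strncpy(dst, src, dstSize - 1);   // copies at most dstSize - 1 characters
    dst[dstSize - 1] = '\0';          // strncpy() may not have written one
    return strlen(src) < dstSize;     // report truncation instead of hiding it
}
```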

Another C String Library Function

size_t strlen(const char* s);

Computes the length of the string pointed to by s.

Returns the number of characters that precede the terminating null character.

Hazard: if there's no terminating null character then strlen() will read until it encounters a null byte or a runtime error occurs.
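If the size of the array is known, the search for the terminator can be bounded. The sketch below uses the standard memchr() function for this; boundedLength() is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of characters before the first
 * '\0' in s, examining at most cap bytes. Returns cap if no terminator
 * is found within those bytes. */
size_t boundedLength(const char* s, size_t cap) {
    assert(s != NULL);
    const char* end = memchr(s, '\0', cap);   // NULL if no terminator in range
    return (end != NULL) ? (size_t)(end - s) : cap;
}
```

A return value equal to cap signals that the array does not hold a terminated string, so the array should not be passed to strlen(), strcpy() and friends.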

More C String Library Functions

char* strcat(char* restrict s1, const char* restrict s2);

Appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1.

The initial character of s2 overwrites the null character at the end of s1.

If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.

char* strncat(char* restrict s1, const char* restrict s2, size_t n);

Appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1.

The initial character of s2 overwrites the null character at the end of s1.

A terminating null character is always appended to the result.

If copying takes place between objects that overlap, the behavior is undefined. Returns the value of s1.
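Because strncat() always appends a terminator, the count passed to it should be the room left in the destination minus one. The following sketch shows that arithmetic; appendBounded() is a hypothetical name, and the caller is assumed to know the true size of the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strncat(): dst has dstSize cells and already
 * holds a terminated string. Appends as much of src as fits.
 * Returns true if all of src was appended, false if it was cut short. */
bool appendBounded(char* dst, size_t dstSize, const char* src) {
    assert(dst != NULL && src != NULL);
    size_t used = strlen(dst);
    assert(used < dstSize);               // dst must already be a valid string
    size_t room = dstSize - used - 1;     // free cells, excluding the terminator
    strncat(dst, src, room);              // writes at most room chars plus '\0'
    return strlen(src) <= room;
}
```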

More C String Library Functions

int strcmp(const char* s1, const char* s2);

Compares the string pointed to by s1 to the string pointed to by s2.

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

int strncmp(const char* s1, const char* s2, size_t n);

Compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2.

The strncmp function returns an integer greater than, equal to, or less than zero, accordingly as the possibly null-terminated array pointed to by s1 is greater than, equal to, or less than the possibly null-terminated array pointed to by s2.
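Only the sign of the result is specified, so code should test it against zero rather than against particular values such as 1 or -1. The sketch below illustrates both functions; hasPrefix() and compareSign() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the string s begins with the string prefix. */
bool hasPrefix(const char* s, const char* prefix) {
    assert(s != NULL && prefix != NULL);
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: reduces the result of strcmp() to -1, 0 or 1. */
int compareSign(const char* a, const char* b) {
    assert(a != NULL && b != NULL);
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```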

Bad strcpy()!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {

   char s1[] = "K & R: the C Programming Language";
   char s2[1];

   strcpy(s2, s1);   // s2 is too small!

   printf("s1: %s\n", s1);
   printf("s2: %s\n", s2);

   return 0;
}

linux > gcc -o str03_64 -std=c99 -Wall str03.c
linux > str03_64
s1: & R: the C Prrogramming Language
s2: K& R: the C Prrogramming Language

Example

#include <stdlib.h>
#include <string.h>

/** Makes a duplicate of a given C string.
 *  Pre:     *str is a null-terminated array
 *  Returns: pointer to duplicate of *str; NULL on failure
 *  Calls:   calloc(), strlen()
 */
char* dupeString(const char* const str) {

   // Allocate array to hold duplicate, using calloc() to
   // fill new array with zeroes; return NULL if failure
   char* cpy = calloc(strlen(str) + 1, sizeof(char));
   if ( cpy == NULL ) return NULL;

   // Copy characters until terminator in *str is reached;
   // the zero fill from calloc() supplies the terminator in *cpy
   size_t idx = 0;
   while ( str[idx] != '\0' ) {
      cpy[idx] = str[idx];
      idx++;
   }

   return cpy;
}

Example

#include <stdlib.h>
#include <string.h>

/** Makes a duplicate of a given C string.
 *  Pre:     *str is a null-terminated array
 *  Returns: pointer to duplicate of *str; NULL on failure
 *  Calls:   calloc(), strlen(), memcpy()
 */
char* dupeString(const char* const str) {

   // Compute the length once, rather than walking *str twice
   size_t len = strlen(str);

   // Allocate array to hold duplicate, using calloc() to
   // fill new array with zeroes; return NULL if failure
   char* cpy = calloc(len + 1, sizeof(char));
   if ( cpy == NULL ) return NULL;

   // Use memcpy() to copy characters from *str to *cpy;
   // the zero fill from calloc() supplies the terminator in *cpy
   memcpy(cpy, str, len);

   return cpy;
}

Example

#include <stdbool.h>
#include <stddef.h>

/** Truncates a given C string at a given character.
 *  Pre:     *str is a null-terminated array
 *  Returns: true if string was truncated; false if ch does not occur in *str
 */
bool truncString(char* const str, char ch) {

   // Walk *str until ch is found or end of string is reached
   size_t idx = 0;
   while ( str[idx] != '\0' ) {

      if ( str[idx] == ch ) {
         str[idx] = '\0';   // overwrite ch with a terminator
         return true;
      }
      idx++;
   }

   return false;
}
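The same effect can be had with the standard library function strchr(), which returns a pointer to the first occurrence of a character in a string, or NULL if there is none. This is a sketch of an alternative, not the version used in these notes; the name truncAtChar() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical alternative to the loop version: truncates *str at the first
 * occurrence of ch. Returns true if the string was truncated. */
bool truncAtChar(char* str, char ch) {
    assert(str != NULL);
    // strchr() treats the terminator as part of the string, so searching
    // for '\0' would "find" it; the loop version never examines it.
    if (ch == '\0') return false;
    char* hit = strchr(str, ch);
    if (hit == NULL) return false;   // ch does not occur in *str
    *hit = '\0';
    return true;
}
```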

C Strings and I/O

The basic nature of string-handling in C causes some problems with input of strings.

The fundamental problems are:

• strings are stored in arrays of char

• these arrays are fixed-length and must be created before the input is read

• input may be unpredictable

Output and C Strings

Assuming a properly-terminated C string, writing it to a file, or standard output, is simple and safe.

The most common approach is to use fprintf():

char* str = "some very long string ... ending here";
fprintf(out, "str: %s\n", str);

With a properly-terminated string, this operation will not fail unless an output error occurs (for example, the output device is full), which seems unlikely.

Output and C Strings

We can also write formatted output into a C string with sprintf() and snprintf()…

char* str = "some very long string ... ending here";
sprintf(str2, "str: %s\n", str);

If we've made sure that *str2 is long enough to hold all the characters we are writing to it, including the terminator, this will be fine.

snprintf() allows us to specify a limit on the size of the destination:

char* str = "some very long string ... ending here";
snprintf(str2, len2, "str: %s\n", str);

At most len2 bytes, including the terminator, are written to *str2.
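snprintf() returns the number of characters that would have been written had the destination been large enough (not counting the terminator), so truncation can be detected by comparing the return value to the size limit. A minimal sketch, with formatLabel() as a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "str: <str>\n" into dst, which has dstSize
 * cells (at least 1). The result is always terminated.
 * Returns true if the whole text fit, false if it was truncated
 * or an encoding error occurred. */
bool formatLabel(char* dst, size_t dstSize, const char* str) {
    assert(dst != NULL && str != NULL && dstSize > 0);
    int needed = snprintf(dst, dstSize, "str: %s\n", str);
    return needed >= 0 && (size_t)needed < dstSize;
}
```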

fscanf() and Strings

You may use the %s conversion specifier in fscanf() to read character data into a char array:

#define MAX_NLENGTH 25
. . .
char str[MAX_NLENGTH + 1];
. . .
fscanf(in, "%s", str);

fscanf() will:

• skip leading whitespace,
• read and store characters into str[] until whitespace or EOF is encountered,
• write a terminating '\0' into str[]

BUT, with a plain %s, fscanf() has no information about the length of str[], so it may write past the end of the array!

This is (arguably) safe when the format of the input data is tightly specified.
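The scanf() family does accept a maximum field width between the % and the s, which caps the number of characters stored (the terminator is written in addition). The sketch below uses sscanf() so that it can be exercised on an in-memory string; the same format string works with fscanf(). readWord() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_NLENGTH 25

/* Hypothetical helper: reads one whitespace-delimited word from text into
 * str, storing at most MAX_NLENGTH characters plus the terminator.
 * Returns true if a word was read. */
bool readWord(const char* text, char str[MAX_NLENGTH + 1]) {
    assert(text != NULL && str != NULL);
    // The width 25 must be kept in step with MAX_NLENGTH by hand.
    return sscanf(text, "%25s", str) == 1;
}
```

Note that a longer word is silently cut off at 25 characters, and the remainder is left in the input for the next read, so the truncation hazard remains even though the overflow is gone.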

fscanf() and Strings

Suppose we want to read personal names from an input file, and we are told each line of the input file will obey the following formatting rule:

<first name><\t><middle name><\t><last name><\n>

For example:

Marion\tMitchell\tMorrison

But… how long might one of those strings be?

We have two cases:

a) a maximum length is specified by whatever is supplying the input data

b) in the absence of such guarantee, we can merely make a good guess

fscanf() and Strings

Let's say we decide the maximum name length is 25 characters:

#define MAX_NLENGTH 25
. . .
char fname[MAX_NLENGTH + 1];
char mname[MAX_NLENGTH + 1];
char lname[MAX_NLENGTH + 1];

fscanf(in, "%s %s %s", fname, mname, lname);
printf("%s\n%s\n%s\n", fname, mname, lname);

Input:

Marion\tMitchell\tMorrison

Output:

Marion
Mitchell
Morrison

OK, that worked as desired…

fscanf() and Strings

Now suppose the input file also contains a city name and a country name, so we have records that are formatted like so:

<first name><\t><middle name><\t><last name><\n>
<city name><\n>
<country name><\n>

For example:

Marion\tMitchell\tMorrison
Winterset
Iowa

Now… how long might a city or country name be?

fscanf() and Strings

Let's say we assume our earlier guess is still safe:

#define MAX_NLENGTH 25
. . .
char fname[MAX_NLENGTH + 1];
char mname[MAX_NLENGTH + 1];
char lname[MAX_NLENGTH + 1];

fscanf(in, "%s %s %s", fname, mname, lname);
printf("%s\n%s\n%s\n", fname, mname, lname);

char cityname[MAX_NLENGTH + 1];
fscanf(in, "%s", cityname);
printf("%s\n", cityname);

char countryname[MAX_NLENGTH + 1];
fscanf(in, "%s", countryname);
printf("%s\n", countryname);

Marion\tMitchell\tMorrison

Marion

Mitchell

Morrison

That looks OK…

fscanf() and Strings

But consider the following input data (yes, that's a real place name):

#define MAX_NLENGTH 25
. . .
char fname[MAX_NLENGTH + 1];
char mname[MAX_NLENGTH + 1];
char lname[MAX_NLENGTH + 1];

fscanf(in, "%s %s %s", fname, mname, lname);
printf("%s\n%s\n%s\n", fname, mname, lname);

char cityname[MAX_NLENGTH + 1];
fscanf(in, "%s", cityname);
printf("%s\n", cityname);

char countryname[MAX_NLENGTH + 1];
fscanf(in, "%s", countryname);
printf("%s\n", countryname);

Naomi Ellen Watts
Llanfairpwllgwyngyllgogerycchwyrndrobwlllllantysilioggogogoch
Wales

Now we are in trouble. cityname[] is far too small to hold this.

fscanf() and Strings

However, things still appear to be OK. Here's the output from the given code:

Naomi
Ellen
Watts
Llanfairpwllgwyngyllgogerycchwyrndrobwlllllantysilioggogogoch
Wales

But… let's add some printf() statements to check the strings after everything has been read:

Naomi
Ellen
ndrobwlllllantysilioggogogoch
Llanfairpwllgwyngyllgogerycchwyrndrobwlllllantysilioggogogoch
Wales

Apparently, reading that long place name has caused the array holding the last name to be corrupted with the tail end of the long place name. There's no runtime error, just incorrect results.

fscanf() and Strings

So, using fscanf() to read character data can lead to silent errors.

It can also lead to runtime errors.

If we merely change the placement of the array declarations in the code shown earlier, execution leads to a segfault…

#define MAX_NLENGTH 25
. . .
char cityname[MAX_NLENGTH + 1];
char countryname[MAX_NLENGTH + 1];
char fname[MAX_NLENGTH + 1];
char mname[MAX_NLENGTH + 1];
char lname[MAX_NLENGTH + 1];
. . .

Using fscanf() to read character data is clearly risky, but can be considered safe if precise assumptions about the input data can be justified.

Reading Delimited Data

Suppose we have an input file with information about music tracks:

Buddy Guy       Skin Deep                    00:04:30
Eric Clapton    I'm Tore Down                00:03:03
B. B. King      A World Full of Strangers    00:04:22
Eagles          Long Road out of Eden        00:10:17

Each line follows the pattern:

<artist><\t><track name><\t><track length><\n>

Where:

artist          alphanumeric plus spaces, no length limit
track name      alphanumeric plus spaces, no length limit
track length    hh:mm:ss, where h, m and s are digits

Now, fscanf() with %s evidently won't do for the artist and track name fields, since they may contain spaces and %s stops reading at whitespace.

fgets(), strtok(), String Library Functions

Here, the strings are delimited by tab characters; can we take advantage of that?

Buddy Guy       Skin Deep                    4:30
Eric Clapton    I'm Tore Down                3:03
B. B. King      A World Full of Strangers    4:22
Eagles          Long Road out of Eden        10:17

fgets() can be used to safely read entire lines of character data, if we have a reasonable idea of the maximum length of the line.

strtok() can be used to break up a character string into chunks, based on the occurrence of delimiting characters.

strlen() and strncpy() can be used to safely copy the chunks into individual arrays.

calloc() and strlen() can be used to create custom-sized arrays to hold the chunks.

fgets()

char* fgets(char* s, int n, FILE* stream);

Reads bytes from the stream into the array s until n - 1 bytes have been read, or a newline character has been read (and transferred to s), or an EOF is encountered. s is then terminated with a zero byte.

Returns s on success; returns NULL if an error occurs or if end-of-file is reached before any data is read.

For the input shown below, this code would read the lines sequentially into the array:

#define MAX_LINELENGTH 10000   // absurdly large guess

char data[MAX_LINELENGTH + 1];

while ( fgets(data, MAX_LINELENGTH + 1, in) != NULL ) {
   // process the data
}

Buddy Guy       Skin Deep                    4:30
Eric Clapton    I'm Tore Down                3:03
B. B. King      A World Full of Strangers    4:22
Eagles          Long Road out of Eden        10:17
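Since fgets() transfers the newline into the array, it is often convenient to strip it, and its absence is a sign that the line did not fit (or that the last line of the file had no newline). The sketch below uses the standard strcspn() function; stripNewline() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes the newline that fgets() stored, if any.
 * Returns true if a newline was found, which means a complete line was
 * read; false means the line was cut short or had no newline at all. */
bool stripNewline(char* line) {
    assert(line != NULL);
    size_t pos = strcspn(line, "\n");   // index of first '\n', or of the '\0'
    if (line[pos] == '\n') {
        line[pos] = '\0';
        return true;
    }
    return false;
}
```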

strtok()

char* strtok(char* s, const char* delimiters);

if s is not NULL:

   searches s for the first character that is not in delimiters; returns NULL if this fails.

   otherwise, notes the beginning of a token, searches s for the next character that is in delimiters, replaces that with a terminator, and returns a pointer to the beginning of the token

if s is NULL:

   performs the actions above, using the last string s passed in, beginning immediately after the end of the previous token that was found
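The usual calling pattern is one call with the string, followed by calls with NULL until strtok() returns NULL. A minimal sketch, with countTokens() as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in s, as separated by any of the
 * characters in delimiters. strtok() overwrites delimiters in s with
 * terminators, so s must be a modifiable array, not a string literal. */
size_t countTokens(char* s, const char* delimiters) {
    assert(s != NULL && delimiters != NULL);
    size_t count = 0;
    for (char* tok = strtok(s, delimiters);     // first call: pass the string
         tok != NULL;
         tok = strtok(NULL, delimiters)) {      // later calls: pass NULL
        count++;
    }
    return count;
}
```

Because leading delimiters are skipped at the start of each search, two adjacent tabs do not produce an empty token; strtok() cannot report empty fields.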

strtok()

Suppose the first line of input shown below has been read into an array data:

Buddy Guy       Skin Deep                    4:30

The array contents would be:

'B' 'u' 'd' 'd' 'y' ' ' 'G' 'u' 'y' '\t' 'S' ...

We can use strtok() to isolate the artist name, since it's followed by a tab character:

char* token = strtok(data, "\t");

After the call to strtok(), data[] looks like this:

'B' 'u' 'd' 'd' 'y' ' ' 'G' 'u' 'y' '\0' 'S' ...

And, token points to the first character in data[], and so token points to a valid C-string with a terminator.

strtok()

We can use strtok() again to isolate the title, since it's followed by a tab character:

token = strtok(NULL, "\t");

Now, data[] looks like this:

'B' 'u' 'd' 'd' 'y' ' ' 'G' 'u' 'y' '\0' 'S' 'k' 'i' 'n' ' ' 'D' 'e' 'e' 'p' '\0' ...

And, token points to the first character in the second token in data[]. . .

Copying the Token

So, we can identify the artist name, and then copy it into an appropriate array:

char* token = strtok(data, "\t");
size_t tokenLength = strlen(token);   // get token length

// allocate an array of exactly the right length
char* artist = calloc(tokenLength + 1, sizeof(char));

// copy the token into the new array
strncpy(artist, token, tokenLength);

A few points:

• calling strlen() is safe because we know the token is terminated (provided strtok() found a token, so that token is not NULL)
• calloc() fills the new array with zeros, so we have a terminator for the new string
• strncpy() is safe because the array we are copying into is known to be large enough

Reading the Following Data

Each input line has a length field (time) after the title field.

This is numeric data, and should be read as such.

The interesting part is how to get a pointer to the beginning of the length field:

char* lengthField = token + strlen(token) + 1;

strlen(token) gives us the number of characters in the title field.

We need to add 1 to that to account for the '\0' that strtok() inserted in place of the tab.

Reading the Following Data

Reading the length data is fairly trivial:

int minutes, seconds;
sscanf(lengthField, "%d%*c%d", &minutes, &seconds);

The %*c specifier accounts for the ':' that follows the minutes value in the input data. The single character is read, but discarded.
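sscanf() returns the number of values it stored, so malformed length fields can be detected by checking for 2. The sketch below assumes the m:ss form shown in the sample lines (a field written as hh:mm:ss would need a three-value format instead) and matches the ':' literally rather than with %*c; parseLength() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: parses a length field of the form "m:ss".
 * Stores the values and returns true on success; returns false, leaving
 * *minutes and *seconds untouched, if the field is malformed. */
bool parseLength(const char* field, int* minutes, int* seconds) {
    assert(field != NULL && minutes != NULL && seconds != NULL);
    int m, s;
    if (sscanf(field, "%d:%d", &m, &s) != 2) return false;   // both must convert
    if (m < 0 || s < 0 || s > 59) return false;              // basic range check
    *minutes = m;
    *seconds = s;
    return true;
}
```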

Putting it all together…

char data[MAX_LINELENGTH + 1];

FILE* in = fopen(argv[1], "r");

while ( fgets(data, MAX_LINELENGTH + 1, in) != NULL) {

   // isolate and copy the artist name
   char* token = strtok(data, "\t");
   size_t tokenLength = strlen(token);
   char* artist = calloc(tokenLength + 1, sizeof(char));
   strncpy(artist, token, tokenLength);

   // isolate and copy the title
   token = strtok(NULL, "\t");
   tokenLength = strlen(token);
   char* title = calloc(tokenLength + 1, sizeof(char));
   strncpy(title, token, tokenLength);

   // the length field begins just past the title's terminator
   char* lengthField = token + strlen(token) + 1;
   int minutes, seconds;
   sscanf(lengthField, "%d%*c%d", &minutes, &seconds);

   printf("Artist: %s\n", artist);
   printf("Title: %s\n", title);
   printf("Length: %dm %ds\n", minutes, seconds);
   printf("\n");

   // release the copies made for this line
   free(artist);
   free(title);
}

fclose(in);
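The code above assumes that every line is well formed and that every allocation succeeds. The following is a sketch of how the per-line work might be packaged with those checks added; the Track type and the names copyToken() and parseTrack() are hypothetical, and the length field is assumed to have the m:ss form shown in the sample lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for one parsed line. */
typedef struct {
    char* artist;    // heap-allocated copy; release with free()
    char* title;     // heap-allocated copy; release with free()
    int   minutes;
    int   seconds;
} Track;

/* Makes a heap copy of a terminated token; returns NULL on failure. */
static char* copyToken(const char* token) {
    size_t len = strlen(token);
    char* copy = calloc(len + 1, sizeof(char));   // zero fill gives the '\0'
    if (copy != NULL) memcpy(copy, token, len);
    return copy;
}

/* Parses "<artist>\t<title>\t<m:ss>" from line, which strtok() modifies.
 * Returns false, with nothing left allocated, if a field is missing or
 * malformed or if an allocation fails. */
bool parseTrack(char* line, Track* out) {
    assert(line != NULL && out != NULL);

    char* artistTok = strtok(line, "\t");
    if (artistTok == NULL) return false;
    char* titleTok = strtok(NULL, "\t");
    if (titleTok == NULL) return false;
    char* lengthTok = strtok(NULL, "\t\n");
    if (lengthTok == NULL) return false;

    int m, s;
    if (sscanf(lengthTok, "%d:%d", &m, &s) != 2) return false;

    char* artist = copyToken(artistTok);
    char* title  = copyToken(titleTok);
    if (artist == NULL || title == NULL) {
        free(artist);    // free(NULL) is a no-op
        free(title);
        return false;
    }

    out->artist  = artist;
    out->title   = title;
    out->minutes = m;
    out->seconds = s;
    return true;
}
```

Inside the fgets() loop, a caller would invoke parseTrack(data, &track), print the fields on success, and then free track.artist and track.title before reading the next line.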