Week 14 - Chapter 13 Strings

Review Chapter 12 homework
--------------------------

A string, in general programming terms, stores a sequence of characters for later use: a word, a sentence, a paragraph, or just a single character. In some languages strings are a special type, but in C they are just a slightly special kind of array. That is good news, because everything we've learned about arrays and pointers also applies to strings.

String literal
--------------

A string literal is one that appears directly in the text of a program: a series of characters enclosed in double quotes. We've seen these many times before:

    "Hello World!"
    "Please enter a number:"
    "%02d"

These are all string literals. Compilers often collect them together in read-only memory, because a program is not expected to change them. In fact, trying to modify a string literal has undefined behavior in C. On many systems it crashes the program, precisely because the literal sits in read-only memory.

A string literal starts with its opening double quote and ends with the closing one, and normally both are on the same physical line. If a string literal must be continued onto another line, there are a few ways to do it.

The backslash method
--------------------

    "this is a \
        string"

The biggest problem with the backslash method is that you might be tempted to indent the second part of the string, but the extra spaces become part of the string. The above is equivalent to:

    "this is a     string"

Adjacent Strings
----------------

The C89 standard introduced concatenation of adjacent string literals. Concatenation means joining two strings end to end.
Adjacent strings are those separated only by white space, and the compiler joins them into a single literal:

    "This is a " "string"

or

    "This is a "
        "string"

Unlike with the backslash method, indenting the second part is harmless here. The white space between the two literals is not part of either one.

Null-termination
----------------

String literals are just arrays of chars, but a string differs from other arrays in one big way. A string has a special character at the end that means "end of string." This character is called the null character, and every string-handling function uses it to find where the string ends. No other kind of array has an end marker like this.

The null character is written '\0'. That is a backslash followed by a zero, inside single quotes because it is a single character.

Null-termination has one side effect: the null needs space too. If you give the array an explicit size, you must include room for the null. The compiler will not make the array bigger for you.

    char msg[5] = "Hello";

Here we think we are being efficient by allocating exactly what is needed to hold the string, five characters. C accepts this, but it silently drops the null. That means msg cannot be used as a string, because there is no null-termination. C will add the null for you if you leave room for it:

    char msg[6] = "Hello";  // Perfect, room for message and null
    char msg[] = "Hello";   // Same result, and the size adjusts itself
                            // if the length of the string changes

Either way, the array is initialized from a string literal. Of course, msg refers to the beginning of the array, just like any array name.

When we write

    printf("Hello World");

the compiler creates an array of chars, plus the null, in memory, and we use it only once. After that line runs there is no way to refer back to it. If we need to print this message many times, we have to write the literal again each time, and each occurrence may be stored as a separate copy.

We can also assign strings to pointers, not just use them to initialize arrays. We've seen how to initialize an array, so let's look at the pointer:

    char *p;
    p = "abc";

This creates "abc\0" in memory and stores the address of its first character in p. Now we can use p to print "abc" as many times as we need, without creating a new copy each time we reuse p. Note that if we point p somewhere else and have not saved the address of "abc" elsewhere, we can no longer reach that string.
Also note that we can make a char pointer point to a different string literal, but we can never assign a string literal to an array name:

    char msg[80] = "Help!";
    char *p = "Help!";

    p = "hello";    // legal: p now points to a different literal
    msg = "hello";  // illegal: an array name cannot be assigned to

String variables
----------------

String variables differ from string literals in a few ways. They are almost always stored in an array. Their data is always stored in read/write memory, so it can be changed.

    char msg[81];   // Maximum msg length is 80, add one for null

C offers a full range of functions to help manipulate strings stored in string variables. We'll see them in a moment.

Printing a string
-----------------

printf can print a string with the %s format. The puts function can also be used. It allows no formatting, and it always writes a newline after the string.

    char *msg = "Hello!";

    printf("%s", msg);
    puts(msg);

Reading a string
----------------

scanf can be used to read a string, though it is cranky. gets can also be used, but it should not be, for reasons explained below.

    char msg[81];

    scanf("%s", msg);

scanf is the bane of our existence again. Here it skips any leading white space, reads one word into msg, and stops at the next white space character. msg needs no &, because an array name already acts as a pointer. scanf also does not know how big msg is, so a long word will overflow the array. A field width such as "%80s" prevents that.

gets can be used to read up to the first newline:

    gets(msg);

This reads all characters up to the newline. The newline itself is read and discarded, not stored. The biggest problem is that you never tell gets how long your array is. The limit can easily be reached and exceeded with no sign that anything went wrong. This is dangerous enough that gets was removed from the language in the C11 standard.

So, we will "roll our own" string reader.

    #include <stdio.h>

    // Reads one line from standard input into str.
    // str must have room for n + 1 characters (n plus the null).
    int read_line(char str[], int n)
    {
        int ch;     // int, not char, so EOF can be detected
        int i = 0;

        while ((ch = getchar()) != '\n' && ch != EOF) { // Read a character
            if (i < n)
                str[i++] = ch;  // Store ch, if there is still room
        }                       // Extra characters are discarded
        str[i] = '\0';          // null-terminate the string
        return i;               // Return the length of str
    }

This function can be written once and copied into every program that needs it. It is bounds-safe as long as str has room for n + 1 characters. The loop stores at most n characters, and the null takes the last slot.
Processing strings
------------------

Apart from reading a string, we normally do not need to tell a function how long our array is, because the null marks the end of the string. So, to count the number of spaces in a string, we only need to pass the string itself:

    i = count_spaces(str);  // The null marks the end of string (EOS)

    int count_spaces(char *string)
    {
        int count = 0;

        while (*string) {   // *string is one char; the loop stops at the null
            if (*string == ' ')
                count++;
            string++;       // Move pointer to the right
        }
        return count;
    }

Of course, if we prefer, we can index the array instead:

    int count_spaces(char string[])
    {
        int count = 0;
        int i = 0;

        while (string[i] != '\0') {
            if (string[i] == ' ')
                count++;
            i++;
        }
        return count;
    }

Both of these work well, so pick whichever makes you more comfortable.

The string library
------------------

The string library, declared in <string.h>, supplies several useful functions for working with strings. The first gets around a problem shared by all arrays:

    char msg[] = "This is a message";
    char msg2[81];

    msg2 = msg;     // Can't do it

You can't assign one array to another. To copy a string from one array to another, use the strcpy function:

    strcpy(msg2, msg);

The first argument is the destination and the second is the source. Note that there is no bounds checking. The programmer must ensure that msg2 is large enough to hold all of msg, including the null.

You also cannot do this to test two strings for equality:

    if (msg == msg2)

Both names act as pointers, so == compares the addresses of the two arrays, not their contents. Two different arrays are never at the same address, so the test is always false. Use the strcmp function instead:

    i = strcmp(msg2, msg);

strcmp compares the two strings character by character. It returns 0 if they are exactly alike. It returns a negative number if msg2 comes before msg, and a positive number if msg2 comes after msg. Don't count on getting exactly -1 or 1; test only whether the result is zero, negative, or positive. The comparison uses the character codes, so case counts: in ASCII, every uppercase letter comes before every lowercase letter.

To find the length of a string, we could write a loop that counts until it hits the null, but strlen already does that:

    i = strlen(msg);

i is the number of characters before the null; the null itself is not counted.
Segue - efficiency
------------------

When using string functions, be careful to avoid unneeded overhead. For example:

    for (i=0; i