CPS 202 Class 8 - Chapter 7 Types

Review Chapter 6 homework
-------------------------

Now that we have had a good introduction to how C handles basic topics
like variables, conditions, and loops, it is time for us to start
moving into the expanded C universe. The first stop will be to discuss
how C data types work, and to take a look at some of the details of
memory and C.

We have seen two types of data thus far: the floating point type and
the integer type. The floating point type is differentiated because it
can hold decimal values, as opposed to integer types, which can hold
whole numbers only.

Integer data is particularly useful because it is also highly portable.
Each value in an integer has a highly structured and tightly defined
storage pattern. To store data, integers manipulate bits.

Bit    - the most basic unit of computer memory, with a value of
         either 0 or 1
8 bits - make up one byte, the unit of measure we most often use when
         referring to memory. A byte can store one of 256 possible
         values.

The int data type is a special kind of integer type in C. It is, by
definition, the native data type for any particular computer or
compiler. Its size can, and very often does, vary.

16-bit machines - 8086, 8088, 80286, DOS compilers - an int has 16
                  bits, or two bytes
32-bit machines - 80386, 486, Pentium - Unix, Windows - an int has 32
                  bits, or four bytes. It is twice the size of a 16-bit
                  int, but can hold billions more values
64-bit machines - relatively rare, but Itanium, a new Intel chip, is
                  64-bit native. Some operating systems run at 64-bit
                  native. An int can be 8 bytes, though most 64-bit
                  compilers keep int at 4 bytes and make long or
                  long long 8 bytes instead.

To best understand the relationship between memory and data, it will be
useful to use a memory map:

+--+--+--+--+--+--+--+
| 1|    2|          4|
+--+--+--+--+--+--+--+

1 byte, 2 bytes, and 4 bytes, as arrayed in memory. Each integer data
type will take up 1, 2, or 4 bytes. Sometimes 8.

To better understand the memory map, it is now time to learn about
hexadecimal numbering.
There are actually three main numbering systems that are used in C. The
main one is decimal, with the 10 digits we know and love. Another is
the octal numbering system, which uses only the digits 0 through 7. It
is only rarely used. The second most common is hexadecimal. Hex has 16
digits, 0 through 9 and A through F. To count in hex, when you hit 9,
use A as the next digit instead of rolling over to 10:

 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

One more unit of memory measure: nybble. Equal to four bits, rarely
used. But each nybble corresponds to exactly one hex digit. So a nybble
can store 16 values, 0 - F. Each byte is made up of two nybbles. FF is
the highest hex value that can be stored in a single byte. 00 is the
lowest.

For ease of use, let's assume 16-bit ints.

int i;

{  i  }
+--+--+
|     |   <-- space set aside now for i
+--+--+

Now, to set i equal to a value, we use a decimal value in the code, but
it is stored in memory as a bit pattern, which we write out in hex:

i = 0;

{  i  }
+--+--+
|00 00|
+--+--+

i++;

{  i  }
+--+--+
|00 01|
+--+--+

Each number has a unique hex representation. To use the hex numbering
system in C, we precede a hex number with "0x":

i = 0xabcd;    // 0xabcd is 43981 in decimal

{  i  }
+--+--+
|ab cd|
+--+--+

If we view a file in an editor that can display data in hex, this is
the kind of thing we will see - dozens, even hundreds or thousands of
hex pairs.

So C has this int data type, which can hold different ranges of values
- what if we're on a 16-bit machine and we need to store a value higher
than int can store? C also has the ability to declare portable data
types. int would be considered non-portable because the values it can
hold can change.

short int      <-- better known as just "short"       16-bit
long int       <-- better known as just "long"        32-bit
long long int  <-- better known as just "long long"   64-bit

These are the usual sizes. Strictly, C only promises them as minimums,
and long is 64-bit on many 64-bit Unix systems.

Knowing which to choose means knowing your data. Shorts can hold 65536
values. Longs can hold over 4 billion values.
Knowing your data will tell you which to use. If you need to store an
area code, the short makes more sense - the three-digit number fits
into a short nicely - a long would have too much waste. A phone number
has seven digits, obviously too big for a short. A long is the perfect
choice.

Integer data has one further division, that between signed numbers and
unsigned numbers. Signed numbers are used by default.

Signed short:   -32768 through 32767
Signed long:    -2147483648 through 2147483647

Notice that in the short, there are 65536 unique values, but half of
them are negative.

Unsigned short: 0 through 65535
Unsigned long:  0 through 4294967295

The bit pattern of the highest signed short is:

{  s  }
+--+--+
|7f ff|
+--+--+

The value of -1 is:

{  s  }
+--+--+
|ff ff|
+--+--+

You have to be careful about overflowing the value of any variable, but
especially of smaller ones like short. If you do:

short s = 32767;

Then you have set s equal to the highest value it can have. To do:

s++;

Means that you have overflowed the variable's range. On the usual
machines this does not crash the program, it simply modifies the bit
pattern:

{  s  }
+--+--+
|80 00|
+--+--+

This gives us a value of -32768. As we continue to add one to the bit
pattern, we move closer and closer to 0, until we hit 0 and are back in
positive territory, in a never-ending loop.

With unsigned numbers, there is still overflow, but it never brings us
into negatives. For a 16-bit unsigned short, 65535 plus 1 is 0.

Constants
---------

We've used constant numbers for a while, but now that we are adding new
data types, we have to be sure to refer to them correctly.

10 20 30      <-- ints (there is no suffix for short; the value is
                  converted when it is stored in a short)
10l 20l 30l   <-- longs
10ll          <-- long longs
0x45ab        <-- int in hex
0x12345678    <-- long in hex (too big for a 16-bit int)
0777          <-- octal (511 in decimal)

When forcing a type with a numeric suffix, uppercase or lowercase may
be used. I prefer lower.
printf
------

When printing a long or a short, be sure to modify the format
appropriately:

int   i = 0;
short s = 0;
long  l = 0l;

printf("%d\n",i);     // Use %d on an int
printf("%hd\n",s);    // Use %hd on a short
printf("%ld\n",l);    // Use %ld on a long

This is ESPECIALLY important with scanf, which stores through a pointer
and will write the wrong number of bytes if the format does not match.
%o and %x can be used to read and write data in octal and hex, if
needed.

Floating point
--------------

float        - big
double       - bigger
long double  - biggest

My favorite home C compiler uses float as its default floating point
type; Visual C++ uses double as its default.

Floating point types have no simple range like the integer types,
because the method used to store the data differs from that of integer
data. Integer data is stored in straight bit patterns. Floating point
is stored in a standard way, with groups of bits in the storage each
meaning something: the sign of the number, the characteristic (the
exponent), and the mantissa (the fractional digits). There is no such
thing as an unsigned floating point number. The usual format is
something called IEEE format.

So while a 1 is 0001 in hex for a 16-bit int, 1.0 is 3f 80 00 00 as a
4-byte IEEE float (a float is normally 4 bytes and a double 8).

Constants:

1.0    1.5    3.141    1.0f (forces float)

To find out the size of a type, use the sizeof operator:

printf("The size of a float is: %d\n", (int)sizeof(float));

The cast is there because sizeof does not produce an int, and %d
expects one.

Character type
--------------

We have discussed shorts and longs which take up 2 and 4 bytes. But is
there a data type that only takes up 1 byte? Yes, it is called char. It
is called char because it fits a character perfectly (prior to
Unicode). Chars have a range of 256 values. A signed char has a range
of -128 through 127; an unsigned char has 0 through 255.

To understand how char works in the storage of characters,
understanding ASCII is critical. ASCII is an encoding method for
assigning numbers to the letters on the keyboard. We can refer to
characters in C by their ASCII value, which is rare, or by name.
To use the name, enclose the character in single quotes:

char ch;

ch = 'A';
ch = 65;      // 65 is the ASCII value of 'A'
ch = 'a';
ch = 97;      // 97 is the ASCII value of 'a'

To print a character, use the %c format ... same to read one. Still be
sure to use the & when needed:

printf("%c",ch);
scanf("%c",&ch);

We have our first look, though, at a replacement for scanf, a function
that can read a single character from the keyboard: getchar()

ch = getchar();   // reads a single character from the keyboard

while((ch = getchar()) != '\n')
    ;

This snippet of code reads and discards characters until the user
presses ENTER.

We will work with getchar a lot more in the coming weeks, but here is a
preview. Code that allows us to process incoming keystrokes:

int ch;       // getchar() returns an int so that EOF can be detected
int upper=0, lower=0, total=0;

while((ch = getchar()) != '\n' && ch != EOF)
{
    total++;

    if (ch >= 'a' && ch <= 'z')
        lower++;
    else if (ch >= 'A' && ch <= 'Z')
        upper++;
}

printf("There were %d characters; %d were uppers, %d were lowers\n",
       total, upper, lower);

Conversion
----------

I've noted several times that people got lucky when they mixed integer
and floating point math in one statement. For example:

int i = 15;
float f = 10.0;

i / 2 * f = 70     i / 2 is integer division: 7, not 7.5 ... 7 * 10 = 70
i * f / 2 = 75     i * f = 150.0 ... 150.0 / 2 = 75

This happens because C has automatic type conversions. Internally, C
cannot multiply an int by a float, so it has to convert one of them.
The rule says to always convert up, so:

f / 2

works because f is a float, 2 is an integer, and the 2 converts up to a
float. If it converted down, then the f would become int and we would
lose precision.

If you always use the .0 in your whole numbers when you're really doing
floating point math, then you'll always be safe.

i / 2.0 * f = 75   because the i is promoted to floating point

int i = 1;
float f = 1.0;
long l;

l = i + f;

First, i is converted to a float, so that two floats can be added
together.
Now, because we are assigning to a long, the conversion has to go down,
so the float is downgraded to a long and that value is stored in l. In
this case, it works out to be 2 no matter what. But what if f was 1.5?
Then l would still be 2, even though the result of i + f was 2.5,
because the fractional part is dropped on the way down.

You can force your own conversions by typecasting:

l = (long)((float)i + f);

Typedef
-------

typedef is a C keyword that allows you to create your own type names
from existing types. Let's say you like the "bool" data type, which
will indicate ints that will have a value of 1 or 0.

typedef int bool;

The new name is the last thing in the list, and everything before it is
what bool is the same as. Sounds like a define:

#define BOOL int

But typedef is more robust, because the compiler treats the new name as
a real type, and debuggers often understand it.

bool my_flag;