CPS 202
Class 8 - Chapter 7
Types

Review Chapter 6 homework
-------------------------

Now that we have had a good introduction to how C handles basic topics like
variables, conditions, and loops, it is time for us to start moving into the
expanded C universe.  The first stop will be to discuss how C data types work,
and to take a look at some of the details of memory and C.

We have seen two types of data thus far: the floating point type and the integer
type.  The floating point type can hold values with a decimal fraction, while
integer types can hold whole numbers only.

Integer data is particularly useful because its storage is simple and tightly
defined: every integer value maps to one fixed pattern of bits.  Integers are
stored directly as bits.

Bit - the most basic unit of computer memory, with a value of either 0 or 1

8 bits - make up one byte, the unit of measure we most often use when referring
to memory.  It can store one of up to 256 possible values.

The int data type is a special kind of integer type in C.  It is, by definition,
the native data type for any particular computer or compiler.  Its size can, and
very often does, vary from one machine to the next.

16-bit machines - 8086, 8088, 80286, DOS compilers - an int has 16 bits or two
bytes

32-bit machines - 80386, 486, Pentium - Unix, Windows - an int has 32 bits, or
four bytes.  It is twice the size of a 16-bit int, but can hold billions more
values

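64-bit machines - relatively rare, but Itanium, a new Intel chip, is 64-bit
native.  Some operating systems run at 64-bit native.  An int can be 8 bytes
here, but many 64-bit compilers keep int at 4 bytes and use long or long long
for the 8-byte type.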

To best understand the relationship between memory and data, it will be useful
to use a memory map:

+--+--+--+--+--+--+--+
| 1|    2|          4|
+--+--+--+--+--+--+--+

1 byte, 2 bytes, and 4 bytes, as arrayed in memory.  Each integer data type will
take up 1, 2, or 4 bytes.  Sometimes 8.

To better understand the memory map, it is now time to learn about hexadecimal
numbering.  There are actually three main numbering systems that are used in C.
The main one is decimal, with the 10 digits we know and love.  The second most
common is hexadecimal.  The third is the octal numbering system, which uses only
the digits 0 through 7.  It is only rarely used.

Hex has 16 digits, 0 through 9 and A through F.  To count in hex, when you pass
9, keep going with A through F; only after F do you carry over to 10:

0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

One more unit of memory measure: nybble.  Equal to four bits, rarely used.  But
each nybble corresponds to exactly one hex digit.  So a nybble can store 16
values, 0 - F.  Each byte is made up of two nybbles.

FF is the highest hex value that can be stored in a single byte.  00 is the
lowest.

For ease of use, let's assume 16-bit ints.

int i;

{  i  }
+--+--+
|     |     <-- space set aside now for i
+--+--+

Now, to set i equal to a value, we use a decimal value in the code.  In memory
it is stored as bits, which we write out in hex because each hex pair is exactly
one byte:

i = 0;

{  i  }
+--+--+
|00 00|
+--+--+

i++;

{  i  }
+--+--+
|00 01|
+--+--+

Each number has a unique hex representation.  To use the hex numbering system in
C, we precede a hex number with "0x":

i = 0xabcd;    // i = 43981

{  i  }
+--+--+
|ab cd|
+--+--+

If we view a file in an editor that has the ability to display data in hex, this
is the kind of thing we will see - dozens, even hundreds or thousands of hex
pairs.

So C has this int data type, which can hold different ranges of values - what if
we're on a 16-bit machine and we need to store a value higher than int can
store?  C also has integer types whose sizes vary much less from compiler to
compiler.  int would be considered non-portable because the values it can hold
can change.

short int      <--  better known as just "short"      at least 16-bit
long int       <--  better known as just "long"       at least 32-bit
long long int  <--  better known as just "long long"  at least 64-bit

C only promises these minimum sizes.  In these notes we treat short as 16-bit
and long as 32-bit.

Knowing which to choose means knowing your data.  A short can hold 65536
different values.  A long can hold over 4 billion.  If you need to store an area
code, the short makes more sense: the three-digit number fits into a short
nicely, and a long would waste space.  A seven-digit phone number is obviously
too big for a short.  A long is the perfect choice.

Integer data has one further division, that between signed numbers and unsigned
numbers.  Signed numbers are used by default.

Signed short: -32768 through 32767
Signed long:  -2147483648 through 2147483647

Notice that in the short, there are 65536 unique values, but half of them are
negative.

Unsigned short:  0 through 65535
Unsigned long:   0 through 4294967295

The bit pattern of the highest signed short is:

{  i  }
+--+--+
|7f ff|
+--+--+

The value of -1 is:

{  i  }
+--+--+
|ff ff|
+--+--+

You have to be careful about overflowing the value of any variable, but
especially of smaller ones like short.  If you do:

short s = 32767;

Then you have set s equal to the highest value it can have.  To do:

s++;

Means that you have overflowed the variable's range.  This does not crash the
program; on typical machines it simply wraps the bit pattern around:

{  i  }
+--+--+
|80 00|
+--+--+

This gives us a value of -32768, the lowest value a short can hold.  As we
continue to add one to the bit pattern, we move closer and closer to 0, until we
hit 0 and are back in positive territory, on a never-ending loop.  With unsigned
numbers, there is still overflow, but it never brings us into negatives.  65535
plus 1 is 0.

Constants
---------

We've used constant numbers for a while, but now that we are adding new data
types, we have to be sure to refer to them correctly.

10 20 30    <-- ints (there is no suffix for short)
10l 20l 30l <-- longs
10ll        <-- long longs
0x45ab      <-- int in hex
0x12345678  <-- hex; becomes a long if it does not fit in an int
0777        <-- octal (511 in decimal)

When forcing a type with the numeric suffix, uppercase or lowercase may be used.
I prefer lower.

printf
------

When printing a long or a short, be sure to modify the format appropriately:

int i = 0;
short s = 0;
long l = 0l;

printf("%d\n",i);     // Use %d on an int
printf("%hd\n",s);    // Use %hd on a short
printf("%ld\n",l);    // Use %ld on a long

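This is ESPECIALLY important with scanf, which stores the value directly into
your variable: the wrong format can write more bytes than the variable has.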

%o and %x can be used to read and write data in octal and hex, if needed.

Floating point
--------------

float       - big
double      - bigger
long double - biggest

My favorite home C compiler uses float as its default floating point type;
Visual C++ uses double as its default, which is also what standard C specifies
for a constant like 1.5.  Floating point types have no simple whole-number
ranges, because the method used to store the data differs from that of integer
data.  Integer data is stored in straight bit patterns.  Floating point is
stored in a standard way, with each group of bits in the storage meaning
something: the sign of the number, the exponent (also called the
characteristic), and the mantissa (the significant digits).  There is no such
thing as an unsigned floating point number.  The usual format is called IEEE
754.

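So while a 1 is 0001 in hex for a 16-bit int, 1.0 is 3f800000 in hex for a
4-byte IEEE float: the same number with a completely different bit pattern
(other floating point formats and sizes exist, but this is the common one).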

Constants: 1.0  1.5  3.141  1.0f (forces float)

To find out the size of a type, use the sizeof keyword.  It gives back an
unsigned size, so cast it to int when printing with %d:

printf("The size of a float is: %d\n", (int)sizeof(float));

Character type
--------------

We have discussed shorts and longs which take up 2 and 4 bytes.  But is there a
data type that only takes up 1 byte?  Yes, it is called char.  It is called char
because it fits a character perfectly (prior to Unicode).  Chars have a range of
256 values.

A signed char has a range of -128 through 127; an unsigned char has a range of 0
through 255.  To understand how char works in the storage of characters,
understanding ASCII is critical.  ASCII is an encoding method for assigning
numbers to letters on the keyboard.  We can refer to characters in C by their
ASCII value, which is rare, or by the character itself.  To use the character,
enclose it in single quotes:

char ch;
ch = 'A';
ch = 65;     // 65 is the ASCII value of 'A'
ch = 'a';
ch = 97;     // 97 is the ASCII value of 'a'

To print a character, use the %c format ... same to read one.  Still be sure
to use the & when needed:

printf("%c",ch);
scanf("%c",&ch);

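We have our first look, though, at a replacement for scanf, with a function that
can read a single character from the keyboard:  getchar().  It returns an int
rather than a char so that it can also return the special value EOF when the
input runs out.

int ch;            // int, not char, so that EOF can be told apart

ch = getchar();    // reads a single character from the keyboard

while((ch = getchar()) != '\n' && ch != EOF);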

This snippet of code reads and discards characters until the user presses ENTER.
We will work with getchar a lot more in the coming weeks, but here is a preview.
Code that allows us to process incoming keystrokes:

int ch;            // int so that EOF can be detected
int upper=0, lower=0, total=0;

while((ch = getchar()) != '\n' && ch != EOF) {
  total++;
  if (ch >= 'a' && ch <= 'z') lower++;
  else if (ch >= 'A' && ch <= 'Z') upper++;
}

printf("There were %d characters; %d were uppers, %d were lowers\n",
       total, upper, lower);


Conversion
----------

I've noted several times that people got lucky when they mixed integer and
floating point math in one statement.  For example:

int i = 15;
float f = 10.0;

i / 2 * f = 70      i / 2 is 7 (not 7.5), then 7 * 10.0 = 70

i * f / 2 = 75      i * f is 150.0, then 150.0 / 2 = 75

This happens because C has automatic type conversions.  Internally, C cannot
multiply an int by a float, so it has to convert one of them.  The rule says to
always convert up, so:

f / 2

works because f is a float, 2 is an integer, and the 2 converts up to a float.
If it converted down, then the f would become int and we would lose precision.
If you always use the .0 in your whole numbers when you're really doing floating
point math, then you'll always be safe.

i / 2.0 * f = 75  because the i is promoted to floating point before dividing

int i = 1;
float f = 1.0;
long l;

l = i + f;

First, i is converted to a float, so that two floats can be added together. Now,
because we are assigning to a long, the conversion has to go down, so the float
is downgraded to a long and that value is stored in l.  In this case, it works
out to be 2 no matter what.  But what if f was 1.5?  Then l would still be 2,
even though the result of i + f was 2.5: converting down throws away the
fraction rather than rounding.

You can force your own conversions by doing typecasting:

l = (long)((float)i + f);

Typedef
-------

Typedef is a C keyword that allows you to create your own types from existing
ones.  Let's say you would like a "bool" data type, to mark ints that will only
hold a value of 1 or 0.

typedef int bool;

The new name is the last thing in the list, and everything before it is what
bool is the same as.  Sounds like a define:

#define BOOL int

But typedef is more robust: the compiler treats the new name as a real type
instead of doing a text substitution, and debuggers usually understand it.

bool my_flag;