Lec 3 Fundamental Variable Definition, Initialization, and Assignment


Addressing memory

In the previous lesson on variables, we talked about the fact that variables are names for a piece of memory that can be used to store information. To recap briefly, computers have random access memory (RAM) that is available for programs to use. When a variable is defined, a piece of that memory is set aside for that variable.

The smallest unit of memory is a binary digit (bit), which can hold a value of 0 or 1. You can think of a bit as being like a traditional light switch -- either the light is off (0), or it is on (1). There is no in-between. If you were to look at a random segment of memory, all you would see is …011010100101010… or some combination thereof.

Memory is organized into sequential units called memory addresses (or addresses for short). Similar to how a street address can be used to find a given house on a street, the memory address allows us to find and access the contents of memory at a particular location. Perhaps surprisingly, in modern computers, each bit does not get its own address. The smallest addressable unit of memory is known as a byte. The modern standard is that a byte consists of 8 sequential bits. Note that some older or non-standard machines may have bytes of a different size -- however, we generally need not worry about these. For these tutorials, we’ll assume a byte is 8 bits.

The following picture shows some sequential memory addresses, along with the corresponding byte of data:

Because all data on a computer is just a sequence of bits, we use a data type (often called a “type” for short) to tell us how to interpret the contents of memory in some meaningful way. You have already seen one example of a data type: the integer. When we declare a variable as an integer, we are telling the compiler “the piece of memory that this variable addresses is going to be interpreted as a whole number”.

When you assign a value to a variable, the compiler and CPU take care of the details of encoding your value into the appropriate sequence of bits for that variable’s data type. When you ask for your value back, your number is “reconstituted” from the sequence of bits in memory.

There are many other data types in C++ besides the integer, most of which we will cover shortly. As shorthand, we typically refer to a variable’s “data type” as its “type”.

Fundamental data types

C++ comes with built-in support for certain data types. These are called fundamental data types (in the C++ specification), but are often informally called basic types, primitive types, or built-in types.

Here is a list of the fundamental data types, some of which you have already seen:

Category / Types / Meaning / Example / Notes
boolean / bool / true or false / true
character / char / a single ASCII character / ‘c’ / char16_t, char32_t are C++11 only
floating point / float, double, long double / a number with a decimal / 3.14159
integer / short, int, long, long long / a whole number / 64 / long long is C99/C++11 only
void / void / no type / n/a

This chapter is dedicated to exploring these basic data types in detail.

Defining a variable

In the “basic C++” section, you already learned how to define an integer variable:

int nVarName; // int is the type, nVarName is the name of the variable

To define variables of other data types, the idea is exactly the same:

type varName; // type is the type (e.g. int), varName is the name of the variable

In the following example, we define 5 different variables of 5 different types.

bool bValue;
char chValue;
int nValue;
float fValue;
double dValue;

Note that void has special rules about how it can be used, so the following won’t work:

void vValue; // won't work, void can't be used as a type for variable definitions

Variable initialization

When a variable is defined, you can immediately give that variable a value. This is called variable initialization (or initialization for short).

C++ supports three basic ways to initialize a variable. First, we can do copy initialization by using an equals sign:

int nValue = 5; // copy initialization

Note that the equals sign here is just part of the syntax, and is not the same equals sign used to assign a value once the variable has been created.

Second, we can do direct initialization by using parentheses:

int nValue(5); // direct initialization

Even though direct initialization form looks a lot like a function call, the compiler keeps track of which names are variables and which are functions so that they can be resolved properly.

Direct initialization can perform better than copy initialization for some data types, and comes with some other benefits once we start talking about classes. It also helps differentiate initialization from assignment. Consequently, we recommend using direct initialization over copy initialization.

Rule: Favor direct initialization over copy initialization

Uniform initialization in C++11

Because C++ grew organically, the copy initialization and direct initialization forms only work for some types of variables (for example, you can’t use either of these forms to initialize a list of values).

In an attempt to provide a single initialization mechanism that will work with all data types, C++11 adds a new form of initialization called uniform initialization (also called brace initialization):

int value{5};

Initializing a variable with empty braces indicates value initialization. For fundamental types, value initialization initializes the variable to zero (or empty, if that’s more appropriate for a given type).

int value{}; // value initialization to 0

Uniform initialization has the added benefit of disallowing “narrowing” type conversions. This means that if you try to use uniform initialization to initialize a variable with a value it cannot safely hold, the compiler will issue a warning or an error. For example:

int value{4.5}; // error: an integer variable cannot hold a non-integer value

Rule: If you’re using a C++11 compatible compiler, favor uniform initialization

Variable assignment

When a variable is given a value after it has been defined, it is called a copy assignment (or assignment for short).

int nValue;
nValue = 5; // copy assignment

C++ does not provide any built-in way to do a direct or uniform assignment.

Uninitialized variables

A variable that is not initialized is called an uninitialized variable. In C++, a fundamental variable that is uninitialized will have a garbage value until you assign a valid one.

Side note: C++ also has other non-fundamental types, such as pointers, structs, and classes. Some of these do not initialize by default, and some of them do. We’ll explore these types in future lessons. For now, it’s safer to assume all types do not initialize by default.

Rule: Always initialize your fundamental variables, or assign a value to them as soon as possible after defining them.

Defining multiple variables

It is possible to define multiple variables of the same type in a single statement by separating the names with a comma. The following 2 snippets of code are effectively the same:

int a, b;

int a;
int b;

You can also initialize multiple variables defined on the same line:

int a = 5, b = 6;
int c(7), d(8);
int e{9}, f{10};

There are three mistakes that new programmers tend to make when defining multiple variables in the same statement.

The first mistake is giving each variable a type when defining variables in sequence. This is not a bad mistake because the compiler will complain and ask you to fix it.

int a, int b; // wrong (compiler error)

int a, b; // correct

The second error is to try to define variables of different types on the same line, which is not allowed. Variables of different types must be defined in separate statements. This is also not a bad mistake because the compiler will complain and ask you to fix it.

int a, double b; // wrong (compiler error)

int a; double b; // correct (but not recommended)

// correct and recommended (easier to read)
int a;
double b;

The last mistake is the dangerous case. In this case, the programmer mistakenly tries to initialize both variables by using one initialization statement:

int a, b = 5; // wrong (a is uninitialized!)

int a = 5, b = 5; // correct

In the top statement, variable “a” will be left uninitialized, and the compiler may or may not complain. If it doesn’t, this is a great way to have your program intermittently crash and produce sporadic results.

The best way to remember that this is wrong is to consider the case of direct initialization or uniform initialization:

int a, b(5);
int c, d{5};

This makes it a little clearer that the value 5 is only being used to initialize variable b (or d), not a (or c).

Because defining multiple variables on a single line AND initializing them is a recipe for mistakes, we recommend that you only define multiple variables on a line if you’re not initializing any of them.

Rule: Avoid defining multiple variables on a single line if initializing any of them.

Where to define variables

Older C compilers forced users to define all of the variables in a function at the top of the function:

#include <iostream>

int main()
{
    // all variables up top
    int x;
    int y;

    // then code
    std::cout << "Enter a number: ";
    std::cin >> x;

    std::cout << "Enter another number: ";
    std::cin >> y;

    std::cout << "The sum is: " << x + y << std::endl;
    return 0;
}

This style is now obsolete. C++ compilers do not require all variables to be defined at the top of a function. The proper C++ style is to define variables as close to the first use of that variable as you reasonably can:

#include <iostream>

int main()
{
    std::cout << "Enter a number: ";
    int x; // we need x on the next line, so we'll declare it here, as close to its first use as we reasonably can
    std::cin >> x; // first use of x

    std::cout << "Enter another number: ";
    int y; // we don't need y until now, so it gets declared here
    std::cin >> y; // first use of y

    std::cout << "The sum is: " << x + y << std::endl;
    return 0;
}

This has quite a few advantages.

First, variables that are defined only when needed are given context by the statements around them. If x were defined at the top of the function, we would have no idea what it was used for until we scanned the function and found where it was used. Defining x amongst a bunch of input/output statements helps make it obvious that this variable is being used for input and/or output.

Second, defining a variable only where it is needed tells us that this variable does not affect anything above it, making our program easier to understand and requiring less scrolling.

Finally, it reduces the likelihood of inadvertently leaving a variable uninitialized, because we can define and then immediately initialize it with the value we want it to have.

Most of the time, you’ll be able to declare a variable on the line immediately preceding the first use of that variable. However, you will occasionally encounter a case where this is either not desirable (due to performance reasons), or not possible (because the variable will get destroyed and you need it later). We’ll see examples of these cases in future chapters.

Variable sizes and the sizeof operator

Memory on modern machines is typically organized into byte-sized units, with each unit having a unique address. Up to this point, it has been useful to think of memory as a bunch of cubbyholes or mailboxes where we can put and retrieve information, and variables as names for accessing those cubbyholes or mailboxes.

However, this analogy is not quite correct in one regard -- most variables actually take up more than 1 byte of memory. Consequently, a single variable may use 2, 4, or even 8 consecutive memory addresses. The amount of memory that a variable uses is based on its data type. Fortunately, because we typically access memory through variable names and not memory addresses, the compiler is largely able to hide the details of working with different sized variables from us.

There are several reasons it is useful to know how much memory a variable takes up.

First, the more memory a variable takes up, the more information it can hold. Because each bit can only hold a 0 or a 1, a single bit can hold 2 possible values.

2 bits can hold 4 possible values:

bit 0 / bit 1
0 / 0
0 / 1
1 / 0
1 / 1

3 bits can hold 8 possible values:

bit 0 / bit 1 / bit 2
0 / 0 / 0
0 / 0 / 1
0 / 1 / 0
0 / 1 / 1
1 / 0 / 0
1 / 0 / 1
1 / 1 / 0
1 / 1 / 1

To generalize, a variable with n bits can hold 2^n (2 to the power of n) possible values. With an 8-bit byte, a byte can store 2^8 (256) possible values.

The size of the variable puts a limit on the amount of information it can store -- variables that utilize more bytes can hold a wider range of values. We will address this issue further when we get into the different types of variables.

Second, computers have a finite amount of free memory. Every time we declare a variable, a small portion of that free memory is used for as long as the variable is in existence. Because modern computers have a lot of memory, this often isn’t a problem, especially if only declaring a few variables. However, for programs that need a large number of variables (e.g. 100,000), the difference between using 1-byte and 8-byte variables can be significant.

The size of C++ basic data types

The obvious next question is “how much memory do variables of different data types take?”. You may be surprised to find that the size of a given data type is dependent on the compiler and/or the computer architecture!

C++ guarantees that the basic data types will have a minimum size:

Category / Type / Minimum Size / Note
boolean / bool / 1 byte
character / char / 1 byte / May be signed or unsigned; always exactly 1 byte
character / wchar_t / 1 byte
character / char16_t / 2 bytes / C++11 type
character / char32_t / 4 bytes / C++11 type
integer / short / 2 bytes
integer / int / 2 bytes
integer / long / 4 bytes
integer / long long / 8 bytes / C99/C++11 type
floating point / float / 4 bytes
floating point / double / 8 bytes
floating point / long double / 8 bytes

However, the actual size of the variables may be different on your machine (particularly int, which is more often 4 bytes). In order to determine the size of data types on a particular machine, C++ provides an operator named sizeof. The sizeof operator is a unary operator that takes either a type or a variable, and returns its size in bytes. You can compile and run the following program to find out how large some of your data types are:

#include <iostream>

int main()
{
    std::cout << "bool:\t\t" << sizeof(bool) << " bytes" << std::endl;
    std::cout << "char:\t\t" << sizeof(char) << " bytes" << std::endl;
    std::cout << "wchar_t:\t" << sizeof(wchar_t) << " bytes" << std::endl;
    std::cout << "char16_t:\t" << sizeof(char16_t) << " bytes" << std::endl; // C++11, may not be supported by your compiler
    std::cout << "char32_t:\t" << sizeof(char32_t) << " bytes" << std::endl; // C++11, may not be supported by your compiler
    std::cout << "short:\t\t" << sizeof(short) << " bytes" << std::endl;
    std::cout << "int:\t\t" << sizeof(int) << " bytes" << std::endl;
    std::cout << "long:\t\t" << sizeof(long) << " bytes" << std::endl;
    std::cout << "long long:\t" << sizeof(long long) << " bytes" << std::endl; // C++11, may not be supported by your compiler
    std::cout << "float:\t\t" << sizeof(float) << " bytes" << std::endl;
    std::cout << "double:\t\t" << sizeof(double) << " bytes" << std::endl;
    std::cout << "long double:\t" << sizeof(long double) << " bytes" << std::endl;
    return 0;
}

Here is the output from the author’s x64 machine (in 2015), using Visual Studio 2013:

bool: 1 bytes
char: 1 bytes
wchar_t: 2 bytes
char16_t: 2 bytes
char32_t: 4 bytes
short: 2 bytes
int: 4 bytes
long: 4 bytes
long long: 8 bytes
float: 4 bytes
double: 8 bytes
long double: 8 bytes

Your results may vary if you are using a different type of machine, or a different compiler. Note that you cannot use sizeof on the void type, since it has no size (doing so will cause a compile error).

If you’re wondering what ‘\t’ is in the above program, it’s a special symbol that inserts a tab (in the example, we’re using it to align the output columns). We will cover ‘\t’ and other special symbols when we talk about the char data type.

You can also use the sizeof operator on a variable name:

int x;
std::cout << "x is " << sizeof(x) << " bytes" << std::endl;

x is 4 bytes

Integers

An integer type (sometimes called an integral type) variable is a variable that can only hold non-fractional numbers (e.g. -2, -1, 0, 1, 2). C++ has five different fundamental integer types available for use:

Category / Type / Minimum Size / Note
character / char / 1 byte
integer / short / 2 bytes
integer / int / 2 bytes / Typically 4 bytes on modern architectures
integer / long / 4 bytes
integer / long long / 8 bytes / C99/C++11 type

Char is a special case, in that it falls into both the character and integer categories. We’ll talk about the special properties of char later. In this lesson, you can treat it as a normal integer.

The key difference between the various integer types is that they have varying sizes -- the larger integers can hold bigger numbers.

Defining integers

Defining some integers:

char c;
short int si; // valid
short s; // preferred
int i;
long int li; // valid
long l; // preferred
long long int lli; // valid
long long ll; // preferred

While short int, long int, and long long int are valid, the shorthand versions short, long, and long long should be preferred. In addition to being less typing, adding the int suffix makes the type harder to distinguish from variables of type int. This can lead to mistakes if the short or long modifier is inadvertently missed.

Identifying integers

Because the size of char, short, int, and long can vary depending on the compiler and/or computer architecture, it can be instructive to refer to integers by their size rather than name. We often refer to integers by the number of bits a variable of that type is allocated (e.g. “32-bit integer” instead of “long”).

Integer ranges and sign

As you learned in the last section, a variable with n bits can store 2^n different values. But which specific values? We call the set of specific values that a data type can hold its range. The range of an integer variable is determined by two factors: its size (in bits), and its sign, which can be “signed” or “unsigned”.

A signed integer is a variable that can hold both negative and positive numbers. To explicitly declare a variable as signed, you can use the signed keyword: