Chapter – 5: Names, Bindings, Type Checking, and Scope

Introduction

• Imperative languages are abstractions of von Neumann architecture: Memory (stores both instructions and data); Processor (provides operations for modifying the contents of the memory).

• Variables characterized by attributes (properties), most important:

– Type (fundamental concept): to design, must consider scope, lifetime, type checking, initialization, and type compatibility.

Names (Identifier): string of characters used to identify some entity in a program.

Name is a fundamental attribute of variables, labels, subprograms, formal parameters, program constructs.

• Design issues for names:

– Maximum length? Are connector characters allowed?

– Are names case sensitive? Are special words reserved words or keywords?

• Length

– If too short, they cannot be connotative.

– Language examples: FORTRAN I: maximum 6; COBOL: maximum 30; FORTRAN 90 and ANSI C: maximum 31; Ada and Java: no limit, and all are significant; C++: no limit, but implementers often impose one. Ada may impose limit on length up to 200 characters.

Impose limit in order to make the symbol table not too large, simplify maintenance.

Names in most PLs have the same form: letter followed by a string of letters, digits, and underscore.

• Connectors

– Pascal, Modula-2, and FORTRAN 77 don't allow; Others do

• Case sensitivity

– Disadvantage: Detriment to readability (names that look alike are different)

Worse in C++ and Java because predefined names are mixed case (e.g. IndexOutOfBoundsException; parseInt)

– C, C++, and Java names are case sensitive

• The names in other languages are not

• Special words (void, procedure, begin, end, if, else, int, char, etc.)

– An aid to readability; used to delimit or separate statement clauses

– A keyword is a word that is special only in certain contexts, e.g., in Fortran (special words are keywords)

Real VarName (Real is a data type followed with a name, therefore Real is a keyword), declarative statement.

Real = 3.4 (Real is a variable)[Real Apple Real=1.2]

– A reserved word is a special word that cannot be used as a user-defined name.

– As a language design choice, reserved words are better than keywords because the ability to redefine keywords can be confusing (e.g. in Fortran: Integer Real Real Integer)

Variables

• A variable is an abstraction of a memory cell

• Variables can be characterized as a sextuple of attributes:

– Name, Address, Value, Type, Lifetime, Scope

Variables Attributes

• Name - not all variables have them (e.g. of not having name Explicit heap-dynamic variables).

• Explicit heap-dynamic variables: are nameless memory cells that are allocated/deallocated (from/to) heap by explicit run-time instructions, referenced by pointer or reference variable.

Ex: C++

int *intnode; //creater pointer

intnode= new int; //create heap-dynamic variable

delete intnode;//deallocate heap-dynamic variable

• Address - the memory address with which it is associated

– A variable may have different addresses at different times during execution (e.g. subprogram has a local variable that is allocated from run-time stack. When the subprogram is called, different calls may result in that variable having different addresses).

– A variable may have different addresses at different places in a program.

Ex:

– If two variable names can be used to access the same memory location, they are called aliases (suppose sum and total variables are aliases, any change to total also changes sum and vice versa).

– Aliases are created via pointers, reference variables, C and C++ unions. Union type: type that store different type values at different times during execution. (e.g. consider a table of constants for a compiler which is used to store constants found in a program. One field of each table entry is for the value of the constant. Suppose constants types are int, float, Boolean. It is convenient if the same location (table field), could store a value of any of 3 types. Then all constant values can be addressed in the same way, i.e. location type is union of 3 value types it can store).

– Aliases are harmful to readability (program readers must remember all of them).

• Type - determines the range of values of variables and the set of operations that are defined for values of that type; in the case of floating point, type also determines the precision.

• Value - the contents of the location with which the variable is associated.

• Abstract memory cell - the physical cell or collection of cells associated with a variable (float, although may occupy 4 physical bytes, we think as it occupying a single abstract memory cell.

The Concept of Binding

• The l-value of a variable is its address; the r-value of a variable is its value. To access the r-value, the l-value must be determined 1st.

• A binding is an association, such as between an attribute and an entity, or between an operation and a symbol

• Binding time is the time at which a binding takes place.

Possible Binding Times

• Language design time-- bind operator symbols to operations (* is usually bound to multiplication operation at language design time.

• Language implementation time-- bind floating point type to a representation, bound int to a range of possible values.

• Compile time -- bind a variable to a type in C or Java.

• Load time -- bind a FORTRAN 77 variable to a memory cell (or a C static variable).

• Link time-- a call to a library subprogram is bound to the subprogram code.

• Runtime -- bind a nonstatic local variable to a memory cell.

Ex: consider the C assignment statement

Type of count is bound at compile time; set of possible values of count is bound at compile design time,; meaning of operator symbol (+) is bound at compiler design time after operands have been determined; internal representation of literal 5 is bound at compiler design time; the value of count is bound at execution time with this statement.

Static and Dynamic Binding

• A binding is static if it first occurs before run time and remains unchanged throughout program execution.

• A binding is dynamic if it first occurs during execution or can change during execution of the program.

• Binding of a variable to a storage cell in a virtual memory environment is complex, because the page or segment of the address space in which the cell resides may be moved in and out of memory many times during program execution. i.e. variables are bound and unbound repeatedly.

Type Binding

• How is a type specified? When does the binding take place?

• If static, the type may be specified by either an explicit or an implicit declaration.

Explicit/Implicit Declaration

• An explicit declaration is a program statement used for declaring the types of variables

• An implicit declaration is a default mechanism for specifying types of variables (the first appearance of the variable in the program). Both declarations create static bindings to types.

• PL/1, FORTRAN, BASIC, and Perl provide implicit declarations

– Advantage: writability

– Disadvantage: reliability: because they prevent the compiler from detecting some typographical and programmers errors. (Less trouble with Perl, $, @, %).

– In Fortran, Implicit none (prevent accidentally undeclared variables)

Dynamic Type Binding

• Dynamic Type Binding (JavaScript and PHP): variable type is not specified by declaration nor spelling. But specified through an assignment statement; e.g., in JavaScript, list = [2, 4.33, 6, 8]; regardless of previous type of list. It is now 1-dimensional array of length 4. If the statement list = 17.3; is followed, list become a scalar variable.

Advantage: flexibility (generic program units: whatever type data is input will be acceptable, because variables can be bound to correct type when data assigned to them).

Disadvantages: High cost (dynamic type checking done in run-time, and must use pure interpreter which takes more times); less reliable, type error detection by the compiler is difficult. Incorrect types of R.H.S of assignments are not detected as errors; rather the type of L.H.S is changed to incorrect type. (e.g. in JavaScript, suppose i, x are storing scalar numeric values, and y storing an array. Suppose instead of writing i=x; we wrote (due to keying error), i=y; no error is detected in this statement, and i is changed to an array. But later on, the uses of i will expect it to be a scalar, and correct results will be impossible.

• Type Inferencing (ML, Miranda, and Haskell)

Rather than by assignment statement, types are determined from the context of the reference. e.g. in ML, types of most expressions can be determined without requiring the programmer to specify the types of the variables.

Ex: specifies a function takes real and produces real. The types are inferred from the type of the constant in the expression. Consider the function ML determine type of both (parameters and return value), from the operator *. Because it is an arithmetic operator, the type of parameters and function are assumed numeric. Default type of numeric in ML, is int. so it inferred the type is int.

If square were called with real value, square(2.7), it will cause an error. We could rewrite it because ML does not allow overloaded functions, this version could not coexist with earlier int version.

• Storage Bindings & Lifetime

– Allocation - getting a cell from some pool of available cells

– Deallocation - putting a cell back into the pool.

• The lifetime of a variable is the time during which it is bound to a particular memory cell.

Categories of Variables by Lifetimes

• Static--bound to memory cells before execution begins and remains bound to the same memory cell throughout execution, e.g., all FORTRAN 77 variables, C static variables

– Advantages: efficiency (direct addressing), history-sensitive subprogram support, i.e. having variables to retain values between separate executions of the subprogram (static bound).

– C&C++ allow including static specifier on a variable definition in a function, making the variables static.

– Disadvantage: lack of flexibility (no recursion); storage cannot be shared among variables (e.g. program has 2 subprograms, both require large arrays. suppose the two subprograms are never active at the same time. If the arrays are static, they cannot share the same storage for their arrays).

• Stack-dynamic--Storage bindings are created for variables when their declaration statements are elaborated (take place when execution reaches the code to which the declaration is attached), but whose types are statically bound. Elaboration occurs during run-time. Ex: variable declarations that appears at the beginning of a Java method are elaborated when the method is called; and the variables defined by those declarations are deallocated when the method complete its execution.

• Stack-dynamic variables are allocated from the run-time stack.

• Recursive subprograms require some form of dynamic local storage, so that each active copy of the recursive subprogram has its own version of local variables. These needs are met by stack-dynamic variables. Even if there is no recursion, having stack-dynamic local storage for subprograms make them share the same memory space for their locals.

• If scalar, all attributes except address are statically bound

– local variables in C subprograms and Java methods. C# are by default stack-dynamic. In Ada all non-heap variables in subprograms are stack-dynamic

• Advantage: allows recursion; conserves storage

• Disadvantages:

– Run-time overhead of allocation and deallocation

– Subprograms cannot be history sensitive

– Inefficient references (indirect addressing), slower access.

• Explicit heap-dynamic -- Allocated and deallocated by explicit directives, specified by the programmer, which take effect during execution. Heap is a collection of storage cells whose organization is highly disorganized because of the unpredictability of its use.

• An explicit heap-dynamic variable has two variables associated with it: a pointer or reference variable through which the heap-dynamic variable can access, and the heap-dynamic variable itself. Because an explicit heap-dynamic variable is bound to a type at compile, it is static binding. However, such variable is bound to storage at the time they are created, which run-time.

• e.g. dynamic objects in C++ (via new and delete), all objects in Java, are explicit heap-dynamic.

• Advantage: Explicit heap-dynamic provides for dynamic storage management: dynamic structures (linked lists, trees) that need to grow/shrink during execution, better build using pointer or reference, and explicit heap-dynamic variables.

• Disadvantage: inefficient (cost of references to the variables; complexity of storage management implementation), and unreliable (pointers).

• Implicit heap-dynamic—are names that adapt to whatever use they are asked to serve; allocation and deallocation caused by assignment statements (i.e. bound to heap storage only when they are assigned values).

• all variables in APL; all strings and arrays in Perl and JavaScript

• Advantage: flexibility (allowing highly generic code to be written)

• Disadvantages: Inefficient, because all attributes are dynamic (run-time overhead). Loss of some error detection by compiler. Have the same storage management problems as explicit heap-dynamic.

Type Checking

• Generalize the concept of operands and operators to include subprograms and assignments, i.e. we think of subprograms as operators whose operands are their parameters. The assignment symbol will be thought of as a binary operator, with its target variable and its expression being operands.

• Type checking is the activity of ensuring that the operands of an operator are of compatible types.

• A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler- generated code ( or interpreter), to a legal type

– This automatic conversion is called a coercion, (int to float).

• A type error is the application of an operator to an operand of an inappropriate type, e.g. in original version of C, if an int value was passed to a function that expected a float, a type error would occur (because C compiler did not check parameters types).

• If all bindings of variables to types are static, nearly all type checking can be static

• If type bindings are dynamic, type checking must be dynamic at run-time, is called dynamic type checking. JavaScript, PHP (dynamic binding) allows only dynamic type checking.

• It is better to detect errors at compile-time than at run-time; less costly. But static checking reduces flexibility.

• A programming language is strongly typed if type errors are always detected. This requires that the type of all operands can be determined either at compile-time or run-time.

Strong Typing

• Advantage of strong typing: allows the detection of the misuses of variables that result in type errors.

• Language examples:

– FORTRAN 95 is not: EQUIVALENCE of different types, allows a variable of one type to refer to a value of a different type. Pascal and Ada are not: variant records. C and C++ are not: parameter type checking can be avoided; unions are not type checked. Ada is, almost (UNCHECKED CONVERSION is loophole).