4.1 Data Types Data, Types and Operations Usually Built-Into Language

Lecture week 6

4.1 Data Types – data, types and operations usually built-into language

1) investigate properties that define data objects

2) typical types in most language (and Java)

a) elementary = atomic; manipulated as a unit

b) structured = aggregate of other types

4.1.1

Data Object: run time grouping of one or more pieces of data in a virtual computer

a) programmer defined: variables, constants, arrays and files

b) system defined: run-time stack, activation records, file buffers, etc.

Data Object:

- represents a container for data value(s)

- has 6 attributes

1) lifetime

2) type (associated with set of data values)

3) location (in memory; not modifiable by programmer; may be changed by storage mgmt)

4) value (usually result of an assignment op)

5) name (set up in declarations; may be modified by subroutine calls/returns)

6) component (belongs to another object? Often represented by ptr – modified by ptr change)

variables and constants

variable

- binds to name and type at compile time

- simple variable: elementary data object with a name

- may be case sensitive/insensitive

- binds to location at run time (unless Fortran)

- binds value at run time, so value may be changed (usually by assignment stmts)

constant

- binds to name, type and value at compile time

- binds to location at run time (may not be changed, SUPPORTS RELIABILITY PRINCIPLE)

Example in ADA: CurrentSize: integer := 0;

MaxSize: CONSTANT integer := 500;

Example in C++: int currentSize = 0;

const int maxSize = 500;

Example in Java: class variable: static int currentSize = 0;

instance variable: int currentSize = 0;

constant: public static final int maxSize = 500;

Constant literal

- name is written representation of value

- “21” is representation of constant which is data object with value of 21

Persistent Data

Lifetimes of vars determined by execution time; data may have longer lifetime.

Persistent language could declare vars whose lifetime extends beyond prog execution.

Ex: airline reservation system

Persistent languages are an active research area.

Design Issues for variables

1) permit aliasing? And how? A=attributes of variable V=value of variable R=reference

NOTE: Java is in wrong place in diagram.

2) dereferencing (take reference and return its value)

X := .X + 1 (BLISS)

Context (Pascal and Ada and Java)

*Xaddr &X and context in C++ and Java

*Xaddr (value of variable located at address that follows so X is a pointer)

&X (returns address (X integer or any built in type)

Design issues for Constants (4)

1) use only simple types for constants or can you have constant arrays and records?

2) expressions used to assign constant a value

3) when is value assigned? @compile time or @block entrance

4) any predefined constants

Pascal example:

1) only simple types

2) no expressions

3) @block entrance

4) false, true, maxint

OTHER: constants must be before all other declarations

Ada example:

1) array and record constants as well

2) expressions and functions may be used to assign constant a value

3) run – time

4) yes, maxint

OTHER: just use word CONSTANT in declaration

X: constant INTEGER := 17

Y: INTEGER := 17

C++/ Java example:

1) array and record constants as well 1) simple types only

2) yes 2) yes

3) run time 3) compile, if possible otw runtime

4) yes, in libaries (like math.h for pi, etc.) 4) yes

4.1.2 Data Types

Data Types – class of data objects together with a set of operations for creating them and manipulating them

Every language

– set of primitive data types

– may provide ability to produce programmer-defined data types

LOOK AT SPECIFICATION/IMPLEMENTATION/SYNTAX ISSUES

4.1.3 Specification of Elementary Types

- elementary data object contains a single data value

- class of such data objects with operations is an elementary data type

- most languages: integer, real, character, boolean, enumerated and pointer

1) attributes: basic attributes such as type and name usually invariant during lifetime

some may be stored in a descriptor (or dope vector)

ex. Array: #dimensions, index range, component type

2) values

- type determines set of possible values

- and in elementary data types they are ordered

- ex: C has int, short, long, char

3) operations

- determine how data objects of a given type can be manipulated

- concerned with primitive operations, but can be programmer-defined

- ideally, op is a math function: given input arguments it will uniquely determine result

- domain : set of possible input arguments

- range: set of possible results

- arity of op = number of args: binary (or dyadic), unary (or monadic)

Signature of an operation (informal description /also needs implementation)

opname: arg type x arg type x …arg type à result type (= C function prototype)

Ex. + : integer x integer à integer

==: integer x integer à Boolean

SQRT: real à real

4 main factors for obscured specifications

1) ops undefined for certain inputs (ex. Underflow errors; SQRT of negative numbers)

2) implicit arguments (ex. Use of globals); makes it hard to determine all data that may affect result

3) side effects (ex. Modify input args as well as return)

4) self-modification (ex. Random generator - of data and of code; LISP may modify code)

Subtypes

Inherit ops from parent types

Forerunner of OOP inheritance

4.1.4 Implementation of Elementary Data Types

1) storage representation

- use underlying hardware storage rep

- software simulated (less efficient)

- attributes may be determined by compiler for efficiency – not stored in runtime rep (as in C)

- attributes may be stored in descriptor for flexibility (PROLOG, LISP)

- rep is usually independent of location in memory

- usually described in terms of size & layout of atts & data values within block of memory

- address of 1st word or byte represents location

2) operation implementations

- directly use built-in hardware op

- subprogram (ex. sqrt)

- in-line code sequence (like macros; ex: abs(x) = if x < 0 then –x else x)

4.1.5 Declarations

Program statement communicates to compiler about name and type of data objects .

- may also indicate lifetime (how? By placement in program)

ADA: a, b: integer; explicit declaration

x: float; bind names and type

Ada assigns default values, if not init (0’s, blanks, etc.)

C++/Java: int a, b; explicit declaration

float x; bind names and type

C++ doesn’t assign default values, if not init (uses leftover)

Java assigns default values

FORTRAN has implicit declarations (vars starting w/ I-N) are integer.

PERL: assigning a value to a variable declares it

Ex: $abc = ‘a string’; $abc = 7

Declarations of ops

- compiler needs signature

- no explicit declaration needed for primitives

- programmer-defined ops need “prototype” or “heading”

ADA: function f (p1 in: integer) return float is

Signature: f: integer à float

C++/Java: float f (int p1);

Signature: f: integer à float

Purposes for Declarations

1. choice of storage reps by translator

2. storage management (dynamic/static)

3. polymorphic ops (overloaded ops resolved; Smalltalk has no type declarations; must decide which + to use each time it is encountered)

4. type checking (allow static checking)

4.1.6 Type checking and Type conversion

Bit sequence in hw: Is it an integer or float or char? Or instruction? Hw can’t check type

Type Issues (2)

1) Type checking

- each op checked to receive proper number of args and types

- static vs. dynamic type checking

- Advantages of static type checking

- Polymorphic op replaced by type-specific

- All paths checked

- Advantages of dynamic type checking (Lisp, Prolog)

- Flexibility in program design (type of variable can change during runtime)

- Disadvantages of dynamic type checking

- Difficult to debug

- Extra storage for keeping type information during execution

- Must be in software / slow

- Concern for static type checking affects many aspects of lang: declarations, data control structures, provisions for separate compilation of subroutines, etc. Usually some things cannot be checked statically. Two ways to handle:

- Dynamic type checking

- Leaving ops unchecked

Strong typing – detect all type errors statically; few languages are truly strongly typed

2) Type inference

- ML infers data types if interpretation is unambiguous: p166, 4th ed

3) Type conversion or coercion

- mismatch of types

- flag as error

- coercion

- explict conversion

- Pascal i := round (x);

- C++ /Java: r = (float) (i);

- implicit conversion

- Pascal r := 2/3; 2 implicitly converted and does real division

- C++ r = 2/3; 2 and 3 not implicitly converted and does integer division

- two types of coercions

- no info lost/widening short int à long int

- narrowing 1.0 à 1 OK, IN THIS CASE

- ADA: almost no coercions provided àavoid tedious detail

- C++ : coercions are the rule à mask programming errors

Assignment

Basic op for changing the binding of a value to a data object

- this is a side effect

- In Pascal, assignment does not return a value: assignment (:=): integer1 x integerx à void

- In C and Lisp, it returns a value – a data object holding a copy of the value assigned: assignment (=): integer1 x integer2 à integer3

- Define r-value as righr hand side of expression (= value contained in data object)

- Define l-value as left hand side of expression (= location of data object)

- Assignment op is then:

1. Compute l-value of 1st operand expression

2. Compute r-value of 2nd operand expression

3. Assign computed r-value to computed l-value object

4. Return computed r-value as the result of the operation

- C contains a set of unary operators for manipulating l-values and r-values

Many “useful, and strange, manipulations of such assignments”

Dual use of assignment op, as mech to change data object’s value and as a function that returns a value is heavily exploited by C

- An r-value may be the l-value of some other data object (ex: A=B, where A & B are ptrs; seeFig 5.2)

Semantics of Assignment

A ß 2 + 3 Does this mean assign ‘5’ to A, or assign the operation ‘2+3’ to A? This is unambiguous if language has static types – A is either type int or op. But it can be ambiguous if lang is dynamically typed. In Prolog, ‘is’ means assign the value, while ‘=’ means assign the pattern. So the following clause 1 succeeds, because X is first assigned the value 5 & its type is set to int. The second clause fails because the two parts are different.

1. X is 2 + 3, X = 5.

2. X = 2 + 3, X = 5.

4.2 Elementary Data Types

4.2.1 Numeric

Integers

Specification

- set of integer values ordered subset within finite bounds (MININT..MAXINT)

- ops: Binary ops: integer1 x integer2 à integer3

ADA: + , -, *, / mod rem **

C++/Java: + , -, *, / , %, pow (library)

Unary ops: integer à integer

ADA: -, + sign abs not

C++/Java: -, + sign abs(math.h), !, ++, --

relops: integer x integer à boolean

ADA: /= , =, <=, >=, <, > in not in

C++/Java: != , =, <=, >=, <, >

Implementation

- hardware-defined storage rep

- hardware-arithmetic/relational primitive ops

Example:

ADA: a: integer;

type Tresult is digits 7;

x : Tresult; number of digits can be specified in ADA

C++, Java: int a;

Integer issues

1) machine implementation reliance ADV: speed DISADV: portability

2) predefined MaxInt and MinInt: ADA: Yes C++: Yes, INT_MAX and INT_MIN (include limits.h)

Java: yes, MAX_VALUE, MIN_VALUE, POSITIVE_INFINTY,

NEGATIVE_INFINITY

3) See p 174 for examples of integer storage reps:

a. Has no descriptor; ok where lang provides declarations & static type checking for ints

b. Stores descriptor in separate mem loc, w/ptr to full word int value; often used in LISP

Disadv: May double the storage required for a single int

Adv: Value is stored using built-in hw rep, so hw arith can be used

c. Stores descriptor & val in single mem loc – shortens size of val to provide desc space

Adv: Conserves space

Disadv: Before doing hw arith, must clear desc, do arith, then put desc back – so arith is inefficient. This method is only practical for hw-implemented type descriptors (not common)

4) Subranges (A : 1..10 in Pascal or A : integer range 1..10 in Ada; in C programmer decides size of storage: char, short, int, long).

Subranges have 2 important effects on imps:

1. Smaller storage requirement. But subrange vals are often stored as smallest num of bits for which hw implements arith ops (usually 8 or 16 bits).

2. Better type checking.

Ex: Month: 1..12 then Month := 0 is invalid and can be caught at compile time

But some checks must still be done at run time, for ex: Month := Month + 1. In this case, bounds must still be available at run time.

Floating Point numbers

Specification

- set of ordered real values

- ops: same as with integers, except some boolean ops may be restricted due to roundoff errors (== test)

- other ops cos : real à real

max : real à real

Implementation

- hardware-defined storage rep

- hardware-arithmetic/relational primitive ops (some may be software-simulated)

Example:

ADA: x : float;

type Tresult is float range -1.0E12..1.0E12; range can be specified in ADA

answer : Tresult;

C++, Java: float x;

double y;

Fixed point real numbers (example: dollar amounts)

Specification: avoids round off errors

Implementation: Hardware or software simulated

Stored as ints, with decimal point an attribute of the data object

Ex: X = 103.421; r-val is 103421, and X has an att Scale Factor of 3. So:

value(X) = rvalue(X) * 10-SF

Must keep track of scale factors in arith. Ex: Product = rval(X) * rval(Y); SF = SF(X) + SF(Y)

Example:

COBOL: X PICTURE 999V99

ADA: float point type Tresult is float range -1.0E12..1.0E12;

Fixed point type TsmallAmt is delta 0.01 range –100.00..100.00;