Information Fundamentals

Chapter II

Information fundamentals

1-Quotes about information

Information is power. But it is what you do with it that either makes you great or diminishes you.
President Thomas Jefferson, 3rd president of the US and author of the Declaration of Independence (1776): Information is the currency of democracy. We are not afraid to follow truth wherever it may lead, nor to tolerate any error so long as reason is left free to combat it.
Inventor Thomas Edison: We do not know one millionth of one percent about anything.
20th century famous columnist Franklin Pierce Adams FPA: I find that the great part of the information I have was acquired by looking up something and finding something else on the way.
If you don’t know what you’re looking at, how do you find out what it is? If you can’t find out what it is, how do you know what you’re looking at? Tweedledum & Tweedledee, Alice in Wonderland

2-Definition of information: Information can be defined in 2 ways:

Knowledge in its different forms that may be acquired in different ways:
Knowledge acquired as a result of research, conventions and seminars, study, experience, or instruction.
Knowledge of specific events or situations that has been gathered or received by investigation, communication, intelligence.
News collected from media:
Newspapers, magazines, brochures and books
Radio stations and TV channels
Internet sites
Other types of media.
Computer science: processed, stored, or transmitted data.
A numerical measure of the uncertainty of an experimental outcome
Statistical information: A collection of facts or data
The act of informing: Because business strives to have knowledgeable employees in order to get the highest quality performance, employees should get the knowledge they need for that purpose.
The condition of being informed: Business executives get the knowledge they need for their jobs and are able to communicate that knowledge wherever it may be useful.

3-Quality of information: In order to be useful, information should have several qualities and characteristics:

Timely: It is very important that a business manager has the information available on time, before making a decision at any level of the decision making process. Late information is never useful information.
Accurate: Information must be updated. Accuracy is most needed in business decision making. No one will be able to trust the integrity of information whose accuracy is debatable.
Complete: A fragment of information will not be enough to make sound decisions and may result in making the business environment cloudy and misleading.
Well based: Business information must be referenced and supported by facts, experiments, research or intelligence and expertise where it is possible to get the expertise from renowned business experts.
Related to the business activity: What would we use information for if it is not related to the activity of our business?
Kept secret: Business possession of some information will certainly misfire if everybody in the market knows that a business has it.
Easy to verify, process and communicate:
Prior to being used, any information must be verified, because there is a lot of misinformation in the business environment, exactly like on an army’s battlefield. Business shall not allow the use of raw information because that may expose it to unknown risks.
The business information system must be able to process this information and fit it into the decision support system where that is useful and/or necessary.
Information exchange and communication are tightly related to coding and representation. In this context we can classify information under three categories:
The digital information form
The coded information form
The conventional information form.

4-Digital information:

Digits mean numerals that were invented by ancient civilizations and numeration systems that use them to represent digital information
Digits were developed to be used for computation and arithmetic purposes but they

5-Important numeration systems:

The Mayan numerals: (Figure 2-1)
Uses 2 symbols: the dot (•) for 1 and the bar (▬) for 5.
5 dots make a bar.
The base of the Mayan numeration system is 20
Vertically leveled system with the 1 = 20^0 at the bottom.
The levels are: 1 (20^0), 20 (20^1), 400 (20^2), 8000 (20^3), 160000 (20^4), etc…
2013 is written as 5 x 20^2 + 0 x 20^1 + 13 x 20^0, which corresponds in Mayan to one bar in the 3rd layer (5 x 400 = 2000), nothing counted in the 2nd layer, and a stack of 2 bars topped by 3 dots in the 1st layer (5 + 5 + 3 = 13). The sum is 2013.

Figure 2-1: Simplified table of Maya numeration system (base 20)

Layer / Power / Number / 2013 in Mayan / Conversion to decimal / 2013
5 / 20^4 / 160000
4 / 20^3 / 8000
3 / 20^2 / 400 / 1 bar / 5 x 400 / 2000
2 / 20^1 / 20
1 / 20^0 / 1 / 2 bars topped by 3 dots / (5+5+3) x 1 / 13
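The layer breakdown in the table can be reproduced with repeated division by 20. The following is a minimal sketch, not a historical algorithm; the function names `to_base20_layers` and `layer_to_bars_and_dots` are hypothetical and chosen only for this illustration.

```python
def to_base20_layers(n):
    """Return the base-20 layer values of n, bottom layer (20^0) first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    layers = []
    while True:
        n, remainder = divmod(n, 20)
        layers.append(remainder)
        if n == 0:
            break
    return layers


def layer_to_bars_and_dots(value):
    """Split one layer value (0..19) into (bars, dots): a bar counts 5, a dot counts 1."""
    return divmod(value, 5)


layers = to_base20_layers(2013)
print(layers)  # [13, 0, 5]
for level, value in enumerate(layers, start=1):
    bars, dots = layer_to_bars_and_dots(value)
    print(f"layer {level}: value {value} -> {bars} bar(s), {dots} dot(s)")
```

For 2013 the sketch gives 13 in layer 1 (2 bars and 3 dots), 0 in layer 2 and 5 in layer 3 (1 bar), matching the table.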

The Roman numerals: Figure (2-2)
The Roman system is decimal (Base 10)
Unlike the other systems described in this chapter, it doesn’t use dedicated numeral digits.
Instead of numeral digits, the Roman numeration system borrowed capital letters from the Latin alphabet: I, V, X, L, C, D and M.
The letters correspond respectively to: 1, 5, 10, 50, 100, 500 and 1000 as shown in the table of Figure (2-2) below.

Figure 2-2: Roman numeration characters

Letter / I / V / X / L / C / D / M
Decimal value / 1 / 5 / 10 / 50 / 100 / 500 / 1000
Decimal power / 1 x 10^0 / 5 x 10^0 / 10^1 / 5 x 10^1 / 10^2 / 5 x 10^2 / 10^3

Roman numeration system has a set of rules that govern the way numbers are written, read, interpreted and manipulated:
Start from the larger numbers on the left and add while going right.
If a lower number is on the left of a higher one, it is subtracted from the higher one and the result is added to the total.
Example one: 2013 is written (MMXIII)
MM for 2000 (1000 + 1000)
X for 10
III for 3 (1 + 1 + 1)
Adding the resulting numbers from left to right we will have: 2000 + 10 + 3 = 2013
Example two: 1994 is written (MCMXCIV)
M for 1000
CM for 900 (C to the left of M) (1000 – 100 = 900)
XC for 90 (X on the left of C is subtracted from C) 100 – 10 = 90
IV for 4 (I to the left of V is subtracted from V): 5 – 1 = 4
Adding the resulting numbers from left to right we will have: 1000 + 900 + 90 + 4 = 1994
As you can easily notice, the Roman system is:
Very limited and cannot handle big numbers
Not flexible for math.
This system is still used to denote small numbers or for luxury decoration of watches, clocks, jewelry and decoration items
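The two reading rules (add from left to right, subtract a lower letter that stands to the left of a higher one) can be sketched in a few lines. This is only an illustration under the rules stated above; the names `ROMAN_VALUES` and `roman_to_decimal` are hypothetical, and the sketch does not check that the input is a well-formed Roman numeral.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_decimal(numeral):
    """Read a Roman numeral from left to right using the add/subtract rules."""
    total = 0
    for i, letter in enumerate(numeral):
        value = ROMAN_VALUES[letter]
        # A lower letter placed to the left of a higher one is subtracted.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


print(roman_to_decimal("MMXIII"))   # 2013
print(roman_to_decimal("MCMXCIV"))  # 1994
```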

The Arabic numerals: (Figure 2-3)
Also called Hindu-Arabic numerals
Represent the modern decimal system used everywhere in the world today
Base 10 system
Comprises 10 numerals (digits): 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
The base, 10, is the smallest number written with 2 digits: 1 and 0.
Like the Mayan system, the Arabic system is leveled. The levels are: 1 (10^0), 10 (10^1), 100 (10^2), 1000 (10^3), etc…

Figure 2-3: The Arabic numeration system

Power / Decimal / 2013 digit / Multiply / 1994 digit / Multiply
10^9 / One billion
10^8 / 100 million
10^7 / 10 million
10^6 / 1 million
10^5 / 100 000
10^4 / 10 000
10^3 / 1000 / 2 / 2000 / 1 / 1000
10^2 / 100 / 0 / 0 / 9 / 900
10^1 / 10 / 1 / 10 / 9 / 90
10^0 / 1 / 3 / 3 / 4 / 4
Number / / / 2013 / / 1994

2013 is computed as: 2x10^3 + 0x10^2 + 1x10^1 + 3x10^0
1994 is computed as: 1x10^3 + 9x10^2 + 9x10^1 + 4x10^0
Important characteristics of Arabic decimal system:
Very flexible with Math operations
Adaptable to all numbers regardless of their size
Very friendly and easy to teach
Easily converted to all other numeration systems, and easily obtained by converting from them.

The Binary system: figure (2-3)
The modern base 2 numeration system was formalized by the German mathematician Gottfried W. Leibniz, who published his description of binary arithmetic in 1703 and who also invented a computing machine.
This is a leveled base 2 system that has only 2 digits 0 and 1
Each digit is called a bit (abbreviation of binary digit).
Like the decimal and Mayan systems the levels are: 1 (2^0), 2 (2^1), 4 (2^2), 8 (2^3), 16 (2^4), etc…
2013 is written as: 1x2^10 + 1x2^9 + 1x2^8 + 1x2^7 + 1x2^6 + 0x2^5 + 1x2^4 + 1x2^3 + 1x2^2 + 0x2^1 + 1x2^0 = 11111011101
1994 is written as: 1x2^10 + 1x2^9 + 1x2^8 + 1x2^7 + 1x2^6 + 0x2^5 + 0x2^4 + 1x2^3 + 0x2^2 + 1x2^1 + 0x2^0 = 11111001010

Figure 2-3: Binary system table

Power / Decimal / Number 1 / Multiply / Number 2 / Multiply
2^15 / 32768
2^14 / 16384
2^13 / 8192
2^12 / 4096
2^11 / 2048
2^10 / 1024 / 1 / 1024 / 1 / 1024
2^9 / 512 / 1 / 512 / 1 / 512
2^8 / 256 / 1 / 256 / 1 / 256
2^7 / 128 / 1 / 128 / 1 / 128
2^6 / 64 / 1 / 64 / 1 / 64
2^5 / 32 / 0 / 0 / 0 / 0
2^4 / 16 / 1 / 16 / 0 / 0
2^3 / 8 / 1 / 8 / 1 / 8
2^2 / 4 / 1 / 4 / 0 / 0
2^1 / 2 / 0 / 0 / 1 / 2
2^0 / 1 / 1 / 1 / 0 / 0
Number / / / 2013 / / 1994
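The binary expansions above can be checked with a short sketch. It assumes nothing beyond the positional rule described in this section; the function names `to_binary` and `from_binary` are hypothetical.

```python
def to_binary(n):
    """Convert a non-negative integer to its binary digits by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, bit = divmod(n, 2)
        bits.append(str(bit))
    # Remainders come out lowest power first, so reverse them.
    return "".join(reversed(bits))


def from_binary(bits):
    """Sum bit x 2^power over all positions, with the rightmost bit as 2^0."""
    return sum(int(b) * 2 ** power for power, b in enumerate(reversed(bits)))


print(to_binary(2013))             # 11111011101
print(to_binary(1994))             # 11111001010
print(from_binary("11111011101"))  # 2013
```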

The hexadecimal or simply HEX numeration system: (FIGURE 2-4)
This system has a base 16.
It is used in computing as a compact notation for writing binary values.
The 16 digits used are: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E and F.
The letter A represents 10, B represents 11, C represents 12, D represents 13, E represents 14 and F represents 15.
Because 16 is the base, the decimal number 16 = HEX 10
10 represents the base in any numeration system.
HEX is used in the computer industry because of its easy conversion to or from binary.
Like the other systems, it is leveled. The levels are: 1 (16^0), 16 (16^1), 256 (16^2), 4096 (16^3), etc…

Figure 2-4 HEX numeration system conversion table

Power / Decimal / Number 1 / Multiply / Number 2 / Multiply
16^5 = 2^20 / 1048576
16^4 = 2^16 / 65536
16^3 = 2^12 / 4096
16^2 = 2^8 / 256 / 7 / 1792 / 7 / 1792
16^1 = 2^4 / 16 / D (13) / 208 / C (12) / 192
16^0 = 2^0 / 1 / D (13) / 13 / A (10) / 10
Number / / / 2013 / / 1994

Because 16 = 2^4, each HEX digit corresponds to exactly 4 bits, so base 16 is easily converted into binary and vice versa (binary is easily converted into HEX).
2013 is written as: 7x16^2 + Dx16^1 + Dx16^0 = 7DD
1994 is written as: 7x16^2 + Cx16^1 + Ax16^0 = 7CA
It is worth noticing that conversion from HEX to binary can be achieved by simply substituting the 4-bit binary value of each HEX character in place of the character, which easily gives the equivalent binary number:
2013 = 7DD = 0111 1101 1101
1994 = 7CA = 0111 1100 1010
And conversion from binary to HEX can be achieved similarly by substituting each group of 4 bits with its HEX character.
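The digit-by-digit substitution between HEX and binary can be sketched as follows. The names `HEX_DIGITS`, `hex_to_binary` and `binary_to_hex` are hypothetical, and the sketch assumes the binary input is already grouped into 4-bit blocks separated by spaces.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_binary(hex_text):
    """Replace each HEX character by its 4-bit binary value."""
    return " ".join(format(HEX_DIGITS.index(ch), "04b") for ch in hex_text.upper())


def binary_to_hex(bit_groups):
    """Replace each space-separated 4-bit group by its HEX character."""
    return "".join(HEX_DIGITS[int(group, 2)] for group in bit_groups.split())


print(hex_to_binary("7DD"))             # 0111 1101 1101
print(binary_to_hex("0111 1100 1010"))  # 7CA
print(int("7DD", 16), int("7CA", 16))   # 2013 1994
```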

Data and information: (figure 2-5)

1-Digital data can be defined as the necessary ingredients used to process information.

2-It is similar to the raw material needed to process a finished product.

3-Data can be: text, sound, picture, audio, video, graphs, waves or anything that can be codified and quantified.

4-Information may also be processed into level2 or level3 or any higher level information combined with other data and information.

Figure 2-5: Data and information.

5-Levels of information: Example of a company’s payroll (figure 2-6):

It starts with the daily salary and working days of each employee as data and ends with the payroll of the whole company.
Multiplying the number of working days of each employee by his/her daily salary gives the employee’s paycheck as information.
The sum of all paychecks of employees within a department forms the department payroll.
The sum of all department payrolls in the company forms the complete company’s payroll.
This process illustrates the use of level1 information to process level2 information and the use of level2 information to process level3 and so forth. The whole process is illustrated in the chart below

Figure 2-6: levels of information.

i-Data is processed to get level1 information which, in turn, is processed in order to get level2 information which is processed to get level3 information and so forth

ii-Consequently, the input may be data or information or both and the output is always a newer level of polished or advanced information.
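The payroll example can be sketched as three processing steps. All employee names, departments and amounts below are hypothetical values invented for this illustration.

```python
# Data: daily salary and working days of each employee (hypothetical values).
employees = [
    {"name": "Employee A", "department": "Sales", "daily_salary": 100, "days_worked": 20},
    {"name": "Employee B", "department": "Sales", "daily_salary": 120, "days_worked": 22},
    {"name": "Employee C", "department": "IT", "daily_salary": 150, "days_worked": 21},
]

# Level 1 information: each employee's paycheck.
paychecks = {e["name"]: e["daily_salary"] * e["days_worked"] for e in employees}

# Level 2 information: each department's payroll, built from level 1.
department_payrolls = {}
for e in employees:
    department = e["department"]
    department_payrolls[department] = department_payrolls.get(department, 0) + paychecks[e["name"]]

# Level 3 information: the company payroll, built from level 2.
company_payroll = sum(department_payrolls.values())

print(paychecks)            # {'Employee A': 2000, 'Employee B': 2640, 'Employee C': 3150}
print(department_payrolls)  # {'Sales': 4640, 'IT': 3150}
print(company_payroll)      # 7790
```

Each level takes the output of the previous one as its input, which is the pattern described in points i and ii.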

6-Other types of information

Conventional information: Raw information saved on basic media: papers, tapes and disks of all kinds.
Coded information: uses an established coding scheme that is able to represent data and adaptable to be transmitted via one or many communication media.
Morse code was the backbone of the telegraph messaging technology that was inaugurated in 1844 with a message sent from the Supreme Court chamber in the US Capitol to Baltimore.
Encryption technology is used to ensure, enhance and maintain information security. It is based on a key that converts the plain text into cipher-text or encrypted text. To retrieve the original plain text, the recipient must possess the key, which may be:
A conversion table that assigns to each character one or many different characters.
Scrambling criteria that convert characters of the same file into scrambled file.
A sophisticated math formula or model that converts each character into one or many other characters and/or symbols.
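The first kind of key, a conversion table, can be sketched with a simple letter-for-letter substitution. The table `CIPHER` below is a hypothetical key made up for this illustration; a substitution of this kind is a teaching toy and should not be taken as secure encryption.

```python
import string

PLAIN = string.ascii_uppercase
# Hypothetical key: a conversion table assigning another letter to each letter.
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def encrypt(plain_text):
    """Convert plain text into cipher-text using the conversion table."""
    return plain_text.upper().translate(ENCRYPT_TABLE)


def decrypt(cipher_text):
    """Recover the plain text; only possible with the same table (the key)."""
    return cipher_text.translate(DECRYPT_TABLE)


secret = encrypt("INFORMATION")
print(secret)           # OFYGKDQZOGF
print(decrypt(secret))  # INFORMATION
```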

Data representation

1-All types of data are used as ingredients to make information.

2-A computer processor can only process data that is represented in bits (binary digits), each of which has only two possible values: 0 and 1.

3-We digitize data when we represent it with digits and the process is known as data digitization.

4-All types of data can be quantified and digitized

5-The most important issue was to choose a reliable universal representation system based on a set of bits that offers enough combinations to:

Cover all letters and symbols worldwide
Be reliable, flexible and easy to handle by the processor and other computer components and electronic circuitry.
Serve as the unit of quantification of data and information processed, handled, exchanged or stored.
Be compatible with computer processor, memory and storage media.

6-The use of a set of 2 bits where each one may have 2 values will offer only 4 different combinations (2^2) which are in this case: 00, 01, 10, 11

7-Early computers used a set of 4 bits, BCD (Binary Coded Decimal), which offers up to 2^4 = 16 different combinations, enough to represent the decimal and HEX digits. They also used a set of 6 bits, BCDIC (Binary Coded Decimal Interchange Code), for characters and printable graphic patterns on the punched cards and punched tapes used in Europe and the USA.
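The counting argument in points 6 and 7 can be sketched directly: n bits give 2^n combinations, and BCD stores each decimal digit in its own 4-bit group. The function names `bit_combinations` and `to_bcd` are hypothetical, and the sketch ignores the details of any particular historical machine.

```python
from itertools import product


def bit_combinations(n_bits):
    """List every combination of n_bits bits; there are 2 ** n_bits of them."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


def to_bcd(number):
    """Encode each decimal digit of a non-negative integer in its own 4-bit group."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(bit_combinations(2))       # ['00', '01', '10', '11']
print(len(bit_combinations(4)))  # 16
print(to_bcd(2013))              # 0010 0000 0001 0011
```

Note that BCD is not the same as the pure binary form of the number: 2013 in BCD takes 16 bits, while its binary form 11111011101 takes 11.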

ASCII & EBCDIC(figure 2-7)

In 1963, a 7-bit code called ASCII (American Standard Code for Information Interchange) was published as an American standard, with the involvement of the American government and IBM, to replace the older codes used in government agencies and services. This set offers 2^7 = 128 different combinations to represent data. The code was later extended to 8 bits to offer 2^8 = 256 different combinations.

1-In 1964, IBM launched its System/360 computer series using an 8-bit code that IBM labeled EBCDIC (Extended Binary Coded Decimal Interchange Code), which is an extension of the earlier 6-bit BCDIC.

2-The word “byte” was deliberately coined by Werner Buchholz to name the set of bits used to represent data. It didn’t mean 8 bits initially, but it does now because most representation systems use 8 bits.

3-The 8-bit set is called an “octet” in most European systems.

4-The 8 bit set provides 2^8 = 256 different combinations which is still not enough to represent all the scripts and characters of languages used around the world and the graphical symbols and scripts used in arts and other activities of the environment.

UNICODE

1-Because ASCII is biased toward the English language, a much richer representation system called UNICODE was advanced. In its original form this system uses 2 bytes to represent data and offers 2^16 = 65,536 different combinations.

2-EBCDIC is becoming obsolete and modern computer technology uses ASCII with the possibility to convert files from ASCII to Unicode and vice versa.

3-While C and C++ traditionally use ASCII, Java (like C#) uses Unicode for all applets and scripts, so files in websites and web servers need to be converted accordingly.

4-The conversion of ASCII files into 2-byte Unicode is easy: a zero byte is added to the left of every ASCII byte. The drawback is that the size of the file doubles.

5-The table below illustrates the representation of the decimal digits and some of the characters

ASCII / Symbol / EBCDIC
00110000 / 0 / 11110000
00110001 / 1 / 11110001
00110010 / 2 / 11110010
00110011 / 3 / 11110011
00110100 / 4 / 11110100
00110101 / 5 / 11110101
00110110 / 6 / 11110110
00110111 / 7 / 11110111
00111000 / 8 / 11111000
00111001 / 9 / 11111001
01000001 / A / 11000001
01000010 / B / 11000010
01000011 / C / 11000011
01000100 / D / 11000100
01000101 / E / 11000101
01000110 / F / 11000110
01000111 / G
01001000 / H
01001001 / I
01001010 / J
01001011 / K
01001100 / L
01001101 / M
01001110 / N
01001111 / O
01010000 / P
01010001 / Q
01010010 / R
01010011 / S
01010100 / T
01010101 / U
01010110 / V
01010111 / W
01011000 / X
01011001 / Y
01011010 / Z
01000000 / @
00100100 / $
01100001 / a
01100010 / b

Figure 2-7: Binary codes of some ASCII and EBCDIC numerals and letters
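The codes in the table, and the doubling of file size when ASCII text is stored in 2-byte Unicode, can be explored with Python's built-in codecs. This is a sketch under two assumptions: `cp037` (one common EBCDIC code page) stands in for EBCDIC in general, and big-endian UTF-16 without a byte-order mark stands in for the 2-byte Unicode form. The helper name `to_bits` is hypothetical.

```python
def to_bits(raw):
    """Render bytes as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in raw)


# ASCII and EBCDIC codes of a few symbols, in the same layout as the table.
for symbol in "09AF":
    ascii_bits = to_bits(symbol.encode("ascii"))
    ebcdic_bits = to_bits(symbol.encode("cp037"))  # cp037: assumed EBCDIC code page
    print(f"{ascii_bits} / {symbol} / {ebcdic_bits}")

# ASCII text stored as 2-byte Unicode: a zero byte appears on the left of each byte.
text = "INFO"
ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-be")
print(ascii_bytes.hex(" "))  # 49 4e 46 4f
print(utf16_bytes.hex(" "))  # 00 49 00 4e 00 46 00 4f
print(len(ascii_bytes), len(utf16_bytes))  # 4 8
```

Other Unicode encodings behave differently; in UTF-8, for instance, plain ASCII text keeps exactly the same bytes.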

Important note: the HEX digits A, B, C, D, E and F don’t have the same binary values as the letters in this table. As HEX digits they take the 4-bit values in the table below (a HEX digit is 4 bits, or one nibble of an octet (byte)).

letter / A / B / C / D / E / F
binary / 1010 / 1011 / 1100 / 1101 / 1110 / 1111

Figure 2-8: 4-bit binary values of HEX letters
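The difference between a letter read as a character and the same letter read as a HEX digit can be shown side by side. The function name `letter_codes` is hypothetical.

```python
def letter_codes(letter):
    """Return (8-bit ASCII code of the character, 4-bit value of the HEX digit)."""
    as_character = format(ord(letter), "08b")
    as_hex_digit = format(int(letter, 16), "04b")
    return as_character, as_hex_digit


for letter in "ABCDEF":
    as_character, as_hex_digit = letter_codes(letter)
    print(f"{letter}: character {as_character}, HEX digit {as_hex_digit}")
```

The first line printed is `A: character 01000001, HEX digit 1010`, showing that the same symbol has two unrelated binary values depending on how it is read.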

Binary representation of data

How can we represent all types of data?

1-Computers are expected to process all types of data of our modern global environment. Business data is not an exception because it may include all types of data which are:

Text
Pictures
Documents
Audio and sound waves
Music audio and albums
Video and movie albums
Scripts and graphs and all type of data

2-The process of converting all known types of data into binary digits is necessary for the computer to be able to process them.

3-The conversion process technology is known as: DIGITIZATION

The following paragraphs explain how the basic types of data listed above are digitized. We will start with the simplest.

1-Digitizing text: depending on the language used, a text comprises characters and scripts that have an ASCII or Unicode binary equivalent, and all the program has to do is substitute the binary value of each character.

Languages with non-Latin scripts use an intermediate ASCII-Unicode conversion step.
A pure text file has the smallest size.
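Digitizing text amounts to looking up the binary code of each character. The sketch below assumes plain ASCII text; the function names `text_to_bits` and `bits_to_text` are hypothetical.

```python
def text_to_bits(text):
    """Replace each ASCII character by its 8-bit binary code."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Rebuild the text from space-separated 8-bit codes."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = text_to_bits("Ab")
print(encoded)                # 01000001 01100010
print(bits_to_text(encoded))  # Ab
```

Each character costs exactly one byte here, which is why a pure text file stays small.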

2-Digitizing pictures: The technology involved in digitizing pictures is an old picture copying technology that consists of superposing a grid over the picture and digitizing the cells of the grid as follows:

Each cell is called a pixel, which stands for (picture element) or (picture cell).
The number of pixels of the picture is known as its “resolution”.
Each cell has coordinates.
Under each cell there is a color.
Unlike the artist’s palette composed of three basic colors Red, Yellow and Blue, the digital color palette consists of three basic colors Red, Green and Blue (RGB).
The intensity or amplitude of each basic color is stored in one byte, which means that we need 3 bytes to store one color spot.
The total possible color representation in 3 bytes (24 bits) is 2^24 = 16,777,216, about 16.8 million different possible colors.
Example: a picture of 800x600 has a resolution of 480000 pixels and needs 480000 x 3 = 1440000 bytes of storage, almost 1.5 megabytes.
That’s why picture files, or documents that contain text and pictures, are very big files that need a large storage area and fast Internet access.
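The storage arithmetic above can be sketched as a small calculation. It assumes an uncompressed picture with 3 bytes per pixel and no file header; the function name `picture_size_bytes` is hypothetical.

```python
def picture_size_bytes(columns, rows, bytes_per_pixel=3):
    """Uncompressed size: one RGB triple (3 bytes) per pixel."""
    return columns * rows * bytes_per_pixel


resolution = 800 * 600
print(resolution)                    # 480000
print(picture_size_bytes(800, 600))  # 1440000
print(2 ** 24)                       # 16777216
```

Real picture files are usually smaller than this figure because common file formats compress the pixel data.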

Figure 2-5 illustrates how we digitize pictures (picture courtesy Microsoft clip art library)

1-Resolution: the number of pixels in the grid is known as the picture resolution which is the product of the number of rows multiplied by the number of columns: Resolution = columns x rows.

The higher the resolution the higher the picture quality and the lower the resolution the lower the picture quality because, with more pixels, more color details are represented and the average color of the pixel will be much more representative.
The higher the resolution the bigger the file, and the lower the resolution the smaller the file.
Below is an example that illustrates the process:
A 1000 pixel resolution will need 3000 bytes for a one-picture file and 9000 bytes for a 3-picture file.
A 2000 pixel resolution will need 6000 bytes for a one-picture file and 18000 bytes for a 3-picture file.
High definition screens and monitors have very high resolution, resulting in the best possible picture quality, but they need a special filtering technology and high speed upload and download of the huge files involved.

2-Digitization process: To digitize a pixel we need:

To determine its location in the picture by its coordinates: For example, r50, c80 corresponds to the pixel located at the intersection of row 50 and column 80.
To identify its average color in RGB. We take the average color because the pixel is extended over a relatively large area where colors are not the same all over the area.
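The second step, taking the average color of the area covered by one pixel, can be sketched as follows. The color samples are hypothetical values invented for this illustration, and the function name `average_color` is likewise hypothetical; real scanners and cameras perform this averaging in hardware.

```python
def average_color(samples):
    """Average a list of (R, G, B) samples, each component in 0..255."""
    count = len(samples)
    return tuple(round(sum(sample[i] for sample in samples) / count) for i in range(3))


# Hypothetical color readings taken at four points inside the cell at r50, c80.
cell_samples = [(255, 0, 0), (245, 10, 0), (250, 5, 10), (250, 5, 10)]

pixel = {"row": 50, "column": 80, "rgb": average_color(cell_samples)}
print(pixel)  # {'row': 50, 'column': 80, 'rgb': (250, 5, 5)}
```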

3-Digitizing sounds: The sound waves are continuous waves generated by:

The change of air pressure
The magnetic vibration of loudspeakers
The direct pressure variation caused by
keys of a musical instrument
human vocal cords
These waves are digitized using the sampling technology
A sound wave is continuous in time (horizontal axis of the graph) and variable in amplitude or voltage intensity (vertical axis of the graph).
Digitizing the wave requires dividing the horizontal axis into many equal parcels of several microseconds each, and dividing the vertical axis into 256 parcels of amplitude if we store the amplitude of each parcel in one byte, or 65536 parcels if the technology allows storing each one in 2 bytes.
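Sampling can be sketched on a pure sine wave. The tone frequency (1000 Hz) and the sampling rate (8000 samples per second, that is one sample every 125 microseconds) are assumed values chosen only to keep the output short, and the function name `sample_and_quantize` is hypothetical.

```python
import math


def sample_and_quantize(frequency_hz, sample_rate_hz, n_samples, levels=256):
    """Sample a sine wave at equal time steps and map each amplitude to a level number."""
    samples = []
    for k in range(n_samples):
        t = k / sample_rate_hz  # time of the k-th parcel on the horizontal axis
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # between -1 and 1
        # Map the range -1..1 onto the level numbers 0..levels-1.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


one_byte = sample_and_quantize(1000, 8000, 8)              # 256 levels: 1 byte per sample
two_bytes = sample_and_quantize(1000, 8000, 8, levels=65536)  # 65536 levels: 2 bytes per sample
print(one_byte)
print(two_bytes)
```

With 256 levels every sample fits in one byte; with 65536 levels each sample needs 2 bytes but follows the original wave more closely.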

Amplitudeor