Home Page

About Page

Photo Page

What's New Page

Contact Page

Favorite Links

MCS
Compilar Construction

Translator: - Translates source program into target program.

Source program: understandable by user/programmer. It may be in any high level language. e.g. Cobol, Pascal etc. it is made up of different constructs.

Constructs: operators, symbol, punctuation, reserved words, integers, char, identifiers.


Target program: machine language (binary), object program and executable program.

Languages: there are four types of languages

1. Low-level language: machine language, binary form, no translator required, not portable to the machines.
2. High-level language: easy to understand, portable machine, translator required, more time consumption because of translation process.
3. Assembler: intermediate language. source program must be in assembly language.
4. Problem oriented language


Types of translators: -

Compiler
Interpreter
Assembler

Compiler: -

Translates source program into target program as a whole.

Source program ------- compiler ------- object program ----------execution result


data

Interpreter: -

Translates source program into target line by line and execute each line after translation.

Source program --------- interpreter ------ result

data


Assembler: -

Translates only assembly language into target program by using assembler.


Source program --------assembler ----- result

data

Incremental language has the properties of both i.e. interpreter and compiler.


Analysis and synthesis:

A compiler is divided into two parts: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.


Analysis of the source program: -
In a compiler analysis consists of three phases:

Linear analysis: in which stream of characters making up the source program is read from left to right and grouped into tokens that are sequences of characters having a collective meaning.
Hierarchical analysis: in which tokens are grouped hierarchically into nested collections with collective meaning.
Semantic analysis: in which certain checks are performed to ensure that the components of a program fit together meaningfully.


Model / phases / structure of a compiler

Conceptually, a compiler operates in phases, each of which transforms the source program from one representation to another. There are six phases of a compiler, which are lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimizer and code generation. Each phase of the compiler interacts with the symbol table and the error handler.



Symbol Table Management:

As essential function of a compiler is to record the identifiers used in the source program and collect information about various attributes of each identifier. It is a data structure containing a record of detailed information for each construct. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from the record quickly. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table.

Symbol table operates / used three operations
1. Look up operation: whether data element is present or not
2. Insert operation: if not present insert it into symbol table
3. Delete operation: delete the data element from the table


Error detection:

Each phase can encounter errors. However, after detecting and error, a phase must some how deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected. A compiler that stops when it finds the first error is not as helpful as it could be.


Source program: it is made up of different constructs.

Constructs: operators, symbol, punctuation, reserved words, integers, char, identifiers.

Lexical analyzer: -
In a compiler, linear analysis is called lexical analysis or scanning. It separates or decomposes the incoming text i.e. source program into tokens and store them into symbol table. Blank spaces would normally be eliminated during lexical analysis and not stored in the symbol table. There are may be more than one symbol table i.e. separate table for reserve words etc. where each type of construct is completely defined.



Syntax analyzer: -
Hierarchical analysis is called syntax analysis or parsing. It involves grouping of the tokens of the source program into grammatical phrases. In this phase tokens are rearranges and determine the syntactic structure (expression, control structure, declaration, statements). Usually the grammatical phrases if the source program are represented in syntax tree. Rules and regulations / grammar is required for the construction of syntax tree.

Semantic analyzer: -
It uses the hierarchical structure determine by the syntax analysis phase to identify the operators and operands of the expressions and statements. An important component of the semantic analysis is type checking of identifiers i.e. data type is same. Operands must have same data type and each operator has operands that are permitted by the source language specification. Whole grammar / syntax is checked. What is meaning of incoming text. e.g. operands are declared.

Intermediate code generator
It should be easy to produce and easy to translate into target program. Not every compiler use this phase, some may use it in semantic analyzer phase while other may skip this phase.

The language of intermediate code depends upon compiler. It is use to simplify the target program code.

Postfix notation
Syntax tree / dags (directed acyclic graph)
Both are graphical representation.

Three address code
Stack instruction: it is unable to handle arrays, pointers etc.
Three address code:

1. Three operands
2. Temporary variables to hold the results
3. lesses operands can also be used
4. a part from assignment statement := can be only one more operator

Code optimizer
Control flow analysis

Data flow analysis

Different transformation

Code generation
Memory management
Instructions sets
Register allocation

Target program: -

Machine language form:

1. Absolute language
2. Relocatable language
3. Assembly language

Introduction To Language Theory

Communication between computer and user is possible through the language understandable by both.


Language elements:

1. Alphabetic / character class
2. Grammar (Rules)


Alphabetic / Character class: -

It is a finite set of symbols or token.

Symbol / token: - It is combination of one or more characters each of which may either be a digit, character, special symbols, key words, identifiers, constants, strings etc.

e.g. token Do ; := A


Grammar: -

It is a set of rules that when followed will produce a valid sentence a language. a grammar G consists of four objects

1. Alphabet
2. Non-terminals
3. Start symbol
4. Production rules or substitution rule

Alphabet: - non-empty, finite set of symbol called terminals symbol. This is denoted by å. Its element include small letters a,b,c…z., 0,1,2…9 or special symbols. (Constructs also included later on).

Non-terminals: -non-empty, finite set of symbols called non-terminals symbol. It is denoted by N and represented by capital letters.

Start symbol: - a distinct element of the set N. It is denoted by S.

Production rules or substitution rule: - set of rules used to derive valid strings of the language being defined by the grammar. Each of the rules is of the form

(1) A x1, x2,….xn.

Where A is a non-terminal and x1, x2, …xn is a finite string terminals and non-terminals.

(2) The string x1, x2, …xn is allowed to contain no symbol i.e. it may be empty string. The production rule is of the form A l or A å

Example: To generate the string 10101

The grammar

(1) S A
(2) A A0A
(3) A 1

S = {S} N = {S, A} å = {0,1}


S = > A
= > A 0 A
= > 1 0 A
= > 1 0 A 0 A
= > 1 0 1 0 A
= > 1 0 1 0 1

11 :- The String Is Not Valid.

String: -

It is a sequence of one or more than one characters drawn from an alphabet. In language theory of compiler “Word” and “Sentence” are used for the word “String” as synonymous (identical).

The set of all strings of length one or more consisting members of e will be designed as e+

e.g. e = { #, 4, +, * }

Where as: - the set e* is the set of e+ together with the empty string.

e* = e+ U {e} or e* = e + U {l}

e.g. e = { 0, 1} e* = { l, 0, 1 }



Length of a string: -

it is the number of characters in the string and is denoted by |S|. A string with length zero is called empty string denoted by e or l.

e.g. e={4,#,+,-} | S | = 4

| + | = 1

Language: -

It is a set of strings over some fixed alphabet and is denoted by L.

e.g. let the e consists of two symbols 0 and 1.

e = { 0, 1} and the language L consists of the following string

L = { l, 0, 1, 001, 100,…..}

chomsky hierarchy


Unrestricted Grammar Restricted Grammar

or type-0-grammar



context sensitive context free linear grammar

or type – 1 or type – 2 or Regular grammar

or Type – 3



left linear right linear

or left regular or right regular







Type 0 or unrestricted grammar: -
A grammar consists of four objects G = {N, e, P, S}

i.e. a non-empty set of non-terminals and set of terminals. The start symbol S is the set of production where the form is U V



Rules: -

Where U and V are arbitrary (number of char of the string is not defined) string of terminals and non-terminals.
Where the right side of the production can be consist of empty string i.e. U l


These types of grammar are called as general and every set of string can be denoted by type 0 grammar.

A language generated by this type of grammar is called type 0 language.

Context sensitive or type 1 grammar: -
In this grammar each production has the form Y X

Where X and Y both can take strings from e and N.
Y consists of at least one element from N.
|Y| < |X| is the elements of Y should be less than or equal to X.
Where the right hand side of the production can never be empty.


Type 1 grammar can also be called as type 0 grammar.

Context free grammar or type 2: -
Each production is of the form Y X, where

Y consists elements for non-terminals and X can be take string for e and N.
The right side of production can consists of empty string.

e.g. E E + T

E T

T T l F

T F

F E

F a



The syntax of programming language constructs can be described by context free grammar. If empty string l is eliminated it is known as construct sensitive grammar. Construct free grammar can also be called construct sensitive grammar.


Regular or type - 3 grammar (Linear): -
This is also called sub class of context free grammar. A grammar is called regular if it is either right regular or left regular.

Right regular: - if its production has one of the following forms:

Non-terminals are at extreme right of the production. If l is present than it should be the first production.

e.g S l

A a B

A a



Left regular: - if its production has one of the following forms:

Non-terminals are at extreme left of the production. If l is present than it should be the first production.

e.g S l

A B a

A a



It is sub class of context free grammar except the rule of 1st production.


Write a program for integer numbers?

Rules: -

It is any sequence of significant digits from 0 – 9.
Preceded by a sign i.e. + or -.
In the absence of sign it is considered as a positive number.


Grammar: -

< number>

+ / - / l

/

0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9

Write a program for identifier (Pascal)?



Grammar: -



/ /

a / b / .…./ z / A / B/ ……/ Z

/ /

0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9


My Favorite Things To Do
I like to sleep 20 hours a day only

Some of My Favorite People
.

:

fahad