Syllabus
CSCI 5220
Program Translation and Compiling
Sections 001 and 601
Spring 2016

Class meeting 1:00–1:50 MWF, Brewster B-205
Instructor Karl Abrahamson
Office Sci&Tech C-113
Office hours
MWThF 2:00–3:00pm
Th 10:00–10:50pm
or by appointment
Phone 328-9689
Email abrahamsonk@ecu.edu
Course web page www.cs.ecu.edu/~karl/5220/spr16/
My web page www.cs.ecu.edu/~karl/
Text Compilers: Principles, Techniques & Tools (Second Edition) by A. V. Aho, M. S. Lam, R. Sethi and J. D. Ullman
Notes Notes for CSCI 5220

Contents

  1. Prerequisites
  2. Introduction
  3. Course objectives
  4. Topics
  5. Attendance policy
  6. Incompletes
  7. Grading
  8. Distance education
  9. Lecture schedule and reading assignments
  10. Additional information

Prerequisites

You should have a good understanding of programming and be able to write well structured and well organized programs in C, C++, Java or a similar programming language. We will write programs in C.


Introduction

A compiler translates a program from one language to another or from one form to another. Some compilers generate machine language, some generate assembly language, some generate more portable code such as C code, some create abstract machine code, and some just generate data structures that are used by other parts of a program.

The first compilers were written in the 1950s. Compiler developers had a poor understanding of what they were doing, and did everything in an ad-hoc way. The popular wisdom was that it took 30 person-years to write a compiler, even for a very simple programming language such as early versions of Fortran.

(Considering the state of the art at the time, the compiler developers did a remarkably good job. Their code optimizers produced excellent machine code. That was really important because if the compilers produced poor machine code, programmers would have been much slower to adopt Fortran.)

There is probably no aspect of software development that has benefited from developments of the last 50 years more than the construction of compilers. Developments in the theory of computing have led directly to very powerful tools that not only make compiler writing much easier than it once was, but are also useful for other kinds of software. It is now feasible to write a compiler for a small programming language as a course project.

It is useful for a computer scientist to study compiler design for several reasons.

  1. Anyone who does any software development needs to use a compiler. It is a good idea to understand what is going on inside the tools that you use, in order to have a better understanding of their capabilities and their limitations.

  2. Compilers are sophisticated text processors. Most programs need to do some text processing, even if only to read in the contents of a configuration file. Techniques that were developed for writing compilers are useful in a variety of other software.

  3. A useful software design technique for large projects is to develop a special-purpose language that makes the project easy to implement. It can take less time and effort, and lead to a higher quality product, to spend the time to develop and implement a small special purpose language and to write the software in that language than to write the software in a general purpose language.

    One example is the language Erlang, which was designed for writing software to control data switches. Erikson created a data switch all of whose software was written in Erlang. They claim downtime due to software bugs per switch in the range of milliseconds per year.

    There has been growth in the development of domain-specific languages. Studying compilers enables you to design and implement your own domain-specific language.

  4. Compilers benefit tremendously from careful analysis of a problem, and from tools for performing that analysis. A study of compiler design gives a good feeling for how a large problem can be broken down and solved in a manner that is not ad-hoc.

  5. Compiler design makes use of formal methods that are rarely seen elsewhere except when quite difficult formal methods are used for general purpose software design. The study of compilers provides a gentle introduction to formal methods.

  6. A course in compilers offers a good opportunity to get experience with development of a larger piece of software.


Course objectives

After completing this course, the student should understand the theory and practice of compiler design, should be able to describe the components of a compiler and how they relate to one another, and should be able to apply that knowledge to implement a complete compiler for a small programming language.


Topics

Topics will be as follows.

  1. Overview of compilers. The compilation process and the anatomy of a compiler.

  2. Lexical analysis. The role of the lexical analyzer. Finite state machines. Regular expressions. Lexical analysis tools and lexer generators.

  3. Context-free grammars. Writing grammars for programming languages.

  4. Parsing. Top-down predictive parsing and LL(1) grammars. Table-driven parsers. Transformations on grammars. Bottom-up parsing, SLR(1), LR(1) and LALR(1) parsers. Parser generators.

  5. Syntax-directed translation. Attribute grammars. Construction of abstract syntax trees. Translation based on abstract syntax trees.

  6. Table management. The symbol table.

  7. Direct evaluation of syntax trees by an interpreter.

  8. Additional topics might include: Target languages. Intermediate code and abstract machines. Semantic analysis. Code optimization.


Attendance policy

I will not take attendance. It is up to you to attend class. You are responsible for announcements and assignments given in class. If you miss a class, it is up to you to obtain notes and any other information that was provided in the class.


Incompletes

No incompletes will be issued in this course except for extraordinary circumstances, and even then only if you are nearly done already, and have done work of acceptable quality that it is realistic that you can pass the course.


Grading

Grading will be based on four midterm exams (9% each), a final exam (25%), and a programming project (39%). There will be additional exercises, but they will not be graded. The exams will be on the following dates.

  1. Wed. February 2
  2. Fri. February 19
  3. Fri. March 4
  4. Wed. March 30

Distance education

This course is taught both face-to-face and online. We will be in a classroom that have video capabilities, so that you can see the lectures. Lectures will be posted on mediasite. Log in at http://mediasite.ecu.edu/ms/login. Alternatively, try https://mediasite.ecu.edu/MS/Catalog/catalogs/GC.

I will put up material on a screen. All of the material is available to you as the class notes.

Email is the best way to ask questions outside of class. I will try to reply as quickly as possible. I don't do well with telephones or skype or any of those things. There is something about them that does bad things to my psyche. I have always been this way. Some people are afraid of elevators. I am bothered by telephones.

This is my first time teaching online. I expect to learn it as I go. I would appreciate comments and recommendations from you as we go about how to improve it.

Exam Information for Online Students

If you are taking the course online, you are allowed to come to class and take the exam in person. If you choose not to do that, you must have a proctor for all exams. You must use the University of North Carolina Proctoring Network. See http://online.northcarolina.edu/exams/overview.htm.

Lecture schedule and reading assignments

The following schedule lists brief topics along with sections from the book that cover them. You should read the relevant sections of the book before the class.

This outline is tentative and will probably need to be adjusted.

Date Topics Reading
M. 1/11 Introduction. Compilers. Structure of compilers.
Domain-specific languages.
Project: SFL.
Introduction
Dragon book, section 1.2.
W. 1/13 Programming in C.
Basics. Structures.
Review of C
Notes on C
Notes on C: structures
F. 1/15 C: Memory management and pointers
Arrays.
Assignment 1 (string table) assigned.
Review of C
Notes on C: memory
Notes on C: arrays
M. 1/18 Holiday  
W. 1/20 Lexical issues. Lexemes and tokens.
Examples of tokens. Token attributes.
Lexical rules. Lexical analyzer.
Tokens as integers.
Lexical issues
Dragon book, section 3.1.
F. 1/22 Describing sets of strings using regular expressions.
Examples of regular expressions.
Regular expressions
Dragon book, section 3.3.
M. 1/25 Flex. Flex regular expressions.
Handling non-token characters.
Assignment 1 (string table) due.
Assignment 2 (lexer) assigned.
Flex
Dragon book, section 3.5.
W. 1/27 Finite state machines.
Building your own lexer.
Implementing finite state machines.
Finite state machines
Dragon book, section 3.4.
F. 1/29 SFL. Syntax. Lists. Functional programs.
Examples of programs in SFL.
SFL
M. 2/1 SFL: Actions. Actions are values.
Handling actions in an interpreter.
Anatomy of SFL compiler/interpreter.
SFL
SFL implementation
W. 2/3 Exam 1.
(Lexical issues. Regular expressions.
Flex.)
F. 2/5 Expression trees. Abstract syntax trees.
Expression notation for abstract syntax trees.
Trees as data structures in C.
Operations on trees.
Abstract syntax trees
Dragon book, section 2.5.1.
M. 2/8 Interpreter. Simplifier for ASTs.
Examples of simplifications.
Assignment 2 (lexer) due.
Assignment 3 (abstract syntax tree) assigned.
Abstract syntax trees
W. 2/10 Handling functions and application in ASTs.
Beta-reduction rule. Examples.
Abstract syntax trees
Abstract syntax trees for SFL
SFL Interpreter
F. 2/12 Handling actions in the interpreter.
M. 2/15 Syntax. Context-free grammars.
Nonterminals. Start nonterminal.
Productions. Derivations.
Examples of context-free grammars.
Erasing productions
W. 2/17 Parse trees.
Ambiguity.
Left recursion and right recursion.
Indicating precedence in a grammar of expressions.
F. 2/19 Exam 2.
(Abstract syntax trees.
Transformations on ASTs. The interpreter.)
Assignment 3 (abstract syntax trees) due.
Assignment 4 (symbol table) assigned.
M. 2/22 The parsing problem.
Top-down approach. Bottom-up approach.
W. 2/24 Using the lookahead token to make decisions.
Left-recursion cannot be parsed top-down.
Left-factoring.
F. 2/26 Top-down parsing tables.
FIRST and FOLLOW sets.
Constructing parsing tables.
LL(1) grammars.
M. 2/29 Recursive descent.
Meaning of parsing functions.
Assignment 4 (symbol table) due.
Assignment 5 (interpreter) assigned.
W. 3/2 Bottom-up parsers.
Example of a bottom-up parse.
Shift-reduce parsers.
F. 3/4 Exam 3.
(Context-free grammars. Ambiguity.
Top-down parsing.
Building top-down parsing tables.
FIRST and FOLLOW sets.)
M. 3/7 to F. 3/11 Fall break  
M. 3/14 Shift-reduce parsing tables.
LR(0) items.
Constructing an LR(0) parsing table.
Closures.
W. 3/16 SLR(1) parsing tables.
Constructing an SLR(1) parsing table
F. 3/18 More on SLR(1) parsing tables
M. 3/21 Parsing conflicts.
Eliminating parsing conflicts.
W. 3/23 Using ambiguous grammars.
Using precedence to eliminate shift-reduce conflicts.
F. 3/25 Holiday  
M. 3/28 Bison. Writing a context-free grammar in Bison.
Disambiguating rules.
Assignment 5 (interpreter) due.
Assignment 6 (parser) assigned.
W. 3/30 Exam 4.
(Shift-reduce parsing.
Building SLR(1) parsing tables.
Parsing conflicts.)
F. 4/1 LR(1) parsers. LR(1) items.
Building an LR(1) parsing table.
M. 4/4 LALR(1) parsers.
Attribute equations.
W. 4/6 Synthesized attributes.
Inherited attributes.
Attributes in a recursive-descent parser.
F. 4/8 Attributes in a shift-reduce parser.
Assignment 6 (parser) due.
Assignment 7 (semantic parser) assigned.
M. 4/11 Garbage collection.
Implementing a garbage collector.
W. 4/13 More on garbage collection.
Compactifying.
F. 4/15 Code generators.
Intermediate code. Back end.
Syntax-directed translation.
M. 4/18 Examples of syntax directed translation for code generation.
Assignment 7 (semantic parser) due.
Assignment 8 (garbage collector) assigned.
W. 4/20 Optimizations.
Peephole optimization.
F. 4/22 Control-flow analysis
M. 4/25 Dataflow analysis.
T. 4/26 Today is treated as a Friday.
More dataflow analysis.
W. 4/27 Reading day.
Assignment 8 (garbage collector) due.
 
W. 5/4 Final exam, 11:00–1:30am.  

Additional information

For information about

please see the auxiliary information at http://www.cs.ecu.edu/~karl/5220/spr16/syllabus-aux.html.