Showing posts with label compilers construction. Show all posts
Showing posts with label compilers construction. Show all posts

Source code Indenter-Capitalizer


Following the coursework related to the Compilers Construction discipline I attended during the Computer Engineering course, I was asked to indent and capitalize the reserved words (keywords) of a source code file. More specifically, to do this work I should use the Syntactic Analyzer built with Flex and YACC that was created in a previous coursework task.

The source code file in question is the one being analyzed by the syntactic analyzer. This way at the same time it analyses the syntactic structure of the file it also indents and capitalizes the keywords.

The following is the content of the syntactic analyzer file named sinan.yacc:

%{
   #include <stdio.h>
   #include <stdlib.h>

   int c;
%}

%token PROGRAM
%token ID
%token SEMIC
%token DOT
%token VAR
%token COLON
%token INTEGER
%token REAL
%token RCONS
%token BOOL
%token OCBRA
%token CCBRA
%token IF
%token THEN
%token ELSE
%token WHILE
%token DO
%token READ
%token OPAR
%token CPAR
%token WRITE
%token COMMA
%token STRING
%token ATRIB
%token RELOP
%token ADDOP
%token MULTOP
%token NEGOP
%token CONS
%token TRUE
%token FALSE

%%

prog : PROGRAM {printf("PROGRAM "); c = 0;}

       ID {printf("%s", yytext);}

       SEMIC {printf(";\n\n");}

       decls

       compcmd

       DOT {printf(".");}
       {
         printf("\n Syntactical Analisys done without erros!\n");

         return 0;
       }
     ;

decls : VAR {printf("VAR ");} decl_list

      ;

decl_list : decl_list decl_type
            decl_type
          ;

decl_type : id_list COLON {printf(":");} type SEMIC {printf(";\n");}
          ;

id_list : id_list COMMA {printf(", ");} ID {printf("%s", yytext);}
          ID {printf("%s", yytext);}
        ;

type : INTEGER {printf(" INTEGER");}
       REAL {printf(" REAL");}
       BOOL {printf(" BOOL");}
     ;

compcmd : OCBRA {int i; for(i = 0; i < c; i++)printf(" "); printf("{\n"); c = c + 2;} cmd_list CCBRA {printf("\n"); int i; for(i = 2; i < c; i++)printf(" "); printf("}"); c = c - 2;}
        ;

cmd_list : cmd_list SEMIC {printf(";\n\n");} cmd
           cmd
         ;

cmd : {int i; for(i = 0; i < c; i++)printf(" ");} If_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} While_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Read_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Write_cmd
      {int i; for(i = 0; i < c; i++)printf(" ");} Atrib_cmd
      compcmd
    ;

If_cmd : IF {printf("IF ");} exp THEN {printf(" THEN\n");} cmd Else_cmd
       ;

Else_cmd : ELSE {printf("\n"); int i; for(i = 0; i < c; i++)printf(" "); printf("ELSE\n"); c = c + 2;} cmd {c = c - 2;}

         ;

While_cmd : WHILE {printf("WHILE ");} exp DO {printf(" DO\n");} cmd
          ;

Read_cmd : READ {printf("READ");} OPAR {printf("(");} id_list CPAR {printf(")");}
         ;

Write_cmd : WRITE {printf("WRITE");} OPAR {printf("(");} w_list CPAR {printf(")");}
          ;

w_list : w_list COMMA {printf(", ");} w_elem
         w_elem
       ;
w_elem : exp
         STRING {printf("%s", yytext);}
       ;

Atrib_cmd : ID {printf("%s ", yytext);} ATRIB {printf("= ");} exp
          ;

exp : simple_exp
      simple_exp RELOP {printf(" %s ", yytext);} simple_exp
    ;

simple_exp : simple_exp ADDOP {printf(" %s ", yytext);} term
             term
           ;

term : term MULTOP {printf(" %s ", yytext);} fac
       fac
     ;

fac : fac NEGOP {printf(" %s", yytext);}
      CONS {printf(" %s", yytext);}
      RCONS {printf(" %s", yytext);}
      OPAR exp CPAR
      TRUE {printf("TRUE ");}
      FALSE {printf("FALSE ");}
      ID {printf("%s", yytext);}
    ;


%%

#include "lex.yy.c"

As you can see there is an integer variable named c right beneath the #include section. This variable is incremented and decremented according to the section of code being analyzed (parsed). Such variable is then used inside the for loops. According to its current value blank spaces are written on the screen so that the next token parsed is printed on the right position (indented).

Each keyword parsed is capitalized and printed on the screen through a simple printf command.

Let's run a simple test case with the following source code file named test.txt. The code is intentionally not indented and it doesn't do much. It's just for a testing purpose.

program factorial;

var n, fact, i: integer;
{
read(n);

fact = 1;

i = 1;

while i <= n do
{
fact = fact * i;

i = i + 1
};

if i >= fact then
{
i = fact;

fact = fact + 1
}
else
i = fact + 1;



if i < fact then
{
i = fact;

fact = fact + 1
};

write("The factorial of ", n, " is: ", fact)
}.

In blue are the keywords, so the output should present them in CAPITALIZED letters. The blocks of code should also be presented with indentation according to the logic specified above. I use 2 white spaces to indent the blocks of code. That's why I use c = c + 2 (to increment) and c = c - 2 (to decrement) the c variable is responsible for controlling the indentation.

To run the test case it's necessary to build the syntactic analyzer. I won't show here all the steps involved since it's already done in the paper described in the post Syntactic Analyzer built with Flex and YACC. So I'll just compile the sinan.yacc file since it's the only file that was modified to accomplish this task. The other necessary files to generate the executable file - such as the lexical analyzer ones - are included in the download package at the end of this post.

For a better understanding of the commands used in this post and to set up the development environment I recommend that you read at least section 3.3 of the paper Syntactic Analyzer built with Flex and YACC. That said, let's do the job. :-)

To run this test case, follow the steps listed bellow:

  1. Create a folder called IndentCapCode on the root directory C:\, which will contain all the files created henceforward.
  2. Open a command prompt and type: path=C:\MinGW\bin;%PATH%. I consider that the MinGW installation was done in the folder C:\MinGW. Change it accordingly. After completing this step the tools to build the syntactic analyzer will be available in the command prompt.
  3. Generate the file y.tab.c in the command prompt with following command: yacc sinan.yacc.
  4. Compile the files with GCC in the command prompt with the following command: gcc y.tab.c yyerror.c main.c -osinan -lfl.

The result file for the syntactic analyzer is sinan.exe. To use it just type sinan < test.txt. The file test.txt contains the source code to be analyzed by the syntactic analyzer.

You can see the commands listed above in action through the following screenshot:

In a next post related to compilers construction I'll show the implementation of a semantic analyzer. This implementation was another coursework task. I used the same principle shown here to indent the code and even to colorize the keywords. But this is for another day.

You can get the .ZIP package that contains the files used in this post: http://leniel.googlepages.com/CodeIndenterCapitalizerFlexYACC.zip

Note: if you want, you can use a batch file (.bat) to automate the steps listed above. A .bat file named ini.bat is included in the .ZIP package. For more information about batch files, read my previous post Programming-Constructing a Batch file.

Syntactic Analyzer built with Flex and YACC


Compilers construction paper
I and a dear brother in faith of mine called Wellington Magalhães Leite wrote a paper titled: A Syntactic Analyzer built with the CASE tools Flex and YACC

See the paper's abstract below:

The function of a syntactic analyzer in a compiler is to verify the syntactic structure of a program’s source code. It then detects, signalize and handle the syntactic errors found in the source code and at the same time servers as the framework for the front-end (user interface) of the program. So, its construction helps with the familiarization regarding the tasks included in the compiler project.

The language used in this paper does not have left recursion. The language neither presents subprograms, nor indexed variables and its only loop command is while for the sake of simplicity. The syntactic analyzer implemented uses the bottom-up parsing approach.

This paper presents the steps to the construction of a syntactic analyzer, which serves as a base element for a compiler implementation.

Keywords: syntactic analyzer, syntactical analysis, compiler construction, case tools, flex, yacc

CONTENTS
1 INTRODUCTION 7
  1.1 Objective 7
  1.2 Definition 7
      1.2.1 Grammar 7
            1.2.1.1 Grammar productions 8
            1.2.1.2 Lexical specifications 8
            1.2.1.3 Reserved words or keywords 10
            1.2.1.4 Operator types and attributes 11
            1.2.1.5 Separator types 11
2 DEVELOPMENT 13
  2.1 Lexical analysis 13
      2.1.1 Sample source code of a factorial program 13
      2.1.2 Flex 16
  2.2 Syntactical analysis 16
      2.2.1 Sample syntactic tree 16
      2.2.2 YACC 20
3 APPLICATION 21
  3.1 Constructing the file for the lexical analysis (lexan.lex) 21
  3.2 Constructing the file for the syntactic analysis (sinan.yacc) 21
  3.3 Guide to implementation 24
  3.4 Using a batch file to avoid boilerplate work 28
4 CONCLUSION 30
5 REFERENCES 31
6 ADDENDUM 32

See a screenshot of the syntactic analyzer in action:

SyntacticAnalyzerFlexYACCSyntaxError

You can get a PDF copy of the paper and the accompanying syntactical analyzer's files in a ZIP file at:

http://leniel.googlepages.com/SyntacticAnalyzerBuiltWithFlexYACC.pdf http://leniel.googlepages.com/SyntacticAnalyzerBuiltWithFlexYACC.zip