thibaut
/
m2-pas-dm


			
				
					
						
						
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397
							\documentclass[11pt]{article}
\usepackage{fullpage}
\usepackage[bookmarks,hidelinks]{hyperref}

\usepackage{lscape}

%% Corrige quelques erreurs de LaTeX2ε
\usepackage{fixltx2e}
\usepackage{xspace}
\usepackage{microtype}

%% Pour ne pas commettre les erreurs fréquentes décrites dans l2tabu
\usepackage[l2tabu,abort]{nag}

%% Saisie en UTF-8
\usepackage[utf8]{inputenc}
%% Fonte : cf. http://www.tug.dk/FontCatalogue/
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{csquotes}

%% Pour écrire du français
% \usepackage[frenchb]{babel}

%% Pour composer des mathématiques
\usepackage{mathtools}

\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}

\usepackage{stmaryrd}
\usepackage{semantic}

\newtheorem{lemma}{Lemma}
\newtheorem{definition}{Definition}

%% Commandes du paquet semantic : langage While
\reservestyle{\command}{\textbf}
\command{skip,while,do,if,then,else,not,and,or,true,false,int}
%% -> Utiliser \<begin> Pour styliser le mot-clé

\newcommand{\reduce}{\rightarrow^{*}}

\begin{document}

\author{Timoth\'ee Haudebourg \and Thibaut Marty}
\title{Homework PAS/SDL}
\date{November 18, 2016}

\maketitle

\section{Introduction}

Some programs of a language can be incorrect, that is programs which their execution will make no sense.

Ensuring that a program does exactly what we want it to do is a hard task.
Since Rice and his famous theorem, we know that automatically checking that a program respects its specification is impossible (in general).
There exists some techniques, not to ensure the validity of a program, but at least ensure that a program does not contains a certain type of nonsense.
Using a type system, and type checking, is one way to ensure that no operation in the program is performed when it has no sense.
Data-flow analysis for reachability is one other way to ensure that all used objects are defined.

However theses analysis are doomed to be imperfect.
That means they will either reject or accept programs even if they are respectively correct or not.
Such an analysis will always made some assumption that will not be true for every execution.

In this work, we first define a semantic for a small imperative language, \emph{While}.
Then, we define a type system and a static analysis for this language.

We will see that both of them have different flows. Each complement the other, and none of them are perfect.

\section{Small-step Semantics}

\subsection{Domains}

The $State$ domain of evaluation is defined as:
\[State = (Var \rightharpoonup Value) \cup \bot\]
Where $Var$ is the set of variables, and $Value$ the set of possible values.
We use $\bot$ to represent the error state.
In our model, a record is represented as a partial application $Field \rightharpoonup Value$.
This means that a record field can contains a record itself.
To do so, we define the set of value as:
\begin{align*}
	Value &= \bigcup_{i\in\mathbb{N}} Value_i \\
	Value_n &= \mathbb{N} \cup (Field \rightharpoonup Value_{n+1}) \\
\end{align*}

Because the variables are updated by copying, the codomain of the partial application is merely $Value$.
There is no references system.

\subsection{Denotational Semantic for expressions}

We define $\mathcal{A}$ the denotational semantics of arithmetic expressions as:

\[ \mathcal{A}|[ \bullet |] : AExpr \rightarrow State \rightarrow (Value \cup \bot) \]
where
\begin{align*}
  \mathcal{A} |[ n |] \sigma &= \mathcal{N} |[n|] \\
  \mathcal{A} |[ x |] \sigma &= \sigma x \\
  \mathcal{A} |[ e_1\ o\ e_2 |] \sigma &= \mathcal{A} |[ e_1 |] \sigma\ \bar{o}\ \mathcal{A} |[ e_2 |] \sigma, o \in \{+, -, \times\} \\
  \mathcal{A} |[ e.f |] \sigma &= \mathcal{A}|[e|] \sigma f\\
  \mathcal{A} |[ e_1{f \mapsto e_2} |] \sigma f &= \mathcal{A}|[ e_1 |] \sigma [f \mapsto \mathcal{A}|[ e_2 |] \sigma ]\\
  \mathcal{A} |[ \{f_1 = e_1 ; \dots ; f_n = e_n\} |] \sigma f_i &= \mathcal{A}|[ e_i |] \sigma \\
\end{align*}

For the undefined cases, for example when we try to add two records, we also note $\mathcal{A}|[e|] \sigma = \bot$.
In the rest of this work, we suppose $\mathcal{B}|[ \bullet |]$ the denotational semantic for booleans expressions already defined.

\subsection{Structural Operational Semantic for commands}

We define $->$ the structural operational semantics for commands as:

\[ ->\ \subseteq (Statement \times State) \times ((Statement \times State) \cup State) \]

\subsubsection{Regular execution}
For all $\sigma \neq \bot$:
\[
  \inference{}{(\<skip>, \sigma) -> \sigma}
\]
\[
  \inference{}{(x := a, \sigma) -> \sigma[x \mapsto \mathcal{A}|[a|] \sigma]}{\text{if }\mathcal{A}|[a|] \sigma \neq \bot}
\]
\[
  \inference{(S_1, \sigma) -> (S_1', \sigma')}{(S_1 ; S_2, \sigma) -> (S_1' ; S_2, \sigma')}
\]
\[
  \inference{(S_1, \sigma) -> \sigma'}{(S_1 ; S_2, \sigma) -> (S_2, \sigma')}
\]
\[
  \inference{}{(\<while>\ b\ \<do>\ S, \sigma) -> (S ; \<while>\ b\ \<do>\ S, \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{tt}}
\]
\[
  \inference{}{(\<while>\ b\ \<do>\ S, \sigma) -> \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{ff}}
\]
\[
  \inference{}{(\<if>\ b\ \<then>\ S_1\ \<else>\ S_2, \sigma) -> (S_1, \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{tt}}
\]
\[
  \inference{}{(\<if>\ b\ \<then>\ S_1\ \<else>\ S_2, \sigma) -> (S_2, \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{ff}}
\]

\subsubsection{Error handling}
An error can occurs when one tries to add (or compare) records.
In that case, according to our definition of arithmetic expressions, $\mathcal{A}|[e|] \sigma$ (or $\mathcal{B}|[b|]$) will returns $\bot$.
From this point, the execution of the program makes no sense, and we will return the error state.

\[
  \inference{}{(x := a, \sigma) -> \bot}{\text{if }\mathcal{A}|[a|] \sigma = \bot}
\]
\[
  \inference{}{(\<while>\ b\ \<do>\ S, \sigma) -> \bot}{\text{if } \mathcal{B}|[b|] \sigma = \bot}
\]
\[
  \inference{}{(\<if>\ b\ \<then>\ S_1\ \<else>\ S_2, \sigma) -> \bot}{\text{if } \mathcal{B}|[b|] \sigma = \bot}
\]

To ensure that the error can make it through the end, we also add the error propagation rule:

\[
  \inference{}{(S, \bot) -> \bot}
\]

Finally, note that according to the exercise statement, we suppose that all variables are already defined in $\sigma$,
so that no error can occur because of an undefined variable.

\section{Type system}

We suppose now that every variable in the program is declared with a type satisfying the following syntax:
\[
  t\quad::=\quad\text{int }|\ \{f_1 : t_1;\ \dots\ ; f_n : t_n\}
\]

\subsection{Definition}

We note $\Gamma \vdash S$ the type judgment for our language, defined by the following typing rules:

\[
  \inference{}{\Gamma \vdash n : int}
\]
\[
  \inference{(x : T) \in \Gamma}{\Gamma \vdash x : T}
\]
\[
  \inference{\Gamma \vdash a_1 : int & \Gamma \vdash a_2 : int}{\Gamma \vdash a_1\ o\ a_2 : int}{o \in \{+, -, \times\}}
\]

Records

\[
  \inference{\Gamma \vdash e_1 : T_1 & \dots & \Gamma \vdash e_n : T_n}{\Gamma \vdash \{f_1 = e_1 ; \dots ; f_n = e_n\} : \{f_1 : T_1 ; \dots; f_n : T_n\}}
\]

Field assignment

\[
  \inference[append ]{\Gamma \vdash e_2 : T & \Gamma \vdash e_1 : \{f_1 : T_1; \dots; f_n : T_n\}}{\Gamma \vdash e_1[f \mapsto e_2] : \{f : T; f_1 : T_1; \dots; f_n : T_n\}}
\]
\[
  \inference[override ]{\Gamma \vdash e_2 : T & \Gamma \vdash e_1 : \{f_1 : T_1; \dots; f_i : T_i; \dots; f_n : T_n\}}{\Gamma \vdash e_1[f_i \mapsto e_2] : \{f_1 : T_1; \dots; f_i : T; \dots; f_n : T_n\}}
\]

Field lookup

\[
  \inference{\Gamma \vdash e : \{f : T\}}{\Gamma \vdash e.f : T}
\]

As is, the last rule can seem too restrictive.
Indeed, an expression $e$ does not have to match \emph{exactly} the type $\{f : T\}$ to provide the field $f$.
It can contain some other fields.

We want the type $\{f : T\}$ to express \enquote{any record with a field $f$ of type $T$}.
To do this, we introduce the following sub-typing relation:

\[
  \inference{}{\{f_1 : T_1; \dots ; f_n : T_n; g_1 : S_1; \dots ; g_k : S_k\} <: \{f_1 : T_1; \dots ; f_n : T_n\}}
\]

Note that in order to assure the reflexivity of our relation,
a type $S$ is a sub-type of $T$ if all the fields of $T$ are included in $S$.
We also introduce a new subsumption typing rule:

\[
  \inference{\Gamma \vdash e : S & S <: T}{\Gamma \vdash e : T}
\]

Assignment
\[
  \inference{\Gamma \vdash x : T & \Gamma \vdash a : T}{\Gamma \vdash x := a}
\]

Sequence
\[
  \inference{\Gamma \vdash S_1 & \Gamma \vdash S_2}{\Gamma \vdash (S_1 ; S_2)}
\]

While loop
\[
  \inference{\Gamma \vdash b & \Gamma \vdash S}{\Gamma \vdash \<while>\ b\ \<do>\ S}
\]

Condition
\[
  \inference{\Gamma \vdash b & \Gamma \vdash S_1 & \Gamma \vdash S_2}{\Gamma \vdash \<if>\ b\ \<then>\ S_1\ \<else>\ S_2}
\]

As the boolean expressions can contain expressions which can contain variables, they are not necessarily well typed. Hence we need to define typing rules on booleans:
\[
  \inference{}{\Gamma \vdash \<true>}
\]
\[
  \inference{}{\Gamma \vdash \<false>}
\]
\[
  \inference{\Gamma \vdash b}{\Gamma \vdash \<not>\ b}
\]
\[
  \inference{\Gamma \vdash b_1 & \Gamma \vdash b_2}{\Gamma \vdash b_1\ o\ b_2}{o \in \{\<and>, \<or>\}}
\]
\[
  \inference{\Gamma \vdash a_1 : int & \Gamma \vdash a_2 : int}{\Gamma \vdash a_1 \le a_2}
\]
\[
  \inference{\Gamma \vdash a_1 : T & \Gamma \vdash a_2 : T}{\Gamma \vdash a_1 = a_2}
\]
Note: for the equality test, the rule works for \<int> as well as for records.

% Domains
\[
  n \in Num
\]
\[
  x \in Var
\]
\[
  e,e_1,e_2,a,a_1,a_2 \in AExpr
\]
\[
  b,b_1,b_2 \in BExpr
\]
\[
  f,f_1,\dots,f_n,f_i,g_1,\dots,g_k, \in Field % Field n'est pas défini…
\]
\[
  S,S_1,S_2 \in Statement
\]

\input{sections/correctness.tex}

\section{Static analysis}

We propose a constraint based static analysis to determine at each instruction which field is defined in each variables.
Our static analysis is incarnated by the function $RF$ (\enquote{Reachable Fields}).
\[RF : Lab -> Var -> \mathcal{P}(Field)\]
where $Lab$ is the set of labels.
One unique label is assigned to each instruction.
For the sake of simplicity, we note $[I]^l$ the instruction of label $l$.

\subsection{Definition}

Here is the definition of the static analysis.
We suppose that we dispose of the control flow graph $flow(P)$ of the program.

\[RF(l) = \bigcap_{(l', l) \in flow(P)} RF_{out}(l')\]

Where $RF_{out}$ is defined by:

\begin{align*}
RF_{out}([\<skip>]^l) &= RF(l) \\
RF_{out}([b]^l) &= RF(l)\quad(b \in BExp)\\
RF_{out}([x := e]^l) &= RF(l)[x \mapsto \mathcal{A}^\#|[e|] RF(l)]
\end{align*}

Where $\mathcal{A}^\#$ is defined by:

\[\mathcal{A}^\#|[\bullet|] : AExpr -> (Var -> \mathcal{P}(Field)) -> \mathcal{P}(Field)\]

\begin{align*}
\mathcal{A}^\#|[n|] \sigma &= \emptyset \\
\mathcal{A}^\#|[x|] \sigma &= \sigma x \\
\mathcal{A}^\#|[\{f_1 = e_1; \dots; f_n = e_n\}|] \sigma &= \{f_1; \dots; f_n\} \\
\mathcal{A}^\#|[e_1[f \mapsto e_2]|] \sigma &= \mathcal{A}^\#|[e_1|] \sigma \cup \{f\} \\
\mathcal{A}^\#|[e.f|] \sigma &= \emptyset
\end{align*}

\subsection{Example}

Let $S_1 = [x := \{f_1 = 0; f_2 = 42\}]^1$,
$S_3 = [y := \{f_1 = x.f_1 + 1\}]^3$,
$S_4 = [x := y]^4$, and
\[S = \<while>\ ([x.f_1 \le 100]^2)\ \<do>\ (S_3 ; S_4)\].

We perform the static analysis for the program $[S_1 ; S]^5$:
Let's begin with the initial state:

\begin{align*}
RF(1) &= \emptyset \\
RF(2) &= RF_{out}(1) \cap RF(4)_{out} \\
RF(3) &= RF(2)_{out} \\
RF(4) &= RF(3)_{out} \\
RF(5) &= RF(2)_{out}
\end{align*}

\begin{align*}
RF_{out}(1) &= RF(1)[x \mapsto \{f_1, f_2\}] \\
RF_{out}(2) &= RF(2) \\
RF_{out}(3) &= RF(3)[y \mapsto \{f_1\}] \\
RF_{out}(4) &= RF(4)[x \mapsto \{f_1\}]
\end{align*}

Now we can solve those equations and found:

\begin{align*}
RF(1) &= \emptyset \\
RF(2) &= RF(1)[x \mapsto \{f_1, f_2\}] \cap RF(4)[x \mapsto \{f_1\}] = \{x \mapsto \{f_1\}\} \\
RF(3) &= RF(2) = \{x \mapsto \{f_1\}\} \\
RF(4) &= RF(3)[y \mapsto \{f_1\}] = \{x \mapsto \{f_1\}, y \mapsto \{f_1\}\} \\
RF(5) &= RF(2) = \{x \mapsto \{f_1\}\}
\end{align*}

\subsection{Termination}

We can design an algorithm to solve this equation system under the hypothesis that there is no label $l$ such as $(l, l) \in flow(P)$.
At each step we can:
\begin{itemize}
\item Rewrite a $RF(l)$ as $\bigcap_{(l', l) \in flow(P)} RF_{out}(l')$, where $RF_{out}(l')$ can be simplified into a term that does not contain $RF(l)$ (because $(l, l) \not\in flow(P)$).
\item Because a program contains a finite number of instructions, it exists a time when no $RF(l)$ is left. The system is solved.
\end{itemize}

\subsection{Safety}

This is a pessimistic static analysis. Due to the intersection, we know that $RF(l)$ contains no field that is never defined during the execution (...need some material to do the proof).

\section{Discussion on the type system and static analysis}

The absence of undefined field reading is performed by two different ways by the type system and the static analysis.
On one hand the type system focuses on the expressions of a program, i.\,e. a statement is well typed if its expressions are well typed.
On the other hand the static analysis focuses one the commands.
Therefore, they are complementary.

The static analysis of the program $x := \{f = 1\} ; y := 3 + x$ works because there is no undefined field reading.
The variable $x$ contains one field $f$, and no other fields are read.
However, the type system rejects this program.
Indeed, the expression $3 + x$ is not well typed as the type of $x$ is $\{f:int\}$, not $int$.

The program $x := \{\}; \<if>\ b\ \<then>\ x.f := 1\ \<else>\ \<skip>; y := x.f$ is well typed because the type $\{\}$ is a sub-type of the type $\{f:int\}$.
However, its static analysis fails.
The static analysis will take the worst case of the two branches of the \<if> statement, that is the \<else> branch when $x$ does not get the extra field $f$.
Thus, the last statement, $y := x.f$, tries to reach the undefined (according to the analysis) field $f$.

\subsection{Extension of the static analysis}

A better solution of the two previous approaches is the union of them.
In other words, we could extend the static analysis by using the result of the type checking.
In this way, the two previous examples will be rightly rejected by the new analysis.

\end{document}