123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397 |
- \documentclass[11pt]{article}
- \usepackage{fullpage}
- \usepackage[bookmarks,hidelinks]{hyperref}
- \usepackage{lscape}
- %% Corrige quelques erreurs de LaTeX2ε
- \usepackage{fixltx2e}
- \usepackage{xspace}
- \usepackage{microtype}
- %% Pour ne pas commettre les erreurs fréquentes décrites dans l2tabu
- \usepackage[l2tabu,abort]{nag}
- %% Saisie en UTF-8
- \usepackage[utf8]{inputenc}
- %% Fonte : cf. http://www.tug.dk/FontCatalogue/
- \usepackage[T1]{fontenc}
- \usepackage{lmodern}
- \usepackage{csquotes}
- %% Pour écrire du français
- % \usepackage[frenchb]{babel}
- %% Pour composer des mathématiques
- \usepackage{mathtools}
- \usepackage{amsfonts}
- \usepackage{amssymb}
- \usepackage{amsthm}
- \usepackage{stmaryrd}
- \usepackage{semantic}
- \newtheorem{lemma}{Lemma}
- \newtheorem{definition}{Definition}
- %% Commandes du paquet semantic : langage While
- \reservestyle{\command}{\textbf}
- \command{skip,while,do,if,then,else,not,and,or,true,false,int}
- %% -> Utiliser \<begin> Pour styliser le mot-clé
- \newcommand{\reduce}{\rightarrow^{*}}
- \begin{document}
- \author{Timoth\'ee Haudebourg \and Thibaut Marty}
- \title{Homework PAS/SDL}
- \date{November 18, 2016}
- \maketitle
- \section{Introduction}
- Some programs of a language can be incorrect, that is programs which their execution will make no sense.
- Ensuring that a program does exactly what we want it to do is a hard task.
- Since Rice and his famous theorem, we know that automatically checking that a program respects its specification is impossible (in general).
- There exists some techniques, not to ensure the validity of a program, but at least ensure that a program does not contains a certain type of nonsense.
- Using a type system, and type checking, is one way to ensure that no operation in the program is performed when it has no sense.
- Data-flow analysis for reachability is one other way to ensure that all used objects are defined.
- However theses analysis are doomed to be imperfect.
- That means they will either reject or accept programs even if they are respectively correct or not.
- Such an analysis will always made some assumption that will not be true for every execution.
- In this work, we first define a semantic for a small imperative language, \emph{While}.
- Then, we define a type system and a static analysis for this language.
- We will see that both of them have different flows. Each complement the other, and none of them are perfect.
- \section{Small-step Semantics}
- \subsection{Domains}
- The $State$ domain of evaluation is defined as:
- \[State = (Var \rightharpoonup Value) \cup \bot\]
- Where $Var$ is the set of variables, and $Value$ the set of possible values.
- We use $\bot$ to represent the error state.
- In our model, a record is represented as a partial application $Field \rightharpoonup Value$.
- This means that a record field can contains a record itself.
- To do so, we define the set of value as:
- \begin{align*}
- Value &= \bigcup_{i\in\mathbb{N}} Value_i \\
- Value_n &= \mathbb{N} \cup (Field \rightharpoonup Value_{n+1}) \\
- \end{align*}
- Because the variables are updated by copying, the codomain of the partial application is merely $Value$.
- There is no references system.
- \subsection{Denotational Semantic for expressions}
- We define $\mathcal{A}$ the denotational semantics of arithmetic expressions as:
- \[ \mathcal{A}|[ \bullet |] : AExpr \rightarrow State \rightarrow (Value \cup \bot) \]
- where
- \begin{align*}
- \mathcal{A} |[ n |] \sigma &= \mathcal{N} |[n|] \\
- \mathcal{A} |[ x |] \sigma &= \sigma x \\
- \mathcal{A} |[ e_1\ o\ e_2 |] \sigma &= \mathcal{A} |[ e_1 |] \sigma\ \bar{o}\ \mathcal{A} |[ e_2 |] \sigma, o \in \{+, -, \times\} \\
- \mathcal{A} |[ e.f |] \sigma &= \mathcal{A}|[e|] \sigma f\\
- \mathcal{A} |[ e_1{f \mapsto e_2} |] \sigma f &= \mathcal{A}|[ e_1 |] \sigma [f \mapsto \mathcal{A}|[ e_2 |] \sigma ]\\
- \mathcal{A} |[ \{f_1 = e_1 ; \dots ; f_n = e_n\} |] \sigma f_i &= \mathcal{A}|[ e_i |] \sigma \\
- \end{align*}
- For the undefined cases, for example when we try to add two records, we also note $\mathcal{A}|[e|] \sigma = \bot$.
- In the rest of this work, we suppose $\mathcal{B}|[ \bullet |]$ the denotational semantic for booleans expressions already defined.
- \subsection{Structural Operational Semantic for commands}
- We define $->$ the structural operational semantics for commands as:
- \[ ->\ \subseteq (Statement \times State) \times ((Statement \times State) \cup State) \]
- \subsubsection{Regular execution}
- For all $\sigma \neq \bot$:
- \[
- \inference{}{(\<skip>, \sigma) -> \sigma}
- \]
- \[
- \inference{}{(x := a, \sigma) -> \sigma[x \mapsto \mathcal{A}|[a|] \sigma]}{\text{if }\mathcal{A}|[a|] \sigma \neq \bot}
- \]
- \[
- \inference{(S_1, \sigma) -> (S_1', \sigma')}{(S_1 ; S_2, \sigma) -> (S_1' ; S_2, \sigma')}
- \]
- \[
- \inference{(S_1, \sigma) -> \sigma'}{(S_1 ; S_2, \sigma) -> (S_2, \sigma')}
- \]
- \[
- \inference{}{(\<while>\ b\ \<do>\ S, \sigma) -> (S ; \<while>\ b\ \<do>\ S, \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{tt}}
- \]
- \[
- \inference{}{(\<while>\ b\ \<do>\ S, \sigma) -> \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{ff}}
- \]
- \[
- \inference{}{(\<if>\ b\ \<then>\ S_1\ \<else>\ S_2, \sigma) -> (S_1, \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{tt}}
- \]
- \[
- \inference{}{(\<if>\ b\ \<then>\ S_1\ \<else>\ S_2, \sigma) -> (S_2, \sigma)}{\text{if } \mathcal{B}|[b|] \sigma = \text{ff}}
- \]
- \subsubsection{Error handling}
- An error can occurs when one tries to add (or compare) records.
- In that case, according to our definition of arithmetic expressions, $\mathcal{A}|[e|] \sigma$ (or $\mathcal{B}|[b|]$) will returns $\bot$.
- From this point, the execution of the program makes no sense, and we will return the error state.
- \[
- \inference{}{(x := a, \sigma) -> \bot}{\text{if }\mathcal{A}|[a|] \sigma = \bot}
- \]
- \[
- \inference{}{(\<while>\ b\ \<do>\ S, \sigma) -> \bot}{\text{if } \mathcal{B}|[b|] \sigma = \bot}
- \]
- \[
- \inference{}{(\<if>\ b\ \<then>\ S_1\ \<else>\ S_2, \sigma) -> \bot}{\text{if } \mathcal{B}|[b|] \sigma = \bot}
- \]
- To ensure that the error can make it through the end, we also add the error propagation rule:
- \[
- \inference{}{(S, \bot) -> \bot}
- \]
- Finally, note that according to the exercise statement, we suppose that all variables are already defined in $\sigma$,
- so that no error can occur because of an undefined variable.
- \section{Type system}
- We suppose now that every variable in the program is declared with a type satisfying the following syntax:
- \[
- t\quad::=\quad\text{int }|\ \{f_1 : t_1;\ \dots\ ; f_n : t_n\}
- \]
- \subsection{Definition}
- We note $\Gamma \vdash S$ the type judgment for our language, defined by the following typing rules:
- \[
- \inference{}{\Gamma \vdash n : int}
- \]
- \[
- \inference{(x : T) \in \Gamma}{\Gamma \vdash x : T}
- \]
- \[
- \inference{\Gamma \vdash a_1 : int & \Gamma \vdash a_2 : int}{\Gamma \vdash a_1\ o\ a_2 : int}{o \in \{+, -, \times\}}
- \]
- Records
- \[
- \inference{\Gamma \vdash e_1 : T_1 & \dots & \Gamma \vdash e_n : T_n}{\Gamma \vdash \{f_1 = e_1 ; \dots ; f_n = e_n\} : \{f_1 : T_1 ; \dots; f_n : T_n\}}
- \]
- Field assignment
- \[
- \inference[append ]{\Gamma \vdash e_2 : T & \Gamma \vdash e_1 : \{f_1 : T_1; \dots; f_n : T_n\}}{\Gamma \vdash e_1[f \mapsto e_2] : \{f : T; f_1 : T_1; \dots; f_n : T_n\}}
- \]
- \[
- \inference[override ]{\Gamma \vdash e_2 : T & \Gamma \vdash e_1 : \{f_1 : T_1; \dots; f_i : T_i; \dots; f_n : T_n\}}{\Gamma \vdash e_1[f_i \mapsto e_2] : \{f_1 : T_1; \dots; f_i : T; \dots; f_n : T_n\}}
- \]
- Field lookup
- \[
- \inference{\Gamma \vdash e : \{f : T\}}{\Gamma \vdash e.f : T}
- \]
- As is, the last rule can seem too restrictive.
- Indeed, an expression $e$ does not have to match \emph{exactly} the type $\{f : T\}$ to provide the field $f$.
- It can contain some other fields.
- We want the type $\{f : T\}$ to express \enquote{any record with a field $f$ of type $T$}.
- To do this, we introduce the following sub-typing relation:
- \[
- \inference{}{\{f_1 : T_1; \dots ; f_n : T_n; g_1 : S_1; \dots ; g_k : S_k\} <: \{f_1 : T_1; \dots ; f_n : T_n\}}
- \]
- Note that in order to assure the reflexivity of our relation,
- a type $S$ is a sub-type of $T$ if all the fields of $T$ are included in $S$.
- We also introduce a new subsumption typing rule:
- \[
- \inference{\Gamma \vdash e : S & S <: T}{\Gamma \vdash e : T}
- \]
- Assignment
- \[
- \inference{\Gamma \vdash x : T & \Gamma \vdash a : T}{\Gamma \vdash x := a}
- \]
- Sequence
- \[
- \inference{\Gamma \vdash S_1 & \Gamma \vdash S_2}{\Gamma \vdash (S_1 ; S_2)}
- \]
- While loop
- \[
- \inference{\Gamma \vdash b & \Gamma \vdash S}{\Gamma \vdash \<while>\ b\ \<do>\ S}
- \]
- Condition
- \[
- \inference{\Gamma \vdash b & \Gamma \vdash S_1 & \Gamma \vdash S_2}{\Gamma \vdash \<if>\ b\ \<then>\ S_1\ \<else>\ S_2}
- \]
- As the boolean expressions can contain expressions which can contain variables, they are not necessarily well typed. Hence we need to define typing rules on booleans:
- \[
- \inference{}{\Gamma \vdash \<true>}
- \]
- \[
- \inference{}{\Gamma \vdash \<false>}
- \]
- \[
- \inference{\Gamma \vdash b}{\Gamma \vdash \<not>\ b}
- \]
- \[
- \inference{\Gamma \vdash b_1 & \Gamma \vdash b_2}{\Gamma \vdash b_1\ o\ b_2}{o \in \{\<and>, \<or>\}}
- \]
- \[
- \inference{\Gamma \vdash a_1 : int & \Gamma \vdash a_2 : int}{\Gamma \vdash a_1 \le a_2}
- \]
- \[
- \inference{\Gamma \vdash a_1 : T & \Gamma \vdash a_2 : T}{\Gamma \vdash a_1 = a_2}
- \]
- Note: for the equality test, the rule works for \<int> as well as for records.
- % Domains
- \[
- n \in Num
- \]
- \[
- x \in Var
- \]
- \[
- e,e_1,e_2,a,a_1,a_2 \in AExpr
- \]
- \[
- b,b_1,b_2 \in BExpr
- \]
- \[
- f,f_1,\dots,f_n,f_i,g_1,\dots,g_k, \in Field % Field n'est pas défini…
- \]
- \[
- S,S_1,S_2 \in Statement
- \]
- \input{sections/correctness.tex}
- \section{Static analysis}
- We propose a constraint based static analysis to determine at each instruction which field is defined in each variables.
- Our static analysis is incarnated by the function $RF$ (\enquote{Reachable Fields}).
- \[RF : Lab -> Var -> \mathcal{P}(Field)\]
- where $Lab$ is the set of labels.
- One unique label is assigned to each instruction.
- For the sake of simplicity, we note $[I]^l$ the instruction of label $l$.
- \subsection{Definition}
- Here is the definition of the static analysis.
- We suppose that we dispose of the control flow graph $flow(P)$ of the program.
- \[RF(l) = \bigcap_{(l', l) \in flow(P)} RF_{out}(l')\]
- Where $RF_{out}$ is defined by:
- \begin{align*}
- RF_{out}([\<skip>]^l) &= RF(l) \\
- RF_{out}([b]^l) &= RF(l)\quad(b \in BExp)\\
- RF_{out}([x := e]^l) &= RF(l)[x \mapsto \mathcal{A}^\#|[e|] RF(l)]
- \end{align*}
- Where $\mathcal{A}^\#$ is defined by:
- \[\mathcal{A}^\#|[\bullet|] : AExpr -> (Var -> \mathcal{P}(Field)) -> \mathcal{P}(Field)\]
- \begin{align*}
- \mathcal{A}^\#|[n|] \sigma &= \emptyset \\
- \mathcal{A}^\#|[x|] \sigma &= \sigma x \\
- \mathcal{A}^\#|[\{f_1 = e_1; \dots; f_n = e_n\}|] \sigma &= \{f_1; \dots; f_n\} \\
- \mathcal{A}^\#|[e_1[f \mapsto e_2]|] \sigma &= \mathcal{A}^\#|[e_1|] \sigma \cup \{f\} \\
- \mathcal{A}^\#|[e.f|] \sigma &= \emptyset
- \end{align*}
- \subsection{Example}
- Let $S_1 = [x := \{f_1 = 0; f_2 = 42\}]^1$,
- $S_3 = [y := \{f_1 = x.f_1 + 1\}]^3$,
- $S_4 = [x := y]^4$, and
- \[S = \<while>\ ([x.f_1 \le 100]^2)\ \<do>\ (S_3 ; S_4)\].
- We perform the static analysis for the program $[S_1 ; S]^5$:
- Let's begin with the initial state:
- \begin{align*}
- RF(1) &= \emptyset \\
- RF(2) &= RF_{out}(1) \cap RF(4)_{out} \\
- RF(3) &= RF(2)_{out} \\
- RF(4) &= RF(3)_{out} \\
- RF(5) &= RF(2)_{out}
- \end{align*}
- \begin{align*}
- RF_{out}(1) &= RF(1)[x \mapsto \{f_1, f_2\}] \\
- RF_{out}(2) &= RF(2) \\
- RF_{out}(3) &= RF(3)[y \mapsto \{f_1\}] \\
- RF_{out}(4) &= RF(4)[x \mapsto \{f_1\}]
- \end{align*}
- Now we can solve those equations and found:
- \begin{align*}
- RF(1) &= \emptyset \\
- RF(2) &= RF(1)[x \mapsto \{f_1, f_2\}] \cap RF(4)[x \mapsto \{f_1\}] = \{x \mapsto \{f_1\}\} \\
- RF(3) &= RF(2) = \{x \mapsto \{f_1\}\} \\
- RF(4) &= RF(3)[y \mapsto \{f_1\}] = \{x \mapsto \{f_1\}, y \mapsto \{f_1\}\} \\
- RF(5) &= RF(2) = \{x \mapsto \{f_1\}\}
- \end{align*}
- \subsection{Termination}
- We can design an algorithm to solve this equation system under the hypothesis that there is no label $l$ such as $(l, l) \in flow(P)$.
- At each step we can:
- \begin{itemize}
- \item Rewrite a $RF(l)$ as $\bigcap_{(l', l) \in flow(P)} RF_{out}(l')$, where $RF_{out}(l')$ can be simplified into a term that does not contain $RF(l)$ (because $(l, l) \not\in flow(P)$).
- \item Because a program contains a finite number of instructions, it exists a time when no $RF(l)$ is left. The system is solved.
- \end{itemize}
- \subsection{Safety}
- This is a pessimistic static analysis. Due to the intersection, we know that $RF(l)$ contains no field that is never defined during the execution (...need some material to do the proof).
- \section{Discussion on the type system and static analysis}
- The absence of undefined field reading is performed by two different ways by the type system and the static analysis.
- On one hand the type system focuses on the expressions of a program, i.\,e. a statement is well typed if its expressions are well typed.
- On the other hand the static analysis focuses one the commands.
- Therefore, they are complementary.
- The static analysis of the program $x := \{f = 1\} ; y := 3 + x$ works because there is no undefined field reading.
- The variable $x$ contains one field $f$, and no other fields are read.
- However, the type system rejects this program.
- Indeed, the expression $3 + x$ is not well typed as the type of $x$ is $\{f:int\}$, not $int$.
- The program $x := \{\}; \<if>\ b\ \<then>\ x.f := 1\ \<else>\ \<skip>; y := x.f$ is well typed because the type $\{\}$ is a sub-type of the type $\{f:int\}$.
- However, its static analysis fails.
- The static analysis will take the worst case of the two branches of the \<if> statement, that is the \<else> branch when $x$ does not get the extra field $f$.
- Thus, the last statement, $y := x.f$, tries to reach the undefined (according to the analysis) field $f$.
- \subsection{Extension of the static analysis}
- A better solution of the two previous approaches is the union of them.
- In other words, we could extend the static analysis by using the result of the type checking.
- In this way, the two previous examples will be rightly rejected by the new analysis.
- \end{document}
|