Mornfall’s DIVINE tool and me


Actually, my LART Tool and Program Abstractions

Henrich Lauko

What are we trying to solve?

How to handle data nondeterminism in program analysis

  • user inputs (integers, strings)
  • values from program interface/libraries
  • program environment:
    • operating system
    • file system
    • network
int main() 
{
    int i = input();
    if (i == 0)
        ...
    else
        ...
}

General Approach

Represent sets of input values abstractly


Example: $a =\{1, 2, 3\}$$b = \{ x \mid x \text{ is even} \}$

  • symbolic representation:
    • term domain: $\bar{a} = v \geq 1 \wedge v \leq 3$$\,\bar{b} = v \text{ mod } 2 \equiv 0$
  • abstract representation:
    • interval domain: $\bar{a} = [1, 3]$$\,\bar{b} = [0, \infty]$
    • sign domain: $\bar{a} = \{+\}$$\,\bar{b} = \{0, +\}$
    • unit domain: $\bar{a} = unit$$\,\bar{b} = unit$
    • modulo domain: $\bar{a} = \{0, 1\}$$\,\bar{b} = \{0\}$ (in mod 2 domain)

Problems to Solve in our execution model

  1. domain representation (values, operations)
  1. nondeterministic control flow
  1. branch constraint propagation
  1. concrete execution together with abstract execution
  1. domain interactions
int main() 
{
    int i = input();
    if (i < 42)
        ... // i < 42
    else
        ... // i >= 42
}

Usually handled in the abstract interpreter

Abstractions in DIVINE

Integration into DIVINE: Compilation-based Abstraction

La La Land

  • LART: LLVM Abstraction \& Refinement Tool
    1. concrete execution together with abstract execution
    2. branch constraint propagation
  • LAVA: A Library of Abstract VAlues
    1. domain representation (values, operations)
  • LAMP: A Library of Abstract Metadomain Packages
    1. domain interactions
  • NonDeterministic Runtime
    1. nondeterministic control flow

Program Abstraction Background

int x = input();
int y = x + 1;
if (y == 0) {
    int c = 2 + 7;
}
int x = __lamp_any_i32();
int y = __lamp_add(x, 1);
if (__lamp_eq(y, 0)) { // nondeterministic
    y = __lamp_assume(y == 0); // constrain y
    int c = 2 + 7;
}

Executable Compilation-based Abstraction

Demo Time

In [6]:
%%writefile simple.c
#include <lamp.h>
#include <stdio.h>

int main() {
    int i = __lamp_any_i32("i");
    if (i == 0) {
        printf("reached: i == 0\n");
    } else {
        printf("reached: i != 0\n");
    }
    printf("---------------\n");
}
Overwriting simple.c
In [10]:
!lartcc unit simple.c -o simple
[2021-05-03 14:22:39.310] [info] lartcc started
[2021-05-03 14:22:39.324] [info] lartcc finished
In [11]:
!./simple
unreachable
reached: i == 0
---------------
reached: i == 0
---------------
reached: i != 0
---------------

Abstract Dataflow

In [39]:
%%writefile dataflow.c
#include <lamp.h>
#include <stdio.h>
#include <assert.h>

int main() {
    int x = __lamp_lift(1, 3);
    int y = __lamp_lift(5, 10);
    int z = x + y;
    int u = z - 1;
}
Overwriting dataflow.c
In [40]:
%%dot.canvas
dataflow
In [41]:
!lartcc interval -g dataflow.c -o dataflow
[2021-05-03 14:31:23.544] [info] lartcc started
[2021-05-03 14:31:23.559] [info] lartcc finished
In [42]:
%%lart.simulator dataflow
run
`/home/xlauko1/src/lart/pres/paradise-seminar/dataflow' has changed; re-reading symbols.
Starting program: /home/xlauko1/src/lart/pres/paradise-seminar/dataflow 
[Inferior 1 (process 2021) exited normally]

Functor Domains: MString

  • abstract domain for analysis of C strings
  • segmentation abstract domain
  • parametrizable by domains:

    1. domain of _characters_

    2. domain of _indices_

  • abstract regular expressions - describe a set of strings
  • abstract standard library functions like, strlen, strcpy, etc.

Example

  • interval characters, term indices

  • represents strings of form $\{a,..,z\}^x \{1,..,9\}^y$
    where $x \mod 2 \equiv 0$ and $y \in \{0,1, 2\}$

Heap Layout Abstraction

int *x = malloc(sizeof(int));
int *y = malloc(sizeof(int));
if (uintptr_t(x) < uintptr_t(y))
    error();
int *x = malloc(sizeof(int));
free(x);
int *y = malloc(sizeof(int));
if (x == y)
    error();
  • distinguish two parts of pointer - location & offset:
    • &array[7] location is starting address of array, offset is 7
  • product domain of _concrete pointer_ $\times$ _abstract location_:
    • allocations return pair of _concrete address_ and _abstract location_
    • use _concrete address_ to perform memory manipulation
    • use _abstract location_ to perform relational operations

Heap Layout Abstraction Refinement

  • do not abstract all allocations in program
  • abstract only pointers that may lead to ambiguous behavior
  • checked in DIVINE's interpreter

Domain Categories & Interactions

  • scalar domains - integers, floats

  • memory/pointer domains (location, offset)

    1. location - heap layout abstraction
    2. offset - unbounded memory
    3. universal - symbolic memory
  • generic domains

LAMP defines a complete lattice over domains

  • each operation is only in one domain
  • lifts between domains
  • lattice order determines resolution between domains

Example: lift([$1$, $7$])$\, = 1 \geq v \wedge v \leq 7$

Current Work

  • Executable Abstraction (Heňo)

  • LTL Domain (Heňo)

  • Domain Specific Abstractions & How to pick a good domain? (Hugo)

  • Runtime Domain Refinement (Palo)

  • Compositional Program Analysis & Invariant Generation using Program Abstraction (Jakub)

  • File System Abstraction & OS Interaction Abstraction (Robo)

Thank you