Institute of

Theoretical Computer Science

Carl-Friedrich-Gauß-Fakultät

Technische Universität Braunschweig

This web page accompanies our submissions Checking and Enforcing robustness against TSO and Lazy TSO Reachability. Here you can

Trencher

Trencher is a tool for checking robustness against the x86-TSO memory model.

Trencher is written in C++ and uses Spin as a back-end model checker. It translates a reachability query to a program in Promela. Spin takes the Promela program and produces C source code of a model checker. This source code is compiled by a C compiler. Finally, running the model checker answers the reachability query. So, each query involves the execution of three external programs. Trencher runs the queries concurrently whenever possible, in particular robustness queries for different attacks are parallelized.

Download

Trencher's sources, documentation, and examples are available on GitHub. The sources, documentation, and examples that accompany our FASE sumission are available on the following GitHub branch of Trencher.

Examples

We executed Trencher (namely, the fence insertion algorithm for enforcing robustness) on a set of examples, using a machine with Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz (4 cores) running GNU/Linux. The results are summarized in the table below, the description of these examples and the discussion can be found below.

Name T L I RQ NR1 NR2 R F Spin CC Ver Tre Real
Peterson2141823212920.77.10.50.12.9
Peterson (Corr)2162012120000.00.00.00.00.0
Dekker224309512285542.828.91.80.214.2
Dekker (Corr)2303630300000.00.00.00.00.0
Lamport33336369151261.013.46.00.15.9
Lamport (Corr)3394227270000.00.00.00.00.0
CLHLock7625854486000.44.60.20.01.6
MCSLock4525030264000.22.70.40.00.9
Lock-Free Stack4465014140000.00.00.00.00.0
Cilk's THE WSQ586791521418330.89.318.00.17.4
NBW: W|R21513330000.00.00.00.00.0
NBW: W|WR221191595110.22.30.20.00.8
NBW: W|WR (Corr)2222015150000.00.00.00.00.0
NBW: W|AR|RA32522972000.10.60.10.00.4
NBW: W|LR|RL4454530264000.22.50.20.00.7
Parker298201110.10.40.00.00.3
Parker (Corr)2109220000.00.00.00.00.0
Triangular Race253101000.00.20.00.00.2

Description of the examples

The first group of examples are mutual exclusion algorithms: Dekker's Algorithm, Peterson's Algorithm, Lamport's Fast Mutex Algorithm, CLH and MCS queue locks. The first three algorithms are executed in an infinite loop. The examples Dekker, Peterson, and Lamport are presented in two versions: original (not fenced) and corrected (properly fenced to ensure robustness). CLH and MCS locks are robust from the start.

The second group of examples are concurrent data structures: a lock-free stack from a concurrent malloc implementation (robust) and a work-stealing queue from the implementation of Cilk 5 programming language (non-robust).

The choice of the third group of examples was inspired by S. Owens' paper Reasoning about the Implementation of Concurrency Abstractions on x86-TSO published in Proc. of ECOOP 2010:

Observations

rdtsc-mfence: Benchmarking memory fences

rdtsc-mfence is a tool for benchmarking the CPU instructions that can act as memory fences on AMD64 architecture. The tool measures the average number of cycles required for executing such an instruction, combined with several write instructions. You can download the archive with sources: rdtsc-mfence.tar.gz.

Charts

We ran the tool on several x86-64 CPUs running in 64-bit mode. We used the following patterns of assembler code:

where movl is a 32-bit MOV instruction to a global address. All MOVs in a single pattern access different adjacent addresses. The pattern was executed 108 times, in one or two concurrent threads; number of CPU cycles used was measured using RDTSC instruction and the mean was taken.

Results are presented in the charts below.

Mobile AMD Sempron Processor 3400+ (1 Thread) Intel Xeon E5420 CPU @ 2.50GHz (1 Thread) Intel Xeon E5420 CPU @ 2.50GHz (2 Threads) Intel Core i5 M650 CPU @ 2.67GHz (1 Thread) Intel Core i5 M650 CPU @ 2.67GHz (2 Threads) Intel Core 2 Duo P8700 CPU @ 2.53GHz (1 Thread) Intel Core 2 Duo P8700 CPU @ 2.53GHz (2 Threads) Intel Xeon X5650 CPU @ 2.67GHz (1 Thread) Intel Xeon X5650 CPU @ 2.67GHz (2 Threads on Different CPU Sockets) Intel Xeon X5650 CPU @ 2.67GHz (2 Threads on the Same Socket)

Observations

Based on the diagrams, we can do several empirical observations.

Summing up, to optimize performance of an application we must minimize the number of times memory fence instructions are executed.