# **Design of Parallel and High-Performance Computing**

Fall 2013

**Lecture:** Linearizability

Instructor: Torsten Hoefler & Markus Püschel

TA: Timo Schneider

#### ETH

Eidgenössische Technische Hochschule Zürich

### **Review of last lecture**

- Cache-coherence is not enough!
  - Many more subtle issues for parallel programs!
- Memory Models
  - Sequential consistency
  - Why threads cannot be implemented as a library :-)
  - Relaxed consistency models
  - x86 TLO+CC case study
- Complexity of reasoning about parallel objects

**DPHPC Overview** (course-map figure; topics shown: locality, parallelism, concepts & techniques, vector ISA, shared memory, distributed memory, caches, memory hierarchy, cache coherency, memory models, distributed algorithms, group communications, locks, lock free, wait free, linearizability, Amdahl's and Gustafson's law, PRAM, α-β, I/O complexity, balance principles I, balance principles II, Little's Law, scheduling)

### Goals of this lecture

- Queue:
  - Locked
    - C++ locking (small detour)
  - Wait-free two-thread queue
- Linearizability
  - Intuitive understanding (sequential order on objects!)
  - Linearization points
  - Linearizable executions
  - Formal definitions (Histories, Projections, Precedence)
- Linearizability vs. Sequential Consistency
  - Modularity

### Lock-based queue

```
class Queue {
 private:
  int head, tail;
  std::vector<Item> items;
  std::mutex lock;

 public:
  Queue(int capacity) {
    head = tail = 0;
    items.resize(capacity);
  }
  ...
};
```

Queue fields protected by single shared lock!



### **C++ Resource Acquisition is Initialization**

- RAII: a suboptimal name
- Can be used for locks (or any other resource acquisition)
  - Constructor grabs resource
  - Destructor frees resource
- Behaves as if there were an implicit unlock at the end of the block!
- Main advantages
  - Always frees the lock at exit
  - No "lost" locks due to exceptions or strange control flow (goto :-))
  - Very easy to use

```
template <typename mutex_impl>
class lock_guard {
  mutex_impl &_mtx; // ref to the mutex

public:
  explicit lock_guard(mutex_impl &mtx) : _mtx(mtx) {
    _mtx.lock(); // lock mutex in constructor
  }
  ~lock_guard() {
    _mtx.unlock(); // unlock mutex in destructor
  }
};
```
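The "no lost locks" advantage can be sketched with the standard `std::lock_guard`. The names `risky_update` and `update_after_exception` and the globals are hypothetical and exist only for this illustration; the sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex mtx;       // hypothetical shared lock for this sketch
int shared_value = 0; // data protected by mtx

// Updates shared_value under the lock; throws for negative input.
void risky_update(int v) {
  std::lock_guard<std::mutex> guard(mtx); // constructor locks mtx
  if (v < 0) {
    // Leaving the block via an exception still runs guard's destructor.
    throw std::invalid_argument("negative value");
  }
  shared_value = v;
} // implicit unlock at end of block

// Returns the value written by a second update after the first one threw.
int update_after_exception() {
  try {
    risky_update(-1);
  } catch (const std::invalid_argument &) {
    // The exception has left risky_update; the guard released the lock.
  }
  risky_update(5); // could not acquire mtx if the first call had leaked it
  return shared_value;
}
```

With a manual `mtx.lock()` / `mtx.unlock()` pair, the early `throw` would skip the unlock and the second call could never acquire the mutex.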



### **Correctness**

- Is the locked queue correct?
  - Yes, only one thread has access if locked correctly
  - Allows us again to reason about pre- and postconditions
  - Smells a bit like sequential consistency, no?
- Class question: What is the problem with this approach?
  - Same as for SC :-)

It does not scale!
What is the solution here?









### Is this correct?

- Hard to reason about correctness
- What could go wrong?

```
void enq(Item x) {
  if(tail-head == items.size()) {
    throw FullException();
  }
  items[tail % items.size()] = x;
  tail = tail+1; // counters only grow; the index wraps via % above
}

Item deq() {
  if(tail == head) {
    throw EmptyException();
  }
  Item item = items[head % items.size()];
  head = head+1; // counters only grow; the index wraps via % above
  return item;
}
```

- Nothing (at least no crash)
- Yet, the semantics of the queue are funny (define "FIFO" now)!
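In C++11, plain `int` counters that one thread writes and another reads form a data race. A minimal sketch of the two-thread queue, assuming exactly one enqueuer thread and one dequeuer thread, could therefore use `std::atomic` counters. The class name `SpscQueue` and the methods `try_enq` and `try_deq` are hypothetical; they return `false` instead of throwing.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a bounded queue for exactly one enqueuer and one dequeuer thread.
template <typename Item>
class SpscQueue {
  std::vector<Item> items;
  std::atomic<std::size_t> head{0}; // written only by the dequeuer
  std::atomic<std::size_t> tail{0}; // written only by the enqueuer

public:
  explicit SpscQueue(std::size_t capacity) : items(capacity) {}

  // Returns false (instead of throwing) when the queue is full.
  bool try_enq(const Item &x) {
    std::size_t t = tail.load();
    if (t - head.load() == items.size()) return false;
    items[t % items.size()] = x;
    tail.store(t + 1); // the item becomes visible to the dequeuer here
    return true;
  }

  // Returns false (instead of throwing) when the queue is empty.
  bool try_deq(Item &out) {
    std::size_t h = head.load();
    if (tail.load() == h) return false;
    out = items[h % items.size()];
    head.store(h + 1); // the slot becomes reusable for the enqueuer here
    return true;
  }
};
```

The default `std::atomic` operations used here are sequentially consistent. Neither method ever waits for the other thread, which is why no lock is needed as long as each end is used by a single thread.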


## **Serial to Concurrent Specifications**

- Serial specifications are complex enough, so let's stick to them
  - Define invocation and response events (start and end of method)
- Extend the concept to concurrency: linearizability
- Each method should "take effect"
  - Instantaneously
  - Between invocation and response events
- Concurrent object is correct if this "sequential" behavior is correct
  - Called "linearizable"



### Linearizability

- Sounds like a property of an execution ...
- An object is called linearizable if all possible executions on the object are linearizable
- Says nothing about the order of executions!











































































### **About Executions**

- Why?
  - Can't we specify the linearization point of each operation without describing an execution?
- Not always
  - In some cases, the linearization point depends on the execution
- Define a formal model for executions!













### **Sequential Histories**

A history H is sequential if

- First event of H is an invocation
- Each invocation (except possibly the last) is immediately followed by a matching response
- Each response is immediately followed by an invocation
  - Method calls of different threads do not interleave

A history H is concurrent if

- It is not sequential
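The three conditions for a sequential history can be checked mechanically. The following is a minimal sketch; the `Event` encoding and the function names are assumptions made for this illustration, and a response is taken to match an invocation if it has the same thread and object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event encoding for this sketch.
struct Event {
  char thread;        // e.g. 'A'
  std::string object; // e.g. "q"
  bool invocation;    // true: invocation, false: response
};

// A response matches an invocation if thread and object agree.
bool matches(const Event &inv, const Event &res) {
  return inv.invocation && !res.invocation &&
         inv.thread == res.thread && inv.object == res.object;
}

// Checks the three conditions for a sequential history.
bool is_sequential(const std::vector<Event> &h) {
  for (std::size_t i = 0; i < h.size(); ++i) {
    if (i % 2 == 0) {
      // Positions 0, 2, 4, ... must hold invocations.
      if (!h[i].invocation) return false;
    } else {
      // Each invocation is immediately followed by its matching response.
      if (!matches(h[i - 1], h[i])) return false;
    }
  }
  return true; // a trailing invocation without a response is allowed
}
```

A history in which a second thread invokes a method before the first thread's response arrives fails the check at the first odd position, so it is concurrent.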


### **Well-formed histories**

Per-thread projections must be sequential

H =

- A: q.enq(x)
- B: p.enq(y)
- B: p:void
- B: q.deq()
- A: q:void
- B: q:x

H|A =

- A: q.enq(x)
- A: q:void

H|B =

- B: p.enq(y)
- B: p:void
- B: q.deq()
- B: q:x


## **Equivalent histories**

Per-thread projections must be the same

H =

- A: q.enq(x)
- B: p.enq(y)
- B: p:void
- B: q.deq()
- A: q:void
- B: q:x

G =

- A: q.enq(x)
- B: p.enq(y)
- A: q:void
- B: p:void
- B: q.deq()
- B: q:x

H|A = G|A =

- A: q.enq(x)
- A: q:void

H|B = G|B =

- B: p.enq(y)
- B: p:void
- B: q.deq()
- B: q:x
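Projection and equivalence translate directly into code. The sketch below assumes a hypothetical `Event` type that stores the thread and the event text; `project` computes H|t and `equivalent` compares all per-thread projections.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical event encoding for this sketch.
struct Event {
  char thread;      // e.g. 'A'
  std::string text; // e.g. "q.enq(x)" or "q:void"
};

bool operator==(const Event &a, const Event &b) {
  return a.thread == b.thread && a.text == b.text;
}

// H|t: the subsequence of all events of thread t.
std::vector<Event> project(const std::vector<Event> &h, char t) {
  std::vector<Event> out;
  for (const Event &e : h) {
    if (e.thread == t) out.push_back(e);
  }
  return out;
}

// Two histories are equivalent if all per-thread projections are the same.
bool equivalent(const std::vector<Event> &h, const std::vector<Event> &g) {
  std::set<char> threads;
  for (const Event &e : h) threads.insert(e.thread);
  for (const Event &e : g) threads.insert(e.thread);
  for (char t : threads) {
    if (!(project(h, t) == project(g, t))) return false;
  }
  return true;
}
```

Equivalence only constrains the order of events within each thread, so two equivalent histories may interleave the events of A and B differently.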


## **Legal Histories**

- A sequential specification lets us describe which behavior we expect and tolerate
  - When is a single-thread, single-object history legal?
- Recall: Example
  - Preconditions and Postconditions
  - Many others exist!
- A sequential (multi-object) history H is legal if
  - For every object x
  - H|x adheres to the sequential specification for x


### **Precedence**

- A: q.enq(x)
- B: q.enq(y)
- B: q:void
- A: q:void
- B: q.deq()
- B: q:x

A method execution precedes another if its response event precedes the other's invocation event. Here, both enq executions precede B's q.deq().



### **Precedence vs. Overlapping**

Non-precedence = overlapping

- A: q.enq(x)
- B: q.enq(y)
- B: q:void
- A: q:void
- B: q.deq()
- B: q:x

Some method executions overlap with others: here, A's q.enq(x) and B's q.enq(y).




### **Precedence relations**

- Given history H
- Method executions $m_0$ and $m_1$ in H
  - $m_0 \rightarrow_H m_1$ ($m_0$ precedes $m_1$ in H) if
  - Response event of $m_0$ precedes invocation event of $m_1$
- Precedence relation $m_0 \rightarrow_H m_1$ is a
  - Strict partial order on method executions
  - Irreflexive, antisymmetric, transitive
- Considerations
  - Precedence forms a total order if H is sequential
  - Unrelated method calls → overlap → concurrent
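The precedence relation can be sketched by identifying each method execution with the positions of its invocation and response events in H. The struct `MethodExec` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// A method execution, identified by the positions of its two events in H.
struct MethodExec {
  std::size_t inv; // index of the invocation event in H
  std::size_t res; // index of the response event in H
};

// m0 ->_H m1: the response of m0 occurs before the invocation of m1.
bool precedes(const MethodExec &m0, const MethodExec &m1) {
  return m0.res < m1.inv;
}

// Neither execution precedes the other: they overlap (are concurrent).
bool overlap(const MethodExec &m0, const MethodExec &m1) {
  return !precedes(m0, m1) && !precedes(m1, m0);
}
```

For the six-event history A: q.enq(x), B: q.enq(y), B: q:void, A: q:void, B: q.deq(), B: q:x, the executions are `{0, 3}`, `{1, 2}`, and `{4, 5}`: the two enq executions overlap, and each of them precedes the deq.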

### **Definition Linearizability**

- A history H induces a strict partial order $<_H$ on operations
  - $m_0 <_H m_1$ if $m_0 \rightarrow_H m_1$
- A history H is linearizable if
  - H can be extended to a history H'
    - by appending responses to pending operations or dropping pending operations
  - H' is equivalent to some legal sequential history S and
  - $<_{H'} \subseteq <_S$
- S is a linearization of H
- Remarks:
  - For each H, there may be many valid extensions to H'
  - For each extension H', there may be many S
  - Interleaving at the granularity of methods
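For tiny histories, the definition can be checked by brute force. The sketch below is an illustration under stated assumptions, not an efficient algorithm: it handles a single FIFO queue, expects a complete history H' (no pending operations), and uses the hypothetical `Op` encoding shown. Because the operations of one thread never overlap in a well-formed history, any order S that respects precedence also has the same per-thread projections as H'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <vector>

// One completed method execution on a single FIFO queue (hypothetical encoding).
struct Op {
  bool is_enq;     // true: enq(value), false: deq() returning value
  char value;
  std::size_t inv; // position of the invocation event in H'
  std::size_t res; // position of the response event in H'
};

// Legal: the order follows the sequential FIFO specification.
bool legal_fifo(const std::vector<Op> &ops, const std::vector<std::size_t> &order) {
  std::deque<char> q;
  for (std::size_t i : order) {
    if (ops[i].is_enq) {
      q.push_back(ops[i].value);
    } else {
      if (q.empty() || q.front() != ops[i].value) return false;
      q.pop_front();
    }
  }
  return true;
}

// <_{H'} is a subset of <_S: a later operation in S must not precede an earlier one in H'.
bool respects_precedence(const std::vector<Op> &ops,
                         const std::vector<std::size_t> &order) {
  for (std::size_t i = 0; i < order.size(); ++i) {
    for (std::size_t j = i + 1; j < order.size(); ++j) {
      if (ops[order[j]].res < ops[order[i]].inv) return false;
    }
  }
  return true;
}

// Tries every sequential order S of the operations.
bool is_linearizable(const std::vector<Op> &ops) {
  std::vector<std::size_t> order(ops.size());
  std::iota(order.begin(), order.end(), 0);
  do {
    if (respects_precedence(ops, order) && legal_fifo(ops, order)) return true;
  } while (std::next_permutation(order.begin(), order.end()));
  return false;
}
```

The search is factorial in the number of operations, which is acceptable for slide-sized histories only.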


## Ensuring $<_{H'} \subseteq <_{S}$

Find an S that contains H'

$$<_{H'} = \{a \rightarrow c, b \rightarrow c\}$$

$$<_{S} = \{a \rightarrow b, a \rightarrow c, b \rightarrow c\}$$

(Figure: timeline of the method executions a, b, and c, annotated with $<_{H'}$ and $<_{S}$.)

## **Example**
















### **Linearization Points**

- Identify one atomic step where a method "happens" (effects become visible to others)
  - Critical section
  - Machine instruction (atomics, transactional memory ...)
- Does not always succeed
  - One may need to define several different steps for a given method
  - If so, extreme care must be taken to ensure pre-/postconditions
- All possible executions on the object must be linearizable

```
void enq(Item x) {
    std::lock_guard<std::mutex> l(lock); // lock held until end of block
    if(tail-head == items.size()) {
        throw FullException();
    }
    items[tail % items.size()] = x;
    tail = tail+1; // counters only grow; the index wraps via % above
}

Item deq() {
    std::lock_guard<std::mutex> l(lock); // lock held until end of block
    if(tail == head) {
        throw EmptyException();
    }
    Item item = items[head % items.size()];
    head = head+1; // counters only grow; the index wraps via % above
    return item;
}
```
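A linearization point can also be a single machine-level atomic operation instead of a critical section. The following sketch uses a hypothetical `TicketCounter` class in which each method takes effect at exactly one atomic instruction.

```cpp
#include <atomic>
#include <cassert>

// Sketch: a shared counter without locks (hypothetical class name).
class TicketCounter {
  std::atomic<int> next{0};

public:
  // Linearization point: the single atomic fetch_add.
  int take() { return next.fetch_add(1); }

  // Linearization point: the atomic load.
  int peek() const { return next.load(); }
};
```

Concurrent calls to `take()` may overlap in time, yet each call returns a distinct value, as if the calls had happened one after another at their `fetch_add` instants.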

### Composition

- H is linearizable iff for every object x, H|x is linearizable!
  - Composing linearizable objects results in a linearizable system
- Reasoning
  - Consider linearizability of objects in isolation
- Modularity
  - Allows concurrent systems to be constructed in a modular fashion
  - Compose independently-implemented objects

## Linearizability vs. Sequential Consistency

- Sequential consistency
  - Correctness condition
  - For describing hardware memory interfaces
  - Remember: existing hardware does not provide it!
- Linearizability
  - Stronger correctness condition
  - For describing higher-level systems composed from linearizable components


### Map linearizability to sequential consistency

- Variables with read and write operations
  - Sequential consistency
- Objects with a type and methods
  - Linearizability
- Map sequential consistency ↔ linearizability
  - Reduce data types to variables with read and write operations
  - Model variables as data types with read() and write() methods
- Sequential consistency
  - A history H is sequentially consistent if it can be extended to H' and H' is equivalent to some legal sequential history S
  - Note: Precedence order ($<_H \subseteq <_S$) does not need to be maintained
















## **Properties of sequential consistency**

Theorem: Sequential consistency is not compositional

H =

- A: p.enq(x)
- A: p:void
- B: q.enq(y)
- B: q:void
- A: q.enq(x)
- A: q:void
- B: p.enq(y)
- B: p:void
- A: p.deq()
- A: p:y
- B: q.deq()
- B: q:x

Compositional would mean:

"If H|p and H|q are sequentially consistent, then H is sequentially consistent!"

This is not guaranteed for SC schedules!

The history H above is such an example.
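The theorem can be checked by exhaustive search on the two-thread example with queues p and q. The sketch below assumes two threads and FIFO queues; `Op`, `Thread`, and all function names are hypothetical. It tries every interleaving that keeps each thread's program order and accepts if one of them is legal.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <vector>

struct Op {
  char object; // 'p' or 'q'
  bool is_enq; // true: enq(value), false: deq() returning value
  char value;
};

using Thread = std::vector<Op>; // operations of one thread in program order

// Applies op to its queue; returns false if the FIFO specification is violated.
bool apply(std::map<char, std::deque<char>> &queues, const Op &op) {
  std::deque<char> &q = queues[op.object];
  if (op.is_enq) {
    q.push_back(op.value);
    return true;
  }
  if (q.empty() || q.front() != op.value) return false;
  q.pop_front();
  return true;
}

// Depth-first search over all interleavings that keep each thread's program order.
bool search(const Thread &a, std::size_t i, const Thread &b, std::size_t j,
            std::map<char, std::deque<char>> queues) {
  if (i == a.size() && j == b.size()) return true;
  if (i < a.size()) {
    auto next = queues; // copy so the other branch starts from the same state
    if (apply(next, a[i]) && search(a, i + 1, b, j, next)) return true;
  }
  if (j < b.size()) {
    auto next = queues;
    if (apply(next, b[j]) && search(a, i, b, j + 1, next)) return true;
  }
  return false;
}

bool sequentially_consistent(const Thread &a, const Thread &b) {
  return search(a, 0, b, 0, {});
}

// Per-object projection of one thread: keep only the operations on object x.
Thread restrict_to(const Thread &t, char x) {
  Thread out;
  for (const Op &op : t) {
    if (op.object == x) out.push_back(op);
  }
  return out;
}
```

With thread A as p.enq(x), q.enq(x), p.deq() returning y and thread B as q.enq(y), p.enq(y), q.deq() returning x, the projections onto p and onto q each admit a legal order, while the full history does not: p requires y to be enqueued before x, q requires x before y, and program order closes the cycle.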

























### **Correctness: Linearizability**

- Sequential Consistency
  - Not composable
  - Harder to work with
  - Good way to think about hardware models
- We will use linearizability in the remainder of this course unless stated otherwise

### **Study Goals**

- Define linearizability with your own words!
- Describe the properties of linearizability!
- Explain the differences between sequential consistency and linearizability!
- Given a history H
  - Identify linearization points
  - Find equivalent sequential history S
  - Decide and explain whether H is linearizable
  - Decide and explain whether H is sequentially consistent
  - Give values for the response events such that the execution is linearizable


### **Language Memory Models**

- Which transformations/reorderings can be applied to a program
- Affects platform/system
  - Compiler, (VM), hardware
- Affects programmer
  - What are possible semantics/output
  - Which communication between threads is legal?
- Without memory model
  - Impossible to even define "legal" or "semantics" when data is accessed concurrently
- A memory model is a contract
  - Between platform and programmer

### **History of Memory Models**

- Java's original memory model was broken
  - Difficult to understand => widely violated
  - Did not allow reorderings as implemented in standard VMs
  - Final fields could appear to change value without synchronization
  - Volatile writes could be reordered with normal reads and writes
     => counter-intuitive for most developers
- Java memory model was revised
  - Java 1.5 (JSR-133)
  - Still some issues (operational semantics definition)
- C/C++ didn't even have a memory model until recently
  - Not able to make any statement about threaded semantics!
  - Introduced in C++11 and C11
  - Based on experience from Java, more conservative


### **Everybody wants to optimize**

- Language constructs for synchronization
  - Java: volatile, final, synchronized, ...
  - C++: atomic, (NOT volatile!) ...
- Without synchronization (defined per language)
  - Compiler, (VM), architecture
  - May reorder, or appear to reorder, memory operations
  - Maintain sequential semantics per thread
  - Other threads may observe any order (have seen examples before)

### Java and C++ High-level overview

- Relaxed memory model
  - No global visibility ordering of operations
  - Allows for standard compiler optimizations
- But
  - Program order for each thread (sequential semantics)
  - Partial order on memory operations (with respect to synchronizations)
  - Visibility function defined
- Correctly synchronized programs
  - Guarantee sequential consistency
- Incorrectly synchronized programs
  - Java: maintain safety and security guarantees
    - Type safety etc. (requires behavior bounded by causality)
  - C++: undefined behavior
    - No safety (anything can happen/change)








Memory semantics

- Similar to synchronization variables
- All memory accesses before an unlock are ordered before and are visible to any memory access after a matching lock!

(Figure residue: a thread example containing the statements `x = 10`, `y = 5`, and `print(x+y)`.)

### Synchronization Variables

- Variables can be declared volatile (Java) or atomic (C++)
- Reads and writes to synchronization variables
  - Are totally ordered with respect to all threads
  - Must not be reordered with normal reads and writes
- Compiler
  - Must not allocate synchronization variables in registers
  - Must not swap variables with synchronization variables
  - May need to issue memory fences/barriers
  - ...
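A minimal C++ sketch of a synchronization variable: a `std::atomic<bool>` flag orders the normal writes to `x` and `y` before the reads in the other thread. The function name `run_once` and the variable names are assumptions for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0, y = 0;               // normal (non-atomic) variables
std::atomic<bool> ready{false}; // synchronization variable

// The writer publishes x and y; the reader waits for the flag and returns x + y.
int run_once() {
  x = 0;
  y = 0;
  ready.store(false);
  int sum = -1;
  std::thread writer([] {
    x = 10;
    y = 5;
    ready.store(true); // must not be reordered with the two writes above
  });
  std::thread reader([&sum] {
    while (!ready.load()) {
      // spin until the flag is set
    }
    sum = x + y; // the writes to x and y are visible here
  });
  writer.join();
  reader.join();
  return sum;
}
```

If `ready` were a plain `bool`, the program would contain a data race and the compiler could hoist the load out of the loop or reorder the stores.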



## **Memory Model Rules**

- Java/C++: Correctly synchronized programs will execute with sequentially consistent semantics
- Correctly synchronized = data-race free
  - iff all sequentially consistent executions are free of data races
- Two accesses to a shared memory location form a data race in the execution of a program if
  - The two accesses are from different threads
  - At least one access is a write and
  - The accesses are not synchronized

