# If you miss a key ... after yesterday's exercise session ... pick it up here!

**ETH** zürich

# TLB management

- Recall: the TLB is a cache.
- Machines have many MMUs on many cores ⇒ many TLBs
- Problem: TLBs should be coherent. Why?
  - Security problem if mappings change
  - E.g., when memory is reused

### 1. Hardware TLB coherence

- Integrate TLB management with cache coherence
- Invalidate TLB entry when PTE memory changes
- Rarely implemented

### 2. Virtual caches

- Required cache flush / invalidate will take care of the TLB
- High context switch cost!
- ⇒ Most processors use physical caches

### 3. Software TLB shootdown

- Most common
- OS on one core notifies all other cores
  - Typically an IPI
- Each core provides local invalidation

### 4. Hardware shootdown instructions

- Broadcast special address access on the bus
- Interpreted as TLB shootdown rather than cache coherence message
- E.g., PowerPC architecture

## **Our Small Quiz**

True or false (raise hand):

1. Base (relocation) and limit registers provide a full virtual address space
2. Base and limit registers provide protection
3. Segmentation provides a base and limit for each segment
4. Segmentation provides a full virtual address space
5. Segmentation allows libraries to share their code
6. Segmentation provides linear addressing
7. Segment tables are set up for each process in the CPU
8. Segmenting prevents internal fragmentation
9. Paging prevents internal fragmentation
10. Protection information is stored at the physical frame
11. Pages can be shared between processes
12. The same page may be writeable in process A and write-protected in process B
13. The same physical address can be referenced through different addresses from (a) two different processes, (b) the same process?
14. Inverted page tables are faster to search than hierarchical page tables (asymptotically)

# Today

- Uses for virtual memory
- Copy-on-write
- Demand paging
  - Page fault handling
  - Page replacement algorithms
  - Frame allocation policies
  - Thrashing and working set
- Book: OSPP Sections 9.5, 9.7 (all of Chapter 9 as a refresher)

spcl.inf.ethz.ch, @spcl\_eth

# **Recap: Virtual Memory**

- User logical memory ≠ physical memory.
  - Only part of the program must be in RAM for execution ⇒ logical address space can be larger than physical address space
  - Address spaces can be shared by several processes
  - More efficient process creation
- Virtualize memory using software + hardware

# The many uses of address translation

- Process isolation
- IPC
- Shared code segments
- Program initialization
- Efficient dynamic memory allocation
- Cache management
- Program debugging
- Efficient I/O
- Memory mapped files
- Virtual memory
- Checkpoint and restart
- Persistent data structures
- Process migration
- Information flow control
- Distributed shared memory

and many more ...

# Recall fork()

- Can be expensive to create a complete copy of the process' address space
- Especially just to do exec()!
- vfork(): shares address space, doesn't copy
  - Fast
  - Dangerous: two writers to the same heap
- Better: only copy when you know something is going to get written

# Copy-on-Write

- COW allows both parent and child processes to initially share the same pages in memory
  - If either process modifies a shared page, only then is the page copied
- COW allows more efficient process creation as only modified pages are copied
- Free pages are allocated from a pool of zeroed-out pages

# **Demand Paging**

- Page needed ⇒ reference (load or store) to it
  - Invalid reference ⇒ abort
  - Not in memory ⇒ bring to memory
- Lazy swapper: never swaps a page into memory unless the page will be needed
  - A swapper that deals with pages is a pager
- Can do this with segments, but more complex
- Strict demand paging: only page in when referenced

# **Page Fault**

If there is a reference to a page, the first reference to that page will trap to the operating system: page fault

1. Operating system looks at another table to decide:
   - Invalid reference ⇒ abort
   - Just not in memory
2. Get empty frame
3. Swap page into frame
4. Reset tables
5. Set valid bit to v
6. Restart the instruction that caused the page fault

# **Demand paging example**

- Memory access time = 200 nanoseconds
- Average page-fault service time = 8 milliseconds
- Effective access time (EAT) in nanoseconds, where $p$ is the page-fault rate:

$$
\begin{aligned}
EAT &= (1 - p) \times 200 + p \times (8 \text{ milliseconds}) \\
    &= (1 - p) \times 200 + p \times 8{,}000{,}000 \\
    &= 200 + p \times 7{,}999{,}800
\end{aligned}
$$

- If one access out of 1,000 causes a page fault ($p = 0.001$), then EAT ≈ 8.2 microseconds. This is a slowdown by a factor of about 40!!

# Page Replacement

# What happens if there is no free frame?
- Page replacement: find a "little used" resident page to discard or write to disk
  - "Victim page"
  - Needs a selection algorithm
  - Performance: want an algorithm which will result in the minimum number of page faults
- The same page may be brought into memory several times

# Page replacement

- Try to pick a victim page which won't be referenced in the future
  - Various heuristics, but ultimately it's a guess
- Use "modify" bit on PTE
  - Don't write "clean" (unmodified) page to disk
  - Try to pick "clean" pages over "dirty" ones (save a disk write)

- Stack implementation: keep a stack of page numbers in doubly linked form:
  - Page referenced: move it to the top
    - Requires 6 pointers to be changed
  - No search for replacement
- General term: stack algorithms
  - Have the property that adding frames never increases the number of page faults (no Belady's Anomaly)

# Allocation of frames

- Each process needs a minimum number of pages
- Example: IBM 370 needs 6 pages to handle the SS MOVE instruction:
  - Instruction is 6 bytes, might span 2 pages
  - 2 pages to handle *from*
  - 2 pages to handle *to*
- Two major allocation schemes
  - Fixed allocation
  - Priority allocation

#### **Fixed allocation**

- Equal allocation: all processes get an equal share
- Proportional allocation: allocate according to the size of the process

$$
\begin{aligned}
s_i &= \text{size of process } p_i \\
S &= \sum s_i \\
m &= \text{total number of frames} \\
a_i &= \text{allocation for } p_i = \frac{s_i}{S} \times m
\end{aligned}
$$

Example with $m = 64$, $s_1 = 10$, $s_2 = 127$ (so $S = 137$):

$$
a_1 = \frac{10}{137} \times 64 \approx 5, \qquad a_2 = \frac{127}{137} \times 64 \approx 59
$$

## **Priority allocation**

- Proportional allocation scheme
  - Using priorities rather than size
- If process $P_i$ generates a page fault, select:
  1. one of its frames, or
  2. a frame from a process with lower priority

#### Global vs. local allocation

- Global replacement: process selects a replacement frame from the set of all frames; one process can take a frame from another
- Local replacement: each process selects from only its own set of allocated frames

### Allocate demand frames

- $D = \sum WSS_i$ = total demand frames, where $WSS_i$ is the working-set size of process $P_i$
  - Intuition: how much space is really needed
- $D > m$ ⇒ Thrashing
- Policy: if $D > m$, suspend some processes

## Working-set model

Page reference string:

...2615777751623412344434344413234443444...

#### Keeping track of the working set

- Approximate with interval timer + a reference bit
- Example: $\Delta = 10,000$
  - Timer interrupts after every 5000 time units
  - Keep in memory 2 bits for each page
  - Whenever the timer interrupts: shift + copy, and set the values of all reference bits to 0
  - If one of the bits in memory = 1 ⇒ page in working set
- Why is this not completely accurate?
  - Hint: Nyquist-Shannon!
- Improvement: 10 bits and interrupt every 1000 time units
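The reference-bit scheme above can be sketched in a few lines of Python. This is a minimal simulation, not how any particular kernel implements it: the names `WorkingSetTracker`, `simulate`, and `exact_working_set` are made up for illustration, time is counted in memory references rather than clock ticks, and the numbers are scaled down from the slide (Δ = 10 references with an interrupt every 5 references, instead of Δ = 10,000 and 5000 time units). The input is the page reference string from the working-set model slide.

```python
from collections import defaultdict


class WorkingSetTracker:
    """Approximate the working set with one reference bit plus history bits per page."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1  # keep only the newest history_bits bits
        self.ref = defaultdict(int)          # stands in for the hardware reference bit
        self.history = defaultdict(int)      # the in-memory bits kept per page

    def access(self, page):
        """A load or store to a page sets its reference bit."""
        self.ref[page] = 1

    def timer_interrupt(self):
        """Shift + copy: move each reference bit into the history, then clear it."""
        for page in set(self.ref) | set(self.history):
            self.history[page] = ((self.history[page] << 1) | self.ref[page]) & self.mask
            self.ref[page] = 0

    def working_set(self):
        """Pages with at least one in-memory bit set count as in the working set."""
        return {page for page, bits in self.history.items() if bits}


def simulate(reference_string, interval, history_bits=2):
    """Replay the references, raising a timer interrupt every `interval` references."""
    tracker = WorkingSetTracker(history_bits)
    for t, page in enumerate(reference_string, start=1):
        tracker.access(page)
        if t % interval == 0:
            tracker.timer_interrupt()
    return tracker.working_set()


def exact_working_set(reference_string, delta):
    """Exact working set: distinct pages among the last `delta` references."""
    return set(reference_string[-delta:])


if __name__ == "__main__":
    refs = [int(c) for c in "2615777751623412344434344413234443444"]
    print(sorted(simulate(refs, interval=5, history_bits=2)))   # [1, 2, 3, 4]
    print(sorted(exact_working_set(refs, delta=10)))            # [2, 3, 4]
    print(sorted(simulate(refs, interval=1, history_bits=10)))  # [2, 3, 4]
```

With 2 bits and an interrupt every 5 references, page 1 is still reported even though its last reference lies just outside the window of 10: the history bits record only that a reference happened somewhere within an interval, not when. With 10 bits and an interrupt on every reference (the scaled-down analogue of the suggested improvement), the approximation matches the exact working set for this string, at the cost of far more frequent interrupts.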