Something about ISA Emulation -- 2
Binary Translation
Work on multiple instructions at once
- 
less management 
- 
use sequence (of codes) without control flow change (“basic block” = BB) - „dynamic BB“: ends at jump/branch instruction
- „static BB“ (term for compilers): ends at jump instr. or before jump target
 
- 
piecewise translation - 
only when execution is required 
- 
detection of instruction borders now simple (compare with static translation: what is code?) 
 
- 
Dynamic Binary Translation
- Components required
- code cache (CC)
sometimes called translation cache (TC)
- used to store translated basic blocks (BBs)
- has to be invalidated when original code
- is removed („unmapped“)
- is modified
 
 
- translation table
- maps start addresses of BBs: SPC => TPC
- looked up for (indirect) jumps
 
- original code and data
- are kept for potential accesses
 
 
- code cache (CC)
sometimes called translation cache (TC)
Code Cache
- properties in common with processor cashes
- limited size => replacement strategy (LRU, LFU, FIFO, …)
- processor caches have
- tags
- assoziativity
- cache line length
- coherency protocol
 
- differences to processor caches
- on a cache miss …
- differenttranslationspossibleforsameBB
- cachehierarchy?
 
- variable sizes of translated code
- interdependencies among entries (see chaining …)
 
- on a cache miss …
 
Code Cache: Invalidation
- required for self modifying code („SMC“)
- detection
- use host MMU (Memory Management Unit)
- guest code is write-protected
- on a write: invalidate all code on given page
- fall back: interpretation
- very slow with frequent modifications
 
- check for modification before every BB execution
- copy of original guest code required
- slow
- modification of currently running BB?
 
 
- use host MMU (Memory Management Unit)
Code Cache: Eviction handling
- 
Cache is full, but translation to be stored 
- 
LRU - 
evict the translated BB least recently used 
- 
problems - 
need to maintain order of uses => Overhead (time stamps / linked lists) 
- 
fragmentation of cache storage – possible solution: buddy lists 
- 
eviction events happen often, so need to be fast 
 
- 
 
- 
=> LRU is rarely used. What is better?
- 
Goal: minimum runtime overhead 
- 
Solutions: - 
complete flush when full - 
advantage: regular retranslation – adaptation of optimizations to current runtime behavior – old/unneeded translations get removed 
- 
disadvantage – frequently used BBs have to be translated often 
 
- 
 
- 
– flush on detection of execution phase change – blocks with FIFO replacement
Chaining
- 
Observation: expensive actions - lookup SPC => TPC
- indirect jump
 
- 
Chaining = Linking translated BBs 
- 
on known successor - last guest instruction is
- unconditional jump
- conditional jump (2 cases: follow/pass-through)
 
- lookup jump targets at translation time
- what if successor not translated yet?
 
 
- last guest instruction is
- 
on unknown successor - indirect jump
- convert into if-then-else chains (profile targets!)
 
- return from function
- use shadow stack (similar to return prediction)
 
 
- indirect jump
Superblocks (SB)
- Motivation
- reduce number of jumps for given guest code (same as with chaining)
- larger translation units allow for more optimization possibilities (see next slides)
 
- Superblock
- One entry, multiple “side exits”
- we may have branch which go out of SB in the middle, if you have a unconditional jumps in ur original code you may just go on. I you have a conditional jump you may have a side exits at that point
 
- Combine sequence of BBs
- works best if the execution path of the full SB is similar to later executions (it is “hot”) => predict from past
- also called “trace” (“tracing JITs”) => execution path of BBs
 
 
- One entry, multiple “side exits”