Something about ISA Emulation -- 2
Binary Translation
- Work on multiple instructions at once
  - less management overhead
  - use a sequence of code without a control-flow change (“basic block” = BB)
    - “dynamic BB”: ends at a jump/branch instruction
    - “static BB” (compiler term): ends at a jump instruction or before a jump target
- piecewise translation
  - translate only when execution is required
  - detection of instruction borders is now simple (compare with static translation: what is code?); see the sketch below
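A minimal sketch of how a dynamic BB boundary can be found, assuming a toy guest ISA (fixed 4-byte instructions, opcode bytes 0xF0 and above change control flow); the encoding rules and the function name are illustrative only, a real translator would plug in its guest decoder here.

```c
#include <stdint.h>
#include <stddef.h>

#define INSN_LEN 4                      /* toy ISA: fixed 4-byte instructions           */
#define IS_BRANCH(op) ((op) >= 0xF0)    /* toy rule: opcodes 0xF0.. are jumps/branches  */

/* A dynamic BB starts at 'start' and ends with the first jump/branch
 * instruction that is actually reached, so we simply decode forward. */
size_t dynamic_bb_length(const uint8_t *guest_mem, size_t start)
{
    size_t pc = start;
    for (;;) {
        uint8_t opcode = guest_mem[pc];
        pc += INSN_LEN;
        if (IS_BRANCH(opcode))          /* the branch itself belongs to the BB */
            return pc - start;
    }
}
```

A static BB would additionally have to end just before any address that is a jump target elsewhere in the code; the dynamic translator does not need that global information.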
Dynamic Binary Translation
- Components required
  - code cache (CC), sometimes called translation cache (TC)
    - used to store translated basic blocks (BBs)
    - has to be invalidated when the original code
      - is removed (“unmapped”)
      - is modified
  - translation table
    - maps start addresses of BBs: SPC => TPC (source PC in guest code => target PC in the CC)
    - looked up for (indirect) jumps; see the dispatcher sketch after this list
  - original code and data
    - are kept for potential accesses
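A minimal sketch of how these components interact in the dispatch loop, assuming a hypothetical translate_bb() that emits host code into the code cache; the direct-mapped SPC => TPC table and the calling convention for translated blocks (return the next SPC) are illustrative choices, not a fixed design.

```c
#include <stdint.h>
#include <stddef.h>

#define TT_SIZE 65536u                     /* translation table: SPC -> TPC entries */

/* A translated BB is host code that updates the guest CPU state and
 * returns the SPC of the next guest BB to execute. */
typedef uint64_t (*translated_bb)(void *cpu_state);

static struct { uint64_t spc; translated_bb tpc; } tt[TT_SIZE];

/* Hypothetical translator front end: decodes the guest BB starting at
 * 'spc', emits host code into the code cache, and returns its TPC. */
translated_bb translate_bb(uint64_t spc);

static unsigned tt_hash(uint64_t spc) { return (unsigned)(spc >> 2) & (TT_SIZE - 1); }

static translated_bb tt_lookup(uint64_t spc)
{
    unsigned h = tt_hash(spc);
    return (tt[h].spc == spc) ? tt[h].tpc : NULL;
}

static void tt_insert(uint64_t spc, translated_bb tpc)
{
    unsigned h = tt_hash(spc);             /* direct-mapped for brevity: a collision overwrites */
    tt[h].spc = spc;
    tt[h].tpc = tpc;
}

/* The dispatcher: translate lazily ("piecewise"), then run blocks from
 * the code cache, looking up each successor SPC in the table. */
void dispatch(uint64_t entry_spc, void *cpu_state)
{
    uint64_t spc = entry_spc;
    for (;;) {
        translated_bb f = tt_lookup(spc);
        if (!f) {                          /* miss: translate only when execution needs it */
            f = translate_bb(spc);
            tt_insert(spc, f);
        }
        spc = f(cpu_state);                /* execute the translated BB, get the next SPC  */
    }
}
```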
Code Cache
- properties in common with processor caches
  - limited size => replacement strategy (LRU, LFU, FIFO, …)
- processor caches have
  - tags
  - associativity
  - cache line length
  - a coherency protocol
- differences to processor caches
  - on a cache miss, a (slow) translation is needed
  - different translations are possible for the same BB
  - cache hierarchy?
  - variable sizes of translated code; see the allocator sketch below
  - interdependencies among entries (see chaining …)
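Because translations have variable size, a code cache is usually managed more like a linear allocator than like a set-associative processor cache. A minimal sketch under that assumption (bump allocation into one executable buffer; a production translator would typically separate writable and executable mappings):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define CC_SIZE (16u << 20)               /* 16 MiB code cache (illustrative size) */

static uint8_t *cc_base;                  /* start of the cache buffer             */
static size_t   cc_used;                  /* bump pointer                          */

int cc_init(void)
{
    cc_base = mmap(NULL, CC_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return cc_base == MAP_FAILED ? -1 : 0;
}

/* Copy 'len' bytes of freshly generated host code into the cache.
 * Returns the TPC, or NULL when the cache is full and eviction is needed. */
void *cc_alloc(const void *host_code, size_t len)
{
    if (cc_used + len > CC_SIZE)
        return NULL;                      /* full: caller must evict */
    void *tpc = cc_base + cc_used;
    memcpy(tpc, host_code, len);
    cc_used += len;
    return tpc;
}
```

When cc_alloc() returns NULL the cache is full and one of the eviction strategies discussed on the later slides has to run.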
Code Cache: Invalidation
- required for self-modifying code (“SMC”)
- detection
  - use the host MMU (Memory Management Unit); see the sketch after this list
    - guest code is write-protected
    - on a write: invalidate all translated code of the given page
    - fall back: interpretation
    - very slow with frequent modifications
  - check for modification before every BB execution
    - a copy of the original guest code is required
    - slow
- modification of the currently running BB?
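A minimal sketch of the MMU-based detection on a POSIX host: guest code pages are mapped read-only, a write faults, and the handler invalidates that page's translations before allowing the write. invalidate_translations_in_page() is an assumed hook into the translation table / code cache, and the 4 KiB page size is an assumption.

```c
#include <signal.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096                 /* assumed host page size */

void invalidate_translations_in_page(uintptr_t page);   /* hypothetical hook */

/* SIGSEGV handler: a write hit a protected guest-code page. */
static void smc_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(PAGE_SIZE - 1);
    invalidate_translations_in_page(page);                      /* drop stale TPCs        */
    mprotect((void *)page, PAGE_SIZE, PROT_READ | PROT_WRITE);  /* let the write proceed  */
}

/* Write-protect a page-aligned guest code region and install the handler. */
void protect_guest_code(void *start, size_t len)
{
    struct sigaction sa;
    sa.sa_sigaction = smc_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    mprotect(start, len, PROT_READ);   /* writes to guest code now fault */
}
```

The page would be re-protected once code from it is translated again; a write into the BB that is currently executing needs extra care, as noted above.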
Code Cache: Eviction handling
- Cache is full, but a new translation has to be stored
- LRU
  - evict the least recently used translated BB
  - problems
    - need to maintain the order of uses => overhead (time stamps / linked lists)
    - fragmentation of the cache storage
      - possible solution: buddy lists
    - eviction events happen often, so they need to be fast
  - => LRU is rarely used. What is better?
- Goal: minimum runtime overhead
- Solutions:
  - complete flush when full; see the sketch after this list
    - advantage: regular retranslation
      - adaptation of optimizations to the current runtime behavior
      - old/unneeded translations get removed
    - disadvantage: frequently used BBs have to be translated often
  - flush on detection of an execution-phase change
  - blocks with FIFO replacement
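A minimal sketch of the “complete flush when full” policy, continuing the bump-allocated cache sketch from above; tt_clear() is an assumed helper that empties the SPC => TPC table so no stale TPC survives the flush.

```c
#include <stddef.h>

void *cc_alloc(const void *host_code, size_t len);  /* from the code-cache sketch   */
extern size_t cc_used;                              /* bump pointer of that cache   */
void tt_clear(void);                                /* hypothetical: empty SPC => TPC table */

/* Throw away every translation; blocks are retranslated on their next use. */
void cc_flush_all(void)
{
    cc_used = 0;                       /* reuse the buffer from the start   */
    tt_clear();                        /* no entry may point at stale code  */
}

void *cc_alloc_or_flush(const void *host_code, size_t len)
{
    void *tpc = cc_alloc(host_code, len);
    if (!tpc) {                        /* cache full */
        cc_flush_all();                /* cheap: no LRU bookkeeping, no fragmentation */
        tpc = cc_alloc(host_code, len);
    }
    return tpc;
}
```

Flushing everything also sidesteps the interdependency problem mentioned earlier: chained blocks never outlive the blocks they jump to.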
Chaining
- Observation: expensive actions
  - lookup SPC => TPC
  - indirect jump
- Chaining = linking translated BBs; see the patch sketch after this list
- on a known successor
  - last guest instruction is
    - an unconditional jump
    - a conditional jump (2 cases: taken / fall-through)
  - look up the jump targets at translation time
  - what if the successor is not translated yet?
- on an unknown successor
  - indirect jump
    - convert into if-then-else chains (profile the targets!)
  - return from a function
    - use a shadow stack (similar to return-address prediction)
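A minimal sketch of the patching step on an x86-64 host: each translated BB initially ends with a jump back to the dispatcher; once the successor's TPC is known, that exit is overwritten with a direct “jmp rel32” to it, so the SPC => TPC lookup disappears from the hot path. The 5-byte encoding and the assumption that both blocks lie within ±2 GiB of each other are x86-64 specifics.

```c
#include <stdint.h>
#include <string.h>

/* Overwrite the 5-byte exit at 'exit_jump' with "jmp rel32" to 'target_tpc'.
 * rel32 is relative to the end of the jump instruction (exit_jump + 5). */
void chain_exit(uint8_t *exit_jump, const uint8_t *target_tpc)
{
    int32_t rel = (int32_t)((intptr_t)target_tpc - (intptr_t)(exit_jump + 5));
    exit_jump[0] = 0xE9;               /* opcode of "jmp rel32" */
    memcpy(exit_jump + 1, &rel, sizeof rel);
}
```

For a conditional branch the same patch is applied twice (taken and fall-through exits). If the successor is not translated yet, the exit keeps pointing at the dispatcher and is patched later; conversely, unchaining (restoring the dispatcher jump) is what makes invalidating chained blocks tricky.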
Superblocks (SB)
- Motivation
  - reduce the number of jumps for given guest code (same goal as chaining)
  - larger translation units allow for more optimization possibilities (see next slides)
- Superblock
  - one entry, multiple “side exits”
    - branches may leave the SB in the middle: an unconditional jump in the original code is simply followed (translation just continues at its target), while a conditional jump creates a side exit at that point
  - combines a sequence of BBs; see the formation sketch below
    - works best if later executions follow the same path as the full SB (it is “hot”) => predict from the past
    - also called a “trace” (“tracing JITs”) => an execution path of BBs
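A minimal sketch of trace-style superblock formation, assuming a hypothetical profiling helper next_bb_on_recorded_path() that returns the successor BB taken during recent (hot) executions, or 0 to stop; the size limit and stop condition are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_SB_BBS 16                  /* illustrative limit on superblock length */

typedef struct {
    uint64_t spc[MAX_SB_BBS];          /* guest start addresses of the member BBs */
    size_t   n;
} superblock_t;

/* Hypothetical profiling helper: successor BB on the recently observed
 * (hot) path starting at 'spc', or 0 when the recording ends. */
uint64_t next_bb_on_recorded_path(uint64_t spc);

superblock_t form_superblock(uint64_t hot_spc)
{
    superblock_t sb = { .n = 0 };
    uint64_t spc = hot_spc;
    while (spc != 0 && sb.n < MAX_SB_BBS) {
        sb.spc[sb.n++] = spc;          /* BB joins the single-entry trace */
        spc = next_bb_on_recorded_path(spc);
    }
    return sb;                         /* conditional branches off this path become side exits */
}
```

When the translated superblock later runs and a conditional branch leaves the predicted path, the corresponding side exit transfers control back to the dispatcher (or to another translation).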