# Examining what’s wrong with your CPU: Meltdown and Spectre bug

[Originally written on Medium.com]

Meltdown and Spectre have taken the whole cyber security industry by storm and took various multinational companies, which include: Intel, NVIDIA, ARM and AMD, with surprise. There have been rumors that, in the wake of these bugs, some CEOs are already cashing on their company stocks. Software companies are releasing fixes out of their development cycles and cloud computing providers are mass mailing their customers in advance for any inconvenience during inevitable updates.

Experts have claimed that this class of vulnerability allow unhindered access to the whole of the computer memory. Think:

• JavaScript on a random website gaining access to information you entered on other tabs
• Other VMs reading data inside your VM if you use shared hosting

This bug is a nasty one.

Testing found that mitigation strategies will have up to 30% performance reduction and can be more in specialized cases (Case in point: iPhone 6). Updates that are being rolled out are breaking computers left and right.

And we can’t even confirm if it can actually be fixed.

# So, what is this bug?

Quoting Wikipedia:

Spectre is a vulnerability with implementations of branch prediction that affects modern microprocessors with speculative execution.

That is a mouthful of words. Lets break it down.

## Speculative execution

Modern processors try hard to do all the work you give them as soon as possible. But, sometimes, they just cannot. Especially, when that computation requires some data that is not already available to them. In such cases, while waiting for it’s arrival, they use statistics and magic to predict the value of the required data and do useful work with the predicted value but don’t actually commit the values to the memory. The mechanism of this predictor is collectively know as “branch prediction”.

Once the data has arrived, the predicted value is compared against the actual value and the computation is committed to memory if the values match. Otherwise, computation done is thrown away and everything is reworked with correct value. This speculation on the processor’s part is called Speculative execution.

## Branch prediction

Every high performance, modern processor has logic built into it to predict either a memory address after function return or a branch in an ‘if-else’ type statement in your code by extensively profiling it and storing it’s history in internally. There are various ways a branch predictor can work but you only need to understand the role they play, which is to predict program flow in advance. Your processor literally has a mini brain and some manufacturers went as far as putting an actual neural network in there.

## Cache

Access to the main memory is often painfully slow on most modern architectures, which is the reason why multiple levels of increasingly smaller, closer and incrementally faster memory is present on hardware. Caches are also transparent from a software point of view. These are named as ‘LX’ cache where X is a number which represent how close it is to the main processor with L1 being the closest and fastest.

Every cache can store limited number of entries, where each entry occupies a fixed size on the cache. When an entry storing required data is found in a cache, it is a cache hit, otherwise a cache miss. When a cache hit occurs, the requested data can be returned in a few cycles but when a cache miss occurs, it may take multiple 1000 cycles to return data.

# How do they fit together?

The processor spends 95% of it’s time executing the same tightly packed loops over and over again and these loops usually only make 1% of the whole program. The CPU is carefully designed to maximize the execution of this 1% code. CPU designers achieve this by:

• Having lots of cache so that the processor is not stalled on memory access.
• Predicting the branch which the program will probably execute and fulfill it’s requirements in advance.
• Executing the instructions of predicted branch speculatively and commit computation if prediction matches the actual value.

# Back to the bug…

Although speculative execution is supposed to have no side effects if the comparison step fails, it just so happened that speculative execution caused new entries to be added into the cache which would otherwise be only in the RAM.

The clever part is that the information whether certain entries exists in the cache, by measuring the time taken to access it, can be used to infer protected memory 1 bit at a time.

## Enter Time stamp counter instruction…

The Time Stamp Counter (TSC) is present on modern processors. It counts the number of cycles since start of the processor.

Using TSC instruction (or RDTSCP as it is called in x86), it is possible to find out how much time it took for some data to be retrieved from the memory. Hence, infer some information about the data.

Consider the code snippet below:

int length = 8 // variable used to train branch predictor
int access_pattern[9] = {0, 1, 2, 3, 4, 5, 6, 7, 10000}
int arr[4096] = { /*random 4096 values*/ }
int accu = 0

for (int i = 0; i < 9; i = i + 1) {
function_flush_cache()
accu = access_pattern[i]
if (i < length) {
int out_of_bound_memory = arr[accu]
int offset = (out_of_bound_memory&1 * 256) + 512
int res = arr[offset]
accu = res + 1
} else {
int t1 = function_get_tsc()
accu = arr[512]
int t2 = function_get_tsc()
int diff1 = t2 - t1
int t3 = function_get_tsc()
accu = arr[768]
int t4 = function_get_tsc()
int diff2 = t4 - t3
if (diff1 > diff2) {
} else {
}
}
}


Let’s trace the execution of this code as it happens inside the CPU.

For the first 8 iterations of the ‘for’ loop, it is executed as one would expect it. This is done with the purpose of reliably training the branch predictor to predict consequent block of the ‘if’ statement.

The meat is in the alternative block of the ‘if’ statement.

In the 9th iteration, when the processor tries to evaluate the condition, it asks the memory controller to fetch the value of variable ‘length’ and in the meanwhile, it starts speculative execution on the consequent block of the ‘if’ statement because the branch predictor had confidence that the consequent block will execute due to previous training and history.

During this speculative execution, important events that occur are:

• Protected memory at location arr[10000] (variable i is 8 and offset is access_pattern[i]) is fetched and as this is speculative execution, out-of-bound error is not raised.
• 1 bit is extracted out of this protected memory and depending on the value of this bit either arr[256] or arr[768] is cached.
• Once variable ‘length’ arrives, all computation done during speculative execution is rolled back but the cache still contains either one of arr[256] or arr[768].

The alternative block now tries to read arr[512] and arr[768] and counts the cycles it takes. If arr[512] loads faster than arr[768], that means the value of the extracted bit is 0, otherwise it is 1.

Congratulations! You have just read one bit of protected memory and in a similar way you can actually get whole dump of the system memory albeit at a very slow speed of ~2000 bytes/second.