What you should know about Meltdown and Spectre
According to recent press reports, a newly discovered form of security threat involves attackers exploiting common features of the modern microprocessors (aka chips) that power our computers, tablets, smartphones, and other smart devices.
These attacks, called “Meltdown” and “Spectre”, are getting a lot of attention. People are (rightly) concerned, and it’s of course very important to apply all of the necessary software updates that have been carefully produced and made available. Technology leaders, including Red Hat, are working together to address these exploits and minimise the risk of potential attacks.
To get started, let’s understand a bit about “speculative execution” by looking at an everyday analogy.
Suppose a regular customer visits the same coffee shop and orders the same caffeinated beverage every morning. Over time, the customer gets to know the baristas, who become familiar with the customer’s order. Wanting to provide good service (and save their valued customer some time standing in line), the baristas eventually decide to start preparing the customer’s order when they wave at them as they enter through the front door. But one day, the customer changes their order. Now the barista has to throw away the previously prepared coffee and make a new one while the customer waits.
Taking the analogy one step further, suppose the baristas know the customer’s name, and they like to write that name using a permanent marker on the cup. When they speculatively prepare the usual brew, they write the customer’s name on the cup. If the customer comes in with a different order, the speculated cup is thrown away along with its contents. But in so doing, the cup’s personally identifying information is briefly visible to anyone watching.
This coffee shop scenario involves speculation. The staff don’t know for sure when the customer comes in whether they’re going to order a latte or an Americano, but they know from historical data what the customer typically orders, and they make an educated guess to save the customer waiting. Similar speculation happens throughout our everyday lives because such guesses often turn out to be right, and we can get more done in the same amount of time as a result. It’s like this with our computers. They use a technique known as “speculative execution” to perform certain processing operations before it is known for certain that those operations will be required, on the premise that these guesses often turn out to save time.
In the case of computers, speculative execution is used to decide what to do when confronted by a test like “if A, do this; otherwise, do that”. We call these tests conditions, and the code that executes as a result is part of what we term a conditional branch. A branch just means a section of the program that we choose to run in response to whatever the result of the condition turns out to be. Modern computer chips have sophisticated “branch predictors” that use fancy algorithms to determine what the result of the conditional test is likely to be while that test is still being calculated. In the interim, they speculatively execute code in the branch that seems most likely to run. If the guess turns out to be right, the chip appears to run faster than if it had waited for the test to complete. If the guess is wrong, the chip has to throw away any speculative results and run the other branch. Branch predictors are often over 99% accurate at guessing.
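To make the idea of a branch predictor a little more concrete, here is a minimal sketch in C of a textbook two-bit saturating counter. It is only a toy model: the predictors in real chips are far more elaborate, and the names used here (predictor_t, predict, train, count_correct) are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy two-bit saturating counter.
 * Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken". */
typedef struct {
    unsigned counter; /* always stays in the range 0..3 */
} predictor_t;

/* Guess the outcome of the branch before the condition is known. */
static bool predict(const predictor_t *p) {
    return p->counter >= 2;
}

/* Once the condition has been resolved, nudge the counter towards
 * the real outcome, saturating at 0 and 3. */
static void train(predictor_t *p, bool taken) {
    if (taken && p->counter < 3) {
        p->counter++;
    } else if (!taken && p->counter > 0) {
        p->counter--;
    }
}

/* Replay a recorded sequence of branch outcomes, starting from a
 * counter of 0, and return how many guesses were right. */
static size_t count_correct(const bool *outcomes, size_t n) {
    predictor_t p = { 0 };
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        if (predict(&p) == outcomes[i]) {
            correct++;
        }
        train(&p, outcomes[i]);
    }
    return correct;
}
```

For a loop branch that is taken 99 times and then falls through once, this toy model guesses wrong twice while it warms up and once at the loop exit, so it is right 97 times out of 100.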
As you can see, the potential performance benefit from a chip speculatively executing the correct branch of code is significant. Indeed, speculative execution is one of the many optimisations that have helped to dramatically speed up our computers over the past couple of decades. When implemented correctly, the resulting performance benefit is substantial. The source of the newly discovered problems is that chip designs attempted to optimise further by assuming that the speculation process is a black box that is completely invisible to outside observers (or bad guys).
Conventional industry wisdom was that whatever happened during the process of speculation (known as a “speculative execution window”) was either later confirmed and the results were used by the program, or it was not used and completely discarded. But it turns out that there are ways attackers can view what happened within the speculation window and manipulate the system as a result. An attacker can also steer the behaviour of branch predictors to cause certain code sequences to run speculatively that should never normally have been executed. We expect these vulnerabilities and other similar flaws which could exploit speculative execution to lead to fundamental changes in the way that future chips are designed so that we can have speculative execution without security risks.
Let’s dive a bit deeper into the attacks, starting with Meltdown (variant 3), which received a lot of attention because of its broad impact. In this form of attack, the chip is fooled into loading secured data during a speculation window in such a way that it can later be viewed by an unauthorised attacker. The attack relies upon a commonly used, industry-wide practice that separates loading in-memory data from the process of checking permissions. Again, the industry’s conventional wisdom operated under the assumption that the entire speculative execution process was invisible, so separating these pieces wasn’t seen as a risk.
In Meltdown, a carefully crafted branch of code first arranges to execute some attack code speculatively. This code loads some secure data to which the program doesn’t ordinarily have access. Because it’s happening speculatively, the permission check on that access will happen in parallel (and not fail until the end of the speculation window), and as a consequence special internal chip memory known as a cache becomes loaded with the privileged data. Then, a carefully constructed code sequence is used to perform other memory operations based upon the value of the privileged data. While the normally observable results of these operations aren’t visible following the speculation (which ultimately is discarded), a technique known as cache side-channel analysis can be used to determine the value of the secure data.
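The logic of a cache side channel can be sketched with a toy model in C. This is a simulation under simplifying assumptions, not an exploit: the "cache" is just an array of flags, and every name here (toy_cache_t, flush_all, speculative_touch, recover_byte) is hypothetical. In a real attack the attacker cannot read such flags directly and instead infers them by timing memory accesses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PROBE_LINES 256 /* one probe line per possible byte value */

/* Toy stand-in for the chip's cache: true means "this line is cached". */
typedef struct {
    bool cached[PROBE_LINES];
} toy_cache_t;

/* Step 1 (attacker): make sure no probe line is cached. */
static void flush_all(toy_cache_t *c) {
    memset(c->cached, 0, sizeof c->cached);
}

/* Step 2 (speculation): a memory operation whose address depends on the
 * secret byte. The computed result is discarded afterwards, but the line
 * it touched stays in the cache. */
static void speculative_touch(toy_cache_t *c, uint8_t secret) {
    c->cached[secret] = true;
}

/* Step 3 (attacker): find which probe line is now cached. A real attacker
 * would time an access to each line and pick the fast one.
 * Returns -1 if no line is cached. */
static int recover_byte(const toy_cache_t *c) {
    for (int i = 0; i < PROBE_LINES; i++) {
        if (c->cached[i]) {
            return i;
        }
    }
    return -1;
}
```

Calling flush_all, then speculative_touch with some byte value, then recover_byte hands that same value back. The index of the one cached line is the secret, even though the speculative result itself was thrown away.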
Mitigating Meltdown involves changing how memory is managed between application software and the operating system. We introduce a new technology, known as KPTI (Kernel Page Table Isolation), which separates memory such that secure data cannot be loaded into the chip’s internal caches while running user code. Taking extra steps every time application software asks the operating system to do something on its behalf (we call these “system calls”) results in a performance hit. The degree of performance hit varies roughly in line with how frequently an application needs to use such operating system services.
The Spectre attack has two parts. The first (variant 1) has to do with a “bounds check” violation (often called a bounds check bypass). Once again, when speculatively executing code, the chip might load some data that is later used to locate a second piece of data. As part of a performance optimisation, the chip might attempt to speculatively load the second piece of data before it has validated that the first is within a defined range of values. If this happens, it is possible to arrange for code to execute speculatively and pull data it should not be able to read into the system caches, from where it can be extracted using a side-channel attack similar to the one discussed before.
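The kind of code that variant 1 targets looks entirely ordinary. The sketch below, in C, shows the general shape: a bounds check followed by two dependent loads. The array names, sizes, and the 64-byte stride (an assumed cache-line size) are illustrative assumptions rather than code from any real kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16
#define STRIDE 64 /* assumed cache-line size in bytes */

static uint8_t array1[ARRAY1_SIZE];  /* data the caller is allowed to index */
static uint8_t array2[256 * STRIDE]; /* second array, indexed by the first load */

/* Architecturally this function is correct: an out-of-range x never
 * reaches the loads. The concern is that a chip may speculatively run
 * both loads before the comparison has finished, leaving a cache
 * footprint that depends on whatever byte sits at array1 + x. */
uint8_t lookup(size_t x) {
    if (x < ARRAY1_SIZE) {             /* the bounds check */
        uint8_t first = array1[x];     /* first load */
        return array2[first * STRIDE]; /* second load, located using the first */
    }
    return 0;
}
```

Nothing in this function is a bug in the traditional sense, which is part of why this class of issue went unnoticed: the exposure comes from how the hardware may run the code, not from what the source says.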
Mitigating the first part of Spectre involves adding what we call “load fences” throughout the kernel. They prevent the speculation hardware from attempting to perform a second load based upon a first load. These require small, trivial, and not particularly performance-impacting changes throughout the kernel source. Our toolchain team has developed some tooling and worked with others to help determine where these load fences should be located.
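As a rough illustration of a load fence, the C sketch below places one between a bounds check and the loads that depend on it. It assumes a GCC- or Clang-style compiler targeting x86 with SSE2, where the _mm_lfence() intrinsic from <emmintrin.h> is available. The LOAD_FENCE macro, the array names, and the sizes are hypothetical, and the real kernel changes use the kernel's own primitives, which are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__SSE2__)
#include <emmintrin.h> /* provides _mm_lfence() */
#define LOAD_FENCE() _mm_lfence()
#else
/* Placeholder only: other architectures and compilers need their own
 * barrier, and this no-op provides no protection. */
#define LOAD_FENCE() ((void)0)
#endif

#define TABLE_SIZE 16
#define PROBE_STRIDE 64 /* assumed cache-line size in bytes */

static uint8_t table[TABLE_SIZE];
static uint8_t probe[256 * PROBE_STRIDE];

uint8_t lookup_fenced(size_t x) {
    if (x < TABLE_SIZE) {
        /* Do not begin the loads below until the check above has
         * actually completed. */
        LOAD_FENCE();
        return probe[table[x] * PROBE_STRIDE];
    }
    return 0;
}
```

The function returns exactly what it would without the fence. Only the order in which the chip is allowed to start the loads changes, which is why such changes are small in source terms.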
The second part of Spectre (variant 2) is in some ways the most interesting. It has to do with “training” the branch predictor hardware to favour speculatively executing pieces of code over those it should be executing. A common hardware optimisation is to base the behaviour of a given branch choice upon the location in memory of the branch code itself. Unfortunately, the way in which this memory location is stored isn’t unique between an application and the operating system kernel. This allows for the predictor to be trained to speculatively run whatever code the attacker would like. By carefully choosing a “gadget” (existing code in the kernel that has access to privileged data) the attacker can load sensitive data into the chip caches, where the same kind of side-channel attack once again serves to extract it.
One of the biggest problems posed by this second part of Spectre is its potential to exploit the boundary between the operating system kernel and a hypervisor, or between different virtual machines running on the same underlying hardware. The branch predictor can be trained by one virtual machine to cause privileged code in the hypervisor (or another virtual machine instance) to access trusted hypervisor data, which can then be extracted using a side channel. This poses a significant risk to private and public cloud environments running unpatched servers.
Mitigating this second part of Spectre requires that the operating system (selectively) disable branch prediction hardware whenever a program requests operating system (system call) or hypervisor services, so that any attempt by malicious code to train the predictor won’t carry over into the operating system kernel, the hypervisor, or between untrusted virtual machines running on the same server. This approach works well, but it comes at a performance penalty that is not insignificant. Red Hat’s patches will default to implementing the security change and accepting the performance impact, but we’ve also given system administrators the ability to toggle this (and all the implemented settings) on or off. We are also working with the larger Linux community to reduce this impact over time by examining alternatives to disabling branch prediction. One possible alternative is known as a “retpoline”, a specially contrived way to run operating system kernel code that prevents incorrect branch speculation.
Hopefully, this post has given a little more insight into these highly sophisticated attacks. Exploiting them is far from trivial, mitigations are possible, and while some examples are now available online for Meltdown (variant 3), patches are available via updates shipping from major vendors like Red Hat. Over time, additional, related vulnerabilities may be discovered, and example code to exploit them posted online, so it’s important to keep up to date with security fixes as they become available.
It’s important to bear in mind that these are early days following the discovery of an entirely new class of system security vulnerabilities, and, as a result, mitigations and associated best practice advice may change over time.