Earlier this month, security researcher Jann Horn from Google’s Project Zero reported the discovery of serious vulnerabilities in most modern CPUs, most notably those from Intel. According to researchers, “virtually every user of a personal computer” is at risk.
The vulnerabilities, now collectively referred to as Meltdown and Spectre, exploit CPU performance enhancement features. If exploited, they could give hackers access to passwords, photos, and other sensitive data. Intel CPUs, which drive most computer platforms in the market today, are most heavily affected because they have a few tweaks that AMD/ARM CPUs do not. ARM CPUs are more common in mobile devices; they are apparently not affected by the Meltdown attack, but most are vulnerable to the Spectre attacks.
These attacks allow one process to access data, including memory page contents, that would otherwise be restricted. According to Google’s research, this could even extend to the point of breaking the VM isolation boundary, e.g. a VM running malicious code for this attack may be able to access memory of another VM on the same host. This is detrimental for data centers and even worse for cloud platforms.
Cisco was one of the first networking vendors to come forward with comments on the situation. They assured end-users that there is no attack vector on their appliance platforms unless another vulnerability that allows arbitrary code execution is exploited first. In other words, these attacks require local code to run on the target machine. Since routers, switches, firewalls, and load balancers generally do not allow just anyone to run a piece of software, an attacker would first need another exploit to gain the ability to run arbitrary code, and only then could theoretically execute one of these new attacks. Such arbitrary code execution vulnerabilities are found from time to time, but keeping up with software updates on networking appliances should prevent most attacks. This also goes for virtual appliances (such as ASAv, CSR1000V, NGFWv, etc.), though Cisco acknowledges that if the virtual server platform these systems run on is compromised, the virtual appliances could be impacted.
One exception to the limited impact on routers and switches is those that can run user-specified processes in containers, such as the Open Agent container feature on some Cisco Nexus switches or Open Service containers on Cisco ASR routers. Always make sure you run software from trusted sources in containers on your network equipment.
That leads us to where the real impact is: server platforms
Attackers leveraging Spectre and Meltdown will primarily be targeting servers. Although these attacks are relatively difficult to execute, they still have the potential to surface. Since server operating systems can, by nature, execute arbitrary code, these exploits can be run on any system that has had the malicious code placed on it in some way. As of right now, no viruses/worms/malware are known to use these attacks, but it’s probably just a matter of time. These exploits cannot be run just by visiting a website, viewing an email, etc.; they need code to run locally on the target machine. However, typical malware insertion methods (whether a worm or phony email attachments) could be used to get the code to a target machine when combined with other attacks.
UCS servers are vulnerable because, like most servers today, they have Intel CPUs. This is not Cisco’s fault; they use the impacted CPU components, like almost every server shipped since around 1995. Per the Cisco PSIRT linked above, there will be microcode updates available on February 18th for most Cisco server platforms, which, when applied in concert with applicable OS updates, should mitigate the vulnerabilities. The UCS update will be a BIOS update that updates the CPU microcode.
Mitigations and Mitigation Impacts:
For Meltdown, the easiest mitigation is an OS patch. Apple, Microsoft, and Linux all have patches available. Operating systems derived from Linux will need to be updated once the upstream updates are incorporated into their builds. This is the case for many embedded platforms (including some Cisco routers, switches, etc.).
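Once a Linux system has been patched, it is worth confirming that the mitigation is actually active. The Python sketch below is one way to do that, under a stated assumption that is not from the Cisco advisory: it relies on the `/sys/devices/system/cpu/vulnerabilities` directory that patched Linux kernels (mainline 4.15 and later, plus many distribution backports) expose. The function names are our own, chosen for illustration. On other operating systems, or on older kernels, the directory will simply be missing and the script reports that.

```python
from pathlib import Path

# Assumption: the kernel exposes one status file per vulnerability here.
# Older kernels and non-Linux systems will not have this directory.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: kernel_status_string} for each file found.

    Returns an empty dict when the directory does not exist.
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                status[entry.name] = "unreadable"
    return status


def needs_attention(status_text):
    """True when the kernel reports the CPU as vulnerable (no full mitigation)."""
    return status_text.startswith("Vulnerable")


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("No vulnerability status directory found on this system.")
    for name, text in report.items():
        flag = "CHECK" if needs_attention(text) else "ok"
        print(f"{name:20s} {flag:5s} {text}")
```

A line flagged `CHECK` means the kernel itself considers that issue unmitigated, which usually points to a missing OS patch or missing CPU microcode.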
There is a CPU performance impact since the mitigation is to disable performance features. The extent of that performance hit is highly dependent on workload. Apparently, a single-workload system or one which does limited context switching (e.g., an end-user laptop/desktop) will not see a significant impact on the performance front. Our engineers suspect that dedicated network appliances (routers, switches) will not see much impact even when they are eventually patched at the OS level for the same reason.
Where it gets concerning is the server/data center/virtualization environment. Here, context switching is very frequent due to the multiple workloads running on each CPU. These mitigation patches not only disable some performance enhancements in the CPUs, but also require doing MORE work to secure memory contents before switching to a different task. The numbers generally floating around for the impact are 5-30%, and that’s a lot. Again, it is hard to predict the impact on a specific workload.
Here is a graph showing the impact to two server instances before and after OS patching:
You can see a 10-20% jump in CPU. Ouch.
Click Here to see more examples of impacted workloads. Some have a negligible impact, and others are significant.
In the link above, Redis and PostgreSQL (both database platforms) saw the most impact.
Mitigating the Spectre attacks is more difficult and no specific mitigations have been released yet for most platforms.
So, the potential here is that everyone effectively loses CPU capacity in their compute environments. We can’t predict how much, but potentially enough to require a larger server farm to handle the same workload than was previously required. Once patches are applied, end-customers will start to see the impacts on their workloads and will have to adjust future sizing appropriately. In the worst case, they may need to backfill for “lost” CPU capacity if their workloads were already taxing their CPU resources.
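To get a rough feel for what “backfilling” could mean, the short Python sketch below turns an assumed overhead into a server count. Everything in it is illustrative: the function name, the 10-server farm, the 60% starting utilization, and the 70% utilization ceiling are hypothetical inputs, and the overhead values are simply points inside the 5-30% range quoted above. It also assumes the workload spreads evenly across identical servers, which real environments rarely do.

```python
import math


def servers_needed(current_servers, current_utilization, overhead, target_utilization):
    """Estimate servers required to carry the same workload after patching.

    current_utilization and target_utilization are fractions (0.60 means 60%).
    overhead is the assumed relative increase in CPU work (0.20 means 20%).
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if current_servers < 0 or current_utilization < 0 or overhead < 0:
        raise ValueError("inputs must be non-negative")
    # Total CPU demand, expressed in units of one fully busy server.
    demand_before = current_servers * current_utilization
    demand_after = demand_before * (1 + overhead)
    # Round before ceil so floating-point noise does not add a phantom server.
    return math.ceil(round(demand_after / target_utilization, 9))


if __name__ == "__main__":
    # Hypothetical farm: 10 servers at 60% CPU, with a 70% utilization ceiling.
    for overhead in (0.05, 0.15, 0.30):
        count = servers_needed(10, 0.60, overhead, 0.70)
        print(f"{overhead:.0%} overhead -> {count} servers")
```

The takeaway matches the point above: a farm with headroom may absorb the hit, while one already running near its ceiling may need more hardware for the same workload.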
Tips going forward:
- Keep an eye on the PSIRT page for Cisco details
- Carry out your operating system patching procedures ASAP (despite the potential performance hit)
- Reach out to your H.A. account manager to set up any upcoming UCS updates