Online.net Services status

  • Status Progress
  • Percent Complete 40%
  • Task Type Bug Report
  • Category Backend / Core
  • Severity Critical
  • Priority Flash

FS#1116 - Scaleway/Online - Emergency security update required

Several days ago, we became aware of a security vulnerability impacting x86 and ARM processors used by Scaleway and other cloud providers.
Our Security team proactively took the decision to perform a major security update on all our hypervisors.

Tomorrow, we will perform a security update on all impacted hypervisors and will need to reboot the servers running on top of them.
A maintenance window has been scheduled from 01/04/18 at 7am UTC to 01/06/18 at 7am UTC.

During this maintenance, servers running on top of impacted hypervisors will be unavailable for a few minutes during the reboot phase.
We will reboot clusters one at a time to limit downtime on your infrastructure.

We sincerely apologize for the short notice. We believe security and privacy are crucial on cloud platforms, and we decided today to trade some availability in favor of security.

The Scaleway Security Team

You can check out Intel's statement at the following address: https://newsroom.intel.com/news/intel-responds-to-security-research-findings/
We're also maintaining updates on our blog: https://blog.online.net/2018/01/03/important-note-about-the-security-flaw-impacting-arm-intel-hardware/


Thursday, 04 January 2018, 11:15 GMT
##### Update #1 - 01/04/18 11am UTC

According to the latest update from Intel, a microcode update is required to completely fix the bug. At this time, the microcode release is scheduled for an undisclosed, confidential and unacceptably late date. Given the emergency, we decided to perform a first reboot of the platform right now to update the hypervisor kernels, even if we need to perform a second one when the microcode becomes available.

We will start by patching our Workload Intensive hypervisors in the coming hours.

According to the latest update from Cavium, ThunderX SoCs are NOT vulnerable. We're still waiting for a more thorough update.
Thursday, 04 January 2018, 11:59 GMT
##### Update #2 - 01/04/18 12pm UTC

We just released the 4.14.11 kernel bootscript so every Scaleway customer can move to a fixed kernel. To upgrade your server kernel, simply change your bootscript by selecting `4.14.11 rev1` and then reboot your server. A soft reboot from the OS is sufficient to apply the change.

If you have a large-scale deployment, this can of course be automated with our CLI. Check out the snippet to perform the operation.
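
After the reboot, it can be useful to confirm that a server actually came back on the fixed kernel. The following is a minimal sketch, not an official tool: the helper names are hypothetical, and it assumes that the only fixed kernel offered at this point is 4.14.11 and that release strings start with a `major.minor.patch` prefix.

```python
import platform
import re

# Assumption: at the time of this update, 4.14.11 is the only fixed kernel offered.
FIXED_KERNEL = (4, 14, 11)


def parse_release(release):
    """Extract (major, minor, patch) from a release string such as '4.14.11-rev1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def runs_fixed_kernel(release):
    """Return True if the release is at least the fixed version named above."""
    return parse_release(release) >= FIXED_KERNEL


def current_release():
    """Return the release string of the running kernel (same value as `uname -r`)."""
    return platform.release()
```

On a server, `runs_fixed_kernel(current_release())` should return `True` once the new bootscript is in effect. A plain tuple comparison is enough here only because a single fixed version exists; it would wrongly reject older stable branches that receive backported fixes.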
Thursday, 04 January 2018, 14:29 GMT
##### Update #3 - 01/04/18 2pm UTC

We are currently mailing Online Dedibox customers to inform them about critical security vulnerabilities affecting many CPU architectures (CVE-2017-5753 - http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5753, CVE-2017-5715 - http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5715 and CVE-2017-5754 - http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5754).
To our knowledge, no distribution officially ships a fixed kernel at this time, but we encourage you to check regularly for security updates and to upgrade your kernel once a fix is available.

For more information regarding these vulnerabilities, check out the following links:
- https://meltdownattack.com/
- https://googleprojectzero.blogspot.fr/2018/01/reading-privileged-memory-with-side.html
Thursday, 04 January 2018, 14:57 GMT
##### Update #4 - 01/04/18 3pm UTC

We are starting to fix a small batch of Starter Cloud and Workload Intensive hypervisors and plan to increase the deployment of the patch in the coming hours depending on our initial results.
Thursday, 04 January 2018, 16:25 GMT
##### Update #5 - 01/04/18 4pm UTC

Deployment of the fix on the Online Web Hosting platform is in progress.
During this operation, expect a few minutes of downtime during the reboot phase of the core infrastructure servers.
We are carefully monitoring the performance impact.
Thursday, 04 January 2018, 19:07 GMT
##### Update #6 - 01/04/18 7pm UTC

According to the latest news from concordant sources:
- We have confirmation that Meltdown is fixed by upgrading to a kernel that includes KPTI, starting with kernel version 4.14.11.
- The combination of the kernel update and the microcode completely fixes the Meltdown and Spectre vulnerabilities. At this time, we do not have any microcode available for any of our Online Dedibox and Scaleway cloud servers.
- Concerning performance, we now know that both the microcode upgrade and the kernel upgrade will have a non-negligible impact, especially on IO-intensive applications. We cannot yet quantify the performance reduction.
- We decided to delay the reboot of all hypervisors until tomorrow 11am UTC to avoid multiple infrastructure downtimes and to fix both Meltdown and Spectre at the same time. This operation may be rescheduled depending on the information we receive in the meantime.

For customers using BareMetal servers, we will also need to apply the microcode on each server to secure it against Spectre.
Earlier today, we made a 4.14.11 kernel available via bootscript. We strongly encourage you to update your server kernel and reboot as soon as possible.
Thursday, 04 January 2018, 20:29 GMT
##### Update #7 - 01/04/18 8pm UTC

Several days ago, we became aware of a security vulnerability impacting x86 and more recently ARM processors used by Scaleway and other cloud providers.

For 72 hours now, our security and SRE teams have been working to understand and eradicate the Meltdown and Spectre vulnerabilities impacting our servers and all CPUs worldwide.

Due to the incomplete information provided by hardware manufacturers, we joined forces with other potentially impacted cloud providers, including Linode, Packet and OVH, and created a dedicated communication channel to share information and work together to address the Meltdown and Spectre vulnerabilities.

A few minutes ago, we got confirmation from Supermicro that they will deliver a microcode upgrade for our Workload Intensive servers tomorrow evening. We should then be able to totally secure our Workload Intensive hypervisors.

At this stage, we are also working on these issues for all our other offers.
Thursday, 04 January 2018, 22:54 GMT
##### Update #8 - 01/04/18 10:55 pm UTC

Our current understanding of the situation is that, on Intel CPUs:
- Spectre 1 (bounds check bypass, CVE-2017-5753) is both hard to exploit and hard to patch, but has a limited impact.
- Spectre 2 (branch target injection, CVE-2017-5715) will be fixed in the short term using a microcode update, with a performance impact (the exact delay depends on Intel and the server manufacturers). We currently have no confirmation of the exact kernel version needed to work in conjunction with the microcode update. Our current assumption is that the required fixes are not yet in the main kernel tree and will be merged in 4.15. In the long term, the vulnerability could be fixed with Retpoline to reduce the performance impact, but because of the large amount of work involved (everything needs to be recompiled), it will probably not be available for several weeks.
- Meltdown (rogue data cache load, CVE-2017-5754) is completely mitigated by the KPTI patches merged in 4.14.11.
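
The dependencies in the list above can be summarized in a few lines of code. This is only an illustrative sketch of the understanding stated in this update, not a vulnerability scanner: the function and parameter names are hypothetical, and the rules simply restate the three bullets (no fix described for Spectre 1, microcode plus matching kernel support for Spectre 2, a KPTI kernel for Meltdown).

```python
# CVE identifiers and names as listed in the update above.
VARIANTS = {
    "CVE-2017-5753": "Spectre 1 (bounds check bypass)",
    "CVE-2017-5715": "Spectre 2 (branch target injection)",
    "CVE-2017-5754": "Meltdown (rogue data cache load)",
}


def unmitigated(kpti_kernel, microcode, spectre2_kernel_support):
    """Return the CVEs still open on a host, following the update's current understanding.

    kpti_kernel: the host runs a kernel with the KPTI patches (4.14.11 or later).
    microcode: the updated CPU microcode has been applied.
    spectre2_kernel_support: the kernel carries the fixes that work with the microcode.
    """
    # The update describes no available fix for Spectre 1, so it is always reported.
    open_cves = ["CVE-2017-5753"]
    # Spectre 2 needs the microcode and the matching kernel support together.
    if not (microcode and spectre2_kernel_support):
        open_cves.append("CVE-2017-5715")
    # Meltdown is mitigated by the KPTI kernel alone.
    if not kpti_kernel:
        open_cves.append("CVE-2017-5754")
    return open_cves
```

For example, a server that has only moved to the 4.14.11 bootscript would be described by `unmitigated(True, False, False)`, which leaves both Spectre variants open.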

We will continue to deploy patches to address the Meltdown and Spectre issues over the coming days. Our ability to resolve the Spectre 2 vulnerability depends directly on how quickly Intel and the manufacturers release their fixes.

Next update tomorrow at 9am UTC.
Friday, 05 January 2018, 09:42 GMT
##### Update #9 - 01/05/18 9:30am UTC

Since yesterday evening, we have been actively tracking the Linux kernel tree and are currently waiting for the IBRS patches to be merged.

On the distribution side:
- Debian 9 has already backported the KPTI patches to the 4.9 branch to mitigate Meltdown: https://security-tracker.debian.org/tracker/CVE-2017-5754
- Dustin Kirkland, VP of Product at Canonical, announced that the community can expect updated Ubuntu kernels by the original coordinated release date of January 9, 2018, and sooner if possible: http://blog.dustinkirkland.com/2018/01/ubuntu-updates-for-meltdown-spectre.html
- OpenBSD is silent; there are no updates for NetBSD or FreeBSD yet, but they have acknowledged the problem: https://www.freebsd.org/news/newsflash.html#event20180104:01

At the same time, we are investigating the QEMU and KVM sides to understand the complete mitigation process needed to fully secure both the guest and the host against all vulnerabilities.

We expect to receive the first microcode updates from our hardware providers to mitigate Spectre 2 in the coming hours.
Friday, 05 January 2018, 12:26 GMT
##### Update #10 - 01/05/18 12:10pm UTC

Last night, Digital Ocean, Vultr, Nexcess and prgmr.com joined our dedicated response communication platform to centralize efforts.

From our latest information, it seems that variant 3 (Meltdown) cannot be exploited to cross VM boundaries on KVM due to the way memory is managed. A guest cannot read the memory of the hypervisor or of another guest VM, even with virtio.
At this stage, we believe variant 2 is exploitable on KVM; we are still investigating.

That means that all Scaleway cloud riders can already protect their servers from Meltdown by upgrading their servers' bootscript.


We just received from Dell the microcode update for R730 and R730XD servers.
If you have a running Dell R730 or R730XD (Dedibox server), you will be able to reboot your server in the coming hours to apply the microcode.

Important: note that the microcode doesn't fix the vulnerabilities without the kernel update. At the moment, fixed kernels are not yet publicly distributed.

We will send an email to all our Dedibox customers once we have all the microcodes and the updated kernels.

At this time, we have not received any microcode or information from other hardware vendors, including HP, QCT, IBM and Cavium.
Friday, 05 January 2018, 14:10 GMT
##### Update #11 - 01/05/18 2pm UTC

The microcode for Dell R730 and R730XD servers (Dedibox ENT SATA 2015, ENT SSD 2015, mWOPR SATA 2015, mWOPR SSD 2015, WOPR SATA 2015, WOPR SSD 2015, ST12 SSD 2016 and ST24 SSD 2016) has been deployed.

If you have one of the servers listed above, you can reboot to apply the microcode. It's a permanent microcode fix applied via the BIOS!

Important:
- The reboot can take up to 15 minutes due to the microcode update.
- Note that the microcode doesn't fix the vulnerabilities without the kernel update. At the moment, kernel fixes are not yet publicly distributed.
- We will send an email to all our Dedibox customers once we get all the microcodes and the updated kernels.
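
To confirm that the reboot actually loaded a new microcode, one option is to compare the revision reported by the kernel before and after rebooting. The sketch below is an assumption-laden example rather than an official procedure: it relies on the `microcode` field that Linux exposes in `/proc/cpuinfo` on x86 systems, and the helper names are hypothetical.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions reported across all logical CPUs."""
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE))


def read_microcode_revisions(path="/proc/cpuinfo"):
    """Read the revisions from the running system (x86 Linux only)."""
    with open(path) as handle:
        return microcode_revisions(handle.read())
```

Recording `read_microcode_revisions()` before the reboot and comparing it with the value afterwards shows whether the revision changed. A set containing more than one revision would suggest that not every core runs the same microcode. As noted above, a new revision alone does not fix the vulnerabilities without the kernel update.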

We received the QCT microcode for X10E-9N (Dedibox LT and MD 2017). We expect to deploy it in a few minutes.

90% of our shared hosting platform is patched against Meltdown.
Friday, 05 January 2018, 17:07 GMT
##### Update #13 - 01/05/18 4:45pm UTC

Dedibox & Scaleway - Starting now, we will maintain the status of all our server ranges via the table available here: https://blog.online.net/2018/01/03/important-note-about-the-security-flaw-impacting-arm-intel-hardware/
Important note: BIOS updates for X10E-9N, DSS1510, DSS2500 and R730 have already been pushed and are applied with a single soft reboot.
Important:
1/ Unlike the live microcode update, the microcode fix via BIOS upgrade is permanent and is not distribution dependent.
2/ The reboot can take up to 15 minutes due to the microcode update.
3/ Note that the microcode doesn't fix the vulnerabilities without the kernel update. At the moment, kernel fixes are not yet publicly distributed.

Web Hosting - 99% of our shared hosting platform is patched against Meltdown.
Scaleway Customers Kernels - We are building 4.14.12 and LTS 4.9.75 & 4.4.110 kernels. Kernel 4.14.11 has been available via bootscript since yesterday 3am UTC. The 4.14.11 kernel is stable and fixes the Meltdown vulnerability.
Scaleway ARMv8 - Cavium confirms that ThunderX is not affected at all by Meltdown, Spectre 1 and Spectre 2.
Friday, 05 January 2018, 18:54 GMT
##### Update #14 - 01/05/18 6:15pm UTC

*Online Cloud Web Hosting*

Tomorrow morning, we will upgrade the Online Cloud Web Hosting platform.
During this maintenance, expect a few minutes of downtime during the reboot phase of the servers. We are carefully monitoring the performance impact.

*Scaleway X64 Workload Intensive servers*

A few minutes ago, we received a release candidate of the Supermicro patched BIOS including the fixed microcode. We are currently testing and validating this.
We plan to upgrade X64 Workload Intensive hypervisors as soon as fixed kernels are publicly distributed.

*Dedibox Classic 2016*

The patched BIOS for the X10SDE server (Dedibox Classic 2016) will be deployed once we finish a short validation. We will update the table once it is effective.
Friday, 05 January 2018, 23:49 GMT
##### Update #15 - 01/05/18 23:30 UTC

We released the 4.14.12, 4.9.75 and 4.4.110 kernels, available via bootscript, for all x86-64 servers. To protect against Meltdown, simply change your bootscript by selecting any of these kernels and then reboot your server. A soft reboot from the OS is sufficient to apply the change.

If you have a large-scale deployment, this can of course be automated with our CLI. Check out the snippet to perform the operation.
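
For a larger fleet, a first step before automating the bootscript change is to work out which servers still run a kernel older than the fixed releases. The sketch below is a hedged illustration: the server names are hypothetical, the inventory of kernel releases is assumed to be collected by other means (for example `uname -r` on each host), and the minimum patch levels come from this status log (4.14.11 from the earlier updates, 4.9.75 and 4.4.110 from this one).

```python
import re

# Minimum fixed patch level per kernel branch, as stated in this status log.
MIN_PATCH = {(4, 14): 11, (4, 9): 75, (4, 4): 110}


def needs_new_bootscript(release):
    """Return True if the release is older than the fixed kernel of its branch."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return True  # unknown format: flag the server for manual review
    major, minor, patch = map(int, match.groups())
    minimum = MIN_PATCH.get((major, minor))
    if minimum is None:
        return True  # branch not covered by this update: flag it as well
    return patch < minimum


def servers_to_update(fleet):
    """Given {server name: kernel release}, return the names that need attention."""
    return sorted(name for name, release in fleet.items() if needs_new_bootscript(release))
```

The check is deliberately conservative: any branch not listed in `MIN_PATCH` is flagged rather than assumed safe, so newer kernels show up for manual review instead of being silently skipped.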

Next update tomorrow morning (UTC).
Saturday, 06 January 2018, 10:23 GMT
##### Update #16 - 01/06/18 10am UTC

*Online Cloud Web Hosting*

We are currently upgrading the Online Cloud Web Hosting platform to secure it against Meltdown.
During the maintenance, expect a few minutes of downtime during the reboot of the servers. We are carefully monitoring the performance impact.

We will update the status when the platform is secured.
Sunday, 07 January 2018, 11:30 GMT
##### Update #17 - 01/07/18 11am UTC

*Dedibox*

We received several BIOS updates including the updated microcode from Supermicro last night at 2:30am UTC.
Our team validated and deployed the BIOS yesterday on the following Dedibox offers:

- Dedibox Classic 2016
- Dedibox LT 2016, Dedibox MD 2016

If you have one of the servers listed above, you can reboot to apply the microcode (a soft reboot from the OS is sufficient). It's a permanent microcode fix via BIOS update!

Important:
1. Unlike the live microcode update, the microcode fix via BIOS upgrade is permanent and is not distribution dependent.
2. The reboot can take up to 15 minutes due to the microcode update.
3. Note that the microcode doesn't fix the vulnerabilities without the kernel update. At the moment, kernel fixes are not yet publicly distributed.

*Scaleway Workload Intensive servers*

The microcode update for the Scaleway Workload Intensive servers is now completely validated; we're waiting for the kernel-level patches to update our fleet.
Monday, 08 January 2018, 11:59 GMT
##### Update #18 - 01/08/18 12pm UTC

Two new cloud providers, AWS and Tata Communications, and core members of the Red Hat and Ubuntu teams have joined the mitigation task force!

*Scaleway Cloud Platform and Online Web Hosting*
We are currently working on:
- improving the hypervisor upgrade process to reduce downtime
- Spectre 2 mitigation using an IBRS-enabled kernel (still waiting for patches to be merged) + microcode upgrade
- Retpoline Testing

To make the status of Spectre and Meltdown mitigation easier to follow, we're working on a dedicated status page.
