reduce image to avoid big gap

Arthur Grisel-Davy 2023-06-29 22:28:58 -04:00
parent bd43a83046
commit fa74984497


@ -65,7 +65,7 @@
#import "template.typ": * #import "template.typ": *
#show: ieee.with( #show: ieee.with(
title: "Independent Few-shot Firmware Integrity Verification with Side-Channel Power Analysis", title: "Independent Boot Process Verification using Side-Channel Power Analysis",
abstract: [ abstract: [
Firmware attacks on embedded systems can have disastrous security implications. Firmware attacks on embedded systems can have disastrous security implications.
Through the firmware update mechanism, an attacker can tamper with the firmware to open known vulnerabilities, change security settings, or deploy custom backdoors, to pave the way for subsequent attacks or gain complete machine control. Through the firmware update mechanism, an attacker can tamper with the firmware to open known vulnerabilities, change security settings, or deploy custom backdoors, to pave the way for subsequent attacks or gain complete machine control.
@ -76,7 +76,7 @@
In this paper, we propose a physics-based @IDS called the @BPV that only relies on side-channel power consumption measurement to verify the integrity of the bootup sequence. In this paper, we propose a physics-based @IDS called the @BPV that only relies on side-channel power consumption measurement to verify the integrity of the bootup sequence.
The @BPV works in complete independence from the machine to protect and requires only a few nominal training samples to establish a baseline of nominal behaviour. The @BPV works in complete independence from the machine to protect and requires only a few nominal training samples to establish a baseline of nominal behaviour.
The range of application of this approach potentially extends to any embedded systems. The range of application of this approach potentially extends to any embedded systems.
We present test cases that illustrate the performances of the @BPV for micro-PC, network equipment (switches and wireless access points), and a drone. We present three test cases that illustrate the performances of the @BPV on micro-PC, network equipment (switches and wireless access points), and a drone.
], ],
authors: ( authors: (
( (
@ -121,7 +121,7 @@
= Introduction = Introduction
The firmware of any embedded system is susceptible to attacks. Since firmware provides many security features, it is always of major interest to attackers. The firmware of any embedded system is susceptible to attacks. Since firmware provides many security features, it is always of major interest to attackers.
Every year, a steady number of new vulnerabilities are discovered. Any device that requires firmware, such as computers @185175, @PLC @BASNIGHT201376, or @IoT devices @rieck2016attacks, is vulnerable to these attacks. Every year, new firmware vulnerabilities are discovered. Any device that requires firmware, such as computers @185175, @PLC @BASNIGHT201376, or @IoT devices @rieck2016attacks, is vulnerable to these attacks.
There are multiple ways to leverage a firmware attack. Reverting firmware to an older version allows an attacker to reopen discovered and documented flaws. There are multiple ways to leverage a firmware attack. Reverting firmware to an older version allows an attacker to reopen discovered and documented flaws.
Cancelling an update can ensure that previously deployed attacks remain available. Finally, implementing custom firmware enables full access to the machine. Cancelling an update can ensure that previously deployed attacks remain available. Finally, implementing custom firmware enables full access to the machine.
@ -135,32 +135,30 @@ The integrity verification can also be performed at run-time as part of the firm
The above solutions to firmware attacks share the common flaw of being applied to the same machine they are installed on. The above solutions to firmware attacks share the common flaw of being applied to the same machine they are installed on.
This allows an attacker to bypass these countermeasures after infecting the machine. This allows an attacker to bypass these countermeasures after infecting the machine.
An attacker that could avoid triggering a verification, tamper with the verification mechanism, feed forged data to the verification mechanism, or falsify the verification report could render any defense useless. An attacker that could avoid triggering a verification, tamper with the verification mechanism, feed forged data to the verification mechanism, or falsify the verification report could render any defense useless.
// This idea of necessary independence between the target and the @IDS can be summarized in the following statement.\ In order to avoid this flaw, the @IDS must leverage data that can be trusted even from a compromised machine.
// #align(center,text(weight: "bold", [An @IDS is incoherent if its deployment requires the cooperation of the entity it pretends to protect.])) @IDS are then subject to a trade-off between having access to relevant and meaningful information and keeping the detection mechanism separated from the target machine.
@IDS are subject to a trade-off between having access to relevant and meaningful information and keeping the detection mechanism separated from the target machine.
Our solution addresses this trade-off by leveraging side-channel information. Our solution addresses this trade-off by leveraging side-channel information.
== Contributions == Contributions
This paper presents a novel solution for firmware verification using side-channel analysis. This paper presents a novel solution for firmware verification using side-channel analysis.
Building on the assumption that every security mechanism operating on a host is vulnerable to being bypassed, we propose using the device's power consumption signature during the firmware execution to assess its integrity. Building on the assumption that every security mechanism the host cooperation is vulnerable to being bypassed, we propose using the device's power consumption signature during the firmware execution to assess its integrity.
Because of the intrinsic properties of side-channel information, the integrity evaluation is based on does not involve any communication with the host and is based on data difficult to forge. Because of the intrinsic properties of side-channel information, the integrity evaluation does not involve any communication with the host and is based on trustworthy data.
A distance-based outlier detector that uses power traces of a nominal boot-up sequence can learn the expected pattern and detect any variation in a new boot-up sequence. A distance-based outlier detector that uses power traces of a nominal boot-up sequence can learn the expected pattern and detect any variation in a new boot-up sequence.
This novel solution can detect various attacks centred around manipulating firmware. This novel solution can detect various attacks centered around manipulating firmware.
In addition to its versatility of detection, it is also easily retrofittable to almost any embedded system with @DC input and a consistent boot sequence. In addition to its versatility of detection, it is also easily retrofittable to almost any embedded system.
It requires minimal training examples and minor hardware modification in most cases, especially for DC-powered devices. It requires minimal training examples and minor hardware modification in most cases, especially for DC-powered devices.
== Paper Organization == Paper Organization
We elaborate on the type of attacks that our method aims to mitigate in the threat model @threat and the technology we leverage to capture relevant information in Section @SCA. We elaborate on the type of attacks that our method aims to mitigate in the threat model @threat, and the technology we leverage to capture relevant information in Section @SCA.
@bpv describes the proposed solution. @bpv describes the proposed solution.
@exp-network,~@exp-drone, and~@aim present test cases that illustrate applications and variations of the @BPV. @exp-network,~@exp-drone, and~@aim present test cases that illustrate applications and variations of the @BPV.
Finally, the paper finishes with @discussion that provides more insight on specific aspects of the proposed solution and Section~@conclusion for the conclusion. Finally, the paper ends with @discussion that provides more insight on specific aspects of the proposed solution and @conclusion for the conclusion.
= Related Work = Related Work
Historically, the firmware was written on a @ROM, and it was impossible to change. Historically, the firmware was written on a @ROM, and it was impossible to change.
With the growing complexity of embedded systems, manufacturers developed procedures to enable remote firmware upgrades. With the growing complexity of embedded systems, manufacturers developed procedures to enable remote firmware upgrades.
Firmware upgrades can address performances or security flaws or, less frequently, add features. Firmware upgrades can address performances or security flaws or add features.
Unfortunately, attackers can leverage these firmware upgrade mechanisms to implement unauthorized or malicious pieces of software in the machine. Unfortunately, attackers can leverage these firmware upgrade mechanisms to implement unauthorized or malicious pieces of software in the machine.
Almost all embedded systems are vulnerable to firmware attacks. Almost all embedded systems are vulnerable to firmware attacks.
In industrial applications, studies proposed firmware attacks on control systems such as power systems field devices @power-devices, @PLC @plc_firmware, or any other industrial embedded system @santamarta2012here. In industrial applications, studies proposed firmware attacks on control systems such as power systems field devices @power-devices, @PLC @plc_firmware, or any other industrial embedded system @santamarta2012here.
@ -171,14 +169,14 @@ Manufacturers have implemented different security mechanisms to prevent firmware
The most common protection is code signing @8726545 @4531926. The most common protection is code signing @8726545 @4531926.
The firmware code is cryptographically signed, or a checksum is computed. The firmware code is cryptographically signed, or a checksum is computed.
This method suffers many possible bypasses. This method suffers many possible bypasses.
First, the firmware can be modified at the manufacturer level @BASNIGHT201376, generating a trusted signature of the modified firmware. First, an attack can modify the firmware at the manufacturer level @BASNIGHT201376, generating a trusted signature of the modified firmware.
Second, the verification can be bypassed @9065145. Second, malware can bypass the verification @9065145.
Finally, the result of the test can be forged to report valid firmware, even with dedicated hardware @thrangrycats. Finally, an attacker can forge the result of the test to report valid firmware, even with dedicated hardware @thrangrycats.
Blockchain technology is also considered for guaranteeing firmware integrity @blockchain1. Blockchain technology is also considered for guaranteeing firmware integrity @blockchain1.
Blockchain is a cryptographic chain of trust where each link is integrated into the next to guarantee that the information in the chain has not been modified. A blockchain is a cryptographic chain of trust where each link is integrated into the next to guarantee that the information in the chain has not been modified.
This technology could provide software integrity verification at each point where a supply chain attack is possible. This technology could provide software integrity verification at each point where a supply chain attack is possible.
However, the blockchain still needs to be verified at some point, and this verification can still be bypassed or forged. However, the blockchain still needs to be verified at some point, and this verification can still be bypassed or forged.
Overall, no security mechanism that requires interacting with the host machine can guarantee firmware integrity as the host machine can already be compromised and thus produce forged results. Overall, no security mechanism that requires interacting with the host machine can guarantee firmware integrity as a compromised machine can produce forged results.
// SCA provides a way to verify the integrity without interacting with the host. // SCA provides a way to verify the integrity without interacting with the host.
Historically, attackers leveraged @SCA in general and power analysis in particular @sca_attack. Historically, attackers leveraged @SCA in general and power analysis in particular @sca_attack.
@ -186,7 +184,7 @@ Power consumption generally leaks execution information about the software activ
Clark et al. proposed a method to identify web page visits from @AC power @clark_current_2013. Clark et al. proposed a method to identify web page visits from @AC power @clark_current_2013.
Conti et al. developed a method for identifying laptop-user pairs from power consumption @10.1145-2899007.2899009. Conti et al. developed a method for identifying laptop-user pairs from power consumption @10.1145-2899007.2899009.
Seemingly harmless power consumption data from a mobile phone can even leak position data @michalevsky2015powerspy. Seemingly harmless power consumption data from a mobile phone can even leak position data @michalevsky2015powerspy.
All these methods illustrate the potential of power side channels for attacks, but a well-intentioned program could also leverage them for defense purposes. All these methods illustrate the potential of power side channels for attacks, but a well-intentioned program could also leverage them for defense.
After all, the lack of interaction required with the machine benefits the defense mechanism by increasing bypasses difficulty. After all, the lack of interaction required with the machine benefits the defense mechanism by increasing bypasses difficulty.
Following this idea, Clark et al. @wud proposed in 2013 a power consumption-based malware detector for medical devices. Following this idea, Clark et al. @wud proposed in 2013 a power consumption-based malware detector for medical devices.
Hernandez et al. included power consumption with network data for malware detection @8855288. Hernandez et al. included power consumption with network data for malware detection @8855288.
@ -198,11 +196,11 @@ Second, it is hard to fake from the developer's point of view. Because of the mu
This is especially true when considering firmware or machines with low computation capabilities or highly specialized devices that have deterministic and stable execution patterns at boot-up. This is especially true when considering firmware or machines with low computation capabilities or highly specialized devices that have deterministic and stable execution patterns at boot-up.
However, to the best of our knowledge, no work leveraged the same data or method for firmware integrity verification. However, to the best of our knowledge, no work leveraged the same data or method for firmware integrity verification.
Bootups are a natural target for defensive purposes as they are notoriously hard to protect, and host-based @IDS are not yet active in defending the machine. Bootups are a natural target for defensive purposes as they are notoriously hard to protect, and host-based @IDS are not yet active to defend the machine.
Moreover, bootup produces significantly more consistent power consumption than normal operation on general-purpose machines as it follows a pre-defined process. Moreover, bootup produces significantly more consistent power consumption than normal operation on general-purpose machines as it follows a pre-defined process.
In light of the potential of side-channel attacks, some work proposed manipulating power consumption patterns. In light of the potential of side-channel attacks, some work proposed manipulating power consumption patterns.
Defense mechanism like Maya @pothukuchi2021maya proposes to obfuscate specific activity pattern by applying a control method to target a pre-defined mask. Defense mechanisms like Maya @pothukuchi2021maya propose to obfuscate specific activity patterns by applying a control method to target a pre-defined mask.
If changing the power consumption pattern of software to impersonate another is possible, that could decrease the potential of side-channel-based @IDS. If changing the power consumption pattern of software to impersonate another is possible, that could decrease the potential of side-channel-based @IDS.
However, the current work is designed for defense and aims at obfuscating the patterns by applying masks with the goal of making all power signatures similar, not impersonating a specific one. However, the current work is designed for defense and aims at obfuscating the patterns by applying masks with the goal of making all power signatures similar, not impersonating a specific one.
Thus, power consumption remains a trustworthy source of information as a different set of instructions necessarily generates a different power consumption. Thus, power consumption remains a trustworthy source of information as a different set of instructions necessarily generates a different power consumption.
@ -215,23 +213,24 @@ If given enough time, information or access, an attacker could take complete con
A firmware modification is defined as implementing any change in the firmware code. A firmware modification is defined as implementing any change in the firmware code.
Modifications include implementing custom functions, removing security features, or changing the firmware for a different version (downgrade or upgrade). Modifications include implementing custom functions, removing security features, or changing the firmware for a different version (downgrade or upgrade).
As long as the firmware is different from the one expected by the system administrator, we consider that it has been modified. As long as the firmware is different from the one expected by the system administrator, we consider that it has been modified.
Downgrading the firmware to an older version is an efficient way to render a machine vulnerable to attacks. Downgrading the firmware to an older version (also called firmware rollback) is an efficient way to render a machine vulnerable to attacks.
Opposite to writing custom firmware, it requires little information about the machine. Opposite to writing custom firmware, it requires little information about the machine.
All the documentation and resources are easily accessible online from the manufacturer. All the documentation and resources are easily accessible online from the manufacturer.
Even the exploits are likely to be documented as they are the reason for the firmware upgrade. Even the exploits are likely to be documented as they are the reason for the firmware upgrade.
An attacker would only need to wait for vulnerabilities to be discovered and then revert the firmware to an older version. An attacker would only need to wait for vulnerabilities to be discovered and then revert the firmware to an older version.
These properties make the firmware downgrade a powerful first step to performing more elaborate attacks. These properties make the firmware downgrade a powerful first step to performing more elaborate attacks.
Manufacturers sometimes implement firmware anti-rollback mechanisms to prevent this type of attack, but they are also vulnerable to bypass.
Custom firmware may be required for more subtle or advanced attacks. Custom firmware may be required for more subtle or advanced attacks.
This requires more work and information as firmware codes are usually not open source and are challenging to reverse engineer. This requires more work and information as firmware codes are usually not open source and are challenging to reverse engineer.
Moreover, the firmware is tailored for a specific machine, and it can be difficult for an attacker to perform a custom firmware attack. Moreover, the firmware is tailored for a specific machine, and it can be difficult for an attacker to perform a custom firmware attack.
Although, if a custom firmware can be successfully implemented, almost any attack can be performed. Although, the successful implementation of custom firmware can lead to performing almost any attack.
Finally, a firmware upgrade could also be used to open a newly discovered vulnerability. Finally, a firmware upgrade could also be used to open a newly discovered vulnerability.
A complete firmware change is another form of firmware manipulation. A complete firmware change is another form of firmware manipulation.
The manufacturer's firmware is replaced by another available firmware that supports the same machine. The manufacturer's firmware is replaced by another available firmware that supports the same machine.
Such alternatives can be found for computers @coreboot, routers @owrt @ddwrt @freshtomato, but also video game consoles or various embedded machines. Such alternatives can be found for computers @coreboot, routers @owrt @ddwrt @freshtomato, but also video game consoles or various embedded machines.
These alternative firmware are often open-source and provide more features, capabilities and performances as they are updated and optimized by their community. These alternative firmware are often open-source and provide more features, capabilities and performances as they are updated and optimized by their community.
Implementing alternative firmware on a machine could allow an attacker to gain control of it without necessarily alerting the end user. Although these firmware are typically not malicious, implementing alternative firmware on a machine could allow an attacker to gain control of it without necessarily alerting the end user.
// = Side Channel Analysis<sca> // = Side Channel Analysis<sca>
// @SCA leverages the emissions of a system to gain information about its operations. // @SCA leverages the emissions of a system to gain information about its operations.
@ -278,21 +277,21 @@ However, the bootup of a machine is a rare event, and thus the training dataset
The training sequence of the @BPV computes the distance threshold based on a statistical description of the distribution of the distance between each pair of normal traces. The training sequence of the @BPV computes the distance threshold based on a statistical description of the distribution of the distance between each pair of normal traces.
The training sequence follows two steps. The training sequence follows two steps.
+ The sequence computes the distance between all pairs of training traces $D = {d(t_i,t_j) forall i,j in [1,...,N]^2; i eq.not j }$. + The sequence computes the Euclidean distance between all pairs of training traces $D = {d(t_i,t_j) forall i,j in [1,...,N]^2; i eq.not j }$.
+ The sequence computes the threshold as $"thresh" = 1.5 dot "IQR"(D)$ with IQR the Inter-Quartile Range of the distances set $D$. + The sequence computes the threshold as $"thresh" = 1.5 dot "IQR"(D)$ with IQR the Inter-Quartile Range of the distances set $D$.
The @IQR is a measure of the dispersion of samples. The @IQR is a measure of the dispersion of samples.
It is based on the first and third quartiles and defined as $ "IQR" = Q_3 - Q_1$ with $Q_3$ being the third quartile and $Q_1$ being the first quartile. It is based on the first and third quartiles and defined as $ "IQR" = Q_3 - Q_1$ with $Q_3$ being the third quartile and $Q_1$ being the first quartile.
This value is commonly used @han2011data to detect outliers as a similar but more robust alternative to the $3"sigma"$ interval of a Gaussian distribution. This value is common @han2011data for detecting outliers as a similar but more robust alternative to the $3 times sigma$ interval of a Gaussian distribution.
To apply the @IQR to the times series, we compute first compute the average of the NORMAL traces. To apply the @IQR to the times series, we first compute the average of the nominal traces.
This average serves as a reference for computing the distance of each trace. This average serves as a reference for computing the distance of each trace.
The Euclidean distance is computed between each trace and the reference, and the @IQR of these distances is computed. The Euclidean distance is computed between each trace and the reference, and the @IQR of these distances is computed.
The distance threshold takes the value $1.5 * "IQR"$. For the detection, the distance of each new trace to the reference is computed and compared to the threshold. The distance threshold takes the value $1.5 times "IQR"$. For the detection, the distance of each new trace to the reference is computed and compared to the threshold.
If the distance is above the threshold, the new trace is considered anomalous. The new trace is considered anomalous if the distance is above the threshold.
=== Support For Multi-modal Bootup Sequences<multi-modal> === Support For Multi-modal Bootup Sequences<multi-modal>
Some machines can boot following multiple different bootup sequences that are considered normal. Some machines can boot following multiple different bootup sequences that are considered normal.
There can exist various reasons for such behaviour. There exist various reasons for such behaviour.
For example, a machine can perform recovery operations if the power is interrupted while the machine is off or perform health checks on components that may pass or fail and trigger deeper inspections procedure. For example, a machine can perform recovery operations if the power is interrupted while the machine is off or perform health checks on components that may pass or fail and trigger deeper inspections procedure.
Because the machines are treated as black boxes, it is important for the @BPV to deal with these multiple modes during training. Because the machines are treated as black boxes, it is important for the @BPV to deal with these multiple modes during training.
See @online for more details about how the online training procedure deals with multi-modal models. See @online for more details about how the online training procedure deals with multi-modal models.
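The training and detection steps described in the hunk above (average the nominal traces into a reference, take the Euclidean distance of each trace to that reference, and set the threshold to 1.5 times the IQR of those distances) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names `train_threshold` and `is_anomalous` are hypothetical, traces are assumed to be equal-length lists of power samples, and the quartiles use the inclusive interpolation method from Python's standard library, which the text does not specify.

```python
import math
import statistics


def train_threshold(nominal_traces):
    """Return (reference, threshold) from equal-length nominal power traces.

    Hypothetical helper: follows the textual description literally.
    """
    n = len(nominal_traces)
    # The point-wise average of the nominal traces serves as the reference.
    reference = [sum(samples) / n for samples in zip(*nominal_traces)]
    # Euclidean distance of every nominal trace to the reference.
    distances = [math.dist(trace, reference) for trace in nominal_traces]
    # IQR = Q3 - Q1 (quartile method is an assumption, see lead-in).
    q1, _, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    return reference, 1.5 * (q3 - q1)


def is_anomalous(trace, reference, threshold):
    """A new trace is anomalous if its distance to the reference exceeds the threshold."""
    return math.dist(trace, reference) > threshold


# Toy one-sample "traces", used only to exercise the functions.
reference, threshold = train_threshold([[0.0], [1.0], [2.0], [3.0], [4.0]])
print(is_anomalous([2.5], reference, threshold))
print(is_anomalous([10.0], reference, threshold))
```

Because the rule is applied exactly as written, a nominal trace that lies far from the reference relative to the spread of the distances can itself exceed the threshold. The sketch makes no attempt to correct for this.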
@ -317,11 +316,12 @@ This test case illustrates the first application of the @BPV and follows a sligh
First, the power consumption measurement does not only contain the consumption of the machine to protect. First, the power consumption measurement does not only contain the consumption of the machine to protect.
In some cases, capturing only the power consumption of the machine to protect is impossible. In some cases, capturing only the power consumption of the machine to protect is impossible.
For example, if the power connections follow proprietary designs or if the machine to protect is inaccessible (for practical or security reasons). For example, if the power connections follow proprietary designs or if the machine to protect is inaccessible (for practical or security reasons).
In this case, the data available is an aggregate of the machine to protect and a second machine. In this case, the available data is an aggregate of the machine to protect and a second machine.
The second machine does not perform any task, and its contribution to the aggregated power consumption is constant. The second machine does not perform any task, and its contribution to the aggregated power consumption is constant.
Second, anomalous examples of bootup sequences are available. Second, anomalous examples of bootup sequences are available.
This test case was designed with an industry partner for the detection of two specific attacks: bootup on an external USB drive and access to the machine's @BIOS. This test case was designed with an industry partner for the detection of two specific attacks: bootup on an external USB drive and access to the machine's @BIOS.
Because the machine and the expected attacks are known in advance, it is possible to tailor the @BPV's parameters for maximizing the performances at detecting the attacks.
Because the machine and the expected attacks are known in advance, it is possible to tailor the @BPV's parameters to maximize the performance at detecting the attacks.
Because of these two specificities, this test case should be regarded as a first iteration to demonstrate the potential of the @BPV in a more restrictive environment. Because of these two specificities, this test case should be regarded as a first iteration to demonstrate the potential of the @BPV in a more restrictive environment.
The following test cases in @exp-network and @exp-drone present other applications in more challenging environments. The following test cases in @exp-network and @exp-drone present other applications in more challenging environments.
@ -329,12 +329,12 @@ The following test cases in @exp-network and @exp-drone present other applicatio
This test case was conducted on a micro PC running Windows 8. This test case was conducted on a micro PC running Windows 8.
The available power consumption was an aggregate of two micro-pc, one being the machine to protect. The available power consumption was an aggregate of two micro-pc, one being the machine to protect.
The second machine remained idle for the duration of the experiment. The second machine remained idle for the duration of the experiment.
@l3-setup illustrates the setup for the data capture. // @l3-setup illustrates the setup for the data capture.
#figure( // #figure(
image("images/l3-setup.svg", width:100%), // image("images/l3-setup.svg", width:100%),
caption: [Overview of the setup for the test case.] // caption: [Overview of the experiment setup for test case 0.]
)<l3-setup> // )<l3-setup>
From these samples representing nominal bootups, it appears that the machine presents multiple bootup modes. From these samples representing nominal bootups, it appears that the machine presents multiple bootup modes.
Hence, the model is multi-modal with three modes. Hence, the model is multi-modal with three modes.
@ -359,16 +359,16 @@ The models are manually tuned to obtain 100% accuracy in the classification of n
Obtaining 100% accuracy illustrates that there is a clear separation between nominal and abnormal boot sequences for this type of attack. Obtaining 100% accuracy illustrates that there is a clear separation between nominal and abnormal boot sequences for this type of attack.
//#agd[could not redo the results as teh data for bios boot are missing] //#agd[could not redo the results as teh data for bios boot are missing]
Although this test case represents an unrealistic situation (mainly because the anomalous samples are accessible), it is still a valuable first evaluation of the @BPV. Although this test case represents an unrealistic situation (mainly because the anomalous samples are accessible during training), it is still a valuable first evaluation of the @BPV.
This test case serves as a proof-of-concept and indicates that there is a potential for the detection of firmware-level attacks with power consumption. This test case serves as a proof-of-concept and indicates that there is a potential for the detection of firmware-level attacks with power consumption.
The method detected the pre-defined attack with complete independence from the machine and with a perfect success rate. The @BPV detected the pre-defined attack with complete independence from the machine and with a perfect success rate.
Having access to anomalous samples enabled us to optimize the threshold placement to minimize false-positive (nominal bootups detected as anomalous) by relaxing the threshold value. Having access to anomalous samples enabled us to optimize the threshold placement to minimize false-positive (nominal bootups detected as anomalous) by relaxing the threshold value.
= Test Case 1: Network Devices<exp-network> = Test Case 1: Network Devices<exp-network>
To verify the performance of the proposed detector, we design an experiment that aims at detecting firmware modifications on different devices. To verify the performance of the proposed detector, we design an experiment that aims at detecting firmware modifications on different devices.
Networking devices are a vital component of any organization, from individual houses to complete data centers @downtime. Networking devices are a vital component of any organization, from individual houses to complete data centers.
A network failure can result in significant downtime that is extremely expensive for data centers. A network failure can result in significant downtime that is extremely expensive for data centers @downtime.
Compromised network devices can also result in data breaches and @APT. Compromised network devices can also result in data breaches and @APT.
These devices are generally highly specialized in processing and transmitting information as fast as possible. These devices are generally highly specialized in processing and transmitting information as fast as possible.
We consider four machines that represent consumer-available products for different prices and performance ranges. We consider four machines that represent consumer-available products for different prices and performance ranges.
@ -376,7 +376,7 @@ We consider four machines that represent consumer-available products for differe
- Asus Router RT-N12 D1. This router is a low-end product that provides switch, router and wireless access point capabilities for home usage. - Asus Router RT-N12 D1. This router is a low-end product that provides switch, router and wireless access point capabilities for home usage.
- Linksys Router MR8300 v1.1. This router is a mid-range product that offers the same capabilities as the Asus router with better performance at a higher price. - Linksys Router MR8300 v1.1. This router is a mid-range product that offers the same capabilities as the Asus router with better performance at a higher price.
- TP-Link Switch T1500G-10PS. This 8-port switch offers some security features for low-load usage. - TP-Link Switch T1500G-10PS. This 8-port switch offers some security features for low-load usage.
- HP Switch Procurve 2650 J4899B. This product is enterprise-oriented and provides more performance than the TP-Link switch. This is the only product of the selection that required hardware modification, as the power supply is internal to the machine. The modification consists in cutting the 5V cables to implement the capture system. - HP Switch Procurve 2650 J4899B. This product is enterprise-oriented and provides more performance than the TP-Link switch. This is the only product of the selection that required hardware modification, as the power supply is internal to the machine. The modification consists in cutting the 5V cables to install the capture system.
None of the selected devices supports the installation of host-based @IDS or firmware integrity verification. None of the selected devices supports the installation of host-based @IDS or firmware integrity verification.
The firmware is verified only during updates with a proprietary mechanism. The firmware is verified only during updates with a proprietary mechanism.
@ -402,12 +402,14 @@ The changes are listed in @tab-machines.
[Asus Router], [Latest EOM], [OpenWrt\ v21.02.2], [OpenWrt\ v21.02.0], [Asus Router], [Latest EOM], [OpenWrt\ v21.02.2], [OpenWrt\ v21.02.0],
[Linksys\ Router], [Latest EOM], [OpenWrt\ v21.02.2], [OpenWrt\ v21.02.0], [Linksys\ Router], [Latest EOM], [OpenWrt\ v21.02.2], [OpenWrt\ v21.02.0],
), ),
supplement: [Table],
kind: "table",
caption: [Machines used for the experiments and the changes applied.], caption: [Machines used for the experiments and the changes applied.],
)<tab-machines> )<tab-machines>
This experiment aims at simulating an attack situation by performing firmware modifications on the target devices and recording the boot-up power trace data for each version. This experiment aims at simulating an attack situation by performing firmware modifications on the target devices and recording the boot-up power trace data for each version.
For the switches, we flash different firmware versions provided by the \gle{oem}. For the switches, we flash different firmware versions provided by the @OEM.
For wireless routers, their firmware is changed from the @OEM to different versions of #link("https://openwrt.org/")[OpenWrt]. For wireless routers, their firmware is changed from the @OEM to different versions of #link("https://openwrt.org/")[OpenWrt].
In this study, we consider the latest @OEM firmware version to be the nominal version, expected to be installed on the machine by default. In this study, we consider the latest @OEM firmware version to be the nominal version, expected to be installed on the machine by default.
Any other version or firmware represents an attack and is considered anomalous. Any other version or firmware represents an attack and is considered anomalous.
@ -417,7 +419,7 @@ To account for randomness and gather representative boot-up sequences of the dev
This cannot reasonably be performed manually with consistency. This cannot reasonably be performed manually with consistency.
Therefore, an automation script controls the @PDU with precise timings to perform the boots without human intervention. Therefore, an automation script controls the @PDU with precise timings to perform the boots without human intervention.
The exact experimental procedure followed for each target has minor variations depending on the target's boot-up requirements and timings. The exact experimental procedure for each target has minor variations depending on the target's boot-up requirements and timings.
Overall, they all follow the same template: Overall, they all follow the same template:
+ Turn ON the power to the machine. + Turn ON the power to the machine.
@ -426,8 +428,8 @@ Overall, they all follow the same template:
== Results<results> == Results<results>
We report the results per machine and per model. We report the results per machine and per model.
The training dataset is generated by injecting artificial anomalies, but the evaluation is performed on actual anomalous traces collected in a controlled environment. The evaluation is performed on actual anomalous traces collected in a controlled environment.
For each evaluation, a random set of $10$ consecutive traces is selected from the NORMAL label to serve as the seed for the anomaly generation. For each evaluation, a random set of $10$ consecutive traces is selected from the nominal label to serve as the seed for the anomaly generation.
The anomaly generator returns a training dataset composed of normal traces on one side and anomalous artificial traces on the other. The anomaly generator returns a training dataset composed of normal traces on one side and anomalous artificial traces on the other.
The models train using this dataset and are evaluated against a balanced dataset combining $M in [20,50]$ consecutive anomalous traces selected at random across all abnormal classes and as many nominal traces. The models train using this dataset and are evaluated against a balanced dataset combining $M in [20,50]$ consecutive anomalous traces selected at random across all abnormal classes and as many nominal traces.
The testing set is balanced between nominal and abnormal traces. The testing set is balanced between nominal and abnormal traces.
@ -436,6 +438,7 @@ This evaluation is repeated $50$ times, and the $F_1$ score is computed for each
The final score is the average of these $F_1$ scores. The final score is the average of these $F_1$ scores.
The results are presented in @tab-results. The results are presented in @tab-results.
#figure( #figure(
tablex( tablex(
columns: (40%,40%), columns: (40%,40%),
@ -447,21 +450,23 @@ The results are presented in @tab-results.
[Asus router], [1.00], [Asus router], [1.00],
[Linksys router], [0.92] [Linksys router], [0.92]
), ),
supplement: [Table],
kind: "table",
caption: [Results of detection.] caption: [Results of detection.]
)<tab-results> )<tab-results>
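The scoring protocol described above (one $F_1$ score per repetition, then the average over all repetitions) can be sketched as follows. This is a minimal illustration, not the evaluation code used for the paper: the function names are hypothetical, and treating the anomalous class as the positive class is an assumption.

```python
def f1_score(y_true, y_pred):
    """F1 score for boolean labels, with True meaning 'anomalous' (assumed positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if (not t) and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and (not p))
    if tp == 0:
        # No true positive: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(runs):
    """Average the F1 scores of several (y_true, y_pred) repetitions."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in runs]
    return sum(scores) / len(scores)
```

Each element of `runs` would hold the labels and predictions of one balanced evaluation set.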
There are two hyper-parameters to tune to obtain the best performances. There are two hyper-parameters to tune to obtain the best performances.
First, the length of the trace considered is important. First, the length of the trace considered is important.
The trace needs to cover the whole boot-up sequence to be sure to detect any possible change. The trace needs to cover the whole boot-up sequence to be sure to detect any possible change.
It is better to avoid extending the trace too much after the firmware sequence is done, as the typical operation of the machine can produce noisy power consumption that interferes with the optimal placement of the threshold. It is better to avoid extending the trace too much after the firmware sequence is done, as the typical operation of the machine can produce noisy power consumption that interferes with the optimal placement of the threshold by diluting important features.
Second, the number of training traces can be optimized. Second, the number of training traces can be optimized.
A minimum of four traces is required for the @IQR method based on quartiles. A minimum of four traces is required for the @IQR method based on quartiles.
A minimum of two traces are necessary for the @SVM Threshold method as anomalous traces need to be generated based on the average and standard deviation of the normal dataset. We confirmed empirically that a minimum of ten traces produces better results than four, as it enables the @IQR to work on quartiles that are actually robust to outliers.
Collecting additional traces after these lower boundaries offers marginal performance improvements as the number of traces has little impact on the threshold placement of both models. Collecting additional traces after these lower boundaries offers marginal performance improvements as the number of traces has little impact on the threshold placement of both models.
Moreover, collecting many boot-up sequences can be difficult to achieve in practice. Moreover, collecting many boot-up sequences can be difficult to achieve in practice.
Finally, tuning the sampling rate is important to ensure the best performances. Finally, tuning the sampling rate is important to ensure the best performances.
A machine that boots up in two seconds requires a higher sampling rate than a machine that boots up in thirty seconds. A machine that boots up in two seconds requires a higher sampling rate than a machine that boots up in thirty seconds.
All these parameters are machine-specific and need manual tuning before deployment of the side-channel @IDS. All these parameters are machine-specific and need manual tuning before deployment of the @BPV.
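To make the role of the number of training traces more concrete, the sketch below shows one plausible way to place a quartile-based threshold. It is an illustration under stated assumptions, not the @BPV implementation: the Euclidean distance to the average trace, the Tukey fence $Q_3 + k dot "IQR"$ with $k = 1.5$, and the function names are all assumptions introduced here.

```python
import numpy as np


def fit_iqr_threshold(train_traces, k=1.5):
    """Place a distance threshold from nominal traces (assumed Tukey fence).

    train_traces: array of shape (n_traces, n_samples), nominal boot-ups only.
    Returns the reference (average) trace and the distance threshold.
    """
    train = np.asarray(train_traces, dtype=float)
    if train.shape[0] < 4:
        # Quartiles are not meaningful with fewer than four traces.
        raise ValueError("at least four training traces are required")
    reference = train.mean(axis=0)
    # Assumed distance: Euclidean distance of each trace to the average trace.
    distances = np.linalg.norm(train - reference, axis=1)
    q1, q3 = np.percentile(distances, [25, 75])
    return reference, q3 + k * (q3 - q1)


def is_anomalous(trace, reference, threshold):
    """Flag a boot-up trace whose distance to the reference exceeds the threshold."""
    distance = np.linalg.norm(np.asarray(trace, dtype=float) - reference)
    return bool(distance > threshold)
```

With only four traces, each quartile depends on one or two distances, so a single unusual boot-up moves the threshold noticeably; with ten traces the quartiles are less sensitive to one outlier.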
= Test Case 2: Drone<exp-drone> = Test Case 2: Drone<exp-drone>
In this case study, we demonstrate the potential of physics-based @IDS for drones. In this case study, we demonstrate the potential of physics-based @IDS for drones.
@ -470,22 +475,22 @@ The core component of consumer-available drones is usually a microcontroller, al
As with any other microcontrollers, the flight controller of a drone and its main program (we call the main program firmware in this paper) are subject to updates and attacks @8326960 @8433205. As with any other microcontrollers, the flight controller of a drone and its main program (we call the main program firmware in this paper) are subject to updates and attacks @8326960 @8433205.
Some of these attacks leverage firmware manipulations @8556480. Some of these attacks leverage firmware manipulations @8556480.
With custom firmware uploaded to a drone, many attack possibilities become accessible to the attacker, such as geofencing an area, recovering video feed, or damaging the drone. With custom firmware uploaded to a drone, many attack possibilities become accessible to the attacker, such as geofencing an area, recovering video feed, or damaging the drone.
Moreover, flight controllers as specialized devices that usually do not support the installation of third-party security software nor provide advanced security features such as cryptographic verification of the firmware. Moreover, flight controllers are specialized devices that usually do not support the installation of third-party security software nor provide advanced security features such as cryptographic verification of the firmware.
With drone usage soaring and the lack of security solutions, the problem of verifying their firmware against anomalies becomes important. With drone usage soaring and the lack of security solutions, the problem of verifying their firmware against anomalies becomes important.
== Experimental Setup == Experimental Setup
The experimental setup for this case study is similar to the one presented in @exp-network. The experimental setup for this case study is similar to the one presented in @exp-network.
The experiment focuses on the Spiri Mu drone #footnote[#link("https://spirirobotics.com/products/spiri-mu/")] flashed with the PX4 Drone Autopilot firmware #footnote[#link("https://px4.io/")]. The experiment focuses on the Spiri Mu drone #footnote[#link("https://spirirobotics.com/products/spiri-mu/")] flashed with the PX4 Drone Autopilot firmware #footnote[#link("https://px4.io/")].
The firmware for the flight controller consists of a microprocessor-specific bootloader, a second-stage bootloader common to all supported flight controllers. The firmware for the flight controller consists of a microprocessor-specific bootloader, a second-stage bootloader common to all supported flight controllers, and the operating system composed of different modules.
The battery of the drone is replaced with a laboratory power supply to ensure reproducible results. The battery of the drone is replaced with a laboratory power supply to ensure reproducible results.
The power consumption measurement device (see @capture for more details) is attached in series with the main power cable that provides an 11V @DC current to the drone. The power consumption measurement device (see @capture for more details) is installed in series with the main power cable that provides an 11V @DC current to the drone.
A controllable relay is placed in series with the main cable to enable scripted bootup and shutdown scenarios. A controllable relay is placed in series with the main cable to enable scripted bootup and shutdown scenarios.
The experiment scenarios are: The experiment scenarios are:
- *Nominal*: The first two versions consisted of unmodified firmware provided by the PX4 project: the first was a pre-compiled version, and the second was locally compiled. Although both versions should be identical, some differences appeared in their consumption patterns and required the training of a dual-mode model. - *Nominal:* The first two versions consisted of unmodified firmware provided by the PX4 project: the first was a pre-compiled version, and the second was locally compiled. Although both versions should be identical, some differences appeared in their consumption patterns and required the training of a dual-mode model.
- *Low Battery*: When the drone starts with a low battery level, its behaviour changes to signal the user of the issue. Any battery level below 11V is considered low. In this scenario, a nominal firmware is loaded, and the drone starts with 10V, triggering the low battery behaviour. - *Low Battery:* When the drone starts with a low battery level, its behaviour changes to signal the user of the issue. Any battery level below 11V is considered low. In this scenario, a nominal firmware is loaded, and the drone starts with 10V, triggering the low battery behaviour.
- *Malfunctioning Firmware*: Two malfunctioning firmware versions were compiled. The first introduces a _division-by-zero_ bug in the second stage bootloader. The second introduces the same bug but in the battery management module (in the OS part of the firmware). The second scenario should not introduce measurable anomalous patterns in the boot-up sequence as it only affects the OS stage. - *Malfunctioning Firmware:* Two malfunctioning firmware versions were compiled. The first introduces a _division-by-zero_ bug in the second stage bootloader. The second introduces the same bug but in the battery management module (in the OS part of the firmware). The second scenario should not introduce measurable anomalous patterns in the boot-up sequence as it only affects the OS stage.
#figure( #figure(
image("images/drone-overlaps.svg", width: 100%), image("images/drone-overlaps.svg", width: 100%),
@ -506,6 +511,8 @@ The experiment scenarios are:
[Bootloader Bug],[1],[50], [Bootloader Bug],[1],[50],
[Battery Module Bug], [0.082],[39], [Battery Module Bug], [0.082],[39],
), ),
supplement: [Table],
kind: "table",
caption: [Results of the intrusion detection on the drone.] caption: [Results of the intrusion detection on the drone.]
)<drone-results> )<drone-results>
@ -538,6 +545,7 @@ This suggests that future work could achieve an even lower time-to-decision, lik
= Specific Case Study: @AIM <aim> = Specific Case Study: @AIM <aim>
#reset-acronym("AIM")
When training a model to detect outliers, it is often expected to have examples of possible anomalies. When training a model to detect outliers, it is often expected to have examples of possible anomalies.
In some cases, gathering anomalies can be difficult, costly, or impossible. In some cases, gathering anomalies can be difficult, costly, or impossible.
In the context of this study, it would be impractical to measure power consumption patterns for a wide range of firmware anomalies. In the context of this study, it would be impractical to measure power consumption patterns for a wide range of firmware anomalies.
@ -568,7 +576,7 @@ The goal is not the reproduce exact anomalous traces but to generate a wide vari
], ],
)<fig-boot-up_traces_TPLINK> )<fig-boot-up_traces_TPLINK>
@fig-boot-up_traces_TPLINK illustrates the domain knowledge extracted from this machine. @fig-boot-up_traces_TPLINK illustrates the domain knowledge extracted from the traces.
The anomalies that the power trace exhibits are a combination of types of transformations. The anomalies that the power trace exhibits are a combination of types of transformations.
- The trace is shifted along the $y$ axis. In this case, the anomalous firmware consumes significantly more or less power than the normal one. This shift can affect the whole trace or only a part of it. This can be the result of different usage of the machine's components or a significant change in the firmware instructions. - The trace is shifted along the $y$ axis. In this case, the anomalous firmware consumes significantly more or less power than the normal one. This shift can affect the whole trace or only a part of it. This can be the result of different usage of the machine's components or a significant change in the firmware instructions.
@ -588,7 +596,7 @@ The resulting dataset does not exactly resemble the anomalous traces that are co
To avoid introducing training biases, the dataset is balanced by generating new normal traces using the average and standard deviation if required. To avoid introducing training biases, the dataset is balanced by generating new normal traces using the average and standard deviation if required.
#figure( #figure(
image("images/schematic.svg", width: 100%), image("images/schematic.svg", width: 90%),
caption: [Overview of the @BPV model training and evaluation.], caption: [Overview of the @BPV model training and evaluation.],
)<fig-overview> )<fig-overview>
@ -604,15 +612,15 @@ A benchmarking algorithm evaluates the performances of @AIM against the performa
This is a natural extension of the @BPV when abnormal samples are available. This is a natural extension of the @BPV when abnormal samples are available.
Two main parameters are important to tune for the @AIM. Two main parameters are important to tune for the @AIM.
First, the range for the length of the x shift, and especially its lower bound, has an important influence on the generated anomalies. First, the range for the length of the $x$ shift, and especially its lower bound, has an important influence on the generated anomalies.
A small lower bound allows for the generation of anomalous traces that closely resemble the nominal traces, which can result in a sub-optimal threshold placement. A small lower bound allows for the generation of anomalous traces that closely resemble the nominal traces, which can result in a sub-optimal threshold placement.
Second, the range parameter for the $y$ shift affects the results in the same way. Second, the range parameter for the $y$ shift affects the results in the same way.
The values for these parameters are chosen as part of the domain knowledge extraction, and they affect the transferability of the model (see @aim-conclusion). The values for these parameters are chosen as part of the domain knowledge extraction, and they affect the transferability of the model (see @aim-conclusion).
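The two shift parameters can be illustrated with a small generator. This is a hedged sketch rather than the actual @AIM code: the function names, the default ranges, the equal probability of each transformation, and the padding of a delayed trace with its first value are assumptions made for the example.

```python
import numpy as np


def generate_normal(seeds, rng):
    """Draw a new nominal trace from the per-sample average and standard deviation."""
    seeds = np.asarray(seeds, dtype=float)
    return rng.normal(seeds.mean(axis=0), seeds.std(axis=0))


def generate_anomaly(seeds, rng, y_range=(0.1, 0.5), x_range=(5, 20)):
    """Create one artificial anomalous trace from a randomly chosen seed trace.

    y_range: (lower, upper) bounds of the amplitude offset (hypothetical units).
    x_range: (lower, upper) bounds of the time shift, in samples.
    """
    seeds = np.asarray(seeds, dtype=float)
    trace = seeds[rng.integers(len(seeds))].copy()
    n = len(trace)
    if rng.random() < 0.5:
        # y shift: raise or lower a random segment (possibly the whole trace).
        start = int(rng.integers(0, n - 1))
        end = int(rng.integers(start + 1, n + 1))
        offset = rng.choice([-1.0, 1.0]) * rng.uniform(*y_range)
        trace[start:end] += offset
    else:
        # x shift: delay the trace and pad the beginning with its first value.
        shift = int(rng.integers(x_range[0], x_range[1] + 1))
        trace = np.concatenate([np.full(shift, trace[0]), trace[:-shift]])
    return trace
```

Raising the lower bound of either range keeps the generated anomalies away from the nominal traces, which is the effect on threshold placement discussed above.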
The performances are evaluated on the same dataset as for the initial @BPV evaluation (see~@exp-network). The performances are evaluated on the same dataset as for Test Case 1 (see~@exp-network).
//The performance metric is the F1 score. //The performance metric is the F1 score.
The final performance measure is the average F1 score (and its standard deviation) over 30 independent runs. The final performance measure is the average F1 score (and its standard deviation) over 30 independent runs.
Each run selects five random normal traces as seed for the dataset generation. Each run selects five random normal traces as seeds for the dataset generation.
The training dataset is composed of 100 training traces and 100 evaluation traces. The training dataset is composed of 100 training traces and 100 evaluation traces.
The results are presented in @tab-aim. The results are presented in @tab-aim.
@ -621,13 +629,15 @@ The results are presented in @tab-aim
auto-vlines: false, auto-vlines: false,
align: (left, right, right), align: (left, right, right),
columns:(33%,33%,33%), columns:(33%,33%,33%),
[*Machine*], [*BPV*], [*AIM*], [*Machine*], [*BPV*], [*AIM+BPV*],
[HP-SWITCH],[$0.895 plus.minus 0.094$],[$0.657 plus.minus 0.394$], [HP-SWITCH],[$0.895 plus.minus 0.094$],[$0.657 plus.minus 0.394$],
[TPLINK-SWITCH], [$0.9 plus.minus 0.084$],[$0.985 plus.minus 0.035$], [TPLINK-SWITCH], [$0.9 plus.minus 0.084$],[$0.985 plus.minus 0.035$],
[WAP-ASUS], [$1.0 plus.minus 0.0$],[$0.987 plus.minus 0.041$], [WAP-ASUS], [$1.0 plus.minus 0.0$],[$0.987 plus.minus 0.041$],
[WAP-LINKSYS],[$0.882 plus.minus 0.099$],[$0.867 plus.minus 0.098$], [WAP-LINKSYS],[$0.882 plus.minus 0.099$],[$0.867 plus.minus 0.098$],
), ),
caption: [Performances of the @AIM model compared with the original @BPV model (average F1 score #sym.plus.minus std).] supplement: [Table],
kind: "table",
caption: [Performances of the @AIM+@BPV model compared with the original @BPV model (average F1 score #sym.plus.minus std.).]
)<tab-aim> )<tab-aim>
== Conclusion on the @AIM Model<aim-conclusion> == Conclusion on the @AIM Model<aim-conclusion>
@ -646,8 +656,8 @@ This section elaborates on some important aspects of this study.
== Capture Process<capture> == Capture Process<capture>
We use a hardware device referred to as the capture box @hidden placed in series with the primary power cable of the target device. We use a hardware device referred to as the capture box @hidden placed in series with the primary power cable of the target device.
The technology for measuring the current differs depending on the capture box's version. The technology for measuring the current differs depending on the capture box's version.
For test cases 1 and 3, the box's shunt resistor generates a voltage drop representative of the global power consumption of the machine. For test cases 0 and 3, the box's shunt resistor generates a voltage drop representative of the global power consumption of the machine.
For test case 2, a Hall effect sensor returns a voltage proportional to the current. For test cases 1 and 2, a Hall effect sensor returns a voltage proportional to the current.
For both versions, the voltage value is sampled at 10 KSPS. For both versions, the voltage value is sampled at 10 KSPS.
These samples are packaged in small fixed-size chunks and sent to a data aggregation server on a private @VLAN. These samples are packaged in small fixed-size chunks and sent to a data aggregation server on a private @VLAN.
The data aggregation server is responsible for gathering data from all of our capture boxes and sending it via a @VPN tunnel to a storage server. The data aggregation server is responsible for gathering data from all of our capture boxes and sending it via a @VPN tunnel to a storage server.
@ -660,11 +670,11 @@ Because the boot-up sequence usually begins with a sharp increase in power consu
Two parameters control the extraction. Two parameters control the extraction.
$T$ is the consumption threshold, and $L$ is the length of the boot-up sequence. $T$ is the consumption threshold, and $L$ is the length of the boot-up sequence.
To extract all the boot-up sequences in a power trace, the algorithm evaluates consecutive samples against $T$. To extract all the boot-up sequences in a power trace, the algorithm evaluates consecutive samples against $T$.
If sample $s_{i-1}<T$ and $s_i>T$, then $s_i$ is the first sample of a boot-up sequence, and the next $L$ samples are extracted. If sample $s_(i-1)<T$ and $s_i>T$, then $s_i$ is the first sample of a boot-up sequence, and the next $L$ samples are extracted.
The power trace is resampled at 50ms using a median aggregating function to avoid any incorrect detections. The power trace is resampled at 50ms using a median aggregating function to avoid any incorrect detections.
This pre-processing removes most of the impulse noise that could falsely trigger the detection method. This pre-processing removes most of the impulse noise that could falsely trigger the detection method.
The final step of the detection is to store all the boot sequences under the same label for evaluation. The final step of the detection is to store all the boot sequences under the same label for evaluation.
The complete dataset corresponding to this experiment is available online @dataset. // The complete dataset corresponding to this experiment is available online @dataset.
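The extraction procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the function names are hypothetical, the extracted window is assumed to start at $s_i$ and contain $L$ samples, and crossings that fall inside an already extracted window are assumed to be skipped.

```python
import numpy as np


def median_resample(samples, factor):
    """Aggregate consecutive groups of `factor` samples with a median.

    At 10 kSPS, a 50 ms bin corresponds to factor = 500.
    Trailing samples that do not fill a complete bin are dropped.
    """
    samples = np.asarray(samples, dtype=float)
    n_bins = len(samples) // factor
    return np.median(samples[: n_bins * factor].reshape(n_bins, factor), axis=1)


def extract_bootups(trace, T, L):
    """Return every window of L samples that starts at an upward crossing of T."""
    trace = np.asarray(trace, dtype=float)
    # Indices i such that s_{i-1} < T and s_i > T.
    crossings = np.flatnonzero((trace[:-1] < T) & (trace[1:] > T)) + 1
    sequences = []
    next_allowed = 0
    for i in crossings:
        # Skip crossings inside a previous window or too close to the end.
        if i < next_allowed or i + L > len(trace):
            continue
        sequences.append(trace[i : i + L])
        next_allowed = i + L
    return sequences
```

Applying `median_resample` before `extract_bootups` removes isolated spikes, so a single noisy sample above $T$ does not start a false boot-up sequence.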
== Support for Online Training<online> == Support for Online Training<online>
In order for the @BPV to integrate in a realistic environment, the training procedure takes the rareness of the boot-up event into account. In order for the @BPV to integrate in a realistic environment, the training procedure takes the rareness of the boot-up event into account.
@ -682,7 +692,7 @@ Thanks to its low complexity and support for multi-modes, the @BPV can adapt dur
= Conclusion<conclusion> = Conclusion<conclusion>
This study illustrates the applicability of side-channel analysis to detect firmware attacks. This study illustrates the applicability of side-channel analysis to detect firmware attacks.
The proposed side-channel-based @IDS can detect firmware tampering from the power consumption trace. The proposed side-channel-based @IDS can detect firmware tampering from the power consumption trace.
Moreover, distance-based models leveraged in this study allow minimal training data and training time requirements. Moreover, distance-based models leveraged in this study allow minimal training data requirements.
On a per-machine basis, anomaly generation can enhance the training set without additional anomalous data capture. On a per-machine basis, anomaly generation can enhance the training set without additional anomalous data capture.
Finally, deploying this technology to production networking equipment requires minimal downtime and hardware intrusion, and it is applicable to clientless equipment. Finally, deploying this technology to production networking equipment requires minimal downtime and hardware intrusion, and it is applicable to clientless equipment.
This study illustrates the potential of independent, side-channel-based @IDS for the detection of low-level attacks that can compromise machines even before the operating system gets loaded. This study illustrates the potential of independent, side-channel-based @IDS for the detection of low-level attacks that can compromise machines even before the operating system gets loaded.