review clem BPV for qrs

This commit is contained in:
Arthur Grisel-Davy 2023-07-12 13:57:56 -04:00
parent c5befd3f7d
commit c43c18e9aa
3 changed files with 172 additions and 223 deletions


@ -1,67 +1,6 @@
#import "utils.typ": *
#import "tablex.typ": tablex, hlinex, vlinex, colspanx, rowspanx
#let acronyms = (
"IoT": "Internet of Things",
"BPV": "Boot Process Verifier",
"IDS": "Intrusion Detection System",
"SVM": "Support Vector Machine",
"PLC": "Programable Logic Controlers",
"DC": "Direct Current",
"AC": "Alternating Current",
"APT": "Advanced Persistent Threats",
"PDU": "Power Distribution Unit",
"VLAN": "Virtual Local Area Network",
"VPN": "Virtual Private Network",
"IQR": "Inter-Quartile Range",
"IT": "Information Technology",
"OEM": "Original equipment manufacturer",
"SCA": "Side-Channel Analysis",
"ROM": "Read Only Memory",
"AIM": "Anomaly-Infused Model",
"RFC": "Random Forest Classifier",
"BIOS": "Basic Input/Output System"
)
#let reset-acronym(term) = {
// Reset a specific acronym. It will be expanded on next use.
if term in acronyms{
state("acronym-state-" + term, false).update(false)
}
}
#let reset-all-acronyms() = {
// Reset all acronyms. They will all be expanded on the next use.
for term in acronyms.keys() {
state("acronym-state-" + term, false).update(false)
}
}
#show ref: r =>{// Overload the reference definition
// Grab the term, target of the reference
let term = if type(r.target) == "label"{
str(r.target)
}
else{
// I don't know why the target could not be a type label but it is handled
none
}
if term in acronyms{
// Grab definition of the term
let definition = acronyms.at(term)
// Generate the key associated with this term
let state-key = "acronym-state-" + term
// Create a state to keep track of the expansion of this acronym
state(state-key,false).display(seen => {if seen{term}else{[#definition (#term)]}})
// Update state to true as it has just been defined
state(state-key, false).update(true)
}
else{
r
}
}
#import "@preview/acrostiche:0.2.0": *
#import "template.typ": *
#show: ieee.with(
@ -72,11 +11,11 @@
Firmware protection solutions often share the flaw of requiring the cooperation of the machine they aim to protect.
If the machine gets compromised, the results from the protection mechanism become untrustworthy.
One solution to this problem is to leverage an independent source of information to assess the integrity of the firmware and the bootup sequence.
In this paper, we propose a physics-based @IDS called the @BPV that only relies on side-channel power consumption measurement to verify the integrity of the bootup sequence.
The @BPV works in complete independence from the machine to protect and requires only a few nominal training samples to establish a baseline of nominal behaviour.
One solution to this problem is to leverage an independent source of information to assess the integrity of the firmware and the boot-up sequence.
In this paper, we propose a physics-based Intrusion Detection System called the Boot Process Verifier that only relies on side-channel power consumption measurement to verify the integrity of the boot-up sequence.
The BPV works in complete independence from the machine to protect and requires only a few nominal training samples to establish a baseline of nominal behaviour.
The range of application of this approach potentially extends to any embedded system.
We present three test cases that illustrate the performances of the @BPV on micro-PC, network equipment (switches and wireless access points), and a drone.
We present three test cases that illustrate the performance of the BPV on a micro-PC, network equipment (switches and wireless access points), and a drone.
],
authors: (
(
@ -99,7 +38,28 @@
bibliography-file: "bibli.bib",
)
#init-acronyms((
"IoT": ("Internet of Things",),
"BPV": ("Boot Process Verifier",),
"IDS": ("Intrusion Detection System",),
"SVM": ("Support Vector Machine",),
"PLC": ("Programable Logic Controler",),
"DC": ("Direct Current",),
"AC": ("Alternating Current",),
"APT": ("Advanced Persistent Threat",),
"PDU": ("Power Distribution Unit",),
"VLAN": ("Virtual Local Area Network",),
"VPN": ("Virtual Private Network",),
"IQR": ("Inter-Quartile Range",),
"IT": ("Information Technology",),
"OEM": ("Original Equipment Manufacturer",),
"SCA": ("Side-Channel Analysis",),
"ROM": ("Read Only Memory",),
"AIM": ("Anomaly-Infused Model",),
"RFC": ("Random Forest Classifier",),
"BIOS": ("Basic Input/Output System",),
"OS": ("Operating System",),
))
@ -121,9 +81,9 @@
= Introduction
The firmware of any embedded system is susceptible to attacks. Since firmware provides many security features, it is always of major interest to attackers.
Every year, new firmware vulnerabilities are discovered. Any device that requires firmware, such as computers @185175, @PLC @BASNIGHT201376, or @IoT devices @rieck2016attacks, is vulnerable to these attacks.
Every year, new firmware vulnerabilities are discovered. Any device that requires firmware, such as computers @185175, #acr("PLC") @BASNIGHT201376, or #acr("IoT") devices @rieck2016attacks, is vulnerable to these attacks.
There are multiple ways to leverage a firmware attack. Reverting firmware to an older version allows an attacker to reopen discovered and documented flaws.
Cancelling an update can ensure that previously deployed attacks remain available. Finally, implementing custom firmware enables full access to the machine.
Canceling an update can ensure that previously deployed attacks remain available. Finally, implementing custom firmware enables full access to the machine.
The issue of malicious firmware is not recent.
The oldest firmware vulnerability recorded on #link("cve.mitre.org") dates back to 1999.
@ -135,33 +95,33 @@ The integrity verification can also be performed at run-time as part of the firm
The above solutions to firmware attacks share the common flaw of being applied to the same machine they are installed on.
This allows an attacker to bypass these countermeasures after infecting the machine.
An attacker that could avoid triggering a verification, tamper with the verification mechanism, feed forged data to the verification mechanism, or falsify the verification report could render any defense useless.
In order to avoid this flaw, the @IDS must leverage data that can be trusted even from a compromised machine.
@IDS are then subject to a trade-off between having access to relevant and meaningful information and keeping the detection mechanism separated from the target machine.
In order to avoid this flaw, the #acr("IDS") must leverage data that can be trusted even from a compromised machine.
#acr("IDS") are then subject to a trade-off between having access to relevant and meaningful information and keeping the detection mechanism separated from the target machine.
Our solution addresses this trade-off by leveraging side-channel information.
== Contributions
This paper presents a novel solution for firmware verification using side-channel analysis.
Building on the assumption that every security mechanism the host cooperation is vulnerable to being bypassed, we propose using the device's power consumption signature during the firmware execution to assess its integrity.
Building on the assumption that every security mechanism requiring the host's cooperation is vulnerable to being bypassed, we propose using the device's power consumption signature during the firmware execution to assess its integrity.
Because of the intrinsic properties of side-channel information, the integrity evaluation does not involve any communication with the host and is based on trustworthy data.
A distance-based outlier detector that uses power traces of a nominal boot-up sequence can learn the expected pattern and detect any variation in a new boot-up sequence.
This novel solution can detect various attacks centered around manipulating firmware.
In addition to its versatility of detection, it is also easily retrofittable to almost any embedded system.
It requires minimal training examples and minor hardware modification in most cases, especially for DC-powered devices.
It requires minimal training examples and minor hardware modifications in most cases, especially for #acr("DC")-powered devices.
== Paper Organization
We elaborate on the type of attacks that our method aims to mitigate in the threat model @threat, and the technology we leverage to capture relevant information in Section @SCA.
We elaborate on the type of attacks that our method aims to mitigate in the threat model @threat.
@bpv describes the proposed solution.
@exp-network,~@exp-drone, and~@aim present test cases that illustrate applications and variations of the @BPV.
Finally, the paper ends with @discussion that provides more insight on specific aspects of the proposed solution and @conclusion for the conclusion.
@exp-network,~@exp-drone, and~@aim present test cases that illustrate applications and variations of the #acr("BPV").
Finally, the paper ends with @discussion, which provides more insight into specific aspects of the proposed solution, and with the conclusion in @conclusion.
= Related Work
Historically, the firmware was written on a @ROM, and it impossible to change.
Historically, the firmware was written on a #acr("ROM") and was impossible to change, thus preventing most attacks.
With the growing complexity of embedded systems, manufacturers developed procedures to enable remote firmware upgrades.
Firmware upgrades can address performance or security flaws or add features.
Unfortunately, attackers can leverage these firmware upgrade mechanisms to implement unauthorized or malicious pieces of software in the machine.
Almost all embedded systems are vulnerable to firmware attacks.
In industrial applications, studies proposed firmware attacks on control systems such as power systems field devices @power-devices, @PLC @plc_firmware, or any other industrial embedded system @santamarta2012here.
In industrial applications, studies proposed firmware attacks on control systems such as power systems field devices @power-devices, #acr("PLC") @plc_firmware, or any other industrial embedded system @santamarta2012here.
Safety-critical environments are also prime targets, including medical devices @health_review @pacemaker @medical_case_study, railway systems @railway or automotive systems @cars.
// Manufacturers try to protect firmware updates with cryptography but each solution interract with the host and cannot be trusted.
@ -179,9 +139,9 @@ However, the blockchain still needs to be verified at some point, and this verif
Overall, no security mechanism that requires interacting with the host machine can guarantee firmware integrity as a compromised machine can produce forged results.
// SCA provides a way to verify the integrity without interacting with the host.
Historically, attackers leveraged @SCA in general and power analysis in particular @sca_attack.
Historically, attackers leveraged #acr("SCA") in general and power analysis in particular @sca_attack.
Power consumption generally leaks execution information about the software activity that enables attacks.
Clark et al. proposed a method to identify web page visits from @AC power @clark_current_2013.
Clark et al. proposed a method to identify web page visits from #acr("AC") power @clark_current_2013.
Conti et al. developed a method for identifying laptop-user pairs from power consumption @10.1145-2899007.2899009.
Seemingly harmless power consumption data from a mobile phone can even leak position data @michalevsky2015powerspy.
All these methods illustrate the potential of power side channels for attacks, but a well-intentioned program could also leverage them for defense.
@ -190,25 +150,25 @@ Following this idea, Clark et al. @wud proposed in 2013 a power consumption-base
Hernandez et al. included power consumption with network data for malware detection @8855288.
Electrical power consumption is especially appropriate for inferring the machine's activity for different reasons.
First, it is easy to measure in a reproducible manner.
Then, it can be easy to get access to relevant power cables with little tampering from the machine when the power conversion from @AC to @DC power is performed outside the machine.
Second, it can be easy to get access to relevant power cables with little tampering with the machine when the conversion from #acr("AC") to #acr("DC") power is performed outside the machine.
It is also a common side channel to all embedded systems as they all consume electricity.
Second, it is hard to fake from the developer's point of view. Because of the multiple abstraction layers between the code of a program and its implementation at the hardware level, changes in the code will result in a different power consumption pattern.
Finally, from the developer's point of view, forging power consumption to impersonate other programs is difficult, especially at the firmware level. Because of the multiple abstraction layers between the code of a program and its implementation at the hardware level, changes in the code will result in a different power consumption pattern.
This is especially true when considering firmware or machines with low computation capabilities or highly specialized devices that have deterministic and stable execution patterns at boot-up.
However, to the best of our knowledge, no work leveraged the same data or method for firmware integrity verification.
Bootups are a natural target for defensive purposes are they are notoriously hard to protect, and host-based @IDS are not yet active to defend the machine.
Moreover, bootup produces significantly more consistent power consumption than normal operation on general-purpose machines as it follows a pre-defined process.
Boot-ups are a natural target for defensive purposes as they are notoriously hard to protect, and host-based #acr("IDS") are not yet active to defend the machine.
Moreover, boot-ups produce significantly more consistent power consumption than normal operation on general-purpose machines as they follow a pre-defined process.
In light of the potential of side-channel attacks, some work proposed manipulating power consumption patterns.
Defense mechanisms like Maya @pothukuchi2021maya propose to obfuscate specific activity patterns by applying a control method to target a pre-defined mask.
If changing the power consumption pattern of software to impersonate another is possible, that could decrease the potential of side-channel-based @IDS.
If changing the power consumption pattern of software to impersonate another is possible, that could decrease the potential of side-channel-based #acr("IDS").
However, the current work is designed for defense and aims at obfuscating the patterns by applying masks with the goal of making all power signatures similar, not impersonating a specific one.
Thus, power consumption remains a trustworthy source of information as a different set of instructions necessarily generates a different power consumption.
= Threat Model<threat>
Many attacks are enabled by tampering with the firmware.
Because the firmware is responsible for the initialization of the components, the low-level communications, and some security features, executing adversary code in place of the expected firmware is a powerful capability @mitre @capec.
If given enough time, information or access, an attacker could take complete control of the machine and pave the way to future @APT.
If given enough time, information or access, an attacker could take complete control of the machine and pave the way to future #acr("APT").
A firmware modification is defined as implementing any change in the firmware code.
Modifications include implementing custom functions, removing security features, or changing the firmware for a different version (downgrade or upgrade).
@ -221,7 +181,7 @@ An attacker would only need to wait for vulnerabilities to be discovered and the
These properties make the firmware downgrade a powerful first step to performing more elaborate attacks.
Manufacturers sometimes implement firmware anti-rollback mechanisms to prevent this type of attack, but they are also vulnerable to bypass.
Custom firmware may be required for more subtle or advanced attacks.
This requires more work and information as firmware codes are usually not open source and are challenging to reverse engineer.
This requires more work and information as firmware codes are usually not open-source and are challenging to reverse engineer.
Moreover, the firmware is tailored for a specific machine, and it can be difficult for an attacker to perform a custom firmware attack.
Nevertheless, the successful implementation of custom firmware can lead to performing almost any attack.
Finally, a firmware upgrade could also be used to open a newly discovered vulnerability.
@ -243,62 +203,62 @@ Although these firmware are typically not malicious, implementing alternative fi
// Power consumption is a reliable source of information but requires physical access to the machine.
// Sound and electromagnetic fields can be measured for a distance but are also typically more susceptible to measurement location @iot_anoamly_sca.
= Boot Process Verification<bpv>
= Boot Process Verifier<bpv>
Verifying the firmware of the machine using its power consumption represents a time series classification problem described in the problem statement:
#ps(title: "Boot Process Verification")[
Given a set of N time series $T=(t_1,...,t_N)$ corresponding to the power consumption during nominal boot sequences and a new unlabeled time series $u$, predict whether $u$ was generated by a nominal boot sequence.
]
The training time series in $T$ are discretized, mono-variate, real-valued time series of length $L$.
The training time series in $T$ are discretized, mono-variate, and real-valued time series of length $L$.
The length of the captured time series is a parameter of the detector, tuned for each machine.
The number of training time series $N$ is considered small relative to the usual size of training datasets in time series classification problems @sun2017revisiting.
All time series considered in this problem ($T union u$) are all of the same length and synchronized at capture time; see Section~@sds for more details about the synchronization process.
All time series considered in this problem ($T union u$) are all of the same length and synchronized at capture time; see @sds for more details about the synchronization process.
== Detection Models<detector>
The @BPV performs classification of the boot traces using a distance-based detector and a threshold.
The #acr("BPV") performs classification of the boot traces using a distance-based detector and a threshold.
The core of the detection is the computation of the distance between the new trace $u$, and the training traces $T$.
If this distance is greater than the pre-computed threshold, then the @BPV classifies the new trace as anomalous.
If this distance is greater than the pre-computed threshold, then the #acr("BPV") classifies the new trace as anomalous.
Otherwise, the new trace is nominal.
The training phase consists in computing the threshold based on the known good training traces.
Two main specificities of this problem make the computation of the threshold difficult.
First, the training dataset only contains nominal traces.
This assumption is important as there is a nearly infinite number of ways a boot sequence can be altered to create a malicious or malfunctioning device.
The @BPV aims at fingerprinting the nominal sequence, not recognizing the possible abnormal sequences.
The #acr("BPV") aims at fingerprinting the nominal sequence, not recognizing the possible abnormal sequences.
Thus, the model can only describe the nominal traces statistically, based on the available examples, and assume that outliers to this statistical model correspond to abnormal boot sequences.
Second, the number of training samples is small.
In this case, small is relative to the usual number of training samples leveraged for time series classification (see @discussion for more details).
We assume that the training dataset contains between ten and 100 samples.
We assume that the training dataset contains between 10 and 100 samples.
This assumption is important for realism.
To keep the detector non-disruptive, the nominal boot sequences are captured during the normal operation of the device.
However, the bootup of a machine is a rare event, and thus the training dataset must remain small.
However, the boot-up of a machine is a rare event, and thus the training dataset must remain small.
The training sequence of the @BPV computes the distance threshold based on a statistical description of the distribution of the distance between each pair of normal traces.
The training sequence of the #acr("BPV") computes the distance threshold based on a statistical description of the distribution of the distance between each pair of normal traces.
The training sequence follows two steps.
+ The sequence computes the Euclidean distance between all pairs of training traces $D = {d(t_i,t_j) forall i,j in [1,...,N]^2; i eq.not j }$.
+ The sequence computes the threshold as $"thresh" = 1.5 dot "IQR"(D)$ with IQR the Inter-Quartile Range of the distances set $D$.
+ The sequence computes the threshold as $"thresh" = 1.5 dot "IQR"(D)$.
The @IQR is a measure of the dispersion of samples.
The #acr("IQR") is a measure of the dispersion of samples.
It is based on the first and third quartiles and defined as $ "IQR" = Q_3 - Q_1$ with $Q_3$ being the third quartile and $Q_1$ being the first quartile.
This value is common @han2011data for detecting outliers as a similar but more robust alternative to the $3 times sigma$ interval of a Gaussian distribution.
To apply the @IQR to the times series, we first compute the average of the nominal traces.
To apply the #acr("IQR") to the time series, we first compute the average of the nominal traces.
This average serves as a reference for computing the distance of each trace.
The Euclidean distance is computed between each trace and the reference, and the @IQR of these distances is computed.
The Euclidean distance is computed between each trace and the reference, and the #acr("IQR") of these distances is computed.
The distance threshold takes the value $1.5 times "IQR"$. For the detection, the distance of each new trace to the reference is computed and compared to the threshold.
The new trace is considered anomalous if the distance is above the threshold.
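For illustration, the training and detection logic described above can be sketched as follows; the function names are ours, and the traces are assumed to be synchronized, equal-length arrays (a minimal sketch, not the exact implementation used in the experiments).

```python
import numpy as np

def train_bpv(traces):
    """Train a single-mode BPV model from nominal boot-up traces.

    traces: array-like of shape (N, L), N synchronized nominal power traces.
    Returns the reference trace (average) and the distance threshold (1.5 * IQR).
    """
    traces = np.asarray(traces, dtype=float)
    reference = traces.mean(axis=0)                     # average nominal trace
    dists = np.linalg.norm(traces - reference, axis=1)  # Euclidean distance to the reference
    q1, q3 = np.percentile(dists, [25, 75])
    threshold = 1.5 * (q3 - q1)                         # 1.5 x IQR of the distance set
    return reference, threshold

def is_anomalous(trace, reference, threshold):
    """Classify a new synchronized trace: anomalous if its distance to the reference exceeds the threshold."""
    return np.linalg.norm(np.asarray(trace, dtype=float) - reference) > threshold
```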
=== Support For Multi-modal Bootup Sequences<multi-modal>
Some machines can boot following multiple different bootup sequences that are considered normal.
=== Support For Multi-modal Boot-up Sequences<multi-modal>
Some machines can boot following multiple different boot-up sequences that are considered normal.
There exist various reasons for such behaviour.
For example, a machine can perform recovery operations if the power is interrupted while the machine is off, or perform health checks on components that may pass or fail and trigger deeper inspection procedures.
Because the machines are treated as black boxes, it is important for the @BPV to deal with these multiple modes during training.
Because the machines are treated as black boxes, it is important for the #acr("BPV") to deal with these multiple modes during training.
See @online for more details about how the online training procedure deals with multi-modal models.
Our approach is to develop one model per mode following the same procedure as for a single mode, presented in Section~@detector.
Our approach is to develop one model per mode following the same procedure as for a single mode, presented in~@detector.
With multiple models available, the detection logic evolves to consider the new trace nominal if it matches any of the models.
If the new trace does not match any model, then it does not follow any of the nominal modes and is considered abnormal.
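The multi-modal extension can be sketched by reusing `train_bpv` and `is_anomalous` from the sketch above; the grouping of the training traces into modes is assumed to be given (by the operator, as discussed in @online).

```python
def train_multimodal(modes):
    """Train one (reference, threshold) model per boot-up mode.

    modes: list of arrays, each of shape (N_m, L), holding the nominal traces of one mode.
    """
    return [train_bpv(mode_traces) for mode_traces in modes]

def is_anomalous_multimodal(trace, models):
    """A new trace is nominal if it matches at least one mode's model, anomalous otherwise."""
    return all(is_anomalous(trace, reference, threshold) for reference, threshold in models)
```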
@fig-modes illustrate the trained @BPV models when two modes are present in the bootup sequence.
@fig-modes illustrates the trained #acr("BPV") models when two modes are present in the boot-up sequence.
The top part of the figure represents the average power trace for each mode. The x-axis is the time in milliseconds, and the y-axis is the amplitude in a unit proportional to the ampere (the absolute value of the consumption is unimportant for this study, only the global pattern matters).
The bottom part of the figure represents the distances and the threshold.
Each colour represents one mode.
@ -312,17 +272,17 @@ The vertical dashed lines represent the distance threshold.
= Test Case 0: General Purpose Computer
This test case illustrates the first application of the @BPV and follows a slightly different setup and assumptions.
This test case illustrates the first application of the #acr("BPV") and follows a slightly different setup and assumptions.
First, the power consumption measurement does not only contain the consumption of the machine to protect.
In some cases, capturing only the power consumption of the machine to protect is impossible.
For example, if the power connections follow proprietary designs or if the machine to protect is inaccessible (for practical or security reasons).
This is the case if the power connections follow proprietary designs or if the machine to protect is inaccessible (for practical or security reasons).
In this case, the available data is an aggregate of the power consumption of the machine to protect and of a second machine.
The second machine does not perform any task, and its contribution to the aggregated power consumption is constant.
Second, anomalous examples of bootup sequences are available.
This test case was designed with an industry partner for the detection of two specific attacks: bootup on an external USB drive and access to the machine's @BIOS.
Second, anomalous examples of boot-up sequences are available.
This test case was designed with an industry partner for the detection of two specific attacks: boot-up on an external USB drive and access to the machine's #acr("BIOS").
Because the machine and the expected attacks are known in advance, it is possible to tailor the @BPV's parameters to maximize the performance at detecting the attacks.
Because of these two specificities, this test case should be regarded as a first iteration to demonstrate the potential of the @BPV in a more restrictive environment.
Because the machine and the expected attacks are known in advance, it is possible to tailor the #acr("BPV")'s parameters to maximize the performance at detecting the attacks.
Because of these two specificities, this test case should be regarded as a first iteration to demonstrate the potential of the #acr("BPV") in a more restrictive environment.
The following test cases in @exp-network and @exp-drone present other applications in more challenging environments.
== Experimental Setup
@ -336,7 +296,7 @@ The second machine remained idle for the duration of the experiment.
// caption: [Overview of the experiment setup for test case 0.]
// )<l3-setup>
From these samples representing nominal bootups, it appears that the machine presents multiple bootup modes.
From these samples representing nominal boot-ups, it appears that the machine presents multiple boot-up modes.
Hence, the model is multi-modal with three modes.
See @multi-modal for more details about how multi-modal models are defined.
@l3-training illustrates the power traces associated with each mode.
@ -348,28 +308,28 @@ See @multi-modal for more details about how multi-modal models are defined.
After collecting training traces, the distribution of samples across the three models was $(0.31,0.06,0.62)$.
This distribution remains purely circumstantial from the point of view of the detector that considers the machine to protect as a black box.
The root causes for the appearance of one bootup mode, or another is outside the scope of this work.
The root causes for the appearance of one boot-up mode or another are outside the scope of this work.
The final training dataset comprises 93 training samples divided into three models following the above distribution.
Abnormal bootup traces are also collected.
The abnormal boot sequences are composed of sequences where an operator went into the @BIOS and then continued booting into the OS.
Abnormal boot-up traces are also collected.
The abnormal boot sequences are composed of sequences where an operator went into the #acr("BIOS") and then continued booting into the #acr("OS").
== Results
The models are manually tuned to obtain 100% accuracy in the classification of nominal and abnormal boot sequences.
Obtaining 100% accuracy illustrates that there is a clear separation between nominal and abnormal boot sequences for this type of attack.
//#agd[could not redo the results as teh data for bios boot are missing]
Although this test case represents an unrealistic situation (mainly because the anomalous samples are accessible during training), it is still a valuable first evaluation of the @BPV.
Although this test case represents an unrealistic situation (mainly because the anomalous samples are accessible during training), it is still a valuable first evaluation of the #acr("BPV").
This test case serves as a proof-of-concept and indicates that there is a potential for the detection of firmware-level attacks with power consumption.
The @BPV detected the pre-defined attack with complete independence from the machine and with a perfect success rate.
Having access to anomalous samples enabled us to optimize the threshold placement to minimize false-positive (nominal bootups detected as anomalous) by relaxing the threshold value.
The #acr("BPV") detected the pre-defined attack with complete independence from the machine and with a perfect success rate.
Having access to anomalous samples enabled us to optimize the threshold placement to minimize false positives (nominal boot-ups detected as anomalous) by relaxing the threshold value.
= Test Case 1: Network Devices<exp-network>
To verify the performance of the proposed detector, we design an experiment that aims at detecting firmware modifications on different devices.
Networking devices are a vital component of any organization, from individual houses to complete data centers.
A network failure can result in significant downtime that is extremely expensive for data centers @downtime.
Compromised network devices can also result in data breaches and @APT.
Compromised network devices can also result in data breaches and #acr("APT").
These devices are generally highly specialized in processing and transmitting information as fast as possible.
We consider four machines that represent consumer-available products for different prices and performance ranges.
@ -378,14 +338,14 @@ We consider four machines that represent consumer-available products for differe
- TP-Link Switch T1500G-10PS. This 8-port switch offers some security features for low-load usage.
- HP Switch Procurve 2650 J4899B. This product is enterprise-oriented and provides more performance than the TP-Link switch. This is the only product of the selection that required hardware modification, as the power supply is internal to the machine. The modification consists in cutting the 5V cables to install the capture system.
None of the selected devices supports the installation of host-based @IDS or firmware integrity verification.
None of the selected devices support the installation of host-based #acr("IDS") or firmware integrity verification.
The firmware is verified only during updates with a proprietary mechanism.
This experiment illustrates the firmware verification capability of a side-channel @IDS for these machines where common @IDS may not be applicable.
This experiment illustrates the firmware verification capability of a side-channel #acr("IDS") for these machines where common #acr("IDS") may not be applicable.
== Experimental Setup<setup>
Although this experiment is conducted in a controlled environment, the setup to a real deployment (see @capture for more details).
We gather data from the four networking equipment, which are connected to a managed @PDU (see @capture for more details).
This @PDU's output can be controlled by sending instructions on a telnet interface and enables turning each machine on or off automatically.
Although this experiment is conducted in a controlled environment, the setup is representative of a real deployment (see @capture for more details).
We gather data from the four networking devices, which are connected to a managed #acr("PDU").
The #acr("PDU")'s outputs can be controlled by sending instructions over a telnet interface, which enables turning each machine on or off automatically.
Each machine will undergo a firmware change or version change to represent a firmware attack.
The changes are listed in @tab-machines.
@ -404,20 +364,20 @@ The changes are listed in @tab-machines.
),
supplement: [Table],
kind: "table",
caption: [Machines used for the experiments and the changes applied.],
caption: [Machines used for the experiment and the changes applied.],
)<tab-machines>
This experiment aims at simulating an attack situation by performing firmware modifications on the target devices and recording the boot-up power trace data for each version.
For the switches, we flash different firmware versions provided by the @OEM.
For wireless routers, their firmware is changed from the @OEM to different versions of #link("https://openwrt.org/")[OpenWrt].
In this study, we consider the latest @OEM firmware version to be the nominal version, expected to be installed on the machine by default.
For the switches, we flash different firmware versions provided by the #acr("OEM").
For wireless routers, their firmware is changed from the #acr("OEM") to different versions of #link("https://openwrt.org/")[OpenWrt].
In this study, we consider the latest #acr("OEM") firmware version to be the nominal version, expected to be installed on the machine by default.
Any other version or firmware represents an attack and is considered anomalous.
== Experiment Procedure
To account for randomness and gather representative boot-up sequences of the device, we performed 500 boot iterations for each machine.
This cannot reasonably be performed manually with consistency.
Therefore, an automation script controls the @PDU with precise timings to perform the boots without human intervention.
Therefore, an automation script controls the #acr("PDU") with precise timings to perform the boots without human intervention.
The exact experimental procedure for each target has minor variations depending on the target's boot-up requirements and timings.
Overall, they all follow the same template:
@ -460,16 +420,16 @@ First, the length of the trace considered is important.
The trace needs to cover the whole boot-up sequence to be sure to detect any possible change.
It is better to avoid extending the trace too much after the firmware sequence is done, as the typical operation of the machine can produce noisy power consumption that interferes with the optimal placement of the threshold by diluting important features.
Second, the number of training traces can be optimized.
A minimum of four traces is required for the @IQR method based on quartiles.
We confirmed empirically that a minimum of ten traces produces better results than four as it allows enable the @IQR to work on quartiles that are actually robust to outliers.
A minimum of four traces is required for the #acr("IQR") method (for the computation of quartiles).
We confirmed empirically that a minimum of ten traces produces better results than four as it enables the #acr("IQR") to work on quartiles that are actually robust to outliers.
Collecting additional traces beyond these lower bounds offers marginal performance improvements as the number of traces has little impact on the threshold placement of both models.
Moreover, collecting many boot-up sequences can be difficult to achieve in practice.
Finally, tuning the sampling rate is important to ensure the best performance.
A machine that boots up in two seconds requires a higher sampling rate than a machine that boots in thirty seconds.
All these parameters are machine-specific and need manual tuning before deployment of the @BPV.
All these parameters are machine-specific and need manual tuning before deployment of the #acr("BPV").
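As an illustration only, the machine-specific parameters discussed above could be grouped as follows; the field names and example values are ours, not measured values from the experiments.

```python
from dataclasses import dataclass

@dataclass
class CaptureConfig:
    """Machine-specific capture parameters for the BPV (illustrative)."""
    trace_length_s: float    # must cover the whole boot-up sequence, without much margin after it
    n_training_traces: int   # at least ten for a robust IQR-based threshold
    sampling_rate_sps: int   # higher for machines that boot faster

# Hypothetical example for a switch that boots in roughly thirty seconds.
switch_config = CaptureConfig(trace_length_s=35.0, n_training_traces=20, sampling_rate_sps=10_000)
```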
= Test Case 2: Drone<exp-drone>
In this case study, we demonstrate the potential of physics-based @IDS for drones.
In this case study, we demonstrate the potential of physics-based #acr("IDS") for drones.
Drones are not new, but their usage in both the consumer and professional sectors has increased significantly in recent years @droneincrease.
The core component of consumer-available drones is usually a microcontroller, also called a flight controller.
As with any other microcontroller, the flight controller of a drone and its main program (we call the main program firmware in this paper) are subject to updates and attacks @8326960 @8433205.
@ -484,19 +444,24 @@ The experiment focuses on the Spiri Mu drone #footnote[#link("https://spirirobot
The firmware for the flight controller consists of a microprocessor-specific bootloader, a second-stage bootloader common to all supported flight controllers, and the operating system composed of different modules.
The battery of the drone is replaced with a laboratory power supply to ensure reproducible results.
The power consumption measurement device (see @capture for more details) is installed in series with the main power cable that provides an 11V @DC current to the drone.
A controllable relay is placed in series with the main cable to enable scripted bootup and shutdown scenarios.
The power consumption measurement device (see @capture for more details) is installed in series with the main power cable that provides 11V #acr("DC") power to the drone.
A controllable relay is placed in series with the main cable to enable scripted boot-up and shutdown scenarios.
The experiment scenarios are:
- *Nominal:* The first two versions consisted of unmodified firmware provided by the PX4 project, the first one was a pre-compiled version, and the second one was locally compiled. Although both versions should be identical, some differences appeared in their consumption pattern and required the training of a dual-mode model.
- *Nominal:* The first two versions consisted of unmodified firmware provided by the PX4 project, the first one was a pre-compiled version, and the second one was locally compiled. Although both versions should be identical, some differences appeared in their consumption patterns and required the training of a dual-mode model.
- *Low Battery:* When the drone starts with a low battery level, its behaviour changes to signal the user of the issue. Any battery level below 11V is considered low. In this scenario, a nominal firmware is loaded, and the drone starts with 10V, triggering the low battery behaviour.
- *Malfunctioning Firmware:* Two malfunctioning firmware versions were compiled. The first introduces a _division-by-zero_ bug in the second stage bootloader. The second introduces the same bug but in the battery management module (in the OS part of the firmware). The second scenario should not introduce measurable anomalous patterns in the boot-up sequence as it only affects the OS stage.
#figure(
image("images/drone-overlaps.svg", width: 100%),
caption: [Overlap of bootup traces for different scenarios and their average. Green = Low Battery (8 traces + average), Purple = Battery Module Bug (8 traces + average), Orange = Bootloader Bug (6 traces + average).]
caption: [Overlap of boot-up traces for different scenarios and their average. Green = Low Battery (8 traces + average), Purple = Battery Module Bug (8 traces + average), Orange = Bootloader Bug (6 traces + average).]
)
The experiment procedure consists in starting the drone flight controller multiple times while capturing the power consumption.
The experiment consists in repeating each scenario between 40 and 100 times.
The experiment procedure automatically captures boot-up traces for better reproducibility (see @sds for more details).
#block(breakable:false)[
== Results
#figure(
@ -515,20 +480,15 @@ The experiment scenarios are:
kind: "table",
caption: [Results of the intrusion detection on the drone.]
)<drone-results>
The experiment procedure consists in starting the drone flight controller multiple times while capturing the power consumption.
The experiment consists in repeating each scenario between 40 and 100 times.
The experiment procedure automatically captures boot-up traces for better reproducibility (see @sds for more details).
]
@drone-results presents the results of the detection.
Both Original and Compiled represent nominal firmware versions.
Each scenario introduces disturbances in the boot-up sequence power consumption.
The model correctly identifies the anomalous firmware.
One interesting scenario is the Battery Module Bug that is mostly detected as nominal.
This result is expected as the bug affects the operations of the firmware after the bootup sequence.
This result is expected as the bug affects the operations of the firmware after the boot-up sequence.
Hence, the power consumption in the first second of activity remains nominal.
//#agd[Should the result of the battery module bug remain, or is it confusing to present scenarios where the BPV expectedly fails?]
@ -544,20 +504,20 @@ This suggests that future work could achieve an even lower time-to-decision, lik
// == Results
= Specific Case Study: @AIM <aim>
= Specific Case Study: Anomaly-Infused Model<aim>
#reset-acronym("AIM")
When training a model to detect outliers, it is often expected to have examples of possible anomalies.
In some cases, gathering anomalies can be difficult, costly, or impossible.
In the context of this study, it would be impractical to measure power consumption patterns for a wide range of firmware anomalies.
Such data collection would require modifying firmware parameters, suspending equipment usage, or infecting production machines with malicious firmware.
These modifications are impossible for production equipment and would still lead to an incomplete training dataset.
To circumvent this limitation, we propose a variation of the training process called @AIM.
@AIM leverages the specificity of distance-based detectors.
To circumvent this limitation, we propose a variation of the training process called #acr("AIM").
#acr("AIM") leverages the specificity of distance-based detectors.
Distance-based detectors produce results based solely on the distance between two traces and a learned threshold.
The threshold is chosen to separate normal and anomalous traces as well as possible.
The actual pattern of the traces is not important for this type of detector as only the aggregated distance of each sample matters.
This implies that a distance-based detector that relies on a distance threshold can be trained the same way with either real anomalous traces or with artificial traces that present the same distance to the reference.
The idea behind an @AIM is to leverage this property and generate artificial anomalous traces to form the training set.
The idea behind an #acr("AIM") is to leverage this property and generate artificial anomalous traces to form the training set.
The additional anomalous traces are generated using only normal traces, which circumvents the need for extensive data collection.
== Anomaly Generation
@ -566,8 +526,8 @@ Data augmentation can leverage different time series modification methods to hel
The kind of modification applied to a trace is highly dependent on the application and the model @zimmering2021generating and requires domain knowledge about the system.
In this case, we want to generate anomalous traces with patterns similar to actual anomalous traces from a machine.
The first step of this process is to extract domain knowledge from all the traces collected.
The type of modification an anomalous trace present compared to a normal trace help us design anomaly generation functions that apply the same type of transformation to normal traces with varying parameters.
The goal is not the reproduce exact anomalous traces but to generate a wide variety of possible anomalous traces given a small set of normal traces.
The type of modification an anomalous trace presents compared to a normal trace helps us design anomaly generation functions that apply the same type of transformation to normal traces with varying parameters.
The goal is not to reproduce exact anomalous traces but to generate a wide variety of possible anomalous traces given a small set of normal traces.
#figure(
image("images/Bootup_traces_TPLINK.svg", width: 100%),
@ -577,7 +537,7 @@ The goal is not the reproduce exact anomalous traces but to generate a wide vari
)<fig-boot-up_traces_TPLINK>
@fig-boot-up_traces_TPLINK illustrates the domain knowledge extracted from the traces.
The anomalies that the power trace exhibits are a combination of types of transformations.
The anomalies that the power traces exhibit are a combination of types of transformations.
- The trace is shifted along the $y$ axis. In this case, the anomalous firmware consumes significantly more or less power than the normal one. This shift can affect the whole trace or only a part of it. This can be the result of different usage of the machine's components or a significant change in the firmware instructions.
- The trace is delayed or advanced along the $x$ axis. The anomalous trace presents the same patterns and amplitude as the normal trace but at different points in time. This shift can occur when parts of the firmware are added or removed by updates.
@ -591,27 +551,29 @@ The possible transformations are:
- Shifting both the $x$ and $y$ axes. Anomalous traces always present an $x$ shift, a $y$ shift, or both.
#figure(
image("images/schematic.svg", width: 100%),
caption: [Overview of the #acr("BPV") model training and evaluation.],
)<fig-overview>
@fig-overview presents an overview of the model's data flow.
The resulting dataset does not exactly resemble the anomalous traces that are collected but presents traces with the same range of distances to normal traces (see @fig-Synthetic_vs_Normal_TPLINK).
To avoid introducing training biases, the dataset is balanced by generating new normal traces using the average and standard deviation if required.
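A minimal sketch of this generation step is given below; the shift bounds and the random-segment logic are illustrative parameters that would need per-machine tuning, not the exact functions used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_anomaly(trace, x_shift=(20, 200), y_shift=(0.02, 0.1)):
    """Generate one synthetic anomalous trace from a nominal one by applying a
    random time (x) shift, a random amplitude (y) shift on a random segment, or both.

    x_shift: (min, max) shift in samples; y_shift: (min, max) amplitude offset.
    """
    anomaly = np.asarray(trace, dtype=float).copy()
    kind = rng.choice(["x", "y", "xy"])
    if kind in ("x", "xy"):
        # Delay or advance the trace along the time axis.
        shift = int(rng.integers(x_shift[0], x_shift[1])) * int(rng.choice([-1, 1]))
        anomaly = np.roll(anomaly, shift)
    if kind in ("y", "xy"):
        # Raise or lower the consumption over a random segment of the trace.
        start = int(rng.integers(0, len(anomaly) // 2))
        end = int(rng.integers(start + 1, len(anomaly)))
        anomaly[start:end] += rng.uniform(*y_shift) * rng.choice([-1, 1])
    return anomaly
```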
#figure(
image("images/schematic.svg", width: 90%),
caption: [Overview of the @BPV model training and evaluation.],
)<fig-overview>
#figure(
image("images/Synthetic_vs_Normal_TPLINK.svg", width: 100%),
caption: [Example of generated synthetic anomalous traces vs normal traces for TP-Link switch.],
caption: [Example of generated anomalous traces compared with captured normal traces for TP-Link switch.],
)<fig-Synthetic_vs_Normal_TPLINK>
== Results
A benchmarking algorithm evaluates the performances of @AIM against the performances of the original @BPV trained with only normal traces.
@AIM places the threshold to maximize the margins to the closest normal distance and abnormal distance in the same way a 1D-@SVM would.
This is a natural extension of the @BPV when abnormal samples are available.
A benchmarking algorithm evaluates the performance of #acr("AIM") against the performance of the original #acr("BPV") trained with only normal traces.
#acr("AIM") places the threshold to maximize the margins to the closest normal distance and abnormal distance in the same way a 1D-#acr("SVM") would.
This is a natural extension of the #acr("BPV") when abnormal samples are available.
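When the nominal and generated anomalous distance sets are separable, this margin-maximizing placement reduces to the midpoint between the largest nominal distance and the smallest anomalous distance, as in the following sketch (names are ours):

```python
import numpy as np

def aim_threshold(nominal_dists, anomalous_dists):
    """Place the distance threshold midway between the closest points of the two
    classes (largest nominal distance, smallest anomalous distance), maximizing
    the margin like a 1D SVM. Assumes the two distance sets do not overlap."""
    return (np.max(nominal_dists) + np.min(anomalous_dists)) / 2.0
```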
Two main parameters are important to tune for the @AIM.
Two main parameters are important to tune for the #acr("AIM").
First, the range for the length of the $x$ shift, and especially its lower bound, has an important influence on the generated anomalies.
A small lower bound allows for the generation of anomalous traces that closely resemble the nominal traces, which can result in a sub-optimal threshold placement.
Second, the range parameter for the $y$ shift affects the results in the same way.
@ -621,7 +583,7 @@ The performances are evaluated on the same dataset as for Test Case 1 (see~@exp-
//The performance metric is the F1 score.
The final performance measure is the average F1 score (and its standard deviation) over 30 independent runs.
Each run selects five random normal traces as seeds for the dataset generation.
The training dataset is composed of 100 training traces and 100 evaluation races.
The training dataset is composed of 100 training traces and 100 evaluation traces.
The results are presented in @tab-aim.
#figure(
@ -637,17 +599,17 @@ The results are presented in @tab-aim
),
supplement: [Table],
kind: "table",
caption: [Performances of the @AIM+@BPV model compared with the original @BPV model (average F1 score #sym.plus.minus std.).]
caption: [Performances of the #acr("AIM")+#acr("BPV") model compared with the original #acr("BPV") model (average F1 score #sym.plus.minus std.).]
)<tab-aim>
== Conclusion on the @AIM Model<aim-conclusion>
The @AIM model produces mixed results.
== Conclusion on the #acr("AIM") Model<aim-conclusion>
The #acr("AIM") model produces mixed results.
The model was tuned for the TPLINK-SWITCH machine and produced significantly better results for this machine.
However, the results did not transfer well to the other machines.
Experiments reveal that the parameter values that produce the best results can differ significantly from one machine to another, even for the same type of machine.
The idea of introducing artificial anomalous examples in the training dataset is valid and can indeed enable the creation of a better model.
The idea of introducing artificial anomalous examples in the training dataset is valid and can enable the creation of a better model.
This artificial augmentation of the training set is especially interesting in the context of rare events where creating an extensive dataset is expensive.
However, the lack of transferability of the proposed methods indicates that further work is required to evolve @AIM into an undeniably better solution compared to @BPV.
However, the lack of transferability of the proposed methods indicates that further work is required to evolve #acr("AIM") into an undeniably better solution compared to #acr("BPV").
= Discussion<discussion>
@ -659,8 +621,8 @@ The technology for measuring the current differs depending on the capture box's
For test cases 0 and 3, the box's shunt resistor generates a voltage drop representative of the global power consumption of the machine.
For test cases 1 and 2, a Hall effect sensor returns a voltage proportional to the current.
For both versions, the voltage value is sampled at 10 kSPS.
These samples are packaged in small fixed-size chunks and sent to a data aggregation server on a private @VLAN.
The data aggregation server is responsible for gathering data from all of our capture boxes and sending it via a @VPN tunnel to a storage server.
These samples are packaged in small fixed-size chunks and sent to a data aggregation server on a private #acr("VLAN").
The data aggregation server is responsible for gathering data from all of our capture boxes and sending it via a #acr("VPN") tunnel to a storage server.
Each file on the server contains 10 seconds of power consumption data.
== Extraction of Synchronized Bootup Traces<sds>
@ -677,24 +639,23 @@ The final step of the detection is to store all the boot sequences under the sam
// The complete dataset corresponding to this experiment is available online @dataset.
== Support for Online Training<online>
In order for the @BPV to integrate in a realistic environment, the training procedure takes the rareness of the boot-up event into account.
Once the measurement device is set up on the machine to protect, the streaming time series representing the power consumption serves as input for the bootup detection algorithm (see @sds).
Each bootup event is extracted and added to a dataset of bootup traces.
Once the dataset reaches the expected number of samples, the @BPV computes the threshold and is ready for validation of the next bootup.
In order for the #acr("BPV") to integrate into a realistic environment, the training procedure takes the rarity of the boot-up event into account.
Once the measurement device is set up on the machine to protect, the streaming time series representing the power consumption serves as input for the boot-up detection algorithm (see @sds).
Each boot-up event is extracted and added to a dataset of boot-up traces.
Once the dataset reaches the expected number of samples, the #acr("BPV") computes the threshold and is ready for validation of the next boot-up.
The complete training and validation procedures require no human interaction.
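One possible shape of this online procedure is sketched below, reusing `train_bpv` and `is_anomalous` from the sketch in the detection section; the trace iterator and the parameter names are illustrative, and the multi-modal case discussed next is omitted.

```python
def run_bpv_online(bootup_traces, n_training=10):
    """Consume boot-up traces as they are extracted from the power stream:
    collect the first n_training traces as the nominal baseline, then classify
    every subsequent boot-up against the trained model."""
    training, model = [], None
    for trace in bootup_traces:              # one synchronized trace per detected boot-up
        if model is None:
            training.append(trace)
            if len(training) >= n_training:
                model = train_bpv(training)  # reference trace + distance threshold
        else:
            reference, threshold = model
            yield trace, is_anomalous(trace, reference, threshold)
```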
In the case of a multi-modal model, the training procedure requires one human interaction.
Presented with the bootup samples, an operator can transform the model into a multi-modal model by separating the training samples into multiple modes.
Once the separation is performed, the training procedure resumes without interaction, and the next bootup samples are assigned to the closest mode.
Presented with the boot-up samples, an operator can transform the model into a multi-modal model by separating the training samples into multiple modes.
Once the separation is performed, the training procedure resumes without interaction, and the next boot-up samples are assigned to the closest mode.
Thanks to its low complexity and support for multi-modes, the @BPV can adapt during training to changes in the training data and supports switching between single and multi-modes.
Thanks to its low complexity and support for multiple modes, the #acr("BPV") can adapt during training to changes in the training data and switch between single-mode and multi-modal models.
= Conclusion<conclusion>
This study illustrates the applicability of side-channel analysis to detect firmware attacks.
The proposed side-channel-based @IDS can detect firmware tampering from the power consumption trace.
The proposed side-channel-based #acr("IDS") can detect firmware tampering from the power consumption trace.
Moreover, the distance-based models leveraged in this study have minimal training data requirements.
On a per-machine basis, anomaly generation can enhance the training set without additional anomalous data capture.
Finally, deploying this technology to production networking equipment requires minimal downtime and hardware intrusion, and it is applicable to clientless equipment.
This study illustrates the potential of independent, side-channel-based @IDS for the detection of low-level attacks that can compromise machines even before the operating system gets loaded.
This study illustrates the potential of independent, side-channel-based #acr("IDS") for the detection of low-level attacks that can compromise machines even before the operating system gets loaded.