start adding comments from reviewer

2023-09-28 05:27:30 -04:00 · 2023-09-28 05:27:30 -04:00 · f8df0543b3
commit f8df0543b3
parent 40c54e53a7
3 changed files with 508 additions and 22 deletions
--- a/BPV/qrs/bibli.bib
+++ b/BPV/qrs/bibli.bib
@ -351,6 +351,11 @@ keywords = {Industrial control systems, Programmable logic controllers, Firmware
  year = {2022},
 }

+@misc{palitronica, title = {Palitronica - Palisade},
+  howpublished = {https://www.palitronica.com/products/palisade},
+  note = {Accessed: 2010-03-26}
+}
+
@INPROCEEDINGS{blockchain1,
  author={Dhakal, Samip and Jaafar, Fehmi and Zavarsky, Pavol},
  booktitle={2019 IEEE 19th International Symposium on High Assurance Systems Engineering (HASE)}, 
@ -600,4 +605,4 @@ series = {IoTPTS '16}
      eprint={1707.02968},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
-}
+}
--- a/BPV/qrs/images/illustration.svg
+++ b/BPV/qrs/images/illustration.svg
--- a/BPV/qrs/main.typ
+++ b/BPV/qrs/main.typ
@ -33,7 +33,7 @@
      email: "sfischme@uwaterloo.ca",
    ),
  ),
-  anon: true,
+  anon: false,
  index-terms: (),
  bibliography-file: "bibli.bib",
 )
@ -111,7 +111,7 @@ It requires minimal training examples and minor hardware modifications in most c
 == Paper Organization
 We elaborate on the type of attacks that our method aims to mitigate in the threat model @threat.
@bpv describes the proposed solution.
-@exp-network,~@exp-drone, and~@aim present test cases that illustrate applications and variations of the #acr("BPV").
+@test-cases presents the four test cases that illustrate applications and variations of the #acr("BPV").
 Finally, @discussion provides more insights on specific aspects of the proposed solution, and @conclusion concludes the paper.


@ -214,6 +214,14 @@ The training time series in $T$ are discretized, mono-variate, and real-valued t
 The length of the captured time series is a parameter of the detector, tuned for each machine.
 The number of training time series $N$ is considered small relative to the usual size of training datasets in time series classification problems @sun2017revisiting.
 All time series considered in this problem ($T union u$) are of the same length and synchronized at capture time; see @sds for more details about the synchronization process.
+@overview presents an overview of the #acr("BPV") implementation on power traces to detect abnormal firmware.
+
+
+#figure(
+  placement: auto,
+  image("images/illustration.svg", width:100%),
+  caption: [Overview of the detection process.]
+)<overview>

 == Detection Models<detector>
 The #acr("BPV") performs classification of the boot traces using a distance-based detector and a threshold.
@ -249,7 +257,7 @@ The Euclidean distance is computed between each trace and the reference, and the
 The distance threshold takes the value $1.5 times "IQR"$. For the detection, the distance of each new trace to the reference is computed and compared to the threshold.
 The new trace is considered anomalous if the distance is above the threshold.

-=== Support For Multi-modal Boot-up Sequences<multi-modal>
+== Support For Multi-modal Boot-up Sequences<multi-modal>
 Some machines can boot following multiple different boot-up sequences that are considered normal.
 There exist various reasons for such behaviour.
 For example, a machine can perform recovery operations if the power is interrupted while the machine is off, or perform health checks on components that may pass or fail and trigger deeper inspection procedures.
@ -266,11 +274,18 @@ Each point represents the distance from one training sample to the average trace
 The vertical dashed lines represent the distance threshold.

 #figure(
+  placement: auto,
  image("images/training.svg", width:100%),
  caption: [BPV model trained with two modes.]
 )<fig-modes>
+= Test Cases<test-cases>

-= Test Case 0: General Purpose Computer
+Each test case aims at evaluating the #acr("BPV") under different scenarios.
+The machine selected for each test case represents a domain where firmware attacks, recorded or theorized, could lead to significant damage.
+The list of selected devices is not exhaustive as any firmware is a potential point of attack.
+
+
+== Test Case 0: General Purpose Computer

 This test case illustrates the first application of the #acr("BPV") and follows a slightly different setup and assumptions.
 First, the power consumption measurement does not only contain the consumption of the machine to protect.
@ -285,7 +300,7 @@ Because the machine and the expected attacks are known in advance, it is possibl
 Because of these two specificities, this test case should be regarded as a first iteration to demonstrate the potential of the #acr("BPV") in a more restrictive environment.
 The following test cases in @exp-network and @exp-drone present other applications in more challenging environments.

-== Experimental Setup
+=== Experimental Setup
 This test case was conducted on a micro PC running Windows 8.
 The available power consumption measurement was an aggregate of two micro PCs, one being the machine to protect.
 The second machine remained idle for the duration of the experiment.
@ -302,19 +317,20 @@ See @multi-modal for more details about how multi-modal models are defined.
@l3-training illustrates the power traces associated with each mode.

 #figure(
+  placement: auto,
  image("images/l3-training.svg", width:100%),
  caption: [Multi-Modal BPV model after training.]
 )<l3-training>

 After collecting training traces, the distribution of samples in each model was $(0.31,0.06,0.62)$.
 This distribution remains purely circumstantial from the point of view of the detector that considers the machine to protect as a black box.
-The root causes for the appearance of one boot-up mode, or another is outside the scope of this work.
+The root causes for the appearance of one boot-up mode or another are outside the scope of this work.
 The final training dataset comprises 93 training samples divided into three models following the above distribution.

 Abnormal boot-up traces are also collected.
 The abnormal boot sequences are composed of sequences where an operator went into the #acr("BIOS") and then continued booting into the #acr("OS").

-== Results
+=== Results
 The models are manually tuned to obtain 100% accuracy in the classification of nominal and abnormal boot sequences.
 Obtaining 100% accuracy illustrates that there is a clear separation between nominal and abnormal boot sequences for this type of attack.

@ -324,7 +340,7 @@ The #acr("BPV") detected the pre-defined attack with complete independence from
 Having access to anomalous samples enabled us to optimize the threshold placement to minimize false positives (nominal boot-ups detected as anomalous) by relaxing the threshold value.


-= Test Case 1: Network Devices<exp-network>
+== Test Case 1: Network Devices<exp-network>
 To verify the performance of the proposed detector, we design an experiment that aims at detecting firmware modifications on different devices.
 Networking devices are a vital component of any organization, from individual houses to complete data centers.
 A network failure can result in significant downtime that is extremely expensive for data centers @downtime.
@ -341,7 +357,7 @@ None of the selected devices support the installation of host-based #acr("IDS")
 The firmware is verified only during updates with a proprietary mechanism.
 This experiment illustrates the firmware verification capability of a side-channel #acr("IDS") for these machines where common #acr("IDS") may not be applicable.

-== Experimental Setup<setup>
+=== Experimental Setup<setup>
 Although this experiment is conducted in a controlled environment, the setup is representative of a real deployment (see @capture for more details).
 We gather data from the four networking devices, which are connected to a managed #acr("PDU") (see @capture for more details).
 This #acr("PDU")'s outputs can be controlled by sending instructions over a telnet interface, which enables turning each machine on or off automatically.
@ -350,6 +366,7 @@ The changes are listed in @tab-machines.

 #v(10pt)
 #figure(
+  placement: auto,
  tablex(
    columns: (25%,25%,25%,25%),
    align: (left+horizon,right+horizon,right+horizon,right+horizon),
@ -373,7 +390,7 @@ For wireless routers, their firmware is changed from the #acr("OEM") to differen
 In this study, we consider the latest #acr("OEM") firmware version to be the nominal version, expected to be installed on the machine by default.
 Any other version or firmware represents an attack and is considered anomalous.

-== Experiment procedure
+=== Experiment procedure
 To account for randomness and gather representative boot-up sequences of the device, we performed 500 boot iterations for each machine.
 This cannot reasonably be performed manually with consistency.
 Therefore, an automation script controls the #acr("PDU") with precise timings to perform the boots without human intervention.
@ -385,7 +402,7 @@ Overall, they all follow the same template:
 + Wait for a predetermined time for the target to boot up completely.
 + Turn OFF the power to the machine and wait for a few seconds to ensure proper shutdown of the machine.
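
The power-cycling steps of this template can be sketched as a small loop. This is only a minimal sketch under stated assumptions, not the script used in the experiment: `send_command`, the `on`/`off` command strings, and all parameter names are hypothetical, since the command set of the #acr("PDU") is not described here. The sketch covers only the power control and the waiting periods.

```python
import time
from typing import Callable


def run_boot_cycles(
    send_command: Callable[[str], None],
    outlet: int,
    iterations: int,
    boot_wait_s: float,
    off_wait_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Power-cycle one PDU outlet `iterations` times.

    `send_command` is a hypothetical callable that forwards one text
    instruction to the PDU (for example over its telnet interface).
    The "on N" / "off N" strings are placeholders, not a real command set.
    """
    for _ in range(iterations):
        # Turn the outlet on and let the target boot completely.
        send_command(f"on {outlet}")
        sleep(boot_wait_s)
        # Cut the power and wait so the machine is fully shut down.
        send_command(f"off {outlet}")
        sleep(off_wait_s)
```

Passing the `sleep` function as a parameter keeps the timing logic testable without waiting for real boots.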

-== Results<results>
+=== Results<results>
 We obtain the result per machine and per model.
 The evaluation is performed on actual anomalous traces collected in a controlled environment.
 For each evaluation, a random set of $10$ consecutive traces is selected from the nominal label to serve as the seed for the anomaly generation.
@ -399,6 +416,7 @@ The results are presented in @tab-results.


 #figure(
+  placement: auto,
  tablex(
    columns: (40%,40%),
    auto-vlines: false,
@ -427,7 +445,7 @@ Finally, tuning the sampling rate is important to ensure the best performances.
 A machine that boots up in two seconds requires a higher sampling rate than a machine that boots up in thirty seconds.
 All these parameters are machine-specific and need manual tuning before deployment of the #acr("BPV").

-= Test Case 2: Drone<exp-drone>
+== Test Case 2: Drone<exp-drone>
 In this case study, we demonstrate the potential of physics-based #acr("IDS") for drones.
 Drones are not new, but their usage both in the consumer and professional sectors increased significantly in recent years @droneincrease.
 The core component of consumer-available drones is usually a microcontroller, also called a flight controller.
@ -437,7 +455,7 @@ With custom firmware uploaded to a drone, many attack possibilities become acces
 Moreover, flight controllers are specialized devices that usually do not support the installation of third-party security software nor provide advanced security features such as cryptographic verification of the firmware.
 With drone usage soaring and the lack of security solutions, the problem of verifying their firmware against anomalies becomes important.

-== Experimental Setup
+=== Experimental Setup
 The experimental setup for this case study is similar to the one presented in @exp-network.
 The experiment focuses on the Spiri Mu drone #footnote[#link("https://spirirobotics.com/products/spiri-mu/")] flashed with the PX4 Drone Autopilot firmware #footnote[#link("https://px4.io/")].
 The firmware for the flight controller consists of a microprocessor-specific bootloader, a second-stage bootloader common to all supported flight controllers, and the operating system composed of different modules.
@ -452,6 +470,7 @@ The experiment scenarios are:
 - *Malfunctioning Firmware:* Two malfunctioning firmware versions were compiled. The first introduces a _division-by-zero_ bug in the second stage bootloader. The second introduces the same bug but in the battery management module (in the OS part of the firmware). The second scenario should not introduce measurable anomalous patterns in the boot-up sequence as it only affects the OS stage.

 #figure(
+  placement:top,
  image("images/drone-overlaps.svg", width: 100%),
  caption: [Overlap of boot-up traces for different scenarios and their average. Green = Low Battery (8 traces + average), Purple = Battery Module Bug (8 traces + average), Orange = Bootloader Bug (6 traces + average).]
 )
@ -461,7 +480,7 @@ The experiment consists in repeating each scenario between 40 and 100 times.
 The experiment procedure automatically captures boot-up traces for better reproducibility (see @sds for more details).

 #block(breakable:false)[
-== Results
+=== Results

 #figure(
  tablex(
@ -502,7 +521,7 @@ This suggests that future work could achieve an even lower time-to-decision, lik
 // == Results


-= Specific Case Study: Anomaly Infused Model<aim>
+== Specific Case Study: Anomaly Infused Model<aim>
 #reset-acronym("AIM")
 When training a model to detect outliers, it is often expected to have examples of possible anomalies.
 In some cases, gathering anomalies can be difficult, costly, or impossible.
@ -518,7 +537,7 @@ This implies that a distance-based detector that relies on a distance threshold
 The idea behind an #acr("AIM") is to leverage this property and generate artificial anomalous traces to form the training set.
 The additional anomalous traces are generated using only normal traces, which circumvents the need for extensive data collection.

-== Anomaly Generation
+=== Anomaly Generation
 The generation of anomalies from normal traces is based on the modification of the boot-up pattern.
 Data augmentation can leverage different time series modification methods to help a model generalize.
 The kind of modification applied to a trace is highly dependent on the application and the model @zimmering2021generating and requires domain knowledge about the system.
@ -528,6 +547,7 @@ The type of modification an anomalous trace presents compared to a normal trace
 The goal is not to reproduce exact anomalous traces but to generate a wide variety of possible anomalous traces given a small set of normal traces.

 #figure(
+  placement: auto,
  image("images/Bootup_traces_TPLINK.svg", width: 100%),
  caption: [
    Example of TP-Link switch boot-up traces for different firmware versions. The anomalous firmware (FIRMWARE V2) presents both a $y$ and $x$ shift.
@ -550,6 +570,7 @@ The possible transformations are:
 - Shifting both the $x$ and $y$ axes. Anomalous traces always present an $x$ shift, a $y$ shift, or both.
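
The shift transformations can be illustrated with a short sketch. The function name `shift_trace`, its parameters, and the choice to pad with the edge value after an $x$ shift are assumptions made for illustration; the shift magnitudes used in the study are not reproduced here.

```python
import numpy as np


def shift_trace(trace, x_shift=0, y_shift=0.0):
    """Return a copy of `trace` shifted in time (x) and amplitude (y).

    x_shift: number of samples to delay (positive) or advance (negative).
             Samples uncovered by the shift are filled with the edge value
             (an assumption of this sketch).
    y_shift: constant offset added to every sample.
    """
    trace = np.asarray(trace, dtype=float)
    n = len(trace)
    if abs(x_shift) >= n:
        raise ValueError("x_shift must be smaller than the trace length")

    shifted = np.empty_like(trace)
    if x_shift > 0:
        shifted[:x_shift] = trace[0]            # pad the start
        shifted[x_shift:] = trace[: n - x_shift]
    elif x_shift < 0:
        shifted[x_shift:] = trace[-1]           # pad the end
        shifted[:x_shift] = trace[-x_shift:]
    else:
        shifted[:] = trace
    return shifted + y_shift
```

Calling `shift_trace(normal_trace, x_shift=k, y_shift=d)` with several values of `k` and `d` yields a variety of synthetic anomalous traces from one normal trace.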

 #figure(
+  placement: auto,
  image("images/schematic.svg", width: 100%),
  caption: [Overview of the #acr("BPV") model training and evaluation.],  
 )<fig-overview>
@ -562,11 +583,12 @@ To avoid introducing training biases, the dataset is balanced by generating new


 #figure(
+  placement: auto,
  image("images/Synthetic_vs_Normal_TPLINK.svg", width: 100%),
  caption: [Example of generated anomalous traces compared with captured normal traces for TP-Link switch.],
 )<fig-Synthetic_vs_Normal_TPLINK>

-== Results
+=== Results
 A benchmarking algorithm evaluates the performances of #acr("AIM") against the performances of the original #acr("BPV") trained with only normal traces.
 #acr("AIM") places the threshold to maximize the margins to the closest normal distance and abnormal distance in the same way a 1D-#acr("SVM") would.
 This is a natural extension of the #acr("BPV") when abnormal samples are available.
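
For the separable case, this margin-based placement can be sketched as follows. The function name `margin_threshold` is hypothetical, and the sketch assumes that every abnormal distance is larger than every normal distance; how overlapping distances are handled is not covered.

```python
def margin_threshold(normal_distances, abnormal_distances):
    """Place the threshold halfway between the two closest samples.

    With one-dimensional, separable data, the maximum-margin boundary is
    the midpoint between the largest normal distance and the smallest
    abnormal distance.
    """
    closest_normal = max(normal_distances)
    closest_abnormal = min(abnormal_distances)
    if closest_normal >= closest_abnormal:
        # Overlapping classes need a different rule; out of scope here.
        raise ValueError("normal and abnormal distances overlap")
    return (closest_normal + closest_abnormal) / 2.0
```

A new trace would then be flagged as anomalous when its distance to the reference exceeds the returned value.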
@ -585,6 +607,7 @@ The training dataset is composed of 100 training traces and 100 evaluation trace
 The results are presented in @tab-aim.

 #figure(
+  placement: auto,
  tablex(
    auto-vlines: false,
    align: (left, right, right),
@ -600,7 +623,7 @@ The results are presented in @tab-aim
  caption: [Performances of the #acr("AIM")+#acr("BPV") model compared with the original #acr("BPV") model (average F1 score #sym.plus.minus std.).]
 )<tab-aim>

-== Conclusion on the #acr("AIM") Model<aim-conclusion>
+=== Conclusion on the #acr("AIM") Model<aim-conclusion>
 The #acr("AIM") model produces mixed results.
 The model was tuned for the TPLINK-SWITCH machine and produced significantly better results for this machine.
 However, the results did not transfer well to the other machines.
@ -614,7 +637,7 @@ However, the lack of transferability of the proposed methods indicates that furt
 This section elaborates on some important aspects of this study.

 == Capture Process<capture>
-We use a hardware device referred to as the capture box @hidden placed in series with the primary power cable of the target device.
+We use a hardware device referred to as the capture box @palitronica placed in series with the primary power cable of the target device.
 The technology for measuring the current differs depending on the capture box's version.
 For test cases 0 and 3, the box's shunt resistor generates a voltage drop representative of the global power consumption of the machine.
 For test cases 1 and 2, a Hall effect sensor returns a voltage proportional to the current.
@ -656,5 +679,3 @@ Moreover, distance-based models leveraged in this study allow minimal training d
 On a per-machine basis, anomaly generation can enhance the training set without additional anomalous data capture.
 Finally, deploying this technology to production networking equipment requires minimal downtime and hardware intrusion, and it is applicable to clientless equipment.
 This study illustrates the potential of independent, side-channel-based #acr("IDS") for the detection of low-level attacks that can compromise machines even before the operating system gets loaded.
-
-