\chapter{Exploratory Work on Physics-Based Security}\label{chap:pastwork}

The \gls{esg} has a history of power side-channel analysis. In 2017, the \gls{eet} project started with the aim of exploring the intrusion detection capabilities of side-channel analysis. A series of exploratory works on the topic of physics-based defense followed, each illustrating a different capability.

\section{Electromechanical Emission Tripwire}

The \gls{eet} project marked the start of physics-based security at the ESG lab. The project aimed to evaluate the capabilities of physics-based security and provide a proof of concept. The initial target was a network switch.

Network switches are a core component of any data center. As powerful as computers can be, if they are not interconnected, their computing power remains useless. Communication becomes as essential as individual computing capabilities in a data center with hundreds of machines. The failure of a network switch can have devastating consequences for data center operations. Every minute of downtime is costly for the data center and its clients and must be prevented.

\gls{hids} are often not a perfect solution for network switches. Their \gls{os} typically does not support the installation of additional software and may not offer built-in \gls{ids} capabilities. When it does, the security solutions may be weak or rapidly out of date and fail to protect against attacks such as firmware modification~\cite{cisco_trust,thomson_2019} and bypassing secure boot-up~\cite{Cui2013WhenFM, hau_2015}. They also fail to offer effective run-time monitoring through auditing and verifying log entries~\cite{koch2010security}. For these reasons, network switches are prime candidates for side-channel security. The installation of a side-channel monitoring system is often minimally invasive and can even be performed without downtime if the machine supports redundant power supplies.

The aim of the project was to leverage side-channel analysis to detect anomalous activities that can be related to attacks on a network switch. The goal is not to create a complete \gls{ids} suite from physics-based security but to offer a complementary detection mechanism for the cases where traditional \gls{ids} fail.

\subsection{Attack Scenario}

The attack surface of a network switch is large. Every manageable switch has a management system that enables changing the parameters of the machine. This management is typically accessible remotely via \gls{ssh}, Telnet, or HTTP, or locally with a serial connection. At least one of these interfaces is usually available, and they are typically protected with a username/password pair -- although certificate or key authentication may be available for modern interfaces like \gls{ssh}. On top of these intended interfaces, a network switch is also at risk of attacks from the connected clients. A malicious client connected to the switch can run a \gls{MAC} flooding attack or a VLAN hopping attack. An attacker that gains physical access to the machine can also tamper with the firmware (upgrading/downgrading the firmware, uploading malicious firmware) or the hardware configuration of the machine.

We considered the following intrusions: a remote connection via \gls{ssh}, a firmware change, and a hardware change. The remote connection via \gls{ssh} can be legitimate and does not always imply an intrusion. However, this operation can be the first step of a more complex attack.
The network switch logs the connections for later forensics, but there is no mechanism to act on a remote connection in the default \gls{os}. The capability of detecting a remote connection independently of the \gls{os} is valuable in a security pipeline. Moreover, an attacker that gains access to the machine could wipe the logs to cover their tracks. With the detection mechanism isolated from the target machine, the attacker cannot bypass the detection.

A firmware change can also be a legitimate operation. Updating the firmware is now a common capability on many embedded systems. However, if the firmware change was not allowed by the system administrator, then it represents a threat. Downgrading the firmware can re-open older security flaws that have since been documented. Upgrading the firmware without approval can cause disruptions in the machine's operation. Loading a modified version of the firmware can also enable an attacker to forge the firmware version and remain undetected by remote security or monitoring solutions.

Finally, a hardware change is also a security threat. The machine that we considered for the experiment \agd{cite machine model} allows for the installation of additional port modules. Each module expands the port capacity of the machine. Modules can be \textit{hot-plugged} and will apply the default configuration of the machine. Installing a new blade on a machine with a poor default configuration allows an attacker to set up various attacks. For example, if the default configuration does not limit the number of \gls{mac} addresses, a \gls{mac} flooding attack can be performed to access restricted traffic. This last scenario requires physical access to the machine.

\subsection{Host Independence}

One important aspect of the \gls{eet} technology is the independence between the host and the detection machine. In a similar way to a \gls{nids}, the detection system is remote. An attacker with access to the host does not have access to the detection system, which is important for the reliability of the results. In the case of a \gls{hids}, the data are collected by software on the host. Whether these data are analyzed locally or sent to a remote machine makes no difference, as a compromised machine cannot be trusted to send genuine measurements. An attacker with access to the machine can tamper with the measurement process to report nominal values and stay under the radar. This problem is addressed by the \gls{eet} system, as the power consumption of software running on a machine cannot be faked and is difficult to hide. The \gls{eet} system attempts to close the gap between host-\gls{ids} independence and access to relevant information about the machine's activities.

\subsection{Side-Channels}

Two side-channels were initially considered: electrical power consumption and ultrasound emissions. The ultrasound emissions were quickly discarded. When working with sound, the placement of the microphone is important and should be consistent. This is a problem for the deployment of this technology to a variety of machines, as finding the best position for the microphone is complicated. Moreover, the ultrasound measurements did not show the same level of detail as the power consumption.
Power consumption is a popular side-channel for many reasons: it is easy to capture reliably with low-cost equipment, the placement of the capture device has little impact on the results, adding a capture device is often easy as it can be plugged in series with the main power cable of the machine, and it provides a good level of detail about the machine's activity. We measured the power consumption in the form of power traces (time series of power measurements). The capture device was a shunt resistor placed in series with the main power cable that generated a voltage drop proportional to the current (see Figure~\ref{fig:overview-eet1}). We measured this voltage drop at a high frequency, ranging from ten thousand samples per second (10\,kSPS) to one million samples per second (1\,MSPS).

\begin{figure} \centering \includegraphics[width=0.9\textwidth]{images/overview_eet1.pdf} \caption{Overview of the EET setup} \label{fig:overview-eet1} \end{figure}

\subsection{Results}

The detection of remote connections, firmware changes, and hardware changes was successful in all three cases. More specifically, firmware change detection showed the most promising results. The power consumption during the boot process is more stable and less noisy than during runtime. Thanks to this consistency, changes between two firmware versions (see Figure~\ref{fig:eet1_firmware}) are easy to detect with simple methods like \gls{knn}, \gls{svm} and \gls{rfc}. All these methods yield good results for the detection of abnormal firmware.

\begin{figure} \centering \includegraphics[width=0.8\textwidth]{images/eet1_firmware.pdf} \caption{Boot-up sequences for two different firmware versions} \label{fig:eet1_firmware} \end{figure}

\newpage

\section{xPSU}\label{sec:xpsu}

The xPSU project continues the exploratory work started with the \gls{eet} project. One important observation from the \gls{eet} project was that the global power consumption could be too noisy to extract all the relevant information in some cases. One solution to this issue is to measure the power consumption at a lower level, on specific components of interest. The xPSU project aims at placing a power consumption probe and pre-processing system inside a regular \gls{pc}'s \gls{psu}. The \gls{psu} is a prime location for monitoring power as it is responsible for generating the different power sources for the components of the \gls{pc}. Integrating the measurement device in a \gls{psu} enables a \textit{drop-in} installation of the monitoring system in most \glspl{pc}.

The capture mechanism consisted of a shunt resistor for generating the voltage drop, an \gls{adc} for measuring the value, and an \gls{sbc} for compiling and processing or sending the measurements. The measurement and analysis did not require any communication with the host device to ensure independence. The xPSU was an early proof of concept, and not all the components could fit in the \gls{psu}. The fan of the \gls{psu} was moved outside of the enclosure, and the form factor of the \gls{psu} was modified. As a result, the xPSU was not a perfect \textit{drop-in} replacement for a regular power supply, but the final form factor was encouraging. With a better design of the capture system and a more appropriate choice of components (the Raspberry Pi is too large and powerful for the task), a more compact form factor could be achieved.
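To make the capture principle shared by the \gls{eet} and xPSU prototypes concrete, the short Python sketch below converts raw shunt voltage-drop samples into a power trace. The function and values are purely illustrative and assume a known shunt resistance and a roughly constant supply voltage; this is a simplified view, not the exact processing performed by the prototypes.

\begin{verbatim}
import numpy as np

def to_power_trace(shunt_voltages, shunt_ohms, supply_volts):
    """Convert sampled shunt voltage drops into a power trace (in watts).

    shunt_voltages: ADC readings of the voltage drop across the shunt (V).
    Assumes the shunt resistance and the supply voltage are known and that
    the supply voltage stays roughly constant.
    """
    current = np.asarray(shunt_voltages, dtype=float) / shunt_ohms  # I = V / R
    return supply_volts * current                                   # P = V_supply * I

# Example: a 0.1 ohm shunt on a 12 V rail, three samples of the voltage drop.
trace = to_power_trace([0.012, 0.030, 0.025], shunt_ohms=0.1, supply_volts=12.0)
\end{verbatim}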
\begin{figure} \centering \includegraphics[width=0.8\textwidth]{images/xpsu_illustration} \caption{The xPSU focuses on a more granular measurement of each component.} \label{fig:xpsu} \end{figure}

\subsection{Results}

We evaluated the performance of the xPSU on the task of detecting changes in hard drive firmware. We placed the shunt resistor on the 5V cable of the Molex connector. Although it is not an ordinary operation, it is possible to update the firmware of a hard drive. Updates enable attackers to modify the firmware in the same way as presented in the \gls{eet} project previously. We selected drives with a pending firmware update for the experiment and measured their boot power trace before and after the update. We also measured the traces of multiple drives of the same model and capacity to evaluate the detection of a drive replacement. The results were satisfactory and illustrated the possibility of detecting a firmware change or a drive replacement from the boot power consumption of the drive captured from within the \gls{psu}.

\newpage

\section{Boot Process Verifier}\label{sec:bpv}

The good results of the \gls{eet} and xPSU projects paved the way for the development of a robust and versatile solution for verifying the boot process of a machine. From the \gls{eet} project, we learned that modelling the expected trace (based on a number of known good boot traces) enabled the detection of anomalous firmware. From the xPSU project, we learned that most embedded systems requiring firmware exhibit a firmware signature in the power consumption.

The basic idea of the \gls{bpv} is to leverage a small number of known good firmware traces to build a model of normal boot power consumption. With the model, a threshold is automatically computed to describe the acceptable range within which a new boot trace should fall to be considered normal. If a new boot trace falls outside of this range, then it is abnormal, and an alert is raised. The \gls{bpv} is not a tool for finding the root cause of an anomaly. A root cause analysis can be applied later, but the \gls{bpv} is only responsible for detecting the anomaly. The anomaly can result from malicious firmware, a firmware upgrade/downgrade, or a change in firmware settings. The \gls{eet} project also illustrated the potential of simpler distance-based models. A distance-based model was adopted for the \gls{bpv} to keep the maximum explainability of the model's decisions. The \gls{bpv} is an approach to the following problem statement.

\begin{problem-statement}[Boot Process Verification] Given a set of known-valid time series samples $S=\{s_1,\dots, s_n\}$ and a new unlabeled time series $t$, assign to $t$ the label \textit{valid} or \textit{anomalous} with the condition that the \textit{valid} label should only be assigned to new traces originating from the same distribution as the training samples from $S$. \end{problem-statement}

The samples in $S$ and the unlabeled input $t$ are all discretized, real-valued time series of the same length. The training samples in $S$ all belong to the \textit{valid} class. No example of the \textit{anomalous} class is accessible to the algorithm. All samples in $S$ originate from the same distribution as they are different occurrences of boot sequences from the same machine with the same firmware and configuration. The proposed solution was a distance-based detector with a threshold based on the \gls{iqr}, detailed below.
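Before formalizing the distance and threshold, the following minimal Python sketch outlines the training and detection phases of such a detector. It assumes that the boot traces have already been extracted and aligned to the same length; the function and variable names are illustrative and do not correspond to the actual \gls{bpv} implementation.

\begin{verbatim}
import numpy as np

def train_bpv(valid_traces):
    """Fit the reference trace and the IQR-based distance threshold.

    valid_traces: array of shape (n_traces, n_samples) holding known-good
    boot traces of equal length. Sketch of the approach, not the exact code.
    """
    reference = valid_traces.mean(axis=0)                      # pointwise average trace
    dists = np.linalg.norm(valid_traces - reference, axis=1)   # Euclidean distances
    q1, q3 = np.percentile(dists, [25, 75])
    threshold = q3 + 1.5 * (q3 - q1)                           # Q3 + 1.5 * IQR
    return reference, threshold

def check_boot(trace, reference, threshold):
    """Return True if the new boot trace falls within the valid range."""
    return np.linalg.norm(np.asarray(trace) - reference) <= threshold
\end{verbatim}

The components used in this sketch (the Euclidean distance, the \gls{iqr}, and the threshold) are defined more formally in the following paragraphs.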
The distance between two time series of the same length is defined as the Euclidean distance and computed as $d(a,b)^2 = \sum_{i=0}^{N-1}(a[i]-b[i])^2$, with $N$ the length of the series. The \gls{iqr} is a measure of the dispersion of samples that is more robust to outliers than the variance. It is based on the quartiles and defined as $IQR = Q_3 - Q_1$, with $Q_3$ the third quartile and $Q_1$ the first quartile. This value is commonly used~\cite{han2011data} to detect outliers as a more robust alternative to the $3\sigma$ interval of a Gaussian distribution. The training phase begins by computing the pointwise average trace. Then, the \gls{iqr} of the distances from each trace to the average trace is computed. Finally, the distance threshold takes the value $Q_3 + 1.5\times IQR$. In the detection phase, the distance of each new trace to the reference average is computed and compared to the threshold. If the distance is above the pre-computed threshold, the new trace is considered anomalous.

\subsection{Results}

We evaluated the \gls{bpv} on three occasions. First, we assembled a panel of relevant devices, including switches, \glspl{wap} and \glspl{pc}. The evaluations revealed that the \gls{bpv} performed better on simpler devices like switches and \glspl{wap} than on general-purpose computers. This is mainly due to the reduced variability and noise in the traces captured from simpler devices, which produces a more robust model. This first study led to the publication of a work-in-progress paper at the EMSOFT 2022 conference \cite{grisel2022work} that describes the design and capabilities of the \gls{bpv} in its first version.

Then, we performed a case study with an industry partner on an \gls{rtu}. The \gls{rtu} was composed of one low-complexity embedded system and one main general-purpose computer. The computer's activity masked most of the other information in the trace and made it more difficult to detect subtle variations. However, the \gls{bpv} could still detect intrusions in the computer from the global trace. For example, a user modifying some settings through the \gls{bios} or booting into a different \gls{os} was detected. This case study revealed that some systems can have multiple valid modes of the boot sequence. This discovery led us to rethink the model of the \gls{bpv} to allow such variations.

The final evaluation was performed on a drone. A drone is a prime machine for the \gls{bpv} as its low complexity allows for consistent boot traces. We successfully detected different firmware versions by leveraging the lessons from the two previous experiments. Throughout the evaluations, the \gls{bpv} capabilities have been extended to adapt to specific cases and to support anomalous training samples, multi-model evaluations, and autonomous learning.

\newpage

\section{State Detection and Segmentation}

In Section~\ref{sec:bpv}, we mentioned the use of distance metrics on boot power traces to evaluate their validity. However, we never mentioned how these traces were detected, extracted, and synchronized. This problem of pattern detection in a time series is more complex than it seems: the boot sequence may not be known in advance, can take multiple forms, and must still be detected even if an anomalous boot radically changes the pattern. The \gls{sds} algorithm was a first attempt at detecting and extracting boot sequences for the \gls{bpv} to analyze. The algorithm leverages two features common to all (cold) boot sequences: a sharp spike in power consumption and an increase in the average power consumption.
Two thresholds are manually set for the detection. The first is the \textit{off\_threshold}, the power consumption under which the machine is considered off. The second is the \textit{bios\_time}, which represents the time span of the boot procedure. Each sample is considered in turn, and a set of rules is applied to decide on its state among \textit{OFF}, \textit{BOOT} and \textit{ON}.

\begin{figure}[H] \centering \includegraphics[width=\textwidth]{images/sds_illustration} \caption{SDS detection mechanism using the y offset (\textit{off\_threshold}) and the x offset (\textit{bios\_time})} \label{fig:sds_illustration} \end{figure}

\begin{algorithm}[H] \caption{SDS} \label{alg:sds} \begin{algorithmic}[1]
\Require $trace$ the time series of $N$ samples, $off\_threshold$, $bios\_time$.
\State $states \gets array(N)$
\State $boot\_time \gets None$
\For{$i \in [0,\dots, N-1]$}
\State $s \gets trace[i].value$
\State $t \gets trace[i].time$
\If{$s < off\_threshold$}
\State $states[i] \gets OFF$
\Else
\If{$i=0$}
\State $states[i] \gets ON$
\ElsIf{$states[i-1] = OFF$}
\State $states[i] \gets BOOT$
\State $boot\_time \gets t$
\ElsIf{$states[i-1] = ON$}
\State $states[i] \gets ON$
\Else
\If{$t - boot\_time < bios\_time$}
\State $states[i] \gets BOOT$
\Else
\State $states[i] \gets ON$
\EndIf
\EndIf
\EndIf
\EndFor
\end{algorithmic} \end{algorithm}

This simple algorithm makes the \gls{sds} robust and reliable but also limited. The \gls{sds} is an appropriate solution for states that exhibit a change in average consumption and have a pre-defined duration. The detection of consistent and synchronized boot-up sequences fits perfectly in this use case. This consistency and synchrony of the instances are essential for distance-based detectors, which compare these instances. However, for states that cannot be described by a change in average consumption and duration, the \gls{sds} is inadequate. For example, if a machine can perform two runtime operations that generate the same consumption pattern but with different frequencies, then the \gls{sds} cannot distinguish these two states reliably.

These limitations make the \gls{sds} a preliminary work, not a final solution. It highlights that state detection is a complex problem and that the properties of the output need to be taken into account during the design. If the desired output is only the information of the state occurrence, then perfect consistency and synchronization of the extracted segments are not required. If the output is expected to be processed by a follow-up algorithm, and especially if it is distance-based, then the output needs to be much more consistent and synchronized. These considerations reveal a tradeoff between training data and capabilities. The \gls{sds} required no training data except for the two threshold values. This is interesting from a deployment perspective, where machine data can be scarce. It also limits the detection capability, as the \gls{sds} does not look for actual patterns but for single values.

\newpage

\section{Device State Detector}

The \gls{dsd} is the continuation of the \gls{sds}. The algorithm's goal remains the same: detect the machine's state. However, the detection process and the outputs are fundamentally different. The \gls{sds} was built for robustness, ease of training, and consistency. The keywords for the \gls{dsd} would be versatility and range of application.
As one might expect, the synchronization and consistency of the output are not the main focus of the \gls{dsd}; they are traded for a greater versatility of the state detection, at the cost of more training data. The \gls{dsd} fits in a family of problems that are similar but differ by the nature of the data leveraged. Until now, we only took into account the case of the power consumption of a single machine (or single source) captured at a single point (single measure). Other variations of the same problem (multiple sources, multiple measures, ...) will be studied in the next chapter. The \gls{dsd} algorithm is an approach to the following problem statement.

\begin{problem-statement}[Single-Source Single-Measure] Given a discretized time series $t$ and a set of patterns $P=\{\chi, P_1,\dots, P_n\}$, identify a mapping $m_{SSSM}:\mathbb{N}\longrightarrow P$ such that every sample $t[i]$ maps to a pattern in $P$ with the condition that the sample matches an occurrence of the pattern in $t$. \end{problem-statement}

The time series $t: \mathbb{N} \longrightarrow \mathbb{R}$ is a discretized, univariate, real-valued time series. The patterns $P_j \in P\setminus \{\chi\}$ are of the same type as $t$. A sample $t[i]$ \textit{matches} a pattern $P_j \in P\setminus \{\chi\}$ if there exists a subsequence of $t$, of the length of $P_j$, that includes the sample, such that a similarity measure between this subsequence and $P_j$ is below a pre-defined threshold. The pattern $\chi$ is the unknown pattern assigned to the samples in $t$ that do not match any of the $P_j$ patterns.

\begin{figure} \centering \includegraphics[width=0.9\textwidth]{images/dsd_illustration} \caption{Illustration of the DSD input and output.} \label{fig:dsd_illustration} \end{figure}

The core of the algorithm is the \gls{knn} classification. This algorithm is a proven and robust way of labelling new samples based on their relative similarity to the training samples. Although it is a good algorithm for various problems, its application to time series for pattern matching is not obvious. For the rest of the explanation of the \gls{dsd}, we will suppose that the training data consist of one time series per state. These time series each represent one occurrence of a state to detect. One important detail is that each training sample can have a different length, as the states are likely not all of the same duration.

The default way of applying a \gls{knn} classifier for detecting patterns in a long time series would be to iteratively consider slices of the trace corresponding to the length of each training sample. The classifier would then evaluate the distance of each slice to the training sample and normalize this distance by the length to generate comparable values. The state of the closest training sample is assigned to every sample of the slice, and the next slice is considered without overlap. The results of this method are sub-optimal. The stride between each window is too large, and crucial patterns can be overlooked in the trace. Moreover, the whole window is assigned one label, which causes the edges of the states to be inaccurate.

The \gls{dsd} uses a better metric for evaluating the distance between a sample and each state. For each sample and for each state, every window of the length of the state containing the sample is considered: the first window contains the sample at the last position, and the last window contains the sample at the first position.\agd{add figure to explain that} The algorithm computes the distance between each window and the state and normalizes it by the length of the state. Once all the distances are computed, the sample is assigned the closest state, as sketched below.
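The following Python sketch illustrates this per-sample labelling, assuming one reference pattern per state and the Euclidean distance used throughout this chapter. The function and variable names are illustrative, and the no-state refinement described next is omitted.

\begin{verbatim}
import numpy as np

def label_sample(trace, i, states):
    """Assign sample trace[i] to the nearest state pattern.

    states maps a state name to one reference pattern (a 1-D array);
    patterns may have different lengths. For each state, every window of
    that length containing position i is compared to the pattern, and the
    Euclidean distance is normalized by the pattern length.
    """
    trace = np.asarray(trace, dtype=float)
    best_state, best_dist = None, np.inf
    for name, pattern in states.items():
        pattern = np.asarray(pattern, dtype=float)
        length = len(pattern)
        # Windows trace[j:j+length] that contain index i, clipped to the trace.
        first = max(0, i - length + 1)
        last = min(i, len(trace) - length)
        for j in range(first, last + 1):
            dist = np.linalg.norm(trace[j:j + length] - pattern) / length
            if dist < best_dist:
                best_state, best_dist = name, dist
    return best_state

def label_trace(trace, states):
    """Label every sample of the trace, which is the DSD output."""
    return [label_sample(trace, i, states) for i in range(len(trace))]
\end{verbatim}

In practice, the windowed distances can be computed more efficiently with vectorized sliding-window operations; the naive loops are kept here for clarity.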
This method naturally segments the state space into areas whose borders represent a mid-point between two states.\agd{figure} We refined the method by introducing a coefficient that shrinks the capture area of each state. The emerging area corresponding to no state allows for the detection of unseen states. This method retains the low complexity of a distance-based \gls{knn} algorithm while yielding better accuracy, especially around state transitions. The \gls{dsd} was designed for one-shot classification, but the multi-shot version is naturally accessible by adding more training examples and going from a 1-NN to a K-NN.

Two metrics represent the performance of the \gls{dsd} and of any other algorithm for the same problem. First, the accuracy is computed as the number of correct labels over the total number of labels to predict. This metric is common and gives an overview of the performance that is comparable with a random baseline. However, knowledge of the specific applications that the \gls{dsd} is designed for allows for the definition of a complementary metric. The label of each sample makes the time series actionable. Other algorithms down the processing pipeline can evaluate the sequence of states detected for a machine in order to decide on the integrity of the machine. In this regard, a labeling error can have a different impact depending on its location. More specifically, a single error at the transition between two states would result in a slight timing error for the state transition detection. However, a single error in the middle of a series of identical labels would result in the detection of a new incorrect state, potentially triggering actions down the line. These two errors have the same impact on the accuracy. This illustrates that the accuracy does not give the complete picture. To evaluate the state detection at a higher level, the Levenshtein distance of the reduced labels is defined. The reduced labels are the vector of labels in which every run of identical labels is represented by a single symbol. The normalized state edit distance is defined as
\begin{equation}
nsed(truth,preds) = \dfrac{Lev(reduced(truth),reduced(preds))}{\max(|reduced(truth)|,|reduced(preds)|)}
\end{equation}
with $Lev$ the Levenshtein distance and $|\cdot|$ the length of a label sequence. This metric is complementary to the accuracy and will be computed for every evaluation of the state detection algorithms.
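A short Python sketch of this metric is shown below, assuming the labels are given as plain Python sequences; the Levenshtein implementation is the textbook dynamic-programming version and the example values are illustrative.

\begin{verbatim}
from itertools import groupby

def reduced(labels):
    """Collapse every run of identical labels into a single symbol."""
    return [k for k, _ in groupby(labels)]

def levenshtein(a, b):
    """Dynamic-programming Levenshtein (edit) distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def nsed(truth, preds):
    """Normalized state edit distance between two label sequences."""
    rt, rp = reduced(truth), reduced(preds)
    return levenshtein(rt, rp) / max(len(rt), len(rp))

# Example: one mislabeled sample in the middle creates an extra state.
print(nsed(["OFF", "OFF", "BOOT", "BOOT", "ON"],
           ["OFF", "OFF", "ON",   "BOOT", "ON"]))  # -> 0.25
\end{verbatim}

In this example, a single mislabeled sample inside a run creates an extra state in the reduced labels and is penalized by the metric, even though the accuracy only drops from 1.0 to 0.8.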
\newpage

\section{Conclusion on Past Work}

The project of physics-based security at a global level is not trivial. The main hurdle is the extraction of information under a dual constraint: only unlabeled and partial information (the power consumption) is collected, and there is no control over the machine's activity (host independence). However, these constraints are also the strengths of this approach. The power consumption is a limited but reliable source of information, as it is very difficult to forge. It is up to the algorithm to extract as much information as possible from it. The independence is also important as it guarantees that an attacker cannot bypass the detection mechanism. With these constraints in mind, the current results demonstrate great potential.

The \gls{bpv} and \gls{dsd} algorithms tackle the problems of boot process integrity and runtime activity monitoring. These two complementary aspects cover a large area of the attack surface of a typical embedded system. The unique properties of host independence and unforgeability of the input data make the physics-based \gls{ids} a promising complement to any security suite. More work is obviously required. The main ongoing effort is to evaluate the performance of the \gls{dsd} and make it as versatile and reliable as possible. From the xPSU project, we understood that a more granular measurement of the power consumption could be beneficial in detecting specific attacks and enabling root cause analysis instead of basic anomaly detection. The continuation of the research work will focus on runtime monitoring and investigate the data measurement scales and their respective benefits for a more effective detection mechanism.