clem review + latex-checker

Arthur Grisel-Davy 2023-06-15 15:14:56 -04:00
parent 8fac5379f2
commit 669a11bfff
2 changed files with 54 additions and 50 deletions

@ -88,9 +88,8 @@ anon@anonymous.nw}
Side-channel emissions provide an independent and extrinsic source of information about the system, purely based on the physical by-product of its activities.
Leveraging side-channel information, we propose a physics-based \gls{ids} as an additional layer of protection for embedded systems.
The physics-based \gls{ids} uses machine-learning-based power analysis to monitor and assess the behaviour and integrity of network equipment.
%The proposed \gls{ids} offers complementary intrusion detection for an HP Procurve Network Switch 5406zl, using its power consumption as side-channel emissions.
The \gls{ids} successfully detect three different classes of attacks on an HP Procurve Network Switch 5406zl: (i)~firmware manipulation with \numprint[\%]{99} accuracy, (ii)~brute-force SSH login attempts with \numprint[\%]{98} accuracy, and (iii)~hardware tampering with \numprint[\%]{100} accuracy.
The \gls{ids} successfully detects three different classes of attacks on an HP Procurve Network Switch 5406zl: (i)~firmware manipulation with \numprint[\%]{99} accuracy, (ii)~brute-force SSH login attempts with \numprint[\%]{98} accuracy, and (iii)~hardware tampering with \numprint[\%]{100} accuracy.
The machine-learning models require a small number of power traces for training and still achieve a high accuracy for attack detection.
The concepts and techniques discussed in the paper can also extend to offer intrusion detection for embedded systems in general.
@ -111,7 +110,7 @@ To deter cases of cyberattacks, data centers often use \gls{ids}.
Current \glspl{ids} use different approaches to detect intrusions.
\glspl{hids} are implemented directly on the monitored device and leverage information provided by the system to detect intrusions.
\glspl{nids} leverage network information to detect intrusions at the network level.
Although \glspl{hids} and \glspl{nids} offer intrusion detection capabilities, they are still quite ineffective against attacks such as firmware modification~\cite{cisco_trust,thomson_2019}, bypassing secure boot-up~\cite{Cui2013WhenFM, hau_2015}, log tampering~\cite{koch2010security}, or hardware tampering\cite{rohatgi2009electromagnetic}.
Although \glspl{hids} and \glspl{nids} offer intrusion detection capabilities, they are still ineffective against attacks such as firmware modification~\cite{cisco_trust,thomson_2019}, bypassing secure boot-up~\cite{Cui2013WhenFM, hau_2015}, log tampering~\cite{koch2010security}, or hardware tampering~\cite{rohatgi2009electromagnetic}.
The literature shows promising work in improving the state-of-the-art in security by analyzing side-channel emissions from embedded systems.
Systems generate side-channel emissions, which usually reflect their activity in the form of power consumption \cite{kocher1999differential, brier2004correlation, Moreno2018}, electromagnetic waves \cite{khan2019malware, sehatbakhsh2019remote}, acoustic emissions \cite{genkin2014rsa, liuacoustic}, etc.
@ -120,6 +119,7 @@ The \gls{ids} uses \gls{dsp} and \gls{ml} to detect anomalies or recognize patte
Thus, using this \gls{ids} would improve the security of the embedded system by detecting attacks that regular \glspl{ids} fail to identify.
\subsection{Contributions}
This paper proposes a side-channel-based \gls{ids} that can complement existing \glspl{ids} and improve security for embedded systems.
The side-channel based \gls{ids} can potentially protect any embedded system treated as a black box and detect a range of attacks against it.
Our \gls{ids} is deployed on an HP Procurve 5406zl network switch as a black box.
@ -133,9 +133,9 @@ The side-channel based \gls{ids} achieves near-perfect accuracy scores despite u
The paper is organized as follows:
Section~\ref{sec:Overview} provides an overview of the motivation for the experiments and threat model.
Section~\ref{Related Work} describes other side-channel-based approaches for runtime monitoring and integrity assessment.
Section~\ref{Firmware} covers experiments related to firmware manipulation,
Section~\ref{RunTime} covers log verification and auditing,
and Section~\ref{Hardware} covers hardware tampering.
Section~\ref{Firmware} describes experiments related to firmware manipulation,
Section~\ref{RunTime} describes log verification and auditing,
and Section~\ref{Hardware} describes hardware tampering.
The paper concludes in Sections~\ref{Discussion} and~\ref{Conclusion}.
\section{Overview}
@ -143,13 +143,13 @@ The paper concludes in Sections~\ref{Discussion} and ~\ref{Conclusion}.
All embedded systems leak information about their operation through side-channel emissions.
Side-channel-based \glspl{ids} use \gls{dsp} methods and \gls{ml} algorithms to model the side-channel data and learn patterns that correlate to the system activity.
An important part of designing a reliable side-channel \gls{ids} is identifying appropriate side-channel emissions among temperature, vibration, ultrasound, EM, power consumption, etc.
An important part of designing a reliable side-channel \gls{ids} is identifying appropriate side-channel emissions among temperature, vibration, ultrasound, \gls{em}, power consumption, etc.
Our experiments focus on the power consumption.
Power consumption is reasonably easy to measure non-intrusively and reliably.
Side-channel-based \gls{ids} can complement \gls{hids} and \gls{nids} in offering runtime monitoring and integrity assessment for embedded systems, as shown in Table~\ref{tab:example}.
Side-channel-based \glspl{ids} run independently from the system they monitor, which makes them more difficult to circumvent compared to \gls{ids} hosted by the system.
This independence is also beneficial in case of a malfunction of the \gls{ids}, which can not disrupt the regular operation of the system.
This independence is also beneficial in case of a malfunction of the \gls{ids}, which cannot disrupt the regular operation of the system.
\begin{table}[htb]
@ -177,7 +177,7 @@ This independence is also beneficial in case of a malfunction of the \gls{ids},
\end{tabularx}
\caption{Attack scenarios that side-channel based \gls{ids} can detect}
\caption{Attack scenarios that side-channel based \gls{ids} can detect.}
\label{tab:example}
\end{table}
@ -208,7 +208,7 @@ This could be done with the purpose of keeping a particular vulnerability in the
\subsection{Analysis of Side-channels}
Electronic systems, including embedded devices, involuntarily leak information through different side channels.
Due to each side channel's specific nature, some are more useful for different applications.
Due to each side channel's specific nature, some are better for different applications.
In the context of \gls{ids} for network equipment, we considered power consumption, ultrasound and \gls{em} emissions.
After initial tests, power consumption proved to provide the most information about the system state relative to the practicality of measurement.
@ -225,13 +225,13 @@ However, its \gls{snr} is lower compared to the \gls{dc} measurement because the
\label{Related Work}
The idea of side-channel based \gls{ids} traces back to the seminal work in side-channel analysis by Paul C. Kocher.
He introduced Differential Power Analysis to find secret keys used by cryptographic protocols in tamper-resistant devices~\cite{kocher1999differential}.
This led to a field of research focussing on side-channel analysis that has been growing since. Power analysis is the most common and widely studied side-channel analysis technique~\cite{brier2004correlation,mangard2008power}. %new citations%
Cagalj et al.~\cite{vcagalj2014timing} shows a successful passive side-channel timing attack on U.S. patent Mod 10 method and Hopper-Blum (HB) protocol.
This led to a field of research focusing on side-channel analysis that has been growing since. Power analysis is the most common and widely studied side-channel analysis technique~\cite{brier2004correlation,mangard2008power}. %new citations%
Cagalj et al.~\cite{vcagalj2014timing} show a successful passive side-channel timing attack on U.S. patent Mod 10 method and Hopper-Blum (HB) protocol.
%Quisquater et al.~\cite{quisquater2002automatic} present an approach to identify executed instructions with the use of self-organizing maps, power analysis and analysis of electromagnetic traces. %new citations%
Zhai et al.~\cite{zhai2015method} propose a self-organizing maps approach that uses features extracted from an embedded processor to detect abnormal behaviour in embedded devices.
%Eisenbarth et al.~\cite{eisenbarth2010building} propose a methodology for recovering the instruction flow of microcontrollers using its power consumption.
Goldack et al.~\cite{goldack2008side} propose a solution to identify individual instructions on a PIC microcontroller by mapping each instruction type to a power consumption template.
However, the attack focussed side-channel analysis can offer non-intrusive runtime monitoring, as well. \\
However, side-channel analysis developed for attacks can also offer non-intrusive runtime monitoring. \\
\indent
Literature shows promising work in assessing integrity through power monitoring.%~\cite{10.1145/2976749.2978299}.
Works by Moreno et al. offer two building blocks for this work.
@ -241,14 +241,13 @@ The team builds on their previous technique and presents a new one~\cite{Moreno2
%They use a signals and systems analysis approach to identify anomalies using the power consumption of a system and showcase this by identifying buffer overflow attacks on their system.
Msgna et al.~\cite{msgna2014verifying} propose a technique for using the instruction-level power consumption of a system to verify the integrity of the software components of a system with no prior knowledge of the software code.
Grisel-Davy et al.~\cite{grisel2022work} propose the verification of the boot process of various embedded systems using their power consumption signature.
%In~\cite{kur2009improving}, Kur et al. perform power analysis of smart cards based on the JavaCard platform help identify vulnerable operations, obtain bytecode instruction information, and also propose a framework to replace vulnerable operations with safe alternatives.\\
\indent
In more recent literature, there is a trend towards the use of \gls{ml} for side-channel analysis to enhance the security of systems.
Michele Giovanni Calvi~\cite{calvi2019runtime} offers a solution for runtime monitoring of an entire cyberphysical system treated as a black box.
They collect data from a self-driving car during operations such as steering and acceleration.
Using this data, they train a Long Short Term Memory~\cite{hochreiter1997long} deep learning model and use it to verify the safety of the vehicle. %new citations%
Zhengbing et al. \cite{4488501} suggest the use of forensic techniques for profiling user behaviour to detect intrusions and propose an intelligent lightweight \gls{ids}. Hanilçi et al.~\cite{hanilci2011recognition} use recorded speech from a cell phone to ascertain the cell phone brand and model through using vector quantization and \gls{svm} models on the \gls{mfcc} of the audio.
In~\cite{khan2019malware} Khan et al. propose a technique to identify malware in critical embedded and cyberphysical systems using \gls{em} side channel signals.
In~\cite{khan2019malware}, Khan et al. propose a technique to identify malware in critical embedded and cyberphysical systems using \gls{em} side channel signals.
Their technique uses deep learning on EM emanation to model the behaviour of an uncompromised system.
The system flags an activity as anomalous when the emanations differ from the normal ones used to train the neural network.
Sehatbakhsh et al.~\cite{sehatbakhsh2019remote} also use EM emanations and detect malware code injection into a known application without any prior knowledge of the malware signature.
@ -270,8 +269,8 @@ The ability to modify the firmware enables attackers to perform a range of other
The following two experiments were conducted with ten official firmware versions using the same device configuration.
Starting from the pre-installed version K.15.06.008, we performed upgrades to the next ten higher release versions (K.15.07 to K.15.17) and picked the final build for each release.
\subsubsection{Feature Engineering}
\label{FE-Firmware}
\subsubsection{Feature Engineering}\label{FE-Firmware}
With the HP Procurve Switch 5406zl taking around 120 seconds to complete its boot-up sequence, this experiment family produces the largest datasets of this case study.
Therefore, several preprocessing steps were applied to reduce the size of the datasets and remove noise.
A combination of downsampling and a sliding median filter yields the best results at a minimal size per training set.
@ -296,7 +295,7 @@ Figure~\ref{fig:firmwares} illustrates the captured data for two different firmw
\begin{subfigure}{0.49\textwidth}
\centering
\includegraphics[width=\linewidth]{images/psd.pdf}
\caption{PSD of power traces of boot-up sequences for two different firmware versions (two traces for each version)}
\caption{PSD of power traces of boot-up sequences for two different firmware versions (two traces for each version).}
\label{fig:firmwares-psd}
\end{subfigure}
\caption{Influence of different firmware versions on the power consumption at boot time.}
@ -396,8 +395,7 @@ The signal collected from the network switch is a time series $T_1 \triangleq \{
Each sample has a corresponding label that is either 1 (\gls{ssh} login attempt) or 0 (no \gls{ssh} attempt), represented as $T_2 \triangleq \{y \in \{0,1\}\}$.
%SSH login attempts show discernible patterns in the power traces collected.
Figure~\ref{fig:ssh_time_window} show power consumption increases during each login attempt.
Figure~\ref{fig:ssh_time_window} shows that power consumption increases during each login attempt.
The data acquisition process saves these timestamps of the connections while capturing the power traces.
To create training samples for the \gls{ml} algorithms, a sliding window of \numprint{500} datapoints with a step size of \numprint{250} datapoints divides the power trace into multiple samples $S \triangleq \{ x \in \mathbb{R}\}$ with $|S| = 500$ and $S \subseteq T_1$.
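As a worked illustration of this windowing (the symbols $N$, $w$, and $s$ are introduced here for the example and are not part of the original notation), a trace of $N$ data points cut with a window length $w$ and a step size $s$ yields
\[
  n_{\mathrm{samples}} = \left\lfloor \frac{N - w}{s} \right\rfloor + 1, \qquad N \geq w .
\]
With $w = 500$ and $s = 250$, consecutive samples overlap by half a window, and a hypothetical trace of $N = \numprint{10000}$ data points would produce $\lfloor \numprint{9500}/250 \rfloor + 1 = 39$ samples.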
@ -411,11 +409,11 @@ Every data point in the sample is a feature of the model. If ${S \in [1]^{500}}$
\end{figure}
The samples created while applying sliding windows to the power trace exist in the time domain.
Application of \gls{fft} can convert the data from temporal domain to frequency domain. The \gls{fft} calculates the frequency spectrum for windows of 500 features. The spectrum is labelled 0 or 1, corresponding to their original labels from the temporal domain.
Applying the \gls{fft} converts the data from the time domain to the frequency domain. The \gls{fft} calculates the frequency spectrum for windows of 500 features. Each spectrum is labeled 0 or 1, matching the label of its window in the time domain.
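For reference, a sketch of the transform under the usual discrete Fourier transform definition (the symbols $x_n$ and $X_k$ are introduced here for illustration, and whether magnitudes or complex coefficients serve as features is an assumption left open): for a window $x_0, \dots, x_{499}$,
\[
  X_k = \sum_{n=0}^{499} x_n \, e^{-2\pi i k n / 500}, \qquad k = 0, \dots, 499 .
\]
Because the power samples are real-valued, $X_{500-k}$ is the complex conjugate of $X_k$, so the magnitudes $|X_k|$ for $k = 0, \dots, 250$ already carry all the magnitude information of the spectrum.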
\subsubsection{Results}
A test set with \numprint{4095} samples consisting of \numprint{500} features each led to the results in Table \ref{tab:ssh-precision-comparison}.
A test set with \numprint{4095} samples consisting of \numprint{500} features each led to the results in Table~\ref{tab:ssh-precision-comparison}.
The feature engineering step extracts the samples from 20 power traces (each 50 seconds long).
In total, there were 120 power traces and the model trained over 85 of them and validated over 15.
\gls{ssh} attempts comprised \numprint[\%]{30} of the data, and the rest represented the idle behaviour of the system.
@ -467,12 +465,12 @@ Thus, \gls{svm} had the best accuracy rates along with the lowest \gls{fnr} and
\end{tabular}
\end{center}
\caption{Comparison between the different algorithms for detecting SSH login attempts}
\caption{Comparison between the different algorithms for detecting SSH login attempts.}
\label{tab:ssh-precision-comparison}
\end{table}
\subsection{Experiment 2: Classifying SSH Login Attempts}
Given a window of power trace with an SSH login attempt, the goal of Experiment II.2 to classify the login attempt as successful or unsuccessful.
Given a window of power trace with an SSH login attempt, the goal of Experiment~II.2 is to classify the login attempt as successful or unsuccessful.
\subsubsection{Feature Engineering}
This experiment builds on top of Experiment~II.1 and classifies the detected \gls{ssh} login attempts as successful or failed.
@ -482,12 +480,12 @@ The matrix representation for this experiment is a slight modification of the pr
\subsubsection{Results}
Models trained using \glspl{svm} and \gls{1dcnn} gave the best results for the classification along with the lowest \gls{fpr} and \gls{fnr}.
Optimizing the parameters of the \gls{rfc} with 250 trees, \glspl{svm} with $C = 100$, $\gamma = 10$, and Gaussian Kernel, and \gls{1dcnn}, the accuracy score reached \numprint[\%]{96.7}, \numprint[\%]{98.5} and \numprint[\%]{98.6} respectively. Table \ref{tab:ssh-classification-precision-comparison} details all the results.
Optimizing the parameters of the \gls{rfc} with 250 trees, \glspl{svm} with $C = 100$, $\gamma = 10$, and Gaussian Kernel, and \gls{1dcnn}, the accuracy score reached \numprint[\%]{96.7}, \numprint[\%]{98.5} and \numprint[\%]{98.6} respectively. Table~\ref{tab:ssh-classification-precision-comparison} details all the results.
The experiment uses the 4095 samples extracted from experiment \ref{detect_ssh} that includes only successful and unsuccessful SSH attempts.
65\% of all the samples form the training set, 15\% contribute to the validation set, and the test set includes 20\% of all the samples.
Testing is done over roughly 1000 samples of 500 features.
The \gls{svm} model performed the best and had the lowest \gls{fpr} and \gls{fnr}.
The experiment uses the \numprint{4095} samples extracted in Experiment~II.1, which include only successful and unsuccessful SSH attempts.
65\% of all the samples form the training set, 15\% contribute to the validation set, and the test set includes 20\% of all the samples.
Testing is done over roughly 1000 samples of 500 features.
The \gls{svm} model performed the best and had the lowest \gls{fpr} and \gls{fnr}.
\begin{table}[ht]
@ -507,12 +505,12 @@ Optimizing the parameters of the \gls{rfc} with 250 trees, \glspl{svm} with $C =
\bottomrule
\end{tabular}
\end{center}
\caption{Comparison between the different algorithms for classifying SSH login attempts}
\caption{Comparison between the different algorithms for classifying SSH login attempts.}
\label{tab:ssh-classification-precision-comparison}
\end{table}
i\section{Experiment Family III: Hardware Tampering}
\section{Experiment Family III: Hardware Tampering}
\label{Hardware}
The HP Procurve Switch 5406zl supports the on-the-fly installation of networking modules to modify the number of ports available.
@ -520,7 +518,6 @@ This capability exposes the switch to a Hardware Integrity Attack [CAPEC 440].
An attacker with physical access to the front panel of the network equipment can tamper with the modules and potentially install unauthorized ones.
Installing new modules could offer an attacker a way to gain access to the machine by leveraging a poor default configuration of the ports.
For example, on network equipment where the default configuration does not include a limit for the number of MAC addresses per port, installing an extension module could allow an attacker to perform a MAC Flood attack [CAPEC 125].
i
Existing \glspl{ids} and security software do not yet offer functionality to detect the installation of unauthorized modules.
Hence, currently, the only way to identify unauthorized hardware modification is through the use of the network equipment's involuntary emissions.
@ -534,7 +531,7 @@ In this experiment, there was no on-the-fly installation or removal of the modul
The installation or removal of an expansion module increases or decreases the average \gls{dc} and \gls{ac} power consumption of the device.
By analyzing the power consumption, it is possible to identify the number of expansion modules installed at any time.
\textbf{\gls{dc} data:} To create the training dataset, the prepossessing program extracted snippets of data randomly picked from \numprint{138} 20 seconds long \gls{dc} power consumption trace.
\textbf{\gls{dc} data:} To create the training dataset, the preprocessing program extracted snippets of data randomly picked from \numprint{138} 20-second-long \gls{dc} power consumption traces.
Each trace is 20 seconds long to avoid any outlier condition that, for a few seconds, could affect the average power consumption and cause biased training.
Within each trace, the program picked ten snippets of five values.
Those values of the number and length of snippets correspond to the minimum training time needed to achieve a \numprint[\%]{100} accuracy with a stratified 10-fold cross-validation setup with the data used in this experiment. The average value of each snippet is then computed. The final training dataset is a 1D array of shape $(\numprint{1380},1)$.
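As a check of the dataset size (the symbols $\bar{x}_{t,j}$ and $x_{t,j,i}$ are introduced here for illustration), the $j$-th snippet of trace $t$ contributes a single feature
\[
  \bar{x}_{t,j} = \frac{1}{5} \sum_{i=1}^{5} x_{t,j,i},
\]
where $x_{t,j,i}$ is the $i$-th value of the snippet. With \numprint{138} traces and ten snippets per trace, this gives $138 \times 10 = \numprint{1380}$ averages, which matches the stated shape.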
@ -554,14 +551,14 @@ The \gls{ac} periods do present different patterns depending on the number of mo
The \gls{svm} model was able to identify the number of modules installed with an accuracy of \numprint[\%]{99}.
Results from Table~\ref{tab:hardware-results} show that \gls{dc} data yields the best results.
These high accuracy and recall performances are the result of the non-overlapping grouping of the averages \gls{dc} consummation.
These high accuracy and recall scores result from the average \gls{dc} consumption values forming non-overlapping groups.
The results presented are produced with a stratified 10-fold cross-validation setup.
\begin{table}[ht]
\begin{center}
\begin{tabular}{ccccc}
\toprule
\textbf{Input data} & \textbf{Model} & \textbf{Accuracy} & \textbf{Recall}\tabularnewline
\textbf{Input Data} & \textbf{Model} & \textbf{Accuracy} & \textbf{Recall}\tabularnewline
\midrule
\gls{dc} & SVM & \numprint[\%]{100} & \numprint[\%]{100}\tabularnewline
\gls{dc} & KNN & \numprint[\%]{100} & \numprint[\%]{100}\tabularnewline
@ -569,32 +566,30 @@ The results presented are produced with a stratified 10-fold cross-validation se
\bottomrule
\end{tabular}
\end{center}
\caption{Comparison between the different models for hardware detection with a stratified 10-fold cross validation setup}
\caption{Comparison between the different models for hardware detection with a stratified 10-fold cross-validation setup.}
\label{tab:hardware-results}
\end{table}
\section{Discussion}
\label{Discussion}
This section highlights important aspects of this study.
\noindent
\textbf{Influence of traffic on the results.}
\subsection{Influence of Traffic on the Results}
The data used for training the models did not include traffic and were collected in a laboratory environment.
Because the production equipment is used by actual users, it is impossible to perform attacks that would disrupt the connection quality or lower the security of the device.
%Hence, flashing firmware is not possible because it requires rebooting the machine, \gls{ssh} attacks are not possible because it requires disabling some security features, and hardware tempering is not possible because it requires to physically disconnect the users.
However, complementary experiments were conducted to verify whether traffic would have a significant influence on the results of the experiment.
%This can be explained by the fact that all the expansion module consume power whether or not they have active connection.
%This property make the detection of the number of modules installed possible and it may not be the same for every networking equipment.
For Experiment Family I (section~\ref{Firmware}), the traffic can not influence the results as the there is no traffic possible during the boot-up sequence and the experiment use only the boot-up sequences to perform the classification.
For Experiment Family II (section~\ref{RunTime}) and III (section~\ref{Hardware}), we captured data containing real traffic (captures on the identical production switch) and simulated traffic (connections between multiples pairs of machines at around 1Gbps in the laboratory environment).
For Experiment Family I (Section~\ref{Firmware}), the traffic cannot influence the results, as no traffic is possible during the boot-up sequence and the experiment uses only the boot-up sequences to perform the classification.
For Experiment Families II (Section~\ref{RunTime}) and III (Section~\ref{Hardware}), we captured data containing real traffic (captures on the identical production switch) and simulated traffic (connections between multiple pairs of machines at around 1~Gbps in the laboratory environment).
Traffic data does not show any significant influence on \gls{dc} or \gls{ac} in either the time or the frequency domain.
From these results, it is possible to conclude that traffic should not affect the results from the presented experiments.
\noindent
\textbf{Support for small datasets.} As presented in this paper, the trained models can successfully detect attacks executed on the network equipment.
Those results are especially interesting as the model training step relies on a small number of training samples to achieve near perfect accuracy scores. This is a success, because (1)~our models achieve similar accuracy as some of the most successful experiments involving \gls{ml}~\cite{szegedy2017inception,xie2017aggregated} but (2)~use only a small sample size compared to image libraries with millions of image samples as training data.
Our experiments use a maximum of \numprint{1000} power trace samples.
The small number of training samples makes this approach adaptable to a range of different systems and domains because it solves the issue of collecting large amounts of data usually required to enable \gls{ml} approaches.
\subsection{Obtainable Datasets}
As presented in this paper, the trained models can successfully detect attacks executed on the network equipment.
Those results are especially interesting as the model training step relies on an obtainable number of training samples to achieve near-perfect accuracy scores.
This is a success because (1)~our models achieve accuracy similar to some of the most successful experiments involving \gls{ml}~\cite{szegedy2017inception,xie2017aggregated} but (2)~use only a small sample size compared to image libraries with millions of image samples as training data~\cite{sun2017revisiting}.
Our experiments use a maximum of \numprint{4100} power trace samples.
The obtainable number of training samples makes this approach adaptable to a range of different systems and domains because it solves the issue of collecting large amounts of data usually required to enable \gls{ml} approaches.
The models trained are relatively lightweight owing to the small number of samples along with the heavy downsampling performed on data for the experiments.
The lightweight nature of the models allows for fast online run-time monitoring and integrity assessment of embedded systems.
@ -603,7 +598,7 @@ The lightweight nature of the models allows for fast online run-time monitoring
This paper introduces a physics-based \gls{ids} that offers a novel and complementary type of runtime monitoring and integrity assessment for network equipment.
The proposed \gls{ids} leverages side-channel information generated by the system at the physical level and infers the system's state and activities to detect attacks.
This paper present en evaluation of the performances against hardware tampering, firmware manipulation, and log tampering.
This paper presents an evaluation of its performance against hardware tampering, firmware manipulation, and log tampering.
The results show that the methods used achieve near-perfect accuracy on all experiments with only a small training set.
Overall, the introduced techniques provide a glimpse of a general concept that is extensible to other real-time and embedded systems.
Future work can investigate additional side channels and how their interaction can further reduce the required sample size and improve the accuracy.