update proposal

2023-06-13 21:50:12 -04:00 · 2023-06-13 21:50:12 -04:00 · 737c930a15
commit 737c930a15
parent f20570f5d0
13 changed files with 751 additions and 228 deletions
--- a/PhD/research_proposal/introduction.tex
+++ b/PhD/research_proposal/introduction.tex
@@ -6,70 +6,75 @@ These systems are present in many aspects of our daily life from transportation
 \gls{scs} are increasingly computer-based, enabling features such as remote control or lower-cost maintenance.
 These systems are also increasingly connected to the internet to allow offsite monitoring or data collection.
 This digitalization of \gls{scs} also brings the security risks of connected computers.
-The more connection and interraction types are available to a computer system, the greater is the risk of an attack using one of these connection to be discovered.
-This sum of potential attack points, called attack surface, should typically be as reduced as possible, especially for \gls{scs} that require high reliability and availability.
+The more connection and interaction types a computer system exposes, the greater the risk that an attacker exploits one of them.
+The set of potential attack points, called the attack surface, should be kept as small as possible, especially for \gls{scs} that require high reliability and availability.
 Increasing the capabilities and connectivity of \gls{scs} enables large-scale attacks that were previously infeasible.
 For example, if all the water treatment plants in Canada were equipped with a data collection mechanism exposed to the internet for centralized analysis, an attacker could leverage this mechanism to take over all these systems and put the whole country at risk.

 A wide range of solutions is available to protect computer systems in general.
 Among them, \gls{ids} aim to detect security policy violations or suspicious activities from or among computers.
-This detection is often enabled by the collection and analysis of data related to the machine's activity.
-If the ressources considered are local to the machine (e.g. CPU load, RAM data, disks read/write speed), then the detection system is called \gls{hids}. 
+The detection often relies on collecting and analyzing data related to the machine's activity.
+If the \gls{ids} only considers local resources (e.g., CPU load, RAM data, disk read/write speed), it is called a \gls{hids}.
 \gls{hids} have access to relevant local data\cn, but they require installing software on the machine (either for collection only or for local analysis).
-This represent a potential flaw as the host machine may not be trusted and can be compromised, allowing the attacker to deploy stealth attacks \cite{10.1145/586110.586145}.
-Moreover, an \gls{hids} can lack the broader vision required to detect intrusions distributed over a network of machines\cn.
+This represents a potential flaw for several reasons.
+First, the host machine may not be trusted and can be compromised, allowing the attacker to deploy stealth attacks \cite{10.1145/586110.586145}.
+Second, a \gls{hids} can lack the broader view required to detect intrusions distributed over a network of machines\cn.
+Finally, the \gls{hids} may interfere with the critical operation of the system (for example, if the \gls{hids} misbehaves and blocks other operations).
+For these reasons, \gls{hids} may be difficult to deploy on many embedded systems.
+
 The other main class of \gls{ids} aims at solving these issues.
 \gls{nids}~\cite{vigna1999netstat, bivens2002network} analyze the communications between machines in a network to detect intrusions.
 This solution does not require installing software on each machine and can detect network-level intrusions\cn.
 However, \gls{nids} present their own concerns.
 First, machine-specific attacks can remain undetected as only network information is accessible.
 Second, they require the installation of dedicated equipment to collect network traffic.
-Finally, modern traffic encryption practices will limit the \gls{nids} the sender-receiver pattern analysis unless traffic flows unencrypted, which can raises privacy issues.
+Finally, modern traffic encryption practices limit \gls{nids} to sender-receiver pattern analysis unless traffic flows unencrypted, which can raise privacy issues.
+
 The current \gls{ids} landscape thus presents a trade-off between detection granularity and isolation from the protected machine.
 How, then, can a machine be protected against local intrusions when no additional software can be installed on it?
-This use case can seem a niche one but it is a reality for many purpose-built embedded systems with minimal \gls{os}. 
-Systems like network switchs, \gls{rtu}, \gls{wap} rarely allow the installation of any software and yet are of critical importance.
+This use case may seem niche, but it is the reality for many purpose-built embedded systems with a minimal \gls{os}.
+Systems such as network switches, \gls{rtu}, or \gls{wap} rarely allow the installation of any software, yet they perform critical tasks.
 In these cases, neither local resource usage nor network information can be leveraged to detect local attacks.
 Moreover, industries that rely on \gls{scs} are subject to strict regulations (e.g., DO-178C for aerospace systems in Canada, ISO 26262 for automotive systems, ISO 16142 for medical devices) that aim to guarantee the safety of every piece of equipment.
-In this context, modifying an existing system to add intrusion detection capabilities is expensive as it requires the re-validation of the whole system.
-An external solution relying on side-channel anaylysis is easier to get certified as it does not directly interact with the \gls{scs}
+Modifying an existing system to add intrusion detection capabilities is therefore expensive, as it requires re-validating the whole system.
+%An external solution relying on side-channel analysis is easier to get certified as it does not directly interact with the \gls{scs}.

-Another under-exploited source of information for embedded systems activity are the side-channels.
+Side-channels are a third, under-exploited source of information on embedded system activity.
 They encompass all the physical emissions that a machine involuntarily generates.
-For example, the sound of a fan, the temperature of a CPU, or the power consumption of a \gls{psu} are all common side-channels \cn.
+For example, the sound of a fan, the temperature of a CPU, or the power consumption of a \gls{psu} are common side-channels\cn.

 \begin{figure}[H]
    \centering
    \includegraphics[width=\linewidth]{images/side_channel}
-    \caption{Main side channels from a typical embedded systems.}
+    \caption{Main side-channels of a typical embedded system.}
    \label{fig:side_channel}
 \end{figure}

-Historically, side channels have been leveraged for attacks.\agd{rewrite this to not spoil the related work section but still present the context}
-Eventhough the main use of side-channel analysis is to attack a system, the core idea is to retrieve information correlated with the system's activity.
-With enougth knowledge of the normal behavior of a machine, an algorithm should be able to use the side-channel signature of a machine to assess its correct behavior.
+Even though side-channel analysis has historically been used to attack systems, its core idea is to retrieve information correlated with the system's activity.
+With enough knowledge of a machine's normal behavior, an algorithm should be able to assess whether the machine behaves correctly using only side-channel information.
 This idea is called physics-based security and is the core principle of this research work.

-
 \section{Proposal Organization}
-This proposal is organized as follow: Section \ref{sec:related-work} presents an overview of the related work, Chapter \ref{chap:pastwork} presents the preliminary work conducted until now, Chapter \ref{chap:futurwork} presents the main problems I want to address during my research, and finally Chapter \ref{chap:timetable} draws a proposed timeline for the completion of the planned work.
+This proposal is organized as follows: Section~\ref{sec:related-work} presents an overview of the related work, Chapter~\ref{chap:pastwork} presents the preliminary work conducted so far, Chapter~\ref{chap:futurwork} presents the main problems I want to address during my research, and finally Chapter~\ref{chap:timetable} proposes a timeline for the completion of the planned work.



 \section{Related Work}\label{sec:related-work}
-The idea of side-channel based IDS traces back to the seminal work in side-channel analysis by Paul C. Kocher.
-He introduced Differential Power Analysis to find secret keys used by cryptographic protocols in tamper resistant devices~\cite{kocher1999differential}.
-This led to a field of research focussing on side-channel analysis that has been ever growing. Power analysis is the most common and widely studied side-channel analysis technique~\cite{brier2004correlation,mangard2008power}. %new citations% 
-Cagalj et al.~\cite{vcagalj2014timing} show a successful passive side-channel timing attack on U.S. patent Mod 10 method and Hopper-Blum (HB) protocol.  
-Quisquater et al.~\cite{quisquater2002automatic} present an approach to identify executed instructions with the use of self-organizing maps, power analysis and analysis of electromagnetic traces. %new citations%
-Zhai et al.~\cite{zhai2015method} propose a self-organizing maps approach that uses features extracted from an embedded processor to detect abnormal behavior in embedded devices.  
-Eisenbarth et al.~\cite{eisenbarth2010building} propose a methodology for recovering the instruction flow of microcontrollers using its power consumption.
-Goldack et al.~\cite{goldack2008side} propose a solution to identify individual instructions on a PIC microcontroller through mapping each instruction type to a power consumption template. 
-Side-channel signatures can provide information about the integrity of a system.
+The idea of side-channel analysis traces back to the seminal work of Paul C. Kocher.
+He introduced \gls{dpa} to find secret keys used by cryptographic protocols in tamper-resistant devices~\cite{kocher1999differential}.
+This work gave rise to a field of research on side-channel analysis that has grown ever since.
+A wide variety of side-channels have since been leveraged to recover information from a system, such as power consumption~\cite{brier2004correlation,mangard2008power}, electromagnetic fields~\cite{sayakkara2019survey}, acoustic emanations~\cite{7479068, alevi2015keyboard}, thermal dissipation~\cite{9727162}, or, on the non-physical side, cache behavior~\cite{page2003defending}.
+
+Among them, power consumption is the most common and most widely studied side-channel, owing to several advantages.
+Power consumption leaks information about the activity of an embedded system with low inertia (i.e., unlike temperature, it can carry high-frequency information), it is easy to measure with low-cost equipment at specific points of a machine (unlike electromagnetic fields or sound), and it is present in any system.
+This combination of properties allows fine-grained detection of a system's activity, even at the instruction level.
+Quisquater et al.~\cite{quisquater2002automatic} present an approach to identify executed instructions using self-organizing maps and the analysis of power and electromagnetic traces.\agd{this citation comes out of nowhere}
+Eisenbarth et al.~\cite{eisenbarth2010building} propose a methodology for recovering the instruction flow of microcontrollers using their power consumption.\agd{this citation comes out of nowhere}
+
+Even though the information potential of side-channel analysis enables powerful attacks, it also enables defensive applications.
+Zhai et al.~\cite{zhai2015method} propose a self-organizing map approach that uses features extracted from an embedded processor to detect abnormal behavior in embedded devices.
 Different teams at Georgia Tech have leveraged power side-channels and electromagnetic backscattering~\cite{8701559, jorgensen2022efficient} to detect hardware Trojans and counterfeit integrated circuits.
-Due to its non-intrusive and architectur-agnostic nature, power fingerprinting has a wide range of applications from energy production systems \cite{6378346}, Software Defined Radio compliance assesments \cite{5379826}.
-However, the attack focussed side-channel analysis can offer non-intrusive run-time monitoring, as well. \\
-\indent
+Due to its non-intrusive and architecture-agnostic nature, power fingerprinting has a wide range of applications, including energy production systems~\cite{6378346}, Software Defined Radio compliance assessments~\cite{5379826}, and application activity on mobile devices~\cite{8057232}.
 The literature shows promising work on assessing integrity through cache monitoring~\cite{7163050} and power monitoring~\cite{10.1145/2976749.2978299}.
 Work by Moreno et al. provides two building blocks for this research.
 In~\cite{moreno2013non}, the team proposes a solution for non-intrusive debugging and program tracing using side-channel analysis.
@@ -77,13 +82,19 @@ In this work, they use the power consumption of a given embedded system to ident
 The team builds on this technique and presents a new one~\cite{Moreno2018} that uses the power consumption of embedded systems for non-intrusive online run-time monitoring through anomaly detection.
 They use a signals and systems analysis approach to identify anomalies in the power consumption of a system and showcase it by identifying buffer overflow attacks on their system.
 Msgna et al.~\cite{msgna2014verifying} propose a technique that uses the instruction-level power consumption of a system to verify the integrity of its software components with no prior knowledge of the software code.
-In~\cite{kur2009improving}, Kur et al. perform power analysis of smart cards based on the JavaCard platform helps identify vulnerable operations, obtain bytecode instruction information, and also proposes a framework to replace vulnerable operations with safe alternatives.\\  
-\indent
+In~\cite{kur2009improving}, Kur et al. use power analysis of smart cards based on the JavaCard platform to identify vulnerable operations and obtain bytecode instruction information, and propose a framework to replace vulnerable operations with safe alternatives.
+
+The non-intrusive and hard-to-forge nature of side-channel information makes it an ideal input for \gls{ids}.
+Van Aubel et al.~\cite{van2018side} propose using electromagnetic side-channel information to protect \gls{ics} by detecting changes in the software execution flow.
+Xun et al.~\cite{10016748} use the voltage signals of a vehicle CAN bus to detect anomalies without requiring extensive documentation from the manufacturer.
+For a different kind of embedded system, Liang et al.\cn propose a framework that leverages side-channel information in additive manufacturing, where traditional \gls{ids} would fail.
+
 In more recent literature, there is a trend towards the use of \gls{ml} for side-channel analysis to enhance the security of systems.
 Michele Giovanni Calvi~\cite{calvi2019runtime} offers a solution for run-time monitoring of an entire cyber-physical system treated as a black box.
 They collect data from a self-driving car during operations such as steering and acceleration.
-Using this data, they train an Long Short Term Memory~\cite{hochreiter1997long} deep learning model and use it to verify the safety of the vehicle. %new citations% 
-Zhengbing et al. \cite{4488501} suggest the use of forensic techniques for profiling user behaviour to detect intrusions and propose an intelligent lightweight \gls{ids}. Hanilçi et al.~\cite{hanilci2011recognition} use recorded speech from a cell phone to ascertain the cell phone brand and model through using vector quantization and \gls{svm} models on the \gls{mfcc} of the audio.
+Using this data, they train a Long Short-Term Memory~\cite{hochreiter1997long} deep learning model and use it to verify the safety of the vehicle.
+Zhengbing et al.~\cite{4488501} suggest the use of forensic techniques for profiling user behavior to detect intrusions and propose an intelligent lightweight \gls{ids}.
+Hanilçi et al.~\cite{hanilci2011recognition} use recorded speech from a cell phone to identify its brand and model by applying vector quantization and \gls{svm} models to the \gls{mfcc} of the audio.
 In~\cite{khan2019malware}, Khan et al. propose a technique to identify malware in critical embedded and cyber-physical systems using \gls{em} side-channel signals.
 Their technique applies deep learning to \gls{em} emanations to model the behavior of an uncompromised system.
 The system flags an activity as anomalous when the emanations differ from the normal ones used to train the neural network. 
@@ -95,13 +106,11 @@ Using the ML methods, they can determine the state of cellphone cameras.
 A mechanical equivalent of physics-based security is \gls{mcm}, which monitors the evolution of key parameters of a machine to assess its health.
 This field is not restricted to detecting attacker activity and can track the health of the machine over time to enable timely maintenance.
 Different techniques are deployed depending on the machine type and the specific metrics of interest.
-Machining equipment is often monitored with side-channel measurment such as  vibration \cite{PENG2004199,4084702,HOU2021107451} sound \cite{sound_mcm}, temperature \cite{22438} or chemical analysis \cite{tavner1987condition}. 
+Machining equipment is often monitored with side-channel measurements such as vibration~\cite{PENG2004199,4084702,HOU2021107451}, sound~\cite{sound_mcm}, temperature~\cite{22438}, or chemical analysis~\cite{tavner1987condition}.
 These techniques focus on mechanical machines with high reliability requirements and leverage side-channel information to reduce intrusiveness.

 On a larger scale, the power consumption of a whole house, or even a whole building, provides information about the activity of each appliance.
 Monitoring or prediction applications can leverage this information without requiring a measurement system on each endpoint.
-This idea of non-intrusive load monitoring was first proposed by Hart in 1992 \cite{hart1992nonintrusive}.
-The interests and challenges posed by the problem yielded different proposed solutions such as \gls{cnn} \cite{moradzadeh2021practical}, soft computing \cite{puente2020non}, or guassian models fitting on electromagnetic signatures \cite{10.1145/1864349.1864375}.
-The concepts of signal disambiguation and individual consumption retrieval are transposable from a house omposed of appliances to an embedded system composed of devices.
-
-
+This idea of non-intrusive load monitoring was first proposed by Hart in 1992~\cite{hart1992nonintrusive}.
+The interest and challenges of this problem have yielded various solutions, such as \gls{cnn}~\cite{moradzadeh2021practical}, soft computing~\cite{puente2020non}, or Gaussian model fitting on electromagnetic signatures~\cite{10.1145/1864349.1864375}.
+The concepts of signal disambiguation and individual consumption retrieval are transposable from a house composed of appliances to an embedded system composed of devices.