\chapter{Planned Work}\label{chap:futurwork}
The preliminary work serves as the foundation for the planned work.
The thesis will focus on the state detection problem under various input data and detection requirements.
Detecting the state of a system constitutes a stepping stone in the construction of specialized tools for physics-based security.
As illustrated by the \gls{sds} and \gls{bpv}, the detection of specific attacks often relies on the ability to pre-process the time series to find sections of interest.
In this sense, solving the state detection problem enables a deeper investigation of power consumption by making the data actionable.
The different machines and data measurement designs lead to different problems to solve and different detection capabilities.
This chapter describes the problems to study, along with their problem statements, motivations, and expected results.
The problems are classified according to the input data and the measured machines that constitute the power trace.
A single sensor only measures the power flowing through one cable.
It is possible to combine sensors to measure multiple related consumptions, for example the consumptions of different components in the same machine.
In this case, the problem is called \textit{multi-measure} and the resulting input data is a multivariate trace.
It is also possible to place the sensor on a power cable that provides power to multiple machines.
In this case, the problem is called \textit{multi-source} and the resulting input data is an aggregate of multiple traces.
The difference between machines and components is a fine and blurry line, as the description of a machine often fits individual components.
In this thesis, a component is a system that expects instructions from a central unit, while a machine runs its own software.
For example, at a macroscopic scale, a graphics card does not take the initiative to run any software on its own and expects instructions from the rest of the \gls{pc}.
\section{Single-Source, Single-Measure}
The \gls{dsd}, an example of a Single-Source Single-Measure problem, shows promising results in an experimental setup.
To date, the experiments have focused on the detection of simple global states.
The global states are usually \textit{OFF}, \textit{ON}, \textit{BOOT}, and \textit{HIGH LOAD}.
Depending on the machine, other states like \textit{FIRMWARE FLASH}, \textit{SLEEP} or a specific activity mode can also be detected.
The experiments focus on the deployment to general-purpose computers, network switches, and \gls{wap}/routers.
In the coming months, the goal for the \gls{dsd} is to evaluate the performance of the runtime state detection in broader and more exhaustive contexts.
The current accuracy and edit distance results (see Figure~\ref{fig:dsd_acc}) illustrate the capabilities of the \gls{dsd} for the detection of well-defined states, i.e., states associated with a striking variation of average power consumption.
However, in order to provide a useful and reliable runtime labeling of a machine's activity, the \gls{dsd} must achieve similar results with a more diverse selection of states.
The work on the \gls{dsd} is the foundation for the planned development of more specific applications of the same principle of physics-based monitoring.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{images/dsd_acc}
\caption{Current results of the DSD algorithm on several datasets.}
\label{fig:dsd_acc}
\end{figure}
\section{Single-Source, Multi-Measure}
The global power consumption of a machine does not fully describe its activity.
In an embedded system, the power consumption can be attributed to different components, each with its specific activity.
For the simplest systems performing one specific task, the activities of the components are often correlated with each other.
If the system is in a Mode \textit{A} then each component is in Mode \textit{A}, and the global power consumption will display the Mode \textit{A} pattern.
For more complex systems, different components can be in different modes to accommodate the multi-tasking nature of the global activity.
In this case, if the first component is in Mode \textit{A} but the second is in Mode \textit{B}, this indicates a different global activity than if both are in the same mode.
For example, if the bootup sequence of a general-purpose computer shows a significant \gls{cpu} activity but no \gls{hdd} activity, it could indicate a failure to boot or an attacker booting the system from external storage.
Access to each component's individual power consumption opens the way to a more granular understanding of the machine's activity.
However, the multivariate aspect of the captured data requires an evolution of the detection techniques.
\subsection{Problem Statement}
Differentiating between the different components to better understand the activity of a machine is a valuable capability associated with a new problem.
\begin{problem-statement}[Single-Source Multi-Measure]
Given a discretized, multivariate time series $ts$ and a set of $n$ components for each of $m$ patterns $P=\{\{\chi\},P_1=\{P_{1,1},\dots, P_{1,n}\},\dots,$
$P_m=\{P_{m,1},\dots, P_{m,n}\}\}$, identify an injective mapping $m_{SSMM}:\mathbb{N}\longrightarrow P$ such that every sample $ts[i]$
maps to exactly one set of pattern components in $P$ with the condition that the sample matches an occurrence of the set of patterns in $ts$.
\end{problem-statement}
The time series $ts$ is a discretized, multivariate, real-valued time series.
$ts$ is composed of $n$ dimensions with the $j^{th}$ dimension referred to as $ts_j$.
Each sample $ts[i]$ is a vector of $n$ components representing the value of each dimension of $ts$ at a point in time.
The items of the set $P$ are sets of patterns $P_j$ with $j\in[1,m]$.
Each set $P_j$ groups the components of one global pattern.
In other words, each component $P_{j,k}$ represents the pattern $j$ along the $k^{th}$ dimension of $ts$.
Thus, the number of components of each pattern must be equal to the number of dimensions of $ts$.
Figure~\ref{fig:notation} illustrates the $ts$ and $P$ objects.
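As a minimal illustration of this notation, and assuming a hypothetical machine instrumented on $n=2$ components with $m=2$ known patterns (these values are chosen for the example only), the two objects could be written as
\[
ts[i] = \bigl(ts_1[i],\, ts_2[i]\bigr), \qquad
P = \bigl\{\{\chi\},\; P_1=\{P_{1,1}, P_{1,2}\},\; P_2=\{P_{2,1}, P_{2,2}\}\bigr\}.
\]
Under this reading, $P_{1,2}$ would be the shape of pattern $1$ along the second dimension $ts_2$, and a sample matching neither $P_1$ nor $P_2$ would presumably map to the unknown pattern $\{\chi\}$.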
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{images/ssmm_illustration.pdf}
\caption{Notations for the multivariate time series and the patterns set.}
\label{fig:notation}
\end{figure}
\subsection{Applications}
The goal of the multi-measure setup is twofold.
First, correlated information allows for a more robust detection mechanism.
If all components of a machine display behaviours associated with the same global activity, the detection confidence is greater than with the global consumption only.
Second, multiple measures enable a more granular activity detection.
With the power consumption measurements of multiple components available, every combination of component activities can be associated with a different global activity.
These changes allow for detecting potentially anomalous combinations of states and for a better understanding of the machine's behaviour.
The typical application of this technology would concern general-purpose computers or medium-complexity systems with multiple internal components.
These machines are typically difficult to profile with global consumption as each component influences the measure in a different way.
With the global consumption alone, the detection of the activity is often restricted to general states like \textit{ON}, \textit{OFF}, \textit{SLEEP} or \textit{HIGH LOAD}.
While this information is still valuable, it does not enable in-depth monitoring of the machine.
\section{Multi-Source, Single-Measure}
Whereas the Single-Source Multi-Measure problem looks \textit{in} a machine to gain more insight, the Multi-Source Single-Measure problem looks \textit{out} and considers multiple devices at once.
In a context where measuring the consumption of individual machines is not possible, the problem of disambiguation arises.
Signal disambiguation is the ability to identify the source of each component signal from a single aggregated signal.
This is a complicated problem as the different sources can affect each other, sometimes in a non-linear way.
\subsection{Problem Statement}
\begin{problem-statement}[Multi-Source Single-Measure]
Given a discretized aggregated time series $t_a = t_1 \oplus t_2 \oplus \dots \oplus t_n$ and a set of patterns $P=P_1\times\dots\times P_n$, identify an injective mapping $m_{MSSM}:\mathbb{N}\longrightarrow P$ such that every sample $t_a[i]$ maps to a pattern set in $P$ with the condition that the sample matches an occurrence of the pattern in $t_a$.
\end{problem-statement}
The time series $t_a$ is a discretized, mono-variate, real-valued time series.
The set of patterns $P$ is the Cartesian product of the sets of patterns $P_i$ of each source.
Thus, each element of $P$ is a tuple of $n$ patterns, each associated with one source.
Each set $P_i$ contains any number of patterns and the unknown $\chi$ pattern.
The unknown pattern is not added to the set $P$, as the element composed only of $\chi$ patterns is already present and bears the same meaning.
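As a small illustration, and assuming two hypothetical sources with pattern sets $P_1=\{A,\chi\}$ and $P_2=\{B,\chi\}$ (the names $A$ and $B$ are placeholders, not patterns taken from the experiments), the set $P$ would contain four elements:
\[
P = P_1\times P_2 = \bigl\{(A,B),\,(A,\chi),\,(\chi,B),\,(\chi,\chi)\bigr\}.
\]
The element $(\chi,\chi)$ already covers the case where no source is in a known state, which is why no additional unknown pattern is needed.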
The operator $\oplus$ is the aggregation function, generally the summation or capped summation.
In some applications, the associativity of the $\oplus$ operator can be disregarded, as the aggregation is performed at the physical level, simultaneously across all sources $t_i$.
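As a sketch of the two aggregation functions named above, and assuming a hypothetical capacity $c$ for the measured line (the symbol $c$ is introduced here for illustration and is not part of the original notation), the operator could be written sample-wise over the $n$ sources as
\[
t_a[i] = \sum_{j=1}^{n} t_j[i] \quad \mbox{(summation)}, \qquad
t_a[i] = \min\Bigl(c,\; \sum_{j=1}^{n} t_j[i]\Bigr) \quad \mbox{(capped summation)}.
\]
For instance, with the illustrative values $n=2$, $t_1[i]=3$, $t_2[i]=4$ and $c=5$, the summation gives $t_a[i]=7$ while the capped summation gives $t_a[i]=5$.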
\begin{figure}
\centering
\includegraphics[width=\textwidth]{images/mssm_illustration}
\caption{Illustration of the MSSM setup.}
\label{fig:mssm_illustration}
\end{figure}
\subsection{Applications}
The successful design of a Multi-Source Single-Measure monitoring system finds its best application in an industrial setting.
Any industry that relies on many simple embedded systems to reliably perform a task can benefit from a monitoring system that is minimally disruptive to install.
For example, an assembly line can leverage hundreds of conveyor belt drivers, robotic arms, or quality assessment points.
Each type of system is simple in its design and task.
However, adding a dedicated power measurement device to each individual system is costly and maintenance-heavy, and it multiplies the potential points of failure.
Capturing the power consumption of these machines at a single point is an efficient way to minimize the implementation footprint while maintaining a reliable physics-based monitoring solution.
\section{Conclusion}
\agd{to be filled}