final version

Arthur Grisel-Davy 2023-08-06 12:13:47 -04:00
parent 8e6ddd5595
commit ed394f301a
2 changed files with 24 additions and 24 deletions

View file

@@ -9,7 +9,7 @@
\newabbreviation{knn}{K-NN}{K-Nearest Neighbor}
\newabbreviation{rnn}{RNN}{Recurrent Neural Network}
\newabbreviation{cnn}{CNN}{Convolutional Neural Network}
\newabbreviation{svm}{SVM}{Support Vector Classifier}
\newabbreviation{svm}{SVM}{Support Vector Machine}
\newabbreviation{mlp}{MLP}{Multi Layer Perceptron}
\newabbreviation{mad}{MAD}{Machine Activity Detector}
\newabbreviation{ids}{IDS}{Intrusion Detection Systems}

View file

@@ -26,8 +26,9 @@
\newcommand{\pv}{{\color{orange}[passive voice]}}
\newcommand{\wv}{{\color{orange}[weak verb]}}
% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor IEEEconf hyper-parameter}
\hyphenation{op-tical net-works semi-conduc-tor IEEEconf hyper-parameter una-voidably li-te-ra-ture exam-ple thre-sholds tra-di-tio-nal-ly ge-ne-ra-ted}
\begin{document}
\input{acronyms}
\title{\textbf{\Large MAD: One-Shot Machine Activity Detector for Physics-Based Cyber Security\\}}
@@ -84,7 +85,7 @@ Common side-channel information for an embedded system includes power consumptio
Side-channel information offers compelling advantages over agent-collected information.
First, the information is difficult to forge.
Because the monitored system is not involved in the data retrieval process, there is no risk that an attacker who compromised the system could easily send forged information.
For example, if an attacker performs any computation on the system --- which is the case of most attacks --- it will unavoidably affect a variety of different side channels.
For example, if an attacker performs any computation on the system, it will unavoidably affect a variety of different side channels.
There are studies focusing on altering the power consumption profile of software, but their goal is to mask the consumption pattern to avoid leaking side-channel information.
These solutions \cite{1253591,6918465} do not aim to change the pattern to match an arbitrary target but to make all activities indistinguishable.
These methods still induce changes in the consumption pattern that make them identifiable by the detection system.
@@ -145,7 +146,7 @@ This signature comparison enables the verification of expected and specific sect
Another solution for detecting intrusions is the definition of security policies.
Security policies are sets of rules that describe wanted or unwanted behavior.
These rules are built on input data accessible to the \gls{ids} such as user activity \cite{ilgun1995state} or network traffic \cite{5563714, kumar2020integrated}.
However, the input data requirements must have to apply a rule.
However, the input data must have labels before a rule can be applied.
This illustrates the gap between the side-channel analysis methods and the rule-based intrusion detection methods.
To apply security policies to side-channel information, it is necessary to first label the data.
@@ -232,7 +233,7 @@ The sample receives the label $j$ associated with the pattern $P_j$ that results
The minimum distance from the pattern $P_j$ to all other patterns $P_l$ with $l\neq j$ --- denoted $ID_j$ --- forms the basis of the threshold $T_j$.
Intuitively, the patterns in $P$ represent most of the patterns expected in the trace.
Thus, to decide that a substring matches a pattern $P_j$, it must match $P_j$ better than any other pattern $P_l$ with $l\neq j$ does.
Otherwise, suggest assigning the substring to $P_j$ when the training pattern of another class matches $P_j$ better, which is counter-intuitive.
Otherwise, the algorithm would assign the substring to $P_j$ when the training pattern of another class matches $P_j$ better, which is counter-intuitive.
The inter-distance between $P_j$ and $P_l$, defined as
\begin{equation}
ID(P_j,P_l) = \min_{i\in[0,N_l-N_j]} nd(P_j,P_l[i:i+N_j])
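To make the construction concrete, here is a minimal Python sketch of the inter-distance and threshold computation. It assumes nd is the Euclidean distance between z-normalized windows, scaled by window length, and that the shorter pattern slides inside the longer one; the paper's exact normalization may differ.

import numpy as np

def nd(a, b):
    # Normalized distance between two equal-length windows.
    # Assumption: z-normalize each window, then take the Euclidean
    # distance scaled by length; the paper's exact nd may differ.
    az = (a - a.mean()) / (a.std() + 1e-12)
    bz = (b - b.mean()) / (b.std() + 1e-12)
    return np.linalg.norm(az - bz) / len(a)

def inter_distance(Pj, Pl):
    # ID(Pj, Pl) = min over i in [0, Nl - Nj] of nd(Pj, Pl[i:i+Nj]).
    if len(Pj) > len(Pl):          # assumption: the shorter pattern
        Pj, Pl = Pl, Pj            # slides inside the longer one
    Nj, Nl = len(Pj), len(Pl)
    return min(nd(Pj, Pl[i:i + Nj]) for i in range(Nl - Nj + 1))

def thresholds(patterns, alpha=1.0):
    # T_j = alpha * ID_j, with ID_j the minimum inter-distance from
    # P_j to every other pattern P_l, l != j.
    return [alpha * min(inter_distance(Pj, Pl)
                        for l, Pl in enumerate(patterns) if l != j)
            for j, Pj in enumerate(patterns)]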
@@ -363,12 +364,11 @@ Thus the second part also terminates.
Finally, the third part uses the same loops as the second and also terminates.
Overall, \gls{mad} always terminates for any finite time series and finite set of finite patterns.
\textbf{Influence of $\alpha$}
\textbf{Influence of $\alpha$: }
The shrink coefficient $\alpha$ is the only hyperparameter of the detector.
Its default value is one.
$\alpha$ controls the threshold of similarity that a substring must cross to qualify as a match to a pattern.
$\alpha$ takes its value in $\mathbb{R}_*^+$.
The default value for $\alpha$ is one.
This value follows the intuitive reasoning presented in Section~\ref{sec:solution}.
To better understand the influence of the shrink coefficient, the algorithm can be perceived as a 2D area segmentation problem.
@@ -389,9 +389,9 @@ This behavior naturally emerges when the areas of attraction of the patterns are
The shrink coefficient $\alpha$ --- through the modification of the threshold $T_j$ --- provides control over the shrink of the areas of attraction.
The lower the value of $\alpha$, the smaller the areas of attraction around each sample.
Applying a coefficient to the thresholds produces a reduction of the radius of the area of attraction, not a homothety of the initial areas.
In other words, the shrink does not preserve the shape of the area.
The shrinkage does not preserve the shape of the area.
For a value $\alpha < 0.5$, all areas become disks --- in the 2D representation --- and all shape information is lost.
Figure~\ref{fig:areas} illustrate the areas of capture around the patterns for different values of $\alpha$.
Figure~\ref{fig:areas} illustrates the areas of capture around the patterns for different values of $\alpha$.
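Numerically, the shrinking effect can be seen in a toy sketch (synthetic data, with a plain length-scaled Euclidean distance standing in for nd): as $\alpha$ grows, a noisy window flips from UNKNOWN to its true class.

import numpy as np

rng = np.random.default_rng(0)

# Three synthetic equal-length patterns and a noisy copy of the first.
patterns = [rng.normal(size=50) for _ in range(3)]
window = patterns[0] + rng.normal(scale=0.6, size=50)

def dist(a, b):
    # Simple length-scaled Euclidean distance, standing in for nd.
    return np.linalg.norm(a - b) / len(a)

# ID_j: smallest distance from pattern j to any other pattern.
ID = [min(dist(p, q) for l, q in enumerate(patterns) if l != j)
      for j, p in enumerate(patterns)]

for alpha in (0.1, 0.5, 1.0, 2.0):
    d = [dist(window, p) for p in patterns]
    j = int(np.argmin(d))
    label = j if d[j] <= alpha * ID[j] else -1   # -1 = UNKNOWN
    print(f"alpha={alpha}: label={label}")
# Small alpha shrinks every area of attraction, so the noisy window
# stays UNKNOWN; once alpha is large enough, it recovers label 0.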
\begin{figure}
\centering
@@ -403,8 +403,8 @@ Figure~\ref{fig:areas} illustrate the areas of capture around the patterns for d
The influence of the $\alpha$ coefficient on the classification is monotonic and predictable.
Because $\alpha$ influences the thresholds, changing $\alpha$ results in moving the transitions in the detected labels.
In other words, a lower value of $\alpha$ expands the unknown segments while a higher value shrinks them until they disappear.
Figure~\ref{fig:alpha_impact} illustrates the impact $\alpha$ on the width of unknown segments.
A lower value of $\alpha$ expands the unknown segments while a higher value shrinks them until they disappear.
Figure~\ref{fig:alpha_impact} illustrates the influence that $\alpha$ has on the width of unknown segments.
The impact of $\alpha$ on the number of unknown samples is also monotonic.
\begin{proof}
@@ -494,7 +494,7 @@ Table~\ref{tab:dataset} presents the time series and their characteristics.
\centering
\caption{Characteristics of the machines in the evaluation dataset.}
\begin{tabular}{lcc}
Name & length & Number of states\\
Name & Length & Number of states\\
\toprule
NUCPC-0 & 22700 & 11\\
NUCPC-1 & 7307 & 8\\
@@ -538,8 +538,8 @@ However, the individual consumption of some appliances fits the problem statement
The REFIT-H4A1 is the first appliance of the fourth house and corresponds to a fridge.
The REFIT-H4A4 is the fourth appliance of the fourth house and corresponds to a washing machine.
The activity in this second time series was sparse with long periods without consumption.
The no-consumption sections are not challenging --- i.e., all detectors perform well on this type of pattern ---, make the manual labeling more difficult, and level all results up.
For this reason, we removed large sections of inactivity between active segments to make the time series more challenging without tempering with the order of detector performances.
The no-consumption sections are not challenging --- i.e., all detectors perform well on this type of pattern --- but they make the manual labeling more difficult and level all results up.
For this reason, we removed large sections of inactivity between active segments to make the time series more challenging without tampering with the order of detector performances.
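One way to perform such trimming, as a sketch (the power threshold and maximum kept gap are illustrative placeholders, not the values used on REFIT):

import numpy as np

def trim_inactivity(trace, power_eps=1.0, max_gap=600):
    # Keep only the first max_gap samples of any near-zero run, so
    # shortened gaps remain between active segments and the
    # transitions into and out of activity stay intact.
    # power_eps and max_gap are illustrative, not the paper's values.
    keep = np.ones(len(trace), dtype=bool)
    inactive = trace <= power_eps
    i = 0
    while i < len(trace):
        if inactive[i]:
            j = i
            while j < len(trace) and inactive[j]:
                j += 1
            if j - i > max_gap:
                keep[i + max_gap:j] = False
            i = j
        else:
            i += 1
    return trace[keep]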
\subsection{Alternative Methods}
We implemented three alternative methods to compare with \gls{mad}.
@@ -555,9 +555,9 @@ The alternative detectors are not meant to handle variable-size time series as i
For the \gls{svm} and \gls{mlp} detectors, the window size is shorter than the shortest pattern.
The training sample extraction algorithm slides the window along all patterns to extract all possible substrings.
These substrings constitute the training dataset with multiple samples per pattern.
The \gls{mlp} is implemented using \cite{keras} and composed of a single layer with 100 neurones.
The \gls{mlp} is implemented using Keras~\cite{keras} and is composed of a single layer with 100 neurons.
The number of neurons was chosen after evaluating the accuracy of the \gls{mlp} on one of the datasets (NUCPC\_1) with varying numbers of neurons.
Similarly, the \gls{svm} detector is implemented using \cite{sklearn} with the default parameters.
Similarly, the \gls{svm} detector is implemented using scikit-learn~\cite{sklearn} with the default parameters.
The \gls{1nn} considers one window per pattern length around each sample.
Every window is compared to its pattern, and the normalized Euclidean distance is considered for the decision.
Overall, it is possible to adapt the methods to work with variable length patterns, but \gls{mad} is the only pattern-length-agnostic method by design.
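The window-extraction step can be sketched as follows; the toy data, window length, and the commented Keras layout are assumptions consistent with the description above, not the exact experimental code.

import numpy as np
from sklearn.svm import SVC

def extract_training_windows(patterns, w):
    # Slide a window of length w along every pattern; each substring
    # is labeled with its pattern's index. w is chosen shorter than
    # the shortest pattern, as described above.
    X, y = [], []
    for label, p in enumerate(patterns):
        for i in range(len(p) - w + 1):
            X.append(p[i:i + w])
            y.append(label)
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
patterns = [rng.normal(size=n) for n in (60, 80, 100)]  # toy stand-ins
X, y = extract_training_windows(patterns, w=40)

svm = SVC().fit(X, y)  # scikit-learn defaults, as in the text

# A single-hidden-layer MLP with 100 neurons could then be built in
# Keras along these lines (a sketch, not the exact configuration):
#   from tensorflow import keras
#   mlp = keras.Sequential([
#       keras.layers.Dense(100, activation="relu"),
#       keras.layers.Dense(len(patterns), activation="softmax"),
#   ])
#   mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
#   mlp.fit(X, y, epochs=10)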
@@ -591,7 +591,7 @@ The scenario comprises four phases:
\begin{itemize}
\item 1 Night Sleep: During the night and until the worker begins the day, the machine is asleep in S3 sleep state\cite{sleep_state}. Any other state than sleep is considered anomalous during this time.
\item 1 Night Sleep: During the night and until the worker begins the day, the machine is asleep in S3 sleep state\cite{sleep_states}. Any other state than sleep is considered anomalous during this time.
\item 2 Work Hours: During work hours, little restriction is applied to the activity. Only a long period with the machine asleep is considered anomalous.
\item 3 Maintenance: During the night, the machine wakes up as part of an automated maintenance schedule. During maintenance, updates are fetched, and a reboot is performed.
\item 4 No Long High Load: At no point should there be a sustained high load on the machine. Given the scenario of classic office work, having all cores of a machine maxed out is suspicious. Violations of this rule are generated by running the program xmrig for more than 30 seconds. Xmrig is a legitimate crypto-mining software, but it is commonly abused by criminals to build crypto-mining malware.
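As an illustration, the sketch below encodes two of these rules as checks over a label vector; the label and rate conventions come from the deployment description later in this section, while the night mask and the use of 30 seconds as the rule threshold are assumptions.

import numpy as np

# Label conventions from the deployment description: -1 UNKNOWN,
# 0 SLEEP, 1 IDLE, 2 HIGH, 3 REBOOT; 20 samples per second.
SLEEP, HIGH = 0, 2
RATE = 20

def longest_run(mask):
    # Length of the longest run of True values in a boolean vector.
    best = cur = 0
    for v in mask:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best

def check_rules(labels, night_mask):
    # night_mask flags the Night Sleep phase; only rules 1 and 4 are
    # sketched here.
    violations = []
    if np.any(labels[night_mask] != SLEEP):      # rule 1: sleep only
        violations.append("non-sleep state during the night")
    if longest_run(labels == HIGH) > 30 * RATE:  # rule 4: no long high load
        violations.append("sustained high load over 30 s")
    return violations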
@@ -615,15 +615,15 @@ The pre-processing step downsamples the trace to 20 samples per second using a m
This step greatly reduces the measurement noise and the processing time and increases the consistency of the results.
The final sampling rate of 20 samples per second was selected empirically to be around one order of magnitude higher than the typical length of the patterns to detect (around five seconds).
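The truncated sentence above suggests a mean-based filter; under that assumption, a block-mean downsampler could look like the following sketch (it requires the input rate to be an exact multiple of 20, and the paper's actual resampler may differ).

import numpy as np

def downsample_mean(trace, in_rate, out_rate=20):
    # Block-mean downsampling: average consecutive blocks of
    # in_rate // out_rate samples, discarding the ragged tail.
    factor = in_rate // out_rate
    n = (len(trace) // factor) * factor
    return trace[:n].reshape(-1, factor).mean(axis=1)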
For each compressed day of the experiment (four hours segment, thereafter referred to as days), the \gls{mad} performs state detection and returns a label vector.
For each compressed day of the experiment (a four-hour segment, hereafter referred to as a day), \gls{mad} performs state detection and returns a label vector.
This label vector associates a label to each sample of the power trace following the mapping: -~1 is UNKNOWN, 0 is SLEEP, 1 is IDLE, 2 is HIGH and 3 is REBOOT.
The training dataset comprises one sample per state, captured during the run of a benchmark script that interactively places the machine in each state to be detected.
The script on the machine generates logs that serve as ground truth to verify the results of rule checking.
The traces and ground truth for each day of the experiment are available online \cite{name_hidden_for_peer_review_2023_8192914}.
Please note that day 1 was removed due to a scheduling issue that affected to scenario.
Figure~\ref{fig:preds} present an illustration of the results.
Please note that day 1 was removed due to a scheduling issue that affected the scenario.
Figure~\ref{fig:preds} presents an illustration of the results.
The main graph line in the middle is the power consumption over time.
The line's colour represents the machine's predicted state based on the power consumption pattern.
The line's colors represent the machine state predicted from the power consumption pattern.
Below the graph, two lines illustrate the label vectors.
The top line shows the predicted labels and can be interpreted as a projection of the power consumption line onto the x-axis.
The bottom line shows the ground-truth labels, generated from the scenario logs.
@@ -697,7 +697,7 @@ In this section, we highlight specific aspects of the proposed solution.
\textbf{Dynamic Window vs Fixed Windows: }
One of the core mechanisms of \gls{mad} is the ability to choose the best-fitting window to classify each sample.
This mechanism is crucial to overcome some of the shortcomings of a traditional \gls{1nn}.
It is essential to understand the advantages of this dynamic window placement to fully appreciate the performances of \gls{mad}
It is essential to understand the advantages of this dynamic window placement to fully appreciate the performances of \gls{mad}.
Figure~\ref{fig:proof} illustrates a test case that focuses on the comparison between the two methods.
In this figure, the top graph represents the near-perfect classification of the trace into different classes by \gls{mad}.
To make the results more comparable, the $\alpha$ parameter of \gls{mad} was set to $\infty$ to bypass the distance threshold mechanism and focus on the dynamic window placement.
@@ -705,12 +705,12 @@ The middle graph represents the classification by a \gls{1nn}, and it illustrate
The bottom graph represents the predicted state for each sample by each method with -~1 the UNKNOWN state and $[0-4]$ the possible states of the trace.
\begin{itemize}
\item Transition Bleeding Error: Around transitions, tends to miss the exact transition timing and miss-classify samples.
\item Transition Bleeding Error: Around transitions, \gls{1nn} tends to miss the exact transition timing and misclassify samples.
This is explained by the rigidity of the window around the sample.
At the transition time, the two halves of the window are competing to match different states.
Depending on the involved states' shape, it may require more than half of the window to prefer the new state, leading to misclassified samples around the transition.
In contrast, \gls{mad} will always choose a window that fully matches either of the states, and that is not across the transition, avoiding the transition error.
\item Out-of-phase Error: When a state is described by multiple iterations of a periodic pattern, the match between a window and the trace varies dramatically every half-period. When the window is in phase with the pattern, the match is maximal and \gls{1nn} perfectly fills its role. However, when the window and the pattern are out of phase, the match is minimal, and the nearest neighbour may be a flat pattern at the average level of the pattern. This error manifests itself through predictions switching between two values at half the period of the pattern. \gls{mad} avoids this error by moving the window by, at most, half a period to ensure a perfect match with the periodic pattern.
\item Out-of-phase Error: When a state is described by multiple iterations of a periodic pattern, the match between a window and the trace varies dramatically every half-period. When the window is in phase with the pattern, the match is maximal and \gls{1nn} perfectly fills its role. However, when the window and the pattern are out of phase, the match is minimal, and the nearest neighbor may be a flat pattern at the average level of the pattern. This error manifests itself through predictions switching between two values at half the period of the pattern. \gls{mad} avoids this error by moving the window by, at most, half a period to ensure a perfect match with the periodic pattern.
\item Unknown-Edges Error: Because of the fixed nature of the window of a \gls{1nn}, every sample that is less than half a window away from either end cannot be classified. This error matters little in most cases, where edge samples are less important, and many workarounds exist. However, \gls{mad} naturally solves this issue by shifting the window only within the valid range, up to the edge.
\end{itemize}
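These error modes can be reproduced on synthetic data. In the sketch below, the toy trace, the patterns, the length-scaled Euclidean distance standing in for nd, and the edge clamping are all illustrative assumptions; it contrasts a fixed, centered 1-NN window with a dynamic placement that lets every window containing the query sample compete.

import numpy as np

rng = np.random.default_rng(1)

# Toy trace: a flat state (label 0) followed by a periodic state
# (label 1), with one reference pattern per state.
flat = rng.normal(0.0, 0.05, 200)
periodic = 4 * np.sin(2 * np.pi * np.arange(200) / 20) + rng.normal(0.0, 0.05, 200)
trace = np.concatenate([flat, periodic])
patterns = {0: np.zeros(40), 1: 4 * np.sin(2 * np.pi * np.arange(40) / 20)}

def dist(a, b):
    # Length-scaled Euclidean distance, standing in for nd.
    return np.linalg.norm(a - b) / len(a)

def fixed_label(t, i, w=40):
    # Classic 1-NN: a single window centered on sample i (clamped at
    # the edges for simplicity, sidestepping the unknown-edges case).
    s = min(max(i - w // 2, 0), len(t) - w)
    return min(patterns, key=lambda k: dist(t[s:s + w], patterns[k]))

def dynamic_label(t, i, w=40):
    # Dynamic placement: every window containing sample i competes,
    # and the best-matching placement decides the label.
    best, label = np.inf, -1
    for s in range(max(i - w + 1, 0), min(i, len(t) - w) + 1):
        for k, p in patterns.items():
            d = dist(t[s:s + w], p)
            if d < best:
                best, label = d, k
    return label

# Near the transition at index 200, the fixed window straddles both
# states or falls out of phase with the periodic pattern and flips
# labels, while the dynamic placement can always sit entirely inside
# one state, in phase, and stays correct.
for i in (195, 205, 215, 220, 225):
    print(i, fixed_label(trace, i), dynamic_label(trace, i))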