add results and explanations

Arthur Grisel-Davy 2023-07-25 13:52:04 -04:00
parent a4b484a433
commit 2f2ea82205
4 changed files with 45 additions and 22 deletions


@ -52,7 +52,7 @@ The low intrusiveness, independence with the host, data reliability and difficul
However, side-channel information often comes in the form of unlabeled time series representing a proxy variable of the activity.
Enabling the definition and enforcement of high-level security policies requires extracting the state or activity of the system.
We present in this paper a novel time series, one-shot classifier called Machine Activity Detector (MAD) specifically designed and evaluated for side-channel analysis.
We evaluate MAD in two case studies on a variety of machines and datasets where it outperforms other traditional state detection solutions.
We evaluate MAD in two case studies on a variety of machines and datasets, where it outperforms other traditional state detection solutions and presents strong performance for security rule enforcement.
Results of state detection with MAD enable the definition and verification of high-level security rules to detect various attacks without any interaction with the monitored machine.
\end{abstract}
%\IEEEoverridecommandlockouts
@ -69,7 +69,6 @@ Results of state detection with MAD enable the definition and verification of hi
% for peerreview papers, inserts a page break and creates the second title.
% Will be ignored for other modes.
\IEEEpeerreviewmaketitle
\agd{reset acronyms}
\section{Introduction}
@ -588,10 +587,10 @@ The scenario comprises 4 phases:
\begin{itemize}
\item Night Sleep: During the night and until the worker begins the day, the machine is asleep in S3 sleep state\cite{sleep_state}. Any state other than sleep is considered anomalous during this time.
\item Work Hours: During work hours, little restriction is applied on the activity. Only a sustained (more than 30s) high load is considered anomalous.
\item Evening Sleep: After work hours, the machine goes to sleep again for a few hours.
\item Maintenance: During the night, the machine wakes up as part of an automated maintenance schedule. During maintenance updates are fetched and a reboot is performed.
\item 1 Night Sleep: During the night and until the worker begins the day, the machine is asleep in S3 sleep state\cite{sleep_state}. Any state other than sleep is considered anomalous during this time.
\item 2 Work Hours: During work hours, little restriction is applied on the activity. Only a long period with the machine asleep is considered anomalous.
\item 3 Maintenance: During the night, the machine wakes up as part of an automated maintenance schedule. During maintenance, updates are fetched and a reboot is performed.
\item 4 No Long High Load: At no point should there be a sustained high load on the machine. Given the scenario of classic office work, having all cores of a machine maxed out is suspicious. Violations of this rule are generated by running the program xmrig for more than 30 seconds. Xmrig is a legitimate crypto-mining software, but it is commonly abused by criminals to build crypto-mining malware.
\end{itemize}
\begin{figure}
@ -616,48 +615,72 @@ For each comrpessed day of experiment (4 hours segment, thereafter refered as da
This label vector associates a label to each sample of the power trace following the mapping: -~1 is UNKNOWN, 0 is SLEEP, 1 is IDLE, 2 is HIGH and 3 is REBOOT.
The training dataset comprises one sample per state, captured during the run of a benchmark script that iteratively places the machine in each state to detect.
\agd{make dataset available}
The script on the machine generates logs that serve as ground truth to verify the results of rule checking.
Figure~\ref{fig:preds} presents an illustration of the results.
The main graph line in the middle is the power consumption over time.
The color of the line represents the predicted state of the machine, based on the power consumption pattern.
Below the graph, two lines illustrate the label vectors.
The top line is the predicted labels and can be seen as a projection of the power consumption line on the x-axis.
The bottom line is the ground-truth labels, generated from the scenario logs.
This figure already shows that the prediction is correct most of the time, except for some noise around state transitions and uncertainty between idle and generic activities (represented as UNKNOWN).
The errors at transitions are explained by the training samples, which focus on stable states and do not provide labels for transition patterns.
A simple solution to avoid this issue, if required, would be to provide training patterns for state transitions.
This type of error suggests that the method is well suited for rule verification, presented in more detail in Section~\ref{2wexp-results}.
\begin{figure}
\centering
\includegraphics[width=0.49\textwidth]{images/preds.pdf}
\caption{Label predictions from MAD for a one-day (compressed) scenario.}
\label{fig:preds}
\end{figure}
\subsection{Security Rules}
Many rules can be imagined to describe the expected and unwanted behavior of a machine.
System administrators can define highly specific rules to detect specific attacks or to match the typical activities of their infrastructure.
We selected 4 rules (see Table~\ref{tab:rules}) that are representative of common threats to the \gls{it} infrastructures of companies or administrations.
These rules are not exhaustive and are merely an example of the potential of converting power consumption traces to actionable data.
The rules are formally defined using the \gls{stl} syntax, which is well suited for describing variable patterns with temporal components.\cn
The rules are formally defined using the \gls{stl} syntax, which is well suited for describing variable patterns with temporal components.
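As a reading aid, the intent of the rule ``no HIGH state for more than 30s'' can be unfolded into explicit quantifiers over the label vector.
This is only a sketch, assuming that $t$ ranges over the sample timestamps of the trace and that $s[t]$ is defined for every such $t$; it does not replace the formal definitions in Table~\ref{tab:rules}.
\[
\forall t_0,\quad s[t_0]=2 \;\Rightarrow\; \exists\, t \in [t_0,\, t_0+30\mathrm{s}],\quad s[t]\neq 2
\]
In words, every sample labeled HIGH must be followed, within 30 seconds, by at least one sample carrying a different label.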
\begin{table*}
\centering
\caption{Security rules applied to the detected states of the machine. $s[t]$ represents the label at time $t$.}
\begin{tabular}{p{0.03\textwidth} | p{0.25\textwidth} | p{0.37\textwidth} | p{0.25\textwidth}}
\begin{tabular}{p{0.03\textwidth} | p{0.20\textwidth} | p{0.43\textwidth} | p{0.25\textwidth}}
Rule & Description & STL Formula & Threat\\
\toprule
1 & "SLEEP" state only & $R_1 := \square_{[0,1h]\cup [2h40,3h20]}(s[t]=0)$ & Machine takeover, Botnet\cite{mitre_botnet}, Rogue Employee\\
2 & Exactly one occurrence of "REBOOT" & $R_2 := \lozenge(s[t]=3) \cup (\neg \square_{[,2h40]}(s[t]=3))$ & \gls{apt}\cite{mitre_prevent}, Backdoors\\
3 & No "HIGH" state for more than 30s. & $R_3 := \square (s[t_0]=2 \rightarrow \lozenge_{[t_0,t_0+30s]}(s[t]=2))$ & CryptoMining Malware \cite{mitre_crypto}, Ransomware\cite{mitre_ransomware}, BotNet\cite{mitre_botnet}\\
4 & No "SLEEP" for more than 8m. & $R_4 := \square (s[t_0]=0 \rightarrow \lozenge_{[t_0,t_0+1h]}(s[t]=0))$ & System Malfunction\\
1 & ``SLEEP'' state only & $R_1 := \square_{[0,1h]}(s[t]=0)$ & Machine takeover, Botnet\cite{mitre_botnet}, Rogue Employee\\
2 & No ``SLEEP'' for more than 8m. & $R_2 := \square_{[1h,2h40]} (s[t_0]=0 \rightarrow \lozenge_{[t_0,t_0+1h]}(s[t]\neq 0))$ & System Malfunction\\
3 & Exactly one occurrence of ``REBOOT'' & $R_3 := \lozenge(s[t_0]=3) \cup (\neg \square_{[t_0,t_0+2h40]}(s[t]=3))$ & \gls{apt}\cite{mitre_prevent}, Backdoors\\
4 & No ``HIGH'' state for more than 30s. & $R_4 := \square (s[t_0]=2 \rightarrow \lozenge_{[t_0,t_0+30s]}(s[t]\neq 2))$ & CryptoMining Malware \cite{mitre_crypto}, Ransomware\cite{mitre_ransomware}, BotNet\cite{mitre_botnet}\\
\bottomrule
\end{tabular}
\label{tab:rules}
\end{table*}
\subsection{Results}
\subsection{Results}\label{2wexp-results}
The performance measure represents the ability of the whole pipeline (\gls{mad} and rule checking) to detect anomalous behavior.
The script on the machine generates logs that serve as ground truth to verify the results of rule checking.
The main metrics are the micro and macro $F_1$ score of the rule violation detection.
The macro-$F_1$ score is defined as the arithmetic mean over individual $F_1$ scores for a more robust evaluation of the global performance as described in \cite{opitz2021macro}.
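For reference, these scores can be written out explicitly.
The notation is an assumption introduced here for illustration: the index $i$ ranges over the $N$ items being averaged (assumed here to be the rules), and $TP_i$, $FP_i$ and $FN_i$ denote the true positives, false positives and false negatives for item $i$.
\[
F_{1,i} = \frac{2\,TP_i}{2\,TP_i + FP_i + FN_i}
\]
\[
F_1^{\mathrm{macro}} = \frac{1}{N}\sum_{i=1}^{N} F_{1,i}
\]
\[
F_1^{\mathrm{micro}} = \frac{2\sum_{i} TP_i}{2\sum_{i} TP_i + \sum_{i} FP_i + \sum_{i} FN_i}
\]
The macro score weights every item equally, whereas the micro score pools the counts and is therefore dominated by the items with the most events.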
Table~\ref{tab:rules-results} presents the performance for the detection of each rule.
\agd{add comment about the results}
The performance is perfect on this scenario, without any false positive or false negative over XX\agd{updates} runs.
The perfect detection of more complex patterns like REBOOT illustrates the need for a system capable of matching arbitrary states.
Many common states of an embedded system are represented by flat lines at varying average levels.
If the only states to detect were OFF, ON and HIGH, then a simple threshold method would suffice.
However, the REBOOT pattern is not so simple.
The REBOOT pattern resembles generic activities and crosses most of the same thresholds.
In order to consistently recognize it, the classifier must have, at its core, a pattern matching mechanism.
This leads us to believe that \gls{mad} balances the tradeoff between simple, explainable and efficient on one side and capable, complete and versatile on the other.
\begin{table}
\centering
\caption{Performance of the complete rule violation detection pipeline.}
\begin{tabular}{lcc}
Rule & Micro-$F_1$ & Macro-$F_1$\\
\begin{tabular}{lccc}
Rule & Violation Ratio & Micro-$F_1$ & Macro-$F_1$\\
\toprule
Night Sleep & ?? & \multirow{4}*{0.??} \\
Work Hours & ?? & \\
Evening Sleep & ?? & \\
Reboot & ?? & \\
Night Sleep & 0.273 & 1.0 & \multirow{4}*{1.0} \\
Work Hours & 0.227 & 1.0 & \\
Reboot & 0.445 & 1.0 & \\
No Long High & 0.773 & 1.0 & \\
\bottomrule
\end{tabular}
\label{tab:rules-results}
@ -712,7 +735,7 @@ Finally, because \gls{mad} is distance-based and window-based, parallelization i
\section{Conclusion}
We present \gls{mad}, a novel solution to enable high-level security policy enforcement from side channel information.
We present \gls{mad} and its associated rule-verification pipeline, a novel solution to enable high-level security policy enforcement from side channel information.
Leveraging side channel information requires labeling samples to discover the state of the monitored system.
Additionally, in the use cases where side-channels are leveraged, collecting large labeled datasets can be challenging.
\gls{mad} is designed around three core features: low data requirement, flexibility of the detection capabilities, and stability of the results.