fix alpha impact figure

parent 87820c8e80
commit 60a01f8e46
2 changed files with 97 additions and 88 deletions

DSD/qrs/main.tex

@@ -1,4 +1,3 @@
\documentclass[conference]{IEEEconf}

@@ -33,7 +32,7 @@
\input{acronyms}
\title{\textbf{\Large MAD: One-Shot Machine Activity Detector for Physics-Based Cyber Security\\}}
-\author{Arthur Grisel-Davy$^{1,*}$, Sebastian Fischmeister$^{2}$\\
+\author{Arthur Grisel-Davy$^{1,*}$, Sebastian Fischmeister$^{1}$\\
\normalsize $^{1}$University of Waterloo, Ontario, Canada\\
\normalsize agriseld@uwaterloo.ca, sfishme@uwaterloo.ca\\
\normalsize *corresponding author

@@ -194,6 +193,7 @@ The pattern $\lambda$ is the \textit{unknown} pattern assigned to the samples in
\label{fig:overview}
\end{figure}

\pagebreak
\section{Proposed Solution: MAD}\label{sec:solution}
\gls{mad}'s core idea separates it from other traditional sliding-window algorithms.
In \gls{mad}, the sample window around the sample to classify adapts dynamically for optimal context selection.

@@ -249,6 +249,7 @@ s_i = \underset{j\in[1,k]}{\arg\min}(sd(i,e_j) \textrm{ with } sd(i,e_j)<T_j)
\end{equation}
In the case where no distance is below the threshold, the sample defaults to the \textit{unknown} state.
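
A minimal sketch of this decision rule (an illustrative reading only: it assumes a Euclidean distance for $sd$ and a window sized to each pattern's exemplar $e_j$, neither of which is fixed by the equation above):
\begin{verbatim}
import numpy as np

def classify(series, i, exemplars, thresholds):
    """Return the index j of the matched pattern, or None for unknown.

    series:     1-D numpy array of samples
    exemplars:  list of 1-D numpy arrays, one exemplar e_j per pattern
    thresholds: list of floats, one threshold T_j per pattern
    """
    best_label, best_dist = None, np.inf
    for j, (e, T) in enumerate(zip(exemplars, thresholds)):
        start = i - len(e) + 1          # window adapted to e_j's length
        if start < 0:
            continue                    # not enough context for this pattern
        d = np.linalg.norm(series[start:i + 1] - e)  # stands in for sd(i, e_j)
        if d < T and d < best_dist:     # threshold gate, then arg min
            best_label, best_dist = j, d
    return best_label                   # None plays the role of lambda
\end{verbatim}
With all thresholds set to infinity this degenerates to plain \gls{1nn}; the finite $T_j$ are what create the \textit{unknown} default.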

\subsection{Algorithm}
The algorithm for \gls{mad} follows three steps:

@@ -354,12 +355,83 @@ Thus the second part also terminates.
Finally, the third part uses the same loops as the second and also terminates.
Overall, \gls{mad} always terminates for any finite time series and finite set of finite patterns.

\textbf{Monotonicity of the number of unknown samples}\agd{find better title}
\agd{Explain that the number of unknown samples is monotonic as a function of alpha.
Also, a sample that is classified as unknown will always remain unknown if alpha decreases.}
\textbf{Influence of $\alpha$}
The shrink coefficient $\alpha$ is the only hyperparameter of the detector.
It controls the similarity threshold that a substring must cross to qualify as a match to a pattern.
$\alpha$ takes its value in $\mathbb{R}_*^+$, and its default value is one.
This value follows the intuitive reasoning presented in Section~\ref{sec:solution}.
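
As a worked instance of this control (using the threshold definition $T_j = \alpha\times ID_j$ that appears in the proof below), scaling $\alpha$ rescales every matching threshold by the same factor:
\begin{equation*}
T_j(c\,\alpha) \;=\; c\,\alpha\, ID_j \;=\; c\, T_j(\alpha) \quad \forall j \in [1,k],
\end{equation*}
so $\alpha=0.5$ halves every threshold and $\alpha=2$ doubles it, relative to the default $\alpha=1$.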

\section{Evaluation}
The evaluation of \gls{mad} consists of detecting the states of time series from various machines.
To better understand the influence of the shrink coefficient, the algorithm can be viewed as a 2D area-segmentation problem.
Let us consider the 2D plane in which each pattern has a position based on its shape.
A substring to classify also has a position in the plane and a distance to each pattern (see the bottom part of Figure~\ref{fig:overview}).
During classification, the substring takes the label of the closest pattern.
For any pattern $P_j$, the set of positions in the plane that are assigned to $P_j$ --- i.e., the set of positions for which $P_j$ is the closest pattern --- is called the area of attraction of $P_j$.
In a classic \gls{1nn} context, every point in the plane is in the area of attraction of some pattern.

This infinite area of attraction is not a desirable feature in this context.
Let us now consider a time series exhibiting anomalous or unforeseen behavior.
Some substrings in this time series do not resemble any of the provided patterns.
With infinite areas of attraction, the anomalous points are assigned to a pattern even if they match it poorly.
As a result, the behavior of the security rule can become unpredictable, as anomalous points can receive a seemingly random label.

A more desirable behavior for the state-detection system is to report the presence of unpredicted behavior.
This behavior emerges naturally when the areas of attraction of the patterns are limited to a finite size.
The shrink coefficient $\alpha$ --- through the modification of the thresholds $T_j$ --- provides control over the shrinking of the areas of attraction.
The lower the value of $\alpha$, the smaller the area of attraction around each pattern.
Applying a coefficient to the thresholds reduces the radius of each area of attraction; it is not a homothety of the initial areas.
In other words, the shrink does not preserve the shape of the area.
For $\alpha < 0.5$, all areas become disks --- in the 2D representation --- and all shape information is lost: intuitively, once a threshold drops below half the distance to the closest other pattern, the thresholded region no longer reaches the \gls{1nn} decision boundary, and only a disk of radius $T_j$ remains.
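
The 2D view can be made concrete with a small grid experiment (hypothetical pattern positions and base thresholds; distances are plain Euclidean distances in the plane):
\begin{verbatim}
import numpy as np

patterns = np.array([[0.3, 0.5], [0.7, 0.4]])  # positions of P_1, P_2
ID = np.array([0.4, 0.4])                      # base thresholds ID_j

def areas(alpha, n=200):
    """Fraction of the unit square that is unknown / attracted to P_1 / P_2."""
    xs = np.linspace(0, 1, n)
    grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
    d = np.linalg.norm(grid[:, None, :] - patterns[None, :, :], axis=2)
    within = d < alpha * ID                    # threshold gate T_j = alpha*ID_j
    d = np.where(within, d, np.inf)
    labels = np.where(within.any(axis=1), d.argmin(axis=1), -1)
    return [(labels == j).mean() for j in (-1, 0, 1)]

for alpha in (2.0, 1.0, 0.4):
    print(alpha, areas(alpha))  # attracted areas shrink as alpha decreases
\end{verbatim}
For the smallest $\alpha$ in this sketch, each threshold falls below half the inter-pattern distance and the two attracted regions are plain disks, matching the observation above.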

The impact of the $\alpha$ coefficient on the classification is monotonic and predictable.
Because $\alpha$ influences the thresholds, changing $\alpha$ moves the transitions in the detected labels.
In other words, a lower value of $\alpha$ expands the unknown segments while a higher value shrinks them until they disappear.
Figure~\ref{fig:alpha_impact} illustrates the impact of $\alpha$ on the width of the unknown segments.
The impact of $\alpha$ on the number of unknown samples is also monotonic.

\begin{proof}
We prove the monotonicity of the number of unknown samples as a function of $\alpha$ by induction.
The base case is $\alpha=0$.
In this case, the threshold for every pattern $P_j\in P$ is $T_j = \alpha\times ID_j = 0$.
With every $T_j=0$, no sample can have a distance below its threshold, and every sample is labeled \textit{unknown}.

For the induction step, let us consider $\alpha$ increasing from $\alpha_0$ to $\alpha_1 = \alpha_0 + \delta$ with $\delta \in \mathbb{R}_*^+$.
The increase of $\alpha$ increases every threshold $T$ from $T_0$ to $T_1$:
\begin{equation}
\alpha_0 < \alpha_1 \rightarrow T_0 < T_1
\end{equation}

For every value of every threshold $T$, we can define the set $S_T$ of all samples whose distance falls below the threshold.
When a threshold increases from $T_0$ to $T_1$, all the samples in $S_{T_0}$ also belong to $S_{T_1}$ by the transitivity of the order on $\mathbb{R}_*^+$.
It is also possible for samples to belong to $S_{T_1}$ but not to $S_{T_0}$ if their distance falls between $T_0$ and $T_1$.
Hence, $S_{T_0}$ is a subset of $S_{T_1}$, and the cardinality of $S_T$ as a function of $T$ is monotonically non-decreasing.

We conclude that the number of unknown samples --- i.e., samples above every threshold --- as a function of $\alpha$ is monotonically non-increasing.
\end{proof}
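
The non-increasing behavior is easy to check numerically; the following sketch uses hypothetical distances (expressed relative to each sample's closest base threshold $ID_j$), not data from the paper:
\begin{verbatim}
import numpy as np

# Hypothetical relative distances sd(i, e_j) / ID_j to the closest
# pattern, for 1000 samples.
rng = np.random.default_rng(0)
rel_dist = rng.uniform(0.0, 2.0, size=1000)

for alpha in (0.25, 0.5, 1.0, 2.0):
    # A sample stays unknown when its distance is at or above
    # every threshold T_j = alpha * ID_j.
    unknown = int(np.count_nonzero(rel_dist >= alpha))
    print(f"alpha={alpha:4}: {unknown} unknown samples")
# The printed counts never increase as alpha grows, as the proof predicts.
\end{verbatim}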

Figure~\ref{fig:alpha} presents the number of unknown samples in the classification of the NUCPC-1 time series as a function of the value of $\alpha$.

\begin{figure}
\centering
\includegraphics[width=0.49\textwidth]{images/alpha.pdf}
\caption{Evolution of the number of unknown samples as a function of the shrink coefficient $\alpha$.}
\label{fig:alpha}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=0.49\textwidth]{images/alpha_impact.pdf}
\caption{Behavior of the classifier with different values of $\alpha$. A lower value of $\alpha$ expands the unknown sections (orange sections).}
\label{fig:alpha_impact}
\end{figure}

\pagebreak
\section{Case Study 1: Comparison with Other Methods}
The first evaluation of \gls{mad} consists of detecting the states of time series from various machines.
We evaluate the performance of the proposed solution against other traditional methods to illustrate the capabilities and advantages of \gls{mad}.

\subsection{Performance Metrics}

@@ -493,78 +565,6 @@ With both performance metrics combined, \gls{mad} outperforms the other methods
\label{fig:res}
\end{figure*}
\begin{figure*}
\centering

@@ -573,6 +573,9 @@ Figure~\ref{fig:alpha} presents the number of unknown samples in the classificat
\label{fig:areas}
\end{figure*}

\pagebreak
\section{Case Study 2: Attack Scenarios}

\section{Discussion}\label{sec:discussion}
In this section we highlight specific aspects of the proposed solution.