From 18d2802ee16bd182fe36c1c55292ae4159adc400 Mon Sep 17 00:00:00 2001 From: Arthur Grisel-Davy Date: Wed, 13 Dec 2023 21:54:02 -0500 Subject: [PATCH] remove useless files --- DSD/qrs/old_main.tex | 654 ---------------------------------------- DSD/qrs/refit_table.tex | 19 -- 2 files changed, 673 deletions(-) delete mode 100644 DSD/qrs/old_main.tex delete mode 100644 DSD/qrs/refit_table.tex diff --git a/DSD/qrs/old_main.tex b/DSD/qrs/old_main.tex deleted file mode 100644 index 687a237..0000000 --- a/DSD/qrs/old_main.tex +++ /dev/null @@ -1,654 +0,0 @@ -% This is samplepaper.tex, a sample chapter demonstrating the -% LLNCS macro package for Springer Computer Science proceedings; -% Version 2.20 of 2017/10/04 -% - -% Updates from conference version -% - Proof of monotonicity of the number of unknown samples as a function of alpha (Sec: Influence of alpha) -% - Added a figure to illustrate the areas of capture as a function of alpha (Fig fig:areas) - - - -\documentclass[conference]{IEEEconf} - -\renewcommand\thesection{\arabic{section}} % arabic numerals for the sections -\renewcommand\thesubsectiondis{\thesection.\arabic{subsection}.}% arabic numerals for the subsections -\renewcommand\thesubsubsectiondis{\thesubsectiondis.\arabic{subsubsection}.}% arabic numerals for the subsubsections - - -\usepackage{graphicx} -\usepackage{xcolor} -\usepackage[toc,acronym,abbreviations,nonumberlist,nogroupskip]{glossaries-extra} -\usepackage{booktabs} -\usepackage{multirow} -\usepackage{tabularx} -\usepackage{algpseudocodex} -\usepackage{algorithm} - -\hyphenation{dif-fe-rent} -\hyphenation{mo-di-fi-ca-tion} -\hyphenation{ope-ra-tions} -\hyphenation{acqui-ring} -\hyphenation{in-vo-lun-tary} -\hyphenation{re-le-vant} -\hyphenation{re-pre-sents} -\hyphenation{na-tu-ral-ly} -\hyphenation{col-lec-ting} -\hyphenation{sta-bi-li-ty} -\hyphenation{li-ne-ar} -\hyphenation{Figure} - -\newtheorem{problem-statement}{Problem Statement} - -% Used for displaying a sample figure. 
If possible, figure files should -% be included in EPS format. -% -% If you use the hyperref package, please uncomment the following line -% to display URLs in blue roman font according to Springer's eBook style: -% \renewcommand\UrlFont{\color{blue}\rmfamily} - -\newcommand\agd[1]{{\color{red}$\bigstar$}\footnote{agd: #1}} -\newcommand\SF[1]{{\color{blue}$\bigstar$}\footnote{sf: #1}} -\newcommand{\cn}{{\color{purple}[citation needed]}} -\newcommand{\pv}{{\color{orange}[passive voice]}} -\newcommand{\wv}{{\color{orange}[weak verb]}} -%\citestyle{acmauthoryear} - -%\renewcommand{\baselinestretch}{1.05} - -\begin{document} -\input{acronyms} - -\title{\textbf{\Large MAD: One-Shot Machine Activity Detector\\ for Physics-Based Cyber Security\\}} - -\author{Arthur Grisel-Davy$^{1,*}$, Sebastian Fischmeister$^{2}$\\ - \normalsize $^{1}$University of Waterloo, Ontario, Canada\\ - \normalsize agriseld@uwaterloo.ca, sfishme@uwaterloo.ca\\ - \normalsize *corresponding author -} - -% make the title area -\maketitle -\begin{abstract} -This electronic document is a “live” template. The various components of your paper [title, text, heads, etc.] are already defined on the style sheet, as illustrated by the portions given in this document. DO NOT USE SPECIAL CHARACTERS, SYMBOLS, OR MATH IN YOUR TITLE OR ABSTRACT. -\end{abstract} -\IEEEoverridecommandlockouts -\vspace{1.5ex} - -% For peer review papers, you can put extra information on the cover -% page as needed: -% \begin{center} \bfseries EDICS Category: 3-BBND \end{center} -% -% for peerreview papers, inserts a page break and creates the second title. -% Will be ignored for other modes. -\IEEEpeerreviewmaketitle - -\begin{abstract} -Side-channel analysis offers several advantages over traditional machine monitoring methods. -The low intrusiveness, independence from the host, data reliability and difficulty of bypassing are compelling arguments for using involuntary emissions as input for security policies.
-However, side-channel information often comes in the form of unlabeled time series representing a proxy variable of the activity. -Enabling the definition and enforcement of high-level security policies requires extracting the state or activity of the system. -We present in this paper a novel one-shot time series classifier called \gls{mad}, specifically designed and evaluated for side-channel analysis. -\gls{mad} outperforms traditional state detection solutions in terms of accuracy and, just as importantly, Levenshtein distance of the state sequence. - -\keywords{side-channel analysis, security rules, state detection} -\end{abstract} - -\section{Introduction} - -\gls{ids}s leverage different types of data to detect intrusions. -On one side, most solutions use labeled and actionable data, often provided by the system to protect. -In the software world, this data can be the resource usage \cite{1702202}, program source code \cite{9491765} or network traffic \cite{10.1145/2940343.2940348} leveraged by an \gls{hids} or \gls{nids}. -In the machine monitoring world, the input data can be the shape of a gear \cite{wang2015measurement} or the throughput of a pump \cite{gupta2021novel}. -On the other side, some methods consider only information that the system did not intentionally provide. -The system emits these by-products of its activity through physical mediums called side channels. -Common side-channel information for an embedded system includes power consumption \cite{yang2016power} or electromagnetic fields \cite{chawla2021machine}. -For a production machine, common side-channel information includes vibrations \cite{zhang2019numerical} or chemical composition of fluids \cite{4393062}. - -Side-channel information offers compelling advantages over agent-collected information. -First, the information is difficult to forge.
-Because the monitored system is not involved in the data retrieval process, there is no risk that an attacker who compromised the system could easily send forged information. -For example, if an attacker performs any computation on the system --- which is the case for most attacks --- it will unavoidably affect a variety of side channels. -Second, the side-channel information retrieval process is often non-intrusive and non-disruptive for the monitored system. -Measuring the power consumption of a computer or the vibrations of a machine does not require the cooperation or modification of the system \cite{10.1145/2976749.2978353}. -This host-independence property is crucial for safety-critical or high-availability applications as the failure of one of the two --- monitored or monitoring --- systems does not affect the other. -These two properties --- reliable data and host-independence --- set physics-based monitoring solutions apart with distinct advantages and use cases. - -However, using side-channel data introduces new challenges. -One obstacle to overcome when designing a physics-based solution is the interpretation of the data. -Because the data collection consists of measuring a physical phenomenon, the input data is often a discrete time series. -The values in these time series are not directly actionable. -In some cases, a threshold value is enough to assess the integrity of the system. -In such a case, comparing each value of the time series to the threshold is possible \cite{jelali2013statistical}. -However, whenever a simple threshold is not a reliable factor for the decision, a more advanced analysis of the time series is required to make it actionable. -The state of a machine is often represented by a specific pattern. -This pattern could be, for example, a succession of specific amplitudes or a frequency/average pair for periodic processes. -These patterns are impossible to reliably detect with a simple threshold method.
-Identifying the occurrence and position of these patterns makes the data actionable and enables higher-level --- i.e., working at a higher level of abstraction \cite{tongaonkar2007inferring} --- security and monitoring policies. -For example, a computer starting in the middle of the night or rebooting multiple times in a row should raise an alert for a possible intrusion or malfunction. - -Rule-based \gls{ids}s using side-channel information require an accurate and practical pattern detection solution. -Many data-mining algorithms assume that training data is cheap, meaning that acquiring large --- labeled --- datasets is achievable without major expense. -Unfortunately, collecting labeled data requires following a procedure and induces downtime for the machine, which can be expensive. -Collecting many training samples during normal operations of the machine is more time-consuming as the machine's activity cannot be controlled. -A single sample of each pattern to be detected in the time series is a more convenient data requirement. -Collecting a sample is immediately possible after the installation of the measurement equipment during normal operations of the machine. - -In this paper, we present \gls{mad}, a distance-based, one-shot pattern detection method for time series. -\gls{mad} focuses on providing pre-defined state detection from only one training sample per class. -This approach enables the analysis of side-channel information in contexts where the collection of large datasets is impractical. -A context selection algorithm lies at the core of \gls{mad} and yields stable classification of individual samples, which is important for the robustness of high-level security rules. -In experiments, \gls{mad} outperforms other approaches in accuracy and Levenshtein distance on various simulated, lab-captured, and public time series datasets. - -We will present the current related work on physics-based security and time series pattern detection in Section~\ref{sec:related}.
-Then we will introduce the formal and practical definitions of our solution in Sections~\ref{sec:statement} and~\ref{sec:solution}. -Finally, we will present the datasets considered in Section~\ref{sec:dataset} and the results in Section~\ref{sec:results} to finish with a discussion of the solution in Section~\ref{sec:discussion}. - -\section{Related Work}\label{sec:related} -Side-channel analysis focuses on extracting information from involuntary emissions of a system. -This topic traces back to the seminal work of Paul C. Kocher. -He introduced power side-channel analysis to extract secrets from several cryptographic protocols \cite{kocher1996timing}. -This led to the new field of side-channel analysis \cite{randolph2020power}. -However, the potential of leveraging side-channel information for defense and security purposes remains mostly untapped. -Information leakage through involuntary emissions on different channels provides insights into the activities of a machine. -Acoustic emissions \cite{belikovetsky2018digital}, heat pattern signature \cite{al2016forensics} or power consumption \cite{10.1145/3571288, gatlin2019detecting, CHOU2014400} can --- among other side channels --- reveal information about a machine's activity. -Side-channel information collection generally results in time series objects to analyze. - -There exists a variety of methods for analyzing time series. -For signature-based solutions, a specific extract of the data is compared to known-good references to assess the integrity of the host \cite{9934955, 9061783}. -This signature comparison enables the verification of expected and specific sections and requires that the sections of interest can be extracted and synchronized. -Another solution for detecting intrusions is the definition of security policies. -Security policies are sets of rules that describe wanted or unwanted behavior.
-These rules are built on input data accessible to the \gls{ids} such as user activity \cite{ilgun1995state} or network traffic \cite{5563714, kumar2020integrated}. -However, the input data must satisfy specific requirements for a rule to apply. -This illustrates the gap between the side-channel analysis methods and the rule-based intrusion detection methods. -To apply security policies to side-channel information, it is necessary to first label the data. - -The problem of identifying pre-defined patterns in unlabeled time series is referenced under various names in the literature. -The terms \textit{activity segmentation} or \textit{activity detection} are the most relevant for the problem we are interested in. -The state-of-the-art methods in this domain focus on human activities and leverage various sensors such as smartphones \cite{wannenburg2016physical}, cameras \cite{bodor2003vision} or wearable sensors \cite{uddin2018activity}. -These methods rely on large labeled datasets to train classification models and detect activities \cite{micucci2017unimib}. -For real-life applications, access to large labeled datasets may not be possible. -Another approach, more general than activity detection, uses \gls{cpd}. -\gls{cpd} is a sub-topic of time series analysis that focuses on detecting abrupt changes in a time series \cite{truong2020selective}. -It is assumed in many cases that these change points are representative of state transitions of the observed system. -However, \gls{cpd} is only the first step in state detection as classification of the detected segments remains necessary \cite{aminikhanghahi2017survey}. -Moreover, not all state transitions trigger abrupt changes in time series statistics, and some states include abrupt changes. -Overall, \gls{cpd} only fits a specific type of problem with stable states and abrupt transitions. -Neural networks rose in popularity for time series analysis with \gls{rnn}.
-Large \gls{cnn} can perform pattern extraction in long time series, for example in the context of \gls{nilm} \cite{8598355}. -\gls{nilm} focuses on the problem of signal disaggregation. -In this problem, the signal comprises an aggregate of multiple signals, each with its own patterns \cite{angelis2022nilm}. -This problem shares many terms and core techniques with this paper, but the nature of the input data makes \gls{nilm} a distinct area of research. - -The specific problem of classification with only one example of each class is called one-shot --- or few-shot --- classification. -This topic focuses on pre-extracted time series classification with few training samples, often using multi-level neural networks \cite{10.1145/3371158.3371162, 9647357}. -However, in the context of side-channel analysis, a time series contains many patterns that are not extracted. -Moreover, neural-based approaches lack interpretability, which can cause issues in the case of unforeseen time series patterns. -Simpler approaches with novelty detection capabilities are required when the output serves as input for rule-based processing. - -Finally, Duin et al. investigate the problem of distance-based few-shot classification \cite{duin1997experiments}. -They present an approach based on the similarity between new objects and a dissimilarity matrix between items of the training set. -The similarities are evaluated with Nearest-Neighbor rules or \gls{svm}. -Their approach bears some interesting similarities with the one presented in this paper. -However, they evaluate their work on the recognition of handwritten numerals, which is far from the use case we are interested in. - - -\section{Problem Statement}\label{sec:statement} -%\gls{mad} focuses on detecting the state of a time series at any point in time. -We consider the problem as a multi-class, mono-label classification problem \cite{aly2005survey} for every sample in a time series.
-The problem is multi-class because multiple states can occur in one time series, and therefore any sample is assigned one of multiple states. -The problem is mono-label because only one state is assigned to each sample. -The classification is a mapping from the sample space to the state space. - -\begin{problem-statement}[\gls{mad}] -Given a discretized time series $t$ and a set of patterns $P=\{P_1,\dots, P_n\}$, identify a mapping $m:\mathbb{N}\longrightarrow P\cup \{\lambda\}$ such that every sample $t[i]$ -maps to a pattern in $P\cup \{\lambda\}$ with the condition that the sample matches an occurrence of the pattern in $t$. -\end{problem-statement} - -The time series $t: \mathbb{N} \longrightarrow \mathbb{R}$ is a finite, discretized, mono-variate, real-valued time series. -The patterns (also called training samples) $P_j \in P$ are of the same type as $t$. -Each pattern $P_j$ can take any length denoted $N_j$. -A sample $t[i]$ \textit{matches} a pattern $P_j \in P$ if there exists a substring of $t$, the length of $P_j$, that includes the sample, such that the distance between this substring and $P_j$ is below a pre-defined threshold. -The pattern $\lambda$ is the \textit{unknown} pattern assigned to the samples in $t$ that do not match any of the patterns in $P$. - -\begin{figure} -\centering -\includegraphics[width=0.45\textwidth]{images/overview.pdf} -\caption{Illustration of the sample distance from one sample to each training example in a 2D space.} -\label{fig:overview} -\end{figure} - -\section{Proposed Solution: MAD}\label{sec:solution} -\gls{mad}'s core idea separates it from other traditional sliding window algorithms. -In \gls{mad}, the sample window around the sample to classify dynamically adapts for optimal context selection. -This principle influences the design of the detector and requires the definition of new distance metrics.
-Because the pattern lengths may differ, our approach requires distance metrics that are robust to length variations. -%For the following explanation, the pattern set $P$ refers to the provided patterns only $\{P\setminus \lambda\}$ --- unless specified otherwise. -We first define the fundamental distance metric as the normalized Euclidean distance between two time series $a$ and $b$ of the same length $N_a=N_b$ -\begin{equation} - nd(a,b) = \dfrac{EuclideanDist(a,b)}{N_a} -\end{equation} - -Using this normalized distance $nd$, we define the distance from a sample $t[i]$ to a pattern $P_j \in P$. -This is the sample distance $sd$ defined as -\begin{equation}\label{eq:sd} - sd(i,P_j) = \min_{k\in [i-N_j+1,\,i]}(nd(t[k:k+N_j],P_j)) -\end{equation} - -%with $P_j$ the training sample corresponding to the state $j$, and $t$ the complete time series. -Computing the distance $sd(i,P_j)$ requires: (1) selecting every substring $t[k:k+N_j]$ of $t$ of length $N_j$ that contains the sample $t[i]$, (2) evaluating their normalized distance to the pattern $P_j$, and (3) taking $sd(i,P_j)$ as the smallest of these distances. -For simplicity, Equation~\ref{eq:sd} omits the border conditions for the range of $k$. -When the sample position $i$ is less than $N_j$ or greater than $N_t-N_j$, the range adapts to only consider valid substrings. - -Our approach uses a threshold-based method to decide what label to assign to a sample. -For each sample in $t$, the algorithm compares the distance $sd(i,P_j)$ to the threshold $T_j$. -The sample receives the label $j$ associated with the pattern $P_j$ that results in the smallest distance $sd(i,P_j)$ with $sd(i,P_j)N_j$. -If $N_l