%% This is file `sample-manuscript.tex',
%% generated with the docstrip utility.
%%
%% The original source files were:
%%
%% samples.dtx (with options: `manuscript')
%%
%% IMPORTANT NOTICE:
%%
%% For the copyright see the source file.
%%
%% Any modified versions of this file must be renamed
%% with new filenames distinct from sample-manuscript.tex.
%%
%% For distribution of the original source see the terms
%% for copying and modification in the file samples.dtx.
%%
%% This generated file may be distributed as long as the
%% original source files, as listed above, are part of the
%% same distribution. (The sources need not necessarily be
%% in the same archive or directory.)
%%
%% Commands for TeXCount
%TC:macro \cite [option:text,text]
%TC:macro \citep [option:text,text]
%TC:macro \citet [option:text,text]
%TC:envir table 0 1
%TC:envir table* 0 1
%TC:envir tabular [ignore] word
%TC:envir displaymath 0 word
%TC:envir math 0 word
%TC:envir comment 0 0
%%
%%
%% The first command in your LaTeX source must be the \documentclass command. This is the generic manuscript mode required for submission and peer review.
\documentclass[manuscript,screen,review]{acmart}
%% To ensure 100% compatibility, please check the white list of
%% approved LaTeX packages to be used with the Master Article Template at
%% https://www.acm.org/publications/taps/whitelist-of-latex-packages
%% before creating your document. The white list page provides
%% information on how to submit additional LaTeX packages for
%% review and adoption.
%% Fonts used in the template cannot be substituted; margin
%% adjustments are not allowed.
\usepackage{graphicx}
\usepackage{multirow}
\usepackage{xcolor}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{algpseudocodex}
\usepackage{algorithm}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}
\newtheorem{problem-statement}{Problem Statement}
\usepackage[toc,acronym,abbreviations,nonumberlist,nogroupskip]{glossaries-extra}
\input{acronyms}
%%
%% \BibTeX command to typeset BibTeX logo in the docs
\AtBeginDocument{%
\providecommand\BibTeX{{%
\normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}
%% Rights management information. This information is sent to you
%% when you complete the rights form. These commands have SAMPLE
%% values in them; it is your responsibility as an author to replace
%% the commands and values with those provided to you when you
%% complete the rights form.
\setcopyright{acmlicensed}
\copyrightyear{2018}
\acmYear{2018}
\acmDOI{XXXXXXX.XXXXXXX}
%% These commands are for a PROCEEDINGS abstract or paper.
% \acmConference[Conference acronym 'XX]{Make sure to enter the correct
% conference title from your rights confirmation email}{June 03--05,
% 2018}{Woodstock, NY}
%
% Uncomment \acmBooktitle if the title of the proceedings is different
% from ``Proceedings of ...''!
%
%\acmBooktitle{Woodstock '18: ACM Symposium on Neural Gaze Detection,
% June 03--05, 2018, Woodstock, NY}
%% These commands are for a JOURNAL article.
\acmJournal{JACM}
\acmVolume{37}
\acmNumber{4}
\acmArticle{111}
\acmMonth{8}
\acmISBN{978-1-4503-XXXX-X/18/06}
%%
%% Submission ID.
%% Use this when submitting an article to a sponsored event. You'll
%% receive a unique submission ID from the organizers
%% of the event, and this ID should be used as the parameter to this command.
%%\acmSubmissionID{123-A56-BU3}
%%
%% For managing citations, it is recommended to use bibliography
%% files in BibTeX format.
%%
%% You can then either use BibTeX with the ACM-Reference-Format style,
%% or BibLaTeX with the acmnumeric or acmauthoryear styles, that include
%% support for advanced citation of software artefact from the
%% biblatex-software package, also separately available on CTAN.
%%
%% Look at the sample-*-biblatex.tex files for templates showcasing
%% the biblatex styles.
%%
%%
%% The majority of ACM publications use numbered citations and
%% references. The command \citestyle{authoryear} switches to the
%% "author year" style.
%%
%% If you are preparing content for an event
%% sponsored by ACM SIGGRAPH, you must use the "author year" style of
%% citations and references.
%% Uncommenting
%% the next command will enable that style.
%%\citestyle{acmauthoryear}
%%
%% end of the preamble, start of the body of the document source.
\begin{document}
%%
%% The "title" command has an optional parameter,
%% allowing the author to define a "short title" to be used in page headers.
\title{This name needs to change to be different from the conference version}
%%
%% The "author" command and its associated commands are used to define
%% the authors and their affiliations.
%% Of note is the shared affiliation of the first two authors, and the
%% "authornote" and "authornotemark" commands
%% used to denote shared contribution to the research.
\author{Arthur Grisel-Davy}
\email{agriseld@uwaterloo.ca}
\affiliation{%
\institution{University of Waterloo}
\city{Waterloo}
\state{Ontario}
\country{Canada}
}
\author{Sebastian Fischmeister}
\email{sfischme@uwaterloo.ca}
\affiliation{%
\institution{University of Waterloo}
\city{Waterloo}
\state{Ontario}
\country{Canada}
}
%%
%% By default, the full list of authors will be used in the page
%% headers. Often, this list is too long, and will overlap
%% other information printed in the page headers. This command allows
%% the author to define a more concise list
%% of authors' names for this purpose.
\renewcommand{\shortauthors}{Grisel-Davy and Fischmeister}
%%
%% The abstract is a short summary of the work to be presented in the
%% article.
\begin{abstract}
Side-channel analysis offers several advantages over traditional machine monitoring methods.
Low intrusiveness, independence from the host, data reliability, and resistance to bypass are compelling arguments for using involuntary emissions as input for enforcing security policies.
However, side-channel information often comes in the form of unlabeled time series of a proxy variable of the activity.
Enabling the definition and enforcement of high-level security policies requires extracting the state or activity of the system from the input data.
This paper presents the Machine Activity Detector (MAD), a novel one-shot pattern locator and classifier for time series, designed and evaluated specifically for side-channel analysis.
We evaluate MAD in two case studies on a variety of machines and datasets, where it outperforms traditional state detection solutions and proves effective for security rule enforcement.
Results of state detection with MAD enable the definition and verification of high-level security rules to detect various attacks without any interaction with the monitored machine.
\end{abstract}
%%
%% The code below is generated by the tool at http://dl.acm.org/ccs.cfm.
%% Please copy and paste the code instead of the example below.
%%
\begin{CCSXML}
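<ccs2012>
   <concept>
       <concept_id>10010583.10010662</concept_id>
       <concept_desc>Hardware~Power and energy</concept_desc>
       <concept_significance>500</concept_significance>
       </concept>
   <concept>
       <concept_id>10002978.10002997.10002999</concept_id>
       <concept_desc>Security and privacy~Intrusion detection systems</concept_desc>
       <concept_significance>500</concept_significance>
       </concept>
   <concept>
       <concept_id>10010147.10010257.10010321</concept_id>
       <concept_desc>Computing methodologies~Machine learning algorithms</concept_desc>
       <concept_significance>500</concept_significance>
       </concept>
 </ccs2012>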
\end{CCSXML}
\ccsdesc[500]{Hardware~Power and energy}
\ccsdesc[500]{Security and privacy~Intrusion detection systems}
\ccsdesc[500]{Computing methodologies~Machine learning algorithms}
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
\keywords{Intrusion Detection, Power Analysis, Embedded Systems, Time Series Analysis}
%\received{20 February 2007}
%\received[revised]{12 March 2009}
%\received[accepted]{5 June 2009}
%%
%% This command processes the author and affiliation and title
%% information and builds the first part of the formatted document.
\maketitle
\section{Introduction}
\glspl{ids} leverage different types of data to detect intrusions.
On one side, most solutions use labeled and actionable data, often provided by the protected system itself.
This data can be the resource usage \cite{1702202}, program source code \cite{9491765} or network traffic \cite{10.1145/2940343.2940348} leveraged by an \gls{hids} or \gls{nids}.
On the other side, some methods consider only information that the system did not intentionally provide.
The system emits these activity by-products through physical mediums called side channels.
Common side-channel information for an embedded system includes power consumption \cite{yang2016power} or electromagnetic fields \cite{chawla2021machine}.
Side-channel information offers compelling advantages over agent-collected information.
First, the information is difficult to forge.
Because the monitored system is not involved in the data retrieval process, an attacker who has compromised the system cannot easily send forged information.
For example, if an attacker performs any computation on the system, it will unavoidably affect a variety of different side channels.
Some studies focus on altering the power consumption profile of software, but their goal is to mask the consumption pattern to avoid leaking side-channel information.
These solutions \cite{1253591,6918465} do not reshape the pattern into an arbitrary target; instead, they aim to make all activities indistinguishable.
These methods still induce changes in the consumption pattern that make them identifiable by the detection system.
Second, the side-channel information retrieval process is often non-intrusive and non-disruptive for the monitored system.
Measuring the power consumption of a computer does not involve the cooperation or modification of the system \cite{10.1145/2976749.2978353}.
This host independence property is crucial for safety-critical or high-availability applications as the failure of one of the two --- monitored or monitoring --- systems does not affect the other.
These two properties --- reliable data and host independence --- set physics-based monitoring solutions apart with distinct advantages and use cases.
Notably, leveraging side-channel analysis to detect malfunctions is not limited to software.
For production machines with high availability requirements, many side channels provide useful information about the state of the machine.
Common sources of information are vibrations \cite{zhang2019numerical}, the chemical composition of various fluids \cite{4393062}, the shape of a gear \cite{wang2015measurement} or performance metrics like the throughput of a pump \cite{gupta2021novel}.
It is important to keep in mind that domains outside of software can also benefit from side-channel analysis tools tailored for security enforcement.
However, using side-channel data introduces new challenges.
One obstacle to overcome when designing a physics-based solution is the interpretation of the data.
Because the data collection consists of measuring a physical phenomenon, the input data is often a discrete time series.
The values in these time series are not directly actionable.
In some cases, a threshold value is enough to assess the integrity of the system.
In such a case, comparing each value of the time series to the threshold is possible \cite{jelali2013statistical}.
However, whenever a simple threshold is not a reliable factor for the decision, a more advanced analysis of the time series is required to make it actionable.
The state of a machine is often represented by a specific pattern.
This pattern could be, for example, a succession of specific amplitudes or a frequency/average pair for periodic processes.
These patterns are impossible to reliably detect with a simple threshold method.
Identifying the occurrence and position of these patterns makes the data actionable and enables higher-level security and monitoring policies, i.e., policies that work at a higher level of abstraction \cite{tongaonkar2007inferring}.
For example, a computer starting at night or rebooting multiple times in a row should raise an alert for a possible intrusion or malfunction.
Rule-based \glspl{ids} using side-channel information require an accurate and practical pattern detection solution.
Many data-mining algorithms assume that training data is cheap, meaning that acquiring large labeled datasets is achievable without significant expense.
Unfortunately, collecting labeled data requires following a procedure and induces downtime for the machine, which can be expensive.
Collecting many training samples during normal operations of the machine is more time-consuming as the machine's activity cannot be controlled.
A more convenient data requirement would be a single sample of each pattern to detect.
Such a sample can be collected during normal operations of the machine, immediately after the measurement equipment is installed.
This paper presents \gls{mad}, a distance-based, one-shot pattern detection method for time series.
\gls{mad} focuses on providing pre-defined state detection from only one training sample per class.
This approach enables the analysis of side-channel information in contexts where the collection of large datasets is impractical.
A window selection algorithm lies at the core of \gls{mad} and yields a stable classification of individual samples, essential for the robustness of high-level security rules.
In experiments, \gls{mad} outperforms other approaches in both accuracy and reduced Levenshtein distance on various simulated, lab-captured, and public time series datasets.
Section~\ref{sec:related} presents related work on physics-based security and time series pattern detection.
Sections~\ref{sec:statement} and~\ref{sec:solution} introduce the formal and practical definitions of the solution.
The two case studies presented in Sections~\ref{sec:cs1} and~\ref{sec:cs2} illustrate the performance of the solution in various situations.
Finally, Section~\ref{sec:discussion} discusses some important aspects of the proposed solution.
\section{Related Work}\label{sec:related}
Side-channel analysis focuses on extracting information from the involuntary emissions of a system.
This topic traces back to the seminal work of Paul C. Kocher.
He introduced timing side-channel analysis to extract secrets from several cryptographic implementations \cite{kocher1996timing}.
This led to the new field of side-channel analysis \cite{randolph2020power}.
However, the potential of leveraging side-channel information for defense and security purposes remains mostly untapped.
Information leaking through involuntary emissions on different channels provides insight into the activities of a machine.
Acoustic emissions \cite{belikovetsky2018digital}, heat pattern signatures \cite{al2016forensics}, and power consumption \cite{10.1145/3571288, gatlin2019detecting, CHOU2014400}, among other side channels, can reveal information about a machine's activity.
Side-channel information collection generally results in time series objects to analyze.
There exists a variety of methods for analyzing time series.
For signature-based solutions, a specific extract of the data is compared to known-good references to assess the integrity of the host \cite{9934955, articlemlcs}.
This signature comparison enables the verification of expected and specific sections and requires that the sections of interest can be extracted and synchronized.
Another solution for detecting intrusions is the definition of security policies.
Security policies are sets of rules that describe wanted or unwanted behavior.
These rules are built on input data accessible to the \gls{ids} such as user activity \cite{ilgun1995state} or network traffic \cite{5563714, kumar2020integrated}.
However, the input data must be labeled for a rule to apply.
This illustrates the gap between the side-channel analysis methods and the rule-based intrusion detection methods.
To apply security policies to side-channel information, it is necessary to first label the data.
The problem of identifying pre-defined patterns in unlabeled time series is referenced under various names in the literature.
The terms \textit{activity segmentation} or \textit{activity detection} are the most relevant for the problem we are interested in.
The state-of-the-art methods in this domain focus on human activities and leverage various sensors such as smartphones \cite{wannenburg2016physical}, cameras \cite{bodor2003vision} or wearable sensors \cite{uddin2018activity}.
These methods rely on large labeled datasets to train classification models and detect activities \cite{micucci2017unimib}.
For real-life applications, access to large labeled datasets may not be possible.
Another approach, more general than activity detection, uses \gls{cpd}.
\gls{cpd} is a sub-topic of time series analysis that focuses on detecting abrupt changes in a time series \cite{truong2020selective}.
It is assumed in many cases that these change points are representative of state transitions from the observed system.
However, \gls{cpd} is only the first step in state detection as classification of the detected segments remains necessary \cite{aminikhanghahi2017survey}.
Moreover, not all state transitions trigger abrupt changes in time series statistics, and some states include abrupt changes.
Overall, \gls{cpd} only fits a specific type of problem with stable states and abrupt transitions.
Neural networks rose in popularity for time series analysis with \glspl{rnn}.
Large \glspl{cnn} can perform pattern extraction in long time series, for example, in the context of \gls{nilm} \cite{8598355}.
\gls{nilm} focuses on the problem of signal disaggregation.
In this problem, the signal comprises an aggregate of multiple signals, each with its own patterns \cite{angelis2022nilm}.
This problem shares many terms and core techniques with this paper, but the nature of the input data makes \gls{nilm} a distinct area of research.
The specific problem of classification with only one example of each class is called one-shot (or few-shot) classification.
This topic focuses on pre-extracted time series classification with few training samples, often using multi-level neural networks \cite{10.1145/3371158.3371162, 9647357}.
However, in the context of side-channel analysis, a time series contains many patterns that are not extracted.
Moreover, neural-based approaches lack interpretability, which can cause issues in the case of unforeseen time series patterns.
Simpler approaches with novelty detection capabilities are required when the output serves as input for rule-based processing.
Finally, Duin et al. investigate the problem of distance-based few-shot classification \cite{duin1997experiments}.
They present an approach based on the similarity between new objects and a dissimilarity matrix between items of the training set.
The similarities are evaluated with Nearest-Neighbor rules or \gls{svm}.
Their approach bears some interesting similarities with the one presented in this paper.
However, they evaluate their work on the recognition of handwritten numerals, which is far from the use case we are interested in.
\section{Problem Statement}\label{sec:statement}
%\gls{mad} focuses on detecting the state of a time series at any point in time.
We consider the problem from the point of view of a multi-class, mono-label classification problem \cite{aly2005survey} for every sample in a time series.
The problem is multi-class because multiple states can occur in one time series, and therefore any sample is assigned one of multiple states.
The problem is mono-label because only one state is assigned to each sample.
The classification is a mapping from the sample space to the state space.
\begin{problem-statement}[\gls{mad}]
Given a discretized time series $t$ and a set of patterns $P=\{P_1,\dots, P_n\}$, identify a mapping $m:\mathbb{N}\longrightarrow P\cup \{\lambda\}$ such that every sample $t[i]$
maps to a pattern in $P\cup \{\lambda\}$ with the condition that the sample matches an occurrence of the pattern in $t$.
\end{problem-statement}
The time series $t: \mathbb{N} \longrightarrow \mathbb{R}$ is a finite, discretized, mono-variate, real-valued time series.
The patterns (also called training samples) $P_j \in P$ are of the same type as $t$.
Each pattern $P_j$ can take any length denoted $N_j$.
A sample $t[i]$ \textit{matches} a pattern $P_j \in P$ if there exists a substring of $t$ of length $N_j$ that includes the sample and whose distance to $P_j$ is below a pre-defined threshold.
The pattern $\lambda$ is the \textit{unknown} pattern assigned to the samples in $t$ that do not match any of the patterns in $P$.
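As a small illustration with hypothetical values and zero-based indices (both assumptions made only for this example), consider $t=(0,0,1,5,1,0,0)$ and a single pattern $P_1=(1,5,1)$ of length $N_1=3$.
The substring $(t[2],t[3],t[4])=(1,5,1)$ is an exact occurrence of $P_1$, so the samples at positions $2$, $3$, and $4$ match $P_1$.
If the threshold is small enough that no other substring of length $3$ matches, the expected mapping is
\[
m(i) =
\begin{cases}
P_1 & \text{if } i\in\{2,3,4\},\\
\lambda & \text{otherwise.}
\end{cases}
\]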
\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{images/overview.pdf}
\caption{Illustration of the sample distance from one sample to each training example in a 2D space.}
\label{fig:overview}
\end{figure}
\section{Proposed Solution: MAD}\label{sec:solution}
\gls{mad}'s core idea separates it from other traditional sliding window algorithms.
In \gls{mad}, the sample window around the sample to classify dynamically adapts for optimal context selection.
This principle influences the design of the detector and requires the definition of new distance metrics.
Because the lengths of the patterns may differ, our approach requires distance metrics robust to length variations.
%For the following explanation, the pattern set $P$ refers to the provided patterns only $\{P\setminus \lambda\}$ --- unless specified otherwise.
We first define the fundamental distance metric as the normalized Euclidean distance between two time series $a$ and $b$ of the same length $N_a=N_b$:
\begin{equation}
nd(a,b) = \dfrac{\mathrm{EuclideanDist}(a,b)}{N_a}
\end{equation}
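As a worked illustration, take the hypothetical series $a=(1,3,2,5)$ and $b=(1,0,2,1)$ with $N_a=N_b=4$; the values are chosen only for easy arithmetic, and the usual Euclidean ($\ell_2$) distance is assumed.
The element-wise differences are $(0,3,0,4)$, which gives
\[
nd(a,b) = \dfrac{\sqrt{0^2+3^2+0^2+4^2}}{4} = \dfrac{5}{4} = 1.25 .
\]
Dividing by the length makes distances computed against patterns of different lengths more comparable, which is the robustness to length variations required above.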
Using this normalized distance $nd$, we define the distance from a sample $t[i]$ to a pattern $P_j \in P$.
This is the sample distance $sd$ defined as
\begin{equation}\label{eq:sd}
sd(i,P_j) = \min_{k\in [i-N_j+1,\,i]} nd(t[k:k+N_j],P_j)
\end{equation}
%with $P_j$ the training sample corresponding to the state $j$, and $t$ the complete time series.
Here, $t[k:k+N_j]$ denotes the $N_j$ consecutive samples of $t$ starting at index $k$.
Computing the distance $sd(i,P_j)$ takes three steps: (1) select every substring of $t$ of length $N_j$ that contains the sample $t[i]$, (2) evaluate the normalized distance of each substring to the pattern $P_j$, and (3) take the smallest of these distances as $sd(i,P_j)$.
For simplicity, Equation~\ref{eq:sd} omits the border conditions for the range of $k$.
When the sample position $i$ is less than $N_j$ or greater than $N_t-N_j$, where $N_t$ denotes the length of $t$, the range adapts to only consider valid substrings.
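The three steps and the border conditions described above can be sketched as Algorithm~\ref{alg:sd-sketch}.
This is a minimal illustration rather than the reference implementation: the zero-based indexing, the function name, the label, and the variables $k_{\min}$ and $k_{\max}$ are assumptions introduced here, and the pattern is assumed to be no longer than the time series ($N_j \leq N_t$).
\begin{algorithm}
\caption{Illustrative sketch of the sample distance computation (zero-based indexing assumed)}\label{alg:sd-sketch}
\begin{algorithmic}[1]
\Function{SampleDistance}{$t, i, P_j$}
    \State $N_t \gets$ length of $t$
    \State $N_j \gets$ length of $P_j$
    \State $k_{\min} \gets \max(0,\, i-N_j+1)$ \Comment{first valid window start}
    \State $k_{\max} \gets \min(i,\, N_t-N_j)$ \Comment{last valid window start}
    \State $d \gets +\infty$
    \For{$k \gets k_{\min}$ \textbf{to} $k_{\max}$}
        \State $w \gets (t[k],\dots,t[k+N_j-1])$ \Comment{window of length $N_j$ containing $t[i]$}
        \State $d \gets \min(d,\, nd(w,P_j))$
    \EndFor
    \State \Return $d$
\EndFunction
\end{algorithmic}
\end{algorithm}
For instance, with the hypothetical values $t=(0,1,5,1,0,0)$, $P_j=(1,5,1)$, and $i=2$, the valid window starts are $k\in\{0,1,2\}$.
The windows $(0,1,5)$, $(1,5,1)$, and $(5,1,0)$ yield normalized distances $\sqrt{33}/3$, $0$, and $\sqrt{33}/3$, so $sd(2,P_j)=0$.
At the border $i=0$, only the window starting at $k=0$ is valid.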
Our approach uses a threshold-based method to decide what label to assign to a sample.
For each sample in $t$, the algorithm compares the distance $sd(i,P_j)$ to the threshold $T_j$.
The sample receives the label $j$ associated with the pattern $P_j$ that results in the smallest distance $sd(i,P_j)$ with $sd(i,P_j) < T_j$.
If $N_l