% --- Repository/web-viewer metadata (not LaTeX source); commented out so the file compiles ---
% deneir/EET1/MLCS_conference/old_main.tex
% 2023-06-12 13:13:51 -04:00
%
% 1069 lines
% 88 KiB
% TeX
% Raw Blame History
%
% This file contains ambiguous Unicode characters
%
% This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
%% bare_jrnl.tex
%% V1.4b
%% 2015/08/26
%% by Michael Shell
%% see http://www.michaelshell.org/
%% for current contact information.
%%
%% This is a skeleton file demonstrating the use of IEEEtran.cls
%% (requires IEEEtran.cls version 1.8b or later) with an IEEE
%% journal paper.
%%
%% Support sites:
%% http://www.michaelshell.org/tex/ieeetran/
%% http://www.ctan.org/pkg/ieeetran
%% and
%% http://www.ieee.org/
%%*************************************************************************
%% Legal Notice:
%% This code is offered as-is without any warranty either expressed or
%% implied; without even the implied warranty of MERCHANTABILITY or
%% FITNESS FOR A PARTICULAR PURPOSE!
%% User assumes all risk.
%% In no event shall the IEEE or any contributor to this code be liable for
%% any damages or losses, including, but not limited to, incidental,
%% consequential, or any other damages, resulting from the use or misuse
%% of any information contained here.
%%
%% All comments are the opinions of their respective authors and are not
%% necessarily endorsed by the IEEE.
%%
%% This work is distributed under the LaTeX Project Public License (LPPL)
%% ( http://www.latex-project.org/ ) version 1.3, and may be freely used,
%% distributed and modified. A copy of the LPPL, version 1.3, is included
%% in the base LaTeX documentation of all distributions of LaTeX released
%% 2003/12/01 or later.
%% Retain all contribution notices and credits.
%% ** Modified files should be clearly indicated as such, including **
%% ** renaming them and changing author support contact information. **
%%*************************************************************************
% *** Authors should verify (and, if needed, correct) their LaTeX system ***
% *** with the testflow diagnostic prior to trusting their LaTeX platform ***
% *** with production work. The IEEE's font choices and paper sizes can ***
% *** trigger bugs that do not appear when using other class files. *** ***
% The testflow support page is at:
% http://www.michaelshell.org/tex/testflow/
\documentclass[journal]{IEEEtran}
\usepackage[toc,acronym,abbreviations,nonumberlist,nogroupskip,style=super]{glossaries-extra}
\usepackage{numprint}
\usepackage{tabularx}
\newcolumntype{Y}{>{\centering\arraybackslash}X}
% \renewcommand{\baselinestretch}{.98}
% \usepackage[compact]{titlesec}
%
% \titlespacing*{\section}{0mm}{3mm}{1.5mm}
% \titlespacing*{\subsection}{0mm}{2.5mm}{1mm}
% \titlespacing*{\subsubsection}{0mm}{2mm}{0.75mm}
\usepackage{booktabs}
\usepackage{amssymb}
\usepackage{textcomp}% http://ctan.org/pkg/amssymb
\usepackage{pifont}% http://ctan.org/pkg/pifont
\newcommand{\cmark}{\ding{51}}% "supported" marker: Pifont glyph 51 (check mark), used in the capability table
\newcommand{\xmark}{\textbullet}% "not supported" marker: NOTE it renders as a bullet, not a cross, despite the name
\usepackage[hidelinks]{hyperref}
\usepackage{flushend}
% *** CITATION PACKAGES ***
%
\usepackage{cite}
% \usepackage[pdftex]{graphicx}% duplicate load commented out: graphicx is loaded in the \ifCLASSINFOpdf branch below
% *** GRAPHICS RELATED PACKAGES ***
%
\ifCLASSINFOpdf
\usepackage[pdftex]{graphicx}
% declare the path(s) where your graphic files are
% \graphicspath{{../pdf/}{../jpeg/}}
% and their extensions so you won't have to specify these with
% every instance of \includegraphics
\DeclareGraphicsExtensions{.pdf,.jpeg,.png}
\else
% or other class option (dvipsone, dvipdf, if not using dvips). graphicx
% will default to the driver specified in the system graphics.cfg if no
% driver is specified.
% \usepackage[dvips]{graphicx}
% declare the path(s) where your graphic files are
\graphicspath{{../eps/}}
% and their extensions so you won't have to specify these with
% every instance of \includegraphics
\DeclareGraphicsExtensions{.eps}
\fi
% \usepackage{amssymb}
% \newcommand\SF[1]{$\bigstar$\footnote{sf: #1}}
% \newcommand\AG[1]{$\bigstar$\footnote{agd: #1}}
% \newcommand\CM[1]{$\bigstar$\footnote{cm: #1}}
% \newcommand\JD[1]{$\bigstar$\footnote{jd: #1}}
% \newcommand\SSS[1]{$\bigstar$\footnote{ss: #1}}
% \newcommand\GG[1]{$\bigstar$\footnote{gg: #1}}
% \usepackage[pdftex]{graphicx}% duplicate load commented out: graphicx is already loaded above
\usepackage{adjustbox}
% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
\input{acronyms}
\begin{document}
%
% paper title
% Titles are generally capitalized except for words such as a, an, and, as,
% at, but, by, for, in, nor, of, on, or, the, to and up, which are usually
% not capitalized unless they are the first or last word of the title.
% Linebreaks \\ can be used within to get better formatting as desired.
% Do not put math or special symbols in the title.
\title{Side-channel Based Intrusion Detection\\for Network Equipment}
%\author{Paper 1175}
%
%
% author names and IEEE memberships
% note positions of commas and nonbreaking spaces ( ~ ) LaTeX will not break
% a structure at a ~ so this keeps an author's name from being broken across
% two lines.
% use \thanks{} to gain access to the first footnote area
% a separate \thanks must be used for each paragraph as LaTeX2e's \thanks
% was not built to handle multiple paragraphs
%
\author{Julian Dickert*, Sebastian Fischmeister*, Goksen U. Guler*, Arthur Grisel-Davy*, Waleed Khan*, Carlos Moreno*, Jack Morgan*, Shikhar Sakhuja$^\ddagger$*, Philippe Vibien* \thanks{* Author names are listed in alphabetical order.} \\[1em]
Department of Electrical and Computer Engineering\\
\& $^\ddagger$ David R. Cheriton School of Computer Science\\
University of Waterloo, Canada}
% note the % following the last \IEEEmembership and also \thanks -
% these prevent an unwanted space from occurring between the last author name
% and the end of the author line. i.e., if you had this:
%
% \author{....lastname \thanks{...} \thanks{...} }
% ^------------^------------^----Do not want these spaces!
%
% a space would be appended to the last name and could cause every name on that
% line to be shifted left slightly. This is one of those "LaTeX things". For
% instance, "\textbf{A} \textbf{B}" will typeset as "A B" not "AB". To get
% "AB" then you have to do: "\textbf{A}\textbf{B}"
% \thanks is no different in this regard, so shield the last } of each \thanks
% that ends a line with a % and do not let a space in before the next \thanks.
% Spaces after \IEEEmembership other than the last one are OK (and needed) as
% you are supposed to have spaces between the names. For what it is worth,
% this is a minor point as most people would not even notice if the said evil
% space somehow managed to creep in.
% The paper headers
% The only time the second header will appear is for the odd numbered pages
% after the title page when using the twoside option.
%
% *** Note that you probably will NOT want to include the author's ***
% *** name in the headers of peer review papers. ***
% You can use \ifCLASSOPTIONpeerreview for conditional compilation here if
% you desire.
% If you want to put a publisher's ID mark on the page you can do it like
% this:
%\IEEEpubid{0000--0000/00\$00.00~\copyright~2015 IEEE}
% Remember, if you use this you must call \IEEEpubidadjcol in the second
% column for its text to clear the IEEEpubid mark.
% use for special paper notices
%\IEEEspecialpapernotice{(Invited Paper)}
% make the title area
\maketitle
% As a general rule, do not put math, special symbols or citations
% in the abstract or keywords.
\begin{abstract}
Current security protection mechanisms for embedded systems often include running a Host-based Intrusion Detection System~(HIDS) on the system itself. This presents a problem where an attacker can leverage a vulnerability in the underlying system to attack the Intrusion Detection System~(IDS) and disable the protection mechanism. In the context of embedded systems, such as network equipment, these devices remain vulnerable to firmware and hardware tampering, as well as log manipulation and forging.
Recent work demonstrates the effectiveness of separating the detection mechanism from the monitored system. Side-channel emissions, such as power consumption, ultrasound, or electromagnetic waves, provide an independent and extrinsic data source that allows extraction of details about the system state. The information collected from the side-channels offers an accurate representation of the operations within the system.
To address the vulnerabilities of HIDS, this paper presents a solution for external IDS that analyzes the system state inferred from side-channels to offer protection.
The external IDS utilizes machine-learning-based side-channel analysis to monitor and assess the behaviour and integrity of network switches in real time. The proposed IDS successfully offers intrusion detection for an HP Procurve Network Switch 5406zl, with data available from a live environment with roughly \numprint{3000} active users, using its power consumption as side-channel emissions.
The proposed IDS successfully detects three different classes of attacks: (i)~firmware manipulation with \numprint[\%]{99} accuracy, (ii)~hardware tampering with \numprint[\%]{100} accuracy, and (iii)~brute-force SSH login attempts with \numprint[\%]{98} accuracy. The machine-learning models behind the IDS use a small number of power traces as the training data and still achieve a high accuracy for the attack detection.
The concepts and techniques discussed in the paper can also extend to offer intrusion detection for embedded systems in general.
% Most network embedded systems in data centers use Intrusion Detection Systems (IDS) to deter cases of attacks from bad actors. Literature shows that IDS are ineffective in defending against attacks such as Firmware Manipulation and Hardware Tampering. IDS can also be ineffective for run time monitoring of networks. Involuntary emissions -- power consumption, ultrasound, electromagnetic waves -- from a embedded system can compliment IDS and offer protection where the IDS fail. These emissions offer an accurate representation of the operations within the embedded system.
% This paper offers a proof of concept for analyzing the power consumption of a black box system under test (SUT) using Machine Learning. The SUT for the experiment was an HP Procurve Network Switch. The suite of experiments assesses the integrity of the switch and deploys run-time monitoring on the switch without any side-effects. All experiments were performed on the switch without taking it offline or tampering with the switch. Thus, the research compliments IDS in assessing integrity of an SUT by detecting firmware manipulation, hardware tampering, and offers run-time monitoring by identifying and classifying SSH attempts through just the AC and DC power consumption of the switch.
\end{abstract}
\glsresetall % reset all acronyms to be expanded on first use.
% Note that keywords are not normally used for peerreview papers.
% For peer review papers, you can put extra information on the cover
% page as needed:
% \ifCLASSOPTIONpeerreview
% \begin{center} \bfseries EDICS Category: 3-BBND \end{center}
% \fi
%
% For peerreview papers, this IEEEtran command inserts a page break and
% creates the second title. It will be ignored for other modes.
\section{Introduction}
Data centers are experiencing unprecedented growth~\cite{osti_1372902} because of the increased reliance on cloud services. Due to this growth, cyberattacks on data centers are at an all-time high, with a 54\% increase in 2019 alone~\cite{datacenterbreach}. The downtime of data centers costs companies hundreds of thousands of dollars per hour~\cite{6848725}. For example, Facebook lost 90 million USD over an outage that lasted merely 14 hours.
All data centers use network equipment such as network switches and routers. A successful attack on a network switch could have devastating effects on the integrity of the data center. To deter cyberattacks, data centers often use \glspl{ids}.
Current \glspl{ids} use different approaches to detect intrusions.
\glspl{hids} are implemented directly on the monitored device and leverage information provided by the system (e.g., log entries, resource usage, or configuration files) to detect intrusions.
\glspl{nids} leverage network information (e.g., traffic frames, traffic volume, or firewall configurations) to detect intrusions at the network level.
Although \glspl{hids} and \glspl{nids} offer comprehensive intrusion detection capabilities, they are still quite ineffective against attacks such as firmware modification~\cite{cisco_trust,thomson_2019} and bypassing secure boot-up~\cite{Cui2013WhenFM, hau_2015}. They also fail to offer effective run-time monitoring through auditing and verifying log entries~\cite{koch2010security}.
% Network Equipments are embedded platforms that generate recurrent emission patterns. These emissions are involuntary can exist as electromagnetic, noise or electrical signals. These emissions strictly correlate to the system's activity. Any physical channel that generates such an involuntary emission is called a side-channel. These emissions are formally called side-channel emissions and can offer insights into a system under observation. Traditionally, researchers used side-channel emissions to attack systems. These attacks can impact personal computers, servers, mobile devices or any type of embedded systems. Some examples of these attacks present the possibility of reducing the field of research for a cryptographic key \cite{10.1007/3-540-68697-5_9}, predicting user inputted text based on the sound of a keyboard \cite{10.1145/1609956.1609959} or recovering a document using the sound of a printer \cite{printers}.
% Some side channel attacks even leverage electromagnetic emissions of a chip \cite{10.1007/3-540-36400-5_4}.
% These types of attacks can be easy to implement and minimally invasive as they rely on information that is independent of the system and is extrinsically sourced.
The literature shows promising work in improving the state of the art in security by analyzing side-channel emissions from embedded systems. These can be in the form of power consumption \cite{kocher1999differential, brier2004correlation, mangard2008power, quisquater2002automatic, Moreno2018, msgna2014verifying, kur2009improving}, electromagnetic waves \cite{khan2019malware, sehatbakhsh2019remote, yilmaz2019detecting, 8192483}, acoustic emissions \cite{genkin2014rsa, liuacoustic}, etc. Systems generate side-channel emissions as recurrent patterns that usually correspond to the system's activity. Side-channel based \glspl{ids} (see Figure~\ref{fig:side-ids}) analyze side-channel emissions and can improve the state of the art in \glspl{ids}, as shown in this paper. The \gls{ids} uses \gls{dsp} and \gls{ml} algorithms to detect anomalies or recognize patterns of previously detected intrusions. Thus, the use of this \gls{ids} would improve the security of the embedded system by detecting attack vectors that regular \glspl{ids} fail to identify.
\subsection{Contributions}
This paper proposes a side-channel based \gls{ids} that can complement existing \glspl{ids} and improve security for embedded systems. The side-channel based \gls{ids} can potentially treat any embedded system as a black box and detect a range of attacks against it.
Our \gls{ids} treats an HP Procurve 5406zl network switch as a black box.
The experiments in the paper together constitute a side-channel based IDS that has the following capabilities:
\begin{itemize}
\item Detecting firmware manipulation and hardware tampering attacks against the switch.
\item Defending against log entry forging by offering log verification/auditing.
\end{itemize}
The side-channel based \gls{ids} achieves near perfect accuracy scores despite using relatively straightforward \gls{dsp} methods and \gls{ml} algorithms. The algorithms analyze AC and DC power consumption of the network switch to detect these attacks. The experiments use a relatively small dataset that contains roughly \numprint{1000} power traces.
The small data requirement and high accuracy rates while defending against attacks make the techniques outlined in this paper ready for deployment in the industry.
\subsection{Paper Organization}
The remainder of the paper is organized as follows:
Section~\ref{sec:Overview} provides an overview for the motivation for the experiments and threat model.
Section~\ref{Related Work} talks about other side-channel based approaches for run-time monitoring and integrity assessment.
Section~\ref{Firmware} covers experiments related to Firmware Manipulation,
Section~\ref{RunTime} covers Log Verification and Auditing,
and Section~\ref{Hardware} covers Hardware Tampering.
Section~\ref{Discussion} holds some discussion about the scope and limitations of the work. Section~\ref{sec:big_picture} details the wider potential and applicability of the work.
The paper finally concludes in Section~\ref{Conclusion}.
\section{Overview}
\label{sec:Overview}
All embedded systems leak information about their operation through side channel emissions.
Side-channel based \glspl{ids} use \gls{dsp} methods and \gls{ml} algorithms to model the side-channel data and learn patterns from the data that correlate to the system activity.
A major part of designing a reliable side-channel \gls{ids} is identifying quality side-channel emissions. While a system emits a wide range of side-channels such as temperature, vibration, ultrasound, EM, power consumption, etc., our experiments focus on the power consumption of the system.
Power consumption has been studied for its use in assessing the internal state of embedded systems.
It is reasonably easy to measure non-intrusively, and discussions surrounding power analysis date back more than two decades~\cite{kocher1999differential}.
Hence, our primary source of side-channel for the \gls{ids} is the \gls{ac} and \gls{dc} power consumption of the network switch.
Side-channel based \glspl{ids} can complement \gls{hids} and \gls{nids} in offering runtime monitoring and integrity assessment for embedded systems, as shown in Table~\ref{tab:example}. Side-channel based \glspl{ids} run independently of the system they monitor, which makes them more difficult to circumvent than an \gls{ids} hosted within the system.
Because of this independence, a malfunction of the \gls{ids} cannot disrupt the regular operation of the system.
This makes the system monitored by the \gls{ids} immune to any operational failure or security vulnerability that the \gls{ids} might have.
This paper presents a case study for using side-channel based \glspl{ids} to offer run-time monitoring and integrity assessment for network equipment.
\begin{figure}[h]
\centering
\includegraphics[width=\columnwidth]{images/preview_ids}
\caption{IDS based on the involuntary emissions of the system}
\label{fig:side-ids}
\end{figure}
% For example, our system consumed more power while it was responding to an SSH attempt. Further, the duration of the increased power consumption was longer for a successful SSH attempt compared to an unsuccessful one. Similarly, other properties of the system can also be measured using the time-series data. For instance, an extra hardware module would constantly result in a higher power consumption which would allow analysis to detect instances of hardware manipulation. Even different firmware versions have different power consumption while booting up.
% Machine Learning methods can train over the side-channel profiles and compliment IDS in enhancing security of embedded systems. Table \ref{tab:example} shows different attacks where machine-learning based side-channel analysis can compliment IDS in offering protection for the system.
\begin{table}[htb]
\centering
\begin{tabularx}{\columnwidth}{X>{\hsize=.4\hsize}cccc}
\toprule
\textbf{Attack Scenarios} & \textbf{Reference} & \textbf{\gls{hids}} & \textbf{\gls{nids}} & \textbf{SCIDS}\tabularnewline
\midrule
% The attacker can: & & & \tabularnewline
% \addlinespace[1em]
Run unapproved executable through backdoor & \small{\cite{cve-2018-0150,cve-2018-0151,cve-2018-0222,cve-2018-0329,cve-2018-15439}}
& \cmark & \xmark & \cmark\tabularnewline
\addlinespace[1em]
Exploit existing executable & \small{\cite{kovacs_2019,CVE-2019-12649,CVE-2019-12651}}& \cmark & \xmark & \cmark\tabularnewline
\addlinespace[1em]
Spy on the network & \small{\cite{Hernandez2014SmartNT,router_hacking_slingshot}}& \xmark & \cmark & \cmark \tabularnewline
\addlinespace[1em]
Pivot/proxy for network attack & \small{\cite{router_hacking_slingshot,symantec_security_response}} & \xmark & \cmark & \cmark\tabularnewline
\addlinespace[1em]
Bypass secure boot & \small{\cite{cisco_trust,thomson_2019}} & \xmark & \xmark & \cmark\tabularnewline
\addlinespace[1em]
Change firmware & \small{\cite{Cui2013WhenFM,hau_2015}}& \xmark & \xmark & \cmark\tabularnewline
\bottomrule
\addlinespace[1em]
\end{tabularx}
\caption{Attack scenarios that side-channel based \gls{ids} can detect}
\label{tab:example}
\end{table}
% The experiments successfully identifies activities within the system. With side-channel information, the models trained can identify instances of firmware manipulation, offer defence again brute-force SSH attempt while offering run-time monitoring, and detect hardware tampering.
\subsection{Threat Model}
\label{subsec:threat-model}
In the context of this work, we consider active attackers that can tamper with the execution of the network devices. These attackers can accomplish their goal by assuming different roles and exploiting several mechanisms, as summarized below:
\begin{itemize}
\item \textbf{Remote Code Execution:} A remote attacker could exploit the exposure of the network device's administrative features (e.g., login capabilities, with or without administrative privileges) to the local network or the Internet. Thus, the attacker may take advantage of available, or zero-day, remote exploits in categories such as remote code injection, privilege escalation, etc.
The outcome could be to temporarily tamper with the device's \hbox{execution\,---\,that} is, alter the current execution, with a device's reboot restoring the correct functionality; or modify the device's configuration settings so that even after a reboot, the altered functionality will remain active.
\item \textbf{Brute-Force or Dictionary-Based Password Guessing:} % $\bigstar$\footnote{I'm not particularly convinced of adding this; reviewers could easily argue that it doesn't make sense to use side-channels to protect against this, which is trivially detectable by standard (Network based) IDSs. Then again, a big section of the experiments deals with brute-force login attempts; maybe we should sell those experiments as nothing more than a demonstration that we can detect deviations from the normal execution, and that SSH login attempts are just an example of such deviations?}
A remote attacker could attempt to login through password guessing, with the objective of tampering with the device's execution once logged in.
\item \textbf{Unauthorized Firmware Reprogramming (or Failure to Apply a Scheduled Firmware Upgrade):} Either through physical access to the device, or upon a successful administrative login (either by a legitimate administrator or a remote attacker that guessed or stole an administrator's credentials), the attacker can reprogram the firmware of the device. The applied firmware can be an older version (if the device allows it) to reactivate a particular vulnerability, or it could be a custom firmware that contains some backdoor or rootkit functionality.
\item \textbf{Unauthorized Hardware Configuration Changes:} An attacker with physical access to the device could apply undocumented changes to the configuration of the device, e.g., by connecting or disconnecting modules, tampering with configuration switches or jumpers, etc. Depending on the device's capabilities, a remote attacker could potentially enable or disable modules or functionality of the device, keeping these changes undocumented.
\item \textbf{Tampering with Administrative/Maintenance Logs:} The attacker's goal may be to mislead the operators through actions such as failing to apply a firmware upgrade while reporting that the firmware has been upgraded; this could be done with the purpose of keeping a particular vulnerability in the device while the administrators assume that such vulnerability has been addressed.
\end{itemize}
In the cases of attackers with physical access to the devices, we highlight the
aspect that these attackers are assumed to have limited, perhaps opportunistic,
physical access; that is, they may be one rogue operator in a team of several
operators with administrative access. Moreover, it is assumed that a system such
as the \gls{ids} that we propose in this paper would be implemented with additional
physical security measures, to make it physically inaccessible to such local
attackers. In other words, it is assumed that even an attacker that can physically
tamper with the network device will not be able to tamper with the \gls{ids}.
\subsection{Analysis of Side-channels}
Electronic systems, including embedded devices, involuntarily leak information through different types of side-channels.
Due to each side-channel's specific nature, each one can, to a greater or lesser extent, prove useful for different applications.
In the context of \gls{ids} for network equipment, we considered power consumption, ultrasound and \gls{em} emissions as the most promising side-channels.
In our setup, the power consumption of the device is measured in two different ways: measurement at the AC line (between the power outlet and the device's \gls{psu}); and measurement at the DC power (from the \gls{psu} to the ``motherboard'' of the device). We evaluated both measurements since each has unique advantages that the other lacks. During every operation of the device, the different instructions will have an impact on the overall power consumption~\cite{727070} that will be detectable in both the \gls{ac} and \gls{dc} power consumption.
The main advantage of collecting \gls{ac} power traces is that it is less intrusive than capturing the \gls{dc} power consumption and offers the most transparent way to retrofit the proposed system into a network operation center.
One disadvantage, however, is its lower \gls{snr} compared to the \gls{dc} measurement.
The reason for this is the functionality of the \gls{ac}/\gls{dc} switching converter, which introduces a higher level of ``buffering'' of electrical energy, thus hiding some of the fine-grained details in the power consumption.
Recent work by Moreno~et~al.~\cite{Moreno2018} uses the power consumption of embedded systems for non-intrusive online runtime monitoring through reconstruction of the program's execution trace.
% (ALREADY SAID IN THE PREVIOUS PARAGRAPH) The power consumption's main advantage as a side-channel is that it is easily accessible from outside the network equipment and the hardware does not have to me modified, making the technique retrofittable into existing equipment. A drawback, however, is that this side-channel may require modification of the wiring, especially and more intrusively in the case of DC power.
Another potentially effective side-channel is the acoustic emanations from the electronics, usually ultrasound. Researchers have been successful in extracting full 4096-bit RSA decryption keys using these acoustic emanations~\cite{genkin2014rsa}. %new citations%
Faruque et al.~\cite{7479068} present an acoustic side channel attack to reconstruct the object that an additive manufacturing system, such as a 3D printer, is printing without access to the original design.
The main advantage of ultrasound over power consumption is its contactless measurement, using only a microphone placed near the device.
However, this technique requires precise placement of the microphone to achieve reproducible results.
Additionally, acoustic emissions from the environment (e.g., from the fans in the \gls{psu}) can interfere with the measurements, possibly reducing the effectiveness of this side-channel.
The operation of modern electronic devices also produces \gls{em} emanations. These emanations are correlated with the device's activity, making them an effective side-channel.
Nazari et al.~\cite{8192483} successfully used \gls{em} emissions to detect if program flow has deviated or if anomalous code is running.
The use of \gls{em} emissions allows for contactless measurement of the side-channel, even over longer distances than ultrasound.
Yet, the equipment necessary to measure high-frequency radiation like this is more expensive than for the other side-channels.
Moreover, network equipment is often located inside a metal case that shields \gls{em} waves, increasing the difficulty to obtain accurate measurements.
There are also other side-channels that we consider less effective in the context of our work. These include temperature, vibration, time required for logical operations, etc. These side-channels are often useful in the context of attacks that rely on statistical parameters of the measurements. For example, thermal-based attacks can extract RSA private keys from low-power CMOS microcontrollers \cite{hutter2013temperature}, or identify operations in neighbouring cores of multicore processors \cite{masti2015thermal}. However, overall temperature changes occur too slowly and would fail to offer any meaningful insight into the operation of embedded systems such as network switches.
% \subsection{Data Collection} \label{Data Collection}
% \begin{figure}[h]
% \centering
% \includegraphics[width=\columnwidth,height=5cm]{images/overview_eet}
% \caption{Overview of the Data Collection Setup}
% \label{fig:overview}
% \end{figure}
% A data acquisition pipeline developed in-house generates side-channel profiles that consist of emissions from different side-channels. These emissions exist as high-frequency time-series data that contain patterns corresponding to the response of the system triggered by stimuli.
% The case studies consider an HP Procurve 5406zl network switch as the \gls{sut}, and \gls{ac} and \gls{dc} power consumption together as the side-channel profile.
% The pipeline synchronizes all the emissions to minimize jitter, ensures completeness of emissions for all points in time, and automatically labels the data.
% This addresses one of the biggest problems in \gls{ml}: acquiring reliably labelled data~\cite{BARCHARD20131917,BARCHARD20111834,kozak2015,tu2015}.
% The data acquisition pipeline consists of three main components (see Figure \ref{fig:overview}):
% A Control Unit (Attack-PC), a System Under Test (network switch), and a Capture System (Digitizer-PC).
% User-defined Experiment Scenario files contain information about each experiment and include the type of attack, the number of attacks per iteration, and the number of iterations per experiment.
% When starting the pipeline, the Coordinator parses the given Experiment Scenario file and organizes the entire data collection process, including:
% \begin{itemize}
% \item Controlling the Attack-PC, which is responsible for generating stimuli that resemble real-world attacks on the network switch.
% \item Generating a metadata file that contains accurate labelling information for the attack during the experiment.
% \item Starting the Digitizer-PC that captures the side-channel emissions.
% \item Repeating this process for a given number of iterations after successful storage of the captured data.
% \end{itemize}
% Shunt resistors and differential amplifiers equip the \gls{psu}'s wiring to measure the power consumption of the switch. The current through the shunt resistor creates a proportional voltage drop which the differential amplifier amplifies and passes to the \gls{adc} in the Digitizer-PC.
% The \gls{adc}'s capture rate is \numprint[MHz]{1} for all experiments. The pipeline collects both AC and DC power traces in this fashion.
% Discuss different side channels that can be obtained from network equipment
%% Structure:
%% Define the side channel (1 sentence)
%% Cite another source that uses the side channel for something (1 sentence)
%% Discuss advantages or disadvantages of using the side channel for monitoring network equipment (2-3 sentences)
%% Verdict of whether we include the side channel
% Power
% + easily accessible from outside the network equipment
% + comes in two components
% RF
% + contactless
% - devices can be shielded
% - expensive
% ultrasound
% + Related work shows good results for attacks
% - requires precise placement of the probe
% - requires high control of the environment, so it's not scalable
%However, there are a few subtleties in the design of the circuits that the following subsections explore.
% \subsubsection{\textbf{AC Power Tracing}}\SF{This is not needed; remove it}
% A shunt resistor and a differential amplifier make up the power tracing board.
% This board sits between the switch's \gls{psu} and the power socket and measures the \gls{ac} power consumption of the network switch.
% \subsubsection{\textbf{DC Power Tracing}}\SF{this is not needed; you can bring the observations into the discussion section, but otherwise remove this section}
% The implementation of the \gls{dc} capture circuitry was more invasive than the one for AC.
% To perform measurements on the \gls{dc} side, a shunt resistor had to be integrated into the \gls{psu}'s wiring.
% The resistor's two terminals connect to a coaxial power connector that is then used to connect to another custom-built differential amplifier board.
% Due to the lower voltage on the DC side, a higher current flows through the wires which may lead to overheating issues on the resistor.
% Including the shunt resistor in the power supply allows for usage of the network switch's internal heat sinking capabilities, thus dealing with the overheating issue.
% \section{Experiments}
% The case study involves 3 different families of experiments including 2 experiments each. Each experiment uses Machine Learning algorithms to analyze the power consumption of the network switch. The switch was never taken offline while tapping into its emissions. The experiments relate to either assessing integrity of the switch or deploying run-time monitoring systems for the switch.
\section{Related Work} \label{Related Work}
The idea of side-channel based IDS traces back to the seminal work in side-channel analysis by Paul C. Kocher.
He introduced Differential Power Analysis to find secret keys used by cryptographic protocols in tamper resistant devices~\cite{kocher1999differential}.
This led to a field of research focussing on side-channel analysis that has been ever growing. Power analysis is the most common and widely studied side-channel analysis technique~\cite{brier2004correlation,mangard2008power}. %new citations%
Cagalj et al.~\cite{vcagalj2014timing} show a successful passive side-channel timing attack on U.S. patent Mod 10 method and Hopper-Blum (HB) protocol.
Quisquater et al.~\cite{quisquater2002automatic} present an approach to identify executed instructions with the use of self-organizing maps, power analysis and analysis of electromagnetic traces. %new citations%
Zhai et al.~\cite{zhai2015method} propose a self-organizing maps approach that uses features extracted from an embedded processor to detect abnormal behavior in embedded devices.
Eisenbarth et al.~\cite{eisenbarth2010building} propose a methodology for recovering the instruction flow of microcontrollers using its power consumption.
Goldack et al.~\cite{goldack2008side} propose a solution to identify individual instructions on a PIC microcontroller through mapping each instruction type to a power consumption template.
However, side-channel analysis, traditionally focused on attacks, can offer non-intrusive run-time monitoring as well. \\
\indent
Literature shows promising work in assessing integrity through cache monitoring~\cite{7163050} and power monitoring~\cite{10.1145/2976749.2978299}.
Works by Moreno et al. offer two building blocks for this work.
In~\cite{moreno2013non}, the team proposes a solution for non-intrusive debugging and program tracing using side-channel analysis.
In this work, they use the power consumption of a given embedded system to identify the code block the embedded system was executing at the time.
The team builds on their previous technique and presents a new one~\cite{Moreno2018} using the power consumption of embedded systems for non-intrusive online run-time monitoring through anomaly detection.
They use a signals and systems analysis approach to identify anomalies using the power consumption of a system and showcase this by identifying buffer overflow attacks on their system.
Msgna et al.~\cite{msgna2014verifying} propose a technique for using the instruction-level power consumption of a system to verify the integrity of the software components of a system with no prior knowledge of the software code.
In~\cite{kur2009improving}, Kur et al. perform power analysis of smart cards based on the JavaCard platform to identify vulnerable operations and obtain bytecode instruction information, and also propose a framework to replace vulnerable operations with safe alternatives.\\
\indent
In more recent literature, there is a trend towards the use of \gls{ml} for side-channel analysis to enhance the security of systems.
Michele Giovanni Calvi~\cite{calvi2019runtime} offers a solution for run-time monitoring of an entire cyberphysical system treated as a black box.
They collect data from a self-driving car during operations such as steering and acceleration.
Using this data, they train a Long Short-Term Memory~\cite{hochreiter1997long} deep learning model and use it to verify the safety of the vehicle. %new citations%
Zhengbing et al. \cite{4488501} suggest the use of forensic techniques for profiling user behaviour to detect intrusions and propose an intelligent lightweight \gls{ids}. Hanilçi et al.~\cite{hanilci2011recognition} use recorded speech from a cell phone to ascertain the cell phone brand and model through using vector quantization and \gls{svm} models on the \gls{mfcc} of the audio.
In~\cite{khan2019malware} Khan et al. propose a technique to identify malware in critical embedded and cyberphysical systems using \gls{em} side channel signals.
Their technique uses deep learning on EM emanation to model the behavior of an uncompromised system.
The system flags an activity as anomalous when the emanations differ from the normal ones used to train the neural network.
Sehatbakhsh et al.~\cite{sehatbakhsh2019remote} also use EM emanations and detect malware code injection into a known application without any prior knowledge of the malware signature.
They use the HDBSCAN clustering method to identify anomalous behavior exhibited by the malicious code.
Yilmaz et al.~\cite{yilmaz2019detecting} implement K-Nearest Neighbors clustering methods along with PCA dimensionality reduction method to model EM emanations from a phone with the different operational status of front/rear camera.
Using the ML methods, they can determine the state of cellphone cameras. \\
\indent
The work that this paper proposes builds on top of the aforementioned works. An HP network switch, treated as a black box, generates side-channel leaks in the form of its power consumption.
The experiments treat this power consumption as an output of the system when the inputs are certain attacks/stimuli that trigger the switch. The data train \gls{ml} models which, in turn, successfully identify the attacks/stimuli on the switch.
\section{Experiment Family I: Firmware Manipulation} \label{Firmware}
Embedded systems need regular firmware updates for a range of reasons such as addition of features or security patches.
Attacks on these systems commonly target the firmware update process~\cite{hau_2015}.
A successful attack could compromise the integrity of the switch and the data center.
The ability to modify the firmware enables attackers to perform a range of other attacks, such as Communication Channel Manipulation [CAPEC 216], Protocol Manipulation [CAPEC 272], Functionality Bypass [CAPEC 554], and Software Integrity Attack [CAPEC 184].
The following two experiments were conducted with ten different official firmware versions using the same device configuration.
Starting from the pre-installed version K.15.06.008 we performed upgrades to the next 10 higher release versions (K.15.07 to K.15.17), and picked the final build for each release.
Firmware downgrades were put aside to avoid bricking the device.
\subsection{Classifying Firmware Versions} \label{Classifying-Firmware-Versions}
Given a power trace during boot-up, the goal of this experiment is to
predict which firmware from a given set of ten different versions is currently installed on the device. The result can be used to confirm successful firmware updates and to check whether the device reports the correct version.
\subsubsection{\textbf{Feature Engineering}} \label{FE-Firmware}
With the HP Procurve Switch 5406zl taking around 120 seconds to complete its boot-up sequence, this experiment family produces the largest datasets of this case study.
At a sampling rate of \numprint[MHz]{1}, each dataset consists of \numprint{120e6} datapoints.
With a file size of two times 240 MB (one file for AC, one file for DC) per run, the \gls{ml} algorithms for this experiment would require more processing power than for any of the other performed experiments.
\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{images/Firmware_Comparison_TD_direct.eps}
\caption{Median-filtered (i.e. smoothed) power traces of boot-up sequences for two different firmware versions (ten captures each). At around 70 seconds, there is a visible difference in the time series.}
\label{fig:eet-samples}
\end{figure}
Therefore, several preprocessing steps were applied to reduce the size of the datasets and remove noise.
It was found that a combination of downsampling and a sliding median filter yields the best results at a minimal size per training set. Given a power trace with a length of \numprint{120e6} datapoints, downsampling with a factor of (in total) \numprint{1e6} results in a sample size of 120 and provides an overall accuracy of \numprint[\%]{99} for this experiment.
A sliding median filter, which is applied between two rounds of downsampling, replaces each value with the median value of the window.
This process enables training accurate machine-learning models (cf. Table \ref{tab:fw-change-fd-precision-comparison}) with less than \numprint{1000} training sets, each consisting of 120 datapoints.
\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{images/psd.eps}
\caption{PSD of power traces of boot-up sequences for two different firmware versions (two traces for each version)}
\label{fig:eet-psd}
\end{figure}
The model in frequency domain uses the \gls{psd}~\cite{1536928} of DC data. Before preprocessing, each trace consists of \numprint{120e6} datapoints, recorded at a sampling rate of \numprint[MHz]{1}. Visual inspections and analysis suggest that the patterns are more distinguishable between samples \numprint{70e6} and \numprint{120e6}; based on this, preprocessing removes the first \numprint{70e6} samples. Afterwards, decimating~\cite{1456237} the data with a factor of 100 filters some of the noise and decreases the time required to calculate the \gls{psd}. Calculating the \gls{psd} of the data follows the preceding operations.
Figure~\ref{fig:eet-psd} shows an example \gls{psd} for two different firmware versions where different patterns are observable. The visual inspection of the \gls{psd} also shows that selecting different frequency ranges can improve or worsen the accuracy of the model. The visible patterns on the \gls{psd} plots indicate that selecting all of the data points from the \gls{psd} helps increase the accuracy of the model when correlated with the results.
\subsubsection{\textbf{Results}}
%\paragraph{\textbf{Time Domain}}
The \gls{rfc} delivers the best results of all tested \gls{ml} algorithms.
A \gls{rfc} model trained on 786 samples achieved an accuracy of over \numprint[\%]{99} on an independently collected set of \gls{dc} data.
%\paragraph{\textbf{Frequency Domain}}
Among the various \gls{ml} models trained on frequency-domain data, the \gls{rfc} model has the best results with \numprint[\%]{99} accuracy. The \gls{rfc} model, when tested on an independently collected validation set, presents the same results, verifying the integrity of the model. The details of the trained models and their performances can be found in Table~\ref{tab:fw-change-fd-precision-comparison}.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{YYYYY}
\toprule
\textbf{Model} & \textbf{Macro Precision} & \textbf{Macro Recall} & \textbf{Macro F1 Score} & \textbf{Accuracy} \tabularnewline
\midrule
& \multicolumn{3}{>{\hsize=\dimexpr3\hsize+3\tabcolsep+\arrayrulewidth\relax}c}{\textbf{Time Domain DC Data}} & \tabularnewline
\midrule
\gls{rfc} & \numprint[\%]{100} & \numprint[\%]{100} & \numprint[\%]{100} & \numprint[\%]{100} \tabularnewline
\gls{svm} & \numprint[\%]{97.0} & \numprint[\%]{97.4} & \numprint[\%]{96.8} & \numprint[\%]{99.3}\tabularnewline
\midrule
& \multicolumn{3}{>{\hsize=\dimexpr3\hsize+3\tabcolsep+\arrayrulewidth\relax}c}{\textbf{Time Domain AC Data}} & \tabularnewline
\midrule
\gls{rfc} & \numprint[\%]{90.0} & \numprint[\%]{93.7} & \numprint[\%]{87.4} & \numprint[\%]{98.9} \tabularnewline
\gls{svm} & \numprint[\%]{80.7} & \numprint[\%]{75.1} & \numprint[\%]{75.8} & \numprint[\%]{95.5} \tabularnewline
\midrule
& \multicolumn{3}{>{\hsize=\dimexpr3\hsize+3\tabcolsep+\arrayrulewidth\relax}c}{\textbf{Frequency Domain DC Data}} & \tabularnewline
\midrule
\gls{rfc} & \numprint[\%]{97.0} & \numprint[\%]{96.5} & \numprint[\%]{97.6} & \numprint[\%]{99.8} \tabularnewline
SVM & \numprint[\%]{95.5} & \numprint[\%]{96.5} & \numprint[\%]{95.3} & \numprint[\%]{96.0} \tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Comparison between the different algorithms for firmware classification on an independent verification set of size 100}
\label{tab:fw-change-fd-precision-comparison}
\end{table}
\subsection{Detecting Firmware Change}
Given the most recently collected power trace during boot-up and the power trace collected one before it, the goal of Experiment 2 is to predict whether the firmware has been altered between these two traces.
The model uses \gls{dtw} and a training procedure on the collected traces, which implements a distance value as a parameter to the model to provide a decision, whether there is a change in the firmware version.
\subsubsection{\textbf{Feature Engineering}}
Measuring the distance with \gls{dtw} is a computationally expensive operation~\cite{10.5555/645803.669511}.
The expensive computation requirements can be overcome by reducing the number of points used while calculating the difference with \gls{dtw}. To decrease the computing times, the power trace gets downsampled. Applying a sliding median filter helps reduce the noise in the trace.
After these operations, all of the samples contain \numprint{120} points. The samples are pseudo-randomly split into training, test and validation sets.
Section \ref{Classifying-Firmware-Versions} provides a more detailed description of the preprocessing steps, which are the same for this experiment.
\subsubsection{\textbf{Results}}
Results obtained from the firmware classification model (cf. Section \ref{Classifying-Firmware-Versions}) suggest that it is possible to verify the firmware version on the system.
The model design from the firmware classification experiment provides useful insight to determine whether there is a change in the firmware or not as the labels are known for each firmware.
The model uses a windowed \gls{dtw} to compute the distance between the current and the previous power traces collected during the boot-up.
The distance that results from \gls{dtw} is then subjected to a comparison with the model parameters.
The model has the parameter $D_{\max}$ (maximum distance).
Optimizing the parameter involves training on the data collected for the firmware classification experiment.
%In this context, the threshold $k$ for a class $j$, $k_j$, is defined as the increase of the distance on the maximum distance, $d_{max}$ calculated for the same class, percentage-wise.\SF{this sentence is unclear, a comma appears out of nowhere and the rest is a fragment}
Given a pseudo-random sample trace of Class $j$ from the training set, the selected sample acts as the baseline for the Class $j$ model.
\gls{dtw} in the model calculates the distance between the baseline and all the samples in the training set.
The results determine the parameters of the model.
The maximum distance is defined as ($d_{j_{\max}}$) and the variance of the distances of class $j$ as $\sigma_j$.
The following example illustrates the described process:
For example, Class 1 has a maximum distance of $d_{1_{\max}}$ when the \gls{dtw} is computed over the traces belonging to the same class.
A non-generalized, specific to each firmware version, instance of the model is the following.
Given the firmware sample $a$ belonging to Class $y$ collected during the previous boot-up and the current firmware sample $b$ belonging to any class, the model can provide the decision whether there is a change in the firmware or not.
The resulting decision is $1$ if there is a change in the firmware version, and $0$ otherwise.
\begin{equation}
\text{decision} = \begin{cases}
0 & \text{if } d_{y_{\max}}\,\sigma_j \geqslant \mathrm{DTW}(a,b)\,(1+\sigma_j)\\
1 & \text{if } d_{y_{\max}}\,\sigma_j < \mathrm{DTW}(a,b)\,(1+\sigma_j)
\end{cases}
\end{equation}
The equation shows the possible cases when the model makes the decision. The equivalence case provides the decision of $0$ as the maximum distance for the class is already observed and considered valid, verifying there is no change in the firmware version.
%The decisions from the model when compared to the ground truth labels that indicate whether the firmware has changed.\SF{no single sentence paragraphs}
The above steps describe the training procedure to produce models for each class. Instead of requiring another parameter in the model that uses the class information of each sample, it is possible to remove that parameter by introducing the parameter $D_{\max}$. This parameter is the mean of the parameter $d_{n_{\max}}$ across all the class models. Taking the average instead of the maximum of $d_{n_{\max}}$ is valid because the distance results obtained from \gls{dtw} are roughly similar across all classes and any bias that may occur towards a single class is removed from the model.
The equation to calculate the parameter and the generalized value of variance is the following:
\begin{align}
D_{\max} &= \frac{1}{n} \sum_{i=1}^n d_{i_{\max}} \\
\sigma_{\mathrm{all}} &= \frac{1}{n} \sum_{i=1}^n \sigma_{i}
\end{align}
where $n$ denotes the number of classes.
The general model uses $D_{\max}$, as follows:
\begin{equation}
\text{decision} = \begin{cases}
0 & \text{if } D_{\max}\,\sigma_{\mathrm{all}} \geqslant \mathrm{DTW}(a,b)\,(1+\sigma_{\mathrm{all}})\\
1 & \text{if } D_{\max}\,\sigma_{\mathrm{all}} < \mathrm{DTW}(a,b)\,(1+\sigma_{\mathrm{all}})
\end{cases}
\end{equation}
where $a$ and $b$ denote two boot-up samples.
The equivalence case denotes that there is no change in the firmware.
Because $D_{\max}$ is the average of all $d_{j_{\max}}$ values, it falls into the range of observed values.
Training and test results indicate that the model achieves \numprint[\%]{99} accuracy when the $D_{\max}$ is \numprint{27.16}.
The test data have been collected under the same conditions as the training data and include firmware versions that are present during the training process as well as firmware versions that are not. The test data have never been subjected to the training process; the evaluation applies the above notation with the parameters that were set during training.
Based on the model accuracy a generalization of the model is possible with the introduced $D_{\max}$ without requiring any input from the firmware classification model.
%% \subsubsection{\textbf{Limitations}}\SF{this goes into the discussion section}
%%The similarity score can be very high\SF{no qualitative statements unless you provide data}, if the said change is a firmware upgrade where the both firmware versions have similar power traces. The model may give false positive results based on this limitation. However this limitation can be overcome by further improvements on the implementation and usage of the model.\SF{rephrase this paragraph. It's too verbose without lots of content. The three sentences can be combined into a single one.}
%%Another limitation is that\SF{start of sentence to here is just filler; try to be more concise} the model will detect changes based on a threshold. Therefore, the threshold provided to the model must be fine-tuned\SF{passive voice} and this involves a training step.\SF{state that due to the simplicity of the approach, this fine tuning and training can be accomplished online (without using passive voice)}
\section{Experiment Family II: Run-Time Monitoring} \label{RunTime}
Secure Shell (SSH) is a cryptographic protocol, formalized by the \gls{ietf} in 2006, that allows users to securely access a remote device even if the network is insecure. All systems that enable SSH access usually maintain logs of SSH login attempts. These logs offer details about SSH login attempts on the system. However, maintaining a log of the login attempt history proves futile since an attacker with control of the system can forge these log entries. Since side-channel \gls{ids} only focus on external properties and are independent of the system they monitor, they can defend against an attacker forging log entries. The \gls{ids} also offers defence against attacks such as Identity Spoofing [CAPEC 151], API Manipulation [CAPEC 113], Brute Force [CAPEC 112], Fuzzing [CAPEC 28], and Excavation [CAPEC 116].
% The DAQ collects the dataset at 1MS/s for 50 seconds (5,000,000 data points) per trace with 60 traces for successful SSH attempts and 60 for unsuccessful. \
% Our power trace is a time series that we can define as
% \[T_{1} = X_{1}, X_{2}, ..... , X_{N}\]
% \[X_{i} \in R\] where N is the total number of data points in the time series.
% The DAQ also automates generation of the metadata file that contains the labels. For each SSH attempt, the metadata file reports its start and end times. The label information and power consumption are two separate time series. The labels exist as a discrete time-series consisting of 0s or 1s representing no SSH attempt or an SSH attempt. When there is an SSH attempt, the power consumption time series shows a spike and the labelling time series outputs 1s. Thus, the labelling time series can be formalized as:
% \[T_{2} = Y_{1}, Y_{2}, ..... , Y_{N}\]
% \[Y_{i} \in [0, 1]\]
% Figure \ref{fig:ssh_overview} shows time series of 7500 datapoints in time domain along with its labels. During feature engineering, windows of 500 datapoints are chosen as samples. Because of the decimation, the data corresponds 1 millisecond to 1 datapoint. If a window contains only datapoints representative of SSH attempts, the window is labelled 1, otherwise, it is labelled 0. There is a delay of 2 seconds between each SSH attempt. A sliding window method extracts training samples for the \gls{ml} models. The sliding window method included window size of 500 datapoints and a step size of 250 datapoints. Smaller window sizes ensure more windows where all datapoints correlate to SSH activity. Thus, a sample can be formalized as:
% \[S = X_{1}, X_{2}, .... , X_{500}\]
% \[S \subset T_{1}\]
% If ${S \in [1]^{500}}$ then the feature is indicative of an SSH attempt otherwise the feature indicates no SSH attempt.
% We can represent this as a matrix of the features
% \[Z = [S_{1}, S_{2}, ... , S_{L}]\]
% \[Y_{Z} \in [0,1]^{L}\]
% All our feature engineering till this point, have been in the time domain. However, we also experimented with converting our time-domain based features into the frequency domain by running a Fourier Transformation on the features. While visualizing the FFTs of our samples (Fig. 2), we can see that the current consumption of the switch during an SSH attempt looks very different from the current consumption when the switch is idle. We explore our results from the time domain and frequency domain in the results section.
\subsection{Detecting SSH Login Attempts}
\label{detect_ssh}
This experiment aims to identify instances of SSH login attempts in the power trace collected from a network switch during its regular operation. We define regular operation as the state after bootup where all the ports and services of the network switch are functioning and the switch is available on the network for remote access.
\subsubsection{\textbf{Feature Engineering}} The signal collected from the network switch is a time series $T_1 \triangleq \{x \in \mathbb{R}\}$ with uniformly sampled values $x$ at a frequency of \numprint[MHz]{1}. This experiment downsamples the data by a factor of \numprint{1000} which results in 1 sample per millisecond. Although this leads to loss of some data, it makes the feature space smaller and avoids the curse of dimensionality~\cite{theodoridis2009pattern, 4766926} during training. Each sample has a corresponding label that is either 1 (\gls{ssh} login attempt) or 0 (no \gls{ssh} attempt). The labels can be represented as $T_2 \triangleq \{y \in \{0,1\}\}$.
SSH login attempts show discernible patterns in the power traces collected. There is a visible spike in power consumption during each login attempt. Figure~\ref{fig:ssh_time_window} shows roughly \numprint{14000} datapoints in the time domain along with its labels. The start time of the capture along with the markers for start and end times of the individual \gls{ssh} login attempt allows the calculation of the labels.
The data acquisition process saves these timestamps while capturing the power traces. To create training samples for the \gls{ml} algorithms, a sliding window of \numprint{500} datapoints and step size of \numprint{250} datapoints divides the power trace into multiple samples with $S \triangleq \{ x \in \mathbb{R}\}$ with $|S| = 500$ and $S \subseteq T_1$.
Every datapoint in the sample is a feature for the model. If ${S \in [1]^{500}}$ then the sample is indicative of an SSH attempt otherwise the feature indicates no SSH attempt. A matrix representation of $Z = \{ S_{1}, S_{2}, ... , S_{L}\}$ with rows of $S$ and $\forall i,j: |S_i|=|S_j|$, and the accompanying set of labels $Y_{Z} = \{ y_Z \in \{0,1\}^{L}\}$ where $L$ is the total number of samples.
\begin{figure}[htp]
\centering
\includegraphics[width=\linewidth]{images/time_domain_ssh.eps}
\includegraphics[width=\linewidth]{images/time_domain_ssh_labels.eps}
\caption{Downsampled and scaled DC power traces during a sequence of SSH login attempts (top figure) and the corresponding labels (bottom figure)}
\label{fig:ssh_time_window}
\end{figure}
The samples created while applying the sliding window to the power trace exist in the time domain. Application of the \gls{fft} converts the data from the time domain to the frequency domain. The \gls{fft} calculates the frequency spectrum for windows of 500 features. Each spectrum is labelled 0 or 1 corresponding to its original label from the time domain.
% \begin{figure}[htp]
% \centering
% \includegraphics[width=\linewidth]{images/ssh_fft.eps}
% \caption{Spectrum of an SSH login attempt window and an idle window in frequency domain.}\JD{Fix or remove}
% \label{fig:ssh_fft_comparison}
% \end{figure}
\subsubsection{\textbf{Results}}
A test set with \numprint{4095} samples consisting of \numprint{500} features each led to the results in Table \ref{tab:ssh-precision-comparison}. The feature engineering step extracts these samples from 20 power traces (each 50 seconds long). In total, there were 120 power traces and the model trained over 85 of them and validated over 15. \gls{ssh} attempts comprised \numprint[\%]{30} of the data, and the rest represented the idle behaviour of the system. The skew in the dataset makes the model more certain while predicting a positive class and helps lower the number of false positives.
The \gls{svm} model trained on data in time-domain using the Gaussian Kernel configured with $C = 1$ and $\gamma = 0.1$ achieved an accuracy of \numprint[\%]{98}. \gls{rfc}, configured with 500 trees and a maximum depth of 50, performed equally well and achieved an accuracy of \numprint[\%]{97}, also on time-domain.
The models trained on data in frequency domain were not as promising as they were in time domain. \gls{1dcnn} model had the highest accuracy with an accuracy of \numprint[\%]{94}. The \gls{svm} model did not converge while training on data in frequency domain.
Lastly, a \gls{1dcnn} trained on a mix of data from both time and frequency domain achieves an accuracy rate of \numprint[\%]{95} and minimizes \gls{fpr} to \numprint[\%]{1}, however, it has the highest \gls{fnr}.
Thus, \gls{svm} had the best accuracy rates along with the lowest \gls{fnr} and the second lowest \gls{fpr}. \gls{rfc} trained on time-domain data, on the other hand, has the lowest \gls{fpr} but has a much higher \gls{fnr}. Low \gls{fpr} is more important than \gls{fnr} during log verification/auditing because a system can always detect an \gls{ssh} login on a subsequent attempt even if it misses one. However, a high \gls{fpr} would flag the system incessantly and be costly to the system administrator. Thus, \gls{svm} would be the choice of algorithm to implement this experiment because of its high accuracy rates and low \gls{fpr} and \gls{fnr}.
The \gls{svm} model requires a mean time of 763ms ($\sigma$=25ms) while \gls{rfc} requires a mean time of 469ms ($\sigma$=2.9ms) per prediction. The final model size for both the algorithms was 380MB. With a sub-second prediction time, a relatively small model size, and high precision rates, the techniques behind these models can offer effective runtime monitoring for network switches and other embedded systems.
% \begin{figure}[htb]
% \centering
% \includegraphics[width=8cm]{images/time-cnn.png}
% \caption{1D CNN with Time-Domain EET samples}
% \label{fig:1d-cnn-TD}
% \end{figure}
% \begin{figure}[htb]
% \centering
% \includegraphics[width=8cm]{images/frequency-cnn.png}
% \caption{1D CNN with Frequency-Domain EET samples}
% \label{fig:1d-cnn-FD}
% \end{figure}
% \begin{figure}[ht]
% \centering
% \includegraphics[width=8cm]{images/merged-cnn.png}
% \caption{1D CNN with both Frequency-Domain and Time-Domain EET samples}
% \label{fig:1d-cnn-combined}
% \end{figure}
Table \ref{tab:ssh-precision-comparison} presents the results of all the algorithms used on data across all domains.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{YYYYYYY}
\toprule
\textbf{Model} & \textbf{Precision} & \textbf{Recall} & \textbf{F1 Score} & \textbf{Accuracy} & \textbf{FPR} & \textbf{FNR} \tabularnewline
\midrule
& \multicolumn{5}{>{\hsize=\dimexpr5\hsize+5\tabcolsep+\arrayrulewidth\relax}Y}{\textbf{Time Domain}} & \tabularnewline
\midrule
\gls{rfc} & \numprint[\%]{95} & \numprint[\%]{97} & \numprint[\%]{95} & \numprint[\%]{97} & \numprint[\%]{0.6} & \numprint[\%]{14} \tabularnewline
SVM & \numprint[\%]{95} & \numprint[\%]{97} & \numprint[\%]{96} & \numprint[\%]{98} & \numprint[\%]{0.8} & \numprint[\%]{8} \tabularnewline
1D~CNN & \numprint[\%]{94} & \numprint[\%]{93} & \numprint[\%]{93} & \numprint[\%]{96} & \numprint[\%]{2} & \numprint[\%]{9} \tabularnewline
\midrule
& \multicolumn{5}{>{\hsize=\dimexpr5\hsize+5\tabcolsep+\arrayrulewidth\relax}Y}{\textbf{Frequency Domain}} & \tabularnewline
\midrule
\gls{rfc} & \numprint[\%]{89} & \numprint[\%]{67} & \numprint[\%]{72} &
\numprint[\%]{88} &
\numprint[\%]{12} &
\numprint[\%]{8} \tabularnewline
SVM & -- & -- & -- & -- & -- & -- \tabularnewline
1D~CNN &
\numprint[\%]{90} & \numprint[\%]{90} & \numprint[\%]{90} & \numprint[\%]{94} &
\numprint[\%]{3} &
\numprint[\%]{17} \tabularnewline
\midrule
& \multicolumn{5}{>{\hsize=\dimexpr5\hsize+5\tabcolsep+\arrayrulewidth\relax}Y}{\textbf{Time + Frequency Domain}} & \tabularnewline
\midrule
1D~CNN & \numprint[\%]{89} &
\numprint[\%]{95} &
\numprint[\%]{92} &
\numprint[\%]{95} &
\numprint[\%]{1} &
\numprint[\%]{20} \tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Comparison between the different algorithms for detecting SSH login attempts}
\label{tab:ssh-precision-comparison}
\end{table}
% \subsubsection{\textbf{Limitations}}
% The power consumption of the network switch\SF{is it a switch? do we always call it a switch?} can affect the model, if it\SF{what changes? the switch or the model?} were to change due to changes in the firmware. Collecting side-channel emissions over more firmware versions can address these issues.
% This\SF{what does "this" refer to?} will result in a more robust model that can accommodate and work with more variations in the experiment setup.
\subsection{Classifying SSH Login Attempts}
Given a window of power trace where there is an SSH login attempt, this experiment attempts to classify the login attempt as successful or unsuccessful.
\subsubsection{\textbf{Feature Engineering}}
This experiment builds on top of experiment \ref{detect_ssh} and classifies the \gls{ssh} login attempts detected as successful or failed. The experiment considers the data only in the time domain. The matrix representation for this experiment is a slight modification of the previous one: $Z = \{ S_{1}, S_{2}, \dots , S_{L}\}$ with rows of $S$ and $\forall i,j: |S_i|=|S_j|$, and the accompanying set of labels $Y_{Z} = \{ y_Z \in \{-1,1\}^{L}\}$ where $L$ is the total number of windows, $S$ is a window of \numprint{500} samples in the time domain, and all the windows correspond to either a successful or a failed SSH login attempt. Figure \ref{fig:ssh_time_classification} shows the difference between the DC power traces of a successful and a failed SSH login attempt.
\begin{figure}[htp]
\begin{center}
\includegraphics[width=\columnwidth]{images/ssh_class_2}
\end{center}
\caption{Downsampled DC power traces of a successful and failed SSH login attempt}
\label{fig:ssh_time_classification}
\end{figure}
\subsubsection{\textbf{Results}}
Models trained using \glspl{svm} and \gls{1dcnn} gave the best results for the classification along with the lowest \gls{fpr} and \gls{fnr}. Optimizing the parameters of the \gls{rfc} with 250 trees, \glspl{svm} with $C = 100$, $\gamma = 10$, and Gaussian Kernel, and \gls{1dcnn}, the accuracy score reached \numprint[\%]{96.7}, \numprint[\%]{98.5} and \numprint[\%]{98.6} respectively. Table \ref{tab:ssh-classification-precision-comparison} details all the results.
The experiment uses roughly 5000 samples extracted from experiment \ref{detect_ssh} that includes only successful and unsuccessful SSH attempts. 65\% of all the samples comprise the training set, 15\% contributes to the validation set, and the test set includes 20\% of all the samples. Testing is done over roughly 1000 samples of 500 features. The \gls{svm} model performed the best and had the lowest \gls{fpr} and \gls{fnr}. The model requires a mean time of 203 ms ($\sigma$=9 ms) per prediction and requires 184MB of storage space.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{YYYYYYY}
\toprule
\textbf{Model} & \textbf{Precision} & \textbf{Recall} & \textbf{F1 Score} & \textbf{Accuracy} & \textbf{FPR} & \textbf{FNR} \tabularnewline
\midrule
& \multicolumn{5}{>{\hsize=\dimexpr5\hsize+5\tabcolsep+\arrayrulewidth\relax}c}{\textbf{Time Domain}} & \tabularnewline
\midrule
\gls{rfc} & \numprint[\%]{97} & \numprint[\%]{97} & \numprint[\%]{97} & \numprint[\%]{96.7} & \numprint[\%]{12} & \numprint[\%]{8} \tabularnewline
SVM & \numprint[\%]{99} & \numprint[\%]{99} & \numprint[\%]{99} & \numprint[\%]{98.5} &
\numprint[\%]{1} &
\numprint[\%]{1.5} \tabularnewline
1D~CNN & \numprint[\%]{98.5} &
\numprint[\%]{98} & \numprint[\%]{98} & \numprint[\%]{98} & \numprint[\%]{1} & \numprint[\%]{2} \tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Comparison between the different algorithms for classifying SSH login attempts}
\label{tab:ssh-classification-precision-comparison}
\end{table}
% \subsubsection{\textbf{Limitations}}
% Along with limitations from the previous experiment, the feature engineering makes the assumption that the given input sample contains an SSH login attempt. If an input does not include an SSH attempt, it will still classify it as either a failed or a successful SSH attempt.
\section{Experiment Family III: Hardware Tampering} \label{Hardware}
The HP Procurve Switch 5406zl offers the on-the-fly installation of networking modules to modify the number of Ethernet ports available.
This capability exposes the switch to a Hardware Integrity Attack [CAPEC 440].
An attacker with physical access to the front panel of the network equipment could tamper with the modules and potentially install unauthorized ones.
Installing new modules could offer an attacker a way to gain access to the machine by leveraging a poor default configuration of the ports.
For example, on network equipment where the default configuration does not include a limit on the number of MAC addresses per port, installing an extension module could allow an attacker to perform a MAC Flood attack [CAPEC 125]. This attack consists of filling the MAC address table of the switch with new MAC addresses. When this table is full, the switch is forced to broadcast every frame to every port. This way, an attacker can receive traffic that they should not have access to \cite{7130435}.
Using this method, an attacker could gain illegitimate access without the need to reboot the system (necessary for firmware manipulation attacks).
Existing \glspl{ids} and security software do not yet offer functionality to detect the installation of unauthorized modules.
Hence, currently the only way to identify unauthorized hardware modification is through the use of the network equipment's involuntary emissions.
\subsection{Identifying Number of Expansion Modules}
\label{expe:hardware-1}
This experiment aims to identify the number of modules installed from a capture of \gls{ac} or \gls{dc} power consumption from the network equipment. In this experiment, there was no on-the-fly installation or removal of modules during the capture.
\subsubsection{\textbf{Feature Engineering}}
The impact of the installation or removal of a module is detectable in both \gls{dc} and \gls{ac} power consumption. These two types of emissions require different processing to extract the features that characterize the number of modules.
The installation or removal of an expansion module increases or decreases the average \gls{dc} power consumption of the device.
By analyzing \gls{dc} power consumption, it is then possible to identify the number of expansion modules installed at any time.
To create the training dataset, the preprocessing program extracted snippets of data randomly picked from \numprint{138} 20-second-long \gls{dc} power consumption traces. A snippet is an extract of the trace composed of consecutive data points. Each trace is 20 seconds long to avoid any outlier condition that, for a few seconds, could affect the average power consumption and cause biased training. Within each trace, the program picked 10 snippets of 5 values. These values for the number and length of snippets correspond to the minimum training time needed to achieve a \numprint[\%]{100} accuracy with a stratified 10-fold cross-validation setup with the data used in this experiment. The average value of each snippet is then computed. The final training dataset is a 1D array of shape $(\numprint{1380},1)$.
Expansion modules also have an impact on the pattern of \gls{ac} power consumption.
Each number of expansion modules will cause a different pattern in the fundamental \numprint[Hz]{60} wave of the \gls{ac} power consumption.
Those patterns depend only on the number of modules installed and not on which slots are used.
To create the training dataset, the preprocessing program extracted periods of the fundamental wave by detecting consecutive local minima in the trace. From each 20-second trace, the program extracts $N$ periods. Depending on the number $N$, the model achieved different results (see Table \ref{tab:periods_ac}).
The extracted periods of \numprint{3333} data points each (one period of the \numprint[Hz]{60} wave captured at 1~MSPS and decimated by 5) constitute the training set of shape $(\numprint{4320},\numprint{3333})$.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{cYYYYYY}
\toprule
Number of periods & 10 & 20 & 30 & 40 & 50 & 60 \tabularnewline
\midrule
Accuracy (\%) & \numprint{98.61}& \numprint{98.99}& \numprint{99.26}& \numprint{99.53}& \numprint{99.72}& \numprint{99.78}\tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Accuracy of the AC \gls{svm} model relative to the number of periods per trace}
\label{tab:periods_ac}
\end{table}
\subsubsection{\textbf{Results}}
Models applied to \gls{ac} and \gls{dc} data performed differently at identifying the correct number of modules installed.
The average \gls{dc} values measured in this experiment for each number of modules do not overlap (see Table~\ref{tab:clusters_dc}).
This makes it possible to create intervals containing only one type of label.
This property enables both \gls{svm} and \gls{knn} to perfectly classify the number of modules installed.
The \gls{svm} model trained with a linear kernel performed the same as the \gls{knn} model with $K=1$.
Both methods classify the traces with a \numprint[\%]{100} accuracy.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{cYYYYYY}
\toprule
Class & 1 & 2 & 3 & 4 & 5 & 6 \tabularnewline
\midrule
Average [mV]& \numprint{54.9}& \numprint{72.5}& \numprint{90.1}& \numprint{95.2}& \numprint{125}& \numprint{144}\tabularnewline
St.d [mV]& \numprint{0.037}& \numprint{0.12}& \numprint{0.028}& \numprint{0.16}& \numprint{0.031}& \numprint{0.045}\tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Average DC consumption for different numbers of modules installed with 200 points per class}
\label{tab:clusters_dc}
\end{table}
The \gls{ac} periods, even when following different patterns depending on the number of modules, remain similar at some points and do not present a separation as clear as the \gls{dc} averages. The \gls{svm} model was able to identify the number of modules installed with an accuracy of \numprint[\%]{99}.
\iffalse
\begin{figure}[h]
\centering
\includegraphics[width=0.9\columnwidth]{images/Hardware-modification/cluster_dc}
\caption{Average DC consumption for different numbers of modules installed with 200 points per number of modules}
\label{fig:clusters_dc}
\end{figure}
\fi
Results from Table \ref{tab:hardware-results} show that \gls{dc} data yields the best results with both approaches (\gls{svm} and \gls{knn}). These high accuracy and recall results stem from the clear and non-overlapping grouping of the average \gls{dc} consumption values. The results presented are produced with a stratified 10-fold cross-validation setup.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{YYYYY}
\toprule
\textbf{Input data} & \textbf{Model} & \textbf{Accuracy} & \textbf{Recall}\tabularnewline
\midrule
\gls{dc} & SVM & \numprint[\%]{100} & \numprint[\%]{100}\tabularnewline
\gls{dc} & KNN & \numprint[\%]{100} & \numprint[\%]{100}\tabularnewline
\gls{ac} & SVM & \numprint[\%]{99.5} & \numprint[\%]{99.45}\tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Comparison between the different models for hardware detection with a stratified 10-fold cross validation setup}
\label{tab:hardware-results}
\end{table}
\subsection{Detecting Installation or Removal of Expansion Modules}
For this experiment, the goal is to detect the installation or removal of an expansion module from a power capture from the network equipment. For this experiment, modules were installed or removed on-the-fly during the capture.
To achieve this goal, it is possible to leverage the method used in the previous experiment \ref{expe:hardware-1} and repeat the identification in regular intervals during operation. This is a different use case where the installation or removal occurs during the capture.
Any change in the number of expansion modules identified will be considered an attack on the hardware integrity of the device.
Figure~\ref{fig:installation-modules} shows the identification of the number of modules along the \gls{dc} capture. This detection uses \numprint{500} snippets of \numprint{20} data points. The figure illustrates the steps followed by the classification from the model. Each step corresponds to the installation of a module. The installation of a module does not trigger an instantaneous increase in the average consumption. For this reason, the predictions that follow the installation of a module can vary between two consecutive values. The average consumption and the predictions stabilize after a few seconds (around 10 seconds).
\begin{figure}[h]
\centering
\includegraphics[width=\columnwidth,height=0.46\columnwidth]{images/Hardware-modification/detect_change}
\caption{Identification of the number of modules and detection of an installation}
\label{fig:installation-modules}
\end{figure}
\section{Discussion} \label{Discussion}
\noindent
\textbf{Influence of Traffic on the Results:}
The data used for training the models did not include traffic and were collected in a laboratory environment. Because the production equipment is used by actual users, it is not possible to perform attacks that would disrupt the connection quality. Hence, flashing firmware is not possible because it requires rebooting the machine, \gls{ssh} attacks are not possible because they require disabling some security features, and hardware tampering is not possible because it requires physically disconnecting the users.
However, complementary experiments were conducted to verify whether traffic would have a significant impact on the results of the experiments. For Experiment Family I (section~\ref{Firmware}), the traffic cannot impact the results as there is no traffic possible during the boot-up sequence and the experiment uses only the boot-up sequences to perform the classification. For Experiment Family II (section~\ref{RunTime}) and III (section~\ref{Hardware}), we captured data containing real traffic (captures on the identical production switch) and simulated traffic (connections between multiple pairs of machines at around 1~Gbps in the laboratory environment). Traffic data does not show any significant impact on \gls{dc} or \gls{ac} in either the time or frequency domain. This can be explained by the fact that all the expansion modules consume power whether or not they have active connections; this property makes the detection of the number of modules installed possible, although it may not be the same for every piece of networking equipment. From these results, it is possible to conclude that traffic should not impact the results from the presented experiments.
\noindent
\textbf{Support for Small Datasets:} As presented in this paper, the trained models can successfully detect attacks executed on the network equipment.
Those results are especially interesting as the model training step relies on a small number of training samples to achieve near perfect accuracy scores. This is a success, because (1)~our models achieve similar accuracy as some of the most successful experiments involving \gls{ml}~\cite{chollet2017xception,szegedy2017inception,xie2017aggregated,deng2009imagenet} but (2)~use only a small sample size compared to image libraries with millions of image samples as training data.
Our experiments use a maximum of \numprint{1000} power trace samples.
The small number of training samples makes this approach adaptable to a range of different systems and domains because it solves the issue of collecting large amounts of data usually required to enable \gls{ml} approaches.
The models trained are relatively lightweight owing to the small number of samples along with the heavy downsampling performed on data for the experiments.
The lightweight nature of the models allows for fast online run-time monitoring and integrity assessment of embedded systems.
\noindent
\textbf{Computational requirements}
The machine used for performance measurement is a standard workstation equipped with \numprint[GB]{128} of RAM and an Intel Xeon E5-2630 v3 processor. This machine was also used for training. A substantially lower-powered machine will suffice for a deployment. The time and memory consumption were obtained with the \texttt{timeit} and \texttt{memit} commands in Python. The commands evaluate the time and memory needed to predict one sample. The time interval reported in the experiment starts at receiving a raw measurement and ends with a prediction based on the sample. The evaluation excludes the training of the model, since this is done offline. Measurements for the best performing models are reported in Table~\ref{tab:perfs}.
\begin{table}[ht]
\begin{center}
\begin{tabularx}{\columnwidth}{lYYY}
\toprule
& Mean Time [ms] & Standard Deviation [ms] & Peak Memory Usage [MB]\tabularnewline
\midrule
Experiment 1 (RF, DC) & \numprint{13.5} & \numprint{1.9} & \numprint{103}\tabularnewline
%Experiment 1 (SVM, DC) & \numprint{2.1} & \numprint{0.6} & \numprint{104}\tabularnewline
Experiment 1 (RF, \gls{psd}-DC) & \numprint{2.1} & \numprint{0.3} & \numprint{102}\tabularnewline
%Experiment 1 (SVM, \gls{psd}-DC) & \numprint{2.3} & \numprint{0.5} & \numprint{101}\tabularnewline
Experiment 2 (DTW) & \numprint{0.52} & \numprint{0.2} & \numprint{306}\tabularnewline
%Experiment 3 (RF, DC) & \numprint{469} & \numprint{2.9} & \numprint{380}\tabularnewline
Experiment 3 (SVM, DC) & \numprint{763} & \numprint{25.5} & \numprint{380}\tabularnewline
%Experiment 4 (RF, DC) & \numprint{741} & \numprint{33} & \numprint{182}\tabularnewline
Experiment 4 (SVM, DC) & \numprint{203} & \numprint{9} & \numprint{184}\tabularnewline
%Experiment 5 (\gls{ac}) & \numprint{175} & \numprint{24.1} & \numprint{240}\tabularnewline
Experiment 5 (\gls{dc}) & \numprint{264} & \numprint{13.7} & \numprint{353}\tabularnewline
\bottomrule
\end{tabularx}
\end{center}
\caption{Computation time and memory usage for the best performing models}
\label{tab:perfs}
\end{table}{}
%\noindent\textbf{}
\section{The Bigger Picture} \label{sec:big_picture}
The concepts and principles of what we showed in this paper are applicable to most embedded and real-time systems. As long as systems have recurring, well-defined behaviour, we can use side-channel analysis to identify behaviour patterns. These behaviour patterns are useful to create \gls{ids} for integrity assessment or runtime verification frameworks.
The set of side-channels is not necessarily static for the class of embedded systems. For some systems, ultrasound or even temperature might be a good channel to use in the \gls{ids} or runtime verification framework. In general, we believe that power consumption overall is a good channel with a strong preference on using DC measurements.
%This paper shows the suitability of side-channel based \gls{ids} to offer integrity assessment and run-time monitoring for only network switches, however, the principles and technique hold sound for all embedded systems. The data acquisition technique can extend to any embedded system and capture the systems power consumption.
%The \gls{dsp} methods and \gls{ml} algorithms can use the power consumption of other embedded systems in the same fashion as discussed in earlier sections. Different embedded systems might leak different side-channel emissions that can train \gls{ml} algorithms and offer another layer of protection. The principles of a side-channel based \gls{ids} is, thus, applicable to all embedded systems.
Side-channels produce measurable physical effects that are external to the system and thus enable monitoring without interference to the system under test. The external nature has advantages to the dependability of the monitoring for certified safety-critical systems. For example, a defect in the software of either the system under test or the monitor will not affect the other system. Furthermore, isolation of the security system has the potential to provide increased cybersecurity~\cite{ICISSP2017}.
%\CM{I strongly recommend to eliminate this whole paragraph. It is certainly an ``empty'' claim, and it makes it sound like we're desperate to make it look like we did a lot more than we're reporting (which, why would that ever be the case?)}
%We have experimented with side-channel based monitoring on a number of platforms besides the reported results. Tests included electronic control modules in vehicles, camera systems, Internet-of-Things platforms, and manufacturing systems. In all of them we found utility in monitoring side channels for runtime verification or intrusion detection.
%ttacker with access to the system cannot circumvent the side-channel based \gls{ids}, (ii)~A bug in the \gls{ids} cannot disrupt the system it monitors. The latter can be extremely beneficial for run-time monitoring and integrity assessment of embedded systems that constitute security critical infrastructure such as power grids, medical devices, etc. Human errors often cause bugs in programs that can potentially make systems and other programs using it vulnerable through attacks such as privilege escalation. A bug in an \gls{ids} hosted on a system can render the system insecure. This highlights the importance of a comprehensive external \gls{ids} hosted independent of the system it monitors, as is the case for the side-chanenl \gls{ids} that this paper proposes.
% Have a section outlining that this can be expanded to many other areas
% depend on the system, different side channels might be of interest, however, the basic concepts still hold
% an industry standard could help facilitate the proliferation of side-channel-based monitoring
% Mention that physical isolation has a twofold advantage, especially in the context of safety-critical systems: on the one hand, the system cannot affect (e.g., maliciously disable) the monitor; and on the other hand, the system is also immune to disruption caused by the operation of the monitor.
\section{Conclusion} \label{Conclusion}
This paper introduced a side-channel based \gls{ids} that offers a novel type of runtime monitoring and integrity assessment for network equipment. The specific attacks analyzed include hardware tampering, firmware manipulation, and log tampering. Our proposed \gls{ids} defends against these attacks by determining the system state and behaviour from the information emitted by the system's physical side-channels. The results show that the used methods achieve near perfect accuracy on all experiments with only a small training set. Overall, the introduced techniques provide a glimpse on a general concept that is extensible to other real-time and embedded systems. Future work can investigate additional side channels and how the interaction can even further reduce the required sample size and improve the accuracy.
\bibliography{bibliography}{}
\bibliographystyle{unsrt}
% You can push biographies down or up by placing
% a \vfill before or after them. The appropriate
% use of \vfill depends on what kind of text is
% on the last page and whether or not the columns
% are being equalized.
%\vfill
% Can be used to pull up biographies so that the bottom of the last one
% is flush with the other column.
%\enlargethispage{-5in}
% that's all folks
\end{document}