Talking After Lights Out: An Ad Hoc Network for Electric Grid Recovery
Jan Janak∗, Dana Chee†, Hema Retty‡, Artiom Baloian∗, Henning Schulzrinne∗
∗Department of Computer Science, Columbia University, USA
†Perspecta Labs, USA
‡FAST Labs, BAE Systems, USA
Email: [email protected], [email protected], [email protected], [email protected], [email protected]

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. government. Distribution statement A: approved for public release, distribution unlimited. Not export controlled per ES-FL-020821-0013.
Abstract—When the electric grid in a region suffers a major outage, e.g., after a catastrophic cyber attack, a "black start" may be required, where the grid is slowly restarted, carefully and incrementally adding generating capacity and demand. To ensure a safe and effective black start, the grid control center has to be able to communicate with field personnel and with supervisory control and data acquisition (SCADA) systems. Voice and text communication are particularly critical. As part of the Defense Advanced Research Projects Agency (DARPA) Rapid Attack Detection, Isolation, and Characterization Systems (RADICS) program, we designed, tested, and evaluated a self-configuring mesh network architecture and prototype called the Phoenix Secure Emergency Network (PhoenixSEN). PhoenixSEN is designed as a drop-in replacement for primary communication networks, combines existing and new technologies, can work with a variety of link-layer protocols, emphasizes manageability and auto-configuration, and provides a core set of services and applications for coordination of people and devices, including voice, text, and SCADA communication. The PhoenixSEN prototype was evaluated in the field through a series of DARPA-led exercises. The same system is also likely to support coordination of recovery efforts after large-scale natural disasters.
Index Terms—Ad hoc networks, network architecture, network security.
I. Introduction

Most electric power outages are locally contained, and recovery can rely on the public or utility-owned communications infrastructure to coordinate restoring and energizing parts of the electric grid. Large-scale electric power outages, a.k.a. blackouts, are rare but do happen [1], [2]. Recovering from a large-scale outage typically follows a special procedure known as black start. "A total or partial shutdown of the national electricity transmission system (NETS) is an unlikely event. However, if it happens, we are obliged to make sure there are contingency arrangements in place to ensure electricity supplies can be restored in a timely and orderly way. Black start is a procedure to recover from such a shutdown." [3]

In the United States (U.S.), the black start procedure is usually managed by regional transmission organizations (RTOs)
that coordinate several electric utilities. For example, PJM, a large RTO, describes its black start operation as follows: "Black Start capability is necessary to restore the PJM transmission system following a blackout. Black Start Service shall enable PJM, in collaboration with the Transmission Owners, to designate specific generators whose location and capabilities are required to re-energize the transmission system." [4]

Even if sufficient black start generating capability is available, a successful black start requires coordination of electricity supply and demand, typically by incrementally adding both generating capacity and load. Such coordination usually takes place either via phone calls to substation personnel, or via real-time control of supervisory control and data acquisition (SCADA) devices. Both cases require network connectivity. Grid operators often rely on internet service providers (ISPs) for network services [5]. If the ISPs are also impaired by the blackout, network connectivity may be difficult to guarantee. If the blackout is caused by a network-based cyber attack, the attacker may also attempt to actively thwart or delay power restoration, making a bad situation worse. The Defense Advanced Research Projects Agency (DARPA) has recognized the danger network-based cyber attacks represent for critical U.S. power grid infrastructure and launched the Rapid Attack Detection, Isolation, and Characterization Systems (RADICS) program [6]. The goal of the program is to create a set of tools that will aid the power distribution industry in recovering from a hypothetical large-scale blackout triggered by a network-based cyber attack.

In this paper, we present the design, prototype implementation, and experimental evaluation of the Phoenix Secure Emergency Network (PhoenixSEN), designed by BAE Systems and Columbia University as part of the DARPA RADICS tool set. PhoenixSEN consists of a hybrid, isolated, self-forming network and services specifically designed to enable the coordination of power restoration amidst an ongoing network-based cyber attack. It combines existing and new technologies, can work with a variety of link-layer protocols, and provides applications for rapid coordination of people and devices. The network is designed as a drop-in replacement for primary communication networks that are likely to be severely impaired during a large-scale blackout.

We begin by discussing the motivation and the problem being addressed by our work (and the DARPA RADICS program in general) in Section II. Section III presents a simplified model of the U.S. electrical grid with an emphasis on networking infrastructure. We then discuss the general architecture and features of the Phoenix network and node in Section IV. The subsequent sections describe various building blocks, applications, and services in detail: naming and service discovery (Section V), voice and chat support (Section VI), network monitoring (Section VII), and insider attack mitigation (Section VIII).
We conclude in Section X and discuss potential future work and the applicability of PhoenixSEN to other scenarios, e.g., natural disaster response.

II. Motivation & Problem Statement

Our work is primarily motivated by the need of the power distribution industry for backup network infrastructure that could be used to recover from a large-scale blackout caused by a network-based cyber attack [7]. We assume that the grid control SCADA devices, as well as the network infrastructure itself, may be compromised and could act maliciously. Therefore, the backup infrastructure should provide a means to reconnect healthy devices and keep those isolated from potentially compromised or malicious devices. This allows for an incremental approach where devices are connected to the temporary (isolated) network only after they have been inspected and deemed healthy.

Like most complex networked systems today, power grid infrastructure relies, at least partially, on networks managed by external ISPs for remote command, control, and coordination of both machines (SCADA) and operators (voice). It is likely that the network infrastructure itself will be severely affected in case of a large-scale blackout. This presents an interesting "chicken or the egg" dilemma. The network needs power to connect the grid, but the grid cannot operate without the network. Clearly, there needs to be infrastructure in place that will allow the grid to be temporarily self-sufficient, at least during the initial restart phase when the grid is not yet fully operational. We envision a mostly self-configuring network infrastructure that can take advantage of various existing link-layer technologies and can be deployed to geographically dispersed sites used by the power grid infrastructure after the blackout. We assume the personnel setting up the infrastructure will have a technical background, but not necessarily in computer or network engineering. The network would provide the minimum set of services and bandwidth necessary to black-start the power grid in a secure manner.

A recent series of large-scale blackouts illustrates that such events are not merely a hypothetical possibility. The 2019 blackout in Argentina, Uruguay, and Paraguay left 48 million people without power [1]. For comparison, the Boston-Washington metropolitan area has 50 million inhabitants. In 2019, a large-scale power outage in England affected more than a million homes and severely disrupted the public transit system [2], [8].

Network-based cyber attacks on critical power grid infrastructure can potentially have even more disastrous and long-lasting consequences, leaving tens of millions of people in large metropolitan areas without power for extended periods of time. An ongoing cyber attack on power grid systems may attempt to thwart any restart attempts, leaving a large number of people without power for days or weeks. Depending on the state of the power grid infrastructure and the sophistication of the attack, a full black-start recovery may also take a long time, e.g., days or weeks.

While some of the critical grid infrastructure may be temporarily powered by on-site backup diesel generators, the situation will get progressively worse as those generators begin malfunctioning or start running out of fuel. Communication networks stop working, potentially bringing down other critical services such as the Global Positioning System (GPS).
Water and gas systems, hospitals, nursing homes, and waste treatment facilities will soon begin shutting down, and transportation infrastructure can be severely affected. The situation gets progressively worse as more critical services shut down [9].

In a large-scale catastrophic scenario like this, with a substantial and prolonged disruption of electric power, time is of the essence. The operation of the electrical grid must be restored within a few days to prevent other critical facilities from shutting down. A full black-start recovery of the electric grid requires communication and coordination. However, communication networks are unlikely to remain operational after a substantial power disruption event.

While the main use case for the work presented in this paper is black-start recovery of the power grid, a similar architecture could also be used to create temporary isolated networks for emergency communications in various disaster relief scenarios, e.g., the hurricanes that ravaged Puerto Rico in 2017 [10]. Similar technology is being used by some disaster relief organizations such as the Red Cross [11].

III. Architectural Model of the U.S. Electrical Grid

The U.S. electrical grid is a complex, heterogeneous, geographically dispersed system that combines physical infrastructure for producing and delivering electric power with computer-based monitoring, management, and control. As essential infrastructure that has been evolving for more than a century and is subject to extensive government regulation, the grid has seen incremental upgrades and organic growth, resulting in considerable variability across geographical and political boundaries. The grid's architecture is moving from a model with a small number of vertically-integrated utility monopolies (a vertically-integrated utility controls all stages of the electric power supply chain, from generation to distribution among consumers) towards an interconnected model with a multitude of utility and non-utility companies coordinating to use shared transmission infrastructure. In the interconnected model, networking infrastructure plays an increasingly important role and is critical for reliable grid operation.

Fig. 1. The U.S. electrical grid evolves towards a more distributed model which requires an increasing amount of coordination, which in turn requires extensive data communications infrastructure. Modern-day grid infrastructure resembles a geographically dispersed cyber-physical system (CPS), where a computer program (EMS) manages the flow of electric power through the system via real-time SCADA-based remote instrumentation of physical processes. (a) Distributed grid model where electricity market participants use shared transmission infrastructure coordinated by a regional transmission operator (RTO) or independent system operator (ISO). (b) A detailed overview of the communications infrastructure required to keep the grid operational and reliable. The diagram shows a simplified architectural model; the existing grid exhibits considerable variety across geographic and political boundaries.

In the U.S., hundreds of companies participate in the production and transmission of electric power. To ensure reliable grid operation, the Federal Energy Regulatory Commission (FERC) has designated the North American Electric Reliability Corporation (NERC) to develop and enforce operational standards. Regional system operators coordinate the generation, transmission, and distribution of electric power. In some regions the system operator is affiliated with a particular utility company. In other regions the system operator is an independent entity known as the RTO that coordinates multiple utilities. Some regions instead have an independent system operator (ISO) with a similar role. The differences are subtle and beyond the scope of this paper. The RTO/ISO operates a wholesale electricity market, guarantees non-discriminatory access to shared grid infrastructure, and ensures reliable grid operation and compliance with NERC standards. The actual electric power generation, distribution, metering, and billing is provided by utility and non-utility companies coordinated by the RTO/ISO.

The major elements of an electric grid are the devices that produce and transmit electric power, information technology (IT), industrial control systems (ICSs), and the underlying network infrastructure. Since electric power is generated and consumed almost instantaneously, the grid must be coordinated to match power generation with demand in real time. Fig. 1 provides a simplified architectural model of the U.S. grid with a focus on the communications infrastructure.

NERC operational standards provide high-level guidance primarily aimed at ensuring reliable grid operation. The actual implementation details of power grid systems are left to the RTOs/ISOs and utilities. As a result, existing grid systems are heterogeneous and often use a multitude of devices that communicate with mutually incompatible protocols. Early substation automation devices communicated using proprietary protocols over industry-specific buses or serial links. Modern-day substation systems tend to use standardized process automation protocols and reuse existing wired and wireless network technologies.

The flow of electric power through the grid is managed by an energy management system (EMS) program. The primary purpose of the EMS is to keep the grid operational and reliable in response to varying conditions such as the available generator pool, transmission capacity, and instantaneous load. Ideally, a single instance of the EMS with the ability to remotely control critical grid devices would be provided by the RTO/ISO for the entire region.
In practice, individual transmission operators typically run their own EMS to monitor and protect their assets.

The EMS obtains the data about available generation resources and transmission capacity for its scheduling and planning algorithms from the Open Access Same-Time Information System (OASIS), a standardized web-based management system [12] that serves as an interface between electricity market participants, transmission providers, and the RTO/ISO. FERC requires that each RTO/ISO provide an OASIS node. Clients typically interact with the OASIS system by invoking Hypertext Transfer Protocol (HTTP) application programming interfaces (APIs) over the internet.

The operators of infrastructure deemed critical for the reliability of the grid by the RTO/ISO are required to provide the RTO/ISO with remote access to selected components for monitoring and control purposes. This is accomplished by integrating the RTO/ISO's and the operator's SCADA and synchrophasor subsystems over a redundant wide area network (WAN) provided for this purpose by the RTO/ISO. The data obtained from these subsystems is used by the EMS to build a global view of the state of the grid. Furthermore, the EMS can use SCADA to remotely control grid devices in the field, e.g., circuit breakers. Not all grid participants need to be SCADA-capable and connected to the RTO/ISO WAN. Smaller entities sometimes rely on the internet for all communication with the RTO/ISO.

From the previous paragraphs it follows that many different types of data networks are involved in managing the flow of electricity through the grid. The RTO/ISO operates a redundant WAN used to connect to the control centers of grid infrastructure providers critical for overall system reliability. The typical RTO/ISO WAN is a redundant Multiprotocol Label Switching (MPLS) network based on links leased from several external ISPs. The synchrophasor subsystem, if deemed critical for future grid systems, will most likely use a dedicated WAN with stricter latency and bandwidth guarantees.

Fig. 2. PhoenixSEN is a drop-in replacement for ISP networks used by utilities. The OLSR-based network consists of uniform Phoenix nodes deployed at substations and interconnected by a variety of links (long, short, fast, slow). Each utility is provided with an isolated virtual network spanning all its substations. (a) A Phoenix node at each substation connects SCADA and backend devices to a virtual network spanning all substations of the utility. Per-utility virtual networks share common physical infrastructure but are isolated from one another. Each Phoenix node helps route packets for other utilities. A dedicated forensic access port is provided on each node. (b) The Phoenix node consists of an Intel NUC with a uniform software installation and peripherals in a weather-resistant enclosure. One-time deployment configuration is performed via a memory card (distributed separately) or the included smartphone.

The use of the public switched telephone network (PSTN) for human-to-human voice communications between the RTO/ISO and utility personnel is required by NERC. Despite an overall increase in remote instrumentation capabilities across the electrical grid, human-to-human voice communication remains the most important communication modality in emergency situations, e.g., after a blackout. To meet NERC's strict reliability requirements, the phone system typically uses cellular or satellite phones as backup.

Each utility also operates a dedicated WAN spanning its (sometimes large) service area that connects the utility's control center with all substations. The typical utility WAN is Internet Protocol (IP) based and uses a combination of public (ISP-owned) and non-public (utility-owned) network infrastructure. A substation with connected devices (e.g., SCADA) must also provide a field area network (FAN). The FAN connects substation automation devices as well as any remote devices (metering, data collection) within the substation's service area, e.g., a neighborhood. Due to the large variety in deployed automation devices, the FAN is perhaps the most heterogeneous network type and is typically based on a combination of wired and wireless technologies. The public internet (not pictured) is typically used for other communication, e.g., to access the RTO/ISO's OASIS portal, or to transfer metering or billing information between the utility and its customers.

SCADA interactions between the RTO/ISO and utility control centers typically use standardized protocols such as the Distributed Network Protocol 3 (DNP3) [13], the Inter-Control Center Communications Protocol (ICCP) [14], or IEC 61850 [15] carried over TCP/IP. The SCADA subsystem is hierarchically organized: the RTO/ISO's master terminal unit (MTU) communicates with the MTUs at utility control centers, which in turn communicate with remote terminal units (RTUs) deployed at substations. The synchrophasor subsystem is based on a hierarchy of phasor data concentrators (PDCs) that aggregate and process IEEE C37.118 [16] data streams coming from phasor measurement units (PMUs) deployed across the grid.

IV. Phoenix Secure Emergency Network

PhoenixSEN is a self-configuring ad hoc network architecture designed to provide a drop-in replacement for the grid's primary (ISP) communication networks.
The network requires minimal deployment configuration and offers essential services for human-to-human (voice, text) and device-to-device (SCADA) coordination. A uniform hardware and software architecture allows rapid deployment from a storage facility to substations. Nodes are designed for compatibility with a variety of link technologies, e.g., radio, fiber, or powerline. Fig. 2a illustrates the overall network architecture.

PhoenixSEN consists of a designated control center (CC) and Phoenix nodes deployed to substations or relay points. The nodes are interconnected with short-distance and long-distance links. Some of the links can be provided by third parties, e.g., the National Guard. PhoenixSEN is designed to be deployed into geographic areas served by multiple utility companies. The network provides an isolated virtual network to each utility on top of shared physical infrastructure. The virtual network spans all of the utility's substations. Each Phoenix node connects its substation local area networks (LANs) to the appropriate virtual network and also routes packets for the virtual networks of other utilities. The CC provides additional equipment and services to facilitate network monitoring and management, and to support one-way broadcast communication across the deployment area.

Each Phoenix node provides multiple virtual LANs (VLANs) to its substation. Typically, there will be one VLAN for SCADA devices and another VLAN for IT (backend) systems. The addressing architecture of each VLAN is configurable, allowing it to match the original ISP network in order to minimize the need to reconfigure existing equipment.
Each Phoenix node also provides essential network services locally to enable communication within the substation even when disconnected from PhoenixSEN. These services include, among others, Domain Name System (DNS), Dynamic Host Configuration Protocol (DHCP), Network Time Protocol (NTP), and Voice over IP (VoIP) signaling.

The primary purpose of PhoenixSEN is to restore connectivity in a grid under network-based cyber attack. We assume the grid's devices (SCADA and IT) might be compromised and may contribute to the attack. For this reason, Phoenix nodes provide a dedicated access port for a forensic team and services through which portions of the network or individual devices can be isolated from the rest of the network. PhoenixSEN also comes with a built-in intrusion detection system (IDS) that attempts to automatically mitigate certain types of insider attacks.

Fig. 3. Phoenix node software architecture. An isolated network environment (grey) with all required services is created for each substation LAN. The environments that belong to the same utility are connected across PhoenixSEN.
A. Phoenix Node
The Phoenix node is designed to be deployed from a storage facility to substations by ground transportation or via airlift shortly after a blackout. All hardware comes in a weather-resistant enclosure which contains all essential components such as an Intel Next Unit of Computing (NUC) computer, Ethernet switches and cables, GPS, Wi-Fi access points, and VoIP clients. All Phoenix nodes use a uniform hardware and software architecture to simplify deployment and installation. Fig. 2b illustrates the hardware architecture, and Fig. 3 provides an overview of the software architecture. Fig. 9 shows a small-scale prototype built for field evaluations.

The Phoenix node is based on the Intel NUC small form-factor computer. The operating system (OS) is Ubuntu Linux 16.04 in a minimal configuration. All custom software for PhoenixSEN is pre-installed in the form of Docker containers. The containers are built from source code and are specifically designed to support re-building in the field without internet access, e.g., after modifications to fix critical bugs or vulnerabilities.

A newly deployed Phoenix node starts in a minimal configuration. In this state, the node starts only a deployment manager process and a management VLAN. Utility and substation specific services are not yet provided. The deployment manager provides a user interface (UI) for the substation crew to perform one-time deployment configuration, i.e., to enter the utility name and substation number. The management VLAN can be used by the CC for remote support, configuration, or maintenance.

The configuration for utility and substation specific services is generated by a custom configuration synthesis program from a model of the whole PhoenixSEN. The model contains parameters such as the number of utilities, the number of substations per utility, the desired IP addressing architecture, the types of VLANs and links, etc. In most cases the model can be created prior to PhoenixSEN deployment based on information collected from utilities in the deployment region. In that case, the generated configuration library can be pre-loaded onto each Phoenix node. If the model is unavailable prior to deployment, the synthesis can be performed at the CC and the generated configuration library can be distributed to substations on a Universal Serial Bus (USB) memory card, or the CC can run the synthesis program on Phoenix nodes remotely over the management network.

Upon receiving the utility name and substation number, the node selects the appropriate configuration from the configuration library and starts utility and substation specific services and VLANs. A fully configured Phoenix node runs one or more isolated virtual network environments. The purpose of a network environment is to emulate the primary ISP network that connected the substation to the internet before the blackout. Nodes with multiple links also serve as transparent routers for other utilities sharing the PhoenixSEN. The virtual network environments are configured for a particular substation and typically correspond to the substation's VLANs, e.g., IT systems, VoIP devices (phones), and SCADA. Each environment is connected to a subset of the ports on the Ethernet switch. One port connects to the substation LAN; the other ports connect to link modems or radios.
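To make the configuration synthesis step described earlier in this subsection more concrete, the sketch below derives per-substation configurations from a global model. It is an illustrative sketch only, not the actual PhoenixSEN synthesis program; the model schema, the /24 subnetting scheme, and the VNI numbering are all assumptions.

```python
import ipaddress
import json

def synthesize(model):
    """Derive a per-utility/substation/VLAN configuration library
    from a global network model (hypothetical schema)."""
    library = {}
    subnets = ipaddress.ip_network(model["supernet"]).subnets(new_prefix=24)
    vni, vnis = model["vni_base"], {}
    for utility in model["utilities"]:
        # One VxLAN VNI per utility-VLAN combination, generated during
        # synthesis (see the description later in this subsection).
        for vlan in utility["vlans"]:
            vnis[(utility["name"], vlan)] = vni
            vni += 1
        for substation in utility["substations"]:
            for vlan in utility["vlans"]:
                library[f"{utility['name']}/{substation}/{vlan}"] = {
                    "subnet": str(next(subnets)),
                    "vni": vnis[(utility["name"], vlan)],
                }
    return library

model = {
    "supernet": "10.0.0.0/16",
    "vni_base": 1000,
    "utilities": [
        {"name": "utilityA", "vlans": ["scada", "it"],
         "substations": ["sub1", "sub2"]},
    ],
}
print(json.dumps(synthesize(model), indent=2))
```

A library of this shape can be pre-loaded on every node; entering the utility name and substation number then amounts to selecting one key from the library.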
The network environments are implemented using the Linux networking namespace capability, with services provided by Docker containers.

Each network environment runs its own instance of the Optimized Link State Routing Protocol (OLSR) agent which publishes the environment's IP subnet information to PhoenixSEN. This information is exchanged only with OLSR agents that belong to the same utility and environment type (e.g., SCADA). For example, utility A's SCADA environment on one Phoenix node connects to utility A's SCADA environments on all other Phoenix nodes, but not to utility B's SCADA environments.

The utilities served by the same PhoenixSEN can use conflicting or overlapping IP subnets (e.g., 192.168.x.y). Supporting such scenarios natively would require VLAN support on all PhoenixSEN links. In order to stay compatible with a wide variety of link-layer technologies (including IP-only point-to-point links), the node does not require VLAN support on links. Instead, it passes all traffic through a Virtual Extensible LAN (VxLAN) [17] gateway which encapsulates Ethernet frames in User Datagram Protocol (UDP) packets prior to transmission. A unique VxLAN virtual network identifier (VNI) is generated for each utility-VLAN combination during configuration synthesis.

Each network environment provides fully isolated elementary network services to the corresponding substation VLAN such as DHCP, DNS, or NTP. The DNS service is described in Section V. Each environment also runs a custom Netmon agent to discover and identify devices connected to the substation LAN. The Netmon service is described in more detail in Section VII.

The type of the network environment determines what additional network services are created. For example, the VoIP environment provides a Session Initiation Protocol (SIP) [18] server on each Phoenix node to keep the substation's phones operational and to provide a means of communication between the substation personnel and the CC. Please refer to Section VI for more detail. The SCADA environment provides additional services for intrusion detection and for mitigating insider attacks by compromised devices (Section VIII).
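As a rough illustration of the VxLAN encapsulation described above, the following sketch creates a VxLAN interface for one utility-VLAN combination using standard iproute2 commands driven from Python. The interface names, the multicast group, and the VNI value are assumptions for illustration, not the PhoenixSEN configuration.

```python
import subprocess

def create_vxlan(vni: int, underlay_dev: str, ifname: str):
    """Create a VxLAN interface that tunnels Ethernet frames in UDP
    (port 4789) over the underlay, so per-utility virtual networks can
    cross IP-only point-to-point links without native VLAN support."""
    subprocess.run(
        ["ip", "link", "add", ifname, "type", "vxlan",
         "id", str(vni), "dev", underlay_dev,
         "group", "239.1.1.1", "dstport", "4789"],
        check=True)
    subprocess.run(["ip", "link", "set", ifname, "up"], check=True)

# e.g., VNI 1001 assumed to identify utility A's SCADA VLAN
create_vxlan(1001, "eth1", "vx-utila-scada")
```

Because each utility-VLAN pair maps to its own VNI, two utilities can use the identical 192.168.x.y subnet without their frames ever mixing on shared links.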
B. Deployment & Network Formation
The deployment of PhoenixSEN begins with the deployment of the CC. The CC then coordinates the deployment of the rest of the network and provides remote assistance to substations while they are setting up Phoenix nodes. During the initial deployment phase, the CC may only have one-way broadcast capability, e.g., a high-power high-frequency (HF) radio with voice or low-speed data support. The CC crew can use the broadcast channel to transmit substation-specific setup instructions, just enough to get the substation connected. Once the substation is connected, the CC crew can help configure the substation's Phoenix node remotely.

Upon Phoenix node delivery, the substation crew begins setting up the node. We assume the crew has a technical background (e.g., in electric grid engineering), but not necessarily in IT, communications, or networking. The node includes a playbook and detailed installation instructions. All parts and connectors are clearly labeled. The instructions are detailed enough to allow the crew to independently set up the Phoenix node in a minimal configuration with a low-bandwidth control connection to the CC. To facilitate this critical first step, the node enclosure may include a pre-configured HF receiver to receive broadcast communication from the CC.

When powered on, the Phoenix node begins to search for other Phoenix nodes on its link interfaces. Each interface has an instance of the OLSR daemon configured to automatically discover other OLSR-enabled [19] nodes reachable over the interface. Gradually, Phoenix nodes form an OLSR-based mobile ad hoc network (MANET) that eventually spans all substations and the CC. Once the network is formed, the CC can further configure and coordinate deployed Phoenix nodes over the network.

While forming, PhoenixSEN provides connectivity in phases, gradually providing additional modes of communication as the system transitions from one phase to another. The phases are as follows.

1) Low-speed, broadcast-only communication from the CC to substations in the process of deploying a Phoenix node.
2) Low-speed command and control connection between the CC and each substation. The connection provides just enough bandwidth for the CC crew to configure the Phoenix node remotely.
3) The Phoenix node is fully connected to the network, but the overlay may not have fully formed yet or the node may be in the process of resolving an addressing conflict. The substation may not be able to reach all other substations of the same utility yet.
4) The node is fully connected and provides VoIP and text communication between the CC and the substation crew.

Finally, the substation or CC crew performs the one-time deployment configuration by entering the utility name and substation number into the node. This step can be performed locally at the substation or remotely over the network. The node uses the entered information to create utility and substation specific services and VLANs. The process is described in more detail in Section IV-A.

V. Naming and Service Discovery

For compatibility with existing devices and applications, PhoenixSEN provides naming and service discovery services based on the DNS. The requirements outlined in Section II and the fact that PhoenixSEN must remain usable when disconnected from the internet make the traditional DNS architecture with a single authoritative master server difficult to use in an ad hoc network. In PhoenixSEN, each node should be able to independently contribute its own records into a flat namespace based on a DNS zone shared by all nodes.
In this section, we describe the design of the hybrid peer-to-peer DNS system used in PhoenixSEN.

The two primary use cases for DNS in PhoenixSEN are discovering services offered by Phoenix nodes and supporting third-party tools with pre-existing Transport Layer Security (TLS) certificates. The first use case is implemented in the VoIP and chat service, as described in Section VI, which uses DNS service discovery to map phone numbers to Phoenix nodes in a distributed manner. The second case refers to forensic and cyber-security activities performed with third-party tools. During live exercises, external participants often needed to be able to connect their tools, with mandatory TLS server certificate validation, to PhoenixSEN. The DNS service allows registering domain names that match pre-existing TLS certificates on a first-come first-served basis and resolving those names across PhoenixSEN.

The DNS service takes advantage of the following: 1) a weak eventual consistency model is sufficient for DNS; 2) in a network like PhoenixSEN, the impact of conflicts and inconsistencies can be minimized network-wide through design. Point 1) expands the range of protocols that could be used for DNS database replication to gossip (epidemic), flooding, and multicast-based protocols. Point 2) means that if applications and devices are configured appropriately, the probability of conflicts due to a weak consistency model will be negligible.

PhoenixSEN provides such a DNS subsystem, with no dedicated master server and no single point of failure.
Fig. 4. DNS subsystem architecture. Each Phoenix node runs the same set of services. The block with gray background shows the VoIP service using the DNS subsystem for service discovery. Custom-made blocks are highlighted in blue. Communication via proprietary protocols is highlighted in purple.

The subsystem is a hybrid of regular DNS and multicast DNS [20]. Devices and services can use the subsystem for network-wide DNS Service Discovery (DNS-SD) [21]. In an ad hoc network such as PhoenixSEN, built-in service discovery provides a means for applications to discover services dynamically rather than relying on static configuration. Dynamic service discovery improves robustness in scenarios where the network is impaired or only partially formed. A DNS-based service discovery mechanism also allows the reuse of higher-layer protocols and services, e.g., TLS server certificate validation.

Fig. 4 illustrates the architecture of the DNS subsystem. Each Phoenix node runs the full set of services that make up the DNS subsystem. Thus, DNS is always available on substation LANs, even if the Phoenix node itself is isolated from the rest of the network. As more Phoenix nodes join the network, DNS records gathered from other nodes automatically become available to the devices. Where possible, the subsystem uses standardized protocols and existing (unmodified) open source software.

Each Phoenix node provides a recursive DNS resolver to substation LANs based on the Unbound resolver. We chose Unbound because its behavior can be customized with an external Python program. We use that feature to resolve ordinary DNS queries via multicast DNS (mDNS) [20], as described later. Unbound first forwards the query to a local authoritative DNS server implemented with the Knot DNS server. The DNS server is authoritative for the domain phxnet.org and for a portion of the in-addr.arpa space corresponding to the IP subnet allocated to the Phoenix node. The domain phxnet.org represents a namespace shared by all Phoenix nodes. Any node can register a record in the shared namespace. If multiple nodes register the same record, all such records are merged when a client attempts to resolve the record. It is left up to the application to resolve the potential conflict.

If no record is found, Unbound forwards the query to an external Python module. The module attempts to resolve the query via mDNS using a local instance of the Avahi daemon (the de facto standard implementation of mDNS on Linux-based OSes). The Python module and the Avahi daemon communicate via a D-Bus API.

In a typical configuration, the Avahi daemon multicasts DNS queries over LAN interfaces, e.g., local Ethernet or Wi-Fi interfaces. Since multicast DNS packets are not forwarded by layer 3 routers, only nodes connected to the same link can be reached this way. PhoenixSEN is, however, a routed network managed by an OLSR [19] daemon running on each node. The daemon includes the Basic Multicast Forwarding (BMF) plugin that enables multicast communication for the overlay network. We use the plugin to propagate Avahi's mDNS packets across the OLSR network. The BMF plugin forwards multicast packets along a spanning tree covering the entire network, i.e., all Phoenix nodes.

The Avahi daemon on each Phoenix node publishes all DNS records from the local DNS server via mDNS. This is accomplished with a custom program called "Avahi Publisher". The program behaves as a secondary DNS server for the shared zone. When the zone is updated, Knot sends a DNS NOTIFY request to the program. The program issues a DNS zone transfer to Knot to download all records and publishes those records via Avahi's D-Bus API. Thus, LAN clients that resolve DNS records via Unbound receive records not only from the local Knot DNS server, but also from all DNS servers found anywhere in the network.
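The sketch below shows the general shape of such an Unbound extension, following Unbound's documented pythonmod interface. It is a simplified illustration rather than the PhoenixSEN module: the Avahi lookup is stubbed out, only A records are synthesized, and error handling is minimal.

```python
# Simplified sketch of an Unbound Python module in the spirit of the
# extension described above (names and logic are illustrative).

def resolve_via_avahi(qname):
    """Placeholder: resolve qname over mDNS via Avahi's D-Bus API.
    Returns an IPv4 address string, or None if no peer answered."""
    return None

def init(id, cfg):
    return True

def deinit(id):
    return True

def inform_super(id, qstate, superqstate, qdata):
    return True

def operate(id, event, qstate, qdata):
    if event in (MODULE_EVENT_NEW, MODULE_EVENT_PASS):
        addr = resolve_via_avahi(qstate.qinfo.qname_str)
        if addr is not None:
            # Synthesize a positive answer from the mDNS result.
            msg = DNSMessage(qstate.qinfo.qname_str, RR_TYPE_A,
                             RR_CLASS_IN, PKT_QR | PKT_RA)
            msg.answer.append("%s 10 IN A %s" %
                              (qstate.qinfo.qname_str, addr))
            if not msg.set_return_msg(qstate):
                qstate.ext_state[id] = MODULE_ERROR
                return True
            qstate.return_rcode = RCODE_NOERROR
            qstate.ext_state[id] = MODULE_FINISHED
            return True
        # No mDNS answer: hand the query to the next module.
        qstate.ext_state[id] = MODULE_WAIT_MODULE
        return True
    if event == MODULE_EVENT_MODDONE:
        qstate.ext_state[id] = MODULE_FINISHED
        return True
    qstate.ext_state[id] = MODULE_ERROR
    return True
```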
Devices that obtain an IP address via DHCP are automatically assigned hostnames by the DHCP server. The DHCP server uses the standard dynamic DNS (DNS UPDATE) protocol to publish records to the local DNS server. For example, a DHCP client with the name "foo" will be assigned the hostname foo.phxnet.org. Having an automatically generated hostname for each device has two main benefits: 1) the device can be reached by its chosen name across the network; 2) services running on the device can use a wildcard TLS certificate issued by Let's Encrypt.
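For illustration, the following sketch publishes such a record with the standard DNS UPDATE protocol using the dnspython library. The address, TTL, and server endpoint are made up for the example, and a real deployment would typically authenticate updates (e.g., with TSIG).

```python
import dns.query
import dns.update

# Publish foo.phxnet.org -> 10.20.1.15 to the authoritative DNS
# server on the local Phoenix node via DNS UPDATE.
update = dns.update.Update("phxnet.org")
update.replace("foo", 300, "A", "10.20.1.15")  # name, TTL, type, data
response = dns.query.tcp(update, "127.0.0.1", timeout=5)
print(response.rcode())  # 0 (NOERROR) on success
```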
A. Service Discovery Example

PhoenixSEN provides a SIP-based decentralized VoIP service (highlighted in gray in Fig. 4) designed to help substation personnel coordinate during a black-start recovery. The service relies on the PhoenixSEN DNS subsystem for network-wide service discovery. In this section, we discuss how the VoIP service uses the DNS subsystem.

The VoIP clients at a substation register with the nearest SIP server. Since each Phoenix node runs a dedicated SIP server, the VoIP clients usually register with the SIP server at their substation's Phoenix node. To discover the SIP server, a VoIP client performs the standard procedure described in [22]. The DNS queries issued by the VoIP client are forwarded by Unbound to the local Knot DNS server. The DNS server has the corresponding DNS PTR and SRV resource records in its zone, and those records point the VoIP client to the SIP server on the local Phoenix node. This way, all VoIP clients across the whole network can have identical configuration and can, in the case of smartphones, roam between substations.
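A minimal sketch of that lookup with the dnspython library is shown below. The service label follows RFC 3263 conventions (UDP transport assumed here); the exact records PhoenixSEN provisions are an assumption in this example.

```python
import dns.resolver

def discover_sip_server(domain: str = "phxnet.org"):
    """Resolve the SRV record advertising the local SIP server, as a
    VoIP client would during registration (simplified)."""
    answer = dns.resolver.resolve(f"_sip._udp.{domain}", "SRV")
    record = sorted(answer, key=lambda r: (r.priority, -r.weight))[0]
    return str(record.target).rstrip("."), record.port

host, port = discover_sip_server()
print(f"REGISTER with {host}:{port}")
```

Because every node answers this query with its own server, the same client configuration works at any substation.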
Fig. 5. Internal architecture of the VoIP and chat subsystem. The subsystem can integrate a variety of SIP (red) and WebRTC (green) based VoIP clients. Each Phoenix node runs a full set of services. VoIP clients register at the nearest node. Communication sessions that traverse slow links (blue) are optionally transcoded and compressed. VoIP clients and servers rely on DNS-SD for server federation and client discovery. IP multicast (used by DNS-SD) helps make the subsystem fully decentralized, with no single point of failure.

Upon VoIP client registration, the SIP server publishes a custom DNS record mapping the VoIP client's number to the hostname of the SIP server where it has registered, for example

4822._voip.phxnet.org. CNAME voip-phx23.phxnet.org.

where 4822 is the VoIP client's phone number and voip-phx23.phxnet.org is the hostname of the Phoenix node where the client is registered. A custom program called "DNS Publisher" (Fig. 4) publishes the record to the local DNS server via DNS UPDATE, and the record is subsequently disseminated across the network with multicast DNS.

When a remote SIP server receives a SIP INVITE [18] for the VoIP client, it extracts the called number from the Request-URI and constructs a domain name from the number within the _voip.phxnet.org suffix. The SIP server then queries the DNS for the corresponding CNAME record. If a match is found, the call is forwarded to the SIP server identified by the record.

Note that the multicast DNS subsystem is configured to pro-actively disseminate the records across the network. Thus, for existing (registered) numbers, the query will usually be answered quickly by the local DNS server from its cache. Queries for non-existing (unregistered) numbers take a few seconds until multicast DNS times out.

Both VoIP clients and servers rely on the DNS service discovery feature to locate each other. This helps make the VoIP architecture fully decentralized and scalable, allows (mobile) VoIP clients to roam between Phoenix nodes, and makes swapping hardware components, e.g., VoIP clients, easy in the case of compromise or malfunction.
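A remote server's lookup step might look like the following dnspython sketch. The number-to-name convention mirrors the example record above; the function and variable names are illustrative, not the actual SIP server code.

```python
from typing import Optional

import dns.resolver

def find_registrar(number: str) -> Optional[str]:
    """Map a dialed number to the Phoenix node hosting the callee's
    SIP registration via the _voip.phxnet.org CNAME convention."""
    try:
        answer = dns.resolver.resolve(f"{number}._voip.phxnet.org",
                                      "CNAME")
    except dns.resolver.NXDOMAIN:
        return None  # unregistered number; the mDNS lookup timed out
    return str(answer[0].target).rstrip(".")

print(find_registrar("4822"))  # e.g. "voip-phx23.phxnet.org"
```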
VI. Voice and Chat

One of the purposes of PhoenixSEN is to facilitate the coordination of people during an emergency. To that end, the network provides built-in support for SIP-based [18] real-time voice and text communication. The service supports the following modalities: two-party calls, multi-party conferencing, text messaging, and multi-party group chat (with a persistent log). Fig. 5 shows the internal architecture of the subsystem.

The VoIP subsystem has been designed to support a variety of VoIP clients. The IP digital enhanced cordless telecommunications (DECT) phone represents VoIP clients that come bundled with the Phoenix node hardware; each Phoenix node comes with a couple of IP DECT phones pre-configured for use with the network. The client labeled "PC with a web browser" could be any tablet, laptop, or PC with a recent web browser with Web Real-Time Communication (WebRTC) support. The SIP server includes a JavaScript WebRTC application that can turn any such device into a fully-featured VoIP and chat client without the need to install any software. The desk phone icon represents VoIP clients that existed at the substation before the Phoenix node was installed. The smartphone VoIP client represents mobile Android devices with a SIP client application installed. The mobile clients are connected via a Wi-Fi network created by the Phoenix node and can roam between substations while keeping the same number.

The VoIP and chat subsystem uses a simple four-number dialing plan where each substation is allocated a fixed prefix. The substation's VoIP clients are assigned numbers from that prefix. Each Phoenix node runs a full set of VoIP services (SIP and WebRTC servers, conference server, chat server). The clients register at the nearest SIP server, typically the one located on the Phoenix node installed in the client's substation. Having a full set of VoIP services on each Phoenix node allows the node to function in isolation. Even if the node is disconnected from PhoenixSEN, calls and chats within its substation remain possible.

The subsystem supports ordinary two-way calls between any two VoIP clients, multi-party conference calls with conference rooms created on demand, peer-to-peer chat on compatible devices (Android clients), and a web-based group chat service accessible through the JavaScript WebRTC client. The VoIP service supports persistent logging and archiving of conversations (both voice and text), if needed. The SIP server on each Phoenix node performs transparent media transcoding and enforces authentication and encryption on calls that traverse PhoenixSEN, i.e., calls that cross two or more Phoenix nodes. Transcoding helps establish an upper bound on the VoIP bandwidth on slow (long-haul) links and improves the robustness of calls traversing such links.
A simple four-digit dialing plan with prefixes allocated to substations was used during live exercises; however, the VoIP subsystem has no fixed dialing plan hard-coded. Instead, it relies on DNS-SD to discover available SIP servers and VoIP clients, and to discover the SIP server where a particular VoIP client is registered. Section V-A describes in detail how the VoIP subsystem uses DNS-SD. Having a fully dynamic VoIP subsystem has important benefits: 1) Phoenix nodes have uniform hardware and software configuration and are interchangeable; 2) mobile VoIP clients roaming from one substation to another are supported; and 3) the subsystem is fully decentralized with no single point of failure.

VII. Network Monitoring

In a geographically distributed ad hoc network architecture such as PhoenixSEN, real-time network monitoring and situation awareness are key for successful deployment and operation. In this section, we describe Netmon, a network monitoring service designed and developed for PhoenixSEN. Netmon provides near real-time information about the state of the network through a web-based UI. The UI lets the control center see the topology of the formed OLSR network and provides alerts when it detects security-related events and incidents. Fig. 6 shows the UI displaying the network topology graph of a PhoenixSEN prototype during a live exercise.

Fig. 6. A PhoenixSEN topology graph shown by Netmon during a live exercise. Elements that might require attention are highlighted in red. Hexagonal nodes represent substation Phoenix nodes. The menus on the right show a list of a substation's devices (top) and details about a particular device (bottom).

The internal architecture of Netmon is shown in Fig. 7. Every Phoenix node runs a custom Netmon agent process. The agent collects data about the state of the node using OS-level instrumentation and about directly attached substation (LAN) devices using active network scanning. It also provides an API for other processes on the same node, e.g., an IDS.
The collected data includes statistics from selected network interfaces, the state of the node's OLSR links, and information about discovered devices.

Fig. 7. Netmon internal architecture. Each Phoenix node runs an agent process that streams collected data to a backend server in the control center. Network operators interact with Netmon via a web-based UI implemented in JavaScript. The architecture provides a near real-time overview of the network.

To discover devices attached to the node's substation LAN interfaces, the agent periodically issues Address Resolution Protocol (ARP) requests for all IP addresses that belong to the interface's configured IP subnet. A device that responds to the ARP request is recorded and will be displayed in the network topology graph, as shown in Fig. 6. Once a LAN device has been discovered, it is periodically contacted to determine the device's reachability. The agent also probes the state of selected UDP and Transmission Control Protocol (TCP) ports on all discovered LAN devices to see what services might be offered by the device.
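A minimal sketch of this discovery step using Scapy (which, per the implementation notes later in this section, the agent uses for probing) is shown below. The subnet, interface name, and surrounding agent logic are illustrative assumptions, and the call requires raw-socket (root) privileges.

```python
from scapy.all import ARP, Ether, srp

def discover_lan_devices(subnet, iface):
    """Broadcast ARP who-has requests for every address in the subnet
    and return (IP, MAC) pairs for the devices that answered."""
    answered, _ = srp(
        Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=subnet),
        iface=iface, timeout=2, verbose=False)
    return [(reply.psrc, reply.hwsrc) for _, reply in answered]

# Example: scan a substation's SCADA VLAN subnet.
for ip, mac in discover_lan_devices("10.20.1.0/24", "eth0.10"):
    print(ip, mac)
```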
A separate IDS process, running on the same Phoenix node, provides additional data to the Netmon agent. The data represents potential intrusion detection incidents and security threats. Netmon transmits the data to the backend where it is then presented to the network operator by the frontend (UI).

Each agent maintains a persistent WebSocket [23] connection to an agent server discovered via DNS SRV. The control center Phoenix node provides one agent server instance for each VLAN. As long as the agent is connected to the server, data is streamed to the agent server in near real time. When an agent gets disconnected from the server, it temporarily stores the collected data in a local cache. All locally cached data will be uploaded to the server later once the agent has reconnected.

The agent server stores all the data from all connected agents in a persistent database. The data is processed and indexed to allow time-based addressing and aggregation, i.e., retrieving the state of the network at a particular time. This feature allows debugging or post-mortem analysis after an exercise when the physical network infrastructure is no longer operational. A backend server, also running on the control center Phoenix node, serves the data to the frontend (UI) via a RESTful and WebSocket based API. The API allows the frontend code to retrieve any data generated by the agents, and to receive asynchronous (push) updates as new data becomes available.

The UI is a JavaScript application running in a web browser on a laptop in the control center. The application is specifically designed to always show the most recent network state without the need to reload the browser window. The UI is automatically updated based on push notifications received from the backend server. The UI provides a quick "at a glance" overview of the entire network, as well as detailed information about individual network components. The operator can use the UI to quickly scan the network for faulty (red) nodes and links, or to see which parts of the network require immediate attention.

The Netmon agent is implemented in Python, uses Scapy for network probing and device discovery, and uses SQLite for the local cache. The agent and backend servers are both implemented in JavaScript running on NodeJS. All collected data is stored in a MongoDB database on the control center Phoenix node. The frontend is a JavaScript application implemented with Vue.js and running in a web browser.

VIII. Mitigating Insider Attacks

Insider attacks are of particular concern in large physical infrastructures such as the electric grid. An attacker could use compromised devices in a substation LAN to attempt to thwart recovery attempts. If the attacker also gains physical access to the substation, they could plant a malicious device in the substation's network ("device-in-a-closet") [24] and use the device to launch man-in-the-middle (MITM) attacks. The lack of Ethernet security combined with the ever shrinking form factor of embedded devices makes such attacks practical and very hard to discover.

To help mitigate the risk of network-based insider attacks, we designed and built a prototype device called EtherShield. EtherShield is a "bump in the wire" embedded device with two Ethernet ports: internal and external. The internal port is connected to a trusted (not compromised) SCADA device, preferably as close to the SCADA device as possible to minimize the risk of malicious actors getting access to that segment of the network. The external port is connected to the untrusted substation LAN. Until activated, the device operates as a regular Ethernet switch. Once activated from the nearest Phoenix node, the device transparently authenticates all communication between the SCADA device and the network. Fig. 8 shows the architecture of the EtherShield subsystem.

Fig. 8. The architecture of the EtherShield subsystem. An Ethernet switch-like device is inserted between a trusted SCADA device and the untrusted network. The device transparently authenticates all Ethernet frames received and sent by the device. The diagram shows one possible configuration for networks that support Port-based Network Access Control (IEEE 802.1X).

Each EtherShield device will have been paired with the nearest Phoenix node via a MITM-resistant USB connection prior to deployment into the LAN. During pairing, the EtherShield device and the Phoenix node generate the cryptographic material for secure communication over an untrusted LAN. Until activated, the EtherShield device passes all traffic between its two ports unmodified. Once a SCADA device protected with an EtherShield is deemed secure, the network operator can activate a secure mode in the EtherShield to isolate the SCADA device from potentially malicious traffic. In secure mode, the EtherShield transparently authenticates all packets between the SCADA device, the nearest Phoenix node, and (optionally) other devices within the substation LAN.

The authentication is based on either IEEE 802.1X [25] or IPsec [26]. The EtherShield can be configured to only let authenticated traffic through to the SCADA device, or to also allow traffic from other LAN devices not protected by an EtherShield device.
This gives the network operator some flexibility while incrementally securing the substation SCADA network. Critical SCADA devices can be incrementally protected and isolated from malicious traffic without disrupting the operation of the rest of the substation LAN.

We built a prototype EtherShield device based on the Raspberry Pi running Open vSwitch. In the default "open" mode of operation, Open vSwitch is configured to function as a simple learning Ethernet switch. When switched into secure mode, the device starts either an IEEE 802.1X supplicant or an IPsec tunnel and re-configures Open vSwitch to redirect a portion of the traffic to those services. For example, in the IEEE 802.1X mode only Extensible Authentication Protocol over LAN (EAPoL) frames are redirected to the supplicant process. Frames received from the protected device are transparently augmented with an authentication header (IEEE 802.1X or IPsec authentication header) as they pass through the switch. A frame received from the untrusted LAN for the protected device is first authenticated before the frame is forwarded to the device. A custom controller application provides a secure HTTP API. This API can be used to configure and control the device from the nearest Phoenix node.

The Phoenix node provides a standards-based set of services to support the EtherShield subsystem. The cryptographic material generated during pairing is kept in a PostgreSQL database, along with account information. The node provides an IEEE 802.1X authenticator to the substation LAN based on the Remote Authentication Dial-In User Service (RADIUS) protocol [27]. The authentication is implemented with FreeRADIUS. An IPsec-based authenticator based on strongSwan is also provided.
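As a rough sketch of the mode switch described above, the following Python fragment drives ovs-ofctl to install OpenFlow rules that punt EAPoL frames (EtherType 0x888e) to a supplicant port. The bridge name and port numbers are hypothetical, and the real EtherShield redirect logic is more involved.

```python
import subprocess

def ofctl(*args: str):
    subprocess.run(["ovs-ofctl", *args], check=True)

def enter_8021x_mode(bridge: str = "br0", supplicant_port: int = 3):
    """Replace the 'open' learning-switch behavior with rules that
    single out EAPoL frames for the local 802.1X supplicant."""
    ofctl("del-flows", bridge)
    # EAPoL (EtherType 0x888e) goes to the supplicant process's port.
    ofctl("add-flow", bridge,
          f"priority=100,dl_type=0x888e,actions=output:{supplicant_port}")
    # Everything else falls back to normal L2 switching here; a
    # stricter policy would drop unauthenticated traffic instead.
    ofctl("add-flow", bridge, "priority=1,actions=NORMAL")
```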
IX. Related Work

Disaster Communications. The critical role of communication in the aftermath of a large-scale disaster became apparent after hurricane Katrina in 2005 [28], hurricane Irene in 2011, and during the 2017 Atlantic hurricane season [29]. In Puerto Rico, millions of people lost electricity, some for as long as 11 months, in the largest blackout in U.S. history [30]. These events rendered existing communication networks across large geographic areas unusable [10], [31]. Additionally, significant challenges caused by non-interoperable communication systems were reported [9], limiting situational awareness and preventing communication across jurisdictions and organizations [32], [33]. A 2016 National Institute of Standards and Technology (NIST) report [34] found communication failures to be a major obstacle in all recent disaster recovery efforts.
Industrial Control Systems. Modern industrial control systems, including the power grid, require communication to function. As critical infrastructure, such systems are increasingly targets of cyber attacks [35], [36]. A 2011 vulnerability analysis performed by the Idaho National Laboratory found that applying traditional IT system protective measures to real-time energy delivery control systems is inadequate and could lead to a power disruption [7]. The Stuxnet worm [37] and the Maroochy water breach [38] incidents illustrate the danger network-based cyber attacks pose for industrial control systems. Even ordinary planned upgrades can have devastating consequences in systems that control physical processes [39].
Network Architectures. A network architecture suitable for emergency scenarios will inevitably be ad hoc and temporary [11]. Such architectures have been the subject of active research for decades, and many promising ideas have been proposed, including peer-to-peer, mesh, mobile, delay-tolerant, opportunistic, and hybrid network architectures [40]–[42]. Despite the wide range of communication architectures, basic interoperability between organizations is still a problem. There have been attempts to repurpose existing technology to provide services where communication networks are unavailable. The Village Telco project [43] designed and built a do-it-yourself (DIY) toolkit for voice and text communications based on wireless mesh networking and VoIP. The Osmocom project [44] provides the essential building blocks for temporary cellular network infrastructures.
Disaster Management Systems. Recently, specialized disaster management systems have been gaining interest. Several organizations specializing in emergency response have made their solutions available [11], [45]–[47]. The core functionality of these systems centers around crowd-sourced field data gathering, incident tracking, and visualization. Such systems typically place no special requirements on the underlying communication networks. There have been attempts to use advanced technology in emergencies; for example, Panacea's Glass proposes to use Google Glass and cloud computing for situational awareness and effective triage of patients [48], [49].
Network Monitoring Systems. The ad hoc nature of temporary emergency networks calls for some form of real-time monitoring for topology discovery, bandwidth estimation, intrusion detection, and troubleshooting in the field. A vast number of network measurement and monitoring tools have been developed [50]. The tools can be broadly classified into passive, active, and hybrid (a combination of active and passive) [51].

Passive monitoring relies on packet capture [52], [53], OS instrumentation [54], or sampling [55] to perform measurements. Passive methods require existing network traffic and the ability to observe data flows within the network, and they are generally less intrusive than active methods [56]. Notable passive monitoring tools include CoralReef [57], Bro [58], Wireshark [59], Snort [60], and sFlow [61].

Active monitoring estimates network properties by observing the handling of special-purpose data injected into the network by the monitoring tool. Active methods are generally intrusive, generate variable amounts of artificial network load, and may interfere with application traffic. Existing active methods rely on IP options (Internet Control Message Protocol (ICMP) echo [62], traceroute [63]), TCP congestion control (iPerf [64]), bandwidth and transmission time probing (BWPing [65]), or packet dispersion measurements [66], [67]. Scapy [68] is a popular packet manipulation tool for active network discovery and monitoring.

An important aspect of any network monitoring system is its data processing architecture. The architecture determines the number and locations of collection points, the types of collected data, collection protocols, and data processing algorithms. If data collection takes place over the network being monitored, care must be taken to minimize the generated overhead and its impact on the monitored network. Well-known monitoring data collection protocols include Cisco NetFlow [69], the IPFIX protocol [70], and the Simple Network Management Protocol (SNMP) [71]. SNMP is widely supported by existing networking equipment.

The following monitoring tools inspired Netmon, the monitoring tool presented in this paper: Moloch [72], MRTG [73], OpenNMS [74], Cacti [75], and Zabbix [76].
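As a concrete example of the active approach, the following sketch uses Scapy, the same packet manipulation library Netmon builds on, to measure reachability and round-trip time with a single ICMP echo probe. The target address and timing policy are illustrative; this is not Netmon's actual probing logic.

```python
import time
from scapy.all import IP, ICMP, sr1  # sending packets typically requires root

# Illustrative active probe in the spirit of Netmon's Scapy-based
# measurements: estimate reachability and round-trip time to a host
# by injecting a single ICMP echo request and timing the reply.
def probe_rtt(target: str, timeout: float = 2.0):
    """Return the round-trip time to target in seconds, or None if unreachable."""
    start = time.monotonic()
    reply = sr1(IP(dst=target) / ICMP(), timeout=timeout, verbose=False)
    return time.monotonic() - start if reply is not None else None

if __name__ == "__main__":
    rtt = probe_rtt("10.20.0.1")  # example address, not a real deployment
    print(f"RTT: {rtt * 1000:.1f} ms" if rtt is not None else "unreachable")
```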
X. Conclusion

We presented the design and prototype implementation of PhoenixSEN, an ad hoc network architecture specifically designed to enable real-time coordination of people and (SCADA) devices during an emergency, while the primary network infrastructure may be inoperable. Our network architecture is designed with sufficient flexibility to be used as a drop-in replacement for network infrastructure and services typically provided by third-party ISPs. Our work is primarily motivated by the needs of the power distribution industry during a hypothetical large-scale blackout triggered by a network-based cyber attack. We believe PhoenixSEN has the potential to speed up power grid recovery and service restoration, in particular during a cyber attack that is persistent and ongoing.

Several iterations of the PhoenixSEN prototype described in this paper were tested and evaluated through a series of DARPA-led cyber security exercises on Plum Island, NY [77]. During the exercises, PhoenixSEN was used together with a variety of power grid recovery tools contributed by other exercise participants to assess the readiness of the infrastructure to recover from a simulated large-scale cyber attack on the electrical grid. The photos in Fig. 9 show the evaluated PhoenixSEN prototype.

Fig. 9. Photos of the PhoenixSEN prototype evaluated through DARPA-led exercises. Left to right: Phoenix node components (Intel NUC, POE switch, Yealink W52P phone); Phoenix node in a weather-resistant enclosure (Intel NUC, Yealink W52P phone, Netgear Ethernet switch, Tripp Lite UPS); small-scale PhoenixSEN prototype installation in a lab. The photos were taken by Hema Retty at the University of Illinois at Urbana-Champaign in September 2018.

This paper describes the most recent iteration of PhoenixSEN. Some of the features described in this paper ended up not being evaluated in live exercises. The EtherShield prototype device described in Section VIII was developed and tested in a lab at Columbia University. The hardware deployed on
Plum Island did not have GPS receivers since the precise location of each Phoenix node was known in advance. One-time deployment configuration (Section IV-A) was performed from a laptop attached to the Phoenix node via an Ethernet port. The configuration synthesis approach was only tested on a Columbia University testbed. The chat feature (Section VI), which was originally developed to support team coordination during the exercise, was eventually replaced with a third-party application the participants were already familiar with.
A. Future Work
Although the design of PhoenixSEN was primarily motivated by specific needs of the power distribution industry, we believe the resulting architecture is more general and flexible, and might be suitable for other emergency scenarios as well. As part of future work, PhoenixSEN could be extended with features and services for first responders, disaster recovery, and other emergency scenarios where communication and coordination are critical even if the primary communication networks are not operational. We envision supporting a variety of common emergency and disaster scenarios on top of a general-purpose hardware and software architecture with interchangeable service profiles. Each profile will activate services designed to meet the needs of a particular emergency scenario, e.g., incident management, ticketing, collaborative document editing, and others.

The VoIP subsystem in PhoenixSEN provides all the usual modes of real-time communication: two-way calls, multi-party conference calls, direct chat, and group chat. It is likely that PhoenixSEN might get deployed in challenging scenarios where ordinary calls and chats may not be the most efficient means of communication. As part of an effort to support more emergency scenarios, further research into alternative means of (near) real-time communication might be warranted. Specifically, good support for push-to-talk (PTT) communication seems to be a promising direction for first responder communication systems.

Recent advances in automated speech-to-text and text-to-speech technologies might provide a means to automatically transcribe and index emergency voice communications. In addition, such technologies might provide better support for voice communication over long-haul and slow links such as HF band links, satellite links, and power line communication (PLC) systems. A combination of Robust Header Compression (RoHC) [78], PTT, transcoding, and voice-to-text could potentially be used to design a near real-time, highly robust, and resilient communication system suitable for very challenged communication networks and scenarios.

We plan to continue developing the Netmon software with features specifically designed for mobile ad hoc networks. We believe that no other existing network monitoring solution supports ad hoc or highly dynamic networks well. In a future version of Netmon, we plan to remove the restriction of having only one instance of the backend services per network. We will aim to make Netmon fully decentralized, with the backend services and data sharded across all Phoenix nodes. To build a global view of the network, Netmon will merge data obtained from all Phoenix nodes.

Acknowledgment

The work described in this paper was very much a team effort. The authors developed the PhoenixSEN prototype as part of a joint team with members from BAE Systems and Perspecta Labs.

We would also like to thank Clifton Lin, Charles Tao, Defu Li, Ranga Reddy, and James Dolan, all members of the BAE Systems team, for fruitful discussions and help in developing and testing the prototype.

We are indebted to Frafos GmbH, the developers of the ABC session border controller (SBC), who generously provided us with a free license to use the SBC in our prototype. Furthermore, Frafos engineers were instrumental in helping us install and configure the SBC during the early stages of our project.

References

[1] CNN. (2019, Aug.) Massive failure leaves Argentina, Paraguay and Uruguay with no power. [Online]. Available: https://edition.cnn.…
[2] BBC. …
[3] National Grid ESO. …
[4] PJM. [Online]. Available: https://pjm.com/~/media/documents/manuals/m12.ashx
[5] National Institute of Standards and Technology, "NIST Framework and Roadmap for Smart Grid Interoperability Standards, Release 3.0," Sep. 2014. [Online]. Available: http://dx.doi.org/10.…
[6] DARPA. Rapid Attack Detection, Isolation, and Characterization Systems (RADICS). [Online]. Available: https://darpa.…
[7] Idaho National Laboratory, vulnerability analysis of energy delivery control systems, 2011. …
[8] National Grid ESO. [Online]. Available: https://nationalgrideso.com/document/151081/download
[9] Applied Technology Council, "Critical Assessment of Lifeline System Performance: Understanding Societal Needs in Disaster Recovery," National Institute of Standards and Technology, Tech. Rep., Apr. 2016. [Online]. Available: http://dx.doi.org/10.…
[10] Federal Communications Commission. [Online]. Available: https://fcc.gov/public/attachments/DOC-353805A1.…
[11] [Online]. Available: https://tucsoncert.info/Red_Cross_DST_slides.…
[12] Federal Energy Regulatory Commission, Order No. 889. [Online]. Available: https://ferc.gov/legal/maj-ord-reg/land-docs/order889.asp
[13] "IEEE Standard for Electric Power Systems Communications—Distributed Network Protocol (DNP3)," IEEE Std 1815-2012 (Revision of IEEE Std 1815-2010), pp. 1–821, 2012.
[14] International Electrotechnical Commission, "Telecontrol equipment and systems – Part 6-503: Telecontrol protocols compatible with ISO standards and ITU-T recommendations – TASE.2 Services and protocol," IEC Std 60870-6-503, 2014.
[15] ——, "Communication networks and systems for power utility automation," IEC Std 61850, 2013.
[16] "IEEE Standard for Synchrophasor Measurements for Power Systems," IEEE Std C37.118.1-2011.
[17] "Virtual eXtensible Local Area Network (VXLAN)," RFC 7348. [Online]. Available: https://www.ietf.org/rfc/rfc7348.txt
[18] "SIP: Session Initiation Protocol," RFC 3261. [Online]. Available: https://www.ietf.org/rfc/rfc3261.txt
[19] "Optimized Link State Routing Protocol (OLSR)," RFC 3626. [Online]. Available: https://www.ietf.org/rfc/rfc3626.txt
[20] "Multicast DNS," RFC 6762. [Online]. Available: https://www.ietf.org/rfc/rfc6762.txt
[21] "DNS-Based Service Discovery," RFC 6763. [Online]. Available: https://www.ietf.org/rfc/rfc6763.txt
[22] "Session Initiation Protocol (SIP): Locating SIP Servers," RFC 3263. [Online]. Available: https://www.ietf.org/rfc/rfc3263.txt
[23] "The WebSocket Protocol," RFC 6455. [Online]. Available: https://www.ietf.org/rfc/rfc6455.txt
[24] C. Haschek. (2018) The curious case of the Raspberry Pi in the network closet. [Online]. Available: https://blog.haschek.at/2018/the-curious-case-of-the-RasPi-in-our-network.html
[25] IEEE. (2020) 802.1X: Port-Based Network Access Control. [Online]. Available: https://1.ieee802.…
[26] RFC 6701. [Online]. Available: https://www.ietf.org/rfc/rfc6701.txt
[27] "Remote Authentication Dial In User Service (RADIUS)," RFC 2865. [Online]. Available: https://www.ietf.org/rfc/rfc2865.txt
[28] [Online]. Available: https://congress.…
[29] Free Press, "Connecting the Dots: The Telecom Crisis in Puerto Rico," May 2019. [Online]. Available: https://freepress.net/sites/default/files/2019-05/connecting_the_dots_the_telecom_crisis_in_puerto_rico_free_press.…
[30] Vox. (2018, Aug.) [Online]. Available: https://vox.com/identities/2018/8/15/17692414/puerto-rico-power-electricity-restored-hurricane-maria
[31] Federal Government of the United States. (2006) The Federal Response to Hurricane Katrina: Lessons Learned. [Online]. Available: https://georgewbush-whitehouse.archives.gov/reports/katrina-lessons-learned/
[32] M. Stute, M. Maass, T. Schons, M.-A. Kaufhold, C. Reuter, and M. Hollick, "Empirical insights for designing information and communication technology for international disaster response," International Journal of Disaster Risk Reduction, p. 101598, 2020.
[33] L. K. Comfort and T. W. Haase, "Communication, Coherence, and Collective Action: The Impact of Hurricane Katrina on Communications Infrastructure," Public Works Management & Policy, vol. 10, no. 4, pp. 328–343, 2006. [Online]. Available: https://doi.org/10.…
[34] National Institute of Standards and Technology. [Online]. Available: https://nist.gov/sites/default/files/documents/el/resilience/NIST-GCR-16-917-39.pdf
[35] The White House. (2009) Remarks by the President on Securing Our Nation's Cyber Infrastructure. [Online]. Available: https://obamawhitehouse.archives.gov/the-press-office/remarks-president-securing-our-nations-cyber-infrastructure
[36] National Electric Sector Cybersecurity Organization Resource (NESCOR). Analysis of Selected Electric Sector High Risk Failure Scenarios. [Online]. Available: https://smartgrid.epri.com/NESCOR.aspx
[37] S. Karnouskos, "Stuxnet worm impact on industrial cyber-physical system security," in IECON 2011 – 37th Annual Conference of the IEEE Industrial Electronics Society, Nov. 2011, pp. 4490–4494.
[38] J. Slay and M. Miller, "Lessons Learned from the Maroochy Water Breach," in International Conference on Critical Infrastructure Protection, …
[39] National Transportation Safety Board. [Online]. Available: https://ntsb.gov/investigations/AccidentReports/Pages/PLD18MR003-preliminary-report.aspx
[40] F. Legendre, T. Hossmann, F. Sutton, and B. Plattner, "30 years of wireless ad hoc networking research: What about humanitarian and disaster relief solutions? What are we still missing?" in Proceedings of the 1st International Conference on Wireless Technologies for Humanitarian Relief (ACWR'11), 2011, pp. 217–217.
[41] G. V. Kumar, Y. V. Reddyr, and D. M. Nagendra, "Current research work on routing protocols for MANET: A literature survey," International Journal on Computer Science and Engineering, vol. 2, no. 03, pp. 706–713, 2010.
[42] goTenna. (2019, Aug.) goTenna Mesh Network. [Online]. Available: https://gotenna.com/
[43] M. Adeyeye and P. Gardner-Stephen, "The Village Telco project: A reliable and practical wireless mesh telephony infrastructure," EURASIP Journal on Wireless Communications and Networking, vol. 2011, no. 1, p. 78, 2011.
[44] The Osmocom Project. (2020) Open Source Mobile Communications. [Online]. Available: http://osmocom.org/
[45] NetHope, Inc. NetHope. [Online]. Available: https://nethope.…
[46] ARRL. Amateur Radio Emergency Service (ARES). [Online]. Available: https://arrl.org/ares
[47] Sahana Foundation. (2020) Open Source Disaster Management Software. [Online]. Available: https://sahanafoundation.org/
[48] J. Gillis, P. Calyam, A. Bartels, M. Popescu, S. Barnes, J. Doty, D. Higbee, and S. Ahmad, "Panacea's Glass: Mobile cloud framework for communication in mass casualty disaster triage," in …. IEEE, 2015, pp. 128–134.
[49] D. Jiang, R. Huang, P. Calyam, J. Gillis, O. Apperson, D. Chemodanov, F. Demir, and S. Ahmad, "Hierarchical cloud-fog platform for communication in disaster incident coordination," in ….
[50] [Online]. Available: https://slac.stanford.edu/xorg/nmtf/nmtf-tools.html
[51] B. B. Lowekamp, "Combining active and passive network measurements to build scalable monitoring systems on the grid," SIGMETRICS Perform. Eval. Rev., vol. 30, no. 4, pp. 19–26, Mar. 2003. [Online]. Available: https://doi.org/10.…
[52] …, USENIX Winter, ….
[53] tcpdump. [Online]. Available: https://tcpdump.…
[54] "Linux Netlink as an IP Services Protocol," RFC 3549. [Online]. Available: https://www.ietf.org/rfc/rfc3549.txt
[55] "InMon Corporation's sFlow," RFC 3176. [Online]. Available: https://www.ietf.org/rfc/rfc3176.txt
[56] W. John, S. Tafvelin, and T. Olovsson, "Passive internet measurement: Overview and guidelines based on experiences," Computer Communications, vol. 33, no. 5, pp. 533–550, 2010.
[57] K. Keys, D. Moore, R. Koga, E. Lagache, M. Tesch, and k. claffy, "The architecture of CoralReef: An Internet traffic monitoring software suite," in Passive and Active Network Measurement Workshop (PAM). Amsterdam, Netherlands: RIPE NCC, Apr. 2001.
[58] V. Paxson, "Using Bro to detect network intruders: Experiences and status," in Proceedings of the First International Workshop on Recent Advances in Intrusion Detection, ….
[59] Wireshark. [Online]. Available: https://wireshark.…
[60] Snort. [Online]. Available: https://snort.org/
[61] InMon Corp. (2020) sFlow. [Online]. Available: https://sflow.…
[62] "Internet Control Message Protocol," RFC 792. [Online]. Available: https://www.ietf.org/rfc/rfc792.txt
[63] "Traceroute Using an IP Option," RFC 1393. [Online]. Available: https://www.ietf.org/rfc/rfc1393.txt
[64] iPerf Authors. (2020) iPerf. [Online]. Available: https://iperf.fr/
[65] BWPing Authors. (2020) BWPing. [Online]. Available: https://bwping.sourceforge.io/
[66] C. Dovrolis, P. Ramanathan, and D. Moore, "What do packet dispersion techniques measure?" in Proceedings IEEE INFOCOM 2001, vol. 2. IEEE, 2001, pp. 905–914.
[67] R. Kapoor, L.-J. Chen, L. Lao, M. Gerla, and M. Y. Sanadidi, "CapProbe: A Simple and Accurate Capacity Estimation Technique," in Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM '04). New York, NY, USA: Association for Computing Machinery, 2004, pp. 67–78. [Online]. Available: https://doi.org/10.…
[68] Scapy. ….
[69] "Cisco Systems NetFlow Services Export Version 9," RFC 3954. [Online]. Available: https://www.ietf.org/rfc/rfc3954.txt
[70] "Specification of the IP Flow Information Export (IPFIX) Protocol," RFC 7011. [Online]. Available: https://www.ietf.org/rfc/rfc7011.txt
[71] "A Simple Network Management Protocol (SNMP)," RFC 1157. [Online]. Available: https://www.ietf.org/rfc/rfc1157.txt
[72] Moloch Developers. (2020) Moloch Full Packet Capture. [Online]. Available: https://molo.ch/
[73] Tobi Oetiker. (2020) The Multi Router Traffic Grapher. [Online]. Available: https://oss.oetiker.…
[74] OpenNMS. [Online]. Available: https://opennms.…
[75] Cacti. [Online]. Available: https://cacti.…
[76] Zabbix. [Online]. Available: https://zabbix.…
[77] Wired. [Online]. Available: https://wired.…
[78] "The RObust Header Compression (ROHC) Framework," RFC 5795. [Online]. Available: https://www.ietf.org/rfc/rfc5795.txt