A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

Abstract

Programmable data planes allow users to define their own data plane algorithms for network devices including appropriate data plane application programming interfaces (APIs) which may be leveraged by user-defined software-defined networking (SDN) control. This offers great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community, and it is supported by various software and hardware platforms. In the first part of this paper we give a tutorial of data plane programming models, the P4 programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we categorize a large body of literature of P4-based applied research into different research domains, summarize the contributions of these papers, and extract prototypes, target platforms, and source code availability. For each research domain, we analyze how the reviewed works benefit from P4's core features. Finally, we discuss potential next steps based on our findings.


This work has been submitted to the IEEE for possible publication in the Communications Surveys & Tutorials (COMST) journal. Copyright may be transferred without notice, after which this version may no longer be accessible.

A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

Frederik Hauser, Marco Häberle, Daniel Merling, Steffen Lindner, Vladimir Gurevich, Florian Zeiger, Reinhard Frank, and Michael Menth

Abstract: With traditional networking, users can configure control plane protocols to match the specific network configuration, but without the ability to fundamentally change the underlying algorithms. With software-defined networking (SDN), users may provide their own control plane that can control network devices through their data plane application programming interfaces (APIs). Programmable data planes allow users to define their own data plane algorithms for network devices, including appropriate data plane APIs which may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming protocol-independent packet processors (P4) [1] has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability.

Index Terms: P4, SDN, programmable data planes

I. INTRODUCTION

Traditional networking devices such as routers and switches process packets using data and control plane algorithms. Users can configure control plane features and protocols, e.g., via CLIs, web interfaces, or management APIs, but the underlying algorithms can only be changed by the vendor. SDN attempts to make the network devices programmable by introducing an API that allows users to bypass the built-in control plane algorithms and replace them with user-defined ones. Those algorithms are expressed in software and typically run on an SDN controller with an overall view of the network.

Frederik Hauser, Marco Häberle, Daniel Merling, Steffen Lindner, and Michael Menth are with the Chair of Communication Networks, University of Tuebingen, Tuebingen, Germany. Vladimir Gurevich is with Intel, Barefoot Division (BXD), United States of America. Florian Zeiger and Reinhard Frank are with Siemens AG, Corporate Technology, Munich, Germany.

Thereby, complex control plane algorithms that were designed for distributed control can be replaced by simpler, centralized algorithms. This is beneficial for use cases with high demands with regard to flexibility, efficiency, and security, e.g., massive data centers or 5G networks. However, SDN allows users to provide only their own control plane while the data planes of the devices are still non-replaceable and remain under the control of the vendors. This restriction is removed by data plane programming. It enables users to define their own data plane algorithms. This substantially increases their capabilities, as they can build custom network equipment without any compromise in performance, scalability, speed, or power consumption. There are different data plane programming models, each with many implementations and programming languages [2]–[14].

Programming protocol-independent packet processors (P4) is currently the most widespread abstraction, programming language, and concept for data plane programming. First published as a research paper in 2014 [1], it is now developed and standardized in the P4 Language Consortium, it is supported by various software- and hardware-based target platforms, and it is widely applied in academia and industry. In this paper, we give an overview of this technology. In the following, we clarify the novelty and contribution of this paper and outline its organization.

A. Novelty and Contributions

There are numerous surveys on SDN published in 2014 [15], [16], 2015 [17]–[19], and 2016 [20], [21] as well as surveys on OpenFlow (OF) from 2014 [22]–[24], but only one of them [21] mentions P4 in a single sentence. Two surveys of data plane programming from 2015 [18], [19] were published shortly after the release of P4; one conference paper from 2018 [25] and a survey from 2019 [26] present P4 as just one among several data plane programming languages.

To the best of our knowledge, this is the first survey that exclusively covers P4 and its applications. We consider publications on P4 that were published until the end of 2020. Besides journal, conference, and workshop papers, we also include contents from standards, websites, and source code repositories. We included 497 references of which 367 are scientific publications. 116 scientific publications are from 2020, 113 from 2019, 65 from 2018, and 73 from 2017 and before.

We pursue two objectives. First, we give a comprehensive introduction and overview of P4. Second, we present a survey of publications that describe applied research based on P4 technology.

The main contributions of the survey are the following:

• We explain the evolution to data plane programming with P4, relate it to prior developments such as SDN, and compare it to other data plane programming models.
• We give an overview of data plane programming with P4. It comprises the P4 programming language, architectures, compilers, targets, and data plane APIs.
• We present a survey of research efforts to advance P4 data planes. It comprises optimization of development and deployment, testing and debugging, research on P4 targets, and advances on control plane operation.
• We analyze a large body of literature considering P4-based applied research. We categorize research papers into different application domains, summarize their key contributions, and characterize them with respect to prototypes, target platforms, and source code availability.

B. Paper Organization

Figure 1 depicts the structure of this paper. We divide the survey into two main parts: an overview of P4 and a survey of research publications that show the various domains where P4 was applied.

The first part comprises an overview of P4. Section II gives an introduction to network programmability. We describe the development from traditional networking and SDN to data plane programming and present the two most common data plane programming models. In Section III, we give a technology-oriented tutorial of P4 based on its latest version P4_16. We introduce the P4 programming language and describe how user-provided P4 programs are compiled to and executed on P4 targets. Section IV presents the concept of P4 architectures as an intermediate layer between the P4 programs and the targets. We introduce the four most common architectures in detail and describe P4 compilers. In Section V, we categorize and present platforms that execute P4 programs, so-called P4 targets, that are based on software, FPGAs, ASICs, or NPUs. Section VI gives an introduction to data plane APIs. We describe their functions, present a characterization, introduce the four main P4 data plane APIs that serve as interfaces for SDN controllers, and point out controller use case patterns. In Section VII, we summarize research efforts that aim to improve P4 data plane programming.

In the second part of the paper, we survey P4-based applied research in communication networks. In Section VIII, we present an overview of the research domains and show statistics about the included publications. The superordinate research domains are monitoring (Section IX), traffic management and congestion control (Section X), routing and forwarding (Section XI), advanced networking (Section XII), network security (Section XIII), and miscellaneous (Section XIV) to cover additional topics. Each category includes a table to give a quick overview of the analyzed papers with regard to prototype implementations, target platforms, and source code availability.

Section XV concludes this work. The appendix includes a list of the acronyms used in the paper.

(Figure 1 omitted: Introduction (Sect. I); Part I, Overview of P4 (Sect. II to VII); Part II, Applied Research Domains (Sect. VIII to XIV); Conclusion (Sect. XV).)

Fig. 1: Organization of the paper.

II. NETWORK PROGRAMMABILITY

In this section, we first define the notion of network programmability and related terms. Then, we discuss control plane programmability and data plane programming, elaborate on data plane programming models, and point out the benefits of data plane programming.

A. Definition of Terms

We define programmability as the ability of the software or the hardware to execute an externally defined processing algorithm. This ability separates programmable entities from flexible (or configurable) ones; the latter only allow changing different parameters of the internally defined algorithm which stays the same.

Thus, the term network programmability means the ability to define the processing algorithm executed in a network and specifically in individual processing nodes, such as switches, routers, load balancers, etc. It is usually assumed that no special processing happens in the links connecting network nodes. If necessary, such processing can be described as if it takes place on the nodes that are the endpoints of the links or by adding a "bump-in-the-wire" node with one input and one output.

Traditionally, the algorithms executed by telecommunication devices are split into three distinct classes: the data plane, the control plane, and the management plane. Out of these three classes, the management plane algorithms have the smallest effect on both the overall packet processing and network behavior. Moreover, they have been programmable for decades, e.g., SNMPv1 was standardized in 1988 and created even earlier than that. Therefore, management plane algorithms will not be further discussed in this section.

True network programmability implies the ability to specify and change both the control plane and data plane algorithms. In practice this means the ability of network operators (end users) to define both data and control plane algorithms on their own, without the need to involve the original designers of the network equipment. For the network equipment vendors (who typically design their own control plane anyway), network programmability mostly means the ability to define data plane algorithms without the need to involve the original designers of the chosen packet processing application-specific integrated circuit (ASIC).

Network programmability is a powerful concept that allows both the network equipment vendors and the end users to build networks ideally suited to their needs. In addition, they can do it much faster and often cheaper than ever before and without compromising the performance or quality of the equipment.

For a variety of technical reasons, different layers became programmable at different points in time. While the management plane became programmable in the 1980s, control plane programmability was not achieved until the late 2000s to early 2010s, and programmable switching ASICs did not appear until the end of 2015.

Thus, despite the focus on data plane programmability, we will start by discussing control plane programmability and its most well-known embodiment, called software-defined networking (SDN). This discussion will also better prepare us to understand the significance of data plane programmability.

B. Control Plane Programmability and SDN

Traditional networking devices such as routers or switches have complex data and control plane algorithms. They are built into them and generally cannot be replaced by the end users. Thus, the functionality of a device is defined by its vendor who is the only one who can change it. In industry parlance, vendors are often called original equipment manufacturers (OEMs).

Software-defined networking (SDN) was historically the first attempt to make the devices, and specifically their control plane, programmable. On selected systems, device manufacturers allowed users to bypass built-in control plane algorithms so that the users can introduce their own. These algorithms could then directly supply the necessary forwarding information to the data plane which was still non-replaceable and remained under the control of the device vendor or their chosen silicon provider.

For a variety of technical reasons, it was decided to provide an API that could be called remotely, and this is how SDN was born. Figure 2 depicts SDN in comparison to traditional networking. Not only did the control plane become programmable, but it also became possible to implement network-wide control plane algorithms in a centralized controller. In several important use cases, such as tightly controlled, massive data centers, these centralized, network-wide algorithms proved to be a lot simpler and more efficient than the traditional algorithms (e.g., Border Gateway Protocol (BGP)) designed for decentralized control of many autonomous networks.

The effort to standardize this approach resulted in the development of OpenFlow (OF) [27]. The hope was that once OF standardized the messaging API to control the data plane functionality, SDN applications would be able to leverage the functions offered by this API to implement network control. There is a huge body of literature giving an overview of OF [22]–[24] and SDN [15]–[21].

However, it soon became apparent that OF assumed a specific data plane functionality which was not formally specified. Moreover, the specific data plane that served as the basis for OF could not be changed. It executed the sole, although relatively flexible, algorithm defined by the OF specifications. In part, it was this realization that led to the development of modern data plane programming that we discuss in the following section.

Fig. 2: Distinction between traditional networking, SDN with fixed-function data planes, and data plane programming. (Figure omitted; surviving labels: control plane, data plane, API, agent, programmability by the user.)

C. Data Plane Programming

As mentioned above, data plane programmability means that the data plane with its algorithms can be defined by the end users, be they network operators or equipment designers working with a packet processing ASIC. In fact, data plane programmability existed during most of the networking industry history, because data plane algorithms were typically executed on general-purpose CPUs. It is only with the advent of high-speed links, exceeding the CPU processing capabilities, and the subsequent introduction of packet processing (switching) ASICs that data plane programmability (or lack thereof) became an issue.

The data plane algorithms are responsible for processing all the packets that pass through a telecommunication system. Thus, they ultimately define the functionality, performance, and scalability of such systems. Any attempt to implement data plane functionality in the upper layers, such as the control plane, typically leads to significant performance degradation. When data plane programming is provided to end users, it qualitatively changes their power. They can build custom network equipment without any compromise in performance, scalability, speed, or energy consumption.

For custom networks, users can design new control planes and SDN applications together with data plane algorithms that fit them ideally. Data plane programming does not necessarily imply any provision of APIs for end users, nor does it require support for outside control planes as in OF. Device vendors might still decide to develop a proprietary control plane and use data plane programming only for their own benefit without necessarily making their systems more open (although many do open their systems now). Figure 3 visualizes both options.

Four surveys [18], [19], [25], [26] give an overview of data plane programming but do not focus on P4 in particular.

Fig. 3: Data plane programmability may be used by vendors for more efficient development or by end users to provide their own data and control plane algorithms. (Figure omitted; panels: vendor-based creation of network devices with data plane programming, and full network programmability with data plane programming. Labels: control plane, data plane, API, programmability by the user.)

D. Data Plane Programming Models

Data plane algorithms can be, and often are, expressed using standard programming languages. However, they do not map very well onto specialized hardware such as high-speed ASICs. Therefore, several data plane models have been proposed as abstractions of the hardware. Data plane programming languages are tailored to those data plane models and provide ways to express algorithms for them in an abstract way. The resulting code is then compiled for execution on a specific packet processing node supporting the respective data plane programming model.

Data flow graph abstractions and the Protocol-Independent Switching Architecture (PISA) are examples of data plane models. We give an overview of the first and elaborate in depth on the second, as PISA is the data plane programming model for P4.

1) Data Flow Graph Abstractions:

In these data plane programming models, packet processing is described by a directed graph. The nodes of the graph represent simple, reusable primitives that can be applied to packets, e.g., packet header modifications. The directed edges of the graph represent packet traversals where traversal decisions are performed in nodes on a per-packet basis. Figure 4 shows an exemplary graph for IPv4 and IPv6 packet forwarding.

Examples of programming languages that implement this data plane programming model are Click [2], Vector Packet Processors (VPP) [3], and BESS [4].
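The graph model described above can be illustrated with a minimal Python sketch. It is not taken from Click, VPP, or BESS; the node names loosely follow the IPv4/IPv6 forwarding example, and the packet representation (a dictionary with `ethertype`, `dst`, and `hop_limit` fields) as well as the `LOCAL_ADDRESSES` set are assumptions made purely for illustration. Each node is a small primitive that returns the name of the next node, i.e., it takes the per-packet traversal decision.

```python
from typing import Any, Callable, Dict, List, Optional

Packet = Dict[str, Any]                      # hypothetical packet representation
NodeFn = Callable[[Packet], Optional[str]]   # returns next node name, or None to stop

LOCAL_ADDRESSES = {"10.0.0.1", "2001:db8::1"}  # assumed addresses of this node


def ethernet_in(pkt: Packet) -> Optional[str]:
    # Traversal decision: pick the outgoing edge based on the EtherType.
    return {0x0800: "ipv4_input", 0x86DD: "ipv6_input"}.get(pkt["ethertype"])


def ip_input(next_node: str) -> NodeFn:
    def node(pkt: Packet) -> Optional[str]:
        # Header modification primitive: decrement TTL / hop limit.
        pkt["hop_limit"] -= 1
        return next_node if pkt["hop_limit"] > 0 else None  # expired: drop
    return node


def ip_lookup(local_node: str, out_node: str) -> NodeFn:
    def node(pkt: Packet) -> Optional[str]:
        # Deliver locally if the destination is one of our own addresses.
        return local_node if pkt["dst"] in LOCAL_ADDRESSES else out_node
    return node


def sink(pkt: Packet) -> Optional[str]:
    return None  # terminal node: no outgoing edge


GRAPH: Dict[str, NodeFn] = {
    "ethernet_in": ethernet_in,
    "ipv4_input": ip_input("ipv4_lookup"),
    "ipv4_lookup": ip_lookup("ipv4_local", "ipv4_out"),
    "ipv4_local": sink,
    "ipv4_out": sink,
    "ipv6_input": ip_input("ipv6_lookup"),
    "ipv6_lookup": ip_lookup("ipv6_local", "ipv6_out"),
    "ipv6_local": sink,
    "ipv6_out": sink,
}


def traverse(pkt: Packet, start: str = "ethernet_in") -> List[str]:
    """Push one packet through the graph and return the visited node names."""
    path: List[str] = []
    node: Optional[str] = start
    while node is not None:
        path.append(node)
        node = GRAPH[node](pkt)
    return path
```

Calling `traverse` with an IPv4 packet for a non-local destination visits the Ethernet input, IPv4 input, IPv4 lookup, and IPv4 output nodes in that order; packets with an unknown EtherType stop at the first node.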

2) Protocol-Independent Switching Architecture (PISA):

Figure 5 depicts the PISA. It is based on the concept of a programmable match-action pipeline that matches modern switching hardware well. It is a generalization of reconfigurable match-action tables (RMTs) [5] and disaggregated reconfigurable match-action tables (dRMTs) [6].

PISA consists of a programmable parser, a programmable deparser, and a programmable match-action pipeline in between consisting of multiple stages.

Fig. 4: Data flow graph abstraction: example graph for IPv4 and IPv6 forwarding. (Figure omitted; surviving node labels: Ethernet in, IPv4 input, IPv4 lookup, IPv4 out, IPv4 local, IPv6 input, IPv6 lookup, IPv6 out, IPv6 local.)

Fig. 5: Protocol-Independent Switch Architecture (PISA). (Figure omitted: a programmable parser, a programmable match-action pipeline built from match-action units with match logic and action logic, and a programmable deparser, with metadata passed between the blocks.)

• The programmable parser allows programmers to declare arbitrary headers together with a finite state machine that defines the order of the headers within packets. It converts the serialized packet headers into a well-structured form.
• The programmable match-action pipeline consists of multiple match-action units. Each unit includes one or more match-action tables (MATs) to match packets and perform match-specific actions with supplied action data. The bulk of a packet processing algorithm is defined in the form of such MATs. Each MAT includes matching logic coupled with the memory (static random-access memory (SRAM) or ternary content-addressable memory (TCAM)) to store lookup keys and the corresponding action data. The action logic, e.g., arithmetic operations or header modifications, is implemented by arithmetic logic units (ALUs). Additional action logic can be implemented using stateful objects, e.g., counters, meters, or registers, that are stored in the SRAM. A control plane manages the matching logic by writing entries in the MATs to influence the runtime behavior.
• In the programmable deparser, programmers declare how packets are serialized.

A packet processed by a PISA pipeline consists of packet payload and packet metadata. PISA only processes packet metadata that travels from the parser all the way to the deparser but not the packet payload that travels separately.

Packet metadata can be divided into packet headers, user-defined metadata, and intrinsic metadata.

• Packet headers are metadata that correspond to the network protocol headers. They are usually extracted in the parser, emitted in the deparser, or both.
• Intrinsic metadata is metadata that relates to the fixed-function components. P4-programmable components may receive information from the fixed-function components by reading the intrinsic metadata they produce or control their behavior by setting the intrinsic metadata they consume.
• User-defined metadata (often referred to simply as metadata) is temporary storage, similar to local variables in other programming languages. It allows the developers to add information to packets that can be used throughout the processing pipeline.

All metadata, be it packet headers, user-defined metadata, or intrinsic metadata, is transient, meaning that it is discarded when the corresponding packet leaves the processing pipeline (e.g., is sent out of an egress port or dropped).

PISA provides an abstract model that is applied in various ways to create concrete architectures. For example, it allows specifying pipelines containing different combinations of programmable components, e.g., a pipeline with no parser or deparser, a pipeline with two parsers and deparsers, and additional match-action pipelines between them. PISA also allows for specialized components that are required for advanced processing, e.g., hash/checksum calculations. Besides the programmable components of PISA, switch architectures typically also include configurable fixed-function components. Examples are ingress/egress port blocks that receive or send packets, packet replication engines that implement multicasting or cloning/mirroring of packets, and traffic managers, responsible for packet buffering, queuing, and scheduling.

The fixed-function components communicate with the programmable ones by generating and/or consuming intrinsic metadata. For example, the ingress port block generates ingress metadata that represents the ingress port number that might be used within the match-action units. To output a packet, the match-action units generate intrinsic metadata that represents an egress port number; this intrinsic metadata is then consumed by the traffic manager and/or egress port block.

Figure 6 depicts a typical switch architecture based on PISA. It comprises a programmable ingress and egress pipeline and three fixed-function components: an ingress block, an egress block, and a packet replication engine together with a traffic manager between ingress and egress pipeline.

Fig. 6: Exemplary switch architecture based on PISA. (Figure omitted: ingress block, programmable ingress pipeline, packet replication engine and traffic manager, programmable egress pipeline, and egress block, with metadata passed between them.)

P4 (Programming Protocol-Independent Packet Processors) [1] is the most widely used domain-specific programming language for describing data plane algorithms for PISA. Its initial idea and name were introduced in 2013 [28] and it was published as a research paper in 2014 [1]. Since then, P4 has been further developed and standardized by the P4 Language Consortium [29], which has been part of the Open Networking Foundation (ONF) since 2019. The P4 Language Consortium is managed by a technical steering committee and hosts five working groups (WGs). P4_14 [30] was the first standardized version of the language. The current specification is P4_16 [31] which was first introduced in 2016.

Other data plane programming languages for PISA are FAST [7], OpenState [8], Domino [9], FlowBlaze [10], Protocol-Oblivious Forwarding [11], and NetKAT [12]. In addition, Broadcom [13] and Xilinx [14] offer vendor-specific programmable data planes based on match-action tables.
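To make the interplay of match-action tables, control plane, and the three kinds of packet metadata more tangible, the following Python sketch models a single exact-match MAT. It is a simplified illustration under stated assumptions, not P4 code and not the behavior of any particular target: the class and function names (`PacketMetadata`, `MatchActionTable`, `forward`, `drop`, `run_pipeline`) are hypothetical, and a Python dictionary stands in for the SRAM/TCAM lookup memory.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Hashable, List, Tuple


@dataclass
class PacketMetadata:
    """Everything that travels through the pipeline (the payload does not)."""
    headers: Dict[str, Dict[str, Any]]                      # parsed protocol headers
    user: Dict[str, Any] = field(default_factory=dict)      # user-defined metadata
    intrinsic: Dict[str, Any] = field(default_factory=dict) # fixed-function interface


Action = Callable[..., None]


class MatchActionTable:
    """Exact-match table: lookup key -> (action, action data)."""

    def __init__(self, key_fn: Callable[[PacketMetadata], Hashable],
                 default_action: Action) -> None:
        self.key_fn = key_fn
        self.default_action = default_action
        self.entries: Dict[Hashable, Tuple[Action, Dict[str, Any]]] = {}

    def add_entry(self, key: Hashable, action: Action, **action_data: Any) -> None:
        # Written by the control plane at runtime to steer the data plane.
        self.entries[key] = (action, action_data)

    def apply(self, meta: PacketMetadata) -> None:
        # Match on the key; on a miss, run the default action without data.
        action, data = self.entries.get(self.key_fn(meta), (self.default_action, {}))
        action(meta, **data)


def forward(meta: PacketMetadata, port: int) -> None:
    meta.headers["ipv4"]["ttl"] -= 1          # header modification
    meta.intrinsic["egress_port"] = port      # consumed by the traffic manager


def drop(meta: PacketMetadata) -> None:
    meta.intrinsic["drop"] = True


def run_pipeline(meta: PacketMetadata, stages: List[MatchActionTable]) -> PacketMetadata:
    for table in stages:                      # stages are applied in order
        table.apply(meta)
    return meta


if __name__ == "__main__":
    ipv4_table = MatchActionTable(lambda m: m.headers["ipv4"]["dst"], drop)
    ipv4_table.add_entry("10.0.0.2", forward, port=3)   # control plane write
    pkt = PacketMetadata(headers={"ipv4": {"dst": "10.0.0.2", "ttl": 64}},
                         intrinsic={"ingress_port": 1})
    print(run_pipeline(pkt, [ipv4_table]).intrinsic)
```

The sketch separates what is fixed at compile time (the key function and the available actions) from what the control plane changes at runtime (the table entries), which mirrors the division of roles described in this subsection.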

E. Benefits

Data plane programmability entails multiple benefits. We compiled a list which is certainly not complete.

• Data plane programming introduces full flexibility to network packet processing. Algorithms, protocols, and features can be added, modified, or removed. As a result, only components needed for a desired node behavior are included in the code, which leads to systems that are more efficient than multi-purpose appliances. Thus, complexity can be kept low, which also facilitates smaller attack surfaces.
• With data plane programming, end users can be provided with an API for control plane programmability and SDN.
• Data plane programming allows network equipment designers and even end users to experiment with new protocols and design unique applications without being dependent on the vendors of the specialized packet-processing ASICs to implement custom algorithms. As a result, the speed of network innovation increases by several orders of magnitude: a new algorithm can be coded and deployed in a matter of days whereas the turnaround time for a new silicon-based solution is measured in years.
• Network equipment developers can easily create differentiated products despite using the same packet processing ASIC. In addition, they can keep their know-how to themselves without the need to share the details with the ASIC vendor and potentially disclose it to their competitors that will use the same ASIC.
• Data plane programming does not require but encourages full transparency. If the source code is shared, all definitions for protocols and behaviors can be viewed, analyzed, and reasoned about.
• Data plane programming facilitates export of intermediate results generated by a data plane program. They may serve as the basis for well-specified debugging and telemetry information.
• So far, modern data plane programs and programming languages have not yet achieved the degree of portability attained by the general-purpose programming languages. However, expressing data plane algorithms in a high-level language has the potential to make telecommunication systems significantly more target-independent. As a result, end users could choose cost-efficient hardware that is well suited for their purposes and run their algorithms on top of it. This trend has been fueled by SDN and is commonly known as network disaggregation.

III. THE P4 PROGRAMMING LANGUAGE

We give an overview of the P4 programming language. We briefly recap its specification history and describe how P4 programs are deployed. We introduce the P4 processing pipeline and data types. Finally, we discuss parsers, match-action controls, and deparsers.

A. Specification History

The P4 Language Design Working Group (LDWG) of the P4 Language Consortium has so far standardized two distinct versions of P4: P4_14 and P4_16. Table I depicts their specification history.

Standard   Version   Release
P4_14      1.0.2     03/2015
P4_14      1.1.0     01/2016
P4_14      1.0.3     11/2016
P4_14      1.0.4     05/2017
P4_14      1.0.5     11/2018
P4_16      1.0.0     05/2017
P4_16      1.1.0     11/2018
P4_16      1.2.0     11/2018
P4_16      1.2.1     06/2020

TABLE I: Specification history of P4_14 and P4_16.

The P4_14 programming language dialect allows programmers to describe data plane algorithms using a combination of familiar, general-purpose imperative constructs and more specialized declarative ones that provide support for the typical data-plane-specific functionality, e.g., counters, meters, checksum calculations, etc. As a result, the P4_14 language core includes more than 70 keywords. It further assumes a specific pipeline architecture based on PISA.

P4_16 has been introduced to address several P4_14 limitations that became apparent in the course of its use. Those include the lack of means to describe various targets and architectures, weak typing and generally loose semantics (caused, in part, by the above-mentioned mix of imperative and declarative programming constructs), relatively low-level constructs, and weak support for program modularity.

Support for multiple different targets and pipeline architectures is the major contribution of the P4_16 standard and is achieved by separating the core language from the specifics of a given architecture, thus making it architecture-agnostic. The structure, capabilities, and interfaces of a specific pipeline are now encapsulated into an architecture description, while the architecture- or target-specific functions are accessible through an architecture library, typically provided by the target vendor. The core components are further structured into a small set of language constructs and a core library that is useful for most P4 programs. Compared to P4_14, P4_16 introduced strict typing, expressions, nested data structures, and several modularity mechanisms, and it removed declarative constructs, making it possible to better reason about programs written in the language. Figure 7 illustrates the concept which is subdivided into core components and architecture components.

Fig. 7: Comparison of the P4_14 and P4_16 language according to [31]. (Figure omitted: the P4_14 language is contrasted with P4_16, which is split into core components (P4_16 language, core library) and architecture components (architecture description, architecture library).)

Due to the obvious advantages of P4_16, P4_14 development has been discontinued, although it is still supported on a number of targets. Therefore, we focus on P4_16 in the remainder of this paper where P4 implicitly stands for P4_16.

B. Development and Deployment Process

Figure 8 illustrates the development and deployment process of P4 programs.

P4-programmable nodes, so-called P4 targets, are available as software or specialized hardware (see Section V). They feature packet processing pipelines consisting of both P4-programmable and fixed-function components. The exact structure of these pipelines is target-specific and is described by a corresponding P4 architecture model (see Section IV) which is provided by the manufacturer of the target.

P4 programs are supplied by the user and are implemented for a particular P4 architecture model. They define algorithms that will be executed by the P4-programmable components and their interaction with the ones implemented in the fixed-function logic. The composition of the P4 programs and the fixed-function logic constitutes the full data plane algorithm.

P4 compilers (see Section IV) are also provided by the manufacturers. They translate P4 programs into target-specific code which is loaded and executed by the P4 target.

The P4 compiler also generates a data plane API that can be used by a user-supplied control plane (see Section VI) to manage the runtime behavior of the P4 target.

[Figure 8: The user supplies the P4 program (data plane) and the control plane. The manufacturer supplies the P4 architecture model, the P4 compiler, and the P4 target. The P4 compiler produces the code for the P4 target and the data plane API.]

Fig. 8: P4 deployment process according to [31].

C. Information Flow

P4 adopts PISA's concept of packet metadata. Figure 9 illustrates the information flow in the P4 processing pipeline. It comprises different blocks, where packet metadata (be it headers, user-defined metadata, or intrinsic metadata) is used to pass information between them and therefore represents a uniform interface.

The parser splits up the received packet into individual headers and the remaining payload. Intrinsic metadata from the ingress block, e.g., the ingress port number or the ingress timestamp, is often provided by the hardware and can be made available for further processing. Many targets allow the user metadata to be initialized in the parser as well. Then, the headers and metadata are passed to the match-action pipeline that consists of one or more match-action units. The remaining payload travels separately and cannot be directly affected by the match-action pipeline processing.

While traversing the individual match-action pipeline units, headers can be added, modified, or removed and additional metadata can be generated.

The deparser assembles the packet back by emitting the specified headers followed by the original packet payload. Packet output is configured with intrinsic metadata that includes information such as a drop flag, desired egress port, queue number, etc.

[Figure 9: The parser, the match-action units, and the deparser are P4 blocks with interfaces. Headers, user metadata, and intrinsic metadata are passed from block to block, while the payload travels from the parser to the deparser separately.]

Fig. 9: Information ﬂow.

D. Data Types

P4 is a statically typed language that supports a rich set of data types for data plane programming.

1) Basic Data Types: P4 includes common basic types such as Boolean (bool), signed integers (int), and unsigned integers (bit), which are also known as bit strings. Unlike in many common programming languages, the size of these integers is specified at bit granularity, with a wide range of supported widths. For example, types such as bit<1>, int<3>, bit<128>, and wider are allowed.

In addition, P4 supports bit strings of variable width, represented by the special varbit type. For example, IPv4 options can be represented as varbit<320> since the size of IPv4 options ranges from zero to ten 32-bit words.

P4 also supports enumeration types. They can be serializable, with the actual representation specified as bit<W> or int<W> in the type definition, or non-serializable, where the type representation is chosen by the compiler and hidden from the user.
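The basic types described above can be sketched as follows. This is only an illustrative fragment: all type, enum, and header names (flag_t, smallInt_t, EtherType_t, Verdict_t, ipv4_options_t) are hypothetical and do not come from the surveyed works.

```p4
#include <core.p4>

// Fixed-width integers with widths given at bit granularity.
typedef bit<1>   flag_t;       // single-bit unsigned value
typedef int<3>   smallInt_t;   // 3-bit signed integer
typedef bit<128> ipv6Addr_t;   // wide bit string, e.g., an IPv6 address

// Serializable enum: the representation is fixed to bit<16> by the programmer.
enum bit<16> EtherType_t {
    IPV4 = 0x0800,
    MPLS = 0x8847
}

// Non-serializable enum: the compiler chooses a hidden representation.
enum Verdict_t {
    FORWARD,
    DROP
}

// Variable-width bit string that can hold up to ten 32-bit words of IPv4 options.
header ipv4_options_t {
    varbit<320> options;
}
```

Because EtherType_t has a declared bit<16> representation, it may appear as a header field, whereas Verdict_t can only be used for program-internal values such as metadata.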

2) Derived Data Types:

Basic data types can be composed to construct derived data types. The most common derived data types are header, header stack, and struct.

The header data type facilitates the definition of packet protocol headers, e.g., IPv4 or TCP. A header consists of one or more fields of the serializable types described above, typically bit, serializable enum, or varbit. A header also has an implicit validity field indicating whether the header is part of a packet. The field is accessible through standard methods such as setValid(), setInvalid(), and isValid(). Packet parsing starts with all headers being invalid. If the parser determines that a header is present in the packet, the header fields are extracted and the header's validity field is set to valid. The standard packet emit() method used by a deparser equips packets only with valid headers. Thus, P4 programs can easily add and remove headers by manipulating their validity bits. A sample header declaration is shown in Figure 10.

A header stack is used to define repeating headers, e.g., VLAN tags or MPLS labels. It supports special operations allowing headers to be "pushed" onto the stack or "popped" from it.

A struct in P4 is a composed data type similar to structs in programming languages like C. Unlike headers, structs can contain fields of any type, including other structs, headers, and others.

    typedef bit<48> macAddr_t;

    header ethernet_t {
        macAddr_t dstAddr;
        macAddr_t srcAddr;
        bit<16>   etherType;
    }

Fig. 10: Sample declaration of the Ethernet header.
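To complement the header declaration, the following minimal sketch shows a header stack inside a struct and the use of the validity methods. The names mpls_t, headers_t, and PushMplsLabel as well as the label value 100 and the TTL of 64 are assumptions made for illustration only.

```p4
#include <core.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

header mpls_t {
    bit<20> label;
    bit<3>  tc;
    bit<1>  bos;   // bottom-of-stack flag
    bit<8>  ttl;
}

// A struct may combine headers, header stacks, and other types.
struct headers_t {
    ethernet_t ethernet;
    mpls_t[4]  mpls;   // header stack with room for four labels
}

control PushMplsLabel(inout headers_t h) {
    apply {
        if (h.ethernet.isValid()) {
            // The new label is bottom-of-stack only if no label was present.
            bit<1> bottom = 1;
            if (h.mpls[0].isValid()) {
                bottom = 0;
            }
            // push_front shifts existing elements; the new element 0 is invalid
            // until it is explicitly made valid.
            h.mpls.push_front(1);
            h.mpls[0].setValid();
            h.mpls[0].label = 100;
            h.mpls[0].tc    = 0;
            h.mpls[0].bos   = bottom;
            h.mpls[0].ttl   = 64;
            h.ethernet.etherType = 0x8847;
        }
    }
}
```

Since a deparser emits only valid headers, the pushed label becomes part of the outgoing packet without any further bookkeeping.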

E. Parsers

Parsers extract header fields from ingress packets into header data and metadata. P4 does not include predefined packet formats, i.e., all required header formats including parsing mechanisms need to be part of the P4 program. Parsers are defined as finite state machines (FSMs) with an explicit Start state, two ending states (Accept and Reject), and custom states in between.

Figure 12 depicts the structure of a typical P4 parser for Ethernet, MPLS, IPv4, TCP, and UDP headers. Figure 11 shows the source code fragment of the example parser in a P4 program. The process starts in the Start state and switches to the Ethernet state. In this state and the following states, information from the packet headers is extracted according to the defined header structure.

State transitions may be either conditional or unconditional. In the given example, the transition from the Start state to the Ethernet state is unconditional, while in the Ethernet state the transition to the MPLS, IPv4, or Reject state depends on the value of the EtherType field of the extracted Ethernet header. Based on previously parsed header information, any number of further headers can be extracted from the packet. If the header order does not comply with the expected order, a packet can be discarded by switching to the Reject state. The parser can also implicitly transition into the Reject state in case of a parser exception, e.g., if a packet is too short.

    parser SampleParser(packet_in p, out headers h) {
        state start {
            transition parse_ethernet;
        }
        state parse_ethernet {
            p.extract(h.ethernet);
            // Conditional transition based on the extracted EtherType
            transition select(h.ethernet.etherType) {
                0x8847: parse_mpls;
                0x0800: parse_ipv4;
                default: reject;
            }
        }
        state parse_ipv4 {
            p.extract(h.ipv4);
            transition select(h.ipv4.protocol) {
                6: parse_tcp;
                17: parse_udp;
                default: accept;
            }
        }
        state parse_udp {
            p.extract(h.udp);
            transition accept;
        }
        /* Other states follow */
    }

Fig. 11: Sample parser implementation of the FSM in Figure 12.

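[Figure 12: FSM with the Start state, custom states such as Ethernet and MPLS, and the ending states Accept and Reject. The transitions out of the Ethernet state depend on the EtherType, with Reject as the default.]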

Fig. 12: Example for the FSM of a P4 parser that parses packets with Ethernet,MPLS, IPv4, TCP, and UDP headers.
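The MPLS branch of such an FSM could be sketched as shown below. This is a hedged illustration rather than the code behind Figure 11: the header layouts, the stack depth of four, the assumption that IPv4 follows the bottom-of-stack label, and the use of error.NoMatch for the version check are all choices made for this example.

```p4
#include <core.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

header mpls_t {
    bit<20> label;
    bit<3>  tc;
    bit<1>  bos;
    bit<8>  ttl;
}

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

struct headers_t {
    ethernet_t ethernet;
    mpls_t[4]  mpls;
    ipv4_t     ipv4;
}

parser MplsParser(packet_in p, out headers_t h) {
    state start {
        p.extract(h.ethernet);
        transition select(h.ethernet.etherType) {
            0x8847: parse_mpls;
            0x0800: parse_ipv4;
            default: reject;
        }
    }
    state parse_mpls {
        // Extract into the next free stack element. A fifth label overflows the
        // stack, which raises a parser error and leads to the Reject state.
        p.extract(h.mpls.next);
        transition select(h.mpls.last.bos) {
            0: parse_mpls;   // more labels follow
            1: parse_ipv4;   // assumption: IPv4 follows the label stack
        }
    }
    state parse_ipv4 {
        p.extract(h.ipv4);
        // Explicit check; a failed verify() also ends parsing in Reject.
        verify(h.ipv4.version == 4, error.NoMatch);
        transition accept;
    }
}
```

The parse_mpls state transitions to itself, which is permitted in parsers; the number of iterations is bounded by the size of the header stack.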

F. Match-Action Controls

Match-action controls express the bulk of the packet processing algorithm and resemble traditional imperative programs. They are executed after successful parsing of a packet. In some architectures they are also called match-action pipeline units. In the following, we give an overview of control blocks, actions, and match-action tables.

1) Control Blocks:

Control blocks, or just controls, are similar to functions in general-purpose languages. They are called by an apply() method. They have parameters and can also call other control blocks. The body of a control block contains the definition of resources, such as tables, actions, and externs that will be used for processing. Furthermore, a single apply() method is defined that expresses the processing algorithm.

P4 offers statements to express the program flow within a control block. Unlike common programming languages, P4 does not provide any statements that would allow the programmer to create loops. This ensures that all algorithms that can be coded in P4 can be expressed as directed acyclic graphs (DAGs) and thus are guaranteed to complete within a predictable time interval. Specific control statements include:
• a block statement {} that expresses sequential execution of instructions
• an if statement that expresses an execution predicated on a Boolean condition
• a switch statement that expresses a choice from multiple alternatives
• an exit statement that ends the control flow within a control block and passes the control to the end of the top-level control

Transformations are performed by several constructs, such as:
• an assignment statement which evaluates the expression on its right-hand side and assigns the result to a header or metadata field
• a match-action operation on a table expressed as the table's apply() method
• an invocation of an action or a function that encapsulates a sequence of statements
• an invocation of an extern method that represents special, target- and architecture-specific processing, often involving additional state preserved between packets

A sample implementation of basic L2 forwarding is provided in Figure 13.
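The control statements listed above can be combined as in the following sketch. The control Classify, the table known_macs, its actions, and the metadata fields punt and flood are hypothetical names introduced only for this example.

```p4
#include <core.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

struct headers_t {
    ethernet_t ethernet;
}

// Hypothetical user metadata.
struct meta_t {
    bit<1> punt;
    bit<1> flood;
}

control Classify(inout headers_t h, inout meta_t m) {
    action mark_known()   { m.flood = 0; }
    action mark_unknown() { m.flood = 1; }

    table known_macs {
        key = { h.ethernet.dstAddr: exact; }
        actions = { mark_known; mark_unknown; }
        default_action = mark_unknown();
    }

    apply {
        // if statement: execution predicated on a Boolean condition
        if (!h.ethernet.isValid()) {
            // exit statement: stop processing in this control
            exit;
        }
        // switch statement: choose an alternative based on the executed action
        switch (known_macs.apply().action_run) {
            mark_unknown: { m.punt = 1; }
            default:      { m.punt = 0; }
        }
    }
}
```

There is no loop construct in this control; every packet follows one path through the if and switch statements, which is what allows the program to be represented as a DAG.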

2) Actions:

Actions are code fragments that can read and write packet headers and metadata. They work similarly to functions in other programming languages but have no return value. Actions are typically invoked from MATs. They can receive parameters that are supplied by the control plane as action data in MAT entries.

As in most general-purpose programming languages, the operations are written using expressions and the results are then assigned to the desired header or metadata fields. The operations available in P4 expressions include standard arithmetic and logical operations as well as more specialized ones such as bit slicing (field[high:low]), bit concatenation (field1 ++ field2), and saturated arithmetic (|+| and |-|).

Actions can also invoke methods of other objects, such as headers and architecture-specific externs, e.g., counters and meters. Other actions can also be called, similar to nested function calls in traditional programming languages.

Action code is executed sequentially, although many hardware targets support parallel execution. In this case, the compiler can optimize the action code for parallel execution as long as its effects are the same as in case of sequential execution.

    control SampleControl(inout headers h,
                          inout standard_metadata_t standard_metadata) {
        // Drop action referenced by the table below (v1model primitive)
        action drop() {
            mark_to_drop(standard_metadata);
        }
        // The egress port is supplied by the control plane as action data
        action l2_forward(egressSpec_t port) {
            standard_metadata.egress_spec = port;
        }
        table l2 {
            key = { h.ethernet.dstAddr: exact; }
            actions = { l2_forward; drop; }
            size = 1024;
            default_action = drop();
        }
        apply {
            if (h.ethernet.isValid()) {
                l2.apply();
            }
        }
    }

Fig. 13: Sample control block implementing basic L2 forwarding.
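The specialized expression operators mentioned for actions can be illustrated with a small sketch. The control ActionDemo, its actions, and the metadata fields dscp, addr_pair, and budget are assumptions chosen for this example and are not part of any architecture.

```p4
#include <core.p4>

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

// Hypothetical user metadata.
struct meta_t {
    bit<6>  dscp;
    bit<64> addr_pair;
    bit<8>  budget;
}

control ActionDemo(inout ipv4_t ipv4, inout meta_t m) {
    action inspect(bit<8> credit) {
        // Bit slicing: the upper six bits of the 8-bit field
        m.dscp = ipv4.diffserv[7:2];
        // Bit concatenation: 32 bits ++ 32 bits yields 64 bits
        m.addr_pair = ipv4.srcAddr ++ ipv4.dstAddr;
        // Saturated arithmetic: clamps at 255 and 0 instead of wrapping around
        m.budget = m.budget |+| credit;
        ipv4.ttl = ipv4.ttl |-| 8w1;
    }

    // An action may call another action, similar to a nested function call.
    action inspect_default() {
        inspect(8w1);
    }

    apply {
        if (ipv4.isValid()) {
            inspect_default();
        }
    }
}
```

When inspect is instead listed in a table, the credit parameter would be supplied by the control plane as action data rather than by the caller.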

3) Match-Action Tables (MATs):

MATs are defined within control blocks and invoke actions depending on header and metadata fields of a packet. The structure of a MAT is declared in the P4 program and its table entries are populated by the control plane at runtime. A packet is processed by selecting a matching table entry and invoking the corresponding action with appropriate parameters.

The declaration of a MAT includes the match key, a list of possible actions, and additional attributes.

The match key consists of one or more header or metadata fields (variables), each with an assigned match type. The P4 core library defines three standard match types: exact, ternary, and longest prefix matching (LPM). P4 architectures may define additional match types, e.g., the v1model P4 architecture extends the set of standard match types with the range and selector match.

The list of possible actions includes the names of all actions that can be executed by the table. These actions can have additional, directionless parameters which are provided as action data in table entries.

Additional attributes may include the size of the MAT, i.e., the maximum number of entries that can be stored in the table, a default action for a miss, or static table entries.

Figure 14 illustrates the principle of MAT operation. The MAT contains entries with values for match keys, the ID of the corresponding action to be invoked, and action data that serve as parameters for action invocation. For each packet, a lookup key is constructed from the set of header and metadata fields specified in the table definition. It is matched against all entries of the MAT using the rules associated with the individual field's match type. When the first match in the table is found, the corresponding action is called and the action data are passed to the action as directionless parameters. If no match is found in the table, a default action is applied. As a special case, tables without a specified key always invoke the default action.

[Figure 14: A lookup key built from headers and metadata is matched against the table entries (key, action ID, action data), which are populated by the control plane. A hit/miss selector chooses between the matching entry and the default action (action ID, action data); the selected action operates on headers and metadata.]

Fig. 14: Structure of MATs in P4.
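A minimal sketch of two MAT declarations for the v1model architecture follows. The control and table names, the static ACL entry (TCP traffic from 10.0.0.0/8), and the table size are illustrative assumptions.

```p4
#include <core.p4>
#include <v1model.p4>

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

struct headers  { ipv4_t ipv4; }
struct metadata { }

control SampleIngress(inout headers h, inout metadata m,
                      inout standard_metadata_t standard_metadata) {
    action drop() {
        mark_to_drop(standard_metadata);
    }
    // 'port' is a directionless parameter filled from the action data
    // of the matching table entry.
    action set_nhop(bit<9> port) {
        standard_metadata.egress_spec = port;
        h.ipv4.ttl = h.ipv4.ttl |-| 8w1;
    }

    // Entries of this table are installed by the control plane at runtime.
    table ipv4_lpm {
        key = { h.ipv4.dstAddr: lpm; }
        actions = { set_nhop; drop; }
        size = 1024;
        default_action = drop();
    }

    // Table with static entries and a two-field key.
    table acl {
        key = {
            h.ipv4.srcAddr:  ternary;
            h.ipv4.protocol: exact;
        }
        actions = { drop; NoAction; }
        const entries = {
            // value &&& mask: match 10.0.0.0/8 with protocol 6 (TCP)
            (0x0A000000 &&& 0xFF000000, 6) : drop();
        }
        default_action = NoAction();
    }

    apply {
        if (h.ipv4.isValid()) {
            if (acl.apply().hit) {
                // Packet was marked to be dropped by the static ACL entry.
            } else {
                ipv4_lpm.apply();
            }
        }
    }
}
```

The LPM table is skipped after an ACL hit because set_nhop would otherwise overwrite the egress_spec value that marks the packet for dropping.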

G. Deparser

The deparser is also defined as a control block. When packet processing by match-action control blocks is finished, the deparser serializes the packet. It reassembles the packet header and payload back into a byte stream so that the packet can be sent out via an egress port or stored in a buffer. Only valid headers are emitted, i.e., added to the packet. Thus, match-action control blocks can easily add and remove headers by manipulating their validity. Figure 15 provides a sample implementation.

    control SampleDeparser(packet_out p, in headers h) {
        apply {
            p.emit(h.ethernet);
            p.emit(h.mpls);
            p.emit(h.ipv4);
            /* Normally, a packet can contain either
             * a TCP or a UDP header (or none at all),
             * but should never contain both
             */
            p.emit(h.tcp);
            p.emit(h.udp);
        }
    }

Fig. 15: Sample deparser implementation.
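How header validity and the deparser interact can be sketched with VLAN untagging. The header vlan_t and the controls StripVlan and VlanDeparser are hypothetical names used only in this example.

```p4
#include <core.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

header vlan_t {
    bit<3>  pcp;
    bit<1>  dei;
    bit<12> vid;
    bit<16> etherType;
}

struct headers_t {
    ethernet_t ethernet;
    vlan_t     vlan;
}

// Match-action control: removes the VLAN tag by invalidating its header.
control StripVlan(inout headers_t h) {
    apply {
        if (h.vlan.isValid()) {
            // Preserve the encapsulated protocol type before removal.
            h.ethernet.etherType = h.vlan.etherType;
            h.vlan.setInvalid();
        }
    }
}

// Deparser: emit() skips headers that are invalid, so the VLAN tag
// does not appear in the outgoing packet after StripVlan has run.
control VlanDeparser(packet_out p, in headers_t h) {
    apply {
        p.emit(h.ethernet);
        p.emit(h.vlan);
    }
}
```

The deparser itself contains no conditional logic; the decision whether a header is serialized is carried entirely by its validity bit.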

IV. P4 ARCHITECTURES & COMPILERS

We present P4 architectures and introduce P4 compilers.

A. P4 Architectures

We summarize the concept of P4 architectures, describe externs, and give an overview of the most common P4 architectures.

1) Concept:

As described before, P4_16 introduces the concept of P4 architectures as an intermediate layer between the core P4 language and the targets. A P4 architecture serves as a programming model that represents the capabilities and the logical view of a target's P4 processing pipeline. P4 programs are developed for a specific P4 architecture. Such programs can be deployed on all targets that implement the same P4 architecture. The manufacturers of P4 targets provide P4 compilers that compile architecture-specific P4 programs into target-specific configuration binaries.
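To make the notion of programming against an architecture more concrete, the following minimal sketch targets the v1model architecture, whose V1Switch package expects six programmable blocks. The block names (MyParser, MyIngress, and so on) are arbitrary, and the forwarding behavior, which simply reflects each packet back to its ingress port, is an assumption chosen to keep the example short.

```p4
#include <core.p4>
#include <v1model.p4>

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

struct headers  { ethernet_t ethernet; }
struct metadata { }

parser MyParser(packet_in p, out headers h, inout metadata m,
                inout standard_metadata_t sm) {
    state start {
        p.extract(h.ethernet);
        transition accept;
    }
}

control MyVerifyChecksum(inout headers h, inout metadata m) {
    apply { }
}

control MyIngress(inout headers h, inout metadata m,
                  inout standard_metadata_t sm) {
    apply {
        // Illustrative behavior: send the packet back out of its ingress port.
        sm.egress_spec = sm.ingress_port;
    }
}

control MyEgress(inout headers h, inout metadata m,
                 inout standard_metadata_t sm) {
    apply { }
}

control MyComputeChecksum(inout headers h, inout metadata m) {
    apply { }
}

control MyDeparser(packet_out p, in headers h) {
    apply {
        p.emit(h.ethernet);
    }
}

// The architecture description (v1model.p4) declares the V1Switch package;
// the program supplies one implementation for each programmable block.
V1Switch(MyParser(), MyVerifyChecksum(), MyIngress(),
         MyEgress(), MyComputeChecksum(), MyDeparser()) main;
```

The block signatures and the standard_metadata_t type come from the architecture description, so the same program would need to be adapted before it could be compiled for a different architecture.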

2) Externs:

P4 architectures may provide additional functionalities that are not part of the P4 language core. Examples are checksum or hash computation units, random number generators, packet and byte counters, meters, registers, and many others. To make such external functionalities usable, P4_16 introduces so-called externs.

Most externs have to be explicitly instantiated in P4 programs using their constructor method. The other methods provided by these externs can then be invoked on the given extern instance. Other externs (extern functions) do not require explicit instantiation.

Along with tables and value sets, P4 externs are allowed to preserve additional state between packets. That state may be accessible by the control plane, the data plane, or both. For example, the counter extern preserves the number of packets or bytes that have been counted so that each new packet can properly increment it. The specifics of the state depend on the nature of the extern and cannot be specified in the language; this is done inside the vendor-specific API definitions.

While the P4 processing pipeline only allows packet header manipulation, extern functions may operate on the packet payload as well.
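The difference between instantiated externs and extern functions can be sketched with the counter, register, and random externs declared in v1model.p4. The control name ExternDemo, the instance names, the array size of 512, and the metadata field sample are assumptions made for this illustration.

```p4
#include <core.p4>
#include <v1model.p4>

struct headers  { }
struct metadata { bit<8> sample; }

control ExternDemo(inout headers h, inout metadata m,
                   inout standard_metadata_t sm) {
    // Instantiated externs: created through their constructors.
    // Their state is preserved between packets.
    counter(512, CounterType.packets_and_bytes) port_counter;
    register<bit<48>>(512) last_seen;

    apply {
        // One counter and one register cell per ingress port (9-bit port number).
        bit<32> idx = (bit<32>) sm.ingress_port;

        // Methods are invoked on the extern instances.
        port_counter.count(idx);
        last_seen.write(idx, sm.ingress_global_timestamp);

        // Extern function: used without prior instantiation.
        random(m.sample, 8w0, 8w255);
    }
}
```

How the counter values and register cells are read by the control plane is not expressed in the P4 program; that is left to the data plane API of the target.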

3) Overview of Common P4 Architectures:

We describe the four most common P4 architectures.

a) v1model: The v1model mimics the processing pipeline of P4_14. As depicted in Figure 16, it consists of a programmable parser, an ingress match-action pipeline, a traffic manager, an egress match-action pipeline, and a deparser. It enables developers to convert P4_14 programs into P4_16 programs. Additional functionalities tracking the development of the reference P4 software switch Behavioral Model version 2 (bmv2) (see Section V) are continuously added. All P4 examples in this paper are written using v1model.

[Figure 16: Ingress (parser, ingress match-action pipeline), traffic manager, and egress (egress match-action pipeline, deparser).]

Fig. 16: v1model architecture.

b) Portable Switch Architecture (PSA):

The PSA is a P4 architecture created and further developed by the Architecture WG [32] in the P4 Language Consortium. The WG also discusses standard functionalities, APIs, and externs that every target mapping the PSA should support. Its latest specification is Version 1.1 [33] from November 2018. Figure 17 illustrates the P4 processing pipeline of the PSA. It is divided into an ingress and an egress pipeline. Each pipeline consists of three programmable parts: a parser, multiple control blocks, and a deparser. The architecture also defines configurable fixed-function components.

PSA specifies several packet processing primitives, such as:
• Sending a packet to a unicast port
• Dropping a packet
• Sending the packet to a multicast group
• Resubmitting a packet, which moves the currently processed packet from the end of the ingress pipeline to the beginning of the ingress pipeline for the purpose of packet re-parsing
• Recirculating a packet, which moves the currently processed packet from the end of the egress pipeline to the beginning of the ingress pipeline for the purpose of recursive processing, e.g., tunneling
• Cloning a packet, which duplicates the currently processed packet. Clone ingress to egress (CI2E) creates a duplicate of the ingress packet at the end of the ingress pipeline. Clone egress to egress (CE2E) creates a duplicate of the deparsed packet at the end of the egress pipeline. In both cases, cloned instances start processing at the beginning of the egress pipeline. Cloning can be helpful to implement powerful applications such as mirroring and telemetry.

[Figure 17: Ingress pipeline (parser, match-action units, deparser), traffic manager, and egress pipeline (parser, match-action units, deparser), together with the resubmit, recirculate, CI2E, and CE2E paths.]

Fig. 17: Portable Switch Architecture (PSA) with programmable and fixed-function parts and special packet processing primitives.

c) SimpleSumeArchitecture:

The SimpleSumeArchitecture is a simplified P4 architecture that is implemented by FPGA-based P4 targets. As depicted in Figure 18, it features a parser, a programmable match-action pipeline, and a deparser.

[Figure 18: Parser, match-action pipeline, and deparser.]

Fig. 18: SimpleSumeArchitecture.

d) Tofino Native Architecture (TNA):

TNA is a proprietary P4_16 architecture designed for Intel Tofino switching ASICs (see Section V-C). Intel has published the architecture definitions and allows developers to publish programs written for it.

The architecture describes a very high-performance, "industry-strength" device that is relatively complex. The basic programming unit is a so-called Pipeline() package that resembles an extended version of the Portable Switch Architecture (PSA) pipeline and consists of six top-level programmable components: the ingress parser, ingress match-action control, ingress deparser, and their egress counterparts. Since Tofino devices can have two or four processing pipelines, the final switch package can be formed from one to four distinct pipeline packages. More complex versions of the Pipeline() package allow the programmer to specify different parsers for different ports.

TNA also provides a richer set of externs compared to most other architectures. Most notable is the TNA RegisterAction(), which represents a small code fragment that can be executed on the register instead of the simple read/write operations provided in other architectures. TNA provides a clear and consistent interface for mirroring and resubmit with additional metadata being passed via the packet byte stream. The same technique is also used to pass intrinsic metadata, which greatly simplifies the design.

Additional externs that are not present in other architectures include low-pass filters, weighted random early discard externs, powerful hash externs that can compute CRCs based on user-defined polynomials, ParserCounter, and others.

The set of intrinsic metadata in Tofino is also larger than in most of the other P4 architectures presented before. Notable is the support for two-level multicasting with additional source pruning, copy-to-CPU functionality, and support for IEEE 1588.

B. P4 Compiler

P4 compilers translate P4 programs into target-specific configuration binaries that can be executed on P4 targets. We first explain compilers based on the two-layer model, which is the most widely used. Then we mention other compilers in less detail.

1) Two-Layer Compiler Model:

Most P4 compilers use the two-layer model, consisting of a common frontend and a target-specific backend.

The frontend is common for all targets and is responsible for parsing as well as syntactic and target-independent semantic analysis of the program. The program is finally transformed into an intermediate representation (IR) that is then consumed by the target-specific backend, which performs target-specific transformations.

The first-generation P4 compiler for P4_14 was written in Python and used the so-called high-level intermediate representation (HLIR) [34] that represented a P4_14 program as a tree of Python objects. The compiler is referred to as p4-hlir.

The new P4 compiler (p4c) [35] is written in C++ and uses a C++-object-based IR. As an additional benefit, the IR can be output as a P4_16 program or a JSON file. The latter allows developers and users to build powerful tools for program analysis without the need to augment the compiler. Figure 19 visualizes its structure and operating principle. The compiler consists of a generic frontend that accepts both P4_14 and P4_16 code which may be written for any architecture. It furthermore has several reference backends for the bmv2, eBPF, and uBPF P4 targets as well as a backend for testing purposes and a backend that can generate graphs of control flows of P4 programs. In addition, p4c provides the so-called "mid-end", which is a library of generic transformation passes that are used by the reference backends and can also be used by vendor-specific backends. The compiler is developed and maintained by P4.org.

P4 target vendors design and maintain their own compilers that include the common frontend. This ensures the uniformity of the language that is accepted by different compilers.

[Figure 19: A P4 program (.p4) is translated by the front-end compiler into an intermediate representation, which the back-end compilers A to Z translate for the targets A to Z.]

Fig. 19: Structure and operation principle of P4 compilers using the two-layer model.

2) Other Compilers:

MACSAD [36] is a compiler that translates P4 programs into Open Data Plane (ODP) [37] programs. Jose et al. [38] introduce a compiler that maps P4 programs to FlexPipe and RMT, two reconfigurable hardware switch architectures. P4GPU [39] is a multistage framework that translates a P4 program into intermediate representations and other languages to eventually generate GPU code.

V. P4 TARGETS

We describe P4 targets based on software, FPGA, ASIC, and NPU. Table II compiles an overview of the targets, their supported architectures, and the current state of development.

Target                      P4 Version      P4 Architecture      Active Development
Software
  p4c-behavioral            P4_14           n.a.                 ✗
  bmv2                      P4_14, P4_16    v1model, psa         ✓
  eBPF                      P4_16           ebpf_model.p4        ✓
  uBPF                      P4_16           ubpf_model.p4        ✓
  XDP                       P4_16           xdp_model.p4         ✓
  T4P4S                     P4_14, P4_16    v1model, psa         ✓
  Ripple                    n.a.            n.a.                 n.a.
  PISCES                    P4_14           n.a.                 ✗
  PVPP                      n.a.            n.a.                 ✗
  ZodiacFX                  P4_16           zodiacfx_model.p4    n.a.
FPGA
  P4→NetFPGA                P4_16           SimpleSumeSwitch     ✓
  Netcope P4                n.a.            n.a.                 ✓
  P4FPGA                    P4_14, P4_16    n.a.                 ✗
ASIC
  Barefoot Tofino/Tofino 2  P4_14, P4_16    v1model, psa, TNA    ✓
  Pensando Capri            P4_16           n.a.                 ✓
NPU
  Netronome                 P4_14, P4_16    v1model              ✓

TABLE II: Overview of P4 targets.

A. Software-Based P4 Targets

Software-based P4 targets are packet forwarding programs that run on a standard CPU. We describe the software-based P4 targets listed in Table II.

1) p4c-behavioral: p4c-behavioral [40] is a combined P4 compiler and P4 software target. It was introduced with the first public release of P4. p4c-behavioral translates the given P4_14 program into an executable C program.

2) Behavioral Model version 2 (bmv2):

The second version of the P4 software switch Behavioral Model (bmv2) [41] was introduced to address the limitations of p4c-behavioral (see also [42]). In contrast to p4c-behavioral, the source code of bmv2 is static and independent of P4 programs. P4 programs are compiled to a JSON representation that is loaded onto bmv2 at runtime. External functions and other extensions can be added by extending bmv2's C++ source code. bmv2 is not a single target, but a collection of targets [43]:
• simple_switch is the bmv2 target with the largest range of features. It contains all features from the P4_14 specification and supports the v1model architecture of P4_16. simple_switch includes a program-independent Thrift API for runtime control.
• simple_switch_grpc extends simple_switch by the P4Runtime API that is based on gRPC (see Section VI-C1).
• psa_switch is similar to simple_switch, but supports PSA instead of v1model.
• simple_router and l2_switch support only parts of the standard metadata and do not support P4_16. They are intended to show how different architectures can be implemented with bmv2.

Although bmv2 is intended for testing purposes only, throughput rates of up to 1 Gbit/s for a P4 program with IPv4 LPM routing have been reported [44]. bmv2 is under active development, i.e., new functionality is added frequently.

3) BPF-based Targets:

Berkeley Packet Filters (BPFs) add an interface on a UNIX system that allows sending and receiving raw packets via the data link layer. User space programs may rely on BPFs to filter packets that are sent to them. BPF-based P4 targets are mostly intended for programming packet filters or basic forwarding in P4.

a) eBPF: Extended Berkeley Packet Filters (eBPFs) are an extension of BPFs for the Linux kernel. eBPF programs are dynamically loaded into the Linux kernel and executed in a virtual machine (VM). They can be linked to functions in the kernel, inserted into the network data path via iproute2, or bound to sockets or network interfaces. eBPF programs are always verified by the kernel before execution, e.g., programs with loops or backward pointers would not be executed. Due to their execution in a VM, eBPF programs can only access certain regions in memory besides the local stack. Access to kernel resources is protected by a whitelist. eBPF programs may not block or sleep, and the usage of locks is limited to prevent deadlocks. The p4c compiler features the p4c-ebpf backend to compile P4 programs to eBPF [45].

b) uBPF: User-space BPFs (uBPFs) relocate the eBPF VM from the kernel space to the user space. p4c-ubpf [46] is a backend for p4c that compiles P4 HLIR for uBPF. In contrast to p4c-ebpf, it also supports packet modification, checksum calculation, and registers, but no counters.

c) XDP: eXpress Data Path (XDP) is based on eBPF and allows loading an eBPF program into the RX queue of a device driver. p4c-xdp [47] is a backend for p4c that compiles P4 HLIR for XDP. Similar to p4c-ubpf, it supports packet modification and checksum calculation. In contrast to p4c-ubpf, it supports counters instead of registers.

4) T4P4S: T4P4S (pronounced "tapas") [48], [49] is a software P4 target that relies on interfaces for accelerated packet processing such as the Data Plane Development Kit (DPDK) [50] or Open Data Plane (ODP) [37]. T4P4S provides a compiler that translates P4 programs into target-independent C code that interfaces with a network hardware abstraction library. Hardware-dependent and hardware-independent functionalities are separated from each other. Its source code is available on GitHub [51]. Bhardwaj et al. [52] describe optimizations for improving T4P4S performance by up to 15%.

5) Ripple:

Ripple [53] is a P4 target based on DPDK. It uses a static universal binary that is independent of the P4 program. The data plane of the static binary is configured at runtime based on P4 HLIR. This results in a shorter downtime when updating a P4 program in contrast to targets like T4P4S. Ripple uses vectorization to increase the performance of packet processing.

6) PISCES:

PISCES [54] transforms Open vSwitch (OVS) [55] into a software P4 target. OVS is a popular SDN software switch that is designed for high throughput on virtualization platforms for flexible networking between VMs. The PISCES compiler translates P4 programs into C code that replaces parts of the source code of OVS. This makes OVS dependent on the P4 program, i.e., OVS must be recompiled with every modification of the P4 program. PISCES does not support stateful components such as registers, counters, or meters. The developers claim that PISCES does not add performance overhead to OVS. As the last commit in the public repository [56] is from 2016, PISCES does not seem to be under active development.

7) PVPP:

PVPP [57], [58] integrates P4 programs into plugins for Vector Packet Processors (VPP) (see Section II-D1). The P4-to-PVPP compiler comprises two stages. First, a modified p4c compiler translates P4 programs into target-dependent JSON code. Then, a Python compiler translates the JSON code into a VPP plugin in C source code. According to the authors, performance decreases by 5-17% compared to VPP but is still significantly better than that of OVS. Unfortunately, the source code and further information are not publicly available.

8) ZodiacFX:

The ZodiacFX is a lightweight development and experimentation board originally designed as an OF switch. It is based on an Atmel processor and an Ethernet switching chip [59]. The authors provided an extension [60], [61] to run P4 programs on the board. P4 programs are compiled using an extended version of p4c and the p4c-zodiacfx backend compiler. Then, the result of this compilation is used to generate a firmware image. Zanna et al. [62] compare the performance of P4 and OF on that target and find that the differences among all test cases are small.

B. FPGA-Based P4 Targets

Several tool chains translate P4 programs into implementations for field programmable gate arrays (FPGAs). The process includes logic synthesis, verification, validation, and placement/routing of the logic circuit for the FPGA. We describe the P4→NetFPGA, Netcope P4, and P4FPGA tool chains. Finally, we mention research results for FPGA-based P4 targets.

1) P4 → NetFPGA:

The P4→NetFPGA workflow [63], [64] provides a development environment for compiling and running P4 programs on the NetFPGA SUME board [65]. The development environment is built around the P4-SDNet compiler and the SDNet data plane builder from Xilinx, i.e., a full license for the Xilinx Vivado design suite is needed. Custom external functions can be implemented in a hardware description language (HDL) such as Verilog and included in the final FPGA program. This also allows external IP cores to be integrated as P4 externs in P4 programs. The P4→NetFPGA tool chain supports P4_16 based on the P4 architecture SimpleSumeSwitch (see Section IV-A).

2) Netcope P4:

Netcope P4 [66] is a commercial cloud service that creates FPGA firmware from P4 programs. Knowledge of HDL development is not needed and all necessary IP cores are provided by Netcope. The cloud service can be used in conjunction with the Netcope software development kit (SDK). This combination allows developers to combine the VHDL code of the cloud service with custom HDL code, e.g., from an external function. As target platforms, Netcope P4 supports FPGA boards from Netcope, Silicom, and Intel that are based on Xilinx or Intel FPGAs.

3) P4FPGA:

P4FPGA [67] is a P4_14 and P4_16 compiler and runtime for the Bluespec programming language that can generate code for Xilinx and Altera FPGAs. The last commit in the archived public repository [68] is from 2017.

4) Research Results:

Benácek and Kubátová [69], [70] present how P4 parse graph descriptions can be converted to optimized VHDL code for FPGAs. The authors demonstrate how a complex parser for several header fields achieves a throughput of 100 Gbit/s on a Xilinx Virtex-7 FPGA while using 2.78% slice look up tables (LUTs) and 0.76% slice registers (REGs). In a follow-up work [71], the optimized parser architecture supports a throughput of 1 Tbit/s on Xilinx UltraScale+ FPGAs and 800 Gbit/s on Xilinx Virtex-7 FPGAs. Da Silva et al. [72] also investigate the high-level synthesis of packet parsers in FPGAs. Kekely and Korenek [73] describe how MATs can be mapped to FPGAs. Iša et al. [74] describe a system for automated verification of register-transfer level (RTL) generated from P4 source code. Cao et al. [75], [76] propose a template-based process to convert P4 programs to VHDL. They use a standard P4 frontend compiler to compile the P4 program into an intermediate representation. From this representation, a custom compiler maps the different elements of the P4 program to VHDL templates which are used to generate the FPGA code.

C. ASIC-Based P4 Targets

1) Intel Tofino:

Intel Tofino is the world's first end-user programmable Ethernet switch ASIC. It is designed for very high throughput of 6.5 Tbit/s (4.88 B pps) with 65 ports running at 100 Gbit/s. Its successor, the Tofino 2 ASIC, supports throughput rates of up to 12.8 Tbit/s with ports running at up to 400 Gbit/s. Tofino has been built by Barefoot Networks, a former startup company that was acquired by Intel in 2019.

The Tofino ASIC implements the TNA, a custom P4 architecture that significantly extends PSA (see Section IV-A). It provides support for advanced device capabilities which are required to implement complex, industrial-strength data plane programs. The device comes with 2 or 4 independent packet processing pipelines (pipes), each capable of serving 16 ports at 100 Gbit/s. All pipes can run the same P4 program or each pipe can run its own program independently. Pipes can also be connected together, allowing programmers to build programs requiring longer processing pipelines.

The Tofino ASIC processes packets at line rate irrespective of the complexity of the executed P4 program. This is achieved by a high degree of pipelining (each pipe is capable of processing hundreds of packets simultaneously) and parallelization. In addition to standard arithmetic and logical operations, Tofino provides specialized capabilities often required by data plane programs, such as hash computation units and random number generators. For stateful processing, Tofino offers counters, meters, and registers, as well as more specialized processing units. Some of them support specialized operations, such as approximate non-linear computations required to implement state-of-the-art data plane algorithms. Built-in packet generators allow data plane designers to implement protocols, such as BFD, without using externally running control plane processes. These and other components are exposed through TNA, which is openly published by Intel [77].

Tofino fixed-function components offer plenty of advanced functionality.
The buffering engine has a unified 22 MB buffer, shared by all the pipes, that can be subdivided into several pools. The Tofino Traffic Manager supports both store-and-forward and cut-through modes, up to 32 queues per port, precise traffic shaping, and multiple scheduling disciplines. Tofino provides nanosecond-precision timestamping that facilitates both the implementation of time synchronization protocols, such as IEEE 1588, and precise delay measurements. Additional intrinsic metadata support a variety of telemetry applications, such as INT.

The development is conducted using Intel P4 Studio, which is a software development environment containing the P4 compiler, the driver, and other software necessary to program and manage the Tofino. A special interactive visualization tool (P4i) allows developers to see how the P4 program is mapped onto the specific hardware resources, further assisting them in fitting and optimizing their programs. The Intel P4 compiler for Tofino has special capabilities that allow it to parallelize the code, thereby taking advantage of the highly parallel nature of Tofino hardware.

A number of original design manufacturers (ODMs) produce open systems (whiteboxes) with the Tofino ASIC that are used for research, development, and production of custom systems. Examples include the EdgeCore Wedge 100BF-32X [78], APS Networks BF2556-1T-A1F [79] and BF6064-T-A2F [80], NetBerg Aurora 610 [81], and others.

Most whitebox systems follow a modern, server-like design with a separate board management controller, responsible for handling power supplies, fans, LEDs, etc., and a main CPU, typically x86_64, running a Linux operating system. The main CPU is connected to the Tofino ASIC via a PCIe interface. Some boards also provide one or more high-speed on-board Ethernet connections for a faster packet interface. External Ethernet ports support speeds from 10 Gbit/s to 100 Gbit/s using standard QSFP28 cages, although some systems offer lower-speed (1 Gbit/s) ports as well. Most of these systems are also powerful enough to support running development tools natively, e.g., a P4 compiler, even though this is not necessarily required.

Tofino ASICs are also used in proprietary network switches, e.g., by Arista [82] and Cisco [83]. Some Tofino-based switches are supported by Microsoft SONiC [84].

2) Pensando Capri:

The Capri P4 Programmable Processor [85], [86] is an ASIC that powers network interface cards (NICs) by Pensando Systems aimed at cloud providers. It is coupled with fixed-function components for cryptography operations like AES or compression algorithms and features multiple ARM cores.

D. NPU-Based P4 Targets

Network processing units (NPUs) are software-programmable ASICs that are optimized for networking applications. They are part of standalone network devices or device boards, e.g., PCI cards.

Netronome network flow processing (NFP) silicons can be programmed with P4 [87] or C [88]. A C-based programming model is available that supports program functions to access payloads and allows developing P4 externs. The Agilio P4C SDK consists of a tool chain including a backend compiler, host software, and a full-featured integrated development environment (IDE). All current Agilio SmartNICs based on NFP-4000, NFP-5000, and NFP-6480 are supported. Harkous et al. [89] investigate the impact of basic P4 constructs on packet latency on Agilio SmartNICs.

VI. P4 DATA PLANE APIS

We introduce data plane APIs for P4, present a characterization, describe the three most commonly used P4 data plane APIs, and compare different control plane use cases.

A. Deﬁnition & Functionality

Control planes manage the runtime behavior of P4 targets via data plane APIs. Alternative terms are control plane APIs and runtime APIs. The data plane API is provided by a device driver or an equivalent software component. It exposes data plane features to the control plane in a well-defined way. Figure 20 shows the main control plane operations. Most importantly, data plane APIs facilitate runtime control of P4 entities (MATs and externs). They typically also comprise a packet I/O mechanism to stream packets to/from the control plane, and they include reconfiguration mechanisms to load P4 programs onto the P4 target. Control planes can control data planes only through data plane APIs, i.e., if a data plane feature is not exposed via a corresponding API, it cannot be used by the control plane.

Fig. 20: Runtime management of a P4 target by the control plane through the data plane API. The figure depicts the four most central operations: runtime control of MATs and extern objects, packet-in/out, and loading of P4 programs.

It is important to note that P4 does not require a data plane API. P4 targets may also be used as packet processors with a fixed behavior that is defined by the P4 program, where static MAT entries are part of the P4 program itself.
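The three groups of operations named above (loading a P4 program, runtime control of MATs, and packet I/O) can be illustrated with a minimal Python sketch. The class `ToyTarget`, its method names, and the program and table names are hypothetical and chosen for illustration only; they do not correspond to any real driver or data plane API.

```python
from collections import deque


class ToyTarget:
    """Hypothetical in-memory stand-in for a P4 target; not a real driver."""

    def __init__(self):
        self.program = None        # name of the loaded P4 program
        self.tables = {}           # MAT name -> {match key: action}
        self.cpu_port = deque()    # packets punted to the control plane

    # Reconfiguration: load a P4 program and declare the MATs it defines.
    def load_program(self, name, table_names):
        self.program = name
        self.tables = {t: {} for t in table_names}

    # Runtime control: add an entry to a MAT that the API exposes.
    def add_entry(self, table, key, action):
        if table not in self.tables:
            raise KeyError(f"table {table!r} is not exposed by the API")
        self.tables[table][key] = action

    # Data plane behavior: look up a key and punt misses to the CPU port.
    def process(self, table, key):
        action = self.tables[table].get(key)
        if action is None:
            self.cpu_port.append(key)   # packet-in towards the control plane
            return "to_cpu"
        return action

    # Packet I/O: the control plane fetches the next punted packet, if any.
    def packet_in(self):
        return self.cpu_port.popleft() if self.cpu_port else None


target = ToyTarget()
target.load_program("l2_switch", ["dmac"])
target.add_entry("dmac", "aa:bb:cc:dd:ee:01", "forward(port=1)")
hit = target.process("dmac", "aa:bb:cc:dd:ee:01")
miss = target.process("dmac", "aa:bb:cc:dd:ee:02")
punted = target.packet_in()
```

In this sketch, a table that was not declared when loading the program cannot be written at all, which mirrors the statement that features not exposed by the API are unavailable to the control plane.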

B. Characterization of Data Plane APIs

Data plane APIs in P4 can be characterized by their level of abstraction, their dependency on the P4 program, and the location of the control plane.

1) Level of Abstraction:

Data plane APIs can be characterized by their level of abstraction.
• Device access APIs provide direct access to hardware functionalities like device registers or memories. They typically use low-level mechanisms like DMA transactions. While this results in very low overhead, this type of API can be neither vendor- nor device-independent.
• Data plane specific APIs are APIs with a higher level of abstraction. They provide access to objects defined by the P4 program instead of hardware-specific parts. In contrast to device access APIs, vendor- and device-independence is possible for this type of API.

2) Dependency on the P4 Program:

Data plane APIs can be characterized by their dependency on the P4 program.
• Program-dependent APIs have a set of functions, data structures, and other names that are derived from the P4 program itself. Therefore, they depend on the P4 program and are applicable to this P4 program only. If the corresponding P4 program is changed, function names, data structures, etc., might change, which requires a recompilation or modification of the control plane program.
• Program-independent APIs consist of a fixed set of functions that receive a list of P4 objects that are defined in the P4 program. Thus, the names of the API functions, data structures, etc., do not depend on the program and are universally applicable. If the corresponding P4 program changes, neither the names nor the definitions of the API functions will change, as long as the control plane "knows" the names of the right tables, fields, and other objects that need to be operated on. Program-independent APIs model configurable objects either with the object-based or the table-based approach. As known from object-oriented programming, the object-based approach relies on methods that are defined for each class of data plane objects. In contrast, the table-based approach treats every class of data plane object as a variation of a table. This reduces the number of API methods as only table manipulations need to be provided as methods.
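The table-based, program-independent approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class `TableBasedAPI`, its three methods, and the object names `ipv4_lpm` and `pkt_count` are hypothetical and are not taken from any of the APIs discussed in this section.

```python
class TableBasedAPI:
    """Hypothetical program-independent API with a fixed set of methods.

    Names of P4 objects are passed as data instead of being encoded in
    function names, so the methods stay the same when the program changes.
    """

    def __init__(self, objects):
        # 'objects' plays the role of the list of P4 objects defined in the
        # P4 program, e.g. {"ipv4_lpm": "table", "pkt_count": "counter"}.
        self.kinds = dict(objects)
        self.state = {name: {} for name in objects}

    def entry_add(self, name, key, data):
        if name not in self.state:
            raise KeyError(f"unknown P4 object {name!r}")
        self.state[name][key] = data

    def entry_get(self, name, key):
        return self.state[name].get(key)

    def entry_del(self, name, key):
        self.state[name].pop(key, None)


api = TableBasedAPI({"ipv4_lpm": "table", "pkt_count": "counter"})
# A MAT entry and a counter cell are handled by the same three methods;
# every class of object is treated as a variation of a table.
api.entry_add("ipv4_lpm", "10.0.0.0/8", {"action": "set_nhop", "port": 3})
api.entry_add("pkt_count", 0, {"packets": 0})
```

A program-dependent API would instead expose functions whose names are generated from the P4 program (for example, one add function per table and action), so renaming a table would change the API surface and force changes in the control plane program.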

3) Control Plane Location:

Data plane APIs can be characterized by the location of the control plane.
• APIs for local control are implemented by the device driver and are executed on the local CPU of the device that hosts the programmable data plane. Usually, the APIs are presented as a set of C function calls, just like for other devices that operating systems access.
• APIs for remote control add the ability to invoke API calls from a separate system. This increases system stability and modularity, and is essential for SDN and other systems with centralized control. Remote control APIs follow the base methodology of remote procedure calls (RPCs) but rely on modern message-based frameworks that allow asynchronous communication and concurrent calls to the API. Examples are Thrift [90] or gRPC [91]. For example, gRPC uses HTTP/2 for transport and includes many functionalities ranging from access authentication to streaming and flow control. The protocol's data structures, services, and serialization schemes are described with protocol buffers (protobuf) [92].

C. Data Plane API Implementations

We introduce the three most common data plane APIs: P4Runtime, Barefoot Runtime Interface (BRI), and BM Runtime. All of them are data plane specific and program-independent. Table III lists their properties that have been introduced before.

1) P4Runtime API:

P4Runtime is one of the most commonly used data plane APIs. It is standardized in the API WG [93] of the P4 Language Consortium. For implementing the RPC mechanisms, it relies on the gRPC framework with protobuf. Its most recent specification v1.3.0 [94] was published in December 2020.

a) Operating Principle:

Figure 21 depicts the operating principle of P4Runtime. P4 targets include a gRPC server, and controllers implement a gRPC client. To protect the gRPC connection, TLS with optional mutual certificate authentication can be enabled. The API structure of P4Runtime is described within the p4runtime.proto definition. The gRPC server on P4 targets interacts with the P4-programmable components via platform drivers. It has access to P4 entities (MATs or externs) and can load target-specific configuration binaries. The structure of the API calls to access P4 entities is described in the p4info.proto. It is part of P4Runtime, but developers can extend it to use custom data structures, e.g., to implement interaction with target-specific externs.

P4Runtime provides support for multiple controllers. For every P4 entity, read access is provided to all controllers whereas write access is only provided to one controller. To manage this access, P4 entities can be arranged in groups where each group is assigned to one primary controller with write access and an arbitrary number of secondary controllers with read access.

Interaction between controllers and P4 targets works as follows. P4 compilers (see Section IV-B) with support for P4Runtime generate a P4Runtime configuration. It consists of the target-specific configuration binaries and P4Info metadata. P4Info describes all P4 entities (MATs and externs) that can be accessed by controllers via P4Runtime. Then, the controllers establish a gRPC connection to the gRPC server on the P4 target. The target-specific configuration is loaded onto the P4 target and P4 entities can be accessed.
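The access rule for multiple controllers described above (one primary controller with write access per group, read access for all controllers) can be sketched as a toy model. The class `EntityGroup` and the controller and table names are hypothetical. The sketch uses neither gRPC nor the P4Runtime protobuf messages, and it does not model how the specification selects the primary controller.

```python
class EntityGroup:
    """Toy model of a group of P4 entities with one writer and many readers.

    Hypothetical illustration only; it is not the P4Runtime implementation.
    """

    def __init__(self, primary):
        self.primary = primary   # the single controller with write access
        self.entries = {}        # (entity name, match key) -> action

    def write(self, controller, entity, key, action):
        # Only the primary controller of the group may modify entries.
        if controller != self.primary:
            raise PermissionError(f"{controller} has read-only access")
        self.entries[(entity, key)] = action

    def read(self, controller, entity, key):
        # Read access is granted to every controller.
        return self.entries.get((entity, key))


group = EntityGroup(primary="ctrl-A")
group.write("ctrl-A", "ipv4_lpm", "10.0.0.0/8", "set_nhop(3)")
seen_by_secondary = group.read("ctrl-B", "ipv4_lpm", "10.0.0.0/8")

try:
    group.write("ctrl-B", "ipv4_lpm", "10.1.0.0/16", "drop()")
    secondary_write_rejected = False
except PermissionError:
    secondary_write_rejected = True
```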

Fig. 21: P4Runtime architecture (similar to [94]).

b) Implementations:

gRPC and protobuf libraries are available for many high-level programming languages such as C++, Java, Go, or Python. Thereby, P4Runtime can be implemented easily on both controllers and P4 targets.
• Controllers: P4Runtime is supported by most common SDN controllers. P4 brigade [95] introduces support for P4Runtime on the Open Network Operating System (ONOS). OpenDaylight (ODL) introduces support for P4Runtime via a plugin [96]. Stratum [97] is an open-source network operating system that includes an implementation of the P4Runtime and OpenConfig interfaces. Custom controllers, e.g., for P4 prototypes, can be implemented in Python with the help of the p4runtime_lib [98].
• Targets: The PI Library [99] is the open-source reference implementation of a P4Runtime gRPC server in C. It implements functionality for accessing MATs and supports extensions for target-specific configuration objects, e.g., registers of a hardware P4 target. The PI Library is used by many P4 targets including bmv2 [100] and the Tofino.

2) Barefoot Runtime Interface (BRI):

The BRI consists of two independent APIs that are available on Tofino-based P4 hardware targets. The BfRt API is an API for local control. It includes C, C++, and Python bindings that can be used to implement control plane programs. The BF Runtime is an API for remote control. As for P4Runtime, it is based on the gRPC RPC framework and protobuf, i.e., bindings for different languages are available. An additional Python library implements a simpler, BfRt-like interface for cases where simplicity is more essential than the performance of BF Runtime.

3) BM Runtime API:

BM Runtime API is a program-independent data plane API for the bmv2 software target. It relies on the Thrift RPC framework. bmv2 includes a command line interface (CLI) program [101] to manipulate MATs and configure the multicast engine of the bmv2 P4 software target via this API.

API        | Program independence | Control plane location
P4Runtime  | ✓                    | Remote (gRPC)
BF Runtime | ✓                    | Remote (gRPC)
BfRt API   | ✓                    | Local (C, C++, and Python bindings)
BM Runtime | ✓                    | Remote (Thrift RPC)

TABLE III: Characterization of data plane specific APIs.

D. Controller Use Case Patterns

We present three use case patterns which are abstractions of the controller use cases introduced in the P4Runtime specification [94]. However, these are neither conclusive nor complete as derivations or extensions are possible.

1) Embedded/Local Controller:

P4 hardware targets (see Section V) comprise or are attached to a computing platform. This facilitates running controllers directly on the P4 target. Figure 22 depicts this setup. The controller application may either use a local API, e.g., C calls, or interface with the data plane via an RPC channel.

Fig. 22: Embedded/local controller use case pattern. The P4 target comprises an embedded controller that is running a control plane program.

2) Remote Controllers:

Remote controllers resemble the typical SDN setup where data plane devices are managed by a centralized control plane with an overall view of the network. Controllers need to be protected against outages and capacity overload, i.e., they need to be replicated for fail-safety and scalability. Figure 23 depicts two possible use cases. In the first use case (a), the programmable data plane on the P4 target is managed by remote controllers. In the second use case (b), the P4 target is managed by both the embedded controller and remote controllers. Remote controllers might be interfaced using the remote API of the programmable data plane or an arbitrary API that is provided by the embedded controller.

The second option is often used for the implementation of so-called hierarchical control plane structures where control plane functionality is distributed among different layers. Control plane functions that do not require a global view of the network, e.g., link discovery, MAC learning for L2 forwarding, or port status monitoring, can be solely performed by the embedded/local controller. Other control plane functions that require an overall view of the network, e.g., routing applications, can be performed by the remote controller, possibly in cooperation with the embedded/local controller where the local controller acts as a proxy, i.e., it relays control plane messages between the P4 target and the global controller. Hierarchical control planes improve load distribution as many tasks can be performed locally, which reduces load on the remote controllers. In particular, time-critical operations may benefit from local controllers as additional delays caused by the communication between a P4 target and a global controller are avoided.
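The division of labor in a hierarchical control plane can be illustrated with a short Python sketch. It assumes a simple event model in which each control plane message carries a `type` field; the classes, the event types, and the set `LOCAL_TASKS` are hypothetical and only mirror the examples named in the text.

```python
# Tasks that need no global view and can stay on the embedded controller.
LOCAL_TASKS = {"link_discovery", "mac_learning", "port_status"}


class RemoteController:
    """Stand-in for a centralized controller with a global network view."""

    def __init__(self):
        self.received = []

    def handle(self, event):
        self.received.append(event)
        return "remote"


class LocalController:
    """Embedded controller that handles local tasks and proxies the rest."""

    def __init__(self, remote):
        self.remote = remote
        self.handled_locally = []

    def handle(self, event):
        if event["type"] in LOCAL_TASKS:
            self.handled_locally.append(event)
            return "local"
        # Acts as a proxy: relay the message to the remote controller.
        return self.remote.handle(event)


remote = RemoteController()
local = LocalController(remote)
events = [
    {"type": "mac_learning", "mac": "aa:bb:cc:dd:ee:01", "port": 1},
    {"type": "routing", "prefix": "10.0.0.0/8"},
    {"type": "port_status", "port": 2, "up": False},
]
results = [local.handle(e) for e in events]
```

Only one of the three events reaches the remote controller in this sketch, which is the load reduction that the hierarchical structure aims for.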

Fig. 23: Remote controller use case pattern: (a) remote controllers; (b) local/embedded controller and remote controllers.

VII. ADVANCES IN P4 DATA PLANE PROGRAMMING

We give an overview of research to improve P4 data plane programming. Figure 24 depicts the structure of this section. We describe related work on optimization of development and deployment, testing and debugging, research on P4 targets, and research on control plane operation.

Fig. 24: Organization of Section VII. Research & development on P4 is divided into optimization of development & deployment (program development, compiler optimization), testing & debugging (simulation, program verification, testing, benchmarking, debugging), research on P4 targets (virtualization of P4 data planes, composite P4 targets, P4 externs, secure behavior of targets, testbeds), and research on control plane operation.

A. Optimization of Development and Deployment

We describe research work on optimizing the development & deployment process of P4.

1) Program Development:

Graph-to-P4 [102] generates P4 program code for given parse graphs. This introduces a higher abstraction layer that is particularly helpful for beginners. Zhou et al. [103] introduce a module system for P4 to improve source code organization. DaPIPE [104] enables incremental deployment of P4 program code on P4 targets. SafeP4 [105] adds type safety to P4. P4I/O [106] presents a framework for intent-based networking with P4. Network operators describe their network functions with an Intent Definition Language (IDL) and P4I/O generates a complete P4 program accordingly. To that end, P4I/O provides a P4 action repository with various network functions. During reconfiguration, table and register state are preserved by applying backup mechanisms. P4I/O is implemented for a custom bmv2. Mantis [107] is a framework to implement fast reactions to changing network conditions in the data plane without controller interaction. To that end, annotations in the P4 code specify dynamic components, and a quick control loop over those components ensures timely adjustments if necessary. Lyra [108] is a pipeline abstraction that allows developers to use simple statements to describe their desired data plane without low-level target-specific knowledge. Lyra then compiles that description to target-specific code for execution. GP4P4 [109] is a programming framework for self-driven networks. It generates P4 code from behavioral rules defined by the developer. To that end, GP4P4 evaluates the quality of the automatically generated programs and improves them based on genetic algorithms. FlowBlaze.p4 [110]–[112] implements an executor for FlowBlaze, an abstraction based on an extended finite state machine for building stateful packet processing functions, in P4. This library maps FlowBlaze elements to P4 components for execution on the bmv2. It also provides a GUI for defining the extended finite state machine.

2) Compiler Optimization:

pcube [113] is a preprocessor for P4 that translates primitive annotations in P4 programs into P4 code for common operations such as loops. CacheP4 [114] introduces a behavior-level cache in front of the P4 pipeline. It identifies flows and performs a compound of actions to avoid unnecessary table matches. The cache is filled during runtime by a controller that receives notifications from the switch. P5 [115] optimizes the P4 pipeline by removing inter-feature dependencies. dRMT [6] is a new architecture for programmable switches that introduces deterministic throughput and latency guarantees. To that end, it generates schedules for CPU and memory resources from a P4 program. P2GO [116] leverages monitored traffic information to optimize resource allocation during compilation. It adjusts table and register size to reduce the pipeline length, and offloads rarely used parts of the program to the control plane. Yang et al. [117] propose a compiler module that optimizes lookup speed by reorganizing flow tables and prioritizing popular forwarding rules. Vass et al. [118] analyze and discuss algorithmic aspects of P4 compilation.
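The idea of a behavior-level cache in front of a match-action pipeline can be illustrated with a small Python sketch. It is loosely inspired by the description of CacheP4 above and is not the authors' implementation. The class `CachedPipeline`, its data layout, and the action strings are hypothetical; the method `controller_fill` stands in for the controller that installs cache entries after being notified about new flows.

```python
class CachedPipeline:
    """Toy pipeline of exact-match tables with a per-flow action cache."""

    def __init__(self, tables):
        # Each stage is (index of the flow field to match, {value: action}).
        self.tables = tables
        self.cache = {}          # flow -> compound of actions
        self.pending = []        # notifications for the controller
        self.table_lookups = 0   # number of individual table matches

    def process(self, flow):
        if flow in self.cache:
            # Cache hit: apply the compound action without table matches.
            return self.cache[flow]
        actions = []
        for field, table in self.tables:
            self.table_lookups += 1
            actions.append(table.get(flow[field], "no_op"))
        compound = tuple(actions)
        self.pending.append((flow, compound))   # notify the controller
        return compound

    def controller_fill(self):
        # The controller installs cache entries for the reported flows.
        for flow, compound in self.pending:
            self.cache[flow] = compound
        self.pending.clear()


pipeline = CachedPipeline([
    (0, {"aa:bb:cc:dd:ee:01": "set_port(1)"}),   # stage 1 matches dst MAC
    (1, {"10.0.0.1": "dec_ttl"}),                # stage 2 matches dst IP
])
flow = ("aa:bb:cc:dd:ee:01", "10.0.0.1")
first = pipeline.process(flow)       # walks both tables
lookups_after_first = pipeline.table_lookups
pipeline.controller_fill()
second = pipeline.process(flow)      # served from the cache
lookups_after_second = pipeline.table_lookups
```

After the cache entry is installed, the second packet of the flow receives the same compound action while the lookup counter stays unchanged.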

B. Testing and Debugging

We describe research work on simulation, program verification, testing, benchmarking, and debugging.

1) Simulation:

PFPSim [119] is a simulator for validation of packet processing in P4. NS4 [120], [121] is a network simulator for P4 programs that is based on the network simulator NS3.

2) Program Veriﬁcation:

McKeown et al. [122] introduce a tool to translate P4 to the Datalog declarative programming language. Then, the Datalog representation of the P4 program can be analyzed for well-formedness. Kheradmand et al. [123] introduce a tool for static analysis of P4 programs that is based on formal semantics. P4v [124] adapts common verification methods for P4 that are based on annotations in the P4 program code. Freire et al. [125], [126] introduce assertion-based verification with symbolic execution. Stoenescu et al. [127] propose program verification based on symbolic execution in combination with a novel description language designed for the properties of P4. P4AIG [128] proposes to use hardware verification techniques where developers have to annotate their code with First Order Logic (FOL) specifications. P4AIG then encodes the P4 program as an And-Inverter Graph (AIG) which can be verified by hardware verification techniques such as circuit SAT solvers and bounded model checkers. bf4 [129] leverages static code verification and runtime checks of rules that are installed by the controller to confirm that the P4 program is running as intended. netdiff [130] uses symbolic execution to check if two data planes are equivalent. This can be useful to verify if a data plane behaves correctly by comparing it with a similar one, or to verify that optimizations of a data plane do not change its behavior. Yousefi et al. [131] present an abstraction for liveness verification of stateful network functions (NFs). The abstraction is based on boolean formulae. Further, they provide a compiler that translates these formulae into P4 programs.

3) Testing:

P4pktgen [132] generates test cases for P4 programs by creating test packets and table entries. P4Tester [133] implements a detection scheme for runtime faults in P4 programs based on probe packets. P4app [134] is a partially automated open source tool for building, running, debugging, and testing P4 programs with the help of Docker images. P4RL [135] is a reinforcement learning based system for testing P4 programs and P4 targets at runtime. The correct behavior is described in a simple query language so that a reinforcement learning agent based on Double DQN can learn how to manipulate and generate packets that contradict the expected behavior. P4TrafficTool [136] analyzes P4 programs to produce plugin code for common traffic analyzers and generators such as Wireshark.

4) Benchmarking:

Whippersnapper [137] is a benchmark suite for P4 that differentiates between platform-independent and platform-specific tests. BB-Gen [138] is a system to evaluate P4 programs with existing benchmark tools by translating P4 code into other formats. P8 [139] estimates the average packet latency at compilation time by analyzing the data path program.

5) Debugging:

Kodeswaran et al. [140] propose to use Ball-Larus encoding to track the packet execution path through a P4 program for more precise debugging capabilities. p4-data-flow [141] detects bugs by creating a control flow graph of a P4 program and then identifying incorrect behavior. P4box [142] extends the P4 reference compiler by so-called monitors that insert code before and after programmable blocks, e.g., control blocks, for runtime verification. P4DB [143], [144] introduces a runtime debugging system for P4 that leverages additional debugging snippets in the P4 program to generate reports during runtime. Neves et al. [145] propose a sandbox for P4 data plane programs for diagnosis and tracing. P4Consist [146] verifies the consistency between control and data plane. To that end, it generates active probe-based traffic for which the control and data plane generate independent reports that can be compared later. KeySight [147] is a troubleshooting platform that analyzes network telemetry data for detecting runtime faults. Gauntlet [148] finds both crash bugs, i.e., abnormal termination of the compilation operation, and semantic bugs, i.e., miscompilation, in compilers for programmable packet processors.

C. Research on P4 Targets

We describe research work on virtualization of P4 data planes, composite targets, P4 externs, secure behavior of targets, and testbeds.

1) Virtualization of P4 Data Planes:

P4 targets are designed to execute one P4 program at any given time. Virtualization aims at sharing the resources of P4 targets for multiple P4 programs. Krude et al. [149] provide theoretical discussions on how ASIC- and FPGA-based P4 targets can be shared between different tenants and how P4 programs can be made hot-pluggable.

HyPer4 [150] introduces virtualization for P4 data planes. It supports scenarios such as network slicing, network snapshotting, and virtual networking. To that end, a compiler translates P4 programs into table entries that configure the HyPer4 persona, a P4 program that contains implementations of basic primitives. However, HyPer4 does not support stateful memory (registers, counters, meters), LPM, range match types, and arbitrary checksums. The authors describe an implementation for bmv2 and perform experiments that reveal 80 to 90% lower performance in comparison to native execution.

HyperV [151]–[153] is a hypervisor for P4 data planes with modular programmability. It allows isolation and dynamic management of network functions. The authors implemented a prototype for the bmv2 P4 target. In comparison to HyPer4, HyperV achieves a 2.5x performance advantage in terms of bandwidth and latency while reducing required resources by a factor of 4. HyperVDP [154] extends HyperV by an implementation of a dynamic controller that supports instantiating network functions in virtual data planes.

P4VBox [155], also published as VirtP4 [156], is a virtualization framework for the NetFPGA SUME P4 target. It allows executing virtual switch instances in parallel and hot-swapping them. In contrast to HyPer4, HyperV, and HyperVDP, P4VBox achieves virtualization by partially re-configuring the hardware.

P4Visor [157] merges multiple P4 programs. This is done by program overlap analysis and compiler optimization. Programming In-Network Modular Extensions (PRIME) [158] also allows combining several P4 programs into a single program and steering packets through the specific control flows.

P4click [159] does not only merge multiple P4 programs, but also combines the corresponding control plane blocks. The purpose of P4click is to increase the use of data plane programmability. P4click is currently in an early stage of development.

The Multi Tenant Portable Switch Architecture (MTPSA) [160] is a P4 architecture that offers performance isolation, resource isolation, and security isolation in a switch for multiple tenants. MTPSA is based on the PSA. It combines a Superuser pipeline that acts as a hypervisor with multiple user pipelines. User pipelines may only perform specific actions depending on their privileges. MTPSA is implemented for bmv2 and NetFPGA-SUME [161].

2) Composite P4 Target:

Da Silva et al. [162] introduce the idea of composite P4 targets. This tries to solve the problem of target-dependent support of features. The composed data plane appears as one P4 target; it is emulated by a P4 software target but relies on an FPGA and ASIC for packet processing.

eXtra Large Table (XLT) [163] introduces gigabyte-scale MATs by leveraging FPGA and DRAM capabilities. It comprises a P4-capable ASIC and multiple FPGAs with DDR4 DRAM. The P4-capable ASIC pre-constructs the match key field and sends it with the full packet to the FPGA. The FPGA sends back the original packet with the search results of the MAT lookup. The authors implement a DPDK-based prototype for the T4P4S P4 software target.

HyMoS [164] is a hybrid software and hardware switch to support NFV applications. The authors create a switch by using P4-enabled SmartNICs as line cards and the PCIe interface of a computer as the switch fabric. P4 is used for packet switching between the NICs. Additional processing may be done using DPDK or applications running on a GPU.

3) P4 Externs:

Laki et al. [165], [166] investigate asynchronous execution of externs. In contrast to common synchronous execution, other packets may be processed by the pipeline while the extern function is running. The authors implement and evaluate a prototype for T4P4S. Scholz et al. [167] propose that P4 targets should be extended by cryptographic hash functions that are required to build secure applications and protocols. The authors propose an extension of the PSA and discuss the PoC implementation for a CPU-, network processing unit (NPU)-, and FPGA-based P4 target. Da Silva et al. [168] investigate the implementation of complex operations as extensions to P4. The authors perform a case study on integrating the Robust Header Compression (ROHC) scheme and conclude that an implementation as an extern function is superior to an implementation as a new native primitive.

4) Secure Behaviour of Targets:

Gray et al. [169] demonstrate that hardware details of P4 targets influence their packet processing behavior. The authors show this by sending a special traffic pattern to a P4 firewall. The pattern fills the cache of the target and results in blocking behavior although the overall data rate is far below the capacity of the P4 target. Dumitru et al. [170] investigate the exploitation of programming bugs in bmv2, P4-NetFPGA, and Tofino. The authors demonstrate attack scenarios based on header field access on invalid headers, the creation of infinite loops, and the unintentional processing of dropped packets in the P4 targets.

5) Testbeds:

Large testbeds facilitate research and development on P4 programs. The i-4PEN (International P4 Experimental Networks) [171] is an international P4 testbed operated by a collaboration of network research institutions from the USA, Canada, and Taiwan. Chung et al. [172] describe how multi-tenancy is achieved in this testbed. The 2STiC testbed [173], a national testbed in the Netherlands comprising six sites with at least one Tofino-based P4 target, is connected to i-4PEN.

D. Research on Control Plane Operation

When new forwarding entries are computed by the controller, the data plane has to be updated. However, updating the targets has to be performed in a manner that prevents negative side effects. For example, microloops may occur if packets are forwarded according to new rules at some targets while old rules are still used at other devices because their updates have yet to arrive.

Sukapuram et al. [174], [175] introduce a timestamp in the packet header that contains the sending time of a packet. When switches receive a packet during an update period, they compare the timestamps of the packet and the update to determine whether the packet was sent before the update and, thus, whether old rules should be used for forwarding.

Liu et al. [176] introduce a mechanism where once a packet is matched against a specific forwarding rule, it cannot be matched downstream on a rule that is older. To that end, the packet header contains a timestamp field that records when the last applied forwarding rule has been updated. If the packet is matched against an older rule, the packet is dropped; otherwise, the timestamp is updated and the packet is forwarded.

Ez-Segway [177] facilitates updating by including data plane devices in the update process. When a data plane device receives an update, it determines which of its neighbors is affected by the update as well, and forwards the update to that neighbor. This prevents loops and black holes.

TableVisor [178] is a transparent proxy layer between the control plane and data plane. It provides an abstraction from heterogeneous data plane devices. This facilitates the configuration of data plane switches with different properties, e.g., forwarding table size.

Molero et al. [179] propose to offload tasks from the control plane to the data plane. They show that programmable data planes are able to run typical control plane operations like failure detection and notification, and connectivity retrieval. They discuss trade-offs, limitations, and future research opportunities.

VIII. APPLIED RESEARCH DOMAINS: OVERVIEW

In the following sections, we give an overview of applied research conducted with P4. We consider literature until the end of 2020, including journal papers, conference papers, workshop papers, and preprints. We categorize the works into the following domains: monitoring (Section IX), traffic management and congestion control (Section X), routing and forwarding (Section XI), advanced networking (Section XII), network security (Section XIII), and other applied research domains (Section XIV). Figure 25 depicts all sections with their corresponding subsections.

For each applied research domain, we categorize the publications into more specific subsections and summarize their key points. We also categorize the publications with regard to their publication year, prototype availability, target platforms, and source code availability.

In Table IV, we depict the publication statistics for the papers that fall into applied research. Out of the 367 scientific publications we surveyed in this work (see Section I), 241 fall in the area of applied research. Of those research papers, 68 were published in 2018 or before, 80 in 2019, and 93 in 2020. 60 of the 241 research publications released the source code of their prototype implementations.

Venue                                         Publications

Journals                                      39
  IEEE ACCESS                                 8
  IEEE/ACM ToN                                7
  IEEE TNSM                                   5
  JNCA                                        4
  IEEE JSAC                                   2
  Computer Networks                           2
  Miscellaneous                               11

Conferences                                   169
  ACM SOSR                                    14
  IEEE NFV-SDN                                12
  IEEE ICNP                                   11
  IEEE ICC                                    10
  ACM SIGCOMM                                 10
  IEEE/IFIP NOMS                              8
  USENIX NSDI                                 7
  ACM CoNEXT                                  7
  IEEE NetSoft                                7
  IEEE INFOCOM                                6
  ACM/IEEE ANCS                               5
  IFIP Networking                             5
  IEEE GLOBECOM                               4
  CNSM                                        4
  USENIX ATC                                  3
  IEEE CloudNet                               3
  APNOMS                                      3
  IFIP/IEEE IM                                3
  IEEE/ACM INDIS                              2
  IEEE ICDCS                                  2
  IEEE LANMAN                                 2
  OFC                                         2
  ICIN                                        2
  IEEE MILCOM                                 2
  Miscellaneous                               35

Workshops                                     33
  EuroP4                                      10
  Morning Workshop on In-Network Computing    5
  SPIN                                        3
  ACM HotNets                                 3
  ACM CoNEXT ENCP                             2
  Miscellaneous                               10

TABLE IV: Statistics of scientific publications regarding applied research conducted with P4.

[Figure 25: tree of the applied research domains with their subsections as leaves]

Monitoring (Section IX): Detection of Heavy Hitters, Flow Monitoring, Sketches, In-Band Network Telemetry, DSL-Based Monitoring Systems, Path Tracking, Other Fields of Application
Traffic Management and Congestion Control (Section X): Data Center Switching, Load Balancing, Congestion Notification, Traffic Scheduling, Traffic Aggregation, Active Queue Management (AQM), Traffic Offloading
Routing and Forwarding (Section XI): Source Routing, Multicast, Publish/Subscribe Systems, Named Data Networks, Data Plane Resilience, Other Fields of Application
Advanced Networking (Section XII): Cellular Networks (4G/5G), Internet of Things (IoT), Industrial Networking, Time-Sensitive Networking (TSN), Network Function Virtualization (NFV), Service Function Chaining (SFC)
Network Security (Section XIII): Firewalls, Port Knocking, DDoS Attack Mitigation, Intrusion Detection Systems (IDS), Connection Security, Other Fields of Application
Miscellaneous Applied Research Domains (Section XIV): Network Coding, Distributed Algorithms, State Migration, Application Support

Fig. 25: Overview graph of addressed application domains. Each addressed application domain is covered by a separate section. The leaves of the graph point to the respective subsection.

IX. APPLIED RESEARCH DOMAINS: MONITORING

We describe applied research on detection of heavy hitters, flow monitoring, sketches, in-band network telemetry, and other areas of application. Table V shows an overview of all the work described.

A. Detection of Heavy Hitters

Heavy hitters [258] (or "elephant flows") are large traffic flows that are the major source of network congestion. Detection mechanisms aim at identifying heavy hitters to perform extra processing, e.g., queuing, flow rate control, and traffic engineering.

HashPipe [180] integrates a heavy hitter detection algorithm entirely on the P4 data plane. A pipeline of hash tables acts as a counter for detected flows. To meet memory constraints, the number of flows that can be stored is limited. When a new flow is detected, it replaces the flow with the lowest count. Thus, light flows are replaced and heavy flows can be detected by a high count. Lin et al. [182] describe an enhanced version of the algorithm.

Popescu et al. [183] introduce a heavy hitter detection mechanism. The controller installs TCAM entries for specific source IP prefixes on the switch. If one of these entries matches more often than a threshold during a given time frame, the entry is split into two entries with a larger prefix size. This procedure is repeated until the configured granularity is reached.

Harrison et al. [184] present a controller-based and distributed detection scheme for heavy hitters. The authors make use of counters for the match key values, e.g., source and destination IP pair or 5-tuple, that are maintained by P4 switches. If a counter exceeds a certain threshold, the P4 switch sends a notification to the controller. The controller generates more accurate status reports by combining the notifications received from the switches.

Kucera et al. [185] describe a system for detecting traffic aggregates. The authors propose a novel algorithm that supports hierarchical heavy hitter detection, change detection, and super-spreader detection. The complete mechanism is implemented on the P4 data plane and uses push notifications to a controller.

IDEAFIX [186] is a system that detects elephant flows at edge switches of Internet exchange point networks. The proposed system analyzes flow features, stores them with hash keys as indices in P4 registers, and compares them to thresholds for classification.

Turkovic et al. [187] propose a streaming approach for detecting heavy hitters via sliding windows that are implemented in P4. According to the authors, interval methods that are typically used to detect heavy hitters are not suitable for programmable data planes because of high hardware resource requirements, poor accuracy, or a need for too much intervention by the control plane.

Ding et al. [188] propose an architecture for network-wide heavy hitter detection. The authors' main focus is on hybrid SDN/non-SDN networks where programmable devices are deployed only partially. To that end, they also present an algorithm for an incremental deployment of programmable devices with the goal of maximizing the number of network flows that can be monitored.
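To make the replace-the-lowest-count idea behind data plane heavy hitter detection more concrete, the following Python sketch models a small multi-stage hash table. It is a simplified illustration, not the HashPipe algorithm of [180]: the class name `HeavyHitterTable`, the CRC32-based stage hashes, and the table dimensions are assumptions, and the model inspects all stages before writing, which a single-pass hardware pipeline cannot do.

```python
import zlib


class HeavyHitterTable:
    """Simplified multi-stage hash table for heavy hitter counting.

    Assumption: unlike a real P4 pipeline, this model may look at all
    stages before it decides where to write.
    """

    def __init__(self, stages=2, slots_per_stage=8):
        self.stages = stages
        self.slots = slots_per_stage
        # Each slot holds [flow_key, count], or None when it is empty.
        self.tables = [[None] * slots_per_stage for _ in range(stages)]

    def _index(self, stage, key):
        # Stage-specific hash: CRC32 over the stage number and the key.
        return zlib.crc32(f"{stage}:{key}".encode()) % self.slots

    def update(self, key):
        candidates = [(s, self._index(s, key)) for s in range(self.stages)]
        # 1) Flow already tracked: increment its counter.
        for s, i in candidates:
            slot = self.tables[s][i]
            if slot is not None and slot[0] == key:
                slot[1] += 1
                return
        # 2) Free slot available: start tracking the new flow.
        for s, i in candidates:
            if self.tables[s][i] is None:
                self.tables[s][i] = [key, 1]
                return
        # 3) All candidate slots taken: evict the flow with the lowest count.
        s, i = min(candidates, key=lambda c: self.tables[c[0]][c[1]][1])
        self.tables[s][i] = [key, 1]

    def top(self, n=1):
        entries = [tuple(slot) for table in self.tables for slot in table if slot]
        return sorted(entries, key=lambda e: e[1], reverse=True)[:n]


def demo():
    table = HeavyHitterTable(stages=2, slots_per_stage=8)
    # One heavy flow interleaved with 50 single-packet flows.
    table.update("heavy")
    table.update("heavy")
    for n in range(50):
        table.update(f"light-{n}")
        for _ in range(20):
            table.update("heavy")
    return table.top(1)
```

In this toy run, the 16 slots cannot hold all 51 flows, so light flows keep evicting each other while the heavy flow retains its slot and its full count. With less favorable arrival orders, an evicted heavy flow would restart at a count of one, which is the accuracy cost of bounded memory.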

B. Flow Monitoring

In flow monitoring, traffic is analyzed on a per-flow level. Network devices are configured to export per-flow information, e.g., packet counters, source and target IP addresses, ports, or protocol types, as flow records to a flow collector. These flow records are often duplicates of network packets without payload data. The flow collector then performs centralized analysis on this data. The three most widely deployed protocols are NetFlow [259], sFlow [260], and IPFIX [261].

Title                         Year     Targets                   Code

Detection of Heavy Hitters (Section IX-A)
HashPipe [180]                2017     bmv2                      [181]
Lin et al. [182]              2019     Tofino
Popescu et al. [183]          2017     -
Harrison et al. [184]         2018     Tofino
Kucera et al. [185]           2020     bmv2
IDEAFIX [186]                 2018     -
Turkovic et al. [187]         2019     Netronome
Ding et al. [188]             2020     bmv2                      [189]

Flow Monitoring (Section IX-B)
TurboFlow [190]               2018     Tofino, Netronome         [191]
* Flow [192]                  2018     Tofino                    [193]
Hill et al. [194]             2018     bmv2
FlowStalker [195]             2019     bmv2
ShadowFS [196]                2020     bmv2
SpiderMon [197]               2020     bmv2
ConQuest [198]                2019     Tofino
Zhao et al. [199]             2019     bmv2, Tofino

Sketches (Section IX-C)
SketchLearn [200]             2018     Tofino                    [201]
MV-Sketch [202]               2020     bmv2, Tofino              [203]
Hang et al. [204]             2019     Tofino
UnivMon [205]                 2016     p4c-behavioural
Yang et al. [206], [207]      2018/19  Tofino                    [208]
Pereira et al. [209]          2017     bmv2
Martins et al. [210]          2018     bmv2
Lai et al. [211]              2019     Tofino
Liu et al. [212]              2020     Tofino
SpreadSketch [213]            2020     Tofino                    [214]

In-Band Network Telemetry (Section IX-D)
Vestin et al. [215]           2019     Netronome
Wang et al. [216]             2019     Tofino
IntOpt [217]                  2019     P4FPGA
Jia et al. [218]              2020     bmv2                      [219]
Niu et al. [220]              2019     Tofino, Netronome
CAPEST [221]                  2020     bmv2                      [222]
Choi et al. [223]             2019     bmv2
Sgambelluri et al. [224]      2020     bmv2
Feng et al. [225]             2020     Netronome
IntSight [226]                2020     bmv2, NetFPGA-SUME        [227]
Suh et al. [228]              2020     -

DSL-Based Monitoring Systems (Section IX-E)
Marple [229], [230]           2017     bmv2                      [231]
MAFIA [232]                   2019     bmv2                      [233]
Sonata [234]                  2018     bmv2, Tofino              [235]
Teixeira et al. [236]         2020     bmv2, Tofino

Path Tracking (Section IX-F)
UniRope [237]                 2018     bmv2, PISCES
Knossen et al. [238]          2019     Netronome
Basuki et al. [239]           2020     bmv2

Other Areas of Application (Section IX-G)
BurstRadar [240]              2018     Tofino                    [241]
Dapper [242]                  2017     -
He et al. [243]               2018     Tofino
Riesenberg et al. [244]       2019     bmv2                      [245]
Wang et al. [246]             2020     Tofino
P4STA [247]                   2020     bmv2, Netronome           [248]
Hark et al. [249]             2019     -
P4Entropy [250]               2020     bmv2                      [251]
Taffet et al. [252]           2019     bmv2
NetView [253]                 2020     bmv2, Tofino
FastFE [254]                  2020     Tofino
Unroller [255]                2020     bmv2, Netcope P4-to-VHDL
Hang et al. [256]             2019     Tofino
FlowSpy [257]                 2019     bmv2

TABLE V: Overview of applied research on monitoring (Section IX).

TurboFlow [190] is a flow record generator designed for P4 switches that does not have to make use of sampling or mirroring. The data plane generates micro-flow records with information about the most recent packets of a flow. On the CPU module of the switch, those micro-flow records are aggregated and processed into full flow records.

“* Flow” [192] partitions measurement queries between the data plane and a software component. A switching ASIC computes grouped packet vectors that contain a flow identifier and a variable set of packet features, e.g., packet size and timestamps, while the software component performs aggregation. “* Flow” supports dynamic and concurrent measurement applications, i.e., measurement applications that operate on the same flows without impacting each other.

Hill et al. [194] implement Bloom filters on P4 switches to prevent sending duplicate flow samples. A Bloom filter is a probabilistic data structure that can be used to check whether an entry is present in a set or not. It is possible to add elements to that set, but it is not possible to remove entries from it. For flow tracking, Bloom filters test if a flow has been seen before without control plane interaction.
Thereby, only flow data of flows that have not been seen before is forwarded to the collector.

FlowStalker [195] is a flow monitoring system running on the P4 data plane. The monitoring operations on a packet are divided into two phases: a proactive phase that identifies a flow and keeps a per-flow packet counter, and a reactive phase that runs for large flows only and gathers metrics of the flow, e.g., byte counts and packet sizes. The controller gathers information from a cluster of switches by injecting a crawler packet at one switch that travels through the cluster.

ShadowFS [196] extends FlowStalker with a mechanism to increase the throughput of the monitored flows. It achieves this by dividing forwarding tables into two tables, a faster and a slower one. The most utilized flows are moved to the faster table if necessary.

SpiderMon [197] monitors network performance and debugs performance failures inside the network with little overhead. To that end, SpiderMon monitors every flow in the data plane and recognizes if the accumulated latency exceeds a certain threshold. Furthermore, SpiderMon is able to trace back the path of interfering flows, which allows analyzing the cause of the performance degradation.

ConQuest [198] is a data plane mechanism to identify flows that occupy large portions of buffers. Switches maintain snapshots of queues in registers to determine the contribution of the flow of a received packet to queue occupancy.

Zhao et al. [199] implement flow monitoring using hash tables. Using a novel strategy for collision resolution and record promotion, accurate records for elephant flows and summarized records for other flows are stored.

C. Sketches

Flow monitoring as described in Section IX-B requires high sampling rates to produce sufficiently detailed data. As an alternative, streaming algorithms process sequential data streams and are subject to different constraints like limited memory or processing time per item. They approximate the current network status based on summaries derived from the data stream. The streaming algorithms output so-called sketches that contain summarized information about selected properties of the last n packets of a flow.

SketchLearn [200] is a sketch-based approach to track the frequency of flow records. It features multilevel sketches that aim for small memory usage, fast per-packet processing, and real-time response. Rather than finding the perfect resource configuration for measurement traffic and regular traffic, SketchLearn characterizes the statistical error of resource conflicts based on Gaussian distributions. The learned properties are then used to increase the accuracy of the approximated measurements.

Tang et al. [202] present MV-Sketch, a fast and compact invertible sketch. MV-Sketch leverages the idea of majority voting to decide whether a flow is a heavy hitter or heavy changer. Evaluations show that MV-Sketch achieves a 3.38 times higher throughput than existing invertible sketches.

Hang et al. [204] try to solve the problem of inconsistency when a controller needs to collect the data from sketches on one or more switches. As accessing and clearing the sketches on the switches is always subject to latency, not all sketches are reset at the same time, and there might be some delay between accessing and clearing the sketches. The authors propose to use two asymmetric sketches on the switches that are used in an interleaved way. Furthermore, the authors propose to use a distributed control plane to keep latency low.

UnivMon [205] is a flow monitoring system based on sketches.
After sampling the traffic, the data plane produces sketches and determines the top-k heaviest flows by comparing the number of sketches for each flow. Those flows are passed to the control plane, which processes the data for the specific application.

Yang et al. [206], [207] propose to adapt sketches according to certain traffic characteristics to increase data accuracy, e.g., during congestion or distributed denial of service (DDoS) attacks. The mechanism is based on compressing and merging sketches when resources in the network are limited due to high traffic volume. During periods with high packet rates, only the information of elephant flows is recorded to trade accuracy for higher processing speed.

Pereira et al. [209] propose a secured version of the Count-Min sketch. They replace its hash function with a cryptographic hash function and provide a way for secret key renewal.

Martins et al. [210] introduce sketches for multi-tenant environments. The authors implement bitmap and counter-array sketches using a new probabilistic data structure called BitMatrix that consists of multiple bitmaps that are stored in a single P4 register.

Lai et al. [211] use a sketch-based approach to estimate the entropy of network traffic. The authors use CRC32 hashes of header fields as match keys for match-action tables and subsequently update k-dimensional data sketches in registers. The content of the registers is then processed by the control plane CPU, which calculates the entropy value.

Liu et al. [212] use sketches for performance monitoring. They introduce lean algorithms to measure metrics like loss or out-of-order packets.

SpreadSketch [213] is a sketch data structure to detect superspreaders. The sketch data structure is invertible, i.e., it is possible to extract the identification of superspreaders from the sketch at the end of an epoch.

D. In-Band Network Telemetry

Barefoot Networks, Arista, Dell, Intel, and VMware specified in-band network telemetry (INT) specifically for P4 [262]. It uses a pure data plane implementation to collect telemetry data from the network without any intervention by the control plane. INT is the main focus of the Applications WG [263] of the P4 Language Consortium. Instructions for INT-enabled devices that serve as traffic sources are embedded as header fields either into normal packets or into dedicated probe packets. Traffic sinks retrieve the results of instructions to traffic sources. In this way, traffic sinks have access to information about the data plane state of the INT-enabled devices that forwarded the packets containing the instructions for traffic sources. The authors of the INT specification name network troubleshooting, advanced congestion control, advanced routing, and network data plane verification as examples for high-level use cases.

In two demos, INT was used for diagnosing the cause of latency spikes during HTTP transfers [264] and for enforcing QoS policies on a per-packet basis across a metro network [265].

Vestin et al. [215] enhance INT traffic sinks by event detection. Instead of exporting telemetry items of all packets to a stream processor, exporting has to be triggered by an event. Furthermore, they implement an INT report collector for Linux that can stream telemetry data to a Kafka cluster.

Wang et al. [216] design an INT system that can track which rules in MATs matched on a packet. The resulting data is stored in a database to facilitate visualization in a web UI.

IntOpt [217] uses INT to monitor service function chains. The system computes minimal monitoring flows that cover all desired telemetry demands, i.e., the number of INT sources, sinks, and forwarding nodes that are covered by this flow is minimal. IntOpt uses active probing, i.e., monitoring probes for the monitoring flows are periodically inserted into the network.

Jia et al. [218] use INT to detect gray failures in data center networks using probe packets. Gray failures are failures that happen silently and without notification.

Niu et al. [220] design a multilevel INT system for IP-over-optical networks. Their goal is to monitor both the IP network and the optical network at the same time.
To that end, they implement optical performance monitors for bandwidth-variable wavelength selective switches. Their measurements can be queried by a P4 switch that is connected directly to them.

CAPEST [221] leverages P4-enabled switches to estimate the network capacity and available bandwidth of network links. The approach is passive, i.e., it does not disturb the network. A controller sends INT probe packets to trigger statistical analysis and export results.

Choi et al. [223] leverage INT for run-time performance monitoring, verification, and healing of end-to-end services. P4-capable switches monitor the network based on INT information and the distributed control plane verifies that SLAs and other metrics are fulfilled. They leverage metric dynamic logic (MDL) to specify formal assertions for SLAs.

Sgambelluri et al. [224] propose a multi-layer monitoring system that uses an OpenConfig NETCONF agent for the optical layer and P4-based INT for the packet layer. In their prototype, they use INT to measure the delay of packets by computing the processing time at each switch.

Feng et al. [225] implement an INT sink for Netronome Smart NICs. After parsing the INT headers using P4, they use algorithms written in C to perform INT tasks like aggregation and notification. Compared to a pure P4 implementation, this increases the performance.

IntSight [226] is a system for detecting and analyzing violations of service-level objectives (SLOs). SLOs are performance guarantees towards a network, e.g., concerning bandwidth and latency. IntSight uses INT to monitor the performance of the network during a specific period of time. Egress devices gather this information and produce a report at the end of the period if an SLO has been violated.

Suh et al. [228] explore how a sampling mechanism can be added to INT. Their solution supports rate-based and event-based sampling. Based on these sampling strategies, INT headers are only added to a fraction of the packets to reduce overhead.

E. DSL-Based Monitoring Systems

Monitoring tasks can often be broken down into a set of several basic operations, e.g., map, filter, or groupby. A domain-specific language (DSL) allows combining these basic operations into more complex tasks.

Marple [229], [230] is a performance query language that supports existing constructs like map, filter, groupby, and zip. A query compiler translates the queries either to P4 or to a simulator for programmable switch hardware. Stateless constructs of the query language, e.g., filters, are executed on the data plane. Stateful constructs, e.g., groupby, use a programmable key-value store that is split between a fast on-chip SRAM cache and a large off-chip DRAM backing store. The results are streamed from the switch to a collection server.

MAFIA [232] is a DSL to describe network measurement tasks. The authors identify several fundamental primitive operations, e.g., match, tag, timestamp, sketch, or counter. MAFIA is a high-level language to describe more complex measurement tasks composed of those primitives. The authors provide a Python-based compiler that translates MAFIA code into a P4 program in P4_14 or P4_16 for a PISA-based P4 target.

Sonata [234] is a query-driven telemetry system. It provides a query interface with common operators like map and reduce that can be applied to arbitrary packet fields. Sonata combines the capabilities of both programmable switches and stream processors. The queries are partitioned between the programmable switches and the stream processors to reduce the load on the stream processors. Teixeira et al. [236] extend the Sonata prototype by functionalities to monitor the properties of packet processing inside switches, e.g., delay.

F. Path Tracking

In path tracking, or packet trajectory tracing, information about the path a packet has taken in a network is gathered.

UniRope [237] consists of two different algorithms for packet trajectory tracing that can be selected dynamically to choose the trade-off between accuracy and efficiency. These two algorithms are compact hash matching and consecutive bits filling. With compact hash matching, the forwarding switch calculates a hash value and stores it in the packet. With consecutive bits filling, the packet trajectory is recorded in the packet hop by hop and reconstructed at the controller.

Knossen et al. [238] present two different approaches for path tracking in P4. In hop recording, all forwarding P4 nodes record their ID in the header of the target packet. The last node can then reconstruct the path. In forwarding state logging, the first P4 node records the current version of the global forwarding state of the network and its node identifier in a header of the target packet. If the version of the global forwarding state does not change while the packet flows through the network, the last P4 node in the network can reconstruct the path using the information in the header.

Basuki et al. [239] propose a privacy-aware path-tracking mechanism. Their goal is that the trajectory information in the packets cannot be used to draw conclusions about the network topology or routing information. They achieve this by recording the information in an in-packet Bloom filter.
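The idea of recording a trajectory in an in-packet Bloom filter can be illustrated with a short Python model. This is only a sketch under stated assumptions and not the scheme of [239]: the 64-bit filter size, the three SHA-256-derived bit positions per switch, and all function names are hypothetical choices made for illustration.

```python
import hashlib

FILTER_BITS = 64   # hypothetical size of the in-packet bit field
NUM_HASHES = 3     # hypothetical number of hash functions


def bit_positions(switch_id):
    """Derive NUM_HASHES bit positions for a switch identifier."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{switch_id}".encode()).digest()
        positions.append(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return positions


def record_hop(packet_filter, switch_id):
    """Return the filter after a switch has added itself (bits are only set)."""
    for pos in bit_positions(switch_id):
        packet_filter |= 1 << pos
    return packet_filter


def maybe_on_path(packet_filter, switch_id):
    """True if the switch may be on the path; False means it definitely is not."""
    return all((packet_filter >> pos) & 1 for pos in bit_positions(switch_id))


def trace(path):
    """Carry an initially empty filter along a list of switch identifiers."""
    packet_filter = 0
    for switch_id in path:
        packet_filter = record_hop(packet_filter, switch_id)
    return packet_filter
```

A verifier that knows the candidate switch identifiers can test each of them with `maybe_on_path`. Switches on the path are always reported, whereas switches off the path may occasionally be reported as well (false positives), and the bit field itself contains neither identifiers nor hop order.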

G. Other Fields of Application

BurstRadar [240] is a system for microburst detection for data center networks that runs directly on P4 switches. If queue-induced delay is above a certain threshold, BurstRadar reports a microburst and creates a snapshot of the telemetry information of involved packets. This telemetry information is then forwarded to a monitoring server. As it is not possible to gather telemetry information of packets that are already part of the egress queue, the telemetry information of all packets and their corresponding egress port is temporarily stored in a ring buffer that is implemented using P4 registers.

Dapper [242] is a P4 tool to evaluate TCP. It implements TCP in P4 and analyzes header fields, packet sizes, and timestamps of data and ACK packets to detect congestion. Then, flow-dependent information is stored in registers.

He et al. [243] propose an adaptive expiration timeout mechanism for flow entries in P4 switches. The switches implement a mechanism to detect the last packet of a TCP flow. In case of a match, the switch notifies the controller to delete the corresponding flow entries.

Riesenberg et al. [244] implement alternate marking performance measurement (AM-PM) for P4. AM-PM measures delay and packet loss in-band in a network using only one or two bits of overhead per packet. These bits are used for coordination and signalling between measurement points (MPs).

Wang et al. [246] describe how TCP-friendly meters can be designed and implemented for P4-based switches. According to their findings, meters in commercial switches interact with TCP streams in such a way that these streams can only reach about 10% of the target rate. The experimental evaluation of their TCP-friendly meters shows achieved rates of up to 85% of the target rate.

P4STA [247] is an open source framework that combines software-based traffic load generation with accurate hardware packet timestamps. Thereby, P4STA aggregates multiple traffic flows to generate high traffic load and leverages programmable platforms.

Hark et al. [249] use P4 to filter data plane measurements. To save resources, only relevant measurements are sent to the controller. The authors implement a prototype and demonstrate the system by filtering measurements for a bandwidth forecast application.

P4Entropy [250] presents an algorithm to estimate the entropy of network traffic within the P4 data plane. To that end, the authors also develop two new algorithms, P4Log and P4Exp, to estimate logarithms and exponential functions within the data plane.

Taffet et al. [252] describe a P4-based implementation of an in-band monitoring system that collects information about the path of a packet and whether it encountered congestion. For this purpose, the authors repurpose previously unused fields of the IP header.

NetView [253] is a network telemetry framework that uses proactive probe packets to monitor devices. Telemetry targets, frequency, and characteristics can be configured on demand by administrators. The probe packets traverse arbitrary paths by using source routing.

FastFE [254] is a system for offloading feature extraction, i.e., deriving certain information from network traffic, for machine learning (ML)-based traffic analysis applications. Policies for feature extraction are defined as sequential programs. A policy enforcement engine translates these policies into primitives for either a programmable switch or a program running on a commodity server.

Unroller [255] detects routing loops in the data plane in real time. It achieves this by encoding a subset of the path that a packet takes into the packet.

Hang et al. [256] use a time-based sliding window approach to measure packet rates. The goal is to record statistics entirely inside the data plane without having to use the CPU of a switch. Their approach is able to measure traffic size without sampling.

FlowSpy [257] is a network monitoring framework that uses load balancing. Different monitoring tasks are distributed among all available switches by an ILP solver.
This reduces the workload on single switches in contrast to monitoring frameworks that perform all monitoring tasks on ingress or egress switches only.

X. APPLIED RESEARCH DOMAINS: TRAFFIC MANAGEMENT AND CONGESTION CONTROL

We describe applied research on data center switching, load balancing, congestion notification, traffic scheduling, traffic aggregation, active queue management (AQM), and traffic offloading. Table VI shows an overview of all the work described.

A. Data Center Switching

Trellis [266], [267] is an open-source multipurpose L2/L3 spine-leaf switch fabric for data center networks. It is designed to run on whitebox switches in conjunction with the ONOS controller where its main functionality is implemented. It supports typical data center functionality such as bridging using VLANs, routing (IPv4/IPv6 unicast/multicast routing, MPLS segment routing), and vRouter functionality (BGPv4/v6, static routes, route black-holing). Trellis is part of the CORD platform that leverages SDN, network function virtualization (NFV), and Cloud technologies for building agile data centers for the network edge.

DC.p4 [269] implements typical features of data center switches in P4. The list of features includes support for VLAN, NVGRE, VXLAN, ECMP, IP forwarding, access control lists (ACLs), packet mirroring, MAC learning, and packet-in/-out messages to the control plane.

Fabric.p4 [267], [271] is the underlying reference data plane pipeline implemented in P4. By introducing support for P4 switches, the authors aim at increasing the platform heterogeneity for the CORD fabric. Fabric.p4 is currently based on the V1Model switch architecture, but support for PSA is planned. It is inspired by the OpenFlow data plane abstraction (OF-DPA) and currently supports L2 bridging, IPv4/IPv6 unicast/multicast routing, and MPLS segment routing. Fabric.p4 comes with capability profiles such as fabric (basic profile), spgw (S/PGW), and INT. For control plane interaction, ONOS is extended by P4Runtime.

B. Load Balancing

SHELL [273] implements stateless application-aware load balancing in P4. A load balancer forwards new connections to a set of randomly chosen application instances by adding a segment routing (SR) header. Each application instance makes a local decision to either decline or accept the connection attempt. After connection initiation, the client includes a previously negotiated identifier in all subsequent packets. In the prototypical implementation, the authors use TCP time stamps for communicating the identifier; alternatives are identifiers of QUIC or TCP sequence numbers.

| Title | Year | Targets | Code |
|---|---|---|---|
| Data Center Switching (Section X-A) | | | |
| Trellis [266], [267] | 2019 | bmv2 | [268] |
| DC.p4 [269] | 2015 | bmv2 | [270] |
| Fabric.p4 [271] | 2018 | bmv2 | [272] |
| Load Balancing (Section X-B) | | | |
| SHELL [273] | 2018 | NetFPGA-SUME | |
| SilkRoad [274] | 2017 | Tofino | |
| HULA [275] | 2016 | - | |
| MP-HULA [276] | 2018 | - | |
| Chiang et al. [277] | 2019 | bmv2 | |
| W-ECMP [278] | 2018 | bmv2 | |
| DASH [279] | 2020 | bmv2 | |
| Pizzutti et al. [280], [281] | 2018/20 | bmv2 | |
| LBAS [282] | 2020 | Tofino | |
| DPRO [283] | 2020 | bmv2 | |
| Kawaguchi et al. [284] | 2019 | bmv2 | |
| AppSwitch [285] | 2017 | PISCES | |
| Beamer [286] | 2018 | bmv2, NetFPGA-SUME | [287] |
| Congestion Notification (Section X-C) | | | |
| P4QCN [288] | 2019 | bmv2 | |
| Jiang et al. [289] | 2019 | - | |
| EECN [290] | 2020 | bmv2 | |
| Chen et al. [291] | 2020 | bmv2 | |
| Laraba et al. [292] | 2020 | bmv2 | |
| Traffic Scheduling (Section X-D) | | | |
| Sharma et al. [293] | 2018 | bmv2 | |
| Cascone et al. [294] | 2017 | - | |
| Bhat et al. [295] | 2019 | bmv2 | |
| Kfoury et al. [296] | 2019 | bmv2 | |
| Chen et al. [297] | 2019 | Tofino | |
| Lee et al. [298] | 2019 | bmv2 | |
| Traffic Aggregation (Section X-E) | | | |
| Wang et al. [299] | 2020 | Tofino | |
| RL-SP-DRR [300] | 2019 | bmv2 | |
| Active Queue Management (AQM) (Section X-F) | | | |
| Turkovic et al. [301] | 2018 | bmv2, Netronome | |
| P4-Codel [302] | 2018 | bmv2 | [303] |
| P4-ABC [304] | 2019 | bmv2 | |
| P4air [305] | 2020 | bmv2, Tofino | |
| Fernandes et al. [306] | 2020 | bmv2 | |
| Wang et al. [307] | 2018 | bmv2, Tofino | |
| SP-PIFO [308] | 2020 | Tofino | |
| Traffic Offloading (Section X-G) | | | |
| Andrus et al. [309] | 2019 | - | |
| Ibanez et al. [310] | 2019 | NetFPGA-SUME | |
| Kfoury et al. [311] | 2020 | Tofino | |
| Falcon [312] | 2020 | Tofino | |
| Osiński et al. [313] | 2020 | Tofino | |

TABLE VI: Overview of applied research on traffic management and congestion control (Section X).

SilkRoad [274] implements stateful load balancing on P4 switches. SilkRoad implements two tables for stateful processing. One table maps virtual IP addresses of services to server instances, another table records active connections identified by hashes of 5-tuples to forward subsequent flows. It applies a Bloom filter to identify new connection attempts and to record those requests in registers to remember client requests that arrive while the pool of server instances changes. In [314], the accompanying demo is described.

HULA [275] implements a link load-based distance vector routing mechanism. Switches in HULA do not maintain the state for every path but the next hops. They send out probes to gather link utilization information. Probe packets are distributed throughout the network on node-specific multicast trees. The probes have a header that contains a destination field and the currently best path utilization to that destination. When a node receives a probe, it updates the best path utilization if necessary, sends one packet clone upstream back to the origin, and forwards copies along the multicast tree further downstream.
This way the origin will receive multiple probe packets with different path utilization to a specific destination. Then, flowlets are forwarded onto the best currently available path to their destination.

MP-HULA [276] extends HULA by using load information for n best next hops and compatibility with multipath TCP (MP-TCP). It tracks subflows of MP-TCP with individual flowlets per subflow. MP-HULA aims at distributing those subflows on different paths to aggregate bandwidth. To that end, it is necessary to keep track of the best n next hops, which is done with additional registers and forwarding rules.

Chiang et al. [277] propose a cost-effective congestion-aware load balancing scheme (CCLB). In contrast to HULA, CCLB replaces only the leaf switches with programmable switches, and thus is more cost-effective. They leverage Explicit Congestion Notification (ECN) information in probe packets to recognize congestion in the network and to adapt the load balancing. CCLB further uses flowlet forwarding and is implemented for the bmv2.

W-ECMP [278] is an ECMP-based load balancing mechanism for data centers implemented for P4 switches. Weighted probabilities based on path utilization are used to randomly choose a path so that congestion is avoided. A local agent on each switch computes link utilization for the ports. Regular traffic carries an additional custom packet header that keeps track of the current maximum link utilization on a path. Based on the maximum link utilization, the switches update port weights if necessary.

DASH [279] is an adaptive weighted traffic splitting mechanism that works entirely in the data plane. In contrast to popular weighted traffic splitting strategies such as WCMP, DASH does not require multiple hash table entries. DASH splits traffic based on link weights by partitioning the hash space into unique regions.

Pizzutti et al. [280], [281] implement congestion-aware load balancing for flowlets on P4 switches.
Flowlets are bursts of packets that are separated by a time gap, e.g., as caused by factors such as TCP dynamics, buffer availability, or link congestion. For distributing subflows on different paths, the congestion state of the last route is stored in a register.

LBAS [282] implements a load balancer to minimize the processing latency at both load balancers and application servers. LBAS does not only reduce the processing latency at load balancers but also takes the application servers' state into account. It is implemented for the Tofino and its average response time is evaluated.

DPRO [283] combines INT with traffic engineering (TE) and reinforcement learning (RL). Network statistics, such as link utilization and switch load, are gathered using an adapted INT approach. An RL agent inside the controller adapts the link weights based on the minimization of a max-link-utilization objective.

Kawaguchi et al. [284] implement Unsplittable flow Edge Load factor Balancing (UELB). A controller application monitors the link utilization and computes new optimal paths upon congestion. The path computation is based on the UELB problem. The forwarding is implemented in P4 for the bmv2.

AppSwitch [285] implements a load balancer for key-value storage systems. However, the focus lies on a local agent and the control plane communication with the storage server.

Beamer [286] operates in data centers and prevents interruption of connections when they are load-balanced to a different server. To that end, the Beamer controller instructs the new target server to forward packets of the load-balanced connection to the old target server until the migration phase is over.

C. Congestion Notification

P4QCN [288] proposes a congestion feedback mechanism where network nodes check the egress ports for congestion before forwarding packets. If a node detects congestion, it calculates a feedback value that is propagated upstream. The mechanism clones the packet that caused the congestion, updates the feedback value in the header, changes the origin of the flow, and forwards it as a feedback packet to the sender. The sender adjusts its sending rate to reduce congestion downstream. The authors describe an implementation where bmv2 is extended by P4 externs for floating-point calculations.

Jiang et al. [289] introduce a novel adjusting advertised windows (AWW) mechanism for TCP. The authors argue that the current calculation of the advertised window in the TCP header is inaccurate because the source node does not know the actual capacity of the network. AWW dynamically updates the advertised window of ACK packets to feed back the network capacity indirectly to the source nodes. Each P4 switch calculates the new AWW value and writes it into the packet header.

EECN [290] presents an enhanced ECN mechanism which piggybacks congestion information if the switch notices congestion. To that end, the ECN-Echo bit is set for traversing ACKs as soon as congestion occurs for a given flow. This enables fast congestion notification without the need for additional control traffic.

Chen et al. [291] present QoSTCP, a TCP version with adapted congestion window growth that enables rate limiting. QoSTCP is based on a marking approach similar to ECN. When a flow exceeds a certain rate, the packet gets marked with a so-called Rate-Limiting Notification (RLN) and the congestion window growth is adapted proportionally to the RLN-marked packet rate. Metering and marking are done using P4.

Laraba et al. [292] detect ECN misbehavior with the help of P4 switches. They model ECN as an extended finite state machine (EFSM) and store states and variables in registers. If end hosts do not conform to the specified ECN state machine, packets are either dropped or, if possible, the misbehavior is corrected.
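The ECN-Echo piggybacking idea described for EECN above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the authors' P4 program: the `Packet` and `EcnEchoSwitch` names, the queue-depth threshold, and the rule that a flow's congestion flag is cleared once the queue drops below the threshold are all hypothetical choices made for illustration. A Python set stands in for the register array a P4 switch would use.

```python
from dataclasses import dataclass

ACK_FLAG = 0x10  # TCP ACK flag bit
ECE_FLAG = 0x40  # TCP ECN-Echo flag bit (RFC 3168)


@dataclass
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    tcp_flags: int = 0
    payload_len: int = 0


def flow_key(pkt):
    return (pkt.src, pkt.dst, pkt.sport, pkt.dport)


def reverse_key(pkt):
    return (pkt.dst, pkt.src, pkt.dport, pkt.sport)


class EcnEchoSwitch:
    """Toy switch model; the threshold and clearing rule are assumptions."""

    def __init__(self, queue_threshold):
        self.queue_threshold = queue_threshold
        self.congested = set()  # stands in for a per-flow P4 register array

    def process(self, pkt, queue_depth):
        # Data direction: remember whether this flow currently sees congestion.
        if pkt.payload_len > 0:
            if queue_depth > self.queue_threshold:
                self.congested.add(flow_key(pkt))
            else:
                self.congested.discard(flow_key(pkt))
        # ACK direction: piggyback the notification on ACKs of the reverse flow.
        if pkt.tcp_flags & ACK_FLAG and reverse_key(pkt) in self.congested:
            pkt.tcp_flags |= ECE_FLAG
        return pkt
```

In this model the sender learns about congestion from the next ACK that traverses the switch, so no extra control packets are generated.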

D. Trafﬁc Scheduling

Sharma et al. [293] introduce a mechanism for per-flow fairness scheduling in P4. The concept is based on round-robin scheduling where each flow may send a certain number of bytes in each round. The switch assigns a round number to each arriving packet that depends on the number of bytes the flow has sent in the past.

Cascone et al. [294] introduce bandwidth sharing based on sending rates between TCP senders. P4 switches use statistical byte counters to store the sending rate of each user. Depending on the recorded sending rate of the user, arriving packets are pushed into different priority queues.

Bhat et al. [295] leverage P4 switches to translate application layer header information into link-layer headers for better QoS routing. They use Q-in-Q tunneling at the edge to forward packets to the core network and present a bmv2 implementation for HTTP/2 applications, as HTTP/2 explicitly defines a Stream ID that can directly be translated into Q-in-Q tags.

Kfoury et al. [296] present a method to support dynamic TCP pacing with the aid of network state information. A P4 switch monitors the number of active TCP flows, i.e., it monitors the SYN, SYN-ACK, and ACK flags and notifies senders about the current network state if a new flow starts or another terminates. To that end, they introduce a new header and show by simulations that the overall throughput increases.

Chen et al. [297] present a design for bandwidth management for QoS with SDN and P4-programmable switches. Their design classifies packets based on a two-rate three-color marker and assigns corresponding priorities to guarantee a certain per-flow bandwidth. To that end, they leverage the priority queuing capabilities of P4 switches based on the assigned color. Guaranteed traffic goes to a high-priority queue, best-effort traffic goes to a low-priority queue, and traffic that exceeds its bandwidth is simply dropped.

Lee et al. [298] implement a multi-color marker for bandwidth guarantees in virtual networks. Their objective is to isolate bandwidth consumption of virtual networks and provide QoS for the flows they serve.
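To make the marker-based designs above more concrete, the following Python sketch models a color-blind two-rate three-color marker in the style of RFC 2698 and maps colors to queues as described for Chen et al. [297]. It is an illustrative model under assumptions, not the authors' implementation: the class name, the byte-based rates, the explicit `now` timestamps, and the `QUEUE_FOR_COLOR` mapping are hypothetical. On a P4 target, the metering itself would typically be done by a meter extern rather than by hand-written token buckets.

```python
from dataclasses import dataclass


@dataclass
class TwoRateThreeColorMarker:
    cir: float  # committed information rate in bytes/s
    cbs: float  # committed burst size in bytes
    pir: float  # peak information rate in bytes/s
    pbs: float  # peak burst size in bytes

    def __post_init__(self):
        self.tc = self.cbs  # committed token bucket starts full
        self.tp = self.pbs  # peak token bucket starts full
        self.last = 0.0     # time of the previous update in seconds

    def color(self, size, now):
        """Return the color of a packet of `size` bytes arriving at `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill both buckets, capped at their burst sizes.
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if self.tp < size:
            return "red"      # exceeds the peak rate
        if self.tc < size:
            self.tp -= size
            return "yellow"   # within peak rate, above committed rate
        self.tp -= size
        self.tc -= size
        return "green"        # within the committed rate


# Assumed mapping: guaranteed traffic to the high-priority queue,
# best-effort traffic to the low-priority queue, excess traffic dropped.
QUEUE_FOR_COLOR = {"green": "high", "yellow": "low", "red": None}
```

A caller would look up `QUEUE_FOR_COLOR[marker.color(size, now)]` per packet and drop the packet when the result is `None`.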

E. Trafﬁc Aggregation

Wang et al. [299] introduce aggregation and disaggregation capabilities for P4 switches. To reduce the header overhead in the network, multiple small packets are aggregated into a single packet. They leverage multiple register arrays to store incoming small packets in 32 bit chunks. If enough small packets are stored, a larger packet gets assembled with the aid of multiple recirculations; each recirculation step appends a small packet to the aggregated large packet.

RL-SP-DRR [300] is a combination of strict priority scheduling with rate limitation (RL-SP) and deficit round-robin (DRR). RL-SP ensures prioritization of high-priority traffic while DRR enables fair scheduling among different priority classes. They extend bmv2 to support RL-SP-DRR and evaluate it against strict priority queuing and no active queuing mechanism.

F. Active Queue Management (AQM)

Turkovic et al. [301] develop an active queue management (AQM) mechanism for programmable data planes. The switches are programmed to collect metadata associated with packet processing, e.g., queue size and load, that are used to prevent, detect, and dissolve congestion by forwarding affected flows on an alternate path. Two possible mechanisms for rerouting in P4 are described. In the first mechanism, primary and backup entries are installed in the forwarding tables and, according to the gathered metadata, the suitable action is selected. The second mechanism leverages a local controller on each switch that monitors flows and installs updated forwarding rules when congestion is noticed.

P4-CoDel [302] implements the CoDel AQM mechanism specified in RFC 8289 [315]. CoDel leverages a target and an interval parameter. As long as the queuing delay is shorter than the target parameter, no packets are dropped. If the queuing delay stays above the target for at least the duration of interval, a packet is dropped and the interval parameter is decreased. This procedure is repeated until the queuing delay is under the target threshold again. The interval is then reset to the initial value. To avoid P4 externs, the authors use approximated calculations for floating-point operations.

P4-ABC [304] implements activity-based congestion management (ABC) for P4. ABC is a domain concept where edge nodes measure the activity, i.e., the sending rate, of each user and annotate the value in the packet header. Core nodes measure the average activity of all packets. Depending on the current queue status, the average activity, and the activity value in the packet header, a drop decision is made for each packet to prevent congestion. The P4 implementation for the bmv2 requires externs for floating-point calculations.

P4air [305] attempts to provide more fairness for TCP flows with different congestion control algorithms. To that end, P4air groups flows into different categories based on their congestion control algorithm, e.g., loss-, delay-, and loss-delay-based. Afterwards, the most aggressive flows are punished based on the previous categorization with packet drops, delay increase, or adjusted receive windows. P4air leverages switch metrics and flow reactions, such as queuing delay and sending rate, to determine the congestion control algorithm used by the flows.

Fernandes et al. [306] propose a bandwidth throttling solution in P4. Incoming packets are dropped with a certain probability depending on the incoming rate of the flow and the defined maximum bandwidth. Rates are measured using time windows and byte counters. Fernandes et al. extend the bmv2 for this purpose.

Wang et al. [307] present an AQM mechanism for video streaming. Data packets are classified as base packets (basic image information) or enhancement packets (additional information to improve the image quality). When the queue size exceeds a certain threshold, enhancement packets are preferably dropped.

SP-PIFO [308] features an approximation of Push-In First-Out (PIFO) queues which enables programmable packet scheduling at line rate. SP-PIFO dynamically adapts the mapping between packet ranks and available strict-priority queues.

G. Traffic Offloading

Andrus et al. [309] propose to offload video stream processing of surveillance cameras to P4 switches. The authors propose to offload stream processing for storage to P4 switches. In case the analytics software detects an event, it enables a multistage pipeline on the P4 switch. In the first step, video stream data is replicated. One stream is further sent to the analytics software, the other stream is dedicated to the video storage. The P4 switch filters out control packets and rewrites the destination IP address of all video packets to the video storage.

Ibanez et al. [310] try to tackle the problem of P4's packet-by-packet programming model. Many tasks, such as periodic updates, require either hardware-specific capabilities or control-plane interaction. Processing capabilities are limited to enqueue events, i.e., data plane actions are only triggered if packets arrive. To eliminate this problem, the authors propose a new mechanism for event processing using the P4 language.

Kfoury et al. [311] propose to offload media traffic to P4 switches which act as relay servers. A SIP server receives the connection request, replaces IP and port information with the relay server IP and port, and forwards the request to the receiver. Afterwards, the media traffic is routed through the relay server.

Falcon [312] offloads task scheduling to programmable switches. Job requests are sent to the switch and the switch assigns a task in first-come-first-serve order to the next executor in a pool of computation nodes. Falcon reduces the scheduling overhead by a factor of 26 and increases scheduling throughput by a factor of 25 compared to state-of-the-art schedulers.

Osiński et al. [313] present vBNG, a virtual Broadband Network Gateway (BNG). Some components, such as PPPoE session handling, are offloaded to programmable switches.

XI. APPLIED RESEARCH DOMAINS: ROUTING AND FORWARDING

We describe applied research on source routing, multicast, publish/subscribe systems, named data networking, data plane resilience, and other fields of application. Table VII shows an overview of all the work described.

A. Source Routing

With source routing, the source node defines the processing of the packet throughout the network. To that end, a header stack is often added to the packet to specify the operations the other network devices should execute.

| Title | Year | Targets | Code |
|---|---|---|---|
| Source Routing (XI-A) | | | |
| Lewis et al. [316] | 2018 | bmv2 | [317] |
| Luo et al. [318] | 2019 | bmv2 | [319] |
| Kushwaha et al. [320] | 2020 | Xilinx Virtex-7 | |
| Abdelsalam et al. [321] | 2020 | bmv2 | |
| Multicast (XI-B) | | | |
| Braun et al. [322] | 2017 | bmv2 | [323] |
| Merling et al. [324] | 2018 | bmv2 | [325] |
| Elmo [326] | 2019 | - | [327] |
| PAM [328] | 2020 | bmv2 | |
| Publish/Subscribe Systems (XI-C) | | | |
| Wernecke et al. [329]–[332] | 2018/19 | bmv2 | |
| Jepsen et al. [333] | 2018 | Tofino | |
| Kundel et al. [334] | 2020 | bmv2 | [335] |
| FastReact-PS [336] | 2020 | | |
| Named Data Networks (XI-D) | | | |
| NDN.p4 [337], [338] | 2016/18 | bmv2 | [339], [340] |
| ENDN [341] | 2020 | bmv2 | |
| Data Plane Resilience (XI-E) | | | |
| Sedar et al. [342] | 2018 | bmv2 | [343] |
| Giesen et al. [344] | 2018 | Tofino, Xilinx SDNet | |
| SQR [345] | 2019 | bmv2, Tofino | [346] |
| P4-Protect [347] | 2020 | bmv2, Tofino | [348], [349] |
| Hirata et al. [350] | 2019 | - | |
| Lindner et al. [351] | 2020 | bmv2, Tofino | [352], [353] |
| D2R [354] | 2019 | bmv2 | |
| PURR [355] | 2019 | bmv2, Tofino | |
| Blink [356] | 2019 | bmv2, Tofino | [357] |
| Other Fields of Applications (XI-F) | | | |
| Contra [358] | 2019 | - | |
| Michel et al. [359] | 2016 | bmv2 | |
| Baktir et al. [360] | 2018 | bmv2 | |
| Froes et al. [361] | 2020 | bmv2 | |
| QROUTE [362] | 2020 | bmv2 | |
| Gimenez et al. [363] | 2020 | bmv2 | |
| Feng et al. [364] | 2019 | bmv2 | |
| PFCA [365] | 2020 | bmv2 | |
| McAuley et al. [366] | 2019 | bmv2 | |
| R2P2 [367] | 2019 | Tofino | [368] |

TABLE VII: Overview of applied research on routing and forwarding (Section XI).

Lewis et al. [316] implement a simple source routing mechanism with P4 for the bmv2. The authors introduce a header stack to specify the processing of the packet towards its destination. That header stack is constructed and pushed onto the packet by the source node. Network devices match the header segments to determine how the packet should be processed.

Luo et al. [318] implement segment routing with P4. They introduce a header which contains segments that identify certain operations, e.g., forwarding the packet towards a specific destination or over a specific link, updating header fields, etc. Network nodes process packets according to the topmost segment in the segment routing header and remove it after successful execution.

Kushwaha et al. [320] implement bitstream, a minimalistic programmable data plane for carrier-class networks, in P4 for FPGAs. The focus of bitstream is to provide a programmable data plane while ensuring several carrier-grade properties, like deterministic latencies, short restoration time, and per-service measurements. To that end, the authors implement a source routing approach in P4 which leaves the configuration of the header stack to the control plane.

The authors of [321] show a demo of a segment routing over IPv6 data plane (SRv6) implementation in P4. It leverages the novel uSID instruction set for SRv6 to improve scalability and MTU efficiency.
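The header-stack principle shared by the source routing works above can be sketched in a few lines of Python. The example is a simplified model under assumptions and does not reproduce any of the cited implementations: the `TOPOLOGY` dictionary, the function names, and the choice that each segment is just an egress port are hypothetical. The source pushes the stack, and every switch executes the topmost segment and removes it.

```python
from collections import deque

# Hypothetical topology: node -> {egress port: neighbor}
TOPOLOGY = {
    "s1": {1: "s2", 2: "s3"},
    "s2": {1: "s4"},
    "s3": {1: "s4"},
    "s4": {},
}


def push_segments(payload, ports):
    """Source node: build the header stack; the first port is the topmost segment."""
    return {"segments": deque(ports), "payload": payload}


def switch_process(node, packet):
    """Execute the topmost segment and remove it; return the next node or None."""
    if not packet["segments"]:
        return None  # header stack exhausted: the packet is delivered here
    port = packet["segments"].popleft()
    return TOPOLOGY[node][port]


def route(ingress, packet):
    """Follow the packet through the network and return the visited nodes."""
    visited = [ingress]
    node = ingress
    while True:
        nxt = switch_process(node, packet)
        if nxt is None:
            return visited
        visited.append(nxt)
        node = nxt
```

Calling `route("s1", push_segments(b"data", [2, 1]))` walks the packet over `s3` to `s4` in this toy topology; the switches themselves hold no per-destination state.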

B. Multicast

Multicast efficiently distributes one-to-many traffic from the source to all subscribers. Instead of sending individual packets to each destination, multicast packets are distributed on tree-like structures throughout the network.

Bit Index Explicit Replication (BIER) [369] is an efficient transport mechanism for IP multicast traffic. In contrast to traditional IP multicast, it prevents subscriber-dependent forwarding entries in the core network by leveraging a BIER header that contains all destinations of the BIER packet. To that end, the BIER header contains a bit string where each bit corresponds to a specific destination. If a destination should receive a copy of the BIER packet, its corresponding bit is activated in the bit string in the BIER header of the packet. Braun et al. [322] present a demo implementation of BIER-based multicast in P4. Merling et al. [324] implement BIER-based multicast with fast reroute capabilities in P4 for the bmv2.

Elmo [326] is a system for scalable multicast in multi-tenant data centers. Traditional IP multicast maintains subscriber-dependent state in core devices to forward multicast traffic. This limits scalability, since the state in the core network has to be updated every time subscribers change. Elmo increases scalability of IP multicast by moving a certain subscriber-dependent state from the core devices to the packet header.

Priority-based adaptive multicast (PAM) [328] is a control protocol for data center multicast which is implemented by the authors in P4. Network administrators define different policies regarding priority, latency, completion time, etc., which are installed on the core switches. The network devices then monitor link loads and adjust their forwarding to fulfill the policies.
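The BIER bit string described above lends itself to a compact illustration. The following Python sketch follows the general forwarding procedure of RFC 8279 with a forwarding bit mask (F-BM) per neighbor; it is a simplified model, not the P4 code of the cited works. The function name, the layout of the `bift` table (destination bit mapped to a neighbor and its F-BM), and the optional `own_bit` parameter are assumptions made for this example.

```python
def bier_forward(bitstring, bift, own_bit=None):
    """Replicate a BIER packet; returns ({neighbor: bitstring}, deliver_locally).

    `bift` maps a single destination bit to (neighbor, forwarding bit mask).
    The sketch assumes a consistent table: each mask contains the bit it is
    stored under, and every set bit of `bitstring` has an entry.
    """
    copies = {}
    deliver_locally = False
    remaining = bitstring
    while remaining:
        lowest = remaining & -remaining  # isolate the least-significant set bit
        if own_bit is not None and lowest == own_bit:
            deliver_locally = True       # this node is one of the destinations
            remaining &= ~lowest
            continue
        neighbor, fbm = bift[lowest]
        # The copy only carries destinations that are reached via this neighbor.
        copies[neighbor] = remaining & fbm
        # Clear those destinations so they are not served a second time.
        remaining &= ~fbm
    return copies, deliver_locally
```

With destinations A (`0b001`) and B (`0b010`) behind neighbor `n1` and destination C (`0b100`) behind neighbor `n2`, a packet with bit string `0b101` yields one copy for `n1` carrying only A and one copy for `n2` carrying only C. Clearing the served bits is what prevents duplicate delivery.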

C. Publish/Subscribe Systems

Publish/subscribe systems are used for data distribution. Subscribers are able to subscribe to announced topics. Based on the subscriptions, the data packets are distributed from the source to all subscribers.

Wernecke et al. [329]–[332] implement a content-based publish/subscribe mechanism with P4. The distribution tree to all subscribers is encoded directly in the header of the data packets. To that end, the authors introduce a header stack which is pushed onto the packet by the source. Each element in the stack consists of an ID and a value. When a node receives a packet, it checks whether the header stack contains an element with its own ID. If so, the value determines to which neighbors the packet has to be forwarded.

Jepsen et al. [333] introduce a description language to implement publish/subscribe systems. The data plane description is translated into a static pipeline and dynamic filters. The static pipeline is a P4 program that describes a packet processing pipeline for P4 switches, the dynamic filters are the forwarding rules of the match-action tables that may change during operation, e.g., when subscriptions change.

Kundel et al. [334] propose two approaches for attribute/value encoding in packet headers for P4-based publish/subscribe systems. This reduces the header overhead and facilitates adding new attributes which can be used for subscription by hosts.

FastReact-PS [336] is a P4-based framework for event-based publish/subscribe in industrial IoT networks. It supports stateful and stateless processing of complex events entirely in the data plane. Thereby, the forwarding logic can be dynamically adjusted by the control plane without the need for recompilation.

D. Named Data Networking

Named data networking (NDN) is a content-centric paradigm where information is requested with resource identifiers instead of destinations, e.g., IP addresses. Network devices cache recently requested resources. If a requested resource is not available, network devices forward the request to other nodes.

NDN.p4 [337] implements NDN for P4 without caching: the implementation cannot cache requests because of P4-related limitations with stateful storage. Miguel et al. [338] leverage the new functionalities of P4 to extend NDN.p4 by a caching mechanism for requests and optimize its operation. The caching mechanism is implemented with P4 externs.

Enhanced NDN (ENDN) [341] is an advanced NDN architecture. It offers a larger catalog of content delivery features like adaptive forwarding, customized monitoring, in-network caching control, and publish/subscribe forwarding.

E. Data Plane Resilience

Sedar et al. [342] implement a fast failover mechanism without control plane interaction for P4 switches. The mechanism uses P4 registers or metadata fields for bit strings that indicate if a particular port is considered up or down. In a match-action table, the port bit string provides an additional match field to determine whether a particular port is up or down. Depending on the port status, default or backup actions are executed. The authors rely on a local P4 agent to populate the port bit strings.

Giesen et al. [344] introduce a forward error correction (FEC) mechanism for P4. Commonly, unreliable but not completely broken links are avoided. As this happens at the cost of throughput, the proposed FEC mechanism facilitates the usage of unreliable links. The concept features a link monitoring agent that polls ports to detect unreliable connections. When a packet should be forwarded over such a port, the P4 switch calculates a resilient encoding for the packet which is then decoded by the receiving P4 switch.

Shared Queue Ring (SQR) [345] introduces an in-network packet loss recovery mechanism for link failures. SQR caches recent traffic inside a queue with slow processing speed. If a link failure is detected, the cached packets can be sent over an alternative path. While P4 does not offer the possibility to store packets for a certain amount of time, the authors leverage the cloning operation of P4 to keep packets inside the buffer. If a cached packet has not yet met its delay, it gets cloned to another egress port, which takes some time. This procedure is repeated until the packet has been stored for a given time span.

P4-Protect [347] implements 1+1 protection for IP networks. Incoming packets are equipped with a sequence number, duplicated, and sent over two disjoint paths. At an egress point, the first version of each packet is accepted and forwarded.
As a result, a failure of a single path can be compensated without additional signaling or reconfiguration. P4-Protect is implemented for the bmv2 and the Tofino. Evaluations show that line-rate processing with 100 Gbit/s can be achieved with P4-Protect on the Tofino.

Hirata et al. [350] implement a data plane resilience scheme based on multiple routing configurations. Multiple routing configurations with disjoint paths are deployed and a header field identifies the routing configuration according to which packets are forwarded. In the event of a failure, a routing configuration is chosen that avoids the failure.

Lindner et al. [351] present a novel prototype for in-network source protection in P4. A P4-capable switch receives sensor data from a primary and a secondary sensor, but forwards only the data from the primary sensor if available. It detects the failure of the primary sensor and then transparently forwards data from the secondary sensor to the application. Two different mechanisms are presented. The counter-based approach stores the number of packets received from the secondary sensor since the last packet from the primary sensor has been received. The timer-based approach stores the time of the last arrival of a packet from the primary sensor and considers the time since then. If certain thresholds are exceeded, the P4 switch forwards the data from the secondary sensor.

D2R [354] is a data-plane-only resilience mechanism. Upon a link failure, the data plane calculates a new path to the destination using algorithms like breadth-first search and iterative deepening depth-first search. As one pipeline iteration does not have enough processing stages to compute the path, recirculation is leveraged. In addition, Failure Carrying Packets (FCP) is used to propagate the link failure inside the network. While the authors claim that their architecture works with hardware switches, e.g., the Tofino, they only present and evaluate a bmv2 implementation.

Chiesa et al. [355] propose a primitive for reconfigurable fast ReRoute (PURR) which is a FRR primitive for programmable data planes, in particular for P4. For each destination, suitable egress ports are stored in bit strings. During packet processing, the first working suitable egress port is determined by a set of forwarding rules. Encoding based on Shortest Common Supersequence guarantees that only few additional forwarding rules are required.

Blink [356] detects failures without controller interaction by analyzing TCP signals. The core concept is that the behavior of a TCP flow is predictable when it is disrupted, i.e., the same packet is retransmitted multiple times. When this information is aggregated over multiple flows, it creates a characteristic failure signal that is leveraged by data plane switches to trigger packet rerouting to another neighbor.

F. Other Fields of Applications

Contra [358] introduces performance-aware routing with P4. Network paths are ranked according to policies that are defined by administrators. Contra applies those policies and topology information to generate P4 programs that define the behavior of forwarding devices. During runtime, probe packets are used to determine the current network state and update forwarding entries for best compliance with the defined policies.

Michel et al. [359] introduce identifier-based routing with P4. The authors argue that IP addresses are not fine-granular enough to enable adequate forwarding, e.g., in terms of security policies. The authors introduce a new header that contains an identifier token. Before sending packets, applications transmit information on the process and user to a controller that returns an identifier that is inserted into the packet header. P4 switches are programmed to forward packets based on that identifier.

Baktir et al. [360] propose a service-centric forwarding mechanism for P4. Instead of addressing locations, e.g., by IP addresses, the authors propose to use location-independent service identifiers. Network hosts write the identifier of the desired service into the appropriate header field, the switches then make forwarding decisions based on the identifier in the packet header. With this approach, the location of the service becomes less important since the controller simply updates the forwarding rules when a service is migrated or load balancing is desired.

Froes et al. [361] classify different traffic classes which are identified by a label. Packet forwarding is based on that controller-generated label instead of IP addresses. The traffic classes have different QoS properties, i.e., prioritization of specific classes is possible. To that end, switches leverage multiple queues to process traffic of different traffic classes.

QROUTE [362] is a quality of service (QoS) oriented forwarding scheme in P4. Network devices monitor their links and annotate values, e.g., jitter or delay, in the packet header so that downstream nodes can update their statistics. Furthermore, packet headers contain constraints like maximum jitter or delay. According to those values, forwarding decisions are made by the network devices.

Gimenez et al. [363] implement the recursive internetwork architecture (RINA) in P4 for the bmv2. RINA is a networking architecture which sees computer networking as a type of inter-process communication where layering should be based on scope/scale instead of function. In general, efficient implementations require hardware support. However, up to date only software-based implementations are available. The authors hope that with the advance of programmable hardware in the form of P4, hardware-based RINA will soon be possible.

Feng et al. [364] implement information-centric network (ICN) based forwarding for HTTP. To that end, they propose mechanisms to convert packets from ICN to HTTP packets and vice versa.

PFCA [365] implements a forwarding information base (FIB) caching architecture in the data plane. To that end, the P4 program contains multiple MATs that are mapped to different memory, i.e., TCAM, SRAM, dynamic random access memory (DRAM), with different properties regarding lookup speed. Counters keep track of cache hits to move (un)popular rules to other tables.

McAuley et al. [366] present a hybrid error control booster (HEC) that can be deployed in wireless, mobile, or hostile networks that are prone to link or transport layer failures. HECs increase the reliability by applying a modified Reed-Solomon code that adds parity packets or additional packet block acknowledgments. P4 targets include an error control processor that implements this functionality. It is integrated into the P4 program as a P4 extern so that the data plane can exchange HEC packets with it.
A remote control plane includes the booster manager that controls HEC operations and parameters on the P4 targets via a data plane API.

R2P2 [367] is a transport protocol based on UDP for latency-critical RPCs optimized for datacenters or other distributed infrastructure. A router module implemented in P4 or DPDK is used to relay requests to suitable servers and perform load balancing. It may also perform queuing if no suitable server is available. The goal of R2P2 is to overcome problems that typically come with TCP-based RPC systems, e.g., problems with load distribution and head-of-line-blocking.

XII. APPLIED RESEARCH DOMAINS: ADVANCED NETWORKING

We describe applied research on cellular networks (4G/5G), Internet of things (IoT), industrial networking, Time-Sensitive Networking (TSN), network function virtualization (NFV), and service function chains (SFCs). Table VIII shows an overview of all the work described.

A. Cellular Networks (4G/5G)

P4EC [370] builds a local exit for LTE deployments with cloud-based EPC services. A programmable switch distinguishes traffic and reroutes traffic for edge computing. Non-critical traffic is forwarded to the cloud-based EPC.

The Trellis switch fabric (introduced in Section X-A) features the spgw.p4 profile [267], [271], an implementation of a Serving and PDN Gateway (SPGW) for 5G networking. ONOS runs an SPGW-u application that implements the 3GPP control and user plane separation (CUPS) protocol to create, modify, and delete GPRS tunneling protocol (GTP) sessions. It provides support for GTP en- and decapsulation, filtering, and charging.

SMARTHO [372] proposes a handover framework for 5G. Distributed units (DUs) include real-time functions for multiple 5G radio stations. Several DUs are controlled by a central unit (CU) that includes non-real-time control functions. P4 switches are part of the CU and all DU nodes. SMARTHO introduces a P4-based mechanism for preparing handover sequences for user devices that take a fixed path among 5G radio stations controlled by DUs. This decreases the overall handover time, e.g., for users traveling in a train.

| Title | Year | Targets | Code |
|---|---|---|---|
| Cellular Networks (4G/5G) (XII-A) | | | |
| P4EC [370] | 2020 | Tofino | |
| Trellis [271] | - | - | [371] |
| SMARTHO [372] | 2018 | bmv2 | |
| Aghdai et al. [373], [374] | 2018/19 | Netronome | |
| GRED [375] | 2019 | bmv2 | |
| HDS [376] | 2020 | - | |
| Shen et al. [377] | 2019 | Xilinx SDNet | |
| Lee et al. [378] | 2019 | Tofino | |
| Ricart-Sanchez et al. [379] | 2019 | NetFPGA-SUME | |
| Singh et al. [380] | 2019 | Tofino | |
| TurboEPC [381] | 2020 | Netronome | |
| Vörös et al. [382] | 2020 | Tofino | |
| Lin et al. [383] | 2019 | Tofino | |
| Internet of Things (XII-B) | | | |
| BLESS [384] | 2017 | PISCES | |
| Muppet [385] | 2018 | PISCES | |
| Wang et al. [386] | 2019 | Tofino | |
| Madureira et al. [387] | 2020 | bmv2 | |
| Engelhard et al. [388] | 2019 | bmv2 | |
| Industrial Networking (XII-C) | | | |
| FastReact [389] | 2018 | bmv2 | |
| Cesen et al. [390] | 2020 | bmv2 | |
| Time-Sensitive Networking (TSN) (XII-D) | | | |
| Rüth et al. [391] | 2018 | Netronome | |
| Kannan et al. [392] | 2019 | Tofino | |
| Kundel et al. [393] | 2019 | Tofino | |
| Network Function Virtualization (NFV) (XII-E) | | | |
| Kathará [394] | 2018 | - | |
| P4NFV [395] | 2018 | bmv2 | |
| Osiński et al. [396] | 2019 | - | |
| Moro et al. [397] | 2020 | - | |
| DPPx [398] | 2020 | bmv2 | |
| Mohammadkhan et al. [399] | 2019 | Netronome | |
| FOP4 [400], [401] | 2019 | bmv2, eBPF | |
| PlaFFE [402] | 2020 | Netronome | |
| Service Function Chains (SFCs) (XII-F) | | | |
| P4SC [403], [404] | 2019 | bmv2, Tofino | [405] |
| Re-SFC [406] | 2019 | bmv2 | |
| FlexMesh [407] | 2020 | bmv2 | |
| P4-SFC [408] | 2019 | bmv2, Tofino | [409] |

TABLE VIII: Overview of applied research on advanced networking (Section XII).

Aghdai et al. [373] propose a P4-based transparent edge gateway (EGW) for mobile edge computing (MEC) in LTE or 5G networks. Delay-sensitive and bandwidth-intense applications need to be moved from data centers in the core network to the edge of the radio access network (RAN). 5G networks rely on GTP-U for encapsulating IP packets from the mobile user to the core network. IP routers in between forward packets based on the outer IP address of GTP-U frames. The authors deploy EGWs as P4 switches at the edge of the IP transport network where service operators can deploy scalable network functions or services. Each MEC service gets a virtual IP address; the P4-based EGWs parse the inner IP destination address of GTP-U frames. If an EGW sees traffic targeting a virtual IP address of a MEC service, it forwards it to the IP address of one of the serving instances of the MEC application. In their follow-up work [374], the authors extend EGWs by a handover mechanism for migrating network state.

GRED [375] is an efficient data placement and retrieval service for edge computing. It tries to improve routing path lengths and forwarding table sizes. The authors follow a greedy forwarding approach based on DT graphs, where the forwarding table size is independent of the network size and the number of flows in the network. GRED is implemented in P4, but the authors do not specify on which target.

HDS [376] is a low-latency, hybrid, data sharing framework for hierarchical mobile edge computing.
The data location service is divided into two parts: intra-region and inter-region. The authors present a data sharing protocol called Cuckoo Summary for fast data localization for the intra-region part. Further, they developed a geographic routing scheme to achieve efficient data location with only one overlay hop in the inter-region part.

Shen et al. [377] present an FPGA-based GTP engine for mobile edge computing in 5G networks. Communication between the 5G back-haul and the conventional Ethernet requires de- and encapsulation of traffic with GTP. As most network entities do not have the capability to process GTP, the authors leverage P4-programmable hardware for this purpose.

Lee et al. [378] evaluate the performance of GTP-U and SRv6 stateless translation as GTP-U cannot be replaced by SRv6 without a transition period. To that end, they implement GTP and SRv6 on P4-programmable hardware. They found that there are no performance drops if stateless translation is used and that SRv6 stateless translation is acceptable for the 5G user plane.

Ricart-Sanchez et al. [379] propose an extension for the P4-NetFPGA framework for network slicing between different 5G users. The authors extend the capabilities of the P4 pipeline and implement their mechanism on the NetFPGA-SUME. However, the authors do not provide any details about their implementation.

Singh et al. [380] present an implementation for the Evolved Packet Gateway (EPG) in the Mobile Packet Core of 5G. They show that they can offload the functionality to programmable switching ASICs and achieve line rate with low latency and jitter while scaling up to 1.7 million active users.

TurboEPC [381] presents a redesign of the mobile packet core where parts of the control plane state are offloaded to programmable switches. State is stored in MATs. The switches then process a subset of signaling messages within the data plane itself, which leads to higher throughput and reduced latency.

Vörös et al. [382] propose a hybrid approach for the next generation NodeB (gNB) where the majority of packet processing is done by a high-speed P4-programmable switch. Additional functions, such as ARQ or ciphering, are offloaded to external services such as DPDK implementations.

Lin et al. [383] enhance the Content Permutation Algorithm (eCPA) for secret permutation in 5G. Packet payloads are split into code words and shuffled according to a secret cipher. They implement eCPA for switches of the Inventec D5264 series.

B. Internet of Things (IoT)

BLESS [384] implements a Bluetooth low energy (BLE) service switch based on P4 that acts as a proxy enabling flexible, policy-based switching and in-network operations of IoT devices. BLE devices are strictly bound to a central device such as a smartphone or tablet. IoT usage requires cloud-based solutions where central devices connect to an IoT infrastructure. The authors propose a BLE service switch (BLESS) that is transparently inserted between peripheral and central devices and acts like a transparent proxy breaking up the peer-to-peer model. It maintains BLE link layer connections to peripheral devices within its range. A central controller implements functionalities such as service discovery, access policy enforcement, and subscription management so that features like service slicing, enrichment, and composition can be realized by BLESS.

Muppet [385] extends BLESS by supporting the Zigbee protocol in parallel to BLE. In addition to the features of BLESS, inter-protocol services between Zigbee and BLE and between BLE/Zigbee and IP protocols are introduced. An example for the latter are HTTP transactions that are automatically sent out by the switch if it sees a specified set of BLE/Zigbee transactions. The data plane implementation of BLESS is extended by protocol-dependent packet parsers and processing and support for encrypted Zigbee packets via packet recirculation.

Wang et al. [386] implement aggregation and disaggregation of small IoT packets on P4 switches. For a small IoT packet, the header accounts for a large proportion of the packet's total size. In large streams of IoT packets, this causes high overhead. The current aggregation techniques for IoT packets are implemented by external servers or on the control plane of switches, both resulting in low throughput and added latency. Therefore, the authors propose an implementation directly on P4 switches where IoT packets are buffered, aggregated, and encapsulated in UDP packets with a custom flag-header, type, and padding. In disaggregation, the incoming packet is cloned to strip out the single messages until all messages are separated.

Madureira et al. [387] present the Internet of Things Protocol (IoTP), an L2 communication protocol for IoT data planes. The main purpose of IoTP is data aggregation at the network level. IoTP introduces a new, fixed header and is compatible with any forwarding mechanism. The authors implemented IoTP for the bmv2 and store single packets of a flow in registers until the data can be aggregated.

Engelhard et al. [388] present a system for massive wireless sensor networks. They implement a physically distributed and logically centralized wireless access system to reduce the impairment by collisions. P4 is leveraged as connection between a physical access point and a virtual access point. To that end, they extend the bmv2 to provide additional functionality. However, they give information about their P4 program only in the form of a decision flow graph.
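The buffering, aggregation, and disaggregation steps described for Wang et al. [386] and IoTP [387] can be illustrated with a small Python emulation. This is a minimal sketch under stated assumptions and not the authors' P4 code: the class name `Aggregator`, the function `disaggregate`, the threshold of messages per packet, and the header layout (a one-byte message count followed by length-prefixed payloads) are hypothetical stand-ins for the custom headers mentioned above.

```python
import struct

AGG_THRESHOLD = 4  # hypothetical: number of small messages per aggregated packet


class Aggregator:
    """Emulates register-like buffering of small IoT payloads."""

    def __init__(self, threshold=AGG_THRESHOLD):
        self.threshold = threshold
        self.buffer = []  # stands in for the registers holding pending payloads

    def push(self, payload: bytes):
        """Buffer one payload; return an aggregated packet once the buffer is full."""
        self.buffer.append(payload)
        if len(self.buffer) < self.threshold:
            return None
        # Assumed layout: 1-byte count, then (1-byte length, payload) per message.
        # Payloads longer than 255 bytes do not fit this layout and raise struct.error.
        packet = struct.pack("!B", len(self.buffer))
        for msg in self.buffer:
            packet += struct.pack("!B", len(msg)) + msg
        self.buffer = []
        return packet


def disaggregate(packet: bytes):
    """Split an aggregated packet back into the original payloads."""
    count = packet[0]
    offset = 1
    messages = []
    for _ in range(count):
        length = packet[offset]
        offset += 1
        messages.append(packet[offset:offset + length])
        offset += length
    return messages
```

In this sketch the per-message overhead shrinks to a single length byte, which mirrors the motivation given above: for small payloads, a full set of protocol headers per message dominates the packet size.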

C. Industrial Networking

FastReact [389] outsources sensor data packet processing from centralized controllers to P4 switches. The sensor data is recorded in variable-length time series data stores where an additional field holds the current moving average calculated on the time series. Both the time series and the moving average of all sensors can be polled by a central controller. For controlling actuators directly on the data plane, FastReact supports the formulation of control logic in conjunctive normal form (CNF). It is mapped to actions to either forward signal data to the controller, discard it, or directly send it to the actuator. FastReact also features failure recovery directly on the switch. For every sensor and actuator, timestamps of the last received packets are recorded along with a timeout limit. If failures are detected, sensor data are forwarded following failover rules with backup actuators for particular sensors.

Cesen et al. [390] leverage P4-capable switches to move control logic to the network. Control applications reside in controllers that are responsible for emergency intervention, e.g., if a given threshold is exceeded. The connection to the controller may be faulty and, therefore, controller intervention may not be fast enough. In this work, the authors generate emergency packets, i.e., stop commands, directly in the data plane. The action is triggered if the switch receives a packet with a specific payload.
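To make the combination of moving averages and CNF control logic in FastReact more concrete, the following Python sketch emulates both. It is an illustration under assumptions, not the FastReact implementation: the names `SensorStore`, `cnf_holds`, and `decide`, the fixed window size, the literal format `(sensor_id, operator, threshold)`, and the choice to evaluate the CNF on moving averages rather than raw samples are all hypothetical.

```python
import operator
from collections import deque

OPS = {"<": operator.lt, ">": operator.gt}  # comparison operators allowed in literals


class SensorStore:
    """Time series of one sensor with a moving average over the last samples."""

    def __init__(self, window=4):
        self.samples = deque(maxlen=window)  # oldest sample is evicted automatically

    def record(self, value):
        self.samples.append(value)

    @property
    def moving_average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


def cnf_holds(cnf, averages):
    """Evaluate a CNF formula: AND over clauses, OR over the literals of a clause."""
    return all(
        any(OPS[op](averages[sensor], threshold) for sensor, op, threshold in clause)
        for clause in cnf
    )


def decide(cnf, averages):
    """Send data to the actuator if the control logic fires, else to the controller."""
    return "to_actuator" if cnf_holds(cnf, averages) else "to_controller"
```

A formula such as `[[("temp", ">", 25.0)], [("pressure", "<", 2.0), ("temp", ">", 50.0)]]` reads as "temp > 25 AND (pressure < 2 OR temp > 50)". CNF is convenient here because each clause can be checked independently, which fits a sequence of table lookups.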

D. Time-Sensitive Networking (TSN)

Rüth et al. [391] introduce a scheme for implementing in-network control mechanisms for linear quadratic regulators (LQR). LQRs can be described by a multiplication of a matrix and a vector. The vector describes the control of the actuator, the matrix describes the current system state. The result of the multiplication is a control command. The destination of a control packet identifies a specific actuator. When a switch receives a control packet, it matches the destination of the packet onto a match-and-action table. The lookup provides the control vector for the actuator. The control vector from the lookup is then multiplied with the system state matrix that is stored in a register to calculate the control command for the actuator. The resulting control command is written into the packet header and the packet is forwarded to the target actuator.

Kannan et al. [392] introduce the Data Plane Time synchronization Protocol (DPTP) for distributed applications with computations directly on the P4 data plane. DPTP follows a request-response model, i.e., all P4 switches request the global time from a designated master switch. Therefore, each switch features a local control plane that generates time requests sent to the master switch. Additionally, the control plane handles overflows in time calculation for administration.

Kundel et al. [393] demonstrate timestamping with nanosecond accuracy. They describe a simple setup with a Tofino-based switch and a breakout cable to connect two ports of the switch. In the experiment, timestamps at the moment of sending and reception are recorded in the packet header. The authors compare those two timestamps to show that very fine-grained measurements are possible.

E. Network Function Virtualization (NFV)

Kathará [394] runs NFs as P4 programs either on software or hardware targets. For software-based deployment, the framework leverages Docker containers that run NFs as container images or individual setups for Quagga, Open vSwitch, or bmv2 container images. For hardware-based deployment on P4 switches, NFs are either replicated on every P4 switch or distributed on multiple P4 switches as needed. In both cases, a load balancer or service classifier forwards flows to the appropriate P4 switch. As a main advantage, P4 programs can be shifted between the bmv2-based P4 software targets and hardware targets depending on the required performance.

P4NFV [395] also deals with the idea of running NFs either on software- or hardware-based P4 targets. The authors adopt the ETSI NFV architecture with control and monitoring entities and add a layer that abstracts various types of software- and hardware-based P4 targets as P4 nodes. For optimized deployment, the targets' performance characteristics are part of the P4 node description. For runtime reconfiguration, the authors propose two approaches. In pipeline manipulation, the P4 program features multiple match-action pipelines that can be enabled or disabled by setting register flags. In program reload, a new P4 program is compiled and loaded to the P4 target. The authors propose to perform state management and migration either directly on the data plane or via a control plane.

Osiński et al. [396] use P4 to offload the data plane of virtual network functions (VNFs) into a cloud infrastructure by allowing VNFs to inject small P4 programs into P4 devices like SmartNICs or top-of-rack switches. This results in better performance and a microservice-based approach for the data plane. A new P4 architecture model that integrates abstractions used to develop VNF data planes was developed.

Moro et al. [397] present a framework for NF decomposition and deployment. They split NFs into components that can run on CPUs or that can be offloaded to specific programmable hardware, e.g., P4 programmable switches. The presented orchestrator combines multiple functions into a single P4 program that can be deployed to programmable switches.

DPPx [398] implements a framework for P4-based data plane programmability and exposure which allows enhancing NFV services. The authors introduce data plane modules written in P4 which can be leveraged by the application plane. As an example, a dynamic optimization of packet flow routing (DOPFR) is implemented using DPPx.

Mohammadkhan et al. [399] provide a unified P4 switch abstraction framework where servers with software NFs and P4-capable SmartNICs are seen as one logical entity by the SDN controller. They further leverage Mixed Integer Linear Programming (MILP) to determine partitioning of P4 tables for optimal placement of NFs.

FOP4 [400], [401] implements a rapid prototyping platform that supports container-based, P4-switch-based, and SmartNIC-based NFs. The authors argue that a prototyping platform is needed to quickly develop and evaluate new NFV use cases.

PlaFFE [402] introduces NFV offloading where some features of VNFs or embedded Network Functions (eNFs) are executed on SmartNICs using P4. Additionally, P4 is used to steer traffic either through the eNFs or through VNFs using SR-IOV.
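The pipeline manipulation approach described for P4NFV [395], where match-action pipelines are switched on and off through register flags, can be emulated in a few lines of Python. This is a hedged sketch, not the P4NFV code: the class `P4Node`, its methods, and the two toy network functions `firewall` and `nat` are hypothetical, and packets are modeled as plain dictionaries.

```python
def firewall(pkt):
    """Toy NF: drop Telnet traffic, pass everything else."""
    return None if pkt["dst_port"] == 23 else pkt


def nat(pkt):
    """Toy NF: rewrite the source address to a fixed public address."""
    return {**pkt, "src_ip": "203.0.113.1"}


class P4Node:
    """Emulates one P4 program that contains several NF pipelines."""

    def __init__(self, pipelines):
        self.pipelines = pipelines
        # Stands in for a register array; one flag per embedded pipeline.
        self.enable_flags = [0] * len(pipelines)

    def set_flag(self, index, value):
        """Control plane operation: enable or disable a pipeline at runtime."""
        self.enable_flags[index] = 1 if value else 0

    def process(self, pkt):
        """Apply all enabled pipelines in order; None means the packet was dropped."""
        for flag, stage in zip(self.enable_flags, self.pipelines):
            if not flag:
                continue
            pkt = stage(pkt)
            if pkt is None:
                return None
        return pkt
```

The design trade-off named in the text is visible here: flipping a flag changes behavior without recompiling, but every pipeline must already be embedded in the program, whereas program reload can introduce entirely new functions.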

F. Service Function Chains (SFCs)

P4SC [403], [404] implements an SFC framework for P4 targets. SFCs are described as directed acyclic graphs of service functions (SFs). In P4SC, SFs are represented by blocks. Each block has a unique identifier, a P4 program for ingress processing, and a P4 program for egress processing. P4SC includes 15 SF blocks, e.g., L2 forwarding, which are extracted from switch.p4. After the user has specified all SFCs for a particular P4 target, the P4SC converter merges the directed acyclic graphs of all SFCs with an LCS-based algorithm into an intermediate representation. Then, the P4SC generator creates the final P4 program based on the intermediate representation to be deployed onto the P4 target. P4 program generation includes runtime management, i.e., the generator creates one API per SFC while hiding SF-specific details, e.g., names of particular match-and-action tables.

Re-SFC [406] improves P4SC's resource usage by using resubmit operations. If the specified order of SFs in an SFC does not match the pre-embedded SF of the P4 switch, incoming flows cannot be processed. P4SC solves this problem by permitting redundant NF embeds, i.e., if SFs of one SFC are required by another SFC, those SFs are just replicated. To reduce the costly usage of match-and-action tables, Re-SFC introduces resubmit actions where packets are bounced back to the ingress.

FlexMesh [407] tackles the problem of fixed SFC flow control, i.e., when the specified order of SFs does not match the pre-embedded SF, by leveraging MATs. SFs can be dynamically bypassed, and recirculation is used to build any desired SF chain.

P4-SFC [408] is an SFC framework based on MPLS segment routing and NFV. P4 is used to implement a traffic classifier. A central orchestrator deploys service functions as VNFs and configures the traffic classifier based on definitions of SFCs.

XIII. APPLIED RESEARCH DOMAINS: NETWORK SECURITY

We describe applied research on firewalls, port knocking, DDoS attack mitigation, intrusion detection systems, connection security, and other fields of application. Table IX shows an overview of all the work described.

A. Firewalls

Ricart-Sanchez et al. [410] present a 5G firewall that analyzes GTP data transmitted between edge and core networks. P4 allows an implementation of parsing and matching GTP header fields such as 5G user source IP, 5G user destination IP, and identification number of the GTP tunnel. The P4 pipeline implements an allow-by-default policy; DROP actions for specific sets of keys can be installed via a data plane API. In a follow-up work [411], the authors extend the 5G firewall by support for multi-tenancy with VXLAN.

| Title | Year | Targets | Code |
|---|---|---|---|
| Firewalls (XIII-A) | | | |
| Ricart-Sanchez et al. [410], [411] | 2018/19 | NetFPGA-SUME | |
| CoFilter [412] | 2018 | Tofino | |
| P4Guard [413] | 2018 | bmv2 | |
| Vörös and Kiss [414] | 2016 | p4c-behavioral | |
| Port Knocking (XIII-B) | | | |
| P4Knocking [415] | 2020 | bmv2 | |
| Almaini et al. [416] | 2019 | bmv2 | |
| DDoS Mitigation Mechanisms (XIII-C) | | | |
| LAMP [417] | 2018 | bmv2 | |
| TDoSD@DP [418], [419] | 2018/19 | bmv2 | |
| Kuka et al. [420] | 2019 | Xilinx UltraScale+, Intel Stratix 10 | |
| Paolucci et al. [421], [422] | 2018/19 | bmv2, NetFPGA-SUME | |
| ML-Pushback [423] | 2019 | - | |
| Afek et al. [424] | 2017 | p4c-behavioral | |
| Cardoso Lapolli et al. [425] | 2019 | bmv2 | [426] |
| Cai et al. [427] | 2020 | - | |
| Lin et al. [428] | 2020 | bmv2 | |
| Musumeci et al. [429] | 2020 | bmv2 | |
| DIDA [430] | 2020 | bmv2 | |
| Dimolianis et al. [431] | 2020 | Netronome | |
| Scholz et al. [432] | 2020 | bmv2, T4P4S, Netronome, NetFPGA SUME | [433] |
| Friday et al. [434] | 2020 | bmv2 | |
| Intrusion Detection Systems & Deep Packet Inspection (XIII-D) | | | |
| P4ID [435] | 2019 | bmv2 | |
| Kabasele and Sadre [436] | 2018 | bmv2 | |
| DeepMatch [437] | 2020 | Netronome | [438] |
| Qin et al. [439] | 2020 | bmv2, Netronome | [440] |
| Connection Security (XIII-E) | | | |
| P4-MACsec [441] | 2020 | bmv2, NetFPGA-SUME | [442] |
| P4-IPsec [443] | 2020 | bmv2, NetFPGA-SUME, Tofino | [444] |
| SPINE [445] | 2019 | bmv2 | [446] |
| Qin et al. [447] | 2020 | bmv2 | |
| P4NIS [448] | 2020 | bmv2 | [449] |
| LANIM [450] | 2020 | bmv2 | |
| Other Fields of Application (XIII-F) | | | |
| Chang et al. [451] | 2019 | bmv2 | |
| Clé [452] | 2019 | - | |
| P4DAD [453] | 2020 | bmv2 | |
| Chen [454] | 2020 | Tofino | [455] |
| Gondaliya et al. [456] | 2020 | NetFPGA SUME | |
| Poise [457] | 2020 | Tofino | [458] |

TABLE IX: Overview of applied research on network security (Section XIII).

CoFilter [412] implements an efficient flow identification scheme for stateful firewalls in P4. To solve the problem of limited table sizes on SDN switches, flow identifiers are calculated by applying a hashing function to the 5-tuple of every packet directly on the switch. The proposed concept includes a novel hash rewrite function that is implemented on the data plane. It resolves hash collisions and performs hash table optimization using an external server.

P4Guard [413] replaces software-based firewalls by P4-based virtual firewalls in the VNGuard [459] system. VNGuard introduces controller-based deployment and management of virtual firewalls with the help of SDN and NFV. The P4-based firewall comprises a single MAT that makes ALLOW/DROP decisions using Layer 3/4 header fields as match keys. The flow statistics are recorded with the help of counters. Another MAT allows enabling/disabling the firewall at runtime.

Vörös and Kiss [414] present a firewall implemented in P4. The parser supports Ethernet, IPv4/IPv6, UDP, and TCP headers. A ban list comprises MAC address/IP address entries that represent network hosts. Packets matching this ban list are directly dropped. To mitigate port scan or DDoS attacks, counters track packet rate and byte transfer statistics. Another MAT implements whitelist filtering.
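The idea of deriving compact flow identifiers from the 5-tuple, as described for CoFilter [412], can be sketched in Python as follows. This is only an illustration under assumptions: the CRC32 hash, the 16-bit identifier width, and the names `flow_id` and `StatefulFirewall` are hypothetical, and the hash rewrite function and the external server that CoFilter uses to resolve collisions are not modeled.

```python
import socket
import struct
import zlib

TABLE_BITS = 16  # hypothetical width of a flow identifier in bits


def flow_id(src_ip, dst_ip, src_port, dst_port, proto):
    """Hash the 5-tuple of a packet into a fixed-width flow identifier."""
    key = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HHB", src_port, dst_port, proto)
    )
    return zlib.crc32(key) & ((1 << TABLE_BITS) - 1)


class StatefulFirewall:
    """Keeps state per flow identifier instead of per full 5-tuple."""

    def __init__(self):
        self.allowed = set()  # stands in for a table keyed by flow identifier

    def allow(self, *five_tuple):
        self.allowed.add(flow_id(*five_tuple))

    def permits(self, *five_tuple):
        # Two different flows may share an identifier; this sketch ignores
        # such collisions, which a real design has to resolve.
        return flow_id(*five_tuple) in self.allowed
```

Storing a 16-bit identifier instead of a 13-byte IPv4 5-tuple is what reduces table pressure; the price is that distinct flows can collide, which is why the source mentions a dedicated collision resolution step.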

B. Port Knocking

Port knocking is a simple authentication mechanism for opening network ports. Network hosts send TCP SYN packets in predefined sequences to certain ports. If the sequence is completed correctly, the server opens up a desired port. Typically, port knocking is implemented in software on servers.

P4Knocking [415] implements port knocking on P4 switches. The authors propose four different implementations for P4. In the first implementation, P4 switches track the state of knock sequences in registers where the source IP address is used as an index. The second implementation uses a CRC-hash of the source IP address as index for the knocking state registers. To resolve the problem of hash collisions, the third implementation relies on identifiers that are calculated and managed by the controller. The fourth implementation solely relies on the controller, i.e., P4 switches forward all knocking packets to the controller.

Almaini et al. [416] implement port knocking with a ticket mechanism on P4 switches. Traffic is only forwarded if the sender has a valid ticket. Predefined trusted nodes have a ticket by default, untrustworthy nodes must obtain a ticket by successful authentication via port knocking. The authors use the HIT/MISS construct of P4 as well as stateful P4 components to implement the concept. Port knocking sequences and trusted/untrusted hosts can be maintained by the control plane.
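The second P4Knocking variant, which indexes the knocking state registers by a CRC hash of the source IP address, can be emulated with the following Python sketch. It is a minimal illustration and not the authors' implementation: the knock sequence, the protected port, the register size, and the class `KnockSwitch` are hypothetical, and the port simply stays open once the sequence is complete.

```python
import socket
import zlib

KNOCK_SEQUENCE = (1111, 2222, 3333)  # hypothetical secret port sequence
PROTECTED_PORT = 22                  # hypothetical port that is opened
REGISTER_SIZE = 1024                 # hypothetical number of register cells


class KnockSwitch:
    """Tracks knock progress per source in a register array."""

    def __init__(self):
        self.state = [0] * REGISTER_SIZE  # number of correct knocks seen so far

    def _index(self, src_ip):
        # CRC hash of the source address selects the register cell.
        # Different sources may collide on the same cell in this variant.
        return zlib.crc32(socket.inet_aton(src_ip)) % REGISTER_SIZE

    def on_syn(self, src_ip, dst_port):
        """Process one TCP SYN and return 'forward' or 'drop'."""
        idx = self._index(src_ip)
        progress = self.state[idx]
        if progress == len(KNOCK_SEQUENCE):
            # Sequence completed earlier: only the protected port is reachable.
            return "forward" if dst_port == PROTECTED_PORT else "drop"
        if dst_port == KNOCK_SEQUENCE[progress]:
            self.state[idx] = progress + 1   # next expected knock matched
        elif dst_port == KNOCK_SEQUENCE[0]:
            self.state[idx] = 1              # restart with the first knock
        else:
            self.state[idx] = 0              # wrong knock resets the state
        return "drop"
```

The collision noted in the comment is exactly the weakness that motivates the third variant in the text, where the controller assigns identifiers instead of relying on a hash.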

C. DDoS Attack Mitigation

LAMP [417] presents a cooperative mitigation mechanism for DDoS attacks that relies on information from the application layer. Ingress P4 switches add a unique identifier to the IP options header field of any processed packet. The last P4 switch ahead of the target host stores this mapping and empties the IP options header field. If a network host, e.g., a database server, detects an ongoing DDoS attack on the application layer, it adds an attack flag to the IP options header field and sends it back to the switch. The switch forwards this packet to the ingress switch to enable dropping of all further packets of this flow.

TDoSD@DP [418] is a P4-based mitigation mechanism for DDoS attacks targeting SIP proxies. Stateful P4 registers record the number of SIP INVITE and SIP BYE messages. Then, a simple state machine monitors sequences of INVITE and BYE messages. Many INVITE messages followed by zero BYE messages lead to dropping SIP INVITE packets, whereas valid sequences of INVITE and BYE messages will keep the port open. In a follow-up work [419], the authors present an alternative approach where P4 switches act as distributed sensors. An SDN controller periodically collects data from counters of P4 switches to perform centralized attack detection. Then, attack mitigation is performed by installing DROP rules on the P4 switches.

Kuka et al. [420] present a DDoS mitigation system that targets volumetric DDoS attacks called reflective amplification attacks. The authors port an existing VHDL implementation into a P4 program that runs on FPGA targets. The implementation selects the affected subset of the incoming traffic, extracts packet data, and forwards it as a digest to an SDN controller. The SDN controller continuously evaluates this information; a heuristic algorithm identifies aggressive IP addresses by looking at the volumetric contribution of source IP addresses to the attack. In case of a detected attack, the SDN controller installs DROP rules.

Paolucci et al. [421], [422] present a stateful mitigation mechanism for TCP SYN flood attacks. It is part of a P4-based edge packet-over-optical node that also comprises traffic engineering functionality. P4 registers keep per-session statistics to detect TCP SYN flood attacks. One register records the port number of the last TCP SYN packet, the other one records the number of attempts matching the TCP SYN flood behavior. If the latter one exceeds a defined threshold, the packets are dropped.

ML-Pushback [423] proposes an extension of the Pushback DDoS attack mitigation mechanism by machine learning techniques. P4 switches implement a data collector mechanism that collects dropped packets and forwards them as digest messages to the control plane. On the control plane, a deep learning module extracts signatures and classifies the collected digest with a decision tree model. Attack mitigation is performed by throttling attacker traffic via rate limits.

Afek et al. [424] implement known mitigation mechanisms for SYN and DNS spoofing in DDoS attacks for OpenFlow and P4 targets. The OpenFlow implementation targets Open vSwitch and OpenFlow 1.5, whereas the P4 implementations are compiled for p4c-behavioral without control plane involvement. In addition, the authors implemented a set of algorithms and methods for dynamically distributing the rule space over multiple switches.

Cardoso Lapolli et al. [425] describe an algorithmic approach to detect and stop DDoS attacks on P4 data planes. The algorithm was specifically created under the functional constraints of P4 and is based on the calculation of the Shannon entropy.

Cai et al. [427] propose a novel method for collecting traffic information to detect TCP port scanning attacks. The authors propose the "0-replacement" method as an efficient alternative to existing sampling and aggregation methods. It introduces a pending request counter (PRcounter) and relies on registers to bind hashing identifiers of the attackers' IP addresses to PRcounter values.
The authors describe the concept as compliant to PSA, but only simulation results are given.

Lin et al. [428] present a comparison of OF- and P4-based implementations of basic mitigation mechanisms against SYN flooding and ARP spoofing attacks.

Musumeci et al. [429] present P4-assisted DDoS attack mitigation using an ML classifier. An ML-based DDoS attack detection module with a classifier is running on a controller. The P4 switch forwards traffic to the module; the DDoS attack detection module responds with a decision. The authors consider three use cases: packet mirroring, header mirroring, and metadata extraction. In metadata extraction, P4 switches implement counters that store occurrences of IP, UDP, TCP, and SYN packets. In the case that one of the counters exceeds a defined threshold, the P4 switch inserts a custom header with the counter values and sends it to the DDoS attack detection module.

DIDA [430] presents a distributed mitigation mechanism against amplified reflection DDoS attacks. In this type of DDoS attack, spoofed requests lead to responses that are larger by orders of magnitude. An example is a DNS ANY query. The authors rely on count-min sketch data structures and monitoring intervals to put the number of requests and responses into relation. In case of a detected DDoS attack, ACLs are used to block the traffic near to the attacker.

Dimolianis et al. [431] introduce a multi-feature DDoS detection scheme for TCP/UDP traffic. It considers the total amount of incoming traffic for a particular network, the significance of the network, and the symmetry ratio of incoming and outgoing traffic for classifications. The feature analysis is time-dependent and focuses on distinct time intervals.

Scholz et al. [432] propose a SYN proxy that relies on SYN cookies or SYN authentication as protection against SYN flooding DDoS attacks. The authors present a software implementation based on DPDK and compare it to a bmv2-based P4 implementation that is ported to the T4P4S P4 software target, Netronome P4 hardware target, and NetFPGA SUME P4 hardware target. Evaluation results, benefits, and challenges for each platform are discussed.

Friday et al. [434] present a two-part DDoS detection and mitigation scheme. In the first part, a P4 target applies a one-way traffic analysis using bloom filters and time-dependent statistics such as moving averages. In the second part, the P4 target analyzes the bandwidth and transport protocols used by various applications to perform a volumetric analysis. The processing pipeline then decides about malicious traffic to be dropped. Administrators may supply custom network parameters used for dynamic threshold calculation that are then installed via an API on the data plane. The authors demonstrate the effectiveness of the proposed approach by three use cases: UDP amplification DDoS attacks, SYN flooding DDoS attacks, and slow DDoS attacks.
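Several of the schemes above rely on compact counting structures. As an illustration of the count-min sketch idea mentioned for DIDA [430], the following Python sketch relates the number of responses sent to a host to the number of requests that host issued within one monitoring interval. It is a sketch under assumptions, not the DIDA implementation: the class `CountMinSketch`, the function `suspicious`, the sketch dimensions, the threshold, and the use of SHA-256 as a row hash (hardware targets would typically use CRC variants) are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters in fixed memory; never underestimates."""

    def __init__(self, depth=3, width=256):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key: bytes):
        # One independent hash per row, derived by prefixing the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(bytes([row]) + key).digest()
            yield row, int.from_bytes(digest[:4], "big") % self.width

    def add(self, key: bytes, amount=1):
        for row, col in self._cells(key):
            self.rows[row][col] += amount

    def estimate(self, key: bytes):
        # Collisions only inflate cells, so the minimum is the best estimate.
        return min(self.rows[row][col] for row, col in self._cells(key))


def suspicious(requests, responses, host: bytes, max_unsolicited=10):
    """Flag a host that receives far more responses than it sent requests."""
    return responses.estimate(host) - requests.estimate(host) > max_unsolicited
```

Both sketches would be reset at the end of each monitoring interval. Because estimates can only err upwards, the comparison is approximate, and a deployment would have to choose sketch size and threshold according to the expected traffic.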

D. Intrusion Detection Systems (IDS) & Deep Packet Inspection (DPI)

P4ID [435] reduces intrusion detection system (IDS) processing load by applying pre-filtering on P4 switches (IDS offloading/bypassing). P4ID features a rule parser that translates Snort rules with a multistage mechanism into MAT entries. The P4 processing pipeline implements a stateless and a stateful stage. In the stateless stage, TCP/ICMP/UDP packets are matched against a MAT to decide if traffic should be dropped, forwarded to the next hop, or forwarded to the IDS. In the stateful stage, the first n packets of new flows are forwarded to the IDS. This allows traffic targeting well-known ports to be analyzed as well. Combining the feedback of the IDS for packet samples with the stateless stage is future work.

Kabasele and Sadre [436] present a two-level IDS for industrial control system (ICS) networks. The IDS targets the Modbus protocol that runs on top of TCP in SCADA networks. The first level comprises two whitelists: a flow whitelist for filtering on the TCP layer and a Modbus whitelist. If no matching entry is found for a given packet, it is forwarded to the second level. This is in stark contrast to legacy whitelisting where packets are just dropped. In the second level, a Zeek network security analyzer acts as deep packet inspector running on a dedicated host. It analyzes the given packet, makes a decision, and instructs the controller to update filters on the switch.

DeepMatch [437] introduces deep packet inspection (DPI) for packet payloads. The concept is implemented with the help of network processors; its prototype is built with the Netronome NFP-6000 SmartNIC P4 target. The authors present regex matching capabilities that are executed at 40 Gbit/s (line rate of the platform) for stateless intra-packet matching and about 20 Gbit/s for stateful inter-packet matching. The DeepMatch functionalities are natively implemented in Micro-C for the Netronome platform and integrated into the P4 processing pipeline with the help of P4 externs.

Qin et al. [439] present an IDS based on binarized neural networks (BNN) and federated learning. BNNs compress neural networks into a simplified form that can be implemented on P4 data planes. Weights are compressed into single bits and computations, e.g., activation functions, are converted into bit-wise operations. P4 targets at the network edge then apply BNNs to classify incoming packets. To continuously train the BNNs on the P4 targets, the authors propose a federated learning scheme. Each P4 target is connected to a controller that trains an equally-structured neural network with samples received from the P4 target. A cloud service aggregates local updates received from the controllers and responds with weight updates that are processed into the local model.

E. Connection Security

P4-MACsec [441] presents an implementation of IEEE 802.1AE (MACsec) for P4 switches. A two-tier control plane with local switch controllers and a central controller monitors the network topology and automatically sets up MACsec on detected links between P4 switches. For link discovery and monitoring, the authors implement a secured variant of LLDP that relies on encrypted payloads and sequence numbers. MACsec is directly implemented on the P4 data plane; encryption/decryption using AES-GCM is implemented on the P4 target and integrated in the P4 processing pipeline as P4 externs.

P4-IPsec [443] presents an implementation of IPsec for P4 switches. IPsec functionality is implemented in P4 and includes ESP in tunnel mode with support for different cipher suites. As in P4-MACsec, the cipher suites are implemented on the P4 target and integrated as P4 externs. In contrast to standard IPsec operation, IPsec tunnels are set up and renewed by an SDN controller without IKE. Site-to-site operation mode supports IPsec tunnels between P4 switches. Host-to-site operation mode supports roadwarrior access to an internal network via a P4 switch. To make the roadwarrior host manageable by the controller, the authors introduce a client agent tool for Linux hosts.

SPINE [445] introduces surveillance protection in the network elements by IP address obfuscation against surveillance in intermediate networks. In contrast to software-based approaches such as TOR, SPINE runs entirely on the data plane of two nodes with intermediate networks in between. It applies a one-time-pad-based encryption scheme with key rotation to encrypt IP addresses and, if present, TCP sequence and acknowledgment numbers. The SPINE nodes add a version number representing the encryption key index to each packet by which the receiving switch can select the appropriate key for decryption. The key sets required for the key rotation are maintained by a central controller.

Qin et al. [447] introduce encryption of TCP sequence numbers using substitution-boxes to protect traffic between two P4 switches. An ONOS-based controller receives the first packet of each new flow and applies security policies to decide whether the protection should be enabled. Then, it installs the necessary data in registers and updates MATs to enable TCP sequence number substitution.

P4NIS [448] proposes a scheme to protect against eavesdropping attacks. It comprises three lines of defense. In the first line of defense, packets that belong to one traffic flow are transmitted in a disordered manner via various links. In the second line of defense, source/destination ports and sequence/acknowledgment numbers are substituted via s-boxes similar to the approach of Qin et al. [447]. The third line of defense resembles existing encryption mechanisms that are not covered by P4NIS.

LANIM [450] presents a learning-based adaptive network immune mechanism to protect against eavesdropping attacks. It targets the Smart Identifier Network (SINET) [460], a novel, three-layer Internet architecture. LANIM applies the minimum risk ML algorithm to respond to irregular conditions and applies a policy-based encryption strategy focusing on the intent and application.
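
The version-number mechanism that SPINE [445] uses for key rotation can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the key table `KEYS`, the function names, and the use of one fixed 32-bit pad per key version are hypothetical and not taken from SPINE, and a fixed pad is not a true one-time pad. The sketch only shows how a version tag carried with the packet lets the receiving node select the matching key.

```python
import secrets

# Hypothetical key set, as it might be distributed by a central controller:
# key version -> 32-bit pad for IPv4 addresses.
KEYS = {version: secrets.randbits(32) for version in range(4)}


def obfuscate(ip, version):
    """Sending node: XOR the address with the pad and tag the key version."""
    return ip ^ KEYS[version], version


def restore(enc_ip, version):
    """Receiving node: select the pad by the version tag and undo the XOR."""
    return enc_ip ^ KEYS[version]


if __name__ == "__main__":
    original = 0xC0A80001  # 192.168.0.1
    enc_ip, tag = obfuscate(original, version=2)
    print(hex(enc_ip), tag, hex(restore(enc_ip, tag)))
```

Because the version travels with the packet, the two nodes could switch to a new key without losing packets that are still in flight under the old one.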

F. Other Fields of Application

Chang et al. [451] present IP source address encryption. It accomplishes non-linkability of IP addresses as a proactive defense mechanism. Network hosts are connected to trusted P4 switches at the network edges. In between, packets are exchanged via untrusted switches/routers. The P4 switch next to the sender encrypts the sender IP address by applying an XOR operation with a hash calculated from a random number and a shared key. The P4 switch next to the receiver decrypts the original sender IP address. The mechanism includes a dynamic key update mechanism so that transformations are random.

Clé [452] proposes to upgrade particular switches in a legacy network to P4 switches that implement security network functions (SNFs) such as rule-based firewalls or IDS. Clé comprises a smart device upgrade selection algorithm that selects switches to be upgraded and a controller that forwards traffic streams to the P4 switches that implement SNFs.

P4DAD [453] presents a novel approach to secure duplicate address detection (DAD) against spoofing attacks. Duplicate address detection is part of NDP in IPv6 where nodes check if an IPv6 address to be applied conflicts with another node. As the messages exchanged in duplicate address detection are not authenticated or encrypted, it is vulnerable to message spoofing. As a simple alternative to authentication or encryption, P4DAD introduces a mechanism to filter spoofed NDP messages. The P4 switch maintains registers to create bindings between IPv6 addresses, port numbers, and address states. Thereby, it can detect and drop spoofed NDP messages.

Chen [454] shows how AES can be implemented on Tofino-based P4 targets in P4 using MATs as lookup tables. Expansion of the AES key is performed in the control plane. MAT entries specific to the encryption keys are generated by a controller.

Gondaliya et al. [456] implement six known mechanisms against IP address spoofing for the NetFPGA SUME P4 target. Those are Network Ingress Filtering, Reverse Path Forwarding (Loose, Strict and Feasible), Spoofing Prevention Method (SPM), and Source Address Validation Improvement (SAVI). The authors compare the different mechanisms with regard to resource usage on the FPGA and report that the implementations of all mechanisms achieve a throughput of about 8.5 Gbit/s and a processing latency of about 2 µs per packet.

Poise [457] introduces context-aware policies for securing P4-based networks in BYOD scenarios. Instead of relying on a remote controller or software-based solution, Poise implements context-aware policy enforcement directly on P4 targets. Network administrators define context-aware security policies in a declarative language based on Pyretic NetCore that are then compiled into P4 programs to be executed on P4 targets. BYOD clients run a context collection module that adds context information headers to network packets. The P4 program generated by Poise then parses and uses this information to enforce ACLs based on device runtime contexts. P4 targets in Poise are managed by a Poise controller that compiles the P4 programs, installs them on the P4 targets, and provides configuration data to the collection modules. The authors present a prototype including PoiseDroid, an implementation of the context collection module for Android devices.

XIV. MISCELLANEOUS APPLIED RESEARCH DOMAINS

This section summarizes work that falls outside of the other application domains. We describe applied research on network coding, distributed algorithms, state migration, and application support. Table X shows an overview of all the work described.

Title                       Year      Targets                                  Code

Network Coding (Section XIV-A)
Kumar et al. [461]          2018      bmv2                                     [462]
Gonçalves et al. [463]      2019      bmv2

Distributed Algorithms (Section XIV-B)
P4CEP [464]                 2018      bmv2, Netronome
DAIET [465]                 2017      -
Sankaran et al. [466]       2020      -
Zhang et al. [467]          2017      bmv2
Dang et al. [468], [469]    2016/20   Tofino                                   [470]
P4BFT [471], [472]          2019      bmv2, Netronome
SwiShmem [473]              2020      -
SC-BFT [474]                2020      bmv2                                     [475]
LODGE [476]                 2018      bmv2
LOADER [477]                2020                                               [478]
FLAIR [479]                 2020      Tofino

State Migration (Section XIV-C)
Swing State [480]           2017      bmv2
P4Sync [481]                2020      bmv2                                     [482]
Xue et al. [483]            2020      bmv2
Kuzniar et al. [484]        2020      bmv2
Sankaran et al. [485]       2020      NetFPGA-SUME

Application Support (Section XIV-D)
P4DNS [486]                 2019      NetFPGA SUME                             [487]
P4-BNG [488]                2019      bmv2, Tofino, Netronome, NetFPGA-SUME    [489]
ARP-P4 [490]                2018      bmv2
Glebke et al. [491]         2019      Netronome
COIN [492]                  2019      -
Lu et al. [493]             2019      Tofino
Yazdinejad et al. [494]     2019      bmv2
P4rt-OVS [495]              2020      -                                        [496]

TABLE X: Overview of applied research on miscellaneous research domains (Section XIV).

A. Network Coding

In Network Coding (NC) [497], linear encoding and decoding operations are applied on packets to increase throughput, efficiency, scalability, and resilience. Network nodes apply primitive operations, e.g., splitting, encoding, or decoding packets, to implement NC mechanisms such as multicast, forward error correction, or rerouting (resilience).

Kumar et al. [461] implement primitive NC operations such as splitting, encoding, and decoding for a PSA software switch in P4. This is the first introduction of NC for SDN, as fixed-function data plane switches, e.g., as in OF, did not support such operations. The authors describe details of their implementation. The open source implementation [462] relies on clone and recirculate operations to generate additional packets for encoding and decoding operations and packet processing loops. Temporary packet buffers for gathering operations are implemented with P4 registers. However, P4 hardware targets are not considered.

Gonçalves et al. [463] implement NC operations that may use information from multiple packets during processing. The authors implement their concept for PISA in P4. It features multiple complex NC operations that focus on multiplications in Galois fields used for encoding and decoding operations. NC operations are implemented in P4 externs that extend the capabilities of the software switch to store a specific number of received packets. Again, hardware targets are not considered.

B. Distributed Algorithms

We describe related work on event processing and in-network consensus.

1) Event Processing:

Data with stream characteristics often require specific processing. For example, sensor data may be analyzed to determine whether values are within certain thresholds, or chunks of data are aggregated and preprocessed.

P4CEP [464] shifts complex event processing from servers to P4 switches so that event stream data, e.g., from sensors, is directly processed on the data plane. The authors provide implementations in P4 for bmv2 and the Netronome Agilio hardware target. The solution requires several workarounds to solve P4 limitations regarding stateful packet processing.

DAIET [465] introduces in-network data aggregation where the aggregation task is offloaded to the entire network. This reduces the amount of traffic and relieves the destination of computational load. The authors provide a prototype implementation in P4 but only a few details are disclosed.

Sankaran et al. [466] increase the processing speed of packets by reducing the time that is required by forwarding nodes to parse the packet header. To that end, ingress routers parse the header stack to compute a so-called unique parser code (UPC) which they add to the packet header. Downstream nodes need to parse only the UPC to make forwarding decisions.
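
The idea of a unique parser code (UPC) can be sketched in a few lines of Python. This is only an illustration under assumptions: the table contents, the code values, and the function names are hypothetical, and the sketch does not reproduce how Sankaran et al. [466] encode or carry the UPC in the packet header.

```python
# Hypothetical table that assigns a small integer code to each known header stack.
UPC_TABLE = {
    ("ethernet", "ipv4", "tcp"): 1,
    ("ethernet", "ipv4", "udp"): 2,
    ("ethernet", "vlan", "ipv4", "tcp"): 3,
}

# Reverse mapping used by downstream nodes.
STACK_BY_UPC = {code: stack for stack, code in UPC_TABLE.items()}


def ingress_tag(header_stack):
    """Ingress router: inspect the full header stack once, return its code (0 = unknown)."""
    return UPC_TABLE.get(tuple(header_stack), 0)


def downstream_layout(upc):
    """Downstream node: recover the header layout from the code alone."""
    return STACK_BY_UPC.get(upc)


if __name__ == "__main__":
    upc = ingress_tag(["ethernet", "ipv4", "udp"])
    print(upc, downstream_layout(upc))
```

The ingress node pays the full parsing cost once; every later hop replaces a walk over the header stack with a single table lookup on the code.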

2) In-Network Consensus:

Distributed algorithms or mechanisms may require consensus to determine the right solution or processing. This includes communication between participating entities and some ways to determine the right solution.

Zhang et al. [467] propose to offload parts of the Raft consensus algorithm to P4 switches. However, the mechanisms require an additional client to run on the switch. The authors implement their application for a P4 software switch, but details are not presented.

Dang et al. [468], [469] describe a P4 implementation of Paxos, a protocol that solves consensus for distributed algorithms in a network of unreliable processors based on information exchange between switches. This work contains a detailed description of a complex P4 implementation. The authors explain all components, provide code snippets, and discuss their design choices.

P4BFT [471], [472] introduces a consensus mechanism against buggy or malicious control plane instances. The controller responses are sent to trustworthy instances which compare the responses and establish consensus, e.g., by choosing the most common response. The authors propose to offload the comparison process to the data plane. P4BFT is implemented in P4 and evaluated for the bmv2 and the Netronome Agilio SmartNIC.

SwiShmem [473] is a distributed shared state management layer for the P4 data plane to implement stateful distributed network functions. In high performance environments, controllers are easily overloaded when consistency is required for write-intensive distributed network functions such as DDoS detection or rate limiters. Therefore, SwiShmem offloads consistency mechanisms from the control plane to the data plane. Then, consistency mechanisms operate at line rate because switches process traffic, and generate and forward state update messages without controller interaction.

Byzantine fault refers to a system where consensus between multiple entities has to be established where one or more entities are unreliable. Byzantine fault tolerance (BFT) describes mechanisms that handle such faults. However, BFT mechanisms often require significant time to reach consensus due to high computational overhead to reduce uncertainty. Switch-centric BFT (SC-BFT) [474] proposes to offload BFT functionalities, i.e., time synchronization and state synchronization, into the data plane. This significantly accelerates the consensus procedure since nodes process information at line rate.

LODGE [476] implements a mechanism for switches to make forwarding decisions based on global state without control of a central instance. Developers define global state variables which are stored by all stateful data plane devices. When such a node processes a packet that changes a global state variable, the switch generates and forwards an update packet to all other stateful switches on a predefined distribution tree.

LOADER [477] introduces global state to the data plane. Consensus is maintained by the data plane devices through distributed algorithms, i.e., the switches send notification messages when global state changes. This increases scalability in comparison to mechanisms where consensus is managed by a central control entity.

FLAIR [479] accelerates read operations in leader-based consensus protocols by processing the read requests in the data plane. To that end, FLAIR devices in the core maintain persistent information about pending write operations on all objects in the system. When a client submits a read request, the FLAIR switch checks whether the requested object is stable, i.e., if it has no pending write operations. If the object is stable, the FLAIR switch instructs another client with a stable version of the object to send it to the requesting client. If the object is not stable, the FLAIR switch forwards the read request to the leader.

C. State Migration

In Swing State [480], switches maintain state in registers that should be migrated to other nodes. For migration, state information is carried by regular packets created by the P4 clone operation throughout the network.

P4Sync [481] is a protocol to migrate data plane state between switches. Thereby, it does not require controller interaction and provides guarantees on the authenticity of the transferred state. To that end, it leverages the switch's packet generator to transfer the content of registers between devices. Authenticity in a migration operation is guaranteed by a hash chain where each packet contains the hashed values of both the current payload and the payload of the previous packet.

Xue et al. [483] propose a hybrid approach for storing flow entries to address the issue of limited on-switch memory. While some flow entries are still stored in the internal memory of the switch, some flow entries may be stored on servers. Switches access them with low latency via remote direct memory access (RDMA).

Kuzniar et al. [484] propose to leverage programmable switches to act as in-network cache to speed up queries over encrypted data stores. Encrypted key-value pairs are thereby stored in registers.

Sankaran et al. [485] describe a system to relieve switches from parsing headers. They propose to parse headers at an ingress switch only and add a unique parser code to the packet that identifies the set of headers of the packet. With this information, following switches can parse relevant information from the headers without having to parse the whole header stack.
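
The hash chain described above for P4Sync [481] can be sketched as follows. The sketch is a simplified illustration, not the P4Sync implementation: SHA-256, the function names, and the packet representation as `(payload, digest)` tuples are assumptions, and how the chain is anchored to a trusted value (for example, a shared key) is not covered in this survey and is not modeled here.

```python
import hashlib


def build_chain(chunks):
    """Sender: attach to each chunk a digest over the previous and the current payload."""
    packets, prev = [], b""
    for payload in chunks:
        digest = hashlib.sha256(prev + payload).digest()
        packets.append((payload, digest))
        prev = payload
    return packets


def verify_chain(packets):
    """Receiver: recompute every digest; a modified or reordered packet breaks the chain."""
    prev = b""
    for payload, digest in packets:
        if hashlib.sha256(prev + payload).digest() != digest:
            return False
        prev = payload
    return True


if __name__ == "__main__":
    chain = build_chain([b"reg0", b"reg1", b"reg2"])
    print(verify_chain(chain))
```

Linking each digest to the previous payload means that a receiver can check every packet as it arrives, without buffering the whole register dump first.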

D. Application Support

This subsection describes work that focuses on support or implementation of existing applications and protocols.

P4DNS [486] is an in-network DNS system. The authors propose a hybrid architecture with performance-critical components in the data plane and components with flexibility requirements in the control plane. The data plane responds to DNS requests and forwards regular traffic while cache management, recursive DNS requests, and uncached DNS responses are handled by the control plane.

P4-BNG [488] implements a carrier-grade broadband network gateway (BNG) in P4. The authors aim to provide an implementation for many different targets. To that end, they introduce a layer between data plane and control plane. This hardware-specific BNG data plane controller runs directly on the targets to provide a uniform interface to the control plane. It then configures the data plane according to the control commands from the control plane.

ARP-P4 [490] implements MAC address learning based on ARP solely on the P4 data plane. To substitute a control plane, the authors integrate MAC learning as an external function.

Glebke et al. [491] propose to offload computer vision functionalities, in particular, time-critical computations, to the data plane. To that end, the authors leverage convolution filters on a P4-programmable NIC. The necessary computations are distributed to various MATs.

COordinate-based INdexing (COIN) [492] is a mechanism to ensure efficient access to data on multiple distributed edge servers. To that end, the authors introduce a centralized instance that indexes data and its associated location. When an edge server requires data that it has not cached itself, it requests the data index at the centralized instance which provides a data location.

Lu et al. [493] propose intra-network inference (INI) and implement it in P4. It offloads neural network computations into the data plane. To that end, each P4 switch communicates via USB with a dedicated neural compute stick which performs computations.

Yazdinejad et al. [494] present a P4-based blockchain enabled packet parser. The proposed architecture focuses on FPGAs and aims to bring the security characteristics of blockchains into the data plane to greatly increase processing speed.

P4rt-OVS [495] is an extension for the OVS based on BPFs to combine the programmability of P4 and the well-known features of the OVS. P4rt-OVS enables runtime programming of the OVS, in particular, the deployment of new network features without recompilation of the OVS. It contains a P4-to-BPF compiler which allows developers to write data plane code for the OVS in P4.

XV. CONCLUSION

In this paper, we presented a survey of data plane programming with P4. In the first part of the paper, we introduced the concept of data plane programming and highlighted relations to adjacent concepts. We described the programming model and set a special focus on PISA, the underlying programming model for P4. We provided an overview of the current state of P4 with regard to programming language, architectures, compilers, targets, and data plane APIs. We reviewed research efforts to advance P4 that fall in the areas of optimization of development and deployment, research on P4 targets, and P4-specific approaches for control plane operation. In the second part of the paper, we analyzed 241 papers on applied research that leverage P4 for implementation purposes. We categorized these publications into research domains, summarized their key points, and characterized them by prototype, target platform, and source code availability.

The survey demonstrated a tremendous uptake of P4 for prototypes in academic research from 2018 to 2020. One reason is certainly the multitude of openly available resources on P4 and the bmv2 P4 software target. They are an ideal starting point for creating P4-based prototypes, even for beginners.

Most of the works presented a prototype for the P4 software target bmv2. Such implementations follow a common programming model (PISA) and programming language (P4). Thus, they are close to modern switch architectures so that they can prove the conceptual feasibility of new data plane algorithms. This is a great advantage of bmv2-based prototypes compared to general software implementations.

Depending on the P4 program, porting it from bmv2 to P4 hardware can be straightforward while it may be a challenge in other cases due to resource restrictions and potential unavailability of arbitrary extern functions on target platforms.

Although there are several P4 hardware targets, the majority of hardware-based prototypes documented in literature are based on the Tofino ASIC which is optimized for high bandwidths on many ports. A few other studies leveraged FPGA-based P4 targets that typically have lower throughput but allow users to provide customized externs and use them as functions in P4 programs.

Some reviewed papers focused on use cases and implementations that were feasible only on bmv2 due to the complexity of their algorithm or required interaction with other fixed-function blocks of the switch. Other works suggested applications that are also or even more appealing for networks that do not require high-throughput switches. Thus, ideal hardware targets are not yet available for those use cases while other use cases can be perfectly supported with Tofino ASICs or FPGA-based cards.

The presented works have shown that data plane programming can speed up the evolution of computer networking even though some of them require novel P4-based hardware targets, e.g., for access networks or with extended functionality. In any case, we expect P4 technology to become an integral part of multiple future hardware appliances.

XVI. ACKNOWLEDGEMENT

This work was partly supported by the Deutsche Forschungsgemeinschaft (DFG) under grant ME2727/1-2. The authors alone are responsible for the content of this paper.

LIST OF ACRONYMS

ACL access control list

ALU arithmetic logic unit

API application programming interface

AQM active queue management

ASIC application-specific integrated circuit

AWW adjusting advertised windows

bmv2 Behavioral Model version 2

BGP Border Gateway Protocol

BPF Berkeley Packet Filter

CLI command line interface

DAG directed acyclic graph

DDoS distributed denial of service

DPI deep packet inspection

DPDK Data Plane Development Kit

DSL domain-specific language

eBPF Extended Berkeley Packet Filter

ECN Explicit Congestion Notification

FPGA field programmable gate array

FSM finite state machine

GTP GPRS tunneling protocol

HDL hardware description language

HLIR high-level intermediate representation

IDE integrated development environment

IDL Intent Definition Language

IDS intrusion detection system

INT in-band network telemetry

LDWG Language Design Working Group

LPM longest prefix matching

LUT look up table

MAT match-action-table

ML machine learning

NDN named data networking

NF network function

NFP network flow processing

NFV network function virtualization

NIC network interface card

NPU network processing unit

ODM original design manufacturer

ODP Open Data Plane

OEM original equipment manufacturer

OF OpenFlow

ONF Open Networking Foundation

OVS Open vSwitch

PISA Protocol Independent Switching Architecture

PSA Portable Switch Architecture

REG register

RPC remote procedure call

RTL register-transfer level

SDK software development kit

SDN software-defined networking

SF service function

SFC service function chain

SRAM static random-access memory

TCAM ternary content-addressable memory

TSN Time-Sensitive Networking

TNA Tofino Native Architecture

uBPF user-space BPF

VM virtual machine

VNF virtual network function

VPP Vector Packet Processors

WG working group

XDP eXpress Data Path

REFERENCES

[1] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, “P4: Programming Protocol-independent Packet Processors,” ACM SIGCOMM Computer Communications Review (CCR), vol. 44, 2014.

[2] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, “The Click Modular Router,” ACM Transactions on Computer Systems (TOCS), vol. 18, 2000.

[3] “VPP/What is VPP?” https://bit.ly/2mrxVGE, accessed 01-20-2021.

[4] “BESS: Berkeley Extensible Software Switch,” http://span.cs.berkeley.edu/bess.html, accessed 01-20-2021.

[5] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, “Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN,” ACM SIGCOMM Conference, vol. 43, 2013.

[6] S. Chole, A. Fingerhut, S. Ma, A. Sivaraman, S. Vargaftik, A. Berger, G. Mendelson, M. Alizadeh, S.-T. Chuang, I. Keslassy, A. Orda, and T. Edsall, “DRMT: Disaggregated Programmable Switching,” in ACM SIGCOMM Conference, 2017.

[7] M. Moshref, A. Bhargava, A. Gupta, M. Yu, and R. Govindan, “Flow-level State Transition as a New Switch Primitive for SDN,” in ACM SIGCOMM Conference, 2014.

[8] G. Bianchi, M. Bonola, A. Capone, and C. Cascone, “OpenState: Programming Platform-independent Stateful Openflow Applications Inside the Switch,” ACM SIGCOMM Computer Communications Review (CCR), vol. 44, 2014.

[9] A. Sivaraman, A. Cheung, M. Budiu, C. Kim, M. Alizadeh, H. Balakrishnan, G. Varghese, N. McKeown, and S. Licking, “Packet Transactions: High-Level Programming for Line-Rate Switches,” in ACM SIGCOMM Conference, 2016.

[10] S. Pontarelli, R. Bifulco, M. Bonola, C. Cascone, M. Spaziani, V. Bruschi, D. Sanvito, G. Siracusano, A. Capone, M. Honda, F. Huici, and G. Siracusano, “FlowBlaze: Stateful Packet Processing in Hardware,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2019.

[11] H. Song, “Protocol-Oblivious Forwarding: Unleash the Power of SDN Through a Future-proof Forwarding Plane,” in ACM Workshop on Hot Topics in Networks (HotNets), 2013.

[12] C. J. Anderson, N. Foster, A. Guha, J.-B. Jeannin, D. Kozen, C. Schlesinger, and D. Walker, “NetKAT: Semantic Foundations for Networks,” in ACM Symposium on Principles of Programming Languages (POPL)

IEEE Communications Surveys & Tutorials (COMST), vol. 16, 2014.

[16] Y. Jarraya, T. Madi, and M. Debbabi, “A Survey and a Layered Taxonomy of Software-Defined Networking,” IEEE Communications Surveys & Tutorials (COMST), vol. 16, 2014.

[17] W. Xia, Y. Wen, C. H. Foh, D. Niyato, and H. Xie, “A Survey on Software-Defined Networking,” IEEE Communications Surveys & Tutorials (COMST), vol. 17, 2015.

[18] D. F. Macedo, D. Guedes, L. F. M. Vieira, M. A. M. Vieira, and M. Nogueira, “Programmable Networks—From Software-Defined Radio to Software-Defined Networking,” IEEE Communications Surveys & Tutorials (COMST), vol. 17, 2015.

[19] D. Kreutz, F. M. V. Ramos, P. E. Veríssimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, “Software-Defined Networking: A Comprehensive Survey,” Proceedings of the IEEE, vol. 103, 2015.

[20] R. Masoudi and A. Ghaffari, “Software defined networks: A survey,” Journal of Network and Computer Applications (JNCA), vol. 67, 2016.

[21] C. Trois, M. D. Del Fabro, L. C. E. de Bona, and M. Martinello, “A Survey on SDN Programming Languages: Toward a Taxonomy,” IEEE Communications Surveys & Tutorials (COMST), vol. 18, 2016.

[22] W. Braun and M. Menth, “Software-Defined Networking Using OpenFlow: Protocols, Applications and Architectural Design Choices,” MDPI Future Internet Journal (FI), vol. 6, 2014.

[23] F. Hu, Q. Hao, and K. Bao, “A Survey on Software-Defined Network and OpenFlow: From Concept to Implementation,” IEEE Communications Surveys & Tutorials (COMST), vol. 16, 2014.

[24] A. Lara, A. Kolasani, and B. Ramamurthy, “Network Innovation using OpenFlow: A Survey,” IEEE Communications Surveys & Tutorials (COMST), vol. 16, 2014.

[25] R. Bifulco and G. Rétvári, “A Survey on the Programmable Data Plane: Abstractions, Architectures, and Open Problems,” in IEEE International Conference on High Performance Switching and Routing (HPSR), 2018.

[26] E. Kaljic, A. Maric, P. Njemcevic, and M. Hadzialic, “A Survey on Data Plane Flexibility and Programmability in Software-Defined Networking,” IEEE ACCESS, vol. 7, 2019.

[27] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling Innovation in Campus Networks,” ACM SIGCOMM Computer Communications Review (CCR), vol. 38, 2008.

[28] “Google Presentations: P4 Tutorial,” http://bit.ly/p4d2-2018-spring, accessed 01-20-2021.

[29] “Website of the P4 Language Consortium,” https://p4.org/, accessed 01-20-2021.

[30] “The P4 Language Specification,” https://p4.org/p4-spec/p4-14/v1.0.5/tex/p4.pdf, accessed 01-20-2021.

[31] “P4_16 Language Specification (v.1.2.1),” https://p4.org/p4-spec/docs/P4-16-v1.2.1.html, accessed 01-20-2021.

[32] “Charter of the P4 Architecture WG,” https://github.com/p4lang/p4-spec/blob/master/p4-16/psa/charter/P4_Arch_Charter.mdk, accessed 01-20-2021.

[33] “P4_16 PSA Specification (v1.1),” https://p4lang.github.io/p4-spec/docs/PSA-v1.1.0.html, accessed 01-20-2021.

[34] “P4-HLIR Specification v.0.9.30,” https://github.com/p4lang/p4-hlir/blob/master/HLIRSpec.pdf, accessed 01-20-2021.

[35] “GitHub: p4c,” https://github.com/p4lang/p4c, accessed 01-20-2021.

[36] P. G. Patra, C. E. Rothenberg, and G. Pongracz, “MACSAD: High Performance Dataplane Applications on the Move,” in IEEE International Conference on High Performance Switching and Routing (HPSR), 2017.

[37] “Open Data Plane,” https://opendataplane.org/, accessed 01-20-2021.

[38] L. Jose, L. Yan, G. Varghese, and N. McKeown, “Compiling Packet Programs to Reconfigurable Switches,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2015.

[39] P. Li and Y. Luo, “P4GPU: Accelerate Packet Processing of a P4 Program with a CPU-GPU Heterogeneous Architecture,” in ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2016.

[40] “GitHub: p4c-behavioural,” https://github.com/p4lang/p4c-behavioral/tree/master/p4c_bm, accessed 01-20-2021.

[41] “GitHub: Behavioural Model Version 2 (BMv2),” https://github.com/p4lang/behavioral-model, accessed 01-20-2021.

[42] “P4 Behaviour Model: Why did we need BMv2,” https://github.com/p4lang/behavioral-model

ACM SIGCOMM Conference

ACM SIGOPS Asia-Pacific Workshop on System (APSys), 2017.

[53] X. Wu, P. Li, T. Miskell, L. Wang, Y. Luo, and X. Jiang, “Ripple: An Efficient Runtime Reconfigurable P4 Data Plane for Multicore Systems,” in International Conference on Networking and Network Applications (NaNA), 2019.

[54] M. Shahbaz, S. Choi, B. Pfaff, C. Kim, N. Feamster, N. McKeown, and J. Rexford, “PISCES: A Programmable, Protocol-Independent Software Switch,” in ACM SIGCOMM Conference

Asia-Pacific Workshop on Networking (APnet), 2017.

[58] ——, “PVPP: A Programmable Vector Packet Processor,” in ACM Symposium on SDN Research (SOSR), 2017.

[59] “Northbound Networks - Who are You?” https://northboundnetworks.com/pages/about-us, accessed 01-20-2021.

[60] “GitHub: ZodiacFX-P4,” https://github.com/NorthboundNetworks/ZodiacFX-P4, accessed 01-20-2021.

[61] “GitHub: p4c-zodiacfx,” https://github.com/NorthboundNetworks/p4c-zodiacfx, accessed 01-20-2021.

[62] P. Zanna, P. Radcliffe, and K. G. Chavez, “A Method for Comparing OpenFlow and P4,” in International Telecommunication Networks and Applications Conference (ITNAC), 2019.

[63] “GitHub: P4-NetFPGA,” https://github.com/NetFPGA/P4-NetFPGA-public/wiki, accessed 01-20-2021.

[64] S. Ibanez, G. Brebner, N. McKeown, and N. Zilberman, “The P4-NetFPGA Workflow for Line-Rate Packet Processing,” in ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019.

[65] N. Zilberman, Y. Audzevich, G. A. Covington, and A. W. Moore, “NetFPGA SUME: Toward 100 Gbps as Research Commodity,” IEEE Micro

ACM Symposium on SDN Research (SOSR) , 2017.[68] “GitHub: P4FPGA,” https://github.com/p4fpga/p4fpga, accessed 01-20-2021.[69] P. Benácek, V. Pu, and H. Kubátová, “P4-to-VHDL: Automatic Genera-tion of 100 Gbps Packet Parsers,” in

IEEE Annual International Sympo-sium on Field-Programmable Custom Computing Machines (FCCM) ,2016.[70] P. Benáˇcek, V. Puš, J. Koˇrenek, and M. Kekely, “Line Rate Pro-grammable Packet Processing in 100Gb Networks,” in

InternationalConference on Field Programmable Logic and Applications (FPL) ,2017.[71] J. Cabal, P. Benáˇcek, L. Kekely, M. Kekely, V. Puš, and J. Koˇrenek,“Conﬁgurable FPGA Packet Parser for Terabit Networks with Guar-anteed Wire-Speed Throughput,” in

ACM/SIGDA International Sympo-sium on Field-Programmable Gate Arrays (FPGA) , 2018.[72] S. da Silva, Jeferson, Boyer, François-Raymond, Langlois, andJ. Pierre, “P4-Compatible High-Level Synthesis of Low Latency 100 Gb/s Streaming Packet Parsers in FPGAs,” in

ACM/SIGDA Interna-tional Symposium on Field-Programmable Gate Arrays (FPGA) , 2018.[73] M. Kekely and J. Korenek, “Mapping of P4 Match Action Tables toFPGA,” in

International Conference on Field Programmable Logic andApplications (FPL) , 2017.[74] R. Iša, P. Benáˇcek, and V. Puš, “Veriﬁcation of Generated RTL from P4Source Code,” in

IEEE International Conference on Network Protocols(ICNP) , 2018.[75] Z. Cao, H. Su, Q. Yang, J. Shen, M. Wen, and C. Zhang, “P4 to FPGA-A Fast Approach for Generating Efﬁcient Network Processors,”

IEEEACCESS , vol. 8, 2020.[76] Z. Cao, H. Su, Q. Yang, M. Wen, and C. Zhang, “A Template-basedFramework for Generating Network Processor in FPGA,” in

IEEEConference on Computer Communications Workshops (INFOCOMWKSHPS)

P4 Workshop in Europe (EuroP4)

ACM/IEEE Symposium on Architectures for Networking and Commu-nications Systems (ANCS) , 2019.[90] “Apache Thrift,” https://thrift.apache.org/, accessed 01-20-2021.[91] “gRPC,” https://grpc.io/, accessed 01-20-2021.[92] “Google Protocol Buffers,” https://developers.google.com/protocol-buffers/, accessed 01-20-2021.[93] “Charter of the P4 API WG,” https://github.com/p4lang/p4-spec/blob/master/api/charter/P4_API_WG_charter.mdk, accessed 01-20-2021.[94] “P4 Runtime API Speciﬁcation v.1.3.0 (2019-12-01),” https://p4.org/p4runtime/spec/v1.3.0/P4Runtime-Spec.html, accessed 01-20-2021.[95] “ONOS: P4 brigade,” https://wiki.onosproject.org/display/ONOS/P4+brigade, accessed 01-20-2021.[96] “OpenDaylight: P4 brigade,” P4PluginDeveloperGuide, accessed 09-23-2019.[97] B. O’Connor, Y. Tseng, M. Pudelko, C. Cascone, A. Endurthi, Y. Wang,A. Ghaffarkhah, D. Gopalpur, T. Everman, T. Madejski, J. Wanderer,and A. Vahdat, “Using P4 on Fixed-Pipeline and Programmable Stra-tum Switches,” in

P4 Workshop in Europe (EuroP4) , 2010.[98] “GitHub: P4tutorial,” https://github.com/p4lang/tutorials/tree/master/utils/p4runtime_lib, accessed 01-20-2021.[99] “GitHub: PI Library,” https://github.com/p4lang/PI, accessed 01-20-2021.[100] “GitHub: Behavioural Model - simple_switch_grpc,” https://github.com/p4lang/behavioral-model/tree/master/targets/simple_switch_grpc,accessed 01-20-2021.[101] “GitHub: bmv2 Runtime CLI,” https://github.com/p4lang/behavioral-model/blob/master/tools/runtime_CLI.py, accessed 01-20-2021. [102] E. O. Zaballa and Z. Zhou, “Graph-to-P4: A P4 Boilerplate CodeGenerator for Parse Graphs,” in

P4 Workshop in Europe (EuroP4) ,2019.[103] Y. Zhou and J. Bi, “ClickP4: Towards Modular Programming of P4,”in

ACM SIGCOMM Conference Posters and Demos , 2017.[104] M. Baldi, “daPIPE A Data Plane Incremental Programming Environ-ment,” in

P4 Workshop in Europe (EuroP4) , 2019.[105] M. Eichholz, E. Campbell, N. Foster, G. Salvaneschi, and M. Mezini,“How to Avoid Making a Billion-Dollar Mistake: Type-Safe DataPlane Programming with SafeP4,” in

European Conference on Object-Oriented Programming (ECOOP) , 2019.[106] M. Riftadi and F. Kuipers, “P4I/O: Intent-Based Networking with P4,”in

IEEE Conference on Network Softwarization (NetSoft) , 2019.[107] L. Yu, J. Sonchack, and V. Liu, “Mantis: Reactive ProgrammableSwitches,” in

ACM SIGCOMM Conference , 2020.[108] J. Gao, E. Zhai, H. H. Liu, R. Miao, Y. Zhou, B. Tian, C. Sun,D. Cai, M. Zhang, and M. Yu, “Lyra: A Cross-Platform Languageand Compiler for Data PlaneProgramming on Heterogeneous ASICs,”in

ACM SIGCOMM Conference , 2020.[109] M. Riftadi, J. Oostenbrink, and F. Kuipers, “GP4P4: Enabling Self-Programming Networks,”

ArXiv e-prints , 2019.[110] D. Moro, D. Sanvito, and A. Capone, “FlowBlaze.p4: a library forquick prototyping of stateful SDN applications in P4,” in

IEEEConference on Network Function Virtualization and Software-DeﬁnedNetworking (NFV-SDN) , 2020.[111] ——, “Demonstrating FlowBlaze.p4: fast prototyping for EFSM-baseddata plane applications,” in

IEEE Conference on Network FunctionVirtualization and Software-Deﬁned Networking (NFV-SDN) , 2020.[112] D. Moro, D. Sanvito, and A. Capone, “Developing EFSM-BasedStateful Applications with FlowBlaze.P4 and ONOS,” in

P4 Workshopin Europe (EuroP4) , 2020.[113] R. Shah, A. Shirke, A. Trehan, M. Vutukuru, and P. Kulkarni, “pcube:Primitives for Network Data Plane Programming,” in

IEEE Interna-tional Conference on Network Protocols (ICNP) , 2018.[114] Z. Ma, J. Bi, C. Zhang, Y. Zhou, and A. B. Dogar, “CacheP4:A Behavior-level Caching Mechanism for P4,” in

ACM SIGCOMMConference Posters and Demos , 2017.[115] A. Abhashkumar, J. Lee, J. Tourrilhes, S. Banerjee, W. Wu, J.-M. Kang,and A. Akella, “P5: Policy-driven Optimization of P4 Pipeline,” in

ACM Symposium on SDN Research (SOSR) , 2017.[116] P. Wintermeyer, M. Apostolaki, A. Dietmüller, and L. Vanbever,“P2GO: P4 Proﬁle-Guided Optimizations,” in

ACM Workshop on HotTopics in Networks (HotNets) , 2020.[117] S. Yang, L. Baia, L. Cui, Z. Ming, Y. Wu, S. Yu, H. Shen, and Y. Pan,“P4 Edge node enabling stateful trafﬁc engineering and cyber security,”

Journal of Network and Computer Applications (JNCA) , vol. 171, 2020.[118] B. Vass, E. Bérczi-Kovács, C. Raiciu, and G. Rétvári, “CompilingPacket Programs to Reconﬁgurable Switches: Theory and Algorithms,”in

P4 Workshop in Europe (EuroP4) , 2020.[119] S. Abdi, U. Aftab, G. Bailey, B. Boughzala, F. Dewal, S. Parsazad,and E. Tremblay, “PFPSim: A Programmable Forwarding Plane Simu-lator,” in

ACM/IEEE Symposium on Architectures for Networking andCommunications Systems (ANCS) , 2016.[120] J. Bai, J. Bi, P. Kuang, C. Fan, Y. Zhou, and C. Zhang, “NS4: EnablingProgrammable Data Plane Simulation,” in

ACM Symposium on SDNResearch (SOSR) , 2018.[121] C. Fan, J. Bi, Y. Zhou, C. Zhang, and H. Yu, “NS4: A P4-DrivenNetwork Simulator,” in

ACM SIGCOMM Conference Posters andDemos

ArXiv e-prints , 2018.[124] J. Liu, W. Hallahan, C. Schlesinger, M. Sharif, J. Lee, R. Soulé,H. Wang, C. Ca¸scaval, N. McKeown, and N. Foster, “P4V: PracticalVeriﬁcation for Programmable Data Planes,” in

ACM SIGCOMMConference , 2018.[125] L. Freire, M. Neves, L. Leal, K. Levchenko, A. Schaeffer-Filho, andM. Barcellos, “Uncovering Bugs in P4 Programs with Assertion-basedVeriﬁcation,” in

ACM Symposium on SDN Research (SOSR) , 2018.[126] M. Neves, L. Freire, A. Schaeffer-Filho, and M. Barcellos, “Veriﬁcationof P4 Programs in Feasible Time using Assertions,” in

ACM Conferenceon emerging Networking EXperiments and Technologies (CoNEXT) ,2018. [127] R. Stoenescu, D. Dumitrescu, M. Popovici, L. Negreanu, and C. Raiciu,“Debugging P4 Programs with Vera,” in ACM SIGCOMM Conference ,2018.[128] M. A. Noureddine, A. Hsu, M. Caesar, F. A. Zaraket, and W. H.Sanders, “P4AIG: Circuit-Level Veriﬁcation of P4 Programs,” in

IEEE/IFIP International Conference on Dependable Systems and Net-works – Supplemental Volume (DSN-S) , 2019.[129] D. Dumitrescu, R. Stoenescu, L. Negreanu, and C. Raiciu, “bf4:towards bug-free P4 programs,” in

ACM SIGCOMM Conference , 2020.[130] D. Dumitrescu, R. Stoenescu, M. Popovici, L. Negreanu, and C. Raiciu,“Dataplane equivalence and its applications,” in

USENIX Symposiumon Networked Systems Design & Implementation (NSDI) , 2019.[131] F. Youseﬁ, A. Abhashkumar, K. Subramanian, K. Hans, S. Ghorbani,and A. Akella, “Liveness Veriﬁcation of Stateful Network Functions,”in

USENIX Symposium on Networked Systems Design & Implementa-tion (NSDI) , 2020.[132] A. Nötzli, J. Khan, A. Fingerhut, C. Barrett, and P. Athanas, “P4Pktgen:Automated Test Case Generation for P4 Programs,” in

ACM Symposiumon SDN Research (SOSR) , 2018.[133] Y. Zhou, J. Bi, Y. Lin, Y. Wang, D. Zhang, Z. Xi, J. Cao, and C. Sun,“P4Tester: Efﬁcient Runtime Rule Fault Detection for ProgrammableData Planes,” in

IEEE International Workshop on Quality of Service(IWQoS) , 2019.[134] “GitHub: P4app,” https://github.com/p4lang/p4app, accessed 01-20-2021.[135] A. Shukla, K. N. Hudemann, A. Hecker, and S. Schmid, “RuntimeVeriﬁcation of P4 Switches with Reinforcement Learning,” in

Workshopon Network Meets AI & ML , 2019.[136] D. Jindal, R. Joshi, and B. Leong, “P4TrafﬁcTool: Automated CodeGeneration for P4 Trafﬁc Generators and Analyzers,” in

ACM Sympo-sium on SDN Research (SOSR) , 2019.[137] H. T. Dang, H. Wang, T. Jepsen, G. Brebner, C. Kim, J. Rexford,R. Soulé, and H. Weatherspoon, “Whippersnapper: A P4 LanguageBenchmark Suite,” in

ACM Symposium on SDN Research (SOSR) ,2017.[138] F. Rodriguez, P. G. K. Patra, L. Csikor, C. E. Rothenberg, P. Vörös,S. Laki, and G. Pongrácz, “BB-Gen: A Packet Crafter for P4 TargetEvaluation,” in

ACM SIGCOMM Conference Posters and Demos , 2018.[139] H. Harkous, M. Jarschel, M. He, R. Pries, and W. Kellerer, “P8: P4with Predictable Packet Processing Performance,”

IEEE Transactionson Network and Service Management (TNSM) , 2020.[140] S. Kodeswaran, M. T. Arashloo, P. Tammana, and J. Rexford, “TrackingP4 Program Execution in the Data Plane,” in

ACM Symposium on SDNResearch (SOSR) , 2020.[141] K. Birnfeld, D. C. da Silva, W. Cordeiro, and B. B. N. de França,“P4 Switch Code Data Flow Analysis: Towards Stronger Veriﬁcationof Forwarding Plane Software,” in

IEEE/IFIP Network Operations andManagement Symposium (NOMS) , 2020.[142] M. Neves, B. Huffaker, K. Levchenko, and M. Barcellos, “DynamicProperty Enforcement in Programmable Data Planes,” in

IFIP-TC6Networking Conference (Networking) , 2019.[143] C. Zhang, J. Bi, Y. Zhou, J. Wu, B. Liu, Z. Li, A. B. Dogar,and Y. Wang, “P4DB: On-the-ﬂy Debugging of the ProgrammableData Plane,” in

IEEE International Conference on Network Protocols(ICNP) , 2017.[144] Y. Zhou, J. Bi, C. Zhang, B. Liu, Z. Li, Y. Wang, and M. Yu, “P4DB:On-the-Fly Debugging for Programmable Data Planes,”

IEEE/ACMTransactions on Networking (ToN) , vol. 27, 2019.[145] M. Neves, K. Levchenko, and M. Barcellos, “Sandboxing Data PlanePrograms for Fun and Proﬁt,” in

ACM SIGCOMM Conference Postersand Demos , 2017.[146] A. Shukla, S. Fathalli, T. Zinner, A. Hecker, and S. Schmid, “P4Consist:Toward Consistent P4 SDNs,”

IEEE Journal on Selected Areas inCommunications (JSAC) , vol. 38, 2020.[147] Z. Xia, J. Bi, Y. Zhou, and C. Zhang, “KeySight: A Scalable Trou-bleshooting Platform Based on Network Telemetry,” in

ACM Sympo-sium on SDN Research (SOSR) , 2018.[148] F. Ruffy, T. Wang, and A. Sivaraman, “Gauntlet: Finding Bugs in Com-pilers for Programmable Packet Processing,” in

USENIX Symposium onOperating Systems Design and Implementation (OSDI) , 2020.[149] J. Krude, J. Hofmann, M. Eichholz, K. Wehrle, A. Koch, andM. Mezini, “Online Reprogrammable Multi Tenant Switches,” in

ACMCoNEXT Workshop on Emerging In-Network Computing Paradigms ,2019.[150] D. Hancock and J. van der Merwe, “HyPer4: Using P4 to Virtualizethe Programmable Data Plane,” in

ACM Conference on emergingNetworking EXperiments and Technologies (CoNEXT) , 2016. [151] C. Zhang, J. Bi, Y. Zhou, A. B. Dogar, and J. Wu, “HyperV: A HighPerformance Hypervisor for Virtualization of the Programmable DataPlane,” in

IEEE International Conference on Computer Communica-tions and Networks (ICCCN) , 2017.[152] ——, “MPVisor: A Modular Programmable Data Plane Hypervisor,”in

ACM Symposium on SDN Research (SOSR) , 2017.[153] “GitHub: HyperVDP,” https://github.com/HyperVDP, accessed 01-20-2021.[154] C. Zhang, J. Bi, Y. Zhou, and J. Wu, “HyperVDP: High-PerformanceVirtualization of the Programmable Data Plane,”

IEEE Journal onSelected Areas in Communications (JSAC) , vol. 37, 2019.[155] M. Saquetti, G. Bueno, W. Cordeiro, and J. R. Azambuja, “P4VBox:Enabling P4-Based Switch Virtualization,”

IEEE Communications Let-ters , vol. 24, 2020.[156] ——, “VirtP4: An Architecture for P4 Virtualization,” in

IEEE Inter-national Parallel and Distributed Processing Symposium Workshops(IPDPSW) , 2019.[157] P. Zheng, T. Benson, and C. Hu, “P4Visor: Lightweight Virtualiza-tion and Composition Primitives for Building and Testing ModularPrograms,” in

ACM Conference on emerging Networking EXperimentsand Technologies (CoNEXT) , 2018.[158] R. Parizotto, L. Castanheira, F. Bonetti, A. Santos, and A. Schaeffer-Filho, “PRIME: Programming In-Network Modular Extensions,” in

IEEE/IFIP Network Operations and Management Symposium (NOMS) ,2020.[159] E. O. Zaballa, D. Franco, M. S. Berger, and M. Higuero, “A Per-spective on P4-Based Data and Control Plane Modularity for NetworkAutomation,” in

P4 Workshop in Europe (EuroP4) , 2020.[160] R. Stoyanov and N. Zilberman, “MTPSA: Multi-Tenant ProgrammableSwitches,” in

P4 Workshop in Europe (EuroP4) , 2020.[161] “GitHub: MTPSA,” https://github.com/mtpsa, accessed 01-20-2021.[162] J. Santiago da Silva, T. Stimpﬂing, T. Luinaud, B. Fradj, andB. Boughzala, “One for All, All for One: A Heterogeneous DataPlane for Flexible P4 Processing,” in

IEEE International Conferenceon Network Protocols (ICNP) , 2018.[163] C. Beckmann, R. Krishnamoorthy, H. Wang, A. Lam, and C. Kim,“Hurdles for a DRAM-based Match-Action Table,” in

Conference onInnovation in Clouds, Internet and Networks and Workshops (ICIN) ,2020.[164] A. Aghdai, Y. Xu, and H. J. Chao, “Design of a hybrid modular switch,”in

IEEE Conference on Network Function Virtualization and Software-Deﬁned Networking (NFV-SDN) , 2017.[165] S. Laki, D. Horpacsi, P. Voros, M. Tejfel, P. Hudoba, G. Pongracz,and L. Molnar, “The Price for Asynchronous Execution of ExternFunctions in Programmable Software Data Planes,” in

Workshop onFlexible Network Data Plane Processing (NETPROC@ICIN) , 2020.[166] D. Horpácsi, P. Vörös, M. Tejfel, S. Laki, G. Pongrácz, and L. Molnár,“Asynchronous Extern Functions in Programmable Software DataPlanes,” in

P4 Workshop in Europe (EuroP4) , 2019.[167] D. Scholz, A. Oeldemann, F. Geyer, S. Gallenmüller, H. Stubbe,T. Wild, A. Herkersdorf, and G. Carle, “Cryptographic Hashing in P4Data Planes,” in

P4 Workshop in Europe (EuroP4) , 2019.[168] J. S. da Silva, F.-R. Boyer, L.-O. Chiquette, and J. P. Langlois, “ExternObjects in P4: an ROHC Header Compression Scheme Case Study,”in

IEEE Conference on Network Softwarization (NetSoft) , 2018.[169] N. Gray, A. Grigorjew, T. Hosssfeld, A. Shukla, and T. Zinner, “High-lighting the Gap Between Expected and Actual Behavior in P4-enabledNetworks,” in

IFIP/IEEE Symposium on Integrated Management (IM) ,2019.[170] M. V. Dumitru, D. Dumitrescu, and C. Raiciu, “Can We Exploit BuggyP4 Programs?” in

ACM Symposium on SDN Research (SOSR) , 2020.[171] J. Mambretti, J. Chen, F. Yeh, and S. Y. Yu, “International P4Networking Testbed,” in

ACM/IEEE Symposium on Architectures forNetworking and Communications Systems (ANCS) , 2019.[172] B. Chung, C. Tseng, J. H. Chen, and J. Mambretti, “P4MT: Multi-Tenant Support Prototype for International P4 Testbed,” in

ACM/IEEESymposium on Architectures for Networking and CommunicationsSystems (ANCS)

ComputerNetworks , vol. 155, 2019.[175] ——, “ProFlow: Proportional Per-Bidirectional-Flow Consistent Up-dates,”

IEEE Transactions on Network and Service Management(TNSM) , vol. 16, 2019. [176] S. Liu, T. A. Benson, and M. K. Reiter, “Efﬁcient and Safe NetworkUpdates with Sufﬁx Causal Consistency,” in European Conference onComputer Systems (EUROSYS) , 2019.[177] T. D. Nguyen, M. Chiesa, and M. Canini, “Decentralized ConsistentNetwork Updates in SDN with ez-Segway,”

ArXiv e-prints , 2017.[178] S. Geissler, S. Herrnleben, R. Bauer, A. Grigorjew, T. Zinner, andM. Jarschel, “The Power of Composition: Abstracting aMulti-DeviceSDN Data Path Through a Single API,”

IEEE Transactions on Networkand Service Management (TNSM) , 2019.[179] E. C. Molero, S. Vissicchio, and L. Vanbever, “Hardware-AcceleratedNetwork Control Planes,” in

ACM Workshop on Hot Topics in Networks(HotNets) , 2018.[180] V. Sivaraman, S. Narayana, O. Rottenstreich, S. Muthukrishnan, andJ. Rexford, “Heavy-Hitter Detection Entirely in the Data Plane,” in

ACM Symposium on SDN Research (SOSR) , 2017.[181] “GitHub: Hashpipe,” https://github.com/vibhaa/hashpipe, accessed 01-20-2021.[182] Y. Lin, C. Huang, and S. Tsai, “SDN Soft Computing Application forDetecting Heavy Hitters,”

IEEE Transactions on Industrial Informatics(ToII) , vol. 15, 2019.[183] D. A. Popescu, G. Antichi, and A. W. Moore, “Enabling Fast Hier-archical Heavy Hitter Detection using Programmable Data Planes,” in

ACM Symposium on SDN Research (SOSR) , 2017.[184] R. Harrison, Q. Cai, A. Gupta, and J. Rexford, “Network-Wide HeavyHitter Detection with Commodity Switches,” in

ACM Symposium onSDN Research (SOSR) , 2018.[185] J. Kuˇcera, D. A. Popescu, H. Wang, A. Moore, J. Koˇrenek, andG. Antichi, “Enabling Event-Triggered Data Plane Monitoring,” in

ACM Symposium on SDN Research (SOSR) , 2020.[186] M. Silva, A. Jacobs, R. Pﬁtscher, and L. Granville, “IDEAFIX: Iden-tifying Elephant Flows in P4-Based IXP Networks,” in

IEEE GlobalCommunications Conference (GLOBECOM) , 2018.[187] B. Turkovic, J. Oostenbrink, and F. Kuipers, “Detecting Heavy Hittersin the Data-plane,”

ArXiv e-prints , 2019.[188] D. Ding, M. Savi, G. Antichi, and D. Siracusa, “An Incrementally-Deployable P4-Enabled Architecture for Network-Wide Heavy-HitterDetection,”

IEEE Transactions on Network and Service Management(TNSM) , vol. 17, 2020.[189] “GitHub: Network-Wide Heavy-Hitter Detection Implemen-tation in P4 Language,” https://github.com/DINGDAMU/Network-wide-heavy-hitter-detection, accessed 01-20-2021.[190] J. Sonchack, A. J. Aviv, E. Keller, and J. M. Smith, “Turboﬂow:Information Rich Flow Record Generation on Commodity Switches,”in

European Conference on Computer Systems (EUROSYS) , 2018.[191] “GitHub: TurboFlow,” https://github.com/jsonch/TurboFlow, accessed01-20-2021.[192] J. Sonchack, O. Michel, A. J. Aviv, E. Keller, and J. M. Smith,“Scaling Hardware Accelerated Network Monitoring to Concurrentand Dynamic Queries With *Flow,” in

USENIX Annual TechnicalConference (ATC) , 2018.[193] “GitHub: StarFlow,” https://github.com/jsonch/starﬂow, accessed 01-25-2021.[194] J. Hill, M. Aloserij, and P. Grosso, “Tracking Network Flows withP4,” in

IEEE/ACM Innovating the Network for Data-Intensive Science(INDIS) , 2018.[195] L. Castanheira, R. Parizotto, and A. E. Schaeffer-Filho, “FlowStalker:Comprehensive Trafﬁc Flow Monitoring on the Data Plane using P4,”in

IEEE International Conference on Communications (ICC) , 2019.[196] R. Parizotto, L. Castanheira, R. H. Ribeiro, L. Zembruzki, A. S. Jacobs,L. Z. Granville, and A. Schaeffer-Filho, “ShadowFS: Speeding-up DataPlane Monitoring and Telemetry using P4,” in

IEEE InternationalConference on Communications (ICC) , 2020.[197] W. Wang, P. Tammana, A. Chen, and T. S. E. Ng, “Grasp theRoot Causes in the Data Plane: Diagnosing Latency Problems withSpiderMon,” in

ACM Symposium on SDN Research (SOSR) , 2020.[198] X. Chen, S. Landau-Feibish, Y. Koral, J. Rexford, O. Rottenstreich,S. A. Monetti, and T.-Y. Wang, “Fine-Grained Queue Measurementin the Data Plane,” in

ACM Conference on emerging NetworkingEXperiments and Technologies (CoNEXT) , 2019.[199] Z. Zhao, X. Shi, X. Yin, Z. Wang, and Q. Li, “HashFlow forBetter Flow Record Collection,” in

IEEE International Conference onDistributed Computing Systems (ICDCS) , 2019.[200] Q. Huang, P. P. C. Lee, and Y. Bao, “Sketchlearn: Relieving UserBurdens in Approximate Measurement with Automated StatisticalInference,” in

ACM SIGCOMM Conference , 2018.[201] “GitHub: SketchLearn,” https://github.com/huangqundl/SketchLearn,accessed 01-20-2021. [202] L. Tang, Q. Huang, and P. C. Lee, “A Fast and Compact InvertibleSketch for Network-Wide Heavy Flow Detection,”

IEEE/ACM Trans-actions on Networking (ToN) , vol. 28, 2020.[203] “GitHub: MV-Sketch,” https://github.com/Grace-TL/MV-Sketch, ac-cessed 01-20-2021.[204] Z. Hang, M. Wen, Y. Shi, and C. Zhang, “Interleaved Sketch: To-ward Consistent Network Telemetry for Commodity ProgrammableSwitches,”

IEEE ACCESS , vol. 7, 2019.[205] Z. Liu, A. Manousis, G. Vorsanger, V. Sekar, and V. Braverman, “OneSketch to Rule Them All: Rethinking Network Flow Monitoring withUnivMon,” in

ACM SIGCOMM Conference , 2016.[206] T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao,X. Li, and S. Uhlig, “Elastic Sketch: Adaptive and Fast Network-wideMeasurements,” in

ACM SIGCOMM Conference , 2018.[207] T. Yang, J. Jiang, P. Liu, Q. Huang, J. Gong, Y. Zhou, R. Miao, X. Li,and S. Uhlig, “Adaptive Measurements Using One Elastic Sketch,”

IEEE/ACM Transactions on Networking (ToN) , vol. 27, 2019.[208] “GitHub: ElasticSketch,” https://github.com/BlockLiu/ElasticSketchCode, accessed 01-20-2021.[209] F. Pereira, N. Neves, and F. M. V. Ramos, “Secure network monitoringusing programmable data planes,” in

IEEE Conference on NetworkFunction Virtualization and Software-Deﬁned Networking (NFV-SDN) ,2017.[210] R. F. T. Martins, F. L. Verdi, R. Villaça, and L. F. U. Garcia, “UsingProbabilistic Data Structures for Monitoring of Multi-tenant P4-basedNetworks,” in

IEEE Symposium on Computers and Communications(ISCC) , 2018.[211] Y.-K. Lai, K.-Y. Shih, P.-Y. Huang, H.-P. Lee, Y.-J. Lin, T.-L. Liu,and J. H. Chen, “Sketch-based Entropy Estimation for Network TrafﬁcAnalysis using Programmable Data Plane ASICs,” in

ACM/IEEE Sym-posium on Architectures for Networking and Communications Systems(ANCS) , 2019.[212] Z. Liu, S. Zhou, O. Rottenstreich, V. Braverman, and J. Rex-ford, “Memory-Efﬁcient Performance Monitoring on ProgrammableSwitches with Lean Algorithms,” in

SIAM Symposium on AlgorithmicPrinciples of Computer Systems (APOCS) , 2020.[213] L. Tang, Q. Huang, and P. P. C. Lee, “SpreadSketch: Toward Invertibleand Network-Wide Detection of Superspreaders,” in

IEEE InternationalConference on Computer Communications (INFOCOM) , 2020.[214] “GitHub: SpreadSketch,” http://adslab.cse.cuhk.edu.hk/software/spreadsketch/, accessed 01-20-2021.[215] J. Vestin, A. Kassler, D. Bhamare, K. Grinnemo, J. Andersson, andG. Pongracz, “Programmable Event Detection for In-Band NetworkTelemetry,” in

IEEE International Conference on Cloud Networking(IEEE CloudNet) , 2019.[216] S. Wang, Y. Chen, J. Li, H. Hu, J. Tsai, and Y. Lin, “A Bandwidth-Efﬁcient INT System for Tracking the Rules Matched by the Packets ofa Flow,” in

IEEE Global Communications Conference (GLOBECOM) ,2019.[217] D. Bhamare, A. Kassler, J. Vestin, M. A. Khoshkholghi, and J. Taheri,“IntOpt: In-Band Network Telemetry Optimization for NFV ServiceChain Monitoring,” in

IEEE International Conference on Communica-tions (ICC) , 2019.[218] C. Jia, T. Pan, Z. Bian, X. Lin, E. Song, C. Xu, T. Huang, and Y. Liu,“Rapid Detection and Localization of Gray Failures in Data Centersvia In-band Network Telemetry,” in

IEEE/IFIP Network Operationsand Management Symposium (NOMS) , 2020.[219] “GitHub: Gray Failures Detection and Localization,” https://github.com/graytower/INT_DETECT, accessed 01-20-2021.[220] B. Niu, J. Kong, S. Tang, Y. Li, and Z. Zhu, “Visualize Your IP-Over-Optical Network in Realtime: A P4-Based Flexible Multilayer In-BandNetwork Telemetry (ML-INT) System,”

IEEE ACCESS , vol. 7, 2019.[221] N. S. Kagami, R. I. T. da Costa Filho, and L. P. Gaspary, “CAPEST:Ofﬂoading Network Capacity and Available Bandwidth Estimationto Programmable Data Planes,”

IEEE Transactions on Network andService Management (TNSM) , vol. 17, 2020.[222] “GitHub: Capest,” https://github.com/nicolaskagami/capest, accessed01-20-2021.[223] N. Choi, L. Jagadeesan, Y. Jin, N. N. Mohanasamy, M. R. Rahman,K. Sabnani, and M. Thottan, “Run-time Performance Monitoring,Veriﬁcation, and Healing of End-to-End Services,” in

IEEE Conferenceon Network Softwarization (NetSoft) , 2019.[224] A. Sgambelluri, F. Paolucci, A. Giorgetti, D. Scano, and F. Cugini,“Exploiting Telemetry in Multi-Layer Networks,” in

InternationalConference on Transparent Optical Networks (ICTON) , 2020. [225] Y. Feng, S. Panda, S. G. Kulkarni, K. K. Ramakrishnan, andN. Dufﬁeld, “A SmartNIC-Accelerated Monitoring Platform for In-band Network Telemetry,” in IEEE International Symposium on Localand Metropolitan Area Networks (LANMAN) , 2020.[226] J. Marques, K. Levchenko, and L. Gaspary, “IntSight: Diagnosing SLOViolations with in-Band Network Telemetry,” in

ACM Conference onemerging Networking EXperiments and Technologies (CoNEXT) , 2020.[227] “GitHub: IntSight,” https://github.com/jonadmark/intsight-conext, ac-cessed 01-20-2021.[228] D. Suh, S. Jang, S. Han, S. Pack, and X. Wang, “Flexible sampling-based in-band network telemetry in programmable data plane,”

ICTExpress , vol. 6, 2020.[229] S. Narayana, A. Sivaraman, V. Nathan, P. Goyal, V. Arun, M. Alizadeh,V. Jeyakumar, and C. Kim, “Language-Directed Hardware Design forNetwork Performance Monitoring,” in

ACM SIGCOMM Conference ,2017.[230] V. Nathan, S. Narayana, A. Sivaraman, P. Goyal, V. Arun, M. Alizadeh,V. Jeyakumar, and C. Kim, “Demonstration of the Marple System forNetwork Performance Monitoring,” in

ACM SIGCOMM ConferencePosters and Demos , 2017.[231] “GitHub: Marple,” https://github.com/performance-queries/marple, ac-cessed 01-20-2021.[232] P. Laffranchini, L. Rodrigues, M. Canini, and B. Krishnamurthy, “Mea-surements As First-class Artifacts,” in

IEEE International Conferenceon Computer Communications (INFOCOM) , 2019.[233] “GitHub: Maﬁa,” https://github.com/paololaff/maﬁa-sdn, accessed 01-20-2021.[234] A. Gupta, R. Harrison, M. Canini, N. Feamster, J. Rexford, andW. Willinger, “Sonata: Query-Driven Streaming Network Telemetry,”in

ACM Symposium on SDN Research (SOSR) , 2018.[235] “GitHub: SONATA,” https://github.com/Sonata-Princeton/SONATA-DEV, accessed 01-20-2021.[236] R. Teixeira, R. Harrison, A. Gupta, and J. Rexford, “PacketScope:Monitoring the Packet Lifecycle Inside a Switch,” in

ACM Symposiumon SDN Research (SOSR) , 2020.[237] Y. Gao, Y. Jing, and W. Dong, “UniROPE: Universal and RobustPacket Trajectory Tracing for Software-Deﬁned Networks,”

IEEE/ACMTransactions on Networking (ToN) , vol. 26, 2018.[238] S. Knossen, J. Hill, and P. Grosso, “Hop Recording and ForwardingState Logging: Two Implementations for Path Tracking in P4,” in

IEEE/ACM Innovating the Network for Data-Intensive Science (INDIS) ,2019.[239] A. Indra Basuki, D. Rosiyadi, and I. Setiawan, “Preserving NetworkPrivacy on Fine-grain Path-tracking Using P4-based SDN,” in

Inter-national Conference on Radar, Antenna, Microwave, Electronics, andTelecommunications (ICRAMET) , 2020.[240] R. Joshi, T. Qu, M. C. Chan, B. Leong, and B. T. Loo, “BurstRadar:Practical Real-time Microburst Monitoring for Datacenter Networks,”in

ACM SIGOPS Asia-Paciﬁc Workshop on System (APSys) , 2018.[241] “GitHub: BurstRadar,” https://github.com/harshgondaliya/burstradar,accessed 01-20-2021.[242] M. Ghasemi, T. Benson, and J. Rexford, “Dapper: Data Plane Per-formance Diagnosis of TCP,” in

ACM Symposium on SDN Research(SOSR) , 2017.[243] C.-H. He, B. Y. Chang, S. Chakraborty, C. Chen, and L. C. Wang, “AZero Flow Entry Expiration Timeout P4 Switch,” in

ACM Symposiumon SDN Research (SOSR) , 2018.[244] A. Riesenberg, Y. Kirzon, M. Bunin, E. Galili, G. Navon, andT. Mizrahi, “Time-Multiplexed Parsing in Marking-Based NetworkTelemetry,” in

ACM International Conference on Systems and Storage(SYSTOR) , 2019.[245] “GitHub: P4 Alternate Marking Algorithm,” https://github.com/AlternateMarkingP4/FlaseClase, accessed 01-20-2021.[246] S. Y. Wang, H. W. Hu, and Y. B. Lin, “Design and Implementationof TCP-Friendly Meters in P4 Switches,”

IEEE/ACM Transactions onNetworking (ToN) , vol. 28, 2020.[247] R. Kundel, F. Siegmund, J. Blendin, A. Rizk, and B. Koldehofe,“P4STA: High Performance Packet Timestamping with ProgrammablePacket Processors,” in

IEEE/IFIP Network Operations and Manage-ment Symposium (NOMS) , 2020.[248] “GitHub: P4STA,” https://github.com/ralfkundel/P4STA, accessed 01-20-2021.[249] R. Hark, D. Bhat, M. Zink, R. Steinmetz, and A. Rizk, “PreprocessingMonitoring Information on the SDN Data-Plane using P4,” in

IEEEConference on Network Function Virtualization and Software-DeﬁnedNetworking (NFV-SDN) , 2019. [250] D. Ding, M. Savi, and D. Siracusa, “Estimating Logarithmic andExponential Functions to Track Network Trafﬁc Entropy in P4,” in

IEEE/IFIP Network Operations and Management Symposium (NOMS) ,2020.[251] “GitHub: P4Entropy,” https://github.com/DINGDAMU/P4Entropy, ac-cessed 01-20-2021.[252] P. Taffet and J. Mellor-Crummey, “Lightweight, Packet-Centric Moni-toring of Network Trafﬁc and Congestion Implemented in P4,” in

IEEESymposium on High-Performance Interconnects (HOTI) , 2019.[253] Y. Lin, Y. Zhou, Z. Liu, K. Liu, Y. Wang, M. Xu, J. Bi, Y. Liu, andJ. Wu, “NetView: Towards On-Demand Network-Wide Telemetry in theData Center,” in

IEEE International Conference on Communications(ICC) , 2020.[254] J. Bai, M. Zhang, G. Li, C. Liu, M. Xu, and H. Hu, “FastFE: Ac-celerating ML-Based Trafﬁc Analysis with Programmable Switches,”in

Workshop on Secure Programmable Network Infrastructure (SPIN) ,2020.[255] J. Kuˇcera, R. B. Basat, M. Kuka, G. Antichi, M. Yu, and M. Mitzen-macher, “Detecting Routing Loops in the Data Plane,” in

ACMConference on emerging Networking EXperiments and Technologies(CoNEXT) , 2020.[256] Z. Hang, Y. Shi, M. Wen, and C. Zhang, “TBSW: Time-BasedSliding Window Algorithm for Network Trafﬁc Measurement,” in

IEEE International Conference on High Performance Computing andCommunications; IEEE International Conference on Smart City; IEEEInternational Conference on Data Science and Systems (HPCC/S-martCity/DSS) , 2019.[257] B. Guan and S. Shen, “FlowSpy: An Efﬁcient Network MonitoringFramework Using P4 in Software-Deﬁned Networks,” in

IEEE Semi-annual Vehicular Technology Conference (VTC)

Optical Fiber Communication Conference (OFC)

ACM SIGCOMM Conference, 2015.
[270] “GitHub: DC.p4,” https://github.com/p4lang/papers/tree/master/sosr15, accessed 01-20-2021.
[271] “Open Network Foundation: P4 apps at ONF,” https://github.com/p4lang/p4-applications/blob/master/meeting_slides/2018_04_19_ONF.pdf, accessed 01-20-2021.
[272] “GitHub: fabric.p4,” https://github.com/opennetworkinglab/onos/blob/master/pipelines/fabric/impl/src/main/resources/fabric.p4, accessed 01-20-2021.
[273] B. Pit-Claudel, Y. Desmouceaux, P. Pfister, M. Townsley, and T. Clausen, “Stateless Load-Aware Load Balancing in P4,” in IEEE International Conference on Network Protocols (ICNP), 2018.
[274] R. Miao, H. Zeng, C. Kim, J. Lee, and M. Yu, “SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap using Switching ASICs,” in ACM SIGCOMM Conference, 2017.
[275] N. Katta, M. Hira, C. Kim, A. Sivaraman, and J. Rexford, “HULA: Scalable Load Balancing using Programmable Data Planes,” in ACM Symposium on SDN Research (SOSR), 2016.
[276] C. H. Benet, A. J. Kassler, T. Benson, and G. Pongracz, “MP-HULA: Multipath Transport Aware Load Balancing using Programmable Data Planes,” in Morning Workshop on In-Network Computing, 2018.
[277] B. T. Chiang and K. Wang, “Cost-effective Congestion-aware Load Balancing for Datacenters,” in International Conference on Electronics, Information, and Communication (ICEIC), 2019.
[278] J.-L. Ye, C. Chen, and Y. H. Chu, “A Weighted ECMP Load Balancing Scheme for Data Centers using P4 Switches,” in IEEE International Conference on Cloud Networking (IEEE CloudNet), 2018.
[279] K.-F. Hsu, P. Tammana, R. Beckett, A. Chen, J. Rexford, and D. Walker, “Adaptive Weighted Traffic Splitting in Programmable Data Planes,” in ACM Symposium on SDN Research (SOSR), 2020.
[280] M. Pizzutti and A. Schaeffer-Filho, “An Efficient Multipath Mechanism Based on the Flowlet Abstraction and P4,” in IEEE Global Communications Conference (GLOBECOM), 2018.
[281] ——, “Adaptive Multipath Routing based on Hybrid Data and Control Plane Operation,” in IEEE International Conference on Computer Communications (INFOCOM), 2020.
[282] J. Zhang, S. Wen, J. Zhang, H. Chai, T. Pan, T. Huang, L. Zhang, Y. Liu, and F. R. Yu, “Fast Switch-Based Load Balancer Considering Application Server States,” IEEE/ACM Transactions on Networking (ToN), vol. 28, 2020.
[283] Q. Li, J. Zhang, T. Pan, T. Huang, and Y. Liu, “Data-driven Routing Optimization based on Programmable Data Plane,” in IEEE International Conference on Computer Communications and Networks (ICCCN), 2020.
[284] E. Kawaguchi, H. Kasuga, and N. Shinomiya, “Unsplittable flow Edge Load factor Balancing in SDN using P4 Runtime,” in International Telecommunication Networks and Applications Conference (ITNAC), 2019.
[285] E. Cidon, S. Choi, S. Katti, and N. McKeown, “AppSwitch: Application-layer Load Balancing within a Software Switch,” in Asia-Pacific Workshop on Networking (APnet), 2017.
[286] V. Olteanu, A. Agache, A. Voinescu, and C. Raiciu, “Stateless Datacenter Load-balancing with Beamer,” in

USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2018.
[287] “GitHub: Beamer,” https://github.com/Beamer-LB, accessed 01-25-2021.
[288] J. Geng, J. Yan, and Y. Zhang, “P4QCN: Congestion Control using P4-Capable Device in Data Center Networks,” Electronics Journal, vol. 8, 2019.
[289] J. Jiang and Y. Zhang, “An Accurate Congestion Control Mechanism in Programmable Network,” in IEEE Annual Computing and Communication Workshop and Conference (CCWC), 2019.
[290] S. Shahzad, E. Jung, J. Chung, and R. Kettimuthu, “Enhanced Explicit Congestion Notification (EECN) in TCP with P4 Programming,” in International Conference on Green and Human Information Technology (ICGHIT), 2020.
[291] C. Chen, H. Fang, and M. S. Iqbal, “QoSTCP: Provide Consistent Rate Guarantees to TCP flows in Software Defined Networks,” in IEEE International Conference on Communications (ICC), 2020.
[292] A. Laraba, J. François, I. Chrisment, S. R. Chowdhury, and R. Boutaba, “Defeating Protocol Abuse with P4: Application to Explicit Congestion Notification,” in IFIP-TC6 Networking Conference (Networking), 2020.
[293] N. K. Sharma, M. Liu, K. Atreya, and A. Krishnamurthy, “Approximating Fair Queueing on Reconfigurable Switches,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2018.
[294] C. Cascone, N. Bonelli, L. Bianchi, A. Capone, and B. Sansò, “Towards Approximate Fair Bandwidth Sharing via Dynamic Priority Queuing,” in IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN), 2017.
[295] D. Bhat, J. Anderson, P. Ruth, M. Zink, and K. Keahey, “Application-based QoE support with P4 and OpenFlow,” in IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2019.
[296] E. F. Kfoury, J. Crichigno, E. Bou-Harb, D. Khoury, and G. Srivastava, “Enabling TCP Pacing using Programmable Data Plane Switches,” in International Conference on Telecommunications and Signal Processing (TSP), 2019.
[297] Y. Chen, L. Yen, W. Wang, C. Chuang, Y. Liu, and C. Tseng, “P4-Enabled Bandwidth Management,” in Asia-Pacific Network Operations and Management Symposium (APNOMS), 2019.
[298] S. S. W. Lee and K. Chan, “A Traffic Meter Based on a Multicolor Marker for Bandwidth Guarantee and Priority Differentiation in SDN Virtual Networks,” IEEE Transactions on Network and Service Management (TNSM), vol. 16, 2019.
[299] S.-Y. Wang, J.-Y. Li, and Y.-B. Lin, “Aggregating and disaggregating packets with various sizes of payload in P4 switches at 100 Gbps line rate,” Journal of Network and Computer Applications (JNCA), vol. 165, 2020.
[300] K. Tokmakov, M. Sarker, J. Domaschka, and S. Wesner, “A Case for Data Centre Traffic Management on Software Programmable Ethernet Switches,” in IEEE International Conference on Cloud Networking (IEEE CloudNet), 2019.
[301] B. Turkovic, F. Kuipers, N. van Adrichem, and K. Langendoen, “Fast Network Congestion Detection and Avoidance using P4,” in Workshop on Networking for Emerging Applications and Technologies (NEAT), 2018.
[302] R. Kundel, J. Blendin, T. Viernickel, B. Koldehofe, and R. Steinmetz, “P4-CoDel: Active Queue Management in Programmable Data Planes,” in

IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2018.
[303] “GitHub: P4-CoDel,” https://github.com/ralfkundel/p4-codel, accessed 01-20-2021.
[304] M. Menth, H. Mostafaei, D. Merling, and M. Häberle, “Implementation and Evaluation of Activity-Based Congestion Management using P4 (P4-ABC),” MDPI Future Internet Journal (FI), vol. 11, 2019.
[305] B. Turkovic and F. Kuipers, “P4air: Increasing Fairness among Competing Congestion Control Algorithms,” in IEEE International Conference on Network Protocols (ICNP), 2020.
[306] L. B. Fernandes and L. Camargos, “Bandwidth throttling in a P4 switch,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2020.
[307] G. Wang, C. Chen, C. Chen, L. Pan, Y. Wang, C. Fan, and C. Hsu, “Streaming Scalable Video Sequences with Media-Aware Network Elements Implemented in P4 Programming Language,” in IEEE/IFIP Network Operations and Management Symposium (NOMS), 2018.
[308] A. G. Alcoz, A. Dietmüller, and L. Vanbever, “SP-PIFO: Approximating Push-In First-Out Behaviors using Strict-Priority Queues,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2020.
[309] B. Andrus, S. A. Sasu, T. Szyrkowiec, A. Autenrieth, M. Chamania, J. K. Fischer, and S. Rasp, “Zero-Touch Provisioning of Distributed Video Analytics in a Software-Defined Metro-Haul Network with P4 Processing,” in Optical Fiber Communication Conference (OFC), 2019.
[310] S. Ibanez, G. Antichi, G. Brebner, and N. McKeown, “Event-Driven Packet Processing,” in ACM Workshop on Hot Topics in Networks (HotNets), 2019.
[311] E. F. Kfoury, J. Crichigno, and E. Bou-Harb, “Offloading Media Traffic to Programmable Data Plane Switches,” in IEEE International Conference on Communications (ICC), 2020.
[312] I. Kettaneh, S. Udayashankar, A. Abdel-hadi, R. Grosman, and S. Al-Kiswany, “Falcon: Low Latency, Network-Accelerated Scheduling,” in P4 Workshop in Europe (EuroP4), 2020.
[313] T. Osiński, M. Kossakowski, M. Pawlik, J. Palimąka, M. Sala, and H. Tarasiuk, “Unleashing the Performance of Virtual BNG by Offloading Data Plane to a Programmable ASIC,” in P4 Workshop in Europe (EuroP4), 2020.
[314] J. Lee, R. Miao, C. Kim, M. Yu, and H. Zeng, “Stateful Layer-4 Load Balancing in Switching ASICs,” in ACM SIGCOMM Conference Posters and Demos, 2017.
[315] K. Nichols, V. Jacobson, A. McGregor, and J. Iyengar, “Controlled Delay Active Queue Management,” Internet Requests for Comments, RFC Editor, RFC 8289, 01 2018. [Online]. Available: https://tools.ietf.org/rfc/rfc8289.txt
[316] B. Lewis, L. Fawcett, M. Broadbent, and N. Race, “Using P4 to Enable Scalable Intents in Software Defined Networks,” in IEEE International Conference on Network Protocols (ICNP), 2018.
[317] “GitHub: P4 Source Routing,” https://github.com/BenRLewis/P4-Source-Routing, accessed 01-20-2021.
[318] L. Luo, H. Yu, S. Luo, Z. Ye, X. Du, and M. Guizani, “Scalable Explicit Path Control in Software-Defined Networks,” Journal of Network and Computer Applications (JNCA), vol. 141, 2019.
[319] “GitHub: P4 Paco,” https://github.com/an15m/paco, accessed 01-20-2021.
[320] A. Kushwaha, S. Sharma, N. Bazard, A. Gumaste, and B. Mukherjee, “Design, Analysis, and a Terabit Implementation of a Source-Routing-Based SDN Data Plane,” IEEE Systems Journal, 2020.
[321] A. Abdelsalam, A. Tulumello, M. Bonola, S. Salsano, and C. Filsfils, “Pushing Network Programmability to the limits with SRv6 uSIDs and P4,” in P4 Workshop in Europe (EuroP4), 2020.
[322] W. Braun, J. Hartmann, and M. Menth, “Demo: Scalable and Reliable Software-Defined Multicast with BIER and P4,” in

IFIP/IEEE Symposium on Integrated Management (IM), 2017.
[323] “Bitbucket: p4-bfr,” https://bitbucket.org/wb-ut/p4-bfr, accessed 01-20-2021.
[324] D. Merling, S. Lindner, and M. Menth, “P4-Based Implementation of BIER and BIER-FRR for Scalable and Resilient Multicast,” Journal of Network and Computer Applications (JNCA), vol. 169, 2020.
[325] “GitHub: P4-BIER,” https://github.com/uni-tue-kn/p4-bier, accessed 01-20-2021.
[326] M. Shahbaz, L. Suresh, J. Rexford, N. Feamster, O. Rottenstreich, and M. Hira, “Elmo: Source Routed Multicast for Public Clouds,” in ACM Special Interest Group on Data Communication, 2019.
[327] “GitHub: Elmo MCast,” https://github.com/Elmo-MCast/p4-programs, accessed 01-20-2021.
[328] S. Luo, H. Yu, K. Li, and H. Xing, “Efficient File Dissemination in Data Center Networks with Priority-based Adaptive Multicast,” IEEE Journal on Selected Areas in Communications (JSAC), vol. 38, 2020.
[329] C. Wernecke, H. Parzyjegla, G. Mühl, P. Danielis, and D. Timmermann, “Realizing Content-Based Publish/Subscribe with P4,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2018.
[330] C. Wernecke, H. Parzyjegla, G. Mühl, E. Schweissguth, and D. Timmermann, “Flexible Notification Forwarding for Content-Based Publish/Subscribe Using P4,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2020.
[331] C. Wernecke, H. Parzyjegla, and G. Mühl, “Implementing Content-based Publish/Subscribe on the Network Layer with P4,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2020.
[332] C. Wernecke, H. Parzyjegla, G. Mühl, P. Danielis, E. Schweissguth, and D. Timmermann, “Stitching Notification Distribution Trees for Content-based Publish/Subscribe with P4,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2020.
[333] T. Jepsen, M. Moshref, A. Carzaniga, N. Foster, and R. Soulé, “Packet Subscriptions for Programmable ASICs,” in ACM Workshop on Hot Topics in Networks (HotNets), 2018.
[334] R. Kundel, C. Gaertner, M. Luthra, S. Bhowmik, and B. Koldehofe, “Flexible Content-based Publish/Subscribe over Programmable Data Planes,” in IEEE/IFIP Network Operations and Management Symposium (NOMS), 2020.
[335] “GitHub: p4bsub,” https://github.com/ralfkundel/p4bsub/, accessed 01-20-2021.
[336] J. Vestin, A. Kassler, S. Laki, and G. Pongrácz, “Towards In-Network Event Detection and Filtering for Publish/Subscribe Communication using Programmable Data Planes,” IEEE Transactions on Network and Service Management (TNSM), 2020.
[337] S. Signorello, R. State, J. François, and O. Festor, “NDN.p4: Programming Information-Centric Data-Planes,” in IEEE Conference on Network Softwarization (NetSoft), 2016.
[338] R. Miguel, S. Signorello, and F. M. V. Ramos, “Named Data Networking with Programmable Switches,” in IEEE International Conference on Network Protocols (ICNP), 2018.
[339] “GitHub: NDN.p4,” https://github.com/signorello/NDN.p4, accessed 01-20-2021.
[340] “GitHub: NDN.p4-16,” https://github.com/netx-ulx/NDN.p4-16, accessed 01-20-2021.
[341] O. Karrakchou, N. Samaan, and A. Karmouch, “ENDN: An Enhanced NDN Architecture with a P4-programmable Data Plane,” in International Conference on Networking (ICN), 2020.
[342] R. Sedar, M. Borokhovich, M. Chiesa, G. Antichi, and S. Schmid, “Supporting Emerging Applications With Low-Latency Failover in P4,” in Morning Workshop on In-Network Computing, 2018.
[343] “GitHub: P4-FRR,” https://bitbucket.org/roshanms/p4-frr/src/master/, accessed 01-20-2021.
[344] H. Giesen, L. Shi, J. Sonchack, A. Chelluri, N. Prabhu, N. Sultana, L. Kant, A. J. McAuley, A. Poylisher, A. DeHon, and B. T. Loo, “In-Network Computing to the Rescue of Faulty Links,” in Morning Workshop on In-Network Computing, 2018.
[345] T. Qu, R. Joshi, M. Chan, B. Leong, D. Guo, and Z. Liu, “SQR: In-network Packet Loss Recovery from Link Failures for Highly Reliable Datacenter Networks,” in IEEE International Conference on Network Protocols (ICNP), 2019.
[346] “GitHub: P4 SQR,” https://git.io/fjbnV, accessed 01-20-2021.
[347] S. Lindner, D. Merling, M. Häberle, and M. Menth, “P4-Protect: 1+1 Path Protection for P4,” in

P4 Workshop in Europe (EuroP4), 2020.
[348] “GitHub: P4-Protect BMv2,” https://github.com/uni-tue-kn/p4-protect, accessed 01-20-2021.
[349] “GitHub: P4-Protect Tofino,” https://github.com/uni-tue-kn/p4-protect-tofino, accessed 01-20-2021.
[350] K. Hirata and T. Tachibana, “Implementation of Multiple Routing Configurations on Software-Defined Networks with P4,” in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019.
[351] S. Lindner, M. Häberle, F. Heimgaertner, N. Nayak, S. Schildt, D. Grewe, H. Loehr, and M. Menth, “P4 In-Network Source Protection for Sensor Failover,” in IFIP-TC6 Networking Conference (Networking), 2020.
[352] “GitHub: P4 Source Protection BMv2,” https://github.com/uni-tue-kn/p4-source-protection, accessed 01-20-2021.
[353] “GitHub: P4 Source Protection Tofino,” https://github.com/uni-tue-kn/p4-source-protection-tofino, accessed 01-20-2021.
[354] K. Subramanian, A. Abhashkumar, L. D’Antoni, and A. Akella, “D2R: Dataplane-Only Policy-Compliant Routing Under Failures,” 2019.
[355] M. Chiesa, R. Sedar, G. Antichi, M. Borokhovich, A. Kamisiński, G. Nikolaidis, and S. Schmid, “PURR: A Primitive for Reconfigurable Fast Reroute,” in ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2019.
[356] T. Holterbach, E. C. Molero, M. Apostolaki, A. Dainotti, S. Vissicchio, and L. Vanbever, “Blink: Fast Connectivity Recovery Entirely in the Data Plane,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2019.
[357] “GitHub: Blink,” https://github.com/nsg-ethz/Blink, accessed 01-20-2021.
[358] K.-F. Hsu, R. Beckett, A. Chen, J. Rexford, and D. Walker, “Contra: A Programmable System for Performance-aware Routing,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2020.
[359] O. Michel and E. Keller, “Policy Routing using Process-Level Identifiers,” in IEEE International Conference on Cloud Engineering Workshop (IC2EW), 2016.
[360] A. C. Baktir, A. Ozgovde, and C. Ersoy, “Implementing Service-Centric Model with P4: A Fully-Programmable Approach,” in IEEE/IFIP Network Operations and Management Symposium (NOMS), 2018.
[361] W. Froes, L. Santos, L. N. Sampaio, M. Martinello, A. Liberato, and R. S. Villaca, “ProgLab: Programmable Labels for QoS Provisioning on Software Defined Networks,” Computer Communications, vol. 161, 2020.
[362] N. Varyani, Z.-L. Zhang, and D. Dai, “QROUTE: An Efficient Quality of Service (QoS) Routing Scheme for Software-Defined Overlay Networks,” IEEE ACCESS, vol. 8, 2020.
[363] S. Gimenez, E. Grasa, and S. Bunch, “A Proof of Concept Implementation of a RINA Interior Router using P4-enabled Software Targets,” in Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN), 2020.
[364] W. Feng, X. Tan, and Y. Jin, “Implementing ICN over P4 in HTTP Scenario,” in IEEE International Conference on Hot Information-Centric Networking (HotICN), 2019.
[365] G. Grigoryan, Y. Liu, and M. Kwon, “PFCA: A Programmable FIB Caching Architecture,” IEEE/ACM Transactions on Networking (ToN), vol. 28, 2020.
[366] A. McAuley, Y. M. Gottlieb, L. Kant, J. Lee, and A. Poylisher, “P4-Based Hybrid Error Control Booster Providing New Design Tradeoffs in Wireless Networks,” in IEEE Military Communications Conference (MILCOM), 2019.
[367] M. Kogias, G. Prekas, A. Ghosn, J. Fietz, and E. Bugnion, “R2P2: Making RPCs first-class datacenter citizens,” in USENIX Annual Technical Conference (ATC), 2019.
[368] “GitHub: R2P2 - Request Response Pair Protocol,” https://github.com/epfl-dcsl/r2p2, accessed 01-25-2021.
[369] D. Merling, M. Menth, N. Warnke, and T. Eckert, “An Overview of Bit Index Explicit Replication (BIER),” IETF Journal, 2018.
[370] M. Hollingsworth, J. Lee, Z. Liu, J. Lee, S. Ha, and D. Grunwald, “P4EC: Enabling Terabit Edge Computing in Enterprise 4G LTE,” in

USENIX Workshop on Hot Topics in Edge Computing (HotEdge), 2020.
[371] “GitHub: spgw.p4,” https://github.com/opennetworkinglab/onos/blob/master/pipelines/fabric/impl/src/main/resources/include/control/spgw.p4, accessed 01-20-2021.
[372] P. Palagummi and K. M. Sivalingam, “SMARTHO: A Network Initiated Handover in NG-RAN using P4-based Switches,” in International Conference on Network and Services Management (CNSM), 2018.
[373] A. Aghdai, M. Huang, D. Dai, Y. Xu, and J. Chao, “Transparent Edge Gateway for Mobile Networks,” in IEEE International Conference on Network Protocols (ICNP), 2018.
[374] A. Aghdai, Y. Xu, M. Huang, D. H. Dai, and H. J. Chao, “Enabling Mobility in LTE-Compatible Mobile-edge Computing with Programmable Switches,” ArXiv e-prints, 2019.
[375] J. Xie, C. Qian, D. Guo, X. Li, S. Shi, and H. Chen, “Efficient Data Placement and Retrieval Services in Edge Computing,” in IEEE International Conference on Distributed Computing Systems (ICDCS), 2019.
[376] J. Xie, D. Guo, X. Shi, H. Cai, C. Qian, and H. Chen, “A Fast Hybrid Data Sharing Framework for Hierarchical Mobile Edge Computing,” in IEEE International Conference on Computer Communications (INFOCOM), 2020.
[377] C. Shen, D. Lee, C. Ku, M. Lin, K. Lu, and S. Tan, “A Programmable and FPGA-accelerated GTP Offloading Engine for Mobile Edge Computing in 5G Networks,” in IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2019.
[378] C. Lee, K. Ebisawa, H. Kuwata, M. Kohno, and S. Matsushima, “Performance Evaluation of GTP-U and SRv6 Stateless Translation,” in International Conference on Network and Services Management (CNSM), 2019.
[379] R. Ricart-Sanchez, P. Malagon, J. M. Alcaraz-Calero, and Q. Wang, “P4-NetFPGA-based network slicing solution for 5G MEC architectures,” in ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2019.
[380] S. K. Singh, C. E. Rothenberg, G. Patra, and G. Pongracz, “Offloading Virtual Evolved Packet Gateway User Plane Functions to a Programmable ASIC,” in ACM CoNEXT Workshop on Emerging In-Network Computing Paradigms, 2019.
[381] R. Shah, V. Kumar, M. Vutukuru, and P. Kulkarni, “TurboEPC: Leveraging Dataplane Programmability to Accelerate the Mobile Packet Core,” in ACM Symposium on SDN Research (SOSR), 2020.
[382] P. Vörös, G. Pongrácz, and S. Laki, “Towards a Hybrid Next Generation NodeB,” in P4 Workshop in Europe (EuroP4), 2020.
[383] Y. Lin, T. Huang, and S. Tsai, “Enhancing 5G/IoT Transport Security Through Content Permutation,” IEEE ACCESS, vol. 7, 2019.
[384] M. Uddin, S. Mukherjee, H. Chang, and T. V. Lakshman, “SDN-Based Service Automation for IoT,” in IEEE International Conference on Network Protocols (ICNP), 2017.
[385] ——, “SDN-Based Multi-Protocol Edge Switching for IoT Service Automation,” IEEE Journal on Selected Areas in Communications (JSAC), vol. 36, 2018.
[386] S.-Y. Wang, C.-M. Wu, Y.-B. Lin, and C.-C. Huang, “High-Speed Data-Plane Packet Aggregation and Disaggregation by P4 Switches,” Journal of Network and Computer Applications (JNCA), vol. 142, 2019.
[387] A. L. R. Madureira, F. R. C. Araújo, and L. N. Sampaio, “On supporting IoT data aggregation through programmable data planes,” Computer Networks, vol. 177, 2020.
[388] P. Engelhard, A. Zachlod, J. Schulz-Zander, and S. Du, “Toward scalable and virtualized massive wireless sensor networks,” in International Conference on Networked Systems (NetSys), 2019.
[389] J. Vestin, A. Kassler, and J. Åkerberg, “FastReact: In-Network Control and Caching for Industrial Control Networks using Programmable Data Planes,” in IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2018.
[390] F. E. R. Cesen, L. Csikor, C. Recalde, C. E. Rothenberg, and G. Pongrácz, “Towards Low Latency Industrial Robot Control in Programmable Data Planes,” in IEEE Conference on Network Softwarization (NetSoft), 2020.
[391] J. Rüth, R. Glebke, K. Wehrle, V. Causevic, and S. Hirche, “Towards In-Network Industrial Feedback Control,” in Morning Workshop on In-Network Computing, 2018.
[392] P. G. Kannan, R. Joshi, and M. C. Chan, “Precise Time-Synchronization in the Data-Plane using Programmable Switching ASICs,” in ACM Symposium on SDN Research (SOSR), 2019.
[393] R. Kundel, F. Siegmund, and B. Koldehofe, “How to Measure the Speed of Light with Programmable Data Plane Hardware?” in P4 Workshop in Europe (EuroP4), 2019.
[394] G. Bonofiglio, V. Iovinella, G. Lospoto, and G. D. Battista, “Kathará: A Container-Based Framework for Implementing Network Function Virtualization and Software Defined Networks,” in

IEEE/IFIP Network Operations and Management Symposium (NOMS), 2018.
[395] M. He, A. Basta, A. Blenk, N. Deric, and W. Kellerer, “P4NFV: An NFV Architecture with Flexible Data Plane Reconfiguration,” in International Conference on Network and Services Management (CNSM), 2018.
[396] T. Osiński, H. Tarasiuk, M. Kossakowski, and R. Picard, “Offloading Data Plane Functions to the Multi-Tenant Cloud Infrastructure using P4,” in P4 Workshop in Europe (EuroP4), 2019.
[397] D. Moro, G. Verticale, and A. Capone, “A Framework for Network Function Decomposition and Deployment,” in International Workshop on the Design of Reliable Communication Networks (DRCN), 2020.
[398] T. Osiński, H. Tarasiuk, L. Rajewski, and E. Kowalczyk, “DPPx: A P4-based Data Plane Programmability and Exposure framework to enhance NFV services,” in IEEE Conference on Network Softwarization (NetSoft), 2019.
[399] A. Mohammadkhan, S. Panda, S. G. Kulkarni, K. K. Ramakrishnan, and L. N. Bhuyan, “P4NFV: P4 Enabled NFV Systems with SmartNICs,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2019.
[400] D. Moro, M. Peuster, H. Karl, and A. Capone, “FOP4: Function Offloading Prototyping in Heterogeneous and Programmable Network Scenarios,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2019.
[401] ——, “Demonstrating FOP4: A Flexible Platform to Prototype NFV Offloading Scenarios,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2019.
[402] D. R. Mafioletti, C. K. Dominicini, M. Martinello, M. R. N. Ribeiro, and R. d. S. Villaça, “Piaffe: A place-as-you-go in-network framework for flexible embedding of vnfs,” in IEEE International Conference on Communications (ICC), 2020.
[403] X. Chen, D. Zhang, X. Wang, K. Zhu, and H. Zhou, “P4SC: Towards High-Performance Service Function Chain Implementation on the P4-Capable Device,” in IFIP/IEEE Symposium on Integrated Management (IM), 2019.
[404] D. Zhang, X. Chen, Q. Huang, X. Hong, C. Wu, H. Zhou, Y. Yang, H. Liu, and Y. Chen, “P4SC: A High Performance and Flexible Framework for Service Function Chain,” IEEE ACCESS, vol. 7, 2019.
[405] “GitHub: P4SC,” https://github.com/P4SC/p4sc, accessed 01-20-2021.
[406] H. Lee, J. Lee, H. Ko, and S. Pack, “Resource-Efficient Service Function Chaining in Programmable Data Plane,” in P4 Workshop in Europe (EuroP4), 2019.
[407] Y. Zhou, J. Bi, C. Zhang, M. Xu, and J. Wu, “FlexMesh: Flexibly Chaining Network Functions on Programmable Data Planes at Runtime,” in IFIP-TC6 Networking Conference (Networking), 2020.
[408] A. Stockmayer, S. Hinselmann, M. Häberle, and M. Menth, “Service Function Chaining Based on Segment Routing Using P4 and SR-IOV (P4-SFC),” in Workshop on Virtualization in High-Performance Cloud Computing (VHPC), 2020.
[409] “GitHub: P4-SFC,” https://github.com/uni-tue-kn/p4-sfc-faas, accessed 01-20-2021.
[410] R. Ricart-Sanchez, P. Malagon, J. M. Alcaraz-Calero, and Q. Wang, “Hardware-Accelerated Firewall for 5G Mobile Networks,” in IEEE International Conference on Network Protocols (ICNP), 2018.
[411] R. Ricart-Sanchez, P. Malagon, J. M. Alcaraz-Calero, and Q. Wang, “NetFPGA-Based Firewall Solution for 5G Multi-Tenant Architectures,” in IEEE International Conference on Edge Computing (EDGE), 2019.
[412] J. Cao, J. Bi, Y. Zhou, and C. Zhang, “CoFilter: A High-Performance Switch-Assisted Stateful Packet Filter,” in ACM SIGCOMM Conference Posters and Demos, 2018.
[413] R. Datta, S. Choi, A. Chowdhary, and Y. Park, “P4Guard: Designing P4 Based Firewall,” in IEEE Military Communications Conference (MILCOM), 2018.
[414] P. Vörös and A. Kiss, “Security Middleware Programming Using P4,” in International Conference on Human Aspects of Information Security, Privacy, and Trust (HAS), 2016.
[415] E. O. Zaballa, D. Franco, Z. Zhou, and M. S. Berger, “P4Knocking: Offloading host-based firewall functionalities to the network,” in Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN), 2020.
[416] A. Almaini, A. Al-Dubai, I. Romdhani, and M. Schramm, “Delegation of Authentication to the Data Plane in Software-Defined Networks,” in

IEEE International Conferences on Smart Computing, Networking and Services (SmartCNS), 2019.
[417] G. Grigoryan and Y. Liu, “LAMP: Prompt Layer 7 Attack Mitigation with Programmable Data Planes,” in ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2018.
[418] A. Febro, H. Xiao, and J. Spring, “Telephony Denial of Service Defense at Data Plane (TDoSD@DP),” in IEEE/IFIP Network Operations and Management Symposium (NOMS), 2018.
[419] ——, “Distributed SIP DDoS Defense with P4,” in IEEE Wireless Communications and Networking Conference (WCNC), 2019.
[420] M. Kuka, K. Vojanec, J. Kučera, and P. Benáček, “Accelerated DDoS Attacks Mitigation using Programmable Data Plane,” in ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2019.
[421] F. Paolucci, F. Cugini, and P. Castoldi, “P4-based Multi-Layer Traffic Engineering Encompassing Cyber Security,” in Optical Fiber Communication Conference (OFC), 2018.
[422] F. Paolucci, F. Civerchia, A. Sgambelluri, A. Giorgetti, F. Cugini, and P. Castoldi, “An efficient pipeline processing scheme for programming Protocol-independent Packet Processors,” IEEE/OSA Journal of Optical Communications and Networking, vol. 11, 2019.
[423] Y. Mi and A. Wang, “ML-Pushback: Machine Learning Based Pushback Defense Against DDoS,” in ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2019.
[424] Y. Afek, A. Bremler-Barr, and L. Shafir, “Network Anti-Spoofing with SDN Data Plane,” in IEEE International Conference on Computer Communications (INFOCOM), 2017.
[425] A. C. Lapolli, J. A. Marques, and L. P. Gaspary, “Offloading Real-time DDoS Attack Detection to Programmable Data Planes,” in IFIP/IEEE Symposium on Integrated Management (IM), 2019.
[426] “GitHub: ddosd-p4,” https://github.com/aclapolli/ddosd-p4, accessed 01-20-2021.
[427] P. Kuang, Y. Liu, and L. He, “P4DAD: Securing Duplicate Address Detection Using P4,” in IEEE International Conference on Communications (ICC), 2020.
[428] T.-Y. Lin, J.-P. Wu, P.-H. Hung, C.-H. Shao, Y.-T. Wang, Y.-Z. Cai, and M.-H. Tsai, “Mitigating SYN flooding Attack and ARP Spoofing in SDN Data Plane,” in Asia-Pacific Network Operations and Management Symposium (APNOMS), 2020.
[429] F. Musumeci, V. Ionata, F. Paolucci, F. Cugini, and M. Tornatore, “Machine-learning-assisted DDoS attack detection with P4 language,” in IEEE International Conference on Communications (ICC), 2020.
[430] X. Z. Khooi, L. Csikor, D. M. Divakaran, and M. S. Kang, “DIDA: Distributed In-Network Defense Architecture Against Amplified Reflection DDoS Attacks,” in IEEE Conference on Network Softwarization (NetSoft), 2020.
[431] M. Dimolianis, A. Pavlidis, and V. Maglaris, “A Multi-Feature DDoS Detection Schema on P4 Network Hardware,” in Workshop on Flexible Network Data Plane Processing (NETPROC@ICIN), 2020.
[432] D. Scholz, S. Gallenmüller, H. Stubbe, and G. Carle, “SYN Flood Defense in Programmable Data Planes,” in P4 Workshop in Europe (EuroP4), 2020.
[433] “GitHub: syn-proxy,” https://github.com/syn-proxy, accessed 01-20-2021.
[434] K. Friday, E. Kfoury, E. Bou-Harb, and J. Crichigno, “Towards a Unified In-Network DDoS Detection and Mitigation Strategy,” in IEEE Conference on Network Softwarization (NetSoft), 2020.
[435] B. Lewis, M. Broadbent, and N. Race, “P4ID: P4 Enhanced Intrusion Detection,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2019.
[436] G. Kabasele Ndonda and R. Sadre, “A Two-level Intrusion Detection System for Industrial Control System Networks using P4,” in International Symposium for ICS & SCADA Cyber Security Research (ICS-CSR), 2018.
[437] J. Hypolite, J. Sonchack, S. Hershkop, N. Dautenhahn, A. DeHon, and J. M. Smith, “DeepMatch: Practical Deep Packet Inspection in the Data Plane Using Network Processors,” in ACM Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2020.
[438] “GitHub: DeepMatch,” https://github.com/jhypolite/DeepMatch, accessed 01-20-2021.
[439] Q. Qin, K. Poularakis, K. K. Leung, and L. Tassiulas, “Line-Speed and Scalable Intrusion Detection at the Network Edge via Federated Learning,” in IFIP-TC6 Networking Conference (Networking), 2020.
[440] “GitHub: syn-proxy,” https://github.com/vxxx03/IFIPNetworking20, accessed 01-20-2021.
[441] F. Hauser, M. Schmidt, M. Häberle, and M. Menth, “P4-MACsec: Dynamic Topology Monitoring and Data Layer Protection With MACsec in P4-Based SDN,”

IEEE ACCESS , vol. 8, 2020.[442] “GitHub: P4-MACsec,” https://github.com/uni-tue-kn/p4-macsec, ac-cessed 01-20-2021. [443] F. Hauser, M. Häberle, M. Schmidt, and M. Menth, “P4-IPsec: Site-to-Site and Host-to-Site VPN With IPsec in P4-Based SDN,”

IEEEACCESS , vol. 8, 2020.[444] “GitHub: P4-IPsec,” https://github.com/uni-tue-kn/p4-ipsec, accessed01-20-2021.[445] T. Datta, N. Feamster, J. Rexford, and L. Wang, “SPINE: SurveillanceProtection in the Network Elements,” in

USENIX Workshop on Freeand Open Communications on the Internet (FOCI) , 2019.[446] “GitHub: SPINE,” https://github.com/SPINE-P4/spine-code, accessed01-20-2021.[447] Y. Qin, W. Quan, F. Song, L. Zhang, G. Liu, M. Liu, and C. Yu,“Flexible Encryption for Reliable Transmission Based on the P4Programmable Platform,” in

Information Communication TechnologiesConference (ICTC) , 2020.[448] G. Liu, W. Quan, N. Cheng, N. Lu, H. Zhang, and X. Shen,“P4NIS: Improving network immunity against eavesdropping withprogrammable data planes,” in

IEEE Conference on Computer Com-munications Workshops (INFOCOM WKSHPS) , 2020, pp. 91–96.[449] “GitHub: P4NIS,” https://github.com/KB00100100/P4NIS, accessed01-20-2021.[450] M. Liu, D. Gao, G. Liu, J. He, L. Jin, C. Zhou, and F. Yang, “Learningbased adaptive network immune mechanism to defense eavesdroppingattacks,”

IEEE ACCESS , vol. 7, 2019.[451] D. Chang, W. Sun, and Y. Yang, “A SDN Proactive Defense Mech-anism Based on IP Transformation,” in

International Conference onSafety Produce Informatization (IICSPI) , 2019.[452] W. Feng, Z.-L. Zhang, C. Liu, and J. Chen, “Clé: Enhancing Secu-rity with Programmable Dataplane Enabled Hybrid SDN,” in

ACMConference on emerging Networking EXperiments and Technologies(CoNEXT) , 2019.[453] P. Kuang, Y. Liu, and L. He, “P4DAD: Securing Duplicate AddressDetection Using P4,” in

IEEE International Conference on Communications (ICC), 2020.
[454] X. Chen, “Implementing AES encryption on programmable switches via scrambled lookup tables,” in Workshop on Secure Programmable Network Infrastructure (SPIN), 2020.
[455] “GitHub: Tofino AES encryption,” https://github.com/Princeton-Cabernet/p4-projects/tree/master/AES-tofino, accessed 01-20-2021.
[456] H. Gondaliya, G. C. Sankaran, and K. M. Sivalingam, “Comparative Evaluation of IP Address Anti-Spoofing Mechanisms Using a P4/NetFPGA-Based Switch,” in P4 Workshop in Europe (EuroP4), 2020.
[457] Q. Kang, L. Xue, A. Morrison, Y. Tang, A. Chen, and X. Luo, “Programmable In-Network Security for Context-aware BYOD Policies,” in USENIX Security Symposium, 2020.
[458] “GitHub: Poise,” https://github.com/qiaokang92/poise, accessed 01-20-2021.
[459] J. Deng, H. Hu, H. Li, Z. Pan, K. Wang, G. Ahn, J. Bi, and Y. Park, “VNGuard: An NFV/SDN Combination Framework for Provisioning and Managing Virtual Firewalls,” in IEEE Conference on Network Function Virtualization and Software-Defined Networking (NFV-SDN), 2015.
[460] H. Zhang, W. Quan, H.-c. Chao, and C. Qiao, “Smart identifier network: A collaborative architecture for the future internet,” Networks Magazine, vol. 30, no. 3, 2016.
[461] R. Kumar, V. Babu, and D. Nicol, “Network Coding for Critical Infrastructure Networks,” in

IEEE International Conference on Network Protocols (ICNP), 2018.
[462] “GitHub: AquaFlow,” https://github.com/gopchandani/AquaFlow, accessed 01-20-2021.
[463] D. Goncalves, S. Signorello, F. M. V. Ramos, and M. Medard, “Random Linear Network Coding on Programmable Switches,” in ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2019.
[464] T. Kohler, R. Mayer, F. Dürr, M. Maaß, S. Bhowmik, and K. Rothermel, “P4CEP: Towards In-Network Complex Event Processing,” in Morning Workshop on In-Network Computing, 2018.
[465] A. Sapio, I. Abdelaziz, M. Canini, and P. Kalnis, “DAIET: A System for Data Aggregation Inside the Network,” in ACM Symposium on Cloud Computing (SoCC), 2017.
[466] G. C. Sankaran and K. M. Sivalingam, “Design and Analysis of Fast IP Address-Lookup Schemes based on Cooperation among Routers,” in International Conference on COMmunication Systems and NETworks (COMSNETS), 2020.
[467] Y. Zhang, B. Han, Z.-L. Zhang, and V. Gopalakrishnan, “Network-Assisted Raft Consensus Algorithm,” in ACM SIGCOMM Conference Posters and Demos, 2017.
[468] H. T. Dang, M. Canini, F. Pedone, and R. Soulé, “Paxos Made Switch-y,” ACM SIGCOMM Computer Communications Review (CCR), vol. 46, 2016.
[469] H. T. Dang, P. Bressana, H. Wang, K. S. Lee, N. Zilberman, H. Weatherspoon, M. Canini, F. Pedone, and R. Soulé, “P4xos: Consensus as a Network Service,”

IEEE/ACM Transactions on Networking (ToN), vol. 28, 2020.
[470] “GitHub: P4xos,” https://github.com/P4xos/P4xos, accessed 01-20-2021.
[471] E. Sakic, N. Deric, E. Goshi, and W. Kellerer, “P4BFT: Hardware-Accelerated Byzantine-Resilient Network Control Plane,” in IEEE Global Communications Conference (GLOBECOM), 2019.
[472] E. Sakic, N. Deric, C. B. Serna, E. Goshi, and W. Kellerer, “P4BFT: A Demonstration of Hardware-Accelerated BFT in Fault-Tolerant Network Control Plane,” in ACM SIGCOMM Conference Posters and Demos, 2019.
[473] L. Zeno, D. R. K. Ports, J. Nelson, and M. Silberstein, “SwiShmem: Distributed Shared State Abstractions for Programmable Switches,” in ACM Workshop on Hot Topics in Networks (HotNets), 2020.
[474] S. Han, S. Jang, H. Lee, and S. Pack, “Switch-Centric Byzantine Fault Tolerance Mechanism in Distributed Software Defined Networks,” IEEE Communications Letters, vol. 24, 2020.
[475] “GitHub: SC-BFT,” https://github.com/MNC-KOR/SC-BFT, accessed 01-20-2021.
[476] G. Sviridov, M. Bonola, A. Tulumello, P. Giaccone, A. Bianco, and G. Bianchi, “LODGE: LOcal Decisions on Global statEs in Programmable Data Planes,” in IEEE Conference on Network Softwarization (NetSoft), 2018.
[477] G. Sviridov, M. Bonola, A. Tulumello, P. Giaccone, A. Bianco, and G. Bianchi, “LOcAl DEcisions on Replicated States (LOADER) in programmable dataplanes: Programming abstraction and experimental evaluation,”

Computer Networks, vol. 181, 2020.
[478] “GitHub: LOADER,” https://github.com/german-sv/loader, accessed 01-20-2021.
[479] H. Takruri, I. Kettaneh, A. Alquraan, and S. Al-Kiswany, “FLAIR: Accelerating Reads with Consistency-Aware Network Routing,” in USENIX Symposium on Networked Systems Design & Implementation (NSDI), 2020.
[480] S. Luo, H. Yu, and L. Vanbever, “Swing State: Consistent Updates for Stateful and Programmable Data Planes,” in ACM Symposium on SDN Research (SOSR), 2017.
[481] J. Xing, A. Chen, and T. E. Ng, “Secure State Migration in the Data Plane,” in Workshop on Secure Programmable Network Infrastructure (SPIN), 2020.
[482] “GitHub: P4Sync,” https://github.com/jiarong0907/P4Sync, accessed 01-20-2021.
[483] Y. Xue and Z. Zhu, “Hybrid Flow Table Installation: Optimizing Remote Placements of Flow Tables on Servers to Enhance PDP Switches for In-Network Computing,” IEEE Transactions on Network and Service Management (TNSM), 2020.
[484] C. Kuzniar, M. Neves, and I. Haque, “POSTER: Accelerating Encrypted Data Stores Using Programmable Switches,” in IEEE International Conference on Network Protocols (ICNP), 2020.
[485] G. C. Sankaran and K. M. Sivalingam, “Collaborative Packet Header Parsing in NetFPGA-Based High Speed Switches,”

IEEE Networking Letters, vol. 2, 2020.
[486] J. Woodruff, M. Ramanujam, and N. Zilberman, “P4DNS: In-Network DNS,” in P4 Workshop in Europe (EuroP4), 2019.
[487] “GitHub: P4DNS,” https://github.com/cucl-srg/P4DNS, accessed 01-20-2021.
[488] R. Kundel, L. Nobach, J. Blendin, H.-J. Kolbe, G. Schyguda, V. Gurevich, B. Koldehofe, and R. Steinmetz, “P4-BNG: Central Office Network Functions on Programmable Packet Pipelines,” in International Conference on Network and Services Management (CNSM), 2019.
[489] “GitHub: p4se,” https://github.com/opencord/p4se, accessed 01-20-2021.
[490] I. Martinez-Yelmo, J. Alvarez-Horcajo, M. Briso-Montiano, D. Lopez-Pajares, and E. Rojas, “ARP-P4: A Hybrid ARP-Path/P4Runtime Switch,” in IEEE International Conference on Network Protocols (ICNP), 2018.
[491] R. Glebke, J. Krude, I. Kunze, J. Rüth, F. Senger, and K. Wehrle, “Towards Executing Computer Vision Functionality on Programmable Network Devices,” in ACM CoNEXT Workshop on Emerging In-Network Computing Paradigms, 2019.
[492] J. Xie, C. Qian, D. Guo, M. Wang, S. Shi, and H. Chen, “Efficient Indexing Mechanism for Unstructured Data Sharing Systems in Edge Computing,” in

IEEE International Conference on Computer Communications (INFOCOM), 2019.
[493] Y.-S. Lu and K. C.-J. Lin, “Enabling Inference Inside Software Switches,” in Asia-Pacific Network Operations and Management Symposium (APNOMS), 2020.
[494] A. Yazdinejad, R. M. Parizi, A. Dehghantanha, and K.-K. R. Choo, “P4-to-blockchain: A secure blockchain-enabled packet parser for software defined networking,” Computers & Security Journal, vol. 88, 2019.
[495] T. Osiński, H. Tarasiuk, P. Chaignon, and M. Kossakowski, “P4rt-OVS: Programming Protocol-Independent, Runtime Extensions for Open vSwitch with P4,” in IFIP-TC6 Networking Conference (Networking), 2020.
[496] “GitHub: P4rt-OVS,” https://github.com/Orange-OpenSource/p4rt-ovs, accessed 01-20-2021.
[497] S. R. Li, R. W. Yeung, and N. Cai, “Linear Network Coding,” IEEE Transactions on Information Theory, vol. 49, 2003.

Frederik Hauser (Student Member, IEEE) studied computer science at the University of Tuebingen, Germany, and received his Master’s degree. Since then, he has been a researcher at the Chair of Communication Networks at the University of Tuebingen, pursuing his PhD. His main research interests include software defined networking, network function virtualization, and network security.

Marco Haeberle (Student Member, IEEE) studied computer science at the University of Tuebingen, Germany, and received his Master’s degree. Since then, he has been a researcher at the Chair of Communication Networks at the University of Tuebingen, pursuing his PhD. His main research interests include software defined networking, P4, network security, and automated network management.

Daniel Merling is a Ph.D. student at the Chair of Communication Networks of Prof. Dr. habil. Michael Menth at the Eberhard Karls University Tübingen, Germany. There he obtained his master’s degree in 2017 and afterwards became part of the communication networks research group. His areas of expertise include software-defined networking, scalability, routing and resilience issues, and multicast.

Steffen Lindner is a Ph.D. student at the Eberhard Karls University Tübingen, Germany. He wrote his bachelor’s and master’s theses at the Chair of Communication Networks of Prof. Dr. habil. Michael Menth. He started his Ph.D. in September 2019 in the communication networks research group. His research interests include software-defined networking, P4, and congestion management.

Vladimir Gurevich is a Principal Engineer at Intel Corp., where he conducts educational and development activities related to the P4 language, Tofino ASICs, and the data plane APIs. The Intel Connectivity Academy course developed by Vladimir is the most popular educational program for teaching the P4 language and data plane programming and currently has more than 800 graduates. He also leads the Intel Connectivity Research Program, serving more than 150 universities and research organizations working with Intel on next-generation networks.

Florian Zeiger holds a PhD in Computer Science. For more than 10 years he has been doing research and technology transfer on ad-hoc networks, mobile platform remote operations, and IIoT. Since 2011, he has been working in industry as Project Manager, Key Expert, and Senior Research Scientist in national and international R&D projects. He currently works for Siemens and is a certified PMI Project Management Professional and IACRB Certified SCADA Security Architect.

Reinhard Frank is a Senior Industrial Communication & Virtualization Expert (Research Scientist) at Siemens AG. His interests are virtualization in industrial networks and industrial routing and switching research, with a focus on software defined capabilities and zero trust architectures.