Proceedings of the 2021 ACM SIGCOMM 2021 Conference | 2021

Semi-automated protocol disambiguation and code generation

Abstract


For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, Sage, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the author of the protocol specification, Sage can generate protocol code automatically. Using Sage, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after fixing these, Sage is able to automatically generate code that interoperates perfectly with Linux implementations. We show that Sage generalizes to sections of BFD, IGMP, and NTP and identify additional conceptual components that Sage needs to support to generalize to complete, complex protocols like BGP and TCP.

DOI 10.1145/3452296.3472910
Language English
Journal Proceedings of the 2021 ACM SIGCOMM 2021 Conference

Full Text