A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing
Sanxing Chen♦  Aidan San♦  Xiaodong Liu♣  Yangfeng Ji♦
♦ University of Virginia  ♣ Microsoft Research
{sc3hn, aws9xm, yangfeng}@virginia.edu, [email protected]

Abstract
In Text-to-SQL semantic parsing, selecting the correct entities (tables and columns) for the generated SQL query is both crucial and challenging; the parser is required to connect the natural language (NL) question and the SQL query to the structured knowledge in the database. We formulate two linking processes to address this challenge: schema linking, which links explicit NL mentions to the database, and structural linking, which links the entities in the output SQL with their structural relationships in the database schema. Intuitively, the effectiveness of these two linking processes changes based on the entity being generated, so we propose to dynamically choose between them using a gating mechanism. Integrating the proposed method with two graph neural network-based semantic parsers together with BERT representations demonstrates substantial gains in parsing accuracy on the challenging Spider dataset. Analyses show that our proposed method helps to enhance the structure of the model output when generating complicated SQL queries and offers more explainable predictions.
1 Introduction

Semantic parsing, which aims at mapping natural language (NL) utterances to computer-understandable logic forms or programming languages, has been an active research topic in natural language processing (NLP) for decades (Zettlemoyer and Collins, 2005; Liang et al., 2011). Although a variety of logic forms have been studied, Text-to-SQL has attracted particularly broad attention due to the demand for natural language interfaces to databases (NLIDB) (Warren and Pereira, 1982; Zelle and Mooney, 1996; Dong, 2019) for both scientific and industrial reasons. Recently, there has been growing interest in neural Text-to-SQL semantic parsing, thanks to the development of new evaluation paradigms and datasets (Iyer et al., 2017; Zhong et al., 2017; Yu et al., 2018; Yu et al., 2019).

Text-to-SQL parsing requires strict structured prediction because the output SQL is sent directly to an executor program. To enhance the capacity of an auto-regressive model to capture structural information, current state-of-the-art semantic parsers usually adopt a grammar-based decoder (Xiao et al., 2016; Yin and Neubig, 2017; Krishnamurthy et al., 2017). Rather than directly generating tokens in a traditional sequence-to-sequence manner, grammar-based decoders produce a sequence of production rules that construct an abstract syntax tree (AST) of the corresponding SQL. As the grammar constraints narrow the search space to only grammatically valid ASTs, these parsers can usually generate well-formed SQL skeletons (Guo et al., 2019; Bogin et al., 2019a).

However, it is still difficult for current state-of-the-art models to fill in the skeletons with semantically correct entities, especially when they are required to generalize to unseen DB schemas (Yu et al., 2018; Suhr et al., 2020). To predict the correct entity, the model needs a database (DB) schema-grounded understanding of the NL question; that is, it should jointly learn the semantics of the NL question and the structured knowledge in a given database. We formulate two types of entity generation problems, which can be addressed by the following two linking processes respectively.
Q: For each continent, list its id, name, and how many countries it has?

SELECT t1.contid, t1.continent, COUNT(*)
FROM continents AS t1 JOIN countries AS t2 ON t1.contid = t2.continent
GROUP BY t1.contid;

Q1: What is the average, minimum, and maximum age of all singers from France?
Q2: What is the average, minimum, and maximum age for all French singers?

SELECT AVG(age), MIN(age), MAX(age)
FROM singer
WHERE country = 'France';

Q: What is the first name and gender of the all the students who have more than one pet?

SELECT t1.fname, t1.sex
FROM student AS t1 JOIN has_pet AS t2 ON t1.stuid = t2.stuid
GROUP BY t1.stuid
HAVING COUNT(*) > 1;

Table 1: Several examples taken from the Spider dataset. Entity mentions are underlined in the NL questions. Q1 and Q2 are paraphrases of each other and should lead to the same SQL result. A wavy underline ("France" in Q1 and "French" in Q2 in the original) indicates a mention that can only be resolved by linking to a cell value or by common-sense reasoning.
Schema linking.
Schema linking (Guo et al., 2019; Wang et al., 2020) is an instance of entity linking (Shen et al., 2014) in the context of linking to a relational DB schema. Text-to-SQL semantic parsers should learn to recognize an entity mention in the NL question and link it to the corresponding unique entity in the DB schema. This task can be challenging due to the diversity and ambiguity of NL mentions. In practice, however, the solution is often relatively easy when a particular entity is realized with similar wording in both the NL question and the DB schema. As shown in Table 1, in Spider (Yu et al., 2018), the underlined mentions almost exactly match the corresponding schema entities. Therefore, current state-of-the-art parsers normally address this problem with simple string-matching or embedding-matching modules.
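As a concrete illustration, the following is a minimal sketch of such a string-matching module; the function name and feature weights are ours, and the actual parsers combine embedding similarity with richer handcrafted features (see §3).

```python
# A minimal sketch of string-matching-based schema linking: build a
# matrix M of shape (|Q|, |V|) that scores how well each question token
# matches each schema entity name. The feature weights are illustrative.

def schema_linking_matrix(question_tokens, entity_names):
    m = [[0.0] * len(entity_names) for _ in question_tokens]
    for i, tok in enumerate(question_tokens):
        for j, name in enumerate(entity_names):
            words = name.lower().replace("@", " ").replace("_", " ").split()
            if tok.lower() == name.lower():
                m[i][j] = 1.0          # exact match with the full entity name
            elif tok.lower() in words:
                m[i][j] = 0.5          # partial match with one word of the name
    return m

question = ["how", "many", "countries"]
entities = ["continents", "countries", "countries@continent"]
for row in schema_linking_matrix(question, entities):
    print(row)
```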
Structural linking.
While some entities are generated because they are mentioned in the NL question, others are generated because of their role as special functional components in SQL, e.g., contid and stuid in the ON clauses of the first and third examples in Table 1. These entities usually have no corresponding mentions in the NL question but can be induced by the structural constraints of SQL. Such phenomena are generally referred to as the structural mismatch between NL and formal languages (Kwiatkowski et al., 2013; Berant and Liang, 2014). This process occurs frequently when generating complex SQL queries. We propose to treat this entity generation process as finding a structural link between current candidates and past generated entities. Although previous work has considered some simple structural constraints (Guo et al., 2019), to the best of our knowledge, we are the first to formally describe this process.

While schema linking and structural linking can complement each other (e.g., the entity used by the GROUP BY clause needs to be a column in a previously selected table and also has its mention in the NL question), they actually address different types of problems. In most cases, e.g., the examples in Table 1 described before, a decoder may need to discriminate one from the other for better generation performance. In this work, we propose a dynamic gating mechanism that serves as a switch between the two linking processes. Schema linking can be implemented as the decoder attending to the encoded representations of the NL question to find an entity mention, then locating the corresponding entity in the schema. In structural linking, on the other hand, our decoder performs self-attention over those past decoder states where an entity was generated and further decides between copying one of the previously generated entities or taking one of its linked entities (e.g., the foreign key of a previously joined table) based on the schema structure. From the model's perspective, it can also be viewed as a memory pointer network that enhances the structured prediction ability of an auto-regressive model, with the dynamic gating determining when to emphasize this enhancement. Our proposed method can easily be applied to most semantic parsers as long as their decoders explicitly or implicitly have two modules that deal with schema linking and structural linking.

We integrate the dynamic gating technique into two state-of-the-art Text-to-SQL parsing models (Bogin et al., 2019a; Bogin et al., 2019b) and further augment them with pretrained BERT (Devlin et al., 2019) word representations. We evaluate our model on the Spider dataset, which is challenging because of its cross-domain setting, where a model needs to generalize not only to complex SQL but also to unseen DBs. Experimental results show that our proposed method consistently yields an improvement of more than 3% in exact set matching accuracy and sees the most benefits when generating complex SQL. Further analysis confirms that the models dynamically switch between the two linking processes.

Figure 1: An illustration of our proposed method when running the first example shown in Table 1. The NL encoder embeds the question, and the grammar decoder emits a (simplified) production-rule sequence while consulting the schema graph (table nodes, column nodes, table-column edges, and primary-foreign key edges) through the gating mechanism:

query -> ["select", selections, from, group_by]
selections -> [selection, ",", selections]
selection -> [column]
column -> ["continents@contid"]
selections -> [selection, ",", selection]
selection -> [column]
column -> ["continents@continent"]
...
from -> ["from", table, join]
table -> ["continents"]
join -> ["join", table, "on", join_condition]
table -> ["countries"]
join_condition -> [column, "=", column]
column -> ["continents@contid"]
column -> ["countries@continent"]
group_by -> ["group", "by", column]
column -> ["countries@continent"]

The figure shows two independent entity generation procedures: the top one favors schema linking while the bottom one favors structural linking. Some details are omitted for the sake of simplicity. Grammars are simplified to fit in the limited space; readers are encouraged to refer to Krishnamurthy et al. (2017) for details.
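The rule sequence in Figure 1 can be made concrete. Below is a minimal sketch of how such a sequence of production rules deterministically expands into SQL tokens in depth-first order; the toy grammar and helper names are ours, not those of the cited parsers.

```python
# A minimal sketch of grammar-based decoding: a sequence of production
# rules, consumed in order, expands the leftmost non-terminal depth-first
# into a flat SQL token sequence. The toy grammar below is illustrative.

GRAMMAR_NONTERMINALS = {"query", "column", "table"}

def expand(rule_sequence, start="query"):
    rules = iter(rule_sequence)

    def expand_symbol(symbol):
        if symbol not in GRAMMAR_NONTERMINALS:   # terminal token
            return [symbol]
        lhs, rhs = next(rules)                   # the decoder's next action
        assert lhs == symbol, "each rule must expand the leftmost non-terminal"
        tokens = []
        for child in rhs:
            tokens.extend(expand_symbol(child))
        return tokens

    return expand_symbol(start)

# Actions a grammar decoder might emit for a simple query.
actions = [
    ("query", ["SELECT", "column", "FROM", "table"]),
    ("column", ["age"]),
    ("table", ["singer"]),
]
print(" ".join(expand(actions)))  # SELECT age FROM singer
```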
2 Method

In this section, we first formulate the Text-to-SQL semantic parsing task in §2.1 and then describe the details of our proposed method in §2.2.
2.1 Task Formulation

The task of Text-to-SQL semantic parsing is to predict a SQL query $S$ based on input $(Q, G)$, where $Q = \{q_1, \dots, q_{|Q|}\}$ is the NL question and $G = (V, E)$ is the DB schema being queried. In the schema, $V = \{(e_1, t_1), \dots, (e_{|V|}, t_{|V|})\}$ is a set that usually contains two types of entities (i.e., tables and columns) and their textual descriptions (i.e., table names and column names),¹ while $E = \{(e^{(s)}_1, e^{(t)}_1, l_1), \dots, (e^{(s)}_{|E|}, e^{(t)}_{|E|}, l_{|E|})\}$ contains the relations $l$ between source entities $e^{(s)}$ and target entities $e^{(t)}$, e.g., table-column relationships, foreign-primary key relationships,² etc. The output $S = \{a_1, \dots, a_{|S|}\}$ is a sequence of decoder actions that composes an AST of the SQL.

Typical state-of-the-art Text-to-SQL parsers consist of three components: an NL encoder, a schema encoder, and a grammar decoder (Guo et al., 2019; Bogin et al., 2019a).

The NL encoder takes the NL question tokens $Q$ as input, maps them to word embeddings $E_Q$, then feeds them to a Bi-LSTM (Hochreiter and Schmidhuber, 1997). The hidden states of the Bi-LSTM serve as the contextual word representation of each token.

The schema encoder takes $G$ as input and builds a relation-aware entity representation for every entity in the schema. The initial representation of an entity is a combination of its word embeddings and type information. Then self-attention (Zhang et al., 2019; Shaw et al., 2019) or graph-based models (Bogin et al., 2019a; Wang et al., 2020) are utilized to exploit the relational information between each pair of entities in the DB schema, thus producing the final representation of all entities $H_V \in \mathbb{R}^{|V| \times dim}$. We detail this in §3.

¹ Note that columns may have more fine-grained types like binary, numeric, string, date/time, primary/foreign, etc.
² In SQL, a foreign key in one table is used to refer to a primary key in another table, linking the two tables together for joint queries.
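For concreteness, here is a minimal sketch (the class and relation label names are ours) of these input structures, instantiated with the schema of the first example in Table 1.

```python
# A minimal sketch of the task inputs: the NL question Q and the schema
# graph G = (V, E), instantiated with Table 1's first example.

from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str    # textual description, e.g., a table or column name
    etype: str   # coarse type: "table" or "column"

Q = "For each continent , list its id , name , and how many countries it has ?".split()

continents = Entity("continents", "table")
countries = Entity("countries", "table")
contid = Entity("continents@contid", "column")
continent = Entity("countries@continent", "column")

V = [continents, countries, contid, continent]

# Relations as (source entity, target entity, label) triples.
E = [
    (continents, contid, "table-column"),
    (countries, continent, "table-column"),
    (contid, continent, "primary-foreign-key"),
]
```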
Finally, a grammar decoder (Xiao et al., 2016; Yin and Neubig, 2017; Krishnamurthy et al., 2017) generates an AST of the output SQL in depth-first order. The decoder is typically an auto-regressive model (e.g., an LSTM) that estimates the probability of generating an action sequence. There are two cases of actions: depending on the specific case, an action is either (i) producing a new production rule to unfold the leftmost non-terminal node in the AST, or (ii) generating an entity (e.g., a table or a column) from the DB schema if it is required by the last output production rule. In the former case, at step $t$, the decoder normally uses its hidden state $h_t$ to retrieve a context vector $c_t$ from the NL encoder; an action embedding $a_t$ is then produced based on the concatenation of $h_t$ and $c_t$, and directly predicts a production rule from the target vocabulary, which is a subset of a fixed number of production rules. In the latter case, the decoder needs to estimate a probability distribution over a schema-specific vocabulary under grammatical constraints, which come from the structure of both the output SQL and the DB schema, as well as semantic constraints implied in the NL question.

2.2 Dynamic Gating between Two Linkings

Figure 2: An illustration of our gating mechanism. The link gate chooses between schema linking (attention over the NL encoder outputs) and structural linking; within structural linking, the copy gate further chooses between copying an entity and linking to an entity.

In this paper, we focus on the decision made when the decoder is looking for an entity to fill in a slot (i.e., case (ii) above). Our decoder predicts the entity based on a mixed probability model consisting of two processes:
• Schema linking. The decoder attends to the output of the NL encoder (which can be seen as selecting the most relevant NL mention), then finds the corresponding entity based on string-matching or embedding-matching results.
• Structural linking. The decoder self-attends to the output states of those previous decoding steps that generated entities, then finds another entity that is structurally linked to the attended entity.

The choice between them is controlled by a gating mechanism called the link gate. Formally, the marginal probability of generating an entity $e$ is defined as follows:

$$\Pr(a_t = e) = \Pr(e \mid \text{SCHM}) \Pr(\text{SCHM}) + \Pr(e \mid \text{STRCT}) \Pr(\text{STRCT}) \quad (1)$$

where $\Pr(\text{SCHM})$ and $\Pr(\text{STRCT})$ are the probabilities of choosing schema linking and structural linking, respectively. They are further computed as:

$$\rho_{\text{link}} = \text{Sigmoid}(\text{FF}(a_t)) \quad (2)$$
$$\Pr(\text{SCHM}) = \rho_{\text{link}} \quad (3)$$
$$\Pr(\text{STRCT}) = 1 - \rho_{\text{link}} \quad (4)$$

Equation 2 stands for our proposed link gate, which is computed from the action embedding $a_t$. The reason for basing the gate value purely on $a_t$ is that, intuitively, the choice between the two processes is about the role of the current entity we want to generate. The role of an entity is determined by the SQL clause that contains it; since $a_t$ is directly used to predict a production rule in case (i), it should be able to capture this information. The link gate allows the decoder to dynamically choose between information from our two linking processes, and prevents them from interfering with each other.

In practice, we model the probability of the schema linking process generating an entity mentioned in the NL question, namely $\Pr(e \mid \text{SCHM})$, as a multiplication of the attention weights $\lambda \in \mathbb{R}^{|Q|}$ over the NL encoder outputs and a schema linking matrix $M \in \mathbb{R}^{|Q| \times |V|}$. The probability of the structural linking process, namely $\Pr(e \mid \text{STRCT})$, is similarly computed by multiplying decoder self-attention weights with a structural linking matrix $T \in \mathbb{R}^{|V| \times |V|}$.

The structural linking matrix $T$ captures the relationship between every pair of entities given the relational DB schema. Common structural links include relations between a table and its columns, a table and its primary/foreign keys, a primary key and one of its linked foreign keys in other tables, etc. There are also multi-step links, which are combinations of the one-step links listed above. Note that there may not be a unique link between every pair of entities, and some entities may not have a link between them at all. Meanwhile, an entity can link to itself, which can be considered a special zero-step structural link; it is the only structural link modeled in most current Text-to-SQL semantic parsers.

We compute the structural linking score between entities $e_i$ and $e_j$ by an additive attention mechanism (Bahdanau et al., 2015), as follows:

$$T_{i,j} = v_\alpha^\top \tanh(W_\alpha [e_i; e_j]) \quad (5)$$

where $e_i$ and $e_j$ are the corresponding entity representations retrieved from $H_V$, and $v_\alpha$ and $W_\alpha$ are trainable parameters. Compared to the dot-product attention used in Bogin et al. (2019a) and Bogin et al. (2019b), the additive attention we use here is expected to capture more of the structural relationship between entity pairs rather than only the similarity of entity representations.

$T$ is expected to capture all types of relationships between entities, but it can be overwhelmed by this large workload, so we single out the zero-step relationship (i.e., copying) and address it by another structural linking matrix $T_{\text{copy}} = I$, which is trivially an identity matrix in this case. To choose between copying and other types of links, a copy gate $\rho_{\text{copy}}$ is obtained in the same manner as the link gate in Equation 2.

We use decoder self-attention to find the past generated entity that could place structural constraints on the entity we currently want to generate:

$$\beta = \text{Softmax}(\text{Attention}(a_t, H_m)) \quad (6)$$
$$H_m = \{a_i \mid i < t,\ a_i \in e\} \quad (7)$$

$H_m$ is a memory matrix consisting of the action embeddings from every past decoding step that generated an entity. In this way, we compute two sets of attention weights, $\beta_{\text{copy}}$ and $\beta_{\text{link}}$, for copying and linking separately, again using additive attention (Bahdanau et al., 2015). The motivation for separate attention weights is that these two linking patterns might need to attend to different generated entities. Overall, the probability of generating an entity via structural linking is modeled as:

$$\Pr(a_t = e_i \mid \text{STRCT}) = \rho_{\text{copy}} (\beta_{\text{copy}} T_{\text{copy}})_i + (1 - \rho_{\text{copy}})(\beta_{\text{link}} T)_i \quad (8)$$

Finally, this probability is mixed with the probability of schema linking, controlled by the link gate.
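Putting Equations 1–8 together, the following is a minimal PyTorch sketch of the gated entity distribution. It is a simplification under our own assumptions: a single dot-product attention replaces the additive attention of Equations 5–6, one shared β replaces the separate β_copy and β_link, and M and T are precomputed inputs.

```python
# A minimal PyTorch sketch of the gated entity distribution (Eqs. 1-8).
# Shapes: a_t (d,), question encodings H_q (|Q|, d), memory H_m (k, d),
# schema linking matrix M (|Q|, |V|), structural matrix T (|V|, |V|),
# mem_ids (k,) = schema indices of the k previously generated entities.
# Simplifications (ours): dot-product attention, one shared beta,
# and row-stochastic M and T given as inputs.

import torch
import torch.nn.functional as F

def entity_distribution(a_t, H_q, H_m, mem_ids, M, T, link_gate, copy_gate):
    rho_link = torch.sigmoid(link_gate(a_t))      # Eq. 2, scalar gate in (0, 1)
    rho_copy = torch.sigmoid(copy_gate(a_t))      # copy gate, same construction

    lam = F.softmax(H_q @ a_t, dim=0)             # attention over question tokens
    p_schema = lam @ M                            # Pr(e | SCHM), shape (|V|,)

    beta = F.softmax(H_m @ a_t, dim=0)            # attention over memory, Eq. 6
    link_rows = T[mem_ids]                        # links from past entities, (k, |V|)
    copy_rows = torch.eye(T.shape[0])[mem_ids]    # T_copy = I restricted to memory
    p_struct = rho_copy * (beta @ copy_rows) \
        + (1 - rho_copy) * (beta @ link_rows)     # Eq. 8

    return rho_link * p_schema + (1 - rho_link) * p_struct   # Eq. 1

d, nq, nv, k = 8, 5, 4, 2
link_gate = torch.nn.Linear(d, 1)
copy_gate = torch.nn.Linear(d, 1)
p = entity_distribution(torch.randn(d), torch.randn(nq, d), torch.randn(k, d),
                        torch.tensor([0, 2]), torch.rand(nq, nv).softmax(-1),
                        torch.rand(nv, nv).softmax(-1), link_gate, copy_gate)
print(p.shape)  # torch.Size([4]); a distribution over the |V| entities
```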
3 Model

In this section, we describe how we integrate our proposed method into a grammar decoder and leverage the entity representation from a GNN module.

We use the type-constrained grammar decoder from Krishnamurthy et al. (2017). To predict $a_t$ at time step $t$, the decoder first obtains the context vector $c_t$ from the NL encoder by performing dot-product attention (Luong et al., 2015). Then the action embedding is generated by a feed-forward network taking the concatenation of the decoder hidden state and the context vector as input:

$$a_t = \text{FF}([h_t; c_t]) \quad (9)$$

$a_t$ is used to predict the production rule or to estimate the gate values in the entity generation process.

We adopt the idea from Bogin et al. (2019a) and Bogin et al. (2019b) to learn a schema relation-aware entity representation $H_V$ with a GNN module. The initial embedding of each entity $h^{(0)}_e$ is defined as a non-linear transformation of the combination of its type embedding and the average of the word embeddings of its neighbors in the schema graph. In later time steps, the hidden state is updated by a gated recurrent unit (Cho et al., 2014; Li et al., 2016) as $h^{(l)}_e = \text{GRU}(h^{(l-1)}_e, x^{(l)}_e)$, where the input $x^{(l)}_e$ is defined as a weighted summation over the hidden states of its neighbor entities:

$$x^{(l)}_e = \sum_{t \in \{\leftarrow, \rightarrow, \leftrightarrow\}} \; \sum_{(s, e, l) \in E,\ l = t} W_t h^{(l-1)}_s + b_t$$

Three edge types are considered: bidirectional edges between a table and its contained columns ($\leftrightarrow$), unidirectional edges between a foreign key and a connected primary key ($\leftarrow$), and their reverse version ($\rightarrow$). Given a fixed number of GNN recurrence steps $L$, the final hidden states of all the entities in the graph serve as the entity representation $H_V = \{h^{(L)}_e \mid (e, t) \in V\}$. We also adopt their schema linking module to create the schema linking matrix $M$ based on word embedding similarity and some simple manually designed features (e.g., edit distance and lemmas).
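The following is a minimal PyTorch sketch (ours) of this edge-typed message passing: one linear transformation per edge type feeds a GRU cell that updates each entity state. Initialization details and the attention weighting of the cited models are omitted.

```python
# A minimal PyTorch sketch of the schema GNN update:
# x_e^(l) = sum over edge types t and edges (s, e) of type t of
#           W_t h_s^(l-1) + b_t, then h_e^(l) = GRU(h_e^(l-1), x_e^(l)).
# Edge type names ("tc", "fp", "pf") are our shorthand for the paper's
# table<->column, foreign->primary, and reverse edges.

import torch
import torch.nn as nn

class SchemaGNN(nn.Module):
    def __init__(self, dim, edge_types=("tc", "fp", "pf"), steps=2):
        super().__init__()
        self.W = nn.ModuleDict({t: nn.Linear(dim, dim) for t in edge_types})
        self.gru = nn.GRUCell(dim, dim)
        self.steps = steps

    def forward(self, h, edges):
        # h: (|V|, dim) initial entity embeddings
        # edges: list of (source_index, target_index, edge_type)
        for _ in range(self.steps):
            x = torch.zeros_like(h)
            for s, e, t in edges:
                x[e] = x[e] + self.W[t](h[s])   # typed message from a neighbor
            h = self.gru(x, h)                   # gated state update per entity
        return h                                 # H_V: final entity states

gnn = SchemaGNN(dim=16)
h0 = torch.randn(4, 16)  # continents, countries, contid, continent
edges = [(0, 2, "tc"), (2, 0, "tc"), (1, 3, "tc"), (3, 1, "tc"),
         (3, 2, "fp"), (2, 3, "pf")]
print(gnn(h0, edges).shape)  # torch.Size([4, 16])
```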
In their GlobalGNN (Bogin et al., 2019b), an additional GNN and an auxiliary training loss are added to filter out irrelevant nodes in the graph, thus producing a better entity representation.

To augment our model with pretrained BERT embeddings, we follow Hwang et al. (2019) and Zhang et al. (2019): we feed the concatenation of the NL question and the textual descriptions of the DB entities to BERT, and use the top-layer hidden states of BERT as the input embeddings.
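A minimal sketch of this input scheme with HuggingFace's Transformers; the exact separator layout between the question and the entity descriptions is our assumption, not necessarily the one used in the paper.

```python
# A minimal sketch of BERT-based input embeddings: concatenate the NL
# question with the textual descriptions of DB entities and take the
# top-layer hidden states as input representations.

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

question = "For each continent, list its id, name, and how many countries it has?"
entities = ["continents", "countries", "continent id", "country name"]

# Illustrative separator scheme: question [SEP] entity [SEP] entity ...
text = question + " [SEP] " + " [SEP] ".join(entities)
inputs = tokenizer(text, return_tensors="pt")
hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
print(hidden.shape)
```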
4 Experiments

We evaluate the effectiveness of our proposed method by integrating it into two state-of-the-art semantic parsers on the Spider dataset, and further ablate components to understand their contributions.
4.1 Experimental Setup

We implement our model using PyTorch (Paszke et al., 2019) and AllenNLP (Gardner et al., 2018). For the GNN and GlobalGNN models, we revise and build upon the code released by Bogin et al. (2019a) and Bogin et al. (2019b);³ we re-ran their experiments, report results on our re-implementation, and found that it slightly improves upon their reported results. In the BERT experiments, we use the base uncased BERT model with hidden size 768, provided by HuggingFace's Transformers library (Wolf et al., 2019). We follow the database split setting of Spider, where any databases that appear at testing time are ensured to be unseen at training time. Our code and models are available at https://github.com/sanxing-chen/linking-tale.

³ We choose these models because of the ability of GNNs to model various types of structural links, and because they are among the few state-of-the-art models that were publicly available at the time of writing.

4.2 Results

The experimental results in Table 3 show that our proposed gating mechanism leads to a substantial improvement over the GNN, GlobalGNN, and BERT baselines.
Spider questions are divided into different levels of difficulty (hardness). Most of the improvements come from gains on complicated (i.e., Medium, Hard, and Extra Hard) SQL generation. In particular, we observe gains of up to 13.8% absolute on the Hard set (32.8% to 46.6%) when applying our method to the GlobalGNN baseline. One major contribution comes from an increase in the partial matching F1 score of IUEN (i.e., the SQL clauses INTERSECT, UNION, EXCEPT, and NESTED, which only appear in the Hard and Extra Hard levels). We also notice that the well-formedness of the SQL output is improved: for instance, before applying our method the decoder would occasionally select the same column twice in an ON clause,⁴ and after applying our dynamic gating this issue is virtually eliminated.⁵

⁴ This is different from the case of an intentional self-join.
⁵ Although this issue could be resolved by engineering more grammar rules, we leave it as an indicator of the improvement in the well-formedness of the output SQL.

Table 2: Number of examples in the development set of Spider at each hardness level of the SQL to be generated.

Model          | Acc.  | Easy  | Medium | Hard  | Extra
GNN            | 47.7% | 68.8% | 51.8%  | 31.2% | 22.9%
+ Ours         | 50.7% | 66.4% | 54.8%  | 42.8% | 25.3%
GlobalGNN      | 49.3% | 69.2% | 53.0%  | 32.8% | 27.6%
+ Ours         | 52.8% | 70.4% | 55.7%  | 46.6% | 25.9%
+ BERT         | 53.5% | 76.0% | 57.3%  | 36.2% | 28.3%
+ BERT + Ours  | 57.6% | 73.6% | 61.6%  | 48.9% | 32.9%

Table 3: Exact set matching accuracy on SQL queries at different hardness levels in the development set of Spider. The greatest improvements are at the Hard level; the small fluctuation at the Easy level is due to gate bias.
Figure 3: Value distributions of the link gate and copy gate, measured on the dev set of Spider using the GlobalGNN model. Values of the copy gate are considered only when the corresponding link gate value is small.

As shown in Figure 3, the values of both gates are polarized toward 0 or 1, making the gating mechanism act as a binary gate. These statistics coincide with our hypothesis that most entity generation decisions in Text-to-SQL can be made solely on evidence from either schema linking or structural linking. In addition, among all the cases where structural linking is chosen and the copy gate takes control, fewer than 20% of cases favor copying. This suggests that there are many circumstances in which other kinds of structural linking are adopted.
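The measurement behind Figure 3 can be sketched as follows. This is our reconstruction: the 0.1 polarization margin and the 0.5 thresholds are our assumptions, since the original cutoff for a "small" link gate value was not preserved.

```python
# A minimal sketch of the Figure 3 analysis: given per-decoding-step
# (rho_link, rho_copy) pairs collected on the dev set, measure how
# polarized the link gate is and how often copying is chosen. The
# thresholds are our assumptions, not values from the paper.

def gate_statistics(gate_values, eps=0.1, threshold=0.5):
    link = [l for l, _ in gate_values]
    # The copy gate is only meaningful when structural linking is chosen.
    copy = [c for l, c in gate_values if l < threshold]
    polarized = sum(1 for v in link if v < eps or v > 1 - eps) / len(link)
    copy_rate = sum(1 for v in copy if v > threshold) / max(len(copy), 1)
    return polarized, copy_rate

values = [(0.98, 0.2), (0.03, 0.9), (0.01, 0.1), (0.99, 0.5), (0.02, 0.05)]
polarized, copy_rate = gate_statistics(values)
print(f"link gate polarized: {polarized:.0%}, copying chosen: {copy_rate:.0%}")
```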
4.3 Ablation Study

We also conduct several experiments to examine design choices in our proposed method.

Sharing Action Embedding. In the design of our gating mechanism, one critical decision is to use the action embedding $a_t$ to perform decoder self-attention and to produce the gating values. This is based on our intuition that the action embedding captures the structural information of the output SQL at the current position. To verify this decision, we conduct an ablation experiment using a dedicated embedding to produce the gating value. This dedicated embedding is produced in exactly the same way as $a_t$ in Equation 9, but with a different set of parameters for the feed-forward network. As shown in Table 4 ("dedicated embed"), sharing the parameters with the action embedding is important.
Keeping entities. Guo et al. (2019) also use a memory-augmented pointer network to perform a copy mechanism that assists column selection. In contrast with our memory matrix of action embeddings (Equation 7), their memory matrix consists of the entity embeddings of the columns that have been selected previously. They further remove columns from the candidates in the schema linking process once they are generated, to prevent the decoder from repeatedly generating the same columns. To determine whether our dynamic gating and structural linking module add enough structural constraints to the decoding process to resolve this kind of problem, we conduct an experiment where we likewise remove entities from the schema linking process after they are generated (i.e., "removing entity" in Table 4). Our results show that this change can actually hurt the model. Specifically, we observe a drop in accuracy on the WHERE and

NL: Show the stadium name and the number of concerts in each stadium.
SQL: SELECT stadium.name (ρ_l = N/A, ρ_c = N/A), COUNT(*)
FROM concert (ρ_l = 0., ρ_c = 0.)
JOIN stadium (ρ_l = 0., ρ_c = 0.)
ON concert.stadium_id (ρ_l = 0., ρ_c = 0.) = stadium.stadium_id (ρ_l = 0., ρ_c = 0.)
GROUP BY concert.stadium_id (ρ_l = 0., ρ_c = 0.);

NL: Which city has most number of departing flights?
SQL: SELECT airports.city (ρ_l = N/A, ρ_c = N/A)
FROM airports (ρ_l = 0., ρ_c = 0.)
JOIN flights (ρ_l = 0., ρ_c = 0.)
ON flights.airportcode (ρ_l = 0., ρ_c = 0.) = flights.sourceairport (ρ_l = 0., ρ_c = 0.)
GROUP BY airports.city (ρ_l = 0., ρ_c = 1.)
ORDER BY COUNT(*) DESC LIMIT 1;

Table 5: Sample predictions of our model. ρ_link and ρ_copy are abbreviated as ρ_l and ρ_c, respectively. Gate values are not applicable (N/A) for the first entity, since it has no previously generated entity. Some of the results reflect gate bias; see text for details.
IUEN clauses, which suggests that, in our context, the information about a specific entity in the schema linking process is still useful even after the entity has been generated once.

Model                  | GlobalGNN | GlobalBERT
Base                   | 49.3      | 53.5
Ours: dedicated embed  | 50.0      | 55.4
Ours: removing entity  | 49.4      | 57.1
Ours: without copy     | 50.9      | 55.8

Table 4: Alternative approaches and ablation results (dev accuracy, %).
Copy Gate.
In addition, removing the copy gate and copy mechanism also harms the performance of our model (i.e., "without copy" in Table 4). This result confirms that it is beneficial to handle different types of structural links separately. We hypothesize that different types of structural links conflict with each other, so they are hard to fit into one structural linking matrix. Overall, these two results further support our claim that the copy gate can determine by itself when to copy or link to an entity.
5 Discussion

Error analysis reveals that our gating mechanism is biased: e.g., for the first few entities selected in a SQL query, the link gate is trained to favor schema linking in most cases. Such bias can sometimes be wrong; in cases where structural linking is needed but absent, the model may select duplicate columns or the wrong table during decoding. Similarly, the copy gate can be biased toward copying an entity from memory in GROUP BY clauses. Out of all the SQL clause components, only the GROUP BY clause's partial matching F1 score drops due to this copy gate bias. It is true that the entity needed in the GROUP BY clause is usually one that was previously selected,⁶ but the information from schema linking can still be beneficial; e.g., in the second example of Table 5, the model wants to copy the wrong entity, but the link gate rectifies the prediction with schema linking information. Our gating mechanism relies only on the current action embedding (which can be seen as short-term structural information) to determine the gate values. We believe that introducing more global structural constraints is a promising direction toward a more flexible and accurate gating mechanism.

⁶ The semantics of some pronouns in the NL question (e.g., "each" and "which" in the examples of Table 5) match the GROUP BY clause, but this could be a dataset bias.

Short attention spans. In our experiments, we notice that the action history the model attends to is usually very short: in 99% of cases, the model only utilizes the output memory of the three most recently generated entities. This coincides with similar findings in language modeling (Daniluk et al., 2017), where the augmented memory was expected to facilitate the modeling of long-range dependencies but failed to do so. Although long-term context is important for language modeling, it is not as important in our Text-to-SQL scenario, since most dependencies in programming languages like SQL lie within a short span. In the future, we are interested in exploring semantic parsing tasks that require long-term structural constraints using our method.
Structural linking patterns.
We have shown the effectiveness of our method in handling different types of structural links separately using different components of the model. So far, the only special type of structural link we explicitly model is the copy mechanism; we treat all other types of links uniformly using the additive attention in Equation 5. This might limit the model's ability to take advantage of the complicated relationships between entities. Currently, the entity representation provided by the GNN model is still difficult to explain, because the node representations contain a mix of information from different message-passing steps. One could imagine that training GNNs with different message-passing steps, each modeling a different level of structural linking, could lead to clearer and more expressive linking patterns.
6 Related Work

Semantic parsing.
Semantic parsing research focuses on mapping NL to formal languages like lambda calculus (Zettlemoyer and Collins, 2005; Kwiatkowksi et al., 2010; Liang et al., 2011; Dong and Lapata, 2016), Prolog-style queries (Zelle and Mooney, 1996; Tang and Mooney, 2000), and more recently SQL (Warren and Pereira, 1982; Popescu et al., 2003; Giordani and Moschitti, 2009; Zhong et al., 2017; Iyer et al., 2017). It also tackles the problem of parsing NL descriptions into complicated general-purpose programming languages such as Python (Ling et al., 2016; Rabinovich et al., 2017; Yin and Neubig, 2017). Our proposed method is tested on Text-to-SQL parsing and can be adapted to other semantic parsing applications.
Structural mismatch.
Programming languages like SQL express the same intent in a completely different way from NL by design (Kate, 2008). This phenomenon, called structural mismatch, widely exists between NL and various programming languages and is a major challenge in semantic parsing (Dong, 2019). To alleviate the structural mismatch problem, early approaches rely on linguistic formalisms, such as parsing results from flexible CCGs (Zettlemoyer and Collins, 2005; Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2011; Kwiatkowski et al., 2013). Chen et al. (2016) proposed to use sentence rewriting to revise the NL question into a new question that has the same structure as the targeted logical form. Recently, Guo et al. (2019) proposed to first translate the NL question to an intermediate representation (IR) designed to bridge NL and SQL, then use a deterministic algorithm to convert the IR to SQL. In addition to taking a considerable amount of engineering effort, their designed IR is still unable to cover some SQL grammar, such as the self-join in the ON clause, and is more challenging to apply to other programming languages. We deal with this problem by explicitly modeling the prediction structure against an external predefined structure (i.e., the DB schema) via structural linking.

Memory pointer network.
Memory networks were first introduced in the context of the question answering task, where they served as a differentiable long-term knowledge base to enhance an auto-regressive model's poor memory (Weston et al., 2015; Sukhbaatar et al., 2015). Copy mechanisms use attention as a pointer to select and copy items from the source text, thus addressing the problem of a variable output vocabulary size (Vinyals et al., 2015; See et al., 2017). Recent research has applied memory-augmented pointer networks to various NLP tasks, including task-oriented dialogue (Wu et al., 2019) and semantic parsing (Liang et al., 2017; Guo et al., 2019). Our dynamic gating mechanism can also be seen as a memory controller, except that our memory is read-only and acts as both the query and the key in a pointer network. Unlike most current techniques, our pointer network does not only perform copying but can also point to a starting point for structural linking.
7 Conclusion

In this paper, we formulated the entity generation process in Text-to-SQL semantic parsing as two kinds of linking problems, namely schema linking and structural linking, and proposed a dynamic gating mechanism to explicitly model the decision between these two linking processes. Experimental results show the effectiveness of our proposed method and confirm our intuitions. In the future, we would like to apply our proposed method to other semantic parsing tasks, such as general-purpose code generation, where structural constraints may be more important.
Acknowledgments
We thank Yu Bai and members of the UVA NLP group for valuable discussion and feedback. We also thank all anonymous reviewers for their helpful comments and suggestions.
References
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1415–1425, Baltimore, Maryland, June. Association for Computational Linguistics.

Ben Bogin, Jonathan Berant, and Matt Gardner. 2019a. Representing schema structure with graph neural networks for text-to-SQL parsing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4560–4565, Florence, Italy, July. Association for Computational Linguistics.

Ben Bogin, Matt Gardner, and Jonathan Berant. 2019b. Global reasoning over database structures for text-to-SQL parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3659–3664, Hong Kong, China, November. Association for Computational Linguistics.

Bo Chen, Le Sun, Xianpei Han, and Bo An. 2016. Sentence rewriting for semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 766–777, Berlin, Germany, August. Association for Computational Linguistics.

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, October. Association for Computational Linguistics.

Michał Daniluk, Tim Rocktäschel, Johannes Welbl, and Sebastian Riedel. 2017. Frustratingly short attention spans in neural language modeling. In International Conference on Learning Representations.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June. Association for Computational Linguistics.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33–43, Berlin, Germany, August. Association for Computational Linguistics.

Li Dong. 2019. Learning Natural Language Interfaces with Neural Models. PhD thesis, the University of Edinburgh.

Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. AllenNLP: A deep semantic natural language processing platform. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), pages 1–6, Melbourne, Australia, July. Association for Computational Linguistics.

Alessandra Giordani and Alessandro Moschitti. 2009. Semantic mapping between natural language questions and SQL queries via syntactic pairing. In International Conference on Application of Natural Language to Information Systems, pages 207–221. Springer.

Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535, Florence, Italy, July. Association for Computational Linguistics.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Wonseok Hwang, Jinyeong Yim, Seunghyun Park, and Minjoon Seo. 2019. A comprehensive exploration on WikiSQL with table-aware word contextualization. In Proceedings of the Second KR2ML Workshop at NeurIPS 2019.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 963–973, Vancouver, Canada, July. Association for Computational Linguistics.

Rohit Kate. 2008. Transforming meaning representation grammars to improve semantic parsing. In CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, pages 33–40, Manchester, England, August. Coling 2008 Organizing Committee.

Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. 2017. Neural semantic parsing with type constraints for semi-structured tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1516–1526, Copenhagen, Denmark, September. Association for Computational Linguistics.

Tom Kwiatkowksi, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2010. Inducing probabilistic CCG grammars from logical form with higher-order unification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1223–1233, Cambridge, MA, October. Association for Computational Linguistics.

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1512–1523, Edinburgh, Scotland, UK, July. Association for Computational Linguistics.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1545–1556, Seattle, Washington, USA, October. Association for Computational Linguistics.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In International Conference on Learning Representations.

Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 590–599, Portland, Oregon, USA, June. Association for Computational Linguistics.

Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. 2017. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23–33, Vancouver, Canada, July. Association for Computational Linguistics.

Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Fumin Wang, and Andrew Senior. 2016. Latent predictor networks for code generation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 599–609, Berlin, Germany, August. Association for Computational Linguistics.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421, Lisbon, Portugal, September. Association for Computational Linguistics.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035.

Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 149–157. ACM.

Maxim Rabinovich, Mitchell Stern, and Dan Klein. 2017. Abstract syntax networks for code generation and semantic parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1139–1149, Vancouver, Canada, July. Association for Computational Linguistics.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada, July. Association for Computational Linguistics.

Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, and Yasemin Altun. 2019. Generating logical forms from graph representations of text and entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 95–106, Florence, Italy, July. Association for Computational Linguistics.

Wei Shen, Jianyong Wang, and Jiawei Han. 2014. Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Transactions on Knowledge and Data Engineering, 27(2):443–460.

Alane Suhr, Ming-Wei Chang, Peter Shaw, and Kenton Lee. 2020. Exploring unexplored generalization challenges for cross-database semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8372–8388, Online, July. Association for Computational Linguistics.

Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems, pages 2440–2448.

Lappoon R. Tang and Raymond J. Mooney. 2000. Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 133–141, Hong Kong, China, October. Association for Computational Linguistics.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7567–7578, Online, July. Association for Computational Linguistics.

David H.D. Warren and Fernando C.N. Pereira. 1982. An efficient easily adaptable system for interpreting natural language queries. American Journal of Computational Linguistics, 8(3-4):110–122.

Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory networks. In International Conference on Learning Representations.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.

Chien-Sheng Wu, Richard Socher, and Caiming Xiong. 2019. Global-to-local memory pointer networks for task-oriented dialogue. In International Conference on Learning Representations.

Chunyang Xiao, Marc Dymetman, and Claire Gardent. 2016. Sequence-based structured prediction for semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1341–1350, Berlin, Germany, August. Association for Computational Linguistics.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada, July. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921, Brussels, Belgium, October–November. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. SParC: Cross-domain semantic parsing in context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4511–4523, Florence, Italy, July. Association for Computational Linguistics.

John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2, AAAI'96, pages 1050–1055. AAAI Press.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, UAI'05, pages 658–666, Arlington, Virginia, United States. AUAI Press.

Luke Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 678–687, Prague, Czech Republic, June. Association for Computational Linguistics.

Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-based SQL query generation for cross-domain context-dependent questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5338–5349, Hong Kong, China, November. Association for Computational Linguistics.

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. ArXiv, abs/1709.00103.