TCN: Table Convolutional Network for Web Table Interpretation
Daheng Wang*, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Xin Luna Dong, Meng Jiang
University of Notre Dame, Notre Dame, IN 46556, USA / Amazon.com, Seattle, WA 98109, USA
{dwang8,mjiang2}@nd.edu, {shiralp,clockard,binxuan,lunadong}@amazon.com
ABSTRACT
Information extraction from semi-structured webpages provides valuable long-tailed facts for augmenting knowledge graphs. Relational Web tables are a critical component, containing additional entities and attributes of rich and diverse knowledge. However, extracting knowledge from relational tables is challenging because of sparse contextual information. Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which only captures information from related cells in the same table. In this work, we propose a novel relational table representation learning approach that considers both the intra- and inter-table contextual information. On one hand, the proposed Table Convolutional Network model employs the attention mechanism to adaptively focus on the most informative intra-table cells of the same row or column; on the other hand, it aggregates inter-table contextual information from various types of implicit connections between cells across different tables. Specifically, we propose three novel aggregation modules for (i) cells of the same value, (ii) cells of the same schema position, and (iii) cells linked to the same page topic. We further devise a supervised multi-task training objective for jointly predicting column type and pairwise column relation, as well as a table cell recovery objective for pre-training. Experiments on real Web table datasets demonstrate that our method outperforms competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for pairwise column relation prediction.

CCS CONCEPTS
• Information systems → Data mining.

KEYWORDS
Web table, information extraction, knowledge extraction
ACM Reference Format:
Daheng Wang, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Xin Luna Dong, Meng Jiang. 2021. TCN: Table Convolutional Network for Web Table Interpretation. In
Proceedings of the Web Conference 2021 (WWW '21), April 19–23, 2021, Ljubljana, Slovenia.
ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3442381.3450090

*Most of the work was conducted when the author was interning at Amazon.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW '21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04. https://doi.org/10.1145/3442381.3450090

[Figure 1 annotations: Topic entity; Subject entity column; Record label; Release year; Key-Value pairs; Relational table.]
Figure 1: An example page of Elton John from discogs.com showing demographic information as key-value pairs and a Web table of discography details. Image is redacted for privacy reasons.
1 INTRODUCTION

In recent years, there has been a significant thrust both in academia and industry toward the creation of large knowledge bases (KBs) that can power intelligent applications such as question answering, personal assistants, and recommendation. These knowledge bases (e.g., DBpedia [20], Wikidata [44], Freebase [2]) contain facts about real-world entities such as people, organizations, etc., from a variety of domains and languages in the form of (subject, predicate, object) triples. The field of Information Extraction (IE) aims at populating these KBs by extracting facts from websites on the Web. The information on the Web can be roughly categorized into four types, namely, unstructured text, semi-structured data, Web tables and semantic annotations [10]. Recently, with the advances in natural language processing (NLP), there has been significant progress in the development of effective extraction techniques for text and semi-structured data [16, 26–28]. However, we have seen limited success in transforming the next rich information source, Web tables, into triples that can augment a knowledge base [49].

A Web table is a tabular structure embedded within a webpage displaying information about entities, their attributes and relationships with other entities along rows and columns. It contains high quality relational knowledge and differs from other types of tables such as layout tables, which are primarily meant for formatting purposes, or matrix tables, meant to show numerical summaries in a grid format. Because they contain metadata such as table caption and column headers that mimic tables in a relational database, they are also known as relational Web tables. An example of such a table from the detail page of the musical artist "Elton John" on discogs.com is shown in Figure 1. This table shows discography information about the artist (the main topic entity of the page) such as albums in which he has performed along the rows, their release date and their publishers, along with other details on the page such as biographical information (e.g., real name, biography, alternate names, etc.) displayed in a key-value format. Although such relational tables are ubiquitous on the Web (a 2016 study estimates a total of 233M tables on the Web [22]), they are particularly prevalent on semi-structured websites such as discogs.com which are known to be very rich sources of information in a variety of domains [25]. These sites contain detail pages of different types, e.g., artist, album, track and publisher pages. Since these websites are created by populating HTML templates from an underlying relational database, there are millions of such pages with embedded Web tables, making them particularly attractive for knowledge base enrichment.

Our goal is to develop effective methods for extracting and interpreting information in Web tables to augment a knowledge base. In this paper, we focus on the task of Web table interpretation, while relying on existing work to perform the task of table extraction, entailing detecting and filtering of tables from the pages [5]. The task is aligning the schema of a given table to the ontology of a knowledge base. It involves determining the type of each column and the relation between columns from a fixed set of types and relations in the ontology so that tabular entries can be extracted as triples to augment the KB. It is also known as the metadata recovery problem in the literature [5].
However, such alignment is challenging for two reasons, namely schema heterogeneity and context limitation. The problem of schema heterogeneity arises because tables from different websites use different terminology for table caption and column headers. For example, one website might use "Name" while another website might use "Label" as the header of a column containing publisher names. Besides, the caption, header and/or tabular cell entry may be missing altogether. The second challenge is that the information in the table cells is often very short, typically consisting of only a few words, and thus lacks adequate context to perform any effective reasoning using off-the-shelf NLP methods, thereby requiring a different approach for table interpretation.

Although the work on Web table interpretation began over a decade ago [3], the success has been limited. Early work employed probabilistic graphical models to capture the joint relationship between rows, columns and table header, but suffered from low precision. We observe that relational Web tables offer two valuable kinds of inter-table contexts. They are (a) shared-schema context: information shared between tables following the same schema from different pages of the same web source. For example, tables from other artist pages of discogs.com contain some common values such as publisher names producing albums for various artists, and (b) across-schema context: information shared between tables from multiple sources in the same domain; for example, tables from artist pages on both discogs.com and musicbrainz.org may show the same publisher names for an album. None of the existing approaches consider the opportunity to leverage this inter-tabular context for designing models for the two tasks. In this paper, we answer the question: how can we leverage both the intra-table and inter-table contexts to improve Web table interpretation, especially in the presence of shared table schema and overlapping values across webpages and even across websites?

Different from the existing setting, we consider a collection of tables as the input to our problem to utilize the full context available from both intra-table and inter-table implicit connections. We view the collection of tables as a graph in which the nodes represent the tabular cell entries and the edges represent the implicit connections between them. An edge may connect two values from the same row, same column, same cell position in tables following a common schema, or any position in tables having the same value. Given the fact that each node (a value in our case) can have a variable number of links, and inspired by the capabilities of a graph neural network (GNN) to learn effectively from such graph data [45, 46], our goal is to learn a table representation that makes use of the intra-table and inter-table contexts for learning a prediction model for the column type and relation prediction tasks. We propose a novel relational table representation learning framework called Table Convolutional Network (TCN) that operates on implicitly connected relational Web tables and aggregates information from the available context. Our approach gives us two main advantages: (a) it allows us to integrate information from multiple Web tables that provide greater context, and (b) when the inter-table context is unavailable, our model reduces to the common case of having only one table as the input, thereby unifying the general case of Web tables.
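To make this graph view concrete, the following minimal Python sketch (ours, not from the paper; all names are illustrative) materializes the implicit connections as inverted indexes, so that cells sharing a normalized value or the same schema position become neighbors:

```python
from collections import defaultdict

# Illustrative sketch (ours): nodes are cell occurrences (k, m, n); cells that
# share a normalized value, or the same (schema, row, column) position under a
# common schema, become neighbors in the graph view described above.
def build_implicit_edges(tables):
    """tables: list of (cells, schema_id) pairs; cells is a list of rows of strings."""
    by_value = defaultdict(list)     # normalized cell value -> [(k, m, n), ...]
    by_position = defaultdict(list)  # (schema_id, m, n)     -> [(k, m, n), ...]
    for k, (cells, schema_id) in enumerate(tables):
        for m, row in enumerate(cells):
            for n, cell in enumerate(row):
                node = (k, m, n)
                by_value[" ".join(cell.lower().split())].append(node)
                by_position[(schema_id, m, n)].append(node)
    return by_value, by_position
```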
To train the network efficiently, we propose two training schemes: (a) supervised multi-tasking, and (b) unsupervised by way of pre-training for scenarios when supervision may not be available. We make the following contributions through this paper:
• We propose a novel representation learning framework called Table Convolutional Network (TCN) for the problem of Web table interpretation involving column type and pairwise column relation prediction. At its core, TCN utilizes the intra-table and inter-table context available from a collection of tables, instead of the limited context from a single table.
• We show two approaches to train the network, namely a classic supervised mode, and an unsupervised mode that employs pre-training through self-supervision for jointly predicting column type and relation between column pairs.
• We perform extensive experiments with several state-of-the-art baselines on two datasets containing 128K tables of 2.5M triples, showing that TCN outperforms all of them with an F1 score of 93.8%, a relative improvement of 4.8% points on the column type detection task, and an F1 score of 93.3%, a relative improvement of 4.1% on the relation prediction task.
The rest of this paper is organized as follows. We review related work in Section 2. In Section 3, we formally define the research problem. Our proposed Table Convolutional Network approach is introduced in Section 4. Section 5 presents experimental results. We conclude the paper and discuss future work in Section 6.
2 RELATED WORK
We discuss two lines of research related to our work.
Relational Table Interpretation.
Relational tables on the Web describe a set of entities with their attributes and have been widely used as a vehicle for conveying complex relational information [4, 13]. Since the relationships of table cells are not explicitly expressed, relational table interpretation aims at discovering the semantics of the data contained in relational tables, with the goal of transforming them into knowledge intelligently processable by machines [3, 48]. With help from existing knowledge bases, this is accomplished by first classifying tables according to some taxonomy [15, 34], then identifying what table columns are about and uncovering the binary relations of table columns [50]. The extracted knowledge can in turn be readily used for augmenting knowledge bases [37].

Column type annotation refers to associating a relational table column with the type of entities it contains. Earlier methods combine exact match or certain entity search strategies with a majority vote scheme for predicting the column type [31, 43]. Fan et al. [12] proposed a two-stage method which first matches columns to candidate concepts and then employs crowdsourcing for refinement of the type prediction. T2KMatch by Lehmberg et al. [21] proposed to stitch Web tables of the same schema into a larger one to improve the prediction performance. Sherlock [18] by Hulsebos et al. proposed a set of statistical features describing the character- and word-level distributions along with some semantic features for feeding into a deep classifier to get high prediction accuracy.

Relation extraction is the task of associating a pair of columns in a table with the relation that holds between their contents. Mulwad et al. [30] proposed a semantic message passing algorithm using knowledge from the linked open data cloud to infer the semantics between table columns. Munoz et al. [32] proposed to use an existing linked data knowledge base to find known pre-existing relations between entities and extend them to analogous table columns. Sekhavat et al. [38] proposed a probabilistic model leveraging natural language patterns associated with relations in a knowledge base.

Another common task for interpreting relational tables is entity linking, which is the process of detecting and disambiguating specific entities mentioned in the table [1, 11]. Existing work often couples it with column type annotation and relation extraction as a prerequisite or joint task [31, 36, 51]. However, this requires expensive preprocessing steps and largely limits the flexibility for interpreting the semantics of relational tables [6]. In this work, we do not assume the availability of any pre-linked entities, and focus on the tasks of column type annotation and pairwise column relation extraction based entirely on the table cell contents.
Representation Learning of Tabular Data.
Earlier work utilized probabilistic models to capture dependencies between table cells. Limaye et al. [23] proposed to model the entity, type and relation information of table cells as random variables, and jointly learn their distributions by defining a set of potential functions. A Markov Random Fields model was proposed by Ibrahim et al. [19] for canonicalizing table headers and cells into concepts and entities, with special consideration of numerical cell values of quantities. Meimei by Takeoka et al. [41] proposed to incorporate multi-label classifiers in the probabilistic model to support versatile cell types and improve predictive performance. These methods have high complexity due to MCMC sampling and cannot be directly applied to large-scale relational Web tables.

Some studies made efforts in table representation learning by leveraging the word embedding model word2vec [29]. Table2Vec by Zhang et al. [49] proposed to linearize a cropped portion of the table's grid structure into a sequence of cell tokens as the input of word2vec. This treatment was also adopted by Gentile et al. [14] for blocking to improve the efficiency of entity matching. Record2Vec by Sim et al. [39] transformed structured records into attribute sequences and combined word2vec with a tailored triplet loss. However, shallow neural models like word2vec have relatively limited expressiveness, which makes it difficult to effectively capture the semantics of relational tables.

Recent methods utilize deep neural language models for learning table representations. TURL by Deng et al. [8] proposed a pre-training/fine-tuning framework for relational Web tables by injecting a visibility matrix into the encoder of Transformer [42] to attend on structurally related table components. The authors also proposed a Masked Entity Recovery objective to enhance the learning capability, but this requires pre-linked entities of table cells as model input, which is not available in most real cases. TaPas by Herzig et al. [17] proposed to jointly learn the embedding of natural language questions over relational tables by extending the BERT [9] model with more table-aware positional embeddings. TaBERT by Yin et al. [47] adopted a similar idea for semantic parsing on database tables, combining it with Masked Column Prediction and Cell Value Recovery as two additional unsupervised objectives for pre-training. One major limitation of these methods is that they only focus on aggregating components of a single table via indirect techniques such as the visibility matrix and content snapshot [7, 33, 40]. In contrast, we propose to directly capture intra-table context by attending on the column and row cells, with easy integration of inter-table contexts. And, we fully consider the highly valuable inter-table contextual information by aggregating various types of implicit connections between tables with the same cell value or position.
3 PROBLEM DEFINITION

We are given a collection of webpages, perhaps from semi-structured websites, with relational Web tables embedded in them, and our goal is to uncover the semantics of the columns by annotating them with their type and determining the relation between pairs of columns. A Web table in such pages can be schematically understood as a grid comprising rows and columns. Each row contains information about a single real-world entity (e.g., an album name), typically found in one of the first few columns, and its attributes and relationships with other entities in other columns (e.g., release year and publisher), and each column contains attributes or entities of the same type described by an optional column header. Taking the table in Figure 1 as an example, the first column contains the album entities being described, while the rest of the columns indicate their attributes and relationships. We call this column the subject column to indicate it is the subject of the rows. Moreover, we can infer "DJM Records" to be the publisher of "Empty Sky" due to the fact that their association is found in the same row, and likewise, we can infer "Empty Sky" to be a "Release" (a technical term for album) by knowing other values in the same column and due to the presence of the header "Album".

Table 1: Symbols and their descriptions.
Symbol: Description
$T_k$: a relational Web table
$t^{m,n}$ ($t_k^{m,n}$): table cell of $T_k$ located at the intersection of the $m$-th row and the $n$-th column
$t^{m,*}$, $t^{*,n}$: the $m$-th row, and the $n$-th column of $T_k$
$S_k^r$, $S_k^c$: $T_k$'s number of rows, and number of columns
$\phi$: the table schema mapping function
$p_k$: $T_k$'s page topic of short text
$\mathcal{D}$: a dataset of relational tables
$K$, $U$: number of relational tables, and number of unique table schemas in $\mathcal{D}$
$\mathcal{C}$, $\mathcal{R}$: set of target column types, and set of target relations between subject and object columns
$\mathbf{e}_{t^{m,n}}$: initial embedding vector of table cell $t^{m,n}$
$\mathbf{e}^c_{t^{m,n}}$, $\mathbf{e}^r_{t^{m,n}}$: the column-wise, and row-wise aggregated context vectors of target cell $t^{m,n}$
$\mathrm{AGG}_a$: the intra-table aggregation function
$\mathbf{e}^a_{t^{m,n}}$: the intra-table contextual embedding of $t^{m,n}$
$\mathcal{N}_v$, $\mathcal{N}_s$, $\mathcal{N}_p$: set of value cells, set of position cells, and set of topic cells
$\mathbf{e}^v_{t^{m,n}}$, $\mathbf{e}^s_{t^{m,n}}$, $\mathbf{e}^p_{t^{m,n}}$: aggregated inter-table contextual embeddings of $\mathcal{N}_v(t_k^{m,n})$, $\mathcal{N}_s(t_k^{m,n})$, and $\mathcal{N}_p(t_k^{m,n})$
$D$, $\mathbf{W}$: dimension of vector, and matrix of parameters

In a set of $K$ relational Web tables, we denote the $k$-th table $T_k$ ($k = 1, \ldots, K$) as a set of row tuples, i.e., $T_k \coloneqq \{(t_k^{1,1}, t_k^{1,2}, \ldots, t_k^{1,S_k^c}), \ldots, (t_k^{S_k^r,1}, t_k^{S_k^r,2}, \ldots, t_k^{S_k^r,S_k^c})\}$ where $S_k^r$ and $S_k^c$ are the number of rows and columns of the $k$-th table. The first row $t^{1,*}$ typically contains the table header (e.g., "Title of Track" and "Name of Composer"). When the context is clear, we omit the subscript and use $t^{m,n}$ to denote the cell at the intersection of the $m$-th row ($1 \leq m \leq S_k^r$) and $n$-th column ($1 \leq n \leq S_k^c$) of table $T_k$. We use $t^{m,*}$ and $t^{*,n}$ to denote all cells at the $m$-th row and the $n$-th column of the table respectively, i.e., $t^{m,*} \coloneqq (t^{m,1}, t^{m,2}, \ldots, t^{m,S_k^c})$ and $t^{*,n} \coloneqq (t^{1,n}, t^{2,n}, \ldots, t^{S_k^r,n})$.

A Web table has additional contexts that further describe its semantics and therefore should be leveraged for better table interpretation. They come in two forms: metadata from the page of the table, namely an optional table caption, and the page topic $p_k$, a short text describing what the page is about.
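To make the notation concrete, here is a minimal Python sketch (ours, not from the paper; all names are illustrative) of a relational table $T_k$ with its page topic $p_k$ and schema id $\phi(k)$:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RelationalTable:
    """A relational Web table T_k as defined above (illustrative sketch)."""
    cells: List[List[str]]  # S_k^r rows x S_k^c columns; cells[0] is the header row t^{1,*}
    page_topic: str         # p_k, e.g., "Elton John"
    schema_id: int          # phi(k), shared by tables populated from the same template

    def row(self, m: int) -> List[str]:     # t^{m,*} (0-indexed here)
        return self.cells[m]

    def column(self, n: int) -> List[str]:  # t^{*,n}
        return [r[n] for r in self.cells]

# A dataset D = {(T_k, p_k)} is then simply a list of RelationalTable objects.
```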
Given a relational table dataset $\mathcal{D} = \{(T_k, p_k)\}_{k=1}^K$, our goal is to perform the following two table interpretation tasks:

Column type detection: we aim to predict the type of a column from among a fixed set of predefined types from an ontology, i.e., $f_c : \{\{t_k^{*,n}\}_{n=1}^{S_k^c}\}_{k=1}^K \rightarrow \mathcal{C}$ where $\mathcal{C}$ is the set of target entity types;

Pairwise column relation prediction: we aim to predict the pairwise relation between the subject column and object columns, i.e., $f_r : \{\{(t_k^{*,1}, t_k^{*,n})\}_{n=2}^{S_k^c}\}_{k=1}^K \rightarrow \mathcal{R}$ where $\mathcal{R}$ is the set of known relations from the known ontology.

4 TABLE CONVOLUTIONAL NETWORK

In this section, we present a novel deep architecture, Table Convolutional Network (TCN), for relational table representation learning. TCN first learns the latent embedding of each relational table cell by aggregating both intra- and inter-table contextual information. These learned cell embeddings are then summarized into column embeddings which are used for predicting the column type and pairwise column relation. The overall framework of TCN is shown in Figure 2. We first introduce the intra-table aggregation module for summarizing cells of the same column and row (Section 4.1). Then, for capturing the contextual information across tables, we propose three specific inter-table aggregation methods to fully learn from various types of implicit connections between tables (Section 4.2). At last, we present the model's training procedure in a classic supervised setting as well as for pre-training on a large-scale relational Web tables dataset (Section 4.3).

[Figure 2: Overall framework of the proposed TCN for learning relational table latent representations by considering both the intra- and inter-table contextual information. Page topic $p_k$ is appended to the right of each table as a pseudo-column (dashed cells). Arrows/cells highlighted in various colors denote different types of connection to the target cell $t_k^{m,n}$ and the corresponding aggregation module. The intra-table context of $t_k^{m,n}$ is aggregated from cells of the same column and row (green and yellow). Moreover, 3 types of inter-table contexts are aggregated from (i) value cells $\mathcal{N}_v$ of the same value (blue), (ii) position cells $\mathcal{N}_s$ of the same schema position (purple), and (iii) topic cells $\mathcal{N}_p$ of the same value as $t_k^{m,n}$'s topic (turquoise), respectively.]

4.1 Intra-Table Aggregation

For learning the latent representation of a target table cell $t^{m,n} \in T_k$, besides the information carried by its own cell value, it is natural to assume other cells in $T_k$ of the same column or row are helpful for capturing the intra-table context of $t^{m,n}$. As an example, a single cell with a common person name "Pete Bellotte" is ambiguous by itself, unless other song composer names appear in the same column, or his composed song names are present in the same row.

4.1.1 Column aggregation. We use $\mathbf{e}_{t^{m,n}} \in \mathbb{R}^{D_d}$ to denote the initial $D_d$-dim embedding vector of cell $t^{m,n}$. It can be pre-trained word embeddings [35] or simply a one-hot identifier vector. A straightforward way to consider other cells of the same column as context of $t^{m,n}$ is applying a pooling operator on their embeddings, e.g., $\sum_{m'=1, m' \neq m}^{S_k^r} \mathbf{e}_{t^{m',n}} / (S_k^r - 1)$.
However, different cells in the column have different contributions to the context of the target cell and should be weighted differently. For example, in a trending songs table of a singer, cells of his or her main artist songs should be more important, i.e., receive larger weights, compared with featured artist songs. This can be achieved by setting the target cell embedding $\mathbf{e}_{t^{m,n}}$ as query to attend on the other cell embeddings $\{\mathbf{e}_{t^{m',n}}\}_{m'=1, m' \neq m}^{S_k^r}$ of the same column (see Figure 3(a)):

$$\alpha_{t^{m',n}} = \frac{\exp\big(\mathbf{e}_{t^{m',n}}^{\top} \cdot \mathbf{e}_{t^{m,n}}\big)}{\sum_{\tilde{m}=1, \tilde{m} \neq m}^{S_k^r} \exp\big(\mathbf{e}_{t^{\tilde{m},n}}^{\top} \cdot \mathbf{e}_{t^{m,n}}\big)}, \quad (1)$$

where $\alpha_{t^{m',n}}$ is the weight of column cell $t^{m',n}$. The column-wise aggregated context embedding $\mathbf{e}^c_{t^{m,n}} \in \mathbb{R}^{D_c}$ can be computed by

$$\mathbf{e}^c_{t^{m,n}} = \sigma\Big(\mathbf{W}_c \cdot \sum_{m'=1, m' \neq m}^{S_k^r} \alpha_{t^{m',n}} \mathbf{e}_{t^{m',n}}\Big), \quad (2)$$

where $\sigma$ is the nonlinear ReLU and $\mathbf{W}_c \in \mathbb{R}^{D_c \times D_d}$ is a parameter matrix.

4.1.2 Row aggregation. Analogous to column aggregation, we can also use the target cell as query to attend on and aggregate the other row cells. However, different from column cells, which are homogeneous of the same entity type, cells of the same row are mostly heterogeneous in type and contain complex relational information conditioned on the page topic $p_k$ [51]. In other words, knowing the page topic can greatly benefit inferring the factual knowledge of other row cells with respect to $t^{m,n}$. For capturing the impact of page topic $p_k$, we incorporate the topic embedding vector $\mathbf{e}_{p_k} \in \mathbb{R}^{D_p}$ into the target cell query $\mathbf{e}_{t^{m,n}}$ for attending the other row cells (see Figure 3(b)):

$$\beta_{t^{m,n'}} = \frac{\exp\big(\mathbf{e}_{t^{m,n'}}^{\top} \cdot \mathbf{W}_q \cdot (\mathbf{e}_{t^{m,n}} \| \mathbf{e}_{p_k})\big)}{\sum_{\tilde{n}=1, \tilde{n} \neq n}^{S_k^c} \exp\big(\mathbf{e}_{t^{m,\tilde{n}}}^{\top} \cdot \mathbf{W}_q \cdot (\mathbf{e}_{t^{m,n}} \| \mathbf{e}_{p_k})\big)}, \quad (3)$$

where $\mathbf{W}_q \in \mathbb{R}^{D_d \times (D_d + D_p)}$ is a bilinear transformation allowing interactions from row cells to both the target cell and page topic, and $\|$ is the vector concatenation operator. So, compared with Eqn. (1), the attention weights of row cells are adaptively determined based on the target cell information as well as the page topic semantics. In addition, we explicitly include the page topic in the row-wise aggregated context vector by concatenating the topic embedding $\mathbf{e}_{p_k}$ with the attended sum of row cell embeddings:

$$\mathbf{e}^r_{t^{m,n}} = \sigma\Big(\mathbf{W}_r \cdot \Big(\sum_{n'=1, n' \neq n}^{S_k^c} \big(\beta_{t^{m,n'}} \mathbf{e}_{t^{m,n'}}\big) \,\Big\|\, \mathbf{e}_{p_k}\Big)\Big), \quad (4)$$

where $\mathbf{e}^r_{t^{m,n}} \in \mathbb{R}^{D_r}$ denotes the row-wise aggregated $D_r$-dim context embedding of $t^{m,n}$ and $\mathbf{W}_r \in \mathbb{R}^{D_r \times (D_d + D_p)}$ is a parameter matrix. Intuitively, this can be seen as appending the page topic $p_k$ as a pseudo-column of identical topic cells to the last column of $T_k$.

[Figure 3: The intra-table aggregation module for summarizing contexts of target cell $t^{m,n}$ inside table $T_k$. (a) Column aggregation uses the embedding of $t^{m,n}$ as query to attend on other column cells for generating the column-wise aggregated context $\mathbf{e}^c_{t^{m,n}}$. (b) Row aggregation also incorporates the embedding of page topic $p_k$ into the query for attending other row cells. The result is concatenated with the topic embedding as the row-wise aggregated context embedding $\mathbf{e}^r_{t^{m,n}}$.]

4.1.3 Intra-table aggregation function. After we have distilled contextual information of $t^{m,n}$ by aggregating from related cells of the same column and row in $T_k$, we can fuse these column- and row-wise aggregated context embeddings into a holistic intra-table context embedding $\mathbf{e}^a_{t^{m,n}} \in \mathbb{R}^{D_a}$.
We use the function $\mathrm{AGG}_a$ to denote this whole intra-table aggregation process (from Eqn. (2) to (5)):

$$\mathbf{e}^a_{t^{m,n}} = \sigma\big(\mathbf{W}_a \cdot (\mathbf{e}^c_{t^{m,n}} \| \mathbf{e}^r_{t^{m,n}})\big) = \mathrm{AGG}_a(t^{m,n}), \quad (5)$$

where $\mathbf{W}_a \in \mathbb{R}^{D_a \times (D_c + D_r)}$ is the parameter matrix. The output embedding $\mathbf{e}^a_{t^{m,n}}$ encapsulates the intra-table contextual information of target cell $t^{m,n}$ from all informative cells of relational table $T_k$. Most existing work on table representation learning relies on indirect techniques such as a visibility matrix [8] and content snapshot [47] for modeling related cells inside the table and does not consider contexts across tables. In contrast, the proposed intra-table aggregation of TCN directly captures the intra-table context, and can be easily applied for integrating with various inter-table contexts. We use this intra-table aggregation function $\mathrm{AGG}_a$ as the underlying operation for summarizing all intra-table contexts of arbitrary cells that are implicitly connected to the target cell $t_k^{m,n}$.
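For concreteness, a minimal PyTorch-style sketch (ours, not the authors' released code; bias terms, batching, and neighbor sampling omitted; all tensor names are illustrative) of Eqns. (1) through (5):

```python
import torch
import torch.nn.functional as F

def agg_a(E_col, E_row, e_target, e_topic, W_c, W_r, W_q, W_a):
    """Sketch of AGG_a (Eqns. (1)-(5)); shapes follow the paper's notation.

    E_col: (S_r - 1, D_d) embeddings of the other cells in the target column
    E_row: (S_c - 1, D_d) embeddings of the other cells in the target row
    e_target: (D_d,) initial embedding of t^{m,n}; e_topic: (D_p,) embedding of p_k
    """
    # Eqns. (1)-(2): the target cell is the query attending over column cells.
    alpha = F.softmax(E_col @ e_target, dim=0)                    # (S_r - 1,)
    e_c = torch.relu(W_c @ (alpha @ E_col))                       # (D_c,)

    # Eqn. (3): bilinear query mixing the target cell and the page topic.
    q = W_q @ torch.cat([e_target, e_topic])                      # (D_d,)
    beta = F.softmax(E_row @ q, dim=0)                            # (S_c - 1,)
    # Eqn. (4): attended row sum concatenated with the topic embedding.
    e_r = torch.relu(W_r @ torch.cat([beta @ E_row, e_topic]))    # (D_r,)

    # Eqn. (5): fuse column- and row-wise contexts into e^a.
    return torch.relu(W_a @ torch.cat([e_c, e_r]))                # (D_a,)
```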
4.2 Inter-Table Aggregation

By aggregating from related cells in the same table, we can learn a locally context-aware latent representation of a table cell. However, on the Web there are also many implicit connections across different tables, and these hidden connections often provide highly valuable context that is complementary to the intra-table context. For example, the song composer name "Pete Bellotte" could appear in multiple tables, and in some of them he also serves as a record producer. These two roles are subtly different yet complementary to each other. Jointly modeling the intra- and inter-table contexts can help capture more accurate relational information.

Such inter-table connections can also be of various types. Besides tables connected by the same cell value, there are often tables sharing the same schema (i.e., the same headers), and the topic of certain tables might appear in other tables as cell values. For example, Web tables designed for describing music albums will have identical header cells, and the page topic (i.e., the album name) can also appear in the singer's discography table. To effectively model the context of heterogeneous connections, we propose three inter-table aggregation modules for distilling inter-table contexts.

[Figure 4: The value aggregation module for summarizing the target cell $t_k^{m,n}$'s inter-table contextual information from its value cells $\mathcal{N}_v(t_k^{m,n})$. Double-sided arrows indicate $\mathcal{N}_v$ share the same cell value as $t_k^{m,n}$. The intra-table contexts of $\mathcal{N}_v$ extracted via $\mathrm{AGG}_a$ (Section 4.1.3) are arranged into matrix $\mathbf{E}^v_{t^{m,n}}$. The value cells aggregated context embedding $\mathbf{e}^v_{t^{m,n}}$ is summed by self-attention weights of $\Omega^v_{t^{m,n}}$.]

4.2.1 Value cells aggregation. Each relational table describes a set of relations between its cells, and in turn each unique cell can also be expressed by the set of tables where it appears. Intuitively, each table can be seen as a partial view of the target cell's context. For capturing the contextual information from different tables, we establish connections between cells of different tables containing the same value. In particular, we adopt basic normalizing procedures to canonicalize table cells with no additional step of expensive entity linking [31, 36, 51]. In practice, we apply minimal preprocessing on cell values by lowercasing all characters and removing redundant spaces. Given a target cell $t_k^{m,n}$ in relational table $T_k$, we use $\mathcal{N}_v$ to denote cells of other tables containing the same value, i.e., $\mathcal{N}_v(t_k^{m,n}) \coloneqq \{t_{k'}^{\tilde{m},\tilde{n}} \mid t_{k'}^{\tilde{m},\tilde{n}} = t_k^{m,n} \wedge 1 \leq k' \leq K \wedge k' \neq k\}$.

By applying the intra-table aggregation function $\mathrm{AGG}_a$ as previously introduced (Section 4.1.3), we can produce the local contexts of all value cells $\{\mathbf{e}^a_{t_{k'}^{\tilde{m},\tilde{n}}} = \mathrm{AGG}_a(t_{k'}^{\tilde{m},\tilde{n}}) \mid t_{k'}^{\tilde{m},\tilde{n}} \in \mathcal{N}_v(t_k^{m,n})\}$ with respect to the corresponding relational tables. For effectively focusing on the most useful connections of value cells, we further process this variable-sized set of value cells' intra-table contexts into a single vector (see Figure 4) by leveraging the self-attention mechanism [42]. Specifically, we arrange all extracted intra-table contexts of $\mathcal{N}_v(t_k^{m,n})$ into a matrix $\mathbf{E}^v_{t^{m,n}} \in \mathbb{R}^{|\mathcal{N}_v(t_k^{m,n})| \times D_a}$, where each row contains the context aggregated from one value cell of $t_k^{m,n}$. The relative importance of the value cells of $t_k^{m,n}$ can be calculated as:

$$\Omega^v_{t^{m,n}} = \mathrm{softmax}\big(\mathbf{W}_s \cdot (\mathbf{E}^v_{t^{m,n}})^{\top}\big), \quad (6)$$

where $\mathbf{W}_s \in \mathbb{R}^{V \times D_a}$ is a parameter matrix for computing the $V$-view weight matrix $\Omega^v_{t^{m,n}} \in \mathbb{R}^{V \times |\mathcal{N}_v(t_k^{m,n})|}$, the softmax is applied row-wise, and $V$ is the number of attention heads, set to 2 in practice. Each row of $\Omega^v_{t^{m,n}}$ reflects one view of the value cells' importance distribution. Note that if $V = 1$, $\mathbf{W}_s$ degenerates into a parameter query vector and the softmax function can be expanded out similarly to Eqn. (1). Then, the value cells aggregated context embedding can be computed as:

$$\mathbf{e}^v_{t^{m,n}} = \mathrm{mean}\big(\Omega^v_{t^{m,n}} \cdot \mathbf{E}^v_{t^{m,n}} \cdot \mathbf{W}_b\big), \quad (7)$$

where $\mathbf{W}_b \in \mathbb{R}^{D_a \times D_b}$ is the parameter matrix for transforming into the $D_b$-dim value cells aggregated context, and the final output $\mathbf{e}^v_{t^{m,n}} \in \mathbb{R}^{D_b}$ is obtained via an element-wise mean pooling function.

4.2.2 Position cells aggregation. Besides linking tables based on their cell values, the unique grid-like structure of relational tables also grants us a valuable foundation for establishing connections between tables based on a cell's relative position inside the table. The intuition is that for a subset of relational tables with the same schema, i.e., $\{T_k \mid \phi(k) = u\}$ ($1 \leq u \leq U$), cells at the same position in terms of row index $m$ ($1 \leq m \leq \max(\{S_k^r \mid \phi(k) = u\})$) and column index $n$ ($1 \leq n \leq \max(\{S_k^c \mid \phi(k) = u\})$) can provide useful contextual information to each other. For example, given a collection of identical-schema tables describing various music albums, knowing any cell of a song track name (or composer) would reveal that other cells of the same position also instantiate the same "Release" (or "People") type. We use $\mathcal{N}_s$ to denote position cells, i.e., $\mathcal{N}_s(t_k^{m,n}) \coloneqq \{t_{k'}^{m,n} \mid \phi(k) = \phi(k') \wedge 1 \leq k' \leq K \wedge k' \neq k\}$.

In the general domain, the connections between $t_k^{m,n}$ and position cells $\mathcal{N}_s$ may be sparse because the number of unique table schemas $U$ is large and comparable to the total number of tables $K$. However, an important practical case is relational tables on semi-structured websites, which consist of a set of detail pages that each contains information about a particular page topic [24, 27]. Typically, the factual information of these tables is automatically populated from an underlying database.
When the relational table dataset $\mathcal{D}$ is constructed from such semi-structured websites, there are a large number of helpful inter-table connections to position cells $\mathcal{N}_s$. Without loss of generality, we propose to aggregate from position connections, which are potentially a rich source of the target cell's inter-table contextual information. For generating the position cells aggregated context embedding $\mathbf{e}^s_{t^{m,n}} \in \mathbb{R}^{D_s}$, we adopt a strategy similar to the one proposed for aggregating value cells (see Section 4.2.1). Specifically, we arrange all intra-table contexts of $\mathcal{N}_s(t_k^{m,n})$ into a matrix $\mathbf{E}^s_{t^{m,n}} \in \mathbb{R}^{|\mathcal{N}_s(t_k^{m,n})| \times D_a}$ and substitute it into Eqn. (6). The result $\Omega^s_{t^{m,n}}$ is further substituted into Eqn. (7) for computing $\mathbf{e}^s_{t^{m,n}}$. Note that, in practice, we truncate the number of position cells when $|\mathcal{N}_s|$ is too large according to a sampling budget $b$ to maintain computational tractability. We will test the effectiveness of our proposed method both in the general domain and on a specially constructed dataset from semi-structured websites.

4.2.3 Topic cells aggregation. Another important type of implicit inter-table connection can be discovered by examining the underlying connectivity between the page topic of the target cell and cells of other tables. Relational Web tables are mainly created for conveying knowledge of relations that are relevant to the page topic. The table cells and topic both refer to relevant real-world entities [51]. It is common to see the topic of certain tables appear as the cell values of other relational tables. In the example of music album tables, the page topic of album name would appear in other relational tables such as the singer's discography. So, it is also beneficial for the model to extract contextual information from these topic cells.

We use $\mathcal{N}_p$ to denote topic cells containing the same value as the page topic $p_k$ of target cell $t_k^{m,n}$, i.e., $\mathcal{N}_p(t_k^{m,n}) \coloneqq \{t_{k'}^{\tilde{m},\tilde{n}} \mid t_{k'}^{\tilde{m},\tilde{n}} = p_k \wedge 1 \leq k' \leq K \wedge k' \neq k\}$. Similar to the treatment of value cells $\mathcal{N}_v$ and position cells $\mathcal{N}_s$, we first apply the intra-table aggregation function $\mathrm{AGG}_a$ (Section 4.1.3) to extract $\mathcal{N}_p$'s intra-table contexts for constructing $\mathbf{E}^p_{t^{m,n}}$, and then generate the topic cells aggregated context embedding $\mathbf{e}^p_{t^{m,n}} \in \mathbb{R}^{D_p}$ according to Eqn. (6) and (7). Different from the inter-table connections made by $\mathcal{N}_v$ and $\mathcal{N}_s$, which directly link to the target cell $t_k^{m,n}$, topic cells $\mathcal{N}_p$ connect to the page topic $p_k$ of $t_k^{m,n}$. Instead of simply using $\mathbf{e}^p_{t^{m,n}}$ as part of the contextual information of $t_k^{m,n}$, we fuse it with $p_k$'s initial embedding $\mathbf{e}_{p_k}$. This can be seen as bringing inter-table contextual information into the embedding of the page topic, and the result $\mathbf{e}^p_{p_k} = \mathbf{e}_{p_k} \| \mathbf{e}^p_{t^{m,n}}$ is substituted into Eqn. (3). In this way, we incorporate the global contexts of topic cells into the intra-table aggregation function $\mathrm{AGG}_a$ of the model (Section 4.1.2).

By aggregating from column and row cells, as well as various types of inter-table connections to value, position and topic cells, the proposed TCN fully considers both the intra- and inter-table contextual information during learning.
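Since the same attention routine is reused for all three neighbor types, it can be written once; a sketch under the notation above (our illustrative code, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def inter_table_aggregate(E_nbr, W_s, W_b):
    """Sketch of the shared neighbor aggregation of Eqns. (6)-(7).

    E_nbr: (|N|, D_a) intra-table contexts AGG_a(t') of the sampled neighbor
           cells (value cells N_v, position cells N_s, or topic cells N_p),
           truncated to the sampling budget b.
    W_s:   (V, D_a) multi-view attention parameters (V = 2 in the paper).
    W_b:   (D_a, D_b) output projection.
    """
    omega = F.softmax(W_s @ E_nbr.T, dim=1)   # Eqn. (6): (V, |N|), row-wise softmax
    views = omega @ E_nbr @ W_b               # (V, D_b), one aggregated context per view
    return views.mean(dim=0)                  # Eqn. (7): element-wise mean over views
```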
Next, we present the fusion of all contexts and the training procedures of the model.

4.3 Model Training

4.3.1 Context fusion and supervised multi-task training. After generating the intra-table contextual embedding $\mathbf{e}^a_{t^{m,n}}$ and the different inter-table contextual embeddings $\mathbf{e}^v_{t^{m,n}}$, $\mathbf{e}^s_{t^{m,n}}$, $\mathbf{e}^p_{t^{m,n}}$, the final latent representation of target cell $t^{m,n}$ can be computed as:

$$\mathbf{h}_{t^{m,n}} = \sigma\big(\mathbf{W}_h \cdot (\mathbf{e}_{t^{m,n}} \| \mathbf{e}^a_{t^{m,n}} \| \mathbf{e}^v_{t^{m,n}} \| \mathbf{e}^s_{t^{m,n}})\big), \quad (8)$$

where $\mathbf{W}_h$ is the parameter matrix for fusing the initial cell embedding $\mathbf{e}_{t^{m,n}}$ with all intra- and inter-table contextual embeddings into the final $D_h$-dim representation of $t^{m,n}$. Note that the topic cells aggregated context $\mathbf{e}^p_{t^{m,n}}$ is incorporated in $\mathbf{e}^a_{t^{m,n}}$ via $\mathrm{AGG}_a$.

The proposed TCN can be trained in a supervised mode by jointly predicting the type of columns and the pairwise relations between columns for each relational table. Since both of these multi-class classification tasks are on the table column level, we compute the embedding $\mathbf{h}_{t_k^{*,n}} \in \mathbb{R}^{D_h}$ of table column $t_k^{*,n}$ as the mean of its cell embeddings, i.e., $\mathbf{h}_{t_k^{*,n}} = \mathrm{Avg}(\{\mathbf{h}_{t_k^{m,n}}\}_{m=1}^{S_k^r})$. For column type prediction, we use a single dense layer as the final predictive model. The discrepancy between the predicted type distribution and the ground truth column type is measured by the loss function $\mathcal{J}^{\mathcal{C}}$. Specifically, with $c_{t_k^{*,n}} \in \mathcal{C}$ denoting the true type of $t_k^{*,n}$, we employ the following cross-entropy objective:

$$\mathcal{J}^{\mathcal{C}}_k = -\sum_{n=1}^{S_k^c} \sum_{c \in \mathcal{C}} \mathbb{I}_{c_{t_k^{*,n}} = c} \cdot \log \frac{\exp\big(\mathbf{M}_c \cdot \mathbf{h}_{t_k^{*,n}}\big)}{\sum_{c' \in \mathcal{C}} \exp\big(\mathbf{M}_{c'} \cdot \mathbf{h}_{t_k^{*,n}}\big)}, \quad (9)$$

where $\mathbf{M}_c$ is the parameter matrix for column type $c \in \mathcal{C}$ and $\mathbb{I}$ is an indicator function.

Similarly, we concatenate the embeddings of a pair of subject and object columns $(t_k^{*,1}, t_k^{*,n})$ for feeding into a dense layer to generate the prediction of the true relation $r_{t_k^{*,n}} \in \mathcal{R}$ between them:

$$\mathcal{J}^{\mathcal{R}}_k = -\sum_{n=2}^{S_k^c} \sum_{r \in \mathcal{R}} \mathbb{I}_{r_{t_k^{*,n}} = r} \cdot \log \frac{\exp\big(\mathbf{M}_r \cdot (\mathbf{h}_{t_k^{*,1}} \| \mathbf{h}_{t_k^{*,n}})\big)}{\sum_{r' \in \mathcal{R}} \exp\big(\mathbf{M}_{r'} \cdot (\mathbf{h}_{t_k^{*,1}} \| \mathbf{h}_{t_k^{*,n}})\big)}, \quad (10)$$

where $\mathbf{M}_r$ is the parameter matrix for pairwise column relation $r \in \mathcal{R}$. So, given a mini-batch of relational tables $\mathcal{B} \subseteq \mathcal{D}$, the overall training objective $\mathcal{J}$ of the proposed TCN can be obtained via a convex combination of the above two tasks' loss functions:

$$\mathcal{J} = \sum_{k \in \mathcal{B}} \gamma \mathcal{J}^{\mathcal{C}}_k + (1 - \gamma) \mathcal{J}^{\mathcal{R}}_k, \quad (11)$$

where $\gamma$ is a mixture hyperparameter for balancing the magnitudes of the two objectives for predicting column type and pairwise relation.

4.3.2 Unsupervised pre-training. Learning relational table representations directly under the supervision of column type and relation labels is not always feasible due to the expensive cost of obtaining high quality annotations. The proposed TCN can also be trained in an unsupervised way without relying on explicit labels. Specifically, we first train TCN according to the pre-training objective to obtain the output cell embeddings. We then use these pre-trained cell embeddings as initialization for the supervised fine-tuning phase aimed at jointly predicting column type and pairwise column relation (Eqn. (11)). Similar to the Masked Language Model (MLM) objective of BERT [9], we randomly mask a fraction of table cells beforehand for recovery.
Given a masked cell $\hat{t}_k^{m,n}$ and the global context-aware embedding $\mathbf{h}_{\hat{t}_k^{m,n}}$ learned by TCN, the objective for predicting the original cell value is computed as:

$$\mathcal{J} = -\sum_{k \in \mathcal{B}} \sum_{\hat{t}_k^{m,n} \in \hat{T}_k} \sum_{v \in \mathcal{V}} \mathbb{I}_{t_k^{m,n} = v} \cdot \log \frac{\exp\big(\mathbf{M}_v \cdot \mathbf{h}_{\hat{t}_k^{m,n}}\big)}{\sum_{v' \in \mathcal{V}} \exp\big(\mathbf{M}_{v'} \cdot \mathbf{h}_{\hat{t}_k^{m,n}}\big)}, \quad (12)$$

where $\hat{T}_k$ is the set of all masked cells of the $k$-th table, $\mathcal{V}$ is the set of all cell values of $\mathcal{D}$, and $\mathbf{M}_v$ is the parameter matrix for cell value $v$, which can include one or multiple words after the normalization (Section 4.2.1). The pre-trained cell embeddings can later be used for initializing the fine-tuning phase. As we show in experiments (Section 5.5), pre-training further improves F1-weighted by around +2% for predicting column type and relation.

4.3.3 Complexity analysis. Assuming the attention-based intra-table aggregation function $\mathrm{AGG}_a$ takes constant time, the per-batch time complexity of the proposed TCN is $O(|\mathcal{B}|(b_v + b_s + b_p))$ in principle, where $|\mathcal{B}|$ is the batch size and $b_v$ (and $b_s$, $b_p$) is the sampling budget for inter-table connections of value (and position, topic) cells. In practice, we simply set $b_v = b_s = b_p$, so the time complexity becomes $O(|\mathcal{B}|b)$, which is linear in the product of the batch size $|\mathcal{B}|$ and the sampling budget $b$. In an optimized implementation, we can process the tables in a batch into one large table beforehand by padding and concatenating cells, which can further reduce the complexity to $O(b)$. This allows us to scale the model to tens of millions of tables while retaining control over the balance between model expressiveness and efficiency.
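To summarize Section 4.3.1, here is a compact sketch of the joint objective of Eqns. (9) to (11) for a single table (ours; hypothetical names; the subject column is assumed at index 0):

```python
import torch
import torch.nn.functional as F

def supervised_loss(H_cols, type_labels, rel_labels, M_c, M_r, gamma=0.5):
    """Sketch of the multi-task objective of Eqns. (9)-(11) for one table.

    H_cols: (S_c, D_h) column embeddings (mean of the final cell
            representations h); row 0 is the subject column.
    type_labels: (S_c,) gold column types in C; rel_labels: (S_c - 1,) gold
    relations in R between the subject column and each object column.
    M_c: (|C|, D_h) and M_r: (|R|, 2 * D_h) output parameter matrices.
    """
    # Eqn. (9): cross-entropy over column types.
    loss_c = F.cross_entropy(H_cols @ M_c.T, type_labels)

    # Eqn. (10): concatenate subject/object column embeddings for relations.
    subj = H_cols[0].expand(H_cols.shape[0] - 1, -1)   # broadcast subject column
    pairs = torch.cat([subj, H_cols[1:]], dim=1)       # (S_c - 1, 2 * D_h)
    loss_r = F.cross_entropy(pairs @ M_r.T, rel_labels)

    # Eqn. (11): convex combination; the pre-training loss of Eqn. (12) instead
    # applies a softmax over the cell vocabulary for randomly masked cells.
    return gamma * loss_c + (1 - gamma) * loss_r
```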
[Table 2: Statistics of the two real Web table datasets $\mathcal{D}_m$ and $\mathcal{D}_w$: rows $S_k^r$, columns $S_k^c$, types $|\mathcal{C}|$, relations $|\mathcal{R}|$, and value/position/topic cell counts $|\mathcal{N}_v|$, $|\mathcal{N}_s|$, $|\mathcal{N}_p|$.]

5 EXPERIMENTS

In this section, we evaluate the performance of the proposed TCN for relational table representation learning against competitive baselines on two real-world large-scale relational Web table datasets. In particular, we aim to answer the following research questions:
• RQ1: How does the proposed method perform compared with the state-of-the-art methods for predicting a relational table's column type and pairwise relation between columns?
• RQ2: How does each type of the proposed inter-table aggregation module affect the overall performance of the model?
• RQ3: Is the proposed method also effective when applied for pre-training on an unlabeled corpus of relational tables?
• RQ4: What are some concrete examples of the column types and column pairwise relations discovered by the model?
• RQ5: What are the recommended hyper-parameter settings for applying the proposed model in practical cases?
5.1 Datasets

We collected a dataset $\mathcal{D}_m$ of 128K relational Web tables from 6 mainstream semi-structured websites in the music domain. Tables of $\mathcal{D}_m$ generally fall into three page topic categories: "Person" (e.g., singer, composer, etc.), "Release" (i.e., music album), and "Recording" (i.e., song track). The number of unique table schemas $U$ in $\mathcal{D}_m$ is relatively small because tables are automatically populated. We manually annotated each table schema to obtain column type and pairwise column relation information. For relational Web tables in the general domain, we utilize the datasets provided by Deng et al. [8] containing annotated relational tables from a raw corpus of 570K Web tables on Wikipedia. We build a dataset $\mathcal{D}_w$ by taking a subset of 5.5K tables with annotations on both column type and relation. Specifically, we take the intersection of the task-specific datasets for column type annotation and relation extraction described in the paper. For both datasets, we keep tables with at least 2 columns/rows. More descriptive statistics are provided in Table 2.

5.2 Baselines and Experimental Setup

We compare the proposed TCN against the state-of-the-art tabular data representation learning methods:
• Table2Vec [49]: This method flattens a cropped portion of a relational table and its topic into a sequence of cell tokens and uses word2vec [29] for learning column/cell embeddings.
• TaBERT [47]: It jointly learns embeddings of natural language sentences and relational tables, using content snapshots to carve out relevant rows for feeding into BERT [9].
As HNN and Sherlock are originally designed for pre-dicting semantic type of columns, we concatenate the embeddingsof subject and object pair of columns to predict the relation. Wealso tested with representative methods (e.g., MTab, MantisTable,CVS2KG) from the SemTab 2019 challenges i . But we were only ableto successfully generate results from MTab due to the high timecomplexity of others. We observed a relative large performance gapbetween it and Table2Vec’s (probably because of its dependenceon pre-linked entities) so we exclude it in following discussions. Forall methods, we use the same random split of 80/10/10 percents oftables for training/validation/test at each round and we report theaverage performance of five runs. For TCN, the sampling budget 𝑏 for inter-table connections is set to 20 and objective mixture weight 𝛾 is set to 0.5. We set the dimension of cell and all context embed-dings as and initialize each cell embedding by matching withGlove embeddings [35] (taking mean in case of multiple tokens). For classifying multi-class column type C and pairwise column relation R , we use metrics of mean accuracy (Acc.), F1-weighted score and the Cohen’s kappa 𝜅 coefficient. i D 𝑤 in terms of accuracy.(b) Performance of TCN and baselines on dataset D 𝑤 in terms of F1-weighted. Figure 5: The proposed TCN can consistently outperformbaseline methods for column type C prediction and pairwisecolumn relation R extraction on open domain dataset D 𝑤 . Table 3 and Figure 5 presents the experimental results of applyingthe proposed TCN and baseline methods on predicting relationalWeb table column type C and pairwise column relation R on dataset D 𝑚 and D 𝑤 respectively. In the following discussions, we refer toF1-weighted as F1 unless explicitly stated otherwise.For tabular data representation learning methods, TURL per-forms better than other baselines on both tasks across datasets.Table2Vec underperforms all other methods because it simplycrops and flattens part of the table for feeding into the shallow a) Column type C (b) Pairwise column relation R Figure 6: Relative improvements of TCN and variants overthe base variant TCN-intra for predicting column type C andpairwise column relation R on the open domain dataset D 𝑤 . word2vec. There is quite large performance margin ( + . and + . for two tasks on D 𝑚 in terms of F1) from Table2Vec toTaBERT using a deep BERT-based encoder. TURL enhances thedeep encoder with visibility mask which can partially capture thegrid-like structure of relational table so it has better performancecompared with Table2Vec and TaBERT. This means better model-ing the intra-table context is beneficial. But these methods all relyon linearizing the table into a long sequence of cell tokens.For methods specifically designed for column type prediction,Sherlock performs better than HNN for predicting column type C on two datasets. However, it cannot produce competitive resultsfor predicting pairwise column relation R on two datasets which isroughly on the same performance level as Table2Vec. It is interest-ing to note Sherlock achieves the best performance for predictingcolumn type C on D 𝑚 among all baselines because of the effective-ness of its statistical features. But none of these baselines is capableof modeling the valuable inter-table contextual information.The proposed TCN can consistently outperform baseline meth-ods on all metrics across two datasets. For predicting the columntype C , TCN scores an F1 of .938 on D 𝑚 which is + . relativelyover Sherlock ( + . 
Table 3: Performance of baselines and variants of TCN on predicting column type $\mathcal{C}$ and pairwise column relation $\mathcal{R}$ on dataset $\mathcal{D}_m$. For all metrics, higher values indicate better performance. Relative improvements over the base variant TCN-intra are shown in parentheses.

Method | Column type C: Acc. / F1-weighted / Cohen's kappa | Relation R: Acc. / F1-weighted / Cohen's kappa
Table2Vec | .832 / .820 / .763 | .822 / .810 / .772
TaBERT | .908 / .861 / .834 | .877 / .870 / .846
TURL | .914 / .877 / .876 | .890 / .889 / .838
HNN | .916 / .883 / .869 | .848 / .843 / .794
Sherlock | .922 / .895 / .863 | .831 / .818 / .802
TCN-intra | .911 / .881 / .873 | .893 / .894 / .869
TCN-N_v | .939 (+3.1%) / .916 (+4.0%) / .897 (+2.8%) | .920 (+3.0%) / .920 (+2.9%) / .898 (+3.3%)
TCN-N_s | .934 (+2.5%) / .908 (+3.1%) / .894 (+2.4%) | .908 (+1.7%) / .912 (+2.0%) / .881 (+1.4%)
TCN-N_p | .923 (+1.3%) / .890 (+1.0%) / .880 (+0.8%) | .906 (+1.4%) / .904 (+1.1%) / .875 (+0.7%)
TCN | .958 (+5.2%) / .938 (+6.5%) / .913 (+4.6%) | .934 (+4.6%) / .925 (+3.5%) / .905 (+4.1%)

[Figure 5: The proposed TCN consistently outperforms baseline methods for column type $\mathcal{C}$ prediction and pairwise column relation $\mathcal{R}$ extraction on the open domain dataset $\mathcal{D}_w$. Panels: (a) accuracy; (b) F1-weighted.]

5.3 Overall Performance (RQ1)

Table 3 and Figure 5 present the experimental results of applying the proposed TCN and the baseline methods to predicting relational Web table column type $\mathcal{C}$ and pairwise column relation $\mathcal{R}$ on datasets $\mathcal{D}_m$ and $\mathcal{D}_w$ respectively. In the following discussions, we refer to F1-weighted as F1 unless explicitly stated otherwise.

For tabular data representation learning methods, TURL performs better than the other baselines on both tasks across datasets. Table2Vec underperforms all other methods because it simply crops and flattens part of the table for feeding into the shallow word2vec. There is a quite large performance margin (+.041 and +.060 F1 for the two tasks on $\mathcal{D}_m$) from Table2Vec to TaBERT with its deep BERT-based encoder. TURL enhances the deep encoder with a visibility mask which can partially capture the grid-like structure of a relational table, so it performs better than Table2Vec and TaBERT. This means that better modeling of the intra-table context is beneficial. But these methods all rely on linearizing the table into a long sequence of cell tokens.

For methods specifically designed for column type prediction, Sherlock performs better than HNN for predicting column type $\mathcal{C}$ on the two datasets. However, it cannot produce competitive results for predicting pairwise column relation $\mathcal{R}$, where it is roughly on the same performance level as Table2Vec. It is interesting to note that Sherlock achieves the best performance among all baselines for predicting column type $\mathcal{C}$ on $\mathcal{D}_m$ because of the effectiveness of its statistical features. But none of these baselines is capable of modeling the valuable inter-table contextual information.

The proposed TCN consistently outperforms the baseline methods on all metrics across the two datasets. For predicting the column type $\mathcal{C}$, TCN scores an F1 of .938 on $\mathcal{D}_m$, which is +4.8% relative over Sherlock (+7.0% relative over TURL), and scores an F1 of .933 on $\mathcal{D}_w$, again the best among all methods. For predicting the pairwise column relation $\mathcal{R}$, TCN generates an F1 of .925 on $\mathcal{D}_m$, which is +4.1% relative over TURL, and scores an F1 of .941 on $\mathcal{D}_w$, also the best among all methods. This justifies the effectiveness of TCN's inter-table aggregation modules for capturing the contextual information across various types of implicitly connected tables. These inter-table contexts are complementary to the intra-table context, providing additional discriminative power in downstream tasks.

5.4 Ablation Study (RQ2)

To further validate the effectiveness of each inter-table aggregation module of TCN, we propose 4 model variants that include selected type(s) of inter-table aggregation and compare against them:
• TCN-intra: The base version only considers intra-table contexts, i.e., cells of the same column/row, using the intra-table aggregation $\mathrm{AGG}_a$ (Section 4.1.3) on each table independently.
• TCN-$\mathcal{N}_v$: This variant considers the inter-table context of value cells $\mathcal{N}_v$ (Section 4.2.1) besides the intra-table context.
• TCN-$\mathcal{N}_s$: Position cells $\mathcal{N}_s$ of the same schema position (Section 4.2.2) are considered as the inter-table context here.
• TCN-$\mathcal{N}_p$: Topic cells $\mathcal{N}_p$ of the same value as the target cell's page topic (Section 4.2.3) are considered in this variant.

[Figure 6: Relative improvements of TCN and variants over the base variant TCN-intra for predicting (a) column type $\mathcal{C}$ and (b) pairwise column relation $\mathcal{R}$ on the open domain dataset $\mathcal{D}_w$.]

The performance of TCN and its variants is presented in Table 3 and Figure 6. The base variant TCN-intra, which only considers intra-table contextual information, performs roughly on the same level as TURL, the best baseline model for tabular data representation learning. This demonstrates that the intra-table aggregation function $\mathrm{AGG}_a$ can successfully summarize the contextual information inside the target table from cells of the same column and row. We compare TCN's other variants against TCN-intra to evaluate the relative contribution of each inter-table aggregation module.

TCN-$\mathcal{N}_v$, which considers inter-table contexts from value cells, provides a significant increase in performance on both tasks. By only aggregating from value cells, TCN-$\mathcal{N}_v$ scores F1s of .916 and .921 for predicting $\mathcal{C}$ on the two datasets, the former +2.3% relative over the baseline Sherlock (a similar trend also holds for predicting $\mathcal{R}$). This indicates that value cells with the same value as the target cell consistently provide rich inter-table contextual information complementary to the intra-table contexts, and the value cells aggregation module of TCN is effective in learning from them.

TCN-$\mathcal{N}_s$, which considers position cells besides the intra-table context, gives high performance on both tasks on dataset $\mathcal{D}_m$ but is less helpful on dataset $\mathcal{D}_w$. This is because of the difference between the two datasets: $\mathcal{D}_m$ contains relational tables from semi-structured websites with rich connections between position cells, while $\mathcal{D}_w$ is constructed from open domain Wikipedia pages with sparse position cells. Given relatively densely connected relational table schemas, we found that aggregating from position cells can provide useful inter-table contextual information.

TCN-$\mathcal{N}_p$, which considers topic cells besides the intra-table context, generally gives good performance improvements on both tasks on the two datasets, comparable to the improvements provided by TCN-$\mathcal{N}_v$. This confirms that aggregating from topic cells of other tables provides TCN-$\mathcal{N}_p$ with additional contextual information. And, at last, by considering all three types of inter-table aggregation modules, the full version of TCN consistently improves upon TCN-intra, which focuses only on the intra-table context, on both tasks across datasets (e.g., +6.5% and +3.5% of F1 on $\mathcal{D}_m$).

Table 4: Performance of TCN and baselines under different training settings evaluated in terms of F1-weighted.

Method | D_m: Type C / Relation R | D_w: Type C / Relation R
TCN (supervised) | .938 / .925 | .933 / .941
TCN + TaBERT for pre-training | .948 / .933 | .942 / .949
TCN + TURL for pre-training | .945 / .934 | .946 / .953
TCN full w/ pre-training & fine-tuning | .957 / .946 | .951 / .960

5.5 Unsupervised Pre-training (RQ3)

Besides training the proposed TCN under the explicit supervision of column type and relation labels, we can also pre-train TCN on a large-scale unlabeled corpus of relational Web tables and fine-tune on downstream tasks (see Section 4.3.2). We also leverage two baseline methods (TaBERT and TURL) capable of pre-training on unlabeled relational tables to obtain initial cell embeddings as input for the supervised multi-task training of TCN.
We conduct comparative experiments on both datasets and present the results in Table 4. We can see that incorporating the pre-trained cell embeddings of the two baseline methods improves the final performance of TCN on both tasks of predicting column type $\mathcal{C}$ and pairwise column relation $\mathcal{R}$. The performance improvement gained by utilizing the pre-trained embeddings of TURL is similar to TaBERT's, since they both focus on effectively summarizing intra-table contextual information during the pre-training phase. In contrast to TaBERT and TURL, the proposed TCN can naturally incorporate different types of inter-table context during the pre-training phase without column type or relation labels. So, by combining the intra- and inter-table contexts learned during both the pre-training and fine-tuning phases, the proposed TCN scores F1s of .957 and .951 for predicting $\mathcal{C}$ on the two datasets (+2.0% and +1.9% relative over TCN without pre-training), and scores F1s of .946 and .960 for predicting $\mathcal{R}$ (+2.3% and +2.0% relative over TCN without pre-training).

5.6 Case Study (RQ4)

Our goal with TCN is to automatically discover the column types and pairwise relations between columns given a relational Web table and an external knowledge base. We provide concrete examples of the output of TCN to show its effectiveness and practical utility. We used a proprietary music ontology for dataset $\mathcal{D}_m$ in experiments, which includes column types such as "Release", "People", "Recording", "RecordLabel", and primitive "XMLSchema" types.

5.7 Hyper-parameter Sensitivity (RQ5)

We examine the impact of TCN's key hyper-parameters on the model's performance using dataset $\mathcal{D}_m$: (1) the sampling budget $b$ for the different types of inter-table connections ($|\mathcal{N}_v|$, $|\mathcal{N}_s|$, and $|\mathcal{N}_p|$), and (2) the mixture weight $\gamma$ of the overall objective (Eqn. (11)). We analyze these two hyper-parameters adopting a grid-search strategy over candidate values of $b$ and $\gamma$. Figure 7 presents the results.

[Figure 7: Sensitivity of TCN's performance to different values of sampling budget $b$ and objective mixture weight $\gamma$. (a) Increasing the sampling budget $b$ is generally beneficial until 20. (b) A choice of $\gamma$ in the range [0.2, 0.8] yields stable model performance.]

In Figure 7(a) we can see that increasing the value of $b$ from 2 to 10 noticeably improves the performance, because larger $b$ values allow TCN to learn from more inter-table connections and thus aggregate more contextual information across tables. But further increasing $b$ brings diminishing benefits, and large values such as 100 in turn hurt the performance because of additional noise. In Figure 7(b), we can see that TCN's performance is generally stable for $\gamma$ values in the range [0.2, 0.8]. For extreme cases of small (or large) values of $\gamma$, the supervised multi-task objective of TCN degenerates into a single-task setting because the gradients from one task are squashed, leading the model to focus only on either the column type or the pairwise relation prediction. In practice, we recommend finding the optimal value of $b$ based on the dataset and choosing a balanced value of $\gamma$ such as 0.5.

6 CONCLUSION

In this work, we proposed a novel approach for learning relational table latent representations. Our proposed method aggregates cells of the same column and row into the intra-table context.
CONCLUSIONS

In this work, we proposed a novel approach for learning latent representations of relational tables. Our method aggregates cells of the same column and row into the intra-table context. In addition, three types of inter-table contexts are aggregated: from value cells of the same value, position cells of the same schema position, and topic cells of the same value as the target cell's page topic. Extensive experiments on two real relational table datasets, from the open domain and from semi-structured websites, demonstrated the effectiveness of our model. We focus on relational Web tables of horizontal format in this work, but a huge number of tables with various other types and structures can be found on the Web. We consider handling complex table formats, such as different orientations, composite header cells, varying subject column indices, and nested cells, an interesting direction for future work. Applying the learned cell embeddings to other downstream tasks, such as cell entity linking, table type prediction, and table relation detection, would also be promising to explore.
ACKNOWLEDGMENTS
We thank all anonymous reviewers for their valuable comments. This research was supported in part by NSF Grant IIS-1849816.
REFERENCES
[1] Chandra Sekhar Bhagavatula, Thanapon Noraset, and Doug Downey. 2015. TabEL: Entity Linking in Web Tables. In International Semantic Web Conference. Springer, 425–441.
[2] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 1247–1250.
[3] Michael Cafarella, Alon Halevy, Hongrae Lee, Jayant Madhavan, Cong Yu, Daisy Zhe Wang, and Eugene Wu. 2018. Ten Years of WebTables. Proceedings of the VLDB Endowment 11, 12 (2018), 2140–2149.
[4] Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. 2008. WebTables: Exploring the Power of Tables on the Web. Proceedings of the VLDB Endowment 1, 1 (2008), 538–549.
[5] Michael J. Cafarella, A. Halevy, Y. Zhang, D. Wang, and E. Wu. 2008. Uncovering the Relational Web. In WebDB.
[6] J. Chen, I. Horrocks, E. Jimenez-Ruiz, and C. Sutton. 2019. Learning Semantic Annotations for Tabular Data. In IJCAI 2019. 2088–2094.
[7] Marco Cremaschi, Roberto Avogadro, and David Chieregato. 2019. MantisTable: An Automatic Approach for the Semantic Table Interpretation. SemTab ISWC (2019).
[8] Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. 2020. TURL: Table Understanding through Representation Learning. ArXiv abs/2006.14806 (2020).
[9] J. Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
[10] Xin Luna Dong, Hannaneh Hajishirzi, Colin Lockard, and Prashant Shiralkar. 2020. Multi-Modal Information Extraction from Text, Semi-Structured, and Tabular Data on the Web. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20). Association for Computing Machinery, New York, NY, USA, 3543–3544. https://doi.org/10.1145/3394486.3406468
[11] Vasilis Efthymiou, Oktie Hassanzadeh, Mariano Rodriguez-Muro, and Vassilis Christophides. 2017. Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings. In International Semantic Web Conference. Springer, 260–277.
[12] Ju Fan, Meiyu Lu, Beng Chin Ooi, Wang-Chiew Tan, and Meihui Zhang. 2014. A Hybrid Machine-Crowdsourcing System for Matching Web Tables. In 2014 IEEE 30th International Conference on Data Engineering (ICDE). IEEE, 976–987.
[13] Besnik Fetahu, Avishek Anand, and Maria Koutraki. 2019. TableNet: An Approach for Determining Fine-Grained Relations for Wikipedia Tables. In The World Wide Web Conference. 2736–2742.
[14] Anna Lisa Gentile, Petar Ristoski, Steffen Eckel, Dominique Ritze, and Heiko Paulheim. 2017. Entity Matching on Web Tables: A Table Embeddings Approach for Blocking. In EDBT. 510–513.
[15] Majid Ghasemi-Gol and Pedro A. Szekely. 2018. TabVec: Table Vectors for Classification of Web Tables. ArXiv abs/1802.06290 (2018).
[16] Ralph Grishman. 2015. Information Extraction. IEEE Intelligent Systems 30, 5 (2015), 8–15.
[17] Jonathan Herzig, P. Nowak, Thomas Müller, Francesco Piccinno, and Julian Martin Eisenschlos. 2020. TAPAS: Weakly Supervised Table Parsing via Pre-training. In ACL.
[18] Madelon Hulsebos, Kevin Hu, Michiel Bakker, Emanuel Zgraggen, Arvind Satyanarayan, Tim Kraska, Çagatay Demiralp, and César Hidalgo. 2019. Sherlock: A Deep Learning Approach to Semantic Data Type Detection. In Proceedings of the 25th ACM SIGKDD. 1500–1508.
[19] Yusra Ibrahim, Mirek Riedewald, and Gerhard Weikum. 2016. Making Sense of Entities and Quantities in Web Tables. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management. 1703–1712.
[20] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick Van Kleef, Sören Auer, et al. 2015. DBpedia – A Large-Scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web 6, 2 (2015), 167–195.
[21] Oliver Lehmberg and Christian Bizer. 2017. Stitching Web Tables for Improving Matching Quality. Proceedings of the VLDB Endowment 10, 11 (2017), 1502–1513.
[22] Oliver Lehmberg, Dominique Ritze, Robert Meusel, and Christian Bizer. 2016. A Large Public Corpus of Web Tables Containing Time and Context Metadata. In Proceedings of the 25th International Conference Companion on WWW. 75–76.
[23] Girija Limaye, Sunita Sarawagi, and Soumen Chakrabarti. 2010. Annotating and Searching Web Tables Using Entities, Types and Relationships. Proceedings of the VLDB Endowment 3, 1-2 (2010), 1338–1347.
[24] Colin Lockard, Xin Luna Dong, Arash Einolghozati, and Prashant Shiralkar. 2018. CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web. Proceedings of the VLDB Endowment 11, 10 (June 2018), 1084–1096. https://doi.org/10.14778/3231751.3231758
[25] Colin Lockard, Prashant Shiralkar, Xin Dong, and Hannaneh Hajishirzi. 2020. Web-scale Knowledge Collection. In Proceedings of the 13th International Conference on Web Search and Data Mining.
[26] Colin Lockard, Prashant Shiralkar, X. Dong, and Hannaneh Hajishirzi. 2020. ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages. In ACL.
[27] Colin Lockard, Prashant Shiralkar, and Xin Luna Dong. 2019. OpenCeres: When Open Information Extraction Meets the Semi-Structured Web. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. 3047–3056.
[28] Mausam Mausam. 2016. Open Information Extraction Systems and Downstream Applications. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI'16). AAAI Press, 4074–4077.
[29] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[30] Varish Mulwad, Tim Finin, and Anupam Joshi. 2013. Semantic Message Passing for Generating Linked Data from Tables. In International Semantic Web Conference. Springer, 363–378.
[31] Varish Mulwad, Tim Finin, Zareen Syed, Anupam Joshi, et al. 2010. Using Linked Data to Interpret Tables. In Proceedings of the First International Workshop on Consuming Linked Data.
[32] Emir Muñoz, Aidan Hogan, and Alessandra Mileo. 2014. Using Linked Data to Mine RDF from Wikipedia's Tables. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining. 533–542.
[33] Phuc Nguyen, Natthawut Kertkeidkachorn, Ryutaro Ichise, and Hideaki Takeda. 2019. MTab: Matching Tabular Data to Knowledge Graph Using Probability Models. arXiv preprint arXiv:1910.00246 (2019).
[34] Kyosuke Nishida, Kugatsu Sadamitsu, Ryuichiro Higashinaka, and Yoshihiro Matsuo. 2017. Understanding the Semantic Structures of Tables with a Hybrid Deep Neural Network Architecture. In Thirty-First AAAI Conference on Artificial Intelligence.
[35] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
[36] Dominique Ritze, Oliver Lehmberg, and Christian Bizer. 2015. Matching HTML Tables to DBpedia. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics. 1–6.
[37] Dominique Ritze, Oliver Lehmberg, Yaser Oulabi, and Christian Bizer. 2016. Profiling the Potential of Web Tables for Augmenting Cross-Domain Knowledge Bases. In Proceedings of the 25th International Conference on WWW. 251–261.
[38] Yoones A. Sekhavat, Francesco Di Paolo, Denilson Barbosa, and Paolo Merialdo. 2014. Knowledge Base Augmentation Using Tabular Data. In LDOW.
[39] Adelene Y. L. Sim and Andrew Borthwick. 2018. Record2Vec: Unsupervised Representation Learning for Structured Records. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 1236–1241.
[40] Bram Steenwinckel, Gilles Vandewiele, Filip De Turck, and Femke Ongenae. 2019. CSV2KG: Transforming Tabular Data into Semantic Knowledge. SemTab, ISWC Challenge (2019).
[41] Kunihiro Takeoka, Masafumi Oyamada, Shinji Nakadai, and Takeshi Okadome. 2019. Meimei: An Efficient Probabilistic Approach for Semantically Annotating Tables. In Proceedings of the AAAI Conference, Vol. 33. 281–288.
[42] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Advances in Neural Information Processing Systems. 5998–6008.
[43] Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Pasca, Warren Shen, Fei Wu, Gengxin Miao, and Chung Wu. 2011. Recovering Semantics of Tables on the Web. Proceedings of the VLDB Endowment 4, 9 (2011).
[44] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A Free Collaborative Knowledgebase. Commun. ACM 57, 10 (2014), 78–85.
[45] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S. Yu Philip. 2020. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems (2020).
[46] Keyulu Xu, Weihua Hu, J. Leskovec, and S. Jegelka. 2019. How Powerful Are Graph Neural Networks? ArXiv abs/1810.00826 (2019).
[47] Pengcheng Yin, G. Neubig, W. Yih, and S. Riedel. 2020. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. In ACL.
[48] Wenhao Yu, Zongze Li, Qingkai Zeng, and Meng Jiang. 2019. Tablepedia: Automating PDF Table Reading in an Experimental Evidence Exploration and Analytic System. In The World Wide Web Conference. 3615–3619.
[49] Li Zhang, Shuo Zhang, and Krisztian Balog. 2019. Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1029–1032.
[50] Shuo Zhang and Krisztian Balog. 2020. Web Table Extraction, Retrieval, and Augmentation: A Survey. ACM Transactions on Intelligent Systems and Technology (TIST) 11, 2 (2020), 1–35.
[51] Ziqi Zhang. 2017. Effective and Efficient Semantic Table Interpretation Using TableMiner+. Semantic Web 8, 6 (2017), 921–957.