Natural Language Engineering (2021), 1–22. doi:10.1017/xxxxx
ARTICLE
Generating Coherent and Diverse Slogans with Sequence-to-Sequence Transformer
Yiping Jin, Akshay Bhatia, Dittaya Wanvarie*, and Phu T. V. Le
Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand 10300
Knorex, 140 Robinson Road,
*Corresponding author. [email protected]
Abstract
Previous work in slogan generation focused on generating novel slogans by utilising templates mined from real slogans. While some such slogans can be catchy, they are often not coherent with the company's focus or style across their marketing communications because the templates are mined from other companies' slogans. We propose a sequence-to-sequence transformer model to generate slogans from a brief company description. A naïve sequence-to-sequence model fine-tuned for slogan generation is prone to introducing false information, especially unrelated company names appearing in the training data. We use delexicalisation to address this problem and improve the generated slogans' quality by a large margin. Furthermore, we apply two simple but effective approaches to generate more diverse slogans. Firstly, we train a slogan generator conditioned on the industry. During inference time, by changing the industry, we can obtain different "flavours" of slogans. Secondly, instead of using only the company description as the input sequence, we sample random paragraphs from the company's website. Surprisingly, the model can generate meaningful slogans, even if the input sequence does not resemble a company description. We validate the effectiveness of the proposed method with both quantitative and qualitative evaluation. Our best model achieved a ROUGE-1/-2/-L F1 score of 53.13/33.30/46.49. Besides, human evaluators assigned the generated slogans an average score of 3.39 on a scale of 1-5, indicating the system can generate plausible slogans with a quality close to human-written ones (average score 3.55).
1. Introduction
An attractive slogan is a key component in (online) advertisements. Slogans not only provide a highly succinct summary of the advertised product or service, but are also catchy and distinct. Their purpose is to attract viewers' attention and to encourage them to engage with the advertisements (ads) and to visit the advertisers' landing page.^a Human copywriters are specialised creative content creators who are responsible for composing slogans. This process requires deep domain knowledge of the advertised product or service and is very time-consuming.

Another factor we must consider is ads fatigue (Abrams and Vee (2007)). Even when an ad has good quality, its effectiveness will decrease over time after the users see the same ad repeatedly. This motivates advertisers to deliver highly personalised and contextualised ads (Vempati et al. (2020)). While advertisers can easily provide a dozen alternative images and use different ad layouts to create new ads dynamically (Bruce et al. (2017)), the advertising messages/slogans usually need to be manually composed.

Previous work in automatic slogan generation focused almost exclusively on modifying existing slogans by replacing part of the slogan with new keywords or phrases. This approach ensures that the generated slogans are well-formed and linguistically plausible by relying on slogan skeletons extracted from real slogans. However, viewers may sometimes notice the similarity between the generated slogan and the original one. E.g., the slogan skeleton "The NN of Vintage" expresses that the advertised product is elegant. It can instantiate "novel" slogans like "The Phone of Vintage" or "The Car of Vintage", both being well-formed, but the catchiness may decrease or even have a negative impact on the viewers if they see such similar slogans repeatedly.

We highlight another quality of slogans that was rarely discussed in the literature on slogan generation: cohesiveness. The slogan should be cohesive with the advertisers' online communication style and content, especially with the landing page. Some companies exploit clickbait (Papadopoulou et al. (2017)) to drive more visitors to their website. However, it will likely damage their brand image in the long run because viewers might feel deceived after realising that the landing page has nothing to do with the ad's message. The approach in previous work relies on existing slogans from other companies. There is no guarantee that they are cohesive with the landing page.

We propose a sequence-to-sequence (seq2seq) transformer model to generate slogans from a brief company description to address the weakness of previous work. The model generates distinct slogans because we do not use another company's slogan as a skeleton. We delexicalise the company name in both the descriptions and the slogans to prevent the model from inserting unsupported information, which proved to improve the generation quality drastically. We validate the effectiveness of the proposed method with both quantitative and qualitative evaluation. Our best model achieved a ROUGE-1/-2/-L F1 score of 53.13/33.30/46.49. Besides, a human evaluation also revealed that the slogans generated by our model are plausible and have quality close to human-written slogans.

Generating diverse slogans is crucial to avoid ads fatigue and enable personalisation.

^a The landing page is the web page users will be redirected to after clicking on the ad.
We propose two approaches to increase the diversity of generated slogans. Firstly, we train a model P(slogan | description, industry) conditioned additionally on the industry. During inference time, by changing the input industry, our model can focus on different aspects in the description and generate slogans for a different target audience. Secondly, we sample random paragraphs from the landing page and use them as inputs to the seq2seq model instead of the company's description.

The main contributions of this work are as follows:

• Applying a transformer-based encoder-decoder model to generate cohesive slogans from a short company description and further improving the performance using delexicalisation.
• Proposing novel approaches to generate diverse slogans by conditioning on different industries and sampled paragraphs on the landing page.
• Introducing a simple way to crawl and clean a large corpus for slogan generation and providing benchmark datasets and a competitive baseline for future work to compare with.
2. Related Work
We review the literature in two related fields: (1) slogan generation, and (2) sequence-to-sequence models.

2.1 Slogan Generation

A slogan is a catchy, memorable, and concise message used in advertising. Traditionally, slogans are composed by human copywriters, and it requires in-depth domain knowledge and creativity. Previous work in automatic slogan generation mostly focused on manipulating existing slogans by injecting novel keywords or concepts while maintaining certain linguistic qualities.

Özbal et al. (2013) proposed BrainSup,
the first framework for creative sentence generation that allows users to force certain words to be present in the final sentence and to specify various emotion, domain or linguistic properties. BrainSup generates novel sentences based on morpho-syntactic patterns automatically mined from a corpus of dependency-parsed sentences. The patterns serve as general skeletons of well-formed sentences. Each pattern contains several empty slots to be filled in. During generation, the algorithm first searches for the most frequent syntactic patterns compatible with the user's specification. It then fills in the slots using beam search and a scoring function that evaluates how well the user's specification is satisfied in each candidate utterance.

Tomašič et al. (2014) utilised similar slogan skeletons as BrainSup, capturing the POS tag and dependency type of each slot. Instead of letting the user specify the final slogan's properties explicitly, their algorithm takes a textual description of a company or a product as the input and parses for keywords and main entities automatically. They also replaced beam search with a genetic algorithm to ensure good coverage of the search space. The initial population is generated from random skeletons. Each generation is evaluated using a list of ten heuristic-based scoring functions before producing a new generation using crossovers and mutations. Specifically, the mutation is performed by replacing a random word with another random word having the same POS tag. Crossover chooses a random pair of words in two slogans and switches them, e.g. input: ["Just do it", "Drink more milk"] ⇒ ["Just drink it", "Do more milk"].

Gatti et al. (2015) proposed an approach to modify well-known expressions by injecting a novel concept from evolving news. They first extract the most salient keywords from the news and expand the keywords using WordNet and Freebase. When blending a keyword into well-known expressions, they check the word2vec embedding (Mikolov et al. (2013)) similarity between each keyword and the phrase it shall replace to avoid generating nonsense output. Gatti et al. (2015) also used dependency statistics similar to BrainSup to impose lexical and syntactic constraints. The final output is ranked by the mean rank of semantic similarity and dependency scores, thus balancing the relatedness and grammaticality. In subsequent work, Gatti et al. (2017) applied a similar approach to modify song lyrics with characterising words taken from daily news.

Iwama and Kano (2018) presented a Japanese slogan generator using a slogan database, case frames, and word vectors. The system achieved an impressive result in an ad slogan competition for human copywriters and was employed by one of the world's largest advertising agencies. Unfortunately, their approach involves manually selecting the best slogans from ten times larger samples, and Iwama and Kano (2018) did not provide any detailed description of their approach.

Munigala et al. (2018) introduced a system to generate persuasive sentences from the input product specification in the fashion domain. Their system first identifies fashion-related keywords from the input specifications and expands them to creative phrases. They then synthesise persuasive descriptions from the keywords and phrases using a large domain-specific neural language model (LM).
Instead of letting the LM generate free-form text, the candidates at each time step are limited to extracted keywords, expanded in-domain noun phrases and verb phrases, as well as common functional words. The LM minimises the overall perplexity with beam search. The generated sentence always begins with a verb to form an imperative and persuasive sentence. Munigala et al. (2018) demonstrated that their unsupervised system produced better output than a supervised LSTM encoder-decoder model. However, the encoder-decoder was trained on a much smaller parallel corpus of title text-style tip pairs compared to the corpus they used to train the language model.
Recently, Alnajjar and Toivonen (2020) proposed a slogan generation system based on generating nominal metaphors. The input to the system is a target concept (e.g., car) and an adjective describing the target concept (e.g., elegant). The system generates slogans involving a metaphor such as "The Car Of Stage", suggesting that the car is as elegant as a stage performance. Their system extracts slogan skeletons from existing slogans. Given a target concept T and a property P, the system identifies candidate metaphorical vehicles^b v. For each skeleton s and the ⟨T, v⟩ pair, the system searches for potential slots that can be filled. After identifying plausible slots, the system synthesises candidate slogans optimised using genetic algorithms similar to Tomašič et al. (2014).

Misawa et al. (2020) applied a Gated Recurrent Unit (GRU) (Cho et al. (2014)) encoder-decoder model to slogan generation. They argued that good slogans should not be generic but distinctive towards the target item. To enhance distinctiveness, they used a reconstruction loss (Niu et al. (2019)) by reconstructing the corresponding description from a slogan. A similar approach was used by Li et al. (2016) to increase the diversity of responses in dialogue systems and avoid always predicting a "safe" response such as "I don't know". Misawa et al. (2020) also employed a copying mechanism (See et al. (2017)) to handle out-of-vocabulary words occurring in the input sequence. Their proposed model achieved the best ROUGE-L score of 19.38,^c outperforming various neural encoder-decoder baselines.

Our approach is most similar to Misawa et al. (2020) in that we also employ an encoder-decoder framework. However, we differ from their work in two principled ways. Firstly, we use a more modern Transformer architecture (Vaswani et al. (2017)), which is the current state-of-the-art architecture family and outperforms recurrent neural networks in most language generation and language understanding benchmarks. We do not encounter the problem of generating generic slogans or out-of-vocabulary words (due to subword tokenisation). Therefore, the model is greatly simplified and can be trained using a standard cross-entropy loss. Secondly, we propose simple yet effective approaches to improve the consistency and diversity of generated slogans drastically.

2.2 Sequence-to-Sequence Models

Sutskever et al. (2014) presented a seminal sequence learning framework using multi-layer Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber (1997)). The framework encodes the input sequence to a vector of fixed dimensionality, then decodes the target sequence based on the vector. This framework enables learning sequence-to-sequence (seq2seq)^d tasks where the input and target sequence are of different lengths. Sutskever et al. (2014) demonstrated that their simple framework achieved close to state-of-the-art performance in an English to French translation task.

The main limitation of Sutskever et al. (2014) is that the performance degrades drastically when the input sequence becomes longer. This is because of unavoidable information loss when compressing the whole input sequence to a fixed-dimension vector. Bahdanau et al. (2015) and Luong et al. (2015) overcame this limitation by introducing the attention mechanism to the LSTM encoder-decoder. The model stores a contextualised vector for each time step in the input sequence. During decoding, the decoder computes the attention weights dynamically to focus on different contextualised vectors.
The attention mechanism overtook the previous state-of-the-art in English-French and English-German translation and yields much more robust performance for longer input sequences. LSTM, or more generally recurrent neural networks, cannot be fully parallelised on modern GPU hardware because of an inherent temporal dependency: the hidden states need to be computed one step at a time. Vaswani et al. (2017) proposed a new architecture, the Transformer, which is based solely on multi-head self-attention and feed-forward layers. They also introduced "positional encodings", which are added to the input embeddings to allow the model to use the sequence's order. The model achieved a new state-of-the-art performance, albeit taking a much shorter time to train than LSTM with the attention mechanism.

^b A metaphor has two parts: the tenor (target concept) and the vehicle. The vehicle is the object whose attributes are borrowed.
^c The result was reported on a Japanese corpus, so it is not directly comparable to our work.
^d We use sequence-to-sequence and encoder-decoder interchangeably in this paper.

Devlin et al. (2019) argued that the standard Transformer (Vaswani et al. (2017)) suffers from the limitation that it is unidirectional and every token can only attend to previous tokens in the self-attention layers. To this end, they introduced BERT, a bidirectional transformer pre-trained using a masked language model (MLM) objective. MLM masks some random tokens with a [MASK] token and provides a bidirectional context for predicting the masked tokens. Besides, Devlin et al. (2019) used the next sentence prediction task as an additional pre-training objective.

Despite achieving state-of-the-art results on multiple language understanding tasks, BERT does not make predictions auto-regressively, reducing its effectiveness for generation tasks. Lewis et al. (2020) presented BART, a model combining a bidirectional encoder (similar to BERT) and an auto-regressive decoder. This combination allows BART to capture rich bidirectional contextual representation and yield strong performance in language generation tasks. Besides MLM, Lewis et al. (2020) introduced new pre-training objectives, including masking text spans, token deletion, sentence permutation, and document rotation. These tasks are particularly suitable for a seq2seq model like BART because there is no one-to-one correspondence between the input and target tokens.

Zhang et al. (2020) employed an encoder-decoder transformer architecture similar to BART. They introduced a novel pre-training objective specifically designed for abstractive summarisation. Instead of masking single tokens (like BERT) or text spans (like BART), they mask whole sentences (referred to as "gap sentences") and try to reconstruct these sentences from their context. Zhang et al. (2020) demonstrated that the model performs best when using important sentences selected greedily based on the ROUGE F1 score between the selected sentences and the remaining sentences. Their proposed model PEGASUS achieved state-of-the-art performance on all 12 summarisation tasks they evaluated. Their model also performed surprisingly well in a low-resource setting due to the relatedness of the pre-training task and abstractive summarisation.

While large-scale transformer-based language models demonstrate impressive text generation capabilities, users cannot easily control particular aspects of the generated text.
Keskar et al. (2019) proposed CTRL, a conditional transformer language model conditioned on control codes that influence the style and content. Control codes indicate the domain of the data, such as Wikipedia, Amazon reviews and subreddits focusing on different topics. Keskar et al. (2019) use naturally occurring words as control codes and prepend them to the raw text prompt. Formally, given a sequence of the form x = (x_1, ..., x_n) and a control code c, CTRL learns the conditional probability p_θ(x_i | x_{<i}, c). By changing or mixing control codes, CTRL can generate novel text with very different style and content.

In this work, we use the BART model architecture due to its flexibility as a seq2seq model and competitive performance on language generation tasks. We were also inspired by CTRL and applied a similar idea to generate slogans conditioned on different industries.
3. Method
We apply a Transformer-based sequence-to-sequence (seq2seq) model to generate slogans. The model's inputs are a short company description, the company name (used in delexicalised models), and the industry the company belongs to (used in conditional models). This section describes the model architecture, the fine-tuning procedure, and two methods we used to generate more coherent slogans, namely delexicalisation and conditional language generation.
3.1 Model Architecture

We choose a BART-style encoder-decoder model (Lewis et al. (2020)) with a bidirectional encoder and an autoregressive (left-to-right) decoder. BART enjoys the benefit of capturing bidirectional context representation like BERT and is particularly strong in language generation tasks.

Following Lewis et al. (2020), we use two different model sizes. For the small model, we use distilBART^e with 6 layers of encoders and decoders each and 230M parameters. The model is a distilled version of BART-large trained by the HuggingFace team, and its size is equivalent to BART-base. The large model we use is BART-large.^f It has 12 layers of encoders and decoders and 406M parameters. To reduce the computation and speed up the iteration time, we use the small model throughout this work except in Section 5.2, where we study the impact of a larger model and dataset size.

3.2 Fine-Tuning

We observe that seq2seq slogan generation from the corresponding description is analogous to abstractive summarisation. Therefore, we initialise the model's weights from a model fine-tuned for summarisation on the CNN/DailyMail dataset (Hermann et al. (2015)) instead of from a model pre-trained using unsupervised learning objectives. We freeze up to the second last encoder layer (including the embedding layer) and fine-tune the last encoder layer and the decoder. We do not fine-tune the whole model for two reasons: 1) we do not want the model to unlearn its ability to perform abstractive summarisation, and 2) by freezing a large portion of the parameters, we require much less RAM and can train using a much larger batch size.

^e https://huggingface.co/sshleifer/distilbart-cnn-6-6
^f https://huggingface.co/facebook/bart-large-cnn
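As a concrete illustration, the initialisation and partial freezing can be sketched with the HuggingFace transformers API. This is our sketch, not the authors' released code, and it assumes BART's internal module layout (model.model.shared, model.model.encoder.layers):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # distilBART checkpoint already fine-tuned on CNN/DailyMail (footnote e).
    checkpoint = "sshleifer/distilbart-cnn-6-6"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # Freeze the shared embeddings and all encoder layers except the last one;
    # only the last encoder layer and the whole decoder remain trainable.
    for param in model.model.shared.parameters():
        param.requires_grad = False
    for layer in model.model.encoder.layers[:-1]:
        for param in layer.parameters():
            param.requires_grad = False

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable:,}")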
3.3 Delexicalisation

Lewis et al. (2020) and Matsumaru et al. (2020) highlighted that a major weakness of transformer models is introducing unsupported information. We do observe this problem in our initial experiment. In particular, the model sometimes inserts an unrelated company name into the slogan. Table 1 shows an example where the seq2seq model generated slogans containing irrelevant company names. We verified that both Buxfer and Voyant Inc. are real U.S.-based companies providing similar financial advisory services as AEON. This type of error is disastrous because nothing can be worse than generating a plausible slogan and attaching it to one of the advertiser's competitors.

Table 1: An example of wrongly inserting unrelated company names (Buxfer, Voyant Inc.). The input to the seq2seq model is the description alone.
Company Name: AEON Credit Service Malaysia
Description: We offer a range of services including the issuance of Credit Cards, Easy Payment schemes, Personal Financing and Insurance.
Wrong Output 1: Credit Cards, Personal Financing and Insurance | Buxfer
Wrong Output 2: Voyant Inc. | Offering Credit Cards, Rewards, Travel, Offers
A further investigation revealed that both Buxfer and Voyant Inc. are in our training dataset of description-slogan pairs. The problem of inserting unrelated company names is more likely to occur when the training dataset is relatively small, where the model tends to overfit by memorising.

We address this problem with a simple but effective treatment: delexicalisation. Delexicalisation replaces surface text with a generic mask token in both the input and target sequence. After the model generates an output sequence, any mask token is substituted with the original surface text.^g Delexicalisation has been used in RNN-based dialogue generation (Wen et al. (2015)). It is critical for handling rare entities that might never occur in the training dataset. Consider a dialogue task of ordering food: the dialogue act request(food="Tom Yum Kung")^h is trivial for humans. They can say "May I have a TYK?" or "I'd like a TYK". However, the words in "Tom Yum Kung" might not even be in the RNN model's vocabulary, causing a naïve RNN (without subword tokenisation or a copying mechanism) to fail. Even if they are in the vocabulary, the model also tends to assign very low probabilities to these rare words.

With delexicalisation, surface texts of various company names are substituted by one token "[COMPANY]". This treatment prevents the model from generating unrelated company names and allows the model to easily learn diverse contexts around the delexicalised company names in both the description and the slogan.

The company name is readily available in our system because it is required when any new advertiser registers for an account. However, we notice that companies often use their shortened names instead of their official/legal name. Examples are "Google LLC", almost exclusively referred to as "Google", and "Prudential Assurance Company Singapore (Pte) Limited", often referred to as "Prudential". Therefore we use a prefix word matching algorithm to perform delexicalisation. The process is illustrated in Algorithm 1 (we omit the details handling the case and punctuation in the company name for simplicity). Besides the delexicalised text, the algorithm also returns the surface text of the delexicalised company name, which will replace the mask token during inference. It is also possible to use a more sophisticated approach to perform delexicalisation, such as relying on a knowledge base or company directory such as Crunchbase to find alternative company names. However, the simple substitution algorithm suffices for our use case. Table 2 shows an example description and slogan before and after delexicalisation.
Algorithm 1: Prefix matching for delexicalising company names. This is performed for both the description and the slogan.

Input: company_name, text, MASK_TOKEN
Result: delexicalised_text, surface_form

delexicalised_text = text;
surface_form = company_name + " ";
while surface_form.contains(" ") do
    surface_form = surface_form.substring(0, surface_form.lastIndexOf(" "));
    if text.contains(surface_form) then
        delexicalised_text = text.replace(surface_form, MASK_TOKEN);
        break;
    end
end

^g If we do not want the company name to appear in the slogan, we can remove the [COMPANY] token if it does not appear in the same clause as the slogan, e.g. "[COMPANY] | [SLOGAN]" or "[SLOGAN] - [COMPANY]".
^h Tom Yum Kung (TYK) is a spicy and sour Thai soup cooked with shrimp, one of the favourite dishes in Thailand.
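Algorithm 1 translates almost line-for-line into Python. The following sketch is our illustration (the function name is ours; case and punctuation handling are omitted, as in the pseudocode):

    def delexicalise(company_name, text, mask_token="[COMPANY]"):
        """Return (delexicalised_text, surface_form) via prefix matching."""
        delexicalised_text = text
        # The trailing space makes the first loop iteration test the full name
        # before progressively dropping the last word, e.g.
        # "Prudential Assurance Company ..." -> ... -> "Prudential".
        surface_form = company_name + " "
        while " " in surface_form:
            surface_form = surface_form[: surface_form.rindex(" ")]
            if surface_form in text:
                delexicalised_text = text.replace(surface_form, mask_token)
                break
        return delexicalised_text, surface_form

    # Example mirroring Table 2:
    print(delexicalise(
        "Huawei Technologies Group Co., Ltd",
        "Huawei - Building a Fully Connected, Intelligent World",
    ))
    # -> ('[COMPANY] - Building a Fully Connected, Intelligent World', 'Huawei')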
Table 2: An example description and slogan before and after performing delexicalisation

Company Name: Huawei Technologies Group Co., Ltd
Description: Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices.
Slogan: Huawei - Building a Fully Connected, Intelligent World
Surface Form: Huawei
Delexicalised Description: [COMPANY] is a leading global provider of information and communications technology (ICT) infrastructure and smart devices.
Delexicalised Slogan: [COMPANY] - Building a Fully Connected, Intelligent World

3.4 Conditioning on the Industry

While slogans are diverse and creative, we hypothesise that there is some common style or focus in slogans of companies from the same industry. For example, automotive dealers mention their location much more often than software companies; a health care provider is more likely to mention their accreditation than a hotel. Table 3 exemplifies typical slogans for different industries.

Table 3: Typical slogans for different industries
Industry | Company Name | Slogan
Automotive | Mills Body Shops | Auto Repair and Body Shop Evansville, IN
Automotive | Apex Car Rental | Affordable Car Hire & Van Hire In Norwich
Computer Software | Limine Solutions | Limine - Next Generation Evidence Management and Presentation Software
Computer Software | SoftWorks AI | OCR and Document Capture Experts
Hospital & Health Care | Animas Surgical Hospital | Physician-Owned Hospital in Durango, CO
Hospital & Health Care | Today's Vision TX | Optometrists Providing Eyeglasses, Contact Lenses & Eye Exams
Hospitality | Savage River Lodge | Luxury cabins and yurts in Western Maryland
Hospitality | Thai Garden Resort | A Tropical Paradise in Pattaya
To exploit the industry information, we modify the generation from P(slogan | description) to P(slogan | description, industry) by conditioning additionally on the industry. We prepend the industry name to the input sequence, similar to the control code in CTRL (Keskar et al. (2019)), without a special token separating the industry and the input sequence (like the [SEP] token in BERT). Our method differs from Keskar et al. (2019) in two slight ways: 1) CTRL uses an autoregressive transformer similar to GPT-2 (Radford et al. (2019)) while we use an encoder-decoder transformer with a bi-directional encoder. 2) The control codes were used during pre-training in CTRL while we prepend industry names only during fine-tuning for slogan generation.
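Since no separator token is used, building the conditional input amounts to plain string concatenation. A minimal sketch (build_input is a hypothetical helper name, not from the paper):

    def build_input(description, industry=None):
        # Prepend the industry name as a CTRL-style control code; no special
        # separator token stands between the code and the description.
        return f"{industry} {description}" if industry else description

    print(build_input("We offer Credit Cards, Personal Financing and Insurance.",
                      industry="Financial Services"))
    # -> "Financial Services We offer Credit Cards, Personal Financing and Insurance."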
4. Datasets
While there are some online slogan databases such as Textart.ru and Slogans Hub,^j they contain at most hundreds to thousands of slogans, which are too few to form a training dataset, especially for a general slogan generator not limited to a particular domain. Besides, these databases do not contain company descriptions. Some even provide a list of slogans without specifying their corresponding company or product. They might be used to train a language model producing slogan-like utterances (Boigne (2020)), but this will not be of much practical use because we do not have control over the generated slogan's content.

We observe that many companies' websites use their company name plus their slogan as the HTML page title (displayed when mousing over the tab in a browser). Examples are "Skype | Communication tool for free calls and chat" and "Virgin Active Health Clubs - Live Happily Ever Active". Besides, many companies also provide a brief description in the "description" field in the HTML <meta> tag. Therefore, our model's input and output sequence can potentially be crawled from companies' URLs.

We crawl the title and description field in the HTML <meta> tag using the Beautiful Soup library^l from the first 4.8 million company URLs in the Kaggle 7+ Million Company Dataset. The dataset provides many additional fields, but we utilise only the company name, URL and industry in this work. The crawling took around 30 days to complete using a cloud instance with two vCPUs. Out of the 4.8M companies, we were able to crawl both the <meta> tag description and the page title for 1.13M companies, which is at least three orders of magnitude larger than any publicly-available slogan database. However, this dataset contains much noise, for the apparent reason that not all companies include their slogan in their HTML page title. We perform the following steps to clean/filter the noise in the dataset:

(1) Delexicalise the company name in both the description and the slogan (HTML page title).
(2) Deduplicate the slogans and keep only the first occurring company if multiple companies have the same slogan.
(3) Remove all non-alphanumeric characters at the beginning and the end of the slogan.
(4) Filter by blocked keywords/phrases. Sometimes the crawling is blocked by a firewall, and the returned title is "Page could not be loaded" or "Access to this page is denied". We did a manual analysis of a large set of HTML page titles and came up with a list of 50 such blocked keywords/phrases.
(5) Remove prefix or suffix phrases indicating the structure of the website, such as "Homepage - ", " | Welcome page", "About us".
(6) Filter based on the length of the description and the slogan. The slogan must contain between 20 and 100 characters while the description must contain at least 30 characters. When counting the length, the company name is replaced by an empty string. This is needed because some page titles contain only the company name; if the company name is long, it might otherwise pass the filter.
(7) Concatenate the description and the slogan and detect their language using an open-source library.^n We keep the data only if its detected language is English.
(8) Filter based on lexicographical features: the occurrence of " | " must not exceed one, the total punctuation count must not exceed three, the company name must not be mentioned more than once, and the longest word sequence without any punctuation must be at least four words. We come up with these rules based on an analysis of a large number of HTML page titles.
(9) Filter based on named entity tags. We use Spacy^o to perform named entity recognition. Many companies' HTML page titles contain a long list of locations where they operate. We discard a title if over 30% of its surface text consists of named entities with the tag "GPE".

^j https://sloganshub.org/
^l https://www.crummy.com/software/BeautifulSoup/
^n https://github.com/shuyo/language-detection
^o https://spacy.io/
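A minimal sketch of the crawling step described above, assuming only that the page title and the <meta> description are wanted (retry handling and the keyword blocklist from the cleaning steps are omitted):

    import requests
    from bs4 import BeautifulSoup

    def crawl_title_and_description(url):
        """Fetch one company URL; return (page_title, meta_description)."""
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        title = soup.title.get_text(strip=True) if soup.title else None
        meta = soup.find("meta", attrs={"name": "description"})
        description = meta.get("content", "").strip() if meta else None
        return title, description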
The total number of (description, slogan) pairs after all the cleaning and filtering steps is 261k. We reserve roughly 2% of the dataset each for validation and test. The remaining 96% is used for training. The validation set contains 5,011 pairs. For the test set, the first author of this paper manually curated 1,000 companies with their descriptions and slogans, ensuring the slogans are clean and plausible. We publish our validation dataset and manually-curated test dataset for future comparisons.^p The full training dataset D_full contains 251k companies. To reduce the computation, we randomly sample a smaller training dataset containing 121k companies, denoted as D_120k, to evaluate different models in Section 5.1.

^p https://github.com/YipingNUS/slogan-generation-dataset

We perform some data analysis on the D_120k dataset to better understand the data. We first tokenise the dataset with distilBART's default tokeniser (it uses subword tokenisation). Figure 1 shows the distribution of the number of tokens in slogans and descriptions. Based on the distribution, we choose a maximum sequence length of 64 for the description and 32 for the slogan.

Figure 1: Distribution of the number of tokens in (a) slogans, and (b) descriptions.

There are 149 unique industries in the dataset. Figure 2 shows the distribution of the number of companies belonging to each industry on a log-10 scale. As we can see, most industries contain between 10^2 and 10^3.5 companies. Table 4 shows the most frequent ten industries with the number of companies and the percentage in the D_120k dataset.

Figure 2: Distribution of the number of companies belonging to each industry in log-10 scale.

Table 4: The most frequent ten industries in the D_120k dataset

Industry | #Companies | %
Information Technology and Services | 9,539 | 7.9%
Computer Software | 5,749 | 4.7%
Marketing and Advertising | 5,581 | 4.6%
Real Estate | 4,358 | 3.6%
Internet | 3,836 | 3.2%
Financial Services | 3,590 | 3.0%
Construction | 3,528 | 2.9%
Automotive | 3,433 | 2.8%
Hospital & Health Care | 3,105 | 2.6%
Hospitality | 2,735 | 2.3%
5. Experiments
We conduct a comprehensive evaluation of our proposed method. In Section 5.1, we conduct a quantitative evaluation and compare our proposed methods with other rule-based and encoder-decoder baselines in terms of ROUGE-1/-2/-L F1 scores. In Section 5.2, we scale up the model and the dataset and study the impact on the performance. We conduct a human evaluation in Section 5.3 to further validate the quality of the slogans generated by our model.

5.1 Quantitative Evaluation

We use the distilBART and BART-large implementations in the Hugging Face library (Wolf et al. (2019)). We use a training batch size of 64 for distilBART and 32 for BART-large, which are the maximum batch sizes we can fit in an Nvidia Quadro P5000 GPU (16 GB vRAM). We use a cosine decay learning rate schedule with warm-up (He et al. (2019)) and a maximum learning rate of 1e-4. The learning rate is chosen with Fastai's learning rate finder (Howard and Gugger (2020)).

We train all BART models for three epochs. Based on our observation, the models converge within around 2-3 epochs. To ensure the results are reproducible, we use greedy decoding during evaluation, where at each time step the model deterministically outputs the token with the maximum probability. We also add a repetition penalty θ during decoding.

Besides seq2seq transformers with or without delexicalisation and conditional generation, we compare with four baselines that do not require additional input other than a short description in natural language. In contrast, Özbal et al. (2013) requires explicitly specified constraints such as keywords to be included or specific linguistic features, Gatti et al. (2015) requires external news articles, and Alnajjar and Toivonen (2020) takes a target concept and a describing adjective as input. While some of our models also use the company name or the industry as input, these are readily available in both the dataset and real-world advertising systems where the system is applied.

• first-k words: taking the first k words from the description and predicting them as the slogan. We choose the k that yields the highest ROUGE-1 F1 score on the validation dataset. This is similar to the first-sentence baseline, which is simple but surprisingly competitive (Katragadda et al. (2009)). Since many short descriptions contain a single sentence, we select the first k words instead.
• Misawa et al. (2020): a GRU encoder-decoder model for slogan generation with an additional reconstruction loss to generate distinct slogans and a copying mechanism to handle unknown words. We follow the hyper-parameters in Misawa et al. (2020) closely. Specifically, the model has a single layer for both the bi-directional encoder and the auto-regressive decoder. We apply a dropout of 0.5 between the two layers. The embedding dimension and the hidden dimension are 200 and 512 respectively, and the vocabulary contains the 30K most frequent words. We use the Spacy library to perform word tokenisation. The embedding matrix is randomly initialised and trained jointly with the model. We use the Adam optimiser with a learning rate of 1e-3 and train for 10 epochs (the GRU enc-dec models take more epochs to converge than the transformer models, likely because the models are randomly initialised).
• Encoder-Decoder (Bahdanau et al. (2015)): a strong and versatile baseline for sequence learning tasks. We use identical hyper-parameters as Misawa et al. (2020) and remove the reconstruction loss and copying mechanism to make the two models directly comparable.
• Encoder-Decoder + copy (See et al. (2017)): the encoder-decoder model with a copying mechanism to handle unknown words, equivalent to Misawa et al. (2020) with the reconstruction loss removed.

Table 5 presents the ROUGE-1/-2/-L scores of the various models on both the validation and the manually-curated test dataset.

Table 5: The ROUGE F1 scores for various models on the validation dataset and the test dataset. Best ROUGE scores and scores within 0.15 of the best numbers are bolded (following Zhang et al. (2020)).

Model | Valid R1 | Valid R2 | Valid RL | Test R1 | Test R2 | Test RL
First k words (k = ) | 38.53 | 21.40 | 34.03 | 37.08 | 20.00 | 32.89
Misawa et al. (2020) | 19.35 | 5.98 | 18.36 | 19.32 | 5.86 | 18.36
Encoder-Decoder | 16.45 | 4.89 | 15.58 | 16.74 | 4.67 | 15.77
Encoder-Decoder+copy | 17.27 | 4.65 | 16.37 | 17.22 | 4.48 | 16.31
distilBART | 48.56 | 28.05 | 42.32 | 42.12 | 21.30 | 36.43
distilBART+delexicalisation | | | | | |
distilBART+delexicalisation+conditional | | | | | |

The first-k words baseline achieved a reasonable performance, showing that there is a certain degree of overlap between slogans and descriptions. However, its performance lagged behind all of the seq2seq transformer models. Figure 3 shows how the first-k words baseline's ROUGE F1 scores change with varying k. It is obvious that a larger k is not always better: the best ROUGE scores are achieved when k is in the range (9, 12).

Figure 3: The ROUGE-1/-2/-L scores of the first-k words baseline by predicting the first k words in the description as the slogan.

Comparing the three GRU encoder-decoder baselines, it is clear that both the copying mechanism and the reconstruction loss Misawa et al. (2020) employed improved the models' ROUGE-1 and ROUGE-L scores substantially. However, their performance pales in comparison with any transformer-based model or even the first-k words baseline. Table 6 provides a more intuitive overview of the various models' performance by showing the generated slogans from the same inputs.

Table 6: Sample generated slogans from various systems. "Gold" is the original slogan of the company. The distilBART model uses neither delexicalisation nor conditional generation, so its input is precisely the same as that of the other two baselines.
Gold: Our strategies are proven to create and improve culture
First-k words: Discover how FireSeeds can grow and scale your organization by helping
Misawa et al. (2020): The World's Leading Independent Brand Communications Agency
distilBART: FireSeeds | Recruiting, Growth & Retaining Leaders in a Culture That Works

Gold: Cross-Border Payments Made Simple
First-k words: A FinTech enabling cross-border home and commercial payments through a state-of-the-art
Misawa et al. (2020): Smart Payment Solutions for Everyone
distilBART: FinTech - A state-of-the-art platform for cross border payments

Gold: Home of ED Triage Courses and Consulting for Process Improvement
First-k words: ED triage education and consulting tailored to your exact needs! Triage
Misawa et al. (2020): Education | Consulting and Consulting Services
distilBART: ED triage education and consulting tailored to your exact needs! | Medi-Eds

Gold: Bright Ideas To Make Life Easier
First-k words: Spring Chicken offer a range of living aids that help you
Misawa et al. (2020): Senior Living | Senior Living | Spring Chicken
distilBART: Spring Chicken | Living Aids That Help You Live Life To The Fullest

Gold: Commercial Property Investment Advisors - Michael Elliott
First-k words: Over £50bn of real estate transacted since 1985 in London commercial
Misawa et al. (2020): | Real Estate Agents in Reading, London
distilBART: London Commercial Property Investment Advisors | The Riggs Group
We can observe that while the first-k words baseline sometimes has substantial word overlap with the original slogan, its style is often different from slogans. Misawa et al. (2020) occasionally generates novel slogans such as the second example in Table 6. However, it suffers from generating repetitions (the third and the fourth example) and irrelevant slogans (the first example). In the first example, the company is a human resource company. None of the unigrams in the generated slogan by Misawa et al. (2020) occur in the description. The model likely generates this slogan because there are many advertising companies in the training data. Compared to Misawa et al. (2020), distilBART generates more specific and coherent slogans. It copies part of the phrases from the description when appropriate and composes novel slogans in other cases. Its output is also more fluent and rarely contains repetitions. In the third and fifth example, the model outputs a wrong company name, which can be avoided by the delexicalised model.

Delexicalisation improved the ROUGE scores on the test set drastically by 8-10%. Conditional language generation yielded a further marginal but consistent improvement. The best distilBART model achieved a ROUGE-1/-2/-L score of 51.68/31.36/44.35.

5.2 Scaling Up the Model and the Dataset

Following Lewis et al. (2020) and Zhang et al. (2020), we present the performance of a larger model, BART-large. Additionally, we also scale up the training dataset size by using D_full, which is roughly double the size of D_120k.

Table 7: The ROUGE F1 scores by scaling up the model size and the training dataset size. All models use delexicalisation and are conditioned on the industry.

Model, dataset | params | epoch time | Valid R1 | Valid R2 | Valid RL | Test R1 | Test R2 | Test RL
distilBART, D_120k | | | | | | | |
BART-large, D_120k | | | | | | | |
distilBART, D_full | | | | | | | |
BART-large, D_full | | | | | | | |

Our result seems to confirm the recent trend that the larger the model size and training data size, the better the performance (Xia et al. (2020)). However, in most cases, distilBART and BART-large output very close if not identical slogans. Considering that BART-large consumes twice as much RAM and takes almost twice as much time for both training and inference, users might prefer the more compact distilBART model if they are willing to sacrifice roughly 1% in ROUGE score. Our experiments also provide insights such as: if the training time is of concern, increasing the model size yields a more significant improvement than increasing the dataset size (comparing row 2 and row 3 in Table 7).
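The paper does not name its ROUGE implementation; one plausible way to reproduce the F1 computation is Google's rouge-score package:

    from rouge_score import rouge_scorer  # pip install rouge-score

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    scores = scorer.score(
        "Huawei - Building a Fully Connected, Intelligent World",  # reference
        "Huawei | Building an Intelligent, Connected World",       # hypothesis
    )
    for name, score in scores.items():
        print(name, round(score.fmeasure, 4))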
5.3 Human Evaluation

We conducted a human evaluation of the generated slogans and the companies' original slogans to further validate the generated slogans' quality. We randomly sampled 300 companies from the test dataset (recall that the test set has been manually curated to ensure it consists of plausible slogans) and invited two human annotators to score the slogans independently. The two annotators both completed their undergraduate study at a U.S. university and are proficient in English. They were not involved in developing the models in this work.

We use the BART-large model trained on D_full in Section 5.2 to generate slogans. The annotators scored the 600 slogans (300 original slogans + 300 generated slogans) on a 1-5 scale, with higher being better. The original and generated slogan for the same company were displayed consecutively to improve intra-rater reliability. However, we randomised the two slogans' order so the annotators could not guess which slogan is system-generated. The annotation guideline is shown in Appendix A and the annotation UI is presented in Appendix B.

Figure 4: (a) Confusion matrix of scores assigned by the two annotators, and (b) histogram of the absolute difference between the scores assigned by the two annotators.

Figure 4a shows the confusion matrix of the scores assigned by the two annotators for all 600 slogans. While over 80% of the time the scores they assigned differ by no more than 1 (Figure 4b), they do not always assign precisely the same score. This is likely because of the subjective nature of the task. A slogan can be perfectly clear and concise; one annotator might assign it a score of 5 while the other might assign 4 because he thinks it is not catchy enough. We believe different opinions of the annotators should be acknowledged instead of avoided. The overall inter-annotator agreement is 42%, while the mean absolute deviation between the two annotators is 0.77. We discard the 13 slogans for which the scores assigned by the two annotators differ by 3, because they are likely to be unreliable. There are 288 remaining companies with both original and generated slogans.

Figure 5 plots the score of the original slogan against the score of the generated slogan. The radius represents the count while the colour indicates which slogan has a higher score. We can observe that annotator 2 tends to assign scores in a broader range while annotator 1 assigns most slogans a score between 3 and 4. Both annotators tend to assign more 5s to the original slogans, which is why the mean score for the original slogans, S_orig, is slightly higher than that for the generated slogans, S_bart.

We take the average of the two annotators' scores as the final score for each slogan. The final average scores are S_orig = 3.55 and S_bart = 3.39. Table 8 shows the examples where the original and the generated slogans have the largest score gap. A general observation is that the BART model often generates a noun phrase as the slogan, such as "A leading independent stockbroker in the UK" and "Multi-Currency Travel Mastercard" (the noun phrase heads being "stockbroker" and "Mastercard"). This is likely because most slogans in our dataset are of this form. However, we do notice the model capturing catchy expressions like "The Future of [placeholder] is ..." and generating slogans such as "The Future of Disinfection is Here!", "The Future of Loyalty is Now", and "The Future of AI is Here!".

We further conducted a paired t-test and obtained a p-value of 0.0062. At the significance level of p < 0.01, the difference in the scores is statistically significant, which means our model's output slogans do not reach human level yet. Nevertheless, Figure 5 shows that the annotators assign the same or higher scores to the system-generated slogans in many cases, and the gap is not very large.
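The significance test is a standard paired t-test over the per-company score pairs; a sketch with simulated stand-in scores (the 288 annotator-averaged scores are not published inline, so the data below is illustrative only):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Stand-in data drawn around the reported means, for illustration only.
    orig_scores = np.clip(rng.normal(3.55, 0.8, 288), 1, 5)
    bart_scores = np.clip(rng.normal(3.39, 0.8, 288), 1, 5)

    t_stat, p_value = stats.ttest_rel(orig_scores, bart_scores)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # the paper reports p = 0.0062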
Figure 5: Scatter plots of scores for the original and the generated slogans, shown separately for (a) annotator 1 and (b) annotator 2. The radius represents the count. Green dots denote that the generated slogan received a higher score than the original one, red dots denote that the original slogan received a higher score, and orange dots denote that the two received the same score.

Table 8: Examples where either the original or the generated slogans are superior by a large margin. The company names are removed from both original and generated slogans.

Original Slogan | Generated Slogan | S_orig : S_bart
Great Performances Start With Great Service | West Michigan's Full Service Backline Rental | 5 : 1.5
Pragmatic Solutions to Recycling Today | Sustainably reducing, reuse and recycling waste materials in the environment | 5 : 2
Your place of work, our AREA of expertise | Your place of work, our (COMP | 4 : 1
Convenient, secure and cheap currency exchange | Multi-Currency Travel Mastercard and SmartApp | 5 : 3
Types of Flowers, Meaning of Rose Colors & More | The World's Largest Flower Resource | 2 : 5
Vertem is a highly experienced, Defaqto 5 star rated, independent stockbroker | A leading independent stockbroker in the UK | 1.5 : 4.5
Eco-Friendly Disinfectants That Work With Nature | The Future of Disinfection is Here! | 2.5 : 5
Life Changing Adventures & Sustainable Shopping in Pembrokeshire | Pembrokeshire's Leading Sustainable Adventure Centre | 3 : 5
Connecting tenants and landlords | Rent your apartment online in seconds, not hours! | 3 : 5
6. Generating Diverse Slogans
Generating diverse slogans is crucial to avoid ads fatigue and enable personalisation. While our model can generate high-quality slogans from descriptions, given one description, the different generated slogans tend to be similar to each other, for example replacing some words or using a slightly different expression. We could increase the diversity naïvely by increasing the temperature during generation. However, it would likely decrease the quality of the generated slogans. We propose two alternative approaches to increase diversity. Firstly, we tap into the conditional language generation introduced in Section 3.4. Secondly, we sample random paragraphs from the landing page and use them as inputs to the seq2seq model instead of the company's description.

6.1 Conditioning on Different Industries

In Section 3.4, we train a model P(slogan | description, industry) conditioned additionally on the industry. This allows us to change the industry during inference time so the model can generate customised slogans for different industries by focusing on different aspects of the description. To verify this capability, we experiment with some companies that are related to multiple industries. We present sample company descriptions, the industries we specified and the corresponding generated slogans in Table 9. We use greedy generation to ensure the diversity is not due to sampling.

Table 9: Sample generated slogans conditioned on different industries
Desc: SoftClinic is an end-to-end hospital management software that helps physicians, clinics, and hospitals to make their practice paperless. We tailor customized hospital management information systems and clinic management systems that best suit the unique requirement of each hospital. We provide scalable hospital software with different modules like Laboratory, Pharmacy, HR, Inventory, and Accounting; thereby digitizing the basic management and record-keeping and making management tasks easier. SoftClinic has made global marks in a very short span of time and is currently being used in 45 countries by more than 25,000 physicians across all verticals of the healthcare industry.
Industry:
Hospital & Health Care
Output:
SoftClinic | Hospital Management Software for Physicians, Clinics, and Hospitals
Industry:
Computer Software
Output:
SoftClinic | End-to-End Hospital Management Software
Desc:
At Honda, The Power of Dreams drives us to create intelligent products that enhance mobility and increase the joy in people’s lives. We continue to draw inspiration from the visionary ideas of our namesake, Soichiro Honda. He saw incredible power in the freedom of mobility and used his imagination to change the world. It is this mindset that guides us to help move people forward and protect the future we all share.
Industry:
Transportation/Trucking/Railroad
Output:
Honda | The Power of Dreams - Mobility Solutions for the Futures
Industry:
Automotive
Output:
Honda | The Power of Dreams - Inspiring the joy in people’s lives
Desc:
Find your fit with Fitbit’s family of fitness products that help you stay motivated and improve your health by tracking your activity, exercise, food, weight and sleep.
Industry:
Computer Software
Output:
Fitbit | The Family of Fitness Products for Better Health
Industry:
Health, Wellness and Fitness
Output:
Fitbit | The Best Way to Stay Motivated and Better Fit
The results are promising, as the model managed to learn which aspects are important for each industry. E.g., for the first company “SoftClinic”, when conditioned on “Hospital & Health Care”, the slogan emphasises that the software serves “Physicians, Clinics, and Hospitals”. On the other hand, when conditioned on “Computer Software”, the slogan focuses on the software’s functionality (“end-to-end”). A minimal code sketch of this inference-time conditioning is shown below.
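The sketch below illustrates how industry-conditioned generation can be run at inference time with the Hugging Face transformers library. It is a minimal sketch under two assumptions: a BART-style checkpoint fine-tuned for slogan generation, and an input format that prepends the industry label to the description as a textual control code. The checkpoint path and the “industry | description” format are illustrative assumptions, not the exact configuration of Section 3.4.

# Minimal sketch of inference-time industry conditioning (illustrative;
# assumes a fine-tuned BART checkpoint and that the industry label is
# prepended to the description as a textual control code).
from transformers import BartForConditionalGeneration, BartTokenizer

MODEL_DIR = "path/to/fine-tuned-slogan-bart"  # hypothetical checkpoint path
tokenizer = BartTokenizer.from_pretrained(MODEL_DIR)
model = BartForConditionalGeneration.from_pretrained(MODEL_DIR)

def generate_slogan(description, industry):
    # Prepend the industry so the model can focus on industry-relevant
    # aspects of the description (input format assumed for illustration).
    inputs = tokenizer(industry + " | " + description,
                       return_tensors="pt", truncation=True, max_length=512)
    # Greedy decoding: any difference across industries then comes from
    # the conditioning, not from sampling noise (as in Table 9).
    output_ids = model.generate(**inputs, num_beams=1, do_sample=False,
                                max_length=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

description = "SoftClinic is an end-to-end hospital management software ..."
for industry in ("Hospital & Health Care", "Computer Software"):
    print(industry, "->", generate_slogan(description, industry))

Raising the temperature (e.g., passing do_sample=True, temperature=1.5 to generate) is the naïve diversity knob mentioned earlier; conditioning on the industry instead avoids the quality loss that sampling at high temperature typically incurs.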
There is another, more straightforward way to generate diverse slogans: varying the input sequence. We initially planned to mine “description-like” text related to the company other than the <meta> tag description, as detailed in Section 4. However, we observed that the model could generate plausible slogans from input sequences that are not necessarily descriptions. We use the Beautiful Soup library to extract the text from the landing page URL, split it into chunks based on our model’s maximum input sequence length, and use random text chunks as input instead of the company’s description (a sketch of this pipeline is given after Table 10). Table 10 shows the generated slogans for two digital marketing companies from their description and web page. For the slogans generated from the description, we use a beam size of 10 and sample among the top 50 candidate tokens with the highest probability at each time step. For the slogans generated from the web page, we use greedy decoding. We can see that the slogans generated from texts on the advertiser’s web page are much more diverse, without sacrificing quality or relevance.

Table 10: Sample generated slogans from the description and paragraphs from the landing page

Company:
Knorex
Desc:
Cross-channel marketing cloud platform augmented by Machine Learning. Single dashboard to automate and optimize all your campaigns across digital marketing funnels in one place.
Page URL:
Desc slogans:
XPO Marketing Automation Platform for Digital Marketing | Knorex
XPO Marketing Automation Platform | Knorex
XPO Marketing Automation Platform for Agencies, Big and Small | XPO
XPO | AI-Powered Marketing Automation Platform for Agencies & SMBs
XPO | AI-Powered Digital Marketing Automation Platform for Agencies
URL slogans:
Knorex Automated Media Buying (i.e) Automation Platform for Audience Engagement
Knorex - AI-powered ad budgeting and placement solutions
XPO | The Data Integration Platform for Sales, Marketing, and Marketing Teams
XPO as a Marketing Cloud Platform | Knorex
Knorex - The Most Trusted Automation Platform for Media Audience Intelligence
Company:
GSM
Desc:
GSM specializes in automotive digital marketing solutions designed to drive leads and revenue by increasing the online visibility of your dealership.
Page URL:
Desc slogans:
GSM’s Display Advertising | Digital Advertising for Dealerships
GSM | Digital Advertising for Automotive Dealerships
GSM’s Display Advertising - Digital Advertising for Dealerships
GSM’s Display Advertising | Digital Advertising Made Easy
GSM’s Display Advertising | Digital Advertising and Digital Marketing Agency
URL slogans:
GSM | Market Research and Market Intelligence for Business
GSM | Digital Marketing Tips To Drive Your Business Forward
GSM | The Most Powerful Digital Marketing Platform for Buyer Data
GSM | Virtual, Online Billboard Advertising for Auto Dealerships
GSM | Automotive Marketing and Strategy Agency
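The following sketch, a companion to Table 10, outlines the landing-page pipeline described above, reusing the (hypothetical) tokenizer and model from the previous sketch. It extracts the visible page text with Beautiful Soup, splits the token stream into chunks no longer than the model’s maximum input length, and decodes a slogan from a randomly sampled chunk; the chunking rule and the helper names are assumptions for illustration, not a definitive implementation.

# Sketch of slogan generation from random landing-page chunks
# (illustrative; reuses `tokenizer` and `model` from the previous sketch).
import random
import requests
from bs4 import BeautifulSoup

def page_chunks(url, max_tokens=512):
    html = requests.get(url, timeout=10).text
    # get_text() drops the markup and keeps only the visible page text.
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    # Cut the token stream into chunks that fit the encoder input.
    return [tokenizer.decode(token_ids[i:i + max_tokens])
            for i in range(0, len(token_ids), max_tokens)]

def slogan_from_page(url):
    chunk = random.choice(page_chunks(url))
    inputs = tokenizer(chunk, return_tensors="pt",
                       truncation=True, max_length=512)
    # Greedy decoding, matching the "URL slogans" setting in Table 10.
    # The "Desc slogans" instead combine beam search with top-k sampling:
    # model.generate(..., num_beams=10, do_sample=True, top_k=50).
    output_ids = model.generate(**inputs, num_beams=1, do_sample=False,
                                max_length=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)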
7. Conclusion
In this work, we model slogan generation as a sequence generation problem using a sequence-to-sequence transformer model with the company’s description as input. This ensures the cohesiveness between the generated slogan and the company’s marketing communication. We further applied delexicalisation to improve the generated slogans’ quality drastically. In addition, we introduced two simple but effective approaches to improve the diversity of the slogans, namely generating slogans conditioned on different industries and on different paragraphs from the advertiser’s web page. Our best model achieved a ROUGE-1/-2/-L F score of 53.13/33.30/46.49 on a manually-curated slogan dataset. A human evaluation further validated that the slogans generated by our system are plausible and have a quality close to human-written slogans.

Acknowledgement
Yiping is supported by the scholarship from “The 100th Anniversary Chulalongkorn University Fund for Doctoral Scholarship” and also “The 90th Anniversary Chulalongkorn University Fund (Ratchadaphiseksomphot Endowment Fund)”. We are also grateful to our colleague Bao Dai for providing the initial slogan dataset. We would like to thank our colleagues Khang Nguyen and Hy Dang for conducting the manual evaluation of the slogans. The transformer models in this work were trained using Paperspace Gradient’s free P5000 cloud GPUs. We are grateful for their generosity in providing the platform.
Appendix A. Slogan Annotation Guideline for Human Evaluators
Each time, you will be shown two slogans for the same company in sequence. One of them is written by the company themselves, and the other is generated by an AI model. For each slogan, please rate on a scale of 1 (poor) to 5 (great). Please ensure your rating standard is consistent both between the two candidates and across different companies.

Please note that the order of the two slogans is randomly shuffled, so you should not use the order information to make a judgement. The website URL of the company is also provided. You are not required to check the website. However, in case you doubt whether the slogan is relevant to the company, you can visit the website to verify.

When scoring the slogans, please pay attention to the following aspects:
Succinct: A slogan should be succinct. It doesn’t have to be a complete sentence (although it can be). If two candidates contain roughly the same information, prefer the shorter one. So slogan A below is better than slogan B.
Slogan A: Knorex | Performance Precision Marketing
Slogan B: Knorex is a provider of performance precision marketing solutions
Specific: A slogan should be specific, ideally highlighting the unique selling point (USP) instead of giving a generic, all-covering description. So slogan B below is better than slogan A.
Slogan A: The Best Hospital in Jersey
Slogan B: Award-Winning, Accredited Hospital in Jersey
Catchy: A slogan should be catchy and memorable, for example through metaphor, humour or creativity. So slogan A below is better than slogan B (for the company M&Ms). However, creativity should not undermine clarity and relevance to the company. In case you need to visit the company’s URL, please do so to help you make a better judgement.
Slogan A: Melts in Your Mouth, Not in Your Hands
Slogan B: Multi-Coloured Button-Shaped Chocolates
Well-formed: A slogan should be well-formed. Obvious grammatical errors or misspellings that are not intentional due to creativity should be penalised. So slogan B below is better than slogan A.
Slogan A: Your trip with us book
Slogan B: Book your trip with us

Lastly, please perform the labelling independently; in particular, do not discuss with the other annotator performing the same task. Thank you for your contribution!
Appendix B. Annotation Interface