Scaling Creative Inspiration with Fine-Grained Functional Facets of Product Ideas
Tom Hope, Ronen Tamari, Hyeonsu Kang, Daniel Hershcovich, Joel Chan, Aniket Kittur, Dafna Shahaf
TOM HOPE*, Allen Institute for AI and The University of Washington, US
RONEN TAMARI, Hebrew University of Jerusalem, Israel
HYEONSU KANG, Carnegie Mellon University, US
DANIEL HERSHCOVICH, University of Copenhagen, Denmark
JOEL CHAN, University of Maryland, US
ANIKET KITTUR, Carnegie Mellon University, US
DAFNA SHAHAF, Hebrew University of Jerusalem, Israel
Web-scale repositories of products, patents and scientific papers offer an opportunity for building automated systems that scour millions of existing ideas and assist users in discovering novel inspirations and solutions to problems. Yet ideas in such repositories are currently represented largely as unstructured text, which is not amenable to the kind of user interactions required for creative innovation. Prior work has pointed to the importance of functional representations – capturing the mechanisms and purposes of inventions – for allowing users to discover structural connections across ideas and creatively adapt existing technologies. However, previous work exploring the use of functional representations was either very coarse-grained and limited in expressivity, or dependent on manually curated knowledge bases with poor coverage and significant manual effort from users. To help bridge this gap and unlock the potential of large-scale idea mining, we propose a novel computational representation that automatically breaks up products into fine-grained functional facets. We train a model to extract these facets from a challenging real-world corpus of invention descriptions, and represent each product as a set of facet embeddings. We design similarity metrics that support granular matching between functional facets across ideas, and use them to build a novel functional search capability that enables expressive queries for mechanisms and purposes. We construct a graph capturing hierarchical relations between purposes and mechanisms across an entire corpus of products, and use the graph to help problem-solvers explore the design space around a focal problem and view related problem perspectives. In empirical user studies, our approach leads to a significant boost in search accuracy and in the quality of creative inspirations, outperforming strong baselines and state-of-the-art representations of product texts by 50-60%.
A modern-day engineer, scientist or designer has access to online repositories of millions of products, scientific papers and patents, containing descriptions of myriad technologies and their uses; essentially, a huge database of problems and solutions. Combined with rapid advances in algorithms for extracting information from large unstructured databases, this raises the prospect of using machines to augment and scale the process of innovation, helping human problem solvers identify inspirations and solutions across domains.

The human ability to detect abstract relations across ideas and find ways to creatively adapt existing tools for new uses has been a driving force in the history of innovation [10, 28, 32, 34, 40]. Microwave ovens were discovered by repurposing radar technology developed during World War II; Teflon, today chiefly used in non-stick cookware, was first used in armament development; and gigantic organizations such as NASA and Procter & Gamble actively engage in searching for opportunities to adapt existing technologies for new domains and markets [18]. In a very different kind of example, a car mechanic recently invented a simple device to ease childbirths by adapting a trick for extracting a cork stuck in a wine bottle – which he discovered online, in a YouTube video [1]. This award-winning vacuum device could save millions of lives in developing countries. Strikingly, according to the World Health Organization, there has been no innovation in this area of work "for almost centuries".

These and many other examples suggest a future where automated systems mine web-scale repositories with myriad descriptions of inventions, surfacing pertinent inspirations or solutions to problems. But despite the immense promise for accelerating the pace of innovation, finding inspirations remains largely a manual, trial-and-error process, or simply the result of serendipity.
A key limiting factor is that these large idea repositories cannot support the kinds of user interactions that are required for creative inspiration, because the predominant computational representation of ideas – in the form of unstructured textual descriptions – is unsuitable for these interactions.

Human creativity often relies on detecting structural matches across distant ideas, adapting them by transferring mechanisms from one domain to another [12, 13, 25, 26] – but this human skill is notoriously hard to transfer to machines [36]. A primary reason is that structured representations of ideas are simply not generally available. Repositories of scientific papers, patent publications or product descriptions are typically limited to "structure" in the form of high-level category-focused keywords, which do not support the functional interactions we desire. For example, to identify that a contraption for extracting a cork stuck in a bottle could serve as relevant inspiration for easing childbirth, an automated system would need to figure out that a vacuum-based mechanism can serve the purpose of extraction of physical objects, and match this function to the problem of extracting babies stuck in the birth canal. At the same time, most knowledge bases that do provide richer, more structured representations (e.g., [6, 62]) are hand-crafted and small, and previous efforts to scale up have been limited in expressivity [17, 22]. General-purpose knowledge bases (e.g., Cyc [44], NELL [53], DBpedia [20]) largely encode categorical knowledge (e.g., is-a, has-a) and rarely functional knowledge (e.g., used-for), and can also suffer from poor coverage [29].

One promising recent approach [36] trains neural networks to learn one aggregate purpose vector and one aggregate mechanism vector per product as coarse, soft "structure" that can be derived from raw text and used to find analogically related products with similar overall purpose but distant mechanism.
The resulting matches led to increased creativity as measured in an empirical ideation study. Further work used the same approach to find analogies in scientific papers [10]. In reality, however, products have multiple fine-grained purposes, with different mechanisms for achieving each, as demonstrated in Figure 1. The single-vector approach of [36], when used to search for products related to a smart pillow device, cannot disentangle its different functional facets (tracking sleep, neck support, etc.). The aggregate approach squashes together multiple purposes and mechanisms into one soft "puddle", losing important information for retrieval of products that have only partial functional matches and limiting the ability to find diverse adaptation opportunities.

Importantly, this aggregate representation does not only harm retrieval accuracy; it suffers from a fundamental limitation in terms of the interactions it enables. Prior work has demonstrated the importance of interactions for traversing and exploring granular functions. A recent study [29] showed that providing designers with computational tools to express the particular aspects of purposes they are interested in, and to traverse multiple levels of granularity and abstraction, could significantly increase the novelty and usefulness of the ideas they generated. An earlier study [65] showed that representing problems in terms of multiple purposes and constraints enabled designers to search for more novel and useful inspirations. The WordTree method [46] – a prominent method in creative engineering design – directs designers to break their problem into subfunctions, and then use the WordNet [51] database to explore abstractions and related functional facets to inspire analogies to products and designs across domains.

However, to date the scope of applicability of these interactions has been limited by the lack of scalable means for modeling ideas in terms of granular purposes and mechanisms.
The approach in [65] explored only manually constructed problem representations, and the WordTree method provided instructions to, but not technical scaffolds for, identification of functional facets to use for exploring a design space. The system of [29] required both manual effort from the user in specifying the different purposes, as well as a manually-curated knowledge base (Cyc [45]) in which those purposes were already connected in a concept graph describing their hierarchical relationships – a knowledge base that suffers from poor coverage [29] for real-world product description texts. In addition, even after the user manually specified granular purposes, the system was forced to use the aggregate approach of [36] to retrieve relevant matches from a corpus of products, since no automated tool for extracting granular purposes at scale across all products was available.

To help close this gap and enable interactions between humans and automated systems that facilitate innovation, we develop a new computational representation of idea descriptions based on fine-grained functional facets. Our system automatically identifies multiple purposes and mechanisms within a given product description. We then construct a novel span-based representation of each product in terms of purpose and mechanism functional facets and their corresponding vector embeddings. We demonstrate the utility of our approach for supporting human creativity in two applications: (1) fine-grained functional search for alternative uses of mechanisms, and (2) exploring alternative problem perspectives around a focal problem for potential inspirations.

[Figure 1 shows an example product description – "What everyone wants to have is a comfortable way to sleep while traveling. A neck pillow filled with soft material that supports your neck. It's unique because it has sensors to track your sleep." – with extracted purpose facets (track sleep, sleep, comfort, travel, support neck), mechanism facets (neck pillow, soft material, sensors), and retrieved products (soft pillow with sensors; robotic neck brace; car seat vital signs monitor).]

Fig. 1. Extracting fine-grained purpose and mechanism functional facets from an online product description, to search for adaptation opportunities. Green spans are mechanisms, red spans are purposes. Left: Standard vector-based search does not enable control for partial functional matches; retrieval results are typically highly similar to the original product, which is not helpful in creative innovation interactions. Center: The aggregate approach in previous work [36] captures only one overall, coarse purpose/mechanism, limiting the expressivity of the search and losing important information for retrieval of products that have only partial functional matches. Right: Our fine-grained functional facets enable users to discover focused matches based on specific functions, retrieving more diverse inspirations for creative adaptation.
Functional search for alternative uses of mechanisms.
Our span-based representation enables innovators to search for ideas with expressive queries for specific functions. Figure 1 shows an example of functional facets automatically extracted by our system and their use for retrieval of potential inspirations for adaptation opportunities. In Section 3, we build a prototype fine-grained functional search tool, and evaluate its utility in an alternative uses task in which users find unconventional applications of given mechanisms, potentially leading to pathways to new markets.

Exploring problem perspectives with a functional concept graph.
We further use our representation to automatically generate a functional concept graph that embeds purpose/mechanism facets at different levels of granularity. While the coarse representation in [36] made it hard to pull out discrete and interpretable concepts from product texts, our fine-grained approach allows us to mine recurring functional relations, such as specific problems that are often mentioned together, or specific problems and the solutions associated with them. This level of detail can enable us to map the landscape of ideas – similarly to manually curated functional ontologies, a core tool used in engineering and design ideation [27, 35]. By automating the graph construction, we take a step toward removing the dependence on manually-constructed KBs that limited previous work [29]. We evaluate the utility of our graph in an application involving problem reformulation [15, 16]: construing an existing problem in terms of other structurally related problems, to explore alternative problem perspectives and the design space around a focal problem. This capability can help users "break out" of fixation on the details of a specific problem and connect to parts of the design space that may superficially look unrelated [11, 40].

In both applications, our approach leads to a significant boost of 50-60% over the best-performing baselines, including the previous work of [36].

Our computational representation of idea descriptions, and the interactions it enables, help address several key challenges to unlocking the potential of large-scale online idea mining, including the bottlenecks in manual construction of structured idea repositories; limited expressivity for users in searching fine-grained purposes and mechanisms; and harnessing idea repositories to flexibly explore alternative problem formulations across levels of abstraction.
We believe our representation may serve as a useful building block for novel creativity support tools that can help users find and recombine the inspirations latent in unstructured idea repositories at a scale previously impossible.
A summary of our contributions:

• We propose a novel computational representation of ideas with granular functional facets for purposes and mechanisms, extracted automatically from product descriptions.

• We use crowd workers to annotate product texts from a challenging real-world corpus, and evaluate several extraction models trained on these annotations. We represent each product as a set of span embeddings, corresponding to the multiple facets, and use similarity metrics over these sets to support partial, focused matching between ideas.

• Using our similarity measures between ideas, we build a novel functional search capability that supports expressive, fine-grained queries for purposes and mechanisms.

• We demonstrate the flexibility and utility of the representation for computational support of core creative tasks: (1) searching for alternative, atypical product uses for potential adaptation opportunities; and (2) creating a functional concept graph that enables exploration of the design space around a focal problem. Through two empirical user studies we demonstrate that our representation significantly outperforms both previous work and state-of-the-art embedding baselines on these tasks. We achieve Mean Average Precision (MAP) of 87% in the alternative product uses search, and 62% of our inspirations for design space exploration are found to be useful and novel – a relative boost of 50-60% over the best-performing baselines, including the coarse representation approach of [36].
Our goal in this section is to construct a representation that can support the creative innovation tasks and interactions discussed in the Introduction. Previous work [36] suggested a representation separating an idea into one purpose vector and one mechanism vector. While that approach showed promise, the one-vector representation was coarse, mashing together many different purposes and mechanisms, and limiting interactions that require fine-grained control by the user.

Figure 1 shows an example. When searching for products sharing structural relations with a smart pillow product, the aggregate purpose/mechanism vectors squash together multiple concepts such as comfort, sleep, travel, neck support (purposes) or neck pillow, soft material, sensors (mechanisms) – limiting the ability to tease apart different sub-purposes and sub-mechanisms. This results in retrieval of another smart pillow, which is only slightly different in that it is not intended for travel. The aggregate vectors are also not interpretable – leaving the user blind to what is truly being matched as part of the process of idea retrieval, and not enabling targeted focus on specific functional aspects.

In contrast, we propose to use span representations [42]. Given a product text description, we extract tagged spans of text corresponding to purposes and mechanisms (see Figure 2), and represent the product as a set of span embeddings. By doing so, we are able to employ similarity metrics that support partial, faceted matching between ideas.

Continuing our example, we can now represent the smart pillow with a set of purpose and mechanism spans (Figure 1, right). This allows us to retrieve a wider range of products with faceted matches, such as a robotic neck brace for the neck support purpose, or a car seat vital signs monitor which matches on the embedding combination of travel, support neck, sensors.
These retrieved products could point to new directions to explore, such as new markets where the smart pillow technology could be adapted (e.g., to increase comfort in robotic neck braces or car seats with sensors).

More technically, we use a standard sequence tagging formulation, with X_N = {x_1, x_2, ..., x_N} a training set of N texts, each a sequence of tokens x_i = (x_i^1, x_i^2, ..., x_i^{T_i}), and Y_N = {y_1, y_2, ..., y_N} a corresponding set of label sequences, y_i = (y_i^1, y_i^2, ..., y_i^{T_i}), where each y_i^j indicates token j's label (purpose/mechanism/other). In later sections, we represent each product i as a set of purpose span embedding vectors and a set of mechanism span embedding vectors.

For the reasons discussed above, we view the span-based approach not simply as a more flexible and nuanced model, but as a potential building block that can power new interfaces and paradigms for innovation that we explore later in this paper. We start by describing our data and annotation process; we then discuss and evaluate models to extract spans from product texts, followed by applications and experiments.

Fig. 2. Crowdsourcing interface for fine-grained purposes and mechanisms. Boxes are predefined chunks to annotate.
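To make the span-set representation concrete, the following is a minimal sketch of one plausible set-level similarity between a query's facet embeddings and a product's facet embeddings: score each query facet by its best-matching product facet, then average. The aggregation rule, function names, and toy vectors here are illustrative assumptions, not necessarily the exact metric we use.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match_similarity(query_spans, product_spans):
    # For each query facet, take its best-matching product facet,
    # then average the per-facet scores (partial, faceted matching).
    scores = [max(cosine(q, p) for p in product_spans) for q in query_spans]
    return sum(scores) / len(scores)

# Toy 3-d "facet embeddings", for illustration only.
travel, support_neck, sensors = [1, 0, 0], [0, 1, 0], [0, 0, 1]
query = [support_neck, sensors]                     # facets the user cares about
neck_brace = [support_neck]                         # partial functional match
car_monitor = [travel, support_neck, sensors]       # covers every query facet

full = best_match_similarity(query, car_monitor)    # 1.0
partial = best_match_similarity(query, neck_brace)  # 0.5
```

Unlike a single aggregate vector, this scoring rewards a product for matching some query facets even when others are absent, which is what enables partial matches such as the robotic neck brace above.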
We use real-world product idea descriptions taken from the crowdsourced innovation website Quirky.com, also used in [36], including 8500 user-generated texts describing inventions across diverse domains (e.g., kitchen products, health and fitness, clean energy). Texts typically include multiple purposes and mechanisms. Texts in Quirky use very nonstandard language, including grammatical and spelling errors (e.g., "Folds Up Perfect For Carrying. you can walk-on, put your mouth on and or hands on. numbers in any configuration 4 learning to De / Composing Numbers.").
Annotation.
To create a dataset annotated with purposes and mechanisms, we collect crowdsourced annotations on Amazon Mechanical Turk (AMT). We observed that in the annotation task of [36], workers tend to annotate long, often irrelevant spans. We thus guided workers to focus on shorter spans. To further improve quality and encourage more granular annotations, we limited the maximal span length that could be annotated, and disabled the annotation of stopwords. Fig. 2 shows our tagging interface; rectangles are taggable chunks. For quality control, we required US-based workers with an approval rate over 95% and at least 1000 approved tasks, and filtered unreasonably fast users. Workers were paid $0.1 per task. In total, we had [...] annotating workers; median completion time was [...] seconds.

While a manual inspection of the annotations revealed they are mostly satisfactory, we observe two main issues. First, there are often multiple correct annotations. Second, workers provide partial tagging – in particular, if similar spans appear in different sentences, very few workers bother tagging more than one instance (despite instructions). These issues would have made computing evaluation metrics problematic. We thus decided to use the crowdsourced annotations as a bronze-standard for training and development sets only. For a reliable evaluation, we collected gold-standard test sets annotated by two CS graduate students. Annotators were instructed to mark all the relevant chunks, resulting in a high inter-annotator agreement of [...]. We collect [...] annotated training sentences and [...] gold sentences, for a total of [...] tokens (tag proportions: [...] mechanism, [...] purpose, [...] other).

A note on related annotated data.
There has been recent work on the related topic of information extraction from scientific papers by classifying sentences, citations, or phrases. Recent supervised approaches [8, 38, 47] use annotations which are often provided by either paper authors themselves, NLP experts, or domain experts, or involve elaborate (multi-round) annotation protocols. Sequence tagging models are often trained and evaluated on (relatively) clean, succinct sentences [49, 66]. When trained on noisy texts, results typically suffer drastically [2]. Our corpus of product descriptions is significantly noisier than scientific papers, and our training annotations were collected in a scalable, low-cost manner by non-experts. Using noisy crowdsourced annotations for training and development only is consistent with our quest for a lightweight annotation approach that would still enable training useful models. In a domain closer than scientific texts, [43] classify product review sentences as containing a usage expression or not, over five products only. In contrast, this work focuses on extracting fine-grained purposes and mechanisms from a diverse range of products. Review texts are often written in fairly clean and coherent language, commonly appear in NLP tasks [61], and do not typically describe in detail the mechanisms and purposes of products. In addition, sentence-level classification would not support the user interactions we explore in this paper, which require fine-grained control.
After collecting annotations, we can now train models to extract the spans. We explore several models likely to have sufficient power to learn our proposed novel representation, with the goal of selecting the best-performing one. In particular, we chose two approaches that are common for related sequence-tagging problems, such as named entity recognition (NER) and part-of-speech (POS) tagging: a common baseline and a recent state-of-the-art model. We also tried a model-enrichment approach with syntactic relational inputs. We stress that our goal in this section is to find a reasonable model whose output could support creative downstream tasks; many other architectures are possible and could be considered in future work.

• BiLSTM-CRF.
A BiLSTM-CRF [37] neural network, a common baseline approach for NER tasks, enriched with semantic and syntactic input embeddings known to often boost performance [66]. We first pass the input sentence x = (x_1, x_2, ..., x_T) through an embedding module, resulting in v_1, ..., v_T, v_i ∈ R^{d_e}, where d_e is the embedded space dimension. We adopt the "multi-channel" strategy as in [66], concatenating input word embeddings (pretrained GloVe vectors [56]) with part-of-speech (POS) and NER embeddings. We additionally add an embedding corresponding to the incoming dependency relation. The sequence of token embeddings is then processed with a BiLSTM layer to obtain contextualized word representations h_1^(1), ..., h_T^(1), h_i ∈ R^{d_h}, where d_h is the hidden state dimension. The outputs are fed into a linear layer f to obtain per-word tag scores f(h_1^(L)), f(h_2^(L)), ..., f(h_T^(L)). These are used as inputs to a conditional random field (CRF) model which maximizes the tag sequence log likelihood under a pairwise transition model between adjacent tags [5].

Configuration      P      R      F
Enriched BiLSTM    45.24  39.01  41.90
Pooled-Flair       53.30  39.80  45.50
GCN                47.85  47.93  [...]
GCN self-train     49.00  52.00  [...]

Table 1. Raw extraction accuracy evaluation. All approaches use CRF loss. GCN with syntactic edges outperforms baselines. Self-training further improves results. Random-label achieves only [...] F1.

• Pooled Flair.
A pre-trained language model [4] based on contextualized string embeddings, recently shown to outperform powerful approaches such as BERT [14] in NER and POS tagging tasks and achieve state-of-the-art results. Flair uses a character-based language model pre-trained over large corpora, combined with a memory mechanism that dynamically aggregates embeddings of each unique string encountered during training, and a pooling operation to distill a global word representation. We follow [4] and concatenate pre-trained GloVe vectors to token embeddings, add a CRF decoder, and freeze the language-model weights rather than fine-tune them [14, 57].

• GCN.
We also explore a model-enrichment approach with syntactic relational inputs. We employ a graph convolutional network (GCN) [39] over dependency-parse edges [66]. GCNs are known to be useful for propagating relational information and utilizing syntactic cues [49, 66]. The linguistic cues are of special relevance and interest to us, as they are known to exist for purpose/mechanism mentions in texts [22].

We use a GCN with the same token embeddings as in the BiLSTM-CRF baseline, with a BiLSTM layer for sequential context and a CRF decoder. For the graph fed into the GCN, we use pre-computed syntactic edges from dependency parsing: for sentence x_1, ..., x_T, we convert its dependency tree to A^syn, where A^syn_ij = 1 for any two tokens x_i, x_j connected by a dependency edge. We also add self-loops, A^self = I (to propagate from h_i^(l-1) to h_i^(l) [66]). Following [66], we normalize activations to reduce bias toward high-degree nodes. For an L-layer GCN, denoting by h_i^(l) ∈ R^{d_h} the l-th layer output for node i, the GCN operation can be written as

    h_i^(l) = σ( Σ_{r ∈ R} Σ_{j=1}^{n} A^r_ij W_r^(l) h_j^(l-1) / d_i^r + b_r^(l) )

where R = {syn, self}, σ is the ReLU activation function, W_r^(l) is a linear transformation, b_r^(l) is a bias term, and d_i^r = Σ_{j=1}^{T} A^r_ij is the degree of token i w.r.t. r. In the GCN architecture, L layers correspond to propagating information across L-order neighborhoods. We set the contextualized word vectors h_1^(1), ..., h_T^(1) to be the input to the GCN, and use h_1^(L), ..., h_T^(L) as the output word representations. Similarly to [49], we do not model edge directions or dependency types in the GCN layers, to avoid over-parameterization in our data-scarce setting. We also attempted edge-wise gating [49] to mitigate noise propagation but did not see improvements, similarly to [66]. In our experiments, we followed standard GCN training procedures.
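The layer operation above can be sketched in NumPy as follows. This is a minimal sketch of a single forward layer only (it omits the BiLSTM input encoder and CRF decoder); the toy sentence, adjacency matrices, dimensions, and random weights are illustrative assumptions.

```python
import numpy as np

def gcn_layer(h_prev, adj, weights, biases):
    # One GCN layer over relation types R = {"syn", "self"}:
    #   h_i^(l) = ReLU( sum_r sum_j A^r_ij W_r h_j^(l-1) / d^r_i + b_r ),
    # where d^r_i is the degree of token i under relation r.
    d_out = next(iter(weights.values())).shape[0]
    out = np.zeros((h_prev.shape[0], d_out))
    for r, A in adj.items():
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # avoid divide-by-zero
        out += (A @ h_prev @ weights[r].T) / deg + biases[r]
    return np.maximum(out, 0.0)  # ReLU

# Toy example: a 3-token sentence, hidden size 2, one dependency edge 0<->1.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(3, 2))               # contextualized inputs h^(0)
adj = {
    "syn": np.array([[0., 1., 0.],
                     [1., 0., 0.],
                     [0., 0., 0.]]),       # undirected dependency edges
    "self": np.eye(3),                     # self-loops A^self = I
}
weights = {r: rng.normal(size=(2, 2)) for r in adj}
biases = {r: np.zeros(2) for r in adj}

h1 = gcn_layer(h0, adj, weights, biases)   # shape (3, 2), non-negative
```

Stacking L such layers propagates information across L-order syntactic neighborhoods, as described above.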
Specifically, we base our model on the experimental setup detailed in [66] (see also the authors' code, which we adapt for our architecture, at https://github.com/qipeng/gcn-over-pruned-trees). We pre-process the data using the spaCy (https://spacy.io) package for tokenization, dependency parsing, and POS/NER-tagging. We use pretrained GloVe embeddings of dimension [...], and NER, POS and dependency relation embeddings of size [...] each, giving a total embedding dimension d_e = [...]. The bi-directional LSTM and GCN layers' hidden dimension is d_h = [...], with [...] hidden layer(s) for the LSTM. We find that the setting of [...] hidden layers works best for the GCNs. The semantic similarity threshold K was tuned on the development sets, and was found to be [...] on Quirky and [...] for the patents data. We also tried training with edge label information based on syntactic relations, but found this hurts performance. The training itself was carried out using SGD with gradient clipping (cutoff [...]) for [...] epochs, selecting the best model on the development set.

For the Pooled-Flair approach [4], we use the FLAIR framework [3] (https://github.com/flairNLP/flair), with the settings obtaining SOTA results for CoNLL-2003 as in [4] (see https://github.com/flairNLP/flair/blob/master/resources/docs/EXPERIMENTS.md). We also experiment with non-pooled embeddings and obtain similar results. We experiment with the initial learning rate and batch size settings described in [4], finding [...] and [...] to work best, respectively.

Fig. 3. Precision@K results for the best performing model (GCN + self-training). [Plot: precision (y-axis) vs. top K % (x-axis), with separate curves for mechanism and purpose.]

In this section we assess extraction accuracy (whether we are able to extract purpose and mechanism spans of text).
In the next sections, we evaluate the utility of the extracted spans for enabling creative innovation tasks.

To evaluate raw accuracy of the model's predictions, we use the standard IOB label markup to encode the purpose and mechanism spans (5 possible labels per token: {Beginning, Inside} x {Purpose, Mechanism}, plus an "Outside" label). We conduct experiments using a train/development/test split of 18702/3614/512. Due to our challenging setting, we train models on bronze-standard annotations with noisy and partial tagging done by non-experts; for evaluation we use a curated gold-standard test set (Section 2).

Fig. 4. Comparing our GCN model predictions (right) to human annotations (left). Interestingly, our model managed to correct some errors made by the annotator (e.g., "it's", "heated", "coffee warm", "beverages"). Purposes shown in pink, mechanisms in green.

See Table 1 for results: GCN reaches an F1 score of ~[...], outperforming the BiLSTM-CRF model (enriched with multi-channel GloVe, POS, NER and dependency relation embeddings) by [...]. GCN also surpasses the strong Pooled-Flair pre-trained language model by nearly [...].
A random baseline guessing each token by label frequencies (Section 2) achieves only [...] F1. We interpret these results as possibly attesting to the utility of graph representations and features capturing syntactic and semantic information when labels are noisy. As a sanity check, we also computed precision@K (Figure 3). As expected, precision is higher for low values of K, and gradually degrades. Precision for mechanisms is higher than for purposes. Interestingly, a manual inspection revealed many cases where, despite the noisy training setting, our models managed to correct mistaken or partial annotations (see Figure 4).

Self-Training.
According to the results, we chose GCN as our best-performing model. We experimented with adding self-training [59] to GCN. Self-training is a common approach in semi-supervised learning where we iteratively re-label "O" tags in the training data with model predictions. A large portion of our training sentences are (erroneously) un-annotated by workers, perhaps due to annotation fatigue, introducing bias towards the "O" label. Self-training with GCN improves F1 by an additional [...], substantially increasing recall (more than [...] over Flair); see Table 1. Self-training stopped after [...] iterations, following no gain in F1 on the development set.

Our focus in this paper is to study the utility of the extracted purposes and mechanisms, in terms of the user interactions they enable. We explore two tasks demonstrating the value of our novel representation for supporting creative innovation. We start with a case study involving search for alternative uses.

Our task is inspired by one of the most well-known divergent thinking tests [31] for measuring creative ability – the alternative uses test [33], where participants are asked to think of as many uses as possible for some object. Aside from serving as a measure of creativity, the ability to find alternative uses for technologies has important applications in engineering, science and industry. Technologies developed at NASA, the US space agency, have led to over 2,000 spinoffs (https://spinoff.nasa.gov/), finding new uses in areas such as computer technology, agriculture, health, transportation, and even consumer products. Procter & Gamble, the multinational consumer goods company, has invested millions of dollars in systematic search for ideas to re-purpose and adapt from other industries, such as using a compound that speeds up wound healing to treat wrinkles – an idea that led to a new line of anti-wrinkle products [18]. And very recently, the COVID-19 pandemic
provided a stark example of human innovation during times of crisis, with many companies actively seeking to pivot their business and re-purpose existing products to fit the new climate [19]. One telling example is that of John Osher, creator of the popular “Spin Pop” – a lollipop with a mechanism for twirling in your mouth. After selling his invention in the late ’90s, Osher and a group of fellow inventors proactively searched for ideas – “rather than having an idea come to us”. The group drew up a list of dozens of potential ideas, and eventually landed on the “Spin Brush” – a cheap electric toothbrush adapted from the same mechanisms behind the twirling lollipop. This case of repurposing an existing technology involved a systematic search process that required a rich, granular understanding of products and their designs, rather than pure serendipity. However, Osher and his team still had to rely on human processing power – inherently limited in its ability to scour millions of descriptions of problems available online and find relevant, non-obvious candidate problems for which the twirling mechanism could be adapted. Introducing automation could help accelerate the search process, helping scale human ingenuity by sifting through millions of ideas for relevant inspirations. However, the task is challenging for existing search systems, because it requires a nuanced, multi-aspect understanding of both products and queries. Consider, for example, a company that manufactures some product (e.g., light bulbs). The company is familiar with straightforward usages of its products (lamps, flashlights), and wants to identify non-standard uses and expand to new markets. Finding uses for a light bulb that are not about the standard purpose of illuminating a space would be difficult to do with a standard search query over an idea repository. To come up with different applications of lights, one may turn to the Web to collect examples.
However, this quickly turns out to be a non-trivial task, as the terms “lights” or “lighting” bring back many results close to “lamps,” “flashlights,” and the like. The results of a quick Google search are also inundated with Christmas lights and light bulbs (not to mention “light” in the sense of “lightweight”). What one might want instead is a diverse set of applications beyond building floor lamps or decorative lights. In contrast, using our representation, each idea in the repository is associated with mechanism spans and purpose spans, and one could form a query such as mechanism=“light bulb”, purpose=NOT “light”. Using our system, the searcher adds “light” as a mechanism and also adds “light” as a negative purpose (i.e., results should not include a “light” purpose). Our engine returns interesting examples such as billiard laser instructor devices (Table 2), warning signs on food packages to get the attention of kids with allergies, and lights attached to furniture to protect your pinky toes at night (Fig. 5, bottom). We have built a prototype search engine supporting our representation. Figure 5 shows the top two results for the light bulb scenario: warning lights on food for kids with allergies, and lights attached to furniture to protect your pinky toe at night. These are non-standard recombinations [21] (light + allergies, light + furniture guard) that could lead the company to new markets. We conduct an experiment simulating scenarios where users wish to find novel/uncommon uses of mechanisms. Table 2 shows the scenarios and examples. To choose scenarios for the experiment, we find popular/common mechanisms in the dataset and their most typical uses. For example, one frequent mechanism is RFID, which is typically used for purposes such as “locating” and “tracking”. We then create queries searching for different uses – purposes that do not include concepts related to the typical uses of a given mechanism.
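To make the shape of such a query concrete, here is a minimal, hypothetical sketch of filtering products by a mechanism facet while negating a purpose facet. The embeddings and product records are toy stand-ins; in our system, facet vectors come from spans extracted by the trained tagging model.

```python
# Hypothetical sketch of a functional query with a negated purpose facet.
# Embeddings and products are toy stand-ins, not our actual engine or model outputs.
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy facet embeddings (assumed, for illustration only).
EMB = {
    "light": unit([1.0, 0.1, 0.0]),
    "illuminate": unit([0.9, 0.2, 0.1]),
    "warn of allergens": unit([0.0, 1.0, 0.3]),
}

products = [
    {"name": "floor lamp", "mechanisms": ["light"], "purposes": ["illuminate"]},
    {"name": "allergy warning light", "mechanisms": ["light"], "purposes": ["warn of allergens"]},
]

def matches(product, mechanism, neg_purpose, threshold=0.8):
    """Keep products using `mechanism` whose purposes are all far from `neg_purpose`."""
    has_mech = any(dot(EMB[m], EMB[mechanism]) >= threshold for m in product["mechanisms"])
    near_neg = any(dot(EMB[p], EMB[neg_purpose]) >= threshold for p in product["purposes"])
    return has_mech and not near_neg

results = [p["name"] for p in products if matches(p, "light", "light")]
# With these toy vectors, only the allergy warning light survives the negation.
```

With these toy vectors, the floor lamp is filtered out because its purpose embedding is close to “light”, while the allergy warning light survives.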
We now describe the methods we use to retrieve results for the scenarios. To automate scenario selection, we cluster mechanisms (see Section 4.1 for details), select frequent mechanisms from the largest mechanism clusters, and identify purposes strongly co-occurring with them (e.g., “RFID” co-occurs with “locating” and “tracking”) to avoid.

Fig. 5. Applications for light where light is not in the purpose. Two of the results and their automatic annotations (purposes in pink, mechanisms in green).
We represent each product 𝑖 as a set of purpose vectors P_𝑖 := {p_𝑖^1, p_𝑖^2, . . . , p_𝑖^{𝑃_𝑖}}, and a set of mechanism vectors M_𝑖 := {m_𝑖^1, m_𝑖^2, . . . , m_𝑖^{𝑀_𝑖}}, extracted with our GCN model. Similarly, we define sets of query vectors q^𝑝 := {q_1, q_2, . . . , q_{𝑄_𝑝}} and q^𝑚 := {q_1, q_2, . . . , q_{𝑄_𝑚}}. Each query chunk can be negated, meaning it should not appear. Finally, we define distance metrics 𝑑_𝑝(·,·), 𝑑_𝑚(·,·) between sets of purposes and mechanisms. For example, to locate a dog using RFID but not GPS:

    argmin_𝑖  𝑑_𝑝({q_“locate dog”}, P_𝑖)
    s.t.      𝑑_𝑚({q_“GPS”}, M_𝑖) ≥ threshold
              𝑑_𝑚({q_“RFID”}, M_𝑖) ≤ threshold        (1)

We explore two alternatives for computing the distance metrics 𝑑_𝑚, 𝑑_𝑝: • FineGrained-AVG. 𝑑_𝑝(q^𝑝, P_𝑖) is 1 minus the dot product between the average query vector and the average purpose vector (each normalized to unit norm). We define 𝑑_𝑚 similarly. • FineGrained-MAXMIN.
We match each element in q^𝑝 with its nearest neighbor in P_𝑖 (the purpose vector with maximal dot-product similarity), and then take the minimum over these best-match similarities; 𝑑_𝑝 is defined as 1 minus this minimum. All vectors are normalized. We define 𝑑_𝑚 similarly. This captures cases where queries match only a small subset of product chunks, erring on the side of caution with a max-min approach.

Query | Example results
Mechanism: light. Purpose: NOT light | Billiard laser instructor (projector)
Mechanism: solar energy. Purpose: NOT generating power | Light bulbs with built-in solar chips
Mechanism: water. Purpose: NOT cleaning, NOT drinking | A lighter that burns hydrogen generated from water and sunlight
Mechanism: RFID. Purpose: NOT locating, NOT tracking | A digital lock for your luggage with RFID access
Mechanism: light. Purpose: cleaning | A UV box to clean and sanitize barbells at the gym

Table 2. Scenarios and example results retrieved by our FineGrained-AVG method. All queries reflect non-trivial uses of mechanisms (e.g., a query for using water not for drinking/cleaning retrieves a lighter running on hydrogen from water and sunlight).

Fig. 6. Results for the search evaluation test case. Mean average precision (MAP) and normalized discounted cumulative gain (NDCG) by method, averaged across queries. Methods in bold use our model.
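The two facet-set distances can be sketched as follows. This is our reading of FineGrained-AVG and FineGrained-MAXMIN, operating on toy unit vectors rather than model outputs:

```python
# Minimal sketch (our reading) of FineGrained-AVG and FineGrained-MAXMIN
# over sets of facet vectors; vectors here are toy stand-ins, not model outputs.
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def d_avg(query_vecs, facet_vecs):
    """1 minus the dot product of the normalized average query and facet vectors."""
    def avg(vs):
        return unit([sum(col) / len(vs) for col in zip(*vs)])
    return 1.0 - dot(avg(query_vecs), avg(facet_vecs))

def d_maxmin(query_vecs, facet_vecs):
    """Match each query vector to its most similar facet vector, then take
    1 minus the worst (minimum) of these best-match similarities."""
    best = [max(dot(q, f) for f in facet_vecs) for q in query_vecs]
    return 1.0 - min(best)
```

Note how d_maxmin is zero whenever every query chunk has an exact match among the product's chunks, even if the product contains many unrelated chunks, while d_avg blends all chunks together.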
We test our model against: • AvgGloVe.
A weighted average of GloVe vectors of the entire text (excluding stop words), similar to standard NLP approaches for retrieval and textual similarity. We average query terms and normalize to unit norm. Distance is computed via the dot product. • Aggregate purpose/mechanism.
Representing each document with the model in [36]: a BiLSTM neural network taking raw text as input and producing two vectors corresponding to an aggregate purpose and an aggregate mechanism. We average and normalize query vectors, and use the dot product. For all four methods, we handle negative purpose queries by filtering out all products whose distance is greater than 𝜆, a threshold selected as a fixed percentile of the distances.

We recruited five engineering students to judge the retrieved product ideas. Each participant provided binary relevance judgments for the top 20 results from each of the four methods, shuffled randomly so that judges are blind to the condition. See Figure 6 for results. We report normalized discounted cumulative gain (NDCG) and mean average precision (MAP), two common metrics in information retrieval [60]. Our FineGrained-AVG wins on both metrics, followed by FineGrained-MAXMIN. The baselines perform much worse, with the aggregate-vectors approach of [36] outperforming standard embedding-based retrieval with GloVe. Importantly, our approach achieves high MAP (85%-87%) in absolute terms, in addition to a large relative improvement over the baselines (whose MAP is 40%-60%). Table 2 shows example results of FineGrained-AVG. For instance, a query for using light not for lighting returns laser-based billiard instruction; a query for using RFID not for locating or tracking returns an idea for an RFID-based lock, and RFIDs used at supermarket checkouts. Looking at examples of retrieved results demonstrates the benefit of our approach. For instance, for the query using light for the non-standard purpose of cleaning, the top-ranked result retrieved by FineGrained-AVG is a UV Light Sterilizer, with purposes that include
“Sterilizes bacteria”, “Keep public and people healthy” and “Cleaner fresher air”; the top result from FineGrained-MAXMIN is similarly a “Standalone bug zapper bulb that uses uv light / black light”. Conversely, the top result for both baselines (standard search and aggregate-vectors) is a “Toilet / Bathroom Light”, with “a sensor light that glows around your toilet and has extra batteries if you lose electricity in the bathroom”. It appears that both baselines were not able to accurately capture and disentangle purposes and mechanisms, despite the aggregate-vector model being explicitly designed to do so. The aggregate-vector approach squashes multiple purposes together by design into one soft, aggregate vector, which in this case includes concepts like toilet and bathroom that are somewhat topically related to cleaning. The aggregate approach had similar accuracy issues with the next three product ideas it retrieved (a “Switch that glows in the dark”, a “Dash Light to illuminate your ashtray”, and a light strip to change your water color). Only the fifth result was closer to being related to the query (a LED lamp designed to look like a window that can keep air odorless with an electrostatic air purifier), yet it did not precisely capture the purpose of cleaning – due to squashing together multiple concepts in one soft average (this product was also ranked fourth by the standard search baseline). In contrast, the fifth result found by FineGrained-MAXMIN was a die grinder with a light to see inside when cleaning / fixing root welds inside steel pipe. As another example, for the query of using RFID not for locating or tracking, the top result with both FineGrained-AVG and FineGrained-MAXMIN is a walk-through checkout scanner that uses RFID, a product not captured by the two other baselines in their top five results. The first-ranked result found by the aggregate-vector baseline was a customizable luggage system with “RFID protection” (also the second result retrieved by FineGrained-AVG), but it also retrieved products such as a wifi-enabled chip for kids and pets that allows them to go in or out without tripping the alarm, and a case with laser and bluetooth to connect to smart devices, which are of weaker relatedness to RFID technology. Overall, our results demonstrate that fine-grained purposes and mechanisms lead to better functional search expressivity than approaches based on distributional representations or coarse purpose-mechanism vectors. We measured inter-rater agreement across all scenarios with both Fleiss’ kappa and Krippendorff’s alpha.
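For reference, the two retrieval metrics reported above can be sketched as follows. These are the standard formulations over a ranked list of binary relevance judgments (averaging average precision across queries yields MAP), not code from our system:

```python
# Standard formulations of average precision and NDCG over a ranked list of
# binary relevance judgments (a sketch for reference, not the paper's code).
import math

def average_precision(rels):
    """rels: 0/1 relevance labels in rank order; the mean over queries is MAP."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def ndcg(rels):
    """Binary-relevance NDCG with a log2 rank discount."""
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))
    ideal = sorted(rels, reverse=True)
    idcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0
```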
In this section we test the value of our novel representation for supporting users in exploring the design space for solving a given problem. We use our span-based representation to construct a corpus-wide graph of purpose/mechanism concepts. We demonstrate the utility of this approach in an ideation task, helping users identify useful inspirations in the form of problems that are related to their own. Our goal is to help users “break out” of fixation on a certain domain, a well-known hindrance to innovation [11, 40]. Doing so is challenging because it requires some level of abstraction: being able to go beyond the details of a concrete problem to connect to a part of the design space that may look dissimilar on the surface, but has abstract similarity. Numerous studies in engineering and cognitive psychology have shown the benefits of problem abstractions for ideation [22, 24, 30, 40, 46, 63, 64]. However, these studies either involve non-scalable methods (relying on highly-structured annotations, or on crowd-sourcing) or simple, syntactic pattern-matching heuristics incapable of capturing deeper abstract relations. In the work closest to ours [36], crowd workers were given a product description from the Quirky database, and asked to come up with ideas for products that solve the same problem in a different way. Soft aggregate vectors representing purposes and mechanisms were used to find near-purpose, far-mechanism analogies. Thus, the ability to find analogs was limited by the need for a given mechanism to control for structural distance.
Unlike [36], in our setup we assume a realistic scenario where we are given only a very short problem title – e.g., generating power for a phone, reminding to take medicine, folding laundry – and aim to find inspirational stimuli [30] in the “sweet spot” for creative ideation: structurally related to the given problem, not too near yet also not too far [23]. To address this challenge, in this section we build a tool inspired by functional modeling, which we call a
Functional Concept Graph. A functional model [35] is, roughly put, a hierarchical ontology of functions and ways to achieve them, and is a key concept in engineering design. Such models are especially useful for innovation, allowing problem-solvers to “break out” of a fixed, overly-concrete purpose or mechanism and move up and down the hierarchy. Despite their great potential, today’s functional models are constructed manually, and thus do not scale. We thus construct a (crude) approximation of a functional representation that would still be useful for exploring the design space and suggesting potentially useful inspirations to users. In our approach, Functional Concept Graphs consist of nodes corresponding to purposes or mechanisms, and edges encoding semantic (not necessarily hierarchical) relations. Our span-based representation enables us to build this graph (Figure 7) – products that mention certain purposes (e.g., “charge your phone”) will often mention other, structurally related problems that could be more general/abstract (e.g., “generate power”) or more specific (“wireless phone charging”). These relations between purposes and mechanisms could help find connections across ideas at different levels of abstraction. Our approach allows us to look at fine-grained co-occurrences of concepts appearing together in products and thus infer relations between them, unlike the coarse representation in [36] that represented entire products with one aggregate purpose/mechanism vector, which could not reveal the granular information needed for constructing such a graph. In other words, we can discover patterns in the form of products that solve problem 𝑝_𝑖 also often solve problems I, and suggest I as potential inspirations to be recommended. However, naively looking for co-occurrences of problems may yield I too near to the original 𝑝_𝑖, as many frequently co-occurring purposes tend to be very similar, while we are interested in discovering the more abstract relations.
In addition, raw chunks of text extracted from our tagging model have countless variants that are not sufficiently abstract and are thus sparsely co-occurring. We thus design our approach to encourage abstract inspirations, as we describe next. (This also bears a certain resemblance to collaborative filtering [41], where recommendations are based on the pattern people who buy item X also often buy Y; in our case, instead of people and items, we have ideas written by people, and the problems they solve.)

Fig. 7. An example of our learned functional concept graph extracted from texts. Mechanisms in green, purposes in pink. Titles are tags nearest to cluster centroids (redacted to fit).
We develop a method to infer this representation from co-occurrence patterns of the fine-grained spans of text. We takethe following two steps (see more details in the next section):
I. Concept discretization.
Intuitively, nodes in our graph should correspond to groups of related spans (“charging”, “charging the battery”, “charging a laptop”). To achieve this, we take all purpose and mechanism spans P̂, M̂ in the corpus, extracted using our GCN model, and cluster them (separately) using pre-trained vector representations. We refer to the clusters C_𝑝, C_𝑚 as concepts. II. Relations.
We employ rule-mining [55] to discover a set of relations R between concepts. Relations take the form Antecedent ⇒ Consequent, with weights corresponding to rule confidence. To illustrate the intuition, suppose that when “prevent head injury” appears in a product description, the conditional probability of “safety” appearing too is large (but not the other way around). In this case, we can (weakly) infer that preventing head injuries is a sub-purpose of “safety”. Indeed, manually observing the purpose-purpose edges, the one-directional relations captured are often sub-purpose, and the bi-directional ones often encode abstract similarity. Similarly, for mechanism concepts the one-directional relations are often part-of (“cell phone” and “battery”), and bi-directional edges connect mechanisms that co-occur often. For pairs of purpose and mechanism concepts, the relation is often functionality (“charger”, “charge”).
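The confidence computation behind such directional relations can be illustrated with a toy sketch. The concept assignments below are made up for illustration; our pipeline mines rules with the Apriori algorithm [55] rather than this direct count:

```python
# Toy sketch of directional rule confidence between concept nodes, based on
# co-occurrence counts over products (concept assignments here are made up).
from collections import Counter
from itertools import permutations

# Each product is represented by the set of concepts its spans fall into.
products = [
    {"prevent head injury", "safety"},
    {"prevent head injury", "safety"},
    {"prevent head injury", "safety"},
    {"safety"},
    {"safety", "lighting"},
]

support = Counter()       # how many products mention each concept
pair_support = Counter()  # how many products mention an ordered concept pair
for concepts in products:
    support.update(concepts)
    pair_support.update(permutations(concepts, 2))

def confidence(antecedent, consequent):
    """Estimate P(consequent appears | antecedent appears) across products."""
    return pair_support[(antecedent, consequent)] / support[antecedent]

# "prevent head injury" => "safety" is high-confidence, but not the reverse,
# weakly suggesting a sub-purpose relation.
```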
Example.
Figure 7 shows a subgraph from our automatically constructed functional concept graph (showing only high-confidence edges). Pink nodes correspond to purposes and green nodes to mechanisms. The figure shows a part of the graph related to electricity, power and charging. A designer could go from the problem of charging batteries to the more general problem of generating power, and from there to another branch (e.g., solar power or mechanical stored energy), to get inspired by structurally related ideas.

Fig. 8. A snippet from our ideation interface for “morning medicine reminder”. Users indicate which inspirations were useful, and what they inspired. For example, seeing “real time health checker” inspired one user to suggest a monitoring device for finding the best time for reminding to take the medicine.
Next, we set out to test the utility of the functional concept graph, based on our nuanced representation, in an ideation task. In our setup we gave participants problems (e.g., reminding people to take their medication) and asked them to think of creative solutions. Participants were also given a list of potential inspirations, and were instructed to mark whether each was novel and helpful. They were encouraged to explain the solution it inspired. See the example in Figure 8: seeing “real time health checker” inspired one user to suggest monitoring the person to find the best time to remind them to take medicine. To create a set of seed problems, a graduate student mapped problems from WikiHow.com (a website of how-to guides) to purposes in our data. Using this source allowed us to collect real-world problems that are broadly familiar, with succinct and self-explanatory titles that do not require further reading to understand. The student was tasked with confirming that our Quirky dataset contains idea descriptions that mention these problems. For a given problem in WikiHow (how to remember to take medication), they performed keyword search over the purpose spans gleaned by our model from Quirky, and found matching spans (morning medicine reminder). We use those matching spans as our seed problem description given to users (purple text in Figure 8). We collected problems this way. Table 3 shows more examples, such as tracking distance walked, folding laundry, or sensing dryness level. Inspirations are other purpose spans from our dataset (see Table 3), selected automatically using our approach and comparing to baselines. For our approach, we explore two common and powerful vector representations of spans, one based on pre-trained word embeddings [56] and the other on a more recent language model representation tuned to capture semantic similarity [58].
Our method is based on clustering related purpose spans into concept nodes in the functional graph; some of these nodes contain tens of spans. Thus, we also explore two approaches to “summarize” each concept cluster with representative spans displayed to users. In more concrete detail, we experiment with the following approaches for selecting inspirations. For building the functional concept graph, we experiment with the following two span representations: • GloVe pre-trained word embeddings, averaged across tokens. • BERT-based contextualized vectors that have been fine-tuned for semantic similarity tasks [58]. Each representation is used to cluster the spans with K-Means++ [7]. We then apply the Apriori algorithm [55] to automatically mine association rules between clusters, and use the confidence metric to select the top rules. To use the mined rules between purpose nodes (clusters) for selecting inspirations shown to users, we start from the purpose node corresponding to the given problem and take its consequents; as explained earlier, this captures a weak signal of abstract similarity.

Problem | Inspirations | Rater explanation
Track distance walked | Protect children | Get ideas from devices that keep track of children
Folding laundry | Store toilet paper | Roll laundry around a tube instead of folding
Dispense medicine | Pet bowl that keeps ants away | Based on pet bowls that can dispense food during the day
Sense dryness level | Voltage reading | Use electric current to measure water level (safely)
 | Waterproof | Ideas from sensors in waterproof devices
 | Temperature reading |
Morning medicine reminder | Schedule coffee, coffee alarm | Alarm clock with coffee and medicine reminders
 | Send vital data, real-time health checker | Health trackers to tell if medicine not taken, alert accordingly
 | Heart rate monitoring, continuously monitor glucose | Find the best time to take medicine

Table 3. Example inspirations and explanations given by human evaluators.

We experiment with two approaches for displaying concepts to users – one that attempts to summarize the cluster independently of the seed problem, and one that takes the seed problem into account: • TextRank [50].
We construct a graph where nodes are the spans in a cluster and edges represent textual similarity. We run PageRank [54] on this graph, selecting the top 𝐾 spans to present. • Nearest spans.
Following the findings in [23], we select the top 𝐾 spans in C_𝑝 that are nearest to the query 𝑝_𝑖. (We use the same 𝐾 for both approaches.) • Purpose span similarity.
Given a problem 𝑝_𝑖, we find the nearest purpose spans of text in our corpus. We experiment with the same two vector representations used by our approach: GloVe and BERT. This method is similar to applying the methodology of [36] to our setting, where we are given only a problem 𝑝_𝑖 and no mechanism 𝑚_𝑖 is available to control for structural distance. While this approach relies on our model for extracting purpose spans, we consider it a baseline to study the added value of our hierarchy. • Linguistic abstraction.
We use the WordNet [52] lexical database to extract hypernyms (for each token in 𝑝_𝑖), in order to capture potential abstractions. WordNet is often used in similar fashion for design-by-analogy studies [30, 46]. (We use RoBERTa-large-STS-SNLI, available at github.com/UKPLab/sentence-transformers. The number of clusters is selected automatically with elbow-based criteria on silhouette scores. We use the top-ranked rules in our experiment.)
Fig. 9. Example from our Functional Concept Graph, explaining the inspirations shown to users in Figure 8. Nodes represent concepts (clusters of purposes), named by us for readability. Edges are annotated with products containing spans from both concepts. The problem of “medicine morning reminder” is mapped (via embedding) to the Alert/remind concept, which is linked to the concepts of medical monitoring and making hot drinks through products such as “smart medicine injector” and “coffee machine alarm” (among others, not displayed in the figure). These links serve as inspirations in our study. • Random concepts.
Random inspirations are often considered as a baseline in ideation studies, since diversity of examples is a known booster for creative ability [36]. For each task, we select a random cluster from C_𝑝 and display its TextRank summary. In our study, each method generated a fixed number of spans (concept summaries), which are grouped and displayed together in a box (Figure 8). For each problem, a rater views the boxes in randomized order, to avoid bias. We recruited raters: graduate students, a senior engineering professor, and an architect. Raters were instructed to mark inspirations they consider useful and relevant for solving a given problem, while not being about the same problem. Raters were also encouraged to write comments, especially for non-trivial cases they found of interest (see Table 3). In total, raters viewed a large number of boxes of purpose descriptors. Qualitative analysis.
Table 3 and Figure 8 show examples of problems, inspirations and user explanations from our study. For instance, users facing the “morning medicine reminder” problem were presented with nearby concepts in the Functional Concept Graph that included health monitoring and coffee machines. To explore why these concepts are connected in our graph and why they are potentially useful as inspirations, we make use of the direct interpretability of our approach. We examine the purpose co-occurrences from which the Functional Concept Graph was constructed. Figure 9 shows a graph with concept nodes of
making hot drinks, alerting/reminding, health monitoring, and medicine delivery, and edges representing products in which two adjacent purposes were co-mentioned (e.g., a coffee machine alarm product that mentioned the purposes of making hot drinks and alerting/reminding, or a “smart medicine injector” that mentioned both alerting/reminding and medicine delivery). This explains why the concepts are nearby in the graph: there are multiple products in our dataset that refer to purposes from both concepts.

Fig. 10. Inspiration user study results. Left: proportion of inspirations selected by at least two raters, per condition. Right: proportion of boxes (clusters) with multiple spans marked by at least two raters.

For example, a pill reminder product refers to the problem of forgetting to take medicine at prescribed times (
“Sends notification if you forgot to take your AM or PM meds”), while a smart injector device “administers medicine on set time intervals”. At the same time, both of these products of course mention purposes of medicine delivery. When our graph construction algorithm observes enough similar co-occurrence patterns between the concepts of alerting and medicine delivery across multiple products, an edge is added between the two in the graph. Similarly, an “Alarm coffee maker” product mentions the purposes of “time management” and “making coffee at a set time” as well as “alerting when the coffee is ready”, explaining how it emerges as a potential inspiration in our graph. This type of linkage or overlap between an original problem space and inspiration problems helps reach a sweet spot of innovation [12] by finding ideas that are not too near and not too far from the original problem, helping users break out of fixation as discussed earlier in this section. Participants used these inspirations to come up with a tracker that alerts the user at the best time to take a medicine, and a coffee machine reminding the user to take their medication with their morning coffee. These creative directions demonstrate the utility of the Functional Concept Graph for exploring the design space.
Quantitative results.
Figure 10 shows the results of the user study. On the left, we show the proportion of inspirations (individual spans) selected by at least two raters, for each method. Our approach significantly outperforms all the baselines. The effect is particularly pronounced for the BERT-based approach, where a substantially higher proportion of inspirations was found useful than with the best baseline. Interestingly, for both BERT and GloVe representations, the Nearest-spans summarization approach fares better, potentially due to striking a balance between being too far from and too near the initial problem 𝑝_𝑖. Figure 10 (right) shows the proportion of inspiration boxes that got multiple individual inspirations marked (by at least two raters). This metric measures the effect of a box as one unit, as each box is meant to represent a coherent cluster. Here too, our method substantially outperforms the best baseline (GloVe search on purpose spans). Again, nearest-span summarization is preferred to TextRank. Importantly, for both individual inspiration spans and inspiration boxes, a large share are rated as useful – high figures considering the challenging nature of the task.

In this paper we introduced a novel span-based representation of ideas in terms of their fine-grained purposes and mechanisms, and used it to develop new tools for creative ideation. We trained a model to extract spans from a noisy, real-world corpus of products. We used this representation to help search for alternative, uncommon uses of products and to generate a graph capturing abstract similarities in idea repositories, to help problem-solvers explore the design space around their problem. In both studies, we achieved high accuracies, significantly outperformed baselines and helped boost user creativity. In future work, we would like to further explore weak supervision approaches to augment annotation in noisy settings. Another direction is learning purposes and mechanisms in an end-to-end fashion.
Another exciting prospect is deploying our search engine publicly, allowing scientists, engineers and designers to perform rich queries, discover new similarities, and boost innovation with enhanced capabilities not possible with today’s search. Beyond supporting richer search for creative inspiration, a data-driven approach to extracting functional facets and learning abstractive relationships between the facets could power much more expansive approaches to mapping out design spaces for entire domains or problem areas, identifying key subproblems and constraints and novel paths through the design space. Mapping approaches like this, such as technological roadmapping [9], have already shown significant promise for reinvigorating research and development in real-world applications such as neural recording [48]. However, these mapping exercises are still highly manual and labor-intensive processes; computational support for such tasks could have transformative impacts on innovation.
REFERENCES
[1] The car mechanic who uncorked a childbirth revolution. BBC News, 2013.
[2] G. Aguilar, S. Maharjan, A. P. L. Monroy, and T. Solorio. A multi-task approach for named entity recognition in social media data. 2017.
[3] A. Akbik, T. Bergmann, D. Blythe, K. Rasul, S. Schweter, and R. Vollgraf. FLAIR: An easy-to-use framework for state-of-the-art NLP. In NAACL-HLT, 2019.
[4] A. Akbik, T. Bergmann, and R. Vollgraf. Pooled contextualized embeddings for named entity recognition. In NAACL, 2019.
[5] A. Akbik, D. Blythe, and R. Vollgraf. Contextual string embeddings for sequence labeling. In International Conference on Computational Linguistics, 2018.
[6] G. Altshuller. 40 Principles: TRIZ Keys to Innovation. 2002.
[7] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In ACM-SIAM Symposium on Discrete Algorithms, 2007.
[8] I. Augenstein, M. Das, S. Riedel, L. Vikraman, and A. McCallum. SemEval 2017 Task 10: ScienceIE - extracting keyphrases and relations from scientific publications. arXiv preprint arXiv:1704.02853, 2017.
[9] E. S. Boyden and A. H. Marblestone. Architecting discovery: A model for how engineers can help invent tools for neuroscience. Neuron, 102(3):523–525, May 2019.
[10] J. Chan, J. Chang, T. Hope, D. Shahaf, and A. Kittur. SOLVENT: A mixed initiative system for finding analogies between research papers. CSCW, 2018.
[11] J. Chan, S. P. Dow, and C. D. Schunn. Do the best design ideas (really) come from conceptually distant sources of inspiration? Design Studies, 2015.
[12] J. Chan, K. Fu, C. Schunn, J. Cagan, K. Wood, and K. Kotovsky. On the benefits and pitfalls of analogies for innovative design: Ideation performance based on analogical distance, commonness, and modality of examples. Journal of Mechanical Design, 2011.
[13] J. Chan, T. Hope, D. Shahaf, and A. Kittur. Scaling up analogy with crowdsourcing and machine learning. In ICCBR-16.
[14] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019.
[15] K. Dorst. The core of “design thinking” and its application. Design Studies, 2011.
[16] H. Dubberly and S. Evenson. On modeling: The analysis-synthesis bridge model. Interactions, 2008.
[17] J. R. Duflou and P.-A. Verhaegen. Systematic innovation through patent based product aspect analysis. CIRP Annals - Manufacturing Technology, 2011.
[18] K. Essick. Technology scouts: hoping to find the next big thing.
Science Business , Feb. 2006.[19] K. Essick. Innovation and creativity in a time of crisis.
Science Business , 2020.20 caling Creative Inspiration withFine-Grained Functional Facets of Product Ideas , , [20] M. Färber, F. Bartscherer, C. Menne, and A. Rettinger. Linked data quality of dbpedia, freebase, opencyc, wikidata, and yago.
Semantic Web , 2018.[21] L. Fleming. Recombinant uncertainty in technological search.
Management science , 47(1):117–132, 2001.[22] K. Fu, J. Cagan, K. Kotovsky, and K. L. Wood. Discovering Structure In Design Databases Through Functional And Surface Based Mapping.
JMD ,2013.[23] K. Fu, J. Chan, J. Cagan, K. Kotovsky, C. Schunn, and K. Wood. The Meaning of Near and Far: The Impact of Structuring Design Databases and theEffect of Distance of Analogy on Design Output.
JMD , 2013.[24] K. Fu, J. Chan, C. Schunn, J. Cagan, and K. Kotovsky. Expert representation of design repository space: A comparison to and validation of algorithmicoutput.
Design Studies , 2013.[25] D. Gentner and K. J. Kurtz. Relational Categories. In
Categorization inside and outside the laboratory: Essays in honor of Douglas L. Medin , APAdecade of behavior series. American Psychological Association, Washington, DC, US, 2005.[26] D. Gentner and A. B. Markman. Structure mapping in analogy and similarity.
American psychologist , 1997.[27] K. Gericke and B. Eisenbart. The integrated function modeling framework and its relation to function structures.
AI EDAM , 2017.[28] M. L. Gick and K. J. Holyoak. Analogical problem solving.
Cognitive psychology , 12(3):306–355, 1980.[29] K. Gilon, J. Chan, F. Y. Ng, H. Lifshitz-Assaf, A. Kittur, and D. Shahaf. Analogy mining for specific design needs. In
Proceedings of the 2018 CHIConference on Human Factors in Computing Systems , CHI ’18, pages 121:1–121:11. ACM, 2018.[30] K. Goucher-Lambert and J. Cagan. Crowdsourcing inspiration: Using crowd generated inspirational stimuli to support designer ideation.
DesignStudies , 2019.[31] J. P. Guilford. Three faces of intellect.
American psychologist , 1959.[32] J. P. Guilford.
The nature of human intelligence . McGraw-Hill, New York, NY, 1967.[33] J. P. Guilford. The nature of human intelligence. 1967.[34] G. S. Halford, R. Baker, J. E. McCredden, and J. D. Bain. How many variables can humans process?
Psychological science , 2005.[35] J. Hirtz, R. Stone, D. A. McAdams, S. Szykman, and K. Wood. A functional basis for engineering design: reconciling and evolving previous efforts.
Research in engineering Design , 2002.[36] T. Hope, J. Chan, A. Kittur, and D. Shahaf. Accelerating innovation through analogy mining. In
KDD , 2017.[37] Z. Huang, W. Xu, and K. Yu. Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991 , 2015.[38] D. Jin and P. Szolovits. Hierarchical neural networks for sequential sentence classification in medical scientific abstracts. arXiv preprintarXiv:1808.06161 , 2018.[39] T. N. Kipf and M. Welling. Semi-Supervised Classification with Graph Convolutional Networks. sep 2016.[40] A. Kittur, L. Yu, T. Hope, J. Chan, H. Lifshitz-Assaf, K. Gilon, F. Ng, R. E. Kraut, and D. Shahaf. Scaling up analogical innovation with crowds and ai.
PNAS , 2019.[41] Y. Koren and R. Bell. Advances in collaborative filtering. In
Recommender systems handbook , pages 77–118. Springer, 2015.[42] T. Kuribayashi, H. Ouchi, N. Inoue, P. Reisert, T. Miyoshi, J. Suzuki, and K. Inui. An empirical study of span representations in argumentationstructure parsing. In
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , pages 4691–4698, Florence, Italy, July2019. Association for Computational Linguistics.[43] S. Lahiri, V. V. Vydiswaran, and R. Mihalcea. Identifying usage expression sentences in consumer product reviews. In
Proceedings of the EighthInternational Joint Conference on Natural Language Processing (Volume 1: Long Papers) , pages 394–403, 2017.[44] D. B. Lenat. Cyc: a large-scale investment in knowledge infrastructure. In
Communications of the ACM , 1995.[45] D. B. Lenat and R. V. Guha.
Building large knowledge-based systems; representation and inference in the Cyc project . Addison-Wesley LongmanPublishing Co., Inc., 1989.[46] J. Linsey, A. Markman, and K. Wood. Design by analogy: a study of the wordtree method for problem re-representation.
JMD , 2012.[47] Y. Luan, L. He, M. Ostendorf, and H. Hajishirzi. Multi-task identification of entities, relations, and coreferencefor scientific knowledge graphconstruction. In
Proc. Conf. Empirical Methods Natural Language Process. (EMNLP) , 2018.[48] A. H. Marblestone, B. M. Zamft, Y. G. Maguire, M. G. Shapiro, T. R. Cybulski, J. I. Glaser, D. Amodei, P. B. Stranges, R. Kalhor, D. A. Dalrymple,D. Seo, E. Alon, M. M. Maharbiz, J. M. Carmena, J. M. Rabaey, E. S. Boyden, G. M. Church, and K. P. Kording. Physical Principles for ScalableNeural Recording.
Frontiers in Computational Neuroscience , 7, 2013. arXiv: 1306.5709.[49] D. Marcheggiani and I. Titov. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. 1, 2017.[50] R. Mihalcea and P. Tarau. Textrank: Bringing order into text. In
EMNLP , 2004.[51] G. A. Miller. WordNet: a lexical database for English.
Communications of the ACM , 38(11):39–41, 1995.[52] G. A. Miller. Wordnet: a lexical database for english.
Communications of the ACM , 1995.[53] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, et al. Never-ending learning.
Communications of the ACM , 2018.[54] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.[55] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In
Database Theory—ICDT’99 . 1999.[56] J. Pennington, R. Socher, and C. D. Manning. Glove: Global vectors for word representation. In
EMNLP , 2014.[57] M. E. Peters, S. Ruder, and N. A. Smith. To tune or not to tune? adapting pretrained representations to diverse tasks. In
RepL4NLP@ACL , 2019.[58] N. Reimers and I. Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In
EMNLP , 2019.21 , Hope et al. [59] M. Sachan and E. Xing. Self-training for jointly learning to ask and answer questions. In
NAACL-HLT , 2018.[60] H. Schütze, C. D. Manning, and P. Raghavan. Introduction to information retrieval. In
International communication of association for computingmachinery conference , 2008.[61] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentimenttreebank. In
Proceedings of the 2013 conference on empirical methods in natural language processing , pages 1631–1642, 2013.[62] S. Vattam, B. Wiltgen, M. Helms, A. K. Goel, and J. Yen. DANE: Fostering Creativity in and through Biologically Inspired Design. In
DesignCreativity 2010 . 2011.[63] L. Yu, A. Kittur, and R. E. Kraut. Searching for analogical ideas with crowds. In
CHI , 2014.[64] L. Yu, B. Kraut, and A. Kittur. Distributed analogical idea generation: innovating with crowds. In
CHI’14 , 2014.[65] L. Yu, R. E. Kraut, and A. Kittur. Distributed analogical idea generation with multiple constraints. In
Proceedings of the 19th ACM Conference onComputer-Supported Cooperative Work & Social Computing . ACM, 2016.[66] Y. Zhang, P. Qi, and C. D. Manning. Graph convolution over pruned dependency trees improves relation extraction. In