Understanding Emails and Drafting Responses -- An Approach Using GPT-3
Jonas Thiergart, Stefan Huber, Thomas Übellacker
University College Maastricht
2021-02-04

Introduction
Email is an essential medium for digital communication. Today, more than 300 billion emails are sent and received daily. Although new means of communication are continuously developed, the email market is still predicted to grow to more than 360 billion emails and 4.48 billion users by 2024 (Clement, 2020).

Nevertheless, day-to-day email management is a flawed process. First of all, it takes time: professionals spend some 2.6 hours a day checking, reading, responding to and writing emails (Plummer, 2019), which implies that the average knowledge worker "spends an estimated 28 per cent of the workweek managing email" (McKinsey & Company, 2012, para. 4). Secondly, it is inefficient: one study found that among professionals, "over 50% of messages received . . . on mobile email devices were not directly addressed to the recipient" (Mazmanian et al., 2013, p. 7). Moreover, in this study, "for every eight work-related email messages participants received on mobile email devices . . . they replied to, or initiated, only one email message" (p. 7). Other arguments for the inefficiency of email clients exist, but the point is rarely disputed. A survey on email customer service interactions found that "66% of U.S. consumers say the most important thing a brand can do is value their time" (Sacks, 2019, para. 1). Email communication in its current form squanders vast quantities of what is arguably our most precious resource. Moreover, aside from being a waste of human potential, labour time spent on routine email management, especially that of highly trained professionals, engenders high opportunity costs from a business perspective.

Innovation in Natural Language Processing (NLP) technology could be the tool to fix this problem. The recently released NLP technology known as GPT-3, in particular, might expand the possibilities for automating email management tasks. In this research paper, we draw on both technological and economic knowledge to explore these possibilities.
We ultimately seek to answer the question: can GPT-3 be applied in an economically viable way to understanding incoming emails and rationalising the human effort of responding to them? To that end, we first set the stage by providing a general overview of the key terms and principles behind NLP in general and GPT-3 in particular, as well as their properties. Next, we argue for the thesis that it is possible and economically viable to use GPT-3 to rationalise specific email communication processes. Our argumentation proceeds by substantiating three premises that, together, support the thesis. First, we determine whether GPT-3 can answer emails with sufficient quality to augment human effort; to that end, we discuss GPT-3's capabilities and propose a technical approach. Second, we consider challenges relating to this application, such as GPT-3's lack of access to business-internal knowledge, its bias and its inaccuracy, and aim to show that they can be sufficiently addressed. Third, we consider such a product's scalability and potential users to determine commercial viability. Finally, we synthesise our findings while touching on our work's limitations.
Background Information
Natural Language Processing as an Answer to Complex Linguistics
Human language, also known as natural language, works differently from computer language. Natural language consists of many rules, exceptions and particularities. Moreover, it is ambiguous and can vary in meaning depending on context. Computer systems therefore struggle to deal with it (Knight, 2016).

The discipline of Natural Language Processing (NLP) seeks to empower them to do so. NLP "is a collective term referring to automatic computational processing of human languages. It includes both algorithms that take human-produced text as input and algorithms that produce natural looking text as outputs" (Bahja, 2020, p. 2). In other words, by combining disciplines like linguistics, statistics, data science and artificial intelligence, NLP engineers aim to model natural language in a way that allows computers to understand content (natural language understanding) and to express concepts of their own (natural language generation) in natural language (Liddy, 2001).

Over the years, NLP technology has undergone several evolutionary stages. The earliest NLP models were "simple" mathematical representations of words and their meaning that computers could deal with (Pennington et al., 2014). The technology advanced with the rise of neural networks, allowing the creation of far more sophisticated models and the incorporation of context information (Dai & Le, 2015). Except for a few linguistically peculiar fields like biomedicine, where domain-specific language models seem to keep outperforming their non-specific counterparts (Poon & Gao, 2020), the current state-of-the-art NLP models in most fields are "task-agnostic" models that are fine-tuned on a specific task. This means the models are first trained on large, general datasets and then customised for particular tasks with another, smaller dataset (Colin et al., 2019). While this makes them high-performing, Brown et al.
(2020), the creators of GPT-3, point out that in many cases, fine-tuning datasets either lack quality or are not available at all. Thus, the latest machine learning models have been trained on increasingly large natural language datasets, resulting in cross-domain language models.

NLP already creates enormous value for the economy in areas where large amounts of text data need to be understood, inferences drawn and responses drafted. For example, NLP augments financial analysts by making predictions about the stock market based on analysing written "data from internet, news, blogs and social networking sites" (Bahja, 2020, p. 5), and it allows doctors to consider text data about past cases of illness, with one test improving cancer diagnosis accuracy by 22.6% (p. 6). Moreover, NLP can summarise hard-to-read texts that are "full of abbreviations" or "spelling mistakes", and it makes customer service chatbots possible (p. 6). This has made NLP's presence felt, if not always consciously, in most people's daily lives.
Generative Pre-Trained Transformer 3 Does Not Require Training Data
Generative Pre-trained Transformer 3 (GPT-3), as a "task-agnostic" language model, could make possible even more applications of NLP. The system, which the research lab OpenAI released for testing in August 2020, can compete with previous, fine-tuned models for many use cases and on many benchmarks without any fine-tuning. According to its creators, it is pre-trained on a dataset more than ten times larger than that of any comparable model, which includes, among other sources, Wikipedia and a filtered version of the Common Crawl dataset (Brown et al., 2020). OpenAI has set up an application programming interface (API) where users access GPT-3 and "program" it by entering words in human language, giving instructions on what to do and, optionally, a few examples of the desired output. That way, GPT-3 receives a similar amount of instruction to what humans would receive when asked to perform a task (unlike older NLP models, which require large sets of training data), providing the flexibility needed for a wide range of applications. As such a general-purpose language model, GPT-3 can be applied to a wide variety of tasks. It has proven capable of generating poetry and articles indistinguishable from those of human authors, computer code for web interfaces, and product or job descriptions (Dale, 2020).

Technical Viability
Having explained the relevant concepts, we now evaluate the technical viability of using GPT-3 to understand emails and generate responses. To do so, we identify the individual steps required and determine that each can be performed.
Understanding Incoming Emails
Context Understanding Using Text Classification
The very first step in understanding an email might be to classify it. Classification is used to distinguish spam from relevant messages and helps in understanding the context of an email (Wasi et al., 2015, p. 130). In many businesses, it makes sense to classify emails in order to respond to them more efficiently. Consider, for example, a company's support email system. Incoming emails could be classified into predefined categories to prioritise and distribute them among different persons or departments. Moreover, email providers like Gmail classify consumer emails into categories like "social", "promotions", "updates" and "forums" (Izatt, 2020) by default. A more sophisticated system might allow template responses to be used, or other business processes to be triggered automatically, based on an incoming email's classification.

Common techniques for text classification range from low-level semantic analysis to more sophisticated methods like deep learning. A simple approach would be a Naive Bayes classifier considering the words in the email's body and title, assuming their order is irrelevant (Al-Alwani, 2014, p. 691). Still, a comparison of different supervised learning algorithms shows that other algorithms easily outperform Naive Bayes classification (Caruana & Niculescu-Mizil, 2006, p. 165). Those may perform well at rationalising email classification processes, but only if ample training data is available.

Because GPT-3 is so recent, academic literature on its use for text classification is limited, so Übellacker (2021) conducted empirical testing, concluding that GPT-3's capabilities are sufficient in this regard. As previously mentioned, prompts in the GPT-3 API are formulated similarly to commands that one would give to a person. As a result, providing GPT-3 with a list of possible categories, along with the email itself and a brief instruction on what to do (without giving specific examples), already works well.
GPT-3 understands categories just by their titles, adding to the algorithm's flexibility and enabling applications where no or insufficient training data is available. Depending on the length of the email body, the GPT-3 prompt is also relatively short (usually 110-300 tokens, a token being a unit of text length), which means the classification is relatively fast (Übellacker, 2021). Hence, classifying texts with GPT-3 works, especially when no training data is available. Nonetheless, it is more expensive and less accurate than classification using fine-tuned models, so if training data is available, we recommend using such models instead. However, for reasons of flexibility, in the remainder of this paper we follow the classification approach with GPT-3. The classification allows an email's context to be assessed and the email to be processed further.
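To make the classification step concrete, the following Python sketch shows how such a zero-shot prompt might be assembled. The category names, example email and prompt wording are our own illustrative assumptions (they do not reproduce Übellacker's prompts), and the actual call to the OpenAI completion API is omitted:

```python
# Sketch of a zero-shot classification prompt for GPT-3.
# Categories and the example email are illustrative assumptions.

CATEGORIES = ["Order", "Complaint", "Event invitation", "General question"]

def build_classification_prompt(email_body: str) -> str:
    """Assemble a natural-language instruction asking the model to
    assign the email to exactly one predefined category."""
    category_list = ", ".join(CATEGORIES)
    return (
        "Classify the following email into exactly one of these "
        f"categories: {category_list}.\n\n"
        f"Email:\n{email_body}\n\n"
        "Category:"
    )

prompt = build_classification_prompt(
    "Hello, I would like to order three units of item #4711."
)
```

In a live system, the returned string would be submitted to the GPT-3 completion endpoint, and the model's continuation after "Category:" would be read back as the predicted class.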
Extracting Relevant Information from Email Body
Another vital step in understanding emails is extracting information for further processing. There are several methods for extracting information from unstructured data (in this case, email bodies in plain text), including Named-Entity Recognition (NER). NER systems are used to extract named entities like organisations, geographical locations, person names or dates from natural text (Singh, 2018). Once the context category of an email has been determined in the first step, using text classification, information extraction can be used to find category-specific relevant information. In the case of an email invitation to an event, for example, NER could be used to gather event details that can then automatically be added to a calendar. Another application would be an incoming order email, where NER is used to extract information about the articles being ordered. NER technologies like Duckling (Wit.ai, 2020) and spaCy (Schmitt et al., 2019) are pre-trained to recognise certain entities like dates, numbers, distances, emails, persons, organisations and so on. They can also be trained to detect custom entities when enough training data is available (Nazakat, 2020).

In contrast, GPT-3, being pre-trained, can extract named entities with little or no training data. Test prompts show that GPT-3 can extract event information without any specific customisation (Übellacker, 2021). This information could easily be forwarded to a user's calendar system, processed further or used later for email response generation. Another example shows that it is possible to extract detailed order information from an incoming email that could potentially be forwarded automatically to a customer relations system (Übellacker, 2021).
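A prompt for this extraction step might be built as in the Python sketch below. The field names, example email and prompt wording are illustrative assumptions rather than reproductions of the cited test prompts, and the API call itself is again omitted:

```python
# Sketch of a GPT-3 extraction prompt: the model is asked to fill in
# one "Field:" line per requested entity. All names are illustrative.

def build_extraction_prompt(email_body, fields):
    """Ask the model to return one 'Field: value' line per field."""
    field_lines = "\n".join(f"{field}:" for field in fields)
    return (
        "Extract the following details from the email below. "
        "Answer with one line per field.\n\n"
        f"Email:\n{email_body}\n\n"
        f"{field_lines}"
    )

prompt = build_extraction_prompt(
    "You are invited to our product launch on 3 March 2021 "
    "at 14:00 in Berlin.",
    ["Event", "Date", "Time", "Location"],
)
```

The model's completion would then be parsed line by line, and the recovered values (for example, the date and time) forwarded to a calendar or customer relations system.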
Generating Responses With GPT-3
After email classification and information extraction, the last step toward rationalising email communication is generating a response text, which is something GPT-3 can do. Historically, natural language generation technologies suffered from the chicken-and-egg problem of needing a large, annotated dataset to train models while lacking the people to write and annotate text. Fortunately, GPT-3 offers new possibilities and is innately capable of composing text in cross-domain applications (Brown et al., 2020). In the context of email automation, GPT-3 can be used to answer emails, with the prompt fed background information gathered during the email understanding process along with the corresponding email thread. By all indications, the quality of GPT-3-generated text is in many ways similar to that of text written by humans. GPT-3 can generate grammatically correct, coherent responses. With its vast prior knowledge, it can even answer emails where the message is not completely specified, like open-ended questions (Dale, 2020). This means that, for example, if a user asks GPT-3 for a nice restaurant in a given city, it will be able to recommend a place to eat. Even more remarkably, it can do so in a conversational manner and mimic empathy (Aronsson et al., 2020), allowing the email recipient to feel understood.

Nevertheless, GPT-3 sometimes interprets messages awkwardly. It is capable of answering general emails but has trouble in specific cases. For example, if it is asked something that requires information not in its training dataset, such as internal business data, it cannot respond appropriately. However, the GitHub repository by Übellacker (2021) shows that GPT-3 can achieve impressive results when given enough context. This repository does not show exhaustive testing with different inputs (which calls for further research), but it nevertheless seems likely that GPT-3 can be used in a way that allows it to understand text with great accuracy.
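The assembly of such a response prompt, combining the classification result, the extracted facts and the email thread, can be sketched as follows. All names and the prompt wording are our own illustrative assumptions, and the completion call is omitted:

```python
# Sketch: assembling a response-generation prompt from the artefacts
# of the earlier steps (category, extracted facts, thread).
# Names and wording are illustrative assumptions.

def build_response_prompt(category, facts, thread):
    """Combine background information and the thread into one prompt."""
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return (
        f"You are drafting a reply to a '{category}' email.\n"
        f"Relevant background information:\n{fact_lines}\n\n"
        f"Email thread:\n{thread}\n\n"
        "Polite, concise reply:"
    )

draft_prompt = build_response_prompt(
    "Order",
    {"Customer": "J. Smith", "Item": "#4711", "Quantity": "3"},
    "Hello, I would like to order three units of item #4711.",
)
```

The text GPT-3 generates after "Polite, concise reply:" would then be presented to the user as a draft rather than sent directly, for the reasons discussed below.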
Solving Main Limitations
In the previous section, we showed that using GPT-3 to understand emails and respond to them works in theory. However, this approach has inherent technical limitations that could make it unworkable in practice. In this section, we argue that these can be addressed.
GPT-3 Makes Mistakes
One of the main limitations of GPT-3 is its lack of reliability. In some cases, especially as a result of weak instructions, the produced output is wrong in content, reflects undesirable biases from the training data such as racism (Floridi & Chiriatti, 2020) or is simply nonsense (Dale, 2020). There are specific, fine-tuned models capable of detecting such issues in GPT-3's output (Al-Hassan & Al-Dossari, 2019), so it is advisable to implement additional detection software in the architecture. Nevertheless, these measures leave room for mistakes. Thus, it becomes clear that GPT-3 is currently not suited to full automation. Instead, we propose retaining a "human in the loop": the solution should be understood as a tool to augment people and support their email writing rather than as a replacement. This could be achieved by adding functionality to email clients that allows users to choose whether to fully accept, edit or discard automatically drafted emails.
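As a minimal illustration of this human-in-the-loop principle, the gatekeeping logic might look as follows in Python; the function and decision names are illustrative assumptions, not part of any existing email client:

```python
# Minimal human-in-the-loop sketch: no automatically drafted email is
# sent without an explicit user decision. The three actions mirror the
# options described above; all names are illustrative.

def review_draft(draft, decision, edited_text=None):
    """Return the text to send, or None if the draft is discarded."""
    if decision == "accept":
        return draft
    if decision == "edit":
        return edited_text
    if decision == "discard":
        return None
    raise ValueError(f"unknown decision: {decision}")
```

The important design point is that the drafting step and the sending step are decoupled: the model only ever produces candidates, and a person remains responsible for what leaves the outbox.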
GPT-3 Is a Few-Shot-Learner
Another limitation relates to the fact that, as previously mentioned, GPT-3 works without large fine-tuning datasets. By its nature, the language model has knowledge about almost any domain. It learns from only a few instructions on what to do and, ideally, some examples of the desired output (few-shot learning) (Wang et al., 2020). We described above how GPT-3 therefore performs well on tasks that require only general information but struggles with tasks that require further knowledge. However, when writing emails in a work environment (which may account for a large portion of emails), it is often necessary to use exactly the kind of business-internal information that GPT-3 has never seen. In the following, we propose a solution that allows GPT-3 to be augmented with business-internal information.

To determine how to include business-internal information in emails, one must first understand that it comes in a variety of forms. In many cases, information is stored in enterprise resource planning (ERP) or customer relationship management (CRM) systems, wikis, email archives or product documentation. Moreover, recent years have seen a steady increase in unstructured data, which is less accessible to software systems. While structured data like tables and databases are often easier to integrate, GPT-3 might also need to consider information from PDFs, images and video files (Harvard Business Review, 2020).

The main obstacle to augmenting GPT-3 with business-internal information is that, since the GPT-3 prompt cannot be fed large amounts of data (OpenAI, 2020), it needs to be provided with precisely the right data. To address this, we propose a software architecture that allows GPT-3 to analyse an email's content and evaluate which information topics are most relevant for responding to it.
Thus, we need a solution that can search through large amounts of unstructured resources and return textual answers to questions.

Cloud service providers offer different AI services that could perform this task. Current cloud search services that operate over large amounts of unstructured business data are not based on mere keyword matching; they understand content and context to deliver precise search results (Harvard Business Review, 2020; Talia, 2013). Most services, like Kendra by Amazon Web Services (AWS) or IBM's Watson Discovery, return the paragraphs most likely to contain the answer to a given question or set of keywords. They are straightforward to adapt for new customers and have many connectors to different data sources such as SharePoint or Google Drive (which becomes relevant when companies use cloud services to store their data). From the perspective of economic viability, Watson Discovery is the most affordable option for this type of search (Robinson, 2020; Amazon, 2020; IBM, 2020).

Azure Cognitive Search, Microsoft's search service, is in a similar price range to Watson Discovery, but it works differently. Instead of a single service that returns passages in response to searches, Azure's approach to searching through unstructured data envisions a pipeline of different services configured and connected in sequence. In contrast to the previously analysed services, Azure Cognitive Search itself only searches through structured data. It therefore serves as the final step of a series of other AI models used to prepare information, rather than as an independent service.

Engineers in the discipline of knowledge mining develop approaches that aim to extract insights from this great variety of unstructured information.
To that end, knowledge mining applies pre-built AI services such as computer vision, sentiment analysis or language recognition to extract key information from these different types of data and make it accessible as structured data (Harvard Business Review, 2020). Besides the option for custom models, Azure offers predefined knowledge-mining models, such as computer vision and layout understanding, that enrich the data, i.e. derive structured information from unstructured content. Cognitive Search thereby allows searching through a range of unstructured input such as images, audio files and even videos, which goes far beyond the capabilities of other providers' search services (Microsoft, 2020).

Consequently, while Watson Discovery might be a good fit for companies that only want GPT-3 to consider information from text documents, Azure Cognitive Search with its affiliated functions (Robinson, 2020) currently appears to be a workable, and the most suitable, tool for providing GPT-3 with the business-internal information necessary for drafting email responses.
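The overall architecture, searching a business-knowledge store for relevant passages and feeding only those into the prompt, can be sketched in Python as follows. The cloud search service is replaced here by a trivial word-overlap stub for illustration; a real deployment would call the API of a service such as Kendra, Watson Discovery or Azure Cognitive Search, and all names below are our own assumptions:

```python
# Sketch of retrieval-augmented prompting: find the passages most
# relevant to the incoming email, then ground the GPT-3 prompt in
# them. The search step is a trivial word-overlap stub standing in
# for a real cognitive search service.

def search_passages(query, documents, limit=2):
    """Stand-in for a cloud search service: rank documents by the
    number of words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:limit]

def build_grounded_prompt(email_body, documents):
    """Feed only the retrieved passages, not the whole store,
    into the prompt (the prompt length is limited)."""
    passages = search_passages(email_body, documents)
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        f"Company knowledge:\n{context}\n\n"
        f"Email:\n{email_body}\n\n"
        "Reply using only the knowledge above:"
    )

documents = [
    "Our return policy allows returns within 30 days of purchase.",
    "The office cafeteria opens at 8 am on weekdays.",
]
grounded = build_grounded_prompt("What is your return policy?", documents)
```

The design choice this illustrates is that the search service, not GPT-3, bears the burden of selecting the right business-internal facts, keeping the prompt within its size limit.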
Economic Viability
Having demonstrated that applying GPT-3-based software to rationalising email understanding and drafting is feasible, and that technical solutions exist for the main challenges, the next step is to determine whether deploying it at scale is economically viable. This depends on two types of factors: financial factors like the cost structure for the service provider and the benefit to customers, and the demand side, i.e. the potential size of the market for this service. We devote a section to each, beginning with the financial factors.
Cost Analysis
Generally, software products are easily scalable because of their near-zero marginal cost (MC). The physical infrastructure like cloud services (the main cost factor for scaling) is becoming so cheap that one theorist coined the term "zero marginal cost society" (Rifkin, 2014) to describe the situation. Diseconomies of scale (which limit scalability) only take effect when the long-run average cost per unit (LRAC) increases with the quantity produced; a constant (in this case, near-zero) MC implies that LRAC cannot increase.

In the case of GPT-3-based email rationalisation, MC is not zero but still constant, meaning such a service would be easy to scale (provided customers were prepared to pay a price above MC for it). The reason for the non-zero MC is that GPT-3's creators' business model is to receive payment in proportion to the amount of text processed by the GPT-3 software. This payment will be on the order of 6 dollar cents per 1,000 "tokens" of text (MLK, 2020). If this were the only MC, the average cost per unit would merely approach MC in the long run, leading to no increase in LRAC and, ergo, no diseconomy of scale. However, other costs of providing the service arise: the "customisation costs" of adapting the software to a new customer (integrating business-internal data). These may indeed increase LRAC if one starts with the customers whose customisation costs are low. Overall, the product may therefore not necessarily be scalable enough to fill the whole market demand. Still, it appears the MC of the product would initially be constant (there are many similar customers, as we will show below), possibly making it scalable to a degree that would allow a provider to pay its fixed costs (e.g., for labour and a flat rate for the search service), rendering the product profitable.

The thoughts on scalability so far focused on showing that there would be no (early) diseconomy of scale, since there is no change in LRAC as quantity increases.
It was still not clear whether adopting the technology saves costs. To answer that question, we make and compare rough estimates of the cost of generating a given length of text with GPT-3 versus writing it manually. For the sake of argument, consider an email with a length of 500 words. Given that human adults read 300-575 words per minute (Nelson, 2012), a human would need around one minute to read such an input text, plus a longer but hard-to-estimate time to write a response. Assuming workers reading this text were paid the California minimum wage of 15 dollars per hour (US Department of Labor, 2021), the wage cost for merely reading it would be around 25 cents. The total cost of responding would be significantly higher. To estimate the cost of responding with GPT-3, one needs to estimate the total number (and therefore price) of tokens required to generate a response, in addition to the wage cost of manually editing out any inaccuracies before sending. We know that "2048 tokens translated to words will be ~ 1500 words" (He, 2020, para. 3), implying that a 500-word email will be around 700 tokens long. The total number of tokens necessary for generating an email response is greater than the length of the email itself because of how GPT-3 functions. To create a response email using the business information connector, one needs to feed text into GPT-3 several times, which, based on our proposed setup, totals a fixed 63 tokens plus twice the length of the preceding email plus once the length of the generated response R (Übellacker, 2021). This works out to 63 + (2 × 700) + R = 1,463 + R tokens. At 6 cents per 1,000 tokens, the GPT-3 cost is at least about 8.8 cents. If one assumes the response is as long as the received email (700 tokens), the total GPT-3 cost increases to about 13 cents. Therefore, while the cost of manually responding is 25 cents plus the wage cost of writing a response, the cost of using GPT-3 is 13 cents plus the wage cost of editing a response.
Assuming that editing a reply takes less time than writing one from scratch (and that wages are usually above the minimum wage), the difference becomes even greater. As a result, the effective cost of manually responding to emails is higher than the constant marginal cost of doing so with GPT-3. Therefore, applying GPT-3-based software at scale would save users time and money even at a price well above MC, and would therefore be economically viable for providers.
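The estimate above can be reproduced with a short Python calculation. The token counts and the price follow the figures given in the text; the function name is our own:

```python
# Worked version of the cost estimate: a fixed prompt overhead of
# 63 tokens, the incoming email fed in twice, plus the generated
# response of length R tokens, priced at 6 cents per 1,000 tokens.

PRICE_PER_1000_TOKENS = 0.06  # dollars (MLK, 2020)

def gpt3_cost(email_tokens, response_tokens):
    """Dollar cost of answering one email via the proposed setup."""
    total_tokens = 63 + 2 * email_tokens + response_tokens
    return total_tokens / 1000 * PRICE_PER_1000_TOKENS

# A 500-word email is roughly 700 tokens.
minimum = gpt3_cost(700, 0)        # ≈ 0.088 dollars (8.8 cents)
same_length = gpt3_cost(700, 700)  # ≈ 0.13 dollars (13 cents)
```

This makes explicit how the per-email GPT-3 cost scales linearly in both the incoming email's length and the response length R.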
Market Demand
Aside from considering the costs of using GPT-3 to rationalise email communication, one must identify the potential demand to accurately judge whether it is an economically viable solution. To do so, one must define criteria users need to fulfil; in this case, what type of entity they are, as well as how many and what type of emails they send in their value creation chain.

We know that around 28% of knowledge workers' work hours are devoted to email management (McKinsey & Company, 2012). It seems reasonable to assume that most email management happens at work and that employers generally provide employees with work equipment. Hence, the natural customers for the described product appear to be employers, i.e., private firms or the government. The answer to the question of what type of entity users are therefore seems clear.

The more interesting question concerns the nature and role of emails in their value creation chain. From the background information section, we know that some fields involve a linguistic complexity that general language models like GPT-3 "understand" poorly. On the other hand, certain types of messages, like advertisements, do not need to be generated; they are simply reused verbatim. Therefore, the subject matter to be understood and responded to must be neither overly complex nor too simple. Another issue is that of importance: at least until the technology becomes more reliable, firms might prefer to draft certain highly consequential documents, like deal proposals, manually. Finally, because the marginal costs of the software include the per-volume fee OpenAI charges for use of the GPT-3 API, profits and pricing depend on the quantity of text understood or generated, signifying that firms with a high volume of email traffic are preferable customers.
Therefore, the ideal users would be corporate or public employers whose work processes involve understanding and responding to large volumes of "semi-complicated" text where the possible damage of a misunderstanding is not excessively high.

What types of firms, then, fulfil these criteria? Our research indicates that a market for GPT-3-based email rationalisation exists in several different sectors of the economy, of which we shall explore just a few. In all of them, the damage of a small mistake in wording seems minor, as the content generally involves neither vast amounts of money nor human safety.

The insurance industry, where correspondence consists primarily of handling "insurance claims and policy adjustments" (Stoeckli et al., 2018, p. 299), represents one example. Studies of this sector have found that even in the digital world, direct communication between two people results in a significant increase in trust and conviction (Maas & Bühler, 2015, p. 22), implying that completely automating insurance customer service is not desirable. At the same time, response speed and accessibility are considered two critical customer wishes (p. 22), which has led to a shift from offline media like call centres to online media like email or apps at all stages of the customer journey (p. 22). Moreover, surveyed experts estimate the automation potential at 28% of the industry's value creation (p. 38). These combined trends signify that GPT-3-based email customer service with a "human in the loop" could meet many of the insurance sector's future demands. Although insurance markets differ in the frequency of customer interaction, with, e.g., health insurance usually requiring more customer service than life insurance (Stoeckli et al., 2018, p. 299), the volume of communication is likely enormous, as health insurance coverage alone is near-universal in many countries (New York State, 2011).

The market for utilities such as energy faces similar trends.
Analyses show that "administrative processes in customer management and billing (including changes in provider, address, or product) are proliferating. Distributed generation and multiple channels are resulting in more convoluted and error-prone processes" (Peters et al., 2016). At the same time, it is becoming clear that "traditional energy retailers need to adapt to the digital age—fast. Unencumbered by large sales forces and expensive call centres, challengers keep costs low by communicating with customers primarily online" (Lehrke et al., 2018). Again, the combined developments of increasing administrative processes and a shift toward online communication imply that the utilities sector's customer service might present a market for rationalised email communication. The volume of traffic is considerable, seeing as utilities, perhaps even more so than health insurance, is a near-universal market in terms of number of customers.

A third market for GPT-3-based email rationalisation may be email communication in public administration. For example, the German government recently introduced a secure email client known as De-Mail, designed to bundle all communication between citizens or private firms and the federal government's institutions into a single digital channel. This includes applications for transfer payments like maternity support, customs clearances, information provided for court proceedings, answers to general questions, and so on (Bundesverwaltungsamt, 2018). As with commercial firms, the government responds to citizens' demand for practical communication, and to cost incentives, by shifting the bulk of communication to online media like email. The conditions for using GPT-3 with a human in the loop are met: there exists a large volume of not completely standardisable email communication.
The general point applies more broadly: according to the latest UN E-Government Survey, the number of countries that provide government services via email or apps has risen significantly in all sectors (UN Department of Economic and Social Affairs, 2020).

These three examples indicate that a considerable market exists for the type of rationalised email communication delivered by GPT-3-based software. Combined with the cost-effectiveness of the technology shown earlier, this suggests that the already low fixed costs of developing the software would become negligible if the product were scaled to fill even a fraction of the potential market. Ergo, this technology would probably be economically viable.

Conclusion
In this paper, we explored the possibility of rationalising email communication using GPT-3. First, we demonstrated the technical feasibility of understanding incoming emails and generating responses, drawing on literature from the disciplines of software engineering and data science. We found that it is indeed possible to use GPT-3 to classify emails, apply named-entity recognition and, ultimately, generate emails. Second, we applied knowledge from business studies and, again, software engineering to identify ways to tackle the challenges we encountered. We showed that company-internal information can be incorporated without conventional fine-tuning. Moreover, weaknesses inherent in GPT-3, such as bias and the likelihood of mistakes, can be tackled by means of an appropriate human-in-the-loop implementation. Third, we argued for the economic viability of such a solution by analysing costs and market demand, using common economic principles as well as market research literature. To that end, we showed that the marginal cost is relatively constant (leading to an absence of diseconomies of scale, indicating scalability), estimated that the technology would save costs compared to manual email drafting and, finally, showed that even given its limitations, a large demand for such a solution exists in fields from customer service in insurance or utilities to public administration. We conclude that applying GPT-3 to rationalising email communication is feasible both technically and economically, assuming the particular architecture we propose. Consequently, this idea might be promising for entrepreneurs.

However, we realise our findings are limited. For instance, we cannot know whether workers would accept this solution or whether the time spent editing responses is indeed far below that spent writing them.
Furthermore, a major limitation results from customisation costs: in our proposed solution, GPT-3 accesses company-internal knowledge through cognitive search services. Depending on a company's information structure, the cost of making this knowledge available to search services might surpass any possible savings. To answer such questions, one would need to conduct further experimental testing. Another caveat is that the GPT-3 prompts cited in this paper are not optimised and need further improvement. Moreover, in this study, we ignored the possibility that other, disruptive technologies might replace traditional email communication.

Despite its limitations, the study honed our understanding of the nature and scope of use cases of general language models in a business context. We showed that GPT-3, a pre-trained model, can be applied to dynamic real-life use cases where previously a large amount of data was necessary to train a fine-tuned model. This realisation could have significant implications for future business process design. We are optimistic that entrepreneurship and further research into this idea will reduce the inefficiency of email communication.
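The human-in-the-loop safeguard recalled in the conclusion can take many forms. One minimal sketch, under our own assumptions rather than the paper's specified design, gates each generated draft on the model's average per-token log-probability: confident drafts go out, uncertain ones are routed to a human editor. The threshold value and the helper `needs_human_review` are hypothetical and would have to be tuned empirically.

```python
# Illustrative human-in-the-loop gate (our own sketch, not the paper's exact
# architecture): a draft is released automatically only if the model's
# average per-token log-probability is above a threshold.

REVIEW_THRESHOLD = -0.30  # hypothetical cut-off, to be tuned empirically

def needs_human_review(token_logprobs: list) -> bool:
    """Return True if the draft should be routed to a human editor."""
    if not token_logprobs:  # no evidence about confidence -> always review
        return True
    avg = sum(token_logprobs) / len(token_logprobs)
    return avg < REVIEW_THRESHOLD

# A confident draft passes through; an uncertain one is flagged for review.
confident = needs_human_review([-0.05, -0.10, -0.02])
uncertain = needs_human_review([-1.20, -0.80, -2.10])
```

Such a gate directly bears on the limitation noted above: the lower the threshold is set, the less editing time is spent, but the greater the exposure to the mistakes and bias inherent in the model.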
Reference List
Al-Alwani, A. (2014). A novel email response algorithm for email management systems. Journal of Computer Science, (4), 689-696. https://doi.org/10.3844/jcssp.2014.689.696

Al-Hassan, A., & Al-Dossari, H. (2019). Detection of hate speech in social networks: A survey on multilingual corpus. Computer Science & Information Technology (CS & IT). https://doi.org/10.5121/csit.2019.90208

Amazon Web Services Inc. (2021). Amazon Kendra: Developer guide. Amazon. https://docs.aws.amazon.com/kendra/

Aronsson, J., Lu, P., Strüber, D., & Berger, T. (2020). A maturity assessment framework for conversational AI development platforms. arXiv.org. https://doi.org/10.1145/3412841.3442046

Bahja, M. (2020). Natural language processing applications in business. IntechOpen. https://doi.org/10.5772/intechopen.92203

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. arXiv:2005.14165

Bundesverwaltungsamt. (2018). De-Mail-Einführung in der Bundesverwaltung - Einsatzbereiche und Anwendungsbeispiele [De-Mail introduction in the federal administration - areas of use and application examples].

Proceedings of the 23rd International Conference on Machine Learning - ICML '06.

Advances in Neural Information Processing Systems 28 (p. 1). Curran Associates, Inc.

Dale, R. (2021). GPT-3: What's it good for? Natural Language Engineering. https://doi.org/10.1017/S1351324920000601

Elkins, K., & Chun, J. (2020, September 14). Can GPT-3 pass a writer's Turing test? Journal of Cultural Analytics. https://doi.org/10.22148/001c.17212

Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines. https://doi.org/10.1007/s11023-020-09548-1

Harvard Business Review. (2019, November 21). Knowledge mining: The next wave of artificial intelligence-led transformation. Harvard Business Review. https://hbr.org/sponsored/2019/11/knowledge-mining-the-next-wave-of-artificial-intelligence-led-transformation

He, C. (2020, September 16). Understand the pricing of GPT3. Medium. https://chengh.medium.com/understand-the-pricing-of-gpt3-e646b2d63320

Izatt, M. (2020, January 15). Making Gmail's tabbed inbox work better for you. Google Cloud Blog. https://cloud.google.com/blog/products/gmail/how-gmail-sorts-your-email-based-on-your-preferences

Knight, W. (2016, August 9). AI's language problem.

The digital energy retailer.

Lambda Blog. https://lambdalabs.com/blog/demystifying-gpt-3/

Liddy, E. D. (2001). Natural language processing. SURFACE: The institutional repository for Syracuse University. https://surface.syr.edu/istpub/63/

Maas, P., & Bühler, P. (2015). Industrialisierung der Assekuranz in einer digitalen Welt [Industrialisation of the insurance industry in a digital world].

Mazmanian, M., Orlikowski, W. J., & Yates, J. (2013). The autonomy paradox: The implications of mobile email devices for knowledge professionals. Organization Science, (5), 1337-1357. https://doi.org/10.1287/orsc.1120.0806

McKinsey & Company. (2012, July 1). The social economy: Unlocking value and productivity through social technologies.

A developer's guide to building AI-driven knowledge mining solutions. Cloud Computing Services | Microsoft Azure. https://azure.microsoft.com/en-us/resources/a-developers-guide-to-building-ai-driven-knowledge-mining-solutions/

MLK. (2020, September 4). OpenAI GPT-3 pricing revealed - Bad news for hobbyists. Machine Learning Knowledge. https://machinelearningknowledge.ai/openai-gpt-3-pricing/

Nazakat, A. (2020, June 19). Chatbot: A conversational agent employed with named entity recognition model using artificial neural network. arXiv.org. arXiv:2007.04248

Nelson, B. (2012). Do you read fast enough to be successful?

Foreign countries with universal health care.

About OpenAI. https://openai.com/about/

OpenAI. (2021). Safety best practices. OpenAI API Documentation. Retrieved January 16, 2021, from https://beta.openai.com/docs/safety-best-practices

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.3115/v1/d14-1162

Peters, P., Booth, A., & Moor, N. (2016, May 12). The digital utility: New opportunities and challenges.

Microsoft Research.

The zero marginal cost society: The Internet of things, the collaborative commons, and the eclipse of capitalism. St. Martin's Press.

Robinson, S. (2020, April 9). Enterprise search software comparison. SearchContentManagement. https://searchcontentmanagement.techtarget.com/feature/Enterprise-search-software-comparison

Sacks, R. (2019, March 1). Use natural language processing to automate customer support. Medium. https://medium.com/ibm-watson/use-ibm-watson-natural-language-classifier-to-automate-customer-support-b35c2761211c

Sagar, R. (2020, June 3). OpenAI releases GPT-3, the largest model so far. Analytics India Magazine. https://analyticsindiamag.com/open-ai-gpt-3-language-model/

Sewain, A. (2020). How NLP can increase efficiency by reducing time spent on emails. 2021.AI. https://2021.ai/nlp-increase-efficiency/

Schmitt, X., Kubler, S., Robert, J., Papadakis, M., & LeTraon, Y. (2019). A replicable comparison study of NER software: StanfordNLP, NLTK, OpenNLP, SpaCy, GATE. https://doi.org/10.1109/snams.2019.8931850

Singh, S. (2018, July 6). Natural language processing for information extraction. arXiv.org. arXiv:1807.02383

Stoeckli, E., Dremel, C., & Uebernickel, F. (2018). Exploring characteristics and transformational capabilities of InsurTech innovations to understand insurance value creation in a digital world. Electronic Markets, (3), 287-305. https://doi.org/10.1007/s12525-018-0304-7

Talia, D. (2013). Clouds for scalable big data analytics. Computer. https://doi.org/10.1109/MC.2013.162

UN Department of Economic and Social Affairs. (2020). United Nations e-government survey 2020. United Nations Publications. https://publicadministration.un.org/egovkb/en-us/Reports/UN-E-Government-Survey-2020

US Department of Labor. (2021). State minimum wage laws.