The Optics of Language-Integrated Query

J. López-González a,b,∗, Juan M. Serrano a,b

a Universidad Rey Juan Carlos, Calle Tulipán, s/n, 28933 Móstoles, Spain
b Habla Computing SL, Avda. Gregorio Peces Barba, 28918 Leganés, Spain

Highlights

• Getter, Affine Fold and Fold optics are lifted into Optica, a query language for LINQ
• XQuery and SQL queries derived from non-standard optic representations
• Optics as a higher-level interface over comprehension-based query languages
• Typed tagless-final encoding in Scala of the Optica type system and semantics
Abstract
Monadic comprehensions reign over the realm of language-integrated query (LINQ), and for good reasons. Indeed, comprehensions are tightly integrated with general-purpose programming languages and close enough to common query languages, such as SQL, to guarantee their translation into effective queries. Comprehensions also support features for writing reusable and composable queries, such as the handling of nested data and the use of functional abstractions. In parallel to these developments, optics have emerged in recent years as the technology of choice to write programs that manipulate complex data structures with nested components. Optic abstractions are easily composable and, in principle, permit both data access and updates. This paper attempts to exploit the notion of optic for LINQ as a higher-level language that complements comprehension-based approaches. In order to do this, we lift a restricted subset of optics, namely getters, affine folds and folds, into a full-blown DSL. The type system of the resulting language of optics, which we have named Optica, distills their compositional properties, whereas its denotational semantics is given by standard optics. This formal specification of the concept of optic enables the definition of non-standard optic representations beyond van Laarhoven, profunctor optics, etc. In particular, the paper demonstrates that a restricted subset of XQuery can be understood as an optic representation; it introduces Triplets, a non-standard semantic domain to normalize optic expressions and facilitate the generation of SQL queries; and it describes how to generate comprehension-based queries from optic expressions, thus showing that both approaches can coexist. Despite the limited expressiveness of optics in relation to comprehensions, results are encouraging enough to anticipate the convenience and feasibility of extending existing comprehension-based libraries for LINQ in the functional ecosystem with optic capabilities.
In order to show this potential, the paper also describes S-Optica, a Scala implementation of Optica using the tagless-final approach.
Keywords: optics, language-integrated query, type systems, comprehensions, typed tagless-final, Scala

⋆ This work is partially supported by a Doctorate Industry Program grant to Habla Computing SL, from the Spanish Ministry of Economy, Industry and Competitiveness.
∗ Corresponding author
Email addresses: [email protected] (J. López-González), [email protected] (Juan M. Serrano)
Preprint submitted to Science of Computer Programming, September 3, 2020

1. Introduction
The research field of language-integrated query (LINQ [1, 2, 3]) aims at alleviating the impedance mismatch problem [4, 5] that commonly originates in software systems where general-purpose programming languages, on the one hand, and query languages, on the other, need to interoperate. The problem manifests itself in the form of maintenance, reliability and security problems, which are essentially due to the mismatches of programming paradigms and data models endorsed by the interacting languages. In order to tackle this issue, the LINQ research field favors a domain-specific language (DSL)-based approach [6]. From this perspective, the programmer does not simply inject query expressions into the general-purpose language as plain strings, a practice which is a well-known source of bugs and injection attack problems; on the contrary, she uses a DSL which ensures that the query is well-formed and correctly typed, and which, moreover, helps to overcome the conceptual gap between the general-purpose and the query language.

Indeed, not every DSL can be given the seal of approval from a language-integrated perspective. For instance, we may embed SQL in a host language like Scala [7] to attain the stated demand of type safety and yet, the disparity between the flat and nested computational models of both languages would not be reduced in the slightest. Type safety is, without a shadow of a doubt, a necessary step in order to generate well-formed SQL queries, and Scala libraries such as Doobie [8], which focus on this specific issue, are worthwhile. However, to properly bridge the impedance mismatch gap, we need DSLs at a higher level of abstraction: close enough to general-purpose languages, yet specific enough to allow for the efficient generation of queries for a wide range of querying languages [9]. Since early in its foundation, the LINQ research field has exploited monadic comprehensions [10, 11] as its DSL of choice for this purpose.
The basic insight was originally introduced in [12], and then developed by the Nested Relational Calculus (NRC) [13, 14], which provides the foundation of query languages based on comprehensions. NRC subsumes much work on LINQ theories and systems such as Kleisli [15], Links [16], Microsoft's LINQ [2, 17], Database Supported Haskell (DSH) [18], T-LINQ [3], QUEΛ [19] and SQUR [20].

In essence, the purpose of research on LINQ is borrowing the comprehension syntax that in-memory data structures such as lists, sets, bags, and other bulk types enjoy, in order to express queries at a generic monadic level. To this aim, bulk types are lifted into a proper DSL that abstracts away their characteristic in-memory representation but still allows queries to be expressed using comprehension syntax. In some cases, as in Kleisli, Links and Microsoft's LINQ, this lifting mechanism is a primitive part of the general-purpose language itself. In the case of more conventional, functional programming (FP) languages, such as Scala or F#, the query language is embedded in the host language using one of the several techniques that FP offers for this purpose: typed tagless-final [21], generalized algebraic data types (GADTs) [22] or quoted domain-specific languages (QDSLs) [23]. For instance, quotation is used to embed the T-LINQ language in F#, and the tagless-final approach to embed QUEΛ in OCaml. Similarly, Quill [24] is a QDSL heavily inspired by T-LINQ, which is embedded in Scala.

To illustrate the use of comprehensions in LINQ, we consider the data structures in column "comprehensions" of Table 1, implemented in the Scala programming language, our language of choice throughout this paper. (The entities of a flat model are either base types or classes with no nested collections; otherwise, we say that the model is nested. Appendix A offers a brief account of the major Scala features that are used in this paper.)

Model (comprehensions):
  class Couple(fst: String, snd: String)
  class Person(name: String, age: Int)

Model (optics):
  class Couple(fst: Person, snd: Person)
  class Person(name: String, age: Int)

Immutable (comprehensions):
  def under50_a(couples: List[Couple], people: List[Person]): List[String] =
    for {
      c ← couples
      w ← people
      if c.fst == w.name && w.age < 50
    } yield w.name

Immutable (optics):
  val under50Fl: Fold[List[Couple], String] =
    couples ≫ fst ≫ filtered(age < 50) ≫ name
  val under50_c: List[Couple] => List[String] =
    under50Fl.getAll

Generic (comprehensions):
  val under50_b = quote {
    for {
      c ← query[Couple]
      w ← query[Person]
      if c.fst == w.name && w.age < 50
    } yield w.name
  }

Generic (optics): ?

Table 1: Towards Optic-Based LINQ.

According to this model, a couple consists of two members, first and second, to which we refer by their names; besides names, each person also has an age. Given a list of couples and a list of people, we may obtain the names of those partners who occupy the first position and are under 50 years of age by using a list comprehension, as query under50_a shows. Now, using Slick [25] or Quill [24], two well-known libraries of the Scala ecosystem, we may express the very same query in a generic way. To be precise, we call a query generic if it can be efficiently run (e.g. via appropriate compilation) against data stores of different sorts: in-memory, external relational databases, non-relational stores such as XML/JSON files, etc. For instance, the query under50_b in Table 1 is a generic query implemented in Quill. Being generic, this query may be compiled to different targets according to the mappings between Scala case classes and database schemas that the Quill framework supports (as of this writing, Cassandra's CQL [26] and SQL). For instance, the resulting SQL expression generated for this query by Quill would be as follows:

  SELECT w.name
  FROM Couple AS c INNER JOIN Person AS w ON c.fst = w.name
  WHERE w.age < 50
We have illustrated the use of comprehensions with a simple example of a so-called flat-flat query, i.e. a query that receives and returns flat types. These queries are significant because relational databases cannot handle nested results. For instance, we cannot write an SQL query that returns rows in which a field contains a list of values. This demonstrates a significant mismatch between SQL and programming languages, where nested data models are customary. Moreover, comprehension queries, being founded on NRC, can perfectly well handle nested data, which might lead us to think about wasted expressiveness. In fact, the opposite is true: several conservativity results show that we can certainly use nested data as intermediate values in comprehensions [27], even in the presence of parameterized queries (i.e. using lambda expressions) [16], and still be able to generate normalized queries which do not use nested data in any way. We can even accommodate flat-nested queries through several flat-flat normalized queries by using techniques for query shredding [28]. In sum, comprehensions are exceptionally good from a LINQ perspective: well integrated into a wide range of programming languages and close enough to relational databases to generate effective queries.

In spite of this, we find three major problems or inconveniences in the current use of comprehensions for LINQ. First, comprehensions can only express retrieval queries, but updates are equally important. This is acknowledged as an open problem in the LINQ field [9]. Second, the use of nested data and functional abstraction in comprehension-based languages such as Links/T-LINQ undoubtedly helps in obtaining more compositional queries [3]. However, this is done at the expense of complex rewriting machinery, especially in the case of QDSLs like T-LINQ. Alternative approaches based on normalization-by-evaluation [20] ameliorate this problem, but the support for compositional queries is nevertheless limited.
Basically, this is due to the fact that comprehensions are nearer to the point-wise notation exemplified by the relational calculus than to pure relational algebra, and have to deal with variable (re)naming, freshness and scope. Functionally, both formalisms are equally expressive, but the point-free combinators of relational algebra are arguably more flexible [29]. This flexibility naturally results in more modular queries, which directly impacts non-functional concerns such as reuse and change tolerance [30, 31]. Third, there are potential querying infrastructures which are essentially hierarchical rather than relational, such as NoSQL document-oriented databases, which build upon nested data sources in JSON, XML or YAML. The translation of queries at the programming-language level into these infrastructures may benefit from a more primitive, algebraic querying model, which is hierarchical by nature.

This paper attempts to show that optics [32] may play this role in the realm of language-integrated query, since they support a pure algebraic approach to LINQ that may potentially be extended to deal with updates. Indeed, optics, also known as functional references, are abstractions that select parts contextualized within a whole, and that provide methods to access and/or update the values that they select. Since the first appearance of the lens [33], arguably the most prominent optic, a rich catalog of optics has emerged [34]. With very few exceptions, they compose with each other, so that they can seamlessly produce quite complex transformations over immutable nested data structures. Indeed, they have become an essential companion for the functional programmer, as evidenced by the growing popularity of optic libraries in the functional ecosystem [35, 36].
In sum, we may say that optics are the de-facto standard for manipulating nested data in a point-free, algebraic style; they are as ubiquitous as comprehensions in functional programming languages and, more importantly, the most common variants are explicitly designed to handle both read and write accessors.

How do we use optics as a higher-level language to express generic queries? How do these optic-based queries relate to generic comprehension queries? How do we translate optic expressions to SQL/NoSQL query languages directly? To answer these questions, we may follow the same strategy that is illustrated in Table 1 for comprehension queries: by using lenses [33], traversals [37], folds, and other optic abstractions, we can query and update immutable data structures quite naturally; why not borrow this very same syntax to express generic queries that can be interpreted over different target storage systems? For instance, in column "optics" of Table 1, an alternative nested model is defined for the couples example, where keys in Couple are replaced by full-blown Person values. Then, query under50_c, derived from optic under50Fl, which in turn is composed from several optics (the fold couples, and getters fst, age and name), offers an alternative formulation to the comprehension query under50_a. What we are looking for is a generic optic-based query, analogous to the comprehension-based query under50_b. In essence, we need to lift optics into a full-fledged DSL.

Contributions and outline.
This paper sets out to define the language of optics, which we have dubbed Optica. We aim at showing that Optica may play the role of an effective query language for LINQ, alone and in combination with comprehension queries, albeit at the expense of limited expressiveness. In general, we aim to show that optics offer a fruitful abstraction for LINQ, and we restrict our attention to proving the feasibility of this approach on a selected subset of optic abstractions and domain examples. In particular, these are our contributions:

• A review of concrete optics from the mindset of LINQ. We show how to exploit a subset of standard optic abstractions and their combinators in order to express compositional queries (Sect. 2). We focus exclusively on read-only optics, i.e. those which select parts from the whole but do not write back into the data structure, namely getters, affine folds and folds. This allows us to focus on a tractable subset of optics, and to prepare the ground to tackle more ambitious problems in future work, such as the modeling of updates in LINQ.

• A formal specification of read-only optics in terms of Optica, a full-blown DSL. The syntax and type system of the language formalize their compositional and querying features in an abstract way. Its denotational semantics is given by concrete optics themselves (Sect. 3). We show how to implement generic queries over abstract optic models by using Optica in a declarative way, once and for all.

• The abstract specification of read-only optics provided by Optica enables the definition of alternative, non-standard optic representations. We provide three major Optica interpretations which attempt to show the capabilities of Optica as a general query language for LINQ:

  – An XQuery interpretation, which allows us to directly translate Optica queries into XQuery [38] expressions (Sect. 4). This aims at showing the adequacy of Optica for dealing with common data sources of NoSQL document-oriented databases.

  – A SQL interpretation, which generates SQL queries from Optica expressions (Sect. 5). This non-standard denotational semantics builds upon Triplets, a semantic domain which normalizes the optic expression in order to facilitate its direct translation to SQL. The proposed semantics works similarly to the normalization-by-evaluation approach of SQUR [20]. The major difference is that SQUR consists of a relational calculus, whereas we work over optics, which are more akin to relational algebra.

  – A T-LINQ interpretation, which generates comprehension queries. This non-standard semantics is aimed at showing how to use Optica as a higher-level language for nested data in conjunction with comprehension-based languages (Sect. 6).

• S-Optica, an embedded DSL implementation of Optica in Scala using the tagless-final approach (Sect. 7). This implementation is intended as a proof of concept illustrating how to implement the formal Optica type system and semantics in a common, general-purpose programming language of the software industry. It also aims at providing an example of the tagless-final approach, as well as serving as a reference for extending existing LINQ libraries with optic capabilities.

As can be seen, Sects. 2-7 contain the bulk of the paper. Section 8 discusses related work and limitations of the approach. Finally, Section 9 concludes and points towards current and future work. The Scala library that accompanies this paper is publicly available on a GitHub repository [39].

2. Querying with Optics
This section introduces three different kinds of read-only optics: getters, affine folds and folds, together with their main combinators, where we use Scala as the vehicle to implement them. In essence, read-only optics are just views without updates, and hence they are not subject to the familiar optic laws [34]. They are not as widespread as their siblings with updating capabilities (namely, lenses, affine traversals and traversals), given that selecting nested fields from immutable data structures is usually a trivial task. Nonetheless, they exhibit the same compositional features and patterns as the rest of the optics, and will thus allow us to illustrate the general declarative querying style advocated by them. The abstractions and examples that we put forward in this section will be used throughout the paper.

2.1. Getters, Affine Folds and Folds

First of all, it is worth noting that we choose the concrete optic representation, where the notions of whole and part are clear, in order to make definitions easier to understand. There are other representations, such as van Laarhoven [37] or profunctor [32], that implement optic composition in a remarkably elegant way, but whose signatures are not as easy to approach for an outsider.

Definition 1 (Getter).
A getter consists of a function that selects a single part from the whole. We encode it in Scala as follows:

  case class Getter[S, A](get: S => A)

The type parameters S and A will consistently serve as the whole and the part, respectively, throughout the different optic definitions.

There are several getter combinators that will be used frequently in the text; they have been collected in the companion object for Getter, which is shown in Fig. 1. The andThen method combines getters that select nested values in order to produce a new getter that selects a deeply nested value. The getter id is the neutral element under andThen composition, where whole and part coincide. The fork combinator is required if we wish to put different parts together. The like combinator selects a constant part which is taken as a parameter, where the whole is ignored. The remaining combinators essentially lift arithmetic operations into functions that take getters selecting the operands as parameters and produce a getter that selects the operation's result.

Remark 1. We assume ≫ and ∗∗∗ as infix versions of andThen and fork, respectively, where the latter symbol has precedence over the former. Similarly, we will overload the operators ===, > and - as infix versions of equal, greaterThan and subtract, respectively. Last, we will use the postfix expression p.not as an alias for not(p).

(We ignore other read-only optics such as fold1, since they do not add value in the particular examples that we have selected for this paper. Appendix A provides a Scala cheat sheet that describes the most fundamental Scala abstractions in the context of this work. The difficulty of the van Laarhoven and profunctor signatures is evidenced by the jokes around this topic in the functional programming community: https://pbs.twimg.com/media/CypY7B1W8AAvqwl.jpg. A concrete lens is basically a getter plus a function to update the whole from a new version of the part.)

  object Getter {
    def id[A]: Getter[A, A] =
      Getter(a => a)

    def andThen[S, A, B](u: Getter[S, A], d: Getter[A, B]): Getter[S, B] =
      Getter(s => d.get(u.get(s)))

    def fork[S, A, B](l: Getter[S, A], r: Getter[S, B]): Getter[S, (A, B)] =
      Getter(s => (l.get(s), r.get(s)))

    def like[S, A](a: A): Getter[S, A] =
      Getter(const(a))

    def not[S](b: Getter[S, Boolean]): Getter[S, Boolean] =
      b ≫ Getter(!_)

    def equal[S, A: Equal](x: Getter[S, A], y: Getter[S, A]): Getter[S, Boolean] =
      Getter(s => x.get(s) === y.get(s))

    def greaterThan[S](x: Getter[S, Int], y: Getter[S, Int]): Getter[S, Boolean] =
      Getter(s => x.get(s) > y.get(s))

    def subtract[S](x: Getter[S, Int], y: Getter[S, Int]): Getter[S, Int] =
      Getter(s => x.get(s) - y.get(s))
  }

Figure 1: Getter Combinators.
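To see the combinators of Fig. 1 in action, the following is a minimal, self-contained sketch. The Couple/Person model follows the optics column of Table 1; the infix operators are written >>> and *** here purely for the sake of a plain ASCII example, which is an assumption of this sketch rather than the paper's actual notation.

```scala
// Minimal Getter with the two composition combinators from Fig. 1,
// exposed as methods so that we can write them infix.
case class Getter[S, A](get: S => A) {
  def >>>[B](d: Getter[A, B]): Getter[S, B] =        // vertical composition (andThen)
    Getter(s => d.get(get(s)))
  def ***[B](r: Getter[S, B]): Getter[S, (A, B)] =   // horizontal composition (fork)
    Getter(s => (get(s), r.get(s)))
}

case class Person(name: String, age: Int)
case class Couple(fst: Person, snd: Person)

val fst: Getter[Couple, Person]  = Getter(_.fst)
val name: Getter[Person, String] = Getter(_.name)
val age: Getter[Person, Int]     = Getter(_.age)

// Select the first member's name, and pair that name with her age.
val fstName: Getter[Couple, String]        = fst >>> name
val fstInfo: Getter[Couple, (String, Int)] = fst >>> (name *** age)

val c = Couple(Person("Alex", 60), Person("Bert", 55))
// fstName.get(c) evaluates to "Alex"; fstInfo.get(c) to ("Alex", 60)
```

Note how vertical composition reaches into the nested Person, while horizontal composition pairs two selections over the same whole.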
Remark 2. Fork-like optic composition (which we will also refer to as horizontal composition) is not widespread in the folklore. Indeed, it is not possible to implement it in a safe way for most optics. For example, an analogous implementation for composing lenses horizontally would violate the lens laws [40] when both lenses select the very same part.

Definition 2 (AffineFold). An affine fold consists of a function that selects at most one part from the whole. We encode it as follows:

  case class AffineFold[S, A](preview: S => Option[A])
We could see this optic as a simplification of an affine traversal, where we omit the updating part.

Once again, we have packaged several affine fold combinators in Fig. 2. The identity affine fold simply selects the whole value and wraps it in a Some case. The andThen combinator selects the innermost value just in case both optics u and d denote defined values; otherwise, it selects nothing. We implement this functionality in terms of the Option monad using for-comprehension syntax. Finally, we consider filtered as an interesting builder of affine folds, which declares the same types for whole and part. It simply discards the value (returning None) in case it is actually pointing to something and the input predicate does not hold for it.
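As a quick illustration of the behaviour just described, the following self-contained sketch composes filtered with a getter-derived affine fold. This is illustrative code written for this passage (it mimics, but is not, the combinators of Fig. 2):

```scala
case class Getter[S, A](get: S => A)

case class AffineFold[S, A](preview: S => Option[A]) {
  def >>>[B](d: AffineFold[A, B]): AffineFold[S, B] =  // vertical composition
    AffineFold(s => preview(s).flatMap(d.preview))
}

// filtered keeps the whole only when the predicate (a getter!) holds.
def filtered[S](p: Getter[S, Boolean]): AffineFold[S, S] =
  AffineFold(s => if (p.get(s)) Some(s) else None)

case class Person(name: String, age: Int)

// A getter cast into an affine fold that always succeeds.
val name: AffineFold[Person, String] = AffineFold(p => Some(p.name))

val under50Name: AffineFold[Person, String] =
  filtered[Person](Getter(_.age < 50)) >>> name

// under50Name.preview(Person("Cora", 33)) evaluates to Some("Cora")
// under50Name.preview(Person("Alex", 60)) evaluates to None
```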
Remark 3. It is worth emphasizing that the predicate that filtered takes as a parameter is a getter itself. This is unusual in folklore libraries, where a plain lambda expression is taken as an argument instead. However, predicates can be perfectly well understood as queries (getters, in particular). We will exploit this idea in the next section to avoid introducing lambda terms in the Optica DSL.

(Consequently, we will refer to andThen as vertical composition.)

  object AffineFold {
    def id[A]: AffineFold[A, A] =
      AffineFold(a => Some(a))

    def andThen[A, B, C](u: AffineFold[A, B], d: AffineFold[B, C]): AffineFold[A, C] =
      AffineFold(s =>
        for {
          b ← u.preview(s)
          c ← d.preview(b)
        } yield c)

    def filtered[S](p: Getter[S, Boolean]): AffineFold[S, S] =
      AffineFold(s => if (p.get(s)) Some(s) else None)

    implicit def to_af[S, A](g: Getter[S, A]): AffineFold[S, A] =
      AffineFold(s => Some(g.get(s)))
  }

Figure 2: Affine Fold Combinators.

Remark 4.
One of the major benefits of optics is that they compose heterogeneously; in other words, it is possible to combine getters, affine folds and folds among themselves. To put it simply, we can turn getters into affine folds, and we can turn affine folds into folds. An example of such a casting is shown in Fig. 2 (to_af), where we find it implemented as an implicit converter. Thereby, the Scala compiler applies the conversion automatically when it detects a getter where an affine fold is expected instead.

Definition 3 (Fold). A fold consists of an optic that selects a (possibly empty) sequence of parts from the whole.

  case class Fold[S, A](getAll: S => List[A])

We could see this optic as a simplification of a traversal [37], where we omit the updating part.

As usual, we have packaged the fold primitives in the corresponding companion object, which can be found in Fig. 3. The implementation of id and andThen is basically the same as the one we showed for affine folds, the difference being that we work with the List monad instead of the Option one. The nonEmpty method takes a fold as a parameter and produces a getter that checks whether the number of selected parts is greater than zero. The remaining combinators (empty, all, any and elem) are just derived definitions, which are implemented in terms of other combinators, where we assume that object-oriented dot syntax is available. For instance, nonEmpty(fl) becomes fl.nonEmpty and all(fl)(p) becomes fl.all(p).

The implementation of elem might deserve further explanation. Since we favor getters over plain functions as predicates (as stated in Remark 3), we need to use optic abstractions and combinators to build them. The following is a derivation where we start with an implementation that we consider natural, which requires lambda expressions, and we end up with the standing implementation, where lambda expressions are removed.

(Heterogeneous composition holds with very few exceptions, which are beyond the scope of this paper. Similarly, we could define getters following the same pattern by using the Id monad, but we avoid doing so for brevity.)

  object Fold {
    def id[A]: Fold[A, A] =
      Fold(a => List(a))

    def andThen[A, B, C](u: Fold[A, B], d: Fold[B, C]): Fold[A, C] =
      Fold(s =>
        for {
          b ← u.getAll(s)
          c ← d.getAll(b)
        } yield c)

    def nonEmpty[S, A](fl: Fold[S, A]): Getter[S, Boolean] =
      Getter(fl.getAll(_).nonEmpty) /* List.nonEmpty */

    def empty[S, A](fl: Fold[S, A]): Getter[S, Boolean] =
      fl.nonEmpty.not

    def all[S, A](fl: Fold[S, A])(p: Getter[A, Boolean]): Getter[S, Boolean] =
      (fl ≫ filtered(p.not)).empty

    def any[S, A](fl: Fold[S, A])(p: Getter[A, Boolean]): Getter[S, Boolean] =
      fl.all(p.not).not

    def elem[S, A: Equal](fl: Fold[S, A])(a: A): Getter[S, Boolean] =
      fl.any(id === like(a))

    implicit def to_fl[S, A](a: AffineFold[S, A]): Fold[S, A] =
      Fold(s => a.preview(s).toList)
  }

Figure 3: Fold Combinators.
  fl.any(Getter(s => s === a))
    ≃ [definition of ‘id‘ getter]
  fl.any(Getter(s => id.get(s) === a))
    ≃ [definition of ‘like‘ getter]
  fl.any(Getter(s => id.get(s) === like(a).get(s)))
    ≃ [definition of ‘equal‘ getter]
  fl.any(id === like(a))

Note that === is overloaded. In fact, the occurrence of this method in the last line corresponds to the equal combinator from getters, while the rest refer to the standard comparison method from the Equal type class.
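To exercise the derived combinators on concrete data, here is a small self-contained sketch. For brevity, all, any and elem are implemented directly with List methods rather than through the derivations of Fig. 3, but the observable behaviour is the same (the equivalences are noted in the comments):

```scala
case class Getter[S, A](get: S => A)

case class Fold[S, A](getAll: S => List[A]) {
  def nonEmpty: Getter[S, Boolean] = Getter(getAll(_).nonEmpty)
  def empty: Getter[S, Boolean]    = Getter(getAll(_).isEmpty)
  // Directly with List.forall; equivalent to (fl ≫ filtered(p.not)).empty
  def all(p: Getter[A, Boolean]): Getter[S, Boolean] =
    Getter(getAll(_).forall(p.get))
  // Directly with List.exists; equivalent to fl.all(p.not).not
  def any(p: Getter[A, Boolean]): Getter[S, Boolean] =
    Getter(getAll(_).exists(p.get))
  // Equivalent to fl.any(id === like(a))
  def elem(a: A): Getter[S, Boolean] = any(Getter(_ == a))
}

val tasks: Fold[List[String], String] = Fold(identity)

// tasks.elem("abstract").get(List("build", "abstract")) evaluates to true
// tasks.all(Getter(_.nonEmpty)).get(List("build", "")) evaluates to false
// and all(...) holds vacuously on an empty list, while any(...) does not
```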
Remark 5. As we have seen throughout this section, read-only optics are essentially functions that select parts from the whole, yet we have introduced them as separate definitions. This distinction between functions and optics turns out to be central in this work, since Optica expressions denoting one or the other can be evaluated in very different ways, as we will see throughout Sects. 3-6.

Now that we have seen several standard combinators and some interesting features of optics, we will exercise them to illustrate the querying style and common patterns advocated by optics. For this task, we have selected two examples from [3], which will be used throughout the paper. (The first example has been slightly updated in order to adapt it to today's society.)

2.2.1. Couples Example

This example extends the one which was introduced in Table 1. Remember that it consists of a simple relation of couples, where the name and age of each person forming them is also supplied:

  type Couples = List[Couple]
  case class Couple(fst: Person, snd: Person)
  case class Person(name: String, age: Int)
The associated data structures are defined following a nested, rather than a relational, approach, i.e. couples contain a full person value rather than a person key. This distinction becomes essential in Sect. 6. Once we have defined the model, we provide CoupleModel-specific optics to select relevant parts from the domain.

  object CoupleModel {
    val couples: Fold[Couples, Couple] = Fold(identity)
    val fst: Getter[Couple, Person]    = Getter(_.fst)
    val snd: Getter[Couple, Person]    = Getter(_.snd)
    val name: Getter[Person, String]   = Getter(_.name)
    val age: Getter[Person, Int]       = Getter(_.age)
  }

Basically, and for this particular example, we supply a getter for each field, where whole and part correspond to the data and field types, respectively. The Scala placeholder syntax is used in these definitions. There is also a simple fold that we can use to select each couple from Couples, which we see as the root type of the nested model.
Remark 6. The examples that are presented in this paper do not include affine folds as part of the domain models, but they could be helpful to model optional values. For instance, we could use them to consider an optional address field associated to each person.

Now, we can use the standard optics defined in the previous section and the specific optics defined for this domain to compose new ones. For instance, the following fold selects the name and age difference from all those couples where the first member is older than the second one.

  val differencesFl: Fold[Couples, (String, Int)] =
    couples ≫ filtered((fst ≫ age) > (snd ≫ age)) ≫
      (fst ≫ name) ∗∗∗ ((fst ≫ age) - (snd ≫ age))

Firstly, we use couples as an entry point, and filtered to remove the couples in which the age of the first member is not greater than the age of the second one. Right after filtering, we select the name of the first member and put it together with the age difference, by means of ∗∗∗, to determine the subparts that the optic is selecting.

Once we have defined the fold, we need to generate the query that selects the corresponding information from the immutable structure, i.e. a function that takes the couples as its argument and returns the matching values. For this task, we simply use getAll.

  val differences: Couples => List[(String, Int)] =
    differencesFl.getAll

If we feed this query with the same data that was used in the original example [3], we should expect the same result.

  val data: Couples = List(
    Couple(Person("Alex", 60), Person("Bert", 55)),
    Couple(Person("Cora", 33), Person("Demi", 31)),
    Couple(Person("Eric", 21), Person("Fred", 60)))

  val res: List[(String, Int)] = differences(data)
  // res: List[(String, Int)] = List((Alex,5), (Cora,2))

The comment shows the value of res when we run the query with the original data. As expected, it indicates that Alex and Cora are older than their mates by 5 and 2 years, respectively.
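Following the suggestion of Remark 6, a hedged sketch of how an optional address could be modelled with an affine fold. The Address type and these optics are hypothetical additions made for illustration; they are not part of the couples model used in this paper:

```scala
case class AffineFold[S, A](preview: S => Option[A]) {
  def >>>[B](d: AffineFold[A, B]): AffineFold[S, B] =  // vertical composition
    AffineFold(s => preview(s).flatMap(d.preview))
}

// Hypothetical extension of the model: each person may have an address.
case class Address(city: String)
case class Person(name: String, age: Int, address: Option[Address])

val address: AffineFold[Person, Address] = AffineFold(_.address)
val city: AffineFold[Address, String]    = AffineFold(a => Some(a.city)) // a getter, cast

val personCity: AffineFold[Person, String] = address >>> city

// personCity.preview(Person("Cora", 33, Some(Address("Madrid")))) == Some("Madrid")
// personCity.preview(Person("Demi", 31, None)) == None
```

The optional field fits the affine fold contract exactly: at most one part is selected from the whole.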
We move on to the next example, where our model is an organization formed by departments of employees. In addition, each employee has a set of tasks that she is able to perform.

  type Org = List[Department]
  case class Department(dpt: String, employees: List[Employee])
  case class Employee(emp: String, tasks: List[Task])
  case class Task(tsk: String)
Once again, we supply OrgModel-specific optics to select relevant parts from the domain:

  object OrgModel {
    val departments: Fold[Org, Department]    = Fold(identity)
    val dpt: Getter[Department, String]       = Getter(_.dpt)
    val employees: Fold[Department, Employee] = Fold(_.employees)
    val emp: Getter[Employee, String]         = Getter(_.emp)
    val tasks: Fold[Employee, Task]           = Fold(_.tasks)
    val tsk: Getter[Task, String]             = Getter(_.tsk)
  }
In this case, we find several fields containing lists, thus, we provide folds instead of getters todeal with sequences of parts. Now, we compose a fold to select the name of those departmentswhere all employees know how to abstract . def expertiseFl: Fold[Org, String] = departments ≫ filtered(employees.all(tasks.elem("abstract"))) ≫ dpt This expression refers first to all departments, and then it filters the ones where all employeescontain the task "abstract" . Finally, it selects their textual identifier ( dpt ). Once the fold isdefined, we produce the query to obtain the selected departments: def expertise: Org => List[String] = expertiseFl.getAll Once more, we feed the query with the original organization’s data. val data: Org = List(Department("Product", List(Employee("Alex", List(Task("build"))),Employee("Bert", List(Task("build"))))),Department("Quality", List.empty),Department("Research", List(Employee("Cora", List(Task("abstract"), Task("build"), Task("design"))),Employee("Demi", List(Task("abstract"), Task("design"))),Employee("Eric", List(Task("abstract"), Task("call"), Task("design"))))),Department("Sales", List(Employee("Fred", List(Task("call")))))) val res: List[String] = expertise(data) // res: List[String] = List(Quality, Research)
The resulting value shows that the departments of Quality and Research are the only ones where all employees are able to abstract.

The general pattern should be clear now. Firstly, we define the involved data types in the model and supply specific optics to select their parts. Secondly, we use these optics and the standard ones to express more sophisticated selectors in a modular and elegant way. Finally, we run the optic method with an initial whole to produce the expected query. As can be seen, the approach is eminently declarative: the aspects of composing the desired optic and running it are completely decoupled.

Base types        b ::= N | B | S
Model types       t ::= b | (t, t)
Optic types       s ::= getter t t | affine t t | fold t t
Query types       u ::= t → t | t → option t | t → list t
Constants         c (of base type)
Optic expressions e ::= id gt | id af | id fl | e ≫gt e | e ≫af e | e ≫fl e
                      | like c | not e | e > e | e == e | e − e | e *** e
                      | filtered e | nonEmpty e | to af e | to fl e
Query expressions q ::= get e | preview e | getAll e

Figure 4: Optica syntax
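The highlights announce a typed tagless-final Scala encoding of this type system. As a rough illustration of what that means for the getter fragment of Fig. 4, here is a hedged sketch: the algebra, method names and signatures below are ours, not the paper's actual encoding.

```scala
// Each typing rule of the getter fragment becomes a method of an algebra
// indexed by an abstract representation Repr (tagless-final style).
trait GetterAlg[Repr[_, _]] {
  def id[S]: Repr[S, S]                                            // id gt
  def andThen[S, A, B](g: Repr[S, A], h: Repr[A, B]): Repr[S, B]   // g >>gt h
  def like[S, A](a: A): Repr[S, A]                                 // constant getter
  def gt[S](g: Repr[S, Int], h: Repr[S, Int]): Repr[S, Boolean]    // g > h
}

// Standard semantics: Repr is a plain function, mirroring concrete getters.
type Fn[S, A] = S => A
object StandardGetter extends GetterAlg[Fn] {
  def id[S]: Fn[S, S] = s => s
  def andThen[S, A, B](g: Fn[S, A], h: Fn[A, B]): Fn[S, B] = s => h(g(s))
  def like[S, A](a: A): Fn[S, A] = _ => a
  def gt[S](g: Fn[S, Int], h: Fn[S, Int]): Fn[S, Boolean] = s => g(s) > h(s)
}

// A generic expression is written once against the algebra...
def over50[Repr[_, _]](alg: GetterAlg[Repr]): Repr[Int, Boolean] =
  alg.gt(alg.id[Int], alg.like(50))

// ...and only acquires a meaning when an interpreter is plugged in.
val over50Fn: Int => Boolean = over50(StandardGetter)
```

The point of the encoding is that `over50` is representation-agnostic: swapping `StandardGetter` for another interpreter reuses the very same expression.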
Remark 7. We will refer to get, preview and getAll as the queries derived from their corresponding optic types. Read-only optics just supply basic reading queries, in a one-to-one mapping. Thereby, the separation of concerns between optic expressions and generated queries is not as clear as with other optics. For instance, lenses broaden their catalog of derived queries with queries to read, replace or even modify the part that the lens is selecting. We further discuss the implications of this insight in Sect. 8.
3. Optica Core
The last section has introduced optics using their so-called concrete representation, but the same querying style is actually supported by other isomorphic representations, such as van Laarhoven or profunctor optics. This section aims at specifying the concepts of getters, affine folds and folds in a generic way, independently of any particular representation. The resulting formalization is a domain-specific language that we have named Optica, which directly supports the optic querying style, and potentially allows for non-standard optic representations that generate queries in XQuery or SQL, for instance, as the next sections will show.

This section will first introduce the syntax and type system of Optica, where standard primitives and combinators are declared. Secondly, we will show how to provide a generic version of the models and queries that we have seen in Sect. 2.2. Finally, we will present the standard semantics that we can use as an interpretation to deal with immutable data structures: it recovers concrete optics and queries, as introduced in the last section.
We introduce the syntax of Optica in Fig. 4. The upper part contains the model types (natural numbers, booleans, strings and products), optic types (getters, affine folds and folds) and query types (selection functions). The second part shows the set of expressions that form the language, which are defined in close correspondence with their concrete counterparts, introduced in the previous section. Despite that, there are several aspects which deserve further explanation:

def empty fl = not (nonEmpty fl)
def all fl p = empty (fl ≫ filtered (not p))
def any fl p = not (all fl (not p))
def elem fl a = any fl (id == like a)

Figure 5: Optica derived definitions

• Constants are not valid query expressions on their own. As we will see later, we use like as the mechanism to represent literals in the language. By doing so, we can reuse optic combinators for constants, improving the language's composability.

• The formal syntax avoids the object-oriented dot notation, idiomatic in Scala, and favours the prefix notation, as is usual practice in related work.

• The methods all, any, elem and empty are not included as syntax primitives. Instead, they are introduced as derived definitions, as can be seen in Fig. 5.

• At present, query expressions are atomic, i.e. it is not possible to compose several of them.

The type system is presented in Fig. 6, where α, β and γ represent model types (see Fig. 4). Unlike T-LINQ [9] or QUEΛ [20], Optica does not introduce terms for variables. Thereby, its type rules are slightly simplified, since they omit the characteristic 'Γ ⊢' prefix. They are structured into four groups, corresponding to getters, affine folds, folds, and their derived queries. The only optic constructors are id∗ and like, which allow us to form new optic expressions from scratch. The remaining combinators should be straightforward, since they exactly correspond with the ones introduced in the companion objects for getters, affine folds and folds of Sect. 2 (with the exception of those combinators, like any, all, elem and empty, which can be defined compositionally in terms of other combinators and do not need to access the internal optic representation). In regard to queries, note that their type rules do not proceed from the companion objects, but from the case class definitions of concrete optics themselves. Their formalization leads to introducing functions as a new semantic domain for Optica. However, note that the part of the language corresponding to optics is purely first-order, i.e. no lambdas are needed in order to create optic expressions.

As we have seen in Sect. 2, we defined domain optics to model the couple and organization examples. Now, we want to do the same, but in a general way. To do so, we need to extend the core language from Sect. 3.1 with new primitives, specific to the particular domain. We present them, along with their associated type rules, in Fig. 7. As can be seen, it introduces the entity types (Couple and Person) and a term for each optic introduced in Sect. 2.2.1. The type rules just determine the type associated to each term, i.e. the kind, whole and part associated to each optic.

(id gt)      id gt : getter α α
(≫gt)        g : getter α β,  h : getter β γ  ⇒  g ≫gt h : getter α γ
(***)        g : getter α β,  h : getter α γ  ⇒  g *** h : getter α (β, γ)
(like)       b : β,  β ∈ base types  ⇒  like b : getter α β
(not)        g : getter α B  ⇒  not g : getter α B
(>)          g : getter α N,  h : getter α N  ⇒  g > h : getter α B
(==)         g : getter α β,  h : getter α β  ⇒  g == h : getter α B
(−)          g : getter α N,  h : getter α N  ⇒  g − h : getter α N
(id af)      id af : affine α α
(≫af)        a : affine α β,  b : affine β γ  ⇒  a ≫af b : affine α γ
(filtered)   p : getter α B  ⇒  filtered p : affine α α
(to af)      g : getter α β  ⇒  to af g : affine α β
(id fl)      id fl : fold α α
(≫fl)        f : fold α β,  g : fold β γ  ⇒  f ≫fl g : fold α γ
(nonEmpty)   f : fold α β  ⇒  nonEmpty f : getter α B
(to fl)      a : affine α β  ⇒  to fl a : fold α β
(get)        g : getter α β  ⇒  get g : α → β
(preview)    a : affine α β  ⇒  preview a : α → option β
(getAll)     f : fold α β  ⇒  getAll f : α → list β

Figure 6: Optica type system

Entity types       t+ ::= Couples | Couple | Person
Optic expressions  e+ ::= couples | fst | snd | name | age

couples : fold Couples Couple
fst : getter Couple Person
snd : getter Couple Person
name : getter Person S
age : getter Person N

Figure 7: Couple syntax and type system

Entity types       t+ ::= Org | Department | Employee | Task
Optic expressions  e+ ::= departments | dpt | employees | emp | tasks | tsk

departments : fold Org Department
dpt : getter Department S
employees : fold Department Employee
emp : getter Employee S
tasks : fold Employee Task
tsk : getter Task S

Figure 8: Organization syntax and type system

Once we have defined the Optica language primitives (where we place the standard optics and combinators) and the domain extension (where we find the structure of the domain data model in terms of specific optics), we should be able to provide generic domain queries. We adapt differences (Sect. 2.2.1) as follows:
Definition 4. The generic versions of differencesFl (optic expression) and differences (query expression) are implemented as follows, in terms of the Optica and couple-specific primitives.

def differencesFl =
  couples ≫ to fl (
    filtered ((fst ≫ age) > (snd ≫ age)) ≫
      to af ((fst ≫ name) *** ((fst ≫ age) − (snd ≫ age))))

def differences = getAll differencesFl

The implementations of the generic versions of differencesFl and differences are basically the same as the ones we introduced in Sect. 2.2.1 —modulo the differences that we listed in Sect. 3.1 and the fact that the invocations to casting methods such as to af and to fl are made explicit. We can carry out the same exercise for the organization example (Sect. 2.2.2). Once again, we introduce a language extension containing the entity types and terms associated to this example (Fig. 8). Once we do that, we are able to introduce a generic counterpart for the expertise query.

T[N] = Int
T[B] = Boolean
T[S] = String
T[(α, β)] = (T[α], T[β])
T[getter α β] = Getter[T[α], T[β]]
T[affine α β] = AffineFold[T[α], T[β]]
T[fold α β] = Fold[T[α], T[β]]
T[α → β] = T[α] ⇒ T[β]
T[α → option β] = T[α] ⇒ Option[T[β]]
T[α → list β] = T[α] ⇒ List[T[β]]

Figure 9: Optica semantic domains
Definition 5. The generic versions of expertiseFl (optic expression) and expertise (query expression) are implemented as follows, in terms of the Optica and organization-specific primitives.

def expertiseFl =
  departments ≫ to fl (
    filtered (all employees (elem (tasks ≫ to fl (to af tsk)) 'abstract')) ≫
      to af dpt)

def expertise = getAll expertiseFl

At this point, we have defined generic queries which are not coupled to any particular querying infrastructure. In the rest of the paper, we will show how to reuse such queries for generating in-memory, XQuery, SQL and comprehension-based queries.
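Definition 5 leans on the derived combinators of Fig. 5 (empty, all, any and elem). To see that these definitions really behave like the usual quantifiers, here is a hedged sketch that spells them out over a minimal concrete Fold/Getter pair; the classes and helper signatures below are ours, not the paper's Sect. 2 definitions.

```scala
// Minimal concrete optics: a Getter is a total selector, a Fold selects many.
case class Getter[S, A](get: S => A)
case class Fold[S, A](getAll: S => List[A])

def not[S](g: Getter[S, Boolean]): Getter[S, Boolean] = Getter(s => !g.get(s))
def nonEmpty[S, A](f: Fold[S, A]): Getter[S, Boolean] =
  Getter(s => f.getAll(s).nonEmpty)
// fl >>fl filtered(p), fused into a single step for brevity
def filtered[S, A](f: Fold[S, A], p: Getter[A, Boolean]): Fold[S, A] =
  Fold(s => f.getAll(s).filter(p.get))

// Derived definitions, following Fig. 5 term by term
def empty[S, A](f: Fold[S, A]): Getter[S, Boolean] = not(nonEmpty(f))
def all[S, A](f: Fold[S, A], p: Getter[A, Boolean]): Getter[S, Boolean] =
  empty(filtered(f, not(p)))
def any[S, A](f: Fold[S, A], p: Getter[A, Boolean]): Getter[S, Boolean] =
  not(all(f, not(p)))
def elem[S, A](f: Fold[S, A], a: A): Getter[S, Boolean] =
  any(f, Getter(_ == a))

val ints: Fold[List[Int], Int] = Fold(identity)
```

Note that the universal quantifier all is obtained purely by double negation over filtering, which is exactly the shape that resurfaces later as nested NOT EXISTS in the SQL interpretation.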
In defining a new language, it is common practice to start with its syntax and type system, and then proceed to define its semantics. In our case, we have proceeded in reverse: we started with the intended semantics (optics and queries) and created an abstract syntax and type system which mimic its structure. Therefore, what is new in this section is how to formalize the connection between the syntax and type system of Optica and concrete optics, its intended semantics. For this task, we provide the semantic functions T (Fig. 9) and E (Fig. 10). The first maps Optica types to their corresponding semantic domains. The second maps an expression of type t to an element of the semantic domain T(t). As can be seen, T just maps types to their Scala counterparts (Scala does not include a standard type for natural numbers; instead of supplying one on our own, we prefer to choose the standard Int type for simplicity). Given this scenario, the implementation of E turns out to be trivial. In fact, we just translate the Optica expressions into their Scala analogues from Sect. 2. We use ⊕ to unify the different binary combinators (>, −, etc.).

E[id gt : getter α α] = Getter.id
E[g ≫gt h : getter α γ] = Getter.andThen(E[g : getter α β], E[h : getter β γ])
E[g *** h : getter α (β, γ)] = Getter.fork(E[g : getter α β], E[h : getter α γ])
E[like b : getter α β] = Getter.like(E[b : β])
E[not g : getter α B] = Getter.not(E[g : getter α B])
E[g ⊕ h : getter α δ] = Getter.⊕(E[g : getter α β], E[h : getter α γ])
E[id af : affine α α] = AffineFold.id
E[g ≫af h : affine α γ] = AffineFold.andThen(E[g : affine α β], E[h : affine β γ])
E[filtered p : affine α α] = AffineFold.filtered(E[p : getter α B])
E[to af g : affine α β] = AffineFold.to af(E[g : getter α β])
E[id fl : fold α α] = Fold.id
E[g ≫fl h : fold α γ] = Fold.andThen(E[g : fold α β], E[h : fold β γ])
E[nonEmpty g : getter α B] = Fold.nonEmpty(E[g : fold α β])
E[to fl a : fold α β] = Fold.to fl(E[a : affine α β])
E[get g : α → β] = E[g : getter α β].get
E[preview g : α → option β] = E[g : affine α β].preview
E[getAll g : α → list β] = E[g : fold α β].getAll

Figure 10: Optica standard semantics

T[Couples] = Couples
T[Couple] = Couple
T[Person] = Person

E[couples : fold Couples Couple] = CoupleModel.couples
E[fst : getter Couple Person] = CoupleModel.fst
E[snd : getter Couple Person] = CoupleModel.snd
E[name : getter Person S] = CoupleModel.name
E[age : getter Person N] = CoupleModel.age

Figure 11: Semantic domains and standard semantics for couples extension

We also need to take into account the evaluation of the extended versions of the language, where terms specific to each example are introduced. For instance, Fig. 11 shows the semantic domains and the evaluation of each term from the couples example extension. It is also trivial, since it just maps the domain-specific terms to the concrete optics from Sect. 2.2.1. The corresponding instance for the organization extension follows the very same pattern and we omit it for brevity. Once we have defined the standard semantics for all terms, we should be able to translate generic queries into plain functions, by means of E. We evaluate differences (Def. 4) as follows:

def differencesR: Couples ⇒ List[(String, Int)] =
  E[differences : Couples → list (S, N)]

As can be seen, the resulting value is a Scala function that works with immutable data structures. Finally, expertise (Def. 5) is evaluated in this way:

def expertiseR: Org ⇒ List[String] =
  E[expertise : Org → list S]

It recovers a Scala function which selects the corresponding department names. These functions are exactly the same as their counterparts from Sect. 2.
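To make this recovery concrete, the function obtained for differences can be written out by hand. A hedged sketch follows: the Person/Couple case classes reconstruct the Sect. 2.2.1 model (which is not reproduced in this excerpt), and the sample data mirrors the rows later shown in Fig. 16.

```scala
// Reconstructed couples model (assumed shape, not the paper's verbatim code)
case class Person(name: String, age: Int)
case class Couple(fst: Person, snd: Person)
type Couples = List[Couple]

// filtered ((fst >> age) > (snd >> age)) followed by the projection
// (fst >> name) *** ((fst >> age) - (snd >> age)), as a plain function
val differencesR: Couples => List[(String, Int)] =
  _.filter(c => c.fst.age > c.snd.age)
   .map(c => (c.fst.name, c.fst.age - c.snd.age))

val couples: Couples = List(
  Couple(Person("Alex", 60), Person("Bert", 55)),
  Couple(Person("Cora", 33), Person("Demi", 31)),
  Couple(Person("Eric", 21), Person("Fred", 60)))
```

Running `differencesR(couples)` keeps only the couples whose first member is older, pairing her name with the age gap.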
4. XQuery
So far, we have seen that optics allow us to manipulate immutable data structures in a modular and elegant way, and that concrete optics can be lifted into the Optica language, a full-blown DSL. The standard semantics of Optica is given in terms of concrete optics; however, this is not significant from the point of view of language-integrated query. The state of real applications is mostly handled through SQL and NoSQL databases, web services, etc.; therefore, this section and the following will show how to reuse Optica expressions in order to generate queries for external data sources beyond in-memory data structures. In particular, this section shows how getters, affine folds and folds from the Optica DSL can actually be given a non-standard representation in terms of XQuery expressions. Prior to that, we will manually adapt the differences and expertise queries and corresponding models into the XML / XQuery setting [38] in an idiomatic way. This will serve as a point of reference to implement the aforementioned semantics. In this sense, there are several assumptions that we need to make in order to map optics and XML models, which are described subsequently.

4.1. XML / XQuery Background
Accommodating objects into XML types is not a trivial task [41]. Figure 12 shows a possible way of encoding the state of the couples example in an XML document. It contains a root element xml from which couples hang as couple elements, which in turn contain subelements for the first (fst) and second (snd) members that form the couple. Finally, name and age are simple tags that contain primitive attributes.

Usually, an XML document is accompanied by an XSD schema, which is essential to validate the information that we place in the document. The schema associated to the couple document can be found in Appendix B.1. Among other things, it prevents us from defining people without a name element, placing non-numerical values as age values, and defining several fst elements inside a couple. Later, we will see that it is important to take this schema into account while implementing queries.

Now, we would like to produce an XQuery expression, analogous to differences. It should be able to collect the name and age difference of all people who occupy the first position in the couple.
E xml[couples : fold Couples Couple] = couple
E xml[fst : getter Couple Person] = fst
E xml[snd : getter Couple Person] = snd
E xml[name : getter Person S] = name
E xml[age : getter Person N] = age

Figure 14: XQuery non-standard semantics for couples extension

…the same semantic domain as the one that we have embraced for queries. Therefore, we define T xml as follows:

T xml[t] = XQuery
In fact, regardless of the input type, it will always evaluate to an XQuery expression. Remark 8 will shed some light on this decision. The rest of the section revolves around the details of E xml and discusses the results.

Before presenting the implementation of E xml, there are several assumptions about the adaptation of the optic models into XML schemas that need to be made, where we basically adopt the same conventions that we have seen in Sect. 4.1. Firstly, we will assume that all information hangs from an xml element, which acts as the root of the XML document. Secondly, we will assume that every optic corresponds to an XML element, where the optic kind determines the cardinality. Finally, optics that select base types are adapted as simple type elements containing a value with the corresponding base type; optics that select domain entity types are adapted as elements with complex type, since they nest other elements in turn. Each of these conventions is assumed in the XSD schemas that can be found in the appendix.

Now, we have all the ingredients that we need to provide the implementation of the XQuery evaluator. Given its simplicity, we will start with the evaluation of the extended terms for the couples example, which we have collected in Fig. 14. As we said in the previous paragraph, optics correspond to XML elements, and thus we represent them as mere element selections. Indeed, optic names are good candidates as tag names. However, we need to adjust the plural names of folds into the singular, as in couple, since this information is supplied as individual elements. The evaluation for the organization model should be straightforward now and does not add any value; therefore, we do not show it.

The evaluations for the core combinators are collected in Fig. 15. We start with the combinators for getters. Firstly, ≫gt is translated as nested access, where the evaluations of the composing expressions g and h are tied together.
E xml[id gt : getter α α] = .
E xml[g ≫gt h : getter α γ] = E xml[g : getter α β] / E xml[h : getter β γ]
E xml[g *** h : getter α (β, γ)] = <tuple><fst>E xml[g : getter α β]</fst><snd>E xml[h : getter α γ]</snd></tuple>
E xml[like b : getter α β] = b
E xml[not g : getter α B] = not(E xml[g : getter α B])
E xml[g ⊕ h : getter α δ] = (E xml[g : getter α β] ⊕ E xml[h : getter α γ])
E xml[id af : affine α α] = .
E xml[g ≫af h : affine α γ] = E xml[g : affine α β] / E xml[h : affine β γ]
E xml[filtered p : affine α α] = .[E xml[p : getter α B]]
E xml[to af g : affine α β] = E xml[g : getter α β]
E xml[id fl : fold α α] = .
E xml[g ≫fl h : fold α γ] = E xml[g : fold α β] / E xml[h : fold β γ]
E xml[nonEmpty g : getter α B] = exists(E xml[g : fold α β])
E xml[to fl a : fold α β] = E xml[a : affine α β]
E xml[get g : α → β] = /xml/E xml[g : getter α β]
E xml[preview g : α → option β] = /xml/E xml[g : affine α β]
E xml[getAll g : α → list β] = /xml/E xml[g : fold α β]

Figure 15: XQuery non-standard semantics

For *** we use XML interpolation, where the evaluations of the composing expressions are placed in the corresponding projection elements. Finally, id gt is interpreted as a self reference (.), which is neutral under composition. Now, we move on to standard getter constructions, beginning with like. Since it produces constant optics, whose part does not depend on the surrounding whole, we decide to map them to XQuery literals. Next, we can see that not is interpreted as the function not, and binary combinators, which are unified by the symbol ⊕, are interpreted as the corresponding XQuery operations.

Moving on to affine folds, we find that the composition and identity primitives share the same implementation as the ones we have seen for getters.
This situation —which also occurs in fold combinators— demonstrates that we do not make a difference between semantic domains in the interpretation. In fact, if we understand XQuery as a representation of an affine fold, it is natural that we can also use it as a representation of a getter, and the implementation of to af confirms this intuition. This module also contains filtered. Since we have a filtering mechanism available in XQuery, we simply interpret this primitive into square brackets ([]), passing the semantics of the predicate getter as an argument to it.

Finally, we present the fold-related method nonEmpty. In this particular case, we need to adapt any fold into a getter that selects a boolean. Luckily, XQuery provides a function exists which turns XQuery expressions into booleans. It does so by checking that the result produced by the query is not empty. It might have been noticed that exists was not even mentioned in the background section. In fact, it was not necessary, since the not function does the trick by turning an expression denoting a sequence into a boolean. In particular, not(exists(sq)) (where sq denotes a sequence of elements) is equivalent to not(sq). However, while evaluating, we do not know whether the exists invocation denoted by nonEmpty will be consumed by a function like not (denoted by another expression), and thus we need to invoke exists explicitly.

As final notes, we must say that interpreting optic expressions like differencesFl (Def. 4) or expertiseFl (Def. 5) leads to relative queries, i.e. queries that do not start with / and which are relative to the current context. Those queries are valid XQuery expressions, but they will not produce any results if we run them against the XML document which contains the whole hierarchy. Fortunately, we could easily compose such relative queries with the ones generated by external models to produce queries over more complex domains. Leaving this possibility aside, the next section shows the final refinement that we need to perform in order to obtain the expected XQuery expressions.

The evaluation of query expressions from Optica can be found at the bottom of Fig. 15. Since both optic and query types denote an XQuery expression, the semantics of query expressions is almost direct. The only caveat is that get, preview and getAll prepend the /xml fragment to the relative query obtained in the optic representation. This is just a consequence of one of the assumptions that we made when adopting XML, where we stated that an <xml/> root element was necessary by convention. Thereby, we take the opportunity to prepend it here. At this point, with the required evaluations at hand, we should be able to recover the target queries. As a result, E xml[differences] provides the following XQuery expression:

/xml/couple[fst/age > snd/age]/
As we have seen throughout this section, both optic and query types are evaluated into the same semantic domain XQuery. Indeed, if we leave interpolation facilities aside, this is essentially an interpretation into XPath, which is just a language to select parts from an XML document, just like optics select parts from immutable data structures. In this sense, it is only natural that XQuery can behave as a non-standard optic representation.
5. SQL
SQL is a query language for relational data sources, which greatly differs from the hierarchical nature of both XML and optic models. Nevertheless, this section will show that we can generate SQL statements from Optica expressions. Firstly, we manually adapt the couple and organization examples into the SQL setting to better understand the kind of queries that we want to produce. Then, we present the SQL non-standard semantics of Optica and the assumptions that we build upon in order to automatically generate queries analogous to the ones that we have obtained manually.

As opposed to XML, relational databases are organized around flat data sources. As a consequence, we face the object-relational impedance mismatch [5] when trying to accommodate the object models underlying optics into the relational setting. Fortunately, there are patterns that we can embrace to approach this task, like the Foreign Key Aggregation or the Foreign Key Association patterns [42]. We take them as a reference and propose the following tables to adapt the couples example model that we introduced in 2.2.1:
CREATE TABLE Person (
  name varchar(255) PRIMARY KEY,
  age int NOT NULL
);

CREATE TABLE Couple (
  fst varchar(255) NOT NULL,
  snd varchar(255) NOT NULL,
  FOREIGN KEY (fst) REFERENCES Person(name),
  FOREIGN KEY (snd) REFERENCES Person(name)
);

Footnote: Once again, these invocations could be removed from the resulting query by means of annotations, as in [19], but we wanted to keep the interpretation compositional in order to make it simpler.
Footnote: http://basex.org/basex/xquery/

name  age
Alex  60
Bert  55
Cora  33
Demi  31
Eric  21
Fred  60

(a) Person

fst   snd
Alex  Bert
Cora  Demi
Eric  Fred

(b) Couple

Alex  5
Cora  2

(c) Differences

Figure 16: Data for the couples example.
As can be seen, case classes are adapted as tables and their attributes are adapted as columns. Once again, as we have seen in the XQuery interpretation, it is necessary to distinguish between attributes which contain base types and attributes containing other entities. In fact, attributes that refer to entities require pointers to establish the precise connections between the adapted tables, following the Foreign Key Aggregation pattern. We assume figures 16a and 16b as the initial state for these tables, where the columns in Couple are clearly selecting names from Person.

Previously, we saw that the adaptation of differences in the XML setting produced XML as output. We are now dealing with SQL tables, where the output of a statement is a table itself. Thereby, we would expect Fig. 16c as the result of executing the adaptation of differences. In particular, we could produce such output with the following query:
SELECT w.name, w.age - m.age
FROM Couple c
INNER JOIN Person w ON c.fst = w.name
INNER JOIN Person m ON c.snd = m.name
WHERE w.age > m.age;
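Before examining the statement clause by clause, it may help to replay it in memory. A hedged sketch follows, over the rows of Fig. 16; the row case classes are our own encoding of the tables, and the for-comprehension mirrors the join/filter/projection structure of the SQL.

```scala
// Rows of the Person and Couple tables (Fig. 16a and 16b)
case class PersonRow(name: String, age: Int)
case class CoupleRow(fst: String, snd: String)

val person = List(
  PersonRow("Alex", 60), PersonRow("Bert", 55), PersonRow("Cora", 33),
  PersonRow("Demi", 31), PersonRow("Eric", 21), PersonRow("Fred", 60))
val couple = List(
  CoupleRow("Alex", "Bert"), CoupleRow("Cora", "Demi"), CoupleRow("Eric", "Fred"))

// FROM Couple c INNER JOIN Person w ON c.fst = w.name
//               INNER JOIN Person m ON c.snd = m.name
// WHERE w.age > m.age
// SELECT w.name, w.age - m.age
val result: List[(String, Int)] =
  for {
    c <- couple
    w <- person if c.fst == w.name
    m <- person if c.snd == m.name
    if w.age > m.age
  } yield (w.name, w.age - m.age)
```

The generators play the role of the FROM variables c, w and m, the guards correspond to the join conditions and the WHERE clause, and the yield is the SELECT projection, reproducing Fig. 16c.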
This statement is clearly separated into three major sections. First, we describe FROM, which builds the raw table that the other parts use to gather information from. This table is created by joining the table Couple with two occurrences of table Person, thereby incorporating the information from the couple members fst and snd. Variables c, w and m allow us to refer to these three tables. Second, the WHERE clause introduces filters that are applied over the compound table to discard the rows that do not match the criteria: those where the age of the first member is not greater than the age of the second one. Last, the SELECT clause indicates the columns that we are interested in: the name of the first member and the age difference.

Now, we move on to the organization example. First of all, we create tables for departments, employees and tasks following the same adaptation pattern:

CREATE TABLE Department (
  dpt varchar(255) PRIMARY KEY
);

CREATE TABLE Employee (
  emp varchar(255) PRIMARY KEY,
  dpt varchar(255) NOT NULL,
  FOREIGN KEY (dpt) REFERENCES Department(dpt)
);

CREATE TABLE Task (
  tsk varchar(255) NOT NULL,
  emp varchar(255) NOT NULL,
  FOREIGN KEY (emp) REFERENCES Employee(emp)
);
All components in the previous statements should be familiar at this point, but there is an important change in the way we configure foreign keys. As we have seen in the couples example, getters selecting entities were mapped into a column containing a foreign key. However, the organization example contains multivalued attributes, like employees or tasks, that should not be adapted as a single column. For this situation we adopt the Foreign Key Association pattern. We assume that these tables have been populated with the data in figures 17a, 17b and 17c.

dpt
Product
Quality
Research
Sales

(a) Department

emp   dpt
Alex  Product
Bert  Product
Cora  Research
Demi  Research
Eric  Research
Fred  Sales

(b) Employee

tsk       emp
build     Alex
build     Bert
abstract  Cora
build     Cora
design    Cora
abstract  Demi
design    Demi
abstract  Eric
call      Eric
design    Eric
call      Fred

(c) Task

dpt
Quality
Research

(d) Expertise

Figure 17: Data for the organization example.

As we have seen before, Quality and Research are the departments where all employees are able to abstract; therefore, the adaptation of expertise should produce Fig. 17d as a result. We propose the following query to generate it:
SELECT d.dpt
FROM Department AS d
WHERE NOT (EXISTS (
  SELECT e.*
  FROM Employee AS e
  WHERE NOT (EXISTS (
    SELECT t.*
    FROM Task AS t
    WHERE (t.tsk = "abstract") AND (e.emp = t.emp)))
  AND (d.dpt = e.dpt)));

Reading this query is by no means trivial. Fortunately, it shares the same pattern as the query that we saw while adapting expertise in the XQuery setting. In fact, EXISTS is a function that returns true as long as the nested statement produces non-empty results. If we combine it with NOT to negate predicates, we can check whether all rows satisfy a condition. Beyond the noise generated by this pattern, there are additional filters which manifest relations between nested and outer variables, introducing even more complexity into the picture.
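The doubly-nested NOT EXISTS pattern becomes easier to read once replayed in memory: NOT EXISTS of a counterexample is just a universal quantification. A hedged sketch over the rows of Fig. 17 (row case classes are our own encoding of the tables):

```scala
// Rows of the Employee and Task tables (Fig. 17b and 17c)
case class EmployeeRow(emp: String, dpt: String)
case class TaskRow(tsk: String, emp: String)

val department = List("Product", "Quality", "Research", "Sales")
val employee = List(
  EmployeeRow("Alex", "Product"), EmployeeRow("Bert", "Product"),
  EmployeeRow("Cora", "Research"), EmployeeRow("Demi", "Research"),
  EmployeeRow("Eric", "Research"), EmployeeRow("Fred", "Sales"))
val task = List(
  TaskRow("build", "Alex"), TaskRow("build", "Bert"),
  TaskRow("abstract", "Cora"), TaskRow("build", "Cora"), TaskRow("design", "Cora"),
  TaskRow("abstract", "Demi"), TaskRow("design", "Demi"),
  TaskRow("abstract", "Eric"), TaskRow("call", "Eric"), TaskRow("design", "Eric"),
  TaskRow("call", "Fred"))

// d is kept iff there is NO employee of d withOUT an "abstract" task,
// mirroring WHERE NOT (EXISTS (... WHERE NOT (EXISTS (...))))
val expertiseR: List[String] = department.filter { d =>
  !employee.exists(e =>
    e.dpt == d && !task.exists(t => t.tsk == "abstract" && t.emp == e.emp))
}
```

Note how Quality qualifies vacuously: it has no employees, so no counterexample exists, exactly as in the SQL statement.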
The peculiarities of SQL have been known for a long time now [43]. As Date states in connection with certain aspects of SQL, "there is so much confusion in this area that it is difficult to criticize it coherently". Part of the problem resides in the fact that the formal definition of SQL was produced after the fact, where many academic considerations were neglected. Consequently, the language does have its weak points, where the lack of orthogonality becomes a central issue. Although many deficiencies have been remedied in the last decades, the obtrusive syntax of the SELECT statement remains a problem. For example, despite the fact that relational algebra combinators may appear in any order, the rigid structure of SELECT statements might demand the programmer to recast a relational algebra expression that is considered natural (like UNION (tabexp1, tabexp2)) into a semantically equivalent form, compliant with the SQL standard (like (SELECT ... FROM ... WHERE ...) UNION (SELECT ... FROM ... WHERE ...)). Fortunately, [16] supplies a list of syntactic rules which we can use to rewrite any expression from an ordinary impure functional programming language into its SQL form. Optica expressions share with relational algebra the purely compositional character of algebraic expressions; hence, they also require a set of transformations before being able to be translated to SQL queries. These transformations will not be carried out on the optic expression itself, but through a new semantic domain which plays the role of an intermediate expression that can be directly translated to SQL.

T sql[getter α β] = Triplet → Triplet
T sql[affine α β] = Triplet → Triplet
T sql[fold α β] = Triplet → Triplet
T sql[α → list β] = (S → S) → SQL
T sql[N] = Fragment
T sql[B] = Fragment
T sql[S] = Fragment

Figure 18: SQL semantic domains
Accordingly, the new semantic domains defined by the semantic function T sql are shown in Fig. 18. Firstly, all optic types are mapped to a Triplet endofunction. Triplets are the intermediate expressions which lie between optic and SQL expressions, whose major purpose is to reconcile the main disagreements among them. Secondly, since we aim at generating SQL statements, the semantic domain associated to query types is, as expected, an SQL expression. However, it is required to supply a function (S → S) that maps relational table names to the column name which corresponds to the primary key —information that is not contemplated by the optic model— in order to produce SQL statements. It is important to remark that the types of the queries get and preview are ignored here. Later on we will explain why this partiality is needed. Finally, base types are mapped to triplet fragments, i.e. their evaluation will be used to form triplets.

For the rest of the section we will proceed as usual, introducing the semantic function E sql, which is responsible for evaluating domain, optic and query terms, and we will conclude discussing the results. Prior to that, we find it essential to describe the details of the intermediate structure Triplet.

As we have just seen, a SQL select statement exhibits a remarkable separation of concerns, where selection and filtering, although sharing syntax, belong to different query clauses. This separation requires a unifying mechanism to refer to the very same item from both clauses. SQL solves this problem by means of variables declared in the FROM clause, which are accessible from the SELECT and WHERE scopes.

This way of representing queries in SQL contrasts with its optic counterpart. In Optica, the aspects of selection and filtering may appear anywhere within the expression. Moreover, the information required by these components does not need to be collected in a single FROM component, but specified on demand. In optic expressions, there is no need for variables either, since it is the context where two optics appear that determines whether they are selecting the same item or not. For example, consider the following optic expression (this query is only a less direct way of implementing the query under50 from Sect. 1), where we find two occurrences of fst:

couples ≫ filtered (fst ≫ age < like 50) ≫ fst ≫ name

Despite having one of them surrounded by filtered, we can see that both of them are selecting the very same person. Furthermore, note that the information required by the filtering expression (the age of the first member) is collected within the predicate scope and not shared globally.

Figure 19: Triplet generated for differencesFl.
Triplet is the data structure that we use as an intermediary to alleviate the aforementioned disagreements. Its main objective is to segregate, out of an Optica expression, the three different aspects which are evident in a SELECT statement. In particular, a triplet is made of three components which correspond to the SELECT, FROM and WHERE clauses, respectively. We present an informal view of this concept in Fig. 19, where we represent the triplet associated to the expression differencesFl (Def. 4). A triplet may be considered as a structured optic whose actual focus is determined through three components:

• The middle component determines the potential focus of the optic. In particular, Fig. 19 shows this component as a trie (https://en.wikipedia.org/wiki/Trie) whose edges are optics focusing on entity types (not base types). Its elements are sequences of optics that represent a vertical composition, e.g. the sequence made of the primitive fold couples and the getter fst represents the fold couples ≫ fst. The figure labels each node with a distinct name that refers to the entities of that unique path. In this example we potentially refer to the list of couples (c) and two lists of people: their first (w) and second (m) members. The nodes of the trie, colored in black, and their associated names can be reused in the left and right components.

• The right component further constrains the potential collections of entities identified by the entity trie, by imposing conditions over them. In the example, there is just one condition that restricts the collection of couples (and, consequently, its dependent collections of people) to those where her age (w) is greater than his (m). These conditions are represented in terms of directed graphs whose edges are optics or binary combinators, like >, which make two different paths converge. Note that red nodes form restriction graphs.

• Last, the left component defines the actual selection of the overall optic by selecting in sequence certain collections from the entity trie, and possibly by further refining them through additional optic expressions selecting base values. In the example, we select her name and the age difference of the couple (which will be greater than 0) according to the constraints which were imposed by the right component. Selections are represented using the same graphs as in the constraint component, but the nodes forming them are colored in blue.

t ::= (s, f, w)
s ::= (e₁, e₂, …, eₙ)
f ::= / | insert ˆp f
w ::= {e₁, e₂, …, eₙ}
e ::= like c | not e | e > e | e == e | e − e | ˆp | ˆp.optic | nonEmpty t
ˆp ::= (optic₁, optic₂, …, opticₙ)

Figure 20: Triplet syntax

We formalize the notion of triplet in Fig. 20 through its associated syntax. As we have pointed out previously, the middle component is just a trie whose keys are primitive optic expressions focusing on entities. Thus, the elements stored in the trie are sequences of such expressions, which we will refer to as paths (ˆp). (We will use hats, as in ˆp, to emphasize the terms which correspond to paths.) Entity tries may be the empty trie, /, or the result of inserting a new path, insert ˆp f. The left and right components, s and w, are a sequence and a set of expressions (e), respectively. Repeated restrictions in w are redundant and their ordering is irrelevant, which is why a set is chosen. Expressions e are very similar to those from Optica (like, not, >, etc.), but there are a few major changes that deserve further explanation. Essentially, expressions do not include vertical composition as such; instead, if the vertical composition selects an entity, it is simply represented through a path from the entity trie. Otherwise, if it selects a base type, it is represented as the projection of an attribute from a path, as in ˆp.optic. For instance, the Optica expression couples ≫ fst would denote the path (couples, fst), while the expression couples ≫ fst ≫ name would denote the projection (couples, fst).name. Horizontal composition is also unneeded, since the left component is able to collect a sequence of single selections.
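To make the shape of Fig. 20 concrete, the following is a minimal Python sketch of the triplet structure, with paths encoded as tuples of primitive-optic names and the entity trie as its prefix-closed set of paths. All names here (Triplet, insert, EMPTY) are illustrative choices of this sketch, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    select: tuple      # left component: sequence of selection expressions
    trie: frozenset    # middle component: entity trie as its set of paths
    where: frozenset   # right component: set of restriction expressions

def insert(path, trie):
    """Inserting a path into the trie also inserts all of its prefixes."""
    return trie | frozenset(path[:i] for i in range(len(path) + 1))

# The empty triplet (Def. 6): a single empty-path selection, the empty trie
# and no restrictions.
EMPTY = Triplet(select=((),), trie=frozenset({()}), where=frozenset())

# The entity trie for differencesFl holds the paths c = (couples),
# w = c + (fst) and m = c + (snd):
trie = insert(("couples", "fst"), insert(("couples", "snd"), EMPTY.trie))
```

Keeping the trie prefix-closed mirrors the fact that inserting couples ≫ fst also makes the intermediate couples node (c in Fig. 19) available for reuse.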
Finally, expressions also contain a nonEmpty term, where we keep a snapshot of a triplet that is later used to produce nested queries. Section 5.2.4, where we formalize the precise correspondence between triplets and SQL, will show that a nonEmpty term is eventually translated into an EXISTS operator.

At this point, we might consider using Triplet as the chosen optic representation. However, composing the different triplets generated by optic subexpressions turns out to be a clumsy task. Instead, we would like to use a representation with better compositional guarantees. In this sense, it is more convenient to use a triplet endofunction, so that each subexpression can describe the precise transformation that it performs over the input triplet when it is composed through vertical composition. This is how we obtain Triplet → Triplet, the chosen semantic domain for optic types. (This is reminiscent of the functional representation of difference lists, where concatenation is implemented in terms of plain composition, and the list is recovered by passing the empty list as input. In this case, the analogues of the empty list and concatenation are the empty triplet (Def. 6) and vertical composition.) We illustrate the idea behind this function in Fig. 21, which shows the evolution of the differencesFl query, starting from the empty triplet. The arcs in this figure are labelled by the optic subexpressions that identify the applied transformations. As expected, the last triplet in the chain corresponds to the structure that we presented in Fig. 19. We will detail these steps throughout the next sections while presenting the E_sql definition.

This section provides the semantics of primitive optics from the domain syntax, such as couples, fst, etc., in terms of the precise transformations that they carry out over an input Triplet. Their formalization can be found in Fig. 22, using the couples domain for illustration purposes. Note that ⌢ represents the concatenation of sequences. Before explaining this formalization, we will describe the occurrences of domain primitives in the particular example shown in Fig. 21, as well as in Fig. 24, where Step b) is shown in detail.

• Step a) shows the changes introduced by the term couples.
This is a very special case since it takes the initial triplet as input. As can be seen, the new changes consist of introducing the new path in the trie and selecting it in the left component. Bear in mind that we can only introduce in the trie optics selecting domain entities, like couples, which selects a sequence of Couple entities.

• Step b) contains more domain terms in the predicate. In particular, Step b1) (Fig. 24) shows the changes introduced by fst when it is applied to a triplet that focuses on couples. Since this optic focuses on entities and the input triplet is not empty, the result is a triplet that extends the entity trie by appending the new optic to the couples path, changing its focus (i.e. the left part of the triplet) to the new path w.

• Step b3) represents the changes introduced by age. In this case, we deal with an optic selecting a base type N. Thereby, it cannot be introduced in the trie. Instead, we refine the focus of the input triplet, which becomes a projection through the new optic.

Fortunately, the behaviour of the first and second cases can be factorized, as long as we contemplate the following definition for the empty triplet:

Definition 6. We formalize the empty triplet as the one that contains a single selection ˆ(), an empty trie and an empty set of restrictions.

empty = ((ˆ()), /, ∅)

The empty sequence () is used in tries to refer to the root, and thereby we use it as the initial path in the left component.

However, our formalization must take into account the distinction between optics selecting entities and optics selecting base types. Figure 22 introduces the functions base and entity for this task, where ↦ just represents the standard "maps to" notation for functions. They take the optic expression as parameter and produce triplet endofunctions as result. The rest of the implementation should be straightforward, given the previous explanations. We do not show the evaluation of the organization terms, since they follow the very same pattern, exploiting entity and base.

Figure 21: Triplet evolution for differencesFl

E_sql[_ : op α β where op ∈ {getter, affine, fold}] :: Triplet → Triplet
E_sql[couples : fold Couples Couple] = entity couples
E_sql[fst : getter Couple Person] = entity fst
E_sql[snd : getter Couple Person] = entity snd
E_sql[name : getter Person S] = base name
E_sql[age : getter Person N] = base age

base b = ((ˆx), f, w) ↦ ((ˆx.b), f, w)
  where ˆx is an element of f
entity e = ((ˆx), f, w) ↦ ((ˆy), f2, w)
  where ˆy = ˆx ⌢ (e) and f2 = insert ˆy f and ˆx is an element of f

Figure 22: Triplet non-standard semantics for the couples extension
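The base and entity endofunctions of Fig. 22 can be sketched in Python as follows. The encoding (dicts with select/trie/where keys, tuple paths) is an assumption of this illustration, and the single-selection destructuring relies on Prop. 1.

```python
def entity(e):
    """Fig. 22's `entity`: extend the focused path with optic e and
    register the extended path (and its prefixes) in the trie."""
    def go(t):
        (x,) = t["select"]                  # single selection (Prop. 1)
        y = x + (e,)
        prefixes = {y[:i] for i in range(len(y) + 1)}
        return {"select": (y,), "trie": t["trie"] | prefixes,
                "where": t["where"]}
    return go

def base(b):
    """Fig. 22's `base`: refine the focused path into a projection of the
    base-typed attribute b; the trie is left untouched."""
    def go(t):
        (x,) = t["select"]
        return {"select": ((x, b),), "trie": t["trie"], "where": t["where"]}
    return go

empty = {"select": ((),), "trie": {()}, "where": set()}

# couples ≫ fst ≫ name evaluates to the chaining of three endofunctions,
# starting from the empty triplet (Def. 6):
t = base("name")(entity("fst")(entity("couples")(empty)))
```

Note how the base-typed step (name) changes only the focus, while the entity steps (couples, fst) also grow the trie, exactly the factorization discussed above.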
This section specifies the triplet transformations that are associated to the Optica core combinators, which can be found in Fig. 23. Before delving into the semantics of the getter, affine fold and fold combinators, we introduce the next definitions, which will be useful for ensuring the consistency of the formalization:

Definition 7. Given e : optic β γ, where optic ∈ {getter, affine, fold}, a triplet t is a valid input for e if one of the following conditions holds: (1) t = empty; (2) t = E_sql[e′ : optic′ α β] t′, for some e′, t′ such that t′ is a valid input for e′.

Basically, an input triplet is valid for a given optic e if it is the empty triplet (Def. 6), or if it is the result obtained from evaluating an optic expression e′ with a valid input, where the 'part' type of optic e′ coincides with the 'whole' type of e.

Definition 8. A singleton model type is either a base type or a domain type, i.e. it is the result of discarding product types from Optica model types.

Proposition 1. Let e : optic α β, where optic ∈ {getter, affine, fold}, β a singleton model type, and t a valid input for e; then:

((s), _, _) = E_sql[e : optic α β] t

The proposition states that, given a valid input, the result of evaluating an optic that selects a singleton model type always contains a single selection s. This can be easily proven by induction, since all combinators producing optics that select singleton model types generate a single expression as left component, according to Fig. 23. In fact, this proposition turns out to be necessary for the implementations of base and entity (Fig. 22) to be well-defined.

Getters.
First, we describe the implementation of ≫_gt. As Fig. 21 suggests, vertical composition should be evaluated as the chaining of transformations, i.e. as function composition. Consequently, id_gt is implemented as the identity function, meaning no transformation at all.

E_sql[_ : op α β where op ∈ {getter, affine, fold}] :: Triplet → Triplet
E_sql[id_gt : getter α α] = t ↦ t
E_sql[g ≫_gt h : getter α γ] = E_sql[h : getter β γ] · E_sql[g : getter α β]
E_sql[g ∗∗∗ h : getter α (β, γ)] = t ↦ (s₁ ⌢ s₂, f₁ ⋎ f₂, w₁ ∪ w₂)
  where (s₁, f₁, w₁) = E_sql[g : getter α β] t
  and (s₂, f₂, w₂) = E_sql[h : getter α γ] t
E_sql[like b : getter α β] = (_, f, w) ↦ ((like b), f, w)
E_sql[not g : getter α B] = (((b), f, w) ↦ ((not b), f, w)) · E_sql[g : getter α B]
E_sql[g ⊕ h : getter α δ] = t ↦ ((b₁ ⊕ b₂), f₁ ⋎ f₂, w₁ ∪ w₂)
  where ((b₁), f₁, w₁) = E_sql[g : getter α β] t
  and ((b₂), f₂, w₂) = E_sql[h : getter α γ] t
E_sql[id_af : affine α α] = t ↦ t
E_sql[g ≫_af h : affine α γ] = E_sql[h : affine β γ] · E_sql[g : affine α β]
E_sql[filtered g : affine α α] = (s, f, w) ↦ (s, f′, {b} ∪ w)
  where ((b), f′, ∅) = E_sql[g : getter α B] (s, f, ∅)
E_sql[to_af g : affine α β] = E_sql[g : getter α β]
E_sql[id_fl : fold α α] = t ↦ t
E_sql[g ≫_fl h : fold α γ] = E_sql[h : fold β γ] · E_sql[g : fold α β]
E_sql[nonEmpty g : getter α B] = (s, f, w) ↦ ((nonEmpty (E_sql[g : fold α β] (s, f, ∅))), f, w)
E_sql[to_fl a : fold α β] = E_sql[a : affine α β]

Figure 23: Triplet non-standard semantics

Figure 24: Filtered step in detail

Step e) (Fig. 21) shows an example of horizontal composition (∗∗∗), where a pair of diverging triplets are somehow combined.
In this special case, the changes are only reflected in the left component, since the middle and right components are exactly the same in both triplets. The evaluation of ∗∗∗ supplies the input triplet t as an argument to the evaluations of g and h, which results in a pair of diverging triplets, as those in the illustration. To combine them, we concatenate the selections s₁ and s₂, we merge the tries f₁ and f₂ (⋎), and we take the union of the sets of restrictions w₁ and w₂. Both the union of sets and the merging of tries are idempotent operations.

Next, we find like and not as examples of unary standard combinators which just update the left component of the input triplet. The former ignores the previous selection and replaces it by the constant value. The latter transforms the triplet by applying the operation over the current selection. The evaluation demands a single expression as input, where we rely on Prop. 1. Moreover, the Optica type system guarantees that such an expression represents an optic selecting a boolean.

Finally, Step b5) (Fig. 24) represents a binary combinator. The situation is very similar to that of ∗∗∗. However, instead of concatenating the selections, their single components are fused into the corresponding expression. The evaluation of this term assumes that the triplets which derive from the evaluations of g and h contain singleton selections. Once again, we rely on Prop. 1, since all the binary combinators that we can find in Optica take base types as operands.

Affine Folds. As in the XQuery evaluation, the composition and identity primitives are exactly the same as those we have just presented for getters. In addition, to_af only returns the evaluation of its argument. The same applies to folds. Consequently, there only remains filtered. As we have seen before, Step b) (Fig. 21) represents this combinator, which was further detailed in Fig. 24 given its complexity.
This figure shows an inner box that describes the triplet evolution specified by the predicate, which starts from the same input triplet as the whole filtered expression. The rest of the evolution inside the box should be straightforward by now. However, it has yet to be explained how to move from the result of Step b5) to the final result of Step b). Informally, what happens in this example is that the selection of the whole expression does not change at all, which seems meaningful since the filter expression should not change the focus; the left component of the inner expression represents the predicate, which becomes a new constraint in the right component of the resulting triplet; last, the middle component remains unchanged in this particular case.

The evaluation of filtered in Fig. 23 formalizes the previous intuitions. Firstly, the overall input triplet is passed as argument to the evaluation of the predicate, with its right component reset to the empty set; the triplet generated by the predicate is in turn expected to contain an empty set of restrictions, since getters are unable to update the restriction component of the triplet. In the resulting triplet, the selection s passes as is, while the restriction that was selected in the predicate is appended to the existing ones in w. Finally, note that the new entity trie results from the inner triplet, f′, since new paths may have been created internally.

Folds.
Lastly, we present the interpretation of nonEmpty, which introduces a significant difference in comparison with the rest of combinators: it takes a fold as parameter. The evaluation of folds is problematic, since they lead to the introduction of nested queries in this infrastructure, as we will see later. This is the reason why we use the nonEmpty term from Fig. 20 here, which stores the triplet generated by the inner fold so that the corresponding nested query can be produced later on. (Incidentally, this step is better suited to illustrate the result of merging two tries.)
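The core-combinator transformations just described (∗∗∗, the binary combinators and filtered) can be sketched in Python under the same illustrative dict encoding of triplets; the helper names (combine, hcomp, binop) are made up for this sketch.

```python
# ∗∗∗ and the binary combinators share one combination step: concatenate
# (or fuse) the selections, merge the tries and union the restriction
# sets; both merge and union are idempotent, so shared structure is kept.
def combine(t1, t2, fuse=None):
    select = (t1["select"] + t2["select"] if fuse is None
              else (fuse(t1["select"][0], t2["select"][0]),))
    return {"select": select,
            "trie": t1["trie"] | t2["trie"],
            "where": t1["where"] | t2["where"]}

def hcomp(g, h):                        # g ∗∗∗ h
    return lambda t: combine(g(t), h(t))

def binop(op, g, h):                    # e.g. g > h, g − h
    return lambda t: combine(g(t), h(t), fuse=lambda a, b: (op, a, b))

# filtered: run the predicate with an emptied restriction set, move its
# single boolean selection (Prop. 1) into the restriction set, keep the
# inner trie and restore the original focus.
def filtered(pred):
    def go(t):
        inner = pred({"select": t["select"], "trie": t["trie"],
                      "where": set()})
        (b,) = inner["select"]
        return {"select": t["select"], "trie": inner["trie"],
                "where": t["where"] | {b}}
    return go
```

A toy run with two selections standing for w.age and m.age reproduces the shape of Steps b) and e) in Fig. 21: the binary combinator fuses them into one expression, and filtered moves that expression into the restriction set while leaving the focus untouched.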
Remark 9.
An Optica expression is always translatable into a triplet endofunction, as evidenced by the total implementation of E_sql, where Prop. 1 has proven essential. In fact, this evaluation just consists of moving things around to adapt Optica expressions to the triplet configuration. Unfortunately, translating triplets into SQL statements is a partial process, as described in the following section.

We have designed triplets to be easily translatable into SELECT statements. This is clearly evidenced in Fig. 25, where we compare the triplet generated for differencesFl (Def. 4) and the expected SQL query that we presented in the background (Sect. 5.1) for the same example. While the translation of the expressions in the left and right components is straightforward, the generation of the FROM clause from the middle component requires further explanation. We present the formal translation of Optica query expressions into SQL in Fig. 26. What first calls our attention is the absence of translations for get and preview. In fact, it is only possible to produce a SQL statement from getAll. As suggested in Remark 9, the translation of triplets into SQL statements is a partial process.
Preconditions. We describe the precise conditions that an Optica query should satisfy in order to produce a valid SQL statement (an error should be raised if any of them is not satisfied):

1. The optic selected type, i.e. its 'part', is a flat type. For instance, couples is not translatable into SQL, since it selects Couple as part, which contains nested references to the entity Person. The expression couples ≫ fst is valid, since Person does not contain further nested data structures: name and age are plain values.

2. The expression cannot contain a fold selecting a base type. For example, departments ≫ employees ≫ tasks is valid, since all the involved folds select entity types.

3. The original kind (ignoring castings) of the leftmost expression forming a query has to be a fold. For example, couples ≫ fst ≫ name is translatable into SQL (it starts with the couples fold) while fst ≫ name is not (it starts with the fst getter). Thereby, get and preview are omitted, since getter and affine fold expressions do not satisfy this condition.

We will further motivate each limitation in the following paragraphs, where the whole process of generating SQL statements from triplets is described.

Since we aim at turning triplets into SQL expressions, the very first step is to produce a triplet. We achieve this by evaluating the fold expression that accompanies getAll and supplying the empty triplet (Def. 6) to the resulting function. Then, we need to refine the entity trie of the obtained triplet by assigning fresh names to each of its paths (which the evaluation function E_sql does not generate). Last, we pass the refined triplet as argument to the actual translator (sql). Besides the triplet, note that this function receives an additional argument, ˆlocal, that specifies the local path at which the translation takes place. The sql function will be used to translate both the whole SQL query and the inner queries of nonEmpty expressions.
In this very first invocation, we aim at translating the whole triplet; thus, we pass as ˆlocal the ˆtop of the entity trie, which represents the common prefix of every path of the trie.

The sql function delegates the generation of each clause of the whole SELECT statement to the corresponding functions select, from and where. Moreover, it calls an additional function, where+, whose purpose will be motivated later on. The results obtained from each function are concatenated to form the final query. Note that parentheses and brackets are discarded in the result; they are simply introduced to delimit the arguments supplied to each function. In particular, an invocation surrounded by brackets indicates that the invocation may be omitted, according to the accompanying conditions. We describe the generation process of each clause in the next paragraphs, where we will make frequent use of the following additional definitions:

ρ.ˆtop       The key which starts every path of the entity trie, if any
ρ.local(ρ′)  The local path of ρ which is extended by ρ′, if any
ρ(ˆp)        The name assigned to the given path in the refined trie
ˆp.last      The key which finishes the given path
ˆp.up        The second-to-last key of the given path
optic.name   The name of the given optic primitive
optic.kind   The kind of optic: getter, affine fold or fold
optic.whole  The type of the whole entity that the optic inspects
optic.part   The type of focus to which the optic points (an entity or base type)
pk(type)     The primary key of the relational table associated to the given type
With a little abuse of notation, we will omit the last attribute in path expressions, writing ˆp.name instead of the more verbose ˆp.last.name.

Select clause. The select function generates the SELECT clause by separating with commas the results of translating each expression. We describe the translation of the different types of expressions in the following lines:

• The translation of a path ˆx simply refers to all the columns of the table corresponding to that path, using the name assigned by the fresh function. Also note that the path must refer to an entity with no further nested entities (Precondition 1). Otherwise, the query output would not contain all the data required to reassemble the entity, i.e. this work does not support query shredding [28] yet.

• The translation of a projection ˆx.base is basically the same, but we find an interesting restriction here. SQL does not support multivalued columns, and therefore we cannot use a fold to project values (Precondition 2).

• The translation of nonEmpty is given in terms of EXISTS, which contains a nested SQL expression. Thereby, we invoke the sql generator recursively. Before doing this, we first need to generate fresh names for the trie of the nonEmpty expression and to merge it with the outer entity trie (the combinator ◁ merges tries, keeping the names from the left when it finds conflicting paths). Second, we need to calculate the right ˆlocal path and to pass it to the sql generator.

The evaluation of the rest of expressions should be straightforward, since they just adapt operators and literals into their SQL form.
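The select translation just described admits a direct Python rendering sketch; the tagged-tuple expression encoding and the names dictionary playing the role of ρ are assumptions of this illustration.

```python
def expr(e, names):
    """Render a triplet expression; `names` maps trie paths to aliases."""
    kind, payload = e
    if kind == "path":                   # whole-entity selection: alias.*
        return f"{names[payload]}.*"
    if kind == "proj":                   # attribute projection: alias.column
        path, attr = payload
        return f"{names[path]}.{attr}"
    if kind == "binop":                  # binary combinators like -, >
        op, l, r = payload
        return f"{expr(l, names)} {op} {expr(r, names)}"
    if kind == "like":                   # constant literal
        return str(payload)
    raise ValueError(f"unknown expression: {kind}")

def select(es, names):
    return "SELECT " + ", ".join(expr(e, names) for e in es)

# The selection of differencesFl: her name and the age difference.
names = {("couples", "fst"): "w", ("couples", "snd"): "m"}
clause = select([("proj", (("couples", "fst"), "name")),
                 ("binop", ("-", ("proj", (("couples", "fst"), "age")),
                                 ("proj", (("couples", "snd"), "age"))))],
                names)
```

The nonEmpty/EXISTS case is omitted here, since it requires the recursive invocation of the full sql generator.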
Remark 10.
None of the optic models associated to the guiding examples include affine folds in their definitions. In the particular case of the SQL interpretation, such optics are interpreted as fields which may contain a NULL value, i.e. as nullable table columns.
Where clause. We continue with the WHERE clause, given its similarity with the SELECT clause. It is generated by the where and where+ functions. The former is quite similar to select, since it basically delegates on the evaluation of the restriction expressions, although it uses AND as the delimiter for the results. The evaluation of expressions is exactly the same as the one introduced in the previous paragraph. Note that where produces WHERE True if the set of restrictions is empty. Concerning where+, this function is responsible for appending the precise connection between nested and outer variables, which were introduced at the very end of Sect. 5.1. We will explain it together with the generation of the FROM clause in the next paragraph.
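The where function admits a very small sketch. Restriction sets are unordered, so we sort the rendered conditions purely to make the output deterministic; that ordering is a choice of this sketch, not of the paper.

```python
def where(restrictions, render):
    """Join rendered restrictions with AND; an empty set degenerates to
    the trivially true clause, as described above."""
    if not restrictions:
        return "WHERE True"
    return "WHERE " + " AND ".join(sorted(render(e) for e in restrictions))
```

For instance, the single restriction of differencesFl would render as `where({"w.age > m.age"}, str)`.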
From clause. Before venturing into the from function, there are a few conditions that the generator should preserve. Firstly, it is assumed that ρ.ˆtop must refer to a fold (Precondition 3), since we need an entry point into the hierarchical tables. This means that we can only translate expressions that start with a fold, like differencesFl (Def. 4), which starts with couples, or expertiseFl (Def. 5), which starts with departments. Secondly, the invocation to from is omitted if ˆlocal is not defined, since this indicates that the current query is not introducing new variables, and therefore no FROM clause is required.

As expected, the from function prepares the FROM clause. It selects the 'part' type of ˆlocal as the initial table. Then, it produces an INNER JOIN expression for each element hanging from it. This is the reason why tries contain nothing but entities, since they correspond to relational tables. In general, the complexity associated to these definitions is due to the choice and implementation of the corresponding Foreign Key patterns (Sect. 5.1).
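A much-simplified sketch of the from generation follows: it covers only the paths that extend the local path by a single optic, and for brevity it renders every join with the USING form of eqjoin, even though non-fold optics like fst would use the ON pattern of Fig. 26. The tables, aliases and primary keys are supplied through hypothetical dictionaries.

```python
def from_clause(trie, names, local, table, pk):
    """FROM the local path's table, then one INNER JOIN per path that
    extends it by one optic (sorted only for deterministic output)."""
    parts = [f"FROM {table[local]} AS {names[local]}"]
    for path in sorted(trie):
        if len(path) == len(local) + 1 and path[:len(local)] == local:
            parts.append(f"INNER JOIN {table[path]} AS {names[path]} "
                         f"USING ({pk[table[local]]})")
    return " ".join(parts)

# The refined trie of differencesFl, with hypothetical alias and table maps:
trie = {(), ("couples",), ("couples", "fst"), ("couples", "snd")}
names = {("couples",): "c", ("couples", "fst"): "w", ("couples", "snd"): "m"}
table = {("couples",): "Couple", ("couples", "fst"): "Person",
         ("couples", "snd"): "Person"}
clause = from_clause(trie, names, ("couples",), table, {"Couple": "id"})
```

The real generator additionally distinguishes the fold and non-fold join conditions and handles arbitrarily deep extensions of the local path.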
Figure 25: From triplet to SQL

E_sql[getAll g : α → list β] :: (S → S) → SQL
E_sql[getAll g : α → list β] pk = sql (s, ρ, w) pk ρ.ˆtop
  where (s, f, w) = E_sql[g : fold α β] empty and ρ = fresh f and ρ.ˆtop is defined

sql (s, ρ, w) pk [ˆlocal] = (select s ρ pk) [from ρ pk ˆlocal] (where w ρ pk) [where+ ρ pk ˆlocal];
  where ρ.ˆtop.kind = fold
  and the from invocation is omitted if ˆlocal is not defined
  and the where+ invocation is omitted if ρ.ˆtop = ˆlocal

select (e₁, e₂, …, eₙ) ρ pk = SELECT expr e₁ ρ pk, expr e₂ ρ pk, …, expr eₙ ρ pk

expr ˆx ρ pk = ρ(ˆx).∗  where ˆx.last.part ∈ flat types
expr ˆx.optic ρ pk = ρ(ˆx).(optic.name)  where optic.part ∈ base types and optic.kind ∈ {getter, affine}
expr (t ⊕ u) ρ pk = expr t ρ pk ⊕ expr u ρ pk
expr (not e) ρ pk = NOT (expr e ρ pk)
expr (like a) ρ pk = a
expr (nonEmpty (s, f, w)) ρ pk = EXISTS (sql (s, ρ′, w) pk ρ.local(ρ′))  where ρ′ = ρ ◁ fresh f

where ∅ ρ pk = WHERE True
where {e₁, e₂, …, eₙ} ρ pk = WHERE expr e₁ ρ pk AND expr e₂ ρ pk AND … AND expr eₙ ρ pk
where+ ρ pk ˆlocal = AND ρ(ˆlocal).key = ρ(ˆlocal.up).key  where key = pk(ˆlocal.whole)

from ρ pk ˆlocal = FROM ˆlocal.part AS ρ(ˆlocal) joins
  where joins = { eqjoin ˆx ρ pk | ˆx ∈ ρ, ˆx = ˆlocal ⌢ ˆy for some ˆy ≠ () }
eqjoin ˆx ρ pk = INNER JOIN ˆx.part AS ρ(ˆx) cond
  where cond = USING pk(ˆx.whole) if ˆx.kind = fold
             = ON ρ(ˆx.up).(ˆx.up.name) = ρ(ˆx).(pk(ˆx.part)) otherwise

Figure 26: SQL generation

Once we have implemented E_sql, we can use it to translate generic queries into SQL statements. For differences (Def. 4) we get:

def differencesSQL : SQL = E_sql[differences : Couples → list (S, N)] (Person ↦ name)

and we adapt expertise (Def. 5) as follows:

def expertiseSQL : SQL = E_sql[expertise : Org → list S] (Department ↦ dpt, Employee ↦ emp)

Unlike other evaluators, E_sql requires the relation of primary keys for the involved tables as an additional argument, since this information is not contemplated in the optic model. We use the notation (t₁ ↦ k₁, t₂ ↦ k₂, …, tₙ ↦ kₙ) to build such an argument. For instance, the primary key associated to the table Person is the column name. If we ignore variable names, the SQL statements generated by the previous definitions are exactly the same as those introduced in Sect. 5.1.

As can be inferred from Fig. 26, the evaluation of getAll always leads to a SQL
SELECT statement, unless an error condition is present. The resulting query does not contain nested subqueries, beyond the ones that emerge in the context of EXISTS (due to the nonEmpty term). The FROM clause uses INNER JOINs as the means to navigate downwards through the tables in the model. Besides the previous elements, the evaluator just produces expressions with basic functions, operators and literals; no additional SQL features are required.

Clearly, the SQL semantics are not as neat as those associated to the XML infrastructure (Sect. 4), since they require a non-trivial normalization into triplets prior to generating the SQL statement. Besides, such generation is partial, and thus triplets must meet certain conditions to guarantee a correct translation. Fortunately, as we will see in the next section, we can take a different path towards the generation of SQL, where we can rely on existing work on language-integrated query. Despite this fact, in Sect. 8 we will discuss why the triplet normalization is still relevant.
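As a final illustration, the clause concatenation performed by sql can be replayed by hand for the differencesFl triplet. The aliases (c, w, m) and the foreign-key column names (her, him) are hypothetical, and where+ is omitted for simplicity.

```python
def sql(select_exprs, from_text, where_exprs):
    """Concatenate the three clauses, defaulting to WHERE True when no
    restriction is present, mirroring the shape of Fig. 26."""
    return (" ".join(["SELECT " + ", ".join(select_exprs),
                      from_text,
                      "WHERE " + (" AND ".join(where_exprs) or "True")])
            + ";")

query = sql(["w.name", "w.age - m.age"],
            "FROM Couple AS c "
            "INNER JOIN Person AS w ON c.her = w.name "
            "INNER JOIN Person AS m ON c.him = m.name",
            ["w.age > m.age"])
```

Up to variable names and the exact join columns, this is the kind of flat SELECT statement (joins plus basic operators, no nesting outside EXISTS) that the evaluation of getAll produces.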
6. T-LINQ
This section introduces Optica as a higher-level language that we can interpret into comprehensions. In particular, we generate T-LINQ [3] queries from optic expressions, essentially following a similar translation implemented for the Links language in [44]. By doing this, we demonstrate that the compositional style embraced by optics can be fruitfully exploited in order to generate comprehension expressions automatically. Moreover, we open the possibility of delegating the arduous task of generating SQL statements from optic expressions, described in the previous section, to the existing translation and normalization techniques of T-LINQ. As usual, we supply a brief background on the querying language, T-LINQ, and then we show the non-standard semantics that is needed in order to produce the corresponding comprehension-based queries.
In order to manually adapt the expertise query (Def. 5) as a T-LINQ expression, we will first review the difference between a relational (or flattened) model and a nested model. (This section omits the couples example for the sake of brevity. We select expertise over differences since we consider it to be more challenging.)

type NestedOrg = NestedDepartment list
type NestedDepartment = { dpt : string; employees : NestedEmployee list }
type NestedEmployee = { emp : string; tasks : Task list }
type Task = { tsk : string }

(a) Nested organization

type Org = { departments : { dpt : string } list;
             employees : { dpt : string; emp : string } list;
             tasks : { emp : string; tsk : string } list }

(b) Flattened organization

Figure 27: Alternative organization models

Figure 27 shows the nested (NestedOrg) and flat (Org) models for the organization example from [3], as T-LINQ records. Note that Org differs from the nested version NestedOrg in the type of its fields, since it contains textual values which act as foreign keys to refer to the corresponding entities. In fact, the second version has a strong correspondence with the SQL tables that we introduced in Sect. 5.1. Cheney et al. show the quoted expression that adapts the flattened model into the nested one (Fig. 28), where %org splices the database representation (<@ database("Org") @>). In particular, the programmer understands such representation as a list of entities from the relational model; therefore, she can use the widespread notation of list comprehensions to implement the desired queries, where filtering (if ... then) and mapping (yield) features are also available. Figure 29 shows the implementation of the expertise query in terms of T-LINQ, which builds upon the flattened model. Later on, we will see that the nested model becomes essential in the evaluation of Optica expressions, where we will try to produce an equivalent for expertiseTlinq from the evaluation of expertise.

def nestedOrg = <@
  for d in %org.departments do
    yield { dpt = d.dpt,
            employees = for e in %org.employees do
              if d.dpt = e.dpt then
                yield { emp = e.emp,
                        tasks = for t in %org.tasks do
                          if e.emp = t.emp then
                            yield { tsk = t.tsk } } } @>

Figure 28: From flat to nested organization model

(T-LINQ does support a compositional style, where analogous combinators for all, any, etc. could be supplied [3, Sect. 3.2]. Using these combinators and the nested version of the organization model, NestedOrg, expertiseTlinq could be written more concisely. Then, thanks to its normalization engine, the query could be rewritten into its equivalent version over the relational model.)

def expertiseTlinq = <@
  for d in %org.departments do
    if not exists
      for e in %org.employees do
        if d.dpt = e.dpt ∧ not exists
          for t in %org.tasks do
            if e.emp = t.emp ∧ t.tsk = 'abstract' then yield t.tsk
        then yield e.emp
    then yield d.dpt @>

Figure 29: T-LINQ analogue for expertise
As usual, we provide E_tlinq in order to evaluate Optica expressions into T-LINQ expressions. Prior to this, we need to determine the semantic domains for this evaluation by means of T_tlinq, which is shown in Fig. 30. As expected, this semantic function maps Optica types to T-LINQ representation types (Repr). In particular, it just relies on an auxiliary function T_aux and wraps the resulting type with Expr. The implementation of T_aux is direct for base types, whereas tuples are represented as records. Concerning query types, their representation is also straightforward, since functions are directly supported by T-LINQ, although we map option to list, since such a datatype is not contemplated by T-LINQ. Last, optic types are simply represented by the query type they generate. The next sections present the semantic domains for domain types and the implementation of E_tlinq, and discuss the final results.

T_tlinq[t] = Expr<T_aux[t]>

T_aux[N] = int
T_aux[B] = bool
T_aux[S] = string
T_aux[(α, β)] = { T_aux[α], T_aux[β] }
T_aux[α → β] = T_aux[α] → T_aux[β]
T_aux[α → option β] = T_aux[α] → list T_aux[β]
T_aux[α → list β] = T_aux[α] → list T_aux[β]
T_aux[getter α β] = T_aux[α → β]
T_aux[affine α β] = T_aux[α → option β]
T_aux[fold α β] = T_aux[α → list β]

Figure 30: Semantic domains of the T-LINQ evaluation
This section introduces the evaluation of domain and core terms into T-LINQ expressions. As we have already seen, all domain terms represent optic expressions, and thus they have to be adapted as functions. Figure 31 shows the semantic domains (by extending T_aux) and the evaluation of the terms in the organization example. Note how the organization types are mapped to the corresponding nested (instead of relational) types. This aspect will be relevant later on while generating the target queries. Back to the evaluation of terms, we can see that this is essentially a T-LINQ adaptation of the code that we presented in OrgModel (Sect. 2.2.2), where we used lambda expressions to build concrete optics.

T_aux[Org] = NestedOrg
T_aux[Department] = NestedDepartment
T_aux[Employee] = NestedEmployee
T_aux[Task] = Task

E_tlinq[departments : fold Org Department] = <@ fun(ds) → ds @>
E_tlinq[dpt : getter Department S] = <@ fun(d) → d.dpt @>
E_tlinq[employees : fold Department Employee] = <@ fun(es) → es @>
E_tlinq[emp : getter Employee S] = <@ fun(e) → e.emp @>
E_tlinq[tasks : fold Employee Task] = <@ fun(ts) → ts @>
E_tlinq[tsk : getter Task S] = <@ fun(t) → t.tsk @>

Figure 31: T-LINQ semantic domains and non-standard semantics for the organization extension
The evaluation of core combinators (Fig. 32) also shares a strong resemblance with those that we have seen for their concrete counterparts in Sect. 2.1. In essence, the difference lies in the fact that concrete optics build directly upon the type system of Scala, whereas the T-LINQ interpretation builds upon its own type system. Thus, the evaluation of ∗∗∗ creates a T-LINQ lambda expression using T-LINQ records instead of using the lambda expressions and products of Scala. Similarly, ≫af and ≫fl implement composition by directly using the primitives of T-LINQ, whereas the implementation of this combinator in concrete optics is based upon the standard Scala implementation.

The last step towards the generation of final queries is supplying the non-standard semantics for queries, which are shown in Fig. 33. This step is trivial since they share the very same semantic domain as their associated optics; therefore, we just need to evaluate their optic argument. However, in order to produce the final queries, there is a non-negligible disagreement that we need to address: the T-LINQ expressions which are generated by E_tlinq refer to entities from the nested model, as introduced by Fig. 31. To resolve this mismatch, we need to reconcile the relational model with the nested model, so we use nestedOrg (Fig. 28) for the task. Thereby, we just supply the nested data to the T-LINQ lambda expression generated from the Optica expression:

def expertiseTlinq = <@ %E_tlinq[expertise : Org → list S] %nestedOrg @>

This produces an alternative implementation of the query which was presented in Fig. 29. However, the T-LINQ expression generated by the new version is much more difficult to read and less efficient, given the complexity introduced by nestedOrg. Fortunately, this is not a problem, since both queries share the very same normal form and, consequently, they produce the same SQL statement.
E_tlinq[id_gt : getter α α] = <@ fun(a) → a @>
E_tlinq[g ≫gt h : getter α γ] = <@ fun(a) → %E_tlinq[h : getter β γ] (%E_tlinq[g : getter α β] a) @>
E_tlinq[g ∗∗∗ h : getter α (β, γ)] = <@ fun(a) → { = %E_tlinq[g : getter α β] a, = %E_tlinq[h : getter α γ] a } @>
E_tlinq[like b : getter α β] = <@ fun(a) → b @>
E_tlinq[not g : getter α B] = <@ fun(a) → not (%E_tlinq[g : getter α B] a) @>
E_tlinq[g ⊕ h : getter α δ] = <@ fun(a) → (%E_tlinq[g : getter α β] a ⊕ %E_tlinq[h : getter α γ] a) @>
E_tlinq[id_af : affine α α] = <@ fun(a) → yield a @>
E_tlinq[g ≫af h : affine α γ] = <@ fun(a) → for b in %E_tlinq[g : affine α β] a do for c in %E_tlinq[h : affine β γ] b do yield c @>
E_tlinq[filtered p : affine α α] = <@ fun(a) → if %E_tlinq[p : getter α B] a then yield a @>
E_tlinq[to_af g : affine α β] = <@ fun(a) → yield (%E_tlinq[g : getter α β] a) @>
E_tlinq[id_fl : fold α α] = <@ fun(a) → yield a @>
E_tlinq[g ≫fl h : fold α γ] = <@ fun(a) → for b in %E_tlinq[g : fold α β] a do for c in %E_tlinq[h : fold β γ] b do yield c @>
E_tlinq[nonEmpty g : getter α B] = <@ fun(a) → exists (%E_tlinq[g : fold α β] a) @>
E_tlinq[to_fl a : fold α β] = E_tlinq[a : affine α β]

Figure 32: T-LINQ non-standard semantics for optic terms

E_tlinq[get g : α → β] = E_tlinq[g : getter α β]
E_tlinq[preview g : α → option β] = E_tlinq[g : affine α β]
E_tlinq[getAll g : α → list β] = E_tlinq[g : fold α β]

Figure 33: T-LINQ non-standard semantics for query terms
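For readers who want to experiment with the concrete counterparts referred to above, the following is a hedged sketch of read-only concrete optics. The names follow Sect. 2.1, but the definitions are our own simplification, not the paper's exact code:

```scala
// Simplified concrete read-only optics (assumed shapes, cf. Sect. 2.1):
// a getter is a total accessor, an affine fold selects at most one part,
// and a fold selects any number of parts.
case class Getter[S, A](get: S => A)
case class AffineFold[S, A](preview: S => Option[A])
case class Fold[S, A](getAll: S => List[A])

object Optics {
  // Sequential composition, mirroring the ≫gt and ≫fl cases of Fig. 32.
  def andThenGt[S, A, B](g: Getter[S, A], h: Getter[A, B]): Getter[S, B] =
    Getter(s => h.get(g.get(s)))
  def andThenFl[S, A, B](g: Fold[S, A], h: Fold[A, B]): Fold[S, B] =
    Fold(s => g.getAll(s).flatMap(h.getAll))
  // filtered keeps the focus only when the predicate getter holds.
  def filtered[S](p: Getter[S, Boolean]): AffineFold[S, S] =
    AffineFold(s => if (p.get(s)) Some(s) else None)
  // Casts between optic kinds, as in the to_af / to_fl combinators.
  def toFl[S, A](af: AffineFold[S, A]): Fold[S, A] =
    Fold(s => af.preview(s).toList)
  def nonEmpty[S, A](fl: Fold[S, A]): Getter[S, Boolean] =
    Getter(s => fl.getAll(s).nonEmpty)
}
```

Composing a fold with a filtered affine fold in this sketch reproduces, over in-memory data, the behaviour that E_tlinq encodes as nested for/if/yield expressions.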
7. S-Optica: Optica as a Scala library
This section aims at implementing the Optica DSL in Scala. The resulting library (which we call S-Optica) is provided as a proof of concept of the feasibility of extending existing libraries for LINQ, especially those based on comprehensions, with optic capabilities. We will show in detail the S-Optica implementation of the syntax and type system of Optica, as well as its standard semantics. The reader may want to look into the accompanying sources for more information about the S-Optica implementation of the interpreters for XQuery, SQL and T-LINQ. The S-Optica implementation is also intended to serve as an illustration of the tagless-final style [21], which we have chosen in order to implement our DSL.
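As a warm-up, the tagless-final style can be illustrated with a minimal, self-contained sketch (a hypothetical toy DSL, unrelated to Optica's actual type classes): the syntax is declared once as a type class over a representation Repr[_], and each instance supplies one semantics.

```scala
// Toy tagless-final DSL: integer expressions.
trait ExprSym[Repr[_]] {
  def lit(n: Int): Repr[Int]
  def add(x: Repr[Int], y: Repr[Int]): Repr[Int]
}

// Standard, meta-circular semantics: Repr is the identity constructor.
type Id[A] = A
object Eval extends ExprSym[Id] {
  def lit(n: Int): Int = n
  def add(x: Int, y: Int): Int = x + y
}

// A non-standard semantics: pretty-printing into strings.
type Str[A] = String
object Show extends ExprSym[Str] {
  def lit(n: Int): String = n.toString
  def add(x: String, y: String): String = s"($x + $y)"
}

// A generic expression, written once and interpreted many times.
def onePlusTwo[Repr[_]](S: ExprSym[Repr]): Repr[Int] =
  S.add(S.lit(1), S.lit(2))
```

Evaluating onePlusTwo(Eval) yields 3, while onePlusTwo(Show) yields the string "(1 + 2)"; S-Optica applies the same recipe with optics as expressions and queries as observations.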
In the tagless-final style, the syntax and type system of a typed DSL is implemented through a type constructor class, which represents the class of representations, or possible interpretations, of that DSL. This type class does not need to be a single, monolithic module; it is usually decomposed into different type classes which encode different aspects of the DSL. In our case, the division of classes has taken into consideration the structure of optics and combinators that we followed in Sect. 2, and the difference between optic and query types as introduced in Sect. 3.1. Accordingly, Fig. 34 shows the syntax and semantics of the Optica fragment corresponding to getters, affine folds and folds; Fig. 35 shows the implementation of the fragment of queries, as well as the overall Optica type class. Some comments on the implementation follow below:

• The primitive combinators of the different types of optics (getters, affine folds and folds) are implemented in their respective modules. Those which are not primitive, but can be defined in terms of other combinators, namely any, all, elem and empty, are defined in the OpticaCom type class.

• The implementation of these derived combinators benefits from the same syntactic enhancements that we assumed in Sect. 2.1. In fact, their implementation is literally the same as that for concrete optics shown in Fig. 3. The differences simply lie in their signatures and the intended semantics: whereas the implementations of Fig. 3 only work for concrete optics, the implementations of Fig. 34 work for any optic representation Repr[_]. Thus, we may instantiate this class in order to work with concrete optics, or any other standard representation such as van Laarhoven or profunctor optics; of course, we may also instantiate this class in order to work with XQuery, TripletFun or T-LINQ, since these are legitimate read-only optic representations, as has been shown throughout the paper.
We supply a brief tour of how to encode type (constructor) classes in Scala in Appendix A.1.

• We actually used concrete optic types in the signatures of these type classes, i.e. the types Getter[_,_], AffineFold[_,_], etc. (to which the signatures refer) are exactly those defined in Sect. 2.1. How can these signatures work for any representation, then? The reason is simply that these combinators do not receive and return plain concrete optic types, but their representations: the empty combinator does not receive a concrete fold, but anything that counts as a fold optic. Concrete types thus behave mostly as phantom types [45, 46], which specify the abstract semantic domains of the language and aid in the definition of its type system.

• The query types of Optica correspond in the tagless-final style to observations [19]. These can be understood as the standard interpretations that we demand from any representation of the implemented DSL. This matches perfectly with the distinction between optic and query types: for instance, we will always want a getAll interpretation from a fold program, irrespective of the optic representation. In the tagless-final style, we commonly assign different type constructors to DSL expressions and observations. Thus, in Fig. 35 we use Repr[_] and Obs[_] for optic and query representations, respectively. This is actually equivalent to having two different DSLs, one for optics and another for queries, into which optics are compiled.

• Base types of Optica also enjoy a different representation. As the implementation of the like combinator shows, base values are represented using the very type system of the host language, i.e. Scala. Thus, their representation is neither Repr[_] nor Obs[_], but the identity type constructor. This representation for base types is also common practice in the tagless-final style.

• To avoid overloading the like method for the different base types, Int, String and Boolean, we use the GADT Base, whose object instances are marked as implicits, thereby enabling the context bound syntax in Scala. The Base GADT is also declared in those signatures that depend on the like combinator, namely elem and the combinator equal.

In order to write domain queries, we need to extend the syntax and type system of the Optica language, as we have seen in Sect. 3.2. Quoting from [47], "extensibility is the strong suite of the tagless-final embedding"; therefore, this task should be easy. Indeed, we simply need to declare a new type class containing an entry for each domain optic in the model, as shown in Fig. 36. The types
Couples, Person, etc., are immutable data structures which mostly behave as phantom types and aid in the extension of the type system of the language. Once we have the core and domain primitives available, we should be able to implement generic optic expressions by declaring both dependencies, the combinators of the Optica API and the domain model (note that observations are not needed to write pure optic expressions):

We would run into problems, however, if the target optic representation does not also use the Scala types for representing base types. This constraint could be slightly lifted, since we may want to compare not only base but model types in general (cf. Fig. 4).

trait GetterCom[Repr[_]] {
  def id_gt[S]: Repr[Getter[S, S]]
  def andThen_gt[S, A, B](u: Repr[Getter[S, A]],
      d: Repr[Getter[A, B]]): Repr[Getter[S, B]]
  def fork_gt[S, A, B](l: Repr[Getter[S, A]],
      r: Repr[Getter[S, B]]): Repr[Getter[S, (A, B)]]
  def like[S, A: Base](a: A): Repr[Getter[S, A]]
  def not[S](b: Repr[Getter[S, Boolean]]): Repr[Getter[S, Boolean]]
  def equal[S, A: Base](x: Repr[Getter[S, A]],
      y: Repr[Getter[S, A]]): Repr[Getter[S, Boolean]]
  def greaterThan[S](x: Repr[Getter[S, Int]],
      y: Repr[Getter[S, Int]]): Repr[Getter[S, Boolean]]
  def subtract[S](x: Repr[Getter[S, Int]],
      y: Repr[Getter[S, Int]]): Repr[Getter[S, Int]]
}

trait AffineFoldCom[Repr[_]] {
  def id_af[S]: Repr[AffineFold[S, S]]
  def andThen_af[S, A, B](u: Repr[AffineFold[S, A]],
      d: Repr[AffineFold[A, B]]): Repr[AffineFold[S, B]]
  def filtered[S](p: Repr[Getter[S, Boolean]]): Repr[AffineFold[S, S]]
  def to_af[S, A](gt: Repr[Getter[S, A]]): Repr[AffineFold[S, A]]
}

trait FoldCom[Repr[_]] {
  def id_fl[S]: Repr[Fold[S, S]]
  def andThen_fl[S, A, B](u: Repr[Fold[S, A]],
      d: Repr[Fold[A, B]]): Repr[Fold[S, B]]
  def nonEmpty[S, A](fl: Repr[Fold[S, A]]): Repr[Getter[S, Boolean]]
  def to_fl[S, A](afl: Repr[AffineFold[S, A]]): Repr[Fold[S, A]]
}

trait OpticaCom[Repr[_]] extends GetterCom[Repr]
    with AffineFoldCom[Repr] with FoldCom[Repr] {
  def empty[S, A](fl: Repr[Fold[S, A]]): Repr[Getter[S, Boolean]] =
    fl.nonEmpty.not
  def all[S, A](fl: Repr[Fold[S, A]])(
      p: Repr[Getter[A, Boolean]]): Repr[Getter[S, Boolean]] =
    (fl ≫ filtered(p.not)).empty
  def any[S, A](fl: Repr[Fold[S, A]])(
      p: Repr[Getter[A, Boolean]]): Repr[Getter[S, Boolean]] =
    fl.all(p.not).not
  def elem[S, A: Base](fl: Repr[Fold[S, A]])(a: A): Repr[Getter[S, Boolean]] =
    fl.any(id_gt === like(a))
}

Figure 34: OpticaCom symantics (optic combinators)

trait GetterQuery[Repr[_], Obs[_]] {
  def get[S, A](gt: Repr[Getter[S, A]]): Obs[S => A]
}

trait AffineFoldQuery[Repr[_], Obs[_]] {
  def preview[S, A](af: Repr[AffineFold[S, A]]): Obs[S => Option[A]]
}

trait FoldQuery[Repr[_], Obs[_]] {
  def getAll[S, A](fl: Repr[Fold[S, A]]): Obs[S => List[A]]
}

trait Optica[Repr[_], Obs[_]] extends OpticaCom[Repr]
    with GetterQuery[Repr, Obs]
    with AffineFoldQuery[Repr, Obs]
    with FoldQuery[Repr, Obs]

Figure 35: Optica symantics (generic combinators and queries)

trait CoupleModel[Repr[_]] {
  def couples: Repr[Fold[Couples, Couple]]
  def fst: Repr[Getter[Couple, Person]]
  def snd: Repr[Getter[Couple, Person]]
  def name: Repr[Getter[Person, String]]
  def age: Repr[Getter[Person, Int]]
}

Figure 36: Couple domain symantics

def differencesFl[Repr[_]](implicit
    O: OpticaCom[Repr],
    M: CoupleModel[Repr]): Repr[Fold[Couples, (String, Int)]] =
  couples ≫ filtered((fst ≫ age) > (snd ≫ age)) ≫
    (fst ≫ name) ∗∗∗ ((fst ≫ age) - (snd ≫ age))

and generic query expressions (on this occasion, we pass the whole Optica type class, which includes the queries):

def differences[Repr[_], Obs[_]](implicit
    O: Optica[Repr, Obs],
    M: CoupleModel[Repr]): Obs[Couples => List[(String, Int)]] =
  differencesFl.getAll

As can be seen, the required primitives are injected using the Scala implicit mechanism. In contrast with Def. 4, this version makes explicit the aforementioned existence of different representations for optics and queries, as evidenced by the result types. Scala implicits are also exploited by the library to omit invocations to casting methods, although the required syntax is not shown for brevity.

Type classes in the tagless-final style are commonly named Symantics, a portmanteau of 'syntax' and 'semantics', to emphasise the fact that the same abstraction serves a double purpose: the type class declaration defines the syntax and type system of the language, whereas type class instances provide its semantics. The standard semantics of the language is no exception, and for this purpose we greatly benefit from having reused the standard semantic domains at the syntactic level: we simply use the identity type lambda λ[x => x] for both the Repr and Obs parameters, and map each primitive into its concrete counterpart.

We can find the interpretation that supplies the standard semantics of Optica in Fig. 37. In particular, it is represented by the singleton object R, which is also a common name for meta-circular interpretations in the tagless-final style. We follow the very same pattern to instantiate the couple domain terms, as we show in Fig. 38. Now we can use the standard semantics to evaluate generic queries, and to re-implement, in a modular way, the ad-hoc functions that deal with immutable data structures. For instance:

val differencesR: Couples => List[(String, Int)] =
  differences[λ[x => x], λ[x => x]](R, CoupleModelR)

As can be seen, we have specified the standard representation types for optics and queries alongside the associated evidences. Fortunately, they could be inferred implicitly, as shown in this alternative and preferred version:

val differencesR: Couples => List[(String, Int)] = differences

The resulting function is extensionally equal to differences from Sect. 2.2.1. The implementation of the rest of the interpreters in this article (XQuery, SQL and T-LINQ) follows the same principles. Interested readers can find a README file in the companion sources [39] which briefly describes the library structure and supplies links to the aforementioned interpreters and other relevant modules.
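The Base GADT mentioned in the bullets above can be sketched as follows (an assumed shape for illustration, not S-Optica's exact declaration):

```scala
// Base witnesses the types that `like` may inject as literals.
// The implicit object instances enable the `A: Base` context bound.
sealed abstract class Base[A]
object Base {
  implicit object IntBase extends Base[Int]
  implicit object StringBase extends Base[String]
  implicit object BooleanBase extends Base[Boolean]
}

// A like-style combinator restricted to base types (modelled here as a
// plain constant function rather than a Repr-wrapped getter):
def like[S, A: Base](a: A): S => A = _ => a
```

With these instances in scope, like[Unit, Int](50) compiles, whereas like[Unit, List[Int]](Nil) is rejected at compile time, since no Base[List[Int]] instance exists.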
8. Discussion
The language of optics.
One of the most prominent sought-after features of optics is modularity, i.e. the capacity of creating optics for compound data structures out of simpler optics for their parts. This is especially emphasized in the framework of profunctor optics [32], where optic composition builds upon plain function composition and enables straightforward combinations of isos, prisms, lenses, affine traversals and traversals. The profunctor representation is particularly convenient to implement (and even reveal) the compositional structure of the different varieties of optics but, in essence, this structure is also enjoyed by concrete optics, van Laarhoven optics, etc. Modularity is a feature of the language of optics, rather than of any particular representation. This paper has shown, albeit for a very restricted subset of optics (getters, affine folds and folds), that this compositional structure of optics can be encoded in the type system of a formal language, which we have named Optica. The denotational semantics of this language was given in terms of concrete optics, but any other isomorphic representation, such as profunctor optics, may have served as well.

Now, the specification of Optica includes not only the compositional features of read-only optics but also, and significantly, their characteristic queries. Taking into account this non-compositional character of optics is essential as soon as we tackle the extension of Optica with new varieties of optics. For instance, the major difference between folds and traversals is not found in their compositional properties, but in the queries that they must support: besides getAll, traversals must also support a putAll query to replace the content of the elements that they are selecting.

This is the common case in which standard semantic domains do not eventually behave as phantom types.

trait RGetterCom extends GetterCom[λ[x => x]] {
  def id_gt[S] = Getter.id
  def andThen_gt[S, A, B](u: Getter[S, A], d: Getter[A, B]) =
    Getter.andThen(u, d)
  def fork_gt[S, A, B](l: Getter[S, A], r: Getter[S, B]) = Getter.fork(l, r)
  def like[S, A: Base](a: A) = Getter.like(a)
  def not[S](b: Getter[S, Boolean]) = Getter.not(b)
  def equal[S, A: Base](x: Getter[S, A], y: Getter[S, A]) = Getter.equal(x, y)
  def greaterThan[S](x: Getter[S, Int], y: Getter[S, Int]) =
    Getter.greaterThan(x, y)
  def subtract[S](x: Getter[S, Int], y: Getter[S, Int]) =
    Getter.subtract(x, y)
}

trait RAffineFoldCom extends AffineFoldCom[λ[x => x]] {
  def id_af[S] = AffineFold.id
  def andThen_af[S, A, B](u: AffineFold[S, A], d: AffineFold[A, B]) =
    AffineFold.andThen(u, d)
  def filtered[S](p: Getter[S, Boolean]) = AffineFold.filtered(p)
  def to_af[S, A](gt: Getter[S, A]) = gt
}

trait RFoldCom extends FoldCom[λ[x => x]] {
  def id_fl[S] = Fold.id
  def andThen_fl[S, A, B](u: Fold[S, A], d: Fold[A, B]) = Fold.andThen(u, d)
  def nonEmpty[S, A](fl: Fold[S, A]) = fl.nonEmpty
  def to_fl[S, A](afl: AffineFold[S, A]) = afl
}

trait RGetterQuery extends GetterQuery[λ[x => x], λ[x => x]] {
  def get[S, A](gt: Getter[S, A]) = gt.get
}

trait RAffineFoldQuery extends AffineFoldQuery[λ[x => x], λ[x => x]] {
  def preview[S, A](af: AffineFold[S, A]) = af.preview
}

trait RFoldQuery extends FoldQuery[λ[x => x], λ[x => x]] {
  def getAll[S, A](fl: Fold[S, A]) = fl.getAll
}

implicit object R extends Optica[λ[x => x], λ[x => x]]
    with RGetterCom with RAffineFoldCom with RFoldCom
    with RGetterQuery with RAffineFoldQuery with RFoldQuery

Figure 37: Optica standard semantics

implicit object CoupleModelR extends CoupleModel[λ[x => x]] {
  val couples = CoupleModel.couples
  val fst = CoupleModel.fst
  val snd = CoupleModel.snd
  val name = CoupleModel.name
  val age = CoupleModel.age
}
Figure 38: Couple domain standard semantics

On the implementation side, we have found the typed tagless-final approach especially suitable in order to encode this separation of concerns between declarative optic combinators and their intended queries. In essence, it closely corresponds to the difference between representations of the DSL and their observations or interpreters. Another essential feature of the tagless-final pattern that we plan to profit from is extensibility. In particular, new optics will be added to S-Optica through their own type classes (as we have done for getters, affine folds and folds) so that we can fully reuse old queries without recompiling sources.

Optics versus comprehensions
Optics can be seamlessly combined with comprehensions, as shown in Sect. 6. Indeed, by using the T-LINQ interpreter of Optica we can freely mix optic expressions with general comprehension queries. In this way, optics may play within comprehensions a similar role to that which is played by XPath within XQuery [44]. In the following paragraphs, we discuss the basic trade-off between expressiveness and modularity of comprehension and Optica queries, so as to better appreciate their role in the LINQ landscape.

The separation of concerns between declaratively selecting parts of a data structure and building a variety of queries related to those parts is the cornerstone of optics. In this regard, the LINQ approach based on comprehensions focuses on the query building side and, commonly, on constructing queries of a simple kind: retrieval queries denoting a multiset (the semantic domain for queries in QUEΛ [19], T-LINQ [3], NRC [13], etc.). The optics approach is, hence, potentially more modular. For instance, a representation of traversals intended for SQL should allow us to generate both SELECT and UPDATE statements for the queries getAll and putAll, respectively. We plan to deal with this extension and its trade-offs with expressiveness in future versions of Optica.

We can still claim further modularity advantages of optics over comprehensions. Basically, these are due to the fact that optics provide a language which is more akin to relational algebra than the calculus approach that monads provide for comprehensions [29]. Arguably, the support for functional abstraction and intermediate nested data that comprehension languages and systems such as Links, T-LINQ or DSH offer also leads to highly compositional queries. We can find examples, however, where the difference in style manifests itself. For instance, this is the query that remains to complete Table 1, in the style of S-Optica:

def under50_d[Repr[_], Obs[_]](implicit
    O: Optica[Repr, Obs],
    M: CoupleModel[Repr]): Obs[Couples => List[String]] =
  (couples ≫ fst ≫ filtered(age < 50) ≫ name).getAll

which we compare to an analogous query using the Scala implementation of T-LINQ:

def under50_e[Repr[_]](couples: Repr[Couples])(implicit
    Q: Tlinq[Repr],
    N: CoupleNested[Repr]): Repr[List[String]] =
  foreach(couples)(c =>
    where (c.fst.age < 50) (
      yields (c.fst.name)))

DSH, in particular, comes with an extensive catalog of list-processing combinators: https://github.com/ulricha/dsh/blob/master/src/Database/DSH/Frontend/Externals.hs. Indeed, our version of the expertise query in Sect. 6 is no simpler than the equivalent version using nested data in [3]. Tlinq[_[_]] provides the tagless-final implementation in Scala of T-LINQ, which we have used to implement the corresponding interpreter for S-Optica. The role of CoupleNested in the sample query is similar to that of the OrgNested model in Sect. 6.1.
As this example shows, in adopting the language of optics, modularity is improved in several respects. First, as we have mentioned earlier, the query is actually composed of two major parts: the optic expression, which declares what to select, and the query expression, which actually specifies the kind of query to be executed over the selection. Second, the optic expression is unaware of variables and builds upon finer-grained and reusable modules, such as couples, fst, age and name. This results in pure algebraic queries that are arguably simpler to compose and maintain. In essence, we are building out of simpler optics in a purely compositional style, and deriving queries in one shot.

The downside of the optics approach in relation to comprehensions, at least in the current version of Optica, is its limited expressiveness. Indeed, variables are fruitfully exploited in comprehensions to express arbitrary joins (e.g. cyclic), whereas optic queries appear to move only downwards from the root of the hierarchy. Relational models are more general than nested models, providing the programmer with better navigation tools [48], and therefore not every model is expressible in Optica. Take the couple model as an example. We assume that each person hangs from a couple, so that we can find them by diving into the couple fields fst/snd. However, the relational model is able to supply more entries for people who do not necessarily form a couple. To alleviate this problem, we may introduce a new fold people besides the existing couples, sharing a virtual root type as source. The connections between people and fst/snd would still be unclear in the optic model; therefore, new mechanisms should be introduced in order to establish the precise relationship among them.
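Over plain in-memory data, the contrast between the two styles of the under50 query above can be mimicked with ordinary Scala collections (the Person/Couple case classes and the sample data are ours, for illustration only):

```scala
case class Person(name: String, age: Int)
case class Couple(fst: Person, snd: Person)

val couples = List(
  Couple(Person("Alex", 60), Person("Bert", 55)),
  Couple(Person("Cora", 33), Person("Demi", 31)))

// Comprehension style (cf. under50_e): a variable c ranges over the source.
val byComprehension: List[String] =
  for (c <- couples if c.fst.age < 50) yield c.fst.name

// Optic-like style (cf. under50_d): a variable-free pipeline of selectors.
val byPipeline: List[String] =
  couples.map(_.fst).filter(_.age < 50).map(_.name)
```

Both produce List("Cora"): the comprehension names an intermediate variable, while the pipeline composes selectors point-free, which is the style that Optica distills.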
We leave for future work a more precise investigation of the compared expressiveness of the comprehension and optic languages, as well as the extension of Optica with features already supported in comprehension languages, like grouping, aggregation and order-by queries [19, 20, 49].

Optics as a general query language.
The role of optics in LINQ expands beyond their combined use with comprehensions. By lifting optics into a full-fledged DSL, we have opened the door to non-standard interpretations that directly translate the language of optics to data accessors for alternative representations beyond immutable data structures. For instance, we have provided an interpretation to turn Optica queries into XQuery expressions, where we have seen that the connection between them is straightforward, leading to a compositional interpreter. The translation ignores the XQuery FLWOR syntax and basically focuses on XPath features. Indeed, we understand XPath as a language to select parts from an XML document, which makes it a perfect example of an optic representation. Moreover, since XPath does not provide the means to update an XML document, it also fits perfectly with read-only optics such as getters, affine folds and folds.

It might be worth mentioning that synergies between optics and XML are by no means new. In fact, prominent optic libraries are extended with modules to cope with XML or JSON documents, even packaged as domain-specific query languages, such as JsonPath. In these projects, standard optics facilitate the definition of these DSLs for querying JSON or XML documents. Nevertheless, our approach is radically different, since we provide a general optic language in order to build generic optics which may be translated over those DSLs (JsonPath, XQuery, etc.).

Our approach also differs from others where the process is reversed and a translation of XPath expressions into a general query language based on comprehensions is performed [3]. That translation exploits view forests, a concept used by the framework to separate the XML structure from its computation.

https://hackage.haskell.org/package/xml-lens-0.1.6.3/docs/Text-XML-Lens.html
https://github.com/julien-truffaut/jsonpath.pres
On its part, Optica exposes a hierarchy of domain optics that external components may use to compose optic expressions, as application queries. In addition, we could understand view forests as a kind of optic, since they also select parts from the underlying database. However, Optica is more general, considering that the same application queries can be reused against different targets, and not only SQL.

Nevertheless, SQL is the primary target of classical LINQ with comprehensions, and we have also provided a non-standard SQL interpreter for Optica. Commonly, comprehension-based queries need to be flattened in order to guarantee good performance: the naive translation to SQL is not optimal, since it typically leads to nested subqueries. Moreover, translations of flat-flat queries to SQL are guaranteed to be total and to avoid the problem of query avalanche [16]. In systems like Links or T-LINQ, these guarantees are even statically checked. In Sect. 5.2.4, our translation to SQL attains similar guarantees concerning the type of generated queries, which are free of subqueries, beyond those generated by EXISTS, which are unavoidable. However, failures in query generation are signalled at run time rather than at compile time. We plan to address this limitation in future work by using the optimization techniques that the tagless-final approach offers [19]. Our translation process resembles the denotational approach of SQUR [20] and Links [51] rather than the rewriting approach followed in [16, 3, 19]. In particular, we use an intermediate language, TripletFun, to decouple the filtering, selection and collection aspects of the final SQL query. We differ from SQUR, however, because the ultimate translation to SQL is performed directly from this non-standard semantic domain rather than from a normalized optic query.
We plan to incorporate normalization and partial evaluation in future work, which will become convenient as soon as we extend the language with the projections first and second, in correspondence with the fork combinator.

Given the existing translation to comprehensions from Optica, and the established results concerning the generation of SQL from comprehensions, the usefulness of TripletFun for this purpose is certainly relative. However, it demonstrates an instance of optic representation in the relational setting, which we believe has the potential of being very useful when we extend our results to optics with updating capabilities. In this light, we intend to exploit the very same TripletFun representation to generate both SELECT and UPDATE statements. Moreover, the TripletFun interpreter represents an example of a complex translation using an intermediate optic representation, which resembles the denotational approach of [20] but performed in the algebraic setting of optics rather than in the relational calculus of comprehensions. This semantic style may serve as a reference for similar complex interpreters, e.g. for NoSQL databases such as MongoDB [52].
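To make the idea of decoupling concrete, here is an illustrative sketch (our own, far simpler than the paper's TripletFun, and with hypothetical column names) of a query split into its collection (FROM), filtering (WHERE) and selection (SELECT) aspects, with SQL rendered only at the very end:

```scala
// A "triplet"-style intermediate form: the three aspects are kept apart
// and can be composed independently before rendering.
case class Triplet(from: String, where: List[String], select: List[String]) {
  // Composing a filter only touches the WHERE aspect.
  def filter(cond: String): Triplet = copy(where = where :+ cond)
  // Rendering combines the three aspects into a single SQL statement.
  def toSql: String = {
    val w = if (where.isEmpty) "" else where.mkString(" WHERE ", " AND ", "")
    s"SELECT ${select.mkString(", ")} FROM $from$w"
  }
}

val q = Triplet("Couple c", Nil, List("c.fst_name"))
  .filter("c.fst_age < 50")
// q.toSql == "SELECT c.fst_name FROM Couple c WHERE c.fst_age < 50"
```

Because filtering, selection and collection are accumulated separately, optic composition can target each aspect independently, postponing the commitment to concrete SQL syntax until the final rendering step.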
Optica versus ORMs and LINQ libraries
Connections between optics and databases are widespread. As a matter of fact, lenses emerged in this context [33] under the umbrella of bidirectional programming. We remark [53] as a recent work in this field, where a practical approach to the view update problem is introduced by means of the so-called incremental relational lenses. Although we still do not know if extending Optica will lead us to contemplate views in the non-standard SQL semantics, we find this research essential to deal with updating optics in an effective way.

S-Optica and object-relational mappers (ORMs), like Hibernate, pursue similar goals: they aim at working with data in persistent stores as if it were plain in-memory data. However, S-Optica uses the language of optics, while ORMs try to remain as close as possible to the customary object-oriented style. There are other relevant differences:

• S-Optica does not stick to relational databases as its preferred target infrastructure. In fact, Sect. 3 and Sect. 4 show that in-memory immutable data structures and XML files are also potential sources of information.

• S-Optica is eminently declarative. Indeed, S-Optica queries are simply values that do not produce side effects on their own. This contrasts with ORMs, where a deep understanding of the particular ORM is required to identify which queries are being launched at any time. The declarative style of S-Optica enables compositionality, as well as the possibility of introducing further optimizations.

• S-Optica queries are expressive and well-typed. Many ORMs introduce contrived additional languages to express queries, and their expressions are usually presented as plain strings, so that errors are not detected at compile time.

• ORMs usually consider the notion of object as the smallest-granularity concept to deal with, while S-Optica supports queries that select very specific parts from the whole data.
• ORMs are able to write data back to the store, while this feature is future work in Optica.

In general, ORMs have been used for a long time and they are consequently very mature, while Optica is still an experimental and limited library. However, it already solves many of the problems that are deep-rooted in the ORM approach.

The Scala libraries Quill [24] and Slick [25] are arguably the most similar frameworks to S-Optica. The former is strongly inspired by T-LINQ [3] and it therefore follows the same theoretical principles. Its major benefit with regard to the original P-LINQ is the flatMap method that it supplies for the type constructor Query; however, it apparently lacks an implementation for point, which is required to translate some Optica queries into Quill expressions. (Strictly speaking, values are objects in Scala whereas S-Optica queries are polymorphic methods; these may be easily turned into values by using a Church encoding representation — for instance, the S-Optica query (like 1).getAll.) Slick is similar to Quill, but it does not build upon a theoretical language like T-LINQ; a comparison between Quill and Slick, written by Quill's author, is provided at https://github.com/getquill/quill/blob/master/SLICK.md. In any case, both Quill and Slick map relational models in Scala in a direct way, i.e. as flat data models, whereas S-Optica works with nested data models and has to solve a bigger impedance mismatch. On the other hand, although Quill and Slick support updates and deletes, they do this with ad-hoc languages that escape the collection-like interface. Optica should be able to supply a standard interface in order to support writes by introducing additional optics. As a final note, we want to recall that, as Sect. 6 points out, Optica should not be seen as a competitor but as a complement for these libraries, since optics and comprehensions were shown to be compatible.

Conclusions

This paper has attempted to demonstrate that optics embrace a much wider range of representations beyond concrete, van Laarhoven, profunctor optics and other isomorphic acquaintances. We have shown, for instance, that a restricted subset of XQuery can be properly understood as an optic representation, i.e. as an abstraction whose essential purpose is to allow us to select parts from a data source by using powerful combinators, declaratively, and derive queries from those selectors. From this standpoint, data sources of optic representations may range far beyond general immutable structures: they might be XML documents, as in the case of XQuery, or relational databases.
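The tagless-final recipe behind this claim can be illustrated with a deliberately tiny sketch. The names below (GetterSym, FunGetter, PathGetter) are hypothetical and far simpler than the actual Optica type system; the point is only that one query, written once against the class, admits both a standard function-based semantics and a non-standard one that emits a query-like path:

```scala
object GetterSketch {
  // Hypothetical class of "getter representations" (NOT the actual Optica
  // API): the only combinator provided is sequential composition.
  trait GetterSym[Repr[_, _]] {
    def andThen[S, A, B](f: Repr[S, A], g: Repr[A, B]): Repr[S, B]
  }

  // Standard semantics: a getter is just a function S => A.
  type Fun[S, A] = S => A
  object FunGetter extends GetterSym[Fun] {
    def andThen[S, A, B](f: Fun[S, A], g: Fun[A, B]): Fun[S, B] = f.andThen(g)
  }

  // Non-standard semantics: a getter is an access path, a toy stand-in for
  // query generation in the style of the XQuery interpreter.
  final case class Path[S, A](steps: List[String])
  object PathGetter extends GetterSym[Path] {
    def andThen[S, A, B](f: Path[S, A], g: Path[A, B]): Path[S, B] =
      Path(f.steps ++ g.steps)
  }
}
```

A query abstracted over GetterSym can then be run in memory via FunGetter or rendered as a path via PathGetter, mirroring at toy scale how Optica queries are evaluated by concrete optics or translated into XQuery.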
In fact, we have also shown how to derive SQL queries from TripletFun, an optic representation that endorses the separation of concerns between selection, filtering and collection aspects, which characterizes SQL SELECT statements. Strictly speaking, we may say that SQL is not an optic but a query language which is translatable from an optic representation. In future work, we aim at testing the generality of the language of optics through the generation of other effective, idiomatic translations into a diverse range of querying infrastructures. We will particularly pay attention to technologies that are more recent than XQuery, with a clear bias towards nested data models such as document-oriented NoSQL databases like MongoDB [52], and languages like GraphQL [55].

We put forward Optica, a full-fledged DSL, to specify what all these representations have in common, i.e. the concept of optic itself. Technically speaking, the type system of Optica encodes the compositional and querying features of getters, affine folds and folds, independently of any particular representation; concrete optics provide the semantic domains for its standard denotational semantics; and XQuery, TripletFun and T-LINQ represent semantic domains for non-standard optic representations. Currently, Optica only pays attention to a very restricted set of optics, namely getters, affine folds and folds. In future work, we will contemplate other optics like lenses, affine traversals or traversals, as well as additional combinators that populate de-facto libraries like Haskell lens and Monocle. This will force us to also pay attention to the laws (e.g. the get-set law of lenses) that the intended queries of optics must comply with. We think optic algebras [56] will be instrumental in that formalization.

The ultimate goal behind this quest for the language of optics has been to show that optics can play a significant role in the theory and practice of language-integrated query. In particular, we have demonstrated how optics can be used as a high-level language in order to derive comprehension queries, the most common approach in the LINQ field nowadays.
This has the advantage of allowing programmers to exploit optics, the de-facto standard for dealing with hierarchical data structures, in their LINQ developments. Additionally, the XQuery and SQL interpretations have also shown that the language of optics is general enough to cope with LINQ systems independently from comprehensions. However, in the case of SQL, this is done at the expense of a more limited expressiveness, since joins are not currently supported. We plan to investigate possible extensions to Optica based on the compositional encoding of equijoins in [29]. We also plan to investigate future interpretations of Optica into declarative query languages such as Datalog [57] and description logics [58], as well as its connection with recent developments in comprehension-related languages based on monoids [59]. We also think that Optica in its current shape has a great potential to deal with modern warehouse technologies aimed at data analytics, where updates are not customary.

Optics show a potential to cope not only with retrieval queries but also with updates, a kind of query that is commonly unaddressed in theoretical accounts but patently necessary in practice. This paper lays the foundation to engage with this issue in future work. On the one hand, extending the syntax and type system of Optica (and S-Optica) with new optic types and combinators is trivial. On the other hand, the feasibility of introducing updates in the interpretation is subject to limitations of the particular infrastructure. For example, XQuery does not support updates (although there are extensions that deal with them [60]), and thus the evaluation of optics with updating capabilities would be partial. As for SQL, it does support updates, but there is a tradeoff with expressiveness: not all relational queries can be updatable views, which introduces a new level of partiality. Whether triplets need to be extended in order to accommodate updates is something that requires further research.
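For reference, the update capability under discussion corresponds to the classic concrete lens, whose textbook shape and laws can be sketched in Scala as follows. These are the standard folklore definitions, not Optica's encoding (which remains future work); Person and age are illustrative names only:

```scala
object LensSketch {
  // Concrete lens: a getter paired with a setter.
  final case class Lens[S, A](get: S => A, set: A => S => S) {
    // get-set law: writing back what was just read leaves the source unchanged.
    def getSet(s: S): Boolean = set(get(s))(s) == s
    // set-get law: reading right after a write returns the written value.
    def setGet(s: S, a: A): Boolean = get(set(a)(s)) == a
  }

  final case class Person(name: String, age: Int)

  // A lens focusing on the age field of Person.
  val age: Lens[Person, Int] = Lens(_.age, a => p => p.copy(age = a))
}
```

The partiality discussed above shows up precisely here: a backend can only interpret set when the corresponding view is updatable, whereas get is always available.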
Lastly, we are very optimistic about the potential of updates in modern technologies based on nested models [52, 55], where we have carried out several simple experiments with positive results.

Finally, we have implemented a proof-of-concept of Optica and its interpreters in the Scala library S-Optica by using the tagless-final style. Optica is thus implemented as a type class: the class of optic representations and their intended queries. Beyond the generic queries that we have used to guide the explanations, we have tested S-Optica with other queries around the same domains and with new domains that were extracted from the official documentation of Monocle, Slick and Quill. These examples are located in an experimental branch of S-Optica that will be available as soon as the library matures. In this sense, we intend to profit from the many improvements in the imminent release of Scala 3.0, particularly in regard to type classes and metaprogramming facilities (https://dotty.epfl.ch/docs/reference/metaprogramming/toc.html), with a new implementation of Optica in Dotty [61]. Similar implementations may also be developed in other languages that support the tagless-final approach, such as Haskell or OCaml. In any case, the results that have been obtained are encouraging enough to anticipate the feasibility of extending existing comprehension-based LINQ libraries in these languages with optic capabilities.

Appendix A. Scala Background
This section aims at providing a brief background on those Scala features that we use in this paper. First, Table A.2 supplies a cheat sheet where we can find examples and short descriptions of the abstractions and constructions that we consider to be more relevant in the particular context of this work. As can be seen, some of them are specific to Scala, but there are other concepts which are widespread in the functional programming community, for which we just want to show how to encode them in this language. Second, we describe the general pattern to encode type classes [62] in Scala [63].
Appendix A.1. Encoding Type Classes in Scala
In Scala, we can use traits to define new type classes. For instance, we encode Functor as follows:

  trait Functor[F[_]] {
    def fmap[A, B](fa: F[A])(f: A => B): F[B]
  }
The trait itself is parameterized with a type constructor F[_]; therefore, this is a type constructor class. It declares the fmap method, which is parameterized with concrete type parameters A and B and value parameters fa and f, organized in two sections. (Scala supports definitions with multiple groups of value parameters, delimited by parentheses; in this particular situation, the separation turns out to be helpful to improve type inference while invoking the method.) As can be seen, function types are represented with the arrow =>.

Table A.2: Scala Cheat Sheet

- algebraic data type
  Example: sealed abstract class Option[A]; case class None[A]() extends Option[A]; case class Some[A](a: A) extends Option[A]
  ADTs are implemented using object inheritance. The example shows Option, which is also known as Maybe.

- case class
  Example: case class Person(name: String, age: Int)
  Defines a class with special features, like construction and observation facilities.

- companion object
  Example: trait Person; object Person
  A module that serves as a companion to a class or trait with the very same name. We use it to supply class members and provide implicit definitions, like conversors and type class instances.

- for comprehension
  Example: for { i <- List(1, 2); j <- List(3, 4) } yield i + j // res: List[Int] = List(4, 5, 5, 6)
  Syntactic sugar for flatMap, map, etc. Analogous to Haskell's do-notation.

- function type
  Example: val f: Int => Boolean = i => i > 0; f(3) // res: Boolean = true
  Function types are represented with arrows separating domain and codomain. Lambda expressions follow a similar syntax, where the arrow separates parameter and function body.

- implicit resolution
  Example: def isum(x: Int)(implicit y: Int): Int = x + y; implicit val i: Int = ...; isum(...) // res: Int = ...
  A family of techniques to let the compiler infer certain parameters automatically. In the example, i is implicitly passed as second argument to isum.

- partial function syntax
  Example: val f: Option[Int] => Boolean = { case None => true; case Some(_) => false }
  Special syntax for those situations where we want to produce an anonymous (potentially partial) function that requires pattern matching on its parameter.

- placeholder syntax
  Example: val inc: Int => Int = i => i + 1; val inc2: Int => Int = _ + 1
  Syntax for lambda expressions where we refer to the parameter as '_'; consequently, there is no need to name it. Both inc and inc2 (placeholder syntax) are equivalent.

- trait
  Example: trait Person { def name: String = "John"; def age: Int }
  Similar to Java interfaces (as they enable multiple inheritance), but traits support partial implementation of members.

- type parameter
  Example: trait List[A]; def nil[A]: List[A]; trait Symantics[Repr[_]]
  Types that are taken as parameters by class or method definitions. A special notation is required if we expect higher-kinded types, like Repr.

- type lambda
  Example: λ[x => x]; λ[x => Int]; λ[x => Option[x]]
  Notation enabled by the kind-projector compiler plugin to produce anonymous type functions of the kind ∗ → ∗.

Now, we could follow the same pattern to provide other type classes, like Pointed:

  trait Pointed[F[_]] {
    def point[A](a: A): F[A]
  }

or Bind:

  trait Bind[F[_]] {
    def bind[A, B](fa: F[A])(f: A => F[B]): F[B]
  }

which follow the very same pattern. The previous definitions form the building blocks of
Monad. Thereby, we could compose them to provide the corresponding type class:

  trait Monad[F[_]] extends Functor[F] with Pointed[F] with Bind[F] {
    def fmap[A, B](fa: F[A])(f: A => B) = bind(fa)(a => point(f(a)))
  }

Here, we exploit the multiple inheritance mechanism provided by Scala to mix the involved traits. At this point, we are able to implement fmap in terms of bind and point, once and for all. It is common practice to deploy type class instances in the type class companion object, since the Scala compiler will search for instances in this module, among other places. For example, this is the Monad companion object, where we have placed the monad instance for Option (Maybe in Haskell):

  object Monad {
    implicit object OptionMonad extends Monad[Option] {
      def point[A](a: A) = Some(a)
      def bind[A, B](fa: Option[A])(f: A => Option[B]) = fa match {
        case None    => None
        case Some(a) => f(a)
      }
    }
  }

There are several alternatives to supply an instance. On this occasion, we have decided to implement it as an object OptionMonad which is declared with the implicit modifier, so that the compiler can find it implicitly if necessary. The implementation of point and bind turns out to be trivial. Once we have defined a type class, we can implement derived functionality. For instance, we can define the typical join method:

  def join[F[_], A](ffa: F[F[A]])(implicit M: Monad[F]): F[A] =
    M.bind(ffa)(identity)
This method requires implicit evidence of Monad for F, which is used in the implementation to invoke bind. Now, we can use the Option instance to flatten a nested optional value by means of join:

  join[Option, Int](Option(Option(3)))(Monad.OptionMonad) // res: Option[Int] = Some(3)

Here, we manually supply the type parameters and the monad evidence. Fortunately, the Scala compiler is able to infer them; therefore, the next version is preferred:

  join(Option(Option(3))) // res: Option[Int] = Some(3)
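For convenience, the pieces introduced along this appendix can be assembled into a single self-contained object; the snippet also exercises the derived fmap, which comes for free from the Monad instance:

```scala
object TypeClassDemo {
  trait Functor[F[_]] { def fmap[A, B](fa: F[A])(f: A => B): F[B] }
  trait Pointed[F[_]] { def point[A](a: A): F[A] }
  trait Bind[F[_]]    { def bind[A, B](fa: F[A])(f: A => F[B]): F[B] }

  // Monad composes the three classes and derives fmap once and for all.
  trait Monad[F[_]] extends Functor[F] with Pointed[F] with Bind[F] {
    def fmap[A, B](fa: F[A])(f: A => B): F[B] = bind(fa)(a => point(f(a)))
  }

  object Monad {
    // Instance placed in the companion object for implicit resolution.
    implicit object OptionMonad extends Monad[Option] {
      def point[A](a: A): Option[A] = Some(a)
      def bind[A, B](fa: Option[A])(f: A => Option[B]): Option[B] = fa match {
        case None    => None
        case Some(a) => f(a)
      }
    }
  }

  // Derived functionality: flatten a nested value.
  def join[F[_], A](ffa: F[F[A]])(implicit M: Monad[F]): F[A] =
    M.bind(ffa)(identity)

  // join(Option(Option(3))) evaluates to Some(3);
  // Monad.OptionMonad.fmap(Option(2))(_ + 1) evaluates to Some(3) as well.
}
```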
As a final remark, note that a monad instance subsumes instances for the rest of the type classes that form it; e.g. OptionMonad also serves as a Pointed instance for Option.

Appendix B. XML Schemas
Appendix B.1. Couple XSD
Acknowledgements

We would like to thank James Cheney, Oleg Kiselyov, Eric Torreborre and the anonymous reviewers for their helpful comments and corrections to a previous version of this paper. In particular, we give Cheney credit for Sect. 6, since he showed us how to translate into comprehensions the optic-based organization example, using the Links programming language.
References

[1] V. Tannen, P. Buneman, L. Wong, Naturally embedded query languages, in: J. Biskup, R. Hull (Eds.), Database Theory - ICDT'92, 4th International Conference, Berlin, Germany, October 14-16, 1992, Proceedings, Vol. 646 of Lecture Notes in Computer Science, Springer, 1992, pp. 140–154. doi:10.1007/3-540-56039-4_38.
[2] E. Meijer, B. Beckman, G. M. Bierman, LINQ: reconciling object, relations and XML in the .NET framework, in: S. Chaudhuri, V. Hristidis, N. Polyzotis (Eds.), Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006, ACM, 2006, p. 706. doi:10.1145/1142473.1142552.
[3] J. Cheney, S. Lindley, P. Wadler, A practical theory of language-integrated query, in: Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP '13, ACM, New York, NY, USA, 2013, pp. 403–416. doi:10.1145/2500365.2500586.
[4] G. Copeland, D. Maier, Making Smalltalk a database system, in: ACM Sigmod Record, Vol. 14, ACM, 1984, pp. 316–325.
[5] C. Ireland, D. Bowers, M. Newton, K. Waugh, A classification of object-relational impedance mismatch, in: 2009 First International Conference on Advances in Databases, Knowledge, and Data Applications, IEEE, 2009, pp. 36–43.
[6] P. Hudak, Building domain-specific embedded languages, ACM Comput. Surv. 28 (4es) (Dec. 1996). doi:10.1145/242224.242477.
[7] M. Odersky, P. Altherr, V. Cremet, B. Emir, S. Maneth, S. Micheloud, N. Mihaylov, M. Schinz, E. Stenman, M. Zenger, An overview of the Scala programming language, Tech. rep. (2004).
[8] R. Norris, Doobie - a functional JDBC layer for Scala [cited 14 / / ]. URL https://tpolecat.github.io/doobie/
[9] J. Cheney, Language-integrated query: state of the art and open problems, in: L. Daynès, G. Fletcher, W. S. Han (Eds.), NII Shonan Meeting Report No. 2017-6, 2017. URL https://shonan.nii.ac.jp/archives/seminar/098/wp-content/uploads/sites/149/2017/02/cheney-shonan-linq.pdf
[10] P. Wadler, Comprehending monads, in: Proceedings of the 1990 ACM Conference on LISP and Functional Programming, ACM, 1990, pp. 61–78.
[11] P. Wadler, Monads for functional programming, in: International School on Advanced Functional Programming, Springer, 1995, pp. 24–52.
[12] P. Trinder, P. Wadler, Improving list comprehension database queries, in: Fourth IEEE Region 10 International Conference TENCON, 1989, pp. 186–192. doi:10.1109/TENCON.1989.176921.
[13] P. Buneman, L. Libkin, D. Suciu, V. Tannen, L. Wong, Comprehension syntax, SIGMOD Record 23 (1) (1994) 87–96. doi:10.1145/181550.181564.
[14] P. Buneman, S. A. Naqvi, V. Tannen, L. Wong, Principles of programming with complex objects and collection types, Theor. Comput. Sci. 149 (1) (1995) 3–48. doi:10.1016/0304-3975(95)00024-Q.
[15] L. Wong, Kleisli, a functional query system, J. Funct. Program. 10 (1) (2000) 19–56. doi:10.1017/S0956796899003585.
[16] E. Cooper, The script-writer's dream: How to write great SQL in your own language, and be sure it will succeed, in: P. Gardner, F. Geerts (Eds.), Database Programming Languages - DBPL 2009, 12th International Symposium, Lyon, France, August 24, 2009, Proceedings, Vol. 5708 of Lecture Notes in Computer Science, Springer, 2009, pp. 36–51. doi:10.1007/978-3-642-03793-1_3.
[17] D. Syme, Leveraging .NET meta-programming components from F#. doi:10.1145/1159876.1159884.
[18] A. Ulrich, G. Giorgidze, J. Weijers, N. Schweinsberg, DSH: Database supported Haskell, http://hackage.haskell.org/package/DSH (2000–2004).
[19] K. Suzuki, O. Kiselyov, Y. Kameyama, Finally, safely-extensible and efficient language-integrated query, in: Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '16, ACM, New York, NY, USA, 2016, pp. 37–48. doi:10.1145/2847538.2847542.
[20] O. Kiselyov, T. Katsushima, Sound and efficient language-integrated query - maintaining the ORDER, in: B. E. Chang (Ed.), Programming Languages and Systems - 15th Asian Symposium, APLAS 2017, Suzhou, China, November 27-29, 2017, Proceedings, Vol. 10695 of Lecture Notes in Computer Science, Springer, 2017, pp. 364–383. doi:10.1007/978-3-319-71237-6_18.
[21] O. Kiselyov, Typed tagless final interpreters, in: Generic and Indexed Programming, Springer, 2012, pp. 130–174.
[22] H. Xi, C. Chen, G. Chen, Guarded recursive datatype constructors, in: Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '03, ACM, New York, NY, USA, 2003, pp. 224–235. doi:10.1145/604131.604150.
[23] S. Najd, S. Lindley, J. Svenningsson, P. Wadler, Everything old is new again: quoted domain-specific languages, in: M. Erwig, T. Rompf (Eds.), Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM 2016, St. Petersburg, FL, USA, January 20-22, 2016, ACM, 2016, pp. 25–36. doi:10.1145/2847538.2847541. URL http://dl.acm.org/citation.cfm?id=2847538
[24] F. W. Brasil, Quill - compile-time language integrated queries for Scala [cited 14 / / ]. URL https://getquill.io
[25] Lightbend, Slick - functional relational mapping for Scala [cited 14 / / ]. URL http://slick.lightbend.com/
[26] Apache Software Foundation, The Cassandra query language. URL http://cassandra.apache.org/doc/4.0/cql/
[27] L. Wong, Normal forms and conservative extension properties for query languages over collection types, J. Comput. Syst. Sci. 52 (3) (1996) 495–505. doi:10.1006/jcss.1996.0037.
[28] J. Cheney, S. Lindley, P. Wadler, Query shredding: efficient relational evaluation of queries over nested multisets, in: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, ACM, 2014, pp. 1027–1038. doi:10.1145/2588555.2612186.
[29] J. Gibbons, F. Henglein, R. Hinze, N. Wu, Relational algebra by way of adjunctions, PACMPL 2 (ICFP) (2018) 86:1–86:28. doi:10.1145/3236781.
[30] D. L. Parnas, On the criteria to be used in decomposing systems into modules, Commun. ACM 15 (12) (1972) 1053–1058. doi:10.1145/361598.361623.
[31] J. Hughes, Why functional programming matters, The Computer Journal 32 (2) (1989) 98–107.
[32] M. Pickering, N. Wu, J. Gibbons, Profunctor optics: Modular data accessors, The Art, Science, and Engineering of Programming 1 (2) (2017). doi:10.22152/programming-journal.org/2017/1/7.
[33] J. N. Foster, M. B. Greenwald, J. T. Moore, B. C. Pierce, A. Schmitt, Combinators for bidirectional tree transformations: A linguistic approach to the view-update problem, ACM Trans. Program. Lang. Syst. 29 (3) (May 2007). doi:10.1145/1232420.1232424.
[34] O. Grenrus, Glassery (Apr 2018) [cited 14 / / ]. URL http://oleg.fi/gists/posts/2017-04-18-glassery.html
[35] E. Kmett, lens - lenses, folds, and traversals [cited 14 / / ]. URL https://github.com/ekmett/lens
[36] J. Truffaut, Monocle - optics library for Scala [cited 14 / / ]. URL http://julien-truffaut.github.io/Monocle/
[37] R. O'Connor, Functor is to Lens as Applicative is to Biplate: Introducing Multiplate, CoRR abs/1103.2841.
[38] P. Wadler, XQuery: A typed functional language for querying XML, in: International School on Advanced Functional Programming, Springer, 2002, pp. 188–212.
[39] Habla Computing, Optica - optic-based language-integrated query [cited 14 / / ]. URL https://github.com/hablapps/scico19
[40] S. Fischer, Z. Hu, H. Pacheco, A clear picture of lens laws, in: International Conference on Mathematics of Program Construction, Springer, 2015, pp. 215–223.
[41] R. Lämmel, E. Meijer, Revealing the X/O impedance mismatch, in: International Spring School on Datatype-Generic Programming, Springer, 2006, pp. 285–367.
[42] W. Keller, Mapping objects to tables, in: Proc. of European Conference on Pattern Languages of Programming and Computing, Kloster Irsee, Germany, Vol. 206, Citeseer, 1997, p. 207.
[43] C. J. Date, A critique of the SQL database language, SIGMOD Rec. 14 (3) (1984) 8–54. doi:10.1145/984549.984551.
[44] J. Cheney, Email correspondence, Personal communication (May 2019).
[45] D. Leijen, E. Meijer, Domain specific embedded compilers, in: Proceedings of the 2nd Conference on Domain-Specific Languages - Volume 2, DSL'99, USENIX Association, USA, 1999, p. 9.
[46] R. Hinze, et al., Fun with phantom types, The Fun of Programming (2003) 245–262.
[47] O. Kiselyov, Effects without monads: Non-determinism back to the meta language, Electronic Proceedings in Theoretical Computer Science 294 (2019) 15–40. doi:10.4204/eptcs.294.2. URL http://dx.doi.org/10.4204/EPTCS.294.2
[48] C. W. Bachman, The programmer as navigator, Commun. ACM 16 (11) (1973) 653–658. doi:10.1145/355611.362534.
[49] T. Katsushima, O. Kiselyov, Language-integrated query with ordering, grouping and outer joins (poster paper), in: U. P. Schultz, J. Yallop (Eds.), Proceedings of the 2017 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM 2017, Paris, France, January 18-20, 2017, ACM, 2017, pp. 123–124. doi:10.1145/3018882. URL http://dl.acm.org/citation.cfm?id=3018893
[50] M. Fernández, Y. Kadiyska, D. Suciu, A. Morishima, W.-C. Tan, Silkroute: A framework for publishing relational data in XML, ACM Transactions on Database Systems (TODS) 27 (4) (2002) 438–493.
[51] S. Lindley, J. Cheney, Row-based effect types for database integration, in: B. C. Pierce (Ed.), Proceedings of TLDI 2012: The Seventh ACM SIGPLAN Workshop on Types in Languages Design and Implementation, Philadelphia, PA, USA, January 28, 2012, ACM, 2012, pp. 91–102. doi:10.1145/2103786.2103798. URL http://dl.acm.org/citation.cfm?id=2103786
[52] MongoDB, Inc., MongoDB (2019).
[53] R. Horn, R. Perera, J. Cheney, Incremental relational lenses, Proc. ACM Program. Lang. 2 (ICFP) (2018) 74:1–74:30. doi:10.1145/3236769.
[54] E. Burmako, Scala macros: let our powers combine!: on how rich syntax and static types work with metaprogramming, in: Proceedings of the 4th Workshop on Scala, ACM, 2013, p. 3.
[55] O. Hartig, J. Pérez, An initial analysis of Facebook's GraphQL language, in: AMW 2017: 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web, Montevideo, Uruguay, June 7-9, 2017, Vol. 1912, 2017.
[56] J. López-González, J. M. Serrano, Towards optic-based algebraic theories: The case of lenses, in: International Symposium on Trends in Functional Programming, Springer, 2018, pp. 74–93.
[57] S. Ceri, G. Gottlob, L. Tanca, What you always wanted to know about Datalog (and never dared to ask), IEEE Transactions on Knowledge and Data Engineering 1 (1) (1989) 146–166.
[58] M. Krötzsch, F. Simancik, I. Horrocks, A description logic primer, CoRR abs/1201.4089. URL https://arxiv.org/pdf/1201.4089
[59] L. Fegaras, An algebra for distributed big data analytics, Journal of Functional Programming 27 (2017) e27. doi:10.1017/S0956796817000193.
[60] M. Benedikt, J. Cheney, Semantics, types and effects for XML updates, in: P. Gardner, F. Geerts (Eds.), Database Programming Languages, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 1–17.
[61] M. Odersky, D. Petrashko, G. Martres, et al., The Dotty project [cited 18 / / ]. URL https://github.com/lampepfl/dotty
[62] P. Wadler, S. Blott, How to make ad-hoc polymorphism less ad hoc, in: Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM, 1989, pp. 60–76.
[63] B. C. Oliveira, A. Moors, M. Odersky, Type classes as objects and implicits, in: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '10, ACM, New York, NY, USA, 2010, pp. 341–360. doi:10.1145/1869459.1869489.