A community-based transcriptomics classification and nomenclature of neocortical cell types
Rafael Yuste, Michael Hawrylycz, Nadia Aalling, Detlev Arendt, Ruben Armananzas, Giorgio Ascoli, Concha Bielza, Vahid Bokharaie, Tobias Bergmann, Irina Bystron, Marco Capogna, Yoonjeung Chang, Ann Clemens, Christiaan de Kock, Javier DeFelipe, Sandra Dos Santos, Keagan Dunville, Dirk Feldmeyer, Richárd Fiáth, Gordon Fishell, Angelica Foggetti, Xuefan Gao, Parviz Ghaderi, Onur Güntürkün, Vanessa Jane Hall, Moritz Helmstaedter, Suzana Herculano-Houzel, Markus Hilscher, Hajime Hirase, Jens Hjerling-Leffler, Rebecca Hodge, Z. Josh Huang, Rafiq Huda, Yuan Juan, Konstantin Khodosevich, Ole Kiehn, Henner Koch, Eric Kuebler, Malte Kühnemund, Pedro Larrañaga, Boudewijn Lelieveldt, Emma Louise Louth, Jan Lui, Huibert Mansvelder, Oscar Marin, Julio Martínez-Trujillo, Homeira Moradi, Natalia Goriounova, Alok Mohapatra, Maiken Nedergaard, Pavel Němec, Netanel Ofer, Ulrich Pfisterer, Samuel Pontes, William Redmond, Jean Rossier, Joshua Sanes, Richard Scheuermann, Esther Serrano Saiz, Peter Somogyi, Gábor Tamás, Andreas Tolias, Maria Tosches, Miguel Turrero Garcia, Argel Aguilar-Valles, Hermany Munguba, Christian Wozny, Thomas Wuttke, Liu Yong, Hongkui Zeng, Ed S. Lein
Copenhagen Classification, p.1
28 University of Western Ontario 29 CARTANA 30 Leiden University Medical Center 31 Stanford University 32 King’s College London 33 Krembil Research Institute 34 University of Haifa 35 Charles University 36 Bar Ilan University 37 Macquarie University 38 Sorbonne University 39 J. Craig Venter Institute 40 University of Szeged 41 Baylor College of Medicine 42 Carleton University 43 University of Strathclyde
To understand the function of cortical circuits, it is necessary to classify their underlying cellular diversity. Traditional attempts based on comparing anatomical or physiological features of neurons and glia, while productive, have not resulted in a unified taxonomy of neural cell types. The recent development of single-cell transcriptomics has enabled, for the first time, systematic high-throughput profiling of large numbers of cortical cells and the generation of datasets that hold the promise of being complete, accurate and permanent. Statistical analyses of these data have revealed the existence of clear clusters, many of which correspond to cell types defined by traditional criteria, and which are conserved across cortical areas and species. To capitalize on these innovations and advance the field, we, the Copenhagen Convention Group, propose that the community adopt a transcriptome-based taxonomy of the cell types in the adult mammalian neocortex. This core classification should be ontological, hierarchical and use a standardized nomenclature. It should be configured to flexibly incorporate new data from multiple approaches, developmental stages and a growing number of species, enabling improvement and revision of the classification. This community-based strategy could serve as a common foundation for future detailed analysis and reverse engineering of cortical circuits and serve as an example for cell type classification in other parts of the nervous system and other organs.
Anatomical and physiological classifications of cortical cell types
After more than a hundred years of sustained progress by generations of neuroscientists, it is clear that neocortical neurons and glial cells, like cells in any tissue, belong to many distinct types. Different cell types play discrete roles in cortical function and computation, making it important to characterize and describe them accurately and in their absolute and relative numbers. However, classifications of cortical neurons beyond the largest divisions have generally been subjective, based on a qualitative assessment of the morphology of a small number of neurons by individual investigators. Towering historical figures like Cajal, Lorente de Nó and Szentágothai, among others, proposed classifications of cortical cells based on their morphologies as visualized with histological stains (Figure 1A-C). These anatomical classifications described several dozen types of pyramidal neurons, short-axon cells and glial cells, which were subsequently complemented by morphological accounts of additional cortical cell types by many researchers, but without arriving at a clear consensus as to the number or even the definition of a cortical cell type. In particular, there is no established convention for assessing which morphological features of a neuron are essential to characterize a given cell type. Over the last few decades, the introduction of new optical microscopy, morphological, ultrastructural, immunohistochemical, and electrophysiological methods, and the widespread use of molecular markers (
Figure 1D-H), have provided increasingly fine phenotypic measurements of cortical cells and enabled new efforts to classify them more quantitatively, using supervised or unsupervised machine learning methods such as cluster analysis. A community effort to classify neocortical inhibitory cells was attempted at the 2005 Petilla
Convention, held in Cajal’s hometown in Navarre, Spain, and led to the adoption of a common standardized terminology to describe the anatomical, physiological and molecular features of neocortical interneurons. While useful, this attempt fell short of providing a classification and working framework that investigators could incorporate into their research. At the same time, one outcome of the Petilla Convention was the realization that there was not yet a single classification method that both captured the inherently multi-modal nature of cell phenotypes and could serve as a standard. While most researchers accepted the existence of cell types that could be captured and defined independently by different methods, there was no agreement as to which would form an optimal basis for classification. In principle, many criteria can be used, including 1) an anatomical or connectivity-based classification, 2) a parametrization of the intrinsic electrophysiological properties, 3) a combination of structural and physiological criteria, 4) molecular markers detected with antibodies or single-cell PCR, 5) identification of developmental or epigenetic attractor states or 6) evolutionary approaches that identify homologous cell types across species. Ideally, all these classifications would converge and agree with each other, or at least substantially overlap. Indeed, there is substantial concordance among categories based on anatomical, molecular and physiological criteria, but it has not been easy to combine these approaches and datasets into a unified taxonomy. Experts differ significantly in assigning neurons to particular classes in the literature and often disagree on what constitutes ground truth. This uncertainty is exacerbated by technical problems: conventional approaches have been laborious, low-throughput, frequently non-quantitative, and generally plagued by an inability to sample cells in non-biased ways.
Thus, setting aside debates about the importance of various criteria and the nature or even existence of discrete cell types, it is not surprising that the cell type problem remains challenging. However, a new approach now available, single-cell transcriptomics, can help break the impasse.
Transcriptomics as the core framework for classifying cell types
Recent advances in high-throughput single-cell transcriptomics (scRNAseq) have dramatically changed the paradigm of cellular classification, offering a powerful new quantitative genetic framework for classifying cell types. This new approach measures the expression of thousands of genes (transcriptomes) in large numbers of single cells and operates at relatively high speed and low cost. Related methods in epigenomics can identify sites of methylation and putative gene transcriptional regulation, essential to cell function and state. These new approaches descend from the methodological, conceptual and economic revolution created by the
Human Genome Project. With diverse genomes in hand, it became feasible to generate entire transcriptomes from tissues, and these methods were then miniaturized for amplifying the RNA present in single cells. Initially, practical considerations limited this application to only a few hundred cells per experiment, but effective new methods quickly emerged for profiling thousands of cells or nuclei at a time. With simultaneous advances in the computational methods needed to analyze an overwhelming amount of sequence-based data, it is now possible to classify and characterize the complete diversity of neural cells in an unbiased way in any tissue or species, including the neocortex.
Figure 1: Historical milestones in cortical cell type classification.
Morphological characterization and classification of neurons (A) and glial cells (B) by Ramón y Cajal (1899). C. Diagram showing the connections of different types of interneurons with pyramidal cells; from Szentágothai (1975). D. Cortical cell type classification based on intrinsic firing properties (from the Petilla Convention). E. Definition of GABAergic interneuron classes based on non-overlapping and combinatorial marker gene expression; from Kawaguchi and Kubota (1997). F. Correlation of firing properties with class markers. G. Complex relationships between cellular morphology, marker gene expression and intrinsic firing properties based on multimodal analysis (from Markram et al. (2004)). H. Comprehensive morphological and physiological classifications of cortical cell types (from Markram et al. (2015)).
Conceptually, as much as the genome is the internal genetic description for each species, the transcriptome, as the complete set of genes being expressed, provides another internal code that describes each cell within an organism in a spatiotemporal context. Practically, the scale of scRNAseq promises near-saturating analysis of complex cellular brain regions like the neocortex or any other organ, providing for the first time a comprehensive and quantitative description of cellular diversity and the prospect of simplifying tissue cell composition to a finite number of cell types and states defined by the clustering of these datasets. Importantly, however, these transcriptionally defined clusters represent a probabilistic description of cell types in a high-dimensional landscape of gene expression across all cells in a tissue, rather than a definition based on a small set of necessary and sufficient cellular markers or other features. The scale, precision and high information content of these current methods now far outpace classical methods of cellular phenotyping in neuroscience and have the potential to approach the criteria of Complete, Accurate and Permanent (CAP) often cited by the late Sydney Brenner as the gold standard in science. Indeed, major consortium efforts now aim to generate a complete description of cell types based on molecular criteria across the cortex (
Allen Institute for Brain Science), the entire brain (the NIH BRAIN Initiative Cell Census Network) and even the entire body (the Human Cell Atlas). As the Human Genome Project offers a means for comparative analysis of orthologous genes across species, these efforts promise to define all or most cell types and states in humans and model organisms, with the possibility of extending them to a variety of species to understand the evolution of cell type diversity. These enormous investments have the potential for a transformative effect on the neuroscience community, which will be accelerated by the formalization of a molecular classification and its adoption by the community. Transcriptomic classifications offer a number of advantages when used as a framework for bounding the problem of cellular diversity. For example:
1) High-throughput transcriptomics is uniquely effective at allowing a systematic, comprehensive analysis of cellular diversity in complex tissues. Its quantitative and high-throughput nature enables the adoption of rigorous definitions and criteria using datasets from tens of thousands to millions of cells.
2) The genes expressed by a cell during its developmental trajectory and maturity ultimately underlie its structure and function, and so the transcriptome offers predictive power based on interpreting gene function. From this perspective, other cellular phenotypes, including morphology, are in part encoded by genes, rather than being completely independent defining criteria. Of course, the transcriptome of the mature neuron, measured at a single point in time, does not fully predict cellular properties, for many reasons: it fails to capture the cell’s developmental history, with intrinsic or extrinsic influences determining its phenotype (such as interaction within its synaptic microcircuitry or neuromodulatory effects), nor does it reveal post-transcriptional or post-translational modifications, regulation, trafficking or relatively short-term, brain-state-dependent physiological modulation of subsets of genes. Nevertheless, in general, transcriptome-based classifications are so far largely concordant with a large body of literature regarding cellular anatomy, physiology, epigenetic markers, function and developmental origin, while offering an open-ended means for generating hypotheses about gene expression underlying other cellular phenotypes.
3) A molecular definition of cell types allows the identification of robust cell type markers and the creation of genetic tools to target, label and manipulate specific cell types. Even if such tools do not resolve the lumping or splitting of discrete subtypes, they provide the means to standardize the datasets obtained by different researchers.
4) Transcriptomic data can also provide information about human diseases, allowing a potential linkage between genes associated with disease and their cellular locus of action. Admittedly, pathological disturbances could be deduced, independent of transcriptomics, from pathophysiological studies combined with time- and place-dependent modifications.
Cell type-resolved transcriptomic data might help explain many mechanistically unresolved diseases in terms of changes in the expression levels of key genes in the cell types involved.
5) Finally, the transcriptome is unique among cellular phenotypes in that it allows quantitative alignment of cell types between highly disparate datasets based on conserved molecular signatures across evolutionary or developmental time. Indeed, advances in single-nucleus sequencing now allow transcriptomic analysis in any species or developmental stage, enabling the potential for alignment of cell types across species (based on conserved expression of homologous genes) and developmental stages (based on gradual developmental trajectories). Systematic cross-species comparisons of similarly acquired and analyzed single-cell transcriptomes will also make possible the objective examination of the degree to which cell type diversity in the cerebral cortex has increased or been constrained in evolution.
Proposing a molecularly based classification scheme for use by a field traditionally centered on cellular anatomy, physiology and synaptic connectivity may be challenging unless such a classification correlates strongly with those features. Recent work in the retina is promising in this regard, where a large body of work has established a highly diverse set of anatomically, physiologically and functionally discrete cell types. For example, for mouse bipolar cells, a class comprising 15 types of excitatory interneurons, there is essentially a perfect correspondence between types defined by scRNAseq, high-throughput optical imaging of electrical activity, and serial section electron microscopy. Application of single-cell transcriptomics to the retina identifies clusters that strongly correlate with this prior cell type knowledge. Whereas an analogous concordance may not be as straightforward in the neocortex, to date there is little evidence that it cannot be achieved.
Importantly, transcriptomics results can be complex and, even within a single putative cell type, there could be variation in gene expression due to cell state, differentiation, and other dynamic processes. Some studies have suggested that cell types are not sharply defined discrete entities but rather components of a complex landscape of possible states. Further, there are aspects of genome regulation, such as transcription-factor binding activities, that are not yet measurable in single cells. Differentially regulated genes that establish these cell states or drive transitions will need to be identified and analyzed. In this respect, the extensions of scRNAseq to epigenomics methods measuring open chromatin and methylation state are now becoming possible and will help to understand the impact of gene regulation on cell state. Progress in the simultaneous measurement of transcription and regulatory state will continue, and having a solid transcriptome-based taxonomy will form a strong foundation for further elucidation of cell type characteristics in different mammalian cortices. Experimental tools are increasingly available to aid in transcriptomic classification and phenotypic characterization in model animals, such as specific Cre lines and viruses, Patch-seq and spatial transcriptomic methods such as MERFISH. These efforts will be best approached as a community effort, linking the evidence they produce to a molecular framework. Of course, it is possible that there will be significant mismatches between phenotypes, as revealed by different methods, that will be problematic for the usefulness of a molecular classification; for example, long-range connectivity patterns may have been set up early in development and may not be
correlated with adult gene expression. This information will need to be incorporated into a cell type classification. Potential mismatches, however, do not negate the value of a core transcriptomic classification. Genes differentiating types are likely to have cellular functions, genes are the link to the genetics of human disease, and genes are the only path to genetic tools to manipulate cell types.
Biological insights from cortical transcriptomics
A transcriptomics classification is not only practical but could also enable important biological discoveries. Indeed, the application of scRNAseq to mouse and human cortex has already identified a complex but finite set of molecularly defined cell types that generally agree with the vast prior literature on cytoarchitectural organization, developmental origins, functional properties and long-range projections. Moreover, these initial transcriptomic studies of cortical tissue are proving their meaningfulness by providing key insights into the biology of the system. To start, the hierarchical (agglomerative) organization of transcriptomic cell types, based on relative similarity between clusters, makes strong biological sense. Viewed as a tree or dendrogram, the initial branches reflect major classes (neuronal vs. non-neuronal, excitatory vs. inhibitory), with finer splits reflecting more subtle variants of each class. The major splits likely reflect different developmental programs; for example, neocortical neurons are split into excitatory glutamatergic vs. inhibitory GABAergic classes reflecting their different developmental origins in embryonic pallium versus subpallial proliferative regions, while the next splits in the GABAergic branch contain neurons generated by medial and caudal subdivisions of the ganglionic eminences and the preoptic areas (Figure 2A). These transcriptomic divisions are consistent with a long literature on cell fate specification of different GABAergic classes and the transcription factors involved in that process (
Figure 2B). Transcriptomics also allows a quantitative analysis of the developmental trajectories involved in this specification and maturation (Figure 2C). Finally, the genes that distinguish transcriptomic classes are predictive of their cellular and circuit function, as differential expression of genes associated with neuronal connectivity and synaptic communication defines these classes (Figure 2D). Determining how consistent the organization of transcriptomic cell types is across a wider range of species will help to validate this classification, or to understand the biological and evolutionary origins of its limitations. Transcriptomic classification provides a direct avenue for quantitative comparative analysis across species, by aligning cell types across species based on shared gene covariation. This is particularly relevant for understanding the human brain, because this alignment of cell types allows inference of cellular phenotypes in the human brain where such information is extremely difficult or impossible to obtain. For example, a recent study of human cortex demonstrated that the overall cellular organization of the human cortex is highly conserved with that of the mouse, allowing identification of homologous cell types. Similarly, a recent transcriptomic study performed in the mouse, turtle and lizard found that the same major classes of cortical GABAergic neurons (somatostatin, parvalbumin-like and serotonin receptor 3A (HTR3A)) exist in mammals and reptiles (Figure 2E). Despite the overall conservation between mouse and human, many differences are seen in homologous types, including their proportions, laminar distributions, gene expression, and morphology (for example, HTR3A is not expressed in human GABAergic interneurons).
Another recent comparative transcriptome study of prefrontal cortex of humans, chimpanzees and macaques revealed that, in spite of its apparent histological conservation, the human neocortex shows many unique gene-expression features. Finally, there are also new insights for non-neuronal cells from transcriptomic studies, which have identified astrocyte diversity and divergent molecular phenotypes between mouse and human that correlate with known morphological specializations in primate astrocytes.
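The hierarchical (agglomerative) organization described in this section, in which a tree is built from the relative similarity between clusters, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cluster names are hypothetical, the centroid values are toy numbers rather than measured expression, and average linkage on Euclidean distance is one of several reasonable choices, not necessarily the one used in the studies cited.

```python
import math
from itertools import combinations


def euclid(a, b):
    """Euclidean distance between two cluster centroids."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_tree(centroids):
    """Average-linkage agglomerative clustering of named centroids.

    Returns nested tuples: each merge is a pair (left_subtree, right_subtree).
    """
    # Each node is (subtree, list of leaf names under that subtree).
    nodes = [(name, [name]) for name in centroids]

    def average_link(leaves_a, leaves_b):
        total = sum(euclid(centroids[a], centroids[b])
                    for a in leaves_a for b in leaves_b)
        return total / (len(leaves_a) * len(leaves_b))

    while len(nodes) > 1:
        # Merge the two most similar nodes first.
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda pair: average_link(nodes[pair[0]][1],
                                                 nodes[pair[1]][1]))
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Toy centroids over four hypothetical genes (illustrative values only).
toy_centroids = {
    "excitatory_a": [10.0, 10.0, 0.0, 0.0],
    "excitatory_b": [10.0, 9.0, 0.0, 1.0],
    "inhibitory_a": [0.0, 0.0, 10.0, 9.0],
    "inhibitory_b": [0.0, 2.0, 10.0, 10.0],
}
tree = build_tree(toy_centroids)
print(tree)
```

In this toy case the two excitatory clusters merge first, then the two inhibitory clusters, and the final merge is the top-level split between the two classes, mirroring the dendrogram structure described above.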
Figure 2: Biological insights from molecular analyses of cortical cell types.
A. Comprehensive single-cell transcriptome analysis reveals molecular diversity of cell types, with relatively invariant interneuron and non-neuronal types across cortical areas but significant variation in excitatory neurons (modified from Tasic et al. (2018)). B. Major interneuron classes are specified by distinct transcription factor codes (from Kessaris et al. (2014)). C. Single-cell transcriptomics of GABAergic interneuron development demonstrates gradual changes in gene expression underlying developmental maturation (modified from Mayer et al. (2018)). D. Gene families shaping cardinal GABAergic neuron types include neuronal connectivity, ligand receptors, electrical signaling, intracellular signal transduction, synaptic transmission, and gene transcription.
These gene families assemble membrane-proximal molecular machines that customize the input–output connectivity and properties in different GABAergic neuron types.
E. Single-cell transcriptomics allows cross-species comparisons and demonstrates conservation of major cell classes from reptiles to mammals, with conserved transcription factors but some species-specific effectors (from Tosches et al. (2018) and Tosches and Laurent (2019)).
A probabilistic definition of cortical cell types
While there is compelling evidence for the existence of distinct cell types based on robust clusters of observable and measurable cell attributes, a precise definition of a type is more challenging, as different and partially conflicting classifications have been put forward, emphasizing structural/functional characteristics or cellular identities. In an effort to arrive at a conceptual definition of cell types, many different criteria have been proposed. For example, a cell type may consist of groups of neurons that share a common developmental origin, common sets of gene expression patterns (such as a necessary and sufficient transcription factor code), common morphological or physiological features, or a common function in the synaptic circuit, either through input-output connectivity or the transfer function that they carry out within the same environment while processing their inputs. While each of these views has merit and should be ultimately explained by a meaningful definition of cell type, they are often not all readily measurable nor easily combined, particularly across species. Given the complexity of cellular function, the present lack of full correspondence of multimodal data sets within and across species, and the required explanatory power that a meaningful definition of cell type should support, a plausible way forward may be to utilize an operational definition of a cell type, building on the existence of statistically defined clusters over a set of measurable attributes. Indeed, in most recent single-cell sequencing approaches, groups of transcriptomic profiles are clearly identified by data clustering, whereby sets of cells are subjected to a variety of iterative and hierarchical clustering methods with subsequent permutation testing for significance. As is common with basic statistical analysis, identified groups are compared to a null hypothesis of no group structure.
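To make the comparison against a null hypothesis of no group structure concrete, the following is a minimal sketch of one possible permutation test, not the procedure of any particular study. It assumes a basic 2-means split as the clustering step, the within-cluster sum of squares as the test statistic, and a null generated by shuffling each gene independently across cells; the function names and the synthetic data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Per-gene mean of a non-empty list of cells."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_means_wcss(cells, rng, n_starts=3, n_iter=10):
    """Smallest within-cluster sum of squares found by a basic 2-means split."""
    best = float("inf")
    for _ in range(n_starts):
        centers = rng.sample(cells, 2)
        for _ in range(n_iter):
            groups = ([], [])
            for cell in cells:
                nearer_second = sq_dist(cell, centers[1]) < sq_dist(cell, centers[0])
                groups[int(nearer_second)].append(cell)
            if not groups[0] or not groups[1]:
                break  # degenerate start; keep the current centers
            centers = [centroid(groups[0]), centroid(groups[1])]
        wcss = sum(min(sq_dist(c, centers[0]), sq_dist(c, centers[1])) for c in cells)
        best = min(best, wcss)
    return best


def shuffle_each_gene(cells, rng):
    """Null data: permute every gene independently across cells, keeping each
    gene's distribution but removing the covariation between genes."""
    columns = [list(col) for col in zip(*cells)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def permutation_p_value(cells, n_perm=99, seed=0):
    """Share of null datasets that split at least as tightly as the real data."""
    rng = random.Random(seed)
    observed = two_means_wcss(cells, rng)
    null = [two_means_wcss(shuffle_each_gene(cells, rng), rng) for _ in range(n_perm)]
    as_tight = sum(1 for value in null if value <= observed)
    return (as_tight + 1) / (n_perm + 1)


def make_toy_cells(seed=1, n_per_group=30, n_genes=5):
    """Synthetic cells: two groups whose mean expression differs in every gene."""
    rng = random.Random(seed)
    low = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_per_group)]
    high = [[rng.gauss(3.0, 1.0) for _ in range(n_genes)] for _ in range(n_per_group)]
    return low + high


print(permutation_p_value(make_toy_cells()))
```

A small p-value indicates that the observed split is tighter than expected when genes vary independently. Real analyses would typically use more permutations and a clustering method suited to sparse, high-dimensional count data.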
A critical and challenging question is how to represent a transcriptomic taxonomy. One natural approach is to adopt a hierarchical framework, a task to which cluster analysis is well suited. This approach follows the historical tradition of using cladistics to classify organisms, assuming common ancestors in their evolution and with synapomorphies (shared derived traits) among related clades. While statistical clusters do not presume any hierarchy in the structure of the data, biological systems have a temporal evolution, which is one of their essential features. Indeed, the evolutionary history or development of a neural circuit implies earlier stages which are often less specialized and represent common ancestors of later states, making natural the structured classification of cell types as a hierarchical tree. A hierarchical classification of cell types is thus particularly suited both to establishing cell type homologies across species (revealed through shared hierarchical clustering across species) and to revealing species-specific cell types through objective criteria. While a hierarchical organization appears to mirror developmental principles and spatiotemporal organization in the neocortex, this may not be universally true for other brain regions, organs or across evolution to the same degree. In that case, complex inclusion/exclusion and probabilistic class relationships are not well represented hierarchically and may be more amenable to graph-based or other set-theoretic constructions. This operational definition of cell types is particularly applicable to transcriptomic profiling, where the dimension of the underlying space is large, the variance comparatively high, and where competing approaches have largely identified common cell type patterns. Despite these successes, relatively little progress has been made on the rigorous definition of transcriptomic classes, and the description of intra- and inter-class variability.
Moreover, there is often disagreement as to when to consider two cells as belonging to the same group, or two groups as being sufficiently different to justify subdivision. This issue is compounded by the possibility that some transcriptomic clusters represent a state of a cell type (for example, a pathological, developmental or functional state), rather than a fixed cell type. In this case, a finer distinction among clusters may have gone too far. In fact, aside from its biological meaningfulness, one advantage of casting the cell type classification as a cladistic one is that the lumping/splitting distinction remaps itself as a distinction between different levels of the hierarchical tree, since one could split a group into subgroups at a lower level of the hierarchy to reflect data obtained in different physiological or developmental conditions. Indeed, the space of the transcriptomes for cortical cell types may be visualized as a complex high-dimensional landscape with isolated peaks of expression for a given cell type but also valleys and gradients between more weakly defined classes, which could be described alternatively as types or states. Unfortunately, approaches to transcriptome cluster identification do not typically take this complexity into account when forming cell type classes. While presently there is much work being done in visualization of data via clustering, a robust statistical framework that enables a quantitative definition of cell type (or tendency to be a type) is still missing. The development of a rigorous probabilistic or statistical framework for cell type definitions would greatly assist with the interpretation of large data sources and in the identification of those attributes most relevant (and non-redundant) to cell type classification and function.
Ideally, this framework should provide a quantitative definition of a cell type that is independent of the methods used by clustering analysts and would include a description of quantitative metrics such as resolution, complexity, variability, uniqueness, and association of variables with other attributes. Within this statistical definition of cell type there are two alternative approaches to find and test the validity of clusters. One is “hard” clustering, with clearly defined borders between clusters and with each cell strictly assigned to a particular type. Alternatively, in “soft” (or “fuzzy”) clustering, any given cell has a particular probability of belonging to a particular cluster. Despite the probabilistic nature, inter- and intra-cluster distance may still be defined for outcome validation. Also, projections to low-dimensional spaces to ease clustering visualization are possible. Although most cluster analyses performed today use hard definitions, we foresee that the biological diversity of cells in the nervous system, and the discontinuous variation found in many different types of data, including transcriptomics, as described by our landscape metaphor, are more flexibly captured with a probabilistic criterion. There is much opportunity for progress in deriving statistical and probabilistic models of cell type, models for describing data modality covariation, effective utilization of linkage correspondence methods between data modalities, and informatics frameworks for elucidating cell type structure and taxonomy. Ultimately, the consensus description of cell types may form a continuum taken to the cladistic limits, beginning with hard and ending with soft distinctions among cell types, with an ambiguous transition between these extremes.
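The difference between hard and soft assignment can be shown with a short sketch. It assumes, purely for illustration, that each cluster is an equally weighted isotropic Gaussian with a shared variance around a known centroid, so that membership probabilities follow from squared distances; the function names, centroids and variance are hypothetical.

```python
import math


def soft_assign(cell, centroids, variance=1.0):
    """Membership probabilities of one cell across clusters.

    Assumes equal-weight isotropic Gaussians with a shared variance,
    which is an illustrative simplification.
    """
    logits = [-sum((x - m) ** 2 for x, m in zip(cell, center)) / (2.0 * variance)
              for center in centroids]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def hard_assign(cell, centroids):
    """Index of the single most probable cluster (ties go to the first)."""
    probs = soft_assign(cell, centroids)
    return max(range(len(probs)), key=probs.__getitem__)


# Two toy centroids in a two-gene space (illustrative values only).
toy_centroids = [[0.0, 0.0], [4.0, 0.0]]
for cell in ([0.5, 0.0], [2.0, 0.0], [3.5, 0.0]):
    print(cell, soft_assign(cell, toy_centroids), hard_assign(cell, toy_centroids))
```

A cell lying midway between the two centroids receives a probability of 0.5 for each cluster under the soft scheme, whereas the hard scheme forces it into one of them. This is the kind of ambiguity that the landscape metaphor above describes.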
A unified taxonomy and nomenclature of cortical cell types
Using such operational definitions of cell types, a data-driven transcriptomic classification of cortical cell types should allow the creation of a formal unified cell type classification, ontology and nomenclature system, whose principles are generalizable to any biological system. Following the genetic paradigm proposed, there are many lessons to learn from genomics. For example, the classification could be iteratively updated and refined as data accumulate, much like builds of the genome or transcriptome, which changed dramatically in the early years and have become increasingly stable. To accomplish this, a coherent, principled nomenclature system is essential. As in current gene nomenclature, many aliases exist linking cell types to commonly used terminology relating to cellular anatomy or other phenotypes. This nomenclature should be portable across species, much as current gene symbols refer to orthologous genes. For the classification to be as useful as the genome has been, computational tools need to be developed that allow researchers to quantitatively map their data to this reference classification, conceptually similar to the BLAST alignment tools used to map sequence data. In addition, this classification should aim to be an ontological taxonomy (i.e. describing the actual biological reality of the data, rather than simply reflecting the statistical structure of the data itself), and follow hierarchical cladistics for the reasons described above. It should also aim to be a consensus classification that incorporates the richness of data accumulated by different groups, and be presented in a curated output that is public and easily accessible, with revisions managed by a curation committee of experts. Creation of such an ontology is a serious project in data organization that can build on prior efforts in cell ontologies. It seems likely that many of these types will vary in a somewhat continuous fashion across cortical areas and possibly also across species. The ontological system therefore needs to be able to accommodate gradients. Likewise, the classification system should also have a temporal component to capture the developmental progression from progenitor cell division to a terminally differentiated state.
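The idea of mapping new data onto a reference classification can be sketched as follows. This is only a toy illustration under stated assumptions: the reference entries (`type_1` to `type_3`), their lineages and their four-gene mean profiles are hypothetical, and Pearson correlation with a fixed `min_score` threshold is one simple similarity measure among many, not the method of any existing tool.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no information for matching
    return cov / (sd_x * sd_y)

# Hypothetical reference: each leaf type has a path through the hierarchy and
# a mean expression profile over the same ordered set of marker genes.
REFERENCE = {
    "type_1": {"lineage": ("neuron", "GABAergic"), "profile": [9.0, 1.0, 0.5, 7.0]},
    "type_2": {"lineage": ("neuron", "GABAergic"), "profile": [8.0, 0.5, 6.0, 1.0]},
    "type_3": {"lineage": ("neuron", "glutamatergic"), "profile": [0.5, 9.0, 1.0, 2.0]},
}

def map_to_reference(query, reference=REFERENCE, min_score=0.8):
    """Find the reference type whose profile best correlates with the query.

    Returns (best_type, lineage, score), or (None, None, score) when even the
    best match falls below min_score, so weak matches are flagged, not forced.
    """
    ranked = sorted(
        ((pearson(query, entry["profile"]), name) for name, entry in reference.items()),
        reverse=True,
    )
    score, name = ranked[0]
    if score < min_score:
        return None, None, score
    return name, reference[name]["lineage"], score

print(map_to_reference([8.5, 1.2, 0.4, 6.5]))  # closely resembles type_1
print(map_to_reference([1.0, 1.0, 9.0, 9.0]))  # resembles no reference type
```

Returning the lineage together with the best-matching leaf reflects the hierarchical organization of the taxonomy: a query that cannot be placed confidently at the finest level could, in a fuller implementation, still be reported at a higher branch.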
In addition to species-specific classification, a key goal should be the creation of a comparative “consensus” classification representing the alignment of cell types across species. Hand in hand with this taxonomy we propose the adoption of a formal standardized nomenclature, which we view as an essential step to organize the knowledge. An old Basque proverb says, “a name is necessary for something to be”, and a similar Chinese saying holds that “the beginning of wisdom is to call things by their right names”. Drawing on the long anatomical tradition of the field, and taking advantage of the fact that humans are visual animals (which makes images easier to remember than a list of marker genes), one possibility that is attractive despite potential drawbacks is to incorporate in this nomenclature older descriptive anatomical terms when possible (for example chandelier, double-bouquet, basket, Martinotti and pyramidal), to seek consistency with the vast literature on neocortical cell types. To name higher level branches of hierarchical trees, one could combine names, as in species taxonomy, with one name that describes the genus and a second that describes the species. Whether this tradition should be followed for new undescribed cell types is debatable, and ultimately adopting a non-morphologically based nomenclature could make it more flexible, more easily applicable across species, and also compatible with other tissues outside the cortex or the brain. Given the explosion of scRNA-seq-based atlasing efforts now under way, developing a nomenclature convention that works for cortex, brain and even the whole body is an essential problem to be solved. This taxonomy will only be useful and successful if adopted by the community.
In addition to the nomenclature, a series of research tools should be developed, ideally by a community consortium, to facilitate similar experimental access to these cell types by the broader range of investigators. We envision molecular and genetic tools, such as standard sets of antibodies and RNA probes to identify key molecular markers for each cell type, as well as cell or mouse lines that are used as resources for the entire community. Statistical tools to enable direct comparisons among datasets, and mapping of new datasets to reference datasets, are essential. Finally, as described below, an open informatics backbone needs to be developed as an essential part of the taxonomy.
A knowledge environment platform for community data aggregation
While one can view the transcriptomic classification as an outcome of the genomics revolution, one could also bring to bear the capability of another recent technological revolution, the internet. If we start from the premise that it is unlikely that a biologically comprehensive and widely accepted cell type classification will be completed for several years, then a natural question is whether we have the necessary tools in place to support this effort. In addition to functional annotation, further refinement of transcriptomic and computational methods, the invention of new methods for measuring structure, function and connectivity, and the acquisition of massive amounts of data will all be needed. As is well known, there is tremendous inefficiency in the scientific process with respect to deriving and retaining value from data, and contemporary scientific publications and presentations are at odds with a dynamic and growing body of knowledge with an intended integrated outcome. Modern internet applications and community environments are much better suited to our goal, and there are several features of this approach worth adopting in a way that could harness the power of the community of researchers more effectively. We propose that cell type classification and nomenclature could, in addition to its intrinsic scientific and medical value, simultaneously serve as a framework platform to accumulate data from the field, as a community effort. There are different possibilities for such a community taxonomy platform. For example, one could use an open website, where cortical cell types are listed and annotated, Wikipedia-like, by users. The site could look like a spreadsheet, with rows in the matrix representing each cell type and columns the different phenotypical features that characterize them.
For example, in a given row, after a tentative name proposed for the cell type, the categories that could be annotated would include “alternative names”, “molecular features”, “morphological features”, “functional features”, “developmental origins” and “comparative aspects”, among others. Although a Wikipedia-style model might be a convenient way of summarizing types and their supporting evidence, we think that a more useful scenario, and one that can more efficiently inform successive versions of the classification, is to invest in a larger effort to create a dynamic community knowledge framework. The appropriate data structure we propose for such a community platform, initially based on a transcriptomics cell type taxonomy, but then incorporating information from many sources, is a Knowledge Graph (Figure 3). A knowledge graph is a data structure where the nodes represent categories (in this case, cell types, using transcriptomics as the initial scaffold) and the links, or edges, between them represent their statistical relations (which can be expressed as conditional probabilities). This is represented in a multidimensional space, defined by the different metrics used to measure the nodes (in this case, the different attributes of data associated with each cell type). Through a process of community aggregation of both raw data and metadata, this graph automatically updates itself, following conventional optimization algorithms, as new data can change the relative position and distance of nodes with respect to one another. In this fashion, this graph could serve as the backend infrastructure for the taxonomy described above, since the incorporation of new data could enable finer (or different) groupings of cells. As the data organization would be non-hierarchical, it would not by itself build the taxonomy, but would enable data aggregation that would serve for the next iteration of cell type clustering. The taxonomy and nomenclature, on the other hand, would inform the graph, determining its nodes. Another version of such a graph would indicate where similarities and dissimilarities occur between pairs of species. The cortical cell knowledge graph could start small, initialized by standardized transcriptomics data, and become multimodal as different types of data, such as connectomics or other CAP databases, become accessible. This standardized database could be powered by open source algorithms and managed and curated by database administrators. The proposed cell type knowledge framework would represent a living and updatable resource that maintains an actively derived and flexible ontology of cortical cell types, benefitting from present active ontology efforts.
It would be a dynamic database with query capability, but it would only accept peer-reviewed published data in a standardized fashion and nomenclature, providing a common denominator for the research in the field, integrating quantitative and qualitative cell type classification and allowing for updates subject to validation and review. The knowledge framework would utilize computational engines that allow new data to be compared and users to query the current state of cell type understanding from the perspective of their new data. This neuronal classifier should be able to assign the most likely type to multi- or uni-modal datasets based on similarities to the current framework’s knowledge. In addition to supporting literature reference, the dynamic framework might include online forums for scientific discussion and education. Ultimately, a cell type community knowledge framework would be a dynamic and living resource that researchers, clinicians, and educators would refer to as the benchmark resource for cell types in the brain and could also infuse a spirit of collaborative endeavour in this often competitive field.
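A knowledge graph of this kind can be sketched in a few lines of Python. The sketch below is a toy under explicit assumptions: the class `CellTypeKnowledgeGraph`, its methods, the label strings (such as `t-type:X`) and the citation strings are all hypothetical, and the edges store simple co-occurrence counts from which conditional probabilities are estimated, rather than the output of any optimization algorithm.

```python
from collections import defaultdict

class CellTypeKnowledgeGraph:
    """Toy knowledge graph: nodes are categories, edges hold co-occurrence counts."""

    def __init__(self):
        self.node_counts = defaultdict(int)   # node -> number of supporting cells
        self.edge_counts = defaultdict(int)   # (source, target) -> joint count
        self.sources = defaultdict(set)       # node -> citations supporting it

    def add_observation(self, labels, citation):
        """Record one cell annotated with several category labels.

        `labels` may combine a transcriptomic type with labels from other
        modalities; `citation` stands in for the peer-reviewed source.
        """
        labels = set(labels)
        for a in labels:
            self.node_counts[a] += 1
            self.sources[a].add(citation)
            for b in labels:
                if a != b:
                    self.edge_counts[(a, b)] += 1

    def conditional(self, target, given):
        """Estimate P(target | given) from the counts aggregated so far."""
        n_given = self.node_counts.get(given, 0)
        if n_given == 0:
            return 0.0
        return self.edge_counts.get((given, target), 0) / n_given

    def neighbours(self, node):
        """List categories linked to `node`, strongest relation first."""
        linked = [b for (a, b) in self.edge_counts if a == node]
        return sorted(linked, key=lambda b: self.conditional(b, node), reverse=True)

# Hypothetical aggregated data: three cells of one transcriptomic type,
# two described as chandelier cells and one as a basket cell.
graph = CellTypeKnowledgeGraph()
graph.add_observation({"t-type:X", "morphology:chandelier"}, "hypothetical ref 1")
graph.add_observation({"t-type:X", "morphology:chandelier"}, "hypothetical ref 1")
graph.add_observation({"t-type:X", "morphology:basket"}, "hypothetical ref 2")

# Two of the three t-type:X cells carry the chandelier label.
print(graph.conditional("morphology:chandelier", given="t-type:X"))
print(graph.neighbours("t-type:X"))
```

Because every new observation only increments counts, the estimated relations between nodes shift as data are aggregated, which is the sense in which such a graph updates itself. Keeping a citation with each node mirrors the requirement that only peer-reviewed published data be accepted.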
A community-based taxonomy of cortical cell types
In summary, we think that the field of neocortical studies is ready for a synthetic, principled classification of cortical cell types, based on single-cell transcriptomic data, anchored on quantitative criteria that operationally define cell clusters based on their statistical (and probabilistic) grouping, and expressed as a hierarchical tree. Although initially molecularly driven, this taxonomy should be revised and modified as other CAP datasets become available, becoming a true multimodal classification of cortical cell types. We view this core classification as potentially valid for all mammalian species, and also likely at least partly applicable to homologous structures in other vertebrates, as a broad framework to encapsulate evolutionary conservation with species specialization. Indeed, only with such a systematic approach to comparing cell types across species will it be possible to understand how cell type diversity evolved in the cerebral cortex. In addition, we propose that the community input to support this taxonomy and enable future revisions of it is channelled into an open platform, such as a knowledge graph, as it is
Figure 3: Proposed community framework: taxonomy, database, and Cell Type Knowledge Graph. A. Profiling with scRNAseq and combining transcriptomic and anatomic features yields B. Cell type taxonomy scaffold, forming a genetic basis for cell type characterization. C. Cell type measurable entities form annotations including multimodal electrophysiology, neuron morphology, epigenetic and other genome level annotations and supporting literature citations. This information is combined into a Cell Type Knowledge Graph as a graphical database supporting effective information organization. D. Knowledge graph architecture supports the enrichment of scientific discovery and consistent building on cell type knowledge as a Complete, Accurate, and Permanent (CAP) resource.
Acknowledgements: This document resulted from the group discussions at the FENS/Brain Prize meeting “The Necessity of Cell Types for Brain Function” that took place in Copenhagen, Denmark on October 7-10, 2018. We thank the FENS and Brain Prize Foundation and staff for help and the Lundbeck Foundation for support. This paper is dedicated to the memory of S. Brenner.
References
1. Tasic, B., et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature, 72-78 (2018).
2. Mayer, C., et al. Developmental diversification of cortical inhibitory interneurons. Nature, 457-462 (2018).
3. Ramón y Cajal, S. La Textura del Sistema Nervioso del Hombre y los Vertebrados (Moya (Primera Edicion), Madrid, 1899).
4. Szentágothai, J. The neuron network of the cerebral cortex: a functional interpretation. Proc. R. Soc. Lond. B, 219-248 (1978).
5. Ascoli, G.A., et al. Petilla terminology: nomenclature of features of GABAergic interneurons of the cerebral cortex. Nat Rev Neurosci, 557-568 (2008).
6. Ramón y Cajal, S. Sur la structure de l'écorce cérébrale de quelques mammifères. La Cellule, 124-176 (1891).
7. Lorente de Nó, R. La corteza cerebral del ratón. Trab. Lab. Invest. Bio. (Madrid), 41-78 (1922).
8. Peters, A. & Jones, E.G. Cerebral Cortex (Plenum, New York, 1984).
9. Lund, J.S. Anatomical organization of macaque monkey striate visual cortex. Annu. Rev. Neurosci., 253-288 (1988).
10. Gilbert, C.D. Microcircuitry of visual cortex. Annu. Rev. Neurosci., 217-247 (1983).
11. Jones, E.G. & Diamond, I.T. (eds.). The Barrel Cortex of Rodents, 446 (Plenum, New York, 1995).
12. Fairen, A., De Felipe, J. & Regidor, J. Nonpyramidal neurons. in Cerebral Cortex, Vol. 1 (eds. Peters, A. & Jones, E.G.) 201-253 (Plenum, New York, 1984).
13. Mountcastle, V.B. Perceptual Neuroscience: The cerebral cortex (Harvard University Press, Cambridge, Mass., 1998).
14. Kawaguchi, Y. & Kubota, Y. GABAergic cell subtypes and their synaptic connections in rat frontal cortex. Cereb. Cortex, 476-486 (1997).
15. Tosches, M.A., et al. Evolution of pallium, hippocampus, and cortical cell types revealed by single-cell transcriptomics in reptiles. Science, 881-888 (2018).
16. Cauli, B., et al. Molecular and physiological diversity of cortical nonpyramidal cells. J Neurosci, 3894-3906 (1997).
17. Tsiola, A., Hamzei-Sichani, F., Peterlin, Z. & Yuste, R. Quantitative morphologic classification of layer 5 neurons from mouse primary visual cortex. The Journal of comparative neurology, 415-428 (2003).
18. Guerra, L., et al. Comparison between supervised and unsupervised classifications of neuronal cell types: a case study. Dev Neurobiol, 71-82 (2011).
19. Markram, H., et al. Interneurons of the neocortical inhibitory system. Nat Rev Neurosci, 793-807 (2004).
20. Markram, H., et al. Reconstruction and Simulation of Neocortical Microcircuitry. Cell, 456-492 (2015).
21. DeFelipe, J., et al. New insights into the classification and nomenclature of cortical GABAergic interneurons. Nat Rev Neurosci, 202-216 (2013).
22. Shepherd, G.M., Marenco, L., Hines, M., Migliore, M., McDougal, R.A., Carnevale, N.T., Newton, A.J., Surles-Zeigler, M. & Ascoli, G. Neuron names: a gene- and property-based name format, with special reference to cortical neurons. Front. Neuroanat. (2019).
23. McGarry, L.M., et al. Quantitative classification of somatostatin-positive neocortical interneurons identifies three interneuron subtypes. Front Neural Circuits, 12 (2010).
24. Butt, S.J.B., et al. The Temporal and Spatial Origins of Cortical Interneurons Predict their Physiological Subtype. Neuron (2005).
25. Yuste, R. Origin and classification of neocortical interneurons. Neuron, 524-527 (2005).
26. Kepecs, A. & Fishell, G. Interneuron cell types are fit to function. Nature, 318-326 (2014).
27. Arendt, D., et al. The origin and evolution of cell types. Nat Rev Genet, 744-757 (2016).
28. Dumitriu, D., Cossart, R., Huang, J. & Yuste, R. Correlation between axonal morphologies and synaptic input kinetics of interneurons from mouse visual cortex. Cereb. Cortex, 81-91 (2007).
29. Jiang, X., et al. Principles of connectivity among morphologically defined cell types in adult neocortex. Science (2015).
30. Wheeler, D.W., et al. Hippocampome.org: a knowledge base of neuron types in the rodent hippocampus. Elife (2015).
31. Mihaljevic, B., Larrañaga, P., Benavides-Piccione, R., Hill, S., DeFelipe, J. & Bielza, C. Towards a supervised classification of neocortical interneuron morphologies. BMC Bioinformatics, 511 (2018).
32. Shekhar, K., et al. Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics. Cell, 1308-1323 e1330 (2016).
33. Tasic, B., et al. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat Neurosci, 335-346 (2016).
34. Paul, A., et al. Transcriptional Architecture of Synaptic Communication Delineates GABAergic Neuron Identity. Cell, 522-539 e520 (2017).
35. Yager, T.D., Nickerson, D.A. & Hood, L.E. The Human Genome Project: creating an infrastructure for biology and medicine. Trends Biochem Sci , et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell, 1202-1214 (2015).
38. Klein, A.M., et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell, 1187-1201 (2015).
39. Zheng, G.X., et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun, 14049 (2017).
40. Habib, N., et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat Methods, 955-958 (2017).
41. Bush, E.C., et al. PLATE-Seq for genome-wide regulatory network analysis of high-throughput screens. Nat Commun, 105 (2017).
42. Garber, M., Grabherr, M.G., Guttman, M. & Trapnell, C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods, 469-477 (2011).
43. Stuart, T. & Satija, R. Integrative single-cell analysis. Nat Rev Genet, 257-272 (2019).
44. White, J.G., Southgate, E., Thomson, J.N. & Brenner, S. The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond. (Biol)., 1-340 (1986).
45. Sulston, J.E. & Horvitz, H.R. Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Dev Biol, 110-156 (1977).
46. Ecker, J.R., et al. The BRAIN Initiative Cell Census Consortium: Lessons Learned toward Generating a Comprehensive Brain Cell Atlas. Neuron, 542-557 (2017).
47. Regev, A., et al. The Human Cell Atlas. Elife (2017).
48. Zeng, H. & Sanes, J.R. Neuronal cell-type classification: challenges, opportunities and the path forward. Nat Rev Neurosci, 530-546 (2017).
49. Fu, M. & Zuo, Y. Experience-dependent structural plasticity in the cortex. Trends Neurosci, 177-187 (2011).
50. Gerfen, C.R., Paletzki, R. & Heintz, N. GENSAT BAC cre-recombinase driver lines to study the functional organization of cerebral cortical and basal ganglia circuits. Neuron, 1368-1383 (2013).
51. He, M., et al. Strategies and Tools for Combinatorial Targeting of GABAergic Neurons in Mouse Cerebral Cortex. Neuron, 1228-1243 (2016).
52. Nowakowski, T.J., et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science, 1318-1323 (2017).
53. Peng, Y.R., et al. Molecular Classification and Comparative Taxonomics of Foveal and Peripheral Cells in Primate Retina. Cell (2019).
54. Franke, K., et al. Inhibition decorrelates visual feature representations in the inner retina. Nature, 439-444 (2017).
55. Greene, M.J., Kim, J.S., Seung, H.S. & EyeWirers. Analogous Convergence of Sustained and Transient Inputs in Parallel On and Off Pathways for Retinal Motion Computation. Cell Rep, 1892-1900 (2016).
56. Tsukamoto, Y. & Omi, N. Classification of Mouse Retinal Bipolar Cells: Type-Specific Connectivity with Special Reference to Rod-Driven AII Amacrine Pathways. Front Neuroanat, 92 (2017).
57. Kim, J.S., et al. Space-time wiring specificity supports direction selectivity in the retina. Nature, 331-336 (2014).
58. Martersteck, E.M., et al. Diverse Central Projection Patterns of Retinal Ganglion Cells. Cell Rep, 2058-2072 (2017).
59. Durruthy-Durruthy, R., et al. Reconstruction of the mouse otocyst and early neuroblast lineage at single-cell resolution. Cell, 964-978 (2014).
60. Trapnell, C., et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol, 381-386 (2014).
61. Shalek, A.K., et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature, 363-369 (2014).
62. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res, 1491-1498 (2015).
63. Cao, J., et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science, 1380-1385 (2018).
64. Winnubst, J., Bas, E., Ferreira, T., Wu, Z., Economo, M.N., Edson, P., Arthur, B.J., Bruns, C., Rokicki, K., Schauder, D., Olbris, D.J., Murphy, S.D., Ackerman, D.G., Arshadi, C., Baldwin, P., Blake, R., Elsayed, A., Hasan, M., Ramirez, D., Dos Santos, B., Weldon, M., Zafar, A., Dudman, J.T., Gerfen, C.R., Hantman, A.W., Korff, W., Sternson, S.M., Spruston, N., Svoboda, K. & Chandrashekar, J. Reconstruction of 1,000 projection neurons reveals new cell types and organization of long-range connectivity in the mouse brain. bioRxiv (2019).
65. Hodge, R.D., et al. Conserved cell types with divergent features between human and mouse cortex. bioRxiv (2018).
66. He, Z., et al. Comprehensive transcriptome analysis of neocortical layers in humans, chimpanzees and macaques. Nat Neurosci, 886-895 (2017).
67. Zeisel, A., et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science, 1138-1142 (2015).
68. Bakken, T.E., et al. Equivalent high-resolution identification of neuronal cell types with single-nucleus and single-cell RNA-sequencing. bioRxiv, 239749 (2017).
69. Hobert, O. Terminal Selectors of Neuronal Identity. Curr Top Dev Biol, 455-475 (2016).
70. Arendt, D., Bertucci, P.Y., Achim, K. & Musser, J.M. Evolution of neuronal types and families. Curr Opin Neurobiol, 144-152 (2019).
71. Romesburg, H.C. Cluster analysis for researchers (Lifetime Learning, Belmont, CA, 1984).
72. Wiley, E.O. & Lieberman, B.S. Phylogenetics: Theory and Practice of Phylogenetic Systematics (Wiley-Blackwell, 2011).
73. Andrews, T.S. & Hemberg, M. Identifying cell populations with scRNASeq. Mol Aspects Med, 114-122 (2018).
74. Kiselev, V.Y., et al. SC3: consensus clustering of single-cell RNA-seq data. Nat Methods, 483-486 (2017).
75. Harris, K.D., et al. Classes and continua of hippocampal CA1 inhibitory neurons revealed by single-cell transcriptomics. PLoS Biol, e2006387 (2018).
76. Saeys, Y., Inza, I. & Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics, 2507-2517 (2007).
77. Pereira, P., Gama, J. & Pedroso, J. Hierarchical Clustering of Time-Series Data Streams. IEEE Transactions on Knowledge and Data Engineering, 615-627 (2008).
78. Altschul, S., Gish, W., Miller, W., Myers, E. & Lipman, D. Basic Local Alignment Search Tool. Journal of Molecular Biology, 403-410 (1990).
79. Bard, J., Rhee, S.Y. & Ashburner, M. An ontology for cell types. Genome Biol, R21 (2005).
80. Osumi-Sutherland, D. Cell ontology in an age of data-driven cell classification. BMC Bioinformatics, 558 (2017).
81. Firestein, S. Ignorance: How it drives science (Oxford University Press, New York, 2012).
82. Kendal, S. & Creen, M. An Introduction to Knowledge Engineering (Springer-Verlag, London, 2007).
83. Kejriwal, M., , ISBN & Springer.