LRP2020: Machine Learning Advantages in Canadian Astrophysics
K.A. Venn, S. Fabbro, A. Liu, Y. Hezaveh, L. Perreault-Levasseur, G. Eadie, S. Ellison, J. Woo, JJ. Kavelaars, K.M. Yi, R. Hlozek, J. Bovy, H. Teimoorinia, S. Ravanbakhsh, L. Spencer
E015: Machine Learning Advantages in Canadian Astrophysics
Machine Learning Advantages in Canadian Astrophysics
The application of machine learning (ML) methods to the analysis of astrophysical datasets is on the rise, particularly as computing power grows and sophisticated algorithms become more powerful and accessible. As the field of ML enjoys a continuous stream of breakthroughs, its applications demonstrate its great potential, ranging from increases in analysis speed by factors of up to tens of millions (e.g., modeling of gravitational lenses or analysing spectroscopic surveys) to solutions of previously unsolved problems (e.g., foreground subtraction or efficient telescope operations). The number of astronomical publications that include ML has been steadily increasing since 2010. With the advent of extremely large datasets from a new generation of surveys in the 2020s, ML methods will become an indispensable tool in astrophysics. Canada is an unambiguous world leader in the development of the field of machine learning, attracting large investments and skilled researchers to its prestigious AI research institutions. This provides a unique opportunity for Canada to also be a world leader in the application of machine learning to astrophysics, and to foster the training of a new generation of highly skilled researchers.
Authors
Kim Venn (University of Victoria),
Sébastien Fabbro (NRC-Herzberg),
Adrian Liu (McGill),
Yashar Hezaveh (Université de Montréal),
Laurence Perreault-Levasseur (Université de Montréal, MILA),
Gwendolyn Eadie (University of Toronto),
Sara Ellison (University of Victoria),
Joanna Woo (Simon Fraser University),
JJ Kavelaars (NRC-Herzberg),
Kwang Moo Yi (University of Victoria),
Renée Hložek (University of Toronto),
Jo Bovy (University of Toronto),
Hossen Teimoorinia (NRC-Herzberg),
Siamak Ravanbakhsh (McGill, MILA),
Locke Spencer (University of Lethbridge)
Applying machine learning (ML) methods to the analysis of astrophysical datasets has become extremely popular, particularly as computing power grows and sophisticated algorithms become more accessible. Large observational surveys, as well as simulations, have provided massive datasets for developing ML tools with astrophysical applications, making ML ever more attractive. As the field of ML enjoys a continuous stream of breakthroughs, its applications in astronomy demonstrate its great potential. Examples include increases in analysis speed by factors of up to tens of millions, for instance in modeling gravitational lenses or analysing spectroscopic surveys, as well as new solutions to previously unsolved problems, such as foreground subtraction or efficient telescope operations. With the advent of extremely large datasets from a new generation of surveys in the 2020s, ML methods will become an indispensable tool in astrophysics.
Today's widespread use of artificial intelligence (AI), and more specifically of ML (one class of approaches to AI), can be traced back to the beginning of the 'deep learning revolution', when the 2012 ImageNet challenge was won by Krizhevsky et al. Originally published in 2009, ImageNet is an image database containing more than 14 million hand-annotated images. Starting in 2010, the ImageNet project ran an annual competition called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which algorithms competed to identify the objects present in a subset of images from the dataset. In 2012, a group from the University of Toronto submitted a deep convolutional neural network (CNN) architecture called AlexNet (which is still used in research today) that outperformed previous models and its nearest contenders by a margin of more than 10% in accuracy. One of the strengths of this model compared with its brittle competitors, and one that enabled its widespread use across many fields, is the data-driven nature of its training: while the training was performed on a specific dataset, the architecture of the model itself is completely general and could, in principle, be used to learn from practically any visual dataset.

arXiv [astro-ph.IM] Oct

Figure 1: Astronomy papers that include machine learning methods in the abstract or title since 2010. Data from the Astrophysics Data System (ADS).

By 2014, all the high-scoring ImageNet competitors were using deep architectures. Since then, other high-profile datasets have been introduced for deep learning by Google, Microsoft, and the Canadian Institute for Advanced Research. Today, convolutional neural networks are omnipresent beyond the confined world of academic challenges, in applications including, but not limited to, Facebook, self-driving cars, natural language processing, fraud detection, robot navigation, medical diagnostics, targeted marketing, and gaming. They are readily applicable to anything requiring image, audio, or video processing.

CNNs and many other deep learning methods learn patterns between nearby pixels on ascending levels of abstraction to produce outputs of interest, using thousands to millions of computations at each level. In CNNs, each of these levels, referred to as layers, processes its input by convolving it with a number of filters. The resulting maps are then fed to the following layer as input. After a number of these layers, the output of the last layer is interpreted as the output of the network. The values of the filters, also called network weights, are learned through a process known as training, in which pairs of correct input-output examples are shown to the network. Given enough training examples, these networks can make accurate predictions on previously unseen examples using these learned parameters.

While CNNs date back to well before the beginning of the 21st century, the keys to their recent success were not developed until the last decade: proper initialization (Glorot & Bengio, 2010), advanced activation functions (Nair & Hinton, 2010), better solvers (Kingma & Ba, 2015), and the use of Graphics Processing Units (GPUs) for high-performance computing.
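The layer-and-training process described above can be illustrated with a minimal, dependency-free Python sketch. It is not AlexNet, StarNet, or any production CNN: the function names (`conv2d`, `cnn_layer`, `train_filter`), the 3×3 filter size, and the learning rate are illustrative assumptions. `cnn_layer` convolves an input with several filters and applies a ReLU activation, and `train_filter` learns one filter's weights by gradient descent from input-output example pairs. For simplicity the training step omits the nonlinearity, so the gradient of the mean squared error can be written by hand; real frameworks obtain it by automatic differentiation.

```python
import random


def conv2d(image, kernel):
    """'Valid' 2D convolution (strictly, cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out


def relu(feature_map):
    """Rectified linear activation, applied element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def cnn_layer(image, kernels):
    """One convolutional layer: one ReLU-activated feature map per filter."""
    return [relu(conv2d(image, k)) for k in kernels]


def train_filter(pairs, size=3, lr=0.1, epochs=200, seed=0):
    """Learn a size x size filter from (input, target) pairs by gradient descent on MSE."""
    rng = random.Random(seed)
    k = [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(size)]
    for _ in range(epochs):
        for image, target in pairs:
            pred = conv2d(image, k)
            n = len(pred) * len(pred[0])
            grad = [[0.0] * size for _ in range(size)]
            for i in range(len(pred)):
                for j in range(len(pred[0])):
                    err = pred[i][j] - target[i][j]
                    # d(MSE)/d k[a][b] accumulates err * (input pixel under k[a][b]).
                    for a in range(size):
                        for b in range(size):
                            grad[a][b] += 2.0 * err * image[i + a][j + b] / n
            for a in range(size):
                for b in range(size):
                    k[a][b] -= lr * grad[a][b]
    return k


# Usage: targets come from a known "true" filter, and training should recover it.
rng = random.Random(42)
true_kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
pairs = []
for _ in range(5):
    img = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(8)]
    pairs.append((img, conv2d(img, true_kernel)))
learned = train_filter(pairs)
print([[round(v, 2) for v in row] for row in learned])
```

Because the targets here were generated by a known filter, training should recover it closely; with real data there is no "true" filter to compare against, and the learned weights are instead judged by performance on held-out examples.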
Consequently, although the 1990s saw a number of attempts at using neural networks for, e.g., spectral classification and parameter estimation (e.g., Bailer-Jones et al., 1998; Gulati et al., 1994) and for star/galaxy separation through algorithms such as SExtractor (Bertin & Arnouts, 1996), these efforts met with only limited success. Deep learning only started to gain popularity in the astronomy community around 2016, after the CASCA Mid-Term Review of LRP2010. As Fig. 1 shows, ML applications in astronomy have since skyrocketed, now reaching about 2 papers per day.

Even with this recent growth in popularity, astronomy still represents a golden, mostly unexplored opportunity for machine learning, given the existence of relatively large, homogeneous datasets for a range of applications from imaging to spectroscopy. Deep learning is known to benefit immensely from data (the "unreasonable effectiveness of data"), as demonstrated by Sun et al. (2017). (On the history of ImageNet, see https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/.)

Figure 2: The CNN used by StarNet to analyse SDSS-APOGEE spectra (see Fabbro et al., 2018).

The Sloan Digital Sky Survey (SDSS) DR15 alone provides over 170 TB of data (of this, ∼…% is raw or intermediate data), where >… TB is APOGEE spectra (raw, reduced, or synthetic), >… TB is eBOSS photometric data (raw or reduced), >… TB is eBOSS spectroscopic data, and >… TB is MaNGA spectra (raw or reduced 2D individual exposures and 3D summary stacks). The datasets associated with the Large Synoptic Survey Telescope (LSST) and the Square Kilometre Array (SKA) are expected to dwarf these; e.g., LSST is expected to obtain 20 TB per night, and the SKA is estimated to run at 160 TB per second. These latter data rates are too high for all of the raw data to be recorded, much as at the LHC (∼… TB per second), forcing astronomers to automate the selection of only the most interesting events to record.

One of the early successful applications of deep learning in astronomy was in the field of strong gravitational lensing, where CNNs were used for both lens finding (e.g., Lanusse et al., 2017; Jacobs et al., 2017; Petrillo et al., 2018; Pourrahmani et al., 2018; Schaefer et al., 2018) and lens modeling (Hezaveh et al., 2017; Perreault Levasseur et al., 2017; Morningstar et al., 2018, 2019), automating and accelerating the inference of lens parameters by many orders of magnitude (about ten million times faster on a single graphics processing unit), without loss of accuracy compared with time- and resource-consuming traditional methods. In the coming years, hundreds of thousands of new gravitational lenses from large surveys (e.g., Euclid, LSST) and from existing and new facilities (e.g., ALMA, JWST, TMT) will provide an opportunity to transform this field.
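The automated selection of only the most interesting events, mentioned above for the SKA and the LHC, can be sketched as a streaming filter with bounded memory. This is a generic illustration, not the trigger logic of either facility: it assumes each incoming event already carries an "interestingness" score from some upstream model (hypothetical here) and keeps only the `k` highest-scoring events, using a min-heap from Python's standard `heapq` module so that memory stays fixed however long the stream runs.

```python
import heapq
import random


def keep_most_interesting(stream, k):
    """Return the k highest-scoring (score, event_id) pairs seen in `stream`.

    Memory use is O(k) regardless of stream length; each event costs O(log k).
    """
    kept = []  # min-heap: kept[0] is the least interesting event retained so far
    for score, event_id in stream:
        if len(kept) < k:
            heapq.heappush(kept, (score, event_id))
        elif score > kept[0][0]:
            # Evict the weakest retained event in favour of the new one.
            heapq.heapreplace(kept, (score, event_id))
    return sorted(kept, reverse=True)


# Usage: a simulated stream of 200,000 events with random (hypothetical) scores,
# of which only ten are recorded.
rng = random.Random(0)
events = ((rng.random(), i) for i in range(200_000))
recorded = keep_most_interesting(events, k=10)
for score, event_id in recorded:
    print(f"event {event_id}: score {score:.6f}")
```

A real-time pipeline might instead apply a score threshold within each time window; which choice is appropriate depends on the instrument and the science case.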
The further development of these promising machine-learning-based analysis methods, and their implementation in analysis pipelines, will let us fully exploit the wealth of these upcoming data and circumvent difficulties faced by traditional maximum likelihood methods, for example by mapping the distribution of matter on small scales to high precision, opening a new window for testing dark matter models.

An additional early success of ML in astrophysics has been the use of a deep neural network architecture to analyse both observed and synthetic stellar spectra (see Fig. 2). Fabbro et al. (2018) showed that the stellar parameters (temperature, gravity, and metallicity) for the entire SDSS-III APOGEE spectral database can be determined with machine learning in only a few seconds, with precision and accuracy similar to those of the APOGEE pipeline. The data-driven ML approach was further developed by Leung & Bovy (2019) to compute chemical abundances for over 15 elements with higher precision than the APOGEE (or other) data reduction pipelines. The data-driven ML model was also modified to analyse nearly 1 million spectra from LAMOST (Zhang et al., 2019) with excellent performance and significantly improved precision.
In the upcoming era of spectroscopic surveys, ML will be invaluable for fast, efficient, and precise analyses.

We summarize a few more important and successful applications of ML in the astronomical literature:
• The detection of complex, rare, or new structures in surveys, such as gravitational lenses (Lanusse et al., 2018), supernovae (Moss, 2018), or galaxy mergers (Ackermann et al., 2018);
• Surrogate modelling of computationally expensive simulations, where ML models can act as extremely fast interpolators or emulators, for example in planetary atmosphere simulations (Zingales & Waldmann, 2018), in producing HI maps that would otherwise require extremely expensive hydrodynamical simulations (Zamudio-Fernandez et al., 2019), or in extrapolating large-scale cosmological structure formation (Yin et al., 2019);
• The morphological classification of galaxies (Dieleman et al., 2015);
• The removal of systematic contaminants from data, for example terrestrial radio frequency interference (RFI) in radio telescope data (Kerrigan et al., 2019), foregrounds (such as dust) in CMB data (Aylor et al., 2019) and in intensity mapping data, or other difficult backgrounds (cosmic rays in space-based imaging, bright star halos in wide fields);
• The Photometric LSST Astronomical Time Series Classification Challenge (The PLAsTiCC team et al., 2018), an open ML challenge hosted on Kaggle with over 1000 participating teams, many of whose members were not astronomers. This challenge illustrated the power of astronomical data to engage diverse groups interested in machine learning techniques and methodologies;
• Other examples related to time-domain astronomy challenges, reviewed by Hložek (2019).

(On LHC data rates, see https://home.cern/science/computing/processing-what-record.)

Figure 3: Comparison of ML techniques. Near the origin, where the dataset is small, different representations are not very distinct, and improvements in feature engineering yield better performance. As data sizes increase, deep neural networks are able to benefit from their over-parameterized redundancy and capture representations better. Thus, the volume and variety of the available data must be considered when selecting the best algorithms for an application.
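The surrogate-modelling idea in the list above (a cheap model standing in for an expensive simulation) can be sketched in a few lines. To keep the example self-contained, the "expensive simulation" is a hypothetical stand-in (a finely sampled numerical integral), and the emulator is a simple piecewise-linear interpolator rather than a neural network. The cited works use ML models as emulators, but the workflow sketched here is the same: run the simulator on a training grid once, then answer new queries from the cheap model.

```python
import bisect
import math


def expensive_simulation(x, n_steps=2000):
    """Hypothetical stand-in for a costly simulation: midpoint-rule integral of
    sin(t) * exp(-t / 5) from 0 to x."""
    h = x / n_steps
    return h * sum(math.sin((i + 0.5) * h) * math.exp(-(i + 0.5) * h / 5.0)
                   for i in range(n_steps))


class GridEmulator:
    """Piecewise-linear emulator built from simulator runs on a sorted grid."""

    def __init__(self, simulator, grid):
        self.xs = sorted(grid)
        self.ys = [simulator(x) for x in self.xs]  # the only expensive step

    def __call__(self, x):
        # Locate the grid cell containing x (clamped to the grid's ends).
        i = bisect.bisect_right(self.xs, x) - 1
        i = min(max(i, 0), len(self.xs) - 2)
        x0, x1 = self.xs[i], self.xs[i + 1]
        t = (x - x0) / (x1 - x0)
        return (1.0 - t) * self.ys[i] + t * self.ys[i + 1]


# Usage: "train" on 101 simulator runs over 0 <= x <= 10, then query off-grid.
emulator = GridEmulator(expensive_simulation, [0.1 * i for i in range(101)])
for x in (0.05, 2.37, 7.91):
    print(f"x={x}: emulator={emulator(x):.5f}  simulation={expensive_simulation(x):.5f}")
```

Each emulator query costs a binary search and one interpolation instead of thousands of function evaluations; an ML emulator trades this simple scheme for one that scales to many input parameters.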
In the coming decade, several astronomy programs will provide very large and homogeneous datasets that are ideal for ML applications. Accessing and analysing such large datasets will be a challenge, given that data volumes will increase more rapidly than network speeds. ML methods benefit greatly from larger data sets (see Fig. 3), and will require access to storage and computing resources in order to fully exploit the potential of multiple surveys. Machine learning infrastructure platforms, with fast data access, easy workflow management, and deployment of complex software pipelines, have been key to the success of AI in industry (e.g., see statements from The State of AI 2019 and AI for the Real World, https://hbr.org/2018/01/artificial-intelligence-for-the-real-world). However, for astronomy research, access to resources and computing infrastructure has not kept up with the fast pace of development observed in industry data science platforms. At minimum, a more robustly resourced CANFAR science portal is required to enable this. Tight cooperation with national research infrastructure, and also with the new AI Research Institutes (next section), will be necessary to take full advantage of the revolutions coming from AI research. This leap will be particularly important in the next decade with very large scale surveys:
• Euclid will be launched by ESA in 2022 and will observe 15,000 deg² of the darkest sky, free of contamination by light from our Galaxy and our Solar System. About 10 billion sources will be observed, where ∼…
• The LSST will begin in 2020 in Chile with an 8.4-meter telescope and a 3-gigapixel camera to produce a wide-field astronomical survey of the universe.
It will photograph the entire available sky every few nights, collecting 20 TB of raw image data every night, which will be processed in near-real time to produce alert notifications of new and unexpected astronomical events. The data will be reprocessed annually to create a 500 PB data archive by the end of the 10-year mission. Canada has ambitions to join the LSST project by constructing a 20 PB public imaging and catalog archive as an in-kind contribution to the project. This science archive will be coupled with the Canadian LSST Alert Science Platform (CLASP), which will supply the computing hardware and platform interface needed to make optimal use of the LSST Alert Stream.
• The Canadian HI Observatory and Radio transient Detector (CHORD) is a proposed next-generation radio interferometer; it and the SKA are examples of the large radio instruments expected in the next decade. Both instruments will combine signals received from hundreds to thousands of antennas spread over several thousand kilometres. Already, similar projects such as the Canadian Hydrogen Intensity Mapping Experiment (CHIME) and the Hydrogen Epoch of Reionization Array (HERA) require custom hardware for on-site processing, with data flows on the order of 13 terabits per second, more than all of Canada's internet traffic. Even after compression, these telescopes have data rates of order ∼… TB per day. The anticipated data rates will only become more extreme with CHORD and the SKA, posing huge problems in both storage and analysis that will necessitate even better algorithms for real-time signal processing, compression, and extraction of signals of interest. Some estimates for the SKA, for example, suggest that the array could generate an exabyte per day of raw data, possibly compressed to ∼10 PB/day.
• Multi-object spectroscopic facilities (e.g., ESO-4MOST, INT-WEAVE, Subaru-PFS, SDSS-V, and the Canadian-led Maunakea Spectroscopic Explorer) are planned for the 2020s to obtain high-quality spectra of several thousand objects simultaneously. These spectra must be converted into physical parameters before they can be used to reveal the structure and dynamics of the Milky Way galaxy, the nature of dark matter and dark energy, the formation and evolution of galaxies, and the structure of the cosmos. These surveys will each deliver more than … million spectra over the whole sky, requiring intensive data analysis, including of time-domain information.

Big data has become an essential tool for scientific progress, underpinning world-class research across all disciplines, including astronomy.
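The radio data rates quoted above are easier to compare once they are put in common units. The short calculation below converts the CHIME/HERA-scale flow of 13 terabits per second into petabytes per day, and the SKA estimate of an exabyte per day into terabytes per second. It assumes decimal SI prefixes (1 TB = 10^12 bytes) and 8 bits per byte, and introduces no figures beyond those in the text.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_BYTE = 8
TB, PB, EB = 1e12, 1e15, 1e18  # decimal SI prefixes, in bytes


def bits_per_second_to_bytes_per_day(bits_per_second):
    """Convert a data flow in bits/s into bytes/day."""
    return bits_per_second / BITS_PER_BYTE * SECONDS_PER_DAY


def bytes_per_day_to_bytes_per_second(bytes_per_day):
    """Convert a data volume per day into an average rate in bytes/s."""
    return bytes_per_day / SECONDS_PER_DAY


# CHIME/HERA-class raw data flow: 13 terabits per second.
chime_per_day = bits_per_second_to_bytes_per_day(13e12)
print(f"13 Tb/s is about {chime_per_day / PB:.1f} PB per day")

# SKA estimate: an exabyte per day of raw data, compressed to ~10 PB per day.
ska_raw_per_second = bytes_per_day_to_bytes_per_second(1 * EB)
compression_factor = (1 * EB) / (10 * PB)
print(f"1 EB/day is about {ska_raw_per_second / TB:.1f} TB per second")
print(f"1 EB/day to 10 PB/day is a compression factor of {compression_factor:.0f}")
```

On these assumptions, 13 Tb/s corresponds to roughly 140 PB per day, and an exabyte per day to an average of roughly 12 TB per second, so the quoted SKA compression is about a factor of 100.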
Canadian Government Support:
Through Budget 2018, the Canadian Government provided funding in support of a Digital Research Infrastructure Strategy to deliver more open and equitable access to advanced computing and big data resources for researchers across Canada. The National Research Council's (NRC's) world-class scientists are an important part of this strategy, helping to advance Canadian academia and industry through cutting-edge innovation and reinforcing Canada's research capabilities and strengths. Their facilities, expertise, and networks help convene strategic, large-scale national teams committed to innovation. Budget 2018 also lowered the cost of partnering with the NRC, so that more small and medium-sized enterprises, colleges, and universities have been able to make use of its services. For ML applications in astronomy, this plays out through NRC-Herzberg, particularly through new hires and strategic investments in the Canadian Astronomy Data Centre (CADC) and the development of the Canadian Advanced Network For Astronomical Research (CANFAR).
NRC-Herzberg Astronomy & Astrophysics (CADC/CANFAR):
Current expertise in ML applications in Canadian astronomy is centred at, or includes significant collaboration with, NRC Herzberg Astronomy and Astrophysics (NRC-HAA). Through CANFAR (operated by NRC's CADC), astronomers in Canada have direct and easy access to cyberinfrastructure, a growing number of GPUs, fast access to resources, fast and interactive processing, and safe storage. These are necessary components for a research program to remain competitive with the rest of the world in the era of big data. The data that CADC has been archiving for over a decade also provide long baselines for validating new ML methods, as well as testing data sets. Furthermore, NRC-HAA and CADC are a working model of the value and innovation possible when expertise from computer science, engineering, data science, and other fields is brought together and focused on astronomy.
AI Research Institutions (MILA, Vector, AMII):
An equally important resource and new opportunity in Canada for the development of ML applications in astronomy lies in our world-leading AI research centres: the Montreal Institute for Learning Algorithms (MILA, Montreal), the Vector Institute for Artificial Intelligence (Vector, Toronto), and the Alberta Machine Intelligence Institute (AMII, Edmonton). These are not-for-profit research and educational institutions, closely partnered with universities and partially supported through government funding. MILA and Vector were established as independent institutes in 2017, while AMII has a 15-year history and has only recently begun to specialize in ML applications. All of these institutions work with industry, start-ups, incubators, and accelerators to advance AI research and drive its application, adoption, and commercialization across Canada. Increasing our links with these specialized institutions, which are recognized as world leaders in ML/AI, can give Canadian astronomers a competitive advantage in the era of big data science.
New Methodologies:
ML is a different way of thinking about computing problems and big data compared with traditional methods. It goes beyond feature engineering to improve performance (e.g., detecting anomalies or reducing data uncertainties), and we will need to move beyond simply replacing traditional legacy programs with ML applications. For example, astronomers at UVic and SFU are exploring ML for the detection of new astronomical phenomena (e.g., Teimoorinia et al., 2016), and even as a tool to learn new physics. This kind of innovative thinking is well suited to the ML research centres and their think-tank methodologies, when combined with astronomers who know the motivating science questions. ML also opens the possibility of, e.g., controlling observatories, synchronizing observing facilities so that there is more collaboration in observational planning, and even collating multi-wavelength datasets (imaging, spectroscopy, wavelength regions, spatial and spectral resolution, time domain, polarization) in ways we have yet to imagine.
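Anomaly detection, mentioned above as one way ML goes beyond feature engineering, can be illustrated with one of the simplest unsupervised techniques: scoring each object by its mean distance to its k nearest neighbours in feature space, so that objects unlike anything else in the sample receive high scores. This is a generic sketch and not the method of Teimoorinia et al. (2016); the two-dimensional "feature vectors" below are hypothetical (they could stand for, e.g., two measured colours), and the brute-force search over all pairs would need a spatial index or a learned representation for survey-scale data.

```python
import math


def knn_anomaly_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest other points.

    Larger scores mean more isolated, i.e. more anomalous, objects.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Usage: a tight 5 x 5 cluster of "ordinary" objects plus one unusual object.
ordinary = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
catalogue = ordinary + [(3.0, 3.0)]
scores = knn_anomaly_scores(catalogue)
most_unusual = max(range(len(catalogue)), key=scores.__getitem__)
print("most anomalous object:", catalogue[most_unusual], "score:", round(scores[most_unusual], 3))
```

The choice of k trades sensitivity to isolated single objects (small k) against robustness to small groups of similar outliers (larger k).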
Advantages for Canadian Astronomical Research:
Currently, several groups in Canada are leading the development of ML applications for astronomical research. In Montreal, astronomers at McGill and UdM are working with researchers at MILA on ML applications for the analysis of Euclid data to study cosmological structure (Hezaveh, 2019). In Victoria, several groups of astronomers are working with researchers at NRC on ML applications ranging from extragalactic studies (Bottrell et al., 2019) to stellar spectroscopic surveys (Bialek et al., 2019). In Toronto, several groups of astronomers are using ML techniques in data science analyses (Leung & Bovy, 2019) and investigating ML for wavefront reconstruction and prediction in adaptive optics (e.g., GIRMOS; Swanson et al., 2018). ML could even be embedded in the next generation of Canadian astronomical instrumentation (e.g., TMT-IRMOS) and surveys (MSE). It also provides excellent opportunities for collaborations with industry (autonomous telescopes are a lot less dangerous than autonomous vehicles), as well as outstanding opportunities for knowledge transfer and contributions to the Canadian economy. Finally, this new technique is known to draw top students and researchers from interdisciplinary fields, making it an ideal training tool for the skilled researchers of the future.
The growth of machine learning applications in astronomical research has been remarkable. This new data analysis technique requires us to think differently about astronomical problems, develop new approaches to data science, and collaborate extensively with researchers in computer science, engineering, and other fields. It attracts some of the top highly qualified personnel (HQP) from around the world, and can foster the training of a new generation of highly skilled researchers in astronomy and beyond. The methodologies are highly transferable, and astronomy provides some of the largest, most homogeneous data sets for application testing and development. The knowledge transfer is extraordinary, such that ML developments in astronomy can be directly related to industrial innovation and the Canadian economy, fields in which the Canadian ML research institutes (MILA, Vector, and AMII) are leading. For Canadian astronomy itself, ML could be embedded in the next generation of astronomical instrumentation and surveys, such that ML pipelines and scientific analysis tools make it possible for science to come directly from the telescope.
1: How does the proposed initiative result in fundamental or transformational advances in our understanding of the Universe?
As CMB observations have become increasingly precise, inflation and the simplest ΛCDM models continue to successfully explain the evolution of the universe. However, the nature of the key components of these models remains unknown. Discovering the physical nature of the field(s) driving inflation, the source of the apparent accelerated expansion of the universe (dark energy), and the particle(s) constituting dark matter are the primary goals of modern cosmology.
ML as a component of future large scale surveys:
In the coming decade, a new generation of large surveys and telescopes (e.g., Euclid, LSST, CHIME) promises to transform cosmology as we know it. The large volumes of data produced by these instruments will allow significant improvements in the precision measurement of cosmological parameters, potentially allowing us to pinpoint specific models of dark energy and the particle properties of dark matter, while opening new windows into the physics of the universe. The analysis of these data requires new computational and statistical methods, not only for the best possible classification schemes but also for scientific inference.
ML integration with physical simulations:
Simulations of complex astrophysical processes, combined with instrumental simulations, face a computational bottleneck. The interplay between ML and physical models will continue to generate creative ways to accelerate simulations, to incorporate domain knowledge into ML models so that they are more data efficient, and to better match simulations to real data by learning to bridge the gap between synthetic and observed data. The interpretability of ML methods is a new, active area of research (e.g., Doshi-Velez & Kim, 2017), and astronomy-specific applications in which we already have physical understanding could provide interesting test cases for interpretability studies.
ML as tool for discovery:
In the medical field, for example, ML has revealed previously unknown gender differences in retinal images (Poplin et al., 2018). This illustrates the tremendous potential of ML as a direct vehicle for discovery. Astronomers are only beginning to explore the power of ML to answer unsolved mysteries, making ML an exciting frontier field. New ML techniques, an expanding software ecosystem, and specialized ML hardware will permit drastic improvements in the efficiency of analysing complex simulations and large data sets. ML is expected to emerge as a necessary tool to fully exploit the vast majority of survey data in the coming decade. In this sense, ML will be not only an asset but a necessity.
2: What are the main scientific risks and how will they be mitigated?
One of the main critiques of ML is that the algorithms are black boxes, resulting in uncertain interpretations and poor or misguided error analyses. These risks can be mitigated through comparisons with traditional methods (which typically have a different set of uncertainties), benchmark data sets, the incorporation of physics into ML models, and statistical methods. Additional tools, such as saliency maps, have been developed by the larger ML community to aid interpretation. Thus, the risks are not specific to astrophysics, and they often develop into interesting and active research topics, such as adversarial or interpretable ML, with promising results. While ML-associated risks can be mitigated, training our community in best practices and collaborating with ML researchers will be essential to ensuring that these techniques are used correctly.

There is evidence that the Canadian astronomy community wants more focused training in statistics (see the LRP2020 white paper Astrostatistics in Canada), and a desire for training in ML techniques likely exists as well. Collaborating with ML researchers and with statisticians will improve our ability to interpret ML outputs. Recent developments in deep learning suggest that larger ML models trained on larger data sets perform better, which could become a risk for Canadian scientific institutions if we do not have access to the large infrastructure needed to train such models. Mitigating this risk would involve co-developing, with ML researchers, resource-efficient and high-performance methods for astrophysical research, while also securing access to ML-friendly digital infrastructure.
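Saliency maps, mentioned above as an interpretation aid, measure how sensitive a model's output is to each input pixel, typically through the gradient of the output with respect to the input. The dependency-free sketch below estimates that gradient by central finite differences rather than by the automatic differentiation an ML framework would use. `toy_model` is a hypothetical stand-in for a trained network, with weights chosen so that one pixel dominates, which lets us check that the saliency map points to it.

```python
import math


def toy_model(pixels):
    """Hypothetical 'trained' model: tanh of a weighted sum over a 5-pixel input."""
    weights = [0.1, 0.1, 2.0, 0.1, 0.1]  # pixel 2 is deliberately the most influential
    return math.tanh(sum(w * p for w, p in zip(weights, pixels)))


def saliency_map(model, pixels, eps=1e-5):
    """|d model / d pixel_i| for every pixel, by central finite differences."""
    saliency = []
    for i in range(len(pixels)):
        up, down = list(pixels), list(pixels)
        up[i] += eps
        down[i] -= eps
        saliency.append(abs(model(up) - model(down)) / (2.0 * eps))
    return saliency


# Usage: the largest entry should identify the pixel the model relies on most.
image = [0.2, 0.2, 0.2, 0.2, 0.2]
smap = saliency_map(toy_model, image)
print([round(s, 3) for s in smap])
print("most salient pixel:", smap.index(max(smap)))
```

Finite differences need two model evaluations per pixel, which is why practical saliency maps for real images rely on a single backward pass through the network instead.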
3: Is there the expectation of and capacity for Canadian scientific, technical or strategic leadership?
Canada is a world leader in the development of the field of machine learning, attracting large investments and skilled researchers to its prestigious AI Research Institutes (MILA, Vector, AMII). This provides unique opportunities for Canada to also be a leader in the application of ML in the field of astrophysics.
4: Is there support from, involvement from, and coordination within the relevant Canadian community and more broadly?
Established Canadian astronomers associated with large-scale surveys (McGill, UdM, UVic, UBC, Toronto, UWO, etc.) directly benefit from ML. The Canadian ML-astro community is growing, with a few recent hires (NRC, UdM). The Canadian astronomical instrumentation community is also starting to look into ML applications for adaptive optics, RFI filtering, and image processing. One could easily imagine ML being introduced into observatory operations, such as queue scheduling, remote observing, and inter-observatory operations. The recent wave of new cross-disciplinary institutes at Canadian universities (McData at McMaster, Matrix at UVic, the UBC Data Science Institute) has attracted funds and students, and the astrophysics community should seek more participation, driven by our ML interests and involvement in big data surveys. Altogether, the Canadian community is involved in various ML activities but could benefit from more unification through organised meetings, collaboration with Canadian ML hubs and the Canadian ML Research Institutes, and industry participation.
5: Will this program position Canadian astronomy for future opportunities and returns in 2020-2030 or beyond 2030?
Machine learning methods have seen rapid expansion and breakthroughs in the past few years, often led by the Canadian AI Research Institutes (MILA, Vector, and AMII). The simplicity of implementing ML models, their remarkable power in finding complex patterns, and their adaptability to many different problems have resulted in their widespread use in different fields. Astronomy provides new clean and robust applications for assessing their efficacy, as well as new challenges. Conversely, these methods provide astronomers with new tools for improving speed and precision in data analysis. Meanwhile, the data rates of upcoming larger facilities, like the SKA, will be another factor of 10-100 higher. ML has two distinct advantages: (1) speed and automation, and (2) deep learning networks can learn complex, high-order, non-Gaussian priors from their training data. Together, these can result in higher precision and accuracy for many astronomical problems, and for these two reasons ML methods could transcend current methods in astrophysics and cosmology, putting Canadian astronomy at a strong advantage in the 2020s and beyond. By ramping up now in the 2020s, we can expect Canadian astronomy to be in a leadership position throughout the 2030s.
6: In what ways is the cost-benefit ratio, including existing investments and future operating costs, favourable?
Current costs reflect the scientific choices of Canadian astronomy faculty and the commitments made at NRC-CADC. However, we advocate for infrastructure that can keep up with these growing needs. Compute Canada is currently not able to keep pace with demand, with limited storage capacity and short-term per-user and per-group accounts. ML also requires some overhead from users, so modules that make it easier and quicker for users to access the benefits of ML should be developed. We suggest that this be done in coordination with the ML leaders in Canada (e.g., MILA, Vector, AMII), as experts in computational techniques and statistical methodologies for big data.
7: What are the main programmatic risks and how will they be mitigated?
Technical readiness:
The research pace in ML has been extremely rapid in the past few years, and it has been difficult even for ML researchers to keep up. While it is not clear that this pace will continue, a lag in astrophysical implementations could mean losing our current competitive edge. Avoiding that loss requires an increase in human resources (e.g., accessible user modules and computational help for applications), but also an increase in infrastructure (e.g., fast computers and processors, large and safe storage, easy access to and interaction with computing and data resources).
Governance plan:
It is not yet clear where the raw data from the next generation of large surveys will be stored, but even the reduced data sets will be larger than anything we are used to. If storage is not in Canada, then we may only have the facilities to store processed data, a situation made more complex by issues related to data rights. For spectroscopy (SDSS, PFS, MSE), raw data and reduced spectra will likely be held by the survey institutions, but users should be able to download entire spectral libraries and parameter tables. While this is not as computing-intensive, these can still be large files that require significant storage space. Modules to help users get the benefits of ML will need to be centralized (e.g., at NRC-CADC), and therefore we will need a platform for sharing these across all Canadian universities. Thus, we will need to develop a governance model around safe, reliable, and large data storage for Canadian astronomers to leverage our current investments and reach our scientific goals.
8: Does the proposed initiative offer specific tangible benefits to Canadians, including but not limited to interdisciplinary research, industry opportunities, HQP training, EDI, outreach or education?
Interdisciplinary and industrial opportunities:
ML is interdisciplinary by nature, being led by computer science and statistics and involving engineers, scientists, and medical researchers. The benefits of investing in ML are widespread for Canadians, ranging from technological transfers with research into self-driving cars, natural language processing, fraud detection, robot navigation, medical diagnostics, targeted marketing, and gaming, to knowledge transfer opportunities as students and researchers work on scientific/astronomical problems whose solutions can then be applied to industry. Because ML skills and training are in high demand and highly transferable, the field is very attractive to new students. This means that some of the top young researchers will be attracted to astronomical problems, providing our Canadian astronomical community with outstanding researchers.
Equity, Diversity, and Inclusivity:
ML is attracting women and researchers from other underrepresented groups. We note that nearly 50% of the co-authors on this white paper are women, and most of us have (co-)supervised women students on ML-related projects. Furthermore, our ML applications and big data requirements can provide outstanding research and employment opportunities for young Canadians and attract top international researchers to Canada. One group whom we need to reach out to more, with respect to astronomical research in general and ML specifically, is Indigenous communities in Canada.
Outreach and education:
Astronomy has always captured the public's imagination like no other science. The public is already familiar with ML in the form of image searches and facial recognition on Google and Facebook, and will be naturally fascinated by the use of ML in astronomy. This initiative has the potential to inspire schools and young people to learn programming skills early.