La Serena School for Data Science: multidisciplinary hands-on education in the era of big data
A. Bayo, M. J. Graham, D. Norman, M. Cerda, G. Damke, A. Zenteno, C. Ibarlucea
Education and Heritage in the Era of Big Data in Astronomy, Proceedings IAU Symposium No. 367, 2021
Inst. de Física y Astronomía, Universidad de Valparaíso, Chile; Núcleo Milenio de Formación Planetaria (NPF); California Institute of Technology; NSF's OIR Lab, Tucson, AZ; Inst. of Biomedical Sciences & Center for Medical Informatics and Telemedicine, Universidad de Chile; Biomedical Neuroscience Institute, Santiago, Chile; Instituto de Investigación Multidisciplinar en Ciencia y Tecnología, Universidad de La Serena; Association of Universities for Research in Astronomy (AURA).
Abstract.
La Serena School for Data Science is a multidisciplinary program with six editions so far and a constant format: during 10-14 days, a group of ∼30 students (15 from the US, 15 from Chile and 1-3 from Caribbean countries) and ∼
Keywords. astroinformatics, statistics, databases, surveys, machine learning, big data.
1. Introduction: the challenge
The volume and complexity of astronomical data continue to grow as the current generation of surveys comes online (Gaia, SDSS/APOGEE, etc.). Beyond these challenges, astronomers will need to work with giga-, tera-, and even petabytes of data in real time in the era of LSST. Large data sets pose the challenge of developing and using new tools for data discovery, interoperability and access, and analysis.
This framework also brings new opportunities for interdisciplinary research in applied mathematics, statistics, machine learning, and other areas under active development. Astronomy provides a sandbox where scientists from diverse fields can come together to address common challenges within the "Big Data" paradigm.
But of course, astronomy is not alone. Society's inexorable digitization of data and the rapidly evolving Internet are driving the need for a global transformation of data-intensive science in many fields. Indeed, "Big Data" now impacts nearly every aspect of our modern society, including retail, manufacturing, financial services, communications & mobile services, health care, life sciences, engineering, natural sciences, and the arts & humanities.
Clearly, our research leadership hinges on whether the next generation can be productive with the petabyte-scale data volumes generated in different domains. Unfortunately, skills related to "Big Data" and "Artificial Intelligence" (AI), including, for example, machine learning, are not taught widely enough in university curricula worldwide. This is particularly true in Chile, although a few initiatives have emerged in recent years, such as the astroinformatics initiative at Universidad de Chile. It is also true in the US outside of the main/top universities.
For instance, reports in Chile point to a deficit of thousands of highly trained AI-related professionals per year (as claimed in the "Política Nacional de Inteligencia Artificial" draft presented by the Chilean government in December 2020).
La Serena School for Data Science (LSSDS) emerged in 2013 under the leadership of AURA. Its aim is to cover part of this training gap through a combined effort of Chilean and US individuals and institutions.
2. A diverse school with a rich history
Since its beginning, LSSDS has targeted students either in their last years of undergraduate school or in the first years of graduate school, with a new pilot program involving high-school students. The school has welcomed students majoring or minoring in mathematics, statistics, physics, computer science, astronomy and, more recently, biology-related subjects.
The program is very intense. It lasts between 10 and 14 days, with constant interaction between the core faculty and the students. The teaching philosophy is project-oriented, and the time is split as follows:
- ∼34% on lectures, covering basic to intermediate statistics, the basics of Data Science, Machine Learning, distributed computing and databases;
- ∼23% on hands-on labs that consolidate the content of the lectures;
- the remaining ∼43% on the group project, with one of the faculty acting as mentor.
The transition between these three types of activity is gradual through the school. The first days are more lecture-heavy, and the last days focus solely on project time (Cabrera-Vives et al. 2017).
The school also offers opportunities for social interaction. The highlight is a visit to several of the AURA telescopes, with stargazing close to the summit.
The project groups, typically of four students, are purposely designed to maximise diversity among the students. The projects themselves are proposed by the core faculty. They tend to be structured in steps of increasing difficulty, with a possibly open-ended final goal. Some of these projects have in fact resulted in successful observing proposals and continued collaborations. Some students have even replicated parts of the school at their home institutions.
Diversity in the school has been pursued from the very beginning, starting with the student selection process. When sorting student applications, we aim for balance across:
- majors and minors;
- institutions, mixing students from big, well-known schools with those from less renowned ones;
- gender;
- nationality.
We believe that most of the program's success can be attributed to two aspects. The first is the inherent richness that comes from a multidisciplinary, diversity-conscious student (and faculty) selection process. The second is the strong commitment of the faculty.
Regarding the latter, more than half of the professors stay for the whole school, serving as lecturers, hands-on instructors, and mentors for the student group projects.
Another very relevant factor is the strong commitment and support from AURA, NOIRLab, the LSST Corporation, NSF, REUNA, CONICYT, CORFO and other Chilean institutions. From the very beginning, these institutions understood the need to train the new generation in data-driven problems, not only astronomers but scientists in general. Such problems particularly benefit from diversity and multidisciplinarity.
References
Cabrera-Vives, G., Reyes, I., Förster, F., Estévez, P. A., & Maureira, J. C. 2017