A Living Review of Machine Learning for Particle Physics
Matthew Feickert (1, ∗) and Benjamin Nachman (2, 3, †)
(1) Department of Physics, University of Illinois at Urbana-Champaign
(2) Physics Division, Lawrence Berkeley National Laboratory
(3) Berkeley Institute for Data Science, University of California
(Dated: February 5, 2021)

Modern machine learning techniques, including deep learning, are rapidly being applied, adapted, and developed for high energy physics. Given the fast pace of this research, we have created a living review with the goal of providing a nearly comprehensive list of citations for those developing and applying these approaches to experimental, phenomenological, or theoretical analyses. As a living document, it will be updated as often as possible to incorporate the latest developments. A list of proper (unchanging) reviews can be found within. Papers are grouped into a small set of topics to be as useful as possible. Suggestions and contributions are most welcome, and we provide instructions for participating.
Machine learning (ML) is a generic term used to describe any automated inference procedure, broadly defined. As such, machine learning plays a key role in nearly all areas of high energy physics (HEP). Traditionally, machine learning has been synonymous with “multivariate techniques”, with Boosted Decision Trees as the community favorite method and TMVA as the community favorite tool. The set of methods and tools commonly used in HEP has grown significantly in recent years as a result of the deep learning revolution.

With the rapid development of research at the intersection of machine learning and HEP, it is difficult to follow the latest developments. This makes it challenging for new researchers to integrate into the field and for seasoned practitioners to place their work in the context of the existing literature. To help address this challenge, we have created a review of ML for HEP with the goal of providing a nearly comprehensive list of citations for papers that develop and apply ML to experimental, phenomenological, or theoretical analyses. In order to be comprehensive and remain useful, this review is living in the sense that it is continuously updated and is open for community contributions.

The Living Review (https://github.com/iml-wg/HEPML-LivingReview) also includes a list of “normal” (unchanging) reviews within. The remainder of the references are organized into a small number of topics to make searching through them efficient for the user. Papers may be referenced in more than one category. The fact that a paper is listed in the review does not endorse or validate its content; that is for the community (and for peer review) to decide. Furthermore, the classification is a best attempt and may have flaws. Community input is requested if (a) we have missed a paper you think should be included, (b) a paper has been misclassified, or (c) a citation for a paper is not correct or the journal information is now available. The review is built automatically from the contents of the Git repository on GitHub using LaTeX-focused continuous integration services. After passing validation checks, PDF and Markdown versions, shown in Fig. 1 and Fig. 2 respectively, are automatically deployed through continuous delivery to a web-accessible area on GitHub. In addition to providing a living PDF hosted on GitHub, we also provide a corresponding BibTeX file that anyone can use when they write new papers. Please check back before you post your own paper to arXiv to ensure that you have the latest updates.

Note that this Living Review does not provide a review of machine learning in general. Some methods (e.g. Generative Adversarial Networks) have citations within the Living Review, but we encourage you to look elsewhere for original research and reviews in areas of pure and applied machine learning outside of HEP.

The purpose of this paper is to briefly introduce the structure of the Living Review (Sec. II) and describe how to contribute (Sec. III). The note ends with an outlook (Sec. IV) and conclusions (Sec. V). Furthermore, it will serve as an unchanging reference to the review, which may be useful in some cases.
Organizing papers into topics is critical for discoverability. Most papers do not provide keywords and often do not specify enough information in their title and/or abstract to be automatically categorized. Therefore, we have proposed a list of categories and manually place papers into groups. A single paper can be in more than one group. As with all parts of the review, the categories are alive and may change and expand as the field evolves. Categories include Classification, Regression, Generation, Anomaly Detection, and more. Furthermore, sub-categories are provided in some cases when there are multiple research directions within a particular category. We have also provided brief descriptions for each category and sub-category, as illustrated in Fig. 1.
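The grouping described above, in which one paper may appear under several categories and sub-categories, can be pictured with a small data structure. The following is only an illustrative sketch and not how the review itself stores its entries: the citation keys, the sub-category names, and the "Category/Sub-category" path convention are all hypothetical.

```python
from collections import defaultdict

# Hypothetical citation keys and sub-category names, for illustration only.
# The top-level names are categories mentioned in the text.
PAPER_CATEGORIES = {
    "example2020a": ["Classification/Subtopic A", "Generation"],
    "example2020b": ["Anomaly Detection"],
    "example2021c": ["Classification/Subtopic B", "Regression"],
}


def build_category_index(paper_categories):
    """Invert a paper -> categories mapping into category -> sorted papers."""
    index = defaultdict(list)
    for key, categories in paper_categories.items():
        for category in categories:
            index[category].append(key)
    return {category: sorted(keys) for category, keys in index.items()}


def papers_under(index, category):
    """Return papers filed under a category or any of its sub-categories."""
    found = set()
    for name, keys in index.items():
        if name == category or name.startswith(category + "/"):
            found.update(keys)
    return sorted(found)


index = build_category_index(PAPER_CATEGORIES)
print(papers_under(index, "Classification"))
```

Because the index is keyed by category, a paper listed under two categories simply appears in both lists, which mirrors the statement that a single paper can be in more than one group.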
FIG. 1. A snapshot of the PDF form of the review, which includes descriptions for each category and sub-category.

FIG. 2. A snapshot of the Markdown website form of the review, with topic papers hyperlinked to their references and, when available, DOIs.
In addition to being a living document that is updated on demand with the release of new publications, the review also benefits from community involvement. Anyone may submit a new paper or document to the review, as frequent contributors already have, in the form of a pull request (PR) to the review’s GitHub project. To help steer new contributions and ensure a smooth PR process and review with the maintainers, a contributions guide is located in the project’s Git repository in the form of a “CONTRIBUTING.md” document (a project staple in the Open Source community), seen partially in Fig. 3. The contributing guide gives detailed instructions and examples on the recommended procedures and software workflow for making revisions and additions, and it also addresses frequently asked questions that new contributors may have.
FIG. 3. A snapshot of the CONTRIBUTING.md document detailing the guidelines for contributions from non-maintainers to the review.
IV. FUTURE PLANS
Suggestions for new features can be submitted to the Living Review by creating a GitHub issue as documented in the project’s CONTRIBUTING.md. There are already several key features that we would like to add in the future, mostly related to various levels of automation. The most basic feature we want to add is automatic updating of paper references. Papers are mostly added to the Living Review when they are posted to arXiv, and journal references are currently only added in an ad hoc fashion. One way to implement this would be to synchronize with Inspire by following the links from the preprints. This will not work for all papers, as they are not all listed on arXiv and may not be listed on Inspire. The longer-term vision is for some parts of the daily update to be automated. It seems unlikely that this can be completely automated given the rapidly changing nature of the field (and thus what constitutes the field), but queries for some keywords may be able to catch a significant fraction of new papers posted to arXiv.
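As a rough illustration of the keyword queries mentioned above, the sketch below builds a query URL for the public arXiv API and applies a simple keyword filter to a title or abstract. It is a minimal sketch under stated assumptions, not a feature of the Living Review: the keyword list, the function names, and the default category are hypothetical choices, and no network request is made.

```python
from urllib.parse import urlencode

# Hypothetical keyword list; the Living Review does not prescribe one.
KEYWORDS = ["machine learning", "deep learning", "neural network"]

# Base URL of the public arXiv API query interface.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(keywords, category="hep-ph", max_results=50):
    """Build an arXiv API URL for the newest papers in `category`
    whose abstract contains any of the given keyword phrases."""
    keyword_clause = " OR ".join(f'abs:"{kw}"' for kw in keywords)
    params = {
        "search_query": f"cat:{category} AND ({keyword_clause})",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def matches_keywords(text, keywords):
    """Case-insensitive check for any keyword phrase in `text`."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


print(build_arxiv_query(KEYWORDS))
```

A filter of this kind would only flag candidate papers. Consistent with the text, deciding whether a candidate belongs in the review, and in which category, would still be a manual step.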
This paper has described the Living Review of Machine Learning for High Energy Physics. The review will continuously evolve as new papers are written in this area and we welcome and encourage community contributions to any aspect of the project. Machine learning holds great potential to significantly enhance the way we do HEP, both experimentally and theoretically, as is becoming well-documented by the growing literature in this area. We hope that the Living Review is a useful tool to keep track of this rapid progress.
We are grateful to the CERN Inter-Experimental LHC Machine Learning Working Group (IML) for supporting the Living Review initiative. We would like to particularly thank Loukas Gouskos, David Rousseau, Pietro Vischia, and Riccardo Torre, who helped us define the scope of the Living Review project and have graciously agreed to allow the review to be hosted on the IML GitHub. We are also grateful to everyone in the HEP community who has contributed to the review. We would also like to thank Martin Erdmann and Kyle Cranmer for their support and encouragement at the beginning of this project. BN is supported by the Department of Energy, Office of Science under contract number DE-AC02-05CH11231. MF is supported in part by the National Science Foundation, under cooperative agreement OAC-1836650.

A. Hoecker et al.