Integrating Machine Learning with HPC-driven Simulations for Enhanced Student Learning
Vikram Jadhao, JCS Kadupitiya
Intelligent Systems Engineering
Indiana University
Bloomington, Indiana 47408
{vjadhao, kadu}@iu.edu
Abstract—We explore the idea of integrating machine learning (ML) with high performance computing (HPC)-driven simulations to address challenges in using simulations to teach computational science and engineering courses. We demonstrate that an ML surrogate, designed using artificial neural networks, yields predictions in excellent agreement with explicit simulation, but at a fraction of the time and computing cost. We develop a web application on nanoHUB that supports both HPC-driven simulation and the ML surrogate method to produce simulation outputs. This tool is used both for in-classroom instruction and for solving homework problems in two courses covering topics in the broad areas of computational materials science, modeling and simulation, and engineering applications of HPC-enabled simulations. Evaluation of the tool via in-classroom student feedback and surveys shows that the ML-enhanced tool provides a dynamic and responsive simulation environment that enhances student learning. The improved interactivity with the simulation framework, in terms of real-time engagement and anytime access, enables students to develop intuition for the physical system behavior through rapid visualization of how output quantities vary with changes in inputs.
Index Terms—Machine Learning, HPC-driven Simulations, Computational Science, Scientific Computing
I. INTRODUCTION
The use of computational simulations is ubiquitous in investigating phenomena across a wide range of disciplines, including materials science and engineering, bioengineering, chemistry, chemical engineering, and physics. Simulations have enabled the understanding of microscopic mechanisms underlying several biological and material phenomena such as ion transport across the cell membrane, flow of polymeric liquids, stabilization of colloidal dispersions, and self-assembly of nanostructures [1]–[5]. Classical molecular dynamics (MD) simulations are an important class of simulation approaches that generalize to a broad range of material and chemical systems [1]. In the MD method, Newton’s equations of motion for a system of many particles are solved at each timestep to evolve the particle positions, velocities, and forces forward in time. In applications where the computational cost per timestep scales with the square of the total number of particles (the system size), for example when all pairwise interactions are computed explicitly, MD simulations incur high computational costs. These costs are typically mitigated by employing high performance computing (HPC) resources and parallel computing techniques such as OpenMP and MPI, which can dramatically enhance the performance of MD simulations in many cases.
Parallelized MD simulations make it possible to study the dynamics of systems with a large number of particles over a wide range of input parameters. In addition to enabling state-of-the-art research, these simulations can be employed as innovative educational tools for teaching material covered in computational science and engineering courses. However, even with a parallelization model suited to the size and complexity of the model system, MD simulations can take a relatively long time to furnish accurate information, from minutes to days depending on the specifics of the model system.
The primary contributors to these long turnaround times are the waiting time in a queue on a computing cluster combined with the actual runtime of the simulation. Consequently, the use of MD simulations has generally been limited to “outside-classroom” activities such as solving homework problems, where the simulation time requirements are easier to meet.
To use MD simulations during in-classroom sessions, the associated computational tool needs to be agile enough to overcome the following educational challenges:
• Provide simulation-based responses to student questions in real time.
• Make the explanation of underlying scientific concepts seamless through rapid access to accurate trends in the variation of simulation outputs.
• Perform simulation-based analysis of model system behavior synchronously with students, in real time.
• Provide a dynamic environment for students to perform in silico experiments during class and learn the concepts by visualizing the system response under different input conditions.
Addressing these educational challenges can improve the classroom teaching of computational science and engineering concepts and enhance student learning.
Motivated in part by these challenges, we recently introduced the idea of integrating machine learning (ML) methods with MD simulations to develop “ML surrogates” for MD simulations [6], [7]. We demonstrated that an artificial neural network (ANN) based regression model, trained on data produced by completed runs of a given HPC-accelerated MD simulation, can successfully imitate part or all of that MD simulation. We also showed that performance improvements of several orders of magnitude could be achieved by replacing traditional large-scale HPC simulations with ML surrogates. The central idea and approach were illustrated using MD simulations of ions in nanoconfinement [4].
The ML surrogate was found to accurately predict the ionic distributions in confinement, with an inference time more than 10,000 times shorter than the corresponding MD simulation runtime [7], [8].
Our earlier papers focused on the technical details of designing ML surrogates for MD simulations and evaluating the performance metrics associated with the proposed data-driven approach [7], [8]. In this paper, we explore the potential of employing these ML surrogates to address the aforementioned educational challenges and enhance the usability of simulation tools in education. We develop a nanoHUB web application with a GUI that supports both the ML surrogate and MD simulation methods to produce simulation outputs. The nanoHUB tool is used both for in-classroom teaching and for solving homework problems in two courses offered at Indiana University (IU). The ML surrogate built into the tool is employed extensively during in-classroom instruction to teach concepts such as self-assembly, ionic behavior near interfaces, nanoscale material design, modeling and simulation, and neural networks. The impact of the ML-enhanced tool on student learning is evaluated by conducting a survey following other assessment studies [9], [10]. Based on this educational evaluation, we find that the improved interactivity with the simulation framework, in terms of dynamic, real-time engagement and anytime access, enhances student learning of computational science concepts.
II. RELATED WORK
A. ML for enhancing simulations of material systems
Computational science and engineering is being transformed by the use of ML. In the area of materials simulation, ML techniques, and in particular methods based on deep neural networks, have been used to predict parameters, generate configurations, classify material properties, and design force fields [11]–[17]. More recently, ML has been used to design surrogate models that predict specific outcomes of simulations by bypassing part or all of the simulation. For example, the dissociation timescale of compounds was predicted using an ML surrogate for ab initio MD simulations that bypasses the time evolution of the particle trajectories [18]. Deep neural networks trained on HPC-generated simulation data were used as an efficient surrogate for molecular simulation to predict adsorption equilibria as a function of thermodynamic state variables [19]. Convolutional neural network based ML “emulators” have been developed to predict simulation outputs such as the power spectrum in biogeochemistry [20]. We have also developed surrogates that predict the outputs (ionic density profiles) of MD simulations of ions in confinement [6], [7] and, in another example, compute the forces at each timestep of an MD simulation to bypass the expensive force calculation step [21]. While these ML surrogates have enabled remarkable performance improvements that facilitate research investigations, their use in educational applications has remained relatively unexplored, despite their ability to produce outputs in real time and for a continuous range of input parameters. In this paper, we probe the potential of using ML surrogates for MD simulations to enhance student learning of topics in computational science and engineering.
B. Simulation caching
Generally, the approach to providing simulation output in real time is to store previous simulation results in a cache (simulation caching). For example, nanoHUB provides caching as a feature in computational tools created using its Rappture GUI [22]. Cached simulations provide a static environment with pre-selected parameters defining simulations that can be “looked up”. This environment offers limited exploration space, interactivity, and responsiveness to the student, because any input combination that was not simulated in advance has no cached result. To encourage and empower students to directly experiment with and explore the model system and associated phenomena, a new approach is needed that delivers an interactive, dynamic, and responsive simulation environment open to wide exploration. We show that ML surrogates are excellent candidates to fulfill this need.
III. BACKGROUND
A. Use of HPC-enabled simulations in education
One of us teaches two courses at IU that have been taken by undergraduate and graduate students with interests in diverse focus areas including nanoscale engineering, bioengineering, computer engineering, chemistry, and physics. The courses feature application-based learning of basic scientific computing concepts and simulation techniques, including the use of parallel computing methods. Applications are designed around state-of-the-art research in nanomaterials engineering covering several material systems such as virus-like particles [3], shape-changing nanocontainers [23], [24], ion channels [4], and polymeric fluids [5]. HPC-enabled MD simulations are key parts of these courses. These simulations serve as important tools for understanding diverse self-assembly phenomena in nanoscale materials [2], predicting material behavior in practical applications [5], and isolating interesting regions of parameter space for experimental exploration [23].
B. GUI-wrapped simulations on nanoHUB
To facilitate the use of simulations by students in classrooms and for solving homework problems, the simulations are deployed as computational tools on nanoHUB [22]. nanoHUB provides free online access and a Jupyter-notebook-based GUI wrapper for executing simulation codes. Depending on the input selected by users in the GUI, simulations are launched on virtual machines or on a supercomputing cluster so that the associated simulation wait and run times are minimized. The authors and members of the extended research group have published 6 nanoHUB tools that enable exploration of diverse self-assembly phenomena in nanomaterials: Ions in nanoconfinement [25], Nanosphere electrostatics lab [26], Nanoparticle assembly lab [27], Nanoparticle shape lab [28], Polyvalent nanoparticle binding simulator [29], and Souffle: Virus capsid assembly lab [30]. These tools provide an interactive GUI for students to examine the links between nanomaterial system parameters and their structural and dynamical behavior via renderings of simulation output on the tool canvas. The tools also enable students to learn the workflows associated with a large scientific simulation software ecosystem. Three of the six tools have already been employed in teaching material for the aforementioned courses. Some in-classroom lecture videos are available on nanoHUB as educational resources [31], [32].
C. nanoHUB tool: Ions in nanoconfinement
The nanoHUB tool “Ions in nanoconfinement” [25] enables users to simulate the self-assembly of ions in nanoconfinement created by material surfaces represented as identical planar interfaces. A primitive model of electrolytes is used to model the ions [4], [33]. Simulations are performed using standard molecular dynamics methods [4], [34]. The inputs of the tool include ion valency, ion size, electrolyte concentration, and interface separation. The simulation outputs include the density profiles of ions in confinement. This tool has been employed every semester since Spring 2018 to illustrate concepts in a graduate course (Simulating Nanoscale Systems) and an undergraduate course (Introduction to Modeling and Simulation) at IU. In less than 3 years since its launch, the tool has been used by over 130 users and run over 3400 times [25].
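To make the cost argument from the Introduction concrete, the sketch below shows a generic MD integration loop: velocity Verlet with forces computed by an explicit double loop over all particle pairs, so every timestep costs O(N²) pair evaluations. It is a minimal, hypothetical illustration, not the code behind the nanoHUB tool: it assumes a purely repulsive Weeks–Chandler–Andersen (WCA) pair potential in reduced units and omits the Coulomb interactions, confining walls, thermostat, and boundary conditions of the primitive-model simulations. The names `pair_forces` and `velocity_verlet` are illustrative.

```python
import random


def pair_forces(pos, eps=1.0, sigma=1.0):
    """Forces from a purely repulsive WCA pair potential (reduced units).

    The double loop visits all N(N-1)/2 pairs, so each call costs O(N^2).
    Returns (forces, number_of_pairs_visited).
    """
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    rc2 = (2.0 ** (1.0 / 6.0) * sigma) ** 2  # WCA cutoff at the Lennard-Jones minimum
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            pairs += 1
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
            if r2 < rc2:
                sr6 = (sigma * sigma / r2) ** 3
                # F_ij = 24 eps [2 (sigma/r)^12 - (sigma/r)^6] / r^2 * (r_i - r_j)
                f_over_r = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
                for k in range(3):
                    forces[i][k] += f_over_r * d[k]
                    forces[j][k] -= f_over_r * d[k]  # Newton's third law
    return forces, pairs


def velocity_verlet(pos, vel, dt, steps, mass=1.0):
    """Advance positions and velocities in place with velocity Verlet."""
    forces, _ = pair_forces(pos)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / mass  # first half kick
                pos[i][k] += dt * vel[i][k]  # drift
        forces, _ = pair_forces(pos)  # the O(N^2) step, repeated every timestep
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / mass  # second half kick
    return pos, vel


if __name__ == "__main__":
    random.seed(0)
    # 27 particles on a cubic lattice; spacing 1.1 is below the cutoff, so neighbors interact
    pos = [[1.1 * x, 1.1 * y, 1.1 * z] for x in range(3) for y in range(3) for z in range(3)]
    vel = [[random.gauss(0.0, 0.5) for _ in range(3)] for _ in pos]
    for k in range(3):  # remove center-of-mass drift
        mean = sum(v[k] for v in vel) / len(vel)
        for v in vel:
            v[k] -= mean
    velocity_verlet(pos, vel, dt=0.002, steps=200)
    _, pairs = pair_forces(pos)
    print("pairs visited per force call:", pairs)  # 27 * 26 / 2 = 351
    print("net momentum:", [sum(v[k] for v in vel) for k in range(3)])
```

Doubling N roughly quadruples the work inside `pair_forces`. Production MD codes reduce this cost with cutoffs and neighbor lists (and specialized methods for long-range electrostatics) and then distribute the remaining work with OpenMP/MPI, which is the kind of acceleration the Introduction refers to.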
Fig. 1. System overview of the ML surrogate for simulation approach for generating rapid and accurate predictions of simulation outputs for use in classroom teaching.
The tool comes with hybrid OpenMP/MPI acceleration that enables simulations to complete within 30 minutes for any combination of input parameters (assuming no waiting time on the HPC cluster). In classroom usage, we observed that the fastest simulations took about 10 minutes to provide converged ionic densities, while the slowest ones (generally associated with larger numbers of simulation steps and larger system sizes) took as long as 1 hour. This spread resulted primarily from the combination of queue waiting time on a computing cluster and the actual runtime of the MD simulation. Without rapid access to expected trends in the variation of ionic densities with input parameters, explaining concepts and mechanisms was unwieldy and time-consuming. As we demonstrate below, integrating this tool with an ML surrogate improved its overall usability as an educational tool.
IV. ML SURROGATES FOR SIMULATIONS
We now describe a general approach, introduced in Refs. [6], [7], that utilizes ML to enable real-time and anytime engagement with simulations, significantly enhancing their potential for use in both research and education. In this approach, an ML surrogate model is designed using data produced by completed runs of a given HPC-accelerated simulation and is then deployed to approximate the complex relationships between the physical input parameters and the output results of the simulation. The ML surrogate bypasses the explicit computational evolution of the simulated model components, yielding accurate outputs at a fraction of the time and computing cost. Figure 1 shows an overview of this framework. First, the attributes of the model system are fed to the framework. These inputs are used to launch the simulation on the HPC cluster and, simultaneously, are fed to the ML-based prediction module. Both the simulation and ML methods are designed to extract (infer) the desired output quantities. An error handler aborts the simulation program and displays appropriate error messages when a simulation fails due to any pre-defined criteria. At the end of the simulation run, the output quantities are saved for future retraining of the ML model, which occurs after a set number of new successful simulation runs.
Fig. 2. Ionic density profiles for systems I (a), II (b), III (c), and IV (d) predicted by the ML surrogate (red circles) and extracted with MD simulation (green circles with error bars). See main text for system definitions. For each system, the ML-predicted density profile is in excellent agreement with the simulation result.
In previous papers, we applied this framework to MD simulations of ions in nanoconfinement.
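The generic workflow of Figure 1 can be sketched as follows. This is a minimal, hypothetical illustration, not the tool's implementation: `SurrogateFramework`, `SimulationError`, the toy simulator, and the nearest-neighbor stand-in for the trained ANN are all assumptions made for clarity. The sketch shows the pieces described above: the same input goes to the surrogate and to the simulation, an error handler aborts failed runs, and completed runs are stored and trigger retraining after a set number of new successes.

```python
class SimulationError(RuntimeError):
    """Raised when a run violates a pre-defined failure criterion."""


class SurrogateFramework:
    """Hypothetical wrapper pairing an expensive simulation with an ML surrogate."""

    def __init__(self, simulate, fit, retrain_every=3):
        self.simulate = simulate  # expensive simulation (stand-in for an HPC job)
        self.fit = fit  # builds a predictor from stored (input, output) runs
        self.retrain_every = retrain_every
        self.dataset = []  # completed runs, kept for retraining
        self.pending = 0  # successful runs since the last retraining
        self.model = None

    def predict(self, x):
        """Fast path: surrogate inference."""
        if self.model is None:
            raise RuntimeError("surrogate has not been trained yet")
        return self.model(x)

    def run(self, x):
        """Slow path: run the simulation, store the result, retrain when due."""
        try:
            y = self.simulate(x)
        except SimulationError as err:
            # Error handler: abort this run, report why, and store nothing.
            return None, f"simulation aborted: {err}"
        self.dataset.append((x, y))
        self.pending += 1
        if self.pending >= self.retrain_every:
            self.model = self.fit(self.dataset)
            self.pending = 0
        return y, "ok"

    def query(self, x):
        """Send the same input to both paths, as in Figure 1."""
        prediction = self.predict(x) if self.model is not None else None
        result, status = self.run(x)
        return prediction, result, status


def toy_simulate(x):
    """Hypothetical stand-in for an MD run with one failure criterion."""
    if x < 0:
        raise SimulationError("input outside the supported range")
    return 2.0 * x


def nearest_neighbor_fit(data):
    """Hypothetical stand-in for ANN training: the nearest stored input wins."""
    stored = list(data)  # snapshot of the training set at fit time
    return lambda x: min(stored, key=lambda run: abs(run[0] - x))[1]


if __name__ == "__main__":
    fw = SurrogateFramework(toy_simulate, nearest_neighbor_fit, retrain_every=3)
    for x in [1.0, 2.0, -1.0, 3.0]:
        print(fw.query(x))
    print(fw.predict(2.9))  # nearest stored input is 3.0, so this returns 6.0
```

The failed run at `x = -1.0` is reported but never stored, so only the three successful runs reach the retraining step; in the actual framework the predictor is an ANN rather than a lookup.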
The surrogate was trained to learn the relationship between the output distribution of positive ions and 5 input parameters characterizing the ionic system: confinement length h, salt concentration c, positive ion valency z_p, negative ion valency z_n, and ion diameter d. The range of each input parameter was as follows: h ∈ (3 . , . nm, c ∈ (0 . , . M, z_p ∈ , , , z_n ∈ − , and d ∈ (0 . , . nm. The output quantity was selected to be the distribution of positive ions confined by two identical planar interfaces at z = −h/2 and z = h/2. For simplicity, using the symmetry of the ionic density about the confinement center z = 0, the ML surrogate was trained to predict the density of ions in the left half of the confinement (i.e., for z ∈ (−h/2, 0)). The predictions were made at approximately positions; the associated P ≈ density values were selected as the output parameters (features).
The dataset for training the ANN-based ML surrogate was generated by sweeping over a few discrete values of each input parameter to create and run 6,864 MD simulations utilizing HPC resources. On average, each MD simulation was run for over million computational steps and took 4200 CPU hours (≈ minutes per simulation). Creating the training dataset took approximately 25 days, including the queue wait times on the IU BigRed2 supercomputing cluster. The entire dataset was separated into training and testing sets using a ratio of 0.8:0.2. The ANN architecture with 2 hidden layers was implemented in Python using the scikit-learn, Keras, and TensorFlow ML libraries [35]–[37] for regression and prediction of P ≈ continuous (output) variables. Details of the data generation, preprocessing, ANN feature extraction, and regression are provided in our earlier papers [7], [8].
The ANN-based surrogate model produced ionic distributions in excellent agreement with explicit MD simulation results [7].
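Two data-handling steps from this paragraph, training only on the left half of the symmetric density profile and splitting the runs 0.8:0.2, can be sketched as below. This is a minimal, hypothetical illustration, not the preprocessing code of Refs. [7], [8]: the function names, the 8-bin layout, and the synthetic profile are assumptions, and the actual pipeline also includes feature scaling and the ANN itself.

```python
import random


def left_half(profile):
    """Bins with z < 0, for bins placed symmetrically about z = 0 (even count)."""
    return profile[: len(profile) // 2]


def mirror_to_full(half):
    """Rebuild the full profile from the left half using rho(z) = rho(-z)."""
    return half + half[::-1]


def train_test_split(samples, train_frac=0.8, seed=0):
    """Shuffle samples and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    n_bins = 8  # hypothetical; the paper's bin count is not reproduced here
    # Placeholder profile that is symmetric about the center by construction
    full = [1.0 + 0.1 * (2 * i - (n_bins - 1)) ** 2 for i in range(n_bins)]
    half = left_half(full)  # what the surrogate is trained to predict
    print(mirror_to_full(half) == full)  # symmetry recovers the whole profile

    # 6,864 placeholder runs with five random inputs each and a dummy target
    runs = [([random.random() for _ in range(5)], half) for _ in range(6864)]
    train, test = train_test_split(runs)
    print(len(train), len(test))  # 5491 1373
```

Predicting only half of the profile halves the number of output features the network must learn, at no loss of information, because the other half follows from the symmetry of the confinement.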
In addition to the high accuracy of its inferences, the surrogate produced output results over 10,000 times faster than the parallel MD simulation. The typical ML inference time for a prediction of the density profile was ≈ . seconds (that is, almost instantaneous). In stark contrast, the average runtime of the parallel MD simulation to produce a similarly converged output was ≈ minutes.
Fig. 3. GUI of the ML-enhanced “Ions in nanoconfinement” nanoHUB tool [25]. The GUI shows the density profile predicted by the ML surrogate for half of the position values (green line) and the result extracted via MD simulation (red markers) for an example ionic system.
The high overall success rates and rapid inference times of ML surrogate predictions enable a dynamic and responsive simulation environment for exploration by students in classroom settings. The following capabilities of these surrogates are of particular significance for educational use:
• Learning pre-identified features associated with the simulation outputs
• Generating accurate predictions in real time for unsimulated state points
• Providing a dynamic environment to rapidly explore the input-output relationships
• Enabling anytime and anywhere access to simulation results
In the next section, we describe the results of using simulation tools integrated with ML surrogates to teach material in computational science and engineering courses.
V. RESULTS
A. Technical evaluation
We first discuss the technical results comparing the predictions of the ML surrogate with the outputs of MD simulations. Figure 2 (a)–(d) shows the ionic density profiles predicted by the ML surrogate for a set of 4 systems randomly selected from the entire testing dataset. These systems are: system I (3 , , − , . , . , system II . , , − , . , . , system III (3 . , , − , . , . , and system IV (3 . , , − , . , . , where the parentheses list the 5 aforementioned input parameters characterizing the ionic system: confinement length h, positive ion valency z_p, negative ion valency z_n, salt concentration c, and ion diameter d. As the figure indicates, for each system the ML-predicted density profile is in excellent agreement with the result extracted using MD simulation (ground truth). In addition to the high accuracy, we note that the ML inferences are made in a much shorter time of ≈ . seconds compared to MD simulations (≈ minutes on average).
Motivated by the good agreement between ionic densities generated via the ML surrogate and MD simulations, as well as the remarkable performance enhancement resulting from the use of ML surrogates, we integrated the ML surrogate with the nanoHUB tool “Ions in nanoconfinement”. The ML-enhanced tool was deployed on nanoHUB in October 2019. Figure 3 shows the Jupyter-notebook-based Python GUI of the deployed tool. Users can click the “Run” and “Predict using ML” buttons simultaneously or separately, depending on the desired information. “Predict using ML” activates the ML surrogate, which predicts half of the density profile instantaneously; the result is available in the “Prediction Graph” tab as well as in the “Positive Ion Density” tab. Users can invoke the ML surrogate at any time by clicking the “Predict using ML” button to access the ML-predicted ionic density profile.
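The agreement shown in Figure 2 is judged visually. One simple way to quantify it, sketched below as our own assumption rather than a metric reported in this paper, is the root-mean-square deviation between the ML and MD profiles together with the fraction of bins where the ML value falls inside the MD error bar. The function names are illustrative, and the numbers in the example are made-up placeholders, not data from Figure 2.

```python
import math


def rmse(pred, ref):
    """Root-mean-square deviation between two profiles sampled on the same bins."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("profiles must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def fraction_within_error(pred, ref, err):
    """Fraction of bins where the ML value lies within the MD error bar."""
    hits = sum(1 for p, r, e in zip(pred, ref, err) if abs(p - r) <= e)
    return hits / len(pred)


if __name__ == "__main__":
    # Placeholder densities (arbitrary units), not values from Figure 2
    md = [0.0, 0.8, 1.4, 1.1, 1.0]
    md_err = [0.05, 0.10, 0.10, 0.05, 0.05]
    ml = [0.0, 0.85, 1.25, 1.12, 1.0]
    print(f"RMSE = {rmse(ml, md):.3f}")
    print(f"bins within MD error bars: {fraction_within_error(ml, md, md_err):.0%}")
```

The error-bar criterion is useful alongside the RMSE because MD densities carry statistical uncertainty, so a prediction inside the error bar is as good as a second, independent MD run.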
Clicking the “Run” button instructs the execution engine either to submit a job on an HPC cluster (if the “Cluster mode” button is checked) or to run the simulation on a VM. When the simulation is over, the execution engine passes the generated data to be plotted on the “Positive Ion Density” and “Negative Ion Density” tabs. For illustration purposes, Figure 3 also shows the final density plot obtained using the integrated MD and ML method for the input parameters h = 3 . nm, z_p = 1, z_n = − , c = 0 . M, and d = 0 . nm. The ML prediction is shown as an overlay in the “Positive Ion Density” tab along with the result of the MD simulation.
B. Educational evaluation
An accurate and rapid assessment of ionic distributions in confinement by the ML surrogate enables in-classroom instruction of several important concepts such as interfacial effects, self-assembly in nanoscale systems, and the intimate connection between solution conditions and material assembly behavior. For example, using the ML surrogate, students can instantaneously record changes in the ionic structure as the salt concentration c is tuned. Figure 4 shows a selected subset of ionic density profiles predicted by the ML surrogate for different c = 0 . , . , . , . M. Other input parameters are fixed to h = 3 . nm, z_p = 1, z_n = − , and d = 0 . nm. By performing in silico experiments in rapid succession using the ML surrogate, students can readily visualize the response of the ionic system to changes in c. For example, students learn that increasing salt concentration leads to the accumulation of ions near the interface (higher peaks in the ionic density) or to the emergence of more modulations in the density profile. Both observations inferred from the ML surrogate follow the expected behavior of these systems as reported and elucidated in previous work [4]. The ML surrogate also enables instructors to perform simulation-based analysis of the ionic model system behavior synchronously with students. Further, the agility of the surrogate in yielding predictions enables educators to respond to student questions in real time via live demonstrations. The ML surrogate thus helps resolve the 4 key educational challenges outlined in the Introduction (Section I).
Fig. 4. Density profiles of confined positive ions for different salt concentration c = 0 . , . , . , . M predicted by the ML surrogate.
As noted before, one of the authors regularly teaches two courses at IU: 1) Simulating nanoscale systems (Fall semester) and 2) Introduction to modeling and simulation (Spring semester). The students in these courses learn computational model development, simulation techniques such as molecular dynamics, data analysis and visualization, computational materials science concepts such as self-assembly and interfacial phenomena, parallel computing methods, and engineering applications of simulations. Learning is facilitated by having students perform HPC-based simulations that enable the extraction of structure-property relationships in materials at the nanoscale.
Students also become familiar with important practical aspects of research in scientific computing such as scalability, time discretization, convergence, model resolution, and simulation accuracy. nanoHUB computational tools are key parts of these courses, as they facilitate the use of simulations by students via a user-friendly web application that requires no software installation to run the simulations. The ML surrogate was integrated into the nanoHUB tool “Ions in nanoconfinement”, and the enhanced tool was pilot tested in Course 1 in Fall 2019. Six students majoring in different fields, including nanoengineering, computer engineering, and chemistry, took the course. The tool was also used in Course 2 by 15 students in Spring 2020. Students used the tool during in-classroom lectures as well as to solve homework problems. The instructor actively employed the tool in the classroom to help students develop an intuitive understanding of the ionic system behavior via rapid experimentation and visualization of changes in ionic structure.
Below we enumerate a subset of the learning outcomes of these courses to provide context for the results of the tool evaluation discussed in the remainder of this section. When students complete the aforementioned two courses, they should be able to:
1) Develop scale-appropriate and computationally efficient models of real/experimental systems.
2) Develop an in-depth understanding of computational materials science concepts such as self-assembly and structure-property relationships.
3) Develop simulation methods and apply them to solve engineering problems.
4) Use parallel computing methods to enhance computational simulations of nanoscale materials.
5) Use web-based computational tools and understand the associated scientific workflow.
TABLE I
RATING-BASED QUESTIONS USED IN THE SURVEY
ID   Question
Q1   Were the use of simulations in the class valuable in learning concepts?
Q2   Were the learning objectives regarding the use of the nanoHUB tool clear?
Q3   Rate the tool in terms of user-friendliness.
Q4   Rate the tool in terms of convenience (in terms of how fast the results were inferred by ML).
Q5   Rate the tool in terms of accuracy (as compared with MD results).
Q6   Rate the tool in terms of use for in-class conceptual understanding.
Q7   Rate the tool in terms of use for homework problem solving.
Q8   Rate the tool in terms of GUI layout.
Q9   Rate the tool in terms of quality.
Q10  Rate the tool in terms of consistency.
To assess the impact of the ML-enhanced tool and associated simulations on student learning, a tool evaluation survey was conducted at the end of the Fall 2019 semester, in which students provided feedback on their experience with the tool. The survey questions were constructed following similar educational evaluation studies [9], [10], [38]–[43] and comprised both rating-based and text-response-based questions.
Fig. 5. Participant ratings (as a percentage) for the questions (Table I) used in the survey for evaluating the ML-enhanced tool.
First, a set of 10 questions, tabulated in Table I, asked the students to rate the simulation tool in terms of different features such as user-friendliness, clarity, utility, and consistency. Participant ratings as a percentage for the 10 questions listed in Table I are shown as a bar graph in Figure 5. The rating scale was from 1 to 5, with higher scores representing higher ratings. We received a total of 60 rating responses for these 10 questions. Across all 10 questions, the mean response rating was 4.26 with a variance of 0.39, indicating that on average the students rated the simulation tool close to the highest rating of 5.0. More specifically, in terms of user-friendliness, convenience, GUI layout, and consistency, students rated the tool at 4 or higher. We also asked the students how many times they had used the nanoHUB tool in class: 16.7% of the students responded that they had used it more than 20 times, while 50% said they had used it between 10 and 20 times (Figure 6).
Next, we asked a series of text-response-based questions, a few of which we discuss below. Students were asked which aspects of the online simulation tool were useful and valuable to them. They highlighted that the simulation tool helped them increase their conceptual and practical understanding of nanoscale simulations thanks to the user-friendly interface, ML-enabled instantaneous predictions, and availability of multidimensional input choices.
For example, here isan excerpt from a student response: “
ML provided the ig. 6. A pie chart showing how many times students used the ML-enhanced simulation tool in the classroom. Nearly of the studentsused the tool over 20 times. quick answer when that was needed. Easier to use forsingle simulation than accessing supercomputer. ”Students were asked to compare simulation-drivenclassroom teaching experience with a non-simulation-driven classroom teaching experience. 83% of the stu-dents enjoyed simulation-driven teaching of computa-tional science concepts stating that “ simulations aid tounderstanding the concept taught in the class moreclearly ” and “ it allows students to be the researchersand experimenters in the sense of using these toolsto generate our results for our assignments ”. The rest(17%) still preferred the simulation-driven classroomteaching experience but stated that they felt there wereoccasions with extra downtime in the classroom becauseof waiting for cluster resources to run the simulations,and they suggested that “ the cluster waiting time needsto be filled with useful content ”.Students were also asked to isolate what aspects ofthe nanoHUB online tool were most useful to them.80% of the responses indicated that the students likethe ML prediction feature. Here is an excerpt from astudent response: “ the predicted machine learning aspectwas beneficial because it was very accurate with thesimulated results, so if need be one do not have to waitfor the simulation to finish computing to know what theresults would have been ”. The survey responses alsoindicated that the students enjoy the freedom to probethe system behavior by tuning several model parameters,and they find the output graphs helpful.Finally, students were asked to provide suggestionsto improve the ML-enhanced simulation tool. 66.6%of the students provided feedback to improve the tool,while 33.4% said that they do not have any suggestions to improve the tool. 
Some suggestions were: “allowing users to download the ML prediction graph” and “providing 3D snapshots of the simulation”. The tool has been updated based on these useful suggestions, and the latest version provides options to download the ML prediction result and visualize snapshots of ions in confinement. Other suggestions, such as “graph updates do not always happen when changing values and toggling ML, especially after full simulation was run”, have not yet been addressed. These relate to GUI rendering issues, which we plan to resolve in the future in collaboration with the nanoHUB team.
VI. DISCUSSION AND CONCLUSION
In this paper, we explored the potential of using ML surrogates for HPC-enabled simulations to address several educational challenges in teaching computational science and engineering courses. The ML surrogate yields predictions in excellent agreement with simulation, but at a fraction of the time and computing cost, delivering a dynamic and responsive simulation environment for rapid exploration by students in classroom settings. We developed a web application on nanoHUB that supports both HPC-enabled simulation and the ML surrogate method to produce simulation outputs.
The nanoHUB tool was used both for in-classroom instruction and for solving homework problems in two courses covering topics in computational materials science, modeling and simulation, and engineering applications of HPC-enabled simulations. The educational utility of the tool was evaluated using a survey that students answered enthusiastically. Survey responses showed that the ML-enhanced tool is well accepted among students and scored very high marks for convenience, user-friendliness, and consistency. Students also provided constructive feedback to improve the tool further. The improved interactivity with the simulation framework, in terms of real-time engagement and anytime access, enhanced student learning of computational science concepts. The integrated simulation tool also enabled students to better understand the practical aspects of scientific computing, including the tradeoffs between simulation accuracy, scalability, and efficiency.
Results from this investigation are encouraging, and we expect the ML surrogate approach to be broadly applicable. We plan to develop ML surrogates to predict outputs of other simulations, including MD simulations of shape-changing nanoparticles [23], [28], [32] and different types of Monte Carlo simulations [1], [44], [45].
Another line of future work is to explore ways to reduce the training costs of the ML surrogates and to probe their potential for predicting simulation outputs outside the pre-defined range of the training datasets.

We note that integrating the ML surrogate into a computational tool hosted on nanoHUB exposes this approach to a much broader community of students, educators, and researchers. nanoHUB is the largest online resource for educational materials in nanotechnology [22], hosting over 500 web applications for launching simulations and serving over 1 million users worldwide.

Finally, we emphasize that the ML surrogate is not intended to bypass or exclude HPC in education. Instead, our vision is for ML surrogates to complement and supplement HPC-enabled simulations in educational applications. Note also that the ML surrogate was designed using completed runs of HPC-enabled simulations; without HPC, the time needed to generate the datasets for training the ML surrogate becomes prohibitively large [7], [8]. ML surrogates thus contribute a novel way of teaching HPC topics: they help students develop intuition, or a “feel”, for the physical system behavior through rapid exploration and visualization of how output quantities vary with inputs, before students use HPC to solve specific problems.

ACKNOWLEDGMENT
This work is supported by the National Science Foundation through Awards 1720625 (Network for Computational Nanotechnology - Engineered nanoBIO Node) and DMR-1753182. Simulations were performed using the Big Red II supercomputing system. V.J. thanks G. C. Fox, F. Sun, and P. Sharma for useful conversations. We also thank all the students for providing valuable feedback on the ML-enhanced simulation tool.

REFERENCES
[1] D. Frenkel and B. Smit, Understanding Molecular Simulation, 2nd ed. Academic Press, 2001.
[2] S. C. Glotzer, “Assembly engineering: Materials design for the 21st century (2013 P. V. Danckwerts lecture),” Chemical Engineering Science, vol. 121, pp. 3–9, 2015.
[3] N. E. Brunk, M. Uchida, B. Lee, M. Fukuto, L. Yang, T. Douglas, and V. Jadhao, “Linker-mediated assembly of virus-like particles into ordered arrays via electrostatic control,” ACS Applied Bio Mat., vol. 2, no. 5, pp. 2192–2201, 2019.
[4] Y. Jing, V. Jadhao, J. W. Zwanikken, and M. Olvera de la Cruz, “Ionic structure in liquids confined by dielectric interfaces,” The Journal of Chemical Physics, vol. 143, no. 19, p. 194508, 2015.
[5] V. Jadhao and M. O. Robbins, “Rheological properties of liquids under conditions of elastohydrodynamic lubrication,” Tribology Letters, vol. 67, no. 3, p. 66, 2019. [Online]. Available: https://doi.org/10.1007/s11249-019-1178-3
[6] J. Kadupitiya, G. C. Fox, and V. Jadhao, “Machine learning for performance enhancement of molecular dynamics simulations,” in International Conference on Computational Science, 2019, pp. 116–130. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-22741-8_9
[7] J. Kadupitiya, F. Sun, G. Fox, and V. Jadhao, “Machine learning surrogates for molecular dynamics simulations of soft materials,” Journal of Computational Science, p. 101107, 2020.
[8] J. C. Kadupitige, G. Fox, and V. Jadhao, “Machine learning for auto-tuning of simulation parameters in Car-Parrinello molecular dynamics,” in APS Meeting Abstracts, 2019.
[9] R. Tanaka, R. F. da Silva, and H. Casanova, “Teaching parallel and distributed computing concepts in simulation with WRENCH,” in EduHPC@SC, 2019, pp. 1–9.
[10] S. Srivastava, M. Smith, A. Ghimire, and S. Gao, “Assessing the integration of parallel and distributed computing in early undergraduate computer science curriculum using unplugged activities,” in . IEEE, 2019, pp. 17–24.
[11] M. Spellings and S. C. Glotzer, “Machine learning for crystal identification and discovery,” AIChE Journal, vol. 64, no. 6, pp. 2198–2206, 2018.
[12] S. S. Schoenholz, “Combining machine learning and physics to understand glassy systems,” Journal of Physics: Conference Series, vol. 1036, no. 1, p. 012021, 2018.
[13] A. L. Ferguson, “Machine learning and data science in soft materials engineering,” Journal of Physics: Condensed Matter, vol. 30, no. 4, p. 043002, 2017.
[14] K. Ch’ng, J. Carrasquilla, R. G. Melko, and E. Khatami, “Machine learning phases of strongly correlated fermions,” Phys. Rev. X, vol. 7, p. 031038, Aug 2017.
[15] J. Kadupitiya, G. C. Fox, and V. Jadhao, “Machine learning for parameter auto-tuning in molecular dynamics simulations: Efficient dynamics of ions near polarizable nanoparticles,” Indiana University, Nov. 2018.
[16] G. Fox, J. A. Glazier, J. Kadupitiya, V. Jadhao et al., “Learning everywhere: Pervasive machine learning for effective high-performance computation,” in IEEE IPDPS Workshops, 2019, pp. 422–429. [Online]. Available: https://doi.org/10.1109/IPDPSW.2019.00081
[17] J. Wang, S. Olsson, C. Wehmeyer, A. Perez, N. E. Charron, G. De Fabritiis, F. Noe, and C. Clementi, “Machine learning of coarse-grained molecular dynamics force fields,” ACS Central Science, 2019.
[18] F. Häse, I. Fdez. Galván, A. Aspuru-Guzik, R. Lindh, and M. Vacher, “How machine learning can assist the interpretation of ab initio molecular dynamics simulations and conceptual understanding of chemistry,” Chem. Sci., vol. 10, pp. 2298–2307, 2019. [Online]. Available: http://dx.doi.org/10.1039/C8SC04516J
[19] Y. Sun, R. F. DeJaco, and J. I. Siepmann, “Deep neural network learning of complex binary sorption equilibria from molecular simulation data,” Chemical Science, vol. 10, no. 16, pp. 4377–4388, 2019.
[20] M. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga et al., “Up to two billion times acceleration of scientific simulations with deep neural architecture search,” arXiv preprint arXiv:2001.08055, 2020.
[21] J. Kadupitiya, G. C. Fox, and V. Jadhao, “Simulating molecular dynamics with large timesteps using recurrent neural networks,” arXiv preprint arXiv:2004.06493, 2020.
[22] G. Klimeck, M. McLennan, S. P. Brophy, G. B. Adams III, and M. S. Lundstrom, “nanoHUB.org: Advancing education and research in nanotechnology,” Computing in Science & Engineering, vol. 10, no. 5, pp. 17–23, Sept. 2008.
[23] N. E. Brunk and V. Jadhao, “Computational studies of shape control of charged deformable nanocontainers,” Journal of Materials Chemistry B, vol. 7, p. 6370, 2019. [Online]. Available: http://dx.doi.org/10.1039/C9TB01003C
[24] V. Jadhao, Z. Yao, C. K. Thomas, and M. Olvera de la Cruz, “Coulomb energy of uniformly charged spheroidal shell systems,” Physical Review E, vol. 91, no. 3, p. 032305, 2015.
[25] K. Kadupitiya, N. Anousheh, S. Marru, G. C. Fox, and V. Jadhao, “Ions in nanoconfinement,” Dec 2017, nanoHUB. [Online]. Available: https://nanohub.org/resources/nanoconfinement
[26] J. Kadupitiya, N. Brunk, S. Ali, G. C. Fox, and V. Jadhao, “Nanosphere electrostatics lab,” May 2018, nanoHUB. [Online]. Available: https://nanohub.org/resources/nselectrostatic
[27] N. Brunk, J. Kadupitiya, M. Uchida, T. Douglas, and V. Jadhao, “Nanoparticle assembly lab,” January 2019, nanoHUB. [Online]. Available: https://nanohub.org/resources/npassemblylab
[28] J. Kadupitiya, N. Brunk, and V. Jadhao, “Nanoparticle shape lab,” January 2020, nanoHUB. [Online]. Available: https://nanohub.org/resources/npshapelab
[29] L. Nilsson, J. Kadupitiya, and V. Jadhao, “Polyvalent nanoparticle binding simulator,” Apr 2019, nanoHUB. [Online]. Available: https://nanohub.org/resources/nanobind
[30] ——, “Souffle: Virus capsid assembly lab,” Apr 2020, nanoHUB. [Online]. Available: https://nanohub.org/resources/capsidsouffle
[31] V. Jadhao, “Nanoscale simulations and engineering applications: Applications - self-assembly in nanoconfinement,” Feb 2019. [Online]. Available: https://nanohub.org/resources/29671
[32] ——, “Shape changing nanocontainers tutorial,” Apr 2020. [Online]. Available: https://nanohub.org/resources/33259
[33] F. J. Solis, V. Jadhao, and M. Olvera de la Cruz, “Generating true minima in constrained variational formulations via modified Lagrange multipliers,” Physical Review E, vol. 88, no. 5, p. 053306, 2013.
[34] S. Plimpton, “Fast parallel algorithms for short-range molecular dynamics,” Journal of Computational Physics, vol. 117, no. 1, pp. 1–19, 1995.
[35] F. Chollet et al., “Keras,” 2015.
[36] L. Buitinck et al., “API design for machine learning software: experiences from the scikit-learn project,” arXiv:1309.0238, 2013.
[37] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “TensorFlow: a system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.
[38] D. Marchant, C.-J. Johnsen, B. Vinter, and K. Skovhede, “Teaching concurrent and distributed programming with concepts over mathematical proofs,” in . IEEE, 2019, pp. 49–57.
[39] A. Shoker, “Successful systems in production graduate teaching,” in . IEEE, 2019, pp. 42–48.
[40] A. González-Escribano, V. Lara-Mongil, E. Rodriguez-Gutiez, and Y. Torres, “Toward improving collaborative behaviour during competitive programming assignments,” in EduHPC@SC, 2019, pp. 68–74.
[41] R. Carratalá-Sáez, S. Iserte, and S. Catalán, “Teaching on demand: an HPC experience,” in EduHPC@SC, 2019, pp. 32–41.
[42] A. Qasem, “A gentle introduction to heterogeneous computing for CS1 students,” in . IEEE, 2019, pp. 10–16.
[43] J. Miller and M. Arenaz, “Measuring the impact of HPC training,” in . IEEE, 2019, pp. 58–67.
[44] V. Jadhao and N. Makri, “Iterative Monte Carlo with bead-adapted sampling for complex-time correlation functions,” The Journal of Chemical Physics, vol. 132, no. 10, p. 104110, 2010.
[45] ——, “Iterative Monte Carlo path integral with optimal grids from whole-necklace sampling,”