Embedding Human Heuristics in Machine-Learning-Enabled Probe Microscopy

Abstract

Scanning probe microscopists generally do not rely on complete images to assess the quality of data acquired during a scan. Instead, assessments of the state of the tip apex, which not only determines the resolution in any scanning probe technique but can also generate a wide array of frustrating artefacts, are carried out in real time on the basis of a few lines of an image (and, typically, their associated line profiles.) The very small number of machine learning approaches to probe microscopy published to date, however, involve classifications based on full images. Given that data acquisition is the most time-consuming task during routine tip conditioning, automated methods are thus currently extremely slow in comparison to the tried-and-trusted strategies and heuristics used routinely by probe microscopists. Here, we explore various strategies by which different STM image classes (arising from changes in the tip state) can be correctly identified from partial scans. By employing a secondary temporal network and a rolling window of a small group of individual scanlines, we find that tip assessment is possible with a small fraction of a complete image. We achieve this with little-to-no performance penalty -- or, indeed, markedly improved performance in some cases -- and introduce a protocol to detect the state of the tip apex in real time.


arXiv: [physics.atm-clus], Jul

Oliver M. Gordon

School of Physics & Astronomy, The University of Nottingham, University Park, Nottingham, NG7 2RD, United Kingdom. E-mail: [email protected]

Filipe L.Q. Junqueira

School of Physics & Astronomy, The University of Nottingham, University Park, Nottingham, NG7 2RD, United Kingdom. E-mail: [email protected]

Philip J. Moriarty

School of Physics & Astronomy, The University of Nottingham, University Park, Nottingham, NG7 2RD, United Kingdom. E-mail: [email protected]


Submitted to: Machine Learning Sci. Tech.

1. Introduction

One of the major challenges in the drive to fully automate the scanning probe microscope is the need to constantly maintain the integrity of the tip[1, 2]. During an experimental session, interactions with the surface can cause the tip to spontaneously and randomly change shape, modifying the interactions and therefore changing the data acquired in a highly non-linear fashion. This frequently results in inconsistent scans containing visual artefacts, often making data unusable or, at best, problematic to interpret. Furthermore, it is becoming de rigueur in state-of-the-art SPM to functionalise tips by deliberately picking up adsorbed molecules or atoms from the surface[3], vastly improving resolution[4], enabling direct measurement of intermolecular pair potentials[5, 6], and/or modifying the capability of the probe, for better or worse, to manipulate and position single adsorbates[7].

Indeed, SPM experimentation is now at the point where not only is single atom/molecule termination of the tip apex required, but fine control and detailed understanding of its atomic/molecular orbital structure is often essential. Gross et al. [8] provided a particularly elegant example of the importance of "orbital engineering" of this type by demonstrating the significant enhancement of submolecular resolution in scanning tunnelling microscopy (STM) images of pentacene and naphthalocyanine molecules via tunnelling through p-wave orbitals, as the tunnelling matrix element for these states is proportional not to the sample wavefunction itself but to its spatial derivatives. The spatial distribution and orientation of electron density at the tip apex also play a central role in single atom manipulation[9]. Controlling and maintaining the atomistic and orbital structure of the tip apex is therefore a vital part of state-of-the-art SPM operation.

Currently, this requires a protracted and repetitive routine of voltage pulsing, "gentle" (or not-so-gentle) indenting of the tip into the surface, scanning at relatively high voltages and currents, and/or attempts to pick up adsorbates. This is at present a high-effort, time-consuming and manual process involving only simple sub-processes, making it ideal to automate.

Whilst convolutional neural networks (CNNs) have been shown to be capable of assessing SPM tips[10, 11, 12], and, most recently, of extracting "hidden order" from STM datasets[13], CNN methods to date have been trained exclusively with complete images. Partial scans comprising a small number of scanlines therefore simply do not provide the information upon which the network mathematically depends, and so current methods of CNN image assessment require complete scans. This method of CNN assessment after complete scans compares extremely poorly to human-based assessment, in which SPM operators routinely perform accurate assessment during in-process scans by observing individual line profiles as the image is acquired. Indeed, as little as 1-2% of a full scan may be required to correctly assess a particularly poor tip. Furthermore, because the majority of time spent maintaining the tip is spent acquiring the data to

2. H:Si(100) Dataset

As discussed in the Introduction, SPM images often contain multiple features because of the scanning probe apex changing during a single scan. These tip changes also regularly and immediately result in discontinuities perpendicular to the direction of the scan. After the tip changes shape, multiple, more complex visual artefacts can also appear[14, 15, 16]. For example, features can appear to 'ghost' due to the presence of multiple tip apices[17, 10], or large blurs may appear due to impurities on the probe itself. Whilst these particular features can be seen when scanning any surface, others are specific to the surface being investigated[18]. For example, for the H:Si(100) surface, four different, distinct tip states of 'individual atoms' (for the sharpest tips), 'dimers', 'asymmetries', and 'rows' have been observed and discussed in the literature[15, 19, 20]. Typically, an operator will want to coerce the tip into producing images with one of these atomistic resolutions visible. (It is also worth noting that the tip apex capable of the highest resolution may not be best suited to other tasks, including, in particular, single atom manipulation[21].) Uncontrolled, and sometimes controlled, tip changes, however, mean that it is possible to produce images of H:Si(100) showing a combination of any of these four states, tip change shears, and other defects. Examples for each state are shown in Figure 1, along with a diagram of the H:Si(100)-(2x1) surface reconstruction.

Figure 1.

Selection of images showing key tip states for STM imaging of H:Si(100). (a) Ball-and-stick model (side-on) of the atomic structure of H:Si(100)-(2x1), which comprises rows of hydrogen-terminated silicon dimers; (b) atomic resolution; (c) asymmetry; (d) dimer resolution; (e) row resolution; and (f) bad/blurry. The tip can also spontaneously change during imaging, resulting in the horizontal discontinuity in (e). Frequently, features appear to blend between images, such as with asymmetries and atoms, or the dimer-like modulation in rows. Asymmetries and dimer classes were therefore combined.

Besides its distinctive surface features, H:Si(100) is an ideal test-bed for developing CNN automation techniques. In addition to the relative simplicity of its reconstruction and a wealth of previous literature[22], H:Si(100) is a well understood substrate that has been used in many important advances in single atom technology and atomically precise materials engineering[23, 24, 25, 26, 21, 27, 28]. Furthermore, because it has been previously studied in the context of machine-learning-enabled SPM[10, 11, 12], a good comparison can be formed with existing machine learning approaches based on full scans. As such, we used our existing dataset of 6167 complete images of H:Si(100)[12]. These images were acquired on an Omicron variable-temperature STM between March 2014 and November 2015, at scan sizes from 3x3 nm to 80x80 nm and voltage biases from -2 V to +2 V. They were then hand-classified into the four categories listed above, as well as "tip changes" and "generic defects". Specific defects were not considered, as tip conditioning is performed based on the presence of any defect, and not the specific defect itself. As such, combining all defects into one category simplified the classification task, improving CNN performance.

From here, images were randomly assigned into a training/testing set for training the network. Performance was then calculated with a separate, blind holdout set for verification. After random shuffling, 4987 of the images were assigned into the training/testing datasets in an 80/20 split, and 1180 into the holdout set. Data were then filtered to remove ambiguous images that were classified in multiple categories and/or upon which the human classifiers did not perfectly agree[12]. This left 3386 images for training/testing, and 648 for blind verification. Because of the relatively small number of images available and to further improve performance, all data were then pre-processed using methods identical to those in Gordon et al. [12] (namely flattening and scaling linescans to have a mean of 0 and a standard deviation of 1). The training/testing sets were also augmented with vertical and horizontal flips, random rotations from 0-360°, crops, pans, and random Gaussian noise. This step was needed to prevent the network from rapidly

3. Results and Discussion

In many neural network applications, data are often of varying length. For example, in natural language processing[32], some words and sentences are inevitably longer than others. In these cases, shorter pieces of data are lengthened by "padding" them with a marker value[32] until they are as long as the longest piece of data. The marker value is chosen such that it cannot naturally appear in the real data. Training and testing then continue as normal, as the network learns to ignore the marker value. In the context of SPM, we can exploit the fact that images are sequentially generated one linescan at a time, and that completed images contain the same number of linescans, regardless of scan parameters. During an incomplete scan, the missing linescans can therefore be replaced with a marker value to allow the network to produce an output. Figure 2 demonstrates how data could be padded during scanning to form a full sized image.

Figure 2.

Figure to demonstrate a potential method to allow neural networks to predict the state of an SPM tip using incomplete scans. Because CNNs can only make predictions if given the same number of data points used during training, it is not possible to make predictions using incomplete scans. It is also computationally wasteful to create multiple CNNs for each stage of scan completeness. Instead, partial scans can be "padded" with an arbitrary marker value until there are enough data points to equal a full sized scan. After each successive linescan, less padding is required. This allows the CNN (green border) to train/predict using incomplete scans.
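The padding idea of Figure 2 can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' code: the names `pad_partial_scan` and `tile_partial_scan`, the list-of-lists layout (one inner list per linescan), and the convention that `j` counts the lines already acquired are all assumptions. The second helper shows an alternative that avoids a marker value by repeating the acquired lines until the array is full size.

```python
from typing import List

Scan = List[List[float]]  # assumed layout: one inner list per linescan


def pad_partial_scan(scan: Scan, j: int, marker: float = 0.0) -> Scan:
    """Keep the first j linescans and fill the remaining lines with `marker`."""
    n_lines = len(scan)
    if not 0 <= j <= n_lines:
        raise ValueError("j must lie between 0 and the number of lines")
    width = len(scan[0]) if scan else 0
    kept = [list(row) for row in scan[:j]]
    padding = [[marker] * width for _ in range(n_lines - j)]
    return kept + padding


def tile_partial_scan(scan: Scan, j: int) -> Scan:
    """Repeat the first j linescans until the array has its full height again."""
    n_lines = len(scan)
    if not 1 <= j <= n_lines:
        raise ValueError("j must lie between 1 and the number of lines")
    return [list(scan[k % j]) for k in range(n_lines)]


if __name__ == "__main__":
    full = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(pad_partial_scan(full, 2, marker=10.0))
    print(tile_partial_scan(full, 2))
```

Because the padded array always has the full image shape, the same network can be evaluated after every new linescan simply by calling the helper again with a larger `j`.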

As such, it is possible to simulate and test partial scans with the original dataset of complete scans. To do this, a random number of linescans from the end of the scan were "masked" during training by replacing the real data with the marker value. Specifically, we let

$$
\begin{pmatrix} A_{ij} \\ A_{i(j+1)} \\ A_{i(j+2)} \\ \vdots \\ A_{iN} \end{pmatrix} = M, \tag{1}
$$

where N is the total number of lines in a full image, and M the marker value. This produces an array, $A_{ij}$, for the ith image of a dataset, in which only j linescans appear to have been produced. To improve performance, data are further augmented by repeating $A_i$ multiple times, but with randomly assigned j.

Although this method is simple and can easily be applied to existing protocols, the use of a marker value is of course highly problematic. In the context of SPM, data can theoretically contain any positive or negative value within the operating range of the acquisition hardware. As such, no marker value exists that could not show up in the actual dataset, without being so large as to make the actual data minuscule by comparison [...] M = 0 and M = 10. As an alternative, we also consider "tiling" by repeating $A_{ij}$ to full scan-size. This avoids the need to fill with an arbitrary marker value.

The CNN structure was chosen to be VGG-like[33] after strong all-round performance was previously found for H:Si(100) using a similar structure[12]. This network[33] begins with two 2D convolutional layers of 32 output filters, 3x3 convolutional filters, and 3x3 strides. This is followed by a third, max pooling, layer with 2x2 pooling windows and 2x2 strides. This three layer block is then repeated, but with 64 and then 128 output filters respectively. The very first convolutional layer in the model was then altered to have 7x7 convolutional filters and 2x2 strides. This structure was then trained three separate times to create a majority voting ensemble.
Not only does this allow for the performance benefits seen when taking a majority vote on a subjective task, but it also reduces the variance in CNN performance, which was found to vary by about 1% between repeats.

To test this method, the performance of the CNN ensemble was calculated as one additional line was unmasked at a time. We do this by masking from the jth line of $A_{ij}$ using Equation 1 for all 648 images in the verification dataset. The CNN ensemble was then used to predict the tip state, $P(A_{ij})$, from j = 2 to j = N. By assuming the human prediction to be perfectly correct, performance was calculated by comparing CNN predictions to the corresponding human predictions. Performance is shown as a function of j in Figure 3.

From these figures, it can clearly be seen that for all types of padding, the padding-enabled CNNs successfully learnt to make correct observations with limited data. Furthermore, when comparing the performance with small amounts of data (j = 2) to that with full scans (j = N), the performance of all padding types only decreased by an average of 4 ± 1% and 7 ± 2% for mean AUROC and balanced accuracy respectively. Given that the balanced accuracy and AUROC values are significantly better than the 0.25 and 0.50 expected from guessing respectively, it is entirely possible to assess SPM tip state with only a small number of linescans.

However, at j = N the padding-enabled CNNs performed significantly worse than an identical ensemble trained without padding. Here, padding reduced full size performance by up to 12% and 22% for the mean AUROC and balanced accuracy respectively, when compared to the worst performing padding methods. Giving CNNs the ability to classify partial scans therefore significantly harms performance, reducing the real-world effectiveness of such systems. We also note that this architecture performed better than the ensembles presented in Gordon et al. [12]. The large initial convolutional window may have caused this. Besides the reduced maximum performance, there was also a large computational inefficiency due to training the CNNs to perform (and subsequently ignore) a large number of expensive calculations on meaningless data.

Figure 3.

Figure to demonstrate the balanced accuracy (a), and mean area under the receiver-operator-characteristic curve (b), as a function of the number of lines, j, of a neural network trained to classify partial STM images of the H:Si(100) surface. Given that SPM data is generated one line at a time, incomplete scans can be padded to full size with a marker value that the otherwise identical network then learns to ignore. In this way, the data requirements for neural network automated state detection can be reduced significantly. Here, we consider marker values of 10 (red), 0 (blue), and also tile the data to size (green). However, performance is far below an identical network trained exclusively on full size data (black).

One advantage of partial-scan methods is that tip changes can be instantly detected by looking for changes and impulses in CNN output, as visible in Figure 4. This is a significant improvement on previous full-scan methods, which require a secondary "tip change" network[12]. We note that without manual labelling of all tip change locations on all images, a quantitative analysis of tip-change detection is not possible. However, the imperfect ignoring of the marker values meant that some of the horizontal shears due to tip changes caused little-to-no change in network output. The change in prediction to reflect a new tip state was also often small, and tended to "drift" rather than instantly "snap" to the new value. This was particularly problematic for tip changes later on in a scan. One explanation is that the network learnt to rely heavily on earlier scanlines because training images often had early scanlines present, but later scanlines did so increasingly rarely. It was also impossible to detect tip changes using the "tile" method of data padding, which created a horizontal shear (visually identical to a tip change shear) between every tile. As such, padding should only be employed early on in scans and when the tip state is likely stable.

Figure 4.

Figure to demonstrate a variety of methods by which the H:Si(100) tip states of individual atoms (yellow), asymmetries/dimers (blue), rows (green), and generic defects (red) can be recognised from incomplete SPM scans. Instead of detecting SPM tip states using complete scans, neural networks were taught to recognise partial scans by zero padding (a), or by classifying single linescans (b). In this case, non-defect categories had to be combined together (pink). However, optimal results were found by forming a "window" with a small group of 20 consecutive linescans, and giving additional predictive power by using a second LSTM network as the window is "rolled" over time (c). This network was found to perform the strongest at single class classifications, and showed good responsiveness with varying tip state.
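Traces of network confidence against the number of lines scanned, such as those in Figure 4, suggest one simple way to flag a possible tip change: look for a large step between successive confidence vectors. The sketch below is a hypothetical illustration of that idea rather than a method taken from this work; the function name `detect_jumps`, the L1 distance, and the threshold of 0.5 are all assumptions.

```python
from typing import List, Sequence


def detect_jumps(confidences: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> List[int]:
    """Return indices where the confidence vector steps by more than `threshold`.

    confidences[j] is assumed to be the per-class network output after line j.
    The step size is measured as the L1 distance to the previous output.
    """
    jumps = []
    for j in range(1, len(confidences)):
        step = sum(abs(a - b) for a, b in zip(confidences[j], confidences[j - 1]))
        if step > threshold:
            jumps.append(j)
    return jumps


if __name__ == "__main__":
    # Two-class toy trace: stable for three lines, then an abrupt change.
    trace = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
    print(detect_jumps(trace))
```

A fixed threshold of this kind only catches outputs that "snap" to a new value; an output that drifts slowly would need a comparison over a longer baseline, and the appropriate threshold would have to be tuned on labelled tip-change positions.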

One alternative to padding incomplete scans is to train the network to classify the individual linescans that form an image, rather than an image in its entirety. As new lines are scanned, they could immediately be predicted. This negates much of the insensitivity and computational wastefulness caused by padding, and is demonstrated in Figure 5.

However, one consequence of basing predictions on individual linescans is that each linescan is stripped of its context within the rest of the scan. Acquiring more linescans should therefore not improve network performance. As such, a small amount of context from the other scanlines in the image can be supplied by applying an additional layer to cumulatively average the network predictions using the equation

$$
P(A_{ij}) = \frac{\sum_{k=1}^{j} P(A_{ik})}{j}, \tag{2}
$$

where $P(A_{ij})$ is the cumulatively averaged vector describing the predictions of the jth linescan of the ith image in the dataset. A prediction for the entire image is therefore found when the condition j = N is met.

Figure 5.

Figure to demonstrate a potential method to allow neural networks to predict the state of an SPM tip using incomplete scans. Instead of training/predicting with complete scans, the network (green border) was allowed to predict individual linescans. As more linescans become available during a scan, network predictions are cumulatively averaged to give context between successive linescans.

We also note that although the cumulative averaging provided context to the predictions, the actual predictive part of the network was unaware of the surrounding linescans. Whilst this averaging therefore served to reward consistent single-class output, it should be expected to have poor responsiveness to scans where the tip constantly changes shape. Furthermore, the network had little-to-no ability to distinguish between features that cannot be distinguished at the 1D level. For example, a single linescan of the 'atoms' or 'rows' features in Figure 1 would in both cases resemble a half-rectified sinusoid, and so appear identical. The varying scan areas of the dataset, required to make predictions invariant to scan area, then prevent the network from learning any spatial information to distinguish between the two states. As such, the number of tip states was simplified to just two: "generic defect" and "visible resolution".

Adaptations also had to be made to the network architecture. Because 2D convolutions cannot be performed on 1D data, the 2D layers of the network were replaced with their one-dimensional counterparts to provide the closest possible comparison between the protocols. Furthermore, because successive lines were often highly similar, only 1 in every 30 lines of each image was used during training to prevent improper training and decrease training time.

As before, performance was verified by iteratively predicting additional lines of the i images in the holdout set and calculating the cumulative predictions using Equation 2. This is shown in Figure 6.
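The cumulative averaging of Equation 2 amounts to a running mean over the per-linescan prediction vectors. A minimal sketch in plain Python follows; the function name `cumulative_average` and the list-of-lists input are assumptions for illustration, and the per-line predictions would in practice come from the 1D network.

```python
from typing import List, Sequence


def cumulative_average(predictions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Running mean of per-linescan prediction vectors.

    predictions[k] is the class-probability vector for linescan k + 1.
    Entry j - 1 of the result is the mean of the first j vectors.
    """
    if not predictions:
        return []
    running = [0.0] * len(predictions[0])
    averaged = []
    for j, vector in enumerate(predictions, start=1):
        running = [total + value for total, value in zip(running, vector)]
        averaged.append([total / j for total in running])
    return averaged


if __name__ == "__main__":
    # Toy two-class example: one outlying line among consistent predictions.
    per_line = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    for j, mean in enumerate(cumulative_average(per_line), start=1):
        print(j, mean)
```

The example also illustrates the responsiveness issue noted in the text: once many lines have been averaged, a single new prediction shifts the mean by only 1/j, so late changes in tip state are slow to register.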
To compare with full-sized performance, the 1D convolutions

Figure 6.

Panels: (a) balanced accuracy and (b) mean AUROC, each plotted against the number of lines, j.

Without the cumulative averaging layer, the low standard deviation demonstrated that performance remained near constant as expected, with AUROC of [...]

Whilst single linescans provide an effective method to make a basic assessment of the tip, the inability to assess the complete range of states makes it of limited use. To overcome the lack of context between linescans, a CNN could instead be trained to recognise a small "window" consisting of a fixed number, W, of linescans. As new data becomes available, the window could then be "rolled" to consist of the new line and the (W − [...] j = W + 1 to j = N

$$
P(A_{ij}) = \frac{\sum_{k=W+1}^{j} P(A_{i(k-W):k})}{j}. \tag{3}
$$

However, whilst effective at improving single-state classification performance and rewarding tip consistency, cumulative averaging does not make the predictive part of the neural network aware of the lines surrounding each window, resulting in decreased responsiveness. One recent advance in the area of video content recognition is the Long-Term Recurrent Convolutional Network (LRCN)[34], which has been shown to be highly effective at this task. Here, a second network is placed just before the final (dense) CNN layer (which reduces the output to a size equal to the number of classification categories). This second network is typically a long-short-term-memory (LSTM) network[35], which is often used for 1D sequence classification. The LSTM network then acts on the temporal domain of the data, giving context to the single CNNs, which have no knowledge of how the video frames link together. This can be made analogous to SPM, where each sub-image of width W becomes a video frame. The temporal element is seen as the window rolls when j increments over time with new data.
We therefore replace the cumulative averaging layer with an LSTM network with 256 hidden layers, and calculate predictions, $P(A_{i(j-W):j})$, from j = W + 1 to j = N as before, with increasing j chosen as the temporal axis. For consistency, we employ the same 2D CNN architecture as before. The resulting protocol is shown in Figure 7.

One consequence of this method is that W linescans must first be accumulated before any predictions can be made. As such, whilst larger W will give the network more data with which to make predictions, a larger number of linescans are required to be scanned before the window can be fully filled. For example, for a window of W = 20, predictions can only be made after the 20th, 21st, 22nd linescans, and so on. We also note that the size and number of convolutions used meant that predictions with W <

Figure 7.

Figure to demonstrate a potential method to allow neural networks to predict the state of an SPM tip using incomplete scans. Rather than training/predicting with complete scans, the network (green border) can instead be allowed to predict a small group of individual linescans. This window of CNNs (of variable width) can then be rolled to make additional predictions as successive linescans become available over time. The outputs of these CNNs can then be fed into a second temporal neural network (LSTM), to make a final prediction.

is repeated with each additional window, multiplying memory usage by N − W + 1 times.

As can be seen in Figure 8, the inclusion of additional linescans once again resulted in improved performance, demonstrating that the LSTM component did indeed learn from the temporal evolution of the scans. Performance was also very strong regardless of j. For example, full scans with W = 20 yielded a near-perfect AUROC of 0.960 and a balanced accuracy of 0.847. This is almost identical to the AUROC and balanced accuracy of 0.963 and 0.856 respectively calculated when training the CNN component only on full-sized images. The wider LRCN networks were even able to exceed full-size performance, despite using less data. This is understandable, given that a human operator will often look not only at the scanlines, but also at how they evolve over time. Only the LRCN network takes advantage of this temporal context. It can therefore be concluded that by adding LSTM to an existing network and retraining on partial scans of fixed size, a full set of STM image classes/tip states can be correctly and accurately assessed with negligible performance impact despite using a fraction of the data. However, increasing W beyond W = 30 did not always improve performance.
Although wider windows provided more opportunities to observe trends in the 2D convolutional domain, smaller windows provided more temporal elements for the LSTM layer to use.

Figure 8.

Figure to demonstrate the balanced accuracy (a), and mean area under the receiver-operator-characteristic curve (b), as a function of the number of lines, j, of a neural network trained to classify the SPM tip states of the H:Si(100) surface with incomplete scans. Given that SPM data is generated one line at a time, the identical network can be trained with small groups of 20 (green), 30 (yellow), or 40 (blue) linescans, for example. These predictions are then fed into a secondary LSTM network that acts temporally. This prevents the need to train (and therefore classify) only on complete scans (black). In this way, the data requirements for neural network automated state detection can be reduced significantly.

The benefit of using temporal information can also be seen by comparing LRCN to cumulative averaging. For the same W = 20 window, full-scan performance using cumulative averaging was calculated to have AUROC of 0.880 and balanced accuracy of 0.620. Not only was this slightly worse than the padding method of Figure 3, but also significantly poorer than LRCN, which scored 9.10% higher for AUROC and 36.7% higher for balanced accuracy. This performance disparity held true regardless of the values of W and j, or when classifying variable state images. As seen in Figure 4(c), cumulative averaging was often unresponsive to sudden changes in state. Moreover, LRCN was more able to correctly distinguish between atoms and asymmetries, and was less likely to mistakenly see rotated surfaces as a "generic defect" compared to the baseline of full scan classification. Whilst it would seem obvious to combine both LRCN and cumulative averaging, the issues with decreased responsiveness later in a scan remain. This resulted in a small performance penalty which increased as more linescans were simulated (on the order of 1% at j = N). Whilst cumulative averaging was still better than guessing and is therefore another potential method for speeding up tip state recognition, LRCN is superior.

Furthermore, whilst the state of the tip was still successfully observed with W = 20, the size and number of convolutions used meant that window sizes below W = 20 were not possible to test. This meant that j = 20 lines must first be acquired before predictions can be made. To reduce the number of linescans further, larger images could instead be considered (which in this case would be achieved by downscaling from 512x512 to a size larger than 128x128). For example, simulating W = 20 with 256 points per linescan would be equivalent to 128 points per linescan with W = 10. However, the same number of data-points would need to be acquired before predictions could be [...] W, this would result in a different network that could not be fairly compared in this study. To allow for predictions at any j, it is trivial to create a "hybrid" network ensemble in which a basic assessment is made using the linescan/padding methods for low j, and then LRCN for the remainder of the scan.
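The rolling window used throughout this section can be sketched as a generator over the acquired linescans. This is an illustrative sketch under stated assumptions, not the authors' implementation: the name `rolling_windows` is hypothetical, and the indexing is chosen so that the first window is available once W lines exist, giving the N − W + 1 windows per full scan quoted in the text. Each window would be passed to the CNN, and the sequence of CNN outputs to the temporal (LSTM) network.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_windows(lines: Sequence[List[float]],
                    W: int) -> Iterator[Tuple[int, Sequence[List[float]]]]:
    """Yield (j, window) pairs, where window holds the W most recent linescans
    once j lines have been acquired. Nothing is yielded until j reaches W."""
    if W < 1:
        raise ValueError("W must be at least 1")
    for j in range(W, len(lines) + 1):
        yield j, lines[j - W:j]


if __name__ == "__main__":
    N, W = 128, 20
    # Stand-in scan: each "linescan" is a one-point list holding its own index.
    scan = [[float(k)] for k in range(N)]
    windows = list(rolling_windows(scan, W))
    print(len(windows))   # number of windows per full scan, N - W + 1
    print(windows[0][0])  # first j at which a prediction is possible
```

For a 128-line scan with W = 20 this yields 109 windows, which is the factor by which storing every window separately multiplies memory use relative to storing a single window.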

4. Conclusion

By comparing a variety of methods based around a common VGG network, we havesuccessfully demonstrated that STM images of the H:Si(100) surface can be accuratelyassessed using partial scans. As such, only a few lines from a typical 128x128 scan arenow required to assess the tip, which is a fraction of the data required by previous CNNassessment protocols. Given that the majority of the time spent maintaining SPM tipsis spent acquiring data, a ”hybrid” approach combining individual linescans and LRCNprediction would speed up CNN routines by approximately 100 times. This allows forstate recognition in a time similar to that of current manual means, thus making itpractical for everyday use. However, given that the states considered only apply to theH:Si(100) surface, new datasets and networks must be manually created and trained foreach surface, making this strategy non-applicable to poorly understood surfaces.Relative to a full-size network, we ﬁnd that similar or better performance can beachieved with less data by creating a small window of multiple linescans, and addingan LSTM layer to make predictions as the window is rolled over time. Furthermore,we qualitatively demonstrate that the use of partial linescans allows tip changes to bedetected without the need for a secondary network. We also show that this methodallows for the detection of images in which tip changes cause multiple tip states to bepresent, alongside their relative position in the image. However, the low number ofhuman classiﬁers and lack of manual labelling of these positions during data collectionmeant that only single tip-state images were quantitatively assessed. 
Furthermore, none of these approaches overcome the limitation of only being able to automate assessment of a single, already known surface reconstruction after a lengthy data collection process.

In future, we aim to assess SPM tips with a "hybrid" approach combining multiple protocols of predicting with padded full-scans, individual linescans, and temporally connected partial scans of fixed width. Ultimately, this will enable seamless, automatic and constant maintenance of SPM tip integrity as part of routine experimental sessions. Unsupervised learning is the next, obvious, protocol to adopt in order to make machine learning strategies sample-independent.

Acknowledgements

The authors gratefully acknowledge funding by the Engineering and Physical Sciences Research Council via grant EP/N02379X/1. We also thank I. Swart, L. Knijff, S.E. Freeney, and S. Zevenhuizen of the Debye Institute for Nanomaterials Science at Utrecht University for their continued assistance and advice (including support for the invaluable MATE-for-Dummies and access2TheMatrix Python packages). We gratefully acknowledge helpful discussions with Bob Wolkow, John Randall, Morten Moller (who also provided the dataset of H:Si(100) images used in this work), and Richard Woolley.