Nanosurveyor: a framework for real-time data processing
Benedikt J. Daurer, Hari Krishnan, Talita Perciano, Filipe R.N.C. Maia, David A. Shapiro, James A. Sethian, Stefano Marchesini
Abstract
Scientists are drawn to synchrotrons and accelerator-based light sources because of their brightness, coherence and flux. The rate of improvement in brightness and detector technology has outpaced the Moore's law growth seen for computers, networks, and storage, and is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Here we present an integrated software/algorithmic framework designed to capitalize on high-throughput experiments, and describe the streamlined processing pipeline of ptychography data analysis. The pipeline provides throughput, compression, and resolution, as well as rapid feedback to the microscope operators.
Keywords: streaming; ptychography
When new drugs are synthesized [1], dust particles are brought back from space [2], or new superconductors are discovered [3], a variety of sophisticated X-ray microscopes, spectrometers and scattering instruments are often summoned to characterize their structure and properties. High-resolution and hyperspectral X-ray imaging, scattering and tomography instruments at modern synchrotrons are among the workhorses of modern discovery, used to study nano-materials and characterize chemical interactions or electronic properties at their interfaces.

A new generation of microscopes is being pioneered, commissioned and planned at several U.S. Department of Energy (DOE) user facilities [4, 5, 6] and elsewhere to achieve superior resolution and contrast in three dimensions, encompassing a macroscopic field of view and chemical or magnetic sensitivity, by coupling together the brightest sources of tunable X-rays, nanometer positioning, nanofocusing lenses and faster detectors. Existing soft X-ray detector technology in use at the Advanced Light Source (ALS), for example, generates 350 MBytes/second per instrument [7], commercial detectors for hard X-rays can record 6 GB/second of raw data per detector [8, 9], and a synchrotron light source can support 40 or more experiments simultaneously, 24 hours a day.

* Correspondence: [email protected], [email protected], [email protected]. Lawrence Berkeley National Laboratory, Berkeley, CA, USA. Full list of author information is available at the end of the article.
Furthermore, accelerator technology such as the multi-bend achromat [10] will increase brightness by two orders of magnitude around the globe [11, 12].

Transforming massive amounts of data into the sharpest images ever recorded will help mankind understand ever more complex nano-materials and self-assembled devices, and study the different length scales involved in life - from macro-molecular machines to bones - where observing the whole picture is as important as recovering the local arrangement of the components. In order to do so, there is a need for reducing raw data into meaningful images as rapidly as possible, using the fewest possible computational resources, to sustain ever-increasing data rates.

Modern synchrotron experiments often have quite complex processing pipelines, iterating through many different steps until reaching the final output. One example of such an experiment is ptychography [15, 16, 17], which enables one to build up very large images by combining the large field of view of a high-precision scanning microscope system with the resolution provided by diffraction measurements.

Ptychography uses a small step size relative to the size of the illuminating beam when scanning the sample, continuously generating large redundant datasets that can be reduced into a high-resolution image. The resolution of a ptychography image does not depend on the size or shape of the illumination. X-ray wavelengths can probe atomic and subatomic scales, although resolution in scattering experiments is limited by other factors - such as radiation damage, exposure and brightness of the source - to a few nanometers, except in special cases (such as periodic crystals). To reconstruct an image of the object from a series of X-ray scattering experiments, one needs to solve a difficult phase retrieval problem, because at short wavelengths it is only possible to measure the intensity of the photons on a detector.
The phase retrieval problem is made tractable in ptychography by recording multiple diffraction patterns from overlapping regions of the object, providing redundant datasets to compensate for the lack of phase information. The problem is made even more challenging in the presence of noise, experimental uncertainties, optical aberrations, and perturbations of the experimental geometry, which require specialized solvers and software [18, 19, 20].

In addition to its complex reconstruction pipeline, a ptychography experiment involves additional I/O operations such as calibrating the detector, filtering raw data, and communicating parameters (such as X-ray wavelength, scan positions, detector distance, and flux or exposure times) to the analysis infrastructure.

Large community-driven projects have developed frameworks optimized for distributed data stream processing. Map-Reduce based solutions such as Hadoop [21, 22] and Spark [23] provide distributed I/O, a unified environment, and hooks for running map and reduce operations over a cloud-based network. Other frameworks such as Flink [24], Samza [25], and Storm [26] are more tailored for real-time stream processing of tasks, executing a directed acyclic graph (DAG) [27] of operations as fast as possible. Workflow graphs such as Luigi [28] and Dask Distributed [29, 30] provide an iterative component, but are either optimized for batch processing or treat workers as a singular entity able to execute the DAG in its entirety.

Such frameworks target operations as a unit of tasks and generalize the notion of resources; however, the ecosystem is harder to decentralize. These paradigms are not easily mappable to a production beamline environment, where data acquisition from a detector might be running on a field-programmable gate array (FPGA), the motion control system on a real-time MCU, the acquisition control on a Windows operating system, and the scientist's analysis on a macOS laptop.
The rest of the pipeline tasks might hop across several different architectures, including CPUs for latency-bound tasks and GPUs for high-throughput image processing and visualization. While frameworks such as Flink, along with Kafka [31] (a high-throughput distributed messaging system) and ZooKeeper [32] (distributed coordination and management), could be adapted to fit the described processing environment, our solution accomplishes the same task at a lower level with fewer computational and human resources.
Nanosurveyor is a modular framework to support distributed real-time analysis and visualization of data. The framework makes use of a modular infrastructure similar to Hummingbird [33], which was developed to monitor flash X-ray imaging experiments at free-electron lasers (FELs) with high data rates in real time over multiple cores and nodes. Within this framework, we developed a streamlined processing pipeline for ptychography which unifies all components involved and allows users to monitor and quickly act upon changes along the experimental and computational pipeline.
Nanosurveyor was developed to provide real-time feedback through analysis and visualization for experiments performed at synchrotron facilities, and to execute a complex set of operations within a production environment. Its design is such that it can be effectively adapted to different beamline environments. It is built around a client-server infrastructure, allowing users to use facility resources while located at a beamline or remotely, operating on live data streamed from the beamline. Additionally, one can use the Nanosurveyor user interface for off-line processing of experimental data saved on disk. In this section we describe the resources and capabilities provided by the modular streaming infrastructure.
As described above, Nanosurveyor is designed to be adaptable and modular. Therefore, we designed it with a client-server infrastructure (Figure 1), enabling users to run their experiment while at the beamline or remotely from their institution. This strategy also allows the client to be very light and flexible, while the server can be scaled according to the resources needed.

The Nanosurveyor infrastructure equips each module with two fundamental capabilities. First, a description format language of key-value pairs allows every module to describe its input and output. Second, it provides the ability to describe the connections between the modules, including the front-end.

The capability to connect the communication path between modules allows the end-to-end pipeline to be constructed and described seamlessly. This is done through a proxy communication layer, allowing the modules to run either closely together or on completely separate machines. This strategy is transparent to the beamline user and accommodates both environments with centralized resources and those where resources are spread across a network. Additionally, as each module in the pipeline can be executed in its own environment,
Nanosurveyor provides dynamic parallelism by allowing the user to scale the number of resources available to each step: this is done by treating each stage as a worker process that can be scaled up or down to address bottlenecks or performance issues.

Figure 1: Overview of the real-time streaming framework of Nanosurveyor. The modular server-client infrastructure is divided into a back-end (running the data processing unit) and a front-end (running the visualization and control unit). Once an experiment has started, the data collection unit continuously receives new data packets from a detector and sends raw data frames to the data processing unit. Depending on the specific needs of the experiment, data is corrected, reduced and reconstructed, and various outputs are written to file. At all times, there is an active connection (asynchronous socket communication) between all components (including the visualization interface), allowing the user to monitor progress while data is still being acquired and processed.
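The stage-as-worker-process idea can be sketched in a few lines of Python. This is only an illustration of the pattern - Nanosurveyor connects real worker processes over ZeroMQ sockets, whereas here a thread pool stands in for them, and the frame data and function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for one pipeline stage: each "worker" pulls a raw
# frame, processes it, and forwards the result. (Nanosurveyor runs stages
# as separate processes connected by ZeroMQ sockets; here threads stand
# in for processes, and the per-frame work is a toy computation.)

def clean_frame(raw):
    """Toy per-frame work: subtract a constant dark level, clip at zero."""
    dark = 10
    return [max(v - dark, 0) for v in raw]

def run_stage(frames, n_workers):
    """Run one pipeline stage with a scalable pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(clean_frame, frames))

frames = [[12, 9, 30], [40, 10, 11]]
# Scaling the stage up or down is just a matter of changing n_workers.
print(run_stage(frames, n_workers=2))  # [[2, 0, 20], [30, 0, 1]]
```
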
The core components of the Nanosurveyor streaming software are written in Python, using ZeroMQ, a high-performance messaging library [34], for network communication, PyQt4 [35] and PyQtGraph [36] for the graphical user interface (GUI) and visualization, and Numpy [37] together with Scipy [38] for manipulation of data arrays. For some components, we used C extensions in order to boost performance to meet the demands of producing a real-time interactive tool running at the beamline.

Python is a language with a robust and active community, with libraries that are well tested, supported, and maintained. Additionally, the choice of Python allows our infrastructure to be flexible to the varying requirements of different processing pipelines. The ptychography pipeline (discussed in detail later in the paper) contains GPU-optimized code, and Python binding support easily allows the Nanosurveyor infrastructure to support these types of hybrid architectures. The framework currently runs on Mac, Linux and Linux-based cluster environments, and can be extended to Windows platforms depending on support for module dependencies. The core components that Nanosurveyor depends on are available on all major platforms.
A critical component in generating usable real-time pipelines is the communication infrastructure. It enables a clear and concise separation of the inputs and outputs at the module level. Furthermore, it defines how modules communicate from beginning to end, and ensures that tasks are load-balanced to achieve the appropriate performance characteristics of the pipeline.

The communication in Nanosurveyor uses JavaScript Object Notation (JSON) [39], an industry-standard way of conveying metadata between modules as well as between the front-end and back-end. The metadata provides a human-readable component.

ZeroMQ provides the communication backbone of the Nanosurveyor infrastructure. Using the publisher-subscriber model for the core components enables Nanosurveyor to provide a load-balancing scheme, which uses a backlog queue to avoid losing data when sufficient resources are not available. The execution pipeline creates a command port and a data port. The command port allows metadata to reach and update parameters, as well as return responses to keep status requests alive and provide feedback on the current state of the running module. The data port moves data through the pipeline, running the actionable item within each module and moving the result to the output queue to be processed by the next stage of the pipeline.

Two types of configuration are required: front-end and back-end. The front-end configuration sets up the variables necessary for each module to function, while the back-end configuration is responsible for allocating resources, balancing the load of workers, scheduling activities, and communicating between modules while providing feedback to the front-end. These two components provide the Nanosurveyor infrastructure with the information it needs to establish the relevant connections, receive and send parameters to ensure proper configuration, and introspect the state of parameters and data to provide visual feedback to the user when running through the processing pipeline.
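As a rough illustration of the kind of key-value description that travels over the command port, consider the following sketch; the paper does not spell out the actual schema, so all field names below are assumptions:

```python
import json

# Hypothetical example of a key-value module description conveyed as
# JSON metadata. The exact schema used by Nanosurveyor is not shown in
# the paper, so every field name here is illustrative only.
module_description = {
    "name": "frame_worker",
    "inputs": {"raw_frame": "uint16 array"},
    "outputs": {"clean_frame": "float32 array"},
    "parameters": {"dark_subtraction": True, "downsample": 4},
}

# Serialize for the command socket...
message = json.dumps(module_description)

# ...and recover it on the receiving side; JSON keeps it human-readable.
received = json.loads(message)
print(received["parameters"]["downsample"])  # 4
```
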
The Nanosurveyor framework consists of an assortment of core components that ensure that the front-end provides an easy-to-use and adaptable interface while the back-end is efficient, resilient, and responsive. The individual processing modules are all based on the same structure: an event loop runs, routing data from the control and data sockets, waiting for tasks, asking the handler for configuration parameters (a JSON string), and processing data (receiving/sending through the data socket).
The main back-end handler runs a large ZeroMQ event loop. The main task of the handler is to register the modules that run on the back-end and ensure the data and control paths are appropriately connected and running. It also does the following:

• Launches all the processing modules as separate processes (single-core or MPI) and keeps track of the jobs started. This can be done with a batch processing system such as SLURM (or any other queuing system) or by launching separate Python processes;
• Creates the sockets for the streaming pipeline, i.e. a list of control and data sockets communicating between the handler and all the processing modules, as well as the data collector and the interface;
• Runs the event loop, takes commands, deals out data packets and handles everything in the back-end, including user interruption and other control and configuration commands.
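A minimal sketch of the dispatch structure of such a handler loop, assuming a registry of module callbacks and a control channel (a real implementation would poll ZeroMQ control and data sockets rather than an in-memory queue, and the module and command names below are hypothetical):

```python
from queue import Queue, Empty

# Minimal sketch of a handler-style event loop: commands arrive on a
# control channel and are dispatched to registered back-end modules.
# ZeroMQ sockets are replaced by a Queue to keep the sketch runnable.

class Handler:
    def __init__(self):
        self.modules = {}
        self.control = Queue()

    def register(self, name, callback):
        """Register a back-end module's command callback."""
        self.modules[name] = callback

    def run_once(self):
        """Process all pending control messages (one pass of the loop)."""
        replies = []
        while True:
            try:
                target, command = self.control.get_nowait()
            except Empty:
                return replies
            replies.append(self.modules[target](command))

handler = Handler()
handler.register("frame_worker", lambda cmd: f"frame_worker: {cmd} ok")
handler.control.put(("frame_worker", "status"))
print(handler.run_once())  # ['frame_worker: status ok']
```
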
Tracking and ensuring the correctness of data is an important part of the execution pipeline. The Nanosurveyor framework provides a module called nscxwrite which allows customized writing of files at different stages of the data acquisition pipeline (raw, filtered, and reconstructed). This capability provides several benefits, such as assurance to users that data moves correctly from module to module and is not corrupted along the way, as well as the ability to debug an algorithm that is executed within a complex sequence of events.

Furthermore, the saving of intermediate data can be enabled or disabled (for performance reasons or to reduce storage) as well as customized. The framework also comes with a standalone script called nsraw2cxi, which translates raw detector data to processed CXI files, and a script to stream simulated FCCD data through the pipeline for testing. The data format of the output files follows the CXI file format [40].
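The enable/disable behavior of per-stage intermediate writing can be illustrated as follows; the real nscxwrite module writes HDF5-based CXI files, so the plain JSON files and the class name here are stand-ins chosen only to keep the sketch dependency-free:

```python
import json, os, tempfile

# Simplified stand-in for per-stage intermediate writing: each stage can
# be toggled on or off, mirroring how saving can be disabled for
# performance or storage reasons. (The real module writes CXI/HDF5.)

class StageWriter:
    def __init__(self, directory, enabled_stages):
        self.directory = directory
        self.enabled = set(enabled_stages)

    def write(self, stage, payload):
        """Persist a stage's output only if that stage is enabled."""
        if stage not in self.enabled:
            return None
        path = os.path.join(self.directory, f"{stage}.json")
        with open(path, "w") as fh:
            json.dump(payload, fh)
        return path

tmp = tempfile.mkdtemp()
writer = StageWriter(tmp, enabled_stages={"raw", "reconstructed"})
print(writer.write("raw", {"frame": [1, 2, 3]}) is not None)       # True
print(writer.write("filtered", {"frame": [0, 2, 0]}) is not None)  # False
```
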
Nanosurveyor also provides a way to debug a complex pipeline through logging of both the output and error channels, which includes communication between modules as well as output and errors that arise from within modules. The output of all modules is piped to STDOUT and STDERR within the file system running each process ($HOME/.nanosurveyor/streaming/log/). This is a useful tool that invokes tail -f on the piped out/err files, making it possible to monitor what is going on within the individual processing modules.
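The effect of tail -f on a module's log file can be approximated in a few lines, which illustrates the monitoring idea; this is a sketch, not Nanosurveyor's actual log viewer:

```python
import os, tempfile

# Approximate `tail -f`: remember the read offset in the log file and
# return only the lines appended since the last poll.

def follow(path, offset):
    """Return (new_lines, new_offset) for content appended after offset."""
    with open(path, "r") as fh:
        fh.seek(offset)
        data = fh.read()
        return data.splitlines(), fh.tell()

log = os.path.join(tempfile.mkdtemp(), "frame_worker.out")
with open(log, "w") as fh:
    fh.write("frame 1 processed\n")

lines, pos = follow(log, 0)
print(lines)  # ['frame 1 processed']

with open(log, "a") as fh:
    fh.write("frame 2 processed\n")

lines, pos = follow(log, pos)
print(lines)  # ['frame 2 processed']
```
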
For the front-end, the framework provides a versatile GUI based on PyQt4 and PyQtGraph for monitoring, visualizing and controlling the data processed live or post-processed through the pipeline. PyQt4 (built on Qt) provides the ability to construct and modify the user interface to easily add and remove functionality, while PyQtGraph provides access to advanced visualization functionality for data that can be represented as images or volumes. Several common operations provided through the framework include:

• Viewing the content of already processed files: inspect reconstructions from collected data, with other useful utilities (histograms, error line plots, correlation plots, and others);
• Controlling and monitoring the streaming: configure streaming, inspect the live reconstruction, monitor performance (upload/download rates, status updates of the streaming components);
• Simulating an experiment, starting from an SEM image or similar;
• Processing and inspecting, through a provided interface, data from custom modules processed on the back-end (e.g. data from a ptychography or tomography reconstruction).

Generally speaking, the design facilitates adding new modules to the GUI, e.g. a viewer for tomograms or similar. This flexibility allows the front-end to be customized for different beamline processing environments. Finally, the architecture aims to be modular in both the front- and back-end of the client-server architecture, meaning that there is a template structure for the basic features of a processing module. Additionally, in principle, any given processing module can be hooked into this network (e.g. tomography, spectral analysis, or any other image analysis).
We adapted the streaming framework described above to the specific needs of ptychography and are currently implementing this ptychography streaming pipeline at the beamline for scanning transmission X-ray microscopy (STXM) at the Advanced Light Source (ALS). The main motivation for this project is to make high-resolution ptychographic reconstructions available to the user in real time. To achieve this goal, we streamlined all relevant processing components of ptychography into a single unit. A detailed outline of our pipeline is sketched in Figure 2.

Figure 2: Streaming pipeline implemented at the ALS for ptychographic imaging. The software structure follows the same logic as sketched in Figure 1. Once a new scan has been triggered by the experimental control, a frame-grabber continuously receives raw data packets from the camera, assembles them into frames and sends raw frames to the back-end. Incoming frames are processed by different (and independent) workers of the back-end, and reduced data is sent back to the front-end and visualized in a graphical user interface (GUI). A handler coordinates the data and communication workflow.

As described in the previous sections, we follow the idea of a modular streaming network using a client-server architecture, with a back-end for the ptychographic processing pipeline and a front-end for configuration, control and visualization purposes.

On the back-end side, the streaming infrastructure is composed of a communication handler and four different kinds of workers addressing dark frames, diffraction frames, reduced and downsampled images, and the ptychographic reconstruction, the latter using a software package for scalable heterogeneous adaptive real-time ptychography (SHARP [20]). The handler bridges the back-end with the front-end and controls the communication and data flow among the different back-end workers. The dark worker accumulates dark frames and provides statistical maps (mean and variance) of the noise structure on the detector. The frame workers transform raw frames into clean (pre-processed) diffraction frames; this involves subtraction of the average dark, filtering, photon counting and downsampling. Depending on the computing capacity of the back-end, it is possible to run as many frame workers simultaneously as needed. The image worker reduces a collection of clean diffraction frames, producing low-resolution image reconstructions and an initial estimate of the illumination function which, together with the clean diffraction frames, is then fed as input to the high-resolution ptychographic reconstruction worker (SHARP).

The front-end consists of a worker that reads raw data frames from a fast charge-coupled device (FCCD) [41], coordinating with a separately developed interface for controlling the experiment (such as motors and shutters), and a graphical user interface (GUI) which is used both for visualizing and controlling the ongoing reconstruction. An example view of the GUI for streaming ptychography is shown in Figure 3.

Figure 3: Graphical user interface (GUI) for the ptychographic streaming pipeline implemented at the ALS. The interface provides (a) a real-time view of the ptychographic reconstruction (high resolution), (b) a real-time view of the STXM analysis (low resolution), (c) the current guess of the illumination function, (d) the current processed data frame, (e) logging and error messages, and (f) error metrics of the iterative reconstruction process, with other control and monitoring elements around them.

Following the data flow along the streaming pipeline, the starting trigger comes from the control interface, which initiates a new ptychographic scan, providing information about the scan (step size, scan pattern, number of scan points) and other relevant parameters (e.g. wavelength) to the back-end handler. Simultaneously, the control sends triggers to the scanning motors and the FCCD. A typical ptychographic scan combines the accumulation of a given number of dark frames with scanning the sample over a region of interest. The frame-grabber, already waiting for raw data packets to arrive, assembles the data and sends it frame-by-frame to the back-end handler. When dealing with an acquisition control system that runs independently, the handler can distinguish between dark and data frames using counters. Dark and data frames are distributed to the corresponding workers.

With clean diffraction frames and an initial guess for the illumination ready, the SHARP worker is able to start the iterative reconstruction process. SHARP initializes and allocates space to hold all frames in a scan, computes a decomposition scheme, initializes the image and starts the reconstruction process. Unmeasured frames are either set to a bright-field frame (measured by removing the sample) or their weight is set to 0 until real data is received.

Depending on the configuration, data at different stages within the streaming flow can be displayed in the GUI and/or saved to a CXI file via the nscxiwrite worker module.

All components of the streaming interface run independent event loops and use asynchronous (non-blocking) socket communication. To maximize performance, the front-end operates very close to the actual experiment, while the back-end runs remotely on a powerful GPU/CPU cluster.
We developed the following processing scheme for denoising and cleaning the raw data from the FCCD and preparing frames for the ptychographic reconstruction:

1. Define the center (acquire some diffraction frames and compute the center of mass if needed). This is needed for cropping, and to deal with beamstop transmission;
2. Average dark frames: we first acquire a sequence of frames when no light is present, and compute the average and standard deviation of each pixel and readout block. We set a binary threshold to define bad (noisy) pixels or bad (noisy) ADC channels, when the standard deviation is above a threshold or if the standard deviation is equal to 0;
3. Remove the linear or quadratic offset using the overscan: stretch out the readout sequence in time and fit a second-order polynomial over the overscan;
4. Identify the background by thresholding;
5. Perform a Fourier transform of the readout sequence of the background for each channel, remove high-frequency spikes by thresholding, and subtract from the data;
6. Threshold signal below 1 photon;
7. Divide by the beamstop transmission;
8. Crop the image around the center;
9. Downsample: take a fast Fourier transform (FFT), crop or multiply by a kernel (e.g. Gaussian) and take the inverse fast Fourier transform (IFFT).
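A few of these steps (dark averaging and subtraction, photon thresholding, and Fourier-space downsampling) can be sketched with Numpy; the thresholds, array sizes and function name are illustrative, not the beamline's calibrated values:

```python
import numpy as np

def clean_frame(raw, darks, photon_adu=5.0, out_size=8):
    """Sketch of steps 2, 6 and 9 for a single square raw frame."""
    # Step 2: average the dark frames and subtract the mean dark.
    frame = raw - darks.mean(axis=0)
    # Step 6: threshold signal below one photon (photon_adu is illustrative).
    frame[frame < photon_adu] = 0.0
    # Step 9: downsample by cropping around the DC term in Fourier space.
    f = np.fft.fftshift(np.fft.fft2(frame))
    c, half = frame.shape[0] // 2, out_size // 2
    f_crop = f[c - half:c + half, c - half:c + half]
    # Cropping keeps the DC term, so the total intensity is preserved.
    return np.fft.ifft2(np.fft.ifftshift(f_crop)).real

darks = np.full((10, 16, 16), 2.0)   # simulated dark frames
raw = np.full((16, 16), 2.0)         # dark level everywhere...
raw[8, 8] += 100.0                   # ...plus one bright pixel
small = clean_frame(raw, darks)
print(small.shape)  # (8, 8)
```
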
For testing the functionality and performance of the streaming ptychography pipeline, as well as exploring different configurations, we developed a protocol that simulates an entire ptychography scan. Using a simulated illumination from a Fresnel zone plate (FZP) and basic scan parameters (number of scan points, scanning step size, scanning pattern), diffraction patterns from a well-known test sample are calculated in the same raw data format as those generated by the FCCD. As a last step, Poisson noise and a real background are added to the data. These raw data packets, together with the simulated metadata, are introduced into the end-to-end streaming pipeline and produce outputs as shown in Figure 3.

One major benefit of this feature is the ability to scale and test the pipeline at different acquisition rates, and therefore to provide performance metrics on the behavior of a sequence of algorithms, enabling developers to further improve their execution pipeline. In a simple performance test, we simulated a 40x40 scan producing 1600 raw data frames, which were sent by a virtual FCCD at a rate of 10 Hz. At the end of the pipeline, we observed a complete reconstructed image after around 5 minutes. This translates into a streaming pipeline rate of about 2 Hz, with most of the time spent on filtering and cleaning the individual frames. A significant portion of the pre-processing time is unique to the FCCD pipeline. While this rate is still far from ideal, it can easily be sped up and scaled by using parallel execution, load-balancing strategies, and eventually high-throughput GPU optimizations. With further improvements in the performance of the individual components, as well as optimization of the network communication, we expect a substantial increase in the processing rate.
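The simulation idea can be sketched as follows: diffraction patterns are computed as the squared magnitude of the Fourier transform of the illuminated sample, and Poisson noise plus a background are added before the frames are emitted one by one. The probe, sample, sizes and flux below are toy stand-ins, not the FZP illumination used in the real simulator:

```python
import numpy as np

# Toy sketch of the scan simulation: for each scan position, form the
# exit wave (probe times a patch of the sample), take |FFT|^2 as the
# diffraction intensity, then add a background and Poisson noise.

rng = np.random.default_rng(0)

def simulate_scan(sample, probe, positions, background, flux=1e4):
    """Yield one noisy raw diffraction frame per scan position."""
    half = probe.shape[0] // 2
    for (y, x) in positions:
        exit_wave = probe * sample[y - half:y + half, x - half:x + half]
        intensity = np.abs(np.fft.fft2(exit_wave)) ** 2
        intensity *= flux / max(intensity.sum(), 1e-12)  # set photon budget
        yield rng.poisson(intensity + background).astype(np.uint16)

sample = np.ones((32, 32), dtype=complex)
probe = np.ones((8, 8), dtype=complex)
positions = [(8, 8), (8, 12), (12, 8), (12, 12)]
frames = list(simulate_scan(sample, probe, positions, background=1.0))
print(len(frames), frames[0].shape)  # 4 (8, 8)
```
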
Experimental data produced by the FCCD can involve missing frames, corrupted frames, and timing issues between different hardware and software components. In addition, the correct choice of parameter values for the ptychographic reconstruction might be inherent to the data itself and can thus vary from experiment to experiment. To make Nanosurveyor more robust in such cases, it is desirable to expose configuration parameters as a runtime or heuristic feature rather than determining them at execution time, and to take a more data-based approach where options are set based on feature detection.
Performance considerations and additional limitations must be understood and considered when integrating such an execution pipeline into a production environment. While the following list is not comprehensive, in building this environment we have considered the following:

• Limits (performance, algorithm, memory, disk) of software and hardware need to be considered. The Nanosurveyor infrastructure provides logging support, while the ZeroMQ publisher-subscriber model allows a stuck or crashed process to be replaced with another. The current solution can be made more robust, and this is active, ongoing work;
• Hardware failures are inevitable in a production environment involving machinery. Recovery from these types of issues requires customization for each beamline environment. Within Nanosurveyor, there is a heartbeat for each module and a base mechanism within the framework to inform the user that a failure (or multiple failures) might have occurred;
• Interrupting experiments should be a core use case of any real-time feedback loop when trying to understand the data as quickly as possible. Once information about the material is flowing through the computational pipeline, it is valuable to be able to determine if an experiment is, in fact, failing or uninteresting. This can occur in many ways, such as a wrong setup, wrong material or wrong scanning region. For these scenarios it is prudent for a working pipeline to be able to abort, clear out the pipeline, and reset itself;
• Expensive operations and algorithms executed in a beamline operating environment may have varying degrees of performance (see the bullet point on limits). These characteristics can often slow down the overall pipeline if any one of the operations is inefficient. Nanosurveyor attempts to get around this issue in two ways: first, it allows for a load-balancing approach where more workers can be added to the expensive stages of the pipeline; second, using the ZeroMQ queue, the beamline can still operate with the slowdown and backlog while ensuring that the pipeline continues to function, at least until hardware memory runs out. This issue can also be mitigated by evaluating the performance of each module and, where possible, optimizing the algorithm.
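The backlog behavior described in the last bullet can be illustrated with a bounded queue: a burst from a fast producer is absorbed until capacity (standing in for hardware memory, or ZeroMQ's high-water mark) runs out, after which data can no longer be enqueued:

```python
from queue import Queue, Full

# Toy illustration of the backlog: a bounded queue absorbs a burst from
# a fast producer while a slow consumer catches up, and only overflows
# once capacity (maxsize, standing in for memory) is exhausted.

backlog = Queue(maxsize=3)
dropped = []

for frame in range(5):  # fast producer bursts 5 frames
    try:
        backlog.put_nowait(frame)
    except Full:
        dropped.append(frame)  # what happens once the backlog is full

consumed = []
while not backlog.empty():  # slow consumer drains the backlog later
    consumed.append(backlog.get_nowait())

print(consumed, dropped)  # [0, 1, 2] [3, 4]
```
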
This work introduced
Nanosurveyor - a frameworkfor real-time processing at synchrotron facilities. Theinfrastructure provides a modular framework, supportfor load-balancing operations, the ability to run in adistributed client-server mode, and gives feedback oneach stage of a complex pipeline. The framework was adapted to support streamlinedpipelines for ptychography. In this case, expensivestages such as pre-processing are load-balanced withmultiple workers, and image reconstruction are paral-lelized over MPI to compute efficiently in a distributedmanner. Results from every stage of the pipeline arethen transmitted to the front-end, providing users atthe beamline comprehensive knowledge of the experi-ment and of how the data is transformed from start ofacquisition to end output. Although the
Nanosur-veyor framework provides several core capabilitiesthat are necessary for operating at typical beamlines,there are several key advances that we are currentlyworking on to make the computational pipeline com-plete. A couple of highlights include:
Iterative execution, instrument control:
Adding sup-port for controlling the beamline itself will completethe current pipeline and provide an iterative execu-tion loop enabling future pipelines to adaptively ac-quire and analyze data from the operating beamline,and automatically request more data when necessary.For example, if the reconstruction detects bad frames,or that the sample has drifted, then more frames canbe automatically requested on the fly without inter-rupting the overall experiment. If the reconstructiondetermines that part of the image being acquired isempty or uninteresting it could request fewer framesand focus on the relevant part of the sample.
Optimizing pipeline execution:
Currently, communication occurs over ZeroMQ, which provides many benefits, including dealing with backlog, automated load-balancing, and the ability to interleave work running different stages of the execution pipeline. We are also investigating ways to fuse modules to optimize execution times. Making communication agnostic by using handles enables efficient use of memory optimization strategies, socket communication, or saving on data movement costs, e.g., transferring data between GPU-based modules by moving a pointer rather than copying data.

In conclusion, we have presented a framework that is built to run at modern beamlines, can handle the geographic considerations between users and experiments running at synchrotron facilities, and supports real-time feedback. These features, along with the modular design, provide a foundation that can be extended and readily deployed on many of the beamlines in use today. Further information about Nanosurveyor is available at or upon request to [email protected].
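The handle-based communication idea mentioned above can be illustrated with a small sketch. This is an assumption about one possible design, not the shipped Nanosurveyor API: stages exchange small handles while large arrays stay in a shared store, so moving data between co-located modules costs a lookup rather than a copy.

```python
# Sketch of handle-based messaging: only the handle (a short token)
# travels through the pipeline; the payload stays in a shared store
# (a stand-in here for shared or GPU memory).
import uuid

store = {}

def put(array):
    handle = uuid.uuid4().hex    # small token that travels the pipeline
    store[handle] = array
    return handle

def get(handle):
    return store[handle]         # zero-copy resolution on the same node

h = put([0.0] * 1_000_000)       # "send" a large frame
frame = get(h)                   # consumer resolves the handle
```

Between GPU-based modules the same pattern would pass a device pointer instead of a dictionary key, avoiding a round trip through host memory.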
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
BJD, HK, TP, FM and SM designed and implemented the real-time streaming framework. BJD, HK, TP, JAS and SM wrote the manuscript with contributions from all. DAS translated the preprocessing code from Matlab to Python and helped us test the streaming ptychography framework at the ALS.
Acknowledgements
This work was partially funded by the Center for Applied Mathematics for Energy Research Applications, a joint ASCR-BES funded project within the Office of Science, US Department of Energy, under contract number DOE-DE-AC03-76SF00098, by the Swedish Research Council and by the Swedish Foundation for Strategic Research. The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Author details
Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Uppsala, SE. Lawrence Berkeley National Laboratory, Berkeley, CA, USA. Department of Mathematics, University of California, Berkeley, Berkeley, CA, USA.
References
1. Cavalier, M.C., Pierce, A.D., Wilder, P.T., Alasady, M.J., Hartman, K.G., Neau, D.B., Foley, T.L., Jadhav, A., Maloney, D.J., Simeonov, A., et al.: Covalent small molecule inhibitors of Ca2+-bound S100B. Biochemistry (42), 6628–6640 (2014)
2. Westphal, A.J., Stroud, R.M., Bechtel, H.A., Brenker, F.E., Butterworth, A.L., Flynn, G.J., Frank, D.R., Gainsforth, Z., Hillier, J.K., Postberg, F., et al.: Evidence for interstellar origin of seven dust particles collected by the Stardust spacecraft. Science (6198), 786–791 (2014)
3. Uchiyama, H., Shen, K., Lee, S., Damascelli, A., Lu, D., Feng, D., Shen, Z.-X., Tajima, S.: Electronic structure of MgB2 from angle-resolved photoemission spectroscopy. Physical Review Letters (15), 157002 (2002)
4. Nazaretski, E., Huang, X., Yan, H., Lauer, K., Conley, R., Bouet, N., Zhou, J., Xu, W., Eom, D., Legnini, D., Harder, R., Lin, C.-H., Chen, Y.-S., Hwu, Y., Chu, Y.S.: Design and performance of a scanning ptychography microscope. Review of Scientific Instruments (3) (2014). doi:10.1063/1.4868968
5. Winarski, R.P., Holt, M.V., Rose, V., Fuesz, P., Carbaugh, D., Benson, C., Shu, D., Kline, D., Stephenson, G.B., McNulty, I., et al.: A hard X-ray nanoprobe beamline for nanoscale microscopy. Journal of Synchrotron Radiation (6), 1056–1060 (2012)
6. Shapiro, D., Roy, S., Celestre, R., Chao, W., Doering, D., Howells, M., Kevan, S., Kilcoyne, D., Kirz, J., Marchesini, S., et al.: Development of coherent scattering and diffractive imaging and the COSMIC facility at the Advanced Light Source. In: Journal of Physics: Conference Series, vol. 425, p. 192011 (2013). IOP Publishing
7. Doering, D., Chuang, Y.-D., Andresen, N., Chow, K., Contarato, D., Cummings, C., Domning, E., Joseph, J., Pepper, J.S., Smith, B., Zizka, G., Ford, C., Lee, W.S., Weaver, M., Patthey, L., Weizeorick, J., Hussain, Z., Denes, P.: Development of a compact fast CCD camera and resonant soft X-ray scattering endstation for time-resolved pump-probe experiments. Review of Scientific Instruments (7), 073303 (2011). doi:10.1063/1.3609862
8. Broennimann, C., Eikenberry, E.F., Henrich, B., Horisberger, R., Huelsen, G., Pohl, E., Schmitt, B., Schulze-Briese, C., Suzuki, M., Tomizaki, T., Toyokawa, H., Wagner, A.: The PILATUS 1M detector. Journal of Synchrotron Radiation (2), 120–130 (2006). doi:10.1107/S0909049505038665
9. Dinapoli, R., Bergamaschi, A., Henrich, B., Horisberger, R., Johnson, I., Mozzanica, A., Schmid, E., Schmitt, B., Schreiber, A., Shi, X., et al.: EIGER: next generation single photon counting detector for X-ray applications. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (1), 79–83 (2011)
10. Eriksson, M., Al-dmour, E., Ahlbäck, J., Andersson, Å., Bocchetta, C., Johansson, M., Kumbaro, D., Leemann, S., Lilja, P., Lindau, F., et al.: The MAX IV facility. In: Journal of Physics: Conference Series, vol. 425, p. 072008 (2013). IOP Publishing
11. Almer, J., Chupas, P., Stephenson, B., Tiede, D., Vogt, S., Young, L., Evans, P., Parise, J., Suter, B.: Emerging opportunities in high-energy X-ray science: the diffraction-limited storage ring frontier. Synchrotron Radiation News (1), 12–13 (2016). doi:10.1080/08940886.2016.1124675
12. Reich, E.S., et al.: Ultimate upgrade for US synchrotron. Nature (7466), 148–149 (2013)
13. Tarawneh, H., Steier, C., Falcone, R., Robin, D., Nishimura, H., Sun, C., Wan, W.: ALS-II, a potential soft X-ray, diffraction limited upgrade of the Advanced Light Source. In: Journal of Physics: Conference Series, vol. 493, p. 012020 (2014). IOP Publishing
14. Borland, M., Sajaev, V., Sun, Y.: A seven-bend-achromat lattice as a potential upgrade for the Advanced Photon Source. Proc. of NA-PAC2013, MOPHO07, Pasadena, California, USA (2013)
15. Rodenburg, J.M.: Ptychography and related diffractive imaging methods. Advances in Imaging and Electron Physics (2008)
16. Rodenburg, J.M., Hurst, A.C., Cullis, A.G., Dobson, B.R., Pfeiffer, F., Bunk, O., David, C., Jefimovs, K., Johnson, I.: Hard-X-ray lensless imaging of extended objects. Phys. Rev. Lett., 034801 (2007). doi:10.1103/PhysRevLett.98.034801
17. Thibault, P., Dierolf, M., Menzel, A., Bunk, O., David, C., Pfeiffer, F.: High-resolution scanning X-ray diffraction microscopy. Science (5887), 379–382 (2008). doi:10.1126/science.1158573
18. Nashed, Y.S.G., Vine, D.J., Peterka, T., Deng, J., Ross, R., Jacobsen, C.: Parallel ptychographic reconstruction. Opt. Express (26), 32082–32097 (2014). doi:10.1364/OE.22.032082
19. ptypy. http://ptycho.github.io/ptypy/
20. Marchesini, S., Krishnan, H., Daurer, B.J., Shapiro, D.A., Perciano, T., Sethian, J.A., Maia, F.R.N.C.: SHARP: a distributed GPU-based ptychographic solver. Journal of Applied Crystallography (4), 1245–1252 (2016). doi:10.1107/S1600576716008074
21. Apache Hadoop. http://hadoop.apache.org/
22. White, T.: Hadoop: The Definitive Guide, 1st edn. O'Reilly Media, Inc., Sebastopol, CA (2009)
23. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. HotCloud'10, pp. 10–10. USENIX Association, Berkeley, CA, USA (2010). http://dl.acm.org/citation.cfm?id=1863103.1863113
24. Apache Flink. http://flink.apache.org/
25. Apache Samza. http://samza.apache.org/
26. Apache Storm. http://storm.apache.org/
27. Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210. UCL Press, London (1996)
28. Luigi: a workflow engine in Python. https://luigi.readthedocs.io/en/stable
29. Dask Development Team: Dask: Library for Dynamic Task Scheduling (2016). http://dask.pydata.org
30. Rocklin, M.: Dask: parallel computation with blocked algorithms and task scheduling. In: Huff, K., Bergstra, J. (eds.) Proceedings of the 14th Python in Science Conference, pp. 130–136 (2015)
31. Apache Kafka. http://kafka.apache.org/
32. Apache ZooKeeper. http://zookeeper.apache.org/
33. Daurer, B.J., Hantke, M.F., Nettelblad, C., Maia, F.R.: Hummingbird: monitoring and analyzing flash X-ray imaging experiments in real time. Journal of Applied Crystallography (3) (2016)
34. Hintjens, P.: ZeroMQ: Messaging for Many Applications. O'Reilly Media, Inc. (2013)
35. Riverbank Computing: (2016)
36. Campagnola, L.: (2016). http://pyqtgraph.org
37. van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering (2), 22–30 (2011). doi:10.1109/MCSE.2011.37
38. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open Source Scientific Tools for Python (2001–). [Online; accessed 2016-04-07]
39. ECMA International: (2016)
40. Maia, F.R.N.C.: The Coherent X-ray Imaging Data Bank. Nature Methods (9), 854–855 (2012). doi:10.1038/nmeth.2110
41. Denes, P., Doering, D., Padmore, H., Walder, J.-P., Weizeorick, J.: A fast, direct X-ray detection charge-coupled device. Review of Scientific Instruments 80