CHIPS: A Service for Collecting, Organizing, Processing, and Sharing Medical Image Data in the Cloud
Rudolph Pienaar, Ata Turk, Jorge Bernal-Rusiel, Nicolas Rannou, Daniel Haehn, P. Ellen Grant, Orran Krieger
Affiliations: Boston Children's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA; Boston University, Boston, MA 02115, USA; Harvard School of Engineering and Applied Sciences, Cambridge, MA 02138, USA; Eunate Technology S.L., Sopela, Spain
Abstract.
Web browsers are increasingly used as middleware platforms offering a central access point for service provision. Combining backend containerization, RESTful APIs, and distributed computing allows complex systems to be realized that address the needs of modern compute-intense environments. In this paper, we present a web-based medical image data and information management software platform called CHIPS (Cloud Healthcare Image Processing Service). This cloud-based service allows for authenticated and secure retrieval of medical image data from resources typically found in hospitals, organizes and presents information in a modern feed-like interface, provides access to a growing library of plugins that process these data, allows for easy data sharing between users, and provides powerful 3D visualization and real-time collaboration. Image processing is orchestrated across additional cloud-based resources using containerization technologies.

Keywords: web-based neuroimaging, big-data, applied containerization, web-based collaborative visualization, real-time collaboration, HTML5, web services, telemedicine, cloud-storage
Modern web browsers are becoming powerful platforms for advanced application development [9,6]. Advances in core web application technologies, such as modern web browsers' universal support of ECMAScript 5 (and 6) [10], CSS3, and HTML5 APIs, have made it much more feasible to implement powerful middleware platforms for data management, graphical rendering, and real-time communication purely in client-side JavaScript [12,2]. The last decade has seen a slow but steady shift to fully distributed solutions using web standards [4,11,15,17], closely tracked by the expressiveness of the JavaScript programming language. Web-based solutions are especially appealing as they do not require the installation of any client-side software other than a standard web browser, which enhances accessibility and usability.

Unrelated to the rise of web technologies, an emerging trend is the rapid adoption of containerization technologies. These have enabled the concept of compute portability in a similar sense to data portability. Just as data can be moved from place to place, containerization allows operations on that data to also be moved from place to place.

To our knowledge, no web-based platform currently exists that provides data- and compute-agnostic services, in particular collection, management, and real-time sharing of medical data, as well as access to pipelines that process that data. Some services, such as CBRAIN [15] and LONI [14], provide conceptually similar approaches, but do not have deep connectivity to typical hospital database repositories. In this paper, we introduce CHIPS (Cloud Healthcare Image Processing Service). CHIPS is a novel web-based medical data storage and data processing workflow service that provides strict data security while also facilitating secure, real-time interactive collaboration over the Internet and internal Intranets.
CHIPS is able to seamlessly collect data from typical sources found in hospitals (such as Picture Archive and Communications Systems, PACS) and easily export to approved cloud storage.
CHIPS not only manages data collection and organization, but also provides a large (and expanding) library of pipelines to analyze imported data, and the containerized compute can execute on a large variety of remote resources.
CHIPS provides for persistent record and management of activity in feeds as well as for powerful visualization of data. In particular, it makes use of the popular XTK toolkit, which was also developed by our team at the Fetal-Neonatal Neuroimaging and Developmental Science Center, Boston Children's Hospital, for the in-browser rendering and visualization of medical image data, and which can be freely downloaded from the web [8].

Fig. 1: CHIPS connects multiple input PACS sources to multiple "cloud" compute nodes.

The creation of CHIPS has been motivated by both clinical and research needs. On the clinical side, CHIPS was built to provide clinicians with easy access to large amounts of data (especially from hospital image databases like Picture Archive and Communications Systems, PACS), to provide for powerful collaboration, and to allow for easy access to a library of analysis processes or pipelines. On the research side, CHIPS was designed to allow computational researchers to test and develop new algorithms for image processing across heterogeneous platforms, while allowing life science researchers to focus on their research protocols and data processing, without needing to spend time on the minutiae of performing data analysis.

The system design is highly distributed, as shown in Figure 1, which shows a CHIPS deployment connected to multiple input sources and multiple compute sources. Though the figure suggests a single, discrete central point, components of CHIPS do reside on each input (PACS) and compute location.
Fig. 2: The internal CHIPS logical architecture.
Architecturally, CHIPS is not a single monolithic system, but a distributed collection of interconnected components, including a front-end webserver and web-based UI; a core RESTful back-end central server that provides access to all data, feeds, users, etc.; a DICOM/PACS interface; a set of independent RESTful microservices that handle inter-network data IO and also remote process management; and a core cloud-based computational platform that orchestrates offloading of image processing pipelines to some remote cloud-based compute (see Figure 2).

The top red box of Figure 2 contains the PACS node and represents the Hospital image data repository. The second, blue box, labeled Web-entry point and data hosting node, contains the main CHIPS back-end and is presented as being in a "cloud" (i.e. some resource that is accessible from the Internet). Finally, the bottom yellow box is shown on a separate "cloud" to emphasize that it is topologically distinct from the Web-entry point.

The logical relationships between data (represented as the rectangles with a tree structure) and compute elements (denoted by the named hexagons) are shown by either data connectors (thick blue arrows) or control connections (single line arrows). In the syntax of the diagram, the stylized cloud icon touching some of the boxes denotes that these compute elements are controlled by a REST API, while the sphere icon denotes web access.

A remote compute is denoted by plugin, which is controlled by a manage component. In the most abstract sense, the plugin processes an input data structure and outputs a transformed data structure (the two tree graphs as shown). File transfer between the data cloud and compute cloud is performed by the file IO handler component. A query/retrieve process in the data cloud connects to an authentication process, auth, in the Hospital network, while on-the-fly anonymization of DICOM images is handled by the anonymizer process, anon. Finally, the dispatcher is a component that determines what compute node (or cloud) is best suited for the data analysis at hand. The circle icon attached to the manage and plugin icons indicates that the attached process can provide real-time feedback information to other software agents about the controlled process via its own REST interface.

Footnotes: http://fnndsc.babymri.org (Fetal-Neonatal Neuroimaging and Developmental Science Center); http://goxtk.com (XTK).

CHIPS is designed as a distributed system, and the underlying components are containerized (currently using Docker). In Figure 2, the main CHIPS web interface and associated back-end database are housed within a single container. Input data and processed results are accessible in the hosting node and volume mapped as appropriate to this back-end. Other components of CHIPS in the web-entry node are similarly containerized. This includes the manage block, which is responsible for spawning processes on the underlying system.
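As a rough illustration of what a manage-style component does, the following Python sketch spawns child processes and keeps track of their execution state, exit code, and captured output streams. It is a minimal in-memory sketch under stated assumptions, not the CHIPS implementation: the class name `ProcessManager`, its method names, and the job-id scheme are hypothetical, and the REST layer that would expose these calls to clients is omitted.

```python
import subprocess
import sys
import uuid


class ProcessManager:
    """Hypothetical in-memory process manager (illustration only)."""

    def __init__(self):
        # job id -> {"proc": Popen, "stdout": str or None, "stderr": str or None}
        self._jobs = {}

    def start(self, argv):
        """Spawn argv as a child process and return an opaque job id."""
        job_id = uuid.uuid4().hex
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        self._jobs[job_id] = {"proc": proc, "stdout": None, "stderr": None}
        return job_id

    def status(self, job_id):
        """Report the execution state without blocking."""
        code = self._jobs[job_id]["proc"].poll()
        return {"state": "running" if code is None else "finished",
                "returncode": code}

    def wait(self, job_id, timeout=None):
        """Block until the job ends; return exit code and captured streams."""
        job = self._jobs[job_id]
        if job["stdout"] is None:
            # communicate() drains both pipes and waits for termination.
            job["stdout"], job["stderr"] = job["proc"].communicate(timeout=timeout)
        return {"state": "finished", "returncode": job["proc"].returncode,
                "stdout": job["stdout"], "stderr": job["stderr"]}

    def stop(self, job_id):
        """Ask a still-running job to terminate, then collect its result."""
        proc = self._jobs[job_id]["proc"]
        if proc.poll() is None:
            proc.terminate()
        return self.wait(job_id)


if __name__ == "__main__":
    manager = ProcessManager()
    # A trivial child process stands in for a containerized plugin run.
    job = manager.start([sys.executable, "-c", "print('plugin finished')"])
    print(manager.wait(job))
```

In a deployment along the lines described here, each of these methods would sit behind an HTTP endpoint, and the spawned command would typically be a container launch rather than a local script.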
Not only does manage provide the means to start and stop processes, but it also tracks the execution state, termination state, and standard output/error streams of the process. The manage component has a REST interface through which clients can start, stop, and query processes.

Also containerized are the IO component, which can transfer entire directory trees across network boundaries from one system to another, and the dispatch component, which can orchestrate multiple processing jobs as handled by manage. The plugin container houses the particular compute to perform on a given set of data, and is spawned by the manage component under direction of the dispatch. Since the compute typically occurs on a separate system to the data hosting node, the IO containers transmit the necessary data to this compute system and retrieve the resultant data back to the data node, allowing the web container to present (and visualize) results to the user.

Fig. 3: CHIPS home page with a "cards" organization.

Figure 3 shows the home page view on first logging into the system. Studies that have been "sent" to CHIPS appear in their own "cards" on the user's home page with a small visualization of a representative image set of the study. Various controls on this home page allow users to organize/tag "cards" in specific projects (or folders), remove cards, bookmark for easy access, etc. New cards can be generated by clicking on the circled + icon and choosing an activity (such as PACS Query/Retrieve), and any card can be seamlessly shared with other users of the system.

On selecting a given feed, the core image data in that feed is visualized in a rich, web-based viewer (see Figure 4). Various tabs and elements of the feed view provide different perspectives on the data, and also provide the ability to annotate notes or add comments. As on the home page, a circled + icon is also present, and if selected, opens a ribbon of "plugins" (or "apps") to run on the data contained in the feed. For example, certain plugins might perform a surface reconstruction of the brain with tissue segmentation (for example, a FreeSurfer plugin).

Fig. 4: Visualizing pulled and processed data.
The interface semantics within a feed are straightforward: a user clicks on the feed and enters the top level data view. Once a plugin from the circled + icon is applied, the feed data is processed accordingly. When the plugin is completed, its output files are also organized in the feed in a logical tree view (accessible via the left "Data" tab) in a manner akin to an email thread. In this manner, the thread of execution from data → plugin → data is defined, in effect building a workflow.

Any image visualized can also be shared in real-time using collaboration features built into the viewer library and leveraging the Google Drive API and Google Realtime API [2].

Footnotes (component source repositories): https://github.com/FNNDSC/ChRIS_ultron_backEnd, https://github.com/FNNDSC/pman, https://github.com/FNNDSC/pfioh, https://github.com/FNNDSC/swarm

Fig. 5: Big data pre-processing.
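The data → plugin → data thread described for feeds can be pictured as a tree in which every node holds a set of files and every edge is a plugin run. The Python sketch below is only an illustration of that idea: the `FeedNode` class, the representation of files as a name-to-content dictionary, and the stand-in plugin function are all assumptions made here and do not reflect how CHIPS stores feeds internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Files = Dict[str, str]  # hypothetical: file name -> content placeholder


@dataclass
class FeedNode:
    """One step in a feed: the files produced by a (hypothetical) plugin run."""
    label: str
    files: Files
    parent: Optional["FeedNode"] = field(default=None, repr=False, compare=False)
    children: List["FeedNode"] = field(default_factory=list)

    def apply(self, plugin_name: str, plugin: Callable[[Files], Files]) -> "FeedNode":
        """Run a plugin on this node's files and attach the output as a child."""
        child = FeedNode(label=plugin_name, files=plugin(self.files), parent=self)
        self.children.append(child)
        return child

    def thread(self) -> List[str]:
        """Return the chain of labels from the feed root down to this node."""
        labels = []
        node: Optional["FeedNode"] = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return list(reversed(labels))


def fake_segmentation(files: Files) -> Files:
    """Stand-in plugin: pretends to turn each DICOM file into a label map."""
    return {name.replace(".dcm", ".seg"): "labels" for name in files}


if __name__ == "__main__":
    root = FeedNode(label="pacs_pull", files={"scan.dcm": "raw"})
    result = root.apply("segmentation", fake_segmentation)
    print(" -> ".join(result.thread()))
```

Because each plugin run adds a child rather than overwriting its input, the original data stays available and several plugins can branch from the same node, which matches the email-thread analogy used above.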
An important component of CHIPS lies in creating a foundation suitable for future support of "data mining". Recently, the term Big Data has come into common parlance, especially in the context of informatics [13,16,7]. Despite the use of Big in the term, the concept often refers to the use of predictive analytics and other advanced data analytics tools that extract meaning from sets of data, and does not necessarily refer to the particular size of the data set.

In healthcare, big data analytics has impacted the field in very specific areas such as clinical risk intervention, waste and care variability reduction, and automated reporting. However, as a field, biomedical imaging has not especially benefited from big data approaches due to the unstructured nature of image data, the complexity of analysis results in terms of data formats (again usually unstructured), simple quality issues such as noise in image acquisitions, etc.
CHIPS constructs a framework to allow big data methods to be used in this image space. Consider that the incoming source data to CHIPS are DICOM images that by their nature contain a large amount of meta information, most of which is not protected health information (PHI) and will be left unchanged by the anonymization processes. Information about the scanning/imaging protocol, acquisition parameters, as well as certain non-PHI demographics such as patient sex and age can be meaningfully databased. Moreover, the application of an analysis pipeline to an image data-set can in turn result in large amounts of meaningful data that can be databased and associated with the incoming source data. For example, FreeSurfer, which is dockerized as a plugin in the CHIPS system, produces volumetric segmentations and surface reconstructions from raw input T1-weighted MRI data [3,5,1].

In Figure 5, input raw DICOM (purple block) and output processed data from the DICOMs (green block) are shown. A DICOM tag extraction process extracts the image meta data and associates this information with the particular image record. DICOM data is regularly formatted and easily extracted. Importantly, for the output data, and assuming the output data is a 3D surface reconstruction and tables of brain parcellation volume values, a structured analysis process regularizes all this information into meta data that will be added to the space of data pertaining to this image record. This processing will lay the groundwork on which data analytics can explore and mine for relations between (for example) input acquisition parameters and pipeline output results, or simply mine across output results for hidden trends in data trajectories (for example volumetric changes with age or sex).
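The tag extraction and record-building steps can be illustrated with a short Python sketch. It assumes the DICOM header has already been parsed into a plain dictionary keyed by standard DICOM attribute keywords. The set of keys treated as PHI, the `dicom.` and `output.` key prefixes, the function names, and every value shown are hypothetical placeholders chosen for illustration, not the schema CHIPS actually uses.

```python
# Hypothetical PHI list for illustration; a real deployment would follow
# its own institutional de-identification policy.
PHI_KEYS = frozenset({"PatientName", "PatientID", "PatientBirthDate"})


def extract_tags(header, phi_keys=PHI_KEYS):
    """Keep only the header entries that are not listed as PHI."""
    return {key: value for key, value in header.items() if key not in phi_keys}


def build_record(header, pipeline_outputs):
    """Merge non-PHI header tags and pipeline results into one flat record.

    Prefixes keep acquisition metadata and derived measurements apart so
    that both can be queried side by side in a database.
    """
    record = {f"dicom.{key}": value for key, value in extract_tags(header).items()}
    record.update({f"output.{key}": value for key, value in pipeline_outputs.items()})
    return record


if __name__ == "__main__":
    # Placeholder header and measurements, not real patient data.
    header = {
        "PatientName": "DOE^JANE",
        "PatientID": "000000",
        "PatientSex": "F",
        "PatientAge": "004Y",
        "Modality": "MR",
        "RepetitionTime": 2530.0,
        "EchoTime": 3.39,
    }
    volumes = {"left_hippocampus_mm3": 3100.0, "right_hippocampus_mm3": 3150.0}
    for key, value in sorted(build_record(header, volumes).items()):
        print(key, value)
```

With records flattened this way, a question such as whether a derived volume varies with repetition time or patient age becomes an ordinary query over columns rather than a search through image files.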
CHIPS is a distributed system that provides a single, cloud-based access point to a large family of services. These include: (a) accessing medical image data securely from participating institutions with authenticated access and built-in anonymization of collected image data; (b) organizing collected data in a modern UI that allows for easy data management and sharing; (c) performing processing on images by dispatching data to remote clouds and controlling/managing remote execution on these resources; (d) powerful real-time collaboration on images using secure third party services (such as the Google Realtime API); and (e) intuitively constructing medical image processing workflows.
CHIPS is not only a medical data management system, but also strives to improve the quality of healthcare by giving clinical users the ability to easily perform value-added processing and sharing of data and information. Current and future directions for CHIPS include facilitating the construction of big-data frameworks and allowing users to simply construct experiments for data analytics and various machine learning pipelines.

All analysis and development performed with the CHIPS system at Boston Children's Hospital was conducted under relevant Institutional Review Board approval, which governed access to image data and controlled the scope of sharing of such data.
References
1. FreeSurfer. http://surfer.nmr.mgh.harvard.edu/
2. Bernal-Rusiel, J.L., Rannou, N., Gollub, R., Pieper, S., Murphy, S., Robertson, R., Grant, P.E., Pienaar, R.: Reusable client-side javascript modules for immersive web-based real-time collaborative neuroimage visualization. Frontiers in Neuroinformatics 11, 32 (2017)
3. Dale, A.M., Fischl, B., Sereno, M.I.: Cortical surface-based analysis I: Segmentation and surface reconstruction. NeuroImage 9, 179–194 (1999)
4. Eckersley, P., Egan, G.F., De Schutter, E., Yiyuan, T., Novak, M., Sebesta, V., Matthiessen, L., Jaaskelainen, I.P., Ruotsalainen, U., Herz, A.V., et al.: Neuroscience data and tool sharing. Neuroinformatics 1(2), 149–165 (2003)
5. Fischl, B., Sereno, M.I., Dale, A.M.: Cortical surface-based analysis II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 9, 195–207 (1999)
6. Ginsburg, D., Gerhard, S., Calle, J.E.C., Pienaar, R.: Realtime visualization of the connectome in the browser using webgl. Frontiers in Neuroinformatics (2011)
7. Greene, C.S., Tan, J., Ung, M., Moore, J.H., Cheng, C.: Big data bioinformatics. Journal of Cellular Physiology 229(12), 1896–1900 (2014), http://dx.doi.org/10.1002/jcp.24662
8. Haehn, D., Rannou, N., Ahtam, B., Grant, E., Pienaar, R.: Neuroimaging in the browser using the x toolkit. In: Frontiers in Neuroinformatics Conference Abstract: 5th INCF Congress of Neuroinformatics (Munich) (2014)
9. Haehn, D., Rannou, N., Grant, P.E., Pienaar, R.: Slice:drop: Collaborative medical imaging in the browser. In: ACM SIGGRAPH 2013 Computer Animation Festival. pp. 1–1. SIGGRAPH '13, ACM, New York, NY, USA (2013), http://doi.acm.org/10.1145/2503541.2503645
10. Khan, F., Foley-Bourgon, V., Kathrotia, S., Lavoie, E., Hendren, L.: Using javascript and webcl for numerical computations: A comparative study of native and web technologies. In: ACM SIGPLAN Notices. vol. 50, pp. 91–102. ACM (2014)
11. Millan, J., Yunda, L.: An open-access web-based medical image atlas for collaborative medical image sharing, processing, web semantic searching and analysis with uses in medical training, research and second opinion of cases. Nova 12(22), 143–150 (2014)
12. Mwalongo, F., Krone, M., Reina, G., Ertl, T.: State-of-the-art report in web-based visualization. In: Computer Graphics Forum. vol. 35, pp. 553–575. Wiley Online Library (2016)
13. Provost, F., Fawcett, T.: Data science and its relationship to big data and data-driven decision making. Big Data 1(1), 51–59 (Feb 2013), http://dx.doi.org/10.1089/big.2013.1508
14. Rex, D.E., Ma, J.Q., Toga, A.W.: The LONI Pipeline Processing Environment. NeuroImage 19(3), 1033–1048 (Jul 2003)
15. Sherif, T., Rioux, P., Rousseau, M.E., Kassis, N., Beck, N., Adalat, R., Das, S., Glatard, T., Evans, A.C.: Cbrain: a web-based, distributed computing platform for collaborative neuroimaging research. Frontiers in Neuroinformatics 8 (2014)
16. Swan, M.: The quantified self: Fundamental disruption in big data science and biological discovery. Big Data 1(2), 85–99 (Jun 2013), http://dx.doi.org/10.1089/big.2012.0002