Freecyto: Quantized Flow Cytometry Analysis for the Web
Nathan Wong, Daehwan Kim, Zachery Robinson, Connie Huang, Irina M. Conboy
Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA 94720, USA
* [email protected]
* [email protected]
ABSTRACT
Flow cytometry (FCM) is an analytic technique that is capable of detecting and recording the emission of fluorescence and light scattering of cells or particles (collectively called “events”) in a population. A typical FCM experiment can produce a large array of data, making the analysis computationally intensive. Current FCM data analysis platforms (FlowJo, etc.), while very useful, do not allow interactive data processing online due to data size limitations. Here we report a more effective way to analyze FCM data. Freecyto is a free, easy-to-learn, Python-flask-based web application that uses a weighted k-means clustering algorithm to facilitate the interactive analysis of flow cytometry data. A key limitation of web browsers is their inability to interactively display large amounts of data. Freecyto addresses this bottleneck through the use of the k-means algorithm to quantize the data, allowing the user to access a representative set of data points for interactive visualization of complex datasets. Moreover, Freecyto enables the interactive analysis of large complex datasets while preserving the standard FCM visualization features, such as the generation of scatterplots (dotplots), histograms, heatmaps, and boxplots, as well as a SQL-based sub-population gating feature. We also show that Freecyto can be applied to the analysis of various experimental setups that frequently require the use of FCM. Finally, we demonstrate that data accuracy is preserved when Freecyto is compared to conventional FCM software.
Keywords
Flow cytometry, Big data analysis, Web application, Machine learning, Unsupervised learning, Data Quantization, Software development
Introduction
Flow cytometry is broadly used in biomedicine, as exemplified by the identification of protein marker expression, determination of cell fate and cell cycle progression, analysis of pathology-caused changes (e.g. cancer-promoted, immune-skewing, etc.), testing the therapeutic efficacy of a treatment, and, more recently, gene-editing detection workflows. A common experimental setup in biomedicine relies on being able to identify specific changes between a control and an experimental cell population. The changes between control and experimental cohorts are often determined through fluorescently tagged antibodies that are specific for given proteins, and the fluorescence is examined by microscopy and/or high-throughput screening using a flow cytometer.
Successful FCM experiments rely on the accuracy and resolution of the data analysis, e.g. the performance of the FCM software that provides quantitative outputs for large numbers of events. In FCM analysis, an event is constituted by the cytometer’s detection of fluorescence emission and/or light scatter signals from a single cell or particle that passes through the microfluidic flow chamber. With thousands of these events, individual measures of fluorescence, size and granularity are produced, and to add complexity, these measurements can be deliberately modified by a researcher through the instrument setup, which can be changed from run to run. FCM analysis thus becomes a computational and statistical challenge that produces meaningful data only if the analysis is adequate for the experimental complexity. Inherent in this requirement, the datasets that are produced with conventional FCM software (FlowJo, Cytobank, OpenCyto, and Webflow) are typically quite large, which complicates their interactive web analyses.
In this work we developed new FCM software that facilitates FCM data analysis while maintaining the accuracy and resolution of the data.
In fact, analysis of flow cytometry experiments, despite having tens of thousands of data points, can be performed and visualized on a mobile device. Importantly, while simplifying the data analysis and having an intuitive workflow, Freecyto preserves the key features of traditional FCM software, such as scatterplots (dotplots) of two different emissions, histograms of a fluorescent emission measurement, the side-by-side comparison of the results between the control and experimental populations, and gating on sub-populations of cells.
Similarly to FlowJo, Cytobank, OpenCyto, and Webflow, Freecyto supports machine learning applications, but it does not require the installation of specific software packages (often OS-dependent), a detailed understanding of the software workflow, or extra layers of complexity in displaying, interacting with, and sharing the FCM analysis with other researchers.
Additional features of Freecyto are robust data management and data sharing: Freecyto is built on a secure centralized database management system, allowing data to be stored remotely and analyses to be shared and edited by anyone, while maintaining the safeguard of proper permissions. Notably, decisions on instrument settings (such as changing the gain and signal intensity) and experimental set-ups (for instance, additional runs of certain cohorts) become better informed, based on real-time, user-friendly data analysis.
A key feature of Freecyto is the k-means clustering algorithm, in which data points are clustered together into k clusters based on a Euclidean distance metric. This use of the k-means algorithm as a method of data quantization is distinct from flow cytometry studies that use clustering algorithms to analyze the data.
Freecyto, in contrast, uses k-means to create a reduced, representative dataset of the original, so that the user has much greater capability in analyzing the data, such as applying the stated clustering algorithms to the data. The original data is then reduced to the centers of the clusters, allowing the user to gate interactively on these centers. We show that FCM data analysis remains faithful when Freecyto is compared to the conventional FlowJo software.
By focusing and quantizing the data, Freecyto offers better control over the analysis of FCM experiments, increasing the computational feasibility of analyzing any dataset, and particularly very large ones. Because of the high-dimensional nature of flow cytometry data and the technological developments in flow cytometers, which have pushed the number of parameters and the sheer volume of data ever higher, there is a greater need for FCM software to handle increasingly large datasets. Freecyto was developed to address this challenge.
Results
K-means Quantization
While the quick visualization capabilities are sufficient for most basic flow cytometry operations, a more detailed study may require additional specialized functions, such as sub-population gating and quadrant (coordinate-system) gating. Datasets on the order of 10 or 10 events present a significant challenge for interactive plotting on the web. In the case of gating, having tens of thousands of points that users can lasso-select on the web is virtually impossible for personal computers and standard web browsers. Freecyto solves this problem by introducing a k-means clustering algorithm for quantizing the input data (Figure 1).
First, after running the k-means clustering algorithm, the centroids are used to construct a Voronoi diagram. Thus, the original dataset is partitioned into Voronoi cells, and each cell contains all the original points that belong to that cluster. Next, for each Voronoi cell, the variance is computed, with the centroid used as the mean of the geometric space. Finally, the within-cluster variance is plotted as a colormap within the Voronoi diagram to show which cells contain more of the underlying variance, and the variance is summed across all Voronoi cells to reveal the elbow, the point beyond which increasing the number of clusters (and hence the computation) yields only a minimal further reduction in within-cluster variance.
K-means clustering (implemented with Lloyd’s algorithm, clusters initialized with kmeans++ with a default seed) is an unsupervised machine-learning algorithm that is used to identify clusters of points based on each point’s distance from the center of a proposed cluster. Freecyto runs this algorithm on the user-selected channels, identifying a pre-defined number of clusters, and storing only the centers of these clusters. The number of clusters is either user-selected (if running locally) or approximated automatically within a range between 250 and 5000 based on the size of the dataset.
This simplifies the conventional k-clustering approach and enables future development of more suitable algorithms to determine k. Freecyto’s application of k-means clustering quantization vastly reduces the complexity of the flow cytometry data without significant loss of the variability within the original dataset, as we show in the next section. The reduced dataset that is generated is highly suitable for downstream statistical analysis, such as hierarchical clustering or dimensionality reduction to identify sub-populations of cells (Supplemental Figure 5).
Figure 1. K-means Workflow in Freecyto. (A) The process by which the original dataset is quantized, and how manual gating works on a shared data source. (B) The principles behind k-means quantization, and the Voronoi diagram computed from the reduced dataset projected on the original dataset.
Figure 2. K-means Within-Cluster Variance Visualization of Synthetic Datasets. (A) Original spiral data (N=5000). (B) Cluster centers with Voronoi cells outlined. (C) Within-cluster variance of each Voronoi cell with increasing k, and by extension, the MSE in each cluster identified by k-means. (D) Trend of increasing clusters and the average within-cluster variance of each cluster. (E) Original bimodal data (N=10000). (F, G, H) Cluster centers and variance loss in each Voronoi cell with increasing k.
Fidelity of Data Quantization in Interactive Analysis
To quantitatively examine the quality of our reduced dataset, we compute the mean-squared error (MSE) of each cluster. For the k-means algorithm, this is equivalent to computing the within-cluster variance of each cluster, because the predicted cluster center is the mean of all points in that cluster. The MSE of each cluster, as visualized by Voronoi cells, is then mapped to a color range to depict how faithfully each cluster center captures the other points in that cluster. Figures 2C and 2G show that the MSE of each cluster decreases with increasing k. Finally, the average of the MSE over all clusters is computed (Figures 2D and 2H) to show that the information lost at each cluster center decreases rapidly in exchange for small increases in the number of clusters chosen.
The quantized data can then be plotted interactively through Bokeh on a webpage and downloaded as a SQL database within the web application. In this interactive analysis portion, each flow cytometry data file is treated as a shared data source; thus, in Freecyto the user can lasso-select a sub-population of cells displayed in a scatterplot or a fluorescence channel and observe the quantized data for that sub-population in the other FCM channel(s). This feature allows the user to determine quickly and with more precision how the size of the cells or the signal for a specific marker (a cell-fate protein, for example) is related to other markers (transgene expression, for instance) for each cell in the studied population. Demo: (3:07 – 6:20)
One key question is whether our method of k-means clustering qualitatively maintains the accuracy and resolution of the data. To address this, we compared Freecyto side-by-side with the conventional FCM software FlowJo in the analysis of GFP-positive cells in a population and in studying cells in early and late stages of apoptosis (e.g. Annexin V and 7-AAD co-stain). Here we used Freecyto’s coordinate-system gating, a common feature of FCM software, to identify the percentage of cells located within certain thresholds. As shown in Figures 3 and 4, Freecyto was as accurate as FlowJo in the resolution of these datasets, while preserving the key features of FCM software, such as allowing the user to specify fluorescence thresholds and to visualize and quantify the percentage of cells located in the resulting quadrants.
Moreover, the quantized data points generated by Freecyto are stored in an SQLite database, which is essential to the deep gating tool. The deep gating tool allows the user to lasso-select a sub-population of cells and graphically display only the gated cells for all advanced analysis operations. This is useful in narrowing the analysis to specific sub-populations, as well as in identifying outliers in the dataset. The deep-gating function can be applied as many times as needed, and all deep gates can be reset by pressing the reset-gating button, after which the visualization and quantification of the results will reflect the original, unaltered dataset (Figures 3, 4). Both the results of the k-means quantization and the sub-populations identified from manual gating can be downloaded directly in the application.
To comparatively analyze the accuracy and capabilities of Freecyto and FlowJo, WT and GFP+ cells were mixed at five different ratios (100:0, 75:25, 50:50, 25:75, and 0:100 WT:GFP+) and run on a Guava Easycyte flow cytometer (Millipore-Sigma). The data were analyzed by FlowJo and Freecyto in parallel. As expected, the number of GFP-positive cells increased linearly from 100:0 to 0:100 WT:GFP+, which was accurately detected by both FlowJo and Freecyto.
To compare Freecyto and FlowJo in another assay commonly analyzed by flow cytometry, cell apoptosis, IMR90 human fibroblasts were treated (or not) with hydrogen peroxide (H₂O₂) at 200 µM for 24 h to induce apoptosis. The cells were assayed with Annexin V and 7-AAD and run on the Guava Easycyte flow cytometer (Millipore-Sigma). The results were analysed with Freecyto, yielding accurate and visually clear data. The negative control, isotype-matched IgG fluorescence, was used to set up the quadrants (Figure 4A). Early apoptotic cells positive for Annexin V can be seen in the top left quadrant, and late apoptotic cells positive for both Annexin V and 7-AAD in the top right quadrant. As expected, Freecyto shows the number of Annexin V positive cells (Figure 4B). The number of cells in early and late stages of apoptosis increased with H₂O₂ treatment as compared to the untreated control (Figure 4C). In summary, the analysis of apoptosis (Annexin V and 7-AAD assay) yields the predicted results and is as accurate and sensitive with Freecyto as it is with FlowJo.
Web (Uwsgi-flask-nginx) application to allow platform-agnostic, mobile-ready access to flow cytometry analysis
Several core technologies are deeply integrated into Freecyto in order to allow seamless processing and visualization of flow cytometry data. Chiefly, the integration of these technologies allows for robust storage of user data, high-throughput handling of the data (e.g. processing operations), and interactivity of the data visualizations.
Computationally expensive operations in flow cytometry, including reading and parsing data, performing visualizations, and obtaining sample statistics, are all performed server-side in Freecyto. Freecyto is hosted as a Python-flask-uwsgi-nginx application on a Digital Ocean server.
Figure 3. Analysis of GFP positive and negative cell populations. (A) (B) The same 50:50 GFP transgenic cell ratios with the coordinates gated by FlowJo. (C) Comparison of Freecyto and FlowJo measurements of GFP+ cells for 100:0, 75:25, 50:50, 25:75, and 0:100 ratios. (D) Density plot created by Freecyto, which outlines the density of cells after the k-means quantization is performed with 250 clusters. (E) MSE of each cluster with varying values of k. (F) The resulting density plot with varying values of k.
Figure 4. Analysis of Apoptosis. IMR90 cells were treated with hydrogen peroxide (H₂O₂) at 200 µM for 24 h to induce apoptosis. The cells were then stained with Annexin V and 7-AAD. Early apoptotic cells are positive for Annexin V and are seen in the top left quadrant (Q1), and late apoptotic cells, which are positive for both Annexin V and 7-AAD, are seen in the top right quadrant (Q2). Live cells are negative for both stains (Q4). (A) Negative control: isotype-matched IgG staining (1st antibody) + secondary (FITC). (B)
Untreated group. (C) H₂O₂ treatment group.
Figure 5. Freecyto Application Workflow. The flowchart has three lanes: User Functions, Authorization and Caching, and Callable API Server. Its elements are: user login or account creation; an authorization check (Google Firebase API, saved data repository); upload new flow cytometry experiments; query and update saved experimental analysis; query the server to analyze, store, and expose data for quick visualization tasks, such as t-SNE, KDE plots, and histograms, for the selected channels; Python code to normalize data, compute statistics, generate static visualization images, and export raw data to an Excel file; store analysis; view and download completed quick visualizations and raw data; change fluorescence channels for analysis; run 'advanced analysis' for interactive analysis, and download the quantized dataset and gated subpopulations; cache busting by appending a random hash to repeated documents; query the server to perform input data quantization and relevant downstream analysis for the selected channels; Python code to perform k-means clustering for data quantization, create interactive HTML documents for gating analysis, and export quantized and gated data to an SQLite database; view and share previously performed analysis.
While most flow cytometry tools have unique requirements depending on the user’s operating system (OS), application dependencies (a specific version of Python packages), or computational resources (e.g. four CPU cores), Freecyto can be accessed without platform restrictions and dependencies. The application is also designed to be mobile-compatible, allowing users to access their flow cytometry analyses and to perform new analyses directly on their mobile devices (Figure 5).
In addition, Freecyto can be downloaded as an open-source Flask application, so that users can install the appropriate dependencies and run the application on a local intranet (useful if users desire stricter control of flow cytometry data privacy). This also allows for greater control over default parameters and application modules, such as changing the number of reduced data points used in interactive analysis and implementing a clustering model on top of the reduced dataset (Figure 5).
Demo: (0:00 – 1:00)
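As one illustration of implementing a clustering model on top of the reduced dataset in a local installation, the sketch below quantizes events with k-means and then applies hierarchical (agglomerative) clustering to the cluster centers only. It is a minimal sketch built on scikit-learn and is not Freecyto's own code: the helper name `cluster_reduced`, the synthetic two-population data, and the choices k = 50 and two groups are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans


def cluster_reduced(events, k=50, n_groups=2, seed=0):
    """Quantize events to k centers, then group the centers hierarchically.

    Hypothetical helper for illustration; not part of Freecyto.
    Returns (centers, center_groups, event_groups).
    """
    km = KMeans(n_clusters=k, init="k-means++", n_init=1, random_state=seed)
    event_to_center = km.fit_predict(events)
    centers = km.cluster_centers_
    # The downstream model only sees k points instead of every event.
    center_groups = AgglomerativeClustering(n_clusters=n_groups).fit_predict(centers)
    # Each event inherits the group of the center that represents it.
    event_groups = center_groups[event_to_center]
    return centers, center_groups, event_groups


# Synthetic stand-in for two well-separated cell populations (assumed data).
rng = np.random.default_rng(0)
events = np.vstack([
    rng.normal(0.2, 0.03, size=(1000, 2)),
    rng.normal(0.8, 0.03, size=(1000, 2)),
])
centers, center_groups, event_groups = cluster_reduced(events)
```

Because the hierarchical step runs on 50 centers rather than 2000 events, its cost no longer depends on the size of the original file, which is the practical benefit of working from the quantized data.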
Parallel processing (multiprocessing) of computationally intensive analysis functions
Freecyto integrates advances in multiprocessing functionality in order to speed up traditionally expensive FCM data analysis operations. Multiprocessing is used when users upload multiple files, when visualizations are performed, and when the k-means algorithm is running. These operations are performed asynchronously on the server side, reducing the time it takes for users to receive analysis outputs from their data by an order of magnitude. With this multiprocessing in place, a side-by-side upload of more than five files becomes possible (Supplemental Figure 3).
User data management and authentication
Google Firestore/Datastore is integrated to store references to previously performed visualization operations. For example, the images that are generated from an experimental upload are stored in a unique directory on the server, and the references to the generated images are stored in a collection as a unique entry under the user account in Google Firestore. This prevents redundant analysis operations (e.g. when the user uploads the same experimental files again), while still allowing the user to access the previously performed operation. A sortable table of previously performed experiments (the 50 most recent) is listed on the user home page, allowing the user to easily access previously analysed flow cytometry results.
Firebase and Google identity platform: Google and email logins are enabled, allowing users to create and access their user account with these authentication methods. This prevents unauthorized usage of the application by requiring the user to create an account before accessing the analysis toolkit. To promote scientific knowledge and collaboration, sharing the results of a flow cytometry experiment on Freecyto merely requires sharing the URL of the experiment. Demo: (1:00 – 1:30)
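The text does not say how a repeated upload is recognized, so the following is only one possible sketch of the idea of storing a reference per analysis and reusing it. It assumes that uploads are fingerprinted with a SHA-256 hash of the file contents, and it uses an in-memory dictionary as a stand-in for the Firestore collection; the class `AnalysisIndex`, its methods, and `fake_analysis` are hypothetical names.

```python
import hashlib


class AnalysisIndex:
    """In-memory stand-in for a per-user collection of analysis references.

    Hypothetical: a real deployment would back this with Firestore/Datastore.
    """

    def __init__(self):
        self._entries = {}  # (user_id, fingerprint) -> output directory

    @staticmethod
    def fingerprint(file_blobs):
        """Hash the uploaded file contents, independent of upload order."""
        digest = hashlib.sha256()
        for file_hash in sorted(hashlib.sha256(b).hexdigest() for b in file_blobs):
            digest.update(file_hash.encode("ascii"))
        return digest.hexdigest()

    def get_or_run(self, user_id, file_blobs, run_analysis):
        """Return (output_dir, was_cached); run the analysis only on a miss."""
        key = (user_id, self.fingerprint(file_blobs))
        if key in self._entries:
            return self._entries[key], True
        output_dir = run_analysis(file_blobs)
        self._entries[key] = output_dir
        return output_dir, False


index = AnalysisIndex()
calls = []


def fake_analysis(blobs):
    """Placeholder for the expensive visualization step (assumed)."""
    calls.append(len(blobs))
    return f"results/{len(calls)}"


first = index.get_or_run("user-1", [b"fcs-a", b"fcs-b"], fake_analysis)
again = index.get_or_run("user-1", [b"fcs-b", b"fcs-a"], fake_analysis)
```

The second call returns the stored reference without re-running the analysis, even though the same files arrive in a different order.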
Side-by-side experiment comparisons (multiple file upload)
Freecyto supports user upload of multiple flow cytometry files as a result of the multiprocessing pipeline. For normalization of the raw input files, the user may select hyperlog, logicle, or no transformation to be applied. Logicle and hyperlog transformations normalize the flow cytometry data by transforming most events (including negatively measured values) to a normalized fluorescence value between 0 and 1. Multiple-file upload improves on traditional free flow cytometry analysis applications, which limit the user to uploading a single flow cytometry file at a time, even though many flow cytometry experiments have anywhere from 2 to 10+ files to analyse. Freecyto’s approach allows the user to upload numerous files concurrently, enabling plots to be overlaid for easy and clear comparison between the datasets. If overlays make it harder to discern the individual plots, individual files can also be graphed and visualized separately. Demo: (1:30 – 2:00)
Quick visualization capabilities
Freecyto is built on the principle that FCM analysis should be easy to perform and that real-time data processing expands the research capabilities in acutely and accurately modulating FCM experiments. Freecyto’s pipeline achieves this through quick visualization of scatterplots, density-estimation plots, histograms, box-whisker diagrams, and correlation tables, which are generated by Freecyto based on the selected fluorescence channels. In addition, t-SNE plots allow users to visualize segregating features of the data. The images and relevant statistics are displayed through a carousel slider (Siema) and a table, respectively.
It is integral to flow cytometry analysis to allow users to select the fluorescence channels they wish to visualize. Freecyto accomplishes this with a simple checkbox list of all possible channels. The user selects the channels they wish to visualize and presses “submit,” and the images automatically update to match the selected fluorescence channels. This pipeline is designed to be minimalistic: it allows users to quickly determine how their data looks, offering enough modularity to facilitate the most common flow cytometry analysis operations. In addition, the converted flow cytometry data can be downloaded as an Excel spreadsheet. Demo: (2:00 – 3:07)
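The statistics and correlation table for a set of selected channels can be sketched with pandas as follows. This is an illustrative sketch rather than Freecyto's pipeline: the function name `quick_summary`, the channel names `FSC`, `SSC`, and `GFP`, and the synthetic values are all assumptions.

```python
import numpy as np
import pandas as pd


def quick_summary(events, channels):
    """Per-channel statistics and a correlation table for selected channels."""
    selected = events[list(channels)]
    stats = selected.describe().T[["mean", "std", "min", "50%", "max"]]
    stats = stats.rename(columns={"50%": "median"})
    correlations = selected.corr()
    return stats, correlations


# Synthetic events (assumed): SSC is built to track FSC, GFP is independent.
rng = np.random.default_rng(1)
fsc = rng.uniform(0.0, 1.0, size=500)
events = pd.DataFrame({
    "FSC": fsc,
    "SSC": 0.5 * fsc + rng.normal(0.0, 0.05, size=500),
    "GFP": rng.uniform(0.0, 1.0, size=500),
})

# Only the channels ticked in the checkbox list are summarized.
stats, correlations = quick_summary(events, ["FSC", "SSC"])
```

Changing the channel list and calling the function again corresponds to the user ticking different checkboxes and pressing “submit.”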
Discussion
Freecyto was developed as new data processing software for flow cytometry data and validated for enhancing the speed, convenience, and machine learning capacity of FCM data analysis while preserving accuracy. These features were validated in key FCM set-ups: studying sub-populations with variable expression of a transgene, and viability-apoptosis studies. In summary, the use of our weighted k-means clustering algorithm transformed FCM data analysis into a simple, easy-to-use online platform.
Freecyto offers all the necessary features to perform typical FCM analyses, in addition to providing interactive analysis of the data, and it fills a niche when compared with other FCM software (Table 1). Freecyto is a flexible platform that allows modifications. For example, Opencyto allows users to create automated gating pipelines in R, which may address the subjectivity and time-consuming nature of manual gating, and such a feature could readily be built on top of Freecyto’s existing framework. Freecyto does not innovate the existing flow cytometry analysis; instead, it innovates the approach to such analyses, thereby improving the ease and accessibility of FCM data analysis while also providing greater flexibility and control in gating large datasets through the quantizing of the data with a weighted k-means clustering algorithm.
Feature | Freecyto | Opencyto | Cytobank | FlowJo
What is it? | Python web application | R software package | Cloud-based web server | Software package (OS dependent)
Summary | K-means algorithm allows interactive gating between any combination of channels in side-by-side graphs | Pipeline for automated gating algorithms (as opposed to manual gating) | Specialized service that uses many different tools, e.g. Citrus, to perform FCM analyses | Automation of repeated analyses, customizable data visualizations
Free-to-use | Yes | Yes | No | No
Requires software download | No | Yes | No | Yes
Straightforward data analysis sharing | Yes | No | Yes | No
Beginner-friendly | Yes | No | No | No
Mobile-compatible | Yes | No | No | No
Table 1. Comparing Freecyto with other flow cytometry applications.
Conclusions
FCM analysis is essential for a broad range of biomedical studies, many of which are directly and critically important for human health. Freecyto allows for streamlined, fast, user-friendly, and easy-to-share analysis of multiple FCM experiments in parallel, harnessing the reach and ease of use of the internet to power and serve its analytical platform. Whereas many FCM analysis packages are expensive, require software/OS dependencies, or have a significant learning curve, Freecyto is free, web-based, and easy to use, and while simplifying FCM studies, Freecyto improves the processing of high-volume data and facilitates real-time data analysis.
As flow cytometry development continues to improve, the need for indexing and manipulating large quantities of scientific data cannot be overstated. Freecyto integrates state-of-the-art data storage and indexing features with Google Cloud, creating an interface that gives users greater confidence in and connectivity with their flow cytometry data. In this regard, our k-means quantization approach might be useful and important not only in FCM but, more broadly, for Big Data analysis in omics, medical data for machine learning and AI, computer vision, environmental engineering, and other large-data realms.
Materials and Methods
Data Visualization
Several Python packages were used in creating this application. Flask was used to serve the web application. Google Identity (Firebase) was used to authenticate users, and Google DataStore was used to store references to previously performed experiments. Pandas, NumPy, FlowUtils, and Cytoflow were used to dynamically store and transform the raw flow cytometry data. Matplotlib, Seaborn, and Pandas were used to generate images of scatterplots, box-plots, heatmaps, and histograms. The t-distributed stochastic neighbour embedding (t-SNE) projection was performed with Scikit-learn (sklearn) with a perplexity of 40. For the interactive analysis, sklearn was used for the weighted k-means clustering. SQLite3 was used to store clustered data. Bokeh and Holoviews were used to display the interactive graphs. HTML5UP and Creative Tim Light Bootstrap Theme inspired the front-end template design of the web application.
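How these packages might fit together for the interactive analysis can be sketched as follows: scikit-learn's `KMeans` reduces the events to cluster centers, the centers are written to an SQLite table, and a SQL condition selects a gated sub-population. This is a minimal sketch under stated assumptions and not the application's source code. The function names `quantize` and `gate`, the table name `quantized`, the channel names, and the extra `n_events` column (the number of original events each center represents, used here to compute a percentage) are all illustrative choices.

```python
import sqlite3

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def quantize(events, channels, k, seed=0):
    """Reduce events to k cluster centers on the selected channels."""
    data = events[list(channels)].to_numpy(dtype=float)
    km = KMeans(n_clusters=k, init="k-means++", n_init=1, random_state=seed).fit(data)
    centers = pd.DataFrame(km.cluster_centers_, columns=list(channels))
    # Assumption: keep how many original events each center represents.
    centers["n_events"] = np.bincount(km.labels_, minlength=k)
    return centers


def gate(conn, where_clause):
    """Select quantized points with a SQL condition (trusted input only)."""
    return pd.read_sql_query(f"SELECT * FROM quantized WHERE {where_clause}", conn)


# Synthetic, already-normalized events in [0, 1] (assumed data).
rng = np.random.default_rng(0)
events = pd.DataFrame(rng.uniform(0.0, 1.0, size=(5000, 2)), columns=["FSC", "GFP"])
centers = quantize(events, ["FSC", "GFP"], k=250)

conn = sqlite3.connect(":memory:")
centers.to_sql("quantized", conn, index_label="cluster_id")
gfp_positive = gate(conn, "GFP > 0.5")
percent_positive = 100.0 * gfp_positive["n_events"].sum() / centers["n_events"].sum()
```

Here 5000 events are represented by 250 rows, so the browser-side plot and the gate operate on a table twenty times smaller than the input, while the per-center counts still allow population percentages to be estimated.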
Multiprocessing
Multiprocessing, assuming a multi-core machine, was implemented to speed up the data visualization algorithms. Chiefly, the results of a benchmark test on a quad-core, 8 GB RAM, 2.3 GHz MacBook Pro are reported below for the static image visualizations, and for the interactive data analysis portions.
Weighted K-means Algorithm
Let $X = \{x_1, x_2, \ldots, x_n\}$ such that every $x_i$ has $d$ dimensions. Let $\Omega$ be a diagonal $d \times d$ matrix such that the diagonal entries are the weights of each dimension. $k$ is the number of clusters we want to find. $S$ is the set of all $k$ clusters such that $S = \{S_1, S_2, \ldots, S_k\}$, and $\mu_i$ is the mean centroid of cluster $S_i$. We want to minimize the loss function:
\[ \arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} (x - \mu_i)^T \, \Omega \, (x - \mu_i) \]
In the default case, let the diagonal entries of $\Omega$ be 1 if the corresponding channel was selected for visualization, and 0 otherwise.
Voronoi Diagram Algorithm
Let $X = \{x_1, x_2, \ldots, x_n\}$ such that every $x_i$ has $d$ dimensions. $R$ is the set of all $k$ Voronoi cells such that $R = \{R_1, R_2, \ldots, R_k\}$, and $S$ is the set of all $k$ clusters such that $S = \{S_1, S_2, \ldots, S_k\}$. $d$ is a distance metric, for which we used Euclidean distance. We want to find the region such that every point in the region is closest to the set of points described by the k-means clustering:
\[ R_k = \{\, x \in X \mid d(x, S_k) \le d(x, S_j) \;\; \forall j \ne k \,\} \]
Or equivalently, because the distance of every point $x$ in $S_k$ to its mean centroid $\mu_k$ has already been minimized in the converged k-means algorithm:
\[ \forall x \in S_k : \; d(x, S_k) \le d(x, S_j) \;\; \forall j \ne k \implies R_k = \{\, x \in S_k \,\} \]
Web application (open-source) licenses
• Advanced Analysis: Light bootstrap theme by Creative Tim: MIT License, https://github.com/timcreative/freebies/blob/master/LICENSE.md
• Lens by HTML5UP: Creative Commons 3.0, https://html5up.net/license
• NumPy: https://github.com/numpy/numpy/blob/master/LICENSE.txt
• SciPy: https://scipy.org/scipylib/license.html
• Scikit-learn: https://scikit-learn.org/stable/
• Pandas: https://github.com/pandas-dev/pandas/blob/master/LICENSE
• Matplotlib: https://matplotlib.org/users/license.html
• Bokeh: https://github.com/bokeh/bokeh/blob/master/LICENSE.txt
• Holoviews: https://github.com/pyviz/holoviews/blob/master/LICENSE.txt
• Flask: http://flask.pocoo.org/docs/1.0/license/
• SQLAlchemy: https://docs.sqlalchemy.org/en/latest/copyright.html
• Cytoflow: https://github.com/bpteague/cytoflow/blob/master/LICENSE.txt
• FlowUtils: https://github.com/whitews/FlowUtils/blob/master/LICENSE
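Returning to the weighted k-means objective and the Voronoi assignment defined in this section, both can be checked numerically on a toy example. The sketch below is a direct NumPy transcription under stated assumptions and is not code from the Freecyto repository: the function names are hypothetical, and the four points, two centers, and 0/1 weights are made-up values chosen so that the result can be verified by hand.

```python
import numpy as np


def weighted_sq_dist(points, centers, weights):
    """(x - mu)^T Omega (x - mu) for every point/center pair, Omega = diag(weights)."""
    diff = points[:, None, :] - centers[None, :, :]
    return np.einsum("nkd,d,nkd->nk", diff, weights, diff)


def voronoi_assign(points, centers, weights):
    """Index of the nearest center under the weighted metric (ties go to the lowest index)."""
    return np.argmin(weighted_sq_dist(points, centers, weights), axis=1)


def weighted_loss(points, labels, centers, weights):
    """Sum over clusters of the weighted squared distances to each centroid."""
    dist = weighted_sq_dist(points, centers, weights)
    return float(dist[np.arange(len(points)), labels].sum())


def cluster_mse(points, labels, centers, weights):
    """Mean weighted squared error inside each cluster (NaN for empty clusters)."""
    dist = weighted_sq_dist(points, centers, weights)[np.arange(len(points)), labels]
    k = len(centers)
    totals = np.bincount(labels, weights=dist, minlength=k)
    counts = np.bincount(labels, minlength=k)
    with np.errstate(invalid="ignore", divide="ignore"):
        return totals / counts


# Toy data (assumed): two channels, but only the first one is selected.
points = np.array([[0.0, 5.0], [2.0, -5.0], [10.0, 5.0], [12.0, -5.0]])
centers = np.array([[1.0, 0.0], [11.0, 0.0]])
weights = np.array([1.0, 0.0])

labels = voronoi_assign(points, centers, weights)   # [0, 0, 1, 1]
loss = weighted_loss(points, labels, centers, weights)  # 4.0
mse = cluster_mse(points, labels, centers, weights)     # [1.0, 1.0]
```

With the second channel given weight 0, the large spread in that channel contributes nothing: every point is exactly 1 unit from its centroid in the selected channel, so the loss is 4 and each cluster's MSE is 1.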
Myoblast cultures
Transgenic GFP+ and WT (C57.B6) mouse myoblasts were cultured in growth medium: Ham’s F10, 20% Bovine Growth Serum and 5 ng/ml bFGF on 1 µg/cm² Matrigel. Cells were washed and detached with PBS (three 37C) and were pelleted by centrifugation. Cells were pelleted and counted using a hemocytometer.
Cell culture and apoptotic assay
Normal human lung fibroblast cells (IMR-90) were obtained from ATCC. When cells were grown to 70% confluence, they were subcultured at dilution for later passaging.
The apoptotic assay of IMR90 cells was conducted with the Apoptosis Detection Kit (ab214663, Abcam) according to the manufacturer’s protocol. Briefly, cells were detached using 0.05% trypsin and washed twice with PBS. Then, samples were resuspended in 1x annexin-binding buffer and incubated with 5 µL Annexin V-FITC and 5 µL 7-amino-actinomycin D (7-AAD) for 15 min at 37°C, avoiding light. Finally, events were acquired with a Guava Easycyte flow cytometer (Millipore-Sigma) and analysed by Freecyto and FlowJo software individually to quantify the distribution of cells.
Abbreviations
FCM : Flow cytometry
Event(s) : Emission(s) of fluorescence and light scattering of cells or particles
t-SNE : Barnes-Hut approximation of t-distributed stochastic neighbour embedding
K-means : Lloyd’s Algorithm with Euclidean distances for k-means clustering (k-means++ is used for cluster center initialization).
MSE : Mean squared error
WT : Wild type
GFP : Green fluorescent protein
IMR-90 : Human lung fibroblast cells
Data Availability
The datasets generated and/or analysed during the current study are available in the Freecyto Github repository, https://github.com/nathan2wong/freecyto/tree/master/datasets.
Project name: Freecyto
Project homepage: https://freecyto.com
Demo: https://youtu.be/JlIVgxh4_YA
Archived version: https://github.com/nathan2wong/freecyto
Operating system(s): Platform independent
Programming Language: Python, JavaScript
Other requirements: Listed on GitHub
License: BSD3
Any restrictions to use by non-academics: License Needed
Acknowledgements
We would like to thank Alex Park for providing technical help with these studies, and Michael Conboy for the helpful suggestions on the work and the manuscript.
Funding
This work was supported by NIH R01 EB023776, R01 HL139605 and Open Philanthropy awards to IC, and the funds were used to support the data collection of the study.
Author Information
Affiliations
Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA 94720, USA
Nathan Wong, Daehwan Kim, Zachery Robinson, Connie Huang, and Irina M. Conboy
Contributions
NW created the Freecyto software and wrote the manuscript. ZR provided figures, data, and analyses of the GFP cell experiment (Figure 3). DH provided figures, data, and analyses of the apoptotic cell experiment (Figure 4). CH provided figures, tables (Figure 1A, Table 1), and contributed code for downstream analysis in the Freecyto software. IC co-wrote the manuscript and contributed to design of these studies. All authors read and approved the final manuscript.
Corresponding Author
Correspondence to Nathan Wong ([email protected]) and Irina Conboy ([email protected]).
Ethics Declarations
The authors declare no competing interests.
Additional Information
Necessary Resources
Freecyto is designed to be fully compatible with a standard user setup, and very little setup is required to begin using Freecyto for your flow cytometry needs.
• A web browser with JavaScript enabled (core functions in the interactive analysis portion require JavaScript to be fully functional). Common browsers that satisfy this requirement include Google Chrome and Firefox. Mobile devices that have a mobile web browsing application can also satisfy this requirement.
• A valid Google ID or email address. This allows Freecyto to recognize the user and keep records of previous jobs performed under this user ID.
• A valid internet connection (HTTP, HTTPS) is required to access the online interface of Freecyto.
Walkthrough
To begin, navigate to freecyto.com. Note that several documentation options are available for viewing on the home page. These options include: (1) Detailed, feature-specific documentation, (2) Video run-through of the application, (3) Open-source licenses and attributions, (4) Freecyto’s privacy policy, and (5) Login URL to access the Freecyto application interface. [Supplemental Figure 1]
Next, press “advanced analysis” to access the interactive visualizations of the flow cytometry data. This is an example of the shallow gating feature, in which selecting a sub-population of cells will display that sub-population across all selected fluorescence channels. [Supplemental Figure 2]
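The idea behind shallow gating can be sketched in a few lines: pick the events that fall inside a rectangular region on two channels, then read off the values of those same events on every channel. The sketch below is only an illustration of that idea and does not reproduce the Freecyto interface or its code. The function name `shallow_gate`, the channel names, and the toy values are hypothetical.

```python
def shallow_gate(events, x_channel, y_channel, x_range, y_range):
    """Select events inside a rectangular region on two channels.

    events: list of dicts mapping channel name -> intensity.
    Returns (selected_events, fraction_selected); each selected event keeps
    all of its channels, so it can be displayed on any of them.
    """
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    selected = [
        e for e in events
        if x_lo <= e[x_channel] <= x_hi and y_lo <= e[y_channel] <= y_hi
    ]
    fraction = len(selected) / len(events) if events else 0.0
    return selected, fraction


# Toy events with hypothetical channel names; these are not study data.
events = [
    {"FSC": 120, "SSC": 80, "GRN": 15},
    {"FSC": 300, "SSC": 210, "GRN": 900},
    {"FSC": 310, "SSC": 190, "GRN": 870},
    {"FSC": 90, "SSC": 400, "GRN": 20},
]
gated, fraction = shallow_gate(events, "FSC", "SSC", (250, 350), (150, 250))
print(f"{fraction:.0%} of events selected")
for channel in ("FSC", "SSC", "GRN"):
    print(channel, [e[channel] for e in gated])
```

Because the selection returns whole events rather than just the two gated coordinates, the same sub-population can be highlighted on every plotted channel.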
References
O'Neill, K., Aghaeepour, N., Špidlen, J. & Brinkman, R. Flow cytometry bioinformatics. PLoS Comput. Biol., e1003365, DOI: 10.1371/journal.pcbi.1003365 (2013).
Lugli, E., Roederer, M. & Cossarizza, A. Data analysis in flow cytometry: The future just started. Cytom. Part A, 705–713, DOI: 10.1002/cyto.a.20901 (2010).
FlowJo™ software. [software application] (2019).
Ramel, S. et al. Evaluation of p53 protein expression in Barrett’s esophagus by two-parameter flow cytometry. Gastroenterology, 1220–1228, DOI: https://doi.org/10.1016/0016-5085(92)70016-5 (1992).
Leith, C. et al. Correlation of multidrug resistance (MDR1) protein expression with functional dye/drug efflux in acute myeloid leukemia by multiparameter flow cytometry: identification of discordant MDR-/efflux+ and MDR1+/efflux- cases. Blood, 2329–2342, DOI: 10.1182/blood.V86.6.2329.bloodjournal8662329 (1995). https://ashpublications.org/blood/article-pdf/86/6/2329/617651/2329.pdf.
Rosner, M., Schipany, K. & Hengstschläger, M. Merging high-quality biochemical fractionation with a refined flow cytometry approach to monitor nucleocytoplasmic protein expression throughout the unperturbed mammalian cell cycle. Nat. Protoc., 602–626, DOI: 10.1038/nprot.2013.011 (2013).
Darzynkiewicz, Z. et al. Features of apoptotic cells measured by flow cytometry. Cytometry, 795–808, DOI: 10.1002/cyto.990130802 (1992).
Barlogie, B. et al. Flow cytometry in clinical cancer research. Cancer Res., 3982–3997 (1983). https://cancerres.aacrjournals.org/content/43/9/3982.full.pdf.
Keyes, T. J., Domizi, P., Lo, Y.-C., Nolan, G. P. & Davis, K. L. A cancer biologist's primer on machine learning applications in high-dimensional cytometry. Cytom. Part A, 782–799, DOI: 10.1002/cyto.a.24158 (2020).
Brando, B. et al. Cytofluorometric methods for assessing absolute numbers of cell subsets in blood. Cytometry, 327–346, DOI: https://doi.org/10.1002/1097-0320(20001215)42:6<327::AID-CYTO1000>3.0.CO;2-F (2000). https://onlinelibrary.wiley.com/doi/pdf/10.1002/1097-0320%2820001215%2942%3A6%3C327%3A%3AAID-CYTO1000%3E3.0.CO%3B2-F.
Lugli, E., Troiano, L. & Cossarizza, A. Investigating T cells by polychromatic flow cytometry. Methods molecular biology (Clifton, N.J.), 47–63, DOI: 10.1007/978-1-60327-527-9_5 (2009).
Benedek, G., Meza-Romero, R., Bourdette, D. & Vandenbark, A. A. The use of flow cytometry to assess a novel drug efficacy in multiple sclerosis. Metab. Brain Dis., 877–884, DOI: 10.1007/s11011-014-9634-0 (2014).
Hu, W. et al. RNA-directed gene editing specifically eradicates latent and prevents new HIV-1 infection. Proc. Natl. Acad. Sci., 11461–11466, DOI: 10.1073/pnas.1405186111 (2014).
McKinnon, K. M. Flow cytometry: An overview. Curr. Protoc. Immunol., DOI: 10.1002/cpim.40 (2018).
Maecker, H. T. & Trotter, J. Flow cytometry controls, instrument setup, and the determination of positivity. Cytom. Part A, 1037–1042, DOI: 10.1002/cyto.a.20333 (2006).
Kotecha, N., Krutzik, P. O. & Irish, J. M. Web-based analysis and publication of flow cytometry experiments. Curr. Protoc. Cytom., 10.17.1–10.17.24, DOI: 10.1002/0471142956.cy1017s53 (2010).
Finak, G. et al. OpenCyto: An open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Comput. Biol., e1003806, DOI: 10.1371/journal.pcbi.1003806 (2014).
Hammer, M. M., Kotecha, N., Irish, J. M., Nolan, G. P. & Krutzik, P. O. WebFlow: A software package for high-throughput analysis of flow cytometry data. ASSAY Drug Dev. Technol., 44–55, DOI: 10.1089/adt.2008.174 (2009).
Murphy, R. F. Automated identification of subpopulations in flow cytometric list mode data using cluster analysis. Cytometry, 302–309, DOI: 10.1002/cyto.990060405 (1985).
Bruggner, R. V., Bodenmiller, B., Dill, D. L., Tibshirani, R. J. & Nolan, G. P. Automated identification of stratifying signatures in cellular subpopulations. Proc. Natl. Acad. Sci., E2770–E2777, DOI: 10.1073/pnas.1408792111 (2014).
Ye, X. & Ho, J. W. K. Ultrafast clustering of single-cell flow cytometry data using FlowGrid. BMC Syst. Biol., DOI: 10.1186/s12918-019-0690-2 (2019).
Ge, Y. & Sealfon, S. C. flowPeaks: a fast unsupervised clustering for flow cytometry data via k-means and density peak finding. Bioinformatics, 2052–2058, DOI: 10.1093/bioinformatics/bts300 (2012).
Dorfman, D. M., LaPlante, C. D. & Li, B. FLOCK cluster analysis of plasma cell flow cytometry data predicts bone marrow involvement by plasma cell neoplasia. Leuk. Res., 40–45, DOI: 10.1016/j.leukres.2016.07.003 (2016).
Bendall, S. C. et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science, 687–696, DOI: 10.1126/science.1198704 (2011).
Mair, F. et al. The end of gating? An introduction to automated analysis of high dimensional cytometry data. Eur. J. Immunol., 34–43, DOI: 10.1002/eji.201545774 (2015).
Yuan, C. & Yang, H. Research on k-value selection method of k-means clustering algorithm. J, 226–235, DOI: 10.3390/j2020016 (2019).
Pham, D. T., Dimov, S. S. & Nguyen, C. D. Selection of k in k-means clustering. Proc. Inst. Mech. Eng. Part C: J. Mech. Eng. Sci., 103–119, DOI: 10.1243/095440605x8298 (2005).
Bagwell, C. B. Hyperlog - a flexible log-like transform for negative, zero, and positive valued data. Cytom. Part A, 34–42, DOI: 10.1002/cyto.a.20114 (2005).
Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol., 1482–1492, DOI: 10.1038/s41587-019-0336-3 (2019).
Supplemental Figures
Supplemental Figure 1. Freecyto Quick Visualization Walkthrough.
(A) Freecyto Homepage. Navigate to freecyto.com and select login to continue.
After clicking Login to access the Freecyto application interface, you need to create a new user account either through Google or email. If you already have an account on Freecyto, log in with those credentials.
(B) Freecyto Login Page. Create an account using a Google or Email ID.
Once you have successfully logged in, you will be able to access your personal user portal. From here, you can see all past analyses that you performed (linked to your user ID). You can also sort and search past saved analyses and access visualizations of those analyses directly and quickly by clicking on the corresponding link.
(C) Freecyto User Portal. View previously performed analyses and access the page to create a new job.
New users will have no previous experiments saved. However, each time the user uploads data or another user shares an experiment, the experiment will be listed in the table of the home page. These experiments can be sorted, indexed, and accessed without needing to repeat previously performed analysis operations. To begin a new job, click “New Job” located in the left column of the dashboard. Next, upload any number of FCS files you wish to analyze.
(D) Freecyto New Job. Upload new FCS file(s) to begin a new analysis job.
After the files have been uploaded, you will be able to access the quick visualizations page, in which the standard histograms, scatterplots, heatmaps, and box-whisker diagrams are displayed in a slideshow (image carousel) format.
(E) Freecyto Quick Visualization. View histograms, scatterplots, box-whisker diagrams, heatmaps of the uploaded flow cytometry data.
You may also change the fluorescence channels displayed at this time, by scrolling to the bottom of the page and selecting the new fluorescence channels to display.
(F) Changing the quick visualization display options.
Supplemental Figure 2. Freecyto Interactive Analysis Walkthrough.
Next, press “advanced analysis” to access the interactive visualizations of the flow cytometry data. This is an example of the shallow gating feature, in which selecting a sub-population of cells will display that sub-population across all selected fluorescence channels.
(A) Freecyto Interactive Shallow Gating. Shallow gating to see associated fluorescence values of a selected region.
Coordinate gating analysis can then be performed to determine the percentage of cells that are located within or outside the bounds of preset x and y values.
(B) Freecyto Interactive Coordinate Gating Display. Gate flow cytometry experimental files based on specific X and Y values and see the percentage of cells within and outside these regions.
Deep gating can also be performed to specifically examine sub-populations of cells.
(C) Freecyto Interactive Deep Gating Display (Before).
(D) Freecyto Interactive Deep Gating Display (After).
Supplemental Figure 3. Multiprocessing vs No Pipeline.
Plots show the time taken to process files when using multiprocessing vs. no multiprocessing for (A) Quick visualization and for (B) Advanced visualization.
Supplemental Figure 4. Some advantages of the Freecyto analysis include multiple file upload and quick data visualization.
(A) Multiple File Upload. You can upload multiple files here and customize available settings, such as t-SNE and KDE visualizations with the option of various transformations.
(B) Quick Visualizations. You now have access to many different visualizations of your uploaded data, including histograms, kernel density plots, and heatmaps.
Supplemental Figure 5. Downstream analysis of flow cytometry experiments.
(A) Visualizing the local structure of the 50:50 WT/GFP+ experiment. Ward hierarchical clustering is performed downstream of the k-means quantization on the Spearman correlation matrix of the Green Fluorescence and Side Scatter channels. We find the 2 distinct sub-populations as expected from this experiment.
(B) Dimensionality reduction comparison. Various dimensionality reduction techniques (PCA, t-SNE, PHATE) were performed on the same downstream data, but with all 15 channels selected as features. As expected, 2 distinct sub-populations were noted in each of these methods.