Leveraging Big Data Analytics in Healthcare Enhancement: Trends, Challenges and Opportunities
Arshia Rehman · Saeeda Naz · Imran Razzak
Abstract
Clinicians' decisions are becoming more and more evidence-based, which means that in no other field is big data analytics as promising as in healthcare. Owing to the sheer size and availability of healthcare data, big data analytics has revolutionized this industry and promises a world of opportunities: early detection, prediction and prevention of disease, and an improved quality of life. Researchers and clinicians are working to ensure that big data has a positive impact on health in the future. Different tools and techniques are being used to analyze, process, accumulate, assimilate and manage large amounts of healthcare data in structured or unstructured form. In this paper, we address the need for big data analytics in healthcare: why and how can it help to improve life? We present the emerging landscape of big data and analytical techniques in five sub-disciplines of healthcare, i.e., medical image analysis and imaging informatics, bioinformatics, clinical informatics, public health informatics and medical signal analytics. We present the architectures, advantages and repositories of each discipline, which together give an integrated picture of how distinct healthcare activities are accomplished in the pipeline to support individual patients from multiple perspectives. Finally, the paper ends with the notable applications and challenges in the adoption of big data analytics in healthcare.

A. Rehman
Computer Science Department, Govt. Girls Postgraduate College No.1, Abbottabad, KPK, Pakistan
E-mail: [email protected]
S. Naz
Computer Science Department, Govt. Girls Postgraduate College No.1, Abbottabad, KPK, Pakistan
E-mail: [email protected]
I. Razzak
Deakin University, Geelong, Australia
E-mail: [email protected]
Keywords
Big Data Analytics · Medical Image Processing and Imaging Informatics · Bioinformatics and Genomics · Clinical Informatics · Public Health Informatics · Medical Signal Analytics
Owing to the sheer size and availability of multidimensional data, the rate of technological innovation has huge potential to make an extraordinary impact on our daily life in different disciplines, especially in the healthcare sector. This rapidly growing and heavily exploited data has given rise to a new term: big data. Uncovering information from data of such a complicated nature is often a complex process. The development of tools and methods for analyzing such large quantities of data gives us an opportunity to make the transition into this new era far easier. Having data-driven, real-time insights accessible to the organization through analytics can be a critical enabler for executing the organization's strategies. The greatest asset of big data analytics is the possibilities it opens up and the new ways it offers to provide the services we are looking for.

Unlike in other fields, big data analytics is especially promising in the healthcare sector and has received much attention in the last few years. Clinicians' decisions are becoming evidence-based, meaning that they rely more on large swathes of research and clinical data as opposed to solely their schooling and professional opinion. Big data in healthcare is defined as large and complex electronic healthcare datasets that are problematic or almost impossible to manage with common traditional methods, tools or software [83,204,17]. Big data in healthcare is generated by healthcare records (such as patient records, disease surveillance, hospital, medicine, health management, doctor, clinical decision support or patient feedback [36,65,81,82]) and clinical data (such as imaging, personal and financial records, genetic and pharmaceutical data and Electronic Medical Records (EMR) [259,183]).
The generation and management of these enormous healthcare records is considered very complex, and big data analytics was introduced to address this [278,272]. With the rise of technological innovation and personalised medicine, big data analytics has the potential to make a huge impact on our life, i.e., on how we predict, prevent, manage, treat and cure disease. Furthermore, it helps government agencies, policy makers and hospitals to manage resources, improve medical research, plan preventative methods and manage epidemics.

With the advancement of information technology and the emergence of digitized computerized systems, hard-copy medical data is moving towards Electronic Health Record (EHR) and Electronic Medical Record (EMR) systems. These systems have generated exponential growth of data [237,212]. Health data is collected not only from clinical records, tele-monitoring or medical tests, but also from a large number of healthcare apps with a tremendous number of subscriptions. According to the Ericsson Mobility Report of 2018, in Q4 of 2017 there were a total of 7.8 billion mobile subscriptions, with 53 million new subscriptions added during the quarter, as a growing number of people contribute new and valuable data about health and well-being every day. These apps hold voluminous data owing to the world of social media. More than two billion people use the internet for mailing, downloading, surfing, blogging, entertainment, etc. This amount of data also moves towards the concept of big data. Fig. 1 depicts the ecosystem of healthcare assisted by big data and cloud computing approaches.

Turning to the five characteristics of big data in the healthcare sector, Volume refers to medical records of personal data, clinical data, radiology images, genetics and population information, and resource-intensive applications like 3D imaging, genomics and biological sequences. Likewise, the rapid increase in diseases and medications produces exponential growth of data that has to be stored, manipulated and managed. For the effective capture, management and manipulation of data, modern techniques such as advances in data management, cloud computing and visualization play a vital role in healthcare systems. Volume is rapidly increasing in biomedical informatics: ProteomicsDB [275], for example, contains a data volume of 5.17 TB covering 92% of the human genes described in the Swiss-Prot database. A vast volume is also produced by medical images: the Visible Human Project includes a female dataset of 39 GB [3]. The volume of big data in healthcare was estimated to increase to 35 zettabytes by 2020 [102,214].

Fig. 1
Healthcare Ecosystem Assisted by Big Data and Cloud Computing [168]
Variety in healthcare reflects the gigantic amount of healthcare records, whether structured, unstructured or semi-structured. A variety of unstructured healthcare records is generated daily, such as patient information, doctors' notes, prescriptions, clinical or official medical records, and MRI, CT and radiographic film images. Furthermore, the structured and semi-structured variety relating to EMR and EHR systems comprises actuarial data, electronic apps and automated database information such as physician name, hospital name, treatment reimbursement codes, patient name and address, electronic billing and accounting information, and some clinical and laboratory instrument readings. For converting unstructured data into structured datasets, data analytics provides different facilities; one of them is natural language processing, as in Health Fidelity.

Another important characteristic is velocity, which ranges from data at rest to data in motion. At-rest healthcare records encompass doctors' or nurses' notes, scripts, documentary files, rendered records, X-ray films, etc. Medium-velocity healthcare data includes blood pressure readings, daily glucose measurements by insulin pumps, EKGs, etc. Sometimes, however, high velocity is required because it becomes a matter of life or death. This type of data involves real-time data such as in-heart monitoring, blood pressure monitoring in anesthesia and trauma care, operating-room monitoring, and early-stage detection of infections or diseases such as cancer.
Value describes how beneficial the data is to the healthcare ecosystem. For example, raw data such as paper prescriptions, official records or patient information is less valuable than diagnostic records, medication records and laboratory instrument readings. Veracity describes the reliability and understandability of healthcare records; it concerns the accurate capture of diagnoses, procedures, treatments, etc., and the verification of patient, hospital and reimbursement code information.

Different domains of healthcare and medical care have been proposed in the literature. This review paper discusses five sub-disciplines (i.e., medical image processing and imaging informatics, bioinformatics, clinical informatics, public health informatics, medical signal analytics) that are directly or indirectly involved in healthcare and biomedicine [213,211]. Before presenting the literature review, we present the theoretical background of big data and data analytics in Section 2. Different architectures of big data analytics deployed in the healthcare domain are explained in Section 3. Section 4 presents the advantages of big data for healthcare, which give insight into how healthcare can be improved by big data analytics. We then move to the literature review, for which we propose a review methodology for the selection of articles, explained in Section 5. Based on the review methodology, big data in the five sub-disciplines of healthcare (i.e., medical image processing and imaging informatics, bioinformatics, clinical informatics, public health informatics, medical signal analytics) is explained comprehensively in Section 6. We summarize our main findings in Section 7. Section 8 then presents the notable applications of healthcare analytics based on the main findings. Section 9 discusses the challenges and open research issues. Finally, Section 10 draws the conclusions of this paper.
The concept of big data was introduced in the 1990s by Cox and Ellsworth [51], who considered visualization as a big data problem. The first significant academic reference to big data in computer science is due to Weiss and Indurkhya [274]. In 2000, Diebold [67] introduced big data in statistics/econometrics, referring to the explosion in the quantity and quality of available information. The concept was enriched by Douglas Laney at Gartner in an unpublished 2001 research note [155]. In short, the term big data is attributed to Weiss and Indurkhya, Diebold, and Laney.

Big data is the name given to large and enormous datasets that are usually so complex that traditional information processing techniques are not enough to deal with them. The main difficulties or challenges of big data are how to capture, store, share and analyze data, how to visualize, update or query it, and how to preserve information privacy. In the view of Radar [191], big data deals with huge amounts of data that do not fit into conventional databases, so an alternative way must be chosen to extract and process the data. According to ZDNet, big data involves techniques and procedures for the creation, manipulation and organization of large datasets, and the facilities offered for their storage. Techopedia describes big data as large, complex, unstructured data that is processed by massive parallelism on readily available hardware because relational database engines are unable to process it. The literature characterizes big data as larger datasets, enormous growth of data, massive data, and unstructured or complex data [242,77,45,100,80].

Basically, the main characteristics of big data are complexity and massive size [202,21,22]. However, big data is commonly described by three characteristics known as the 3Vs: volume, variety and velocity [273,172,223]. Two additional characteristics, value and veracity [77,230,231], extend these to the 5Vs of big data, as depicted in Fig. 2.
Volume refers to the size or quantity of stored and generated data. When the volume of data is large, it becomes big data [169,168]. Variety is the type or nature of data gathered from several sources. Data varies in the format in which it is stored, such as CSV, text or Excel, and in its form, such as video, audio, SMS or PDF [168]. This variety is one of the decisive characteristics of big data. Velocity specifies the speed at which data is generated or processed. Value describes how beneficial or valuable the data is. Big data and value are strongly correlated, because merely storing raw data is useless; huge data becomes valuable through the costs and benefits of collecting and evaluating it [168]. Veracity is the quality and understandability of data. In other words, the reliability, quality and accuracy of big data depend on the veracity property, because it guards against dirty data.

Fig. 2

Data analytics combines two words: data refers to raw facts, figures and information, and analytics means the use of several tools to analyze data, whether small or big. Analytics is an umbrella term for all data analysis applications [273]. Big data analytics is the process of analyzing large, voluminous data using different strategies. As mentioned above, big data is integrated from multiple sources; big data analytics is therefore used to explore how to extract valuable hidden patterns and connections from this integrated data. In other words, big data analytics is the analysis of data with the intention of extracting information and supporting decision making through the inclusive procedure of scrutinizing, modeling, cleansing and transforming big data.

Data analytics can be divided into three general methods: descriptive, predictive and prescriptive analytics [86,29]. Descriptive analytics deals with the condensation of big data into smaller, meaningful information.
Predictive analytics predicts future outcomes by deploying a diversity of machine learning, statistical, modeling and data mining techniques to study recent and historical data. Prescriptive analytics builds on predictive analytics to recommend actions and support business decisions.

The most extensively used approaches for predictive and descriptive analytics on big data are based on supervised, unsupervised or hybrid machine learning. The exponential increase in data over time has made it difficult to extract valuable information from this data. Despite the strong performance of traditional methods, their predictive power is limited, as traditional analysis deals only with primary analysis whereas data analytics deals with secondary analysis. Data mining involves digging through data from many dimensions or perspectives with data analysis tools to find previously unknown patterns and associations that may be used as valid information [215,186,210,185]. Moreover, it uses this extracted information to build predictive models. It has been deployed intensively and extensively by many organizations, especially in the healthcare sector.

Data mining is not a magic wand but a powerful tool that does not discover solutions without guidance. Data mining is suited to the following purposes:
– Exploratory data analysis, to examine a data corpus and summarize its main characteristics.
– Descriptive modeling, to segregate the data into clusters based on their properties.
– Predictive modeling, to forecast information from existing data.
– Pattern discovery, to find patterns that occur frequently.
– Content retrieval, to discover hidden patterns.

Several techniques are deployed for dimensionality reduction, optimization, regression analysis, etc. on big data.
Because of the voluminous amount of big data, its dimensionality is reduced by linear mapping approaches such as Principal Component Analysis (PCA) [120] and Singular Value Decomposition (SVD) [253]. Non-linear mapping methods for dimensionality reduction include Kernel Principal Component Analysis (KPCA) [234], Sammon's mapping [228,62] and Laplacian eigenmaps [18].

Mathematical optimization is another analytics tool; it involves multi-objective and multi-modal optimization approaches such as Pareto optimization [195,121] and evolutionary algorithms [64,11]. Extracting meaningful information and forming and analyzing clusters is achieved by various clustering algorithms such as Clustering LARge Applications (CLARA) [142] and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [289].
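To make the idea of a linear mapping for dimensionality reduction concrete, the following is a minimal sketch of PCA restricted to the first principal component. It is an illustration only, not the method of any cited work: the function name `first_principal_component` and the blood pressure readings are hypothetical, and power iteration is used here as one simple way to obtain the leading eigenvector of the covariance matrix without external libraries.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, 1-D scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    # Center each column so that the covariance is taken around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the eigenvector
    # with the largest eigenvalue (assuming the start vector is not
    # orthogonal to it and the data are not constant).
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered record onto the direction: d values become 1.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical (systolic, diastolic) readings for five patients.
readings = [[110, 70], [120, 80], [130, 90], [140, 100], [150, 110]]
direction, scores = first_principal_component(readings)
```

In this toy dataset the two measurements move together, so a single score per patient retains all of the variation. Real healthcare data would typically need several components and a numerically robust library routine.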
Our proposed general framework of big data analytics for healthcare is an abstraction of several conceptual steps that describe the generic functionalities of the domain. The first step in the framework is data collection, in which health and clinical data is collected from internal or external sources. The variety of data includes Electronic Healthcare Records (EHRs), clinical images, health monitoring device logs, etc. The next step is data processing, in which healthcare data is stored, extracted and loaded into data warehouses, middleware or traditional formats such as CSV files and tables. Data transformation follows, in which data is transformed, aggregated and loaded into distributed file systems such as the Hadoop Distributed File System (HDFS) or a Hadoop cloud. The analytical phase examines the big data using big data tools and platforms such as Hadoop, MapReduce, Hive, HBase, Jaql, Avro and several others. Finally, the output is generated in the form of reports and queries using data mining and OLAP tools. The general conceptual architecture is depicted in Fig. 3 and Fig. 19.
Fig. 3
Conceptual Journey of Data to Information in BigData Analytics Environment
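The five steps of the framework (collection, processing, transformation, analysis, reporting) can be sketched in a few lines of Python. The example is only an illustration under stated assumptions: the CSV content, the function names `collect`, `process`, `transform`, `analyze` and `report`, and the heart-rate threshold are hypothetical and do not come from the framework itself, and in-memory lists stand in for the data warehouse, HDFS and OLAP tools.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw export from a monitoring device.
RAW_CSV = """patient_id,metric,value
p1,heart_rate,72
p1,heart_rate,88
p2,heart_rate,95
p2,heart_rate,
p2,heart_rate,101
"""

def collect(raw_text):
    """Step 1, data collection: read records from a source."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def process(records):
    """Step 2, data processing: extract usable rows and load them with types."""
    return [
        {"patient_id": r["patient_id"], "value": float(r["value"])}
        for r in records
        if r["value"]  # drop rows with a missing reading
    ]

def transform(rows):
    """Step 3, data transformation: aggregate readings per patient."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["patient_id"]].append(row["value"])
    return dict(grouped)

def analyze(grouped):
    """Step 4, analytics: compute a simple statistic per patient."""
    return {pid: mean(values) for pid, values in grouped.items()}

def report(stats, threshold=90.0):
    """Step 5, output: one human-readable report line per patient."""
    return [
        f"{pid}: mean heart rate {avg:.1f}"
        + (" (above threshold)" if avg > threshold else "")
        for pid, avg in sorted(stats.items())
    ]

lines = report(analyze(transform(process(collect(RAW_CSV)))))
```

Keeping each step as a separate function mirrors the staged structure of the framework: any stage could be swapped for a distributed implementation without changing the others.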
Based on the domain abstraction and identification, researchers have proposed and developed several big data architectures for big data analytics. Some of the important architectures are Hadoop, MapReduce [63], streaming graph [248], fault-tolerant graph, etc. We present some of the renowned architectures along with their core components in detail. One of the major frameworks on the Apache platform is Hadoop, developed by Doug Cutting within the Apache Lucene project. It is a collection of open-source software utilities used for distributed computation, processing and storage of huge datasets or big data. The two core components of Hadoop are:
– Hadoop Distributed File System (HDFS)
– MapReduce
Fig. 5 and Fig. 6 depict the core components and the basic framework of Apache Hadoop.

3.1 Hadoop Distributed File System (HDFS)

HDFS [244] is a master-slave architecture intended to run on commodity hardware. It provides high-throughput access to application data. It supplies the underlying storage for the Hadoop cluster and enhances healthcare data analytics systems by dividing huge volumes of data into smaller pieces and distributing them across various servers/nodes. The architecture of HDFS is divided into Name-node and Data-node, where the Name-node is the master and the Data-node is the slave. Documents are stored on the Data-nodes in blocks of 64 MB that cannot be changed. Fig. 7 illustrates the architecture of HDFS. In Fig. 7, the Client is an HDFS user. The Name-node is responsible for managing the namespace of the file system; it stores and maintains files and folders in a file system tree. The Data-node is where the actual data is stored and handled.

3.2 MapReduce

MapReduce is the other cornerstone of Apache Hadoop; it was introduced in 2004, when Google published a paper [63]. MapReduce is a functional-style programming model for processing and analyzing data.
It breaks a task into sub-tasks, gathers their outputs, and efficiently analyzes large datasets in parallel. Data analysis and processing take place in two steps, namely the Map phase and the Reduce phase.

The architecture of a MapReduce operation is split into three main components: Client, Job-Tracker and Task-Tracker. The Client submits its job to the Job-Tracker in the form of a JAR file. The Job-Tracker maintains all jobs executed on MapReduce and thus acts as the master service. The Task-Tracker executes the jobs assigned by the Job-Tracker and thus acts as the slave service. Fig. 8 demonstrates the generic architecture of a MapReduce operation.

3.3 Apache Hive

Apache Hive [39] is a Structured Query Language (SQL) based Extract-Transform-Load (ETL) and data warehouse tool on the Hadoop platform. It is a run-time Hadoop support framework built around the Hive Query Language (HQL), which converts SQL-like queries into MapReduce jobs. The main operations performed by Hive are data encapsulation, analysis, ad hoc querying and summarization of large datasets. Apache Hive has four major components: Hive clients, services, processing framework and distributed storage. Hive clients such as Thrift, JDBC and ODBC clients can be written in any supported language such as C++, Java or Python. Services are used to perform queries; they include the command line interface (CLI), web interface (WI), Hive server, driver, meta-store, etc. Queries are processed, executed and managed using the internal Hadoop MapReduce framework. Finally, the distributed data is stored in HDFS. The core components are shown in Fig. 9.

3.4 Apache HBase

Apache HBase follows a non-SQL, non-relational approach. It is a column-oriented database management system that sits on top of HDFS. It uses key/value data to perform read/write operations on large HDFS databases. Apache HBase comprises three main components: HMaster Server, HBase Region Server, and Zookeeper. The HMaster Server is the main component; it manages and monitors the HBase Region Servers and performs database operations using DDL to create, update and delete tables. HBase tables are divided into several regions, which are managed and served by the HBase Region Servers. HBase is a distributed system that is coordinated by Zookeeper. The components of Apache HBase are depicted in Fig. 10.

Fig. 4
Architecture of Big Data Analytics Platform
Fig. 5
Core Components of Hadoop
Fig. 6
Framework of Hadoop
Fig. 7
Architecture of HDFS
Fig. 8
MapReduce Architecture
Fig. 9
Hive Architecture
Fig. 10
Hbase Architecture
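The interplay of the two Hadoop core components described in Sections 3.1 and 3.2 can be illustrated with a small in-memory simulation: the input is cut into blocks that are spread over data nodes, and a Map phase followed by a Reduce phase counts diagnoses across the blocks. This is a sketch under stated assumptions, not Hadoop code: it uses no Hadoop API, the function names, node names and patient records are hypothetical, the block size is two records rather than tens of megabytes, and replication and fault tolerance are left out.

```python
from collections import defaultdict

def split_into_blocks(records, block_size):
    """HDFS-style step: cut a large input into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def assign_to_nodes(blocks, node_names):
    """Name-node-style bookkeeping: record which data node holds each block."""
    placement = defaultdict(list)
    for index, block in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(block)
    return dict(placement)

def map_phase(block):
    """Map: emit a (diagnosis, 1) pair for every record in one block."""
    return [(diagnosis, 1) for _, diagnosis in block]

def shuffle(mapped_pairs):
    """Group the intermediate values by key before reduction."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each diagnosis."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical (patient_id, diagnosis) records.
records = [("p1", "diabetes"), ("p2", "asthma"), ("p3", "diabetes"),
           ("p4", "hypertension"), ("p5", "asthma"), ("p6", "diabetes")]

blocks = split_into_blocks(records, block_size=2)
placement = assign_to_nodes(blocks, ["datanode-1", "datanode-2"])

# Each block is mapped independently; on a cluster this would run in parallel.
mapped = [pair
          for node_blocks in placement.values()
          for block in node_blocks
          for pair in map_phase(block)]
counts = reduce_phase(shuffle(mapped))
```

Because every block is mapped without reference to any other block, the Map phase can be distributed freely; only the shuffle step needs to bring values with the same key together.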
Fig. 11
Presto Architecture

on Hadoop platform like applications of distributed and accessible machine learning algorithms.

3.7 Avro

Avro supports serialization and data encoding, and improves the structure of data by identifying data types, meaning and schema. It provides serialization and version control features. The Avro configuration is illustrated in Fig. 12.
Fig. 12
Avro Architecture
How can big data analytics improve healthcare? The simple answer is that analyzing big data can help healthcare stakeholders deliver efficient procedures and gain insights into patients and their health. Numerous benefits can be obtained with big data analytics.
The main sources of healthcare data are: EHR (Electronic Health Records), LIMS (Laboratory Information Management System), pharmacy, MDI (monitoring and diagnostic instruments), finance (insurance claims and billing) and hospital resources. With the advancement of data acquisition devices and analytics techniques, data sources are being enriched with newer forms of data; for example, hospitals have started to collect genetic information in EHRs as well. Within this vast variety of patient data lie valuable insights for both patients and organizations, which, when applied judiciously, can bring excellent results. Potential benefits include advanced patient care:
Quality of Care:
EHRs help in assembling demographic and medical data such as clinical data, lab tests, diagnoses and medical conditions. Discovering associations and patterns within this data helps healthcare practitioners to provide quality care, save lives and lower costs.
Disease Prevention:
Spending more on health does not guarantee health system efficiency. Investment in prevention can help to reduce cost as well as improve health quality and efficiency. Health systems face considerable challenges in promoting and protecting health at a time when the burden on finances and resources is substantial in many countries. The early detection and prevention of disease play a very important role in reducing deaths as well as healthcare costs. Thus, the core questions are: How can we reduce the level of ill health in the population? And how can we prevent disease from occurring based on a patient's early symptoms?
Efficiency:
Managing healthcare data with traditional analytical tools is nearly impossible owing to the diversity and volume of the data. Healthcare stakeholders use big data as part of their business intelligence strategy to examine historical patient admission rates and to analyze staff efficiency.
Disease Cure:
Healthcare practice has largely been reactive: the patient has to wait until the onset of disease, after which treatment is prescribed that hopefully leads to a cure. However, no two persons in the world have the same genetic sequence. Furthermore, the environmental factors associated with the onset of a disease are often not known, which is why a particular medication seems to work for some people but not for others. Since there are millions of things to be considered in a single genome, it is almost impossible to study them comprehensively. Big data in healthcare, on the other hand, has been revolutionizing the field of genomic medicine. Big data analytics can extract hidden patterns, unknown correlations and insights by exploring large datasets. Scientists are banking on big data to discover the cure for cancer.
Cost:
Healthcare costs can be cut by analyzing big data; for example, predictive analytics can help to detect disease at an early stage. Moreover, big data also helps to reduce medication errors, improve economic and administrative performance, and reduce re-admissions. For example, patient groups affected by a disease and treated with different drug regimens can be compared to determine which treatment plans work best for the same or similar diseases, which results in savings of resources and money.
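The regimen comparison mentioned above can be sketched as a simple grouped calculation. The outcome records, the regimen labels and the function name `recovery_rates` are hypothetical, and the sketch deliberately ignores confounding factors, sample size and statistical significance, all of which a real comparison would have to address.

```python
from collections import defaultdict

# Hypothetical outcomes: (drug regimen, recovered within follow-up period?).
outcomes = [("A", True), ("A", True), ("A", False), ("A", True),
            ("B", True), ("B", False), ("B", False), ("B", False)]

def recovery_rates(rows):
    """Return the fraction of recovered patients for each regimen."""
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for regimen, did_recover in rows:
        totals[regimen] += 1
        recovered[regimen] += int(did_recover)
    return {regimen: recovered[regimen] / totals[regimen] for regimen in totals}

rates = recovery_rates(outcomes)
# The regimen with the highest observed recovery rate in this toy sample.
best = max(rates, key=rates.get)
```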
Finding Disease Cures:
A particular medication seems to work for some people but not for others, and there are numerous things to be discovered in a single genome. It is not feasible to examine all of them in detail. However, big data can help to uncover unknown correlations, hidden patterns and insights by analyzing large datasets. By applying machine learning, big data analytics can examine human genomes and help find the right treatment or drugs for cancer.
The review methodology is the systematic process of finding the relevant literature from different sources. The main objectives of the review methodology are:
– To present the definitions and concepts of big data in healthcare.
– To explore the five sub-disciplines (i.e., medical image processing and imaging informatics, bioinformatics, clinical informatics, public health informatics, medical signal analytics [208,215]) that are directly or indirectly involved in healthcare and biomedicine.
– To illustrate the repositories and complex datasets of the five sub-disciplines.
– To determine the big data analytical architectures and techniques in healthcare.
– To discuss the potential advantages and applications of big data in healthcare.
– To present the open challenges and research issues of big data in healthcare and the strategies for tackling the challenges faced in the domain.
The main steps of the review methodology are information sources, selection criteria, and search and selection procedure.
Information Sources: The first step in the systematic process of the review methodology is to collect the relevant articles. To search for relevant articles we used Google Scholar. We scanned the references to present a thorough review.
Selection Criteria: In the second step, we selected the literature on the basis of the following inclusion-exclusion criteria:
– Studies based on articles and reviews
– Studies written in the English language
– Studies related to big data analytics in healthcare
– Studies published from 2000 to 2019
Search and Selection Procedure: In the third step, we searched the information sources for studies containing the keywords big data, big data analytics, healthcare, biomedical and healthcare analytics. Because, as mentioned earlier, our goal is to expand the research in healthcare across five sub-disciplines, we used the additional keywords medical, medical image processing, imaging informatics, bioinformatics, clinical informatics, public health informatics and medical signal analytics. The initial search returned 47,130 papers; we scrutinized the titles, keywords and abstracts and excluded 28,280 papers. We then screened the remaining papers by full-text reading and excluded 18,020 papers that were irrelevant to the big data or healthcare domain. We ended with 830 papers, which are included in this review paper.
The schematic process of the review methodology is presented with abstract symbols in Fig. 13.
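The screening counts reported above are internally consistent, which can be checked with a few lines of arithmetic. The variable names are illustrative only; the numbers are those given in the text.

```python
# Counts reported for the search and selection procedure.
found = 47_130
excluded_on_title_keywords_abstract = 28_280
excluded_on_full_text = 18_020

# Papers remaining after the title, keyword and abstract screening.
after_first_screen = found - excluded_on_title_keywords_abstract

# Papers remaining after full-text screening, i.e. included in the review.
included = after_first_screen - excluded_on_full_text
```

The first screening leaves 18,850 papers, and the full-text screening reduces these to the 830 papers included in the review.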
Health professionals, just like business entrepreneurs, are capable of collecting massive amounts of data and look for the best strategies to use these numbers to reduce treatment costs, predict outbreaks of epidemics, avoid preventable diseases and improve the quality of life in general.

Different domains of healthcare and medical care have been proposed in the literature. A general overview, analysis and examples of big data in healthcare analytics were presented in the studies of Raghupathi [204] and Ward et al. [272]. The meaning of big data in healthcare was presented in the literature reviews of Baro et al. [17] and Wamba et al. [261]. In 2017, Zhang and Li [290] presented a literature review of specialized healthcare and HIV self-management. Jacofsky [132] discussed the pitfalls, for physicians, of analytics drawn from metadata sets in healthcare. Another case study of healthcare analytics was presented in 2018 by Wang et al. [269], covering the IT-enabled procedures, advantages and capabilities of big data analytics. Galetsi and Katsaliaki [84] reviewed articles on big data analytical techniques for healthcare from 2000 to 2016.
Fig. 13
Schematic Process of Review Methodology
In this review, we will discuss five sub-disciplines(i.e., medical image processing and imaging informatics,bioinformatics, clinical informatics, public health infor-matics, medical signal analytics) that directly or indi-rectly involve in healthcare and bio-medical. As men-tioned earlier, we will cover the literature from 2000-2019 that will provide the comprehensive evaluation ofbig data techniques in healthcare domains. The litera- ture review of five sub-disciplines of healthcare are ex-plained comprehensively in the following subsections.6.1 Medical Image Processing and Imaging InformaticsMedical image processing and imaging informatics arethe main applications that play a vital role in health-care and bio-medical. One of acceptable use of med-ical imaging is to detect diseases like tumors detec-tion of brain and lungs, artery stenosis detection, or-gan delineation detection, aneurysm detection and thediagnosis of spinal deformity and so on. Image process-ing and machine learning techniques were deployed inthese applications for the accurate and effective use ofcomputer-aided medical diagnostics and decision mak-ing. In complex healthcare and bio-medical, informa-tion is generated, managed, analyzed, exchanged, andrepresented imaging information using imaging infor-matics [243,209,184].After the brief introduction, we will elaborate therelated work of medical imaging and informatics, tech-niques and applications deployed in big data healthcare.Medical imaging is used in image acquisition. Mag-netic Resonance Imaging (MRI), Computed Tomogra-phy (CT), photo-acoustic and ultrasound images areused for single dimensional medical data like visualiz-ing the structure of blood vessels [90,209,216]. How-ever for multidimensional medical data like 3d ultra-sound, functional MRI (fMRI), Positron-emission to-mography (PET) etc. 
are used, as shown in Fig. 14. Publicly available medical image repositories contain patient images of different sizes and modalities, as listed in Table 1.

Fig. 14 Popular Image Modalities in Healthcare, such as CT, MRI, and PET images

Shackelford [239] used fMRI images and single nucleotide polymorphisms (SNPs) to classify schizophrenia patients and healthy subjects, achieving 87% classification accuracy with a hybrid machine learning method. Chen et al. [47] introduced a computer-aided decision support system for the treatment of patients with traumatic brain injury (TBI). They predicted the intracranial pressure (ICP) level from CT scan images, combining features extracted from the CT scans with medical records and patient demographics. They achieved 70.3% accuracy, 65.2% sensitivity and 73.7% specificity.

Yao et al. [282] introduced a Hadoop-based system for the retrieval of medical images. They applied the local binary pattern algorithm and the Brushlet transform for feature extraction from medical images, and used MapReduce to store the extracted features in HDFS. They reported a highest precision of 95.04% and recall of 92.21% on brain CT images, and concluded that the retrieval efficiency of medical images improved while the retrieval time decreased.

Jai-Andaloussi et al. [133] employed MapReduce for computation and HDFS for storage in content-based image retrieval systems. They used a mammography image database and applied the Bi-dimensional Empirical Mode Decomposition with Generalized Gaussian Density functions (BEMD-GGD) method and the Bi-dimensional Empirical Mode Decomposition with Huang-Hilbert Transform (BEMD-HHT) method. They used the Kullback-Leibler divergence (KLD) and the Euclidean distance as similarity measures. Their promising results support the hypothesis that the MapReduce technique can be effectively employed for content-based medical image retrieval.

Dilsizian and Siegel [68] worked on cardiac imaging and medical data by integrating several techniques such as data mining, AI, and parallel computing. Their system uses AI and big data for the diagnostic imaging of 55 participating sites from the group on the formation of optimal cardiovascular utilization strategies.
The reported result decreased from 10% to 5% in that case.

Istephan et al. [131] conducted a feasibility study in the epilepsy domain. They used distributed computation on Hadoop clusters. Their framework handles both structured and unstructured medical data.

Table 1 Medical Image Repositories
Databases | Images | Patients | Data Size | Modalities | Applications
Image CLEF Database | | | | |
 | 145 | 41 | 36 GB | MRI | 3D MS lesion segmentation techniques development and comparison
ADNI Database | | | | |
Repository URLs:
http://marathon.csee.usf.edu/Mammography/Database.html
https://public.cancerimagingarchive.net/ncia/dataBasketDisplay.jsf
https://eddie.via.cornell.edu/crpf.html
http://adni.loni.ucla.edu/data-samples/acscess-data/

6.2 Bioinformatics

Bioinformatics is a scientific discipline that deals with mathematical, computational and IT-based methods, techniques, algorithms and software tools for capturing, storing, analyzing, compiling, simulating and modeling life science and biological data. The role of big data in bioinformatics is to provide efficient data manipulation tools for investigation in order to analyze the biological information of patients. Hadoop and MapReduce are currently used extensively for bioinformatics analytics.

Basically, bioinformatics is the combination of biology and computer science [192]. A biological analysis system analyzes variations at the molecular level. Bioinformatics involves a variety of data types, such as genomics (gene sequencing), RNA, DNA, proteomics (protein sequencing), gene ontology, protein-protein interaction, pathway data, disease-gene association networks and human disease networks, as shown in Fig. 15. With the current trend toward personalized care, there is an increasing demand to analyze massive volumes of personalized patient data in a manageable time frame.

Fig. 15 Bioinformatics Types
The size of bioinformatics data is increasing exponentially day by day. For example, the sequence data of a single human genome can reach almost 200 GB. The database maintained by the European Bioinformatics Institute (EBI) doubles in volume every year [141].
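The storage figures above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a genome of roughly 3 billion base pairs, one byte per base for plain text, two bits per base for a packed encoding, and a hypothetical sequencing depth of 30x with one quality byte stored per base. None of these parameters, and none of the function names, are taken from the cited studies.

```python
# Back-of-the-envelope storage estimates for one human genome.
# All constants are illustrative assumptions, not values from the cited works.

GENOME_BASES = 3_000_000_000   # roughly 3 billion base pairs
GB = 10**9                     # decimal gigabyte


def plain_text_size_gb(bases: int = GENOME_BASES) -> float:
    """Size when each base is stored as one ASCII character (one byte)."""
    return bases / GB


def packed_size_gb(bases: int = GENOME_BASES) -> float:
    """Size at two bits per base, enough to tell four bases apart."""
    return bases * 2 / 8 / GB


def raw_reads_size_gb(bases: int = GENOME_BASES, depth: int = 30) -> float:
    """Size of raw reads at a given depth: one base byte plus one quality byte."""
    return bases * depth * 2 / GB


def volume_after(years: int, start: float = 1.0) -> float:
    """Relative archive volume if it doubles once per year."""
    return start * 2 ** years


if __name__ == "__main__":
    print(plain_text_size_gb())   # 3.0
    print(packed_size_gb())       # 0.75
    print(raw_reads_size_gb())    # 180.0
    print(volume_after(10))       # 1024.0
```

Under these assumptions the sequence itself needs only a few gigabytes, while raw reads with quality scores land in the same order of magnitude as the 200 GB figure quoted above. Yearly doubling multiplies an archive roughly a thousandfold in ten years.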
Genomics, or genome sequencing, data is currently regarded as a big data problem in bioinformatics because the human genome consists of 30,000 to 35,000 genes [154,72]. Genomics data usually relates to gene sequencing, DNA sequencing, genotyping, gene expression, etc. [42,203]. The human genome is made of DNA comprising about 3 billion pairs of four building blocks, or bases, known as adenine, thymine, cytosine and guanine. A single genome has a size of about 3 GB. Genome analysis employing microarrays has been valuable in examining traits across a population and has contributed widely to the treatment of several complex diseases, such as bipolar disorder, hypertension, rheumatoid arthritis, diabetes, macular degeneration, coronary heart disease and Crohn's disease [147]. This genomic information is therefore moving towards big data analytics.

Table 2 Bioinformatics Databases
Database | Database Type | Size | Description
European Molecular Biology Laboratory (EMBL) [139] | DNA Sequences | 185000 organisms | EMBL is part of an international alliance with DDBJ (Japan) and GenBank (USA). It is used to analyze a collection of nucleotide sequences and annotations from publicly available sources.
Genetic Sequence Data Bank (GenBank) [27,283] | DNA Sequence | 15000 DNA and RNA sequence entries | This database contains nucleotide sequences that provide information based on the functional and physical contexts of the sequences.
DDBJ [250] | DNA Sequences | 1880115 entries and 1134086245 bases | This dataset is known as All-round Retrieval for Sequence and Annotation; it enables users to search keywords from the Nucleotide Sequence Database Collaboration.
The GDB Human Genome [160] | Genomics Database | | Public database of human genes, clones, STSs, polymorphisms and maps.
SWISS-PROT [32,31] | Protein Sequences | 557012 sequence entries, comprising 199714119 amino acids | It contains information on protein variety, function and associated disorders.
UniProtKB / TrEMBL | Protein Sequences | | Computer-annotated protein sequence database. It contains translations of the coding sequences present in EMBL/GenBank/DDBJ.
PROSITE [128] | Protein Sequences | 1329 patterns and 552 profile entries | This database contains biologically meaningful signatures described as patterns or profiles.
PDB [148] | Protein Structure | 32500 structures | This repository provides online reports, summaries, tools and information related to structural genomics initiatives.
BioWarehouse [159] | Comprehensive Database | | This detailed repository integrates a set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology.

In bioinformatics, protein sequencing and protein-protein interaction are sophisticated problems in functional genomics. This is due to the enormous number of features in the feature vector, which not only makes the analysis costly and complex but also reduces accuracy. The feature selection problem in big data was addressed by the method of Bagyamathi et al. [13], who applied an improved harmony search algorithm to improve accuracy and feature selection. Likewise, another feature selection methodology was introduced by Barbu et al. [16], who reduced the dimensionality of instances using an annealing technique for big data learning. Similarly, the adaptiveness or behavior of big data can be predicted by an incremental learning approach. For this purpose, Zeng et al. [288] implemented an incremental feature selection method called FRSA-IFS-HIS.
They applied fuzzy rough set theory to hybrid information systems and reported better performance in big data feature selection.

Once features have been extracted and selected, the next step is classification or clustering. Classification is the supervised learning procedure of finding a model that describes and discriminates data classes or concepts. The model, trained on labeled instances, is used to predict the class labels of test instances. Among the numerous models described in the literature, linear and non-linear density-based classifiers, neural networks, decision trees, support vector machines (SVMs), Naive Bayes, and K-nearest neighbour (KNN) are the most often used [177,73,8,180]. In big data analytics, advanced models for parallel and distributed learning have been reported in the literature, such as neural network approaches, divide-and-conquer SVM [122], and the Multi-hyperplane Machine (MM) classification model [70].

Giveki et al. [92] performed automatic detection of diabetes using a weighted SVM based on mutual information and a modified cuckoo search. They conducted experiments on diabetes datasets with features selected by PCA. Haller et al. [107] classified Parkinson's patients by employing SVM. They pre-processed DTI fractional anisotropy data, selected the most discriminative voxels as features, and then classified them using SVM. Son et al. [247] predicted heart failure patients by deploying SVM. Likewise, Bhatia et al. [25] classified heart disease using SVM, selecting an optimal feature subset with an integer-coded genetic algorithm.

Big data classification and regression can be performed effectively using advanced decision trees. In bioinformatics, Ye et al. [284] implemented Gradient Boosted Decision Trees (GBDT) to distribute and parallelize big data. Calaway et al. [37] estimated the efficiency of decision trees on big data by employing rxDTree. Hall et al.
[106] modified decision tree learning by generating rules for large training datasets.

Clustering is unsupervised learning that analyzes data objects without labeled responses. To handle big data, the CLARA [142], CLARANS [189], DBSCAN [79], DENCLUE [119], CURE [101], k-mode and k-prototype [127], PDBSCAN [279], and IGDCA [46] methods have been used in the literature. The literature describes several bioinformatics repositories [150], which are listed in Table 2.

In addition, several techniques and tools have been employed in bioinformatics for specific tasks. One type of bioinformatics analysis is microarray data analysis, for which caCORRECT [249] and omniBiomarker [201] were used. For gene-gene network analysis, FastGCN [162], the UCLA Gene Expression Tool (UGET) [57], and WGCNA [156] were used for tasks such as finding disease-associated genes and GPU-based parallelism. Several tools have been proposed for protein-protein interaction (PPI) analysis, which is a complex and time-consuming process: NeMo [219], MCODE [12], ClusterONE [187], and PathBLAST [144]. For pathway analysis, GO-Elite [287], PathVisio [130], directPA [281], Pathway Processor [99], Pathway-PDT [198] and Pathview [166] have been employed.

In protein-protein interaction and protein sequence analysis, sequencing data is mapped to specific genomes for the analysis of various tasks such as genotype and expression variation. Because sequencing machines produce millions of DNA reads, matching them against genomes is a major task. There are several techniques for matching DNA sequences against a reference genome. CloudBurst [233] is a parallel computing model for this matching. Evaluated on a 24-core cluster, it was about 24 times faster than a single-core system.
It is capable of mapping 7 million short reads, which improves the scalability of reading huge volumes of sequencing data. Building on CloudBurst, Contrail [232] was developed to assemble large genomes, and Crossbow [104] was developed to identify single nucleotide polymorphisms (SNPs).

Hydra [161] is a proteomic search engine based on the Hadoop distributed framework. It is a distributed computing environment that processes large peptide and spectra databases to support searching immense volumes of spectrometry data. It performs 27 billion peptide scorings on a 43-node Hadoop cluster in approximately 40 minutes. Another query engine for bioinformatics and genomics researchers is SeqWare [53], built on Apache HBase [89]. SeqWare has an interactive interface with genome browsers and tools. It was loaded with the U87MG and 1102GBM tumor databases, which were used for comparison with other prototypes.

Certain tools are used for error identification in sequencing data. SAMQA [220] is an error identification tool that provides scalable quality standards for large-scale genomic data. ART [126] can identify three types of errors in sequencing data: base insertion, deletion and substitution. CloudRS [41] is a parallel algorithm for error correction based on the RS algorithm [94]. For sequencing and genomic analysis, several frameworks and toolkits have been developed. CloVR [7,75] is a distributed virtual machine package for sequencing analysis that supports both local and cloud systems. Another virtual machine tool is CloudBioLinux [149], which provides 135 bioinformatics packages for analysis. The Genome Analysis Toolkit (GATK) [174,9] analyzes large sequence and genomic data. It is based on a MapReduce programming framework and has been used in the 1000 Genomes Project. BlueSNP
BlueSNP[125] analyzed 1,000 phenotypes and find associationbased on R package and Hadoop platform.6.3 Clinical InformaticsThe clinical laboratory is a major source of data relatedto patients’ diseases and health issue. There is approxi-mately 80% unstructured data like clinical documents,radiology, pathology, patient discharge summaries, di-agnostic testing reports, X-ray and radio-logical imagesand transcribed notes etc. as shown in Fig 16. Clini-cal informatics is the study of Information Technology(IT) and healthcare for organizing the patient’s clinicaldata and laboratory test, reports etc. into structuredand computerized form to increase data retrieval andextraction efficiently that will assist in evaluations andreports effectively. It divulge the development of elec-tronic health informatics systems for improvement ofcare and management of patients and sharing of datain seconds using computer and internet. Increasinglylaboratory data is being integrated with other dataof patient in order to improve the diagnostic processefficiency, and increase its meaningful use to improvepatient outcomes. IT-based systems replace the man-ual data entry in records, reports, documents; also savetime and cost associated with records, hospital data andreports on daily bases, like billing and schedules of pa-tients [1]. However, clinical informatics is currently notpracticed in small clinics, hospitals, laboratories in ruraland county side areas due to implementation of clinicalinformatics technology [26]. For boosting the implemen-tation the Electronic Care Records (EHR) system as aclinical informatics in the whole government hospitalsin USA, HITEC [28] made some interesting incentivesfor the medical organizations, hospital and clinics. 
Under these incentives, doctors and physicians are expected to use EHR systems for patient data, which they can share with others, provide to patients online, and access from anywhere.

Fig. 16 Unstructured Clinical Informatics

In big data analytics, the first step is to store and manage data in a structured form. Clinical data is stored to record information about patients, hospitals and other relevant structured and unstructured records. It can then be used to make clinical decisions, assess patients and make treatment plans. Data warehouses and relational databases are the traditional, structured methods of storing and retrieving data. However, before clinical data integrated from multiple sources can be used, it must first be transformed and classified [15,116]. A detailed systematic review covering work up to 2011 was published in [35]. Here we present further related work. Dutta et al. [74] stored EEG data in data warehouses using Hadoop and HBase. Jin et al. [135] analyzed and stored distributed EHR data using big data tools such as Hadoop HDFS and HBase. Similarly, Nguyen et al. [190] stored clinical signal data using HBase. Jayapandian et al. [134] and Sahoo et al. [227] developed a system named Cloudwave for storing and querying voluminous EEG clinical data. Mazurek [171] stored unstructured data in Not Only Structured Query Language (NoSQL) repositories to provide fast processing speed and data mining capabilities. For this purpose, relational and multidimensional technologies were combined with NoSQL.

Clinical data is often retrieved and shared interactively for data integration and knowledge sharing, so cloud computing is usually considered for this purpose. Bahga and Madisetti [14] proposed a cloud-based system for interoperable EHRs. Chen et al. [44] discussed the present and future of translational informatics using cloud computing. For multi-site clinical trials, the interactions of researchers were enhanced by the conceptual software architecture developed by Sharp [241] using a cloud approach. Clinical data is analyzed to predict disease, risk, diagnosis, and progression. The literature describes many data analysis strategies for prediction from clinical records.
One such predictive modeling platform is PARAMO, designed by Ng et al. [188] for analyzing EHR data and for generating and reusing clinical data using a Hadoop cluster. They analyzed EHR data sets ranging from 5,000 to 300,000 patients and reported promising, time-efficient results. Chawla and Davis [40] formulated a patient-centered framework to explain big data approaches for personalized medicine. Similarly, big data for perioperative medicine was illustrated by Abbott [2]. Zolfaghar et al. [292] implemented big data techniques for predictive modeling. They conducted an experiment on patient data from the National Inpatient Dataset and the MultiCare Health System for congestive heart failure, reporting a maximum accuracy of up to 77% and recall of up to 61%. Rangarajan et al. [206] proposed a data lake architecture that used HDFS for data storage. Patients with similar health conditions were clustered using K-means, and a successful recommendation was found for each cluster by deploying SVM. Wang and Hajli [267] examined 109 case descriptions from 63 healthcare organizations. They modeled big data analytics for business transformation using resource-based theory (RBT) and the capability building view. Case occurrences, together with pairwise connections, constructs and path-to-value chains, were used to determine business value.

6.4 Public Health Informatics

Informatics is an "applied information science". It synthesizes the practices and theories of information technology, computer science, management sciences and behavioral sciences into concepts, tools and methods for implementing information systems in public health. Informatics is used to transform raw data into information effectively according to users' requirements. Healthcare informatics research is a scientific endeavor that improves both the performance of health service organizations and patient care outcomes, as shown in Fig. 17. Public healthcare is determined through
epidemiology. Epidemiology is the study of how frequently diseases arise in different groups of people and why. Epidemiological information is used to formulate and evaluate strategies to prevent illness. This information also serves as a guide to the management of patients in whom disease has already developed. Traditionally, epidemiology has been based on data collected by public health agencies through health personnel in hospitals, doctors' offices, and out in the field.

Fig. 17 Healthcare Informatics Researches

Table 3 Clinical Informatics Databases
Database | Database Type | Description
Texas Inpatient Public Use Data File (PUDF) | Structured EHR | This dataset contains records of patients, hospitals, admission type/source, claims, admit day and discharge details. In 2017, the dataset contained 699 hospitals, 776,554 base data records and 12,486,488 charges data records in the first quarter. In the second quarter there were 694 hospitals, 761,921 base data records and 11,985,920 charges data records.
Multi-parameter Intelligent Monitoring in Intensive Care II (MIMIC-II) Clinical Database [226] | Structured EHR | This dataset encompasses detailed clinical data, including physiological waveforms and minute-by-minute record subsets. It contains 32,536 subjects with 40,426 ICU admissions and 25,328 intensive care unit stays.
Patient Discharge Data By Admission Type | Unstructured | The dataset contains information on inpatient discharges by type of admission for each California hospital for the years 2009-2015, comprising 9,322 entries.
Framingham Heart Study Database | Structured EHR | It is a genetic dataset for cardiovascular diseases. It includes 5,209 men and women aged between 30 and 62 years. Since 1948, participants have been assessed every 2 years.
Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs) | Unstructured | CSV files that contain non-identifiable claim-specific information and are within the public domain.
Nationwide Inpatient Sample | Structured EHR | This dataset contains discharge information including diagnosis, procedures, status, demographics, cost and length of stay. It comprises 1051 hospitals in 45 states.
i2b2 Informatics for Integrating Biology & the Bedside | Unstructured Clinical Data | Clinical notes used for clinical NLP challenges such as de-identification, smoking, obesity, medication, relations and co-reference challenges.
Repository URLs:
https://data.chhs.ca.gov/dataset/patient-discharge-data-by-admission-type/resource/460bd2e8-3b0e-4a41-b2a6-1044f7c82178
https://epi.grants.cancer.gov/pharm/pharmacoepi.html

The healthcare system is usually the first line of response to clinical events, whether of greater or lesser severity. Informatics is used to identify sentinel events, and the resulting analysis can avert potentially devastating effects. An example of such a response is the war on cancer, in which, from 1973, programmers at the National Institutes of Health fed registry data into the information system known as the Surveillance, Epidemiology and End Results (SEER) system. This system provides information to public health planners and epidemiologists for analyzing the distribution of cancer throughout the population [109]. After many years of monitoring and evaluation, age-adjusted cancer mortality rates have been dropping steadily since the early 1990s, with important progress in areas including lung cancer, reflecting the success of public health efforts aimed at controlling precipitants of the disease [118].

Another example of this capacity can be seen in the response to the 2001 bioterrorism attacks.
During September 2001, anthrax spores were traced to postal facilities in Trenton, New Jersey and Brentwood, Washington, D.C. Epidemiologists faced a daunting task: the New Jersey facility covered 281,387 square feet, was staffed by 250 employees per shift, and processed over 2 million items of mail per day [294]. Informatics helped to identify those who could have been exposed to anthrax, monitored the screening system, and recorded who received antibiotics as well as the distribution of recognized cases and known deaths. Further analytical strategies and significant healthcare research are described in [262,217].

In recent years, innovative data sources have been introduced that collect data within seconds directly from individuals using electronic devices. Social media has changed the life of society and made the world a global one. An exponentially growing amount of data is produced daily. Public health (PH) information of this kind can generally be characterized as big data. Public healthcare data is collected, analyzed, assured and accessed so that big data analytics techniques can be deployed to extract hidden informative patterns. Public or social media information is further used to predict, monitor and diagnose diseases; that is, the efficient and effective use of PH data determines the extent to which societal health concerns can be identified. The literature includes several survey papers based on data mining [115,138], deep learning [207,176] and other techniques [10]. Here we present some public healthcare work using social media. The datasets for the public healthcare data corpus are listed in Table 4.

Young et al. [285] gathered 553,186,016 tweets from Twitter. They extracted more than 9,800 keywords and geographic annotations containing HIV risk words. They showed that social media can be used to monitor global HIV occurrence and concluded that a positive correlation, reported as greater than 0.01, was found between HIV-related tweets and HIV cases.

Hay et al.
[111] facilitated public health surveillance using online social media combined with epidemiological information. They developed an atlas for real-time disease monitoring.

Nambisan et al. [181] detected depression from social media messages and tweets, using big data analytic tools to extract hidden valuable patterns for detecting mental disorders. They concluded that behavioral and emotional patterns in messages revealed symptoms of depression.

Tsugawa et al. [256] implemented multiple regression models to detect depressive tendencies. They extracted word frequencies from messages on Twitter, a popular micro-blogging service, to detect depression and achieved a correlation of approximately 0.5. Park et al. [196] analyzed depression in 60 participants from their activities on Twitter, using the sentiment words of depressed users. Another contribution by the same authors detected the symptoms of depressive users through Facebook [197]. Choudhury et al. [61,59] developed a large dataset of Twitter posts using a crowdsourcing methodology. They implemented a probabilistic model to indicate the level of depression from social media. In [58], they quantified postpartum changes and depression in 376 mothers from Twitter posts. Similarly, in [60] the same authors detected and predicted the onset of postpartum depression in 165 mothers from data shared on Facebook. Sadilek et al. [225] predicted infectious diseases through the social network. They used 1000 Twitter messages related to healthcare and applied statistical models to geo-tagged Twitter postings to predict infectious diseases such as flu. Digital media is widely used to improve healthcare monitoring and its effectiveness. Ginsberg et al. [91] used trend models and Google search queries to detect influenza and flu-like illnesses. One of the earliest comprehensive review papers on public healthcare informatics using social media was presented by Hagg et al.
[105].

6.5 Medical Signal Analytics

Technology is now advancing rapidly and brings effectiveness to every walk of life, especially healthcare. Currently, healthcare systems use a variety of continuous monitoring devices that generate signals. Physiological signal monitoring devices and telemetry devices are pervasive [19] because they improve healthcare management and patient care [30,123]. These devices use discretized or physiological waveform data and trigger alert mechanisms in case of an overt event. Certain issues push medical signals towards big data. The most notable obstacle is the volume and velocity of the continuous, high-resolution data from the multitude of monitors connected to each patient. The resulting alarm systems are unreliable and cause alarm fatigue for both caregivers and patients [71,98]. The primary failure of these systems is due to their reliance on single sources of information.

The first step in streaming data analytics in healthcare is signal acquisition. Streaming signals from continuous acquisition devices are rarely stored. However, accessing live streaming data from devices is one of the foremost tasks for big data analytics applications. Because streaming data collection poses many challenges to healthcare systems, such as network bandwidth, scalability, and cost [173], research communities are developing continuous monitoring technologies [5] to capture live monitor signals. The next step is to store the signal data from monitoring devices using big data analytics tools such as HDFS, MapReduce, and MongoDB [4,143]. Medical data, including signals, is complex because of interconnected and interdependent data from several sources. Thus, data integration and aggregation techniques are deployed for effective performance [229,23]. The generalized workflow of streaming healthcare is depicted in Fig. 18.
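The weakness of alarms that rely on a single source of information can be illustrated with a small streaming sketch. The code below is a minimal illustration under stated assumptions, not a method from the cited works: the `VitalSample` and `FusedAlarm` names, the window length, the thresholds and the two-of-three voting rule are all hypothetical, and the thresholds are not clinical guidance.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean


@dataclass
class VitalSample:
    heart_rate: float   # beats per minute
    spo2: float         # oxygen saturation in percent
    resp_rate: float    # breaths per minute


class FusedAlarm:
    """Sliding-window alarm that requires agreement between several signals."""

    def __init__(self, window: int = 5, min_votes: int = 2):
        self.window = deque(maxlen=window)   # keeps only the newest samples
        self.min_votes = min_votes

    def update(self, sample: VitalSample) -> bool:
        """Add one sample and return True if the fused alarm fires."""
        self.window.append(sample)
        if len(self.window) < self.window.maxlen:
            return False                     # not enough data yet
        # Average each signal over the window to damp momentary artifacts.
        hr = mean(s.heart_rate for s in self.window)
        spo2 = mean(s.spo2 for s in self.window)
        rr = mean(s.resp_rate for s in self.window)
        # Hypothetical thresholds, one vote per abnormal signal.
        votes = sum([hr > 120 or hr < 40, spo2 < 90, rr > 30 or rr < 8])
        return votes >= self.min_votes


if __name__ == "__main__":
    alarm = FusedAlarm(window=3)
    stream = [VitalSample(80, 98, 16)] * 3 + [VitalSample(140, 85, 18)] * 3
    print([alarm.update(s) for s in stream])
    # [False, False, False, False, False, True]
```

In this toy stream the alarm fires only once both heart rate and oxygen saturation stay abnormal across the whole window, so a brief artifact on one channel does not trigger it. The trade-off is a delay of up to one window length before a genuine event is reported.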
The most notable data repositories containing signal information in healthcare are listed in Table 5.

Having introduced medical signal analytics, we now present some related work on big data analytics in medical signal processing. Han et al. [108] developed a patient care management system using a scalable infrastructure. The system combined static data with continuous data from monitored ICU devices, and analyzed and mined medical data in real time.

Table 4 Public Health Databases
Database | Database Type | Size | Description
Ohio Hospital Inpatient/Outpatient Database | Public Patients Records | 35 million patient records per year | This repository contains hospital records such as the number of admissions, discharges, length of stay, transfers, and number of patients with specific codes.
Behavioral Risk Factor Surveillance System (BRFSS) [69] | Survey System | Data from 50 states, with more than 400,000 adult interviews each year | This system contains records of mental illness, smoking, alcohol, lifestyle (diet, exercise) and diseases (diabetes, cancer), etc.
Surveillance Epidemiology and End Results (SEER) Program [112] | Cancer Dataset | 7.7M cases, with more than 350,000 cases added each year | This program contains survival data from population-based cancer registries covering approximately 28% of the US population.
PatientsLikeMe Online Patient Network Database [245] | Online Patient Network Database | More than 200,000 patients; tracks 1,500 diseases | This data corpus contains information on disease-specific functional scores, symptoms, etc., through which people with the same symptoms connect with each other.
Human Mortality Database [276] | Public Mortality | 39 countries or areas | This database contains detailed information about population and mortality, along with births, deaths and population size by country.
Table 5 Medical Signal Databases
Database | Database Type | Size | Applications
The Multi-parameter Intelligent Monitoring in Intensive Care II (MIMIC-II) Clinical Database [226] | Structured EHR | 32,536 subjects with 40,426 ICU admissions and 25,328 ICU stays | It contains comprehensive clinical data, including physiological waveforms and minute-by-minute record subsets.
MIMIC-III Database [136] | Hospital Database | 38,597 patients, 49,785 hospital admissions | This data corpus contains vital signs, laboratory measurements, medications, imaging reports, observation details, fluid balance, diagnostic codes, procedure codes, length of hospital stay, survival data, etc.
The ECG-ID Database | Signals Database | 90 persons, 310 ECG recordings | Each ECG recording has 10 annotated beats, is digitized at 500 Hz with 12-bit resolution and lasts 20 seconds.
CinC Challenge 2000 data sets [95] | ECG signal based database | 583 megabytes, 70 records | This dataset contains ECG signals in 70 records, 35 used as the learning set and 35 for testing.
MIT-BIH Polysomnographic Database [95] | Physiologic Signals Database | 18 records, each with 4 files | This database is a collection of recordings of multiple physiologic signals during sleep.
EEG Motor Movement/Imagery Dataset [95] | EEG Signals Database | 109 volunteers, 1500 recordings | Two-minute, 64-channel EEG recordings made using the BCI2000 system.
American Heart Association (AHA) | ECG Signals Database | 80 recordings | ECG recordings of 80 two-channel records digitized at 250 Hz per channel with 12-bit resolution over a range of 10 mV.

Bressan et al. [34] implemented an architecture for neonatal ICUs. It used data from EEG monitors, infusion pumps and cerebral oxygenation monitors. Their proposed system provides an effective decision system for clinics. Lee and Mark [158] conducted experiments on the MIMIC II database on therapeutic intervention for hypotensive episodes.
Their system predicts the need for intensive-care intervention using blood pressure and cardiac time series data.

Sun et al. [252] also used the MIMIC II database to extract physiological waveform data along with clinical data. They selected cohorts and measured the similarity of the patients within them; this similarity supports the treatment of similar diseases and the derivation of effective decisions.

Fig. 18 Generalized Work-Flow of Streaming Healthcare

Another study on the MIMIC II database aimed to detect cardiovascular instability in patients at an early stage. For this purpose, Cao et al. [38] developed a system that combined multiple waveform data from the MIMIC II corpus.

Roux et al. [157] discussed neuro-critical care of patients’ disorders using different physiological monitoring systems. They provide a platform with guidelines for researchers by examining the potential and implications of neuro-monitoring.

Rajan et al. [205] used a multi-channel signal acquisition method to develop a physiological signal monitoring system using NI myRIO connected to a wireless network. They also used Internet of Things (IoT) techniques for better performance in healthcare.

Zhang et al. [291] recognized lung cancer using sensor-based wrist pulse signal processing with a cubic support vector machine (CSVM). They implemented an iterative slide window (ISW) algorithm for signal segmentation and extracted 26 features. Using these strategies, they achieved 78.13% accuracy. Nanda et al. [182] distinguished between essential tremor and Parkinson’s tremor using non-invasive recording techniques. They employed a neural network to classify tremor sEMG signals and achieved 91.66% accuracy.
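The segment-then-extract-features pattern that several of the signal studies above rely on can be sketched as follows. This is not the ISW algorithm of Zhang et al. [291], whose details are not given here; it is a plain fixed-size sliding window with three illustrative features (mean, standard deviation, peak-to-peak). The function names, window size, and step are assumptions made for this sketch.

```python
from statistics import mean, pstdev


def sliding_windows(signal, size, step):
    """Yield consecutive fixed-size windows; a trailing partial window is dropped."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def window_features(window):
    """Return (mean, population standard deviation, peak-to-peak) for one window."""
    return (mean(window), pstdev(window), max(window) - min(window))


def extract_features(signal, size=4, step=2):
    """Segment a 1-D signal and compute one feature tuple per window."""
    return [window_features(w) for w in sliding_windows(signal, size, step)]


if __name__ == "__main__":
    pulse = [0, 1, 4, 1, 0, 1, 4, 1, 0, 1]  # toy samples, not real pulse data
    for row in extract_features(pulse):
        print(row)
```

In a full pipeline, the resulting feature tuples would be passed to a classifier such as an SVM or a neural network; that step is omitted here.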
This survey presents the emerging landscape of big data and analytical techniques in five sub-disciplines of healthcare. We present the domains in which big data technology has played a significant role in the modern-day healthcare revolution, as it has totally changed people’s perception of healthcare activities. Big data analytical techniques deployed in five sub-disciplines, namely medical image processing and imaging informatics, bioinformatics, clinical informatics, public health informatics, and medical signal analytics, are explained comprehensively, drawing an integrated picture of how distinct healthcare activities are accomplished in a pipeline to serve individual patients from multiple perspectives. The existing reviews neither explain multiple sub-disciplines of healthcare in detail nor evaluate the studies comprehensively.

The existing studies discussed the different sources of big data in healthcare, such as pharmaceutical firms, healthcare providers, diagnostic companies, laboratories, not-for-profit organizations, insurance companies, and web-health portals [222,178,254,272,124,33]. The big data techniques used to analyze healthcare data are machine learning, data mining, cluster analysis, pattern recognition, neural networks, deep learning, and spatial analysis. Most of the studies processed patient data using Hadoop and its tools, which are batch processing tools [265,129,264,175,200]. Some studies used newer tools such as Spark, Storm, and GraphLab to process real-time and streaming data [200]. Most of the studies discussed applications of big data analytics in different fields of healthcare, such as personalized medicine, clinical decision support, clinical operations optimization, and cost effectiveness of healthcare. These studies show that healthcare analytics improves the quality of care and the early identification of patients.
There is research related to diabetes, gynecology, oncology, cardiovascular diseases, and so on that helps save time and cost [269,167,277,260,88,76].

With the rapid increase of publications in the biomedical and healthcare industry, we have conducted a detailed review of healthcare analytics in five sub-disciplines. We summarize the studies of each discipline in Table 6, covering image visualization, image classification, image retrieval, data and workflow sharing, data analysis, feature selection, bioinformatics classification and clustering, micro-array data analysis, protein-protein interaction, pathway analysis, protein sequencing, query and search engine, error identification of sequencing data, storage and retrieval of EHR, treatment recommendation, business transformation, disease prediction, diagnosis and progression, data security, infectious disease surveillance, population health management, mental health management, chronic disease management, signal acquisition, signal storing from monitoring devices, and signal integration and aggregation. We conclude from this survey that bioinformatics is one of the primary disciplines in which big data analytics has evolved and plays a scientific role, owing to complex and massive bioinformatics data. There are many tools, techniques, and platforms for bioinformatics used to analyze biological, genomic, protein, and gene sequencing data. However, big data applications are currently less developed in the other disciplines, namely medical imaging informatics, clinical informatics, public health informatics, and medical signal analytics.

Fig. 19 Deep Learning Architecture for Big Data Analytics
The healthcare sector produces huge amounts of patient data on a daily basis. Traditionally, most of this data was kept as hard copies, but with advances in data acquisition devices, healthcare organizations now gather data electronically. Healthcare data analytics has the potential to bring dramatic changes to the healthcare industry by smoothing processes and improving the quality of care. Data analytics researchers, healthcare providers, government agencies, and pharmaceutical companies have identified a range of ways in which big data techniques can significantly improve patient outcomes through policy making and evidence-based decisions. Below are the major areas in the healthcare sector where big data analytics has a huge impact:
Strategic Planning: ‘Management is based on early measure: you can’t manage if you can’t measure’. Healthcare is a time-critical service, and hospitals are struggling with patient flow. Machine learning and data analytics play an important role in predicting patient flow, keeping it smooth, and reducing waiting periods. Early prediction of hospital visits helps management take the necessary steps to reduce patient waiting time and thereby give timely treatment. Patient Flow Manager and Q-nomy are applications that provide a comprehensive graphical view of patient flow information, drawing on inpatient, elective, emergency, outpatient, and other hospital systems. For example, care managers can analyze check-up results among patients in different demographic groups to identify what factors discourage patients from taking up treatment.

The classical example is staff management: how many clinicians and nursing staff should be on duty at a specific time? Any shift manager faces this problem. With too many workers on staff, unnecessary labor costs add up; with too few, service suffers, and in healthcare poor service can be fatal for patients. In another example, admission trends can be predicted from the admission history of the last few years, e.g. data scientists analyzed 10 years’ worth of hospital admission records using time series analysis techniques, followed by machine learning, to predict future admission trends.
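The admission-trend idea above can be illustrated with a deliberately small forecasting sketch. It is not the method of any cited study: it assumes a linear trend plus a repeating seasonal offset, fitted with ordinary least squares in pure Python. The names `fit_trend` and `forecast`, the seasonal period, and the toy counts are all hypothetical.

```python
def fit_trend(series):
    """Least-squares slope and intercept of the series against index 0..n-1."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(series, period, horizon):
    """Forecast `horizon` future values: linear trend plus mean seasonal offset.

    Requires at least two observations and at least `period` observations.
    """
    slope, intercept = fit_trend(series)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(series)]
    seasonal = []
    for phase in range(period):
        phase_residuals = residuals[phase::period]
        seasonal.append(sum(phase_residuals) / len(phase_residuals))
    n = len(series)
    return [intercept + slope * t + seasonal[t % period]
            for t in range(n, n + horizon)]


if __name__ == "__main__":
    # Toy quarterly admission counts (invented for illustration only).
    admissions = [120, 135, 150, 128, 126, 141, 158, 133]
    print(forecast(admissions, period=4, horizon=4))
```

A production system would more likely use a dedicated time-series library and validate forecasts against held-out years; the sketch only shows the shape of the calculation.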
Table 6 Comparative analysis of the literature
Healthcare Discipline | Big Data Analytical Technique | Studies
Medical Image Processing and Imaging Informatics | Image Visualization | [90,164,140,93,286,137]
Medical Image Processing and Imaging Informatics | Image Classification | [239,145,152,216]
Medical Image Processing and Imaging Informatics | Image Retrieval | [282,133,97,280,170]
Medical Image Processing and Imaging Informatics | Data and Workflow Sharing | [20,50,221,221]
Medical Image Processing and Imaging Informatics | Data Analysis | [68,170,263]
Bioinformatics | Feature Selection | [13,16,288,293,255,165]
Bioinformatics | Classification | [92,25,284,37,107,106,293,56,66]
Bioinformatics | Clustering | [142,189,79,119,279,46,199]
Bioinformatics | Microarray Data Analysis | [249,201,162,57,156]
Bioinformatics | Protein-Protein Interaction | [219,12,187,144]
Bioinformatics | Pathway Analysis | [287,130,281,99,198,166]
Bioinformatics | Protein Sequencing | [233,232,104]
Bioinformatics | Protein Query and Search Engine | [161,53,89]
Bioinformatics | Error Identification of Sequencing Data | [220,126,41,94,7,75,149,174,9,125]
Clinical Informatics | Storage of EHR | [28,15,116,74,135,190,134,227,171,206]
Clinical Informatics | Retrieval of EHR | [96,241]
Clinical Informatics | Interactive Data Retrieval for Data Sharing | [64,14,251,14,44,113,270]
Clinical Informatics | Treatment Recommendation | [47,43,68,131]
Clinical Informatics | Business Transformation | [267,268,269,103,266,146]
Clinical Informatics | Disease Prediction, Diagnosis and Progression | [188,40,2,292]
Clinical Informatics | Data Security | [235,246,163,238]
Public Health Informatics | Infectious Disease Surveillance | [111,285,87]
Public Health Informatics | Population Health Management | [153,52,85,258,110]
Public Health Informatics | Mental Health Management | [181,236,49,179]
Public Health Informatics | Chronic Disease Management | [24,257,151]
Medical Signal Analytics | Signal Acquisition | [173,158,5,252,205]
Medical Signal Analytics | Signal Storing from Monitoring Devices | [4,143,108,34,38,157]
Medical Signal Analytics | Signal Integration and Aggregation | [229,23,291,182]

Fraud Detection: ‘Suspect, detect and protect’. Fraud, waste, and abuse cause significant cost; they range from honest mistakes that result in erroneous billings, to inefficiencies that result in wasteful diagnostic tests, to over-payments due to false claims. Personal data is extremely sensitive because of its value on black markets; thus, the healthcare industry is 200% more likely to experience data breaches than any other industry. With that in mind, effective fraud detection is very important for reducing cost and improving the quality of the healthcare system. Fraud detection in healthcare is an important yet difficult problem.
Big data has inherent security issues, and healthcare organizations are more vulnerable than they already were. Many organizations use analytics to reduce security threats by analyzing changes in network traffic or suspicious behavior that reflects a cyber-attack. The WhiteHatAI Centaur system, NICE ACTIMIZE, NHCAA, SAS, Optum, etc. are used in medical claims processing to identify and detect healthcare fraud, waste, and abuse before it happens. Likewise, data analytics can help prevent fraud and inaccurate claims in a systemic, repeatable way by streamlining the processing of insurance claims. For example, the Centers for Medicare and Medicaid Services saved over $210 .

Resource Management: ‘How you use a facility, many factors pushing and pulling’. Big data is making huge advances in reducing hospital waiting lists. Despite expensive efforts by governments and healthcare organizations, waiting times have barely changed, with the median even increasing slightly; for example, Australia has been trying hard to reduce its hospital waiting times for more than two decades. Efficient and timely resource utilization helps to overcome patient flow problems and reduces the financial burden on the organization. Data analytics continues to make inroads into managing hospital resources efficiently with respect to patient flow and risk. Examples are readmission, ambulance, and bed utilization.

A common example is 30-day patient readmission or a return visit to an emergency department. A 30-day readmission model identifies patients who have a high probability of returning to hospital within 30 days of discharge. Developing a risk prediction model helps to identify patients who would benefit from a disease management program, in an effort to reduce not only patient readmissions but also healthcare cost.
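Before any risk model for 30-day readmission can be trained, each hospital stay needs a label saying whether a readmission followed it. The sketch below shows one way that labeling step might look. The record layout `(patient_id, admit_date, discharge_date)` and the function name are assumptions for illustration, and a same-day return (gap of 0 days) is counted as a readmission by choice.

```python
from datetime import date


def flag_30_day_readmissions(stays, window_days=30):
    """Return one boolean per stay: True if the same patient was admitted
    again within `window_days` after that stay's discharge.

    `stays` is a list of (patient_id, admit_date, discharge_date) tuples.
    """
    admissions_by_patient = {}
    for idx, (pid, admit, _) in enumerate(stays):
        admissions_by_patient.setdefault(pid, []).append((admit, idx))

    flags = [False] * len(stays)
    for idx, (pid, _, discharge) in enumerate(stays):
        for admit, other_idx in admissions_by_patient[pid]:
            if other_idx == idx:
                continue  # do not compare a stay with itself
            gap = (admit - discharge).days
            if 0 <= gap <= window_days:
                flags[idx] = True
                break
    return flags


if __name__ == "__main__":
    stays = [
        ("p1", date(2020, 1, 1), date(2020, 1, 5)),
        ("p1", date(2020, 1, 20), date(2020, 1, 22)),
        ("p2", date(2020, 1, 1), date(2020, 1, 3)),
    ]
    print(flag_30_day_readmissions(stays))
```

The resulting flags would serve as the target variable for a classifier built on patient and encounter features; the choice of classifier is outside this sketch.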
Personalized Medicine: ‘Disease and its treatment is unique as we are’. The promise of personalized medicine is the shift away from one-size-fits-all medicine. Through datafication and genomic fingerprints, much more information about each patient can be analyzed without requiring multiple rounds of testing. The best treatment can be chosen on an individual basis, and at a faster rate, by using personalized data.
Genomics: ‘The more is the data you have the better you can treat’. The human body has 30,000 to 35,000 genes [72,48]. Human DNA is estimated to comprise 23 pairs of chromosomes with about 3.2 billion base pairs [78,117]. As data, this grows dramatically to about 200 gigabytes. Thus, big data analytics is required for the genomics and sequencing practices used in the treatment of complex diseases such as Crohn’s disease and age-related macular degeneration [147]. Genomic data analytics has great potential to improve healthcare outcomes, quality, and safety, as well as to save costs.
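The text does not say how the figure of about 200 gigabytes is derived. One plausible back-of-envelope reading, offered here only as an assumption, is raw sequencing output: each base is read many times (the coverage), and each read base is stored together with a quality score. The coverage of 30 and the two bytes per read base below are illustrative values, not taken from the cited studies.

```python
BASE_PAIRS = 3_200_000_000   # genome length cited in the text
COVERAGE = 30                # assumed sequencing depth (hypothetical)
BYTES_PER_READ_BASE = 2      # assumed: 1 byte base call + 1 byte quality score


def raw_sequencing_bytes(base_pairs=BASE_PAIRS, coverage=COVERAGE,
                         bytes_per_base=BYTES_PER_READ_BASE):
    """Approximate size of uncompressed raw reads under the assumptions above."""
    return base_pairs * coverage * bytes_per_base


def packed_genome_bytes(base_pairs=BASE_PAIRS):
    """Size of a single genome copy stored at 2 bits per base (A, C, G, T)."""
    return base_pairs * 2 // 8


if __name__ == "__main__":
    print(raw_sequencing_bytes() / 1e9, "GB of raw reads")
    print(packed_genome_bytes() / 1e9, "GB for one packed genome copy")
```

Under these assumptions the raw reads come to 192 GB, the same order as the figure quoted above, while a single packed copy of the sequence is under 1 GB. The gap between the two is what makes sequencing a big data problem.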
Disease Prediction and Prevention: ‘Precaution and care can help live longer’. Many healthcare organizations, research labs, and hospitals are leveraging big data analytics by changing their models of treatment delivery. Big data analytics thus has tremendous applications in the healthcare domain for reducing cost overhead, detecting and curing diseases, predicting epidemics, and enhancing the worth of human life by averting deaths. A number of projects from Google, DeepMind, IBM, Royal Free London NHS Foundation Trust, Imperial College Healthcare NHS Trust, and others have demonstrated the importance of deep learning and machine learning for detection, identification, diagnostics, and predictive analytics. DeepMind collaborated with Moorfields Eye Hospital to analyze anonymized eye scans, searching for early signs of diseases leading to blindness. There are also projects signed with the Royal Free London NHS Foundation Trust and Imperial College Healthcare NHS Trust to develop new clinical mobile apps linked to EHRs.

Big data has transformed healthcare by putting data to work, revealing clinical and operational insights. Among the most relevant IBM applications is IBM Content and Predictive Analytics. IBM Content and Predictive Analytics for healthcare is the first industry-specific analytics solution that enables organizations to analyze the past, see the present, and predict the future simultaneously. For example, admission trends can be predicted from the admission history of the last few years, e.g. by analyzing 10 years’ worth of hospital admission records with time series analysis techniques followed by machine learning. One of the major applications of big data analytics in the healthcare domain is medical image processing. Healthcare produces an enormous number of medical images, such as X-ray, CT and PET-CT images, MRI, ultrasound, fluoroscopy, and photoacoustic imaging.
These medical images constitute big data that is used for various purposes such as detection, diagnosis, assessment, and therapy decision making [218].

The heart is a vital organ of the body; if it stops working, the body cannot survive. Heart attack is one of several heart disorders. Big data analytics helps to predict heart attack at an early stage using early heart attack detection systems based on medical biosensors [55,271]. There are online systems [6] and healthcare information systems [193] that provide guidance about heart diseases using IoT and Hadoop techniques.

The brain is the vital organ that controls all activities of the body, much like the CPU of a computer. Data mining and data analytics tools are deployed to detect brain disorders, e.g. for Parkinson’s disease prediction [240,216]. Diabetes is one of the most common diseases in the world. Big data analytics tools such as Hive and R are used to analyze diabetes using descriptive datasets [224,54]. Efficient predictive models have been established to reveal data relevant to the investigation of diabetes.

There are online applications that facilitate healthcare remotely. AmWell, Practo, Portea, Isabel, etc. are among the most popular apps, used for purposes such as booking doctor appointments at hospitals and clinics, patient diagnosis, ordering medicines, and remote consultation with a doctor for treatment [194]. Summarizing the applications of big data analytics in healthcare [204,114]: it helps to identify and diagnose patients accurately and precisely; it is used to predict and manage health risks and obesity and to detect fraud efficiently; and it reduces cost, variation, duplicate care, and improper claim submission.
The healthcare sector faces multiple challenges, ranging from new disease outbreaks to maintaining optimal operational efficiency. Data mining and data analytics have tremendous potential for developing healthcare applications that address these challenges; however, success hinges on the availability of quality data, and there is no magic recipe for successfully applying data analytics methods to any problem. Thus, the successful development of data-analytics-based applications depends on how data is stored, prepared, and mined. Healthcare data analytics also poses a series of challenges when dealing with an enormous amount of complex data. These challenges involve data complexity, access to data, regulatory compliance, information security, efficient analytics methods, inter-operability, manageability, development, re-usability, and maturity.

9.1 Multiple Source Information Management

In healthcare data analytics, the main goal is to analyze real-world medical data to perform prediction or classification tasks. One of the biggest hurdles in developing such applications is the data structure, i.e. how medical data is spread across many sources and how it is stored, prepared, and mined. One of the worst examples of the lack of data sharing is a woman suffering from mental illness and substance abuse who visited a variety of local hospitals more than 900 times in less than 3 years in Oakland, California, USA. This resulted in heavy cost, extensive use of hospital resources and, more importantly, made it harder for the woman to get good care.

Healthcare data is complex, heterogeneous, distributed, and dynamic, and correlations are sought across longitudinal records. In the US alone, healthcare data reached 150 exabytes in 2011 and is expected to reach the zettabyte scale soon.
Despite the rapid increase in EHR adoption, there are several challenges in making this information useful, readable, and relevant to the physicians and patients who need it most. One of the key challenges in the healthcare industry is how to manage, store, and exchange all of this data. Inter-operability is considered one of the solutions to this problem, yet poor inter-operability among EHRs makes big data analytics in healthcare challenging. Integrating different data sources would require developing a new infrastructure in which all data providers can collaborate with each other to share data. Another challenge is data privacy, which limits data sharing by masking significant patient identification information such as the MRN and SSN. Healthcare needs to catch up with other industries that have already moved from standard regression-based methods to more future-oriented ones such as predictive analytics, machine learning, and graph analytics. Big data technologies such as data ingestion, data modeling, and data visualization are integrated with existing tools to provide a supported enterprise solution.

Big data management is a hard task because a large cluster of data must be monitored and managed. Most patients visit multiple clinics in search of the cause of their disease and a medical solution for their illness. To overcome this issue, several management tools are integrated, which can be overwhelming but is a cost-effective strategy. Efficiently handling large volumes of medical imaging data and extracting potentially useful information is another hard task. Hospitals have yet to achieve a sufficient level of inter-operability, and without it, it is almost impossible to improve patient care. The US Health Department is aiming for inter-operability between disparate EHRs by 2024. Medical stakeholders (physicians, administrators, patients, etc.) believe that inter-operability will improve patient care, reduce medical errors, and save costs.
Imagine having the insight and opinions of hundreds of IVF/PGD patients to assist your decision before undergoing treatment, rather than relying only on a physician’s recommendations. Because of the importance of data integration, healthcare organizations are turning to the implementation of inter-operability. To achieve a high level of inter-operability, HL7, HIPAA, HITECH, and other health standardization initiatives have laid down several standards and guidelines that help organizations determine whether they meet inter-operability and security standards. The Authorized Testing and Certifying Body (ATCB) provides an independent, third-party opinion on EHRs. Two types of certification (CCHIT and ARRA) are used to evaluate the system. The review process comprises standardized test scripts and exchange tests of standardized data. The healthcare industry needs to catch up with other fields that have already progressed through standardization.

9.2 Security and Privacy and Confidentiality

Every stakeholder in the health industry has a role to play in ensuring the security and privacy of patient information; it is a shared responsibility. Patient privacy and information security are fundamental components of a well-functioning healthcare system that help to achieve better health outcomes, healthier people, and smarter spending. For example, a patient may not disclose certain information, or may ask a physician not to record his health information, because of a lack of trust and the perception that this information might not be kept confidential. This attitude puts the patient at risk and deprives physicians and researchers of important information, and it puts the organization at risk in terms of clinical outcomes and operational efficiency analysis. To reap the benefits, providers and individuals must believe that patients’ health information is kept private and secure.
On the other hand, providers face several challenges in ensuring that privacy and security issues are managed to a standard that satisfies patients, i.e. efficient data analysis without providing access to precise data in specific patient records. Security and privacy in data analytics pose several challenges, especially when information is drawn from multiple sources.

The major goal in healthcare is not to protect the patient’s privacy; rather, it is to save lives. The HIPAA (Health Insurance Portability and Accountability Act) of 1996 comes to mind when privacy is debated in the health sector. It gives patients legal rights concerning their personally identifiable information and establishes responsibilities for healthcare providers to protect it and restrict its use or disclosure. With the escalation in the amount of healthcare data, data analytics researchers foresee huge challenges in ensuring the anonymity of patient information to avoid its use or disclosure. Limiting data access unfortunately reduces information content that might be very important. Moreover, real data is not static but grows and varies over time, and none of the existing techniques results in any useful content being released in this scenario.

9.3 Advanced Analyzing Techniques

Technological advancements (wearable devices, patient-centered care, etc.) are transforming the entire healthcare industry. The nature of health data has evolved, and EHRs have simplified the data acquisition process with the help of the latest technology; unfortunately, they lack the ability to aggregate, transform, or perform analytics on it. Intelligence is restricted to retrospective reporting, which is insufficient for data analysis.
A plethora of algorithms, techniques, and tools are available for the examination of complex data. Traditional machine learning applies statistical analysis to a sample of the total dataset. Using traditional machine learning methods on data of this scale is inefficient and computationally infeasible. The combination of huge volumes of healthcare data and computational power lets analysts focus on analytics techniques that scale up to accommodate the volume, velocity, and variety of complex data. During the last decade, there has been a dramatic change in the size and complexity of data; thus, several emerging data analysis techniques have been presented.

Healthcare needs to catch up with other industries that have already progressed from traditional methods to advanced methods such as predictive analytics, deep machine learning, and graph analytics. Innovative analytics techniques need to be developed to interrogate healthcare data and gain insight into hidden patterns, trends, and associations in the data. Deep learning, for instance, deduces relationships without the need for a specific model and enables the machine to identify patterns of interest in huge unstructured data. As one example, a deep learning algorithm that observed data from Wikipedia learned on its own that California and Texas are both states in the U.S. It did not have to be given a model of the concepts of country and state, and this is a major difference between older machine learning and emerging deep learning methods.

9.4 Data Quality

Gone are the days when healthcare data was small, structured, and collected exclusively in electronic health records. Owing to tremendous advancements in IT, wearable technology, and other body sensors, data has become large (moving to big data), unstructured (80% of electronic health data is unstructured), non-standard, and multimedia in format.
This variety in data makes analysis both challenging and interesting. Currently, the quality of healthcare data is a cause for concern for four reasons: incompleteness (missing data), inconsistency (data mismatch within the same or across various EHR sources), inaccuracy (non-standard, incorrect, or imprecise data), and data fragmentation. Data quality involves a group of techniques: data standardization, verification, validation, monitoring, profiling, and matching. The problem of poor data in the health industry has reached epidemic proportions and has several pernicious effects, particularly in relation to disease prevention. The problem with dirty data mostly relates to missing values, duplication, outliers, and stale records.

Although real-time data monitors (especially in ICUs) are partially used in most hospitals, real-time data analytics is not yet in practice. Hospitals are moving to real-time data collection, and in the near future real-time data analytics will revolutionize the healthcare industry, enabling the early identification of infections, continuous monitoring of treatment progress, the selection of the right drugs, etc., which could help reduce morbidity and mortality. To achieve real-time data processing, we need data standardization and device inter-operability.

The other common issue is data standardization. Structuring only 20 percent of the data has shown its importance, yet clinical notes are still in use and are created in the billions, because free text lets the physician best explain the clinical encounter. Empowering physicians while maintaining data quality is quite challenging. So far, this data has been excluded from data analytics because it is available only in natural language and is not discrete. Transforming this unstructured data into a discrete form requires efficient intelligent technology, and it has been a very difficult problem for medical IT until now.
The only way to use this unstructured and non-standard data is to apply NLP to translate it into discrete data using ICD or SNOMED CT.
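The four kinds of dirty data named in Section 9.4 (missing values, duplication, outliers, and stale records) can each be counted with a simple profiling pass. The sketch below is a minimal illustration rather than a clinical-grade tool. The record fields `patient_id`, `heart_rate`, and `updated`, the one-year staleness limit, and the outlier rule based on median absolute deviation are all assumptions chosen for the example.

```python
from datetime import date
from statistics import median


def profile_records(records, today, max_age_days=365, outlier_factor=3.0):
    """Count missing values, duplicates, outliers and stale records.

    `records` is a list of dicts with the hypothetical keys
    'patient_id', 'heart_rate' (number or None) and 'updated' (datetime.date).
    """
    report = {"missing": 0, "duplicate": 0, "outlier": 0, "stale": 0}

    rates = [r["heart_rate"] for r in records if r["heart_rate"] is not None]
    mid = median(rates) if rates else None
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(x - mid) for x in rates) if rates else None

    seen = set()
    for r in records:
        key = (r["patient_id"], r["heart_rate"], r["updated"])
        if key in seen:
            report["duplicate"] += 1
            continue  # an exact repeat is counted once, as a duplicate only
        seen.add(key)
        if r["heart_rate"] is None:
            report["missing"] += 1
        elif mad and abs(r["heart_rate"] - mid) > outlier_factor * mad:
            # The outlier check is skipped when the spread is zero or unknown.
            report["outlier"] += 1
        if (today - r["updated"]).days > max_age_days:
            report["stale"] += 1
    return report
```

A profiling report of this kind is only the first step. Deciding whether a flagged value is an error or a genuine clinical extreme still requires domain review.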
10 Conclusion
Big data analytics has grown exponentially and plays a vital role in the progress of healthcare practice and research. It provides tools to collect, analyze, manage, and store large volumes of structured, unstructured, and complex data. Big data has brought a dramatic change to healthcare, reducing the cost of treatment, accelerating the identification of diseases such as cancer, and improving quality of life. It has recently been applied to support healthcare personnel, care delivery, early disease detection, disease exploration, patient care, and community services.

In this paper, we have discussed big data analytics methods, tools, techniques, and architectures in the healthcare domain. We have focused on five major sub-disciplines of healthcare, i.e. medical image processing and imaging informatics, bioinformatics, clinical informatics, public health informatics, and medical signal analytics, along with the techniques, tools, and repositories deployed in each discipline. These disciplines play a vital role in healthcare and biomedicine owing to the enormous amount of data.

Healthcare providers have had no direct incentive to share patient information with each other, which has made it harder to efficiently utilize the power of analytics in the healthcare industry. We can potentially change the way healthcare providers use modern, sophisticated technologies to gain insight from their clinical data warehouses and information repositories for extracting informative patterns and making decisions. In the future, we will see rapid, widespread implementation and use of big data analytics across healthcare organizations and the healthcare industry. To that end, several challenges must be addressed.
Its potential is great; however, issues such as multiple source information management, ensuring privacy, safeguarding security, establishing standards and governance, advanced analyzing techniques, and data quality are the notable challenges in the domain. Regardless, the future trends of big data in the healthcare system have the potential to enhance and accelerate interactions among clinicians, executives, logistics managers, and analysts by reducing costs, reducing risks, and improving personalized care.

Implementation of big data analytics is the responsibility of all stakeholders in the healthcare industry. They must be effectively engaged in the review and policy-making process if big data is to improve patient outcomes. Government agencies, healthcare professionals, hardware companies, pharmaceutical industries, the public, data scientists, researchers, and vendors must be involved in developing the big data framework that will provide the future direction of big data analytics in the healthcare industry.
References
1. Abbott, P.A., Coenen, A.: Globalization and advances in information and communication technologies: The impact on nursing and health. Nursing outlook (5), 238–246 (2008)
2. Abbott, R.: Big data and pharmacovigilance: using health information exchanges to revolutionize drug safety. Iowa L. Rev. , 225 (2013)
3. Ackerman, M.J.: The visible human project: a resource for education. Academic medicine: journal of the Association of American Medical Colleges (6), 667–670 (1999)
4. Adrián, G., Francisco, G.E., Marcela, M., Baum, A., Daniel, L., de Quirós Fernán, G.B.: Mongodb: an open source alternative for hl7-cda clinical documents management. In: Proceedings of the Open Source International Conference (CISL13) (2013)
5. Ahmad, S., Ramsay, T., Huebsch, L., Flanagan, S., McDiarmid, S., Batkin, I., McIntyre, L., Sundaresan, S.R., Maziak, D.E., Shamji, F.M., et al.: Continuous multi-parameter heart rate variability analysis heralds onset of sepsis in adults. PloS one (8), e6642 (2009)
6. Alexander, C., Wang, L.: Big data analytics in heart attack prediction. The Journal of Nursing Care (393) (2017)
7. Angiuoli, S.V., Matalka, M., Gussman, A., Galens, K., Vangala, M., Riley, D.R., Arze, C., White, J.R., White, O., Fricke, W.F.: Clovr: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC bioinformatics (1), 356 (2011)
8. Anzai, Y.: Pattern recognition and machine learning. Elsevier (2012)
9. Van der Auwera, G.A., Carneiro, M.O., Hartl, C., Poplin, R., Del Angel, G., Levy-Moonshine, A., Jordan, T., Shakir, K., Roazen, D., Thibault, J., et al.: From fastq data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Current protocols in bioinformatics pp. 11–10 (2013)
10. Aziz, H.A.: A review of the role of public health informatics in healthcare. Journal of Taibah University Medical Sciences (1), 78–81 (2017)
11. Bäck, T.: Evolutionary computation: Toward a new philosophy of machine intelligence (1997)
12. Bader, G.D., Hogue, C.W.: An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics (1), 2 (2003)
13. Bagyamathi, M., Inbarani, H.H.: A novel hybridized rough set and improved harmony search based feature selection for protein sequence classification. In: Big data in complex systems, pp. 173–204. Springer (2015)
14. Bahga, A., Madisetti, V.K.: A cloud-based approach for interoperable electronic health records (ehrs). IEEE Journal of Biomedical and Health Informatics (5), 894–906 (2013)
15. Bakshi, K.: Considerations for big data: Architecture and approach. In: Aerospace Conference, 2012 IEEE, pp. 1–7. IEEE (2012)
16. Barbu, A., She, Y., Ding, L., Gramajo, G.: Feature selection with annealing for big data learning. arXiv preprint (2013)
17. Baro, E., Degoul, S., Beuscart, R., Chazard, E.: Toward a literature-driven definition of big data in healthcare. BioMed research international (2015)
18. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation (6), 1373–1396 (2003)
19. Belle, A., Thiagarajan, R., Soroushmehr, S., Navidi, F., Beard, D.A., Najarian, K.: Big data analytics in healthcare. BioMed research international (2015)
20. Benjamin, M., Aradi, Y., Shreiber, R.: From shared data to sharing workflow: Merging pacs and teleradiology. European Journal of Radiology (1), 3–9 (2010)
21. Berger, M.L., Doban, V.: Big data, advanced analytics and the future of comparative effectiveness research. Journal of Comparative Effectiveness Research (2), 167–176 (2014)
22. Bernard, E.: Supporting diagnosis and treatment in medical care based on big data processing. In: Cross-Border Challenges in Informatics with a Focus on Disease Surveillance and Utilising Big Data: Proceedings of the EFMI Special Topic Conference, 27-29 April 2014, Budapest, Hungary, vol. 197, p. 65. IOS Press (2014)
23. Berndt, D.J., Fisher, J.W., Hevner, A.R., Studnicki, J.: Healthcare data warehousing and quality assurance. Computer (12), 56–65 (2001)
24. Bhardwaj, N., Wodajo, B., Spano, A., Neal, S., Coustasse, A.: The impact of big data on chronic disease management. The health care manager (1), 90–98 (2018)
25. Bhatia, S., Prakash, P., Pillai, G.: Svm based decision support system for heart disease classification with integer-coded genetic algorithm to select critical features. In: Proceedings of the world congress on engineering and computer science, pp. 34–38 (2008)
26. Bhattacherjee, A., Hikmet, N.: Physicians’ resistance toward healthcare information technology: a theoretical model and empirical test. European Journal of Information Systems (6), 725–737 (2007)
27. Bilofsky, H.S., Christian, B.: The genbank® genetic sequence data bank. Nucleic acids research (5), 1861–1863 (1988)
28. Blumenthal, D.: Launching hitech. New England Journal of Medicine (5), 382–385 (2010)
29. Bochare, A.: Heterogeneous data integration for clinical decision support system. (2011)
30. Bodo, M., Settle, T., Royal, J., Lombardini, E., Sawyer, E., Rothwell, S.W.: Multimodal noninvasive monitoring of soft tissue wound healing. Journal of clinical monitoring and computing (6), 677–688 (2013)
31. Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O’donovan, C., Phan, I., et al.: The swiss-prot protein knowledgebase and its supplement trembl in 2003. Nucleic acids research (1), 365–370 (2003)
32. Boeckmann, B., Blatter, M.C., Famiglietti, L., Hinz, U., Lane, L., Roechert, B., Bairoch, A.: Protein variety and functional diversity: Swiss-prot annotation in its biological context. Comptes rendus biologies (10-11), 882–899 (2005)
33. Bradley, P.S.: Implications of big data analytics on population health management. Big data (3), 152–159 (2013)
34.
Bressan, N., James, A., McGregor, C.: Trends and op-portunities for integrated real time neonatal clinical de-cision support. In: Biomedical and Health Informatics(BHI), 2012 IEEE-EMBS International Conference on,pp. 687–690. IEEE (2012)35. Buntin, M.B., Burke, M.F., Hoaglin, M.C., Blumenthal,D.: The benefits of health information technology: A re-view of the recent literature shows predominantly posi-tive results. Health Affairs pp. 464–471 (2011)36. Burghard, C.: Big data and analytics key to accountablecare success. IDC health insights pp. 1–9 (2012)37. Calaway, R., Edlefsen, L., Gong, L., Fast, S.: Big datadecision trees with r. Revolution (2016)38. Cao, H., Eshelman, L., Chbat, N., Nielsen, L., Gross, B.,Saeed, M.: Predicting icu hemodynamic instability usingcontinuous multiparameter trends. In: Engineering inMedicine and Biology Society, 2008. EMBS 2008. 30thAnnual International Conference of the IEEE, pp. 3803–3806. IEEE (2008)39. Capriolo, E., Wampler, D., Rutherglen, J.: Program-ming Hive: Data warehouse and query language forHadoop. ” O’Reilly Media, Inc.” (2012)40. Chawla, N.V., Davis, D.A.: Bringing big data to person-alized healthcare: a patient-centered framework. Jour-nal of general internal medicine (3), 660–665 (2013)41. Chen, C.C., Chang, Y.J., Chung, W.C., Lee, D.T., Ho,J.M.: Cloudrs: an error correction algorithm of high-throughput sequencing data based on scalable frame-work. In: Big Data, 2013 IEEE International Conferenceon, pp. 717–722. IEEE (2013)42. Chen, H., Chiang, R.H., Storey, V.C.: Business intelli-gence and analytics: from big data to big impact. MISquarterly pp. 1165–1188 (2012)43. Chen, J., Li, K., Rong, H., Bilal, K., Yang, N., Li, K.: Adisease diagnosis and treatment recommendation sys-tem based on big data mining and cloud computing.Information Sciences , 124–149 (2018)44. Chen, J., Qian, F., Yan, W., Shen, B.: Translationalbiomedical informatics in the cloud: present and future.BioMed research international (2013)45. 
Chen, M., Mao, S., Liu, Y.: Big data: A survey. Mobilenetworks and applications (2), 171–209 (2014)46. Chen, N., Chen, A.z., Zhou, L.x.: An incremental griddensity-based clustering algorithm. Journal of software (1), 1–7 (2002)47. Chen, W., Cockrell, C., Ward, K.R., Najarian, K.: In-tracranial pressure level prediction in traumatic braininjury by extracting features from multiple sources andusing machine learning methods. In: Bioinformatics andBiomedicine (BIBM), 2010 IEEE International Confer-ence on, pp. 510–515. IEEE (2010)48. Consortium, I.H.G.S., et al.: Initial sequencing and anal-ysis of the human genome. Nature (6822), 860(2001)49. Conway, M., OConnor, D.: Social media, big data, andmental health: current advances and ethical implica-tions. Current opinion in psychology , 77–82 (2016)50. Costa, C., Oliveira, J.L.: Telecardiology through ubiqui-tous internet services. International journal of medicalinformatics (9), 612–621 (2012)6 Arshia Rehman et al.51. Cox, M., Ellsworth, D.: Application-controlled demandpaging for out-of-core visualization. In: Proceedings ofthe 8th conference on Visualization’97, pp. 235–ff. IEEEComputer Society Press (1997)52. Cunha, J., Silva, C., Antunes, M.: Health twitter bigbata management with hadoop framework. ProcediaComputer Science , 425–431 (2015)53. D OConnor, B., Merriman, B., Nelson, S.F.: Seqwarequery engine: storing and searching sequence data inthe cloud. - (12), S2 (2010)54. Daghistani, T., Al Shammari, R., Razzak, M.I.: Discov-ering diabetes complications: an ontology based model.Acta Informatica Medica (6), 385 (2015)55. D’Agostino Sr, R.B., Grundy, S., Sullivan, L.M., Wil-son, P., Group, C.R.P., et al.: Validation of the fram-ingham coronary heart disease prediction scores: resultsof a multiple ethnic groups investigation. Jama (2),180–187 (2001)56. David, S.K., Saeb, A.T., Rafiullah, M., Rubeaan, K.:Classification techniques and data mining tools used inmedical bioinformatics. 
In: Big Data Governance andPerspectives in Knowledge Management, pp. 105–126.IGI Global (2019)57. Day, A., Dong, J., Funari, V.A., Harry, B., Strom, S.P.,Cohn, D.H., Nelson, S.F.: Disease gene characterizationthrough large-scale co-expression analysis. PloS one (12), e8491 (2009)58. De Choudhury, M., Counts, S., Horvitz, E.: Predictingpostpartum changes in emotion and behavior via socialmedia. In: Proceedings of the SIGCHI Conference onHuman Factors in Computing Systems, pp. 3267–3276.ACM (2013)59. De Choudhury, M., Counts, S., Horvitz, E.: Social me-dia as a measurement tool of depression in populations.In: Proceedings of the 5th Annual ACM Web ScienceConference, pp. 47–56. ACM (2013)60. De Choudhury, M., Counts, S., Horvitz, E.J., Hoff, A.:Characterizing and predicting postpartum depressionfrom shared facebook data. In: Proceedings of the 17thACM conference on Computer supported cooperativework & social computing, pp. 626–638. ACM (2014)61. De Choudhury, M., Gamon, M., Counts, S., Horvitz,E.: Predicting depression via social media. ICWSM ,1–10 (2013)62. De Ridder, D., Duin, R.P.: Sammon’s mapping usingneural networks: a comparison. Pattern RecognitionLetters (11-13), 1307–1316 (1997)63. Dean, J., Ghemawat, S.: Mapreduce: simplified dataprocessing on large clusters. Communications of theACM (1), 107–113 (2008)64. Deb, K.: Multi-objective optimization using evolution-ary algorithms, vol. 16. John Wiley & Sons (2001)65. Dembosky, A.: Data prescription for better healthcare.Financial Times (12), 2012 (2012)66. Devi, A.S., Maragatham, G.: Big genome data classifi-cation with random forests using variantspark. In: Inter-national Conference on Computer Networks and Com-munication Technologies, pp. 599–614. Springer (2019)67. Diebold, F.X.: Big data dynamic factor models formacroeconomic measurement and forecasting. In: Ad-vances in Economics and Econometrics: Theory and Ap-plications, Eighth World Congress of the EconometricSociety,(edited by M. 
Dewatripont, LP Hansen and S.Turnovsky), pp. 115–122 (2003)68. Dilsizian, S.E., Siegel, E.L.: Artificial intelligence inmedicine and cardiac imaging: harnessing big data andadvanced computing to provide personalized medical diagnosis and treatment. Current cardiology reports (1), 441 (2014)69. for Disease Control, C., (CDC), P., et al.: Behavioral riskfactors surveillance system (brfss). Website (2015)70. Djuric, N.: Big data algorithms for visualization andsupervised learning. Temple University (2013)71. Drew, B.J., Harris, P., Z`egre-Hemsey, J.K., Mammone,T., Schindler, D., Salas-Boni, R., Bai, Y., Tinoco, A.,Ding, Q., Hu, X.: Insights into the problem of alarmfatigue with physiologic monitor devices: a comprehen-sive observational study of consecutive intensive careunit patients. PloS one (10), e110274 (2014)72. Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L.,Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko,I., Nilsen, G.B., Yeung, G., et al.: Human genome se-quencing using unchained base reads on self-assemblingdna nanoarrays. Science (5961), 78–81 (2010)73. Duda, R.O., Hart, P.E., Stork, D.G., et al.: Pattern clas-sification, vol. 2. Wiley New York (1973)74. Dutta, H., Kamil, A., Pooleery, M., Sethumadhavan, S.,Demme, J.: Distributed storage of large-scale multidi-mensional electroencephalogram data using hadoop andhbase. In: Grid and Cloud Database Management, pp.331–347. Springer (2011)75. Eelmets, M.: Clovr: A virtual machine for automatedand portable sequence analysis from the desktop usingcloud computing. – - (-) (2011)76. El Naqa, I.: Perspectives on making big data analyticswork for oncology. Methods , 32–44 (2016)77. Emani, C.K., Cullot, N., Nicolle, C.: Understandablebig data: a survey. Computer science review (34), 226–231(1996)80. Eynon, R.: The rise of big data: what does it mean foreducation, technology, and media research? (2013)81. Feldman, B., Martin, E.M., Skotnes, T.: Big data inhealthcare hype and hope. October 2012. Dr. 
Bonnie (2012)82. Fernandes, L.M., O’Connor, M., Weaver, V.: Big data,bigger outcomes. Journal of AHIMA (10), 38–43(2012)83. Frost, S.: Drowning in big data? reducing informationtechnology complexities and costs for healthcare orga-nizations (2015)84. Galetsi, P., Katsaliaki, K.: A review of the literature onbig data analytics in healthcare. Journal of the Opera-tional Research Society pp. 1–19 (2019)85. Gamache, R., Kharrazi, H., Weiner, J.P.: Public andpopulation health informatics: The bridging of big datato benefit communities. Yearbook of medical informat-ics (01), 199–206 (2018)86. Ganjir, V., Sarkar, B., Kumar, R.: Big data analyticsfor healthcare. International Journal of Research in En-gineering, Technology and Science , 1–6 (2016)87. Garattini, C., Raffle, J., Aisyah, D.N., Sartain, F., Ko-zlakidis, Z.: Big data analytics, infectious diseases andassociated ethical impacts. Philosophy & technology (1), 69–85 (2019)everaging Big Data Analytics in Healthcare Enhancement: Trends, Challenges and Opportunities 2788. Geerts, H., Dacks, P.A., Devanarayan, V., Haas,M., Khachaturian, Z.S., Gordon, M.F., Maudsley, S.,Romero, K., Stephenson, D., Initiative, B.H.M., et al.:Big data to smart data in alzheimer’s disease: The brainhealth modeling initiative to foster actionable knowl-edge. Alzheimer’s & Dementia (9), 1014–1021 (2016)89. George, L.: HBase: the definitive guide: random accessto your planet-size data. ” O’Reilly Media, Inc.” (2011)90. Gessner, R.C., Frederick, C.B., Foster, F.S., Dayton,P.A.: Acoustic angiography: a new imaging modalityfor assessing microvasculature architecture. Journal ofBiomedical Imaging , 14 (2013)91. Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer,L., Smolinski, M.S., Brilliant, L.: Detecting influenzaepidemics using search engine query data. Nature (7232), 1012 (2009)92. 
Giveki, D., Salimi, H., Bahmanyar, G., Khademian, Y.:Automatic detection of diabetes diagnosis using featureweighted support vector machines based on mutual in-formation and modified cuckoo search. arXiv preprintarXiv:1201.2173 (2012)93. Glemser, P.A., Engel, K., Simons, D., Steffens, J.,Schlemmer, H.P., Orakcioglu, B.: A new approach forphotorealistic visualization of rendered computed to-mography images. World neurosurgery , e283–e292(2018)94. Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J.,Burton, J.N., Walker, B.J., Sharpe, T., Hall, G., Shea,T.P., Sykes, S., et al.: High-quality draft assemblies ofmammalian genomes from massively parallel sequencedata. Proceedings of the National Academy of Sciences (4), 1513–1518 (2011)95. Goldberger, A.L., Amaral, L.A., Glass, L., Hausdorff,J.M., Ivanov, P.C., Mark, R.G., Mietus, J.E., Moody,G.B., Peng, C.K., Stanley, H.E.: Physiobank, phys-iotoolkit, and physionet. Circulation (23), e215–e220 (2000)96. Goli-Malekabadi, Z., Sargolzaei-Javan, M., Akbari,M.K.: An effective model for store and retrieve bighealth data in cloud computing. Computer methodsand programs in biomedicine , 75–82 (2016)97. Grace, R.K., Manimegalai, R., Kumar, S.S.: Medical im-age retrieval system in grid using hadoop framework. In:2014 International Conference on Computational Sci-ence and Computational Intelligence, vol. 1, pp. 144–148. IEEE (2014)98. Graham, K.C., Cvach, M.: Monitor alarm fatigue: stan-dardizing use of physiological monitors and decreasingnuisance alarms. American Journal of Critical Care (1), 28–34 (2010)99. Grosu, P., Townsend, J.P., Hartl, D.L., Cavalieri, D.:Pathway processor: a tool for integrating whole-genomeexpression results into metabolic networks. Genome re-search (7), 1121–1126 (2002)100. Groves, P., Kayyali, B., Knott, D., Van Kuiken, S.: Thebig datarevolution in healthcare. McKinsey Quarterly , 3 (2013)101. Guha, S., Rastogi, R., Shim, K.: Cure: an efficient clus-tering algorithm for large databases. 
In: ACM SigmodRecord, pp. 73–84. ACM (1998)102. Gui, H., Zheng, R., Ma, C., Fan, H., Xu, L.: An architec-ture for healthcare big data management and analysis.In: International Conference on Health Information Sci-ence, pp. 154–160. Springer (2016)103. Gupta, M., George, J.F.: Toward the development of abig data analytics capability. Information & Manage-ment (8), 1049–1064 (2016) 104. Gurtowski, J., Schatz, M.C., Langmead, B.: Genotyp-ing in the cloud with crossbow. Current protocols inbioinformatics pp. 15–3 (2012)105. Hagg, E., Dahinten, V.S., Currie, L.M.: The emerginguse of social media for health-related purposes in lowand middle-income countries: A scoping review. Interna-tional journal of medical informatics , 92–105 (2018)106. Hall, L.O., Chawla, N., Bowyer, K.W.: Decision treelearning on very large data sets. In: Systems, Man, andCybernetics, 1998. 1998 IEEE International Conferenceon, vol. 3, pp. 2579–2584. IEEE (1998)107. Haller, S., Badoud, S., Nguyen, D., Garibotto, V.,Lovblad, K., Burkhard, P.: Individual detection of pa-tients with parkinson disease using support vector ma-chine analysis of diffusion tensor imaging data: initialresults. American Journal of Neuroradiology (11),2123–2128 (2012)108. Han, H., Ryoo, H.C., Patrick, H.: An infrastructure ofstream data mining, fusion and management for mon-itored patients. In: Computer-Based Medical Systems,2006. CBMS 2006. 19th IEEE International Symposiumon, pp. 461–468. IEEE (2006)109. Hankey, B.F., Ries, L.A., Edwards, B.K.: The surveil-lance, epidemiology, and end results program: a na-tional resource. Cancer Epidemiology and PreventionBiomarkers (12), 1117–1121 (1999)110. Hatef, E., Weiner, J.P., Kharrazi, H.: A public healthperspective on using electronic health records to ad-dress social determinants of health: The potential fora national system of local community health records inthe united states. International journal of medical in-formatics , 86–89 (2019)111. 
Hay, S.I., George, D.B., Moyes, C.L., Brownstein, J.S.:Big data opportunities for global infectious diseasesurveillance. PLoS medicine (4), e1001413 (2013)112. Hayat, M.J., Howlader, N., Reichman, M.E., Edwards,B.K.: Cancer statistics, trends, and multiple primarycancer analyses from the surveillance, epidemiology, andend results (seer) program. The oncologist (1), 20–37(2007)113. He, C., Fan, X., Li, Y.: Toward ubiquitous healthcareservices with a novel efficient cloud platform. IEEETransactions on Biomedical Engineering (1), 230–234(2012)114. Helm-Murtagh, S.C.: Use of big data by blue cross andblue shield of north carolina. North Carolina medicaljournal (3), 195–197 (2014)115. Herland, M., Khoshgoftaar, T.M., Wald, R.: A review ofdata mining using big data in health informatics. Jour-nal of Big Data (1), 2 (2014)116. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L.,Cetin, F.B., Babu, S.: Starfish: a self-tuning system forbig data analytics. - (2011), 261–272 (2011)117. Hey, A.J., Trefethen, A.E.: The data deluge: An e-science perspective. - (2003)118. Hiatt, R.A., Rimer, B.K.: A new strategy for cancercontrol research. Cancer Epidemiology and PreventionBiomarkers (11), 957–964 (1999)119. Hinneburg, A., Keim, D.A., et al.: An efficient approachto clustering in large multimedia databases with noise.In: KDD, vol. 98, pp. 58–65 (1998)120. Holland, S.M.: Principal components analysis (pca). De-partment of Geology, University of Georgia, Athens, GApp. 30602–2501 (2008)121. Horn, J., Nafpliotis, N., Goldberg, D.E.: A niched paretogenetic algorithm for multiobjective optimization. In:Evolutionary Computation, 1994. IEEE World Congress8 Arshia Rehman et al.on Computational Intelligence., Proceedings of the FirstIEEE Conference on, pp. 82–87. Ieee (1994)122. Hsieh, C.J., Si, S., Dhillon, I.: A divide-and-conquersolver for kernel support vector machines. In: Inter-national Conference on Machine Learning, pp. 566–574(2014)123. 
Hu, P., Galvagno, S.M., Sen, A., Dutton, R., Jordan,S., Floccare, D., Handley, C., Shackelford, S., Pasley,J., Mackenzie, C.: Identification of dynamic prehospi-tal changes with continuous vital signs acquisition. Airmedical journal (1), 27–33 (2014)124. Huang, B.E., Mulyasasmita, W., Rajagopal, G.: Thepath from big data to precision medicine. Expert Re-view of Precision Medicine and Drug Development (2),129–143 (2016)125. Huang, H., Tata, S., Prill, R.J.: Bluesnp: R package forhighly scalable genome-wide association studies usinghadoop clusters. Bioinformatics (1), 135–136 (2012)126. Huang, W., Li, L., Myers, J.R., Marth, G.T.: Art: anext-generation sequencing read simulator. Bioinfor-matics (4), 593–594 (2011)127. Huang, Z.: Extensions to the k-means algorithm for clus-tering large data sets with categorical values. Data min-ing and knowledge discovery (3), 283–304 (1998)128. Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Cas-tro, E., Langendijk-Genevaux, P.S., Pagni, M., Sigrist,C.J.: The prosite database. Nucleic acids research (suppl 1), D227–D230 (2006)129. Hung, C.L., Lin, Y.L.: Implementation of a parallel pro-tein structure alignment service on cloud. Internationaljournal of genomics (2013)130. van Iersel, M.P., Kelder, T., Pico, A.R., Hanspers, K.,Coort, S., Conklin, B.R., Evelo, C.: Presenting and ex-ploring biological pathways with pathvisio. BMC bioin-formatics (1), 399 (2008)131. Istephan, S., Siadat, M.R.: Unstructured medical imagequery using big data–an epilepsy case study. Journal ofbiomedical informatics , 218–226 (2016)132. Jacofsky, D.: The myths of big datain health care. Thebone & joint journal (12), 1571–1576 (2017)133. Jai-Andaloussi, S., Elabdouli, A., Chaffai, A., Madrane,N., Sekkaki, A.: Medical content based image retrievalby using the hadoop framework. In: Telecommunica-tions (ICT), 2013 20th International Conference on, pp.1–5. IEEE (2013)134. 
Jayapandian, C.P., Chen, C.H., Bozorgi, A., Lhatoo,S.D., Zhang, G.Q., Sahoo, S.S.: Cloudwave: distributedprocessing of big data from electrophysiological record-ings for epilepsy clinical research using hadoop. In:AMIA Annual Symposium Proceedings, vol. 2013, p.691. American Medical Informatics Association (2013)135. Jin, Y., Deyu, T., Yi, Z.: A distributed storage model forehr based on hbase. In: Information Management, Inno-vation Management and Industrial Engineering (ICIII),2011 International Conference on, vol. 2, pp. 369–372.IEEE (2011)136. Johnson, A.E., Pollard, T.J., Shen, L., Li-wei, H.L.,Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi,L.A., Mark, R.G.: Mimic-iii, a freely accessible criticalcare database. Scientific data , 160035 (2016)137. Jorge, J.A., Sim˜oes Lopes, D.: Challenges and ap-proaches to interactive visualization in healthcareworkspaces. Annals of Medicine (sup1), 22–22 (2019)138. Kamesh, D., Neelima, V., Priya, R.R.: A review of datamining using bigdata in health informatics. Interna-tional Journal of Scientific and Research Publications (3) (2015) 139. Kanz, C., Aldebert, P., Althorpe, N., Baker, W., Bald-win, A., Bates, K., Browne, P., van den Broek, A., Cas-tro, M., Cochrane, G., et al.: The embl nucleotide se-quence database. Nucleic Acids Research (suppl 1),D29–D33 (2005)140. Karmonik, C., Boone, T.B., Khavari, R.: Workflow forvisualization of neuroimaging data with an augmentedreality device. Journal of digital imaging (1), 26–31(2018)141. Kashya, H., Ahmed, H.A., Hoque, N., Roy, S., Bhat-tacharyya, D.K.: Big data analytics in bioinformatics:A machine learning perspective. JOURNAL OF LA-TEX CLASS FILES (9), 837–854 (2014)142. Kaufman, L., Rousseeuw, P.J.: Finding groups in data:an introduction to cluster analysis, vol. 344. John Wiley& Sons (2009)143. Kaur, K., Rani, R.: Managing data in healthcare infor-mation systems: many models, one solution. Computer (3), 52–59 (2015)144. 
Kelley, B.P., Yuan, B., Lewitter, F., Sharan, R., Stock-well, B.R., Ideker, T.: Pathblast: a tool for alignmentof protein interaction networks. Nucleic acids research (suppl 2), W83–W88 (2004)145. Khan, S., Islam, N., Jan, Z., Din, I.U., Rodrigues, J.J.C.:A novel deep learning based framework for the detectionand classification of breast cancer using transfer learn-ing. Pattern Recognition Letters , 1–6 (2019)146. Kim, M.K., Park, J.H.: Identifying and prioritizing crit-ical factors for promoting the implementation and us-age of big data in healthcare. Information Development (3), 257–269 (2017)147. Koboldt, D.C., Steinberg, K.M., Larson, D.E., Wilson,R.K., Mardis, E.R.: The next-generation sequencingrevolution and its impact on genomics. Cell (1),27–38 (2013)148. Kouranov, A., Xie, L., de la Cruz, J., Chen, L., West-brook, J., Bourne, P.E., Berman, H.M.: The rcsb pdb in-formation portal for structural genomics. Nucleic acidsresearch (suppl 1), D302–D305 (2006)149. Krampis, K., Booth, T., Chapman, B., Tiwari, B., Bi-cak, M., Field, D., Nelson, K.E.: Cloud biolinux: pre-configured and on-demand bioinformatics computing forthe genomics community. BMC bioinformatics (1), 42(2012)150. Kumar, V., Sharma, R.M., Thakur, R.: Big data ana-lytics: Bioinformatics perspective. – - (-) (2016)151. Kupersmith, J., Francis, J., Kerr, E., Krein, S., Pogach,L., Kolodner, R.M., Perlin, J.B.: Advancing evidence-based care for diabetes: Lessons from the veteranshealth administration: A highly regarded ehr system isbut one contributor to the quality transformation of thevha since the mid-1990s. Health Affairs (Suppl1),w156–w168 (2007)152. Lakshmanaprabu, S., Mohanty, S.N., Shankar, K.,Arunkumar, N., Ramirez, G.: Optimal deep learningmodel for classification of lung cancer on ct images. Fu-ture Generation Computer Systems , 374–382 (2019)153. 
Lamarche-Vadel, A., Pavillon, G., Aouba, A., Johans-son, L.A., Meyer, L., Jougla, E., Rey, G.: Automatedcomparison of last hospital main diagnosis and under-lying cause of death icd10 codes, france, 2008–2009.BMC medical informatics and decision making (1),44 (2014)154. Lander Eric, S., Linton Lauren, M., Bruce, B., Chad,N., Zody Michael, C., Jennifer, B., Keri, D., Ken, D.,Michael, D., William, F., et al.: Initial sequencing andanalysis of the human genome. - (2001)everaging Big Data Analytics in Healthcare Enhancement: Trends, Challenges and Opportunities 29155. Laney, D.: 3d data management: Controlling data vol-ume, velocity and variety. META group research note (70), 1 (2001)156. Langfelder, P., Horvath, S.: Wgcna: an r package forweighted correlation network analysis. BMC bioinfor-matics (1), 559 (2008)157. Le Roux, P., Menon, D.K., Citerio, G., Vespa, P., Bader,M.K., Brophy, G.M., Diringer, M.N., Stocchetti, N.,Videtta, W., Armonda, R., et al.: Consensus summarystatement of the international multidisciplinary consen-sus conference on multimodality monitoring in neuro-critical care. Neurocritical care (2), 1–26 (2014)158. Lee, J., Mark, R.: A hypotensive episode predictor forintensive care based on heart rate and blood pressuretime series. In: Computing in Cardiology, 2010, pp. 81–84. IEEE (2010)159. Lee, T.J., Pouliot, Y., Wagner, V., Gupta, P., Stringer-Calvert, D.W., Tenenbaum, J.D., Karp, P.D.: Bioware-house: a bioinformatics database warehouse toolkit.BMC bioinformatics (1), 170 (2006)160. Letovsky, S.I., Cottingham, R.W., Porter, C.J., Li,P.W.: Gdb: the human genome database. Nucleic AcidsResearch (1), 94–99 (1998)161. Lewis, S., Csordas, A., Killcoyne, S., Hermjakob, H.,Hoopmann, M.R., Moritz, R.L., Deutsch, E.W., Boyle,J.: Hydra: a scalable proteomic search engine whichutilizes the hadoop distributed computing framework.BMC bioinformatics (1), 324 (2012)162. 
Liang, M., Zhang, F., Jin, G., Zhu, J.: Fastgcn: a gpuaccelerated tool for fast gene co-expression networks.PloS one (1), e0116776 (2015)163. Lin, W., Dou, W., Zhou, Z., Liu, C.: A cloud-basedframework for home-diagnosis service over big medicaldata. Journal of Systems and Software , 192–206(2015)164. Lu, J., Xu, Q., Li, B., Yuan, X., Sato, K.: Image pro-cessing apparatus, image processing method and medi-cal imaging device (2019). US Patent App. 10/282,631165. Lualdi, M., Fasano, M.: Statistical analysis of pro-teomics data: A review on feature selection. Journalof proteomics , 18–26 (2019)166. Luo, W., Brouwer, C.: Pathview: an r/bioconductorpackage for pathway-based data integration and visu-alization. Bioinformatics (14), 1830–1831 (2013)167. Maia, A.T., Sammut, S.J., Jacinta-Fernandes, A., Chin,S.F.: Big data in cancer genomics. Current Opinion inSystems Biology , 78–84 (2017)168. Manogaran, G., Lopez, D.: A survey of big data ar-chitectures and machine learning algorithms in health-care. International Journal of Biomedical Engineeringand Technology (2-4), 182–211 (2017)169. Manogaran, G., Lopez, D.: Health data analytics usingscalable logistic regression with stochastic gradient de-scent. International Journal of Advanced IntelligenceParadigms (1-2), 118–132 (2018)170. Markonis, D., Schaer, R., Eggel, I., M¨uller, H., De-peursinge, A.: Using mapreduce for large-scale medi-cal image analysis. In: 2012 IEEE Second InternationalConference on Healthcare Informatics, Imaging and Sys-tems Biology, pp. 1–1. IEEE (2012)171. Mazurek, M.: Applying nosql databases for operational-izing clinical data mining models. In: International Con-ference: Beyond Databases, Architectures and Struc-tures, pp. 527–536. Springer (2014)172. McAfee, A., Brynjolfsson, E., Davenport, T.H., Patil,D., Barton, D.: Big data: the management revolution.Harvard business review (10), 60–68 (2012) 173. 