Interactive Network Visualization of Opioid Crisis Related Data- Policy, Pharmaceutical, Training, and More
Olga Scrivner, Elizabeth McAvoy, Thuy Nguyen, Tenzin Choeden, Kosali Simon, Katy Börner
Olga Scrivner ∗,a, Elizabeth McAvoy b, Thuy Nguyen b,c, Tenzin Choeden a, Kosali Simon b, Katy Börner a
a Luddy School of Informatics, Computing, and Engineering, Indiana University
b O’Neill School of Public and Environmental Affairs, Indiana University
c School of Public Health, University of Michigan
Abstract
Responding to the U.S. opioid crisis requires a holistic approach supported by evidence from linking and analyzing multiple data sources. This paper discusses how 20 available resources can be combined to answer pressing public health questions related to the crisis. It presents a network view based on U.S. geographical units and other standard concepts, crosswalked to communicate the coverage and interlinkage of these resources. These opioid-related datasets can be grouped by four themes: (1) drug prescriptions, (2) opioid-related harms, (3) opioid treatment workforce, jobs, and training, and (4) drug policy. An interactive network visualization was created and is freely available online; it lets users explore key metadata, relevant scholarly works, and data interlinkages in support of informed decision making through data analysis.
Key words:
Opioid Crisis, Policy, Network Visualization, Data Linkage
1. Introduction
The U.S. opioid epidemic is a major national concern, with the number of fatal drug overdoses accelerating during the COVID-19 pandemic. As of May 2020, the 12-month count of reported deaths from drug overdose had increased by an estimated 17% compared with the year 2019, rising from 67,281 to 79,251 deaths [1]. Furthermore, according to a recent study of spatial and temporal overdose spikes by the Overdose Detection Mapping Application Program, the number of reported overdoses increased by 18% between the pre- (Jan 1 through March 18, 2020) and post-stay-at-home order (March 19 through May 19, 2020) periods, while the number of counties reporting fatalities also increased [2].

To address the current opioid crisis, the Department of Health and Human Services (HHS) strategic priorities include improvements in (1) pain management, (2) prevention, treatment and recovery, (3) data and research related to the opioid crisis, and (4) overdose-reversing drugs [3]. A holistic understanding of multiple datasets of drug policy, pharmacy claims, treatment workforce, and opioid-related harms can advance research related to the opioid crisis. We focus our discussion on data resources that are available without major hurdles to access.¹ These data often include aggregate-level identifiers, such as geographical units (state, county), drug names, and occupation codes. These aggregate-level identifiers can serve as linkages between datasets, and these linkages may allow researchers and stakeholders to identify new areas for public or health interventions and provide evidence-based guidelines for practitioners and patients. A systematic view of datasets suggests that data linkages become an “informational asset” transforming the way we observe and analyze data [4].

Stakeholders and decision makers, however, are often challenged by the large number, complexity, and peculiarities of the existing datasets. Researchers may also not be aware of available resources, as these datasets are provided by different sources and have varying data quality and coverage. Some datasets are freely available, while others require signing of legal documents or payment of fees for additional aspects of the data. Furthermore, some datasets are massive in size, requiring database expertise to run queries; other datasets exist only as textual data in a PDF format and require file parsing before usage. This review

¹ We acknowledge that there are many valuable data resources; however, they are harder to incorporate into research due to privacy protection or cost concerns.
∗ Corresponding author: [email protected]
Preprint submitted to Preventive Medicine Report, February 11, 2021
2. Background
The causes, consequences, and manifestations of the U.S. opioid crisis have been studied from many different angles, including prevention, treatment, drug prescription, law enforcement, criminal justice, and overdose reversal. Treatment expansions and prescription reductions are two essential steps in reducing mortality and improving safety for patients with chronic pain. Monitoring and regulatory policies play an equally important role in balancing the harms, cost, availability, and benefits of opioid use, as seen in policies such as prescription drug monitoring programs (PDMPs), health insurance expansions, and comprehensive federal legislation (e.g., the Comprehensive Addiction and Recovery Act) [8, 9]. These efforts have led to a decrease in the overall U.S. drug prescription rate, which has fallen from 81.3 per 100 people in 2012 to 46.7 in 2019 [10]. But while the U.S. has had success in implementing preventative measures, it has struggled with improving treatment access for those suffering from addiction. A major gap remains between service demand and supply: only 30% of U.S. adults with Opioid Use Disorder (OUD) reported receiving treatment, according to the 2015-2017 National Survey on Drug Use and Health (NSDUH) data. In addition, the 2017 Treatment Episode Data Set (TEDS) reveals an increase in opiate-related admissions (682,074), whereas only 1,691 out of 15,961 treatment facilities are OTP certified (see the 2019 National Survey of Substance Abuse Treatment Services [N-SSATS]). In terms of the number of establishments, the 2018 County Business Patterns (CBP) data identifies 36,254 Substance Use Disorder Treatment (SUDT) outpatient centers, 42,906 residential SUDT facilities, and 692 SUDT hospitals. Despite the high priority for training expressed by the U.S. Department of Health and Human Services and high job demand, the behavioral health workforce (integrated mental and substance use disorder) has been “characterized as being in crisis” [11, p. 15]. The interdependence of these social, health, economic, and public policy factors calls for an interdisciplinary, holistic, and systematic approach in which researchers and practitioners can zoom out to examine the problem as a whole and then zoom in to solve the most pressing issues, those with the highest positive impact on improving health and services while decreasing crime and addiction-related disorders.

Recently, several studies were published that review the current literature and secondary data relevant to the opioid addiction crisis [12, 6, 13]. Maclean et al. [13] collected and reviewed economic studies and identified several topics relevant for understanding the opioid crisis: (1) pharmaceutical industries and drug prescriptions, (2) healthcare providers and labor market, (3) harms and crime, and (4) policies. Another study [12] extracted intervention variables (e.g., prevention, treatment, harm reductions) and enabling variables (e.g., surveillance, stigma). Furthermore, Smart et al. [5, 6] reviewed existing datasets, grouping them according to the HHS strategic priorities: (1) better pain management, (2) addiction prevention, treatment and recovery service, and (3) better targeting of overdose-reversing drugs. In addition, the authors classified data based on type, namely national surveys, electronic health records (EHR), claims data, mortality records, prescription monitoring data, contextual and policy data, and others (national, state, local). Strengths and weaknesses of each dataset were assessed using various metrics (e.g., data accessibility, data linkage, coverage). Data descriptions are often presented in a tabular format with new attributes rendered as columns. For instance, in [12], each variable is provided with its relative frequency of occurrence in the reviewed literature, whereas in [6], a plus/minus sign is used to indicate strengths and weaknesses for each dataset. A different perspective, called “probabilistic linkage,” was developed by Weber et al. [4] in 2014, focusing on a visual representation of potential biomedical sources and the values of their linkages. The team used a tabular form with sizes, shapes, colors, and positions to indicate data quality, data linkage, types of data (e.g., pharma, claims, EHR, non-clinical data), data coverage, and even the probabilities for obtaining new data or linking existing data.

Over the last several years, many new datasets became available (e.g., data.gov and nlm.nih.gov), and researchers now have access to datasets with diverse quality and coverage. In order to federate and use these resources, detailed knowledge about the datasets is required. Understanding data linkages [14] becomes critical for understanding, communicating, and reducing disease [15]. Data visualizations can be used to communicate the complexity of heterogeneous data. For example, SPOKE [16] and Springer Nature SciGraph [17] use a knowledge graph (KG) to interlink and query different datasets. The SPOKE KG interlinks more than 30 publicly available biomedical databases, whereas SciGraph interlinks funders, projects, publications, citations, and scholarly metadata in support of data exploration.
3. Methods
A recent review of the economics literature related to the opioid crisis by Maclean et al. described over 100 major economic studies on the U.S. opioid crisis based on a comprehensive review of the literature and expert consultations [13]. Building on this work, we applied a modified protocol of scoping reviews [18] to identify open datasets used in the 120 studies cited (see Figure 1). Specifically, the 120 articles, ranging from 1986 to 2020, were imported to the Mendeley library group, and duplicate records were removed. Each article was scanned for datasets mentioned in the methodology section, and articles without datasets were discarded. The remaining set (107 articles) was tagged in Mendeley with dataset names as they were used in the studies. We identified 283 unique name tags. Across the 107 studies, there were many inconsistencies in naming and spelling; for instance, ‘nvss,’ ‘nvss multiple cause of death,’ and ‘nvss multiple cause-of-death mortality’ all referred to U.S. mortality data from death certificates, produced by the National Center for Health Statistics. We normalized labels using OpenRefine and the Nearest Neighbor algorithm with Prediction by Partial Matching (PPM) distance [19]. The algorithm detected 61 clusters that were merged, resulting in 230 normalized labels. We manually inspected all labels and removed datasets that did not fit our eligibility criteria: (1) the dataset must be publicly available, and (2) the dataset should fall into one of the following categories: i) pharmaceutical data, related to opioid prescription, ii) policy data, related to state drug laws, iii) opioid overdose data, related to treatment and treatment results, and iv) employment data, related to training and hiring in the substance use disorder treatment (SUDT) industry. As a result, we identified 20 datasets for synthesis and data linkage exploration (see Table 1).
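The label-normalization step can be illustrated with a small sketch. It is not the OpenRefine procedure itself: the PPM distance is replaced here by the similarity ratio from Python's standard `difflib` module, and the function names, the greedy clustering strategy, the 0.75 threshold, and the `teds-a` tag are assumptions made only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for the PPM distance used in OpenRefine."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_labels(labels, threshold=0.75):
    """Greedily group labels: a label joins the first cluster that already
    contains a sufficiently similar member, otherwise it starts a new one."""
    clusters = []
    for label in labels:
        for cluster in clusters:
            if any(similarity(label, member) >= threshold for member in cluster):
                cluster.append(label)
                break
        else:
            clusters.append([label])
    return clusters


# Three tags are quoted from the text; 'teds-a' is a hypothetical extra tag.
tags = [
    "nvss",
    "nvss multiple cause of death",
    "nvss multiple cause-of-death mortality",
    "teds-a",
]
clusters = cluster_labels(tags)
for cluster in clusters:
    print(cluster)
```

With this stand-in measure the two long variants are grouped, while the bare abbreviation `nvss` stays separate, which is one reason a manual inspection of the merged labels remains useful.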
Figure 1: PRISMA flow diagram of the scoping review process

Table 1: Datasets to support research on the opioid crisis. 16 datasets are extracted from the review study [13] while 4 additional datasets underlined are ones we identified as relevant. Datasets marked with * require a request submission prior to download.

Dataset            Description                                               Type
CDC Mortality      CDC Opioid Overdose Rate                                  Harms
TEDS-A             Treatment Episode Data Set: Admissions                    Harms
NAS*               National Alcohol Survey                                   Harms
NSDUH              National Survey on Drug Use and Health                    Harms
NPDS*              National Poison Data System                               Harms
TEDS-D             Treatment Episode Data Set: Discharge                     Harms
N-SSATS            National Survey of Substance Abuse Treatment Services     Jobs
QCEW               Quarterly Census of Employment and Wages                  Jobs
IPEDS              Integrated Postsecondary Education Data System            Jobs
CBP                County Business Patterns                                  Jobs
ACS                American Community Survey                                 Jobs
MEPS               Medical Expenditure Panel Survey                          Pharma
Sunshine Act       Open Payments                                             Pharma
SDUD               (Medicaid) State Drug Utilization Data                    Pharma
Medicare*          Medicare Part D Prescription Drug Event                   Pharma
ARCOS*             Automated Reports and Consolidated Ordering System        Pharma
CDC Prescription   CDC Drug Prescription                                     Pharma
PDMP               Prescription Drug Monitoring Program                      Pharma
PDAPS              Prescription Drug Abuse Policy System                     Policy
NAMSDL             National Alliance for Model State Drug Laws               Policy
Data synthesis follows a 3-step process for each dataset: (1) data description (dictionary, size, and time coverage), (2) data linkages, and (3) scholarly metadata (relevant publications). For each dataset, we searched for a data download link and dictionary, which provides valuable information about data content and format. For some datasets, the dictionary URLs were not available. As a result, we provide this information only for 10 of 20 datasets in this study. Size was determined as the number of records based on the most recent year: (1) Small: fewer than 10,000, (2) Medium: at least 10,000 but fewer than 1,000,000, (3) Large: 1,000,000 or greater. Time coverage provides information on the year when the dataset became available and the most recent data available for download. Several data attributes are used to identify data linkages: geographical units (e.g., state, county) and standard crosswalks (e.g., the North American Industry Classification System or NAICS, Drug Name). Finally, we identified three recent publications that use a dataset to illustrate research results derived from that data. In total, 16 variables exist for each dataset: common abbreviation, full name, data description, dataset category, source URL (some missing data), dictionary URL, the number of records per year (most recent), size, time coverage (year start and year end), geo units, crosswalks, and three publications.
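The size rule stated above can be written as a small function. This is a sketch only: the function name is hypothetical, and the boundary at exactly 1,000,000 records is assigned to "Large" because the text defines that class as "1,000,000 or greater".

```python
def size_category(records: int) -> str:
    """Map the record count of the most recent year to a size class."""
    if records < 10_000:
        return "Small"
    if records < 1_000_000:
        return "Medium"
    return "Large"


# Boundary values only; these are not record counts of any real dataset.
for count in (9_999, 10_000, 999_999, 1_000_000):
    print(count, size_category(count))
```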
Network visualizations are widely used to capture the relationships between entities (e.g., co-authorship networks or gene-disease networks). They display entities (nodes) and their relationships (edges) in layouts that showcase overall connectivity structure and clusters while avoiding edge crossings. Networks can be extracted from tabular data; e.g., a co-author network can be extracted from a tabulation of papers and the set of authors per paper: co-author links connect all authors that appear in a paper together, creating an undirected weighted network [20]. In addition, each node and edge can be color- or size-coded to visualize additional attributes (e.g., number of papers, number of citations, year of first publication, topic).

To compute a visualization of the 20 datasets, we first converted the csv file with all 20 datasets (rows) and 16 attributes (columns) into two separate files, namely a nodelist and an edgelist. The nodelist has an additional numeric identifier for each dataset that is used in the edgelist to describe how datasets are linked. For instance, the ID for the CDC Mortality dataset is ‘0’ and the ID for TEDS admission is ‘1’ (see Figure 2). These two datasets share the same attribute ‘State.’ Thus, we can build their linkage from CDC Mortality (source) to TEDS admission (target) and vice versa, since the network is undirected. The resulting network has 20 nodes of four categories and 117 edges of xx different types. We used the Force Atlas 2 algorithm in Gephi [21] to lay out the network in a 2-dimensional space in a manner that minimizes edge crossings and stress, i.e., interlinked nodes are in close proximity (see Figure 3). Datasets are color-coded to visually render 4 themes: prescription, harms, jobs, and policy. The workflow for creating this network in Gephi is available at GitHub (https://github.com/obscrivn/datasets).

Figure 2: Nodelist (partial; only 7 of 16 attributes are shown) (top) and Edgelist (bottom)

Figure 3: Network representation of the 20 datasets with policy data (in light blue), pharmaceutical data (dark blue), opioid data (red), and jobs (hiring/training) data (orange). Circle size corresponds to the size of the dataset. Edge color denotes the type of linkage.
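The conversion from a dataset table to a nodelist and an edgelist can be sketched as follows. The snippet is an illustration under stated assumptions, not the workflow from the GitHub repository: the four rows are a toy subset, the field names (`label`, `links`, `type`) are hypothetical, only linkage attributes mentioned in the text are listed, and one undirected edge is created per shared attribute.

```python
from itertools import combinations

# Toy subset of the dataset table; attribute sets are limited to what the text mentions.
rows = [
    {"label": "CDC Mortality", "links": {"State"}},
    {"label": "TEDS-A", "links": {"State"}},
    {"label": "ARCOS", "links": {"State", "Drug Name"}},
    {"label": "MEPS", "links": {"Drug Name"}},
]


def build_network(rows):
    """Return (nodelist, edgelist); each dataset's row index serves as its numeric ID."""
    nodelist = [{"id": i, "label": row["label"]} for i, row in enumerate(rows)]
    edgelist = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        # One undirected edge for every linkage attribute the two datasets share.
        for attribute in sorted(a["links"] & b["links"]):
            edgelist.append({"source": i, "target": j, "type": attribute})
    return nodelist, edgelist


nodelist, edgelist = build_network(rows)
for edge in edgelist:
    print(edge)
```

Because the network is undirected, each pair is listed once with the lower ID as source; a tool such as Gephi can then treat the edge as bidirectional.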
The interactive visualization is created using the JavaScript GEXF viewer package [22]. The network is exported from Gephi into the GEXF format (.gexf), an XML-based format suitable for JavaScript (js) interactive visualization frameworks. Then, the gexf.js code is updated and uploaded to GitHub. The interactive solution is available at https://obscrivn.github.io/datasets/ and it supports search, filter, and details on demand [23], as illustrated in Figure 4.
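To give a sense of what a .gexf file contains, the following sketch writes a minimal GEXF 1.2 document with Python's standard library. It is not the file produced by Gephi, which also stores layout positions, colors, and sizes; the function name and the two-node example are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def to_gexf(nodes, edges) -> str:
    """Serialize (id, label) nodes and (source, target, label) edges as GEXF 1.2."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes:
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for edge_id, (source, target, label) in enumerate(edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(edge_id), "source": str(source), "target": str(target), "label": label},
        )
    return ET.tostring(gexf, encoding="unicode")


# The two IDs follow the example in the text: CDC Mortality is 0, TEDS admission is 1.
document = to_gexf(
    nodes=[(0, "CDC Mortality"), (1, "TEDS-A")],
    edges=[(0, 1, "State")],
)
print(document)
```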
4. Results and Discussion
The visualization makes it possible to interactively explore key metadata and data interlinkages. Using the online site, users can explore and navigate each dataset by clicking on the node, examining the linked datasets, reviewing the data dictionary, and getting familiar with recent publications using the selected dataset. The collection of relevant scholarly articles helps researchers become familiar with case studies that use these datasets. Figure 4 zooms into the ARCOS dataset. The dark blue color in the legend specifies that this dataset belongs to the pharmaceutical category. By clicking on the ARCOS node, the attribute menu is shown on the left. The dictionary and dataset attributes provide direct links for reviewing the data dictionary and downloading data. From the data description, a potential user learns that this dataset provides information on drug sales and distribution. To provide relevant information about data usage, the visualization shows three recent scholarly publications using the ARCOS dataset. For instance, the study in [24] presents new evidence that changes in house prices near drug dispensaries are negatively correlated with drug quantities. A user might then check the ACS dataset, which is the American Community Survey about households. Next, the researcher can view time coverage and size for ARCOS: it ranges from 2009 to 2019 and the size of the dataset is medium. In addition, the menu specifies to which datasets ARCOS can be linked. For example, CDC Mortality and TEDS-admission share the ‘State’ attribute, whereas Open Payments, MEPS, and CDC Prescription share the ‘Drug Name’ attribute. The researcher can explore various hypotheses based on these potential links; for instance: Do states with high hospital admission rates and high prescription rates also show evidence of large payments to medical practitioners and some negative changes in households?

Figure 4: Interactive network visualization with legend in top left explaining color and size coding; details on demand in lower left; interactive network layout on right.
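The "linked datasets" part of the details-on-demand menu amounts to a lookup over the edgelist. The sketch below groups the neighbors of a selected dataset by the shared attribute; the edge tuples restate the ARCOS links named in the text, while the function name and the tuple layout are assumptions rather than the viewer's actual code.

```python
from collections import defaultdict

# (dataset, dataset, shared attribute); undirected, so the order within a pair is arbitrary.
EDGES = [
    ("ARCOS", "CDC Mortality", "State"),
    ("ARCOS", "TEDS-A", "State"),
    ("ARCOS", "Open Payments", "Drug Name"),
    ("ARCOS", "MEPS", "Drug Name"),
    ("ARCOS", "CDC Prescription", "Drug Name"),
    ("CDC Mortality", "TEDS-A", "State"),
]


def linked_datasets(selected, edges):
    """Group the neighbors of `selected` by the attribute they share with it."""
    grouped = defaultdict(list)
    for source, target, attribute in edges:
        if source == selected:
            grouped[attribute].append(target)
        elif target == selected:
            grouped[attribute].append(source)
    return dict(grouped)


print(linked_datasets("ARCOS", EDGES))
```

A researcher could use such a grouping to decide which key (state or drug name) to join on before testing a hypothesis like the one above.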
5. Conclusion
One key priority laid out by HHS for combating the opioid crisis is better access to data and the encouragement of data-driven (policy) decision making. To assist researchers and policymakers navigating through existing datasets, we developed a dataset and visualization that makes it possible to explore important characteristics and interlinkages of 20 widely used, publicly available datasets. Going forward, we plan to apply the same methodology to individual-level linked data and non-public resources. A current limitation of the presented work is the fact that the datasets are not updated as new data becomes available. In future research, we will perform regular updates of the datasets and their interlinkages. Another important area for future work is conducting user studies to identify how to best improve the visualization for different stakeholder groups and what additional datasets should be added.
References

[1] F. Ahmad, L. Rossen, P. Sutton, Provisional drug overdose death counts, 2020.
[2] A. Alter, C. Yeager, COVID-19 impact on US national overdose crisis, Technical Report, ODMAP, 2020. URL: http://odmap.org/Content/docs/news/2020/ODMAP-Report-May-2020.pdf.
[3] T. E. Price, Secretary Price announces HHS strategy for fighting Opioid crisis, 2017.
[4] G. M. Weber, K. D. Mandl, I. S. Kohane, Finding the missing link for big biomedical data, JAMA 311 (2014).
[5] R. Smart, C. A. Kase, A. Meyer, B. Stein, Data sources and data-linking strategies to support research to address the Opioid crisis, Technical Report September, U.S. Department of Health and Human Services, 2018. URL: https://aspe.hhs.gov/system/files/pdf/259641/OpioidDataLinkage.pdf.
[6] R. Smart, C. A. Kase, E. A. Taylor, S. Lumsden, S. R. Smith, B. D. Stein, Strengths and weaknesses of existing data sources to support research to address the opioids crisis, Preventive Medicine Reports 17 (2020).
[7] B. Saloner, H. Y. Chang, N. Krawczyk, L. Ferris, M. Eisenberg, T. Richards, K. Lemke, K. E. Schneider, M. Baier, J. P. Weiner, Predictive modeling of Opioid overdose using linked statewide medical and criminal justice data, JAMA Psychiatry 77 (2020) 1155–62.
[8] G. Poitras, The prescription opioid epidemic: an update, Medicolegal and Bioethics 8 (2018) 21–32. URL: http://dx.doi.org/10.2147/MB.S170220.
[9] O. Scrivner, T. Nguyen, K. Simon, E. Middaugh, B. Taska, K. Börner, Job postings in the substance use disorder treatment related sector during the first five years of Medicaid expansion, PLOS ONE 15 (2020) e0228394. URL: https://dx.plos.org/10.1371/journal.pone.0228394.
[10] CDC, U.S. Opioid dispensing rate maps, 2019.
[11] S. M. Skillman, C. R. Snyder, B. K. Frogner, D. G. Patterson, The behavioral health workforce needed for integration with primary care: Information for health workforce planning, Technical Report, Center for Health Workforce Studies, University of Washington, 2016.
[12] P. Leece, T. Khorasheh, N. Paul, S. Keller-Olaman, S. Massarella, J. Caldwell, M. Parkinson, C. Strike, S. Taha, G. Penney, R. Henderson, H. Manson, ‘Communities are attempting to tackle the crisis’: A scoping review on community plans to prevent and reduce opioid-related harms, BMJ Open 9 (2019) e028583. URL: http://bmjopen.bmj.com/.
[13] J. C. Maclean, J. Mallatt, C. J. Ruhm, K. Simon, Economic studies on the Opioid crisis: A review, NBER Working Papers (2020).
[14] N. Shlomo, Overview of data linkage methods for policy design and evaluation, in: N. Crato, P. Paruolo (Eds.), Data-Driven Policy Impact Evaluation, Springer International Publishing, 2018, pp. 47–65. URL: https://doi.org/10.1007/978-3-319-78461-8_4.
[15] P. Neish, Linked data: what is it and why should you care?, The Australian Library Journal 64 (2015) 3–10.
[16] S. Baranzini, S. Bandyopadhyay, M. Keiser, Scalable Precision Medicine Knowledge Engine, 2021. URL: https://spoke.ucsf.edu/.
[17] Springer Nature, SN SciGraph. A Linked Open Data platform for the scholarly domain, 2020.
[18] H. Arksey, L. O’Malley, Scoping studies: Towards a methodological framework, International Journal of Social Research Methodology: Theory and Practice 8 (2005) 19–32.
[19] O. Stephens, Clustering in depth. Methods and theory behind the clustering functionality in OpenRefine, 2018. URL: https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth.
[20] F. Emmert-Streib, S. Tripathi, O. Yli-Harja, M. Dehmer, Understanding the World Economy in Terms of Networks: A Survey of Data-Based Network Science Approaches on Economic Networks, Frontiers in Applied Mathematics and Statistics 4 (2018) 37.
[21] M. Bastian, S. Heymann, M. Jacomy, Gephi: an open source software for exploring and manipulating networks, in: E. Adar, M. Hurst, T. Finin, N. Glance, N. Nicolov, B. Tseng (Eds.), Proceedings of the Third International Conference on Weblogs and Social Media, The AAAI Press, Menlo Park, California, 2009, pp. 361–2. URL: https://gephi.org/users/publications/.
[22] R. Velt, A JavaScript GEXF viewer, 2019. URL: https://github.com/raphv/gexf-js.
[23] B. Shneiderman, The eyes have it: a task by data type taxonomy for information visualizations, in: Proceedings 1996 IEEE Symposium on Visual Languages, IEEE Comput. Soc. Press, 1996, pp. 336–43. URL: http://ieeexplore.ieee.org/document/545307/.
[24] W. D’Lima, M. Thibodeau, Health Crisis and Housing Market Effects - Evidence from the U.S. Opioid Epidemic, SSRN (2019). URL: https://papers.ssrn.com/abstract=3456404