Publication


Featured research published by Stephen Bent.


Journal of General Internal Medicine | 2008

Herbal medicine in the United States: review of efficacy, safety, and regulation: grand rounds at University of California, San Francisco Medical Center.

Stephen Bent

Introduction: Herbal products have gained increasing popularity in the last decade, and are now used by approximately 20% of the population. Herbal products are complex mixtures of organic chemicals that may come from any raw or processed part of a plant, including leaves, stems, flowers, roots, and seeds. Under the current law, herbs are defined as dietary supplements, and manufacturers can therefore produce, sell, and market herbs without first demonstrating safety and efficacy, as is required for pharmaceutical drugs. Although herbs are often perceived as “natural” and therefore safe, many different side effects have been reported owing to active ingredients, contaminants, or interactions with drugs. Results: Unfortunately, there is limited scientific evidence to establish the safety and efficacy of most herbal products. Of the top 10 herbs, 5 (ginkgo, garlic, St. John’s wort, soy, and kava) have scientific evidence suggesting efficacy, but concerns over safety and a consideration of other medical therapies may temper the decision to use these products. Conclusions: Herbal products are not likely to become an important alternative to standard medical therapies unless there are changes to the regulation, standardization, and funding for research of these products.


Annals of Internal Medicine | 2004

Systematic Review: Computed Tomography and Ultrasonography To Detect Acute Appendicitis in Adults and Adolescents

Teruhiko Terasawa; C. Craig Blackmore; Stephen Bent; R. Jeffrey Kohlwes

Context Is computed tomography or ultrasonography better for diagnosing acute appendicitis? Contribution This meta-analysis summarized data from 22 prospective studies that compared results of computed tomography, ultrasonography, or both with surgical findings or clinical follow-up in patients with suspected appendicitis. Computed tomography findings (positive likelihood ratio, 13.3 [95% CI, 9.9 to 17.9]) increased the certainty of diagnosis more than did ultrasonography (positive likelihood ratio, 5.8 [CI, 3.5 to 9.5]). Cautions All studies had significant limitations that probably inflated estimates of diagnostic accuracy, such as inadequate blinding of the reference standard and pathologic verification of disease only in patients with positive test results. The Editors Acute appendicitis is one of the most common acute surgical conditions in the United States; 250000 appendectomies are performed each year (1). Although early clinical evaluation and surgical intervention are mandatory, conventional diagnostic approaches such as history taking, physical examination, and routine laboratory tests are not always accurate (2, 3). Therefore, imaging tests are commonly used to improve diagnostic accuracy (4). In addition, compared with inpatient observation and serial basic laboratory tests, appendiceal imaging may be relatively inexpensive (5). During the past decade, appendiceal computed tomography and graded compression ultrasonography (6) have gained widespread use (4). Appropriateness criteria prepared by the American College of Radiology recommended graded compression ultrasonography as a screening test for most patients with suspected appendicitis (7). These criteria also recommended that computed tomography be used only in patients who are obese; have a rigid, noncompressible abdomen; or are thought to have appendicitis complicated by abscess (7). 
Published review articles have also supported the initial use of ultrasonography and have advised that computed tomography be reserved for patients with inconclusive sonogram findings (4, 8). However, in a survey of practicing emergency radiologists, the selection of appendiceal imaging varied, and there was no consensus about the best imaging approach for patients with a typical or atypical presentation of appendicitis (9). Most published studies of appendiceal imaging have evaluated the diagnostic accuracy of a single method, either computed tomography or ultrasonography; only a few studies have directly compared the 2 tests (10-13). In addition, no recent systematic review or meta-analysis has critically appraised currently available data on appendiceal imaging. Although 1 meta-analysis on appendiceal ultrasonography was conducted in 1994 (14), most of the included studies evaluated both children and adults, and the meta-analysis did not explore methodologic quality. We conducted a meta-analysis of the diagnostic accuracy of appendiceal computed tomography and ultrasonography in adults and adolescents. We explored the following 2 questions: 1) What is the diagnostic accuracy of computed tomography and ultrasonography? 2) What are the strengths and limitations of the current literature? Methods Study Identification We searched MEDLINE and EMBASE for English- and non-English-language literature published from January 1966 through December 2003. The detailed search strategy can be found in the Appendix. We also manually searched the reference lists of eligible studies, review articles, and textbooks and consulted with experts in diagnostic imaging. Study Selection Two of the authors reviewed the pertinent studies to determine eligibility. 
We included only studies that prospectively evaluated computed tomography or graded compression ultrasonography in adults and adolescents (patients ≥14 years of age) who had suspected appendicitis, followed by surgical and pathologic confirmation or clinical follow-up. We expanded the original inclusion criterion from age 18 years or older to age 14 years or older in March 2002 after performing the pilot MEDLINE search, when it became clear that many studies included both adolescents and adults. The 6 computed tomography studies and 4 ultrasonography studies identified by using the original criterion of 18 years of age or older were examined separately in a sensitivity analysis. Data Abstraction Two independent reviewers abstracted relevant data for English-language articles. For non-English-language articles, data were abstracted by a single reviewer working with a physician who was a native speaker of the relevant language. Abstractors were not blinded to journals. On the basis of clinical presentation before the imaging test, we categorized studies into 2 groups: atypical, which referred to studies enrolling only patients with an atypical presentation for appendicitis, and suspected, which referred to studies enrolling patients with both typical and atypical presentations. For 1 computed tomography study (13), we abstracted the combined test results using 3 different computed tomography protocols because this was how the authors published the data. Inconsistencies between reviewers were resolved by discussion, and a third reviewer adjudicated unresolved disagreements. When we could not extract or appropriately analyze pertinent data from published articles, we contacted a corresponding author for clarification. Assessment of Study Quality and Applicability We assessed study quality and applicability by using the checklist prepared for the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests (15). 
Since we included studies that had a combined reference standard of surgical and pathologic confirmation or medical follow-up, we also abstracted how each study obtained medical follow-up on patients who did not proceed to surgery. If there was no explicit description of how such patients were followed after leaving the emergency department, we considered the follow-up inadequate. Data Synthesis and Analysis For each study, we constructed a 2 × 2 contingency table consisting of true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN) results according to an imaging test and reference standard (surgery or clinical follow-up). We then calculated sensitivity as TP/(TP + FN), specificity as TN/(FP + TN), the likelihood ratio for a positive test result as (TP/[TP + FN])/(FP/[FP + TN]), and the likelihood ratio for a negative test result as (FN/[TP + FN])/(TN/[FP + TN]). We explored the heterogeneity of sensitivity and specificity between studies by comparing confidence intervals of individual study findings with the summary estimates, using forest plots (16). For likelihood ratios, we estimated the statistics Q and I² as means of quantifying heterogeneity among studies, and we considered the studies heterogeneous if the I² was more than 30% (17). The summary sensitivity and specificity were calculated as follows, respectively, regardless of heterogeneity: the sum of TPs/(TPs + FNs) and the sum of TNs/(FPs + TNs) (18). We calculated the summary likelihood ratio statistics using the Mantel-Haenszel fixed-effects model for computed tomography studies because there was no statistical evidence of heterogeneity (positive likelihood ratio: Q = 9.16 [P > 0.2], I² = 0% [CI, 0% to 58%]; negative likelihood ratio: Q = 7.38 [P > 0.2], I² = 0% [CI, 0% to 58%]) (19). 
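The per-study formulas above, and the simple pooled sensitivity and specificity, can be sketched in Python. This is only an illustration under stated assumptions: the class name `TwoByTwo`, the helper names, and all counts are hypothetical and are not data from the review.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TwoByTwo:
    """Hypothetical container for one study's 2 x 2 table."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    def lr_positive(self) -> float:
        # (TP/[TP + FN]) / (FP/[FP + TN]); undefined when FP == 0
        return self.sensitivity() / (self.fp / (self.fp + self.tn))

    def lr_negative(self) -> float:
        # (FN/[TP + FN]) / (TN/[FP + TN]); undefined when TN == 0
        return (self.fn / (self.tp + self.fn)) / self.specificity()


def pooled_sensitivity(tables: Iterable[TwoByTwo]) -> float:
    """Sum of TPs divided by the sum of (TPs + FNs) across studies."""
    tables = list(tables)
    return sum(t.tp for t in tables) / sum(t.tp + t.fn for t in tables)


def pooled_specificity(tables: Iterable[TwoByTwo]) -> float:
    """Sum of TNs divided by the sum of (FPs + TNs) across studies."""
    tables = list(tables)
    return sum(t.tn for t in tables) / sum(t.fp + t.tn for t in tables)


# Made-up counts, for illustration only.
study_a = TwoByTwo(tp=90, fp=5, fn=10, tn=95)
study_b = TwoByTwo(tp=40, fp=10, fn=10, tn=40)
```

A study with no false-positive results would make the positive likelihood ratio undefined in this sketch; how the authors handled such cells is not stated in the excerpt.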
We used the DerSimonian-Laird random-effects model to combine the likelihood ratios for ultrasonography studies since statistical evidence of heterogeneity was suggested (positive likelihood ratio: Q = 86.33 [P < 0.001], I² = 85% [CI, 76% to 90%]; negative likelihood ratio: Q = 52.75 [P < 0.001], I² = 75% [CI, 59% to 85%]) (20). Sensitivity Analysis Three computed tomography studies (10, 21, 22) and 3 ultrasonography studies (10, 11, 23) reported nondiagnostic results of imaging tests, that is, cases in which interpreters could not judge whether test results were positive or negative. Nondiagnostic results, if any, were excluded from the calculation of statistics in the main analysis. In a sensitivity analysis, we estimated sensitivity and specificity considering the numbers of nondiagnostic studies as false-negative and false-positive results, respectively, to evaluate a worst-case scenario. Also, we performed preplanned subgroup analyses for studies of adult participants (≥18 years of age), studies with different patient presentations at enrollment (suspected vs. atypical), and studies that included a high percentage of women (>67%). Role of the Funding Sources The funding sources had no role in study design, conduct, data collection, data analysis, data interpretation, or reporting or in the decision to submit the manuscript for publication. Data Synthesis The MEDLINE search identified 316 potentially relevant articles (Figure 1). We excluded 199 studies by scanning the titles and abstracts. We then retrieved and reviewed 117 full reports for inclusion and excluded 97 studies: 80 studies because they enrolled patients younger than 14 years of age, 10 studies because they were retrospective, 5 studies because they were case-control studies or case series, and 2 studies for other reasons. Subsequently, the EMBASE search identified 81 potentially relevant articles, 63 of which were excluded after we scanned the titles and abstracts. 
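The I² values quoted in the Methods can be related to Q with the standard definition I² = max(0, (Q − df)/Q), where df is the number of studies minus 1. The sketch below assumes that all 12 computed tomography studies and all 14 ultrasonography studies (the counts given under Study Characteristics) entered the pooled likelihood ratios; the excerpt does not state this explicitly, and the function name is hypothetical.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Return I² as a percentage from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Q values as reported in the text; study counts are an assumption (see above).
us_positive_lr = i_squared(86.33, n_studies=14)
us_negative_lr = i_squared(52.75, n_studies=14)
ct_positive_lr = i_squared(9.16, n_studies=12)
```

Under that assumption, the results round to the reported 85%, 75%, and 0%.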
We then retrieved and reviewed 18 full reports for inclusion and excluded 16 studies: 10 studies enrolling patients younger than 14 years of age, 3 retrospective studies, 1 case series, 1 review article, and 1 study from which pertinent data could not be obtained because it evaluated only the combined diagnostic accuracy of both computed tomography and preceding clinical examinations. Lists of the excluded articles can be found in the Appendix. Figure 1. Article selection process. Study Characteristics We identified 12 studies of computed tomography (10-13, 21, 22, 24-29) and 14 studies of ultrasonography (10-13, 23, 30


Annals of Internal Medicine | 2003

Test characteristics of α-fetoprotein for detecting hepatocellular carcinoma in patients with hepatitis C: A systematic review and critical analysis

Samir Gupta; Stephen Bent; Jeffrey Kohlwes

People with hepatitis C virus (HCV) have a 2% annual risk and a 7% to 14% five-year risk for hepatocellular carcinoma (1-3), a tumor with an estimated median survival duration of 4.3 to 20 months after diagnosis (4-7). Some studies suggest a possible survival advantage when small tumors are detected (8, 9), but no randomized, controlled trials of screening for hepatocellular carcinoma in patients with HCV have been conducted. Although the National Cancer Institute currently recommends against screening for hepatocellular carcinoma (10), many physicians currently screen high-risk populations with various strategies, including serum α-fetoprotein (AFP), ultrasonography, and computed tomography (11). The use of AFP, a tumor marker variably secreted by hepatocellular carcinomas, to detect these tumors has been widely debated (12-14). Many conclude that AFP is not a useful diagnostic test (12, 15), but AFP continues to be commonly used (11). To determine a summary estimate of the test characteristics of AFP for detecting hepatocellular carcinoma in patients with HCV, we conducted a systematic review. Methods Study Search Protocol We performed a MEDLINE search from 1966 through December 2002 for English- and non-English-language articles using the following search terms: hepatitis C, hepatocellular carcinoma, screening, diagnosis, α-fetoprotein, sensitivity, and specificity. Bibliographies of all reviewed articles were searched to identify additional relevant titles. Titles that mentioned hepatocellular carcinoma or HCV and screening were identified for abstract review. Abstracts that described the use of AFP as a diagnostic or screening test for hepatocellular carcinoma were marked for full article review. Inclusion and Exclusion Criteria Study designs accepted for analysis included randomized, controlled trials, cohort studies, or case-control studies that used AFP to detect hepatocellular carcinoma in HCV-infected patients with or without cirrhosis. 
We required that the authors report sensitivity and specificity for the use of serum AFP (or data sufficient to calculate these test characteristics) and that they identify some gold standard for diagnosis. Computed tomography, magnetic resonance imaging, histopathology, and disease-free time greater than 2 years were considered adequate gold standards. Ultrasonography was not considered an adequate gold standard because its sensitivity for hepatocellular carcinoma is controversial (12, 14, 16, 17). Studies were excluded from analysis if the cause of viral hepatitis was unclear, if at least 50% of the study patients did not have HCV, and if the same data were presented in a separate article by the same investigators. Data abstracted were study design, cause of hepatitis, whether the AFP test was used for diagnosis or screening, type of gold standards used, percentage of the study sample with cirrhosis, and reported sensitivity and specificity of AFP for detecting hepatocellular carcinoma. Analysis To grade the quality of evidence for use of serum AFP as a screening test for hepatocellular carcinoma in patients with HCV, we independently determined study design, whether application of the gold standard for each study was blinded to AFP result, whether the patient selection was independent, the type of gold standard implemented, and presence of partial verification bias. Disparity in grade was resolved by discussion and consensus among all three authors. Results A total of 1239 titles were identified, 55 relevant abstracts were reviewed, and 18 articles were identified as potentially relevant. Five studies met all inclusion criteria and were included in the analysis (15, 18-21). 
Of the 18 potentially relevant articles, 5 were excluded because they were uncontrolled case series (22-26), 6 were excluded because most study patients did not have HCV (27-32), 1 was excluded because it did not identify the cause of hepatitis in all study patients (33), and 6 were excluded because they did not provide both sensitivity and specificity data for AFP in their study sample (8, 34-38). Characteristics of the 5 studies meeting all inclusion criteria and no exclusion criteria are shown in Table 1 (15, 18-21). Table 1. Characteristics of Included Studies Two studies were prospective cohort studies (15, 18) and 3 were case-control studies (19-21). One study (15) universally applied an acceptable gold standard test to both case-patients and controls, and each study used a different gold standard. Table 2 presents abstracted sensitivity and specificity data and abstracted or calculated positive and negative likelihood ratios for the diagnosis of hepatocellular carcinoma at an AFP cutoff value of 20 µg/L. This cutoff value was chosen because each included article provided data for a cutoff value of 20 µg/L, and an AFP level of 20 µg/L is considered a level that prompts further testing (19). Other cutoff values were not reported in every article. Exclusive data for patients with HCV were available for all studies but one (18); for the latter study, only combined data for patients with HCV and hepatitis B virus were available. Table 2. Abstracted Test Characteristics of α-Fetoprotein Levels Higher than 20 µg/L for Detecting Hepatocellular Carcinoma Sensitivity of AFP levels higher than 20 µg/L ranged from 41% to 65%, while specificity ranged from 80% to 94%. Positive likelihood ratios for AFP levels higher than 20 µg/L ranged from 3.1 to 6.8 and negative likelihood ratios ranged from 0.4 to 0.6. 
Table 3 shows the sensitivity and specificity data for an AFP cutoff value higher than 200 µg/L, a value that is frequently reported to be specific for the diagnosis of hepatocellular carcinoma (19, 21). Four of the 5 studies reported sensitivity and specificity for this cutoff value. The range of specificities was very high at this cutoff value (99% to 100%), but the sensitivity was very low (20% to 45%). Table 3. Abstracted Test Characteristics for α-Fetoprotein Levels Higher than 200 µg/L for Detecting Hepatocellular Carcinoma Discussion Our systematic review of the literature shows that the quality of evidence describing the characteristics of AFP as a diagnostic test for hepatocellular carcinoma in patients with HCV is limited. Three of the reviewed studies were case-control studies (19-21), which potentially overestimate the sensitivity and specificity of the test in question (39, 40). In contrast, cohort studies are less susceptible to bias because they are more likely to include patients with a varying spectrum of disease, particularly those patients who present more subtly, and therefore more closely reflect the manner in which a test will be implemented in clinical practice (39). Two studies (15, 20) may have partial verification bias, which occurs when the result of the test being evaluated (in this case, AFP or ultrasonography) influences the decision to administer the gold standard test (39, 40). This may falsely elevate sensitivity and specificity (39). Four of five studies (18-21) applied a gold standard of uncertain validity to both case-patients and controls, resulting in a possible underestimation of disease prevalence and an unknown ultimate effect on sensitivity and specificity. Blinding was not reported in four studies (15, 19-21) and may have affected interpretation of gold standard test results. 
Without systematic blinding, investigators may be more vigilant in applying gold standards to those patients with positive test results and thereby falsely elevate specificity (39, 40). Finally, four studies (15, 18, 20, 21) included patients with and without cirrhosis. Patients with cirrhosis have a higher risk for cancer (41) but commonly have elevated levels of AFP thought to be unrelated to hepatocellular carcinoma (42), leading to an unknown effect on sensitivity and specificity. Notably, one excluded study reported sensitivity of 80% and specificity of 95% for AFP levels higher than 10 µg/L applied to a subgroup of patients with histologically severe liver injury (35). Given the significant concerns about the validity of the data generated by the studies reviewed, we could not calculate conclusive summary estimates of the sensitivity and specificity of AFP as a diagnostic test for hepatocellular carcinoma. The biases previously mentioned that affect the reported sensitivities and specificities tend to overestimate the utility of AFP as a diagnostic test, but to guide current interpretation of AFP in practice, we can consider the use of this test if these best-case estimates are true. The most common use of AFP is to screen for hepatocellular carcinoma in asymptomatic patients with HCV. In this scenario, reported prevalence data indicate a pretest probability of hepatocellular carcinoma in patients with HCV is 5% to 12% (41, 43). Using a prevalence of 5% with the range of positive likelihood ratios for an AFP level higher than 20 µg/L (3.1 to 6.8), results in a post-test probability of 14% to 25%, while an AFP level lower than 20 µg/L results in a post-test probability of 2% to 3%. Although a post-test probability of 25% would prompt further work-up with imaging, a post-test probability of 2% is unlikely to be reassuring enough to preclude the use of other screening strategies, including ultrasonography or computed tomography. 
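The post-test probabilities discussed above follow from converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. A minimal Python sketch of that arithmetic is shown here; the function name is hypothetical, and the inputs are the rounded prevalence and likelihood ratios quoted in the text.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


prevalence = 0.05  # lower bound of the reported 5% to 12% pretest probability

# AFP above the cutoff: positive likelihood ratios of 3.1 and 6.8
positive_low = post_test_probability(prevalence, 3.1)
positive_high = post_test_probability(prevalence, 6.8)

# AFP below the cutoff: negative likelihood ratios of 0.4 and 0.6
negative_low = post_test_probability(prevalence, 0.4)
negative_high = post_test_probability(prevalence, 0.6)
```

With these rounded inputs the lower positive bound comes to about 14% and the negative results to about 2% and 3%, as reported. The upper positive bound computes to about 26% rather than the reported 25%, a difference that presumably reflects rounding of the published likelihood ratio.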
The other common use of AFP involves the evaluation of patients presenting with one or more high-risk features, including a hepatic nodule (found incidentally or with a screening test) or decompensated liver failure. Data for AFP at higher cutoff values, such as an AFP level higher than 200 µg/L (Table 3), suggest that AFP, although not sensitive, can be highly specific for hepatocellular carcinoma. A low AFP level (<200 µg/L) would not be informative enough to stop further search for hepatocellular carcinoma, but an AFP level higher than 200 µg/L would strongly suggest that cancer is present, allowing for earlier counseling of a patient. In addition to the quality of articles review


Annals of Internal Medicine | 2014

Accuracy of Fecal Immunochemical Tests for Colorectal Cancer: Systematic Review and Meta-analysis

Jeffrey K. Lee; Elizabeth Liles; Stephen Bent; Theodore R. Levin; Douglas A. Corley

Colorectal cancer (CRC) is the second-leading cause of cancer-related deaths in the United States (1). Randomized, controlled trials have shown that annual or biennial fecal occult blood tests (FOBTs) are associated with a 15% to 33% decrease in CRC mortality rates (2-4). However, FOBTs only detect approximately 13% to 50% of cancer with 1 round of screening in asymptomatic patients (5, 6). In addition, adherence to repeated rounds of FOBTs in real-world screening programs is low, raising concern about their effectiveness as screening tests (7, 8). Fecal immunochemical tests (FITs) are more sensitive at detecting both CRC and adenomas than FOBTs (9, 10). Many FITs require only 1 or 2 stool samples, and none require dietary or medication restrictions, increasing ease of use. In 2008, several U.S. professional societies endorsed the use of FITs to replace FOBTs because of the former's improved performance characteristics and potential for higher participation rates (10, 11). Countries in Europe and Asia have also adopted widespread CRC screening programs using FITs (12, 13). However, the diagnostic characteristics of these tests have been difficult to estimate, with reported sensitivities ranging from 25% to 100% for CRC and specificities usually exceeding 90% (9, 14, 15). The lack of a precise estimate of sensitivity has resulted in confusion among health care providers about the sources of this variation, how best to apply FITs for CRC screening, the optimal number of stool samples for testing, optimal cutoff value for a positive test result, and whether any brand of FIT is superior to others. Our analysis provides a quantitative meta-analysis of the diagnostic accuracy (sensitivity and specificity) of FITs for CRC. In addition, we explored potential sources of heterogeneity by analyzing subgroups classified by FIT sample number, cutoff value for a positive test result, FIT brand, and the reference standard. 
Methods We developed a protocol on the basis of standard guidelines for the systematic review of diagnostic studies (16, 17) and the strategy used for the U.S. Preventive Services Task Force review in 2008 (9). We followed the STARD (Standards for the Reporting of Diagnostic Accuracy Studies) (18) and Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (19) statements for reporting our systematic review. This study was conducted as part of the National Cancer Institute-funded consortium, Population-Based Research Optimizing Screening through Personalized Regimens. The overall aim of this consortium is to conduct multisite, coordinated, transdisciplinary research to evaluate and improve cancer screening processes. Data Sources and Searches We included all studies identified in the previous USPSTF report (9) plus other studies identified by a search of FIT for CRC between 1 January 2008 and 31 August 2013 using MEDLINE (via Ovid), EMBASE, Database of Abstracts of Reviews of Effects, Health Technology Assessment Database, Cochrane Database of Systematic Reviews, and Cochrane Central Register of Controlled Trials. We also searched bibliographies and reference lists of eligible papers and related reviews, consulted experts in the field, and contacted several authors from the included studies to locate additional studies. The Appendix Table 1 provides further details of our search strategy. Appendix Table 1. Search Strategy Study Selection Two persons independently reviewed the pertinent studies to determine eligibility. 
We included studies if they met all of the following criteria: evaluated the diagnostic accuracy of FITs for CRC; reported absolute numbers of true-positive, false-negative, true-negative, and false-positive observations, or if these same variables could be obtained from personal communication; used a randomized trial or cohort study design; evaluated adult participants who were asymptomatic and older than 18 years with a mean age greater than 40 years; and reported an appropriate reference standard (colonoscopy or 2-year longitudinal follow-up of controls with medical records or cancer registry). Given that only a subset of studies reported data on adenomatous polyps and that there is variability in definitions of polyps, we limited the scope of this analysis to test performance characteristics for detecting CRC; we excluded studies reporting test performance estimates for detection of adenomas only. We did not include conference abstracts and case-control studies, which, by creating spectrum bias, can overestimate the accuracy of a test (20). To avoid duplicate reporting of the same population for studies reporting several cutoff values or numbers of samples, we used the cutoff value or sample number most commonly used in current practice in the United States, used in national recommendations, or recommended by expert opinion in the main analyses. In addition, we selected the sample number or cutoff value a priori that was most similar to those in other studies for our subgroup analyses. Data Extraction and Quality Assessment Two reviewers independently evaluated and extracted relevant information from each included study and assessed study quality via the Quality Assessment of Diagnostic Accuracy Studies 2 instrument (21). For studies with incomplete or unavailable information, we contacted the corresponding authors or coauthors to complete missing information. Of the 15 contacted authors, 12 provided additional data. 
We converted units for cutoff thresholds for a positive test result in each study to micrograms of hemoglobin per gram of stool, as recommended by leading experts (22). Data Synthesis and Analysis We calculated the sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio (LR), and negative LR with 95% CI of each study. A positive LR greater than 5 and a negative LR less than 0.2 provide strong diagnostic evidence to rule in or rule out diagnoses, respectively (23). The overall pooled sensitivity and specificity of FIT for CRC were estimated using a bivariate random-effects model (24). We calculated the pooled positive LR and negative LR along with the respective CI using the bivariate model (24) according to the method used by Zwinderman and colleagues (25). We also generated a hierarchical summary receiver-operating characteristic curve that plots the individual and summary estimates of sensitivities and specificities along with a 95% confidence and prediction region (26). Last, we calculated the area under the hierarchical summary receiver-operating characteristic curve. An area under the curve between 0.9 and 1.0 indicates that the test in question is highly accurate (27). The Q value and the inconsistency index (I²) test were used to estimate the heterogeneity between each study (28). We regarded values of 25%, 50%, and 75% for the I² as indicative of low, moderate, and high statistical heterogeneity, respectively (28). In addition, we calculated the between-study variance of logit sensitivity and logit specificity (24, 29). In diagnostic accuracy studies, 1 of the primary causes of heterogeneity is the threshold effect, which occurs when different cutoff values are used between studies to define a positive (or negative) test result. We searched for evidence of a threshold effect by calculating the squared correlation coefficient estimated from the between-study covariance variable in the bivariate model (30). 
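The per-study predictive values with 95% CIs mentioned above can be illustrated with a short Python sketch. The excerpt does not say which interval method was used, so the Wilson score interval here is an assumption, and the function names and counts are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def predictive_values(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """Return (estimate, lower, upper) for PPV and NPV from a 2 x 2 table."""
    ppv = tp / (tp + fp)  # positive tests that are true positives
    npv = tn / (tn + fn)  # negative tests that are true negatives
    return {
        "ppv": (ppv, *wilson_interval(tp, tp + fp)),
        "npv": (npv, *wilson_interval(tn, tn + fn)),
    }


# Made-up screening counts, for illustration only.
example = predictive_values(tp=45, fp=30, fn=5, tn=920)
```

Unlike sensitivity and specificity, predictive values depend on disease prevalence in the screened population, so values computed this way apply only to populations similar to the one that produced the counts.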
We stratified studies into 4 subgroups on the basis of the number of FIT samples (1, 2, or 3 samples), prespecified cutoff values of fecal hemoglobin concentration for a positive test result (<20 µg/g, 20 to 50 µg/g, and >50 µg/g), brand, and reference standard used to follow up on patients with negative FIT results. Cutoff values were grouped to ensure an adequate number of data sets for each analysis. To determine whether studies using older (discontinued) FITs were causing heterogeneity in our summary estimates, we did sensitivity analyses by removing these studies and recalculating the I² test for the remaining group. In addition to threshold effect and subgroup analyses, we did a bivariate random-effects meta-regression analysis to identify additional sources of heterogeneity that may have influenced our overall summary estimates (30). We used the following prespecified variables for our meta-regression: type of FIT (qualitative, point-of-care tests or quantitative, automated tests), geographic region (Asian or non-Asian countries), and enrollment of patients younger than 40 years. We used Stata, version 12.0 (StataCorp, College Station, Texas), for all statistical analyses. All tests were 2-sided, and we considered P values less than 0.05 to be statistically significant. Role of the Funding Source The study was funded by the National Institute of Diabetes and Digestive and Kidney Diseases and the National Cancer Institute. The funding source had no role in the conception, design, analysis, or conduct of the review. Results Study Selection The 2008 USPSTF report (9) included 9 studies in its systematic review (31-39); our literature search identified 1771 additional new potential sources (Figure 1). After abstract review, we identified 53 articles for full-text review; of these, 18 unique articles satisfied all inclusion criteria and were included in our analysis (14, 15, 31-46). 
Because 1 article (46) evaluated more than 1 FIT brand in a head-to-head comparison, the final analysis included 19 studies or data sets. Figure 1. Summary of evidence search and selection. USPSTF = U.S. Preventive Services Task Force. Characteristics of Included Studies Table 1 and the Supplement show the main characteristics of the included studies. Eighteen articles described 19 cohort studies of FIT sensitivity and specificity for CRC in average-risk asymptomatic patients; sample sizes ranged from 80 to 27860. Twelve studies (14, 33-36, 40-42, 44-46) used colonoscopy in all patients, regardless of FIT results, as the re


Annals of Internal Medicine | 2003

The Relative Safety of Ephedra Compared with Other Herbal Products

Stephen Bent; Thomas N. Tiedt; Michelle C. Odden; Michael G. Shlipak

Ephedra (ma huang) use is associated with a greatly increased risk for adverse reactions compared with other herbs, and its use should be restricted.


Annals of Internal Medicine | 2006

Brief Communication: Better Ways To Question Patients about Adverse Medical Events: A Randomized, Controlled Trial

Stephen Bent; Amy Padula; Andrew L. Avins

Context Investigators use diverse methods to assess the adverse events experienced by study participants. Contribution During a 1-month placebo run-in period of a clinical trial, this single-blind substudy randomly assigned 214 men with benign prostatic hyperplasia to 3 groups to test different methods of asking about recent medical problems. Men who completed a checklist about 53 common side effects reported many more problems than participants in the 2 groups that were given different formats of open-ended questions. For example, 77% of the checklist group reported 1 or more medical problems, compared with 13% and 14% of the open-ended groups. Implications Varying the assessment method can cause large differences in reported rates of adverse events. The Editors Currently, there is no standard method for identifying adverse events that occur during a clinical trial. Although regulatory agencies (such as the U.S. Food and Drug Administration) require that studies of new drugs report adverse events in a standard way, they do not specify a standard method for ascertaining these data (1). Consequently, how individual studies identify adverse events varies considerably. For example, early studies of nonsteroidal anti-inflammatory drug–induced gastric ulcers reported much lower frequencies of ulcers than more recent studies, mostly because researchers have recently made greater efforts to detect this side effect (2). The implications of this lack of consistent ascertainment methods are substantial; comparisons of rates of reported side effects from 2 or more drugs may not be valid if the methods of collecting adverse events differ. This could impair the ability of patients and physicians to compare the risk–benefit profile of drugs. We therefore conducted a randomized, controlled trial to determine whether different methods of identifying adverse events in a clinical trial would lead to different estimates of the frequency of these events. 
Methods Study Design The study protocol and all procedures were approved by the Committee on Human Research at the University of California, San Francisco. The study, which took place between April 2002 and April 2005, was a randomized, single-blind, controlled trial that assigned patients to 3 groups to test self-administered methods of assessing medical problems that they experienced while taking a placebo for 1 month. Participants We recruited participants from a larger study that was examining the safety and efficacy of the herb saw palmetto for treatment of benign prostatic hyperplasia (3). The trial, known as the STEP (Saw Palmetto Treatment for Enlarged Prostates) study, required that participants be 50 years of age or older, have moderate to severe symptoms of benign prostatic hyperplasia, and have no serious comorbid illness. All participants in the study gave informed consent; were told that they would be taking placebo at some point during the study; and were assigned to a single-blind, 1-month placebo run-in period. Randomization and Intervention After taking the placebo (referred to as the study medication) for 1 month, patients were randomly assigned to 3 methods of collecting adverse events. All patients were given 1 of 3 self-administered paper forms. The form given to the first group asked an open-ended question: Did you have any significant medical problem since the last study visit? The form given to the second group asked an open-ended question that was more defined: Since the last study visit, have you limited your usual daily activities for more than 1 day because of a medical problem? A checklist accompanied the form given to the third group, which asked a more pointed question: Since the last visit, have you experienced any of the following (checklist attached)? The checklist contained 53 symptoms, grouped by anatomical region. 
Two of the authors developed the checklist after conducting an unpublished review of checklists that were used in earlier clinical trials performed at the same institution. The checklist did not ask patients to rate the frequency or severity of symptoms and did not ask patients to make a judgment about whether their medical problem was caused by the study medication. Patients in the open-ended question groups who answered yes were asked to identify their medical problem, which was recorded by a study assistant on the same checklist used in the third group. Outcomes and Analysis The primary outcome measure was the difference in the proportion of patients reporting 1 or more adverse events in each group. All patients in the STEP study were included in the current study; therefore, the sample size was not calculated on the basis of the needs of this study. Participants were randomly assigned to the 3 groups in equal proportions by using a computer-generated, random allocation sequence that was prepared before the study began. Study personnel were blinded to the allocation sequence but were aware of group assignments after they were made. Patients were not informed of their group assignment. Persons performing the data analysis were blinded to group assignment. Baseline characteristics of the 3 intervention groups were compared by using analysis of variance for continuous variables and chi-square tests for categorical variables. We also used chi-square tests to compare the number and specific type of adverse events that occurred among groups. All analyses were performed by using Stata, version 8.0 (Stata Corp., College Station, Texas). Role of the Funding Sources The funding organizations had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; or the preparation, review, or approval of the manuscript. Results We randomly assigned 214 patients to 1 of 3 methods of collecting data on adverse effects. 
Patients were predominantly healthy, well-educated white men (mean age, 63 years) who were taking a mean of 2.5 medications (Table 1). Baseline characteristics of the patients were similar among the 3 groups. All patients completed the study and the outcome assessment (Figure). Table 1. Baseline Characteristics of Study Participants Figure. Flow diagram showing the distribution of participants at each stage of the study. The group that was assigned to a checklist method reported a significantly greater number of adverse events (238 events) than the first and second groups, which were asked open-ended questions (11 and 14 adverse events, respectively; P < 0.001) (Table 2). A much higher percentage of patients in the checklist group reported 1 or more adverse events (77%) compared with the patients asked each of the 2 different open-ended questions (14% for the first group and 13% for the second group; P < 0.001). For each of the 10 most commonly reported adverse events (Table 2), participants in the checklist group reported a greater number of adverse events (P < 0.001). No serious adverse events occurred during the study period. Table 2. 10 Most Frequently Reported Adverse Events Discussion In this randomized, controlled trial, we found that a checklist method of identifying adverse events dramatically increased the number of reported events compared with 2 types of open-ended questions. Although this finding is intuitive, the magnitude of effect has important implications both for the conduct of clinical trials and for assessment of the risk–benefit profile of drugs and other interventions. It is common practice for physicians and patients to select drugs and other interventions on the basis of their reported rate of side effects. However, if different drugs used for the same indication are examined in clinical trials that use different methods of identifying adverse events, then it is not valid to compare the reported rate of side effects. 
For example, the reported rates of sexual side effects from selective serotonin reuptake inhibitors range from 2% to 73%; much of this difference is probably attributable to different methods of adverse event collection (4). Similarly, a recent systematic review found that published trials of pharmacologic treatments for rheumatoid arthritis were much more likely to report data on harm than trials of nonpharmacologic treatment (5), highlighting the difficulty of comparing the safety of different treatments for the same condition. The 3 self-administered questions that we used to assess the frequency of adverse events in this study were, by design, limited in scope. The self-administered forms did not ask patients to describe the timing, severity, or frequency of their medical problems, nor did they ask participants or investigators to make a judgment of causality. Other techniques to assess adverse events, such as changes in vital signs, laboratory tests, physical examinations, or more detailed searches for expected adverse events, were not included. The purpose of this simplified approach was to isolate and contrast 3 different initial screening methods of identifying medical problems occurring among participants in a clinical trial. Because all patients in the current study were taking placebo, probably none of the reported adverse events were true side effects of the study medication but were merely symptoms that commonly occur in adults. For example, a previous survey of university students and hospital staff found that 81% of respondents reported some symptom within the past 3 days when prompted by a checklist (6). This highlights the problem that most study participants are likely to have a high prevalence of symptoms that are unrelated to a study drug or intervention, and a checklist method is therefore likely to have very low specificity for detecting true side effects. 
The wording of the 3 self-administered questions that we used in this study asked about 3 different thresholds of medical problems. One question asked participants if they experienced a significant medical problem, one asked if they limited their usual daily activities fo
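The chi-square comparison of proportions described in the analysis above can be sketched as follows. This is a minimal illustration, not the authors' Stata analysis: the per-group sizes are not given in this excerpt, so the counts below are HYPOTHETICAL (100 patients per group, chosen only so the proportions mirror the reported 77%, 14%, and 13%), and the function names are made up for the example.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom."""
    # With 2 degrees of freedom the survival function has the closed form exp(-x / 2).
    return math.exp(-stat / 2.0)


# HYPOTHETICAL counts: rows are groups, columns are
# [reported 1 or more events, reported none].
table = [
    [77, 23],  # checklist
    [14, 86],  # first open-ended question
    [13, 87],  # second open-ended question
]
stat, df = chi_square_independence(table)
p = p_value_df2(stat)
```

A 3 x 2 table has 2 degrees of freedom, which is why the closed-form p-value applies here; other table shapes would need a general chi-square distribution function.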


Clinical Gastroenterology and Hepatology | 2009

Gender as a Risk Factor for Advanced Neoplasia and Colorectal Cancer: A Systematic Review and Meta-analysis

Stephen P. Nguyen; Stephen Bent; Yea-Hung Chen; Jonathan P. Terdiman

BACKGROUND & AIMS Studies have reported higher rates of advanced colorectal neoplasia in men than in women. We performed a meta-analysis to provide a quantitative pooled risk estimate of the association between gender and advanced colorectal neoplasia. METHODS We conducted a systematic review to identify studies of average risk and asymptomatic individuals undergoing screening colonoscopy. We also included studies of subjects with a family history of colorectal neoplasia. We used random effects models to evaluate pooled relative risk estimates and performed heterogeneity and publication bias analyses. The primary outcome measure was relative risk of advanced neoplasia in men compared with women. A secondary outcome measure was relative risk for colorectal cancer. RESULTS Seventeen studies consisting of 18 different populations were included, comprising 924,932 men and women. The pooled relative risk estimate for advanced neoplasia for men compared with women was 1.83 (95% confidence interval [CI], 1.69-1.97). This positive association between gender and advanced neoplasia was significant across all age groups from 40 to older than 70 years. In 5 studies, the relative risk estimate for cancer for men compared with women was 2.02 (95% CI, 1.53-2.66). Significant heterogeneity was found for the overall analysis and for studies reporting on cancer but not for studies that excluded subjects with a family history or for those analyses grouped by age. CONCLUSIONS This meta-analysis provides strong evidence that men are at greater risk for advanced colorectal neoplasia across all age groups. This might inform decisions to create sex-specific colorectal cancer screening recommendations.
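To make the idea of a random-effects pooled relative risk concrete, here is a minimal sketch. The abstract does not say which estimator was used; the DerSimonian-Laird method below is an assumption, and the function name and study counts are hypothetical.

```python
import math


def pooled_relative_risk(studies):
    """Random-effects pooled RR with a 95% CI (DerSimonian-Laird, assumed here).

    studies: list of (events_men, n_men, events_women, n_women) tuples.
    """
    log_rr, var = [], []
    for a, n1, c, n2 in studies:
        log_rr.append(math.log((a / n1) / (c / n2)))
        # Large-sample variance of the log relative risk.
        var.append(1 / a - 1 / n1 + 1 / c - 1 / n2)
    w = [1 / v for v in var]
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(studies) - 1)) / c_term) if c_term > 0 else 0.0
    w_re = [1 / (v + tau2) for v in var]
    mu = sum(wi * y for wi, y in zip(w_re, log_rr)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(mu), math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se)


# HYPOTHETICAL counts of advanced neoplasia in men and women for three studies.
example_studies = [(60, 1000, 35, 1000), (45, 800, 20, 750), (120, 2500, 70, 2600)]
rr, ci_low, ci_high = pooled_relative_risk(example_studies)
```

Pooling is done on the log scale and exponentiated at the end, which keeps the confidence interval asymmetric around the relative risk, as in the intervals reported above.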


Journal of General Internal Medicine | 2009

Genetic Testing Before Anticoagulation? A Systematic Review of Pharmacogenetic Dosing of Warfarin

Kirsten Neudoerffer Kangelaris; Stephen Bent; Robert L. Nussbaum; David A. Garcia; Jeffrey A. Tice

BACKGROUND: Genotype-guided initial warfarin dosing may reduce over-anticoagulation and serious bleeding compared to a one-dose-fits-all dosing method. OBJECTIVE: The objective of this review was to investigate the safety and efficacy of genotype-guided dosing of warfarin in reducing the occurrence of serious bleeding events and over-anticoagulation. DATA SOURCES: The authors searched PubMed, EMBASE and International Pharmaceutical Abstracts through January 23, 2009, without language restrictions. Selected articles were randomized trials comparing pharmacogenetic dosing of warfarin versus a “standard” dose control algorithm in adult patients taking warfarin for the first time. REVIEW METHODS: Two reviewers independently extracted data and assessed study quality using a validated instrument. The primary outcomes were major bleeding and time spent within the therapeutic range International Normalized Ratio (INR). Secondary outcomes included minor bleeding, thrombotic events and other measures of anticoagulation quality. RESULTS: Three of 2,014 studies (423 patients) met the inclusion and exclusion criteria. Differences in study quality, dosing algorithms, length of follow-up and outcome measures limited meta-analysis. Summary estimates revealed no statistically significant difference in bleeding rates or time within the therapeutic range INR. The highest quality study found no significant difference in primary or secondary outcomes, although there was a trend towards more rapid achievement of a stable dose (14.1 vs. 19.6 days, p = 0.07) in the pharmacogenetic arm. CONCLUSIONS: We did not find sufficient evidence to support the use of pharmacogenetics to guide warfarin therapy. Additional clinical trials are needed to define the optimal approach to use warfarin pharmacogenetics in clinical practice.


Journal of General Internal Medicine | 2005

N-acetylcysteine for the prevention of contrast-induced nephropathy. A systematic review and meta-analysis.

Raymond W. Liu; Deepu Nair; Joachim H. Ix; Dan H. Moore; Stephen Bent

OBJECTIVE: Contrast-induced nephropathy is a common cause of acute renal failure in hospitalized patients. Although patients are often given N-acetylcysteine to prevent renal injury from contrast agents, there are no clear guidelines supporting its use. We conducted a systematic review to determine whether administering N-acetylcysteine around the time of contrast administration reduces the risk of contrast-induced nephropathy. DESIGN: We searched MEDLINE, EMBASE, the Cochrane Collaboration Database, bibliographies of retrieved articles, and abstracts of conference proceedings, and consulted with experts to identify relevant studies. Randomized controlled trials of N-acetylcysteine in hospitalized patients receiving contrast were included. Studies were excluded if they did not report change in creatinine or incidence of contrast-induced nephropathy at 48 hours. MEASUREMENTS AND MAIN RESULTS: Nine randomized controlled trials satisfied all inclusion criteria and were included in the analysis. The difference in mean change in creatinine between the N-acetylcysteine-treated group and controls was −0.27 mg/dl (95% confidence interval [CI], −0.43 to −0.11). The relative risk of developing contrast-induced nephropathy was 0.43 (95% CI, 0.24 to 0.75) in subjects randomized to N-acetylcysteine. Significant heterogeneity existed among studies, suggesting differences in patient populations or study methodology not identified by sensitivity analyses. The incidence of dialysis was rare (0.2%). CONCLUSIONS: Our findings suggest that N-acetylcysteine helps prevent declining renal function and contrast-induced nephropathy. While N-acetylcysteine is inexpensive and nontoxic, undeviating insistence for dosing at least 12 hours in advance of contrast exposure may delay diagnostic and therapeutic procedures. Future studies are needed to address the longer-term clinical outcomes and cost-effectiveness of this agent.


Journal of General Internal Medicine | 2005

Spontaneous bleeding associated with ginkgo biloba: a case report and systematic review of the literature.

Stephen Bent; Harley Goldberg; Amy Padula; Andrew L. Avins

BACKGROUND: Ginkgo biloba (ginkgo) is a herbal remedy used by over 2% of the adult population in the United States. Several review articles have suggested that ginkgo may increase the risk of bleeding. OBJECTIVE: To report a case of bleeding associated with using ginkgo, to systematically review the literature for similar case reports, and to evaluate whether using ginkgo is causally related to bleeding. DATA SOURCES: We searched MEDLINE, EMBASE, IBIDS, and the Cochrane Collaboration Database from 1966 to October 2004 with no language restrictions. REVIEW METHODS: Published case reports of bleeding events in persons using ginkgo were selected. Two reviewers independently abstracted a standard set of information to assess whether ginkgo caused the bleeding event. RESULTS: Fifteen published case reports described a temporal association between using ginkgo and a bleeding event. Most cases involved serious medical conditions, including 8 episodes of intracranial bleeding. However, 13 of the case reports identified other risk factors for bleeding. Only 6 reports clearly described that ginkgo was stopped and that bleeding did not recur. Bleeding times, measured in 3 reports, were elevated when patients were taking ginkgo. CONCLUSION: A structured assessment of published case reports suggests a possible causal association between using ginkgo and bleeding events. Given the widespread use of this herb and the serious nature of the reported events, further studies are needed. Patients using ginkgo, particularly those with known bleeding risks, should be counseled about a possible increase in bleeding risk.

Collaboration


Dive into Stephen Bent's collaboration.

Top Co-Authors


John Neuhaus

University of California
