Measurements of Partial Branching Fractions of Inclusive $B \to X_u \, \ell^+\, \nu_{\ell}$ Decays with Hadronic Tagging

Abstract

We present measurements of partial branching fractions of inclusive semileptonic $B \to X_u \, \ell^+\, \nu_{\ell}$ decays using the full Belle data set of 711 fb$^{-1}$ of integrated luminosity at the $\Upsilon(4S)$ resonance and for $\ell = e, \mu$. Inclusive semileptonic $B \to X_u \, \ell^+\, \nu_{\ell}$ decays are CKM suppressed, and measurements are complicated by the large background from CKM-favored $B \to X_c \, \ell^+\, \nu_{\ell}$ transitions, which have a similar signature. Using machine learning techniques, we reduce this and other backgrounds effectively, whilst retaining access to a large fraction of the $B \to X_u \, \ell^+\, \nu_{\ell}$ phase space and high signal efficiency. We measure partial branching fractions in three phase-space regions covering about 31% to 86% of the accessible $B \to X_u \, \ell^+\, \nu_{\ell}$ phase space. The most inclusive measurement corresponds to the phase space with lepton energies of $E_\ell^B > 1$ GeV, and we obtain $\Delta \mathcal{B}(B \to X_u \ell^+ \, \nu_\ell) = \left( 1.59 \pm 0.07 \pm 0.16 \right) \times 10^{-3}$ from a two-dimensional fit of the hadronic mass spectrum and the four-momentum-transfer squared distribution, where the uncertainties are statistical and systematic, respectively. We find $\left| V_{ub} \right| = \left( 4.10 \pm 0.09 \pm 0.22 \pm 0.15 \right) \times 10^{-3}$ from an average of four calculations for the partial decay rate, with the third uncertainty denoting the average theory error. This value is higher than, but compatible within 1.3 standard deviations with, the determination from exclusive semileptonic decays. In addition, we report charmless inclusive partial branching fractions separately for $B^+$ and $B^0$ mesons as well as for electron and muon final states. No isospin breaking or lepton flavor universality violating effects are observed.


Belle Preprint 2020-22, KEK Preprint 2020-39


L. Cao, W. Sutcliffe, R. Van Tonder, F. U. Bernlochner, I. Adachi, H. Aihara, S. Al Said, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, V. Babu, M. Bauer, P. Behera, K. Belous, J. Bennett, M. Bessner, V. Bhardwaj, T. Bilka, J. Biswal, G. Bonvicini, A. Bozek, M. Bračko, T. E. Browder, M. Campajola, D. Červenkov, M.-C. Chang, P. Chang, V. Chekelian, A. Chen, B. G. Cheon, K. Chilikin, H. E. Cho, K. Cho, S.-J. Cho, S.-K. Choi, Y. Choi, S. Choudhury, D. Cinabro, S. Cunliffe, S. Das, N. Dash, G. De Nardo, F. Di Capua, J. Dingfelder, Z. Doležal, T. V. Dong, S. Dubey, S. Eidelman, D. Epifanov, T. Ferber, D. Ferlewicz, A. Frey, B. G. Fulsom, R. Garg, V. Gaur, A. Garmash, A. Giri, P. Goldenzweig, Y. Guan, C. Hadjivasiliou, T. Hara, O. Hartbrich, K. Hayasaka, H. Hayashii, M. T. Hedges, M. Hernandez Villanueva, W.-S. Hou, C.-L. Hsu, T. Iijima, K. Inami, A. Ishikawa, R. Itoh, M. Iwasaki, Y. Iwasaki, W. W. Jacobs, E.-J. Jang, H. B. Jeon, S. Jia, Y. Jin, C. W. Joo, K. K. Joo, K. H. Kang, G. Karyan, T. Kawasaki, H. Kichimi, C. Kiesling, B. H. Kim, C. H. Kim, D. Y. Kim, H. J. Kim, K.-H. Kim, S. H. Kim, Y.-K. Kim, K. Kinoshita, P. Kodyš, T. Konno, A. Korobov, S. Korpar, D. Kotchetkov, E. Kovalenko, P. Križan, R. Kroeger, P. Krokovny, T. Kuhr, M. Kumar, R. Kumar, K. Kumara, A. Kuzmin, Y.-J. Kwon, K. Lalwani, J. S. Lange, I. S. Lee, S. C. Lee, P. Lewis, C. H. Li, J. Li, L. K. Li, Y. B. Li, L. Li Gioi, J. Libby, K. Lieret, Z. Liptak,‡ D. Liventsev, J. MacNaughton, C. MacQueen, M. Masuda, T. Matsuda, D. Matvienko, M. Merola, F. Metzner, K. Miyabayashi, R. Mizuk, G. B. Mohanty, T. J. Moon, T. Mori, M. Mrvar, R. Mussa, M. Nakao, Z. Natkaniec, A. Natochii, L. Nayak, M. Nayak, M. Niiyama, N. K. Nisar, S. Nishida, K. Nishimura, S. Ogawa, H. Ono, Y. Onuki, P. Oskin, P. Pakhlov, G. Pakhlova, T. Pang, S. Pardi, C. W. Park, H. Park, S.-H. Park, S. Patra, S. Paul, T. K. Pedlar, R. Pestotnik, L. E. Piilonen, T. Podobnik, V. Popov, E. Prencipe, M. T. Prim, M. Ritter, M. Röhrken, A. Rostomyan, N. Rout, M. Rozanska, G. Russo, D. Sahoo, Y. Sakai, S. Sandilya, A. Sangal, L. Santelj, T. Sanuki, V. Savinov, G. Schnell, J. Schueler, C. Schwanda, A. J. Schwartz, Y. Seino, K. Senyo, M. E. Sevior, M. Shapkin, C. Sharma, C. P. Shen, J.-G. Shiu, F. Simon, A. Sokolov, E. Solovieva, S. Stanič, M. Starič, Z. S. Stottler, J. F. Strube, T. Sumiyoshi, M. Takizawa, U. Tamponi, K. Tanida, F. Tenchini, K. Trabelsi, M. Uchida, T. Uglov, Y. Unno, S. Uno, P. Urquijo, Y. Usov, S. E. Vahsen, G. Varner, K. E. Varvell, A. Vinokurova, V. Vorobyev, A. Vossen, E. Waheed, C. H. Wang, E. Wang, M.-Z. Wang, P. Wang, M. Watanabe, S. Watanuki, S. Wehle, J. Wiechczynski, E. Won, X. Xu, B. D. Yabsley, W. Yan, S. B. Yang, H. Ye, J. H. Yin, C. Z. Yuan, Y. Yusa, Z. P. Zhang, V. Zhilich, V. Zhukova, and V. Zhulanov

(The Belle Collaboration)

University of Bonn, 53115 Bonn
Department of Physics, University of the Basque Country UPV/EHU, 48080 Bilbao
Brookhaven National Laboratory, Upton, New York 11973
Budker Institute of Nuclear Physics SB RAS, Novosibirsk 630090
Faculty of Mathematics and Physics, Charles University, 121 16 Prague
Chonnam National University, Gwangju 61186
University of Cincinnati, Cincinnati, Ohio 45221
Deutsches Elektronen–Synchrotron, 22607 Hamburg
Duke University, Durham, North Carolina 27708
Department of Physics, Fu Jen Catholic University, Taipei 24205
Key Laboratory of Nuclear Physics and Ion-beam Application (MOE) and Institute of Modern Physics, Fudan University, Shanghai 200443
Justus-Liebig-Universität Gießen, 35392 Gießen
II. Physikalisches Institut, Georg-August-Universität Göttingen, 37073 Göttingen
SOKENDAI (The Graduate University for Advanced Studies), Hayama 240-0193
Gyeongsang National University, Jinju 52828
Department of Physics and Institute of Natural Sciences, Hanyang University, Seoul 04763
University of Hawaii, Honolulu, Hawaii 96822
High Energy Accelerator Research Organization (KEK), Tsukuba 305-0801
J-PARC Branch, KEK Theory Center, High Energy Accelerator Research Organization (KEK), Tsukuba 305-0801
Higher School of Economics (HSE), Moscow 101000
Forschungszentrum Jülich, 52425 Jülich
IKERBASQUE, Basque Foundation for Science, 48013 Bilbao
Indian Institute of Science Education and Research Mohali, SAS Nagar, 140306
Indian Institute of Technology Hyderabad, Telangana 502285
Indian Institute of Technology Madras, Chennai 600036
Indiana University, Bloomington, Indiana 47408
Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049
Institute of High Energy Physics, Vienna 1050
Institute for High Energy Physics, Protvino 142281
INFN - Sezione di Napoli, 80126 Napoli
INFN - Sezione di Torino, 10125 Torino
Advanced Science Research Center, Japan Atomic Energy Agency, Naka 319-1195
J. Stefan Institute, 1000 Ljubljana
Institut für Experimentelle Teilchenphysik, Karlsruher Institut für Technologie, 76131 Karlsruhe
Kavli Institute for the Physics and Mathematics of the Universe (WPI), University of Tokyo, Kashiwa 277-8583
Department of Physics, Faculty of Science, King Abdulaziz University, Jeddah 21589
Kitasato University, Sagamihara 252-0373
Korea Institute of Science and Technology Information, Daejeon 34141
Korea University, Seoul 02841
Kyoto Sangyo University, Kyoto 603-8555
Kyungpook National University, Daegu 41566
Université Paris-Saclay, CNRS/IN2P3, IJCLab, 91405 Orsay
P.N. Lebedev Physical Institute of the Russian Academy of Sciences, Moscow 119991
Liaoning Normal University, Dalian 116029
Faculty of Mathematics and Physics, University of Ljubljana, 1000 Ljubljana
Ludwig Maximilians University, 80539 Munich
Malaviya National Institute of Technology Jaipur, Jaipur 302017
University of Maribor, 2000 Maribor
Max-Planck-Institut für Physik, 80805 München
School of Physics, University of Melbourne, Victoria 3010
University of Mississippi, University, Mississippi 38677
University of Miyazaki, Miyazaki 889-2192
Moscow Physical Engineering Institute, Moscow 115409
Graduate School of Science, Nagoya University, Nagoya 464-8602
Kobayashi-Maskawa Institute, Nagoya University, Nagoya 464-8602
Università di Napoli Federico II, 80126 Napoli
Nara Women's University, Nara 630-8506
National Central University, Chung-li 32054
National United University, Miao Li 36003
Department of Physics, National Taiwan University, Taipei 10617
H. Niewodniczanski Institute of Nuclear Physics, Krakow 31-342
Nippon Dental University, Niigata 951-8580
Niigata University, Niigata 950-2181
University of Nova Gorica, 5000 Nova Gorica
Novosibirsk State University, Novosibirsk 630090
Osaka City University, Osaka 558-8585
Pacific Northwest National Laboratory, Richland, Washington 99352
Panjab University, Chandigarh 160014
Peking University, Beijing 100871
University of Pittsburgh, Pittsburgh, Pennsylvania 15260
Punjab Agricultural University, Ludhiana 141004
Research Center for Nuclear Physics, Osaka University, Osaka 567-0047
Meson Science Laboratory, Cluster for Pioneering Research, RIKEN, Saitama 351-0198
Department of Modern Physics and State Key Laboratory of Particle Detection and Electronics, University of Science and Technology of China, Hefei 230026
Seoul National University, Seoul 08826
Showa Pharmaceutical University, Tokyo 194-8543
Soochow University, Suzhou 215006
Soongsil University, Seoul 06978
Sungkyunkwan University, Suwon 16419
School of Physics, University of Sydney, New South Wales 2006
Department of Physics, Faculty of Science, University of Tabuk, Tabuk 71451
Tata Institute of Fundamental Research, Mumbai 400005
Department of Physics, Technische Universität München, 85748 Garching
School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978
Toho University, Funabashi 274-8510
Department of Physics, Tohoku University, Sendai 980-8578
Earthquake Research Institute, University of Tokyo, Tokyo 113-0032
Department of Physics, University of Tokyo, Tokyo 113-0033
Tokyo Institute of Technology, Tokyo 152-8550
Tokyo Metropolitan University, Tokyo 192-0397
Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
Wayne State University, Detroit, Michigan 48202
Yamagata University, Yamagata 990-8560
Yonsei University, Seoul 03722
Luther College, Decorah, Iowa 52101

I. INTRODUCTION

Precision measurements of the absolute value of the Cabibbo-Kobayashi-Maskawa (CKM) matrix element $V_{ub}$ are important to challenge the Standard Model of particle physics (SM) [1, 2]. In the SM, the CKM matrix is a $3 \times 3$ unitary matrix […] $|V_{ub}|$ and the CKM angle $\gamma = \phi_3$ are imperative to isolate such effects, as their measurements involve tree-level processes, which are expected to remain unaffected by new physics and thus provide an unbiased measure of the amount of CP violation (CPV) due to the Kobayashi-Maskawa (KM) mechanism [2] alone.

‡ Now at Hiroshima University.

Charmless semileptonic decays of $B$ mesons provide a clean avenue to measure $|V_{ub}|$, as their decay rate is theoretically better understood than that of purely hadronic transitions and their decay signature is more accessible than that of leptonic $B$ meson decays. Existing measurements either focus on exclusive final states, with $B \to \pi \ell^+ \nu_\ell$ [6] and the ratio of $\Lambda_b \to p \mu^+ \nu_\mu$ and $\Lambda_b \to \Lambda_c \mu^+ \nu_\mu$ [7] providing the most precise measurements to date, or reconstruct the $B \to X_u \ell^+ \nu_\ell$ decay fully inclusively.¹ Central to both approaches are reliable theory predictions of the (partial) decay rates $\Delta\Gamma(B \to X_u \ell^+ \nu_\ell)$ (omitting the CKM factor), which convert measured (partial or full) branching fractions, $\Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell)$, into measurements of $|V_{ub}|$ via

$$|V_{ub}| = \sqrt{\frac{\Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell)}{\tau_B \, \Delta\Gamma(B \to X_u \ell^+ \nu_\ell)}},$$

with $\tau_B$ denoting the $B$ meson lifetime. For exclusive measurements, the non-perturbative parts of the decay rates can be reliably predicted by lattice QCD [8] or light-cone sum rules [9] and constrained by measurements of the decay dynamics. The determination of $|V_{ub}|$ using inclusive decays is very challenging due to the large background from the CKM-favored $B \to X_c \ell^+ \nu_\ell$ process. Both processes have a very similar decay signature: a high-momentum lepton, a hadronic system, and missing energy from the neutrino that escapes detection. Figure 1 illustrates both processes for a $B$ meson decay. A clear separation of the processes is only possible in kinematic regions where $B \to X_c \ell^+ \nu_\ell$ is kinematically forbidden. In these regions, however, non-perturbative shape functions enter the description of the decay dynamics, making predictions for the decay rates dependent on the precise modeling. At leading order, these functions parametrize the Fermi motion of the $b$ quark inside the $B$ meson. Properties of the leading-order (in $\Lambda_{\rm QCD}/m_b$) shape function can be determined using the photon energy spectrum of $B \to X_s \gamma$ decays and moments of the lepton energy or hadronic invariant mass in semileptonic $B$ decays [10–12], but the modeling of both the leading and subleading shape functions introduces large theory uncertainties on the decay rate. In the future, more model-independent approaches aim to directly measure the leading-order shape function [13, 14].

FIG. 1. The CKM-suppressed and CKM-favored inclusive semileptonic processes $B \to X_u \ell^+ \nu_\ell$ (left) and $B \to X_c \ell^+ \nu_\ell$ (right) for a $B$ meson decay.

¹ Charge conjugation is implied throughout this paper. In addition, $B \to X_u \ell^+ \nu_\ell$ is defined as the average branching fraction of charged and neutral $B$ meson decays, and $\ell = e$ or $\mu$.

As such methods are not yet realized, it is beneficial to extend the measurement region as far as possible into the $B \to X_c \ell^+ \nu_\ell$-dominated phase space, as was done, e.g., in Refs. [15, 16]. This reduces the theory uncertainties on the predicted partial rates [17–22], although it makes the measurement more prone to systematic uncertainties. This strategy is also adopted in the measurement described in this paper.

The corresponding world averages of $|V_{ub}|$ from exclusive and inclusive determinations are [6]:

$$|V_{ub}^{\rm excl.}| = (3.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{1}$$

$$|V_{ub}^{\rm incl.}| = \left(\ldots \pm \ldots\,{}^{+0.\ldots}_{-\ldots}\right) \times 10^{-3}. \tag{2}$$

Here the uncertainties are experimental and from theory, respectively. The two world averages disagree by about 3 standard deviations. This disagreement limits the reach of present-day precision tests of the KM mechanism and of searches for loop-level new physics; see, e.g., Ref. [23] for a recent analysis.

One important experimental method to extend the probed $B \to X_u \ell^+ \nu_\ell$ phase space into regions dominated by $B \to X_c \ell^+ \nu_\ell$ transitions is the full reconstruction of the second $B$ meson of the $e^+ e^- \to \Upsilon(4S) \to B \bar{B}$ process. This procedure is referred to as "tagging" and allows for the reconstruction of the hadronic $X$ system of the semileptonic decay. In addition, the neutrino four-momentum can be reconstructed. Properties of both are instrumental in distinguishing $B \to X_u \ell^+ \nu_\ell$ from $B \to X_c \ell^+ \nu_\ell$ processes. In this paper, the reconstruction of the second $B$ meson and the separation of $B \to X_u \ell^+ \nu_\ell$ from $B \to X_c \ell^+ \nu_\ell$ processes were carried out using machine learning approaches. Several neural networks were trained to identify correctly reconstructed tag-side $B$ mesons.
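The conversion from a measured partial branching fraction to $|V_{ub}|$, $|V_{ub}| = \sqrt{\Delta\mathcal{B}/(\tau_B\,\Delta\Gamma)}$, can be sketched in a few lines. This is a minimal illustration, not the analysis code: the function names are hypothetical, $\tau_B$ is assumed to be given in ps and $\Delta\Gamma$ (the predicted partial rate with the CKM factor omitted) in ps$^{-1}$ so that their product is dimensionless, and the round-trip inputs are made-up numbers. Because $|V_{ub}| \propto \sqrt{\Delta\mathcal{B}}$, a relative uncertainty on $\Delta\mathcal{B}$ propagates, to first order, as half that relative uncertainty on $|V_{ub}|$.

```python
import math


def vub_from_partial_bf(delta_bf: float, tau_b_ps: float, delta_gamma_per_ps: float) -> float:
    """Return |V_ub| = sqrt(dB / (tau_B * dGamma)).

    delta_bf: measured partial branching fraction (dimensionless)
    tau_b_ps: B meson lifetime in ps (assumed unit)
    delta_gamma_per_ps: predicted partial rate without the |V_ub|^2 factor, in ps^-1
    """
    if min(delta_bf, tau_b_ps, delta_gamma_per_ps) <= 0.0:
        raise ValueError("all inputs must be positive")
    return math.sqrt(delta_bf / (tau_b_ps * delta_gamma_per_ps))


def vub_relative_uncertainty(delta_bf: float, sigma_bf: float) -> float:
    """First-order propagation: |V_ub| ~ sqrt(dB), so sigma_V / V = sigma_B / (2 dB)."""
    return 0.5 * sigma_bf / delta_bf


# Round trip with hypothetical inputs: build dB from a chosen |V_ub|, then invert it.
vub_true, tau_b, delta_gamma = 4.0e-3, 1.6, 60.0  # made-up values for illustration
delta_bf = vub_true**2 * tau_b * delta_gamma
print(f"{vub_from_partial_bf(delta_bf, tau_b, delta_gamma):.4e}")  # 4.0000e-03

# Statistical part only, using the abstract's values:
# dB = (1.59 +- 0.07) x 10^-3 and |V_ub| = 4.10 x 10^-3.
sigma_stat = 4.10e-3 * vub_relative_uncertainty(1.59e-3, 0.07e-3)
print(f"{sigma_stat:.2e}")  # 9.03e-05
```

The second number rounds to the statistical uncertainty of $0.09 \times 10^{-3}$ quoted in the abstract; the systematic and theory components involve further inputs and are not reproduced by this simple scaling.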
The discriminating variables of the classification algorithm were carefully selected so as not to introduce a bias in the measured partial branching fractions. In addition, the modeling of backgrounds was validated in $B \to X_c \ell^+ \nu_\ell$-enriched selections. We report the measurement of three partial branching fractions, covering about 30% to 85% of the accessible $B \to X_u \ell^+ \nu_\ell$ phase space. The measurement of fully differential distributions, which would allow one to determine the leading and subleading shape functions, is left for future work.

The main improvement over the previous Belle result of Ref. [16] lies in the adoption of a more efficient tagging algorithm for the reconstruction of the second $B$ meson and in improved descriptions of the $B \to X_u \ell^+ \nu_\ell$ signal and the $B \to X_c \ell^+ \nu_\ell$ background. In addition, the full Belle data set of 711 fb$^{-1}$ is analyzed, and we avoid the direct use of kinematic properties of the candidate semileptonic decay in the background suppression. After the final selection we retain approximately 1.8 times more signal events than the previous analysis.

The remainder of this paper is organized as follows: Section II provides an overview of the data set and of the simulated signal and background samples used in the analysis. Section III details the analysis strategy and the reconstruction of the hadronic $X$ system of the semileptonic decay. Section IV introduces the fit procedure used to separate the $B \to X_u \ell^+ \nu_\ell$ signal from background contributions. Section V lists the systematic uncertainties affecting the measurements, and Section VI summarizes sideband studies that are central to validating the modeling of the crucial $B \to X_c \ell^+ \nu_\ell$ background processes. Finally, Section VII shows the selected signal events and compares them with the expectation from simulation.
In Section VIII the measured partial branching fractions and the resulting values of $|V_{ub}|$ are discussed. Section IX presents our conclusions.

II. DATA SET AND SIMULATED SAMPLES

The analysis uses the full Belle data set of $(772 \pm 11) \times 10^6$ $B$ meson pairs, produced at the KEKB accelerator complex [24] at a center-of-mass energy of $\sqrt{s} = 10.58$ GeV, corresponding to the $\Upsilon(4S)$ resonance. In addition, 79 fb$^{-1}$ of collision events recorded 60 MeV below the $\Upsilon(4S)$ resonance peak are used to derive corrections and for cross-checks.

The Belle detector is a large-solid-angle magnetic spectrometer that consists of a silicon vertex detector, a 50-layer central drift chamber (CDC), an array of aerogel threshold Cherenkov counters (ACC), a barrel-like arrangement of time-of-flight scintillation counters (TOF), and an electromagnetic calorimeter composed of CsI(Tl) crystals (ECL), all located inside a superconducting solenoid coil that provides a 1.5 T magnetic field. An iron flux return located outside the coil is instrumented to detect $K_L$ mesons and to identify muons (KLM). A more detailed description of the detector, its layout, and its performance can be found in Ref. [25] and references therein.

Charged tracks are identified as electron or muon candidates by combining the information of multiple subdetectors into a lepton identification likelihood ratio, $\mathcal{L}_{\rm LID}$. For electrons, the most important identifying features are the ratio of the energy deposition in the ECL to the reconstructed track momentum, the energy loss in the CDC, the shower shape in the ECL, the quality of the geometrical matching of the track to the shower position in the ECL, and the photon yield in the ACC [26]. Muon candidates are identified from charged-track trajectories extrapolated to the outer detector. The most important identifying features are the difference between expected and measured penetration depth as well as the transverse deviation of KLM hits from the extrapolated trajectory [27]. Charged tracks are identified as pions or kaons using the likelihood ratio $\mathcal{L}_{\rm ID}^{K/\pi} = \mathcal{L}_{\rm ID}^{K} / (\mathcal{L}_{\rm ID}^{K} + \mathcal{L}_{\rm ID}^{\pi})$.
The most important identifying features of the kaon ($\mathcal{L}_{\rm ID}^{K}$) and pion ($\mathcal{L}_{\rm ID}^{\pi}$) likelihoods for low-momentum particles, with transverse momentum below 1 GeV in the laboratory frame,² are the recorded energy loss by ionization, $dE/dx$, in the CDC and the time-of-flight information from the TOF. Higher-momentum kaon and pion classification relies on the Cherenkov light recorded in the ACC. To avoid the difficulties in understanding the efficiencies of reconstructing $K_L$ mesons, they are not explicitly reconstructed or used in this analysis.

Photons are identified as energy depositions in the ECL, vetoing clusters to which an associated track can be assigned. Only photons with an energy deposition of $E_\gamma > 100$ MeV, 150 MeV, and 50 MeV in the forward endcap, backward endcap, and barrel part of the calorimeter, respectively, are considered. We reconstruct $\pi^0$ candidates from pairs of photon candidates. The invariant mass is required to fall inside a window of $m_{\gamma\gamma} \in [0.\ldots, 0.15]$ GeV, which corresponds to about 2.5 times the $\pi^0$ mass resolution.

Monte Carlo (MC) samples of $B$ meson decays and continuum processes ($e^+ e^- \to q\bar{q}$ with $q = u, d, s, c$) are simulated using the EvtGen generator [28]. These samples are used to evaluate reconstruction efficiencies and acceptance, and to estimate background contaminations.

² We use natural units: $\hbar = c = 1$.

TABLE I. Branching fractions of the $B \to X_u \ell^+ \nu_\ell$ signal and $B \to X_c \ell^+ \nu_\ell$ background processes used in this analysis. More details on the applied corrections can be found in the text. We neglect the small contribution from $B^+ \to D_s^{(*)} K^+ \ell^+ \nu_\ell$, which has a branching fraction of similar size as $B \to D\pi\pi \ell^+ \nu_\ell$.

| Process | $\mathcal{B}$ value ($B^+$) | $\mathcal{B}$ value ($B^0$) |
|---|---|---|
| $B \to X_u \ell^+ \nu_\ell$ modes: | | |
| $B \to \pi \ell^+ \nu_\ell$ | $(7.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to \eta \ell^+ \nu_\ell$ | $(3.\ldots \pm \ldots) \times 10^{-\ldots}$ | - |
| $B \to \eta' \ell^+ \nu_\ell$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ | - |
| $B \to \omega \ell^+ \nu_\ell$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ | - |
| $B \to \rho \ell^+ \nu_\ell$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to X_u \ell^+ \nu_\ell$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to X_c \ell^+ \nu_\ell$ modes: | | |
| $B \to D \ell^+ \nu_\ell$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D^* \ell^+ \nu_\ell$ | $(5.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(5.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D_0^* (\to D\pi) \ell^+ \nu_\ell$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(3.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D_1^* (\to D^*\pi) \ell^+ \nu_\ell$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(3.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D_1 (\to D^*\pi) \ell^+ \nu_\ell$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(3.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D_2^* (\to D^*\pi) \ell^+ \nu_\ell$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D_2^* (\to D\pi) \ell^+ \nu_\ell$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(1.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D_1 (\to D\pi\pi) \ell^+ \nu_\ell$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D\pi\pi \ell^+ \nu_\ell$ | $(0.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(0.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D^*\pi\pi \ell^+ \nu_\ell$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(2.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D\eta \ell^+ \nu_\ell$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to D^*\eta \ell^+ \nu_\ell$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(4.\ldots \pm \ldots) \times 10^{-\ldots}$ |
| $B \to X_c \ell^+ \nu_\ell$ | $(10.\ldots \pm \ldots) \times 10^{-\ldots}$ | $(10.\ldots \pm \ldots) \times 10^{-\ldots}$ |
The sample sizes used correspond to approximately ten and five times the Belle collision data for $B$ meson and continuum decays, respectively. The interactions of particles traversing the detector are simulated using Geant3 [29]. Electromagnetic final-state radiation is simulated using the PHOTOS [30] package for all charged final-state particles. The efficiencies in the MC are corrected using data-driven methods to account for, e.g., differences in identification and reconstruction efficiencies.

The most important background processes are semileptonic $B \to X_c \ell^+ \nu_\ell$ decays and continuum processes, both of which can produce high-momentum leptons in a momentum range similar to that of the $B \to X_u \ell^+ \nu_\ell$ process. The semileptonic background from $B \to X_c \ell^+ \nu_\ell$ decays is dominated by $B \to D \ell^+ \nu_\ell$ and $B \to D^* \ell^+ \nu_\ell$ decays. The $B \to D \ell^+ \nu_\ell$ decays are modeled using the BGL parametrization [31] with form factor central values and uncertainties taken from the fit in Ref. [32]. For $B \to D^* \ell^+ \nu_\ell$ we use the BGL implementation proposed by Refs. [33, 34] with form factor central values and uncertainties from the fit to the measurement of Ref. [35]. Both backgrounds are normalized to the average branching fraction of Ref. [6] assuming isospin symmetry. Semileptonic $B \to D^{**} \ell^+ \nu_\ell$ decays, with $D^{**} = \{D_0^*, D_1^*, D_1, D_2^*\}$ denoting the four orbitally excited charmed mesons, are modeled using the heavy-quark-symmetry-based form factors proposed in Ref. [36]. We simulate all $D^{**}$ decays using masses and widths from Ref. [37]. For the branching fractions we adopt the values of Ref. [6] and correct them to account for missing isospin-conjugated and other established decay modes, following the prescription given in Ref. [36]. To correct for the fact that the measurements were carried out in the $D^{**} \to D^{(*)+} \pi^-$ decay modes, we account for the missing isospin modes with a factor of

$$f_\pi = \frac{\mathcal{B}(D^{**} \to D^{(*)-} \pi^+)}{\mathcal{B}(D^{**} \to D^{(*)} \pi)} = \frac{2}{3}. \tag{3}$$

The measurements of $B \to D_2^* \ell \bar{\nu}_\ell$ in Ref. [6] are converted to only account for the $D_2^* \to D^{*-} \pi^+$ decay.
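The isospin factor $f_\pi = 2/3$ of Eq. (3), and the values $f_{\pi\pi} = 2/3$ and $1/3$ quoted for the $f_0(500) \to \pi\pi$ and $\rho \to \pi\pi$ channels, follow from Clebsch-Gordan coefficients under exact isospin symmetry. The sketch below reproduces them with exact fractions. It is an illustration under that assumption, not the procedure of Ref. [44]; the helper names are hypothetical, isospin breaking and non-resonant three-body decays are ignored, and the neutral $D^{**0}$ ($c\bar{u}$) is assigned $I_3 = -1/2$. It uses the standard closed form for coupling an isospin-$j_1$ state to an isospin-1/2 state with total isospin $j_1 - 1/2$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ONE = Fraction(1)


def cg_sq_to_lower(j1: Fraction, m: Fraction, m2: Fraction) -> Fraction:
    """|<j1, m - m2; 1/2, m2 | j1 - 1/2, m>|^2 (standard closed form)."""
    if m2 == HALF:
        return (j1 - m + HALF) / (2 * j1 + 1)
    if m2 == -HALF:
        return (j1 + m + HALF) / (2 * j1 + 1)
    raise ValueError("m2 must be +1/2 or -1/2")


def cg_sq_to_singlet(j: Fraction) -> Fraction:
    """|<j, m; j, -m | 0, 0>|^2 = 1 / (2j + 1), identical for every m."""
    return 1 / (2 * j + 1)


M_DSS0 = -HALF          # I3 of the neutral D**0
RHO0_TO_PIPI = ONE      # <1,0;1,0|1,0> = 0, so rho0 never decays to pi0 pi0

# D**0 -> D+ pi- (D+ has I3 = +1/2) versus D**0 -> D0 pi0 (D0 has I3 = -1/2).
p_charged = cg_sq_to_lower(ONE, M_DSS0, HALF)
p_neutral = cg_sq_to_lower(ONE, M_DSS0, -HALF)
f_pi = p_charged / (p_charged + p_neutral)

# f0(500) is isoscalar, so D**0 -> D0 f0 only; f0 -> pi+ pi- takes the m = +1 and m = -1 terms.
f_pipi_f0 = 2 * cg_sq_to_singlet(ONE)

# rho: D**0 -> D0 rho0 has the same coefficient as D0 pi0, and rho0 -> pi+ pi- always.
f_pipi_rho = cg_sq_to_lower(ONE, M_DSS0, -HALF) * RHO0_TO_PIPI

print(f_pi, f_pipi_f0, f_pipi_rho, (f_pipi_f0 + f_pipi_rho) / 2)  # 2/3 2/3 1/3 1/2
```

Using exact fractions rather than floats keeps the ratios in the same form as they are quoted in the text; the midpoint of the two resonant cases is $1/2$.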
To also account for $D_2^* \to D^- \pi^+$ contributions, we apply a factor of [37]

$$f_{D_2^*} = \frac{\mathcal{B}(D_2^* \to D^- \pi^+)}{\mathcal{B}(D_2^* \to D^{*-} \pi^+)} = 1.\ldots \pm \ldots. \tag{4}$$

The world average of $B \to D_1^* \ell \bar{\nu}_\ell$ given in Ref. [6] combines measurements that show poor agreement, and the probability of the combination is below 0.01%. Notably, the measurement of Ref. [38] is in conflict with the measured branching fractions of Refs. [39, 40] and with the expectation that $\mathcal{B}(B \to D_1^* \ell \bar{\nu}_\ell)$ is of similar size as $\mathcal{B}(B \to D_0^* \ell \bar{\nu}_\ell)$ [41, 42]. We perform our own average excluding the conflicting measurement and use

$$\mathcal{B}(B^+ \to D_1^* (\to D^{*-} \pi^+)\, \ell^+ \nu_\ell) = (0.\ldots \pm \ldots) \times 10^{-\ldots}. \tag{5}$$

The world average of $B \to D_1 \ell \bar{\nu}_\ell$ does not include contributions from prompt three-body decays $D_1 \to D\pi\pi$. We account for these using a factor [43]

$$f_{D_1} = \frac{\mathcal{B}(D_1 \to D^{*-} \pi^+)}{\mathcal{B}(D_1 \to D^0 \pi^+ \pi^-)} = 2.\ldots \pm \ldots. \tag{6}$$

We subtract the contribution of $D_1 \to D\pi\pi$ from the measured non-resonant plus resonant $B \to D\pi\pi \ell \bar{\nu}_\ell$ branching fraction of Ref. [44]. To account for missing isospin-conjugated modes of the three-hadron final states, we adopt the prescription from Ref. [44], which calculates an average isospin correction factor of

$$f_{\pi\pi} = \frac{\mathcal{B}(D^{**} \to D^{(*)0} \pi^+ \pi^-)}{\mathcal{B}(D^{**} \to D^{(*)} \pi\pi)} = \frac{1}{2} \pm \ldots. \tag{7}$$

The uncertainty takes into account the full spread of final states ($f_0(500) \to \pi\pi$ or $\rho \to \pi\pi$ result in $f_{\pi\pi} = 2/3$ and $1/3$, respectively) and the non-resonant three-body decays ($f_{\pi\pi} = 3/\ldots$). […]

$$\mathcal{B}(D_2^* \to D\pi) + \mathcal{B}(D_2^* \to D^*\pi) = 1, \quad \mathcal{B}(D_1 \to D^*\pi) + \mathcal{B}(D_1 \to D\pi\pi) = 1, \quad \mathcal{B}(D_1^* \to D^*\pi) = 1, \quad \text{and} \quad \mathcal{B}(D_0^* \to D\pi) = 1. \tag{8}$$

For the remaining $B \to D^{(*)} \pi\pi \ell^+ \nu_\ell$ contributions we use the measured value of Ref. [44]. The remaining "gap" between the sum of all considered exclusive modes and the inclusive $B \to X_c \ell^+ \nu_\ell$ branching fraction is filled in equal parts with $B \to D\eta \ell^+ \nu_\ell$ and $B \to D^*\eta \ell^+ \nu_\ell$, and for both we assume a 100% uncertainty. We simulate the $B \to D^{(*)} \pi\pi \ell^+ \nu_\ell$ and $B \to D^{(*)} \eta \ell^+ \nu_\ell$ final states assuming that they are produced by the decay of two broad resonant states $D^{**}_{\rm gap}$ with masses and widths identical to those of the $D_1^*$ and $D_0^*$. Although there is currently no experimental evidence for decays of charm $1P$ states into these final states, or for the existence of such an additional broad state (e.g., a $2S$) in semileptonic transitions, this choice provides a better kinematic description of the initial three-body decay, $B \to D^{**}_{\rm gap} \ell \bar{\nu}_\ell$, than, e.g., a model in which all final-state particles are distributed uniformly in phase space. For the form factors we adopt Ref. [36].

Semileptonic $B \to X_u \ell^+ \nu_\ell$ decays are modeled as a mixture of specific exclusive modes and non-resonant contributions. We normalize their branching fractions to the world averages from Ref. [37]. Semileptonic $B \to \pi \ell^+ \nu_\ell$ decays are simulated using the BCL parametrization [45] with form factor central values and uncertainties from the global fit carried out by Ref. [46]. The $B \to \rho \ell^+ \nu_\ell$ and $B \to \omega \ell^+ \nu_\ell$ processes are modeled using the BCL form factor parametrization. We fit the measurements of Refs. [47–49] in combination with the light-cone sum rule predictions of Ref. [9] to determine a set of form factor central values and uncertainties.
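The hybrid combination of inclusive and exclusive $B \to X_u \ell^+ \nu_\ell$ predictions can be illustrated with a toy example. Requiring $\Delta\mathcal{B}^{\rm incl}_{ijk} = \Delta\mathcal{B}^{\rm excl}_{ijk} + w_{ijk}\,\Delta\mathcal{B}^{\rm incl}_{ijk}$ in every bin $(i, j, k)$ of $(q^2, E_\ell^B, M_X)$ gives $w_{ijk} = 1 - \Delta\mathcal{B}^{\rm excl}_{ijk}/\Delta\mathcal{B}^{\rm incl}_{ijk}$. The sketch below is not the library of Ref. [57]: the function name, the dictionary layout, the toy bin contents, and the choice of zero weight for empty inclusive bins are all hypothetical. The weight becomes negative wherever the exclusive modes exceed the inclusive prediction; how such bins are handled is an implementation choice that is not specified here.

```python
from typing import Dict, Tuple

Bin = Tuple[int, int, int]  # (i, j, k) indices in (q^2, E_l^B, M_X)


def hybrid_weights(incl: Dict[Bin, float], excl: Dict[Bin, float]) -> Dict[Bin, float]:
    """Per-bin weights w_ijk = (dB_incl - dB_excl) / dB_incl for the inclusive sample."""
    weights: Dict[Bin, float] = {}
    for b, dbf_incl in incl.items():
        dbf_excl = excl.get(b, 0.0)
        # Assumption: a bin without inclusive prediction receives zero weight.
        weights[b] = (dbf_incl - dbf_excl) / dbf_incl if dbf_incl > 0.0 else 0.0
    return weights


# Toy partial branching fractions in arbitrary units (not taken from the paper).
incl = {(0, 0, 0): 4.0, (0, 0, 1): 2.0, (1, 0, 0): 1.0}
excl = {(0, 0, 0): 3.0, (1, 0, 0): 1.5}

w = hybrid_weights(incl, excl)
for b in sorted(incl):
    hybrid = excl.get(b, 0.0) + w[b] * incl[b]  # reproduces incl[b] by construction
    print(b, w[b], hybrid)
```

The bin $(1, 0, 0)$ shows the negative-weight case: the exclusive prediction (1.5) exceeds the inclusive one (1.0), so the weighted inclusive sample must subtract 0.5 to restore the inclusive total.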
The processes B → ηℓ⁺ν_ℓ and B → η′ℓ⁺ν_ℓ are modeled using the LCSR calculation of Ref. [50]. For the uncertainties of these states, we assume that the pole parameters and the form-factor normalization at maximum recoil, f_+^{Bη}(0), can be treated as uncorrelated. In addition to these narrow resonances, we simulate non-resonant B → X_u ℓ⁺ν_ℓ decays with at least two pions in the final state following the DFN model [51]. The triple differential rate of this model is a function of the four-momentum-transfer squared (q²), the lepton energy in the B rest frame (E_ℓ^B), and the hadronic invariant mass squared (M_X²) of the X_u system, calculated at next-to-leading-order precision in the strong coupling constant α_s. This triple differential rate is convolved with a non-perturbative shape function using an ad hoc exponential model. The free parameters of the model are the b-quark mass in the Kagan-Neubert scheme [52], m_b^KN = (4.… ± 0.04) GeV, and a non-perturbative parameter a^KN = 1.… ± 0.5. The values of these parameters were determined in Ref. [53] from a fit to B → X_c ℓ⁺ν_ℓ and B → X_s γ decay properties. At leading order, the non-perturbative parameter a^KN is related to the average momentum squared of the b quark inside the B meson and determines the second moment of the shape function. It is defined as

$$ a^{KN} = -\frac{3\bar{\Lambda}^2}{\lambda_1} - 1, \qquad \bar{\Lambda} = m_B - m_b^{KN}, $$

with the kinetic-energy parameter λ_1. The hadronization of the parton-level B → X_u ℓ⁺ν_ℓ DFN simulation is carried out using the JETSET algorithm [54], producing final states with two or more mesons.

The inclusive and exclusive B → X_u ℓ⁺ν_ℓ predictions are combined using a so-called 'hybrid' approach, a method originally suggested by Ref. [55]; our implementation closely follows Ref. [56] and uses the library of Ref. [57]. To this end, we combine both predictions such that, in every bin of the triple differential rate, the combined exclusive prediction (ΔB^excl_ijk) plus the weighted inclusive prediction reproduces the inclusive partial branching fraction (ΔB^incl_ijk). This is achieved by assigning weights w_ijk to the inclusive contributions such that

$$ \Delta\mathcal{B}^{\mathrm{incl}}_{ijk} = \Delta\mathcal{B}^{\mathrm{excl}}_{ijk} + w_{ijk} \times \Delta\mathcal{B}^{\mathrm{incl}}_{ijk}, $$ (9)

i.e. w_ijk = 1 − ΔB^excl_ijk / ΔB^incl_ijk, with i, j, k denoting the corresponding bin in the three dimensions of q², E_ℓ^B, and M_X, with bin edges

q² = [0, …, 25] GeV², E_ℓ^B = [0, …] GeV, M_X = [0, …] GeV.

To study the model dependence of the DFN shape function, we also determine weights using the BLNP model of Ref. [58] and treat the difference as a systematic uncertainty. For the b-quark mass in the shape-function scheme we use m_b^SF = 4.61 GeV, together with μ_π² = 0.20 GeV². Figures detailing the hybrid model construction can be found in Appendix A.

Table I summarizes the branching fractions used for the signal and the important B → X_c ℓ⁺ν_ℓ background processes. Figure 2 shows the generator-level distributions and yields of B → X_c ℓ⁺ν_ℓ and B → X_u ℓ⁺ν_ℓ after the tag-side reconstruction (cf. Section III). The B → X_u ℓ⁺ν_ℓ yields were scaled up by a factor of 50 to make them visible. A clear separation can be obtained at low values of M_X and high values of E_ℓ^B.

III. ANALYSIS STRATEGY, HADRONIC TAGGING, AND X RECONSTRUCTION

A. Neural Network Based Tag Side Reconstruction

FIG. 2. The generator-level E_ℓ^B and M_X distributions of the CKM-suppressed and CKM-favored inclusive semileptonic processes, B → X_u ℓ⁺ν_ℓ (scaled up by a factor of 50) and B → X_c ℓ⁺ν_ℓ, respectively, using the models described in the text.

We reconstruct collision events using the hadronic full reconstruction algorithm of Ref. [59]. The algorithm reconstructs one of the B mesons produced in the collision event using hadronic decay channels. We label such B mesons in the following as B_tag. Instead of attempting to reconstruct as many B-meson decay cascades as possible, the algorithm employs a hierarchical reconstruction ansatz in four stages. At the first stage, neural networks are trained to identify charged tracks and neutral energy depositions as detector-stable particles (e⁺, μ⁺, K⁺, π⁺, γ), neutral π⁰ candidates, or K_S⁰ candidates. At the second stage, these candidate particles are combined into heavier meson candidates (J/ψ, D⁰, D⁺, D_s⁺), and for each target final state a neural network is trained to identify probable candidates. In addition to the classifier output from the first stage, the vertex-fit probabilities of the candidate combinations and the full four-momentum of the combination are passed to the input layer. At the third stage, candidates for D*⁰, D*⁺, and D_s*⁺ mesons are formed and separate neural networks are trained to identify viable combinations; their input layer aggregates the classifier outputs from all previous reconstruction stages. The final stage combines the information from all previous stages to form B_tag candidates. The viability of such combinations is again assessed by a neural network that was trained to distinguish correctly reconstructed candidates from wrong combinations and whose output classifier score we denote by O_FR. Over 1104 decay cascades are reconstructed in this manner, achieving an efficiency of 0.28% and 0.18% for charged and neutral B-meson pairs [60], respectively. Finally, the output of this classifier is used as an input and combined with a range of event-shape variables to train a neural network to distinguish reconstructed B-meson candidates from continuum processes.
The output classifier score of this neural network is denoted as O_Cont. Both classifier scores are mapped to the range [0, 1), signifying reconstruction quality ranging from poor to excellent candidates. We retain B_tag candidates that show at least moderate agreement based on these two outputs and require O_FR > 10^(−…) and O_Cont > 10^(−…). Despite these relatively low values, knowledge of the charge and momentum of the decay constituents, in combination with the known beam energy, allows one to infer the flavor and four-momentum of the B_tag candidate. We require the B_tag candidates to have a beam-constrained mass of

$$ M_{bc} = \sqrt{E_{\mathrm{beam}}^2 - |\vec{p}_{\mathrm{tag}}|^2} > 5.27\ \mathrm{GeV}, $$ (10)

with p_tag denoting the momentum of the B_tag candidate in the center-of-mass frame of the colliding e⁺e⁻ pair. Furthermore, E_beam = √s/2 denotes the beam energy in the center-of-mass frame of the colliding e⁺e⁻ pair. The energy difference

$$ \Delta E = E_{\mathrm{tag}} - E_{\mathrm{beam}} $$ (11)

is already used in the input layer of the neural network trained in the final stage of the reconstruction. Here E_tag denotes the energy of the B_tag candidate in the center-of-mass frame of the colliding e⁺e⁻ pair. In each event a single B_tag candidate is then selected according to the highest O_FR score of the hierarchical full reconstruction algorithm. All tracks and clusters not used in the reconstruction of the B_tag candidate are used to define the signal side.

B. Signal Side Reconstruction

The signal side of the event is reconstructed by identifying a well-reconstructed lepton with E_ℓ^B = |p_ℓ^B| > 1 GeV in the signal B rest frame¹, using the lepton-identification likelihood mentioned in Section II. The signal B rest frame is calculated using the momentum of the B_tag candidate via

$$ p_{\mathrm{sig}} = p_{e^+e^-} - \left( \sqrt{m_B^2 + |\vec{p}_{\mathrm{tag}}|^2},\ \vec{p}_{\mathrm{tag}} \right), $$ (12)

with p_{e⁺e⁻} denoting the four-momentum of the colliding electron-positron pair. Leptons from J/ψ decays and from photon conversions in detector material are rejected by combining the lepton candidate with oppositely charged tracks (t) on the signal side and demanding that m_ℓt > 0.14 GeV, as well as m_et ∉ [3.…, 3.15] GeV for electrons or m_μt ∉ [3.…, 3.12] GeV for muons. If multiple lepton candidates are present on the signal side, the event is discarded, as multiple leptons are likely to originate from a double semileptonic b → c → s cascade. For charged B_tag candidates, we demand that the charge of the signal-side lepton be opposite to that of the B_tag. The hadronic X system is reconstructed from the remaining unassigned charged particles and neutral energy depositions. Its four-momentum is calculated as

$$ p_X = \sum_i \left( \sqrt{m_\pi^2 + |\vec{p}_i|^2},\ \vec{p}_i \right) + \sum_j \left( E_j,\ \vec{k}_j \right), $$ (13)

where all charged particles, with momenta p_i, are assumed to be pions, and E_j = |k_j| is the energy of the neutral energy depositions. With the X system reconstructed, we can also reconstruct the missing mass squared,

$$ M_{\mathrm{miss}}^2 = \left( p_{\mathrm{sig}} - p_X - p_\ell \right)^2, $$ (14)

which should peak at zero, M_miss² ≈ m_ν² ≈ 0 GeV², for correctly reconstructed semileptonic B → X_u ℓ⁺ν_ℓ and B → X_c ℓ⁺ν_ℓ decays. The hadronic mass of the X system is later used to discriminate B → X_u ℓ⁺ν_ℓ signal decays from B → X_c ℓ⁺ν_ℓ and other remaining backgrounds. It is reconstructed using

$$ M_X = \sqrt{(p_X)^\mu (p_X)_\mu}. $$ (15)

In addition, we reconstruct the four-momentum-transfer squared, q², as

$$ q^2 = \left( p_{\mathrm{sig}} - p_X \right)^2. $$ (16)

The resolution of both variables for B → X_u ℓ⁺ν_ℓ is shown in Figure 3 as residuals with respect to the generated values of q² and M_X. The M_X resolution has a root-mean-square (RMS) deviation of 0.47 GeV, but exhibits a large tail towards larger values. The distinct peak at 0 is from B → π⁻ℓ⁺ν_ℓ and other low-multiplicity final states comprised of only charged pions. The four-momentum-transfer squared q² is poorly resolved, owing to a combination of the tag-side B and the X reconstruction; its RMS deviation is 1.59 GeV². The core resolution is dominated by the tagging resolution, whereas the large negative tail is dominated by the resolution of the X-system reconstruction.

¹ We neglect the small correction of the lepton mass term to the energy of the lepton.

C. Background Suppression BDT

FIG. 3. The resolution of the reconstructed M_X and q² values for B → X_u ℓ⁺ν_ℓ signal, shown as residuals with respect to the generated values (panels: M_X^reco − M_X^true in GeV, RMS = 0.47 GeV; q²_reco − q²_true in GeV², RMS = 1.59 GeV²).

At this point in the reconstruction, the B → X_c ℓ⁺ν_ℓ process completely dominates the selected events. To identify B → X_u ℓ⁺ν_ℓ, we combine several distinguishing features into a single discriminant. This is achieved by using a machine-learning-based classification with boosted decision trees (BDTs). All momenta below are given in the center-of-mass frame of the colliding e⁺e⁻ pair. The features are:

1. M_miss²: The average B → X_c ℓ⁺ν_ℓ multiplicity is higher than that of B → X_u ℓ⁺ν_ℓ, broadening the missing-mass-squared distribution.
2. D* veto: We search for low-momentum neutral and charged pions in the X system with |p_π| < 220 MeV, compatible with a D* → Dπ transition. The key idea is that, because of the small phase space available from the small mass difference between the D* and D mesons, the flight direction of the slow pion is strongly correlated with the D* momentum direction. The energy and momentum of a D* candidate can thus be approximated as

$$ E_{D^*} = \frac{m_{D^*}}{m_{D^*} - m_D} \times E_\pi, \qquad \vec{p}_{D^*} = \frac{\vec{p}_\pi}{|\vec{p}_\pi|} \times \sqrt{E_{D^*}^2 - m_{D^*}^2}, $$ (17)

with m_D* and m_D denoting the D* and D meson masses, respectively, and E_π = √(m_π² + |p_π|²) the energy of the slow pion. Using the D* candidate four-momentum p_D* = (E_D*, p_D*) we can calculate

$$ M_{\mathrm{miss},D^*}^2 = \left( p_{\mathrm{sig}} - p_{D^*} - p_\ell \right)^2, \qquad \cos\theta_{B,D^*\ell} = \frac{2 E_{\mathrm{beam}} E_{D^*\ell} - m_B^2 - m_{D^*\ell}^2}{2 |\vec{p}_B| |\vec{p}_{D^*\ell}|}, \qquad \cos\theta^* = \frac{\vec{p}_\ell \cdot \vec{p}_{D^*}}{|\vec{p}_\ell| |\vec{p}_{D^*}|}, $$ (18)

with p_D*ℓ = p_D* + p_ℓ = (E_D*ℓ, p_D*ℓ) and |p_B| = √(E_B² − m_B²), where E_B = E_beam in the center-of-mass frame. These three variables are used only for events with charged or neutral slow-pion candidates.
3. Kaons: We identify the number of K⁺ candidates using the particle-identification likelihood, cf. Section II. In addition, we reconstruct K_S⁰ candidates from displaced tracks found in the X system.
4. B_sig vertex fit: The charmed mesons produced in B → X_c ℓ⁺ν_ℓ transitions have a longer lifetime than the charmless hadrons produced in B → X_u ℓ⁺ν_ℓ decays. This can be exploited by carrying out a vertex fit using the lepton and all charged constituents of the X system not identified as kaons; we use its χ² value as a discriminator.
5. Q_tot: The total event charge, calculated from the X system plus the lepton on the signal side and from the B_tag constituents. Due to the larger average multiplicity of B → X_c ℓ⁺ν_ℓ, the expected net zero event charge is more often violated than in B → X_u ℓ⁺ν_ℓ candidate events.

We use the BDT implementation of Ref.
[61] and train a classifier O_BDT with simulated B → X_u ℓ⁺ν_ℓ and B → X_c ℓ⁺ν_ℓ events, which are not used in the later analysis. Ref. [61] uses optimized boosting and pruning procedures to maximize the classification performance. We choose a selection criterion on O_BDT that rejects 98.7% of B → X_c ℓ⁺ν_ℓ background and retains 18.5% of B → X_u ℓ⁺ν_ℓ signal. This working point was chosen by maximizing the significance of the most inclusive partial branching fraction, taking into account the full set of systematic uncertainties and the full analysis procedure. The stability of the result as a function of the BDT selection is further discussed in Section VIII.

Table II lists the efficiencies for signal and B → X_c ℓ⁺ν_ℓ background for the M_bc and the BDT selections. Figure 4 shows the output of the background suppression BDT for MC and data. The classifier output shows good agreement between simulated and observed data over the full range. A comparison of the shapes of all input variables for B → X_u ℓ⁺ν_ℓ and B → X_c ℓ⁺ν_ℓ, and further MC and data comparisons, can be found in Appendix B.
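The slow-pion approximation of Eq. (17) and the cos θ_{B,D*ℓ} variable of Eq. (18) used in the D* veto can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code: it assumes the D*⁺ → D⁰π⁺ channel and a neutral B with PDG masses, and it clamps the D* momentum to zero when the approximation yields E_D* < m_D*, which can happen for very soft pions (the clamping is a choice made for this sketch).

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

# PDG masses in GeV (assumption: D*+ -> D0 pi+ channel, neutral B meson).
M_DSTAR = 2.01026
M_D = 1.86484
M_PI = 0.13957
M_B = 5.27966


def norm(v: Vec3) -> float:
    return math.sqrt(sum(c * c for c in v))


def dstar_from_slow_pion(p_pi: Vec3) -> Tuple[float, Vec3]:
    """Approximate the D* energy and three-momentum from a slow pion (Eq. 17)."""
    p_abs = norm(p_pi)
    if p_abs == 0.0:
        raise ValueError("pion momentum must be non-zero to define a direction")
    e_pi = math.sqrt(M_PI**2 + p_abs**2)
    # The slow pion carries roughly the fraction (m_D* - m_D) / m_D* of the D* energy.
    e_dstar = M_DSTAR / (M_DSTAR - M_D) * e_pi
    # Clamp: for very soft pions the approximation can give E_D* < m_D*.
    p_dstar_abs = math.sqrt(max(e_dstar**2 - M_DSTAR**2, 0.0))
    # The D* is assumed to fly along the slow-pion direction.
    p_dstar = tuple(p_dstar_abs * c / p_abs for c in p_pi)
    return e_dstar, p_dstar


def cos_theta_b_y(e_beam: float, e_y: float, p_y: Vec3, m_b: float = M_B) -> float:
    """cos(theta_B,Y) between the B and a visible system Y (Eq. 18),
    assuming the only missing particle is a massless neutrino."""
    m_y2 = e_y**2 - norm(p_y) ** 2
    p_b_abs = math.sqrt(e_beam**2 - m_b**2)
    return (2.0 * e_beam * e_y - m_b**2 - m_y2) / (2.0 * p_b_abs * norm(p_y))
```

In the D* veto the visible system Y is D*ℓ, so one would pass E_Y = E_D* + E_ℓ and p_Y = p_D* + p_ℓ, with E_beam ≈ 5.29 GeV at the Υ(4S). Genuine B → D*ℓ⁺ν_ℓ candidates populate |cos θ_{B,D*ℓ}| ≤ 1 up to resolution effects, while combinations far outside this range are unlikely to come from such decays.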

FIG. 4. The shape of the background suppression classifier O_BDT (x axis: BDT classifier output). MC is divided into B → X_u ℓ⁺ν_ℓ signal, the dominant B → X_c ℓ⁺ν_ℓ background, and all other contributions. To increase visibility, the B → X_u ℓ⁺ν_ℓ component is also shown with a scaling factor (red dashed line). The uncertainties on the MC contain the full systematic errors and are further discussed in Section V.

TABLE II. The selection efficiencies for B → X_u ℓ⁺ν_ℓ signal, B → X_c ℓ⁺ν_ℓ background, and data after the reconstruction of the B_tag and lepton candidate. The nominal selection requirement on the BDT classifier O_BDT is 0.85; the other two requirements were introduced to test the stability of the result, cf. Section VIII.

Selection          B → X_u ℓ⁺ν_ℓ   B → X_c ℓ⁺ν_ℓ   Data
M_bc > 5.27 GeV    84.8%           83.8%           80.2%
O_BDT > 0.85       18.5%           1.3%            1.6%
O_BDT > 0.83       21.9%           1.7%            2.1%
O_BDT > 0.87       14.5%           0.9%            1.1%
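The Table II efficiencies can be turned into two quick figures of merit, computed in the short Python sketch below: the enrichment of signal relative to B → X_c ℓ⁺ν_ℓ, ε_u/ε_c, and a purely statistical, background-dominated figure of merit, ε_u/√ε_c. Both are illustrative simplifications; as stated above, the nominal working point was chosen by maximizing the significance including the full set of systematic uncertainties, which these ratios ignore.

```python
import math

# Selection efficiencies in percent from Table II, keyed by the O_BDT threshold.
TABLE_II = {
    0.83: {"xu": 21.9, "xc": 1.7, "data": 2.1},
    0.85: {"xu": 18.5, "xc": 1.3, "data": 1.6},
    0.87: {"xu": 14.5, "xc": 0.9, "data": 1.1},
}


def enrichment(eff_u: float, eff_c: float) -> float:
    """Factor by which the X_u / X_c ratio grows after the cut."""
    return eff_u / eff_c


def stat_fom(eff_u: float, eff_c: float) -> float:
    """Background-dominated statistical figure of merit eps_u / sqrt(eps_c).

    Ignores systematic uncertainties, unlike the optimization described in the text.
    """
    return eff_u / math.sqrt(eff_c)


for cut, eff in sorted(TABLE_II.items()):
    print(
        f"O_BDT > {cut:.2f}: enrichment = {enrichment(eff['xu'], eff['xc']):.1f}, "
        f"eps_u/sqrt(eps_c) = {stat_fom(eff['xu'], eff['xc']):.2f}"
    )
```

With these inputs the enrichment grows with the cut value (about 12.9, 14.2, and 16.1 for 0.83, 0.85, and 0.87), while ε_u/√ε_c is largest for the loosest requirement (about 16.8 at 0.83 versus 16.2 at 0.85). A purely statistical ranking of these three points would therefore not select 0.85, which is consistent with systematic uncertainties having entered the actual optimization.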

D. Tagging Efficiency Calibration

The reconstruction efficiency of the hadronic full reconstruction algorithm of Ref. [59] differs between simulated samples and the reconstructed data. This difference mainly arises from imperfections, e.g. in the simulation of detector responses, particle identification efficiencies, or incorrect branching fractions in the reconstructed decay cascades. To address this, the reconstruction efficiency is calibrated using a data-driven approach that closely follows the procedure outlined in Ref. [32]. We select events with a fully reconstructed B_tag by requiring exactly one lepton on the signal side, and apply the same B_tag and lepton selection criteria outlined in the previous section. This B → Xℓ⁺ν_ℓ enriched sample is divided into groups of subsamples according to the B_tag decay channel and the multivariate classifier output O_FR used in the hierarchical reconstruction. Each of these groups of subsamples is studied individually to derive a calibration factor for the hadronic tagging efficiency: the calibration factor is obtained by comparing the number of inclusive semileptonic B-meson decays in data, N(B → Xℓ⁺ν_ℓ), with the expectation from the simulated samples, N_MC(B → Xℓ⁺ν_ℓ). The semileptonic yield is determined via a binned maximum-likelihood fit of the lepton energy spectrum. To reduce the modeling dependence of the B → Xℓ⁺ν_ℓ sample, this is done with a coarse granularity of five bins. The calibration factor of each of these groups of subsamples is given by

$$ C_{\mathrm{tag}}(B_{\mathrm{tag}}\ \mathrm{mode}, \mathcal{O}_{FR}) = \frac{N(B \to X \ell^+ \nu_\ell)}{N_{MC}(B \to X \ell^+ \nu_\ell)}. $$ (19)

The free parameters in the fit are the yield of semileptonic B → Xℓ⁺ν_ℓ decays, the yield of backgrounds from fake leptons, and the yield of backgrounds from true leptons. Approximately 1200 calibration factors are determined this way. The leading uncertainties on the C_tag factors are from the assumed B → Xℓ⁺ν_ℓ composition and the lepton PID performance, cf. Section V. We also apply corrections to the continuum efficiency. These are derived from the off-resonance sample by comparing the number of reconstructed off-resonance events in data with the simulated on-resonance continuum events, correcting for differences in the selection.

TABLE III. The binning choices of the four fits.

Fit variable   Bins
M_X            [0, …] GeV
q²             [0, …, 26] GeV²
E_ℓ^B          15 equidistant bins in [1, …] GeV and one bin [2.…, 2.7] GeV
M_X : q²       M_X ∈ [0, …] GeV × q² ∈ [0, …, 26] GeV²
               M_X ∈ [1.…, …] GeV × q² ∈ [0, …, 26] GeV²
               M_X ∈ [1.…, …] GeV × q² ∈ [0, …, 26] GeV²
               M_X ∈ [2.…, …] GeV × q² ∈ [0, …, 26] GeV²

IV. FITTING PROCEDURE

In order to determine the B → X_u ℓ⁺ν_ℓ signal yield and constrain all backgrounds, we perform a binned likelihood fit in the discriminating variables. To reduce the dependence on the precise modeling of the B → X_u ℓ⁺ν_ℓ signal, we use coarse bins over regions that are very sensitive to the admixture of resonant and non-resonant decays, cf. Section II. The total likelihood function is constructed as the product of individual Poisson distributions P,

$$ \mathcal{L} = \prod_{i}^{\mathrm{bins}} \mathcal{P}(n_i; \nu_i) \times \prod_k \mathcal{G}_k, $$ (20)

with n_i denoting the number of observed data events and ν_i the total number of expected events in a given bin i. Here, G_k are nuisance-parameter (NP) constraints, whose role is to incorporate systematic uncertainties of a source k into the fit. Their construction is further discussed in Section V. The number of expected events in a given bin, ν_i, is estimated using simulated collision events and is given by

$$ \nu_i = \sum_{k}^{\mathrm{processes}} f_{ik}\, \eta_k, $$ (21)

with η_k denoting the total number of events from a given process k, and f_ik the fraction of such events reconstructed in bin i, as determined by the MC simulation.

We carry out four separate fits to measure three partial branching fractions, each using different discriminating variables to determine the B → X_u ℓ⁺ν_ℓ yield. The fits and variables are:

1. The hadronic mass, M_X: Signal is expected to predominantly populate the low hadronic mass region, whereas the remaining B → X_c ℓ⁺ν_ℓ background produces a sharp peak at around M_X ≈ … GeV; the reconstruction of the X system, however, results in a non-negligible amount of these backgrounds also being present in the low and high M_X regions.
2. The four-momentum-transfer squared, q²: Signal will on average have a higher q² than the B → X_c ℓ⁺ν_ℓ background, whose kinematic endpoint is q² = (m_B − m_D)² ≈ 11.6 GeV².
However, the reconstructed q² of B → X_c ℓ⁺ν_ℓ events is smeared over the entire kinematic range due to the sizeable resolution of the inclusive X-system and B_tag reconstruction.
3. The lepton energy in the B-meson rest frame, E_ℓ^B: Signal and B → X_c ℓ⁺ν_ℓ can be separated beyond the kinematic endpoint of the B → X_c ℓ⁺ν_ℓ background, which is (m_B² − m_D² + m_ℓ²)/(2m_B) ≈ 2.31 GeV. The lepton energy, E_ℓ^B = |p_ℓ^B|, has excellent resolution. This makes the measurement more sensitive to the exact composition of the B → X_c ℓ⁺ν_ℓ background and the B → X_u ℓ⁺ν_ℓ signal. To minimize the dependence on the signal modeling, the endpoint of the lepton spectrum, E_ℓ^B ∈ [2.…, 2.7] GeV, is treated as a single coarse bin in the fit. To reduce the dependence on the exact modeling of B → X_c ℓ⁺ν_ℓ, we require M_X < 1.7 GeV.
4. M_X and q²: We fit M_X and q² simultaneously in a two-dimensional fit (M_X : q²).

A summary of the binning choices is provided in Table III. The likelihood of Eq. (20) is numerically maximized to fit the values of three different components, η_k, to the observed events, using the sequential least squares programming method implementation of Ref. [62]. The three components we determine are:

a) Signal B → X_u ℓ⁺ν_ℓ events that fall inside the phase-space region of the partial branching fraction we wish to determine.
b) Signal B → X_u ℓ⁺ν_ℓ events that fall outside said region. This component has a shape very similar to that of other backgrounds. We thus constrain this component in all fits to its expectation using the world average of B(B → X_u ℓ⁺ν_ℓ) = (2.… ± 0.…) × 10⁻³ [37]. We also investigated different approaches, for instance linking this component with component a). This leads to small shifts of O(0.…).
c) Background from B → X_c ℓ⁺ν_ℓ and secondary B → h_1 h_2, h_1 → h_3 ℓ⁻ν decays, simulated as described in Section III. Here h_1, h_2, and h_3 denote hadronic final states.

Confidence intervals for the three components are constructed using the profile likelihood ratio method. For a given component η_k the ratio is

$$ \Lambda(\eta_k) = -2 \ln \frac{\mathcal{L}(\eta_k, \hat{\hat{\eta}}_{\eta_k}, \hat{\hat{\theta}}_{\eta_k})}{\mathcal{L}(\hat{\eta}_k, \hat{\eta}, \hat{\theta})}, $$ (22)

where η̂_k, η̂, and θ̂ are the values of the component of interest, the remaining components, and a vector of nuisance parameters (NPs), respectively, that maximize the likelihood function, whereas the remaining components η̂̂_{η_k} and nuisance parameters θ̂̂_{η_k} maximize the likelihood for the specific value η_k. In the asymptotic limit, the test statistic of Eq. (22) can be used to construct approximate confidence intervals through

$$ 1 - \mathrm{CL} = \int_{\Lambda(\eta_k)}^{\infty} f_{\chi^2}(x; 1\,\mathrm{dof})\, \mathrm{d}x, $$ (23)

with f_χ²(x; 1 dof) denoting the χ² distribution of the variable x with a single degree of freedom, and CL the desired confidence level. The determined signal yields η̂_k = η̂_sig are translated into partial branching fractions via

$$ \Delta\mathcal{B}(B \to X_u \ell^+ \nu_\ell; \mathrm{Reg.}) = \frac{\hat{\eta}_{\mathrm{sig}} \cdot \epsilon_{\Delta\mathcal{B}}(\mathrm{Reg.})}{4\, (\epsilon_{\mathrm{tag}} \cdot \epsilon_{\mathrm{sel}}) \cdot N_{B\bar{B}}}. $$ (24)

Here ε_tag denotes the tagging efficiency, as determined after applying the calibration factors introduced in Section III D. Further, ε_sel and ε_ΔB(Reg.) denote the signal-side selection efficiency and a correction to the efficiency that accounts for the fraction of the B → X_u ℓ⁺ν_ℓ phase-space region that is measured. The factor of 4 in the denominator accounts for the two B mesons in each of the N_BB̄ = (771.… ± …) × 10⁶ B-meson pairs and for our averaging over electron and muon final states.

To validate the fit procedure, we generated ensembles of pseudoexperiments for different input branching fractions of B → X_u ℓ⁺ν_ℓ signal and B → X_c ℓ⁺ν_ℓ background. Fits to these ensembles show no biases in the central values and no under- or overcoverage of the confidence intervals. Using the current world average of B(B → X_u ℓ⁺ν_ℓ) = (2.… ± 0.…) × 10⁻³, we expect approximately 930 to 2070 B → X_u ℓ⁺ν_ℓ signal events, with significances s = η̂_sig/ε ranging from about 9 to 15 standard deviations depending on the signal region under study, where ε is the expected fit error determined from Asimov data sets [63].

V. SYSTEMATIC UNCERTAINTIES

Several systematic uncertainties affect the determination of the reported partial branching fractions. The most important uncertainties arise from the modeling of the B → X_u ℓ⁺ν_ℓ signal component and from the tagging calibration correction. These are followed by the uncertainties on the particle identification of kaons and leptons, the uncertainty on the number of B-meson pairs, the statistical uncertainty of the MC samples used, and uncertainties related to the track reconstruction efficiency. Table IV summarizes the systematic uncertainties for the five measured partial branching fractions probing three phase-space regions. The table separates uncertainties that originate from the background subtraction ('Additive uncertainties') from uncertainties related to the translation of the fitted signal yields into partial branching fractions ('Multiplicative uncertainties').

The tagging calibration uncertainties are evaluated by producing different sets of calibration factors. These sets take into account the correlation structure from common systematic uncertainties (cf. Section III D) and the fact that individual channels and ranges of the output classifier are statistically independent. When applying the different sets of calibration factors, we observe only negligible changes in the signal and background template shapes, but the overall tagging efficiency is affected. The associated uncertainty from the calibration factors is found to be 3.6% and is identical for the five measured partial branching fractions. The B → X_u ℓ⁺ν_ℓ and B → X_c ℓ⁺ν_ℓ modeling uncertainties directly affect the shapes of the M_X, q², and E_ℓ^B signal and background distributions. Further, the B → X_u ℓ⁺ν_ℓ modeling affects the overall reconstruction efficiencies and the migration of events into and out of the phase-space regions we measure.
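The calibration-set procedure described above can be illustrated with a minimal toy in Python using NumPy. The sketch assumes, for simplicity, a single fully correlated systematic source (a common relative shift of all factors) plus independent Gaussian statistical errors per channel, and it weights the calibrated efficiency by hypothetical MC tag yields. The actual procedure (Section III D) involves about 1200 factors and several correlated sources; all numbers below are hypothetical.

```python
import numpy as np


def sample_calibration_sets(c_nominal, stat_err, common_rel_err, n_sets, seed=1):
    """Draw toy sets of tagging calibration factors.

    Each set is c_nominal * (1 + common_rel_err * z) + stat_err * u, where one
    Gaussian z per set models a fully correlated source and independent
    Gaussians u per channel model statistically independent channels.
    """
    rng = np.random.default_rng(seed)
    c = np.asarray(c_nominal, dtype=float)
    z = rng.standard_normal((n_sets, 1))       # shared across channels
    u = rng.standard_normal((n_sets, c.size))  # independent per channel
    return c * (1.0 + common_rel_err * z) + np.asarray(stat_err, dtype=float) * u


def relative_efficiency_spread(sets, mc_tag_yields):
    """Relative standard deviation of the yield-weighted calibrated efficiency."""
    w = np.asarray(mc_tag_yields, dtype=float)
    eff = sets @ w / w.sum()  # proportional to the calibrated tagging efficiency
    return eff.std(ddof=1) / eff.mean()


# Hypothetical inputs: four channels with nominal factors, errors and MC yields.
c_nom = [0.8, 0.9, 1.1, 0.7]
yields = [1000.0, 500.0, 200.0, 300.0]
toys = sample_calibration_sets(c_nom, stat_err=[0.05] * 4, common_rel_err=0.03, n_sets=20000)
print(f"relative tagging-efficiency uncertainty: {relative_efficiency_spread(toys, yields):.3f}")
```

With only the common 3% shift switched on, the relative spread of the efficiency reproduces the 3% input. Adding independent channel-level errors can only enlarge the spread on average. The 3.6% quoted above comes from the full procedure, not from this toy.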
We evaluate the uncertainties on the composition of the hybrid B → X_u ℓ⁺ν_ℓ MC by variations of the B → πℓ⁺ν_ℓ, B → ρℓ⁺ν_ℓ, B → ωℓ⁺ν_ℓ, B → ηℓ⁺ν_ℓ, and B → η′ℓ⁺ν_ℓ branching fractions and form factors. The uncertainty on non-resonant B → X_u ℓ⁺ν_ℓ contributions in the hybrid model is estimated by changing the underlying model from that of DFN [51] to that of BLNP [17]. In addition, the uncertainties on the DFN parameters m_b^KN and a^KN (cf. Section II) are incorporated. For each of these variations, new hybrid weights are calculated to propagate the uncertainties into shapes and efficiencies. We estimate the uncertainty of X_u fragmentation into s s̄ quark pairs by variations of the corresponding JETSET parameter γ_s (cf. Ref. [54]). As our BDT is trained to reject final states with kaon candidates, a change in this fraction will directly impact the signal efficiency. The s s̄ production probability has been measured by Refs. [64, 65] at center-of-mass energies of 12 and 36 GeV, with values of γ_s = 0.… ± 0.05 and γ_s = 0.… ± 0.06, respectively. We adopt the value and error of γ_s = 0.… ± 0.…. The X_u system of the non-resonant signal component is hadronized by JETSET into final states with two or more pions. We test the impact on the signal efficiency by changing the post-fit charged-pion multiplicity of non-resonant B → X_u ℓ⁺ν_ℓ to the distribution observed in data in the signal-enriched region of M_X < … GeV. The B → X_c ℓ⁺ν_ℓ background after the BDT selection is dominated by B → Dℓ⁺ν_ℓ and B → D*ℓ⁺ν_ℓ decays. We evaluate the uncertainties on the modeling of B → Dℓ⁺ν_ℓ, B → D*ℓ⁺ν_ℓ, and B → D**ℓ⁺ν_ℓ by variations of the BGL parameters and heavy-quark form factors within their uncertainties. In addition, we propagate the branching fraction uncertainties. The uncertainties on the B → X_c ℓ⁺ν_ℓ gap branching fractions are taken to be large enough to account for the difference between the sum of all measured exclusive branching fractions and the measured inclusive branching fraction. We also evaluate the impact on the efficiency of the lepton- and hadron-identification uncertainties and of the overall tracking-efficiency uncertainty. The statistical uncertainty of all generated MC samples is also evaluated and propagated into the systematic errors.

We incorporate the effect of additive systematic uncertainties directly into the likelihood function. This is done by introducing a vector of NPs, θ_k, for each fit template of a process k (e.g. signal or background). Each element of this vector represents one bin of the fitted observable (e.g. M_X, q², E_ℓ^B, or a 2D bin of M_X : q²). These NPs are constrained in the likelihood of Eq. (20) using multivariate Gaussian distributions, G_k = G_k(0; θ_k, Σ_k), where Σ_k denotes the systematic covariance matrix for the template k.
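To make the structure of the constrained fit concrete, the following minimal NumPy sketch evaluates −ln L of Eq. (20) for given yields η_k and NPs θ_ik. The expected counts follow Eq. (21) with the NP-modified fractions of Eq. (26), each template covariance is built as the sum over sources of Eq. (25), and each Gaussian constraint contributes ½ θ_kᵀ Σ_k⁻¹ θ_k. Assumptions of this sketch: constant terms are dropped, the uncertainty vectors are given as relative uncertainties so that they match the (1 + θ) form of Eq. (26), and the minimization itself (performed in the analysis with the SLSQP implementation of Ref. [62]) is not shown.

```python
import numpy as np


def template_covariance(correlated_sources, uncorrelated_sources):
    """Eq. (25): sum of per-source covariances for one template.

    Fully correlated sources contribute outer(sigma, sigma); statistical
    sources contribute diag(sigma**2). A single correlated source alone gives
    a singular (rank-one) matrix, so an uncorrelated source such as MC
    statistics is needed in practice to make the matrix invertible.
    """
    sources = list(correlated_sources) + list(uncorrelated_sources)
    n = len(sources[0])
    cov = np.zeros((n, n))
    for sigma in correlated_sources:
        s = np.asarray(sigma, dtype=float)
        cov += np.outer(s, s)
    for sigma in uncorrelated_sources:
        s = np.asarray(sigma, dtype=float)
        cov += np.diag(s**2)
    return cov


def expected_counts(eta, mc_templates, theta):
    """Eq. (21) with the NP-modified fractions of Eq. (26).

    eta: (n_proc,) yields; mc_templates: (n_bins, n_proc) MC counts;
    theta: (n_bins, n_proc) relative nuisance parameters.
    """
    shifted = np.asarray(mc_templates, dtype=float) * (1.0 + theta)
    fractions = shifted / shifted.sum(axis=0, keepdims=True)
    return fractions @ np.asarray(eta, dtype=float)


def neg_log_likelihood(eta, theta, data, mc_templates, covariances):
    """-ln L of Eq. (20), dropping constant terms."""
    nu = expected_counts(eta, mc_templates, theta)
    poisson = np.sum(nu - np.asarray(data, dtype=float) * np.log(nu))
    constraint = 0.0
    for k, cov in enumerate(covariances):
        th = theta[:, k]
        constraint += 0.5 * th @ np.linalg.solve(cov, th)
    return float(poisson + constraint)
```

In a complete fit one would minimize this function jointly over η and θ, for example with `scipy.optimize.minimize(..., method="SLSQP")`, and scan it in η_k to build the profile likelihood ratio of Eq. (22).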
The covariance Σ_k is the sum over all uncertainty sources for a given template k,

$$ \Sigma_k = \sum_{s}^{\mathrm{error\ sources}} \Sigma_{ks}, $$ (25)

with Σ_ks denoting the covariance matrix of error source s. The covariance matrices Σ_ks depend on uncertainty vectors σ_ks, which represent the absolute error in bins of the fit variable of template k. Uncertainties from the same error source are either fully correlated or, in the case of MC or other statistical uncertainties, uncorrelated. The two cases can be expressed as Σ_ks = σ_ks ⊗ σ_ks and Σ_ks = Diag(σ_ks²), respectively. For particle-identification uncertainties, we estimate Σ_ks using sets of correction tables, sampled according to their statistical and systematic uncertainties. The systematic NPs are incorporated in Eq. (21) by rewriting the fractions f_ik for all templates as

$$ f_{ik} = \frac{\eta^{MC}_{ik}}{\sum_j \eta^{MC}_{jk}} \;\to\; \frac{\eta^{MC}_{ik}\,(1 + \theta_{ik})}{\sum_j \eta^{MC}_{jk}\,(1 + \theta_{jk})}, $$ (26)

to take into account changes in the signal or background shape. Here η^MC_ik denotes the predicted number of MC events in bin i for process k, and θ_ik is the associated nuisance parameter constrained by G_k.

FIG. 5. (Top) The M_X and q² spectra of the selected candidates prior to applying the background BDT. (Bottom) The E_ℓ^B spectrum of the selected candidates prior to applying the background BDT, for events with M_X < 1.7 GeV and M_X > 1.7 GeV. Each panel compares data with the MC expectation, split into B → Dℓ⁺ν_ℓ, B → D*ℓ⁺ν_ℓ, B → D**ℓ⁺ν_ℓ, gap modes, secondary and fake leptons, continuum, and B → X_u ℓ⁺ν_ℓ, together with the MC uncertainty band.

VI. B → X_c ℓ ν̄_ℓ CONTROL REGION

Figure 5 compares the reconstructed M_X, q², and E_ℓ^B distributions with the expectation from MC before applying the background suppression BDT. All corrections are applied, and the MC uncertainty contains all systematic uncertainties discussed in Section V. The agreement in M_X and q² is excellent, but some differences in the shape of the lepton momentum spectrum are seen. This is likely due to imperfections in the modeling of the inclusive B → X_c ℓ⁺ν_ℓ background. The discrepancy is reduced in the M_X < 1.7 GeV region … q² and M_X in two dimensions. We use the lepton spectrum to measure the same regions of phase space, to validate the obtained results.

VII. B → X_u ℓ⁺ν_ℓ SIGNAL REGION

Figure 6 shows the reconstructed M_X, q², and E_ℓ^B distributions after the BDT selection is applied. The B → X_u ℓ⁺ν_ℓ contribution is now clearly visible at low M_X and high E_ℓ^B, and the reconstructed events and the MC expectation show good agreement. The B → X_c ℓ⁺ν_ℓ background is dominated by contributions from B → Dℓ⁺ν_ℓ and B → D*ℓ⁺ν_ℓ decays, and the remaining background is predominantly from secondary leptons and misidentified lepton candidates.


FIG. 6. The $M_X$, $q^2$, and $E_\ell^B$ spectra after applying the background BDT but before the fit. The $B \to X_u\,\ell^+\,\nu_\ell$ contribution is shown in red and scaled to the world average of $\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (2.\ldots \pm \ldots) \times 10^{-3}$. The agreement between data and MC is reasonable in all variables. The $E_\ell^B$ spectra are shown with selections of $M_X < 1.7$ GeV and $M_X > 1.7$ GeV. … $M_X < 1.7$ GeV … $B \to X_c\,\ell^+\,\nu_\ell$ modeling of higher charmed states.

TABLE IV. The fractional uncertainties (in %) on the extracted $B \to X_u\,\ell^+\,\nu_\ell$ partial branching fractions. For definitions of additive and multiplicative errors, see text. The columns correspond to the phase-space regions (a) $M_X < 1.7$ GeV, $E_\ell^B > 1$ GeV ($M_X$ fit); (b) $M_X < 1.7$ GeV, $E_\ell^B > 1$ GeV ($E_\ell^B$ fit); (c) $M_X < 1.7$ GeV, $q^2 > 8$ GeV$^2$, $E_\ell^B > 1$ GeV ($q^2$ fit); (d) $E_\ell^B > 1$ GeV ($E_\ell^B$ fit); (e) $E_\ell^B > 1$ GeV ($M_X$:$q^2$ fit).

| Source | (a) | (b) | (c) | (d) | (e) |
|---|---|---|---|---|---|
| Additive uncertainties | | | | | |
| $B \to X_u\,\ell^+\,\nu_\ell$ modeling | | | | | |
| $B \to \pi\,\ell^+\,\nu_\ell$ FFs | 0.1 | 0.7 | 1.4 | 0.6 | 0.4 |
| $B \to \rho\,\ell^+\,\nu_\ell$ FFs | 0.2 | 1.9 | 4.3 | 1.9 | 0.7 |
| $B \to \omega\,\ell^+\,\nu_\ell$ FFs | 0.5 | 3.2 | 5.2 | 3.1 | 0.8 |
| $B \to \eta\,\ell^+\,\nu_\ell$ FFs | 0.1 | 0.7 | 1.4 | 0.8 | 0.3 |
| $B \to \eta'\,\ell^+\,\nu_\ell$ FFs | 0.1 | 0.7 | 1.4 | 0.8 | 1.2 |
| $\mathcal{B}(B \to \pi\,\ell^+\,\nu_\ell)$ | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 |
| $\mathcal{B}(B \to \rho\,\ell^+\,\nu_\ell)$ | 0.3 | 0.7 | 0.8 | 0.5 | 0.4 |
| $\mathcal{B}(B \to \omega\,\ell^+\,\nu_\ell)$ | <… | … | … | … | … |
| $\mathcal{B}(B \to \eta\,\ell^+\,\nu_\ell)$ | <… | <… | <… | … | … |
| $\mathcal{B}(B \to \eta'\,\ell^+\,\nu_\ell)$ | <… | <… | <… | … | … |
| $\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell)$ | 0.7 | 2.0 | 2.1 | 2.1 | 2.1 |
| DFN parameters | 2.3 | 3.5 | 1.1 | 3.5 | 5.0 |
| Hybrid model | 2.7 | 8.7 | 4.6 | 8.7 | 3.1 |
| $B \to X_c\,\ell^+\,\nu_\ell$ modeling | | | | | |
| $B \to D\,\ell^+\,\nu_\ell$ FFs | 0.1 | 0.1 | 0.9 | 0.1 | <… |
| $B \to D^*\,\ell^+\,\nu_\ell$ FFs | 1.4 | 1.2 | 3.0 | 1.3 | 1.1 |
| $B \to D^{**}\,\ell^+\,\nu_\ell$ FFs | 0.4 | 0.5 | 0.3 | 0.5 | 0.4 |
| $\mathcal{B}(B \to D\,\ell^+\,\nu_\ell)$ | 0.1 | <… | <… | … | … |
| $\mathcal{B}(B \to D^*\,\ell^+\,\nu_\ell)$ | <… | <… | <… | … | … |
| $\mathcal{B}(B \to D^{**}\,\ell^+\,\nu_\ell)$ | 0.6 | 0.1 | 0.3 | 0.1 | 0.5 |
| Gap modeling | 1.1 | 0.1 | 0.3 | 0.1 | 1.0 |
| MC statistics | 1.3 | 1.6 | 3.8 | 1.7 | 1.6 |
| Tracking efficiency | 0.3 | - | 0.8 | - | 0.4 |
| $\ell$ ID shape | 1.0 | 0.5 | 1.3 | 0.6 | 1.2 |
| $K/\pi$ ID shape | 1.2 | - | 1.3 | - | 1.0 |
| $D \to X\,\ell\,\nu_\ell$ | … | … | … | … | … |
| $\pi_s$ efficiency | <… | … | … | … | … |
| Multiplicative uncertainties | | | | | |
| $B \to X_u\,\ell^+\,\nu_\ell$ modeling | | | | | |
| $B \to \pi\,\ell^+\,\nu_\ell$ FFs | 0.2 | 0.2 | 1.9 | 0.2 | 0.2 |
| $B \to \rho\,\ell^+\,\nu_\ell$ FFs | 0.7 | 0.8 | 3.7 | 0.8 | 0.6 |
| $B \to \omega\,\ell^+\,\nu_\ell$ FFs | 1.3 | 1.6 | 6.1 | 1.6 | 1.1 |
| $B \to \eta\,\ell^+\,\nu_\ell$ FFs | 0.3 | 0.3 | 1.7 | 0.3 | 0.2 |
| $B \to \eta'\,\ell^+\,\nu_\ell$ FFs | 0.2 | 0.3 | 1.7 | 0.3 | 0.2 |
| $\mathcal{B}(B \to \pi\,\ell^+\,\nu_\ell)$ | 0.3 | 0.4 | 0.4 | 0.4 | 0.3 |
| $\mathcal{B}(B \to \rho\,\ell^+\,\nu_\ell)$ | 0.4 | 0.6 | 0.6 | 0.6 | 0.4 |
| $\mathcal{B}(B \to \omega\,\ell^+\,\nu_\ell)$ | <… | <… | <… | <… | … |
| $\mathcal{B}(B \to \eta\,\ell^+\,\nu_\ell)$ | 0.1 | 0.1 | <… | <… | … |
| $\mathcal{B}(B \to \eta'\,\ell^+\,\nu_\ell)$ | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| $\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell)$ | 3.0 | 3.2 | 2.9 | 4.8 | 3.8 |
| DFN parameters | 2.5 | 2.5 | 2.7 | 6.8 | 3.6 |
| Hybrid model | 0.2 | 0.8 | 1.4 | 4.7 | 2.8 |
| $\pi^+$ multiplicity | 1.7 | 2.5 | 2.3 | 3.1 | 1.7 |
| $\gamma_s$ ($s\bar s$ fragmentation) | 0.5 | 0.8 | 1.1 | 1.1 | 0.8 |
| $\ell$ ID efficiency | 1.5 | 1.6 | 1.6 | 1.6 | 1.5 |
| $K/\pi$ ID efficiency | 0.7 | 0.6 | 0.6 | 0.6 | 0.7 |
| $N_{B\bar B}$ | … | … | … | … | … |
| Total syst. uncertainty | 7.8 | 12.6 | 14.6 | 15.4 | 10.4 |

[Figure 7 panels: post-fit $M_X$ (GeV) and $q^2$ (GeV$^2$) distributions with pulls; components: background, signal outside the measured region (signal-out), signal inside it (signal-in), data, and MC uncertainty.]

FIG. 7. The post-fit distributions of the one-dimensional fits to $M_X$ and $q^2$, corresponding to the measured partial branching fractions for $E_\ell^B > 1$ GeV with $M_X < 1.7$ GeV (top) and with $M_X < 1.7$ GeV, $q^2 > 8$ GeV$^2$ (bottom), respectively.

VIII. RESULTS

We report partial branching fractions for three phase-space regions from five fits to the reconstructed variables introduced in Section IV. All partial branching fractions correspond to a selection with $E_\ell^B > 1$ GeV.

A. Partial Branching Fraction Results

For the partial branching fraction with $M_X < 1.7$ GeV, obtained from a fit to $M_X$, we find

$$
\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{27}
$$

with the first and second errors denoting the statistical and systematic uncertainties, respectively. The resulting post-fit distribution is shown in the top panel of Figure 7. With this selection, about 56% of the available $B \to X_u\,\ell^+\,\nu_\ell$ phase space is probed. The partial branching fraction is in good agreement with the value obtained by fitting $E_\ell^B$ and correcting to the same phase space. That fit is shown in Figure 8, and we measure

$$
\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{28}
$$

with larger systematic and statistical uncertainties than Eq. 27. To further probe the $B \to X_u\,\ell^+\,\nu_\ell$-enriched region, we carry out a measurement for $M_X < 1.7$ GeV and $q^2 > 8$ GeV$^2$ from a fit to the $q^2$ spectrum. This selection probes only about 30% of the available $B \to X_u\,\ell^+\,\nu_\ell$ phase space. We find

$$
\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (0.\ldots \pm \ldots \pm \ldots) \times 10^{-3}. \tag{29}
$$

The corresponding post-fit distribution of $q^2$ is shown in the bottom panel of Figure 7. The most precise determination of $B \to X_u\,\ell^+\,\nu_\ell$ is obtained from a two-dimensional fit, exploiting the combined discriminating power of $M_X$ and $q^2$. The resulting partial branching fraction probes about 85% of the available $B \to X_u\,\ell^+\,\nu_\ell$ phase space. We measure

$$
\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (1.59 \pm 0.07 \pm 0.16) \times 10^{-3}. \tag{30}
$$

The projection of the two-dimensional fit onto $M_X$, and the $q^2$ distribution for the signal-enriched low-$M_X$ region, are shown in Figure 9; the remaining $q^2$ distributions are given in Appendix D. The partial branching fraction is also in good agreement with the measurement obtained by fitting $E_\ell^B$, covering the same phase space (cf. Figure 8):

$$
\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}. \tag{31}
$$

The uncertainties are larger, but both results are compatible. The nuisance parameter pulls of all fits are provided in Appendix D. The result of Eq. 30 can further be compared with the most precise measurement of this region to date, Ref. [66], which reports $\Delta\mathcal{B}(B \to X_u\,\ell\,\nu_\ell) = (1.\ldots \pm \ldots) \times 10^{-3}$; the two agree well. The measurement can also be compared to Ref. [15], which uses a similar experimental approach. The partial branching fraction measured there for $E_\ell^B > 1$ GeV is $\Delta\mathcal{B}(B \to X_u\,\ell\,\nu_\ell) = (1.\ldots \pm \ldots) \times 10^{-3}$, which is compatible with Eq. 30 within 0.9 standard deviations. Belle previously reported in Ref. [16], also using a similar approach for the same phase space, a higher value of $\Delta\mathcal{B}(B \to X_u\,\ell\,\nu_\ell) = (1.\ldots \pm \ldots) \times 10^{-3}$. We cannot quantify the statistical overlap between both results, but by comparing the numbers of determined signal events one can estimate it to be below 55%. The dominant systematic uncertainties of Ref. [16] were evaluated using different approaches, but fully correlating the dominant systematic uncertainties and assuming a statistical correlation of 55%, we obtain a compatibility of 1.7 standard deviations. The main difference between this analysis and Ref. [16] lies in the modeling of signal and background processes: since its publication our understanding has improved, and more precise measurements of branching fractions and form factors have become available. Further, for the $B \to X_u\,\ell^+\,\nu_\ell$ signal process this paper adopts a hybrid approach (see Section II and Appendix A), whereas Ref. [16] modeled the signal as a mix of inclusive and exclusive decay modes. Note that this work supersedes Ref. [16].

[Figure 8 panels: post-fit $E_\ell^B$ (GeV) distribution with pulls; components: background, signal-out, signal-in, data, and MC uncertainty.]

FIG. 8. The post-fit distributions of the fit to $E_\ell^B$ with $M_X < 1.7$ GeV, corresponding to the partial branching fractions for $E_\ell^B > 1$ GeV, $M_X < 1.7$ GeV, ….

TABLE V. The fitted signal yields inside ($\hat\eta_{\mathrm{sig}}$) and outside ($\hat\eta_{\mathrm{sig-out}}$) the measured phase-space regions, the background yields ($\hat\eta_{\mathrm{bkg}}$), and the product of tagging and selection efficiency ($\epsilon_{\mathrm{tag}} \cdot \epsilon_{\mathrm{sel}}$).

| Phase-space region | Additional selection | Fit variable(s) | $\hat\eta_{\mathrm{sig}}$ | $\hat\eta_{\mathrm{sig-out}}$ | $\hat\eta_{\mathrm{bkg}}$ | $\epsilon_{\mathrm{tag}} \cdot \epsilon_{\mathrm{sel}}$ |
|---|---|---|---|---|---|---|
| $M_X < 1.7$ GeV, $E_\ell^B > 1$ GeV | - | $M_X$ | 1558 ± 72 | 364 ± 51 | 6912 ± 138 | 0.… ± … |
| $M_X < 1.7$ GeV, $E_\ell^B > 1$ GeV | $M_X < 1.7$ GeV | $E_\ell^B$ | 1285 ± 136 | 22 ± … | … ± 153 | 0.… ± … |
| $M_X < 1.7$ GeV, $q^2 > 8$ GeV$^2$, $E_\ell^B > 1$ GeV | $M_X < 1.7$ GeV | $q^2$ | 938 ± 98 | 474 ± 58 | 1253 ± 194 | 0.… ± … |
| $E_\ell^B > 1$ GeV | $M_X < 1.7$ GeV | $E_\ell^B$ | 1303 ± 138 | - | 1366 ± 154 | 0.… ± … |
| $E_\ell^B > 1$ GeV | - | $M_X$:$q^2$ | 1801 ± 127 | - | 7032 ± 167 | 0.… ± … |

B. $|V_{ub}|$ Determination

We determine $|V_{ub}|$ from the measured partial branching fractions using a range of theoretical rate predictions. In principle, the total $B \to X_u\,\ell^+\,\nu_\ell$ decay rate can be calculated with the same approach as $B \to X_c\,\ell^+\,\nu_\ell$, using the heavy quark expansion (HQE) in inverse powers of $m_b$. Unfortunately, the measurement requirements necessary to separate $B \to X_u\,\ell^+\,\nu_\ell$ from the dominant $B \to X_c\,\ell^+\,\nu_\ell$ background spoil the convergence of this approach. In the predictions for the partial rates corresponding to our measurements, perturbative and non-perturbative uncertainties are strongly enhanced and, as outlined in the introduction, the predictions are sensitive to the modeling of the shape function. The relationship between the measured partial branching fractions, the predicted rates $\Delta\Gamma(B \to X_u\,\ell^+\,\nu_\ell)$ (omitting CKM factors), and $|V_{ub}|$ is

$$
|V_{ub}| = \sqrt{\frac{\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell)}{\tau_B \cdot \Delta\Gamma(B \to X_u\,\ell^+\,\nu_\ell)}}, \tag{32}
$$

with $\tau_B = (1.\ldots \pm \ldots)$ ps the $B$ meson lifetime [37]. We use four predictions for the theoretical partial rates, all with the same input values that Ref. [6] uses for its world averages. The four predictions are:

- BLNP: The prediction of Bosch, Lange, Neubert, and Paz (BLNP) of Ref. [17] is given at next-to-leading-order accuracy in the strong coupling constant $\alpha_s$ and incorporates all known corrections. Predictions are interpolated between the shape-function-dominated region (the endpoint of the lepton spectrum and small hadronic mass) and the region of phase space that can be described via the operator product expansion (OPE). As inputs we use $m_b^{\mathrm{SF}} = 4.\ldots \pm 0.03$ GeV and $\mu_\pi^2 = 0.\ldots^{+0.\ldots}_{-\ldots}$ GeV$^2$.
- DGE: The Dressed Gluon Exponentiation (DGE) of Andersen and Gardi [19, 20] avoids the direct use of shape functions and produces predictions for hadronic observables using the on-shell $b$-quark mass. The calculation is carried out in the $\overline{\mathrm{MS}}$ scheme, and we use $m_b(\overline{\mathrm{MS}}) = 4.\ldots \pm 0.04$ GeV.
- GGOU: The prediction of Gambino, Giordano, Ossola, and Uraltsev (GGOU) [18] incorporates all known perturbative and non-perturbative effects up to $\mathcal{O}(\alpha_s^2\beta_0)$ and $\mathcal{O}(1/m_b^3)$, respectively. The shape-function dependence is incorporated by parametrizing its effects in each structure function with a single light-cone function. The calculation is carried out in the kinetic scheme, and we use as inputs $m_b^{\mathrm{kin}} = 4.\ldots \pm 0.02$ GeV and $\mu_\pi^2 = 0.\ldots \pm 0.08$ GeV$^2$.
- ADFR: The calculation of Aglietti, Di Lodovico, Ferrera, and Ricciardi (ADFR) [21, 22] makes use of the ratio of the $B \to X_u\,\ell^+\,\nu_\ell$ to $B \to X_c\,\ell^+\,\nu_\ell$ rates, soft-gluon resummation at next-to-next-to-leading order, and an effective QCD coupling approach. The calculation uses the $\overline{\mathrm{MS}}$ scheme, and we use $m_b(\overline{\mathrm{MS}}) = 4.\ldots \pm 0.04$ GeV.

Table VI lists the decay rates and their associated uncertainties for the probed regions of phase space, which we use to extract $|V_{ub}|$ from the measured partial branching fractions with Eq. 32.

[Figure 9 panels: post-fit $M_X$ (GeV) and $q^2$ (GeV$^2$) distributions with pulls; components: background, signal, data, and MC uncertainty.]

FIG. 9. The post-fit projection onto $M_X$ of the two-dimensional fit to $M_X$:$q^2$, and the $q^2$ distribution in the signal-enriched low-$M_X$ range. The resulting yields are corrected to correspond to a partial branching fraction with $E_\ell^B > 1$ GeV. The remaining $q^2$ distributions are given in Figure 21 (Appendix D).

C. $|V_{ub}|$ Results
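Equation 32 can be evaluated directly once a partial branching fraction, a lifetime, and a predicted partial rate are chosen. The Python sketch below does this for several rate predictions and then forms a simple arithmetic average of the central values, in the spirit of the averaging used for Eq. 36. The partial branching fraction $1.59 \times 10^{-3}$ is the $E_\ell^B > 1$ GeV result quoted in the abstract; the lifetime `tau_b_ps` and the four rates in `rates_ps` are hypothetical placeholders, not the values of Table VI or Ref. [37], and the helper names are illustrative only.

```python
import math
from statistics import mean


def vub_from_partial_bf(delta_bf: float, tau_b_ps: float, delta_gamma_ps: float) -> float:
    """Eq. 32: |V_ub| = sqrt(Delta B / (tau_B * Delta Gamma)).

    delta_bf       -- measured partial branching fraction (dimensionless)
    tau_b_ps       -- B-meson lifetime in ps
    delta_gamma_ps -- predicted partial rate with |V_ub|^2 factored out, in ps^-1
    """
    if delta_bf <= 0 or tau_b_ps <= 0 or delta_gamma_ps <= 0:
        raise ValueError("all inputs must be positive")
    return math.sqrt(delta_bf / (tau_b_ps * delta_gamma_ps))


def relative_uncertainty_on_vub(sigma_bf: float, delta_bf: float) -> float:
    """|V_ub| scales as sqrt(Delta B), so a relative error on Delta B is halved."""
    return 0.5 * sigma_bf / delta_bf


# E_l^B > 1 GeV partial branching fraction (central value quoted in the abstract).
delta_bf = 1.59e-3

# Hypothetical placeholder inputs, NOT the values used in the paper.
tau_b_ps = 1.58
rates_ps = {"calc_1": 58.0, "calc_2": 62.0, "calc_3": 60.0, "calc_4": 64.0}

vub_values = {name: vub_from_partial_bf(delta_bf, tau_b_ps, rate) for name, rate in rates_ps.items()}
vub_average = mean(vub_values.values())  # simple arithmetic average of the central values

# Statistical error from the abstract: 0.07e-3 on 1.59e-3, i.e. about 2.2% on |V_ub|.
rel_stat = relative_uncertainty_on_vub(0.07e-3, delta_bf)

for name, value in vub_values.items():
    print(f"{name}: |V_ub| = {value * 1e3:.2f} x 10^-3")
print(f"average: {vub_average * 1e3:.2f} x 10^-3, relative stat. error: {rel_stat:.3f}")
```

Applied to the abstract's numbers, the halving rule gives $0.022 \times 4.10 \times 10^{-3} \approx 0.09 \times 10^{-3}$, which matches the statistical error quoted on $|V_{ub}|$ in the abstract. Only the central values are averaged here; how the individual uncertainties are combined follows Ref. [67] and is not reproduced in this sketch.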

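From the partial branching fraction with $E_\ell^B > 1$ GeV and $M_X < 1.7$ GeV, obtained from a fit to $M_X$, we find

$$
\begin{aligned}
|V_{ub}|(\mathrm{BLNP}) &= (3.\ldots \pm \ldots \pm \ldots \pm \ldots) \times 10^{-3}, \\
|V_{ub}|(\mathrm{DGE}) &= \big(\ldots \pm \ldots \pm \ldots{}^{+0.\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{GGOU}) &= \big(\ldots \pm \ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{ADFR}) &= (3.\ldots \pm \ldots \pm \ldots \pm \ldots) \times 10^{-3}.
\end{aligned} \tag{33}
$$

The uncertainties denote the statistical uncertainty, the systematic uncertainty, and the theory error from the partial rate prediction. For the partial branching fraction with $E_\ell^B > 1$ GeV, $M_X < 1.7$ GeV, and $q^2 > 8$ GeV$^2$, we find

$$
\begin{aligned}
|V_{ub}|(\mathrm{BLNP}) &= \big(\ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{DGE}) &= \big(\ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{GGOU}) &= \big(\ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{ADFR}) &= \big(\ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots} \pm \ldots\big) \times 10^{-3}.
\end{aligned} \tag{34}
$$

Finally, the most inclusive determination, with $E_\ell^B > 1$ GeV and a two-dimensional fit of $M_X$ and $q^2$, results in

$$
\begin{aligned}
|V_{ub}|(\mathrm{BLNP}) &= \big(\ldots \pm \ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{DGE}) &= \big(\ldots \pm \ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{GGOU}) &= \big(\ldots \pm \ldots{}^{+0.\ldots}_{-\ldots}{}^{+\ldots}_{-\ldots}\big) \times 10^{-3}, \\
|V_{ub}|(\mathrm{ADFR}) &= \big(\ldots \pm \ldots{}^{+0.\ldots}_{-\ldots} \pm \ldots\big) \times 10^{-3}.
\end{aligned} \tag{35}
$$

In order to quote a single value for $|V_{ub}|$, we adapt the procedure of Ref. [67] and calculate a simple arithmetic average of the most precise determinations in Eq. 35 to obtain

$$
|V_{ub}| = (4.10 \pm 0.09 \pm 0.22 \pm 0.15) \times 10^{-3}. \tag{36}
$$

This value is larger than, but compatible with, the exclusive determination of $|V_{ub}|$ from $B \to \pi\,\ell^+\,\nu_\ell$, $|V_{ub}| = (3.\ldots \pm \ldots \pm \ldots) \times 10^{-3}$, within 1.3 standard deviations.

TABLE VI. The theory rates $\Delta\Gamma(B \to X_u\,\ell^+\,\nu_\ell)$ from the various theory calculations, in units of ps$^{-1}$.

| Phase-space region | BLNP [17] | DGE [19, 20] | GGOU [18] | ADFR [21, 22] |
|---|---|---|---|---|
| $M_X < 1.7$ GeV | $\ldots^{+5.\ldots}_{-\ldots}$ | $\ldots^{+5.\ldots}_{-\ldots}$ | $\ldots^{+3.\ldots}_{-\ldots}$ | $\ldots^{+5.\ldots}_{-\ldots}$ |
| $M_X < 1.7$ GeV, $q^2 > 8$ GeV$^2$ | $\ldots^{+3.\ldots}_{-\ldots}$ | $\ldots^{+2.\ldots}_{-\ldots}$ | $\ldots^{+3.\ldots}_{-\ldots}$ | $\ldots^{+3.\ldots}_{-\ldots}$ |
| $E_\ell^B > 1$ GeV | $\ldots^{+6.\ldots}_{-\ldots}$ | $\ldots^{+3.\ldots}_{-\ldots}$ | $\ldots^{+2.\ldots}_{-\ldots}$ | $\ldots^{+5.\ldots}_{-\ldots}$ |

D. Stability Checks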

To check the stability of the result, we redetermine the partial branching fractions using two additional working points. We change the BDT selection to increase and decrease the amount of $B \to X_c\,\ell^+\,\nu_\ell$ and other backgrounds, and repeat the full analysis procedure. The resulting values of $\Delta\mathcal{B}(B \to X_u\,\ell\,\nu_\ell)$ are determined using the two-dimensional fit of $M_X$:$q^2$ and are shown in Figure 10. The background contamination changes by +37% and −33%.

[Figure 10: $\Delta\mathcal{B}(B \to X_u\,\ell\,\nu_\ell)$ for $E_\ell^B > 1$ GeV versus the BDT classifier cut, with total and statistical uncertainties, the total and statistical uncertainty bands for the nominal cut of 0.85, and the signal and background efficiencies (%).]

FIG. 10. The stability of the determined partial branching fraction $\Delta\mathcal{B}(B \to X_u\,\ell\,\nu_\ell)$ using the $M_X$:$q^2$ fit, studied as a function of the BDT selection requirement. The classifier output selections of 0.83 and 0.87 correspond to signal efficiencies after the pre-selection of 22% and 15%, respectively. These selections increase or decrease the background from $B \to X_c\,\ell^+\,\nu_\ell$ and other processes by 37% and 33%, respectively. The grey and yellow bands show the total and statistical errors, respectively, with the nominal BDT working point of 0.85.

E. $B \to X_u\,\ell^+\,\nu_\ell$ Charged Pion Multiplicity

Modeling the $B \to X_u\,\ell^+\,\nu_\ell$ signal composition is crucial to all presented measurements. One aspect that is difficult to assess is the simulation of the $X_u$ fragmentation: the charmless $X_u$ state can decay via many different channels, producing a number of charged or neutral pions or kaons. In Section V we discussed how we assess the uncertainty on the number of $s\bar s$ quark pairs produced in the $X_u$ fragmentation. Because the BDT removes such events to suppress the dominant $B \to X_c\,\ell^+\,\nu_\ell$ background, no signal-enriched region can easily be obtained for them. The accuracy of the fragmentation into charged pions, however, can be tested in the signal-enriched low-$M_X$ region (Figure 11). The agreement overall lies within the assigned uncertainties, with the data having more events in the zero-multiplicity bin and fewer in the bin with two charged pions. We use this distribution to correct our simulation and to assign an additional uncertainty from the charged pion fragmentation. More details can be found in Section V and Appendix C.

[Figure 11: number of $\pi^\pm$ per event; components: other, $B \to X_c\,\ell\,\nu$, several exclusive and non-resonant $B \to X_u\,\ell\,\nu$ contributions, data, and MC uncertainty.]

FIG. 11. The post-fit charged pion multiplicity for events in the signal-enriched low-$M_X$ region, after the fit of $M_X$:$q^2$. The uncertainty band shown on the MC includes the full systematic uncertainties discussed in Section V.

F. Lepton Flavor Universality and Weak Annihilation Contributions

To test lepton flavor universality in $B \to X_u\,\ell^+\,\nu_\ell$, we also carry out fits to determine the partial branching fractions for electron and muon final states. For this, we categorize the selected events accordingly and carry out a fit to the $M_X$:$q^2$ distributions using the same granularity as the fit described in Section VIII A. We analyze both samples simultaneously, so that shared NPs for the modeling of the signal or background components are correctly correlated. The resulting yields are corrected to a partial branching fraction with $E_\ell^B > 1$ GeV, and we find

$$
\Delta\mathcal{B}(B \to X_u\,e^+\,\nu_e) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{37}
$$
$$
\Delta\mathcal{B}(B \to X_u\,\mu^+\,\nu_\mu) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{38}
$$

with a total correlation of $\rho = 0.57$. The ratio of the electron to the muon final state is

$$
R_{e\mu} = \frac{\Delta\mathcal{B}(B \to X_u\,e^+\,\nu_e)}{\Delta\mathcal{B}(B \to X_u\,\mu^+\,\nu_\mu)} = 0.\ldots \pm \ldots \pm \ldots, \tag{39}
$$

with the first error denoting the statistical uncertainty and the second the systematic uncertainty. We observe no significant deviation from lepton flavor universality. More details on the fit can be found in Appendix E.

Isospin-breaking effects can be studied by separately measuring the partial branching fractions for charged and neutral $B$ meson final states. We determine the ratio

$$
R_{\mathrm{iso}} = \frac{\tau_{B^0}}{\tau_{B^+}} \times \frac{\Delta\mathcal{B}(B^+ \to X_u\,\ell^+\,\nu_\ell)}{\Delta\mathcal{B}(B^0 \to X_u\,\ell^+\,\nu_\ell)}, \tag{40}
$$

by using the information from the composition of the fully reconstructed tag-side $B$-meson decays to separate charged and neutral $B$ candidates. The partial branching fractions are then determined by a simultaneous fit of both samples in $M_X$:$q^2$, to correctly correlate common systematic uncertainties. To account for the small contamination of wrongly assigned $B_{\mathrm{tag}}$ flavors, we use the wrong-tag fractions from our simulation. The measured numbers of signal events in the reconstructed neutral and charged $B$ candidate categories (denoted in the following as $N^0_{\mathrm{reco}}$ and $N^+_{\mathrm{reco}}$) are related to the numbers of neutral and charged $B$ mesons ($N^0_{\mathrm{true}}$ and $N^+_{\mathrm{true}}$) via

$$
N^0_{\mathrm{reco}} = P_{B^0_{\mathrm{true}} \to B^0_{\mathrm{reco}}}\,N^0_{\mathrm{true}} + P_{B^+_{\mathrm{true}} \to B^0_{\mathrm{reco}}}\,N^+_{\mathrm{true}}, \tag{41}
$$
$$
N^+_{\mathrm{reco}} = P_{B^0_{\mathrm{true}} \to B^+_{\mathrm{reco}}}\,N^0_{\mathrm{true}} + P_{B^+_{\mathrm{true}} \to B^+_{\mathrm{reco}}}\,N^+_{\mathrm{true}}. \tag{42}
$$

Here, e.g., $P_{B^0_{\mathrm{true}} \to B^+_{\mathrm{reco}}}$ denotes the probability that a true $B^0$ is identified as a $B^+$ candidate in the reconstruction of the tag-side $B$ meson. In the simulation we find

$$
P_{B^0_{\mathrm{true}} \to B^0_{\mathrm{reco}}} = 0.\ldots, \qquad P_{B^0_{\mathrm{true}} \to B^+_{\mathrm{reco}}} = 0.\ldots, \tag{43}
$$
$$
P_{B^+_{\mathrm{true}} \to B^+_{\mathrm{reco}}} = 0.\ldots, \qquad P_{B^+_{\mathrm{true}} \to B^0_{\mathrm{reco}}} = 0.\ldots. \tag{44}
$$

Using this procedure, we determine the individual partial branching fractions with $E_\ell^B > 1$ GeV,

$$
\Delta\mathcal{B}(B^+ \to X_u\,\ell^+\,\nu_\ell) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{45}
$$
$$
\Delta\mathcal{B}(B^0 \to X_u\,\ell^+\,\nu_\ell) = (1.\ldots \pm \ldots \pm \ldots) \times 10^{-3}, \tag{46}
$$

with a total correlation of $\rho = 0.52$, and for the ratio of Eq. 40

$$
R_{\mathrm{iso}} = 1.\ldots \pm \ldots \pm \ldots, \tag{47}
$$

compatible with the expectation of equal semileptonic rates for both isospin states. Isospin-breaking effects would, for instance, arise from weak annihilation contributions, which can only contribute to charged $B$ meson final states. Using Eq. 47, the relative contribution from weak annihilation processes to the total semileptonic $B \to X_u\,\ell^+\,\nu_\ell$ rate can be constrained via

$$
\frac{\Gamma_{\mathrm{wa}}}{\Gamma(B \to X_u\,\ell^+\,\nu_\ell)} = \frac{f_u}{f_{\mathrm{wa}}} \times (R_{\mathrm{iso}} - 1). \tag{48}
$$

Here $f_u$ is a factor that corrects the measured partial branching fraction to the full inclusive phase space. We estimate it using the DFN model [51] (cf. Section II for details) and find $f_u = 0.86$. We further assume that $f_{\mathrm{wa}} = 1$, as such processes would produce a high-momentum lepton. We obtain

$$
\frac{\Gamma_{\mathrm{wa}}}{\Gamma(B \to X_u\,\ell^+\,\nu_\ell)} = 0.\ldots \pm \ldots, \tag{49}
$$

which translates into a limit of $[-0.\ldots, 0.17]$ at 90% CL. This result is more stringent than the limit of Ref. [15], but weaker than the result of Ref. [68], which directly used the shape of the $q^2$ distribution to constrain weak annihilation processes. Our result is also weaker than the estimates of Refs. [69–72], which constrain weak annihilation contributions to be of the order of 2 to 3%.

IX. SUMMARY AND CONCLUSIONS

We report measurements of partial branching fractions with different requirements on the properties of the hadronic system of the $B \to X_u\,\ell^+\,\nu_\ell$ decay and with a lepton energy of $E_\ell^B > 1$ GeV in the $B$ rest frame, covering 31% to 86% of the available phase space. The sizeable background from semileptonic $B \to X_c\,\ell^+\,\nu_\ell$ decays is suppressed using multivariate methods in the form of a BDT. This approach allows us to reduce such backgrounds to an acceptable level while retaining a high signal efficiency. Signal yields are obtained using a binned likelihood fit in either the reconstructed hadronic mass $M_X$, the four-momentum-transfer squared $q^2$, or the lepton energy $E_\ell^B$. The most precise result is obtained from a two-dimensional fit of $M_X$ and $q^2$. Translated to a partial branching fraction for $E_\ell^B > 1$ GeV, we find

$$
\Delta\mathcal{B}(B \to X_u\,\ell^+\,\nu_\ell) = (1.59 \pm 0.07 \pm 0.16) \times 10^{-3}, \tag{50}
$$

with the errors denoting statistical and systematic uncertainties. The partial branching fraction is compatible with the value obtained by a fit of the lepton energy spectrum $E_\ell^B$ and with the most precise determination of Ref. [66]. In addition, it is stable under variations of the background suppression BDT. From this partial branching fraction we obtain

$$
|V_{ub}| = (4.10 \pm 0.09 \pm 0.22 \pm 0.15) \times 10^{-3} \tag{51}
$$

from an average over four theoretical calculations. This value is higher than, but compatible with, the value of $|V_{ub}|$ from exclusive determinations, within 1.3 standard deviations. The compatibility with the value expected from CKM unitarity from a fit of Ref. [73], $|V_{ub}| = \big(\ldots^{+0.\ldots}_{-\ldots}\big) \times 10^{-3}$, is 1.6 standard deviations. Figure 12 summarizes the situation. The result presented here supersedes Ref. [16]: this paper uses a more efficient tagging algorithm, incorporates improvements in the $B \to X_u\,\ell^+\,\nu_\ell$ signal and $B \to X_c\,\ell^+\,\nu_\ell$ background descriptions, and analyzes the full Belle data set of 711 fb$^{-1}$.
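The compatibility statements above (1.3 standard deviations with respect to the exclusive determination, 1.6 with respect to the CKM fit, and 1.7 for the comparison with Ref. [16] in Section VIII A) compare the difference of two values with the uncertainty of that difference. The Python sketch below shows the standard formula, assuming Gaussian uncertainties and one correlation coefficient per uncertainty component; it illustrates the general technique and is not necessarily the exact procedure used for the quoted numbers. The inclusive inputs are the abstract's $|V_{ub}| = (4.10 \pm 0.09 \pm 0.22 \pm 0.15) \times 10^{-3}$; the second determination and all correlated-example inputs are hypothetical, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

# One uncertainty component: (sigma_1, sigma_2, rho), i.e. the uncertainty of
# measurement 1, of measurement 2, and their assumed correlation coefficient.
Component = Tuple[float, float, float]


def quadrature(*sigmas: float) -> float:
    """Combine independent uncertainties in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))


def compatibility_sigma(x1: float, x2: float, components: Sequence[Component]) -> float:
    """Return |x1 - x2| divided by the standard deviation of the difference.

    Var(x1 - x2) = sum over components of (s1^2 + s2^2 - 2 * rho * s1 * s2).
    """
    variance = 0.0
    for s1, s2, rho in components:
        if not -1.0 <= rho <= 1.0:
            raise ValueError("correlation coefficient must lie in [-1, 1]")
        variance += s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2
    if variance <= 0.0:
        raise ValueError("variance of the difference must be positive")
    return abs(x1 - x2) / math.sqrt(variance)


# Inclusive |V_ub| from the abstract, in units of 10^-3 (stat, syst, theory).
vub_incl = 4.10
sigma_incl = quadrature(0.09, 0.22, 0.15)

# Hypothetical second determination (placeholder values, units of 10^-3).
vub_other, sigma_other = 3.70, 0.15

# Independent measurements: a single component with rho = 0.
z_uncorrelated = compatibility_sigma(vub_incl, vub_other, [(sigma_incl, sigma_other, 0.0)])

# Component-wise correlations, e.g. a 55% statistical correlation plus a fully
# correlated systematic component (all numbers are illustrative).
z_correlated = compatibility_sigma(1.6, 1.9, [(0.07, 0.10, 0.55), (0.16, 0.20, 1.0)])

print(f"total inclusive uncertainty: {sigma_incl:.3f} x 10^-3")
print(f"uncorrelated: {z_uncorrelated:.2f} sigma, correlated: {z_correlated:.2f} sigma")
```

With these placeholder inputs the uncorrelated comparison gives about 1.26 standard deviations. A positive correlation between shared components reduces the uncertainty on the difference and therefore increases the quoted tension, which is why the assumed statistical overlap matters for comparisons such as the one with Ref. [16].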
The measurement of the differential kinematic shapes of $M_X$, $q^2$, and other properties is left for future work. These results will be crucial for future measurements with Belle II that will attempt to use data-driven methods to constrain the shape function directly using $B \to X_u\,\ell^+\,\nu_\ell$ information.

ACKNOWLEDGMENTS

We thank Kerstin Tackmann, Frank Tackmann, Zoltan Ligeti, Ian Stewart, Thomas Mannel, and Keri Voss for useful discussions about the subject matter of this manuscript. LC, WS, RVT, and FB were supported by the DFG Emmy-Noether Grant No. BE 6075/1-1. WS was supported by the Alexander von Humboldt Foundation. FB is dedicating this paper to his father Urs Bernlochner, who sadly passed away during the writing of this manuscript. We miss you so much. We thank the KEKB group for the excellent operation of the accelerator; the KEK cryogenics group for the efficient operation of the solenoid; and the KEK computer group, and the Pacific Northwest National Laboratory (PNNL) Environmental Molecular Sciences Laboratory (EMSL) computing group for strong computing support; and the National Institute of Informatics, and Science Information NETwork 5 (SINET5) for valuable network support. We acknowledge support from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan, the Japan Society for the Promotion of Science (JSPS), and the Tau-Lepton Physics Research Center of Nagoya University; the Australian Research Council including grants DP180102629, DP170102389, DP170102204, DP150103061, FT130100303; Austrian Science Fund (FWF); the National Natural Science Foundation of China under Contracts No. 11435013, No. 11475187, No. 11521505, No. 11575017, No. 11675166, No. 11705209; Key Research Program of Frontier Sciences, Chinese Academy of Sciences (CAS), Grant No. QYZDJ-SSW-SLH011; the CAS Center for Excellence in Particle Physics (CCEPP); the Shanghai Pujiang Program under Grant No. 18PJ1401000; the Ministry of Education, Youth and Sports of the Czech Republic under Contract No. LTT17020; the Carl Zeiss Foundation, the Deutsche Forschungsgemeinschaft, the Excellence Cluster Universe, and the VolkswagenStiftung; the Department of Science and Technology of India; the Istituto Nazionale di Fisica Nucleare of Italy; National Research Foundation (NRF) of Korea Grant Nos. 2016R1D1A1B01010135, 2016R1D1A1B02012900, 2018R1A2B3003643, 2018R1A6A1A06024970, 2018R1D1A1B07047294, 2019K1A3A7A09033840, 2019R1I1A3A01058933; Radiation Science Research Institute, Foreign Large-size Research Facility Application Supporting project, the Global Science Experimental Data Hub Center of the Korea Institute of Science and Technology Information and KREONET/GLORIAD; the Polish Ministry of Science and Higher Education and the National Science Center; the Ministry of Science and Higher Education of the Russian Federation, Agreement 14.W03.31.0026; University of Tabuk research grants S-1440-0321, S-0256-1438, and S-0280-1439 (Saudi Arabia); the Slovenian Research Agency; Ikerbasque, Basque Foundation for Science, Spain; the Swiss National Science Foundation; the Ministry of Education and the Ministry of Science and Technology of Taiwan; and the United States Department of Energy and the National Science Foundation.

[Figure 12: $|V_{ub}|$ from BLNP, DGE, GGOU, and ADFR, our average, the HFLAV exclusive $B \to \pi\ell\nu$ determination, and CKMfitter.]

FIG. 12. The values of $|V_{ub}|$ obtained from the four calculations and their arithmetic average, compared to the determination from exclusive $B \to \pi\,\ell^+\,\nu_\ell$ decays and to the expectation from CKM unitarity [73] without the direct constraints from semileptonic and leptonic decays.

[1] N. Cabibbo, Phys. Rev. Lett. , 531 (1963).
[2] M. Kobayashi and T. Maskawa, Progress of Theoretical Physics , 652 (1973), https://academic.oup.com/ptp/article-pdf/49/2/652/5257692/49-2-652.pdf.
[3] P. Zyla et al. (Particle Data Group, CKM Quark-Mixing Matrix Review), Prog. Theor. Exp. Phys. (2020).
[4] K. Abe et al. (T2K Collaboration), Nature , 339 (2020), arXiv:1910.03887 [hep-ex].
[5] E. Kou, P. Urquijo, et al. (Belle II Collaboration), Prog. Theor. Exp. Phys. , 123C01 (2019), [Erratum: PTEP 2020, 029201 (2020)], arXiv:1808.10567 [hep-ex].
[6] Y. S. Amhis et al. (Heavy Flavor Averaging Group (HFLAV)), (2019), arXiv:1909.12524 [hep-ex].
[7] R. Aaij et al. (LHCb Collaboration), Nature Phys. , 743 (2015), arXiv:1504.01568 [hep-ex].
[8] S. Aoki et al. (Flavour Lattice Averaging Group), Eur. Phys. J. C , 113 (2020), arXiv:1902.08191 [hep-lat].
[9] A. Bharucha, JHEP , 092 (2012), arXiv:1203.1359 [hep-ph].
[10] P. Gambino and N. Uraltsev, Eur. Phys. J. C , 181 (2004), arXiv:hep-ph/0401063.
[11] C. W. Bauer, Z. Ligeti, M. Luke, A. V. Manohar, and M. Trott, Phys. Rev. D , 094017 (2004), arXiv:hep-ph/0408002.
[12] D. Benson, I. I. Bigi, and N. Uraltsev, Nucl. Phys. B , 371 (2005), arXiv:hep-ph/0410080.
[13] F. U. Bernlochner, H. Lacker, Z. Ligeti, I. W. Stewart, F. J. Tackmann, and K. Tackmann (SIMBA Collaboration), (2020), arXiv:2007.04320 [hep-ph].
[14] P. Gambino, K. J. Healey, and C. Mondino, Phys. Rev. D , 014031 (2016), arXiv:1604.07598 [hep-ph].
[15] J. Lees et al. (BaBar Collaboration), Phys. Rev. D , 032004 (2012), arXiv:1112.0702 [hep-ex].
[16] P. Urquijo et al. (Belle Collaboration), Phys. Rev. Lett. , 021801 (2010), arXiv:0907.0379 [hep-ex].
[17] B. O. Lange, M. Neubert, and G. Paz, Phys. Rev. D , 073006 (2005), arXiv:hep-ph/0504071.
[18] P. Gambino, P. Giordano, G. Ossola, and N. Uraltsev, JHEP , 058 (2007), arXiv:0707.2493 [hep-ph].
[19] J. R. Andersen and E. Gardi, JHEP , 097 (2006), arXiv:hep-ph/0509360.
[20] E. Gardi, Frascati Phys. Ser. , 381 (2008), arXiv:0806.4524 [hep-ph].
[21] U. Aglietti, F. Di Lodovico, G. Ferrera, and G. Ricciardi, Eur. Phys. J. C , 831 (2009), arXiv:0711.0860 [hep-ph].
[22] U. Aglietti, G. Ferrera, and G. Ricciardi, Nucl. Phys. B , 85 (2007), arXiv:hep-ph/0608047.
[23] M. Bona et al. (UTFit Collaboration), Presentation at the ICHEP 2020 Conference (Online) (2020).
[24] S. Kurokawa and E. Kikutani, Nucl. Instr. and Meth. A 499, 1 (2003), and other papers included in this Volume; T. Abe et al., Prog. Theor. Exp. Phys. , 03A001 (2013) and references therein.
[25] A. Abashian et al., Nucl. Instrum. Meth. A 479, 117 (2002); also see detector section in J. Brodzicka et al., Prog. Theor. Exp. Phys. , 04D001 (2012).
[26] K. Hanagaki, H. Kakuno, H. Ikeda, T. Iijima, and T. Tsukamoto, Nucl. Instr. and Meth. A 485, 490 (2002).
[27] A. Abashian et al., Nucl. Instr. and Meth. A 491, 69 (2002).
[28] D. J. Lange, Nucl. Instr. and Meth. A 462, 152 (2001).
[29] R. Brun, F. Bruyant, M. Maire, A. C. McPherson, and P. Zanarini, CERN-DD-EE-84-1 (1987).
[30] E. Barberio, B. van Eijk, and Z. Was, Comput. Phys. Commun. , 115 (1991).
[31] C. G. Boyd, B. Grinstein, and R. F. Lebed, Phys. Rev. Lett. , 4603 (1995), arXiv:hep-ph/9412324 [hep-ph].
[32] R. Glattauer et al. (Belle Collaboration), Phys. Rev. D , 032006 (2016), arXiv:1510.03657 [hep-ex].
[33] B. Grinstein and A. Kobach, Phys. Lett. B 771, 359 (2017), arXiv:1703.08170 [hep-ph].
[34] D. Bigi, P. Gambino, and S. Schacht, Phys. Lett. B 769, 441 (2017), arXiv:1703.06124 [hep-ph].
[35] E. Waheed et al. (Belle Collaboration), Phys. Rev. D , 052007 (2019), arXiv:1809.03290 [hep-ex].
[36] F. U. Bernlochner and Z. Ligeti, Phys. Rev. D , 014022 (2017), arXiv:1606.09300 [hep-ph].
[37] P. Zyla et al. (Particle Data Group), Prog. Theor. Exp. Phys. (2020).
[38] D. Liventsev et al. (Belle Collaboration), Phys. Rev. D , 091503 (2008), arXiv:0711.3252 [hep-ex].
[39] B. Aubert et al. (BaBar Collaboration), Phys. Rev. Lett. , 261802 (2008), arXiv:0808.0528 [hep-ex].
[40] J. Abdallah et al. (DELPHI Collaboration), Eur. Phys. J. C , 35 (2006), arXiv:hep-ex/0510024.
[41] A. K. Leibovich, Z. Ligeti, I. W. Stewart, and M. B. Wise, Phys. Rev. D , 308 (1998), arXiv:hep-ph/9705467.
[42] I. Bigi, B. Blossier, A. Le Yaouanc, L. Oliver, O. Pene, J.-C. Raynal, A. Oyanguren, and P. Roudeau, Eur. Phys. J. C , 975 (2007), arXiv:0708.1621 [hep-ph].
[43] R. Aaij et al. (LHCb Collaboration), Phys. Rev. D , 092001 (2011), [Erratum: Phys. Rev. D 85, 039904 (2012)], arXiv:1109.6831 [hep-ex].
[44] J. Lees et al. (BaBar Collaboration), Phys. Rev. Lett. , 041801 (2016), arXiv:1507.08303 [hep-ex].
[45] C. Bourrely, I. Caprini, and L. Lellouch, Phys. Rev. D 79, 013008 (2009), [Erratum: Phys. Rev. D 82, 099902 (2010)], arXiv:0807.2722 [hep-ph].
[46] J. A. Bailey et al. (Fermilab Lattice and MILC Collaborations), Phys. Rev. D , 014024 (2015), arXiv:1503.07839 [hep-lat].
[47] A. Sibidanov et al. (Belle Collaboration), Phys. Rev. D , 032005 (2013), arXiv:1306.2781 [hep-ex].
[48] J. P. Lees et al. (BaBar Collaboration), Phys. Rev. D 87, 032004 (2013), [Erratum: Phys. Rev. D 87, no. 9, 099904 (2013)], arXiv:1205.6245 [hep-ex].
[49] P. del Amo Sanchez et al. (BaBar Collaboration), Phys. Rev. D 83, 032007 (2011), arXiv:1005.3288 [hep-ex].
[50] G. Duplancic and B. Melic, JHEP , 138 (2015), arXiv:1508.05287 [hep-ph].
[51] F. De Fazio and M. Neubert, JHEP , 017 (1999), arXiv:hep-ph/9905351 [hep-ph].
[52] A. L. Kagan and M. Neubert, Eur. Phys. J. C , 5 (1999), arXiv:hep-ph/9805303.
[53] O. Buchmuller and H. Flacher, Phys. Rev. D , 073008 (2006), arXiv:hep-ph/0507253.
[54] T. Sjöstrand, Comput. Phys. Commun. , 74 (1994).
[55] C. Ramirez, J. F. Donoghue, and G. Burdman, Phys. Rev. D 41, 1496 (1990).
[56] M. Prim et al. (Belle Collaboration), Phys. Rev. D , 032007 (2020), arXiv:1911.03186 [hep-ex].
[57] M. Prim, "b2-hive/effort v0.1.0," (2020).
[58] B. O. Lange, M. Neubert, and G. Paz, Phys. Rev. D , 073006 (2005), arXiv:hep-ph/0504071.
[59] M. Feindt, F. Keller, M. Kreps, T. Kuhr, S. Neubauer, D. Zander, and A. Zupanc, Nucl. Instrum. Meth. A , 432 (2011), arXiv:1102.3876 [hep-ex].
[60] A. Bevan et al., Eur. Phys. J. C , 3026 (2014, Page 95), arXiv:1406.6311 [hep-ex].
[61] T. Chen and C. Guestrin, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD '16, 785 (2016).
[62] P. Ongmongkolkul, C. Deil, H. Dembinski, Dapid, C. Burr, Andrew, F. Rost, A. Pearce, L. Geiger, and O. Zapata, "iminuit - minuit from python," (2012–), [Online; accessed 2018.03.05].
[63] G. Cowan, K. Cranmer, E. Gross, and O. Vitells, Eur. Phys. J. C , 1554 (2011), [Erratum: Eur. Phys. J. C 73, 2501 (2013)], arXiv:1007.1727 [physics.data-an].
[64] M. Althoff et al. (TASSO Collaboration), Z. Phys. C , 27 (1985).
[65] W. Bartel et al. (JADE Collaboration), Z. Phys. C , 187 (1983).
[66] J. Lees et al. (BaBar Collaboration), Phys. Rev. D , 072001 (2017), arXiv:1611.05624 [hep-ex].
[67] P. Zyla et al. (Particle Data Group, Semileptonic b-Hadron Decays, Determination of V_cb and V_ub Review), Prog. Theor. Exp. Phys. (2020).
[68] J. Rosner et al. (CLEO Collaboration), Phys. Rev. Lett. , 121801 (2006), arXiv:hep-ex/0601027.
[69] P. Gambino and J. F. Kamenik, Nucl. Phys. B , 424 (2010), arXiv:1004.0114 [hep-ph].
[70] Z. Ligeti, M. Luke, and A. V. Manohar, Phys. Rev. D , 033003 (2010), arXiv:1003.1351 [hep-ph].
[71] M. Voloshin, Phys. Lett. B , 74 (2001), arXiv:hep-ph/0106040.
[72] I. I. Bigi and N. Uraltsev, Nucl. Phys. B , 33 (1994), arXiv:hep-ph/9310285.
[73] J. Charles et al. (CKMfitter Group), Eur. Phys. J. C , 1 (2005), arXiv:hep-ph/0406184.

A. $B \to X_u\,\ell^+\,\nu_\ell$ HYBRID MC DETAILS

Figure 13 shows the generator-level distributions of E_\ell^B, M_X, and q^2 for the hybrid B \to X_u \ell^+ \nu_\ell signal sample described in Section II.
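To illustrate how a hybrid sample of this kind is typically merged, the Python sketch below computes per-bin weights for the non-resonant component so that resonant plus reweighted non-resonant yields reproduce the inclusive prediction, w_i = (I_i - R_i) / N_i. This is a minimal sketch of the common hybrid reweighting, not the analysis code: the function name `hybrid_weights` is hypothetical, the inputs are assumed to be expected yields in one shared binning of the generator-level variables, and giving zero weight to bins where R_i > I_i is an illustrative choice. Section II describes the construction actually used.

```python
import numpy as np


def hybrid_weights(inclusive, resonant, nonresonant):
    """Per-bin hybrid weights w_i = (I_i - R_i) / N_i for the non-resonant sample.

    All inputs are expected yields in the same binning (e.g. of q^2, E_l^B, M_X).
    Bins where the resonant yield exceeds the inclusive one, or where the
    non-resonant sample is empty, get weight zero (an illustrative assumption).
    """
    inclusive = np.asarray(inclusive, dtype=float)
    resonant = np.asarray(resonant, dtype=float)
    nonresonant = np.asarray(nonresonant, dtype=float)
    # Inclusive minus resonant, floored at zero so weights stay non-negative.
    excess = np.clip(inclusive - resonant, 0.0, None)
    with np.errstate(divide="ignore", invalid="ignore"):
        weights = np.where(nonresonant > 0, excess / nonresonant, 0.0)
    return weights


# Toy yields in three bins, made up purely for illustration.
inclusive = np.array([10.0, 8.0, 5.0])
resonant = np.array([6.0, 2.0, 0.0])
nonresonant = np.array([2.0, 3.0, 5.0])
w = hybrid_weights(inclusive, resonant, nonresonant)
hybrid = resonant + w * nonresonant  # merged hybrid prediction per bin
```

With these toy inputs the weights are 2, 2, and 1, and R_i + w_i N_i reproduces the inclusive yields bin by bin, which is the defining property of the hybrid merge.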

[Figure 13: E_\ell^B [GeV], M_X [GeV], and q^2 [GeV^2] distributions (events per bin); legend: Resonances, Non-resonant, Hybrid model, DFN, BLNP.]

FIG. 13. The generator-level B \to X_u \ell^+ \nu_\ell distributions of E_\ell^B, M_X, and q^2 for neutral (left) and charged (right) B mesons. The black histogram shows the merged hybrid model, composed of resonant and non-resonant contributions. For more details on the models used and on how the hybrid B \to X_u \ell^+ \nu_\ell signal sample is constructed, see Section II.

B. INPUT VARIABLES OF B \to X_c \ell \bar{\nu}_\ell SUPPRESSION BDT

The shapes of the variables used in the B \to X_c \ell^+ \nu_\ell background suppression BDT are shown in Figures 14 and 16. The most discriminating variables are M_{miss}^2, the B_{sig} vertex fit probability, and M_{miss,D^*}^2. Figures 15 and 17 show the agreement between recorded and simulated events, taking into account the full uncertainties detailed in Section V. More details about the BDT can be found in Section III C.
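The missing-mass-squared variables quantify how well the reconstructed X system and lepton account for the signal-side B four-momentum: if the only unreconstructed particle is the neutrino, the missing four-momentum is massless and M_{miss}^2 peaks near zero. The sketch below assumes M_{miss}^2 = (p_{B_{sig}} - p_X - p_\ell)^2 with p_{B_{sig}} taken from the hadronic tag; the exact definitions, including the D^* variants, are those of Section III C, and the function names are hypothetical.

```python
import numpy as np


def invariant_mass_sq(p):
    """E^2 - |p|^2 for four-vectors (E, px, py, pz) stored along the last axis (GeV units)."""
    p = np.asarray(p, dtype=float)
    return p[..., 0] ** 2 - np.sum(p[..., 1:] ** 2, axis=-1)


def missing_mass_sq(p_bsig, p_x, p_lep):
    """Assumed definition M_miss^2 = (p_Bsig - p_X - p_lep)^2.

    Accepts single four-vectors or arrays of shape (n_events, 4).
    """
    p_miss = (np.asarray(p_bsig, dtype=float)
              - np.asarray(p_x, dtype=float)
              - np.asarray(p_lep, dtype=float))
    return invariant_mass_sq(p_miss)


# Toy four-vectors (E, px, py, pz) in GeV, chosen only to illustrate the arithmetic.
p_bsig = [5.0, 0.0, 0.0, 0.0]
p_x = [2.5, 1.0, 0.0, 0.0]
p_lep = [1.5, -1.5, 0.0, 0.0]
m2 = missing_mass_sq(p_bsig, p_x, p_lep)
```

Here the missing four-momentum is (1.0, 0.5, 0, 0) GeV, so M_{miss}^2 = 0.75 GeV^2. Any particle lost from a high-multiplicity X system adds to the missing four-momentum in the same way, which is why such decays populate the tails of this variable.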

[Figure 14: M_{miss}^2 [GeV^2]; vertex fit \log(\chi^2/\mathrm{dof}); M_{miss,D^*}^2 (\pi_{slow}) [GeV^2]; M_{miss,D^*}^2 (\pi^+_{slow}) [GeV^2]; number of K^+; number of K_S; total charge. Legend: B \to X_u \ell \nu, B \to X_c \ell \nu, Other.]

FIG. 14. Shapes of the input variables of the B \to X_c \ell^+ \nu_\ell background suppression BDT. For details and definitions see Section III C.

[Figure 15: the same seven variables as in Figure 14 for recorded data and simulation (Other, B \to X_c \ell \nu, B \to X_u \ell \nu), with the MC uncertainty band.]

FIG. 15. The input variables of the B \to X_c \ell^+ \nu_\ell background suppression BDT for recorded and simulated events. The uncertainty on the simulated events incorporates the full systematic uncertainties detailed in Section V.

[Figure 16: \cos\theta_{BY} for the D^* hypotheses with \pi_{slow} and \pi^+_{slow}; \cos\theta^*_{D^*} for \pi^+_{slow} and \pi_{slow}. Legend: B \to X_u \ell \nu, B \to X_c \ell \nu, Other.]

FIG. 16. Shapes of the input variables of the B \to X_c \ell^+ \nu_\ell background suppression BDT. For details and definitions see Section III C.

[Figure 17: the same four angular variables as in Figure 16 for recorded data and simulation (Other, B \to X_c \ell \nu, B \to X_u \ell \nu), with the MC uncertainty band.]

FIG. 17. The input variables of the B \to X_c \ell^+ \nu_\ell background suppression BDT for recorded and simulated events. The uncertainty on the simulated events incorporates the full systematic uncertainties detailed in Section V.

C. B \to X_u \ell^+ \nu_\ell CHARGED PION FRAGMENTATION MODELING

Figure 18 compares the charged-pion multiplicity n_{\pi^\pm} at different stages of the selection. This variable is not used in the signal extraction, but we test its modeling to make sure that the B \to X_u \ell^+ \nu_\ell fragmentation probabilities cannot bias the final result. The agreement in the signal-enriched region with M_X < … GeV is […], and we correct the n_{\pi^\pm} fragmentation probabilities to match the n_{\pi^\pm} distribution observed in this selection by assigning the non-resonant B \to X_u \ell^+ \nu_\ell events a correction weight as a function of the true charged-pion multiplicity. After this reweighting the agreement is perfect by construction, and we use the resulting difference in reconstruction efficiency as the pion-fragmentation uncertainty on the partial branching fractions and on |V_{ub}| (cf. Section V).
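A minimal sketch of this kind of multiplicity correction and of the resulting efficiency shift is given below. It makes a simplifying assumption that the analysis does not: reconstructed and true charged-pion multiplicities are taken to coincide (no migration), so the weight for multiplicity n can be read off as (data minus the other simulated components) divided by the non-resonant simulation. The helper names are hypothetical and the input counts are toy numbers.

```python
import numpy as np


def multiplicity_weights(data_counts, other_mc, nonres_mc):
    """Per-multiplicity weights for the non-resonant simulation.

    Index j of each array corresponds to a charged-pion multiplicity n = j.
    Assumes no migration between true and reconstructed multiplicity, so the
    weight is (data - other simulated components) / non-resonant simulation.
    """
    data_counts = np.asarray(data_counts, dtype=float)
    other_mc = np.asarray(other_mc, dtype=float)
    nonres_mc = np.asarray(nonres_mc, dtype=float)
    target = np.clip(data_counts - other_mc, 0.0, None)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Multiplicities without non-resonant events are left unweighted.
        return np.where(nonres_mc > 0, target / nonres_mc, 1.0)


def weighted_efficiency(n_true, selected, weights_by_n):
    """Fraction of weighted non-resonant events passing the selection.

    Each event's weight is looked up by its true multiplicity n_true.
    """
    w = np.asarray(weights_by_n, dtype=float)[np.asarray(n_true, dtype=int)]
    sel = np.asarray(selected, dtype=bool)
    return w[sel].sum() / w.sum()


# Toy inputs: counts per multiplicity n = 0, 1, 2 and four simulated events.
weights = multiplicity_weights([10.0, 20.0, 5.0], [2.0, 4.0, 1.0], [8.0, 8.0, 8.0])
n_true = [0, 1, 1, 2]
selected = [True, True, False, True]
eff_nominal = weighted_efficiency(n_true, selected, np.ones_like(weights))
eff_corrected = weighted_efficiency(n_true, selected, weights)
rel_uncertainty = abs(eff_corrected - eff_nominal) / eff_nominal
```

The relative efficiency change between unit and corrected weights plays the role of the fragmentation uncertainty described above; handling migration properly would require unfolding or a response matrix between true and reconstructed multiplicity.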

[Figure 18: four panels of the number of \pi^\pm (events per bin) for data and simulation (Other, B \to X_c \ell \nu, resonant B \to X_u \ell \nu modes, non-resonant B \to X_u \ell \nu), with the MC uncertainty band.]

FIG. 18. The charged-pion multiplicity (n_{\pi^\pm}) compared between data and simulation: (top left) for all events prior to the BDT selection; (top right) for all events after the BDT selection; (bottom left) for the signal-enriched region of M_X < … GeV; […] n_{\pi^\pm} fragmentation probability to match the one observed in data.

D. NUISANCE PARAMETER PULLS AND ADDITIONAL FIT PLOTS

Figures 19 and 20 show the nuisance parameter pulls of the partial branching fraction fits, defined for each fit category k and bin i as

    \left( \hat{\theta}_{ik} - \theta_{ik} \right) / \sqrt{\Sigma_{k,ii}} ,   (52)

with \hat{\theta} (\theta) corresponding to the post-fit (pre-fit) value of the nuisance parameter. Note that the error bar on each pull shows the post-fit error

    \sqrt{\hat{\Sigma}_{k,ii}}   (53)

normalized to the pre-fit constraint

    \sqrt{\Sigma_{k,ii}} .   (54)

Figure 21 shows the post-fit q^2 distributions of the two-dimensional fit to M_X:q^2 in bins of M_X.
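Equations (52) to (54) only need the pre- and post-fit nuisance parameter values and the diagonals of the pre-fit covariance \Sigma_k and post-fit covariance \hat{\Sigma}_k. A minimal NumPy sketch for a single fit category k follows; the function and argument names are hypothetical, and the post-fit covariance is assumed to come from the fit itself (for instance, from its Hesse matrix). An error bar below one means the data constrain that nuisance parameter more tightly than its prior.

```python
import numpy as np


def nuisance_pulls(theta_post, theta_pre, cov_pre, cov_post):
    """Pulls (theta_hat - theta) / sqrt(Sigma_ii), Eq. (52), and their error bars
    sqrt(Sigma_hat_ii) / sqrt(Sigma_ii), Eqs. (53) and (54), for one fit category."""
    theta_post = np.asarray(theta_post, dtype=float)
    theta_pre = np.asarray(theta_pre, dtype=float)
    sigma_pre = np.sqrt(np.diag(np.asarray(cov_pre, dtype=float)))    # pre-fit constraint
    sigma_post = np.sqrt(np.diag(np.asarray(cov_post, dtype=float)))  # post-fit error
    pulls = (theta_post - theta_pre) / sigma_pre
    error_bars = sigma_post / sigma_pre
    return pulls, error_bars


# Toy values for two nuisance parameters.
pulls, error_bars = nuisance_pulls(
    theta_post=[0.5, -0.2],
    theta_pre=[0.0, 0.0],
    cov_pre=np.diag([1.0, 4.0]),
    cov_post=np.diag([0.81, 1.0]),
)
```

Only the diagonal elements enter, matching the definitions above; correlations between nuisance parameters are not visible in a pull plot of this kind.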

[Figure 19: pulls in standard deviations for each nuisance parameter; two panels grouped into Signal-in, Signal-out, and Background parameters, and one panel grouped into Signal and Background parameters.]

FIG. 19. The nuisance parameter pulls of the 1D fits of M_X, q^2, and E_\ell^B with and without M_X < … GeV.

[Figure 20: pulls in standard deviations for each nuisance parameter, grouped into Signal and Background.]

FIG. 20. The nuisance parameter pulls of the 2D fit of M_X:q^2.

[Figure 21: four panels of events per bin width versus q^2 [GeV^2], each with a pull sub-panel; legend: Background, Signal, Data, MC uncertainty.]

FIG. 21. The post-fit q^2 distributions of the two-dimensional fit to M_X:q^2 in bins of M_X. The panels correspond to M_X \in [0, 1.5] GeV (top left), M_X \in [1.5, 1.9] GeV (top right), M_X \in [1.9, 2.4] GeV (bottom left), and M_X \in [2.4, 4] GeV (bottom right). The resulting yields are corrected to correspond to a partial branching fraction with E_\ell^B > 1 GeV.

TABLE VII. The fitted yields separated into electron and muon candidates, as well as into charged and neutral B mesons.

Decay mode              | \hat{\eta}_{sig} | \hat{\eta}_{bkg} | (\epsilon_{tag} \cdot \epsilon_{sel}) | \Delta\mathcal{B}
B^+ \to X_u \ell^+ \nu  | …                | 3667 ± …         | 0.… ± …                               | 1.… ± … ± …
B^0 \to X_u \ell^+ \nu  | …                | 3375 ± …         | 0.… ± …                               | 1.… ± … ± …
B \to X_u e^+ \nu       | …                | 3315 ± …         | 0.… ± …                               | 1.… ± … ± …
B \to X_u \mu^+ \nu     | …                | 3712 ± …         | 0.… ± …                               | 1.… ± … ± …

E. ADDITIONAL FIT DETAILS TO THE LEPTON FLAVOR UNIVERSALITY AND WEAK ANNIHILATION TESTS

The fitted yields of the two-dimensional fit to M_X:q^2, separated into electron and muon candidates as well as into charged and neutral B mesons, are listed in Table VII.

F. BDT EFFICIENCIES

Figure 22 shows the efficiency of the BDT selection as a function of the reconstructed variables q^2, M_X, and the lepton energy E_\ell^B for simulated B \to X_u \ell^+ \nu_\ell events. Although we avoided using these variables in the boosted decision tree, a residual dependence on the kinematic variables is seen: the efficiency increases with E_\ell^B and decreases at high q^2, while it is relatively flat in the hadronic mass M_X. This dependence is linked to the variables used in the BDT. Although we carefully avoided kinematic variables that would allow the BDT to learn these kinematic properties, there are indirect connections. For example, final states with high E_\ell^B have a lower multiplicity, as they are dominated by B \to \pi \ell \bar{\nu}_\ell decays. Further, their hadronic system carries little momentum, and on average such decays retain a better resolution in the discriminating variables of the background suppression BDT. A concrete example is M_{miss}^2 (cf. Figure 15): high-multiplicity B \to X_u \ell^+ \nu_\ell decays retain a larger tail in this variable and are selected with a lower efficiency by the BDT.

[Figure 22: efficiency versus E_\ell^B [GeV], M_X [GeV], q^2 [GeV^2], and M_X:q^2 bin number.]

FIG. 22. The B \to X_u \ell^+ \nu_\ell efficiency after the BDT selection as a function of the reconstructed kinematic variables (E_\ell^B, M_X, q^2) used in the signal extraction. The bottom-right plot shows the efficiencies in the bins of M_X:q^2.
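The efficiencies in Figure 22 are, per bin of a reconstructed variable, the ratio of simulated signal events passing the BDT selection to all simulated signal events in that bin. A minimal sketch follows, assuming unweighted events and simple binomial uncertainties \sqrt{\epsilon(1-\epsilon)/N}; the function name is hypothetical, and an analysis could instead use weighted events or a Clopper-Pearson or Wilson interval, which behave better when \epsilon is close to 0 or 1.

```python
import numpy as np


def binned_efficiency(values, passed, edges):
    """Per-bin efficiency n_pass / n_total with binomial errors sqrt(eff * (1 - eff) / n_total).

    values: reconstructed variable per simulated signal event (e.g. E_l^B, M_X, q^2)
    passed: whether each event passes the BDT selection
    edges:  bin edges; empty bins return NaN
    """
    values = np.asarray(values, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    n_total, _ = np.histogram(values, bins=edges)
    n_pass, _ = np.histogram(values[passed], bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        eff = np.where(n_total > 0, n_pass / n_total, np.nan)
        err = np.where(n_total > 0, np.sqrt(eff * (1.0 - eff) / n_total), np.nan)
    return eff, err


# Toy events in three bins of a reconstructed variable.
eff, err = binned_efficiency(
    values=[0.5, 0.7, 1.2, 1.4, 1.6, 2.5],
    passed=[True, False, True, True, False, True],
    edges=[0.0, 1.0, 2.0, 3.0],
)
```

In the toy example the third bin has an efficiency of exactly 1 and a binomial error of 0, which illustrates why the simple binomial formula is usually replaced by a proper interval in sparsely populated bins.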