Biogeosciences | 2021

Can machine learning extract the mechanisms controlling phytoplankton growth from large-scale observations? – A proof-of-concept study

 
 

Abstract


A key challenge for biological oceanography is relating the physiological mechanisms controlling phytoplankton growth to the spatial distribution of those phytoplankton. Physiological mechanisms are often isolated by varying one driver of growth, such as nutrients or light, in a controlled laboratory setting, producing what we call “intrinsic relationships”. We contrast these with the “apparent relationships” that emerge in the environment in climatological data. Although previous studies have found that machine learning (ML) can find apparent relationships, there has yet to be a systematic study examining when and why these apparent relationships diverge from the underlying intrinsic relationships found in the lab, and how and why this may depend on the method applied. Here we conduct a proof-of-concept study with three scenarios in which biomass is by construction a function of time-averaged phytoplankton growth rate. In the first scenario, the inputs and outputs of the intrinsic and apparent relationships vary over the same monthly timescales. In the second, the intrinsic relationships relate averages of drivers that vary on hourly timescales to biomass, but the apparent relationships are sought between monthly averages of these inputs and monthly-averaged output. In the third scenario we apply ML to the output of an actual Earth system model (ESM). Our results demonstrated that when intrinsic and apparent relationships operate on the same spatial and temporal scales, neural network ensembles (NNEs) were able to extract the intrinsic relationships when only provided information about the apparent relationships, while colimitation and an inability to extrapolate caused random forests (RFs) to diverge from the true response.
When intrinsic and apparent relationships operated on different timescales (with as little separation as hourly versus daily), NNEs fed with apparent relationships in time-averaged data produced responses with the right shape but underestimated the biomass. This was because when the intrinsic relationship was nonlinear, the response to a time-averaged input differed systematically from the time-averaged response. Although the NNEs overestimated the limitations, they produced more realistic shapes of the actual relationships than multiple linear regression did. Additionally, NNEs were able to model the interactions between predictors and their effects on biomass, allowing for a qualitative assessment of the colimitation patterns and the nutrient causing the most limitation. Future research may be able to use this type of analysis for observational datasets and other ESMs to identify apparent relationships between biogeochemical variables (rather than spatiotemporal distributions only) and identify interactions and colimitations without having to perform (or at least performing fewer) growth experiments in a lab. From our study, it appears that ML can extract useful information from ESM output and could likely do so for observational datasets as well.
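
The averaging effect described above (the response to a time-averaged input differing from the time-averaged response) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the saturating Michaelis-Menten form, the half-saturation constant `k`, and the synthetic diurnal driver are hypothetical stand-ins for an intrinsic relationship and an hourly-varying input.

```python
import math


def michaelis_menten(x, k=1.0):
    """Saturating (concave) limitation term x / (k + x).

    Illustrative form only; k is a hypothetical half-saturation constant.
    """
    return x / (k + x)


# Hypothetical hourly driver over 30 days: a diurnal cycle that is zero at night.
hours = range(24 * 30)
driver = [max(0.0, 4.0 * math.sin(2.0 * math.pi * (h % 24) / 24.0)) for h in hours]

# Intrinsic view: apply the response every hour, then average the responses.
mean_of_response = sum(michaelis_menten(x) for x in driver) / len(driver)

# Apparent view: average the driver first, then apply the same response once.
mean_driver = sum(driver) / len(driver)
response_of_mean = michaelis_menten(mean_driver)

print(f"mean of hourly responses: {mean_of_response:.3f}")
print(f"response to mean driver:  {response_of_mean:.3f}")
```

For a concave response such as this one, the mean of the hourly responses is lower than the response evaluated at the mean driver (Jensen's inequality). Under these assumptions, a relationship fitted between averaged inputs and averaged outputs would therefore sit below the intrinsic curve, which is consistent with the underestimated biomass reported above.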

Volume 18
Pages 1941-1970
DOI 10.5194/BG-18-1941-2021
Language English
Journal Biogeosciences
