Soil carbon-food synergy: sizable contributions of small-scale farmers

Benefits to agricultural yield improvement, soil degradation prevention, and climate mitigation are central to the synergies of soil organic carbon (SOC) build-up. However, the contributions of small-scale farmers, the main target of recent agricultural and rural development policies, to SOC enhancement are understudied. Here, we present a global analysis of small-scale farmers’ contributions to the potential of additional SOC stocks and the associated increase in crop production. We applied random forest machine learning models to global gridded datasets on crop yield (wheat, maize, rice, soybean, sorghum and millet), soil, climate and agronomic management practices from the 2000s (n = 1808 to 8123). Using the established crop-specific SOC-yield relationships, the potentials of additional SOC build-up and crop production increase were simulated. The estimated SOC increase was converted into global decadal mean temperature change using the temperature sensitivity to cumulative total anthropogenic CO2 emissions from preindustrial levels. The amount of inorganic nitrogen (N) input that would result in the same yield outcome as the SOC build-up was derived from the crop-specific N-yield relationships. SOC contributes to yields in addition to management and climatic factors. Additional SOC sums up to 12.78 GtC (11.55–14.05 GtC) of global SOC stock, which earns 38.24 Mt (22.88–57.48 Mt) of additional crop production and prevents warming by 0.030 °C (0.019–0.041 °C). This production increase equates to what would be achieved by an inorganic N input of 5.82 Mt N (3.89–7.14 Mt N). Small-scale farmers account for 28% (26–30%) of the additional SOC build-up and 17% (15–20%) of the production increase. Key crops and regions in terms of small-scale farmers’ contributions include Sub-Saharan African maize and rice, Latin American and Caribbean soybean and maize, and South Asian rice and wheat. 
The contribution of small-scale farmers to the potential increase in SOC stock and crop production is sizable, which in theory further leads to saving inorganic N input. These findings emphasize the importance of linking soil management to sustainable land and climate mitigation with institutions and policy for small-scale farmers. Such a joint policy would assist multiple development goals.


Introduction
The co-benefits of crop yield improvement, soil degradation prevention, and climate mitigation from carbon (C) sequestration into agricultural land have been well recognized (Lal 2004; Bossio et al. 2020).
Iizumi et al. CABI Agric Biosci (2021) 2:43
the 4 per mil initiative (Minasny et al. 2017), which sets an ambitious goal for the global SOC stock over land to increase by 0.4% per year in the 0-40 cm soil depth over 20 years. While the feasibility of the goal is under debate (Lal 2016; van Groenigen et al. 2017; Poulton et al. 2018; Schlesinger and Amundson 2019; Rumpel et al. 2020), identifying the world's agricultural land suitable for the additional build-up of SOC in terms of synergies between multiple development goals is pivotal. Although not an exhaustive list, such synergies between SOC enhancement and food production include the reduction of inorganic nitrogen (N) application and fertilizer costs (Kanter et al. 2015; Cui et al. 2018); the protection of drinking water quality and fisheries resources by avoiding eutrophication of waters (Vitousek et al. 2009); and ecological intensification and pest control through the maintenance of soil biodiversity (Garratt et al. 2018). Improvements in drought tolerance (Iizumi and Wagai 2019), yield stability (Zhang et al. 2016; Knapp and van der Heijden 2018), and on-farm income also occur.
Yield improvement is a primary benefit of SOC management (Lal 2010; Henryson et al. 2018; Soussana et al. 2019). However, further research is needed to fully utilize this potential benefit for the following three reasons. First, recent meta-analyses (Oldfield et al. 2019; Sun et al. 2020) indicate that the effect of increasing SOC on yield for a spatial domain greater than the experimental plot scale is elusive; thus, global-scale SOC-yield relationships are uncertain.
Second, varying SOC-yield relationships reported in earlier studies indicate that factors other than SOC, such as thermal and moisture regimes and management practices adopted by farmers, also contribute to yield. It is therefore imperative to quantify the size of the SOC effect on yield relative to other contributing factors before adding increasing SOC as an effective option for land management for the targets in the 2030 Agenda for Sustainable Development. Last, the SOC-induced yield benefit needs to be linked with small-scale farmers, the main target in recent agricultural and rural development (Cui et al. 2018;Ricciardi et al. 2018), to maximize the synergies between multiple development goals related to food security, sustainable land, climate mitigation, and even safe water.
When investments in improved SOC management practices take place, resource-poor small-scale farmers, who in many cases live on C-poor cropland, are anticipated to be a high priority. Small-scale farmers in developing countries often lack the necessary resources, including N, which is an integral part of soil organic matter together with C (the typical C:N mass ratio is 10-15). Crop residues are often used for animal feed or bedding, or as a source of energy through direct combustion or ethanol production (Lal 2004; Poulton et al. 2018).
Better institutional interventions to motivate small-scale farmers for SOC build-up via effective use of crop residues and better access to education and technology for improved SOC management are necessary (Zhao et al. 2018).
The main question we address in this study is: how much increase in crop production could small-scale farmers worldwide achieve by enhancing SOC? First, we estimated the yield responses of major crops to soil, climate, and management factors by applying a machine learning technique to global gridded agricultural and environmental datasets. A simulation experiment was then conducted to identify farming types (irrigated, high-input, low-input and subsistence), crops, and regions in which yield increases are anticipated through SOC management. Finally, we assessed the relative contributions of small-scale farmers (the low-input and subsistence farming types together) to the estimated potential of additional SOC stock and the associated crop production increase. We examined six major crops (wheat, maize, rice, soybean, sorghum and millet) that together occupy half of the global cropland area. Findings from such an assessment would help investment institutions explore synergetic interventions that advance multiple development goals simultaneously.
Table 1 summarizes the global gridded datasets used for this study. All of the datasets represent agricultural and environmental conditions between 1994 and 2015. The spatial resolution varied by dataset from 30 arc seconds (approximately 1 km or 0.009°) to 30 arc minutes (56 km or 0.5°). The datasets were spatially aggregated to a grid size of 0.5°, with the crop-specific harvested areas taken into account in the aggregation. Then, random forest (RF) models were built for each of the crops with yield as the response variable and soil, climate, and management factors as the explanatory variables. The following subsections elaborate on each variable.

Yield
Two different yield datasets were used to consider uncertainties of the SOC-yield relationship associated with the use of different yield datasets (Table 1). One was the M3 dataset (Monfreda et al. 2008) solely based on national and subnational censuses. The other was the Spatial Production Allocation Model (SPAM) 2005 version 3.2 (Wood-Sichra et al. 2016), a hybrid dataset derived by inputting agricultural censuses and satellite land cover maps into the optimization model. The M3 yields mostly represent the average of the years between 1997 and 2003, whereas the SPAM2005 yields represent the average for the 2004-2006 period. Although SPAM2000 (You et al. 2009) is closer to the M3 dataset in terms of the data collection year, millet yield is not available in it. Therefore, pearl millet and small millet available in SPAM2005 were combined (by averaging with the harvested areas of the SPAM2005 dataset as the weights) and used. The spatial resolution is the same between the M3 and SPAM2005 datasets (5 arc minutes), and these were aggregated using the harvested areas of the M3 dataset as the weights.
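The area-weighted averaging used to combine the two millet types and to aggregate yields to coarser grid cells can be sketched as follows. This is a minimal illustration only; the function name and the yield and area values are hypothetical and are not taken from the datasets.

```python
def area_weighted_mean(values, areas):
    """Harvested-area-weighted mean; entries with missing value or non-positive area are skipped."""
    pairs = [(v, a) for v, a in zip(values, areas)
             if v is not None and a is not None and a > 0]
    total_area = sum(a for _, a in pairs)
    if total_area == 0:
        return None  # no cropland for this crop in the cell
    return sum(v * a for v, a in pairs) / total_area


# Hypothetical example: pearl millet and small millet yields (t/ha)
# weighted by their harvested areas (ha) within one grid cell.
millet_yield = area_weighted_mean([0.9, 0.6], [300.0, 100.0])
print(millet_yield)
```

The same function applies when 5 arc minute cells are aggregated to a 0.5° cell, with one value and one harvested area per fine cell.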

Soil
We used the topsoil organic carbon content and available water storage capacity as the soil variables (Table 1). These data were obtained from the Regridded Harmonized World Soil Database (HWSD) version 1.2 (Wieder et al. 2014). The regridded data have a grid size of 3 arc minutes and were aggregated from the 30 arc second resolution data originally compiled by the Food and Agriculture Organization of the United Nations (FAO) and collaborative organizations (FAO et al. 2012). We further aggregated the data to the 5 arc minute resolution using the inverse distance-weighted averaging method to have a common resolution with the harvested area maps of the M3 dataset. The area-weighted average over the six crops considered here was calculated to represent the average soil condition over the cropland distributed within a 0.5° grid cell (Fig. 1a).
We also used the Global Soil Organic Carbon (GSOC) map compiled by FAO and the Intergovernmental Technical Panel on Soils (ITPS) (FAO and ITPS 2018) as another, more recent global SOC dataset. The 30 arc second resolution GSOC dataset was aggregated in a similar manner as the HWSD dataset (Fig. 1b). The differences between the HWSD and GSOC datasets are nonnegligible if we focus on the geographical distributions of C-poor (SOC < 3 kg C m⁻²) and C-rich (SOC > 9 kg C m⁻²) cropland areas (Fig. 1c, d) and are expected to be an important source of uncertainties in the SOC-yield relationship. The HWSD-based available water storage capacity data were used throughout the analysis since this variable was not available in the GSOC dataset.

Climate
The climatic factors considered here included growing season temperature and the water balance index (Table 1). For the former, the average temperature from emergence to harvesting was computed for each crop, cropping season (rainfed and irrigated seasons) and year using the daily 2-m air temperature obtained from the 0.5°-resolution global retrospective meteorological forcing dataset (S14FD; Iizumi et al. 2017b). Emergence and harvesting were defined as the first date at which the fraction between the accumulated growing degree days (GDD) and crop total GDD requirement reached 0.1 and 1.0, respectively. The average temperatures calculated for these seasons were averaged using the rainfed and irrigated areas available in the Monthly Irrigated and Rainfed Crop Areas dataset for the years around 2000 (MIRCA; Portmann et al. 2010) as the weights. This was done to account for the possibility that crop calendars are different between the seasons (e.g., rice in Asia). Then, the multiyear average over the 1998-2002 period was calculated and used for analysis.
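The emergence and harvest rule described above can be sketched as follows. The daily GDD formula (mean temperature capped at an upper limit, minus a base temperature, floored at zero) is a common convention assumed here for illustration; the source does not spell it out, and all numeric values below are hypothetical.

```python
def daily_gdd(t_mean, t_base, t_upper):
    """Assumed GDD convention: cap at the upper limit, subtract the base, floor at zero."""
    return max(0.0, min(t_mean, t_upper) - t_base)


def emergence_and_harvest(daily_temps, t_base, t_upper, gdd_requirement):
    """Return day offsets from planting at which accumulated GDD / requirement
    first reaches 0.1 (emergence) and 1.0 (harvest); None if never reached."""
    accumulated = 0.0
    emergence = harvest = None
    for day, t_mean in enumerate(daily_temps):
        accumulated += daily_gdd(t_mean, t_base, t_upper)
        fraction = accumulated / gdd_requirement
        if emergence is None and fraction >= 0.1:
            emergence = day
        if fraction >= 1.0:
            harvest = day
            break
    return emergence, harvest


def growing_season_temperature(daily_temps, emergence, harvest):
    """Average temperature from emergence to harvest, inclusive."""
    window = daily_temps[emergence:harvest + 1]
    return sum(window) / len(window)


# Hypothetical season: 10 cool days followed by 90 warm days.
temps = [18.0] * 10 + [25.0] * 90
emergence, harvest = emergence_and_harvest(temps, t_base=8.0, t_upper=30.0,
                                           gdd_requirement=940.0)
season_temp = growing_season_temperature(temps, emergence, harvest)
print(emergence, harvest, round(season_temp, 2))
```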
The calculation of the total crop GDD requirement was based on daily mean temperature, planting and harvesting dates, and crop-specific temperature thresholds. The middle days of the planting and harvesting months available in MIRCA were aggregated from the 5 arc minute resolution by extracting, for each coarser grid cell, the value of the finer grid cell with the largest harvested area for the crop and season of interest. The base temperature and upper temperature limit values used here are available in Additional file 1: Table S1. For the water balance index, we first calculated the average precipitation between rainfed and irrigated seasons in a similar manner as the temperature. Then, the multiyear average and the standard deviation (SD) of the seasonal precipitation for the 1998-2002 period were calculated. A measure of seasonal water balance was computed as:

$$WB = P - 1.28 \times \sigma_P - CWN \tag{1}$$

where WB is the water balance index (mm season⁻¹), P is the average season precipitation over the 5-year period (mm season⁻¹), σ_P is the SD of the season precipitation for the same period (mm season⁻¹), and CWN is the crop-specific water needs (mm season⁻¹) (Additional file 1: Table S1). The combined term, P − 1.28 × σ_P, indicates the water supply by precipitation in a 1-in-10 dry year when a normal distribution is assumed for the season precipitation. The CWN values used here correspond to the upper end of the reported range; for instance, the range for maize reported by Brouwer and Heibloem (1986) is from 500 to 800 mm season⁻¹.

Fig. 1 Geospatial patterns of (a, b) the current SOC levels, (c, d) current SOC categories, (e, f) maximum attainable SOC levels (SOCmax), (g) climate bins, and (h) differences in the SOCmax value between the HWSD and GSOC datasets
A WB value of zero indicates that crop water needs under rainfed conditions can be met, even after the consideration of farmers' risk averse attitude that relatively higher water-consuming cultivars (e.g., long season cultivars) are grown under drier-than-normal conditions. Negative and positive WB values indicate that crops are exposed to water deficits and water surpluses, respectively.
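As a worked illustration of Eq. (1), the sketch below computes WB from five seasonal precipitation totals. The precipitation values are hypothetical, the crop water need of 800 mm is the upper end of the maize range quoted above, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def water_balance_index(season_precip_mm, crop_water_need_mm):
    """WB = P - 1.28 * sigma_P - CWN, all in mm per season."""
    p_mean = mean(season_precip_mm)
    p_sd = stdev(season_precip_mm)  # assumption: sample SD
    return p_mean - 1.28 * p_sd - crop_water_need_mm


# Hypothetical seasonal precipitation totals for 1998-2002 (mm per season).
precip = [620.0, 710.0, 540.0, 680.0, 650.0]
wb = water_balance_index(precip, crop_water_need_mm=800.0)
print(round(wb, 1))  # a negative value indicates a water deficit
```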

Management
The variables used to characterize management level included the N application rate, pesticide application rate, irrigation intensity, farm field size, and agricultural knowledge stock (Table 1). As management information at the farm level is hardly accessible, the variables used here are indicators for the average management level at the landscape to country levels. Crop-specific average N application rates between 1994 and 2001 (Mueller et al. 2012) were spatially aggregated, as was done for the yield data. The top 20 crop-specific pesticide application rates in 2015 available in the PEST-CHEMGRIDS v1.01 dataset (Maggi et al. 2019) were averaged across the pesticide types and then spatially aggregated. The average irrigation intensity between 1998 and 2002 (the irrigation-equipped area divided by the harvested area) was calculated from the MIRCA dataset. The original grid size of these three datasets is 5 arc minutes.
The farm field size category data from approximately 2005 (Fritz et al. 2015; 1 = very small, 2 = small, 3 = medium, and 4 = large) offer a satellite-derived indication of the average physical size of farm fields located within a 30 arc second grid cell. The data were aggregated by counting the number of appearances of each category, and a single category that most frequently appeared was selected as the typical farm field size for each 0.5° grid cell.
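The most-frequent-category aggregation can be sketched as below. The function name is hypothetical, and the tie-breaking behaviour (the category encountered first wins) is a property of this sketch rather than something stated in the source.

```python
from collections import Counter


def typical_field_size(categories):
    """Return the most frequent field size category (1-4) among the fine grid cells
    of one 0.5 degree cell, or None if there are no valid values."""
    counts = Counter(c for c in categories if c in (1, 2, 3, 4))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical categories of the 30 arc second cells inside one coarse cell.
print(typical_field_size([1, 2, 2, 3, 2, 4, 1]))
```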
Last, the country agricultural knowledge stock (Iizumi et al. 2017a)-the sum of the annual governmental expenditures for agricultural research and development since 1961 with a certain obsolescence rate-was used to consider the yield improvement from management practices other than those considered above (e.g., adoption of high-yielding varieties). The average between 1998 and 2002 was used. We assumed that the knowledge stock level is the same across grid cells within a country.

Farming types
The crop- and farming type-specific harvested areas (the average between 2009 and 2011) were obtained from SPAM2010 version 1.1 (Yu et al. 2020) and used after spatial aggregation. Four farming types, irrigated, high-input, low-input, and subsistence, are available. We assumed that the low-input and subsistence farming types together represent resource-poor, small-scale farmers, although other factors, such as the extent of cultivated area and on-farm income, are also used to define small-scale farmers (Lowder et al. 2016). The 5 arc minute data were aggregated and used to sum the potential changes in SOC stock, crop production and inorganic N input estimated at the 0.5° grid cell level to the regional and global levels. Although the harvested areas for these farming types for 2005 are available in SPAM2005 (Wood-Sichra et al. 2016), we used SPAM2010 to set the baseline time point as recent as possible.

Random forest models
We employed the RF model (Breiman 2001), a machine learning technique. The RF model is a nonparametric classification and regression tree analysis method and has increasingly been applied to address nonlinear climate-yield relationships (Jeong et al. 2016; Hoffman et al. 2018; Mann et al. 2019; Laborde et al. 2020). The model fitting was conducted using the randomForest function in the statistical package R (R Core Team 2021) with the following settings: ntree = 500, mtry = 3, nodesize = 5. These values for the number of trees to grow (ntree), the number of explanatory variables randomly sampled as candidates at each split (mtry), and the minimum size of terminal nodes (nodesize) were set to the function defaults, following the literature (Liaw and Wiener 2002; Jeong et al. 2016). We consider that this setting gives a lower bound on the achievable accuracy, although a more extensive exploration of the settings might increase model accuracy further. The relative importance of the individual variables considered here in explaining the global yield patterns in the 2000s was also estimated within the function. The RF model was fitted separately to each of the crops. For a given crop, model fitting was conducted for each of the four dataset combinations consisting of two yield datasets (M3 and SPAM2005) and two SOC datasets (HWSD and GSOC), with the data sources for the remaining variables kept the same, to examine the uncertainties in the estimated SOC-yield and N-yield relationships. The sample size used for the model fitting varied by crop and dataset combination; it strongly depended on the extent of the global harvested area for the crop of interest and ranged from 1808 for millet to 8123 for maize. These relationships, specific to both crop and dataset combination, were used in the simulation experiment described in the subsequent section.

Simulation experiment
Using the RF-derived SOC-yield relationships, we calculated potential increases in SOC stock and the associated increases in yields. The yield increases were then converted into the equivalent decreases in inorganic N input using the RF-derived N-yield relationships. The detailed procedure is as follows:
1. The difference in SOC between the current and maximum attainable (SOCmax) levels is calculated for each location (Fig. 2a). For each of the 100 climate bins, consisting of 10 thermal regimes and 10 moisture regimes, we gathered the current SOC values from the cropland grid cells belonging to that bin and selected the 95th percentile value as the SOCmax value. The SOCmax values were determined for each SOC dataset (Fig. 1e, f).
2. The increase in yield corresponding to the increase in SOC between the current and maximum attainable levels is derived crop by crop using the SOC-yield relationships (Fig. 2a). The yield increases are converted into increases in production by multiplying them by the harvested areas of the M3 dataset.
3. The change in the N application rate that gives the same magnitude of yield change as that calculated in the former step is computed for each crop using the N-yield relationships (Fig. 2b).
4. The N application rate that would achieve the current yield level under the SOCmax level calculated in the former step is compared with the current N application rate to derive the amount of inorganic N input potentially saved through that SOC increase (Fig. 2b). The decreases in the N application rate are multiplied by the harvested areas of the M3 dataset to derive the total amount of inorganic N input saved for the crops.
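Step 1 of the procedure can be sketched as follows. The linear-interpolation percentile is an assumed convention (the source does not say which percentile estimator was used), and the bin indices and SOC values are hypothetical.

```python
from collections import defaultdict


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def soc_max_by_bin(cells):
    """cells: iterable of (thermal_bin, moisture_bin, current_soc).
    Returns the 95th percentile of current SOC for each climate bin."""
    grouped = defaultdict(list)
    for thermal, moisture, soc in cells:
        grouped[(thermal, moisture)].append(soc)
    return {key: percentile(vals, 95) for key, vals in grouped.items()}


def soc_gap(cell, soc_max):
    """Difference between SOCmax of the cell's climate bin and its current SOC (never negative)."""
    thermal, moisture, soc = cell
    return max(0.0, soc_max[(thermal, moisture)] - soc)


# Hypothetical cropland cells, all in climate bin (3, 5), with SOC of 1..21 kg C per m2.
cells = [(3, 5, float(v)) for v in range(1, 22)]
soc_max = soc_max_by_bin(cells)
print(soc_max[(3, 5)], soc_gap((3, 5, 4.0), soc_max))
```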
Then, the calculated changes in SOC stock, crop production and inorganic N input were aggregated for each crop, farming type and region. These calculations were conducted for each dataset combination since the SOC- and N-yield relationships are different between the four dataset combinations (M3-HWSD, M3-GSOC, SPAM-HWSD and SPAM-GSOC).
We ran the simulation only when the current SOC level for a given cropland grid cell fell within the effective range of the SOC-yield relationships (approximately 3 to 9 kg C m⁻², with some variations between the crops and dataset combinations, as shown in "Results"). Although an increase in SOC in C-poor cropland (< 3 kg C m⁻²; Fig. 1c, d) should be beneficial for yield improvement, the RF-derived SOC-yield relationships in that area were highly uncertain due to the lack of samples. The RF-derived SOC-yield relationships in the C-rich cropland (> 9 kg C m⁻²; Fig. 1c, d) were uncertain too, and the yield change through improved soil management in the C-rich cropland is expected to be small and will not motivate farmers in that area to further increase SOC. For these reasons, we limited our simulation to croplands with intermediate SOC levels.
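The yield-gain and N-equivalent steps, together with the effective-range filter, can be sketched with piecewise-linear response curves. The curves below are hypothetical stand-ins for the RF-derived relationships, the fixed 3 to 9 kg C m⁻² range ignores the crop- and dataset-specific variations, and the function names are illustrative only.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation on ascending xs; clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])


# Hypothetical response curves (yield in t/ha); both are strictly increasing.
SOC_GRID, SOC_YIELD = [3.0, 6.0, 9.0], [2.0, 2.6, 2.8]
N_GRID, N_YIELD = [0.0, 50.0, 100.0, 200.0], [1.5, 2.3, 2.8, 3.2]


def simulate_cell(soc, soc_max, n_current, harvested_area_ha):
    """Return yield gain, production gain and N-equivalent, or None outside the effective range."""
    if not 3.0 <= soc <= 9.0:
        return None  # SOC-yield relationship considered too uncertain here
    gain = interpolate(max(soc, soc_max), SOC_GRID, SOC_YIELD) - interpolate(soc, SOC_GRID, SOC_YIELD)
    target = interpolate(n_current, N_GRID, N_YIELD) + gain
    if target > N_YIELD[-1]:
        n_equivalent = None  # gain cannot be matched by N alone on this curve
    else:
        # Invert the increasing N-yield curve by swapping its axes.
        n_equivalent = interpolate(target, N_YIELD, N_GRID) - n_current
    return {"yield_gain": gain,
            "production_gain": gain * harvested_area_ha,
            "n_equivalent": n_equivalent}


result = simulate_cell(soc=4.5, soc_max=7.5, n_current=50.0, harvested_area_ha=1000.0)
print(result)
```

Summing `production_gain` and `n_equivalent` times harvested area over grid cells, by crop, farming type and region, would give totals of the kind reported in the Results.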

Climate mitigation by additional SOC stock
The estimated potential of additional SOC stock over the global cropland was converted into a global decadal mean surface temperature change. The Intergovernmental Panel on Climate Change (IPCC) Working Group I Fifth Assessment Report (IPCC 2013) illustrates, in Figure SPM.10, a linear relationship between cumulative total anthropogenic CO2 emissions from 1870 and global decadal mean surface temperature change relative to 1861-1880. We recalculated this relationship using the bias-corrected daily mean 2-m air temperature data of eight atmosphere-ocean coupled general circulation models (GCMs) (Iizumi et al. 2017b) used in the Coupled Model Intercomparison Project phase 5 (CMIP5; Taylor et al. 2012) to derive a global decadal mean surface temperature change relative to 1850-1900 per 1 GtCO2 change. See Supplementary Figure S10 and Supplementary Table S2 of Iizumi and Wagai (2019) for more details. The temperature sensitivity used here spans from 4.482 × 10⁻⁴ to 7.898 × 10⁻⁴ °C (GtCO2)⁻¹ across eight GCMs and four representative concentration pathways (RCPs; van Vuuren et al. 2011).
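The conversion can be illustrated with a short calculation. The SOC totals and sensitivity bounds are the values quoted in this paper; pairing the lowest SOC estimate with the lowest sensitivity (and the highest with the highest) is an assumption of this sketch that happens to be consistent with the reported temperature range.

```python
C_TO_CO2 = 44.0 / 12.0  # molar mass ratio of CO2 to C


def avoided_warming(soc_gtc, sensitivity_c_per_gtco2):
    """Temperature change (deg C) avoided by sequestering soc_gtc of carbon."""
    return soc_gtc * C_TO_CO2 * sensitivity_c_per_gtco2


SENS_LOW, SENS_HIGH = 4.482e-4, 7.898e-4  # deg C per GtCO2

co2_mean = 12.78 * C_TO_CO2            # GtCO2 equivalent of the mean SOC estimate
low = avoided_warming(11.55, SENS_LOW)
high = avoided_warming(14.05, SENS_HIGH)
print(round(co2_mean, 1), round(low, 3), round(high, 3))
```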

Relative importance of soil, climate and management factors
The RF models were successfully fitted to the data, enabling us to infer the relative contribution to yield from individual factors. A predominant portion of the geospatial yield patterns in the 2000s was explained by soil, climate and management factors, with explained variances of 71.5% to 93.5% and root-mean-square errors (RMSEs) of 15% to 34% relative to the average actual yields (Additional file 1: Fig. S1). The fitting performance of the RF models for millet was notably different between the two yield datasets. For the SPAM dataset, we used total millet yields calculated from pearl and small millet, but it is unclear whether this is the reason for the discrepancy. The RF models revealed that pesticide application rate, agricultural knowledge stock, and N application rate are the leading management factors in explaining the yields on a crop- and multidataset-average basis (the gray bars in Fig. 3). Climatic factors (the seasonal temperature in particular), the remaining management factors (irrigation intensity and farm field size), and SOC were determined to be important in addition to the leading management factors. The soil water holding capacity was presumed to be important in addition to these factors. However, the estimated importance of SOC varied depending on which SOC dataset was analyzed. This tendency was prominent for wheat and millet, and SOC was ranked as the fourth most important factor for these crops when the GSOC dataset was used (Fig. 3). The uncertainty of the estimated importance of SOC associated with the use of different datasets was small for maize, rice, soybean and sorghum.
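The two fit statistics quoted above can be computed as in the following sketch. The definitions used here (explained variance as one minus the ratio of mean squared error to the variance of the observations, and RMSE divided by the mean observed yield) are common conventions assumed for illustration, and the yield values are hypothetical.

```python
def mean_squared_error(observed, predicted):
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def explained_variance_pct(observed, predicted):
    """100 * (1 - MSE / variance of the observations)."""
    mean_obs = sum(observed) / len(observed)
    variance = sum((o - mean_obs) ** 2 for o in observed) / len(observed)
    return 100.0 * (1.0 - mean_squared_error(observed, predicted) / variance)


def relative_rmse_pct(observed, predicted):
    """RMSE expressed as a percentage of the mean observed value."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * mean_squared_error(observed, predicted) ** 0.5 / mean_obs


# Hypothetical observed and predicted yields (t/ha) for four grid cells.
obs = [2.0, 4.0, 6.0, 8.0]
pred = [3.0, 4.0, 5.0, 8.0]
ev = explained_variance_pct(obs, pred)
rrmse = relative_rmse_pct(obs, pred)
print(round(ev, 1), round(rrmse, 1))
```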

Yield response patterns to individual factors
Taking the M3-HWSD dataset combination as an example, on a multicrop-average basis, the RF models revealed the following yield response patterns for the individual factors. The yield increases with an increase in the pesticide application rate and then levels off (Fig. 4a). A similar pattern was found for the agricultural knowledge stock and N application rate (Fig. 4b, c). The yield response to climate factors is nonlinear: the yield starts decreasing when the season temperature exceeds approximately 10 °C (Fig. 4d), and the yield increases as water deficits are reduced, whereas a water surplus no longer elevates the yield and even decreases it (Fig. 4g). The yields respond almost linearly to increasing irrigation intensity and farm field size (Fig. 4e, f). The yield steeply increases with an increase in SOC (Fig. 4h), whereas it weakly increases with an increase in soil available water storage capacity (Fig. 4i). The estimated yield responses for the other dataset combinations (Additional file 1: Figs. S2-S4) are qualitatively similar to those for M3-HWSD. However, some quantitative differences are notable. For instance, the yield response to SOC for SPAM-GSOC (Additional file 1: Fig. S4h) is relatively moderate compared to that for M3-HWSD (Fig. 4h), M3-GSOC (Additional file 1: Fig. S2h) and SPAM-HWSD (Additional file 1: Fig. S3h). The yield response to soil available water storage capacity varied in magnitude according to the dataset used (Fig. 4i versus Additional file 1: Fig. S3i, for instance).

Fig. 4 Yield responses to soil, climate and management factors. The partial dependence plots derived using the RF models fitted to the M3-HWSD dataset combination are presented. The variables are sorted by the multicrop- and multidataset-average importance from high (a) to low (i). The y-axis indicates the yield change relative to the crop-specific, non-area-weighted global mean yield. The colored solid lines indicate the relationship between the yield and individual factors that fall within the 90% probability interval of the samples, while the colored dotted lines indicate the relationship that falls within either of the lower 5% or upper 5% intervals. The bold black lines indicate the multicrop-average relationship for the 90%-probability intervals

Established SOC- and N-yield relationships
When SOC increases, yield also increases. This tendency appeared in the estimated SOC-yield relationships irrespective of crops and datasets (Fig. 5). However, the estimated yield response to increasing SOC was different between the crops, with relatively larger responses for wheat, sorghum and millet (Fig. 5a, e, f) than for maize, rice and soybean (Fig. 5b, c, d). In addition, the magnitude of the response varied by the dataset used. For wheat, the increase in yield for a change in SOC between 3 and 9 kg C m⁻² estimated using the GSOC dataset was remarkably greater than that estimated using the HWSD dataset (Fig. 5a). Another example was rice: when the HWSD dataset was used, the yield increased as SOC increased, but when the GSOC dataset was used, the yield did not change much with increasing SOC (Fig. 5c).
The use of different SOC datasets played a greater role than the use of different yield datasets as the source of uncertainties in the estimated SOC-yield relationships. In contrast, for any crop, the estimated N-yield relationships were not sensitive to the choice of yield and SOC datasets. A tendency for yields to increase with increasing N application rate was detected in all crops considered here except soybean, with some variations by crop (Fig. 6). For instance, the estimated yield response for rice to an increase in the N application rate between 0 and 200 kg N ha⁻¹ year⁻¹ (Fig. 6c) was smaller than that for wheat and maize (Fig. 6a, b). The weaker yield response of soybean to the N application rate is reasonable because soybean is a legume, and its yield is less sensitive to soil N content than that of cereals owing to nitrogen fixation.

Additional SOC stock and resulting synergies
The simulation experiment revealed that the additional SOC stock aimed at increasing the SOC-controlled yield over the global cropland amounted to 12.78 GtC (Fig. 7a, b), with a minimum-maximum range between the four dataset combinations of 11.55 to 14.05 GtC (Table 2). This agricultural C sequestration equates to 46.9 GtCO2 (42.4-51.6 GtCO2), which is 1.4-fold the global annual CO2 emissions in 2018 (33.5 GtCO2), and was estimated to prevent global decadal mean temperature warming by 0.030 °C (0.019-0.041 °C). The increase in crop production through SOC management was estimated to reach 38.24 Mt (22.88-57.48 Mt) globally for the six crops together (Fig. 7b; Table 2), with relatively large uncertainty in the estimates in East Asia (China) and South Asia (India) (the right panels of Additional file 1: Figs. S5-S8). In absolute terms, this equals the average annual production of wheat in France in the 2010s (38 Mt). This production increase was presumed to be equal to what would be achieved by an inorganic N input of 5.82 Mt N (3.89-7.14 Mt N) (Additional file 1: Fig. S9), which equals 5.5% of the global N input in the 2010s. The production increase and inorganic N input savings were both roughly proportional to the size of the additional SOC stock (Additional file 1: Fig. S10).
The simulation results aggregated by farming type showed that, at the global scale, small-scale farmers (the low-input and subsistence farming types together) accounted for 28% (26-30%) of the additional SOC stock and 17% (15-20%) of the crop production increase (Fig. 7b; Table 2). The key crops and regions in terms of the small-scale farmers' contributions include maize and rice in Sub-Saharan Africa, soybean and maize in Latin America and the Caribbean, rice and wheat in South Asia and in Southeast Asia and Oceania, and wheat in Eastern Europe and Central Asia, among others (Fig. 7a).
The uncertainty in the estimated global and regional potentials of additional SOC stocks stems from the different geospatial patterns (the left panels of Additional file 1: Figs. S5-S8). The uncertainty was especially large in East Asia and South Asia (Table 2). The different geospatial patterns of C-poor and C-rich areas (Fig. 1c, d) and of the SOCmax values (Fig. 1h) between the HWSD and GSOC datasets largely contributed to this uncertainty. The variation in the SOC-yield relationships resulted from the choice of the dataset (Fig. 5). The different geospatial patterns of harvested areas between the crops (Additional file 1: Figs. S5-S8) also contributed to the uncertainty as a secondary factor. The uncertainty in the estimated small-scale farmers' contributions was relatively large for South Asia and for Eastern Europe and Central Asia but small for the remaining regions (Table 2).

Table 2 Global and regional potentials of additional SOC stock, crop production increase and inorganic N-input equivalent to that production increase, and the small-scale farmers' contributions. The averages and minimum-maximum ranges of the estimates between the four dataset combinations are presented

Discussion
Here, we first discuss the validity of the estimated SOC-yield relationships. Then, the estimated potentials of additional SOC stock are compared with earlier studies. Finally, we discuss the possible implications and limitations of this study. The validity of the estimated yield responses to factors other than SOC is discussed in Additional file 1.

SOC-yield relationships
The yield response to SOC estimated in the present study was generally consistent with the literature. The estimated yield response to SOC was smaller for soybean than for the other crops (Fig. 5). This tendency is in accord with the study reporting a weaker yield response to crop residue application for soybean than for maize (Wilhelm et al. 1986). By reviewing the studies conducted in China and India, Lal (2010) showed that the yield response to SOC was higher for wheat than for maize and rice when crops grown in the same location were compared. Our results are in line with these observations. The estimated yield response in relatively C-rich soil conditions was less sensitive to further increases in SOC regardless of crops (Fig. 5). This "yield plateau" pattern is qualitatively consistent with the literature reporting yield decline along with SOC increase under well-fertilized conditions in India (Benbi and Chand 2007). The yield plateau is also reported for cooler regions in China (Pan et al. 2009) and maize-producing areas on soils with SOC > 2-3% worldwide (Oldfield et al. 2019). However, the threshold SOC level leading to the yield plateau varies between the earlier literature and this study. Zhang et al. (2016) showed that the threshold SOC level for the yield plateau in China ranged from 2.2 to 4.6 kg C m⁻², which is lower than our result of approximately 6 kg C m⁻² (Fig. 5). Their analysis indicated that the threshold was more strongly controlled by geographic region (temperate versus subtropical) than by crop (maize versus wheat) (Zhang et al. 2010). This observation in part supports our use of the SOCmax values specific to climate bins, although there is room for improvement by accounting for edaphic factors (e.g., texture, mineralogy, aggregate structure) that may also be linked with the inherent capacity of soil to store organic carbon (Gulde et al. 2008; Singh et al. 2017; Six et al. 2002).

Comparisons with earlier studies
The soil C sequestration potential of agricultural land has rarely been estimated at a global scale. A notable exception is Lal (2016), which claims that the world's cropland soils could sequester as much as 62 t ha⁻¹ (6.2 kg C m⁻²) over the next 50-75 years, with a total C sink capacity of ~88 GtC on 1420 Mha; however, Lal (2016) also states that the actual or attainable potential may be only one-third to one-half of this capacity (i.e., 29 to 44 GtC). Our estimate of 12.78 GtC (11.55-14.05 GtC) is close to the lower end of the earlier estimates of 14.5 to 22 GtC, the values obtained by multiplying the 29 to 44 GtC of Lal (2016) by 0.5, given that the six crops (779 Mha) used in this study occupy about 50% of the global cropland area. Notably, the additional SOC build-up estimated in this study could be achieved in one to three decades if socioeconomic constraints were resolved, given the global technical potential for a C sequestration rate of 0.4-1.4 GtC year⁻¹ reported in the literature (Sommer and Bossio 2014 (0.4 to 0.7 GtC year⁻¹); Smith 2016 (0.7 GtC year⁻¹); Fuss et al. 2018 (1.4 GtC year⁻¹)).
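The arithmetic behind this comparison can be retraced with a short back-of-envelope sketch. All input numbers are taken from the paragraph above; the variable names are illustrative only and are not part of the study's workflow.

```python
# Back-of-envelope check of the figures quoted in the comparison paragraph.
# Inputs are the values cited in the text; variable names are hypothetical.

MHA_TO_HA = 1e6   # hectares per million hectares
T_TO_GT = 1e-9    # gigatonnes per tonne

capacity_t_per_ha = 62.0          # t C per ha (Lal 2016)
global_cropland_mha = 1420.0      # Mha of global cropland (Lal 2016)
six_crop_area_mha = 779.0         # Mha covered by the six crops in this study
attainable_gtc = (29.0, 44.0)     # attainable potential range (Lal 2016)
this_study_gtc = 12.78            # central estimate of additional SOC
rate_gtc_per_year = (0.4, 1.4)    # technical sequestration rate range

# Total sink capacity: 62 t/ha on 1420 Mha, expressed in GtC.
total_capacity_gtc = capacity_t_per_ha * global_cropland_mha * MHA_TO_HA * T_TO_GT

# Share of global cropland occupied by the six crops.
area_share = six_crop_area_mha / global_cropland_mha

# Attainable potential scaled by the rounded 50% share used in the text.
scaled_rounded = tuple(0.5 * x for x in attainable_gtc)

# Same scaling using the unrounded area share, for comparison.
scaled_exact = tuple(area_share * x for x in attainable_gtc)

# Years needed to reach the estimate at the fastest and slowest rates.
years_fast = this_study_gtc / max(rate_gtc_per_year)
years_slow = this_study_gtc / min(rate_gtc_per_year)

print(f"total capacity: {total_capacity_gtc:.1f} GtC")
print(f"six-crop area share: {area_share:.2f}")
print(f"scaled range (0.5): {scaled_rounded[0]:.1f}-{scaled_rounded[1]:.1f} GtC")
print(f"scaled range (exact share): {scaled_exact[0]:.1f}-{scaled_exact[1]:.1f} GtC")
print(f"time to build up: {years_fast:.0f}-{years_slow:.0f} years")
```

With the unrounded area share (about 0.55 rather than 0.5), the scaled range shifts slightly upward, to roughly 16 to 24 GtC, which does not change the qualitative comparison.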

Implications and limitations
The yield-increasing effects of SOC are estimated to be smaller than the effects of management and climatic factors, with some variation by crop. This is thought to be reasonable, given that improved varieties explain one-fourth to one-half of the past yield increase, and improved management (the increased use of synthetic fertilizers, irrigation, chemicals and machinery, and improved input-use efficiency) accounts for the remaining portion of the yield increase (Herdt and Capule 1983; Jones 2013). In addition, there is a growing body of evidence that seasonal climate conditions explain one-third to two-thirds of yield variability (Iizumi et al. 2013; Ray et al. 2015; Heino et al. 2018). Although these explained-variance values of yield trend and variability do not necessarily sum to 100%, it is evident that management and climatic factors together explain a predominant portion of the variation in yield in space and time. Nevertheless, the SOC effect deserves close attention because SOC build-up can be achieved through traditional farming practices (manure application, reduced or no tillage, cover cropping, etc.) even when farmers' access to agricultural chemicals, synthetic fertilizer and seeds of modern varieties is limited (FAO and ITPS 2021). While the reduction in agricultural chemicals and inorganic N input would have certain environmental benefits, enhancing SOC has well-known and more wide-ranging co-benefits than controlling other management factors. Soil management practices need to be fine-tuned for the farming types and crops specific to the region of implementation, in addition to the physical and chemical characteristics of the soils (Amelung et al. 2020). While the feasibility of increasing the C sequestration rate in global cropland and achieving the 4 per mille goal is under debate, even a smaller SOC build-up can have robust co-benefits for yield, soil fertility, drought tolerance and water quality, as mentioned earlier.
Small-scale farmers have been a linchpin in recent policymaking (Cui et al. 2018; Ricciardi et al. 2018; Lesiv et al. 2019), and their contribution to the potential of global SOC build-up and crop production increase is estimated to be sizable. Therefore, it is critical that institutions and policies promoting soil management for sustainable land and climate mitigation be aligned with and embedded within agricultural and rural development policies in which small-scale farmers are the main target.
We excluded C-poor croplands with SOC < 3 kg C m⁻² from our simulation, most of which are located in arid and semiarid regions of the world, although such areas should have a certain potential for additional SOC stock and co-benefits to dryland agriculture (Plaza-Bonilla et al. 2015). SOC enhancement in these areas is likely affected by the availability of irrigation, as indicated by findings from field irrigation experiments across different climate zones that report a positive effect of irrigation on SOC build-up, particularly in arid and semiarid areas with low initial SOC (Trost et al. 2013; Zhou et al. 2016). Future research needs to address this limitation.
Other limitations of the current approach include the uncertainty associated with the significant discrepancy between the two global SOC datasets discussed above. Even the estimates of current global SOC stocks differ substantially between existing global soil datasets (FAO and ITPS 2018). Improvement of the global SOC datasets is essential to obtain more accurate estimates of the additional SOC build-up in global cropland. Simulations using process-based crop models coupled with soil biogeochemical models are needed to estimate the time required to build up the SOC presented in this study as well as the amount of input to the soils required. The relationships used in our simulation indicate the yield response to SOC and N application rate when the other explanatory variables considered remain unchanged. This assumption may not be realistic for some regions and crops, since the SOC build-up rate itself is influenced by management and climatic conditions. Process-based models are capable of assessing multiple factors that correlate with one another. Such interactions are, however, difficult to address with statistical models.

Conclusions
This study presents the potential synergy between crop production and climate mitigation from additional SOC build-up over global cropland via improved soil management. Estimating the amount of inorganic N input required to achieve the crop production increase expected from the SOC build-up presented in this study would be a useful starting point to explore further synergies through saving N fertilizer input and costs and maintaining drinking water quality. Importantly, the contributions of small-scale farmers to the potential of additional SOC stock and crop production increase are sizable. Our findings emphasize the importance of linking institutions and policy between agricultural and rural development, sustainable land management and climate mitigation to simultaneously pursue multiple development goals.
Additional file 1: Table S1. Base temperature, upper temperature limit and crop water needs used to calculate the season temperature and water balance index. Fig. S1. Comparisons between the observed and modeled yields.