Genetic analysis of scab disease resistance in common bean (Phaseolus vulgaris) varieties using GWAS and functional genomics approaches

Introduction: Scab is a fungal disease of common bean caused by the pathogen Elsinoë phaseoli. The disease causes major economic losses in common bean, and efforts are under way to develop integrated pest management strategies to control it. Computational biology and bioinformatics tools were used to identify genomic regions and genes associated with resistance to scab disease during natural infection in the field.
Methods: A diverse set of common bean accessions was analyzed for genetic association with scab disease resistance using a genome-wide association study (GWAS) design of infected plants and non-infected plants (controls). A fixed and random model circulating probability unification (FarmCPU) model incorporating these two groups and a minor allele frequency threshold of 0.03 was deployed during the analysis. Proteins encoded by genes with significant association values were annotated using prPred, a support vector machine (SVM) machine learning tool, run with python3 on a Linux Ubuntu 18.04 computing platform, with an accuracy of 0.935.
Results: The common bean accessions tested showed varying degrees of susceptibility to scab disease. Out of 179 accessions, 16 were resistant and 163 were susceptible to scab disease. Genomic analysis revealed a significant association at SNP S1_6571566 on chromosome one, where the protein-coding sequence had a resistance possibility of 55% and was annotated as an Enhancer of Polycomb-like protein.
Conclusion: The significant differences in phenotypic variability for scab disease indicate wide genetic variability among the common bean accessions. A gene associated with scab disease resistance was identified by GWAS analysis. The common bean accessions identified as resistant to scab disease can be adopted into breeding programs as sources of resistance.


Introduction
Economically important diseases of the common bean are a major problem, causing crop production losses of up to 100% (Mahuku et al. 2002). Common beans are subjected to numerous biotic stresses from bacterial, viral and fungal pathogens, including scab, a fungal disease that is common among many plant species and significantly reduces yield (Fan et al. 2017). Scab disease, caused by the fungal pathogen Elsinoë phaseoli, poses a significant economic threat to common bean (Phaseolus vulgaris) grain production and causes yield losses (Masheti 2019). Although current control and preventive measures consist of good agronomic practices adopted from standard agronomic guidelines for the control of other diseases, breeding for resistant varieties would be a more sustainable approach (Otsyula et al. 2020; Singh and Schwartz 2010). A sustainable mitigation strategy for the economic losses associated with scab disease would therefore be the identification of genes associated with resistance and breeding for resistant common bean varieties that are adapted to local environments.
In common bean, the disease is caused by the fungal pathogen Elsinoë phaseoli of the genus Elsinoë, family Elsinoaceae, and has historically been recorded to have a devastating impact, leading to gross economic damage due to complete plant losses in farmers' fields (Masheti 2019; Mutitu 1979; Otsyula et al. 2018; Phillips 1994). Scab is a common disease among leguminous plants, including Lima bean, cowpea, runner bean, and common bean (Mutitu 1979; Phillips 1994; Singh and Allen 1979). It is characterized by cork-like white lesions on stems and pods and by leaf folding at the midrib. Empirically, severe scab infection has a major impact on yield components such as the number of pods per plant, which in turn has a direct negative effect on yield. In South Africa, many of the infected pods were observed to be distorted and failed to form seeds (Phillips 1994). In Kenya, similar observations are made in farmers' fields, where heavily infected plants bear curly pods. Efforts to map genes associated with resistance to the scab pathogen in common bean have not been reported, even though resistance genes have been sought in other crops, such as apple (Malus domestica Borkh) (McClure et al. 2018), where GWAS was used to identify QTL with statistically significant associations to scab disease resistance. One pivotal approach to understanding scab disease resistance in common beans is the genome-wide association study (GWAS). GWAS has been widely used to identify genetic factors associated with disease resistance in various crops, and its application in common beans is gaining momentum (Nunzio D'Agostino et al. 2023; Perseguini et al. 2016; Zhang et al. 2019; Zuiderveen et al. 2016).
Statistical models such as the multilocus mixed model (MLMM), mixed linear models (MLM) and generalized linear models (GLM) are commonly used to test the association between genetic markers and phenotypic traits. The FarmCPU GWAS model is designed to minimize false positives and negatives by utilizing a mixed linear model method that considers both fixed and random effects. This approach enables the control of population structure and familial relatedness simultaneously (Liu et al. 2016). FarmCPU has demonstrated considerable promise in identifying loci associated with critical agricultural traits, such as disease resistance (Miao et al. 2018; Sadessa et al. 2022).
In addition, the application of computational biology tools in understanding scab disease resistance cannot be overlooked. These tools, including machine learning and deep learning technologies, have revolutionized the field of proteomics. They enable the prediction of protein functions related to disease resistance, particularly focusing on specific motifs such as nucleotide-binding site leucine-rich repeats (NBS-LRRs), the Toll/interleukin receptor (TIR) domain, receptor-like kinases and receptor-like proteins (RLKs and RLPs) and the WRKY domain (Tang et al. 2017; Hammond-Kosack and Kanyuka 2007). Computational tools such as NLR-Parser, the disease resistance protein prediction program (DRPPP), NBSpred and RGAugury have been developed to identify these motifs in protein sequences, facilitating the classification of proteins as resistant or non-resistant (Kushwaha et al. 2021; Pal et al. 2016). Moreover, support vector machine (SVM) algorithms have been effectively deployed for this purpose. The SVM is a large-margin classifier, a vector-space-based machine learning method whose goal is to find a decision boundary between two classes that is maximally far from any point in the training data (Klampanos 2009). Here it was used to classify plant proteins as disease resistant or non-resistant. In this study we evaluated a diverse set of common bean accessions against naturally occurring scab disease and subjected the genotypes to a GWAS analysis to identify SNPs associated with resistance, then annotated the genes associated with those SNPs to determine their function using a computational biology approach.

Phenotypic evaluation of common beans accessions for scab disease
A total of 179 common bean accessions of the Andean diversity panel (Cichy et al. 2015) were evaluated for scab infection occurring naturally in two agro-ecological zones: Butonge, Lower Midland zone LM (Latitude 0° 42′ 45.6762″ N, Longitude 34° 27′ 55.5762″ E) and Kakamega, Upper Midland zone UM (Latitude 0° 16′ 47.5062″ N, Longitude 34° 46′ 1.8192″ E), aided by spacer rows of highly susceptible accessions to increase disease pressure and ensure even distribution in the experimental fields. Disease severity was assessed using a scale ranging from 0 to 3 (Mbugua 2016), with 0 indicating no disease, 1 indicating healthy (resistant) plants, 2 indicating scab lesions coalescing into dead tissue zones with leaf curling (tolerant), and 3 indicating severe disease affecting over 50% of plants with stem twisting and plant death (susceptible). Data on flowering and pod filling were collected concurrently with disease severity assessments.
Samples of scab-infected common bean tissues (pods, stems, and leaves) were collected for further analysis. The tissues were washed, sterilized, and dissected for microscopy analysis. Methylene blue staining was used, and the specimens were viewed under a light microscope at 400× magnification.
Infected tissue samples from pods, stems, and leaves were scraped and streaked on potato dextrose agar (PDA) media doped with chloramphenicol antibiotic (50 mg/l). After growth, distinct colonies were subcultured on fresh PDA media, allowing for the isolation of Elsinoë phaseoli. Infected and healthy plant tissues were macerated to extract toxins with an ether and acetone (1:1 v/v) mixture, and their spectroscopic absorbance was measured at 400-600 nm on a spectrophotometer (Model: UV-61PCS from mrc lab).
To reflect a balanced distribution of the effect of scab disease on the plants in the field, a geometric mean of the progressive disease scores at the three disease severity stages was calculated using the R programming language. The geometric means were then used as variables in the analysis of variance (Sokal and Rohlf 2012). To compute the geometric mean, log(x) was first calculated, then the arithmetic mean of the logs and its confidence interval were computed row-wise (Signorell et al. 2022), and the result was back-transformed. This was restricted to positive inputs, since on our scoring scale zero denoted no disease in the environment. Thus the geometric mean of n scores x_1, ..., x_n is defined as:
$$ GM = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \log x_i \right) $$
In the R programming suite (version 4.0) this is given by exp(rowMeans(log(x))).
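The log-scale computation described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study (which ran in R); the function names and the example scores are hypothetical.

```python
import math


def geometric_mean(scores):
    """Geometric mean of strictly positive scores, computed on the log scale."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s <= 0 for s in scores):
        # log(0) is undefined, which is why zero scores must be excluded.
        raise ValueError("geometric mean is only defined for positive scores")
    mean_log = sum(math.log(s) for s in scores) / len(scores)
    return math.exp(mean_log)


def row_geometric_means(rows):
    """Apply the geometric mean to each row (one accession per row)."""
    return [geometric_mean(row) for row in rows]


# Hypothetical severity scores at three growth stages for three accessions.
example_scores = [[1, 1, 1], [1, 2, 3], [2, 3, 3]]
example_means = row_geometric_means(example_scores)
```

Working on the log scale mirrors the R expression quoted in the text and avoids overflow when many scores are multiplied.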
Yield data were collected by weighing the harvested grain per plot in grams. Yield per hectare (yield in tons divided by area in hectares) was then calculated using the formula:
$$ \text{Yield (t/ha)} = \frac{\text{Yield (g)}}{\text{Plot size (m}^2\text{)}} \times \frac{10000}{1000000} $$
The phenotype was adjusted for environmental effects using an analysis of variance (ANOVA) of the data, assuming the mixed effect model:
$$ y_{ij} = \mu + \beta_i + e_{ij} $$
where y_ij is the response variable of the jth experimental unit on the ith explanatory variable, μ is the overall mean, β_i is the effect of the ith treatment and e_ij is the random error ~ (0, Iσ²).
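As a worked illustration of the yield conversion, the short Python sketch below converts grams per plot to tons per hectare. It assumes the plot size is given in square meters (implied by the factor of 10000, but not stated explicitly); the function name and example values are hypothetical.

```python
SQUARE_METERS_PER_HECTARE = 10_000
GRAMS_PER_TON = 1_000_000


def yield_tons_per_hectare(yield_grams, plot_size_m2):
    """Convert a plot yield in grams to tons per hectare.

    Assumes plot_size_m2 is the harvested plot area in square meters.
    """
    if plot_size_m2 <= 0:
        raise ValueError("plot size must be positive")
    grams_per_hectare = yield_grams / plot_size_m2 * SQUARE_METERS_PER_HECTARE
    return grams_per_hectare / GRAMS_PER_TON


# Hypothetical example: 500 g of grain harvested from a 5 m^2 plot.
example_yield = yield_tons_per_hectare(500, 5)
```

With these example values, 500 g on 5 m² is 100 g/m², or one million grams per hectare, that is 1 t/ha.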
Broad-sense heritability for scab resistance and susceptibility traits was calculated in R following Schmidt et al. (2019a, b). Genomic prediction was carried out using the best linear unbiased prediction (BLUP) method for phenotypic values, with the agridat package in R (Wright 2021). Cluster analysis was conducted using a Minkowski distance matrix to categorize the data into distinct clusters.

Identification of genetic variants using genome-wide association studies
Common beans have shown variable phenotypic reactions to scab disease, with some showing degrees of resistance and the majority showing susceptibility (Otsyula et al. 2018). These variations in phenotypic reaction form the basis for a GWAS on the resistance genotype. Genomic data consisting of SNP information for 174 genotypes with 31,194 SNPs were sourced from Cichy et al. (2015) and Song et al. (2015). To identify genetic markers associated with scab resistance, a genome-wide association study (GWAS) was conducted using the FarmCPU (fixed and random model circulating probability unification) model. FarmCPU is a linear model approach that uses fixed and random effect models iteratively in a forward and backward stepwise regression; the fixed effect model, without a kinship matrix, is used to remove confounding and the ambiguity of determining associated markers in linkage disequilibrium (LD) with a testing marker.
In the model, si represents the fixed effects, which are systematic factors that do not vary with the genetic markers, such as population structure; S represents the genetic markers, which are assumed to have some effect on the phenotype (y); e represents the error term, which includes random variation or unexplained factors that affect the phenotype but are not accounted for by the genetic markers and fixed effects; and K represents the variation due to kinship. Kinship derived from the associated markers is used to select the associated markers by the maximum likelihood method (Liu et al. 2016). The model was used to analyze genetic variation across the entire common bean genome and identify genetic markers that correlate with scab disease resistance traits. Quality control measures were applied to exclude SNPs with a minor allele frequency below 3% and individuals with more than 10% missing SNP genotype data. Seven principal components were used to infer features of the study population (Zhao et al. 2018). The analysis was set up as case (infected) versus control (non-infected) plants, with the mean severity score used as a covariate.
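The quality control thresholds can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it assumes genotypes are coded as alternate allele counts (0, 1, 2) with None for missing calls, and that individuals are filtered before SNPs. The function names and the toy matrix are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from alternate allele counts (0/1/2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def missing_rate(genotypes):
    """Fraction of missing calls for one individual."""
    return sum(g is None for g in genotypes) / len(genotypes)


def quality_control(matrix, maf_min=0.03, max_missing=0.10):
    """Return (kept individual indices, kept SNP indices).

    matrix[i][j] is the genotype of individual i at SNP j.
    Assumption: individuals are filtered first, then SNPs.
    """
    kept_ind = [i for i, row in enumerate(matrix)
                if missing_rate(row) <= max_missing]
    n_snps = len(matrix[0]) if matrix else 0
    kept_snps = [
        j for j in range(n_snps)
        if minor_allele_frequency([matrix[i][j] for i in kept_ind]) >= maf_min
    ]
    return kept_ind, kept_snps


# Toy matrix: 4 individuals x 3 SNPs (hypothetical values).
toy = [
    [0, 1, 0],
    [0, 2, 0],
    [0, 0, None],  # one of three calls missing, above the 10% threshold
    [0, 1, 0],
]
kept_individuals, kept_markers = quality_control(toy)
```

In the toy matrix the third individual is dropped for missing data and only the second SNP remains, because the other two are monomorphic among the retained individuals.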
A support vector machine (SVM) learning algorithm, prPred, was used to predict resistance of non-annotated protein sequences of candidate genes (Wang et al. 2021). The model training dataset included R proteins and non-R proteins from 35 species. Various features were extracted from the protein sequences, such as amino acid composition, grouped amino acid composition, quasi-sequence-order, and others. Here the k-spaced amino acid pair encoding scheme was used to select the features incorporated into the SVM to classify disease-resistant proteins in plants (Wang et al. 2021). This scheme computes the frequency of all amino acid pairs separated by k other residues within the peptide sequence, with k up to 3, giving pairs such as CK, CxK, CxxK and CxxxK, where x denotes any intervening residue (Hasan et al. 2015; Huang et al. 2021). The method is effective in feature selection, and the tool has been used effectively to predict plant resistance proteins.
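The k-spaced amino acid pair encoding can be sketched as below. This is an illustrative implementation of the general scheme, not the prPred source code; it assumes gaps of 0 to 3 residues (matching the CK, CxK, CxxK and CxxxK examples), the 20 standard amino acids, and normalization by the number of windows at each gap. The function name and example peptide are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(sequence, max_gap=3):
    """Return {(gap, pair): frequency} for all residue pairs and gaps 0..max_gap.

    A pair 'CK' at gap 2 means C and K separated by two other residues (CxxK).
    """
    features = {}
    for gap in range(max_gap + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        windows = len(sequence) - gap - 1  # number of pair positions at this gap
        for i in range(max(windows, 0)):
            pair = sequence[i] + sequence[i + gap + 1]
            if pair in counts:  # skip non-standard residues such as X
                counts[pair] += 1
        for pair, n in counts.items():
            features[(gap, pair)] = n / windows if windows > 0 else 0.0
    return features


# Hypothetical five-residue peptide used only to illustrate the encoding.
example_features = cksaap("CKCAK")
```

Each protein is thereby mapped to a fixed-length vector (400 pair frequencies per gap value), which is the kind of input an SVM classifier requires.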

Development of PCR primers targeting genes associated with scab disease resistance
Primers were designed for candidate genes associated with scab disease resistance. Two sets of primer pairs were created, with outer primers flanking the SNP region and inner primers having a nucleotide mismatch at the 3' end to specifically target the alternate allele A/G. These primers were designed for the coding sequence of the gene with symbol PHAVU_001G055900g in the Phaseolus vulgaris reference genome (Schmutz et al. 2014). Outer Primers: Forward 5ʹ-GGT ATG GTA CAG TTA TGA CAA GTG -3ʹ, reverse 5ʹ-CAG CCA TGT TCA AGC AGC CTTCA-3ʹ. Inner Primers: Forward 5ʹ-GGA GAT GCT TTT TGT TGA TAATA-3ʹ, reverse 5ʹ-AAT AGA AAT CTC AAC CCA ACCAc-3ʹ.
PCR primer validation was performed on scab-tolerant local accessions, including un-genotyped ones that exhibited varying responses to scab disease in the field. This validation aimed to assess marker polymorphism between resistant and susceptible accessions and confirm the alignment of identified genes with observed phenotypes. Twelve scab-resistant plant accessions were used, with genomic deoxyribonucleic acid (DNA) extracted from their young trifoliate leaves using a modified cetyltrimethylammonium bromide (CTAB) extraction protocol. The DNA concentration was confirmed by agarose gel electrophoresis, which displayed a single, prominent band at approximately 10 kbp. Three polymerase chain reactions (PCR) were performed: one using outer primers to target the original gene version, another combining reverse outer and forward inner primers, and a third using forward outer and reverse inner primers to target the alternate version on the forward and reverse strands. The reactions used OneTaq PCR premix containing 1 U/µl of Taq polymerase enzyme, 0.2 mM deoxyribonucleotide triphosphates (dNTPs), reaction buffer and 2 mM MgCl2, with 0.5 µM of each reverse and forward primer and 5 ng/µl of genomic DNA. The PCR program consisted of one cycle at 94 °C for 3 min, followed by 34 cycles of 94 °C for 10 s, annealing for 30 s at 64 °C (outer primers), 50 °C (reverse outer plus forward inner) or 51 °C (forward outer plus reverse inner), and extension at 72 °C for 2 min. A final extension was run for 5 min at 72 °C, after which products were stored at 4 °C. A 6% horizontal polyacrylamide gel electrophoresis was performed for one hour and post-stained with 0.03 mg/ml of ethidium bromide.

Phenotypic evaluation of common beans accessions for scab disease
Scab disease symptoms were observed to progress over time in different common bean accessions under natural infection conditions throughout the growing period. The symptoms (Fig. 1) included corky wart-like lesions on the leaves, scab lesions coalescing into dead tissue zones on leaves, inward curling of leaves due to midrib infection, twisting of the stem and leaves, cork-like lesions on pods, mummified pods, complete defoliation, and plant death.
Microscopic examination (Fig. 2) of infected plant tissues revealed the presence of mycelia that penetrated plant cells, and dense canker tissue containing acervuli, asci, and ascospores. These features are indicative of Elsinoë phaseoli infection, confirming the cause of scab disease.
Ultraviolet (UV) spectroscopic methods were employed to compare the spectral signature of infected tissue with that of healthy tissue from the same common bean accession (Loc0004). A distinct spectral signature at 470 nm was identified in the infected tissue but not in the healthy tissue of the same accession (Fig. 3). This method thus revealed a clear difference in the spectral characteristics associated with Elsinoë phaseoli infection.
Disease severity scores varied across the three growth stages, with the highest scores recorded at the pod filling stage (Fig. 4). The data showed positive skewness, indicating that the distribution of severity scores was skewed to the right, with a right tail longer than the left and extending towards the higher disease severity scores. The standard error (SE) increased across the growth stages from 0.0154 to 0.0344 in the Kakamega field and from 0.0198 to 0.0354 in the Butonge field, corresponding to growing variability in the estimates.
Broad-sense heritability for scab disease traits was calculated to be approximately 0.45642, indicating the proportion of phenotypic variance attributable to genetic factors. The random effect model provided the Best Linear Unbiased Predictor (BLUP) values, which represent the estimated random effects of the different genotypes (Table 1); these control for unobserved genetic heterogeneity, assuming it is constant over time, while minimizing bias and providing the best linear estimate at the intercept. The heritability values under the standard (h2.s), Cullis (h2.c) and Piepho (h2.p) approaches were calculated as 0.45642, 0.60272 and 0.61512, respectively, along with the genetic variance (V.g). Hierarchical clustering (Fig. 5) by the unweighted pair group method with arithmetic mean (UPGMA) grouped the common bean accessions based on their similarity in terms of resistance or susceptibility to scab disease. The dendrogram identified three distinct clusters in which common bean accessions with similar disease phenotypes were grouped together. The first cluster consisted of common bean varieties such as ADP0537, ADP0030, LOC0003 and ADP0719 that exhibited high resistance to scab disease. The second cluster consisted of common bean varieties that showed moderate levels of resistance to scab disease in the field trials. These varieties may possess partial resistance or a combination of genes that provide protection against specific pathogens. The third cluster comprised common bean varieties such as ADP0585, ADP0303, LOC0001 and LOC0004 that were susceptible to scab disease.
Fig. 3 Elsinochrome from Elsinoë phaseoli agar plug with a distinct absorbance at 470 nm

Identification of genetic variants using genome-wide association studies and functional genomics
The quantile-quantile (QQ) plot (Fig. 6) assessed how well the GWAS model accounted for population structure and familial relatedness. The negative logarithms of the p-values from the models fitted in GWAS were plotted against their expected values under the null hypothesis of no association with the trait. The GWAS analysis using FarmCPU identified genetic loci associated with scab disease resistance in a panel of 179 common bean accessions. After controlling for population structure and relatedness, we detected one significant quantitative trait locus (QTL) associated with scab disease resistance across different environments, with a p-value of 1.8e−08. This QTL spanned a region on chromosome 1 and explained up to 45.6% of the phenotypic variation in scab disease resistance. Our results suggest that the identified QTL could be a promising target for marker-assisted breeding for common bean improvement.
Genome-wide association studies (GWAS) identified a significant single nucleotide polymorphism (SNP) associated with scab disease resistance on chromosome one. This SNP explained a significant proportion of the phenotypic variation in resistance. A second suggestive locus of interest was observed on chromosome eleven at S11_1967229. Additional SNPs were also identified on chromosome one at S1_5502835 and on chromosome eleven at S11_9666528 and S11_38448497. Associations between phenotype and genetic markers are displayed as Manhattan plots (Fig. 7).
Among the SNPs tested (Table 2), S1_6571566 had the lowest p-value (1.81E−06) and a minor allele frequency of 0.084848. This SNP was also associated with a relatively high effect size of 0.46, suggesting that it may be an important genetic variant underlying scab disease resistance in common beans.
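The construction of a QQ plot from GWAS p-values can be sketched as follows. This is a generic illustration, not the plotting code used in the study; it assumes the common convention of expected quantiles (i − 0.5)/n for the ith smallest of n p-values, and the function name and example p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, sorted in ascending order.

    Under the null hypothesis p-values are uniform on (0, 1), so the ith
    smallest of n p-values is expected near (i - 0.5) / n (an assumed convention).
    """
    n = len(p_values)
    if n == 0:
        return []
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical p-values: three unremarkable markers and one strong signal.
example_points = qq_points([0.5, 0.05, 0.9, 1e-6])
```

Points lying along the diagonal indicate that population structure is well controlled, while a point far above the diagonal at the upper end corresponds to a marker with a stronger association than expected by chance.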
The support vector machine tool prPred predicted the protein (Table 3) linked to the significant SNP associated with scab disease resistance in the common bean to be a resistance protein of the enhancer of polycomb-like 1 (EPL1) protein family, gene symbol PHAVU_001G055900g, with a resistance possibility of 0.547104, followed by an ATP binding cassette 2 transporter (ABC2 transporter) protein, gene symbol PHAVU_001G054400g, within a nearby locus, with a resistance possibility of 0.640101. On chromosome eleven the nucleotide binding site leucine rich repeat (NBS-LRR) protein of gene symbol PHAVU_010G063000g had the highest resistance possibility of 0.991374. The Adaptin and PHD finger proteins on chromosome eleven were predicted to be resistant, with resistance possibilities of 0.573563 and 0.769060, respectively. On chromosome one a domain locus tagged by S1_5502834 was detected, encoding protein kinases involved in disease resistance. Putative genes associated with resistance were found on chromosomes one and eleven in close proximity to the tagging SNPs (Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Table S3, Additional file 4: Table S4, Additional file 5: Table S5 and Additional file 6: Table S6). The other proteins were classified as non-R proteins, with prediction values below 50% on prPred (Table 3).
At the significant SNP S1_6571566 on chromosome one, which was tagged on EPL1, the alternate allele is a G as opposed to the wild type A, resulting in the substitution of methionine with valine (M662V) in the protein sequence. Resistance gene prediction with the SVM for the mutant variant gave a slight increase in the predicted resistance possibility, from 0.547104 to 0.54732196, an absolute increase of 0.00021796 (0.021796 percentage points), reflecting a marginal gain of function.

Development of PCR markers targeting genes associated with scab disease resistance
The second primer pair (with a reverse outer and forward inner primer) showed variation in the form of a single nucleotide polymorphism (SNP) associated with scab disease resistance (Fig. 7). Furthermore, the third primer pair (with a forward outer and reverse inner primer) revealed variation specifically in the resistant accession LOC0003 (MCM 2001), with a distinguishable dominant band among all accessions (Fig. 8).

Phenotyping scab disease on common bean
Scab and other fungal diseases are responsible for significant losses to common bean grain yield and quality worldwide. The recent outbreak of scab disease in Kenya is a threat to food security (Masheti 2019; Otsyula et al. 2020). The most sustainable and effective way to tackle scab disease is through the identification and development of scab-resistant common bean accessions. In pursuing resistance breeding, identifying scab-resistant germplasm is an important first step. Currently, many common bean accessions are available for exploration of genetic and phenotypic variation with respect to scab resistance (Otsyula et al. 2018). In this study, the 179 common bean accessions evaluated for resistance to scab disease showed significant differences among genotypes in the two locations, indicating variability for resistance to the disease. Scab disease severely affected the growth and yield of infected plants. It led to lesions on leaves, poor budding and flowering, pod distortion or mummification, leaf loss, and plant death. The onset of the disease was characterized by various symptoms, including lesions on leaves and pods.
The study revealed that 8.9% of the evaluated genotypes showed resistance to scab. The majority of the genotypes in the two agro-ecological zones in western Kenya showed symptoms that were expressed progressively as folding leaves, twisting stems, cork-like white lesions on stems and pods, mummified pods and death of the entire plant. The symptoms observed in this study are similar to those reported by Phillips (1994). The folding of the leaf could result from electrolyte leakage in the plant cells caused by the phytotoxic elsinochrome produced by Elsinoë spp., as described by Jiao et al. (2019). Fungal infections in plants can be deduced from the observed symptoms; electrolyte imbalance and toxicity usually result from the fungus feeding on the cell's nutrients and damaging the plant through destruction of the cell wall.
The diseased common bean plant tissues were investigated in the laboratory through cross-section microscopy for identification of the pathogen. Morphological features characteristic of Elsinoë of the Elsinoaceae family were observed in the form of asci containing ascospores in locules (Jayawardena et al. 2014). These sexual reproductive parts of the fungus were globose and were found localized within the plant cell, indicating intercellular existence of the fungus through cellular colonization to obtain food from host cells after causing the cell's death. These morphologies of the pathogen observed on infected plant tissue were characteristic of Elsinoë spp. (Fan et al. 2017; Jayawardena et al. 2014). The majority of Elsinoë spp. produce elsinochrome, which belongs to a class of secondary metabolites called perylenequinones. These are aromatic polyketides characterized by a highly conjugated pentacyclic core that confers potent light-induced bioactivities and unique photophysical properties, producing reactive singlet oxygen that causes cell damage and electrolyte leakage in the plant cells, thus making food available for the fungus (Hu et al. 2019; Jiao et al. 2019). Spectroscopic analysis of fungal toxin extracted from infected plant tissue revealed traces of elsinochrome at an absorbance of 470 nm, while the healthy plants had no significant absorbance at this wavelength (Jiao et al. 2019; Kuyama and Tamura 1957; Liao and Chung 2008). This suggests that the disease symptoms were due to the Elsinoë phaseoli pathogen causing disease through its virulence factor elsinochrome. To detect the presence of elsinochrome in the plant tissue, a serial extraction using ether and acetone was adopted, followed by detection with a spectrophotometric method (Banu and Cathrine 2015; Jiao et al. 2019; Kuyama and Tamura 1957). When the absorbance pattern of the crude extract was compared to that of perylenequinones such as elsinochrome, cercosporin and hypocrellin (Daub et al. 2013), the extract from the infected tissue had an absorbance similar to the perylenequinone core derivatives, indicating the presence of the light-activated elsinochrome (Hu et al. 2019; Jiao et al. 2019; Kuyama and Tamura 1957; Liao and Chung 2008). The detection of elsinochrome at 470 nm is evidence that the symptoms scored on the common bean plants in the field experiment were due to the elsinochrome-producing pathogen Elsinoë phaseoli.
A cluster analysis of the severity means for the two sites, using a Minkowski distance (a generalization of the Euclidean and Manhattan distances), revealed a clade containing only the resistant accessions. This clustering confirmed the phenotypic observations made in the sites by scouting the field and recording the accessions that had no scab disease symptoms and were resistant to scab disease. Common bean accessions grouped in this cluster were considered resistant and the remaining accessions were considered susceptible. The occurrence of several clusters (resistant, tolerant and susceptible) and the continuous distribution of disease scores suggest a quantitative nature of disease resistance (French et al. 2016; Poland et al. 2009). The accession Loc0003 (MCM 2001), which is locally known to carry the Bc3 gene and the I gene for resistance to bean common mosaic virus and bean common mosaic necrosis virus, was among the most resistant accessions (Ali 1950; Drijfhout 1978; Mukeshimana et al. 2005). This reaction was also observed in accessions ADP0551 (AFR 612), ADP0555 (BRB191) and ADP0211 (G 4780), which were resistant to scab disease. Two black-seeded accessions, ADP0030 (Rh.No 6) and ADP0214 (G 5087), showed resistance to scab disease and were clustered within the resistant cluster along with ADP0526 (Cal 143), ADP0020 (KIGOMA), ADP0717 (VTTT924/4-4), ADP0540 (AFR 708), ADP0529 (LYA-MUNGO 90), ADP0354 (G 22502), ADP0537 (AFR 619), ADP0636 (Montcalm), ADP0739 (UYOLE 03), and ADP0719 (NUA 59).
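For reference, the Minkowski distance of order p between two accessions can be sketched as below; with p = 1 it reduces to the Manhattan distance and with p = 2 to the Euclidean distance. The order p used in the study is not stated, so the default here is an assumption, and the function name and example values are hypothetical.

```python
def minkowski_distance(u, v, p=2.0):
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Hypothetical severity means at the two sites for two accessions.
accession_a = [0.0, 0.0]
accession_b = [3.0, 4.0]
manhattan = minkowski_distance(accession_a, accession_b, p=1)
euclidean = minkowski_distance(accession_a, accession_b, p=2)
```

A matrix of such pairwise distances between accessions is the input to the hierarchical (UPGMA) clustering that produced the dendrogram.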

Scab disease-resistant gene identification
Sixteen common bean varieties were considered to be resistant while the remaining 163 were considered to be susceptible in the phenotyping field experiment. To capture the progression of scab disease on common bean under natural infection, a scale of 1 to 3 was used, and the geometric mean across the three stage scores reflected a balanced, unbiased distribution of how the common beans reacted to scab disease throughout the growing period (Sokal and Rohlf 2012). A genomic prediction based on Best Linear Unbiased Predictions (BLUPs) estimated the severity scores from a fixed genetic effect and a random environment effect on the genotypes. The predicted BLUP values were used as covariates in a case versus control GWAS against the genotyping-by-sequencing SNP data (Cichy et al. 2015; Song et al. 2015). On synchronizing the phenotypic data with the genotypic SNP data, the un-genotyped common beans were filtered out. A population of 165 SNP-genotyped common bean varieties was used to measure the trait of scab disease resistance. The study was able to identify some potentially important genetic variants associated with this complex trait in common beans.
Table 3 Resistant genes prediction and annotation for the significant SNP and the suggestive SNPs associated with candidate genes for scab resistance. The Chromosome SNP position highlights the location of the open reading frame (ORF) for the gene in the prediction. 'NaN' indicates that the exact distance from the target SNP could not be determined; however, the distance is in Mb (mega bases) proximity to the target SNP. The Phaseolus vulgaris v2.1 genome was used for these annotations. The proteins were classified into domains based on their domain motifs, which also define their functionality.
The SNP S1_6571566 was significantly associated with scab disease resistance, with a p-value of 1.81E−06, a minor allele frequency of 0.084848 and an effect size of 0.455757. The SNP S1_5502835, also on chromosome one, had a high effect size of 0.382909, and the SNP S11_1967299 on chromosome eleven had an effect size of 0.179289. These SNPs are considered to have a large impact on the phenotype based on their moderately high effect sizes in the GWAS analysis. A larger effect size is more desirable, as it provides stronger evidence for the association between the genetic variant and the scab disease resistance trait (Bukszár and van den Oord 2010; Holland et al. 2016; Stringer et al. 2011). However, the effect size of genetic variants associated with complex traits such as disease resistance is usually small (Ingvarsson and Street 2010; Zhang et al. 2022). A common approach is to focus on variants with small effect sizes but high statistical significance, which is achieved with a large sample size and rigorous statistical analysis. The significance threshold for genome-wide association is often set at p < 5.0 × 10−8, which corresponds to a Bonferroni-corrected family-wise error rate of 0.05 for approximately one million independent tests. The quantile-quantile plot shows the absence of spurious associations due to population structure and familial relatedness, with one outlier, which is the significant SNP of the GWAS. The Manhattan plot for the GWAS showed one significant SNP associated with scab disease resistance, in the gene PHAVU_001G055900g on chromosome one. This single SNP had a p-value of 1.81E−06, suggesting a significant association with the scab disease resistance trait being tested. The next nearest SNP to this locus was S1_6231746 on chromosome one, in the gene PHAVU_001G054400g. This SNP was of interest in finding the scab disease resistance gene, since the two SNPs are separated by about 340 kbp in the genome. On chromosome one, S1_5502835, tagged on the intergenic region encoding receptor-like kinases, was of interest in this study due to its large effect size of 38%. The gene PHAVU_011G023900g on chromosome eleven, tagged by S11_1967299, was also investigated.
Fig. 8 EPL1 gene PCR amplification with ARMS PCR primers. (P): Primer pair 1, a combination of the outer forward primer and the outer reverse primer, identifies the presence of the EPL gene irrespective of any alternate allele at 111 bp. EPL primer pair 2, a combination of the forward inner primer and the reverse outer primer, identifies the wild-type allele at 77 bp by terminating the primer sequence on the SNP, causing a mismatch on the forward strand. EPL primer pair 3, a combination of the forward outer primer and the reverse inner primer, identifies the alternate allele by terminating the primer sequence on the SNP, causing a mismatch on the reverse strand. The accessions of resistant genotype are Loc0003 (MCM 2001), ADP0030 (Rh.No 6), ADP0719 (NUA 59) and ADP0537 (AFR 619). (Q): The Amplification Refractory Mutation System (ARMS) PCR primer design scheme for the EPL1 gene as a potential candidate which would enhance the gene-finding study.
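Two of the numbers quoted above can be checked with a few lines of arithmetic. The conventional genome-wide threshold of 5.0 × 10−8 is what a Bonferroni correction at 0.05 gives for about one million independent tests, and the gap between the two chromosome-one SNPs follows directly from their names. The sketch below assumes SNP names of the form chromosome_position, as used in this study; the helper names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def snp_distance_bp(snp_a, snp_b):
    """Distance in base pairs between two SNPs named like 'S1_6571566'."""
    chrom_a, pos_a = snp_a.split("_")
    chrom_b, pos_b = snp_b.split("_")
    if chrom_a != chrom_b:
        raise ValueError("SNPs are on different chromosomes")
    return abs(int(pos_a) - int(pos_b))


# Assumption: about one million independent tests, the usual convention.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
gap = snp_distance_bp("S1_6571566", "S1_6231746")

print(f"threshold: {genome_wide:.1e}")
print(f"gap: {gap} bp (about {round(gap / 1000)} kbp)")
```

With fewer markers than one million, the same function gives a less stringent threshold, which is why the number of SNPs actually tested matters when judging significance.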
The fixed and random model circulating probability unification (FarmCPU) model, which controls both false negatives and false positives at multiple loci (Kaler et al. 2020), was used for the association analysis. The model employed a combination of a Fixed Effect Model (FEM) and a Random Effect Model (REM), which were iteratively applied in 2 out of a potential 10 iterations. Kinship information was established by estimating associated markers within the REM to mitigate the risk of overfitting in the FEM (Liu et al. 2016). The FarmCPU model thus reduced the chances of false positives and false negatives: the FEM tested markers one at a time, with multiple associated markers as covariates to control false positives, while the REM defined the kinship. An alternative employed in other studies (Yoosefzadeh-Najafabadi et al. 2022; Zhou et al. 2019) is the integration of machine learning algorithms into GWAS to detect causative SNPs with less significant associations and to specify functional roles within minor QTLs, such as the plant homeodomain and adaptin N protein families on the common bean's chromosome 11 in this study. Here, the machine learning algorithm was used to predict the function of specific proteins linked to the discovered SNPs. The choice of the machine learning model influences the prediction accuracy that can be achieved. Wang et al. (2021) developed a machine learning algorithm based on a Support Vector Machine that achieved an overall best prediction accuracy of 93%. The prediction algorithm was based on numerical feature encoding, feature extraction and feature selection. Other prediction algorithms have been in existence and were based on resistance motif features such as the NBS-LRR motif, receptor-like kinases (RLKs), receptor-like proteins (RLPs), the TIR motif and the lectin domain motifs. That approach would limit the discovery of novel scab disease resistance proteins; thus, a more generalized method that uses the k-spaced amino acid sequence feature was a better approach. The support vector machine algorithm of prPred predicted the R-proteins for the proteins linked to the significant SNP on chromosome one and other SNPs of interest, with a prediction accuracy of 93% between its training and test datasets. The highest prediction score was 0.99 on chromosome eleven; however, the SNP was not above the significance threshold and had a small effect of 0.24817 on the phenotype. This was followed by 0.57 for S11_7240334 on chromosome eleven. The significant SNP S1_6571566 on chromosome one was tagged with an R-protein prediction of 0.55 and had an effect size of 0.455757. An increase in sample size would increase the significance of the SNPs in the study (Uffelmann et al. 2021); thus, a close look into the genes linked to the SNPs informed on the role of each SNP locus as a disease resistance locus.
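To illustrate what a k-spaced amino acid feature looks like, the sketch below counts residue pairs separated by k positions and normalizes them into a 400-element vector. This is a generic illustration of the encoding idea, not the prPred implementation; the sequence is hypothetical and the function name is an assumption made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def kspaced_pair_composition(sequence, k):
    """Frequencies of residue pairs separated by k positions (400 features)."""
    n_windows = len(sequence) - k - 1
    if n_windows <= 0:
        raise ValueError("sequence too short for this k")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n_windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[p] / n_windows for p in PAIRS]


toy_protein = "MKVLAAGIVMKV"  # hypothetical sequence, not a bean protein
features = kspaced_pair_composition(toy_protein, k=1)
print(len(features), "features; M.V frequency:", features[PAIRS.index("MV")])
```

Because the encoding depends only on residue spacing and not on known resistance motifs, a classifier trained on such vectors is not restricted to proteins that carry NBS-LRR, RLK, RLP or TIR domains.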
Disease resistance in plants is a complex trait whose modulation and expression involve both genetics and epigenetics. It is usually driven by a myriad of genes and protein domain complexes, among which Polycomb group (PcG) encoding genes are recognized for playing a major role in epigenetic regulation, acting as factors in histone modifications. Polycomb repressive complex 1 (PRC1) and PRC2 are the major complexes composed of PcG proteins in plants. PRC2 catalyzes the trimethylation of histone H3 at lysine 27 (H3K27me3), while PRC1, whose catalytic core consists of the E3 ubiquitin ligase RING1 and PCGF, ubiquitinates histone H2A at H2AK119. The catalytic activities of PRC1 and PRC2 vary in distinct cell types, particularly in plants. These processes alter DNA accessibility, representing one of the mechanisms involved in gene silencing by PcG proteins. They are crucial in mediating chromatin modification and regulating developmental aspects in defence and disease resistance, impacting the extent of repression activities (Bannister and Kouzarides 2011; Cochran 2017; Derkacheva and Hennig 2013). Apart from their involvement in developmental regulation, PcG proteins play diverse roles in epigenetic regulation, particularly in response to biotic stresses in plants, impacting plant defences and disease resistance (Hennig and Derkacheva 2009; Holec and Berger 2011; Wu et al. 2022).
The enhancer of polycomb like 1 (EPL1), a member of the PcG gene family, is an epigenetic factor that has been implicated in histone modification through chromatin remodelling. It regulates the expression of other genes by altering histones at specific DNA sequences and modulating the activity of transcription factors, facilitating epigenetic responses to plant stresses such as disease, drought, and toxin exposure (Spitz and Furlong 2012). It plays a pivotal role in the epigenetic regulation of gene expression under environmental stresses by regulating the expression of genes involved in the plant's defence mechanisms (Kleinmanns and Schubert 2014; Barozai and Aziz 2018; Springer et al. 2002; Stankunas et al. 1998; Xie and Duan 2023). The SNP S1_6571566 on chromosome one, which was significantly associated with scab disease in common bean, is tagged on the EPL1 encoding gene (Table 3), exhibited a 46% effect size (Table 2) and is anchored to a resistant gene prediction of 55%. The gene is therefore implicated in common bean defence against scab disease, which is activated through the toxicity of elsinochrome, the virulence factor of Elsinoë phaseoli.
Epigenetic disease defence mechanisms have been extensively studied in various plants such as Triticum aestivum (wheat), Oryza sativa (rice), and Brassica napus (canola) (Alvarez et al. 2010; Derkacheva and Hennig 2013; Guo et al. 2019; Hoang et al. 2018; Kumar et al. 2020; Shilpa et al. 2022; Tirnaz et al. 2020), enhancing our understanding of how plants employ epigenetic regulation for defence against biotic and abiotic stressors. The Epc-N (enhancer of polycomb N terminus) domain, found in the PFP1 protein, is linked to Rhynchosporium commune pathogenicity in barley, which suggests a potential involvement of the EPL1 protein domain in disease resistance. This domain is associated with chromatin remodelling proteins, particularly those in histone acetyltransferase (HAT) complexes (Loreto Espinosa-Cores et al. 2020; Searle and Pillus 2017). The Epc-N domain's role in protein-protein interactions may contribute to the regulation of gene expression related to disease resistance in plants, necessitating further studies in the context of plant-pathogen interactions (Siersleben et al. 2014). Notably, there are more distinct domain types involved in plant-pathogen interaction, such as receptor-like kinases (RLKs) and receptor-like proteins (RLPs) involved in innate plant immunity, nucleotide-binding site leucine-rich repeat (NBS-LRR) domains, and the plant homeodomain (PHD) finger (Marone et al. 2013; Tang et al. 2017). These genes play a crucial role in plant defence against plant pathogens and have been identified as disease resistance genes (Garzón et al. 2013; Jiang et al. 2022). These domains were of interest in this study as they are associated with suggestive SNPs with high effect sizes in scab disease: S1_5502835 with an effect size of 38%, S11_38448497 with an effect size of 24%, and S11_1967299 with an effect size of 18%. The resistant protein prediction and annotation identified these domains as receptor-like kinase (RLK) and receptor-like protein (RLP) encoding regions, with a 75% resistance possibility, and NB-ARC, with a 99% resistance protein possibility. The PHD finger, with a resistant protein possibility of 76%, is also involved in plant defence against biotic and abiotic stressors (Guk et al. 2022).
In common beans, disease resistance involves a multifaceted interplay among various factors, creating a complex phenomenon. Epigenetic interactions between genes play a crucial role in conferring resistance to diverse plant pathogens, even as these pathogens continuously evolve to counter plant defence mechanisms (Duffy et al. 2003; Spoel et al. 2007). Recent studies have uncovered the involvement of this complex in regulating various biological processes in plants beyond disease resistance (Bu et al. 2014; Larese et al. 2012; Peng et al. 2018; Umezawa et al. 2013). The presence of domain complexes linked to disease resistance motifs underscores the intricate molecular mechanisms at play. The EPL1 gene, which has a predicted resistance possibility of approximately 55%, is predicted to be involved in disease resistance, indicating the role it plays as a disease resistance motif within its domains. Other polycomb-like genes that have been shown to play a role in disease resistance in plants include CURLY LEAF and MEDEA. These genes are often associated with the immune response of the plant and help to protect it from pathogens. Mutations in E(Pc), also known as EPL1 genes, result in homeotic phenotypes (Goodrich et al. 1997; Roy et al. 2018). Mutagenesis studies of histone modification factors have demonstrated their ability to convert factors and alter the degree of methylation, serving as mono-, di- or trimethyltransferases (Bannister and Kouzarides 2011; Guo et al. 2022). The mutation caused by the SNP on EPL1, which changes a methionine to a valine at position 662 of the protein sequence, can have a significant impact on gene regulation and expression. This change in the amino acid sequence in turn altered the predicted resistance possibility of the protein sequence by a small margin of 0.00021796. The marginal gain of function of the EPL1 gene was reflected in some common bean accessions with resistance to scab disease, as discussed later, where scab disease resistant accessions were tagged with the alternate allele that resulted in the amino acid substitution. Enhancers of polycomb play a crucial role in gene regulation. Mutations in enhancer genes can alter the DNA sequence and affect the binding of enhancer proteins to their target sequences, leading to changes in the expression of nearby genes (Jores et al. 2020). In some cases, a mutation in an Epc encoding gene can cause it to lose its function and result in the suppression of gene expression (Matsui et al. 2017).
As our understanding deepens, unravelling the regulatory roles of these complexes may provide valuable insights into enhancing plant disease resistance, particularly as the complex has recently been shown to be involved in the regulation of a variety of plant biological processes (Bu et al. 2014; Larese et al. 2012; Peng et al. 2018; Umezawa et al. 2013).
The SNP S1_6231746 on chromosome one is of particular interest due to its association with the gene encoding the ABC transporter (ATP-binding cassette) protein. This gene has a predicted resistance involvement estimated at 64%. The ABC transporter is a cell membrane pump that was identified to possess a disease resistance mechanism in Trichoderma spp. This mechanism involves shielding against xenobiotic stresses associated with mycotoxins, leading to its upregulation in the presence of mycotoxins. In wheat, the protein domain has been linked to the disease resistance genes Lr34 and Lr67, which have been shown to confer resistance to multiple biotrophic diseases such as rust and powdery mildew through mechanisms that include the inhibition of hexose transport (Krattinger et al. 2009; Moore et al. 2015). This is an important protein domain in plants due to its role in detoxification processes related to microbial toxins in plant cells (Kang et al. 2011; Martinoia et al. 1993; Ruocco et al. 2009).
The DDT and PHD finger protein domains of the gene PHAVU_011G023900g on chromosome 11 are directly involved in plant stress tolerance and have been shown to play a role in plant disease resistance (Waziri et al. 2020). They help regulate plant immune responses by controlling the expression of defence-related genes and activating plant immunity pathways in response to pathogen attack. The expression of certain PHD genes can also be induced by pathogen-associated molecular patterns (PAMPs), which are recognized by the plant and trigger an immune response (Lai et al. 2020; Pang et al. 2022). Additionally, PHD genes can act as transcriptional regulators, modulating the expression of other genes that are involved in disease resistance (Guk et al. 2022; Wei et al. 2009). The AP2/ERF domain-containing transcription factor AP2-1 (ADAPTIN N), tagged to the SNP S11_7240334 on chromosome eleven, plays a crucial role in the regulation of various developmental processes and stress responses in plants. Jisha et al. (2015) showed that the expression of AP2-1 is upregulated in response to various environmental stress conditions, suggesting that it plays an important role in the plant's ability to adapt to biotic and abiotic stress conditions. Additionally, genetic analysis has demonstrated that AP2-1 is involved in the regulation of stress-responsive genes, including those involved in drought tolerance and salt tolerance (Gu et al. 2017; Xie et al. 2022).
The genes associated with the significant SNP, the EPL1 and the ABC transporter genes, are found in the same neighbourhood as other resistance genes such as the RPP4 gene encoding resistance to rust disease (Meyer et al. 2009). Mapping of common bean disease resistance was conducted by Garzon and Blair (2014), who created an SSR bean panel and found that the largest number of markers linked to resistance gene homologues in common beans were on chromosomes one, six, four, and eleven, with chromosome one recording the highest number. Camilo López et al. (2003) also identified and mapped several RFLP markers of resistance gene homologues to chromosome one of the common bean, with resistance to anthracnose, angular leaf spot (ALS), and Bean golden yellow mosaic virus (BGYMV). In common beans, the subtelomeres are known to host clusters of resistance genes (Chen et al. 2018). Anthracnose, ALS, powdery mildew and rust resistance genes are mapped on chromosome one of the common bean, particularly at the Co-AC locus in a 631 kbp genomic region in the subtelomeric region of chromosome Pv01 (Gílio et al. 2020). These R genes are usually tightly linked with other R genes in these cluster regions. In our study, the SNPs associated with scab disease resistance were mapped in the lower subtelomeric region of the common bean chromosome, and the resistance gene homologue encoding the ABC transporter, which has been linked to rust and powdery mildew resistance in wheat, was also found within this region. The identification of EPL1, the PHD finger, AP2-1 and the ABC transporter as genes associated with scab disease resistance is consistent with earlier studies showing that scab causes disease through the production of the phytotoxin elsinochrome (Chung 2011; Liao and Chung 2008). The target of an enhancer is typically determined by performing functional assays such as ChIP-seq or chromatin conformation capture experiments, or by using predictive bioinformatics algorithms (Furey 2012; Schmidt et al. 2020). Thus, the roles of EPL1, which has been identified as having a significant association with scab disease resistance in common beans, can be further defined through bioinformatics investigations.

Development of PCR markers
The visualization of SNP markers by allele-specific ARMS PCR has shown promise in the genotyping of plants for disease resistance and has been applied as a novel approach in plant breeding and genetic research in barley (Chiapparino et al. 2004). This technique has also been effective in the detection and identification of xanthomonads associated with pistachio dieback in Australia (Marefat et al. 2006). Additionally, the tetra-primer ARMS PCR method has been used for rapid detection and characterization of Plasmopara viticola phenotypes resistant to carboxylic acid amide fungicides (Zhang et al. 2017). These studies demonstrate the versatility of the ARMS PCR method and its applicability to the development of an effective marker for scab disease resistance genes. Although the Kompetitive allele-specific PCR (KASP) method is considered superior in terms of cost and time, its implementation can be hindered by the lack of appropriate tools and equipment in many laboratories in developing countries. Therefore, an adaptable technology such as ARMS PCR provides a feasible alternative for researchers.
Two allele-specific primer pairs and one non-allele specific pair were designed to target markers for the gene associated with scab disease resistance and markers in the locus of interest for scab disease resistance.The ARMS primers were designed using several criteria to ensure their specificity and avoid unwanted binding.
The non-allele-specific outer primer pair acted as a housekeeping marker for the overall open reading frame of the gene. The inner primers were designed to target the wild-type and the alternate allele of the SNP in a mismatched design.
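The inner-primer idea described above can be sketched in a few lines: each allele-specific primer ends with its 3' base on the SNP, one on the forward strand for the wild-type allele and one on the reverse strand for the alternate allele. The template, SNP position, alleles and primer length below are hypothetical and are not the EPL1 primers used in this study; the extra destabilizing mismatch that ARMS designs often place near the 3' end is omitted for brevity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def inner_primers(template, snp_index, ref, alt, length=20):
    """Return (forward_inner, reverse_inner), both ending 3' on the SNP.

    forward_inner matches the ref (wild-type) allele on the forward strand;
    reverse_inner matches the alt allele on the reverse strand.
    """
    if template[snp_index] != ref:
        raise ValueError("template does not carry the ref allele at snp_index")
    if snp_index - length + 1 < 0 or snp_index + length > len(template):
        raise ValueError("SNP is too close to the template ends")
    forward_inner = template[snp_index - length + 1:snp_index] + ref
    downstream = alt + template[snp_index + 1:snp_index + length]
    return forward_inner, reverse_complement(downstream)


# Hypothetical 21-base template with an A>G SNP at index 10.
template = "ACGTACGTAC" + "A" + "TTGACCGGTA"
fwd, rev = inner_primers(template, snp_index=10, ref="A", alt="G", length=8)
print("forward inner (wild type):", fwd)
print("reverse inner (alternate):", rev)
```

Pairing the forward inner primer with the reverse outer primer, and the reverse inner primer with the forward outer primer, gives allele-specific bands of different sizes, while the two outer primers give the control band.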
The EPL1 gene, which was found to be associated with scab disease resistance, was validated with the designed primers on the common bean accessions selected as the most resistant and the most susceptible. The polymorphism observed indicated that the primer targeted the gene mutation in the resistant accessions, which showed clear polymorphism against the other accessions used. Screening by amplification with the EPL1 gene marker suggested that the marker could identify some of the common bean accessions resistant to scab disease. The power of detection did not extend to other resistant phenotypes, including ADP0354, ADP0526, ADP0551, ADP0717, ADP0529, ADP0739, ADP0211 and ADP0054. Taken together, this highlights the presence of other sources of resistance in the germplasm collection as well as the limitation of this marker in detecting them, possibly reflecting the phylogenetic divergence of the common bean germplasm in the collection. The reverse inner primer pair was the most distinguishing primer pair, identifying the resistant phenotype through the genetic marker of the EPL1 gene in common beans.
The use of Allele-specific ARMS PCR can provide an effective and adaptable alternative to KASP PCR for genotyping common beans for scab resistance and the development of an effective marker for scab disease resistance genes.The designed ARMS primers showed promising results in targeting the EPL1 gene associated with scab disease resistance in common beans, and further studies can be conducted to validate its use in marker-assisted selection for breeding research.

Conclusions
The results from this study indicated that there was significant phenotypic variability for scab disease reaction on common bean accessions grown under natural infection of Elsinoë phaseoli in western Kenya. This led to the identification of scab disease resistant common bean accessions that could serve as a potential source of resistant genes for genetic improvement breeding programs for scab disease resistance.
Novel genes associated with scab disease resistance were successfully identified on chromosome one of common bean (Phaseolus vulgaris): the ABC2 and EPL1 genes, which are involved in the detoxification of microbial toxins in plant cells and in systemic resistance of common bean against scab caused by Elsinoë phaseoli, respectively. The genes identified could have a significant impact in aiding the breeding and crop improvement effort of common beans for resistance to scab disease and the discovery of more resistance genes. The GWAS approach was successfully used to identify SNPs and putative genes associated with scab resistance using a diverse set of common bean accessions.

Fig. 1 Scab disease symptom progression in common bean. A Healthy common bean plant. (Aα and Aβ): Scab-infected plant after three weeks with symptoms of corky wart-like white lesions on leaves and stem. B Scab-infected common bean at five weeks after sowing. (Bα and Bβ): Folding of the leaf at the midrib and twisting of the stem. C Dead infected common bean plant. (Cα and Cβ): Plant death after lesions coalesce over the entire common bean plant after eight weeks, corky wart-like lesions on the pod and stem of a common bean plant and mummified pods due to scab infection

Fig. 2 Fig. 3
Fig. 2 Microscopy images of cross-sections of plant tissues infected with Elsinoë phaseoli. A A dense canker caused by dead stem tissues. B The cankers merge with pseudoparenchymatic tissues, where the acervuli arise from dead plant cells colonized by Elsinoë phaseoli. C Elsinoë phaseoli asci containing ascospores within the scab lesion on infected tissue. D The Elsinoë phaseoli pathogen on PDA media after thirty days of growth. E Reverse plate of Elsinoë phaseoli culture on PDA media. F The ellipsoid microconidia of Elsinoë phaseoli at ×400 magnification

Fig. 4 Distribution of the severity scores across the growth stages and across the two sites (Butonge and Kakamega)

Fig. 5 UPGMA clustering by Minkowski distance representing different clusters of common bean reactions to scab disease. Note: Genotypes deemed to be resistant were clustered together in the same clade, while the local accession Loc0003 was also clustered with resistant genotypes

Fig. 6 Quantile-quantile (QQ) plot of p-values. The Y-axis is the observed negative base 10 logarithm of the p-values, and the X-axis is the expected negative base 10 logarithm of the p-values under the assumption that the p-values follow a uniform [0,1] distribution. One outlier lies farthest from the null-hypothesis line of no association between the SNP and the trait

Table 1 BLUPs and scab disease severity in common bean genotypes across locations. MS = Mean severity, T/Ha = Tons per Hectare, * = No yield data, UM = Upper Midland zone (Location = 7QH8 + WVV), LM = Lower Midland zone (Location = PF78 + 35). Mean score (MS) values with similar letters in the same column are not significantly different. ** indicates the accessions were not genotyped. V.

Table 2 GWAS results for scab disease resistance in common beans using a FarmCPU model. The association table includes information on the SNP (SNP), its chromosome (Chromosome), position (Position), p-value (P.value), minor allele frequency (maf), number of observations (nobs), and false discovery rate (FDR) adjusted p-values. The table also provides the effect of each SNP on scab disease resistance in common beans. The rows display the results for each SNP above the minor allele frequency threshold. The SNPs are sorted by their p-values from smallest to largest
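For readers who want to see how FDR-adjusted p-values of the kind reported in Table 2 are commonly obtained, the sketch below implements the Benjamini-Hochberg adjustment. Whether the study used this exact procedure is an assumption, and the p-values in the example are hypothetical rather than taken from the table.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four SNPs.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"p = {p:.3f}  FDR-adjusted = {q:.3f}")
```

A SNP is then called significant at a chosen FDR level when its adjusted value falls below that level, which is less conservative than a Bonferroni correction over the same set of tests.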