Jason D. Gillman1, Ashley Tetlow2, Katherine Hagely2, Jeffery G. Boersma3, Andrea Cardinal4

Identification of the molecular genetic basis of the low palmitic acid seed oil trait in soybean mutant line RG3 and association analysis of molecular markers with elevated seed stearic acid and reduced seed palmitic acid

Jason D. Gillman1, Ashley Tetlow2, Katherine Hagely2, Jeffery G. Boersma3, Andrea Cardinal4, Istvan Rajcan3, and Kristin Bilyeu1*

1 USDA-ARS, Plant Genetics Research Unit, 110 Waters Hall, Univ. of Missouri, Columbia, MO 65211;

2University of Missouri, Division of Plant Sciences, 110 Waters Hall, Columbia, MO 65211;

3University of Guelph, Department of Plant Agriculture, 50 Stone Rd. E., Guelph, Ontario, Canada.

4North Carolina State University, Department of Crop Science, 1244 Williams Hall, Raleigh, NC 27695-7620

*Corresponding author ()

Abbreviations used:

KASIII, beta-ketoacyl-[acyl-carrier-protein] synthase III; SACPD-C, Stearoyl-acyl carrier protein desaturase isoform C; FATB1a, Fatty Acyl-ACP thioesterase B isoform 1a; RIL, Recombinant Inbred Line

Keywords: soybean, oil improvement, palmitic acid, stearic acid, mutagenesis

Abstract:

The fatty acid composition of vegetable oil is becoming increasingly critical for the ultimate functionality and utilization in foods and industrial products. Partial chemical hydrogenation of soybean oil increases oxidative stability and shelf life but also results in the introduction of trans fats as an unavoidable byproduct. Due to mandatory labeling of consumer products containing trans fats, conventional soybean oil has lost the ability to deliver the most appropriate economical functionality and oxidative stability, particularly for baking applications. Genetic improvement of the fatty acid profile of soybean oil is one method to meet these new requirements for oil feedstocks. In this report, we characterize three mutant genetic loci controlling the saturated fatty acid content of soybean oil: two genes additively reduce palmitic acid content (fap1 and fap3-ug), and one gene independently elevates stearic acid content (fas). We identified a new null allele of fap3-ug/GmFATB1A (derived from line ELLP2) present in line RG3. The splicing defect mutation in a beta-ketoacyl-[acyl-carrier-protein] synthase III (KASIII) candidate gene located in the region mapped to fap1 derived originally from EMS mutant line C1726 (Cardinal et al. 2014) was also present in line RG3. We also utilized the elevated stearic acid line RG7, which has previously been shown to contain novel mutant fas/SACPD-C alleles encoding stearoyl-acyl carrier protein desaturase (Boersma et al. 2012). Molecular marker assays have been developed to track these causative mutations and understand their contributions to seed oil fatty acid profiles in a recombinant inbred line population segregating for fap1, fap3-ug, and fas alleles.

Introduction

Soybean seed oil is the most widely utilized edible oil consumed in the United States (~66% of total edible fats, compiled from USDA statistics). Conventional soybean oil consists primarily of triacylglycerols, which contain five principal fatty acid species: palmitic acid (C16:0, ~100 g kg-1), stearic acid (C18:0, ~40 g kg-1), oleic acid (C18:1, ~220 g kg-1), linoleic acid (C18:2, ~540 g kg-1) and linolenic acid (C18:3, ~100 g kg-1) (Wilson, 2004). The majority of soybean oil is used for salad/cooking oil and frying/baking, representing ~53% and ~21% of soybean oil utilization, respectively (compiled from USDA statistics, accessed online 11/15/2013). However, the greatest dietary source of trans fats (before mandatory labeling) was baked goods containing partially hydrogenated vegetable oils (United States Department of Health and Human Services, 2005), and liquid oils are not ideal for these applications. Alternatives to hydrogenation include the use of palmitic acid-rich tropical oils. However, dietary consumption of most saturated fats, such as palmitic acid from tropical oils, elevates low density lipoprotein (LDL) cholesterol levels in human blood plasma (Khosla and Hayes, 1993). Elevated LDL cholesterol levels are directly correlated with increased risk of coronary heart disease (Angelantonio et al., 2009). The American Heart Association recommends that individuals limit their daily intake of saturated fat to <7% of total daily calories (accessed 11/15/2013). Although stearic acid is a fully saturated fatty acid, it is considered to be "heart health neutral" because stearic acid intake does not raise LDL concentrations in blood serum (Yu et al., 1995), and the replacement of saturated fats in controlled diets with stearic acid has been demonstrated to have positive effects on LDL cholesterol levels in blood serum (Hunter et al., 2010).

These findings have generated two complementary goals in soybean oil breeding: the reduction of palmitic acid content for coronary heart health, and the elevation of stearic acid content for increased oxidative stability and utility for baked goods. The most successful strategy to enhance soybean oil composition has involved the development of mutant lines via Ethyl Methane Sulphonate (EMS) treatment of seeds (recently reviewed in Fehr, 2007 and Gillman and Bilyeu, 2012). This method has yielded numerous mutant lines with altered fatty acid compositions, and induced mutants with either lowered palmitic acid or increased stearic acid have been generated by several independent researchers (recently reviewed in Gillman and Bilyeu, 2012).

Genetic studies have definitively identified at least two independent mutant loci, fap1 and fap3, which result in reduction of palmitic acid content to ~80-90 g kg-1 or ~70-80 g kg-1, respectively (Erickson et al., 1988; Schnebly et al., 1994). A third locus, sop1, appears to be non-allelic to fap1 (Kinoshita et al., 1998), though the allelic status of sop1 in regard to fap3 remains unclear. By combining fap1 and fap3 mutations, along with unidentified minor modifier genes, lines with <40 g kg-1 palmitic acid content have been developed (Fehr, 2007). The fap3 locus has been shown to result from loss-of-function mutations in a 16:0-ACP thioesterase gene (FATB1a/Glyma05g08060) (Cardinal et al., 2007; De Vries et al., 2011). Recently, the reduced palmitic acid phenotype due to fap1 has been shown to map to a locus on the distal end of soybean LG K/Gm09 and is highly correlated with a splice site defect affecting a beta-ketoacyl-[acyl-carrier-protein] synthase III gene (KASIII/Glyma09g41380) (Cardinal et al., 2014).

The molecular basis for the elevated stearic acid trait in EMS-induced mutants RG7 and RG8 has also recently been determined, and is due to recessive mutations affecting the Stearoyl Acyl Carrier Protein Desaturase gene, isoform C (SACPD-C/Glyma14g27990, Boersma et al., 2012). When homozygous, loss of function mutations result in elevation of seed stearic acid levels similar to other previously identified recessive sacpd-c mutations, although not to the degree noted in line A6, which features a deletion of the entire SACPD-C gene (Zhang et al., 2008). RG7 possesses a nonsense mutation which results in a premature stop codon (W64*) in the SACPD-C transcript and a truncated protein, whereas RG8 was found to contain a missense mutation which results in the substitution of a leucine residue for an ancestrally invariant proline residue (P237L) (Boersma et al., 2012).

We characterized at the molecular level a previously developed recombinant inbred line (RIL) population from a cross between a low palmitic acid line RG3 (fap1, fap3-ug) and the high stearic acid line RG7 (fas). RG3 is a mutant line which features very low palmitic acid content (~45 g kg-1) (Primomo et al., 2002), due to transgressive segregation derived from a cross between two independent low palmitic acid EMS-derived mutant lines: C1726 (fap1, ~86 g kg-1 palmitic acid) (Erickson et al., 1988) and ELLP2 (fap3-ug, ~70 g kg-1 palmitic acid) (Primomo et al., 2002; Stosjin et al., 1998). RG7 (fas) is an EMS mutant derived from ‘Elgin 87’ found by phenotypic selection for elevated stearic acid content (Primomo et al., 2002). Thus, the three independent mutant loci affecting fatty acid profiles (fap1, fap3-ug, and fas) were expected to have segregated independently in this RIL population.

The objectives of this work were: 1) to determine the molecular genetic basis for the fap3-ug low palmitic acid trait in RG3 derived from ELLP2; 2) to develop efficient, perfect molecular marker assays for the relevant genes segregating in the RG3 x RG7 RIL population; and 3) to use the perfect molecular markers to quantify the phenotypic contributions and effects of specific mutant alleles in a cross between the low palmitic acid line RG3 and the elevated stearic acid line RG7.

Materials and Methods:

Fatty Acid Phenotypic Analysis

Fatty acid analysis was performed on individual seed chips as previously described (Beuselinck et al., 2006; Bilyeu et al., 2005). The remainder of each seed was frozen and ground with a mortar and pestle, and a portion was used for DNA isolation. This allowed each individual seed to be sorted into a genotypic category.

DNA isolation and Genomic Amplification

Genomic DNA was isolated from ~20-30 mg seed tissue with the DNeasy Plant Mini Kit (Qiagen, Inc., Valencia, CA) and used at ~5-50 ng per PCR amplification or SimpleProbe assay. Gene-specific primer pairs (Supplementary Table 1) were developed using the Primer3Plus software. Amplification primer pairs were designed to contain at least two gene-specific SNP differences when compared to homeologous sequence. All primers were compared by BLAST against the unmasked Glycine max Williams 82 genomic sequence with an E-value cutoff of 10.0 to ensure gene specificity. PCR amplification was performed using Ex Taq according to the manufacturer's recommendations (Takara, Otsu, Shiga, Japan) in a PTC-200 thermocycler (MJ Research/Bio-Rad, Hercules, CA), with the following conditions: an initial denaturation at 95°C for 5 minutes, followed by 40 cycles of 95°C for 30 seconds, 60°C for 30 seconds, and an extension step at 72°C for 1 minute per kilobase of target sequence. PCR products were run on 1% agarose gels to ensure appropriate size and purified using a QIAquick PCR purification kit (Qiagen). Following purification, products were Sanger sequenced at the DNA Core Facility at the University of Missouri-Columbia.
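The primer design criterion above (at least two gene-specific SNP differences relative to the homeologous sequence) can be illustrated with a minimal Python sketch. The two sequences and the function names are hypothetical placeholders, not the primers listed in Supplementary Table 1, and the comparison assumes the primer and the homeologous segment are already aligned without gaps.

```python
def count_mismatches(primer: str, homeolog_segment: str) -> int:
    """Count positions where an aligned primer differs from the homeolog."""
    if len(primer) != len(homeolog_segment):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(primer.upper(), homeolog_segment.upper())
    return sum(1 for a, b in pairs if a != b)


def is_gene_specific(primer: str, homeolog_segment: str,
                     min_differences: int = 2) -> bool:
    """Apply the 'at least two SNP differences' design rule."""
    return count_mismatches(primer, homeolog_segment) >= min_differences


# Hypothetical sequences, for illustration only.
target_primer = "ATGGCTTCAGTTCCAGCTACTT"
homeolog_site = "ATGGCTTCGGTTCCAGCCACTT"

print(count_mismatches(target_primer, homeolog_site))
print(is_gene_specific(target_primer, homeolog_site))
```

A check of this kind only screens candidate primers; specificity was confirmed in this work by BLAST searches and by gel electrophoresis of the PCR products.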

Sequence Evaluation

Sequencing traces were imported into ContigExpress (Invitrogen, Carlsbad, CA), trimmed, aligned, and manually evaluated for disagreements between EMS mutant contig sequences and both the ‘Williams 82’ (W82) reference (Schmutz et al., 2010) and the appropriate cultivars used for mutagenesis (‘Elgin 87’, ‘Century’). Putative single nucleotide polymorphisms (SNPs) were verified by at least two independent PCR reactions. Sequences were aligned using the AlignX software (Invitrogen).

Molecular Marker Development

In order to develop gene-specific primer pairs for use with a SimpleProbe, sequences corresponding to ~500 bp surrounding each mutation (SACPD-C/Glyma14g27990, FATB1a/Glyma05g08060 and KASIII/Glyma09g41380) were aligned with their appropriate homeologs, as identified using BLAST searches of the soybean genome. Alignments were built using the AlignX software (Invitrogen), and primers of 20-28 bp in size were manually designed with a Tm of ~60°C (Supplementary Table 1). PCR reactions contained template, buffer (40 mM Tricine-KOH (pH 8.0), 16 mM KCl, 3.5 mM MgCl2, 3.75 µg mL-1 BSA, 200 µM dNTPs), 10% (v/v) DMSO, 0.5 µM of each primer, and 0.2X Titanium Taq polymerase (BD Biosciences, Palo Alto, CA, USA). Genomic DNA was used at ~20-50 ng per PCR amplification. PCR products were analyzed by gel electrophoresis on 1% gels to ensure specific amplification, then purified, Sanger sequenced, and analyzed as described above.

Design of SimpleProbe Assays

SimpleProbe assays, based upon the dissociation kinetics of SimpleProbe oligonucleotides (Roche Applied Sciences, Indianapolis, IN), were designed using the Lightcycler Probe Design Software, version 1 (Roche Applied Sciences) to be exactly complementary to the 'Williams 82' reference sequence. SimpleProbes were purchased from Roche Applied Sciences.

SimpleProbe reactions each contained three primers, with SPC corresponding to the proprietary SimpleProbe quencher sequence (Supplementary Table 2).

SimpleProbe Assay Conditions

All SimpleProbe assay asymmetric PCR reactions contained DNA template, buffer (40 mM Tricine-KOH (pH 8.0), 16 mM KCl, 3.5 mM MgCl2, 3.75 µg mL-1 BSA, 200 µM dNTPs), 10% (v/v) DMSO, 0.5 µM of the primer corresponding to the amplified DNA strand complementary to the probe, 0.1 µM of the primer from the same sense strand as the SimpleProbe, 0.2 µM SimpleProbe and 0.2X Titanium Taq polymerase (BD Biosciences). SimpleProbe genotyping reactions were carried out using a Lightcycler 480 II (Roche Applied Science) with the following conditions: an initial denaturation at 95°C for 5 minutes, followed by 45 cycles of 95°C for 30 seconds, 60°C for 30 seconds, and 72°C for 30 seconds. A negative control was included to verify that no genomic contamination of stocks was present. Following asymmetric PCR, melting curve analysis was performed on the Lightcycler 480 II using the following conditions: reactions were heated to 95°C for 5 minutes, followed by a two minute hold at the lowest temperature to be evaluated by melting curve. Melting curve analysis was carried out with 10 readings collected per 1°C, and covered the following ranges: 45-70°C for SACPD-C/RG7, 50-75°C for KASIII/C1726, and 50-70°C for RG3/FATB1a.
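As a rough illustration of the assay settings above, the following Python sketch works out the primer excess used for asymmetric PCR and the number of fluorescence readings implied by each melting range. It assumes readings are spaced uniformly at exactly 10 per 1°C across the stated range; the variable and function names are illustrative, and the instrument software may count acquisitions differently.

```python
# Primer concentrations from the assay description, expressed in nM.
EXCESS_PRIMER_NM = 500    # primer for the strand complementary to the probe
LIMITING_PRIMER_NM = 100  # primer from the same sense strand as the probe

READINGS_PER_DEGREE = 10

# Melting ranges (low, high) in degrees Celsius, as stated in the text.
MELT_RANGES_C = {
    "SACPD-C/RG7": (45, 70),
    "KASIII/C1726": (50, 75),
    "RG3/FATB1a": (50, 70),
}


def readings_in_range(low_c: int, high_c: int,
                      per_degree: int = READINGS_PER_DEGREE) -> int:
    """Readings collected across a melt range, assuming uniform spacing."""
    return (high_c - low_c) * per_degree


primer_ratio = EXCESS_PRIMER_NM // LIMITING_PRIMER_NM
readings = {assay: readings_in_range(low, high)
            for assay, (low, high) in MELT_RANGES_C.items()}

print(f"excess:limiting primer ratio = {primer_ratio}:1")
for assay, count in readings.items():
    print(f"{assay}: {count} readings")
```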

RIL Population Development

A recombinant inbred line (RIL) population was developed at the University of Guelph from the cross between the low palmitic acid soybean line RG3 (fap1, fap3-ug) and the high stearic acid line RG7 (fas). RG3 is a mutant line that features very low palmitic acid content (~45 g kg-1) (Primomo et al., 2002), which was derived from a cross between two independent low palmitic acid EMS-derived mutant lines: C1726 (fap1, ~86 g kg-1 palmitic acid) (Erickson et al., 1988) and ELLP2 (fap3-ug, ~70 g kg-1 palmitic acid) (Primomo et al., 2002; Stosjin et al., 1998). RG7 (fas) is an EMS mutant derived from ‘Elgin 87’ found by phenotypic selection for elevated stearic acid content (Primomo et al., 2002). The cross was made in the growth room of the Crop Science building at the University of Guelph in 1998. The F1 and F2 populations were also grown in the growth room in 1999, after which they were advanced using single seed descent in the field at the Woodstock Research Station to F3 in 2000 and F4 in 2001. The F4:5 seeds, which had been kept in cold storage at the Elora Research Station, University of Guelph, were provided to the USDA-ARS group at Columbia, MO, to be used in this study.

Statistical Analysis

After genotyping one individual seed from each RIL, fatty acid data for the same seed were sorted into the eight possible homozygous genotypic combinations of FATB1a/fap3-ug, KASIII/fap1 and SACPD-C/fas. The small number of heterozygous RILs was excluded from our analysis. Means for each genotypic combination were compared by ANOVA/Tukey’s HSD test with a threshold of α=0.01, using the software package JMP version 9.
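The sorting step described above can be sketched in Python as follows. This is only an illustration of the grouping logic, not the JMP workflow used in the study: the seed records are made-up values, the field names and the "WT"/"mut"/"het" genotype labels are assumptions, and the ANOVA/Tukey's HSD comparison itself is not implemented here.

```python
from itertools import product
from statistics import mean

LOCI = ("FATB1a", "KASIII", "SACPD-C")


def homozygous_classes():
    """All combinations of homozygous WT or mutant calls at the three loci."""
    return list(product(("WT", "mut"), repeat=len(LOCI)))


def group_by_class(seeds, trait):
    """Sort trait values into genotypic classes, skipping heterozygotes."""
    groups = {cls: [] for cls in homozygous_classes()}
    for seed in seeds:
        genotype = tuple(seed[locus] for locus in LOCI)
        if "het" in genotype:
            continue  # heterozygous seeds are excluded from the analysis
        groups[genotype].append(seed[trait])
    return groups


# Made-up records for illustration; not data from Table 1.
seeds = [
    {"FATB1a": "WT", "KASIII": "WT", "SACPD-C": "WT", "palmitic": 120.0},
    {"FATB1a": "WT", "KASIII": "WT", "SACPD-C": "WT", "palmitic": 124.0},
    {"FATB1a": "mut", "KASIII": "mut", "SACPD-C": "WT", "palmitic": 44.0},
    {"FATB1a": "het", "KASIII": "WT", "SACPD-C": "WT", "palmitic": 95.0},
]

groups = group_by_class(seeds, "palmitic")
class_means = {cls: mean(vals) for cls, vals in groups.items() if vals}

print(len(homozygous_classes()), "homozygous classes")
for cls, value in class_means.items():
    print(dict(zip(LOCI, cls)), value)
```

With three biallelic loci there are 2 x 2 x 2 = 8 homozygous classes, which matches the eight genotypic combinations compared in the analysis.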

Results and Discussion

Analysis of the relative contribution of FATB1a and C1726 fap1 alleles to seed palmitic acid levels

The low palmitic acid phenotype of two mutant lines with independent fap3 mutations, N79-2077-12 and A22, has been determined to be due to lesions in the gene encoding FATB1a (Cardinal et al., 2007). We reasoned that the reduced palmitic acid phenotype in EMS mutant line ELLP2 could be due to an independent loss-of-function mutation affecting FATB1a. Sequencing of genomic DNA from RG3 revealed a nonsense mutation affecting FATB1a (A430T, relative to the start codon, within exon 1), which results in the conversion of residue 144 from an arginine to a premature stop codon (R144*, Figure 1). In order to track this novel fatb1a mutant allele, we developed a SimpleProbe molecular marker assay to distinguish the genotypes containing the R144* fatb1a allele (Figure 2).
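The correspondence between the nucleotide change (A430T) and the protein change (R144*) can be checked with a short Python sketch. It assumes position 430 is counted within the coding sequence, with the A of the start codon as position 1, and it uses the standard genetic code. The wild-type codon is not stated in the text; the sketch only lists which arginine codons would be consistent with the reported change.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
ARGININE_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def residue_and_offset(cds_position: int) -> tuple[int, int]:
    """Map a 1-based coding position to (1-based residue, 0-based codon offset)."""
    index = cds_position - 1
    return index // 3 + 1, index % 3


def substitute(codon: str, offset: int, new_base: str) -> str:
    """Return the codon with one base replaced."""
    return codon[:offset] + new_base + codon[offset + 1:]


residue, offset = residue_and_offset(430)

# Arginine codons carrying an A at the mutated position that become a
# stop codon when that A is replaced by T.
consistent_codons = sorted(
    codon for codon in ARGININE_CODONS
    if codon[offset] == "A" and substitute(codon, offset, "T") in STOP_CODONS
)

print(residue, offset)      # residue 144, first base of the codon
print(consistent_codons)    # ['AGA'], which A-to-T converts to TGA
```

Under these assumptions, position 430 is the first base of codon 144, and AGA to TGA is the only arginine-to-stop change that an A-to-T substitution at that base can produce.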

The C1726 fap1 allele has been mapped to the distal arm of LG K/Gm09, and a candidate gene mutation was identified in a beta-ketoacyl-[acyl-carrier-protein] synthase III gene (KASIII, Glyma09g41380, Cardinal et al., 2014). We determined that the same splice site mutation is present in line RG3 (G174A relative to the start codon, Figure 3). This mutation was absent in RG7, ‘Elgin 87’, ‘Century’ and ‘Williams 82’ (data not shown). Moreover, two additional independent low palmitic acid soybean breeding lines derived by phenotypic selection from crosses with C1726 also contained the same splice site mutation in Glyma09g41380 (germplasm lines SS03-2564, University of Missouri and M03-297033, University of Minnesota). Collectively, these results support the conclusion that the splice site mutation in Glyma09g41380 is causative for the fap1 reduction in seed palmitic acid content (Cardinal et al., 2014). We developed a SimpleProbe assay for the fap1 splice site mutation in Glyma09g41380/KASIII (Figures 2 and 3).

We had previously developed a SimpleProbe assay for the sacpd-c W64* nonsense mutation (Boersma et al., 2012) responsible for the elevated stearic acid phenotype from RG7 (Figure 2).

Knowledge of the molecular details of these three alleles and the molecular marker assays allowed us to assess and compare the effect and interaction of mutant alleles on fatty acid composition for the eight genotypic classes recovered for the three genes in the RIL population (Figure 4, full details in Table 1). For clarity, the very small number of residual heterozygous samples remaining in the RIL population was not included in the analysis. Wild type alleles encoding presumably functional enzymes are referred to as WT, while mutant alleles are designated by the origin line: rg3 for the fatb1a R144* alleles, rg7 for the SACPD-C W64* alleles, and C1726 for the fap1 alleles (Figure 4, Table 1). Because the three mutant loci (fap1, fap3, and fas) are present on different chromosomes (Gm09, Gm05, and Gm14, respectively), independent assortment was expected.

Analysis of the palmitic acid phenotype in the RIL population

The presence of the novel fatb1a allele from RG3 decreased palmitic acid from an average of 122±3 to 66±7 g kg-1 (Figure 4 and Table 1). The fap1 locus, as detected by the C1726 KASIII splice site defect marker, resulted in a slightly smaller decrease in palmitic acid, from an average of 122±3 to 86±2 g kg-1 (Figure 4 and Table 1). When both the RG3 mutant fatb1a R144* and the C1726 fap1 alleles were present, we noted an additive effect upon the palmitic acid reduction, resulting in 41±3 g kg-1 when associated with the RG7 sacpd-c W64* mutation, or 44±7 g kg-1 without the sacpd-c W64* mutation, but this difference was not statistically significant (Figure 4). The presence of the mutant sacpd-c W64* allele alone also resulted in a slight, but not statistically significant, decrease from 122±3 to 110±6 g kg-1 palmitic acid (Figure 4, Table 1).
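The size of each reduction reported above can be worked out with a few lines of Python. The sketch uses only the mean values quoted in the text and ignores the standard deviations, so it says nothing about statistical significance; the dictionary keys are informal labels, and the sum of the two single-allele reductions is included only as a point of comparison with the double-mutant class.

```python
# Mean palmitic acid content (g kg-1) as quoted in the text.
WILD_TYPE_MEAN = 122
REPORTED_MEANS = {
    "fatb1a R144*": 66,
    "fap1 (C1726)": 86,
    "fatb1a R144* + fap1": 44,  # class without the sacpd-c W64* allele
}


def reduction(class_mean: int, baseline: int = WILD_TYPE_MEAN) -> int:
    """Absolute reduction in palmitic acid relative to the wild-type class."""
    return baseline - class_mean


reductions = {label: reduction(value)
              for label, value in REPORTED_MEANS.items()}
percent_reductions = {label: round(100 * r / WILD_TYPE_MEAN, 1)
                      for label, r in reductions.items()}
sum_of_single_reductions = (reductions["fatb1a R144*"]
                            + reductions["fap1 (C1726)"])

for label in REPORTED_MEANS:
    print(f"{label}: -{reductions[label]} g kg-1 "
          f"({percent_reductions[label]}%)")
print("sum of single-allele reductions:", sum_of_single_reductions)
```

On these means, the double-mutant class shows a larger reduction than either single-mutant class, consistent with the two alleles acting together to lower palmitic acid.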

Analysis of the stearic acid phenotype in the RIL population

The presence of the RG7 sacpd-c W64* mutation resulted in an average elevation of stearic acid levels from 34±6 to 85±20 g kg-1 (Figure 4, Table 1). This was considerably lower than the previously reported level of ~120 g kg-1 for RG7 (Primomo et al., 2002). We noted the majority of sacpd-c W64* samples showed stearic acid levels of 78±9 g kg-1 (n=27), however five samples displayed higher levels (above 10 g kg-1 with a maximum of 128 g kg-1, Supplementary Figure 1). Although this data set is limited in size, it is suggestive of a two gene model (Dubeck et al. 1989; Boersma et al. 2012). Neither the fap1 locus from C1726, nor the fatb1a R144* allele from ELLP2 had a significant effect on stearic acid levels (Figure 4, Table 1). We also noted a negative correlation between increased stearic acid phenotype due to the presence of the sacpd-c W64* allele and oleic acid content. In contrast, there was no statistically significant difference in the levels of linolenic acid amongst any of the lines examined in this study (Figure 4, Table 1).