Studying equine genetic disease

Identifying equine disease genes and alleles

In 2006, the Broad Institute of Harvard and the Massachusetts Institute of Technology (MIT) sequenced the genome of the Thoroughbred mare “Twilight” and made the equine genome sequence publicly available early in 2007. Through the efforts of the Broad Institute and earlier work by the equine genetics community, over 90% of the approximately 2.7 billion base pairs of the equine genome were sequenced [1]. In 2018, an updated equine reference genome sequence was released that covers approximately 99.7% of those 2.7 billion base pairs [2]. Over the last decade, this equine sequence information has been used to develop powerful tools that allow researchers to study genetic diseases in the horse more effectively.

The study of equine genetic disease can be broken down into four basic steps which are outlined below:

  1. Define the phenotype(s) of the disease to be studied.
  2. Determine the genetic architecture of the disease.
  3. Find potential disease-causing alleles.
  4. Collect scientific evidence to demonstrate that this allele(s) is causing disease.


1. Define the phenotype(s) of the disease to be studied

The phenotype is the measurable trait(s) or characteristic(s) being studied and can be anything from disease status (affected vs. unaffected), to physical properties such as height, to clinical measures such as blood work values or muscle biopsy results. For example, in muscle disease, tying-up can be diagnosed by clinical signs (stiff gait, reluctance to move, pain on palpation of the muscles) and by measuring muscle enzymes in the bloodstream (creatine kinase [CK] and aspartate transaminase [AST]). The underlying cause of tying-up can then be diagnosed by muscle biopsy, muscle histopathology, and muscle biochemistry. Using this information, specific muscle disorders can be identified, such as type 1 polysaccharide storage myopathy (PSSM1), myofibrillar myopathy (MFM), or recurrent exertional rhabdomyolysis (RER), which are different muscle diseases that share some common features.

One of the crucial factors in performing genetic studies is to accurately establish criteria to diagnose an individual with the specific disease (“case” or affected horse), as well as criteria to define “controls” or unaffected horses. Case vs. control status is a binary (yes/no) phenotype, but phenotypes can also be measured as continuous traits, which vary across a continuum, for example height or muscle enzyme values such as the CK and AST mentioned above. For both binary and continuous traits, it is critical that the phenotypes measured accurately reflect the disease of interest and distinguish it from other diseases that may look similar (a “phenocopy”). In addition to defining disease and identifying individuals that are abnormal, it is equally important to define what constitutes a normal individual without disease. Comparing normal and abnormal individuals is required to draw conclusions about the cause of genetic disease.

2. Determine the genetic architecture of the disease

Genetic architecture, in general, refers to how a trait is inherited. There are four main components of genetic architecture that describe the allele(s) and the total genetic contribution to that trait:

  • The number of genes/alleles that contribute to the trait. Genetic diseases can be simple (monogenic, “one gene”) traits, in which a mutation in a single gene, inherited in a Mendelian pattern (dominant or recessive; autosomal or sex-linked), results in significant disease in the affected individual. Polygenic genetic diseases are caused by the effects of alleles at multiple genes and are often influenced by the environment. Although the inheritance and expression of these latter traits are complex, they can be shown to have a genetic component. We now know that most genetic diseases are actually multifactorial, or complex, meaning they are caused by a combination of inherited mutations in multiple genes acting together with environmental factors.
  • The effect size of the allele(s). Allele effect size is a confusing concept for most of us, so it is easiest to start with the simplest scenario: a fully penetrant Mendelian dominant or recessive disease. Penetrance can be thought of as the likelihood that an individual with a disease-causing genotype will actually develop the disease. In a fully penetrant dominant disease, all individuals with one or two copies of the disease allele will develop disease. In a fully penetrant recessive disease, all individuals with two copies of the disease allele will develop disease. With incomplete penetrance, not all individuals with the disease allele (dominant disease) or with two copies of the disease allele (recessive disease) will develop the disease phenotype. For example, in a recessive disease, 90% penetrance means that 90% of individuals with two copies of the disease allele will develop clinical disease. In single-gene diseases there is always a single allele of large effect that can cause disease without the contribution of alleles from other genes.
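To make penetrance concrete, the short Python sketch below enumerates the offspring of a single mating (a Punnett square) and scales the at-risk fraction by penetrance. It assumes simple Mendelian segregation of one gene; the allele labels “M” (disease allele) and “N” (normal allele) and the function name expected_affected are hypothetical, chosen only for illustration.

```python
from itertools import product


def expected_affected(sire, dam, mode, penetrance=1.0):
    """Expected fraction of foals from one mating that develop disease.

    sire, dam: two-letter genotypes such as "MN", where "M" marks the
        disease allele and "N" the normal allele (hypothetical labels).
    mode: "dominant" (one or two copies of M put a foal at risk) or
        "recessive" (two copies of M are required).
    penetrance: probability that an at-risk foal actually develops disease.
    """
    copies_needed = {"dominant": 1, "recessive": 2}[mode]
    # Each parent passes on one of its two alleles, giving four equally
    # likely allele combinations (the cells of a Punnett square).
    foals = [a + b for a, b in product(sire, dam)]
    at_risk = sum(1 for foal in foals if foal.count("M") >= copies_needed)
    return at_risk / len(foals) * penetrance


print(expected_affected("MN", "MN", "recessive"))       # 0.25
print(expected_affected("MN", "MN", "recessive", 0.9))  # 0.225
print(expected_affected("MN", "NN", "dominant"))        # 0.5
```

With two carriers of a recessive allele, one quarter of foals are expected to inherit two copies, so 90% penetrance lowers the expected affected fraction from 25% to 22.5%.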

In polygenic (i.e., many-gene) diseases, combinations of alleles in different genes each contribute to the risk that an individual will develop disease. The effects of alleles in different genes sum together, so an individual’s total risk results from the contributions of all of its alleles. Polygenic disease risk is often depicted as an equation:

Risk of disease = effect of risk allele(s) in gene 1 + effect of risk allele(s) in gene 2 + effect of risk allele(s) in gene 3 + …

Scenarios in which multiple genes and alleles contribute to disease range from as few as two genes, where one is the primary disease-causing gene with its causative allele and an allele in the second gene modifies the disease in some way (e.g., an allele in the RYR1 gene modifying the PSSM1 phenotype caused by an allele in the GYS1 gene), to many hundreds of genes and alleles contributing to the disease phenotype, as is the case with recurrent exertional rhabdomyolysis (RER). In polygenic diseases, the alleles that increase an animal’s risk of developing disease are often referred to as “risk alleles.”
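The additive risk equation can be written as a small Python function. This is a minimal sketch assuming purely additive effects (no interactions between genes) and a diploid horse carrying 0, 1 or 2 copies of each risk allele; the gene names, effect sizes and the function name polygenic_risk are hypothetical placeholders, not estimates for RER or any other real disease.

```python
# Hypothetical per-copy effect sizes for three risk alleles (illustrative only).
EFFECTS = {"gene1": 0.30, "gene2": 0.15, "gene3": 0.05}


def polygenic_risk(risk_allele_counts, effects=EFFECTS):
    """Sum the effect of every risk-allele copy across genes.

    risk_allele_counts maps gene name -> number of risk-allele copies
    (0, 1 or 2). Genes missing from the mapping count as 0 copies.
    """
    total = 0.0
    for gene, effect in effects.items():
        copies = risk_allele_counts.get(gene, 0)
        if copies not in (0, 1, 2):
            raise ValueError(f"{gene}: expected 0, 1 or 2 copies, got {copies}")
        total += copies * effect
    return total


horse_a = {"gene1": 2, "gene2": 1}              # no risk allele at gene3
horse_b = {"gene1": 0, "gene2": 2, "gene3": 2}
print(round(polygenic_risk(horse_a), 2))  # 0.75
print(round(polygenic_risk(horse_b), 2))  # 0.4
```

Under this additive model no single allele is necessary for a raised score; risk accumulates across genes.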

  • The population frequency of the allele(s). Alleles that cause monogenic disease are typically rare in the population for two reasons. First, an animal with that particular disease may not survive long enough to produce offspring (or will produce offspring at a lower rate than animals without the disease). Second, controlled breeding programs can be designed to breed away from potential parents known or suspected to carry these monogenic disease alleles. In both scenarios the underlying concept is genetic selection. In the former scenario, the diseased animal is less “fit” and thus less likely to produce offspring (natural selection); in the latter, humans deem the horse less desirable because it has the disease or carries a disease allele and choose not to breed it (artificial selection). In both cases, selective pressure over generations makes the disease allele less common in the population.
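The effect of selection on a monogenic disease allele can be illustrated with a classic population-genetics calculation. The Python sketch below assumes a fully penetrant recessive disease in which affected (two-copy) horses never breed, random mating, no new mutations and a large population; under those assumptions the disease-allele frequency q falls each generation to q / (1 + q). The function names are hypothetical.

```python
def next_generation(q):
    """Disease-allele frequency after one generation in which affected
    (two-copy) horses leave no offspring.

    Assumes Hardy-Weinberg genotype frequencies before selection:
    normal p^2, carrier 2pq, affected q^2 (with p = 1 - q).
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    p = 1.0 - q
    breeders = p * p + 2 * p * q   # affected horses are removed
    # Only carriers pass on the disease allele, one copy out of their two.
    return (p * q) / breeders      # algebraically equal to q / (1 + q)


def allele_frequency_history(q0, generations):
    """Return [q0, q1, ..., q_generations]."""
    history = [q0]
    for _ in range(generations):
        history.append(next_generation(history[-1]))
    return history


history = allele_frequency_history(0.10, 10)
for t, q in enumerate(history):
    # Closed form for this model: q_t = q0 / (1 + t * q0)
    print(t, round(q, 4), round(0.10 / (1 + t * 0.10), 4))
```

Starting from a disease-allele frequency of 10%, about ten generations of removing affected horses are needed to halve the frequency, because most copies of a rare recessive allele are hidden in unaffected carriers. Identifying and removing carriers as well, for example after genetic testing, lowers the frequency much faster.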

In contrast, the alleles that contribute to a polygenic disease are often quite common in the population. Because a single risk allele does not cause disease on its own, there is typically little natural or artificial selective pressure acting to decrease its frequency, so these alleles can become common. Further, because complex genetic diseases are due to the contributions of dozens to hundreds of alleles, it can be very difficult to breed away from them effectively.

  • Total genetic contribution to a complex disease (heritability). For any complex or multifactorial disease, the animal’s phenotype is the result of its genotype(s) and its environment. Geneticists capture this idea in a simple equation:

Phenotype = genotype + environment   or   P = G + E

In this equation, the animal’s genotype (G) is the sum of all the alleles that contribute to the trait (from one allele to thousands, depending on the trait), and the environment (E) is the sum of all environmental factors that might affect the phenotype (diet, management, treatments, etc.). For a polygenic disease, G in this equation can be replaced by the risk equation presented earlier:

Risk of disease = effect of risk allele(s) in gene 1 + effect of risk allele(s) in gene 2 + effect of risk allele(s) in gene 3 + … + environment

For any given disease, the relative contributions of genetics and environment can vary. In a fully penetrant dominant or recessive genetic disease with no environmental contribution, P = G. In complex genetic diseases, the genetic contribution (G) is high in some diseases and very low in others. The proportion of the phenotypic variability (P) in a population that can be attributed to genetic variability (G) is called the heritability of that phenotype. For simple, fully penetrant genetic diseases the heritability is 100%; for complex traits, heritability can be estimated. For example, the heritability of RER in horses has been estimated at between 0.34 and 0.45. In other words, 34–45% of the variation in RER risk among horses is due to genetic differences between them.
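Because heritability is a ratio of variances across a population, a simulation in which G and E are known makes the definition concrete. This Python sketch assumes a continuous trait with P = G + E, where G and E are independent and normally distributed; the variance values and the function name simulated_heritability are hypothetical. In real studies G cannot be observed directly, and heritability is instead estimated from pedigree or genomic relationships between horses.

```python
import random
import statistics


def simulated_heritability(var_g, var_e, n_horses=20_000, seed=1):
    """Simulate P = G + E for n_horses and return Var(G) / Var(P)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n_horses)]  # genetic values
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n_horses)]  # environmental effects
    p = [gi + ei for gi, ei in zip(g, e)]                         # phenotypes
    return statistics.pvariance(g) / statistics.pvariance(p)


# Genetic variance 0.4, environmental variance 0.6:
# expected heritability = 0.4 / (0.4 + 0.6) = 0.4
print(round(simulated_heritability(0.4, 0.6), 2))
```

A heritability of about 0.4 here means that roughly 40% of the variation among horses in the simulated population comes from genetic differences; it does not mean that 40% of any one horse’s phenotype is genetic.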

Understanding the genetic architecture, in particular, the relative contribution of genetics to a given disease is critical because this determines how easy (or difficult) it will be to identify the genes and alleles causing the disease, and how useful a genetic test is likely to be.

3. Find potential disease-causing alleles

Identification of disease mutations has traditionally been accomplished by a candidate gene approach alone, or in combination with genome mapping.

  • Candidate gene approach. Before equine genome maps were developed, the only way to identify the genetic basis of a disease in horses was to find a similar condition in another species for which the genetic basis was already known. This comparative candidate gene approach assumes that similar diseases involve the same genes; it consists of sequencing the suspect gene in both normal and affected individuals and comparing the sequences to determine whether a detrimental mutation occurs in the affected individuals. This approach was used successfully in horses for hyperkalemic periodic paralysis (HYPP) [3,4], severe combined immunodeficiency (SCID) [5], overo lethal white syndrome (OWLS) [6], junctional epidermolysis bullosa (JEB) in Belgians [7] and Saddlebreds [8], glycogen branching enzyme deficiency (GBED) [9] and malignant hyperthermia (MH) [10].
  • Genome mapping. Although candidate gene approaches are powerful and relatively rapid, they rely on correct identification of a candidate gene. Identifying a candidate is difficult when no homologous disease is known, biochemical evidence is lacking, or potential candidates based on phenotype have been ruled out. In these scenarios, genetic mapping approaches are used to identify “positional candidate genes”. Genetic mapping approaches use the inheritance patterns of DNA markers to identify the region of the genome that must contain the disease-causing gene. Once that genome segment is identified, the reference genome map is used to search it for suitable “positional candidate genes” to sequence for discovery of the causative mutation(s).

To perform genome mapping, DNA is required from a number of horses that have the specific disease phenotype (cases), as well as from a number of horses certain to be free of the disease (controls). The number of cases and controls required varies between breeds and depends on whether a single gene or multiple genes are believed to affect the disease phenotype. Although these approaches can be time-consuming and expensive, an important benefit is the strong support for disease involvement provided by large numbers of accurately phenotyped cases and controls.
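At each marker, the core comparison in a case-control mapping study is whether one allele is more common in cases than in controls. The Python sketch below runs a basic allelic chi-square test (1 degree of freedom) on allele counts at a single marker; the function name and the counts are hypothetical, and it assumes unrelated horses. Real genome-wide studies test many thousands of markers, correct for multiple testing, and account for relatedness and population structure, which matter in breeds shaped by popular sires.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Allelic chi-square test at one marker.

    case_alleles, control_alleles: (risk_allele_count, other_allele_count).
    Each horse contributes two alleles, so 50 cases give 100 allele counts.
    Returns (chi_square_statistic, p_value).
    """
    observed = (case_alleles, control_alleles)
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alleles[j] + control_alleles[j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele and disease status were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("each allele and each group must be observed")
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 50 cases (100 alleles) and 50 controls (100 alleles).
chi2, p = allelic_chi_square((60, 40), (30, 70))
print(round(chi2, 2), p)  # chi-square is about 18.18
```

A small p-value at one marker flags a genome region for follow-up; on its own it is not evidence that a nearby allele causes disease.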

4. Collect scientific evidence to demonstrate that the allele(s) is causing disease

Once alleles or candidate mutations are identified, they need to be critically evaluated to determine whether the allele(s) is truly a disease-causing mutation. This requires multiple lines of evidence, including experimental validation.

Why is this necessary? Even in human medicine, attention has recently been drawn to the high rate of false-positive reports of disease-causing alleles. For example, an analysis of 406 reported disease mutations in humans found that 27% lacked compelling evidence that the allele was disease-causing [11]. Interpreting the clinical consequences of alleles discovered solely by whole-genome sequencing (WGS) is complicated by the fact that all healthy individuals carry potential disease-associated alleles, making it difficult to determine the likely significance of a potential mutation identified in any single patient. In fact, healthy adult humans carry on average 281–515 missense mutations, including 40–110 alleles classified as disease-causing and 40–85 homozygous mutations predicted to be highly damaging [12]. Our analysis of whole-genome sequences from 230 normal horses found on average 188–213 alleles per horse predicted to be highly damaging. Together, these data have led to the recognition that finding a disruption of a gene is, on its own, insufficient evidence that the mutation is causing disease.

To limit the false assignment of disease-causing status (pathogenicity) to alleles, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology have developed comprehensive joint standards and guidelines for assigning pathogenicity. These ACMG guidelines use 28 criteria to classify alleles as pathogenic (i.e., disease-causing), likely pathogenic, of uncertain significance, likely benign, or benign [13], based on genetic, informatic and experimental support for the candidate gene and the specific allele (Box 1) [13–15].

Box 1. Types of evidence supporting candidate genes and alleles as disease causing
Evidence may support the candidate gene, the functional allele, or both:
  • Gene identified as a “positional candidate” by genome mapping (e.g., GWAS)
  • Allele frequency is significantly higher in cases vs. controls
  • Allele is co-inherited with disease status within affected families
  • Allele is found at a low frequency (< 5%) in random population cohorts
  • Alleles in the gene are a common mechanism of disease
  • Gene is expressed in tissues relevant to the disease of interest
  • Allele site is evolutionarily conserved, given the assumption that high conservation across species is evidence that changes at that site are poorly tolerated
  • Nonsense, frame-shift, gain or loss of splice site, or exon deletion
  • Allele occurs at a location predicted to cause functional disruption of protein
  • Gene product interacts with proteins biochemically implicated in disease phenotype
  • Gene product has a biochemical function consistent with disease phenotype
  • Gene has altered expression in diseased individuals
  • Gene and/or gene product function is altered in diseased individuals
  • Other species with disruption of the gene show the disease phenotype
  • Allele significantly alters levels, splicing or normal function of gene product
  • Introducing the allele into an animal model results in the disease phenotype

What about using pedigrees?

One of the idiosyncrasies of equine genetics compared with human genetics is the impact of popular sires. Genetic selection pressure within a breed often results in highly prolific sires appearing in the pedigrees of many individuals. In fact, many horses within a breed share a common ancestor within 5 to 9 generations. While this concentrates many beneficial traits related to performance or appearance, it may inadvertently spread genetic diseases farther and faster than would be seen in human populations. It also means that a single mutation responsible for a disease in horses, derived from the founder, is typically seen in all individuals of that breed (or of closely related breeds) with that same disease. This is not the case in human populations. As an example, over 24 different mutations in the sodium channel gene cause HYPP in thousands of humans, whereas only one shared mutation exists in the hundreds of thousands of horses with HYPP.

Breeders often want to know whether a genetic disorder runs in a certain line of horses. However, pedigree analysis alone cannot make this determination: it only identifies common sire lines and does not provide concrete evidence that a disorder is inherited. As a cautionary tale, while researching PSSM1 we found that two particular Quarter Horse stallions were common on both the sire’s and dam’s side of the pedigrees of 22 PSSM1-affected Quarter Horses, and in 1996 we published a paper identifying these two stallions only as A and B. Without the certainty of a genetic test, we did not reveal the identity of these stallions, which at times put us at odds with inquiring breeders. When we discovered the genetic basis for PSSM1 in 2008, additional research showed that the causative dominant mutation likely originated at least 1,200 years ago! Furthermore, we found that the identical mutation is present, and very likely causing PSSM1, in more than 20 equine breeds, not just in the lineage of the two Quarter Horse stallions. Revealing the names of stallions A and B would have given breeders a false sense of security about breeding to stallions absent from those original limited pedigrees, since 10% of all Quarter Horses are now known to have the PSSM1 mutation. It would also have implied that breeding to descendants of stallions A and B was inadvisable, yet approximately 50% of their progeny could have been free of the PSSM1 mutation, since it is inherited in a dominant fashion and each stallion likely carried one copy each of the normal and mutant alleles.
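The 50% figure follows from simple Mendelian segregation. The Python sketch below enumerates the four equally likely allele combinations for one foal. It assumes a single gene, with “M” marking the dominant PSSM1 mutant allele and “N” the normal allele (labels chosen for illustration), and the function name foal_genotype_probabilities is hypothetical.

```python
from itertools import product


def foal_genotype_probabilities(sire, dam):
    """Probability of each genotype in one foal from a single mating.

    sire, dam: two-letter genotypes, e.g. "MN" for a horse carrying one
    mutant (M) and one normal (N) allele.
    """
    probabilities = {}
    # Each parent passes on one of its two alleles with equal probability,
    # giving four equally likely combinations (a Punnett square).
    for from_sire, from_dam in product(sire, dam):
        genotype = "".join(sorted(from_sire + from_dam))  # "NM" and "MN" are the same genotype
        probabilities[genotype] = probabilities.get(genotype, 0) + 0.25
    return probabilities


# A heterozygous stallion bred to a mare without the mutation:
print(foal_genotype_probabilities("MN", "NN"))  # {'MN': 0.5, 'NN': 0.5}
# The same stallion bred to a heterozygous mare:
print(foal_genotype_probabilities("MN", "MN"))  # {'MM': 0.25, 'MN': 0.5, 'NN': 0.25}
```

Because the mutation is dominant, only NN foals are free of it: half of the foals from a heterozygous stallion and a normal mare, and a quarter if the mare is also heterozygous.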



  1. Wade CM, Giulotto E, Sigurdsson S, et al. Genome sequence, comparative analysis, and population genetics of the domestic horse. Science. 2009;326(5954):865-867.
  2. Kalbfleisch TS, Rice E, DePriest MS, et al. EquCab3, an Updated Reference Genome for the Domestic Horse. bioRxiv. April 2018:306928. doi:10.1101/306928.
  3. Spier SJ, Carlson GP, Harrold D, Bowling A, Byrns G, Bernoco D. Genetic study of hyperkalemic periodic paralysis in horses. J Am Vet Med Assoc. 1993;202(6):933-937.
  4. Bowling AT, Byrns G, Spier S. Evidence for a single pedigree source of the hyperkalemic periodic paralysis susceptibility gene in quarter horses. Anim Genet. 1996;27(4):279-281.
  5. Shin EK, Perryman LE, Meek K. A kinase-negative mutation of DNA-PK(CS) in equine SCID results in defective coding and signal joint formation. J Immunol. 1997;158(8):3565-3569.
  6. Santschi EM, Purdy AK, Valberg SJ, Vrotsos PD, Kaese H, Mickelson JR. Endothelin receptor B polymorphism associated with lethal white foal syndrome in horses. Mamm Genome. 1998;9(4):306-309.
  7. J.D. B, Millon LV, Dileanis S, et al. Junctional epidermolysis bullosa in Belgian draft horses. 2003.
  8. Graves KT, Henney PJ, Ennis RB. Partial deletion of the LAMA3 gene is responsible for hereditary junctional epidermolysis bullosa in the American Saddlebred Horse. Anim Genet. 2009;40(1):35-41.
  9. Ward TL, Valberg SJ, Adelson DL, Abbey CA, Binns MM, Mickelson JR. Glycogen branching enzyme (GBE1) mutation causing equine glycogen storage disease IV. Mamm Genome. 2004;15(7):570-577.
  10. Aleman M, Riehl J, Aldridge BM, LeCouteur RA, Stott JL, Pessah IN. Association of a mutation in the ryanodine receptor 1 gene with equine malignant hyperthermia. Muscle Nerve. 2004;30(3):356-365.
  11. Bell CJ, Dinwiddie DL, Miller NA, et al. Carrier testing for severe childhood recessive diseases by next-generation sequencing. Sci Transl Med. 2011;3(65):65ra4. doi:10.1126/scitranslmed.3001756.
  12. Xue Y, Chen Y, Ayub Q, et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet. 2012;91(6):1022-1032.
  13. MacArthur DG, Manolio TA, Dimmock DP, et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014;508. doi:10.1038/nature13127.
  14. Goldstein DB, Allen A, Keebler J, et al. Sequencing studies in human genetics: design and interpretation. Nat Rev Genet. 2013;14(7):460-470. doi:10.1038/nrg3455.
  15. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. 2015. doi:10.1038/gim.2015.30.