Tue, 07/08/2008 - 19:00
Genome-wide Linkage and Genome-wide Association – Can they be Reconciled?
Bertram Müller-Myhsok
Max-Planck Institute of Psychiatry, Munich
Genome-wide association has become a major success story in the very recent past. It has altered the standing that association studies have in the field, and has delivered results that replicate remarkably well. Contrasting these results with those of genome-wide linkage studies, it is apparent that the overlap between the two approaches is very small. I will try to explore this question and come up with a rationale for what to expect and what not to expect from a comparison of the results of the two approaches.

Detection, Imputation and Association Analysis of Small Deletions and Null-alleles on Oligonucleotide Arrays
L. Franke1,3, Carolien G.F. de Kovel1, Y.S. Aulchenko2, G. Trynka3, A. Zhernakova1, G. Heap4, H.M. Blauw5, L.H. van den Berg5, R. Ophoff1,6, P. Deloukas7, D.A. van Heel4, C. Wijmenga1,3
1Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, Utrecht, the Netherlands, 2Department of Epidemiology & Biostatistics, Erasmus MC Rotterdam, the Netherlands, 3Genetics Department, University Medical Centre Groningen and University of Groningen, Groningen, the Netherlands, 4Centre for Gastroenterology, Institute of Cell and Molecular Science, Queen Mary, University of London, London, UK, 5Department of Neurology, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, Utrecht, the Netherlands, 6Center for Neurobehavioral Genetics, David Geffen School of Medicine, University of California, Los Angeles, USA, 7Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
Keywords: copy number variation, deletion, single nucleotide polymorphism, linkage disequilibrium, imputation
Copy number variation (CNV) is a major contributor to human genetic variation. Recently, CNV associations with human disease have been reported.
Many genome-wide association (GWA) studies in complex diseases have been performed using sets of biallelic single nucleotide polymorphisms (SNPs), but the available CNV methods are still limited. We present a new method (TriTyper) that can infer genotypes in case-control datasets for deletion CNVs, or SNPs with an extra, untyped allele at a high-resolution single SNP level. By accounting for linkage disequilibrium (LD), as well as intensity data, calling accuracy is improved. Analysis of 3,102 unrelated Caucasian individuals, genotyped using Illumina Infinium HumanHap300 and HumanHap550 BeadChips, resulted in the identification of 1,880 SNPs with a common untyped allele that are in strong LD with neighboring biallelic SNPs. Simulations indicate our method has superior power to detect associations compared to biallelic SNPs that are in LD with these SNPs, yet without increasing Type I errors, as shown in a GWA analysis in celiac disease. Genotypes for 1,204 triallelic SNPs could be fully imputed, using only biallelic genotype calls, permitting association analysis of these SNPs in many published datasets. We estimate that 682 of the 1,655 unique loci reflect deletions; this is on average 99 deletions per individual, four times more than detected by other methods. Whilst the identified loci are strongly enriched for known deletions, 61% have not been reported before. Genes overlapping with these loci more often have paralogs (P = 0.006) and biologically interact with fewer genes than expected (P = 0.004).

Analysis of population structure and genetic matching in European samples using genome-wide marker sets
M. Nothnagel1, Timothy T. Lu1, O.L. Grueco2, O. Junge1, S. Freitag-Wolf1, A. Caliebe1, M. Kayser2, M. Krawczak1
1Christian Albrechts University, University Hospital Schleswig-Holstein, Institute of Medical Informatics and Statistics, Brunswiker Str.
10, 24105 Kiel, Germany, 2Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, the Netherlands
Population stratification is known to be a potential confounder in genetic association studies. Genetic matching, e.g. between cases and controls, by using a large number of genetic markers can prevent systematic differences in the ancestry of phenotypic groups which cause a bias in the analysis. Here, we compare the genetic structure of many European samples by using genome-wide single nucleotide polymorphism (SNP) data. Furthermore, we investigate whether a small number of ancestry-sensitive markers (ASM) is sufficient to allow genetic matching in European sample sets with the same accuracy as the complete, genome-wide marker set. Our results indicate that, apart from a small number of highly informative markers, the great majority of markers contain only little information for matching, and that a large number of markers is required for reliable matching within Europe.

A score statistic for genetic association given linkage
J.J. Houwing-Duistermaat1, H.W. Uh1, R. van Minkelen2, M.C. de Visser2
1Dept of Medical Statistics and Bioinformatics, LUMC, Leiden, the Netherlands, 2Dept of Hematology, LUMC, Leiden, the Netherlands
Keywords: family data, relative risk model, Kong and Cox linkage method, thrombosis
Genome-wide linkage studies are often followed by association studies of candidate genes located under linkage peaks. Single-nucleotide polymorphisms (SNPs) are typed in a set of controls and in the affected relative pairs of the linkage study. To model this type of data, we extend the exponential model for linkage of Kong and Cox (1997) with an association term. This model appears to correspond to a relative risk model with the unknown genetic factor as exposure. We derive the score statistic to test for association given linkage for both autosomal and X-linked SNPs.
To illustrate the statistic we analyze data from an affected sibling pair linkage study on thrombosis (209 families). SNPs within candidate genes located under a peak on one of the autosomes and under a peak on the X chromosome are typed in the affected sibling pairs and in 331 controls.

G.H. Hardy and Hardy–Weinberg equilibrium (1908)
A.W.F. Edwards
Gonville and Caius College, Cambridge, UK
Key words: Hardy–Weinberg centenary, W. Weinberg, G.H. Hardy, R.C. Punnett
The Stuttgart physician W. Weinberg and the Cambridge mathematician G.H. Hardy independently introduced what we now know as Hardy–Weinberg equilibrium a century ago, Weinberg in February 1908 and Hardy in July. Here we tell the story behind Hardy's involvement: how he was responding to a query from his cricketing friend R.C. Punnett as to why, in a random-mating population, the dominants did not in the course of time drive out the recessives.

The Statistical Equivalent Of The Binary TDT For Quantitative Traits
S. Ghosh
Human Genetics Unit, Indian Statistical Institute, Kolkata, India
Key words: linkage disequilibrium, quantitative phenotypes
The classical Transmission Disequilibrium Test (TDT) for binary traits proposed by Spielman et al. is a family-based alternative to population-based case-control studies and circumvents the problem of population stratification as it tests for allelic association in the presence of linkage. However, since the clinical end-point traits are often defined by quantitative precursors, it has been argued that it may be a more prudent strategy to analyze the quantitative phenotypes without dichotomizing them into binary traits.
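For orientation, the classical binary TDT mentioned above is usually computed as a McNemar-type statistic on the alleles that heterozygous parents do and do not transmit to affected offspring. The following is only a minimal sketch of that textbook form, not the quantitative-trait test proposed in this abstract; the function names and the example counts are hypothetical.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT chi-square (1 df).

    transmitted / untransmitted: how often heterozygous parents did or did
    not pass the allele of interest to an affected child.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - untransmitted) ** 2 / informative


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
    stat = tdt_statistic(60, 40)
    print(f"TDT chi-square = {stat:.2f}, p = {chi2_1df_pvalue(stat):.4f}")
```

Because only transmissions within families are compared, the statistic does not depend on allele frequencies in the wider population, which is the property the abstract refers to as robustness to population stratification.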
The paradigm of linkage disequilibrium in the context of quantitative traits generally considers the intuitive concept of differences in allelic frequencies between individuals having high values of the quantitative trait and those with low values of the trait as evidence of linkage disequilibrium between the marker locus and the QTL. Although some methods have been developed for testing transmission disequilibrium in the context of quantitative traits, these are not direct extensions of the classical TDT. We propose a simple logistic regression based test that can be analytically shown to be statistically equivalent to the TDT for binary traits, and hence is not susceptible to the presence of population stratification in the data. We perform Monte-Carlo simulations under a wide spectrum of disease models and varying parameter values of linkage disequilibrium to evaluate the power of the proposed procedure. We find that, similar to the binary TDT, the power decreases with increasing dominance and decreasing heterozygosity at the QTL. The proposed method can be easily extended to incorporate multivariate phenotypes. We apply our method to analyze externalizing symptoms, an alcoholism-related endophenotype from the Collaborative Study on the Genetics of Alcoholism (COGA) project.

Association Analysis of SNP Data Imputed with the EM-Algorithm
T. Becker, A. Flaquer, C. Herold, M. Steffens
Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn, Germany
With the beginning of the era of genome-wide association studies, methods to obtain "in silico" genotypes have gained importance. We introduce a genotype imputation method that is based on maximum-likelihood haplotype frequency estimates obtained with the expectation-maximization algorithm. The haplotypes are derived from a joint data set consisting of a training sample (HapMap trios) and a case-control study sample.
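To illustrate what maximum-likelihood haplotype frequency estimation by expectation-maximization involves, here is a minimal sketch for the simplest case of two biallelic SNPs in unrelated individuals. It is not the authors' implementation, which works on a joint sample that includes HapMap trios; the function name, the input layout and the example counts are assumptions made for illustration. Only double heterozygotes are phase-ambiguous, so the E-step splits them between the two possible haplotype pairs.

```python
from itertools import product


def em_haplotype_freqs(geno_counts, n_iter=500, tol=1e-12):
    """EM estimates of two-SNP haplotype frequencies.

    geno_counts[i][j] = number of individuals with i copies of allele A at
    SNP 1 and j copies of allele B at SNP 2 (i, j in 0..2).
    Returns a dict keyed by (has_A, has_B), e.g. (1, 1) is haplotype AB.
    """
    # Haplotype counts from genotypes whose phase is unambiguous.
    fixed = {h: 0.0 for h in product((0, 1), repeat=2)}
    for i, j in product(range(3), repeat=2):
        if (i, j) == (1, 1):
            continue  # double heterozygotes are handled in the E-step
        n = geno_counts[i][j]
        fixed[(int(i >= 1), int(j >= 1))] += n
        fixed[(int(i == 2), int(j == 2))] += n
    n_double_het = geno_counts[1][1]
    n_haplotypes = 2 * sum(sum(row) for row in geno_counts)
    if n_haplotypes == 0:
        raise ValueError("no genotypes supplied")

    freqs = {h: 0.25 for h in fixed}  # uninformative starting values
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is AB/ab, not Ab/aB.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(fixed)
        counts[(1, 1)] += n_double_het * w
        counts[(0, 0)] += n_double_het * w
        counts[(1, 0)] += n_double_het * (1 - w)
        counts[(0, 1)] += n_double_het * (1 - w)
        # M-step: expected counts become the new frequency estimates.
        new = {h: c / n_haplotypes for h, c in counts.items()}
        change = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if change < tol:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical data consistent with complete LD between the two SNPs.
    counts = [[25, 0, 0], [0, 50, 0], [0, 0, 25]]
    for hap, f in sorted(em_haplotype_freqs(counts).items()):
        print(hap, round(f, 4))
```

With more than two markers the same idea applies, but the number of compatible haplotype pairs per genotype grows quickly, which is one reason dedicated software is normally used.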
A simple association test is introduced which takes the assignment probability of imputed genotypes into account via genotype prediction measures. The performance of the method is evaluated with a simulation study based on HapMap data. Power values for genome-wide significance are considered for large case-control samples. We show that the incorporation of "in silico" genotypes leads to a substantial power gain (7-12 percentage points) for common disease variants. Despite the power gain achieved with imputed genotypes, our simulation study also shows that even denser marker and/or reference panels than those currently available are desirable. There is potential in full genome data.

Searching for genotype-phenotype structure: Trying to understand Crohn's disease
J.M. Chapman1, C. Onnie2, N. Prescott2, S. Fisher2, C. Lewis2, C. Mathew2, C.J. Verzilli1, J.C. Whittaker1
1Department of Epidemiology and Public Health, London School of Hygiene and Tropical Medicine, 2Department of Medical and Molecular Genetics, King's College London School of Medicine
Key words: sub-phenotypes, Bayesian model search, Crohn's disease, disease structure
Increasing numbers of genetic associations with complex, multi-factorial diseases are being discovered. Within those multi-factorial diseases, which are defined by a number of sub-phenotypes, our interest is extending to finding the underlying structure of these diseases, in particular determining which variants are related to which sub-phenotypes. Very often the sub-phenotypes that make up a particular disease are highly correlated and it becomes difficult to identify the effects of a set of variants on the set of sub-phenotypes. Such problems arise in many complex diseases such as Crohn's disease and also within disease classes, for example cancers. We consider this question within a model selection framework, based upon log-linear models, and suggest the use of a Bayesian reversible jump Metropolis-Hastings approach.
We evaluate the performance of this suggested approach compared to more standard approaches by a simple simulation study and then apply the method to a real data example in Crohn's disease, with interesting results.

False negative replication of disease associations can be caused by differences in marker allele frequencies between study populations
Carolien G.F. de Kovel and Bobby P.C. Koeleman
Complex Genetics Group, UMC Utrecht, Utrecht, the Netherlands
Keywords: power, population differentiation, genetic association, odds ratio
Replication is often the method of choice to confirm association between a certain marker and a phenotype. Usually, replication will be performed in populations that are of a different geographic origin than the original population. Because of this, in a percentage of studies, the marker allele in the control sample will differ in frequency between the replicates. We show that, unless the marker is the risk allele itself, this will always lead to a change in power and in observed OR, in some cases even flipping the effect of the marker from risk to protective. This may happen even if the frequency of the disease mutation itself is the same in both populations. We estimate for some realistic scenarios that the required sample size increases by roughly 30% for a 5% increase in marker allele frequency, while the observed OR declines, but much more extreme situations are possible. Up to ten percent of common markers may have inter-population differences within Europe of more than 5%.

A Polygenic Model for Integration of Linkage and Pathway Information
J.J.P. Lebrec1, I. Nishchenko1, H.J. van der Wijk1, T.W. Huizinga2 and H.C.
van Houwelingen1
1Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands, 2Department of Rheumatology, Leiden University Medical Center, Leiden, the Netherlands
Keywords: data integration, gene ontology, microarray, NARAC, rheumatoid arthritis
We extend previous models for linkage curves with two linked loci (Farrall, 1997; Biernacka et al. 2005) to a general model which accommodates the polygenic structure of complex diseases. In this setting, we may account for the simultaneous action of possibly closely located genes, so the model provides a framework for global linkage testing and for residual linkage testing in the presence of already established loci. Despite the highly polygenic nature of complex traits, most disorders probably involve just a few biological pathways; we therefore extend the previous model to a hierarchical structure that allows integration of gene-pathway annotation data. Using data on rheumatoid arthritis, we describe some of the many applications which the model allows, including testing for the relevance of a gene list in terms of linkage and helping in candidate gene prioritization by integration of gene-pathway annotation data.

Haplotypes, Siblings and Continuous Phenotypes
G.B. Byrnes1, J. Stone2, L.C. Gurrin3, M.C. Southey4, J.L. Hopper3
1International Agency for Research on Cancer, Lyon, France, 2Cancer Research UK Centre for Epidemiology, Mathematics & Statistics Wolfson Institute of Preventive Medicine, the Barts and the London Queen Mary University, UK, 3Centre for MEGA Epidemiology, the University of Melbourne, Australia, 4Genetic Epidemiology Laboratory, Department of Pathology, the University of Melbourne, Australia
Examining the association of haplotypes with phenotypes has several attractions, since haplotypes of measured SNPs can act as surrogates for other known or unknown SNPs or rare variants.
However, a primary difficulty is that haplotypes must be inferred probabilistically from genotypes. There has been considerable recent discussion of the consequences for downstream analyses (Lin & Huang, 2007; Kraft & Stram, 2007); however, the issue of haplotype analysis in related individuals has been largely ignored. Genetic association studies which include related individuals are advantageous for many reasons, not least that they allow matching on unmeasured familial factors, including occult population stratification. Appropriate analytic methodology includes the TDT (Clayton & Jones, 1999) and a recent more general score-test approach (Thornton & McPeek, 2007). The difficulty of combining family structures and haplotype analyses is that errors in the imputation of haplotypes will be correlated within families. This does not appear to be dealt with in any of the available haplotype-inference software, although some authors have addressed the problem of estimating population frequencies from related individuals (Liu et al. 2006). We propose two techniques to allow correct inference of association between haplotypes and continuous phenotypes in sibships of varying size:
1. The use of orthogonal contrasts to allow independent analysis of within and between family effects using standard software;
2. Permutation testing to avoid bias from intra-sibship correlation of haplotype errors.
As an example we use recently collected mammographic density measurements from a large prospective study of twins and sisters in Australia.
References
Clayton, D. & Jones, H. (1999) Am. J. Hum. Genet. 65, 1161–9.
Lin, D.Y. & Huang, B.E. (2007) Am. J. Hum. Genet. 80, 578–9.
Liu, P.-Y., Lu, Y. & Deng, H.-W. (2006) Genetics 174, 499–509.
Thornton, T. & McPeek, M.S. (2007) Am. J. Hum. Genet. 81, 321–37.

Meta-analysis of gene/disease association studies
P.J. Newcombe, C. Verzilli, J. Pablo-Casas, J.
Whittaker
Non-Communicable Disease Epidemiology Unit, NCDEU, London School of Hygiene & Tropical Medicine, London, UK
Key words: meta-analysis, gene/disease association, Bayesian methods, log-linear
It is widely accepted that meta-analyses are necessary to provide the power required for genetic association studies. However, meta-analyses to identify genetic markers of disease are difficult because studies may have typed partially overlapping sets of markers, and published results often include only marginal (one marker at a time) analyses. This is problematic because between-marker associations are unaccounted for. The effect of the true causal marker may be confounded by other markers it is closely associated with. Additionally, marginal analysis is not the most efficient use of the data: individuals may carry information about an unobserved marker if they have data for other, closely associated markers. We present a newly developed Bayesian approach to account for such between-marker associations in a multi-marker analysis. By using a log-linear model for gene/disease meta-analysis of marginal genotype counts, we were able to inform between-marker association parameters with prior information from the online HapMap project. I will demonstrate with simulated data that our model successfully adjusts for marker/marker confounding in comparison to single-marker analysis of the same data. Preliminary results from application to published data on a possible association between the PDE4D gene and stroke (for which controversy exists and marker sets differ between studies) will also be presented, in which no compelling evidence of an association was found.

Unbiased estimation of effect sizes from genome-wide association scans with replication
J. Bowden, F. Dudbridge
MRC Biostatistics Unit, Cambridge, UK
Effect sizes estimated from genome scans are upwardly biased, because only the top-ranking SNPs are reported, and moreover only if they reach a defined level of significance.
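The upward bias described here can be seen in a small simulation. The sketch below is only an illustration of selection on significance under stated assumptions (a single SNP, a normally distributed estimate with known standard error, a one-sided z threshold); it does not model selection by rank and it is not the authors' estimator. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def selection_bias_demo(true_effect, std_error, z_threshold,
                        n_sims=20000, seed=1):
    """Compare the mean of all simulated estimates with the mean of those
    that pass the significance threshold.

    Returns (mean of all estimates, mean of selected estimates,
    number of selected estimates).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_sims)]
    # Keep only scans in which the SNP would have been reported.
    selected = [b for b in estimates if b / std_error > z_threshold]
    if not selected:
        raise ValueError("no simulated estimate passed the threshold")
    return statistics.fmean(estimates), statistics.fmean(selected), len(selected)


if __name__ == "__main__":
    overall, reported, n_reported = selection_bias_demo(0.1, 0.05, 3.0)
    print(f"true effect 0.100, mean of all estimates {overall:.3f}")
    print(f"mean of {n_reported} reported estimates {reported:.3f}")
```

Every reported estimate exceeds the threshold times the standard error, so their average must lie above the true effect whenever the true effect is below that cut-off.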
No unbiased estimate exists, but replication studies are routinely performed that allow unbiased estimation of SNP effects. Estimation based on replication data alone is inefficient in the sense that the initial scan could, in principle, contribute information. We have applied recent methods developed for adaptive clinical trials to provide unbiased estimates based on both the initial scan and the replication study, which are more efficient than those based just on the replication. Specifically, we adjust the standard combined estimate to allow for selection by rank and significance in the initial scan. We illustrate our approach on some recently completed scans and use simulations to explore its efficiency.

A simple method for co-segregation analysis to evaluate the pathogenicity of DNA variants of unknown significance in BRCA1 and BRCA2
L. Mohammadi1,4, M.P.G. Vreeswijk2, J. Wijnen1, P. Devilee2,3, C.J. van Asperen4 and J.C. van Houwelingen4
Departments of 1Clinical Genetics, 2Human Genetics, 3Pathology, 4Medical Statistics, Leiden University Medical Center
Purpose - Uncertainty about the association of rare DNA variants with disease makes genetic counseling difficult. To classify these variants as disease causing or not, we want to derive likelihood ratios (LR). The aim is to determine whether or not variants are likely to be deleterious mutations.
Method - The analysis of patterns of co-segregation of the variant with disease in families is a powerful tool to obtain likelihood ratios. There are limitations to the procedures proposed in the literature, e.g. genetic linkage software is usually needed for calculations. In this study, we describe a simple method for the analysis of co-segregation of rare variants with disease. We present an algorithm to calculate the likelihood ratios without the need for genetic linkage software.
Results - We applied our algorithm to obtain likelihood ratios in favor of causality of BRCA1 and BRCA2 variants.
Our data contained pedigrees with at least one carrier of a BRCA variant. The magnitude of the likelihood ratio depends on the numbers of people with the mutation and with breast or ovarian cancer.
Conclusion - This is a simple and powerful method for analyzing co-segregation. We present a plain algorithm which does not need linkage packages. It can be easily run in the counseling setting as it requires only two affected genotyped persons, gender and the age of onset for breast and/or ovarian cancer.

HD_IBD – a new method for tracing allelic inheritance in deep complex pedigrees
F. Besnier and Ö. Carlborg
Department of Animal Breeding and Genetics, Swedish University of Agricultural Science & Linnaeus Centre for Bioinformatics, Uppsala University, Sweden
Linkage analysis to detect QTL in complex pedigrees is often limited by the ability to trace marker alleles from ancestors to descendants, e.g. to construct IBD matrices for use in variance component analysis. Most available tools to build IBD matrices are suitable for half- or full-sib pedigrees including a limited number of generations. Here, we propose a new approach, HD_IBD (Haplotype-based Deterministic IBD estimation), to trace the origin of marker alleles in a half-/full-sib pedigree of an arbitrary number of generations. The method consists of a first step using a genetic algorithm to determine haplotypes for all individuals in the pedigree, followed by a deterministic recursive method to access the inheritance pattern of each allele through the pedigree. The method has been applied to an eight-generation Advanced Intercross Line (AIL) generated from two divergent lines of chicken that had been subjected to 43 generations of bi-directional selection for body weight at eight weeks of age (Dunnington & Siegel, 1996).
All individuals in the pedigree (F0-F8, n = 1537) were genotyped for 383 SNP markers with approximately 1 cM coverage in 14 genomic regions where significant or suggestive QTL were evidenced in an F2 intercross between the two lines (Carlborg et al. 2006). We used HD_IBD to estimate the probability for each allele in each individual to have its origin in either of the two founder lines. The resulting QTL genotype probabilities were used to estimate the location and effect of putative QTL using least squares regression (Martinez & Curnow, 1992; Haley & Knott, 1992). Using this approach, we were able to replicate and fine-map several QTL that were evidenced in the original F2 experiment with higher power and precision than with association mapping that does not utilize available pedigree information.

Multi-locus epistasis leads to hybrid inferiority in domestic fowl
Ö. Carlborg1,2, Arnaud le Rouzic2, J.A. Castro1,2, L. Andersson3 and P. Siegel4
1Department of Animal Breeding and Genetics, SLU, Sweden, 2Linnaeus Centre for Bioinformatics, Uppsala University, Sweden, 3Department of Medical Biochemistry and Microbiology, Uppsala University, Sweden, 4Department of Animal Sciences, Virginia Polytechnic Institute and State University, USA
Hybrid vigor (heterosis), hybrid inferiority, hybrid breakdown and hybrid incompatibility have all been extensively studied in the past century due to their importance in evolutionary and agricultural genetics. The underlying genetic mechanisms for these phenomena are, however, still largely unknown. Recently, transgressive segregation (TS) was proposed to describe the proportion of individuals in a hybrid population that have either higher or lower phenotypes than those of its founder lines. All the classic hybrid phenomena described above can be described based on the magnitude and direction of TS in a studied population.
We have developed a new analytical approach to dissect the genetics of transgressive segregation in segregating hybrid populations. The method facilitates genetic dissection of complex traits in hybrid populations into individual locus effects as well as multi-locus epistasis. Furthermore, we are able to evaluate the functional role of genetic interactions in determining extreme hybrid phenotypes and predict transgressive multi-locus genotypes. We have used the method to analyze data from two hybrid chicken populations. The populations were selected because the average juvenile bodyweights in hybrids were lower than expected from parental line means and multi-locus epistatic networks had earlier been found to affect these traits. In both populations, we found strong evidence for multi-locus epistasis being the main explanation for the inferior hybrid phenotypes observed. Further use of the method to analyze data from other hybrid populations will give important insights to long-standing questions regarding the functional role of epistasis in determination of e.g. hybrid vigor (heterosis), hybrid breakdown and hybrid incompatibility.

A Critical Review of Genomic Control Methods
T. Dadd1,2, M.E. Weale1, C.M. Lewis1
1Department of Medical and Molecular Genetics, King's College London, UK, 2Unilever Research Colworth, Sharnbrook, Bedford, UK
Key words: stratification, genomic control, simulation, association
Population stratification has long been regarded as an important potential confounder of genetic association studies and careful design using fairly homogeneous populations or by matching ethnicity between cases and controls has become the norm. When planning small-scale replication studies, however, it is tempting to utilise whatever samples are available, and this may lead to imbalanced sampling from heterogeneous populations. Under population stratification, genomic control (GC) theory assumes that χ² test statistics are inflated by a factor λ.
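To make the inflation factor concrete, the following minimal sketch estimates λ from χ² statistics at unlinked null markers, either as their median divided by the median of a 1-df χ² distribution (about 0.4549) or as their mean (the 1-df χ² mean is 1), and then rescales a test statistic. The function names are hypothetical, the lower bound of 1 on the estimate is a common convention assumed here rather than something stated in the abstract, and the F-distribution-based GCF procedure is not shown.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def gc_lambda(null_chi2, method="median"):
    """Estimate the genomic control inflation factor from the chi-square
    statistics of unlinked markers assumed to be unassociated with disease."""
    if not null_chi2:
        raise ValueError("need at least one null-marker statistic")
    if method == "median":
        lam = statistics.median(null_chi2) / CHI2_1DF_MEDIAN
    elif method == "mean":
        lam = statistics.fmean(null_chi2)  # expected value under the null is 1
    else:
        raise ValueError("method must be 'median' or 'mean'")
    # Assumed convention: never deflate, so the estimate is bounded below by 1.
    return max(1.0, lam)


def gc_correct(chi2_stat, lam):
    """Rescale a candidate-marker chi-square statistic by the inflation factor."""
    return chi2_stat / lam


if __name__ == "__main__":
    # Hypothetical null-marker statistics showing some inflation.
    null_stats = [0.2, 0.5, 0.9, 1.4, 3.1]
    lam = gc_lambda(null_stats)
    print(f"lambda = {lam:.2f}, corrected statistic = {gc_correct(10.0, lam):.2f}")
```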
This factor may be estimated by a summary χ² value (λmedian or λmean) from a set of unlinked markers and used to correct χ² test statistics in order to remove the effects of population stratification before assessing significance against a χ² distribution with 1 degree of freedom. Many studies applying GC methods have used fewer than 50 unlinked markers and an important question is whether this can adequately assess population stratification. We assess the behaviour of the GC statistics in unbalanced case-control studies using simulation. SNPs representing cases and controls are sampled separately from two underlying subpopulations according to the Balding-Nichols beta-binomial method with realistic intra-continental levels of Fst (0.001 to 0.02) and sampling schemata ranging from completely balanced to completely imbalanced between the two underlying subpopulations. Sample sizes range from 100 to 2500 cases and an equal number of controls unselected for disease status. Association between disease status and genotype is induced using standard methods under a multiplicative model with each copy of the risk allele conferring relative risks of between 1.2 and 2. The sampling properties of the genomic control parameters λmedian and λmean are explored using between 25 and 1600 unlinked markers. By then calculating empirical estimates of Type I error and power, we further link the sampling properties of the two estimates of λ to genomic control corrections based on either the χ² distribution (GCmedian or GCmean) or the F-distribution (GCF) and reinforce the importance of using the GCF procedure with sufficient numbers of unlinked markers. Recommendations for investigators are given.

A Test for Gene Conversion and Results in the Human Growth Hormone (GH1) Gene Promoter
A. Caliebe1, A. Wolf1, D.S. Millar2, M. Krawczak1, D.N.
Cooper2
1Institut für Medizinische Informatik und Statistik, Christian-Albrechts-Universität Kiel, Germany, 2Institute of Medical Genetics, School of Medicine, Cardiff University, UK
Keywords: gene conversion, coalescence-based test, haplotypes, Growth hormone (GH1) gene
Gene conversion is an important mechanism of mutation which has not only served to fashion the structure of extant human genes but has also played an important role in pathology. We developed a coalescence-based test for gene conversion which employs the similarity of putative donor and acceptor sequences. The proximal promoter region of the human growth hormone (GH1) gene is highly polymorphic, an observation which has been attributed to gene conversion. For 14 SNPs located in the 535 bp human GH1 promoter, a total of 60 different haplotypes were observed in a total of 577 individuals from different ethnic backgrounds (156 Britons, 116 Spaniards, 163 West Africans and 142 Asians). When all four population groups were tested separately, evidence was found in the British, Spanish and African groups for the action of GH1 as an acceptor of gene conversion, with at least one of the four paralogous GH gene promoters serving as donor. A putative gene conversion hotspot spanning the transcriptional initiation site (position −6 to +25) of the GH1 gene was found to contain several DNA sequence motifs previously shown to be associated with gene conversion. Of the GH1 paralogues, the GH2 gene promoter appears in particular to have acted as a donor in both Britons and Spaniards. The occurrence of gene conversion during the evolution of the human GH locus therefore has been established.

Gene-environment interactions involved in type 2 diabetes: Application of a genotype-free method
R. Kazma1,2, C. Bonaïti-Péllié1,3, J.M. Norris4, E.
Génin1,2
1Université Paris-Sud, Villejuif, France, 2INSERM, Paris, France, 3INSERM, Villejuif, France, 4Department of Preventive Medicine and Biometrics, University of Colorado at Denver and Health Sciences Center, Denver, USA
Keywords: genotype-environment interaction, recurrence risk, type 2 diabetes mellitus
The detection of gene-environment interactions is a difficult issue in studies focusing on multifactorial diseases. Available methods to test for gene-environment interactions usually concentrate on some particular genetic and environmental factors. Rather than focusing on a known genetic factor, we applied a new method to determine whether or not a given exposure is likely to interact with unspecified genetic factors. The degree of familial aggregation is used as a surrogate. The Odds Recurrence Ratio (ORR) is used as an indirect measure of interaction. It contrasts recurrence risks in sibs of affected indexes when stratifying on the exposure of indexes. A Wald chi-square test based on the estimate of the ORR and its variance tests for departure from the null hypothesis of no gene-environment interaction, while accounting for a possible confounding bias if indexes and their sibs are correlated for the exposure. An application to a sample of 588 nuclear families ascertained through one index affected with type 2 diabetes is presented, in which gene-environment interactions involving obesity, physical activity and dietary fat intake are investigated. An association with obesity is clearly evidenced and a potential interaction involving this factor is suggested (p = 0.06). Multiple sibships have been used to increase sample size, but a permutation procedure to obtain the empirical distribution is needed to account for dependency of sibpairs. Results of this ongoing work will also be presented.
The method proposed here might be of particular interest prior to genetic studies to help determine the environmental risk factors that will need to be accounted for and select the most appropriate samples to genotype.

SNPs to pathways – making biological sense of GWA results
Peter Holmans Cardiff University, UK
Genome-wide association (GWA) studies are a promising way of detecting associations between SNPs and complex traits. However, relatively few SNPs have p-values sufficiently small to give conclusive evidence of association. Conversely, there are usually several hundred SNPs with moderately significant p-values (p ∼ 10⁻³ to 10⁻⁴). These will likely contain several false positives, but may also contain genuine effects of small magnitude. The presence of a greater than expected number of associated SNPs in genes of similar biological function gives a degree of confidence that the associations are genuine (even if none is individually very significant) as well as giving an insight into the biological processes underlying the disease. We present a method where a list of significantly associated genes is generated (genes containing SNPs with a p-value for association less than a pre-defined threshold). Gene Ontology (GO) categories are tested for over-representation on this list (relative to the rest of the genome), allowing for varying numbers of SNPs per gene. Correction for testing multiple non-independent GO categories is performed using bootstrapping. The method is demonstrated using datasets from the Wellcome Trust Case Control Consortium (WTCCC) study.

A simulation study using a hierarchical stochastic search approach
N. Cremer1, L. Beckmann1, J. Chang-Claude1, D.V.
Conti2 1Department of Cancer Epidemiology, German Cancer Research Center DKFZ, Heidelberg, Germany, 2Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA
Keywords: Hierarchical stochastic search, multilevel modelling, simulation study, Bayes factors
Hierarchical regression modelling incorporates external knowledge of variables by placing a higher order model on estimates obtained from a first level regression. In addition to including possibly relevant biological information in the analysis, this leads to more stable estimates. A stochastic search can be incorporated in a Markov Chain Monte Carlo (MCMC) framework, accounting for uncertainty with respect to model choice. A subset of putatively important variables is determined for further analysis, a feature especially appealing in the era of genome-wide association studies (GWAS). We present results of a simulation study based on an approach by Conti (Conti et al. 2008). Ten independent binary exposures with prevalence 0.3 were simulated in 500 cases and 500 controls. Three exposures were associated with disease, with effect size corresponding to 80% power in a single analysis. We considered three different a priori scenarios, comprising no information, perfect, and random prior knowledge about which variables were associated with disease. In addition to stabilizing estimates, the information was used to guide the stochastic search by influencing which variables were included in the model. Results were based on 100 replicates, each consisting of 10,000 MCMC steps. We present Bayes factors (BFs) for the variables, and the number of replicates with BFs ≥ 3, which can be seen as a surrogate measure for the power. With no information, there is positive evidence for the true variables (BFs ranging from 6.3 to 12.2), and power lies between 60 and 75%. No evidence is provided for the other variables.
Perfect information yields strong evidence for variants associated with disease (BFs between 42.7 and 70.0, power between 91 and 93%), whereas values are reduced for the remaining variables. Random prior knowledge still clearly distinguishes associated and non-associated variables, but leads to a slightly elevated power in the latter group. The method detects variables associated with disease independently of the level of information provided. Results are consistent with the assumption that a more accurate distinction is obtained by giving correct prior information. Nevertheless, specifying just random knowledge stabilizes results, and thus seems to be superior to giving no information at all. Haplotype reconstruction algorithm SHARE using genotype sharing combined with Mantel Statistics Using Haplotype Sharing V. Marquard, J. Chang-Claude, L. Beckmann Department of Cancer Epidemiology, German Cancer Research Center DKFZ, Heidelberg, Germany Keywords: haplotype reconstruction, Mantel Statistics, haplotype sharing, genotype sharing, clustering The Mantel Statistics Using Haplotype Sharing (Beckmann et al. 2005) is a good association test statistic to identify putative disease loci in candidate regions of common complex diseases, but it needs the phase information of individuals. SHARE (Qian & Beckmann, 2005) is a two-staged algorithm that performs reconstruction in population individuals. In stage 1, candidate haplotype pairs per individual are identified. In stage 2, SHARE allows for multiple phylogenies of haplotypes by clustering them on the basis of haplotype distance and then fitting them within each cluster to a coalescent tree. The optimal set of haplotypes is the one with minimum sum of tree distances. First, we propose a new approach for the first stage of SHARE, in which candidate haplotype pairs per individual are identified. 
And second, we combine SHARE and the Mantel Statistics Using Haplotype Sharing by using the additional information of cluster membership of the individuals. The candidate haplotype pairs for stage 1 are obtained by applying an extension of Clark's algorithm (Clark, 1990). If an individual's genotype cannot be explained by two so-called resolved haplotypes, we use one of the resolved haplotypes that forms the most similar genotype and construct a second haplotype, which will then be included in the set of resolved haplotypes. We measure the similarity of genotypes as their sharing, i.e. the number of intervals surrounding a marker that are flanked by markers with the same alleles. We investigated four ways to incorporate the additional information on cluster membership into the Mantel Statistics Using Haplotype Sharing.
1) The Mantel Statistics can be evaluated in each cluster defined by SHARE, and the resulting p-values are then combined by the step-down minP algorithm.
2) An additional element Cij, indicating whether individuals i and j are in the same cluster, can be added to the Mantel statistic at locus x; Cij is 1 when individuals i and j are in the same cluster and 0 otherwise. Therefore, the Mantel Statistic M(x) is the same as calculated within each cluster (as in 1)), but then summed up over all clusters. The significance can then be determined via Monte Carlo permutations over L (genetic similarity), Y (phenotypic similarity) or C (cluster membership).
3) Since Y and L describe similarities, one can incorporate an element Cij describing the similarity between the cluster containing individual i and the one containing individual j. The similarity of two clusters can be determined as 1 minus the distance between the two clusters, assigned via average linkage. Again, significance can be assessed via Monte Carlo permutations over L, Y or C.
4) Furthermore, one can substitute the variable L by the cluster similarity variable C (as in 3)).
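As a rough illustration of variant 2) above, the following Python sketch assumes that the Mantel statistic takes the product form M = Σ_{i<j} L_ij · Y_ij · C_ij. This form is consistent with summing the within-cluster statistics over all clusters, but the abstract does not spell it out. The function names, the toy matrices and the choice to permute Y are illustrative assumptions, not the authors' implementation.

```python
import random

def mantel_statistic(L, Y, C):
    """Sum L[i][j] * Y[i][j] * C[i][j] over all pairs i < j (assumed product form)."""
    n = len(L)
    return sum(L[i][j] * Y[i][j] * C[i][j]
               for i in range(n) for j in range(i + 1, n))

def same_cluster_matrix(labels):
    """C[i][j] = 1 if individuals i and j carry the same cluster label, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

def permute_matrix(M, perm):
    """Relabel individuals: apply the same permutation to rows and columns."""
    n = len(M)
    return [[M[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def permutation_p_value(L, Y, C, n_perm=999, seed=0):
    """Monte Carlo p-value obtained by permuting the phenotypic similarity Y."""
    rng = random.Random(seed)
    observed = mantel_statistic(L, Y, C)
    n = len(L)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        if mantel_statistic(L, permute_matrix(Y, perm), C) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): four individuals in two clusters.
labels = [0, 0, 1, 1]
C = same_cluster_matrix(labels)
L = [[0 if i == j else 1 for j in range(4)] for i in range(4)]  # genetic similarity
Y = same_cluster_matrix([1, 1, 0, 0])                           # phenotypic similarity
M_obs = mantel_statistic(L, Y, C)
p_val = permutation_p_value(L, Y, C)
```

Permuting L or C instead of Y only requires passing the permuted matrix in the corresponding argument; variant 3) would replace the 0/1 indicator by a cluster similarity between 0 and 1.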
All four variations are calculated and the results compared with the Mantel Statistics Using Haplotype Sharing without using the additional information on cluster membership. Score test for age at onset linkage analysis of selected nuclear families A. Callegaro, H.C. van Houwelingen and J.J. Houwing-Duistermaat Department of Medical Statistics and Bioinformatics, Leiden, the Netherlands Keywords: age at onset, linkage analysis, nuclear families, selected samples Bivariate additive gamma frailty models are commonly used to model correlation between age at onset data observed in samples of monozygotic and dizygotic twins (correlated frailty models). These models allow for a higher correlation between monozygotic twins compared to dizygotic twins while the marginal distributions are the same. We propose to use a four dimensional additive model for genetic linkage analysis of nuclear families taking into account available information on age at onset. For testing we derived the corresponding score statistic from the retrospective likelihood of the identical by descent status given the phenotypic data. The new score statistic appeared to be a weighted version of the standard score statistic for affected sibling pair linkage analysis (mean test). The weights depend on marginal survival and variance and correlation parameters of the frailty distribution. Analogously to Merlin-Regress linkage analysis for quantitative traits, we propose to use estimates from registries and twin studies (Sham & Purcell, 2001). One of the properties of the statistic is that it has the right type I error even when the model for age at onset is not correct. Further the test-statistic can also be used when parents are not genotyped. To illustrate our new statistic we apply it to the age at onset data from the 12th Genetic Analysis Workshop (Almasy et al. 2001) where data have been generated using a complex model of seven genes influencing the liability and the age at onset of a common disease. 
The LOD score of the standard statistic (mean test) was 3.4 while the LOD score of our new statistic was 4. We conclude that the power of linkage analysis of age at onset data may be increased when the new statistic is used instead of a statistic which does not use the available information on age at onset of the parents.
References
Almasy, L., Terwilliger, J.D., Nielsen, D., Dyer, T.D. & Blangero, J. (2001). GAW12: Simulated genome scan, sequence, and family data for a common disease. Genetic Epidemiology 21S, 332–338.
Sham, P. & Purcell, S. (2001). Equivalence between Haseman-Elston and variance components linkage analyses for sib pairs. American Journal of Human Genetics 68, 1527–1532.

Prediction models for SNP data
W. Ghidey1, T. Stijnen1, J.J. Houwing-Duistermaat1, B.T. Heijmans2, M. Beekman2, R.G.J. Westendorp3, P.E. Slagboom2 and H.C. van Houwelingen1 1Department of Medical Statistics and Bioinformatics, Medical Statistics, Leiden University Medical Center, Leiden, the Netherlands, 2Department of Medical Statistics and Bioinformatics, Molecular Epidemiology, Leiden University Medical Center, Leiden, the Netherlands, 3Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, the Netherlands
Key words: prediction model, longevity, SNPs, cross-validation
In genetic association studies, a large number of single-nucleotide polymorphisms (SNPs) are typed in samples of cases and controls for the purpose of identifying genes associated with a specific phenotype. With complex phenotypes, the SNPs usually act in combination (interaction effect), so that an individual SNP may not be important by itself. In such cases, the identification of the best predictive model of the outcome incorporating interaction effects of a large number of SNPs is challenging. In this paper, we aim at finding a best predictive model of longevity in terms of specific combinations of a number of SNPs.
We explored several alternatives to the standard logistic regression model. These methods have important features, including a better capability to handle large numbers of SNPs, less or no variable selection bias (as they apply automatic selection criteria aimed at minimizing cross-validation error), and protection against over-fitting. We evaluated the predictive power of each method and suggest a further improvement in predictive power through combining the individual model predictions. The combined model is constructed as a weighted combination of the candidate prediction models, with weights such that the cross-validated error is minimized. We illustrate the application of the different alternative prediction models and that of the combination model with longevity data of the Leiden 85-plus study. The data set consists of genotype information from 60 SNPs and demographic records of 887 subjects, of whom 640 are old (85+ years) and 247 are currently young (18-40 years).

Extended meta-analysis of genome-wide linkage studies in schizophrenia
C.M. Lewis1,2, M.Y.M. Ng1, D.F. Levinson3 and the Schizophrenia Meta-Analysis Consortium 1Department of Medical and Molecular Genetics, King's College London, UK, 2Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, KCL, 3Stanford University School of Medicine, Stanford, USA
Introduction: We previously reported that a meta-analysis of schizophrenia linkage scans using the Genome Search Meta-Analysis (GSMA) method provided evidence for linkage in 10 chromosomal regions. We performed an updated analysis to incorporate multiple new studies.
Methods: Results (LOD, NPL scores or p-values) were obtained from investigators for 32 genome-wide linkage scans: 16 from the previous analysis (including 6 with new genotyping, some with expanded samples) and 16 new scans, totalling 3,283 pedigrees with 7,476 genotyped schizophrenia or schizoaffective cases.
GSMA is a non-parametric method which ranks the strongest evidence for linkage from chromosomal regions (bins) of equal width (Rutgers map), and sums ranks across studies to assess evidence for linkage. A primary analysis (all families and 30 cM bins) was compared with results for the subset of 22 scans of European-ancestry samples.
Results: Suggestive evidence for linkage was obtained in two single bins, on chromosomes 5q (p = 0.004; 142–168 Mb) and 2pq (p = 0.006; 106–134 Mb). Nominally significant evidence for linkage was observed in 12 bins (including regions on chromosomes 1q, 2q, 3q, 4q, 8p, 10q and 16p), a result expected by chance in only 1.8% of GSMA studies of this size. Much stronger evidence for linkage was identified on chromosome 8p in European-ancestry samples (p = 0.0009; 16–33 Mb).
Conclusions: Linkage regions supported by meta-analysis may contain schizophrenia susceptibility loci. Although genome-wide association methodology has greater power to detect weak associations to common DNA variants, linkage analysis can detect diverse genetic effects segregating in families, such as multiple rare variants or several weakly associated loci in the same region.

In silico genome-wide association mapping of complex traits
O. Lao1, Y. Aulchenko2, M. Kayser1 1Department of Forensic Molecular Biology, Erasmus MC Rotterdam, the Netherlands, 2Department of Epidemiology & Biostatistics, Erasmus MC Rotterdam, the Netherlands
Keywords: in silico association, Europe, phenotype, genotype
The number of genome-wide association studies to newly identify genes involved in normal and pathological traits is growing fast, as are the databases of various phenotypic traits. With such growing databases, it would be desirable to perform future genetic association studies "in silico", that is, trying to detect associations between phenotype and genotype based on population databases.
To test whether this could be possible, we have analyzed the spatial covariation between, on one hand, genotypes of hundreds of thousands of genetic markers with genome-wide distribution and, on the other hand, normal as well as pathological phenotypes of medical but also forensic relevance in various European subpopulation samples. The genomic regions we identified with this approach as outliers strikingly overlap with those genomic regions previously associated with the respective phenotypic traits. This implies that the method we developed might be useful as an exploratory tool for future mapping of complex traits "in silico". Effects of the underlying model in multiple imputation: type I error and risks estimates P. Croiseau1, H. Perdry2,3 1Institute of Human Genetics, University of Newcastle upon Tyne, UK, 2INSERM, Villejuif, France, 3UMR-S 535, Univ. Paris-Sud, Villejuif, France Keywords: Multiple Imputation, Missing Data, Association When using trio family data to test for association between a disease and a genetic marker and to estimate the genotype relative risks (GRR), it is necessary to have all three individuals genotyped at the marker. Discarding trios in which some individuals are not genotyped can result in an important decrease in sample size and a loss of information. However, it is possible to infer the missing genotypes using the data available for linked markers together with the familial data. This can be done using maximum likelihood inference of missing data, which computes a probability for each possible genotype. An alternative is to use multiple imputation (MI), which generates a few imputed data sets that can be analyzed with standard statistical packages. These analyses are then combined to obtain an analysis of the initial data set. Multiple imputation, similarly to maximum likelihood inference, is done under a statistical model, the choice of which influences the result of the subsequent analyses. 
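The abstract does not say how the analyses of the imputed data sets are combined. A common choice is Rubin's rules, which the sketch below assumes: the pooled estimate is the mean of the per-data-set estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the toy log-GRR values are hypothetical.

```python
from statistics import mean, variance

def pool_estimates(estimates, variances):
    """Combine m per-imputation estimates with Rubin's rules (assumed here).

    estimates: point estimates, e.g. log-GRR from each imputed data set
    variances: the corresponding squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)                           # average sampling variance
    between = variance(estimates) if m > 1 else 0.0    # spread across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Toy example (hypothetical numbers): three imputed data sets.
log_grr = [0.4, 0.5, 0.6]
squared_se = [0.04, 0.04, 0.04]
pooled_log_grr, total_var = pool_estimates(log_grr, squared_se)
```

The between-imputation term is what carries the uncertainty due to the missing genotypes, so an imputation model that is too restrictive would be expected to show up as a change in this pooled variance.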
Here, we investigate the influence of two models on the type I error and on the GRR estimation. Model 0 assumes that the genetic loci considered have no effect on the disease and that Hardy-Weinberg equilibrium (HWE) holds in the population. Model 1 assumes HWE in the general population, but not in the affected offspring; nor does it assume equality between allele frequencies in affected offspring and in the general population. We simulate data sets with various proportions of missing data and perform analyses with MI and conditional logistic regression (Cordell & Clayton, 2002). We show that when MI is performed under model 0, the type I error rate is well controlled but the GRRs are slightly underestimated; in contrast, when MI is performed under model 1, the GRRs are unbiased but the type I error rate is slightly inflated. This suggests using MI under model 0 to test for association, and MI under model 1 to estimate the GRR.

Analysing Complex Canine Pedigrees
D. Balding, F. Calboli Department of Epidemiology and Public Health Imperial College London, UK
Keywords: pedigree analysis, inbreeding, population structure, dogs, gene mapping
Dogs are of increasing interest as models for human diseases, and many whole-genome canine genetic association studies are beginning to emerge. The choice of breeds for such studies should be informed by knowledge of factors such as inbreeding, genetic diversity and population structure, which are likely to depend on breed-specific selective breeding patterns. I will describe some analyses of the UK Kennel Club registration database. Since electronic records began around 35 years ago (about 8 dog generations), the database includes around 5.7M dogs in 207 breeds. The largest single pedigree, that for the Labrador Retriever, includes 700K dogs. Canine pedigrees are complex, including many inbreeding loops and cross-generational matings. We analysed measures of inbreeding and genetic diversity from the pedigrees.
We also developed some approaches to measuring population structure from the pedigree records: it has long been recognized that inbreeding can be studied either from genetic marker data or from pedigree records, but prior to our study population structure seems only to have been analysed from marker data. The analysis described here will appear shortly in Genetics. Comparison of the power of different methods for the analysis of haplotype-environment interactions when haplotype phase is ambiguous R. Hein, L. Beckmann, J. Chang-Claude Department of Cancer Epidemiology, German Cancer Research Center DKFZ, Heidelberg, Germany Keywords: haplotype-environment interaction, linkage disequilibrium, indirect association Association studies accounting for gene-environment interactions may be useful for detecting genetic effects. Current technology facilitates very dense spacing of genetic markers. The true disease variants may not be genotyped, so that causal genes are searched for by indirect association using genetic markers in linkage disequilibrium (LD) with the true disease variants. Power of single marker analyses may be heavily decreased in indirect association studies. Haplotypes capture LD information from multiple genetic markers and thus could be more powerful to detect associations with the disease. We compared the power of different methods for the analysis of haplotype-environment interactions in simulated case-control data when phase is ambiguous (Lake et al. 2003; Tregouet et al. 2002/2004; Lin et al. 2005). While the method developed by Lake et al. models the risk of disease prospectively, the other methods account for the case-control sampling design by modelling the risk of disease retrospectively. Furthermore, the methods differ in the choice of the reference group for haplotype-specific tests, in the expectation maximization algorithm used to estimate the phase, as well as in the way the haplotype pair distribution is estimated. 
The approaches of Lake et al. and Tregouet et al. assume Hardy-Weinberg equilibrium with respect to haplotype pairs. Individuals were simulated by drawing haplotype pairs and exposure status according to given distributions. Disease status was assigned to individuals by a logit model containing the environmental exposure and one chosen risk variant. After identifying risk haplotype(s) as those that harbour the risk allele, the risk variant was removed from the data for indirect association analysis. In the resulting haplotype population, the risk haplotype(s) were either uniquely distinguishable or identical to the most frequent haplotype. Our results show that the power of all methods to detect main and interaction effects is low when the risk haplotype is identical to the most frequent haplotype. If the risk haplotype(s) can be distinguished from the most frequent haplotype, power for the methods of Lake et al. and Lin et al. is higher than that for the approach of Tregouet et al. The method of Lin et al. yielded the highest power for the detection of haplotype-environment interaction effects.

QTLMAP, a software for QTL detection in outbred populations
H. Gilbert, P. Le Roy, C. Moreno, D. Robelin, J.M. Elsen Animal Genetics Department, INRA, Jouy en Josas, Rennes and Toulouse, France
Key words: QTL detection, software, outbred population, linkage analysis
QTLMAP is software developed for the detection of QTL controlling traits in outbred populations as found in livestock. Such populations comprise a mixture of large full sib and half sib families. The underlying methods are all based on the linkage analysis of phenotypes and marker genotypes (Elsen et al. 1999; Knott et al. 1996; Gilbert & Le Roy, 2007; Moreno et al. 2005). The main feature of QTLMAP is the possibility of varying the genetic model: one or two linked QTL, single trait or multi-trait analysis, non-normal observations including discrete traits and survival data, non-Mendelian inheritance.
It also allows searching for expression QTL. The statistical model may include fixed nuisance effects or covariates, and the polygenic relationships between the animals may be included using an animal model (Henderson, 1973). The test statistic is a likelihood ratio test. The distribution of the quantitative phenotype is modelled as a mixture of sub-distributions corresponding to each QTL genotype, the proportions of which are the dam phase probabilities. The test statistic can be simplified to its first order, corresponding to the regression approach (Haley & Knott, 1992). QTLMAP includes a choice of simulation procedures aimed at calculating the rejection thresholds by describing the distribution of the test statistic under the null hypothesis, and at estimating the power of experimental designs. QTLMAP is available on the web (http://qgp.jouy.inra.fr/). In the future it should incorporate additional options, in particular linkage analysis with high numbers of SNPs, use of linkage disequilibrium information and modelling of epistasis.
References
Knott, S.A., Elsen, J.M. & Haley, C.S. (1996) Methods for multiple marker mapping of quantitative trait loci in half-sib populations. Theor. Appl. Genet. 93, 71–80.
Elsen, J.M., Mangin, B., Goffinet, B., Boichard, D. & Le Roy, P. (1999) Alternative models for QTL detection in livestock. I. General introduction. Genet. Sel. Evol. 31, 213–224.
Moreno, C.R., Elsen, J.M., Le Roy, P. & Ducrocq, V. (2005) Interval mapping methods for detecting QTL affecting survival and time-to-event phenotypes. Genet. Res. Camb. 85, 139–149.
Haley, C.S. & Knott, S.A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69(4), 315–24.
Gilbert, H. & Le Roy, P. (2007) Methods for the detection of multiple linked QTL applied to a mixture of full and half sib families. Genet. Sel. Evol. 39(2), 139–58.
Henderson, C.R.
(1973) Sire evaluation and genetic trends. In: Proceedings of the Animal Breeding and Genetics Symposium in Honour of Dr JL Lush, American Society of Animal Science and Dairy Science association, Champaign, Il. Meta-analysis of linkage studies in celiac disease P. Forabosco1,2, M.C. Babron3, C.M. Lewis1, Celiac Disease Linkage Meta-analysis Consortium. 1Department of Medical and Molecular Genetics, King's College London, UK, 2IGP-CNR, Alghero, Italy, 3INSERM Villejuif, France Coeliac disease (CD) is a complex, inflammatory disorder of the small intestine affecting approximately 1:200 individuals in Western populations. The genetic contribution of the HLA region is limited to approximately 40%, so non-HLA genes must also be involved in the disease etiology, together with gluten and possibly additional environmental factors. Several genome-wide linkage scans have identified a number of putative susceptibility loci for CD, some of which have been replicated in independent samples. Previously, a meta-analysis and a mega-analysis, in which the individual data sets are pooled prior to analysis, were applied to four CD genome scans (Babron et al. 2003), but additional studies are now available. This study aimed to identify the regions showing the most consistent evidence for linkage by applying the genome scan meta-analysis (GSMA) method (Wise et al. 1999) using all genome screens for CD performed to date. This meta-analysis used 8 independent studies including 554 families with more than 1,500 affected individuals. The GSMA is a rank-based analysis, which assesses the strongest evidence for linkage within bins of traditionally 30cM. We performed both unweighted (assuming equal contribution from each study) and weighted analyses (assuming higher contribution from larger studies). 
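The rank-based procedure described above can be sketched as follows. This is a minimal illustration, assuming that each study contributes one maximum linkage score per bin, that higher ranks mean stronger evidence, and that under the null hypothesis the rank of a given bin in each study is uniform over 1..n_bins. The function names, the toy scores and the study weights are hypothetical and are not taken from the consortium's analysis.

```python
import random

def rank_bins(scores):
    """Rank bins within one study: 1 = weakest evidence, n = strongest."""
    order = sorted(range(len(scores)), key=lambda b: scores[b])
    ranks = [0] * len(scores)
    for rank, b in enumerate(order, start=1):
        ranks[b] = rank
    return ranks

def summed_ranks(study_scores, weights=None):
    """Sum (optionally weighted) within-study ranks across studies, per bin."""
    if weights is None:
        weights = [1.0] * len(study_scores)   # unweighted analysis
    totals = [0.0] * len(study_scores[0])
    for scores, w in zip(study_scores, weights):
        for b, rank in enumerate(rank_bins(scores)):
            totals[b] += w * rank
    return totals

def bin_p_values(study_scores, weights=None, n_sim=5000, seed=0):
    """Simulated p-value per bin: chance of a summed rank at least as large."""
    if weights is None:
        weights = [1.0] * len(study_scores)
    rng = random.Random(seed)
    n_bins = len(study_scores[0])
    observed = summed_ranks(study_scores, weights)
    null = [sum(w * rng.randint(1, n_bins) for w in weights)
            for _ in range(n_sim)]
    return [(sum(1 for s in null if s >= obs) + 1) / (n_sim + 1)
            for obs in observed]

# Toy example (hypothetical): two studies, three bins.
toy_scores = [[0.1, 2.5, 0.3], [0.2, 3.0, 0.1]]
unweighted = summed_ranks(toy_scores)
weighted = summed_ranks(toy_scores, weights=[1.0, 2.0])
p_values = bin_p_values(toy_scores)
```

In the toy data the second bin ranks highest in both studies, so it receives the largest summed rank in both the unweighted and the weighted analysis.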
Not surprisingly, the most striking result was obtained at the HLA region (bin 6_2, p-value < 0.00001), with a strong effect also seen in the flanking bins (6_1 and 6_3), most likely due to a carry-over effect from the strong effect of the HLA locus. Outside the HLA region, the most significant finding was obtained at bin 10_6, consistently in the weighted (p-value = 0.0037) and unweighted (p-value = 0.0027) analyses, with suggestive evidence for linkage (after correction for multiple testing). Looking at the contributions of the individual genome-wide linkage studies to the identified region on chromosome 10, it is apparent that this region shows only weak evidence in some of the studies, indicating that GSMA has considerable power to identify regions that would not be apparent from eyeballing linkage genome scan data.
References
Babron, M.C., Nilsson, S., Adamovic, S., Naluai, A.T., Wahlstrom, J., Ascher, H., Ciclitira, P.J., Sollid, L.M., Partanen, J., Greco, L. et al. (2003) Meta and pooled analysis of European coeliac disease data. Eur J Hum Genet 11, 828–34.
Wise, L.H., Lanchbury, J.S. & Lewis, C.M. (1999) Meta-analysis of genome searches. Annals of Human Genetics 63, 263–272.

Markov model developed to calculate the prevalence of quantitative factors
D. Almorza1, J.C. Salerno2, M. Kandus2, R.B. Ronceros3 and O. Sorarrain3 1Departamento de Estadística, Universidad de Cádiz, España, 2Instituto de Genética "Ewald A. Favret", INTA-Castelar, Castelar, Argentina, 3Facultad de Ciencias Agrarias y Forestales, Universidad Nacional de La Plata, Argentina
Key words: balanced lethal system, generations, linkage
The objective of this research was to study the evolution of quantitative factors, regulated by a balanced lethal system in repulsion phase, with different linkage distances between the characters.
The results were obtained by building a biometric model using the mathematical formalism of a discrete absorbing Markov chain in its canonical form. In this case, a change in the linkage distance was considered to determine theoretically the number of generations for which a particular factor is maintained in a population, and to define the frequency response of the associated system. Under normal conditions, starting from K (a balanced lethal system in repulsion phase), the system can stay in state K with probability 1/2 or transition to a nonviable (NV) state with the same probability. If we include the option that the system can stay in state K with probability 1/8, when the system can transition to state E, the probability of remaining in state K increases to 3/8, and the probability of being in state E equals 1/8. However, as it is possible to transition from state E to other states to form the line of K, we multiply the value of reaching state E by 1/8. The probability of reaching the NV state is obtained by calculating half of the sum of the probabilities of going to NV from E, that is 7/16, and dividing the result by 8 to produce 7/128. Considering the option that K can go to E, the following matrix was produced. Starting from state K: (0,0,0,0,1), the probability of future generations is shown in Table 1. Therefore, the probability of eliminating a balanced lethal system is small, with more generations being required for this situation to occur.

Table 1. Probabilities of future generations when the probability of remaining in state K equals 1/8.
State   1a      2a      3a      4a      5a      6a      7a      8a
A       0.0175  0.0950  0.2624  0.4887  0.6950  0.8356  0.9160  0.9580
B       0.0350  0.0814  0.1114  0.1080  0.0812  0.0516  0.0297  0.0162
D       0.0350  0.0814  0.1114  0.1080  0.0812  0.0516  0.0297  0.0162
E       0.0701  0.0904  0.0770  0.0493  0.0253  0.0113  0.0046  0.0018
K       0.8421  0.6515  0.4376  0.2457  0.1169  0.0496  0.0197  0.0076

This mathematical method allows the changing of options, and the evaluation of different situations.
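The kind of calculation behind Table 1 can be sketched in Python as repeated multiplication of a state distribution by a transition matrix. The matrix below is purely hypothetical: the matrix used by the authors is not reproduced in the abstract, so the numbers here will not regenerate Table 1. Only the state order (A, B, D, E, K), the absorbing role of A and the starting vector (0,0,0,0,1) are taken from the text.

```python
STATES = ["A", "B", "D", "E", "K"]

# Hypothetical row-stochastic transition matrix (rows: from, columns: to).
# State A is absorbing; every value is an illustrative assumption.
P = [
    [1.0,  0.0,  0.0,  0.0,   0.0],    # A stays in A
    [0.5,  0.5,  0.0,  0.0,   0.0],    # B
    [0.5,  0.0,  0.5,  0.0,   0.0],    # D
    [0.25, 0.25, 0.25, 0.25,  0.0],    # E
    [0.0,  0.0,  0.0,  0.125, 0.875],  # K
]

def step(dist, matrix):
    """One generation: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def trajectory(start, matrix, generations):
    """State distribution after 1, 2, ..., `generations` steps."""
    out = []
    dist = list(start)
    for _ in range(generations):
        dist = step(dist, matrix)
        out.append(dist)
    return out

start = [0.0, 0.0, 0.0, 0.0, 1.0]   # start in state K, as in the abstract
table = trajectory(start, P, 8)
for name, column in zip(STATES, zip(*table)):
    print(name, " ".join(f"{p:.4f}" for p in column))
```

Changing an entry of the matrix, for example the probability of remaining in K, and re-running the loop corresponds to the "changing of options" mentioned in the abstract.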
The use of Markov chain theory to study evolution over generations is a different approach when compared with classical mathematical methods. Genome-wide prediction of functional gene-gene interactions in case-control association studies Z. Bochdanovits, D. Ruano and P. Heutink Clinical Genetics, VU Medical Center/CNCR, Amsterdam, the Netherlands Keywords: gene-gene interaction, epistasis, linkage disequilibrium, complex disease The contribution of individual risk factors to complex disorders is modest and current strategies aimed at identifying such susceptibility genes in very large case-control studies are still thought to be underpowered. One biologically obvious reason why susceptibility genes would be difficult to identify based on searching for their main effects alone is that in reality they might work together in a non-additive fashion, i.e. the risk is actually conveyed by specific combinations of such factors. Although this possibility is widely acknowledged, a full screen for epistatic effects on a genome-wide scale is unfeasible because statistical power would be seriously compromised by correcting for multiple testing. One possible way to avoid this problem is to ascertain pairs of variants that a priori are more likely to be involved in functional gene-gene interactions and test only these for association with a complex phenotype. Classical population genetic theory predicts that linkage disequilibrium (LD) between interacting loci will emerge if the phenotype affected by the combination of loci is under selection. Although in natural populations other (demographic) factors may also induce LD, here we demonstrate that excess co-occurrence of unlinked polymorphisms (i.e. inter-chromosomal LD) in a random population sample can be used as a proxy for epistatic interactions. In context of a case-control association study, LD among controls can be shown to be a function of the relative risk (RR) of the combined effect of two polymorphisms. 
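As an illustration of screening SNP pairs by LD in controls, the sketch below computes the squared correlation between the genotypes of two SNPs coded as 0, 1 or 2 copies of an allele. The abstract does not state which LD measure was used, so the genotype-based r² and the function name are assumptions made for this example.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Returns 0.0 if either SNP is monomorphic in the sample.
    """
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2)) / n
    v1 = sum((a - m1) ** 2 for a in g1) / n
    v2 = sum((b - m2) ** 2 for b in g2) / n
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov * cov / (v1 * v2)

# Toy control genotypes (hypothetical) for SNPs on different chromosomes.
snp_a = [0, 0, 2, 2]
snp_b = [0, 2, 0, 2]          # varies independently of snp_a
r2_independent = genotype_r2(snp_a, snp_b)
r2_identical = genotype_r2(snp_a, snp_a)
```

Pairs with unexpectedly high values among unlinked SNPs would then be carried forward to the case-control association test, which is the pre-selection idea the abstract describes.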
We show that under an epistatic model of penetrance, but not under an additive model of penetrance, selecting pairs of variants based on LD observed in controls yields good power to include the truly causal pair in the follow-up association study. We applied this approach to a large case-control genome-wide association data set and found several pair-wise associations with a complex phenotype that remain significant after Bonferroni correction. These would not have been found by focusing on main effects alone, because the corresponding main effects are systematically lower than the joint effect of the two loci together.
(Source: Annals of Human Genetics)