Sarah Vahey

Integrating diploid reality and penetrance models into a genetic fine-mapping method.

Summary. Fine-mapping methods are models that estimate the position of a mutant allele that may cause a disease in a group of individuals. MapArg, the work of Larribe et al. (2002, 2003), has not taken penetrance parameters into account until now. This thesis demonstrates the effects of these parameters, namely penetrance and phenocopy, on the performance of MapArg in haploid populations. In addition, two methods that we developed are then incorporated into MapArg with the aim of improving its efficiency in the presence of incomplete penetrance and/or phenocopy.

The results show that phenocopy can have a negative influence on the efficiency of MapArg. Penetrance does not appear to have a major effect on MapArg. The first method developed is a simple model that brings no major improvement over MapArg without adjustment. It does, however, provide a starting point for future development in diploid populations. The second method improves the efficiency of MapArg under certain conditions, in particular when the sample size is large enough. The second method also works very well on the real cystic fibrosis data (Kerem et al., 1989).
Keywords: phenocopy, penetrance, incomplete penetrance, fine mapping.

Fine mapping methods are models that estimate the location of a mutation causing a given disease among a group of individuals. MapArg, the work of Larribe et al. (2002, 2003), has not taken penetrance parameters into account to date. This thesis shows the effect of these parameters, namely penetrance and phenocopy, on the performance of MapArg for haploid populations. In addition, two different methods are developed and incorporated into the MapArg framework with the goal of increasing the efficiency of MapArg in the presence of incomplete penetrance and/or phenocopy. Results show that phenocopy can strongly affect MapArg's efficiency, while penetrance has little effect. The first method developed is a simple model that does not prove much more efficient than MapArg without any adjustment; however, it provides the groundwork for further development once diploid populations are modeled. The second method has been shown to improve the efficiency of MapArg under certain conditions, in particular when the sample size is large. This method also greatly improves the performance of MapArg on the cystic fibrosis data (Kerem et al., 1989).
Keywords: phenocopy, penetrance, incomplete penetrance, fine mapping



Gene mapping of complex diseases is an ongoing area of research in genetics. One of its primary goals is to locate the position of the mutation(s), or causal gene(s), responsible for a given disease. Fine mapping, a branch of gene mapping, starts from previous studies that have ascertained an approximate location for the mutation(s) or causal gene(s) and concentrates on pinpointing the exact location. Larribe et al. (2002) and Larribe (2003) have developed a fine mapping method called MapArg that estimates the position along a chromosome of a mutation responsible for a given disease. Approximating complex biological processes by means of mathematical models is difficult, and hypotheses that are not always realistic are usually necessary to make these models feasible. The research of Larribe, like the fine mapping methods of his contemporaries, is constantly evolving, incorporating models for biological aspects that were not previously accounted for.

One assumption that MapArg has worked with to date is that the disease being studied has complete penetrance and no phenocopy. In biological terms, this means that individuals affected by a certain disease necessarily carry the mutation causing it and, likewise, that individuals not affected by the disease do not carry the mutation. It is well known, however, that complex diseases exhibit incomplete penetrance and phenocopy. Breast cancer is an example: some women suffer from breast cancer without carrying the causal gene (phenocopy), and other women carry the causal gene but do not suffer from breast cancer (incomplete penetrance). These phenomena, collectively known as the penetrance parameters, are currently taken into account, either directly or indirectly and in different ways, by McPeek and Strahs (1999), Morris et al. (2002) and Zöllner and Pritchard (2005).

The goal of this thesis is to study the effect of incomplete penetrance and phenocopy on the performance of MapArg, and also to develop models that can account for these parameters within the MapArg framework. This body of work is composed of three chapters. Chapter 1 introduces the biological notions necessary to understand fine mapping, and presents the mathematical models upon which MapArg and other fine mapping methods are based. Chapter 2 gives a detailed explanation of MapArg along with a review of the fine mapping methods in the literature. Particular attention is given to the way in which other methods have taken incomplete penetrance and phenocopy into account. The third and final chapter is the crux of the thesis: the original work it contains addresses the main goal of the thesis. The effects of the penetrance parameters on MapArg are shown first, followed by two methods that have been developed to take these parameters into account within the MapArg framework. Results on the performance of each method are then presented and discussed. Ideas for future development are also discussed.
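As a compact restatement of the assumption described in this introduction, the penetrance parameters can be written as conditional probabilities. The symbols f, \phi, A (individual is affected) and M (individual carries the mutation) are notation assumed here for illustration only; they are not taken from MapArg or from the cited methods.

```latex
\begin{align*}
f    &= P(A \mid M)       && \text{(penetrance)} \\
\phi &= P(A \mid \bar{M}) && \text{(phenocopy rate)}
\end{align*}
% Complete penetrance and no phenocopy: f = 1 and \phi = 0.
% Incomplete penetrance: f < 1. Presence of phenocopies: \phi > 0.
```

Under this notation, the assumption MapArg has worked with corresponds to f = 1 and \phi = 0, so that disease status and carrier status coincide exactly.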


Incomplete penetrance and phenocopy are two important phenomena that occur in populations with complex diseases. Because MapArg has assumed complete penetrance and no phenocopy to date, it was important to determine what effects, if any, these parameters have on the efficiency of this fine mapping method in finding the trait-influencing mutation (TIM). We have shown by way of simulation that incomplete penetrance does not appear to affect the performance of MapArg, whereas phenocopies in the sample render the method quite inefficient, even at fairly low levels of phenocopy. The need to account for these parameters, especially phenocopy, within the MapArg framework has become evident.
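To illustrate how the two parameters enter a sample, the following minimal sketch assigns disease status to simulated haploid individuals and measures how many cases are phenocopies and how many controls are carriers. It does not reproduce MapArg or the simulation design used in this thesis; the function names, the carrier frequency, and all parameter values are hypothetical choices made for the example.

```python
import random


def simulate_sample(n, carrier_freq, penetrance, phenocopy_rate, seed=0):
    """Simulate n haploid individuals as (carrier, affected) pairs.

    All parameters are illustrative assumptions:
    carrier_freq   = P(individual carries the mutation)
    penetrance     = P(affected | carrier)
    phenocopy_rate = P(affected | non-carrier)
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        p_affected = penetrance if carrier else phenocopy_rate
        affected = rng.random() < p_affected
        sample.append((carrier, affected))
    return sample


def misclassification(sample):
    """Return (fraction of cases that are non-carriers,
    fraction of controls that are carriers)."""
    cases = [carrier for carrier, affected in sample if affected]
    controls = [carrier for carrier, affected in sample if not affected]
    phenocopies = sum(1 for c in cases if not c) / len(cases) if cases else 0.0
    hidden_carriers = sum(1 for c in controls if c) / len(controls) if controls else 0.0
    return phenocopies, hidden_carriers


def expected_phenocopy_fraction(carrier_freq, penetrance, phenocopy_rate):
    """P(non-carrier | affected), obtained from Bayes' rule."""
    non_carrier_cases = phenocopy_rate * (1 - carrier_freq)
    carrier_cases = penetrance * carrier_freq
    return non_carrier_cases / (carrier_cases + non_carrier_cases)
```

For instance, with an assumed carrier frequency of 0.05, complete penetrance, and a phenocopy rate of 0.05, `expected_phenocopy_fraction(0.05, 1.0, 0.05)` equals 0.0475 / 0.0975, so nearly half of the cases would not carry the mutation. This is only a back-of-the-envelope view of how a low phenocopy rate can contaminate the case sample when the mutation is rare; the simulations reported in this thesis remain the evidence for the effect on MapArg.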

Assuming that the levels of penetrance and phenocopy are known a priori, two methods were developed to correct for the penetrance parameters. The first method, a rather straightforward approach, proved ineffective in most situations and improved efficiency in only a few circumstances. However, this method provides a starting point for the development of a model that works for diploid populations. To date, MapArg has worked only on haploid data, but it is of interest to extend the method to diploid data.

Incorporating the second method showed some improvement in the performance of MapArg. The most marked improvement is seen when the sample size is increased. The second method also seems to work extremely well on the cystic fibrosis data, which are known to contain phenocopies resulting from multiple mutations. This result is very encouraging, as it shows that the method can work well on "real" data as well as on simulated data, where conditions are sometimes more ideal than in reality.

Further discussion is given of how other methods of accounting for the penetrance parameters might be adapted to the MapArg framework. Much research clearly remains to be done in this area, and it seems worthwhile to concentrate more on modeling phenocopy than penetrance, since phenocopy is the parameter that has the greatest effect on MapArg.