DNA Ditties

What can you learn from your DNA matches? (Part 2)

Let’s discuss how to use DNA matches to build out your family tree and to explore non-paternal events or adoptions. Why are these two uses grouped together? Because both involve the investigation of DNA matches that initially seem mysterious because they don’t include names of ancestors that are familiar.

Now all of us have DNA matches that are mysterious, so where do we start? It makes sense to initially investigate the matches sharing the largest amounts of DNA with us. As we discussed previously, these will be our closer DNA cousins, so should have a more recent common ancestor. If match person X has an attached family tree, scrutinize it for familiar names and places. Be alert for problems like listing a married woman with her husband’s surname rather than her maiden name, which will defeat the automated algorithms that look to match surnames between trees. Then look at the list of people that match both you and person X. Are there any familiar names or places in their family trees? If so, that gives clues as to next steps. If there are no familiar names or places, are there names or places in common among these shared matches? If so, that will focus your research. For example, among my own DNA matches I found a shared match group of people who were all connected to Minnesota. Further investigation showed that they all descended from the Nash family that emigrated from England to Lac qui Parle, Minnesota in the mid-1800s. The Nash name was not in my family tree, but the small village in Berkshire from which they emigrated was the same one where my mother’s ancestors had lived in the late 1700s.
Connecting these people to your own tree will be made easier if you have built out your own tree laterally as well as vertically. By this I mean including siblings of your direct ancestors and the descendants of those siblings. Conversely, you may need to improve and extend the trees attached to your shared matches to find an overlap with your own. Be aware that people make mistakes when building their trees, so check that the information provided really supports the conclusions they have drawn. When investigating a person who appears in the tree of a match, remember that you can find other trees that include this person and check if those trees include additional, or different, information.
Make use of the tools provided by the genealogical companies. They all show shared matches between you and person X. Ancestry’s Sideview tool will identify if the match is on your maternal or paternal side. MyHeritage will show the extent of a match, not only between you and person X, but between person X and person Y, who also matches you. Why is this helpful? For example, you find person X has a good match to you, but no tree or a small tree. However, among the many shared matches, person Y is a close cousin of person X but has a more extensive tree. You can then focus on person Y’s tree to identify the ancestors shared with person X and then work backwards to an intersection with your own tree. MyHeritage has the Autocluster tool that will group your matches based on shared DNA relationships, and will also show connections between groups.
Hopefully these efforts will connect person X to your tree. But what if you are unsuccessful? Does that mean there is a non-paternal event in your ancestry or just that there’s not enough information available to make a connection? There is no clear answer. The bigger the DNA match and the more extensive your tree is, the more likely that person X signifies a non-paternal event. Perhaps you already know that you are looking for such an event. Or perhaps there are other clues. The more that you are able to connect DNA cousins to your tree, the more that you will be able to identify any lines that lack DNA cousins. That could signify that you have a mistake in your tree, or that a parental relationship was not a biological one (i.e. a non-paternal event occurred). In any case, you will have to establish that a common ancestor of person X, and the shared matches between you and person X, intersected in time and space with one of your known ancestors, using any scraps of information that you can glean.
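
The idea of grouping shared matches, described above, can be illustrated with a small Python sketch. This is only a simplified illustration and is not how MyHeritage's Autocluster tool (or any company's tool) actually works: it simply puts two matches in the same group whenever they are linked by a chain of shared-match relationships. The function name `cluster_matches` and all the names in the example data are made up for illustration.

```python
from collections import defaultdict


def cluster_matches(shared_pairs):
    """Group matches so that any two matches linked by a chain of
    shared-match relationships end up in the same cluster."""
    neighbours = defaultdict(set)
    for a, b in shared_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Walk outwards from this match to collect everyone connected to it.
        stack, group = [start], set()
        while stack:
            person = stack.pop()
            if person in group:
                continue
            group.add(person)
            stack.extend(neighbours[person] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


# Hypothetical data: each pair is two of your matches who also match each other.
shared_pairs = [
    ("Person X", "Person Y"),
    ("Person Y", "Nash cousin"),
    ("Person P", "Person Q"),
]
clusters = cluster_matches(shared_pairs)
for group in clusters:
    print(group)
```

In this made-up example, person X has no tree, but lands in the same group as person Y and a cousin with a known surname, which suggests whose trees to study first.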

What can you learn from your DNA matches? (Part 1)

There are many uses for DNA matches and many ways to use DNA information. Let’s discuss some basics.
1. Discover and connect with known or unknown DNA cousins. If you recognize someone’s name from your DNA match list but you never knew them or lost contact, you immediately have a way to contact them. You may not recognize the name of a close match, perhaps because they are using an alias or their married name, but when you look at their family tree you immediately know who they are. But perhaps you have a reasonably close match with no public tree or very limited tree information. You can contact them to ask for more information, or try to build their tree backwards in time based on what little information they provide.
2. Use DNA matches to validate (or invalidate) your own tree. The fact that you and individual X have a reasonable DNA match means that you both descend from a common ancestor (or ancestral pair). If you can identify that common ancestor from examining your tree and their tree, you have validated the research you did to identify that ancestor. The more DNA cousins you can find that descend from the same ancestor, the more certain you can be of the validity of the ancestor. Even if the trees of you or your DNA cousin don’t go back far enough in time, Ancestry.com and MyHeritage.com have tools (ThruLines and the Theory of Family Relativity, respectively) that suggest how you and individual X might be related, using information from any trees that are on their sites. Some of these suggestions may be incorrect, but you are able to examine all the evidence to decide if you believe the proposed relationship. Note that, for these tools to work for you, you must have your DNA test linked to your family tree on that site (https://dna-explained.com/2022/09/21/connect-your-dna-test-and-others-to-your-tree). Conversely, if you have an extensive family tree, well populated with siblings of your direct ancestors, and you are not finding any DNA cousins on a particular line, it may be time to worry that you have a mistake on that branch of the tree or there is a previously unknown adoption or non-paternal event.
3. Use DNA matches to build out your family tree. 
4. Use DNA matches to explore non-paternal events or adoptions.
These last two uses involve similar problem solving techniques and we will discuss them next time.

Caveats and complications of using shared centimorgans to predict relationships: Pedigree Collapse and Endogamy

Now that we have discussed the connection between the number of centimorgans (cM) of DNA shared with a match and the relationship with that match, we need to complicate the discussion with some situations that can change that relationship. The good news is that these situations don’t apply to most people. The bad news is that if they do apply to you they can complicate your genetic genealogy. The simplest situation to understand is when you have individuals in your family tree who appear at more than one position in that tree. How can that happen? Consider what happens when two first cousins marry – as occurred fairly frequently in the not too distant past.
You can see that John Smith and Mary Johnson each appear twice as great grandparents of “Tester” Smith. Normally a person would inherit about 12.5% of their DNA from a great grandparent, but because John Smith and Mary Johnson are “double” great grandparents, Tester Smith will have about 25% of each of their DNAs. Or, to put it another way, John Smith and Mary Johnson will look like grandparents to Tester, rather than great grandparents, based on the amount of DNA they share with Tester. Similarly, when Tester matches with any descendants of John Smith and Mary Johnson, they will appear to be closer relatives than they really are. 
In this example the effects of pedigree collapse are dramatic. If the duplication of positions in Tester’s family tree happened in more distant generations the effect would be lessened but still apparent. The term pedigree collapse is used to describe isolated instances of family intermarriage that happened within the past 200-300 years.
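
The arithmetic behind the Tester Smith example can be written as a short Python sketch. It assumes the usual rule of thumb that each generation halves the expected share of an ancestor's DNA, and that the shares from separate lines of descent simply add up. These are expected averages, not exact amounts, and the function name `expected_share` is made up for illustration.

```python
def expected_share(generations_per_path):
    """Expected fraction of DNA inherited from one ancestor.

    Each entry is the number of generations along one line of descent
    from that ancestor to the tester (1 = parent, 2 = grandparent,
    3 = great grandparent). Each line contributes (1/2) ** generations.
    """
    return sum(0.5 ** g for g in generations_per_path)


normal_great_grandparent = expected_share([3])      # appears once in the tree
double_great_grandparent = expected_share([3, 3])   # appears twice in the tree
grandparent = expected_share([2])

print(f"Great grandparent:          {normal_great_grandparent:.1%}")
print(f"'Double' great grandparent: {double_great_grandparent:.1%}")
print(f"Grandparent:                {grandparent:.1%}")
```

The "double" great grandparent comes out at the same expected share as a grandparent, which is why descendants of John Smith and Mary Johnson look like closer relatives than they really are.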
Endogamy is essentially repeated pedigree collapse that has happened over many generations, even further back in time. How does this occur? Any community that is isolated, either geographically or socially, forcing repeated intermarriage within a relatively small group of people, will generate endogamy. Examples include island communities, or those living in a rural area at a time when transportation was limited, or immigrants who didn’t mix with the locals (the Amish), or perhaps the best studied group, Ashkenazi Jews in Europe. The Ashkenazi Jews had a centuries-long history of persecution in Europe, which, combined with a strong cultural identity, led to continued intermarriage within the Jewish community. Genetic studies have shown that the number of Ashkenazi Jews who first migrated to Europe was probably small. So repeated intermarriage among a small gene pool has led to significant endogamy in today’s people of Ashkenazi descent.
What are the consequences for your DNA matches if you are of Ashkenazi descent (or any other endogamous group)? Firstly you will have a lot more DNA cousins on your match list than people who are of non-endogamous descent. Because the DNA testing companies predict relationships solely from the total number of shared cM, you will find an enormous number of 3rd-5th cousins who are not your 3rd-5th cousins but much more distantly related. Your shared DNA has come through multiple paths much further back in time. A clue that this is happening is that the total number of cM comes from sharing many small segments of DNA (endogamy) rather than a few larger segments (non-endogamous relatives). So to prioritize which DNA cousins might actually be “real” relatives, look for those who have a few larger segments of shared DNA – perhaps over 20 cM.
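
The suggestion to prioritize matches with a few larger segments can be sketched in Python. This is only an illustration: the match names and segment sizes are invented, the function name `prioritize` is made up, and the 20 cM cutoff is the rough figure mentioned above rather than a firm rule.

```python
def prioritize(matches, min_largest_cm=20.0):
    """Return the names of matches whose largest shared segment is at
    least min_largest_cm, ordered from the largest segment down."""
    keep = [
        (max(segments), name)
        for name, segments in matches.items()
        if segments and max(segments) >= min_largest_cm
    ]
    return [name for largest, name in sorted(keep, reverse=True)]


# Invented data: segment sizes in cM for three matches.
matches = {
    # Many small segments, as might be seen with endogamy.
    "Match A": [8.1, 7.4, 9.0, 7.7, 8.3, 7.9, 9.5, 8.8, 7.2, 8.6],
    # A few larger segments.
    "Match B": [46.0, 31.5],
    "Match C": [22.4, 9.1, 7.8],
}
likely_recent = prioritize(matches)
print(likely_recent)
```

Match A and Match B share a similar total number of cM in this invented data, but only Match B (and Match C) would be kept, because Match A's total is made up entirely of small segments.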

How we use centimorgans in practice

Now that we understand the concept of centimorgans (cM) of DNA shared with a match (see the last DNA Ditty), we can move on to discuss how this information can be used in our genealogy research. Unfortunately, although there is a general concept that more cM shared means a closer family relationship, it is not possible to predict an exact relationship from a given number of cM matched. However, experts have gathered together a large body of data regarding the number of cM shared by known relatives of all kinds and compiled it for us. A chart summarizing this information can be found on the ISOGG website (https://isogg.org/wiki/Autosomal_DNA_statistics).
A useful tool, derived from this data, can be found at DNA Painter (https://dnapainter.com/tools/sharedcm). This allows you to enter the number of cM shared between you and a “DNA cousin” and see which relationships are theoretically possible for the two of you. An even more sophisticated version of the tool (https://dnapainter.com/tools/sharedcmv4) will calculate the probabilities of different relationships for a given cM value.
The closest relationships, parents/children and siblings, share in the range of 1000 to over 3000 cM. When the number of shared cM is large, the number of possible relationships is relatively low, but conversely when the number of shared cM is below 100 there are many possible relationships. The DNA testing companies describe each match on your list with labels such as 2nd cousin, 2nd-3rd cousin etc. These are not to be taken literally, but just as rough guidance.
The final point to make here is that it is possible for relatives as close as 3rd cousins to share no DNA. So a fraction of your 3rd, 4th, 5th etc cousins won’t ever show up on your match list. But conversely, you may get lucky and have matches with some of your 5th, 6th or even 7th cousins.

Centimorgans - How we measure genetic distance and relatedness

In previous DNA Ditties we talked about SNPs and recombination between parental chromosomes during meiosis. How are we going to quantify the degree of relatedness between the DNAs of two individuals? Should we count up the total number of SNPs that are shared? No, this won’t work, because those SNPs might be scattered all over the genome and might not reflect recent common ancestry. Should we look at the presence of long stretches of identical SNPs? This is much better, but SNPs are not evenly distributed throughout the chromosomes, so one short length of DNA might contain more SNPs than a longer segment. What we need is a way to measure genetic distances. How likely is it that two linked (nearby) genetic markers (SNPs) will become separated by recombination during a single meiosis? Intuitively we would say that the further apart they are on the chromosome the more likely they are to be separated. However, recombination frequencies are not the same in all areas of all chromosomes. So linear distances on the chromosomes are not the same as genetic distances. Having measured recombination frequencies in all regions of human chromosomes, scientists can define a unit of genetic distance, the centimorgan (cM), as “equal to a 1% chance that a marker (SNP) at one genetic locus on a chromosome will be separated from a marker at a second locus due to crossing over (recombination) in a single meiosis (generation)”. So the sharing of one or more substantial DNA segments with identical SNPs between two individuals indicates a relatively recent common ancestor, and in general the more cM shared the more recent the ancestor. Now we need to define what we mean by substantial. A shared segment with a small number of cM could be just the result of chance identities. Usually we only count segments greater than 5-7 cM in genetic length when we are totaling genetic relatedness. The smaller segments might be inherited from a recent common ancestor, but they could also match by chance.
Similarly, a few large segments of identity are a more convincing measure of a recent common ancestor than a large number of small segments of identity, even if the total cM shared are similar.
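
The idea of only counting substantial segments can be shown with a few lines of Python. This is a sketch under stated assumptions: the segment sizes are invented, the function name `total_shared_cm` is made up, and the default cutoff of 7 cM is just one choice from the 5-7 cM range mentioned above. Each testing company applies its own rules.

```python
def total_shared_cm(segments_cm, min_segment_cm=7.0):
    """Add up shared segments, ignoring any segment shorter than
    min_segment_cm. Returns (total cM counted, number of segments counted)."""
    counted = [cm for cm in segments_cm if cm >= min_segment_cm]
    return sum(counted), len(counted)


# Invented segment sizes (in cM) shared between two people.
segments = [34.5, 12.25, 6.0, 3.5, 2.0]

total, count = total_shared_cm(segments)
print(f"Counting segments of 7 cM or more: {total} cM in {count} segments")

total_5, count_5 = total_shared_cm(segments, min_segment_cm=5.0)
print(f"Counting segments of 5 cM or more: {total_5} cM in {count_5} segments")
```

Notice that the choice of cutoff changes the total, which is one reason the same pair of people can show slightly different shared cM at different companies.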

DNA testing is a SNIP

Actually that’s SNP (Single Nucleotide Polymorphism), and SNPs are the basis of how the DNA testing companies determine our ethnic mix and how we are related to each other. There are over 6 billion nucleotide pairs (base pairs) in the human genome, counting both the copy we inherit from our mother and the copy from our father (nucleotides are A, G, C, or T). The vast majority of these pairs don’t vary between most humans. However, there are positions within the genome that do vary more frequently. The testing companies have collected somewhere between half a million and a million of the most commonly variable (polymorphic) positions and developed technology to test whether a DNA sample has an A, G, C or T at that position. These variable positions (SNPs) are spread fairly evenly along all the chromosomes, so that they act as genetic markers or landmarks. Each of us will have a unique set of SNPs that, together, define our genotype. Essentially all of these SNPs will be inherited from our parents, but the recombination that occurs between paternal and maternal chromosomes during meiosis means that some of our chromosomes won’t look exactly like those of our parents (see the DNA Ditty about “Why do siblings have different DNA?”). This recombination process occurs with every generation, so the pattern of SNPs along the chromosomes gets scrambled as we have children and they have children and so on. Therefore, the longer the stretches of DNA where all the SNPs are identical between two individuals, the more closely related they are. That simple idea allows us to estimate and map relationships between people and track their origins over time.
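
The "longest stretch of identical SNPs" idea can be illustrated with a toy Python example. This is a deliberate simplification and not how the testing companies actually compare samples: it assumes just one letter per SNP position for each person (real tests read two, one from each parent), ignores testing errors, and measures length in number of SNPs rather than cM. The function name and the two short sequences are invented.

```python
def longest_identical_run(snps_a, snps_b):
    """Return (start_index, length) of the longest stretch of consecutive
    SNP positions at which the two individuals have the same letter."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (a, b) in enumerate(zip(snps_a, snps_b)):
        if a == b:
            if run_len == 0:
                run_start = i  # a new identical stretch begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a mismatch ends the current stretch
    return best_start, best_len


# Invented SNP readings at twelve positions for two people.
person_1 = "AGCTTAGGCATC"
person_2 = "AGTTTAGGCTTC"

start, length = longest_identical_run(person_1, person_2)
print(f"Longest identical stretch starts at position {start} and covers {length} SNPs")
```

With real data the comparison runs over hundreds of thousands of positions, and the longer the identical stretches, the closer the likely relationship.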

These charts can be found at The Genetic Genealogist blog.