This post is maybe a little geekier than normal, but hopefully those readers who have been following the Y-DNA articles will find this of interest.
Surname projects on Family Tree DNA exist in large part to help members answer genealogical questions about their surname. The main tools available are Y-DNA tests such as STR tests (Y-12, Y-25, Y-37, Y-67, and Y-111) and SNP tests (Big Y-700). One challenge for surname project administrators is to recommend testing options for members that will help answer their questions at the least cost. In general, the more STR markers tested, the better the information gained, but the cost for testing goes up as the number of markers increases. Some projects have identified markers and values that seem to uniquely define the haplotype for their surname and thus can recommend testing levels that have a high probability of identifying members who are likely to be related to others who share their surname.
The goal of this analysis is to determine if there are any Unique STR Patterns (USPs) [1] that help to identify Ackley men. The approach will be to compare the ancestral values for Ackley men to ancestral values for upstream haplogroups, specifically R-M269, R-L21, and R-S1051, to determine whether any of the values are noticeably more prevalent in Ackley men than in those larger groups.
Y-DNA STR Testing and Convergence
Before taking an in-depth look at individual markers, a short discussion about the various levels of testing is in order. As mentioned above, STR testing at 12, 25, 37, 67, and 111 markers has been available at various times from Family Tree DNA. Currently only 37-, 67-, and 111-marker tests can be purchased. Although 12- and 25-marker tests are no longer available, many men tested at those levels in the past, so tools exist to look at matches for all levels. The reason for discontinuing the lower-marker tests is that in many cases they just don’t provide enough information to draw useful genealogical conclusions. For example, there is a non-Ackley man in the Ackley Surname Project who has nearly 4,000 Y-12 matches. Not one of those matches has his surname, there are nearly 3,000 different surnames among his matches, and the most frequently occurring surname occurs only 52 times. Moving up to 25 markers, this man has almost 1,600 matches with almost 1,000 surnames, none of which are his surname. This information is decidedly unhelpful in answering genealogical questions.
One of the reasons for this issue is a phenomenon called convergence, which is the idea that “over long periods of time a series of markers can through random change adopt patterns that make them look more closely related than they actually are.” [2] Fortunately for members of the Ackley Surname Project, this does not appear to be an issue. Of the 15 men in the project, only one has more than three non-Ackley matches at the 12-marker level. That one man has 55 non-Ackley matches, and he differs from the other 14 men in that he is the only one in the project who has a value of 30 for marker DYS389ii, while all other men in the project have a value of 31. At 25 markers, all of the 55 “excess” matches drop off his match list, and for the other 14 men one of the three “excess” matches drops off. The two remaining “excess” matches show up on some of the match lists all the way up to and including 111 markers, suggesting that these two men might have an NPE (non-paternal event) somewhere in their ancestry and are in fact closely related to the Ackley men. Otherwise, all of the matches for every member at 37, 67, and 111 markers have the Ackley surname. This situation seems to indicate that the Ackley haplotype is unique enough even at 12 markers to be useful in answering Ackley genealogical questions, and a Y-37 test is likely to be sufficient to determine if an individual is related to Ackley men in the project.
The Ackley Haplotype
The following are the STR values for the Ackley men who have taken Y-37 (5 men), Y-67 (2 men), or Y-111 (8 men) DNA tests. For ease of presentation, the values are split up by panel. Given the numbers of men who have tested at each level of markers, the number of samples for each panel is:
Panel 1 – 15
Panel 2 – 15
Panel 3 – 15
Panel 4 – 10
Panel 5 – 8
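These panel counts follow directly from the test levels, since every tester contributes values to each panel at or below the level he tested. A minimal sketch of that bookkeeping is shown here; the variable and function names are illustrative only and are not part of any FTDNA tool.

```python
# Number of Ackley testers at each test level (from the text above).
testers_by_level = {37: 5, 67: 2, 111: 8}

# Last marker number covered by each panel (from the panel list in this post).
panel_last_marker = {1: 12, 2: 25, 3: 37, 4: 67, 5: 111}


def samples_per_panel(testers, panels):
    """A tester contributes to a panel if his test reaches the panel's last marker."""
    return {
        panel: sum(n for level, n in testers.items() if level >= last)
        for panel, last in panels.items()
    }


panel_counts = samples_per_panel(testers_by_level, panel_last_marker)
print(panel_counts)  # {1: 15, 2: 15, 3: 15, 4: 10, 5: 8}
```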
Panel 1 - Markers 1-12
Panel 2 - Markers 13-25
Panel 3 - Markers 26-37
Panel 4 - Markers 38-52
Panel 4 (cont) - Markers 53-67
Panel 5 - Markers 68-82
Panel 5 (cont) - Markers 83-97
Panel 5 (cont) - Markers 98-111
The mode of a set of data is the most frequently occurring
value. For STR markers it is assumed that the mode is the “original” value,
i.e., the value of that marker for the common ancestor of the testers in a
particular group. Modal values are also referred to as ancestral values. The
values do not mean much by themselves but take on meaning when compared to
other haplotypes for analysis purposes.
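As a small illustration of how a modal value is found, the sketch below uses the DYS389ii values described earlier in this post (14 men at 31 and one man at 30). The function name is my own, and ties are not handled specially: if two values were equally common, this would simply return whichever was seen first.

```python
from collections import Counter


def modal_value(values):
    """Return the most frequent value and the number of testers who carry it."""
    value, count = Counter(values).most_common(1)[0]
    return value, count


# DYS389ii in the Ackley project: 14 men have 31, one man has 30.
dys389ii = [31] * 14 + [30]

mode, count = modal_value(dys389ii)
print(mode, count)  # 31 14
```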
As mentioned above, Ackley data will be compared to data
from upstream haplogroups. R-M269 data (also known as R1b) was captured from
the “R_R1b ALL Subclades” haplogroup project at Family Tree DNA (FTDNA) [3], R-L21
data was captured from the “R L21 and Subclades” haplogroup project at FTDNA [4],
and R-S1051 data was captured from the “R-S1051” haplogroup project at FTDNA [5].
The relationships of these haplogroups are shown in the table below. Note that
R-M269 is the oldest haplogroup, having formed about 13,000 years ago. R-L21, a
subclade of R-M269, formed about 4,500 years ago, while R-S1051 formed about
3,900 years ago. Six of the 15 Ackley men who have done STR tests have also done
SNP tests that place them in the last two haplogroups in the table, R-FGC52286
and R-FGC52300.
SNP Formation Dates [6]
As with the Ackley surname project, the men who have joined
each of the haplogroup projects mentioned above have tested at
different levels of markers, from Y-12 through Y-111. As a result, the number
of data points available for each marker can differ depending on how many
men in the project tested at each level. There were 25,631 members of
the R1b project, 9,213 members of the R-L21 project, and 264 members of the
R-S1051 project. It should be pointed out that there is some overlap between
the projects – since there is a hierarchical relationship between the
haplogroups there are men who joined more than one project. This was accounted
for in the analysis, and only unique data points were used in the calculations
that will be presented later.
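One simple way to keep only unique data points is to key each record on the tester's kit number. This is a sketch of that idea, not necessarily the exact procedure used for this post; the kit numbers, the record layout, and the function name are all hypothetical.

```python
def unique_records(*projects):
    """Merge project downloads, keeping one record per kit number.

    Each project is a list of (kit_number, {marker: value}) pairs.
    If a kit appears in several projects, the first copy seen is kept.
    """
    merged = {}
    for project in projects:
        for kit, markers in project:
            merged.setdefault(kit, markers)
    return merged


# Hypothetical kits: "B2" joined both R1b and R-L21, "C3" joined R-L21 and R-S1051.
r1b = [("A1", {"DYS393": 13}), ("B2", {"DYS393": 13})]
r_l21 = [("B2", {"DYS393": 13}), ("C3", {"DYS393": 14})]
r_s1051 = [("C3", {"DYS393": 14})]

records = unique_records(r1b, r_l21, r_s1051)
print(len(records))  # 3 unique testers out of 5 project memberships
```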
A Closer Look At Some of the Markers
The table below gives examples of the data collected for
each of the markers 1-111. The main part of the table gives the frequency of
occurrence for the values for each marker for each haplogroup project. The
ancestral value (mode) for each marker for each project is highlighted in
green. The bottom part of each table gives the min, max, and mode for each
marker for the Ackley men who have been tested for that marker. The numbers in
the mode row give the difference between the Ackley ancestral value and the
haplogroup ancestral values; 0 means they match, a positive number means the
Ackley ancestral value is above the haplogroup ancestral value, and a negative
number means the Ackley ancestral value is below the haplogroup value.
In these examples, the ancestral values for the haplogroup
projects are well-defined, i.e., they are clearly the most frequently occurring
value for that marker, and other possible values are much less frequent. This
is not always the case, and some specific examples will be discussed later. For
the first marker, DYS393, MIN=MAX=MODE, indicating that all 15 Ackley project
members had the value 13. The Ackley ancestral value of 13 matches each of the
haplogroup project ancestral values. This situation holds true for 81 of the
111 markers; these markers will not be analyzed further as they would not
contribute to the stated goal of identifying USPs for Ackley men.
Marker DYS390 is an example where the Ackley ancestral value
of 25 is 1 greater than each of the haplogroup ancestral values, which are
all 24. This situation, where the Ackley ancestral value differs from
the haplogroup ancestral values (which are equal to each other), is considered
a good candidate for further analysis. Of the 30 markers where the Ackley
ancestral value differed from the haplogroup ancestral values in some way, 17
fit this criterion and will be discussed further below.
The table below shows examples of markers where the Ackley
ancestral values differ from the ancestral values for R1b and R-L21 (the more
distant haplogroups) but are equal to the ancestral value for R-S1051. This
pattern suggests that the mutation in this marker could be a USP for the
R-S1051 haplogroup rather than the Ackley haplotype. As such, these markers
will not be researched further.
There are four markers in the table below with a similar pattern, but the Ackley ancestral values differ only from the R1b ancestral value and are equal to the ancestral values for both R-L21 and R-S1051. This would indicate that the mutation in the marker could be a USP for R-L21, and the R-S1051 group and Ackley both stayed at that value. Likewise, these markers will not be investigated further.
The markers in the next example are interesting in that they show a somewhat unusual mutation pattern. In both cases, the Ackley ancestral value matches the ancestral value for the oldest haplogroup, R1b, and in the case of DYS712 also matches the ancestral value for R-L21, but does not match the more recent haplogroup R-S1051. This could be the result of a back mutation, where the Ackley men mutated from the more recent R-S1051 value back to the original R1b value. It is also worth noting that for CDYa there is not a clear modal value; the frequencies for the values of 36 and 37 are very close in all three haplogroups. Even the Ackley men have some members with a value of 36 and some with a value of 37. For DYS712 the mode is a little clearer, but in this case the Ackley men have several values: two men at 19, five at 20, and one at 21. For these reasons, these markers will not be analyzed further.
The final two markers that have not yet been discussed are
similar to the above example in that the Ackley ancestral values are closer to
the more ancient haplogroups (R1b and R-L21) than they are to the more recent
R-S1051. In both cases, the Ackley values are +1 compared to the values for R1b
and R-L21 and are +2 compared to R-S1051, and the +2 occurred because the
R-S1051 marker mutated in the opposite direction from the Ackley marker. These
markers would not lend themselves to the type of analysis discussed above and
thus will not be included in further analysis.
Analysis
To summarize what has been discussed so far: the Ackley
ancestral values are equal to the ancestral values for all three upstream
haplogroups on 81 of the 111 Y-STR markers, meaning these 81 markers would not
provide any insights into the Ackley haplotype. As discussed above, for various
reasons 13 of the remaining 30 markers would also not be good candidates for
further study. That leaves the 17 markers in the table below that will be the
subject of the remainder of this study.
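The sorting of markers described above can be written as a small decision rule that compares the Ackley modal value with the three haplogroup modal values. The sketch below is only an illustration of that logic: the function name and category labels are mine, the values for DYS393, DYS390, and DYS617 come from the text, and the two "HYPOTHETICAL" markers use made-up values to show the other patterns.

```python
def classify(ackley, r1b, l21, s1051):
    """Sort a marker into one of the patterns discussed in this post."""
    if ackley == r1b == l21 == s1051:
        return "matches all haplogroups"
    if r1b == l21 == s1051:
        return "candidate"  # haplogroups agree, only Ackley differs
    if ackley == s1051 and r1b == l21:
        return "possible R-S1051 pattern"
    if ackley == l21 == s1051:
        return "possible R-L21 pattern"
    return "other"  # back mutations, opposite-direction mutations, etc.


# (Ackley, R1b, R-L21, R-S1051) modal values.
modes = {
    "DYS393": (13, 13, 13, 13),          # from the text
    "DYS390": (25, 24, 24, 24),          # from the text: Ackley is +1
    "DYS617": (11, 12, 12, 12),          # from the text
    "HYPOTHETICAL_A": (15, 14, 14, 15),  # made-up values
    "HYPOTHETICAL_B": (15, 14, 15, 15),  # made-up values
}

for marker, values in modes.items():
    print(marker, "->", classify(*values))

# The three groups described in the summary account for every marker.
assert 81 + 13 + 17 == 111
```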
To reiterate, for all 17 markers in this table, R1b value = R-L21 value = R-S1051 value ≠ Ackley value, so the Ackley value likely represents a true variation from the value found in the rest of the upstream haplogroups. Note that the “Total Records” column gives the number of unique records having values for that particular marker; any tester who was a member of multiple projects was only counted once. The figures in the “No. at Ackley Value” column give the numbers of testers for which the value of each marker is equal to the Ackley ancestral value. Likewise, the figures in the “No. at Ancestral Value” column give the number of records for which the value of each marker is equal to the more ancient haplogroup values (which are equal to one another).
The approach for this analysis is to look for marker values
or combinations of marker values that are present in Ackley men but appear to
be relatively rare in the larger R1b population. An obvious example of this is
marker DYS617. From the table above we can see that the Ackley value for that
marker is 11, which is present in only 187 (0.86%) of R1b men, while over 91% of
R1b men have a value of 12 for that marker. Another way to look at this is to
observe that while Ackley men make up only about 0.05% of the R1b men who have tested
to at least 67 markers (10/21647), they make up over 5% of the men who have a
value of 11 for marker DYS617 (10/187). The practical implication of this
situation is that in cases where a Y-37 test is not conclusive in determining
whether a tester is related to the Ackley men but is “close”, a Y-67 test can
be recommended to see if the tester has a value of 11 for DYS617. None of the
other markers appears to be unique enough on its own, but in combination with
other markers they could provide some insights.
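The DYS617 figures can be reproduced with a few lines of arithmetic. The counts below are the ones quoted in this post; the variable names and the "enrichment" ratio (how over-represented Ackley men are among testers with a value of 11) are my own way of summarizing them.

```python
total_y67 = 21_647   # R1b records with a value for DYS617
at_value_11 = 187    # records with DYS617 = 11
ackley_y67 = 10      # Ackley men tested to at least 67 markers, all at 11

share_with_11 = at_value_11 / total_y67
ackley_share_overall = ackley_y67 / total_y67
ackley_share_of_11 = ackley_y67 / at_value_11
enrichment = ackley_share_of_11 / ackley_share_overall

print(f"{share_with_11:.2%}")         # 0.86%
print(f"{ackley_share_overall:.3%}")  # 0.046%
print(f"{ackley_share_of_11:.1%}")    # 5.3%
print(f"{enrichment:.0f}x")           # 116x
```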
The table below shows the number of testers whose values
were equal to the Ackley values for all possible pairs of the 17 markers being
studied. Each cell in the table gives the number of testers who had the Ackley
values for both the marker in its row and the marker in its column.
For example, 384 testers had a value of 25 for DYS390 (the row) and 15 for
DYS19a (the column).
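A pairwise table like this can be built by checking every pair of markers against every record. The sketch below shows the idea on a handful of invented records; only the Ackley values for DYS390 and DYS617 come from this post, while "MARKER_X", its value, and all four records are hypothetical. With 17 markers there are 136 distinct pairs to count.

```python
from itertools import combinations
from math import comb


def pair_counts(records, ackley_values):
    """Count records that match the Ackley value on both markers of each pair."""
    counts = {}
    for a, b in combinations(ackley_values, 2):
        counts[(a, b)] = sum(
            1
            for rec in records
            if rec.get(a) == ackley_values[a] and rec.get(b) == ackley_values[b]
        )
    return counts


ackley_values = {"DYS390": 25, "DYS617": 11, "MARKER_X": 15}

# Hypothetical tester records; a missing marker means it was not tested.
records = [
    {"DYS390": 25, "DYS617": 11, "MARKER_X": 15},
    {"DYS390": 25, "DYS617": 12, "MARKER_X": 15},
    {"DYS390": 24, "DYS617": 11},
    {"DYS390": 24, "DYS617": 12, "MARKER_X": 14},
]

counts = pair_counts(records, ackley_values)
for pair, n in counts.items():
    print(pair, n)

print(comb(17, 2))  # 136 pairs for the 17 markers in the study
```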
Note that these numbers are typically much smaller than the numbers for single markers presented in the previous table, implying that combinations of these marker values are much less common and could provide insights into the Ackley haplotype. Of particular interest are the values for DYS617; the combination of a value of 11 for this marker with the Ackley value for any of the other 16 markers in the study looks to be extremely rare when compared to the R1b population. For example, a value of 25 for DYS390 and 11 for DYS617 occurred in only 24 of the 21,647 men who had values for both of those markers, and 10 of those 24 men were Ackleys. As a percentage, only about 0.1% of R1b men had a value of 25 for DYS390 and 11 for DYS617, and about 42% of those were Ackley men. This situation seems to confirm that a value of 11 for DYS617 in combination with the Ackley value for any of the other 16 markers is a good candidate for a USP. Again, the practical implication of this is that an upgrade from Y-37 to Y-67 would be a reasonable suggestion for a tester whose relationship to other Ackley men is “close” but over the FTDNA threshold of 5 for 37 markers (say 6 or 7).
While there is undoubtedly more that could be learned by
further study of the Ackley haplotype, there are two main takeaways from what
has been learned so far: (1) a Y-37 test is probably sufficient to establish
relatedness for most testers, and (2) for testers who are “close”, a Y-67 test,
with particular attention to the DYS617 marker, could provide more information
to draw a more definitive conclusion.
A note of thanks
Thank you to Dave Vance for being kind enough to review this information before I published it. Dave is an expert in using Y-DNA testing for genealogy -- in fact he wrote a book on it -- and I appreciate his willingness to answer my questions and keep me straight.