A TMRCA Estimator Excel spreadsheet



MJost
12-28-2012, 09:56 PM
For my own use, I have modified a version of MikeW's TMRCA Estimator Excel spreadsheet to use Ken N's Gen111t Engine (which I have also modified) along with Excel 2003-or-newer built-in functions such as VAR, improving the precision. I am making it available to individual users with no warranty.

My TMRCA spreadsheet V2.2 is available at this link:

https://docs.google.com/file/d/0By9Y3jb2fORNSU9Lbi1rYXl4NUU/edit


This spreadsheet is in a zipped file to keep the file size small. Click Download under the File menu, then save and unzip it to your desktop or document storage. I have added some new Dashboard features. The whole-population calculation matches Ken's original Coalescence (whole (n) population) variance results. I added sample-population (n-1) Coalescence and InterClade Coalescence (n-1) Age. Filtering is now integrated inside the clade sheets, with a Recalculate Filter button and Modal, rounded Mean and Median STR data. I replaced the original Founder's Modal Interclade Age with one using a pooled SD, so the Iterate task is no longer needed. The sheet can import 67- and 111-marker haplotypes directly from an open MikeW R1b-L21 spreadsheet, or 67-marker haplotypes from the R1b-Early spreadsheet. The modified Graphical Time Line sheet shows all forms of YBP. It is preloaded with the 67-marker haplotypes from MikeW's L21 spreadsheet dated 12/27/2012, but you can clear the CladeA sheet and paste in your own set of haplotypes. Drop-downs for filtering are available on most columns so you can choose data selectively. Copy data from CladeA to CladeB for interclade runs. Read the instructions at the bottom of the Results tab.
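The whole-population (n) versus sample (n-1) distinction above is the same one Excel draws between VARP and VAR. The sketch below uses Python's standard `statistics` module on one STR column with made-up allele values. It also shows the modal, rounded mean and median summaries. It is an illustration only, not the spreadsheet's own formulas.

```python
import statistics

def str_column_summary(alleles):
    """Summarize one STR column (one allele value per haplotype)."""
    return {
        "population_var": statistics.pvariance(alleles),  # divides by n (like Excel VARP)
        "sample_var": statistics.variance(alleles),       # divides by n-1 (like Excel VAR)
        "modal": statistics.mode(alleles),                # most common allele value
        "rounded_mean": round(statistics.mean(alleles)),
        "median": statistics.median(alleles),
    }

# Hypothetical allele values for one STR across five haplotypes.
summary = str_column_summary([24, 24, 23, 24, 25])
print(summary)
```

With only a handful of haplotypes, the n-1 divisor gives a noticeably larger variance than the n divisor. The gap shrinks as the clade grows.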

V2.2 MJost

DMXX
12-28-2012, 10:03 PM
Fantastic, thank you MJost. I will hopefully attempt a few MRCA calculations with it this week.

MJost
12-28-2012, 10:19 PM
Please let me know what you think of it. I believe I have made it very useful.

MJost

MJost
01-04-2013, 02:41 PM
Humata, did you have a chance to try the spreadsheet out yet?

MJost

MJost
01-14-2013, 03:40 PM
Updated my TMRCA Estimator spreadsheet to V2.3, which is available directly from this link:

https://docs.google.com/file/d/0By9Y3jb2fORNSU9Lbi1rYXl4NUU/edit

This file is loaded with all current R1b-L21 111-marker haplotypes from MikeW's latest spreadsheet. You may clear the Clade page and enter your own list of haplotypes.

The TMRCA Estimator Excel spreadsheet uses Ken N's Gen111t Engine, modified by me, and Excel 2003-or-newer built-in functions such as VAR, improving the precision. This spreadsheet is in a zipped file to keep the file size small. Click Download under the File menu, then save and unzip it to your desktop or document storage. I have added some new Dashboard features. The whole-population calculation matches Ken's original Coalescence (whole (n) population) variance results. I added sample-population (n-1) Coalescence and InterClade Coalescence (n-1) Age. Filtering is integrated inside the clade sheets, with a Recalculate Filter button and Modal, rounded Mean and Median STR data. I replaced the original Founder's Modal Interclade Age with one using a pooled SD, so the Iterate task is no longer needed. The sheet can import 67- and 111-marker haplotypes directly from an "Open" MikeW R1b-L21 spreadsheet (I added an ExtHts VLOOKUP to provide full data columns), or 67-marker haplotypes from the R1b-Early spreadsheet. The modified Graphical Time Line sheet shows all forms of YBP. Read the instructions at the bottom of the Results tab. V2.3

MJost
TMRCA Estimator Excel spreadsheet

DMXX
01-22-2013, 02:11 AM
I haven't had a chance since my last message due to time constraints, although I've opened another genealogical "line of inquiry" that will be a nice opportunity to try your spreadsheet out. I will provide feedback within the next week.

MJost
02-11-2013, 12:59 AM
TMRCA Variance Estimator V3.1 has been released.

Loaded with MikeW's R1b-Early 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. L21 includes all subclades and xsubclades, including xsubclade HTs that are untested.

Added a Bird's q-ranked STR option to the spreadsheet, along with a refined Results page layout.

http://tinyurl.com/TMRCA-Gen111T-Estimator

MJost

MJost
02-24-2013, 03:50 AM
TMRCA Variance Estimator V3.3 has been released.

Loaded with MikeW's R1b-L21 111-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. L21 includes all subclades and xsubclades, including xsubclade HTs that are untested. R1b-Early xL21 67-marker HTs are in the Backup tab.

Fixed a 111-marker STR issue in the Bird's q-ranked STR option and in the 111-marker slow option.

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
04-08-2013, 05:18 PM
My TMRCA Variance Estimator V5 has been released.

Loaded with MikeW's latest R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab.

Added a new beta option for a Geometric Mean Variance TMRCA calculation on the data in CladeA, using a fixed selection of 25 markers only. (The loaded data does not contain any haplotypes with nulls, because of the GeoMean requirements.) With STRs there appear to be no significant differences for large numbers of HTs, so I am not sure whether I will extend this to 111 markers. This option requires SetGeo and Set25Markers to be used together.

Normal variance uses the arithmetic mean, which is used when you want optimistic results or the time frame is short. Arithmetic mean = sum of values / count of values.

For positive values, the geometric mean is never larger than the arithmetic mean (it is smaller unless all values are equal). It is used when pessimistic results are desired or the time frame is long, because the arithmetic mean will then be too big.
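A quick sketch of the two means above, applied to hypothetical per-STR variances, shows the geometric mean coming out below the arithmetic mean. It also shows why nulls (zero or missing values) are a problem: the geometric mean is computed through logarithms and needs every value to be strictly positive. The values are made up and this is not the spreadsheet's GeoMean engine.

```python
import math

def arithmetic_mean(values):
    # Sum of values / count of values
    return sum(values) / len(values)

def geometric_mean(values):
    # n-th root of the product, computed via logs; every value must be > 0
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

variances = [0.2, 0.5, 1.25, 0.8]  # hypothetical per-STR variances
am = arithmetic_mean(variances)
gm = geometric_mean(variances)
print(f"arithmetic={am:.4f} geometric={gm:.4f}")
```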

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
04-08-2013, 09:48 PM
Pasting the entire link


http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

Silesian
04-09-2013, 06:26 PM
MJost, is it possible for you to run my 67-STR results and the Kazakh results through your calculator?


I'm interested in comparing the results of Nazarov (Piotr Nazarov, b. 1886, d. 1932, Yaik Cossack clan, Kazakhstan) with mine.
http://www.familytreedna.com/public/Bashqort_Clans/default.aspx?section=yresults
(BASHKIRS, COSSACKS, POLES) R1b+L150
We are both listed as R1b1a2a1

MJost
04-09-2013, 08:04 PM
With only two haplotypes, I can only calculate the IntraClade Coalescence (n-1) Age. Using Bird's q-ranked STRs (25 of the 67 markers), the result is

7,761 +/- 1,568 years before present, with a maximum of 9,329 years, for the two of you with SNP L150? Yet you are listed as R1b1a2a1.

223828 Nazarov Piotr Nazarov, b.1886, d.1932, Yaik Cossack clan Kazakhstan R1b1a2a1 12 21 14 11 11-15 12 12 12 13 14 28 17 9-10 11 11 25 15 19 28 15-15-16-18 11 11 19-23 16 16 17 17 37-42 12 12 11 9 15-16 8 10 10 8 10 11 12 21-22 16 10 12 12 15 8 12 22 20 13 12 11 13 11 11 12 12
176123 A.K. Silesia/ Poland/Germany/Czech[ Shlesien,Śląsk1750] Poland R1b1a2a1 12 24 14 10 11-14 12 12 11 14 13 31 16 9-10 11 11 25 15 19 33 14-15-16-19 10 10 19-23 16 16 19 19 37-40 12 12 11 9 15-16 8 10 10 8 10 11 12 23-23 16 10 12 12 15 8 12 23 20 14 12 11 13 11 10 12 12
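For readers who want the arithmetic behind a two-haplotype estimate like the one above, here is a generic variance-based sketch. It is not the Gen111t engine. It assumes:

- TMRCA in generations is the summed per-marker sample (n-1) variance divided by the summed per-generation mutation rates.
- A generation is 30 years.
- The marker rates and alleles are placeholders.

The interval is reported the same way as the post, with the maximum being YBP + SD (7,761 + 1,568 = 9,329).

```python
import statistics

def variance_tmrca_generations(haplotypes, rates):
    """Generic estimator: sum of per-marker sample variances / sum of rates.

    haplotypes: two or more equal-length allele lists (one list per haplotype)
    rates: per-marker mutation rates per generation (placeholders here)
    """
    columns = list(zip(*haplotypes))  # one tuple of alleles per marker
    total_var = sum(statistics.variance(col) for col in columns)  # n-1 divisor
    return total_var / sum(rates)

def ybp_interval(generations, sd_generations, years_per_gen=30):
    # Convert generations to years before present: (YBP, +/- YBP, max YBP)
    ybp = generations * years_per_gen
    sd = sd_generations * years_per_gen
    return ybp, sd, ybp + sd

# Hypothetical three-marker haplotypes and rates, for illustration only.
hts = [[12, 24, 14], [12, 21, 14]]
rates = [0.002, 0.003, 0.002]
gens = variance_tmrca_generations(hts, rates)
print(round(gens, 1), ybp_interval(gens, sd_generations=50))  # SD is a placeholder
```

With two haplotypes, each marker's sample variance is simply (a - b)^2 / 2, so one large step difference dominates the result.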

Aren't you really L51/M412/PF6536/S167? And have you taken the Geno 2.0 test?

MJost

Silesian
04-10-2013, 04:48 AM
http://www.familytreedna.com/public/Bashqort_Clans/default.aspx?section=ysnp

Thanks for your help. Does this help? We are both L150+ / L51-.
Here are my SNPs tested with FamilyTreeDNA, kit 176123 (R1b1a2a1, R-L150): L150+, L277-, L51-, L584-, Z2105+
Here are Piotr Nazarov's (Cossack, Kazakhstan), kit 223828 (R1b1a2a1, R-L150): L150+, L23+, L51-, M269+, P312-, U106-

MJost
04-10-2013, 05:44 PM
I had a conversation with the R1b admin, Mike Walsh, and he suggests that Nazarov (223828) needs to test for Z2103/Z2105, and I think you should test for Z2103. Both of you need to test further, as there seems to be a placement issue with the two new SNPs that needs further investigation.

https://dl.dropboxusercontent.com/u/17907527/R1b_Descendency_Tree.jpg

MJost

MJost
04-11-2013, 08:01 PM
My TMRCA Variance Estimator V5.1 is up.

Loaded with MikeW's latest R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab.

Fixed: Results-Gen111t cells M9 and M10 for CI SD Gen Coal(n-1) now actually use the sample count (n-1 instead of n). Added SetGeo and DelGeo, plus a Clear Work Data option that clears the sheets to save file size. Standardized the macro button colors. Restored the missing CladeB K/N and Mutation Counting routines for CladeB.

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

Silesian
04-14-2013, 03:51 AM
Can you run your IntraClade Coalescence calculation one last time, but add two more haplotypes, for four in total: 1) Poland/Silesia, 2) Yaik Cossack, 3) Tartar, Kazakhstan, 4) Pashtun, Abbottabad, Kashmir border?


223828 Nazarov Piotr Nazarov, b.1886, d.1932, Yaik Cossack clan Kazakhstan R1b1a2a1 12 21 14 11 11-15 12 12 12 13 14 28 17 9-10 11 11 25 15 19 28 15-15-16-18 11 11 19-23 16 16 17 17 37-42 12 12 11 9 15-16 8 10 10 8 10 11 12 21-22 16 10 12 12 15 8 12 22 20 13 12 11 13 11 11 12 12


176123 A.K. Silesia/ Poland/Germany/Czech[ Shlesien,Śląsk1750] Poland R1b1a2a1 12 24 14 10 11-14 12 12 11 14 13 31 16 9-10 11 11 25 15 19 33 14-15-16-19 10 10 19-23 16 16 19 19 37-40 12 12 11 9 15-16 8 10 10 8 10 11 12 23-23 16 10 12 12 15 8 12 23 20 14 12 11 13 11 10 12 12


2XHXQ Nurgaliyev Kazakhstan 12 24 14 10 11-15 12 12 12 12 13 29 16 9-10 11 11 25 15 19 30 15-15 -16-17 11 10 19 23 16 16 19 18 37 38 12 12 12 12 14 12 9 15 16 8 10 10 8 11 10 23 23 16 10 12 12 15 8 22 20 12 11 13 11 11 12 12


204505 Yaar Ali Khan, c. 1910 Pakistan R1b1a2 12 24 14 10 11-14 12 13 12 13 13 29 15 9-10 11 11 26 15 19 31 14-15-16-18 10 11 19-23 15 17 21 18 37-40 12 12

thank-you

MJost
04-16-2013, 03:40 AM
I thought I had replied to this message. OK, the 38-67 panel for 2XHXQ Nurgaliyev cannot be correct; some of the values are out of order, etc. Can you check whether he is in another FTDNA project and find his correct marker allele values?

MJost

MJost
04-20-2013, 10:24 PM
My TMRCA Estimator V6 has been released.

Loaded with MikeW's latest 04/19/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab.

Added a GD calculator, with a GD IAM option for MCMs, for two selected HTs in each of the Clade A and Clade B worksheets. Fixed the CI SD Gen Coal(n-1) formulas and restored the missing CladeB K/N and Mutation Counting.
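Here is a minimal sketch of a two-haplotype GD with an IAM option for multi-copy markers. It assumes one simple convention: single-copy markers use step-wise distance (the sum of absolute repeat differences), and with IAM switched on, each differing copy of a multi-copy marker counts as one. The dict layout, marker names and MCM set are illustrative. They are not the spreadsheet's internals or Ken N's formula.

```python
def genetic_distance(ht_a, ht_b, multi_copy, iam_for_mcm=True):
    """Compare two haplotypes stored as {marker: tuple_of_allele_values}."""
    gd = 0
    for marker, alleles_a in ht_a.items():
        alleles_b = ht_b[marker]
        for a, b in zip(alleles_a, alleles_b):
            if marker in multi_copy and iam_for_mcm:
                gd += int(a != b)   # IAM: any difference counts as one
            else:
                gd += abs(a - b)    # step-wise: count every repeat step
    return gd

mcm = {"DYS464"}  # illustrative subset of multi-copy markers
a = {"DYS390": (24,), "DYS464": (15, 15, 17, 17)}
b = {"DYS390": (21,), "DYS464": (15, 16, 17, 19)}
print(genetic_distance(a, b, mcm), genetic_distance(a, b, mcm, iam_for_mcm=False))
```

Here the two-step change in one DYS464 copy adds 1 under IAM but 2 under the step-wise count, which is why the two GD figures can differ.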

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
04-20-2013, 10:50 PM
Here are the STRs that Steve Bird does not consider stable. His list of stable STRs is included in my spreadsheet as a TMRCA option for deeper ancestry.

The STRs listed below have less linear relationships and q values greater than the recommended cutoff of 0.07.

Multi-copy markers are not recommended either.

High linkage disequilibrium, bimodal STRs:
438
392
389II
389I

Single-site STRs with higher q values (STR, q):
19 0.309566
388 0.204902
393 0.693764
426 0.083828
434 11.833316
435 286.840313
436 52.030381
442 0.230691
445 0.112402
450 1.868858
454 286.840313
455 0.393904
461 0.14547
462 0.31254
472 3602.481783
487 0.181027
490 4.049567
492 0.195221
494 14.520367
497 0.254374
505 0.143035
525 0.599971
531 0.092912
552 0.399449
556 0.125281
561 0.402484
565 0.147108
568 0.420804
572 0.331963
575 1238.148396
587 1.3739
589 2.755554
590 15.070801
593 36.286593
594 2.441254
638 0.243963
640 2.591648
641 129.116061
716 0.560777
717 0.133671
726 8.239777
1B07 0.259133

Bird SC (2012) Towards Improvements in the Estimation of the Coalescent: Implications for the Most Effective Use of Y Chromosome Short Tandem Repeat Mutation Rates. PLoS ONE 7(10): e48638. doi:10.1371/journal.pone.0048638
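The q cutoff rule above can be expressed as a simple filter. The sketch uses a subset of the q values copied from the list. Every marker on that list fails the 0.07 cutoff, so all of them end up in the excluded set. A marker whose q is at or below 0.07 would be kept. The function name is hypothetical.

```python
# A subset of the q values listed above (DYS number -> q), copied from the post.
Q_VALUES = {"19": 0.309566, "388": 0.204902, "393": 0.693764, "426": 0.083828,
            "531": 0.092912, "445": 0.112402, "472": 3602.481783}

def split_by_q(q_values, cutoff=0.07):
    """Return (kept, excluded) marker sets using a q <= cutoff rule."""
    kept = {m for m, q in q_values.items() if q <= cutoff}
    excluded = set(q_values) - kept
    return kept, excluded

kept, excluded = split_by_q(Q_VALUES)
print(sorted(excluded, key=Q_VALUES.get))  # excluded markers, lowest q first
```

The bimodal STRs and multi-copy markers mentioned above would need to be excluded separately, since this filter only looks at q.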


MJost

MJost
04-24-2013, 06:07 AM
My TMRCA Estimator V7 has been released.

Loaded with MikeW's latest 04/19/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab.

I implemented a major change that now allows the user to specify exactly which STRs are used, improving the ability to emulate various published studies. Previously, I had to remove the unused STRs' mutation rates and other cell data used by various functions in calculating a TMRCA. The spreadsheet automatically uses only the STRs entered, but the user has to select a panel size large enough to cover all of them. For example, if DYS434 is used, the "Set to 111 marker" option must be selected.
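The panel-size rule above amounts to picking the largest panel that any selected STR requires. The sketch below uses a small, partial map of a few markers to the smallest FTDNA panel containing them. It is for illustration only; a real map would cover every marker, and you should verify it against your own panel list. The function name is hypothetical.

```python
# Partial, illustrative map: marker -> smallest FTDNA panel that includes it.
PANEL_OF = {"DYS393": 12, "DYS455": 25, "DYS456": 37, "DYS531": 67, "DYS434": 111}

def required_panel(selected_strs):
    """Smallest panel size that covers every selected STR."""
    missing = [s for s in selected_strs if s not in PANEL_OF]
    if missing:
        raise ValueError(f"no panel recorded for: {', '.join(missing)}")
    return max(PANEL_OF[s] for s in selected_strs)

print(required_panel(["DYS393", "DYS456"]))  # 37
print(required_panel(["DYS393", "DYS434"]))  # 111, matching the DYS434 example
```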

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
04-28-2013, 04:38 AM
TMRCA Estimator V7.1 has been released.

Loaded with MikeW's latest 04/27/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab.

I re-enabled the GeoMean feature and left the GeoMean results calculation engine, which is created by the Setup GEO button, fully set up. It will stay intact unless you run the Clear Wrk or Del GEO button. GeoMean requires the use of Filtering and the 25-marker option. I also updated the L21 haplotypes, either removing entire HTs that had single nulls or replacing the null with the modal of the entire clade (for the 21-n922-A1 guys, etc.).

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
04-29-2013, 06:52 AM
TMRCA Estimator V7.2 has been released.

Loaded with MikeW's latest 04/27/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab.

I fixed a GeoMean bug: when summing the squared GeoMean differences for all STRs, empty haplotype rows were being counted because of a zero entry.
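The fix above is about which rows enter the count. Here is a minimal sketch, assuming the quantity is the mean squared deviation of one STR column from its geometric mean. Empty cells (None or 0) are dropped before both the sum and the count, so blank rows cannot inflate the denominator. This is a hypothetical helper, not the spreadsheet's formula.

```python
import math

def geomean_sq_diff(column):
    """Mean squared deviation from the geometric mean for one STR column.

    Empty cells (None or 0, e.g. blank haplotype rows) are skipped, so they
    neither enter the sum nor inflate the row count.
    """
    values = [v for v in column if v]  # drop None and 0 entries
    if not values:
        raise ValueError("column has no usable allele values")
    gm = math.exp(sum(math.log(v) for v in values) / len(values))
    return sum((v - gm) ** 2 for v in values) / len(values)

col_with_blanks = [13, 13, 14, None, 0]  # hypothetical STR column with blank rows
print(geomean_sq_diff(col_with_blanks))
```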

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
05-05-2013, 07:49 PM
TMRCA Estimator V7.7 has been released.

Loaded with MikeW's latest 05/05/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab. Fixed several items, as documented in the notes.

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
06-12-2013, 02:26 PM
TMRCA Estimator V7.9 has been released.

Loaded with MikeW's latest 06/09/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab. Fixed several items, as documented in the Results-TMRCA worksheet notes.

The data can easily be cleared and replaced with any other haplotypes, but for interclade work, place the ancestral clade HTs in the Clade B worksheet.

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
06-26-2013, 10:00 PM
TMRCA Estimator V8 has been released.

Loaded with MikeW's latest 06/22/2013 R1b-L21 111- and 67-marker haplotypes, with all major subclades and their zzzunknown-haplogroup HTs included. R1b-Early xL21 67-marker HTs are in the Backup tab. Fixed several items, as documented in the Results-TMRCA worksheet notes.

The data can easily be cleared and replaced with any other haplotypes, but for interclade work, place the ancestral clade HTs in the Clade B worksheet.

Added Anatole Klyosov's method of comparing the linear and logarithmic estimates as an audit for a single common ancestor. It uses haplotypes of 25 or fewer STRs, with at least four base (ancestral modal) haplotypes.
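Here is a sketch of the linear-versus-logarithmic audit idea. It uses the basic forms without Klyosov's back-mutation correction.

- Linear estimate: mutations per haplotype divided by the haplotype rate, t = M / (N k).
- Logarithmic estimate: based on the fraction of base (modal) haplotypes, t = ln(N / n) / k.

Here N is the number of haplotypes, n the number of base haplotypes, M the total mutations from the modal, and k the summed per-generation mutation rate of the panel. Agreement within a tolerance is read as consistent with one common ancestor. The tolerance, rate and counts are hypothetical.

```python
import math

def klyosov_audit(total_hts, base_hts, total_mutations, haplotype_rate, tolerance=0.15):
    """Compare linear and logarithmic TMRCA estimates (in generations).

    total_hts: N, number of haplotypes in the set
    base_hts: n, haplotypes identical to the base (modal) haplotype
    total_mutations: M, summed mutations from the modal over all haplotypes
    haplotype_rate: k, summed panel mutation rate per generation
    tolerance: hypothetical relative difference accepted as "agreeing"
    """
    if base_hts < 4:
        raise ValueError("need at least four base haplotypes")
    t_linear = total_mutations / (total_hts * haplotype_rate)
    t_log = math.log(total_hts / base_hts) / haplotype_rate
    agrees = abs(t_linear - t_log) <= tolerance * t_log
    return t_linear, t_log, agrees

print(klyosov_audit(total_hts=40, base_hts=8, total_mutations=66, haplotype_rate=0.046))
```

Under a simple Poisson model, the two estimates target the same t. A large gap suggests the set mixes lineages from more than one ancestor.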

http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

MJost

MJost
07-01-2013, 05:24 AM
TMRCA Estimator V8.1 has been released.


This URL is no longer valid: http://tinyurl.com/TMRCA-Gen111T-Estimator-L21

Use the replacement below.

http://tinyurl.com/TMRCA-Estimator

MJost

MJost
07-07-2013, 01:26 PM
Uploaded my TMRCA spreadsheet with MikeW's July 5th L21 haplotypes.

MJost

MJost
07-26-2013, 02:38 AM
Uploaded my v8.2 TMRCA Estimator spreadsheet with MikeW's July 25th L21 haplotypes: 67 markers in CladeA and 111 markers in CladeB. MikeW just recategorized the CTS4466s today (n=481 including Suspects), and I already have them loaded in my TMRCA Estimator.

http://tinyurl.com/TMRCA-Estimator

I added an STR allele value alert that fires when any haplotype has an STR allele value more than plus or minus TWO from the filtered modal. Look at TMRCA-Results rows 38 and 39 for a '1' or more under each STR for each clade. I got bitten by a DYS446 where, out of 20 haplotypes, 19 had 23 and one had 18. That one value almost tripled the number of generations with the smaller 25-marker panel. As one increases to 37, 67 or 111 markers, that STR's variance is smoothed out to a smaller and smaller number of generations. This is now a standard tool that should be used, since I have not implemented an Infinite Alleles Mutation Model option for any STRs other than multi-copy markers (via Ken N's formula in the spreadsheet).
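The alert rule above is to count, per STR, the haplotypes whose allele lies more than two steps from the modal. The sketch below assumes haplotypes are rows of allele values with one column per STR. The helper is hypothetical, not the worksheet formula in rows 38 and 39.

```python
import statistics

def allele_alerts(haplotypes, max_gd=2):
    """Count, per STR, how many haplotypes sit more than max_gd from the modal.

    haplotypes: list of equal-length allele lists (one per haplotype)
    Returns one count per STR column; any value >= 1 deserves a closer look.
    """
    alerts = []
    for column in zip(*haplotypes):
        modal = statistics.mode(column)
        alerts.append(sum(1 for v in column if abs(v - modal) > max_gd))
    return alerts

# Mirrors the case described above: 19 haplotypes at 23 and one at 18.
hts = [[v] for v in [23] * 19 + [18]]
print(allele_alerts(hts))  # [1]
```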

I also added a macro that resolves the problem of editing cells in columns A through G that were previously set to validation mode after copied cells were pasted into my sheets. (MikeW, you might want that macro code for your own haplotype spreadsheet, as it has the same issue.)


MJost

MJost
08-06-2013, 04:31 PM
Uploaded my v8.3 TMRCA Estimator spreadsheet with MikeW's July 28th L21 haplotypes: 67 markers in CladeA and 111 markers in CladeB.

(TMRCA_Estimator Loaded with the latest L21-111(2,009 HTs) & 67 Markers(8,630 HTs)_Ver8.3 MJost 08/6/2013)


http://tinyurl.com/TMRCA-Estimator



MJost

MJost
08-06-2013, 10:02 PM
Let's look at calculating a TMRCA using the Scots Sc clade, which is a very bushy group. The new parent, L1335, is the small, Wales-based parent clade of the L1065 Scots. In that set of 111-marker haplotypes, counting the mutations between the two groups at 111 markers gives 27. That count comes from today's haplotypes and lies between the two groups' modals, so in essence it must be divided by two to get back to the MRCA. Recall that increasing the number of markers used brings more confidence.
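The halving step can be written as a worked equation. This is a simplified sketch, not the spreadsheet's method. It assumes mutations accumulate independently, at the same summed rate, along both lines of descent. Here D is the mutation count between the two modals, μ_i is the per-generation rate of marker i, and g is an assumed generation length in years:

```latex
\hat{t} \approx \frac{D/2}{\mu_\Sigma} = \frac{D}{2\,\mu_\Sigma},
\qquad \mu_\Sigma = \sum_{i=1}^{111} \mu_i,
\qquad \mathrm{YBP} \approx \hat{t}\, g
```

With D = 27, each line carries roughly 13.5 mutations. The post does not give the marker rates, so no age is computed here.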

For a more detailed examination with my TMRCA spreadsheet, I check several criteria, looking for possible bimodal STRs or outlier haplotypes with a range wider than +/- two alleles from the modal. This is more critical for smaller sets of HTs than for larger groups, and for smaller panels than for larger ones. You can treat outliers by excluding them. For example, I had one haplotype out of 20 with an odd DYS456 value of 18 (all others had 23), which threw my 25-marker calculation off by a large amount, more than 10 generations. I first use my Filter Clade(x) for STR GDs of plus or minus two from the modal, shown in the Dashboard report, to find those odd allele values. I then try to determine how they affect the TMRCA results with different panel sizes, etc.

Next, I choose which panel size best represents the modal variance between the haplotypes, using a tool designed by Bruce Walsh, the k/n ratio, about which he states:

Http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461668/

"We assume that n markers are scored on the nonrecombining chromosome of interest (either the Y or mtDNA) and that we observe matches in allelic state at k of these. We start by assuming the infinite alleles model, where each mutation is assumed to be unique. We then modify our results by assuming a (symmetric single-step) stepwise mutational model, which is a more appropriate descriptor for microsatellite markers. As we show, when k/n is close to one and n moderate to large, the two different mutational models give essentially the same distribution for t (time)."

My method assumes that the panel whose k/n is closest to one gives essentially the best distribution for time. As you can see, the 37-marker panel shows the worst k/n and should not be used for TMRCAs. I treat this as a parallel check on how well every HT in the group, including the outliers, fits.

Match k/n of scored markers, Filtered (goal is closest to 1), for CladeA: Scots 1335-Sc and CladeB: L1335 All

Panel size    111       67        37        25
CladeA        0.90319   0.90294   0.85399   0.89988
CladeB        0.89890   0.89739   0.84751   0.89409
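Here is a minimal sketch of a clade-level k/n ratio. It assumes k counts the scored (haplotype, marker) cells that match the clade modal and n counts all scored cells. The post does not spell out the spreadsheet's exact counting, and the toy haplotypes are made up.

```python
import statistics

def k_over_n(haplotypes):
    """Fraction of scored markers that match the modal allele, over all haplotypes."""
    modal = [statistics.mode(col) for col in zip(*haplotypes)]
    k = sum(a == m for ht in haplotypes for a, m in zip(ht, modal))
    n = len(haplotypes) * len(modal)
    return k / n

toy = [[13, 24, 14, 11], [13, 24, 14, 12], [13, 23, 14, 11]]
print(round(k_over_n(toy), 4))
```

To compare panels as in the table, run the same function on the first 25, 37, 67 and 111 marker columns of each haplotype.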


Next, I look at the Confidence Index numbers to help decide which set of markers is best within the panel size chosen above. (This tool is also used to set a specific CI for standardization when aging multiple clades.) I usually set the Confidence Level to 1 sigma (68.27%): for an assumed normal distribution, this is the probability that a measurement falls within 1 standard deviation of the mean. When running TMRCAs on data sets of different sizes and/or different clades, using the same CI standardizes the results, which makes multiple subclades simpler to compare.
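The 1-sigma figure above can be checked with Python's standard `statistics.NormalDist`. The second helper goes the other way: it converts a chosen two-sided confidence level into the number of standard deviations to report. The helper names are illustrative.

```python
from statistics import NormalDist

def coverage(sigmas):
    """Probability that a normal value lies within +/- sigmas of the mean."""
    nd = NormalDist()
    return nd.cdf(sigmas) - nd.cdf(-sigmas)

def z_for_level(level):
    """Half-width in SDs for a two-sided confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

print(f"{coverage(1):.4%}")         # 68.2689%
print(round(z_for_level(0.95), 2))  # 1.96
```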

Computing IntraClade Coalescence, Founder's Modal and Interclade ages sometimes requires combining an ancestral clade with its subclade(s). In the case of the Wales guys, their TMRCA by itself is much younger, with a modal age of about 475 YBP, because of the very small number of HTs. We have to assume there are missing lineages that possibly daughtered out or have yet to be found and tested. But when the clade is combined with its subclades, the HTs create a wider modal footprint, and the numbers of generations appear to separate nicely. If a TMRCA is significantly outside the average of the others, it probably should not be considered; the IAM-MCM TMRCA ages shown below are an example. Combined, there is zero GD between them.



111 markers

IntraClade Coalescence (n-1) Age (YBP and CI-SD-Gen-Coal(n-1))
                     Bird's 48          94 NMCM            IAM-MCM            111 ALL
                         YBP  CI-SD         YBP  CI-SD         YBP  CI-SD         YBP  CI-SD
Clade A: 1335-Sc's   1,373.5  10.06     1,326.9  10.86     1,278.7  11.57     1,397.7  13.23
Clade B: 1335 All    1,461.9  10.82     1,419.5  11.77     1,355.9  12.38     1,477.4  14.08
Diff                    88.4   0.76        92.7   0.91        77.2   0.81        79.7   0.86

Intraclade Founder's Modal Age (YBP and CI-SD-GenModal)
                     Bird's 48          94 NMCM            IAM-MCM            111 ALL
                         YBP  CI-SD         YBP  CI-SD         YBP  CI-SD         YBP  CI-SD
Clade A: 1335-Sc's   1,512.6  11.59     1,444.1  12.29     1,152.1   9.87     1,525.9  15.04
Clade B: 1335 All    1,601.7  12.37     1,538.6  13.25     1,226.2  10.62     1,607.7  15.94


The CI concept is to choose the most stable type of markers within the best overall panel, as shown by the CI results. In this calculation, the Interclade Founder's Modal age is the same as the Intraclade Founder's age, which should be (and is) the high end of Clade A's TMRCA.

The result shows that L1335* appears to be only about three generations older than L1065, at a TMRCA of 1,601.7 vs 1,512.6 YBP. Only more L1335 haplotypes will confirm whether this is true. The SNP mutation between the Wales and Scots groups appears to have happened quickly. We need to find a 1335 Sc variety that did not get L1065 to show whether SNP L1065 arose more than about three generations after L1335.


Soon I will show how these same methods produced a number of generations very close to a known paper trail. The known ages remained hidden until after the TMRCA results were produced, and the results were then compared to the actual known data in a TMRCA Methods Study, which should be out shortly.

MJost

MJost
08-09-2013, 02:34 AM
In V8.5 I added a macro to import MikeW's P312xL21 haplotypes. I ended up removing all the Suspects so I could combine the 67-marker lists, and there were plenty of rows for the 111-marker list. Again, I kept only HTs with positive P312 or further downstream results. I ended up with a little over 1,900 111-marker HTs and 5,535 67-marker HTs. I also removed Ysearch HTs. I left out the L21 duplicate HTs, but did not check P312's list, as it didn't seem to affect the overall picture much.


http://tinyurl.com/TMRCA-Estimator



MJost

MJost
08-10-2013, 10:07 PM
Full Info PDF
https://docs.google.com/file/d/0By9Y3jb2fORNWGcyblVjakZudVE/edit?usp=sharing

MJost

MJost
08-18-2013, 05:10 AM
Here are my Second Look results on the Marleen Van Horn TMRCA challenge on Rootsweb, which used her known paper-trail haplotypes. The original submission requested 37-, 67- and 111-marker TMRCAs for four parts of the tree, and my second review was submitted before Marleen released the specific birth ages and other notes. Marleen distributed several changes at various stages during the study. After the true ages were revealed, I made some revisions. One was removing a haplotype in Ancestor 8 that was actually a duplicate of Ancestor 2. The other main change concerned Haplotype Number Three, which had an odd DYS447 value of 18 (all others had 23). I treated number three's allele value under the infinite alleles model (IAM), counting the difference as a single mutation by changing it to 22, and reran the numbers. Here are my Revised Second Look TMRCAs.
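The IAM treatment of the DYS447 outlier can be sketched as follows. This is a minimal illustration, assuming the modal value (23) is the reference and that any multi-step difference is collapsed to a single step in the same direction. `iam_adjust` is a hypothetical helper, not a function in the spreadsheet, and the DYS447 column below is made up apart from the 18 and 23 values.

```python
from statistics import mode


def iam_adjust(allele: int, modal: int) -> int:
    """Count any difference from the modal as one mutation (IAM style).

    A multi-step difference is collapsed to a single step in the same
    direction, e.g. 18 against a modal of 23 becomes 22.
    """
    diff = allele - modal
    if abs(diff) <= 1:
        return allele  # zero or one step: leave unchanged
    return modal + (1 if diff > 0 else -1)


# Hypothetical DYS447 column: every kit has 23 except haplotype three.
dys447 = [23, 23, 18, 23, 23]
modal = mode(dys447)
adjusted = [iam_adjust(a, modal) for a in dys447]
print(adjusted)  # [23, 23, 22, 23, 23]
```

Under a stepwise model the 18 would contribute five mutations to the TMRCA estimate; after the adjustment it contributes one.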

https://docs.google.com/file/d/0By9Y3jb2fORNdUFQSkFfT1ZyWXM/edit?usp=sharing

The formal Study Results are here:
http://tinyurl.com/ChallengeSummary


MJost

MJost
09-17-2013, 07:25 PM
TMRCA_Estimator loaded with the latest version of MikeW's L21 spreadsheet with subclades, 67 and 111 markers. Ver8.7b MJost 09/17/2013


http://tinyurl.com/TMRCA-Estimator



MJost

MJost
09-17-2013, 09:56 PM
I have combined MikeW's recently updated R1b subclade spreadsheets (R1b including U106, P312 xL21 and L21) to create my TMRCA Estimator V8.7b spreadsheet, loaded with 67 markers.

TMRCA EstimatorV8.7b-R1bALL-67M

https://docs.google.com/file/d/0By9Y3jb2fORNU3hTM180ejFQTms/edit?usp=sharing

Bird's Stable STRs


Count | IntraClade Coalescence (n-1) Age | Mean Generations | StdDev in Gen | YBP | +/- YBP | Max YBP
N=5712 | Clade A: R1b-P312 All | 120.2 | 35.6 | 3,606.3 | 1,068.8 | 4,675.1
N=268 | Clade B: R1b xP312-xU106 | 162.2 | 41.4 | 4,867.4 | 1,241.7 | 6,109.2
 | Diff = | 42.0 | | 1,261.2 | | 1,434.1

Count | Intraclade Founder's Modal Age | Modal Gen Age | StdDev in Gen | YBP | +/- YBP | Max YBP
N=5712 | Clade A: R1b-P312 All | 130.8 | 37.2 | 3,924.9 | 1,115.0 | 5,040.0
N=268 | Clade B: R1b xP312-xU106 | 192.7 | 45.1 | 5,780.0 | 1,353.1 | 7,133.2
 | Diff = | 61.8 | | 1,855.1 | | 2,093.2

TRUEMRCA InterClade GAB | Generations | StdDev in Gen | YBP | +/- YBP | Max YBP
Pooled SD Clades A & B InterClade Coalescence (n-1) Age: R1b-All for R1b-P312 All & R1b xP312-xU106 | 124.6 | 36.3 | 3,737.4 | 1,088.1 | 4,825.5
Pooled SD Clades A & B Interclade Modal Founder's: R1b-All for R1b-P312 All & R1b xP312-xU106 | 178.3 | 32.8 | 5,348.2 | 983.3 | 6,331.5
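Each row above follows the same arithmetic: YBP is generations times years per generation, +/- YBP is the standard deviation in generations times the same factor, and Max YBP is their sum. A minimal sketch, assuming 30 years per generation (inferred from the ratios in the table, e.g. 3,606.3 / 120.2, not stated in the post). The table appears to work from unrounded generation counts, so recomputing from the rounded values shown differs by about a year.

```python
# Assumption: 30 years per generation, inferred from the table ratios.
YEARS_PER_GENERATION = 30


def to_ybp(generations: float, sd_generations: float,
           years_per_gen: float = YEARS_PER_GENERATION) -> tuple[float, float, float]:
    """Return (YBP, +/- YBP, Max YBP) for one table row."""
    ybp = generations * years_per_gen
    plus_minus = sd_generations * years_per_gen
    return ybp, plus_minus, ybp + plus_minus


# Clade A: R1b-P312 All, intraclade coalescence (n-1) age
ybp, pm, max_ybp = to_ybp(120.2, 35.6)
print(f"YBP {ybp:,.1f}  +/- {pm:,.1f}  Max {max_ybp:,.1f}")
```

This gives roughly 3,606 / 1,068 / 4,674 years against the table's 3,606.3 / 1,068.8 / 4,675.1.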



Count | IntraClade Coalescence (n-1) Age | Mean Generations | StdDev in Gen | YBP | +/- YBP | Max YBP
N=1800 | Clade A: R1b-U106 All | 108.6 | 33.9 | 3,257.9 | 1,015.9 | 4,273.8
N=268 | Clade B: R1b xP312-xU106 | 162.2 | 41.4 | 4,867.4 | 1,241.7 | 6,109.2
 | Diff = | 53.7 | | 1,609.5 | | 1,835.4

Count | Intraclade Founder's Modal Age | Modal Gen Age | StdDev in Gen | YBP | +/- YBP | Max YBP
N=1800 | Clade A: R1b-U106 All | 119.7 | 35.6 | 3,592.0 | 1,066.7 | 4,658.7
N=268 | Clade B: R1b xP312-xU106 | 192.7 | 45.1 | 5,780.0 | 1,353.1 | 7,133.2
 | Diff = | 72.9 | | 2,188.0 | | 2,474.5

TRUEMRCA InterClade GAB | Generations | StdDev in Gen | YBP | +/- YBP | Max YBP
Pooled SD Clades A & B InterClade Coalescence (n-1) Age: R1b-All for R1b-U106 All & R1b xP312-xU106 | 124.4 | 36.2 | 3,730.8 | 1,087.1 | 4,817.9
Pooled SD Clades A & B Interclade Modal Founder's: R1b-All for R1b-U106 All & R1b xP312-xU106 | 177.7 | 32.5 | 5,330.4 | 975.7 | 6,306.1
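The Diff rows are Clade B minus Clade A for the generation, YBP and Max YBP columns. A small sketch using the U106 coalescence rows; `CladeRow` and `clade_diff` are illustrative names, not part of the spreadsheet. The generation difference computed from the rounded values shown (53.6) differs from the table's 53.7 only through rounding.

```python
from dataclasses import dataclass


@dataclass
class CladeRow:
    label: str
    generations: float
    ybp: float
    max_ybp: float


def clade_diff(a: CladeRow, b: CladeRow) -> tuple[float, float, float]:
    """Clade B minus Clade A for generations, YBP and Max YBP."""
    return (b.generations - a.generations, b.ybp - a.ybp, b.max_ybp - a.max_ybp)


clade_a = CladeRow("R1b-U106 All", 108.6, 3257.9, 4273.8)
clade_b = CladeRow("R1b xP312-xU106", 162.2, 4867.4, 6109.2)
gens, ybp, max_ybp = clade_diff(clade_a, clade_b)
print(f"Diff = {gens:.1f} {ybp:,.1f} {max_ybp:,.1f}")
```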


MJost

MJost
10-30-2013, 06:40 PM
TMRCA Estimator loaded with the latest 10/28 HTs from MikeW's L21 spreadsheet, pulling down all available 67- and 111-marker haplotypes. I also imported MikeW's 67-marker haplotypes from R1b-Early and the P312**-only haplotypes. Ver 8.8 MJost 10/30/2013

http://tinyurl.com/TMRCA-Estimator

MJost

MJost
12-19-2013, 03:08 PM
TMRCA Estimator V10.1 loaded with the latest 12/18/13 HTs from MikeW's L21 spreadsheet, pulling down all available 67- and 111-marker haplotypes. MJost 12/19/2013
http://tinyurl.com/TMRCA-Estimator

MJost

MJost
01-18-2014, 08:02 PM
Updated the TMRCA Estimator to V10.3, loaded with the latest 01/14/14 HTs from MikeW's L21 spreadsheet, pulling 67- and 111-marker haplotypes. The number of 67-marker haplotypes was over the maximum limit of 9,990 haplotypes, so I removed the Ysearch HTs to bring it under the max. MJost 01/18/14
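Trimming an import to fit the 9,990-haplotype limit by dropping one source can be sketched like this. The record layout (a `source` field on each haplotype) and the `trim_to_limit` helper are hypothetical; this is only an illustration, not how the spreadsheet stores its data.

```python
MAX_HAPLOTYPES = 9990  # limit quoted for the 67-marker sheet


def trim_to_limit(haplotypes: list[dict], limit: int = MAX_HAPLOTYPES,
                  drop_source: str = "Ysearch") -> list[dict]:
    """Drop records from one source when the list exceeds the limit."""
    if len(haplotypes) <= limit:
        return list(haplotypes)
    kept = [h for h in haplotypes if h.get("source") != drop_source]
    if len(kept) > limit:
        raise ValueError(f"{len(kept)} haplotypes remain, still over {limit}")
    return kept


# Hypothetical import: 9,980 project kits plus 20 Ysearch entries.
rows = ([{"kit": i, "source": "FTDNA"} for i in range(9980)]
        + [{"kit": i, "source": "Ysearch"} for i in range(20)])
print(len(rows), "->", len(trim_to_limit(rows)))
```

Raising an error when dropping one source is not enough makes the shortfall explicit rather than silently truncating the list.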

http://tinyurl.com/TMRCA-Estimator

MJost

Rod Bruce
03-28-2014, 03:17 AM

Newbie here.
I have downloaded the TMRCA Estimator and am currently exploring it - it will take me a while to digest it all.
Kudos for the impressive design and programming and to all the contributors.

I am keen to develop TMRCA trees for my project, to graphically show the relatedness of the member tests.
See here (https://www.familytreedna.com/public/bruce/default.aspx?section=results) to get an idea of what I am trying to achieve, and to note the spread of our project haplotypes and how they are currently divided into groups.
The method I used (which I gleaned from here (http://home.earthlink.net/~odoniv/HamCountry/HAM_DNA_Project/HAM_DNA_Phylogenetic_chart_Instructions.html)) uses the yutility111 to generate the TMRCA data (hybrid mutation model, 95% probability, 25 years per generation, FTDNA mutation rates).
The resulting PHYLIP data text was then fed into the Kitsch program (using commands l, j, 9, 1000), and the resulting tree file was then massaged and labelled and the graph images created from that.
I don't know what the Kitsch program is doing exactly, but I do understand that it is juggling and ordering the branch nodes.
The output looks pretty good to me, but may well be inaccurate due to the yutility111 and Kitsch parameters used.

Anyway, I hope I will get more accurate results by using the TMRCA Estimator, or some cut-down version of the toolset used in it.
Any ideas about how I might generate graphical tree output from it?
Or else (working backwards), how I might generate PHYLIP style data from it?

Thanks,
Rod

Rod Bruce
03-28-2014, 03:22 AM
I was using McGee's http://www.mymcgee.com/tools/yutility111.html

MJost
03-28-2014, 12:28 PM
PHYLIP data is based on the calculated years to the most recent common ancestor for every pair of haplotypes. My TMRCA Estimator does not provide this type of information. It will be far easier to enter your set of haplotypes directly into McGee's Y Utility, calculate a TMRCA matrix, and get an output file in PHYLIP format.
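For anyone curious what a PHYLIP distance matrix looks like, here is a minimal sketch that writes one in the square format: the taxon count on the first line, then each name padded to 10 characters followed by its row of distances. As a simplifying assumption it uses the raw stepwise distance (sum of absolute allele differences) rather than years to a common ancestor. McGee's utility converts differences into TMRCA years using mutation rates, which this sketch does not attempt, and multi-copy markers are not handled. Kit names and allele values are made up.

```python
def step_distance(h1: list[int], h2: list[int]) -> int:
    """Stepwise distance: sum of absolute allele differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def phylip_matrix(haplotypes: dict[str, list[int]]) -> str:
    """Square PHYLIP distance matrix with names padded to 10 characters."""
    names = list(haplotypes)
    lines = [f"{len(names):5d}"]
    for n1 in names:
        row = " ".join(
            f"{step_distance(haplotypes[n1], haplotypes[n2]):.1f}" for n2 in names
        )
        lines.append(f"{n1[:10]:<10} {row}")
    return "\n".join(lines)


# Hypothetical kits with three markers each
kits = {"Kit1": [13, 24, 14], "Kit2": [13, 25, 14], "Kit3": [14, 24, 16]}
print(phylip_matrix(kits))
```

The resulting text file can be fed to a distance program such as Kitsch in the same way as the yutility111 output.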

I like to keep it simple with SplitsTree4 for building phylogenetic networks.

MJost

Rod Bruce
03-29-2014, 02:45 PM
Thanks for the pointer to SplitsTree4.
Much better than Mega5.2 which I was using.

MJost
03-29-2014, 08:32 PM
Here is one I did a while back for L21 at 111 markers using over 750 kits. Just download and open it; you can zoom into any area.

https://drive.google.com/file/d/0By9Y3jb2fORNTnNCbTU3Z0NlZEk/edit?usp=sharing

MJost