NGS Raw Results sharing and analysis for R1b-L21



Mikewww
09-30-2015, 04:24 PM
I am a co-administrator for the R-L21 project and have been following a process to support sharing of raw results data. I probably should have started this thread a long time ago, although I think the process is already understood by most NGS testers on the R1b-L21-project yahoo group.

Here are two key related links, especially the first one.

http://ytree.net/

The ytree.net link is to Alex Williamson's Big Tree phylogenetic analysis. I can't say enough about how fortunate we are to have an L21 person like Alex commit his IT skills, logical thought process and knowledge of NGS testing to this. It started out as L21 only, but he has expanded it to P312 overall, and his work has been the greatest thing since sliced bread, in my opinion.

The following link is a spreadsheet that I upload from time to time. I upload it less frequently now because Alex's Big Tree has been enhanced repeatedly, to the point where there is not as much need for this spreadsheet. I call it R1b-L21_Discovery_V1. It has ended up being more of a log for myself.

http://tinyurl.com/R1b-L21-SNP-Discovery-xlsm

If you download this right now you will see it is dated 07/24/15 and I need to update it soon. I generally send a notice on the R1b-L21-project yahoo group when I update this file and display the link to it there.

I'll post more about the process and try to explain the Big Tree a little bit, but I think it is good for people to read the front page of Alex's Big Tree.

I never want to slow the Big Tree updates down, so as soon as I get new raw results I rename them (with kit #/MDKA surname) and upload them to the R1b-L21-project yahoo group.

George Chandler
10-01-2015, 12:12 AM
Are there plans to correct the errors for S1051 that I sent you within Alex's tree?

Mikewww
10-01-2015, 03:21 AM
...
Are there plans to correct the errors for S1051 that I sent you within Alex's tree?
...

I haven't had a chance to talk to him about it, but I think the best place to post about the Big Tree and L21 is over on the R1b-L21-project yahoo group, where he posts and from where he gathers most of the data.

He has a method and I believe he is very thorough. The Big Tree is his interpretation, and interpretations vary. He tends to be aggressive rather than conservative in retaining SNPs, but since it is a draft tree that is not a problem. If something is found to be inconsistent, he discards it at that point. I can't adequately defend his methods, though, any more than anyone else's.

Mikewww
10-01-2015, 03:32 AM
My primary role as it relates to NGS results is to contact project members and ask them to share their VCF results. I typically don't delve into interpretations other than the free analysis that Alex Williamson does. I don't even mention that unless the tester wants to know what kind of analysis is being done.

Over the last couple of years I've sent out requests to somewhere over 1,200 people. If I don't hear back, I will go into their myFTDNA account and see if they have a second email address. If they send me something, have problems with the steps or want more help, I might exchange several emails with one tester. Of course, since my primary source of contacts is FTDNA projects, my proactive requests for data sharing go to Big Y testers.

Here is the typical email I send:

"Hello ____, your Big Y results are in for _____. These results are only useful if shared so they can be compared across people. We currently have a comparison including more than 1000 L21 people. To do the comparison, please send me your raw results. Here's how.

Sign in to your myFTDNA account with your kit # (not as a project admin.)

Scroll down to the OTHER RESULTS section and select the orange BIG Y RESULTS.

On the BIG Y RESULTS screen select the blue DOWNLOAD RAW RESULTS button on the right.

At this point, you may see an error message related to "Houston". If that happens, please check again the next day. It may take a day or two for FTDNA to update these files.

If you don't get the error message, then just select the green DOWNLOAD .VCF on the left and finally SAVE FILE. (Do not select the .BAM file; it is quite large.)

A zipped (compressed) .VCF file folder should download very quickly.

Please email me the file you downloaded.

Thank you. I'll upload the file to a shared space for L21, and several analysts and I will try to compare it with the other raw results we have, to understand L21's tree down to the modern era. The primary discussion for this takes place at
https://groups.yahoo.com/neo/groups/R1b-L21-Project/info
Please join in if you haven't already."


The files they send me normally come with a strange file name that looks something like this.

bigy-012f70d9-0b6f-40e9-8fce-38f4ea2af6d3.zip

They are not too big, so they are fairly easy to handle. You can get a great deal of information out of the VCF and REGIONS files, although BAM files provide more. The key thing about VCF files is that they provide all of the derived calls made by FTDNA, even the ones they reject for quality or other reasons.
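For anyone curious what is inside one of these zips, here is a minimal Python sketch that opens the download and tallies the VCF records by their FILTER column, which is where passed calls are separated from rejected ones. It assumes the zip holds a single VCF ending in `.vcf` or `.vcf.gz`; that archive layout is an assumption and may differ between downloads. The function name `count_filter_values` is hypothetical.

```python
import gzip
import io
import zipfile
from collections import Counter


def count_filter_values(zip_source):
    """Tally VCF records by FILTER value (e.g. PASS vs. rejected calls).

    zip_source: a path or file-like object for the downloaded zip.
    Assumption: the zip holds one VCF, plain (.vcf) or gzipped (.vcf.gz).
    """
    with zipfile.ZipFile(zip_source) as zf:
        vcf_names = [n for n in zf.namelist() if n.endswith((".vcf", ".vcf.gz"))]
        if not vcf_names:
            raise ValueError("no VCF file found in " + str(zip_source))
        raw = zf.read(vcf_names[0])
    if vcf_names[0].endswith(".gz"):
        raw = gzip.decompress(raw)

    counts = Counter()
    for line in io.StringIO(raw.decode("utf-8")):
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7:
            counts[fields[6]] += 1  # column 7 of a VCF record is FILTER
    return counts
```

Calling `count_filter_values("bigy-012f70d9-0b6f-40e9-8fce-38f4ea2af6d3.zip")` would return a `Counter` keyed by FILTER value, so the number of rejected calls sits right next to the PASS count.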

Mikewww
10-01-2015, 12:29 PM
...
The files they send me normally come with a strange file name that looks something like this.

bigy-012f70d9-0b6f-40e9-8fce-38f4ea2af6d3.zip
...

Part of my role in this is to catalog these VCF compressed folders in a meaningful and consistent way so people can find them. The first step I do with each file is rename it from something unintelligible to something like this:

L513_U2464_Bierney_BigY_RawData_20150920.zip

This takes some time, as I have to gather the haplogroup label, kit #, surname (I try to use the MDKA surname), etc. and date the files. For each kit I've already started this process, because I check the GAP tool RECEIVED LAB RESULTS report for R-L21 about once a week. From that report I have to click into the myFTDNA account for each result to get the email ID.
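The naming pattern is regular enough to generate once the fields are gathered. Below is a small Python sketch, assuming the layout is always haplogroup label, kit #, MDKA surname, then `BigY_RawData_` and the date as YYYYMMDD. The function name `raw_data_filename` is hypothetical, and so is the choice to strip punctuation such as apostrophes (which happens to match the ONuallain file in the folder listing).

```python
import re
from datetime import date


def raw_data_filename(haplogroup, kit, surname, received):
    """Build a name like L513_U2464_Bierney_BigY_RawData_20150920.zip.

    haplogroup: terminal SNP label; kit: FTDNA kit number;
    surname: MDKA surname; received: date the file was obtained.
    """
    def clean(text):
        # Drop anything that is not a letter or digit (spaces, apostrophes,
        # underscores, slashes) so the underscore-delimited layout stays intact.
        return re.sub(r"[^A-Za-z0-9]", "", str(text))

    parts = [clean(haplogroup), clean(kit), clean(surname)]
    if not all(parts):
        raise ValueError("haplogroup, kit and surname must be non-empty")
    return "_".join(parts) + "_BigY_RawData_" + received.strftime("%Y%m%d") + ".zip"
```

For example, `raw_data_filename("L513", "U2464", "Bierney", date(2015, 9, 20))` gives `L513_U2464_Bierney_BigY_RawData_20150920.zip`. Putting the date as YYYYMMDD means alphabetical order matches date order.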

These things, like email IDs, kit #s, haplogroup labels, etc., are actually stored in my working copy of the R1b-L21_Discovery_V1 file. Externally I don't think there is much need for this spreadsheet any more, since the Big Tree has been enhanced over time to keep up with SNP names and the like. However, for myself, this is my log of who I've contacted and when, along with this kit-related information.
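A contact log like this can be kept as plain records. The Python sketch below is only an illustration under stated assumptions: the `KitRecord` fields, the `awaiting_files` helper and the 14-day follow-up window are all hypothetical, not the actual columns of the Discovery spreadsheet. It lists testers who were asked for their VCF but haven't sent it yet, i.e. the ones whose myFTDNA account might be worth checking for a second email address.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class KitRecord:
    """One row of a hypothetical contact log (fields are illustrative)."""
    kit: str
    surname: str
    haplogroup: str
    email: str
    contacted: date
    file_received: Optional[date] = None


def awaiting_files(log: List[KitRecord], today: date, days: int = 14) -> List[str]:
    """Kits contacted at least `days` ago with no file received yet."""
    return [r.kit for r in log
            if r.file_received is None and (today - r.contacted).days >= days]
```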

After I rename the files I upload them to the Big Y results folder of the L21 yahoo group. When I do that the yahoo group automatically posts/sends an email notice like this:

Subject: [R1b-L21-Project] New file uploaded to R1b-L21-Project
Sun, Sep 20, 2015 at 8:41 PM
to R1b-L21-Project


Hello,

This email message is a notification to let you know that
a file has been uploaded to the Files area of the R1b-L21-Project
group.

File : /Big_Y_results/L513_U2464_Bierney_BigY_RawData_20150920.zip
Uploaded by : mikewww7 <mwwdna@gmail.com>
Description :

You can access this file at the URL:
https://groups.yahoo.com/neo/groups/R1b-L21-Project/files/Big_Y_results/L513_U2464_Bierney_BigY_RawData_20150920.zip

Long ago, I also set up folders like this in the L21 group for FGC results, Geno 2 results and Chromo 2 results. I do not load those, though. The whole idea is that testers or project admins do this themselves. Of course, I don't have access to contact information or even knowledge of many of these non-Big Y results.

I've also set up similar folders in the R1b-DF27-project and R1b-P312-project yahoo groups. Razyn does a nice job of working the requests for Big Y raw results files for DF27. It's tedious, I can verify that. The only place I really do all of the email request/exchange work is the L21 Big Y results folder (and believe me, that's enough!). Another fellow has set up folders for storing AncestryDNA, YSEQ and 23andMe results for DF27.

Here is the link and below that are some examples of recent uploads as they are displayed at this web link:

https://groups.yahoo.com/neo/groups/R1b-L21-Project/files/Big_Y_results/

CTS3386_196841_ONuallain_BigY_RawData_20151001.zip ..... 278 KB ..... mikewww7 ..... 6:38 AM
M222_226023_McIntire_BigY_RawData_20150912.zip ..... 270 KB ..... mikewww7 ..... 2:33 PM
DF21_166653_MacFarland_BigY_RawData_20150910.zip ..... 268 KB ..... mikewww7 ..... 1:55 PM
CTS1751_107682_MacMaster_BigY_RawData_20150908.zip ..... 266 KB ..... mikewww7 ..... 1:20 PM
CTS4466_245010_Ober_BigY_RawData_20150930.zip ..... 261 KB ..... mccarthygen ..... 10:51 AM
DF49x_259442_Fancher_BigY_RawData_20150930.zip ..... 267 KB ..... mikewww7 ..... 8:47 AM
M222_B3380_Melloy_BigY_RawData_20150930.zip ..... 286 KB ..... mikewww7 ..... 8:46 AM
...
Z251_387196_Jack_BigY_RawData_20150918.zip ..... 282 KB ..... mikewww7 ..... Sep 23
DF41_221200_Stewart_BigY_RawData_20150923.zip ..... 259 KB ..... mikewww7 ..... Sep 23
L513_U2464_Bierney_BigY_RawData_20150920.zip ..... 271 KB ..... mikewww7 ..... Sep 20
...
L513_44265_Martin_BigY_RawData_20150918.zip ..... 283 KB ..... mikewww7 ..... Sep 18
...
CTS4466_31397_Hayes_BigY_RawData_20150905.zip ..... 275 KB ..... elizabethodonoghue ..... Sep 05
DF21_312546_McCarty_BigY_RawData_20150831.zip ..... 232 KB ..... mccarthygen ..... Sep 04

You can see that other folks besides myself upload these files. That's good!

All of the L21 Big Y raw results VCF folders that I have, as long as I have permission, are uploaded to this folder. It ends up being a directory of Big Y results for anyone who has joined the R1b-L21-project yahoo group. I'm not sure who all uses these files, but Alex Williamson uses them extensively for the Big Tree. Effectively, the role I play is administrative: supporting him and anyone else who is accessing these files.

You can sort the folder on the web page by subclade or by date. You can use Find (Ctrl-F) to look for a specific kit #, etc., as long as you are in the R1b-L21-project yahoo group.
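If you keep a local copy of the folder, the consistent naming makes the same sorting and searching easy to script. A minimal Python sketch, assuming every file follows the subclade_kit_surname_BigY_RawData_YYYYMMDD.zip pattern (names that don't match are simply skipped). `parse_names` and `find_kit` are hypothetical helpers, not anything the yahoo group provides.

```python
import re
from typing import List, NamedTuple

# Matches names like L513_U2464_Bierney_BigY_RawData_20150920.zip
NAME_RE = re.compile(
    r"^(?P<subclade>[^_]+)_(?P<kit>[^_]+)_(?P<surname>.+)"
    r"_BigY_RawData_(?P<date>\d{8})\.zip$"
)


class RawDataFile(NamedTuple):
    subclade: str
    kit: str
    surname: str
    date: str  # YYYYMMDD, so string order equals date order
    name: str


def parse_names(names: List[str]) -> List[RawDataFile]:
    """Parse file names that follow the convention; skip any that don't."""
    parsed = []
    for name in names:
        m = NAME_RE.match(name)
        if m:
            parsed.append(RawDataFile(name=name, **m.groupdict()))
    return parsed


def find_kit(files: List[RawDataFile], kit: str) -> List[str]:
    """Return the file names for a given kit #."""
    return [f.name for f in files if f.kit == kit]
```

Sorting by subclade is then `sorted(files, key=lambda f: f.subclade)`, and by date `sorted(files, key=lambda f: f.date)`.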

I do one special thing, just because those guys are special. I upload the M222 VCF raw results to the R1b-M222-project yahoo group. That's a bit redundant, but as I said those guys are special.

Of course, this is one of the reasons I use yahoo groups. I don't really like the formatting of yahoo groups, but we get a lot of free storage and can link to other places, etc.

Mikewww
10-01-2015, 09:59 PM

I've had a couple of requests to update the L21 Discovery V1 file, so I've loaded over twenty kits in the last two days, including U2464 Bierney, who is in my own subclade. Until I load them into this Discovery spreadsheet I don't look at the contents of the VCF and REGIONS data. This is my post on the yahoo group from just a few minutes ago.


Thu, Oct 01, 2015 at 4:42 PM
to R1b-L21-Project

I've updated the zipped/compressed spreadsheet version of the Big Y Discovery results.

Haplogroup SNP Discovery for R1b-L21 V1

This file is essentially a summary of the passed derived (positive) variants, with current SNP labels, for the people who have shared their results. It now includes 1,095 L21 people.

I downloaded the YBrowse data yesterday, so the SNP naming should be fairly current. The position numbers that are on Alex Williamson's STR and homopolymer list have red diagonal lines through them. Positions that I know of in palindrome, 125bp and X-similar regions have red dots on them. Short indels are light pink and long indels are dark pink.

James, I didn't see the VCF files for the kits you listed. I assume they are L555 types. Are they in the L21 project? If not, can you help me contact them to ask them to share their VCF raw results? I can update this fairly quickly if we get their VCF folders.

Regards,
Mike W

Forwarded:
On Fri, Sep 25, 2015 at 2:38 PM, James Irvine <jamesmirvine@hotmail.co.uk> wrote:

Mike, are you going to do an update of your excellent Discovery Analysis incorporating the BigY data for F54774, F364399, F280156, F87191, F280599 and Sept 11th? Did you receive this data?
https://groups.yahoo.com/neo/groups/R1b-L21-Project/conversations/messages/29941
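The core idea of the Discovery summary described in that posting (collect each person's passed variants and put current SNP labels on them) can be sketched in a few lines of Python. This is not how the spreadsheet is actually built. It assumes that a FILTER value of `PASS` marks a passed call, and that `snp_names` is a lookup from (position, derived allele) to an SNP name, built from something like a YBrowse download. The helper names and the `pos…` label for unnamed variants are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def passed_variants(vcf_lines: Iterable[str]) -> Dict[int, str]:
    """Return {position: ALT allele} for records whose FILTER is PASS.

    Only the POS, ALT and FILTER columns are used; other VCF details
    (quality, genotype fields) are ignored in this sketch.
    """
    result = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[6] == "PASS":
            result[int(fields[1])] = fields[4]
    return result


def discovery_summary(kits: Dict[str, Iterable[str]],
                      snp_names: Dict[Tuple[int, str], str]) -> Dict[str, Set[str]]:
    """Map each SNP label to the set of kits carrying it.

    Variants with no known name are labelled by position, like new SNPs.
    """
    summary = defaultdict(set)
    for kit_id, lines in kits.items():
        for pos, alt in passed_variants(lines).items():
            label = snp_names.get((pos, alt), "pos" + str(pos))
            summary[label].add(kit_id)
    return dict(summary)
```

Looking at labels carried by only one or two kits is a quick way to spot private or newly shared variants.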

I have a second, much more extensive spreadsheet I call Discovery V2, where I don't throw away any data and I grade SNP locations, test calls, phylogenetic consistency, etc. I only do that for L513, though, so Bierney will end up there when I get a chance.

However, please notice the timeline. I uploaded Bierney on the 20th and still have not really analyzed his data. That's okay because Alex already has, so all I really do is use another analysis method to cross-check the L513 men who are already positioned under L513 on the Big Tree.
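The colour flags described in the yahoo posting (STR/homopolymer positions, palindrome, 125bp and X-similar regions, short vs. long indels) amount to a lookup per variant. Here is a rough Python sketch of that lookup. The region intervals and the 5 bp short/long indel cut-off are made-up placeholders, not real coordinates or the spreadsheet's actual threshold. Real intervals would have to come from a published region list, and `str_positions` stands in for Alex's STR and homopolymer list. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Set, Tuple

# Hypothetical example intervals (1-based, inclusive, sorted, non-overlapping).
EXAMPLE_REGIONS: List[Tuple[int, int, str]] = [
    (1000, 2000, "palindrome"),
    (5000, 5200, "X-similar"),
]
LONG_INDEL_BP = 5  # hypothetical cut-off between "short" and "long" indels


def region_of(pos: int, regions: List[Tuple[int, int, str]]) -> Optional[str]:
    """Return the label of the region containing pos, or None."""
    starts = [start for start, _, _ in regions]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    if i >= 0:
        start, end, label = regions[i]
        if start <= pos <= end:
            return label
    return None


def variant_kind(ref: str, alt: str) -> str:
    """Classify a call by the length difference between its alleles."""
    size = abs(len(ref) - len(alt))
    if size == 0:
        return "SNP"  # equal-length substitutions are treated as SNPs here
    return "long indel" if size >= LONG_INDEL_BP else "short indel"


def flag(pos: int, ref: str, alt: str, str_positions: Set[int],
         regions: List[Tuple[int, int, str]] = EXAMPLE_REGIONS) -> List[str]:
    """Collect the kinds of flags the spreadsheet marks with colours."""
    flags = []
    if pos in str_positions:
        flags.append("STR/homopolymer")
    label = region_of(pos, regions)
    if label:
        flags.append(label)
    kind = variant_kind(ref, alt)
    if kind != "SNP":
        flags.append(kind)
    return flags
```

Using `bisect` on sorted region starts keeps each lookup fast even with thousands of intervals.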

Mikewww
10-16-2015, 10:11 PM
I've been uploading files over the last several weeks but just uploaded another 7 or 8 today and played catch-up on my tracking spreadsheet, adding 23 to it today. Several new M222 people are in.