Fraud and frequent preprocessing in Genetic data



tipirneni
03-31-2019, 03:25 PM
Some ethnicities are known to be emotional about genetic data, so they resort to pre-processing and post-processing of the genetic data from their test results. This processing is done to remove certain ethnic SNPs from their data. This may be OK as long as government rules and legislation don't catch them. If the genetic tests are used as health data, here are the kinds of cases that can be filed.

Laboratory Fraud
Clinical laboratories and independent diagnostic testing facilities – referred to as IDTFs – have often been defendants in False Claims Act cases brought by whistleblowers. Common types of fraud by laboratories and IDTFs include:


Billing tests that were performed by unlicensed personnel, or misrepresenting who performed the test.

Kulin
03-31-2019, 03:30 PM
How do you know about this?

tipirneni
03-31-2019, 03:44 PM
How do you know about this?

Based on published results from studies of these communities vs. the variations shown in the calc results & G25, validated by phenotype.

Outliers can be spotted easily & accounted for based on the possible additions in the populations

Censored
04-01-2019, 03:51 AM
What reason would someone have for doing this?

tipirneni
04-01-2019, 10:02 PM
What reason would someone have for doing this?

Many subcaste groups in India believe that they were pure, or that they used to have higher status, so they remove the Dalit SNPs using pre-processing and add some Aryan SNPs to make a statement. So they will be doing it. There are also instances of new converts claiming relation to Syeds etc. among Muslims.

Censored
04-01-2019, 11:56 PM
Many subcaste groups in India believe that they were pure, or that they used to have higher status, so they remove the Dalit SNPs using pre-processing and add some Aryan SNPs to make a statement. So they will be doing it. There are also instances of new converts claiming relation to Syeds etc. among Muslims.

How would it affect SNPs analyzed by DNA companies?

26284729292
04-02-2019, 01:28 AM
Based on published results from studies of these communities vs. the variations shown in the calc results & G25, validated by phenotype.

Outliers can be spotted easily & accounted for based on the possible additions in the populations

Can you give an example in relation to g25?

tipirneni
04-02-2019, 02:13 AM
Can you give an example in relation to g25?

Good question. I'm seeing many G25 coordinates missing some data. But right now it is beyond the scope of this thread.

pegasus
04-02-2019, 03:41 AM
Some ethnicities are known to be emotional about genetic data, so they resort to pre-processing and post-processing of the genetic data from their test results. This processing is done to remove certain ethnic SNPs from their data. This may be OK as long as government rules and legislation don't catch them. If the genetic tests are used as health data, here are the kinds of cases that can be filed.

Laboratory Fraud
Clinical laboratories and independent diagnostic testing facilities – referred to as IDTFs – have often been defendants in False Claims Act cases brought by whistleblowers. Common types of fraud by laboratories and IDTFs include:


Billing tests that were performed by unlicensed personnel, or misrepresenting who performed the test.

While the first portion of what you said is true for some who doctor their results superficially on boards, the rest of what you wrote makes no sense, and it's a rather sweeping and ridiculous statement. None of the DNA testing companies are hinged on pandering to the emotions of any ethnic group lol; modern populations cluster with their respective ethnic groups using any software, whether it's Davidski's G25, Poi's checkfit, PAST, qpAdm, or F3 stats. We have seen that a gazillion times over on this forum.

Donwulff
04-02-2019, 04:26 AM
This sounds like the Cracked.com "article" re-warmed. I was shocked to find that, in the wake of the "Fake News" trend, many, many people in online communities including Reddit, the ISOGG FB group etc. consider Cracked.com, which bills itself as "America's Only Humor Site", a legitimate, trusted breaking news site, at least as long as the articles confirm their beliefs.

The "Inside The Shady World Of DNA Testing Companies" article is so riddled with contradictions (the secret informant claims both to have been working there when tests were introduced and not to have been working there yet when an angry customer attacked them, for example) and technical whoppers (claiming that DNA tests require using mouthwash to purify your DNA, when it's explicitly forbidden by all DTC genetic testing companies, etc.) that it's a wonder anybody can read it without their head exploding. But it's become staple lore among everybody who disagrees with ancestry estimates for whatever reason.

At the same time these accusations are always tricky, because anybody anywhere can set up shop and start selling DNA ancestry tests that are completely made up. There are threads on LivingDNA forums right now about how LivingDNA is changing people's ancestry estimates to match after they've contacted customer service. However, for any DNA testing company which offers one-to-one matching to DNA relatives, doctoring actual DNA results would be immediately obvious when people don't match their relatives. And at least the major American DNA testing companies have no conceivable motive for tweaking ancestry analysis according to some country's internal divisions, which I've seen alleged in the past when people don't agree with their results.

agent_lime
04-02-2019, 04:44 AM
Good question. I'm seeing many G25 coordinates missing some data. But right now it is beyond the scope of this thread.

Prove your assertions. Who is missing data? What have they done?

Censored
04-02-2019, 08:26 AM
Good question. I'm seeing many G25 coordinates missing some data. But right now it is beyond the scope of this thread.

In what way are they “missing data”? If that were the case, you couldn’t even use them. Are you implying they’ve been edited somehow? If so, can you provide examples where you suspect this to be the case?

Strong claims require strong evidence.

Generalissimo
04-02-2019, 12:02 PM
Unbelievable.

tipirneni
04-02-2019, 01:53 PM
Prove your assertions. Who is missing data? What have they done?

Are you looking at all the G25 coords being posted, or just at the Khatri ones? Did you ever try to check the validity of each coord?

tipirneni
04-02-2019, 01:58 PM
In what way are they “missing data”? If that were the case, you couldn’t even use them. Are you implying they’ve been edited somehow? If so, can you provide examples where you suspect this to be the case?

Strong claims require strong evidence.

I am not using the G25 data. It is just a tool to post each individual's past admixture & group admixture.

agent_lime
04-02-2019, 02:33 PM
Are you looking at all the G25 coords being posted, or just at the Khatri ones? Did you ever try to check the validity of each coord?

The assertion is so big, you'll have to do the leg work. I still have to tell you this is some conspiracy-level stuff. First, you would have to work out what the person's preference is, and then skew the results that way by modifying the SNP data. That would take some machine learning and a serious bit of hardware. Add to that, these companies can and will get sued out of existence if it comes to light that they were modifying data based on racist or ethnic notions.

Many people do take multiple ancestry tests. If we find a user whose data you suspect has been modified, we could write a program to cross-check SNPs. Although I highly, highly doubt it.
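
A minimal sketch of the kind of cross-check program mentioned here, assuming both tests were exported as tab-separated raw data with four columns (rsid, chromosome, position, genotype) and `#` comment lines; other export layouts would need their own parser. The names `parse_raw` and `cross_check` are hypothetical. Genotypes are compared as unordered allele pairs and no-calls are skipped. Strand differences between companies are not handled, so they would show up as false mismatches.

```python
def parse_raw(text):
    """Parse raw-data text into {rsid: genotype}.

    Assumes four tab-separated columns: rsid, chromosome, position, genotype.
    Lines starting with '#' are treated as comments.
    """
    calls = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a data line in the assumed layout
        rsid = fields[0]
        genotype = fields[3].strip().upper()
        if not genotype or "-" in genotype:
            continue  # skip no-calls such as "--"
        # Sort the alleles so that "AG" and "GA" compare equal.
        calls[rsid] = "".join(sorted(genotype))
    return calls


def cross_check(calls_a, calls_b):
    """Return (number of shared SNPs, list of rsids whose genotypes differ)."""
    shared = sorted(set(calls_a) & set(calls_b))
    mismatched = [rsid for rsid in shared if calls_a[rsid] != calls_b[rsid]]
    return len(shared), mismatched


# Made-up example data, not real genotypes.
file_a = "# rsid\tchromosome\tposition\tgenotype\nrs1\t1\t100\tAG\nrs2\t1\t200\tCC\nrs3\t1\t300\t--\nrs4\t2\t50\tTT\n"
file_b = "rs1\t1\t100\tGA\nrs2\t1\t200\tCT\nrs3\t1\t300\tAA\nrs4\t2\t50\tTT\nrs5\t2\t90\tGG\n"

shared, mismatched = cross_check(parse_raw(file_a), parse_raw(file_b))
print(shared, mismatched)  # 3 ['rs2']
```

Two genuine tests of the same person should disagree only at a small fraction of shared SNPs (genotyping error), so a large mismatch rate would be the thing to look at more closely.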

xenus
04-02-2019, 05:23 PM
The assertion is so big, you'll have to do the leg work. I still have to tell you this is some conspiracy-level stuff. First, you would have to work out what the person's preference is, and then skew the results that way by modifying the SNP data. That would take some machine learning and a serious bit of hardware. Add to that, these companies can and will get sued out of existence if it comes to light that they were modifying data based on racist or ethnic notions.

Many people do take multiple ancestry tests. If we find a user whose data you suspect has been modified, we could write a program to cross-check SNPs. Although I highly, highly doubt it.

The first issue is I don't think they have any knowledge of what the thing they are alleging actually entails. I'm going to get technical here. The first part is about data falsification, and they claim the use of preprocessing, which, for anyone unfamiliar with its use in data science, means cleaning up and contextualizing data so you don't end up with a "garbage in, garbage out" situation. In this case it would have to apply before a sample is sequenced, which makes no sense; anything after is post-processing, because current consumer testing sequencing processes don't output any "noise" that needs to be handled by preprocessing. If data were to be falsified, it would be entirely a post-processing task, and one that you couldn't get away with because so many people use multiple testing companies.

At the end of the post are much more believable claims of financial fraud: using unlicensed lab techs but billing for work by licensed lab techs, whom you almost surely have to have doing the work legally in the first place. But that doesn't seem to have anything to do with falsifying customers' genetic data.

Unsampled or undersampled heterogeneity within a population/group, caused by isolation and drift on one hand and gene flow via admixture on the other, is to be expected. There are going to be outliers as well, but they are necessarily defined relative to the sample pool, which could just as easily be cherry-picked to conform to expectations in the first place. He actually said that removed outliers could be "verified by phenotype", which in this instance makes it easy to interpret his post and his replies as coming down to "I think data is being falsified because if a person doesn't look like X ethnicity as I am familiar with it, then the data should clearly reflect that".

pegasus
04-02-2019, 05:56 PM
I am not using the G25 data. It is just a tool to post each individual's past admixture & group admixture.

G25 is not even an Admixture calculator to begin with; from what DMXX said, it can even perfectly replicate the results of formal stats. Indeed, I can replicate results seen in formal papers.

You make borderline trollish posts at times but this one really takes the cake.

Censored
04-02-2019, 06:07 PM
The first issue is I don't think they have any knowledge of what the thing they are alleging actually entails. I'm going to get technical here. The first part is about data falsification, and they claim the use of preprocessing, which, for anyone unfamiliar with its use in data science, means cleaning up and contextualizing data so you don't end up with a "garbage in, garbage out" situation. In this case it would have to apply before a sample is sequenced, which makes no sense; anything after is post-processing, because current consumer testing sequencing processes don't output any "noise" that needs to be handled by preprocessing. If data were to be falsified, it would be entirely a post-processing task, and one that you couldn't get away with because so many people use multiple testing companies.

At the end of the post are much more believable claims of financial fraud: using unlicensed lab techs but billing for work by licensed lab techs, whom you almost surely have to have doing the work legally in the first place. But that doesn't seem to have anything to do with falsifying customers' genetic data.

Unsampled or undersampled heterogeneity within a population/group, caused by isolation and drift on one hand and gene flow via admixture on the other, is to be expected. There are going to be outliers as well, but they are necessarily defined relative to the sample pool, which could just as easily be cherry-picked to conform to expectations in the first place. He actually said that removed outliers could be "verified by phenotype", which in this instance makes it easy to interpret his post and his replies as coming down to "I think data is being falsified because if a person doesn't look like X ethnicity as I am familiar with it, then the data should clearly reflect that".

He seriously could have just gone with "I think people are doctoring their own calculator results after having received them". It wouldn't involve so much suspension of disbelief. Not that I believe this to be the case either.

agent_lime
04-02-2019, 07:36 PM
He seriously could have just gone with "I think people are doctoring their own calculator results after having received them". It wouldn't involve so much suspension of disbelief. Not that I believe this to be the case either.

Doctoring G25 in a specific way would take a lot of work. The only thing I can believe is that someone changes their Harappa, but even those will get caught when they post their oracles. Companies changing SNPs, when according to them I am 100% South Asian, the same as a Sri Lankan, seems ridiculous.

MonkeyDLuffy
04-04-2019, 06:07 AM
He seriously could have just gone with "I think people are doctoring their own calculator results after having received them". It wouldn't involve so much suspension of disbelief. Not that I believe this to be the case either.

We had a few cases of people altering their results on calcs, but they were caught right away. I don't think anyone is smart enough to edit their own data and take off markers they don't want.

Jatt1
04-05-2019, 05:21 AM
We had a few cases of people altering their results on calcs, but they were caught right away. I don't think anyone is smart enough to edit their own data and take off markers they don't want.

I think they can change the alleles of the well-known SNPs.

26284729292
04-05-2019, 05:26 AM
We had a few cases of people altering their results on calcs, but they were caught right away. I don't think anyone is smart enough to edit their own data and take off markers they don't want.

What happened in these instances/what was the context?

Censored
04-05-2019, 05:49 AM
What happened in these instances/what was the context?

There was one Pashtun guy from a long time ago who was altering his results to appear less South Asian shifted.

anglesqueville
04-05-2019, 06:41 AM
I may, of course, be wrong, but... I have worked a lot (and once again recently) with AIMs (Ancestry Informative Markers). My experiments were all rather disappointing, in the sense that those AIMs seem to be very little ... informative. I can hardly imagine how it would be possible to significantly change an individual's "ethnic make-up" by modifying by hand the genotypes of some of those AIMs, and a fortiori of a handful, even a big handful, of "markers". This whole story makes me very skeptical.

agent_lime
04-05-2019, 07:01 AM
I may, of course, be wrong, but... I have worked a lot (and once again recently) with AIMs (Ancestry Informative Markers). My experiments were all rather disappointing, in the sense that those AIMs seem to be very little ... informative. I can hardly imagine how it would be possible to significantly change an individual's "ethnic make-up" by modifying by hand the genotypes of some of those AIMs, and a fortiori of a handful, even a big handful, of "markers". This whole story makes me very skeptical.

It probably can be done. Find the markers that are more Western Eurasian, replace ASI ones. Then make a new raw file. It would take a lot of work to do manually though.

Donwulff
04-05-2019, 07:53 AM
Original poster said, "so remove the Dalit SNPs using pre-processing & add some Aryan SNPs to make a statement. So they will be doing it."

I really wanted to bite on that, because the idea of "Dalit SNPs"... The microarrays used in current ancestry testing test mostly for frequent variants; in fact, the definition of a SNP is something along the lines of a frequency of more than 1-2%. To become that common, variants need to be tens of thousands of years old; in other words, they were formed long before the populations they are used to measure. Unless the variant is a very recent one, it's going to be shared a lot by geographically close populations, and has independently occurred in geographically distant populations. Most modern ancestry estimation methods in fact use a haplotype-based approach for these reasons, not individual markers. It's quite doubtful any "Dalit SNPs" exist, and the suggestion of them existing in itself sounds discriminatory.

On the technical side though, I have bad news. Regardless of the method used for ancestry estimates, for an individual tester it would be as simple as copying raw data from someone of the desired ethnicity (many, many people have published their raw data) to run at a third party. Of course, why bother when you can just paintshop the results? As for the genetic genealogy companies themselves, having developed the analysis, of course they have the expertise to make changes to data to make it look like any ethnicity desired. But as I noted early on, if they actually changed the raw data, as was implied by the original poster, these people would no longer match their relatives in DNA relatives comparison.

So basically, there are no "SNPs for a specific ethnicity", and while tampering with the genetic data is feasible, for the test-taker Photoshop is much easier and more straightforward, and if a testing company did it, it would be immediately obvious. Adjusting ancestry analysis results is quite possible, but lacks motivation unless it can be shown that the party doing the analysis is of, or getting paid by, said ethnicity. Either way, it should be easy to provide specific proof of "Fraud and frequent preprocessing".
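
The relative-matching argument can be illustrated with a small sketch. It is not how any testing company implements relative matching; it only shows one simple consistency check, under the assumption that genotypes are already loaded as `{rsid: two-letter genotype}` dictionaries. A parent and child must share at least one allele at every autosomal SNP, so sites where they are homozygous for different alleles ("opposing homozygotes") should be close to zero apart from genotyping errors. The function name and the example genotypes are made up.

```python
def opposing_homozygotes(parent, child):
    """Count shared SNPs where both are homozygous but for different alleles."""
    count = 0
    for rsid, gp in parent.items():
        gc = child.get(rsid)
        if gc is None or len(gp) != 2 or len(gc) != 2:
            continue  # SNP not shared, or not a two-allele call
        parent_hom = gp[0] == gp[1]
        child_hom = gc[0] == gc[1]
        if parent_hom and child_hom and gp[0] != gc[0]:
            count += 1
    return count


# Made-up example genotypes.
parent = {"rs1": "AA", "rs2": "AG", "rs3": "CC"}
child = {"rs1": "AG", "rs2": "GG", "rs3": "CT"}

# Hypothetical tampering: two of the child's calls are overwritten.
tampered = dict(child, rs1="GG", rs3="TT")

print(opposing_homozygotes(parent, child))     # 0
print(opposing_homozygotes(parent, tampered))  # 2
```

Edits made to push an ancestry estimate in some direction would tend to create such impossible sites against untouched data from close relatives, which is why changed raw data would be hard to hide wherever relatives are compared.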

Vadim Verenich
04-05-2019, 05:12 PM
This smells like a conspiracy theory's stink to me.

Jatt1
04-05-2019, 06:58 PM
This smells like a conspiracy theory's stink to me.

Elaborate please.

Vadim Verenich
04-05-2019, 08:16 PM
Elaborate please.

The whole topic, which has been discussed for a while, is just another example of a very well-known informal fallacy:



Conspiracy Theory

(also known as: canceling hypothesis, canceling hypotheses, cover-ups)

Description: Explaining that your claim cannot be proven or verified because the truth is being hidden and/or evidence destroyed by a group of two or more people. When that reason is challenged as not being true or accurate, the challenge is often presented as just another attempt to cover up the truth and presented as further evidence that the original claim is true.

Logical Form:

A is true.

B is why the truth cannot be proven.

Therefore, A is true.

Example #1:

Noah’s ark has been found by the Russian government a long time ago, but because of their hate for religion, they have been covering it up ever since.

Example #2:

Geologists and scientists all over the world are discovering strong evidence for a 6000-year-old earth, yet because of the threat of ruining their reputation, they are suppressing the evidence and keeping quiet.

Example #3:
Some ethnicities are known to be emotional about genetic data, so genetic testing companies resort to pre-processing & post-processing of the genetic data from the test results. This processing is done to remove certain ethnic SNPs from their data.

tipirneni
04-07-2019, 10:17 PM
This smells like a conspiracy theory's stink to me.

I was also thinking along the same lines. Maybe some unseen money pot is investing in a few urban hotspots where a few disgruntled upper-caste people are trying to erase all other people from their genome history.

26284729292
04-09-2019, 10:58 PM
Original poster said, "so remove the Dalit SNPs using pre-processing & add some Aryan SNPs to make a statement. So they will be doing it."

I really wanted to bite on that, because the idea of "Dalit SNPs"... The microarrays used in current ancestry testing test mostly for frequent variants; in fact, the definition of a SNP is something along the lines of a frequency of more than 1-2%. To become that common, variants need to be tens of thousands of years old; in other words, they were formed long before the populations they are used to measure. Unless the variant is a very recent one, it's going to be shared a lot by geographically close populations, and has independently occurred in geographically distant populations. Most modern ancestry estimation methods in fact use a haplotype-based approach for these reasons, not individual markers. It's quite doubtful any "Dalit SNPs" exist, and the suggestion of them existing in itself sounds discriminatory.

On the technical side though, I have bad news. Regardless of the method used for ancestry estimates, for an individual tester it would be as simple as copying raw data from someone of the desired ethnicity (many, many people have published their raw data) to run at a third party. Of course, why bother when you can just paintshop the results? As for the genetic genealogy companies themselves, having developed the analysis, of course they have the expertise to make changes to data to make it look like any ethnicity desired. But as I noted early on, if they actually changed the raw data, as was implied by the original poster, these people would no longer match their relatives in DNA relatives comparison.

So basically, there are no "SNPs for a specific ethnicity", and while tampering with the genetic data is feasible, for the test-taker Photoshop is much easier and more straightforward, and if a testing company did it, it would be immediately obvious. Adjusting ancestry analysis results is quite possible, but lacks motivation unless it can be shown that the party doing the analysis is of, or getting paid by, said ethnicity. Either way, it should be easy to provide specific proof of "Fraud and frequent preprocessing".

This also seems oddly South Asian-specific with the use of these terms...
I don't think this is possible without an inordinate amount of work.

Which makes me wonder where any thought of its existence is coming from.

Tomenable
04-09-2019, 11:16 PM
There was a discussion about 23andMe's methodology here:

http://www.anthrogenica.com/showthread.php?4576-Do-you-trust-23andme-s-Ancestry-Composition-or-GEDmatch-calculators-more&p=145261&viewfull=1#post145261

Quote:

"23andMe's speculative mode greatly overestimates major components, and underestimates minor components. This is due to their methodology of snipping the genome into 100 SNP segments to compare against the limited references they have. So for example, if 60% of the segment indicates Middle Eastern, and 40% indicates South Asian, that segment is assigned 100% Middle Eastern. In effect 40% of the segment, which is South Asian is ignored, and the whole segment is assigned Middle-Eastern.

Also, their methodology includes segment smoothing, which means if there are chunks of minor components in a segment, they are ignored.

That is how Iranians and West Asians turn out 98-100% Middle Eastern, and folks in neighboring Pakistan turn out 98-100% South Asian in speculative mode.

This naturally is unrealistic and uninformative, because you don't need a test to tell you that. Conservative mode is better with regards to inflation of major components and underestimation of minor components, but the trouble here is that people get 5-70% unassigned. This is where your minor components are hidden.

The above translates to 23andMe being useless for figuring out your minor components to any degree of accuracy."

====

But I don't think any other company is using the same "purifying methodology".
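
A toy sketch of the effect described in the quote, assuming (as the quote claims, not as a confirmed description of 23andMe's pipeline) that the genome is cut into fixed windows of 100 SNPs and each window is assigned wholesale to whichever label is most common inside it. The per-SNP labels, the function names, and the 60/40 mixture are all made up for illustration.

```python
from collections import Counter


def window_majority(labels, window=100):
    """Assign every SNP in each fixed-size window to that window's majority label."""
    assigned = []
    for start in range(0, len(labels), window):
        chunk = labels[start:start + window]
        winner = Counter(chunk).most_common(1)[0][0]
        assigned.extend([winner] * len(chunk))
    return assigned


def proportions(labels):
    """Fraction of SNPs carrying each label."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Five windows, each with 60 "ME" and 40 "SA" per-SNP labels (made-up data).
per_snp = (["ME"] * 60 + ["SA"] * 40) * 5

print(proportions(per_snp))                   # {'ME': 0.6, 'SA': 0.4}
print(proportions(window_majority(per_snp)))  # {'ME': 1.0}
```

Under that assumption, a minor component that never wins a window disappears from the total, which is the inflation of major components the quote complains about.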

26284729292
04-09-2019, 11:42 PM
There was a discussion about 23andMe's methodology here:

http://www.anthrogenica.com/showthread.php?4576-Do-you-trust-23andme-s-Ancestry-Composition-or-GEDmatch-calculators-more&p=145261&viewfull=1#post145261

Quote:

"23andMe's speculative mode greatly overestimates major components, and underestimates minor components. This is due to their methodology of snipping the genome into 100 SNP segments to compare against the limited references they have. So for example, if 60% of the segment indicates Middle Eastern, and 40% indicates South Asian, that segment is assigned 100% Middle Eastern. In effect 40% of the segment, which is South Asian is ignored, and the whole segment is assigned Middle-Eastern.

Also, their methodology includes segment smoothing, which means if there are chunks of minor components in a segment, they are ignored.

That is how Iranians and West Asians turn out 98-100% Middle Eastern, and folks in neighboring Pakistan turn out 98-100% South Asian in speculative mode.

This naturally is unrealistic and uninformative, because you don't need a test to tell you that. Conservative mode is better with regards to inflation of major components and underestimation of minor components, but the trouble here is that people get 5-70% unassigned. This is where your minor components are hidden.

The above translates to 23andMe being useless for figuring out your minor components to any degree of accuracy."

====

But I don't think any other company is using the same "purifying methodology".

Yeah, but this is very different from "Aryan" and "Dalit" SNPs. This sounds more like a South Asian-specific accusation, as in certain users/people "doctor" their results to appear more "Aryan".

26284729292
04-09-2019, 11:43 PM
I was also thinking along the same lines. Maybe some unseen money pot is investing in a few urban hotspots where a few disgruntled upper-caste people are trying to erase all other people from their genome history.

Where have you seen this? I'm legitimately curious as to where this is coming from.

Tomenable
04-09-2019, 11:46 PM
Yeah, but this is very different from "Aryan" and "Dalit" SNPs. This sounds more like a South Asian-specific accusation, as in certain users/people "doctor" their results to appear more "Aryan".

Well, Dalits do tend to have slightly more indigenous South Asian DNA.

Anyway as for that methodology:

I don't think doing that is "wrong" or "right", but people should understand 23andMe shows recent and geographical (not deep or ethnic / "racial") ancestry.

That's why they claim what they do about the timeframe (that their test goes only a few centuries back, to Early Modern Era).

So what Maciamo claims on Eupedia (that 23andMe goes back even thousands of years ago, and he uses their data in his "racial admixture maps") is wrong.

See what I'm talking about: https://www.eupedia.com/europe/autosomal_maps_dodecad.shtml#23andMe

E.g. map of "Italian admixture": https://www.eupedia.com/images/content/23andMe_Italian.png

^^^
Less than 1% Italian DNA in Britain and in much of France and Iberia, which is almost certainly wrong considering how significant the Roman impact was there:

http://science.sciencemag.org/content/363/6432/1230

^^^
This study detected high Italian admix only in Iberia, but brace yourselves: future aDNA studies about France and Southern Britain will also find Roman DNA. Reich lately suggested Roman impact in Southern England. And I claimed this back in 2015/2016 on forums (that the South English have some Italian admix). There is also no way that all of France (Gaul) started speaking Latin and building advanced Roman-type cities without a large influx of actual Roman settlers.

26284729292
04-10-2019, 12:17 AM
Well, Dalits do tend to have slightly more indigenous South Asian DNA.

Anyway as for that methodology:

I don't think doing that is "wrong" or "right", but people should understand 23andMe shows recent and geographical (not deep or ethnic / "racial") ancestry.

That's why they claim what they do about the timeframe (that their test goes only a few centuries back, to Early Modern Era).

So what Maciamo claims on Eupedia (that 23andMe goes back even thousands of years ago, and he uses their data in his "racial admixture maps") is wrong.

See what I'm talking about: https://www.eupedia.com/europe/autosomal_maps_dodecad.shtml#23andMe

E.g. map of "Italian admixture": https://www.eupedia.com/images/content/23andMe_Italian.png

^^^
Less than 1% Italian DNA in Britain and in much of France and Iberia, which is almost certainly wrong considering how significant the Roman impact was there:

http://science.sciencemag.org/content/363/6432/1230

^^^
This study detected high Italian admix only in Iberia, but brace yourselves: future aDNA studies about France and Southern Britain will also find Roman DNA. Reich lately suggested Roman impact in Southern England. And I claimed this back in 2015/2016 on forums (that the South English have some Italian admix). There is also no way that all of France (Gaul) started speaking Latin and building advanced Roman-type cities without a large influx of actual Roman settlers.

I don't disagree with your claim. But again, I don't think he's referring so much to the 23andMe interpretation as to doctoring the SNPs themselves in anticipation of how admix calculators work. E.g. an individual who is 35% South Indian on Harappa World (35% "Dalit", for instance) altering his SNPs to replace some of the "South Indian" SNPs with "Caucasian/NE Euro" (other groups on Harappa) SNPs, making himself appear more Aryan and hence distanced from other local groups/castes.

JerryS.
04-10-2019, 04:18 AM
How in the world do you alter/change SNP data?

Jatt1
04-14-2019, 06:19 PM
How in the world do you alter/change SNP data?

It is a text file that assigns each SNP its alleles (A, T, C, or G), and all you have to do is change the assigned A to T, or C to G, etc., afaik.