Hello Chris. Is it possible that the 1/290,000 estimate, derived from the fact that only one person in the database (of 29,000 entries) had 314.1C, is the basis of the error--given Sir Alec's comment that he was a decimal off?
I have to say I'm a bit sceptical about the "decimal point" explanation of where the 1 in 290,000 frequency estimate came from. The article suggests it was really 1 in 29,000 according to the GMI database (which now has 34,617 sequences, but would have had less when the "Eddowes" match was found).
It does seem odd, Chris. It looks like some sort of speculation by the journalist, unless the notion was put forward by one of the authorities.
The key thing is, as you say, that they can no longer hide behind the notion, put forward on these forums many times, that the non-scientists cannot know as much as the scientists.
This was presumably their strong point. I cannot believe that they were keeping stronger evidence back. This was their best shot, and it's absolutely wrong.
It'll be interesting to see what defences they put forward in the coming days.
You've studied this a bit Chris. To get a figure such as 1 in 290,000 out, presumably you have to put a sequence in. Is that right? And if so then the sequence becomes one of the 290,000?
In fact what Jari got out was the figure 0.000003506 which he claimed was around 1 in 290,000.
How would you get a figure like that out?
In the article it was Sir Alec who said that the database is too small and couldn't possibly give a figure of 1 in 290,000. But could it give 0.000003506?
No, because those two figures are equivalent.
There are two difficulties with getting a figure of 1 in 290,000 for 314.1C from that database:
(1) The database would be too small to give that figure, even if only one sequence in the database matched and
(2) the search engine doesn't show 314.1C as rare at all, because it's clever enough to work out that it's the same as 315.1C.
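To put point (1) in numbers, here is a rough sketch. It assumes the frequency is just the number of matches divided by the number of sequences in the database; the thread doesn't say how the search engine actually computes its estimate, so treat that as an assumption. The function name is made up for illustration.

```python
def min_nonzero_frequency(n_sequences):
    """Smallest non-zero frequency a plain count/size estimate can give:
    exactly one matching sequence out of n_sequences (assumed estimator)."""
    return 1.0 / n_sequences

claimed = 1 / 290_000  # the frequency quoted as "1 in 290,000"

# Database sizes mentioned in the thread: roughly 29,000 then, 34,617 now.
for n in (29_000, 34_617):
    floor = min_nonzero_frequency(n)
    print(f"{n} sequences: rarest possible = {floor:.3e}, "
          f"rarer than claimed? {floor < claimed}")
```

On that assumption, a single match could only come out as rare as 1 in 290,000 if the database held at least 290,000 sequences.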
I may have it. The figure 0.000003506 is about 1 in 285,000. If JL inadvertently added an extra zero in his note to RE, then that's what he'd have got. If the figure was actually 0.00003506 then you get 1 in 28,522.
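For anyone who wants to check the arithmetic, a quick sketch: it just takes the reciprocal of each figure to turn a frequency into a "1 in N" value. The variable names are mine, and the second figure is the hypothetical one with a zero removed, not something quoted from the book or the article.

```python
def one_in(frequency):
    """Convert a frequency such as 0.000003506 into the N of '1 in N'."""
    return 1.0 / frequency

reported = 0.000003506   # figure quoted in the thread
one_zero_fewer = 0.00003506  # hypothetical: same digits, one zero fewer

print(f"reported:       1 in {one_in(reported):,.0f}")
print(f"one zero fewer: 1 in {one_in(one_zero_fewer):,.0f}")
```

The two results differ by a factor of ten, which is all a slipped decimal place would do.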
"The key thing is, as you say, that they can no longer hide behind the notion, put forward on these forums many times, that the non-scientists cannot know as much as the scientists."
Indeed. Nor can they claim that it is incomprehensible that scientists can make a simple error.
I may have it. The figure 0.000003506 is about 1 in 285,000. If JL inadvertently added an extra zero in his note to RE, then that's what he'd have got. If the figure was actually 0.00003506 then you get 1 in 28,522.
I think - my maths is not great.
Yes, I think the author of the article had something like that in mind (though there would still be the difficulty that if you put 314.1C into the search engine, it doesn't just return one match, but tens of thousands).
The "Eddowes" DNA analysis seems to have been done in early 2013. I've just been looking at the EMPOP release history. It did have around the right number of sequences for a single one to represent 1 in about 28500 at that time. On 5 September 2012, there was release 8, with 26073 sequences, and on 17 January 2013, release 9, with 29444 sequences.
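A small sketch of that comparison, again assuming a single match simply counts as 1 divided by the number of sequences (an assumption, since the real calculation isn't described here). The release sizes are the ones quoted above; 0.00003506 is the hypothetical figure with one zero fewer.

```python
# Release sizes quoted from the EMPOP release history.
releases = {
    "release 8 (5 September 2012)": 26073,
    "release 9 (17 January 2013)": 29444,
}

# Database size implied if a single match gave a frequency of 0.00003506.
implied_size = 1 / 0.00003506

for name, n in releases.items():
    print(f"{name}: one match = {1 / n:.3e} (1 in {n})")

print(f"implied size for 0.00003506: about {implied_size:,.0f} sequences")
```

The implied size falls between the two releases rather than matching either exactly, so it fits only as an approximation.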
Apparently those were discrete steps, though. However, where this is referred to in the book, the phrase "based on the latest available information" is used, and elsewhere I have seen JL suggest he may have used data that weren't publicly available.
Is it possible he was given access to a version of the database that hadn't been publicly released, and for that reason didn't use the standard search engine, which would have corrected for the error of nomenclature? Maybe, but if he really made two serious errors regarding this one sequence variation, that is really disturbing.
Mmmmm. Of course, the 'latest available information' would date from when he sent the email to RE and that could have been ages ago.
Scientists claim that work by a genetic expert that appeared to unmask Jack the Ripper is wrong, and the notorious murderer's identity still remains a mystery 126 years after the string of killings.