Announcement


Skeptiko forums moved

The official forums of the Skeptiko podcast have moved to http://skeptiko.com/forum/.
As such, these forums are now closed for posting.

Statistical Biases in Parapsychology


  • Originally posted by Maaneli View Post
    As we've discussed before, I think there's something missing in your account. The binomial test is used to test a null hypothesis, or the question 'what is the binomial probability that a certain ratio of hits to misses in an experiment such as the ganzfeld occurs by random chance?' This null hypothesis is N1. In other words, one assumes the null hypothesis is true for each trial of a ganzfeld experiment, *and then* computes the binomial probability for the observed proportion of hits to misses.
    No. You're letting the tail wag the dog. The validity of a statistical test depends on the data conforming to certain assumptions. For instance, a test that assumes the data come from a normal distribution will not generally yield valid results if the data are substantially non-normal. For the binomial test, the assumptions are twofold: that each observation resulted from a Bernoulli trial with a common, but unknown, probability of success p, and that each trial is independent of every other trial. If these two assumptions are not met, then the binomial test results will generally not be valid, and the test should not be used. However, if those assumptions are met, then the population distribution of your data is a member of the binomial family of distributions. In other words, for the binomial test to be valid, the data have to come from some (common) binomial distribution. If that is true, then you additionally assume that p = p₀, the value of p under the null hypothesis, and compute the p-value for the test. If that p-value is less than the chosen significance level of the test, you conclude that the null hypothesis has been rejected. That is, that the common p does not equal p₀; the data belong to a binomial distribution with some other value of p.

    Consider what would happen if you conducted the binomial test without meeting the minimal assumptions of the test. Then a rejection of the null hypothesis leads to the useless conclusion that the data are unlikely under the null. Okay, but what else can be concluded? Nothing. It does not follow that there is a p ≠ p₀, for all or even some of the data. All we can conclude is that the data weren't derived from a binomial(p₀) distribution.
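    For concreteness, here is a minimal sketch of the procedure just described, in Python. The counts are hypothetical, and it assumes scipy (whose binomtest function performs the exact calculation):

```python
# Minimal sketch of an exact binomial test against a null value p0.
# The counts are hypothetical; requires scipy >= 1.7 for binomtest.
from scipy.stats import binomtest

hits, trials, p0 = 32, 100, 0.25   # illustrative ganzfeld-style data
result = binomtest(hits, trials, p=p0, alternative="greater")
print(result.pvalue)               # reject the null at level alpha if pvalue < alpha
```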

    And of course, when one assumes the null is true in an experiment such as the ganzfeld, it automatically implies independence and constant hit probability for each ganzfeld trial, because of how the ganzfeld trials are designed.
    That sentence is nonsensical. The assumption of the binomial test (whether the null is true or not) is that the data are independent with equal probability of success. It does not "imply" it "because of how the ganzfeld trials are designed."

    That's why one can use the exact binomial test on a single ganzfeld experiment, and that's why one can apply the binomial test to the pooled hits and trials of a ganzfeld meta-analysis (assuming all studies have 4-choice design).
    One can use the binomial test on a single ganzfeld experiment if the trials meet the binomial assumptions, which (if we ignore the issue of multiple trials per subject) they do; that is, the trials are independent and identically distributed Bernoulli trials. And that is exactly the reason that we can't (or, rather, shouldn't) apply the test to a collection of ganzfeld experiments if there is evidence of heterogeneity—because heterogeneity means that the trials are not independent and identically distributed.

    Here is what you are claiming: that if a collection of k independent experiments are each binomial(p_i) for i = 1 to k, with common probability of success p₀ under the null hypothesis, then a binomial test of the combined data that rejects the null implies that p_i ≠ p₀ for some i. That is an interesting conjecture that sounds superficially plausible, but it does not follow from your argument. It needs to be rigorously proved, and maybe it has been, but I have not seen a proof of it.
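    One way to probe that conjecture empirically (a simulation sketch, not a proof, with assumed and purely illustrative parameters) is to estimate the pooled test's rejection rate when every experiment really does have p_i = p₀:

```python
# Simulation sketch: k independent experiments, all truly at p0, pooled and tested.
# Every rejection here is a false positive; the rate should be at most alpha.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
k, n, p0, alpha, reps = 5, 50, 0.25, 0.05, 2000   # assumed, purely illustrative values

rejections = 0
for _ in range(reps):
    pooled_hits = int(rng.binomial(n, p0, size=k).sum())   # pool hits across the k experiments
    p_value = binomtest(pooled_hits, k * n, p=p0, alternative="greater").pvalue
    rejections += p_value < alpha

print(rejections / reps)   # empirical false-positive rate of the pooled test
```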
    Last edited by jt512; May 11th, 2013, 05:07 PM.

    Comment


    • Originally posted by jt512 View Post
      No. You're letting the tail wag the dog. The validity of a statistical test depends on the data conforming to certain assumptions. For instance, a test that assumes the data come from a normal distribution will not generally yield valid results if the data are substantially non-normal. For the binomial test, the assumptions are twofold: that each observation resulted from a Bernoulli trial with a common, but unknown, probability of success p, and that each trial is independent of every other trial. If these two assumptions are not met, then the binomial test results will generally not be valid, and the test should not be used. However, if those assumptions are met, then the population distribution of your data is a member of the binomial family of distributions. In other words, for the binomial test to be valid, the data have to come from some (common) binomial distribution. If that is true, then you additionally assume that p = p₀, the value of p under the null hypothesis, and compute the p-value for the test. If that p-value is less than the chosen significance level of the test, you conclude that the null hypothesis has been rejected. That is, that the common p does not equal p₀; the data belong to a binomial distribution with some other value of p.
      In all the different references I've read defining the assumptions of a binomial experiment, I have never seen one like the one you're using. In particular, none of the references say that the common probability of success p is "unknown". In fact, in all the examples I've read of a binomial experiment, p is always taken equal to the probability one would expect under the null hypothesis, and is determined by the particular design of the given binomial experiment. For example, from stattrek:

      Here is an example of a binomial experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because:

      -The experiment consists of repeated trials. We flip a coin 2 times.
      -Each trial can result in just two possible outcomes - heads or tails.
      -The probability of success is constant - 0.5 on every trial.
      -The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.

      http://stattrek.com/probability-dist.../binomial.aspx

      In other words, because the design is a coin-flipping experiment (two possible outcomes, heads or tails), p = 0.50, i.e. the probability of 'success' under the null hypothesis.
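      Stated numerically (a trivial sketch using scipy, not something taken from the stattrek page), the coin example works out to the familiar binomial probabilities:

```python
# P(k heads in 2 fair flips), k = 0, 1, 2, under the binomial(n=2, p=0.5) model.
from scipy.stats import binom

for k in range(3):
    print(k, binom.pmf(k, n=2, p=0.5))   # 0.25, 0.50, 0.25
```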

      Do you have a specific reference you can show me from which you are deriving your particular definition of a binomial experiment? And can you post that definition verbatim here?

      Also, I still don't understand why you think that intraclass correlations (a consequence of heterogeneity) imply that the trials are not independent. Heterogeneity, as far as I can see, has no impact on the independence assumption. For example, suppose I do two ganzfeld experiments, each of 100 trials, but in the first one I use only selected subjects, and in the second one I only use unselected subjects. Suppose also that, as per my meta-analytic finding that selected subjects perform significantly better than unselected subjects, the experiment using the former produces a significantly higher hit rate (as indicated by a chi-squared test) than the experiment using the latter.

      As stattrek states, trials are independent when "getting heads on one trial does not affect whether we get heads on other trials." Conversely, then, if the trials were dependent, this would mean that 'getting heads on one trial affects whether we get heads on other trials'.

      It is clear, though, that the fact that the hypothetical experiment with selected subjects produces a significantly higher hit rate than the experiment with unselected subjects does not imply that the trials of the experiment with selected subjects affected the hit probability of the trials of the experiment with unselected subjects (i.e. it does not imply that the trials between the two experiments were dependent on each other). They were simply two separate experiments, using two different types of subjects, one of which used subjects that, for whatever reason (ESP or otherwise), had a significantly greater ability to identify the targets. So, given that the trials of the two experiments are in fact 'independent' of each other, it is OK to pool the hits and trials of the two experiments and perform an exact binomial test for p = p₀. In fact, this would be effectively equivalent to running a single 200-trial experiment in which for the first 100 trials we use selected subjects, and for the next 100 trials we use unselected subjects, and then performing an exact binomial test for p = p₀.
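      Here is a sketch of the pooled test being described; the hit counts are invented, and the point is only that pooling two hypothetical 100-trial experiments and testing the combined counts is numerically the same as testing one 200-trial experiment:

```python
# Pooling two hypothetical 100-trial experiments and testing the combined counts.
from scipy.stats import binomtest

hits_selected, n_selected = 35, 100        # hypothetical selected-subject experiment
hits_unselected, n_unselected = 27, 100    # hypothetical unselected-subject experiment

pooled = binomtest(hits_selected + hits_unselected,
                   n_selected + n_unselected,
                   p=0.25, alternative="greater")
print(pooled.pvalue)   # identical to testing one 200-trial experiment with 62 hits
```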





      Originally posted by jt512 View Post
      That sentence is nonsensical. The assumption of the binomial test (whether the null is true or not) is that the data are independent with equal probability of success. It does not "imply" it "because of how the ganzfeld trials are designed."
      It's not nonsensical. Recall the coin-flipping example above; the "equal probability of success" for the coin-flipping experiment is the p under the null, i.e. 0.5, because there are only two possible outcomes - heads or tails. If we were considering a ganzfeld experiment instead, p would be 0.25 under the null because there is 1 actual target out of 4 possible targets.



      Originally posted by jt512 View Post
      One can use the binomial test on a single ganzfeld experiment if the trials meet the binomial assumptions, which (if we ignore the issue of multiple trials per subject) they do; that is, the trials are independent and identically distributed Bernoulli trials. And that is exactly the reason that we can't (or, rather, shouldn't) apply the test to a collection of ganzfeld experiments if there is evidence of heterogeneity—because heterogeneity means that the trials are not independent and identically distributed.
      I think there's an inconsistency in your view. If you agree that the trials in a single ganzfeld experiment do in fact meet the binomial assumptions (i.e. independence of trials and common p across trials), then you must agree that my example of a single 200-trial ganzfeld experiment, where the first half of the trials are done with selected subjects and the second half with unselected subjects, also meets the binomial assumptions. And since my example is fundamentally no different from my other example of running two separate 100-trial ganzfeld experiments - one using selected subjects, the other using unselected subjects - and pooling the hits and trials across the two experiments, to be consistent, you must also agree that the single 'experiment' obtained from pooling the hits and trials satisfies the binomial assumptions. But since you don't, I think that's an inconsistent position. And I think the inconsistency is in the (unless you can show otherwise) mistaken assumption that heterogeneity means that the trials are not independent with common p.
      Last edited by Maaneli; May 15th, 2013, 06:22 PM.

      Comment


      • Originally posted by fls
        Jay, I wonder what your thoughts are on the problem that we already know that the probability under the null* is not 0.25 in every case. Since the target is selected with replacement among four choices, there will be variation in the numbers of different target choices within each experiment. And in combination with preferential selection of certain targets, the actual probability under the null will be different from study to study. Using an extreme example to illustrate what I mean...if subjects preferentially always choose the first picture presented to them, then the actual probability under the null will be the number of times the target was randomized to the first position. So in one study this may be 0.22, in another it may be 0.32, and in a third it may be 0.21.
        I disagree. I don't think that the null probability depends on the observed distribution of targets, assuming that target selection is random; that is, that the targets selected for each trial are mutually independent with equal probability (a priori). And I don't think that systematic subject preferences for certain targets or positions of targets changes this.

        Let's assume that the null hypothesis is true, that target selection is random as defined above, that a single set of four targets is used in every trial, and that every subject picks "Target #1" every time. Now it may turn out that Target #1 gets randomly selected by the randomizing mechanism in more than 25% of the trials, perhaps even often enough that the null hypothesis is wrongly rejected. But we know the probability that that will happen: it's the significance level of the test, α. So, if it happens, it is just a run-of-the-mill Type 1 error.

        To see that the observed distribution of targets doesn't matter, it might help to think of a single trial, in which the subject first guesses the target from a pool of 4 possible targets, and only afterwards the target is randomly selected by a valid random number generator. Again, assume that the null hypothesis is true, and that our subject would always guess "Target #1" no matter what. Now, after the subject makes his guess, what is the probability that the random number generator will pick Target #1? Obviously, it is .25. What else could it be? Clearly the target that is eventually randomly selected can't be a factor in what the null probability should be. And if this is true for a single trial, it must be true for a multi-trial experiment. And if it is true if the subject guesses before the target is randomly selected, it ought to be true if the reverse order is used. After all, what difference could the order possibly make, as long as no information about the target was communicated to the subject?
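        The thought experiment is easy to simulate (a sketch with an arbitrary seed and trial count): the subject always guesses Target #1, the targets are drawn uniformly at random, and the hit rate hovers around .25.

```python
# Subject always guesses Target #1; targets are drawn uniformly from {1, 2, 3, 4}.
import numpy as np

rng = np.random.default_rng(1)
trials = 10_000
targets = rng.integers(1, 5, size=trials)   # uniform draws from 1..4
hit_rate = np.mean(targets == 1)            # the fixed guess is "Target #1"
print(hit_rate)                             # approximately 0.25 under the null
```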
        Last edited by jt512; May 16th, 2013, 01:17 AM.

        Comment


        • Originally posted by jt512 View Post
          Let's assume that the null hypothesis is true, that target selection is random as defined above, that a single set of four targets is used in every trial, and that every subject picks "Target #1" every time. Now it may turn out that Target #1 ends up being randomly selected by the randomizing mechanism in more than 25% of the trials, perhaps even often enough that the null hypothesis is wrongly rejected. But we know the probability that that will happen: it's the significance level of the test, α. So, if it happens, it is just a run-of-the-mill Type 1 error.
          Well, well, we agree about something, then. That happens to be exactly what I pointed out in a prior thread:

          To counter this argument, once made by Hyman, I will quote (as Maaneli has often done) John Palmer's response:

          Hyman's criticism is a potentially serious one because it could apply not just to the PRL studies but to all the other ganzfeld studies and, in fact, to most studies in parapsychology that use a procedure where the frequency of each target is free to vary. Parapsychologists almost never perform the kind of control analyses reported by Bem (1994). Has Hyman unwittingly undermined a huge amount of the psi literature? The answer is no, and the reason is that the bias at issue is not systematic. In other words, it could just as easily lead to a deficiency of hits as to an excess of hits, depending upon whether the most frequent targets contradict or match participant response biases. As Hyman himself notes, this effect cancels itself out over a large number of studies. That is one reason we do replications; successes based on such 'lucky' matches of targets and response biases will not replicate very well.
          I completely agree with Palmer, of course, but would add that we can quantify the probability that any given experiment would thus produce a spuriously significant result at—you guessed it—p=.05. This is a truism, of course, since under the null hypothesis the probability that any given experiment yields a significant result is, by definition, .05.
          I take it you agree with Palmer?

          Comment


          • Originally posted by fls
            I agree that the experiment can be conceptually regarded as described. Putting aside the ganzfeld condition and any senders, we have a group of subjects who are shown a set of pictures and asked to "pick one". And they will show preferences in what they pick - that is, each picture does not have equal probability of being chosen. But this doesn't matter because they will be randomly assigned to one of four picture groups, and they will purportedly have equal probability of being assigned to one of those groups.

            So to go back to the part in bold, what is our null and what is the false positive rate for that null?
            The null hypothesis is that the probability of a correct guess is .25. The false positive rate is the chosen significance level of the test, α.

            Is it really going to be the same as our alpha? (I mean in practice, not in theory.)
            Yes. If the experiment is conducted and reported honestly, then the false positive rate will be the significance level of the binomial test. If the experiment is conducted or reported dishonestly, can its validity be restored by using an alternative analysis? I doubt it.

            You suggest that the null hypothesis is the probability of group assignment. Why?
            Because if the probability of being "assigned" a particular target is p, then the expected proportion of correct responses under the null is p.

            This is not the case in medical trials, for example, even though it could be.
            I don't know what you mean by that. The null hypothesis in a clinical trial is that there is no difference in outcomes between treatment groups. In a Ganzfeld experiment, differences between groups defined by target are not of primary interest.

            Even in Chi-square tests, the expected values are calculated based on the observed frequencies.
            I don't know why you say "even in" chi-squared tests. A chi-squared test explicitly depends on the observed frequencies. It's a non-parametric test. I see no reason why you couldn't analyze a Ganzfeld experiment using a chi-squared test, but it will have a type 1 error rate of α, just like the binomial test. However, it will (I suspect) have lower power than the binomial test.
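            For what it's worth, here is a sketch of the two analyses side by side on the same hypothetical counts (scipy's chisquare is used as the goodness-of-fit form of the chi-squared test):

```python
# The same hypothetical ganzfeld counts analyzed two ways.
from scipy.stats import binomtest, chisquare

hits, n = 35, 100                                       # hypothetical data
print(binomtest(hits, n, p=0.25).pvalue)                # exact binomial test (two-sided)
print(chisquare([hits, n - hits],
                f_exp=[0.25 * n, 0.75 * n]).pvalue)     # chi-squared against the chance proportions
```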

            If the argument for the validity of this approach depends upon group assignments showing up with the frequency predicted by the binomial distribution...
            The validity of the binomial test does not depend on the observed distribution of target frequencies. It depends on the assumptions of the binomial test: that each output of the randomization mechanism is an independent and identically distributed Bernoulli trial.

            ...what do you make of the example I linked to where the randomizer produced a set of group assignments which would be very infrequent in that binomial distribution?
            If the randomization mechanism was valid, then it was simply a random error, and resulted in an erroneous rejection of the null hypothesis. Shit happens, but in the frequentist hypothesis-testing paradigm we think it's OK if it happens with probability α.

            (Remember, we aren't actually given the results of the groups assignments in the ganzfeld experiments, only a summary statistic, so knowing the distribution in that one case was an aberration.)
            If the randomization procedure was valid, and the test conducted and reported honestly, then an unlikely distribution occurred, but that doesn't invalidate the test. If the randomization procedure was invalid, that suggests deep flaws in the experiment or the honesty with which it was conducted or reported, and I would not trust an analysis using the observed target frequencies to save the experiment.

            To determine whether the randomization was valid, it would be useful for investigators to report the results of the randomization. If over a number of experiments there was an inordinate imbalance in target selection, then that would indicate that at least some of the experimental results are invalid. If this were found out to be true, then I don't think the answer would be to use a different analysis.
            Last edited by jt512; May 16th, 2013, 04:26 PM.

            Comment


            • Originally posted by fls
              The trial can also be regarded as "a target is randomly selected, what is the probability the subject will choose that picture?" And in that case, p is unknown under the null.

              Even if you conceive of trials in this manner, p is still 0.25 because targets are randomly selected.

              Comment


              • Originally posted by fls
                Look more carefully at what I said. The target has already been selected.

                Linda

                Oh right, I see. So here, you would have two factors contributing to your p – response bias and random variation in target selection. Unknown, like you say. So you would have to compare frequencies from two groups, right? Something like signal detection theory. Interesting approach, but I don’t see how this invalidates the binomial method (or are you saying that at all?). You make the point that we can sometimes get different answers when we treat our data differently. That seems an obvious point to me. More important is that we make a prediction before we apply our test and that the result is replicated a good number of times.

                Comment


                • Originally posted by fls
                  I'm sorry. My questions were rhetorical, but that wasn't clear. I understand how you go about specifying the false positive rate and the expected portion of correct guesses when the experiment is regarded conceptually as "a guess is made and the probability of randomly selecting the corresponding target is p".

                  The "why?" refers to "why this concept?" The trial can also be regarded as "a target is randomly selected, what is the probability the subject will choose that picture?" And in that case, p is unknown under the null. Conceptually, this is more in line with what we propose is happening (we suspect that the guesses change under the alternate hypothesis, not the target selection). And it is more in line with what I am used to with medical studies - once the intervention groups are randomly selected, we look for differences in outcomes, rather than describing differences in outcomes and asking for the probability of randomly selecting corresponding therapies.
                  In a randomized clinical trial, the goal is to determine if there is a difference in an outcome, say mortality, due to a difference in a treatment between the groups. When the goal is to assess a difference between the treatment groups, the intuitive thing to do is to compare the odds of death in each treatment group. But, you could, as you know, do the unintuitive thing—compare the odds of assignment to treatment group between survivors and non-survivors. Although one approach feels more natural than the other, they are mathematically equivalent, and yield identical odds ratios and p-values.
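                  That symmetry is easy to check on any 2×2 table (a sketch with made-up counts; Fisher's exact test is used here only because it reports both the odds ratio and an exact p-value):

```python
# The odds ratio and exact p-value of a 2x2 table are unchanged when the
# roles of exposure (treatment) and outcome (survival) are swapped (transposed).
import numpy as np
from scipy.stats import fisher_exact

table = np.array([[30, 70],    # hypothetical: treatment A -> 30 deaths, 70 survivors
                  [50, 50]])   #               treatment B -> 50 deaths, 50 survivors

or_by_treatment, p_by_treatment = fisher_exact(table)   # compare odds of death across treatments
or_by_outcome, p_by_outcome = fisher_exact(table.T)     # compare odds of treatment across outcomes
print(or_by_treatment, or_by_outcome)                    # identical odds ratios
print(p_by_treatment, p_by_outcome)                      # identical p-values
```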

                  It seems to matter. To take an actual example, the GUSTO trial compared accelerated TPA (aTPA) to other regimens in the treatment of acute MI and showed a clear advantage. As a two-choice example, when aTPA was compared to streptokinase plus IV heparin, the difference in 30-day mortality (6.3% vs. 7.4%) was significant at p=0.003. If we look at it from our other perspective (the probability of being assigned a particular target, or p=0.5 under the null), then the "hit rate" for aTPA* is 50.5%, which is no longer significant (p=0.15, two-tailed).

                  *If the hit rate in a two-choice ganzfeld test is "when subject guesses A the randomly selected target is A, and when subject guesses B the randomly selected target is B", then in a two-group medical trial the hit rate is "when subject lives (A) the randomly selected target is aTPA, and when the subject dies (B) the randomly selected target is SK+IV heparin". In this case, the hit rate is 9695+770 out of a total of 20721.
                  You've computed an agreement statistic, when agreement was not the hypothesis. The independent variable (drug) and dependent variable (mortality) aren't even compatible for a test of agreement (there is no natural pairing). The only reason you could even compute the statistic is that you had a two-treatment–by–two-outcome table. Try computing an analogous statistic for a four-treatment–by–two-outcome table!

                  But agreement makes sense in a ganzfeld trial, because there is a natural pairing between the column categories (the assigned targets, A, B, C, and D) and the row categories (the guessed targets, A, B, C, and D).

                  It looks like we get different answers depending upon how we regard our null hypothesis. So how do we go about resolving this issue? (not rhetorical)
                  For starters, let's not do agreement tests to test for group differences.

                  I'm not saying that an analysis using the observed frequencies would be useful. I'm just pointing out that whether we end up rejecting the null (and the value of the z-score) depends upon the observed frequencies, despite ignoring the observed frequencies. As you have pointed out, when observed frequencies behave like they are expected to behave, we could fairly safely ignore them (which I agree with).
                  To be specific, what I said is that if the null hypothesis is true and the randomizing process valid, then the probability that the null will be (erroneously) rejected is the significance level of the test (α). But if you use a test based on observed target frequency (say a chi-square test of independence between the assigned and the guessed targets) your probability of erroneously rejecting the null will still be α. So what have you gained?

                  To determine whether the randomization was valid, it would be useful for investigators to report the results of the randomization. If over a number of experiments there was an inordinate imbalance in target selection, then that would indicate that at least some of the experimental results are invalid. If this were found out to be true, then I don't think the answer would be to use a different analysis.
                  Don't go soft on me now! You already said that it was sufficient to know that each output is an independent and identically distributed Bernoulli trial, and that "shit happens" was not sufficient reason to invalidate the test.
                  Did I say Bernoulli? Then I wasn't thinking. The outputs should be from a uniform distribution.

                  And I don't think I'm contradicting myself, or "going soft." In a clinical trial report, the authors not only explain the randomization procedure, they also show the outcome of the randomization so that readers can make judgments about it. Parapsychologists should do the same. A target frequency distribution that implausibly deviates from uniform would be cause to question the validity of the study, and if such a pattern were observed across studies, it would be cause to question the legitimacy of the field.
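                  One simple way to make that judgment (a sketch with hypothetical counts) is a chi-squared goodness-of-fit test of the reported target frequencies against a uniform distribution:

```python
# Are the reported target frequencies plausibly uniform? (Hypothetical counts.)
from scipy.stats import chisquare

target_counts = [31, 22, 28, 19]          # hypothetical frequencies of targets A-D
print(chisquare(target_counts).pvalue)    # default expected frequencies are uniform
```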
                  Last edited by jt512; May 17th, 2013, 09:00 PM.

                  Comment


                  • Just a quick note to let everyone know that our quick analysis of Selected subjects vs. Unselected in the non-meta-analysed data did not replicate the pattern previously seen, of Selected subjects out-scoring Unselected ones.

                    Our figures (still waiting for details on one experiment) are:

                    Unselected scored 87 hits in 297 trials, for a hit rate of 29.3%, marginally significant at p=0.052

                    Selected scored 24 hits in 147 trials, for a hit rate of 16.3%, which is significant in a negative direction, p=0.0078.

                    Both of those are one-tailed.
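                    For anyone who wants to check those one-tailed figures, here is a sketch using scipy's exact binomial test against the .25 chance rate ("greater" for the above-chance unselected data, "less" for the below-chance selected data):

```python
# One-tailed exact binomial tests against the 25% chance hit rate.
from scipy.stats import binomtest

print(binomtest(87, 297, p=0.25, alternative="greater").pvalue)  # unselected studies
print(binomtest(24, 147, p=0.25, alternative="less").pvalue)     # selected studies
```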

                    So, this indicates that the missing experiments from Storm et al’s meta-analysis do not score lower due to the ratio of Selected/Unselected participants, and so there remains this large amount of data which is significantly lower than the meta-analysed papers, and this goes through the years, right back to the seventies. Whether deliberate or not, this is a sign of bias on the part of parapsychologists.

                    Comment


                    • Originally posted by fls
                      Originally posted by jt512
                      In a randomized clinical trial, the goal is to determine if there is a difference in an outcome, say mortality, due to a difference in a treatment between the groups.
                      Right. And this seems to be what we think is happening in ganzfeld trials - that the guess is due to the target (ignore the "between groups" part). Different target, different guess. So back to the question I asked.
                      I don't think we can ignore the "between groups" part. The reason we analyze clinical trials by computing and comparing outcomes per group, and not by computing that weird statistic you came up with and comparing it to a chance global rate, is that the question we want to answer directly is, What is the between-groups difference? In contrast, in a ganzfeld experiment the primary hypothesis is not a between-groups question, but a global one: do people guess the right target more often than they would by chance alone?

                      The trial can also be regarded as "a target is randomly selected, what is the probability the subject will choose that picture?" And in that case, p is unknown under the null. Conceptually, this is more in line with what we propose is happening (we suspect that the guesses change under the alternate hypothesis, not the target selection).
                      No. The overall hit probability under the null is known. As long as the random number generation process is valid (and there is no excuse for it not to be), then that probability is .25.

                      When we think that the guesses are different depending upon what target is chosen, why are we doing the unintuitive thing and asking whether the target randomization is different depending upon what picture the subject chooses?
                      We're not. We're forming a binomial random variable that the guess is correct or not, and then comparing the observed probability with the probability under the null (.25, if there are four choices).

                      If the null hypothesis in the ganzfeld is described as "what is the probability that the subject will choose the picture selected as the target?" (rather than "what is the probability that the randomizer will select the picture which the subject chose?"), doesn't that probability vary from trial to trial?
                      Conditionally on the selected target? Yes. Unconditionally? No.

                      One fewer assumption to mess with the frequency with which you reject the null?
                      If the random number generation process is valid, then the unconditional binomial analysis is valid, with the probability of a type 1 error equal to the significance level of the test. This is the frequentist ideal. Furthermore, the result of the analysis, the overall (i.e., without respect to target) probability of a guess being correct, is straightforward to calculate and interpret.

                      There is no excuse for the random number generation process being invalid. Nowadays you can practically get a cryptographically secure random number generator for free with the purchase of a box of Crackerjacks. Even in the old days, perfectly good random number tables could be, and were, used instead. Therefore, if the observed frequency distribution of targets in an experiment implausibly differs from a uniform distribution, then the experiment is invalid—not because the random generator is invalid, but because the investigators have corrupted the experiment in one way or another. Such a corrupted experiment cannot be saved by using an analysis that is conditional on target.
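                      To illustrate how cheaply a valid randomizer can be had (a sketch; the target labels are arbitrary), Python's standard library alone provides a cryptographically strong uniform choice among four targets:

```python
# Cryptographically strong, uniform selection of one target for a single trial.
import secrets

targets = ["A", "B", "C", "D"]
print(secrets.choice(targets))   # each target has probability 0.25, independently per call
```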

                      In a clinical trial report, the authors not only explain the randomization procedure, they also show the outcome of the randomization so that readers can make judgments about it.
                      They don't just show it, they use it in their analyses.
                      Use it in their analysis? How?

                      Parapsychologists should [disclose the outcome of their randomization of targets]. A target frequency distribution that implausibly deviates from uniform would be cause to question the validity of the study, and if such a pattern were observed across studies, it would be cause to question the legitimacy of the field.
                      Have you looked at this in regard to the outcome of the randomization in medical trials?
                      I'm not sure what you are referring to by "this." I have certainly looked at whether the randomization in clinical trials resulted in balance of known potential confounding variables, which, besides blinding, is what it is intended to do.
                      Last edited by jt512; May 20th, 2013, 10:03 PM.

                      Comment


                      • Originally posted by Ersby View Post
                        So there remains this large amount of data which is significantly lower than the meta-analysed papers, and this goes through the years, right back to the seventies.
                        Andrew, is the evidence of the existence of this un-meta-analyzed data throughout the years succinctly summarized somewhere?
                        Last edited by jt512; May 20th, 2013, 09:57 PM.

                        Comment


                        • Originally posted by Ersby View Post
                          Just a quick note to let everyone know that our quick analysis of Selected subjects vs. Unselected in the non-meta-analysed data did not replicate the pattern previously seen, of Selected subjects out-scoring Unselected ones.

                          Our figures (still waiting for details on one experiment) are:

                          Unselected scored 87 hits in 297 trials, for a hit rate of 29.3%, marginally significant at p=0.052

                          Selected scored 24 out of 147 trials, for a hit rate of 16.3%, which is significant in a negative direction, p=0.0078.

                          Both of those are one-tailed.

                          So, this indicates that the missing experiments from Storm et al’s meta-analysis do not score lower due to the ratio of Selected/Unselected participants, and so there remains this large amount of data which is significantly lower than the meta-analysed papers, and this goes through the years, right back to the seventies. Whether deliberate or not, this is a sign of bias on the part of parapsychologists.
                          Nice work Andrew. Could it not be a quality issue: those studies adhering to more established GZ protocols are more likely to be selected in an MA - and give higher hit rates than unselected studies following irregular protocols?

                          Comment


                          • Originally posted by Ersby View Post
                            Just a quick note to let everyone know that our quick analysis of Selected subjects vs. Unselected in the non-meta-analysed data did not replicate the pattern previously seen, of Selected subjects out-scoring Unselected ones.

                            Our figures (still waiting for details on one experiment) are:

                            Unselected scored 87 hits in 297 trials, for a hit rate of 29.3%, marginally significant at p=0.052

                            Selected scored 24 out of 147 trials, for a hit rate of 16.3%, which is significant in a negative direction, p=0.0078.

                            Both of those are one-tailed.

                            So, this indicates that the missing experiments from Storm et al’s meta-analysis do not score lower due to the ratio of Selected/Unselected participants, and so there remains this large amount of data which is significantly lower than the meta-analysed papers, and this goes through the years, right back to the seventies. Whether deliberate or not, this is a sign of bias on the part of parapsychologists.
                            Andrew, why did you post these results unilaterally? I thought you agreed to my suggestion for all four of us (me, Johann, Iyace, and you) to have a discussion about how to best present the results before doing so.
                            Last edited by Maaneli; May 20th, 2013, 10:41 PM.

                            Comment


                            • Originally posted by Michael Duggan View Post
                              Nice work Andrew. Could it not be a quality issue: those studies adhering to more established GZ protocols are more likely to be selected in an MA - and give higher hit rates than unselected studies following irregular protocols?
                              It's not exactly Andrew's work. I did all those calculations he reported (and he left out other calculations I did), and Johann, Iyace, Andrew, and I collectively sorted through the relevant studies to find out which used selected subjects and which used unselected subjects.

                              Comment


                              • Originally posted by Ersby View Post
                                Just a quick note to let everyone know that our quick analysis of Selected subjects vs. Unselected in the non-meta-analysed data did not replicate the pattern previously seen, of Selected subjects out-scoring Unselected ones.

                                Our figures (still waiting for details on one experiment) are:

                                Unselected scored 87 hits in 297 trials, for a hit rate of 29.3%, marginally significant at p=0.052

                                Selected scored 24 out of 147 trials, for a hit rate of 16.3%, which is significant in a negative direction, p=0.0078.
                                Also, just to add to this, Fisher's exact p = .0035 (highly significant) for the difference between the overall hit rates of the unselected and selected studies.
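                                Here is a sketch of how that comparison can be computed from the hit/miss counts implied by the figures above, using scipy's Fisher exact test:

```python
# 2x2 table of hits and misses, unselected vs selected studies.
from scipy.stats import fisher_exact

table = [[87, 297 - 87],    # unselected: hits, misses
         [24, 147 - 24]]    # selected:   hits, misses
oddsratio, p = fisher_exact(table)   # two-sided by default
print(oddsratio, p)
```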


                                Originally posted by Ersby View Post
                                So, this indicates that the missing experiments from Storm et al’s meta-analysis do not score lower due to the ratio of Selected/Unselected participants,
                                True, but it does show that the selected subjects score significantly lower than chance expectation, and significantly lower than unselected subjects. That is, these results can't be explained by the null hypothesis, and are opposite to the usually expected parapsych bias.

                                Comment
