Timm, U. (2000). The Problem of Finding an Optimal Measure of
Effect Size for Psi Experiments. Proceedings of Presented Papers: The Parapsychological Association
43rd Annual Convention, (pp. 292-301).
Abstract
The measurement of effect size (ES) has
a long history in parapsychology, though the earlier attempts did not use this
special statistical term. As early as 1935, R. H. Thouless presented an index of
"psi efficiency" (later called the ESP quotient) that measures the
relative number of psi-effected (or "true") hits. He used a
straightforward statistical derivation separating the hits into true hits and
chance hits. Over the years, his index was discussed and redefined by several
authors, among them the present author (1971), who termed it the psi-coefficient (ψ)
and also derived an alternative index (ψ')
using an extended model, in which the single trials are divided into several
subtrials. Other parapsychologists used or discussed the ES measure z/√n,
which later became commonly used (e.g., Schmidt 1971). At that time, however,
most parapsychologists showed little interest in this topic.
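As a rough illustration of the two kinds of measure mentioned above, the following sketch assumes the usual correction-for-guessing form of the true-hit model, in which the observed hit rate h equals ψ + (1 − ψ)p, so that ψ = (h − p)/(1 − p). This form, the function names, and the example data are assumptions made for illustration; the abstract itself does not state the formula. The z-score uses the normal approximation to the binomial.

```python
import math


def psi_coefficient(hits, n, p):
    """Estimated proportion of true hits, assuming h = psi + (1 - psi) * p."""
    h = hits / n  # observed hit rate
    return (h - p) / (1.0 - p)


def z_score(hits, n, p):
    """Normal-approximation z for `hits` successes in n trials at chance rate p."""
    return (hits - n * p) / math.sqrt(n * p * (1.0 - p))


def z_over_sqrt_n(hits, n, p):
    """The widely applicable effect size measure z / sqrt(n)."""
    return z_score(hits, n, p) / math.sqrt(n)


# Hypothetical data: 60 hits in 250 trials with 5 target alternatives (p = 0.2).
hits, n, p = 60, 250, 0.2
print("psi       =", round(psi_coefficient(hits, n, p), 4))
print("z         =", round(z_score(hits, n, p), 4))
print("z/sqrt(n) =", round(z_over_sqrt_n(hits, n, p), 4))
```

Under these assumptions the two measures differ only in how the excess hit rate h − p is scaled: ψ divides it by 1 − p, whereas z/√n divides it by √(p(1 − p)).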
No new development arose until
meta-analyses were introduced. Then the interest clearly increased, but it was
directed to widely applicable ES measures such as z/√n.
As a result, it was overlooked that ESP experiments involve a mixture of a
perceptual and a guessing component, which outside parapsychology occurs only in a
few multiple-choice designs. The guessing component can be eliminated only by a
special ES measure adjusted to this situation. The common ES measures are not
adjusted in this way, and neither is Rosenthal and Rubin's more recent index π
(1989), although, strangely enough, it was presented as an ES measure for the
multiple-choice case.
A special feature of the true hit model
is the conclusion that, given positive and constant ES, the significance of ESP
experiments must increase not only with n, but also with decreasing hit
probability p (or with increasing number of target alternatives k = 1/p). For
instance, the classical ESP experiments with 5 alternatives should lead to
z-scores twice as high as experiments with 2 alternatives. Unfortunately,
Rosenthal and Rubin's derivation suggests the contrary: given the very low ES
typical of parapsychology, it leads to a recommended "k-best" of
nearly 2! Since their derivation is based not on a realistic psychometric model
but rather on an ad hoc scale transformation, its application to ESP and other
multiple-choice experiments is quite problematic. Besides the index ψ,
at most the author's alternative index ψ' can be recommended (in fact also for
PK experiments), since it represents a compromise between the extremes.
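The factor of two between 5 and 2 alternatives stated above can be checked numerically. This is a minimal sketch, assuming the usual correction-for-guessing form of the true-hit model (expected hit rate ψ + (1 − ψ)/k) and the normal approximation for z; under these assumptions the expected z equals ψ·√n·√(k − 1). The function name and the example values of ψ and n are hypothetical and are not taken from the paper.

```python
import math


def expected_z(psi, n, k):
    """Expected z-score for n trials with k equiprobable target alternatives,
    assuming a constant proportion psi of true hits (assumed model)."""
    p = 1.0 / k                  # chance hit probability
    h = psi + (1.0 - psi) * p    # expected hit rate: true hits plus chance hits
    return n * (h - p) / math.sqrt(n * p * (1.0 - p))


# Hypothetical values: same effect size and trial count, different k.
psi, n = 0.05, 400
z2 = expected_z(psi, n, 2)
z5 = expected_z(psi, n, 5)
print("k = 2:", round(z2, 4))
print("k = 5:", round(z5, 4))
print("ratio:", round(z5 / z2, 4))
```

Because the expected z grows with √(k − 1) in this model, the ratio between k = 5 and k = 2 is √4/√1 = 2 regardless of the particular ψ and n chosen.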