These notes detail methodological and other issues which create reasonable doubt over whether the featured report substantiated positive impacts from the Unplugged curriculum, and over the feasibility of adequately implementing the interventions.
Randomisation was compromised when 24 of the 102 schools allocated to Unplugged pulled out of the study. Just three of the 68 control schools pulled out, suggesting (as the researchers surmised) that it was the burden of implementing the interventions, rather than the research elements, which deterred most schools. Since this excess drop-out was unrelated to any of the characteristics they measured, the researchers argued it was unlikely to have affected the development of substance use among the schools' pupils. An alternative case can be made that schools least prepared to invest in substance use prevention were winnowed out from the Unplugged arm of the study, leaving a more promising set of schools than remained in the control arm. Given how small the differences in outcomes between the two arms generally were, this winnowing could conceivably account for them. These and other losses to the study meant that of the 7079 pupils who filled in baseline surveys, only 5541 contributed analysable data at the latest follow-up survey.
Post-allocation pull-out was one reason why what started as 290 schools eligible for and invited to join the study was whittled down to 138 at the final follow-up. Another major contributor was that 120 were never allocated, mainly because they were unable to schedule the intervention during the next school year. The number of schools which pulled out casts doubt over whether Unplugged really was as feasible for schools to implement as its creators had intended. This, plus the loss of pupils from the study, means the findings can only be considered applicable to the roughly half of schools prepared to take on the burden of the research and interventions, and to the minority of the entire pupil population taught in such schools who also completed the surveys required by research projects.
What is known for certain is that the parental and peer-leader supplements did not prove feasible. Parent workshops were implemented by schools, but often inadequately, and generally parents stayed away. On reflection the researchers thought the peer activities demanded too high a level of leadership for these early teens. They also found that despite the strong motivation of the teachers, implementation of the core curriculum itself was "just moderate", seemingly because the lessons took too long and teachers found some of the activities (such as role-play) difficult.
Some findings which did meet criteria for statistical significance might not have done so using alternative ways of testing the results. One alternative which did eliminate all statistically significant differences at the latest follow-up was to assume that all the children not followed up were engaging in the substance use being tested. As the authors pointed out, the small numbers known to have used substances in these ways were overwhelmed by the unrealistic assumption that all 1538 children missing from the analysis had done so. It would also be unrealistic to assume none had progressed in their substance use over the past 18 months, an assumption which left the findings unaltered. In between is the unknown real picture, one which may or may not have eliminated some or all of the statistically significant differences.
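As a rough sketch of how such a sensitivity analysis works (using made-up arm sizes and event counts, not the trial's data), a simple two-proportion z-test can be run under each imputation assumption: only the pupils actually followed up, all missing pupils counted as substance users, and none so counted.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation
    return z, p

# Hypothetical figures, not the trial's: 2000 followed-up pupils per arm,
# 500 missing per arm, with slightly fewer users in the intervention arm.
obs_control, obs_unplugged = 300, 240
n_followed, n_missing = 2000, 500

# Complete-case analysis: only pupils actually followed up.
_, p_complete = two_proportion_z(obs_unplugged, n_followed, obs_control, n_followed)

# Worst-case imputation: every missing pupil counted as a user.
_, p_all_used = two_proportion_z(obs_unplugged + n_missing, n_followed + n_missing,
                                 obs_control + n_missing, n_followed + n_missing)

# Best-case imputation: no missing pupil counted as a user.
_, p_none_used = two_proportion_z(obs_unplugged, n_followed + n_missing,
                                  obs_control, n_followed + n_missing)
```

With these invented numbers the complete-case and none-used analyses stay below the conventional 1-in-20 threshold, while counting every missing pupil as a user swamps the observed difference and pushes the result above it, mirroring the pattern the authors describe.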
Especially if multiple outcomes tend not to covary, the more are measured, the more likely it is that some will reach the threshold for a statistically significant difference purely due to chance variations in the samples rather than any real impact of the interventions being tested. For example, by convention, if a difference would happen only 1 in 20 times by chance, it is considered a non-chance occurrence possibly due to the intervention. But if, say, 20 independent outcomes are measured, more often than not at least one would cross this threshold purely by chance. To cater for this, it is recommended¹ that researchers consider raising the threshold (in the example, according to some adjustment methods, to as high as 1 in 400) before each of the outcomes is considered to have reflected a statistically significant difference.
1 International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. "ICH harmonised tripartite guideline: statistical principles for clinical trials." Statistics in Medicine: 1999, 18, p. 1905–1942.
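The arithmetic behind these figures is simple to check, assuming (as the text does) 20 independent outcomes and the conventional 1-in-20 threshold:

```python
alpha = 0.05  # conventional 1-in-20 per-test significance threshold
k = 20        # number of independent outcomes measured

# Chance that at least one of the k tests crosses the threshold by chance
# alone: about 0.64, i.e. more likely than not.
family_wise_error = 1 - (1 - alpha) ** k

# Bonferroni adjustment: require each test to pass a stricter threshold
# of alpha / k, which here is 0.0025, i.e. 1 in 400.
bonferroni_threshold = alpha / k
```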
No such adjustments were made for the multiple outcomes tested in the study to reduce the possibility that some differences were found statistically significant purely by chance. In the featured report, Unplugged schools had a significant advantage on two out of seven measures of the prevalence of different types of substance use. However, one such measure was missing – any use of illicit drugs. The first follow-up found this had not been significantly reduced by the interventions. Assuming this remained the case, on just two of eight prevalence measures did the Unplugged schools record a significant advantage. Also, just five of the 27 possible transitions between these types of substance use were significantly affected by the lessons. Lack of adjustment for multiple tests was less of a problem than it might have been because several measures were not completely separate (eg, only pupils who have smoked at all can also smoke frequently). Nevertheless, the findings do not adequately support statements about the impact of the programme on substance use as a whole. The most which can be said so far is that among the pupils who could be included in the final follow-up, the programme reduced drunkenness, and while it could not be shown to have reduced smoking or cannabis use, it may have done so.
Also it is unclear whether statistical significance was decided on the basis of so-called 'one-tailed' tests, which effectively assume that a negative finding (in this case, that the interventions made things worse) must have been a meaningless fluke. This kind of test roughly doubles the chance of finding a significant positive effect relative to the usual 'two-tailed' test. Sometimes one-tailed statistical tests are justified by appeal to the study's aim to test whether the intervention is better than the alternative, not whether it is better or worse, or by virtue of the expectation that the intervention will be better. But "a one-tailed test is only well justified if in addition to the existence of a strong directional hypothesis, it can be convincingly argued that an outcome in the wrong tail [ie, finding the intervention was counterproductive] is meaningless and might as well be dismissed as a chance occurrence" (Abelson R.P. Statistics as principled argument. Lawrence Erlbaum Associates, 1995).
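The doubling can be seen directly from the standard normal distribution; this sketch (with a hypothetical test statistic, not one taken from the study) contrasts the two kinds of p-value:

```python
import math

def p_two_tailed(z):
    """Two-sided p-value: extreme results in either direction count."""
    return math.erfc(abs(z) / math.sqrt(2))

def p_one_tailed(z):
    """One-sided p-value: only a benefit counts; a deficit in the 'wrong'
    direction is treated as a meaningless fluke."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# A hypothetical test statistic mildly favouring the intervention.
# The one-tailed p is exactly half the two-tailed p, so a result can be
# 'significant' at the 1-in-20 level one-tailed but not two-tailed.
z = 1.7
```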
The findings for boys when the sample was divided by sex strongly suggest a consistently beneficial programme impact, but 'suggest' is all they can do, because the analysis was not planned in advance. Such post hoc analyses can capitalise on the likelihood that, purely by chance, one of the many ways a sample can be divided up will produce a significant finding in one of the sub-samples. Post-hoc subsample analyses of this kind are best seen as generating hypotheses for testing in a study specially designed for this purpose. The main problems (there is no implication that these all applied in this case) are that they rob the results of the reassurance of the level playing field created by randomising patients to different treatments, build on what may be chance variation in the effectiveness of the intervention between different subsamples, test effects not derived from the theory of how the intervention is supposed to work, and can capitalise on the fact that samples can be sub-sampled in any number of ways until one (perhaps purely by chance) results in a significant finding. As a result, "any conclusion of treatment efficacy (or lack thereof) or safety based solely on exploratory subgroup analyses are unlikely to be accepted" (Lewis J.A. "Statistical principles for clinical trials (ICH E9): an introductory note on an international guideline." Statistics in Medicine: 1999, 18, p. 1903–1904). These risks are eliminated or reduced by specifying the subsamples in advance at the time the trial is designed, but often this is not the case (Al-Marzouki S., Roberts I. "Selective reporting in clinical trials: analysis of trial protocols accepted by The Lancet." The Lancet: 2008, 372, 19 July, p. 201).
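A small simulation (entirely hypothetical data, not the trial's) illustrates the point: even when an intervention has no effect at all, repeatedly sub-dividing the sample and re-testing will quite often turn up at least one 'significant' subgroup finding.

```python
import math
import random

def two_sided_p(x1, n1, x2, n2):
    """Two-sided p-value for a difference between two proportions (z-test)."""
    if min(n1, n2) == 0:
        return 1.0
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    return math.erfc(abs(p1 - p2) / se / math.sqrt(2))

random.seed(1)
n_per_arm, n_splits, n_trials = 400, 10, 200
trials_with_spurious_hit = 0
for _ in range(n_trials):
    # Identical 30% 'substance use' rate in both arms: no real effect at all.
    treat = [random.random() < 0.3 for _ in range(n_per_arm)]
    control = [random.random() < 0.3 for _ in range(n_per_arm)]
    for _ in range(n_splits):
        # Divide the sample by an arbitrary covariate unrelated to the outcome
        # and test the 'effect' within that subgroup.
        sub_t = [t for t in treat if random.random() < 0.5]
        sub_c = [c for c in control if random.random() < 0.5]
        if two_sided_p(sum(sub_t), len(sub_t), sum(sub_c), len(sub_c)) < 0.05:
            trials_with_spurious_hit += 1
            break

# Fraction of null trials yielding at least one 'significant' subgroup:
# well above the nominal 1-in-20 rate for a single pre-planned test.
spurious_rate = trials_with_spurious_hit / n_trials
```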
Finally, the EU-Dap Project Team evaluated a curriculum which they themselves had designed. In several social research areas (1 2 3), these and other forms of 'allegiance' have been found to favour more positive findings than fully independent research. Such overlaps are, however, endemic in drug prevention research.