This entry is our analysis of a study considered particularly relevant to improving outcomes from drug or alcohol interventions in the UK. The original study was not published by Findings. The summary conveys the findings and views expressed in the study. Below is a commentary from Drug and Alcohol Findings.
Spoth R., Redmond C., Shin C. et al.
Preventive Medicine: 2013, 56, p. 190–196.
Reprints may be available from the lead author, Dr Spoth, at rlspoth@iastate.edu.
Evaluated drug prevention programmes for adolescents are typically implemented by research teams, raising questions over real-world applicability and sustainability, but an important US trial is said to have robustly demonstrated the public health potential of a system in which the communities themselves take primary responsibility.
Summary
The PROSPER (PROmoting School-community-university Partnerships to Enhance Resilience) model for disseminating substance use prevention is not a set programme but a way of engaging and supporting communities to (hopefully sustainably) implement the programmes they choose. It uses the leverage afforded by the existing infrastructure in US states of cooperative extension systems [offering non-formal educational programmes to help people use research-based knowledge to improve their lives] run by land grant universities [founded to extend higher education to broad segments of the US population].
In small rural US communities, the PROSPER trial tested a method for implementing family- and school-based substance use prevention programmes for adolescents in which the communities themselves took primary responsibility.
Researchers on this major trial saw its findings as indicative of robust preventive impacts, showing that the PROSPER model had the potential to improve public health.
Main questions over the findings are whether they would be replicated outside the type of communities recruited to the study, and whether even in these communities they really were a “robust” demonstration of the effectiveness of the PROSPER model.
In the PROSPER model, local teams of 8–12 community stakeholders (from schools and human service agencies plus parent and youth representatives) associated with schools in the area are co-led by local cooperative extension staff. These teams are linked to state-level university researchers and cooperative extension faculties via prevention coordinators connected with the land grant university. Coordinators also provide ongoing, proactive technical assistance to community teams to optimise team functioning and the delivery of prevention programmes. Though technical assistance and expert support are provided, the local teams have primary responsibility for implementing the programmes they choose, promising a more sustainable and lower cost alternative to projects led by outside experts or by staff dedicated to the project.
To join the study, communities in the US states of Iowa and Pennsylvania had to have a school district enrolment of 1300 to 5200 pupils, of whom at least 15% were eligible for free or reduced-cost school lunches, netting communities with an appreciable percentage of poorer families. Also considered essential were qualified extension and school system personnel to serve as team co-leaders. The recruitment process included visits to school district administrators to describe the project. All the districts had just one high school.
Of 68 otherwise eligible districts, 20 did not meet staffing requirements; just five refused to join the study. Willing and suitable communities joined until the required 28 were recruited, 14 in each state. They consisted of rural towns and small cities with populations of about 7000 to 46,000. After matching on location and school district size, districts were randomly allocated to carry on as usual (the control districts) or to implement the PROSPER model, 14 in each arm of the study. During the first year of the trial two of the PROSPER school districts dropped out and were replaced. Control districts did not necessarily neglect prevention; virtually all engaged in some prevention efforts, and six of the 14 implemented interventions of the type available through PROSPER.
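The summary says only that districts were matched on location and school district size before random allocation. One common way to do this is pair-matched randomisation, sketched below under stated assumptions: within each state, districts are sorted by enrolment, adjacent districts are paired, and a coin flip decides which member of each pair gets PROSPER. The `District` class, the district names and the enrolment figures are hypothetical; this is an illustration, not the trial's documented procedure.

```python
import random
from dataclasses import dataclass


@dataclass
class District:
    name: str       # hypothetical identifier
    state: str      # matching variable 1: location
    enrolment: int  # matching variable 2: school district size


def matched_pair_allocation(districts, seed=None):
    """Pair districts of similar size within each state, then randomise within pairs.

    Assumes an even number of districts per state. Returns {name: arm}.
    """
    rng = random.Random(seed)
    by_state = {}
    for d in districts:
        by_state.setdefault(d.state, []).append(d)

    allocation = {}
    for state, group in by_state.items():
        if len(group) % 2:
            raise ValueError(f"odd number of districts in {state}")
        # Sorting by enrolment makes each adjacent pair as similar in size as possible.
        group.sort(key=lambda d: d.enrolment)
        for a, b in zip(group[0::2], group[1::2]):
            # A fair coin decides which member of the pair receives PROSPER.
            treated, control = (a, b) if rng.random() < 0.5 else (b, a)
            allocation[treated.name] = "PROSPER"
            allocation[control.name] = "control"
    return allocation


if __name__ == "__main__":
    # 14 hypothetical districts per state, with made-up enrolments.
    districts = [
        District(f"{state}-{i}", state, 1300 + 250 * i)
        for state in ("Iowa", "Pennsylvania")
        for i in range(14)
    ]
    arms = matched_pair_allocation(districts, seed=1)
    print(sum(arm == "PROSPER" for arm in arms.values()))  # 14
```

Because every pair is split, the two arms always come out the same size, and each PROSPER district has a control ‘twin’ of similar size in the same state. That pairing is what makes within-pair comparisons of PROSPER and control districts possible.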
Aided by the other tiers of the system, the 14 local PROSPER teams selected what central experts saw as evidence-based prevention programmes intended to be universally applicable rather than just for high-risk families. The selection was to include one of three family-focused programmes. Of these, all 14 teams chose the version of the Strengthening Families Program developed by a project headed by the lead researcher in the study, and delivered it to sixth-grade pupils (typically aged 11–12) and their parents. Parents and children were invited to attend seven weekly two-hour sessions, typically in the evening. Of eligible families, 17% attended at least one session, and of these, 90% attended at least four.
The following year school-based substance use education programmes were chosen and delivered to the now 7th-grade pupils. Life Skills Training and Project Alert were each selected by four teams; the All Stars curriculum was selected by the other six. Each was delivered during class periods, generally by one of the schools’ own teachers, and reached around 9 in 10 of the intended pupils. This pattern of interventions was implemented for two successive school years beginning in 2002.
In 2002 and 2003 about 90% of the intended sixth-grade pupil sample provided some baseline data via confidential written questionnaires before the interventions started. Two-thirds lived with both biological parents, 85% were described as Caucasian, and about a third received free or subsidised school lunches. They were followed up by researchers annually until the 12th grade, which typically consists of 18-year-olds ending their school education. At each follow-up point, on average 86% of eligible pupils completed the questionnaires. By the final high-school follow-up the sample had reduced from the 10,849 who completed pre-intervention baseline questionnaires to about 7784, representing 72% of pupils who completed baseline measures and 65% of all the pupils in the relevant school years.
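The attrition figures above can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted in the paragraph; the implied total cohort size is our own back-calculation from the rounded 65% figure, not a number reported by the study.

```python
# Figures quoted in the summary.
BASELINE = 10_849    # pupils who completed pre-intervention questionnaires
FINAL = 7_784        # pupils assessed at the final high-school follow-up
SHARE_OF_ALL = 0.65  # final sample as a share of all pupils in those school years (rounded)


def retention(final: int, baseline: int) -> float:
    """Share of the baseline sample still assessed at follow-up."""
    return final / baseline


def implied_cohort(final: int, share: float) -> float:
    """Back-calculated total number of pupils; an inference, not a reported figure."""
    return final / share


if __name__ == "__main__":
    print(f"retained: {retention(FINAL, BASELINE):.0%}")  # retained: 72%
    total = implied_cohort(FINAL, SHARE_OF_ALL)
    # 65% is rounded, so bracket the estimate using 65.5% and 64.5%.
    low = implied_cohort(FINAL, 0.655)
    high = implied_cohort(FINAL, 0.645)
    print(f"implied cohort: about {total:,.0f} (between {low:,.0f} and {high:,.0f})")
    # Compare with the 'about 90%' who provided baseline data.
    print(f"baseline coverage: {BASELINE / total:.1%}")
```

An implied cohort of roughly 12,000 pupils would put baseline coverage at roughly 90%, consistent with the figure given for baseline participation.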
Presented findings focused on the 11th grade and the final 12th grade follow-ups, between which a further tenth of the sample could no longer be re-assessed. A ‘lifetime illicit substance use index’ scored from 0 to 5 captured the range of substances ever used by the pupils. It asked if they had ever used methamphetamine, ecstasy, cannabis (by smoking), drugs or medications prescribed for someone else, or certain prescription-only painkillers when these had not been prescribed by a doctor.
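As described, the index counts how many of the five substance types a pupil reports ever having used. The sketch below assumes one point per type and treats unanswered items as ‘never used’; the item keys are hypothetical labels, not the questionnaire's wording. It makes one property of the measure concrete: a single experiment with cannabis years earlier scores the same as regular current use.

```python
from typing import Mapping

# The five substance types named in the summary; the keys are hypothetical labels.
INDEX_ITEMS = (
    "methamphetamine",
    "ecstasy",
    "cannabis_smoked",
    "someone_elses_prescription",
    "unprescribed_painkillers",
)


def lifetime_illicit_index(ever_used: Mapping[str, bool]) -> int:
    """Number of the five substance types ever used (0 to 5).

    Assumes one point per type; unanswered items count as 'never used'.
    """
    return sum(bool(ever_used.get(item, False)) for item in INDEX_ITEMS)


if __name__ == "__main__":
    tried_once = {"cannabis_smoked": True}    # one experiment, years earlier
    regular_user = {"cannabis_smoked": True}  # smokes cannabis every week
    # The index cannot tell these two pupils apart.
    print(lifetime_illicit_index(tried_once), lifetime_illicit_index(regular_user))  # 1 1
```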
By 12th grade this index averaged 1.43 in PROSPER school districts and 1.68 in control districts, a statistically significant difference indicating that on average slightly fewer different types of drugs had been tried by pupils in PROSPER districts. Of the individual drugs, in PROSPER communities methamphetamine, ecstasy, cannabis and inhalants [‘glue-sniffing’] were all significantly less likely to have been tried among pupils who had not used them before the interventions (‘uptake’). Uptake of drinking and of drinking to the point of drunkenness were virtually identical in PROSPER and non-PROSPER districts. Results were similar in the 11th grade.
In respect of current/recent use of individual substances, at 12th grade pupils in PROSPER districts were significantly less likely to have smoked cigarettes in the past month, or over the past year to have used cannabis, methamphetamine or inhalants. Except for no significant impact on inhalant use, results were similar in the 11th grade. Across the whole sample, the largest relative reduction in the proportion of pupils using a substance was in past-year methamphetamine use at the 12th grade, a reduction from about 4% to about 3%, representing 31% fewer users. At neither grade were there any statistically significant differences relating to drinking, assessed in terms of past-month drunkenness and past-year drink-driving.
The study also assessed how often pupils said they had recently got drunk, drove after drinking, or smoked cannabis. At the 12th grade, only in respect of cannabis did pupils in PROSPER districts use significantly less often, a reduction from 2.05 to 1.83 in a frequency index scored from 0 to 7. Results were similar at the 11th grade, except for a lower frequency of having been drunk (index scores 2.46 v. 2.66 in non-PROSPER districts) and a larger – but still not statistically significant – difference in driving after drinking.
Growth in substance use across the six-and-a-half years of the follow-up tended to be less steep among pupils in PROSPER school districts. These results were statistically significant for growth in the current/recent use of all assessed substances and frequency of use measures, except in respect of inhalants and having been drunk (though growth in the frequency of drunkenness was significantly retarded).
Often the association of PROSPER with retarded development of substance use was stronger among pupils at higher risk of substance use at the start of the study, defined as those who (typically aged about 11–12) had already tried one or more of alcohol, cigarettes, or cannabis. Greater effects on high-risk pupils were most apparent in the 11th-grade analysis. By the 12th grade this greater effect was statistically significant for the lifetime illicit substance use index, and for past-year cannabis use and frequency of use; measures related to smoking, drinking or methamphetamine showed no such significant effect. This pattern was the same for differences in the growth of substance use over the entire six-and-a-half years of the follow-up.
Overall, the effects of interventions delivered via the PROSPER delivery system on long-term adolescent substance use outcomes were robust, both in terms of growth across the middle- and high-school years, and in results at the final two follow-ups. Positive long-term effects were observed for lifetime illicit use and for current use and frequencies of use, for all types of substances. Relative reduction rates (indicating how much smaller the proportion engaging in that behaviour was in PROSPER districts) ranged from 3.3% to 31.4%. Notably, intervention effects on growth in substance use from 6th to 12th grades were significant for all outcomes, except past-month drunkenness and past-year inhalant use, indicating that the differences between PROSPER and control districts increased over time.
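The relative reduction rate described above is the difference between the control and PROSPER proportions, divided by the control proportion. A minimal sketch, with function names of our own choosing:

```python
def relative_reduction(control_rate: float, intervention_rate: float) -> float:
    """How much smaller the intervention-arm proportion is, as a fraction of control."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (control_rate - intervention_rate) / control_rate


def expected_intervention_count(control_count: float, rrr: float) -> float:
    """Count expected in intervention districts for a given count in control districts."""
    return control_count * (1 - rrr)


if __name__ == "__main__":
    # Rounded past-year methamphetamine figures quoted earlier: about 4% v. about 3%.
    print(f"{relative_reduction(0.04, 0.03):.0%}")  # 25%
    # A relative reduction of about 14%, applied to 100 control-district users.
    print(round(expected_intervention_count(100, 0.14), 1))  # 86.0
```

With the rounded methamphetamine figures the formula gives 25% rather than the reported 31%; the published figure presumably rests on unrounded proportions. Read the other way, a relative reduction of about 14% would turn 100 users in control districts into about 86 in PROSPER districts.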
Effects on outcomes related to alcohol were relatively weak compared to those for other substances, perhaps because of relatively high starting levels of use and its greater acceptability. Intervention effects among the higher-risk subsample were comparable to or, in most cases, stronger than those among the lower-risk subsample, countering the contention that only lower-risk populations benefit from universally applied interventions.
Other findings from the study suggested how the interventions chosen by PROSPER districts might have worked. In middle school these reduced the influence on other students of adolescents who used substances, by reducing the centrality of substance-using youth in their friendship networks. There were also positive effects on key factors thought to underlie substance misuse in later adolescence, including social skills, parenting, and family environment, and the children’s attitudes and beliefs about substance use. Finally, the interventions reduced early initiation of substance use. Early initiation is one of the most powerful predictors of later substance misuse, underscoring the critical importance of the developmental timing of interventions: PROSPER teams delivered interventions when pupils were beginning to experiment with substance use, but before they had transitioned to more frequent or serious use.
The relative reduction rates have potential public health significance. For example, if the rate for the frequency of cannabis use were to hold when scaling up the PROSPER partnership model, for every 100 12th-graders in non-intervention school districts using cannabis more often than yearly, there would be about 86 in PROSPER districts. However, it has not yet been established whether results from study communities would generalise to populations which differ in characteristics such as ethnicity and geographic location.
Commentary
PROSPER’s evaluation design provided the framework for a stringent test of the PROSPER delivery model and its menu of interventions – not just in one school, but in 28 school districts across two US states, and not just in the immediate wake of the interventions, but up to six years later, and in respect of a sub-sample, up to age 19. It was an impressive culmination to a consistent programme of work dating back decades with important implications for mounting sustainable prevention programmes. Its importance warrants searching analysis and its methods offer an opportunity to address issues common to much substance use prevention research. The commentary is correspondingly extensive but readers who just want the main points can confine themselves to the Commentary in brief. Our conclusion, contested by the authors (whose comments are publicly available), is that the trial was not a robust demonstration of the effectiveness of the PROSPER system or its menu of interventions.
A major question was whether the interventions could be delivered through an implementation system which handed much of the work to representatives of the communities themselves – relatively ‘real world’ conditions compared to entirely researcher-controlled trials, and a system which might more sustainably embed preventive interventions in these types of communities. A second question was whether the interventions they selected – from a menu set by the central team on the basis that they had been shown to be effective – remained effective in these circumstances. If they did, it would represent a breakthrough to more sustainable, community-based substance use prevention.
On the face of it, both questions – whether the interventions could be implemented this way, and whether they would be effective – were answered in the affirmative, leading the researchers to talk of the “potential public health value” of the PROSPER model, suggesting it could underpin widespread dissemination of effective prevention. Relative to control communities, on nearly every measure published to date, children recruited to the trial in PROSPER communities were less likely to develop the substance use patterns the programme was aiming to prevent. On the balance of probabilities, it is likely that there were preventive impacts. However, criteria closer to ‘beyond reasonable doubt’ are commonly applied in scientific evaluations.
Over the reality of the programme’s public health potential, there are two main concerns:
1 First, and acknowledged by the researchers, whether the results would be replicated outside the type of communities recruited to the study.
2 Secondly, whether even in these communities, these results meet conventional scientific criteria for being confident they represent real effects; whether they really were a “robust” demonstration of the effectiveness of the PROSPER model.
These concerns are unpacked under the headings: Would the system work elsewhere? and Were the interventions really shown to be effective? You can find the bottom line under the heading Focus on the final in-school follow-up, where the findings of the trial are revisited taking into account methodological concerns. During this process the PROSPER trial is used to explore common features which undermine confidence in prevention programme evaluations – though the trial and its reporting are no more flawed than several others, and in significant respects, more robust than most.
Also examined are the findings that in some respects pupils most likely to use substances benefited most from PROSPER (‘High risk’ pupils benefit most), attempts to identify the most active ingredients in the PROSPER mix (Were there any (more) active ingredients?), results from earlier follow-ups (Earlier findings) and for a subsample of the former pupils at age 19 (Effects in early adulthood), and the degree to which the interventions the teams selected from really were evidence-based (Was the intervention menu evidence-based?).
This section summarises points explored and substantiated later, and introduces a concern over this trial and many others – the possibility that researchers committed to an intervention will lean towards finding and declaring it works.
The first question raised above – generalisability to other types of communities – can only be settled by further studies, but the need for well-developed university-community partnerships, the small, distinct nature of the communities in the trial, and their selection and self-selection into the study, mean similar interventions cannot be assumed to be feasible or to work elsewhere.
On the second question, the weight of the findings favours the interventions, but generally only modestly and without meeting conventional criteria for ruling out chance findings. Based on these criteria – more stringent than those to which the study’s authors subjected their findings – probably only one of the nine measures of current/recent use at the final follow-up of the full sample was significantly in favour of PROSPER: a small difference in the average frequency of cannabis use. Despite other positive findings, judged in terms of how the pupils ended up as they emerged into adulthood, overall the programme was not shown to be a success – though the possibility remains that actually it was.
Across the entire PROSPER follow-up period, findings of reduced use at different ages were consistently strongest for cannabis and at times strong too for inhalants and methamphetamine, but consistently non-significant for measures related to drinking and usually also smoking. Consistently too, a measure combining ever having illicitly used a range of substances was lower in PROSPER communities, probably mainly reflecting fewer youngsters having tried cannabis, but this measure could reflect a single episode of use several years before rather than any current behaviour of concern.
By age 19, normally a year after leaving school, significant results were confined to a slightly lower average frequency of smoking and a lower frequency of cannabis use (about 11 v. 15 times in the past year), and slightly fewer drug-related problems, but it is possible that none of these would have survived an adjustment for the many chances given PROSPER to register a statistically significant difference. At this age, in no case was the proportion of youngsters who had recently used various substances significantly lower in PROSPER communities, and neither were any of the alcohol-related measures.
‘High risk’ pupils did in some respects seem to benefit most from PROSPER, and at least did no worse, but methodological concerns limit confidence in these findings.
Attending the family programme and choosing Life Skills Training as the school programme emerged as the most active components. Since both were choices rather than random allocations, the findings can only be considered suggestive. Outside the context of the PROSPER trial, the options on its intervention menu have not been found consistently effective, particularly in trials conducted by research teams with no stake in the intervention’s success – though it might justifiably be claimed that they are at least more evidence-based than most preventive programmes of their kind.
A further important consideration is whether PROSPER’s generally small and (when subjected to stringent tests) patchily significant longer term gains are worth the investment it took to produce them. Perhaps the most hopeful finding was a reduction in drug-related problems when the former pupils were aged 19, but how much importance to attach to what seems a small difference is unclear, as is how representative the sample was by that stage.
A difficult but (in substance use prevention) unavoidable issue is the conflict of interest between the developer of a programme, whose motivation may be to show it works and promote its dissemination – perhaps for laudable reasons to do with advancing public health – and an evaluator, whose motivation should be to ‘stress-test’ the intervention by subjecting effectiveness findings to rigorous scrutiny – also for the laudable reason of not wasting resources on unproven interventions. If it survives this scrutiny, then the intervention has a strong claim to evidence-based status. Claims which emerge from an attempt to prove rather than disprove effectiveness risk being based on a less rigorous examination more friendly to the intervention being tested.
In substance use prevention, the very common overlaps between the developers of a programme and its main evaluators lie at the heart of the so-called ‘researcher allegiance’ effect. As in other social and medical research areas (1 2 3), there is concern that researchers with an interest in a programme’s success record more positive findings than fully independent researchers. Possible reasons include implementation quality unachievable without the developer’s inputs, transmission of optimistic expectations to the interventionists and in turn to their pupils or clients, and the relaxing of accepted research practices intended to prevent bias and minimise the risk of falsely declaring an intervention a success.
A prominent substance use prevention researcher has themselves highlighted the pervasiveness of developer-led evaluations, and the consequent risk of biased reporting of findings: “Even if the researcher is not selling his/her program for profit, reporting positive findings increases the possibility of future research, so in many cases the stakes are rather high.” Rigorous evaluation can amount to asking someone who has invested a lifetime’s work in an intervention to be prepared to test it to destruction. Counter-measures the researcher called for included “full transparency” (entailing the publication of specific research plans before the data has been collected) and making all the data available to others for re-analysis.
In the featured trial the conditions for an allegiance effect were plainly present. The lead author led the project which initially developed the only intervention chosen by all the PROSPER teams. The institute he directs is dedicated to promoting university/community partnerships. Among its “values and core beliefs” are that “Partnerships with communities – schools, Extension, other community agencies – are essential to long-term, positive outcomes for youth, adults, families and communities.” The PROSPER trial was the culminating test of these values and beliefs, and of a programme of work dating back to the early 1990s which has attracted serial government funding. Three of his co-authors on the featured report also work at the institute and the remaining two are described as collaborators. With so much to lose from a trial which might contradict core beliefs and values and undermine decades of work, the “stakes” were indeed high.
These conclusions were reached on the basis of more stringent criteria for success than those considered appropriate by the researchers. The conditions for a ‘researcher allegiance’ effect were present in the study, and some of the methods used to analyse the results were of the kind associated with such an effect – in particular the tendency of evaluators committed to an intervention to take the stance of trying to prove its success, rather than testing whether the findings would survive a rigorous dis-proval attempt.
This phenomenon is not just endemic, but integral to programmes of work of the kind which culminated in the PROSPER trial. Researchers not committed to the issue they are researching would be unlikely to sustain a research and development effort over decades, and that easily shades into commitment to ‘their’ particular way of addressing the issue. Without this commitment, the sustained effort needed to develop effective approaches might never happen, but with it, the results may not be reliable or replicable.
PROSPER was developed for small rural communities with the required infrastructure, willingness and capacity. “Results are primarily expected to generalize to the type and size of communities selected for this study,” acknowledged the researchers. Their reports highlight the importance of local school and university personnel willing to engage in a collaborative effort to prevent substance use problems in their communities. Support from universities means the local teams are not left on their own, but they still have to do the bulk of the work. This genesis is likely to limit the applicability of the PROSPER system – but arguably to limit it less than delivery systems which require a research/expert team to do all the work.
Whether the PROSPER model would be able to engage communities to implement the interventions was a major focus of the research. In most trial communities, that test largely seems to have been passed: all the interventions were well implemented, with the notable exception that the family programme engaged few of the targeted families.
However, trial communities were enrolled after an extensive selection process. By design, they were not necessarily typical in their resources or interest in preventing substance use. Required was both a school district administration and a university extension educator willing to be involved in the PROSPER programme. Of 68 otherwise eligible districts, 20 did not meet staffing requirements, and five refused to join the study. The 28 which did join had volunteered to do so rather than having been selected at random or for their representativeness. Even in these same types of small communities, their performance may not be replicated more broadly. The chances become slimmer yet if a region or nation decided to mandate or strongly incentivise such a programme across its schools – yet to fulfil a significant public health role, widespread dissemination is essential. Even among these volunteers, during the first year of the project two of the 14 allocated to PROSPER dropped out, signalling what may in those areas have been major implementation problems.
The final sample consisted of small towns and rural communities, almost all of whose residents and team members were variously described as White or Caucasian. How a community-based programme would be engaged with and implemented in these perhaps isolated, small, and homogeneous communities is, the researchers acknowledge, not necessarily a guide to what would happen in places like London and New York were a similar university-community collaboration attempted.
From the communities which did join the study and were allocated to the PROSPER arm emerged indications of where that kind of approach is most likely to succeed. On the negative side, the strongest of these indicators was poverty. Findings from the trial suggest that programmes like PROSPER flourish best in communities most willing and able to help themselves, which also tend to have less serious substance use and other problems. Poor, fractured and disheartened neighbourhoods most in need of an antidote to substance use problems seem least likely to profit, and most likely to need intensive and ongoing support. It was, however, promising that across all the PROSPER teams, members’ knowledge of how to implement and evaluate prevention programmes increased significantly more than among people in similar positions in the control schools, perhaps heralding the generation of a core community network able and willing to continue these attempts.
Answering this question is hampered by there being no publicly available study plan published or registered before data had been collected. Without this it is not known which (if any) of the many measures taken were considered primary yardsticks of the success of the PROSPER model, whether any measures have not been reported on, and which of the analyses were planned in advance of knowing what the data showed, making it impossible to rule out selective reporting.
After tightening the criteria for what counts as a statistically significant finding, only a small difference in the frequency of cannabis use over the past year is likely to remain statistically significant among the nine current/recent measures at the 12th grade follow-up. The ‘tightening’ entailed accounting for the possibility that some of the many differences could breach conventional criteria for statistical significance by chance, and not assuming that a negative result was unthinkable. These and other issues are expanded on below.
Pre-selecting a primary yardstick of success is considered critical to clinical trials. Without this, it is unclear whether the findings indicate the intervention has been successful, and the way is open for studies to take so many measures that purely by chance at least one would meet the conventional criterion (differences which would happen fewer than 1 in 20 times by chance) for statistical significance, permitting the intervention falsely to be declared a success. Specifying this yardstick in a publicly available pre-data study plan also makes it more difficult to cherry-pick or construct other measures known to produce a significant result, or to choose from among the many ways to subdivide samples those which cast the intervention in the best light.
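To see why a pre-selected primary yardstick matters, consider the chance that at least one of k outcomes crosses the ‘1 in 20’ line when the intervention in fact has no effect. Assuming the tests are independent, that chance is 1 - (1 - 0.05)^k. Real trial outcomes are correlated, which generally makes the true figure somewhat lower, so this is an upper-end sketch rather than an exact estimate.

```python
def chance_of_spurious_hit(k: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent null tests has p < alpha)."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} outcomes: {chance_of_spurious_hit(k):.0%}")
```

With ten independent outcomes, the chance of at least one spurious ‘significant’ result is already about 40%.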
Some change in measures is to be expected over such a long follow-up at a time when substance use will be changing due to maturation and changing patterns in US society, and also when national drug misuse priorities change. However, this does not seem to account for some of the changes, and it would have remained possible to specify primary measures at each new age-point but before the data had been collected.
To cater for the risk of finding ‘significant’ differences by chance, studies which test a hypothesis across multiple outcomes are advised to consider (1 2) raising the bar for statistical significance. This safeguards against chance variations seeming to vindicate an intervention, but at the cost of more possibly real effects being dismissed as non-significant.
PROSPER chose not to do this, yet the risk of a chance finding was substantial. By gathering or constructing many measures, the featured analysis provided at least 18 chances for PROSPER to prove effective at the 12th grade, and another 18 at the 11th grade, plus 34 across the whole follow-up – 70 in all. In one of the reports on the trial the research team explained why they nevertheless chose not to raise the bar, instead testing each outcome against the usual ‘1 in 20 by chance’ criterion. Effectively, the argument seems to have been that risking many effects being discounted might have “masked any interpretable patterns of findings” – yet the pattern of findings could still have been interpreted by laying aside for the time being the requirement that each be statistically significant, and instead focusing on the size and (in its everyday meaning) significance of the differences, expressing any conclusions in suitably tentative language.
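One standard way of raising the bar is the Bonferroni correction, which divides the ‘1 in 20’ threshold (0.05) by the number of tests. Holm's step-down variant gives the same protection against any chance finding while rejecting at least as many hypotheses. The commentary does not say which adjustment it applied, so this sketch illustrates these two textbook methods rather than reproducing its analysis; the p-values in the example are invented.

```python
from typing import List, Sequence


def bonferroni_threshold(m: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the chance of any false positive at or below alpha."""
    return alpha / m


def holm_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure; returns one flag per p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on.
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return significant


if __name__ == "__main__":
    for m in (18, 70):
        print(m, round(bonferroni_threshold(m), 4))
    # Invented p-values, for illustration only (not results from the trial).
    print(holm_significant([0.001, 0.02, 0.04, 0.30]))  # [True, False, False, False]
```

Applied to the 18 chances at the 12th grade, Bonferroni would require each difference to reach roughly 0.0028 (about 1 in 360) rather than 0.05; across all 70 chances the threshold would be about 0.0007.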
Another major methodological concern is the use of so-called ‘one-tailed’ tests of statistical significance in the featured analysis and in other PROSPER follow-ups except the first and last. Effectively these assume that a negative finding (in this case, that the PROSPER programme made things worse relative to control schools) must be a meaningless fluke and can be ignored, roughly doubling the chance of finding a significant positive effect. The argument for this assumption rested on the past record of the interventions in this and other trials, but that was and remains by no means so universally positive or so predictable that a negative outcome was unthinkable. In any event, the point of the trial was to test the new PROSPER delivery system, the effects of which could not be predicted in advance.
Perhaps appreciating that their rationale for one-tailed tests would not be universally accepted, the researchers provided exact significance levels so these could be doubled to approximate results of a two-tailed test. But their preference for one-tailed tests gave an impression of robustness in the findings which does not stand up to a more appropriate analysis.
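The effect of the choice can be sketched with a simple normal (z) test, a simplification since the PROSPER analyses used more complex multilevel models. The test statistic of 1.8 is hypothetical, not a PROSPER result: it clears the 1 in 20 bar when only benefits are looked for, but not when harm is also allowed for.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tailed_p(z: float) -> float:
    """p-value when only a benefit (the upper tail) counts as a finding."""
    return 1 - STANDARD_NORMAL.cdf(z)


def two_tailed_p(z: float) -> float:
    """p-value when either benefit or harm counts as a finding."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


# Hypothetical test statistic, not taken from the PROSPER reports.
z = 1.8
p1 = one_tailed_p(z)
p2 = two_tailed_p(z)
print(f"one-tailed p = {p1:.3f}, significant at 0.05: {p1 < 0.05}")
print(f"two-tailed p = {p2:.3f}, significant at 0.05: {p2 < 0.05}")
```

Here the one-tailed p is about 0.036 and the two-tailed p about 0.072, exactly double, which is why doubling the reported exact significance levels approximates a two-tailed test.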
The featured report claims ‘intent-to-treat’ status for its analysis of the PROSPER trial, an important signifier that the findings were not biased by drop-out from the study or the exclusion of communities which inadequately implemented the interventions or pupils/families who inadequately attended them. Without this assurance, the randomisation intended to create a level playing field between PROSPER and control districts cannot be assumed to have been sustained, and the outcomes may be biased.
The claim is made on the basis that pupils were included regardless of whether they attended the substance use education lessons or their families attended the family intervention, an important safeguard. But this is only one aspect of a true intent-to-treat analysis. To meet this standard, every school district and pupil randomly allocated to PROSPER versus control arms of the trial would have to have been included in the outcome analyses, even if they dropped out from the study early.
Yet in the first year of the project two of the school districts allocated to PROSPER dropped out and were replaced. Especially because the trial was a trial of the PROSPER model rather than the interventions as such, these communities should have been included in the outcome analyses. This would have incorporated in the evaluation of PROSPER the possibility that even communities seemingly willing and able to undertake it will later find themselves unable or unwilling to do so, a possibility effectively excluded from the analysis. Several analyses were undertaken which showed that the replacement sites did not upwardly bias estimates of PROSPER’s effectiveness. Among these were excluding the replacement sites and their paired control districts, omitting them and the two ‘worst’ control districts (see panel below), comparing trajectories of substance use for each individual site, and comparing PROSPER-control differences within pairs of sites; none meets the standard of including the missing districts in the analysis. However, the re-analyses which were done suggest that the impacts of this departure from the standard were probably small.
The first report of outcomes from the trial says 12,022 pupils were assessed at the start, while elsewhere we learn that 11,960 provided “valid information”. With about 7784 completing 12th grade assessments, about 35% of the pupils intended to be included in the study were missing, amounting to 27% of those who completed baseline measures. Though this level of attrition is par for the course in long-term follow-ups, assessors were concerned that it might lead to inaccurate estimates of PROSPER’s effects.
The analyses tried to account for the missing data, and checks found that on known variables there were no significant differences between the kinds of pupils not followed up in PROSPER versus control districts, leading the researchers to argue that loss to follow-up was unlikely to have biased the findings. However, this is not the same as saying that the pupils who contributed data remained representative of the starting sample, and such checks cannot eliminate the possibility that unmeasured factors introduced a bias in the comparison between PROSPER and control pupils. Sophisticated statistical procedures like those employed in the featured analysis help relatively little when the reason why the data is missing may be related to the outcomes being assessed – in this case perhaps, that pupils more likely to use substances are also more likely to have missed follow-up assessments. Even when this is not the case, the degree of missing data in the featured analysis can result in biased estimates of effects.
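A toy simulation illustrates the concern. All the numbers in it are invented: a true prevalence of substance use of 20%, and drop-out of 45% among users against 25% among non-users. Because drop-out depends on the very behaviour being measured, the pupils who remain understate use, and nothing in the data on those followed up reveals this. In a trial this matters most if the pattern differs between PROSPER and control districts, which checks on measured variables cannot rule out.

```python
import random


def observed_prevalence(n_pupils: int, true_prevalence: float,
                        dropout_if_user: float, dropout_if_nonuser: float,
                        seed: int = 1) -> tuple[float, float]:
    """Return (true, observed) prevalence when drop-out depends on use itself.

    All parameters are hypothetical; nothing here is PROSPER data.
    """
    rng = random.Random(seed)
    users = [rng.random() < true_prevalence for _ in range(n_pupils)]
    # A pupil is followed up unless a random draw falls below their drop-out risk.
    followed_up = [
        is_user for is_user in users
        if rng.random() >= (dropout_if_user if is_user else dropout_if_nonuser)
    ]
    true_rate = sum(users) / len(users)
    observed_rate = sum(followed_up) / len(followed_up)
    return true_rate, observed_rate


true_rate, seen_rate = observed_prevalence(
    n_pupils=12_000, true_prevalence=0.20,
    dropout_if_user=0.45, dropout_if_nonuser=0.25)
print(f"true prevalence {true_rate:.3f}, among those followed up {seen_rate:.3f}")
```

With these assumptions the expected prevalence among those followed up is 0.11/0.71, about 15.5%, rather than 20%, a bias that no amount of data on the remaining pupils can expose.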
Missing data is of greatest concern when a PROSPER-control difference was due to so few youngsters that a missing third could easily have tipped the balance – the case, for example, in respect of the 12th-grade finding on methamphetamine use. Across the entire sample this yielded the largest relative reduction in proportions, in this case of users in the past year. But in absolute terms the reduction was from 4% to 3%, meaning that about 78 pupils held the balance in a sample missing over 4000 of all those in the sampled school years and over 3000 who completed baseline measures.
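The arithmetic behind the ‘78 pupils’ can be reconstructed as follows, assuming (our reading) that it is the one percentage point difference applied to the roughly 7784 pupils assessed at 12th grade, and that ‘missing’ means the 12,022 pupils in the sampled school years minus those assessed. The variable names are illustrative.

```python
# Figures from the commentary: about 7784 pupils assessed at 12th grade out of
# 12,022 in the sampled school years; past-year methamphetamine use of about
# 4% in control v. 3% in PROSPER districts.
assessed = 7784
sampled = 12022
control_rate, prosper_rate = 0.04, 0.03

# Assumption: the difference is applied to the whole 12th-grade sample.
pupils_holding_balance = (control_rate - prosper_rate) * assessed
missing = sampled - assessed

print(f"difference equivalent to about {pupils_holding_balance:.0f} pupils")
print(f"pupils missing from the 12th-grade analysis: {missing}")
print(f"missing pupils per pupil in the difference: {missing / pupils_holding_balance:.0f}")
```

The difference amounts to about 78 pupils against 4238 missing, roughly 54 missing pupils for every pupil accounting for the finding.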
Once the considerations explained above are taken into account, how do the findings from PROSPER look? The short answer is, much less encouraging. Across all measures, the programme would generally not have registered a statistically significant advantage. Had two-tailed tests (see above) been used, 31 of the 70 substance use outcomes recorded in the featured article or in its supplementary data would have registered a statistically significant advantage for the PROSPER communities. Add in an adjustment for multiple tests and take steps to conservatively adjust for the two replaced PROSPER communities, and at the 12th grade just two of 13 measures remain significantly in favour of PROSPER. Focus on those reflecting how the pupils ended up at the 12th grade rather than their history of having ever used substances, and just one of the nine assessments significantly favoured PROSPER – a small difference in the average frequency of cannabis use. Even taking into account other measures of use and earlier follow-ups, it is questionable whether this single current/recent use outcome warrants the “long-term … robust” description the researchers gave their findings. This summary of findings is expanded on below, along with the justification for focusing on the final in-school follow-up.
PROSPER’s web site highlights the fact that the trial “has been recognized by two of the most rigorous review panels for prevention programs, the Coalition for Evidence-Based Policy and Blueprints for Healthy Youth Development”. Blueprints rated PROSPER as “promising”, while the Coalition judged that it only narrowly failed to meet its “top tier” standard, perhaps due to the need to replicate the findings in other types of areas.
Both sets of assessors recalculated statistically significant differences on a two-tailed basis (see above), apparently rejecting the explanation given for one-tailed tests. Additionally, the Coalition’s analysis reported adjustments for multiple tests to reduce the concern that some will have produced a statistically significant difference purely by chance (see above). It also offers the most detailed analysis of the featured article’s 12th grade results, basing this on a slightly different sample of communities, one the assessors considered less likely to produce ‘false positives’ favouring PROSPER.
The Coalition was concerned that the drop-out of two PROSPER school districts undermined the randomisation intended to assure a level playing field, so instead of all 28 districts, they relied on a re-analysis by the study’s researchers which “omitted the two replacement communities in the PROSPER group along with the two control communities with the highest overall rates of substance use at 6.5-year follow-up”. Excluding the ‘worst’ of the comparison districts would have tilted the analysis against PROSPER, yet the estimated effects were only marginally smaller than across the full sample. Nevertheless, this analysis still contravened intent-to-treat principles (see above), and it remains possible that the two of the 14 districts allocated to PROSPER which dropped out would have registered worse outcomes than even the worst of the comparison communities.
In the re-analysis for the Coalition, given two-tailed tests the only statistically significant results relating to current/recent use at the final follow-up were a reduction in the likelihood of having smoked cigarettes in the past month from 36% in control districts to 31% in PROSPER districts, and a reduction in the frequency of cannabis use over the past year amounting to the difference between 2.1 and 1.8 on a scale scored from 0 to 7. Only the cannabis use frequency difference remained statistically significant at the conventional ‘1 in 20 by chance’ level once an adjustment had been made for multiple tests – also the only difference which across all 28 districts had a chance of surviving these stricter criteria.
Before the 12th-grade follow-up there had been three assessments of the effects of PROSPER at younger ages, and additionally assessments of trends across all school follow-ups. But if how the pupils ended up does not significantly differ, it matters less that there were positive findings from earlier years and positive trends – and the more recent of the two follow-ups might have produced inaccurate findings due to sizable differential drop-out of PROSPER and control pupils.
This is not to dismiss the possible value of preventing substance use during the early teens, even if there are no lasting impacts on later substance use. A clear example is the slower uptake of inhalant use associated with PROSPER, since any one episode of inhaling could prove fatal. Similar reasoning applies to the risk of long-term damage from accidents or incidents while a teenager is drunk, though PROSPER generally did not significantly delay drunkenness.
But the main argument for school-years prevention is as a cost-beneficial way of forestalling substance use problems and related costs and consequences in later life. The PROSPER authors themselves argue that early findings from the project must be given full weight because delayed initiation of substance use could prevent substance use disorders later in life and/or prevent lasting developmental damage.
Reviewing this literature is beyond the scope of this commentary, but we can perhaps do enough to show that there is not necessarily any direct or proven causal link between early onset of substance use and later problems, and that the evidence for such a link is weak. There may be a lasting effect, but it cannot be assumed and has in each case to be demonstrated. The analysis of some of this literature which you can unfold below includes three prevention-study reports cited by PROSPER authors in support of their argument that delaying age of onset of substance use has lasting benefits.
For PROSPER as for other programmes, the proof of long-term benefits has to be found in the pudding of long-term findings, not assumed. For this reason, of greatest interest in the featured report is current/recent substance use at the 12th-grade follow-up, the last to include the full sample. At this stage, how many of the pupils could as they emerged into adulthood be considered ‘substance users’? Relaxing the significance threshold by using one-tailed tests and not adjusting for multiple measures meant success was recorded on all but the alcohol-related measures – an exception attributed to the normative and relatively widespread nature of drinking, despite there being plenty of room for the PROSPER programme to have reduced drunkenness and drink-driving, and despite combating normative perceptions being part of what some of the interventions aimed to do.
Had two-tailed tests (see above) been used, the picture would have been very different. In respect of recently having engaged in these behaviours, there would have been no significant results for getting drunk, driving after drinking, smoking tobacco or cannabis, or inhaling solvents, leaving PROSPER to register a statistically significant advantage only in relation to methamphetamine – the difference between about 3% of the PROSPER sample having used this drug in the past year versus 4% in control school districts. Had an adjustment also been made for multiple tests, this finding would almost certainly not have survived, leaving no significant impacts on proportions of current/recent users.
Additionally, of the three frequency of use measures over the past month or year, the one relating to cannabis was significantly lower among PROSPER pupils, though it is unclear whether this finding would have remained significant after adjusting for multiple tests. It did when the Coalition for Evidence-Based Policy recalculated the 12th-grade results, but they had also excluded the two PROSPER communities which dropped out of the study early, leaving it unclear whether the sample reported on in the featured analysis would have produced the same result. Also, different ways of adjusting for multiple tests produce different results. The method chosen by the Coalition depends on the proportion of outcomes one is prepared to accept being falsely found statistically significant. Vary that proportion, and which findings remain significant will also vary.
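The commentary does not name the Coalition's method. One widely used procedure which matches the description, controlling the expected proportion of ‘significant’ findings that are false (the false discovery rate, q), is that of Benjamini and Hochberg, sketched here as an assumption. The ten p-values are hypothetical, not PROSPER data.

```python
def benjamini_hochberg(p_values: list[float], q: float) -> list[bool]:
    """Flag which p-values remain significant at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k whose p-value is at or below (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Everything ranked at or below that cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for ten outcome measures (not PROSPER data).
p_vals = [0.003, 0.012, 0.020, 0.041, 0.049, 0.13, 0.25, 0.40, 0.61, 0.88]
for q in (0.05, 0.10, 0.25):
    kept = benjamini_hochberg(p_vals, q)
    print(f"q = {q:.2f}: {sum(kept)} of {len(p_vals)} remain significant")
```

With these made-up p-values, 1, 5 and 6 outcomes remain significant at q of 0.05, 0.10 and 0.25 respectively. The outcome with p = 0.041 survives at q = 0.10 even though it misses its own rank threshold, because the procedure keeps everything ranked below the largest p-value that passes.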
If by 12th grade the interventions had little significant effect, this might have been related to a waning in their impacts on the factors thought to account for effects on substance use. Though more consistently apparent at earlier follow-ups, by the 9th grade when pupils were typically aged 14–15, just eight of the 21 factors were significantly different among PROSPER versus control pupils. Among those which were not were the pupils’ confidence in their abilities to refuse an offer of drugs, their attitudes regarding substance use, and perhaps most closely related to actual use, their intentions to use a range of substances over the coming year.
On the basis of the published report, only one of the eight significant findings would certainly have passed a two-tailed test, and it is unclear whether any would have remained after adjustment for multiple measures. Whether significant or not, differences were generally very small.
The ambition to offer a universally applicable prevention option of public health significance means the main interest is in PROSPER’s impacts across all the children in the sample. It is, however, reported that on certain measures effects were significantly greater among the roughly 30% of pupils who at the start had already used cannabis, alcohol or tobacco, than among “lower risk” pupils. By the 12th grade, of the individual substances, this applied only to cannabis, but non-significant trends were in the same direction for the other substances.
As the authors say, this means PROSPER was at least no less effective among pupils who raise greatest concerns. This is, however, not the same as saying that it had statistically significant impacts among these pupils – that it has been shown to significantly restrain their tendency to more rapidly and more commonly engage in substance use.
The main methodological concern is whether this way of dividing the sample into higher and lower risk was planned before it was known to produce desirable results. With no publicly available pre-data analysis plan, the possibility cannot be excluded that the division criterion was chosen not on the basis of its theoretical or practical salience, but in order to best demonstrate that PROSPER would not fail to protect children at higher risk, a concern reinforced by a shift in the dividing line from that used in an earlier study; unfold supplementary text for details.
Even if a varied set of combined interventions has no overall effect on a particular measure, certain variants might. With non-universal attendance at the family intervention and three different school interventions, the PROSPER trial offered a chance to investigate this possibility. The answers given (1 2) were that attending the family intervention and, among the school programmes, Life Skills Training, were the most active ingredients.
Since both were choices rather than the result of random allocation, the findings can only be considered suggestive. Choice brings with it other factors, associated with the reasons for making or not making those choices, which could have affected the outcomes regardless of the impact of the interventions; this is clearest in the decision to attend the family intervention, made by only a perhaps atypical minority. It is also a concern that the two analyses selected very different outcomes via which to test the interventions, raising the possibility of ‘cherry-picking’ to be able to report desirable results.
How much difference the choice to attend might make can be gauged from the apparent boost to substance use prevention which PROSPER found from supplementing Life Skills Training with the family programme. No such boost was apparent in a trial which eliminated choice through random allocation. Unfold supplementary text for details.
In PROSPER the first (free source at the time of writing) post-intervention follow-up was conducted in the 7th grade when pupils would typically have been aged 13, 18 months after baseline measures. Of the 14 measures assessed, in eight cases fewer children in PROSPER districts had used or started to use substances, results reached on the appropriately stringent basis of two-tailed tests of significance and meeting the conventional ‘fewer than 1 in 20 times by chance’ criterion, though not adjusted for the multiple chances PROSPER was given to prove effective.
In a harbinger of later findings, results were strongest for cannabis and non-significant for alcohol- and tobacco-related measures. Since the baseline measures were taken, about 3.7% of PROSPER-district pupils had started to use cannabis compared to 6.1% in control districts, and 2.8% v. 4.8% of baseline non-users had used the drug in the past year. Past-year use of inhalants (‘glue sniffing’) was also significantly less common among PROSPER-district pupils, but not past-month smoking. Recent methamphetamine use was not reported on because it was so rare, effectively meaning that at this stage PROSPER had no demonstrable effect on it. The upshot is that of the six measures of recent use of different substances which were or might have been reported on, only the two related to cannabis and inhalants showed statistically significant benefits, and these may not have survived stricter significance criteria to take account of the multiple chances given PROSPER to prove effective.
Among other significant results were slower uptake of inhalant use, a lower ‘lifetime illicit substance use index’ score combining ever-use of methamphetamine, ecstasy, cannabis (by smoking), drugs or medications prescribed for someone else, or certain prescription-only painkillers, and a lower score on an index combining the same cannabis use measure with ever-use of alcohol or cigarettes. Both indices may primarily reflect differences in cannabis use.
Of the 14 measures reported on across the whole sample, seven concerned ‘uptake’ of substance use among pupils who at the start of the trial had never used the substance concerned. In effect these were subgroup analyses, involving only prior non-users. Of the 12,022 pupils who started the trial and the 10,781 who supplied 18-month follow-up data, between 7766 and 8385 pupils were included in the analyses. Subgroup analyses weaken the reassurance of equivalence of starting point given by the randomisation of the whole sample, though in this case probably only slightly given the size of the subgroup (non-users comprised at least 90% of the full sample for each measured substance).
Perhaps more seriously, the measure used was the absolute difference in proportions of new users between PROSPER and control districts, not the recommended relative measure. Why this matters can be appreciated from a hypothetical example. Assume 100 pupils in each arm of the trial, of whom 90 PROSPER pupils and 80 control pupils had not used cannabis before the intervention. If 20 of the 90 PROSPER non-users have started use 18 months later, the uptake proportion is 22%. If 30 of the 80 control non-users have started use 18 months later, the uptake proportion is 38%. It was these kinds of calculations which were compared in the PROSPER trial report. But in relative terms the number of PROSPER users had trebled from 10 to 30 (an increase of 200%), while the number of control users had risen from 20 to 50, just 2.5-fold (an increase of 150%) – demonstrating that the two calculation methods can give opposing impressions of the effectiveness of an intervention. Whether this might have been the case in the PROSPER trial is impossible to say, because there seems to be no table detailing the baseline characteristics of the pupils in PROSPER v. control districts, including the proportions who had already used the various substances, though it seems none of the differences were statistically significant. Similar issues arise in respect of the five measures of rates of recent use of substances, which also excluded baseline users and reported absolute differences in proportions.
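The hypothetical example can be checked with a few lines of code. The function and field names are ours; the figures are those of the example above, not PROSPER data.

```python
def uptake_summary(arm_size: int, baseline_users: int,
                   new_users: int) -> dict[str, float]:
    """Compare two ways of describing growth in use within one trial arm."""
    non_users = arm_size - baseline_users
    final_users = baseline_users + new_users
    return {
        # Measure compared in the trial report: new users among prior non-users.
        "uptake_among_non_users": new_users / non_users,
        # Relative measure: how many times larger the group of users has become.
        "users_multiplier": final_users / baseline_users,
    }


# Hypothetical arms of 100 pupils each, as in the worked example.
prosper = uptake_summary(arm_size=100, baseline_users=10, new_users=20)
control = uptake_summary(arm_size=100, baseline_users=20, new_users=30)
for name, s in (("PROSPER", prosper), ("control", control)):
    print(f"{name}: uptake {s['uptake_among_non_users']:.0%}, "
          f"users multiplied by {s['users_multiplier']:.1f}")
```

PROSPER looks better on uptake among prior non-users (22% v. 38%), but worse on growth in the number of users (threefold v. 2.5-fold).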
At this early stage there were consistent signs, some statistically significant, that PROSPER had greater effects among the roughly 30% of pupils at higher risk (those who at the start had already used cannabis, alcohol or tobacco) than among lower-risk pupils.
Three years later, 4½ years after the baseline measures had been taken, when the pupils were in the 10th grade and typically aged 15–16, the report (free source at the time of writing) again made no adjustment for the multiple chances given PROSPER to prove effective, and used one-tailed tests, questionably doubling the chances of a ‘statistically significant’ finding.
Recalculating on the basis of two-tailed tests suggests seven of the 15 measures of use or uptake of use at 10th grade were statistically significant, and most were probably strong enough (it depends on the method used and in some cases on correlation between the measures or choice of acceptable error rate) to survive adjustment for multiple tests. The concerns about the ‘new user’ measures described above apply again, but this time there is no mention of the recent-use measures also being based on the subgroup of baseline non-users. Uptake of illicit substances was most strongly affected by PROSPER, while on a two-tailed basis there were no statistically significant effects on tobacco or alcohol use either in terms of numbers of new users or recent use. Past-year use of cannabis and (more strongly) methamphetamine were each significantly less common among PROSPER pupils, and use of inhalants narrowly missed being added to this list. Of the measures combining use of several substances, one reflecting ever-use of illicit drugs was significantly lower in PROSPER districts, again probably primarily reflecting differences in cannabis use. A similar measure combining ever-use of alcohol, tobacco and cannabis just missed significance. Additionally, trends in the growth of substance use over the follow-up period were often significantly less steep in PROSPER districts. There was no analysis of whether PROSPER’s effects were greatest among high-risk pupils.
Finally, a year later, the featured report gave findings for the 11th grade when pupils were aged 16–17. On a two-tailed basis, at this stage eight of 18 measures registered significantly lower past or recent substance use among PROSPER v. control pupils. All eight related to either inhalants or illicit substances, none to tobacco or alcohol. Significant findings included the combined measure of ever having illicitly used a range of substances. Nine of the measures reflected recent use. Of these, three involving cannabis or methamphetamine registered statistically significant reductions among PROSPER v. control pupils, though the methamphetamine finding would probably not have survived stricter significance criteria to take account of the multiple chances given PROSPER to prove effective. Unaffected to a significant degree were recent drunkenness, drink-driving, smoking or inhalant use.
In 2017 further results from the study took us on another year to when the former pupils had left school and were aged about 19, a welcome and rare test of the persistence of school-age prevention efforts. The authors saw their findings as supporting “the utility of the PROSPER delivery system to provide preventive health benefits into emerging adulthood.” As at the 12th grade a year earlier, the overall pattern of the findings did favour PROSPER, especially in relation to the recent frequency of cannabis use, a difference which may have partly accounted for youngsters from PROSPER districts experiencing fewer drug-related problems.
Two-tailed tests were used to assess statistical significance, an appropriate strategy. But if the bar had been raised to account for the multiple chances given PROSPER, none of the recent-use measures may have proved statistically significant, including the frequency of cannabis use. The fact that an unknown but perhaps substantial proportion of pupils asked to provide data did not do so also means the results may be less applicable to the entire population than those gathered while pupils were still in school and easier to reach. For details unfold supplementary text.
For the first time the age-19 report alluded to the “primary substance misuse outcomes targeted by PROSPER”. Among them were those for which “evidence for positive intervention effects was most pronounced”. It seems that these primary outcomes related mainly to illegal drug use and non-prescribed use of opioid painkillers – six measures of which five were reported to be significantly lower among participants from PROSPER districts. With no evidence that these were specified as primary in advance of this analysis, nor any hints from previous years that they were the most important, the possibility cannot be ruled out that they were selected at least partly because they provided the most convincing evidence. It seems surprising that measures of the frequency of relatively serious events like drunkenness and drink-driving were not among those selected.
Studies of ingredients of PROSPER’s intervention menu outside the context of the PROSPER delivery system have registered some statistically significant preventive effects, but overall do not offer strong support for their consistent success, either delivered on their own or in combination.
Most relevant was an earlier trial from the same research team of the combination of programmes (Life Skills Training and the Strengthening Families Program) which on one measure at least was judged the most effective in the featured trial. It was conducted in the same type of communities as those recruited to PROSPER. Had the more appropriate two-tailed tests been used, at the 12th grade follow-up it seems that only the initiation of cannabis smoking, and a measure combining this with initiation of drinking and tobacco smoking, would have been significantly lower after the combined programme. Analyses for current use over the past month and other more serious patterns of substance use produced no significant results favouring the interventions. It seems unlikely that the cannabis use finding would have survived an adjustment for the multiple chances given the interventions to prove significant.
The same study also tested Life Skills Training on its own, the programme chosen by four of the PROSPER teams as their school-based programme. As in other trials of this extensive curriculum, long-term effects were most convincing in respect of smoking. For cannabis they were only partially significant, and for alcohol not significant at all. Unfold the supplementary text for more on this trial and other studies of the family programme and school-based programmes used in the featured trial.
Thanks for their comments on this entry in draft to research author Richard Spoth of Iowa State University in the USA and his colleagues, and to Dennis Gorman of the Texas A&M University System in the USA and David Foxcroft of Oxford Brookes University in England. Comments from research author Professor Spoth and colleagues are available on his university’s web site. Some of the comments have been incorporated in the Summary and Commentary above, but there remain differences of opinion. Commentators bear no responsibility for the text including the interpretations and any remaining errors.
Last revised 10 January 2018. First uploaded 29 August 2017
Top 10 most closely related documents on this site.
STUDY 2000 Education's uncertain saviour
REVIEW 2019 Family-based prevention programmes for alcohol use in young people
STUDY 2010 One-year follow-up evaluation of the Project Towards No Drug Abuse (TND) dissemination trial
STUDY 2011 Effects of a school-based prevention program on European adolescents' patterns of alcohol use
REVIEW 2014 Interventions to reduce substance misuse among vulnerable young people
STUDY 2010 Long-term effects of the Strong African American Families program on youths’ alcohol use
HOT TOPIC 2017 It’s magic: prevent substance use problems without mentioning drugs