START NOW: a cognitive behavioral skills training for adolescent girls with conduct or oppositional defiant disorder – a randomized clinical trial
Conflict of interest statement: See Acknowledgements for full disclosures.
Abstract
Background
Conduct disorder (CD) and oppositional defiant disorder (ODD) both convey a high risk for maladjustment later in life and are understudied in girls. Here, we aimed at confirming the efficacy of START NOW, a cognitive-behavioral, dialectical behavior therapy-oriented skills training program aiming to enhance emotion regulation skills, interpersonal and psychosocial adjustment, adapted for female adolescents with CD or ODD.
Methods
A total of 127 girls were included in this prospective, cluster randomized, multi-center, parallel group, quasi-randomized, controlled phase III trial, which tested the efficacy of START NOW (n = 72) compared with standard care (treatment as usual, TAU, n = 55). All female adolescents had a clinical diagnosis of CD or ODD, were 15.6 (±1.5) years on average (range: 12–20 years), and were institutionalized in youth welfare institutions. The two primary endpoints were the change in number of CD/ODD symptoms between (1) baseline (T1) and post-treatment (T3), and (2) between T1 and 12-week follow-up (T4).
Results
Both treatment groups showed reduced CD/ODD symptoms at T3 compared with T1 (95% CI: START NOW = −4.87, −2.49; TAU = −4.94, −2.30). There was no significant mean difference in CD/ODD symptom reduction from T1 to T3 between START NOW and TAU (−0.056; 95% CI = −1.860, 1.749; Hedge's g = −0.011). However, the START NOW group showed greater mean symptom reduction from T1 to T4 (−2.326; 95% CI = −4.274, −0.378; Hedge's g = −0.563). Additionally, secondary endpoint results revealed a reduction in staff reported aggression and parent-reported irritability at post assessment.
Conclusions
Although START NOW did not result in greater symptom reduction from baseline to post-treatment compared with TAU, the START NOW group showed greater symptom reduction from baseline to follow-up with a medium effect size, which indicates a clinically meaningful delayed treatment effect.
Introduction
Conduct disorder (CD) and oppositional defiant disorder (ODD) are characterized by repeated patterns of rule-breaking, aggressive or defiant behaviors that exceed the extent of what is considered age-appropriate behavior. Although both disorders have some distinct characteristics, there is convincing evidence for an overlap in neurocognitive functioning (Kleine Deters et al., 2020) as well as brain function and structure (Noordermeer, Luman, & Oosterlaan, 2016) between CD and ODD independent of comorbid symptoms. In particular, emotion processing impairments are observed in both disorders (Fairchild et al., 2019). As CD and ODD share common biological and psychosocial risk factors, existing treatment recommendations are generally not specific to either disorder (NICE, 2017).
Both CD and ODD come with enormous long-term costs to the individual and to society (American Psychiatric Association, 2013). Although CD and ODD are less prevalent and often emerge later in girls than in boys, they are still common mental disorders of female youths (Konrad et al., 2022). However, most of the large-scale studies guiding theory and interventions related to CD/ODD have been based on male samples. In girls, CD and ODD have received comparatively little scholarly or evidence-led intervention attention, although the course and overall psychosocial adjustment problems associated with the disorder are often even more severe, and associated with a higher rate of lifetime comorbidities (Konrad et al., 2022).
Adolescents living in youth welfare institutions in particular display many CD/ODD symptoms (Bronsard et al., 2016). Additionally, they often have a history of traumatic experiences and childhood adversity (Gordon, Nguyen, Mitchell, & Tyler, 2023) with severe impairments in emotion regulation capacities. Still, there is a general lack of specific interventions as well as of access to targeted, evidence-based, and cost-effective care and treatment in residential care settings (González-García et al., 2017). Evidence-based strategies may also be relevant for staff working with adolescents diagnosed with CD or ODD, who often show difficulties in complying with institutional rules, a major area of concern within the youth welfare sector (Smith, Colletta, & Bender, 2021). Hence, especially in cases of severe or chronic aggressive behavior, multimodal approaches focusing on both adolescents and staff working in the welfare institutions are urgently needed. Such approaches train professionals to motivate adolescents to change maladaptive behavior and to teach them evidence-based skills to better cope with emotional and stressful demands in daily life (NICE, 2017). Dialectical behavioral therapy (DBT), which integrates methods beyond traditional behavioral and cognitive concepts such as mindfulness or emotion regulation skills, is discussed as a promising intervention approach for externalizing, antisocial or offending behavior (Visdomine-Lozano, 2022). A recent systematic review and meta-analysis indicates that DBT significantly reduces anger and shows a non-significant trend toward reducing aggression (Ciesinski, Sorgi-Wilson, Cheung, Chen, & McCloskey, 2022). There is also some evidence for DBT being an effective treatment for youth in residential care settings (McCredie, Quinn, & Covington, 2017).
Another integrative approach that has accumulated support for use within correctional (Kersten, Cislo, Lynch, Shea, & Trestman, 2016) and outpatient (Truong et al., 2021) adult populations is START NOW, which was developed from dialectical behavioral therapy – corrections modified (DBT-CM; Sampl, Wakai, & Trestman, 2010). The program is an evidence-based, manual-guided, gender-specific, strengths-based skills training incorporating methods of cognitive behavioral therapy (CBT) and DBT, motivational interviewing (MI), and trauma-informed care. The program was effective in reducing rule-violating and aggressive behavior in male adult correctional samples living in correctional facilities (Kersten, Cislo, et al., 2016). Similar results were obtained in male incarcerated adolescents receiving the previous version of START NOW, DBT-CM (Shelton, Kesten, Zhang, & Trestman, 2011). Thus, correctional institutions, but also youth welfare institutions constitute adequate settings for comprehensive treatment approaches such as DBT or START NOW, being able to provide ample opportunity for everyday coaching to transfer learned skills into daily life.
However, due to the nature of the setting (i.e., numerous barriers impeding implementation of scientific studies), robust, high-quality treatment evaluation studies within youth welfare settings are scarce (James, Thompson, & Ringle, 2017). Thus, the present study aimed to evaluate the efficacy of the skills training program START NOW, adapted for female adolescents with CD or ODD, compared to standard care in youth welfare settings (treatment as usual, TAU) within a randomized controlled trial. It was hypothesized that female adolescents participating in the START NOW program, which was applied as an add-on to standard care, would show a greater reduction in CD/ODD symptomatology compared with adolescents receiving standard care only. Correspondingly, the primary endpoints refer to the change in the number of fulfilled CD/ODD symptoms assessed with a semi-structured psychiatric interview. Furthermore, it was hypothesized that START NOW would result in improvements in self-rated psychopathology, irritability, emotion regulation, self-reported life satisfaction and staff-rated aggression compared to standard care alone. Additionally, in the intervention group, participant and trainer satisfaction with START NOW was assessed.
Methods
The specific aim of the present study was to evaluate the efficacy of the START NOW intervention adapted for females with CD/ODD living in youth welfare institutions (for the study protocol see Kersten, Praetzlich, et al., 2016).
Trial design and setting
The present study was a prospective, confirmatory, cluster-randomized, multi-center, international phase III trial with two parallel CD/ODD patient groups (START NOW intervention vs. standard care/treatment as usual, TAU). The study was conducted at four sites in three countries (Aachen and Frankfurt, Germany; Basel, Switzerland; Amsterdam, The Netherlands) as part of the European multi-disciplinary FP7 project “Neurobiology and Treatment of Adolescent Female Conduct Disorder: The Central Role of Emotion Processing” (FemNAT-CD; see http://www.femnat-cd.eu/ for more information).
The control condition consisted of any standard care available at the respective youth welfare institution. The intervention condition consisted of TAU and the START NOW program as add-on (from here on referred to as START NOW). The study protocol was approved by the respective university-based institutional review boards (University of Aachen, University of Frankfurt, University of Basel, University of Amsterdam). Deviations from the protocol are described in Appendix S1.
Participants and procedure
Participants were recruited from youth welfare institutions within a reasonable distance to the four participating sites (Frankfurt, Aachen, Amsterdam, Basel) from December 2014 to July 2018. Eligibility criteria included female sex, age between 12 and 20 years, current clinical diagnosis of CD or ODD, and sufficient knowledge of German or Dutch (reading and writing skills). Exclusion criteria included verbal, performance, or total intelligence quotient (IQ) below 70, a history or current diagnosis of autism spectrum disorder or schizophrenia, current diagnosis of bipolar disorder or mania, fetal alcohol syndrome, any known monogenetic disorder or genetic syndrome, any chronic or acute neurological disorder, any severe medical condition interfering with the intervention, or any concurrent group-based psychotherapy. Eligibility status was assessed during an initial screening (T0), which was conducted from December 2014 to February 2018. All adolescents or their legal guardians (if the participant was younger than 14 years) provided written informed consent at screening. This superiority trial included three assessment time points: baseline assessment (T1) conducted from February 2015 to February 2018, post-treatment assessment (T3, 12 weeks after T1) from June 2015 to May 2018, and follow-up assessment (T4, 12 [±1] weeks after T3 or 24 [±1] weeks after T1, respectively) from September 2015 to July 2018. Participants' demographic and clinical characteristics are provided in Table 1. All adolescents were compensated for participation in the assessments. Compensation for filling out questionnaires varied from small gifts (for example, make-up utensils) to vouchers, depending on the recommendations of the local ethical committees.
Variable | START NOW: Mean (SD) | START NOW: Range | START NOW: N | TAU: Mean (SD) | TAU: Range | TAU: N | Difference a: p-Value |
---|---|---|---|---|---|---|---|
Age T1 (years) | 15.9 (1.4) | 12.3–20.1 | 72 | 15.1 (1.5) | 12.0–18.1 | 55 | .009 |
IQ-Test verbal | 88.7 (12.0) | 60–135 | 67 | 89.1 (15.1) | 65–120 | 52 | .972 |
IQ-Test performance | 98.1 (14.4) | 60–130 | 67 | 95.1 (13.5) | 65–125 | 51 | .192 |
IQ-Test estimate total | 93.7 (10.9) | 73–120 | 67 | 92.5 (11.7) | 68–115 | 51 | .699 |
CD symptoms (current T1) | 4.8 (2.7) | 1–11 | 72 | 4.7 (2.3) | 0–10 | 51 | .770 |
CD symptoms (past T1) | 1.5 (2.0) | 0–9 | 72 | 1.3 (2.2) | 0–9 | 51 | .430 |
ODD symptoms (current T1) | 5.3 (1.8) | 0–8 | 72 | 5.3 (1.8) | 0–8 | 51 | .994 |
ODD symptoms (past T1) | 1.0 (1.6) | 0–7 | 72 | 0.8 (1.4) | 0–7 | 51 | .618 |
Number of CD/ODD symptoms (T1) | 10.0 (3.4) | 4–19 | 71 | 9.6 (2.8) | 4–14 | 53 | |
Number of CD/ODD symptoms (T3) | 4.4 (3.3) | 0–13 | 49 | 6.0 (3.1) | 1–14 | 39 | |
Number of CD/ODD symptoms (T4) | 3.0 (2.9) | 0–12 | 45 | 6.0 (3.2) | 0–11 | 31 | |
T3–T1 (days) | 160.9 (57.1) | 98–301 | 51 | 124.7 (29.1) | 78–185 | 45 | <.001 |
T4–T1 (days) | 243.1 (58.4) | 166–392 | 46 | 224.8 (54.6) | 142–266 | 31 | <.001 |

Variable | START NOW: N | START NOW: % | TAU: N | TAU: % | Difference b: p-Value |
---|---|---|---|---|---|
Site | | | | | .404 |
Frankfurt (Germany) | 21 | 29.2 | 10 | 18.2 | |
Aachen (Germany) | 11 | 15.3 | 11 | 20.0 | |
Amsterdam (The Netherlands) | 15 | 20.8 | 13 | 23.6 | |
Basel (Switzerland) | 25 | 34.7 | 21 | 38.2 | |
Comorbidity (current at T1) | | | | | |
Depression | 15 | 20.8 | 8 | 15.7 | .471 |
ADHD | 19 | 26.4 | 9 | 17.6 | .255 |
Substance use disorder | 11 | 15.3 | 5 | 9.8 | .530 |
Anxiety disorder | 15 | 20.8 | 9 | 17.6 | .660 |
PTSD | 8 | 11.1 | 10 | 19.6 | .189 |
Borderline personality disorder | 8 | 12.1 | 10 | 20.4 | .109 |
Concomitant psychotropic medication | | | | | |
At least one | 29 | 40.3 | 13 | 23.6 | .048 |
No medication | 43 | 59.7 | 42 | 76.4 | |
- ADHD, attention deficit hyperactivity disorder; CD, conduct disorder; IQ, intelligence quotient; ODD, oppositional defiant disorder; PTSD, posttraumatic stress disorder; T1, baseline assessment; T3, post-assessment; T4, follow-up assessment; TAU, treatment as usual.
- a Mann–Whitney U test.
- b Chi-square test.
Sample size calculation was done using nQuery Advisor 7.0 software (Statistical Solutions, Cork, Ireland) and was based on effect sizes regarding change in the primary outcome variable (Nelson-Gray et al., 2006). That is, a pre-post change in the treatment group of μ = 3 and in the control group of μ = 1, as well as a conservative common standard deviation of 3.5 for both groups were expected. Additionally, to account for the correlated data structure (for example, due to group therapy), a design effect of 1.08 was assumed. To achieve a power of 80% (α = 5%, two-sided; two-sample t-test), a total of 108 patients was required. We planned to assess 172 patients to obtain 128 patients to be randomized, assuming that 25% of screened patients would not be eligible and that 15% would drop out, including those lost to follow-up or with major protocol violations. A major protocol violation was defined as participation in fewer than five group sessions (START NOW group only).
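The sample size reasoning above can be retraced with a short calculation. The following Python sketch is an illustration only and not the nQuery computation: it assumes a normal approximation with Guenther's small-sample correction as a stand-in for the exact two-sample t-test calculation, and the function name `n_per_group` as well as the rounding steps are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Normal-approximation sample size for a difference in means.
    n_normal = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    # Guenther's correction approximates the t-based result (assumption).
    return math.ceil(n_normal + z_alpha ** 2 / 4)


# Expected change of 3 (treatment) vs. 1 (control), common SD of 3.5.
per_group = n_per_group(delta=3 - 1, sd=3.5)

# Inflate the total by the assumed design effect of 1.08.
# round() guards against floating-point noise before rounding up.
total_required = math.ceil(round(2 * per_group * 1.08, 6))

# Unrounded recruitment targets implied by the stated attrition rates.
to_randomize = total_required / (1 - 0.15)  # 15% drop-out
to_screen = 128 / (1 - 0.25)                # 25% of screened not eligible

print(per_group, total_required)
```

Under these assumptions the sketch arrives at 50 participants per group and 108 in total, in line with the reported requirement. The two unrounded targets (about 127.1 and 170.7) sit just below the planned 128 randomized and 172 screened participants.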
Randomization and blinding
Randomization was done after baseline assessment at the group level (with groups comprising 4–8 participants). A web-based randomization software was used (http://randomizer.at) by investigators at the respective study centers in Germany, the Netherlands or Switzerland. Staff of the respective groups were informed about their allocation immediately post randomization. Participants were informed by their institution and via mail. In the case of early drop out, individuals were replaced within the respective group (six in START NOW, four in TAU). For these individuals, baseline assessment was not blind as randomization allocation was known to the staff of the group. Early drop-out was defined as dropping out before start of the intervention (START NOW), or dropping out within 7 days post randomization (TAU). More information on the respective assessment time points and periods of the study is found in the published study protocol (Kersten, Praetzlich, et al., 2016).
A quasi-randomized controlled AB-BA design was used, such that institutions or the recruited living group were randomized to start either with START NOW (A) or with TAU (B). After having conducted A or B, each institution would run the other condition: institutions having completed a TAU cycle would subsequently receive START NOW, and institutions that had finished START NOW would conduct a new group with TAU. The following rules were defined to eliminate the risk of carry-over effects. First, only caretakers who had not previously been involved or trained ran the TAU condition. Second, institutions guaranteed that the risk of an exchange between new participants and girls who had already received START NOW was minimized as much as possible (for example, girls were placed in different living groups, or the duration of stay in the institution was shorter than the duration of the whole intervention period). We tried to achieve the same number of participants completing START NOW and TAU.
Intervention
The START NOW skills training was conducted by two staff members of the respective youth welfare institution. Institutions randomized to start with the START NOW training received a 2-day pre-training, which provided general information on the theoretical background of START NOW (DBT, CBT, trauma-sensitive care and MI). Furthermore, facilitators were trained in running the sessions, including functional analyses of emotions and behavior, mindfulness exercises and the topics of the various sessions such as accepting emotions, building up interpersonal skills and setting goals. Facilitators were trained to enable individual practicing (in-vivo coaching) in daily life and to use MI techniques to elicit behavioral changes in START NOW participants.
The present START NOW skills training consisted of 12 weekly group sessions (90 min each) and 12 individual sessions (45 min each). Participants in the TAU group took part in the standard care provided within their institution, excluding any group-based psychotherapeutic approaches like START NOW (for example, CBT or DBT-oriented programs).
In addition to the 2-day START NOW training described above, which included a practical and a written test, trainers received ongoing bi-weekly clinical supervision by experienced, certified clinicians. Fidelity was monitored as follows: two START NOW sessions (videotaped or attended in person) were evaluated by the supervisors with a quality adherence form (QAF, see Appendix S7) assessing the presented content (4 items, for example, ABC exercises, mindfulness exercise) and the implemented quality of the session (9 items, for example, trying to stimulate change talk, rolling with resistance, validation of participants) on a 4-point Likert scale (ineffective, acceptable, effective, very effective). Results indicate high-quality adherence on almost all items (see Appendix S8).
Primary endpoints
The two primary endpoints of the study were the change in number of CD/ODD diagnostic symptom criteria (current episode) as assessed by the Kiddie Schedule for Affective Disorders and Schizophrenia for School-Age Children (K-SADS; Kaufman, Birmaher, Brent, Rao, & Ryan, 1997) (1) between T1 (baseline) and T3 (post-treatment) and (2) between T1 (baseline) and T4 (follow-up).
The K-SADS was administered separately to participants and their social worker from the participant's respective youth welfare institution, who was, if possible, not involved in the intervention. Both perspectives (adolescent/social worker) determined the final judgment regarding the presence of a specific DSM-IV/TR symptom (given/not given or subthreshold/missing information). First, symptoms were summarized to achieve the final diagnosis of CD and ODD according to the DSM-IV/TR algorithm. Second, a dimensional, combined summary score of CD and ODD symptoms was computed, so that the change in number of symptoms could range between −23 and +23. All K-SADS interviews were done by trained master- and doctoral-level students supervised by an experienced clinician. Inter-rater reliabilities (IRR, N = 75, see Konrad et al., 2022) of CD, ODD (as well as assessed comorbid diagnoses) were high (Cohen's κs ≥ .84, agreement rates ≥ 92%).
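The dimensional score described above can be illustrated with a minimal Python sketch. It assumes that each of the 15 CD and 8 ODD criteria has already been resolved to present or absent; how subthreshold or missing ratings are handled is not covered here. The function names and the example participant are hypothetical.

```python
N_CD_ITEMS = 15   # number of DSM-IV-TR conduct disorder criteria
N_ODD_ITEMS = 8   # number of DSM-IV-TR oppositional defiant disorder criteria


def symptom_count(cd_items, odd_items):
    """Count symptoms rated as present across CD and ODD items."""
    if len(cd_items) != N_CD_ITEMS or len(odd_items) != N_ODD_ITEMS:
        raise ValueError("expected 15 CD items and 8 ODD items")
    return sum(map(bool, cd_items)) + sum(map(bool, odd_items))


def symptom_change(count_baseline, count_later):
    """Negative values mean fewer symptoms than at baseline."""
    return count_later - count_baseline


# Hypothetical participant: 5 CD + 5 ODD symptoms at T1, 2 + 2 at T3.
t1 = symptom_count([True] * 5 + [False] * 10, [True] * 5 + [False] * 3)
t3 = symptom_count([True] * 2 + [False] * 13, [True] * 2 + [False] * 6)
change_t3 = symptom_change(t1, t3)
print(t1, t3, change_t3)
```

Because the combined score runs from 0 to 23, the change score is bounded by −23 and +23, as stated in the text.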
Secondary endpoints
Secondary endpoints included change from T1 to T3 and from T1 to T4 in (1) self-reported psychopathology (Youth Self-Report, YSR; Achenbach, 1991); (2) self-reported irritability (Affective Reactivity Index, ARI-S; Stringaris et al., 2012); (3) self-reported emotion regulation (Difficulties in Emotion Regulation Scale, DERS; Gratz & Roemer, 2004); (4) social workers' daily ratings of aggressive symptoms (modified Overt Aggression Scale, M-OAS; Knoedler, 1989), assessed twice a day on 4 consecutive days; (5) self-reported life satisfaction (Brief Multidimensional Students' Life Satisfaction Scale, BMSLSS; Seligson, Huebner, & Valois, 2003); and (6) self-reported and trainer-reported satisfaction with the START NOW program (8-item client (participant) satisfaction questionnaire, CSQ, and 8-item trainer satisfaction questionnaire, TSQ) with four rating options (see Appendix S8). The secondary endpoints are described in more detail in Appendix S2.
Statistical analyses
Participant flow is shown in a CONSORT flow diagram (Figure 1). Patient characteristics stratified by treatment group as randomized are presented as means and standard deviations (SD) or frequencies and percent values (Table 1).

The analysis of the primary endpoints was based on the full analysis set (FAS) consisting of all randomized and additionally recruited subjects, except for early drop-outs. Subjects were analyzed as randomized irrespective of the received intervention, meeting the requirements of an intention-to-treat analysis (ITT). Missing data with respect to the primary endpoints were imputed on the item level by logistic regression with fully conditional specification (Eekhout et al., 2014). The CD/ODD symptom sum score was calculated based on observed (non-missing) and imputed items for the primary analysis. The numbers of missing values for primary and secondary endpoints are indicated in Tables 1 and 3, respectively. Ten imputed data sets were generated and results were combined by Rubin's rules (for details on imputation see Appendix S3). For the complete case analysis set, the symptom sum score was only calculated if all items were observed.
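Combining results across imputed data sets follows a simple recipe, sketched below in Python. This is a generic illustration of Rubin's rules and not the SAS procedure used in the trial; the function name `pool_rubin` and the three example estimates are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    pooled_estimate = mean(estimates)   # average of the point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical group differences from three imputed data sets.
est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
pooled_se = math.sqrt(var)
print(est, var, pooled_se)
```

The between-imputation term is what widens the pooled standard error when the imputed data sets disagree, which is how uncertainty about the missing values enters the final confidence interval.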
Additionally, the intra-cluster correlation coefficient (ICC) was calculated as the mean across 10 imputed data sets. The linear mixed model (LMM) for the second null hypothesis included data from T1, T3, and T4, so that a repeated statement was added to model the within-subject structure. A first-order autoregressive structure was used, but robustness of results was further checked by applying an unstructured matrix. As sensitivity analyses, the LMMs were applied to three further analysis sets: (1) “complete case (CC)” excluding subjects with missing values, (2) “as treated (AT)” with subjects analyzed in the group they were actually treated in, (3) “per-protocol (PP)” consisting of patients who passed through the study without any major protocol deviations (for details on major protocol violations see Appendix S4).
Secondary endpoints were analyzed descriptively according to ITT in the FAS. The same linear mixed models as described above were applied to evaluate differences between groups. Results are summarized as the adjusted mean between-group difference with the corresponding two-sided 95% CI and descriptive p-values. Secondary endpoints were partly imputed by person mean imputation in case of a small number of missing items per subscale (Appendix S3).
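Person mean imputation, as mentioned above, replaces a participant's missing items with the mean of that participant's observed items on the same subscale. The Python sketch below illustrates the idea; the function name and the `max_missing` threshold are hypothetical, since the actual rule for what counts as a small number of missing items is defined in Appendix S3.

```python
def person_mean_impute(items, max_missing=1):
    """Fill missing items (None) with the person's own subscale mean.

    Returns None when too many items are missing to impute
    (max_missing is a hypothetical threshold for illustration).
    """
    observed = [x for x in items if x is not None]
    n_missing = len(items) - len(observed)
    if n_missing == 0:
        return list(items)
    if n_missing > max_missing or not observed:
        return None
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in items]


# Hypothetical subscale with one missing item out of four.
imputed = person_mean_impute([2, None, 4, 3])
too_sparse = person_mean_impute([None, None, 1, 2])
print(imputed, too_sparse)
```

In this example the missing item is filled with 3.0, the mean of the three observed items, while the second response pattern is left unscored.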
Finally, safety analysis included comparison of frequencies of adverse and serious adverse events between treatment groups. p-Values of the Chi-squared tests are reported.
Analyses were prespecified in a statistical analysis plan (Appendix S5). In contrast to the statistical analysis plan, we performed additional sensitivity analyses for the primary endpoints, considering concomitant psychotropic medication (yes = at least one medication) as an additional covariate in the model because of differences in this variable between groups. In addition, the number of imputations for missing data was increased from 5 to 10 due to model stability issues.
Results were considered significant at a two-sided significance level of <.05 or based on 95% confidence intervals. All analyses were performed using SAS® Software Version 9.4 (SAS Institute Inc., Cary, NC). Figures were prepared in R version 4.0.2 (R Core Team, 2015), using the package ggplot2 (Wickham, 2016).
Results
Participants
The final analysis sample consisted of 127 participants from 29 institutions. Participating institutions in the three countries were heterogeneous with regard to size, location (rural vs. urban), but also with regard to staff-to-adolescent ratio or setting (securely locked vs. open).
A flow chart of participants per group is shown in Figure 1. The initial randomization cycle (n = 100, 20 groups) was followed by a second quasi-randomization cycle including 7 groups (n = 24), in which institutions applied the condition opposite to their initial group assignment. Due to early drop-outs (n = 7), 10 additional participants who fulfilled eligibility criteria were recruited. Of the resulting 127 participants, 29 (22.8%) terminated the study before post-assessment, with more drop-outs in the START NOW group compared to TAU (29.1% vs. 14.5%). Another 15% of participants dropped out before follow-up assessment (START NOW: 6.9% vs. TAU: 25.5%). Ten participants were randomized to START NOW but were only treated in the TAU condition. They were analyzed in the TAU group in the “as treated (AT)” set and were excluded from the “per protocol (PP)” set.
Primary endpoints
Figure 2 shows the number of symptoms of participants at baseline, post-assessment, and follow-up stratified by treatment group. In the primary analysis (FAS), an adjusted mean difference for the number of symptoms at T3 versus T1 of −3.68 (95% CI = −4.87, −2.49) for the START NOW group (n = 72) and −3.62 (95% CI = −4.94, −2.30) for the TAU group (n = 55) was found (Figure 2), showing that both treatment groups decreased in number of symptoms. However, symptom decrease at T3 did not differ significantly between groups (Table 2). Thus, the first null hypothesis of equal mean change in the number of CD/ODD symptoms between T1 and T3 for both the intervention and the control group could not be rejected (Table 2). Results of sensitivity analyses for T3 can be found in Table 2 as well as Figure 3. Additionally, the mean ICC based on the 10 imputed data sets was 0.285 (primary analysis) and the observed ICC in the CC set was 0.287.
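To show how an ICC relates to the design effect used in sample size planning, the following Python sketch applies the standard formula for equally sized clusters, design effect = 1 + (m − 1) × ICC. It is an illustration only and not a calculation reported by the authors: the cluster size of 6 is a hypothetical value within the stated range of 4 to 8 participants per group, and the function names are assumptions.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equally sized clusters."""
    return 1 + (cluster_size - 1) * icc


def implied_icc(deff, cluster_size):
    """ICC that corresponds to a given design effect."""
    return (deff - 1) / (cluster_size - 1)


# Hypothetical cluster size of 6 (groups comprised 4-8 participants).
deff_observed = design_effect(cluster_size=6, icc=0.285)
icc_planned = implied_icc(deff=1.08, cluster_size=6)
print(round(deff_observed, 3), round(icc_planned, 3))
```

The formula assumes equal cluster sizes, so with the unequal group sizes of this trial the numbers are only indicative.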

Analysis | N included (START NOW, TAU) | M | 95% CI | t | p-Value | Hedge's g |
---|---|---|---|---|---|---|
Primary analysis (FAS, ITT) H1 | 72, 55 | −0.056 | −1.86, 1.749 | −0.06 | .952 | −0.011 |
Sensitivity: Complete case set H1 | 49, 37 | −0.945 | −3.015, 1.124 | −0.94 | .356 | −0.203 |
Sensitivity: As treated set (imputed) H1 | 62, 65 | −0.647 | −2.406, 1.112 | −0.72 | .471 | −0.127 |
Sensitivity: Per protocol set H1 | 44, 25 | −0.016 | −1.994, 1.962 | −0.02 | .987 | −0.004 |
Sensitivity: covarying medication (FAS) H1 | 72, 55 | −0.011 | −1.840, 1.818 | −0.01 | .991 | −0.002 |
Sensitivity: FAS H2 | 72, 55 | −1.694 | −3.601, 0.213 | −1.74 | .0817 | −0.408 |
Sensitivity: Complete case set H2 | 44, 30 | −2.326 | −4.274, −0.378 | −2.4 | .0203 | −0.563 |
- Least squares mean, 95% confidence intervals, and test statistics for the group difference with respect to the change in the number of CD/ODD symptoms between T1 and T3. CI, confidence interval; FAS, full analysis set (ITT imputed); ICC, Intra-class coefficient; ITT, intention-to-treat; M, Least squares mean; SE, Standard error; T1, Baseline Assessment; T3, Post-treatment Assessment; TAU, treatment as usual.
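For readers less familiar with the effect size reported in Table 2, the Python sketch below shows one common way to compute a bias-corrected standardized mean difference from group summaries. It is an illustration and not the trial's SAS code: the function name `hedges_g` and all input values are hypothetical, and the model-based estimates in Table 2 need not follow exactly this formula.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with small-sample bias correction."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    cohens_d = (mean1 - mean2) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)   # common approximation of the factor J
    return cohens_d * correction


# Hypothetical change scores: group 1 improves by 6.0, group 2 by 3.7,
# both with an SD of 4.0 (none of these SDs are reported in the source).
g = hedges_g(mean1=-6.0, sd1=4.0, n1=44, mean2=-3.7, sd2=4.0, n2=30)
print(round(g, 3))
```

With samples of this size the correction factor is close to 1, so the corrected value differs only slightly from the uncorrected standardized difference.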

As the first null hypothesis could not be rejected, the second null hypothesis was tested in an exploratory rather than a confirmatory manner. It was tested in the FAS and CC set without imputation of missing values with respect to the CD/ODD symptom score at T4, applying a mixed model including data from T3 (with imputed data for FAS) to provide information about the unobserved post-baseline outcomes. Here, in the CC set an adjusted mean symptom reduction of −5.996 (95% CI = −7.548, −4.444) for the START NOW group (n = 44) vs. −3.670 (95% CI = −5.090, −2.251) in the TAU group (n = 30) was found, resulting in a significant mean difference of −2.326 (95% CI = −4.274, −0.378) between the treatment groups (Table 2). In the FAS, the mean difference was −1.69 (95% CI = −3.60, 0.21) between groups (Table 2) with a mean symptom reduction of −4.16 (START NOW) vs. −2.47 (TAU; compare Figure 3).
Secondary endpoints
All secondary endpoints were analyzed in the FAS according to ITT principles with partly imputed data as specified above. In the START NOW group, there was a significant decrease from T1 to T3 in externalizing symptoms (YSR), parent- and self-reported affective reactivity (ARI-P, ARI-S) and emotion regulation difficulties (DERS total score). In the TAU group, there was only a significant decrease in emotion regulation difficulties (DERS total score; Table 3). Furthermore, the START NOW and TAU groups did not differ regarding changes from T1 to T3 in any of the secondary endpoints, except for staff-rated aggression (M-OAS), with a medium effect size, and parent-reported affective reactivity (ARI-P), for which the START NOW group showed a stronger decrease compared with TAU. Least squares means of changes in scores are displayed per group and as difference between groups with 95% confidence intervals in Table 3.
Measure | START NOW: M (T3–T1) | START NOW: 95% CI | START NOW: N | TAU: M (T3–T1) | TAU: 95% CI | TAU: N | Difference (START NOW vs. TAU): M | Difference: 95% CI | Difference: Hedge's g |
---|---|---|---|---|---|---|---|---|---|
YSR externalizing (T-score) | −5.44 | −8.82, −2.05 | 36 | −3.32 | −7.78, 1.14 | 27 | −2.11 | −8.10, 3.87 | −0.178 |
YSR internalizing (T-score) | −2.04 | −5.21, 1.14 | 37 | −2.62 | −6.73, 1.50 | 27 | 0.58 | −4.92, 6.08 | 0.054 |
ARI-S (total score) | −1.66 | −2.84, −0.49 | 45 | −0.34 | −1.71, 1.04 | 33 | −1.33 | −3.28, 0.62 | −0.324 |
ARI-P (total score) | −2.40 | −3.35, −1.46 | 45 | −0.48 | −1.72, 0.75 | 27 | −1.92 | −3.59, −0.25 | −0.572 |
DERS (total score) | −15.51 | −23.12, −7.90 | 42 | −12.27 | −18.97, −5.57 | 33 | −3.24 | −13.41, 6.93 | −0.148 |
M-OAS (aggression score) | −0.76 | −1.54, 0.03 | 41 | 0.68 | −0.07, 1.42 | 25 | −1.44 | −2.53, −0.35 | −0.65 |
BMSLSS (total score) | 1.54 | −0.85, 3.93 | 39 | 0.50 | −1.59, 2.6 | 30 | 1.04 | −2.21, 4.29 | 0.152 |
- Mean change in secondary outcome measure scores by group and least squares means the group difference with respect to changes from T1 to T3 in secondary outcome measures using the full analysis set. ARI-S, Affective Reactivity Index Self-report; BMSLSS, Brief Multidimensional Students' Life Satisfaction Scale; CI, confidence interval; DERS, Difficulties in Emotion Regulation Scale; M, least squares mean of difference between T3–T1 and difference between groups; M-OAS, Modified Overt Aggression Scale; TAU, treatment as usual; YSR, Youth Self-Report.
Safety analysis
Adverse (AE) and severe adverse events (SAE) were assessed and reported in a structured form (Kersten, Praetzlich, et al., 2016). In case of an SAE, the local PI, the study coordinator and the local ethical committee were informed, and the Data Safety and Monitoring Board (DSMB) was contacted for advice on study continuation.
A total of 88 adverse events and 21 severe adverse events occurred, of which 59 (67%) and 11 (52.4%), respectively, were not related to treatment. Overall, 41.7% of patients had at least one adverse event and 12.6% at least one severe adverse event. Furthermore, participants in the START NOW group had significantly more adverse events compared with TAU participants. The mean number of adverse events was 0.9 ± 1.3 (range = 0–7) in the START NOW group and 0.5 ± 0.9 (range = 0–4) in the TAU group (p = .016). However, there was no difference regarding adverse event intensity or whether they were linked to the START NOW treatment or not. We found no difference in severe adverse events between the START NOW (M = 0.2 ± 0.6, range = 0–2) and the TAU (M = 0.1 ± 0.3, range = 0–1) group (p = .439). Frequencies of (severe) adverse events stratified by treatment group, with intensity and link to treatment, are shown in Appendix S6.
Satisfaction with the training
Among adolescents receiving START NOW, satisfaction (percentage of positive ratings) with various aspects of the training was as follows: (1) High quality of the START NOW training (79.8%), (2) I got help I wanted (75.5%), (3) the training met my expectations (71.4%), (4) I would recommend the training (67.3%), (5) I am satisfied with the help I got (79.6%), (6) the training helped me to better cope with my problems (78.3%), (7) the training helped me in my daily life (69.4%), (8) I would participate again in a START NOW training in case I need help (63.3%).
For trainers conducting the training in the institutions, the rate of satisfaction was as follows: (1) High quality of the START NOW training (88.3%), (2) participants got the help they wanted (87.1%), (3) the training met participants´ expectations (77.8%), (4) the training helped participants to better cope with their problems (90.0%), (5) I would recommend the training to other institutions (95.2%), (6) I am satisfied with the support I got as facilitator (90.2%), (7) the quality of the skills-training I conducted was high (93.3%), (8) I will provide further trainings in my institution (83.3%). Results of trainers' satisfaction are given in the Appendices S8 as well as mean scores of CSQ and TSQ.
Discussion
This randomized controlled trial investigated the efficacy of a cognitive-behavioral, DBT-oriented skills training, START NOW, for female youth with CD or ODD, provided by staff within youth welfare institutions, compared to standard care. Contrary to our hypothesis, there was no immediate significant difference in the number of CD/ODD symptoms at the end of the intervention between youth receiving standard care and those receiving START NOW in addition. Yet, secondary endpoints indicated a clinically meaningful stronger reduction in staff-rated aggressive behavior (Hedges' g = −0.65) and parent-rated irritability (Hedges' g = −0.57) at post-assessment.
Furthermore, in exploratory analyses at the 12-week follow-up assessment (T4), CD/ODD symptoms showed a greater reduction in the START NOW group compared to standard care, with a medium effect size. This effect was not simply due to a motivational bias in the sample that completed the follow-up assessment, as the effect remained stable when including the full analysis set, which also comprised the participants who dropped out early. Thus, the findings imply a delayed treatment effect, potentially resulting from prolonged practice of newly acquired skills (Ishikawa, Okajima, Matsuoka, & Sakano, 2007). This interpretation is in line with several meta-analyses on efficacy or effectiveness of CBT approaches for clinically impaired youth (Riise, Wergeland, Njardvik, & Ost, 2021; Sun, Rith-Najarian, Williamson, & Chorpita, 2019; Wergeland, Riise, & Ost, 2021). The positive effect at follow-up might be attributed to prolonged in-vivo coaching by trained staff during daily life beyond the 12-week group training. Furthermore, it is possible that through ongoing practice and supervision staff became more experienced in using MI skills or other intervention techniques.
A recent meta-analysis (Ciesinski et al., 2022) indicates a dose–response effect of DBT on the reduction of anger and aggressive behavior. This finding is also supported by results of a START NOW pre-post study in adults placed in correctional institutions (Kersten, Cislo, et al., 2016; Kersten, Praetzlich, et al., 2016). Accordingly, based on our current results we conclude that sufficient intensity of training and coaching is needed to achieve a meaningful change in CD/ODD symptoms in female youth living in youth welfare institutions.
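For readers less familiar with the effect size metric used above, the following Python sketch shows the textbook definition of Hedges' g: a standardized mean difference based on the pooled standard deviation, multiplied by a small-sample correction factor. It is a minimal illustration, not the trial's analysis code. How the reported values were derived (for example, from least squares means) is not described in this section, and the change scores and standard deviations below are hypothetical. Only the group sizes (72 and 55) are taken from the trial.

```python
import math

def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group a minus group b), small-sample corrected."""
    # Pooled variance, weighting each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    # Common approximation of the small-sample correction factor.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction

# HYPOTHETICAL change scores: mean symptom change of -6 vs. -4, SD = 4 in both groups.
g = hedges_g(-6.0, 4.0, 72, -4.0, 4.0, 55)
print(round(g, 3))
```

A negative value means a larger symptom reduction in the first group. By Cohen's conventional benchmarks, magnitudes of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects.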
Even though we were not able to conduct a mediation analysis investigating underlying mechanisms of observed improvement, in line with Visdomine-Lozano (2022) we would like to emphasize that the following elements (all core elements of START NOW) are essential in the treatment of aggressive and antisocial behavior: analyzing consequences of aggressive and antisocial behavior (functional behavior analysis), improving emotion regulation (identifying/accepting or regulating painful emotions), validating the personal history and understanding problematic behavior as a coping attempt, reducing hostile attribution bias and replacing problematic with functional behavior, practicing social competency, establishing positive attitudes, individual values and goals, and building up or strengthening skills to live according to values and to reach goals.
Replicating previous studies investigating the effectiveness of DBT on anger and aggression (Ciesinski et al., 2022) or DBT-oriented approaches such as START NOW (Kersten, Cislo, et al., 2016; Shelton et al., 2011), results of this randomized controlled trial further show that intervention approaches integrating methods beyond CBT are promising in effectively treating institutionalized youth with CD or ODD. Participants were highly satisfied with the intervention. Still, drop-out was higher in the intervention group than in the TAU group. This can be attributed to the fact that the implementation of the intervention in youth welfare institutions had to be done over a longer time period than originally planned, due to vacation, sickness or other causes of staff shortage.
In the current study, we need to address the following limitations: First, the current RCT was part of a comprehensive assessment within the large FemNAT-CD research project, and it cannot be ruled out that the long assessment time was a burden to participants and limited the validity of self-rated questionnaires. For future studies, we recommend accounting for limited reading skills or lack of attention or motivation to complete a large number of questionnaires in youth with CD or ODD. The individual Goal Attainment Scale could not be analyzed as data were incomplete and/or not valid due to participants' difficulties in understanding the instruction or defining concrete goals as required. Ecological Momentary Assessments had to be canceled as the instruments used (cell phone-like mobile devices) did not comply with some institutions' security restrictions. To assess symptoms in daily life situations, staff-rated aggression measures were collected twice a day on four consecutive days, but only at post assessment and not at follow-up.
Using objective variables, for example, school or professional performance, might be useful to reduce the questionnaire burden for participants. Second, double-blind assessments could only be realized during the baseline-assessment. To overcome any potential bias regarding external interviewees, we aimed to conduct CD/ODD interviews with social workers not involved in the training. Unfortunately, this was not always feasible due to structural limitations (staff shortages, all staff received the training) and/or personnel-related limitations (low motivation, vacation/sick days).
The necessary exclusion criteria may limit the generalizability of the findings of this RCT. Still, sufficient reading skills were necessary to participate in the group-based intervention. A further study could test whether the program is also effective for individuals with low reading skills when they receive additional support in following the skills-training and practicing its everyday implementation. Exclusion of individuals with other disorders, such as pervasive developmental or genetic disorders, or with acute episodes of mental disorders resulting in a considerably reduced level of functioning was necessary, as these individuals are in need of other disorder-specific interventions.
We did not yet consider any changes in biomarkers to measure the efficacy of the intervention on a biological level. To this end, we plan to investigate whether the START NOW skills training improves deficient prefrontal brain activity during effortful emotion regulation, which has been observed in female adolescents with CD (Raschle et al., 2019).
The study was conducted in three countries with a considerable variety in the youth welfare sector regarding structural and educational conditions. However, although institutions differed in Germany, Switzerland, and the Netherlands, center was not associated with change in CD/ODD symptoms (the primary endpoint). Thus, we can conclude that START NOW is an approach suited for a broad variety of institutions.
Conclusion
In sum, the 12-week skills training START NOW for female youth with CD or ODD living in youth welfare institutions did not show an additional immediate effect on reducing CD/ODD symptoms compared to standard care. However, unblinded staff reported reduced aggression in the intervention group directly after intervention and parents reported reduced irritability. Additionally, a significant group difference in CD/ODD symptoms was observed at 3-month follow-up, indicating that adolescents receiving START NOW showed further symptom reduction beyond training participation, whereas adolescents receiving standard care did not. This delayed treatment effect may be due to further improvement in adolescents' skills or improvement in staff competences. To date, this is the largest multicenter randomized controlled trial (RCT) including female patients with CD or ODD living in youth welfare institutions. The findings have several important implications: (1) RCTs can be successfully conducted within youth welfare settings, and the present findings are thus characterized by high external validity. (2) RCTs can be successfully done with females diagnosed with CD or ODD; however, drop-out is somewhat higher in this population than in studies with youth with other mental disorders. In addition, cognitive and motivational capacities of the study population must be carefully considered regarding the use and interpretation of self-assessment methods. (3) START NOW, a highly manualized training making use of standards in cognitive behavior therapy, DBT, MI and trauma-sensitive care, is a highly promising approach in reducing CD/ODD symptoms, as well as aggressive and irritable behavior in youth. As it is designed to be used by staff trained during a two-day intensive course, implementation and sustainability costs are low.
In addition, results of this study indicate trainers and participants were highly satisfied with the START NOW training and highly motivated in acquiring and applying learned skills.
Acknowledgements
The authors are grateful to all female adolescents and staff from youth welfare institutions who participated in this study. C.M.F. receives royalties for books and book chapters on ASD, ADHD, ODD, CD, and MDD. She served as consultant for Servier in 2021. She currently receives research funding from the German Research Association (DFG), the European Commission (EC), and the German Ministry of Science and Education (BMBF). N.M.R. receives funding from the Hochschulmedizin Zurich (HMZ, STRESS), the University of Zurich Research Priority Program ‘Adaptive Brain Circuits in Development and Learning (URPP AdaBD)’ and the Swiss National Science Foundation (105314_207624). The remaining authors have declared that they have no competing or potential conflicts of interest. Furthermore, the authors want to thank Prof. Dr. Chris Kuiper and Drs. Frederique Coelman for their support in conducting this study in youth welfare institutions in The Netherlands. Open access funding provided by Universität Basel.
Funding
This study was supported by the European Commission FP7 program, Grant no. 602407 FemNAT-CD (coordinator: C.M. Freitag). The EC was not involved in study design, data analysis or publication of the project.
Trial registration
BMC Trials. 2016;17:568. DOI: 10.1186/s13063-016-1705-6. German Clinical Trials Register (DRKS), DRKS00007524, (Date: 18.12.2015) and International Clinical Trials Registry Platform (WHO).
Author contributions
Conceptualization: Stadler, Freitag, Popma, Nauta-Jansen, Konrad, Kersten, Kieser, Trestman. Funding acquisition: Stadler, Freitag, Popma, Kieser, Konrad. Data curation: Ackermann, Bernhard, Gundlach, Kersten, Kieser, Kirchner, Kohls, Limprecht, Martinelli, Oldenhof, Prätzlich, Raschle, Unternaehrer, Vriends. Investigation/data acquisition: Ackermann, Bernhard, Gundlach, Kersten, Kohls, Martinelli, Nauta-Jansen, Oldenhof, Prätzlich, Raschle, Vriends. Methodology: Kieser, Kirchner, Limprecht, Stadler, Trestman, Unternaehrer. Project administration: Ackermann, Gundlach, Kersten, Kieser, Kirchner, Limprecht, Oldenhof, Prätzlich, Stadler. Supervision in local sites: Stadler, Freitag, Kohls, Konrad, Popma, Nauta-Jansen, Raschle, Unternaehrer, Vriends. Visualization: Unternaehrer, Kirchner. Writing – original draft: Stadler, Unternaehrer, Kirchner, Kersten. Writing – review and editing: All authors discussed the results and contributed to the manuscript.
Key points
- Compared with standard care, a 12-week add-on START NOW skills-training did not result in greater CD/ODD symptom reduction from baseline to post-treatment. However, results revealed a reduction in staff-reported aggression and parent-reported irritability.
- At 3-month follow-up, participants in the intervention group showed a stronger CD/ODD symptom reduction compared to standard care, indicating a delayed treatment effect.
- Training staff to provide a DBT/CBT-based skills-training may be promising not only to address maladaptive behavior in adolescents with CD or ODD, but also to empower staff to apply effective techniques to better manage oppositional and aggressive behavior.