In this chapter, we review ten structured decision-making tools that show considerable clinical promise in the screening or assessment of risk for different types of violence: Dynamic Appraisal of Situational Aggression—Inpatient Version, Dynamic Appraisal of Situational Aggression—Youth Version, Dynamic Appraisal of Situational Aggression: Women’s Version, Violence-Risk Screening-10, Domestic Violence Screening Instrument, Dynamic Risk Assessment for Offender Re-Entry, Juvenile Sexual Offense Recidivism Risk Assessment Tool—II, Guidelines for Stalking Assessment and Management, Stalking Risk Profile, and Assessment of Risk for Honour-Based Violence. Following a summary of shared features amongst the tools, we provide a brief description of each tool, focusing on their purpose, content, and characteristics, available empirical research on psychometric properties, and an analysis of professional uptake. The chapter concludes with directions for future research on the psychometric properties and applicability of these instruments.
Meta-analytic reviews have identified at least 400 structured violence risk assessment measures used by professionals working in health, justice, and general community settings (Singh et al., 2014). These include well-researched and widely used measures, measures that have received relatively less empirical or clinical attention but nevertheless show considerable promise to assist in the evaluation of risk, and locally developed or “homegrown” measures that have limited, if any, research support.
In this chapter, we discuss structured measures with some research and professional uptake, but which do not yet have an established research base or user base. Some of these measures were developed to screen for violence risk or guide immediate action rather than facilitate a comprehensive risk assessment. We were interested in evaluating measures that show considerable clinical promise in the screening or assessment of risk for different types of violence. We excluded from consideration measures that: (1) do not have evaluations of psychometric properties published in peer-reviewed scholarly journals; (2) were developed for use by a single institution or jurisdiction; (3) were not developed explicitly for the purpose of risk assessment (e.g., behavioural checklists, personality assessments); (4) are not administered by clinicians (e.g., victim judgment tools, self-report appraisals of risk); and (5) were not published in or translated into English. The tools on which we chose to focus use different assessment methods (i.e., structured professional judgment [SPJ] and actuarial) and address diverse outcomes (i.e., general and specific forms of violence) among adults or youth.
Space limitations allowed us to focus on ten measures: Dynamic Appraisal of Situational Aggression—Inpatient Version (Ogloff & Daffern, 2006), Dynamic Appraisal of Situational Aggression—Youth Version (Daffern & Ogloff, 2014), Dynamic Appraisal of Situational Aggression: Women’s Version (Riordan et al., 2019), Violence-Risk Screening-10 (Hartvig et al., 2007), Domestic Violence Screening Instrument (Williams & Houghton, 2004; Williams & Grant, 2006), Dynamic Risk Assessment for Offender Re-Entry (Serin, 2007; Serin, Mailloux, & Wilson, 2012), Juvenile Sexual Offense Recidivism Risk Assessment Tool—II (Epperson, Ralston, Fowers, DeWitt, & Gore, 2006), Guidelines for Stalking Assessment and Management (Kropp, Hart, & Lyon, 2008), Stalking Risk Profile (MacKenzie et al., 2009), and Assessment of Risk for Honour-Based Violence (Kropp, Belfrage, & Hart, 2013).
We first review screening tools, then turn our attention to more comprehensive risk assessment measures. Screening tools are brief and meant to be completed relatively quickly. They typically are developed to identify individuals in need of a more comprehensive evaluation (“screening in”), or to identify individuals presumed to pose relatively lower levels of risk (“screening out”) who would not be subjected to more resource-heavy assessments. They also may assist in determining whether immediate intervention is required. By virtue of their brevity, screening tools are not intended to facilitate long-term risk management. In contrast, risk assessment tools are intended to provide a comprehensive evaluation of risk posed and, in some cases, to facilitate development of an individually tailored risk management plan. For a more in-depth discussion of these issues, see Vincent, Terry, and Maney (2009).
For each measure reviewed, we provide a brief description (including the model of risk assessment, the criterion it is designed to assess, target populations, intended applications, and content), a summary of its psychometric properties (with a focus on interrater reliability and predictive validity), and an analysis of its professional uptake (including countries of use and user feedback regarding the acceptability and feasibility of administering the tool in practice). Given that the SPJ tools described in this chapter were developed through similar means and share similar features, we briefly review universal features of these instruments at the outset to avoid repetition in the text. As is the case for all SPJ tools, the four described in this chapter: (1) were developed through a systematic review of the relevant scientific and professional literatures, including analysis of standards of practice, ethical codes, and relevant legal principles; (2) are intended for use with males and females by trained professionals working in diverse settings (e.g., mental health, law enforcement, human resources, security); (3) should be completed after considering multiple sources of information (e.g., interview with the examinee and collateral sources, and review of diverse types of files, such as mental health, legal, school, employment, and social media); (4) comprise items assessed on a 3-level scale (not present, possibly or partially present, present); (5) allow for consideration of case-specific factors not included on the tool; (6) require evaluators to exercise professional judgment in reaching a final conclusory opinion or summary risk rating (SRR) of low, moderate, or high risk (however, total numerical scores may be generated for research purposes); and (7) allow for reassessments of risk.
As with all actuarial tools, a defining hallmark of the four reviewed in this chapter is that a priori algorithmic rules for combining the data are applied to yield a total numerical score. See the chapter by Heilbrun et al. (Chapter 1 in this volume), for a detailed description of the actuarial and SPJ methods of violence risk assessment. In the final section of this chapter, we discuss directions for future research.
The DASA-IV (Ogloff & Daffern, 2006) is an actuarial measure designed to assess risk for imminent inpatient violence (i.e., within the next 24 hours) and identify targets for staff intervention. The DASA-IV is to be used by a qualified health care professional, such as a psychiatric nurse, with men and women aged 18 years and older who are residing in inpatient forensic or civil psychiatric settings.
The DASA-IV was developed by empirically testing the association between inpatient aggression and all items on the Brøset Violence Checklist (BVC; Almvik, Woods, & Rasmussen, 2000), six items on the Historical-Clinical-Risk Management-20, Version 2 (HCR-20; Webster, Douglas, Eaves, & Hart, 1997), and other empirically derived risk factors (Ogloff & Daffern, 2006) over a 6-month period among a sample of forensic psychiatric patients in Australia. The DASA-IV comprises seven items that together yielded the highest Area Under the Curve (AUC) value in the development sample: two from the HCR-20 (negative attitudes, impulsivity), two from the BVC (irritability, verbal threats), and three additional risk factors (sensitive to perceived provocation, easily angered when requests are denied, and unwillingness to follow direction). Each item is coded as absent (0) or present (1) over the previous 24-hour period. A total score is derived by summing the item scores, with a score of 0 indicating low risk for violence over the upcoming 24-hour period, 1 to 3 indicating moderate risk, 4 or higher indicating high risk, and 6 to 7 indicating imminent risk. Generally, re-evaluations of risk using the DASA-IV should occur every 24 hours.
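The scoring rule just described can be illustrated with a minimal sketch. It is offered only as an illustration of the arithmetic and is not part of the DASA-IV manual; the function names are hypothetical, and because the "high" (4 or higher) and "imminent" (6 to 7) bands overlap as described, the sketch assumes that the most severe applicable band is the one reported.

```python
# Illustrative sketch only; names are hypothetical, not from the DASA-IV manual.
DASA_IV_ITEMS = (
    "negative attitudes",
    "impulsivity",
    "irritability",
    "verbal threats",
    "sensitive to perceived provocation",
    "easily angered when requests are denied",
    "unwillingness to follow direction",
)


def dasa_iv_total(ratings):
    """Sum the seven items, each coded absent (0) or present (1)."""
    missing = [item for item in DASA_IV_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"unrated items: {missing}")
    for item in DASA_IV_ITEMS:
        if ratings[item] not in (0, 1):
            raise ValueError(f"{item!r} must be coded 0 or 1")
    return sum(ratings[item] for item in DASA_IV_ITEMS)


def dasa_iv_band(total):
    """Map a total score (0 to 7) to the risk bands described in the text."""
    if not 0 <= total <= 7:
        raise ValueError("total must be between 0 and 7")
    if total == 0:
        return "low"
    if total <= 3:
        return "moderate"
    if total >= 6:
        # Assumption: scores of 6 to 7 are reported as "imminent"
        # rather than "high", since the two bands overlap.
        return "imminent"
    return "high"


# Hypothetical rating for one patient over the previous 24 hours.
example = {item: 0 for item in DASA_IV_ITEMS}
example["irritability"] = 1
example["verbal threats"] = 1
example_total = dasa_iv_total(example)
example_band = dasa_iv_band(example_total)
```

In this hypothetical rating, two items are coded present, so the total is 2 and the band is moderate.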
A youth and a women’s version of the DASA were also developed. The DASA-Youth Version (DASA-YV; Daffern & Ogloff, 2014) is comprised of all seven items on the DASA-IV and four additional items (anxious or fearful, low empathy/remorse, significant peer rejection, and outside stressors), which were adapted or developed from a measure of general violence, the Structured Assessment of Violence Risk in Youth (SAVRY; Borum, Bartel, & Forth, 2006), and from the DASA-YV validation study (Daffern & Ogloff, 2014). The DASA: Women’s Version (DASA:WV; Riordan et al., 2019) is comprised of all DASA-IV items, as well as two items from the HCR-20 Female Additional Manual (covert/manipulative behaviour and low self-esteem; de Vogel, de Vries Robbé, van Kalmthout, & Place, 2011), and a rating of ward atmosphere (disturbing/unsettling and/or aggressive tension/threats of violence).
At least three peer-reviewed studies have examined the reliability of the DASA-IV or DASA-YV. These studies have reported excellent interrater reliability (IRR; see Cicchetti & Sparrow, 1981) of the DASA-IV total score among psychiatric nurses working at an adult forensic psychiatric hospital (Krippendorff’s alpha = .92; Chan & Chow, 2014) and supervision staff working at a juvenile psychiatric facility (single measures intraclass correlation coefficient [ICC 1] = .91; Chu, Hoo, Daffern, & Tan, 2012) and good IRR of the DASA-YV total score among a youth inpatient psychiatric sample (Kappa [κ] = .79) (Dutch & Patil, 2018). Although violence risk assessment measures are not intended to measure a single underlying psychological construct, Chan and Chow (2014) found high internal consistency (see Nunnally & Bernstein, 1978), as indicated by Cronbach’s alpha (α) for the DASA-IV total score (α = .86). IRR for DASA-IV risk bins and individual items has not been published. To date, no peer-reviewed studies have examined the reliability of the DASA-WV.
With respect to concurrent validity, the DASA-IV total score has been found to have a large association with the BVC total score (r = .67; Chu, Thomas, Daffern, & Ogloff, 2013; Spearman rank-order coefficient [r s] = .96; Chan & Chow, 2014) and the HCR-20 Clinical scale “score” (r = .73; Chu, Thomas et al., 2013). With respect to incremental validity (here, the extent to which one tool increases the predictive validity beyond that of other tools or factors), several peer-reviewed studies have found the DASA-IV total score outperformed the HCR-20 Clinical “score” (e.g., Chu et al., 2013; Ogloff & Daffern, 2006), as well as structured and unstructured clinical judgment (Griffith et al., 2013). In Chu, Thomas et al.’s (2013) comparison of DASA-IV risk bins and total score, the former did not add incrementally to the latter in the prediction of imminent physical aggression towards others. To our knowledge, no peer-reviewed studies have examined the concurrent validity of the DASA-YV or DASA-WV.
At least 15 peer-reviewed studies have examined predictive validity of the DASA-IV (Barry-Walsh, Daffern, Duncan, & Ogloff, 2009; Chan & Chow, 2014; Chu, Daffern, & Ogloff, 2013; Chu et al., 2012; Chu, Thomas et al., 2013; Daffern & Howells, 2007; Daffern et al., 2009; Dumais, Larue, Michaud, & Goulet, 2012; Griffith, Daffern, & Godber, 2013; Kasinathan et al., 2015; Lantta, Kontio, Daffern, Adams, & Valimaki, 2016; Maguire, Daffern, Bowe, & McKenna, 2017; Nquwaku et al., 2018; Riordan et al., 2019; Vojt, Marshall, & Thomson, 2010). Studies with civil and forensic psychiatric samples have indicated moderate to large predictive accuracy of DASA-IV total scores for any imminent inpatient aggression (r = .33 to .37; AUC = .55 to .97), verbal aggression (r = .30 to .40; AUC = .57 to .86), physical aggression towards objects (AUC = .66 to .82), physical aggression towards other patients (r = .33 to .40; AUC = .55 to .92) and staff (AUC = .48 to .80), and self-harm (AUC = .65 to .92).
Predictive validity of the DASA-YV has been evaluated in at least two peer-reviewed studies (Dutch & Patil, 2018; Kasinathan et al., 2015). These studies have indicated moderate to large predictive accuracy of DASA-YV total scores for any aggression (AUC = .75 to .90), verbal aggression (AUC = .74 to .92), physical aggression towards others (AUC = .72 to .84), and physical aggression against objects (AUC = .75 to .88). Incremental utility of the DASA-IV or DASA-YV has not yet been examined.
At least one peer-reviewed study has examined the predictive validity of the DASA-WV. Riordan and colleagues (2019) reported moderate to large predictive accuracy of DASA-WV total scores for any aggression (AUC = .63 to .76), verbal aggression (AUC = .64 to .76), physical aggression towards others (AUC = .65 to .82), physical aggression towards objects (AUC = .63 and .82), and self-harm (AUC = .66 to .92) in a female forensic psychiatric inpatient sample. However, the DASA-WV did not improve predictive accuracy above the DASA-IV, so the authors did not recommend its use.
The DASA-IV has been translated into Finnish (Lantta, Daffern, Kontio, & Valimaki, 2015) and French (Dumais et al., 2012) and currently is used in inpatient settings in at least eight countries across four continents (Barry-Walsh et al., 2009; Chan & Chow, 2014; Chu et al., 2012; Chu, Thomas et al., 2013; Daffern et al., 2009; Dumais et al., 2012; Griffith et al., 2013; Kasinathan et al., 2015; Lantta et al., 2015; Maguire et al., 2017; Vojt et al., 2010). Studies examining the perceived clinical utility of the DASA-IV have reported mixed results. Dumais and colleagues (2012) surveyed attitudes of psychiatric nurses towards the DASA-IV following its implementation at a civil psychiatric hospital in Canada. Most nurses (75.0% to 81.3%) had a positive view of the measure and perceived it as clinically relevant and helpful in preventing violence in the hospital. In contrast, Daffern and colleagues (2009) found that 12 of 16 psychiatric nurses who responded to the survey reported that administering the DASA-IV among a Dangerous and Severe Personality Disorder (DSPD) forensic psychiatric sample in England was not helpful. The main concern expressed was that the DASA-IV was unable to monitor rapid changes (i.e., over seconds, minutes, or hours) in affect and behaviour specific to DSPD patients. However, the authors noted that these findings may reflect a lack of full integration of the DASA-IV into clinical practice and insufficient training with some staff. Finally, Lantta and colleagues (2015) surveyed attitudes following implementation of the DASA-IV in mental health units in Finland. Nurses described the DASA-IV as easy and quick to administer and helpful in treatment monitoring and facilitating communication between staff, but difficult to complete.
Dutch and Patil (2018) surveyed attitudes of youth psychiatric nurses following the use of the DASA-YV. Most nurses (80%) reported that the DASA-YV was quick, easy to use, and applicable to all patients. They also reported that the DASA-YV helped them better observe patient behaviour, and that it predicted aggressive incidents better than intuition alone. All nurses felt that the tool helped them record instances of aggression and that it was useful as a daily tool.
The V-RISK-10 (Hartvig et al., 2007) is an SPJ measure developed to assist in the evaluation of risk for inpatient and outpatient violence among adult civil psychiatric patients. It was designed to screen patients for referral for a more comprehensive risk assessment (e.g., with the HCR-20) and identify individuals in need of immediate risk management efforts. It is intended for use by psychologists, psychiatrists, or general practitioners.
The V-RISK-10 was developed by empirically testing the association between violence and a 33-item measure, the Preliminary Scheme (Hartvig, Alfarnes, Ostberg, Skjonberg, & Moger, 2006) over a 1-year period among a sample of civil psychiatric patients in Norway. The Preliminary Scheme comprises 19 items from the HCR-20, Version 2 (Webster et al., 1997), six items from the BVC (Almvik et al., 2000), and eight risk factors derived from the empirical and professional literatures (Hartvig et al., 2006). Based on findings of this research (Hartvig et al., 2006), the V-RISK-10 was developed by identifying the 10 items with the best predictive validity (Hartvig et al., 2011). Of these 10 items, four capture both past and present functioning (e.g., previous and/or current substance use), four tap present functioning only (e.g., lack of empathy), and two address anticipated future functioning (e.g., exposure to and coping with future stressors). In the final administration step, evaluators select one of three options regarding next steps: no further detailed violence risk assessment, more detailed violence risk assessment, or implementation of preventive measures.
To date, at least three peer-reviewed studies have examined the IRR of the V-RISK-10 when used by psychologists or physicians in their clinical-forensic practice. All found fair to good support, with ICC 1 values ranging from .35 to .87 and average measures intraclass correlation coefficient (ICC 2) values ranging from .77 to .89 for the total score (Bjørkly, Hartvig, Heggen, Brauer, & Moger, 2009; Roaldset, Hartvig, & Bjørkly, 2011; Yao, Li, Arthur, Hu, & Cheng, 2012). Adequate rater reliability also has been reported for the SRR, with an ICC 1 of .72 and ICC 2 of .85 (Bjørkly et al., 2009). A wider range of reliability coefficients has been reported at the item level (ICC 1 = .06 to .80 and ICC 2 = .29 to .96; Bjørkly et al., 2009; Yao et al., 2012).
To our knowledge, no peer-reviewed studies have examined the concurrent or incremental validity of the V-RISK-10. High levels of predictive validity of the V-RISK-10 have been demonstrated in at least eight peer-reviewed studies, with no differences in predictive accuracy as a function of gender (Eriksen et al., 2016; Eriksen, Færden, Lockertsen, Bjørkly, & Roaldset, 2018). With respect to inpatient violence, moderate to large effect sizes have been observed for any violence (AUC = .79 to .85), violent threats (AUC = .81), and physical violence (AUC = .89). Similarly, moderate to large effects for violence have been reported for patients discharged to the community (AUC = .69 to .80; Eriksen et al., 2016, 2018; Hartvig, Roaldset, Moger, Østberg, & Bjørkly, 2011; Roaldset et al., 2011; Roaldset, Hartvig, Linaker, & Bjørkly, 2012; Yao et al., 2012; Yao, Li, Arthur, Hu, & Cheng, 2014).
The V-RISK-10 has been translated into at least three languages (English, Danish, and Mandarin; Nielsen et al., 2015; Singh et al., 2014; Yao et al., 2012) and is used internationally on five continents (Singh et al., 2014), including at least seven countries. Results from an international survey of psychology, psychiatry, and nursing professionals indicated that V-RISK-10 users perceive it as helpful to monitor risk, conduct risk assessments, and develop case management plans (Singh et al., 2014).
The DVSI (Williams & Houghton, 2004; Williams & Grant, 2006) is an actuarial measure designed to screen for risk of intimate partner violence (IPV) to indicate whether a more comprehensive risk assessment (e.g., the Spousal Assault Risk Assessment [SARA]; Kropp & Hart, 2015) should be conducted or whether immediate risk management plans should be put in place (Williams & Houghton, 2004). The development of the DVSI was prompted by the need to increase the speed with which IPV cases were processed. As such, the information required to complete the DVSI can be drawn from official (e.g., court and probation) records and offender management databases (Williams & Houghton, 2004).
The initial version of the DVSI (Williams & Houghton, 2004) was developed through analysis of local data to identify common variables associated with IPV, review of the empirical literature, and consultation with police and other professionals (e.g., judges, lawyers, victim service workers). The resulting 12 items were subsequently found to be statistically associated with IPV recidivism in a validation sample of 1,465 male offenders arrested for IPV offences committed against female partners. DVSI items pertain to the criminal history (e.g., history of IPV and non-IPV offenses) and social history (e.g., employment status, recent separation) of the offender, and whether weapons or children were present during the index IPV offense. Each item is scored from 0 to 2 or 0 to 3, depending on the presence and severity of the item. Total scores are calculated by summing item scores, with higher total scores indicating “the higher the risk for reoffending, noncompliance with court, and probation orders, and thus, the higher the risk to victims” (Williams & Houghton, 2004, p. 441). Moreover, a higher score is interpreted as an indication of the need for a more thorough IPV assessment.
In 2006, an 11-item version, the DVSI-Revised (DVSI-R; Williams & Grant, 2006), was developed by re-wording or removing redundant items. In addition, in response to users’ feedback about wanting the option to exercise professional discretion, two SRRs were introduced for users to judge imminent risk (i.e., within the next 6 months) of violence toward (1) the victim and (2) other persons known to the victim or perpetrator.
Reliability of the DVSI has been evaluated in at least four peer-reviewed studies. These studies reported acceptable internal consistency for the total score for the DVSI (α = .71; Williams & Houghton, 2004) and DVSI-R (α = .73 to .75; Stansfield & Williams, 2014; Williams, 2012; Williams & Stansfield, 2017) as well as acceptable item-total scale correlations (r = .24 to .72; Williams, 2012). IRR of the DVSI or DVSI-R has not been reported.
Support for the concurrent validity of the DVSI total score has been demonstrated through moderate to large associations with other measures of IPV risk, such as the SARA total “score” (r = .53 to .54; Hilton, Harris, Rice, Houghton, & Eke, 2008; Williams & Houghton, 2004) and SRR (r = .57; Williams & Houghton, 2004); the Domestic Violence Risk Appraisal Guide (r = .50; Hilton et al., 2008); and the Ontario Domestic Assault Risk Assessment (r = .52; Hilton et al., 2008). Its association with measures of risk for general violence and psychopathy is somewhat smaller: Level of Service Inventory-Revised (Andrews & Bonta, 1995) total score (r = .17; Williams & Houghton, 2004); Violence Risk Appraisal Guide (VRAG; Quinsey, Harris, Rice, & Cormier, 2006) score (r = .31; Hilton et al., 2008); and Psychopathy Checklist-Revised (Hare, 2003) total score (r = .34; Hilton et al., 2008).
Predictive validity of the DVSI or DVSI-R has been examined in at least eight peer-reviewed studies. The initial version of the DVSI showed small to moderate predictive utility for the presence of any IPV or family violence rearrests (AUC = .61 to .65) and the total number of such rearrests (r = .18 to .21; Williams & Houghton, 2004). When IPV was identified using victim self-report data, the DVSI had small to moderate effect sizes for severe IPV (AUC = .60 to .68), and low to no predictive accuracy for any IPV (AUC = .50 to .60) and less severe IPV (AUC = .49 to .56; Campbell et al., 2005; Williams & Houghton, 2004).
The DVSI-R total score also has been found to have small to moderate utility in predicting the occurrence of IPV or family violence offenses (AUC = .61 to .65) and the total number of such offenses (r = .17 to .24), violations of protective or court orders (AUC = .67 to .72), offense severity (r = .18), and degree of victim injury (r = .19) (Gerstenberger & Williams, 2012; Hilton et al., 2008; Stansfield & Williams, 2014; Williams & Grant, 2006; Williams, 2012). Predictive accuracy was improved when the DVSI-R total score was combined with additional perpetrator, victim, and clinical variables (AUC = .84; Williams & Grant, 2006). In the three peer-reviewed studies that examined the predictive accuracy of DVSI-R SRRs, DVSI-R SRRs were predictive of any IPV recidivism (Odds Ratio = 1.69), imminent risk of violence towards the victim (AUC = .64 to .66), and imminent risk to others (AUC = .61 to .66) (Williams, 2012; Williams & Grant, 2006; Williams & Stansfield, 2017).
Although the DVSI-R total score has been found to add incrementally to the prediction of IPV above and beyond perpetrator demographic and offence characteristics (Gerstenberger & Williams, 2012; Williams & Grant, 2006), findings with respect to the incremental utility of DVSI-R SRRs are mixed. One peer-reviewed study found that the DVSI-R SRR for imminent risk to other persons known to the victim or perpetrator, but not the SRR for imminent risk towards victims, added incrementally to perpetrator characteristics in the prediction of IPV (Williams & Grant, 2006). In another study, neither imminent risk towards victims nor imminent risk to others was incrementally predictive beyond DVSI-R total scores (Williams, 2012).
The DVSI/DVSI-R is used in several U.S. jurisdictions to determine the suitable pretrial and disposition options for IPV offenders (Williams, 2012). Given its inclusion of SRRs, the DVSI-R has yielded more positive user feedback among field staff than the original DVSI (Williams & Grant, 2006).
The DRAOR (Serin, 2007; Serin et al., 2012) is a clinical rating scale designed to assist in the assessment and management of general and violent recidivism among male and female offenders aged 18 years or older under community supervision. It was developed based on two well-validated models of offender management: the Personal, Interpersonal, Community-Reinforcement perspective and the Risk-Need-Responsivity framework (Andrews & Bonta, 2010). The DRAOR is intended for use by probation and parole officers.
The DRAOR comprises 19 items divided into three subscales pertaining to dynamic risk factors that can change gradually (Stable, e.g., attitudes towards authority), dynamic risk factors that can change rapidly (Acute, e.g., opportunity for crime), and factors that have the potential to mitigate reoffence risk (Protective, e.g., social support). Items were selected following a review of the scientific literature on recidivism (Serin, 2007). Each Stable and Acute item is rated as no problem, slight problem, or definite problem, whereas each Protective item is rated as not protective, slight/possible asset, or definite asset using information obtained from interviews with the offender and collateral contacts (e.g., family, treatment providers) or other external sources (e.g., police intelligence activity). Evaluators identify risk scenarios and indicate their level of concern for imminent reoffending or violation of supervision conditions on a 6-point Likert-type scale from Not Concerned (1) to Very Concerned (6). Evaluators also indicate whether they intend to modify the frequency of supervision (increase, maintain, or decrease) based on the aforementioned level of concern. For research purposes, a total score may be calculated by summing scores of the Acute and Stable scales and subtracting the Protective scale score (Serin et al., 2016). It is recommended that DRAOR reassessments occur at each supervision contact (i.e., at least once a month) to ensure that any changes in the offender’s circumstances are captured (Serin et al., 2012).
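The research-only total score described above can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: the text gives only verbal anchors for the three rating levels, so the numeric coding of 0, 1, and 2 used here is an assumption, as is the function name.

```python
# Illustrative sketch only. Assumed coding (not stated in the text):
#   Stable/Acute: no problem = 0, slight problem = 1, definite problem = 2
#   Protective:   not protective = 0, slight/possible asset = 1, definite asset = 2
def draor_total(stable, acute, protective):
    """Return (Stable sum + Acute sum) minus Protective sum."""
    for name, scores in (("Stable", stable), ("Acute", acute),
                         ("Protective", protective)):
        for score in scores:
            if score not in (0, 1, 2):
                raise ValueError(f"{name} ratings must be 0, 1, or 2")
    return sum(stable) + sum(acute) - sum(protective)


# Hypothetical, shortened rating lists used purely to show the arithmetic.
example_total = draor_total(stable=[2, 1, 0], acute=[1, 1], protective=[2, 2])
```

With these hypothetical ratings, the Stable and Acute sums are 3 and 2, the Protective sum is 4, and the total is 1. Higher Protective ratings lower the total, which is why the total can in principle be negative.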
IRR and internal consistency of the DRAOR have not been reported in peer-reviewed research. With respect to convergent validity, Yesberg and Polaschek (2015) reported the DRAOR total score had a moderate association (r = .30) with a measure of offenders’ preparedness for release, the Release Proposal Feasibility Assessment Revised (Wilson, 2002), and small associations with a measure of criminal recidivism risk, the Violence Risk Scale (Wong & Gordon, 2002; r = .16 and .25 with Violence Risk Scale Static and Dynamic subscale scores, respectively).
Predictive utility of the DRAOR when used by probation officers has been examined in at least four peer-reviewed studies. Among low- to moderate- (Serin, Chadwick, & Lloyd, 2016) and high-risk (Yesberg & Polaschek, 2015) offenders, the DRAOR total score was a small to moderate predictor of general (AUC = .62 to .70) and violent (AUC = .60) recidivism, a small to large predictor of supervision violations (AUC = .55 to .72), and a small predictor of reimprisonment (AUC = .62). Similar results also have been obtained for Stable (AUC = .58 to .67), Acute (AUC = .55 to .70), and Protective (AUC = .60 to .70) subscale scores. Although some studies have provided weak support of the predictive utility of the DRAOR, modest AUC values may reflect improved case management of high-risk offenders, rather than a limitation of the predictive utility of the tool (Yesberg & Polaschek, 2015).
Consistent with claims that the DRAOR is a “gender-neutral” and dynamic risk assessment measure, it has been found to be predictive of recidivism irrespective of examinee gender (Yesberg, Scanlan, Hanby, Serin, & Polaschek, 2015; Scanlan, Yesberg, Fortune, & Polaschek, 2020) and sensitive to fluctuations in risk (Polaschek, Yesberg, & Chauhan, 2018; Serin, Chadwick et al., 2016). For instance, Serin, Gobeil et al. (2016) reported that over an average follow-up period of 65.6 days, the DRAOR scores of 85% of the study participants meaningfully changed.
With respect to incremental utility of the DRAOR, findings have been mixed. Yesberg and colleagues (2015) found that the DRAOR incrementally predicted reconvictions among men above an actuarial measure of risk, the Risk of Re-Conviction × Risk of Re-Imprisonment Model (RoC*RoI; Bakker, Riley, & O’Malley, 1999). In contrast, Yesberg and Polaschek (2015) found that the DRAOR added incrementally to the RoC*RoI for women, but not for men.
The DRAOR has been implemented in New Zealand (Yesberg & Polaschek, 2015) and in some U.S. jurisdictions (Serin, Chadwick et al., 2016). In pilot testing in New Zealand, probation officers described the DRAOR as useful, user-friendly, easy to administer, and helpful in structuring their interactions with offenders (Yesberg & Polaschek, 2015).
The JSORRAT-II (Epperson et al., 2006) is an actuarial measure designed to assist in the evaluation of risk of sexual violence among male juvenile offenders aged 12 to 17 years old who have committed a sexual offence. It is the only known actuarial risk assessment measure designed specifically for adolescent sexual offenders and is intended for use by mental health and criminal justice professionals.
The JSORRAT-II comprises 12 items scored following review of file information pertaining to the youth’s sexual and nonsexual offence history (e.g., number of adjudications for sexual offences), treatment history (i.e., completion of sex offender treatment), school history (e.g., school discipline problems), and victimization history (e.g., number of sexual abuse incidents in which the juvenile was a victim). Items were selected by identifying significant predictors of sexual reoffending among a sample of male youth adjudicated for a sex offence (Epperson et al., 2006). Five JSORRAT-II items are scored as 0 or 1 and the remaining items are scored from 0 to 2 or 0 to 3 to indicate different levels of severity for the particular item. An item is scored as 0 if there is insufficient information to rate it. Total scores of 0 to 2 classify a youth as low risk to sexually reoffend, 3 to 4 as moderately low risk, 5 to 7 as moderate risk, 8 to 11 as moderately high risk, and 12 and above as high risk. JSORRAT-II assessments expire when the evaluee turns 18 years old.
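The classification rule described above can be sketched briefly, reading each risk band as an inclusive range of total scores. The sketch is illustrative rather than an official scoring aid; the function names are hypothetical, and representing an unrateable item as `None` is an assumption made here for convenience.

```python
# Illustrative sketch only; names are hypothetical, not from the JSORRAT-II manual.
def jsorrat_ii_total(item_scores):
    """Sum 12 item scores; an item with insufficient information (None) counts as 0."""
    if len(item_scores) != 12:
        raise ValueError("the JSORRAT-II has 12 items")
    return sum(0 if score is None else score for score in item_scores)


def jsorrat_ii_band(total):
    """Map a total score to the five risk bands described in the text."""
    if total < 0:
        raise ValueError("total cannot be negative")
    if total <= 2:
        return "low"
    if total <= 4:
        return "moderately low"
    if total <= 7:
        return "moderate"
    if total <= 11:
        return "moderately high"
    return "high"


# Hypothetical file review in which the third item could not be rated.
example_scores = [1, 0, None, 2, 0, 0, 1, 0, 0, 3, 0, 0]
example_total = jsorrat_ii_total(example_scores)
example_band = jsorrat_ii_band(example_total)
```

In this hypothetical case the rated items sum to 7, which falls in the moderate band. Because unrateable items default to 0, missing file information can only lower the total.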
At least five peer-reviewed studies have found support for the IRR of JSORRAT-II total scores and individual items in research settings (i.e., when rated by research assistants). ICC₁ values for total scores have ranged from .89 to .97 (Epperson & Ralston, 2015; Ralston, Sarkar, Philipp, & Epperson, 2015, 2018; Ralston, Epperson, & Edwards, 2014; Viljoen et al., 2008), whereas individual items have displayed greater variability, with ICC₁ values ranging from .67 to 1.00 (Ralston et al., 2014). In addition, the JSORRAT-II has demonstrated excellent internal consistency (α = .97 to .99; Ralston et al., 2015, 2018).
With respect to convergent validity, the JSORRAT-II total score has demonstrated a moderate correlation with the total score of the Juvenile-Sex Offender Assessment Protocol-II (Prentky & Righthand, 2003; r = .28; Viljoen et al., 2008) and a small correlation with the total score on the SAVRY (Borum, Bartel, & Forth, 2006; r = .19).
Predictive validity of the JSORRAT-II has been evaluated in at least eight peer-reviewed studies. In their meta-analytic review of seven JSORRAT-II studies (including four unpublished studies and the JSORRAT-II development sample), Viljoen, Mordell, and Beneteau (2012) reported small weighted effect sizes for sexual reoffending (weighted r = .12; weighted AUC of .64 and .61 when the initial development sample was and was not included in the analysis, respectively). Subsequent to this meta-analysis, several peer-reviewed studies have provided further evidence for the predictive utility of the JSORRAT-II for sexual reoffending (AUC = .65 to .70; Epperson & Ralston, 2015; Ralston et al., 2014, 2015, 2018; but see also Rasmussen, 2018). To date, no peer-reviewed studies have examined the incremental validity of the JSORRAT-II.
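For readers less familiar with the AUC values reported throughout this chapter, the statistic can be read as the probability that a randomly selected individual who reoffended received a higher score than a randomly selected individual who did not, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. The function name and the scores are hypothetical and chosen purely for illustration; they are not drawn from any study cited here.

```python
def pairwise_auc(recidivist_scores, non_recidivist_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over every (recidivist, non-recidivist) pair, how often the
    recidivist has the higher score; ties contribute one half.
    """
    if not recidivist_scores or not non_recidivist_scores:
        raise ValueError("both groups must contain at least one score")

    concordant = 0.0
    for r in recidivist_scores:
        for n in non_recidivist_scores:
            if r > n:
                concordant += 1.0
            elif r == n:
                concordant += 0.5

    total_pairs = len(recidivist_scores) * len(non_recidivist_scores)
    return concordant / total_pairs


if __name__ == "__main__":
    # Hypothetical total scores, for illustration only.
    reoffended = [6, 9, 4]
    did_not_reoffend = [2, 5, 4, 7]
    print(round(pairwise_auc(reoffended, did_not_reoffend), 2))
```

An AUC of .50 corresponds to chance-level discrimination and 1.00 to perfect separation of the two groups, which is why values in the .60s are typically described as modest.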
The JSORRAT-II authors recommend that its use be limited to jurisdictions in which it has been validated (i.e., California, Georgia, Iowa, Utah; Epperson et al., 2006). As such, the JSORRAT-II currently is used in only a small number of states. To our knowledge, no studies have examined user satisfaction for the JSORRAT-II.
The SAM (Kropp, Hart, & Lyon, 2008) is an SPJ risk assessment measure developed to assess the risk of stalking or criminal harassment among individuals with a known or suspected history of stalking. The SAM defines stalking as any “unwanted and repeated communication, contact, or other conduct that deliberately or recklessly causes people to experience reasonable fear or concern for their safety or the safety of others known to them” (Kropp et al., 2008, p. 1).
The SAM comprises 30 risk factors across three domains pertaining to (1) the pattern and seriousness of the stalking behaviour (Nature of Stalking, e.g., intimidates victims, stalking is persistent); (2) the psychosocial adjustment and background of the perpetrator (Perpetrator, e.g., angry, intimate relationship problems); and (3) the ability of the victim to engage in self-protective behaviours (Victim Vulnerability, e.g., inconsistent behaviour toward perpetrator, employment and financial problems). Each risk factor is coded with respect to the perpetrator’s (1) current or most recent pattern of stalking behaviour and (2) previous pattern of stalking, if applicable. Based on the presence and relevance of each item, evaluators provide several SRRs: (1) case prioritization; (2) risk for continued stalking; (3) risk for serious physical harm; (4) reasonableness of victim’s fear; and (5) immediate action required (Kropp et al., 2008).
At least six peer-reviewed studies have examined the reliability of the SAM. Good to excellent interrater reliability for the SAM total “score” (ICC₁ = .82, ICC₂ = .77 to .90), SRRs (ICC₁ = .39 to .77 and ICC₂ = .44 to .57 for case prioritization, ICC₁ = .66 to .71 and ICC₂ = .83 for future stalking, and ICC₁ = .44 to .50 and ICC₂ = .61 for serious physical harm), and domain “scores” (ICC₁ = .77 to .91 and ICC₂ = .64 to .87 for Nature of Stalking, ICC₁ = .68 to .92 and ICC₂ = .81 to .87 for Perpetrator, and ICC₁ = .44 to .72 and ICC₂ = .61 to .77 for Victim Vulnerability) have been reported among trained research assistants (Foellmi, Rosenfeld, & Galietta, 2016; Gerbrandij, Rosenfeld, Nijdam-Jones, & Galietta, 2018; Kropp, Hart, Lyon, & Storey, 2011; Shea, McEwan, Strand, & Ogloff, 2018; Storey & Hart, 2011; Storey, Hart, Meloy, & Reavis, 2009). At least one peer-reviewed study has examined the internal consistency of the SAM. Foellmi and colleagues (2016) reported good internal consistency for the SAM total “score” (α = .75) and “scores” on the Nature of Stalking (α = .60) and Perpetrator (α = .70) domains (there was insufficient information to rate the Victim Vulnerability domain).
With respect to concurrent validity, SAM total and subscale “scores” and SRRs have been found to have moderate to strong associations with the Brief Spousal Assault Form for the Evaluation of Risk (B-SAFER; Kropp, Hart, & Belfrage, 2005; r = .74 for total “scores”, r = .36 to .64 for SRRs; Gerbrandij et al., 2018), Psychopathy Checklist: Screening Version (PCL:SV; Hart, Cox, & Hare, 1995; r = .20 to .51), and the Violence Risk Appraisal Guide (VRAG; r = .21 to .25; Kropp et al., 2011; Storey et al., 2009).
The predictive utility of the SAM has been evaluated in at least three peer-reviewed studies. Using Cox proportional hazards analyses (which examine the association between a tool and the imminence or rate of reoffending), Foellmi and colleagues (2016) found that SAM total “scores” were significantly associated with the imminence of stalking recidivism among offenders receiving community treatment (Hazard Ratio [HR] = 1.11, p < .01). Neither SRRs nor the Nature of Stalking and Perpetrator domain “scores” were significantly predictive. In a mixed sample of male intimate partner violence and stalking offenders, Gerbrandij and colleagues (2018) found that SAM total “scores” had weak predictive accuracy for stalking and violent recidivism (AUC = .40 to .60). However, Shea and colleagues (2018) found moderate to large predictive accuracy of total “scores” for stalking recidivism (AUC = .76) and case prioritization (AUC = .69) in a sample of female and male offenders with stalking-related charges who were participating in court-ordered assessment and/or treatment.
With respect to incremental validity, Gerbrandij and colleagues (2018) found that the SAM did not add incremental utility above and beyond the B-SAFER in the prediction of stalking or violent recidivism. In addition, the SAM did not add incremental utility above the PCL:SV in the prediction of violent recidivism. However, the SAM did improve the prediction of stalking recidivism above and beyond the PCL:SV.
The SAM has been translated into at least two languages (Swedish and Norwegian) and is used in at least four countries (Canada, Sweden, England, and Wales; Belfrage & Strand, 2009; Storey & Hart, 2011). In addition, professionals have been trained to use the SAM in Norway, Denmark, Holland, and Switzerland. Studies conducted in mental health and law enforcement settings have reported that the SAM has good perceived clinical or operational utility (Belfrage & Strand, 2009; Kropp et al., 2011; Storey et al., 2009). Moreover, police officers have reported that the SAM was easy to code and helpful in initiating protective action (Belfrage & Strand, 2009).
The SRP (MacKenzie et al., 2009) is an SPJ measure developed to assist in evaluation and management of different domains of risk related to stalking: stalking-related violence, stalking persistence, stalking recurrence, and psychosocial injury to the stalker (i.e., likelihood that persons engaging in stalking will experience significant psychological or social harm due to their behaviour). It is intended for use with individuals with a known or suspected history of stalking, defined as a “pattern of targeted, repeated, and unwanted intrusive acts that can be reasonably expected to cause apprehension, distress, or fear in the victim” (McEwan et al., 2018, p. 1). Due to the technical expertise required for SRP administration (e.g., use of standardized tests of cognitive functioning, personality traits, or interpersonal attachment style), it is meant to be used by mental health professionals or by law enforcement professionals in collaboration with a qualified mental health professional.
The SRP incorporates the motivational typology of stalking developed by Mullen and colleagues (Mullen, Pathe, & Purcell, 2009; Mullen, Pathe, Purcell, & Stuart, 1999). Through use of a decision tree (see Mullen et al., 2006), the evaluator classifies the examinee into one of five types intended to indicate the apparent function of the stalker’s behaviour (i.e., rejected, resentful, intimacy-seeking, incompetent suitor, or predatory) based on the nature of the examinee’s relationship with the victim, the examinee’s motivation for seeking unwanted contact with the victim, and the presence of specific psychopathology. There are 81 dynamic items on the SRP; however, only a subset is coded for each stalker type (McEwan et al., 2018). In other words, although each domain of risk includes “general” risk factors that are assessed for all stalker types, the SRP also recognizes that relevant risk factors and corresponding case management strategies vary by stalker motivation. Except for two items that are rated dichotomously, items are rated on a 3-level scale.
At least one peer-reviewed study has reported psychometric properties of the SRP. Using data from 241 men and women who were charged with stalking or stalking-related offences, and underwent evaluation using the SRP, McEwan and colleagues (2018) found good to excellent IRR for SRRs (ICC₁ = .70 to .90), moderate to excellent IRR for domain scores (ICC₁ = .65 to .98), and excellent classification of stalker types (kappa [κ] = .98) between clinical/forensic psychologists and one of the first two authors of the SRP. With the exception of two items related to the examinee, Refusal to conform to legal directives (ICC₁ = .09) and Sense of entitlement (ICC₁ = .26), there was fair to substantial IRR for individual SRP items (ICC₁ > .61; exact ICC₁ values not reported). Internal consistency also was reported (r = .51 to .82 between domain scores and SRRs).
McEwan and colleagues (2018) reported moderate to large predictive accuracy for long-term (i.e., average 4-year) stalking persistence (AUC = .68), stalking recurrence involving the same (AUC = .68) or different (AUC = .66) victims, and any persistence or recurrence of stalking towards the same victim (AUC = .73). In contrast, predictive accuracy was weak to moderate for short-term (i.e., 6-month) stalking persistence (AUC = .53), stalking recurrence involving the same (AUC = .63) or different (AUC = .75) victims, and any persistence or recurrence of stalking towards the same victim (AUC = .63); however, these findings may reflect the fact that the majority of participants were incarcerated during the 6-month period following their SRP assessment. Due to low base rates, predictive utility of the SRP for stalking-related physical violence or psychosocial injury to the stalker was not examined in the study. To our knowledge, no peer-reviewed studies have examined the concurrent or incremental validity of the SRP.
The SRP has been implemented in forensic psychiatric settings in at least three countries (Australia, England, and Wales; MacKenzie & James, 2011). Although user satisfaction for the SRP has not been evaluated empirically, editorial commentary has described it as being helpful for tailoring case management and treatment plans to specific stalker types (Schwartz-Watts, 2006).
The PATRIARCH (Kropp, Belfrage, & Hart, 2013) is an SPJ measure designed to assist in assessment and management of risk for “any actual, attempted, or threatened physical harm, including forced marriages, with honor as the motive” (Belfrage, Strand, Ekman, & Hasselborg, 2011, p. 21). The PATRIARCH comprises 15 items across three domains: (1) nature of the examinee’s history of honour-based violence, or HBV (e.g., attitudes that support HBV); (2) examinee risk factors (e.g., problems with cultural integration); and (3) victim vulnerability factors (e.g., extreme fear) that were taken from the B-SAFER. Items are rated for two time periods: current and in the past. Evaluators provide SRRs regarding case prioritization, risk of life-threatening violence, and risk of imminent violence.
Internal consistency of the PATRIARCH has been examined in at least two peer-reviewed studies. Belfrage and colleagues (2011) found that four Perpetrator (i.e., escalation, attitudes that support honour violence, high degree of insult, and personal problems) and two Victim Vulnerability (i.e., inconsistent behaviour and unsafe living situation) items correlated significantly with SRRs (r values not reported). Strand (2015) reported a large correlation between acute risk and risk for serious or fatal violence (Kendall’s tau-b [τb] = 0.67). To our knowledge, there are no peer-reviewed studies examining the IRR or validity of the PATRIARCH.
Professionals have been trained to use the PATRIARCH in Belgium, Canada, Norway, Sweden, and Switzerland. In research conducted in Sweden, criminal justice professionals perceived the PATRIARCH as easy to administer, and their use of the tool was associated with increased use of preventative measures for victims (e.g., initiation of no contact orders, protected living) and more comprehensive criminal investigations (Strand, 2015).
With hundreds of tools developed to screen or assess risk for violence, evaluators must be informed consumers. In addition to ensuring the tool(s) selected have the requisite properties to address the referral question or assessment purpose (e.g., screening people in for further assessment, sorting people into groups of putatively lower and higher risk, rank-ordering people in terms of their relative risk for violence, identifying immediate risk management strategies, engaging in case management and reassessment over time), evaluators also should be guided by the quality and quantity of empirical support for the tool. In this chapter, we reviewed several tools that show clinical promise because some research and professional uptake support their use, but for which a well-established empirical foundation has not yet accrued.
Each of the brief and emerging measures reviewed in this chapter has been evaluated in at least one peer-reviewed publication; however, measures vary in terms of the quantity and type of research support they have. Whereas some measures, such as the DASA-IV, DVSI/DVSI-R, JSORRAT-II, and V-RISK-10, have achieved moderate predictive accuracy in at least two studies, others, such as the DRAOR, PATRIARCH, SAM, and SRP, have limited or no evidence on their predictive utility, or findings are mixed (e.g., some studies have reported only modest predictive accuracy). Similarly, while some measures (e.g., the DVSI/DVSI-R, JSORRAT-II, V-RISK-10) have evidence to support their interrater reliability (and internal consistency) and concurrent or incremental validity, others (e.g., the SRP) have limited or no evidence available on these features. Finally, some measures have research reporting their perceived user satisfaction (e.g., DASA-IV, DRAOR, SAM, PATRIARCH), whereas others do not (e.g., JSORRAT-II). Future research efforts should be dedicated towards further investigating the psychometric properties and applicability of these measures.
We recommend that more work be conducted to investigate whether the psychometric properties of these tools vary as a function of demographic characteristics (e.g., gender, race/ethnicity), level of risk, rater type, purpose of the evaluation (e.g., research assistants completing the tool for basic research paradigms versus professionals using the tool to make real-world case decisions in field settings), and geographic region (e.g., in the country where the tool was developed versus elsewhere). Researchers should also examine the degree to which clinical overrides or SRRs add incrementally to numerical scores, as well as the perceived clinical utility of these competing approaches to combining data; whether change on arguably dynamic measures is associated with increases or decreases in subsequent violence outcomes; and the degree to which measures whose primary purpose is to guide case management and intervention assist in case planning or treatment matching. Additionally, in several studies reviewed in this chapter, one or more authors of the risk assessment measure also authored the study investigating its psychometric properties. Research by independent investigators is needed, as empirical evidence of a researcher allegiance effect has been reported (e.g., Singh, Grann, & Fazel, 2013; cf. Guy, 2008). Last, given that these measures show considerable promise to assist in the evaluation of risk, additional efforts should be made to validate and implement them in different jurisdictions, particularly for measures currently in use in a single country or a small number of U.S. states.