Background
Audit and feedback, as a component of quality improvement, aims to improve the uptake of recommended practice by reviewing clinical performance against explicit standards and directing action towards areas not meeting those standards [
1]. Around 60 national clinical audit (NCA) programmes in the UK [
2] inform service improvement across priorities such as diabetes, stroke, and cancer.
Whilst several components that may enhance effectiveness have been identified (e.g. providing feedback more than once), feedback effectiveness remains difficult to predict [
1,
3,
4]. Rigorous evaluation methods, including randomised trials, can establish the relative effectiveness of alternative feedback components. Given there are many potential ways of delivering feedback components (e.g. timing, comparators, display characteristics), with or without co-interventions (e.g. educational meetings, computerised reminders), addressing all would require a prohibitive number of head-to-head trials and would not allow investigation of interactions between interventions and their components. More efficient methods are needed to prioritise which feedback components to study and to take forward to definitive trials.
The multiphase optimization strategy (MOST) offers a methodological approach for building, optimising, and evaluating multicomponent interventions [
5,
6]. MOST comprises three steps:
preparation, laying the groundwork for optimisation by conceptualising and piloting components;
optimisation, conducting trials to identify the most promising single or combined intervention components; and
evaluation, a definitive randomised trial to assess intervention effectiveness. Earlier implementation studies have used a similar approach to define the most promising “active ingredients” for further study [
7,
8], including experiments that systematically vary components of an intervention within a randomised controlled design in a manner that simulates a real situation as closely as possible. Interim endpoints (e.g. behavioural intention, behavioural simulation) are measured rather than actual behaviour or healthcare outcomes. A key mechanism of audit and feedback interventions is that they increase recipients’ intention to enact desired changes in accordance with the audit standards.
We undertook the first and second steps of MOST to develop and investigate the single and combined effects of different feedback components (hereby referred to as “feedback modifications”). We began with a set of 15 theory-informed suggestions for effective feedback, identified through expert interviews, systematic reviews, and our own experience with providing, evaluating, and receiving practice feedback [
3]. These suggestions were grouped under the nature of the desired action (e.g. improving the specificity of recommendations for action), the nature of the data available for feedback (e.g. providing more rapid or multiple feedback), feedback display (e.g. minimising unnecessary cognitive workload for recipients), and delivery of feedback (e.g. addressing credibility of information). We considered and added a further suggestion (incorporating the patient voice) in response to current policy drives to involve patients and members of the public more in health service organisation and delivery [
2].
We used a structured consensus process, involving audit and feedback developers, recipients and researchers, and public representatives, to select the following six feedback modifications as high priority for investigation in an online fractional factorial screening experiment [
9]:
effective comparators,
multimodal feedback,
specific actions,
optional detail,
patient voice, and
cognitive load. The consensus panel guided our selection based on the need for further research, likely feasibility of adoption by national clinical audits, feasibility of delivery within an online experiment, and user acceptability. We then engaged professionals typically involved in developing or targeted by NCAs in three rounds of user-centred design to develop and apply the modifications within an audit report excerpt and design a web portal for the online experiment.
In the second stage of MOST, reported here, we used a randomised fractional factorial screening design to investigate and optimise the most promising single and combined effects of the six modifications on interim outcomes. We chose a factorial design to allow all six modifications and their interactions to be investigated simultaneously. By randomising participants to multiple modification conditions, all participants contribute to the evaluation of each effect, with a reduced sample size compared to an equivalent evaluation using multiple trials or a multi-arm multistage adaptive trial, which would in any case estimate different estimands (i.e. simple effects rather than main and interaction effects) from those of interest here.
Methods
Design overview
We conducted an online, fractional factorial screening experiment. Six modifications to feedback (Table
1; see also Additional file
1) were each operationalised in two versions (ON with the modification applied, OFF without modification) and applied within audit report excerpts for five different NCAs. We randomised participants to receive one of 32 combinations of the modifications, stratified by NCA. After viewing the audit excerpt, participants completed a short questionnaire to generate all study outcomes. This study is reported as per the CONSORT guideline for randomised trials [
10].
Table 1
The six feedback modifications selected in our online fractional factorial screening experiment
A. Effective comparators: Feedback is typically given in the context of a comparator. Select comparators according to their ability to change or reinforce the desired behaviour | ON: the top 25% nationally shown as the comparator | OFF: the mean average shown |
B. Multimodal feedback: Present feedback in different ways to help recipients develop a more memorable mental model of the information presented, allow interaction with the feedback in a way that best suits them, and reinforce memory by repetition | ON: the performance result text accompanied by a graphical display of performance data | OFF: the graphical display absent |
C. Specific actions: Specify desired behaviour to facilitate intentions to perform that behaviour and enhance the likelihood of subsequent action | ON: the feedback suggested specific recommendations for action (i.e. who needs to do what, differently, with or to whom, where and when) | OFF: such recommendations absent |
D. Optional detail: Provide short, actionable messages with optional information available for interested recipients. Feedback credibility can be enhanced if recipients are able to ‘drill down’ to better understand their data | ON: short messages with clickable, expanding links to explanatory detail included | OFF: these links absent |
E. Patient voice: Explicitly link patient experience to audit standards to highlight the importance of providing high-quality care and hence increase motivation to improve practice | ON: a box added including a photograph of a fictional patient, with a quotation describing their experience of care related to the associated audit standard | OFF: these absent |
F. Cognitive load: Minimise the effort required to process information by prioritising key messages, reducing the amount of data presented, improving readability, and reducing visual clutter | ON: distracting detail minimised | OFF: additional general text, not directly related to the audit standard, and feedback on other audit standards added |
Setting and participants
We collaborated with five UK NCAs covering a range of clinical priorities: the Myocardial Ischaemia National Audit Project (MINAP) [
11], National Comparative Audit of Blood Transfusion (NCABT), Paediatric Intensive Care Audit Network (PICANet), and Trauma Audit Research Network (TARN) in secondary care and the National Diabetes Audit (NDA) in primary care. The NCABT, MINAP, and TARN each covered more than 150 National Health Service (NHS) trusts in England alone. PICANet included 34 paediatric intensive care units, and the NDA covered all of England’s approximately 7500 general practices.
Each NCA emailed invitations containing the link to the online experiment to their distribution lists of feedback recipients, i.e. clinicians, managers, nurses, and commissioners; all were eligible to participate. Prior to experiment entry, participants were required to confirm informed consent. On completing the experiment, participants were offered the opportunity to view evidence-based guidance on how to improve their own audit and feedback practice. Participants were also offered a £25 voucher and certificate of completion. Email addresses provided for voucher and certificate requests were not linked to experiment data to preserve anonymity.
After opening to recruitment, we identified a serious breach of study integrity involving inappropriate repeated participant completion of the experiment, linked to a single general practice, in order to claim multiple £25 vouchers for completion. This occurred within a 5-day period subsequently defined as the “contamination period”. We therefore temporarily closed the experiment to enhance security. Additional experiment entry criteria, applied prior to randomisation, required participants to provide NHS or Health and Social Care Northern Ireland email addresses. These were validated to confirm that participants had not previously completed the experiment and to prevent those who had from proceeding; email addresses remained unlinked to experiment data to retain anonymity.
Intervention
Following consent, participants selected the audit relevant to them, before indicating their role and organisation. Participants were then randomised to be presented with one of 32 versions of the excerpt of an audit report comprising different combinations of the six modifications (each ON or OFF). Participants were informed that the excerpt contained simulated but realistic data.
The audit excerpts followed a basic template (Additional file
1). The page was titled with the relevant audit (e.g. “National Diabetes Audit Report”) and a statement that the data were collected in 2018. The excerpt showed an audit standard (e.g. “Patients with type 2 diabetes whose HbA1c level is 58 mmol/mol or above after 6 months with single-drug treatment are offered dual therapy”) and the result (e.g. “Our practice achieved this standard of care for 86% (318/370) of patients”). NCA collaborators advised on the selection of audit standards to help ensure experiment participants perceived them as valid and credible [
12]. The remaining content depended on which combination of the six feedback modifications participants were randomised to (Table
1).
Outcomes
The primary outcome was participant intention to adhere to an audit-specific standard (Table
2). Intention has a known, if limited, ability to predict behaviour that may inform intervention development and early evaluation [
13–
15].
Table 2
NCA standards contributing to experiment outcomes
NCABT | Clinical staff should prescribe tranexamic acid for surgical patients expected to have moderate or more significant blood loss unless contraindicated |
NDA | Patients with type 2 diabetes whose HbA1c level is 58 mmol/mol (7.5%) or above after 6 months with single-drug treatment are offered dual therapy |
MINAP | Adults with non-ST-segment-elevation myocardial infarction or unstable angina who have an intermediate or higher risk of future adverse cardiovascular events are offered coronary angiography (with follow-on percutaneous coronary intervention if indicated) within 72 h of first admission to hospital |
PICANet | Minimise the number of unplanned extubations for paediatric intensive care patients per 1000 days of invasive ventilation |
TARN | Patients who have had urgent 3D imaging for major trauma should have a provisional written radiology report within 60 min of the scan |
We aimed to minimise unintended “loading” of intention responses due to social desirability bias by presenting the target behaviour in the context of other appropriate behaviours, including the introductory statement, “Considering the time and resources available to you and other clinical priorities …”, and by anchoring items over “the next three months”.
The primary outcome measured intention as the mean value across three items beginning with the stem statements, “I intend”, “I want”, and “I expect”. Each item was followed by the appropriate audit standard, e.g. “Over the next three months, I
[intend/want/expect] to ensure that our patients with type 2 diabetes whose HbA1c level is 58mmol/mol or above following 6 months with single-drug treatment are offered dual therapy”. Responses to each item followed a 7-point Likert scale and were scored −3 (completely disagree) through to +3 (completely agree). Previous testing of these stems indicated that they measure the same concept, with Cronbach’s alpha values above 0.9 [
16].
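As an illustration (the study’s analyses used SAS, and the response values below are hypothetical), scoring the primary outcome as the mean of the three stem items and checking their internal consistency with Cronbach’s alpha can be sketched in Python:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses to the three stems ("I intend", "I want", "I expect"),
# each scored -3 (completely disagree) to +3 (completely agree).
responses = np.array([
    [3, 3, 2],
    [2, 2, 2],
    [1, 2, 1],
    [3, 2, 3],
    [-1, 0, -1],
    [2, 3, 2],
])

primary_outcome = responses.mean(axis=1)  # mean of the three items per participant
alpha = cronbach_alpha(responses)         # consistency of the three stems
```

With these illustrative data the three stems are highly consistent (alpha above 0.9), in line with the previous testing cited above.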
Secondary outcomes, mainly assessed on a −3 to +3 Likert scale, comprised the following:
-
Proximal intention evaluating participants’ intention to undertake other actions in response to feedback: bring the audit result to the attention of colleagues, set goals, formulate an action plan, and review personal performance in relation to the audit standard.
-
Comprehension using a single item (“I found the information in this audit report excerpt easy to understand”) adapted from the Website Evaluation Questionnaire [
17].
-
User experience using the mean value of the positively worded two-item lite version of the Usability Metric for User Experience questionnaire [
18–
20]: “This audit report excerpt met my information needs”, and “This online audit report excerpt was easy to use”.
-
User engagement using the length of time (in seconds) spent on and the number of “clicks” within the audit report excerpt.
Data collection
After viewing the audit excerpt, participants completed a 12-item questionnaire displayed within the experiment. We recorded the time spent on the excerpt and the questionnaire and the number of “clicks” on the audit page.
Statistical considerations
Experimental design
A full factorial design would require 2^6 = 64 combinations of the six modifications. We chose a half fraction of the full design, i.e. 32 combinations, to provide a more efficient design for identifying the vital few (significant) factors among the trivial many (screening).
We generated our balanced and orthogonal half fractional factorial design [
5,
21], denoted
\({2}_{VI}^{6-1}\), using the defining relation
I = ABCDEF, design generator
F = ABCDE, and effect coding with each level of the six modifications coded as −1 (OFF) and +1 (ON).
The trade-off of using the half, rather than the full, design is the introduction of aliasing (confounding) among model effects. Under the half fraction with six factors, all effects are aliased: main effects with 5-way interactions, 2-way interactions with 4-way interactions, and 3-way interactions with each other in pairs. Under the sparsity of effects principle [
22], in which a system is usually dominated by main effects and low-order interactions, we assume negligible four-way and higher-order interactions and attribute any effects to the main effects and lower-order interactions.
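The construction above can be sketched in Python (an illustration only; the study itself used SAS). The half fraction is built by taking a full factorial over factors A–E and generating F = ABCDE, after which balance, orthogonality, and the aliasing of main effects with 5-way interactions can be verified directly:

```python
from itertools import product

import numpy as np

# Full factorial over the first five factors A-E; the sixth is generated
# by the design generator F = ABCDE (equivalently, defining relation I = ABCDEF).
runs = []
for a, b, c, d, e in product([-1, 1], repeat=5):
    f = a * b * c * d * e  # design generator F = ABCDE
    runs.append([a, b, c, d, e, f])
design = np.array(runs)  # 32 runs x 6 factors, effect-coded -1/+1

# Balance: each factor has 16 runs at each level, so each column sums to 0.
assert (design.sum(axis=0) == 0).all()

# Orthogonality: any two factor columns have dot product 0.
assert (design.T @ design == 32 * np.eye(6)).all()

# Aliasing: under I = ABCDEF, the main effect of A is indistinguishable
# from the 5-way interaction BCDEF (their columns are identical).
bcdef = design[:, 1] * design[:, 2] * design[:, 3] * design[:, 4] * design[:, 5]
assert (design[:, 0] == bcdef).all()
```

The same check applied to any 2-way interaction column confirms it coincides with the complementary 4-way interaction, matching the alias structure described above.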
Although the full factorial design, with no aliasing, would have allowed estimation of higher order effects, this would have required increased resource to implement and verify all 64 combinations across the five NCAs. Considering existing knowledge and assumptions about interactions, we therefore chose the half fraction to minimise the number of combinations required whilst allowing estimation of all main effects and 2-way interactions of modifications. We considered but discounted use of the quarter fraction as further aliasing would have compromised the interpretation of 2-way interactions. The full design and alias structure can be found in Additional file
1.
Randomisation and masking
Participants were allocated to one of the 32 combinations of the six feedback modifications, with equal allocation using block randomisation, stratified by NCA. The design was replicated in blocks of the 32 combinations, each partitioned into two blocks of 16 using the alias pair ABF = CDE, to ensure modifications were balanced (each modification has the same number of participants at each level) and orthogonal (the sum of the product of any two or more modifications is 0) within each block of 16 participants. A statistician (AWH) prepared the randomisation lists, which were programmed (MA) into the website. All other study personnel remained blind to allocation. Participants were, by nature of the experiment, exposed to the randomised audit excerpts but not informed of their allocation.
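The blocking scheme can be illustrated in the same way. A minimal Python sketch (hypothetical, not the study’s SAS implementation) rebuilds the half fraction and splits it on the sign of the blocking contrast ABF, confirming that each block of 16 remains balanced and orthogonal:

```python
from itertools import product

import numpy as np

# Rebuild the 2^(6-1) design (F = ABCDE) and split it into two blocks of 16
# using the blocking contrast ABF (aliased with CDE under I = ABCDEF).
runs = []
for a, b, c, d, e in product([-1, 1], repeat=5):
    runs.append([a, b, c, d, e, a * b * c * d * e])
design = np.array(runs)

abf = design[:, 0] * design[:, 1] * design[:, 5]
cde = design[:, 2] * design[:, 3] * design[:, 4]
assert (abf == cde).all()  # ABF = CDE: the same split either way

block1 = design[abf == 1]
block2 = design[abf == -1]
assert block1.shape == (16, 6) and block2.shape == (16, 6)

# Each modification remains balanced within each block of 16...
assert (block1.sum(axis=0) == 0).all() and (block2.sum(axis=0) == 0).all()
# ...and the modification columns remain pairwise orthogonal within each block.
assert (block1.T @ block1 == 16 * np.eye(6)).all()
```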
Sample size
Assuming similar effects of each modification across NCA and role, 500 participants across the five NCAs provided 90% power to detect small-to-moderate main effects (0.3 SDs) for each modification using a t-test at the two-sided 5% significance level. Due to the use of effect coding for each modification, all else being equal, there is equal power for detecting interaction effects (of any order, irrespective of aliasing) of the same magnitude as the main effects. Any antagonistic interactions would reduce the magnitude of main effects; this sample size provided approximately 80% and 70% power to detect reduced main effects of 0.25 SDs and 0.22 SDs, respectively. No allowance for loss to follow-up was required, as data were collected at one time point. As this was a screening experiment, the aim was to identify potentially important effects for further evaluation (by ruling out unimportant effects). No allowance was made for multiplicity because false positives are identified through further experimentation. Detection of promising effects was based on the use of Pareto plots, where a split was made between potentially important and unimportant effects.
Recruitment was permitted to exceed the 500 participant target, up to a maximum of 1200 participants (480 participants per NCA, 15 replications of the 32 combinations of modifications), to increase the power to evaluate potential interaction effects within available resources. We originally planned a 4-month recruitment period.
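These power figures can be approximately reproduced with a normal approximation to the two-sample t-test, since an effect-coded main effect compares the 250 participants at each level of a modification. The following Python sketch is illustrative only:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def factorial_main_effect_power(n_total: int, delta_sd: float,
                                z_crit: float = 1.96) -> float:
    """Approximate power for a main effect in an effect-coded two-level
    factorial: equivalent to comparing n/2 vs n/2 participants, using a
    normal approximation to the two-sided 5% t-test."""
    ncp = delta_sd * sqrt(n_total / 4.0)  # non-centrality with n/2 per level
    return phi(ncp - z_crit)

power_030 = factorial_main_effect_power(500, 0.30)  # ~0.92 (0.3 SD effect)
power_025 = factorial_main_effect_power(500, 0.25)  # ~0.80 (attenuated effect)
power_022 = factorial_main_effect_power(500, 0.22)  # ~0.69 (attenuated effect)
```

The three values correspond to the 90%, 80%, and 70% figures quoted in the sample size justification.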
Statistical analysis
Populations
We defined two modified intention-to-treat (ITT) populations. The primary population excluded all participants recruited during the “contamination period” over which repeated participant completion took place. A secondary population excluded participants who completed the experiment questionnaire in < 20 s for sensitivity analyses (based on the distribution of questionnaire completion times, Additional file
2). This cutoff was chosen to provide a more inclusive population than the primary population, aiming to retain valid and unique participants during the contamination period whilst removing those most likely to be duplicative participants, who completed the questionnaire in an unfeasibly short time.
General considerations
Statistical analyses described in a pre-specified plan, approved by the independent statistician from our project steering committee, were conducted in SAS version 9.4 (SAS Institute Inc., Cary, NC). An overall two-sided 5% significance level was used unless otherwise stated.
Analytical approach
To identify and screen for potentially active modifications, we included the six experimental modifications and covariates as independent variables in multivariable linear regression models (using maximum likelihood estimation) with dependent variables for the primary outcome of intention and secondary outcomes of proximal intention, comprehension, and user experience. We used summary statistics to explore the secondary outcome of user engagement.
The pre-specified covariates were as follows:
-
NCA: MINAP, NCABT, NDA, PICANet, and TARN. The NCA with the largest number of randomised participants (NDA) formed the reference category.
-
Randomised design block: block 1 and block 2, using effect (−1, +1) coding.
-
Role: Clinical (allied health professional, fully trained doctor, nurse or nurse specialist, training doctor) and non-clinical (manager, audit and administrative staff). Clinical roles formed the reference category.
We assumed a continuous distribution for all outcomes. We explored the distribution of outcomes using descriptive statistics and graphical displays, and used model diagnostics to check the validity of the statistical modelling. Although outcomes were collected on a 7-point Likert scale, model diagnostics from the linear models were satisfactory, and this approach was considered more appropriate than the alternatives of dichotomising the outcome (with loss of power) or modelling the data using ordinal regression (with increased complexity).
We used effect coding for each modification to ensure that parameter estimates for modifications and their interactions provided main effects (rather than simple effects); that is, the effect averaged across all combinations of levels of the other modifications.
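The distinction between main and simple effects under effect coding can be demonstrated with a small hypothetical 2x2 example: with −1/+1 coding, the fitted coefficient for a factor is half its main effect, i.e. half the difference in mean outcome between its levels averaged over the levels of the other factor (whereas with 0/1 dummy coding it would be the simple effect at the reference level):

```python
import numpy as np

# Hypothetical cell means for a 2x2 factorial with an interaction;
# keys are (A level, B level) under effect coding -1/+1.
means = {(-1, -1): 1.0, (-1, 1): 2.0, (1, -1): 1.5, (1, 1): 3.5}

rows, y = [], []
for a in (-1, 1):
    for b in (-1, 1):
        rows.append([1, a, b, a * b])  # intercept, A, B, A*B (effect coded)
        y.append(means[(a, b)])
X, y = np.array(rows, dtype=float), np.array(y)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, bA, bB, bAB = coef

# bA is half the main effect of A: half the difference in mean outcome
# between A = +1 and A = -1, averaged over both levels of B.
main_effect_A = ((means[(1, -1)] + means[(1, 1)]) / 2
                 - (means[(-1, -1)] + means[(-1, 1)]) / 2)
assert abs(2 * bA - main_effect_A) < 1e-9
```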
Analysis used a multistage approach for each outcome. Stage 1 used available complete data to identify the most promising modifications and interactions. Stage 2 applied the resulting model using the primary population with multiply imputed missing data.
Stage 1 — complete case analysis
We tested whether an “initial” model, including modification main effects and two-way interactions alongside covariates, was adequate using the lack-of-fit test [
23]. Where lack of fit was observed, we included additional interactions in a “full” model using stepwise selection based on a 15% significance level of the
F statistic, respecting the hierarchy of effects and checking consistency with Akaike’s information criterion and the Bayesian information criterion.
Based on the Pareto principle that a small number of parameters account for a large portion of the effect, we identified the most promising parameters from the “full” model by ranking their absolute standardised effect sizes in Pareto plots [
5]. A final “parsimonious” model was then obtained using backward selection (based on 15% significance level of the
F statistic) to simplify the model whilst retaining NCA, randomised design block, and promising parameters identified via the Pareto plot.
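The Pareto ordering amounts to ranking parameters by absolute standardised effect size. A toy Python sketch with entirely hypothetical effect estimates (the study produced these plots in SAS):

```python
# Hypothetical standardised effects (estimate / SE) from a "full" model.
effects = {
    "A (comparators)": -1.8,
    "B (multimodal)": 0.4,
    "C (specific actions)": 2.6,
    "B*F": -2.1,
    "D*E": 0.3,
    "F (cognitive load)": 1.2,
}

# Rank by absolute standardised effect, largest first: the Pareto ordering.
ranked = sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, z in ranked:
    bar = "#" * round(abs(z) * 4)  # crude text rendering of the Pareto plot
    print(f"{name:22s} {abs(z):4.1f} {bar}")
```

A split is then made between the leading (potentially important) bars and the trailing (unimportant) ones, with the leading parameters carried into the “parsimonious” model.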
Stage 2 — ITT analysis
We applied stage 1 models to the primary population with multiply imputed missing data using the fully conditional specification predictive mean matching method [
24].
A single missing data model generated 50 imputations across all outcomes using predictors: outcome, NCA, role, NCA*role interaction, modification main effects, and two- and three-way interactions. We applied further interactions, between modification main effects and two-way interactions with NCA and with role, where model convergence allowed. We calculated parameter estimates, associated standard errors, and
p-values using Rubin’s rules [
25].
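Rubin’s rules combine a parameter across imputed datasets by averaging the point estimates and adding the between-imputation variance to the average within-imputation variance. A minimal Python sketch with hypothetical values (the study itself pooled 50 imputations in SAS):

```python
from math import sqrt

def rubin_pool(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns (pooled point estimate, total standard error)."""
    m = len(estimates)
    qbar = sum(estimates) / m                  # pooled point estimate
    w = sum(variances) / m                     # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    total_var = w + (1 + 1 / m) * b            # total variance
    return qbar, sqrt(total_var)

# Hypothetical estimates (and variances) of one modification effect
# from five imputed datasets:
est, se = rubin_pool([0.12, 0.10, 0.15, 0.11, 0.12],
                     [0.04, 0.05, 0.04, 0.05, 0.04])
```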
We compared Pareto plots to ensure the inclusion of appropriate parameters. Where there were differences in parameters meeting the threshold for inclusion, we included parameters identified in either stage in the final “parsimonious” model.
Results present the stage 2 final “parsimonious” models and predicted plots to illustrate the direction and strength of identified main effects and interactions (Additional file
3).
Sensitivity analyses
Sensitivity analysis explored the impact of the inappropriate repeated participant completion by repeating the analysis of the primary outcome using available complete data in the secondary population (excluding participants completing the questionnaire in < 20 s) as compared to the primary population (excluding participants within the contamination period).
Discussion
In an online experiment involving five NCA programmes, none of six feedback modifications independently increased intention to enact audit standards across clinical and non-clinical recipients. However, potentially important synergistic and antagonistic effects were observed when feedback modifications were combined, as well as dominant influences of NCA programme and recipient role. Whilst modification effects were generally small (< 0.1 on a scale of −3 to +3), their combined cumulative effect showed more substantial heterogeneity. Predicted intention for the primary outcome, in clinical participants in the NDA, ranged from 1.22 (95% CI 0.72, 1.72) for the least effective combination including multimodal feedback, optional detail, and reduced cognitive load to 2.40 (95% CI 1.88, 2.93) for the most effective combination including multimodal feedback, specific actions, patient voice, and reduced cognitive load.
Our findings should be considered in the light of Clinical Performance Feedback Intervention Theory (CP-FIT) [
26]. This theory specifies steps in the feedback cycle: choosing standards of clinical performance against which care is measured (goal setting); collection and analysis of clinical performance data (data collection and analysis); communication of the measured clinical performance to health professionals (feedback); reception, comprehension, and acceptance of this by the recipient (interaction, perception, and acceptance, respectively); planned behavioural responses to feedback (intention and behaviour); and changes to patient care (clinical performance improvement). A further step of verification may occur between perception and acceptance where recipients interrogate the data underlying their feedback. CP-FIT proposes that feedback, recipient, and context variables operate via a range of mechanisms (e.g. credibility of feedback, social influence) to determine success or failure of the feedback cycle.
Our six feedback modifications and study outcomes mainly focused on perception and intention, although our modifications also targeted interaction, acceptance, verification, and behaviour to lesser extents. Of the modifications acting alone, only reduced cognitive load had positive effects on perception, whilst effective comparators had negative effects. Other feedback modifications’ effects were conditional on interactions, some of which had intuitive explanations. Providing optional detail and multimodal feedback both entail giving additional information to audit recipients; combining their overlapping functions led to intention being less than the sum of their parts (i.e. an antagonistic interaction). Other interactions were difficult to explain, if not counterintuitive, such as both synergistic and antagonistic interactions between multimodal feedback and cognitive load for different outcomes, NCAs, and roles. Such a range of findings reflects the exploratory nature of this screening experiment, which aimed to detect the most promising signals of effects for further study.
Our findings suggest that the recipient and context variables of CP-FIT, which approximated to role and NCA in our study, have greater influences on feedback effectiveness than single feedback modifications. Participation in the NCABT was associated with lower intention relative to the NDA as was having a non-clinical role, with the exception of NCABT non-clinical participants. These variations may reflect differences in audit organisation and specialty engagement with audit programmes. For instance, there was a trend towards higher intention in PICANet; this highly specialised audit has a relatively small number of participating sites and may therefore represent a more cohesive, engaged, and responsive network compared with other NCAs. By comparison, the NCABT services a diverse range of topics and clinical settings. A consequence could be differing levels of familiarity with the audit standard selected for each NCA, its credibility, or perceived difficulty of achieving the standard. This is supported by the finding that comprehension and user experience varied less by NCA.
With the exceptions of MINAP and NCABT, intention was generally higher for clinical than managerial, audit, or administrative roles. This is consistent with an earlier modelling experiment evaluating audit and feedback, which found that changes in simulated behaviour were mediated through perceived behavioural control [
7]. In our study, clinicians may have perceived greater control over their ability to implement audit standards than those in other roles.
Strengths and limitations
Previous modelling studies have largely evaluated how feedback in general affects cognitions, but not the effects of individual feedback components [
7,
27,
28]. Our fractional factorial design provides information on the effects of both individual and combined modifications and their interactions, demonstrating a rigorous approach for developing multicomponent interventions. Our analysis populations exceeded our sample size requirement of 500 participants, providing over 90% power to detect small to moderate main and interaction effects for each modification. Our use of effect coding also ensured equal power to detect main and interaction effects of the same size. The five NCAs provided diversity in audit methods, topics, and targeted recipients, thereby increasing confidence that the effects we found across NCAs are relevant to a wider range of audit programmes.
Five main study limitations concern the design and “dose” of the online feedback modifications. First, we selected feedback modifications that were amenable to online experimentation and could be operationalised with reasonable fidelity to the original suggestions for effective feedback. Nevertheless, where anticipated effects were not detected, we must consider whether the online feedback modifications were optimised to deliver a sufficient “dose” to influence participant responses and how these could be strengthened in future online experiments or “real-world” pragmatic trials. One case in point is
multimodal feedback; whilst the Cochrane review indicated that feedback may be more effective when it combines both written and verbal information [
1], we operationalised this modification by adding graphical to textual information. The intervention dose may also have been reduced by limited duration of exposure. We originally estimated a completion time of 20–25 min for the audit excerpt and survey; however, participants spent a much lower median time of just over a minute on audit excerpts and less than 5 min on the experiment overall. Whilst these short durations reflect limited engagement, it is uncertain how long feedback recipients would typically spend examining feedback in actual practice settings; it may be relatively brief given competing demands for attention. Therefore, this aspect of our experiment may have reasonable external validity given that much NCA feedback is delivered electronically.
Second, we set out to design a screening experiment which would be relatively sensitive in detecting changes in proximal outcomes of behaviour change, specifically intention to enact audit standards. We would expect some attenuation of effects on intention when the feedback modifications are applied in “real-world” practice, largely because of numerous post-intentional influences on practice (e.g. time and resource constraints). Furthermore, we had anticipated that outcomes measuring intentions would exhibit skew towards higher intention, partly due to social desirability bias. We attempted to neutralise some of this bias by offering statements which recognised that participants would have competing priorities in normal practice. However, the general skewness of outcomes towards higher intentions imposed a ceiling effect on our ability to detect change.
It is worth considering whether intention is the most appropriate primary outcome for screening experiments of audit and feedback. CP-FIT hypothesises that several factors, both upstream and downstream of intention, affect the ability of feedback to change clinical behaviour [26]. Upstream influences include interaction with feedback data and its perception and verification. For example, we found that reducing cognitive load improved comprehension of data and, when accompanied by multimodal feedback, increased intention to bring audit findings to the attention of colleagues. Future experiments could therefore use a wider range of outcomes reflecting different stages of the whole audit and feedback cycle.
Third, 11.3% of participants (most commonly managers) dropped out of the experiment before completing the questionnaire. This suggests a modest degree of self-selection, whereby those who completed the experiment may have perceived the experiment or the feedback as more relevant to their roles than those who did not.
Fourth, the integrity of the experiment was threatened by a substantial number of duplicative responses. Because the experiment was designed to keep responses anonymous, we could not identify the duplicative responses within the experiment data. We therefore minimised their impact by removing all 603 (49%) responses over the affected period, ensuring that the primary analysis included only genuine, independent responses. This approach inevitably also discarded genuine responses that could not be distinguished, representing a waste of research resources and participant time. We conducted sensitivity analyses excluding only participants who spent less than 20 s completing the experiment questionnaire. This resulted in far fewer exclusions (280, 23%) and a greater proportion of included participants from general practice. Sensitivity analysis of the primary outcome largely supported the modification effects identified in the primary analysis. However, it also identified additional effects not detected in the primary analysis, partly due to the increased sample size but also due to differences between the two groups of included participants.
Finally, any significant effects discussed could be due, wholly or in part, to aliasing (designed confounding) (Additional file 1), although this is considered unlikely based on the sparsity-of-effects principle.
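As a minimal, hypothetical illustration of how aliasing arises in a fractional factorial screening design (this sketch does not reproduce the specific design used in this study), a generated factor’s contrast column coincides exactly with that of a higher-order interaction, so their effects cannot be separated:

```python
from itertools import product

# Hypothetical 2^(4-1) fractional factorial: three factors A, B, C take all
# eight +/-1 combinations, and a fourth factor D is generated as D = A*B*C.
# The main effect of D is then aliased with the ABC interaction, because
# their contrast columns are identical in every run.
runs = [dict(zip("ABC", levels)) for levels in product([-1, 1], repeat=3)]
for run in runs:
    run["D"] = run["A"] * run["B"] * run["C"]

col = lambda name: [r[name] for r in runs]
abc = [a * b * c for a, b, c in zip(col("A"), col("B"), col("C"))]

# D and ABC are indistinguishable in this design: any apparent effect of D
# could equally be attributed to the ABC interaction.
assert col("D") == abc
```

Under the sparsity-of-effects principle, such high-order interactions are assumed negligible, which is why an observed main effect in a screening design is usually attributed to the factor rather than its alias.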
Implications for practice and research
Our screening experiment aimed to identify single and combined feedback modifications worthy of further real-world trial evaluation. We detected promising signals of effects on intentions for certain combinations of feedback modifications, and mixed effects of single and combined modifications across a range of proximal outcomes: intentions, comprehension, and user experience. Although we would be cautious about generalising from an online experiment, we highlight findings with implications for audit programme design and delivery.
We observed potentially important differences between NCAs in intention to enact the audit standards used in the experiment. Further work should explore which aspects of the audit standards, audit organisation, or targeted recipients account for these variations. Our findings suggest a need for national audits to explicitly review the strengths and weaknesses of their whole audit cycles to identify priorities for change. Clinical recipients were more likely to report higher intention than were managerial, administrative, and audit staff. Audit programmes should consider reviewing how their feedback is disseminated to the staff most likely to be able to act on it, particularly clarifying expectations and goals for managers.
The varying interactions between feedback modifications that we observed suggest that audit programmes cannot presume that all proposed enhancements to feedback are additive; they highlight the need to consider explicitly how different features of feedback might fit and act together, whether synergistically or antagonistically. As audit and feedback developers face design decisions about what to include in their feedback interventions, we make specific suggestions based on modification effects supported by good or consistent evidence from the combined analysis of five NCAs:
- Using an effective comparator, which shows recipient performance against the top quarter of performers (intended to reinforce desired behaviour change) rather than against overall mean performance, may reduce how easily participants understand audit results and worsen their overall user experience, unless accompanied by short, actionable messages with progressive disclosure of additional information (optional detail).
- Combining optional detail with a quotation and photograph from a fictional patient describing their experience of care related to the associated audit standard (patient voice) may improve recipient experience.
- Combining multimodal feedback with optional detail may reduce intentions to implement audit standards and to set goals, as well as comprehension and recipient experience.
- Many recipients may invest relatively little time in digesting feedback. Minimising cognitive load, by removing distracting detail and general text not directly related to the audit standard, may improve comprehension and, when combined with multimodal feedback, increase intention to bring audit findings to the attention of colleagues.
Acknowledgements
Thank you to our project steering committee: Paula Whitty, Paul Carder, Chris Dew, Roy Dudley-Southern, Steven Gilmour, Mirek Skrypak, and Laurence Wood. We thank our patient and public involvement panel, comprising Pauline Bland, Allison Chin, Susan Hodgson, Gus Ibegbuna, Chris Pratt, Graham Prestwich, Martin Rathfelder, Kirsty Samuel, and Laurence Wood. We are also grateful for the support of the five national clinical audits that are featured in this research, and particularly, we wish to thank Antoinette Edwards and Fiona Lecky from TARN and John Grant-Casey for the support from the NCABT.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.