New Study Reveals Digital Forensic Examiners Are Susceptible to Bias
by Casey J. Bastian
In a first-of-its-kind study, renowned cognitive bias expert Itiel Dror and co-author Nina Sunde researched the reliability and biasability (the impact of contextual information) of digital forensics (“DF”) experts’ performance. The study, titled “A hierarchy of expert performance (HEP) applied to digital forensics: Reliability and biasability in digital forensics decision making” (“Study”), was published online May 24, 2021, in the professional journal Forensic Science International: Digital Investigation.
The study plainly illustrated that contextual information can bias an examiner’s observations and that the level of consistency between experts’ analyses is typically low. “Although high reliability between [DF] examiners is anticipated, it is important to be aware that consistency does not imply accuracy or validity,” Dror and Sunde write. “Consistency may arise from a variety of reasons. This entails that quality measures should not only be directed toward the tools and technology, but also the human. It is not possible to calibrate a human the same way as a technical instrument, but measures such as blind proficiency testing through the use of fake test cases may provide knowledge about human performance. This knowledge is an essential foundation for transparency and designing effective measures that can minimize errors, or detect them before they cascade further into the investigation process.”
The use of digital evidence in modern criminal investigations is only increasing. Given both the integration of technological devices into everyday activities and new DF capabilities, the assumption that digital evidence is inherently objective and credible can prove dangerous. The reality is that there is a range of possible errors and uncertainties in digital evidence itself. Errors can also be introduced by human factors within the DF process as the evidence is gathered. DF is a fairly new, fast-changing area of forensic science, and it involves a variety of judgments, including decisions requiring interpretation and subjectivity. The constantly changing landscape of technology, in both the evidence and the forensic tools used to conduct DF, creates a “quality challenge.” The consequence is that the quality of the DF outcome, the digital evidence itself, is heavily influenced by human cognitive factors, which can lead to bias and errors.
Understanding the sources of bias and error is vital to the development of effective quality measures and increased transparency in the DF process. The fair administration of justice is threatened by invalid digital evidence, and such evidence can result in wrongful convictions. Accordingly, the purpose of the Study is to address two key questions: (1) Are DF examiners biased by contextual information when making observations, interpretations of observations, or conclusions during the analysis of digital traces? (2) Are DF examiners consistent with one another when making observations, interpretations of observations, or conclusions during the analysis of digital traces?
The participants in the Study consisted of 53 DF examiners, nine women and 44 men, from eight countries: Canada, Denmark, Finland, India, Kenya, Norway, the Netherlands, and the United Kingdom. Most participants had at least a limited criminal investigations background. Each participant was tasked with analyzing an evidence file from Digital Corpora that represented an average workplace operating system. This provided evidence file was used to generate data in a “scenario explaining an incident of confidential information leakage.” The DF examiners were instructed to work alone and were not allowed to consult with any colleagues during the experiment.
HEP is a framework used to understand and quantify expert performance in DF. Three perspectives of decision making are covered through HEP. Per Dror and Sunde, the first perspective is reliability versus biasability; the second is observations versus conclusions; and the third is differences between experts (both among and across experts) versus within experts (same expert and same evidence, but at different times). The two fundamentals of decision making are reliability and biasability. The concept of reliability pertains to “the consistency, reproducibility, or repeatability of decisions, i.e. would the same observations and conclusions be made on the same evidence.” Reliability is not the same as validity. Simply because a DF examiner produces results that are consistent does not mean those results are valid. A reliable tool or repeatable process might produce consistent results that are nonetheless invalid. Reliability is concerned with consistency, while validity is concerned with correctness. Still, the two concepts are integrally connected: reliability is necessary for validity, but it is not sufficient — without reliability there can be no validity, yet reliability alone does not guarantee it. Biasability pertains to the effects of contextual information — that is, task-relevant or task-irrelevant information and other sources of bias impacting observations and conclusions. Dror previously published a study in 2020 that identified eight sources of bias typically observed in contexts like DF.
Task-irrelevant information can include information that a DF examiner does not need to know, such as the fact that the suspect was arrested, that the suspect confessed, or even the suspect’s race. Determining what is task-relevant and task-irrelevant is not always obvious. While task-relevant information is often necessary for DF examiner decision making, it too must be managed and provided at an appropriate time to limit bias within the expert process. Classifying information as relevant or irrelevant can be a difficult task, and the classification varies from case to case.
In the Study, the same evidence file and investigative scenario were given to all 53 DF examiners, but the accompanying contextual information varied: depending on the group, it suggested strong guilt, weak guilt, or innocence. The control group received only the scenario and no contextual information. The DF examiners were required to document relevant findings in their report. Eleven “traces” of varying sophistication were selected within the evidence file for comparison of the proportion of observations. The traces, identified as A1-A11, were not complex, and only basic DF skills would be necessary to find each one. Not one of the DF examiners who participated in the study found all 11 traces, and a mere eight percent of them found between eight and 10 traces.
The Study reveals that DF examiners observe fewer traces when they possess contextual information suggesting the suspect is innocent. The effect works in the other direction as well: examiners observe more traces when the contextual information suggests the suspect is guilty. Regarding reliability, low consistency was found on all examined levels: observation of traces, interpretation of observed traces, and conclusions. This low reliability has detrimental implications for factual and technical evidence reporting, as well as for the reporting of opinion evidence. Dror and Sunde conclude that there is a “serious and urgent need for quality assurance” in digital forensic examinations.
Sources: sciencedirect.com, forensicmag.com