The Evolving Science, Skepticism, and Limited Evidentiary Value of Firearm and Toolmark Identification

by Douglas Ankney

In People v. Kirschke, 53 Cal.App.3d 405 (1975), a firearm and toolmark identification (“FTI”) expert testified for the prosecution “that an evidence bullet had been fired by a particular firearm and that ‘no other weapon in the world was the murder weapon.’” But in post-conviction proceedings, court-appointed experts stated that a positive identification could not be made. The court found that the expert had “negligently presented false demonstrative evidence in support of his ballistics testimony.” Paul C. Giannelli, “Daubert Challenges to Firearms Identifications,” Case Western School of Law (2007) (“Giannelli’s Report”).

Then in 2008, departing from almost a century of judicial precedent, the U.S. District Court for the Southern District of New York limited the “expert” testimony of an FTI analyst by refusing to permit the expert to testify that, “to a reasonable degree of ballistic certainty,” a bullet and shell casings recovered at a crime scene came from firearms linked to the defendant. United States v. Glynn, 578 F. Supp. 2d 567 (S.D.N.Y. 2008). District Judge Jed S. Rakoff observed in Glynn that whatever else ballistics identification analysis could be called, it could not fairly be called science. See id.

The following article will: (1) discuss the history of FTI, (2) explain the general methods and practices of FTI, (3) briefly examine the judiciary’s almost blind acceptance of FTI expert testimony, (4) review the criticisms of FTI that call into question its continued scientific validity, (5) review the leading studies of FTI that were designed to rebuff those criticisms and confirm the scientific validity of FTI, and (6) examine the attempts to make FTI legitimate and reliable.

Part 1: The History of FTI

FTI, like all forensic sciences, did not develop as part of university research using the scientific method to discover truth about the natural world. Forensic science developed under the auspices of law enforcement in both investigating and prosecuting crime.

FTI is commonly referred to as “ballistics,” but this is a misnomer. “Ballistics is the study of the motion of a projectile. Interior ballistics concerns the study of the projectile within the firearm; exterior ballistics concerns the study of the projectile after it leaves the firearm; and terminal (wound) ballistics concerns the study of the effects of the projectile on a target.” Giannelli’s Report.

On the other hand, the Association of Firearm and Toolmark Examiners (“AFTE”) (FTI’s largest professional organization) defines FTI as: “Toolmark Identification is a discipline of forensic science which has as its primary concern to determine if a toolmark was produced by a particular tool. Firearm Identification is a subcategory of toolmark identification, which has as its primary concern to determine if a bullet, cartridge case, or other ammunition component was fired by a particular firearm.”

According to Joe Nickell and John F. Fischer’s Crime Science: Methods of Forensic Detection (1999) (“Crime Science”), the first recorded instance of a person attempting to match a gunshot to the person who fired it occurred in Lancashire, England, in 1794. The paper wadding that had been tamped down around the lead ball and gun powder in the barrel had lodged inside the victim along with the lead ball. The wadding was a piece torn from a street ballad. When the suspect was arrested, the remainder of the ballad was found in his pocket, and the piece from the wound matched it exactly. He was convicted and sentenced to death.

However, it was not until 1835 that a bullet removed from a victim’s body (as opposed to wadding) was linked to a suspect in a crime. Crime Science. Henry Goddard, an assistant to a magistrate, observed a ridge-like blemish on a bullet removed from a murder victim. Upon observing that a bullet mold at the suspect’s home had a corresponding gouge in the same location, Goddard confronted the suspect, who confessed to the murder. Id.

In 1898, German chemist Paul Jeserich was the first to microscopically compare a bullet that had been removed from a murder victim with another bullet test fired from the suspected murder weapon. Id. Jeserich testified that based upon the agreement of the two bullets’ markings, the fatal bullet was fired from the defendant’s gun.

But the “founder” of modern FTI is Calvin Goddard (no apparent relation to Henry Goddard). Crime Science. In 1927, Goddard examined the cartridge cases and bullets that were used as evidence in the convictions of Nicola Sacco and Bartolomeo Vanzetti.

Years earlier, Sacco and Vanzetti had been convicted of murdering two men during a robbery and were sentenced to death. The pair had been arrested days after the crime, and at the time of the arrest, Sacco had a .32 caliber pistol in his pocket. A total of six .32 caliber bullets had been recovered from the murder victims. At trial, the prosecution experts testified one of the fatal bullets was fired from Sacco’s pistol, but the defense experts testified that the bullet could not have been fired from Sacco’s gun. After the duo was convicted, there was a worldwide outcry with protests in Moscow, Paris, London, and major cities in Brazil. The British Labour Party, the German Reichstag, and the French Chamber of Deputies issued calls for their release.

It was widely believed that the two men, who were both impoverished immigrants and anarchists, were victims of the perverted justice of elite capitalists. Massachusetts Governor A.T. Fuller, bowing to international pressure, appointed a commission to review the facts of the case.

To this commission, Goddard volunteered his services. Using a comparison microscope, Goddard conducted comparisons of the cartridge cases and bullets from the crime scene with cartridge cases and a bullet test fired from Sacco’s .32 pistol. Goddard explained that one of the shell casings and one of the fatal bullets recovered from the crime scene contained markings that matched the bullet and shell casing test fired from Sacco’s pistol.

On August 23, 1927, Sacco and Vanzetti were executed in the Massachusetts electric chair. Vanzetti forgave “those who were executing an innocent man.” Crime Science.

Fifty years later, in 1977, then Massachusetts Governor Michael Dukakis, in an act that casts doubt on Goddard’s conclusions, issued a proclamation exonerating Sacco and Vanzetti of any shame or disgrace attached to their names because their trial had been unfair. While Dukakis did not exonerate the men of their guilt (saying their guilt or innocence could not now be known), he stated that there was no question that the judge conducted the trial in a biased manner, and if held today, the verdicts would undoubtedly be reversed by the Massachusetts Supreme Court.

Goddard also used his comparison microscope and methodology in the investigation of the St. Valentine’s Day Massacre in Chicago in 1929. On that date, two men wearing police uniforms entered a garage with guns drawn. Placing members of Bugs Moran’s bootlegging gang along a wall, the two in police uniforms watched as two other men shot Moran’s men to death with .45 caliber Thompson submachine guns. Crime Science. Chicago police called upon Goddard to compare the .45 caliber Thompson submachine guns used by the Chicago Police Department with the cartridge casings and bullets recovered from the crime scene to assure the public that the police did not murder those men. According to Goddard, none of the Police Department’s submachine guns were used in the crime. Ten months later, Goddard compared cartridge casings and bullets test fired from submachine guns discovered in the home of one of Al Capone’s men with the crime-scene evidence and testified at the man’s trial that those guns were used in the massacre.

“Goddard’s success was rewarded by two wealthy businessmen who had served on the coroner’s jury and were so impressed that they financed Goddard’s own Scientific Crime Detection Laboratory at Northwestern University. He later helped the FBI set up its firearms section when its Criminological Laboratory, as it was then known, was opened in 1932. Its very first piece of laboratory equipment was a comparison microscope.” Crime Science.

Part 2: Common FTI Methods and Practices

As will be fully explained in Part 4, in the discipline of FTI, no uniform or standard protocol is observed by all forensic science laboratories or analysts. But the Assistant General Counsel for the FBI’s Forensic Laboratory at Quantico, Virginia, Colonel (Ret.) James R. Agar, II reports the following:

“Contemporary firearms examinations closely follow the methodology Calvin Goddard pioneered nearly a century ago. During its investigation of President Kennedy’s assassination, the Warren Commission described the fundamental principles of firearm identification as follows:

A cartridge, or round of ammunition, is composed of a primer, a cartridge case, powder, and a bullet. The primer, a metal cup containing a detonable mixture, fits into the base of the cartridge case, which is loaded with powder. The bullet, which usually consists of lead or of a lead core encased in a higher strength metal jacket, fits into the neck of the cartridge case. To fire the bullet, the cartridge is placed in the chamber of a firearm, immediately behind the firearm’s barrel. The base of the cartridge rests against a solid support called the breech face or, in the case of a bolt-operated weapon, the bolt face. When the trigger is pulled, a firing pin strikes a swift, hard blow to the primer, detonating the priming mixture. The flames from the resulting explosion ignite the powder, causing a rapid combustion whose force propels the bullet forward through the barrel.

The barrels of modern firearms are ‘rifled,’ that is, several spiral grooves are cut into the barrel from end to end. The purpose of the rifling is to set the bullet spinning around its axis, giving it a stability in flight that it would otherwise lack. The weapons of a given make and model are alike in their rifling characteristics; that is, number of grooves, number of lands (the raised portion of the barrel between the grooves) and twist of the rifling. When a bullet is fired through a barrel, it is engraved with those rifling characteristics.

In addition to rifling characteristics, every weapon bears distinctive microscopic characteristics on its components, including its barrel, firing pin, and breech face. While a weapon’s rifling characteristics are common to all other weapons of its make and model (and sometimes even to weapons of a different make or model), a weapon’s microscopic characteristics are distinctive, and differ from those of every other weapon, regardless of make and model. Such markings are initially caused during manufacture, since the action of manufacturing tools differs microscopically from weapon to weapon, and since tools change microscopically while being operated. As a weapon is used, further distinctive microscopic markings are introduced by the effects of wear, fouling, and cleaning.

When a cartridge is fired, the microscopic characteristics of the weapon’s barrel are engraved into the bullet (along with its rifling characteristics), and the microscopic characteristics of the firing pin and breech face are engraved into the base of the cartridge case. By virtue of these microscopic markings, an expert can frequently match a bullet or cartridge case to the weapon in which it was fired. To make such an identification, the expert compares the suspect bullet or cartridge case under a comparison microscope, side by side with a test bullet or cartridge case which has been fired in the weapon, to determine whether the pattern of the markings in the test and suspect items are sufficiently similar to show that they were fired in the same weapon.” Colonel (Ret.) James R. Agar, II, “The Admissibility of Firearms and Toolmarks Expert Testimony in the Shadow of PCAST,” 74:1 Bay. L. Rev. 93 (2022) (“Agar’s Report”).

The characteristics of the bullets and cartridge cases are typically identified as “class characteristics,” “individual characteristics,” or “subclass characteristics.” Agar’s Report. Class characteristics include the caliber of the bullet or cartridge case and their composition materials; the firing pin impression; general rifling characteristics (the number of lands and whether right- or left-hand twist); breech-face marks; manufacturer identification; headstamp; bullet weight; and priming material. Id. Class characteristics are the design factors that were determined prior to manufacturing. They are useful in eliminating a bullet or cartridge case as being fired from a particular firearm or in restricting the pool of potential firearms that could have fired a bullet or cartridge case (e.g., if a fatal bullet is a .45 caliber, then the .22 caliber pistol in the suspect’s pocket could not have fired it). Id. Based on class characteristics alone, FTI analysts cannot identify an evidentiary cartridge case or bullet as coming from any particular firearm. Id.
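The screening role of class characteristics can be illustrated with a short sketch. It is only an illustration under simplifying assumptions: the three fields, the firearm names, and the exact-equality test are hypothetical and are not drawn from Agar’s Report or any laboratory protocol. Real comparisons involve measurement and examiner judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassCharacteristics:
    """A hypothetical, heavily simplified set of class characteristics."""
    caliber: str      # e.g. ".45"
    land_count: int   # number of lands engraved by the rifling
    twist: str        # "right" or "left"


def screen(evidence, firearms):
    """Return the names of firearms whose class characteristics agree
    with the evidence. Agreement only keeps a firearm in the pool;
    it does not identify it as the source."""
    return [name for name, cc in firearms.items() if cc == evidence]


# All values below are invented for illustration.
fatal_bullet = ClassCharacteristics(".45", 6, "left")
candidates = {
    "pistol_in_pocket": ClassCharacteristics(".22", 6, "right"),
    "pistol_a": ClassCharacteristics(".45", 6, "left"),
    "pistol_b": ClassCharacteristics(".45", 6, "left"),
}

remaining = screen(fatal_bullet, candidates)
print(remaining)  # the .22 is eliminated; two .45 pistols remain
```

The .22 caliber pistol drops out, but two firearms of the same design remain indistinguishable at this level, which is why class characteristics can narrow the pool without singling out one weapon.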

Individual characteristics are marks that FTI analysts consider unique to an individual tool or firearm. These marks include random imperfections and irregularities during manufacturing. These individual characteristics are also caused by use of the firearm, cleaning, and/or corrosion. Id.

Between class characteristics and individual characteristics are the subclass characteristics. These are marks that may be found on a few dozen or even a few hundred firearms of the same make and model that occurred during an irregularity in manufacturing, such as when a machining tool is out of alignment or is chipped. Id.

If the FTI analyst or expert determines that the class characteristics of the bullet(s) in evidence are compatible with the suspected firearm, the expert will fire a test bullet from the firearm into boxes of cotton waste or a recovery tank filled with water. Crime Science. An evidence bullet and test bullet are then examined simultaneously side by side beneath a comparison microscope as follows:

“After the two bullets are mounted, the usual practice is for the examiner to scrutinize the entire surface of the rotating bullets at relatively low magnifications for the purpose of locating on one of the bullets the most prominent group of striations. [Writer’s note: striations are slight or narrow furrows, ridges, stripes, or streaks usually in a parallel arrangement.] Once such marks are located, say on the evidence bullet, that bullet is permitted to remain stationary. Then the examiner rotates the other, or test, bullet in an attempt to find a corresponding area with individual characteristics that match those on the evidence bullet. If what appears to be a match is located, the examiner rotates both bullets simultaneously to determine whether or not similar coincidences exist on other portions of the bullets. Upon finding corresponding marks on other portions, while having the bullets in the same relative positions as when the first matches were observed, the examiner proceeds with further examinations of the same nature at higher magnifications. A careful study of all the detail on both bullets ultimately permits him to conclude that both bullets were or were not fired through the same barrel.” Id.

But “[e]ven if bullets were fired in succession from the same weapon, not all individual characteristics would be identical. There would be some striations caused by powder residues, rust, corrosion and pitting, sand or dirt, and other surface factors or ‘fugitive’ materials which of course are not likely to be duplicated on all bullets through that particular barrel. Moreover, there might be other striations on the bullets which would have no relationship to the interior of the barrel through which they were fired. For instance, there might be marks on metal-cased bullets due to imperfections on the interior of the sizing die used in the fabrication of the bullet. Likewise, fired bullets might contain crimp or burr impressions left there by the mouth of the cartridge case or shell. Obviously, the presence or absence of such marks, whether duplicated or not, must be discounted by the firearms identification technician.” Id.

Shell or cartridge case identification “is based on certain markings left on the case by the firearm’s mechanisms. Most of the markings are found on the base, or closed end, of the case, the end where the primer is located, and they are studied and compared in juxtaposition with the comparison microscope. Firing pin indentations are produced when the firing pin is struck by the hammer and forced into the primer, leaving a crater. Breech face markings are caused by burning gases inside the casing forcing the cartridge back against the weapon’s breech face. Any striations on the breech face are recorded on the shell. In semiautomatic and automatic firearms, both extractor markings and ejector markings are left by the respective mechanisms on the rim of the shell case. Also, in semiautomatic pistols, the magazine may leave marks on the side of the cartridge. And, depending on the firearm, certain additional markings may be imparted to the shell case as the result of some particular mechanism.” Id.

Again, all of the markings on two cartridge cases from two successive firings of one firearm will not match. Possible causes of these differences include the position of the cartridge in the magazine; the difference in the amount of force from the gases of the fired cartridges forcing the cartridge case onto the breech face; and markings on the case from striking the pavement or other objects after ejection from the firearm. “Regardless, the task of the firearms and toolmark examiner is to identify the individual characteristics of microscopic toolmarks apart from class and subclass characteristics and then to assess the extent of agreement in individual characteristics in the two sets of toolmarks to permit the identification of an individual tool or firearm.” Agar’s Report.

It must be emphasized that FTI analysts or examiners do not follow a uniform protocol. However, the Federal Bureau of Investigation (“FBI”) permits FTI examiners to reach one of three conclusions or opinions: (1) Source Identification, (2) Source Exclusion, or (3) Inconclusive. Id. A Source Identification is defined as: “[A]n examiner’s conclusion that two toolmarks originated from the same source. This conclusion is an examiner’s opinion that all observed class characteristics are in agreement and the quality and quantity of corresponding individual characteristics is such that the examiner would not expect to find that same combination of individual characteristics repeated in another source and has found insufficient disagreement of individual characteristics to conclude they originated from different sources.

The basis for a ‘source identification’ conclusion is an examiner’s opinion that the class characteristics and corresponding individual characteristics provide extremely strong support for the proposition that the two toolmarks originated from the same source and extremely weak support for the proposition that the two toolmarks originated from different sources.” Id.

Source Exclusion is defined as “the examiner’s opinion that two bullets or cartridge cases did not come from the same source or firearm.” Id.

Inconclusive permits the FTI examiner to opine that his or her examination or comparison is inconclusive because, “while the observed class characteristics agree, there is insufficient quality and/or quantity of corresponding individual characteristics that the examiner is unable to identify or exclude the two toolmarks as having originated from the same source.” Id.
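The three permitted conclusions can be laid out as a simple decision sketch. This is an illustration only, not the FBI’s procedure: the function and parameter names are hypothetical, the two “sufficiency” inputs stand in for the examiner’s subjective judgment, and treating a class-characteristic mismatch or sufficient individual disagreement as grounds for exclusion is an assumption inferred from the definitions above.

```python
from enum import Enum


class Conclusion(Enum):
    SOURCE_IDENTIFICATION = "source identification"
    SOURCE_EXCLUSION = "source exclusion"
    INCONCLUSIVE = "inconclusive"


def fbi_style_conclusion(class_agree,
                         individual_agreement_sufficient,
                         individual_disagreement_sufficient=False):
    """Map an examiner's judgments onto the three reported categories.

    Every input is the examiner's own opinion; nothing here is measured.
    """
    if not class_agree:
        # Assumption: differing class characteristics rule out a common source.
        return Conclusion.SOURCE_EXCLUSION
    if individual_agreement_sufficient and not individual_disagreement_sufficient:
        return Conclusion.SOURCE_IDENTIFICATION
    if individual_disagreement_sufficient and not individual_agreement_sufficient:
        # Assumption: enough disagreement in individual marks supports exclusion.
        return Conclusion.SOURCE_EXCLUSION
    # Class characteristics agree, but the individual marks settle nothing.
    return Conclusion.INCONCLUSIVE


print(fbi_style_conclusion(True, True))
print(fbi_style_conclusion(True, False))
print(fbi_style_conclusion(False, False))
```

The sketch makes one point plain: the outcome turns entirely on whether the examiner deems the agreement “sufficient,” a threshold the definitions do not quantify.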

In similar fashion, the AFTE’s Theory of Identification is:

“1. The theory of identification as it pertains to toolmarks enables opinions of common origin to be made when the unique surface contours of two toolmarks are in sufficient agreement.

2. This sufficient agreement is related to the significant duplication of random toolmarks as evidenced by the correspondence of a pattern or combination of patterns of surface contours. Significance is determined by the comparative examination of two or more sets of surface contour patterns comprised of individual peaks, ridges, and furrows. Specifically, the relative height or depth, width, curvature, and spatial relationship of the individual peaks, ridges, and furrows within one set of surface contours are defined and compared to the corresponding features in the second set of contours. Agreement is significant when it exceeds the best agreement demonstrated between toolmarks known to have been produced by different tools and is consistent with agreement demonstrated by toolmarks known to have been produced by the same tool. The statement that sufficient agreement exists between two toolmarks means that the likelihood another tool could have made the mark can be considered a practical impossibility.

3. The current interpretation of individualization/identification is subjective in nature, founded on scientific principles and based on the examiner’s training and experience.”

Part 3: The Judiciary’s Historical Acceptance of FTI Evidence

In People v. Berkman, 139 N.E. 91 (Ill. 1923), the Supreme Court of Illinois opined that the positive identification of a bullet was not only impossible but “preposterous.” Yet in just seven years, that same court became one of the first in the United States to admit firearms identification evidence. People v. Fisher, 172 N.E. 743 (Ill. 1930). And the technique rapidly gained widespread judicial acceptance. As District Judge Rakoff wrote: “By way of general background, for many decades ballistics testimony was accepted almost without question in most federal courts in the United States.” Glynn. The vast majority of reported opinions in criminal cases revealed that trial judges rarely excluded expert testimony, and reported appellate opinions revealed that challenges to the admission of expert testimony were seldom successful.

For the first 50 years, the admissibility of expert FTI testimony was governed by Frye v. United States, 293 F. 1013 (D.C. Cir. 1923). Agar’s Report. Admission of expert testimony under Frye required a scientific principle or discovery to be “sufficiently established to have gained general acceptance in the particular field in which it belongs.” Frye. FTI experts encountered little difficulty in passing the Frye test as the courts looked to Goddard’s earlier cases as the blueprint to evaluate the FTI discipline and the testimony of purported FTI experts. Agar’s Report.

In 1975, the Federal Rules of Evidence (“FRE”) were adopted by the federal courts. FRE 702 addresses the admissibility of expert opinion testimony. Amended several times thereafter, the current FRE 702 reads:

“A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:
(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;
(b) the testimony is based on sufficient facts or data;
(c) the testimony is the product of reliable principles and methods; and
(d) the expert has reliably applied the principles and methods to the facts of the case.”

Then in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), a unanimous U.S. Supreme Court determined that FRE 702 superseded the Frye test. Agar’s Report; see also Daubert. The Daubert Court instructed that federal judges have a “gatekeeping” role to ensure that admitted expert evidence is both relevant and reliable. Courts were long experienced with determining whether evidence is relevant, but determining whether evidence is reliable was another matter, even though reliability is the crux of expert witness testimony.

The Daubert opinion listed five non-exhaustive factors to guide courts in determining whether evidence is reliable. Factor (1) considers whether a scientific theory or technique can be (and has been) tested. Factor (2) asks “whether the theory or technique has been subjected to peer review and publication.” Factor (3) considers any “known or potential rate of error.” Factor (4) weighs “the existence and maintenance of standards controlling the technique’s operation.” And factor (5) evaluates the “general acceptance” within the “relevant scientific community.” The Daubert Court cautioned that the focus “must be solely on principles and methodology, not on the conclusions they generate” and reiterated that FRE 702’s reliability determination is a “flexible one.” 

The U.S. Supreme Court later clarified that the federal courts’ “gatekeeping” function “applie[d] not only to testimony based on ‘scientific’ knowledge, but also to testimony based on ‘technical’ and ‘other specialized’ knowledge [and] a trial court may consider one or more of the more specific factors that Daubert mentioned when doing so will help determine that testimony’s reliability.” Kumho Tire Co., Ltd. v. Carmichael, 526 U.S. 137 (1999).

The Kumho Tire Court emphasized that, irrespective of the Daubert factors, “the relevant reliability concerns may focus upon personal knowledge or experience.” Additionally, FRE 702 permits evidence that would have been inadmissible under Frye. Id.

In United States v. Hicks, 389 F.3d 514 (5th Cir. 2004), it was observed that “the matching of spent shell casings to the weapon that fired them has been a recognized method of ballistics testing in this circuit for decades.” Agar’s Report. And in United States v. Williams, 506 F.3d 151 (2d Cir. 2007), the Court found that, even in the absence of an admissibility hearing, the firearm expert’s testimony was admissible based on her education, training, and experience.

“A survey of reported opinions from U.S. district courts and state courts from 2000-2008 reveals many of the courts reviewed the admissibility of firearms identification testimony. One of these early cases was United States v. Santiago [199 F. Supp. 2d 101 (S.D.N.Y. 2002)], where the Southern District of New York opined expert testimony for firearms identification would be admissible even if such expertise was not from the ‘scientific community’ and ‘was based purely on experience.’ No pre-trial admissibility hearing was held in Santiago. Yet the trial court relied, in part, on the implicit endorsement of firearms expert witnesses by the U.S. Supreme Court in United States v. Scheffer [523 U.S. 303 (1998)], where the Court upheld the exclusion of polygraph evidence at a court martial because a polygraph examiner was ‘unlike other expert witnesses who testify about factual matters outside the jurors’ knowledge, such as the analysis of fingerprints, ballistics, or DNA found at a crime scene....’” Agar’s Report.

Because the Daubert Court had cautioned that reliability determinations “must be solely on principles and methodology, not on the conclusions they generate,” the prosecution’s expert FTI witnesses were often able to testify to outlandish conclusions. The experts made “assertions that their matches are certain beyond all doubt, that the error rate of their methodology is ‘zero,’ and other such pretensions.” Glynn.

According to the 2016 report from the President’s Council of Advisors on Science and Technology (“PCAST Report”), trial transcripts reveal the exaggerations of forensic experts testifying that their conclusions are “100% certain;” have “zero,” “essentially zero,” “vanishingly small,” “negligible,” “minimal,” or “microscopic” error rate; or have a chance of error so remote as to be a “practical impossibility.” Such statements are scientifically indefensible since all laboratory tests and feature-comparison analysis have error rates greater than zero. And yet, such unsupportable claims have been made in a myriad of criminal trials.
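A short calculation shows why a claimed error rate of “zero” cannot be supported even by a perfect test record. The sketch below uses the standard one-sided binomial bound for zero observed errors (often approximated as the “rule of three,” about 3 divided by the number of trials). The function name and the trial counts are hypothetical and are not taken from the PCAST Report or any study.

```python
def upper_bound_zero_errors(n_trials, confidence=0.95):
    """Upper confidence bound on the true error rate when zero errors
    were observed in n_trials independent test comparisons.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# Hypothetical record: an examiner with no errors in 100 known-answer tests.
bound_100 = upper_bound_zero_errors(100)
bound_1000 = upper_bound_zero_errors(1000)
print(f"{bound_100:.4f}")   # just under 0.03, i.e. roughly 3 in 100
print(f"{bound_1000:.4f}")  # roughly 3 in 1,000
```

Under these assumptions, a flawless record on 100 tests is still consistent with a true error rate of nearly 3 percent. More testing lowers the bound, but it never reaches zero.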

Undoubtedly, such factually inaccurate testimony contributes to miscarriages of justice. For example, at the trial of Patrick Pursley for a murder that occurred in Rockford, Illinois, on April 2, 1993, the State had no eyewitnesses, no confession, and no DNA or fingerprint evidence linking him to the crime. Undeterred by the lack of any legitimate evidence, the State built its case on the testimony of an FTI expert who testified that the bullets and cartridge casings recovered from the crime scene matched to a 9-millimeter Taurus firearm recovered from Pursley’s home “to the exclusion of all other firearms.” Pursley was convicted but maintained his innocence. In 2007, the Illinois postconviction forensic testing statute was amended, permitting comparisons of the test fired evidence and the crime scene evidence using digital images from the National Integrated Ballistic Information Network (“NIBIN”). [Writer’s note: In the early 1990s, the FBI and the Bureau of Alcohol, Tobacco, Firearms, and Explosives (“ATF”) developed separate databases of images of bullets and cartridge cases that could be queried for potential matches. The National Institute of Standards and Technology (“NIST”) integrated these databases, and it is now the NIBIN maintained by the ATF.]

Ultimately, based on these NIBIN images and re-examination of the trial evidence, two independent experts concluded that neither the cartridge cases nor the bullets from the crime scene came from the firearm recovered from Pursley’s home. In January 2019, Pursley was re-tried and acquitted. He served nearly 24 years in prison for a murder he did not commit due to the factually indefensible and thoroughly exaggerated testimony of a so-called FTI expert.

Part 4: Questioning FTI’s Scientific Validity

In 2008, the National Academy of Sciences National Research Council (“NRC”) published a landmark report titled “Ballistic Imaging.” Agar’s Report. In Ballistic Imaging, the NRC commissioned a review to assess the feasibility, accuracy, and technical capability of a national ballistics database to aid criminal investigations. In concluding that a national ballistics image database was not feasible at that time, the NRC Committee found that the “validity of the fundamental assumptions of uniqueness and reproducibility of firearms-related toolmarks has not yet been fully demonstrated.” Id. Also, the Ballistic Imaging report was careful to “note that the committee does not provide an overall assessment of firearms identification as a discipline nor does it advise on the admissibility of firearms-related toolmarks evidence in legal proceedings: these topics are not within its charge.” Id.

In 2009, the NRC published the congressionally mandated study of forensic science in a report entitled “Strengthening Forensic Science in the United States: A Path Forward” (“NRC Report”). The NRC Report reviewed a broad spectrum of forensic science and criticized several forensic disciplines, including FTI: “With the exception of nuclear DNA analysis, however, no forensic method has been rigorously shown to have the capacity to consistently, and with a high degree of certainty, demonstrate a connection between evidence and a specific individual source.” The NRC Report made clear that “problems, irregularities, and miscarriages of justice [could not] simply be attributed to a handful of rogue analysts or underperforming laboratories but [were] systemic and pervasive.” PCAST Report. With regard to FTI, the NRC Report observed: “Knowing the extent of agreement in marks made by different tools, and the extent of variation in marks made by the same tool, is a challenging task. AFTE standards acknowledge that these decisions involve subjective qualitative judgments by examiners and that the accuracy of examiners’ assessments is highly dependent on their skill and training. In earlier years, toolmark examiners relied on their past casework to provide a foundation for distinguishing between individual, class, and subclass characteristics. More recently, extensive training programs using known samples have expanded the knowledge base of examiners.”

The NRC Report further observed that “[m]uch forensic evidence – including, for example, bite marks and firearm and toolmark identifications – is introduced in criminal trials without any meaningful scientific validation, determination of error rates, or reliability testing to explain the limits of the discipline.” Janis C. Puracal and Aliza B. Kaplan, “Science in the Courtroom: Challenging Faulty Forensics,” The Champion (Jan./Feb. 2020) (“Puracal and Kaplan’s Report”).

The NRC Report revealed “[t]here is no uniformity in the certification of forensic practitioners, or in the accreditation of crime laboratories. Indeed, most jurisdictions do not require forensic practitioners to be certified, and most forensic disciplines have no mandatory certification programs. Moreover, accreditation of crime laboratories is not required in most jurisdictions. Often there are no standard protocols governing forensic practice in a given discipline. And, even when protocols are in place ... they often are vague and not enforced in any meaningful way.” Jessica D. Gabel, “Realizing Reliability in Forensic Science from the Ground Up,” 104 J. Crim. L. & Criminology 283 (2014) (“Gabel’s Report”). The NRC Report concluded that the problems with forensic evidence could “only be addressed by a national commitment to overhaul the current structure that supports the forensic science community in this country.” Gabel’s Report.

Most critical of FTI, the NRC Report explained: “Because not enough is known about the variabilities among individual tools and guns, we are not able to specify how many points of similarity are necessary for a given level of confidence in the result. Sufficient studies have not been done to understand the reliability and repeatability of the methods. The committee agrees that class characteristics are helpful in narrowing the pool of tools that may have left a distinctive mark. Individual patterns from manufacture or from wear might, in some cases, be distinctive enough to suggest one particular source, but additional studies should be performed to make the process of individualization more precise and repeatable.

A fundamental problem with toolmark and firearm analysis is the lack of a precisely defined process. As noted above, AFTE has adopted a theory of identification, but it does not provide a specific protocol. It says that an examiner may offer an opinion that a specific tool or firearm was the source of a specific set of toolmarks or a bullet striation pattern when ‘sufficient agreement’ exists in the pattern of two sets of marks. It defines agreement as significant ‘when it exceeds the best agreement demonstrated between tool marks known to have been produced by different tools and is consistent with the agreement demonstrated by tool marks known to have been produced by the same tool.’ The meaning of ‘exceeds the best agreement’ and ‘consistent with’ are not specified, and the examiner is expected to draw on his or her own experience. This AFTE document, which is the best guidance available for the field of toolmark identification, does not even consider, let alone address, questions regarding variability, reliability, repeatability, or number of correlations needed to achieve a given degree of confidence.”

Since publication of the NRC Report, many studies were undertaken and much literature was produced that exposed additional flaws of FTI and called the discipline’s validity and evidentiary value into question. As is noted in Kenneth S. Broun, “McCormick, Evidence,” 6th ed. (2006) §210: “[A]ny expert giving any opinion on whether the scientific test identifies the defendant as being the person who left the incriminating trace, such as a ... bullet ... necessarily bases this conclusion on an understanding or impression of how similar the items being compared are and how common it is to find items with these similarities. If these beliefs have any basis in fact, it is to be found in the general experience of the criminalists or more exacting statistical studies of these matters.” Giannelli’s Report. FTI falls into the former category since it is based on the experience of the examiners and not on statistical studies. And it is the reliance on this experience that critics question. Id.

Even when marks on two or more casings are the same, it does not mean the casings came from the same gun, and when marks on the casings are different, it does not mean they came from different guns. Id. In one study, 52% of matching striations were observed in samples known to not be from the same firearm, and a maximum of only 86% of matching striations were observed in samples known to be from the same firearm. William A. Tobin and Peter J. Blau, “Hypothesis Testing of the Critical Underlying Premise of Discernible Uniqueness in Firearms-Toolmarks Forensic Practice,” 53 Jurimetrics J. 122 (2013) (“Tobin and Blau”).

Compounding the problem is the fact that there is no objective standard for determining whether a mark is a class characteristic, a subclass characteristic, an individual characteristic, or even an accidental mark. The case of United States v. Green, 405 F. Supp. 2d 104 (D. Mass. 2005), highlights this critical flaw. In Green, the FTI expert testified that shell casings recovered from the crime scene came from a specific .380 caliber Hi Point pistol linked to the defendants and that the “match” was made “to the exclusion of every other firearm in the world.” Yet, when questioned at a Daubert hearing about a marking unique to one cartridge case that was an “upside down checkmark” in appearance, the FTI expert stated he did not know what it was, did not know what caused it, and did not attach any significance to it. The expert testified the mark was accidental in nature, whether caused by the manufacturer, the primer, or scratched prior to being placed in the gun.

When asked how he knew that, the expert answered, “I do not know that.” He testified that his decision to ignore the mark and still call the casings a match was not based on any studies or database but was based solely on his opinion. Yet, remarkably, he apparently believed he was justified in declaring that the gun in question was a “match” to the crime scene shell casings “to the exclusion of every other firearm in the world.”

David L. Faigman, Chancellor and Dean at the University of California, Hastings, College of Law, reports that in one study (identified as Ames II), FTI examiners were tested as to proficiency. Faigman, et al., “The Field of Firearms Forensics Is Flawed,” Scientific American (May 25, 2022). The examiners reported false positives and “inconclusive” 52 percent of the time. Faigman counted “inconclusive” as an error because the examiners knew beforehand that the bullets either did or did not come from the firearm. He compared the Ames II study to a “true/false exam” where “I don’t know” or “inconclusive” is not an option.

Most disturbingly, when the same items were later sent to the same Ames II participants for reevaluation, the examiners reached the same conclusions only two-thirds of the time. Id. When different examiners compared the same bullets, their conclusions agreed with the first examiners’ conclusions less than one-third of the time. Id.

Many critics compare FTI with the now defunct (perhaps “debunked”) forensic practice of comparative bullet lead analysis (“CBLA”). Tobin and Blau. For almost 40 years, courts accepted CBLA and the corresponding experts’ opinions that, since the material composition of a crime-scene bullet matched the material composition of the bullets in a box of bullets recovered from the suspect’s home, the crime-scene bullet must have come from that box.

CBLA experts explained in great detail the sophisticated nature of their analytical instrumentation and their ability to measure compositional constituents to the parts per million. But it was all brought to naught when, in 2005, researchers published studies showing that retail distribution of bullets showed no “uniqueness” of bullet composition from store to store or box to box. Tobin and Blau. The concentrations of indistinguishable product in local and regional areas demonstrated that consumers could not have purchased different product compositions even if they had deliberately attempted to do so. Consequently, even though the material composition of a bullet removed from a murder victim was identical with the composition of bullets recovered from the suspect’s home, it is not relevant evidence because every box of bullets of the same brand within that region has the same composition. Just as FTI examiners assume each firearm leaves discernible and unique marks, the CBLA examiners had assumed (quite errantly) the unique composition of the bullets in each box.

Tobin and Blau sum it up best. Identifying a bullet or cartridge casing as coming from a particular firearm requires discernment of marks unique to that firearm. Discernment of uniqueness requires: “(1) some criteria, indicia, or ‘parameters of detection’ for uniqueness, and (2) rules of application for those indicia to discern ‘same’ from ‘different.’ An exhaustive review of the domain literature reveals no such criteria. Thus, there is no apparent official or scientifically acceptable protocol for distinguishing ‘same’ from ‘different.’”

The criticism and exposure of the flawed nature of FTI and other forensic “sciences” prompted then-President Barack Obama to ask PCAST “whether there are additional steps on the scientific side, beyond those already taken by the Administration in the aftermath of the highly critical 2009 National Research Council report on the state of the forensic sciences, that could help ensure the validity of forensic evidence used in the Nation’s legal system.” The PCAST Report was the published answer to that question.

Significantly, the PCAST Report observed that, per Daubert, the legal standard of admissibility under FRE 702 requires that evidence be based on “reliable principles and methods” and that the expert in each case have “reliably applied the principles and methods.” PCAST coined the term “foundational validity” to mean “the scientific standard” corresponding with FRE 702’s “reliable principles and methods,” and PCAST coined the term “validity as applied” to equate with the expert having “reliably applied the principles and methods.”

Because FTI is a feature-comparison method, it belongs to the discipline of metrology (the science of measurement and its application). PCAST Report. “For a metrological method to be scientifically valid and reliable, the procedures that comprise it must be shown, based on empirical studies, to be repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application.” Id.

PCAST defined “repeatable” as “with known probability, an examiner obtains the same result, when analyzing samples from the same sources.” It defined “reproducible” as “with known probability, different examiners obtain the same result, when analyzing the same samples.” And “accurate” was defined as “with known probabilities, an examiner obtains correct results both (1) for samples from the same sources (true positives) and (2) for samples from different sources (true negatives).” Finally, “reliability” was defined as “repeatability, reproducibility, and accuracy.”
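
As a rough illustration of how the first two definitions might be turned into numbers, the Python sketch below computes simple agreement rates from a handful of invented examiner conclusions. Everything in it is an assumption made for illustration: the function name agreement_rate, the conclusion labels, and the data are hypothetical and are not drawn from the PCAST Report or from any actual study.

```python
def agreement_rate(first_calls, second_calls):
    """Fraction of comparison sets on which two rounds of conclusions agree."""
    if len(first_calls) != len(second_calls) or not first_calls:
        raise ValueError("need two equal-length, non-empty lists")
    matches = sum(a == b for a, b in zip(first_calls, second_calls))
    return matches / len(first_calls)


# Hypothetical conclusions on the same eight comparison sets.
examiner_a_round_1 = ["ID", "ID", "inconclusive", "elimination",
                      "ID", "elimination", "inconclusive", "ID"]
examiner_a_round_2 = ["ID", "inconclusive", "inconclusive", "elimination",
                      "ID", "elimination", "ID", "ID"]
examiner_b_round_1 = ["ID", "ID", "ID", "elimination",
                      "inconclusive", "elimination", "inconclusive", "elimination"]

# Repeatability: the same examiner looks at the same samples twice.
repeatability = agreement_rate(examiner_a_round_1, examiner_a_round_2)

# Reproducibility: a different examiner looks at the same samples.
reproducibility = agreement_rate(examiner_a_round_1, examiner_b_round_1)

print(f"repeatability:   {repeatability:.3f}")
print(f"reproducibility: {reproducibility:.3f}")
```

Note that neither figure says anything about accuracy: two examiners can agree with each other and both be wrong, which is why PCAST treats accuracy as a separate requirement.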

“To meet the scientific criteria of foundational validity, two key elements are required:

(1) a reproducible and consistent procedure for (a) identifying features within evidence samples; (b) comparing features in two samples; and (c) determining, based on the similarity between the features in two samples, whether the samples should be declared to be a proposed identification (“matching rule”);

(2) empirical measurements, from multiple independent studies, of (a) the method’s false positive rate – that is, the probability it declares a proposed identification between samples that come from different sources and (b) the method’s sensitivity – that is, the probability that it declares a proposed identification between samples that actually come from the same source.” Id.
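
The two rates named in element (2) reduce to simple fractions once the ground truth is known. The Python sketch below is a minimal illustration using invented counts; the function names, the counts, and the option to drop “inconclusive” responses from the denominator are assumptions made here for illustration, not a procedure taken from the PCAST Report.

```python
def false_positive_rate(false_ids, eliminations, inconclusives,
                        count_inconclusives=True):
    """Share of different-source comparisons declared an identification.

    If count_inconclusives is False, inconclusive responses are removed
    from the denominator, which raises the reported rate.
    """
    denominator = false_ids + eliminations
    if count_inconclusives:
        denominator += inconclusives
    return false_ids / denominator


def sensitivity(true_ids, false_eliminations, inconclusives):
    """Share of same-source comparisons declared an identification."""
    return true_ids / (true_ids + false_eliminations + inconclusives)


# Hypothetical results for 200 different-source comparisons.
fpr_all = false_positive_rate(false_ids=4, eliminations=146, inconclusives=50)
fpr_conclusive = false_positive_rate(false_ids=4, eliminations=146,
                                     inconclusives=50,
                                     count_inconclusives=False)

# Hypothetical results for 100 same-source comparisons.
sen = sensitivity(true_ids=90, false_eliminations=2, inconclusives=8)

print(f"FPR, all comparisons:        {fpr_all:.4f}")
print(f"FPR, conclusive calls only:  {fpr_conclusive:.4f}")
print(f"Sensitivity:                 {sen:.4f}")
```

The same four false identifications yield different rates depending on how inconclusive responses are handled, which is one reason the treatment of inconclusives matters when error rates are reported.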

The accuracy of a forensic method can only be determined based on appropriate empirical testing. Id.

Feature-comparison methods may be classified as either objective or subjective. Id. Objective methods consist of procedures that are each defined with enough standardized and quantifiable detail that they can be performed either by an automated system or human examiners exercising little or no judgment. Subjective methods include key procedures that involve significant human judgment, e.g., which features to select within a pattern or how to determine whether the features are similar enough to be called a probable match. Id.

Because FTI is a subjective feature-comparison method, “black-box studies” are necessary to assess the method. A black-box study is defined as “an empirical study that assesses a subjective method by having examiners analyze samples and render opinions about the origin or similarity of samples.” Id. A black-box study is basically a proficiency test.

While “foundational validity” means that a method can, in principle, be reliable, “validity as applied” means that “the method has been reliably applied in practice.” Id. The key criteria for validity as applied are:

“(1) The forensic examiner must have been shown to be capable of reliably applying the method and must have actually done so. Demonstrating that an examiner is capable of reliably applying the method is crucial – especially for subjective methods, in which human judgment plays a central role. From a scientific standpoint the ability to apply a method reliably can be demonstrated only through empirical testing that measures how often the expert reaches the correct answer.... Determining whether an examiner has actually reliably applied the method requires that the procedures actually used in the case, the results obtained, and the laboratory notes be made available for scientific review by others.

(2) Assertions about the probability of the observed features occurring by chance must be scientifically valid. (a) The forensic examiner should report the overall false positive rate and sensitivity for the method established in the studies of foundational validity and should demonstrate that the samples used in the foundational studies are relevant to the facts of the case. (b) Where applicable, the examiner should report the random match probability based on the specific features observed in the case. (c) An expert should not make claims or implications that go beyond the empirical evidence and the applications of valid statistical principles to that evidence.” Id.

Perhaps the most critical statement in the PCAST Report regarding FTI is: “Firearms analysts have long stated that their discipline has near-perfect accuracy. In a 2009 article, the chief of the Firearms-Toolmarks Unit of the FBI Laboratory stated that ‘a qualified examiner will rarely, if ever, commit a false-positive error (misidentification),’ citing his review, in an affidavit, of empirical studies that showed virtually no errors. With respect to firearms analysis, the 2009 NRC report concluded that ‘sufficient studies have not been done to understand the reliability and reproducibility of the methods’ – that is, the foundational validity of the field had not been established.”

The PCAST Report acknowledged that beginning around 2001, a number of studies were undertaken in an attempt to estimate the accuracy of FTI examiners’ conclusions. But many of those “studies were not appropriate for assessing scientific validity and estimating reliability because they employed artificial designs that differ in important ways from problems faced in casework.” For example, bullets and casings were in pristine condition, whereas in actual casework these items are often misshapen, smashed, etc. In several studies, the bullets and casings were fired from consecutively manufactured barrels and slides.

Also, many of those empirical studies were “closed set,” i.e., the answer was always present within the test, meaning the examiner could use the process of elimination to arrive at the correct answer. The exact method of these studies is described in Part 5.

But to illustrate, suppose two shots are fired from each of 10 .38 pistols for a total of 20 bullets. The examiner is given 10 bullets – each one fired from a different one of the 10 guns – labeled “A” through “J.” The examiner is given the other 10 labeled “1” through “10.” The test requires the examiner to identify which two bullets came from each firearm. If the examiner is unable to identify any markings on bullet #2 but knows that bullet “F” has not yet been paired, the examiner – via process of elimination – will pair bullets “F” and “2” as coming from the same firearm but will not have arrived at that correct answer through comparison of the marks on the bullets. This defeats the entire point of the study, i.e., to test the validity of the method itself.
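
The process of elimination described above can be sketched in a few lines of Python. This is only a toy model: the pairing of numbered to lettered bullets, the function name pair_bullets, and the assumption that every comparison made from the marks is correct are all hypothetical choices made for illustration.

```python
# Hypothetical ground truth: numbered bullet -> lettered bullet from the same gun.
truth = {1: "C", 2: "F", 3: "A", 4: "J", 5: "B",
         6: "H", 7: "D", 8: "I", 9: "E", 10: "G"}


def pair_bullets(truth, readable):
    """Pair bullets in a closed set.

    readable: numbered bullets whose marks the examiner could actually compare.
    Assumes (for illustration only) that every mark-based pairing is correct.
    Returns (pairs made from marks, pairs made by elimination).
    """
    by_marks = {n: truth[n] for n in readable}
    unpaired_numbers = [n for n in truth if n not in by_marks]
    unpaired_letters = [letter for letter in truth.values()
                        if letter not in by_marks.values()]
    by_elimination = {}
    # With one bullet left on each side, the closed set forces the answer.
    if len(unpaired_numbers) == 1 and len(unpaired_letters) == 1:
        by_elimination[unpaired_numbers[0]] = unpaired_letters[0]
    return by_marks, by_elimination


# The examiner can read the marks on every numbered bullet except #2.
readable = [n for n in truth if n != 2]
by_marks, by_elimination = pair_bullets(truth, readable)

answers = {**by_marks, **by_elimination}
score = sum(truth[n] == letter for n, letter in answers.items())

print("paired from marks:     ", len(by_marks))
print("paired by elimination: ", by_elimination)
print("score:                 ", score, "of", len(truth))
```

The examiner scores a perfect 10 of 10 even though only nine pairings rested on any comparison of marks. In an open-set design, where a matching bullet may not be present at all, that shortcut is unavailable.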

To accurately estimate FTI’s false positive rate and sensitivity rate, the empirical studies must be “black box, open set.” In that vein, the PCAST Committee analyzed four closed-set studies; one partly open-set study (“Miami-Dade Study”); and one open-set study (“Ames Laboratory Study”). Of these, only the Ames Laboratory Study was an appropriately designed study for measuring validity and estimating reliability.

The PCAST Report explained that empirical measurements of the false positive rate (“FPR”) and sensitivity (“SEN”) of forensic comparison methods “must be based on large collections of known and representative samples from each relevant population, so as to reflect how often a given feature or combination of features occurs.”

However, “since empirical measurements are based on a limited number of samples, SEN and FPR cannot be measured exactly, but only estimated. Because of the finite sample size, the maximum likelihood estimates do not tell the whole story. Rather, it is necessary and appropriate to quote confidence bounds within which SEN, and FPR, are likely to lie.” Id.

By convention, a confidence level of 95% is the most widely used. Consequently, when the frequency of false positives in a study is reported as “1.5 percent (upper 95 percent confidence interval 2.2 percent),” it means the FTI examiners participating in the study incorrectly identified a bullet or cartridge case as coming from a particular firearm 1.5% of the time – but since the number of firearms and the number of examiners in the study are but a fraction of the total number in the world, there is a 5% chance the actual frequency could be as high as 2.2%. These two figures, in turn, translate to estimated FPRs of 1 in 66 with an upper bound of 1 in 46 – meaning that based on this study it is estimated that the FTI examiners make a false “match” an average of one time in every 66 cases but there is a five percent chance they could be making false matches one time in every 46 cases.
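
To show where an upper confidence bound of this kind can come from, the Python sketch below computes an exact one-sided (Clopper-Pearson) upper bound on an error rate from a count of errors and a count of trials. The counts are invented, and the choice of the Clopper-Pearson method is an assumption made for illustration; nothing here reproduces the calculations of any particular study.

```python
from math import exp, lgamma, log, log1p


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    total = 0.0
    for i in range(k + 1):
        # Work in log space so large n does not overflow.
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)


def upper_bound(errors, trials, confidence=0.95):
    """One-sided Clopper-Pearson upper bound on the true error rate.

    Finds, by bisection, the rate p at which seeing `errors` or fewer
    errors in `trials` attempts would happen only (1 - confidence) of the time.
    """
    if errors >= trials:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = errors / trials, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(errors, trials, mid) > alpha:
            lo = mid  # observed result still too plausible; bound is higher
        else:
            hi = mid
    return hi


# Hypothetical study: 15 false identifications in 1,000 different-source comparisons.
errors, trials = 15, 1000
point_estimate = errors / trials
bound = upper_bound(errors, trials)

print(f"point estimate: {point_estimate:.4f} (about 1 in {1 / point_estimate:.0f})")
print(f"upper bound:    {bound:.4f} (about 1 in {1 / bound:.0f})")
```

The smaller the study, the wider the gap between the point estimate and the upper bound. Even a study with zero observed errors yields an upper bound above zero, so a clean result in a small test does not establish an error rate of zero.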

Regarding the foundational validity of FTI, the PCAST Committee made these findings: “firearms analysis currently falls short of the criteria for foundational validity, because there is only a single appropriately designed study to measure validity and estimate reliability. The scientific criteria for foundational validity require more than one such study, to demonstrate reproducibility. Whether firearms analysis should be deemed admissible based on current evidence is a decision that belongs to the courts. If firearms analysis is allowed in court, the scientific criteria for validity as applied should be understood to require clearly reporting the error rates seen in appropriately designed black-box studies (estimated at 1 in 66, with a 95 percent confidence limit of 1 in 46, in the one such study to date).” (emphasis supplied)

With respect to validity as applied, PCAST observed: “If firearms analysis is allowed in court, validity as applied would, from a scientific standpoint, require that the expert: (1) has undergone rigorous proficiency testing on a large number of test problems to evaluate his or her capability and performance, and discloses the results of the proficiency testing; and (2) discloses whether, when performing the examination, he or she was aware of any other facts of the case that might influence the conclusion.”

As the flawed nature of FTI became known, some courts began gradually limiting the expert testimony of FTI examiners. In Green, the court refused to allow the prosecution’s expert to testify that the cartridge casings from the crime scenes came from a particular .380 Hi Point pistol “to the exclusion of every other firearm in the world.”

In Glynn, the prosecution’s expert wanted to testify that the evidence bullet and cartridge casings came from firearms linked to the defendant “to a reasonable degree of ballistic certainty.” But the court permitted the expert to state only that it was “more likely than not” that the bullet and casings came from those firearms.

In United States v. Taylor, 663 F. Supp. 2d 1170 (D.N.M. 2009), Judge William Johnson explained that the Government’s FTI expert, ATF expert Ronald G. Nichols, could “give to the jury his expert opinion that there is a match between the .30 caliber rifle recovered from the abandoned house and the bullet believed to have killed [the victim]. However, because of the limitations on the reliability of firearms identification evidence discussed above, Mr. Nichols will not be permitted to testify that his methodology allows him to reach this conclusion as a matter of scientific certainty. Mr. Nichols also will not be allowed to testify that he can conclude that there is a match to the exclusion, either practical or absolute, of all other guns. He may only testify that, in his opinion, the bullet came from the suspected rifle to within a reasonable degree of certainty in the firearms examination field.”

In United States v. Ashburn, 88 F. Supp. 3d 239 (E.D.N.Y. 2015), the Court refused to permit the FTI expert to say he was “100% certain” or that it was a “practical impossibility” that another firearm could have fired the items in evidence or that the identification was to “the exclusion of all other firearms in the world.” However, the Court in Ashburn permitted the expert to testify that his conclusions were to “a reasonable degree of ballistics certainty.”

In 2016, then-U.S. Attorney General Loretta Lynch ordered forensic examiners to stop using the terms “reasonable scientific certainty,” “reasonable [degree of firearms discipline] certainty,” or any words to that effect.

In United States v. Medley, 312 F. Supp. 3d (D. Md. 2018), the Court refused to permit the expert to use the words “identify” or “identification” and allowed the expert only to state the cartridge cases recovered from the crime scene were “consistent with” cartridge cases from the firearm linked to the defendant.

In Williams v. United States, 210 A.3d 734 (D.C. Ct. App. 2019), the Court held “it is plainly error to allow a firearms and toolmark examiner to unqualifiedly opine, based on pattern matching, that a specific bullet was fired by a specific gun.”

In United States v. Tibbs, 2019 WL 4359486 (D.C. Super. 2019), the Court limited the FTI expert to opining that “based on his examination and the consistency of the class characteristics and microscopic toolmarks, the recovered firearm cannot be excluded as the source of the cartridge case found on the scene of the alleged shooting – in other words, that the firearm may have fired the recovered casing. [The FTI expert] may not state an ultimate conclusion in stronger terms. Similarly, [the FTI expert] will be precluded at any point in his testimony from stating that individual marks are unique to a particular firearm or that observed individual characteristics can be used to ‘match’ a firearm to a piece of ballistics evidence.”

Regrettably, other courts continue to permit FTI experts to render exaggerated opinions with few or no restrictions, thereby misleading juries. In United States v. Johnson, 875 F.3d 1265 (9th Cir. 2017), the Court affirmed a conviction where the FTI expert testified that a bullet from the crime scene “matched” the test bullet from a pistol found in the possession of the defendant. The Court acknowledged the criticism of the FTI discipline but seemingly dismissed it because the FTI expert had been cross-examined by the defense; the expert did not testify he was “absolutely certain” in his testimony; the defense was free to call its own expert; and the defense could find “only one case” (Glynn) where a court would not permit an expert to testify as to a match.

In United States v. Gil, 680 F. App’x 11 (2d Cir. 2017) (unpublished), the Court affirmed a trial court’s decision to allow unrestricted FTI expert testimony, finding that the FTI discipline has an error rate “in the range of 1%,” which the trial court dismissed as “de minimis,” and concluded that “challenges to the admission of ballistics expert opinion are meritless.” That seems to be an incredible position to take in light of the scientific evidence, or lack thereof, regarding firearm pattern matching.

In Garrett v. Commonwealth, 534 S.W.3d 217 (Ky. 2017), the FTI examiner opined at trial that the pistol obtained from the defendant had fired the bullet recovered during the murder investigation. The Kentucky Supreme Court, unmoved by the NRC Report’s criticism of the AFTE Theory of Identification, concluded that the AFTE Theory of Identification satisfied Daubert, and the Court observed that: “The proper avenue for Garrett to address his concerns about the methodology and reliability of Collier’s testimony was through cross-examination, as well as through the testimony of his own expert. In this way, the jury was presented with both parties’ positions, and with any limitations to the testimony, and charged with weighing all the evidence presented.”

Part 5: Rebutting the Criticism

To put it mildly, “[t]he PCAST Report was not universally welcomed,” according to Puracal and Kaplan’s Report. To respond to criticism of the PCAST Report, PCAST released “An Addendum to the PCAST Report on Forensic Science in Criminal Courts” (“Addendum”) in 2017. Id. For example, the U.S. Department of Justice (“DOJ”) had attempted to discredit the PCAST Report by claiming the PCAST authors had failed to consider relevant research studies. The Addendum explained PCAST’s efforts to obtain additional studies and the DOJ’s response “that it had no additional studies for PCAST to consider.” And it appears that those affiliated with law enforcement and the prosecution of criminal cases have not slowed in their efforts to refute the science and to discredit the PCAST Report.

For example, Agar’s Report, published in 2022, states: “[T]he PCAST Report contains multiple problems that undermine the integrity of the report, rendering it an unreliable source – as a matter of science and law – to evaluate the firearms and toolmark discipline. These shortcomings include the makeup of persons who were affiliated with the PCAST Report, the use of terms and definitions alien to the firearms examination discipline or forensic science in general, and the use of arbitrary criteria to weigh the reliability of firearm analysis.”

Agar states that 38 people “researched, analyzed, drafted, and reviewed the PCAST Report on forensic science.” He then attempts to discredit these people for not possessing what Agar believes to be proper credentials, e.g., none were FTI examiners, none had ever been the director of a forensic laboratory, none had ever prepared an FTI report or testified as an FTI expert, none were AFTE members, only two had backgrounds in forensic science, none were affiliated with the DOJ, or with law enforcement or with prosecutors, etc. Agar’s Report.

Agar next faults the PCAST Report for identifying FTI as belonging to the discipline of science known as metrology, “the science that deals with measurement.” Agar quotes the DOJ: “Traditional forensic pattern examination methods – as currently practiced – do not belong to the scientific discipline of metrology. Forensic examiners visually compare the individual features observed in two examined samples, they do not measure [them.] The result of this comparison is a conclusion that is stated in words (nominal terms), not magnitudes (measurements).”

Agar also takes issue with the PCAST Committee coining the term “foundational validity.” According to Agar, the term is unscientific because the term is not found in any other scientific literature, is not found in FRE 702, and is different from the term “scientific validity” used by the U.S. Supreme Court.

Further, Agar argues that the requirements for a forensic method to achieve foundational validity – i.e., repeatability, reproducibility, and accuracy – are “rigid, dogmatic criteria.” Agar contends such requirements are “inapposite” to the U.S. Supreme Court’s statement that application of FRE 702 is a “flexible one.”

Agar contends that the PCAST Committee had no legal or scientific basis to support the requirement of “black box studies” that deliver a “reproducible and consistent procedure for ... identifying features within evidence samples” and derive “empirical measurements, from multiple independent studies.”

Agar also attempts to discredit the PCAST Report by claiming it has not been peer reviewed. He then cites no fewer than five organizations, viz., the American Society of Crime Laboratory Directors; the AFTE; the Organization of Scientific Area Committees (“OSAC”) Firearms and Toolmarks Subcommittee; the ATF; and the FBI, that had critiqued and criticized the PCAST Report’s assessment of FTI.

Agar also disputes the PCAST Report’s declaration that “[c]asework [alone] is not scientifically valid research, and experience alone cannot establish scientific validity.” According to Agar, that declaration is incorrect because FRE 702 permits a witness to be qualified as an expert based on experience alone.

Agar also argues that state and federal courts admitting FTI expert testimony post-PCAST Report discredit the report. According to Agar, since the publication of the PCAST Report, state courts in California, Connecticut, Delaware, Kentucky, Louisiana, Maryland, Mississippi, Missouri, Nebraska, New Jersey, New Mexico, North Carolina, Ohio, and Washington as well as federal district courts in Arizona, California, the District of Columbia, Nevada, New York, Oklahoma, and Virginia have admitted FTI expert testimony with few restrictions. (See Agar’s Report for supporting case citations from those courts)

Agar also attacks the credibility of the PCAST Report indirectly by arguing that the judges who have limited FTI expert testimony based on the PCAST Report have abused their judicial discretion. He alleges that by not permitting FTI examiners to testify that bullets or cartridge cases are “identified” as coming from a particular firearm, the judges are exceeding their authority under FRE 702. And by requiring the FTI examiners to opine only that a particular firearm “could not be excluded,” or “is consistent with,” or “more likely than not” is the firearm used in the crime, the judges are changing witness testimony, causing witnesses to commit perjury, and so forth. Agar sums up his view of judges who limit FTI expert testimony by stating: “It is a lamentable day for science and the law when people in black robes attempt to substitute their opinions for those who wear white lab coats.”

Part 6: Improving the Reliability of the FTI Discipline and Expert Testimony

The implementation of corrective measures in response to the NRC and PCAST Reports has been lethargic at best. “In many respects, although [the NRC Report] could hardly be characterized as new information, the [NRC] Report laid forensic science’s shortcomings to bare and brought to the surface the weaknesses that have plagued forensic science for decades.” Gabel’s Report. Professor Gabel details the history of the abysmal state of forensic practice and of crime laboratories, beginning with “1967 when President Lyndon B. Johnson’s Commission on Law Enforcement and the Administration of Justice found that many police labs lacked both equipment and expertise.” Throughout the 1970s and 1980s, numerous grants to fund improvements were handed out that seldom achieved any success.

In the 1990s, DNA came to be the “gold standard” in law enforcement and forensic investigations. Consequently, even though the National Institute of Justice (“NIJ”) teamed up with the Office of Law Enforcement Standards to fund the “Forensic Summit: Roadmap to the Year 2000,” which reported persistent deficiencies in public crime labs and called for greater standardization, increased research, and quality controls in all forensic disciplines, the lion’s share of federal funding allocated to crime labs for those improvements was tied only to DNA research.

In 2004, President George W. Bush spearheaded the formation of a new forensic science commission with the passage of the Consolidated Appropriations Act that “obligated the NIJ to provide Congress with a report on forensic science and medical examiner communities’ needs beyond DNA initiatives.” Congress also passed the DNA Sexual Assault Justice Act of 2004, which required the Attorney General to create a national forensic science commission that would, among other things, make recommendations and disseminate best practices to public crime labs. But the forensic science commission was never funded. As the NRC and PCAST Reports reveal, with regard to forensic science disciplines generally, and FTI specifically, very little had changed since the Johnson Administration.

But the news is not all bad. In addition to some conscientious judges preventing FTI experts from making greatly exaggerated and scientifically unsubstantiated claims, there have been other improvements. For example, as of 2014, 88% of America’s 409 publicly funded crime labs had been accredited by an independent and professional forensic science organization. Agar’s Report. In 2022, 83% of crime labs in the United States were accredited by one organization: ANSI-ASQ National Accreditation Board (“ANAB”). While ANAB offers accreditation in numerous forensic fields, 251 of its accredited labs are accredited in FTI. Id. The ANAB accreditation requires training of examiners, testimony monitoring, validation of procedures, and annual proficiency testing to determine whether FTI examiners perform to industry standards.

Collaborative Testing Services (“CTS”) is the dominant testing service. Twice a year, CTS provides a proficiency test, which requires an FTI examiner to compare four questioned bullets or cartridge cases with three known bullets or cartridge cases. Apparently, these are closed-set examinations because at least one of the four questioned items is a “match.” Id. The results of CTS FTI testing for 2018 and 2019 revealed that of the 1,191 tests given, 1,172 were returned with the correct answers. Id.
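The CTS figures can be converted into rates with simple arithmetic. The short Python sketch below uses only the two counts reported above and assumes that each of the 1,191 tests corresponds to one examiner response; the variable names are illustrative.

```python
# Counts reported for CTS FTI proficiency testing, 2018 and 2019.
tests_given = 1191
correct = 1172

# Responses that were not fully correct.
incorrect = tests_given - correct

# Express both outcomes as percentages of all tests given.
pct_correct = 100 * correct / tests_given
pct_incorrect = 100 * incorrect / tests_given

print(f"correct: {correct} of {tests_given} ({pct_correct:.1f}%)")
print(f"not correct: {incorrect} of {tests_given} ({pct_incorrect:.1f}%)")
```

Because these are closed-set tests, the resulting percentages describe performance on this test design only and should not be read as a casework error rate.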

The DOJ has also stepped up to the plate. For example, the FBI Handbook of Forensic Science of 1994 stated: “Firearms identification is the forensic science discipline that identifies a bullet, cartridge case or other ammunition component as having been fired by a particular firearm to the exclusion of all other firearms.” Giannelli’s Report. But the DOJ’s current Firearms Uniform Language of Testimony and Reporting (“ULTR”) prohibits FTI examiners from testifying that their source identification opinion “excludes all other firearms in the world.” Agar’s Report.

But perhaps most promising is the implementation and growing use of computer databases and comparison of bullets and cartridge cases that have been electronically scanned. In addition to the ATF’s NIBIN, in 2016, the NIST created the NIST Ballistics Toolmark Research Database (“NBTRD”). The NBTRD consists of images of bullets and cartridge cases that are scanned with a 3D high-resolution microscope. Researchers may both download images from the database and upload their own images. In this manner, the database continues to grow in size, providing more information for developing and validating algorithms that quantify the similarity between firearm toolmarks. As the number of bullets and cartridge cases available for comparison increases, the foundational uncertainty of “unique” or “individual” characteristics is addressed.

As of yet, none of the NBTRD’s data is used in actual casework, i.e., criminal investigations and prosecutions. NIST mechanical engineer Hans Soons explained that a similarity score between two pieces of evidence by itself is often meaningless. “A comparison score needs context,” he said, adding, “How does it compare with scores obtained when comparing samples fired from the same firearm versus scores obtained when comparing samples fired from a different firearm?” To address this concern, the NIST is collaborating with the FBI and the Netherlands Forensic Institute to develop the Reference Population Database of Firearm Toolmarks (“RPDFT”). The purpose of NBTRD and RPDFT is eventually to allow an expert to give a statistical statement in court, stating with a high degree of confidence the likelihood that two samples came from the same firearm. But the process is going to take years because the RPDFT must be populated with a sufficient number of images to permit the expert to testify that he or she knows what a match and a non-match look like.
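Soons’s point about context can be illustrated with a minimal Python sketch. It places one comparison score against two reference populations: scores from known same-firearm comparisons and scores from known different-firearm comparisons. The function name, the 0-to-1 score scale, and every number below are made-up placeholders for illustration; they are not measurements from the NBTRD, the RPDFT, or any study.

```python
def score_context(score, same_source_scores, different_source_scores):
    """Return the fraction of each reference population scoring at or above `score`."""
    def frac_at_or_above(population):
        return sum(1 for s in population if s >= score) / len(population)

    return {
        "same_source": frac_at_or_above(same_source_scores),
        "different_source": frac_at_or_above(different_source_scores),
    }


# Hypothetical placeholder scores (NOT real data).
known_same = [0.81, 0.92, 0.77, 0.88, 0.95, 0.70]
known_diff = [0.10, 0.22, 0.05, 0.31, 0.18, 0.27, 0.12, 0.40]

# How common is a score of 0.80 or higher in each reference population?
context = score_context(0.80, known_same, known_diff)
print(context)
```

The smaller the reference populations, the less such fractions can say, which is one way to see why a sparsely populated database cannot yet support statistical testimony.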

Additionally, NIST mechanical engineer John Song reported in Forensic Science International that a statistical approach for cartridge case comparisons had been developed that might enable numerical testimony. Song and his colleagues developed Congruent Matching Cells (“CMC”). Three-dimensional surface scans of the breech face impressions on cartridge cases are compared by an algorithm. CMC divides one of the scanned surfaces into a grid of cells and then searches the other surface for matching cells.

In their recent study, 135 cartridge cases fired from 21 different 9mm pistols were scanned, producing 433 matching image pairs and 4,812 non-matching pairs. The CMC algorithm correctly classified all the pairs. Importantly, it was observed that almost all the non-matching pairs had zero matching cells – only a handful had one or two due to random effects. Conversely, all of the matching pairs had a minimum of 18 matching cells – meaning the matching and non-matching pairs fell into highly separated distributions based on the number of matching cells. Using this method, an FTI expert could testify about how closely two cartridge cases match based on the number of matching cells and the probability of a random match – like expert DNA forensic testimony. But for now, CMC lacks a large enough and diverse enough dataset of scanned images to calculate realistic error rates for use in actual casework.
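The cell-matching idea described above can be sketched in Python with NumPy. This is a simplified illustration, not NIST’s implementation: it assumes the two scans are height maps of the same size that differ only by a small shift (rotation is ignored), and the function names, cell size, search window, and thresholds are all hypothetical choices made for the example.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equal-sized patches (-1 to 1)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_offset(cell, target, top, left, max_shift):
    """Slide `cell` over `target` within +/- max_shift of (top, left).

    Returns (best correlation, row shift, column shift).
    """
    h, w = cell.shape
    best = (-1.0, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y, x = top + dy, left + dx
            # Skip placements that fall outside the target scan.
            if y < 0 or x < 0 or y + h > target.shape[0] or x + w > target.shape[1]:
                continue
            score = ncc(cell, target[y:y + h, x:x + w])
            if score > best[0]:
                best = (score, dy, dx)
    return best


def count_congruent_cells(ref, target, cell=16, max_shift=4, min_ncc=0.6, tol=1):
    """Count cells of `ref` that match `target` AND agree on one common shift."""
    hits = []
    for top in range(0, ref.shape[0] - cell + 1, cell):
        for left in range(0, ref.shape[1] - cell + 1, cell):
            patch = ref[top:top + cell, left:left + cell]
            score, dy, dx = best_offset(patch, target, top, left, max_shift)
            if score >= min_ncc:  # the cell found a similar region
                hits.append((dy, dx))
    if not hits:
        return 0
    # Congruence check: matched cells must share (nearly) the same shift.
    med_dy = np.median([h[0] for h in hits])
    med_dx = np.median([h[1] for h in hits])
    return sum(1 for dy, dx in hits
               if abs(dy - med_dy) <= tol and abs(dx - med_dx) <= tol)


# Synthetic demonstration with random surfaces (NOT real cartridge-case scans).
rng = np.random.default_rng(0)
base = rng.standard_normal((80, 80))
ref = base[8:72, 8:72]
shifted_copy = base[6:70, 9:73] + 0.1 * rng.standard_normal((64, 64))
unrelated = rng.standard_normal((64, 64))

print("same surface:", count_congruent_cells(ref, shifted_copy))
print("unrelated surface:", count_congruent_cells(ref, unrelated))
```

In this toy setup, the shifted copy of the same surface yields many congruent cells while the unrelated surface yields essentially none, mirroring the separation between matching and non-matching pairs that the study reported. Requiring the matched cells to agree on a common shift is what keeps isolated chance similarities from being counted.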

The FTI discipline is controversial. The most recent evaluation is Agar’s Report. While Agar’s Report is informative and sheds light on the current state of FTI, his acerbic tone takes on the nature of a diatribe against the PCAST Report – calling into question whether his report is an unbiased review of current FTI facts and science.

For example, Agar begins his critique of the PCAST Report by attacking the messengers instead of the message. Seeking to discredit the PCAST Committee, he lists numerous qualifications that none of the members possessed – yet he in large measure fails to disclose the qualifications the members possess. Members included the President of the Broad Institute of Harvard and MIT; the President and CEO of Aerospace Corporation; a dean of a medical school; and professors of chemistry, science, technology, biochemistry, and electrical engineering from Princeton, Northwestern, the University of Texas at Austin, the University of Maryland, and elsewhere.

Despite these backgrounds, Agar claims the members are not qualified to evaluate the scientific validity of FTI because none of them had experience preparing an FTI report, testifying as an FTI expert, directing a forensic lab, etc. This is tantamount to saying only an astrologer experienced with creating a birth chart and casting a horoscope is qualified to evaluate the scientific validity of astrology. While members of the PCAST Committee might not have the FTI background and experience for which Agar condemns them, they are undoubtedly eminently qualified to determine whether a discipline satisfies the scientific method and can properly be classified as scientific.

Moreover, the PCAST Report details the manner in which the members further educated themselves. Their study included an extensive literature review and was also informed by inputs from forensic researchers at the FBI’s Laboratory, NIST, and other forensic scientists.

Agar argues that the PCAST members were wrong in identifying FTI as belonging to metrology. According to Agar, FTI is not metrology because FTI examiners make comparisons and do not measure. Yet, the AFTE Theory of Identification clearly states: “Significance is determined by the comparative examination of two or more sets of surface contour patterns comprised of individual PEAKS, RIDGES, and FURROWS. Specifically, the relative HEIGHT or DEPTH, WIDTH, CURVATURE, and SPATIAL RELATIONSHIP of the individual PEAKS, RIDGES and FURROWS within one set of surface contours are DEFINED and compared to the corresponding features in the second set of contours.” (Emphasis added.) The very use of terms such as height, depth, width, curvature, spatial relationship, peaks, ridges, and furrows implies measurement. To say “the depth of the groove identified as Groove A on bullet #1 is comparatively equal to the depth of Groove A on bullet #2” is to make a visual measurement, regardless of whether the depth of the grooves was actually measured in millimeters, micrometers, etc.

Agar’s criticism of the PCAST members coining the term “foundational validity” is shortsighted. He alleges that the term is unscientific because it is not found in any other scientific literature, nor is it found in the opinions of the U.S. Supreme Court. Using that standard, the term “genocide” would have been invalid when it was introduced in the 1940s because there was no instance of its prior use.

Furthermore, the individual words “foundational” and “validity” are quite common and their meanings, taken together, are easily discerned. The foundation of FTI is the assumption that firearms leave unique marks on cartridge casings and bullets and those marks can be identified by FTI examiners to determine that the cartridge casings or bullets came from that particular firearm. The validity, or correctness, or genuineness of that assumption is not known. It may be correct, or it may be utter balderdash, which many suspect.

Likewise, Agar’s criticism of the PCAST members’ requirements of repeatability, reproducibility, and accuracy is foolish. Those are the requirements for testing the validity of any scientific assumption. One thing that is very troubling is the inability of the examiners in the Ames II study to reach the same conclusions when comparing the same items a second time. This indicates that in the first comparison, the examiners designated some marks or striations to be individual characteristics, but in the second examination, the examiners designated other marks as individual characteristics. This lack of objective criteria demonstrates FTI’s lack of repeatability.

The closed-set studies described by Agar are wholly inadequate for assessing FTI’s validity. First, as the PCAST Report observed, in the closed-set studies the examiners reported an inconclusive result 0.2% of the time, but in the open-set studies, the inconclusive rate jumped to a whopping 41.8% in the Miami-Dade Study and 33.7% in the Ames Laboratory Study. This indicates that in the closed-set studies, the examiners made the correct identifications but did so without using the proposed FTI method of identifying unique marks or “individual characteristics.”

Moreover, the closed-set studies were not representative of FTI practice in actual criminal prosecutions. The use of consecutively manufactured barrels is one example. While the theory behind using those barrels was to make the identifications more difficult, the opposite may be true. To illustrate, suppose the calibration of a manufacturing tool was slightly off, resulting in a barrel with a striation in the form of an etched line with a slight downward tic curved at the right end. On the next barrel, the same striation occurs but the tic is slightly longer. The tic grows in length with each consecutive barrel until it begins to take on the shape of the letter “C.” An examiner could discern the sequence of the barrels’ manufacturing by observing the length and shape of the tic mark and use that information to pair the unknown bullet or cartridge case with the known.

As for the Miami-Dade Study, the firearms employed EBIS barrels specifically designed to enable source identification. But few firearms actually recovered in criminal investigations and subjected to FTI examinations have EBIS barrels.

And of the one appropriate study – the Baldwin (Ames) Study – Agar failed to discuss the examiners’ latent desire to find a “match.” This is demonstrated by the examiners’ response of “inconclusive” when given kits that contained non-matching known and unknown samples. That is, instead of excluding the firearm that fired the known sample as being the firearm that discharged the unknown sample, the examiners reported “inconclusive” 735 times when examining the 2,180 non-matching samples – or 34% of the time. This could be indicative of FTI examiners being prone to make matches and hesitant to report exclusions. In addition, Agar’s argument that trial judges abuse their discretion when limiting the testimony of FTI examiners has one fatal flaw. No appellate court has found that those judges abused their discretion. 

Agar is not an unbiased author. He is the Assistant General Counsel for the FBI lab in Quantico, Virginia. He has been under fire for his remarks recorded in a handout from an online lecture. Agar instructed FTI examiners on how to circumvent judges’ restrictions on the examiners’ testimony and advised the examiners to inform the judges that any effort to restrict their testimony is tantamount to asking them to commit perjury. “Why a High-Ranking FBI Attorney is Pushing ‘Unbelievable’ Junk Science on Guns,” (Feb. 2022).

It seems Upton Sinclair could have been describing Agar and his pushback on valid criticisms of FTI’s lack of scientific rigor when he famously wrote, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

As it now stands, it appears the only opinion an FTI expert ought to be permitted to give to a factfinder in a criminal proceeding is that some of the markings on a bullet or cartridge case could possibly be unique to the particular firearm in evidence, and those marks are consistent with marks found on the crime-scene bullet or cartridge case. Anything more than that risks convicting the innocent and letting the guilty go free.

Agar has it wrong. It is a lamentable day for science and the criminal justice system when police conceal their uniforms beneath the pretense of wearing white lab coats while testifying at trial as unbiased, scientific experts.


Sources: Paul C. Giannelli, “Daubert Challenges to Firearms Identifications,” Case Western School of Law (2007); Joe Nickell and John F. Fischer, Crime Science: Methods of Forensic Detection (1999); Agar, J., “The Admissibility of Firearms and Toolmarks Expert Testimony in the Shadow of PCAST,” 74:1 Bay. L. Rev. 93 (2022); Federal Rules of Evidence; Report of President’s Council of Advisors on Science and Technology (2016); National Academy of Sciences National Research Council, “Ballistic Imaging” (2008); National Academy of Sciences National Research Council, “Strengthening Forensic Science in the United States: A Path Forward” (2009); Janis C. Puracal and Aliza B. Kaplan, “Science in the Courtroom: Challenging Faulty Forensics,” The Champion; Jessica D. Gabel, “Realizing Reliability in Forensic Science from the Ground Up,” 104 J. Crim. L. & Criminology 283 (2014); Kenneth S. Broun, “McCormick, Evidence,” 6th ed. (2006); William A. Tobin and Peter J. Blau, “Hypothesis Testing of the Critical Underlying Premise of Discernible Uniqueness in Firearms-Toolmarks Forensic Practice,” 53 Jurimetrics J. 122 (2013); Faigman, et al., “The Field of Firearms Forensics Is Flawed,” Scientific American (May 25, 2022).


