Denim, as a Crime-Solving Tool, Has Holes
In 1997, Charles Barbee and three co-defendants were convicted of robbing two banks in Spokane, Wash., and setting off bombs at the offices of a local newspaper and at a Planned Parenthood clinic. One key piece of evidence from the trial was a security-camera photo that showed an alternating dark-and-light pattern along a seam of the bluejeans worn by one of the robbers. Richard Vorder Bruegge, an F.B.I. forensic scientist, told the jury that the visual features of the jeans in the photograph, particularly the dark-and-light “bar code” pattern, matched a pair that had been seized from the house of one of the suspects: Charles Barbee.
The next year, Dr. Vorder Bruegge published a study on the Barbee case in the Journal of Forensic Sciences, which was used to set a legal precedent for how analysis of patterns in photographs could be used as evidence. Analysis of visual elements in photographs, such as facial markings, design features on clothing and jeans bar codes, is used in hundreds of cases a year, F.B.I. officials have said.
But a recent study published in the Proceedings of the National Academy of Sciences raises questions about the trustworthiness of matching jeans by their patterns of wear.
“Even under ideal conditions, trying to get an exact match is difficult,” said Hany Farid, a computer scientist at the University of California, Berkeley, and the senior author of the study. “This technique should be used with extreme caution, if at all.”
Dr. Farid has spent most of his career studying the forensics of digital images, and has testified in court about whether images were digitally altered. After reading an investigation by Ryan Gabrielson of ProPublica last year, he was inspired to look into photo analysis techniques used by the F.B.I.
Much of the scientific heft undergirding those techniques stemmed from the one study on jeans bar codes, Mr. Gabrielson wrote. Dr. Farid set out to test the technique.
He and Sophie Nightingale, a postdoctoral researcher, bought 100 pairs of jeans from thrift stores in Berkeley and photographed the long, vertical seam of each pair. They also had 111 workers, recruited through the crowdsourcing site Amazon Mechanical Turk, send in similar pictures of their own jeans. These images would be used to measure how much the seams of different pairs of jeans vary from one another.
To simulate the kind of variation seen in real-world images of the same jeans, they chose 10 pairs whose seams all had pronounced dark-and-light patterns and took 10 photos of each seam under varied conditions: in different rooms in their lab, with different lighting, using different cameras and placing the jeans on different surfaces.
Dr. Farid and Dr. Nightingale plotted each dark-and-light pattern on a line graph; the light portions of the seam were represented by peaks, and the dark portions were represented by valleys. They then sought to compare the graphs to each other. Ideally, this comparison would show that two images of the same seam are much more similar than two images of different seams. This, in turn, would support the idea that the bar code for each seam is truly unique, and that a photo reliably captures that uniqueness.
To make the comparison easier, they adapted a mathematical tool that neuroscientists use to measure the similarity between different “spike trains,” the sequences of sudden electrical pulses that brain cells fire between long stretches of silence. Dr. Farid and Dr. Nightingale transformed the jeans graphs to look more like spike trains, with narrow, pointy peaks and valleys, and then used the spike-train tool to compare them.
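The article does not spell out the exact transform or comparison tool the researchers used, so what follows is only a rough illustration under stated assumptions. It supposes that a brightness profile has already been sampled along a seam (one number per position, high for light, low for dark), marks each local peak and valley as a spike in one of two spike trains, and compares trains with the van Rossum distance, a standard spike-train metric from neuroscience that is swapped in here for illustration and is not necessarily the one in the study. The function names and the `tau` smoothing parameter are hypothetical, and real seam photos would first need alignment and rescaling, which this sketch skips.

```python
import math


def spike_trains(profile):
    """Hypothetical transform: turn a brightness profile along a seam into two
    spike trains, 1.0 at each local peak (light spot) or local valley (dark
    spot) and 0.0 everywhere else. Endpoints are never marked."""
    n = len(profile)
    peaks, valleys = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        prev, cur, nxt = profile[i - 1], profile[i], profile[i + 1]
        if cur > prev and cur >= nxt:
            peaks[i] = 1.0
        elif cur < prev and cur <= nxt:
            valleys[i] = 1.0
    return peaks, valleys


def exp_filter(train, tau):
    """Convolve a spike train with a causal exponential kernel exp(-t / tau)."""
    decay = math.exp(-1.0 / tau)
    out, acc = [], 0.0
    for s in train:
        acc = acc * decay + s
        out.append(acc)
    return out


def van_rossum(a, b, tau):
    """Discrete approximation of the van Rossum distance between two
    equal-length spike trains."""
    fa, fb = exp_filter(a, tau), exp_filter(b, tau)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)) / tau)


def seam_distance(profile_a, profile_b, tau=5.0):
    """Hypothetical seam comparison: smaller means more alike. Assumes both
    profiles are already aligned and resampled to the same length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be aligned to the same length")
    pa, va = spike_trains(profile_a)
    pb, vb = spike_trains(profile_b)
    return math.hypot(van_rossum(pa, pb, tau), van_rossum(va, vb, tau))


# Toy profiles, invented for illustration (not data from the study).
seam = [0.2, 0.9, 0.3, 0.8, 0.1, 0.7, 0.4, 0.9, 0.2]
other = [0.5, 0.4, 0.6, 0.2, 0.9, 0.1, 0.8, 0.3, 0.5]

print(seam_distance(seam, seam))   # 0.0
print(seam_distance(seam, other))  # a positive distance
```

Splitting peaks and valleys into separate trains keeps each one a conventional, unsigned spike train, so a standard metric applies to each before the two distances are combined. The two toy profiles are exactly out of phase, with no peak positions in common, which is why their distance is large.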
The data showed that two images of the same seam often looked quite different, so much so that it was often impossible to tell whether a pair of images showed the same seam or two different ones. Much of the problem, the researchers concluded, comes down to the fact that cloth is flexible: it stretches, folds and drapes in complicated ways, which changes how it looks in photos.
The lack of distinctiveness in images of seams sharply limits the accuracy of jeans identification, according to the study: the algorithm made a substantial number of false matches between different pairs of jeans.
The authors found that if they made the algorithm more discriminating, limiting the odds of a false match to one in a million (0.0001 percent), the chance of making a correct match was only about 20 percent. The rest of the time, the algorithm would not declare any match. If they relaxed that standard, they could obtain correct matches about 80 percent of the time, but the false-match rate would climb to about 20 percent.
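The tradeoff described above comes down to where one sets the cutoff for declaring a match. A minimal sketch, assuming each comparison yields a distance score (lower means more alike) and that a match is declared whenever the score falls at or below a threshold, shows how tightening the threshold suppresses false matches at the cost of correct ones. The scores are invented toy numbers, not data from the study, and `match_rates` and `threshold_for_fmr` are hypothetical helpers.

```python
def match_rates(same_scores, diff_scores, threshold):
    """Declare a match whenever a distance is at or below `threshold`.
    Returns (correct-match rate on same-seam pairs,
             false-match rate on different-seam pairs)."""
    tmr = sum(d <= threshold for d in same_scores) / len(same_scores)
    fmr = sum(d <= threshold for d in diff_scores) / len(diff_scores)
    return tmr, fmr


def threshold_for_fmr(same_scores, diff_scores, max_fmr):
    """Most permissive observed score usable as a threshold while keeping the
    false-match rate at or below `max_fmr`; None if even the smallest observed
    score is too permissive. The false-match rate never falls as the
    threshold rises, so the scan can stop at the first failure."""
    best = None
    for t in sorted(set(same_scores) | set(diff_scores)):
        _, fmr = match_rates(same_scores, diff_scores, t)
        if fmr > max_fmr:
            break
        best = t
    return best


# Toy distances, invented for illustration only.
same = [0.1, 0.2, 0.3, 0.6, 0.9]   # pairs of photos of the same seam
diff = [0.25, 0.5, 0.7, 0.8, 1.0]  # pairs of photos of different seams

strict = threshold_for_fmr(same, diff, 0.0)
print(strict)                          # 0.2
print(match_rates(same, diff, strict)) # (0.4, 0.0)
print(match_rates(same, diff, 0.3))    # (0.6, 0.2)
```

With these toy numbers, forbidding any false match leaves a 40 percent correct-match rate, while loosening the cutoff to 0.3 raises correct matches to 60 percent but admits a 20 percent false-match rate: the same shape of tradeoff the study reports.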
Alicia Carriquiry, a statistician at Iowa State University and director of a program on forensic science, who was not involved with the study, said the most important goal for any forensic technique is to have a low likelihood of false matches. False matches can lead to innocent people being convicted of crimes that they did not commit.
“In the jeans study, that probability was huge, meaning that the chance of making a false identification using that evidence is high,” she said.
Dr. Farid said the study actually represented a best-case scenario, in which the jeans were photographed up close, under bright lighting and with good cameras. In real investigations, suspects are often photographed at a distance, with low-resolution CCTV cameras.
Researchers outside the F.B.I. posit that the Journal of Forensic Sciences article also failed to show that jeans bar codes were a reliable method of identification. The major problem, they say, was that the article did not include an objective statistical model of how likely the method was to make mistakes, for instance by gauging how often two different pairs of jeans might look the same because of manufacturing similarities or sheer coincidence. Instead, the study leaned on the analyst’s judgment of markings on jeans.
Dr. Vorder Bruegge pointed this out himself in the study: “It should be remembered that in this and other cases, the overall significance of wear marks is not necessarily based on a quantitative assessment, but on a qualitative assessment.”
During the trial of Mr. Barbee, Dr. Vorder Bruegge argued for the accuracy of the technique by explaining that one pair of jeans seized from Mr. Barbee matched the pair worn by the bank robber, while 34 other pairs of jeans offered up by the defense did not. But outside researchers say such a demonstration is no substitute for a statistical model of the method’s error rate.
In fact, at four points in the article, Dr. Vorder Bruegge noted that the technique had yet to be statistically validated. “Although a validation study has yet to be performed to test the theory that all denim trouser bar code seam patterns are unique,” he wrote, “it has been observed in numerous examinations that it is possible to distinguish pairs of jeans from one another based solely on differences in the patterns along the seams.”
No such validation study has been published since then. The F.B.I. declined to answer questions about the bureau’s use of jeans bar codes or about Dr. Vorder Bruegge’s research. Independent researchers say that with many other kinds of pattern analysis, as with jeans bar codes, prosecution witnesses rely too much on subjective judgments rather than rigorous statistics.
“Forensic scientists will say, ‘Yeah, I’m sure, based on my 20 years of experience, that these prints were made by that same finger,’” said Anil Jain, a computer scientist who studies pattern recognition and biometrics. “They say that’s a subjective decision. We want to get away from that.”
F.B.I. investigators sometimes present the methods in court as being near-infallible, often citing levels of accuracy that researchers find implausible.
In a 2003 case, Dr. Vorder Bruegge claimed that the plaid shirt worn by a bank robber and captured by a security camera made a definitive match with one seized from the home of a suspect. He testified that only one in 650 billion shirts would match so well — a claim that “makes about as much sense as the statement two plus two equals five,” Karen Kafadar, a statistician at the University of Virginia, told ProPublica.
Dr. Farid intends to study whether the challenges of jeans-matching also bedevil other kinds of pattern-based evidence: lines in plaid or striped shirts, blob shapes in camouflage designs and marks left behind by tires.
“At some point, we need to understand that the fact that two items look similar in no way means that they have a common origin,” Dr. Carriquiry said.
“This stuff matters,” Dr. Farid said. “People are going to jail based on shoddy evidence.”