psychiatric diagnoses. If the definitions of mental disorders were too vague and inexact, different psychiatrists would apply them in different ways, making poor diagnostic reliability inevitable.
This imprecision was why, as Spitzer said, for the DSM-II “There are no diagnostic categories for which reliability [is] uniformly high … [and why] the level of reliability is no better than fair for psychosis and schizophrenia and is poor for the remaining categories.” 8 Spitzer’s hope was that by sharpening the definitions there would be less scope for personal interpretation, which in turn would mean diagnostic reliability would rise.
Finally, to help further improve diagnostic reliability, Spitzer’s team made a third and major alteration: they created criteria for each disorder that a patient had to meet in order to warrant the diagnosis. So while, for example, there are nine symptoms associated with depression, it was somehow decided that a patient would need to have at least five of them for a period of at least two weeks to qualify for receiving the diagnosis of depression.
The only problem was on what grounds did Spitzer’s team decide that if you have five symptoms for two weeks, you suffer from a depressive disorder? Why did they choose five symptoms for two weeks instead of six symptoms for three weeks? Or, for that matter, three symptoms for five weeks? What was the science that justified putting the line where Spitzer’s team chose to draw it? In an interview in 2010, the psychiatrist Daniel Carlat asked Spitzer this very question.
Carlat: How did you decide on five criteria as being your minimum threshold for depression?
Spitzer: It was just a consensus. We would ask clinicians and researchers, “How many symptoms do you think patients ought to have before you would give them the diagnosis of depression,” and we came up with the arbitrary number of five.
Carlat: But why did you choose five and not four? Or why didn’t you choose six?
Spitzer: Because four just seemed like not enough. And six seemed like too much. [Spitzer smiled mischievously.]
Carlat: But weren’t there any studies done to establish the threshold?
Spitzer: We did reviews of the literature, and in some cases we received funding from NIMH to do field trials … [However] when you do field trials in depression and other disorders, there is no sharp dividing line where you can confidently say, “This is the perfect number of symptoms needed to make a diagnosis” … It would be nice if we had a biological gold standard, but that doesn’t exist, because we don’t understand the neurobiology of depression.
I suspect that by now some of you may be scratching your heads. Wasn’t the whole point of Spitzer’s reform to make psychiatric diagnosis a little more scientifically rigorous? But what, you may ask, is rigorous about a committee drawing arbitrary lines between mental disorder and normality? And what is scientific about asking the psychiatric community to vote on whether existing disorders should be removed from the DSM? In other words, in the name of making psychiatric diagnosis more scientific, had Spitzer’s team continued to make use of the unscientific procedures that had dogged the construction of earlier manuals?
As important as this question is, I’ll refrain from answering it right now, because there is a more crucial question to be addressed first: Did Spitzer’s reforms actually work? Did they solve the reliability problem? If you went to see two different psychiatrists independently today, would they be likely to both assign you the same diagnosis?
In an interview for The New Yorker in 2005, a journalist called Alix Spiegel asked Spitzer that very question. His answer was unequivocal. “To say that we’ve solved the reliability problem is just not true,” said Spitzer. “It’s been improved. But if you’re in a