DE-PROSECUTION AND DEATH: A FINAL REPLY TO KAPLAN, NADDEO & SCOTT
Thomas P. Hogan
Summary
Kaplan et al. have submitted a second response commenting on my original article estimating a causal link between de-prosecution and homicides in Philadelphia using the traditional synthetic control method. Unfortunately, the Kaplan second response, while attempting to fix one data error, remains riddled with serious and fatal errors. These errors were explicitly identified in a previous reply, but the authors chose to ignore their mistakes. For instance, the Kaplan second response is still claiming that the leading donor city of Detroit only cleared 8% of homicides during one year, while maintaining that New York cleared 101% of its homicides during another year, with numerous other obvious mistakes in homicide clearance data. In this brief and final reply to the Kaplan authors, I will detail the Kaplan authors’ methodological and data errors, explain how and why such mistakes impact the most recent Kaplan response, and describe how to correct the mistakes. The original study, estimating that de-prosecution in Philadelphia is associated with an additional 74.79 homicides per year (p<.05), remains robust.
Declarations of interest: none.
1 | INTRODUCTION
In De-Prosecution and Death: A Synthetic Control Analysis of the Impact of De-Prosecution on Homicides (Hogan, 2022) (hereinafter, the “De-Prosecution Article”), I examined the potential causal link between the policy of de-prosecution and homicides in Philadelphia. I used the traditional synthetic control method to estimate a causal effect on homicides in a city where sentencings across felonies and misdemeanors declined by approximately 70% over five years. The results showed a statistically significant increase of 74.79 homicides per year in Philadelphia during 2015-19 associated with the de-prosecution policy.
Jacob Kaplan, J.J. Naddeo, and Tom Scott publicly posted a working paper criticizing the results of the De-Prosecution Article (hereinafter, the “Kaplan First Response”). While initially replicating the results of the De-Prosecution Article precisely, the Kaplan First Response then criticized the data, methods, and results from the De-Prosecution Article, insisting that it was necessary to use an augmented synthetic control method which allegedly “flipped” the results of the original study. I replied to the Kaplan First Response, pointing out that the Kaplan authors had committed a massive data error in the homicide clearance variable used in the synthetic control model, for instance claiming that the donor city New York cleared zero homicides in the 2010-12 time period, as well as a series of other data and methodological errors (hereinafter, the “De-Prosecution Reply”). Once the data and methodological errors were corrected in the Kaplan First Response, the results of the original De-Prosecution Article remained robust, and in fact were buttressed by validity testing in the Kaplan First Response. As conceded by the Kaplan authors, the Kaplan First Response was rejected for publication after it was peer reviewed. The Kaplan authors recently and publicly posted a second response, once again attempting to avoid peer review and once again challenging the results of the De-Prosecution Article (hereinafter, the “Kaplan Second Response”).[1] The Kaplan Second Response no longer claims to have “flipped” the original results, but merely states in an appendix that they find “considerable heterogeneity in the results.”
The Kaplan Second Response must also be dismissed as methodologically unsound and relying on plainly incorrect data. The Kaplan authors continue to insist on using an augmented synthetic control method, which is not called for in this context. More importantly, the Kaplan Second Response once again is using fatally flawed data. For the major donor cities in the synthetic control model (Detroit, New Orleans, and New York), the Kaplan Second Response fails to correct plainly wrong data for the homicide clearance rate variable. As examples, the Kaplan Second Response continues to maintain that Detroit only cleared 8% of its homicides in 2012, New Orleans only cleared 15% of its homicides in 2012, while New York cleared 101% of its homicides in 2017. These data errors were pointed out in the De-Prosecution Reply, but the Kaplan authors chose to ignore them, once again rendering their results invalid. In this final reply, I will: (1) detail the remaining major data errors in the Kaplan Second Response; (2) describe the ongoing methodological mistake; and (3) explain how to correct the mistakes.
2 | THE CONTINUED DATA ERRORS IN THE KAPLAN SECOND RESPONSE
The Kaplan Second Response continues to use egregiously incorrect data, even though such data flaws were identified by the De-Prosecution Reply. In the Kaplan Second Response, the Kaplan authors never admit to their original mistake, claiming merely that “we admit that we should have taken a closer look at the clearance data.” The Kaplan authors then chose to correct one small area of the incorrect data, while ignoring the more salient data errors, and published another critique.
In applying the traditional synthetic control method to create synthetic Philadelphia, the De-Prosecution Article reported that three donor cities were identified (with relative contributions): Detroit (0.468), New Orleans (0.334), and New York (0.198). The De-Prosecution Article included homicide clearances as a variable.[2] The Kaplan responses also reported that the donor cities in their model were Detroit, New Orleans, and New York, with similar contributions per city, and also attempted to use homicide clearances as a variable.
As identified in the De-Prosecution Reply, the Kaplan First Response used incorrect homicide clearance data when attempting to apply an augmented synthetic control method, resulting in their “flipped” result. The De-Prosecution Reply specifically stated:
To use a few examples, the Kaplan Response data reports that donor city New York cleared 0% of its homicides in 2010, 2011, and 2012. New York then rebounds to clear 101% of its homicides in 2017, according to the Kaplan Response authors. The Kaplan Response included gross errors for clearance rates for the other donor cities for their model, such as Detroit with a 16% clearance rate in 2010, an 11% clearance rate in 2011, an 8% clearance rate in 2012, and a 14% clearance rate in 2016. According to the data used by the Kaplan Response, New Orleans only cleared 15% of its homicides in 2012, while Los Angeles (a substitute donor in the robustness modeling) cleared 107% of its homicides in 2013. There are similar errors throughout the clearance data generated and used by the Kaplan Response. As previously noted, the Kaplan Response deleted median income as a variable, further undermining data integrity. The wildly incorrect clearance data then were applied in the augmented synthetic control method preferred by the Kaplan Response, causing the fatal error in its ultimate and bizarre statistical conclusion. The authors of the Kaplan Report have engaged in research conduct that is either remarkably disingenuous or remarkably sloppy. (Hogan, 2022, internal footnotes deleted).
In the Kaplan Second Response, the authors choose simply to ignore the majority of these incorrect data. The Kaplan authors state that they attempted to correct one small area of flawed data, the 0% homicide clearance rates for New York from 2010-12. The Kaplan Second Response replaced the 0% clearance rate for New York during that time period with the “mean of homicides cleared with the agency for all non-zero observations in 2010-19.” In other words, rather than investigate the actual clearance rate for New York via official data or demographic metrics, the Kaplan authors imputed 30% of the applicable clearance rate data for the city.
Initially, it is useful to point out the general problem with this proposed data correction by the Kaplan Second Response. During the robustness testing section of the De-Prosecution Article, the original study tested the impact of de-prosecution on burglary and robbery offenses. The De-Prosecution Article cautioned that the results could not be considered valid because approximately 20% of the data had to be imputed from prior years. The Kaplan Second Response is attempting to rely upon results where 30% of the data had to be imputed from other years. In addition, the Kaplan Second Response is calculating the mean clearance rate for homicides in New York to replace the 0% clearance rate years by including their assignment of a 101% clearance rate for 2017 in the mean calculation. The Kaplan authors are replacing a specific error with a general and compounded error, not attempting to reach accurate results.
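The compounding effect described above can be illustrated with a minimal sketch. Only the 0% values for 2010-12 and the 101% value for 2017 come from the text; every other yearly value is a hypothetical placeholder (70.0), chosen purely for illustration and not actual New York data. The function name is likewise an assumption, not the Kaplan authors' code.

```python
from statistics import mean


def impute_zeros_with_nonzero_mean(rates):
    """Replace zero entries with the mean of all non-zero entries.

    Returns the imputed series and the fill value that was used.
    """
    nonzero = [rate for rate in rates.values() if rate != 0]
    fill = mean(nonzero)
    imputed = {year: (fill if rate == 0 else rate) for year, rate in rates.items()}
    return imputed, fill


# HYPOTHETICAL placeholder for the years not discussed in the text.
PLACEHOLDER = 70.0

# Clearance rates (percent) for 2010-19. Zeros and the 101% value follow the text.
ny_rates = {year: PLACEHOLDER for year in range(2010, 2020)}
ny_rates.update({2010: 0.0, 2011: 0.0, 2012: 0.0, 2017: 101.0})

imputed, fill = impute_zeros_with_nonzero_mean(ny_rates)

# The same mean computed after excluding values above 100%.
fill_without_outlier = mean(r for r in ny_rates.values() if 0 < r <= 100)

imputed_fraction = sum(1 for r in ny_rates.values() if r == 0) / len(ny_rates)

print(f"Fraction of years imputed: {imputed_fraction:.0%}")
print(f"Fill value including the 101% year: {fill:.2f}")
print(f"Fill value excluding the 101% year: {fill_without_outlier:.2f}")
```

Under these placeholder values, the fill value rises whenever the above-100% year is left in the mean, and that inflated value is then copied into three of the ten years.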
Moreover, the Kaplan authors are attempting a correction to one portion of the incorrect data for the smallest contributing donor. New York’s relative contribution to the synthetic control model is 19.8%, making it by far the smallest contributor (Hogan, 2022). Thus, the Kaplan Second Response’s proposed solution to using incorrect data is to blindly impute 30% of the relevant data for one variable for a city that is contributing less than 20% to the overall model, a minuscule and flawed correction. Their “correction” addresses less than 6% of one variable (30% of 19.8%, or roughly 5.9%) in a much larger dataset.
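The arithmetic behind the "less than 6%" figure can be checked with a short sketch. The donor weights are those reported in the De-Prosecution Article. Treating the affected share as the imputed fraction multiplied by the donor weight is a simplifying assumption made here for illustration.

```python
# Donor weights as reported in the De-Prosecution Article.
donor_weights = {"Detroit": 0.468, "New Orleans": 0.334, "New York": 0.198}

# Three of the ten yearly observations (2010-12 out of 2010-19) were imputed.
imputed_years = 3
total_years = 10
imputed_fraction = imputed_years / total_years

# Assumption: affected share = imputed fraction x New York's donor weight.
touched_share = imputed_fraction * donor_weights["New York"]

print(f"Sum of donor weights: {sum(donor_weights.values()):.3f}")
print(f"Imputed fraction of New York clearance data: {imputed_fraction:.0%}")
print(f"Weighted share of the clearance variable affected: {touched_share:.2%}")
```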
Most importantly, the Kaplan Second Response has knowingly chosen to continue to use data that were specifically identified by the De-Prosecution Reply as incorrect. For instance, the Kaplan authors make no attempt to address their model which claims that Detroit only cleared 8% of its homicides in 2012, New York cleared 101% of its homicides in 2017, or Los Angeles (an alternative donor in robustness testing) cleared 107% of its homicides in 2013. The Kaplan Second Response only chooses to address those data where the clearance rate was 0%, not where the clearance rate was otherwise obviously low or impossibly high. While the 0% clearance rate for New York during 2010-12 was certainly the most eye-catching error in the Kaplan First Response, it was not the statistical driver of their erroneous results.
The Kaplan Second Response’s continued use of incorrect data is particularly impactful for their flawed results when considering Detroit. Detroit is the major donor to the synthetic control model (Hogan, 2022). Following in Table 1 is the homicide clearance data for Detroit used by the Kaplan responses in running their models:
Table 1
Kaplan Calculation of Clearance Rates for Detroit
Year Clearance Rate (%)
2010 16.1
2011 11.3
2012 8.8
2013 33.5
2014 45.1
2015 35.4
2016 14.5
2017 51.3
2018 51.7
2019 51.4
Practitioners and academics with real-life exposure to the criminal justice system would immediately discount these clearance rates as inaccurate. If Detroit only cleared 8.8% of its homicides in 2012, 11.3% in 2011, 14.5% in 2016, and 16.1% in 2010, such abysmal clearance rates would have been national news. If Detroit tripled its clearance rate from 2016 to 2017, that also would have been a major crime story in both academic and public circles. The Kaplan authors do not have the requisite training or experience with the criminal justice system to spot these obvious problems. However, anybody with access to the internet could do a quick search to check these data. An internet search for Detroit’s homicide clearance rates over the years quickly yields a publicly available chart showing both the FBI’s figures and Detroit Police Department’s figures for homicide clearance rates in Detroit, none of which resemble the data generated by the Kaplan Second Response.[3]
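A simple plausibility screen of the kind described above can be sketched as follows. The Detroit values are those shown in Table 1. The thresholds (a 20% floor, a 100% ceiling, and a twofold year-over-year swing) are hypothetical choices made for illustration and are not drawn from any of the papers discussed. A flag indicates only that a value deserves a manual check against official sources.

```python
# Detroit homicide clearance rates (percent) as shown in Table 1.
detroit_clearance = {
    2010: 16.1, 2011: 11.3, 2012: 8.8, 2013: 33.5, 2014: 45.1,
    2015: 35.4, 2016: 14.5, 2017: 51.3, 2018: 51.7, 2019: 51.4,
}

# HYPOTHETICAL screening thresholds, chosen only for illustration.
LOW_RATE = 20.0    # flag rates below this percentage
HIGH_RATE = 100.0  # flag rates above this percentage
JUMP_RATIO = 2.0   # flag year-over-year changes of at least this factor


def screen_clearance(rates):
    """Return (year, reason) pairs for values that warrant a manual check."""
    flags = []
    years = sorted(rates)
    for i, year in enumerate(years):
        rate = rates[year]
        if rate <= 0.0:
            flags.append((year, "zero or missing"))
        elif rate < LOW_RATE:
            flags.append((year, "unusually low"))
        if rate > HIGH_RATE:
            flags.append((year, "above 100%"))
        if i > 0:
            prev = rates[years[i - 1]]
            # Compare only positive values to avoid dividing by zero.
            if prev > 0 and rate > 0:
                ratio = max(rate / prev, prev / rate)
                if ratio >= JUMP_RATIO:
                    flags.append((year, "large year-over-year swing"))
    return flags


flags = screen_clearance(detroit_clearance)
for year, reason in flags:
    print(year, reason)
```

With these thresholds, the screen flags 2010, 2011, 2012, and 2016 as unusually low and marks large swings into 2013, 2016, and 2017.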
Homicide clearance rates can be difficult to ascertain. In the traditional synthetic control model used by the De-Prosecution Article, this became a moot point because the algorithm did not assign any weight to the clearance rate data (the same result for the Kaplan authors when using the traditional synthetic control model). However, by injecting the incorrect data into the model, then using an augmented synthetic control method, the Kaplan First Response and Kaplan Second Response induced an invalid result. Essentially, the Kaplan authors told the algorithm that the homicide data were biased, eliminated median income as a variable, then induced the algorithm to rely on the incorrect homicide clearance data by process of elimination. This then produced the “flipped” results suggested in the Kaplan First Response and the null results estimated in the Kaplan Second Response.[4]
Once these erroneous data are corrected and the accurate variables are re-introduced into the dataset, the augmented synthetic control method preferred by the Kaplan authors replicates the results of the original model from the De-Prosecution Article, as previously identified in the De-Prosecution Reply. In addition to repeating their data errors, the Kaplan authors also repeated a methodological error in the Kaplan Second Response.
3 | THE METHODOLOGICAL FLAW OF THE KAPLAN SECOND RESPONSE
The Kaplan Second Response continues to insist that it is necessary to replace the traditional synthetic control method used in the De-Prosecution Article with an augmented synthetic control model, intended to be used when the classic model results in a poor pre-period match (Abadie & L’Hour, 2021). As pointed out in the De-Prosecution Reply, the pre-period match in the original study was extremely precise, and it appears that the Kaplan authors simply misread the synthetic control plot.[5] Thus, the augmented synthetic control method was and is unnecessary.
In the Kaplan Second Response, the authors no longer claim that the pre-intervention fit in the original study was insufficient. Instead, they appear to insist that the augmented synthetic control method must be used in place of the traditional synthetic control method in all studies. This assertion is completely novel and incorrect. As discussed by Ben-Michael et al. (2021) and Abadie and L’Hour (2021), the augmented synthetic control method is only necessary when bias-correction is required due to an infeasible pre-treatment fit. Because the pre-intervention fit was strong in the original De-Prosecution Article study, the Kaplan Second Response’s insistence on using the augmented synthetic control method is both incorrect and puzzling.[6] Nevertheless, the Kaplan Second Response continues to replicate the outcome of the original De-Prosecution Article when using the traditional synthetic control method, and even in one iterative model using the augmented synthetic control method and incorrect data. The only remaining issue is to explain how the flaws in the Kaplan responses can be remedied.
4 | CORRECTING THE KAPLAN SECOND RESPONSE
There are three ways for the Kaplan authors to correct their results. First, if the Kaplan authors insist on using the augmented synthetic control method, they can simply eliminate homicide clearances as a variable and re-run their code. Once the variable with the erroneous data is removed entirely, the augmented synthetic control method used by the Kaplan responses replicates the results of the original De-Prosecution Article.
Second, the Kaplan authors could calculate the clearance data accurately. There are multiple alternatives. For Detroit, New Orleans, and New York (the donor cities), they could check with the individual police departments, query public databases, review FBI data, or use offender demographics. The De-Prosecution Article used the last of these approaches, but then spot-checked the results to make sure that there were no anomalies like 0% or over 100% clearance rates. The clearance rate variable became a moot point for purposes of the De-Prosecution Article because of the assignment of weights to variables in the application of the traditional synthetic control method. Unfortunately for the Kaplan authors, their grossly inaccurate clearance rate data operationalized their invalid results. If they actually had attempted to check the invalid data after they were identified in the De-Prosecution Reply, the Kaplan Second Response might have avoided publicly repeating the error.
Third, the Kaplan authors can admit that using a bias-correction model is unnecessary and return to using the traditional synthetic control method. The Kaplan First Response conceded that when running the traditional synthetic control algorithm, they replicated the results of the original De-Prosecution Article, finding that de-prosecution in Philadelphia is associated with an increase of over 74 homicides per year (p=0.012), virtually identical results to the original De-Prosecution Article. The replication itself by the Kaplan authors is an important validation of the original De-Prosecution Article.[7]
5 | CONCLUSION
After repeated attacks by the Kaplan authors on the De-Prosecution Article, two firm conclusions can be reached. First, the Kaplan authors are willing to use obviously incorrect data (even after such data were identified for them) and flawed methods in an attempt to undermine an evidence-driven result where the authors dislike the policy implications. The Kaplan authors plainly betray this bias when they cite approvingly to other papers which purportedly show positive policy outcomes related to de-prosecution, without bothering to do any critical analysis of the data or methodology of those studies. Like children vexed by a puzzle, the Kaplan authors alternately attempt to hammer square pegs into round holes or hurl the puzzle against the wall.
The second conclusion is that the original results of the De-Prosecution Article remain robust against these attacks, and in fact have been strengthened by revealing that even the Kaplan authors’ poorly specified data and incorrect methods reveal consistent trends causally linking de-prosecution and homicides in Philadelphia. In summary, a quantitative analysis using standard methodology and variables estimated that de-prosecuting 70% of both felony and misdemeanor cases over a five-year period resulted in a large and statistically significant increase in homicides in Philadelphia. This result is matched by real world increases in Philadelphia homicides and common sense. Given the scope of de-prosecution in Philadelphia, these results are unsurprising, and should be given due weight by policymakers.
[1] The De-Prosecution Article was published by Criminology & Public Policy on July 7, 2022 (https://onlinelibrary.wiley.com/doi/epdf/10.1111/1745-9133.12597). On July 25, 2022, the Kaplan First Response was posted by the three Kaplan authors on Twitter and linked to a publicly accessible document (https://drive.google.com/file/d/12aZDxYC7MUkZHwORkCECHNWV-SOei0nx/view). The De-Prosecution Reply was posted to a Substack account. The Kaplan Second Response was again posted on Twitter and linked to a publicly accessible document (https://github.com/jnaddeo/job-market-materials/blob/5838168428ed11935e341e4b6c09979093805ab8/working_papers/cpp_response_092022.pdf).
[2] The other predictors were homicides, population, median income, and a classification of the prosecutors’ offices (Hogan, 2022). In robustness testing, a large number of other variables were added to the original model without changing results.
[3] See CBS Detroit (June 29, 2022), https://www.cbsnews.com/detroit/news/crime-without-punishment-detroit-homicide-clearance-rates-rise-as-national-rates-fall/. The heterogeneity when comparing the FBI’s results and the Detroit Police Department’s results for cleared homicides is another signal for caution in relying simply on reported data without an internal validity check. The Washington Post also collected data on homicide clearances in Detroit, also never tracking the extremely low rates calculated by the Kaplan authors for certain years. See Murder with Impunity, Washington Post (last checked September 27, 2022), https://www.washingtonpost.com/graphics/2018/investigations/unsolved-homicide-database/?utm_term=.0ae0fb0528fc&city=detroit.
[4] As the Kaplan authors are learning, homicide clearances can be a treacherous area. Such data, when accurate, can provide interesting insights into a city. However, the police occasionally fail to report the data for an entire year, leading to a 0% clearance rate. Clearances from one year may get assigned to another year, leading to the potential for an over 100% clearance rate. The information can be skewed by “exceptional clearances,” especially when the police are motivated to improve clearance numbers. A new police commander or data team can change the measurements for clearances. Indeed, if homicide clearances had received the majority of weighting in the original De-Prosecution Article study, I would have re-assessed my confidence in the validity of the results. This is a good example of why researchers without much prior exposure to the criminal justice system should consult an experienced researcher or criminal justice practitioner before publishing in the field, a point which was raised in the excellent Criminology & Public Policy webinar recently held addressing research about prosecutors.
[5] The Kaplan authors did not realize that the divergence would start at t-1 because of the yearly data, a common mistake, and one that I made in reviewing the plot for the initial De-Prosecution Article.
[6] The Kaplan Second Response also continues to insist on using homicide rates instead of the raw numbers of homicides, while admitting (once again hidden in a footnote) that the use of rates results in overfitting, which they phrase as the model “did not identify a unique solution.” The Kaplan authors are ignoring Abadie (2021), wherein the originator of the synthetic control method explains that the use of levels or rates depends on the context of the data and study, with overfitting a clear signal that the application of rates (or other transformation of raw numbers) is an inappropriate use of the algorithm. Importantly, Abadie (2021) is not saying that levels, rates, or logs are always right or always wrong, but that researchers must be sensitive to the context of the study, and that overfitting is an obvious marker that data specification choices are flawed.
[7] The balance of the issues raised in the Kaplan Second Response already have been addressed in the original De-Prosecution Article and the De-Prosecution Reply. Jacob Kaplan still refuses to admit that he was working directly with the Philadelphia District Attorney’s Office during the de-prosecution time period, a fact that seems to be worthy of identification in a statement of interests.