Use the CRF Critical Appraisal Tool
Dr. Christopher Williams, principal investigator of the Critical Race Framework, calls for an end to race science that aims for "shock value".
The dissemination of studies with crude racial categorization accompanied by findings that elicit shock must end. Such papers can often be used to attract media and funder attention and to increase a researcher's profile without contributing meaningfully to scientific knowledge. There are usually three parts to the approach: a headline involving one or more racial minorities, a severe health condition or mortality rate, and an exposure, often related to race and racism. For example, the Columbia University Mailman School of Public Health recently posted on LinkedIn about a study at its institution, "Rise in Post-Birth Blood Pressure in Asian, Black, and Hispanic Women Linked to Microaggressions". The headline was "Asian, Black, and Hispanic Women". The health condition was post-birth high blood pressure; the exposure was microaggressions. Besides the assumption of racial monoliths, Dr. Williams raised some methodological issues in response to the post: "It seems odd that the authors would adjust for race-ethnicity given the nature of the study. They also controlled for education, BMI, chronic hypertension, age, and the Structural Racism Effect Index. As a reviewer, I would be fairly concerned with the risks of overcontrolling and the potential of a Type 1 error. Also, the race essentialism inherent in the study is a major concern." These types of "shock value" studies can have major caveats, even errors, that do not come across with a catchy headline.
In extreme cases, the motivation to leverage race essentialism for "shock value" can lead researchers to overlook important aspects of the research itself. In a famous incident, a group of researchers claimed that "Black newborns die less when cared for by Black doctors". Because the paper was published at the height of the Black Lives Matter movement, few questioned the study findings. Further, a researcher from that study received a $5 million philanthropic gift to establish a university-based center. In a November 2024 publication, Harvard professors described how they repeated the study but added a control for very low birth weight, which was not included in the original analysis although it is key to the study of infant mortality. "When we estimated statistical models similar to those in the original paper, but controlling for very low birth weight rather than focusing on the health diagnoses that happen to be most common, these models fit the data better, and the apparent benefit to the racial concordance of black doctors with black babies falls dramatically - becoming statistically insignificant in models that include the other controls (hospital, insurance, etc.)," wrote George J. Borjas and Robert VerBruggen.
The Critical Race Framework Study called attention to the misuse of racial variables in public health research. Its critical position points to deeper issues within the public health enterprise, including the parading of crude racialization as a value-add to science. Race is not a valid or accurate reflection of populations. The notion of global races is rooted in 17th-century thinking and justifications for slavery. Research ethics and professional standards require better public engagement and conduct.
The 2025 study "Psychological distress among US-born and non-US-born Black or African American Adults in the US" by Elhabashy and colleagues is an example of a study that meets the criteria of studies designed for "shock value" and raises severe methodological concerns.
Introduction
By Grok under the supervision of Dr. Williams
The study "Reported Racial Discrimination in Healthcare Settings and Impacts on Health Outcomes in a Racially Diverse Sample in the United States: Findings from the Behavioral Risk Factor Surveillance System" (Jean Louis et al., 2025) investigates the association between self-reported racial discrimination in healthcare and health outcomes (physical symptoms, emotional symptoms, and fair/poor health status) among Black non-Hispanic, Hispanic, and White non-Hispanic participants using 2014 BRFSS data. Given its focus on race as a central variable, this study is an ideal candidate for evaluation using the Critical Race Framework (CRF), a tool designed to critically appraise the quality of health research studies that use racial taxonomy, and the Quantitative Critical Appraisal Aid (QCAA), a method to quantify error due to race-related methodological issues. The CRF assesses reliability, validity, internal validity, and external validity of race variables, while the QCAA quantifies errors in these domains to adjust effect estimates, providing a rigorous evaluation of the study’s scientific robustness. This analysis aims to evaluate the study’s handling of race, quantify potential errors, and discuss implications for health disparities research, aligning with the CRF’s emphasis on structural racism and scientific rigor.
Methods
Critical Race Framework (CRF) Application
The CRF (version 3.0) is a structured tool developed to evaluate health research studies using racial taxonomy across four domains: reliability, validity, internal validity, and external validity (Williams, 2024). It consists of 20 topic prompts rated on a "Quality of Evidence" scale: High Quality (directly addresses prompt with strong reasoning), Moderate Quality (weakly addresses prompt), Low Quality (does not address prompt adequately), or No Discussion (no relevant discussion). The CRF was applied to the study by systematically reviewing the manuscript for evidence related to each prompt, focusing on how race was conceptualized, measured, analyzed, and interpreted. Ratings were assigned based on the study’s text, with justifications documented for transparency.
Relevance and Significance of CRF: The CRF is highly relevant for evaluating studies like Jean Louis et al. (2025), which rely on race to explore health disparities. Race is a social construct often used as a proxy for systemic factors (e.g., racism, socioeconomic status), yet its measurement and interpretation can introduce biases if not rigorously handled (Martinez et al., 2022). The CRF addresses this by standardizing critical appraisal, ensuring race variables meet scientific standards for reliability and validity. Its significance lies in filling a gap in public health literature, where no prior tool comprehensively assessed race-related biases across all critical appraisal domains. By applying CRF, we can identify weaknesses in the study’s race methodology, inform future research, and align with the CRF’s goal of advancing evidence-based health disparities research.
Quantitative Critical Appraisal Aid (QCAA) Application
The QCAA is a generalized method to quantify error in studies using race, adapting the CRF’s qualitative insights into a quantitative framework (Williams, 2024). It estimates errors in reliability (EV), validity (VE), and internal validity (IV) and uses them to adjust effect estimates (e.g., adjusted odds ratios, aORs). External validity is excluded from the error calculation. The process involves:
Identifying Key Estimates: Select primary outcomes (e.g., aORs for physical symptoms, emotional symptoms, fair/poor health from Table 5, stratified by race).
Defining Error Types:
EV: Inconsistencies in race data collection (e.g., self-report biases).
VE: Misalignment between race construct and measurement (e.g., race as a proxy for racism).
IV: Confounding or selection bias due to race variable handling.
Estimating Error Magnitude: Assign percentage impacts based on CRF findings and study methodology (e.g., 5–20% per error type, guided by CRF ratings).
Adjusting CIs: Widen original CIs by total error (EV + VE + IV) using:
$$\text{Adjusted CI} = \text{Original CI} \pm (\text{Original CI Width} \times \text{Total Error})$$
Lower bounds are capped at 0 (aORs cannot be negative).
Qualitative Critique: Integrate CRF insights to contextualize error sources, focusing on structural racism.
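The adjustment step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the QCAA's official implementation: the function name `adjust_ci` and the example interval and error fractions are hypothetical, chosen only to show the arithmetic.

```python
def adjust_ci(lower, upper, total_error):
    """Widen a confidence interval by a fraction of its own width.

    total_error is EV + VE + IV expressed as a fraction (0.50 for 50%).
    """
    width = upper - lower
    margin = width * total_error
    # aORs cannot be negative, so the lower bound is capped at 0.
    return max(0.0, lower - margin), upper + margin


# Hypothetical inputs for illustration only (not taken from the study).
ev, ve, iv = 0.10, 0.10, 0.05
low, high = adjust_ci(1.20, 1.60, ev + ve + iv)
print(f"Adjusted CI: [{low:.2f}, {high:.2f}]")  # prints Adjusted CI: [1.10, 1.70]
```

Note that the margin scales with the width of the original interval, so a narrow interval receives a small absolute adjustment however large the error percentage is.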
Key Estimates: From Table 5 (stratified analyses, p. 4):
Physical Symptoms:
Black: aOR = 6.77 (95% CI: 6.61–6.92).
Hispanic: aOR = 3.19 (95% CI: 3.12–3.27).
White: aOR = 8.15 (95% CI: 8.01–8.29).
Emotional Symptoms:
Black: aOR = 6.33 (95% CI: 6.20–6.46).
Hispanic: aOR = 7.30 (95% CI: 7.16–7.44).
White: aOR = 6.94 (95% CI: 6.84–7.05).
Fair/Poor Health:
Black: aOR = 2.31 (95% CI: 2.26–2.53).
Hispanic: aOR = 1.57 (95% CI: 1.54–1.60).
White: aOR = 1.27 (95% CI: 1.25–1.29).
Relevance and Significance of QCAA
The QCAA is critical for quantifying the uncertainty introduced by race variables, which is particularly relevant for Jean Louis et al. (2025) given its reliance on self-reported race and discrimination data from BRFSS. The method’s significance lies in its ability to translate CRF’s qualitative critiques into numerical adjustments, providing a concrete measure of how race-related biases affect study findings. This enhances the interpretability of health disparities research, aligning with the CRF’s goal of rigorous evaluation and addressing systemic racism’s impact on health outcomes.
Analysis Process
The CRF was applied by reviewing the study’s methods, results, and discussion for each of the 20 prompts, assigning quality ratings based on evidence provided. QCAA was then applied by estimating EV, VE, and IV based on CRF findings, adjusting CIs for the aORs in Table 5. The aORs were used as reported, that is, already adjusted in the study’s models for sociodemographic factors (sex, age, education, income, insurance, medical care avoidance, state, racial salience). Results were synthesized to assess the study’s robustness and implications.
Results
CRF Application
Below are the CRF ratings for each of the 20 topic prompts, organized by domain, with justifications based on the study text.
I. Reliability
1. Reliability evidence of survey tool(s) used to collect racial identity: No Discussion
The study uses BRFSS race/ethnicity data (Black non-Hispanic, Hispanic, White non-Hispanic) but provides no evidence on the reliability of the race collection tool (e.g., test-retest consistency) (p. 2).
2. Potential participant sources of measurement error in race data collection: No Discussion
No discussion of participant-related biases (e.g., misreporting due to social desirability) in race self-identification (p. 2–3).
3. Potential sources of measurement error due to the race data collection tool(s): No Discussion
The study does not address tool-related errors (e.g., limited response options, unclear wording) in BRFSS race questions (p. 2).
4. Existence of a “true value(s)” for race: No Discussion
No mention of whether a stable “true” racial identity exists or how it might be assessed (p. 2–3).
II. Validity
5. Construct or meaning of race used in study: Low Quality
Race is treated as a demographic category (Black non-Hispanic, Hispanic, White non-Hispanic) without explicit definition of its construct (e.g., social construct, proxy for racism). The study implies race captures discrimination experiences but lacks clarity on its theoretical meaning (p. 1–2).
6. Inclusion of multiracial identity to construct or meaning of race used in study: No Discussion
The study excludes “other” and “multiple” race categories, ignoring multiracial identities (p. 3).
7. Characteristics intended to differentiate racial groups: No Discussion
No discussion of characteristics (e.g., cultural, socioeconomic) distinguishing racial groups, despite noting socioeconomic differences (p. 3–4).
8. Heterogeneity within racial groups: No Discussion
The study does not address within-group diversity (e.g., cultural or socioeconomic variation among Black or Hispanic participants) (p. 3–4).
III. Internal Validity
9. Potential threats to internal validity due to quality of reliability and validity of the race variable alone: No Discussion
No mention of how poor reliability/validity of race might affect causal inferences (p. 3–5).
10. Population data estimates for all possible combinations of race based on race data collection tool(s): No Discussion
The study does not provide population estimates for all BRFSS race categories, only using three groups (p. 3).
11. Methods to provide participants with study construct or meaning of race during data collection: No Discussion
No description of how BRFSS conveyed the meaning of race to participants during data collection (p. 2).
12. Data results of all possible combinations of race based on original race data collection tool(s): No Discussion
Only aggregated results for three race/ethnicity groups are reported, with no raw data on all race categories (p. 4).
13. Justification to combine, exclude, or change original race data reporting: Low Quality
The study justifies excluding “other” and “multiple” categories due to inability to assign them to Black, Hispanic, or White groups, but provides minimal rationale for collapsing categories (p. 3).
14. Meeting statistical assumption of independence considering racial grouping: No Discussion
No discussion of whether racial groups meet independence assumptions in logistic regression models (p. 3).
15. Limitations of statistical reasoning due to a race variable: No Discussion
The study does not address how race as a variable might limit statistical inferences (p. 3–5).
16. Interpretability of data results on racial group analysis: Moderate Quality
Results are interpretable (e.g., higher fair/poor health aORs for Black and Hispanic participants than for White participants), but the lack of race construct clarity limits deeper understanding of why disparities exist (p. 4–5).
IV. External Validity
17. Limitations of external validity due to the construct or meaning of race used in study: Low Quality
The study acknowledges limited generalizability due to data from four states (Arizona, Minnesota, Mississippi, New Mexico) but does not discuss how the vague race construct affects generalizability (p. 7).
18. Limitations of external validity due to analytical treatment of race: No Discussion
No mention of how combining/excluding race categories impacts generalizability (p. 3–4).
19. Limitations of external validity due to within-group racial heterogeneity: No Discussion
No discussion of how within-group diversity affects generalizability (p. 3–4).
20. Limitations of external validity due to social and political changeability of race: No Discussion
No consideration of how race’s social/political fluidity over time impacts generalizability (p. 7).
CRF Summary: The study scores poorly on most CRF prompts, with 16 “No Discussion,” 3 “Low Quality,” and 1 “Moderate Quality” rating. The primary weaknesses are the lack of discussion on race’s reliability, validity, and threats to internal/external validity, with minimal justification for race category handling and no exploration of within-group heterogeneity or multiracial identities.
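The tally in the summary can be reproduced from the per-prompt ratings. The sketch below transcribes the 20 ratings listed above into a dictionary; the abbreviations (`ND`, `LQ`, `MQ`) and the variable names are illustrative conventions, not part of the CRF itself.

```python
from collections import Counter

# Ratings transcribed from the CRF results above, keyed by prompt number.
# ND = No Discussion, LQ = Low Quality, MQ = Moderate Quality.
RATINGS = {
    1: "ND", 2: "ND", 3: "ND", 4: "ND",
    5: "LQ", 6: "ND", 7: "ND", 8: "ND",
    9: "ND", 10: "ND", 11: "ND", 12: "ND",
    13: "LQ", 14: "ND", 15: "ND", 16: "MQ",
    17: "LQ", 18: "ND", 19: "ND", 20: "ND",
}

# Prompt ranges per CRF domain, as organized in the results.
DOMAINS = {
    "Reliability": range(1, 5),
    "Validity": range(5, 9),
    "Internal validity": range(9, 17),
    "External validity": range(17, 21),
}

totals = Counter(RATINGS.values())
print("Overall:", dict(totals))

for name, prompts in DOMAINS.items():
    by_domain = Counter(RATINGS[p] for p in prompts)
    print(f"{name}: {dict(by_domain)}")
```

Counting by domain shows where the ratings differ: reliability is rated No Discussion throughout, while each of the other three domains has exactly one prompt rated above No Discussion apart from internal validity, which has two.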
QCAA Application
The QCAA quantifies errors based on CRF findings, focusing on EV, VE, and IV, and adjusts CIs for the aORs in Table 5. Error estimates are informed by the study’s methodology and CRF ratings.
Error Estimates:
Reliability Error (EV): 15%
CRF ratings indicate no discussion of reliability evidence, participant/tool-related measurement errors, or true race values (prompts 1–4). BRFSS self-reported race data may introduce errors (e.g., social desirability, misclassification), especially with unclear response options. However, standardized BRFSS protocols reduce some error. Estimated EV = 15% due to moderate reliability concerns.
Validity Error (VE): 20%
CRF ratings show low quality in defining race’s construct (prompt 5) and no discussion of multiracial identities, discriminant characteristics, or within-group heterogeneity (prompts 6–8). Race is used as a proxy for discrimination without clear theoretical grounding, risking misalignment with structural racism. Estimated VE = 20%, reflecting significant construct ambiguity.
Internal Validity Error (IV): 15%
CRF ratings indicate no discussion of internal validity threats, population estimates, participant race construct communication, raw race data, statistical assumptions, or limitations (prompts 9–12, 14–15), with low quality in justifying race category changes (prompt 13) and moderate quality in interpretability (prompt 16). Unmeasured confounders (e.g., provider bias, healthcare access) and selection bias (four states) may inflate aORs. Estimated IV = 15%, reflecting moderate confounding risks.
Total Error: EV (15%) + VE (20%) + IV (15%) = 50%.
Adjusted CIs:
Physical Symptoms:
Black (aOR = 6.77, CI: 6.61–6.92):
Width: 6.92 – 6.61 = 0.31.
Adjustment: 0.31 × 0.5 = 0.155.
Adjusted CI: [6.61 – 0.155, 6.92 + 0.155] = [6.46, 7.08].
Hispanic (aOR = 3.19, CI: 3.12–3.27):
Width: 3.27 – 3.12 = 0.15.
Adjustment: 0.15 × 0.5 = 0.075.
Adjusted CI: [3.12 – 0.075, 3.27 + 0.075] = [3.05, 3.35].
White (aOR = 8.15, CI: 8.01–8.29):
Width: 8.29 – 8.01 = 0.28.
Adjustment: 0.28 × 0.5 = 0.14.
Adjusted CI: [8.01 – 0.14, 8.29 + 0.14] = [7.87, 8.43].
Emotional Symptoms:
Black (aOR = 6.33, CI: 6.20–6.46):
Width: 6.46 – 6.20 = 0.26.
Adjustment: 0.26 × 0.5 = 0.13.
Adjusted CI: [6.20 – 0.13, 6.46 + 0.13] = [6.07, 6.59].
Hispanic (aOR = 7.30, CI: 7.16–7.44):
Width: 7.44 – 7.16 = 0.28.
Adjustment: 0.28 × 0.5 = 0.14.
Adjusted CI: [7.16 – 0.14, 7.44 + 0.14] = [7.02, 7.58].
White (aOR = 6.94, CI: 6.84–7.05):
Width: 7.05 – 6.84 = 0.21.
Adjustment: 0.21 × 0.5 = 0.105.
Adjusted CI: [6.84 – 0.105, 7.05 + 0.105] = [6.74, 7.16].
Fair/Poor Health:
Black (aOR = 2.31, CI: 2.26–2.53):
Width: 2.53 – 2.26 = 0.27.
Adjustment: 0.27 × 0.5 = 0.135.
Adjusted CI: [2.26 – 0.135, 2.53 + 0.135] = [2.13, 2.67].
Hispanic (aOR = 1.57, CI: 1.54–1.60):
Width: 1.60 – 1.54 = 0.06.
Adjustment: 0.06 × 0.5 = 0.03.
Adjusted CI: [1.54 – 0.03, 1.60 + 0.03] = [1.51, 1.63].
White (aOR = 1.27, CI: 1.25–1.29):
Width: 1.29 – 1.25 = 0.04.
Adjustment: 0.04 × 0.5 = 0.02.
Adjusted CI: [1.25 – 0.02, 1.29 + 0.02] = [1.23, 1.31].
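The nine hand calculations above can be checked with a short script. This is a sketch under stated assumptions: the interval bounds are copied from the Key Estimates listed earlier, `adjust_ci` is a hypothetical helper name, and `Decimal` with half-up rounding is a choice made here so that two-decimal results are reproducible. Bounds that fall exactly on a half-cent (for example 3.345 or 2.665) are rounded up, so they can differ by 0.01 from figures rounded by hand.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_ERROR = Decimal("0.50")  # EV 15% + VE 20% + IV 15%

# (outcome, group) -> (lower, upper), as listed under Key Estimates.
TABLE_5 = {
    ("Physical symptoms", "Black"): ("6.61", "6.92"),
    ("Physical symptoms", "Hispanic"): ("3.12", "3.27"),
    ("Physical symptoms", "White"): ("8.01", "8.29"),
    ("Emotional symptoms", "Black"): ("6.20", "6.46"),
    ("Emotional symptoms", "Hispanic"): ("7.16", "7.44"),
    ("Emotional symptoms", "White"): ("6.84", "7.05"),
    ("Fair/poor health", "Black"): ("2.26", "2.53"),
    ("Fair/poor health", "Hispanic"): ("1.54", "1.60"),
    ("Fair/poor health", "White"): ("1.25", "1.29"),
}


def adjust_ci(lower, upper, total_error=TOTAL_ERROR):
    """Widen [lower, upper] by width * total_error on each side."""
    lower, upper = Decimal(lower), Decimal(upper)
    margin = (upper - lower) * total_error
    cent = Decimal("0.01")
    # Cap the lower bound at 0, then round both bounds half up to 2 places.
    new_lower = max(Decimal("0"), lower - margin).quantize(cent, rounding=ROUND_HALF_UP)
    new_upper = (upper + margin).quantize(cent, rounding=ROUND_HALF_UP)
    return new_lower, new_upper


ADJUSTED = {key: adjust_ci(*ci) for key, ci in TABLE_5.items()}
for (outcome, group), (low, high) in ADJUSTED.items():
    print(f"{outcome:<19}{group:<9}[{low}, {high}]")

# An odds-ratio interval is conventionally "significant" when it excludes 1.
ALL_EXCLUDE_ONE = all(low > 1 or high < 1 for low, high in ADJUSTED.values())
print("All adjusted CIs exclude 1:", ALL_EXCLUDE_ONE)
```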
QCAA Summary: Total error of 50% widens CIs, but all adjusted CIs remain significant (exclude 1), suggesting robust associations despite race-related errors. VE (20%) is the largest contributor, reflecting the study’s vague race construct.
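One way to read the "remain significant" result is to ask how large the total error would need to be, under the same widening rule, before a lower bound reaches 1. Setting lower − width × E = 1 gives E = (lower − 1) / width. The sketch below applies this to the three fair/poor health intervals, which sit closest to 1; the function name `error_to_reach_null` is hypothetical, and the calculation assumes the interval lies entirely above 1.

```python
from decimal import Decimal

# Fair/poor health intervals as listed under Key Estimates.
FAIR_POOR_CIS = {
    "Black": ("2.26", "2.53"),
    "Hispanic": ("1.54", "1.60"),
    "White": ("1.25", "1.29"),
}


def error_to_reach_null(lower, upper, null="1"):
    """Total error (as a fraction) at which the widened lower bound equals null.

    Solves lower - (upper - lower) * E = null for E; assumes lower > null.
    """
    lower, upper, null = Decimal(lower), Decimal(upper), Decimal(null)
    return (lower - null) / (upper - lower)


THRESHOLDS = {group: error_to_reach_null(*ci) for group, ci in FAIR_POOR_CIS.items()}
for group, threshold in THRESHOLDS.items():
    print(f"{group}: lower bound reaches 1 at a total error of about {float(threshold):.0%}")
```

For the White fair/poor health interval, for example, this gives 0.25 / 0.04 = 6.25, or a total error of 625%. Because the original intervals are narrow, a 50% widening leaves every lower bound well above 1.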
Discussion
The CRF and QCAA analyses reveal significant methodological weaknesses in Jean Louis et al. (2025) regarding the use of race, yet the study’s findings remain robust after error adjustments. The CRF ratings (16 “No Discussion,” 3 “Low Quality,” 1 “Moderate Quality”) highlight a critical lack of attention to race’s reliability, validity, and impact on internal/external validity. The study treats race as a demographic category without defining its construct, ignoring multiracial identities, within-group heterogeneity, and measurement errors. This aligns with CRF critiques of poor conceptual clarity in race-based research (Martinez et al., 2022). The QCAA’s 50% total error, driven by VE (20%), indicates that the ambiguous race construct and unmeasured structural racism (e.g., provider bias, access barriers) inflate uncertainty, though adjusted CIs confirm significant associations between healthcare discrimination and adverse health outcomes.
Strengths: The study’s use of BRFSS, a large, standardized population-based survey (here limited to four states), and adjustment for confounders (e.g., income, education) enhance its credibility. Stratified analyses (Table 5) reveal differential impacts (e.g., strongest physical symptom effect for White participants, emotional for Hispanic, health status for Black), supporting CRF’s call for nuanced race analyses. The discussion of structural racism and hypotheses like weathering and acculturation stress (p. 7) aligns with CRF’s emphasis on systemic factors.
Weaknesses: The study’s failure to define race’s construct or address measurement reliability (CRF prompts 1–8) risks misattributing outcomes to race rather than racism or socioeconomic factors. Excluding multiracial identities and ignoring within-group diversity (prompts 6–8) oversimplifies racial dynamics. The cross-sectional design and single-item discrimination measure (p. 7) limit causal inferences, a concern not addressed in internal validity discussions (prompts 9–15). Limited generalizability to four states (prompt 17) is acknowledged but not linked to race’s construct or fluidity (prompts 17–20).
Implications: The CRF and QCAA underscore the need for explicit race constructs (e.g., as a proxy for racism) and robust measurement protocols. Future studies should:
Validate race data tools (e.g., test-retest reliability).
Model structural racism mediators (e.g., access, bias).
Include multiracial identities and within-group heterogeneity.
Use longitudinal designs to establish causality.
CRF and QCAA Utility: The CRF effectively identifies gaps in race handling, while QCAA quantifies their impact, offering a dual qualitative-quantitative approach. This analysis supports the CRF’s goal of advancing rigorous health disparities research and highlights the need for systemic interventions to address healthcare discrimination, as recommended by the study (p. 8).
Limitations: The CRF’s qualitative ratings are subjective, though justified with evidence. QCAA error estimates rely on assumptions about error magnitude, which may vary. The study’s 2014 data may not reflect current discrimination patterns, though its findings remain relevant.
Conclusion: Jean Louis et al. (2025) demonstrates significant associations between healthcare discrimination and adverse health outcomes but is weakened by poor race conceptualization and measurement. CRF and QCAA analyses reveal these flaws while confirming robust findings, emphasizing the need for improved race methodology in health disparities research to address systemic inequities effectively.