Magali Pignard: Hormones in Teenagers – How Chen’s Study Misleads the Reader

La Petite Sirène
Jun 10, 2025

April 13, 2024 - https://www.transition-mineurs.com/post/critique-de-l-%C3%A9tude-de-chen-2023-hormones-sexuelles-crois%C3%A9es

https://www.nejm.org/doi/full/10.1056/NEJMoa2206297

Translated with ChatGPT and DeepL

This post analyzes a highly publicized study published in the New England Journal of Medicine (Chen et al., 2023), which is often cited as evidence in favor of hormonal treatments for youth identifying as transgender. To make the methodological issues accessible to a broad audience, I begin by presenting a fictional study that reproduces the same biases, before addressing the real case.

Methodological rigor is a central foundation of clinical research.

It requires adherence to the pre-registered protocol, transparency in the presentation of results, and a cautious interpretation of findings. Bypassing these principles undermines the reliability of an entire study.

To clearly illustrate this type of drift, let’s begin with a fictional study.

1. Fictional study on the effects of Cyclotramine on mental health and psychosocial well-being in bipolar disorder

Imagine a prospective study aiming to evaluate the effects of Cyclotramine—a fictional new mood stabilizer—on the mental health and psychosocial functioning of 300 people diagnosed with bipolar disorder. In this ambitious project, researchers follow participants for 24 months, assessing around fifteen variables related to well-being and mental health every three months. The study has no control group.

Cyclotramine: fictional name

Pre-registered protocol

In their pre-registered protocol (filed before the recruitment of participants, as it should be), the researchers hypothesize that Cyclotramine will lead to improvements in the following eight variables:

Mania/Hypomania; Anxiety; Depression; Suicidality; Irritability; Quality of life; Overall functioning; Self-harm.

Strong hypotheses… followed by disappointing results

After two years of follow-up, the outcome is clear: six of the eight variables show no improvement. Only anxiety and depression decrease slightly— and only in half of the participants. The effects are statistically weak and clinically insignificant.

Final publication: the rules quietly change

Here’s where things take an unexpected turn:

In the final publication, the initial hypotheses are quietly revised, with no explanation for the removal of the six key variables that showed no improvement.

Only depression and anxiety—the two variables that showed slight improvement—are retained. To these, the authors add three new variables: self-esteem, positive affect, and… treatment satisfaction.

This last one is measured by a self-report questionnaire with two subscales, but only the treatment satisfaction subscale is reported, with no mention or justification for omitting the other.

Faithful to the protocol… really?

Despite these post-hoc adjustments—removing six key protocol variables, adding three new ones, and selectively reporting results—the authors claim their study remains consistent with the original protocol.

A miraculous variable

By fortunate coincidence, the only variable to show a notable effect across all participants is this treatment satisfaction—a variable added after the fact. And the entire paper revolves around it. The authors even go so far as to present it as a “primary objective” of the treatment—despite it never appearing in the original hypotheses.

Using complex statistical models, the authors argue that improved treatment satisfaction explains the effect of the drug on mental health and psychosocial well-being, even though their own results contradict this claim: the other reported variables—depression, anxiety, positive affect, and self-esteem—only improve slightly in half of the participants, and not at all in the others.

And the conclusion?

Throughout the paper, the authors repeatedly claim that Cyclotramine improves mental health and psychosocial functioning.

In summary, according to them: this drug is effective.

Even though most of the reported results don’t support this.
Even though the hypotheses were changed after the fact.
Even though we know nothing about the evolution of three-quarters of the main variables they originally expected to improve.
Even though the entire argument rests on a single secondary variable, added after the analysis.
And even though—importantly—concomitant treatments (psychotherapy, antidepressants, mood stabilizers, etc.) were never controlled for, making it impossible to seriously attribute any effects to Cyclotramine alone.

Such a paper would be corrected… except in one field

In any other field of medicine, a publication that retroactively alters its hypotheses without explanation, selectively reports only favorable results, and omits negative data would face immediate criticism, leading at the very least to a correction, if not a full retraction.

Except in one field: pediatric gender medicine.

2. The Chen et al. study: a very real case

As surprising as it may seem, the fictional example above is directly inspired by a real study, praised by the media and much of the medical community. It’s the study by Chen et al., published in 2023 in the prestigious New England Journal of Medicine:

Psychosocial Functioning in Transgender Youth after 2 Years of Hormones.

This study evaluates the effects of sex hormones in 315 adolescents identifying as transgender or non-binary.

The authors measured around fifteen psychosocial variables over a 24-month period, at six-month intervals.

In the published article, five results are reported: a slight improvement in life satisfaction, positive affect, depression, and anxiety in natal girls; no improvement in boys. Only the variable “appearance congruence” shows a large effect size across all participants.

Methodological Parallel with the Fictional Study

Hypotheses Modified After Observing the Results

As detailed by Singal (2023, translated into French), the hypotheses formulated in the published article (Chen et al., 2023, p. 241) differ significantly from those stated in the pre-registered protocol (pp. 36/44 of the final 2021 protocol version).

This revision, made after the results were observed, is never acknowledged as such in the publication — a textbook case of HARKing (Hypothesizing After the Results are Known, Kerr, 1998).

Selective Omission of Key Variables

The rewriting of the hypotheses led to:

The exclusion of 75% of the key variables (6 out of 8) for which the authors originally predicted improvement: Suicidality, Gender Dysphoria, Quality of Life, Body Esteem, Trauma-Related Disorders, and Self-Harm. These variables were indeed measured at every follow-up point (see pp. 22–28/44 of the final 2021 protocol), but are simply not reported, with no justification provided.
The introduction of three new variables, again without explanation: positive affect, life satisfaction, and appearance congruence. The latter is not classified by the authors themselves as a psychosocial variable. It is a subscale of the Transgender Congruence Scale (TCS), which was administered in full, but only this component is reported.
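The bookkeeping behind these two points can be checked in a few lines. The sketch below is only an illustration: the variable names are the ones listed in this post (the six omitted variables plus depression and anxiety on the protocol side, the five reported variables on the article side), and the set-based comparison is my own way of laying them out, not anything taken from the protocol or the paper.

```python
# Variables the protocol hypothesized would improve, as listed in this post.
preregistered = {
    "depression", "anxiety", "suicidality", "gender dysphoria",
    "quality of life", "body esteem", "trauma-related disorders", "self-harm",
}

# Variables whose results appear in the published article, as listed in this post.
reported = {
    "depression", "anxiety", "positive affect",
    "life satisfaction", "appearance congruence",
}

dropped = preregistered - reported   # hypothesized but never reported
added = reported - preregistered     # reported but never hypothesized
kept = preregistered & reported      # present in both
share_dropped = len(dropped) / len(preregistered)

print("Dropped:", sorted(dropped))
print("Added:", sorted(added))
print("Kept:", sorted(kept))
print(f"Share of pre-registered variables dropped: {share_dropped:.0%}")
```

Run as written, this gives six dropped variables out of eight, which is the 75% figure quoted above, and three added variables.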

The table below illustrates the consequences of this rewriting of the hypotheses.

Variables assessed	Significant improvement, by sex
Depression	Natal females only; small effect (d = 0.20)
Anxiety	Natal females only; small effect (d = 0.25)
Appearance congruence*	Males and females; large effect (d = -1.12)
Positive affect*	Natal females only; very small effect (d = -0.06)
Life satisfaction*	Natal females only; small to moderate effect (d = -0.39)
Suicidality	?
Gender dysphoria	?
Quality of life	?
Body esteem	?
Trauma-related disorders	?
Self-harm	?
Gender identity acceptance*	?

*Not included in the pre-registered protocol hypotheses

Effect size: 0.20 = small, 0.50 = moderate, > 0.80 = large

The effect sizes are those reported in Table S5 of the study’s supplementary appendix.
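For readers who want to apply the thresholds themselves, here is a minimal sketch. The effect sizes are the ones quoted in the table in this post; the function `label_effect`, its category names, and the way it handles values that fall exactly on a threshold are my own assumptions. The sign is ignored, since only the magnitude of d is being classified.

```python
def label_effect(d: float) -> str:
    """Classify the magnitude of an effect size with the thresholds given above.

    Hypothetical helper: boundary handling (>= at each cut-off) is an assumption.
    """
    size = abs(d)  # the sign only encodes direction, so it is dropped here
    if size < 0.20:
        return "below small"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"


# Effect sizes as quoted in the table of this post.
effect_sizes = {
    "depression": 0.20,
    "anxiety": 0.25,
    "appearance congruence": -1.12,
    "positive affect": -0.06,
    "life satisfaction": -0.39,
}

labels = {name: label_effect(d) for name, d in effect_sizes.items()}
for name, label in labels.items():
    print(f"{name}: d = {effect_sizes[name]:+.2f} -> {label}")
```

With these cut-offs, appearance congruence is the only variable labeled large. Life satisfaction (0.39) lands in the "small" band because it sits between the small and moderate thresholds, which is why the table describes it as small to moderate.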

A Claim Contradicted by the Facts

Despite these significant discrepancies between the protocol and the final article, the authors assert unequivocally:

“The authors vouch for the accuracy and completeness of the data and the fidelity of the study to the protocol.” (Chen et al., 2023, p. 247)

A Secondary Variable Becomes the Centerpiece of the Analysis

“Appearance congruence,” introduced after the fact, becomes the centerpiece of the analysis.

It is suddenly presented as a “primary objective” of treatment. To support this, the authors employ complex statistical models (parallel latent growth models) aimed at demonstrating that this variable is a potential mechanism by which hormone treatment influences psychosocial functioning.[1]

However, this hypothesis is contradicted by their own results: the reported psychosocial variables only improve slightly in natal girls—and not at all in boys.

Partial and Weak Results Used to Assert a Nonexistent Overall Benefit

Despite the lack of significant improvement in most reported psychosocial outcomes, and the absence of a control group, the authors generalize their conclusions. The article contains several strong claims, such as:

“Our results showed improvements in psychosocial functioning over 2 years of gender-affirming hormone treatment.” (p. 249)

“Increases in appearance congruence were associated with decreases in depression and anxiety symptoms, and with increases in positive affect and life satisfaction.” (p. 245)

And in conclusion:

“Our results showed improvements in psychosocial functioning over 2 years of gender-affirming hormone therapy, supporting its use as an effective treatment […] Overall, our results provide evidence of a beneficial effect of gender-affirming hormone therapy on appearance congruence and psychosocial functioning.” (p. 249)

A Major Methodological Flaw Makes These Conclusions Untenable

Beyond the final statements, a critical methodological flaw renders these conclusions indefensible: the two concluding assertions suggest that cross-sex hormones are responsible for the observed improvements. Yet this is methodologically unjustifiable: the study does not control for concomitant treatments received by participants (antidepressants, anxiolytics, psychotherapy, etc.). No statistical adjustments are made to isolate their effects. Under these conditions, it is impossible to determine whether the few observed changes are due to hormone treatment or to other concurrent interventions (e.g., some participants were severely depressed at baseline and were likely taking antidepressants in addition to hormones).
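A toy calculation shows why this matters. The numbers below are entirely synthetic and are not data from Chen et al.; they are chosen only to show how an uncontrolled concomitant treatment can produce an apparent overall improvement. In this hypothetical cohort everyone receives hormones, but only the half also taking antidepressants improves.

```python
from statistics import mean

# Synthetic records, NOT study data: (on_antidepressant, change in depression score).
# A negative change means fewer symptoms at follow-up.
participants = [
    (True, -6), (True, -5), (True, -7), (True, -6),
    (False, 0), (False, 1), (False, -1), (False, 0),
]

# Pooled change, which is all an analysis without adjustment can see.
overall = mean(change for _, change in participants)

# The same data split by concomitant treatment.
with_ad = mean(change for on_ad, change in participants if on_ad)
without_ad = mean(change for on_ad, change in participants if not on_ad)

print("Mean change, whole cohort:", overall)
print("Mean change, on antidepressants:", with_ad)
print("Mean change, not on antidepressants:", without_ad)
```

The pooled figure suggests an improvement for the cohort as a whole, yet the split shows it comes entirely from the antidepressant subgroup. A simple split like this is not a substitute for proper statistical adjustment or a control group; it only illustrates that pooled before-and-after changes cannot be attributed to one treatment when others are left uncontrolled.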

The Editorial Double Standard

In any other field of medicine, a study that modifies its hypotheses post hoc, fails to report negative results, and highlights a variable introduced after the fact would face immediate criticism and possibly calls for retraction. And indeed, this is precisely what happened: several researchers submitted letters to the editor and requests for correction (Biggs, 2023a; Hare, 2023; Jorgensen, 2023c). However, the New England Journal of Medicine has, to date, made no corrections, nor has it published any substantive response to the critiques. The article remains unchanged, as if the methodological objections were irrelevant.

Real-World Consequences of a Biased Narrative

This gap between the actual results and the reported conclusions risks misleading clinicians, families, and the youth themselves. A professional could wrongly conclude that gender-affirming hormones significantly improve the psychosocial functioning of these young people.

Yet within the framework of informed consent, it is essential that professionals provide youth and their families with balanced, transparent, and evidence-based information. By amplifying the supposed benefits, this kind of publication undermines the possibility of truly informed consent.

Moreover, the potential benefits of hormone treatment—already rated as low to very low certainty by major systematic reviews (including the most recent one: Miroshnychenko et al., 2025)—must be weighed against significant risks: bone fragility and cardiovascular issues (Chan Swe et al., 2022), impacts on fertility (Stolk et al., 2023), sexual dysfunction (Kronthaler et al., 2024; da Silva et al., 2024), and unknown long-term effects of use beginning in adolescence (Cass Review, 2024, Point 90).

These treatments are also prescribed to a population whose identity and priorities are highly likely to evolve over time (Cass Review, 2024, §16.10; Bachmann et al., 2024; Rawee et al., 2024; Sapir, 2024).

In such a context, scientific rigor is an ethical imperative. As long as the data remain this fragile, a cautious approach grounded in robust evidence is essential—before further generalizing or medicalizing youth in the midst of forming their identities.

[1] Unlike the fictional study, the hypothesis advanced by Chen et al. (that appearance congruence might improve psychosocial functioning) is based on a more psychologically plausible rationale. However, the issue is not the theoretical plausibility of the hypothesis, but how it is used: it was formulated after the fact, and it becomes the central interpretive lens of a paper that ignores most of the originally planned clinical variables.

Note: The authors of a British Medical Journal article assessed this study using two tools:

Newcastle-Ottawa Scale (used in the systematic review on hormones that informed the Cass Review; that review did not assess this study because it was published after the literature search).
➥ Total score: 5/7, "moderate" quality (this scale does not explicitly account for HARKing).
Risk of Bias in Non-Randomised Studies - of Interventions (ROBINS-I) (see the online supplementary appendix, appendix 3).
➥ Risk of bias: critical.

OBSERVATORY "La Petite Sirène"

Sapere Aude!
Dare to know!