How scientific evidence decays across ten stages before it reaches Wall Street — and why the market prices the press release while FADE reads the amendment.
Institutional quant funds have scraped ClinicalTrials.gov, PubMed, and USPTO at the macro level for a decade. The obvious discrepancies are priced in. The alpha lives in document-level reconciliation — one patent's Examples section against one trial's amendment history against one paper's figure back-calculation. That intersection does not yield to NLP pipelines. It requires comprehension at the level of a single data point in a single table.
Stage 1 flag: The core biological premise (amyloid-beta plaque clearance causes cognitive improvement) traces to a foundational 2006 Nature paper (Lesne et al.) that had accumulated PubPeer concerns about image duplication before the formal 2022 Science investigation.
The program's entire Phase 3 rationale was built on a mechanism whose foundational experimental evidence was under documented dispute in the public peer review record — before a single Phase 3 patient was enrolled.
Citation integrity flag: foundational paper (Lesne et al. 2006) has post-publication peer review concerns predating Phase 3 enrollment.
FADE detection: CrossRef + PubPeer cross-reference on the primary cited mechanism paper. Flag: PubPeer entries present on figures representing the core biological claim. Output: "Program premise depends on a paper with post-publication integrity concerns. Independent replication status: absent."
PubPeer comment API (public). CrossRef retraction watch (public). PubMed citation graph (public).
No VDR required. No proprietary access. Computable from the public record at any point after the PubPeer entries were posted.
Stage 5 flag — Trial halted for futility. Both Phase 3 trials (EMERGE, NCT02477800; ENGAGE, NCT02484547) were halted in March 2019 after an interim futility analysis. Standard outcome: program fails, compound shelved.
October 2019: Biogen announced it would seek FDA approval after a "post-hoc reanalysis" of a higher-dose EMERGE subgroup showed positive results. The halt, the futility determination, and the subsequent reanalysis are all in the ClinicalTrials.gov amendment history with timestamps.
Amendment trail flag: primary efficacy analysis plan changed post-interim analysis. Trial halted for futility March 2019 → analysis plan amended October 2019 → positive finding from post-hoc subgroup presented to FDA 2020.
The original pre-specified primary endpoint analysis showed futility. The analysis that enabled FDA submission was a post-hoc reanalysis of a higher-dose subgroup, not the pre-specified primary analysis population. The ClinicalTrials.gov version history records the delta.
ClinicalTrials.gov v2 API amendment history for NCT02477800 (EMERGE) and NCT02484547 (ENGAGE). Timestamps on protocol version changes are public record.
FADE detection: diff primaryOutcomes and analysis populations between pre-enrollment version and post-halt amendment. Flag: analysis population changed after futility determination.
Stage 3 flag — ENGAGE buried. When Biogen submitted EMERGE data to the FDA and presented at the advisory committee meeting, the ENGAGE trial — which failed even on the post-hoc high-dose subgroup analysis — received substantially less prominent presentation. Both trials ran simultaneously on the same patient population design. One failed; one showed a signal in a subgroup. The failure was registered, results submitted, but the public framing of the submission emphasized only EMERGE.
Asymmetric publication pattern: ENGAGE failure results were submitted to ClinicalTrials.gov but de-emphasized in primary investor communications and in the FDA advisory committee briefing materials.
Both trials were registered. The results are in the ClinicalTrials.gov database. The discrepancy between registered outcome (ENGAGE: futility confirmed) and the public narrative (one trial showed a signal) is detectable from the public record.
ClinicalTrials.gov results database for NCT02484547 (ENGAGE). Public FDA advisory committee briefing documents (FDA.gov). Cross-reference: ENGAGE registered outcome vs. investor communication emphasis.
All public. No VDR. No proprietary access.
The timeline: These signals were computable from public data in 2019–2020: before the November 2020 advisory committee vote, before the June 2021 FDA approval, before the January 2024 market withdrawal. The FDA advisory committee reached the same conclusion (10–0–1 against on the primary efficacy question) reading the same public record. FADE would have surfaced the same signal algorithmically, 18+ months earlier.
NIH study sections score applications on feasibility, significance, and innovation — but human reviewers systematically favor hypotheses that align with existing high-citation literature. A grant proposing to challenge a crumbling foundational paper scores lower than one that builds on it.
Famous labs receive citation halos: the PI's prior work is cited as preliminary data, the same 5–10 papers justify 80% of grants in a disease area, and the foundational assumptions underneath those papers are never independently re-tested before the next grant cycle begins.8
The foundational paper was never re-tested. The grant is built on a citation that cannot be reproduced.
The program was designed on quicksand that nobody in the study section re-examined. This is not fraud at this stage — it is structural citation inertia: the system rewards building on consensus, not testing it.
A foundational 2006 Nature paper — Lesne et al., cited over 2,300 times — provided primary visual evidence for a specific amyloid subtype as causal. The paper underpinned hundreds of millions in NIH grant funding and private investment in amyloid-targeting therapies over 16 years. In 2022, Science published a formal investigation finding evidence of image manipulation in key figures. The foundational citation that activated an entire funding cycle was disputed after the money was spent and the clinical trials failed.
The Reproducibility Project tested 193 experimental effects from 53 high-impact cancer biology papers. Only 51% of effects reproduced. For papers used as NIH grant justification, this means study sections were scoring feasibility on results that could not be independently confirmed.1
Cross-reference the program's foundational citations against three public registries:
1. Retraction Watch / CrossRef API (free): Flag any cited paper with a Retraction, Expression of Concern, or Correction issued after the program's IND submission date.
2. PubPeer comment feed: Flag papers with post-publication concerns on figures representing the program's core biological premise.
3. ClinicalTrials.gov translation rate: Query the historical Phase 2 to Phase 3 success rate for the target class. Below 15% = flag Structural Translation Risk.7
Output: "Grant premise paper [X] has a PubPeer flag on Figure 3 (posted [date]). Historical Phase 2–Phase 3 success rate in this target class: 8%."
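The three registry checks above can be sketched as one pure function over records that have already been fetched. This is a minimal sketch, not FADE's implementation: the `CitedPaper` fields and the `citation_flags` name are hypothetical, and retrieval from Retraction Watch, PubPeer, and ClinicalTrials.gov is deliberately left out. Only the 15% threshold and the "after IND submission" rule come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CitedPaper:
    # Hypothetical record shape; populate from whatever registry export is used.
    label: str
    notice_type: Optional[str] = None        # "Retraction", "Expression of Concern", "Correction"
    notice_date: Optional[date] = None
    pubpeer_comment_date: Optional[date] = None


def citation_flags(papers, ind_date, phase2_to_3_rate, threshold=0.15):
    """Return human-readable flags for a program's foundational citations."""
    flags = []
    for p in papers:
        # Check 1: editorial notice issued after the IND submission date.
        if p.notice_type and p.notice_date and p.notice_date > ind_date:
            flags.append(f"{p.label}: {p.notice_type} issued {p.notice_date.isoformat()}")
        # Check 2: any post-publication comment on the premise paper.
        if p.pubpeer_comment_date:
            flags.append(f"{p.label}: PubPeer comment posted {p.pubpeer_comment_date.isoformat()}")
    # Check 3: historical translation rate below the threshold.
    if phase2_to_3_rate < threshold:
        flags.append(f"Structural Translation Risk: success rate {phase2_to_3_rate:.0%}")
    return flags
```

Keeping the function free of network calls makes the rule itself easy to audit; the dates and rate passed in are whatever the upstream fetch step produced.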
Three failure modes compound at the bench. None require intent to defraud:
Analytical fatigue: A graduate student running hundreds of western blots develops unconscious criteria for which images are "representative." Clean-looking blots get saved. Blots showing inconvenient bands get rerun — or filed away.2
Cell line contamination: Over 500 unique cell lines in published literature are misidentified or cross-contaminated. The drug being tested does not target the disease biology the researcher believes it targets.6
The p-hacking window: With flexible stopping rules and multiple outcome measurement, a researcher running 20 independent experiments should expect one false positive at p<0.05 by chance alone. The one positive gets submitted.3
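The arithmetic behind the p-hacking window is short enough to check directly. The sketch below assumes every tested hypothesis is truly null and the tests are independent; the function names are illustrative only.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests crosses alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests
```

With 20 experiments at alpha = 0.05 the expected count is one false positive, and the chance of seeing at least one is roughly 64%.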
The SD-to-SEM switch: the same data, presented to look three times more compelling.
Standard Deviation (SD) reports the spread of individual data points — the honest picture of variability. Standard Error of the Mean (SEM) is mathematically smaller by a factor of the square root of N. Switching from SD to SEM with N=9 makes error bars three times smaller on the same underlying data.
Vaux, Fidler, and Cumming (2012) and Halsey et al. (Nature Methods, 2015) both document that SEM is routinely used where SD would be the honest standard — and SEM visually shrinks error bars without changing the underlying data.2 The switch is undetectable from the published figure alone. It is detectable by back-calculation from mean, error value, and N.
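One way to run the back-calculation, sketched under an added assumption: the paper also reports a two-group test statistic. The error value is interpreted both ways, a Welch-style t statistic is recomputed under each reading, and the reading that lands closer to the reported statistic wins. The function names and the use of a reported t value are assumptions for illustration, not a description of FADE's method.

```python
import math


def standard_error(error_value, n, error_is_sem):
    """Convert a reported error value to a standard error of the mean."""
    return error_value if error_is_sem else error_value / math.sqrt(n)


def welch_t(mean_a, err_a, n_a, mean_b, err_b, n_b, error_is_sem):
    """Unequal-variance t statistic under one reading of the error values."""
    se_a = standard_error(err_a, n_a, error_is_sem)
    se_b = standard_error(err_b, n_b, error_is_sem)
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def infer_error_type(mean_a, err_a, n_a, mean_b, err_b, n_b, reported_t):
    """Return 'SD' or 'SEM', whichever reading better matches the reported t."""
    t_if_sd = welch_t(mean_a, err_a, n_a, mean_b, err_b, n_b, False)
    t_if_sem = welch_t(mean_a, err_a, n_a, mean_b, err_b, n_b, True)
    if abs(reported_t - t_if_sd) <= abs(reported_t - t_if_sem):
        return "SD"
    return "SEM"
```

With N = 9 per group the two readings differ by a factor of three in the recomputed statistic, the same square-root-of-N factor described above, so the two cases are usually easy to separate.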
Amgen's Glenn Begley attempted to reproduce 53 "landmark" preclinical cancer biology studies before committing capital to drug development programs. Only 6 of 53 (11%) reproduced. Primary failure modes: results published only when experiments "worked"; cell lines not validated; statistical analysis inconsistencies.1
HeLa cells are the most frequently contaminated line in research history. Studies claimed drug efficacy against "prostate cancer," "kidney cancer," and "melanoma" cell lines that were, in fact, HeLa. Every drug tested on these lines generated false efficacy signals. The ICLAC registry lists over 500 confirmed misidentified lines in peer-reviewed literature.6
Three mechanical checks from public data:
1. SD-to-SEM back-calculation: From the published figure's mean, reported error value, and N, back-calculate whether the statistic is consistent with SD or SEM. If SEM is used where SD would be the honest standard, flag Variance Compression.
2. ICLAC cross-reference: Extract cell line identifiers from the Methods section. Query the ICLAC registry (flat file, CC license, free). Any match = flag. If the drug's entire efficacy dataset is in a misidentified cell line, the program's foundational evidence is invalid.6
3. Cellosaurus authentication: Cross-reference the Swiss Institute of Bioinformatics Cellosaurus API for STR authentication reports.
Output: "Cell line [X] in Methods matches ICLAC registry entry [Y]. Authentication status: UNCONFIRMED. Program efficacy dataset built on unverified biology."
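The ICLAC cross-reference in check 2 reduces to a normalized name lookup. The sketch below assumes the registry flat file has already been loaded into a dictionary that maps each misidentified line name to its actual identity; the loader, the dictionary shape, and the function names are hypothetical.

```python
import re


def normalize(name):
    """Lowercase and drop punctuation and spaces so name variants compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def flag_misidentified(methods_cell_lines, registry):
    """Return (name as written, registry name, actual identity) for each match.

    registry: dict of misidentified line name -> actual identity (assumed shape).
    """
    index = {normalize(k): (k, v) for k, v in registry.items()}
    hits = []
    for written in methods_cell_lines:
        match = index.get(normalize(written))
        if match:
            hits.append((written, match[0], match[1]))
    return hits
```

Exact matching after normalization is deliberately conservative: it misses synonyms and misspellings, which is where a curated synonym list such as the one Cellosaurus maintains would come in.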
Peer review catches logical inconsistency, missing controls, and misapplied statistics. It has three structural blind spots that matter for FADE:
Publication bias: Journals publish positive results. Null results that would discount the positive finding were never submitted, never reviewed, never published. Positive result rates in US publications increased 22% from 1990 to 2007 — not because science improved, but because the filter got stronger.8
No raw data access: Reviewers evaluate processed figures, not raw data. The SD-to-SEM switch, the cherry-picked blot, the excluded outlier — none are visible from the submitted manuscript.
Reciprocal review networks: Reviewers are drawn from the same author pool. Reviewing favorably for researchers who review favorably for you is not misconduct and leaves no detectable trace.
The buried 22: selective trial publication creates a literature that systematically overstates efficacy.
The antidepressant case (Turner et al., NEJM 2008): The FDA received results of 74 registered antidepressant studies. 37 of 38 positive studies were published. Of 36 studies with negative or questionable results, 22 were not published at all, and 11 were published in a way that conveyed a positive outcome.4
The published literature suggested an effect size of 0.41. The FDA dataset including all 74 studies showed 0.31 — a 32% inflation. Every investor, clinician, and label writer worked from the published set. The real effect was in the FDA files.
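The 32% figure is the relative gap between the two effect sizes, measured against the full FDA dataset. A one-function check, with an illustrative name:

```python
def effect_size_inflation(published, full_dataset):
    """Relative overstatement of the published effect size versus the full dataset."""
    return (published - full_dataset) / full_dataset
```

For 0.41 published against 0.31 in the full dataset this gives about 0.32.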
12 antidepressant drugs. 74 FDA-registered studies. Published literature suggested all 12 were effective. Full dataset showed 6 had equivocal-to-negative results. Effect size inflation: 32%. The gap between published literature and reality was entirely a publication filter artifact — no data was destroyed, just selectively submitted for publication.4
The published VIGOR trial of rofecoxib (Vioxx; NEJM 2000) omitted three myocardial infarction events that occurred after the authors' chosen data cutoff. The FDA's internal documents contained the complete dataset. The published paper, which passed peer review, presented a cardiovascular risk profile inconsistent with the full data. Peer reviewers evaluated the submitted narrative, not the FDA file. Market withdrawal followed in 2004, and settlements later exceeded $2.5 billion.
The published literature is one document. The registration is another. FADE reads both.
1. ClinicalTrials.gov vs. PubMed match: Query all registered studies of a drug or target. Cross-reference against PubMed. Any registered study with results submitted to ClinicalTrials.gov but no corresponding PubMed publication is a buried negative signal. Flag the gap.
2. Results database compliance: Since 2008, sponsors must post results within 12 months of study completion for applicable trials. Non-compliance is itself a flag: a pattern of non-compliance = selective reporting.
3. CrossRef retraction API: Flag any published paper retracted or flagged with an Expression of Concern since the program began.
Output: "[Drug X] — 8 registered studies. 5 published. 3 results-submitted-only. 2 of the 3 unpublished studies show null primary endpoint."
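The registry-to-literature match in check 1 can be sketched as a set comparison on NCT identifiers. The sketch assumes two inputs prepared elsewhere: the registered studies with a flag for posted results, and the set of NCT IDs found on PubMed records for the same drug. The `RegisteredStudy` shape and the bucket names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegisteredStudy:
    nct_id: str
    has_posted_results: bool


def publication_gap(studies, published_nct_ids):
    """Bucket registered studies by whether a matching publication exists."""
    published = {i.upper() for i in published_nct_ids}
    report = {"published": [], "results_submitted_only": [], "no_results": []}
    for s in studies:
        if s.nct_id.upper() in published:
            report["published"].append(s.nct_id)
        elif s.has_posted_results:
            # Results posted to the registry, nothing in the literature.
            report["results_submitted_only"].append(s.nct_id)
        else:
            report["no_results"].append(s.nct_id)
    return report
```

The "results_submitted_only" bucket is the gap the text describes. Matching on NCT ID alone will miss publications that never cite their registration number, so an empty match is a prompt for review, not proof of burial.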
Phase 1 establishes safety — maximum tolerated dose and pharmacokinetic profile. It is not designed to show efficacy. But sponsors use Phase 1 to tell a story, and that story starts at dose selection.
Allometric scaling manipulation: Preclinical efficacy demonstrated at 100 mg/kg in mice. FDA surface area conversion (divide by 12.3 for mice) suggests human equivalent starting dose near 8 mg/kg. A Phase 1 topping out at 0.5 mg/kg looks clean — because it never reached the efficacy or toxicity zone.
Wrong-species safety studies: Efficacy demonstrated in rat model; IND safety toxicology conducted in dogs that may lack the target receptor isoform. Technically valid, biologically irrelevant to the mechanism of action.
Surrogate biomarker endpoint selection: Phase 1 biomarker endpoints selected because they respond to the drug — not because they predict clinical outcome.
The IND dose ceiling is below the efficacy dose. The program passes Phase 1 in a zone that was never going to cause toxicity or show efficacy.
The signal: the ratio of the IND maximum dose to the preclinical minimum effective dose falls below 1.0 (human equivalent) after allometric scaling. The Phase 1 "safety" finding is an artifact of dose selection, not biology.
Additionally: the biomarker responds to the drug but has no validated link to clinical outcome. A drug that lowers a serum protein by 40% at week 4 passes Phase 1 endpoints. The serum protein's clinical relevance is the question Phase 2 will answer — by failing.
BIO's analysis of 7,455 clinical programs found a Phase 2 success rate of 40.1% overall and 5.3% for oncology.7 The dominant failure mode was not safety — it was efficacy. Programs cleared Phase 1 at doses that were never powered to detect mechanism-relevant activity. Phase 2 failed because the drug never reached the tissue concentration needed to hit the target.
Over 60% of FDA oncology approvals between 2009–2014 were on surrogate endpoints (tumor shrinkage, progression-free survival, biomarker response). Of those, fewer than half demonstrated a survival benefit in post-approval confirmatory trials — the endpoint patients and payers care about. The Phase 1 biomarker endpoint that unlocked Phase 2 funding was a proxy, not a clinical outcome.
Patent-to-IND dose cross-comparison — mechanical, not interpretive.
1. Patent efficacy dose extraction: PH_USPTO_FULL for the compound. Extract the dose in the Examples section producing efficacy in the primary animal model. Apply FDA allometric scaling (HED = animal dose in mg/kg × (animal weight ÷ 60 kg)^0.33, which is what the species conversion factors such as 12.3 for mice approximate) to convert to human equivalent. Note: patent Examples may include prophetic (predicted, not executed) experiments; MPEP §608.01(p) permits this. FADE flags the discrepancy; human review determines whether the Example is actual vs. prophetic.
2. IND maximum dose comparison: ClinicalTrials.gov Phase 1 record includes the maximum administered dose cohort. Compare to HED from step 1.
3. Flag condition: Phase 1 maximum dose less than 50% of HED efficacy dose from patent Examples = flag Sub-Therapeutic Dose Ceiling (requires human verification of Example type).
4. Species mismatch check: Compare species used in patent efficacy studies vs. IND safety toxicology studies. Mismatch = flag.
Output: "Patent Example 4 shows efficacy at 30 mg/kg (rat). HED = 4.9 mg/kg. Phase 1 maximum dose: 0.3 mg/kg. Program cleared Phase 1 at 6% of the minimum effective human equivalent dose."
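The dose comparison can be sketched in a few lines. The exponent of 0.33 is the one FDA's 2005 starting-dose guidance uses for doses expressed in mg/kg; the 0.25 kg rat weight in the usage note is an assumption chosen for illustration, and the function names are hypothetical. The 50% threshold is the flag condition from step 3.

```python
def human_equivalent_dose(animal_dose_mg_per_kg, animal_weight_kg,
                          human_weight_kg=60.0, exponent=0.33):
    """Body-surface-area scaling of a mg/kg animal dose to a human mg/kg dose."""
    return animal_dose_mg_per_kg * (animal_weight_kg / human_weight_kg) ** exponent


def dose_ceiling_flag(phase1_max_mg_per_kg, hed_mg_per_kg, threshold=0.5):
    """Return (ratio, flagged); flagged when the Phase 1 ceiling is below threshold x HED."""
    ratio = phase1_max_mg_per_kg / hed_mg_per_kg
    return ratio, ratio < threshold
```

For 30 mg/kg in an assumed 0.25 kg rat the HED comes out near 4.9 mg/kg, and a 0.3 mg/kg Phase 1 ceiling is about 6% of that, which trips the flag. The result is sensitive to the assumed animal weight, so the weight used should be reported alongside the flag.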
Phase 2 is where the narrative fully separates from the data. Four specific tactics account for most of how negative Phase 2 results become publishable positive results:
Primary endpoint switching: The original primary endpoint (registered before enrollment) fails. A secondary endpoint showing positive signal gets retrospectively promoted to the primary. The published paper presents the secondary as the central finding without prominently flagging the switch.5
Responder definition tightening: The pre-specified threshold shows a 38% responder rate. Tightening the threshold post-hoc isolates a subgroup showing 72%. The subgroup becomes the signal. The full population failure becomes a footnote.
Toxicity reclassification: Adverse events adjudicated as "not drug-related" by the sponsor's clinical team. Each individual call is defensible. The aggregate pattern — every borderline event classified in the direction that preserved the safety narrative — is the signal.
Composite endpoint construction: A composite of four outcomes dilutes three null findings behind one positive. The composite "improves" while every clinically significant component does not.
The ClinicalTrials.gov amendment trail: every post-enrollment protocol change is timestamped. The delta between version 1 and version 2+ is the burial map.
When a sponsor modifies a trial's primary endpoint after enrollment starts, ClinicalTrials.gov records a version with a timestamp. The original endpoint (registered before data) and the modified endpoint (registered after data collection, before analysis) coexist in the amendment history. The peanut is in the gap.
The COMPare project (Goldacre et al., BMJ 2016) checked outcomes for 67 trials published in top journals: outcome discrepancies between registered and published endpoints occurred in approximately 58 of 67 trials. In the large majority of cases, the direction of switching favored a statistically significant result. This is not random drift — it is directional manipulation that is mechanically detectable from public data.5
The RECORD trial assessed cardiovascular outcomes for rosiglitazone (Avandia). FDA documents revealed primary endpoint definitions and analysis populations were modified after data were available. The published paper showed a neutral cardiovascular result. An independent re-analysis using the original pre-specified endpoints found a different signal. GSK paid $3 billion in criminal and civil settlements in 2012, with cardiovascular data management as a central allegation.
The COMPare project prospectively compared pre-specified outcomes in ClinicalTrials.gov registrations against published outcomes for 67 trials in five top medical journals. Discrepancies occurred in approximately 58 of 67 trials. Switching was directional, almost uniformly toward significance. The primary source is the ClinicalTrials.gov record. The buried data is in the delta between registration date and publication date. FADE reads the delta, not the paper.5
ClinicalTrials.gov v2 API returns the full amendment history with timestamps. This is mechanically auditable at no cost.
1. Full version history pull: ClinicalTrials.gov v2 API returns every version of the protocol record with submission timestamps. Public, structured, programmatically accessible.
2. Endpoint diff: Compare the primary outcomes field between the pre-enrollment version (before the start date) and every subsequent version. Any change to primary outcome measure, time frame, or population definition after enrollment starts is flagged.
3. Direction test: Cross-reference the flagged endpoint change against the published paper. If the switched endpoint showed positive signal and the original did not appear as primary in the publication, flag Primary Endpoint Substitution.
4. Toxicity reclassification proxy: Compare the adverse event table in the ClinicalTrials.gov results database against the published paper's safety section. Incidence rate discrepancies = flag.
Output: "NCT[XXXXXX] — Primary endpoint changed from [X] to [Y] on [date], 14 months after enrollment start. Published paper presents [Y] as the central efficacy finding. Original endpoint [X] result: not reported."
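The endpoint diff in step 2 can be sketched as a set difference over two versions of the same registry record. The field path used here (`protocolSection`, `outcomesModule`, `primaryOutcomes`, with `measure` and `timeFrame`) follows the v2 study record layout as I understand it; how each historical version is retrieved is not shown and is assumed to happen upstream. The function names are hypothetical.

```python
def primary_outcomes(version):
    """Extract (measure, time frame) pairs from one version of a study record."""
    outcomes = (
        version.get("protocolSection", {})
        .get("outcomesModule", {})
        .get("primaryOutcomes", [])
    )
    return {
        (o.get("measure", "").strip().lower(), o.get("timeFrame", "").strip().lower())
        for o in outcomes
    }


def diff_primary_outcomes(baseline, later):
    """Report primary outcomes removed from or added to the later version."""
    before, after = primary_outcomes(baseline), primary_outcomes(later)
    return {"removed": sorted(before - after), "added": sorted(after - before)}
```

A change to only the time frame shows up as one removal and one addition, which matches the flag rule in the text: measure, time frame, and population changes all count. Exact string comparison will also flag harmless rewording, so flagged pairs still need a human read.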
Phase 3 is where the drug is supposed to prove it works at scale. Adaptive designs were introduced as a legitimate tool — allowing mid-trial modifications based on accumulating safety and efficacy data. The same flexibility that makes adaptive designs scientifically valid makes them structurally exploitable.
Interim analysis responder subsetting: A planned look at the data reveals the pre-specified endpoint will miss. The sponsor “adapts” to enrich for a responding subgroup. The trial continues — now powered only in the subset that happened to respond at the moment of the look.
Alpha-spending plan modification: Statistical testing plans allocate significance thresholds across multiple data looks. When alpha-spending rules are changed after the first interim look, the family-wise error rate is no longer controlled at the pre-specified level. Each individual look appears valid. The aggregate inflates the false positive rate.
Endpoint window shift: The primary endpoint at 6 months fails. The sponsor adjusts the analysis window to 9 months. Permissible if pre-specified; catastrophic if post-hoc. The published paper presents the 9-month result without prominent disclosure of the switch.
Any protocol amendment filed after the planned interim analysis date is a structural red flag. The statistical integrity of an adaptive design depends entirely on pre-specification. Post-look changes break that guarantee.
The FDA requires adaptive design pre-specification via the Statistical Analysis Plan (SAP), lodged before unblinding. When the published analysis deviates from the SAP, the deviation is legible in the FDA Statistical Review — published post-approval and often citing specific analytical departures by name.
Multiple systematic analyses document post-hoc subgroup promotion as the dominant Phase 3 integrity risk. The FDA’s own 2019 adaptive trial guidance and NEJM and JAMA commentaries consistently identify the pattern: subgroup findings that emerge after an interim look are over-represented in publications relative to primary endpoint performance. FADE’s detection relies on the ClinicalTrials.gov pre-specified subgroup list as the ground truth; the pattern is mechanically auditable regardless of how often it occurs in aggregate.
FDA’s adaptive trial guidance noted that post-hoc adaptive modifications represent the primary integrity risk in late-stage development. Sponsors submit adaptive design protocols with correct alpha-spending rules, then file Protocol Amendment Revision 3+ after an interim look in ways that alter the effective type I error rate. The FDA Statistical Review is the primary detection mechanism — but only exists post-approval.
ClinicalTrials.gov amendment history timestamps every protocol change. The interim analysis date is pre-registered. Any amendment after that date is flagged automatically.
1. Interim analysis date extraction: ClinicalTrials.gov v2 API returns the pre-specified interim analysis schedule from the original protocol. Parse the primary completion date and interim look schedule.
2. Amendment timestamp cross-check: Any amendment filed after the first interim analysis date that modifies the primary endpoint definition, analysis population, or alpha-spending plan = flag Post-Interim Amendment. Administrative amendments (site additions, contact updates, scheduling changes with no analytical effect) are excluded.
3. Subgroup diff: Extract the pre-specified subgroup list from the original protocol. Compare against subgroups reported in the published paper. Any reported subgroup not in the original protocol = flag Post-Hoc Subgroup.
4. FDA Statistical Review: For approved drugs, Drugs@FDA Statistical Review cites SAP deviations. Automated text extraction flags the phrase “deviation from the pre-specified” or “not pre-specified in the SAP.”
Output: “NCT[XXXXXX] — Protocol Amendment 4 filed [date], 6 weeks after pre-specified interim analysis date. Primary endpoint changed from [X] (p=0.14, FDA Statistical Review p.31) to [Y] (p=0.03, published result). Subgroup [Z] reported as primary finding; not listed in original protocol registration.”
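Steps 2 and 3 can be sketched as two small filters. The sketch assumes each amendment has already been reduced to a filing date and a set of changed field labels; the `Amendment` shape, the label vocabulary in `ANALYTICAL_FIELDS`, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Assumed label vocabulary for changes that affect the analysis.
ANALYTICAL_FIELDS = {"primary_endpoint", "analysis_population", "alpha_spending"}


@dataclass
class Amendment:
    number: int
    filed: date
    changed_fields: frozenset


def post_interim_amendments(amendments, interim_date):
    """Keep amendments filed after the interim look that touch analytical fields."""
    return [
        a for a in amendments
        if a.filed > interim_date and a.changed_fields & ANALYTICAL_FIELDS
    ]


def post_hoc_subgroups(prespecified, reported):
    """Return reported subgroups that do not appear in the pre-specified list."""
    known = {s.strip().lower() for s in prespecified}
    return [s for s in reported if s.strip().lower() not in known]
```

Administrative amendments fall out automatically because their changed fields never intersect the analytical set. The hard part in practice is the upstream step that assigns those labels, which this sketch does not attempt.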
The sponsor writes the FDA briefing document. It is the primary document the advisory committee reads before voting. It is not neutral. Every structural decision — study ordering, table formatting, adverse event categorization, endpoint framing — is made by the party seeking approval.
Study sequencing: Favorable trials appear in the main body. Negative trials — those showing failure in a subpopulation or at higher doses — appear in appendices with minimal narrative context. The reviewer can find them, but only by active searching.
Integrated safety summary dilution: All adverse event data across trials is pooled in the Integrated Summary of Safety. When trials with different patient populations and dose levels are pooled, unfavorable rates in high-dose subgroups are diluted by lower-risk patients from other trials. The pooled rate looks clean. The dose-specific rate is in table 47 of appendix F.
Subgroup presentation order: Subgroup analyses are post-hoc in most submissions. Subgroups with positive signal appear first. The largest patient category showing null effect appears eleventh.
The FDA Medical Review and Statistical Review contain the agency’s independent read of the same data the sponsor presented. When the FDA reviewer’s conclusion language qualitatively differs from the sponsor’s briefing document conclusion, the gap is the signal.
FDA reviewers call out these patterns in writing — but only for approved drugs, and only post-approval. FADE mines this archive as a failure-mode library: what did FDA reviewers catch that AdComs approved anyway?
FDA Medical and Statistical Reviews routinely contain more qualified efficacy language than the corresponding sponsor briefing documents, by design, since FDA reviewers apply independent judgment to the same data. Documented examples from the public record include cases where FDA Statistical Reviewers explicitly noted that the primary endpoint was not met under the pre-specified analysis while the drug was approved on a modified or secondary analysis. Specific approved drug examples: Aducanumab (2021, amyloid endpoint reclassification), Makena (2020, primary endpoint controversy). Systematic quantification of the divergence frequency across all approvals has not been published in peer-reviewed form. FADE’s Stage 7 detection is case-specific, not aggregate.
Drugs@FDA full approval packages are public. Medical Review + Statistical Review = the FDA’s independent analysis of the same data the sponsor submitted.
1. Approval package retrieval: Drugs@FDA search by drug name returns the full NDA/BLA approval package including Medical Review and Statistical Review as downloadable PDFs.
2. Conclusion language comparison: Extract the FDA Medical Reviewer’s Summary conclusion and the FDA Statistical Reviewer’s primary efficacy conclusion. Compare against the sponsor briefing document’s Executive Summary. Flag any qualitative divergence.
3. Safety table cross-check: Compare adverse event incidence rates in the sponsor’s Integrated Summary of Safety vs. the FDA Statistical Review’s independent safety table. Flag dose-group-level discrepancies hidden in pooled rates.
4. Unpublished trial detection: Count trials referenced in the FDA Medical Review vs. published trials in PubMed for the compound. Any FDA-reviewed trial with no PubMed publication = Unpublished Negative Trial Flag.
Output: “FDA Statistical Review, [drug], p.47: ‘Per-protocol analysis including early discontinuers: p=0.12. Sponsor primary analysis: p=0.03.’ 3 trials referenced in Medical Review with no PubMed publication. Pooled adverse event rate: 12%. High-dose subgroup (appendix F, table 47): 31%.”
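The pooling effect behind the safety table cross-check is plain arithmetic, sketched here with made-up counts. The input shape (group name mapped to events and patients), the function name, and the 2x ratio threshold are assumptions for illustration.

```python
def pooled_vs_group(groups, ratio_threshold=2.0):
    """Compare each group's adverse event rate with the pooled rate.

    groups: dict of name -> (events, patients). Returns (pooled_rate, flagged),
    where flagged maps group names to rates at or above ratio_threshold x pooled.
    """
    total_events = sum(events for events, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    pooled = total_events / total_n
    flagged = {
        name: events / n
        for name, (events, n) in groups.items()
        if events / n >= ratio_threshold * pooled
    }
    return pooled, flagged
```

With hypothetical counts of 89 events in 900 lower-dose patients and 31 events in 100 high-dose patients, the pooled rate is 12% while the high-dose rate is 31%: the large low-risk group hides the small high-risk one.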
Advisory committee members are drawn from the same academic and clinical community that built its career on the scientific consensus the drug is trying to validate. Voting members are screened for direct financial conflicts, but the subtler bias is structural: expertise in a disease area is typically acquired by spending decades inside the hypothesis the drug tests.
The hearing format compounds this. Sponsor KOLs present first. Patient advocates testify. The information environment surrounding the vote is saturated with the sponsor’s narrative before deliberation begins. Even a consciously skeptical panelist is evaluating a carefully curated evidence summary assembled by the party seeking approval.
The override failure mode: FDA has approved drugs despite majority-negative AdCom votes in multiple high-profile cases. When the agency overrides the scientific panel — invoking “unmet medical need” or accelerated approval — commercial and political logic has displaced scientific consensus. The market prices the approval. FADE reads the vote margin.
The gap between the AdCom vote margin and the FDA approval decision is a direct, binary signal. A 10–0–1 negative vote followed by approval means the scientific panel and the regulatory agency reached opposite conclusions from the same evidence.
Published literature has established that AdCom vote margins predict post-approval safety events at statistically significant levels. A drug approved 10–3 is more likely to carry a post-market safety action than one approved 13–0. FADE uses the vote margin as a calibrated input to the Stage 9 FAERS trajectory flag.
The Peripheral and Central Nervous System Drugs AdCom voted effectively against approval of aducanumab in November 2020 (10–0–1 against on the primary efficacy question; 1–8–2 on a supporting question). FDA approved in June 2021 under accelerated approval. Three AdCom members resigned in protest. CMS refused to cover the drug outside clinical trials. Biogen withdrew the drug from the US market in January 2024. The AdCom vote correctly predicted the clinical outcome. FDA overrode the scientific consensus.
AdCom voted 9–7 to recommend withdrawal in 2022 after a confirmatory trial showed the drug did not prevent preterm birth. FDA delayed action before ultimately ordering market withdrawal in 2023. The borderline vote was a leading indicator of the eventual market action.
Every FDA AdCom vote for the past 20+ years is publicly documented. Vote tallies, dissenting statements, and meeting transcripts are on fda.gov.
1. AdCom vote retrieval: FDA advisory committee database (fda.gov/advisory-committees) is searchable by drug name and year. Vote tally (yes-no-abstain) is in meeting minutes.
2. Override detection: Compare AdCom vote outcome (majority yes vs. majority no) against the FDA approval decision. Flag: approved with majority negative AdCom vote = Advisory Override Flag.
3. Vote margin scoring: Vote margin (13–0 vs. 7–6) feeds the Stage 9 FAERS trajectory weight. Narrow approval votes receive elevated post-market adverse event monitoring weight in the FADE Score.
4. Dissenting statement mining: AdCom dissenting statements contain the specific scientific objections the panel raised. These often predict the post-approval failure mode. Automated extraction of dissenter language = early warning vocabulary for Stage 9 FAERS monitoring.
Output: “AdCom meeting [date], vote: 3–10–0. FDA approved [date] under accelerated approval citing ‘unmet medical need.’ 3 panel members resigned. FADE: Advisory Override Flag active; Stage 9 FAERS trajectory monitoring at elevated threshold.”
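The override check and margin scoring in steps 2 and 3 can be sketched as below. This is a minimal illustration, not FADE's actual implementation: the `AdComVote` class, the `margin` definition (abstentions excluded), and the function name are assumptions introduced here, and the tally is the placeholder from the sample output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdComVote:
    """Vote tally in the yes-no-abstain order used in FDA meeting minutes."""
    yes: int
    no: int
    abstain: int = 0

    @property
    def margin(self) -> float:
        """Signed margin in [-1, 1]: (yes - no) / votes cast.

        Assumption: abstentions are excluded from the denominator.
        """
        cast = self.yes + self.no
        return 0.0 if cast == 0 else (self.yes - self.no) / cast


def advisory_override_flag(vote: AdComVote, fda_approved: bool) -> bool:
    """True when FDA approved despite a majority-negative panel vote."""
    return fda_approved and vote.no > vote.yes


# Placeholder tally from the sample output: 3 yes, 10 no, 0 abstain.
vote = AdComVote(yes=3, no=10, abstain=0)
print(advisory_override_flag(vote, fda_approved=True))  # True
print(round(vote.margin, 3))                            # -0.538
```

A signed margin keeps a 13–0 and a 7–6 approval distinguishable downstream, which is what the Stage 9 weighting in step 3 needs. How the margin maps to a monitoring weight is left open here.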
The FDA approves a specific drug for a specific indication in a specific patient population at specific doses. The label is precise. Off-label prescribing is legal, common, and commercially driven. Once approved for indication A, the sponsor’s commercial team begins positioning for indications B, C, and D — which may have no confirmatory data, may have failed Phase 2 in the sponsor’s own trials, and expose patients to risk outside the studied population.
Accelerated approval commitments: Drugs approved on surrogate endpoints (tumor shrinkage, biomarker response) must complete confirmatory clinical outcome trials. FDA has historically been inconsistent enforcing these requirements. A drug can be marketed on a surrogate endpoint for years while confirmatory survival data matures — or doesn’t.
REMS as a safety signal: Risk Evaluation and Mitigation Strategies are required when benefit-risk requires managed patient access. REMS imposition post-approval is a late-surfacing toxicity signal — risks that survived the approval vote now require structural risk management.
Post-market commitment status is the leading indicator of confirmatory trial failure. When a confirmatory trial is delayed more than 2 years past the FDA-committed deadline, the most common reason is that interim data is unfavorable and the sponsor is managing the disclosure timeline.
FAERS (FDA Adverse Event Reporting System) trajectory is the post-approval safety clock. Flat reporting indicates expected background adverse events. Rising rates — especially a spike at a specific time point — indicate a published safety signal entering clinical awareness. Rising trajectory + Advisory Override Flag from Stage 8 = elevated combined signal.
A 2021 JAMA Internal Medicine analysis of 93 accelerated approvals from 1992–2017 found that 25% of post-market confirmatory trials were delayed beyond the original committed timeline by more than 5 years. Of those delayed trials, 40% were eventually terminated or produced negative results. The delay was itself a legible signal throughout the delay period.
Gemtuzumab ozogamicin (Mylotarg): FDA accelerated approval in 2000 on a surrogate endpoint. The post-market confirmatory study showed no clinical benefit and excess mortality at the approved dose. Pfizer voluntarily withdrew the drug in 2010. It was re-approved in 2017 at a lower dose with a different indication. The post-market commitment delay (2000–2010) was a legible signal in the FDA commitment tracker throughout that period.
Three public databases cover the post-approval decay window: FDA post-market commitment tracker, FAERS, and ClinicalTrials.gov confirmatory trial status.
Regulatory regime caveat: The 25%/5-year delay rate (Naci et al., JAMA Internal Medicine 2021) reflects the 1992–2017 cohort under pre-FDORA policies. The Food and Drug Omnibus Reform Act of 2022 (FDORA) now authorizes FDA to require that confirmatory trials be underway before accelerated approval is granted for new applications. Historical base rates must be adjusted for the policy environment at time of approval: post-2022 approvals operate under stricter enforcement, which should reduce expected delay frequency.
1. Post-market commitment tracker: FDA publishes annual accelerated approval post-market commitment status (fda.gov accelerated approval program). Query by drug name. Flag any commitment >2 years past original deadline.
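2. FAERS adverse event trajectory: The openFDA drug adverse event API (api.fda.gov/drug/event.json) returns adverse event report counts by drug name; counting on the report receipt date yields a dated series that can be aggregated into quarters. Parse the trajectory: flat (expected), rising (late-surfacing risk), spike (published safety cohort entering clinical awareness).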
3. REMS imposition check: FDA REMS public database (accessdata.fda.gov/scripts/cder/rems). REMS added post-initial approval = flag Post-Approval Safety Signal.
4. Confirmatory trial status: ClinicalTrials.gov search for all trials of the approved compound with Phase 4 or confirmatory designation. Cross-reference status (active, terminated, withdrawn, completed) against post-market commitment deadline.
Output: “Accelerated approval [date]. Post-market commitment [X] (confirmatory OS study): deadline [date], current status: Delayed (3 years overdue). FAERS trajectory: Q1 2022: 284 reports; Q4 2024: 1,847 reports (+550%). REMS imposed [date]. Stage 9 triple-flag active.”
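The trajectory parse and the triple-flag logic above can be sketched as pure functions over quarterly counts that have already been retrieved. All thresholds here (`rising_pct`, `spike_ratio`) and all function names are hypothetical placeholders, not calibrated FADE parameters. Only the 2-year overdue cutoff comes from the text.

```python
from datetime import date
from typing import Sequence


def pct_change(first: int, last: int) -> float:
    """Percent change from the first to the last quarterly report count."""
    if first <= 0:
        raise ValueError("first count must be positive")
    return (last - first) / first * 100.0


def classify_trajectory(counts: Sequence[int],
                        rising_pct: float = 25.0,    # hypothetical threshold
                        spike_ratio: float = 2.0) -> str:  # hypothetical threshold
    """Label chronologically ordered quarterly counts as flat, rising, or spike."""
    if len(counts) < 2 or counts[0] <= 0:
        return "insufficient data"
    # Spike: any quarter at least spike_ratio times the preceding quarter.
    for prev, cur in zip(counts, counts[1:]):
        if prev > 0 and cur / prev >= spike_ratio:
            return "spike"
    if pct_change(counts[0], counts[-1]) >= rising_pct:
        return "rising"
    return "flat"


def overdue_years(deadline: date, today: date) -> float:
    """Years elapsed since the FDA-committed deadline (negative if not yet due)."""
    return (today - deadline).days / 365.25


def stage9_triple_flag(years_overdue: float, trajectory: str,
                       rems_post_approval: bool) -> bool:
    """Commitment more than 2 years overdue, rising or spiking FAERS, and post-approval REMS."""
    return (years_overdue > 2.0
            and trajectory in {"rising", "spike"}
            and rems_post_approval)


# Endpoint counts from the sample output; the middle quarters are invented filler.
counts = [284, 400, 900, 1847]
print(round(pct_change(counts[0], counts[-1])))  # 550
print(classify_trajectory(counts))               # spike
```

Keeping retrieval separate from classification means the same functions can run against cached openFDA responses during a cohort build.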
Stage 10 is the convergence point. By the time an asset reaches M&A, licensing, or partnership discussions, it has survived or hidden ten layers of selective filtering. The counterparty reads the press release, the KOL opinion, and the Phase 3 summary. FADE reads the Phase 3 amendment history, the FDA override on the AdCom vote, the post-market commitment delay, and the FAERS trajectory simultaneously.
The aspirational acquisition: Acquirer pays Phase 3 valuation for an asset whose FDA Statistical Review contains language the diligence team never pulled from Drugs@FDA. The confirmatory trial completes. It fails. The acquirer impairs the asset.
The licensing arbitrage: Licensor out-licenses indication B rights after indication A approval. The licensee receives a drug with a post-market commitment 3 years past deadline, a rising FAERS trajectory, and a Phase 2 trial for indication B that failed in the licensor’s own portfolio 5 years earlier — registered in ClinicalTrials.gov as “Terminated” but never published.
The platform acquisition: Acquirer buys a mechanism platform whose foundational science carries a Stage 1 citation concern. The entire pipeline rests on a hypothesis whose empirical base is disputed. FADE reads Stage 1 at Stage 10 acquisition price.
The gap between what the deal price implies about future clinical success and what the FADE signal profile predicts is the investable variance.
A drug acquired at a price implying 40% probability of confirmatory Phase 4 success, with a FADE signal profile showing 4% historical success rate in comparable programs, carries a 36-point probability mispricing. That mispricing is computable from public data — before the deal closes.
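The arithmetic behind that figure is a simple difference in percentage points. A minimal sketch, with a hypothetical function name and the two probabilities taken from the example above:

```python
def probability_mispricing(implied_p_success: float, fade_p_success: float) -> float:
    """Gap, in percentage points, between deal-implied and signal-implied P(success)."""
    for p in (implied_p_success, fade_p_success):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return (implied_p_success - fade_p_success) * 100.0


# Deal price implies 40%; comparable programs succeeded 4% of the time.
print(round(probability_mispricing(0.40, 0.04), 1))  # 36.0
```

A positive result means the deal price is more optimistic than the signal profile; a negative one would mean the reverse.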
The market never reads all ten layers simultaneously. It reads the press release. FADE reads the amendment trail, the FDA reviewer language, the vote margin, the FAERS trajectory, and the post-market commitment status — and reports the delta. Document A says $2.4B. Document B says FADE Score 94. The decision is yours.
Stage 10 is the synthesis layer. No new primary data source. The FADE Score at Stage 10 is the Bayesian product of all upstream signals, expressed as a single conditional failure probability.
Signal inputs aggregated:
Stage 1: Citation integrity (PubPeer (Science.org) / Retraction Watch)
Stage 3: Publication pattern (selective publication, outcome switch history)
Stage 5: Phase 2 endpoint switch (ClinicalTrials.gov amendment delta)
Stage 6: Phase 3 adaptive design integrity (post-interim amendment flag)
Stage 7: FDA reviewer vs. sponsor conclusion delta (Drugs@FDA)
Stage 8: AdCom vote margin + override flag
Stage 9: Post-market commitment delay + FAERS trajectory + REMS
Stage 7–9 correlation caveat: Stages 7 (briefing document), 8 (AdCom vote), and 9 (post-market) are sequential outputs of the same FDA regulatory decision process. Their signals are mechanistically correlated: an FDA reviewer who identified endpoint concerns at Stage 7 influences the information environment at Stage 8. Treating them as fully independent Bayesian updates inflates apparent signal strength and produces false precision. Conservative approach: group Stages 7–9 as a single “regulatory signal cluster” with a combined likelihood ratio rather than three independent multipliers.
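To make the inflation concrete, the sketch below compares the naive product of three likelihood ratios with one possible conservative collapse: keeping only the strongest signal in the cluster. The collapse rule is an assumption chosen for illustration, not a method the framework specifies, and the LR values are hypothetical since calibration is pending.

```python
import math
from typing import Sequence


def naive_combined_lr(lrs: Sequence[float]) -> float:
    """Product of likelihood ratios, valid only if the signals are independent."""
    return math.prod(lrs)


def clustered_lr(lrs: Sequence[float]) -> float:
    """Collapse a correlated cluster to its single strongest likelihood ratio.

    Assumption: "strongest" means furthest from 1 on the log scale, so a
    strongly reassuring LR (well below 1) can also win.
    """
    return max(lrs, key=lambda lr: abs(math.log(lr)))


# Hypothetical LR+ values for Stages 7, 8, and 9 (not calibrated).
regulatory_cluster = [2.0, 1.8, 2.5]
print(naive_combined_lr(regulatory_cluster))  # roughly 9.0
print(clustered_lr(regulatory_cluster))       # 2.5
```

With these placeholder values, independent multiplication would move the odds about 9-fold where the clustered treatment moves them 2.5-fold. A calibrated alternative would estimate one joint LR for each observed combination of Stage 7–9 flags directly from the labeled cohort.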
Kill conditions (Score → 99 regardless of Bayesian output):
Stage 1 sole-mechanism citation retraction • ICLAC cell line match • Stage 9 triple-flag (REMS + Advisory Override + post-market delay)
Output: “FADE Score: 94. Signals fired: Stages 1, 5, 6, 8, 9. Stage 9 triple-flag active (REMS + post-market delay 3yr + FAERS +550%). Deal price implies P(success) ~40%. Historical programs with this signal profile: 4.2% approval rate. Variance: 35.8 percentage points. Document A says $2.4B. Document B says 94. The decision is yours.”
Result: Investor pays for a story assembled from five layers of selectively filtered evidence.
Result: Investor reads primary-source evidence that survived a deterministic reconciliation audit, not a narrative that survived an editorial process.
The discrepancy patterns described (SD-to-SEM variance compression, amendment trail endpoint switches, sub-therapeutic dose ceilings, post-interim adaptive modifications, FDA reviewer divergence) are documented failure-mode signatures in the published literature — but FADE’s specific sensitivity, specificity, and false-positive rate have not been empirically calibrated against a labeled historical cohort. Until the Historical Cohort Builder (scripts/fade_cohort_builder.py) completes over ≥500 programs and produces verified LR+ / LR− values per signal, the FADE Score is a theoretical architecture, not a calibrated probability. This document presents the detection framework and its theoretical basis only.
The Stage 10 synthesis score (FADE Score = 94, 4.2% historical success rate) is illustrative of the architecture. The number is not calibrated. Acting on it before cohort validation is complete is a protocol violation.
Output: one number. "Programs with this exact signal profile succeed in 3% of historical comparables vs. 24.6% base rate for oncology Phase 2." That plugs directly into an NPV model as P(success) in the Phase 2→3 transition node. No interpretation required.
Start with the historical failure rate for this program type.
Phase 2 overall: 71.1% fail. Oncology Phase 2: 75.4% fail. Cardiovascular: 73%. The base rate is set by the indication subgroup, not a generic assumption.
Source: BIO Clinical Development Success Rates 2011–2020. Phase 2 oncology transition success rate 24.6%; the report's 5.3% oncology figure is the likelihood of approval from Phase 1, not a Phase 2 rate. [Fn. 7]
For each signal: multiply posterior odds by the calibrated likelihood ratio (LR+) when it fires; by LR− when it does not fire; skip when null.
posterior_odds = prior_odds × LR+(signal1) × LR+(signal5) × LR−(signal2)
Signals that did NOT fire are informative too: a clean ClinicalTrials.gov amendment trail is mild evidence in the program's favor (LR− below 1) and lowers the posterior odds of failure. Not firing is evidence.
FADE Score = P(fail | all signals) × 100
A score of 98 means: 2% of programs with this signal profile have historically been approved. Not a guarantee. A calibrated base rate you can plug into a model.
Killer condition override: ICLAC cell line match or sole-mechanism retraction → Score overridden to 99 regardless of other signals. The efficacy data is invalid at the foundation.
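The update rule, score definition, and kill override described above fit in a few lines. This is a sketch of the architecture only: the function signature is an assumption, and every LR value below is a hypothetical placeholder, since the real values remain uncalibrated until the cohort build completes. The prior is the 71.1% overall Phase 2 failure rate cited above.

```python
from typing import Mapping, Optional


def fade_score(prior_p_fail: float,
               signals: Mapping[str, Optional[bool]],
               lr_pos: Mapping[str, float],
               lr_neg: Mapping[str, float],
               kill: bool = False) -> float:
    """Return P(fail | signals) * 100 using an odds-form Bayesian update.

    signals maps a name to True (fired), False (did not fire), or None (null).
    """
    if kill:
        # Kill condition: score is overridden regardless of other signals.
        return 99.0
    if not 0.0 < prior_p_fail < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior_p_fail / (1.0 - prior_p_fail)  # prior odds of failure
    for name, fired in signals.items():
        if fired is None:
            continue  # null signal: not computable, so no update
        odds *= lr_pos[name] if fired else lr_neg[name]
    return odds / (1.0 + odds) * 100.0


signals = {"signal1": True, "signal5": True, "signal2": False, "signal4": None}
lr_pos = {"signal1": 2.0, "signal5": 3.0}  # hypothetical LR+ values
lr_neg = {"signal2": 0.8}                  # hypothetical LR- value
print(round(fade_score(0.711, signals, lr_pos, lr_neg), 1))  # 92.2
```

Working in odds rather than probabilities keeps each update a single multiplication and makes the "skip when null" rule a no-op. The conversion back to a probability happens once, at the end.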
| Signal | Status | Evidence |
|---|---|---|
| Stage 1: Citation Integrity | FIRED | PubPeer (Science.org) concerns on Lesne et al. 2006 (foundational mechanism paper) predating Phase 3 enrollment. Sole mechanistic basis → KILL condition triggered |
| Stage 3: Publication Pattern | FIRED | ENGAGE trial results (futility confirmed) present in ClinicalTrials.gov database but underrepresented in FDA advisory briefing framing vs. EMERGE |
| Stage 5: Endpoint Switch | FIRED | Primary analysis changed from pre-specified population to post-hoc high-dose EMERGE subgroup after March 2019 futility determination. Amendment timestamped. |
| Stages 2, 4 | NULL | Not computable from available public data for this program type (antibody, not small molecule → dose ceiling signal does not apply) |
Note on calibration: The 99/100 score above reflects the KILL condition trigger for sole-mechanism citation concern — not a calibrated Bayesian output. The Bayesian LR values (lr_positive per signal) are listed as CALIBRATION_PENDING until the historical cohort build (scripts/fade_cohort_builder.py) completes over a labeled dataset of ≥500 programs. This example demonstrates the scoring architecture. The number gets real once the cohort runs.
Pipeline: fade_cohort_builder.py (4–8 hrs, free public APIs) → fade_signal_calibration.py (<1 min) → fade_score_calculator.py (real-time per program)