VIP Fantastic Anachronism

📝 ARTICLE INFORMATION

  • Title: VIP Fantastic Anachronism
  • Person: Alvaro de Menard
  • Platform: fantasticanachronism.com (Substack)
  • Date: October 16, 2025
  • Primary Focus: AI, futurism, metascience, history, literature, philosophy
  • Word Count: Approximately 3600 words

🎯 HOOK

An independent outsider read 2,500 social science papers for a DARPA replication market, then made $10,000 proving that predicting which studies are fake is easier than predicting the stock market. And the academic system doesn’t care.

💡 ONE-SENTENCE TAKEAWAY

Fantastic Anachronism reveals that social science expertise is statistical theater where broken feedback mechanisms reward citations over truth, making credentialed researchers systematically worse at detecting bad science than intelligent amateurs using simple heuristics.

📖 SUMMARY

Alvaro de Menard is an independent researcher who has become one of the most credible critics of modern social science, precisely because he isn’t embedded in it. His blog, Fantastic Anachronism, interrogates expertise, credentialism, institutional decay, and why sophisticated societies produce unreliable knowledge at scale.

The blog launched with observations about memetic defoundation: ideas persisting after their justifications vanish. Bunny ears as a gesture outlived the cuckold horns meaning; organizational structures survive losing their original purpose; rituals continue when foundations crack. This framework illuminates how academic institutions continue accumulating prestige long after their quality control mechanisms fail.

De Menard’s most influential essay, “Are Experts Real?”, opens with an honest oscillation: experts have deep knowledge requiring decades of specialization, yet sometimes they say things amateurs can instantly recognize as false. How can both be true? The answer lies in feedback loops. Chess has real expertise because the feedback is immediate, public, and objective: every game result updates the rating, new players constantly compete, and you cannot fake grandmaster performance. Science should work similarly. Physics does: if physics were fake, CPUs wouldn’t work. But social science? If social priming effects were fake, nothing would visibly break; there is no immediate penalty. Citations aren’t correlated with truth; the feedback loop breaks.

De Menard then participated in DARPA’s Replication Markets project, forecasting which of 3,000 social science papers would replicate. He read 2,500+ papers, developed a statistical model predicting replication probability, traded on those predictions, and earned $10,000 by outperforming the market. The model was brutally simple: effect size, p-values, number of authors, gender composition, interaction effects, discipline. Nothing sophisticated. Yet intelligent amateurs using these basic heuristics outperformed professional reviewers and editors who had missed the problems entirely.
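
As an illustration of how little sophistication such a scorer needs, here is a minimal Python sketch in the same spirit. The feature list follows the essay (p-value, effect size, sample size, interaction effects, author count); the weights and thresholds are illustrative assumptions, not de Menard’s fitted model, which also considered gender composition and discipline.

```python
import math

def replication_score(p_value, effect_size, sample_size,
                      is_interaction_effect, n_authors):
    """Rough heuristic score for whether a finding will replicate.

    The weights and cutoffs below are illustrative assumptions,
    not the actual model described on the blog.
    """
    score = 0.0
    score += 1.5 if p_value < 0.001 else (0.0 if p_value < 0.01 else -1.5)
    score += 1.0 if effect_size > 0.5 else -0.5      # tiny effects rarely survive
    score += 1.0 if sample_size > 300 else -1.0      # underpowered studies are suspect
    score -= 1.5 if is_interaction_effect else 0.0   # interaction effects replicate poorly
    score += 0.25 * min(n_authors, 4)                # weak prior: more eyes on the data
    return 1 / (1 + math.exp(-score))                # squash to a probability

# Example: a p = .04 interaction effect with n = 60 scores very low
print(round(replication_score(0.04, 0.3, 60, True, 2), 2))
```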

The most damning finding: studies that replicate are cited at identical rates to studies that don’t. Published replications don’t move citations. Bad papers accumulate prestige regardless of quality. This is the core scandal: the system doesn’t penalize failure, so it cannot learn.

De Menard’s essay “What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers” catalogs methodological horror: researchers blindly using a sample size of N=59 across studies with completely different effect prevalences because one paper mentioned that number. Epidemiologists confidently asserting false claims about masks. Pension fund chiefs invoking credentials instead of logic when confronted with nonsense. The pattern is consistent: when expertise is questioned, credentialed experts appeal to credentials rather than reason, the opposite of how Magnus Carlsen responds to chess criticism.

The underlying problem isn’t stupidity or malice, though some exists. It’s institutions designed to reward publication volume over discovery. Tenure requires papers, grants require publications, citations drive prestige, and prestige drives funding. The system is optimized for productive-looking failure, not truth. A researcher knows they can publish bad work and face no consequences. The only risk is if their field eventually experiences a replication crisis that triggers funding attention. But even that produces minimal change: social science has been post-replication-crisis for a decade with no measurable improvement.

De Menard distinguishes fake expertise from credential inflation. Some fields have legitimate expertise hierarchies with functioning feedback loops (basic science, chess, markets). Other fields have adopted the external trappings of expertise (PhD requirements, peer review, journal prestige, tenure) while losing the internal mechanisms that generate real knowledge. The result is credential-competence disparities so extreme they implicate the entire system.

The blog then explores adjacent problems: How do you trust experts when you can’t evaluate their domain (“Unjustified True Disbelief”)? How do you distinguish real skepticism from charlatan exploitation? How does specialization create epistemic fragmentation in which no one can catch cross-cutting errors? What happens to social hierarchies when they become disconnected from actual performance?

De Menard’s tone throughout is acidic without being nihilistic. He’s not claiming all expertise is fake. He’s cataloging specific structural failures, showing why they’re predictable, and proposing fixes: subsidized prediction markets for research evaluation, tighter feedback loops, making forecasting records public, rewarding accurate predictions rather than mere publication, and potentially building parallel research institutions where quality controls work.

The blog spans intellectual history (How does credentialism emerge? Why do societies adopt broken expertise hierarchies?), epistemology (When can you trust institutional claims?), and practical forecasting (Can simple models actually predict truth better than credentialed experts?). De Menard treats academic failure not as an individual morality problem but as a systems problem, and he provides evidence that systems problems have systematic solutions.

🔍 INSIGHTS

Core Insights

  • Feedback Loop Quality Determines Real Expertise: Genuine expertise exists only where feedback is tight, public, and consequential. Tight feedback (chess, engineering, markets) produces real expertise. Loose feedback (social sciences, nutrition science) produces prestige-without-competence.
  • Citation Decoupling from Quality: Studies that replicate are cited at identical rates to studies that don’t. Published failures are not penalized. The system cannot self-correct when failures and successes get equal rewards.
  • Credentialism Without Credentials: Experts credentialed in field A appeal to general credentialism when confronted with domain-specific errors, suggesting they’re defending status rather than truth. Real experts defend their claims with arguments, not credentials.
  • Specialization Creates Unaccountability: As fields narrow, fewer people can judge results. This enables wide-ranging errors to persist undetected. Specialization trades breadth for depth while losing cross-cutting error detection.
  • Simple Heuristics Outperform Expert Judgment: Laypeople using basic statistical principles (effect size, p-value, sample size, interaction effects) predict replication success more accurately than the prediction market consensus. This suggests experts aren’t applying even basic epistemic hygiene.
  • Institutional Uniformity Provides Credibility Cover: Psychology and physics departments look identical to outsiders despite vastly different epistemic reliability. This superficial uniformity lets low-quality fields borrow prestige from high-quality ones.
  • Incentive Structure Produces Predictable Failure: The processes generating unreliable research are routine, well-understood, and easy to avoid. Yet they persist because the reward structure incentivizes them. This isn’t a mystery; it’s structural.

How This Connects to Broader Trends/Topics

  • Institutional Decay: Fantastic Anachronism treats academic failure as a special case of broader organizational pathology where structures outlive their justifications (memetic defoundation).
  • Epistemic Crisis: Connects to discussions of institutional trust, expert polarization, and why intelligent people increasingly doubt credentialed authorities.
  • Crowdsourced Expertise: Demonstrates that distributed prediction markets can outperform hierarchical expertise, a result relevant to AI forecasting, governance, and decentralized decision-making.
  • Credentialism vs. Competence: Challenges meritocratic assumptions underlying modern credentialing, showing where degrees measure sorting ability rather than capability.
  • Institutional Reform: Proposes concrete mechanisms (prediction markets, public records, feedback loops) for realigning institutions with truth-seeking.

🛠️ FRAMEWORKS & MODELS

Expertise Feedback-Loop Hierarchy

  • Explanation: Real expertise exists on a spectrum determined by feedback mechanism quality: tightness, publicity, objectivity, and consequences.
  • Tiers (from most to least reliable feedback):
    • Chess/Markets: Public rating, instant updates, zero tolerance for fakery, new entrants constantly testing
    • Physics/Engineering: Real-world consequences (rockets explode, CPUs malfunction), objective verification, rapid falsification
    • Basic Science: Experimental replicability, peer scrutiny, eventual field consensus, slower but functional feedback
    • Social Science: Publication citation-driven, replication rare, failures unrewarded, feedback loop broken
    • Organizational/Political: No clear falsification mechanism, incentives disconnected from outcomes, perverse feedback loops
  • Application: Evaluate any knowledge claim by asking: What feedback mechanism exists? How tight is it? What are the consequences of error? (A rough triage sketch follows this framework.)
  • Significance: Explains why some fields have real expertise and others have status hierarchy without competence.
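
As a rough way to operationalize the three “Application” questions above, the sketch below turns them into a crude triage function. The cutoffs are assumptions chosen for illustration; the blog supplies the questions, not this scoring.

```python
def feedback_quality(is_public: bool, lag_days: float, errors_have_consequences: bool) -> str:
    """Crude triage of a field's feedback loop using the framework's three
    questions. Thresholds (90 days, 2 years) are illustrative assumptions."""
    tight = is_public and lag_days <= 90 and errors_have_consequences
    loose = (not errors_have_consequences) or lag_days > 365 * 2
    if tight:
        return "tight feedback: expertise claims can largely be taken at face value"
    if loose:
        return "loose feedback: demand track records, not credentials"
    return "mixed: weigh evidence claim by claim"

print(feedback_quality(True, 1, True))          # chess-like field
print(feedback_quality(False, 365 * 5, False))  # social-science-like field
```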

Credential-Competence Disparity Detection Model

  • Explanation: When experts appeal to credentials instead of argument, the disparity between credential and competence has become visible.
  • Warning Signs:
    • Appeals to titles/degrees when questioned about specific claims
    • Inability to engage counterarguments at technical level
    • Prestige and field results moving in opposite directions (citations rising while replications fall)
    • Credentialed specialists confidently getting basics wrong
    • Simple heuristics outperforming expert judgment
  • Application: When you notice credential-competence disparity, treat the field’s institutional prestige as potentially unreliable until mechanisms are fixed.
  • Significance: Provides operational way to identify broken expertise hierarchies without needing domain knowledge.

Memetic Defoundation Framework

  • Explanation: Ideas, practices, or structures persist long after their foundational justifications disappear.
  • Pattern:
    1. Practice established with clear justification: [reason] → [practice]
    2. Reason forgotten, disproved, or superseded
    3. Practice persists anyway through institutional inertia
    4. Practice eventually serves function divorced from original purpose
  • Examples: Bunny ears gesture (cuckold horns meaning lost), tenure system (cold-war-era security measures persisting for unrelated purposes), university structure (medieval preservation), publication volume (justified by pre-digital era need for permanence, persists despite digital storage)
  • Application: When institutional practices seem irrational, trace to their origin. The justification may have vanished while practice remains.
  • Significance: Explains why institutions appear internally contradictory. They’re holding together conflicting purposes from different eras.

Citation-Quality Decoupling Detection

  • Explanation: When replicated and non-replicated studies receive equal citations, the system has lost its quality signal.
  • Implications:
    • Peer review is not catching errors at publication
    • No feedback mechanism punishes bad work
    • Future researchers cannot distinguish reliable from unreliable foundations
    • Field knowledge builds on sand
  • Application: Check whether bad papers are subsequently retracted, whether citations drop after failed replications, and whether the field collectively updates its confidence. If none of these happen, the feedback loop is broken (see the sketch after this framework).
  • Significance: Provides objective test for whether a field can self-correct.
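
A minimal sketch of this check in Python, assuming you can assemble per-paper citation counts and replication outcomes (the numbers below are made up for illustration):

```python
# Hypothetical records: (citations accrued after the replication attempt, replicated?)
papers = [(120, True), (45, False), (200, True), (80, False), (150, True), (60, False)]

def mean_citations(records, replicated):
    """Average post-replication citation count for papers with the given outcome."""
    counts = [c for c, ok in records if ok == replicated]
    return sum(counts) / len(counts)

gap = mean_citations(papers, True) - mean_citations(papers, False)
print(f"Citation gap (replicated minus failed): {gap:.1f}")
# A gap near zero is the broken-feedback signal described above:
# citations are not tracking quality.
```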

Tight vs. Loose Feedback Environments

  • Tight Feedback (Real Expertise Possible):
    • Measurable outcomes: Chess rating, engineering performance, market returns
    • Rapid consequences: Hours to months, not years or decades
    • Publicity: Everyone can see the record
    • New competition: Constant influx of challengers
    • Example: Chess, trading, engineering
  • Loose Feedback (Status Hierarchy without Expertise):
    • Ambiguous outcomes: Effects hard to measure, isolate, or verify
    • Delayed consequences: Years to decades, often never
    • Privacy: Results assessed by peers who benefit from similar behavior
    • No new entrants: Barriers to entry prevent fresh perspective
    • Example: Social science, organizational leadership, policy advice
  • Application: Before trusting expert claims, determine which feedback environment the field operates in.
  • Significance: Explains why intelligent people with doctorates can be systematically wrong: not from stupidity but from operating in information environments without consequences for error.

💬 QUOTES

  • “I vacillate between two modes: sometimes I think every scientific and professional field is genuinely complex, requiring years if not decades of specialization to truly understand even a small sliver of it…But sometimes one of these masters at the top of the mountain will say something so obviously incorrect, something even an amateur can see is false, that the only possible explanation is that they understand very little about their field.”

    • Context: Opening of “Are Experts Real?”, establishing the core tension driving the entire essay.
    • Significance: Captures the lived experience of epistemic confusion in modern institutions: both possibilities feel true simultaneously.
  • “If Carlsen is fake that also implicates every player who has played against him, every tournament organizer, and so on. The entire hierarchy comes into question. Even worse, imagine if it was revealed that Carlsen was a fake, but he still continued to be ranked #1 afterwards.”

    • Context: Contrasting chess’s self-correcting expertise hierarchy with social science’s credential stickiness.
    • Significance: Demonstrates that broken expertise can theoretically be detected by checking whether the system corrects obvious failures.
  • “The marginal researcher is a hack and the marginal paper should not exist. There’s a general lack of seriousness hanging over everything.”

    • Context: Conclusion after reading 2,500+ social science papers for DARPA replication markets.
    • Significance: Visceral summary after intimate engagement with the field’s actual output, not abstract critique.
  • “When you look at the papers citing Viechtbauer et al., you will find dozens of them simply using N=59, regardless of the problem they’re studying, and explaining that they’re using that sample size because of the Viechtbauer paper!”

    • Context: Example of credentialed researchers making obvious statistical errors that cascade through the literature.
    • Significance: Shows expertise failure isn’t from inability but from not engaging basic epistemic hygiene.
  • “Studies that replicate are cited at the same rate as studies that do not.”

    • Context: Key finding from replication market research and broader meta-research on citations.
    • Significance: Proves the feedback loop is genuinely broken. Failures and successes get identical rewards.
  • “This year, we’ve come to better appreciate the fallibility and shortcomings of numerous well-established institutions (‘masks don’t work’)… while simultaneously entrenching more heavily mechanisms that assume their correctness (‘removing COVID misinformation’).”

    • Context: Patrick Collison quote cited by de Menard on institutional learning failure.
    • Significance: Captures the paradox of institutions losing credibility while gaining more authority over information.
  • “The truth is that the system will not even have to defend itself, it will just ignore this stuff and keep on truckin’ like nothing happened.”

    • Context: On whether DARPA replication market results will produce institutional reform.
    • Significance: Reflects pessimism about whether external evidence can overcome institutional inertia without structural incentive changes.

⚙️ APPLICATIONS

For Evaluating Claims Outside Your Expertise

  • Assess Feedback Mechanism: Does the field have tight, public, consequential feedback? Chess yes, psychology no. If feedback is loose, require higher evidence burden.
  • Check Citation-Quality Correlation: If bad work is cited equally to good work, the field’s prestige may not track quality. Discount expert consensus.
  • Test Credential-Competence Disparity: Ask the expert technical questions. If they appeal to credentials or titles instead of engaging substantively, expertise may be status rather than knowledge.
  • Use Simple Heuristics: For scientific claims, check the basic statistics: effect size, sample size, p-values, interaction effects. If these look absurd, the study likely won’t replicate (see the checklist sketch after this list).
  • Seek Forecasting Records: Has anyone tracked this expert’s predictions against outcomes? Public, dated track records beat credentials.
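
To make the “Use Simple Heuristics” item concrete, here is a hedged checklist sketch; the cutoffs are illustrative assumptions rather than rules stated on the blog.

```python
def red_flags(p_value, sample_size, effect_size, is_interaction):
    """Return basic statistical red flags for a single study.
    Thresholds are illustrative assumptions, not rules from the blog."""
    flags = []
    if 0.01 < p_value < 0.05:
        flags.append("p-value just under .05 (possible p-hacking)")
    if sample_size < 100:
        flags.append("small sample for a noisy behavioral effect")
    if effect_size > 1.0:
        flags.append("implausibly large effect size")
    if is_interaction:
        flags.append("interaction effect (historically poor replication)")
    return flags

print(red_flags(p_value=0.043, sample_size=59, effect_size=1.2, is_interaction=True))
```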

For Institutional Reform

  • Implement Prediction Markets: Subsidize prediction markets for contested claims. Forecasters outperform standard peer review. Use market prices as advisory signals alongside traditional evaluation (see the market sketch after this list).
  • Make Feedback Public and Rapid: Require replication of all findings in high-impact journals. Publish replication results prominently. Track which papers fail. Update citations accordingly.
  • Decouple Prestige from Publication Volume: Reward accurate predictions, successful replications, and clear thinking. Penalize prolific failure. Make prestige sticky only when performance continues.
  • Create Parallel Institutions: Where existing institutions have broken feedback loops, build new ones with tighter mechanisms (New Science, Arcadia, direct philanthropic funding bypassing peer review).
  • Specialize in Tightening Feedback: Hire forecasters, statisticians, and replication specialists to audit research at institutional level. Make auditing prestigious.
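
The blog argues for subsidized prediction markets but does not prescribe a specific mechanism; one standard choice is Hanson’s logarithmic market scoring rule, sketched below for a binary “will this study replicate?” claim. The liquidity parameter b caps the sponsor’s subsidy at b·ln(2).

```python
import math

class LMSRMarket:
    """Minimal binary prediction market using the logarithmic market scoring rule."""

    def __init__(self, b=100.0):
        self.b = b                # liquidity / subsidy parameter
        self.q = [0.0, 0.0]       # outstanding shares: [replicates, fails]

    def _cost(self, q):
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in q))

    def price(self, outcome):
        """Current implied probability of outcome 0 or 1."""
        exps = [math.exp(qi / self.b) for qi in self.q]
        return exps[outcome] / sum(exps)

    def buy(self, outcome, shares):
        """Buy shares on an outcome; returns the cost charged to the trader."""
        new_q = list(self.q)
        new_q[outcome] += shares
        fee = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return fee

market = LMSRMarket(b=100.0)
print(round(market.price(0), 2))   # 0.50 before any trades
market.buy(0, 80)                  # a forecaster bets the study replicates
print(round(market.price(0), 2))   # price moves toward "replicates" (~0.69)
```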

For Individual Decision-Making

  • Question Institutional Consensus When Feedback is Loose: If everyone credentialed agrees on something but the field has no tight feedback loop, intellectual humility should cut both ways: be skeptical of the consensus too.
  • Track Expert Predictions: Keep records of what claimed experts predicted and what actually happened. Calibrate trust based on track records, not titles (see the scoring sketch after this list).
  • Demand Specific Claims with Dates: Vague claims are unfalsifiable. Force experts to predict specifics by date. This creates accountability.
  • Build Expertise Through Tight Loops: If you want to become genuinely skilled at something, choose domains with immediate feedback (chess, markets, sports, programming) over domains with delayed or absent feedback (policy, psychology, organizational management).
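
A minimal sketch of the “Track Expert Predictions” item: log dated, probabilistic predictions and score them with the Brier score once outcomes resolve. The experts, claims, and numbers below are hypothetical.

```python
from statistics import mean

# Hypothetical prediction log: (expert, claim, forecast probability, outcome)
log = [
    ("expert_a", "Study X replicates by 2024", 0.80, True),
    ("expert_a", "Policy Y reduces metric Z",  0.90, False),
    ("expert_b", "Study X replicates by 2024", 0.30, True),
    ("expert_b", "Policy Y reduces metric Z",  0.40, False),
]

def brier(forecast, outcome):
    """Brier score for one binary forecast: 0 is perfect, 1 is worst."""
    return (forecast - float(outcome)) ** 2

scores = {}
for expert, _claim, prob, outcome in log:
    scores.setdefault(expert, []).append(brier(prob, outcome))

for expert, s in sorted(scores.items(), key=lambda kv: mean(kv[1])):
    print(f"{expert}: mean Brier score = {mean(s):.2f}")
# Lower scores indicate better calibration: trust the record, not the title.
```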

For Meta-Science and Funding

  • Audit Field Quality Systematically: Before funding a field, assess feedback loop quality. Tight-feedback fields deserve larger grants and more autonomy. Loose-feedback fields need reform before scaling.
  • Incentivize Failure Detection: Fund people who find and replicate failures. Currently, finding bad work carries career risk. Reverse this.
  • Create Forecasting Infrastructure: Establish prediction markets for contested scientific claims. Use forecasting accuracy as a key metric for allocating future funding.
  • Monitor Citation-Quality Coupling: Measure whether citations track replicability. If they don’t, redirect funding toward fields with better quality signals.

📚 REFERENCES

Key Essays on Fantastic Anachronism

  • “Are Experts Real?” (January 2021). Core framework on expertise feedback loops, credential-competence disparity, chess vs. social science comparison
  • “What’s Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers” (September 2020). Comprehensive replication crisis analysis, methodological audit
  • “Memetic Defoundation” (March 2020). Framework for why institutions preserve practices after justifications vanish
  • “Unjustified True Disbelief” (January 2021). When and how to rationally distrust expertise
  • “How I Made $10k Predicting Which Studies Will Replicate” (December 2021). Practical forecasting methodology, replication markets details

DARPA SCORE Program & Replication Markets

  • Replication Markets project (2019–2021): ~3,000 social science papers, prediction market structure, trading methodology
  • Full replication outcomes were still pending at the time of writing; the prediction markets were roughly 70% accurate on average
  • Comparison projects: repliCATS discussion-based forecasting, DARPA prediction market evaluation

Related Work Cited or Referenced

  • Ioannidis, J. P. (2005). “Why Most Published Research Findings Are False”. Foundational paper on publication bias
  • Camerer et al. (2018). “Evaluating replicability of laboratory experiments in economics” (Nature)
  • Viechtbauer et al. (2015). “A simple formula for the calculation of sample size in pilot studies”. Example of methodological error cascade
  • Candidate gene literature collapse (e.g., 5-HTTLPR). Historical example of field-wide failure and eventual correction
  • Implicit Association Test criticism. Example of politically-motivated social science collapse
  • Brian Wansink scandal: Example of high-citation fraud lacking institutional consequences

Authors & Researchers Mentioned

  • Eugene Wigner: Frustration with specialization in physics, quote on inability to follow modern physics
  • Andrew Gelman: Statistical criticism and meta-science commentary
  • Scott Alexander: Slate Star Codex commentary on scientific consensus and replication markets
  • Patrick Collison: Institutional criticism, sponsorship of New Science
  • Tyler Cowen: General epistemology and expertise assessment
  • Tom Liptak: Superforecaster track record comparison to epidemiologists
  • Kelsey Piper (Vox): Popular coverage of replication crisis, citing de Menard

Institutional Context

  • Academic publishing economics and incentive structures
  • Peer review system limitations and alternatives
  • Citation metrics as quality signals (and their failure)
  • Tenure system origins and persistence through memetic defoundation
  • Rise of “meta-science” as response to field-level quality failures

⚠️ QUALITY & TRUSTWORTHINESS NOTES

Strengths

  • Grounded in Specifics: Cites actual papers (e.g., N=59 sample size errors), concrete examples (CalPERS investment committee, mask epidemiology), specific studies that failed to replicate. Claims are traceable.
  • Quantitative Rigor: Participated in DARPA forecasting project with monetary outcome. Predictions were measurable against reality. No pure speculation. Skin in the game.
  • Transparency About Limitations: De Menard acknowledges he’s non-academic, calls himself “independent researcher of dubious nature,” notes DARPA participation, distinguishes between forecasts and actual replication results (which weren’t all available at time of writing).
  • Intellectual Honesty: Engages counterarguments seriously. Notes that specialization has benefits even while explaining downsides. Acknowledges his pessimism might be unwarranted. Cites research supporting multiple positions.
  • Cross-Disciplinary Synthesis: Combines statistics, institutional analysis, history, epistemology, and practical forecasting into coherent framework rather than narrow critique.

Considerations

  • Survivorship Bias: De Menard read 2,500 papers for the DARPA project, but the sample wasn’t random: these were papers considered important enough to include in the replication market. Results may not generalize to all of social science.
  • Pessimism Bias: De Menard explicitly describes himself as more pessimistic than peers about social science’s ability to self-correct. This shapes interpretation of evidence.
  • Limited Positive Examples: Critique is thorough; solutions proposed are largely speculative or borrowed (prediction markets, parallel institutions). Less evidence that proposed reforms would work at scale.
  • Non-Academic Status as Double-Edged: Outsider perspective is valuable but also lacks institutional knowledge about field-internal reforms already happening, grant-making changes underway, or successful niche improvements.
  • Broad Generalization: While focusing on social science, de Menard sometimes treats “academic expertise” monolithically, potentially overgeneralizing from psychology/sociology to physics or mathematics where feedback mechanisms differ significantly.
  • Publication Bias in Examples: Examples cited (Wansink, implicit bias tests) are spectacular failures. The typical failure is subtler and harder to document.

Accuracy Assessment

  • Verified Claims: N=59 sample size error is documented and cited correctly. Wansink citations and resignation are factually accurate. DARPA SCORE program structure matches historical record. Prediction market accuracy figures (~70%) align with published results.
  • Statistical Claims: De Menard’s descriptions of p-hacking, publication bias, and methodological problems align with mainstream meta-science consensus (Ioannidis, Gelman, Simonsohn, etc.).
  • Institutional Critique: Descriptions of tenure system, peer review incentives, and citation metrics accurately reflect current structures, even if interpretations differ.
  • Replication Rates: Cited replication rates (39% in psychology replication, ~50% in social science generally) match published meta-research.

Trust Indicators

  • External Validation: Featured in Vox, Social Science Space, multiple academic venues despite being non-academic. Work cited approvingly by researchers, funders (Caleb Watney, Paul Graham).
  • Practical Success: Made $10K in replication forecasting, demonstrating genuine predictive skill, not just theoretical critique.
  • Intellectual Community: Engages seriously with academic researchers, acknowledges expertise limitations, participates in institutional reform efforts rather than just criticizing from sidelines.

Potential Harm Risks

  • Epistemic Demoralization: Reading de Menard might discourage trust in any expertise, which could be harmful if it leads to rejecting reliable fields alongside broken ones.
  • Mitigation: De Menard explicitly preserves distinction between tight-feedback and loose-feedback fields. Physics, engineering, medicine (for some questions) remain trustworthy. The critique isn’t nihilism but sorting.

Find this work at: fantasticanachronism.com | @AlvaroDeMenard on X


Crepi il lupo! 🐺