Mathematical Model: Research Productivity Simulation

An Empirically-Grounded Formalization of Traditional vs AI-Augmented Research Labs

1. Model Overview and Environment

We model a research lab over time $T$ (in years) consisting of $n_{PhD}$ PhD researchers and $n_{PI}=1$ principal investigator.

1.1 Research Environment

| Symbol | Description | Value / notes |
|---|---|---|
| $T$ | Simulation time (years) | Typically $T = 2$ years |
| $n_{PhD}$ | Number of PhD researchers | 3 in our simulation |
| $\lambda_0$ | Base task completion rate | Tasks per unit time per researcher |
| $\eta$ | Learning rate | Efficiency gain per year |

2. Task Completion Rate Model

Each researcher allocates their time across discrete tasks. Let $f_k$ be the fraction of time spent on task type $k$:

| Task $k$ | $f_k$ | Source |
|---|---|---|
| Boilerplate code | 0.15 | Feldon et al. 2017 |
| Custom code | 0.20 | Feldon et al. 2017 |
| Debugging | 0.10 | Feldon et al. 2017 |
| Writing | 0.15 | Feldon et al. 2017 |
| Literature review | 0.10 | Feldon et al. 2017 |
| Data analysis | 0.15 | Feldon et al. 2017 |
| Simulation | 0.10 | Feldon et al. 2017 |
| Validation | 0.05 | Model assumption |

The average speedup is:

$$\bar{s} = \sum_{k} f_k \cdot s_k \tag{1}$$

3. Per-Task Speedup Parameters

| Task | Symbol | Value | Source |
|---|---|---|---|
| Boilerplate code | $s_{boiler}$ | 1.55x | Peng et al. 2023 — RCT: 55.8% faster on simple tasks |
| Writing drafts | $s_{write}$ | 1.40x | Noy & Zhang 2023 — RCT: +40% for below-median writers |
| Debugging | $s_{debug}$ | 1.30x | Estimate based on Peng et al. finding |
| Literature review | $s_{lit}$ | 1.30x | Estimate (no published RCT) |
| Data analysis | $s_{data}$ | 1.15x | Conservative estimate |
| Custom code | $s_{custom}$ | 1.00x | METR study — 0% for experienced devs |
| Simulation | $s_{sim}$ | 1.00x | Conceptual: no AI for physical experiments |
| Validation | $s_{valid}$ | 0.90x | Modeled: hallucination-checking overhead |

Critical: the METR study found zero speedup for experienced developers on real-world tasks.

Computing $\bar{s}$:

$$\bar{s} = 0.15(1.55) + 0.20(1.00) + 0.10(1.30) + 0.15(1.40) + 0.10(1.30) + 0.15(1.15) + 0.10(1.00) + 0.05(0.90) = 1.220$$
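This weighted average can be checked with a short script, using the task fractions and per-task speedups taken directly from the tables in Sections 2 and 3:

```python
# Weighted average AI speedup (Eq. 1): s_bar = sum_k f_k * s_k
fractions = {
    "boilerplate": 0.15, "custom_code": 0.20, "debugging": 0.10,
    "writing": 0.15, "lit_review": 0.10, "data_analysis": 0.15,
    "simulation": 0.10, "validation": 0.05,
}
speedups = {
    "boilerplate": 1.55, "custom_code": 1.00, "debugging": 1.30,
    "writing": 1.40, "lit_review": 1.30, "data_analysis": 1.15,
    "simulation": 1.00, "validation": 0.90,
}
s_bar = sum(fractions[k] * speedups[k] for k in fractions)
print(round(s_bar, 3))  # 1.22
```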

4. Skill-Biased Effectiveness Model

AI productivity gains are skill-biased (Noy & Zhang 2023):

$$s_{eff}(\sigma) = s \cdot (1 + \alpha \cdot (1 - \sigma)) \tag{2}$$

With skill level $\sigma \sim N(\mu_s, 0.15^2)$, mean $\mu_s = 0.5$, and skill-bias coefficient $\alpha = 0.4$:

$$E[s_{eff}] = \bar{s} \cdot (1 + \alpha \cdot (1 - \mu_s)) = 1.220 \cdot 1.20 = 1.464 \tag{3}$$
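A minimal sketch of Eqs. 2 and 3, using $\alpha = 0.4$ (the value implied by the 1.20 factor in Eq. 3). Because Eq. 2 is linear in $\sigma$, the expectation depends only on the mean skill level:

```python
# Skill-biased effective speedup (Eqs. 2-3).
s_bar = 1.220   # average speedup from Eq. 1
alpha = 0.4     # skill-bias coefficient implied by Eq. 3
mu_s = 0.5      # mean of the skill distribution sigma ~ N(0.5, 0.15^2)

def s_eff(s, sigma, alpha=0.4):
    """Eq. 2: effective speedup for a researcher with skill level sigma."""
    return s * (1 + alpha * (1 - sigma))

# Eq. 3: linearity in sigma means only the mean matters for the expectation.
expected = s_bar * (1 + alpha * (1 - mu_s))
print(round(expected, 3))  # 1.464
```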

5. Output Factor Derivation

5.1 Traditional Lab

$$E_{trad}(T) = \lambda_0 \cdot T \cdot n_{PhD} \cdot (1 + \eta_{trad} \cdot T/2) \tag{4}$$

where $\eta_{trad} = 0.15$/year.

5.2 Augmented Lab

$$E_{aug}(T) = \lambda_0 \cdot T \cdot n_{PhD} \cdot (1 + \eta_{aug} \cdot T/2) \cdot E[s_{eff}] \tag{5}$$

where $\eta_{aug} = 0.20$/year.

5.3 Output Ratio

$$R(T) = \frac{E_{aug}(T)}{E_{trad}(T)} = \frac{1 + \eta_{aug}T/2}{1 + \eta_{trad}T/2} \cdot E[s_{eff}] \tag{6}$$

At $T = 2$ years:

$$R(2) = \frac{1.20}{1.15} \cdot 1.464 = 1.528$$
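Eq. 6 and its evaluation at $T = 2$, as a small function with parameter values from Sections 4 and 5:

```python
# Output ratio R(T) (Eq. 6): learning-rate ratio times E[s_eff].
def output_ratio(T, eta_aug=0.20, eta_trad=0.15, e_s_eff=1.464):
    """Ratio of augmented to traditional lab output after T years."""
    return (1 + eta_aug * T / 2) / (1 + eta_trad * T / 2) * e_s_eff

print(round(output_ratio(2), 3))  # 1.528
```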

Key Result: ~1.5x Output Factor

The model predicts an output factor of approximately 1.5x, not the 3-5x claimed by vendors.

6. Publication Output Model

Research directions progress through stages:

$$P_{pub} = p_{idea} \cdot p_{method} \cdot p_{exp} \cdot p_{analysis} \cdot p_{write} \cdot p_{submit} \cdot p_{accept} \tag{7}$$

| Stage | Probability |
|---|---|
| $p_{idea}$ | 0.80 |
| $p_{method}$ | 0.70 |
| $p_{exp}$ | 0.60 |
| $p_{analysis}$ | 0.50 |
| $p_{write}$ | 0.70 |
| $p_{submit}$ | 0.80 |
| $p_{accept}$ | 0.65 |

$P_{pub} \approx 0.061$ (about 6% of research directions become publications; the product of the stage probabilities above, assuming the stages are independent)
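The pipeline product in Eq. 7 can be verified directly; note that the product of the listed stage probabilities comes to roughly 0.061:

```python
import math

# Publication pipeline (Eq. 7): a direction must survive every stage,
# so P_pub is the product of the stage probabilities.
stage_probs = [0.80, 0.70, 0.60, 0.50, 0.70, 0.80, 0.65]
p_pub = math.prod(stage_probs)
print(round(p_pub, 3))  # 0.061
```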

7. AI Failure Modes

7.1 Hallucinations

$$E[P]_{aug} = E_{aug}(T) \cdot P_{pub} \cdot (1 - p_{halluc}) \tag{8}$$

where $p_{halluc} \approx 0.10$-$0.15$ is the fraction of output effectively lost to hallucination-induced errors and rework.

7.2 Dead End Dynamics

$$p_{dead,aug} = p_{dead,trad} \cdot (1 - \beta) + \gamma \cdot p_{halluc} \tag{9}$$

where $\beta$ is the fractional reduction in dead ends from faster AI-assisted exploration, and $\gamma$ scales how often hallucinated leads turn into new dead ends.
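Both failure-mode adjustments can be sketched together. The values of $\beta$ and $\gamma$, and the example inputs below, are illustrative assumptions only; the text does not pin them down:

```python
# Failure-mode adjustments (Eqs. 8-9). beta and gamma are NOT specified
# in the model text; the values here are illustrative assumptions.
def expected_pubs_aug(e_aug, p_pub, p_halluc=0.125):
    """Eq. 8: discount augmented output by the hallucination loss rate."""
    return e_aug * p_pub * (1 - p_halluc)

def p_dead_aug(p_dead_trad, p_halluc=0.125, beta=0.2, gamma=0.5):
    """Eq. 9: AI trims some dead ends (beta), hallucinations add new ones (gamma)."""
    return p_dead_trad * (1 - beta) + gamma * p_halluc

# Example with 10 completed directions and the ~0.061 pipeline probability:
print(round(expected_pubs_aug(10.0, 0.061), 3))
print(round(p_dead_aug(0.30), 4))
```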

8. Sensitivity Analysis

| Parameter | Range | Effect on $R$ |
|---|---|---|
| Mean skill $\mu_s$ | 0.3-0.7 | Lower skill → higher $R$ |
| Skill-bias $\alpha$ | 0.2-0.6 | Higher $\alpha$ → higher $R$ |
| Hallucination rate $p_{halluc}$ | 0.05-0.25 | Higher → lower $R$ |
| Time $T$ | 1-5 years | Longer $T$ → higher $R$ |
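The $\alpha$ row of the table can be reproduced by sweeping Eq. 6, holding the other parameters at their Section 3-5 values:

```python
# Sensitivity of the output ratio R to the skill-bias coefficient alpha:
# higher alpha amplifies gains for below-mean-skill researchers, raising R.
s_bar, mu_s, T = 1.220, 0.5, 2
eta_aug, eta_trad = 0.20, 0.15

def R(alpha):
    e_s_eff = s_bar * (1 + alpha * (1 - mu_s))      # Eq. 3
    return (1 + eta_aug * T / 2) / (1 + eta_trad * T / 2) * e_s_eff  # Eq. 6

for alpha in (0.2, 0.4, 0.6):
    print(alpha, round(R(alpha), 3))
```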

9. Summary

| Metric | Traditional | Augmented | Delta |
|---|---|---|---|
| Avg speedup | 1.00x | 1.22x | +22% |
| Skill-biased effective | 1.00x | 1.46x | +46% |
| Output ratio ($T$ = 2 yr) | 1.00x | 1.53x | +53% |

10. Data Sources

  1. Peng et al. 2023 — arXiv:2302.06590. 55.8% faster on simple tasks.
  2. METR — Independent study. 0% gain for experienced devs.
  3. Noy & Zhang 2023 — Science 381(6654). +40% for below-median writers.
  4. Dell'Acqua et al. 2023 — Harvard/BCG. 23% WORSE outside AI frontier.
  5. Open Science Collaboration 2015 — Science 349(6251). 36-39% replication.
  6. Feldon et al. 2017 — CBE-Life Sciences Education. Task time allocation.
  7. Walters & Wilder 2023 — J. Academic Librarianship. 47-100% reference fabrication.

11. Conclusion

The mathematical model predicts an output factor of ~1.5x, not 3-5x. This emerges from task-specific speedups, skill bias, and validation overhead, and is consistent with METR's finding of 0% gain for experienced developers.

12. Disclaimer

This is not a rigorous study. The model is a structured estimate, not a peer-reviewed result. Key limitations and what a rigorous version would require: