📹 Tutorial Videos

Presentation Tutorial


How to design, organize, and execute a generic multi-agent system for JAMA-style paper generation.

This guide explains the architecture and operation of the BST236 Midterm Project Workflow—a system designed to turn raw public health data into publication-ready research papers with minimal human intervention.


1. Executive Summary

1.1 Why This Workflow Exists

Research is often slowed by the friction of switching between data analysis, literature review, and manuscript drafting. This workflow eliminates that friction by:

  1. Automating the full pipeline, from data exploration through statistical analysis, visualization, and literature review to manuscript drafting.
  2. Delegating each phase to a specialized agent with its own quality standards and checklists.
  3. Producing publication-ready outputs (PDF, slides, website) without manual reformatting.

1.2 The Goal

To move from Raw Data → Publication-Ready PDF + Presentation Materials in under 60 minutes.


2. Getting Started: Installation & Setup

2.1 System Requirements

Our workflow is designed to be cross-platform; platform-specific setup notes (such as the macOS PATH configuration for LaTeX) are flagged in the steps below.

2.2 Software Prerequisites

You will need three core components installed on your system:

2.2.1 Python Environment (3.9+)

The orchestrator and analysis scripts require standard scientific computing packages.

# Install required Python packages
pip install pandas numpy scipy statsmodels matplotlib seaborn scikit-learn openpyxl

2.2.2 LaTeX Distribution

This is required for generating the final publication-ready PDF.

2.2.3 AI Assistant (Optional)

If you wish to use the multi-agent system via a CLI assistant, we recommend the GitHub Copilot CLI:

gh auth login
gh extension install github/gh-copilot

2.3 Repository Setup

  1. Clone the Workflow:
    git clone https://github.com/your-squad/midterm-project.git
    cd midterm-project
  2. Verify Installation:

    Run a quick sanity check to ensure all dependencies are met:

    python -c "import pandas, statsmodels, matplotlib; print('✅ Python OK')"
    pdflatex --version | head -n 1
Important
PATH Issues: Ensure that pdflatex and bibtex are in your system's PATH. On macOS, you may need to add /Library/TeX/texbin to your PATH manually.
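On macOS with MacTeX, the fix is usually a one-line PATH addition (/Library/TeX/texbin is the MacTeX default location; adjust the path if your distribution installs elsewhere):

```shell
# Add MacTeX's binary directory to the PATH for the current shell session
export PATH="/Library/TeX/texbin:$PATH"

# To make this permanent, add the line above to your shell profile (e.g. ~/.zshrc)
```

After updating the PATH, re-run the sanity check above to confirm pdflatex resolves.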

3. Quick Start

3.1 Method 1: The Python Orchestrator (Script Mode)

The fastest way to execute the workflow is via the Python orchestrator. This is recommended for the "90-minute exam" scenario where stability is key. Place your datasets in a folder (e.g., exam_data/) and run:

# Run the complete end-to-end workflow
python workflow/orchestrator.py exam_data/ -o exam_paper/

3.2 Method 2: The AI CLI (Agent Mode)

For interactive control or fine-tuning, you can use an AI CLI (like Gemini CLI or Claude Code) to trigger the agents directly. This allows the AI to "think" through each step and handle complex edge cases that a static script might miss.

The "Power Prompt":

Simply point the AI to the workflow/ folder and issue a high-level directive:

"Use the multi-agent workflow system in workflow/ to generate a JAMA Network Open paper 
from the data in exam_data/. 

1. Follow the Agent definitions in workflow/agents/
2. Use the Skills provided in workflow/skills/
3. Adhere to the Quality Standards in workflow/prompts/

Deliver the final PDF to the exam_paper/ directory."
Pro Tip
When using Agent Mode, you can pause between phases to review research_question.md or analysis_plan.md before the AI proceeds to the statistical modeling phase.
Verification
After the workflow finishes, check exam_paper/paper.pdf. If you see "Reference ??", it means the BibTeX phase encountered an error; check exam_paper/paper.log for details.
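A quick way to confirm whether BibTeX resolved everything is to scan the compile log for undefined references (the path below assumes the default exam_paper/ output directory):

```shell
# List any unresolved citations or cross-references recorded in the compile log
grep -n "undefined" exam_paper/paper.log 2>/dev/null || echo "log not found (or no undefined references)"
```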

4. System Architecture

4.1 The 3-Layer Design Pattern

Our workflow is built on a modular "3-layer cake" architecture, ensuring that domain expertise is separated from technical implementation.

Agent Layer (The Brains)
Skills Layer (The Hands)
Prompts Layer (The Rulebook)

4.1.1 Layer 1: The Agent Layer (workflow/agents/)

Specialized AI entities with specific "personalities" and goals.

4.1.2 Layer 2: The Skills Layer (workflow/skills/)

Atomic, reusable code libraries that agents "invoke" to perform work.
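To illustrate what "atomic and reusable" means here, a skill can be as small as a single formatting helper. The function below is a hypothetical example, not an actual file in workflow/skills/:

```python
# Hypothetical "skill": render one effect estimate as a JAMA-style LaTeX table row.
def format_latex_row(label: str, estimate: float, ci: tuple) -> str:
    """Format a point estimate with its 95% CI for a results table."""
    low, high = ci
    return f"{label} & {estimate:.2f} ({low:.2f}-{high:.2f}) \\\\"
```

For example, format_latex_row("Age, per year", 1.04, (1.01, 1.07)) produces the row "Age, per year & 1.04 (1.01-1.07) \\". Because each skill is this small and side-effect free, any agent can invoke it without knowing the rest of the pipeline.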

4.1.3 Layer 3: The Prompts Layer (workflow/prompts/)

The "Rulebook" that defines quality standards and formatting requirements.


5. The 7-Phase Execution Loop

The orchestrator.py script executes the following sequence:

Phase             Script                     Agent                Output
1. Exploration    01_data_explorer.py        Data Explorer        research_question.md, analysis_plan.md
2. Analysis       02_statistician.py         Statistician         results/, analysis_code.py
3. Visualization  03_visualizer.py           Visualizer           figures/*.pdf, tables/*.tex
4. Literature     04_literature_reviewer.py  Literature Reviewer  references.bib, citations.md
5. Writing        05_paper_writer.py         Paper Writer         paper.tex (complete JAMA draft)
6. Review         06_quality_controller.py   Quality Controller   review_report.md, revised .tex
7. Production     07_post_production.py      Post-Production      slides/, website/, social/
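Conceptually, the orchestrator reduces to a sequential loop over these phase scripts. The sketch below assumes each script takes the data and output directories as positional arguments; the real orchestrator.py may add flags and error recovery:

```python
import subprocess
from pathlib import Path

# Phase scripts in execution order, matching the table above
PHASE_SCRIPTS = [
    "01_data_explorer.py",
    "02_statistician.py",
    "03_visualizer.py",
    "04_literature_reviewer.py",
    "05_paper_writer.py",
    "06_quality_controller.py",
    "07_post_production.py",
]

def build_phase_commands(data_dir: str, out_dir: str,
                         workflow_dir: str = "workflow/scripts") -> list:
    """Return the shell command for each phase, in execution order."""
    return [
        ["python", str(Path(workflow_dir) / script), data_dir, out_dir]
        for script in PHASE_SCRIPTS
    ]

def run_all_phases(data_dir: str, out_dir: str) -> None:
    """Run every phase sequentially, aborting if any phase fails."""
    for cmd in build_phase_commands(data_dir, out_dir):
        subprocess.run(cmd, check=True)  # check=True stops the pipeline on error
```

Because each phase writes its outputs to disk, a failed run can be resumed by invoking the remaining scripts individually (see Contractor Mode below).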

6. Operational Patterns

6.1 Pattern: The Adversarial Critic Loop

The system doesn't just "finish" at Phase 5. Phase 6 invokes the Quality Controller Agent to audit the paper against the review-checklist.md. If the score is below the "95% Human-Quality" threshold, the Writer is re-activated with specific correction instructions.

6.2 Pattern: Contractor Mode (Manual Intervention)

While orchestrator.py handles the big picture, you can use the CLI to drop into a specific phase and re-run it on its own.

# Example: Only regenerate figures if the data changed
python workflow/03_visualizer.py exam_data/ exam_paper/

6.3 Pattern: Post-Paper Products

Once the paper is finalized, the Post-Production Agent automatically converts the LaTeX content into:

  1. Beamer Slides: A 15-slide summary of the findings.
  2. Interactive Website: A Plotly-based dashboard for exploring the results.
  3. Social Media Kit: Summaries for LinkedIn and Twitter.
Pro Tip
Use the interactive website to verify that the numbers in the "Results" section match the raw data visualization. It's a great sanity check!

7. Reference Material

7.1 Appendix: File Structure

/
├── workflow/
│   ├── agents/    # Specialized AI instructions
│   ├── skills/    # Reusable code libraries (Python/LaTeX)
│   ├── prompts/   # Quality standards & checklists
│   └── scripts/   # Phase-specific execution logic (01-07)
├── exam_paper/    # The generated output folder
│   ├── paper.pdf  # Main Deliverable
│   ├── figures/   # Vector PDFs
│   └── post_paper_products/ # Slides & Website
└── sample/        # Testing datasets

7.2 Troubleshooting Guide

Issue                    Potential Cause                               Recommended Fix
LaTeX Compilation Error  Missing figure or special character ($ or %)  Check paper.log and verify all figure paths in the .tex file.
Phase 2 Error            Dataset too large or non-numeric data         Check data_summary.md for variable type mismatches.
Agent Refusals           Over-sensitive safety filters                 Ensure your instructions are strictly scientific/medical.
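Many LaTeX compilation errors trace back to unescaped special characters in data-derived strings. A small helper like the hypothetical one below, applied before text is written into the .tex file, prevents most of them:

```python
# Hypothetical helper: map each LaTeX special character to its escaped form
LATEX_SPECIALS = {"$": r"\$", "%": r"\%", "&": r"\&", "#": r"\#", "_": r"\_"}

def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in data-derived text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)
```

For example, escape_latex("Top 5% earned $120k") returns "Top 5\% earned \$120k", which compiles cleanly.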
Security First
Never store API keys or database credentials in the workflow/ folder. Use environment variables for all sensitive configuration.