🦖 ClawBio

The first bioinformatics-native AI agent skill library.
Built on OpenClaw (180k+ GitHub stars). Local-first. Privacy-focused. Reproducible.

CI · Python 3.10+ · MIT License · ClawHub Skills · Open Issues · Slides


See It in Action

A community contributor built a nutrigenomics skill and ran it end to end, from raw genetic data to a personalised nutrition report with radar charts, heatmaps, and a reproducibility bundle:

https://github.com/ClawBio/ClawBio/releases/download/v0.2.0/david-nutrigx-demo.mp4

What just happened behind the scenes

  1. The AI agent read `SKILL.md`, a specification that encodes the correct bioinformatics decisions (40 SNPs, 13 nutrient domains, evidence-based risk thresholds).
  2. It ran the Python skill **locally**; no genetic data left the machine.
  3. It produced a markdown report with figures, tables, and a **reproducibility bundle** (`commands.sh`, `environment.yml`, `checksums.sha256`).
  4. Anyone can re-run the exact same analysis and get identical results, SHA-256 verified.

ClawBio PharmGx Demo
PharmGx Reporter: 12 genes, 51 drugs, under 1 second


The Problem

You read a paper. You want to reproduce Figure 3. So you:

  1. Go to GitHub. Clone the repo.
  2. Wrong Python version. Fix dependencies.
  3. Need the reference data β€” where is it?
  4. Download 2GB from Zenodo. Link is dead.
  5. Email the first author. Wait 3 weeks.
  6. Paths are hardcoded to /home/jsmith/data/.
  7. Two days later: still broken. You give up.

Now imagine the same paper published a skill:

python ancestry_pca.py --demo --output fig3
# Figure 3 reproduced. Identical. SHA-256 verified. 30 seconds.

That's ClawBio. Every figure in your paper should be one command away from reproduction.


🦖 What Is ClawBio?

A skill is a domain expert's knowledge, frozen into code, that an AI agent executes correctly every time.

ChatGPT / Claude  = a smart generalist who guesses at bioinformatics
🦖 ClawBio skill  = a domain expert's proven pipeline that the AI executes

Why Not Just Use ChatGPT?

Ask Claude to "profile my pharmacogenes from this 23andMe file." It'll write plausible Python, but plausible is not the same as correct.

ClawBio encodes the correct bioinformatics decisions so the agent gets it right first time, every time.


Provenance & Reproducibility

Every ClawBio analysis ships with a reproducibility bundle, not as an afterthought but as part of the output:

report/
├── report.md              # Full analysis with figures and tables
├── figures/               # Publication-quality PNGs
├── tables/                # CSV data tables
├── commands.sh            # Exact commands to reproduce
├── environment.yml        # Conda environment snapshot
└── checksums.sha256       # SHA-256 of every input and output file

Why this matters: a reviewer can re-run your analysis in 30 seconds. A collaborator can reproduce your Figure 3 without emailing you. Future-you can regenerate results two years later from the same bundle.
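As a rough illustration of what checking a bundle could involve, the sketch below recomputes SHA-256 digests and compares them with `checksums.sha256`. It assumes the manifest uses the common `sha256sum` layout (one `<hex digest>  <relative path>` pair per line), which this README does not specify, and `verify_bundle` is a hypothetical helper rather than part of ClawBio.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: str, manifest: str = "checksums.sha256") -> dict[str, bool]:
    """Map each path listed in the manifest to True if its digest matches.

    Assumes sha256sum-style lines: "<hex digest>  <relative path>".
    Files that are listed but missing are reported as False.
    """
    root = Path(bundle_dir)
    results: dict[str, bool] = {}
    for line in (root / manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        target = root / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_bundle("report")` on an intact bundle would be expected to return `True` for every listed file; any `False` entry points at a file that changed or went missing.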


🦖 Skills

| Skill | Status | Description |
|---|---|---|
| Bio Orchestrator | MVP | Routes bioinformatics requests to the right specialist skill |
| PharmGx Reporter | MVP | Pharmacogenomic report: 12 genes, 51 drugs, CPIC guidelines |
| Ancestry PCA | MVP | PCA decomposition vs SGDP (345 samples, 164 global populations) |
| Semantic Similarity | MVP | Semantic Isolation Index for 175 GBD diseases from 13.1M PubMed abstracts |
| Equity Scorer | Planned | HEIM diversity metrics from VCF/ancestry data |
| VCF Annotator | Planned | Variant annotation with VEP, ClinVar, gnomAD + ancestry context |
| Lit Synthesizer | Planned | PubMed/bioRxiv search with LLM summarisation and citation graphs |
| scRNA Orchestrator | Planned | Scanpy automation: QC, clustering, DE analysis, visualisation |
| Struct Predictor | Planned | AlphaFold/Boltz local structure prediction |
| Repro Enforcer | Planned | Export any analysis as Conda env + Singularity + Nextflow pipeline |

🦖 MVP Skills in Detail

PharmGx Reporter: Personal Scale

Generates a pharmacogenomic report from consumer genetic data (23andMe, AncestryDNA):

python pharmgx_reporter.py --input demo_patient.txt --output report

Demo result: CYP2D6 *4/*4 (Poor Metabolizer) → 10 drugs AVOID (codeine, tramadol, 7 TCAs, tamoxifen), 20 caution, 21 standard.

~7% of people are CYP2D6 Poor Metabolizers; codeine gives them zero pain relief. ~0.5% carry DPYD variants for which a standard 5-FU dose can be lethal. This skill catches both.
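To make the diplotype-to-phenotype step concrete, here is a minimal sketch of an activity-score lookup for CYP2D6. The allele values and thresholds are assumptions modelled on the CPIC activity-score convention and cover only four alleles; they should be checked against the current CPIC tables, and none of the names below come from the PharmGx Reporter source.

```python
# Illustrative activity values for a handful of CYP2D6 star alleles.
# Assumption: these follow the CPIC activity-score convention; verify
# against the current CPIC allele functionality table before real use.
ALLELE_ACTIVITY = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*5": 0.0}

# Individual drugs the demo result above lists as AVOID for a Poor Metabolizer.
AVOID_FOR_POOR_METABOLIZER = ("codeine", "tramadol", "tamoxifen")


def cyp2d6_phenotype(diplotype: str) -> str:
    """Translate a diplotype such as '*4/*4' into a metabolizer phenotype."""
    allele_a, allele_b = diplotype.split("/")
    score = ALLELE_ACTIVITY[allele_a] + ALLELE_ACTIVITY[allele_b]
    if score == 0:
        return "Poor Metabolizer"
    if score < 1.25:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"


def drugs_to_avoid(diplotype: str) -> tuple[str, ...]:
    """Return the illustrative AVOID list, or an empty tuple for other phenotypes."""
    if cyp2d6_phenotype(diplotype) == "Poor Metabolizer":
        return AVOID_FOR_POOR_METABOLIZER
    return ()


print(cyp2d6_phenotype("*4/*4"))  # Poor Metabolizer
```

A real report would also need star-allele calling from the raw genotype file, copy-number handling, and per-gene guideline tables, all of which this sketch leaves out.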

Ancestry PCA: Population Scale

Runs principal component analysis on your cohort against the SGDP reference panel (345 samples, 164 global populations):

python ancestry_pca.py --demo --output ancestry_report

Demo result: 736 Peruvian samples across 28 indigenous populations. Amazonian groups (Matzes, Awajun, Candoshi) sit in genetic space that no SGDP population occupies: genuinely underrepresented, not just in GWAS but in the reference panels themselves.
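The idea of placing a cohort "against" a reference panel can be sketched as fitting principal axes on the reference samples and projecting the cohort onto them. The example below uses NumPy on random toy genotypes; the function names are hypothetical, and a real pipeline would likely add steps such as variant filtering, LD pruning, and genotype normalisation that are not shown here.

```python
import numpy as np


def fit_reference_pca(ref_genotypes: np.ndarray, n_components: int = 2):
    """Fit PCA on a (samples x variants) reference genotype matrix.

    Returns the per-variant mean and the top principal axes.
    """
    mean = ref_genotypes.mean(axis=0)
    centered = ref_genotypes - mean
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(genotypes: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project samples into the PC space defined by the reference panel."""
    return (genotypes - mean) @ axes.T


rng = np.random.default_rng(0)
# Toy data: allele counts (0, 1, 2) standing in for real genotypes.
reference = rng.integers(0, 3, size=(345, 500)).astype(float)
cohort = rng.integers(0, 3, size=(20, 500)).astype(float)

mean, axes = fit_reference_pca(reference)
cohort_pcs = project(cohort, mean, axes)
print(cohort_pcs.shape)  # (20, 2)
```

Fitting the axes on the reference panel only, rather than on the pooled data, keeps the coordinate system fixed, so cohort samples that land far from every reference cluster stand out.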

Semantic Similarity Index: Systemic Scale

Computes a Semantic Isolation Index for diseases using 13.1M PubMed abstracts and PubMedBERT embeddings (768-dim):

python semantic_sim.py --demo --output sem_report

Key finding: Neglected tropical diseases are 38% more semantically isolated (P < 0.0001, Cohen's d = 0.84). 14 of the 25 most isolated diseases are Global South priority conditions. Knowledge silos kill innovation: a malaria immunology breakthrough could help leishmaniasis, but the literatures don't talk to each other.
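This README does not define the Semantic Isolation Index. Purely to illustrate the general idea, the sketch below scores each disease as one minus its mean cosine similarity to every other disease's embedding centroid. That definition, the function names, and the toy 3-dimensional vectors are all assumptions; the actual index and the 768-dimensional PubMedBERT embeddings may well be computed differently.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def isolation_scores(centroids: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical score: 1 - mean cosine similarity to all other diseases."""
    scores = {}
    for name, vec in centroids.items():
        sims = [
            cosine_similarity(vec, other)
            for key, other in centroids.items()
            if key != name
        ]
        scores[name] = 1.0 - sum(sims) / len(sims)
    return scores


# Toy centroids with made-up numbers, not real embeddings.
toy = {
    "disease_a": [1.0, 0.0, 0.0],
    "disease_b": [0.9, 0.1, 0.0],
    "disease_c": [0.0, 0.0, 1.0],
}
scores = isolation_scores(toy)
most_isolated = max(scores, key=scores.get)
print(most_isolated)  # disease_c
```

Here `disease_c` is orthogonal to the other two centroids, so it receives the highest score, which is the kind of pattern an isolation measure is meant to surface.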

Corpas et al. (2026). HEIM: Health Equity Index for Measuring structural bias in biomedical research. Under review.


Quick Start

Prerequisites

Install and run

# Install a skill
openclaw install skills/pharmgx-reporter

# Run with natural language
openclaw "Profile the pharmacogenes in my 23andMe file at data/raw_genotype.txt"

# Or run directly
python skills/pharmgx-reporter/pharmgx_reporter.py --input data/raw_genotype.txt --output report

Every skill includes demo data so you can try it immediately without your own files.


🦖 Architecture

User: "Analyse the diversity in my VCF file"
         │
  ┌──────▼───────┐
  │  Bio         │  ← routes by file type + keywords
  │  Orchestrator│
  └──────┬───────┘
         │
  ┌──────▼───────────────────────────────────────────┐
  │                                                  │
  PharmGx    Ancestry    Semantic    Equity    VCF
  Reporter   PCA         Similarity  Scorer    Annotator ...
  │                                                  │
  └──────┬───────────────────────────────────────────┘
         │
  ┌──────▼───────┐
  │  Markdown    │  ← report + figures + checksums
  │  Report      │     + reproducibility bundle
  └──────────────┘

Each skill is standalone: the orchestrator routes to the right one, but every skill also works independently.

See docs/architecture.md for the full design.


Community Wanted Skills 🦖

We want skills from the bioinformatics community. If you work with genomics, proteomics, metabolomics, imaging, or clinical data, wrap your pipeline as a skill.

| Skill | What | Your expertise |
|---|---|---|
| claw-gwas | PLINK/REGENIE automation | Statistical genetics |
| claw-metagenomics | Kraken2/MetaPhlAn wrapper | Microbiome |
| claw-acmg | Clinical variant classification | Clinical genomics |
| claw-pathway | GO/KEGG enrichment | Functional genomics |
| claw-phylogenetics | IQ-TREE/RAxML automation | Evolutionary biology |
| claw-proteomics | MaxQuant/DIA-NN | Proteomics |
| claw-spatial | Visium/MERFISH | Spatial transcriptomics |

See CONTRIBUTING.md for the submission process and templates/SKILL-TEMPLATE.md for the skill template.


Presentation

ClawBio was announced at the London Bioinformatics Meetup on 26 February 2026.


Citation

If you use ClawBio in your research, please cite:

@software{clawbio_2026,
  author = {Corpas, Manuel},
  title = {ClawBio: An Open-Source Library of AI Agent Skills for Reproducible Bioinformatics},
  year = {2026},
  url = {https://github.com/ClawBio/ClawBio}
}

License

MIT: clone it, run it, build a skill, submit a PR. 🦖