ClawBio

🦖 ClawBio Architecture

Overview

ClawBio is a collection of modular AI agent skills for bioinformatics, designed around three principles: local-first execution, reproducible analysis, and composable workflows.

System Design

                    ┌──────────────────────┐
                    │    User Request      │
                    │  (natural language)  │
                    └──────────┬───────────┘
                               │
                    ┌──────────▼───────────┐
                    │   Bio Orchestrator   │
                    │  (routing + planning │
                    │   + report assembly) │
                    └──────────┬───────────┘
                               │
              ┌────────────────┼─────────────────┐
              │                │                 │
    ┌─────────▼────────┐ ┌─────▼───────┐ ┌───────▼─────────┐
    │  Equity Scorer   │ │ Seq Wrangler│ │ Struct Predictor│
    │  VCF Annotator   │ │ scRNA Orch  │ │ Lit Synthesizer │
    │                  │ │             │ │ Repro Enforcer  │
    └─────────┬────────┘ └─────┬───────┘ └───────┬─────────┘
              │                │                 │
              └────────────────┼─────────────────┘
                               │
                    ┌──────────▼───────────┐
                    │   Output Layer       │
                    │  - Markdown report   │
                    │  - Figures (PNG/SVG) │
                    │  - Audit log         │
                    │  - Repro bundle      │
                    └──────────────────────┘

Routing Logic

The Bio Orchestrator routes requests based on:

  1. File extension: .vcf -> equity-scorer/vcf-annotator, .fastq -> seq-wrangler, etc.
  2. Keyword matching: “diversity” -> equity-scorer, “structure” -> struct-predictor, etc.
  3. User intent: Explicit skill names override automatic routing.
  4. Chaining: Multi-step requests trigger sequential skill invocation with output piping.
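
The first three rules above can be sketched as a small lookup function. This is a minimal illustration, not the orchestrator's actual code: the table contents come from the examples in the list, while the names `EXTENSION_ROUTES`, `KEYWORD_ROUTES`, `KNOWN_SKILLS`, and `route` are hypothetical. Chaining (rule 4) is left out.

```python
from pathlib import Path

# Hypothetical routing tables built from the examples above;
# the real orchestrator's entries may differ.
EXTENSION_ROUTES = {
    ".vcf": ["equity-scorer", "vcf-annotator"],
    ".fastq": ["seq-wrangler"],
}
KEYWORD_ROUTES = {
    "diversity": "equity-scorer",
    "structure": "struct-predictor",
}
KNOWN_SKILLS = {"equity-scorer", "vcf-annotator", "seq-wrangler", "struct-predictor"}


def route(request: str, files: list[str]) -> list[str]:
    """Return candidate skills for a request."""
    text = request.lower()

    # Rule 3: an explicit skill name overrides automatic routing.
    explicit = [skill for skill in sorted(KNOWN_SKILLS) if skill in text]
    if explicit:
        return explicit

    candidates: list[str] = []
    # Rule 1: route by file extension.
    for name in files:
        for skill in EXTENSION_ROUTES.get(Path(name).suffix.lower(), []):
            if skill not in candidates:
                candidates.append(skill)
    # Rule 2: route by keyword in the request text.
    for keyword, skill in KEYWORD_ROUTES.items():
        if keyword in text and skill not in candidates:
            candidates.append(skill)
    return candidates
```

Under these assumptions, `route("annotate my variants", ["sample.vcf"])` returns both VCF skills, while naming `seq-wrangler` in the request returns only that skill regardless of the file type.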

Skill Independence

Every skill works standalone: a user can invoke any skill directly without the orchestrator. The Bio Orchestrator adds routing, planning, and report assembly on top.

Data Flow

Input File(s)
    │
    ▼
Validation (file type, format, size checks)
    │
    ▼
Processing (skill-specific computation)
    │
    ▼
Results (tables, metrics, intermediate files)
    │
    ▼
Visualisation (matplotlib/seaborn figures)
    │
    ▼
Report Assembly (markdown + embedded figures)
    │
    ▼
Reproducibility Export (conda env, commands, checksums)
    │
    ▼
Audit Log Append (timestamped action record)
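
One way to picture this flow is as an ordered list of stage functions, each taking and returning a state dictionary. The sketch below is an assumption for illustration only: `run_pipeline`, `validate`, and the state keys are hypothetical names, and it records an audit entry per stage, whereas the diagram shows a single append step at the end.

```python
from datetime import datetime, timezone
from typing import Callable

# A stage takes the current state and returns the updated state.
Stage = Callable[[dict], dict]


def validate(state: dict) -> dict:
    """Placeholder validation: require at least one input file."""
    if not state.get("files"):
        raise ValueError("no input files given")
    return state


def run_pipeline(inputs: dict, stages: list[tuple[str, Stage]]) -> tuple[dict, list[str]]:
    """Run stages in order and collect a timestamped audit entry for each."""
    state = dict(inputs)
    audit_log: list[str] = []
    for name, stage in stages:
        state = stage(state)
        stamp = datetime.now(timezone.utc).isoformat()
        audit_log.append(f"{stamp} {name}")
    return state, audit_log
```

A failed validation raises before any later stage runs, so no processing happens on inputs that did not pass the checks.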

Privacy Model

ClawBio enforces a strict local-first privacy model: all computation stays local.

Reproducibility Contract

Every analysis produces:

  1. commands.sh: Exact shell commands to reproduce the analysis without the agent.
  2. environment.yml: Conda environment specification with pinned versions.
  3. checksums.sha256: SHA-256 hashes of all input files.
  4. analysis_log.md: Timestamped record of every action taken.

This means any result can be reproduced on any machine with the same inputs, independent of OpenClaw.
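
As an illustration of item 3, the checksum file could be produced with Python's standard `hashlib`. This is a sketch under assumptions, not ClawBio's implementation: the helper names `sha256_of` and `write_checksums` are hypothetical, and the line format (`<hash>`, two spaces, `<path>`) is chosen because it is the one `sha256sum -c` reads.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(inputs: list[Path], out_file: Path) -> None:
    """Write one '<hash>  <path>' line per input file."""
    lines = [f"{sha256_of(p)}  {p}" for p in inputs]
    out_file.write_text("\n".join(lines) + "\n")
```

With a file written this way, running `sha256sum -c checksums.sha256` on another machine confirms that the inputs are byte-identical before the commands are rerun.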

Skill Packaging

Each skill is a directory containing:

skill-name/
├── SKILL.md          # Required: YAML frontmatter + markdown instructions
├── skill_name.py     # Optional: Python implementation
├── utils.py          # Optional: shared utilities
├── tests/            # Optional: test cases
│   └── test_skill.py
└── examples/         # Optional: example inputs/outputs
    ├── input.vcf
    └── expected_output.md
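
A layout like this is easy to check mechanically. The sketch below only tests the one hard requirement shown in the tree, a SKILL.md that opens with frontmatter. The function name `check_skill_dir` is hypothetical, and the assumption that frontmatter is delimited by `---` lines is the common YAML frontmatter convention rather than something this document specifies.

```python
from pathlib import Path


def check_skill_dir(skill_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["missing required SKILL.md"]

    problems: list[str] = []
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    # Assumption: frontmatter is delimited by '---' lines at the top of the file.
    if not lines or lines[0].strip() != "---":
        problems.append("SKILL.md does not start with YAML frontmatter")
    elif "---" not in (line.strip() for line in lines[1:]):
        problems.append("SKILL.md frontmatter is not closed")
    return problems
```

The optional files (Python code, tests, examples) are deliberately not checked, since a skill consisting only of SKILL.md is valid under the layout above.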

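The SKILL.md is the primary artifact. The Python files are supporting code that the agent invokes via shell commands. This separation means the same shell commands can be rerun without the agent, which is what commands.sh in the Reproducibility Contract records.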

Integration with OpenBio

The existing OpenBio skill provides API access to external databases such as PDB.

ClawBio skills can call OpenBio for database lookups while keeping all computation local. For example, Struct Predictor might use OpenBio to fetch a reference structure from PDB, then run local AlphaFold for comparison.

Extensibility

New skills follow the template at templates/SKILL-TEMPLATE.md. The Bio Orchestrator routing table is designed to be extended: add a new entry mapping file types or keywords to your skill, and the orchestrator routes to it automatically.
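
The format of the routing table is not shown here, so the following is only a sketch of what "adding an entry" could look like if the table were a pair of dictionaries. `RoutingTable`, `register`, and the skill name `phylo-builder` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Hypothetical routing table: maps extensions and keywords to skill names."""

    by_extension: dict[str, list[str]] = field(default_factory=dict)
    by_keyword: dict[str, list[str]] = field(default_factory=dict)

    def register(self, skill: str, extensions=(), keywords=()) -> None:
        """Add a skill under each given extension and keyword (case-insensitive)."""
        for ext in extensions:
            self.by_extension.setdefault(ext.lower(), []).append(skill)
        for word in keywords:
            self.by_keyword.setdefault(word.lower(), []).append(skill)


# Example: registering a made-up skill for Newick tree files.
table = RoutingTable()
table.register("phylo-builder", extensions=[".nwk"], keywords=["phylogeny"])
```

Keeping registration additive means a new skill cannot silently replace an existing route; both skills stay listed as candidates for the same extension or keyword.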

Community submissions go through ClawHub or a direct PR to this repository. 🦖