# Getting Started
skill-up is an evaluation tool for Agent Skill developers. Use it to verify that your Skill behaves correctly inside real Agent Engines (Claude Code, Codex, Qoder CLI) and to run regression tests locally or in CI.
## Installation

### Install with the script
```shell
curl -fsSL https://raw.githubusercontent.com/alibaba/skill-up/main/install.sh | bash
```

The installer downloads the matching binary from GitHub Releases.
To build from a local checkout instead, install Go 1.25 or later and run:
```shell
make build
# or
go build -o bin/skill-up ./cmd/skill-up
```

### Verify the install
```shell
skill-up --version
```

## Core concepts
To evaluate a Skill with skill-up you need two things:
- `eval.yaml` — the entrypoint config that declares the runtime environment, the Agent Engine and model, and the global grading strategy.
- `case.yaml` — a single evaluation case that defines the prompt sent to the Agent, the expected output, and grading rules.
They live inside the `evals/` folder of your Skill:
```
my-skill/
  SKILL.md              # Your Skill definition
  evals/                # Evaluation root
    eval.yaml           # Entrypoint config
    cases/              # One file per case
      basic-test.yaml
      edge-case.yaml
    fixtures/           # Optional test resources
      repos/            # Repository templates
      scripts/          # Grading scripts
```

## 5-minute quick start
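If you are starting from scratch, the layout shown above can be scaffolded with standard shell commands before following the steps below (file and directory names are taken from the example tree):

```shell
# Create the evals/ layout for a new Skill
mkdir -p my-skill/evals/cases
mkdir -p my-skill/evals/fixtures/repos my-skill/evals/fixtures/scripts

# Empty placeholders for the Skill definition and eval configs
touch my-skill/SKILL.md my-skill/evals/eval.yaml
touch my-skill/evals/cases/basic-test.yaml my-skill/evals/cases/edge-case.yaml
```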
### Step 1 — Create the eval config
Inside your Skill directory, create `evals/eval.yaml`:
```yaml
schema_version: v1alpha1

environment:
  type: none          # Plain-text Skills don't need an isolated container

engine:
  name: claude_code   # Use Claude Code as the Agent Engine

cases:
  files:
    - evals/cases/hello-world.yaml
```

Tip: When `evals/eval.yaml` lives under a directory that contains `SKILL.md`, skill-up installs the current Skill automatically. The omitted fields use defaults: JSON report output, `timeout_seconds: 300`, `max_turns: 10`, and `parallelism: 1`. Add `engine.model`, `skills`, `cases.defaults`, or `report` only when you need to override them.
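For illustration only, the same config with those defaults spelled out might look like the sketch below. The field names come from the tip above, but their exact placement in the schema is an assumption (the `<model-id>` value is a placeholder); see Writing Evals for the authoritative layout.

```yaml
schema_version: v1alpha1
environment:
  type: none
engine:
  name: claude_code
  model: <model-id>          # engine.model — pin a specific model (placeholder value)
cases:
  files:
    - evals/cases/hello-world.yaml
  defaults:                  # cases.defaults — assumed location for per-case defaults
    timeout_seconds: 300
    max_turns: 10
parallelism: 1               # assumed top-level key; this is the default anyway
report:
  format: json               # assumed key; JSON report output is the default
```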
For the full eval.yaml schema, see Writing Evals.
### Step 2 — Write an Eval Case
Create `evals/cases/hello-world.yaml`:
```yaml
input:
  prompt: |
    Please generate a Hello World program

expect:
  must_contain:
    - "Hello"
    - "World"
  must_not_contain:
    - "error"
```

The case `id` defaults to the filename (`hello-world`). Add a `judge` block only when you need script-based or agent-based grading.
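The `expect` rules above read as substring checks. skill-up's actual grader may be richer, but under that assumption the pass/fail logic can be sketched in plain shell:

```shell
# Sketch of must_contain / must_not_contain grading as simple substring
# checks (an assumption about skill-up's semantics, not its real grader).
output="Hello, World!"        # stand-in for the Agent's final output
ok=true
for want in "Hello" "World"; do               # must_contain
  case "$output" in *"$want"*) ;; *) ok=false ;; esac
done
for bad in "error"; do                        # must_not_contain
  case "$output" in *"$bad"*) ok=false ;; esac
done
$ok && echo "PASS" || echo "FAIL"             # prints PASS for this output
```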
### Step 3 — Validate the config
This step is optional, but useful before the first run: it checks `eval.yaml` and all referenced case files without starting an Agent Engine.
```shell
skill-up validate
```

On success you should see:

```
✓ eval.yaml is valid (loaded 1 case(s))
```

### Step 4 — Run the evaluation
```shell
skill-up run
```

You will see output similar to:

```
Running 1 case(s) with agent claude_code
[Runner] Running 1 cases with agent claude_code
[Evaluator] Skill installed: <skill-name>
[Evaluator] Running case hello-world (with_skill): Skill should respond to a basic request
[Evaluator] Case hello-world: PASS (pass_rate: 100.0%)
[INFO] Results written to ./<skill-name>-workspace/iteration-1
```

## Next steps
- Writing Evals — full reference for `eval.yaml` and case files.
- CLI Reference — every command and flag.
- Migrating from Anthropic — if you already have an Anthropic skill-creator `evals.json`.
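The introduction mentions running evaluations in CI. A minimal GitHub Actions sketch using only the commands shown in this guide (the workflow path, runner image, and secret name are assumptions to adapt to your setup):

```yaml
# .github/workflows/skill-evals.yml — minimal CI sketch
name: skill evals
on: [push, pull_request]
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install skill-up
        run: curl -fsSL https://raw.githubusercontent.com/alibaba/skill-up/main/install.sh | bash
      - name: Validate configs
        run: skill-up validate
      - name: Run evaluations
        run: skill-up run
        env:
          # Assumed secret name — Claude Code needs API credentials to run
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```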
