The three signals
A SkillScore blends up to three independent kinds of proof:
Benchmark
Example tasks are run with the skill and without it. Did the skill raise the pass rate?
Live eval
On real production runs that were evaluated, did the skill pass?
AI rating
An AI judge scores the quality of real production outputs.
A skill with no evidence yet shows New rather than a number — not a failure, just nothing measured.
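As a rough illustration of how up to three signals could combine, here is a minimal Python sketch. The `Signals` type, the `skill_score_label` function, and the equal-weight average are all assumptions made for this example; this page does not state the actual weighting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical container: each signal is a 0-100 value, or None if unmeasured."""
    benchmark: Optional[float] = None  # evidence from with/without benchmark runs
    live_eval: Optional[float] = None  # pass rate on evaluated production runs
    ai_rating: Optional[float] = None  # AI judge's quality score


def skill_score_label(signals: Signals) -> str:
    """Return "New" when nothing is measured, else a rounded 0-100 score.

    Assumption: the available signals are averaged with equal weight.
    """
    present = [
        value
        for value in (signals.benchmark, signals.live_eval, signals.ai_rating)
        if value is not None
    ]
    if not present:
        return "New"  # no evidence yet: not a failure, just nothing measured
    return str(round(sum(present) / len(present)))
```

Averaging only the signals that are present means a missing signal is treated as unmeasured rather than as a zero, which matches the "New rather than a number" behavior described above.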
Where the evidence comes from
The score is always computed from the evidence attached to that specific skill. What differs is whose runs count and who can see the result:

| The skill is… | Scored from | Inherits a score? | Visible to |
|---|---|---|---|
| Public | Everyone who uses it, anonymized — its cross-org reputation | n/a — it is the public reputation | Everyone |
| Private to your team | Your team’s own runs | No — scored from your team’s data only | Your team |
| A copy you forked | Its own runs — starts at New, earns its own | No — starts fresh, builds its own | Your team |
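The scoping rules in the table can be sketched as a small filter over runs. This is an illustrative Python sketch only: the `Run` type, the `visibility` values, and the `runs_that_count` function are hypothetical names, not part of any documented API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Hypothetical record of one evaluated run of a skill."""
    skill_id: str
    org_id: str
    passed: bool


def runs_that_count(
    runs: List[Run], skill_id: str, visibility: str, team_org: str
) -> List[Run]:
    """Select the runs that feed one skill's score.

    Assumption: visibility is "public", "private", or "fork".
    """
    own = [run for run in runs if run.skill_id == skill_id]
    if visibility == "public":
        # Every org's runs count (anonymization is assumed to happen elsewhere).
        return own
    # Private skills and forks: only the owning team's runs count.
    return [run for run in own if run.org_id == team_org]
```

In this sketch a fork has its own `skill_id`, so it has no runs at first and nothing is carried over from the original.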
Forking starts a fresh score
When you fork a skill you get an independent, editable copy. Because you can change it, it does not inherit the original’s score — it builds its own from your runs. The original keeps its public score.
Public reputation vs. your results
For a public skill your team uses, you’ll see two complementary numbers. They answer different questions, so we keep them separate:
SkillScore (public)
“Is this proven to work, across everyone?” The cross-org reputation — use it to decide whether to adopt a skill.
Your results
“Is it working for us?” Your team’s own pass rate, usage, and ratings on the skill — use it to catch a version that regressed for your workload, then pin a version or fork your own.
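One way to spot a version that regressed for your workload is to compare pass rates per version over your own runs. The Python sketch below assumes each run is available as a `(version, passed)` pair; that shape and the `pass_rate_by_version` name are assumptions for illustration, not a documented export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_version(runs: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each skill version to its pass rate over (version, passed) pairs."""
    totals = defaultdict(lambda: [0, 0])  # version -> [passes, total runs]
    for version, passed in runs:
        totals[version][0] += int(passed)
        totals[version][1] += 1
    return {
        version: passes / count for version, (passes, count) in totals.items()
    }
```

A clear drop between consecutive versions is the cue to pin the earlier version or fork your own copy.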
How it’s calculated
A background job regularly reads each skill’s benchmark runs, evaluated traces, and AI ratings, computes a 0–100 score, and stores it with a per-signal breakdown. Public skills are scored from anonymized cross-org evidence; private and forked skills are scored from your team’s data only and kept private to your org.
On the leaderboard
The registry leaderboard ranks skills by SkillScore. A skill needs at least one signal to appear, and the more corroborating signals it has, the more its score can be trusted.
Related
Public Registry
Browse and install skills ranked by SkillScore.
Skill evaluation
The A/B benchmark that produces the strongest SkillScore signal.
Skills
Author, auto-discover, and track SKILL.md files.