
Precision changelog

The AgentScore scanner is a regex-based detector. Regex cannot tell a database method call apart from a shell exec, or a test fixture apart from a real credential, without context. Scanner v2.1 ships a mitigator system that scans a window around each match for known sanitizer wrappers or test-fixture markers and downgrades the severity when one fires.

The list below is every mitigator the scanner has gained, paired with the public report that motivated it. Each entry was driven by a real maintainer interaction, not a hypothetical edge case.

Scanner version

v2.1

Current ruleset digest

3185eb87b4ce

How mitigators work

When a finding fires, the scanner looks at a 2,000-character window around the match. If a sanitizer pattern (like validateCommand, execFile, or a database-shaped .exec()) hits in that window, the severity is downgraded. The original finding stays visible with an annotation showing which mitigator fired and where, so you can audit the call yourself.
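The window check can be sketched roughly as follows. This is an illustration under stated assumptions, not the scanner's source: the name findMitigator, the shape of the mitigator list, and the choice to centre the window on the match are hypothetical. Only the 2,000-character size and the two sample patterns come from this page.

```javascript
// Hypothetical sketch: names, data shapes, and window placement are assumptions.
const WINDOW_SIZE = 2000; // characters, as described above

const MITIGATORS = [
  { kind: "sanitizer", pattern: /\bvalidateCommand\b/ },
  { kind: "sanitizer", pattern: /\bexecFile\b/ },
];

// Returns the first mitigator that hits inside the window around the match,
// with its absolute offset in the source, or null when none fires.
function findMitigator(source, matchIndex, matchLength) {
  const half = Math.floor(WINDOW_SIZE / 2);
  const start = Math.max(0, matchIndex - half);
  const end = Math.min(source.length, matchIndex + matchLength + half);
  const windowText = source.slice(start, end);
  for (const { kind, pattern } of MITIGATORS) {
    const hit = pattern.exec(windowText);
    if (hit) {
      return { kind, pattern: String(pattern), offset: start + hit.index };
    }
  }
  return null;
}

// Usage: the finding is the exec( call; the guard sits earlier on the same line.
const guarded = "if (validateCommand(cmd)) { exec(cmd); }";
const annotation = findMitigator(guarded, guarded.indexOf("exec("), "exec(".length);
const unguarded = findMitigator("exec(cmd);", 0, "exec(".length);
```

Returning the offset alongside the pattern is what would let an annotation say which mitigator fired and where. A guard that sits outside the window does not count, however relevant it is.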

The downgrades: command_injection, unsafe_eval, and sensitive_file_access findings go to LOW. hardcoded_secret findings go to MEDIUM, because a placeholder still discloses information about test infrastructure even though it is not a credential.
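The downgrade rules amount to a small lookup table. A minimal sketch, assuming a finding is a plain object: the field names rule, severity, originalSeverity, and mitigatedBy are hypothetical, and only the four rule names and their target severities come from this page.

```javascript
// Hypothetical sketch of the downgrade table; field names are assumptions.
const DOWNGRADED_SEVERITY = new Map([
  ["command_injection", "LOW"],
  ["unsafe_eval", "LOW"],
  ["sensitive_file_access", "LOW"],
  ["hardcoded_secret", "MEDIUM"],
]);

// Returns a copy of the finding. The original severity is kept on the copy so
// the finding stays auditable. Rules without a table entry are left unchanged.
function applyMitigator(finding, mitigator) {
  const downgraded = DOWNGRADED_SEVERITY.get(finding.rule);
  if (!mitigator || !downgraded) return { ...finding };
  return {
    ...finding,
    severity: downgraded,
    originalSeverity: finding.severity,
    mitigatedBy: mitigator,
  };
}

const secret = applyMitigator(
  { rule: "hardcoded_secret", severity: "CRITICAL" },
  { kind: "test_fixture" }
);
const untouched = applyMitigator({ rule: "command_injection", severity: "HIGH" }, null);
```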

Precision is bounded by what regex can express. A renamed sanitizer or a database method that does not match the variable-name list still produces a false positive. Real data-flow analysis is on the v2.2 roadmap.
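Both failure modes are easy to reproduce with patterns listed on this page. The sample code strings below (guardCmd, store, schemaSql) are made up for illustration.

```javascript
// Patterns copied from this page; the sample strings are hypothetical.
const sanitizerName = /\bvalidateCommand\b/;
const dbVariable = /\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma)\.\s*exec\s*\(/i;

// Same guard under a different name: the sanitizer pattern no longer fires.
const original = "if (validateCommand(cmd)) exec(cmd);";
const renamed = "if (guardCmd(cmd)) exec(cmd);";

// Same database call through a variable that is not on the list.
const listed = "db.exec(schemaSql);";
const unlisted = "store.exec(schemaSql);";

const results = {
  original: sanitizerName.test(original), // true
  renamed: sanitizerName.test(renamed),   // false
  listed: dbVariable.test(listed),        // true
  unlisted: dbVariable.test(unlisted),    // false
};
```

In the renamed and unlisted cases no mitigator fires, so the finding keeps its original severity.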

Recent mitigator additions

ruleset 3185eb87b4ce

Database .exec and eval-in-message-strings (HomenShum/nodebench-ai#8)

The maintainer reviewed two HIGH findings against the source, confirmed three real command_injection sites, and refactored them to argv-based spawn. The maintainer also correctly identified the unsafe_eval finding as a false positive: the regex matched better-sqlite3's db.exec(`SQL`) and the literal word eval inside a recommendations.push string.

Patterns added

sanitizer
/\.exec\s*\(\s*[`'"]?\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|PRAGMA|VACUUM|BEGIN|COMMIT|ROLLBACK|TRUNCATE|REPLACE|MERGE|GRANT|REVOKE|EXPLAIN)\b/i

SQL keyword immediately after .exec( — strong signal this is a database method, not child_process.exec.

sanitizer
/\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma)\.\s*exec\s*\(/i

Database-shaped variable names calling .exec — better-sqlite3, pg, mysql2, prisma raw queries.
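The two database patterns can be checked against a few sample lines. The patterns are copied from above; the helper name looksLikeDatabaseExec and the sample lines are hypothetical.

```javascript
const sqlKeyword = /\.exec\s*\(\s*[`'"]?\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|PRAGMA|VACUUM|BEGIN|COMMIT|ROLLBACK|TRUNCATE|REPLACE|MERGE|GRANT|REVOKE|EXPLAIN)\b/i;
const dbVariable = /\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma)\.\s*exec\s*\(/i;

// Hypothetical helper: true when either database pattern hits.
function looksLikeDatabaseExec(snippet) {
  return sqlKeyword.test(snippet) || dbVariable.test(snippet);
}

const samples = [
  "store.exec(`CREATE TABLE runs (id INTEGER)`);", // SQL keyword after .exec(
  "pool.exec(statementText);",                     // database-shaped variable name
  "child_process.exec(userInput);",                // neither pattern
  "cp.exec('rm -rf ' + dir);",                     // neither pattern
];

const verdicts = samples.map(looksLikeDatabaseExec); // [true, true, false, false]
```

The first sample is caught by the keyword pattern even though store is not on the variable-name list, which is why the two patterns complement each other.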

test_fixture
/\/[a-zA-Z]*[Ee]val[A-Z][a-zA-Z]*\.(js|ts|mjs|cjs)\b/

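Files like selfEvalTools.js, where eval refers to an evaluation flow, not JavaScript eval. As written, the pattern requires an uppercase letter directly after Eval, so names that end in Eval before the extension, such as llmJudgeEval.js and pipelineEval.js, do not match it.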

test_fixture
/\b(?:recommendations?|messages?|errors?|warnings?|notes?)\s*\.\s*push\s*\(\s*[`'"]/i

Strings being pushed into a recommendations or messages array are message text, not executable code.

test_fixture
/\b(console|logger|log|debug|info|warn|error|trace)\s*\.\s*[a-z]+\s*\(\s*[`'"][^`'"]*\beval\b/i

console.log and friends emitting strings that contain the word eval.
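A quick check of the two message-string patterns against sample lines. The patterns are copied from above; the helper name isMessageText and the sample lines are hypothetical.

```javascript
const messagePush = /\b(?:recommendations?|messages?|errors?|warnings?|notes?)\s*\.\s*push\s*\(\s*[`'"]/i;
const logCall = /\b(console|logger|log|debug|info|warn|error|trace)\s*\.\s*[a-z]+\s*\(\s*[`'"][^`'"]*\beval\b/i;

// Hypothetical helper: true when the line looks like message text.
const isMessageText = (line) => messagePush.test(line) || logCall.test(line);

const checks = {
  pushedAdvice: isMessageText('recommendations.push("Avoid eval() on user input");'), // true
  logLine: isMessageText("console.warn('eval is disabled in strict mode');"),         // true
  realEval: isMessageText("const result = eval(userExpression);"),                    // false
  interpolated: isMessageText("messages.push(`${eval(input)}`);"),                    // true
};
```

The last sample is a reminder of the trade-off: the push pattern only looks at the opening quote, so a template literal that interpolates a real eval call also matches. Keeping the original finding visible with its annotation is what makes that case auditable.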

Outcome: nodebench-mcp@3.2.1 was rescored from 55/ELEVATED to 85/LOW after the refactor and the new mitigators. Both findings were downgraded with explicit annotations.

Declarative test-fixture markers (claude-flow CRITICAL hardcoded_secret in dist/)

claude-flow shipped a manifest-validator with structural test fixtures inside dist/. Existing test_fixture rules expected files to live under tests/ or specs/ — they did not catch shipped-as-data fixtures.

Patterns added

test_fixture
/\babc(123|def|xyz)/i

Canonical fake-credential placeholders. claude-flow used sk-abc123-style test keys.

test_fixture
/\b(123|abcd){3,}/i

Long repeating placeholder sequences like abcdabcdabcd.

test_fixture
/\bexpected(Outcome|Result|Behavior|Behaviour|Verdict|Action|Status)\s*:/

Declarative fixture structure: { params: { ... }, expectedOutcome: 'deny' } — pure data, not exploitable code.

test_fixture
/\b(should|must|will)(Fail|Pass|Reject|Allow|Block|Deny|Throw|Match)\b/i

Test-shaped predicate names sitting near otherwise-dangerous-looking literals.

test_fixture
/\/(?:[a-zA-Z-]*-)?(validator|sanitizer|detector|scanner|denier|filter)\.(js|ts|mjs|cjs)\b/i

Validator and sanitizer source files contain example dangerous strings by design.
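The five markers can be run together over a piece of text to see which ones fire. The patterns are copied from above. The helper name firedMarkers, the sample fixture, the token value, and the file path dist/manifest-validator.js are assumptions for illustration.

```javascript
const fixtureMarkers = [
  /\babc(123|def|xyz)/i,
  /\b(123|abcd){3,}/i,
  /\bexpected(Outcome|Result|Behavior|Behaviour|Verdict|Action|Status)\s*:/,
  /\b(should|must|will)(Fail|Pass|Reject|Allow|Block|Deny|Throw|Match)\b/i,
  /\/(?:[a-zA-Z-]*-)?(validator|sanitizer|detector|scanner|denier|filter)\.(js|ts|mjs|cjs)\b/i,
];

// Hypothetical helper: returns the indexes of the markers that hit,
// so a caller can report which ones fired.
function firedMarkers(text) {
  return fixtureMarkers.flatMap((re, i) => (re.test(text) ? [i] : []));
}

const fixture = "{ params: { apiKey: 'sk-abc123' }, expectedOutcome: 'deny' }";
const fixtureHits = firedMarkers(fixture);                   // [0, 2]
const pathHits = firedMarkers("dist/manifest-validator.js"); // [4]
const plainHits = firedMarkers("const apiKey = 'tok_9f8a7b6c5d4e';"); // []
```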

Outcome: claude-flow's hardcoded_secret CRITICAL finding was downgraded to MEDIUM. The command_injection HIGH finding still fires, pending a future mitigator pass.

Sanitizer wrappers (Agions/taskflow-ai#6)

The maintainer shipped v3.0.2 with a validateCommand wrapper around shell_exec (whitelist + dangerous-pattern detection) within 48 hours of the scan report. The scanner's HIGH command_injection finding for the same code was no longer the right severity once a guard was in place.
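For readers unfamiliar with this kind of guard, a whitelist-plus-pattern wrapper might look roughly like the sketch below. This is not taskflow-ai's code: the allowed commands, the dangerous patterns, and the return shape are all invented for illustration, and only the name validateCommand and the two-step idea come from the description above.

```javascript
// Hypothetical guard; lists and return shape are assumptions.
const ALLOWED_COMMANDS = new Set(["git", "ls", "cat"]);
const DANGEROUS_PATTERNS = [/[;&|`$<>]/, /\brm\s+-rf\b/i];

// Step 1: the executable must be on the whitelist.
// Step 2: the full command must not contain a known-dangerous pattern.
function validateCommand(command) {
  const trimmed = command.trim();
  const executable = trimmed.split(/\s+/)[0];
  if (!ALLOWED_COMMANDS.has(executable)) {
    return { ok: false, reason: `command not allowed: ${executable}` };
  }
  const bad = DANGEROUS_PATTERNS.find((re) => re.test(trimmed));
  if (bad) {
    return { ok: false, reason: `dangerous pattern: ${bad}` };
  }
  return { ok: true };
}

const allowed = validateCommand("git status");    // ok
const blocked = validateCommand("curl example");  // not on the whitelist
const piped = validateCommand("git log | sh");    // shell metacharacter
```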

Patterns added

sanitizer
/\bvalidateCommand\b/

Direct match for the wrapper Agions introduced.

sanitizer
/\bsanitize(?:Command|Args|Input)?\b/i

Common naming convention for input sanitizers across the ecosystem.

sanitizer
/\bisAllowedCommand\b/i

Whitelist-style guards.

sanitizer
/\bexecFile\b/

execFile with an args array does not spawn a shell by default, so the arguments are not shell-interpreted. This is a different posture from exec.

sanitizer
/\bspawn\s*\(\s*[^,)]+,\s*\[/

spawn('cmd', [args]) array form bypasses the shell unless the shell option is set.
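The execFile and spawn patterns can be checked against sample call sites. The patterns are copied from above; the helper name usesArgvForm and the sample lines are hypothetical.

```javascript
const execFileCall = /\bexecFile\b/;
const spawnArgv = /\bspawn\s*\(\s*[^,)]+,\s*\[/;

// Hypothetical helper: true when the line uses execFile or the array form of spawn.
const usesArgvForm = (line) => execFileCall.test(line) || spawnArgv.test(line);

const callSites = {
  execFileArgs: usesArgvForm("execFile('git', ['log', '--oneline']);"),            // true
  spawnArgs: usesArgvForm("spawn('git', ['clone', repoUrl]);"),                    // true
  spawnString: usesArgvForm("spawn(`git clone ${repoUrl}`, { shell: true });"),    // false
  plainExec: usesArgvForm("exec(`git clone ${repoUrl}`);"),                        // false
  spawnArgsShell: usesArgvForm("spawn('git', ['clone', repoUrl], { shell: true });"), // true
};
```

The last call site shows a limit of the pattern: it sees the args array but not the options object that follows, so a downgraded spawn finding is still worth a glance at the annotated call.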

Outcome: taskflow-ai went from 45/HIGH to 60/ELEVATED on v3.0.2, then to 80/MODERATE on v4.0.0 after seven capabilities were deleted. The full arc closed in four days.

Reporting a precision gap

If the scanner flagged something it should not have, or missed something it should have caught, report it through the public issue forms on the scanner repo. Detection-accuracy reports are not security disclosures and do not need confidentiality.

Real vulnerabilities in AgentScore infrastructure go to security@agentscores.xyz, not these forms.