Precision lineage
By Michael K Onyekwere
The AgentScore scanner is a regex-based detector. Regex cannot tell a database method call apart from a shell exec, or a test fixture apart from a real credential, without context. Scanner v2.1 ships a mitigator system that scans a window around each match for known sanitizer wrappers or test-fixture markers and downgrades the severity when one fires.
The list below is every mitigator the scanner has gained, paired with the public report that motivated it. Each entry was driven by a real maintainer interaction, not a hypothetical edge case.
This page is the scanner's public memory. If you want the full maintainer loop, start with the NodeBench case study, then compare it with the live package report.
Scanner version: v2.2
Current ruleset digest: 55366a05fee3
How mitigators work
When a finding fires, the scanner inspects a window of 2,000 characters on either side of the match. If a sanitizer pattern (like validateCommand, execFile, or a database-shaped .exec()) hits in that window, the severity is downgraded. The original finding stays visible with an annotation showing which mitigator fired and where, so you can audit the call yourself.
Downgrade targets: command_injection, unsafe_eval, and sensitive_file_access drop to LOW. hardcoded_secret drops to MEDIUM, because a placeholder still discloses information about test infrastructure even though it is not a credential.
Precision is bounded by what regex can express. A renamed sanitizer or a database method that does not match the variable-name list still produces a false positive. Real data-flow analysis is on the v2.2 roadmap.
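The window check and downgrade table described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the scanner's source: WINDOW, DOWNGRADE_TO, MITIGATORS, and assessMatch are hypothetical names, and only three of the mitigator patterns quoted on this page are included.

```javascript
// Hypothetical sketch. None of these names come from the scanner itself.
const WINDOW = 2000; // characters inspected on each side of the match

// Downgrade targets as described in the text.
const DOWNGRADE_TO = {
  command_injection: "LOW",
  unsafe_eval: "LOW",
  sensitive_file_access: "LOW",
  hardcoded_secret: "MEDIUM",
};

// A small sample of mitigator patterns quoted on this page (all non-global).
const MITIGATORS = [
  { name: "validateCommand", pattern: /\bvalidateCommand\b/ },
  { name: "execFile", pattern: /\bexecFile\b/ },
  { name: "db-shaped .exec()", pattern: /\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma)\.\s*exec\s*\(/i },
];

function assessMatch(content, matchIndex, matchLength, category, originalSeverity) {
  const start = Math.max(0, matchIndex - WINDOW);
  const context = content.slice(start, matchIndex + matchLength + WINDOW);

  for (const { name, pattern } of MITIGATORS) {
    const hit = pattern.exec(context);
    if (hit) {
      // Keep the original severity alongside the downgrade so the call stays auditable.
      return {
        category,
        originalSeverity,
        severity: DOWNGRADE_TO[category] ?? originalSeverity,
        mitigator: { name, offset: start + hit.index },
      };
    }
  }
  return { category, originalSeverity, severity: originalSeverity, mitigator: null };
}

const guarded = "const safe = validateCommand(input);\nexec(`run ${safe}`);";
const renamed = "const safe = checkCmd(input);\nexec(`run ${safe}`);";

console.log(assessMatch(guarded, guarded.indexOf("exec(`"), 4, "command_injection", "HIGH").severity); // LOW
console.log(assessMatch(renamed, renamed.indexOf("exec(`"), 4, "command_injection", "HIGH").severity); // HIGH
```

The second call shows the limit stated above: a sanitizer renamed to checkCmd (a made-up name) is invisible to the pattern list, so the finding stays at HIGH.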
Recent mitigator additions
Per-file iteration + all-matches walk (Codex review caught two scope leaks)
Codex review of the earlier May 16 mitigator pass surfaced two structural issues. First, the scanner read the whole gunzipped tarball as one buffer and ran findMitigators against +/-2000 chars in that buffer, so a README heading in one file could downgrade a real finding in another (cross-file leak). Second, even after per-file iteration, the scanner ran each pattern as a single .exec() per file, so an early benign shell call in a file would mask a later real unsafe one in the same file (same-file masking). Both fixed today.
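The same-file masking problem is easy to reproduce in miniature. The sketch below contrasts a single .exec() per file with a walk over every match. All names, the finding pattern, and the sample file are hypothetical and exist only for illustration.

```javascript
// Hypothetical sketch: names, the finding pattern, and the sample are illustrative.
const RANK = { LOW: 0, MEDIUM: 1, HIGH: 2, CRITICAL: 3 };
const WINDOW = 2000;     // characters on each side of a match
const MAX_MATCHES = 200; // cap that bounds pathological inputs

// Severity of one match, judged only against its own window.
// Mitigator regexes are assumed to be non-global, so test() is stateless.
function severityAt(content, index, length, original, downgraded, mitigators) {
  const context = content.slice(Math.max(0, index - WINDOW), index + length + WINDOW);
  return mitigators.some((re) => re.test(context)) ? downgraded : original;
}

// Earlier behaviour: one exec() per file, so only the first match is judged.
function firstMatchOnly(content, pattern, original, downgraded, mitigators) {
  const single = new RegExp(pattern.source, pattern.flags.replace("g", ""));
  const m = single.exec(content);
  return m ? severityAt(content, m.index, m[0].length, original, downgraded, mitigators) : null;
}

// All-matches walk: the worst severity across matches wins for the file.
function worstAcrossMatches(content, pattern, original, downgraded, mitigators) {
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const global = new RegExp(pattern.source, flags); // matchAll needs a global regex
  let worst = null;
  let seen = 0;
  for (const m of content.matchAll(global)) {
    if (++seen > MAX_MATCHES) break;
    const severity = severityAt(content, m.index, m[0].length, original, downgraded, mitigators);
    if (worst === null || RANK[severity] > RANK[worst]) worst = severity;
    if (severity === original) break; // nothing can rank higher, so stop early
  }
  return worst;
}

const finding = /\bexec\s*\(\s*`[^`]*\$\{/; // illustrative, not the scanner's rule
const mitigators = [/\bvalidateCommand\b/];
const sample =
  "const cmd = validateCommand(input);\nexec(`echo ${cmd}`);\n" +
  "// filler line\n".repeat(400) + // pushes the next call outside the first window
  "exec(`rm -rf ${userInput}`);\n";

console.log(firstMatchOnly(sample, finding, "HIGH", "LOW", mitigators));     // LOW
console.log(worstAcrossMatches(sample, finding, "HIGH", "LOW", mitigators)); // HIGH
```

The early guarded call masks the later unguarded one in the first function. The second function judges each match against its own window and reports HIGH.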
Patterns added
iterateTarFiles(buf)
Walks the POSIX tar archive entry by entry, including GNU longname (L) and pax (x) extended header support. Mitigators run against the current file's content only. ~80 lines, no new dependency.
findAllMatches(pattern, content)
Iterates every match per pattern per file (capped at 200 to bound pathological inputs). Each match's severity is computed against its own +/-2000 char window. The worst severity across all matches wins for that file. Short-circuits on the first match that remains at the pattern's original severity.
isScannableFile(filename)
Path filter that allows source-like extensions and excludes pure artefacts (node_modules, .git, coverage, .next, *.map, *.min.js, *.d.ts). Crucially, it does NOT skip dist/ or build/: in published npm tarballs those are usually the actual executable code.
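A path filter in the spirit of isScannableFile might look like the sketch below. The exclusions come from the description above. The list of "source-like" extensions is an assumption, as is the function name isScannableFileSketch.

```javascript
// Hypothetical sketch. SOURCE_EXT is an assumed list; the exclusions follow the text.
const SOURCE_EXT = /\.(?:js|mjs|cjs|jsx|ts|tsx|json|md|sh|py)$/i;
const EXCLUDED_DIRS = new Set(["node_modules", ".git", "coverage", ".next"]);
const EXCLUDED_SUFFIXES = [".map", ".min.js", ".d.ts"];

function isScannableFileSketch(filename) {
  // Reject anything that sits under an excluded directory at any depth.
  if (filename.split("/").some((part) => EXCLUDED_DIRS.has(part))) return false;
  // Reject generated artefacts by suffix before checking the extension allowlist.
  const lower = filename.toLowerCase();
  if (EXCLUDED_SUFFIXES.some((suffix) => lower.endsWith(suffix))) return false;
  return SOURCE_EXT.test(filename);
}

console.log(isScannableFileSketch("package/dist/index.js"));         // true
console.log(isScannableFileSketch("package/dist/index.min.js"));     // false
console.log(isScannableFileSketch("package/node_modules/x/main.js")); // false
```

Note that dist/ and build/ are deliberately absent from the excluded set, matching the design choice described above.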
Self-detected false-positive class: command_injection in browser/CLI MCP packages (31 advisories affected)
Internal tracking on 2026-05-15 showed 31 distinct browser/terminal/CLI-automation MCP packages flagged with HIGH command_injection in a 30-day window. Volume alone made a real-bug-per-package interpretation implausible. We sample-checked five (safari-mcp, brave-real-browser-mcp-server, memoir-cli, s3db.js, claude-flow): four were clear false positives and one was ambiguous. The causes were postinstall codesign of internal binaries, database client .exec(`SELECT ...`) misidentified as child_process.exec, hardcoded ALL_CAPS constants treated as user input, numeric IDs from webhook payloads, and a security tutorial .md file whose matched line was literally `// ❌ Dangerous: shell injection possible`. The advisory pipeline had been auto-publishing HIGH severity on these patterns at roughly one per day. We caught and fixed this ourselves before any maintainer pushed back: the same public-correction loop, applied to our own output.
Patterns added
/\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma|this)\.\s*exec\s*\(/i
Extended the existing database-method allowlist to include `this.exec(`. s3db.js's remote-sqlite-client calls `this.exec(\`SELECT ...\`, [params])` for SQL, not shell.
/\b(__dirname|__filename|process\.cwd\(\)|path\.(?:join|resolve|dirname|basename))\b/
Compile-time identifiers within scope. safari-mcp's postinstall used `path.join(__dirname, '..', 'safari-helper')` and the regex treated that as user input.
/\$\{[A-Z][A-Z0-9_]{2,}\}/
ALL_CAPS interpolated identifiers strongly signal a compile-time constant. s3db.js's `${REPO_URL}` is a hardcoded repository URL.
/\$\{\s*(?:Number|parseInt|parseFloat)\s*\(/
Numeric coercion. A value passed through Number(), parseInt(), or parseFloat() cannot carry shell metacharacters.
/\$\{\s*[a-zA-Z_]+\.(?:number|id|index|count|length|size)(?:\s*\|\s*0|\s*\?\?\s*0)?\s*\}/
Properties guaranteed to be numeric. claude-flow's webhook example uses `${event.pull_request.number}` which is always an integer from GitHub.
/\b(codesign|signtool|notarytool)\b|\bgpg\s+--sign\b/
Code-signing toolchain. Postinstall scripts in macOS/Windows helper packages invoke these against package-internal binaries, never user input.
/execSync\s*\(\s*`npm\s+(?:view|pack|info)\s+\$\{[a-zA-Z_]+\}/
Auto-update scripts querying npm for known dependency names from package.json.
/```\s*(?:js|ts|javascript|typescript|jsx|tsx|json|bash|sh|shell|console)?\s*\n/
Triple-backtick code fences. Markdown documentation routinely contains example code, including anti-pattern examples.
/❌|✅|⚠️\s+(?:Dangerous|Unsafe|Bad|Don't|Avoid)/i
Markdown anti-pattern markers. claude-flow's v3-security-architect.md literally labelled the example shell-exec line `❌ Dangerous: shell injection possible` and the scanner caught the tutorial.
/\/\/\s*(?:❌|⚠️|Dangerous|Unsafe|Bad\s+example|Don't\s+do\s+this|Avoid\s+this|Anti-pattern|Vulnerable)\s*[:.]/i
English comment annotations marking code as an anti-pattern.
/^#{1,4}\s+\S/m
Markdown headings within scope. Strong signal the surrounding context is documentation, not executable source.
/<!--[\s\S]*?-->/
HTML / JSX comments commonly used in README, MDX, and docs.
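To see how several of these patterns behave, the sketch below runs a subset of them, copied as they appear on this page, against short snippets modelled on the cases described. The labels, the helper name mitigatorsHit, and the snippets are hypothetical. The scanner applies patterns to a window around a match rather than to a single line.

```javascript
// Patterns copied from the list above; labels and helper name are hypothetical.
const COMMAND_INJECTION_MITIGATORS = [
  ["db-method", /\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma|this)\.\s*exec\s*\(/i],
  ["path-builtin", /\b(__dirname|__filename|process\.cwd\(\)|path\.(?:join|resolve|dirname|basename))\b/],
  ["all-caps-constant", /\$\{[A-Z][A-Z0-9_]{2,}\}/],
  ["numeric-coercion", /\$\{\s*(?:Number|parseInt|parseFloat)\s*\(/],
  ["numeric-property", /\$\{\s*[a-zA-Z_]+\.(?:number|id|index|count|length|size)(?:\s*\|\s*0|\s*\?\?\s*0)?\s*\}/],
  ["npm-query", /execSync\s*\(\s*`npm\s+(?:view|pack|info)\s+\$\{[a-zA-Z_]+\}/],
  ["anti-pattern-marker", /❌|✅|⚠️\s+(?:Dangerous|Unsafe|Bad|Don't|Avoid)/i],
  ["markdown-heading", /^#{1,4}\s+\S/m],
];

// Returns the labels of every pattern that matches the snippet.
function mitigatorsHit(snippet) {
  return COMMAND_INJECTION_MITIGATORS
    .filter(([, re]) => re.test(snippet))
    .map(([label]) => label);
}

const show = (snippet) => console.log(mitigatorsHit(snippet).join(", ") || "(none)");

show("this.exec(`SELECT * FROM t`, [params])");        // db-method
show("path.join(__dirname, '..', 'safari-helper')");   // path-builtin
show("execSync(`git clone ${REPO_URL}`)");             // all-caps-constant
show("exec(`kill ${Number(pid)}`)");                   // numeric-coercion
show("exec(`gh pr view ${pr.number}`)");               // numeric-property
show("// ❌ Dangerous: shell injection possible");      // anti-pattern-marker
show("exec(`rm -rf ${userInput}`)");                   // (none)
```

One observation from running the numeric-property pattern as transcribed on this page: it accepts a single identifier before the property, so `${pr.number}` matches while a longer chain such as `${event.pull_request.number}` does not.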
Database .exec and eval-in-message-strings (HomenShum/nodebench-ai#8)
Maintainer reviewed two HIGH findings against source. Confirmed three real command_injection sites and refactored to argv-based spawn. Correctly identified unsafe_eval as a false positive: the regex matched better-sqlite3's db.exec(`SQL`) and the literal word eval inside a recommendations.push string.
Patterns added
/\.exec\s*\(\s*[`'"]?\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|PRAGMA|VACUUM|BEGIN|COMMIT|ROLLBACK|TRUNCATE|REPLACE|MERGE|GRANT|REVOKE|EXPLAIN)\b/i
SQL keyword immediately after .exec(. Strong signal this is a database method, not child_process.exec.
/\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma)\.\s*exec\s*\(/i
Database-shaped variable names calling .exec, like better-sqlite3, pg, mysql2, prisma raw queries.
/\/[a-zA-Z]*[Ee]val[A-Z][a-zA-Z]*\.(js|ts|mjs|cjs)\b/
Files like selfEvalTools.js, llmJudgeEval.js, pipelineEval.js. The word eval refers to evaluation flow, not JavaScript eval.
/\b(?:recommendations?|messages?|errors?|warnings?|notes?)\s*\.\s*push\s*\(\s*[`'"]/i
Strings being pushed into a recommendations or messages array are message text, not executable code.
/\b(console|logger|log|debug|info|warn|error|trace)\s*\.\s*[a-z]+\s*\(\s*[`'"][^`'"]*\beval\b/i
console.log and friends emitting strings that contain the word eval.
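The five patterns above can be exercised the same way. The sketch below is illustrative only: the labels, the helper evalMitigatorsHit, and the sample snippets are assumptions, and the patterns are copied as they appear on this page.

```javascript
// Patterns copied from the list above; labels and helper name are hypothetical.
const UNSAFE_EVAL_MITIGATORS = [
  ["sql-keyword", /\.exec\s*\(\s*[`'"]?\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|PRAGMA|VACUUM|BEGIN|COMMIT|ROLLBACK|TRUNCATE|REPLACE|MERGE|GRANT|REVOKE|EXPLAIN)\b/i],
  ["db-variable", /\b(db|database|conn|connection|client|pool|prepared|stmt|sql|query|knex|prisma)\.\s*exec\s*\(/i],
  ["eval-filename", /\/[a-zA-Z]*[Ee]val[A-Z][a-zA-Z]*\.(js|ts|mjs|cjs)\b/],
  ["message-push", /\b(?:recommendations?|messages?|errors?|warnings?|notes?)\s*\.\s*push\s*\(\s*[`'"]/i],
  ["log-string", /\b(console|logger|log|debug|info|warn|error|trace)\s*\.\s*[a-z]+\s*\(\s*[`'"][^`'"]*\beval\b/i],
];

// Returns the labels of every pattern that matches the snippet.
function evalMitigatorsHit(snippet) {
  return UNSAFE_EVAL_MITIGATORS
    .filter(([, re]) => re.test(snippet))
    .map(([label]) => label);
}

const report = (snippet) => console.log(evalMitigatorsHit(snippet).join(", ") || "(none)");

report("db.exec(`CREATE TABLE runs (id INTEGER)`)");                  // sql-keyword, db-variable
report("recommendations.push('Avoid eval() on untrusted input')");    // message-push
report("logger.info('eval harness started')");                        // log-string
report("src/tools/selfEvalTools.js");                                 // eval-filename
report("eval(userInput)");                                            // (none)
```

As transcribed here, the filename pattern requires an uppercase letter directly after "val", so selfEvalTools.js matches while a name that ends in Eval.js does not.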
Declarative test-fixture markers (claude-flow CRITICAL hardcoded_secret in dist/)
claude-flow shipped a manifest-validator with structural test fixtures inside dist/. Existing test_fixture rules expected files to live under tests/ or specs/, so they did not catch shipped-as-data fixtures.
Patterns added
/\babc(123|def|xyz)/i
Canonical fake-credential placeholders. claude-flow used sk-abc123-style test keys.
/\b(123|abcd){3,}/i
Long repeating placeholder sequences like abcdabcdabcd.
/\bexpected(Outcome|Result|Behavior|Behaviour|Verdict|Action|Status)\s*:/
Declarative fixture structure: { params: { ... }, expectedOutcome: 'deny' }. Pure data, not exploitable code.
/\b(should|must|will)(Fail|Pass|Reject|Allow|Block|Deny|Throw|Match)\b/i
Test-shaped predicate names sitting near otherwise-dangerous-looking literals.
/\/(?:[a-zA-Z-]*-)?(validator|sanitizer|detector|scanner|denier|filter)\.(js|ts|mjs|cjs)\b/i
Validator and sanitizer source files contain example dangerous strings by design.
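A rough sketch of how these fixture markers could feed the hardcoded_secret downgrade follows. The patterns are copied from the list above. The helper secretSeverity, the labels, and both sample strings are made up for illustration, and the key-shaped values are not real credentials.

```javascript
// Patterns copied from the list above; labels, helper, and samples are hypothetical.
const FIXTURE_MARKERS = [
  ["placeholder-abc", /\babc(123|def|xyz)/i],
  ["repeating-placeholder", /\b(123|abcd){3,}/i],
  ["expected-field", /\bexpected(Outcome|Result|Behavior|Behaviour|Verdict|Action|Status)\s*:/],
  ["test-predicate", /\b(should|must|will)(Fail|Pass|Reject|Allow|Block|Deny|Throw|Match)\b/i],
  ["validator-file", /\/(?:[a-zA-Z-]*-)?(validator|sanitizer|detector|scanner|denier|filter)\.(js|ts|mjs|cjs)\b/i],
];

// hardcoded_secret goes to MEDIUM when any fixture marker sits in the window.
function secretSeverity(context, original = "CRITICAL") {
  const markers = FIXTURE_MARKERS
    .filter(([, re]) => re.test(context))
    .map(([label]) => label);
  return { severity: markers.length > 0 ? "MEDIUM" : original, markers };
}

const fixture = "{ params: { apiKey: 'sk-abc123' }, expectedOutcome: 'deny' }";
const unmarked = "const apiKey = 'sk-9f27c1e8b04d';";

console.log(secretSeverity(fixture));  // severity MEDIUM, markers: placeholder-abc, expected-field
console.log(secretSeverity(unmarked)); // severity CRITICAL, no markers
```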
Sanitizer wrappers (Agions/taskflow-ai#6)
Maintainer shipped v3.0.2 with a validateCommand wrapper around shell_exec (whitelist + dangerous-pattern detection) within 48h of the scan report. The scanner's HIGH command_injection finding for the same code was no longer the right severity once a guard was in place.
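For readers unfamiliar with this kind of guard, a whitelist plus dangerous-pattern check can be sketched as below. This is not the taskflow-ai implementation: the function name validateCommandSketch, the allowed-command list, and the metacharacter set are all assumptions chosen for illustration.

```javascript
// Hypothetical sketch of a whitelist + dangerous-pattern guard.
// Not the taskflow-ai code; the lists below are assumed.
const ALLOWED_COMMANDS = new Set(["ls", "cat", "git"]);
const DANGEROUS = /[;&|`$<>\\\n]/; // shell metacharacters that enable chaining or substitution

function validateCommandSketch(command) {
  const trimmed = command.trim();
  if (trimmed === "") return { ok: false, reason: "empty command" };
  // Check dangerous characters first so a whitelisted program cannot smuggle a second command.
  if (DANGEROUS.test(trimmed)) return { ok: false, reason: "dangerous pattern" };
  const [program] = trimmed.split(/\s+/);
  if (!ALLOWED_COMMANDS.has(program)) {
    return { ok: false, reason: `"${program}" is not on the whitelist` };
  }
  return { ok: true, reason: null };
}

console.log(validateCommandSketch("git status").ok);           // true
console.log(validateCommandSketch("git status; rm -rf /").ok); // false
console.log(validateCommandSketch("ls $(whoami)").ok);         // false
```

A guard like this is what changes the right severity for the finding: the shell call is still present, but untrusted input no longer reaches it unchecked.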
Patterns added
/\bvalidateCommand\b/
Direct match for the wrapper Agions introduced.
/\bsanitize(?:Command|Args|Input)?\b/i
Common naming convention for input sanitizers across the ecosystem.
/\bisAllowedCommand\b/i
Whitelist-style guards.
/\bexecFile\b/
execFile with an args array cannot shell-interpret. Different posture from exec.
/\bspawn\s*\(\s*[^,)]+,\s*\[/
spawn('cmd', [args]) array form bypasses the shell entirely.
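The difference between the shell-string and argv forms is what the last two patterns key on. The sketch below runs the five patterns, copied from this page, over a few call shapes. The labels, the helper guardsNear, and the snippets are hypothetical.

```javascript
// Patterns copied from the list above; labels and helper name are hypothetical.
const SANITIZER_MITIGATORS = [
  ["validateCommand", /\bvalidateCommand\b/],
  ["sanitize", /\bsanitize(?:Command|Args|Input)?\b/i],
  ["isAllowedCommand", /\bisAllowedCommand\b/i],
  ["execFile", /\bexecFile\b/],
  ["spawn-argv", /\bspawn\s*\(\s*[^,)]+,\s*\[/],
];

// Returns the labels of every guard pattern that matches the snippet.
function guardsNear(snippet) {
  return SANITIZER_MITIGATORS
    .filter(([, re]) => re.test(snippet))
    .map(([label]) => label);
}

const print = (snippet) => console.log(guardsNear(snippet).join(", ") || "(none)");

print("exec(`git clone ${url}`)");                    // (none): shell string, no guard
print("execFile('git', ['clone', url])");             // execFile
print("spawn('git', ['clone', url])");                // spawn-argv
print("spawn(`git clone ${url}`, { shell: true })");  // (none): second argument is not an array
print("exec(sanitizeCommand(cmd))");                  // sanitize
print("if (isAllowedCommand(cmd)) exec(cmd)");        // isAllowedCommand
```

The spawn pattern only fires when the second argument opens with `[`, so a spawn call that passes an options object with a shell flag is not treated as mitigated.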
Reporting a precision gap
If the scanner flagged something it should not have, or missed something it should have caught, the report goes through public issue forms on the scanner repo. Detection-accuracy reports are not security disclosures and do not need confidentiality.
Real vulnerabilities in AgentScore infrastructure go to security@agentscores.xyz, not these forms.