Case Study

Agions fixed taskflow-ai security issues 48 hours after we flagged them

Name: AgentScore MCP Package Security Scan Dataset
Creator: AgentScore

By Michael K Onyekwere

A regex scan, a GitHub issue, two code fixes, a clean release. The postinstall went from HIGH to LOW. The shell-exec wrapper landed. Score moved from 45 to 60. This is what the gate looks like in a week when the maintainer is paying attention.

This is the maintainer-remediation leg of the case-study set. Redis shows consumer prevention by pinning. NodeBench shows the scanner improving from public maintainer pushback.

Findings addressed: 2
Time to fix: 48h
Score change: 45 → 60

What we found

On April 20, 2026 our monitor scanned taskflow-ai@3.0.1. The package is a feature-rich AI task-management CLI published to npm by maintainer Agions. The scan returned score 45 of 100, risk level HIGH, with four findings:

HIGH install_script: postinstall ran node -e "console.log(...)", which routes through the shell and carries an interpretation surface even for a cosmetic banner.
HIGH command_injection: a shell_exec tool called with a template literal argument, one of the classic shell-injection patterns.
MEDIUM excessive_dependencies: 31 runtime dependencies, large attack surface.
LOW no_provenance: not published with npm provenance attestations.

We opened Agions/taskflow-ai#6 with the scan report, the specific source locations, and recommended fixes.

What Agions did

“We have addressed the following issues in version 3.0.2. postinstall script: changed from node -e "console.log(...)" to simple echo command. This eliminates the potential for any shell interpretation during install. command injection: added command validation to shell_exec tool in built-in.ts. Now uses the same validateCommand security validator as our other shell execution tools. The validator checks against dangerous patterns and maintains a whitelist of allowed commands.”
Agions, taskflow-ai maintainer, on Agions/taskflow-ai#6, April 22, 2026

taskflow-ai@3.0.2 went live on npm the same day. Our watch cron picked up the publish within minutes. On the rescan, the install_script finding correctly dropped from HIGH to LOW (plain echo has no interpretation surface). Score moved 45 to 60. Issue closed.

What the score did not change, and why it is still useful

The command_injection finding still flags HIGH on 3.0.2. That is not a bug in Agions' fix. The scanner correctly detected a separate unguarded shell-exec in the marketplace installer that Agions' shell_exec patch did not touch. The fix was real and material; the package just has another pattern of the same class in a different file. A useful trust tool tells the truth even when it is awkward.

Our scanner v2.1 (shipped April 22) adds context-aware severity downgrade: if a known sanitizer wrapper like validateCommand is detected near a flagged call, the finding is downgraded. In Agions' 3.0.2 case the wrapper is in shell.js, the remaining flagged call is indownloader.js. Different scope, no downgrade. If the next release brings validateCommand coverage to the installer too, the score will catch up automatically.

Update: v4.0.0 (April 24)

Two days after v3.0.2 shipped, Agions released taskflow-ai@4.0.0 and took a different path than the section above anticipated. Instead of adding the validateCommand wrapper to the installer to satisfy the scanner, the maintainer removed the risky surface area entirely. Our watch cron picked up the publish within minutes.

Score: 60 → 80
Risk: ELEVATED → MODERATE
HIGH findings: 1 → 0

Seven capabilities were deleted from the tool surface in v4.0.0:

• code_analysis
• database_access
• email_messaging
• memory_state
• network_egress
• repo_read
• shell_exec

The shell_exec tool that produced the originalcommand_injection finding is gone. The remaining findings are all LOW or MEDIUM: the install_script that is already just an echo, a high-but-normal runtime dependency count, and the standard no_provenance flag. Four days from first scan report to closed-loop structural cleanup.

This is what the watch feed is supposed to surface. Not just individual fixes, but the major-version rewrites that close whole classes of issue. Repo operators who had pinned v3.x can see exactly what changed and decide whether to bump.

What this pattern demonstrates

Redis (see case study) closed their exposure by pinning MCP dependencies. Agions closed theirs by writing fixes. Two different postures, same ecosystem hygiene result: a package that was HIGH is now ELEVATED, and the path to LOW is clear and documented. Both cases happened because a regex scanner, a monitoring feed, and a willingness to open the issue did the work that no one was being paid to do.

NodeBench completes the set. Redis is consumer prevention. Agions is maintainer remediation. NodeBench is precision correction, where a maintainer challenged the scanner with evidence and the ruleset got better the same day.

The AgentScore Policy Gate makes this loop run automatically for any repo that installs MCP dependencies. Version bumps become deliberate decisions. Score drops block merge until someone looks. Capability changes surface before they ship.

See the live advisory feed

Every score change, every capability drift, every new finding. RSS and JSON. Free.

Advisory Feed Install the Policy Gate