Is the base image in the FROM line of your Dockerfile safe to use? Pulling nginx, python, grafana, or any other image from Docker Hub with the latest tag doesn't mean it's patched, current, or secure. A scan of a standard nginx image returns 300+ CVEs. Most are vulnerabilities in OS utilities unrelated to serving HTTP traffic, but not all of them can be ignored.
Some vulnerabilities are enough to compromise your application even if your code and infrastructure are flawless. CVE-2025-27363, an out-of-bounds write in libfreetype6 that can lead to arbitrary code execution, was present in the latest official nginx images at the time the vulnerability became known, and patched base layers took time to reach Docker Hub. With an EPSS score of 0.65 and confirmed active exploitation in the CISA KEV catalog, this is not a theoretical risk. It arrived silently, with nothing in the nginx release notes and no changes to the image you were already running.
Even well-maintained images lag. nginx:latest has carried OpenSSL vulnerabilities for months at a time: the upstream fix exists, but the Docker Hub layer takes time to catch up. Teams pulling latest and following best practices were still deploying the vulnerability throughout that window.
The interpretation problem
A scanner gives you a long list of vulnerabilities that is hard to interpret and act on:
- Which image is genuinely dangerous, and which is safe to use?
- Which vulnerabilities deserve attention first?
- Of hundreds of CVEs, which actually matter for your deployment?
- How do you justify to an auditor or CTO why you're ignoring 298 out of 300 CVEs?
Solution: a systematic approach
This article covers the first step: reducing scanner noise through automated prioritization based on severity, exploitation probability, and image context, turning a raw CVE list into a manageable set that actually requires engineering attention.
Automated classification based on image metadata sits between two extremes: raw severity filtering, which ignores whether a vulnerable package has any role in the container, and reachability analysis, which knows exactly what code executes and what it touches. The metadata approach gets you most of the way there, and how far depends on how much context you give the model about the image.
The approach prioritizes vulnerabilities based on:
- Severity, CVSS score, exploitation probability (EPSS), and confirmed active exploitation (CISA KEV)
- Relevance to the specific image's function
In follow-up articles, we'll enrich the results with runtime context (which libraries are loaded, which functions are called) and show whether vulnerabilities can be triggered given your specific configuration.
After completing the full workflow, you'll be able to say with confidence:
- ✓ Which CVEs are actually dangerous for your configuration
- ✓ Whether the image needs an immediate update or can wait
- ✓ Whether to switch to an alternative base image
- ✓ Which risks are acceptable and which require immediate action
- ✓ How to justify those decisions to a security auditor or management
The full workflow can be automated and integrated into a CI/CD pipeline. This series covers how to do that.
The Algorithm
Run a scanner against the image to get a complete list of CVEs, including transitive dependencies you might not know are in the image.
For each CVE, add two signals beyond CVSS:
- EPSS: probability of exploitation in the wild within the next 30 days, updated daily by FIRST.org
- CISA KEV: catalog of CVEs with confirmed active exploitation maintained by the US government
CVSS measures theoretical severity. EPSS and KEV tell you whether exploitation is actually happening.
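Both signals come from public feeds. A minimal sketch of the lookups in Python, using the FIRST.org EPSS API and the CISA KEV JSON feed (URLs current at the time of writing; batching and error handling kept minimal):

```python
import requests

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_FEED = ("https://www.cisa.gov/sites/default/files/feeds/"
            "known_exploited_vulnerabilities.json")

def load_kev() -> set[str]:
    """CVE IDs with confirmed active exploitation (CISA KEV catalog)."""
    feed = requests.get(KEV_FEED, timeout=30).json()
    return {v["cveID"] for v in feed["vulnerabilities"]}

def epss_scores(cve_ids: list[str]) -> dict[str, float]:
    """30-day exploitation probability per CVE, from the FIRST.org EPSS API."""
    scores: dict[str, float] = {}
    for i in range(0, len(cve_ids), 100):  # the API accepts comma-separated batches
        batch = ",".join(cve_ids[i:i + 100])
        data = requests.get(EPSS_API, params={"cve": batch}, timeout=30).json()
        scores.update({row["cve"]: float(row["epss"]) for row in data["data"]})
    return scores
```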
Assign each CVE to a bucket based on how the vulnerable package relates to what the container does (the examples below use categories such as Directly Exposed and Not Applicable).
Categorization is done by an LLM using the CVE description, package type, and image purpose. Runtime tracing in later steps validates these decisions against the actual container.
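The exact prompt isn't reproduced here; a minimal sketch shows the shape of the call. The client and model name are placeholders, and the category labels mirror the ones that appear in the tables below:

```python
from openai import OpenAI  # placeholder: any LLM client works the same way

client = OpenAI()

PROMPT = """Image: {image} (purpose: {purpose}, entrypoint: {entrypoint})
CVE: {cve_id} in package {pkg} ({pkg_type})
Description: {description}

Does this vulnerability matter for what this container actually does?
Answer with exactly one category: Directly Exposed, Not Applicable, or Unknown."""

def classify(image_meta: dict, cve: dict) -> str:
    """Ask the model how the vulnerable package relates to the image's function."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(**image_meta, **cve)}],
    )
    return resp.choices[0].message.content.strip()
```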
Setup
We use Trivy as the scanner: the most widely adopted open-source container scanner, with over 34k stars on GitHub. For the experiment we picked three images with different base distributions and workload types:
| Image | Base | Type |
|---|---|---|
| node:20-alpine | Alpine | Runtime |
| nginx:1.25 | Debian | Web server |
| grafana/grafana:10.0.0 | Ubuntu | Application |
These are pinned versions used during testing. The algorithm applies to any image. Pinning versions ensures the results in this article are reproducible.
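Assuming Trivy is installed locally, a small wrapper is enough to get a flat CVE list per image; the nesting (Results[].Vulnerabilities[]) is Trivy's standard JSON report layout:

```python
import json
import subprocess

IMAGES = ["node:20-alpine", "nginx:1.25", "grafana/grafana:10.0.0"]

def scan(image: str) -> list[dict]:
    """Run Trivy on an image and return its findings as a flat list."""
    out = subprocess.run(
        ["trivy", "image", "--format", "json", image],
        capture_output=True, text=True, check=True,
    ).stdout
    report = json.loads(out)
    # Trivy groups findings per target (OS packages, each lockfile, ...);
    # flatten them into one list of vulnerability dicts.
    return [v for result in report.get("Results", [])
            for v in result.get("Vulnerabilities", [])]
```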
Categorization in Practice
Each CVE ends in one of two outcomes, PROCESS or IGNORE; CLASSIFY is the intermediate step that hands the decision to the model. The decision tree:
- KEV listed or EPSS > 0.1 → PROCESS
- Severity LOW → IGNORE
- Severity MEDIUM / HIGH / CRITICAL → CLASSIFY
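A minimal sketch of this triage rule, assuming EPSS scores and the KEV set have been fetched as shown above:

```python
def triage(cve_id: str, severity: str,
           epss: dict[str, float], kev: set[str]) -> str:
    """Route one CVE through the decision tree above."""
    if cve_id in kev or epss.get(cve_id, 0.0) > 0.1:
        return "PROCESS"   # confirmed or likely exploitation: never ignore
    if severity == "LOW":
        return "IGNORE"    # low severity, no exploitation signal
    return "CLASSIFY"      # MEDIUM/HIGH/CRITICAL: hand off to the LLM
```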
How prompt formulation affects results
We ran the same dataset through three prompt versions. The difference between 28% and 45% noise reduction comes entirely from how the rules are formulated (each cell shows the share of CVEs ignored, with the count in parentheses):
| Prompt version | node (15) | nginx (363) | grafana (293) | total (671) |
|---|---|---|---|---|
| v1 (no image context) | 33% (5) | 56% (205) | 23% (68) | 41% (278) |
| v2 (image context only) | 20% (3) | 46% (167) | 6% (18) | 28% (188) |
| v3 (context + heuristics) | 27% (4) | 70% (253) | 15% (44) | 45% (301) |
The model's accuracy is directly tied to the context in the prompt: image purpose, entrypoint, package type. Without that context (as the v1 results show), the model has little to anchor a decision on and defaults to UNKNOWN. More runtime context (loaded libraries, actual call graph) would narrow this further; that's what Stages 2 and 3 cover.
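For illustration only (the actual v3 rules aren't reproduced here), the heuristics added in v3 might look like extra instructions appended to the PROMPT sketched earlier:

```python
# Illustrative only: the article's actual v3 heuristics are not published here.
V3_HEURISTICS = """Heuristics:
- Build-time-only packages (compilers, package managers) in a runtime image
  lean toward Not Applicable.
- Libraries on the network-facing path (TLS, HTTP parsing) of a web server
  image lean toward Directly Exposed.
- If the package's role in this image cannot be determined, answer Unknown
  rather than guessing."""

prompt = PROMPT + "\n" + V3_HEURISTICS  # appended to the classification prompt
```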
Applying the Algorithm
Step 1: Scan
| Image | Critical | High | Medium | Low | Unknown | Total |
|---|---|---|---|---|---|---|
| node:20-alpine | 0 | 12 | 1 | 2 | 0 | 15 |
| nginx:1.25 | 16 | 61 | 125 | 156 | 5 | 363 |
| grafana/grafana:10.0.0 | 15 | 70 | 190 | 18 | 0 | 293 |
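Using the scan() helper sketched in Setup, the table reduces to a severity tally. The counts shown are from the article's scan and will drift as the vulnerability databases update:

```python
from collections import Counter

def severity_counts(vulns: list[dict]) -> Counter:
    """Tally findings by Trivy's Severity field."""
    return Counter(v.get("Severity", "UNKNOWN") for v in vulns)

# At the time of the article's scan:
# severity_counts(scan("nginx:1.25"))
# -> Counter({'LOW': 156, 'MEDIUM': 125, 'HIGH': 61, 'CRITICAL': 16, 'UNKNOWN': 5})
```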
Step 2: Enrich
Across the three images, 3 entries match KEV (2 unique CVEs):
| CVE | Severity | Package | Image |
|---|---|---|---|
| CVE-2025-27363 | HIGH | libfreetype6 | nginx |
| CVE-2023-44487 | HIGH | nghttp2-libs | grafana |
| CVE-2023-44487 | MEDIUM | golang.org/x/net | grafana |
CVE-2023-44487 is the HTTP/2 Rapid Reset Attack, one of the most widely exploited vulnerabilities of 2023. It appears twice in grafana because two separate packages are affected.
Step 3: Categorize
LOW-severity CVEs without a KEV flag are ignored by default (176 CVEs), and another 176 go straight to PROCESS via the KEV/EPSS rule. The remaining 319 CVEs went through LLM classification: 125 were assigned Not Applicable and ignored. Three examples:
| Image | CVE | Sev | EPSS | KEV | Category | Decision |
|---|---|---|---|---|---|---|
| nginx | CVE-2025-27363 | HIGH | 0.65 | ✓ | KEV | PROCESS |
| node | CVE-2024-21538 | HIGH | 0.00 | — | Directly Exposed | PROCESS |
| nginx | CVE-2024-2398 | HIGH | 0.02 | — | Not Applicable | IGNORE |
Results
| Image | Before | Process | Ignore | Noise reduction |
|---|---|---|---|---|
| node:20-alpine | 15 | 11 | 4 | 27% |
| nginx:1.25 | 363 | 110 | 253 | 70% |
| grafana/grafana:10.0.0 | 293 | 249 | 44 | 15% |
Where static context runs out
CVE-2026-23950 is a race condition in the npm tar package. The model flagged it Not Applicable: “the library is not loaded at runtime.” For a production container running node app.js that reasoning holds: npm never runs, tar never loads. But the CVE description offered a stronger argument the model ignored: the vulnerability only triggers on case-insensitive filesystems such as macOS APFS, and Alpine uses ext4. Without that reasoning, the classification breaks for any container where npm install runs at runtime. This points to prompt design: explicit instructions to reason about OS and filesystem context would produce a more reliable result.
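As a hypothetical example, such an instruction could be appended to the classification prompt, with {distro} and {filesystem} filled in from image metadata:

```python
# Hypothetical prompt addition motivated by the tar example above.
PRECONDITION_RULE = """Before classifying, check whether the CVE's trigger
conditions (OS, filesystem, architecture) can hold in this image, which runs
{distro} on {filesystem}. If a precondition cannot hold, answer Not Applicable
and name the precondition that fails."""
```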
Conclusion
Across three images, 671 CVEs were triaged: 301 filtered out without manual review, 370 flagged for attention. Noise reduction ranges from 15% to 70% depending on the image type and how conservatively the prompt is tuned.
The approach works best on images with a known, fixed purpose. For generic base images without application context, Not Applicable is harder to assign and more CVEs default to PROCESS.
The more context the prompt has (image purpose, entrypoint, package type), the more accurate the classifications and the fewer CVEs end up in Unknown. The three prompt versions above show the range: 28% to 45% noise reduction from the same dataset, purely from rule differences. This approach is better than raw severity filtering and worse than reachability analysis: it reasons from metadata, not from what the container actually loads. The tar example above shows where that matters. Treat the results as a first pass: useful for cutting the queue, but the prompt needs tuning for your specific infrastructure, and the calls it makes on ambiguous cases need verification. Stages 2 and 3 provide that.