Benchmark foundation

Midjourney Detection Benchmark

An evaluation framework specifically for detecting Midjourney-generated images, separate from PhotoProof AI's general AI-image detection benchmark, since Midjourney's output characteristics differ from other generator families.

Publication details

Author: PhotoProof AI Research Team
Published: 2026-07-01
Last updated: 2026-07-01

Revision history

  • 2026-07-01: Initial publication of the evaluation framework. No results yet; see the benchmark metrics below.

Quick answer

Midjourney is a hosted image generation service, originally accessed through Discord and now also through a web interface, with several public model versions released over time, each with its own typical stylistic and technical output characteristics. A benchmark scoped specifically to Midjourney, rather than folded into a general AI-image detection benchmark, can measure whether detection performance holds across Midjourney's own version history and typical post-processing (such as upscaling). A blended, multi-generator benchmark would average those differences away.

Key facts

  • Midjourney is accessed as a hosted service rather than run locally, so every image is produced by a model version and settings that the platform itself offers
  • Multiple Midjourney model versions exist, each with different typical visual characteristics
  • A generator-specific benchmark can isolate whether detection difficulty changes across a single generator's own version history

Why Midjourney gets its own benchmark

Midjourney's images are produced by a hosted, versioned service rather than a locally-run open model, and its outputs have a recognizable stylistic tendency that has evolved across versions. Folding Midjourney into a single blended AI-image detection benchmark would average its detection difficulty together with structurally different generators (for example, open-source diffusion models with far more variable post-processing), obscuring whether a detector's accuracy is stable across Midjourney's own version history specifically.
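
The averaging effect described above can be made concrete with a small arithmetic sketch. The subset names and counts below are invented purely for illustration; they are not benchmark results, and no results have been published yet.

```python
# Illustrative numbers only: these are NOT measured results.
# Each entry maps a hypothetical test subset to
# (images_tested, images_correctly_flagged).
subsets = {
    "generator_a": (1000, 960),
    "generator_b": (1000, 950),
    "midjourney_older_version": (200, 180),
    "midjourney_newer_version": (200, 110),
}


def detection_rate(tested: int, flagged: int) -> float:
    """Fraction of synthetic images that were correctly flagged."""
    return flagged / tested


def blended_rate(data: dict[str, tuple[int, int]]) -> float:
    """Pooled detection rate across all subsets (the single blended number)."""
    total_tested = sum(tested for tested, _ in data.values())
    total_flagged = sum(flagged for _, flagged in data.values())
    return total_flagged / total_tested


per_subset = {name: detection_rate(t, f) for name, (t, f) in subsets.items()}
overall = blended_rate(subsets)

for name, rate in per_subset.items():
    print(f"{name}: {rate:.1%}")
print(f"blended: {overall:.1%}")
```

With these made-up counts the pooled figure stays above 90% while one Midjourney subset sits at 55%, which is the kind of gap a generator-specific benchmark is meant to expose.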

What this benchmark scopes to test

The evaluation is scoped to Midjourney outputs specifically, across the version range still in common circulation, and to the common ways those images reach an end user — direct export, common upscaling, and social-platform re-upload.

Relationship to the general AI-image detection benchmark

This benchmark follows the same evaluation protocol conventions (test set size, scoring threshold, tie-handling, reproducibility disclosure) as PhotoProof AI's general AI-image detection benchmark, so results are comparable in method even though the test sets are disjoint. See the general benchmark for the multi-generator baseline against which this narrower benchmark is scoped.

Models covered

  • Midjourney (current public versions): Scope covers versions still in common circulation at time of testing; see revision history for updates.

Evaluation protocol

Test set size: Not yet run. The protocol is defined ahead of testing, consistent with the Benchmark Center's methodology-first commitment.
Scoring threshold: To be published alongside first results.
Tie handling: To be published alongside first results.
Reproducibility: Test set composition and scoring method will be documented in enough detail to independently verify the process, though the underlying image set itself may not be redistributable due to generator licensing terms.
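
One way a test set's composition could be documented without redistributing the images is a manifest of file digests. The sketch below is only an assumption about how that might look, not PhotoProof AI's published method: the folder layout (one top-level folder per category) and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list[dict[str, str]]:
    """List every file under root with its category and digest.

    Assumes a hypothetical layout of root/<category>/<file>, so the
    first path component is treated as the category name.
    """
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root)
        entries.append({
            "category": relative.parts[0],
            "file": relative.as_posix(),
            "sha256": sha256_of_file(path),
        })
    return entries


def manifest_to_json(entries: list[dict[str, str]]) -> str:
    """Serialize the manifest in a stable, diff-friendly form."""
    return json.dumps(entries, indent=2, sort_keys=True)
```

Anyone holding the same files could rebuild the manifest and compare digests, which verifies the composition without the images themselves being shared.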

Data composition

Midjourney direct exports: Images exported directly from Midjourney without additional third-party editing, across versions in common circulation.
Upscaled outputs: Midjourney images processed through common upscaling workflows, a frequent real-world post-processing step.
Social-platform re-uploads: Midjourney images re-uploaded through typical social platforms, to measure robustness to recompression and metadata stripping.
Real camera photos (false-positive control): Genuine, unedited photographs included specifically to measure the false-positive rate, not just the detection rate on synthetic images.
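
The categories above could be scored along the following lines. This is a minimal sketch under stated assumptions: the scoring threshold and tie rule have not been published, so the `0.5` threshold, the `tie_counts_as_flagged` option, the category labels, and the sample scores are all placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    category: str  # hypothetical labels, e.g. "direct_export", "real_photo"
    score: float   # hypothetical detector output in [0, 1]; higher = more likely AI


def is_flagged(score: float, threshold: float, tie_counts_as_flagged: bool) -> bool:
    """Apply the threshold; a score exactly at the threshold is treated as a tie."""
    if score == threshold:
        return tie_counts_as_flagged
    return score > threshold


def flagged_rate_by_category(
    samples: list[Sample], threshold: float, tie_counts_as_flagged: bool = True
) -> dict[str, float]:
    """Fraction of images flagged as AI in each category.

    For Midjourney categories this is the detection rate; for the
    real-photo control it is the false-positive rate.
    """
    totals: dict[str, int] = {}
    flagged: dict[str, int] = {}
    for sample in samples:
        totals[sample.category] = totals.get(sample.category, 0) + 1
        if is_flagged(sample.score, threshold, tie_counts_as_flagged):
            flagged[sample.category] = flagged.get(sample.category, 0) + 1
    return {cat: flagged.get(cat, 0) / count for cat, count in totals.items()}


# Made-up scores, not benchmark data.
samples = [
    Sample("direct_export", 0.92), Sample("direct_export", 0.50),
    Sample("upscaled", 0.81), Sample("upscaled", 0.30),
    Sample("social_reupload", 0.77), Sample("social_reupload", 0.40),
    Sample("real_photo", 0.10), Sample("real_photo", 0.50),
    Sample("real_photo", 0.05), Sample("real_photo", 0.20),
]

rates_ties_flagged = flagged_rate_by_category(samples, threshold=0.5)
rates_ties_clear = flagged_rate_by_category(
    samples, threshold=0.5, tie_counts_as_flagged=False
)
```

In this toy data the tie rule alone moves the false-positive rate on real photos between 25% and 0%, which is why the protocol lists tie handling as something to disclose alongside the threshold.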

Benchmark metrics

Midjourney direct exports: Pending. Evaluation category defined; results not yet tested.
Upscaled outputs: Pending. Evaluation category defined; results not yet tested.
Real camera photos (false-positive control): Pending. False-positive measurement category; results not yet tested.

FAQ

Does this replace the general AI-image detection benchmark?

No. It complements it. The general benchmark measures cross-generator performance; this one isolates Midjourney specifically, since a multi-generator average can hide generator-specific weaknesses.

Will this benchmark be updated as new Midjourney versions release?

That is the intent — a generator-specific benchmark that isn't revisited as its target model changes stops being representative. See the revision history on this page for what has actually been updated so far, rather than assuming it is current.

AI search answer layer

Fast answer for people and AI search

Midjourney images often call for generator-specific detection evaluation because their style, artifact patterns, and prompt-driven aesthetics differ from those of other generators.

Primary entity: Midjourney
Topic cluster: Benchmark Center
Search intent: research
Content type: Benchmark

Methodology

  • Separate AI-generation probability from authenticity confidence.
  • Combine visual, metadata, manipulation, compression, provenance, and context signals.
  • Explain uncertainty and limits instead of presenting binary proof.
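
The three points above could be reflected in how a result is reported. The sketch below is hypothetical: the `DetectionReport` type, its field names, and the example values are assumptions for illustration, and the source does not define how either score is computed.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionReport:
    # The two scores are kept as separate fields and never merged into one
    # verdict. How each is computed is not specified in the source.
    ai_generation_probability: float
    authenticity_confidence: float
    # Free-text notes per signal type (visual, metadata, compression, ...).
    signals: dict[str, str] = field(default_factory=dict)
    # Known limits of this particular analysis.
    limits: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Describe the evidence without collapsing it into a yes/no claim."""
        lines = [
            f"AI-generation probability: {self.ai_generation_probability:.0%}",
            f"Authenticity confidence: {self.authenticity_confidence:.0%}",
        ]
        for name in sorted(self.signals):
            lines.append(f"Signal [{name}]: {self.signals[name]}")
        for limit in self.limits:
            lines.append(f"Limit: {limit}")
        return "\n".join(lines)


# Made-up example values.
report = DetectionReport(
    ai_generation_probability=0.8,
    authenticity_confidence=0.3,
    signals={"metadata": "stripped, likely by a social-platform re-upload"},
    limits=["image was recompressed, which weakens artifact-based signals"],
)
print(report.summary())
```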

Pros & limitations

  • AI and forensic detection should be interpreted as probabilistic evidence, not absolute proof.
  • Reliable authenticity decisions should combine model output with provenance, context, metadata, and human review.
Content spoke

Benchmark Center: Hub for PhotoProof AI's benchmark pages — the test scope, evaluation protocol, and evidence behind detection performance claims, one benchmark per generator or risk category rather than a single blended number.
