Human versus Machine:

Comparing Human and
AI-Generated Prototypes

A usability study investigating whether an AI-generated interface can match a prototype created by interaction design students across effectiveness, efficiency, and user satisfaction.

Timeline Jan 2025 - Jun 2025 6 months
Role Researcher, Analyst, Author
Tools Visily AI, SPSS, Figma
HCI Research Usability Testing Generative AI Statistical Analysis
Human and AI prototype comparison
Research approach

Thesis process

The study moved from literature and experiment design through controlled usability testing, statistical analysis, and evaluation.

Research

Review HCI, usability, and generative AI literature.

Design

Select prototypes and plan a fair comparison.

Test

Run tasks, observation, recordings, and SUS.

Analyze

Compare results using SPSS and statistical tests.

Conclude

Evaluate AI's role in interaction design.

00
Research question

Can AI-generated prototypes achieve comparable usability?

Rather than judging visual aesthetics, the thesis examined how real users interacted with each solution. It measured effectiveness, efficiency, and user satisfaction to understand whether current AI prototyping tools can match human design work.

1. Research

Problem framing

Generative AI has sparked debate about how design professions may change. While text and image generation receive significant attention, less research has focused on AI-powered prototyping tools and their ability to produce usable interaction designs.

The thesis therefore focused on observed user behavior and measurable outcomes rather than assumptions about AI capability.

3 Usability dimensions
2 Prototype conditions
1 Controlled comparison
2. Experiment design

Human prototype vs AI generation

A high-fidelity student prototype for urban farming and community gardening was selected as the human-designed condition. The same design brief was translated into prompts for Visily AI.

Participants were not told which condition they received. Both prototypes were adjusted to support identical tasks and reduce bias during testing.

Visily AI prompt

Six requested screens

  1. Home dashboard with local gardens, workshops, and sustainable practices.
  2. Garden Connection for browsing gardens and joining community events.
  3. Community Hub with discussions, tips, and user gardening projects.
  4. Private messaging between users to discuss gardening.
  5. Calendar showing private events and friends' schedules.
  6. Social post screen with like, comment, and share interactions.
3. User testing

Controlled data collection

Twenty-two Stockholm University participants were divided into two groups using a between-subjects design. Each person interacted with only one prototype to prevent learning effects.

Five predefined tasks, screen recordings, click tracking, and SUS questionnaires produced quantitative data for comparison.

22 Participants 11 human / 11 AI
5 Tasks Identical in both conditions
A/B Study design Between subjects
EffectivenessMisclicks and navigation errors
EfficiencyTask completion time
SatisfactionSystem Usability Scale
4. Analysis

Results at a glance

The prototypes performed similarly in satisfaction and total task time. Individual tasks favored different solutions, while the clearest difference appeared in effectiveness: the AI prototype produced almost twice as many misclicks.

SUS score 80.0 / 70.4 Human vs AI, no significant difference
Total task time 133s / 132s Human vs AI

Task 2

Add a garden plot to favorites.

Human29s
AI58s
Human prototype faster p = 0.003

Task 3

Read the fourth step in a vegetable recipe.

Human39s
AI19s
AI prototype faster p = 0.003
Misclicks ~2x Higher in the AI-generated prototype
Reflection

AI as collaborator rather than replacement

The study challenged simple claims that AI-generated interfaces are either dramatically better or worse. AI produced comparable satisfaction and speed, but required human adjustment and caused more navigation errors.

The project reinforced that interaction design is not only about generating screens. Understanding behavior, measuring outcomes, and validating decisions with users remain essential.

Read the full thesis