Muse: Outfit curation, reimagined with AI

An eight-week design sprint inside Snap's ARES Shopping Suite. The goal was to turn a passive browse experience into a conversation that assembles complete, shoppable looks around what a user actually needs.

Task success rate

88%

Error rate

3%

User satisfaction

4.0 / 5.0

My role

Product Designer

Timeline

Eight-week sprint, 2023

Team

PM, User Researcher, Data Scientists, Engineers, 3 Designers

Platform

Web and Mobile via ARES Shopping Suite

Context

A sprint with something real at stake

By 2023, Snap's ARES Shopping Suite had two strong technical foundations in place: an outfit recommendation algorithm that could assemble looks from live catalogue inventory, and a virtual try-on pipeline that rendered garments onto a user's body with accurate fabric and lighting. What it lacked was a front-end experience that connected them around a shopper's actual intent. Discovery was still a browse-and-hope exercise with no personalisation and no guidance.

Three designers each developed a competing proposal. The strongest concept would be selected for a real A/B test with retail clients. My hypothesis going in was straightforward: the problem wasn't the technology. Shoppers still had to do the hardest part themselves, spending 20 to 40 minutes jumping between tabs and mentally assembling a look from individual product pages, with no guarantee the pieces would actually work together.

Muse's premise was simple: a shopper who types "I need something for a coastal holiday in July" has given the system more useful signal in one sentence than a recommendation engine gets from months of browse history. The conversation replaces the browse.

Muse never made it to production. The 2023 usability test confirmed the interaction model worked. What it couldn't confirm was whether a live algorithm would generate recommendations good enough to hold that trust over time.

Weeks	Phase	With
1–2	Research: interviews, empathy mapping, card sorting, competitor analysis	User Researcher, Data Scientists
3–4	Ideation: crazy 8s, wireframes, flow validation	Head of Design, PM
5–6	High-fidelity design and prototyping	Head of Design, Engineers
7	Usability testing: 12 participants, moderated	User Researcher
8	Iteration, consent architecture, developer handoff	Legal and Privacy, Engineers

Interactive prototype

Prototype built in Framer. Covers conversational discovery, outfit reveal, and virtual fitting room on desktop and mobile.

Problem

Finding pieces is easy. Building a look isn't.

Shoppers in the ARES suite could already see how a garment fit their body. What they couldn't do was get help choosing what to try on. The discovery phase was entirely unassisted. Every major retailer already used some form of AI to suggest products, but those suggestions were generic, backward-looking, and unexplained. Shoppers had learned to ignore them. The issue wasn't personalisation. It was that nothing ever showed its reasoning.

Core problem statement

How do we reduce the cognitive work of building a complete outfit from hours to minutes, while earning enough trust that shoppers actually act on the recommendations?

"I can find individual pieces I like but I never know if they're going to work together until I'm standing in front of a mirror. By then I've already paid for everything."

Usability test participant, 2023

Research & Analysis

What two weeks of research taught us

We ran interviews, empathy mapping, card sorting, and competitor analysis alongside the User Researcher and Data Scientists. We needed to settle one question before designing anything: were shoppers ignoring AI recommendations because they were inaccurate, or for some other reason entirely? The answer had completely different implications for what we should build.

Three shopper archetypes

We started by defining three archetypes from interview synthesis, then used them throughout the project to pressure-test design decisions.

The Style Seeker

Enjoys discovery, wants curation

Elif loves browsing but gets overwhelmed. She wants a curated starting point, not a blank search bar.

The Trend-Led Shopper

Social proof matters

Ji-woo tracks occasions and seasons. She appreciates recommendations that feel intentional and current.

The Efficient Buyer

Goal-oriented, low patience

Marcus shops with a specific need. He will abandon if the flow doesn't move fast enough.

The trust gap wasn't accuracy. It was silence

Across 12 interviews, empathy mapping surfaced one finding that reframed the entire brief. Shoppers weren't sceptical of AI recommendations because they thought the technology was wrong. They ignored them because nothing explained the thinking behind them. A recommendation without context felt arbitrary. And arbitrary doesn't convert.

Empathy mapping across 12 participants. The trust gap wasn't accuracy. It was silence.

Everyone had recommendations. Nobody showed their work

We mapped Stitch Fix, Zalando, Farfetch, and Amazon across onboarding friction, recommendation transparency, and AR depth. Tools that asked a lot upfront lost users before seeing a result. Tools that asked nothing felt generic. Nobody had built something that gathered preference through conversation and showed its reasoning. That gap became Muse.

Competitor analysis across Stitch Fix, Zalando, Farfetch, and Amazon. The low friction, high transparency quadrant was empty.

What research confirmed

01 Trust requires explanation

Users didn't distrust AI recommendations because they were inaccurate. They ignored them because nothing explained the reasoning. Visible context was non-negotiable.

02 Style and body data are separate

Shoppers treat style preferences and body data as completely separate decisions. Asking about both in the same flow felt intrusive and caused drop-off before the experience had started. The two needed to be separate entry points.

03 Conversation over questionnaire

Open prompts felt low-effort and personal. Structured onboarding forms felt like admin. The format of the input shapes how willing users are to engage.

04 AR try-on converts at the decision point

Seeing an item on a body like yours was the most effective way to move from interest to purchase. The try-on needed to be one tap from the recommendation, not a separate flow.

Design

Designing the conversation, the curation, and the fitting room

Six weeks of design across sketches, wireframes, and high-fidelity screens. The concept held from the very first session: a conversational interface that returns complete outfit looks grounded in real inventory, with visible reasoning throughout, connecting directly into a personal virtual fitting room.

Crazy 8s and sketching

Eight directions explored in one session, from search-first to conversation-first and from model selectors to product grids. The chat prompt and outfit reveal in the bottom right won out.

Wireframes and flow validation

The full journey mapped and validated with the PM and engineering lead before moving to high fidelity.

A direction we tried and ruled out

The earlier direction surfaced individual items by category, with a separate fitting room and a choice of generic models. No photo upload. Building an outfit meant users picking and pairing pieces manually. We scrapped it because the hard work was still on the shopper, the fitting room was a separate step, and generic models weren't personalisation.

The final design

Built within Snap's ARES Shopping Suite design system, the final concept runs across three connected surfaces: a conversational discovery interface, a virtual fitting room, and a Fit Finder integration for sizing.

Each surface was designed separately for desktop and mobile, not as a responsive adaptation but as two distinct layouts built around how people actually shop on each. On desktop, outfit cards sit side by side with a persistent chat sidebar. On mobile, cards stack full-width with reasoning tucked behind a tap. The fitting room goes from a three-panel layout on desktop to an 80% viewport preview on mobile with a bottom sheet for actions.

Desktop: persistent sidebar with chat history, outfit cards side-by-side, reasoning note visible beneath each look name.

Mobile: outfit cards stacked full-width with the reasoning note behind a tap. The fitting room keeps the focus on the look, with a bottom sheet for actions.

The photo upload introduced a consent design problem. Uploaded photos can qualify as biometric data under GDPR in the EU and under BIPA in Illinois, which meant a separate, explicit consent step was needed directly before the upload UI rather than bundled with general data consent. The EU and North American flows were designed separately rather than as a single global compromise. That work happened in week eight alongside Legal and Privacy.
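The consent gate can be sketched in a few lines. This is a minimal illustration, not the shipped logic: `ConsentState`, `CONSENT_FLOWS`, the screen names, and the two region keys are all hypothetical, and the actual regional requirements were defined with Legal and Privacy.

```python
from dataclasses import dataclass


@dataclass
class ConsentState:
    """Consent flags held for one shopper (hypothetical structure)."""
    general_data: bool = False     # general data-processing consent
    photo_biometric: bool = False  # separate, explicit consent for photo upload


# Hypothetical screen names: each region gets its own consent flow
# rather than one global compromise.
CONSENT_FLOWS = {"EU": "eu_photo_consent", "NA": "na_photo_consent"}


def next_step_before_upload(state: ConsentState, region: str) -> str:
    """Return the screen to show when the shopper taps 'Upload your photo'."""
    if not state.photo_biometric:
        # General consent alone never unlocks the upload UI.
        return CONSENT_FLOWS[region]
    return "upload_ui"
```

The point of the sketch is the ordering: the photo consent check sits directly in front of the upload UI and is independent of the general consent flag.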

Interaction

How the conversation becomes a purchase

The three surfaces only make sense as a connected sequence. Here is how a user moves through them, and the decisions that shaped each moment.

1. The prompt: visible thinking, not instant results

The user types a natural-language prompt or selects a chip. On send, the AI enters a visible streaming state: a typing indicator appears and the response builds progressively. An instant result feels like a filter. A streaming response feels like someone thinking. The distinction is small technically and significant experientially.
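As a rough illustration of the streaming state, the sketch below yields a reply in growing prefixes, the way a chat surface might render it. The function name and the word-based chunking are assumptions for illustration; the prototype itself was built in Framer and a live system would stream whatever the model emits.

```python
from typing import Iterator


def stream_response(text: str, chunk_words: int = 3) -> Iterator[str]:
    """Yield progressively longer prefixes of the reply, a few words at a time."""
    words = text.split()
    # Step past the end so the final yield always contains the full reply.
    for end in range(chunk_words, len(words) + chunk_words, chunk_words):
        yield " ".join(words[:end])


for partial in stream_response("Here are three looks for a coastal holiday in July"):
    print(partial)
```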

2. One clarifying question, more precise output

Before returning outfit looks, Muse may ask a single follow-up question to narrow the brief, for example: "Is this more of a restaurant dinner or something with a bar and dancing after?" The answer refines the recommendations and adds a preference signal to the style model. This is also Muse's cold start solution: a first-time user with no history can be guided toward something specific through one question. The conversation is the onboarding.

3. Complete looks, not a product grid

Two or three outfit cards appear, each with a look name, assembled pieces, total price, and a "Shop this look" CTA. A brief reasoning note sits beneath each name explaining why this combination fits the prompt. Every recommendation shows its work. This is the most important design decision in the entire flow and the clearest point of difference from every existing recommendation engine.
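The card contents described above map onto a small data structure. The sketch below is one possible shape, with hypothetical class names, an invented look name, and made-up pieces and prices; only the reasoning note is taken from the prototype.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Piece:
    name: str
    price: Decimal


@dataclass(frozen=True)
class OutfitCard:
    look_name: str
    reasoning: str              # the note shown beneath the look name
    pieces: tuple[Piece, ...]   # the assembled pieces in the look
    cta: str = "Shop this look"

    @property
    def total_price(self) -> Decimal:
        """Total for the complete look, summed from its pieces."""
        return sum((p.price for p in self.pieces), Decimal("0"))


# Illustrative data only: the look name, pieces, and prices are invented.
card = OutfitCard(
    look_name="Harbour Evening",
    reasoning="Perfect for a sunset boat trip or cliffside dinner",
    pieces=(
        Piece("Linen shirt", Decimal("45.00")),
        Piece("Wide-leg trousers", Decimal("60.00")),
    ),
)
```

Making `reasoning` a required field rather than an optional one reflects the design rule that no look is shown without its explanation.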

4. Out-of-stock handling: complete looks only

When an item in a recommended look is out of stock, Muse substitutes the look rather than surfacing an incomplete one. The next best complete and shoppable alternative is shown in its place. If Fit Finder has been completed, Muse filters by the user's size before surfacing any looks. On a first visit with no size data, looks are shown with a prompt to complete Fit Finder before adding to cart.

5. Photo upload and the seven-second wait

The fitting room leads with "Upload your photo" as the primary action. On upload, a loading indicator appears on the model frame alongside a live countdown timer. At approximately seven seconds the rendered image fades in. Making the generation time visible was a deliberate trust decision: it signals that something computationally real just happened, rather than a static image swap.

6. One tap to add the complete look

A single "Add all to cart" CTA replaces five separate add-to-cart interactions. For returning users with a size profile, their recommended size is pre-selected. For first-time users, clicking "Add all to cart" triggers the Fit Finder prompt inline. The user completes the short sizing flow and the items are added in the right sizes without leaving the page.
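The branching behind that single tap can be sketched as follows. The function name, the action strings, and the shape of `size_profile` are assumptions made for illustration, not the engineering handoff spec.

```python
from typing import Optional


def add_all_to_cart(piece_ids: list[str], size_profile: Optional[dict[str, str]]) -> dict:
    """Resolve a single 'Add all to cart' tap.

    size_profile maps piece id -> recommended size, or is None when
    Fit Finder has not been completed yet (hypothetical shape).
    """
    if size_profile is None:
        # First-time user: open Fit Finder inline and add nothing yet.
        return {"action": "open_fit_finder_inline", "cart": []}
    # Returning user: every piece goes in with its pre-selected size.
    cart = [{"piece": pid, "size": size_profile[pid]} for pid in piece_ids]
    return {"action": "added", "cart": cart}
```

Once the inline sizing flow finishes, the same function would be called again with the new profile, so the shopper never leaves the page.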

Muse doesn't rely on upfront profiling. It builds a style model from signals a user generates naturally as they interact: what they dwell on, what they skip, what they explicitly reject in a prompt. Session signals stay local to the conversation. Persistent ones, like adding to cart or completing Fit Finder, update a long-term profile across future sessions. After roughly three sessions the experience becomes meaningfully personalised. On a first visit, the conversational prompt and clarifying question stand in for history entirely.
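A minimal sketch of the two signal tiers, under stated assumptions: the signal labels and class name are hypothetical, the three-session threshold is the rough figure quoted above, and real signal weighting (which the Data Scientists advised on) is left out entirely.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the signals named in the text.
SESSION_SIGNALS = {"dwell", "skip", "prompt_reject"}
PERSISTENT_SIGNALS = {"add_to_cart", "fit_finder_complete"}
PERSONALISED_AFTER_SESSIONS = 3  # "roughly three sessions"


@dataclass
class StyleModel:
    long_term: list = field(default_factory=list)  # survives across sessions
    session: list = field(default_factory=list)    # local to one conversation
    sessions_completed: int = 0

    def record(self, signal: str, item: str) -> None:
        """Route a signal to the session tier or the long-term profile."""
        if signal in PERSISTENT_SIGNALS:
            self.long_term.append((signal, item))
        elif signal in SESSION_SIGNALS:
            self.session.append((signal, item))
        else:
            raise ValueError(f"unknown signal: {signal}")

    def end_session(self) -> None:
        """Drop session-only signals and count the finished session."""
        self.session.clear()
        self.sessions_completed += 1

    @property
    def is_personalised(self) -> bool:
        return self.sessions_completed >= PERSONALISED_AFTER_SESSIONS
```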

Availability and sizing decision logic

Condition	Action
All items in stock in the user's size	Show look normally.
One item unavailable in the user's size, available in an adjacent size	Substitute the item with the closest available size. Surface a note flagging the size difference. The shopper decides whether to proceed.
One item low stock or fully out of stock	Substitute the look with the next best complete alternative.
Hero garment out of stock or unavailable in the user's size	Substitute the look entirely. No partial looks are surfaced.
Two or more items unavailable	Substitute the look with the next best complete alternative.
No size data yet (first visit, Fit Finder not completed)	Surface looks without size filtering. Prompt to complete Fit Finder before adding to cart. Size-aware substitution activates after the first sizing interaction.
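The rules above can be expressed as one small decision function. This is a sketch under assumptions: the `Item` fields, the outcome strings, and the order in which the rules are checked are hypothetical, and the last branch (a non-hero item with no adjacent size) is not covered by the rules, so it is treated here as unavailable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    hero: bool = False
    stock: str = "in_stock"  # "in_stock", "low", or "out"
    size_fit: str = "exact"  # "exact", "adjacent", or "none"; ignored without size data


def decide(items: list[Item], has_size_data: bool) -> str:
    """Map a candidate look to one of the outcomes in the rules above."""
    # Any low-stock or out-of-stock piece, hero or not: swap the whole look.
    if any(i.stock != "in_stock" for i in items):
        return "substitute_look"
    # First visit: no size filtering, but sizing is required before cart.
    if not has_size_data:
        return "show_look_prompt_fit_finder"
    misses = [i for i in items if i.size_fit != "exact"]
    if not misses:
        return "show_look"
    # Two or more misses, or a miss on the hero garment: no partial looks.
    if len(misses) >= 2 or misses[0].hero:
        return "substitute_look"
    if misses[0].size_fit == "adjacent":
        return "swap_size_with_note"
    # Assumption: no adjacent size either, so treat the item as unavailable.
    return "substitute_look"
```

Checking stock before size keeps the "complete looks only" rule in force even for first-time visitors with no size data.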

Usability testing

What we learned from 12 participants

Moderated usability testing with 12 participants, recruited to reflect the three archetypes from research. Each was given the same scenario: find a complete outfit for a specific occasion using whichever tools the prototype offered. Sessions were conducted remotely using a think-aloud protocol, with a second team member noting where participants hesitated or expressed uncertainty.

The most revealing moments weren't in the metrics. Several participants paused after seeing the outfit card reasoning note. Lines like "Perfect for a sunset boat trip or cliffside dinner" landed exactly as intended. The reaction was consistent: "Oh, so it actually understood what I meant." That confirmed the core research finding: unexplained recommendations are the trust problem, not inaccurate ones.

Metric	Muse	Proposal B	Proposal C
Task success rate	88%	75%	70%
Error rate	3%
Feature engagement	65%	50%	45%

The feature engagement gap was the most meaningful number: 65% for Muse against 50% and 45% for the other two proposals. Users who tested Muse were more likely to try both the fitting room and Fit Finder, not because they were prompted to, but because the outfit-first flow made those surfaces feel like natural next steps.

It's worth being honest about the limits here. Participants responded to static outfit cards with pre-written reasoning notes, not live AI-generated recommendations. What the test validated was the interaction model: that outfit-level curation, visible reasoning, and a connected fitting room created a more confident and engaged shopper. Whether the algorithm produces recommendations good enough to sustain that trust at scale is a question only a live A/B test can answer, which is exactly what the next phase was designed to address.

Reflection

What I'd do differently

The blank prompt

A blank prompt was too much of a design bet for users who hadn't used anything like it before. I'd test prompt chips, suggested starting points, and a browse fallback before committing to the open-ended entry as the only path.

Clarifying questions

The clarifying exchange was added as a concept but not usability-tested in enough depth. I'd want to know how many users skip it, whether the question timing feels natural, and whether the visible effect on the outfit output is enough to make the exchange feel worthwhile.

Out-of-stock handling

Substitution was the right call, but the logic itself needed more testing than it got. How similar does a substitute need to be before it feels like a real alternative rather than a consolation? And should the swap be visible to the user or happen silently? Only live data can answer that.

Photo upload privacy

Shifting to personal photo upload was the right direction. But the privacy and comfort implications, particularly GDPR biometric provisions in the EU and BIPA in Illinois, needed earlier legal alignment and more usability testing than the timeline allowed.

Making the learning visible

Signal collection happens silently. In retrospect I'd surface a lightweight "Here's what I've learned about your style" card after a few sessions. Personalisation that's visible gives users a sense of control over it, which closes the same trust gap identified in research.

Role and collaborators

What I owned and who I worked with

I was one of three designers, each developing a competing proposal during the sprint. My responsibility covered the full end-to-end experience: research participation, ideation, wireframing, high-fidelity design across desktop and mobile, and prototype preparation for usability testing. The conversational discovery flow, the fitting room reframe from "View on Model" to personal photo upload, and the personalisation signal model were all decisions I proposed and owned. The privacy and consent architecture was developed collaboratively with Legal and Privacy in week eight.

Head of Product

Oversaw product strategy and ensured alignment with business goals across all three proposals.

Head of Design

Guided the overall design vision and facilitated bi-weekly critique sessions throughout the sprint.

User Researcher

Ran the interview programme, facilitated empathy mapping, and moderated the usability test sessions.

Data Scientists

Advised on signal weighting and provided the card sorting similarity matrix.

Engineers

Consulted throughout on feasibility, and scoped the MVP build estimate for the A/B test phase.

Legal and Privacy

Defined EU and North American consent requirements and reviewed the consent architecture.
