Case study, Muse, Snap Inc, 2023

Muse: Outfit curation, reimagined with AI

An eight-week design sprint inside Snap's AR Shopping Suite. The goal was to turn a passive browse experience into a conversation that assembles complete, shoppable looks around what a user actually needs.

Task success rate

88%

Error rate

3%

User satisfaction

4.0 / 5.0

My role

Product Designer

Timeline

Eight-week sprint, 2023

Team

PM, User Researcher, Data Scientists, Engineers, 3 Designers

Platform

Web and Mobile via ARES Shopping Suite

Context

A sprint with something real at stake

By 2023, Snap's ARES Shopping Suite had two strong technical foundations in place: an outfit recommendation algorithm that could assemble looks from live catalogue inventory, and a virtual try-on pipeline that rendered garments onto a user's body with accurate fabric and lighting. What it lacked was a front-end experience that connected them around a shopper's actual intent. Discovery was still a browse-and-hope exercise with no personalisation and no guidance.

Three designers each developed a competing proposal. The strongest concept would be selected for a real A/B test with retail clients. My hypothesis going in was straightforward: the problem wasn't the technology. Shoppers still had to do the hardest part themselves, spending 20 to 40 minutes jumping between tabs and mentally assembling a look from individual product pages, with no guarantee the pieces would actually work together.

Muse's premise was simple: a shopper who types "I need something for a coastal holiday in July" has given the system more useful signal in one sentence than a recommendation engine gets from months of browse history. The conversation replaces the browse.

Weeks | Phase | With
1–2 | Research: interviews, empathy mapping, card sorting, competitor analysis | User Researcher, Data Scientists
3–4 | Ideation: crazy 8s, wireframes, flow validation | Head of Design, PM
5–6 | High-fidelity design and prototyping | Head of Design, Engineers
7 | Usability testing: 12 participants, moderated | User Researcher
8 | Iteration, consent architecture, developer handoff | Legal and Privacy, Engineers

Interactive prototype

Prototype built in Framer. Covers conversational discovery, outfit reveal, and virtual fitting room on desktop and mobile.

Problem

Finding pieces is easy. Building a look isn't.

Shoppers in the ARES suite could already see how a garment fit their body. What they couldn't do was get help choosing what to try on. The discovery phase was entirely unassisted. Every major retailer already used some form of AI to suggest products, but those suggestions were generic, backward-looking, and unexplained. Shoppers had learned to ignore them. The issue wasn't personalisation. It was that nothing ever showed its reasoning.

Core problem statement

How do we reduce the cognitive work of building a complete outfit from hours to minutes, while earning enough trust that shoppers actually act on the recommendations?

"I can find individual pieces I like but I never know if they're going to work together until I'm standing in front of a mirror. By then I've already paid for everything."

Usability test participant, 2023

Research & Analysis

What two weeks of research taught us

We ran interviews, empathy mapping, card sorting, and competitor analysis alongside the User Researcher and Data Scientists. The goal wasn't to document methods for the sake of it. It was to build enough evidence to make decisions we could actually defend.

Three shopper archetypes

We started by defining three archetypes from interview synthesis, then used them throughout the project to pressure-test design decisions.

The Style Seeker

Enjoys discovery, wants curation

Elif loves browsing but gets overwhelmed. She wants a curated starting point, not a blank search bar.

The Trend-Led Shopper

Social proof matters

Ji-woo tracks occasions and seasons. She appreciates recommendations that feel intentional and current.

The Efficient Buyer

Goal-oriented, low patience

Marcus shops with a specific need. He will abandon if the flow doesn't move fast enough.

The trust gap wasn't accuracy. It was silence

Across 12 interviews, empathy mapping surfaced one finding that reframed the entire brief. Shoppers weren't sceptical of AI recommendations because they thought the technology was wrong. They ignored them because nothing explained the thinking behind them. A recommendation without context felt arbitrary. And arbitrary doesn't convert.

Empathy mapping across 12 participants. The trust gap wasn't accuracy. It was silence.

Competitor analysis across Stitch Fix, Zalando, Farfetch, and Amazon. The low friction, high transparency quadrant was empty.

Everyone had recommendations. Nobody showed their work

We mapped Stitch Fix, Zalando, Farfetch, and Amazon across onboarding friction, recommendation transparency, and AR depth. The pattern was consistent: tools that asked a lot upfront lost users before they saw a single result. Tools that asked nothing felt generic and unexplained. Nobody had built something that gathered preference through conversation and showed its reasoning. The top-right quadrant, low friction with high transparency, was empty. That was the gap Muse was designed to fill.

What research confirmed

01

Trust requires explanation

Users didn't distrust AI recommendations because they were inaccurate. They ignored them because nothing explained the reasoning. Visible context was non-negotiable.

02

Style and body data are separate

Shoppers treat style preferences and body data as completely separate decisions. Asking about both in the same flow felt intrusive and caused drop-off before the experience had started. The two needed to be separate entry points.

03

Conversation over questionnaire

Open prompts felt low-effort and personal. Structured onboarding forms felt like admin. The format of the input shapes how willing users are to engage.

04

AR try-on converts at the decision point

Seeing an item on a body like yours was the most effective way to move from interest to purchase. The try-on needed to be one tap from the recommendation, not a separate flow.

Design

Designing the conversation, the curation, and the fitting room

Six weeks of design across sketches, wireframes, and high-fidelity screens. The concept held from the very first session: a conversational interface that returns complete outfit looks grounded in real inventory, with visible reasoning throughout, connecting directly into a personal virtual fitting room.

Crazy 8s and sketching

Eight directions explored in one session. From search-first to conversation-first, model selectors to product grids. The chat prompt and outfit reveal in the bottom right won out.

Wireframes and flow validation

The full journey mapped and validated with the PM and engineering lead before moving to high fidelity.

A direction we tried and ruled out

The earlier direction surfaced individual items by category, with a separate fitting room and a choice of generic models. No photo upload. Building an outfit meant users picking and pairing pieces manually. We scrapped it because the hard work was still on the shopper, the fitting room was a separate step, and generic models weren't personalisation.

The final design

Built within Snap's ARES Shopping Suite design system, the final concept runs across three connected surfaces: a conversational discovery interface, a virtual fitting room, and a Fit Finder integration for sizing.

Each surface was designed separately for desktop and mobile. Not as a responsive adaptation, but as two distinct layouts built around how people actually shop on each. On desktop, outfit cards sit side by side with a persistent chat sidebar. On mobile, cards stack full-width with reasoning tucked behind a tap. The fitting room goes from a three-panel layout on desktop to an 80% viewport preview on mobile with a bottom sheet for actions.

Desktop: persistent sidebar with chat history, outfit cards side-by-side, reasoning note visible beneath each look name.

Stacked mobile outfit cards with transparent AI insights. The immersive mobile fitting room keeps the focus on the look, using a functional bottom sheet for a seamless mobile shopping journey.

Interaction

How the conversation becomes a purchase

The three surfaces only make sense as a connected sequence. Here is how a user moves through them, and the decisions that shaped each moment.

1. The prompt: visible thinking, not instant results

The user types a natural-language prompt or selects a chip. On send, the AI enters a visible streaming state: a typing indicator appears and the response builds progressively. An instant result feels like a filter. A streaming response feels like someone thinking. The distinction is small technically and significant experientially.

2. One clarifying question, more precise output

Before returning outfit looks, Muse may ask a single follow-up question to narrow the brief, such as "Is this for daytime or an evening occasion?" The answer refines the recommendations and adds a preference signal to the style model. This is also Muse's cold start solution: a first-time user with no history can be guided toward something specific through one question. The conversation is the onboarding.

3. Complete looks, not a product grid

Two or three outfit cards appear, each with a look name, assembled pieces, total price, and a "Shop this look" CTA. A brief reasoning note sits beneath each name explaining why this combination fits the prompt. Every recommendation shows its work. This is the most important design decision in the entire flow and the clearest point of difference from every existing recommendation engine.

4. Out-of-stock handling: transparent, not hidden

When an item in a recommended look is out of stock or unavailable in the user's size, Muse surfaces the look anyway. The item is visually flagged with a muted state and a stock label. Hiding availability problems reduces recommendation quality without the user understanding why. Transparency at every tier builds more trust than a clean but artificially constrained catalogue.

5. Photo upload and the seven-second wait

The fitting room leads with "Upload your photo" as the primary action. On upload, a skeleton pulse appears on the model frame with a live timer badge. At approximately seven seconds the rendered image fades in. Making the generation time visible was a deliberate trust decision: it signals that something computationally real just happened, rather than a static image swap.

6. One tap to add the complete look

A single "Add all to bag" CTA replaces five separate add-to-cart interactions. For returning users with a size profile, their recommended size is pre-selected. For first-time users, tapping "Add all to bag" triggers the Fit Finder prompt inline. The user completes the short sizing flow and each item is added in the right size without leaving the page.
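As a rough sketch, the branch between returning and first-time users could look like the function below. The field names (`sku`, `category`), the shape of the size profile, and the `open_fit_finder` action label are assumptions for illustration, not the ARES implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BagLine:
    sku: str
    size: str


def add_all_to_bag(look_items: list, size_profile: Optional[dict]):
    """Return (action, bag_lines) for a single tap on "Add all to bag".

    size_profile is a hypothetical mapping of garment category to the
    user's recommended size, e.g. {"tops": "M"}.
    """
    # First-time user, or a category Fit Finder has not covered yet:
    # open the sizing flow inline instead of guessing a size.
    if not size_profile or any(
        item["category"] not in size_profile for item in look_items
    ):
        return "open_fit_finder", []

    # Returning user: every piece is added with its pre-selected size.
    lines = [
        BagLine(item["sku"], size_profile[item["category"]])
        for item in look_items
    ]
    return "added", lines
```

Once the Fit Finder flow completes, the same function can be called again with the new profile, so the add happens in one place.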

Availability and sizing decision logic

Situation | Response
All items in stock in the user's size | Show look normally with "Shop this look" CTA
One item unavailable in the user's size, but available in an adjacent size | Flag the item with a "Limited sizing" label and show the closest available size. The shopper decides whether to proceed.
One item low stock or fully out of stock | Show look with that item flagged. "Notify me" replaces its CTA. Look total reflects available items only.
Two or more items unavailable, or the hero garment is out of stock in the user's size | Show look with a clear warning and offer the closest in-stock alternative as a secondary card. The original look remains visible to save or share.

On a first visit, before Fit Finder has collected sizing data, the system cannot filter by size. In this state, size availability is flagged at the look level with a prompt to complete Fit Finder. The filtering activates after the first sizing interaction.
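The tiers above can be read as one ordered decision, sketched below under stated assumptions. The size ladder, the `Item` fields, and the state labels are hypothetical. The precedence (warning tier checked first) is also an assumption, as is the handling of an item whose only stocked sizes are not adjacent, which the tiers do not cover.

```python
from dataclasses import dataclass
from typing import Optional

SIZE_ORDER = ["XS", "S", "M", "L", "XL"]  # hypothetical size ladder


@dataclass(frozen=True)
class Item:
    name: str
    sizes_in_stock: frozenset
    is_hero: bool = False
    low_stock: bool = False


def adjacent_sizes(size: str) -> set:
    """Sizes one step above or below the given size on the ladder."""
    i = SIZE_ORDER.index(size)
    return {SIZE_ORDER[j] for j in (i - 1, i + 1) if 0 <= j < len(SIZE_ORDER)}


def look_state(items: list, user_size: Optional[str]) -> str:
    # First visit: no sizing data yet, so flag at the look level.
    if user_size is None:
        return "prompt_fit_finder"

    unavailable = [it for it in items if user_size not in it.sizes_in_stock]

    # Two or more items unavailable, or the hero garment is missing.
    if len(unavailable) >= 2 or any(it.is_hero for it in unavailable):
        return "warn_and_offer_alternative"

    if len(unavailable) == 1:
        # Available one size up or down: let the shopper decide.
        if unavailable[0].sizes_in_stock & adjacent_sizes(user_size):
            return "limited_sizing"
        # Assumption: anything else is treated like out of stock.
        return "flag_item_notify_me"

    if any(it.low_stock for it in items):
        return "flag_item_notify_me"

    return "shop_this_look"
```

Every branch returns a state that still shows the look, which matches the decision to flag availability rather than hide it.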

Personalisation

How Muse learns without asking

Muse doesn't rely on upfront profiling or purchase history. It builds a style model from the signals a user generates naturally as they interact. The clarifying exchange is the most efficient collection point: the user is actively contributing rather than being passively observed.

Signal | Interpretation
Clarifying question answered | Highest-quality signal; user directly states a preference
Outfit card viewed for 3+ seconds | Signals interest in the look's colour palette, silhouette, and occasion type
"Shop this look" tapped | Strong positive signal for all items and their style attributes
Item added to bag or wishlisted | Strongest passive signal; weighted 3x above dwell time
Outfit card scrolled past in under 1 second | Mild negative signal for that look's dominant attribute
Item swapped out in fitting room | Negative signal for that garment's specific style attributes
Prompt explicitly rejects a style ("not too formal", "nothing floral") | Hard filter applied to current and future sessions

Session signals refine recommendations within the current conversation only. Persistent signals (add to bag, purchase, explicit feedback) update the long-term profile used across all future sessions. A user browsing for a costume party shouldn't permanently skew their profile towards sequins. After approximately three completed sessions the system becomes meaningfully personalised. On a first visit, the conversational prompt and clarifying question stand in for history entirely.
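A minimal sketch of the session versus persistent split, assuming a simple additive score per style attribute. Only the 3x ratio between add-to-bag and dwell time comes from the signal list above. Every other weight, the signal names, and the `StyleModel` class are hypothetical placeholders.

```python
from collections import defaultdict

DWELL = 1.0  # hypothetical base weight for a 3+ second view

WEIGHTS = {
    "clarifying_answer": 4.0,  # hypothetical: highest-quality signal
    "dwell_3s": DWELL,
    "shop_this_look": 2.0,     # hypothetical
    "add_to_bag": 3 * DWELL,   # stated: weighted 3x above dwell time
    "purchase": 4.0,           # hypothetical
    "scrolled_past": -0.5,     # hypothetical mild negative
    "swapped_out": -1.0,       # hypothetical negative
}

# Signals that update the long-term profile across sessions.
PERSISTENT = {"add_to_bag", "purchase"}


class StyleModel:
    def __init__(self):
        self.session = defaultdict(float)  # cleared when the session ends
        self.profile = defaultdict(float)  # survives across sessions
        self.hard_filters = set()          # explicit rejections

    def record(self, signal: str, attributes: list) -> None:
        target = self.profile if signal in PERSISTENT else self.session
        for attr in attributes:
            target[attr] += WEIGHTS[signal]

    def reject(self, attribute: str) -> None:
        """Explicit rejection in a prompt, e.g. "nothing floral"."""
        self.hard_filters.add(attribute)

    def score(self, attribute: str) -> float:
        if attribute in self.hard_filters:
            return float("-inf")
        return self.profile[attribute] + self.session[attribute]

    def end_session(self) -> None:
        self.session.clear()
```

In the costume party example, dwell on sequins lands in `session` and disappears at `end_session()`, while an item added to the bag still counts next time.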

Privacy and consent

Consent by region

Muse handles two categories of sensitive data: behavioural data used for personalisation, and biometric-adjacent data from uploaded photos for virtual try-on rendering. Consent requirements differ significantly between the EU and North America. The design accommodates both without creating a friction-heavy experience in either region.

European Union (GDPR)
Explicit opt-in before data collection: No signals collected before the user actively accepts. Pre-ticked boxes are prohibited. Declining must be as easy as accepting.
Photo data, explicit biometric consent: Uploaded photos are processed under Article 9 as biometric data in several EU member states. A separate consent is required before photo upload, not bundled with general data consent.
Right to erasure: Users can delete their style profile, chat history, and uploaded photos from within the product via a "Reset my style profile" action in settings.
Data retention labels: Each data category shown with its retention period. Photos deleted after 30 days or immediately after rendering, whichever is sooner.

North America (CCPA / PIPEDA / BIPA)
Opt-out model (CCPA, California): Data collection permitted by default. A clear "Do Not Sell or Share My Personal Information" option is surfaced in the footer and stylist settings, not only in the privacy policy.
Photo data, BIPA exposure (Illinois): Illinois's Biometric Information Privacy Act requires explicit informed consent before collecting biometric identifiers. A BIPA-compliant flow is required for Illinois users, distinct from CCPA.
PIPEDA (Canada): Photo data for rendering falls under sensitive personal information and requires express consent, closer to GDPR in practice than California CCPA.
Data portability (emerging US states): Virginia, Colorado, and Connecticut have passed comprehensive privacy laws. A data export feature is recommended proactively ahead of potential federal harmonisation.

Legend: Required, Recommended, Forward-looking

The EU and North American flows are designed separately, not as a single global compromise. The photo consent is surfaced as a distinct, labelled step immediately before the upload UI, with a plain-language explanation that the image is rendered server-side and not stored beyond the session unless the user explicitly saves the output.
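One way to picture the two regional flows is as a pair of gate checks, sketched below. The region codes and field names are hypothetical. Treating every non-EU region as opt-out for behavioural signals is an assumption, since the requirements above state the opt-out model only for California. The actual rules were defined by Legal and Privacy.

```python
from dataclasses import dataclass

# From the requirements above: GDPR needs explicit opt-in before collection.
OPT_IN_REGIONS = {"EU"}


@dataclass
class ConsentState:
    region: str                      # hypothetical codes: "EU", "US-CA", "US-IL", "CA"
    tracking_accepted: bool = False  # user actively accepted data collection
    opted_out: bool = False          # user chose "Do Not Sell or Share"
    photo_consent: bool = False      # separate step shown before the upload UI


def may_collect_signals(consent: ConsentState) -> bool:
    """Behavioural signals: opt-in in the EU, opt-out elsewhere (assumption)."""
    if consent.region in OPT_IN_REGIONS:
        return consent.tracking_accepted
    return not consent.opted_out


def may_upload_photo(consent: ConsentState) -> bool:
    """Photo upload always waits for its own consent step, never bundled."""
    return consent.photo_consent
```

Keeping photo consent as its own flag mirrors the design decision that it is a distinct step and not part of general data consent.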

Usability testing

The reasoning note landed exactly as intended

Moderated usability testing with 12 participants, recruited to reflect the three archetypes from research. Each was given the same scenario: find a complete outfit for a specific occasion using whichever tools the prototype offered. Sessions were conducted remotely using a think-aloud protocol, with a second team member noting where participants hesitated or expressed uncertainty.

The most revealing moments weren't in the metrics. Several participants paused after seeing the outfit card reasoning note. Lines like 'Perfect for a sunset boat trip or cliffside dinner' landed exactly as intended. The reaction was consistent: 'oh, so it actually understood what I meant.' That confirmed the core research finding: unexplained recommendations are the trust problem, not inaccurate ones.

Metric | Muse | Proposal B | Proposal C
Task success rate | 88% | 75% | 70%
Error rate | 3% | 7% | 9%
Feature engagement | 65% | 50% | 45%

The feature engagement gap was the most meaningful number. Users who tested Muse were noticeably more likely to try both the fitting room and Fit Finder, not because they were prompted to, but because the outfit-first flow made those surfaces feel like natural next steps.

It's worth being honest about the limits here. Participants responded to static outfit cards with pre-written reasoning notes, not live AI-generated recommendations. What the test validated was the interaction model: that outfit-level curation, visible reasoning, and a connected fitting room created a more confident and engaged shopper. Whether the algorithm produces recommendations good enough to sustain that trust at scale is a question only a live A/B test can answer, which is exactly what the next phase was designed to address.

Next steps

Prototype to production

Muse was selected from three competing proposals following usability testing. The next phase was an MVP build scoped to the core conversational flow and outfit reveal, with the fitting room and Fit Finder integration to follow once the recommendation layer had been validated in production. With the algorithm and try-on pipeline already in place as backend infrastructure, an MVP build was estimated at 10 to 14 weeks for a small engineering team.

The A/B test would run within the ARES Shopping Suite across two or three retail clients, testing Muse's conversational entry point against the existing browse-and-filter experience as the control. Primary metrics: task completion rate, session duration, and add-to-bag rate. The test would run for a minimum of four weeks to account for the novelty effect and collect enough sessions for statistical significance.
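To give a feel for what "enough sessions" means, here is a sketch of the standard sample-size estimate for comparing two rates, such as add-to-bag rate in control versus Muse. The function name is hypothetical, and the rates in the usage note are illustrative inputs, not project data. The 5% significance level and 80% power are conventional defaults, not values from the test plan.

```python
from math import ceil
from statistics import NormalDist


def sessions_per_arm(p_control: float, p_variant: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sessions needed in each arm of a two-sided test
    comparing two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_control - p_variant
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

For example, `sessions_per_arm(0.10, 0.12)` estimates the sessions per arm needed to detect a lift from a 10% to a 12% rate. The smaller the expected lift, the more sessions the four-week window has to deliver.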

Metric | Target
Active users (month 1) | 500
Task completion rate | 85%
Avg session duration | 5 min
User satisfaction | 4.5/5

Reflection

What I'd do differently

The blank prompt

A blank prompt was too much of a design bet for users who hadn't used anything like it before. I'd test prompt chips, suggested starting points, and a browse fallback before committing to the open-ended entry as the only path.

Clarifying questions

The clarifying exchange was added as a concept but not usability-tested in enough depth. I'd want to know how many users skip it, whether the question timing feels natural, and whether the visible effect on the outfit output is enough to make the exchange feel worthwhile.

Out-of-stock handling

The decision to surface out-of-stock items transparently was made through team discussion, not user testing. Whether to show or substitute is a trust question that deserves data, not just a product team judgment call.

Photo upload privacy

Shifting to personal photo upload was the right direction. But the privacy and comfort implications, particularly GDPR biometric provisions in the EU and BIPA in Illinois, needed earlier legal alignment and more usability testing than the timeline allowed.

Making the learning visible

Signal collection happens silently. In retrospect I'd surface a lightweight "Here's what I've learned about your style" card after a few sessions. Personalisation that's visible gives users a sense of control over it, which closes the same trust gap identified in research.

Role and collaborators

What I owned and who I worked with

I was one of three designers who developed a competing proposal during the sprint. My responsibility covered the full end-to-end experience: research participation, ideation, wireframing, high-fidelity design across desktop and mobile, and prototype preparation for usability testing. The conversational discovery flow, the fitting room reframe from "View on Model" to personal photo upload, and the personalisation signal model were all decisions I owned and proposed. The privacy and consent architecture was developed collaboratively with Legal and Privacy in week eight.

Head of Product

Oversaw product strategy and ensured alignment with business goals across all three proposals.

Head of Design

Guided the overall design vision and facilitated bi-weekly critique sessions throughout the sprint.

User Researcher

Ran the interview programme, facilitated empathy mapping, and moderated the usability test sessions.

Data Scientists

Advised on signal weighting and provided the card sorting similarity matrix.

Engineers

Consulted throughout on feasibility, and scoped the MVP build estimate for the A/B test phase.

Legal and Privacy

Defined EU and North American consent requirements and reviewed the consent architecture.

Say Hello

Always looking to connect and build impactful products.
