Case study, Muse, Snap Inc, 2023
Muse: Outfit curation, reimagined with AI
An eight-week design sprint inside Snap's AR Shopping Suite. The goal was to turn a passive browse experience into a conversation that assembles complete, shoppable looks around what a user actually needs.
Task success rate
Error rate
User satisfaction
My role
Product Designer
Timeline
Eight-week sprint, 2023
Team
PM, User Researcher, Data Scientists, Engineers, 3 Designers
Platform
Web and Mobile via ARES Shopping Suite
Context
A sprint with something real at stake
By 2023, Snap's ARES Shopping Suite had two strong technical foundations in place: an outfit recommendation algorithm that could assemble looks from live catalogue inventory, and a virtual try-on pipeline that rendered garments onto a user's body with accurate fabric and lighting. What it lacked was a front-end experience that connected them around a shopper's actual intent. Discovery was still a browse-and-hope exercise with no personalisation and no guidance.
Three designers each developed a competing proposal. The strongest concept would be selected for a real A/B test with retail clients. My hypothesis going in was straightforward: the problem wasn't the technology. Shoppers still had to do the hardest part themselves, spending 20 to 40 minutes jumping between tabs and mentally assembling a look from individual product pages, with no guarantee the pieces would actually work together.
Muse's premise was simple: a shopper who types "I need something for a coastal holiday in July" has given the system more useful signal in one sentence than a recommendation engine gets from months of browse history. The conversation replaces the browse.
| Weeks | Phase | With |
|---|---|---|
| 1–2 | Research: interviews, empathy mapping, card sorting, competitor analysis | User Researcher, Data Scientists |
| 3–4 | Ideation: crazy 8s, wireframes, flow validation | Head of Design, PM |
| 5–6 | High-fidelity design and prototyping | Head of Design, Engineers |
| 7 | Usability testing: 12 participants, moderated | User Researcher |
| 8 | Iteration, consent architecture, developer handoff | Legal and Privacy, Engineers |
Interactive prototype
Prototype built in Framer. Covers conversational discovery, outfit reveal, and virtual fitting room on desktop and mobile.
Problem
Finding pieces is easy. Building a look isn't.
Shoppers in the ARES suite could already see how a garment fit their body. What they couldn't do was get help choosing what to try on. The discovery phase was entirely unassisted. Every major retailer already used some form of AI to suggest products, but those suggestions were generic, backward-looking, and unexplained. Shoppers had learned to ignore them. The issue wasn't personalisation. It was that nothing ever showed its reasoning.
Core problem statement
How do we reduce the cognitive work of building a complete outfit from hours to minutes, while earning enough trust that shoppers actually act on the recommendations?
"I can find individual pieces I like but I never know if they're going to work together until I'm standing in front of a mirror. By then I've already paid for everything."
Usability test participant, 2023
Research & Analysis
What two weeks of research taught us
We ran interviews, empathy mapping, card sorting, and competitor analysis alongside the User Researcher and Data Scientists. The goal wasn't to document methods for the sake of it. It was to build enough evidence to make decisions we could actually defend.
Three shopper archetypes
We started by defining three archetypes from interview synthesis, then used them throughout the project to pressure-test design decisions.

The Style Seeker
Enjoys discovery, wants curation
Elif loves browsing but gets overwhelmed. She wants a curated starting point, not a blank search bar.

The Trend-Led Shopper
Social proof matters
Ji-woo tracks occasions and seasons. She appreciates recommendations that feel intentional and current.

The Efficient Buyer
Goal-oriented, low patience
Marcus shops with a specific need. He will abandon if the flow doesn't move fast enough.
The trust gap wasn't accuracy. It was silence
Across 12 interviews, empathy mapping surfaced one finding that reframed the entire brief. Shoppers weren't sceptical of AI recommendations because they thought the technology was wrong. They ignored them because nothing explained the thinking behind them. A recommendation without context felt arbitrary. And arbitrary doesn't convert.
Empathy mapping across 12 participants. The trust gap wasn't accuracy. It was silence.
Competitor analysis across Stitch Fix, Zalando, Farfetch, and Amazon. The low-friction, high-transparency quadrant was empty.
Everyone had recommendations. Nobody showed their work
We mapped Stitch Fix, Zalando, Farfetch, and Amazon across onboarding friction, recommendation transparency, and AR depth. The pattern was consistent: tools that asked a lot upfront lost users before they saw a single result. Tools that asked nothing felt generic and unexplained. Nobody had built something that gathered preference through conversation and showed its reasoning. The top-right quadrant was empty. That was the gap Muse was designed to fill.
What research confirmed
01
Trust requires explanation
Users didn't distrust AI recommendations because they were inaccurate. They ignored them because nothing explained the reasoning. Visible context was non-negotiable.
02
Style and body data are separate
Shoppers treat style preferences and body data as completely separate decisions. Asking about both in the same flow felt intrusive and caused drop-off before the experience had started. The two needed to be separate entry points.
03
Conversation over questionnaire
Open prompts felt low-effort and personal. Structured onboarding forms felt like admin. The format of the input shapes how willing users are to engage.
04
AR try-on converts at the decision point
Seeing an item on a body like yours was the most effective way to move from interest to purchase. The try-on needed to be one tap from the recommendation, not a separate flow.
Design
Designing the conversation, the curation, and the fitting room
Six weeks of design across sketches, wireframes, and high-fidelity screens. The concept held from the very first session: a conversational interface that returns complete outfit looks grounded in real inventory, with visible reasoning throughout, connecting directly into a personal virtual fitting room.
Crazy 8s and sketching

Eight directions explored in one session. From search-first to conversation-first, model selectors to product grids. The chat prompt and outfit reveal in the bottom right won out.
Wireframes and flow validation

The full journey mapped and validated with the PM and engineering lead before moving to high fidelity.
A direction we tried and ruled out




The earlier direction surfaced individual items by category, with a separate fitting room and a choice of generic models. No photo upload. Building an outfit meant users had to pick and pair pieces manually. We scrapped it because the hard work was still on the shopper, the fitting room was a separate step, and generic models weren't personalisation.
The final design
Built within Snap's ARES Shopping Suite design system, the final concept runs across three connected surfaces: a conversational discovery interface, a virtual fitting room, and a Fit Finder integration for sizing.
Each surface was designed separately for desktop and mobile. Not as a responsive adaptation, but as two distinct layouts built around how people actually shop on each. On desktop, outfit cards sit side by side with a persistent chat sidebar. On mobile, cards stack full-width with reasoning tucked behind a tap. The fitting room goes from a three-panel layout on desktop to an 80% viewport preview on mobile with a bottom sheet for actions.

Desktop: persistent sidebar with chat history, outfit cards side-by-side, reasoning note visible beneath each look name.



Stacked mobile outfit cards with transparent AI insights. The immersive mobile fitting room keeps the focus on the look, using a functional bottom sheet for a seamless mobile shopping journey.
Interaction
How the conversation becomes a purchase
The three surfaces only make sense as a connected sequence. Here is how a user moves through them, and the decisions that shaped each moment.
1. The prompt: visible thinking, not instant results
The user types a natural-language prompt or selects a chip. On send, the AI enters a visible streaming state: a typing indicator appears and the response builds progressively. An instant result feels like a filter. A streaming response feels like someone thinking. The distinction is small technically and significant experientially.

Before returning outfit looks, Muse may ask a single follow-up question to narrow the brief, such as "Is this for daytime or an evening occasion?" The answer refines the recommendations and adds a preference signal to the style model. This is also Muse's cold start solution: a first-time user with no history can be guided toward something specific through one question. The conversation is the onboarding.

Two or three outfit cards appear, each with a look name, assembled pieces, total price, and a "Shop this look" CTA. A brief reasoning note sits beneath each name explaining why this combination fits the prompt. Every recommendation shows its work. This is the most important design decision in the entire flow and the clearest point of difference from every existing recommendation engine.

When an item in a recommended look is out of stock or unavailable in the user's size, Muse surfaces the look anyway. The item is visually flagged with a muted state and a stock label. Hiding availability problems reduces recommendation quality without the user understanding why. Transparency at every tier builds more trust than a clean but artificially constrained catalogue.

The fitting room leads with "Upload your photo" as the primary action. On upload, a skeleton pulse appears on the model frame with a live timer badge. At approximately seven seconds the rendered image fades in. Making the generation time visible was a deliberate trust decision: it signals that something computationally real just happened, rather than a static image swap.

A single "Add all to bag" CTA replaces five separate add-to-cart interactions. For returning users with a size profile, their recommended size is pre-selected. For first-time users, tapping "Add all to bag" triggers the Fit Finder prompt inline. The user completes the short sizing flow and the items are added in the right size without leaving the page.

Availability and sizing decision logic
On a first visit, before Fit Finder has collected sizing data, the system cannot filter by size. In this state, size availability is flagged at the look level with a prompt to complete Fit Finder. The filtering activates after the first sizing interaction.
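The availability and sizing rules described above can be sketched as a small flagging function. This is a minimal illustration, not Muse's implementation: the `Item` type, the `flag_look` function, and the status strings are hypothetical names, and it assumes each item exposes the set of sizes currently in stock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    name: str
    sizes_in_stock: frozenset  # hypothetical field; an empty set means sold out


def flag_look(items, user_size: Optional[str]):
    """Return per-item display flags and whether to prompt for Fit Finder.

    Nothing is hidden: unavailable items stay in the look and are flagged.
    """
    item_flags = {}
    for item in items:
        if not item.sizes_in_stock:
            item_flags[item.name] = "out_of_stock"
        elif user_size is not None and user_size not in item.sizes_in_stock:
            item_flags[item.name] = "unavailable_in_size"
        else:
            item_flags[item.name] = "ok"
    # Without a size profile the system cannot check sizes per item,
    # so the look as a whole carries a prompt to complete Fit Finder.
    prompt_fit_finder = user_size is None
    return item_flags, prompt_fit_finder


look = [
    Item("linen shirt", frozenset({"S", "M"})),
    Item("sandals", frozenset()),
]
returning_flags, returning_prompt = flag_look(look, "L")
first_visit_flags, first_visit_prompt = flag_look(look, None)
```

The point of the design is visible in the return value: the function never filters the list, it only annotates it, which mirrors the decision to show flagged items rather than silently shrink the catalogue.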
Personalisation
How Muse learns without asking
Muse doesn't rely on upfront profiling or purchase history. It builds a style model from the signals a user generates naturally as they interact. The clarifying exchange is the most efficient collection point: the user is actively contributing rather than being passively observed.
Session signals refine recommendations within the current conversation only. Persistent signals (add to bag, purchase, explicit feedback) update the long-term profile used across all future sessions. A user browsing for a costume party shouldn't permanently skew their profile towards sequins. After approximately three completed sessions the system becomes meaningfully personalised. On a first visit, the conversational prompt and clarifying question stand in for history entirely.
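One way to picture the split between session and persistent signals is two counters with different lifetimes. The sketch below is an illustration under stated assumptions: the `StyleModel` class, the `view` event, the style tags, and the equal weighting of every signal are hypothetical. Only the three persistent event types come from the description above.

```python
from collections import Counter

# Event types that update the long-term profile (taken from the text above).
PERSISTENT_EVENTS = {"add_to_bag", "purchase", "explicit_feedback"}


class StyleModel:
    """Hypothetical two-tier signal store: session scope and long-term scope."""

    def __init__(self):
        self.long_term = Counter()
        self.session = Counter()

    def record(self, event: str, tag: str) -> None:
        # Every interaction counts within the current conversation.
        self.session[tag] += 1
        # Only persistent events carry over to future sessions.
        if event in PERSISTENT_EVENTS:
            self.long_term[tag] += 1

    def end_session(self) -> None:
        # Session signals are discarded when the conversation ends.
        self.session.clear()

    def weights(self) -> Counter:
        # Assumption: both tiers are simply summed with equal weight.
        return self.long_term + self.session


model = StyleModel()
model.record("view", "sequins")    # costume-party browsing, session only
model.record("purchase", "linen")  # persistent signal
during_session = model.weights()
model.end_session()
after_session = model.weights()
```

After `end_session()`, the sequins signal is gone while the linen purchase remains, which is the behaviour the costume-party example calls for.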
Privacy and consent
Consent by region
Muse handles two categories of sensitive data: behavioural data used for personalisation, and biometric-adjacent data from uploaded photos for virtual try-on rendering. Consent requirements differ significantly between the EU and North America. The design accommodates both without creating a friction-heavy experience in either region.
The EU and North American flows are designed separately, not as a single global compromise. The photo consent is surfaced as a distinct, labelled step immediately before the upload UI, with a plain-language explanation that the image is rendered server-side and not stored beyond the session unless the user explicitly saves the output.
Usability testing
The reasoning note landed exactly as intended
Moderated usability testing with 12 participants, recruited to reflect the three archetypes from research. Each was given the same scenario: find a complete outfit for a specific occasion using whichever tools the prototype offered. Sessions were conducted remotely using a think-aloud protocol, with a second team member noting where participants hesitated or expressed uncertainty.
The most revealing moments weren't in the metrics. Several participants paused after seeing the outfit card reasoning note. Lines like 'Perfect for a sunset boat trip or cliffside dinner' landed exactly as intended. The reaction was consistent: 'oh, so it actually understood what I meant.' That confirmed the core research finding: unexplained recommendations are the trust problem, not inaccurate ones.
The feature engagement gap was the most meaningful number. Users who tested Muse were significantly more likely to try both the fitting room and Fit Finder, not because they were prompted to, but because the outfit-first flow made those surfaces feel like natural next steps.
It's worth being honest about the limits here. Participants responded to static outfit cards with pre-written reasoning notes, not live AI-generated recommendations. What the test validated was the interaction model: that outfit-level curation, visible reasoning, and a connected fitting room created a more confident and engaged shopper. Whether the algorithm produces recommendations good enough to sustain that trust at scale is a question only a live A/B test can answer, which is exactly what the next phase was designed to address.
Next steps
Prototype to production
Muse was selected from three competing proposals following usability testing. The next phase was an MVP build scoped to the core conversational flow and outfit reveal, with the fitting room and Fit Finder integration to follow once the recommendation layer had been validated in production. With the algorithm and try-on pipeline already in place as backend infrastructure, an MVP build was estimated at 10 to 14 weeks for a small engineering team.
The A/B test would run within the ARES Shopping Suite across two or three retail clients, testing Muse's conversational entry point against the existing browse-and-filter experience as the control. Primary metrics: task completion rate, session duration, and add-to-bag rate. The test would run for a minimum of four weeks to account for the novelty effect and collect enough sessions for statistical significance.
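For a rate metric such as add-to-bag, one common way to check significance is a two-proportion z-test. The sketch below is illustrative only: the function name and the session counts are made up, and nothing in the plan above says which statistical test the team intended to use.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates.

    conv_* are sessions with an add-to-bag event, n_* are total sessions.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control (browse-and-filter) vs. Muse.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
z_null, p_null = two_proportion_z_test(conv_a=50, n_a=1000, conv_b=50, n_b=1000)
```

A test like this only addresses whether a difference is likely to be real. The four-week minimum handles a separate concern, since an early lift driven by novelty can look significant and still fade.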
Reflection
What I'd do differently
A blank prompt was too much of a design bet for users who hadn't used anything like it before. I'd test prompt chips, suggested starting points, and a browse fallback before committing to the open-ended entry as the only path.
The clarifying exchange was added as a concept but not usability-tested in enough depth. I'd want to know how many users skip it, whether the question timing feels natural, and whether the visible effect on the outfit output is enough to make the exchange feel worthwhile.
The decision to surface out-of-stock items transparently was made through team discussion, not user testing. Whether to show or substitute is a trust question that deserves data, not just a product team judgment call.
Shifting to personal photo upload was the right direction. But the privacy and comfort implications, particularly GDPR biometric provisions in the EU and BIPA in Illinois, needed earlier legal alignment and more usability testing than the timeline allowed.
Signal collection happens silently. In retrospect I'd surface a lightweight "Here's what I've learned about your style" card after a few sessions. Personalisation that's visible gives users a sense of control over it, which closes the same trust gap identified in research.
Role and collaborators
What I owned and who I worked with
I was one of three designers who developed a competing proposal during the sprint. My responsibility covered the full end-to-end experience: research participation, ideation, wireframing, high-fidelity design across desktop and mobile, and prototype preparation for usability testing. The conversational discovery flow, the fitting room reframe from "View on Model" to personal photo upload, and the personalisation signal model were all decisions I owned and proposed. The privacy and consent architecture was developed collaboratively with Legal and Privacy in week eight.
Head of Product
Oversaw product strategy and ensured alignment with business goals across all three proposals.
Head of Design
Guided the overall design vision and facilitated bi-weekly critique sessions throughout the sprint.
User Researcher
Ran the interview programme, facilitated empathy mapping, and moderated the usability test sessions.
Data Scientists
Advised on signal weighting and provided the card sorting similarity matrix.
Engineers
Consulted throughout on feasibility, and scoped the MVP build estimate for the A/B test phase.
Legal and Privacy
Defined EU and North American consent requirements and reviewed the consent architecture.


