Interactive Clustering Visualization

Making a complex clustering algorithm understandable and interactive for users with no data science background

  • #web
  • #data visualization
  • #highcharts-js

Bringing sophisticated data analysis to everyone

Amplitude aims to bring sophisticated data analysis to all members of a product team, regardless of their familiarity with SQL or other data tools. This is an ambitious goal on its own, and it becomes even more challenging for queries that involve complex math that is hard to visualize.

That was the case for the Clustering query (later renamed Personas), which had to be interactive and had to work for people with only a basic knowledge of the concept.

Clustering is the process of grouping similar objects together according to some similarity function. In our case, the algorithm grouped users with similar behavior, based on the top 100 events they performed in a given application. The result would give our analytics users an instant way to discover distinct groups of customers based on what they actually do in the app. As you can imagine, that is incredibly helpful for understanding the different usage patterns that exist.
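The post does not say which clustering algorithm Amplitude used. As a rough illustration of the idea described above, the sketch below assumes k-means with Euclidean distance over per-user event-count vectors built from the most frequent events (up to 100). The function names and the toy event log are hypothetical, and a real system would likely normalize the counts and choose the number of clusters more carefully.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: k-means is an assumed stand-in, not Amplitude's algorithm.


def top_events(user_events, k=100):
    """Return the k most frequent event names across all users."""
    totals = Counter()
    for events in user_events.values():
        totals.update(events)
    return [name for name, _ in totals.most_common(k)]


def to_vectors(user_events, vocab):
    """Turn each user's event log into a count vector over the shared vocab."""
    vectors = {}
    for user, events in user_events.items():
        counts = Counter(events)
        vectors[user] = [float(counts[name]) for name in vocab]
    return vectors


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means with Euclidean distance; returns {user: cluster_index}."""
    rng = random.Random(seed)
    # Seed the centroids with k randomly chosen users.
    centroids = [list(p) for p in rng.sample(list(vectors.values()), k)]
    assignment = {}
    for _ in range(iters):
        # Assignment step: each user joins the nearest centroid.
        assignment = {
            user: min(range(k), key=lambda c: math.dist(vec, centroids[c]))
            for user, vec in vectors.items()
        }
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[u] for u, label in assignment.items() if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy data: two users who mostly search, two who mostly buy.
logs = {
    "u1": ["search", "search", "view_item"],
    "u2": ["search", "search", "search", "view_item"],
    "u3": ["checkout", "checkout", "add_to_cart"],
    "u4": ["checkout", "add_to_cart", "add_to_cart"],
}
vocab = top_events(logs, k=100)
groups = kmeans(to_vectors(logs, vocab), k=2)
print(groups)
```

On this toy log the two "searchers" land in one cluster and the two "buyers" in the other; the numeric cluster labels themselves are arbitrary.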

Research

Existing visualization research

They say ‘don’t reinvent the wheel,’ but what if there just isn’t a wheel to reference? I started by researching existing visualizations for inspiration, hoping that some visual standards already existed.

A few (of many) visualizations I researched around clustering and data grouping

Unfortunately, most of these visualizations encoded only two to four dimensions of information, using color and size in addition to x and y position, and many of them were hard to parse. What's more, our algorithm used 100 different dimensions, so these approaches would not work. Because our approach was so different, I realized I would need to start sketching and prototyping my own solutions.

Target Persona

After a few customer interviews and discussions with our PM team, I focused the design on a single target persona with a bare-minimum level of analytics experience. The visualization would still aim to be friendly to all users, but we needed a defined target level of expertise so we knew where to focus and where to make trade-offs.

Exploring the solution space

Sketching and brainstorming

I started exploring the solution space by sketching simple ways to approach the problem. Rather than looking at the clustering data alone, I thought about ways to compare the resulting clusters with one another.

A small sample of sketches and sticky notes used while brainstorming and exploring ideas

I also talked with my team and with customers about who would use such a feature and when they currently perform similar analyses manually. This helped us identify the main drivers for using Clustering.

Prototyping with Highcharts JS to test ideas against real data

Soon after I started sketching and showing my ideas to others in the office, I realized that I needed to look at the problem with real data. My sketches assumed the data would be predictably distributed, but there was no way of knowing how much noise and other factors would actually affect the size of the clusters.

I reached out to our developers, and they built me a sandbox: an area of our codebase where I could experiment with scrappy code while working with real data coming from our clustering algorithm.

My functioning prototype, built with Highcharts JS and using real data from our clustering algorithm

After experimenting with Highcharts JS and writing some hacky code to tie it all together, I was able to build a functioning prototype that used real data. This was incredibly useful because I could see the distribution of cluster sizes across various real-world data sets, which gave me a much clearer picture of what to expect.
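The post does not show the prototype code. As a hypothetical illustration of the kind of glue code involved, the sketch below turns cluster assignments (a `{user: cluster_id}` mapping, as a clustering step might produce) into a Highcharts options object that shows each cluster's share of users, largest first. The function name and chart choices are assumptions; the option keys used (`chart.type`, `title.text`, `xAxis.categories`, `yAxis`, `series`) are standard Highcharts configuration, and the serialized JSON would be handed to `Highcharts.chart(container, options)` in the browser.

```python
import json
from collections import Counter


def cluster_size_chart_options(assignment, title="Share of users per cluster"):
    """Build a Highcharts bar-chart config from a {user: cluster_id} mapping.

    Hypothetical helper: clusters are sorted largest first so the size
    distribution is easy to read at a glance.
    """
    sizes = Counter(assignment.values())
    total = sum(sizes.values())
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return {
        "chart": {"type": "bar"},
        "title": {"text": title},
        "xAxis": {"categories": [f"Cluster {cid}" for cid, _ in ordered]},
        "yAxis": {"title": {"text": "% of users"}, "max": 100},
        "series": [
            {
                "name": "Share of users",
                # Percent of all users in each cluster, rounded for display.
                "data": [round(100 * n / total, 1) for _, n in ordered],
            }
        ],
    }


options = cluster_size_chart_options({"u1": 0, "u2": 0, "u3": 0, "u4": 1})
# Serialize for the front end, e.g. Highcharts.chart("container", options).
payload = json.dumps(options)
print(payload)
```

Sorting by size matters here because, with real data, cluster sizes are rarely even, and a ranked view makes a lopsided distribution obvious.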

Exploring the solution space by purposefully designing 3 opposing solutions

Once I had built my prototype and seen the results with real data, I realized that the solution space for such a new type of visualization could be large and diverse. Should it focus on ranking clusters by metrics like retention or percent active? Or should the visualization be more about comparing each cluster to other cohorts in the system?

To build conviction on these questions and make better design decisions, I needed to properly explore the solution space by designing mock-ups that were very different from each other. This way I could get feedback on each extreme and better understand which ideas were the most effective and actually useful.

I also realized that because the Clustering query was complex and required a number of steps, it was very different from the other queries in Amplitude, which left me at a crossroads. On one hand, I wanted to use the same interaction design language we had worked so hard to establish, one that is consistent from chart to chart. On the other hand, users had to figure out so many steps on their own to be successful that I wondered whether breaking from the design system would be more helpful.

I therefore set the “design axes” of my solution space to be guidance versus familiarity. I decided to design three solutions: a heavily guided approach that was less familiar relative to our other charts and design system, a highly familiar approach with less guidance, and finally one that tried to meet in the middle.

A single screenshot of each of my 3 mock-up solutions designed to explore the space

My three mock-up solutions proved very helpful for understanding how the final solution should behave. Across the team, the three opposing mock-ups sparked many conversations and encouraged ideas that hadn’t been discussed before. Having design axes to refer to also gave the team a common framework for debating and suggesting the aspects they liked and disliked.

Speaking to users before a single line of code is written

Once my three mock-up solutions were ready, I began showing them to existing customers with varying levels of analytics experience.

I walked them through the designs and collected their raw feedback. I then explained the design axes and the pros and cons of each solution. The conversations were enlightening, and because I had real mock-ups to show, I got very specific feedback. No one liked any single solution entirely; instead, people suggested combinations that brought together the parts that would help them be most effective.

The Solution

Designed with conviction

The final solution would not have been possible without researching existing solutions, building a prototype that helped me better understand the data the algorithm returned, and then designing three opposing solutions that drew diverse and specific feedback from our customers.

Approachable yet powerful

The final solution was approachable, yet it offered sophisticated, powerful controls that let users interact with the query and modify it to fit their needs.

In the end, the final solution combined components from all three mock-up solutions, something that would not have been possible without exploring the solution space. To learn how Amplitude’s Clustering (later renamed Personas) works and to try out the interface yourself, visit Amplitude’s website.