Tamr Unify
NOTE: All visual design was performed by a teammate.
Background
Tamr is a next-generation Master Data Management (MDM) tool. For decades, organizations like IBM and Informatica have led the enterprise in helping to unify disparate, siloed data. This was traditionally done by employing a number of data curators who write “rules” defining whether, for example, two customer records refer to the same real-world entity:
Customers same if FNAME is 90% similar && SocSec is 100% similar && …
Informatica MDM Hub Console
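To make the rule-based approach concrete, here is a minimal Python sketch of the single rule quoted above. It is only an illustration, not how Informatica or Tamr implement matching: the record layout, the `similarity` and `same_customer` helpers, and the use of `difflib` as the similarity measure are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_customer(r1: dict, r2: dict) -> bool:
    """Hypothetical rule: FNAME at least 90% similar AND SocSec identical."""
    return (
        similarity(r1["FNAME"], r2["FNAME"]) >= 0.9
        and r1["SocSec"] == r2["SocSec"]
    )


# Made-up records, for illustration only.
a = {"FNAME": "Christopher", "SocSec": "123-45-6789"}
b = {"FNAME": "Christophor", "SocSec": "123-45-6789"}  # one-letter typo
c = {"FNAME": "Chris", "SocSec": "123-45-6789"}        # nickname
d = {"FNAME": "Christopher", "SocSec": "987-65-4321"}  # different person

print(same_customer(a, b))  # typo still clears the 90% threshold
print(same_customer(a, c))  # nickname falls below the threshold
print(same_customer(a, d))  # identifier mismatch
```

Even this one rule needs a threshold and a choice of similarity measure, and the nickname case already calls for a second rule. Multiply that by a few hundred rules and the maintenance burden described here becomes easy to picture.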
These rules become very complex, and after a few hundred they exceed cognitive capacity even with decision-support tooling. MDM projects have been quite expensive and, in a sense, not terribly successful. Tamr was born of Michael Stonebraker’s work at MIT on a project called Data Tamer. Stonebraker won the Turing Award shortly after co-founding Tamr.
Tamr’s genesis is based on the observation that the big data mastering problem is ripe for machine learning. Tamr employs a patented inference engine that automatically clusters columns and records; when it gets stuck, it enlists the help of Subject Matter Experts (SMEs) via “expert-sourcing”.
Need
In an effort to prescribe an intuitive UX, Tamr offers users “projects” with fixed workflows for mastering data.
Tamr’s project list
A project dashboard
However, these projects have failed to capture the complexity of real-world work, which requires stringing together many of Tamr’s projects, often via home-grown Python. This is an example of an emergent use case, similar to what I’ve blogged about before.
Directed Acyclic Graph (DAG) representing many Tamr “projects” in a single data-processing pipeline. Blurred for confidentiality.
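As a rough idea of what such home-grown glue code has to do, the sketch below models a pipeline of projects as a DAG and derives a valid run order with Python’s standard-library `graphlib` (Python 3.9+). The project names and the `run_order` helper are hypothetical; they are not taken from any customer pipeline or from Tamr’s API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a project, each value is the set of
# projects whose output it consumes.
pipeline = {
    "schema_mapping_crm": set(),
    "schema_mapping_erp": set(),
    "mastering_customers": {"schema_mapping_crm", "schema_mapping_erp"},
    "golden_records": {"mastering_customers"},
}


def run_order(dag: dict) -> list:
    """Return the projects in an order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
for project in order:
    print("run", project)  # a real script would trigger the project here
```

The two schema-mapping projects have no dependency on each other, so either may run first; only the relative order of dependent projects is guaranteed.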
Tamr wanted a way to give users flexibility in defining complex workflows without succumbing to the “lowest common denominator” of a full-blown, ETL-style, “boxes and arrows” tool. Such tools are generally complex, or capable of becoming complex, which is the very thing we strive to transform about the MDM space.
Process
I explored a number of existing tools and systems for authoring and orchestrating such workflows, such as StreamSets, KNIME, and many more. I flew to visit Airflow’s headquarters with our technical architect and head of product as part of our research and partnership development.
My research included building histograms to analyze our customers’ DAGs for patterns (do certain steps always follow others?), facilitating affinity diagramming workshops, running Google Ventures-style sprints with whiteboard storyboarding, participatory design, interviews with CDOs and data curators, surveys, and more. I eventually came to a design that bifurcated the experience based on personas, or rather, jobs to be done.
Solution
For the data engineer, often a single person, who orchestrates and troubleshoots the master workflow, a DAG view is provided. However, the vast majority of data curators only need to get in and out of the steps relevant to them (e.g., schema mapping, clustering, or collapsing clusters into golden records), so for them a list view inspired by TurboTax was employed.
Underneath TurboTax’s hood is complex branching, but users needn’t see it. A simpler list view is presented wherein dependencies are resolved as needed in a user’s journey.
This solution has shifted our roadmap and is currently being implemented by our engineering team.
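One way to picture “dependencies resolved as needed” is the sketch below: given the step a curator wants to reach and the steps already completed, it returns only the unfinished prerequisites, in order, as a flat list. This is an illustrative assumption about how a list view could be derived from a DAG, not the shipped design; the step names, the `STEPS` graph, and the `steps_to_show` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each step maps to the steps it depends on.
STEPS = {
    "schema_mapping": set(),
    "clustering": {"schema_mapping"},
    "golden_records": {"clustering"},
}


def steps_to_show(target: str, completed: set, deps: dict = STEPS) -> list:
    """Return the unfinished steps needed to reach `target`, ending with it."""
    needed = set()
    stack = [target]
    while stack:
        step = stack.pop()
        if step in needed or step in completed:
            continue  # already collected, or nothing left to do for it
        needed.add(step)
        stack.extend(deps[step])
    # Order only the needed steps; the rest of the DAG stays hidden.
    sub = {step: deps[step] & needed for step in needed}
    return list(TopologicalSorter(sub).static_order())


print(steps_to_show("golden_records", completed={"schema_mapping"}))
print(steps_to_show("clustering", completed=set()))
```

The curator sees a short, linear to-do list, while the branching structure remains available to the data engineer in the DAG view.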