By Nathan Self, Research Associate, Department of Computer Science, and faculty at Discovery Analytics Center, Virginia Tech. Self will be participating on the Opportunities, Challenges and Future Trends in Advanced Analytics panel at the second annual Capital Data Summit on Feb. 28, 2018.
Imagine your biggest spreadsheet, one with too many rows and columns to take in at once. At the Discovery Analytics Center (DAC) at Virginia Tech we are interested in how humans and machines can work together to make sense of all that data. Andromeda is an example of how analysts can combine sophisticated machine learning algorithms with interactive visualization to get insights from their data.
Let’s assume that your enormous spreadsheet has a row for every customer and a column for every statistic you keep about each customer. Andromeda draws a scatterplot in which each point represents a customer. Points that are close to each other represent rows that are similar to each other. Likewise, distant points represent dissimilar rows. To begin with, Andromeda assumes that each column is equally important.
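To make "near is similar" concrete, here is a minimal sketch of a weighted distance between rows. It is an illustration only, not Andromeda's code: the `weighted_distances` helper, the three columns, and the customer values are all made up for the example, and the columns are assumed to be scaled to a common range already.

```python
import numpy as np

def weighted_distances(data, weights):
    """Pairwise weighted Euclidean distances between the rows of `data`."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # diffs[i, j, k] is the difference between rows i and j in column k.
    diffs = data[:, None, :] - data[None, :, :]
    return np.sqrt((weights * diffs ** 2).sum(axis=-1))

# Hypothetical customers; columns are [age, visits per month, average spend],
# assumed to be rescaled to the 0-1 range beforehand.
customers = np.array([
    [0.2, 0.9, 0.8],
    [0.3, 0.8, 0.7],
    [0.9, 0.1, 0.2],
])

# The starting assumption: every column is equally important.
equal_weights = np.full(customers.shape[1], 1.0 / customers.shape[1])
distances = weighted_distances(customers, equal_weights)
```

With equal weights, the first two customers come out much closer to each other than either is to the third, which is what a scatterplot of these rows should show.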
This is where the human aspect comes in. You know what the data actually represents, and you can interact with Andromeda in two ways. (1) You can change the importance of a column, and the points of the scatterplot will regroup to preserve the “near is similar” constraint. (2) You can reach right into the scatterplot and move points closer to or farther from each other, and Andromeda will compute which columns would have to be important for those points to be that similar.
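The first interaction can be sketched in a few lines. The example below swaps in classical (Torgerson) multidimensional scaling applied to weighted distances, which is simpler than whatever optimization Andromeda actually runs. The `weighted_layout` function, the two columns, and the four toy rows are assumptions for illustration.

```python
import numpy as np

def weighted_layout(data, weights, dims=2):
    """Lay out rows in `dims` dimensions using classical MDS on weighted distances."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diffs = data[:, None, :] - data[None, :, :]
    sq_dist = (weights * diffs ** 2).sum(axis=-1)
    n = sq_dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]
    # Clip tiny negative eigenvalues caused by floating-point error.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical rows; columns are [visits, spend], already scaled to 0-1.
rows = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

visits_matter = weighted_layout(rows, [0.9, 0.1])
spend_matters = weighted_layout(rows, [0.1, 0.9])

def gap(layout, i, j):
    """Distance between points i and j in the scatterplot."""
    return float(np.linalg.norm(layout[i] - layout[j]))
```

When visits carry most of the weight, rows 0 and 1 (same visits, different spend) sit close together. When spend carries most of the weight, they move apart and row 0 ends up next to row 2 instead.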
Andromeda, with new algorithms and new paradigms of user-algorithm interaction, shows that complex statistical methods do not require complex user interfaces or expert users. The selling point of Andromeda is that you don’t have to know that it uses an algorithm called weighted multidimensional scaling to lay out the scatterplot. You don’t have to know that data scientists at DAC developed inverse multidimensional scaling to handle interaction with the points. In fact, users have effectively generated insights despite having no experience with these, or similar, algorithms. Their insights are also more complex than the ones they reach with spreadsheets alone.
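For the curious, the idea of working backward from dragged points to column weights can be sketched crudely. This is not DAC's inverse multidimensional scaling algorithm. It is a simplified stand-in that relies on squared weighted distance being linear in the weights, so an ordinary least-squares fit can estimate them. The `infer_weights` function, the toy rows, and the dragged positions are all hypothetical.

```python
import numpy as np

def infer_weights(data, moved, positions):
    """Estimate column weights from the 2D positions of dragged points.

    `moved` lists the row indices the analyst dragged, and `positions`
    holds their new scatterplot coordinates in the same order.
    """
    subset = np.asarray(data, dtype=float)[moved]
    positions = np.asarray(positions, dtype=float)
    i, j = np.triu_indices(len(moved), k=1)
    # One equation per pair: sum_k w_k * (x_ik - x_jk)^2 = squared plot distance.
    col_sq_diffs = (subset[i] - subset[j]) ** 2
    target_sq = ((positions[i] - positions[j]) ** 2).sum(axis=1)
    raw, *_ = np.linalg.lstsq(col_sq_diffs, target_sq, rcond=None)
    raw = np.clip(raw, 0.0, None)  # crude: weights cannot be negative
    total = raw.sum()
    if total == 0.0:
        return np.full(raw.shape, 1.0 / raw.size)  # fall back to equal weights
    return raw / total

# Hypothetical rows; columns are [visits, spend], already scaled to 0-1.
rows = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# The analyst pulls rows 0 and 1 together, rows 2 and 3 together,
# and keeps the two pairs far apart.
dragged = [0, 1, 2, 3]
new_positions = [[0.0, 0.0], [0.0, 0.1], [1.0, 0.0], [1.0, 0.1]]
weights = infer_weights(rows, dragged, new_positions)
```

Here the rows placed together share the same visits value, so the fit puts nearly all of the weight on visits and almost none on spend.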
Many established statistical methods are well suited to current data analysis needs. Any of them would be well served by an intuitive interface for people who are not experts in statistics.
There is an adage that the same statistics can be used to justify either side of an argument. Machine learning has the same malleability. Understanding exactly what a machine learning tool like Andromeda can do for you — as well as its limitations — is important for deciding what to do with its outputs.