Don’t learn lessons on predictive modeling techniques the hard way
By Ed Burns, Executive Editor, TechTarget
Feb 07, 2017
This article first appeared on TechTarget.
The 2016 presidential election ended in stunning fashion, and it wasn’t just because of who won. Indeed, Donald Trump’s upset victory over Hillary Clinton triggered a political earthquake. But another big surprise was seeing a campaign so focused on big data and predictive analytics fall to the candidate driven more by emotion and intuition. And it wasn’t just the Clinton campaign that got caught off guard when voting didn’t follow the path predicted by most analytical models. Virtually all analytics-driven election forecasters projected that Clinton would win, some with a probability as high as 99%. Even Trump’s own data analytics team put his chances of pulling off a victory at only 30% the day before Election Day last November.
It’s often said that organizations should be more data-driven. Businesses that make decisions based on data analytics tend to outperform those that don’t, according to proponents. Unquestionably, cutting-edge enterprises — from Google, Amazon and Facebook to the likes of Uber and Airbnb — are changing their industries partly by leveraging data mining, machine learning and predictive modeling techniques. But that doesn’t mean data-driven analytics projects are immune to errors and problems like the ones that befell Clinton’s campaign. Any predictive analytics initiative can hit similar potholes that send it careening off track. Missteps like using low-quality data, measuring the wrong things and failing to give predictive models a suitable reality check should serve as a reminder to data scientists and other analysts that the analytics process isn’t simply a matter of collecting some data and developing models to run against it.
Predict the right things
In business applications, building and rolling out predictive models doesn’t necessarily give you a better idea of what’s likely to happen in the future.
That was a lesson learned recently by Meridian Energy Ltd., an electricity generator and distributor that operates in New Zealand and Australia. Speaking at the IBM World of Watson 2016 conference in Las Vegas last October, Neil Gregory, the company’s reliability engineering manager, said his team was migrating away from an 8-year-old predictive maintenance system because of shortcomings in the way it made predictions — a flaw that Meridian decided it couldn’t live with anymore.
The software, from a vendor Gregory declined to identify, was intended to predict the maintenance needs of assets like generators, wind turbines, transformers, circuit breakers and industrial batteries — essentially all the large equipment the company owns and operates. “Bad things happen when you don’t understand the condition of your plant,” he said. “That’s what our real drive is to do predictive asset management: to avoid that kind of thing.”
But the outdated predictive modeling techniques supported by the system weren’t actually predicting equipment failures. Instead, the system ran simulations of different scenarios and predicted when assets would fail the simulated tests. That may sound like a small distinction, but a failed test doesn’t necessarily mean a piece of equipment will fail in the real world. The discrepancy limited the degree to which plant maintenance teams could rely on predictive recommendations generated by the software.
To replace the old system, Gregory’s team was implementing IBM’s Predictive Maintenance and Quality software, which was scheduled to go live in January. He said the new application will let his data analysts incorporate more real-time data from equipment to feed their predictive models. That, in turn, should enable them to better predict likely failures before operations are affected at Meridian.
Going forward, Gregory thinks his team will also be able to do more with machine learning to help ensure that the predictive capabilities of its models improve over time. Meridian is using IBM’s SPSS predictive analytics platform to power the machine learning efforts. As part of that, the data analysts should be able to build predictive models in SPSS and “drag and drop” them into the predictive maintenance application, Gregory said. “The ability to ‘learn’ is something we see a lot of value in,” he noted. “There’s some huge potential there for us because we’re a data-rich industry.”
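Meridian’s actual models run inside IBM’s Predictive Maintenance and Quality software and SPSS and have not been published, but the general pattern Gregory describes (train on historical sensor readings labeled with known failures, then score live readings and flag at-risk assets) can be sketched in a few lines of Python. The feature names, file names and risk threshold below are assumptions for illustration only.

# Minimal, hypothetical sketch of a failure-prediction model of the kind
# Gregory describes; feature names, file names and the 0.7 risk threshold
# are illustrative assumptions, not Meridian's actual setup.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Historical sensor readings, one row per asset per day, labeled 1 if the
# asset actually failed within the following 30 days, else 0.
history = pd.read_csv("asset_sensor_history.csv")
features = ["vibration_rms", "oil_temp_c", "load_pct", "hours_since_service"]

X_train, X_test, y_train, y_test = train_test_split(
    history[features], history["failed_within_30d"],
    test_size=0.2, random_state=42, stratify=history["failed_within_30d"])

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Score the latest readings streamed from plant equipment and flag assets
# whose predicted probability of failure crosses the review threshold.
latest = pd.read_csv("latest_readings.csv")
latest["failure_risk"] = model.predict_proba(latest[features])[:, 1]
print(latest.loc[latest["failure_risk"] > 0.7, ["asset_id", "failure_risk"]])

The crucial point from Meridian’s experience is the label: the model learns from real-world failures, not from whether an asset passed or failed a simulated test.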
Predictive models need sanity checks
Across industries, the main point of adopting a data-driven strategy that leverages predictive modeling techniques is to make business decisions more informed and objective — and less influenced by people’s natural cognitive biases. But that doesn’t mean organizations should eliminate human judgment from the analytics process entirely. “Over time, intuition has to come into play where you say, ‘This doesn’t look right,’” said Dennis Climer, director of commercial division pricing at Shaw Industries Group Inc., a large carpet and flooring manufacturer based in Dalton, Ga.
Shaw uses predictive analytics to determine proposed prices for installations of its commercial products that are likely to maximize profit margins without giving customers sticker shock. The pricing optimization effort is customized to each sale and based on factors that include the customer’s size and previous order history, the type of project and details about the products involved. Data is pulled from Shaw’s Salesforce customer relationship management system into software from Zilliant Inc., where analytical models are run to predict optimal price ranges. The recommended prices are then fed back into Salesforce and put to use by sales teams.
Climer said the process has made price quotes more certain and, ultimately, profits on contracts more predictable. But it doesn’t run on autopilot. His team continually does health checks on the analytical models to make sure they’re delivering sensible recommendations, and he makes changes to quotes if the suggested price ranges are outside the realm of what he knows to be reasonable based on his experience working with customers.
Monitoring the output of predictive models is important because their performance tends to “drift” over time due to changes in customer behavior and broader trends such as the overall economic health of a market. For Shaw, it’s also important because sometimes the company might not have enough data to be fully confident that the software is modeling prices effectively — for example, when it’s rolling out new products or expanding into new territories. Climer said that’s when a data analyst needs to step in and make sure the models are giving answers that can actually be used to set prices. “As long as people are involved, it’s never going to be just [about] math,” he said. “Some things [recommended by predictive models] don’t make business sense.”
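The kind of health check Climer describes can be approximated by comparing each model-recommended price against the band of prices historically accepted for similar deals and routing anything outside it, or anything from a segment with no history, to a human reviewer. The following sketch is hypothetical; the segments, bands and field names are not Shaw’s.

# Hypothetical sanity check for model-recommended prices; the segments,
# bands and thresholds are illustrative, not Shaw's actual rules.
from dataclasses import dataclass

@dataclass
class Quote:
    customer_segment: str
    recommended_price: float  # per-unit price suggested by the pricing model

# Price bands derived from quotes historically accepted in each segment.
HISTORICAL_BANDS = {
    "national_account": (18.50, 24.00),
    "regional_dealer": (21.00, 28.50),
}

def review_quote(quote: Quote) -> str:
    """Auto-approve in-band recommendations; flag the rest for a human."""
    band = HISTORICAL_BANDS.get(quote.customer_segment)
    if band is None:
        return "manual_review"  # no history yet, e.g. a new territory
    low, high = band
    if low <= quote.recommended_price <= high:
        return "auto_approve"
    return "manual_review"  # possible model drift or bad input data

print(review_quote(Quote("regional_dealer", 45.00)))  # -> manual_review

A rising share of quotes landing in manual review is itself a useful signal that the underlying models may be drifting and need retraining.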
Create clear roles for analytics teams
Analytics managers also need to be on guard to ensure that the data scientists working for them can focus on developing predictive insights and not get bogged down with more reactive — and less rewarding — duties such as basic business intelligence reporting or data management tasks.
That was the case at London-based insurance company Aviva PLC. Rod Moyse, Aviva’s head of analytics, said initial projects designed to improve fraud forecasting and predict appropriate monetary settlements for bodily injury claims flew under the radar internally. At the time, most of the company saw Moyse and his 40-person team as reporting specialists, not data scientists whose primary job is to build and run complex models using predictive modeling techniques. “But we had started to think about how to move to something altogether different,” he said. “We needed to change [internal perceptions], and fast.”
Speaking at software vendor SAS Institute Inc.’s Analytics Experience 2016 conference in Las Vegas last September, Moyse said the key to overcoming those perceptions was focusing on predictive analytics projects that were relevant to upper-level management to ensure they received recognition and acceptance inside Aviva.
One project that helped put the analytics team on the map was creating an SAS-based tool to assess whether cars involved in accidents should be repaired or declared total losses. That used to be a lengthy process for claims agents; in the end, though, decisions were often based on little more than calls between them and customers. Now, the tool looks at pre-accident car values and compares them to repair estimates from mechanics. It’s a fairly simple application, Moyse acknowledged. But the technology turned a highly subjective process into one based more on predictive insights.
Within Aviva, Moyse also worked to clearly define the role the predictive analytics team is supposed to play. The prescription, he said, is not to talk about what has happened in business operations, but rather about what is likely to happen in the future, while also educating corporate executives and business managers about predictive analytics — “how it works, what we can do.”
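At its core, the repair-or-write-off decision Moyse describes comes down to comparing a vehicle’s pre-accident value with the repair estimate. A deliberately simplified sketch follows; the 70% threshold is an assumption for illustration, not Aviva’s actual rule.

# Simplified, hypothetical total-loss decision; the 70% threshold is an
# assumption for illustration, not Aviva's actual rule.
def assess_claim(pre_accident_value: float, repair_estimate: float,
                 total_loss_ratio: float = 0.70) -> str:
    """Return 'total_loss' when repairs cost too much relative to the
    vehicle's pre-accident value, otherwise 'repair'."""
    if pre_accident_value <= 0:
        raise ValueError("pre-accident value must be positive")
    if repair_estimate >= total_loss_ratio * pre_accident_value:
        return "total_loss"
    return "repair"

print(assess_claim(pre_accident_value=8000, repair_estimate=6500))  # total_loss
print(assess_claim(pre_accident_value=8000, repair_estimate=2000))  # repair

Even a rule this simple replaces an ad hoc phone conversation with a consistent, auditable decision, which is what earned the tool attention inside the company.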
Of course, the most important thing is having good data to work with. Otherwise, even the best-planned predictive analytics efforts can go awry, as demonstrated by what happened in the presidential election. For the Clinton campaign and election forecasters alike, the main problem with their predictions of the outcome appears to have been overreliance on data that turned out to be unreliable: poll results. “If you’re making a call on an election based predominantly on one point of data, you’re making a mistake,” said Michael Cohen, CEO of Cohen Research Group, a public opinion and market research firm in Washington, D.C. “Polling has its place, but it’s not the only thing. When you’re looking at research asking people what they believe, you have to look at other metrics around it.”
Cohen thinks many of the polls before the election were skewed in favor of Clinton because of social desirability bias: people telling pollsters they supported Clinton because they thought that was more socially acceptable, but then opting for Trump in the privacy of the voting booth. Also, polling didn’t do an effective job of capturing voter enthusiasm, he said. Trump had more energized supporters who showed up at polling places on Election Day in greater numbers than analytical models had predicted. Predictive modelers looking to forecast elections need to find ways to incorporate these other factors into their models, said Cohen, who is also an adjunct professor at George Washington University’s Graduate School of Political Management.
The good news for corporate enterprises is that their analytics teams are generally in a better position to make predictions than those working for campaigns. Businesses tend to have far more data about customer behavior and their operations than campaigns have about voters. In addition, each election is a unique head-to-head matchup, which limits how historical data can be used to model voting.
Even so, problems can arise in corporate applications due to a lack of sufficient data for “training” predictive models to ensure they produce accurate results. That’s particularly an issue when data scientists are running machine learning applications, said Mike Gualtieri, an analyst at Forrester Research. “Machine learning thrives when there’s a lot of historical use cases to learn from — when there are many training events,” he said. And to avoid business missteps, analytics teams need to know when to hold off on touting the output of predictive models that aren’t rock-solid, even if it means scrapping them and moving on to something else. “There are just some things you can’t predict,” Gualtieri cautioned.
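One hedged way to gauge whether a model has enough history, in the sense Gualtieri describes, is to compute a learning curve: if accuracy is still climbing as training examples are added, the model is probably data-starved. A generic sketch follows, using a synthetic dataset as a stand-in for real historical records.

# Generic check of training-data sufficiency via a learning curve; the
# synthetic dataset is a stand-in for real historical records.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} training rows -> cross-validated accuracy {score:.3f}")
# If accuracy is still rising at the largest training size, the model
# probably needs more history before its output should be trusted.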