The Value of Verified Data in Analytics

Ron Agresta, Sr. Director, Product Management, SAS Institute, discusses how the SAS platform is built to support enterprises pursuing advanced analytics.


Blog: WHAT’S UP WITH MODEL DOWNTIME?

Nearly Half of Data Science Models May Be Wasted

Deriving meaning from data is truly how an enterprise gains value. After determining the truth, of course. Don't forget my personal data philosophy, which I boil down to three words: Truth before meaning. Deriving meaning from your data through artificial intelligence is the focus of many data teams these days. Analytics, and more specifically analytical models, can be embedded in a broad array of mobile apps, websites, IoT devices, software, and service offerings across the technology landscape of a typical data-driven enterprise. Although huge advances have been made in the development of analytical and data science models, unfortunately many of those AI models never get deployed.

In fact, according to Gartner’s 2022 AI in Organizations survey, nearly half of AI models never make it into production. Getting AI models into production, and guarding against the ethical issues that can arise along the way, is a subject we covered in a recent episode of Ask The Experts.

“Whether it's forecasting or prediction or optimization, customers want to use advanced analytics,” said our guest, Ron Agresta, Senior Director of Product Management at Loqate’s long-term partner SAS. The challenge, as with most things data-oriented, is that it can get a little too complicated. “The analytic life cycle, there's a lot of parts to it.” Data scientists and data engineers need to be aware, warned Ron, and keep their “eyes on all of those opportunities where with one wrong selection they end up with something that they didn't intend.”

These issues are prevalent because, as Ron explains, “the power of analytics is so self-evident these days there's no need to convince anyone that it's something that you should do to get an edge.” But with self-evident opportunities often comes the self-realization that better management of data and platforms is imperative.

How can you manage all those models as they catwalk down the enterprise runway? Analytics practitioners need “a full analytics platform from data ingestion, access through visualization to advanced analytic modeling, ultimately to model deployment and model monitoring,” recommended Ron, “really from beginning to end.”

“At SAS we're looking for things like skewness and bias and missing values and outliers,” said Ron, “anything that might start to impact the end results of the analytic model. Even letting people know they have selected a potentially sensitive piece of data, something that may be not great to use in an analytic model.”
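To make those checks concrete, here is a minimal sketch (not SAS’s actual implementation) of how a data team might profile a candidate modeling table for missing values, skew, and outliers before any model is trained. The column names, thresholds, and file name are hypothetical.

```python
import pandas as pd

def profile_for_modeling(df: pd.DataFrame, skew_threshold: float = 1.0,
                         outlier_z: float = 3.0) -> pd.DataFrame:
    """Flag numeric columns with missing values, heavy skew, or outliers."""
    report = []
    for col in df.select_dtypes(include="number").columns:
        s = df[col]
        missing_pct = s.isna().mean() * 100
        skew = s.skew()
        # Count points more than `outlier_z` standard deviations from the mean
        z = (s - s.mean()) / s.std(ddof=0)
        outliers = int((z.abs() > outlier_z).sum())
        report.append({
            "column": col,
            "missing_pct": round(missing_pct, 2),
            "skewness": round(float(skew), 2),
            "outliers": outliers,
            "flag": missing_pct > 5 or abs(skew) > skew_threshold or outliers > 0,
        })
    return pd.DataFrame(report)

# Hypothetical usage: warn the analyst before these columns feed a model
# print(profile_for_modeling(pd.read_csv("customer_features.csv")))
```

A report like this is the kind of early warning Ron describes: it surfaces questionable variables before they quietly skew a model downstream.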

The typical steps for analytic modelers can include “finding the right data, integrating that data and then moving to the guts of the modeling process,” ideally, Ron suggested, in “low-code environments, dragging and dropping steps that are connected and helping the user get to the next stage to compare those models, validate them, make sure that they're running in a performant way, that they're providing the value over time.”

RINSE AND REPEAT

The dynamic nature of this work requires constant attention. “Think of it as a loop that's ongoing,” explained Ron, “making sure that the model continues to supply the analytics that is driving the business use case, operates efficiently, and continues to perform well. If it starts to go in a different direction, then you go back through and revalidate, check the data, check the quality, and keep on going.”

“I like to think of that as the rinse and repeat methodology,” quipped my co-host Robert Dickson, Loqate’s Vice President of Professional Services, “everything being on a loop all the time.”
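As a rough illustration of that rinse-and-repeat loop, the sketch below scores recent data, compares performance against the baseline recorded at deployment, and flags the model for revalidation when it drifts. The baseline value, tolerance, and function names are illustrative assumptions, not part of any SAS product.

```python
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.82   # performance recorded when the model was deployed (hypothetical)
TOLERANCE = 0.05      # how much degradation we accept before acting

def monitor(model, recent_features, recent_labels):
    """One pass of the monitoring loop: score, compare, decide."""
    scores = model.predict_proba(recent_features)[:, 1]
    current_auc = roc_auc_score(recent_labels, scores)

    if current_auc < BASELINE_AUC - TOLERANCE:
        # Performance has drifted: go back, recheck the data and its quality,
        # and revalidate (or retrain) before the model keeps running.
        return {"status": "revalidate", "auc": current_auc}
    return {"status": "healthy", "auc": current_auc}
```

Run on a schedule, a check like this keeps the loop turning: healthy models stay in production, drifting ones get sent back through validation.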

Since many analytics model use cases are aimed at improving, growing, and protecting customer, supplier, and partner relationships, location information and address validation will always be a critical input. “Whether that's taking an address, applying geocodes to it or any number of additional attributes about the geo aspect of the information, it drives all kinds of interesting geo-based analytics,” said Ron.

Some examples include “improving visualizations to see how things are doing in region, in state and country. Different slices of data, postal codes, or even the exact latitude and longitude,” Ron explained. SAS’ Financial Services customers have realized location information can broaden their value landscape. “Beyond crunching financial data, [location data] puts that in context of where it is in the globe, how it is relative to customer mix in different locations.”
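For a trivial, hypothetical example of the geo slicing Ron mentions: once addresses have been verified and geocoded, rolling the same metric up at different geographic grains is a simple aggregation. The records, column names, and values below are made up.

```python
import pandas as pd

# Assume each row is a verified, geocoded customer transaction
orders = pd.DataFrame({
    "postal_code": ["27601", "27601", "94105", "94105", "10001"],
    "state":       ["NC",    "NC",    "CA",    "CA",    "NY"],
    "revenue":     [120.0,   80.0,    200.0,   150.0,   90.0],
})

# Slice the same metric at different geographic grains
by_state = orders.groupby("state")["revenue"].sum()
by_postal = orders.groupby("postal_code")["revenue"].agg(["sum", "count"])

print(by_state)
print(by_postal)
```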

In some cases the enterprise needs to mitigate the risk of certain relationships. “The North Carolina Department of Insurance,” Ron shared, “investigates potential insurance crimes, fraud in their accounts and then provides the accountability and the reference information needed to find potential fraud cases.”

Leveraging Loqate’s address verification and location data has become even more seamless with the release of SAS Viya. “We've expanded beyond verifying and geocoding addresses. We've added email address verification, phone number verification,” Ron announced.

THE MATH HAS TO BE RIGHT

In many cases the likelihood of a model getting into production will be heavily influenced by its resiliency, “where the math has to be right, it's got to be right every time, 24/7, 365, at enormous scale,” Ron revealed. And models that start strong might begin to change in unexpected ways. “You’ll need all kinds of interesting technologies to monitor how those models perform over time.”

Avoiding what one might call model downtime is a constant challenge for enterprise data leaders, who need to call out outliers when they come across them. “If you're building a report for an executive dashboard,” said Ron, “and you select a couple of variables that may have some issues with them, we'll let you know as you're doing that. And then we can carry that all the way through the modeling exercise. We're letting you know that you may have selected data that may skew your model, move it in a different direction, giving you results that you didn't quite think of.”

With software observability leading to data observability, could model observability be far behind? “Taking analytics and putting that into production,” concluded Ron, “we've been doing that for a really long time.” So while the model observability buzzword may have yet to be coined, it sure sounds like it’s something Ron and the team at SAS have already been doing for a really long time.
