By Meredith Barnes-Cook, Global Head of Insurance, Ushur
“The only constant in life is change,” and that goes for AI, too. The Greek philosopher Heraclitus could not have anticipated AI, but his observation stands the test of time. Anyone wondering why the value they get from an AI solution is diminishing will likely find the root cause to be unchecked data drift. Machine learning models are trained on a high volume of data at the time of initial deployment. Once in production, a portion of incoming data will be classified by the model with low confidence, and these low-confidence predictions are typically handled manually by knowledge specialists or line-of-business personnel. But that is not the end of the ML automation journey; it’s just the beginning.
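To make that human-in-the-loop handoff concrete, here is a minimal sketch of confidence-based routing, assuming a scikit-learn-style classifier; the threshold and return values are illustrative assumptions, not Ushur’s API:

```python
# Hypothetical human-in-the-loop routing; the threshold and labels are
# illustrative assumptions, not any specific platform's API.
CONFIDENCE_THRESHOLD = 0.80  # tune against your own accuracy targets

def route_prediction(model, text):
    """Auto-process confident predictions; queue the rest for manual review."""
    probabilities = model.predict_proba([text])[0]  # scikit-learn-style API
    best = probabilities.argmax()
    label, confidence = model.classes_[best], probabilities[best]
    if confidence >= CONFIDENCE_THRESHOLD:
        return ("auto", label, confidence)
    # Below threshold: hand off to a knowledge specialist.
    return ("manual_review", label, confidence)
```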
Machine Learning should never be introduced as a “set it and forget it” solution
We have seen this flawed pitch in the marketplace, and it’s destined to put a carrier on a path toward disappointment. It will also erode the customer experience, diminish responsiveness, and increase operational expense. A partnership between a solution provider and a carrier focused on continuous improvement ensures the value realized from automation doesn’t just hold steady; it grows over time.
Monitor & Maintain
Model performance needs to be monitored for the earliest indicators that it’s time for additional training incorporating a batch of more recent data. We partner with our customers and apply a root-cause problem-solving approach when we see anything out of the expected with SmartMail’s process automation. All it takes is one new hot topic for new terminology to become the leading Google search term seemingly overnight. Or perhaps a carrier has introduced a new product or launched a new advertising campaign. The changes that occur every day change the conversations customers want to have about their insurance. Or as Benjamin Franklin said so well, “When you are finished changing, you are finished.”
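One inexpensive early-warning signal, sketched here generically rather than as a description of SmartMail’s internals, is to compare the distribution of the model’s recent confidence scores against a reference sample captured at training time, for example with the population stability index (PSI):

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between training-time and recent production confidence scores."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    current = np.clip(current, edges[0], edges[-1])  # keep scores in range
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # avoid log(0) on empty bins
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

A PSI above roughly 0.2 is a common “time to investigate” rule of thumb; running the same check per predicted category helps pinpoint which conversations have shifted.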
Some general rules of thumb for improving the performance of sub-optimal categories are suggested below; several are illustrated with short code sketches after the list:
- Add more data. If the number of training samples for a category is low, collecting more examples is often the simplest and most effective fix.
- Check for labeling errors. Potential mislabels identified by the Ushur platform can also be made available for this exercise (a generic confident-disagreement sketch follows this list).
- Double-check the data collection process for potential sampling bias. A biased sample is one that is not representative of the entire population. Data collection should be purely random, with each data point having an equal chance of being chosen: draw from an entire database, backup, or PST archive; collect samples across a wide time range; and so on (see the time-spread sampling sketch below).
- Use Ushur’s Intelligent Data Extraction. In some scenarios, two or more categories overlap, and it can be hard to distinguish between them based on the textual content of samples alone. In those cases, consider using the Ushur platform’s intelligent data extraction capability to identify KBIs (key business indicators), which can be coupled with the metadata feature to override the model’s predicted classifications (illustrated generically below).
- Use a progressive approach when adding categories. If the number of categories is large, it can help to start with fewer categories and progressively add more. The initial set can be chosen based on the volume and availability of data, business value, and so on (a volume-based selection sketch appears below).
- Experiment with other model types. The Ushur platform supports a wide variety of NLP models, from simple TF-IDF, Doc2Vec, or word2vec-based SVM models up to neural-network-based deep learning models such as fastText, BiLSTM, and ULMFiT (a minimal comparison harness closes out the sketches below).
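On the labeling-error bullet, one generic way to surface candidate mislabels, sketched here with scikit-learn rather than the Ushur platform’s own mechanism, is to flag training samples where an out-of-fold model confidently disagrees with the assigned label:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

def candidate_mislabels(texts, labels, min_confidence=0.9):
    """Indices where a cross-validated model confidently disagrees with the label."""
    labels = np.asarray(labels)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    # Out-of-fold probabilities: each sample is scored by a model that never saw it.
    probs = cross_val_predict(model, texts, labels, cv=5, method="predict_proba")
    classes = np.unique(labels)  # matches predict_proba column order
    predicted = classes[probs.argmax(axis=1)]
    confidence = probs.max(axis=1)
    return np.where((predicted != labels) & (confidence >= min_confidence))[0]
```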
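For the sampling-bias bullet, a simple guard is to sample uniformly at random and then verify the sample actually spans the full time range. This sketch assumes a pandas DataFrame with a datetime column named `received_at` (the column name is an assumption):

```python
import pandas as pd

def time_spread_sample(df, n, date_col="received_at", seed=42):
    """Uniform random sample plus a per-month coverage check against the population."""
    sample = df.sample(n=n, random_state=seed)  # every row equally likely
    pop = df[date_col].dt.to_period("M").value_counts(normalize=True)
    smp = sample[date_col].dt.to_period("M").value_counts(normalize=True)
    # Months present in the population but missing from the sample show up as 0.
    coverage = pd.concat([pop, smp], axis=1, keys=["population", "sample"]).fillna(0)
    return sample, coverage
```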
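The KBI-and-metadata override can be pictured generically as a high-precision rule layer in front of the classifier. This is an illustrative stand-in, not the Ushur platform’s API, and the indicator and category names are invented:

```python
# Illustrative only: hypothetical KBIs and categories, not Ushur's API.
OVERRIDE_RULES = {
    "claim_number_present": "claim_inquiry",
    "cancellation_phrase_with_policy_number": "cancellation_request",
}

def final_category(model_prediction, extracted_kbis):
    """Let a high-precision extracted indicator override the model's guess."""
    for kbi in extracted_kbis:
        if kbi in OVERRIDE_RULES:
            return OVERRIDE_RULES[kbi]
    return model_prediction
```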
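Choosing the initial category set progressively can start from something as simple as a volume cutoff; the threshold below is an arbitrary illustration:

```python
from collections import Counter

def initial_category_set(labels, min_samples=200):
    """Keep categories with enough training data; bucket the rest as 'other'."""
    counts = Counter(labels)
    keep = {c for c, n in counts.items() if n >= min_samples}
    return ["other" if label not in keep else label for label in labels]
```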
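And for experimenting with model types: the model menu above is Ushur’s, but the comparison habit is generic. A minimal cross-validated harness for the simpler end of that spectrum might look like this, with scikit-learn baselines standing in for whichever candidates you are weighing:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def compare_baselines(texts, labels):
    """Cross-validated macro-F1 for two simple TF-IDF baselines."""
    candidates = {
        "tfidf+linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
        "tfidf+logreg": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: cross_val_score(model, texts, labels, cv=5, scoring="f1_macro").mean()
        for name, model in candidates.items()
    }
```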
As always, share your thoughts, experiences with data drift, or any questions you may have in the comments!
Meredith is the Global Head of Insurance at Ushur, and has decades of insurance industry, consulting, and agency leadership experience spanning business systems, operations, and customer service. She loves working with Ushur customers on their customer experience transformations.