What the Heck is Machine Learning Good For, Anyway?

You’ve probably heard the phrase “machine learning” bandied around in business publications. You are probably also frustrated by these articles’ vague promises that machine learning will revolutionize compliance (for better or for worse) and offer a powerful tool for your organization. What will machine learning actually do for records managers, compliance engineers, or people tasked with file analysis for their company? The good news is that machine learning isn’t just fluff: there are concrete ways it can help enterprise information governance.

The Problem: Combing Through Your Dataset

Any large organization’s data stores are going to be complicated and filled with data of many kinds. Finding the information you need, such as the PII you must locate to comply with privacy regulations, is already next to impossible even if we assume all of the PII is in places that logically make sense. But what if it isn’t? What if PII was sent employee-to-employee in a Bloomberg chat, or a customer accidentally entered their social security number into the “more comments” section? Cases like these might be too difficult to discover through a conventional file search, and the GDPR probe into your business certainly won’t accept “I didn’t know it was there” as an excuse.

The Solution: Machine Learning as Investigator

Machine learning has, up until this point, been cast as an enemy of GDPR compliance because of the regulation’s restrictions on building predictive profiles of a business’s customers. However, it may also prove to be a valuable tool for achieving compliance and better managing file data.

If fed the right dataset (preferably one that doesn’t violate users’ privacy), a machine learning algorithm can find PII using context clues. Word vector analysis, one of the more common and better-researched methods of machine learning, analyzes the relationships between different words or items in a dataset. Thus, sensitive or relevant data can be discovered not by searching for it specifically but by identifying which words most commonly appear close to it.
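To make the “context clues” idea concrete, here is a minimal sketch in Python. It is not a trained word vector model; it swaps in simple co-occurrence counts and cosine similarity, which is the same intuition in miniature. The training sentences, the PIITOKEN placeholder (used so that no real PII appears in the training data), the window size, and the threshold are all illustrative assumptions, not values from any real tool.

```python
import math
import re
from collections import Counter

WINDOW = 3  # words on each side treated as "context" (assumed value)

# Hypothetical training sentences; real PII is replaced by a placeholder.
TRAINING = [
    "my social security number is PIITOKEN please update my account",
    "customer ssn is PIITOKEN on file",
    "her social security number PIITOKEN was sent by chat",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_vector(tokens, index, window=WINDOW):
    """Count the words that appear within `window` positions of tokens[index]."""
    start = max(0, index - window)
    neighbours = tokens[start:index] + tokens[index + 1:index + 1 + window]
    return Counter(neighbours)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(texts, placeholder="piitoken"):
    """Sum the context vectors of every placeholder occurrence in the training texts."""
    profile = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == placeholder:
                profile.update(context_vector(tokens, i))
    return profile


def find_candidates(text, profile, threshold=0.3):
    """Return (token, score) pairs for numbers whose context resembles the profile."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token.isdigit() and len(token) >= 4:
            score = cosine(context_vector(tokens, i), profile)
            if score >= threshold:
                hits.append((token, round(score, 2)))
    return hits


if __name__ == "__main__":
    profile = build_profile(TRAINING)
    # An obviously fake number sitting in a PII-like context is flagged.
    print(find_candidates("Comment: my social security number is 123456789 thanks", profile))
    # Numbers in an unrelated context are not.
    print(find_candidates("Invoice 20231105 covers 4500 units shipped in March", profile))
```

Notice that the sketch never looks for a particular number or format. It flags a number only because the words around it resemble the words that surrounded PII in the training sentences, which is why a stray entry in a comments field can surface even though nobody searched for it.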

In the future, machine learning may also be able to categorize file types that today’s file analysis tools cannot. An article from Analytics Vidhya discusses using deep learning (a type of machine learning built on multi-layered neural networks, which are loosely inspired by how the human brain works) to analyze audio files. Perhaps soon, with machine learning assistance, sound and video will be as easy to categorize and remediate as text.

Going Beyond the Buzzwords

Depending on your opinion, machine learning could sound like the answer to all your professional problems or an impossible-to-define fad that will be relegated to the pile with Betamax in three years. It’s really neither: it’s one more tool you can add to your information governance toolbox. Even if you don’t end up using it, now or in the future, it’s important to research where IG solutions are heading so you can choose an effective, unified option that meets your company’s needs.

I'm a Bay Area native who enjoys writing about the endlessly fascinating field of information governance. In my spare time, I enjoy making board games, baking, and attempting to convince everyone I know to watch The Genius.