Will Machine Learning Find Your Next Security Incident?

Article by Josh Lemon

Digital Forensics and Incident Response Managing Director & Certified SANS Instructor

Machine Learning, Security Analytics, Behavioral Analytics, call it whatever you like, but will it really uncover hidden security incidents in your network?

For the purpose of this article, I am going to refer to machine learning, security analytics, behaviour analytics, user-based anomaly detection, and all the other flavours in between as "machine learning". Yes, I understand they can all be different and represent different areas; however, I want to discuss the application of machine-based detection to finding security incidents versus human-based detection.

I have to say up front that I am not a data scientist; I'm an incident responder, so my knowledge and skillset lean strongly towards actually conducting an investigation. I often scoff at the idea that a self-learning algorithm is going to replace a human who sits behind a computer sifting through evidence, and find all the attackers hidden in a network, just like magic. It kind of sounds like science fiction - really, you think your black box of magic code will suddenly find that an APT actor is sitting in the corner of the network? You can see how that idea, to an Incident Responder, is just hard to comprehend. But maybe I should not be too harsh. I mean, I'd love to have some more free time to kick back in the office and post funny memes of fellow staff mates; who wouldn't? And a self-learning machine that can tell the difference between a rubbish IDS alert and a real IDS alert without human intervention would be a big win for incident responders everywhere.

Machine Learning is simple, we just buy some, right?

Machine learning and behaviour analytics are starting to step out of the world of science fiction and into the world of reality. Look at online giants like Google and Amazon, whose business models allow them to invest heavily in research, staffing, and resources to produce machine learning systems that market to the person most likely to buy a service or product from them. This is also common in other business environments where an organisation gains a significant return from presenting a more tailored purchasing experience to potential customers. While I'm slightly biased, take a look at Salesforce, which has invested heavily in machine learning; they have even managed to commercialise it for their customers in a module called "Einstein". What I'm trying to get at here is that the organisations that invest in machine learning have a significant return to gain from it, and right now it's predominantly the online sales/marketing sector that has the most to gain from that investment.

When you contrast the financial returns the likes of Google or Amazon gain from investing in machine learning with the dollar value returned from applying the same machine learning to an IT security department, you can see the two just don't stack up. Security operation centres, and vendors, right now know that machine learning is a hot new tool to claim they have (as was "Threat Intelligence" three years ago). However, I believe we're very much in the infancy of seeing security operation centres, or security products, actually produce outcomes from finely tuned tools that really use machine learning. That's not to say we won't get there in the IT security space, just that we're not there yet, and you should be highly sceptical of vendors or operation centres claiming to use machine learning to find security incidents. Sure, there are some security operation centres, and vendors, experimenting with machine learning, and I'm not saying they shouldn't - this will hopefully move our industry closer to a practical application of machine learning for information security. I'm just saying check the difference between someone saying they are 'using' machine learning in these environments and someone 'experimenting' with machine learning.

Machine Learning to replace incident responders...

The other thing you need to consider is that criminals or malicious actors are not always predictable in the same way our shopping habits might be. In fact, they usually continue to be criminals because they are unpredictable enough not to get caught. Also, the work of Incident Response is fundamentally a human psychological task. Incident Responders are not just technicians with digital forensics skills who follow a strict playbook every time; they are individuals with the ability to predict and anticipate what an actor may or may not do next, and to judge when to tip off an actor and when to keep quietly observing them. Additionally, actors love it when we, as security professionals, decide to drop in a security tool that detects security events in a predictable and consistent way. This allows an actor to craft their attacks to avoid detection. Sure, they get caught the first few times, but once they understand the detection tool they invest time and resources to bypass it in the future. Look at the effort criminals now put into continually updating their ransomware or banking trojan malware so it is less likely to be detected in a sandbox or by a spam filter. An analogue of this concept applied to bypassing machine learning is companies specialising in SEO and claiming to predict how Google's machine learning search engine will and won't prioritise your site - and this type of business model is huge now.

So what should you do now?

Where do I think this leaves us with Machine Learning, Security Analytics, Behaviour Analytics, and the rest? I personally believe that we won't ever see machine learning replace Incident Responders. I also don't believe that we'll see commoditised machine learning tools/systems in the IT security space for quite some time. Once the entry cost for IT security departments to access machine learning drops, it's likely we'll see it introduced into our industry, although then we'll have the hurdle of training and tuning machine learning devices, and right now that's extremely time-consuming and probably not the right investment for your IT security dollars. The teams or vendors that I've seen lay claim to utilising machine learning for detection of incidents simply use very well thought out and crafted detection rule sets that chain events together to produce security events that are more likely to be interesting. They don't perform full incident response, and they don't auto-magically find an APT and contain a threat. For right now, spend your money on a small team of staff to do dedicated tuning and maintenance on your existing security detection devices. Let's get that right first, and be cautious of people selling you the idea that they can do machine learning to find all of the unknown security incidents in your environment.
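
To make the "rule sets that chain events together" point concrete, here is a minimal sketch of what such a chained rule might look like. It is purely illustrative and not taken from any vendor's product: the event names, the 30-minute window, and the `Event` and `find_chains` names are all assumptions invented for this example. Notice there is no learning involved, just a hand-written sequence that a person had to think up and tune.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    host: str       # machine the event was seen on
    kind: str       # event type label (hypothetical names below)
    time: datetime  # when the event occurred


# Hypothetical chain: each step alone is noisy, the ordered sequence is interesting.
CHAIN = ("phishing_email_opened", "office_spawned_shell", "outbound_to_new_domain")
WINDOW = timedelta(minutes=30)  # assumed window, would need tuning per environment


def find_chains(events, chain=CHAIN, window=WINDOW):
    """Return (host, start, end) for each in-order occurrence of a non-empty chain on one host within window."""
    by_host = {}
    for ev in sorted(events, key=lambda e: e.time):
        by_host.setdefault(ev.host, []).append(ev)

    alerts = []
    for host, evs in by_host.items():
        for i, first in enumerate(evs):
            if first.kind != chain[0]:
                continue
            step, last = 1, first
            for ev in evs[i + 1:]:
                # Stop once the chain is complete or the window has expired.
                if step == len(chain) or ev.time - first.time > window:
                    break
                if ev.kind == chain[step]:
                    step, last = step + 1, ev
            if step == len(chain):
                alerts.append((host, first.time, last.time))
    return alerts
```

Feeding this a list of `Event` objects returns one tuple per host and starting event where the full sequence completed inside the window. Everything clever about it lives in the choice of `CHAIN` and `WINDOW`, which is exactly the human tuning work described above.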