GenAI as a Cyber Threat Intel copilot - Swings and Misses
I’ve written here before about using GenAI apps as a sort of partner in cyber threat intel research efforts. I feel sure that getting good results in these efforts depends just as much on what I ask of these apps and how I prompt them as it does on the apps’ capabilities. My initial interaction this morning on this topic was a perfect example of how not to get good results. If I were up at the plate in a baseball game, there would have been some laughably bad swings, with my bat not even in the same zip code as the ball.
Why the big misses? Why more than one swing and a miss? I was asking four GenAI apps to do something that’s not always, or even often, in their wheelhouse. I wanted them to find recent links addressing a specific ask in my prompt to them. This prompt:
I would like you to help me find the 3 most interesting articles from that list published in the last 48 hours. By most interesting I mean those that offer the most useful information on critical level threat alerts and the techniques and indicators of compromise associated with attackers
My prompt here is not the best, but the bigger problem is with asking for links, and even more so asking for recent links. I won’t share the responses here, but I will say that all four GenAI apps - ChatGPT 4o, Gemini, Claude-3-Opus, and Pi - delivered poor, unusable responses. The responses were a mixture of invalid links (404 errors, and even searches for the alleged post titles turned up nothing) and big misses across the board on the timeframe of the articles; some were from as far back as 2008.
I tried a few variations of my prompt. I gave the apps a list of my top 15 threat intel RSS feeds to work with, and tried a timeline of 5 days. The results were no better. Eventually - back on the baseball metaphor, having struck out in my first three plate appearances - I went with a very different type of prompt. My hope was that the new prompt was more likely to get good results, leveraging GenAI apps’ ability to summarize and analyze data. Here’s my new prompt with the new approach, where I give the apps a role, a specific article to work with, and some context around what their response should achieve:
You are a cyber threat intel analyst who works closely with the SOC in your organization. Reading this article from Red Canary - https://redcanary.com/blog/threat-detection/entra-id-service-principals/ - what concise guidance would you give to the SOC team on detection for this sort of BEC effort
I don’t want to name and shame any of the apps, but let’s just say that Gemini has zero chance of making it to the big leagues based on its response here. The one app that offered a (very) good response was ChatGPT 4o. Its response is concise, clear, and could easily be put to use by a SOC team:
Guidance for SOC Team on Detecting BEC Efforts Using Entra ID Service Principals

- Monitor MailItemsAccessed Logs: Regularly analyze the MailItemsAccessed log to detect anomalous access patterns. This log tracks access to Exchange Online mailboxes and can help identify unusual activity.
- Identify Anomalous Applications: Set up alerts for unusual ClientAppId values, especially those not commonly used within the organization. This can help detect unauthorized third-party applications.
- Track Service Principal Permissions: Focus on service principals with delegated or application permissions. Delegated permissions should match the user’s privilege level, while application permissions should be scrutinized for broad access.
- Baseline Mailbox Activity: Establish baseline behaviors for mailbox access and flag deviations. Compare historical data to detect unusual patterns in access frequency or volume.
- Analyze IP Addresses: Investigate access attempts from unfamiliar IP addresses, particularly if they are Microsoft-owned IPs when using service principals. This can indicate malicious use of trusted applications.
- Review Consent Processes: Be vigilant about applications requesting access through consent processes, particularly for multi-tenant applications. Validate the legitimacy of these applications and their required permissions.
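To make the first two recommendations concrete, here is a minimal sketch of what flagging unfamiliar ClientAppId values in MailItemsAccessed records might look like. The record shape and field names here are illustrative assumptions, not the actual Unified Audit Log schema, and the baseline is a hardcoded set rather than something learned from historical data:

```python
# Hypothetical MailItemsAccessed audit records, reduced to the fields the
# guidance above cares about (field names are illustrative, not the real schema).
records = [
    {"user": "alice@example.com", "client_app_id": "approved-outlook-client", "ip": "203.0.113.10"},
    {"user": "alice@example.com", "client_app_id": "approved-outlook-client", "ip": "203.0.113.10"},
    {"user": "alice@example.com", "client_app_id": "unfamiliar-app-9f2", "ip": "198.51.100.77"},
]

# Baseline of ClientAppId values commonly seen in the organization; in practice
# this would be derived from historical audit data, per the guidance above.
APPROVED_APP_IDS = {"approved-outlook-client"}

def flag_anomalous_access(records, approved_app_ids):
    """Return records whose ClientAppId falls outside the approved baseline."""
    return [r for r in records if r["client_app_id"] not in approved_app_ids]

alerts = flag_anomalous_access(records, APPROVED_APP_IDS)
for a in alerts:
    print(f"ALERT: {a['user']} accessed mail via unapproved app "
          f"{a['client_app_id']} from {a['ip']}")
```

A real deployment would pull these records from the audit log pipeline and feed the alerts into the SOC's triage queue; the point here is only the shape of the baseline-and-deviation logic.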
So what have I learned here? I need to remember and identify where GenAI apps do not consistently deliver quality responses and focus on taking advantage of their strengths. My “ask” of them, and my prompts to them, need to be better. This is an ongoing process of course, and I am going to continue looking for ways to get the most out of working with GenAI apps on cyber threat intel research, looking for ways they can make my efforts more efficient and effective.