During a systems breach, there is a scramble to figure out what to do and how to recover. Ensuring your logging is properly configured is the best way to preserve the evidence you need to recover and to harden your systems.
You’re probably thinking that’s a little clickbait-y. And you’d be right to think that.
This is not a hands-on guide to incident response and recovery.
Instead, we are going to review the common infrastructure configuration mistakes that hinder the preservation of evidence and the containment of a breach.
These are mistakes we have seen limit the information available to teams trying to determine how and when a breach occurred, and what to recommend in order to properly secure the network against the current attack or similar future attacks.
There will be questions, finger-pointing, and wild statements like:
Is the breach still happening?
How did the attackers get in?
When did they get in?
What did they access? Did they delete anything?
Who would do this?
I told you that we should have changed that password a long time ago
That was clearly Joe, he probably left a back-door
OUR COMPETITORS ARE AFTER US!!!!!!!!!!
Where are the back-ups? We don't have back-ups?!
What do you mean you don’t know who did it?
In the following, we outline the mistakes that lead to this kind of blindness during an incident, and then offer solutions for each of them.
When you discover a breach, remember:
Don’t panic
Don’t make hasty decisions to wipe and re-install your system (not yet)
Do follow your incident response plan
Before a breach, make sure you correct these common infrastructure mistakes.
Mistake #1: No Logging
The first problem becomes clear almost immediately: there is a lack of information about the state and history of the systems, because many of the systems are not collecting logs at all.
This is the main reason why proper logging mechanisms are part of every compliance framework we know.
It is important that every layer and service in your infrastructure has the ability to produce logs. It is equally important that you activate the logging mechanism, collect the logs, and review them regularly.
Logs should be collected and retained for every layer, including:
Your web server: Any request to your server should be captured with basic metadata.
Your databases: Any INSERT, UPDATE, or DELETE operation should be captured along with the specific user who performed it.
Your authentication services: Any login and logout attempt, successful or not, should be logged.
Your own application: Don't forget that your application itself should produce logs, and that these should be captured (see the sketch after this list).
Your administration/build services: If you run tools such as Jenkins, keep those logs as well.
Your physical servers: These produce system logs, which on Linux-based systems are usually stored under /var/log.
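For the application layer specifically, here is a minimal Python sketch that emits structured, timestamped JSON log lines using the standard logging module; the service name, field set, and file name are placeholders to adapt to your own service.

```python
import json
import logging

# Minimal structured-logging sketch for the application layer.
# The logger name, field set, and "app.log" path are placeholders.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

logger = logging.getLogger("billing-service")  # hypothetical service name
logger.setLevel(logging.INFO)

handler = logging.FileHandler("app.log")
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.info("user login succeeded")
logger.warning("payment retry limit reached")
```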
Mistake #2: No Log Retention Policy
Logs are, potentially, a lot of data. Companies need to be prepared to store the current logs and a history of the log files for a period of time. A company-wide data retention policy is required; 90 days of retention is usually sufficient.
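For application-level logs, one way to implement such a window is Python's standard rotating file handler; a minimal sketch, assuming a 90-day policy and a placeholder file name (system and infrastructure logs are usually rotated by separate tooling).

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Rotate the application log at midnight and keep 90 days of history,
# mirroring the retention policy above. "app.log" is a placeholder path.
handler = TimedRotatingFileHandler(
    "app.log",
    when="midnight",
    backupCount=90,  # number of rotated files to keep before deletion
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("myapp")  # hypothetical application logger
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("retention-aware logging configured")
```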
Mistake #3: No Central Log Collection and Examination
Collecting logs is one thing; organizing and reacting to log data is another.
You need a central place where all of these logs flow. There are multiple providers and tools for log collection and examination services. They include:
The OpenTelemetry framework (used together with a visualization front-end or other additions)
Tools like these are great log collection and management solutions. Inside whichever tool you choose, you need to define what is "normal" for your application, and what kind of traffic to flag.
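As a toy illustration of what "defining normal" can look like once logs are centralized, the sketch below counts failed logins per source IP in a collected event file and flags anything above a threshold. The file name, field names, and threshold are assumptions; in practice you would express this as a query or alert rule inside your collection tool.

```python
import json
from collections import Counter

# Toy anomaly check over centrally collected authentication events.
# Assumes one JSON event per line with "event" and "source_ip" fields;
# the file name, field names, and threshold are illustrative assumptions.
FAILED_LOGIN_THRESHOLD = 20

def flag_suspicious_ips(log_path):
    failures = Counter()
    with open(log_path) as fh:
        for line in fh:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines instead of aborting the scan
            if event.get("event") == "login_failed":
                failures[event.get("source_ip", "unknown")] += 1
    return {ip: count for ip, count in failures.items() if count >= FAILED_LOGIN_THRESHOLD}

if __name__ == "__main__":
    print(flag_suspicious_ips("auth_events.jsonl"))
```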
Mistake #4: Shared accounts
In order to identify the exact source and path of a breach, it is important to have log information tied to specific users and sources. Even when you use service accounts, there needs to be meta-information about the source (i.e., the exact container/compute instance including identifiers). Without this information, you are poking around in the dark trying to find a needle in a haystack.
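One lightweight way to get that metadata into every log line is to attach the acting identity and the emitting instance at the logger level. Here is a minimal Python sketch using the standard LoggerAdapter; the service name, user identifier, and the HOSTNAME environment variable are assumptions you would replace with values from your auth context and runtime.

```python
import logging
import os

# Attach the acting user and the emitting compute instance to every record,
# so later forensics can tie each action to a specific identity and source.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s user=%(user)s instance=%(instance)s %(message)s",
)

base_logger = logging.getLogger("orders-service")  # hypothetical service name

def logger_for(user_id):
    return logging.LoggerAdapter(
        base_logger,
        {
            "user": user_id,  # e.g. a named service account, never a shared one
            "instance": os.environ.get("HOSTNAME", "unknown-instance"),
        },
    )

log = logger_for("svc-billing")  # hypothetical service account
log.info("updated invoice record")
```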
Now What
Follow your incident response plan. A lot of teams don't have one. The objective is usually pretty simple: stop information from being stolen and repair your systems so a breach won't happen again. Without a plan, there is a scramble to figure out what to do, and this is usually where big mistakes happen (like wiping a compromised system without first taking images, destroying the evidence you need to understand what occurred and avoid future breaches).
Preserve evidence. It is critical to preserve forensic evidence; this is why the #1 mistake we see is a lack of sufficient logging. Teams need to balance the urge to fix things immediately with the need not to destroy the data and logs that will allow them to investigate the breach and secure the systems against future attacks.
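A simple way to preserve log evidence before touching anything else is to copy it into a timestamped archive and record a cryptographic hash, so you can later show the copy was not altered. A minimal sketch, assuming the logs live under /var/log and that /forensics is a safe, separate location to write to.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path

# Archive the system logs with a timestamp and record a SHA-256 hash so the
# copy's integrity can be verified later. Both paths are assumptions; ideally
# the archive is written to storage the attacker cannot reach.
def preserve_logs(source="/var/log", dest_dir="/forensics"):
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"logs-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=Path(source).name)

    sha = hashlib.sha256()
    with open(archive, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    digest = sha.hexdigest()

    (dest / f"logs-{stamp}.sha256").write_text(f"{digest}  {archive.name}\n")
    return archive, digest

if __name__ == "__main__":
    print(preserve_logs())
```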
Contain the breach. This can be as simple as disconnecting the Internet connection, though this is much more difficult in the cloud. You're looking to disable, not delete, access to the breached systems. This includes servers, access points, firewalls, etc. Change all passwords (documenting the old passwords for later analysis) and disable all non-essential accounts. Preserve all logs, including firewall, network access, and system logs. This is not a complete list of actions, but it is a starting point.
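For the "disable, don't delete" part on a Linux host, one option is to lock non-essential local accounts rather than remove them. A rough sketch using usermod -L (which locks the account's password); the account list is a placeholder, and SSH keys, API tokens, and cloud IAM access still need to be disabled through their own mechanisms.

```python
import subprocess

# Lock (do not delete) non-essential local accounts during containment.
# usermod -L disables password authentication without destroying evidence.
# The account list below is a hypothetical placeholder.
NON_ESSENTIAL_ACCOUNTS = ["deploy", "legacy-ftp"]

def lock_accounts(accounts):
    for account in accounts:
        result = subprocess.run(
            ["usermod", "-L", account],
            capture_output=True,
            text=True,
        )
        status = "locked" if result.returncode == 0 else f"failed: {result.stderr.strip()}"
        print(f"{account}: {status}")

if __name__ == "__main__":
    lock_accounts(NON_ESSENTIAL_ACCOUNTS)
```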
Investigate the breach, fix your systems, and get back online.
Step 1: Start as close to the source of the symptoms as possible. For example, were there new or deleted entries in the database that were not intended? The logs should reveal when and by whom these changes were made, and potentially where in the network the connection came from. Do you suddenly see more outgoing traffic on your server, as if someone has installed a crypto-miner? Check the system logs of the affected machines.
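For example, once you know roughly when a suspicious database change happened, you can pull the web-server requests from around that moment. A small sketch, assuming the common "combined" access-log format with bracketed timestamps; the log path, reference time, and window size are placeholders.

```python
import re
from datetime import datetime, timedelta

# Print access-log lines within a window around a suspicious database change.
# Assumes the common "combined" log format, e.g. [12/Mar/2024:10:15:32 +0000];
# the log path, reference time, and window size are placeholder assumptions.
ACCESS_LOG = "/var/log/nginx/access.log"
SUSPICIOUS_CHANGE_AT = datetime.strptime(
    "12/Mar/2024:10:15:32 +0000", "%d/%b/%Y:%H:%M:%S %z"
)
WINDOW = timedelta(minutes=10)

timestamp_pattern = re.compile(r"\[([^\]]+)\]")

with open(ACCESS_LOG) as fh:
    for line in fh:
        match = timestamp_pattern.search(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if abs(when - SUSPICIOUS_CHANGE_AT) <= WINDOW:
            print(line.rstrip())
```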
Step 2: If you identified a user, start investigating what they were doing. This is the moment you may discover that the whole issue was caused by a bug. If that is not the case, change the password for this user as a precaution and examine your authentication system to see where their logins came from.
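As one concrete example of "where did the logins come from", the sketch below lists the source addresses of accepted SSH logins for a single user from the OpenSSH auth log; the log path, username, and reliance on sshd's "Accepted ... for <user> from <ip>" line format are assumptions to adapt to your own authentication system.

```python
import re

# Collect the source addresses of accepted SSH logins for one user.
# Assumes OpenSSH's auth log format ("Accepted <method> for <user> from <ip> ...");
# the log path and username are placeholders for your environment.
AUTH_LOG = "/var/log/auth.log"
USER = "alice"  # hypothetical user under investigation

accepted = re.compile(rf"Accepted \S+ for {re.escape(USER)} from (\S+)")

sources = set()
with open(AUTH_LOG) as fh:
    for line in fh:
        match = accepted.search(line)
        if match:
            sources.add(match.group(1))

print(f"{USER} logged in from: {sorted(sources)}")
```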
If no clear user could be identified, the source was likely one of your internal servers. Go back to Step 1 on this newly found server and see if you can identify the user there.
Step 3: Examine the authentication system. In its logs you can see when the specific service or user started sessions. There are multiple possible outcomes here:
In the best-case scenario, you can identify a login that was clearly not made by the legitimate user. Then changing credentials and invalidating sessions is the way to go.
If the login seems to have been made by your user, but the user states they did not do anything, analyze the traffic again for its source. If you see multiple IP addresses or unusual behavior during the user's session, you know that a third party got in: they either found a way to spoof the session or got their hands on the authentication token. After changing the credentials and invalidating all sessions, you need to figure out how that happened. Check whether your authentication system, and every library it depends on, is up to date and whether there are active CVEs against them. The same holds for your front-end, where the token is actively used to make server calls.
There was no login at all, and somehow the attacker got in and acted as the named user. That is the worst-case scenario, but it happens even to the best. This is the moment to beef up your tests and verify that none of your endpoints allow access without proper authentication (see the sketch below). This would also be the moment to adopt and enforce standards such as OpenAPI.
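A small authentication check that can run in CI might look like the sketch below: every listed endpoint must reject a request that carries no credentials. The base URL, endpoint list, and expected status codes are hypothetical; adapt them to your own API.

```python
import requests

# Every protected endpoint must reject requests that carry no credentials.
# The base URL, endpoint list, and accepted status codes are assumptions.
BASE_URL = "https://staging.example.com"
PROTECTED_ENDPOINTS = ["/api/users", "/api/orders", "/api/admin/settings"]

def test_endpoints_require_authentication():
    for path in PROTECTED_ENDPOINTS:
        response = requests.get(BASE_URL + path, timeout=10)
        assert response.status_code in (401, 403), (
            f"{path} answered {response.status_code} without credentials"
        )

if __name__ == "__main__":
    test_endpoints_require_authentication()
    print("All protected endpoints rejected unauthenticated requests.")
```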
Conclusion
Being able to investigate a breach, understand the vulnerabilities, and prevent them in the future all depend on information, and that information is captured by logging. At CoGuard, we check that every configuration has proper logging enabled, at all layers of your infrastructure. That way, you are prepared to examine any breach at any moment.
Check out and explore a test environment to run infra audits on sample repositories of web applications and view select reports on CoGuard's interactive dashboard today.