Recently in Incident Response Category

Incident Response Planning

At some point in the past, my work seems to have shifted mostly from doing (what I consider) actual work to talking and writing about actual work.

Although I spend much of my time writing or reviewing documents and sitting in meetings, every now and then a call comes in that starts with something along the lines of "Something interesting happened and I think you may be interested in it". That phrase sends me right over to my incident response documentation toolkit, and I start taking notes. Being responsible for incident response management gives me that one escape that everyone should have.

Most incidents can be dealt with as a matter of routine and can typically be addressed by help desk and system administrators. Such calls usually result in a quick resolution and happy customers. In my role (CISO-like) I care mostly about proper execution of previously defined security procedures and about getting some good reporting out of this process. That reporting will then be used to drive process improvements.

Not all calls can be resolved by relying on procedures that are documented in detail. Sometimes you can tell that it will be a good one, just based on the caller ID of the person who reports the event to you. Other times, you'll know in the first minute of the conversation that something interesting is going on. In those cases, having a documented incident response plan will be invaluable.

Your incident response plan may not give very detailed instructions for such odd cases, and that is fine. In such cases, your plan should give you enough guidance to determine what to do, and the assurance that your actions have been discussed with key executives beforehand and have their support.

If your computer security incident response team (CSIRT) is activated, you want to do things right the first time. For incident types where it is impossible, impractical or just too expensive to develop detailed incident response procedures, I have found it very useful to document some incident types, such as "Unauthorized information disclosure", and provide a basic strategy, roles and responsibilities, and reporting forms for these incidents. Rather than focusing on the cause of the incident, my planning revolves around the outcome. This approach is sometimes referred to as an all-hazards approach. An analogy is that in your response planning, you don't really care what started the fire, but you do care about how to fight it.

Incidents like this are meant to be addressed by the CSIRT, and successful tactical and operational execution relies heavily on the expertise and training of the incident handlers. I typically use the following boilerplate when developing general response templates for CSIRT incidents:

1. One or two pages on general response strategy, including explicit assignment of some basic responsibilities. For example, one of the first steps in the identification and notification phase will be "CISO will send heads-up notice to CIO after initial report has been received".

2. A master checklist containing one or two pages of actionable items that cover the entire response process. Action items can be: a) Determine law-enforcement involvement. b) Assess and document scope of breach. c) Close case. The checklist must have enough space to check off the items, add comments, and record dates/times. Completed checklists will be used as input to writing up the post-incident report.

3. Some basic reporting forms. I tend to conclude my identification phase with a written report that outlines initial findings in a convenient one-page format that I can use to update key stakeholders. The initial incident report will be used as input to writing up the post-incident report.

4. Timeline forms that can capture date/time and actors of all actions that take place affecting the response. All actors are required to maintain their own timeline forms and they are also used as inputs for the post-incident report.
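The checklist and timeline forms described above could be kept electronically as well as on paper. The following is a minimal sketch of one way to model them, not a description of my actual forms: the class names, the `open_items` and `merge_timelines` helpers, and the dates and actors in the example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ChecklistItem:
    """One actionable item on the master checklist."""
    description: str
    done_at: Optional[datetime] = None  # stays None until the item is checked off
    comment: str = ""

    def check_off(self, when: datetime, comment: str = "") -> None:
        self.done_at = when
        self.comment = comment


@dataclass
class TimelineEntry:
    """One line on an actor's timeline form."""
    when: datetime
    actor: str
    action: str


def open_items(checklist: List[ChecklistItem]) -> List[str]:
    """Return the descriptions of items that have not been checked off."""
    return [item.description for item in checklist if item.done_at is None]


def merge_timelines(*timelines: List[TimelineEntry]) -> List[TimelineEntry]:
    """Combine the per-actor timelines into one chronological list."""
    combined = [entry for timeline in timelines for entry in timeline]
    return sorted(combined, key=lambda entry: entry.when)


# Hypothetical example data, for illustration only.
start = datetime(2009, 9, 1, 9, 0, tzinfo=timezone.utc)
checklist = [
    ChecklistItem("Determine law-enforcement involvement"),
    ChecklistItem("Assess and document scope of breach"),
    ChecklistItem("Close case"),
]
checklist[0].check_off(start + timedelta(hours=1), "Discussed with counsel")

ciso_log = [TimelineEntry(start + timedelta(minutes=15), "CISO",
                          "Sent heads-up notice to CIO")]
handler_log = [TimelineEntry(start, "Handler", "Received initial report")]
merged = merge_timelines(ciso_log, handler_log)

print(open_items(checklist))
for entry in merged:
    print(entry.when.isoformat(), entry.actor, entry.action)
```

Keeping the per-actor timelines separate and merging them only at write-up time mirrors the paper process: each actor owns their own form, and the post-incident report draws on all of them.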

Approaching an incident response in this way, where a basic strategy is known ahead of time and "maintaining excellent notes" is embedded in the response, is key to successful reporting and process improvements.

Like many other professions that have a security dimension, information security professionals are (or at least, should be) trained to deal with crises. Excellent training is available from many sources, one of which is the SANS Institute's Security 504: Hacker Techniques, Exploits and Incident Handling. Since I am a mentor for 504, I am fairly comfortable with the material. One thing I have found lacking in most training of which I am aware is that, while several (very useful) approaches to incident handling are discussed, little attention is paid to how to actually organize an incident response structure.

In order to provide some more guidance to my students, I have done some research and I ended up on the FEMA site. While the Federal Emergency Management Agency is often scorned or ridiculed, they do have some interesting materials available for free.

Some background information first. FEMA's mission is to support citizens and first responders to ensure that we work together to build, sustain, and improve our capability to prepare for, protect against, respond to, recover from, and mitigate all hazards. This definition has "government" written all over it, but there are some useful components for my purposes.

Specifically, the part where they mention "prepare for and respond to" (incidents) has relevance.

FEMA's Emergency Management Institute provides many types of study in the field of emergency management, but the one that I am most interested in is the independent self-study option. Under the Independent Study Program, some very interesting resources are made available for free; more specifically, some modules are offered that address the Incident Command System (ICS).

IS-100.a Introduction to Incident Command System is a module that introduces the concept of an incident command system. "The Incident Command System, or ICS, is a standardized, on-scene, all-hazard incident management concept. ICS allows its users to adopt an integrated organizational structure to match the complexities and demands of single or multiple incidents without being hindered by jurisdictional boundaries." It does not take much imagination to see how this concept can be applied to information security incidents, or to wider incidents that include information security aspects.

The ICS approach is based on a few common concepts. The ones that are most relevant to us are the use of common terminology and clear text, adoption of a modular organization, management by objective, reliance on an incident action plan, and maintaining a manageable span of control.

The training material discusses roles and responsibilities of the incident commander, delegation of authority, unified command, command staff, general staff, and much more. These are all concepts that are very useful when dealing with security incidents or business continuity events.

I highly recommend taking a look at the online FEMA training offerings. They are free and include a self-assessment, and if you pass the online exam, they will even give you a pretty certificate in a PDF file. No pretty letters after your name though.

Two more excellent GIAC Gold Papers

Since I have taken the role of a GIAC Gold adviser, I have seen many good papers pass by. Every now and then, some jump out as being clearly above average. This week has been a particularly good week and two new additions have joined the reading room.

Security Incident Handling in High Availability Environments by Algis Kibirkstis adopts the point of view of a telecommunications provider. Having done some data modeling work in large telephone exchanges myself, I have always been intrigued by the high level of requirements that this industry puts on itself. Kibirkstis provides an excellent overview of the concept of High Availability (carrier-grade reliability) and goes on to describe how the incident handling process takes place in these environments. The paper ends with a set of 8 concrete recommendations. The paper is available here.

Investigative Tree Models by Rodney Caudle ties in to my other fascination: how to use symbolic models to improve real-world situations. No, I am not talking about glossy fashion magazine models, but things like decision trees, graphs, etc. Caudle describes how to use attack trees to aid incident investigations. He takes the reader through the formal definitions of these models and clearly explains them by providing well-documented examples. The second part of the paper describes a full case study on how to use a tree model to obtain proof in an investigation into email abuse. The paper wraps up with a brief conclusion and a look forward at some possible future trends. The paper is available here.
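To give a flavor of what a tree model can look like in code, here is a minimal sketch of a generic AND/OR tree evaluated against a set of collected evidence. It is not taken from Caudle's paper; the `Node` class, the `supported` method, and the email-abuse example are hypothetical and only illustrate the general idea.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A node in an AND/OR tree. Leaves are individual pieces of evidence."""
    label: str
    kind: str = "LEAF"  # one of "LEAF", "AND", "OR"
    children: List["Node"] = field(default_factory=list)

    def supported(self, evidence: Set[str]) -> bool:
        """Return True if the collected evidence supports this node."""
        if self.kind == "LEAF":
            return self.label in evidence
        results = [child.supported(evidence) for child in self.children]
        if self.kind == "AND":
            return all(results)  # every branch must be supported
        if self.kind == "OR":
            return any(results)  # one supported branch is enough
        raise ValueError(f"unknown node kind: {self.kind}")


# Hypothetical investigation tree, for illustration only.
root = Node("Suspect sent the abusive email", "AND", [
    Node("Message originated from suspect's account", "OR", [
        Node("Mail server log shows authenticated send"),
        Node("Copy found in sent items"),
    ]),
    Node("Suspect was logged in at send time"),
])

full = {"Mail server log shows authenticated send",
        "Suspect was logged in at send time"}
partial = {"Copy found in sent items"}

print(root.supported(full))     # True
print(root.supported(partial))  # False
```

The useful property is that the tree tells the investigator which leaves are still missing before the root claim can be considered supported.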

More information about GIAC Gold certification can be found on the GIAC website.

The Apache Software Foundation experienced some downtime on August 28 when unauthorized access to their servers was detected. A few days ago, the Apache infrastructure team posted a very well-written post-incident report that publishes more details about the attack and shares an overview of the lessons learned from it.

The report is worth reading in full. Some key findings:

  • "The use of SSH keys facilitated this attack." Yes, SSH is more secure than telnet (or rlogin), but it must still be hardened.

  • "The ability to run CGI scripts in any virtual host, when most of our websites do not need this functionality, made us unnecessarily vulnerable to an attack of this nature." Very few of us are innocent on this one. Trim down a system's configuration so that it provides only the minimal functionality it needs to do the job.

  • "We will re-implement measures such as IP banning after several failed logins, on all machines." Brute force attacks are still one of the most successful attack vectors. Automatic account lockout and restricting the network space from which incoming connections are allowed in the first place seriously reduce the attack surface.

  • "Because they obtained root on the CentOS machine, we are not entirely sure, almost all logs on the machine were destroyed. The machine ran many stock web applications and may of had less than secure password practices -- but once they got root whatever evidence of the initial hack was destroyed." Keep critical logs on a dedicated, hardened server to facilitate post-incident analysis.
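The "IP banning after several failed logins" measure from the list above can be sketched as a sliding-window counter. This is an illustration of the general technique only, not the Apache team's implementation; the `FailedLoginTracker` class, its thresholds, and the example address (from a documentation-only range) are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class FailedLoginTracker:
    """Ban an IP once it accumulates too many failed logins in a time window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 600.0):
        self.max_failures = max_failures      # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned: Set[str] = set()

    def record_failure(self, ip: str, now: float) -> bool:
        """Record a failed login at time `now` (seconds); return True if banned."""
        if ip in self.banned:
            return True
        attempts = self._failures[ip]
        attempts.append(now)
        # Drop attempts that have fallen out of the sliding window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_failures:
            self.banned.add(ip)
        return ip in self.banned


tracker = FailedLoginTracker(max_failures=3, window_seconds=60)
rapid = [tracker.record_failure("203.0.113.7", t) for t in (0, 10, 20)]
slow = [tracker.record_failure("203.0.113.8", t) for t in (0, 50, 120)]
print(rapid)  # [False, False, True]
print(slow)   # [False, False, False]
```

A real deployment would also need ban expiry and persistence across restarts, and would act on the ban at the firewall or service level rather than just recording it.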