Friday, November 12, 2010

Understanding Priority Issues Impacting Business Operations

Large enterprises have a network of complex policies that tightly knit together their business models and the support structure of people who manage their technology tools. Decades of experience in problem management have led to policies that set off a domino effect along the support chain, so that it responds immediately to events that could lead to a loss. This is how change management came to light, along with risk management and systems audit management. One of the most hotly debated support structures in these new areas of systems management was the creation of "priority levels".

What are priority levels? Priority levels are part of the change management chain, and management and stakeholders use them widely to decide how urgently issues with systems must be addressed.

In this post I am going to detail a very important part of the life of systems personnel (admins and developers alike): handling change management through priority levels. My intention is to draw a clear picture of which events each priority level applies to.

The P1 (Priority Level 1) -- Response Time 1 hour (CRITICAL)

- Any major failure affecting an entire site/business or more than one device/server
- Business is losing millions
- Impacting a huge number of users

Priority level 1 is the most "CRITICAL": a P1 is when you drop everything and focus on the problem. However, it has been misused and abused by senior managers who are paranoid about losing large sums that are not actually at stake. I have seen and experienced these scenarios in real life; I was a victim of it myself! To fight your way out, make sure the criteria listed above are actually present before treating a concern as a real P1. I hate it when P1s are raised against a test environment. It makes a complete mockery of the policy and has virtually nothing to do with the real issues the definition describes.

- Immediate notification to engineers
- Direct escalation to 3rd-level engineers
- Direct escalation to Incident Manager / Senior Management

The P2 (Priority Level 2) -- Response Time 8 hours (HIGH)

- Incident affecting a single critical device/server
- Site/service functioning but performance is degraded
- Affects only a small number of users
- Incident during a normal/critical period

Priority level 2 (P2) is considered "HIGH": the issue is significant and must be addressed within the same business day, no later than EOB (End of Business). Normally a P2 comes right after a P1 in the order in which tasks are undertaken. A good example is a key piece of functionality in a system that is preventing it from producing a given output or reaching the desired performance levels. In my experience, P2 issues in systems administration are the most "VAGUE": a P2 is often raised because of unusual behavior in the system that defies even the most careful investigation. Often a fix is found by making the wrong decision for the right course of action, and the root cause eventually turns out to be a key vulnerability or a kernel bug not yet known.

- Immediate notification to Help Desk Supervisor.
- Incident Manager also informed.
- Notification to Team leader/Manager if response SLA not met.

The P3 (Priority Level 3) -- Response Time 2 days (NORMAL)

- Normal service requests and incidents affecting a non-critical device/server
- Site/service functioning but performance is degraded.
- Affects only a small number of end users.
- Incident during quiet period.

This level covers the normal day-to-day affairs that systems administrators must face, e.g. unlocking locked-out users and creating or removing accounts. It also covers investigations into key services that behave normally as far as the system records show, yet produce unexpected results from time to time because of a bug somewhere in the code. Maintenance tickets on the system fall under this level too, because a well-defined procedure is already in place for the task that is about to be executed.

- Notification to Team leader/Manager if response SLA not met.
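
To make the criteria above a little more concrete, here is a minimal Python sketch of a triage rule plus a response-SLA check. Everything in it is an illustrative assumption rather than part of any ticketing tool or standard: the Incident fields are hypothetical, "2 days" is treated as 48 wall-clock hours (no business-hour calendar), and the rule that only production incidents may become a P1 is my reading of the point about P1s raised against test environments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Priority(Enum):
    P1 = "CRITICAL"
    P2 = "HIGH"
    P3 = "NORMAL"


# Response times from the level definitions; "2 days" is assumed to mean
# 48 wall-clock hours (business-hour calendars are ignored in this sketch).
RESPONSE_TIME = {
    Priority.P1: timedelta(hours=1),
    Priority.P2: timedelta(hours=8),
    Priority.P3: timedelta(days=2),
}


@dataclass
class Incident:
    # Hypothetical fields for illustration, not from any real ticketing tool.
    site_wide_failure: bool
    servers_affected: int
    critical_device: bool
    large_user_impact: bool
    production: bool
    opened_at: datetime


def classify(incident: Incident) -> Priority:
    """Map an incident to P1-P3 using the criteria listed for each level."""
    major = (
        incident.site_wide_failure
        or incident.servers_affected > 1
        or incident.large_user_impact
    )
    # Assumption: only production incidents qualify for P1, so a
    # test-environment outage falls through to the lower levels.
    if major and incident.production:
        return Priority.P1
    if incident.critical_device:
        return Priority.P2
    return Priority.P3


def sla_breached(incident: Incident, responded_at: datetime) -> bool:
    """True if the first response came after the level's response time."""
    deadline = incident.opened_at + RESPONSE_TIME[classify(incident)]
    return responded_at > deadline


if __name__ == "__main__":
    opened = datetime(2010, 11, 12, 9, 0)
    outage = Incident(site_wide_failure=True, servers_affected=5,
                      critical_device=True, large_user_impact=True,
                      production=True, opened_at=opened)
    print(classify(outage).name)                              # P1
    print(sla_breached(outage, opened + timedelta(hours=2)))  # True
```

Keeping the response times in a single table means P4/P5 levels could be added if your organization uses them, and a True result from the SLA check is the point at which the team leader/manager notification listed for P2 and P3 would be sent.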

In some cases there are also P4 and P5 levels, considered low and not prioritized, respectively. Sysadmins are always at the forefront of making sure that the infrastructure supporting a sizeable application with huge monetary value sits on a very robust system. The levels presented above may or may not hold true for your organization, but management has the last say on where to put priorities, which works best when management has a good grasp of the technical workings involved (assuming your management team are former IT people). Jack-of-all-trades are the bane of most companies; again, I refer you to my previous post...

If that is not the case, then you had better throw in the towel and start looking elsewhere for an organization that respects these standards.

