Last year I spent a lot of time on the operations side of life. Bringing new IT systems into a productive, operational state is an interesting and challenging topic. As a freelancer I often need to get a quick insight into new environments. The first touch points are always important documents like the design document and ……

The operations manual

Having such a document has a lot of benefits:

  • Gives a first insight to new members of the operations team (IT operations has high staff turnover, hm?)
  • Offers a chance for 3rd parties to neutrally audit/review operations tasks
  • Puts all the information needed to operate a system in 1 document (yeah I know… the design documentation would be veerrry beneficial as well)
  • …………… so many more (please comment)

Even though many companies struggle to create one (sure, it costs time and therefore money), I will try to give a good starting point with the following operating model I have created. The model can be used to create a new operations manual from scratch or to audit an existing one. Please be aware that this is a generic model and is not specific to certain environments.

I don’t claim to know, or to have included, everything important that belongs in such a manual… so feel free to give me feedback and I will update the document accordingly (if the discussion leads to a conclusion that lets the model evolve and brings more benefit to all of us).

I divided the model into 3 sections that must be in every operations manual.

  1. General information: Which IT service is delivered? Which communication channels are used? Which people are important for day-to-day operations and during escalations?
  2. Functional tasks / requirements of an ops manual (Does anyone have a better wording to describe those?): Concrete tasks and information that operators/administrators carry out or use to keep the basic functionality of the IT systems running.
  3. Non-functional tasks: Tasks to ensure performance and availability of the solution. Those tasks ensure the quality of the environment and are typically separated into two phases: detecting and acting (e.g. failure & recovery, performance problem & fix). IMO those are the tasks that really matter if you want to grow from a pure cost driver within a company into a service provider. A lot of organizations have structured methods for detecting, but lack a well-structured process for what comes afterwards.
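
If you want to audit an existing manual against these 3 sections, the check can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the names `Section`, `MODEL` and `audit` are made up for illustration, the guiding questions are paraphrased from the list above, and a manual is represented simply as a list of its top-level section titles.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One section of the generic operations-manual model (hypothetical structure)."""
    name: str
    questions: list[str] = field(default_factory=list)


# The 3 sections of the model; the questions paraphrase the list above.
MODEL = [
    Section("General information", [
        "Which IT service is delivered?",
        "Which communication channels are used?",
        "Who is important for operations and escalations?",
    ]),
    Section("Functional tasks", [
        "Which tasks keep the basic functionality running?",
    ]),
    Section("Non-functional tasks", [
        "How are failures and performance problems detected?",
        "What is done after detection (recovery, fix)?",
    ]),
]


def audit(manual_titles: list[str]) -> list[str]:
    """Return the model sections that have no matching title in the manual.

    Matching is a simple exact comparison after trimming whitespace and
    ignoring case; a real audit would also look inside each section.
    """
    present = {title.strip().casefold() for title in manual_titles}
    return [s.name for s in MODEL if s.name.casefold() not in present]


if __name__ == "__main__":
    # Example: a manual that covers detection-free basics only.
    missing = audit(["General information", "functional tasks "])
    print("Missing sections:", missing)
```

A title-only comparison is deliberately crude. It tells you which sections are missing, not whether the existing ones answer the questions listed for them.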


At the end of the day I believe that each operations manual should give information about the items mentioned above. Having a structured document with all of this information about the environment separates the boys from the men (from an organisational/maturity point of view 😉 ) –> so let’s grow up, create one and give me feedback about your experiences with these types of documents.