How do you define resilience?

According to the American consulting company Uptime Institute, the availability of an IT environment can be classified into four availability classes, Tier 1 to Tier 4. The lowest level, Tier 1, allows annual downtime of around 29 hours and does not require redundant power and cooling distribution. At the other end of the scale, the Tier 4 classification allows only 0.4 hours of downtime per year. Here, supply paths are duplicated several times over, and maintenance is possible during ongoing operation without IT downtime.

In Europe, companies today orient themselves towards DIN EN 50600. This standard takes a holistic approach and provides comprehensive specifications for the planning, construction and operation of a data center. The highest availability class defined there, class 4 (VK 4), does not give concrete quantitative downtime figures but instead sets conceptual requirements for “very high availability”. For example, VK 4 calls for a design with system redundancy, i.e. duplicated power supply paths, but only a single cooling supply path. Another classification for fail-safety comes from Germany's Federal Office for Information Security (BSI), which defines its availability class 4 as 99.999 percent availability, corresponding to a downtime of about 26 seconds per month or roughly five minutes per year.
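
The relationship between an availability percentage and the permitted downtime is simple arithmetic. The following sketch (plain Python, purely illustrative) reproduces the figures quoted above from the downtime and availability values given in the text.

```python
# Minimal sketch: converting between an availability percentage and the
# corresponding downtime budget. The input values are those quoted in the text;
# everything else is illustrative.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def downtime_per_year(availability_percent: float) -> float:
    """Allowed downtime in hours per year for a given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

def availability_from_downtime(downtime_hours_per_year: float) -> float:
    """Availability in percent implied by a yearly downtime budget."""
    return (1 - downtime_hours_per_year / HOURS_PER_YEAR) * 100

print(f"Tier 1 (~29 h/year): {availability_from_downtime(29):.3f} % availability")
print(f"Tier 4 (0.4 h/year): {availability_from_downtime(0.4):.3f} % availability")
print(f"BSI VK 4 (99.999 %): {downtime_per_year(99.999) * 60:.1f} min/year downtime")
```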

Reliability - a lot helps a lot

IT managers meet the demand for high fail-safety with the concept of a redundant infrastructure. In IT, redundancy means that functionally equivalent resources are kept in duplicate: spare capacity is created so that a hardware failure can be compensated. The simplest form is N+1 redundancy. In addition to the required units, one further component is provided, i.e. the required number N (= need) plus one. If a component fails in such an architecture, the standby unit takes over. Building on these considerations, fail-safety is optimized at the hardware level across the power supply, cooling and monitoring functions.
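
The N+1 rule can be stated as a one-line condition: after losing any single unit, the remaining units must still cover the required capacity. The sketch below illustrates this; the function name and the example figures are hypothetical, not a sizing tool.

```python
# Minimal sketch of the N+1 idea: with N required units plus one spare,
# the system still covers the load after any single unit fails.

def survives_single_failure(installed_units: int, required_units: int) -> bool:
    """True if the load is still covered after losing one unit."""
    return installed_units - 1 >= required_units

# Example: four UPS modules are needed (N = 4), five are installed (N + 1).
print(survives_single_failure(installed_units=5, required_units=4))  # True
print(survives_single_failure(installed_units=4, required_units=4))  # False: no spare
```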

Secure the power supply with A/B feed

Securing the energy supply is a central task in data center operation. Power fluctuations and short outages are bridged by battery-buffered UPS systems. If the UPS has a modular architecture, the entire system does not have to be designed fully redundantly; instead, one or two UPS modules can be provided to compensate for the failure of another module. The advantage is lower cost, since fewer standby units are required. This can be extended to the 2N concept: here, two separate mains supply lines feed the UPS systems. This so-called A/B feed ensures that the energy supply is always maintained via one of the two supply lines. At the highest level of fail-safety, the individual power paths are designed redundantly down to the level of the IT racks. A static transfer switch (STS) automatically connects the load to the healthy energy source, so that the power supply is secured at all times.
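
A static transfer switch essentially makes one decision: which of the two feeds supplies the racks right now. The sketch below models that decision in a few lines of Python; the class and method names are hypothetical illustration only, since a real STS performs the transfer in hardware within a few milliseconds.

```python
# Minimal sketch of the A/B feed idea: the switch keeps the load on the active
# feed and transfers to the other feed as soon as the active one fails.

class StaticTransferSwitch:
    def __init__(self, preferred: str = "A"):
        self.active = preferred

    def select_feed(self, feed_a_ok: bool, feed_b_ok: bool) -> str:
        """Return the feed that should supply the IT racks."""
        if self.active == "A" and not feed_a_ok and feed_b_ok:
            self.active = "B"  # transfer to feed B when A fails and B is healthy
        elif self.active == "B" and not feed_b_ok and feed_a_ok:
            self.active = "A"  # transfer to feed A when B fails and A is healthy
        return self.active

sts = StaticTransferSwitch()
print(sts.select_feed(feed_a_ok=True, feed_b_ok=True))   # A
print(sts.select_feed(feed_a_ok=False, feed_b_ok=True))  # B (A has failed)
```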

Protect IT cooling against power peaks

Another critical component in the data center is the cooling system: if the cooling fails, there is a risk of overheating and damage to the servers. If maximum reliability is required, the IT cooling should also be backed by a UPS system in order to compensate for peaks and fluctuations in the power grid; the technical term for uninterrupted IT cooling is "continuous cooling". However, A/B protection of the power supply is usually not implemented for cooling systems, nor are duplicate water circuits installed.

For emergency cooling, it may be sufficient under certain circumstances to open the doors of the IT racks via an automatic system in order to prevent heat build-up. Ultimately, however, in the event of a cooling failure, the primary aim is to shut down the servers quickly and without data loss in order to protect the hardware from consequential damage.
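
The sequence described above (first passive emergency cooling by opening the rack doors, then a controlled server shutdown) can be expressed as a simple escalation rule. The temperature thresholds and action names in the following sketch are hypothetical placeholders.

```python
# Minimal sketch of the emergency logic: if the rack temperature keeps rising
# after a cooling failure, first open the rack doors, then trigger a graceful
# server shutdown to avoid data loss and consequential hardware damage.

DOOR_OPEN_THRESHOLD_C = 35.0   # open rack doors to prevent heat build-up
SHUTDOWN_THRESHOLD_C = 45.0    # shut servers down before hardware is damaged

def handle_cooling_failure(rack_temperature_c: float) -> str:
    """Decide the emergency action for the current rack temperature."""
    if rack_temperature_c >= SHUTDOWN_THRESHOLD_C:
        return "graceful_shutdown"   # signal the servers to power down cleanly
    if rack_temperature_c >= DOOR_OPEN_THRESHOLD_C:
        return "open_rack_doors"     # passive emergency cooling
    return "monitor"                 # failure detected, temperature still safe

for temp in (28.0, 38.0, 47.0):
    print(temp, "->", handle_cooling_failure(temp))
```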

Pay attention to the monitoring system's power supply

The third aspect at the hardware level is monitoring. The monitoring system that supervises the infrastructure should itself be backed by a redundant power supply; for example, it can be fed via PoE (Power over Ethernet) in addition to a regular power circuit. The highest level of protection is provided by a fully mirrored monitoring platform that operates as a monitor A and a monitor B instance.
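
A mirrored monitoring platform only loses its view of the infrastructure if both instances fail at once. The following sketch illustrates that idea with a simple heartbeat check; the instance names, timeout value and interface are assumptions made for illustration.

```python
# Minimal sketch of mirrored monitoring: two independent instances (A and B)
# report heartbeats; the platform counts as available while at least one of
# them has reported recently.

import time

HEARTBEAT_TIMEOUT_S = 60.0

def monitoring_available(last_heartbeat_a: float, last_heartbeat_b: float,
                         now: float | None = None) -> bool:
    """True while at least one monitor instance has reported recently."""
    now = time.time() if now is None else now
    a_alive = now - last_heartbeat_a <= HEARTBEAT_TIMEOUT_S
    b_alive = now - last_heartbeat_b <= HEARTBEAT_TIMEOUT_S
    return a_alive or b_alive

now = time.time()
print(monitoring_available(now - 10, now - 200, now))   # True: monitor A still alive
print(monitoring_available(now - 300, now - 200, now))  # False: both instances silent
```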