08 March 2018

How simplicity overcame redundancy in the quest for uptime

As IT services move increasingly to the core of organisations’ business strategy, there has been a focus on hardware redundancy as the key to maximising uptime. Redundancy dictates that for every data centre component required for robust, continuous operation there’s at least one duplicate standby component – making it an expensive, complex proposition. But what if you could guarantee data centre uptime without the extra redundancy?

To achieve the Uptime Institute’s highest rating, Tier IV Certification, a data centre must demonstrate fault tolerance: the ability to function normally in the event of individual equipment failures or distribution path interruptions.

Unsurprisingly, the effort required to design, build and operate a Tier IV data centre with fully redundant N+N infrastructure has been economically viable only for large data-dependent organisations such as banks.

However, as NEXTDC has demonstrated with the recent launch of its second-generation Tier IV data centres in Brisbane and Melbourne (and shortly in Sydney), the need for full hardware redundancy can be overcome through several distinctive design, engineering and operational measures that work together to deliver maximum uptime affordably.

Complexity versus simplicity

In designing the second-generation data centres, Jeff Van Zetten, Head of Engineering and Design at NEXTDC, looked at how to combine innovative engineering and design with cutting-edge technologies to achieve Tier IV certification without the prohibitive costs of 2N redundancy.

“While you could deliver Tier IV through doubling the electrical capacity and putting in horrifically complicated chilled water systems and automatic control valves, our aim was to deliver Tier IV levels of uptime through an elegantly simple design and build that achieves fault tolerance through the separation, segregation and modularity of critical infrastructure components.

[Image: B2 cooling towers]

Shown above: instead of one large, central water plant, B2 uses highly efficient, stand-alone modular cooling tower units, each linked to a different set of risers.

“Simplicity is actually the key here. In addition to the hard-to-justify costs of building to 2N redundancy, you also end up with a far more complex system, and complexity is the enemy of reliability. When you have more components and pathways, deployment and maintenance become harder, and the probability of component failure increases.”
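The complexity argument can be illustrated with standard reliability arithmetic. The Python sketch below is a simplified model, not NEXTDC’s method: it assumes every component fails independently and has the same hypothetical reliability over some period. If a path needs every one of its components to work (a series arrangement), its reliability is the product of the component reliabilities, so it drops as components are added. A redundant (parallel) arrangement only needs one path to survive.

```python
def series_reliability(reliabilities):
    """Probability that ALL components work (series path), assuming independent failures."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(reliabilities):
    """Probability that AT LEAST ONE path works (redundant paths), assuming independent failures."""
    prob_all_fail = 1.0
    for r in reliabilities:
        prob_all_fail *= 1.0 - r
    return 1.0 - prob_all_fail


if __name__ == "__main__":
    r = 0.999  # hypothetical per-component reliability over some period

    # A single path becomes less reliable as more components are added in series.
    for n in (10, 50, 100):
        print(f"{n:>3} components in series: {series_reliability([r] * n):.4f}")

    # Two redundant paths: a simple path (10 parts) vs a complex path (40 parts).
    simple_path = series_reliability([r] * 10)
    complex_path = series_reliability([r] * 40)
    print(f"2 simple paths:  {parallel_reliability([simple_path] * 2):.6f}")
    print(f"2 complex paths: {parallel_reliability([complex_path] * 2):.6f}")
```

Under these assumptions, adding components to each path lowers overall reliability even when the paths are duplicated. Real facilities also face common-mode failures and human error, which this simple model ignores.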

Accessing technology at the cutting edge

Van Zetten continues, “Fault tolerance on its own does not prevent equipment failure. Beyond redundancy and fault tolerance, maximising uptime also depends on the reliability of every component, which means selecting durable, high-quality equipment from premium suppliers with a track record of supporting their products and technologies.”

Underscoring the importance of reliability, NEXTDC’s second-generation facilities use technology from industry pioneers, including uninterruptible power systems from German manufacturer Piller; emergency power generators from MTU (a division of Rolls-Royce); and transformation, distribution and switching infrastructure from global technology leader ABB.

[Image: Piller UPS at B2]

Pictured above in grey are the ABB-supplied boards that enable Piller’s Isolated-Parallel Bus (IP-Bus) electrical distribution and protection scheme, the heart of B2’s highly reliable, low-risk power system. In blue is the Piller rotary UPS, which can support 1.6 MW of load for up to 15 seconds while the MTU generator starts up.

Trusting the people behind the infrastructure

“The other aspect of reliability is maintenance and people,” said Van Zetten. “If you look at many recent major data centre outages, the primary cause has been human error. This can include a failure to adhere to the correct service or maintenance regimes, mistakes during complex switching procedures, poor operational protocols or lack of adequate training.

“We are also committed to operational excellence. Our P1 Perth data centre underwent the Uptime Institute’s Certification of Operational Sustainability in late 2017, achieving Tier III Gold, the highest possible rating for a data centre certified Tier III for design and construction.”

“This certification, which is an audit rather than a test or examination, signifies that the facility’s management team and their supporting sub-contractors clearly demonstrate that they operate and maintain the data centre in accordance with Uptime Institute best practices.” NEXTDC is now rolling out the operational sustainability certification to its other facilities.

For more information on NEXTDC’s Tier IV design, download the whitepaper “Power, Secure, Connect: NEXTDC’s new-generation data centres”.