ETSI NFV#13 NFV resiliency presentation - ali kafel - stratus

Ali Kafel, VP of Business Development

Ensuring High Availability and Resiliency for NFV

Monday 15th February, 2016,

3.00 - 6.00pm

Croke Park, Dublin 3, Ireland

MOVING IT TO THE FIELD (CO-LOCATED WITH ETSI NFV#13)

The details of this presentation are covered in this White Paper:

http://www.slideshare.net/akafel/nfv-resiliency-whitepaper-ali-kafel-stratus-technologies

Agenda

• Why We Need Resiliency vs High Availability
• Achieving Resiliency Management for NFV
• Proof point – ETSI PoC#35

Stratus Technologies

Trusted Name in Fault Tolerant Computing for 35 years

• Hardware Fault Tolerance (1980 - Present): Proprietary Platforms; Intel Platforms (ftServer)
• Software Fault Tolerance (2008 - Present): everRun Enterprise, 12,000+ Installed
• Stratus Fault Tolerant Cloud (2015 - Present): Resilient Cloud Technologies, based on proven SW infrastructure

Why the need for Resiliency in NFV

• It is no longer about voice services: certain data and video services need HA and Resiliency more than voice

• Even “mature” cloud technologies still lack HA and Resiliency

Downtime per year at each uptime level:

Uptime      Hours   Mins    Secs
99.9%       8.76    525.6   31536
99.99%              52.56   3154
99.999%             5.256   315.4
99.9999%            0.526   31.54
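The figures in the downtime table follow directly from the length of a year. A minimal sketch, assuming a 365-day year (the function name is illustrative, not from the presentation):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, assuming a non-leap year


def annual_downtime_seconds(availability_percent: float) -> float:
    """Downtime per year, in seconds, implied by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Print hours, minutes and seconds of yearly downtime for each level.
for pct in (99.9, 99.99, 99.999, 99.9999):
    secs = annual_downtime_seconds(pct)
    print(f"{pct}%: {secs / 3600:.3f} h, {secs / 60:.3f} min, {secs:.2f} s")
```

Each additional "nine" divides the yearly downtime budget by ten.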

Reliability

• How long a system performs its intended function
• MTBF = total time in service / number of failures

Availability

• % of time equipment is in an operable state, i.e. service accessibility and service continuity
• Availability (A) = Uptime / (Uptime + Downtime)
• A = MTBF / (MTBF + MTTR)
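The two formulas above can be combined in a short sketch. The input numbers are hypothetical and chosen only to illustrate the arithmetic:

```python
def mtbf(total_time_in_service_hours: float, number_of_failures: int) -> float:
    """Mean time between failures: total time in service / number of failures."""
    return total_time_in_service_hours / number_of_failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR), returned as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical example: one year in service, 2 failures, 1 hour to repair each.
example_mtbf = mtbf(8760, 2)               # hours between failures
example_a = availability(example_mtbf, 1)  # fraction of time operable
print(f"MTBF = {example_mtbf} h, A = {example_a:.5%}")
```

Shortening MTTR raises A just as effectively as lengthening MTBF, which is why restoration speed matters so much in the slides that follow.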

Resiliency

• The ability to recover quickly from failures and return to the original form / state, maintaining an operable state plus QoS
• Resiliency (R) = Availability (A) + QoS

What you need is R, not just A, because, for example:

• A 99.999% application that fails once a week for just 1 sec and disrupts active services is not resilient and not acceptable
• A 99.9999% application that causes increased latency during a fault is not acceptable
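The first example can be checked numerically. This sketch assumes 52 weeks per year and a 365-day year; the variable names are illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

failures_per_year = 52          # one failure per week (assumed 52 weeks)
outage_seconds_per_failure = 1  # each failure lasts 1 second

downtime = failures_per_year * outage_seconds_per_failure
availability_pct = 100 * (1 - downtime / SECONDS_PER_YEAR)
five_nines_budget = SECONDS_PER_YEAR * (1 - 0.99999)  # about 315 s per year

meets_five_nines = downtime <= five_nines_budget
service_disruptions = failures_per_year

print(f"Availability: {availability_pct:.5f}%")
print(f"Meets 99.999%: {meets_five_nines}, disruptions per year: {service_disruptions}")
```

The application stays within its five-nines downtime budget, yet active services are disrupted every week. Availability alone does not capture that, which is the point of asking for R rather than A.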

Defining Reliability, Availability and Resiliency


Resiliency Management cannot be done in the VNFs, because you cannot manage what you cannot see

• Over 80% of system failure modes are not directly visible by the VNFs
• Infrastructure decoupling hides the information required to take actions on faults from VNFs

[Figure: what VNFs depend on and are exposed to. Labels: Virtualized Resources (vC, vN, vS); External Dependencies; SDNC-OL; SDNC-UL; Shared Storage; Shared Network; NFVI Fabric; Node HW C/N/S; Node SW; Virtualization SW; Facility Infra. Fault Impacts: HW Faults, SW Faults, Config Faults, Performance Faults, Resource Depletion, Migrations, Upgrades]


Resiliency management can be “designed in” in multiple ways, but it’s best done in the Software Infrastructure

In the Hardware

Pros:
• Transparent – no code change
• Fast & Simple Deployment
• No special App Software

Cons:
• Very expensive
• Inefficient utilization
• Special Hardware
• Rigid

In the Applications

Pros:
• App specific state can be customized

Cons:
• Can’t detect & manage all infrastructure faults
• Code written for resiliency increased by ~40%
• Most developers don’t have Resiliency experience
• More complex & longer time to develop

In the Software Infrastructure

Pros:
• Broader & faster fault detection and correlation
• Faster and simpler Application development
• Transparent – no code changes
• Multiple levels of Resiliency

Cons:
• Needs to be adaptable to a wide range of Application Architectures

Benefits:
• Reduces Development & Verification time
• Lower Risks
• Faster time to market

[Figure labels: Applications / VNFs; Operating Environment; Middleware; Resilience Layer; Hardware]

Resiliency Management
It’s Complex, Multi-Dimensional and more than just Fault Management

Phases, in order:
• Detection (Prediction)
• Localization
• Isolation
• Remediation (Service restoration)
• Recovery (Redundancy restoration)

Resiliency depends on multiple factors:
• Speed of Service restoration & Redundancy restoration
• State Management: Service continuity
• “Key state” versus “All state”
• Redundancy mode: Resource consumption / cost
• Application performance impact

Components: Availability Management, Configuration Management, Fault Management
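The distinction between service restoration and redundancy restoration can be made concrete with a small sketch. The phase names come from the slide; the durations and the function name are hypothetical:

```python
from enum import Enum
from typing import Mapping, Tuple


class Phase(Enum):
    DETECTION = 1
    LOCALIZATION = 2
    ISOLATION = 3
    REMEDIATION = 4  # service restoration
    RECOVERY = 5     # redundancy restoration


def restoration_times(durations: Mapping[Phase, float]) -> Tuple[float, float]:
    """Return (time to service restoration, time to redundancy restoration).

    durations maps each phase to how long it takes, in seconds.
    Phases are assumed to run strictly one after another.
    """
    elapsed = 0.0
    service_restored_at = 0.0
    for phase in Phase:  # Enum members iterate in definition order
        elapsed += durations[phase]
        if phase is Phase.REMEDIATION:
            service_restored_at = elapsed
    return service_restored_at, elapsed


# Hypothetical durations in seconds, for illustration only.
example = {
    Phase.DETECTION: 0.05,
    Phase.LOCALIZATION: 0.02,
    Phase.ISOLATION: 0.01,
    Phase.REMEDIATION: 0.10,
    Phase.RECOVERY: 5.0,
}
service, redundancy = restoration_times(example)
print(f"Service restored after {service:.2f} s, redundancy after {redundancy:.2f} s")
```

Users see the outage only until remediation completes, but the system remains unprotected against a second fault until recovery completes.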

Multi-dimensional aspects of Resiliency
Two Key Elements: Service Restoration and State Protection

State Protection
• Remembering the preceding events in a given sequence of interactions within the application
• All or partial?

Service Restoration (or Failover)
• Ensuring that service is restored either through a fast restart or failover to an active secondary or Hot Standby
• The speed of Service Restoration depends on the type of application

Some applications need State Protection; most applications need fast Service Restoration

Multi-dimensional aspects of Resiliency
State Protection versus Service Restoration

Types of State Protection:
• Full state protection
• Key state protection (key state stored in RAM or disk)
• No state protection (Stateless)

State Management has implications on Transparency, Performance and Resources

Service Restoration Speed:

                      Slow (mins)                  Medium (secs)                  Fast (msecs)
Mode                  “Cold Standby”               “Warm Standby”                 “Hot Standby or Active-Active”
Mechanism             Start from reset             Failover + key state reload    Failover, full VM state
Instantiation         Re-instantiation after       Pre-instantiated before        Pre-instantiated before
                      failure: No Standby          failure: Failover to           failure: Failover to
                                                   running Standby                running Standby
State-protected       “OSS, Billing”               “email, SMS”                   “Voice control, Router Control”
Stateless             “Web server”                 “vCE Router Forwarder”         “vPE Router Forwarder”

To do Fast Remediation you need:
• Pre-instantiation
• State management
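One way to read the trade-off above is as a simple selection rule. This is a sketch only: the slide gives orders of magnitude (msecs, secs, mins), so the exact thresholds of 1 second and 60 seconds below are assumptions, as are the function names:

```python
from enum import Enum


class StandbyMode(Enum):
    COLD = "Cold Standby"
    WARM = "Warm Standby"
    HOT = "Hot Standby or Active-Active"


def choose_standby_mode(max_restoration_seconds: float) -> StandbyMode:
    """Pick the cheapest mode that can meet a service restoration target.

    Thresholds are assumed: under 1 s needs msec-class failover (hot),
    under 60 s needs sec-class failover (warm), otherwise a cold restart.
    """
    if max_restoration_seconds < 1:
        return StandbyMode.HOT
    if max_restoration_seconds < 60:
        return StandbyMode.WARM
    return StandbyMode.COLD


def needs_pre_instantiation(mode: StandbyMode) -> bool:
    """Warm and hot modes fail over to a standby that is already running."""
    return mode is not StandbyMode.COLD


for target in (0.05, 5, 300):
    mode = choose_standby_mode(target)
    print(f"target {target} s -> {mode.value}, pre-instantiated: {needs_pre_instantiation(mode)}")
```

Faster modes consume more resources, so the rule returns the slowest mode that still meets the target rather than defaulting to hot standby everywhere.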

Cold Restart versus Hot Standby or Active-Active: it’s like surviving a heart attack versus preventing one

Fault Tolerant Systems Provide Service Continuity, Even During Failures

• Cold Restart (Instant HA): re-instantiation after failure, no standby. All state is lost. The failure causes a customer-affecting application outage until the app restart returns service to normal. In the analogy: immense pain, loss of consciousness, loss of bodily control, temporary brain loss.
• Hot Standby or Active-Active (Fault Tolerant): backup activated on failure. All state is preserved. The system goes from fully protected, to unprotected, to restored to fully protected redundancy.

[Figure: timeline of both cases after a failure, on an axis of msecs, secs, mins, hours, days]


State protection: Guaranteeing Globally Consistent State

Different ways to describe StatePointing
• Active-Standby synchronous VM replication
• Also known as checkpointing with I/O barrier, I/O lock-stepping or buffering

What does it guarantee?
• Application transparency
• The I/O barrier prevents all external communications from the speculative execution prior to state replication
• Consistent VM memory replica between active and hot-standby, at the confirmed StatePoint

We call it StatePointing (VM replication)
Providing Service Continuity with fast Service Restoration

• VM instances paired between primary and secondary hosts in the cloud infrastructure
• State of primary (active) captured regularly and applied to secondary (Hot Standby)
• StatePoint™ = VM Checkpoint + I/O StateStepping; provides globally consistent state
• Fast service restoration from the most recent StatePoint upon primary failover to secondary
• Automatic redundancy restoration through third host instantiation
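The idea of pairing a checkpoint with an egress barrier can be illustrated with a toy model. This is not Stratus code and does not use any QEMU or KVM interface; the class and method names are hypothetical, and VM state is reduced to a single counter:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Standby:
    state: Optional[dict] = None  # last confirmed StatePoint, None before the first
    epoch: int = -1

    def apply_statepoint(self, epoch: int, state: dict) -> bool:
        """Store a copy of the active state and acknowledge it."""
        self.state = dict(state)
        self.epoch = epoch
        return True


@dataclass
class Active:
    standby: Standby
    state: dict = field(default_factory=lambda: {"counter": 0})
    epoch: int = 0
    egress_queue: List[str] = field(default_factory=list)  # held behind the barrier
    released: List[str] = field(default_factory=list)      # visible to the outside

    def run_epoch(self, requests: int) -> None:
        """Speculative execution: state changes, replies are only queued."""
        for _ in range(requests):
            self.state["counter"] += 1
            self.egress_queue.append(f"reply-{self.state['counter']}")

    def statepoint(self) -> None:
        """Pause, capture, resume; lift the barrier once the standby confirms."""
        snapshot = dict(self.state)
        if self.standby.apply_statepoint(self.epoch, snapshot):
            self.released.extend(self.egress_queue)
            self.egress_queue.clear()
        self.epoch += 1


standby = Standby()
active = Active(standby)
active.run_epoch(2)
active.statepoint()   # replies 1 and 2 released, standby holds counter = 2
active.run_epoch(3)   # primary "fails" here, before the next StatePoint
print(active.released, standby.state, active.egress_queue)
```

In this model every reply the outside world has seen is already reflected in the standby's state, so a failover from the last confirmed StatePoint is consistent. The three replies still held behind the barrier were never transmitted and can be discarded safely.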

[Figure: StatePoint timeline. The Active Host runs guest epochs N-1, N, N+1, N+2 and sends StatePoints (SP N-1, SP N+1, ...) to the Hot Standby Host. If the primary host fails, service automatically switches to the secondary host. A Third Host (created post primary failure) runs a guest from SP N+X.]

[Figure: Active-Standby and egress network traffic. Components: Guest VM (Active), QEMU (Active), QEMU Monitor, Network Egress Queue, QEMU (Standby), snapshots. Barriers n-1, n and n+1 are inserted in turn; each I/O barrier stays on until it is removed.]

• Egress Network Queue Barrier: prevents transmission of queued egress packet(s) until the barrier is removed
• PCR (Pause, Capture, Resume): phases of the StatePoint process when VM execution is suspended
• Note: For simplicity, n-2 interactions are not shown.

Multiple levels of resiliency
Ensures flexibility and resource optimization based on applications

• Deliver Availability as an infrastructure service to virtual and cloud ecosystems
• While every VNF needs Fault Management, not all need state protection

Modes of protection:
• Fault Tolerant (includes State protection)
• High Availability (no State protection)
• Unprotected

[Figure: example VNFs (Firewall, MME, IMS, Web Server), either monolithic or de-composed into separate control and forwarding elements: stateless fast-path VNF-C Forwarding Elements and a stateful VNF-C Control Element. Other labels: IMS…, Virtualized OSS/BSS, Virtualized Orchestration, Virtualization, Commodity High Volume Networking, Commodity Hyper Scale COTS Computing, Commodity High Volume Storage.]

Protection with Application transparency, no code changes
Resiliency Functionality in the NFVI nodes & managed in the MANO

• Stratus Node Resiliency Services (NRS)
• Stratus Resiliency Management Services (RMS)
• OpenStack environment

The Stratus approach has implemented enhancements in KVM and plug-ins in OpenStack to make it seamless for the VNFs

Benefits of Resiliency Management
that includes Fault Management, Availability Management and Configuration Management

SW Infrastructure Resiliency Management
• Fault protection for all applications, no required code changes for most apps
• State Protection, offering globally consistent state
• Multiple levels of Resiliency – Software Defined Availability (SDA): Control vs. Forwarding element, Stateful vs. Stateless, etc.

Benefits:
• Reduces Development & Verification time
• Lower Risks
• Faster time to market

The Stratus led PoC (ETSI PoC#35)
Availability Management with Stateful Fault Tolerance

Participants of PoC#35

• Demonstrated at:
   • NFV World Congress, May 6-8, San Jose, CA
   • OpenStack Summit, May 2015, Vancouver, Canada
   • SDN World Congress, Oct 2015, Dusseldorf, Germany
• Completed 7/31/2015, final report submitted

http://nfvwiki.etsi.org/index.php?title=Availability_Management_with_Stateful_Fault_Tolerance

What we proved with PoC#35

• OpenStack based VIM mechanisms alone are insufficient for supporting carrier grade resiliency; Stratus Cloud Technology addresses that and provided stateful failover, enabling service continuity with acceptable QoS
   • Service Restoration in milliseconds
   • Redundancy Restoration in seconds
• Any non-resilient VNF can be made instantaneously resilient with no code change (as long as it is OpenStack ready; there is no standard way to package VNFs)
• Multiple levels of Resiliency can be easily provided using Software Defined Resiliency in the Infrastructure, based on application requirements for state and service restoration speed

Thank You!
