------------------------------------------------------------------------------
Copyright 1998 IEEE.  Published in the Proceedings of the ISW'98, 28-30
October 1998 in Orlando, Florida, USA. Personal use of this material is
permitted. However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective works
for resale or redistribution to servers or lists, or to reuse any
copyrighted component of this work in other works must be obtained from
the IEEE.  Contact: Manager, Copyrights and Permissions / IEEE Service Center /
                    445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ
		    08855-1331, USA.  Telephone: + Intl. 732-562-3966
------------------------------------------------------------------------------


     On Survivable Multi-Networks for Information Systems Survivability##

   A Position Paper for the 1998 Information Survivability Workshop (ISW'98)

		
      David Tipper*  Deepankar Medhi**  William Yurcik*++  Robert Cotter**	

		
	      *Department of Info. Science and Telecommunications
		        University of Pittsburgh
		          Pittsburgh, PA 15260

	   	    **Department of Computer Networking
		      University of Missouri-Kansas City
		          Kansas City, MO 64110

	
Abstract: A major attack can significantly reduce the capability to
          deliver services in large-scale information systems.
          We address the survivability of large-scale heterogeneous
          information systems that consist of various services provided
          over multiple interconnected networks with different
          topologies and multi-vendor equipment, using both wireline and
          wireless infrastructure.  The communications network part
          of such systems is referred to as multi-networks.  We are
          developing a comprehensive set of solutions for the network design
          and management aspects of providing adequate service continuity
          in the event of a major attack on multi-networks.  The end goal
          is to support critical services in the event of a major attack
          by making optimal use of network resources while minimizing
          network congestion.  We expect many of our results to extend
          from conventional attacks, which physically destroy links and
          nodes, to virtual attacks, which destroy or corrupt network
          control information and databases.


Introduction

Due to the rapidly growing demand for the transfer of information such as
voice, data, and video across communication networks, reliable communication
service has become increasingly important.  Several highly publicized network
failures have demonstrated the potentially drastic effects of communication
network outages, showing the need for survivable networks whose service is
robust to failures.

The final report of the President's Commission on Critical Infrastructure
Protection (PCCIP) alludes to serious vulnerabilities and threats in eight
critical national infrastructures.  Perhaps equally important, if not more so,
is that all of these critical infrastructures are closely interdependent;
a failure in one sector can easily affect other sectors.  Furthermore, all
of the national infrastructures depend on the underlying telecommunications
(computer-communication) infrastructure such as computing resources, databases,
private networks, and the Internet.{Peter Neumann}   The impact of previous
outages due to reliability failures raises the question of how fragile the
U.S. telecommunications infrastructure would be under an intelligent
malicious attack.
Speaking publicly about infrastructure threats for the first time, the
Director of the CIA, George Tenet, testified before Congress that several
foreign governments have "information warfare" programs targeting the
USA.{testimony before the Senate Committee on Governmental Affairs 6/24/98}


Scope of Research
 
For brevity, the communication network portion of large-scale heterogeneous
information systems is referred to here as multi-networks. We focus on the
development of multi-networks design models/algorithms that provide a
specified quality of service (QoS) under single and multiple attack/failure
conditions. We address the problem of intelligently
designing and evolving a network architecture to improve survivability, 
starting from existing architectures and legacy networks.

Given the multi-networks environment already deployed, we are developing
network management algorithms (e.g., provisioning of backup routes, virtual
circuit rerouting algorithms, etc.) that make optimal use of network resources
after an attack/failure, whether large or small, in support of critical
services. We concentrate on the design and analysis of multiple priority
traffic restoration techniques that provide service continuity while
minimizing network congestion.
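
As a minimal sketch of what multiple priority traffic restoration can mean,
the following restores demands onto limited spare capacity in priority order.
It is a simple greedy rule chosen for illustration, not one of the algorithms
under development.  The function name, the demand names, the priority values
(1 = most critical), and the bandwidth figures are all hypothetical.

```python
def restore_by_priority(demands, spare_capacity):
    """Greedily restore demands in priority order onto the spare capacity.

    demands: list of (name, priority, bandwidth); a lower priority number
    means a more critical service.  Returns (restored, blocked) name lists.
    """
    restored, blocked = [], []
    # sorted() is stable, so demands of equal priority keep their input order.
    for name, priority, bandwidth in sorted(demands, key=lambda d: d[1]):
        if bandwidth <= spare_capacity:
            spare_capacity -= bandwidth
            restored.append(name)
        else:
            blocked.append(name)
    return restored, blocked


# Hypothetical demands displaced by a failure (bandwidth in Mb/s).
demands = [
    ("video", 2, 40),
    ("command", 1, 30),
    ("bulk", 3, 50),
    ("telemetry", 2, 20),
]
restored, blocked = restore_by_priority(demands, spare_capacity=100)
print(restored, blocked)
```

With 100 Mb/s of spare capacity the three higher-priority demands fit and
the lowest-priority demand is blocked, which is the kind of trade-off a
priority restoration scheme has to make explicit.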

We are developing a multi-layer restoration approach involving a coordinated
strategy between different layers (transmission, traffic, application layers).
Since fault recovery is possible at various layers, one aspect of our work is
determining what combinations of traffic restoration should be used at each
layer and how this is related to the network topological design.  The
restoration algorithms will be suitable for automatic invocation by network
components, resulting in a self-configuring system that adapts to the changing
fault environment.

We specifically address the issue of survivable multicasting services since
these emerging services (audio/video  conferencing, sensor data distribution, 
etc.) will be critical under an attack.  By reducing redundant traffic flows,
multicasting has the potential to decrease traffic congestion by orders of
magnitude, but it also introduces vulnerabilities that are not present in
unicast transmissions, such as the involvement of potentially more links and
nodes, control complexity due to group dynamics, and all-or-nothing
restoration requirements.

An emphasis is given to studying the transient network congestion that
occurs after a failure and incorporating its effect into the design of the
network and the traffic restoration algorithms.  A major factor in network
performance after a failure is the transient congestion period that results
from restored circuits attempting to send out a backlog of traffic accumulated
for retransmission.  Thus not only must a critical network user be provided
service continuity, but service quality must also be maintained to the
highest degree possible.
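
To give a feel for the size of this transient, the sketch below uses a
deterministic fluid approximation: traffic arriving at a constant rate is
held during the outage, then drains over a restored path at the difference
between path capacity and arrival rate.  This is an illustrative
back-of-the-envelope model, not the analysis used in our work, and the
function name and numbers are hypothetical.

```python
def backlog_drain_time(arrival_rate, capacity, outage):
    """Seconds needed to clear the backlog built up during an outage.

    arrival_rate: offered traffic in Mb/s (assumed constant)
    capacity:     capacity of the restored path in Mb/s
    outage:       time in seconds during which traffic was held
    """
    if capacity <= arrival_rate:
        raise ValueError("backlog never drains unless capacity > arrival rate")
    backlog = arrival_rate * outage              # Mb waiting for retransmission
    return backlog / (capacity - arrival_rate)   # surplus capacity drains it


# Hypothetical case: 80 Mb/s offered, 100 Mb/s restored path, 2 s outage.
print(backlog_drain_time(80.0, 100.0, 2.0))
```

In this example a 2 second outage leaves 160 Mb queued, and with only
20 Mb/s of surplus capacity the congestion period lasts 8 seconds, four
times longer than the outage itself.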


Selected Research Results

We have made progress in several directions on understanding the network
dynamics that must be addressed after a major failure in a multi-networks
environment.  Selected research results are described below.

Our work on network design for survivable multi-networks has thus far
focused on the development of procedures for deploying survivable virtual
networks on top of existing physical infrastructure.  ATM and
circuit-switched network architectures currently allow the establishment
of virtual network overlays on a physical network; for example, the
provisioning of Virtual Paths (VPs) in an ATM network. One technique that
can be adopted to provide survivability in virtual networks is to provision
both a working path and a link/node disjoint backup path to which traffic can
be switched if the working path fails.  We have developed a generic integer
optimization model for the layout of ATM working and backup VPs that
minimizes bandwidth requirements.  This model allows for the incorporation of
priorities (i.e., whether or not a VP is provisioned with a backup) and
specification of how many links and nodes may be
shared between the working and backup paths if disjoint paths are not 
possible.  A second optimization model was developed for the network 
dimensioning problem where network reconfigurability was taken into 
consideration as well as the QoS requirement acceptable under a failure.
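
The working/backup idea can be sketched with a simple two-step heuristic:
find a fewest-hop working path, remove its links and intermediate nodes, and
search again for the backup.  This is not the integer optimization model
described above; it ignores bandwidth, and it can miss a disjoint pair that
a joint optimization would find.  The topology and all names are
hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Fewest-hop path from src to dst by breadth-first search, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v in parent or v in banned_nodes:
                continue
            if frozenset((u, v)) in banned_links:
                continue
            parent[v] = u
            queue.append(v)
    return None


def working_and_backup(adj, src, dst, node_disjoint=True):
    """Return (working, backup); backup is None if no disjoint path is found."""
    working = shortest_path(adj, src, dst)
    if working is None:
        return None, None
    used_links = frozenset(frozenset(hop) for hop in zip(working, working[1:]))
    used_nodes = frozenset(working[1:-1]) if node_disjoint else frozenset()
    backup = shortest_path(adj, src, dst, used_nodes, used_links)
    return working, backup


# Hypothetical six-node physical topology (undirected adjacency lists).
adj = {
    "A": ["B", "D"], "B": ["A", "C"], "C": ["B", "F"],
    "D": ["A", "E"], "E": ["D", "F"], "F": ["C", "E"],
}
working, backup = working_and_backup(adj, "A", "F")
print(working, backup)
```

Setting node_disjoint=False relaxes the requirement to link disjointness
only, which loosely mirrors the option of letting working and backup paths
share some elements when fully disjoint paths do not exist.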

In a parallel effort, we have developed a new technique for the layout of
survivable multipoint (aka multicast) group communications in connection-
oriented (i.e., ATM) networks.  The technique is termed the Self-Healing
Virtual Ring (SHVR)  multicast and consists of two counter-rotating rings
made up of Virtual Circuits (VCs).  One ring is normally used for communication
with the second ring serving as a hot-standby to which traffic can be rerouted 
in the event of an attack.  A performance analysis comparing the SHVR approach 
with a similar hot-standby approach using shared multicast trees or VC Mesh 
groups shows that the SHVR approach requires less bandwidth and simpler 
signaling to provide survivability.  We have also developed a network
dimensioning model for providing multicasting services, based on a
k-shortest tree concept.  We have implemented this model to produce
survivable network designs for networks of up to 100 nodes.
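
The signaling and switchover procedure of the SHVR is not described here, so
the following is only a toy model of the ring property it relies on: with two
rings carrying traffic in opposite directions, every group member remains
reachable from the source after any single link failure.  The member names
and the function are hypothetical.

```python
def reach(ring, source, failed_link=None, clockwise=True):
    """Members reached from source along one unidirectional ring.

    ring: list of members in clockwise order.  Traversal stops at the
    failed link, given as a pair of adjacent members, if there is one.
    """
    n = len(ring)
    step = 1 if clockwise else -1
    i = ring.index(source)
    reached = []
    for _ in range(n - 1):
        j = (i + step) % n
        if failed_link is not None and {ring[i], ring[j]} == set(failed_link):
            break
        reached.append(ring[j])
        i = j
    return reached


ring = ["A", "B", "C", "D", "E"]   # hypothetical multicast group
failed = ("C", "D")

on_working = reach(ring, "A", failed, clockwise=True)    # stops at the break
on_standby = reach(ring, "A", failed, clockwise=False)   # covers the far side
print(on_working, on_standby)
```

Members cut off on the working ring are exactly those reached first on the
counter-rotating ring, so the two directions together cover the whole group.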

The restoration time is critical in determining whether a user is provided
service continuity.  A major part of restoration time is the detection and
notification time.  We have initiated a measurement-based benchmarking study
of alarm detection and notification times in an ATM testbed laboratory.
This effort focuses on quantifying the delay with which lower layers
of the protocol stack notify higher layers of a failure.


Selected Publications:

K. Balakrishnan, D. Tipper, and D. Medhi, "Routing Strategies for Fault Recovery
in Wide Area Networks," Proceedings of IEEE Military Communications
Conference (Milcom '95), San Diego, CA, November 1995.

K. Balakrishnan, D. Tipper, and J. Hammond, "An Analysis of the Timing of
Traffic Restoration in Wide Area Communication Networks," Proceedings of
14th International Teletraffic Congress, Antibes, France, June 1994.

K. Balakrishnan, S. Menon, and D. Tipper, "A Study of Issues Relating to
Traffic Restoration in Wide Area Communication Networks," Proceedings of IEEE
Southeastcon '94, Miami, FL, April 1994.

R. Cotter, D. Medhi, and D. Tipper, "Traffic Backlog and Impact on Network
Dimensioning for Survivability for Wide-Area VP-based ATM Networks," Proceedings
of 15th International Teletraffic Congress, Washington, DC, June 1997.

T. Dahlberg, S. Ramaswamy, and D. Tipper, "Survivability Issues in Wireless
Mobile Networks," Proceedings of First International Workshop on Mobile
and Wireless Communication Networks, Paris, France, May 1997.

B. Jaeger and D. Tipper, "On Fault Recovery Priority in ATM Networks,"
Proceedings of IEEE ICC '98, Atlanta, GA, June 1998.

D. Medhi, "A Unified Approach to Network Survivability for Teletraffic Networks:
Models, Algorithms and Analysis," IEEE Trans. on Communications, Vol. 42, pp.
534-548, 1994.

D. Medhi and R. Khurana, "Optimization and Performance of Network
Restoration Schemes for Wide-Area Teletraffic Networks," Journal of Network
and Systems Management, Vol. 3, No. 3, pp. 265-294, September 1995.

D. Medhi and D. Tipper, "Towards Fault Recovery and Management in Communication
Networks," Guest Editorial, Journal of Network and Systems Management,
Vol. 5, No. 2, June 1997.

A. Pitsillides, S. Nikolopoulos, and D. Tipper, "Addressing Network Survivability
Issues by Finding the K-best Paths Through a Trellis Graph," Proceedings of
IEEE INFOCOM '97, Kobe, Japan, April 1997.

S. A. Shah and D. Medhi, "Performance under a Failure of Wide-Area Datagram
Networks with Unicast and Multicast Traffic Routing," Proc. of IEEE Military
Communications Conference (MILCOM'98), Bradford, Mass, October 1998.

D. Tipper, J. Hammond, S. Sharma, A. Khetan, K. Balakrishnan, and S. Menon,
"An Analysis of the Congestion Effects of Link Failures in Wide Area Networks,"
IEEE Journal on Selected Areas in Communications, Vol. 12, pp. 179-192, Jan 1994.

W.-P. Wang, D. Tipper, B. Jaeger, and D. Medhi, "Fault Recovery Routing in Wide
Area Packet Networks," Proceedings of 15th International Teletraffic Congress,
Washington, DC, June 1997.


-------

##  Work supported in part by Defense Advanced Research Projects Agency
    grant F30602-97-1-0257 and National Science Foundation
    grant NCR-95-06652.

++  Author for correspondence.  Additional contact information:
    Email yurcik@tele.pitt.edu, telephone (412) 624-9411, FAX (412) 624-2788.
    Supported in part by NASA Earth Systems Science grant #NGT-30019,
    Defense Advanced Research Projects Agency grant #F30602-97-1-0257, and
    SAE International - The Engineering Society For Advancing Mobility Land
    Sea Air and Space.