Because the control platform simplifies the duties of both switches (which are controlled by the platform) and the control logic (which is implemented on top of the platform) while allowing great generality of function, the control platform is the crucial enabler of the SDN paradigm. The most important challenges in building a production-quality control platform are:

• Generality: The control platform's API must allow management applications to deliver a wide range of functionality in a variety of contexts.

• Scalability: Because networks (particularly in the datacenter) are growing rapidly, any scaling limitations should be due to the inherent problems of state management, not the implementation of the control platform.

• Reliability: The control platform must handle equipment (and other) failures gracefully.

• Simplicity: The control platform should simplify the task of building management applications.

• Control plane performance: The control platform should not introduce significant additional control plane latencies or otherwise impede management applications (note that forwarding path latencies are unaffected by SDN). However, the requirement here is for adequate control-plane performance, not optimal performance. When faced with a tradeoff between generality and control plane performance, we try to optimize the former while satisficing the latter.4

While a number of systems following the basic paradigm of SDN have been proposed, to date there has been little published work on how to build a network control platform satisfying all of these requirements. To fill this void, in this paper we describe the design and implementation of such a control platform, called Onix (Sections 2-5). While we do not yet have extensive deployment experience with Onix, we have implemented several management applications which are undergoing production beta trials for commercial deployment. We discuss these and other use cases in Section 6, and present some performance measures of the platform itself in Section 7.

Onix did not arise de novo, but instead derives from a long history of related work, most notably the line of research that started with the 4D project and continued with RCP, SANE, Ethane, and NOX (see [4,23] for other related work). While all of these were steps towards shielding protocol design from low-level details, only NOX could be considered a control platform offering a general-purpose API.5 However, NOX did not adequately address reliability, nor did it give the application designer enough flexibility to achieve scalability.

The primary contributions of Onix over existing work are thus twofold. First, Onix exposes a far more general API than previous systems. As we describe in Section 6, projects being built on Onix are targeting environments as diverse as the WAN, the public cloud, and the enterprise data center. Second, Onix provides flexible distribution primitives (such as DHT storage and group membership), allowing application designers to implement control applications without re-inventing distribution mechanisms, while retaining the flexibility to make performance/scalability trade-offs as dictated by the application requirements.

2 Design

Understanding how Onix realizes a production-quality control platform requires discussing two aspects of its design: the context in which it fits into the network, and the API it provides to application designers.

2.1 Components

There are four components in a network controlled by Onix, and they have very distinct roles (see Figure 1).

• Physical infrastructure: This includes network switches and routers, as well as any other network elements (such as load balancers) that support an interface allowing Onix to read and write the state controlling their behavior.

Figure 1: There are four components in an Onix-controlled network: managed physical infrastructure, connectivity infrastructure, Onix, and the control logic implemented by the management application. The figure depicts two Onix instances coordinating and sharing (via the dashed arrow) their views of the underlying network state, and offering the control logic a read/write interface to that state. Section 2.2 describes the NIB.

4 There might be settings where optimizing control plane performance is crucial. For example, if one cannot use backup paths for improved reliability, one can only rely on a fine-tuned routing protocol. In such settings one might not use a general-purpose control platform, but instead adopt a more specialized approach. We consider such settings increasingly uncommon.

5 Only a brief sketch of NOX has been published; in some ways, this paper can be considered the first in-depth discussion of a NOX-like design, albeit in a second-generation form.
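To make the physical-infrastructure component concrete, the read/write interface such an element must expose can be sketched as follows. This is a minimal illustrative sketch, not the actual Onix southbound protocol; the names ManagedElement, read_state, and write_state are hypothetical, and real elements would speak a wire protocol (e.g., an OpenFlow-like one) rather than Python method calls.

```python
from abc import ABC, abstractmethod

class ManagedElement(ABC):
    """Hypothetical interface a switch, router, or load balancer
    exposes so a control platform can read and write its state."""

    @abstractmethod
    def read_state(self, key: str):
        """Return the current value of one piece of element state,
        e.g. a forwarding-table entry."""

    @abstractmethod
    def write_state(self, key: str, value) -> None:
        """Request an update to element state; application of the
        change is not assumed to be immediate."""

class ToySwitch(ManagedElement):
    """In-memory stand-in for a real switch."""
    def __init__(self):
        self._state = {}  # e.g. {"fwd/00:11:22:33:44:55": "port-1"}

    def read_state(self, key):
        return self._state.get(key)

    def write_state(self, key, value):
        self._state[key] = value

# A controller could then install and read back forwarding state:
sw = ToySwitch()
sw.write_state("fwd/00:11:22:33:44:55", "port-1")
assert sw.read_state("fwd/00:11:22:33:44:55") == "port-1"
```

The point of the abstraction is that the control platform only needs a uniform read/write surface; the element's internal data structures stay hidden behind it.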
Figure 2: The default network entity classes provided by Onix's API. Solid lines represent inheritance, while dashed lines correspond to referential relations between entity instances. The numbers on the dashed lines show the quantitative mapping relationship (e.g., one Link maps to two Ports, and two Ports can map to the same Link). Nodes, ports, and links constitute the network topology. All entity classes inherit the same base class providing generic key-value pair access.

For example, there is a Port entity class that can belong to a list of ports in a Node entity. Figure 2 illustrates the default set of typed entities Onix provides; all typed entities have a common base class limited to generic key-value pair access, and the network state is thus represented as a graph of entity instances. The type-set within Onix is not fixed, and applications can subclass these basic classes to extend Onix's data model as needed.6

The NIB provides multiple methods for the control logic to gain access to network entities. It maintains an index of all of its entities based on the entity identifier, allowing for direct querying of a specific entity. It also supports registration for notifications on state changes or the addition/deletion of an entity. Applications can further extend the querying capabilities by listening for notifications of entity arrivals and maintaining their own indices.

The control logic for a typical application is therefore fairly straightforward: it registers to be notified of some state change (e.g., the addition of new switches and ports), and once the notification fires, it manipulates the network state by modifying the key-value pairs of the affected entities.

The NIB provides neither fine-grained nor distributed locking mechanisms, but rather a mechanism to request and release exclusive access to the NIB data structure of the local instance. While the application is given the guarantee that no other thread is updating the NIB within the same controller instance, it is not guaranteed that the state (or related state) remains untouched by other Onix instances or network elements. For such coordination, it must use mechanisms implemented externally to the NIB. We describe these in more detail in Section 4; for now, we assume this coordination is mostly static and requires control logic involvement during failure conditions.

All NIB operations are asynchronous, meaning that updating a network entity only guarantees that the update message will eventually be sent to the corresponding network element and/or other Onix instances; no ordering or latency guarantees are given. While this has the potential to simplify the control logic and make multiple modifications more efficient, it is often useful to know when an update has successfully completed. For instance, to minimize disruption to network traffic, the application may require the updating of forwarding state on multiple switches to happen in a particular order (to minimize, for example, packet drops). For this purpose, the API provides a synchronization primitive: if called for an entity, the control logic will receive a callback once the state has been pushed. After receiving the callback, the control logic may then inspect the contents of the NIB and verify that the state is as expected before proceeding. We note that if the control logic implements distributed coordination, race conditions in state updates will either not exist or will be transient in nature.

An application may also rely solely on NIB notifications to react to failed modifications, as it would to any other network state changes. Table 1 lists the available NIB-manipulation methods.

Category            Purpose
Query               Find entities.
Create, destroy     Create and remove entities.
Access attributes   Inspect and modify entities.
Notifications       Receive updates about changes.
Synchronize         Wait for updates being exported to network elements and controllers.
Configuration       Configure how state is imported to and exported from the NIB.
Pull                Ask for entities to be imported on-demand.

Table 1: Functions provided by the Onix NIB API.

3 Scaling and Reliability

To be a viable alternative to the traditional network architecture, Onix must meet the scalability and reliability requirements of today's (and tomorrow's) production networks. Because the NIB is the focal point for the system state and events, its use largely dictates the scalability and reliability properties of the system. For example, as the number of elements in the network increases, a NIB that is not distributed could exhaust system memory. Or, the number of network events (generated by the NIB) or the work required to manage them could grow to saturate the CPU of a single Onix instance.7

This and the following section describe the NIB distribution framework that enables Onix to scale to very large networks. Depending on application characteristics, NIB state can be kept in a transactional persistent database or in a one-hop, eventually consistent, memory-only DHT.

6 Subclassing also enables control over how the key-value pairs are stored within the entity. Control logics may prefer different trade-offs between memory and CPU usage.
7 In one of our upcoming deployments, if a single-instance application took one second to analyze the statistics of a single Port and compute a result (e.g., for billing purposes), that application would take two months to process all Ports in the NIB.
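The notify-then-modify pattern of typical control logic, together with the synchronization primitive, can be sketched in a few lines. This is an illustrative model only: the class and method names (Entity, NIB.register, NIB.synchronize) are assumptions, and real synchronization fires its callback asynchronously once state has actually been pushed, whereas the sketch invokes it immediately.

```python
from collections import defaultdict

class Entity:
    """Generic base class: every NIB entity is a bag of key-value pairs."""
    def __init__(self, entity_id):
        self.id = entity_id
        self._attrs = {}

    def get(self, key):
        return self._attrs.get(key)

    def set(self, key, value):
        self._attrs[key] = value

class Port(Entity):
    """Applications may subclass the basic classes to extend the data model."""
    pass

class NIB:
    """Identifier-based index of entities plus notification registration."""
    def __init__(self):
        self._entities = {}
        self._listeners = defaultdict(list)  # event name -> callbacks

    def register(self, event, callback):
        self._listeners[event].append(callback)

    def add(self, entity):
        self._entities[entity.id] = entity
        for cb in self._listeners["added"]:
            cb(entity)

    def query(self, entity_id):
        return self._entities.get(entity_id)

    def synchronize(self, entity, callback):
        # Real Onix pushes the update to switches and other instances
        # asynchronously and calls back once it completes; we call back
        # immediately for illustration.
        callback(entity)

# Typical control logic: react to a new Port, modify its key-value
# pairs, then wait for the update to be pushed before proceeding.
nib = NIB()
completed = []

def on_port_added(port):
    port.set("forwarding", "enabled")
    nib.synchronize(port, lambda p: completed.append((p.id, p.get("forwarding"))))

nib.register("added", on_port_added)
nib.add(Port("port-1"))
```

After the callback, the control logic would inspect the NIB and verify the state is as expected, exactly as the text describes.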
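Footnote 7's claim is easy to sanity-check with back-of-the-envelope arithmetic. The five-million-Port NIB size used below is an inference from "one second per Port" and "two months", not a number stated in the text.

```python
# Assumed NIB size implied by footnote 7 (inference, not a stated figure).
ports = 5_000_000
seconds = ports * 1            # one second of processing per Port
days = seconds / (24 * 3600)   # convert to days
assert 55 < days < 60          # roughly two months, matching the footnote
```

The same arithmetic explains why event handling must be partitioned across Onix instances rather than serialized through one.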