Unique Architecture for 8-Way to 64-Way SMP Pentium III Xeon Servers Provides Breakthrough Scalability and Performance
 
Abstract

To scale a system beyond four Pentium III Xeon processors, the limitations of the P6x system bus must be overcome. ACCENT achieves this with its Adaptive Memory Multidimensional Crossbar architecture, which provides sufficient memory and I/O subsystem bandwidth for the system to scale well up to sixty-four processors. The architecture supports 8 dual-processor P6x system buses operating in parallel, connected with each other by a crossbar switching array, with each such complex interconnected with three other complexes through a second-dimension crossbar. This scalability provides true enterprise-class performance for applications such as web servers, electronic commerce, collaborative computing, database, and data warehousing. As applications require more performance, the architecture provides a smooth growth path.

TABLE OF CONTENTS

Symmetric Multiprocessing with the Pentium Xeon and Windows NT/2000
Architecture Overview
High-performance Memory Subsystem
Address Reorder Buffer
Balanced P6 Bus Bridge I/O Architecture
Full Standards-Compliance
Major Open Standards

Symmetric Multiprocessing with the Pentium Xeon and Windows NT/2000

With their high performance and low cost, Intel’s Pentium Xeon and IA-64 families set a milestone in the evolution of computing, allowing companies to apply PC economics to the enterprise, further deploy information technology, and become more competitive.

Intel designed the Pentium Xeon processor, with its integrated on-board cache, for symmetric multiprocessing (SMP). While SMP has long existed in mainframes, minicomputers, and workstations, adding processors to smaller systems provides large-system performance and cuts the cost of enterprise computing. Further, the multiprocessing capabilities of Windows NT/2000 give SMP systems real-world gains for standard server applications. Low cost, standard interfaces, integration with desktop PCs, and acceptance are making Windows NT/2000 SMP servers the preferred enterprise platform.

To date, standard motherboards have been available with up to four Pentium Xeon processors in an SMP configuration. While these systems provide substantially more performance than uniprocessor systems, more and more applications demand performance beyond the capability of these four-processor systems. Since the P6 bus can normally support only four processors, scaling beyond four has been a challenge for the industry. ACCENT’s eight-way architecture was created to address these limits, providing the performance demanded by modern database and Internet servers, electronic commerce, collaborative computing, and data warehousing.

Architecture Overview

The ACCENT design team optimized the Adaptive Memory Multidimensional Crossbar architecture to scale up to sixty-four processors while maintaining standards-compatibility for the resulting system.

At high system utilization, the available bandwidth on a single P6x bus limits system throughput. Many data-intensive commercial applications, such as transaction processing and electronic commerce, can easily saturate a single P6 bus. By employing two independent P6x buses per 8 CPUs, with a second pair of buses and a bridge between them, together with high-performance memory and I/O systems, the Adaptive Memory Multidimensional Crossbar architecture allows much higher total system throughput.
(See Figure 1.)

How can the Adaptive Memory Crossbar out-perform other multiprocessing schemes? The answer lies in these design innovations:

Figure 1. Introduction to the Architecture

High-performance Memory Subsystem

The memory subsystem can sustain 1.066 GB per second of memory bandwidth per line (8 lines per subsystem). It can support each pair of independent P6x buses at their full bandwidth of 533 MB/s. The memory system provides sixteen interleaved banks of memory, allowing for higher actual throughput under tough workloads. Typical Pentium Xeon memory systems provide only two banks of memory.
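To illustrate why sixteen interleaved banks help compared with two, here is a minimal sketch of low-order bank interleaving. It is not the ACCENT address mapping, which this paper does not describe: the 32-byte interleave granularity and the names bank_of and banks_touched are assumptions made for illustration only.

```python
NUM_BANKS = 16   # from the text: sixteen interleaved banks
LINE_BYTES = 32  # assumption: banks are interleaved at 32-byte granularity


def bank_of(address: int, num_banks: int = NUM_BANKS) -> int:
    """Map a physical address to a bank using low-order interleaving."""
    return (address // LINE_BYTES) % num_banks


def banks_touched(start: int, num_lines: int, num_banks: int = NUM_BANKS) -> list[int]:
    """List the bank hit by each of num_lines consecutive 32-byte lines."""
    return [bank_of(start + i * LINE_BYTES, num_banks) for i in range(num_lines)]
```

Under these assumptions, sixteen consecutive lines land on sixteen different banks, so each bank has time to recover before it is addressed again. With only two banks, the same stream returns to a given bank on every second access.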


Four application-specific integrated circuits (ASICs) implement the Adaptive Memory Crossbar layer for the Multidimensional Axis (Z-Memory). (See Figure 2.) The data ASIC switches between the 8 key data buses and the third bus connected to memory (Rambus). The address ASIC controls data switching, checks coherency, and routes transactions between the buses. High-performance synchronous DRAMs are configurable up to 64 GB using 512 MB DIMMs. All P6 buses access the same global memory system.

Figure 2. Memory Subsystem

Unlike typical memory controllers, which process reads and writes in the order received, this design speeds up the system by favoring read operations and deferring write operations. Since read requests often force the processor to "stall" while waiting for data, memory transactions are reordered to allow reads to complete before pending write transactions. When there are no remaining read transactions, the subsystem processes up to four buffered writes while blocking reads. Writes continue until a read transaction arrives.

The memory system is highly pipelined, providing maximum bandwidth. Whenever possible, the pipeline is collapsed to raise performance by reducing latency. This dynamically collapsible pipeline scheme keeps the controller aware of the load on the memory and of when to reduce the pipeline depth. Advanced ASIC semiconductor and packaging technology is employed to provide maximum performance and integration at reasonable cost. The address ASICs reside in eight 388-pin enhanced plastic ball grid array packages, each with 65,000 gates plus 7.5 KB of RAM.

Address Reorder Buffer

The Address Reorder Buffer lowers overhead on the memory system and P6 buses. Memory requests enter the memory system through this buffer, and the memory controller first services requests to banks that are not busy. When a bank is busy, the Address Reorder Buffer enables the controller to prioritize the requests and reorder them to optimize for bandwidth. This process fills the data buffer as fast as possible. (See Figure 3.)

Transactions may be reordered according to the following rules:

1. Oldest requests go first, subject to the requested memory bank being free.
2. Reads have priority. If there are no read requests, then a waiting write request is accepted.
3. Once writing starts, writes have priority for up to four writes, to minimize bus turn-arounds.
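The three rules above can be sketched as a small scheduler. This is an illustrative model under stated assumptions, not the logic of the ACCENT ASICs: the Request and ReorderBuffer names, the representation of busy banks as a set, and the choice to start a new write burst when a write is accepted after a completed burst are all hypothetical. Address hazards, such as a read that follows a write to the same location, are not modeled.

```python
from dataclasses import dataclass

MAX_WRITE_BURST = 4  # from rule 3: up to four writes per burst


@dataclass
class Request:
    seq: int   # issue order; lower means older
    kind: str  # "R" for read, "W" for write
    bank: int  # target memory bank


class ReorderBuffer:
    def __init__(self):
        self.pending = []     # kept in arrival order, oldest first
        self.write_burst = 0  # writes issued in the current burst

    def add(self, req: Request) -> None:
        self.pending.append(req)

    def _oldest(self, kind, busy_banks):
        # Rule 1: oldest first, but only if the target bank is free.
        for req in self.pending:
            if req.kind == kind and req.bank not in busy_banks:
                return req
        return None

    def pick(self, busy_banks=frozenset()):
        """Return the next request to issue, or None if nothing can go."""
        in_burst = 0 < self.write_burst < MAX_WRITE_BURST
        # Rule 2: reads first. Rule 3: writes first while a burst is open.
        order = ("W", "R") if in_burst else ("R", "W")
        for kind in order:
            req = self._oldest(kind, busy_banks)
            if req is not None:
                self.pending.remove(req)
                if kind == "W":
                    self.write_burst = self.write_burst + 1 if in_burst else 1
                else:
                    self.write_burst = 0
                return req
        return None
```

In this model a read that arrives after a write burst has started waits until the burst reaches four writes or the buffer runs out of eligible writes, which reflects the stated goal of minimizing bus turn-arounds.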
 

Figure 3. Address Reorder Buffer Reading in each of 64 layer processing subsystems

Transactions requiring the same bank must wait in line, but those needing an idle bank jump ahead. The system must guarantee that data is returned to each processor in the order in which the requests were issued. The controller moves data as quickly as possible from memory to the buffers inside the ASICs, then returns the data to the processors in the necessary order. This capability enables the memory subsystem to approach its theoretical bandwidth of 1.066 GB per second on a sustained basis. Where memory bandwidth is at a premium, reordering within the memory system increases overall application performance by 30 to 40 percent.
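The in-order return requirement can be pictured with the following sketch. It assumes a single requester whose requests carry consecutive sequence numbers starting at zero; the InOrderReturn class and its complete method are hypothetical names, and the real buffers inside the ASICs are not described in enough detail here to model them exactly.

```python
class InOrderReturn:
    """Hold data that arrives out of order and release it in issue order."""

    def __init__(self):
        self.next_seq = 0  # sequence number the requester expects next
        self.ready = {}    # seq -> data fetched from memory but not yet released

    def complete(self, seq, data):
        """Record data for request seq and return what can now be released."""
        self.ready[seq] = data
        released = []
        # Drain every buffered entry that is next in issue order.
        while self.next_seq in self.ready:
            released.append((self.next_seq, self.ready.pop(self.next_seq)))
            self.next_seq += 1
        return released
```

If request 1 hits an idle bank and finishes before request 0, its data stays buffered until request 0 completes, at which point both are released together in issue order.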

Balanced P6 Bus Bridge I/O Architecture

The Adaptive Memory Crossbar uses standard Intel 850 PCI bridges between the four PCI buses and the 16 P6 buses. In the ACCENT PRO2--1 system implementation, two PCI buses serve add-in cards, offering eight PCI card slots (four per bus), and two serve the built-in UltraSCSI channels. The resulting balanced approach meets the I/O, memory, and processor needs of the system.

This design provides an aggregate speed of 266 MB per second from disk to memory, enough bandwidth to supply enterprise applications. This PCI bridge-to-bus design operates more efficiently than schemes that put I/O on a third P6 bus, which carry more overhead, latency, complexity, and cost.

The other two PCI buses are connected to seven UltraSCSI buses, six of which support 24 internal disks. The remaining UltraSCSI bus is for backup and storage devices. Additional I/O comes from standard devices, including the BIOS, NVRAM, timers, real-time clock, floppy disk, and serial and parallel ports. The power supply, fans, and temperature are monitored over an I2C bus.

Full Standards-Compliance

The Adaptive Memory Crossbar complies with Intel’s MPS 1.5 symmetric multiprocessing specification, P6x system bus specifications, and Windows NT/2000 server software interfaces.

Major Open Standards

The ACCENT crossbar adheres to:

The I/O system relies on the PCI standard, ensuring compatibility with all standard PCI peripheral devices. By providing UltraSCSI directly on board, high performance and compatibility are achieved with a broad array of storage devices.

For more information, contact:

Computer Sales & Service
1-888-ASK-CSS1
Web site: www.CSSusa.NET
 

Legal Notice

Computer Sales & Service believes information in this publication is accurate as of its publication date; such information is subject to change without notice. Products and technologies described in this publication may change due to design enhancements and advances in technology. compamerica.com is not responsible for any inadvertent errors. ACCENT is a trade name and usemark of compamerica.com.

All trade names are trademarks of their respective companies.

Copyright 1997-2001 compamerica.com
All Rights Reserved. Printed in USA
