
Run traditional enterprise workloads plus newer demanding AI workloads both on‑premises and in the cloud with an end‑to‑end solution optimized by Intel and VMware






Executive Summary

This reference architecture describes a hybrid cloud solution that not only handles existing enterprise workloads (such as SQL and NoSQL databases for transactional processing) but also extends capabilities to include compute- and memory-hungry artificial intelligence (AI) jobs. From data warehousing to machine learning and deep learning, the Hybrid Cloud Data Analytics Solution is just what enterprises need as data volumes continue to swell and data centers are increasingly pressured to provide real-time insights to launch their businesses into the future.

This solution is a unique combination of the latest hardware innovations from Intel, VMware’s broad portfolio of virtualization software products, container orchestration, and AI tools optimized to run on 2nd Generation Intel® Xeon Scalable processors. These processors feature built-in inferencing acceleration through Intel Deep Learning Boost (Intel DL Boost) with Vector Neural Network Instructions (VNNI).

With this end-to-end solution, enterprises can quickly operationalize database processing and AI to discover insights hidden in their data—and scale to meet future needs.

Figure 1. Building blocks of the reference architecture for the Hybrid Cloud Data Analytics Solution with vSAN ReadyNodes. The figure shows three layers: analytics/AI building blocks optimized for Intel architecture, cloud infrastructure based on VMware Cloud Foundation, and Intel Data Center Blocks for Cloud – VMware (vSAN ReadyNodes) built on a consistent, high-performance architecture based on Intel technology.


Easily Consumable Hybrid Cloud Data Analytics Solution

Intel Builders

Enterprise Data Center

Authors (Intel Data Center Group)
Patryk Wolsza, Cloud Solutions Architect
Karol Brejna, Senior Architect
Marcin Hoffmann, Cloud Solutions Engineer

Contributors (Intel Data Center Group)
Lokendra Uppuluri, Software Architect
Ewelina Kamyszek, Undergraduate Technical Intern
Marek Małczak, Cloud Solutions Engineer
Lukasz Sitkiewicz, Software Engineer

VMware Contributor
Enrique Corro Fuentes, Data Science Staff Engineer, Office of the CTO

2nd Generation Intel Xeon Scalable Processors

The Hybrid Cloud Data Analytics Solution features 2nd Generation Intel Xeon Gold and Platinum processors.


Why Hybrid Cloud for Machine Learning?

While many enterprises choose to run certain workloads in the public cloud, other workloads are better suited to staying on-premises. Increasingly, enterprises want a hybrid cloud option for flexibility and business agility, especially as artificial intelligence (AI) and machine-learning workloads become increasingly prevalent. Forbes says that "Artificial intelligence and machine learning will be the leading catalyst driving greater cloud computing adoption by 2020."1 VMware and Intel are firm proponents of a hybrid cloud strategy as the most effective way to tackle the requirements of machine-learning development and execution. For example, training machine-learning models using regulated datasets containing sensitive data can be done on-premises; once trained, the models can be affordably deployed in the public cloud to take advantage of special capabilities such as extended geographical coverage, redundancy across availability zones, and on-demand increased capacity for seasonal demand. What's more, a cloud environment such as VMware Cloud Foundation on Amazon Web Services (AWS) can provide access to the latest innovative Intel architecture-based high-performance infrastructure that can handle ever-larger data warehouses and computationally intense workloads that benefit from hardware acceleration.

The hybrid cloud environment can strike the right balance between cost and the feature set of an IT infrastructure that can support successful adoption of AI.

A hybrid cloud strategy for AI relies on the assumption that machine-learning workloads are portable; that is, they will run properly at any point in the hybrid cloud deployment without the need for laborious modifications.

The combination of VMware Cloud Foundation (deployed on-premises) and VMware Cloud on AWS solves machine-learning workload mobility challenges by delivering a hybrid cloud service that integrates VMware's Software-Defined Data Center (SDDC) compute, storage, and network virtualization technologies. This integration enables companies to use the same tools and skills to operate VMware SDDCs deployed both on-premises and in the public cloud (see Figure 2).

Table of Contents

Executive Summary . . . .1

Why Hybrid Cloud for Machine Learning? . . . .2

Solution Overview . . . .3

Software Overview . . . 3

Hardware Overview . . . 3

Solution Architecture Details . . . .3

Intel Data Center Blocks for Cloud . . . 3

Cloud Infrastructure . . . 4

Analytics and AI Building Blocks . . . 5

Platform-Verified Workloads . . . .6

Deep Learning on VMware Cloud Foundation . . . 6

Data Warehousing on VMware Cloud Foundation . . . 7

Bill of Materials . . . .8

Deployment Considerations . . . .9

VMware Cloud Foundation . . . 9

VMware vSAN . . . 9

VMware NSX . . . 9

VMware Enterprise PKS (Kubernetes) . . . 11

Integration of VMware Enterprise PKS, VMware Cloud Foundation, NSX-T, and vSAN . . . 13

Environment Provisioning . . . 13

Hardware and Software Requirements . . . 13

VMware Cloud Foundation Deployment . . . 13

NSX-T Installation . . . 14

NSX-T Workload Domain Creation. . . 14

Adding NSX-T Edge Host to the Newly Created Workload Domain . . . 15

Configure the Remaining NSX-T Resources for VMware Enterprise PKS . . . 16

VMware Enterprise PKS Solution Deployment on an NSX-T Workload Domain . . . 16

Summary . . . 17

Appendices . . . 18

Appendix A – Solution Features Validation and Benchmarking . . . 18

Appendix B – Deep-Learning Experiment Setup Instructions . . . 20

Appendix C – Deploying a Kubernetes Cluster Using VMware Enterprise PKS . . . 26

Appendix D – Data Warehouse Benchmarking . . . 29

Figure 2. VMware Cloud Foundation is a cloud solution for both private and public clouds, managed through VMware SDDC Manager and built on VMware vSphere, VMware vSAN, and VMware NSX.


Using a Hybrid Cloud Data Analytics Solution based on VMware Cloud on AWS and VMware Cloud Foundation, both running on the latest Intel technologies, enterprises can expect the following benefits:

• Efficiency. Dramatically reduce the effort required to achieve seamless machine-learning workload mobility.

• Decreased risk. Run your applications on familiar and proven VMware environments and Intel hardware, combined with the global AWS footprint, reach, and scale.

• Simplicity and consistency. No staff retraining or revamping of operational processes is required.

• Cost-effectiveness. Applications do not need to be refactored.

• Flexibility. You can choose to use resources in the cloud or on-premises, according to business rules, potentially reducing cost and improving efficiency.

Solution Overview

This reference architecture presents configuration details for building a hybrid cloud solution that can handle existing enterprise workloads—such as SQL and NoSQL databases for transaction processing—but can also extend capabilities to include compute- and memory-intensive AI jobs.

The reference architecture consists of three layers (Figure 1):

• Intel Data Center Blocks for Cloud – VMware (vSAN ReadyNodes), which contain a hardware foundation from Intel

• Cloud/virtualization technology from VMware

• Application building blocks optimized for Intel architecture

The remainder of this document provides detailed configuration information for building a unified cloud solution through which an organization can run applications hosted on both VMs and containers located in an on-premises data center as well as in the cloud. The hybrid cloud nature of the solution allows enterprises to extend available resources and easily migrate workloads from on-premises to the cloud.

Software Overview

The solution consists of:

• VMware Cloud Foundation, which provides a management domain and the ability to create and manage workload domains.

• VMware NSX-T Data Center networking virtualization solution, which enables software-defined networking in virtualized environments.

• VMware Enterprise PKS, which delivers a native Kubernetes solution to the stack.

VMware Cloud Foundation deployed with VMware Enterprise PKS offers a simple solution for securing and supporting containers within existing environments that already support VMs based on VMware ESXi, without requiring any retooling or rearchitecting of the network.

With this easy-to-deploy and comprehensive solution, enterprises can quickly operationalize database processing and AI to unlock the insights hidden in their data—and scale the solution as future needs dictate. The hybrid cloud capability provides flexibility in workload placement as well as business agility.

Hardware Overview

For fast data analytics and inferencing, the hardware for the solution can scale from a single rack of just a few servers to about 1,000 servers with the following components:

• Workload-optimized 2nd Generation Intel Xeon Scalable processors with support for Intel Deep Learning Boost (Intel DL Boost) with Vector Neural Network Instructions (VNNI)

• High-capacity Intel DC 3D NAND SATA SSDs for the vSAN capacity tier, which can scale to PBs if necessary

• Low-latency Intel Optane™ DC SSDs for the vSAN caching tier

• Reliable, fast Intel Ethernet networking components

Solution Architecture Details

This section describes the building blocks in each of the reference architecture’s layers: vSAN ReadyNodes with Intel hardware, cloud infrastructure, and analytics and AI building blocks. For the complete bill of materials for the Base and Plus configurations, refer to Table 1 on page 8.

Intel Data Center Blocks for Cloud – VMware (vSAN ReadyNodes) with Intel Hardware

Intel Data Center Blocks make it easier to adopt and qualify the latest Intel technology, helping you address the demands of today’s data centers. You can choose a workload-optimized, pre-configured system or customize a server for your unique needs. Intel Data Center Blocks for Cloud are pre-certified, fully validated purpose-built systems designed to simplify and accelerate cloud deployments. In particular, Intel Data Center Blocks for Cloud – VMware are certified vSAN ReadyNodes.

This reference architecture uses two Intel Data Center Blocks for Cloud – VMware: Intel Server System VRN2208WFAF82R and Intel Server System VRN2208WFAF83R.

2nd Generation Intel Xeon Scalable Processors

2nd Generation Intel Xeon Scalable processors are designed for the most demanding data-centric and in-memory database workloads. These processors incorporate a performance-optimized multi-chip package that delivers up to 48 cores per CPU, 12 DDR4 memory channels per socket, and support for Intel Optane DC persistent memory DIMMs, which provide large-capacity memory to the system.


For the “Base” configuration, the Intel Xeon Gold 6248 processor provides an optimized balance of price and performance in a mainstream configuration. The Intel Xeon Platinum 8260 processor powers the “Plus” configuration, which is designed for high-density deployments or more demanding, latency-sensitive environments. Even higher-performance processors can also be used in either configuration.

Intel SSD Data Center Family

The Intel Optane SSD DC P4800X series is the first product to combine the attributes of memory and storage. With an industry-leading combination of high throughput, low latency, high quality of service (QoS), and high endurance, this innovative solution is optimized to break through data access bottlenecks. The Intel Optane SSD DC P4800X and P4801X accelerate applications with fast caching and fast storage to increase scale per server and reduce transaction costs for latency-sensitive workloads. In addition, the Intel Optane SSD DC P4800X helps enable data centers to deploy bigger and more affordable datasets to gain new insights from large memory pools.

VMware vSAN performs best when the cache tier is using fast SSDs with low latency and high endurance. Workloads that require high performance can benefit from empowering the cache tier with the highest-performing SSDs rather than mainstream Serial ATA (SATA) SSDs. Therefore, in this reference architecture, Intel Optane DC SSDs with Non-Volatile Memory Express (NVMe) are used to power the cache tier.

Intel Optane DC SSDs offer high IOPS per dollar with low latency, coupled with 30 drive-writes-per-day (DWPD) endurance, so they are ideal for write-heavy cache functions.2

The vSAN capacity tier is served by Intel DC 3D NAND SSDs with NVMe, delivering optimized read performance with a combination of data integrity, performance consistency, and drive reliability.

Intel Ethernet Connections and Intel Ethernet Adapters

The Intel Ethernet 700 series accelerates the performance of VMware vSAN platforms powered by Intel Xeon Scalable processors, delivering validated performance ready to meet high-quality thresholds for data resiliency, service reliability, and ease of provisioning.3, 4, 5, 6

Intel Optane DC Persistent Memory

Intel Optane DC persistent memory represents a new class of memory and storage technology that allows organizations to maintain larger amounts of data closer to the processor with consistently low latency and near-DRAM performance.

Organizations can use Intel Optane DC persistent memory with VMware vSAN deployments to cost-effectively expand the capacity of memory available to support more or larger VMs in virtual desktop infrastructure (VDI) deployments, or higher quantities of “hot” data available for processing with in-memory databases, analytics, and other demanding workloads.

Physical Networking Layer

This reference architecture uses two models of network switches:

• Data Plane: 2x Arista DCS-7060CX2-32S-R

• Management Plane: 1x Arista DCS-7010T-48-R

An enterprise-grade router solution is also necessary to ensure adequate routing capabilities for the multiple virtual local area networks (VLANs) that are required in the solution.

Cloud Infrastructure

VMware Enterprise PKS

VMware Enterprise PKS is a container services solution that enables Kubernetes to operate in multi-cloud environments.

VMware Enterprise PKS simplifies the deployment and management of Kubernetes clusters with Day 1 and Day 2 operations support. VMware Enterprise PKS manages container deployment from the application layer all the way to the infrastructure layer, according to the requirements for production-grade software, using BOSH and Pivotal Ops Manager. VMware Enterprise PKS supports high availability, autoscaling, health checks, self-repair of underlying VMs, and rolling upgrades for the Kubernetes clusters.

VMware Cloud Foundation and VMware Cloud on AWS

VMware Cloud Foundation is a unified SDDC platform for both private and public clouds. It brings together a hypervisor platform; software-defined services for compute, storage, network, and security; and network virtualization into an integrated stack whose resources are managed through a single administrative tool. VMware Cloud Foundation provides an easy path to hybrid cloud through a simple, security-enabled, and agile cloud infrastructure on-premises and in as-a-service public cloud environments.

VMware Cloud on AWS is a highly scalable, secure, hybrid cloud service that enables users to run VMware SDDC on AWS with enterprise tools like vSphere, vSAN, and NSX as a managed service. It allows organizations to seamlessly migrate and extend their on-premises VMware-based environments to the AWS cloud.

VMware SDDC Manager

VMware SDDC Manager manages the start-up of the Cloud Foundation system, creates and manages workload domains, and performs lifecycle management to ensure the software components remain up to date. SDDC Manager also monitors the logical and physical resources of VMware Cloud Foundation.

VMware vSphere

VMware vSphere extends virtualization to storage and network services and adds automated, policy-based provisioning and management. As the foundation for VMware’s complete SDDC platform, vSphere is the starting point for building your SDDC.

(5)

VMware HCX

VMware HCX extends continuous, hybrid cloud capabilities to VMs. It enables customers to migrate workloads between public clouds and data centers without any modification to application or VM configuration (see Figure 3). It provides full compatibility with the VMware software stack and helps make the migration simple, secure, and scalable.

The HCX Multi-Site Service mesh provides a security-enabled pipeline for migration, extension, and VM protection between two connected VMware HCX sites. It can be used to extend VLANs and retain IP and MAC addresses, as well as existing network policies, during migration between two sites. It also enables flexibility when planning complex, growing workloads across physical sites.

Analytics and AI Building Blocks

Enterprises need high-performance data analytics and AI to remain competitive. They require flexible solutions that can run traditional data analytics and AI applications. The Hybrid Cloud Data Analytics Solution includes performance optimizations that take advantage of Intel hardware in a VMware infrastructure. These building blocks enable enterprises to quickly operationalize analytics, AI, and machine-learning workloads. Because the building blocks are already optimized for Intel architecture, you don't have to spend days or weeks fine-tuning parameters. And using any of the building blocks typically requires either no or minimal changes to your applications (such as adding a single line of code).

Deep Learning Reference Stack

The Deep Learning Reference Stack (see Figure 4) is an integrated, highly-performant open source and containerized stack optimized for Intel Xeon Scalable processor-based platforms. This open source community release is part of an effort to ensure AI developers have easy access to all features and functionality of Intel platforms. Highly tuned and built for cloud-native environments, the Deep Learning Reference Stack enables developers to quickly prototype by reducing complexity associated with integrating multiple software components, while still giving them the flexibility to customize their solutions.

The Deep Learning Reference Stack includes highly tuned software components across the OS (Clear Linux), deep-learning frameworks (TensorFlow and PyTorch), Intel Distribution of OpenVINO™ toolkit, and other software components. The following sections provide a few details about some of the components in the Deep Learning Reference Stack.

Intel Distribution of OpenVINO Toolkit. The Intel Distribution of OpenVINO toolkit (short for Open Visual Inference and Neural Network Optimization) provides developers with improved neural network performance on a variety of Intel processors and helps to further unlock cost-effective, real-time vision applications. The toolkit enables deep-learning inference and easy heterogeneous execution across multiple Intel architecture-based platforms, providing implementations from cloud architectures to edge devices and across all types of computer vision accelerators—CPUs, GPUs, Intel Movidius™ Neural Compute Sticks, and Intel field-programmable gate arrays (Intel FPGAs)—using a common API. The OpenVINO toolkit's functions and pre-optimized kernels help speed time to market.

Optimized Versions of TensorFlow and PyTorch. The Deep Learning Reference Stack's version of TensorFlow is an end-to-end open source platform for machine learning that is optimized to run on Intel hardware. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers explore state-of-the-art machine learning and enables developers to easily build and deploy machine-learning-powered applications.

PyTorch is a Torch-based open source machine-learning library for Python that developers can use for deep-learning applications such as natural language processing. Developers use this scientific computing package as a replacement for NumPy.
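To illustrate the "NumPy replacement" point, the short sketch below performs the same matrix multiplication with NumPy and with PyTorch tensors. It is a minimal, illustrative sketch that assumes only that the numpy and torch packages from the stack are installed; it is not part of the validated solution.

```python
# Minimal sketch: PyTorch tensors as a NumPy-style array package.
# Assumes the numpy and torch packages from the Deep Learning Reference Stack are available.
import numpy as np
import torch

# NumPy version of a small matrix multiplication
a_np = np.random.rand(256, 256).astype(np.float32)
b_np = np.random.rand(256, 256).astype(np.float32)
c_np = a_np @ b_np

# Equivalent PyTorch version; tensors interoperate with NumPy arrays
a_t = torch.from_numpy(a_np)   # zero-copy view of the NumPy array
b_t = torch.from_numpy(b_np)
c_t = torch.matmul(a_t, b_t)   # CPU execution, using optimized kernels where available

# Convert back and confirm both paths agree
print(np.allclose(c_np, c_t.numpy(), atol=1e-4))
```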

Figure 3. VMware HCX enables VM mobility and network extension between a source and a target HCX environment over the Internet, MPLS (Multiprotocol Label Switching), or private direct connect networks. Source: docs.vmware.com/en/VMware-HCX/services/user-guide/GUID-5D2F1312-EB62-4B25-AF88-9ADE129EDB57.html

Figure 4. Deep Learning Reference Stack in a multi-node configuration: Intel hardware, the VMware ESXi hypervisor, Linux guest OSes, optional container controllers (Kubernetes and Kubeflow), and container runtimes hosting Clear Linux OS images with Python, TensorFlow, PyTorch, and OpenVINO, all optimized for Intel architecture.


Intel Distribution for Python. Intel Distribution for Python is a ready-to-use, integrated package that delivers faster application performance on Intel architecture-based platforms. With it you can:

• Accelerate compute-intense applications—including numeric, scientific, data analytics, and machine-learning applications—that use NumPy, SciPy, scikit-learn, and more.

• Optimize performance with native performance libraries and parallelism techniques.

• Implement and scale production-ready algorithms for scientific computing and machine-learning workloads.
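As a quick sanity check that an environment is actually using accelerated math libraries, the hedged sketch below prints the BLAS/LAPACK configuration NumPy was built against; with Intel Distribution for Python the output typically references MKL. These are generic NumPy calls, not features specific to this reference architecture.

```python
# Minimal check of which BLAS/LAPACK backend NumPy is linked against.
# With Intel Distribution for Python, the reported libraries normally include Intel MKL.
import numpy as np

np.show_config()   # prints the BLAS/LAPACK build information to stdout

# A small workload whose heavy lifting is dispatched to the underlying BLAS library
x = np.random.rand(2048, 2048)
y = x @ x.T
print(y.shape)
```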

Intel MKL. Intel MKL optimizes code with minimal effort for future generations of Intel processors. It is compatible with a wide choice of compilers, languages, operating systems, and linking and threading models. Intel MKL features highly optimized, threaded, and vectorized math functions that maximize performance on each processor family. It uses industry-standard C and Fortran APIs for compatibility with popular Basic Linear Algebra Subprograms (BLAS), LAPACK, and Fastest Fourier Transform in the West (FFTW) functions. No code changes are required to use Intel MKL, and it automatically dispatches optimized code for each processor without the need to branch code. You can also take advantage of Priority Support, which connects you directly to Intel engineers for confidential answers to technical questions.

Intel MKL‑DNN. Intel MKL-DNN is an open source performance library for deep-learning applications that includes basic building blocks for neural networks, optimized for Intel architecture. Intel MKL-DNN is intended for deep-learning applications and framework developers interested in improving application performance on Intel processors. Note that Intel MKL-DNN is distinct from Intel MKL, which is a general math performance library.

H2O.ai

This solution demonstrates H2O, a popular machine-learning platform that has been optimized for Intel architecture. H2O Driverless AI is a high-performance platform for automatic development and rapid deployment of state-of-the-art predictive analytics models. H2O Driverless AI automates several time-consuming aspects of a typical data science workflow, including data visualization, model optimization, feature engineering, predictive modeling, and scoring pipelines.

When combined with the Intel Data Analytics Acceleration Library (Intel DAAL), H2O Driverless AI can take advantage of algorithms optimized for Intel architecture, such as the XGBoost algorithm.
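For readers unfamiliar with XGBoost, the standalone sketch below trains a small gradient-boosted model with the open source xgboost Python package on synthetic data. It only shows the algorithm's basic API; it is not how H2O Driverless AI invokes its Intel-optimized XGBoost build internally.

```python
# Minimal standalone XGBoost sketch on synthetic data (illustrative only;
# H2O Driverless AI drives its own optimized XGBoost build internally).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # simple synthetic binary target

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dtest = xgb.DMatrix(X[800:], label=y[800:])

params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1, "eval_metric": "auc"}
model = xgb.train(params, dtrain, num_boost_round=50, evals=[(dtest, "test")])

print(model.predict(dtest)[:5])   # predicted probabilities for the first few test rows
```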

Data Warehousing Building Blocks

Data warehouses are considered one of the core components of business intelligence. They are a central location to store data from one or more disparate sources as well as current and historical data. Numerous methods can be used to organize a data warehouse. Hardware, software, and data resources are the main components of this architecture, and VMware Cloud Foundation is an excellent platform to deploy data warehousing solutions (see Figure 5).

In addition to traditional SQL data warehouses, VMware Cloud Foundation also accommodates NoSQL databases, and it’s an efficient platform for running the Apache Hadoop framework and all of its related services that support big data and data mining. You can run Apache services like Hive, Kafka, and HBase to achieve Bigtable-like capabilities on top of Hadoop and the Hadoop Distributed File System (HDFS), and easily scale them according to your temporary needs.

Everything runs on vSAN, which provides additional policy configuration options for data redundancy and can be used by both platform administrators and end users (for example, when fulfilling persistent volume claims for Kubernetes deployments) to make full use of the entire platform storage system.
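For example, a developer can request vSAN-backed storage from a Kubernetes cluster simply by creating a persistent volume claim against a storage class that maps to a vSAN policy. The sketch below uses the official Kubernetes Python client; the storage class name vsan-default-policy is a hypothetical placeholder for whatever class the platform administrator has published.

```python
# Hypothetical sketch: requesting vSAN-backed storage through a PersistentVolumeClaim
# with the official Kubernetes Python client. The storage class name is an assumption;
# substitute the class your administrator has mapped to the desired vSAN policy.
from kubernetes import client, config

config.load_kube_config()  # uses credentials already present in kubeconfig

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="demo-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="vsan-default-policy",   # assumed vSAN-backed StorageClass
        resources=client.V1ResourceRequirements(requests={"storage": "20Gi"}),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
print("PVC created; vSAN provisions the backing object according to the storage policy")
```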

Platform‑Verified Workloads

This section discusses performance testing results for deep learning and data warehousing.

Deep Learning on VMware Cloud Foundation

Image classification is one of the most popular use cases for deep learning. Our tests benchmarked the ResNet50 inferencing model, using both TensorFlow and the OpenVINO toolkit from the Deep Learning Reference Stack. The accuracy of the model was validated using the ImageNet dataset. For detailed instructions on downloading the model, installing the frameworks, and running the benchmark, refer to the "Deep Learning" section in Appendix A – Solution Features Validation and Benchmarking; this appendix also includes more extensive discussion of our benchmarking results.
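The exact scripts and models we used are documented in the appendices. Purely to illustrate the measurement idea, the hedged sketch below times ResNet50 inference in TensorFlow on random data at a given batch size and reports images per second; it uses the stock tf.keras ResNet50 rather than the validated ImageNet setup from the appendix.

```python
# Illustrative throughput measurement only; the validated procedure is in the appendices.
# Times ResNet50 inference on random data and reports images per second.
import time
import numpy as np
import tensorflow as tf

batch_size = 16
model = tf.keras.applications.ResNet50(weights=None)   # random weights are fine for timing
images = np.random.rand(batch_size, 224, 224, 3).astype(np.float32)

# Warm-up iterations so graph construction and caching are not measured
for _ in range(5):
    model.predict(images, batch_size=batch_size)

iterations = 50
start = time.time()
for _ in range(iterations):
    model.predict(images, batch_size=batch_size)
elapsed = time.time() - start

print(f"{(iterations * batch_size) / elapsed:.1f} images/sec at batch size {batch_size}")
```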

Figure 5. VMware Cloud Foundation is an excellent platform for all your data analytics and machine-learning workloads: data warehousing, ML workloads, and containers running in VMs on a software-defined data center built from VMware NSX (networking), VMware vSAN (storage), VMware vCenter (management), and VMware ESXi hosts.


Deep-learning workloads are compute- and memory-intensive. Therefore, sizing the resources of the VMs is crucial to obtain optimal performance. We ran various experiments with VM sizes ranging from two virtual CPUs (vCPUs) to 80 vCPUs. We determined that for ResNet50 v1 inference workloads, a VM with 16 vCPUs provides the optimal performance for both the Base and Plus configurations of the Hybrid Cloud Data Analytics Solution.

We measured throughput scaling by adding new VMs, each with 16 vCPUs, and measuring throughput for the following batch sizes: 1, 16, 32, 64, and 128. Figure 6 illustrates the results for batch sizes 1 and 128 for clusters with 1, 2, 3, and 4 VMs, using the Base configuration. The measured data has been normalized with batch size 1 on 1 VM as the baseline.
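Normalization here is simple division by the baseline measurement. The small sketch below shows the calculation with hypothetical throughput numbers, not our measured results.

```python
# Hypothetical numbers, shown only to illustrate how the scaling data is normalized.
baseline = 120.0                    # images/sec for batch size 1 on 1 VM (the baseline case)
measured = {                        # images/sec for other (VM count, batch size) combinations
    (1, 1): 120.0,
    (2, 1): 212.0,
    (4, 128): 950.0,
}

normalized = {cfg: value / baseline for cfg, value in measured.items()}
for (vms, batch), factor in sorted(normalized.items()):
    print(f"{vms} VM(s), batch size {batch}: {factor:.2f}x relative throughput")
```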

As you can see, in a multi-node system, as the VMs running the workload are scaled, the overall throughput of the ResNet50 v1 workload scales efficiently.7 This demonstrates the effectiveness of the ESXi scheduler.

Data Warehousing on VMware Cloud Foundation

Our data warehouse test used Microsoft SQL Server 2017 with both the Base and Plus configurations of the reference architecture. Microsoft SQL Server is a relational database management system (RDBMS) that uses Transact-SQL (T-SQL) as its query language. T-SQL is an SQL language extension that allows the use of basic programming constructs such as variables, loops, and conditional statements.

For the benchmark, we used HammerDB—an open source tool that uses specially generated tables and virtual users to measure and test database workloads. HammerDB supports several database engines. In our tests, we used the Microsoft SQL Server engine because it's a popular choice for benchmarking tests. Although it's possible to run a HammerDB instance directly on the SQL Server host, we recommend creating another Microsoft Windows Server instance and testing SQL Server databases remotely. For more information, visit the HammerDB documentation and the GitHub HammerDB web page.

HammerDB drives an online transaction processing (OLTP) workload, which allowed us to measure maximum throughput for the workload. Key performance indicators (KPIs) from Microsoft SQL Server used to evaluate the results are as follows:

• Transactions per minute (TPM) from HammerDB

• Latency CPU Time:Requests (the time the CPU spent on the request)

• CPU Time:Total (the total time the CPU spent on the batch)

In addition, we considered KPIs from the infrastructure to measure resource saturation (CPU, memory, storage, and networking).

The intention of the benchmarking was to compare the Base and Plus configurations, using two different VMware Cloud Foundation workload domains under load, and simultaneously to measure the maximum density from the client perspective without violating the service level agreement (SLA).

In our case, the SLA metrics are based on latency measurements:

• SQL latency CPU Time:Requests 99th percentile < 5 ms

• SQL latency CPU Time:Total 80th percentile < 20 ms

Our tests confirmed that the Plus configuration can deliver up to 20 percent more throughput and, while spawning up to 50 percent more data warehouses, up to 50 percent higher user density without violating the SLA, compared to the Base configuration (see Figure 7).8 See Appendix D – Data Warehouse Benchmarking for full benchmarking details.
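Given per-query latency samples exported from SQL Server, checking these SLA thresholds is a straightforward percentile calculation. The small sketch below uses made-up sample data purely to show the check.

```python
# Sketch of the SLA check on hypothetical latency samples (milliseconds).
import numpy as np

cpu_time_requests_ms = np.random.gamma(shape=2.0, scale=1.0, size=10_000)  # stand-in data
cpu_time_total_ms = np.random.gamma(shape=3.0, scale=4.0, size=10_000)     # stand-in data

p99_requests = np.percentile(cpu_time_requests_ms, 99)
p80_total = np.percentile(cpu_time_total_ms, 80)

print(f"CPU Time:Requests p99 = {p99_requests:.2f} ms (SLA < 5 ms: {p99_requests < 5})")
print(f"CPU Time:Total    p80 = {p80_total:.2f} ms (SLA < 20 ms: {p80_total < 20})")
```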

Figure 6. Scalability of the overall throughput of the ResNet50 v1 workload across 1–4 VMs for batch sizes 1 and 128. Normalized baseline: batch size = 1 on 1 VM (16 vCPUs).

Figure 7. Data warehousing benchmarks (HammerDB online transaction processing on SQL Server 2017; 2S Intel Xeon Gold 6248 Base configuration versus 2S Intel Xeon Platinum 8260 Plus configuration) reveal that the Plus configuration of the Hybrid Cloud Data Analytics Solution can increase performance (up to 20% faster transactions per minute) and density (up to 50% higher data warehouse and user density), compared to the Base configuration.


Bill of Materials

Hardware Specifications

The Hybrid Cloud Data Analytics Solution can scale from a single rack with just eight servers up to 15 workload domains with a total of 960 servers. This reference architecture uses 12 Intel architecture-based servers (see Table 1). Each rack consists of two top-of-rack (ToR) Arista switches and a single out-of-band Arista management switch.

Additional racks can be added at the time of purchase or later.

In multi-rack deployments, an additional set of spine switches is recommended (usually installed in the second rack). With the introduction of VMware Cloud Foundation 3.0 and Bring Your Own Network (BYON), VMware no longer certifies switch compatibility with VMware Cloud Foundation.

The initial software imaging requires an additional server or laptop running virtualization software and a privately managed switch. These components are not part of the solution and are not needed after completing the VMware Cloud Foundation imaging and start-up process.

To demonstrate support for heterogeneous hardware configurations, this reference architecture uses two types of servers, which use different CPU models, memory sizes, and numbers of drives. Customers can modify the Base vSAN ReadyNode configuration to some extent, adding more memory or drives or replacing the CPU with a higher core-count or higher clock-speed model. The general rules are described in the blog "What You Can (and Cannot) Change in a vSAN ReadyNode."

For the full vSAN ReadyNode hardware specification, see Intel Data Center Blocks for Cloud – vSAN Ready Node: System Deployment and Configuration Guide.

Software and Firmware Specifications

This reference architecture consists of two main software component suites: VMware Cloud Foundation and VMware Enterprise PKS. VMware Enterprise PKS requires multiple components and supporting infrastructure. In addition, several networking services like an enterprise NTP server and a DNS server are needed for seamless integration with the external networks and global time synchronization. For a complete list of requirements and prerequisites, refer to the official VMware documentation. Table 2 details the VMware Cloud Foundation products and services.

Table 1. Hardware Bill of Materials for the Hybrid Cloud Data Analytics Solution Reference Architecture

Management Cluster (4 nodes), per node:
• Base SKU: Intel Server System VRN2208WFAF82R, qty 1
• Mainboard: Intel Server Board S2600WF0R, qty 1
• CPU: Intel Xeon Gold 6230 processor (20 cores, 2.10 GHz), qty 2
• Memory: 32 GB RDIMM DDR4-2933, qty 12
• Caching Tier: 375 GB Intel Optane SSD DC P4800X Series (PCIe v4 U.2), qty 2
• Capacity Tier: 4 TB Intel SSD DC P4510 Series (2.5" NVMe U.2), qty 6
• Boot Device: 480 GB Intel SSD D3-S4510 Series (M.2, 80 mm), qty 1
• NIC: Intel Ethernet Converged Network Adapter XXV710-DA2, qty 1

Base Workload Domain (4 nodes), per node:
• Base SKU: Intel Server System VRN2208WFAF82R, qty 1
• Mainboard: Intel Server Board S2600WF0R, qty 1
• CPU: Intel Xeon Gold 6248 processor (20 cores, 2.50 GHz), qty 2
• Memory: 32 GB RDIMM DDR4-2933, qty 12
• Caching Tier: 375 GB Intel Optane SSD DC P4800X Series (PCIe v4 U.2), qty 2
• Capacity Tier: 4 TB Intel SSD DC P4510 Series (2.5" NVMe U.2), qty 6
• Boot Device: 480 GB Intel SSD D3-S4510 Series (M.2, 80 mm), qty 1
• NIC: Intel Ethernet Converged Network Adapter XXV710-DA2, qty 1

Plus Workload Domain (4 nodes), per node:
• Base SKU: Intel Server System VRN2208WFAF83R, qty 1
• Mainboard: Intel Server Board S2600WF0R, qty 1
• CPU: Intel Xeon Platinum 8260 processor (24 cores, 2.40 GHz), qty 2
• Memory: 32 GB RDIMM DDR4-2666, qty 24
• Caching Tier: 375 GB Intel Optane SSD DC P4800X Series (PCIe v4 U.2), qty 4
• Capacity Tier: 4 TB Intel SSD DC P4510 Series (2.5" NVMe U.2), qty 12
• Boot Device: 480 GB Intel SSD D3-S4510 Series (M.2, 80 mm), qty 1
• NIC: Intel Ethernet Converged Network Adapter XXV710-DA2, qty 1

Table 2. VMware Cloud Foundation Main Products and Services. For other components refer to: VMware Cloud Foundation Release Notes.

COMPONENT VERSION BUILD

VMware Cloud Foundation Bundle 3.8.0 14172583

VMware Cloud Builder VM 2.1.0.0 14172583

VMware ESXi Hypervisor ESXi670-201906002 13981272

VMware vSAN 6.7 Express Patch 10 13805960

VMware NSX Data Center for vSphere 6.4.5 13282012

VMware NSX-T Data Center 2.4.1 13716575

VMware vCenter Server Appliance 6.7 Update 2c 14070457

VMware SDDC Manager 3.8.0 14172583

VMware vRealize Suite Lifecycle Manager 2.1 Patch 1 13685821

VMware Enterprise PKS 1.4.1 1.4.1.0-24363153



From the hardware perspective, Table 3 provides the firmware and driver versions that were used in this solution.

Table 3. BIOS and Firmware Specifications for the Hybrid Cloud Data Analytics Solution Reference Architecture

INGREDIENT VERSION

BIOS SE5C620.86B.02.01.0008.031920191559

BMC 1.93

ME 04.01.04.251

SDR 1.93

NIC Firmware 6.80 0x8003d05 1.2007.0 (NIC version 1.7.17)

Intel Optane SSD DC P4800X E2010435

Intel SSD DC P4510 VDV10152

Microcode Base: 0x05000021; Plus: 0x05000021; Management: 0x0400001c

Deployment Considerations

The goal of using solutions like VMware Cloud Foundation, NSX-T, vSAN, and VMware Enterprise PKS is the transformation to an SDDC, where administrators can define, deploy, and manage clusters and resources based on actual demand from users. Each of these components is a standalone product and can be used independently. The following sections provide some deployment considerations for these solution components.

VMware Cloud Foundation

VMware Cloud Foundation consists of core components for compute virtualization (VMware vSphere), network virtualization (VMware NSX), storage virtualization (VMware vSAN), and cloud monitoring (VMware vRealize Suite). VMware Cloud Foundation allows you to build enterprise-ready cloud infrastructure for the private and public cloud.

The standard architecture model for VMware Cloud Foundation includes a dedicated management domain (one per instance) for all management components and up to 15 virtual infrastructure workload domains created by users.

Management Domain

The management domain is a special-purpose workload domain that is used to host the infrastructure components needed to instantiate, manage, and monitor the VMware Cloud Foundation infrastructure. It is automatically created using VMware Cloud Builder on the first rack in a VMware Cloud Foundation system during start-up, and it contains management components such as SDDC Manager, vCenter Server, NSX, and vRealize Log Insight. The management domain uses vSAN as primary storage and requires a minimum of four nodes to work properly. If you add more racks to your system, the management domain covers additional components automatically.

Workload Domain

The workload domain represents a cluster of resources that can contain up to 64 servers with its own vCenter Server appliance, integrated vSAN, and NSX. A workload domain can span multiple racks; if you add more racks, you can scale any existing workload domains to the additional racks as needed.

All tasks related to the workload domains are performed using the SDDC Manager web interface. This includes the creation, expansion, and deletion of workload domains, along with physical infrastructure monitoring and management.

VMware vSAN

vSAN is storage virtualization software that is fully integrated with VMware vSphere; it joins all storage devices across a vSphere cluster into a shared data pool (see Figure 8). Two vSAN cluster configurations are possible: hybrid and all-flash. A hybrid vSAN cluster uses two types of storage devices—flash devices for the cache tier and magnetic drives for the capacity tier. In an all-flash vSAN configuration, both the cache and capacity tiers use flash drives. Such a solution eliminates the need for external shared storage.

vSAN offers users flexibility to define policies on demand and delivers ease of management of storage for containers.

Figure 8. VMware vSAN configuration: SSDs in each ESXi host, managed by VMware vCenter, are pooled into vSAN shared storage. Source: vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-vsan-datasheet.pdf

VMware NSX

VMware NSX is a network virtualization solution that allows you to build software-defined networks in virtualized data centers. Just as VMs are abstracted from physical server hardware, virtual networks (including switches, ports, routers, firewalls, etc.) are constructed in the virtual space. Virtual networks are provisioned and managed independent of the underlying hardware.

VMware NSX allows you to define network connectivity among virtualized elements running on vSphere and to harden network security through micro-segmentation rules.

Virtual network functions (VNFs) defined by VMware NSX include switching, routing, firewalling, load balancing, and virtual private networks (VPNs, specifically IPsec and SSL).

(10)

NSX‑V Compared to NSX‑T

VMware offers two types of the NSX software-defined networking platform – NSX-V and NSX-T.

NSX‑V (NSX for vSphere) is designed for vSphere deployments only and is architected so that a single NSX-V manager platform is tied to a single VMware vCenter Server instance. The NSX-V platform is the original NSX platform that has been available for several years.

NSX‑T (NSX‑Transformers) was designed for different virtualization platforms and multi-hypervisor environments; it can also be used in cases where NSX-V is not applicable.

While NSX-V supports software-defined networking for only VMware vSphere, NSX-T also supports a network virtualization stack for KVM, Docker, Kubernetes, and OpenStack, as well as AWS-native workloads. VMware NSX-T can be deployed without a vCenter Server and is adopted for heterogeneous compute systems. It is designed to address emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks. A popular use case for NSX-T is with containers because it includes the NSX-T Container Networking interface (CNI) plugin that allows developers to configure network connectivity for container applications.

With NSX-T, as used in this reference architecture, VMware has shifted from the Virtual Extensible LAN (VXLAN)-based encapsulation utilized by NSX-V to the newer GENEVE encapsulation. This tunneling protocol preserves the traditional offload capabilities available on network interface cards (NICs) for high performance. Additional metadata can be added to overlay headers to help differentiate context when processing information such as end-to-end telemetry, data tracking, encryption, security, and so on in the data transfer layer. The additional information in the metadata is called TLV (Type, Length, Value). GENEVE (co-developed by VMware, Intel, Red Hat, and Microsoft) is based on the best concepts of the VXLAN, STT, and NVGRE encapsulation protocols.

The maximum transmission unit (MTU) value for Jumbo frames must be at least 1700 bytes when using GENEVE encapsulation. This is because of the additional metadata field of variable length for GENEVE headers. The VXLAN protocol requires an MTU value of 1600 or higher.

NSX Components

The main components of VMware NSX are NSX Manager, NSX Controllers, and NSX Edge gateways.

NSX Manager is a centralized component of NSX that is used for network management. It is a virtual appliance that provides the GUI and the RESTful APIs for creating, configuring, orchestrating, and monitoring NSX-T Data Center components (such as logical switching and routing, networking and edge services, and security and distributed firewall services), as well as NSX Edge services gateways.

NSX Manager is the management plane for the NSX-T Data Center ecosystem. NSX Manager provides an aggregated system view and is the centralized network management component of NSX-T Data Center. It provides configuration and orchestration of logical networking components: logical switching and routing, networking and edge services, and security services and distributed firewall.

NSX Controller is a distributed state management system that controls virtual networks and overlay transport tunnels; it can be deployed as VMs on VMware ESXi or KVM hypervisors. The NSX Controller manages all logical switches within the network and handles information about VMs, hosts, switches, and VXLANs. Having three controller nodes ensures data redundancy in case one NSX Controller node fails.

NSX Edge is a gateway service that provides access to physical and virtual networks for VMs. It can be installed as a distributed virtual router or as a services gateway.

The following services can be provided: dynamic routing, firewalls, NAT, DHCP, VPNs, load balancing, and high availability.

An NSX Edge VM has four internal interfaces: eth0, fp-eth0, fp-eth1, and fp-eth2. eth0 is reserved for management, while the other interfaces are assigned to the Data Plane Development Kit (DPDK) fastpath. These interfaces are allocated for uplinks to TOR switches and for NSX-T Data Center overlay tunneling.

NSX Edge can connect to two transport zones—one for overlay and the other for north-south peering with external devices. These two transport zones define the limits of logical network distribution on the NSX Edge (see Figure 9):

• Overlay Transport Zone. Any traffic that originates from a VM participating in an NSX-T Data Center domain might require reachability to external devices or networks. This is typically described as external north-south traffic. The NSX Edge node is responsible for decapsulating the overlay traffic received from compute nodes as well as encapsulating the traffic sent to compute nodes.

• VLAN Transport Zone. In addition to the encapsulation and decapsulation function, NSX Edge nodes need a VLAN transport zone to provide uplink connectivity to the physical infrastructure.

Note: You must match the physical interfaces of an NSX-T Edge node with profiles you have previously created.

Transport Nodes (TNs) and virtual switches represent the NSX data-transfer components. A TN is an NSX-compatible device that participates in traffic transmission and the NSX networking overlay. A node must contain a hostswitch to serve as a transport node. NSX-V requires the use of a vSphere Distributed Switch (VDS), as usual in vSphere.

Standard virtual switches cannot be used for NSX-V. NSX-T presumes that you have deployed an NSX-T virtual distributed switch (N-VDS). Open vSwitches (OVS) are used for KVM hosts, while VMware vSwitches are used for ESXi hosts.


N-VDS is a software NSX component on the transport node that performs traffic transmission. N-VDS is the primary component of the transport node’s data plane, which forwards traffic and owns at least one physical NIC. Each N-VDS of the different transport nodes is independent, but they can be grouped by assigning the same names for centralized management.

Transport zones are available for both NSX-V and NSX-T.

Transport zones define the limits of logical networks distribution. Each transport zone is linked to its NSX Switch (N-VDS). Transport zones for NSX-T are not linked to clusters. There are two types of transport zone for VMware NSX-T due to GENEVE encapsulation: Overlay and VLAN. As for VMware NSX-V, a transport zone defines the distribution limits of VXLAN only.

Figure 9. VMware NSX-T Data Center transport zones: Tier-0 and Tier-1 routers on the NSX Edge connect overlay transport zones (logical switches for the hosts' VMs) to VLAN transport zones that uplink to the physical architecture. Source: docs.vmware.com/en/VMware-NSX-T-Data-Center/2.3/com.vmware.nsxt.install.doc/GUID-F47989B2-2B9D-4214-B3BA-5DDF66A1B0E6.html

VMware Enterprise PKS (Kubernetes)

VMware Enterprise PKS is a solution for deploying Kubernetes in multi-cloud environments. It simplifies Kubernetes cluster deployment with Day 1 and Day 2 operations support and manages container deployment from the application layer all the way to the infrastructure layer.

The deployed Kubernetes cluster is available in its native form; there are no add-ons or proprietary extensions. The native Kubernetes command-line interface (CLI) can be used.

VMware Enterprise PKS uses BOSH for instantiating, deploying, and managing Kubernetes clusters on a cloud platform. After the VMware Enterprise PKS solution is deployed on the Ops Manager Dashboard, users can provision Kubernetes clusters using the CLI and run container-based workloads on the clusters with the Kubernetes CLI (kubectl).

VMware Enterprise PKS Infrastructure

VMware Enterprise PKS consists of several components:

Ops Manager is a graphical dashboard that deploys with BOSH. Ops Manager works with the BOSH Director to manage, configure, and upgrade Pivotal Cloud Foundry (PCF) products such as Pivotal Application Service (PAS), VMware Enterprise PKS, and PCF services and partner products. Ops Manager represents PCF products as tiles with multiple configuration panes that let you input or select configuration values needed for the product. Ops Manager generates BOSH manifests that contain the user-supplied configuration values and sends them to the Director. After you install Ops Manager and BOSH, you use Ops Manager to deploy almost all PCF products.

BOSH is an open source tool that lets you run software systems in the cloud. BOSH and its infrastructure-as-a-service (IaaS) cloud provider interfaces (CPIs) are what enable PCF to run on multiple IaaS platforms. VMware Enterprise PKS uses BOSH to run and manage Kubernetes container clusters. PKS is based on the Cloud Foundry Foundation's open source Container Runtime (formerly Kubo) project.

VMware Harbor Registry is an enterprise-class registry server that stores and distributes container images. Harbor allows you to store and manage images for use with VMware Enterprise PKS.

VMware Enterprise PKS Control Plane Overview

The VMware Enterprise PKS control plane enables users to deploy and manage Kubernetes clusters. It manages the lifecycle of Kubernetes clusters deployed using PKS. A special command-line tool (PKS CLI) is used to communicate with the VMware Enterprise PKS control plane, and it allows users to:

• View cluster plans

• Create clusters

• View information about clusters

• Obtain credentials to deploy workloads to clusters

• Scale clusters

• Delete clusters

• Create and manage network profiles for VMware NSX-T

VMware Enterprise PKS Control Plane Architecture

The VMware Enterprise PKS control plane is located on a single VM and includes the following components:

• PKS API server

• PKS Broker

• User Account and Authentication (UAA) server

Note: The PKS API Load Balancer is used for deployments without NSX-T. If the deployment is NSX-T-based, a destination NAT (DNAT) rule on the enterprise external router is configured for the PKS API host.


Components of the control plane interact with each other as shown in Figure 10.

Figure 10. VMware Enterprise PKS control plane: the PKS CLI and kubectl on a local workstation reach the PKS control plane VM (UAA, PKS API, PKS Broker) through the PKS API load balancer, while BOSH and Ops Manager manage the Kubernetes clusters behind their own load balancers. Source: docs.pivotal.io/runtimes/pks/1-4/control-plane.html

Virtual Infrastructure Overview for VMware Enterprise PKS with NSX‑T Workload Domains

The complete ecosystem of VMware Enterprise PKS deployed on VMware Cloud Foundation is illustrated in Figure 11. Note that you may have multiple workload domains, and thus multiple VMware Enterprise PKS instances, within a single VMware Cloud Foundation environment.

The hosts in the VMware Cloud Foundation workload domain provide resources for the VMware Enterprise PKS workloads; they host Kubernetes clusters deployed by VMware Enterprise PKS and the containerized applications that run on them. On VMware vSphere, Kubernetes clusters consist of a series of master and worker nodes that run as VMs. As defined within your cluster plans, Kubernetes clusters can reside either within or across physical vSphere clusters.

Cluster Workload Management

Since the Kubernetes version delivered by PKS is in its native form, the default kubectl command line tool is used for managing the containers and deployments on the Kubernetes clusters. Full documentation for the tool can be found in the official Kubernetes documentation.
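Day-to-day management is typically done with kubectl itself. For automation, the same operations can be scripted against the Kubernetes API; the hedged sketch below uses the official Kubernetes Python client to list the deployments in a namespace, roughly equivalent to kubectl get deployments, and assumes cluster credentials have already been fetched into kubeconfig.

```python
# Rough scripted equivalent of "kubectl get deployments -n default" using the
# official Kubernetes Python client; assumes cluster credentials are already in kubeconfig.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

for deployment in apps.list_namespaced_deployment(namespace="default").items:
    ready = deployment.status.ready_replicas or 0
    desired = deployment.spec.replicas
    print(f"{deployment.metadata.name}: {ready}/{desired} replicas ready")
```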

Networking in VMware Enterprise PKS

VMware Enterprise PKS relies on virtual networks that must be configured and ready prior to VMware Enterprise PKS deployment. VMware Enterprise PKS components are installed in a logical switch over a Tier-1 NSX-T router. Pod and node networks are configured over different Tier-1 routers. All Tier-1 routers must be connected to a physical network over Tier-0 routers with proper Uplink configuration (Figure 12).

Figure 11. Complete ecosystem of VMware Enterprise PKS deployed on VMware Cloud Foundation: a management cluster (management vCenter Server, NSX-V Manager/Controller/Edge, NSX-T Manager, compute vCenter Servers, PCF Ops Manager, BOSH Director, VMware Enterprise PKS, VMware Harbor Registry), a shared edge and compute cluster with NSX-T Edges, and compute clusters of ESXi transport nodes hosting the SDDC Kubernetes workload clusters, connected through NSX-T transport zones (N-VDS for compute, vDS for management) to the external network. Source: docs.vmware.com/en/VMware-Validated-Design/5.1/sddc-architecture-and-design-for-vmware-enterprise-pks-with-vmware-nsx-t-workload-domains/GUID-8F4D6F40-8126-4C41-952D-192A45AF5AF3.html

Figure 12. Logical diagram of a single VMware Enterprise PKS cluster deployment: per PKS foundation, Tier-1 routers for the infrastructure network (Ops Manager, BOSH, PKS API), shared services (RabbitMQ, MySQL, and so on), the Kubernetes node network (master and worker nodes), Kubernetes namespaces, and load balancers for Kubernetes services/SNAT, all connected through a Tier-0 router to the physical L2/L3 switches, the management network, and the Internet, with vCenter, NSX-T Manager, and NSX-T Controllers on the management network. Source: docs.vmware.com/en/VMware-Enterprise-PKS/1.3/vmware-enterprise-pks-13/GUID-nsxt-multi-pks.html


Integration of VMware Enterprise PKS, VMware Cloud Foundation, NSX‑T, and vSAN

VMware Cloud Foundation combines compute, storage, networking, security, and cloud management services, making it an ideal platform for running enterprise workloads and containerized applications. vSAN offers users flexibility to define policies on demand and delivers ease of management of storage for containers. Developers can consume storage as code because the complexity of the underlying storage infrastructure is abstracted away. With the help of NSX-T, end users no longer need to know the underlying network architecture.

NSX-T can automatically create load balancers, routers, and switches for use by VMware Enterprise PKS.

The close integration of VMware Enterprise PKS, NSX-T, and vSAN based on a VMware Cloud Foundation network for containers in Kubernetes clusters makes it easy to manage ephemeral and persistent storage as well as to access vSAN’s availability and data service features. In addition, vSphere High Availability and vSphere Fault Tolerance can protect VMs from physical server failure. The combination of these technologies makes PKS on VMware Cloud Foundation a complete solution, perfect for Kubernetes administrators and developers.

Environment Provisioning

As mentioned in the "Solution Overview" section, the complete environment consists of three main products: VMware Cloud Foundation, VMware NSX-T, and VMware Enterprise PKS. The following sections describe how to provision these products.

Hardware and Software Requirements

Multiple requirements must be met before deploying the VMware Cloud Foundation platform. The complete details are listed in the VMware Cloud Foundation Planning and Preparation Guide.

There are specific requirements for networking (jumbo frames and 802.1Q tagging), network pools, VLANs and IP pools, and hostnames and IP addresses. Familiarize yourself with the Planning and Preparation Guide before taking the next steps; the entire setup relies heavily on multiple VLANs.

Each domain requires its own set of isolated VLANs for management, VMware vMotion, vSAN, VXLAN (NSX virtual tunnel end-point (VTEP)), and uplink. Read VMware’s “VLANs and IP Subnets” documentation for more details.
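Before filling in any deployment documents, it can be worth sanity-checking the planned VLAN-to-subnet mapping for a domain. The short sketch below, which uses only Python's standard library, checks a hypothetical plan (the VLAN IDs and subnets are placeholders, not values prescribed by this reference architecture) for duplicate VLANs and overlapping subnets.

```python
import ipaddress
from itertools import combinations

# Hypothetical per-domain network plan; replace the VLAN IDs and subnets
# with the values from your own planning worksheet.
plan = {
    "management": {"vlan": 1611, "subnet": "172.16.11.0/24"},
    "vmotion":    {"vlan": 1612, "subnet": "172.16.12.0/24"},
    "vsan":       {"vlan": 1613, "subnet": "172.16.13.0/24"},
    "vtep":       {"vlan": 1614, "subnet": "172.16.14.0/24"},
    "uplink":     {"vlan": 1615, "subnet": "172.16.15.0/24"},
}

def validate(plan):
    """Fail if the plan reuses a VLAN ID or has overlapping IP subnets."""
    vlans = [cfg["vlan"] for cfg in plan.values()]
    if len(vlans) != len(set(vlans)):
        raise ValueError("Duplicate VLAN IDs in plan")
    nets = {name: ipaddress.ip_network(cfg["subnet"]) for name, cfg in plan.items()}
    for (name1, net1), (name2, net2) in combinations(nets.items(), 2):
        if net1.overlaps(net2):
            raise ValueError(f"Subnets for '{name1}' and '{name2}' overlap")
    print("Network plan looks consistent")

validate(plan)
```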

VMware Cloud Foundation Deployment

The VMware Cloud Foundation deployment process includes several steps after you obtain all necessary hardware components, install them in a rack, and provide the necessary power, cooling, and uplink connections to the data center infrastructure. First, you deploy the Cloud Builder VM, which can be used for imaging the ESXi software on the VMware Cloud Foundation servers (you may also install ESXi manually). Then you download and complete the Deployment Parameter Sheet, and finally you initiate the VMware Cloud Foundation start-up process.

The entire deployment of VMware Cloud Foundation is described in the VMware Cloud Foundation Architecture and Deployment Guide – VMware Cloud Foundation 3.8, in the chapter “Deploying Cloud Foundation.”

Step 1: Deploy Cloud Builder VM

Cloud Builder VM is used to deploy VMware Cloud Foundation; it also includes the VMware Imaging Appliance (VIA), which can be used for imaging the ESXi servers. For details, see the “Software Requirements” chapter of the VMware Cloud Foundation Planning and Preparation Guide.

The detailed deployment procedure for Cloud Builder VM is available in VMware’s “Deploy Cloud Builder VM” documentation.

Step 2: Install ESXi Software on VMware Cloud Foundation Servers

Imaging ESXi servers using Cloud Builder VM (which is done with VIA) is optional. If you already have servers with a supported version of ESXi, you do not need to use VIA. You may also install ESXi manually on each machine. Using VIA has some advantages, as it not only installs ESXi, but also deploys any additional vSphere Installation Bundles (VIBs) and configures standard passwords across all machines.

Detailed information on how to prepare hosts and set up VIA for ESXi installation is available in the “Installing ESXi Software on Cloud Foundation Servers” chapter of the Cloud Foundation Guide.

Best Known Method

Be sure to add to the VIA bundle any required or custom VIBs that you need. In most cases, those will be specific drivers for NICs or SSDs. VIBs can be added directly in the VIA web interface by going to Bundle ➔ Modify VIBs. For the reference architecture described in this document, we added the following VIBs (a quick verification sketch follows the list):

• NIC driver and update tool for Intel Network adapter: i40en-1.7.17-1OEM.670.0.0.8169922.x86

• Intel Volume Management Device (Intel VMD) driver for NVMe: intel-nvme-vmd-1.7.0.1000-1OEM.670.0.0.8169922.x86_64

• Intel SSD Data Center Tool for updating NVMe firmware: intel_ssd_data_center_tool-3.0.19-400
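After imaging, a quick way to confirm that the extra VIBs actually landed on a host is to check the output of esxcli software vib list. The sketch below is written to run in the ESXi Shell (ESXi ships with a Python interpreter); the expected names are derived from the bundle filenames above and may need adjusting to match the exact VIB names that esxcli reports.

```python
import subprocess

# Names derived from the VIB filenames listed above; adjust them if esxcli
# reports slightly different VIB names.
EXPECTED_VIBS = ["i40en", "intel-nvme-vmd", "intel_ssd_data_center_tool"]

def installed_vibs():
    """Return the VIB names reported by 'esxcli software vib list'."""
    out = subprocess.check_output(["esxcli", "software", "vib", "list"]).decode()
    # Skip the header and separator lines; the first column is the VIB name.
    return {line.split()[0] for line in out.splitlines()[2:] if line.strip()}

installed = installed_vibs()
missing = [vib for vib in EXPECTED_VIBS if not any(vib in name for name in installed)]
print("Missing VIBs:", ", ".join(missing) if missing else "none")
```

When run from a management station instead, capture the same command output over SSH and reuse the parsing logic.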


Step 3: Download and Complete the Deployment Parameter Sheet

You import the parameter spreadsheet (Excel file) during the VMware Cloud Foundation start-up process. Before you begin the start-up process, collect all the information needed to configure network connectivity, including a list of VLANs, network addresses, and uplinks, and put that data into the Excel file. You should also plan for the DNS infrastructure and subdomain name reserved for VMware Cloud Foundation.

A detailed description of all the necessary fields is available in the VMware Cloud Foundation Architecture and Deployment Guide, in the section “About the Deployment Parameter Sheet.”
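Because the start-up process depends on correct DNS records for every component, a quick pre-check of the hostnames you plan to enter in the parameter sheet can prevent a failed attempt. The sketch below uses only Python's standard library; the FQDNs are placeholders, not names required by this reference architecture.

```python
import socket

# Hypothetical hostnames planned for the management domain; replace them with
# the FQDNs you enter in the Deployment Parameter Sheet.
hosts = [
    "sddc-manager.vcf.example.local",
    "vcenter-mgmt.vcf.example.local",
    "nsx-mgmt.vcf.example.local",
    "esxi-1.vcf.example.local",
]

for fqdn in hosts:
    try:
        ip = socket.gethostbyname(fqdn)        # forward (A record) lookup
        reverse = socket.gethostbyaddr(ip)[0]  # reverse (PTR record) lookup
        status = "OK" if reverse.rstrip(".").lower() == fqdn.lower() else f"MISMATCH ({reverse})"
    except socket.gaierror as err:
        status = f"UNRESOLVED ({err})"
    except socket.herror as err:
        status = f"NO PTR RECORD ({err})"
    print(f"{fqdn}: {status}")
```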

Best Known Method

Be sure that all the passwords you provide meet the password complexity criteria.
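A simple way to catch non-compliant passwords before they reach the parameter sheet is to test them against the expected rules. The sketch below assumes a common VMware-style policy (at least 8 characters, with upper case, lower case, a digit, and a special character); verify the exact criteria against the VMware Cloud Foundation documentation for your release.

```python
import re

# Assumed policy; confirm the exact rules in the VCF documentation.
RULES = {
    "length >= 8":        lambda p: len(p) >= 8,
    "lower-case letter":  lambda p: re.search(r"[a-z]", p),
    "upper-case letter":  lambda p: re.search(r"[A-Z]", p),
    "digit":              lambda p: re.search(r"\d", p),
    "special character":  lambda p: re.search(r"[^A-Za-z0-9]", p),
}

def failed_rules(password):
    """Return the names of the rules the password does not satisfy."""
    return [name for name, rule in RULES.items() if not rule(password)]

failures = failed_rules("VMware1!")
print("Password OK" if not failures else "Password fails: " + ", ".join(failures))
```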

Step 4: Start Up VMware Cloud Foundation

Once the imaging process is completed and the parameter sheet is ready, move to the final phase—starting up VMware Cloud Foundation.

Best Known Method

At this point, we recommend taking a snapshot of the Cloud Builder VM. In case of a failed start-up, you can quickly restore the VM and start a fresh process instead of reinstalling the whole Cloud Builder.
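Taking the snapshot can also be scripted. The sketch below uses pyVmomi (the Python SDK for the vSphere API) to snapshot a VM; the host, credentials, and the VM name cloud-builder are placeholders, and the unverified SSL context is for lab use only.

```python
# pip install pyvmomi
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; point them at the vCenter Server (or ESXi
# host) where the Cloud Builder VM runs.
ctx = ssl._create_unverified_context()  # lab only; use trusted certificates in production
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
cb_vm = next(vm for vm in view.view if vm.name == "cloud-builder")  # assumed VM name

# Create a snapshot without memory state, so the restore point is quick to take.
task = cb_vm.CreateSnapshot_Task(name="pre-bringup",
                                 description="Snapshot before VCF start-up",
                                 memory=False,
                                 quiesce=False)
print("Snapshot task started:", task.info.key)

Disconnect(si)
```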

During start-up, SDDC Manager, all vCenter Servers, NSX Managers and Controllers, the Platform Services Controller (PSC), vSAN, and vRealize Log Insight components are deployed, creating the management domain of VMware Cloud Foundation. The process takes about two hours.

After the start-up process is complete you should see a notification with a link to the new SDDC Manager Web interface. The SDDC Manager interface is accessible through a standard web browser.

The complete description of the VMware Cloud Foundation start-up process is included in VMware’s “Initiate the Cloud Foundation Bring-Up Process” documentation.
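Because the process runs for roughly two hours, it can be convenient to poll its progress instead of watching the web interface. The sketch below assumes that the Cloud Builder appliance exposes a REST endpoint reporting bring-up status; the URL path, credentials, and response fields shown here are placeholders, so check the Cloud Builder API reference for the actual resource names before using anything like this.

```python
# pip install requests
import time
import requests

CLOUD_BUILDER = "https://cloud-builder.vcf.example.local"  # placeholder FQDN
AUTH = ("admin", "********")                               # placeholder credentials
STATUS_URL = CLOUD_BUILDER + "/v1/sddcs/<bring-up-id>"     # hypothetical endpoint path

while True:
    # verify=False only because lab appliances often use self-signed certificates
    resp = requests.get(STATUS_URL, auth=AUTH, verify=False, timeout=30)
    resp.raise_for_status()
    status = resp.json().get("status", "UNKNOWN")          # assumed response field
    print("Bring-up status:", status)
    if status.startswith("COMPLETED"):                     # assumed terminal status prefix
        break
    time.sleep(300)  # check again in five minutes
```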

The management workload domain is now created and contains all the components needed to manage the infrastructure. It will also host additional components, such as VMware Enterprise PKS and NSX-T, which are installed later. Do not deploy any user applications on this management cluster. Instead, create one or more workload domains, each comprising a separate vSphere cluster with vSAN and NSX preinstalled and configured, along with a dedicated instance of vCenter Server for management purposes. All instances of vCenter Server and NSX Manager are deployed on the management domain.

NSX‑T Installation

Installation of NSX-T usually requires deploying NSX Manager and three NSX Controllers, creating an NSX Controller cluster, installing VIBs (kernel modules) on the ESXi hosts, and installing NSX Edge VMs. When using VMware Cloud Foundation, most of these steps are executed automatically during workload domain deployment; however, t
