

## Time Transfer requirements: network & data center

Thomas Kernen, Principal Architect | SwiNOG-38, June 21st 2023



# Introduction

## What this presentation is not about – previous topics:

- Peering policies - SwiNOG #1
- Designing and deploying a VoIP network - SwiNOG #5
- Metro Ethernet - SwiNOG #8
- IPTV/Video over Broadband - SwiNOG #12
- Video for network engineers: what is relevant to you? - SwiNOG #17
- 2000-2010: How the Internet has evolved - SwiNOG #20
- Automatic Multicast without Explicit Tunnels (AMT) - SwiNOG #22

Today is about:

- "High precision" time transfer across networks
- IEEE 1588 Precision Time Protocol
- Living in a nanosecond scale world






# Agenda

- Use cases for Timing in the Data Center
- Timing 101
- OCP-TAP DC PTP Profile



# Industry specific requirements

- Media
- Telco
- Finance
- Data Center



# Why Synchronization in Data Centers?

"Nanosecond-level clock synchronization can be an enabler of a new spectrum of timing- and delay-critical applications in data centers" (<u>Yilong Geng et al. 2018</u>)

- Provide a reliable time synchronization service across the infrastructure of a data center
  - Using Precision Time Protocol (PTP)
  - Increase the level of accuracy by 2 to 3 orders of magnitude beyond what NTP infrastructure offers today
- Enable a set of new applications
- Improve a set of current applications
- Spotlight case: Google <u>Spanner, TrueTime and the CAP Theorem</u>
  - Highly available global-scale distributed database that provides strong consistency for all transactions. This combination of availability and consistency over the wide area is generally considered impossible due to the CAP Theorem.





## Use cases

- Distributed databases
- One Way Delay (OWD) Measurement
- Network & host based telemetry
  - Microscopic view of bursts, buffer contention, and loss (Millisampler/Syncmillisampler)
- System-Wide Performance Analysis (<u>Nsight Systems</u>)
  - Root cause analysis
  - CPU, GPU interactions and activity
  - Multi-node systems
  - Interrupts, wait states
- Security






## **Distributed Database**

Time synchronization is needed to guarantee that if a transaction T1 (e.g., a write operation) is committed before another transaction T2 (e.g., a read operation), the commit timestamp of T1 is before the commit timestamp of T2 when compared with real time.

- Aligning the clocks across all nodes in the distributed system ensures that they all display the same time to within a given level of accuracy, thereby defining a window of time uncertainty ($\epsilon$)
- Ordering of operations is necessary, but not always sufficient
- Strict serializability (two-phase commit)
- Ordering in time improves performance but requires strict clock skew guarantees between machines (e.g., to enable the property of linearizability)
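The role of the uncertainty window $\epsilon$ can be illustrated with a small sketch of commit-wait, in the spirit of the Spanner/TrueTime approach mentioned earlier. Everything below is hypothetical: the `UncertainClock` class, the `commit_wait` function, and the two epsilon values are illustrative assumptions, not an API or measurement from any real database.

```python
import time
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float  # lower bound on true time, in seconds
    latest: float    # upper bound on true time, in seconds


class UncertainClock:
    """Hypothetical clock that reports time as an interval of width 2 * epsilon."""

    def __init__(self, epsilon_s: float):
        self.epsilon_s = epsilon_s

    def now(self) -> Interval:
        t = time.monotonic()
        return Interval(t - self.epsilon_s, t + self.epsilon_s)


def commit_wait(clock: UncertainClock) -> tuple[float, float]:
    """Pick a commit timestamp, then wait until it is guaranteed to be in the past.

    Returns (commit_timestamp, seconds_waited).
    """
    start = time.monotonic()
    commit_ts = clock.now().latest
    # Block until even the earliest possible true time has passed commit_ts.
    while clock.now().earliest <= commit_ts:
        time.sleep(clock.epsilon_s / 10)
    return commit_ts, time.monotonic() - start


if __name__ == "__main__":
    # Assumed epsilons, chosen only to show a two orders of magnitude gap.
    for label, eps in (("NTP-like", 5e-3), ("PTP-like", 50e-6)):
        _, waited = commit_wait(UncertainClock(eps))
        print(f"{label}: epsilon={eps * 1e6:.0f} us, waited about {waited * 1e3:.3f} ms")
```

In this sketch the wait is roughly twice $\epsilon$, so shrinking the uncertainty window directly shortens every commit.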





Figure: schematic representation of a read returning outdated information.

Figure: commit-wait ensuring the consistency guarantee (linearizability).

Source: https://engineering.fb.com/2022/11/21/production-engineering/precision-time-protocol-at-meta/








# Why is NTP not accurate enough?

Figure: commit-wait reads issued against PTP and NTP backed clusters.

Source: https://engineering.fb.com/2022/11/21/production-engineering/precision-time-protocol-at-meta/



# Timing 101





Figure: client time plotted against server time, comparing ideal sync, a disciplined client, and a free running client.



- Node dependent
  - Time stamping resolution
  - Local oscillator quality
- Network related
  - Packet Delay Variations
  - Performance of time aware network devices
  - Different paths upstream and downstream
  - Highly asymmetric network loading
- Configuration dependent
  - Message rates

## What is accuracy?





Figure: four targets illustrating accuracy versus precision:

- Not accurate, low precision
- Not accurate, high precision
- Accurate, low precision
- Accurate, **high precision**
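The accuracy versus precision distinction can be expressed numerically: the mean of the measured clock offsets reflects accuracy, and their spread reflects precision. The sketch below uses made-up offset samples purely for illustration; `accuracy_and_precision` is a hypothetical helper name.

```python
import statistics


def accuracy_and_precision(offsets_ns: list[float]) -> tuple[float, float]:
    """Return (mean offset, standard deviation) of clock offsets in nanoseconds.

    A mean near zero indicates accuracy; a small spread indicates precision.
    """
    return statistics.fmean(offsets_ns), statistics.pstdev(offsets_ns)


if __name__ == "__main__":
    # Made-up samples for illustration only.
    samples = {
        "not accurate, high precision": [500.0, 502.0, 498.0, 501.0, 499.0],
        "accurate, low precision": [-400.0, 350.0, -50.0, 300.0, -200.0],
        "accurate, high precision": [2.0, -1.0, 0.0, 1.0, -2.0],
    }
    for label, offsets in samples.items():
        mean, spread = accuracy_and_precision(offsets)
        print(f"{label}: mean={mean:.1f} ns, spread={spread:.1f} ns")
```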





## Timing 101: End to End time transfer

End-to-end time transfer is the sum ($\Sigma$) of every time transfer step from the reference clock to the application (userspace) representation.







## **Basic Principles of PTP**
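As a minimal sketch of the delay request-response exchange in IEEE 1588, the client combines four timestamps to estimate its offset from the master and the mean path delay. The function name and the sample timestamps are illustrative assumptions; the calculation assumes the forward and reverse paths have equal delay, which is exactly what path asymmetry violates.

```python
def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[float, float]:
    """Compute (offset_from_master, mean_path_delay) in nanoseconds.

    t1: Sync sent by the master (master clock)
    t2: Sync received by the client (client clock)
    t3: Delay_Req sent by the client (client clock)
    t4: Delay_Req received by the master (master clock)
    Assumes the forward and reverse path delays are equal.
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2
    return offset, mean_path_delay


if __name__ == "__main__":
    # Illustrative scenario: the client clock is 1500 ns ahead of the master.
    true_offset = 1500
    t1 = 1_000_000
    t2 = t1 + 800 + true_offset          # 800 ns downstream delay
    t3 = t2 + 10_000                     # client replies 10 us later
    t4 = (t3 - true_offset) + 800        # symmetric: 800 ns upstream delay
    print(ptp_offset_and_delay(t1, t2, t3, t4))

    # Asymmetric path: 800 ns downstream but 1000 ns upstream.
    t4_asym = (t3 - true_offset) + 1000
    print(ptp_offset_and_delay(t1, t2, t3, t4_asym))
```

In the asymmetric case the computed offset is wrong by half of the 200 ns asymmetry, which is why different upstream and downstream paths limit achievable accuracy.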



## **Delivering Consistent Timing** Challenges To Be Overcome

- OS timing capabilities
- Servo configuration & implementation
- NIC/CPU/memory alignment with the PTP process
- OS noise & CPU interrupts: jitter into the PTP stack
- Hardware timestamping resolution & jitter under load
- Target is performance dependent (i.e., accuracy)
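To give a feel for what "servo configuration" involves, here is a toy simulation of a proportional-integral (PI) servo steering a drifting clock once per second. This is a sketch under stated assumptions: the function name, the gains `kp` and `ki`, and the drift values are placeholders, and real PTP stacks may use different servo designs and tuning.

```python
def run_pi_servo(initial_offset_ns: float, drift_ns_per_s: float,
                 kp: float = 0.7, ki: float = 0.3, steps: int = 60) -> list[float]:
    """Simulate a PI servo correcting a clock once per second.

    Returns the offset (ns) observed at each step.
    """
    offset = initial_offset_ns
    integral = 0.0
    history = []
    for _ in range(steps):
        history.append(offset)
        integral += ki * offset              # integral term learns the frequency error
        correction = kp * offset + integral  # frequency adjustment in ns per second
        # Over the next 1 s interval the clock drifts, minus the applied correction.
        offset += drift_ns_per_s - correction
    return history


if __name__ == "__main__":
    history = run_pi_servo(initial_offset_ns=1000.0, drift_ns_per_s=100.0)
    for step in (0, 1, 5, 10, 30):
        print(f"step {step:2d}: offset {history[step]:10.3f} ns")
```

With these placeholder gains the offset oscillates briefly and then settles near zero, while the integral term converges on the clock's frequency error.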







# Software vs. Hardware timestamping





- Software timestamping (TS, Clock & PTP): does not provide high accuracy or deterministic behaviour (10 to 100 microseconds) because of system noise, latency and scheduling.
- Hardware timestamping (TS, PHC vs. PTP): pulls timestamps as close as possible to the MAC with minimal overhead (sub-10 ns in modern implementations).



## **Timestamping capabilities**

| Field | Device #1 | Device #2 |
|------------------|------------------|------------------|
| Device Type | ConnectX7 | ConnectX6DX |
| Part Number | MCX713106AS-C | MCX623106TC-CDA_Ax |
| Description | NVIDIA ConnectX-7 HHHL Adapter Card; 100GbE; Dual-port QSFP112; PCIe 5.0 x16; Crypto Disabled; Secure Boot Enabled | ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; Enhanced-SyncE & PTP GM support; PPS In/Out; PCIe 4.0 x16; Crypto and Secure Boot |
| PSID | MT_0000000843 | MT_0000000761 |
| PCI Device Name | /dev/mst/mt4129_pciconf0 | /dev/mst/mt4125_pciconf0 |
| Base GUID | 946dae0300088e6e | 946dae03000abbca |
| Base MAC | 946dae088e6e | 946dae0abbca |
| FW (current) | 28.37.1014 | 22.37.1014 |
| PXE (current) | 3.7.0102 | 3.7.0102 |
| UEFI (current) | 14.30.0013 | 14.30.0013 |
| Available versions | N/A | N/A |
| Status | No matching image found | No matching image found |



Output of `sudo ethtool -T enp6s0f0np0` (time stamping parameters for enp6s0f0np0):

- Capabilities: hardware-transmit, hardware-receive, hardware-raw-clock
- PTP Hardware Clock: 2
- Hardware Transmit Timestamp Modes: off, on
- Hardware Receive Filter Modes: none, all
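A check like the one above can be automated. The following sketch parses `ethtool -T` text to see whether hardware timestamping is advertised and which PTP Hardware Clock (PHC) index the interface uses. The helper name `parse_timestamping_info` is hypothetical, and the sample string simply mirrors the output shown above; in practice the text would come from running the command on the host.

```python
import re


def parse_timestamping_info(ethtool_output: str) -> dict:
    """Extract hardware timestamping support and the PHC index from `ethtool -T` text."""
    phc = re.search(r"PTP Hardware Clock:\s*(\d+)", ethtool_output)
    return {
        "hw_tx": "hardware-transmit" in ethtool_output,
        "hw_rx": "hardware-receive" in ethtool_output,
        "hw_raw_clock": "hardware-raw-clock" in ethtool_output,
        "phc_index": int(phc.group(1)) if phc else None,
    }


if __name__ == "__main__":
    sample = (
        "Time stamping parameters for enp6s0f0np0: "
        "Capabilities: hardware-transmit hardware-receive hardware-raw-clock "
        "PTP Hardware Clock: 2 "
        "Hardware Transmit Timestamp Modes: off on "
        "Hardware Receive Filter Modes: none all"
    )
    info = parse_timestamping_info(sample)
    print(info)
    if info["phc_index"] is not None:
        # On Linux the PHC index typically maps to a /dev/ptpN character device.
        print(f"PHC device would typically be /dev/ptp{info['phc_index']}")
```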

# **Timestamping capabilities**

Figure: offset distribution in nanoseconds.





# **PTP Profiles across Industries**

| Industry | Application | Specification |
|----------------------------|----------------------------|----------------------------|
| Telecom & Mobile | Sync for 2G/3G/4G/5G base stations & fronthaul networks | ITU-T G.8265.1<br>ITU-T G.8275.1, G.8275.2 |
| Professional Audio/Video | Sync for video/audio feeds between sources and receivers | SMPTE ST 2059-2 |
| Power | Sync for substation sampled values, synchrophasor, power protection | IEEE C37.238-2017<br>IEC 61850-9-3 & IEC 62493-2 Annex A.2 |
| Audio/Video, Industrial, Automation, Automotive | Sync of A/V applications with high QoS/QoE demand and time sensitive networks | IEEE Std 802.1AS-2020 |
| Industrial Automation | Sync for industrial plants, machine-to-machine real-time control | IEC 62439-3 Annex B<br>IEC 62439-3 Annex C |
| Enterprise/Financial | Sync of time tagged and packet latency measurements | draft-ietf-tictoc-ptp-enterprise-profile |
| Data Center | Sync for time-sensitive applications within data center | OCP DC PTP Profile #1 |



## **"Time Sync Service" Reference Model**

### <u>Time Reference Layer:</u>

- Rooftop antennas, GPS system
- Open Time Server (OTS) (aka GM)

### <u>Network Fabric Layer:</u>

- Large set of PTP-aware switches
  - e.g., Transparent Clock (TC)

### <u>Server Layer:</u>

- Very large set of server machines
- End applications requiring time
- HW timestamping











## **Time Error Budget**



## **OCP-TAP DC PTP Profile #1** Key values

| PTP Attributes | PTP Profile Value |
|------------------------|---------------------------------|
| Company ID | 7A-4D-2F (OCP) |
| Clock types | GM, E2E TC, OC |
| Network transport | IPv6 (mandatory)<br>IPv4 (recommended)<br>Highest class of service |
| Messages & Rates | Announce {0, -4}, Sync {+3, -7}, Follow\_Up, Delay\_Req/Delay\_Resp {0, -7}<br>Signaling, Management |
| Path delay measurement | Delay Request-Response mechanism |
| Domain Number | 0 |
| Clock Operations | One-step and Two-step for GM, OC<br>One-step for TC (mandatory)<br>Two-step for TC (not recommended) |
| Network Communication | Unicast discovery & Unicast negotiation<br>Multicast is prohibited |
| Clock Class | 6 (traceable)<br>7 (holdover, within spec)<br>52 (holdover, out of spec) |
| A-BMCA | Active-Active<br>Active-Standby |





# In Conclusion

- The nanosecond scale world is fascinating!
- Builds upon IEEE 1588 Precision Time Protocol
- Tuned for DC applications in OCP-TAP
- Enables new applications
- Improves current applications
- Delivers reliable time synchronization as a DC service







