Nvidia Ampere Microarchitecture


An Nvidia Ampere Microarchitecture is a GPU microarchitecture that Nvidia announced in 2020 as the successor to its Volta and Turing microarchitectures.



References

2023

  • (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Ampere_(microarchitecture) Retrieved:2023-5-8.
    • Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère. Nvidia announced the Ampere architecture GeForce 30 series consumer GPUs at a GeForce Special Event on September 1, 2020. Nvidia announced the A100 80GB GPU at SC20 on November 16, 2020. Mobile RTX graphics cards and the RTX 3060 based on the Ampere architecture were revealed on January 12, 2021.

      Nvidia announced Ampere's successor, Hopper, at GTC 2022, and "Ampere Next Next" for a 2024 release at GPU Technology Conference 2021.

2023

Chips

  • GA100[1]
  • GA102
  • GA103
  • GA104
  • GA106
  • GA107

Comparison of Compute Capability: GP100 vs GV100 vs GA100[2]

| GPU features | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 |
|---|---|---|---|
| GPU codename | GP100 | GV100 | GA100 |
| GPU architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere |
| Compute capability | 6.0 | 7.0 | 8.0 |
| Threads / warp | 32 | 32 | 32 |
| Max warps / SM | 64 | 64 | 64 |
| Max threads / SM | 2048 | 2048 | 2048 |
| Max thread blocks / SM | 32 | 32 | 32 |
| Max 32-bit registers / SM | 65536 | 65536 | 65536 |
| Max registers / block | 65536 | 65536 | 65536 |
| Max registers / thread | 255 | 255 | 255 |
| Max thread block size | 1024 | 1024 | 1024 |
| FP32 cores / SM | 64 | 64 | 64 |
| Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024 |
| Shared memory size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB |
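The per-SM limits in this table interact: the number of thread blocks that can be resident on one SM is bounded by the warp limit, the register file, and the block limit at the same time. The following is a minimal sketch of that arithmetic in Python, using only the A100 column. The names `A100_LIMITS`, `resident_blocks_per_sm`, and `occupancy` are hypothetical, and the model is deliberately simplified: it ignores shared memory usage and register allocation granularity, both of which affect real hardware.

```python
# Per-SM limits copied from the A100 (compute capability 8.0) column.
# The dictionary and function names are illustrative, not an Nvidia API.
A100_LIMITS = {
    "threads_per_warp": 32,
    "max_warps_per_sm": 64,
    "max_blocks_per_sm": 32,
    "max_registers_per_sm": 65536,
    "max_registers_per_thread": 255,
    "max_block_size": 1024,
}


def _warps_per_block(block_size, limits):
    # A block occupies whole warps, so round the warp count up.
    return -(-block_size // limits["threads_per_warp"])


def resident_blocks_per_sm(block_size, registers_per_thread, limits=A100_LIMITS):
    """Estimate how many thread blocks fit on one SM (simplified model)."""
    if not 1 <= block_size <= limits["max_block_size"]:
        raise ValueError("block size outside the supported range")
    if not 1 <= registers_per_thread <= limits["max_registers_per_thread"]:
        raise ValueError("registers per thread outside the supported range")

    warps = _warps_per_block(block_size, limits)
    regs_per_block = warps * limits["threads_per_warp"] * registers_per_thread

    by_warps = limits["max_warps_per_sm"] // warps
    by_regs = limits["max_registers_per_sm"] // regs_per_block
    return min(by_warps, by_regs, limits["max_blocks_per_sm"])


def occupancy(block_size, registers_per_thread, limits=A100_LIMITS):
    """Fraction of the SM's warp slots filled by resident blocks."""
    blocks = resident_blocks_per_sm(block_size, registers_per_thread, limits)
    return blocks * _warps_per_block(block_size, limits) / limits["max_warps_per_sm"]
```

Under this model, 256-thread blocks at 32 registers per thread fill the SM (8 blocks, occupancy 1.0), while doubling to 64 registers per thread halves the estimate to 4 blocks because the 65536-register file becomes the binding limit.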

Comparison of Precision Support Matrix[3][4]

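Supported CUDA Core Precisions

| GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
|---|---|---|---|---|---|---|---|---|
| NVIDIA Tesla P4 | No | Yes | Yes | No | No | Yes | No | No |
| NVIDIA P100 | Yes | Yes | Yes | No | No | No | No | No |
| NVIDIA Volta | Yes | Yes | Yes | No | No | Yes | No | No |
| NVIDIA Turing | Yes | Yes | Yes | No | No | Yes | No | No |
| NVIDIA A100 | Yes | Yes | Yes | No | No | Yes | No | Yes |

Supported Tensor Core Precisions

| GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
|---|---|---|---|---|---|---|---|---|
| NVIDIA Tesla P4 | No | No | No | No | No | No | No | No |
| NVIDIA P100 | No | No | No | No | No | No | No | No |
| NVIDIA Volta | Yes | No | No | No | No | No | No | No |
| NVIDIA Turing | Yes | No | No | Yes | Yes | Yes | No | No |
| NVIDIA A100 | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |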

Legend:

  • FPnn: floating point with nn bits
  • INTn: integer with n bits
  • INT1: binary
  • TF32: TensorFloat32
  • BF16: bfloat16
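To make the TF32 and BF16 entries in the legend concrete: both formats keep FP32's 8-bit exponent but store fewer mantissa bits (10 for TF32, 7 for BF16, versus 23 for FP32). The sketch below illustrates the resulting precision loss in Python by clearing the low mantissa bits of an FP32 value. The function name `truncate_mantissa` is hypothetical, and plain truncation is an assumption made for simplicity; actual hardware conversions may round rather than truncate.

```python
import struct

# Explicit mantissa widths. All three formats share 1 sign bit and
# an 8-bit exponent, so they cover the same numeric range.
MANTISSA_BITS = {"FP32": 23, "TF32": 10, "BF16": 7}


def truncate_mantissa(value, fmt):
    """Return value with its FP32 mantissa truncated to fmt's width.

    Illustrative only: this rounds toward zero, which is not
    necessarily how a GPU converts between these formats.
    """
    drop = 23 - MANTISSA_BITS[fmt]
    # Reinterpret the float as its 32-bit IEEE 754 pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Clear the low `drop` mantissa bits.
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    (result,) = struct.unpack("<f", struct.pack("<I", bits))
    return result
```

For example, `1 + 2**-10` survives truncation to TF32 unchanged because it needs exactly 10 mantissa bits, but it collapses to `1.0` in BF16, which keeps only 7.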

Comparison of Decode Performance

| Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30) |
|---|---|---|---|
| V100 | 16 | 22 | 22 |
| A100 | 75 | 157 | 108 |
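The relative gain per codec follows directly from the stream counts above. A small Python sketch of that arithmetic, where `DECODE_STREAMS` and `a100_speedup` are hypothetical names introduced only for this illustration:

```python
# Concurrent 1080p30 decode streams, copied from the comparison above.
DECODE_STREAMS = {
    "H.264": {"V100": 16, "A100": 75},
    "H.265 (HEVC)": {"V100": 22, "A100": 157},
    "VP9": {"V100": 22, "A100": 108},
}


def a100_speedup(codec):
    """Ratio of A100 to V100 concurrent streams for one codec."""
    row = DECODE_STREAMS[codec]
    return row["A100"] / row["V100"]


if __name__ == "__main__":
    for codec in DECODE_STREAMS:
        print(f"{codec}: {a100_speedup(codec):.1f}x")
```

By these figures the A100 handles roughly 4.7 times as many H.264 streams, 7.1 times as many H.265 streams, and 4.9 times as many VP9 streams as the V100.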