Post on 06-Apr-2018
8/2/2019 Seminar Presentation - CUDA
CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through variants of industry-standard programming languages.
The GPU is the chip in computer video cards, the PlayStation 3, the Xbox, etc.
Two major vendors: NVIDIA and ATI (now AMD)
. GPU Computing with CUDA brings parallel
computing to the masses.
. Data-parallel supercomputers are everywhere!
. CUDA makes this power accessible
Applications:
High arithmetic intensity:
Dense linear algebra, PDEs, n-body, finite difference
High bandwidth:
Sequencing (virus scanning, genomics), sorting, database
Visual computing:
Graphics, image processing, tomography, machine vision
` Compute Unified Device Architecture
` For parallel computing; developed by NVIDIA
` Co-designed hardware & software for direct GPU computing
` Hardware: fully general data-parallel architecture
` General thread launch
` Global load-store
` Parallel data cache
` Scalar architecture
` Integers, bit operations
` Double precision (shortly)
` Thread : The smallest unit executing an instruction.
` Block : Contains several threads.
` Warp : A group of threads physically executed in parallel, executing the same instruction in lockstep.
` Grid : Contains several thread blocks.
` Kernel : A function, launched from the host, that runs on the GPU.
` Device : The GPU.
` Host : The CPU.
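A minimal sketch shows how these terms fit together (illustrative names; assumes a CUDA-capable device and nvcc; unified memory via cudaMallocManaged is used for brevity and requires a newer CUDA version):

```cuda
#include <cstdio>
#include <cassert>

// Kernel: a function, marked __global__, that runs on the device (GPU).
__global__ void add(const float *a, const float *b, float *c, int n)
{
    // The grid contains blocks; each block contains threads.
    // blockIdx, blockDim and threadIdx locate this thread in the grid,
    // and the hardware issues each block's threads in warps of 32.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard: the grid may overshoot n
        c[i] = a[i] + b[i];
}

int main()                           // host (CPU) code
{
    const int n = 1024;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2.0f * i; }

    // Launch a grid of 4 blocks x 256 threads: one thread per element.
    add<<<4, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();         // wait for the device to finish

    assert(c[10] == 30.0f);          // 10 + 20
    printf("ok\n");
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```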
> Expose as much parallelism as possible
> Optimize memory usage for maximum bandwidth
> Maximize occupancy to hide latency
> Optimize instruction usage for maximum throughput
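One common way to expose parallelism and keep enough resident threads to hide latency is the grid-stride loop pattern (a sketch; the SAXPY kernel, sizes, and launch configuration are illustrative choices, and cudaMallocManaged assumes a newer CUDA version):

```cuda
#include <cstdio>
#include <cassert>

// Grid-stride loop: each thread handles several elements, so a fixed
// grid covers any n, and many resident threads hide memory latency.
__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int stride = gridDim.x * blockDim.x;   // total threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // 256 x 256 = 65536 threads for 2^20 elements: each loops 16 times.
    saxpy<<<256, 256>>>(3.0f, x, y, n);
    cudaDeviceSynchronize();

    assert(y[0] == 5.0f && y[n - 1] == 5.0f);  // 3*1 + 2
    printf("ok\n");
    cudaFree(x); cudaFree(y);
    return 0;
}
```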
` Each thread can:
Read/write per-block on-chip shared memory
Read per-grid cached constant memory
Read/write non-cached device memory:
Per-grid global memory
Per-thread local memory
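Each of these memory spaces has its own qualifier in CUDA C. A small sketch touching all of them (the smoothing kernel and its coefficients are invented for illustration):

```cuda
#include <cstdio>
#include <cassert>

__constant__ float coeff[4];        // per-grid, cached constant memory

__global__ void smooth(const float *in, float *out, int n)
{
    // Per-block, on-chip shared memory: visible to all threads in a block.
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;  // read from per-grid global memory
    tile[threadIdx.x] = v;             // stage it in fast shared memory
    __syncthreads();                   // wait until the whole tile is loaded

    float acc = 0.0f;                  // per-thread register/local storage
    for (int k = 0; k < 4; ++k)
        acc += coeff[k] * tile[threadIdx.x];  // constant + shared reads
    if (i < n)
        out[i] = acc;                  // write back to global memory
}

int main()
{
    const float h_coeff[4] = {0.25f, 0.25f, 0.25f, 0.25f};
    cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));

    const int n = 256;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 4.0f;

    smooth<<<1, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    assert(out[0] == 4.0f);            // 4 * (0.25 * 4)
    printf("ok\n");
    cudaFree(in); cudaFree(out);
    return 0;
}
```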
Basic Strategies
` Processing data is cheaper than moving it around
> Especially for GPUs, as they devote many more transistors to ALUs than memory
` And it will be increasingly so
> The less memory-bound a kernel is, the better it will scale with future GPUs
` So you want to:
> Maximize use of low-latency, high-bandwidth memory
> Optimize memory access patterns to maximize bandwidth
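The key access-pattern optimization is coalescing: when consecutive threads in a warp touch consecutive addresses, the hardware merges them into a few wide memory transactions. A sketch on a row-major matrix (matrix size, kernel name, and scale factor are illustrative):

```cuda
#include <cstdio>
#include <cassert>

// Scale a W x H row-major matrix. Threads in a warp have consecutive
// threadIdx.x, so using x as the *column* index makes a warp touch
// consecutive addresses: the accesses coalesce into wide transactions.
__global__ void scale_coalesced(float *m, int w, int h, float f)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < w && y < h)
        m[y * w + x] *= f;   // neighbours in x -> neighbours in memory
}

// Swapping the roles (x as the row index, m[x * w + y]) would make each
// warp stride w floats apart: same result, far less effective bandwidth.

int main()
{
    const int w = 64, h = 64;
    float *m;
    cudaMallocManaged(&m, w * h * sizeof(float));
    for (int i = 0; i < w * h; ++i) m[i] = 1.0f;

    dim3 block(16, 16), grid(w / 16, h / 16);
    scale_coalesced<<<grid, block>>>(m, w, h, 3.0f);
    cudaDeviceSynchronize();

    assert(m[0] == 3.0f && m[w * h - 1] == 3.0f);
    printf("ok\n");
    cudaFree(m);
    return 0;
}
```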
1. Copy data from main memory to GPU memory
2. The CPU instructs the GPU to start processing (launches a kernel)
3. The GPU executes it in parallel on each core
4. Copy the result from GPU memory to main memory
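The four steps above map directly onto the explicit-copy CUDA runtime calls. A minimal sketch (the squaring kernel and array size are illustrative):

```cuda
#include <cstdio>
#include <cassert>

__global__ void square(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * d[i];
}

int main()
{
    const int n = 256;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // 1. Copy data from main memory to GPU memory
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // 2. The CPU launches the kernel on the GPU
    square<<<1, n>>>(d, n);

    // 3. The GPU executes it in parallel, one thread per element
    //    (the launch is asynchronous; the copy below waits for it)

    // 4. Copy the result from GPU memory back to main memory
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    assert(h[3] == 9.0f);
    printf("ok\n");
    cudaFree(d);
    return 0;
}
```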
Provide ability to run code on GPU
Manage resources
Partition data to fit on cores
Schedule blocks to cores
` The programming interface of CUDA applications is based on the standard C language with extensions, which eases the learning curve
` CUDA exposes a fast shared memory region (16 KB per multiprocessor) shared between threads, usable as a user-managed cache with higher bandwidth than texture lookups
` More efficient data transfers between system and video memory
` No need for graphics APIs with their redundancy and overheads
` Linear memory addressing, gather and scatter: code can read from and write to arbitrary addresses in memory
` Faster downloads and readbacks to and from the GPU
` Full hardware support for integer and bitwise operations, including integer texture lookups
` Up to 512 CUDA cores and 3.0 billion transistors
` NVIDIA Parallel DataCache technology
` NVIDIA GigaThread engine
` ECC memory support
` Native support for Visual Studio
` Accelerated rendering of 3D graphics
` Real-time cloth simulation (OptiTex.com)
` Distributed calculations, such as predicting the native conformation of proteins
` Medical analysis simulations, for example virtual reality based on CT and MRI scan images
` Physical simulations, in particular in fluid dynamics
` Environment statistics
` Accelerated encryption, decryption and compression
` Accelerated interconversion of video file formats
` Artificial intelligence
APPLICATIONS
Ultrasound Scanning
` GPU Electromagnetic Field simulation
` Cell phone irradiation
` MRI Design / Modeling
` Printed Circuit Boards
` Radar Cross Section (Military)
` Seismic Migration
` 8X Faster than Quad Core alone
` No recursive functions (device-side recursion was later added for GPUs of compute capability 2.0 and above)
` Minimum efficient block size of 32 threads (one warp)
` Closed architecture: CUDA belongs to NVIDIA
CUDA is a powerful parallel programming model
Heterogeneous - mixed serial-parallel programming
Scalable - hierarchical thread execution model
Accessible - minimal but expressive changes to C
Interoperable - simple graphics interop mechanisms
CUDA is an attractive platform
Broad - OpenGL, DirectX, WinXP, Vista, Linux, MacOS
Widespread - over 85M CUDA GPUs, 60K CUDA developers
CUDA provides tremendous scope for innovative graphics research beyond programmable shading