Choosing a CUDA Toolkit (Tools & Utilities) Version

For the deliverables I produce for clients as Office Saito, I use CUDA Toolkit 3.2. The reason: many of the requirements I receive still depend on 3.2.
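Keeping 3.2 deliverables while also working with 4.0 is easier when the code checks which toolkit it was built with. The following is a minimal sketch, not code from this post: it prints the versions reported by `cudaRuntimeGetVersion()` and `cudaDriverGetVersion()`, and uses the `CUDART_VERSION` macro to compile a 4.0-only path separately from a pre-4.0 fallback. The helpers `cuda_major` and `cuda_minor` are hypothetical names for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers: CUDA encodes versions as 1000*major + 10*minor
// (e.g. 3020 = 3.2, 4000 = 4.0).
int cuda_major(int v) { return v / 1000; }
int cuda_minor(int v) { return (v % 1000) / 10; }

int main() {
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);
    cudaDriverGetVersion(&driverVersion);   // stays 0 if no driver is installed
    printf("Runtime %d.%d, driver supports up to %d.%d\n",
           cuda_major(runtimeVersion), cuda_minor(runtimeVersion),
           cuda_major(driverVersion), cuda_minor(driverVersion));

#if CUDART_VERSION >= 4000
    // Path that may use 4.0 features (e.g. cudaHostRegister).
    printf("Built with CUDA 4.0 or later\n");
#else
    // Fallback path for 3.2 deliverables.
    printf("Built with a pre-4.0 toolkit\n");
#endif
    return 0;
}
```

Built with the 3.2 toolkit, `CUDART_VERSION` is 3020 and only the fallback branch is compiled; with 4.0 it is 4000.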

That said, I do not avoid 4.0 entirely. Below is the description of CUDA Toolkit 4.0:

http://developer.nvidia.com/cuda-toolkit-40

Easier Application Porting

Share GPUs across multiple threads
Use all GPUs in the system concurrently from a single host thread
No-copy pinning of system memory, a faster alternative to cudaMallocHost()
C++ new/delete and support for virtual functions
Support for inline PTX assembly
Thrust library of templated performance primitives such as sort, reduce, etc.
NVIDIA Performance Primitives (NPP) library for image/video processing
Layered Textures for working with same size/format textures at larger sizes and higher performance
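The "no-copy pinning" item refers to `cudaHostRegister()`: instead of allocating a new page-locked buffer with `cudaMallocHost()` and copying data into it, an existing `malloc()` buffer is page-locked in place. A minimal sketch, assuming a CUDA 4.0 or later toolkit and a single GPU; the kernel `scale` and the helper `scale_pinned` are hypothetical names. In the 4.0 release the registered range may need to be page-aligned, so check the reference for the toolkit version you target.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Multiplies every element by `factor` on the GPU.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: pins an existing host buffer with cudaHostRegister,
// runs `scale` on it, writes the result back into the same buffer,
// and returns 0 on success.
int scale_pinned(float* host, int n, float factor) {
    size_t bytes = n * sizeof(float);
    // Page-lock the malloc'd memory in place; no second host buffer is needed.
    if (cudaHostRegister(host, bytes, cudaHostRegisterDefault) != cudaSuccess)
        return 1;

    float* dev = NULL;
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err == cudaSuccess) {
        // Transfers from pinned memory can use DMA directly.
        err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        if (err == cudaSuccess) {
            scale<<<(n + 255) / 256, 256>>>(dev, n, factor);
            err = cudaGetLastError();      // catch launch errors
        }
        if (err == cudaSuccess)
            err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
        cudaFree(dev);
    }
    cudaHostUnregister(host);              // unpin before the caller calls free()
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    const int n = 1 << 20;
    float* h = (float*)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    int rc = scale_pinned(h, n, 2.0f);
    printf("rc=%d h[0]=%f\n", rc, h[0]);
    free(h);
    return rc;
}
```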

Faster Multi-GPU Programming

Unified Virtual Addressing
GPUDirect v2.0 support for Peer-to-Peer Communication
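The multi-GPU items (one host thread driving every GPU, Unified Virtual Addressing, and GPUDirect v2.0 peer-to-peer) can be sketched together as below. Assumptions: at least two GPUs, a 64-bit host so that UVA is active, and the hypothetical helper `peer_copy_demo`. `cudaMemcpyPeer()` copies between the two devices; if peer access cannot be enabled, the runtime stages the copy through host memory. The final read-back uses `cudaMemcpyDefault`, which relies on UVA to infer the copy direction from the pointer values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fills a buffer on GPU 0, copies it to GPU 1 with
// cudaMemcpyPeer, and returns the first int read back from GPU 1
// (-1 if fewer than two GPUs are present).
int peer_copy_demo(int n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return -1;

    size_t bytes = n * sizeof(int);
    int* buf[2];
    for (int d = 0; d < 2; ++d) {       // one host thread switches between GPUs
        cudaSetDevice(d);
        cudaMalloc(&buf[d], bytes);
    }

    cudaSetDevice(0);
    cudaMemset(buf[0], 0x01, bytes);    // every int becomes 0x01010101

    // Let GPU 1 access GPU 0 directly when the hardware supports it.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);
    cudaSetDevice(1);
    if (canAccess) cudaDeviceEnablePeerAccess(0, 0);

    cudaMemcpyPeer(buf[1], 1, buf[0], 0, bytes);

    // With UVA, cudaMemcpyDefault infers device-to-host from the pointers.
    int first = 0;
    cudaMemcpy(&first, buf[1], sizeof(int), cudaMemcpyDefault);

    if (canAccess) cudaDeviceDisablePeerAccess(0);
    for (int d = 0; d < 2; ++d) {
        cudaSetDevice(d);
        cudaFree(buf[d]);
    }
    return first;
}

int main() {
    int first = peer_copy_demo(1 << 20);
    if (first < 0) printf("Need at least two GPUs\n");
    else printf("GPU 1 received 0x%08x\n", (unsigned)first);
    return 0;
}
```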

New & Improved Developer Tools

Automated Performance Analysis in Visual Profiler
C++ debugging in CUDA-GDB for Linux and MacOS
GPU binary disassembler for Fermi architecture (cuobjdump)
Parallel Nsight 2.0 now available for Windows developers with new debugging and profiling features.

Check out the NEW CUDA 4.0 Math Library Performance Review
Find all the latest versions of other Libraries and Tools on our Tools & EcoSystem Page

The latest released NVIDIA Drivers are always available at www.nvidia.com/drivers
For previous releases, see the CUDA Toolkit Release Archive
Get yourself fully trained: check out the latest CUDA Webinars
Become a CUDA Registered Developer, report bugs, engage with NVIDIA engineering

＝＝＝

CUDA 4.0 Library Performance Overview

The performance of many math functions has improved with the release of the CUDA 4.0 Toolkit.

This presentation includes the performance results of many of the key functions.

Results include performance measurements for:

cuFFT – Fast Fourier Transforms Library
cuBLAS – Complete BLAS Library
cuSPARSE – Sparse Matrix Library
cuRAND – Random Number Generation (RNG) Library
NPP – Performance Primitives for Image & Video Processing
Thrust – Templated Parallel Algorithms & Data Structures
math.h – C99 floating-point Library
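Thrust, listed above and in the 4.0 feature list ("sort, reduce, etc."), can be tried with a few lines. A minimal sketch, assuming a CUDA 4.0 or later toolkit and one GPU; `sort_and_sum` is a hypothetical helper, not code from the performance review.

```cuda
#include <cstdio>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

// Hypothetical helper: sorts the values on the GPU, copies the sorted
// result back into `values`, and returns their sum.
int sort_and_sum(thrust::host_vector<int>& values) {
    thrust::device_vector<int> d = values;              // host -> device copy
    thrust::sort(d.begin(), d.end());                   // parallel sort on the GPU
    int sum = thrust::reduce(d.begin(), d.end(), 0, thrust::plus<int>());
    thrust::copy(d.begin(), d.end(), values.begin());   // device -> host copy
    return sum;
}

int main() {
    thrust::host_vector<int> h(5);
    h[0] = 5; h[1] = 3; h[2] = 9; h[3] = 1; h[4] = 7;
    int sum = sort_and_sum(h);
    printf("sum=%d first=%d last=%d\n", sum, (int)h[0], (int)h[4]);
    return 0;
}
```

For the input in `main`, the sum is 25 and `h` ends up holding 1, 3, 5, 7, 9.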

CUDA_4 0_Math_Libraries_Performance_6_14.pdf

A review of the performance of CUDA 4.0 Math Libraries, including cuFFT, cuBLAS, cuSPARSE, cuRAND, NPP, Thrust and others

That's all.


Posted by 斉藤之雄 (Yukio Saito)

Global Information and Communication Technology OTAKU / Sports volunteer / Social Services / Master of Technology in Innovation for Design and Engineering, AIIT / BA, Social Welfare, NFU / twitter@yukio_saitoh

Written by 斉藤之雄
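・Senior presales consultant at the world's largest ICT distributor (strengths in Data and AI across multi-cloud)
・Tokyo Olympics Field Cast (MED/FR)
・Tokyo Paralympics marathon course support leader
・Certified Social Worker (license registered)
・Certified sports instructor for persons with disabilities, registered with the Tokyo Metropolitan Government
・Sports Promotion Commissioner, Nakano City, Tokyo (part-time public servant)
・AWS Certified Solutions Architect - Associate (2021-2024)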

■Microsoft MCP certifications earned
・AZ-700(Mar/2022)★★
・MS-720 (Feb/2022)★★
・AZ-204 (Feb/2022)★★
・DA-100 (Dec/2021)★★
・Azure DevOps Engineer Expert (Dec/2021) ★★★
・AZ-400 (Dec/2021)★★★
・AZ-600 (Dec/2021)★★
・PL-200 (Oct/2021)★★
・AZ-140 (Oct/2021)★★
・SC-300 (Oct/2021)★★
・AZ-104 (Sep/2021)★★
・Azure Solutions Architect Expert (Sep/2021) ★★★
・AZ-304 (Sep/2021) ★★★
・MB-920 (Sep/2021) ★
・AZ-303 (Aug/2021) ★★★
・MS-900 (Aug/2021) ★
・SC-900 (Jul/2021) ★
・PL-900 (Jul/2021) ★
・AI-102 (Jul/2021) ★★
・DP-900 (Jun/2021) ★
・MB-901 (Jun/2021) ★
・AI-900 (May/2021) ★
・AZ-900 (Apr/2021) ★
—
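■Academic society memberships
・The Robotics Society of Japan
・The Japanese Society for Artificial Intelligence
・The Society of Instrument and Control Engineers
・Japan Productivity Center (individual supporting member)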
—
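■What I want to achieve in the future (Social Action)
・Employment support for people with disabilities (regardless of whether they hold a disability certificate)
・Employment support for adults with learning disabilities that are hard to notice
・Employment support for people whose native language is not Japanese
・A growing, inclusive society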
—
Main machine at home: IdeaPad Gaming 3 series
