CUDA - Wikipedia, the free encyclopedia (retrieved 8/6/2015)
https://en.wikipedia.org/wiki/CUDA
CUDA

A parallel computing platform and programming model

Developer(s): NVIDIA Corporation
Initial release: June 23, 2007
Stable release: 7.0 / March 17, 2015
Operating system: Windows XP and later, Mac OS X, Linux
Platform: Supported GPUs
Type: GPGPU
License: Freeware
Website: www.nvidia.com/object/cuda_home_new.html

From Wikipedia, the free encyclopedia
CUDA, which stands for Compute Unified Device Architecture,[1] is a parallel computing platform and application programming interface (API) model created by NVIDIA.[2] It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach known as GPGPU. The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements.[3]

The CUDA platform is designed to work with programming languages such as C, C++ and Fortran. This accessibility makes it easier for specialists in parallel programming to utilize GPU resources, as opposed to previous API solutions like Direct3D and OpenGL, which required advanced skills in graphics programming. CUDA also supports programming frameworks such as OpenACC and OpenCL.[3]
Contents

1 Background
2 Programming capabilities
3 Advantages
4 Limitations
5 Supported GPUs
6 Version features and specifications
7 Example
8 Language bindings
9 Current and future usages of CUDA architecture
10 See also
11 References
12 External links
Example of CUDA processing flow:
1. Copy data from main memory to GPU memory
2. CPU instructs the GPU to start the process
3. GPU executes in parallel in each core
4. Copy the result from GPU memory to main memory
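The four-step processing flow above can be sketched in CUDA C/C++. The kernel name, array size and the doubling operation here are illustrative, not from the article; the program requires the CUDA toolkit and a CUDA-capable GPU to build and run:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: each thread doubles one element.
__global__ void double_elements(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float* dev;
    cudaMalloc(&dev, n * sizeof(float));                               // allocate GPU memory
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);  // step 1: main mem -> GPU mem
    double_elements<<<(n + 255) / 256, 256>>>(dev, n);                 // steps 2-3: CPU launches, GPU runs in parallel
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);  // step 4: GPU mem -> main mem
    cudaFree(dev);

    printf("%f\n", host[2]);  // element 2 has been doubled
    return 0;
}
```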
Background

The GPU, as a specialized processor, addresses the demands of real-time, high-resolution 3D graphics, a compute-intensive task. As of 2012, GPUs have evolved into highly parallel multi-core systems allowing very efficient manipulation of large blocks of data. This design is more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel, such as:

- push-relabel maximum flow algorithm
- fast sort algorithms of large lists
- two-dimensional fast wavelet transform
- molecular dynamics simulations
Programming capabilities

The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives such as OpenACC, and extensions to industry-standard programming languages including C, C++ and Fortran. C/C++ programmers use 'CUDA C/C++', compiled with nvcc, NVIDIA's LLVM-based C/C++ compiler.[4] Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from The Portland Group.

In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform supports other computational interfaces, including the Khronos Group's OpenCL,[5] Microsoft's DirectCompute, OpenGL Compute Shaders (http://www.opengl.org/wiki/Compute_Shader) and C++ AMP.[6] Third-party wrappers are also available for Python, Perl, Fortran, Java, Ruby, Lua, Haskell, R, MATLAB and IDL, and native support exists in Mathematica.

In the computer game industry, GPUs are used not only for graphics rendering but also in game physics calculations (physical effects such as debris, smoke, fire, fluids); examples include PhysX and Bullet. CUDA has also been used to accelerate non-graphical applications in computational biology, cryptography and other fields by an order of magnitude or more.[7][8][9][10][11]

CUDA provides both a low-level API and a higher-level API. The initial CUDA SDK was made public on 15 February 2007, for Microsoft Windows and Linux. Mac OS X support was later added in version 2.0,[12] which supersedes the beta released February 14, 2008.[13] CUDA works with all Nvidia GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line. CUDA is compatible with most standard operating systems. Nvidia states that programs developed for the G8x series will also work without modification on all future Nvidia video cards, due to binary compatibility.
Advantages
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs:

- Scattered reads: code can read from arbitrary addresses in memory
- Unified virtual memory (CUDA 4.0 and above)
- Unified memory (CUDA 6.0 and above)
- Shared memory: CUDA exposes a fast shared memory region that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups.[14]
- Faster downloads and readbacks to and from the GPU
- Full support for integer and bitwise operations, including integer texture lookups
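The shared-memory point above can be illustrated with a device-code sketch. The kernel below (a hypothetical block-level sum reduction, not from the article) stages data in the fast on-chip __shared__ region once and then reuses it as a user-managed cache:

```cpp
#include <cuda_runtime.h>

// Block-level sum reduction using __shared__ memory as a user-managed cache.
// Each block loads its slice of `in` once into fast on-chip storage, then
// repeatedly reuses it without touching global memory again.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[256];          // one element per thread in a 256-thread block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;  // stage global data in shared memory
    __syncthreads();

    // Tree reduction performed entirely within shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];  // one partial sum per block
}
```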
Limitations
- CUDA does not support the full C standard, as it runs host code through a C++ compiler, which makes some valid C (but invalid C++) code fail to compile.[15][16]
- Interoperability with rendering languages such as OpenGL is one-way, with OpenGL having access to registered CUDA memory but CUDA not having access to OpenGL memory.
- Copying between host and device memory may incur a performance hit due to system bus bandwidth and latency (this can be partly alleviated with asynchronous memory transfers, handled by the GPU's DMA engine).
- Threads should be running in groups of at least 32 for best performance, with the total number of threads numbering in the thousands. Branches in the program code do not affect performance significantly, provided that each of 32 threads takes the same execution path; the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g. traversing a space-partitioning data structure during ray tracing).
- Unlike OpenCL, CUDA-enabled GPUs are only available from Nvidia.[17]
- No emulator or fallback functionality is available for modern revisions.
- Valid C/C++ may sometimes be flagged and prevent compilation due to optimization techniques the compiler is required to employ to use limited resources.
- A single process must run spread across multiple disjoint memory spaces, unlike other C language runtime environments.
- C++ Run-Time Type Information (RTTI) is not supported in CUDA code, due to lack of support in the underlying hardware.
- Exception handling is not supported in CUDA code due to the performance overhead that would be incurred with many thousands of parallel threads running.
- CUDA (with compute capability 2.x) allows a subset of C++ class functionality; for example, member functions may not be virtual (this restriction will be removed in some future release). [See CUDA C Programming Guide 3.1, Appendix D.6]
- In single precision on first-generation CUDA compute capability 1.x devices, denormal numbers are not supported and are instead flushed to zero, and the precision of the division and square root operations is slightly lower than IEEE 754-compliant single-precision math. Devices that support compute capability 2.0 and above support denormal numbers, and the division and square root operations are IEEE 754 compliant by default. However, users can obtain the previous, faster gaming-grade math of compute capability 1.x devices if desired by setting compiler flags to disable accurate divisions, disable accurate square roots, and enable flushing denormal numbers to zero.[18]
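The compiler flags described in the last point correspond to nvcc options. The flag names below are nvcc's documented options; treat the exact invocation (file names, output name) as an illustrative sketch:

```shell
# Trade IEEE 754 accuracy for compute-capability-1.x-style speed:
# flush denormals to zero, use fast division and square root.
nvcc -ftz=true -prec-div=false -prec-sqrt=false kernel.cu -o kernel
# --use_fast_math implies all three (and also substitutes fast intrinsic math functions).
```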
Supported GPUs

Compute capability table (version of CUDA supported) by GPU and card. Also available directly from Nvidia (http://developer.nvidia.com/cudagpus):
Compute capability (version), microarchitecture, GPUs and cards:

1.0 (Tesla): G80, G92, G92b, G94, G94b
Cards: GeForce GT 420*, GeForce 8800 Ultra, GeForce 8800 GTX, GeForce GT 340*, GeForce GT 330*, GeForce GT 320*, GeForce 315*, GeForce 310*, GeForce 9800 GT, GeForce 9600 GT, GeForce 9400 GT, Quadro FX 5600, Quadro FX 4600, Quadro Plex 2100 S4, Tesla C870, Tesla D870, Tesla S870

1.1 (Tesla): G86, G84, G98, G96, G96b, G94, G94b, G92, G92b
Cards: GeForce G110M, GeForce 9300M GS, GeForce 9200M GS, GeForce 9100M G, GeForce 8400M GT, GeForce 8600 GT, GeForce 8600 GTS, GeForce G105M, Quadro FX 4700 X2, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 470, Quadro FX 380, Quadro FX 370, Quadro FX 370 Low Profile, Quadro NVS 450, Quadro NVS 420, Quadro NVS 290, Quadro NVS 295, Quadro Plex 2100 D4, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 770M, Quadro FX 570M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M, Quadro NVS 450, Quadro NVS 420, Quadro NVS 295

1.2 (Tesla): GT218, GT216, GT215
Cards: GeForce GT 240, GeForce GT 220*, GeForce 210*, GeForce GTS 360M, GeForce GTS 350M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 240M, GeForce G210M, GeForce 310M, GeForce 305M, Quadro FX 380 Low Profile, NVIDIA NVS 300, Quadro FX 1800M, Quadro FX 880M, Quadro FX 380M, NVIDIA NVS 300, NVS 5100M, NVS 3100M, NVS 2100M, ION

1.3 (Tesla): GT200, GT200b
Cards: GeForce GTX 280, GeForce GTX 275, GeForce GTX 260, Quadro FX 5800, Quadro FX 4800, Quadro FX 4800 for Mac, Quadro FX 3800, Quadro CX, Quadro Plex 2200 D2, Tesla C1060, Tesla S1070, Tesla M1060

2.0 (Fermi): GF100, GF110
Cards: GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 480M, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 4000 for Mac, Quadro Plex 7000, Quadro 5010M, Quadro 5000M, Tesla C2075, Tesla C2050/C2070, Tesla M2050/M2070/M2075/M2090

2.1 (Fermi): GF104, GF106, GF108, GF114, GF116, GF119
Cards: GeForce GTX 560 Ti, GeForce GTX 550 Ti, GeForce GTX 460, GeForce GTS 450, GeForce GTS 450*, GeForce GT 640 (GDDR3), GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GT 520, GeForce GT 440, GeForce GT 440*, GeForce GT 430, GeForce GT 430*, GeForce GTX 675M, GeForce GTX 670M, GeForce GT 635M, GeForce GT 630M, GeForce GT 625M, GeForce GT 720M, GeForce GT 620M, GeForce 710M, GeForce 610M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520MX, GeForce GT 520M, GeForce GTX 485M, GeForce GTX 470M, GeForce GTX 460M, GeForce GT 445M, GeForce GT 435M, GeForce GT 420M, GeForce GT 415M, GeForce 710M, GeForce 410M, Quadro 2000, Quadro 2000D, Quadro 600, Quadro 410, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, NVS 5400M, NVS 5200M, NVS 4200M

3.0 (Kepler): GK104, GK106, GK107
Cards: GeForce GTX 770, GeForce GTX 760, GeForce GT 740, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GTX 880M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 670MX, GeForce GTX 660M, GeForce GT 750M, GeForce GT 650M, GeForce GT 745M, GeForce GT 645M, GeForce GT 740M, GeForce GT 730M, GeForce GT 640M, GeForce GT 640M LE, GeForce GT 735M, GeForce GT 730M, Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2000, Quadro K2000D, Quadro K600, Quadro K420, Quadro K500M, Quadro K510M, Quadro K610M, Quadro K1000M, Quadro K2000M, Quadro K1100M, Quadro K2100M, Quadro K3000M, Quadro K3100M, Quadro K4000M, Quadro K5000M, Quadro K4100M, Quadro K5100M, Tesla K10

3.2 (Kepler): Tegra K1
Cards: Jetson TK1 (SoC)

3.5 (Kepler): GK110, GK208
Cards: GeForce GTX TITAN Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GT 640 (GDDR5), GeForce GT 630 v2, GeForce GT 730, GeForce GT 720, Quadro K6000, Quadro K5200, Tesla K40, Tesla K20x, Tesla K20

3.7 (Kepler): GK210
Cards: Tesla K80

5.0 (Maxwell): GM107, GM108
Cards: GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M, Quadro K2200, Quadro K1200, Quadro K620, Quadro K620M

5.2 (Maxwell): GM200, GM204, GM206
Cards: GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, Quadro M6000, Quadro M5000, Quadro M4000

5.3 (Maxwell): Tegra X1

'*' OEM-only products
A table of devices officially supporting CUDA:[17]

Nvidia GeForce:
GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX Titan Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 750 Ti, GeForce GTX 750, GeForce GT 740, GeForce GT 730, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GT 640, GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 560 Ti, GeForce GTX 560, GeForce GTX 550 Ti, GeForce GT 520, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 460, GeForce GTX 460 SE, GeForce GTS 450, GeForce GT 440, GeForce GT 430, GeForce GT 420, GeForce GTX 295, GeForce GTX 285, GeForce GTX 280, GeForce GTX 275, GeForce GTX 260, GeForce GTS 250, GeForce GTS 240, GeForce GT 240, GeForce GT 220, GeForce 210/G210, GeForce GT 140, GeForce 9800 GX2, GeForce 9800 GTX+, GeForce 9800 GTX, GeForce 9800 GT, GeForce 9600 GSO, GeForce 9600 GT, GeForce 9500 GT, GeForce 9400 GT, GeForce 9400 mGPU, GeForce 9300 mGPU, GeForce 9100 mGPU, GeForce 8800 Ultra, GeForce 8800 GTX, GeForce 8800 GTS, GeForce 8800 GT, GeForce 8800 GS, GeForce 8600 GTS, GeForce 8600 GT, GeForce 8600m GT, GeForce 8500 GT, GeForce 8400 GS, GeForce 8300 mGPU, GeForce 8200 mGPU, GeForce 8100 mGPU

Nvidia GeForce Mobile:
GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 880M, GeForce GTX 870M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GT 750M, GeForce GT 745M, GeForce GT 740M, GeForce GT 735M, GeForce GT 730M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 675M, GeForce GTX 670MX, GeForce GTX 670M, GeForce GTX 660M, GeForce GT 650M, GeForce GT 645M, GeForce GT 640M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520M, GeForce GTX 480M, GeForce GTX 470M, GeForce GTX 460M, GeForce GT 445M, GeForce GT 435M, GeForce GT 425M, GeForce GT 420M, GeForce GT 415M, GeForce GTX 285M, GeForce GTX 280M, GeForce GTX 260M, GeForce GTS 360M, GeForce GTS 350M, GeForce GTS 260M, GeForce GTS 250M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 320M, GeForce 310M, GeForce GT 240M, GeForce GT 230M, GeForce GT 220M, GeForce G210M, GeForce GTS 160M, GeForce GTS 150M, GeForce GT 130M, GeForce GT 120M, GeForce G110M, GeForce G105M, GeForce G103M, GeForce G102M, GeForce G100, GeForce 9800M GTX, GeForce 9800M GTS, GeForce 9800M GT, GeForce 9800M GS, GeForce 9700M GTS, GeForce 9700M GT, GeForce 9650M GT, GeForce 9650M GS, GeForce 9600M GT, GeForce 9600M GS, GeForce 9500M GS, GeForce 9500M G, GeForce 9400M G, GeForce 9300M GS, GeForce 9300M G, GeForce 9200M GS, GeForce 9100M G, GeForce 8800M GTX, GeForce 8800M GTS, GeForce 8700M GT, GeForce 8600M GT, GeForce 8600M GS, GeForce 8400M GT, GeForce 8400M GS, GeForce 8400M G, GeForce 8200M G

Nvidia Quadro:
Quadro M6000, Quadro M5000, Quadro M4000, Quadro K6000, Quadro K5200, Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2200, Quadro K2000D, Quadro K2000, Quadro K1200, Quadro K620, Quadro K600, Quadro K420, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 2000, Quadro 600, Quadro FX 5800, Quadro FX 5600, Quadro FX 4800, Quadro FX 4700 X2, Quadro FX 4600, Quadro FX 3800, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 380, Quadro FX 370, Quadro NVS 510, Quadro NVS 450, Quadro NVS 420, Quadro NVS 295, Quadro Plex 1000 Model IV, Quadro Plex 1000 Model S4

Nvidia Quadro Mobile:
Quadro K5100M, Quadro K5000M, Quadro K4100M, Quadro K4000M, Quadro K3100M, Quadro K3000M, Quadro K2100M, Quadro K2000M, Quadro K1100M, Quadro K1000M, Quadro K620M, Quadro K610M, Quadro K510M, Quadro K500M, Quadro 5010M, Quadro 5000M, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1800M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 880M, Quadro FX 770M, Quadro FX 570M, Quadro FX 380M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M

Nvidia Tesla:
Tesla K80, Tesla K40, Tesla K20X, Tesla K20, Tesla K10, Tesla C2050/2070, Tesla M2050/M2070, Tesla S2050, Tesla S1070, Tesla M1060, Tesla C1060, Tesla C870, Tesla D870, Tesla S870
Version features and specifications

Feature support (unlisted features are supported for all compute capabilities), with the first compute capability (version) that supports each feature:

- Integer atomic functions operating on 32-bit words in global memory: 1.1 and later
- atomicExch() operating on 32-bit floating-point values in global memory: 1.1 and later
- Integer atomic functions operating on 32-bit words in shared memory: 1.2 and later
- atomicExch() operating on 32-bit floating-point values in shared memory: 1.2 and later
- Integer atomic functions operating on 64-bit words in global memory: 1.2 and later
- Warp vote functions: 1.2 and later
- Double-precision floating-point operations: 1.3 and later
- Atomic functions operating on 64-bit integer values in shared memory: 2.x and later
- Floating-point atomic addition operating on 32-bit words in global and shared memory: 2.x and later
- _ballot(): 2.x and later
- _threadfence_system(): 2.x and later
- _syncthreads_count(), _syncthreads_and(), _syncthreads_or(): 2.x and later
- Surface functions: 2.x and later
- 3D grid of thread blocks: 2.x and later
- Warp shuffle functions: 3.0 and later
- Funnel shift: 3.5 and later
- Dynamic parallelism: 3.5 and later
Technical specifications by compute capability (version):

- Maximum dimensionality of grid of thread blocks: 2 (1.x); 3 (2.x and later)
- Maximum x-dimension of a grid of thread blocks: 65535 (1.x, 2.x); 2^31 - 1 (3.0 and later)
- Maximum y- or z-dimension of a grid of thread blocks: 65535
- Maximum dimensionality of thread block: 3
- Maximum x- or y-dimension of a block: 512 (1.x); 1024 (2.x and later)
- Maximum z-dimension of a block: 64
- Maximum number of threads per block: 512 (1.x); 1024 (2.x and later)
- Warp size: 32
- Maximum number of resident blocks per multiprocessor: 8 (1.x, 2.x); 16 (3.x); 32 (5.x)
- Maximum number of resident warps per multiprocessor: 24 (1.0, 1.1); 32 (1.2, 1.3); 48 (2.x); 64 (3.x, 5.x)
- Maximum number of resident threads per multiprocessor: 768 (1.0, 1.1); 1024 (1.2, 1.3); 1536 (2.x); 2048 (3.x, 5.x)
- Number of 32-bit registers per multiprocessor: 8 K (1.0, 1.1); 16 K (1.2, 1.3); 32 K (2.x); 64 K (3.0, 3.5); 128 K (3.7); 64 K (5.x)
- Maximum number of 32-bit registers per thread: 128 (1.x); 63 (2.x, 3.0); 255 (3.5 and later)
- Maximum amount of shared memory per multiprocessor: 16 KB (1.x); 48 KB (2.x, 3.0, 3.5); 112 KB (3.7); 64 KB (5.0); 96 KB (5.2)
- Number of shared memory banks: 16 (1.x); 32 (2.x and later)
- Amount of local memory per thread: 16 KB (1.x); 512 KB (2.x and later)
- Constant memory size: 64 KB
- Cache working set per multiprocessor for constant memory: 8 KB; 10 KB (3.7, 5.x)
- Cache working set per multiprocessor for texture memory: device dependent, between 6 KB and 8 KB (1.x); 12 KB (2.x); between 12 KB and 48 KB (3.x); 24 KB (5.x)
- Maximum width for 1D texture reference bound to a CUDA array: 8192 (1.x); 65536 (2.x and later)
- Maximum width for 1D texture reference bound to linear memory: 2^27
- Maximum width and number of layers for a 1D layered texture reference: 8192 x 512 (1.x); 16384 x 2048 (2.x and later)
- Maximum width and height for 2D texture reference bound to a CUDA array: 65536 x 32768 (1.x); 65536 x 65535 (2.x and later)
- Maximum width and height for 2D texture reference bound to linear memory: 65000 x 65000
- Maximum width and height for 2D texture reference bound to a CUDA array supporting texture gather: N/A (1.x); 16384 x 16384 (2.x and later)
- Maximum width, height, and number of layers for a 2D layered texture reference: 8192 x 8192 x 512 (1.x); 16384 x 16384 x 2048 (2.x and later)
- Maximum width, height and depth for a 3D texture reference bound to linear memory or a CUDA array: 2048 x 2048 x 2048 (1.x); 4096 x 4096 x 4096 (2.x and later)
- Maximum width (and height) for a cubemap texture reference: N/A (1.x); 16384 (2.x and later)
- Maximum width (and height) and number of layers for a cubemap layered texture reference: N/A (1.x); 16384 x 2046 (2.x and later)
- Maximum number of textures that can be bound to a kernel: 128 (1.x); 256 (2.x and later)
- Maximum width for a 1D surface reference bound to a CUDA array: not supported (1.x); 65536 (2.x and later)
- Maximum width and number of layers for a 1D layered surface reference: not supported (1.x); 65536 x 2048 (2.x and later)
- Maximum width and height for a 2D surface reference bound to a CUDA array: not supported (1.x); 65536 x 32768 (2.x and later)
- Maximum width, height, and number of layers for a 2D layered surface reference: not supported (1.x); 65536 x 32768 x 2048 (2.x and later)
- Maximum width, height, and depth for a 3D surface reference bound to a CUDA array: not supported (1.x); 65536 x 32768 x 2048 (2.x and later)
- Maximum width (and height) for a cubemap surface reference bound to a CUDA array: not supported (1.x); 32768 (2.x and later)
- Maximum width (and height) and number of layers for a cubemap layered surface reference: not supported (1.x); 32768 x 2046 (2.x and later)
- Maximum number of surfaces that can be bound to a kernel: 8 (1.x); 16 (2.x and later)
- Maximum number of instructions per kernel: 2 million (1.x); 512 million (2.x and later)
Architecture specifications by compute capability (version):

- Number of ALU lanes for integer and floating-point arithmetic operations: 8 (1.x)[19]; 32 (2.0); 48 (2.1); 192 (3.x); 128 (5.x)
- Number of special function units for single-precision floating-point transcendental functions: 2 (1.x); 4 (2.0); 8 (2.1); 32 (3.x, 5.x)
- Number of texture filtering units for every texture address unit or render output unit (ROP): 2 (1.x); 4 (2.0); 8 (2.1); 16 (3.x); 8 (5.x)
- Number of warp schedulers: 1 (1.x); 2 (2.x); 4 (3.x, 5.x)
- Number of instructions issued at once by a scheduler: 1 (up to 2.0); 2 (2.1 and later)[20]
For more information please visit this site: http://www.geeks3d.com/20100606/gpucomputingnvidiacudacomputecapabilitycomparativetable/, and also read the Nvidia CUDA programming guide.[21]
Example
This example code in C++ loads a texture from an image into an array on the GPU:
texture<float, 2, cudaReadModeElementType> tex;

void foo()
{
  cudaArray* cu_array;

  // Allocate array
  cudaChannelFormatDesc description = cudaCreateChannelDesc<float>();
  cudaMallocArray(&cu_array, &description, width, height);

  // Copy image data to array
  cudaMemcpyToArray(cu_array, image, width*height*sizeof(float), cudaMemcpyHostToDevice);

  // Set texture parameters (default)
  tex.addressMode[0] = cudaAddressModeClamp;
  tex.addressMode[1] = cudaAddressModeClamp;
  tex.filterMode = cudaFilterModePoint;
  tex.normalized = false;  // do not normalize coordinates

  // Bind the array to the texture
  cudaBindTextureToArray(tex, cu_array);

  // Run kernel
  dim3 blockDim(16, 16, 1);
  dim3 gridDim((width + blockDim.x - 1) / blockDim.x, (height + blockDim.y - 1) / blockDim.y, 1);
  kernel<<<gridDim, blockDim, 0>>>(d_data, height, width);

  // Unbind the array from the texture
  cudaUnbindTexture(tex);
}  // end foo()

__global__ void kernel(float* odata, int height, int width)
{
  unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
  unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
  if (x < width && y < height) {
    float c = tex2D(tex, x, y);
    odata[y*width + x] = c;
  }
}
Language bindings

- Mathematica: CUDALink (http://reference.wolfram.com/mathematica/CUDALink/tutorial/Overview.html)
- MATLAB: Parallel Computing Toolbox, MATLAB Distributed Computing Server,[24] and 3rd-party packages like Jacket.
- .NET: CUDA.NET (http://www.casshpc.com/solutions/libraries/cudanet), ManagedCUDA (https://managedcuda.codeplex.com), CUDAfy.NET (http://www.hybriddsp.com) .NET kernel and host code, CURAND, CUBLAS, CUFFT
- Perl: KappaCUDA (http://psilambda.com/download/kappaforperl), CUDA::Minimal (https://github.com/run4flat/perlCUDAMinimal)
- Python: Numba, NumbaPro, PyCUDA (http://mathema.tician.de/software/pycuda), KappaCUDA (http://psilambda.com/download/kappaforpython), Theano
- Ruby: KappaCUDA (http://psilambda.com/download/kappaextras)
- R: gputools (http://brainarray.mbni.med.umich.edu/brainarray/rgpgpu/)
Current and future usages of CUDA architecture

- Accelerated rendering of 3D graphics
- Accelerated interconversion of video file formats
- Accelerated encryption, decryption and compression
- Distributed calculations, such as predicting the native conformation of proteins
- Medical analysis simulations, for example virtual reality based on CT and MRI scan images
- Physical simulations, in particular in fluid dynamics
- Neural network training in machine learning problems
- Distributed computing
- Molecular dynamics
- Mining cryptocurrencies
See also

- Allinea DDT, a debugger for CUDA, OpenACC, and parallel applications
- OpenCL, a standard for programming a variety of platforms, including GPUs
- BrookGPU, the Stanford University graphics group's compiler
- Array programming
- Parallel computing
- Stream processing
- rCUDA, an API for computing on remote computers
- Molecular modeling on GPU
References

1. Shimpi, Anand Lal; Wilson, Derek (November 8, 2006). "NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10" (http://www.anandtech.com/show/2116/8). AnandTech. Retrieved May 16, 2015.
2. NVIDIA CUDA Home Page (http://www.nvidia.com/object/cuda_home_new.html)
3. Abi Chahla, Fedy (June 18, 2008). "Nvidia's CUDA: The End of the CPU?" (http://www.tomshardware.com/reviews/nvidiacudagpu,1954.html). Tom's Hardware. Retrieved May 17, 2015.
4. CUDA LLVM Compiler (http://developer.nvidia.com/cuda/cudallvmcompiler)
5. First OpenCL demo on a GPU (https://www.youtube.com/watch?v=r1sN1ELJfNo) on YouTube
6. DirectCompute Ocean Demo Running on Nvidia CUDA-enabled GPU (https://www.youtube.com/watch?v=K1I4kts5mqc) on YouTube
7. Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos P. Markatos and Sotiris Ioannidis (September 2008). "Gnort: High Performance Network Intrusion Detection Using Graphics Processors" (http://www.ics.forth.gr/dcs/Activities/papers/gnort.raid08.pdf) (PDF). Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection (RAID).
8. Schatz, M.C.; Trapnell, C.; Delcher, A.L.; Varshney, A. (2007). "High-throughput sequence alignment using Graphics Processing Units" (http://www.biomedcentral.com/14712105/8/474). BMC Bioinformatics 8: 474. doi:10.1186/1471-2105-8-474. PMC 2222658. PMID 18070356.
9. Manavski, Svetlin A.; Giorgio Valle (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment" (http://www.biomedcentral.com/14712105/9/S2/S10). BMC Bioinformatics 9: S10. doi:10.1186/1471-2105-9-S2-S10. PMC 2323659. PMID 18387198.
10. Pyrit, Google Code (https://code.google.com/p/pyrit/)
11. Use your Nvidia GPU for scientific computing (http://boinc.berkeley.edu/cuda.php), BOINC official site (December 18, 2008)
12. Nvidia CUDA Software Development Kit (CUDA SDK) Release Notes Version 2.0 for MAC OS X (http://developer.download.nvidia.com/compute/cuda/sdk/website/doc/CUDA_SDK_release_notes_macosx.txt)
13. CUDA 1.1 Now on Mac OS X (http://news.developer.nvidia.com/2008/02/cuda11nowo.html) (Posted on Feb 14, 2008)
14. Silberstein, Mark; Schuster, Assaf; Geiger, Dan; Patney, Anjul; Owens, John D. (2008). Efficient computation of sum-products on GPUs through software-managed cache. Proceedings of the 22nd annual international conference on Supercomputing (ICS '08). pp. 309-318. doi:10.1145/1375527.1375572. ISBN 978-1-60558-158-3.
15. NVCC forces C++ compilation of .cu files (https://devtalk.nvidia.com/default/topic/508479/cudaprogrammingandperformance/nvccforcesccompilationofcufiles/#entry1340190)
16. C++ keywords on CUDA C code (http://stackoverflow.com/questions/15362678/ckeywordsoncudaccode/15362798)
17. "CUDA-Enabled Products" (http://www.nvidia.com/object/cuda_learn_products.html). CUDA Zone. Nvidia Corporation. Retrieved 2008-11-03.
18. Whitehead, Nathan; Fit-Florea, Alex. "Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs" (https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIACUDAFloatingPoint.pdf) (PDF). Nvidia. Retrieved November 18, 2014.
19. ALUs perform only single-precision floating-point arithmetic. There is 1 double-precision floating-point unit.
20. No more than one scheduler can issue 2 instructions at once. The first scheduler is in charge of the warps with an odd ID and the second scheduler is in charge of the warps with an even ID.
21. Appendix F. Features and Technical Specifications (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf) (PDF) (3.2 MiB), Page 148 of 175 (Version 5.0, October 2012)
22. PyCUDA (http://mathema.tician.de/software/pycuda)
23. pycublas (http://kered.org/blog/20090413/easypythonnumpycudacublas/)
24. "MATLAB Adds GPGPU Support" (http://www.hpcwire.com/features/MATLABAddsGPGPUSupport103307084.html). 2010-09-20.

External links

- Official website (http://www.nvidia.com/object/cuda_home.html)
- CUDA Community (https://plus.google.com/communities/114632076318201174454) on Google+
- A little tool to adjust the VRAM size (https://devtalk.nvidia.com/default/topic/726765/needalittletooltoadjustthevramsize/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=CUDA&oldid=674383050"

Categories: Computer physics engines | GPGPU | GPGPU libraries | Graphics hardware | Nvidia software | Parallel computing | Video cards | Video game hardware
This page was last modified on 3 August 2015, at 15:46. Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.