000 | 11877nam a22005053i 4500 | ||
---|---|---|---|
001 | EBC6383586 | ||
003 | MiAaPQ | ||
005 | 20220324112726.0 | ||
006 | m o d | | ||
007 | cr cnu|||||||| | ||
008 | 220324s2020 xx o ||||0 eng d | ||
020 | _a9781484255742 _q(electronic bk.) | ||
020 | _z9781484255735 | ||
035 | _a(MiAaPQ)EBC6383586 | ||
035 | _a(Au-PeEL)EBL6383586 | ||
035 | _a(OCoLC)1204226016 | ||
040 | _aMiAaPQ _beng _erda _epn _cMiAaPQ _dMiAaPQ | ||
050 | 4 | _aQA76.76.C65 | |
100 | 1 | _aReinders, James. | |
245 | 1 | 0 | _aData Parallel C++ : _bMastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL. |
264 | 1 | _aBerkeley, CA : _bApress L. P., _c2020. | |
264 | 4 | _c©2021. | |
300 | _a1 online resource (565 pages) | ||
336 | _atext _btxt _2rdacontent | ||
337 | _acomputer _bc _2rdamedia | ||
338 | _aonline resource _bcr _2rdacarrier | ||
505 | 0 | _aIntro -- Table of Contents -- About the Authors -- Preface -- Acknowledgments -- Chapter 1: Introduction -- Read the Book, Not the Spec -- SYCL 1.2.1 vs. SYCL 2020, and DPC++ -- Getting a DPC++ Compiler -- Book GitHub -- Hello, World! and a SYCL Program Dissection -- Queues and Actions -- It Is All About Parallelism -- Throughput -- Latency -- Think Parallel -- Amdahl and Gustafson -- Scaling -- Heterogeneous Systems -- Data-Parallel Programming -- Key Attributes of DPC++ and SYCL -- Single-Source -- Host -- Devices -- Sharing Devices -- Kernel Code -- Kernel: Vector Addition (DAXPY) -- Asynchronous Task Graphs -- Race Conditions When We Make a Mistake -- C++ Lambda Functions -- Portability and Direct Programming -- Concurrency vs. Parallelism -- Summary -- Chapter 2: Where Code Executes -- Single-Source -- Host Code -- Device Code -- Choosing Devices -- Method#1: Run on a Device of Any Type -- Queues -- Binding a Queue to a Device, When Any Device Will Do -- Method#2: Using the Host Device for Development and Debugging -- Method#3: Using a GPU (or Other Accelerators) -- Device Types -- Accelerator Devices -- Device Selectors -- When Device Selection Fails -- Method#4: Using Multiple Devices -- Method#5: Custom (Very Specific) Device Selection -- device_selector Base Class -- Mechanisms to Score a Device -- Three Paths to Device Code Execution on CPU -- Creating Work on a Device -- Introducing the Task Graph -- Where Is the Device Code? -- Actions -- Fallback -- Summary -- Chapter 3: Data Management -- Introduction -- The Data Management Problem -- Device Local vs. Device Remote -- Managing Multiple Memories -- Explicit Data Movement -- Implicit Data Movement -- Selecting the Right Strategy -- USM, Buffers, and Images -- Unified Shared Memory -- Accessing Memory Through Pointers -- USM and Data Movement -- Explicit Data Movement in USM. | |
505 | 8 | _aImplicit Data Movement in USM -- Buffers -- Creating Buffers -- Accessing Buffers -- Access Modes -- Ordering the Uses of Data -- In-order Queues -- Out-of-Order (OoO) Queues -- Explicit Dependences with Events -- Implicit Dependences with Accessors -- Choosing a Data Management Strategy -- Handler Class: Key Members -- Summary -- Chapter 4: Expressing Parallelism -- Parallelism Within Kernels -- Multidimensional Kernels -- Loops vs. Kernels -- Overview of Language Features -- Separating Kernels from Host Code -- Different Forms of Parallel Kernels -- Basic Data-Parallel Kernels -- Understanding Basic Data-Parallel Kernels -- Writing Basic Data-Parallel Kernels -- Details of Basic Data-Parallel Kernels -- The range Class -- The id Class -- The item Class -- Explicit ND-Range Kernels -- Understanding Explicit ND-Range Parallel Kernels -- Work-Items -- Work-Groups -- Sub-Groups -- Writing Explicit ND-Range Data-Parallel Kernels -- Details of Explicit ND-Range Data-Parallel Kernels -- The nd_range Class -- The nd_item Class -- The group Class -- The sub_group Class -- Hierarchical Parallel Kernels -- Understanding Hierarchical Data-Parallel Kernels -- Writing Hierarchical Data-Parallel Kernels -- Details of Hierarchical Data-Parallel Kernels -- The h_item Class -- The private_memory Class -- Mapping Computation to Work-Items -- One-to-One Mapping -- Many-to-One Mapping -- Choosing a Kernel Form -- Summary -- Chapter 5: Error Handling -- Safety First -- Types of Errors -- Let's Create Some Errors! -- Synchronous Error -- Asynchronous Error -- Application Error Handling Strategy -- Ignoring Error Handling -- Synchronous Error Handling -- Asynchronous Error Handling -- The Asynchronous Handler -- Invocation of the Handler -- Errors on a Device -- Summary -- Chapter 6: Unified Shared Memory -- Why Should We Use USM? -- Allocation Types -- Device Allocations. | |
505 | 8 | _aHost Allocations -- Shared Allocations -- Allocating Memory -- What Do We Need to Know? -- Multiple Styles -- Allocations à la C -- Allocations à la C++ -- C++ Allocators -- Deallocating Memory -- Allocation Example -- Data Management -- Initialization -- Data Movement -- Explicit -- Implicit -- Migration -- Fine-Grained Control -- Queries -- Summary -- Chapter 7: Buffers -- Buffers -- Creation -- Buffer Properties -- use_host_ptr -- use_mutex -- context_bound -- What Can We Do with a Buffer? -- Accessors -- Accessor Creation -- What Can We Do with an Accessor? -- Summary -- Chapter 8: Scheduling Kernels and Data Movement -- What Is Graph Scheduling? -- How Graphs Work in DPC++ -- Command Group Actions -- How Command Groups Declare Dependences -- Examples -- When Are the Parts of a CG Executed? -- Data Movement -- Explicit -- Implicit -- Synchronizing with the Host -- Summary -- Chapter 9: Communication and Synchronization -- Work-Groups and Work-Items -- Building Blocks for Efficient Communication -- Synchronization via Barriers -- Work-Group Local Memory -- Using Work-Group Barriers and Local Memory -- Work-Group Barriers and Local Memory in ND-Range Kernels -- Local Accessors -- Synchronization Functions -- A Full ND-Range Kernel Example -- Work-Group Barriers and Local Memory in Hierarchical Kernels -- Scopes for Local Memory and Barriers -- A Full Hierarchical Kernel Example -- Sub-Groups -- Synchronization via Sub-Group Barriers -- Exchanging Data Within a Sub-Group -- A Full Sub-Group ND-Range Kernel Example -- Collective Functions -- Broadcast -- Votes -- Shuffles -- Loads and Stores -- Summary -- Chapter 10: Defining Kernels -- Why Three Ways to Represent a Kernel? -- Kernels As Lambda Expressions -- Elements of a Kernel Lambda Expression -- Naming Kernel Lambda Expressions -- Kernels As Named Function Objects. | |
505 | 8 | _aElements of a Kernel Named Function Object -- Interoperability with Other APIs -- Interoperability with API-Defined Source Languages -- Interoperability with API-Defined Kernel Objects -- Kernels in Program Objects -- Summary -- Chapter 11: Vectors -- How to Think About Vectors -- Vector Types -- Vector Interface -- Load and Store Member Functions -- Swizzle Operations -- Vector Execution Within a Parallel Kernel -- Vector Parallelism -- Summary -- Chapter 12: Device Information -- Refining Kernel Code to Be More Prescriptive -- How to Enumerate Devices and Capabilities -- Custom Device Selector -- Being Curious: get_info<> -- Being More Curious: Detailed Enumeration Code -- Inquisitive: get_info<> -- Device Information Descriptors -- Device-Specific Kernel Information Descriptors -- The Specifics: Those of "Correctness" -- Device Queries -- Kernel Queries -- The Specifics: Those of "Tuning/Optimization" -- Device Queries -- Kernel Queries -- Runtime vs. Compile-Time Properties -- Summary -- Chapter 13: Practical Tips -- Getting a DPC++ Compiler and Code Samples -- Online Forum and Documentation -- Platform Model -- Multiarchitecture Binaries -- Compilation Model -- Adding SYCL to Existing C++ Programs -- Debugging -- Debugging Kernel Code -- Debugging Runtime Failures -- Initializing Data and Accessing Kernel Outputs -- Multiple Translation Units -- Performance Implications of Multiple Translation Units -- When Anonymous Lambdas Need Names -- Migrating from CUDA to SYCL -- Summary -- Chapter 14: Common Parallel Patterns -- Understanding the Patterns -- Map -- Stencil -- Reduction -- Scan -- Pack and Unpack -- Pack -- Unpack -- Using Built-In Functions and Libraries -- The DPC++ Reduction Library -- The reduction Class -- The reducer Class -- User-Defined Reductions -- oneAPI DPC++ Library -- Group Functions. | |
505 | 8 | _aDirect Programming -- Map -- Stencil -- Reduction -- Scan -- Pack and Unpack -- Pack -- Unpack -- Summary -- For More Information -- Chapter 15: Programming for GPUs -- Performance Caveats -- How GPUs Work -- GPU Building Blocks -- Simpler Processors (but More of Them) -- Expressing Parallelism -- Expressing More Parallelism -- Simplified Control Logic (SIMD Instructions) -- Predication and Masking -- SIMD Efficiency -- SIMD Efficiency and Groups of Items -- Switching Work to Hide Latency -- Offloading Kernels to GPUs -- SYCL Runtime Library -- GPU Software Drivers -- GPU Hardware -- Beware the Cost of Offloading! -- Transfers to and from Device Memory -- GPU Kernel Best Practices -- Accessing Global Memory -- Accessing Work-Group Local Memory -- Avoiding Local Memory Entirely with Sub-Groups -- Optimizing Computation Using Small Data Types -- Optimizing Math Functions -- Specialized Functions and Extensions -- Summary -- For More Information -- Chapter 16: Programming for CPUs -- Performance Caveats -- The Basics of a General-Purpose CPU -- The Basics of SIMD Hardware -- Exploiting Thread-Level Parallelism -- Thread Affinity Insight -- Be Mindful of First Touch to Memory -- SIMD Vectorization on CPU -- Ensure SIMD Execution Legality -- SIMD Masking and Cost -- Avoid Array-of-Struct for SIMD Efficiency -- Data Type Impact on SIMD Efficiency -- SIMD Execution Using single_task -- Summary -- Chapter 17: Programming for FPGAs -- Performance Caveats -- How to Think About FPGAs -- Pipeline Parallelism -- Kernels Consume Chip "Area" -- When to Use an FPGA -- Lots and Lots of Work -- Custom Operations or Operation Widths -- Scalar Data Flow -- Low Latency and Rich Connectivity -- Customized Memory Systems -- Running on an FPGA -- Compile Times -- The FPGA Emulator -- FPGA Hardware Compilation Occurs "Ahead-of-Time" -- Writing Kernels for FPGAs. | |
505 | 8 | _aExposing Parallelism. | |
588 | _aDescription based on publisher supplied metadata and other sources. | ||
590 | _aElectronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2022. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries. | ||
655 | 4 | _aElectronic books. | |
700 | 1 | _aAshbaugh, Ben. | |
700 | 1 | _aBrodman, James. | |
700 | 1 | _aKinsner, Michael. | |
700 | 1 | _aPennycook, John. | |
700 | 1 | _aTian, Xinmin. | |
776 | 0 | 8 | _iPrint version: _aReinders, James _tData Parallel C++ _dBerkeley, CA : Apress L. P.,c2020 _z9781484255735 |
797 | 2 | _aProQuest (Firm) | |
856 | 4 | 0 | _uhttps://www.nbs.de/bibliothek/faq _zHow do I access the e-book? |
856 | 4 | 0 | _uhttps://ebookcentral.proquest.com/lib/nbsde/detail.action?docID=6383586 _zClick to View |
999 | _c2067 _d2067 | ||