
Commit 43337a4

Geonseok Lee authored and committed
Update: metal doc
1 parent 1ca2f8b commit 43337a4

_docs/Documents/metal.md

Lines changed: 14 additions & 273 deletions
@@ -5,7 +5,7 @@ permalink: /docs/metal/

Render advanced 3D graphics and perform data-parallel computations using graphics processors.

-## Overview
+### Overview

Graphics processors (GPUs) are designed to quickly render graphics and perform data-parallel calculations. Use the Metal framework when you need to communicate directly with the GPUs available on a device. Apps that render complex scenes or that perform advanced scientific calculations can use this power to achieve maximum performance. Such apps include:

@@ -16,283 +16,24 @@ Graphics processors (GPUs) are designed to quickly render graphics and perform da
Metal works hand-in-hand with other frameworks that supplement its capability. Use *MetalKit* to simplify the task of getting your Metal content onscreen. Use *Metal Performance Shaders* to implement custom rendering functions or to take advantage of a large library of existing functions.
Many high-level Apple frameworks are built on top of Metal to take advantage of its performance, including *Core Image*, *SpriteKit*, and *SceneKit*. Using one of these high-level frameworks shields you from the details of GPU programming, but writing custom Metal code enables you to achieve the highest level of performance.

-## Basic Tasks
+### Getting the Default GPU

-### Performing Calculations on GPU
+To use the Metal framework, you start by getting a GPU device. All of the objects your app needs to interact with Metal come from an MTLDevice that you acquire at runtime. iOS and tvOS devices have only one GPU, which you access by calling MTLCreateSystemDefaultDevice():

-- Converting a simple function written in C to Metal Shading Language (MSL), so that it can be run on a GPU
-- Finding a GPU
-- Preparing the MSL function to run on the GPU by creating a pipeline
-- Creating memory allocations accessible to the GPU to hold data
-- Creating a command buffer and encoding GPU commands to manipulate the data
-- Committing the buffer to a command queue to make the GPU execute the encoded commands

-#### Write a GPU Function to Perform Calculations

-To illustrate GPU programming, this app adds corresponding elements of two arrays together, writing the results to a third array. Listing 1 shows a function that performs this calculation on the CPU, written in C. It loops over the index, calculating one value per iteration of the loop.

-**Listing 1** Array addition, written in C

-```c
-void add_arrays(const float* inA, const float* inB, float* result, int length) {
-    for (int index = 0; index < length; index++)
-    {
-        result[index] = inA[index] + inB[index];
-    }
-}
-```

-Each value is calculated independently, so the values can be safely calculated concurrently. To perform the calculation on the GPU, you need to rewrite this function in Metal Shading Language (MSL). MSL is a variant of C++ designed for GPU programming. In Metal, code that runs on GPUs is called a *shader*, because historically shaders were first used to calculate colors in 3D graphics. Listing 2 shows a shader in MSL that performs the same calculation as Listing 1. The sample project defines this function in the *add.metal* file. Xcode builds all *.metal* files in the application target and creates a default Metal library, which it embeds in your app. You'll see how to load the default library later in this sample.

-**Listing 2** Array addition, written in MSL

-```c++
-kernel void add_arrays(device const float* inA, device const float* inB, device float* result, uint index [[thread_position_in_grid]])
-{
-    // The for-loop is replaced with a collection of threads, each of which
-    // calls this function.
-    result[index] = inA[index] + inB[index];
-}
-```

-Listing 1 and Listing 2 are similar, but there are some important differences in the MSL version. Take a closer look at Listing 2.

-First, the function adds the *kernel* keyword, which declares that the function is:
-- A *public GPU function*. Public functions are the only functions that your app can see. Public functions also can't be called by other shader functions.
-- A *compute function* (also known as a compute kernel), which performs a parallel calculation using a grid of threads.

-See *Using a Render Pipeline to Render Primitives* to learn the other function keywords used to declare public graphics functions.

-The *add_arrays* function declares three of its arguments with the device keyword, which says that these pointers are in the device address space. MSL defines several disjoint address spaces for memory. Whenever you declare a pointer in MSL, you must supply a keyword to declare its address space. Use the device address space to declare persistent memory that the GPU can read from and write to.

-Listing 2 removes the for-loop from Listing 1, because the function is now going to be called by multiple threads in the compute grid. This sample creates a 1D grid of threads that exactly matches the array's dimensions, so that each entry in the array is calculated by a different thread.

-To replace the index previously provided by the for-loop, the function takes a new index argument, with another MSL keyword, *thread_position_in_grid*, specified using C++ attribute syntax. This keyword declares that Metal should calculate a unique index for each thread and pass that index in this argument. Because add_arrays uses a 1D grid, the index is defined as a scalar integer. Even though the loop was removed, Listing 1 and Listing 2 use the same line of code to add the two numbers together. If you want to convert similar code from C or C++ to MSL, replace the loop logic with a grid in the same way.

-#### Find a GPU

-In your app, an *MTLDevice* object is a thin abstraction for a GPU; you use it to communicate with a GPU. Metal creates an MTLDevice for each GPU. You get the default device object by calling *MTLCreateSystemDefaultDevice()*.

-```objc
-id<MTLDevice> device = MTLCreateSystemDefaultDevice();
-```

-#### Initialize Metal Objects

-Metal represents other GPU-related entities, like compiled shaders, memory buffers, and textures, as objects. To create these GPU-specific objects, you call methods on an MTLDevice or you call methods on objects created by an MTLDevice. All objects created directly or indirectly by a device object are usable only with that device object. Apps that use multiple GPUs will use multiple device objects and create a similar hierarchy of Metal objects for each.

-The sample app uses a custom MetalAdder class to manage the objects it needs to communicate with the GPU. The class's initializer creates these objects and stores them in its properties. The app creates an instance of this class, passing in the Metal device object to use to create the secondary objects. The MetalAdder object keeps strong references to the Metal objects until it finishes executing.

-```objc
-MetalAdder* adder = [[MetalAdder alloc] initWithDevice:device];
-```

-In Metal, expensive initialization tasks can be run once and the results retained and used inexpensively. You rarely need to run such tasks in performance-sensitive code.

-#### Get a Reference to the Metal Function

-The first thing the initializer does is load the function and prepare it to run on the GPU. When you build the app, Xcode compiles the *add_arrays* function and adds it to a default Metal library that it embeds in the app. You use *MTLLibrary* and *MTLFunction* objects to get information about Metal libraries and the functions contained in them. To get an object representing the *add_arrays* function, ask the MTLDevice to create an MTLLibrary object for the default library, and then ask the library for an MTLFunction object that represents the shader function.

-```objc
-- (instancetype) initWithDevice: (id<MTLDevice>) device
-{
-    self = [super init];
-    if (self)
-    {
-        _mDevice = device;

-        NSError* error = nil;

-        // Load the shader files with a .metal file extension in the project.
-        id<MTLLibrary> defaultLibrary = [_mDevice newDefaultLibrary];
-        if (defaultLibrary == nil)
-        {
-            NSLog(@"Failed to find the default library.");
-            return nil;
-        }

-        id<MTLFunction> addFunction = [defaultLibrary newFunctionWithName:@"add_arrays"];
-        if (addFunction == nil)
-        {
-            NSLog(@"Failed to find the adder function.");
-            return nil;
-        }
-    }
+```swift
+guard let device = MTLCreateSystemDefaultDevice() else {
+    fatalError("Failed to get the system's default Metal device.")
}
```

-#### Prepare a Metal Pipeline
+### Setting Up a Command Structure

-The function object is a proxy for the MSL function, but it's not executable code. You convert the function into executable code by creating a *pipeline*. A pipeline specifies the steps that the GPU performs to complete a specific task. In Metal, a pipeline is represented by a *pipeline state object*. Because this sample uses a compute function, the app creates an MTLComputePipelineState object.
+To get the GPU to perform work on your behalf, you send commands to it. A command performs the drawing, parallel computation, or resource management work your app requires.
+The relationship between Metal apps and a GPU is that of a client-server pattern; a short sketch follows the list below:

-```objc
-_mAddFunctionPSO = [_mDevice newComputePipelineStateWithFunction: addFunction error:&error];
-```

-A compute pipeline runs a single compute function, optionally manipulating the input data before running the function, and the output data afterwards.

-When you create a pipeline state object, the device object finishes compiling the function for this specific GPU. This sample creates the pipeline state object synchronously and returns it directly to the app. Because compiling does take a while, avoid creating pipeline state objects synchronously in performance-sensitive code.

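If blocking is a concern, Metal also provides an asynchronous variant that delivers the pipeline state to a completion handler. The following is a minimal Swift sketch of that approach, not the sample's own code; the function name and the library lookup mirror the steps above and are illustrative only:

```swift
import Metal

// Sketch: build the compute pipeline asynchronously so a slow shader compile
// doesn't block a performance-sensitive code path.
func makeAddPipeline(device: MTLDevice,
                     completion: @escaping (MTLComputePipelineState?) -> Void) {
    // Assumes the default library contains the add_arrays kernel, as in the sample.
    guard let library = device.makeDefaultLibrary(),
          let addFunction = library.makeFunction(name: "add_arrays") else {
        completion(nil)
        return
    }
    device.makeComputePipelineState(function: addFunction) { pipelineState, error in
        if let error = error {
            print("Failed to create pipeline state: \(error)")
        }
        completion(pipelineState)   // Delivered once compilation finishes.
    }
}
```
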
-> **Note**
-> All of the objects returned by Metal in the code you've seen so far conform to protocols. Metal defines most GPU-specific objects using protocols to abstract away the underlying implementation classes, which may vary for different GPUs. Metal defines GPU-independent objects using classes. The reference documentation for any given Metal protocol makes it clear whether you can implement that protocol in your app.

+- Your Metal app is the client.
+- The GPU is the server.
+- You make requests by sending commands to the GPU.
+- After processing the commands, the GPU can notify your app when it's ready for more work.

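A minimal Swift sketch of this round trip, assuming only the default device obtained above; the work encoded between creating and committing the command buffer is elided because it depends on what your app draws or computes:

```swift
import Metal

// Sketch of the client-server round trip: the app (client) records commands and
// submits them; the GPU (server) executes them and reports completion.
guard let device = MTLCreateSystemDefaultDevice(),
      let commandQueue = device.makeCommandQueue(),          // schedules submitted work
      let commandBuffer = commandQueue.makeCommandBuffer()   // holds one batch of commands
else {
    fatalError("Unable to set up the Metal command structure.")
}

// ... encode rendering, compute, or blit commands into commandBuffer here ...

commandBuffer.addCompletedHandler { _ in
    // The GPU has finished this batch; the app can submit more work.
    print("GPU finished processing the command buffer.")
}
commandBuffer.commit()   // Send the recorded commands to the GPU.
```
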

-#### Create a Command Queue

-To send work to the GPU, you need a command queue. Metal uses command queues to schedule commands. Create a command queue by asking the MTLDevice for one.

-```objc
-_mCommandQueue = [_mDevice newCommandQueue];
-```

-#### Create Data Buffers and Load Data

-After initializing the basic Metal objects, you load data for the GPU to execute. This task is less performance critical, but still useful to do early in your app's launch.

-A GPU can have its own dedicated memory, or it can share memory with the operating system. Metal and the operating system kernel need to perform additional work to let you store data in memory and make that data available to the GPU. Metal abstracts this memory management using *resource* objects (MTLResource). A resource is an allocation of memory that the GPU can access when running commands. Use an MTLDevice to create resources for its GPU.

-The sample app creates three buffers and fills the first two with random data. The third buffer is where *add_arrays* will store its results.

-```objc
-_mBufferA = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
-_mBufferB = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
-_mBufferResult = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];

-[self generateRandomFloatData:_mBufferA];
-[self generateRandomFloatData:_mBufferB];
-```

-The resources in this sample are MTLBuffer objects, which are allocations of memory without a predefined format. Metal manages each buffer as an opaque collection of bytes. However, you specify the format when you use a buffer in a shader. This means that your shaders and your app need to agree on the format of any data being passed back and forth.

-When you allocate a buffer, you provide a storage mode to determine some of its performance characteristics and whether the CPU or GPU can access it. The sample app uses shared memory (storageModeShared), which both the CPU and GPU can access.

-To fill a buffer with random data, the app gets a pointer to the buffer's memory and writes data to it on the CPU. The add_arrays function in Listing 2 declared its arguments as arrays of floating-point numbers, so you provide buffers in the same format:

-```objc
-- (void) generateRandomFloatData: (id<MTLBuffer>) buffer
-{
-    float* dataPtr = buffer.contents;

-    for (unsigned long index = 0; index < arrayLength; index++)
-    {
-        dataPtr[index] = (float)rand() / (float)RAND_MAX;
-    }
-}
-```

-#### Create a Command Buffer

-Ask the command queue to create a command buffer.

-```objc
-id<MTLCommandBuffer> commandBuffer = [_mCommandQueue commandBuffer];
-```

-#### Create a Command Encoder

-To write commands into a command buffer, you use a *command encoder* for the specific kind of commands you want to encode. This sample creates a compute command encoder, which encodes a compute pass. A compute pass holds a list of commands that execute compute pipelines. Each compute command causes the GPU to create a grid of threads to execute on the GPU.

-```objc
-id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
-```

-To encode a command, you make a series of method calls on the encoder. Some methods set state information, like the pipeline state object (PSO) or the arguments to be passed to the pipeline. After you make those state changes, you encode a command to execute the pipeline. The encoder writes all of the state changes and command parameters into the command buffer.

-![](https://docs-assets.developer.apple.com/published/064ba03feb/1516649f-a760-4bae-bee5-9bb1996ff42e.png)

-#### Set Pipeline State and Argument Data

-Set the pipeline state object of the pipeline you want the command to execute. Then set data for any arguments that the pipeline needs to send into the add_arrays function. For this pipeline, that means providing references to three buffers. Metal automatically assigns indices for the buffer arguments in the order that the arguments appear in the function declaration in Listing 2, starting with 0. You provide arguments using the same indices.

-```objc
-[computeEncoder setComputePipelineState:_mAddFunctionPSO];
-[computeEncoder setBuffer:_mBufferA offset:0 atIndex:0];
-[computeEncoder setBuffer:_mBufferB offset:0 atIndex:1];
-[computeEncoder setBuffer:_mBufferResult offset:0 atIndex:2];
-```

-You also specify an offset for each argument. An offset of 0 means the command will access the data from the beginning of a buffer. However, you could use one buffer to store multiple arguments, specifying an offset for each argument.

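As an illustration of that packing technique (not part of the sample), the following Swift sketch stores two arguments back to back in one buffer and binds each region with its own offset; the names are placeholders, and the 256-byte alignment is a conservative assumption made to satisfy device-specific offset rules:

```swift
import Metal

// Sketch: pack two float arrays into a single buffer and bind each region
// with its own offset. Names and sizes here are illustrative only.
let count = 1024
let bytesPerArray = count * MemoryLayout<Float>.stride
// Conservative assumption: align the second region to a 256-byte boundary.
let secondOffset = (bytesPerArray + 255) & ~255

guard let device = MTLCreateSystemDefaultDevice(),
      let packedBuffer = device.makeBuffer(length: secondOffset + bytesPerArray,
                                           options: .storageModeShared) else {
    fatalError("Unable to create the packed buffer.")
}

// When encoding, bind the same MTLBuffer twice with different offsets.
// `computeEncoder` stands in for an encoder created as shown earlier.
func bindPackedArguments(_ computeEncoder: MTLComputeCommandEncoder) {
    computeEncoder.setBuffer(packedBuffer, offset: 0,            index: 0)
    computeEncoder.setBuffer(packedBuffer, offset: secondOffset, index: 1)
}
```
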
-You don't specify any data for the index argument because the add_arrays function defined its values as being provided by the GPU.

-#### Specify Thread Count and Organization

-Next, decide how many threads to create and how to organize those threads. Metal can create 1D, 2D, or 3D grids. The add_arrays function uses a 1D array, so the sample creates a 1D grid of size (dataSize x 1 x 1), from which Metal generates indices between 0 and dataSize-1.

-```objc
-MTLSize gridSize = MTLSizeMake(arrayLength, 1, 1);
-```

-#### Specify Threadgroup Size

-Metal subdivides the grid into smaller grids called *threadgroups*. Each threadgroup is calculated separately. Metal can dispatch threadgroups to different processing elements on the GPU to speed up processing. You also need to decide how large to make the threadgroups for your command.

-```objc
-NSUInteger threadGroupSize = _mAddFunctionPSO.maxTotalThreadsPerThreadgroup;
-if (threadGroupSize > arrayLength)
-{
-    threadGroupSize = arrayLength;
-}
-MTLSize threadgroupSize = MTLSizeMake(threadGroupSize, 1, 1);
-```

-The app asks the pipeline state object for the largest possible threadgroup and shrinks it if that size is larger than the size of the data set. The maxTotalThreadsPerThreadgroup property gives the maximum number of threads allowed in the threadgroup, which varies depending on the complexity of the function used to create the pipeline state object.

-#### Encode the Compute Command to Execute the Threads

-Finally, encode the command to dispatch the grid of threads.

-```objc
-[computeEncoder dispatchThreads:gridSize threadsPerThreadgroup:threadgroupSize];
-```

-When the GPU executes this command, it uses the state you previously set and the command's parameters to dispatch threads to perform the computation.

-You can follow the same steps using the encoder to encode multiple compute commands into the compute pass without performing any redundant steps. For example, you might set the pipeline state object once, and then set arguments and encode a command for each collection of buffers to process.

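For example, here is a Swift sketch of that reuse pattern; the function name, the buffer triples, and the sizes are placeholders for objects created as described earlier, not the sample's own names:

```swift
import Metal

// Sketch: reuse one pipeline state while encoding several compute commands
// into a single compute pass, one command per collection of buffers.
func encodeBatches(encoder: MTLComputeCommandEncoder,
                   pipeline: MTLComputePipelineState,
                   batches: [(a: MTLBuffer, b: MTLBuffer, result: MTLBuffer)],
                   gridSize: MTLSize,
                   threadgroupSize: MTLSize) {
    encoder.setComputePipelineState(pipeline)        // Set once, reuse for every command.
    for batch in batches {
        encoder.setBuffer(batch.a,      offset: 0, index: 0)
        encoder.setBuffer(batch.b,      offset: 0, index: 1)
        encoder.setBuffer(batch.result, offset: 0, index: 2)
        encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadgroupSize)
    }
    encoder.endEncoding()                            // Close the compute pass once.
}
```
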
-#### End the Compute Pass

-When you have no more commands to add to the compute pass, you end the encoding process to close out the compute pass.

-```objc
-[computeEncoder endEncoding];
-```

-#### Commit the Command Buffer to Execute Its Commands

-Run the commands in the command buffer by committing the command buffer to the queue.

-```objc
-[commandBuffer commit];
-```

-The command queue created the command buffer, so committing the buffer always places it on that queue. After you commit the command buffer, Metal asynchronously prepares the commands for execution and then schedules the command buffer to execute on the GPU. After the GPU executes all the commands in the command buffer, Metal marks the command buffer as complete.

-#### Wait for the Calculation to Complete

-Your app can do other work while the GPU is processing your commands. This sample doesn't need to do any additional work, so it simply waits until the command buffer is complete.

-```objc
-[commandBuffer waitUntilCompleted];
-```

-Alternatively, to be notified when Metal has processed all of the commands, add a completion handler to the command buffer (addCompletedHandler(_:)), or check the status of a command buffer by reading its status property.

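A minimal Swift sketch of the completion-handler alternative; `notifyWhenComplete` is an illustrative helper, not part of the sample, and must be called before the command buffer is committed:

```swift
import Metal

// Sketch: get notified when the GPU finishes instead of blocking the CPU
// with waitUntilCompleted(). Register the handler before committing.
func notifyWhenComplete(_ commandBuffer: MTLCommandBuffer) {
    commandBuffer.addCompletedHandler { completedBuffer in
        if let error = completedBuffer.error {
            print("Command buffer failed: \(error)")
        } else {
            // Safe to read results here; completedBuffer.status is now .completed.
            print("GPU work finished.")
        }
    }
}
```
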
-#### Read the Results From the Buffer

-After the command buffer completes, the GPU's calculations are stored in the output buffer and Metal performs any necessary steps to make sure the CPU can see them. In a real app, you would read the results from the buffer and do something with them, such as displaying the results onscreen or writing them to a file. Because the calculations are only used to illustrate the process of creating a Metal app, the sample reads the values stored in the output buffer and tests to make sure the CPU and the GPU calculated the same results.

-```objc
-- (void) verifyResults
-{
-    float* a = _mBufferA.contents;
-    float* b = _mBufferB.contents;
-    float* result = _mBufferResult.contents;

-    for (unsigned long index = 0; index < arrayLength; index++)
-    {
-        assert(result[index] == a[index] + b[index]);
-    }
-}
-```

+![](https://docs-assets.developer.apple.com/published/861974e544/6411df7f-4f5c-46d6-9573-c2c6dd10fffb.png)
