FidelityFX CAS: Lou Kramer, Developer Technology Engineer, AMD


FidelityFX CAS

Lou Kramer, Developer Technology Engineer, AMD
Legal
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc.
(AMD) including, but not limited to, the features, functionality, performance, availability, timing,
pricing, expectations and expected benefits of AMD’s current and future products, which are made
pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995.
Forward-looking statements are commonly identified by words such as "would," "may," "expects,"
"believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are
cautioned that the forward-looking statements in this presentation are based on current beliefs,
assumptions and expectations, speak only as of the date of this presentation and involve risks and
uncertainties that could cause actual results to differ materially from current expectations. Such
statements are subject to certain known and unknown risks and uncertainties, many of which are
difficult to predict and generally beyond AMD's control, that could cause actual results and other
future events to differ materially from those expressed in, or implied or projected by, the forward-
looking information and statements.  Investors are urged to review in detail the risks and
uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to
AMD's Quarterly Report on Form 10-Q for the quarter ended March 30, 2019.

2 | AMD DEVELOPER DAY | June 19th, 2019


Group of image quality-enhancing techniques developed by AMD

• Hosted on GPUOpen under the MIT license


This talk will only cover Contrast Adaptive Sharpening (CAS).



AGENDA

Overview

Algorithm

Integration

Input

Performance

Examples

Q&A


Overview
Contrast Adaptive Sharpening (CAS)

For sharpening and, optionally, upsampling

• Enhances sharpness and local high-frequency contrast

[Figures: the same image without CAS and with CAS]

• Created to provide natural sharpness without artifacts, at low overhead

• The upsampling feature was designed for Dynamic Resolution Scaling (DRS)

⁃ With DRS, the render resolution can change per frame
-> Use CAS to sharpen and upsample to the fixed output resolution, then composite the full-resolution UI over the CAS output
⁃ This all happens in one compute dispatch


[Figure: upsample from 440x512 to 880x1024 using bicubic]

[Figure: upsample from 440x512 to 880x1024 using CAS]


When to use CAS?
• For sharpening
⁃ E.g. as a post-TAA sharpener
⁃ After other "softening" passes, to get some of the contrast back
⁃ After scaling from a lower resolution, when CAS is used in its sharpen-only version

• Sharpening + scaling
⁃ As a Dynamic Resolution Scaling option

• Whenever you think it improves image quality -> it is best to just try it out!


ALGORITHM



Algorithm - Overview
CAS is a spatial only filter.

It uses the minimal nearest 3x3 source texel window for filtering.

The final filter has the form

      w
    w 1 w
      w

where the centre pixel is the output pixel.

Weight w is computed per pixel, adapting to local contrast.


How to compute weight w?

Light color: 0.95, 0.75, 0.6

Middle color: 0.9, 0.4, 0.1

Dark color: 0.75, 0.25, 0.05



Base sharpening amount
CAS fetches a "circle" neighborhood around the pixel "c":

      a
    b c d
      e

It then computes a minimum and a maximum over this neighborhood:

    MIN = min(a, b, c, d, e)
    MAX = max(a, b, c, d, e)

    MIN_G = 0.4    MAX_G = 0.75

The minimum and maximum give an idea of local contrast. Only the green channel is used.


Base sharpening amount
[Figure: number line from 0 to 1 with MIN_G and MAX_G marked; d_min_g = 0.4, d_max_g = 0.25]

d_min_g = 0 + MIN_G = 0 + 0.4 = 0.4
d_max_g = 1 - MAX_G = 1 - 0.75 = 0.25

d_max_g is the minimum distance to the signal limit, since d_max_g < d_min_g.
d_max_g is divided by MAX_G to get a base sharpening amount A.

d_min_g is not used in this example, but for darker colors it would be, since then d_max_g > d_min_g.

Base sharpening amount A = d_max_g / MAX_G

A = 0.25 / 0.75 = 1/3

Base sharpening amount
Second example, a darker neighborhood (MIN_G = 0.25, MAX_G = 0.4):

[Figure: number line from 0 to 1 with MIN_G and MAX_G marked; d_min_g = 0.25, d_max_g = 0.6]

d_min_g = 0 + MIN_G = 0 + 0.25 = 0.25
d_max_g = 1 - MAX_G = 1 - 0.4 = 0.6

d_min_g is the minimum distance to the signal limit, since d_min_g < d_max_g.
d_min_g is divided by MAX_G to get a base sharpening amount A.

Base sharpening amount A = d_min_g / MAX_G

A = 0.25 / 0.4 = 0.625


Base sharpening amount
Base sharpening amount A = d_max_g / MAX_G

A = 0.25 / 0.75 = 1/3

Taking the square root biases the result towards more sharpening:

A = sqrt(A);
A = sqrt(A) = sqrt(1/3) ≈ 0.577350

The base sharpening amount ranges from 0 := no sharpening to 1 := full sharpening.

It is computed per pixel at runtime in CasFilter(). You call this function in your shader per pixel. More about integration later.

This sharpening amount is further influenced by a developer-chosen maximum.


Sharpness tuning knob
This sharpening amount is further influenced by a developer-chosen maximum.
This maximum, the sharpness tuning knob, is set at CasSetup() time. It influences the peak sharpness amount. More on integration later.

// The 'varAU4(const0);' expands into 'uint32_t const0[4];' on the CPU.
// The 'varAU4(const0);' expands into 'uint4 const0;' on the GPU.
varAU4(const0);
varAU4(const1);
CasSetup(const0,const1,
  0.0f,             // Sharpness tuning knob (0.0 to 1.0).
  1920.0f,1080.0f,  // Example input size.
  2560.0f,1440.0f); // Example output size.


Sharpness tuning knob
This sharpening amount is further influenced by a developer-chosen maximum.
This maximum, the sharpness tuning knob, is set at CasSetup() time.

The developer-chosen maximum is then:

developerMaximum = lerp(-0.125, -0.2, sharpness_knob);

sharpness_knob := 0 is minimum, := 1 is maximum.

Example: sharpness_knob = 0
developerMaximum = lerp(-0.125, -0.2, 0) = -0.125;

Earlier versions of CAS had no modifiable sharpness knob. The output was similar to sharpness_knob = 0.
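As a small worked example of the lerp above, the following C sketch maps the knob to the developer-chosen maximum. The function names are hypothetical, and clamping the knob to [0, 1] is an assumption based on the documented range; the actual CasSetup() implementation may compute this value differently.

```c
/* Linear interpolation between x and y by factor t. */
static float lerpf(float x, float y, float t) {
    return x + (y - x) * t;
}

/* Developer-chosen maximum for a sharpness knob in [0, 1].
   Clamping the knob is an assumption, not taken from the slides. */
static float cas_developer_maximum(float sharpness_knob) {
    float k = sharpness_knob;
    if (k < 0.0f) k = 0.0f;
    if (k > 1.0f) k = 1.0f;
    return lerpf(-0.125f, -0.2f, k);
}
```

A knob of 0 yields -0.125 and a knob of 1 yields -0.2, so a higher knob value gives the neighbors a more negative weight.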


Weight w
The algorithm then produces a filter weight "w" as the product of the per-pixel base sharpening amount A and the developer-chosen maximum:

w = A * developerMaximum;
w = 0.577350 * -0.125 ≈ -0.072169

The filter kernel is then:

      w                  -0.072169
    w 1 w    =    -0.072169    1    -0.072169
      w                  -0.072169

applied to the neighborhood

      a
    b c d
      e

output_color = (w*a + w*b + 1*c + w*d + w*e) / (w * 4 + 1);


Final output
Light color: 0.95, 0.75, 0.6, 1
Middle color: 0.9, 0.4, 0.1, 1
Dark color: 0.75, 0.25, 0.05, 1

The filter kernel (w ≈ -0.072169 on the four neighbors a, b, d, e and 1 on the centre c) is applied to our pixel:

output_color_g = (w*a + w*b + 1*c + w*d + w*e) / (w * 4 + 1);

output_color_g = (-0.072169 * 0.75 + -0.072169 * 0.75 + 1 * 0.4 ...) / (-0.072169 * 4 + 1) ≈ 0.32898

Green channel weight is used for all output channels.
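The normalized cross filter can be checked numerically with a short C sketch. `cas_apply` is a hypothetical helper name. The slide elides two of the neighbor values; the usage note below assumes they are both 0.4 (the middle color), which reproduces the slide's result.

```c
/* Apply the cross-shaped kernel (weight w on a, b, d, e and 1 on c)
   to one channel and normalize by the sum of the weights. */
static float cas_apply(float w, float a, float b, float c, float d, float e) {
    return (w * (a + b + d + e) + c) / (4.0f * w + 1.0f);
}
```

With w = 0.577350 * -0.125, a = b = 0.75 and c = d = e = 0.4 the result is about 0.32898. Because the weights are normalized, a flat region is left unchanged.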


Sharpening + scaling
The scaling algorithm adaptively interpolates between the nearest 4 results of the non-scaling algorithm.

It starts by fetching the "circle" neighborhood around the output pixel, which is centered between the centers of pixels {f,g,j,k}:

      b c
    e f g h
    i j k l
      n o

The algorithm then computes the no-scaling sharpening weights {wf, wg, wj, wk} for {f,g,j,k}.


Sharpening + scaling
The interpolation is bilinear and starts with the following bilinear weights:

// pp := fractional pixel position
// s t
// u v
float s = (1.0f - pp.x) * (1.0f - pp.y);
float t = pp.x * (1.0f - pp.y);
float u = (1.0f - pp.x) * pp.y;
float v = pp.x * pp.y;
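A self-contained C version of these bilinear weights is sketched below. The struct and function names are hypothetical; `ppx` and `ppy` stand for the two components of the fractional pixel position `pp`.

```c
/* The four bilinear weights, laid out as
     s t
     u v                                   */
typedef struct {
    float s, t, u, v;
} BilinearWeights;

/* Compute the weights for a fractional position (ppx, ppy) in [0, 1). */
static BilinearWeights bilinear_weights(float ppx, float ppy) {
    BilinearWeights w;
    w.s = (1.0f - ppx) * (1.0f - ppy);
    w.t = ppx * (1.0f - ppy);
    w.u = (1.0f - ppx) * ppy;
    w.v = ppx * ppy;
    return w;
}
```

For pp = (0.25, 0.5) this gives s = 0.375, t = 0.125, u = 0.375, v = 0.125. The four weights always sum to 1 before any contrast weighting is applied.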


Sharpening + scaling
To hide the bilinear interpolation and restore diagonals, each contribution gets an additional contrast weighting:

s = s * contrastWeight_f
t = t * contrastWeight_g
u = u * contrastWeight_j
v = v * contrastWeight_k

      b c
    e f g h
    i j k l
      n o

The green channel is used as a proxy for "luma".


Sharpening + scaling
Final filter:

               wf*s                wg*t
    wf*s   wg*t + wj*u + s    wf*s + wk*v + t    wg*t
    wj*u   wf*s + wk*v + u    wg*t + wj*u + v    wk*v
               wj*u                wk*v

applied to the neighborhood

      b c
    e f g h
    i j k l
      n o

Output color = (wf * s * b + wg * t * c + ... ) / (wf*s * 2 + wg * t * 2 + ... )

Green channel weight is used for all output channels.
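The 12-tap filter above is the sum of four cross kernels (one each around f, g, j and k), each scaled by its interpolation weight. The following C sketch spells that out for one channel. `Taps12` and `cas_scale_filter` are hypothetical names, and s, t, u, v are assumed to already include the contrast weighting. It illustrates the weight layout only and is not the ffx_cas.h implementation.

```c
/* One channel of the 12 source texels, named as in the diagram. */
typedef struct {
    float b, c, e, f, g, h, i, j, k, l, n, o;
} Taps12;

/* Blend the four non-scaling cross kernels around f, g, j, k. */
static float cas_scale_filter(const Taps12 *p,
                              float wf, float wg, float wj, float wk,
                              float s, float t, float u, float v) {
    float num =
        s * (wf * (p->b + p->e + p->g + p->j) + p->f) +
        t * (wg * (p->c + p->f + p->h + p->k) + p->g) +
        u * (wj * (p->f + p->i + p->k + p->n) + p->j) +
        v * (wk * (p->g + p->j + p->l + p->o) + p->k);
    /* Sum of all kernel weights, used for normalization. */
    float den =
        s * (4.0f * wf + 1.0f) + t * (4.0f * wg + 1.0f) +
        u * (4.0f * wj + 1.0f) + v * (4.0f * wk + 1.0f);
    return num / den;
}
```

With s = 1 and t = u = v = 0 the function reduces to the sharpen-only filter centred on f, and a flat input stays flat because the weights are normalized.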


INTEGRATION



Integration – where to put?
Sharpen-only can replace existing post-TAA sharpen passes.

Sharpening + upsampling can be used for dynamic resolution scaling.

• In a single pass

General recommendation: after your post-process chain and before your UI.

[Pipeline diagram: Anti-Aliasing -> Bloom and Tone mapping -> CAS -> Grain]


Integration – where to put?
It is likely better to reduce the amount of film grain applied before CAS (as CAS will amplify grain).

An alternative is to add grain after CAS.

It is best to run CAS after tone mapping.

CAS can slightly alter the UI, so it is recommended to apply CAS before the UI.


Integration - General
CAS is designed to run as a compute shader.
It should run either in 32-bit form, CasFilter(), or in packed 16-bit form, CasFilterH().
• We recommend packed 16-bit for Vulkan (if packed math is supported, obviously)
• And 32-bit for DX12 (for now)

The 32-bit form works on 8x8 tiles via one {64,1,1} threadgroup.
• Each thread works on one pixel per CasFilter() call

The 16-bit form works on a pair of 8x8 tiles in a 16x8 configuration via one {64,1,1} threadgroup.
• Each thread works on two pixels per CasFilterH() call


Integration – CPU Side
Make sure <stdint.h> has already been included.
Set up the pre-portability-header defines.
#define A_CPU 1

Include the portability header.
#include "ffx_a.h"

Include the CAS header.
#include "ffx_cas.h"

Put together:
#include <stdint.h>
#define A_CPU 1
#include "ffx_a.h"
#include "ffx_cas.h"


Integration – CPU Side
A_STATIC void CasSetup(
 outAU4 const0,
 outAU4 const1,
 // sharpness knob
 // 0 := default (lower ringing)
 // 1 := maximum (highest ringing)
 AF1 sharpness,
 AF1 inputSizeInPixelsX,
 AF1 inputSizeInPixelsY,
 AF1 outputSizeInPixelsX,
 AF1 outputSizeInPixelsY){
 // build constants
 …
}

Example call:
uint32_t m_const0[4];
uint32_t m_const1[4];
CasSetup(m_const0, m_const1,
 0.0f,              // Sharpness tuning knob (0.0 to 1.0).
 1920.0f, 1080.0f,  // Example input size.
 2560.0f, 1440.0f); // Example output size.

• Builds constants for input to the shader
• const0 contains the scaling terms
  • these depend on input and output size
• const1 contains the sharpness value
  • and some additional information about the input and output size
• It is possible to call CasSetup on the GPU too


Integration – CPU Side
Later, dispatch the shader based on the amount of semi-persistent loop unrolling.

CAS is designed to always run semi-persistent (i.e. loop unrolled).

Here is an example for running with 16x16 pixels per threadgroup (4-way unroll for 32-bit or 2-way unroll for 16-bit):

vkCmdDispatch(cmdBuf,(widthInPixels+15)/16,(heightInPixels+15)/16,1);

-> This relies on how you distribute the workload to your threadgroups. If in any doubt, refer to the example.
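The group counts in the dispatch call are a rounded-up division by the 16-pixel tile size. A minimal C sketch of that arithmetic follows; `cas_group_count` is a hypothetical helper, and it is assumed here that the sizes passed in are those of the output image, since the shader derives output pixel positions from the workgroup ID.

```c
#include <stdint.h>

/* Number of 16-pixel-wide (or 16-pixel-high) workgroups needed to
   cover size_in_pixels, rounding up so partial tiles are included. */
static uint32_t cas_group_count(uint32_t size_in_pixels) {
    return (size_in_pixels + 15u) / 16u;
}
```

For a 2560x1440 output this gives 160 x 90 groups; for 1920x1080 it gives 120 x 68, where the last row of groups is only partially covered by the image.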


Integration – GPU Side
Create a shader to apply CAS that includes the exposed CAS headers.

It is a fullscreen pass at pretty much the end of your pipeline.

The pass takes an image + constants and outputs a sharpened, potentially higher-resolution image.

[Diagram: image + constants -> CAS compute shader -> output image. The constants are those created during CasSetup().]


Integration – GPU Side
Set up the layout for the constants from CasSetup(), e.g.:
layout(set=0,binding=2) uniform const_buffer
{
uvec4 const0;
uvec4 const1;
} cb;

Set up the layout for the input and output images:
layout(set=0,binding=0,rgba16f)uniform image2D imgSrc;
layout(set=0,binding=1,rgba16f)uniform image2D imgDst;

In this example the images are of format VK_FORMAT_R16G16B16A16_SFLOAT.


Integration – GPU Side
Set up the pre-portability-header defines (sets up GLSL/HLSL path, etc.)
#define A_GPU 1
#define A_GLSL 1 // or #define A_HLSL 1

Include the portability header (or copy it in without an include).
#include <ffx_a.h>


Integration – GPU Side
Define the fetch function(s).
• CasLoad() takes a 32-bit integer 2D coordinate (ASU2) and loads the color.
AF3 CasLoad(ASU2 p){return imageLoad(imgSrc,p).rgb;}
CasLoad() is used by the CasFilter() function, but needs to be defined manually. Reason: it may differ according to the input format. We will come back to this later in relation to the input format.
• Define the input modifiers as nops initially.
void CasInput(inout AF1 r,inout AF1 g,inout AF1 b){}

Include the CAS header file (or copy it in without an include).
#include <ffx_cas.h>

Type aliases:
GLSL: ASU2 := ivec2, AF1 := float, AF3 := vec3
HLSL: ASU2 := int2, AF1 := float, AF3 := float3


Integration – GPU Side
Example of in-shader integration for the semi-persistent 16x16 case for 32-bit, using the faster quality option.
layout(local_size_x=64)in;
void main(){

Fetch constants from CasSetup().
AU4 const0=cb.const0;
AU4 const1=cb.const1;

Do remapping of local xy in workgroup for a more PS-like swizzle pattern.
gxy is the integer pixel position of the output image.
AU2 gxy=ARmp8x8(gl_LocalInvocationID.x)+AU2(gl_WorkGroupID.x<<4u,gl_WorkGroupID.y<<4u);

ARmp8x8() is a simple remap of 64x1 to 8x8 with rotated 2x2 pixel quads in quad linear.

LANE TO 8x8 MAPPING
===================
00 01 08 09 10 11 18 19
02 03 0a 0b 12 13 1a 1b
04 05 0c 0d 14 15 1c 1d
06 07 0e 0f 16 17 1e 1f
20 21 28 29 30 31 38 39
22 23 2a 2b 32 33 3a 3b
24 25 2c 2d 34 35 3c 3d
26 27 2e 2f 36 37 3e 3f

Filter.
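To make the lane table easier to read, here is a C sketch that reproduces it with bit operations. It is derived purely from the table above (rows are y, columns are x, entries are hexadecimal lane indices) and is not the ARmp8x8() implementation from ffx_a.h. `Coord2` and `remap_lane_8x8` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t x, y;
} Coord2;

/* Map a lane index 0..63 to an (x, y) position in an 8x8 tile,
   following the table: lane bits 0,3,4 form x and bits 1,2,5 form y. */
static Coord2 remap_lane_8x8(uint32_t lane) {
    Coord2 p;
    p.x = (lane & 1u) | ((lane >> 2) & 6u);
    p.y = ((lane >> 1) & 3u) | ((lane >> 3) & 4u);
    return p;
}
```

For example, lane 0x09 lands at column 3 of row 0 and lane 0x26 at column 0 of row 7, matching the table. Each of the 64 lanes maps to a distinct position.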


Integration – GPU Side
void CasFilter(
// Output values, non-vector so port between CasFilter() and CasFilterH() is easy.
out AF1 pixR,
out AF1 pixG,
out AF1 pixB,
// Integer pixel position in output.
AU2 ip,
// Constants generated by CasSetup().
AU4 const0,
AU4 const1,
// Must be a compile-time literal value, true = sharpen only (no resize).
AP1 noScaling){ … /* algorithm as explained before */ … }


Integration – GPU Side
[Diagram: the 16x16 region is covered by four 8x8 tiles]

AF4 c;
CasFilter(c.r,c.g,c.b,gxy,const0,const1,false);
imageStore(imgDst,ASU2(gxy),c);
gxy.x+=8u;
CasFilter(c.r,c.g,c.b,gxy,const0,const1,false);
imageStore(imgDst,ASU2(gxy),c);
gxy.y+=8u;
CasFilter(c.r,c.g,c.b,gxy,const0,const1,false);
imageStore(imgDst,ASU2(gxy),c);
gxy.x-=8u;
CasFilter(c.r,c.g,c.b,gxy,const0,const1,false);
imageStore(imgDst,ASU2(gxy),c);

}


Integration – GPU Side - Packed
Set up the pre-portability-header defines (sets up GLSL/HLSL path, packed math support, etc.)
#define A_GPU 1
#define A_GLSL 1 // or #define A_HLSL 1
#define A_HALF 1
#define CAS_PACKED_ONLY 1

Include the portability header (or copy it in without an include).
#include <ffx_a.h>


Integration – GPU Side - Packed
Define the fetch function(s).
• CasLoadH() is the 16-bit version, taking a 16-bit integer 2D coordinate (ASW2) and loading a 16-bit float color.
AH3 CasLoadH(ASW2 p){return AH3(imageLoad(imgSrc,ASU2(p)).rgb);}
• Define the input modifiers as nops initially.
void CasInputH(inout AH2 r,inout AH2 g,inout AH2 b){}

Include the CAS header file (or copy it in without an include).
CAS_PACKED_ONLY needs to be defined already!
#include <ffx_cas.h>


Integration – GPU Side - Packed
Example for semi-persistent 16x16, but this time for packed math.
layout(local_size_x=64)in;
void main(){

Fetch constants from CasSetup().
AU4 const0=cb.const0;
AU4 const1=cb.const1;

Do remapping of local xy in workgroup for a more PS-like swizzle pattern.
gxy is the integer pixel position of the output image.
AU2 gxy=ARmp8x8(gl_LocalInvocationID.x)
+AU2(gl_WorkGroupID.x<<4u,gl_WorkGroupID.y<<4u);

Filter.


Integration – GPU Side - Packed
void CasFilterH(
// Output values are for 2 8x8 tiles in a 16x8 region.
// pix<R,G,B>.x = left 8x8 tile (stored at gxy in the example that follows)
// pix<R,G,B>.y = right 8x8 tile (stored at gxy + (8,0) in the example that follows)
// This enables later processing to easily be packed as well.
out AH2 pixR,
out AH2 pixG,
out AH2 pixB,
// Integer pixel position in output.
AU2 ip,
// Constants generated by CasSetup().
AU4 const0,
AU4 const1,
// Must be a compile-time literal value, true = sharpen only (no resize).
AP1 noScaling){ … /* algorithm as explained before */ … }


Integration – GPU Side - Packed
// Can be used to convert from packed SOA to AOS for store.
void CasDepack(out AH4 pix0,out AH4 pix1,AH2 pixR,AH2 pixG,AH2 pixB){
#ifdef A_HLSL
// Invoke a slower path for DX only, since it won't allow uninitialized values.
pix0.a=pix1.a=0.0;
#endif
pix0.rgb=AH3(pixR.x,pixG.x,pixB.x);
pix1.rgb=AH3(pixR.y,pixG.y,pixB.y);}



Integration – GPU Side - Packed
[Diagram: the 16x16 region is covered by two 16x8 strips]

AH4 c0,c1; AH2 cR,cG,cB;
CasFilterH(cR,cG,cB,gxy,const0,const1,false);
// Extra work integrated after CAS would go here.
...
// Suggest only running CasDepack() right before stores, to maintain packed math for any work after CasFilterH().
CasDepack(c0,c1,cR,cG,cB);
imageStore(imgDst,ASU2(gxy),AF4(c0));
imageStore(imgDst,ASU2(gxy)+ASU2(8,0),AF4(c1));
gxy.y+=8u;
CasFilterH(cR,cG,cB,gxy,const0,const1,false);
...
CasDepack(c0,c1,cR,cG,cB);
imageStore(imgDst,ASU2(gxy),AF4(c0));
imageStore(imgDst,ASU2(gxy)+ASU2(8,0),AF4(c1));
}


INPUT



Input - General
Input must range between {0 to 1} for each color channel.
CAS output will be {0 to 1} ranged as well.

CAS is designed to be a linear filter.

This filter does not function well on sRGB or gamma 2.2 non-linear data.
This filter does not function on PQ non-linear data.
• Due to the shape of PQ, the positive side of the ring created by the negative lobe tends to become over-bright.


Input - General
CAS sharpen-only does 5 loads, so any conversion applied during CasLoad() or CasInput() has a 5 loads * 3 channels = 15x cost amplifier.
-> Input conversions should be factored into the prior pass's output.
• But if necessary, use CasInput() instead of CasLoad(), as CasInput() works with packed color.
• For CAS with scaling, the amplifier is 12 loads * 3 channels = 36x.

Any conversion applied to the output has a 3x cost amplifier (3 color channels).
-> Output conversions are substantially less expensive.

Added VALU ops due to conversions will have a visible cost, as this shader is already quite VALU heavy.


Input Format – FP16
FP16 with all non-negative values ranging {0 to 1}
-> use as is, filter is designed for linear input and output ranging {0 to 1}



Input Format – UNORM
UNORM with linear conversion approximation
Can be used for both the sRGB and the FreeSync2 native gamma 2.2 cases

Load/store with either 10:10:10:2 UNORM or 8:8:8:8 UNORM

Use a gamma 2.0 conversion in CasInput() as an approximation (CasInput() is for optional input transforms):
void CasInput(inout AF1 r,inout AF1 g,inout AF1 b){r*=r;g*=g;b*=b;}

Do linear to gamma 2.0 before the store in your shader:
c.r=sqrt(c.r);c.g=sqrt(c.g);c.b=sqrt(c.b);
imageStore(imgDst,ASU2(gxy),c);


Input Format – UNORM Packed
UNORM with linear conversion approximation
Can be used for both the sRGB and the FreeSync2 native gamma 2.2 cases

Use a gamma 2.0 conversion in CasInputH() as an approximation:
void CasInputH(inout AH2 r,inout AH2 g,inout AH2 b){r*=r;g*=g;b*=b;}

Do linear to gamma 2.0 before the store in your shader:
CasFilterH(cR,cG,cB,gxy,const0,const1,false);
cR=sqrt(cR);cG=sqrt(cG);cB=sqrt(cB);
CasDepack(c0,c1,cR,cG,cB);
imageStore(img[0],ASU2(gxy),AF4(c0));
imageStore(img[0],ASU2(gxy+AU2(8,0)),AF4(c1));


Input Format – SRGB
• Use texelFetch() with an sRGB format (VK_FORMAT_R8G8B8A8_SRGB) for loads (gets linear values into the shader).
• Store to the destination using UNORM (not sRGB) stores and do the linear to sRGB conversion in the shader.

AF3 CasLoad(ASU2 p){return texelFetch(texSrc,p,0).rgb;}

Add before the store in the shader:
// Do linear to sRGB before store
c.r=AToSrgbF1(c.r);
c.g=AToSrgbF1(c.g);
c.b=AToSrgbF1(c.b);
imageStore(imgDst,ASU2(gxy),c);


Input Format – SRGB Packed
AH3 CasLoadH(ASW2 p){return AH3(texelFetch(texSrc, ASU2(p),0).rgb);}

Add before the store in the shader:
// Do linear to sRGB before store
CasFilterH(cR,cG,cB,gxy,const0,const1,false);
cR=AToSrgbH2(cR);
cG=AToSrgbH2(cG);
cB=AToSrgbH2(cB);
CasDepack(c0,c1,cR,cG,cB);
imageStore(img[0],ASU2(gxy),AF4(c0));
imageStore(img[0],ASU2(gxy+AU2(8,0)),AF4(c1));


PERFORMANCE



Performance
As of current testing, sharpen-only takes < 100 µs on RX 5700 XT for 1080p unpacked.

• Numbers are expected to scale fairly linearly with resolution

• Sharpen + scaling: depends on configuration


EXAMPLES

[Example screenshots: F1 2019; Borderlands 3; Ni Shui Han (including 100% zoom); Rage 2; Unity Demo; Strange Brigade (2560 scaled to 3840 without CAS, 2560 scaled to 3840 with CAS, 3840 native)]

THANKS TO
• Timothy Lottes
• Adam Sawicki
• Marcus Svensson
• Ihor Szlachtycz
• Nick Thibieroz

Q&A
lou.kramer@amd.com

@lou_auroyup

https://gpuopen.com/



Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to
revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS
OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO
ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN
IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon™ and combinations thereof are trademarks of Advanced Micro
Devices, Inc. in the United States and/or other jurisdictions. Vulkan® is a registered trademark of the Khronos Group Inc. Other names are for informational
purposes only and may be trademarks of their respective owners.
