Introduction

https://git.concertos.live/Encode_Guide/mdbook-guide

This guide is meant to be both a starting point for newcomers interested in producing high quality encodes as well as a reference for experienced encoders. As such, after most functions are introduced and their uses explained, an in-depth explanation can be found. These are meant as simple add-ons for the curious and should in no way be necessary reading to apply the function.

Terminology

For this to work, basic terms and knowledge about video are required. These explanations will be heavily simplified, as there are dozens of web pages with more thorough explanations of these subjects on sites like Wikipedia, the AviSynth wiki, etc.

Video

Consumer video products are usually stored in YCbCr, a term which is commonly used interchangeably with YUV. In this guide, we will mostly be using the term YUV, as VapourSynth formats are written as YUV.

YUV formatted content has information split into three planes: Y, referred to as luma, which represents the brightness, and U and V, which each represent a chroma plane. These chroma planes represent offsets between colors, whereby the middle value of the plane is the neutral point.

This chroma information is usually subsampled, meaning it is stored at a lower frame size than the luma plane. Almost all consumer video is in 4:2:0, meaning the chroma planes are half the luma plane's size in each dimension. The Wikipedia page on chroma subsampling should hopefully suffice to explain how this works. As we usually want to stay in 4:2:0 format, we are restricted in our frame dimensions, as the luma plane's dimensions have to be divisible by 2. This means we cannot do uneven cropping or resize to uneven resolutions. However, when necessary, we can work on each plane individually; this will be explained in the filtering part.

Additionally, our information has to be stored with a specific precision. Usually, we deal with 8-bit per-plane precision. However, for UHD Blu-rays, 10-bit per-plane precision is the standard. In 8-bit, this means possible values for each plane range from 0 to \(2^8 - 1 = 255\). In the bit depths chapter, we will introduce working in higher bit depth precision for increased accuracy during filtering.

VapourSynth

For loading our clips, removing unwanted black borders, resizing, and combatting unwanted artifacts in our sources, we will employ the VapourSynth framework via Python. While using Python might sound intimidating, those with no prior experience need not worry, as we will only be doing extremely basic things.

There are countless resources for setting up VapourSynth, e.g. the Irrational Encoding Wizardry's guide and the VapourSynth documentation. As such, this guide will not be covering installation and setup.

To start with writing scripts, it is important to know that every clip/filter must be given a variable name:

clip_a = source(a)
clip_b = source(b)

filter_a = filter(clip_a)
filter_b = filter(clip_b)

filter_x_on_a = filter_x(clip_a)
filter_y_on_a = filter_y(clip_a)

Additionally, many functions are in script collections or similar. These must be loaded manually and are then found under the given alias:

import vapoursynth as vs
core = vs.core
import awsmfunc as awf
import kagefunc as kgf
from vsutil import *

bbmod = awf.bbmod(...)
grain = kgf.adaptive_grain(...)
deband = core.f3kdb.Deband(...)

change_depth = depth(...)

So as to avoid conflicting function names, it is usually not recommended to do from x import *.

While many filters are in such collections, there are also filters that are available as plugins. These plugins can be called via core.namespace.plugin or alternatively clip.namespace.plugin. This means the following two are equivalent:

via_core = core.std.Crop(clip, ...)
via_clip = clip.std.Crop(...)

This is not possible for functions under scripts, meaning the following is NOT possible:

not_possible = clip.awf.bbmod(...)

In this guide, we will name the source clip to be worked with src and set variable names to reflect what their operation does.

While working on a video, one will usually encounter some visual artifacts, such as banding, darkened borders, etc. As these are not visually pleasing and are, for the most part, not intended by the original creator, it's preferred to use video filters to fix them.

Scene-filtering

As every filter is destructive in some way, it is desirable to only apply them whenever necessary. This is usually done using ReplaceFramesSimple, which is in the RemapFrames plugin and can also be called with Rfs. Alternatively, one can use a Python solution, e.g. std.Trim and clip addition. However, RemapFrames tends to be faster, especially for larger sets of replacement mappings.

Let's look at an example of applying the f3kdb debanding filter to frames 100 through 200 and 500 through 750:

src = core.ffms2.Source("video.mkv")
deband = core.neo_f3kdb.Deband(src)

replaced = core.remap.Rfs(src, deband, mappings="[100 200] [500 750]")
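
For reference, the pure Python alternative mentioned above can be written with clip slicing and addition; this is functionally equivalent to the Rfs call, just slower for large sets of mappings:

replaced = src[:100] + deband[100:201] + src[201:500] + deband[500:751] + src[751:]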

There are various wrappers around both the plugin and the Python method, notably awsmfunc.rfs for the former and lvsfunc.util.replace_frames for the latter.

Filter order

In order for filters to work correctly and not be counterproductive, it is important to apply them in the proper order. This is especially important for filters like debanders and grainers, as putting these before a resize can completely negate their effect.

A generally acceptable order would be:

  1. Load the source
  2. Crop
  3. Raise bit depth
  4. Detint
  5. Fix dirty lines
  6. Deblock
  7. Resize
  8. Denoise
  9. Anti-aliasing
  10. Dering (dehalo)
  11. Deband
  12. Grain
  13. Dither to output bit depth

Keep in mind that this is just a general recommendation. There can always be a case where you might want to deviate, e.g. if you're using a fast denoiser like KNLMeansCL, you can do this before resizing.

It is generally recommended to mux raw streams (.h264, .h265, etc.) into containers (such as using MKVToolNix) before loading with source filters.

There are a few source filters that can be used to load videos into VapourSynth. If you have an NVIDIA GPU, DGDecNV is recommended:

src = core.dgdecodenv.DGSource("src.dgi")

A .dgi index file is generated for the source video using the DGIndexNV program included in the package.

If you don't have an NVIDIA GPU, L-SMASH-Works is recommended:

src = core.lsmas.LWLibavSource("src.mkv")

ffms2 can also be used, although certain versions add extra black frames to the beginning of the video:

src = core.ffms2.Source("src.mkv")

For MPEG2 sources, d2vsource and DGMPGDec can be used.

Cropping will almost always be necessary with live action content, as black bars are almost always applied to fit a 16:9 frame.

If the to-be-cropped content has a resolution that is a multiple of 2 in each direction, this is a very straightforward process:

crop = src.std.Crop(top=2, bottom=2, left=2, right=2)

However, when this is not the case, things get a little bit more complicated. No matter what, you must always crop away as much as you can before proceeding.
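
As a quick illustration, assume a hypothetical 1920x1080 source with 139 pixels of black at the top and 138 at the bottom. You would crop the even part now and deal with the leftover line as described below:

crop = src.std.Crop(top=138, bottom=138)  # crop what 4:2:0 allows (mod2)
# one black/dirty row remains at the top; handle it later,
# e.g. with FillBorders or during the resize (see the dirty lines chapter)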

If you're working with a completely black-and-white film, read the appendix entry on these. The following only applies to videos with colors.

Then, read through this guide's FillBorders explanation from the dirty lines subchapter. This will explain what to do if you do not plan on resizing.

If you do plan on resizing, proceed according to the resizing note from the dirty lines subchapter.

Resizing is a very complex topic. However, for simple downscaled encodes, one doesn't need to know very much. As such, this page will only cover the necessities for downscaling. Those interested in knowing more about resampling should refer to the Irrational Encoding Wizardry's guide's resampling page for more information.

You can, however, check the later subchapters for some slightly more advanced topics such as descaling and rescaling or chroma resampling and shifting, both of which are absolute necessities to know about when encoding anime.

Downscaling

For downscaling, the go-to resizer is a spline36 resizer:

resize = src.resize.Spline36(1280, 720, dither_type="error_diffusion")

The parameters here should be straightforward: simply adjust width and height as necessary. Don't worry about dither_type="error_diffusion" yet, simply leave this as-is; all it does is make for a nicer looking output. The explanation for this parameter can be found in the dithering chapter.

Finding target dimensions

The desired dimensions for standard resolutions should be fairly well known for 16:9 content: \(3840\times2160\) for 2160p, \(1920\times1080\) for 1080p, and \(1280\times720\) for 720p.

However, most films aren't made in this aspect ratio. A more common aspect ratio would be 2.39:1, where the video is in \(2048\times858\). Consumer products are usually in the aforementioned resolutions, so it's more likely to see something like \(1920\times804\) after black bars are cropped.

Going from this to 720p gets us exactly \(1280\times536\):

\[\begin{align} w &= \frac{720}{1080}\times1920=1280 \\ h &= \frac{720}{1080}\times804 =536 \end{align} \]

However, this won't always be the case. Let's say your source is in \(1920\times806\):

\[\begin{align} w &= \frac{720}{1080}\times1920=1280 \\ h &= \frac{720}{1080}\times806 =537.\overline{3} \end{align} \]

Obviously, we can't resize to \(537.\overline{3}\), so we need to find the closest height with the lowest aspect ratio error. The solution here is to divide by two, round, then multiply by two again:

\[ h = \mathrm{round}\left( \frac{720}{1080} \times 806 \times \frac{1}{2} \right) \times 2 = 538 \]

In Python:

height = round(1280 / src.width / 2 * src.height) * 2

Now, we feed this to our resize:

resize = src.resize.Spline36(1280, height, dither_type="error_diffusion")

Alternatively, if our source was cropped on the left and right instead of top and bottom, we do:

width = round(720 / src.height / 2 * src.width) * 2

If you (understandably) don't want to bother with this, you can use the zresize wrapper in awsmfunc:

resize = awf.zresize(src, preset=720)

With the preset option, you don't have to bother calculating anything, just state the target resolution (in height) and it'll determine the correct dimensions for you.

Notes

For resizing uneven crops, please refer to the dirty lines chapter, specifically the FillBorders section and the notes.

Additionally, it is worth noting that resizing should not be done at the beginning of your script, as doing so can damage some of the filtering performed and even reintroduce issues.

Ideal resolutions

For digital anime, please refer to the descaling subchapter for this. It is extremely rare for descaling to be relevant for live action, too, but if your source is especially blurry and clearly a cheap production, it's also worth looking into.

It's common knowledge that not every source should be encoded in the source's resolution. As such, one should know how to determine whether a source warrants e.g. a 1080p encode or if a 720p encode would suffice from a detail-retention standpoint.

To do this, we simply compare the source against a downscale that has been scaled back up:

downscale = src.resize.Spline36(1280, 720, dither_type="error_diffusion")
rescale = downscale.resize.Spline36(src.width, src.height, dither_type="error_diffusion")

Now, we interleave the two, then go through the video and see if details are blurred:

out = core.std.Interleave([src, rescale])

We can also perform all these with the UpscaleCheck wrapper from awsmfunc:

out = awf.UpscaleCheck(src)

Let's look at two examples. First, Shinjuku Swan II:

Here, edges get very blurry in the rescale, meaning a 1080p is warranted. This is especially noticeable in the plants' leaves.

Now, The Way of the Dragon:

Here, we see grain is blurred ever so slightly, and some compression artifacts are warped. However, edges and details are not affected, meaning a 720p would do just fine here.

Descaling

If you've read a bit about anime encoding, you've probably heard the term "descaling" before; this is the process of "reversing" an upscale by finding the native resolution and resize kernel used. When done correctly, this is a near-lossless process and produces a sharper output than standard spline36 resizing, with fewer haloing artifacts. However, when done incorrectly, this will only add to the already existing issues that come with upscaling, such as haloing, ringing, etc.

The most commonly used plugin to reverse upscales is Descale, which is most easily called via fvsfunc, which has an alias for each kernel, e.g. fvf.Debilinear. This supports bicubic, bilinear, lanczos, and spline upscales.

Most digitally produced anime content, especially TV shows, will be a bilinear or bicubic upscale from 720p, 810p, 864p, 900p, or anything in-between. While not something that can only be done with anime, it is far more prevalent with such content, so we will focus on anime accordingly.

As our example, we'll look at Nichijou, which is a bilinear upscale from 720p.

To showcase how nice a descale can look, let's compare with a standard spline resize:

descale = fvf.Debilinear(src, 1280, 720)
spline = src.resize.Spline36(1280, 720)
out = core.std.Interleave([descale, spline])

Native resolutions and kernels

Now, the first thing you need to do when you want to descale is figure out what was used to resize the video and from which resolution the resize was done. The most popular tool for this is getnative: you feed it an image, which it then descales to a range of heights, resizes back, and calculates the difference from the source, plotting the result so you can find the native resolution.

For this to work best, you'll want to find a bright frame with very little blurring, VFX, grain etc.

Once you've found one, you can run the script as follows:

python getnative.py image.png -k bilinear

This will output a graph in a Results directory and guess the resolution. It's best to take a look at the graph yourself, though. In our example, these are the correct parameters, so we get the following:


There is a clear dip at 720p. We can also test other kernels:

python getnative.py image.png -k bicubic -b 0 -c 1

The graph then looks as follows:


If you'd like to test all likely kernels, you can use --mode "all".

To double check this, we compare the input frame with a descale upscaled back with the same kernel:

descale = fvf.Debilinear(src, 1280, 720)
rescale = descale.resize.Bilinear(src.width, src.height)
merge_chroma = rescale.std.Merge(src, [0, 1])
out = core.std.Interleave([src, merge_chroma])

Here, we've merged the chroma from the source into our rescale, as the chroma is at a lower resolution than the luma, so we can't descale it. The result:

As you can see, lineart is practically identical and no extra haloing or aliasing was introduced.

On the other hand, if we try an incorrect kernel and resolution, we see lots more artifacts in the rescaled image:

b, c = 0, 1
descale = fvf.Debicubic(src, 1440, 810, b=b, c=c)
rescale = descale.resize.Bicubic(src.width, src.height, filter_param_a=b, filter_param_b=c)
merge_chroma = rescale.std.Merge(src, [0, 1])
out = core.std.Interleave([src, merge_chroma])

Mixed Resolutions

The example above of an incorrect kernel and height should make it obvious that descaling incorrectly is quite destructive. Unfortunately, most video that can be descaled has elements in other resolutions. Sometimes, different elements in a frame will have different resolutions, e.g. the background is in 900p, character A is in 810p, and character B is in 720p. In cases like this, it's usually safer to do a simple spline36 resize. One can technically do a lot of masking to fix this, but that's a lot of effort, and the masks are likely to fail.

A more common situation in which one will encounter mixed resolutions is credits and overlays, which are usually in 1080p. Let's look at what happens if we add some text to the above frame and descale that compared to a spline36 resize on it. To make comparing easier, these images are zoomed in by a factor of 3:

The debilinear resize clearly adds stronger haloing artifacts here.

To deal with this, we can use the DescaleM functions from fvsfunc, which mask these elements and scale them via a spline36 resize:

descale = fvf.DebilinearM(src, 1280, 720)

As these functions are comparatively slow, you might want to consider finding these elements beforehand and applying the function only to those frames. If you aren't certain that your frame doesn't have 1080p elements, though, stick with these functions.
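
For example, here is a sketch of restricting DebilinearM to hypothetical credit frames (the frame ranges are made up for illustration), reusing the Rfs approach from the scene-filtering section:

descale = fvf.Debilinear(src, 1280, 720)
descale_m = fvf.DebilinearM(src, 1280, 720)
descale = core.remap.Rfs(descale, descale_m, mappings="[0 263] [33500 35580]")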

Alternatively, in very rare cases, the resolution and/or kernel will change scene-by-scene, or even worse, frame-by-frame. You can consider trying lvsfunc.scale.descale, which tries to find the ideal height for each frame. Ideally, however, you should do this manually.

4:4:4 and 4:2:0

Upscaling and Rescaling

Upscaling

Rescaling

Chroma Resampling and Shifting

Bit Depths: An Introduction

When you filter a frame, the results are limited to values available in your bit depth. By default, most SDR content comes in 8-bit and HDR content in 10-bit. In 8-bit, you're limited to values between 0 and 255. However, as most video content is in limited range, this range becomes 16 to 235 for luma and 16 to 240 for chroma.

Let's say you want to raise every pixel whose value lies in the range of 60 to 65 to the power of 0.88. Rounding to three decimal places:

Original  Raised
60        36.709
61        37.247
62        37.784
63        38.319
64        38.854
65        39.388
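
If you want to verify the numbers in the table yourself, the rounding behavior is easy to reproduce in plain Python (this is not part of any filter chain):

for v in range(60, 66):
    print(v, round(v ** 0.88, 3), round(v ** 0.88))
# 60 36.709 37
# 61 37.247 37
# 62 37.784 38
# 63 38.319 38
# 64 38.854 39
# 65 39.388 39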

As we're limited to integer values between 0 and 255, these all round to 37, 37, 38, 38, 39, 39. So, although the filter maps each input to a distinct value, rounding collapses them into just three output values. This quickly leads to unwanted banding artifacts. For example, raising to the power of 0.88 in 8-bit vs a higher bit depth of 32-bit:

To mitigate this, we work in higher bit depths and later use so-called dither algorithms to add some fluctuation during rounding and prevent banding. The usual bit depths are 16-bit and 32-bit. While 16-bit may sound worse at first, the difference from 32-bit isn't noticeable; 32-bit, being float instead of integer, is also not supported by every filter.

Luckily for those not working in higher bit depth, lots of filters force higher precisions internally and dither the results back properly. However, switching between bit depths multiple times is a waste of CPU cycles and, in extreme cases, can alter the image as well.

Changing bit depths

To work in a higher bit depth, you can use the depth function from vsutil at the start and end of your filter chain. This will use a high quality dither algorithm by default and takes only a few keystrokes:

from vsutil import depth

src = depth(src, 16)

resize = ...

my_great_filter = ...

out = depth(my_great_filter, 8)

When you're working in higher bit depths, it's important to remember that some functions might expect parameter input values in 8-bit, while others expect them in the input bit depth. If you mistakenly enter 255 assuming 8-bit in a function expecting 16-bit input, your results will be extremely different, as 255 is the highest value in 8-bit, while in 16-bit, it's roughly equivalent to 1 in 8-bit.

To convert values, you can use scale_value from vsutil, which will help handling edge cases etc.:

from vsutil import scale_value

v_8bit = 128

v_16bit = scale_value(128, 8, 16)

This would get you v_16bit = 32768, the middle point of 16-bit.

This isn't quite as simple for 32-bit float, as you need to specify whether to scale offsets depending on range and whether you're scaling luma or chroma. This is because limited range luma values are between 0 and 1, while chroma values are between -0.5 and +0.5. Usually, you're going to be dealing with TV range, so set scale_offsets=True:

from vsutil import scale_value

v_8bit = 128

v_32bit_luma = scale_value(128, 8, 32, scale_offsets=True)
v_32bit_chroma = scale_value(128, 8, 32, scale_offsets=True, chroma=True)

This gets us v_32bit_luma = 0.5, v_32bit_chroma = 0.

Dither Algorithms

TODO

DUE FOR REWRITE

This page is due for a rewrite to include the very useful debanders from vs-debandshit. You might find more success with dumb3kdb and f3kpf than what's listed here.

Debanding

This is the most common issue one will encounter. Banding usually happens when bit starvation and poor encode settings turn smooth gradients into abrupt color changes, which obviously ends up looking bad. These can be fixed by performing blur-like operations and limiting their outputs.

Note that, as blurring is a very destructive process, it's advised to only apply this to necessary parts of your video and use masks to further limit the changes.

There are three great tools for VapourSynth that are used to fix banding: neo_f3kdb, fvsfunc's gradfun3, which has a built-in mask, and vs-placebo's placebo.Deband.

Banding example fixed with f3kdb default settings.

neo_f3kdb

deband = core.neo_f3kdb.deband(src=clip, range=15, y=64, cb=64, cr=64, grainy=64, grainc=64, dynamic_grain=False, sample_mode=2)

These settings may come off as self-explanatory for some, but here's what they do:

  • src This is obviously your source clip.

  • range This specifies the range of pixels that are used to calculate whether something is banded. A higher range means more pixels are used for calculation, meaning it requires more processing power. The default of 15 should usually be fine. Raising this may help make larger gradients with less steps look smoother, while lower values will help catch smaller instances.

  • y The most important setting, since most (noticeable) banding takes place on the luma plane. It specifies how big the difference has to be for something on the luma plane to be considered as banded. You should start low and slowly but surely build this up until the banding is gone. If it's set too high, lots of details will be seen as banding and hence be blurred. Depending on your sample mode, y values will either only have an effect in steps of 16 (mode 2) or 32 (modes 1, 3, 4). This means that y=20 is equivalent to y=30.

  • cb and cr The same as y but for chroma. However, banding on the chroma planes is comparatively uncommon, so you can often leave this off.

  • grainy and grainc In order to keep banding from re-occurring and to counteract smoothing, grain is usually added after the debanding process. However, as this fake grain is quite noticeable, it's recommended to be conservative. Alternatively, you can use a custom grainer, which will get you a far nicer output (see the graining section for more on this).

  • dynamic_grain By default, grain added by f3kdb is static. This compresses better, since there's obviously less variation, but it usually looks off with live action content, so it's normally recommended to set this to True unless you're working with animated content.

  • sample_mode Is explained in the README. Consider switching to 4, since it might have less detail loss.
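
Tying this back to the masking advice from the start of this chapter, a sketch of a masked deband might look like the following. The mask used here, kagefunc's retinex_edgemask, is just one common choice, and the debanding values are purely illustrative:

deband = core.neo_f3kdb.Deband(src, range=15, y=48, cb=0, cr=0, grainy=32, grainc=0, dynamic_grain=True, sample_mode=2)
mask = kgf.retinex_edgemask(src).std.Inflate()
masked = core.std.MaskedMerge(deband, src, mask)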

In-depth function explanation TODO

GradFun3

The most popular alternative to f3kdb is gradfun3. This function is more resource intensive and has less straightforward parameters, but it can also prove useful in cases where f3kdb struggles. As there are a ton of parameters, this guide will only cover the most important ones:

import fvsfunc as fvf
deband = fvf.GradFun3(src, thr=0.35, radius=12, elast=3.0, mask=2, mode=3, smode=2, debug=False, planes=list(range(src.format.num_planes)), ref=src)

  • thr is equivalent to y, cb, and cr in what it does. You'll likely want to raise or lower it.

  • radius has the same effect as f3kdb's range.

  • smode sets the smooth mode. It's usually best left at its default, or set to 5 if you'd like to use a CUDA-enabled GPU instead of your CPU. Uses ref (defaults to input clip) as a reference clip.

  • mask sets the mask strength. 0 to disable. The default is a sane value.

  • planes sets which planes should be processed.

  • debug allows you to view the mask.

  • elast controls blending between debanded and source clip. Default is sane. Higher values prioritize debanded clip more.

In-depth function explanation TODO For a more in-depth explanation of what `thr` and `elast` do, check the algorithm explanation in mvsfunc.

placebo.Deband

This debander is quite new to the VapourSynth scene, but it's very good at fixing strong banding. However, as such, it is also prone to needless detail loss and hence should only be used when necessary, ideally combined with a detail/edge mask. Its (current) parameters:

placebo.Deband(clip clip[, int planes = 1, int iterations = 1, float threshold = 4.0, float radius = 16.0, float grain = 6.0, int dither = True, int dither_algo = 0])

It's not unlikely that this function will see significant change in the future, hence the README is also very much worth reading.

Parameters you'll want to look at:

  • planes obviously the to-be-processed planes. The syntax is different here, check the README. In short, default for luma-only, 1 | 2 | 4 for luma and chroma.

  • iterations sets how often the debander is looped. It's not recommended to change this from the default, although this can be useful in extreme cases.

  • threshold sets the debander's strength or rather the threshold when a pixel is changed. You probably don't want to go much higher than 12. Go up in steps of 1 and fine-tune if possible.

  • radius does the same as for the previous functions.

  • grain is again the same as f3kdb, although the grain is a lot nicer.

In-depth function explanation TODO It uses the mpv debander, which just averages pixels within a range and outputs the average if the difference is below a threshold. The algorithm is explained in the source code.

Banding detection

If you want to automate your banding detection, you can use banddtct from awsmfunc. Make sure to adjust the values properly and check the full output. Check this link for an explanation on how to use it. You can also just run adptvgrnMod or adaptive_grain with a high luma_scaling value in hopes that the grain covers it up fully. More on this in the graining section. Note that both of these methods won't be able to pick up/fix every kind of banding. banddtct can't find banding covered by grain, and graining to fix banding only works for smaller instances.

Deblocking

Deblocking is mostly equivalent to smoothing the source, usually with another mask on top. The most popular function here is Deblock_QED from havsfunc. The main parameters are

  • quant1: Strength of block edge deblocking. Default is 24. You may want to raise this value significantly.

  • quant2: Strength of block internal deblocking. Default is 26. Again, raising this value may prove to be beneficial.
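
A minimal sketch of a call with slightly raised strengths (the values are just an example, tune them per source):

import havsfunc as haf
deblock = haf.Deblock_QED(src, quant1=30, quant2=32)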

In-depth function explanation TODO

Other popular options are deblock.Deblock, which is quite strong, but almost always works,

In-depth function explanation TODO

dfttest.DFTTest, which is weaker, but still quite aggressive, and fvf.AutoDeblock, which is quite useful for deblocking MPEG-2 sources and can be applied on the entire video. Another popular method is to simply deband, as deblocking and debanding are very similar. This is a decent option for AVC Blu-ray sources.

In-depth function explanation TODO

Graining

TODO: explain why we love grain so much and static vs. dynamic grain. Also, images.

Graining Filters

There are a couple different filters you can use to grain. As lots of functions work similarly, we will only cover AddGrain and libplacebo graining.

AddGrain

This plugin allows you to add grain to the luma and chroma planes with differing strengths and grain patterns:

grain = src.grain.Add(var=1.0, uvar=0.0, seed=-1, constant=False)

Here, var controls the grain strength for the luma plane, and uvar controls the strength for the chroma plane. seed allows you to specify a custom grain pattern, which is useful if you'd like to reproduce a grain pattern multiple times, e.g. for comparing encodes. constant allows you to choose between static and dynamic grain.

Raising the strength increases both the amount of grain added as well as the offset a grained pixel will have from the original pixel. For example, var=1 will lead to values being up to 3 8-bit steps away from the input values.

There's no real point in using this function directly, but it's good to know what it does, as it's considered the go-to grainer.

In-depth function explanation This plugin uses a normal distribution to find the values it changes the input by. The `var` parameter is the standard deviation (usually noted as \(\sigma\)) of the normal distribution.

This means that (these are approximations):

  • \(68.27\%\) of output pixel values are within \(\pm1\times\mathtt{var}\) of the input value
  • \(95.45\%\) of output pixel values are within \(\pm2\times\mathtt{var}\) of the input value
  • \(99.73\%\) of output pixel values are within \(\pm3\times\mathtt{var}\) of the input value
  • \(50\%\) of output pixel values are within \(\pm0.675\times\mathtt{var}\) of the input value
  • \(90\%\) of output pixel values are within \(\pm1.645\times\mathtt{var}\) of the input value
  • \(95\%\) of output pixel values are within \(\pm1.960\times\mathtt{var}\) of the input value
  • \(99\%\) of output pixel values are within \(\pm2.576\times\mathtt{var}\) of the input value
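
These percentages aren't specific to the plugin; they're just properties of the normal distribution, and you can verify them numerically if you're curious (this assumes numpy and has nothing to do with VapourSynth itself):

import numpy as np

var = 1.0  # standard deviation, i.e. AddGrain's var
noise = np.random.normal(0, var, 10_000_000)
for k in (1, 2, 3):
    print(k, (np.abs(noise) <= k * var).mean())  # ~0.683, ~0.954, ~0.997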

placebo.Deband as a grainer

Alternatively, using placebo.Deband solely as a grainer can also lead to some nice results:

grain = src.placebo.Deband(iterations=0, grain=6.0)

The main advantage here is it runs on your GPU, so if your GPU isn't already busy with other filters, using this can get you a slight speed-up.

In-depth function explanation TODO

adaptive_grain

This function from kagefunc applies AddGrain according to overall frame brightness and individual pixel brightness. This is very useful for covering up minor banding and/or helping x264 distribute more bits to darks.

grain = kgf.adaptive_grain(src, strength=.25, static=True, luma_scaling=12, show_mask=False)

strength here is var from AddGrain. The default or slightly lower is usually fine. You likely don't want to go above 0.75.

The luma_scaling parameter is used to control how strong it should favor darker frames over brighter frames, whereby lower luma_scaling will apply more grain to bright frames. You can use extremely low or extremely high values here depending on what you want. For example, if you want to grain all frames significantly, you might use luma_scaling=5, while if you just want to apply grain to darker parts of darker frames to cover up minor banding, you might use luma_scaling=100.

show_mask shows you the mask that's used to apply the grain, with whiter meaning more grain is applied. It's recommended to switch this on when tuning luma_scaling.

In-depth function explanation The author of the function wrote a fantastic blog post explaining the function and how it works.

GrainFactory3

TODO: rewrite this or just remove it.

An older alternative to kgf.adaptive_grain, havsfunc's GrainFactory3 is still quite interesting. It splits pixel values into four groups based on their brightness and applies differently sized grain at different strengths via AddGrain to these groups.

grain = haf.GrainFactory3(src, g1str=7.0, g2str=5.0, g3str=3.0, g1shrp=60, g2shrp=66, g3shrp=80, g1size=1.5, g2size=1.2, g3size=0.9, temp_avg=0, ontop_grain=0.0, th1=24, th2=56, th3=128, th4=160)

The parameters are explained above the source code.

This function is mainly useful if you want to apply grain to specific frames only, as overall frame brightness should be taken into account if grain is applied to the whole video.

For example, GrainFactory3 to make up for missing grain on left and right borders:

In-depth function explanation TODO

In short: Create a mask for each brightness group, use bicubic resizing with sharpness controlling b and c to resize the grain, then apply that. Temporal averaging just averages the grain for the current frame and its direct neighbors using misc.AverageFrames.

adptvgrnMod

This function resizes grain in the same way GrainFactory3 does, then applies it using the method from adaptive_grain. It also has some protection for darks and brights to maintain average frame brightness:

grain = agm.adptvgrnMod(src, strength=0.25, size=1, sharp=50, static=False, luma_scaling=12, seed=-1, show_mask=False)

Grain strength is controlled by strength, for both luma and chroma grain. If a single value is passed, the chroma graining applied is half of that value. To choose both strengths, pass a list like strength=[0.50, 0.35], which will apply luma strength of 0.50 and chroma strength of 0.35.

Just like adaptive_grain, the default or slightly lower is usually fine, but you shouldn't go too high. If you're using a size greater than the default, you can get away with higher values, e.g. strength=1, but it's still advised to stay conservative with grain application.

The size and sharp parameters allow you to make the applied grain look a bit more like the rest of the film's. It's recommended to play around with these so that fake grain isn't too obvious. In most cases, you will want to raise both of them ever so slightly, e.g. size=1.2, sharp=60.

static, luma_scaling, and show_mask are equivalent to adaptive_grain, so scroll up for explanations. seed is the same as AddGrain's; again, scroll up.

By default, adptvgrnMod will fade grain around extremes (16 or 235) and shades of gray. These features can be turned off by setting fade_edges=False and protect_neutral=False respectively.

It's recently become common practice to remove graining entirely from one's debander and grain debanded areas entirely with this function.
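
A sketch of that workflow, assuming neo_f3kdb with its own graining disabled followed by adptvgrnMod (the values are illustrative):

deband = core.neo_f3kdb.Deband(src, range=15, y=48, cb=0, cr=0, grainy=0, grainc=0)
grain = agm.adptvgrnMod(deband, strength=0.25, size=1.2, sharp=60, luma_scaling=12)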

sizedgrn

If one wants to disable the brightness-based application, one can use sizedgrn, which is the internal graining function in adptvgrnMod.

Some examples of adptvgrnMod compared with sizedgrn for those curious

A bright scene, where the brightness-based application makes a large difference:

An overall darker scene, where the difference is a lot smaller:

A dark scene, where grain is applied evenly (almost) everywhere in the frame:

In-depth function explanation (Old write-up from the function's author.)

Size and Sharpness

The graining part of adptvgrnMod is the same as GrainFactory3's; it creates a "blank" (midway point of bit depth) clip at a resolution defined by the size parameter, then scales that via a bicubic kernel that uses b and c values determined by sharp:

$$\mathrm{grain\ width} = \mathrm{mod}4 \left( \frac{\mathrm{clip\ width}}{\mathrm{size}} \right)$$

For example, with a 1920x1080 clip and a size value of 1.5:

$$ \mathrm{mod}4 \left( \frac{1920}{1.5} \right) = 1280 $$

This determines the size of the frame the grainer operates on.

Now, the bicubic kernel's parameters are determined:

$$ b = \frac{\mathrm{sharp}}{-50} + 1 $$ $$ c = \frac{1 - b}{2} $$

This means that for the default sharp of 50, a Catmull-Rom filter is used:

$$ b = 0, \qquad c = 0.5 $$

Values under 50 will tend towards B-Spline (b=1, c=0), while ones above 50 will tend towards b=-1, c=1. As such, for a Mitchell (b=1/3, c=1/3) filter, one would require sharp of 100/3.

The grained "blank" clip is then resized to the input clip's resolution with this kernel. If size is greater than 1.5, an additional resizer call is added before the upscale to the input resolution:

$$ \mathrm{pre\ width} = \mathrm{mod}4 \left( \frac{\mathrm{clip\ width} + \mathrm{grain\ width}}{2} \right) $$

With our resolutions so far (assuming we did this for size 1.5), this would be 1600. This means with size 2, where this preprocessing would actually occur, our grain would go through the following resolutions:

$$ 960 \rightarrow 1440 \rightarrow 1920 $$
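
A small sanity check of the above in plain Python; this just mirrors the formulas and is not the actual implementation:

def mod4(x):
    return int(x) // 4 * 4

clip_width, size, sharp = 1920, 2, 50
grain_width = mod4(clip_width / size)             # 960
b = sharp / -50 + 1                               # 0.0, i.e. Catmull-Rom
c = (1 - b) / 2                                   # 0.5
pre_width = mod4((clip_width + grain_width) / 2)  # 1440
print(grain_width, pre_width, clip_width)         # 960 1440 1920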

Fade Edges

The fade_edges parameter introduces the option to attempt to maintain overall average image brightness, similar to ideal dithering. It does so by limiting the graining at the edges of the clip's range. This is done via the following expression:

x y neutral - abs - low < x y neutral - abs + high > or
x y neutral - x + ?

Here, x is the input clip, y is the grained clip, neutral is the midway point from the previously grained clip, and low and high are the edges of the range (e.g. 16 and 235 for 8-bit luma). Converted from postfix to infix notation, this reads:

\[x = x\ \mathtt{if}\ x - \mathrm{abs}(y - neutral) < low\ \mathtt{or}\ x + \mathrm{abs}(y - neutral) > high\ \mathtt{else}\ x + (y - neutral)\]

The effect here is that grain is only applied where it couldn't be clipped during output, regardless of whether it pushes the pixel in a positive or negative direction, while grain that could exceed the plane's limits is dropped entirely.

In addition to this parameter, protect_neutral is also available. This parameter protects "neutral" chroma (i.e. chroma for shades of gray) from being grained. To do this, it takes advantage of AddGrainC working according to a Gaussian distribution, which means that \(max\ value = 3 \times \sigma\) (sigma being the standard deviation - the strength or cstrength parameter) is with 99.73% certainty the largest deviation from the norm (0). This means we can perform a similar operation to the one for fade_edges to keep the midways from being grained. To do this, we resize the input clip to 4:4:4 and use the following expression:

\[\begin{align}x \leq (low + max\ value)\ \mathtt{or}\ x \geq (high - max\ value)\ \mathtt{and}\\ \mathrm{abs}(y - neutral) \leq max\ value\ \mathtt{and}\ \mathrm{abs}(z - neutral) \leq max\ value \end{align}\]

With x, y, z being each of the three planes. If the statement is true, the input clip is returned, else the grained clip is returned.

I originally thought the logic behind protect_neutral would also work well for fade_edges, but I then realized this would completely remove grain near the edges instead of fading it.

Now, the input clip and grained clip (which was merged via std.MergeDiff, which is x + y - neutral) can be merged via the adaptive_grain mask.

Dirty lines from A Silent Voice (2016)'s intro. On mouseover: fixed with ContinuityFixer and FillBorders.

One of the more common issues you may encounter is 'dirty lines', which are usually found on the borders of video, where a row or column of pixels exhibits inconsistent luma values compared to its surroundings. Oftentimes, this is due to improper downscaling, for example downscaling after applying borders. Dirty lines can also occur because the compressionist doesn't consider that, while they're working with 4:2:2 chroma subsampling (meaning their height doesn't have to be mod2), consumer video will be 4:2:0, leading to extra black rows that you can't get rid of during cropping if the main clip isn't placed properly. Another form of dirty lines is exhibited when the chroma planes are present on black bars. Usually, these should be cropped out. The opposite can also occur, however, where planes with legitimate luma information lack chroma information.

It's important to remember that sometimes your source will have fake lines (often referred to as 'dead' lines), meaning ones without legitimate information. These will usually just mirror the next row/column. Do not bother fixing these, just crop them instead. An example:

Similarly, when attempting to fix dirty lines, you should thoroughly check that your fix has not caused unwanted problems, such as smearing (common with overzealous ContinuityFixer values) or flickering (especially on credits; it is advisable to omit credit reels from your fix in most cases). If you cannot figure out a proper fix, it is completely reasonable to either crop off the dirty line(s) or leave them unfixed. A bad fix is worse than no fix!

Here are five commonly used methods for fixing dirty lines:

rektlvls

From rekt. This is basically FixBrightnessProtect3 and FixBrightness from AviSynth in one, although unlike FixBrightness, not the entire frame is processed. Its values are quite straightforward. Raise the adjustment values to brighten, lower to darken. Set prot_val to None and it will function like FixBrightness, meaning the adjustment values will need to be changed.

from rekt import rektlvls
fix = rektlvls(src, rownum=None, rowval=None, colnum=None, colval=None, prot_val=[16, 235])

If you'd like to process multiple rows at a time, you can enter a list (e.g. rownum=[0, 1, 2]).

To illustrate this, let's look at the dirty lines in the black and white Blu-ray of Parasite (2019)'s bottom rows:

In this example, the bottom four rows have alternating brightness offsets from the next two rows. So, we can use rektlvls to raise luma in the first and third row from the bottom, and again to lower it in the second and fourth:

fix = rektlvls(src, rownum=[803, 802, 801, 800], rowval=[27, -10, 3, -3])

In this case, we are in FixBrightnessProtect3 mode. We aren't taking advantage of prot_val here, but people usually use this mode regardless, as there's always a chance it might help. The result:

In-depth function explanation In FixBrightness mode, this will perform an adjustment with std.Levels on the desired row. This means that, in 8-bit, every possible value \(v\) is mapped to a new value according to the following function: $$\begin{aligned} &\forall v \leq 255, v\in\mathbb{N}: \\ &\max\left[\min\left(\frac{\max(\min(v, \texttt{max_in}) - \texttt{min_in}, 0)}{(\texttt{max_in} - \texttt{min_in})}\times (\texttt{max_out} - \texttt{min_out}) + \texttt{min_out}, 255\right), 0\right] + 0.5 \end{aligned}$$ For positive adj_val, \(\texttt{max_in}=235 - \texttt{adj_val}\). For negative ones, \(\texttt{max_out}=235 + \texttt{adj_val}\). The rest of the values stay at 16 or 235 depending on whether they are maximums or minimums.

FixBrightnessProtect3 mode takes this a bit further, performing (almost) the same adjustment for values between the first \(\texttt{prot_val} + 10\) and the second \(\texttt{prot_val} - 10\), where it scales linearly. Its adjustment value does not work the same, as it adjusts by \(\texttt{adj_val} \times 2.19\). In 8-bit:

Line brightening: $$\begin{aligned} &\texttt{if }v - 16 <= 0 \\ &\qquad 16 / \\ &\qquad \texttt{if } 235 - \texttt{adj_val} \times 2.19 - 16 <= 0 \\ &\qquad \qquad 0.01 \\ &\qquad \texttt{else} \\ &\qquad \qquad 235 - \texttt{adj_val} \times 2.19 - 16 \\ &\qquad \times 219 \\ &\texttt{else} \\ &\qquad (v - 16) / \\ &\qquad \texttt{if }235 - \texttt{adj_val} \times 2.19 - 16 <= 0 \\ &\qquad \qquad 0.01 \\ &\qquad \texttt{else} \\ &\qquad \qquad 235 - \texttt{adj_val} \times 2.19 - 16 \\ &\qquad \times 219 + 16 \end{aligned}$$

Line darkening: $$\begin{aligned} &\texttt{if }v - 16 <= 0 \\ &\qquad\frac{16}{219} \times (235 + \texttt{adj_val} \times 2.19 - 16) \\ &\texttt{else} \\ &\qquad\frac{v - 16}{219} \times (235 + \texttt{adj_val} \times 2.19 - 16) + 16 \\ \end{aligned}$$

All of this, which we give the variable \(a\), is then protected by (for simplicity's sake, only doing dual prot_val, noted by \(p_0\) and \(p_1\)): $$\begin{aligned} & a \times \min \left[ \max \left( \frac{v - p_1}{10}, 0 \right), 1 \right] \\ & + v \times \min \left[ \max \left( \frac{v - (p_1 - 10)}{10}, 0 \right), 1 \right] \times \min \left[ \max \left( \frac{p_0 - v}{-10}, 0\right), 1 \right] \\ & + v \times \max \left[ \min \left( \frac{p_0 + 10 - v}{10}, 0\right), 1\right] \end{aligned}$$

bbmod

From awsmfunc. This is a mod of the original BalanceBorders function. While it doesn't preserve original data nearly as well as rektlvls, it will lead to decent results with high blur and thresh values and is easy to use for multiple rows, especially ones with varying brightness, where rektlvls is no longer useful. If it doesn't produce decent results, these can be changed, but the function will get more destructive the lower you set them. It's also significantly faster than the versions in havsfunc and sgvsfunc, as only necessary pixels are processed.

import awsmfunc as awf
bb = awf.bbmod(src=clip, left=0, right=0, top=0, bottom=0, thresh=[128, 128, 128], blur=[20, 20, 20], planes=[0, 1, 2], scale_thresh=False, cpass2=False)

The arrays for thresh and blur are again y, u, and v values. It's recommended to try blur=999 first, then lowering that and thresh until you get decent values.
thresh specifies how far the result can vary from the input. This means that the lower this is, the better. blur is the strength of the filter, with lower values being stronger, and larger values being less aggressive. If you set blur=1, you're basically copying rows. If you're having trouble with chroma, you can try activating cpass2, but note that this requires a very low thresh to be set, as this changes the chroma processing significantly, making it quite aggressive.

For our example, I've created fake dirty lines, which we will fix:

To fix this, we can apply bbmod with a low blur and a high thresh, meaning pixel values can change significantly:

fix = awf.bbmod(src, top=6, thresh=90, blur=20)

Our output is already a lot closer to what we assume the source should look like. Unlike rektlvls, this function is quite quick to use, so lazy people (i.e. everyone) can use this to fix dirty lines before resizing, as the difference won't be noticeable after resizing.

While you can use rektlvls on as many rows/columns as necessary, the same doesn't hold true for bbmod. Unless you are resizing after, you should only use bbmod on two rows/pixels for low blur values (\(\approx 20\)) or three for higher blur values. If you are resizing after, you can change the maximum value according to: \[ max_\mathrm{resize} = max \times \frac{resolution_\mathrm{source}}{resolution_\mathrm{resized}} \]

In-depth function explanation bbmod works by blurring the desired rows, input rows, and reference rows within the image using a blurred bicubic kernel, whereby the blur amount determines the resolution it's scaled to, according to \(\mathtt{\frac{width}{blur}}\). The output is compared using expressions and finally merged according to the threshold specified.

The function re-runs one function for the top border for each side by flipping and transposing. As such, this explanation will only cover fixing the top.

First, we double the resolution without any blurring (\(w\) and \(h\) are input clip's width and height): \[ clip_2 = \texttt{resize.Point}(clip, w\times 2, h\times 2) \]

Now, the reference is created by cropping off double the to-be-fixed number of rows. We set the height to 2 and then match the size to the double res clip: \[\begin{align} clip &= \texttt{CropAbs}(clip_2, \texttt{width}=w \times 2, \texttt{height}=2, \texttt{left}=0, \texttt{top}=top \times 2) \\ clip &= \texttt{resize.Point}(clip, w \times 2, h \times 2) \end{align}\]

Before the next step, we determine the \(blurwidth\): \[ blurwidth = \max \left( 8, \texttt{floor}\left(\frac{w}{blur}\right)\right) \] In our example, we get 8.

Now, we use a blurred bicubic resize to go down to \(blurwidth \times 2\) and back up: \[\begin{align} referenceBlur &= \texttt{resize.Bicubic}(clip, blurwidth \times 2, top \times 2, \texttt{b}=1, \texttt{c}=0) \\ referenceBlur &= \texttt{resize.Bicubic}(referenceBlur, w \times 2, top \times 2, \texttt{b}=1, \texttt{c}=0) \end{align}\]

Then, crop the doubled input to have height of \(top \times 2\): \[ original = \texttt{CropAbs}(clip_2, \texttt{width}=w \times 2, \texttt{height}=top \times 2) \]

Prepare the original clip using the same bicubic resize downwards: \[ clip = \texttt{resize.Bicubic}(original, blurwidth \times 2, top \times 2, \texttt{b}=1, \texttt{c}=0) \]

Our prepared original clip is now also scaled back down: \[ originalBlur = \texttt{resize.Bicubic}(clip, w \times 2, top \times 2, \texttt{b}=1, \texttt{c}=0) \]

Now that all our clips have been downscaled and scaled back up, which is the blurring process that approximates what the actual value of the rows should be, we can compare them and choose how much of what we want to use. First, we perform the following expression (\(x\) is \(original\), \(y\) is \(originalBlur\), and \(z\) is \(referenceBlur\)): \[ \max \left[ \min \left( \frac{z - 16}{y - 16}, 8 \right), 0.4 \right] \times (x + 16) + 16 \] The input here is: \[ balancedLuma = \texttt{Expr}(\texttt{clips}=[original, originalBlur, referenceBlur], \texttt{"z 16 - y 16 - / 8 min 0.4 max x 16 - * 16 +"}) \]

What did we do here? In cases where the original blur is low and supersampled reference's blur is high, we did: \[ 8 \times (original + 16) + 16 \] This brightens the clip significantly. Else, if the original clip's blur is high and supersampled reference is low, we darken: \[ 0.4 \times (original + 16) + 16 \] In normal cases, we combine all our clips: \[ (original + 16) \times \frac{originalBlur - 16}{referenceBlur - 16} + 16 \]

We add 128 so we can merge according to the difference between this and our input clip: \[ difference = \texttt{MakeDiff}(balancedLuma, original) \]

Now, we compare to make sure the difference doesn't exceed \(thresh\): \[\begin{align} difference &= \texttt{Expr}(difference, "x thresh > thresh x ?") \\ difference &= \texttt{Expr}(difference, "x thresh < thresh x ?") \end{align}\]

These expressions do the following: \[\begin{align} &\texttt{if }difference >/< thresh:\\ &\qquad thresh\\ &\texttt{else}:\\ &\qquad difference \end{align}\]

This is then resized back to the input size and merged using MergeDiff back into the original and the rows are stacked onto the input. The output resized to the same res as the other images:

FillBorders

From fb. This function pretty much just copies the next column/row in line. While this sounds silly, it can be quite useful when downscaling leads to more rows being at the bottom than at the top, and one has to fill a row due to YUV420's mod2 height requirement.

fill = core.fb.FillBorders(src=clip, left=0, right=0, bottom=0, top=0, mode="fixborders")

A very interesting use for this function is one similar to applying ContinuityFixer only to chroma planes, which can be used on gray borders or borders that don't match their surroundings no matter what luma fix is applied. This can be done with the following script:

fill = core.fb.FillBorders(src=clip, left=0, right=0, bottom=0, top=0, mode="fixborders")
merge = core.std.Merge(clipa=clip, clipb=fill, weight=[0,1])

You can also split the planes and process the chroma planes individually, although this is only slightly faster. A wrapper that allows you to specify per-plane values for fb is FillBorders in awsmfunc.

Note that you should only ever fill single columns/rows with FillBorders. If you have more black lines, crop them! If there are frames requiring different crops in the video, don't fill these up. More on this at the end of this chapter.

To illustrate what a source requiring FillBorders might look like, let's look at Parasite (2019)'s SDR UHD once again, which requires an uneven crop of 277. However, we can't crop this due to chroma subsampling, so we need to fill one row. To illustrate this, we'll only be looking at the top rows. Cropping with respect to chroma subsampling nets us:

crp = src.std.Crop(top=276)

Obviously, we want to get rid of the black line at the top, so let's use FillBorders on it:

fil = crp.fb.FillBorders(top=1, mode="fillmargins")

This already looks better, but the orange tones look washed out. This is because FillBorders only fills one chroma if two luma are fixed. So, we need to fill chroma as well. To make this easier to write, let's use the awsmfunc wrapper:

fil = awf.fb(crp, top=1)

Our source is now fixed. Some people may want to resize the chroma to maintain the original aspect ratio, at the cost of performing lossy resampling on the chroma, but whether this is the way to go is not generally agreed upon. If you want to go this route:

top = 1
bot = 1
new_height = crp.height - (top + bot)
fil = awf.fb(crp, top=top, bottom=bot)
out = fil.resize.Spline36(crp.width, new_height, src_height=new_height, src_top=top)

In-depth function explanation FillBorders has four modes, although we only really care about mirror, fillmargins, and fixborders. The mirror mode literally just mirrors the previous pixels. Contrary to the third mode, repeat, it doesn't just mirror the final row, but the rows after that for fills greater than 1. This means that, if you only fill one row, these modes are equivalent. Afterwards, the difference becomes obvious.

In fillmargins mode, it works a bit like a convolution, whereby for rows it does a [2, 3, 2] of the next row's pixels, meaning it takes 2 of the left pixel, 3 of the middle, and 2 of the right, then averages. For borders, it works slightly differently: the leftmost pixel is just a mirror of the next pixel, while the eight rightmost pixels are also mirrors of the next pixel. Nothing else happens here.

The fixborders mode is a modified fillmargins that works the same for rows and columns. It compares fills with emphasis on the left, middle, and right with the next row to decide which one to use.

ContinuityFixer

From cf. ContinuityFixer works by comparing the rows/columns specified to the amount of rows/columns specified by range around it and finding new values via least squares regression. Results are similar to bbmod, but it creates entirely fake data, so it's preferable to use rektlvls or bbmod with a high blur instead. Its settings look as follows:

fix = core.cf.ContinuityFixer(src=clip, left=[0, 0, 0], right=[0, 0, 0], top=[0, 0, 0], bottom=[0, 0, 0], radius=1920)

This is assuming you're working with 1080p footage, as radius is set to the longest possible value as defined by the source's resolution. I'd recommend a lower value, although not going much lower than 3, as at that point, you may as well be copying pixels (see FillBorders below for that). What will probably throw off most newcomers is the array I've entered as the values for the rows/columns to be fixed. These denote the values to be applied to the three planes. Usually, dirty lines will only occur on the luma plane, so you can often leave the other two at a value of 0. Do note that an array is not necessary, so you can also just enter the number of rows/columns you'd like the fix to be applied to, and all planes will be processed.

As ContinuityFixer is less likely to keep the original data intact, it's recommended to prioritize bbmod over it.

Let's look at the bbmod example again and apply ContinuityFixer:

fix = src.cf.ContinuityFixer(top=[6, 6, 6], radius=10)

Let's compare this with the bbmod fix (remember to mouse-over to compare):

The result is ever so slightly in favor of ContinuityFixer here. This will rarely be the case, as `ContinuityFixer` tends to be more destructive than `bbmod` already is.

Just like bbmod, ContinuityFixer shouldn't be used on more than two rows/columns. Again, if you're resizing, you can change this maximum accordingly: \[ max_\mathrm{resize} = max \times \frac{resolution_\mathrm{source}}{resolution_\mathrm{resized}} \]

In-depth function explanation ContinuityFixer works by calculating the least squares regression of the pixels within the radius. As such, it creates entirely fake data based on the image's likely edges. No special explanation here.

ReferenceFixer

From edgefixer. This requires the original version of edgefixer (cf is just an old port of it, but it's nicer to use and processing hasn't changed). I've never found a good use for it, but in theory, it's quite neat. It compares with a reference clip to adjust its edge fix, as in ContinuityFixer:

fix = core.edgefixer.Reference(src, ref, left=0, right=0, top=0, bottom=0, radius = 1920)

Notes

Too many rows/columns

One thing that shouldn't be ignored is that applying these fixes (other than rektlvls) to too many rows/columns may lead to these looking blurry on the end result. Because of this, it's recommended to use rektlvls whenever possible or carefully apply light fixes to only the necessary rows. If this fails, it's better to try bbmod before using ContinuityFixer.

Resizing

It's important to note that you should always fix dirty lines before resizing, as not doing so will introduce even more dirty lines. However, if you have a single black line at an edge that you would otherwise use FillBorders on, you should remove it using your resizer instead.

For example, to resize a clip with a single filled line at the top to \(1280\times536\) from \(1920\times1080\):

top_crop = 138
bot_crop = 138
top_fill = 1
bot_fill = 0
src_height = src.height - (top_crop + bot_crop) - (top_fill + bot_fill)
crop = core.std.Crop(src, top=top_crop, bottom=bot_crop)
fix = core.fb.FillBorders(crop, top=top_fill, bottom=bot_fill, mode="fillmargins")
resize = core.resize.Spline36(fix, 1280, 536, src_top=top_fill, src_height=src_height)

An easier way of doing the above is using the relevant parameters in awsmfunc's zresize.

So the last line in the above example would become:

resize = awf.zresize(fix, preset=720, top=1)

Similarly, if you filled 1 line each on the left and right side, you would use:

resize = awf.zresize(fix, preset=720, left=1, right=1)

A significant benefit of using zresize is that it automatically calculates the most appropriate target resolution to minimize the aspect ratio (AR) error.

Diagonal borders

If you're dealing with diagonal borders, the proper approach here is to mask the border area and merge the source with a FillBorders call. An example of this (from Your Name (2016)):

Fix compared with unmasked in fillmargins mode and contrast adjusted for clarity:

Code used (note that this was detinted after):

mask = core.std.ShufflePlanes(src, 0, vs.GRAY).std.Binarize(43500)
cf = core.fb.FillBorders(src, top=6, mode="mirror").std.MaskedMerge(src, mask)

Finding dirty lines

Dirty lines can be quite difficult to spot. If you don't immediately spot any upon examining borders on random frames, chances are you'll be fine. If you know there are frames with small black borders on each side, you can use something like the following script:

def black_detect(clip, thresh=None):
    if thresh:
        clip = core.std.ShufflePlanes(clip, 0, vs.GRAY).std.Binarize(
            thresh).std.Invert().std.Maximum().std.Inflate().std.Maximum().std.Inflate()
    l = core.std.Crop(clip, right=clip.width // 2)
    r = core.std.Crop(clip, left=clip.width // 2)
    clip = core.std.StackHorizontal([r, l])
    t = core.std.Crop(clip, top=clip.height // 2)
    b = core.std.Crop(clip, bottom=clip.height // 2)
    return core.std.StackVertical([t, b])

This script will make values under the threshold value (i.e. the black borders) show up as vertical or horizontal white lines in the middle on a mostly black background. If no threshold is given, it will simply center the edges of the clip. You can just skim through your video with this active. An automated alternative would be dirtdtct, which scans the video for you.
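
For example, assuming an 8-bit source, a threshold of around 25 should catch near-black borders:

preview = black_detect(src, thresh=25)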

Other kinds of variable dirty lines are a bitch to fix and require checking scenes manually.

Variable borders

An issue very similar to dirty lines is unwanted borders. During scenes with different crops (e.g. IMAX or 4:3), the black borders may sometimes not be entirely black, or be completely messed up. In order to fix this, simply crop them and add them back. You may also want to fix dirty lines that may have occurred along the way:

crop = core.std.Crop(src, left=100, right=100)
clean = core.cf.ContinuityFixer(crop, left=2, right=2, top=0, bottom=0, radius=25)
out = core.std.AddBorders(clean, left=100, right=100)

If you're resizing, you should crop these off before resizing, then add the borders back, as leaving the black bars in during the resize will create dirty lines:

crop = src.std.Crop(left=100, right=100)
clean = crop.cf.ContinuityFixer(left=2, right=2, top=2, radius=25)
resize = awf.zresize(clean, preset=720)
border_size = (1280 - resize.width) // 2
bsize_mod2 = border_size % 2
out = resize.std.AddBorders(left=border_size - bsize_mod2, right=border_size + bsize_mod2)

In the above example, we have to add more to one side than the other to reach our desired width. Ideally, your border_size will be mod2 and you won't have to do this.

If you know you have borders like these, you can use brdrdtct from awsmfunc similarly to dirtdtct to scan the file for them.

The 0.88 gamma bug

If you have two sources of which one is noticeably brighter than the other, chances are your brighter source is suffering from what's known as the gamma bug. If this is the case, do the following (for 16-bit) and see if it fixes the issue:

out = core.std.Levels(src, gamma=0.88, min_in=4096, max_in=60160, min_out=4096, max_out=60160, planes=0)

Do not perform this operation in low bit depth. Lower bit depths can and will lead to banding:

For the lazy, the fixlvls wrapper in awsmfunc defaults to a gamma bug fix in 32-bit.

In-depth explanation This error seems to stem from Apple software. This blog post is one of the few mentions of this bug one can find online.

The reason for this is likely that the software unnecessarily tries to convert between NTSC gamma (2.2) and PC gamma (2.5), as \(\frac{2.2}{2.5}=0.88\).

To undo this, every value just has to be raised to the power of 0.88, although TV range normalization has to be done:

\[ v_\mathrm{new} = \left( \frac{v - min_\mathrm{in}}{max_\mathrm{in} - min_\mathrm{in}} \right) ^ {0.88} \times (max_\mathrm{out} - min_\mathrm{out}) + min_\mathrm{out} \]

For those curious how the gamma-bugged source and the original source will differ: all values other than 16, 232, 233, 234, and 235 are different, with the largest and most common difference being 10, lasting from 63 until 125. As the output covers roughly the same range of values and the operation is usually performed in high bit depth, significant detail loss is unlikely. However, do note that, no matter the bit depth, this is a lossy process.
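
If you want to verify these numbers yourself, a standalone sanity check of the power-0.88 mapping (plain Python, 8-bit TV range used for readability) reproduces the difference of 10 between 63 and 125:

def power_088(v, lo=16, hi=235):
    return ((v - lo) / (hi - lo)) ** 0.88 * (hi - lo) + lo

for v in (62, 63, 100, 125, 126):
    print(v, round(power_088(v)))  # 62 -> 71, 63 -> 73, 100 -> 110, 125 -> 135, 126 -> 135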

Double range compression

A similar issue is double range compression. When this occurs, luma values will range between 30 and 218. This can easily be fixed with the following:

out = src.resize.Point(range_in=0, range=1, dither_type="error_diffusion")
out = out.std.SetFrameProp(prop="_ColorRange", intval=1)

In-depth explanation This issue means something or someone during the encoding pipeline assumed the input to be full range despite it already being in limited range. As the end result usually has to be limited range, this perceived issue is "fixed".

One can also do the exact same thing with std.Levels. The following math is applied for changing range:

\[ v_\mathrm{new} = \left( \frac{v - min_\mathrm{in}}{max_\mathrm{in} - min_\mathrm{in}} \right) \times (max_\mathrm{out} - min_\mathrm{out}) + min_\mathrm{out} \]

For range compression, the following values are used: \[ min_\mathrm{in} = 0 \qquad max_\mathrm{in} = 255 \qquad min_\mathrm{out} = 16 \qquad max_\mathrm{out} = 235 \]

As the zimg (z.lib) resizers used by core.resize perform this internally in 32-bit precision, it's easiest to just use those. However, these will change the file's _ColorRange property, hence the need for std.SetFrameProp.
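
Should you want the std.Levels route anyway, a sketch for the luma plane might look like this (16-bit values assumed; chroma would need its own values, which is another reason the resize-based method above is simpler):

from vsutil import depth

high = depth(src, 16)
expand = high.std.Levels(min_in=4096, max_in=60160, min_out=0, max_out=65535, planes=0)
out = expand.std.SetFrameProp(prop="_ColorRange", intval=1)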

Other incorrect levels

A closely related issue is otherwise incorrect levels. To fix this, one ideally uses a reference source with correct levels, finds the equivalent values to 16 and 235, then adjusts from there (in 8-bit for clarity, do this in higher bit depths):

out = src.std.Levels(min_in=x, min_out=16, max_in=y, max_out=235)

However, this usually won't be possible. Instead, one can do the following math to figure out the correct adjustment values:

\[ v = \frac{v_\mathrm{new} - min_\mathrm{out}}{max_\mathrm{out} - min_\mathrm{out}} \times (max_\mathrm{in} - min_\mathrm{in}) + min_\mathrm{in} \]

Here, one can just choose any low value from the to-be-adjusted source, set that as \(min_\mathrm{in}\), and choose the value of that same pixel in the reference source as \(min_\mathrm{out}\). One then does the same for the high values. Then, one calculates this using 16 and 235 (again, preferably in high bit depths - 4096 and 60160 for 16-bit, 0 and 1 in 32-bit float etc.) for \(v_\mathrm{new}\), and the output values will be our \(x\) and \(y\) in the VapourSynth code above.

To illustrate this, let's use the German and American Blu-rays of Burning (2018). The USA Blu-ray has correct levels, while GER has incorrect ones:

A high value in GER here would be 199, while the same pixel is 207 in USA. For lows, one can find 29 and 27. With these, we get 18.6 and 225.4. Doing this for a couple more pixels and different frames, then averaging the values, we get 19 and 224. Adjusting the luma with these values gets us closer to the reference video's levels1:

In-depth explanation Those who have read the previous explanations should recognize this function, as it is the inverse of the function used for level adjustment. We simply reverse it, set our desired values as \(v_\mathrm{new}\) and calculate.
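
To check the numbers from the Burning example, here is a small standalone helper implementing this inverse mapping (plain Python, using the 8-bit values measured above):

def scale_to_source(v_new, min_in, max_in, min_out, max_out):
    # inverse of the level adjustment: maps a reference value back into the bad source's scale
    return (v_new - min_out) / (max_out - min_out) * (max_in - min_in) + min_in

x = scale_to_source(16, min_in=29, max_in=199, min_out=27, max_out=207)   # ~18.6
y = scale_to_source(235, min_in=29, max_in=199, min_out=27, max_out=207)  # ~225.4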

Improper color matrix

If you have a source with an improper color matrix, you can fix this with the following:

out = core.resize.Point(src, matrix_in_s='470bg', matrix_s='709')

The '470bg' color matrix is equivalent to BT.601's. To know if you should be doing this, you'll need some reference sources, preferably not web sources. Technically, you can identify bad colors and realize that it's necessary to change the matrix, but one should be extremely certain in such cases.

In-depth explanation Color matrices define how conversion between YCbCr and RGB takes place. As RGB naturally doesn't have any subsampling, the clip is first converted from 4:2:0 to 4:4:4, then from YCbCr to RGB, then the process is reverted. During the YCbCr to RGB conversion, we assume Rec.601 matrix coefficients, while during the conversion back, we specify Rec.709.

The reason why it's difficult to know whether the incorrect standard was assumed is because the two cover a similar range of CIE 1931. The chromaticity diagrams should make this obvious (Rec.2020 included as a reference):

Rounding error

A slight green tint may be indicative of a rounding error having occurred. This issue is especially common among streaming services like Amazon, CrunchyRoll, HBO Max etc. To fix this, we need to add a half step in a higher bit depth than the source's:

high_depth = vsutil.depth(src, 16)
half_step = high_depth.std.Expr("x 128 +")
out = vsutil.depth(half_step, 8)

If the above fixes the tint, but introduces a very slight mismatch in contrast, try applying the filter to only the chroma plane:

half_step = high_depth.std.Expr(["", "x 128 +"])

Sometimes, sources will require slightly different values, although these seem to always be multiples of 16, with the most common after 128 being 64. It is also not uncommon for differing values to have to be applied to each plane. This is easily done with std.Expr by using a list of adjustments, where an empty one means no processing is done:

adjust = high_depth.std.Expr(["", "x 64 +", "x 128 +"])

In-depth explanation When the studio went from their 10-bit master to 8-bit, their software may have always rounded down (e.g. 1.9 would be rounded to 1). Our way of solving this simply adds an 8-bit half step, as \(0.5 \times 2 ^ {16 - 8} = 128\).

gettint

The gettint script can be used to automate detection of whether a tint was caused by a common error. To use it, first crop out any black bars, texts, and any other elements that might be untinted. Then, simply run it and wait for the results:

$ gettint source.png reference.png

The script can also take video files, where it will choose the middle frame and use AutoCrop to remove black bars.

If none of the common errors are detected as the cause of the tint, the script will attempt to match via gamma, gain and offset, or level adjustments. The fixes for all of these have been explained here, with the exception of gain and offset, which can be fixed with std.Expr:

gain = 1.1
offset = -1
out = src.std.Expr(f"x {gain} / {offset} -")

Do note that, if using the gamma fix function from the gamma bug fix (or fixlvls), the gamma from gettint will need to be inverted.

Detinting

Please note that you should only resort to this method if all others fail.

If you've got a better source with a tint and a worse source without a tint, and you'd like to remove it, you can do so via timecube and DrDre's Color Matching Tool2. First, add two reference screenshots to the tool, export the LUT, save it, and add it via something like:

clip = core.resize.Point(src, matrix_in_s="709", format=vs.RGBS)
detint = core.timecube.Cube(clip, "LUT.cube")
out = core.resize.Point(detint, matrix=1, format=vs.YUV420P16, dither_type="error_diffusion")

1

For simplicity's sake, chroma planes weren't touched here. These require far more work than luma planes, as it's harder to find very vibrant colors, especially with screenshots like this.

2

This program is sadly closed source. Alternatives would be very welcome.


Masking is a less straightforward topic. The idea is to limit the application of filters according to the source image's properties. A mask will typically be grayscale, with its brightness determining how much of each of the two clips in question is applied. So, if you do

mask = mask_function(src)
filtered = filter_function(src)
merge = core.std.MaskedMerge(src, filtered, mask)

The filtered clip will be used for every completely white pixel in mask, and the src clip for every black pixel, with in-between values determining the ratio of which clip is applied. Typically, a mask will be constructed using one of the following three functions:

  • std.Binarize: This simply separates pixels by whether they are above or below a threshold and sets them to black or white accordingly.

  • std.Expr: Known to be a very complicated function. Applies logic via reverse Polish notation. If you don't know what this is, read up on Wikipedia. Some cool things you can do with this are make some pixels brighter while keeping others the same (instead of making them dark as you would with std.Binarize): std.Expr("x 2000 > x 10 * x ?"). This would multiply every value above 2000 by ten and leave the others be. One nice use case is for in-between values: std.Expr("x 10000 > x 15000 < and {} 0 ?".format(2 ** src.format.bits_per_sample - 1)).
    This makes every value between 10 000 and 15 000 the maximum value allowed by the bit depth and makes the rest zero, just like how a std.Binarize mask would. Many other operations can be performed via this.

  • std.Convolution: In essence, apply a matrix to your pixels. The documentation explains it well, so just read that if you don't get it. Lots of masks are defined via convolution kernels. You can use this to do a whole lot of stuff. For example, if you want to average all the values surrounding a pixel, do std.Convolution([1, 1, 1, 1, 0, 1, 1, 1, 1]). To illustrate, let's say you have a pixel with the value \(\mathbf{1}\) with the following \(3\times3\) neighborhood:

    \[\begin{bmatrix} 0 & 2 & 4 \\ 6 & \mathbf{1} & 8 \\ 6 & 4 & 2 \end{bmatrix}\]

    Now, let's apply a convolution kernel:

    \[\begin{bmatrix} 2 & 1 & 3 \\ 1 & 0 & 1 \\ 4 & 1 & 5 \end{bmatrix}\]

    This will result in the pixel \(\mathbf{1}\) becoming: \[\frac{1}{18} \times (2 \times 0 + 1 \times 2 + 3 \times 4 + 1 \times 6 + 0 \times \mathbf{1} + 1 \times 8 + 4 \times 6 + 1 \times 4 + 5 \times 2) = \frac{66}{18} \approx 4\]
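
    As a quick sanity check of that arithmetic in plain Python:

    neighborhood = [0, 2, 4, 6, 1, 8, 6, 4, 2]
    kernel = [2, 1, 3, 1, 0, 1, 4, 1, 5]
    weighted = sum(p * k for p, k in zip(neighborhood, kernel))  # 66
    print(weighted / sum(kernel))  # 66 / 18 = 3.67, i.e. roughly 4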

So, let's say you want to perform what is commonly referred to as a simple "luma mask":

y = core.std.ShufflePlanes(src, 0, vs.GRAY)
mask = core.std.Binarize(y, 5000)
merge = core.std.MaskedMerge(filtered, src, mask)

In this case, I'm assuming we're working in 16-bit. What std.Binarize is doing here is making every value under 5000 the lowest and every value above 5000 the maximum value allowed by our bit depth. This means that every pixel above 5000 will be copied from the source clip.

Let's try this using a filtered clip which has every pixel's value multiplied by 8:

Binarize mask applied to luma with filtered clip being std.Expr("x 8 *").

Simple binarize masks on luma are very straightforward and often do a good job of limiting a filter to the desired area, especially as dark areas are more prone to banding and blocking.

A more sophisticated version of this is adaptive_grain from earlier in this guide. It scales values from black to white based on both the pixel's own luma value and the image's average luma value. A more in-depth explanation can be found on the creator's blog. We manipulate this mask using a luma_scaling parameter. Let's use a very high value of 500 here:

kgf.adaptive_grain(y, show_mask=True, luma_scaling=500) mask applied to luma with filtered clip being std.Expr("x 8 *").

Alternatively, we can use an std.Expr to merge the clips via the following logic:

if abs(src - filtered) <= 1000:
    return filtered
elif abs(src - filtered) >= 30000:
    return src
else:
    return src + (src - filtered) * (30000 - abs(src - filtered)) / 29000

This is almost the exact algorithm used in mvsfunc.LimitFilter, which GradFun3 uses to apply its bilateral filter. In VapourSynth, this would be:

expr = core.std.Expr([src, filtered], "x y - abs 1000 > x y - abs 30000 > x x y - 30000 x y - abs - * 29000 / + x ? y ?")

LimitFilter style expression to apply filter std.Expr("x 8 *") to source.

Now, let's move on to the third option: convolutions, or more interestingly for us, edge masks. Let's say you have a filter that smudges details in your clip, but you still want to apply it to detail-free areas. We can use the following convolutions to locate horizontal and vertical edges in the image:

\[\begin{aligned} &\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} &\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}\end{aligned}\]

Combining these two is what is commonly referred to as a Sobel-type edge mask. It produces the following for our image of the lion:

Now, this result is obviously rather boring. One can see a rough outline of the background and the top of the lion, but not much more can be made out.
To change this, let's introduce some new functions:

  • std.Maximum/Minimum: Use these to grow or shrink your mask. You may additionally want to apply coordinates=[0, 1, 2, 3, 4, 5, 6, 7] with whatever zeroes and ones work for you in order to specify which of the surrounding pixels are taken into account.

  • std.Inflate/Deflate: Similar to the previous functions, but instead of applying the maximum of pixels, it merges them, which gets you a slight blur of edges. Useful at the end of most masks so you get a slight transition between masked areas.

We can combine these with the std.Binarize function from before to get a nifty output:

mask = y.std.Sobel()
binarize = mask.std.Binarize(3000)
maximum = binarize.std.Maximum().std.Maximum()
inflate = maximum.std.Inflate().std.Inflate().std.Inflate()

Sobel mask from before manipulated with std.Binarize, std.Maximum, and std.Inflate.

A common example of a filter that might smudge the output is an anti-aliasing or a debanding filter. In the case of an anti-aliasing filter, we apply the filter via the mask to the source, while in the case of the debander, we apply the source via the mask to the filtered source:

mask = y.std.Sobel()

aa = taa.TAAmbk(src, aatype=3, mtype=0)  # assuming "import vsTAAmbk as taa"
merge = core.std.MaskedMerge(src, aa, mask)

deband = src.f3kdb.Deband()
merge = core.std.MaskedMerge(deband, src, mask)

We can also use a different edge mask, namely kgf.retinex_edgemask, which raises contrast in dark areas and creates a second edge mask using the output of that, then merges it with the edge mask produced using the untouched image:

kgf.retinex_edgemask applied to luma.

This already looks great. Let's manipulate it similarly to before and see how it affects a destructive deband in the twig area at the bottom:

deband = src.f3kdb.Deband(y=150, cb=150, cr=150, grainy=0, grainc=0)
mask = kgf.retinex_edgemask(src).std.Binarize(8000).std.Maximum()
merge = core.std.MaskedMerge(deband, src, mask)

A very strong deband protected using kgf.retinex_edgemask.

While some details remain smudged, we've successfully recovered a very noticeable portion of the twigs. Another example of a deband suffering from detail loss without an edge mask can be found under figure 35 in the appendix.

Other noteworthy edge masks easily available in VapourSynth include:

  • std.Prewitt is similar to Sobel. It's the same operator with the 2 switched out for a 1.

  • tcanny.TCanny is basically a Sobel mask thrown over a blurred clip.

  • kgf.kirsch will generate almost identical results to retinex_edgemask in bright scenes, as it's one of its components. Slower than the others, as it uses more directions, but will get you great results.

Some edge mask comparisons can be found in the appendix under figures 26, 30, and 34.

As a debanding alternative to edge masks, we can also use "range" masks, which employ std.Minimum and std.Maximum to locate details. The most well known example of this is the mask inside GradFun3. This works as follows:

Two clips are created; one will employ std.Maximum, while the other will use std.Minimum. These use special coordinates depending on the mrad value given. If \(\mathtt{mrad} \mod 3 = 1\), [0, 1, 0, 1, 1, 0, 1, 0] will be used as coordinates. Otherwise, [1, 1, 1, 1, 1, 1, 1, 1] is used. Then, this process is repeated with \(\mathtt{mrad} = \mathtt{mrad} - 1\) until \(\mathtt{mrad} = 0\). This all probably sounds a bit overwhelming, but it's really just finding the maximum and minimum values for each pixel neighborhood.
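
To make the concept concrete, here is a minimal sketch of such a range mask (this is not GradFun3's exact implementation, which uses the mrad-dependent coordinates described above):

y = core.std.ShufflePlanes(src, 0, vs.GRAY)
expanded = y.std.Maximum().std.Maximum()
shrunk = y.std.Minimum().std.Minimum()
rangemask = core.std.Expr([expanded, shrunk], "x y -")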

Once these are calculated, the minimized mask is subtracted from the maximized mask, and the mask is complete. So, let's look at the output compared to the modified retinex_edgemask from earlier:

Comparison of retinex_edgemask.std.Binarize(8000).std.Maximum() and default GradFun3.

Here, we get some more pixels picked up by the GradFun3 mask in the skies and some brighter flat textures. However, the retinex-type edge mask prevails in darker, more detailed areas. Computationally, the range-based detail mask is a lot quicker, however, and it does pick up a lot of what we want, so it's not a bad choice.

Fortunately for us, this isn't the end of these kinds of masks. There are two notable masks based on this concept: debandmask and lvsfunc.denoise.detail_mask. The former takes our GradFun3 mask and binarizes it according to the input luma's brightness. Four parameters play a role in this process: lo, hi, lothr, and hithr. Values below lo are binarized according to lothr, values above hi are binarized according to hithr, and values in between are binarized according to a linear scaling between the two thresholds:

\[\frac{\mathtt{mask} - \mathtt{lo}}{\mathtt{hi} - \mathtt{lo}} \times (\mathtt{hithr} - \mathtt{lothr}) + \mathtt{lothr}\]

This makes it more useful in our specific scenario, as the mask becomes stronger in darks compared to GradFun3. When playing around with the parameters, we can e.g. lower lo so our very dark areas aren't affected too badly, lower lothr to make it stronger in these darks, raise hi to enlarge our lo to hi gap, and raise hithr to weaken it in brights. Simple values might be lo=22 << 8, lothr=250, hi=48 << 8, hithr=500:

Comparison of retinex_edgemask.std.Binarize(8000).std.Maximum(), default GradFun3, and default debandmask(lo=22 << 8, lothr=250, hi=48 << 8, hithr=500).

While not perfect, as this is a tough scene, and parameters might not be optimal, the difference in darks is obvious, and less of the background's banding is picked up.

Our other option for an altered GradFun3 is lvf.denoise.detail_mask. This mask combines the previous idea of the GradFun3 mask with a Prewitt-type edge mask.

First, two denoised clips are created using KNLMeansCL, one with half the other's denoise strength. The stronger one has a GradFun3-type mask applied, which is then binarized, while the latter has a Prewitt edge mask applied, which again is binarized. The two are then combined so the former mask gets any edges it may have missed from the latter mask.

The output is then put through two calls of RemoveGrain, the first one setting each pixel to the nearest value of its four surrounding pixel pairs' (e.g. top and bottom surrounding pixels make up one pair) highest and lowest average value. The second call effectively performs the following convolution: \[\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}\]

By default, the denoiser is turned off, but this is one of its advantages for us in this case, as we'd like the sky to have fewer pixels picked up while we'd prefer more of the rest of the image to be picked up. To compare, I've used a binarize threshold similar to the one used in the debandmask example. Keep in mind this is a newer mask, so my inexperience with it might show to those who have played around with it more:

Comparison of retinex_edgemask.std.Binarize(8000).std.Maximum(), default GradFun3, default debandmask(lo=22 << 8, lothr=250, hi=48 << 8, hithr=500), and detail_mask(pre_denoise=.3, brz_a=300, brz_b=300).

Although an improvement in some areas, in this case, we aren't quite getting the step up we would like. Again, better optimized parameters might have helped.
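
For reference, the call used for the comparison above would look something like the following; the parameter values are taken from the caption, the import alias is an assumption, and the strong deband from earlier is reused:

import lvsfunc as lvf

mask = lvf.denoise.detail_mask(src, pre_denoise=.3, brz_a=300, brz_b=300)
merge = core.std.MaskedMerge(deband, src, mask)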

In case someone wants to play around with the image used here, it's available in this guide's repository: https://git.concertos.live/Encode_Guide/mdbook-guide/src/branch/master/src/filtering/Pictures/lion.png.

Additionally, the following functions can be of help when masking, limiting et cetera:

  • std.MakeDiff and std.MergeDiff: These should be self-explanatory. Use cases can be applying something to a degrained clip and then merging the clip back, as was elaborated in the Denoising section.

  • std.Transpose: Transpose your clip, i.e. swap its rows and columns (flipping it along the diagonal).

  • std.Turn180: Turns by 180 degrees.

  • std.BlankClip: Just a frame of a solid color. You can use this to replace bad backgrounds or for cases where you've added grain to an entire movie but you don't want the end credits to be full of grain. To maintain TV range, you can use std.BlankClip(src, color=[16, 128, 128]) for 8-bit black. Also useful for making area based masks.

  • std.Invert: Self-explanatory. You can also just swap which clip gets merged via the mask instead of doing this.

  • std.Limiter: You can use this to limit pixels to certain values. Useful for maintaining TV range (std.Limiter(min=16, max=235)).

  • std.Median: This replaces each pixel with the median value in its neighborhood. Mostly useless.

  • std.StackHorizontal/std.StackVertical: Stack clips on top of/next to each other.

  • std.Merge: This lets you merge two clips with given weights. A weight of 0 will return the first clip, while 1 will return the second. The first thing you give it is a list of clips, and the second item is a list of weights for each plane. Here's how to merge chroma from the second clip into luma from the first: std.Merge([first, second], [0, 1]). If no third value is given, the second one is copied for the third plane.

  • std.ShufflePlanes: Extract or merge planes from a clip. For example, you can get the luma plane with std.ShufflePlanes(src, 0, vs.GRAY).

If you want to apply something to only a certain area, you can use the wrapper rekt or rekt_fast. The latter only applies your function to the given area, which speeds it up and is quite useful for anti-aliasing and similar slow filters. Some wrappers around this exist already, like rektaa for anti-aliasing. Functions in rekt_fast are applied via a lambda function, so instead of src.f3kdb.Deband(), you input rekt_fast(src, lambda x: x.f3kdb.Deband()).
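
As a hypothetical example (the left/top/right/bottom offsets are assumptions; check rekt's documentation for the exact signature):

from rekt import rekt_fast

# deband only a window in the middle of the frame, with offsets given from each edge
out = rekt_fast(src, lambda x: x.f3kdb.Deband(), left=500, top=200, right=500, bottom=200)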

One more very special function is std.FrameEval. What this allows you to do is evaluate every frame of a clip and apply a frame-specific function. This is quite confusing, but there are some nice examples in VapourSynth's documentation: http://www.vapoursynth.com/doc/functions/frameeval.html. Now, unless you're interested in writing a function that requires this, you likely won't ever use it. However, many functions use it, including
kgf.adaptive_grain, awf.FrameInfo, fvf.AutoDeblock, TAAmbk, and many more. One example I can think of to showcase this is applying a different debander depending on frame type:

import functools
def FrameTypeDeband(n, f, clip):
    if f.props['_PictType'].decode() == "B":
        return core.f3kdb.Deband(clip, y=64, cr=0, cb=0, grainy=64, grainc=0, keep_tv_range=True, dynamic_grain=False)
    elif f.props['_PictType'].decode() == "P":
        return core.f3kdb.Deband(clip, y=48, cr=0, cb=0, grainy=64, grainc=0, keep_tv_range=True, dynamic_grain=False)
    else:
        return core.f3kdb.Deband(clip, y=32, cr=0, cb=0, grainy=64, grainc=0, keep_tv_range=True, dynamic_grain=False)
        
out = core.std.FrameEval(src, functools.partial(FrameTypeDeband, clip=src), src)

If you'd like to learn more, I'd suggest reading through the Irrational Encoding Wizardry GitHub group's guide: https://guide.encode.moe/encoding/masking-limiting-etc.html and reading through most of your favorite Python functions for VapourSynth. Pretty much all of the good ones should use some mask or have developed their own mask for their specific use case.

Edge detection is also very thoroughly explained in a lot of digital image processing textbooks, e.g. Digital Image Processing by Gonzalez and Woods.

HQDeringmod

Sharpening

DeHalo_alpha

old function explanation `DeHalo_alpha` works by downscaling the source according to `rx` and `ry` with a Mitchell bicubic (\(b=\frac{1}{3},\ c=\frac{1}{3}\)) kernel, scaling back to source resolution with blurred bicubic, and checking the difference between a minimum and maximum (check the masking section if you don't know what this means) for both the source and resized clip. The result is then evaluated to a mask according to the following expression, where \(y\) is the maximum and minimum call that works on the source, \(x\) is the resized source with maximum and minimum, and everything is scaled to 8-bit: \[\texttt{mask} = \frac{y - x}{y + 0.0001} \times \left[255 - \texttt{lowsens} \times \left(\frac{y + 256}{512} + \frac{\texttt{highsens}}{100}\right)\right]\] This mask is used to merge the source back into the resized source. Now, the smaller value of each pixel is taken for a lanczos resize to \((\texttt{height} \times \texttt{ss})\times(\texttt{width} \times \texttt{ss})\) of the source and a maximum of the merged clip resized to the same resolution with a Mitchell kernel. The result of this is evaluated along with the minimum of the merged clip resized to the aforementioned resolution with a Mitchell kernel to find the minimum of each pixel in these two clips. This is then resized to the original resolution via a lanczos resize, and the result is merged into the source via the following:
if original < processed
    x - (x - y) * darkstr
else
    x - (x - y) * brightstr

Masking

Denoising

BM3D

KNLMeansCL

SMDegrain

Grain Dampening

STPresso

Dehardsubbing

hardsubmask

hardsubmask_fades

Delogoing

DeLogoHD

To start off, you'll want to select a smaller region of your video file to use as reference, since testing on the entire thing would take forever. The recommended way of doing this is by using awsmfunc's SelectRangeEvery:

import awsmfunc as awf
out = awf.SelectRangeEvery(clip, every=15000, length=250, offset=[1000, 5000])

Here, the first number is the interval between sections, the second one is the length of each section, and the offset array is the offset from the start and end.

You'll want to use a decently long clip (usually a couple thousand frames) that includes dark, bright, static, and action scenes; these should be distributed roughly as they are in the entire video.

When testing settings, you should always use 2-pass encoding, as many settings will substantially change the bitrate CRF gets you. For the final encode, both are fine, although CRF is faster.

To find out what setting is best, compare them all to each other and the source. You can do so by interleaving them either individually or by a folder via awsmfunc. You'll usually also want to label them, so you actually know which clip you're looking at:

# Load the files before this
src = awf.FrameInfo(src, "Source")
test1 = awf.FrameInfo(test1, "Test 1")
test2 = awf.FrameInfo(test2, "Test 2")
out = core.std.Interleave([src, test1, test2])

# You can also place them all in the same folder and do
src = awf.FrameInfo(src, "Source")
folder = "/path/to/settings_folder"
out = awf.InterleaveDir(src, folder, PrintInfo=True, first=extract, repeat=True)

If you are using vspreview or VSEdit, you can use output nodes to set each clip to a different node. Then you can switch between them using the number keys on your keyboard.

# Load the files before this
src = awf.FrameInfo(src, "Source")
test1 = awf.FrameInfo(test1, "Test 1")
test2 = awf.FrameInfo(test2, "Test 2")
src.set_output(0)
test1.set_output(1)
test2.set_output(2)

If you're using yuuno, you can use the following IPython magic to get the preview to switch between two sources by hovering over the preview screen:

%vspreview --diff
clip_A = core.ffms2.Source("settings/crf/17.0")
clip_A.set_output()
clip_B = core.ffms2.Source("settings/crf/17.5")
clip_B.set_output(1)

Usually, you'll want to test for the bitrate first. Just encode at a couple different CRFs and compare them to the source to find the highest CRF value that is indistinguishable from the source. Now, round the value, preferably down, and switch to 2-pass. For standard testing, test qcomp (intervals of 0.05), aq-modes with aq-strengths in large intervals (e.g. for one aq-mode do tests with aq-strengths ranging from 0.6 to 1.0 in intervals of 0.2), aq-strength (intervals of 0.05), merange (32, 48, and 64), psy-rd (intervals of 0.05), ipratio/pbratio (intervals of 0.05 with distance of 0.10 maintained), and then deblock (intervals of 1). If you think mbtree could help (i.e. you're encoding animation), redo this process with mbtree turned on. You probably won't want to change the order much, but it's certainly possible to do so.

For x265, the order should be qcomp, aq-mode, aq-strength, psy-rd, psy-rdoq, ipratio and pbratio, and then deblock.

If you want that little extra efficiency, you can redo the tests again with smaller intervals surrounding the areas around which value you ended up deciding on for each setting. It's recommended to do this after you've already done one test of each setting, as they do all have a slight effect on each other.

Once you're done testing the settings with 2-pass, switch back to CRF and repeat the process of finding the highest transparent CRF value.

General settings

These are the settings that you shouldn't touch between encodes.

Preset

Presets apply a number of parameters, which can be referenced here. Just use the placebo preset; we'll change the really slow stuff anyway:

--preset placebo

Level

Where --preset applies a defined set of parameters, --level provides a set of limitations to ensure decoder compatibility. For further reading, see this Wikipedia article.

For general hardware support, level 4.1 is recommended; otherwise, you may omit this.

--level 41

Motion estimation

For further reading, see this excellent thread on Doom9.

x264 has two motion estimation algorithms worth using, umh and tesa. The latter is a placebo option that's really, really slow, and seldom yields better results, so only use it if you don't care about encode time. Otherwise, just default to umh:

--me umh

Ratecontrol lookahead

The ratecontrol lookahead (rc-lookahead) setting determines how far ahead the video buffer verifier (VBV) and macroblock tree (mbtree) can look. Raising this can slightly increase memory use, but it's generally best to leave this as high as possible:

--rc-lookahead 250

If you're low on memory, you can lower it to e.g. 60.

Source-specific settings

These settings should be tested and/or adjusted for every encode.

Profile

Just set this according to the format you're encoding:

--profile high
  • high for 8-bit 4:2:0
  • high444 for 10-bit 4:4:4, 4:2:2, 4:2:0, lossless

Ratecontrol

Beyond all else, this is the single most important factor for determining the quality from any given input. Due to how poorly x264 handles lower bitrates (comparatively, particularly when encoding 8-bit), starving your encode will result in immediate artifacts observable even under the lightest scrutiny.

While manipulating other settings may make small but usually noticeable differences, an encode can't look great unless given enough bits.

For some further insight, reference this article.

Constant ratefactor

For more information please see this post by an x264 developer.

The constant ratefactor (CRF) is the suggested mode for encoding. Rather than specifying a target bitrate, CRF attempts to ensure a consistent quality across a number of frames; as such, the resulting bitrate will vary with the content and with the other settings used. In short, CRF is recommended for use with your finalized encode, not testing, where two pass is recommended.

Lower CRF values are higher quality, higher values lower. Some settings will have a very large effect on the bitrate at a constant CRF, so it's hard to recommend a CRF range, but most encodes will use a value between 15.0 and 20.0. It's best to test your CRF first to find an ideal bitrate, then test it again after all other settings have been tested with 2pass.

To specify CRF:

--crf 16.9

Two pass

An alternative to CRF which leverages an initial pass to collect information about the input before encoding. This comes with two distinct advantages:

  • The ability to target a specific bitrate
  • Effectively infinite lookahead

This is very suitable for testing settings, as you will always end up at almost the same bitrate no matter what.

As mentioned previously, this requires two encoding runs. The first pass can be sent to /dev/null, since all we care about is the stats file. To specify which of the two passes you're encoding, all we need to change is --pass:

vspipe -c y4m script.vpy - | x264 --demuxer y4m --preset placebo --pass 1 --bitrate 8500 -o /dev/null -
vspipe -c y4m script.vpy - | x264 --demuxer y4m --preset placebo --pass 2 --bitrate 8500 -o out.h264 -

Chroma quantizer offset

If you're struggling with chroma getting overly distorted, it can be worth tinkering with this option. You can find examples of what bitstarved chroma can look like HERE and HERE. Lower values give more bits to chroma, higher values will take away. By default, this will be -2 for 4:2:0 and -6 for 4:4:4. Setting this will add your offset onto -2.

To lower the chroma QP offset from -2 to -3, granting chroma more bits:

--chroma-qp-offset -1

Deblock

For an explanation of what deblock does, read this Doom9 post and this blog post.

Set this too high and you'll have a blurry encode, set it too low and you'll have an overly blocky encode. We recommend testing deblock in a range of -2:-2 to 0:0 (animated content should use stronger deblock settings). Usually, you can test this with the same alpha and beta parameters at first, then test offsets of \(\pm1\).

Many people will mindlessly use -3:-3, but this tends to lead to unnecessary blocking that could've been avoided had this setting been tested.

To specify e.g. an alpha of -2 and a beta of -1:

--deblock -2:-1

Quantizer curve compression

The quantizer curve compression (qcomp) is effectively the setting that determines how bits are distributed among the whole encode. It has a range of 0 to 1, where 0 is a constant bitrate and 1 is a constant quantizer, the opposite of a constant bitrate. This means qcomp affects how much bitrate you're giving to "important" scenes as opposed to "unimportant" scenes. In other words, it can be seen as the trade-off between bitrate allocation to simple or static scenes and complex or high motion scenes. Higher qcomp will allocate more to the latter, lower to the former.

It's usually recommended to set this between 0.60 and 0.70 without mbtree and 0.70 and 0.85 with mbtree. You want to find that sweet spot where complex scenes will look good enough without ruining simple scenes.

--qcomp 0.60

Macroblock tree

From this thread by an x264 developer: "It tracks the propagation of information from future blocks to past blocks across motion vectors. It could be described as localizing qcomp to act on individual blocks instead of whole scenes. Thus instead of lowering quality in high-complexity scenes (like x264 currently does), it'll only lower quality on the complex part of the scene, while for example a static background will remain high-quality. It also has many other more subtle effects, some potentially negative, most probably not."

Curious readers can reference the paper directly.

Macroblock tree ratecontrol (mbtree) can lead to large savings for very flat content, but tends to be destructive on anything with a decent amount of grain. If you're encoding something with very little movement and variation, especially cartoons and less grainy digital anime, it's recommended to test this setting to see if it's worth it.

When using mbtree, you should max out your lookahead (--rc-lookahead 250) and use a high qcomp (\(\geq 0.70\)).

Adaptive quantization

While qcomp determines bit allocation for frames across the video, adaptive quantization (AQ) is in charge of doing this on a block-basis, distributing bits not only within the current frame, but also adjacent frames. [citation needed] It does so by distributing bits e.g. from complex to flat blocks.

There are three modes available in vanilla x264:

  1. Allow AQ to redistribute bits across the whole video and within frames.
  2. Auto-variance; this attempts to adapt strength per-frame.
  3. Auto-variance with a bias to dark scenes.

Generally speaking, you'll likely get the best results with AQ mode 3. With the other two modes, you have to carefully make sure that darks aren't being damaged too much. If you e.g. have a source without any dark scenes (or only very few), it can be worth manually allocating more bits to darks via zoning and using AQ modes 1 or 2.

This comes along with a strength parameter. For modes 1 and 2, you usually want a strength between 0.80 and 1.30. Mode 3 is a bit more aggressive and usually looks best with a strength between 0.60 and 0.85.

Raising the AQ strength will help flatter areas, e.g. by maintaining smaller grain and dither to alleviate banding. However, higher AQ strengths will tend to distort edges more.

Older, grainier live action content will usually benefit more from lower AQ strengths and may benefit less from the dark scene bias present in AQ mode 3, while newer live action tends to benefit more from higher values. For animation, this setting can be very tricky; as both banding and distorted edges are more noticeable. It's usually recommended to run a slightly lower AQ strength, e.g. around 0.60 to 0.70 with mode 3.

To use e.g. AQ mode 3 with strength 0.80:

--aq-mode 3 --aq-strength 0.80

Motion estimation range

The motion estimation range (merange) determines how many pixels are used for motion estimation. Larger numbers will be slower, but can be more accurate for higher resolutions. However, go too high, and the encoder will start picking up unwanted info, which in turn will harm coding efficiency. You can usually get by with testing 32, 48, and 64, then using the best looking one, preferring lower numbers if equivalent:

--merange 32

Frametype quantizer ratio

These settings determine how bits are distributed among the different frame types. Generally speaking, you want to have an I-frame to P-frame ratio (ipratio) around 0.10 higher than your P-frame to B-frame ratio (pbratio). Usually, you'll want to lower these from the defaults of 1.40 for ipratio and 1.30 for pbratio, although not by more than 0.20.

Lower ratios will tend to help with grainier content, where less information from previous frames can be used, while higher ratios will usually lead to better results with flatter content.

You can use the stats created by the x264 log at the end of the encoding process to check whether the encoder is overallocating bits to a certain frametype and investigate whether this is a problem. A good guideline is for P-frames to be double the size of B-frames and I-frames in turn be double the size of P-frames. However, don't just blindly set your ratios so that this is the case. Always use your eyes.

To set an ipratio of 1.30 and a pbratio of 1.20:

--ipratio 1.30 --pbratio 1.20

If using mbtree, pbratio doesn't do anything, so only test and set ipratio.

Psychovisually optimized rate-distortion optimization

One big issue with immature encoders is that they don't offer psychovisual optimizations like psy-rdo. What it does is distort the frame slightly, sharpening it in the process. This will make it statistically less similar to the original frame, but will look better and more similar to the input. What this means is this is a weak sharpener of sorts, but a very much necessary sharpener!

The setting in x264 comes with two options, psy-rdo and psy-trellis, which are both set via the same option:

--psy-rd rdo:trellis

Unfortunately, the latter will usually do more harm than good, so it's best left off. The psy-rdo strength should be higher for sharper content and lower for blurrier content. For animated content, psy-rdo can introduce ringing even with default values. We suggest using lower values, between 0.60 and 0.90. For live action content where this is of much lesser concern you should find success with values around 0.95 to 1.10.

When testing this, pay attention to whether content looks sharp enough or too sharp, as well as whether anything gets distorted during the sharpening process.

For example, to set a psy-rd of 1.00 and psy-trellis of 0:

--psy-rd 1.00:0

DCT block decimation

Disabling DCT block decimation (no-dct-decimate) is very common practice, as decimation drops blocks deemed unimportant. For high quality encoding, this is often unwanted and disabling it is wise. However, for flatter content, leaving this on can aid with compression. Just quickly test on and off if you're encoding something flat.

To disable DCT block decimation:

--no-dct-decimate

Video buffer verifier

To understand what this is, there's actually a Wikipedia article you can read. Alternatively, you may find this video presentation from demuxed informative.

For us, the main relevance is that we want to disable this when testing settings, as video encoded with VBV enabled will be non-deterministic. Otherwise, just leave it at your level's defaults.

To disable VBV:

--vbv-bufsize 0 --vbv-maxrate 0

VBV settings for general hardware compliance (High@L4.1)

--vbv-bufsize 78125 --vbv-maxrate 62500

Reference frames

The reference frames (ref) setting determines how many frames P frames can use as reference. Many existing guides may provide an incorrect formula to find the 'correct' value. Do not use this. Rather, allow x264 to calculate this automatically (as dictated by --level).

Otherwise, if you don't care about compatibility with 15 year old TVs and 30 year old receivers, set this however high you can bear, with a maximum value of 16. Higher refs will improve encoder efficiency at the cost of increased compute time.

To set the maximum value of 16:

--ref 16

Zones

Sometimes, the encoder might have trouble distributing enough bits to certain frames, e.g. ones with wildly different visuals or sensitive to banding. To help with this, one can zone out these scenes and change the settings used to encode them.

When using this to adjust bitrate, one can specify a CRF for the zone or a bitrate multiplier. It's very important to not bloat these zones, e.g. by trying to maintain all the grain added while debanding. Sane values tend to be \(\pm 2\) from base CRF or bitrate multipliers between 0.75 and 1.5.

To specify a CRF of 15 for frames 100 through 200 and 16 for frames 300 through 400, as well as a bitrate multiplier of 1.5 for frames 500 through 600:

--zones 100,200,crf=15/300,400,crf=16/500,600,b=1.5

For a more complete picture of what --zones can and can not manipulate, see this section.

Output depth and color space

To encode 10-bit and/or 4:4:4 video, one must specify this via the following parameters:

--output-depth 10 --output-csp i444

The official documentation for x265 is very good, so this page will only cover recommended values and switches.

Source-independent settings

  • --preset veryslow or slower

  • --no-rect for slower computers. There's a slight chance it'll prove useful, but it probably isn't worth it.

  • --no-amp is similar to rect, although it seems to be slightly more useful.

  • --no-open-gop

  • --no-cutree since this seems to be a poor implementation of mbtree.

  • --rskip 0. rskip is a speed-up that gives up some quality, so leaving it enabled is only worth considering with slow CPUs.

  • --ctu 64

  • --min-cu-size 8

  • --rdoq-level 2

  • --max-merge 5

  • --rc-lookahead 60 although it's irrelevant as long as it's larger than min-keyint

  • --ref 6 for good CPUs, something like 4 for worse ones.

  • --bframes 16 or whatever your final bframes log output says.

  • --rd 3 or 4 (they're currently the same). If you can endure the slowdown, you can use 6, too, which allows you to test --rd-refine.

  • --subme 5. You can also change this to 7, but this is known to sharpen.

  • --merange 57 just don't go below 32 and you should be fine.

  • --high-tier

  • --range limited

  • --aud

  • --repeat-headers

Source-dependent settings

  • --output-depth 10 for 10-bit output.

  • --input-depth 10 for 10-bit input.

  • --colorprim 9 for HDR, 1 for SDR.

  • --colormatrix 9 for HDR, 1 for SDR.

  • --transfer 16 for HDR, 1 for SDR.

  • --hdr10 for HDR.

  • --hdr10-opt for 4:2:0 HDR, --no-hdr10-opt for 4:4:4 HDR and SDR.

  • --dhdr10-info /path/to/metadata.json for HDR10+ content with metadata extracted using hdr10plus_parser.

  • --dolby-vision-profile 8.1 specifies the Dolby Vision profile. x265 can only encode to profiles 5, 8.1, and 8.2.

  • --dolby-vision-rpu /path/to/rpu.bin for Dolby Vision metadata extracted using dovi_tool.

  • --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,20)" for BT.2020 or
    G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) for Display P3 mastering display color primaries with the values for L coming from your source's MediaInfo for mastering display luminance.

    For example, if your source MediaInfo reads:

    Mastering display color primaries : BT.2020
    Mastering display luminance : min: 0.0000 cd/m2, max: 1000 cd/m2
    Maximum Content Light Level : 711 cd/m2
    Maximum Frame-Average Light Level : 617 cd/m2
    

    This means you set "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,0)"

  • --max-cll "711,617" from your source's MediaInfo for maximum content light level and maximum frame-average light level. The values here are from the above example.

  • --cbqpoffs and --crqpoffs should usually be between -3 and 0 for 4:2:0. For 4:4:4, set this to something between 3 and 6. This sets an offset between the bitrate applied to the luma and the chroma planes.

  • --qcomp between 0.60 and 0.80.

  • --aq-mode 4, 3, 2, 1, or --hevc-aq with 4 and 3 usually being the two best options. If using aMod, there is an extra mode 5. These do the following:

    1. Standard adaptive quantization, simply add more bits to complex blocks.

    2. Adaptive quantization with auto-variance.

    3. Adaptive quantization with auto-variance and bias to dark scenes.

    4. Adaptive quantization with auto-variance and better edge preservation.

    5. Adaptive quantization with auto-variance, better edge preservation, and bias to dark scenes. Only in aMod.

    6. hevc-aq "scales the quantization step size according to the spatial activity of one coding unit relative to frame average spatial activity. This AQ method utilizes the minimum variance of sub-unit in each coding unit to represent the coding unit's spatial complexity." Like most of the x265 documentation, this sounds a lot fancier than it is. Don't enable with other modes turned on.

  • --aq-strength between 0.80 and 1.40 for AQ modes 1-3 or 0.50 and 1.00 for AQ mode 4.

  • --aq-bias-strength between 0.50 and 1.20 if using aMod and an AQ mode with dark bias. This is a multiplier with lower numbers lowering the bias. Default is 1.00.

  • --deblock -4:-4 to 0:0, similar to x264. Test at least -3:-3 to -1:-1 with live action, -2:-2 to 0:0 with animation.

  • --ipratio and --pbratio same as x264 again.

  • --psy-rd 0.80 to 2.00, similar-ish effect to x264. Values are generally higher than with x264, though.

  • --psy-rdoq anything from 0.00 to 2.00 usually.

  • --no-sao is usually best, but if your encode suffers from a lot of ringing, turn SAO back on. SAO does tend to blur quite heavily.

  • --no-strong-intra-smoothing on sharp/grainy content; you can leave strong intra smoothing enabled for blurry content, as it's an additional blur that'll help prevent banding.

Taking Screenshots

Taking simple screenshots in VapourSynth is very easy. If you're using a previewer, you can likely use that instead, but it might still be useful to know how to take screenshots via VapourSynth directly.

We recommend using awsmfunc.ScreenGen. This has two advantages:

  1. You save frame numbers and can easily reference these again, e.g. if you want to redo your screenshots.
  2. It takes care of proper conversion and compression for you, which might not be the case with some previewers (e.g. VSEdit).

To use ScreenGen, from within the directory containing your VapourSynth script, create a file called screens.txt with the frame numbers you'd like to screenshot, e.g.

26765
76960
82945
92742
127245

Then, at the bottom of your VapourSynth script, put

awf.ScreenGen(src, "Screenshots", "a")

a is what is put after the frame number. This is useful for staying organized and sorting screenshots, as well as preventing unnecessary overwriting of screenshots.

Now, run your script in the command line (or reload in a previewer):

python vapoursynth_script.vpy

Done! Your screenshots should now be in the Screenshots folder.

Comparing Source vs. Encode

Comparing the source against your encode allows potential downloaders to judge the quality of your encode easily. When taking these, it is important to include the frame types you are comparing, as e.g. comparing two I frames will lead to extremely favorable results. You can do this using awsmfunc.FrameInfo:

src = awf.FrameInfo(src, "Source")
encode = awf.FrameInfo(encode, "Encode")

If you'd like to compare these in your previewer, it's recommended to interleave them:

out = core.std.Interleave([src, encode])

However, if you're taking your screenshots with ScreenGen, it's easier not to do that and just run a ScreenGen call:

src = awf.FrameInfo(src, "Source")
encode = awf.FrameInfo(encode, "Encode")
awf.ScreenGen([src, encode], "Screenshots")

By default, this will generate src screenshots with the suffix "a", and encode screenshots with the suffix "b". This will allow you to sort your folder by name and have every source screenshot followed by an encode screenshot, making uploading easier.

To use custom suffixes, you can use the suffix argument:

awf.ScreenGen([src, encode], "Screenshots", suffix=["src","enc"])

HDR comparisons

For comparing an HDR source to an HDR encode, it's recommended to tonemap. This process is destructive, but you should still be able to tell what's warped, smoothed etc.

The recommended function for this is awsmfunc.DynamicTonemap:

src = awf.DynamicTonemap(src)
encode = awf.DynamicTonemap(encode, reference=src)

The reference=src in the second tonemap makes sure that the tonemapping is consistent across the two.

Optional: For better quality tonemapping, ensure that you have installed vs-placebo.

Choosing frames

When taking screenshots, it is important to not make your encode look deceptively transparent. To do so, you need to make sure you're screenshotting the proper frame types as well as content-wise differing kinds of frames.

Luckily, there's not a lot to remember here:

  • Your encode's screenshots should always be B type frames.
  • Your source's screenshots should never be I type frames.
  • Your comparisons should include dark scenes, bright scenes, close-up shots, long-range shots, static scenes, high action scenes, and whatever you have in-between.

Comparing Different Sources

When comparing different sources, you should proceed similarly to comparing source vs. encode. However, you'll likely encounter differing crops, resolutions or tints, all of which get in the way of comparing.

For differing crops, simply add borders back:

src_b = src_b.std.AddBorders(left=X, right=Y, top=Z, bottom=A)

If doing this leads to an offset of the image content, you should resize to 4:4:4 so you can add uneven borders. For example, if you want to add 1 pixel tall black bars to the top and bottom:

src_b = src_b.resize.Spline36(format=vs.YUV444P8, dither_type="error_diffusion")
src_b = src_b.std.AddBorders(top=1, bottom=1)

For differing resolutions, it's recommended to use a simple spline resize1:

src_b = src_b.resize.Spline36(src_a.width, src_a.height, dither_type="error_diffusion")

If one source is HDR and the other one is SDR, you can use awsmfunc.DynamicTonemap:

src_b = awf.DynamicTonemap(src_b)

For different tints, refer to the tinting chapter.

[1] It's important to make sure you're resizing to the appropriate resolution here; if you're comparing for a 1080p encode, you'd obviously compare at 1080p, whereas if you're comparing to figure out which source is better, you'll want to upscale the lower-resolution source to match the higher-resolution one.

If your debanded clip had very little grain compared to parts with no banding, you should consider using a separate function to add matched grain so the scenes blend together more easily. If there was a lot of grain, you might want to consider adptvgrnMod, adaptive_grain, or GrainFactory3; for less obvious grain, or simply for brighter scenes where there'd usually be very little grain, you can also use grain.Add. The topic of grainers will be elaborated further in the graining section.
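For instance, assuming kagefunc is imported as kgf and your debanded clip is called deband, a light application might look like this; the parameter values are purely illustrative, not tuned recommendations:

# adaptive grain, weighted towards darker areas (illustrative values)
grained = kgf.adaptive_grain(deband, strength=0.25, luma_scaling=10, static=False)

# or plain grain via the AddGrain plugin, e.g. for bright, clean scenes
grained = core.grain.Add(deband, var=1.0, constant=True)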

[Example from Mirai]

Black & White Clips: Working in GRAY

Because the YUV format stores luma in a separate plane, black and white movies are a lot easier to work with when filtering: we can extract the luma plane and work solely on that:

y = src.std.ShufflePlanes(0, vs.GRAY)

The get_y function in vsutil does the same thing. With our y clip, we can apply filters without mod2 limitations. For example, we can perform odd crops:

crop = y.std.Crop(left=1)

Additionally, as filters are only applied on one plane, this can speed up our script. We also don't have to worry about filters like f3kdb altering our chroma planes in unwanted ways, such as graining them.
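As a quick sketch (the parameter values are only illustrative), debanding just the luma in GRAY looks like any other filter call, and there simply are no chroma planes for f3kdb to touch:

y = get_y(src)  # from vsutil, equivalent to the ShufflePlanes call above
deband = y.f3kdb.Deband(range=16, y=48, grainy=16)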

However, when we're done with our clip, we usually want to export the clip as YUV. There are two options here:

1. Using fake chroma

Using fake chroma is quick and easy and has the advantage that any accidental chroma offsets in the source (e.g. chroma grain) will be removed. All this requires is constant chroma (meaning no tint changes) and mod2 luma.

The simplest option is for u = v = 128 (8-bit):

out = y.resize.Point(format=vs.YUV420P8)

If your luma has an odd resolution, pad it to mod2 first and fill the fake line with awsmfunc.fb. Assuming you want to pad the left:

y = core.std.StackHorizontal([y.std.BlankClip(width=1), y])
y = awf.fb(y, left=1)
out = y.resize.Point(format=vs.YUV420P8)

Alternatively, if your source's chroma isn't a neutral gray, use std.BlankClip:

# give the blank clip a YUV format so its chroma planes can be taken
blank = y.std.BlankClip(format=vs.YUV420P8, color=[0, 132, 124])
out = core.std.ShufflePlanes([y, blank], [0, 1, 2], vs.YUV)

2. Using original chroma (resized if necessary).

This has the advantage that, if there is actual important chroma information (e.g. slight sepia tints), this will be preserved. Just use ShufflePlanes on your clips:

out = core.std.ShufflePlanes([y, src], [0, 1, 2], vs.YUV)

However, if you've resized or cropped, this becomes a bit more difficult. You might have to shift or resize the chroma appropriately (see the chroma resampling chapter for explanations).

If you've cropped, extract and shift accordingly. We will use split and join from vsutil to extract and merge planes. Note that after an odd crop like this the luma is no longer mod2, so before joining back to 4:2:0 you'll also have to pad it back, as in the combined example further down:

y, u, v = split(src)
crop_left = 1
y = y.std.Crop(left=crop_left)
u = u.resize.Spline36(src_left=crop_left / 2)
v = v.resize.Spline36(src_left=crop_left / 2)
out = join([y, u, v])

If you've resized, you need to shift and resize chroma:

y, u, v = split(src)
w, h = 1280, 720
y = y.resize.Spline36(w, h)
u = u.resize.Spline36(w // 2, h // 2, src_left=.25 - .25 * src.width / w)
v = v.resize.Spline36(w // 2, h // 2, src_left=.25 - .25 * src.width / w)
out = join([y, u, v])

Combining the cropping and resizing examples, whereby we pad the odd crop back and use awsmfunc.fb to create a fake line:

y, u, v = split(src)
w, h = 1280, 720
crop_left, crop_bottom = 1, 1

y = y.std.Crop(left=crop_left, bottom=crop_bottom)
# resize to one pixel short of the target width, then pad the left back with a fake line
y = y.resize.Spline36(w - 1, h)
y = core.std.StackHorizontal([y.std.BlankClip(width=1), y])
y = awf.fb(y, left=1)

u = u.resize.Spline36(w // 2, h // 2, src_left=crop_left / 2 + (.25 - .25 * src.width / w), src_height=u.height - crop_bottom / 2)

v = v.resize.Spline36(w // 2, h // 2, src_left=crop_left / 2 + (.25 - .25 * src.width / w), src_height=v.height - crop_bottom / 2)

out = join([y, u, v])

If you don't understand exactly what's going on here and you encounter a situation like this, ask someone more experienced for help.
