I usually like to keep my posts more positive-focused — here’s what you should do vs. here’s what you shouldn’t do. But this week alone I had three potential clients relay to me a very common experience: I thought I finally found a good developer, but then they suddenly just disappeared! I’ll admit its hard …

2023-06-19 15:30

One of the advantages (if you can call it that) of being in this industry as long as I have is that I’ve been through multiple economic disasters, technological paradigm shifts, and — it’s not all bad! — economic and technological boom periods. So I figured I might as well throw out a few predictions …

2023-06-19 15:30

In every introduction to a potential client, partner, or other associate, the first thing I do is give a brief overview of my history. I know this is common in just about any business or social interaction, but its especially important in my line of work, since communicating my curriculum vitae is so critical to …

2023-06-19 15:30


After reviewing the code for the simple YAML parser I wrote, I decided it was getting a little messy, so before continuing, I decided to refactor it a little bit.

The simples thing to do was to separate the serialisation and the deserialisation into separate classes, and simple call those from within the YamlConvert class in the existing methods. This approach tends to be what other JSON and YAML libraries do, with added functionality such as being able to control aspects of the serialisation/deserialisation process for specific types.

I currently don’t need, or want, to do that, as I’m taking a much more brute force approach - however it is something to consider for a future refactor. Maybe.

I ended up with the following for the YamlConvert:

public static class YamlConvert
    private static YamlSerialiser Serialiser;
    private static YamlDeserialiser Deserialiser;
    static YamlConvert()
        Serialiser = new YamlSerialiser();
        Deserialiser = new YamlDeserialiser();
    public static string Serialise(YamlHeader header)
        return Serialiser.Serialise(header);

    public static YamlHeader Deserialise(string filePath)
        if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);

        var content = File.ReadAllLines(filePath);

        return Deserialise(content);

    public static YamlHeader Deserialise(string[] rawHeader)
        return Deserialiser.Deserialise(rawHeader);

It works quite well, as it did before, and looks a lot better. There is no dependency configuration to worry about, as I mentioned above I’m not worried about swapping out the serialisation/deserialisation process at any time.

2022-11-15 00:00


Previously we left off with a method which could parse the YAML header in one of our markdown files, and it was collecting each line between the --- header marker, for further processing.

One of the main requirements for the overall BlogHelper9000 utility is to be able to standardise the YAML headers in each source markdown file for a post. Some of the posts had a mix of different tags, that were essentially doing the same thing, so one of the aims is to be able to collect those, and transform the values into the correct tags.

In order to achieve this, we can specify a collection of the valid header properties up front, and also a collection of the ‘other’ properties that we find, which we can hold for further in the process when we’ve written the code to handle those properties. The YamlHeader class has already been defined, and we can use a little reflection to load that class up and pick the properties out.

private static Dictionary<string, object?> GetYamlHeaderProperties(YamlHeader? header = null)
    var yamlHeader = header ?? new YamlHeader();
    return yamlHeader.GetType()
        .GetProperties(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.GetCustomAttribute<YamlIgnoreAttribute>() is null)
        .ToDictionary(p =>
            var attr = p.GetCustomAttribute<YamlNameAttribute>();

            return attr is not null ? attr.Name.ToLower() : p.Name.ToLower();
        }, p => p.GetValue(yamlHeader, null));

We need to be careful to ignore collecting properties that are not part of the YAML header in markdown files, but that we use in the YamlHeader that we can use when doing further processing - such as holding the ‘extra’ properties that we’ll need to match up with their valid counterparts in a further step. Thus we have the custom YamlIgnoreAttribute that we can use to ensure we drop properties that we don’t care about. We also need to ensure that we can match up C# property names with the actual YAML header name, so we also have the YamlNameAttribute to handle this.

Then we just need a way of parsing the individual lines and pulling the header name and the value out.

(string property, string value) ParseHeaderTag(string tag)
    tag = tag.Trim();
    var index = tag.IndexOf(':');
    var property = tag.Substring(0, index);
    var value = tag.Substring(index+1).Trim();
    return (property, value);

Here we just return a simple tuple after doing some simple substring manipulation, which is greatly helped by the header and its value always being seperated by ‘:’.

Then if we put all that together we can start to parse the header properties.

private static YamlHeader ParseYamlHeader(IEnumerable<string> yamlHeader)
    var parsedHeaderProperties = new Dictionary<string, object>();
    var extraHeaderProperties = new Dictionary<string, string>();
    var headerProperties = GetYamlHeaderProperties();

    foreach (var line in yamlHeader)
        var propertyValue = ParseHeaderTag(line);

        if (headerProperties.ContainsKey(
            parsedHeaderProperties.Add(, propertyValue.value);
            extraHeaderProperties.Add(, propertyValue.value);

    return ToYamlHeader(parsedHeaderProperties, extraHeaderProperties);

All we need to do is, to setup up some dictionaries to hold the header properties, get the dictionary of valid header properties, and then loop through each line, parsing the header tag and verifying whether the property is a ‘valid’ one that we definitely know we want to keep, and or one we need to hold for further processing. You’ll noticed in the above code, that it’s missing an end brace: this is deliberate, because the ParseHeaderTag method and ToYamlHeader method are both nested methods.

Reading through the code to write this post has made me realise that we can do some refactoring to make this look a little nicer.

So we’ll look at that next.

2022-07-30 00:00


The next thing to do to get BlogHelper9000 functional is to write a command which provides some information about the posts in the blog. I want to know:

  • How many published posts there are
  • How many drafts there are
  • A short list of recent posts
  • How long it’s been since a post was published

I also know that I want to introduce a command which will allow me to fix the metadata in the posts, which is a little messy. I’ve been inconsistently blogging since 2007, originally starting off on a self-hosted python blog I’ve forgot the name of before migrating to Wordpress, and then migrating to a short lived .net static site generator before switching over to Jekyll.

Obviously, Markdown powered blogs like Jekyll have to provide non-markdown metadata in each post, and for Jekyll (and most markdown powered blogs) that means: YAML.

Parse that YAML

There are a couple of options when it comes to parsing YAML. One would be to use YamlDotNet which is a stable library which conforms with V1.1 and v1.2 of the YAML specifications.

But where is the fun in that?

I’ve defined a POCO called YamlHeader which I’m going to use to use as the in-memory object to represent the YAML metadata header at the top of a markdown file.

If we take a leaf from different JSON converters, we can define a YamlConvert class like this:

public static class YamlConvert
    public static string Serialise(YamlHeader header)

    public static YamlHeader Deserialise(string filePath)

With this, we can easily serialise a YamlHeader into a string, and deserialise a file into a YamlHeader.


Deserialising is the slight more complicated of the two, so lets start with that.

Our first unit test looks like this:

    public void Should_Deserialise_YamlHeader()
        var yaml = @"---
layout: post
title: 'Dynamic port assignment in Octopus Deploy'
tags: ['build tools', 'octopus deploy']
featured_image: /assets/images/posts/2020/artem-sapegin-b18TRXc8UPQ-unsplash.jpg
featured: false
hidden: false
post content that's not parsed";
        var yamlObject = YamlConvert.Deserialise(yaml.Split(Environment.NewLine));


This immediately requires us to add an overload for Deserialise to the YamlConvert class, which takes a string[]. This means our implementation for the first Deserialise method is simply:

public static YamlHeader Deserialise(string filePath)
    if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);

    var content = File.ReadAllLines(filePath);

    return Deserialise(content);

Now we get into the fun part. And a big caveat: I’m not sure if this is the best way of doing this, but it works for me and that’s all I care about.

Anyway. A YAML header block is identified by a single line of only --- followd by n lines of YAML which is signified to have ended by another single line of only ---. You can see this in the unit test above.

The algorithm I came up with goes like this:

For each line in lines:
  if line is '---' then
    if header start marker not found then
      header start marker found
     break loop
    store line
  parse each line of found header

So in a nutshell, it loops through each line in the file, look for the first --- to identify the start of the header, and then until it hits another ---, it gathers the lines for further processing.

Translated into C#, the code looks like this:

public static YamlHeader Deserialise(string[] fileContent)
    var headerStartMarkerFound = false;
    var yamlBlock = new List<string>();

    foreach (var line in fileContent)
        if (line.Trim() == "---")
            if (!headerStartMarkerFound)
                headerStartMarkerFound = true;


    return ParseYamlHeader(yamlBlock);

This is fairly straightforward, and isn’t where I think some of the problems with the way it works actually are - all that is hidden behind ParseYamlHeader, and is worth a post on its own.

2022-07-22 00:00


In the introductory post to this series, I ended with issuing a command to initialise a new console project, BlogHelper9000. It doesn’t matter how you create your project, be it from Visual Studio, Rider or the terminal, the end result is the same, as the templates are all the same.

With the new .net 6 templates, the resulting Program.cs is somewhat sparse, if you discount the single comment then all you get in the file is a comment and a Console.WriteLine("Hello, World!");, thanks to all the new wizardry in the latest versions of the language and the framework.

Thanks to this new fangled sorcery, the app still has a static main method, you just don’t need to see it, and as such, the args string array is still there. For very simple applications, this is all you really need to do. However, once you get past a few commands, with a few optional flags, things can get complicated, fast. This can into a maintenance headache.

In the past I’ve written my own command line parsing abstractions, I’ve used Mono.Options and other libraries, and I think I’ve finally settled on Oakton as my go to library for quickly and easily adding command line parsing to a console application. It’s intuitive, easy to use and easy to maintain. This means you can easily introduce it into a team environment and have everyone understand it immediately.

Setup Command loading

After following Oakton’s getting started documentation, you can see how easy it is to get going with a basic implementation. I recommended introducing the ability to have both synchronous and asynchronous commands able to be executed, and you achieve this by a small tweak to the Program.cs and taking into consideration the top-level statements in .net 6, like this:

using System.Reflection;

var executor = CommandExecutor.For(_ =>{

var result = await executor.ExecuteAsync(args);
return result;

In .net 5, or if you don’t like top-level statements and have a static int Main you can make it static Task<int> Main instead and return the executor.ExecuteAsync instead of awaiting it.

Base classes

In some console applications, different commands can have the same optional flags, and I like to put mine in a class called BaseInput. Because I know I’m going to have several commands in this application, I’m going to add some base classes so that the different commands can share some of the same functionality. I’ve also used this in the past to, for example, create a database instance in the base class, which is then passed into each inheriting command. It’s also a good place to add some common argument/flag validation.

What I like to do is have an abstract base class, which inherits from the Oakton command, and add an abstract Run method to it, and usually a virtual bool ValidateInput too; these can then be overriden in our actual Command implementations and have a lot of nice functionality automated for us in a way that can be used across all Commands.

Some of the detail of these classes are elided, to stop this from being a super long post, you can see all the details in the Github repo.

public abstract class BaseCommand<TInput> : OaktonCommand<TInput>
    where TInput : BaseInput
    public override bool Execute(TInput input)
        return ValidateInput(input) && Run(input);

    protected abstract bool Run(TInput input);

    protected virtual bool ValidateInput(TInput input)
        /* ... */

This ensures that all the Commands we implement can optionally decide to validate the inputs that they take in, simply by overriding ValidateInput.

The async version is exactly the same… except async:

public abstract class AsyncBaseCommand<TInput> : OaktonAsyncCommand<TInput>
    where TInput : BaseInput
    public override Task<bool> Execute(TInput input)
        return ValidateInput(input) && Run(input);

    protected abstract Task<bool> Run(TInput input);

    protected virtual Task<bool> ValidateInput(TInput input)
        /* ... */

There is an additional class I’ve not yet shown, which adds some further reusable functionality between each base class, and that’s the BaseHelper class. I’ve got a pretty good idea that any commands I write for the app are going to operate on posts or post drafts, which in jekyll are stored in _posts and _drafts respectively. Consequently, the commands need an easy way of having these paths to hand, so a little internal helper class is a good place to put this shared logic.

internal class BaseHelper<TInput> where TInput : BaseInput
    public string DraftsPath { get; }

    public string PostsPath { get;  }

    private BaseHelper(TInput input)
        DraftsPath = Path.Combine(input.BaseDirectoryFlag, "_drafts");
        PostsPath = Path.Combine(input.BaseDirectoryFlag, "_posts");

    public static BaseHelper<TInput> Initialise(TInput input)
        return new BaseHelper<TInput>(input);

    public bool ValidateInput(TInput input)
        if (!Directory.Exists(DraftsPath))
            ConsoleWriter.Write(ConsoleColor.Red, "Unable to find blog _drafts folder");
            return false;

        if (!Directory.Exists(PostsPath))
            ConsoleWriter.Write(ConsoleColor.Red, "Unable to find blog _posts folder");
            return false;

        return true;

This means that our base class implementations can now become:

private BaseHelper<TInput> _baseHelper = null!;
protected string DraftsPath => _baseHelper.DraftsPath;
protected string PostsPath => _baseHelper.PostsPath;

public override bool Execute(TInput input)
    _baseHelper = BaseHelper<TInput>.Initialise(input);
    return ValidateInput(input) && Run(input);

protected virtual bool ValidateInput(TInput input)
    return _baseHelper.ValidateInput(input);
Note the null!, where I am telling the compiler to ignore the fact that _baseHelper is being initialised to null, as I know better.

This allows each command implementation to hook into this method and validate itself automatically.

First Command

Now that we have some base classes to work with, we can start to write our first command. If you check the history in the repo, you’ll see this wasn’t the first command I actually wrote… but it probably should have been. In any case, it only serves to illustrate our first real command implementation.

public class InfoCommand : BaseCommand<BaseInput>
    public InfoCommand()

    protected override bool Run(BaseInput input)
        var posts = LoadsPosts();
        var blogDetails = new Details();

        DeterminePostCount(posts, blogDetails);
        DetermineDraftsInfo(posts, blogDetails);
        DetermineRecentPosts(posts, blogDetails);


        return true;


LoadPosts is a method in the base class which is responsible for loading the posts into memory, so that we can process them and extract meaningful details about the posts. We put store this information in a Details class, which is what we ultimately use to render the details to the console. You can see the details of these methods in the github repository, however they all boil down to simple Linq queries.


In this post we’ve seen how to setup Oakton and configure a base class to extend the functionality and give us more flexibility, and an initial command. In subsequent posts, we’ll cover more commands and I’ll start to use the utility to tidy up metadata across all the posts in the blog and fix things like images for posts.

2022-07-14 00:00


I just had to setup my vimrc and vimfiles on a new laptop for work, and had some fun with Vim, mostly as it’s been years since I had to do it. I keep my vimfiles folder in my github, so I can grab it wherever I need it.

To recap, one of the places that Vim will look for things is $HOME/vimfiles/vimrc, where $HOME is actually the same as %USERPROFILE%. In most corporate environments, the %USERPROFILE% is actually stored in a networked folder location, to enable roaming profile support and help when a user gets a new computer.

So you can put your vimfiles there, but, it’s a network folder - it’s slow to start an instance of Vim. Especially if you have a few plugins.

Instead, what you can do is to edit the _vimrc file in the Vim installation folder (usually in C:\Program Files (x86)\vim), delete the entire contents and replace it with:

set rpt+=C:\path\to\your\vimfiles
set viminfo+=nC:\path\to\your\vimfiles\or\whatever
source C:\path\to\your\vimfiles\vimrc

What this does is:

  1. Sets the runtime path to be the path to your vimfiles
  2. Tells vim where to store/update the viminfo file (which stores useful history state amongst other things)
  3. Source your vimrc file and uses that

This post largely serves as a memory aid for myself when I need to do this again in future I won’t spend longer than I probably needed to googling it to find out how to do it, but I hope it helps someone else.

2022-03-11 00:00


Recently I was inspired by @buhakmeh’s blog post, Supercharge Blogging With .NET and Ruby Frankenblog to write something similar, both as an exercise and excuse to blog about something, and as a way of tidying up the metadata on my existing blog posts and adding header images to old posts.

High level requirements

The initial high level requirements I want to support are:

  1. Cross-platform. This blog is jekyll based, and as such is written in markdown. Any tool I write for automation purposes should be cross-platform.
  2. Easily add posts from the command line, and have some default/initial yaml header metadata automatically added.
  3. See a high level overview of the current status of my blog. This should include things like the most recent post, how many days I’ve been lazy and not published a post, available drafts etc
  4. Publish posts from the command line, which should update the post with published status and add the published date to the yaml header and filename.
  5. Create a customised post header for each post on the blog, containing some kind of blog branding template and the post title, and update or add the appropriate yaml header metadata to each post. This idea also comes from another @buhakmeh’s post.
  6. The blog has many years of blog posts, spread across several different blogging platforms before settling on Jekyll. As such, some of the yaml metadata for each blog post is… not consistent. Some effort should go into correcting this.
  7. Automaticlly notify Twitter of published posts.

Next steps

The next series of posts will cover implementing the above requirements… not necessarily in that order. First I will go over setting up the project and configuring Oakton.

After that I will probably cover implementing fixes to the existing blog metadata, as I think that is going to be something that will be required in order for any sort of Info function to work properly, as all of the yaml metadata will need to be consistent.

Then I think I’ll tackle the image stuff, which should be fairly interesting, and should give a nice look to the existing posts, as having prominent images for posts is part of the theme for the blog, which I’ve not really taken full advantage of.

I’ll try to update this post with links to future posts, or else make it all a big series.

dotnet new console --name BlogHelper9000

2022-03-04 00:00


At work, we have recently been porting our internal web framework into .net 6. Yes, we are late to the party on this, for reasons. Suffice it to say I currently work in an inherently risk averse industry.

Anyway, one part of the framework is responsible for getting reports from SSRS.

The way it did this is to use a wrapper class around a SOAP client generated from good old ReportService2005.asmx?wsdl, using our faithful friend svcutil.exe. The wrapper class used some TaskCompletionSource magic on the events in the client to make the client.LoadReportAsync and the other *Async methods actually async, as the generated client was not truely async.

Fast forward to the modern times, and we need to upgrade it. How do we do that?

Obviously, Microsoft are a step ahead: svcutil has a dotnet version - dotnet-svcutil. We can install it and get going:

dotnet too install --global dotnet-svcutil

Once installed, we can call it against the endpoint:

Make sure you call this command in the root of the project where the service should go
dotnet-svcutil http://server/ReportServer/ReportService2005.asmx?wsdl

In our wrapper class, the initialisation of the client has to change slightly, because the generated client is different to the original svcutil implementation. Looking at the diff between the two files, it’s because the newer version of the client users more modern .net functionality.

The wrapper class constructor has to be changed slightly:

public Wrapper(string url, NetworkCredential credentials)
    var binding = new BasicHttpBinding(BasicHttpSecurityMode.TransportCredentialOnly);
    binding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Ntlm;
    binding.MaxReceivedMessageSize = 10485760; // this is a 10mb limit
    var address = new EndpointAddress(url);

    _client = new ReportExecutionServiceSoapClient(binding, address);
    _client.ClientCredentials.Windows.AllowedInpersonationLevel = TokenImpersonationLevel.Impersonation;
    _client.ClientCredentials.Windows.ClientCredential = credentials;

Then, the code which actually generates the report can be updated to remove all of the TaskCompletionSource, which actually simplifies it a great deal:

public async Task<byte[]> RenderReport(string reportPath, string reportFormat, ParameterValue[] parameterValues)
    await _client.LoadReportAsync(null, reportPath, null);
    await _client.SetExecutionParametersAsync(null, null, parameterValues, "en-gb");
    var deviceInfo = @"<DeviceInfo><Toolbar>False</ToolBar></DeviceInfo>";
    var request = new RenderRequest(null, null, reportFormat, deviceInfo);
    var response = await _client.RenderAsync(request);
    return response.Result;

You can then do whatever you like with the byte[], like return it in an IActionResult or load it into a MemoryStream and write it to disk as the file.

Much of the detail of this post is sourced from various places around the web, but I’ve forgotten all of the places I gleaned the information from.

2022-01-11 00:00


Recently we realised that we had quite a few applications being deployed through Octopus Deploy, and that we had a number of Environments, and a number of Channels, and that managing the ports being used in Dev/QA/UAT across different servers/channels was becoming… problematic.

When looking at this problem, it’s immediately clear that you need some way of dynamically allocating a port number on each deployment. This blog post from Paul Stovell shows the way, using a custom Powershell build step.

As we’d lost track of what sites were using what ports, and that we also have ad-hoc websites in IIS that aren’t managed by Octopus Deploy, we thought that asking IIS “Hey, what ports are the sites you know about using?” might be a way forward. We also had the additional requirement that on some of our servers, we also might have some arbitary services also using a port and that we might bump into a situation where a port was chosen that was already being used by a non-IIS application/website.

Researching the first situation, it’s quickly apparent that you can do this in Powershell, using the Webadministration module. Based on the answers to this question on Stackoverflow, we came up with this:

Import-Module Webadministration

function Get-IIS-Used-Ports()
    $Websites = Get-ChildItem IIS:\Sites

    $ports = foreach($Site in $Websites)
        $Binding = $Site.bindings
        [string]$BindingInfo = $Binding.Collection
        [string]$Port = $BindingInfo.SubString($BindingInfo.IndexOf(":")+1,$BindingInfo.LastIndexOf(":")-$BindingInfo.IndexOf(":")-1)

        $Port -as [int]

    return $ports

To get the list of ports on a machine that are not being used is also fairly straightforward in Powershell:

function Get-Free-Ports()
    $availablePorts = @(49000-65000)
    $usedPorts = @(Get-NetTCPConnection | Select -ExpandProperty LocalPort | Sort -Descending | Where { $_ -ge 49000})

    $unusedPorts = foreach($possiblePort in $usedPorts)
        $unused = $possiblePort -notin $usedPorts

    return $unusedPorts

With those two functions in hand, you can work out what free ports are available to be used as the ‘next port’ on a server. It’s worth pointing out that if a site in IIS is stopped, then IIS won’t allow that port to be used in another website (in IIS), but the port also doesn’t show up as a used port in netstat -a, which is kind of what Get-NetTCPConnection does.

function Get-Next-Port()
    $iisUsedPorts = Get-IIS-Used-Ports
    $freePorts = Get-Free-Ports

    $port = $freePorts | Where-Object { $iisUsedPorts -notcontains $_} | Sort-Object | Select-Object First 1

    Set-OctopusVariable -Name "Port" -Value "$port"

Then you just have to call it at the end of the script:


You’d also want to have various Write-Host or other logging messages so that you get some useful output in the build step when you’re running it.

2021-10-26 00:00


If you found this because you have a build server which is ‘offline’, without any external internet access because of reasons, and you can’t get your build to work because dotnet fails to restore the tool you require for your build process because of said lack of external internet access, then this is for you.

In hindsight, this may be obvious for most people, but it wasn’t for me, so here it is.

In this situation, you just need to shy away from local tools completely, because as of yet, I’ve been unable to find anyway of telling dotnet not to try to restore them, and they fail every build.

Instead, I’ve installed the tool(s) as a global tool, in a specific folder, e.g. C:\dotnet-tools, which I’ve then added to the system path on the server. You may need to restart the build server for it to pick up the changes to the environment variable.

One challenge that remains is how to ensure the dotnet tools are consistent on both the developer machine, and the build server. I leave that as an exercise for the reader.

2021-05-06 00:00


I’m leaving this here so I can find it again easily.

We had a problem updating the Visual Studio 2019 Build Tools on a server, after updating an already existing offline layout.

I won’t go into that here, because it’s covered extensively on Microsoft’s Documentation website.

The installation kept failing, even when using --noweb. It turns out that when your server is completely cut off from the internet, as was the case here, you also need to pass --noUpdateInstaller.

This is because (so it would seem) that even though --noweb correctly tells the installer to use the offline cache, it doesn’t prevent the installer from trying to update itself, which will obviously fail in a totally disconnected environment.

2021-04-01 00:00


Since a technical breakdown of how Betsy does texture compression was posted, I wanted to lay out how the compressors in Convection Texture Tools (CVTT) work, as well as provide some context of what CVTT's objectives are in the first place to explain some of the technical decisions.

First off, while I am very happy with how CVTT has turned out, and while it's definitely a production-quality texture compressor, providing the best compressor possible for a production environment has not been its primary goal. Its primary goal is to experiment with compression techniques to improve the state of the art, particularly finding inexpensive ways to hit high quality targets.

A common theme that wound up manifesting in most of CVTT's design is that encoding decisions are either guided by informed decisions, i.e. models that relate to the problem being solved, or are exhaustive.  Very little of it is done by random or random-like searching. Much of what CVTT exists to experiment with is figuring out techniques which amount to making those informed decisions.

CVTT's ParallelMath module, and choice of C++

While there's some concidence with CVTT having a similar philosophy to Intel's ISPC compressor, the reason for CVTT's SPMD-style design was actually motivated by it being built a port of the skeleton of DirectXTex's HLSL BC7 compressor.

I chose to use C++ instead of ISPC for three main reasons:
  • It was easier to develop it in Visual Studio.
  • It was easier to do operations that didn't parallelize well.  This turned out to matter with the ETC compressor in particular.
  • I don't trust in ISPC's longevity, in particular I think it will be obsolete as soon as someone makes something that can target both CPU and GPU, like either a new language that can cross-compile, or SPIR-V-on-CPU.

Anyway, CVTT's ParallelMath module is kind of the foundation that everything else is built on.  Much of its design is motivated by SIMD instruction set quirks, and a desire to maintain compatibility with older instruction sets like SSE2 without sacrificing too much.

Part of that compatibility effort is that most of CVTT's ops use a UInt15 type.  The reason for UInt15 is to handle architectures (like SSE2!) that don't support unsigned compares, min, or max, which means performing those operations on a 16-bit number requires flipping the high bit on both operands.  For any number where we know the high bit is zero for both operands, that flip is unnecessary - and a huge number of operations in CVTT fit in 15 bits.

The compare flag types are basically vector booleans, where either all bits are 1 or all bits are 0 for a given lane - There's one type for 16-bit ints, and one for 32-bit floats, and they have to be converted since they're different widths.  Those are combined with several utility functions, some of which, like SelectOrZero and NotConditionalSet, can elide a few operations.

The RoundForScope type is a nifty dual-use piece of code.  SSE rounding modes are determined by the CSR register, not per-op, so RoundForScope when targeting SSE will set the CSR, and then reset it in its destructor.  For other architectures, including the scalar target, the TYPE of the RoundForScope passed in is what determines the operation, so the same code works whether the rounding is per-op or per-scope.

While the ParallelMath architecture has been very resistant to bugs for the most part, where it has run into bugs, they've mostly been due to improper use of AnySet or AllSet - Cases where parallel code can behave improperly because lanes where the condition should exclude it are still executing, and need to be manually filtered out using conditionals.

BC1-7 common themes

All of the desktop formats that CVTT supports are based on interpolation.  S3TC RGB (a.k.a. DXT1) for instance defines two colors (called endpoints), then defines all pixels as being either one of those two colors, or a color that is part-way between those two colors, for each 4x4 block.  Most of the encoding effort is spent on determining what the two colors should be.
You can read about a lot of this on Simon Brown's post outlining the compression techniques used by Squish, one of the pioneering S3TC compressors, which in turn is the basis for the algorithm used by CVTT's BC1 compressor.

Principal component analysis

Principal component analysis determines, based on a set of points, what the main axis is that the colors are aligned along.  This gives us a very good guess of what the initial colors should be, simply using the colors that are the furthest along that axis, but it isn't necessarily ideal.

Endpoint refinement

In BC1 for instance, each color is assigned to one of four possible values along the color line.  CVTT solves for that by just finding the color with the shortest distance to each pixel's color.  If the color assignments are known, then it's possible to determine what the color values are that will minimize the sum of the square distance of that mapping.  One round of refinement usually yields slightly better results and is pretty cheap to check.  Two rounds will sometimes yield a slightly better result.


One problem with using the farthest extents of the principal axis as the color is that the color precision is reduced (quantized) by the format.  In BC1-5, the color is reduced to a 16-bit color with 5 bits of red, 6 bits of green, and 5 bits of alpha.  It's frequently possible to achieve a more accurate match by using colors outside of the range so that the interpolated colors are closer to the actual image colors - This sacrifices some of the color range.

CVTT internally refers to these as "tweak factors" or similar, since what they functionally do is make adjustments to the color mapping to try finding a better result.

The number of extrapolation possibilities increases quadratically with the number of indexes.  CVTT will only ever try four possibilities: No insets, one inset on one end (which is two possibilities, one for each end), and one inset on both ends.

BC1 (DXT1)

CVTT's BC1 encoder uses the cluster fit technique developed by Simon Brown for Squish.  It uses the principal axis to determine an ordering of each of the 16 pixels along the color line, and then rather than computing the endpoints from the start and end points, it computes them by trying each possible count of pixels assigned to each endpoint that maintains the original order and still totals 16.  That's a fairly large set of possibilities with a lot of useless entries, but BC1 is fairly tight on bits, so it does take a lot of searching to maximize quality out of it.

BC2 (DXT3)

BC2 uses BC1 for RGB and 4bpp alpha.  There's not much to say here, since it just involves reducing the alpha precision.

BC3 (DXT5)

This one is actually a bit interesting.  DXT5 uses indexed alpha, where it defines two 8-bit alpha endpoints and a 3-bit interpolator per pixel, but it also has a mode where 2 of the interpolators are reserved 0 and 255 and only 6 are endpoint-to-endpoint values.  Most encoders will just use the min/max alpha.  CVTT will also try extrapolated endpoints, and will try for the second mode by assuming that any pixels within 1/10th of the endpoint range of 0 or 255 would be assigned to the reserved endpoints.  The reason for the 1/10th range is that the rounding range of the 6-value endpoints is 1/10th of the range, and it assumes that for any case where the endpoints would include values in that range, it would just use the 8-index mode and there'd be 6 indexes between them anyway.

BC4 and BC5

These two modes are functionally the same as BC3's alpha encoding, with the exception that the signed modes are offset by 128.  CVTT handles signed modes by pre-offsetting them and undoing the offset.


BC7 has 8 modes of operation and is the most complicated format to encode, but it's actually not terribly more complicated than BC1.  All of the modes do one of two things: They encode 1 to 3 pairs of endpoints that are assigned to specific groupings of pixels for all color channels, referred to as partitions, or or they encode one set of endpoints for the entire block, except for one endpoint, which is encoded separately.
Here are the possible partitions:

Credit: Jon Rocatis from this post.

Another feature of BC7 are parity bits, where the low bit of each endpoint is specified by a single bit.  Parity bits (P-bit) exist as a way of getting a bit more endpoint precision when there aren't as many available bits as there are endpoint channels without causing the channels to have a different number of bits, something that caused problems with gray discoloration in BC1-3.
CVTT will by default just try every partition, and every P-bit combination.

Based on some follow-up work that I'm still experimenting with, a good quality trade-off would be to only check certain subsets.  Among the BC7 subsets, the vast majority of selected subsets fall into a only about 16 of the possible ones, and omitting those causes very little quality loss.  I'll publish more about that when my next experiment is further along.

Weight-by-alpha issues

One weakness that CVTT's encoder has vs. Monte Carlo-style encoders is that principal component analysis does not work well for modes in BC7 where the alpha and some of the color channels are interpolated using the same indexes.  This is never a problem with BC2 or BC3, which can avoid that problem by calculating alpha first and then pre-weighting the RGB channels.

I haven't committed a solution to that yet, and while CVTT gets pretty good quality anyway, it's one area where it underperforms other compressors on BC7 by a noticeable amount.

Shape re-use

The groupings of pixels in BC7 are called "shapes."

One optimization that CVTT does is partially reuse calculations for identical shapes.  That is, if you look at the 3 subset grouping above, you can notice that many of the pixel groups are the same as some pixel groups in the 2 subset grouping.

To take advantage of that fact, CVTT performs principal component analysis on all unique shapes before performing further steps.  This is a bit of a tradeoff though: It's only an optimization if those shapes are actually used, so it's not ideal for if CVTT were to reduce the number of subsets that it checks.

Weight reconstruction

One important aspect of BC7 is that, unlike BC1-3, it specifies the precision that interpolation is to be done at, as well as the weight values for each index.  However, doing a table lookup for each value in a parallelized index values is a bit slow.  CVTT avoids this by reconstructing the weights arithmetically:

MUInt15 weight = ParallelMath::LosslessCast<MUInt15>::Cast(ParallelMath::RightShift(ParallelMath::CompactMultiply(g_weightReciprocals[m_range], index) + 256, 9));

Coincidentally, doing this just barely fits into 16 bits of precision accurately.


BC6H is very similar to BC7, except it's 16-bit floating point.   The floating point part is achieved by encoding the endpoints as a high-precision base and low-precision difference from the base.  Some of the modes that it supports are partitioned similar to BC7, and it also has an extremely complicated storage format where the endpoint bits are located somewhat arbitrarily.
There's a reason that BC6H is the one mode that's flagged as "experimental."  Unlike all other modes, BC6H is floating point, but has a very unique quirk: When BC6H interpolates between endpoints, it's done as if the endpoint values are integers, even though they will be bit-cast into floating point values.

Doing that severely complicates making a BC6H encoder, because part of the floating point values are the exponent, meaning that the values are roughly logarithmic.  Unless they're the same, they don't even correlate proportionally with each other, so color values may shift erratically, and principal component analysis doesn't really work.

CVTT tries to do its usual tricks in spite of this, and it sort of works, but it's an area where CVTT's general approach is ill-suited.


ETC1 is based on cluster fit, via what's basically a mathematical reformulation of it.

Basically, ETC1 is based on the idea that the human visual system sees color detail less than intensity detail, so it encodes each 4x4 block as a pair of either 4x2 or 2x4 blocks which each encode a color, an offset table ID, and a per-pixel index into the offset table.  The offsets are added to ALL color channels, making them grayscale offsets, essentially.

Unique cumulative offsets

What's distinct about ETC compares to the desktop formats, as far as using cluster fit is concerned, is two things: First, the primary axis is always known.  Second, the offset tables are symmetrical, where 2 of the entries are the negation of the other two.
The optimal color for a block, not accounting for clamping, will be the average color of the block, offset by 1/16th of the offset assigned to each pixel.  Since half of the offsets negate each other, every pair of pixels assigned to opposing offsets cancel out, causing no change.  This drastically reduces the search space, since many of the combinations will produce identical colors.  Another thing that reduces the search space is that many of the colors will be duplicates after the precision reduction from quantization.  Yet another thing is that in the first mode, the offsets are +2 and +4, which have a common factor, causing many of the possible offsets to overlap, cancelling out even more combinations.

So, CVTT's ETC1 compressor simply evaluates each possible offset from the average color that results in a unique color post-quantization, and picks the best one.  Differential mode works by selecting the best VALID combination of colors, first by checking if the best pair of colors is valid, and failing that, checking all evaluated color combinations.


ETC2 has 3 additional selectable modes on top of the ETC1 modes.  One, called T mode, contains 4 colors: Color0, Color1, Color1+offset, and Color2+offset.  Another, called H mode, contains Color0+offset, Color0-offset, Color1+offset, and Color1-offset.  The final mode, called planar mode, contains what is essentially a base color and a per-axis offset gradient.

T and H mode

T and H mode both exist to better handle blocks where, within the 2x4 or 4x2 blocks, the colors do not align well along the grayscale axis.  CVTT's T/H mode encoding basically works with that assumption by trying to find where it thinks the poorly-aligned color axes might be.  First, it generates some chrominance coordinates, which are basically 2D coordinates corresponding to the pixel colors projected on to the grayscale plane.  Then, it performs principal component analysis to find the primary chrominance axis.  Then, it splits the block based on which side of the half-way point each pixel is to form two groupings that are referred to internally as "sectors."

From the sectors, it performs a similar process of inspecting each possible offset count from the average to determine the best fit - But it will also record if any colors NOT assigned to the sector can still use one of the results that it computed, which are used later to determine the actual optimal pairing of the results that it computed.

One case that this may not handle optimally is when the pixels in a block ARE fairly well-aligned along the grayscale axis, but the ability of T/H colors to be relatively arbitrary would be an advantage.

ETC2 with punch-through, "virtual T mode"

ETC2 supports punchthrough transparency by mapping one of the T or H indexes to transparent.  Both of these are resolved in the same way as T mode.  When encoding punch-through the color values for T mode are Color0, Color1+offset, transparent, Color1-offset, and in H mode, they are Color0+offset, Color0-offset, transparent, and Color1.

Essentially, both have a single color, and another color +/- an offset, there are only 2 differences: First, the isolated color H mode is still offset, so the offset has to be undone.  If that quantizes to a more accurate value, then H mode is better.  Second, the H mode color may not be valid - H mode encodes the table index low bit based on the order of the colors, but unlike when encoding opaque, reordering the colors will affect which color has the isolated value and which one has the pair of values. 

H mode as T mode encoding

One special case to handle with testing H mode is the possibility that the optimal color is the same.  This should be avoidable by evaluating T mode first, but the code handles it properly just to be safe.  Because H mode encodes the table low bit based on a comparison of the endpoints, it may not be possible to select the correct table if the endpoints are the same.  In that case, CVTT uses a fallback where it encodes the block as T mode instead, mapping everything to the color with the pair of offsets.

Planar mode

Planar mode involves finding an optimal combination of 3 values that determine the color of each channel value as O+(H*x)+(V*Y)

How planar mode actually works is by just finding the least-squares fit for each of those three values at once.
Where error=(reconstructedValue-actualValue)², we want to solve for d(error)/dO=0, d(error)/dH=0, and d(error)/dV=0

All three of these cases resolve to quadratic formulas, so the entire thing is just converted to a system of linear equations and solved.  The proof and steps are in the code.

ETC2 alpha and EAC

Both of these "grayscale" modes are both more complicated because they have 3-bit indexes, multiple lookup tables, and an amplitude multiplier.

CVTT tries a limited set of possibilities based on alpha insets.  It tries 10 alpha ranges, which correspond to all ranges where the index inset of each endpoint is +/- 1 the number of the other endpoint.  So, for example, given 8 alpha offsets numbered 0-7, it will try these pairs:
  • 0,7
  • 0,6
  • 1,7
  • 1,6
  • 1,5
  • 2,6
  • 2,5
  • 2,4
  • 3,5
  • 3,4
Once the range is selected, 2 multipliers are checked: The highest value that can be multiplied without exceeding the actual alpha range, and the smallest number that can be multiplied while exceeding it.

The best result of these possibilities is selected.

Possible improvements and areas of interest

BC6H is by far the most improvable aspect.  Traditional PCA doesn't work well because of the logarithmic interpolation.  Sum-of-square-difference in floating point pseudo-logarithmic space performs much worse than in gamma space and is prone to sparkly artifacts.

ETC1 cumulative offset deduplication assumes that each pixel is equally important, which doesn't hold when using weight-by-alpha.

ETC2 T/H mode encoding could try all 15 possible sector assignments (based on the 16-pixel ordering along the chroma axis) instead of one.  I did try finding the grouping that minimized the total square distance to the group averages instead of using the centroid as the split point, but that actually had no effect... they might be mathematically equivalent?  Not sure.

A lot of these concepts don't translate well to ASTC.  CVTT's approaches largely assume that it's practical to traverse the entire search space, but ASTC is highly configurable, so its search space has many axes, essentially.  The fact that partitioning is done AFTER grid interpolation in particular is also a big headache that would require its own novel solutions.

Reduction of the search space is one of CVTT's biggest sore spots.  It performs excellently at high quality targets, but is relatively slow at lower quality targets.  I justified this because typically developers want to maximize quality when import is a one-time operation done offline, and CVTT is fast enough for the most part, but it probably wouldn't be suitable for real-time operation.

by OneEightHundred ( at 2021-01-03 23:21



The plan to post a play-by-play for dev kind of fell apart as I preferred to focus on just doing the work, but the Windows port was a success.

If you want some highlights:

  • I replaced the internal resource format with ZIP archives to make it easier to create custom resource archives.
  • PICT support was dropped in favor of BMP, which is way easier to load.  The gpr2gpa tool handles importing.
  • Ditto with dropping "snd " resource support in favor of WAV.
  • Some resources were refactored to JSON so they could be patched, mostly dialogs.
  • Massive internal API refactoring, especially refactoring the QuickDraw routines to use the new DrawSurface API, which doesn't have an active "port" but instead uses method calls directly to the draw surface.
  • A bunch of work to allow resolution changes while in-game.  The game will load visible dynamic objects from neighboring rooms in a resolution-dependent way, so a lot of work went in to unloading and reloading those objects.

The SDL variant ("AerofoilSDL") is also basically done, with a new OpenGL ES 2 rendering backend and SDL sound backend for improved portability.  The lead version on Windows still uses D3D11 and XAudio2 though.

Unfortunately, I'm still looking for someone to assist with the macOS port, which is made more difficult by the fact that Apple discontinued OpenGL, so I can't really provide a working renderer for it any more.  (Aerofoil's renderer is actually slightly complicated, mostly due to postprocessing.)

Goin' mobile

In the meantime, the Android port is under way!  The game is fully playable so far, most of the work has to do with redoing the UI for touchscreens.  The in-game controls use corner taps for rubber bands and battery/helium, but it's a bit awkward if you're trying to use the battery while moving left due to the taps being on the same side of the screen.

Most of the cases where you NEED to use the battery, you're facing right, so this was kind of a tactical decision, but there are some screens (like "Grease is on TV") where it'd be really nice if it was more usable facing left.

I'm also adding a "source export" feature: The source code package will be bundled with the app, and you can just use the source export feature to save the source code to your documents directory.  That is, once I figure out how to save to the documents directory, which is apparently very complicated...

Anyway, I'm working on getting this into the Google Play Store too.  There might be some APKs posted to GitHub as pre-releases, but there may (if I can figure out how it works) be some Internal Testing releases via GPS.  If you want to opt in to the GPS tests, shoot an e-mail to

Will there be an iOS port?

Maybe, but there are two obstacles:

The game is GPL-licensed and there have reportedly been problems with Apple removing GPL-licensed apps from the App Store, and it may not be possible to comply with it.  I've heard there is now a way to push apps to your personal device via Xcode with only an Apple ID, which might make satisfying some of the requirements easier, but I don't know.

Second, as with the macOS version, someone would need to do the port.  I don't have a Mac, so I don't have Xcode, so I can't do it.

by OneEightHundred ( at 2020-10-20 11:09


Most of the images in Glider PRO's resources are in PICT format.

The PICT format is basically a bunch of serialized QuickDraw opcodes and can contain a combination of both image and vector data.

The first goal is to get all of the known resources to parse.  The good news is that none of the resources in the Glider PRO application resources or any of the houses contain vector data, so it's 100% bitmaps.  The bad news is that the bitmaps have quite a bit of variation in their internal structure, and sometimes they don't match the display format.

Several images contain multiple images spliced together within the image data, and at least one image is 16-bit color even though the rest of the images are indexed color.  One is 4-bit indexed color instead of 8-bit.  Many of them are 1-bit, and the bit scheme for 1-bit images is also inverted compared to the usual expectations (i.e. 1 is black, 0 is white).

Adding to these complications, while it looks like all of the images are using the standard system palette, there's no guarantee that they will - It's actually even possible to make a PICT image that combines multiple images with different color palettes, because the palette is defined per picture op, not per image file.

There's also a fun quirk where the PICT image frame doesn't necessarily have 0,0 as the top-left corner.

I think the best solution to this will simply be to change the display type to 32-bit and unpack PICT images to a single raster bitmap on load.  The game appears to use QuickDraw abstractions for all of its draw operations, so while it presumes that the color depth should be 8-bit, I don't think there's anything that will prevent GlidePort from using 32-bit instead.

In the meantime, I've been able to convert all of the resources in the open source release to PNG format as a test, so it should be possible to now adapt that to a runtime PICT loader.

by OneEightHundred ( at 2019-11-23 20:43


Recently found out that Classic Mac game Glider PRO's source code was released, so I'm starting a project called GlidePort to bring it to Windows, ideally in as faithful of a reproduction as possible and using the original data files.  Some additions like gamepad support may come at a later time if this stays on track.

While this is a chance to restore of the few iconic Mac-specific games of the era to, it's also a chance to explore a lot of the era technology, so I'll be doing some dev diaries about the process.

Porting Glider has a number of technical challenges: It's very much coded for the Mac platform, which has a lot of peculiarities compared to POSIX and Windows.  The preferred language for Mac OS was originally Pascal, so the C standard library is often mostly or entirely unused, and the Macintosh Toolbox (the operating system API)  has differences like preferring length-prefixed strings instead of C-style null terminated strings.

Data is in big endian format, as it was originally made for Motorola 68k and PowerPC CPUs.  Data files are split into two "forks," one as a flat data stream and the other as a resource database that the toolbox provides parsing facilities for.  In Mac development, parsing individual data elements was generally the preferred style vs. reading in whole structures, which leads to data formats often having variable-length strings and no padding for character buffer space or alignment.

Rendering is done using QuickDraw, the system-provided multimedia infrastructure.  Most images use the system-native PICT format, a vector format that is basically a list of QuickDraw commands.

At minimum, this'll require parsing a lot of Mac native resource formats, some Mac interchange formats (i.e. BinHex 4), reimplementation of a subset of QuickDraw and QuickTime, substitution of copyrighted fonts, and switch-out of numerous Mac-specific compiler extensions like dword literals and Pascal string escapes.

The plan for now is to implement the original UI in Qt, but I might rebuild the UI instead if that turns out to be impractical.

by OneEightHundred ( at 2019-10-10 02:03


When adding ETC support to Convection Texture Tools, I decided to try adapting the cluster fit algorithm used for desktop formats to ETC.

Cluster fit works by sorting the pixels into an order based on a color axis, and then repeatedly evaluating each possible combination of counts of the number of pixels assigned to each index.  It does so by taking the pixels and applying a least-squares fit to produce the endpoint line.

For ETC, this is is simplified in a few ways: The axis is always 1,1,1, so the step of picking a good axis is unnecessary.  There is only one base color and the offsets are determined by the table index, so the clustering step would only solve the base color.

Assuming that you know what the offsets for each pixel are, the least squares fit amounts to simply subtracting the offset from each of the input pixels and averaging the result.

For a 4x2 block, there are 165 possible cluster configurations, but it turns out that some of those are redundant, given certain assumptions.  The base color is derived from the formula ((color1-offset1)+(color2-offset2)+...)/8, but since the adds are commutative, that's identical to ((color1+color2+...)-(offset1+offset2+...))/8

The first half of that is the total of the colors, which is constant.  The second is the total of the offsets.

Fortunately, not all of the possible combinations produce unique offsets.  Some of them cancel out, since adding 1 to or subtracting 1 from the count of the offsets that are negatives of each other produces no change.  In an example case, the count tuples (5,0,1,2) and (3,2,3,0) are the same, since 5*-L + 0*-S + 1*S + 2*L = 3*-L + 2*-S + 3*S + 0*L.

For most of the tables, this results in only 81 possible offset combinations.  For the first table, the large value is divisible by the small value, causing even more cancellations, and only 57 possible offset combinations.

Finally, most of the base colors produced by the offset combinations are not unique after quantization: Differential mode only has 5-bit color resolution, and differential mode only has 4-bit resolution, so after quantization, many of the results get mapped to the same color.  Deduplicating them is also inexpensive: If the offsets are checked in ascending order, then once the candidate color progresses past the threshold where the result could map to a specific quantized color, it will never cross back below that threshold, so deduplication only needs to inspect the last appended quantized color.

Together, these reduce the candidate set of base colors to a fairly small number, creating a very optimal search space at low cost.

There are a few circumstances where these assumptions don't hold:

One is when the clamping behavior comes into effect, particularly when a pixel channel's value is near 0 or 255.  In that case, this algorithm can't account for the fact that changing the value of the base color would have no effect on some of the offset colors.

One is when the pixels are not of equal importance, such as when using weight-by-alpha, which makes the offset additions non-commutative, but that only invalidates the cancellation part of the algorithm.  The color total can be pre-weighted, and the rest of the algorithm would have to rely on working more like cluster fit: Sort the colors along the 1,1,1 line and determine the weights for the pixels in that order, generate all 165 cluster combinations, and compute the weight totals for each one.  Sort them into ascending order, and then the rest of the algorithm should work.

One is when dealing with differential mode constraints, since not all base color pairs are legal.  There are some cases where a base color pair that is just barely illegal could be made legal by nudging the colors closer together, but in practice, this is rare: Usually, there is already a very similar individual mode color pair, or another differential mode pair that is only slightly worse.

In CVTT, I deal with differential mode by evaluating all of the possibilities and picking the best legal pair.  There's a shortcut case when the best base color for both blocks produces a legal differential mode pair, but this is admittedly a bit less than optimal: It picks the first evaluation in the case of a tie when searching for the best, but since blocks are evaluated starting with the largest combined negative offset, it's a bit more likely to pick colors far away from the base than colors close to the base, even though colors closer to the average tend to produce smaller offsets and are more likely to be legal, so this could be improved by making the tie-breaking function prefer smaller offsets.

In practice though, the differential mode search is not where most of the computation time is spent: Evaluating the actual base colors is.

As with the rest of CVTT's codecs, brute force is still key: The codec is designed to use 8-wide SSE2 16-bit math ops wherever possible to processing 8 blocks at once, but this creates a number of challenges since sorting and list creation are not amenable to vectorization.  I solve this by careful insertion of scalar ops, and the entire differential mode part is scalar as well.  Fortunately, as stated, the parts that have to be scalar are not major contributors to the encoding time.

You can grab the stand-alone CVTT encoding kernels here:

by OneEightHundred ( at 2019-09-06 00:47


Convection Texture Tools is now roughly equal quality-wise with NVTT at compressing BC7 textures despite being about 140 times faster, making it one of the fastest and highest-quality BC7 compressors.

How this was accomplished turned out to be simpler than expected.  Recall that Squish became the gold standard of S3TC compressors by implementing a "cluster fit" algorithm that ordered all of the input colors on a line and tried every possible grouping of them to least-squares fit them.

Unfortunately, using this technique isn't practical in BC7 because the number of orderings has rather extreme scaling characteristics.  While 2-bit indices have a few hundred possible orderings, 4-bit indices have millions, most BC7 mode indices are 3 bits, and some have 4.

With that option gone, most BC7 compressors until now have tried to solve endpoints using various types of endpoint perturbation, which tends to require a lot of iterations.

Convection just uses 2 rounds of K-means clustering and a much simpler technique based on a guess about why Squish's cluster fit algorithm is actually useful: It can create endpoint mappings that don't use some of the terminal ends of the endpoint line, causing the endpoint to be extrapolated out, possibly to a point that loses less accuracy to quantization.

Convection just tries cutting off 1 index at each end, then 1 index at both ends.  That turned out to be enough to place it near the top of the quality benchmarks.

Now I just need to add color weighting and alpha weighting and it'll be time to move on to other formats.

by OneEightHundred ( at 2018-03-30 05:26


How do you generate the tangent vectors, which represent which way the texture axes on a textured triangle, are facing?

Hitting up Google tends to produce articles like this one, or maybe even that exact one. I've seen others linked too, the basic formulae tend to be the same. Have you looked at what you're pasting into your code though? Have you noticed that you're using the T coordinates to calculate the S vector, and vice versa? Well, you can look at the underlying math, and you'll find that it's because that's what happens when you assume the normal, S vector, and T vectors form an orthonormal matrix and attempt to invert it, in a sense you're not really using the S and T vectors but rather vectors perpendicular to them.

But that's fine, right? I mean, this is an orthogonal matrix, and they are perpendicular to each other, right? Well, does your texture project on to the triangle with the texture axes at right angles to each other, like a grid?

... Not always? Well, you might have a problem then!

So, what's the real answer?

Well, what do we know? First, translating the vertex positions will not affect the axial directions. Second, scrolling the texture will not affect the axial directions.

So, for triangle (A,B,C), with coordinates (x,y,z,t), we can create a new triangle (LA,LB,LC) and the directions will be the same:

We also know that both axis directions are on the same plane as the points, so to resolve that, we can to convert this into a local coordinate system and force one axis to zero.

Now we need triangle (Origin, PLB, PLC) in this local coordinate space. We know PLB[y] is zero since LB was used as the X axis.

Now we can solve this. Remember that PLB[y] is zero, so...

Do this for both axes and you have your correct texture axis vectors, regardless of the texture projection. You can then multiply the results by your tangent-space normalmap, normalize the result, and have a proper world-space surface normal.

As always, the source code spoilers:

terVec3 lb = ti->points[1] - ti->points[0];
terVec3 lc = ti->points[2] - ti->points[0];
terVec2 lbt = ti->texCoords[1] - ti->texCoords[0];
terVec2 lct = ti->texCoords[2] - ti->texCoords[0];

// Generate local space for the triangle plane
terVec3 localX = lb.Normalize2();
terVec3 localZ = lb.Cross(lc).Normalize2();
terVec3 localY = localX.Cross(localZ).Normalize2();

// Determine X/Y vectors in local space
float plbx = lb.DotProduct(localX);
terVec2 plc = terVec2(lc.DotProduct(localX), lc.DotProduct(localY));

terVec2 tsvS, tsvT;

tsvS[0] = lbt[0] / plbx;
tsvS[1] = (lct[0] - tsvS[0]*plc[0]) / plc[1];
tsvT[0] = lbt[1] / plbx;
tsvT[1] = (lct[1] - tsvT[0]*plc[0]) / plc[1];

ti->svec = (localX*tsvS[0] + localY*tsvS[1]).Normalize2();
ti->tvec = (localX*tsvT[0] + localY*tsvT[1]).Normalize2();

There's an additional special case to be aware of: Mirroring.

Mirroring across an edge can cause wild changes in a vector's direction, possibly even degenerating it. There isn't a clear-cut solution to these, but you can work around the problem by snapping the vector to the normal, effectively cancelling it out on the mirroring edge.

Personally, I check the angle between the two vectors, and if they're more than 90 degrees apart, I cancel them, otherwise I merge them.

by OneEightHundred ( at 2012-01-08 00:23


Valve's self-shadowing radiosity normal maps concept can be used with spherical harmonics in approximately the same way: Integrate a sphere based on how much light will affect a sample if incoming from numerous sample direction, accounting for collision with other samples due to elevation.

You can store this as three DXT1 textures, though you can improve quality by packing channels with similar spatial coherence. Coefficients 0, 2, and 6 in particular tend to pack well, since they're all dominated primarily by directions aimed perpendicular to the texture.

I use the following packing:
Texture 1: Coefs 0, 2, 6
Texture 2: Coefs 1, 4, 5
Texture 3: Coefs 3, 7, 8

You can reference an early post on this blog for code on how to rotate a SH vector by a matrix, in turn allowing you to get it into texture space. Once you've done that, simply multiply each SH coefficient from the self-shadowing map by the SH coefficients created from your light source (also covered on the previous post) and add together.

by OneEightHundred ( at 2011-12-07 18:39


Spherical harmonics seems to have some impenetrable level of difficulty, especially among the indie scene which has little to go off of other than a few presentations and whitepapers, some of which even contain incorrect information (i.e. one of the formulas in the Sony paper on the topic is incorrect), and most of which are still using ZYZ rotations because it's so hard to find how to do a matrix rotation.

Hao Chen and Xinguo Liu did a presentation at SIGGRAPH '08 and the slides from it contain a good deal of useful stuff, nevermind one of the ONLY easy-to-find rotate-by-matrix functions. It also treats the Z axis a bit awkwardly, so I patched the rotation code up a bit, and a pre-integrated cosine convolution filter so you can easily get SH coefs for directional light.

There was also gratuitous use of sqrt(3) multipliers, which can be completely eliminated by simply premultiplying or predividing coef #6 by it, which incidentally causes all of the constants and multipliers to resolve to rational numbers.

As always, you can include multiple lights by simply adding the SH coefs for them together. If you want specular, you can approximate a directional light by using the linear component to determine the direction, and constant component to determine the color. You can do this per-channel, or use the average values to determine the direction and do it once.

Here are the spoilers:

#define SH_AMBIENT_FACTOR   (0.25f)
#define SH_LINEAR_FACTOR (0.5f)
#define SH_QUADRATIC_FACTOR (0.3125f)

void LambertDiffuseToSHCoefs(const terVec3 &dir, float out[9])
// Constant
out[0] = 1.0f * SH_AMBIENT_FACTOR;

// Linear
out[1] = dir[1] * SH_LINEAR_FACTOR;
out[2] = dir[2] * SH_LINEAR_FACTOR;
out[3] = dir[0] * SH_LINEAR_FACTOR;

// Quadratics
out[4] = ( dir[0]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;
out[5] = ( dir[1]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;
out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * SH_QUADRATIC_FACTOR;
out[7] = ( dir[0]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;
out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;

void RotateCoefsByMatrix(float outCoefs[9], const float pIn[9], const terMat3x3 &rMat)
// DC
outCoefs[0] = pIn[0];

// Linear
outCoefs[1] = rMat[1][0]*pIn[3] + rMat[1][1]*pIn[1] + rMat[1][2]*pIn[2];
outCoefs[2] = rMat[2][0]*pIn[3] + rMat[2][1]*pIn[1] + rMat[2][2]*pIn[2];
outCoefs[3] = rMat[0][0]*pIn[3] + rMat[0][1]*pIn[1] + rMat[0][2]*pIn[2];

// Quadratics
outCoefs[4] = (
( rMat[0][0]*rMat[1][1] + rMat[0][1]*rMat[1][0] ) * ( pIn[4] )
+ ( rMat[0][1]*rMat[1][2] + rMat[0][2]*rMat[1][1] ) * ( pIn[5] )
+ ( rMat[0][2]*rMat[1][0] + rMat[0][0]*rMat[1][2] ) * ( pIn[7] )
+ ( rMat[0][0]*rMat[1][0] ) * ( pIn[8] )
+ ( rMat[0][1]*rMat[1][1] ) * ( -pIn[8] )
+ ( rMat[0][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )

outCoefs[5] = (
( rMat[1][0]*rMat[2][1] + rMat[1][1]*rMat[2][0] ) * ( pIn[4] )
+ ( rMat[1][1]*rMat[2][2] + rMat[1][2]*rMat[2][1] ) * ( pIn[5] )
+ ( rMat[1][2]*rMat[2][0] + rMat[1][0]*rMat[2][2] ) * ( pIn[7] )
+ ( rMat[1][0]*rMat[2][0] ) * ( pIn[8] )
+ ( rMat[1][1]*rMat[2][1] ) * ( -pIn[8] )
+ ( rMat[1][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )

outCoefs[6] = (
( rMat[2][1]*rMat[2][0] ) * ( pIn[4] )
+ ( rMat[2][2]*rMat[2][1] ) * ( pIn[5] )
+ ( rMat[2][0]*rMat[2][2] ) * ( pIn[7] )
+ 0.5f*( rMat[2][0]*rMat[2][0] ) * ( pIn[8])
+ 0.5f*( rMat[2][1]*rMat[2][1] ) * ( -pIn[8])
+ 1.5f*( rMat[2][2]*rMat[2][2] ) * ( pIn[6] )
- 0.5f * ( pIn[6] )

outCoefs[7] = (
( rMat[0][0]*rMat[2][1] + rMat[0][1]*rMat[2][0] ) * ( pIn[4] )
+ ( rMat[0][1]*rMat[2][2] + rMat[0][2]*rMat[2][1] ) * ( pIn[5] )
+ ( rMat[0][2]*rMat[2][0] + rMat[0][0]*rMat[2][2] ) * ( pIn[7] )
+ ( rMat[0][0]*rMat[2][0] ) * ( pIn[8] )
+ ( rMat[0][1]*rMat[2][1] ) * ( -pIn[8] )
+ ( rMat[0][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )

outCoefs[8] = (
( rMat[0][1]*rMat[0][0] - rMat[1][1]*rMat[1][0] ) * ( pIn[4] )
+ ( rMat[0][2]*rMat[0][1] - rMat[1][2]*rMat[1][1] ) * ( pIn[5] )
+ ( rMat[0][0]*rMat[0][2] - rMat[1][0]*rMat[1][2] ) * ( pIn[7] )
+0.5f*( rMat[0][0]*rMat[0][0] - rMat[1][0]*rMat[1][0] ) * ( pIn[8] )
+0.5f*( rMat[0][1]*rMat[0][1] - rMat[1][1]*rMat[1][1] ) * ( -pIn[8] )
+0.5f*( rMat[0][2]*rMat[0][2] - rMat[1][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )

... and to sample it in the shader ...

float3 SampleSHQuadratic(float3 dir, float3 shVector[9])
float3 ds1 =*;
float3 ds2 = dir*dir.yzx; // xy, zy, xz

float3 v = shVector[0];

v += dir.y * shVector[1];
v += dir.z * shVector[2];
v += dir.x * shVector[3];

v += ds2.x * shVector[4];
v += ds2.y * shVector[5];
v += (ds1.z * 1.5 - 0.5) * shVector[6];
v += ds2.z * shVector[7];
v += (ds1.x - ds1.y) * 0.5 * shVector[8];

return v;

For Monte Carlo integration, take sampling points, feed direction "dir" to the following function to get multipliers for each coefficient, then multiply by the intensity in that direction. Divide the total by the number of sampling points:

void SHForDirection(const terVec3 &dir, float out[9])
// Constant
out[0] = 1.0f;

// Linear
out[1] = dir[1] * 3.0f;
out[2] = dir[2] * 3.0f;
out[3] = dir[0] * 3.0f;

// Quadratics
out[4] = ( dir[0]*dir[1] ) * 15.0f;
out[5] = ( dir[1]*dir[2] ) * 15.0f;
out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * 5.0f;
out[7] = ( dir[0]*dir[2] ) * 15.0f;
out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 15.0f;

... and finally, for a uniformly-distributed random point on a sphere ...

terVec3 RandomDirection(int (*randomFunc)(), int randMax)
float u = (((float)randomFunc()) / (float)(randMax - 1))*2.0f - 1.0f;
float n = sqrtf(1.0f - u*u);

float theta = 2.0f * M_PI * (((float)randomFunc()) / (float)(randMax));

return terVec3(n * cos(theta), n * sin(theta), u);

by OneEightHundred ( at 2011-12-02 12:22


Fresh install on OS X of ColdFusion Bulder 2 (TWO, the SECOND one). Typing a simple conditional, this is what I was given:

I also had to manually write the closing cfif tag. It's such a joke.

The absolute core purpose of an IDE is to be a text editor. Secondary to that are other features that are supposed to make you work better. ColdFusion Builder 2 (TWO!!!!!) completely fails on all levels as a text editor. It doesn't even function as well as notepad.exe!

Text search is finicky, Find & Replace is completely broken half the time, the UI is often unresponsive (yay Eclipse), the text cursor sometimes disappears, double-clicking folders or files in an FTP view pops up the Rename dialog every time, HTML / CF tag completion usually doesn't happen, indention is broken, function parameter tooltips obscure the place you are typing, # and " completion randomly breaks (often leaving you with a ###)...the list goes on and on.

Adobe has a big feature list on their site. I'm thinking maybe they should go back and use some resources to fix the parts where you type things into the computer, you know, the whole point of the thing.

by Ted ( at 2011-12-01 15:14


Has it really been a year since the last update?

Well, things have been chugging along with less discovery and more actual work. However, development on TDP is largely on hold due to the likely impending release of the Doom 3 source code, which has numerous architectural improvements like rigid-body physics and much better customization of entity networking.

In the meantime, however, a component of TDP has been spun off into its own project: The RDX extension language. Initially planned as a resource manager, it has evolved into a full-fledged programmability API. The main goal was to have a runtime with very straightforward integration, to the point that you can easily use it for managing your C++ resources, but also to be much higher performance than dynamically-typed interpreted languages, especially when dealing with complex data types such as float vectors.

Features are still being implemented, but the compiler seems to be stable and load-time conversion to native x86 code is functional. Expect a real release in a month or two.

The project now has a home on Google Code.

by OneEightHundred ( at 2011-10-19 01:37



You'll recall some improvements I proposed to the YCoCg DXT5 algorithm a while back.

There's another realization of it I made recently: As a YUV-style color space, the Co and Cg channels are constrained to a range that's directly proportional to the Y channel. The addition of the scalar blue channel was mainly introduced to deal with resolution issues that caused banding artifacts on colored objects changing value, but the entire issue there can be sidestepped by simply using the Y channel as a multiplier for the Co and Cg channels, causing them to only respect tone and saturation while the Y channel becomes fully responsible for intensity.

This is not a quality improvement, in fact it nearly doubles PSNR in testing. However, it does result in considerable simplification of the algorithm, both on the encode and decode sides, and the perceptual loss compared to the old algorithm is very minimal.

This also simplifies the algorithm considerably:

int iY = px[0] + 2*px[1] + px[2]; // 0..1020
int iCo, iCg;

if (iY == 0)
iCo = 0;
iCg = 0;
iCo = (px[0] + px[1]) * 255 / iY;
iCg = (px[1] * 2) * 255 / iY;

px[0] = (unsigned char)iCo;
px[1] = (unsigned char)iCg;
px[2] = 0;
px[3] = (unsigned char)((iY + 2) / 4);

... And to decode:

float3 DecodeYCoCgRel(float4 inColor)
return (float3(4.0, 0.0, -4.0) * inColor.r
+ float3(-2.0, 2.0, -2.0) * inColor.g
+ float3(0.0, 0.0, 4.0)) * inColor.a;

While this does the job with much less perceptual loss than DXT1, and eliminates banding artifacts almost entirely, it is not quite as precise as the old algorithm, so using that is recommended if you need the quality.

by OneEightHundred ( at 2010-10-11 03:21

A few years back there was a publication on real-time YCoCg DXT5 texture compression. There are two improvements on the technique I feel I should present:

There's a pretty clear problem right off the bat: It's not particularly friendly to linear textures. If you simply attempt to convert sRGB values into linear space and store the result in YCoCg, you will experience severe banding owing largely to the loss of precision at lower values. Gamma space provides a lot of precision at lower intensity values where the human visual system is more sensitive.

sRGB texture modes exist as a method to cheaply convert from gamma space to linear, and are pretty fast since GPUs can just use a look-up table to get the linear values, but YCoCg can't be treated as an sRGB texture and doing sRGB decodes in the shader is fairly slow since it involves a divide, power raise, and conditional.

This can be resolved first by simply converting from a 2.2-ish sRGB gamma ramp to a 2.0 gamma ramp, which preserves most of the original gamut: 255 input values map to 240 output values, low intensity values maintain most of their precision, and they can be linearized by simply squaring the result in the shader.

Another concern, which isn't really one if you're aiming for speed and doing things real-time, but is if you're considering using such a technique for offline processing, is the limited scale factor. DXT5 provides enough resolution for 32 possible scale factor values, so there isn't any reason to limit it to 1, 2, or 4 if you don't have to. Using the full range gives you more color resolution to work with.

Here's some sample code:

unsigned char Linearize(unsigned char inByte)
float srgbVal = ((float)inByte) / 255.0f;
float linearVal;

if(srgbVal 0.04045)
linearVal = srgbVal / 12.92f;
linearVal = pow( (srgbVal + 0.055f) / 1.055f, 2.4f);

return (unsigned char)(floor(sqrt(linearVal)* 255.0 + 0.5));

void ConvertBlockToYCoCg(const unsigned char inPixels[16*3], unsigned char outPixels[16*4])
unsigned char linearizedPixels[16*3]; // Convert to linear values

for(int i=0;i16*3;i++)
linearizedPixels[i] = Linearize(inPixels[i]);

// Calculate Co and Cg extents
int extents = 0;
int n = 0;
int iY, iCo, iCg;
int blockCo[16];
int blockCg[16];
const unsigned char *px = linearizedPixels;
for(int i=0;i16;i++)
iCo = (px[0]1) - (px[2]1);
iCg = (px[1]1) - px[0] - px[2];
if(-iCo > extents) extents = -iCo;
if( iCo > extents) extents = iCo;
if(-iCg > extents) extents = -iCg;
if( iCg > extents) extents = iCg;

blockCo[n] = iCo;
blockCg[n++] = iCg;

px += 3;

// Co = -510..510
// Cg = -510..510
float scaleFactor = 1.0f;
if(extents > 127)
scaleFactor = (float)extents * 4.0f / 510.0f;

// Convert to quantized scalefactor
unsigned char scaleFactorQuantized = (unsigned char)(ceil((scaleFactor - 1.0f) * 31.0f / 3.0f));

// Unquantize
scaleFactor = 1.0f + (float)(scaleFactorQuantized / 31.0f) * 3.0f;

unsigned char bVal = (unsigned char)((scaleFactorQuantized 3) | (scaleFactorQuantized >> 2));

unsigned char *outPx = outPixels;

n = 0;
px = linearizedPixels;
// Calculate components
iY = ( px[0] + (px[1]1) + px[2] + 2 ) / 4;
iCo = ((blockCo[n] / scaleFactor) + 128);
iCg = ((blockCg[n] / scaleFactor) + 128);

if(iCo 0) iCo = 0; else if(iCo > 255) iCo = 255;
if(iCg 0) iCg = 0; else if(iCg > 255) iCg = 255;
if(iY 0) iY = 0; else if(iY > 255) iY = 255;

px += 3;

outPx[0] = (unsigned char)iCo;
outPx[1] = (unsigned char)iCg;
outPx[2] = bVal;
outPx[3] = (unsigned char)iY;

outPx += 4;

.... And to decode it in the shader ...

float3 DecodeYCoCg(float4 inColor)
float3 base = inColor.arg + float3(0, -0.5, -0.5);
float scale = (inColor.b*0.75 + 0.25);
float4 multipliers = float4(1.0, 0.0, scale, -scale);
float3 result;

result.r = dot(base, multipliers.xzw);
result.g = dot(base,;
result.b = dot(base, multipliers.xww);

// Convert from 2.0 gamma to linear
return result*result;

by OneEightHundred ( at 2010-10-11 01:32


This article is hilarious. It sounds like a perfectly normal business-y article until to you get to this gem:
The barrier to entry on the Instant concept is apparently low, and Yahoo and Microsoft's Bing have both tested the waters, according to a report in Search Engine Land.
(emphasis mine)

So apparently Dawn Kawamoto, "Technology Reporter" for Daily Finance, thinks the barrier to entry to searching the entire internet instantly is low.

I don't even know what to say.

by Blake Householder ( at 2010-09-12 19:44


I'm interested in advertising Kittyball to help promote it to a broader audience than "people who search kitty in the App Store and scroll waaay down", so I was looking to spend a few hundred dollars on ads. I happened to see an ad on Gamasutra for GAO, so I clicked it. Here's what I got:

Admire the graphs! Gaze in awe at the pile of logos! Marvel at screenshots of tables! Apply for GAO advertiser account!


Why should I apply if I have no idea what I'll get?

So I sent GAO this email with their "contact us" form:
RE: GAO: your landing page sucks :(

I clicked an ad banner for your site from Gamasutra ( )
and *nothing* on the landing page tells me why I should do business with
you. What will it cost me? What benefits will I get? Why are you better
than your competitors? I have no idea!

I see that you've got some reach, but I have no frame of reference for that
so I don't care.
You've got some clients, but they're not me, so I don't care.
You've got "cutting edge functionality" but I don't care.
I can apply for an account, but why?
and they helpfully replied with:
Good day,

We are pleased to have confirmation that our landing page only appeals to
people who care.

Best Wishes,

Valera Koltsov
Game Advertising Online
Thanks guys! Guess I'll take my money elsewhere!

A good landing page should directly tell the viewer what benefits they will receive. A good landing page answers the question of "why should I give you my money?"

by Blake Householder ( at 2010-09-11 19:04