sn.printf.net

2023-05-27

In every introduction to a potential client, partner, or other associate, the first thing I do is give a brief overview of my history. I know this is common in just about any business or social interaction, but it’s especially important in my line of work, since communicating my curriculum vitae is so critical to …

2023-05-27 11:30

I usually like to keep my posts more positive-focused: here’s what you should do vs. here’s what you shouldn’t do. But this week alone I had three potential clients relay to me a very common experience: I thought I finally found a good developer, but then they suddenly just disappeared! I’ll admit it’s hard …

2023-05-27 11:30

One of the advantages (if you can call it that) of being in this industry as long as I have is that I’ve been through multiple economic disasters, technological paradigm shifts, and — it’s not all bad! — economic and technological boom periods. So I figured I might as well throw out a few predictions …

2023-05-27 11:30

I’ve had an increasing number of conversations over the past few weeks about both React Native and Firebase — mostly with non-technical founders who have been advised by those they trust that, before they embark on their app development, they should choose one (or both) of these technologies at the core of their stack. I …

2023-05-27 11:30

Big things start with a simple idea. But what I want to talk about today is how that simple idea grows bigger successfully. Specifically, I want to look at the very first step I believe all entrepreneurs should take before setting out to turn their grand vision into reality! The vast majority of projects I …

2023-05-27 11:30

For a lot of non-technical founders setting out to build their product, it can be hard to understand just how much work needs to be done before you write a single line of code! In this post, I will break down much of the groundwork that needs to be done by your tech team …

2023-05-27 11:30

Hands down the most terrifying phase of building a new tech startup is launch. So many pieces of a very complex puzzle have to come together at exactly the same time in exactly the right way for things to go right. If you’ve never gone through it, you may be wondering what on Earth I’m …

2023-05-27 11:30

The March update of Graphite Comics just went live. It’s one of those updates that is very significant, with major changes under the hood — but which users will almost certainly not notice at all. This makes for an interesting opportunity to talk about balancing needs in software development. In this case, we are looking …

2023-05-27 11:30

I’ve updated my social networking demo app Scrawl to include a few often-requested features: Universal links, email verification, and password reset. In addition, I’ve integrated Mailgun into the backend application in order to allow the server to send email messages to users (which is obviously a requirement for two of these features). If …

2023-05-27 11:30

In this series of posts, I want to discuss several myths, misconceptions, and misunderstandings that threaten to derail inexperienced or non-technical founders of tech startups. Note: this post also appears on Glowdot with permission. In the beginning stages of planning an app development, there are two ubiquitous questions that get asked, and they are the …

2023-05-27 11:30

2022-11-15

After reviewing the code for the simple YAML parser I wrote, I decided it was getting a little messy, so before continuing, I decided to refactor it a little bit.

The simplest thing to do was to separate the serialisation and the deserialisation into separate classes, and simply call those from the existing methods within the YamlConvert class. This approach tends to be what other JSON and YAML libraries do, with added functionality such as being able to control aspects of the serialisation/deserialisation process for specific types.

I currently don’t need, or want, to do that, as I’m taking a much more brute force approach - however it is something to consider for a future refactor. Maybe.

I ended up with the following for the YamlConvert:

public static class YamlConvert
{
    private static YamlSerialiser Serialiser;
    private static YamlDeserialiser Deserialiser;
    
    static YamlConvert()
    {
        Serialiser = new YamlSerialiser();
        Deserialiser = new YamlDeserialiser();
    }
    
    public static string Serialise(YamlHeader header)
    {
        return Serialiser.Serialise(header);
    }

    public static YamlHeader Deserialise(string filePath)
    {
        if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);

        var content = File.ReadAllLines(filePath);

        return Deserialise(content);
    }

    public static YamlHeader Deserialise(string[] rawHeader)
    {
        return Deserialiser.Deserialise(rawHeader);
    }
}

It works quite well, as it did before, and looks a lot better. There is no dependency configuration to worry about because, as I mentioned above, I’m not worried about swapping out the serialisation/deserialisation process at any time.

2022-11-15 00:00

2022-07-30

Previously we left off with a method which could parse the YAML header in one of our markdown files, and it was collecting each line between the --- header markers for further processing.

One of the main requirements for the overall BlogHelper9000 utility is to be able to standardise the YAML headers in each source markdown file for a post. Some of the posts had a mix of different tags, that were essentially doing the same thing, so one of the aims is to be able to collect those, and transform the values into the correct tags.

In order to achieve this, we can specify a collection of the valid header properties up front, and also a collection of the ‘other’ properties that we find, which we can hold until later in the process, when we’ve written the code to handle those properties. The YamlHeader class has already been defined, and we can use a little reflection to load that class up and pick the properties out.

private static Dictionary<string, object?> GetYamlHeaderProperties(YamlHeader? header = null)
{
    var yamlHeader = header ?? new YamlHeader();
    return yamlHeader.GetType()
        .GetProperties(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.GetCustomAttribute<YamlIgnoreAttribute>() is null)
        .ToDictionary(p =>
        {
            var attr = p.GetCustomAttribute<YamlNameAttribute>();

            return attr is not null ? attr.Name.ToLower() : p.Name.ToLower();
        }, p => p.GetValue(yamlHeader, null));
}

We need to be careful not to collect properties that exist on YamlHeader purely for our own processing and are not part of the YAML header in the markdown files, such as the one holding the ‘extra’ properties that we’ll need to match up with their valid counterparts in a later step. That is what the custom YamlIgnoreAttribute is for: it ensures we drop properties that we don’t care about. We also need to match C# property names to the actual YAML header names, so we have the YamlNameAttribute to handle this.

Then we just need a way of parsing the individual lines and pulling the header name and the value out.

(string property, string value) ParseHeaderTag(string tag)
{
    tag = tag.Trim();
    var index = tag.IndexOf(':');
    var property = tag.Substring(0, index);
    var value = tag.Substring(index+1).Trim();
    return (property, value);
}

Here we just return a simple tuple after doing some simple substring manipulation, which is greatly helped by the header and its value always being separated by ‘:’.
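Because only the first ‘:’ is used as the split point, values that contain colons themselves (URLs, quoted titles) come through intact. The version above does assume every line has a colon, though: IndexOf returns -1 otherwise and Substring throws. Below is a minimal sketch of a more defensive variant. The name TryParseHeaderTag and the decision to report failure rather than throw are assumptions for illustration, not something taken from BlogHelper9000.

```csharp
using System;

// Hypothetical defensive variant of ParseHeaderTag: returns false instead of
// throwing when the line has no ':' separator (for example, a blank line).
static bool TryParseHeaderTag(string tag, out string property, out string value)
{
    property = string.Empty;
    value = string.Empty;

    tag = tag.Trim();
    var index = tag.IndexOf(':');
    if (index <= 0) return false; // no separator, or nothing before it

    property = tag.Substring(0, index).Trim();
    value = tag.Substring(index + 1).Trim();
    return true;
}

// Only the first ':' splits the line, so the colon in the URL is preserved.
if (TryParseHeaderTag("featured_image: https://example.com/a.jpg", out var name, out var url))
{
    Console.WriteLine($"{name} => {url}");
}

// A line without a separator is reported as unparseable rather than throwing.
Console.WriteLine(TryParseHeaderTag("not a header line", out _, out _));
```

Whether a malformed line should be skipped, logged, or treated as an error is a design choice; the sketch only makes that choice visible to the caller.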

Then if we put all that together we can start to parse the header properties.

private static YamlHeader ParseYamlHeader(IEnumerable<string> yamlHeader)
{
    var parsedHeaderProperties = new Dictionary<string, object>();
    var extraHeaderProperties = new Dictionary<string, string>();
    var headerProperties = GetYamlHeaderProperties();

    foreach (var line in yamlHeader)
    {
        var propertyValue = ParseHeaderTag(line);

        if (headerProperties.ContainsKey(propertyValue.property))
        {
            parsedHeaderProperties.Add(propertyValue.property, propertyValue.value);
        }
        else
        {
            extraHeaderProperties.Add(propertyValue.property, propertyValue.value);
        }
    }

    return ToYamlHeader(parsedHeaderProperties, extraHeaderProperties);

All we need to do is set up some dictionaries to hold the header properties, get the dictionary of valid header properties, and then loop through each line, parsing the header tag and checking whether the property is a ‘valid’ one that we definitely know we want to keep, or one we need to hold for further processing. You’ll notice that the above code is missing an end brace: this is deliberate, because the ParseHeaderTag method and ToYamlHeader method are both nested methods.
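Two details of that loop are worth a hedge. The keys produced by GetYamlHeaderProperties are lower-cased, whereas the keys parsed from the file are used exactly as written, and Dictionary.Add throws an ArgumentException if a header repeats a key. The sketch below shows one way to tolerate both, using case-insensitive dictionaries and the indexer so that the last value wins. The sample keys and header lines are made up for illustration, and "last value wins" is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical set of valid keys; the real list comes from GetYamlHeaderProperties.
var validKeys = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "layout", "title", "tags" };

// Case-insensitive dictionaries, so "Layout" and "layout" are the same key.
var parsed = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var extra = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

// Made-up header lines: mixed casing, an unknown key, and a repeated key.
var headerLines = new[] { "Layout: post", "title: First", "categories: misc", "title: Second" };

foreach (var line in headerLines)
{
    var index = line.IndexOf(':');
    var key = line.Substring(0, index).Trim();
    var value = line.Substring(index + 1).Trim();

    var target = validKeys.Contains(key) ? parsed : extra;
    target[key] = value; // indexer assignment: overwrites instead of throwing on duplicates
}

Console.WriteLine($"valid: {parsed.Count}, extra: {extra.Count}, title: {parsed["title"]}");
```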

Reading through the code to write this post has made me realise that we can do some refactoring to make this look a little nicer.

So we’ll look at that next.

2022-07-30 00:00

2022-07-22

The next thing to do to get BlogHelper9000 functional is to write a command which provides some information about the posts in the blog. I want to know:

  • How many published posts there are
  • How many drafts there are
  • A short list of recent posts
  • How long it’s been since a post was published

I also know that I want to introduce a command which will allow me to fix the metadata in the posts, which is a little messy. I’ve been inconsistently blogging since 2007, originally starting off on a self-hosted Python blog whose name I’ve forgotten, before migrating to WordPress, and then to a short-lived .net static site generator, before switching over to Jekyll.

Obviously, Markdown powered blogs like Jekyll have to provide non-markdown metadata in each post, and for Jekyll (and most markdown powered blogs) that means: YAML.

Parse that YAML

There are a couple of options when it comes to parsing YAML. One would be to use YamlDotNet, which is a stable library that conforms with v1.1 and v1.2 of the YAML specifications.

But where is the fun in that?

I’ve defined a POCO called YamlHeader which I’m going to use as the in-memory object to represent the YAML metadata header at the top of a markdown file.

If we take a leaf from different JSON converters, we can define a YamlConvert class like this:

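public static class YamlConvert
{
    public static string Serialise(YamlHeader header)
    {
        throw new NotImplementedException(); // placeholder so the skeleton compiles
    }

    public static YamlHeader Deserialise(string filePath)
    {
        throw new NotImplementedException(); // implemented further down
    }
}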

With this, we can easily serialise a YamlHeader into a string, and deserialise a file into a YamlHeader.

Deserialise

Deserialising is the slightly more complicated of the two, so let’s start with that.

Our first unit test looks like this:

    [Fact]
    public void Should_Deserialise_YamlHeader()
    {
        var yaml = @"---
layout: post
title: 'Dynamic port assignment in Octopus Deploy'
tags: ['build tools', 'octopus deploy']
featured_image: /assets/images/posts/2020/artem-sapegin-b18TRXc8UPQ-unsplash.jpg
featured: false
hidden: false
---
post content that's not parsed";
        
        var yamlObject = YamlConvert.Deserialise(yaml.Split(Environment.NewLine));

        yamlObject.Layout.Should().Be("post");
        yamlObject.Tags.Should().NotBeEmpty();
    }

This immediately requires us to add an overload for Deserialise to the YamlConvert class, which takes a string[]. This means our implementation for the first Deserialise method is simply:

public static YamlHeader Deserialise(string filePath)
{
    if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);

    var content = File.ReadAllLines(filePath);

    return Deserialise(content);
}
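One caveat with the unit test above: the line endings inside a verbatim string literal are whatever the source file was saved with, which is not necessarily Environment.NewLine on the machine running the test. Given the cross-platform goal, a more tolerant split might look like the sketch below. SplitLines is a hypothetical helper name, not part of the project.

```csharp
using System;

// Hypothetical helper: split on both Windows (\r\n) and Unix (\n) line endings,
// regardless of the platform the code happens to be running on.
static string[] SplitLines(string text)
{
    return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
}

var windowsStyle = "---\r\nlayout: post\r\n---";
var unixStyle = "---\nlayout: post\n---";

// Both inputs yield the same three lines, with no stray '\r' left behind.
Console.WriteLine(SplitLines(windowsStyle).Length);
Console.WriteLine(SplitLines(unixStyle).Length);
```

File.ReadAllLines already handles both conventions, so this only matters for in-memory strings such as the one in the test.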

Now we get into the fun part. And a big caveat: I’m not sure if this is the best way of doing this, but it works for me and that’s all I care about.

Anyway. A YAML header block is identified by a single line containing only ---, followed by n lines of YAML, and ends with another single line containing only ---. You can see this in the unit test above.

The algorithm I came up with goes like this:

For each line in lines:
  if line is '---' then
    if header start marker not found then
      header start marker found
      continue
    break loop
  store line
parse each line of found header

So in a nutshell, it loops through each line in the file, looks for the first --- to identify the start of the header, and then, until it hits another ---, gathers the lines for further processing.

Translated into C#, the code looks like this:

public static YamlHeader Deserialise(string[] fileContent)
{
    var headerStartMarkerFound = false;
    var yamlBlock = new List<string>();

    foreach (var line in fileContent)
    {
        if (line.Trim() == "---")
        {
            if (!headerStartMarkerFound)
            {
                headerStartMarkerFound = true;
                continue;
            }

            break;
        }

        yamlBlock.Add(line);
    }
        
    return ParseYamlHeader(yamlBlock);
}

This is fairly straightforward, and it isn’t where I think the problems with this approach actually are: those are all hidden behind ParseYamlHeader, which is worth a post on its own.
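One edge case in the loop is worth noting: as written, it also stores any lines that appear before the opening marker, and it stores every line when the file contains no markers at all. If that is not wanted, a variant that only collects lines once the opening --- has been seen could look like this sketch. ExtractHeaderLines is a hypothetical name, and returning an empty list for a file without a header is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the same loop, but lines are only collected between the two markers.
static List<string> ExtractHeaderLines(IEnumerable<string> fileContent)
{
    var insideHeader = false;
    var yamlBlock = new List<string>();

    foreach (var line in fileContent)
    {
        if (line.Trim() == "---")
        {
            if (insideHeader) break; // closing marker: stop reading
            insideHeader = true;     // opening marker: start collecting
            continue;
        }

        if (insideHeader) yamlBlock.Add(line);
    }

    return yamlBlock;
}

var lines = new[] { "stray text", "---", "layout: post", "title: Hello", "---", "body" };
Console.WriteLine(string.Join(" | ", ExtractHeaderLines(lines)));
```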

2022-07-22 00:00

2022-07-14

In the introductory post to this series, I ended with issuing a command to initialise a new console project, BlogHelper9000. It doesn’t matter how you create your project, be it from Visual Studio, Rider or the terminal, the end result is the same, as the templates are all the same.

With the new .net 6 templates, the resulting Program.cs is somewhat sparse: if you discount the single comment, all you get in the file is a Console.WriteLine("Hello, World!");, thanks to all the new wizardry in the latest versions of the language and the framework.

Thanks to this newfangled sorcery, the app still has a static main method, you just don’t need to see it, and as such, the args string array is still there. For very simple applications, this is all you really need to do. However, once you get past a few commands, with a few optional flags, things can get complicated, fast. This can turn into a maintenance headache.

In the past I’ve written my own command line parsing abstractions, I’ve used Mono.Options and other libraries, and I think I’ve finally settled on Oakton as my go-to library for quickly and easily adding command line parsing to a console application. It’s intuitive, easy to use and easy to maintain. This means you can easily introduce it into a team environment and have everyone understand it immediately.

Setup Command loading

After following Oakton’s getting started documentation, you can see how easy it is to get going with a basic implementation. I recommend setting things up so that both synchronous and asynchronous commands can be executed. You achieve this with a small tweak to Program.cs, taking into account the top-level statements in .net 6, like this:

using System.Reflection;
using Oakton; // CommandExecutor lives in the Oakton namespace

var executor = CommandExecutor.For(_ =>{
    _.RegisterCommands(typeof(Program).GetTypeInfo().Assembly);
});

var result = await executor.ExecuteAsync(args);
return result;

In .net 5, or if you don’t like top-level statements and have a static int Main you can make it static Task<int> Main instead and return the executor.ExecuteAsync instead of awaiting it.

Base classes

In some console applications, different commands can have the same optional flags, and I like to put mine in a class called BaseInput. Because I know I’m going to have several commands in this application, I’m going to add some base classes so that the different commands can share some of the same functionality. I’ve also used this in the past to, for example, create a database instance in the base class, which is then passed into each inheriting command. It’s also a good place to add some common argument/flag validation.

What I like to do is have an abstract base class, which inherits from the Oakton command, and add an abstract Run method to it, and usually a virtual bool ValidateInput too; these can then be overridden in our actual Command implementations and have a lot of nice functionality automated for us in a way that can be used across all Commands.

Some of the detail of these classes is elided to stop this from being a super long post; you can see all the details in the GitHub repo.

public abstract class BaseCommand<TInput> : OaktonCommand<TInput>
    where TInput : BaseInput
{
    public override bool Execute(TInput input)
    {
        return ValidateInput(input) && Run(input);
    }

    protected abstract bool Run(TInput input);

    protected virtual bool ValidateInput(TInput input)
    {
        /* ... */
    }
}

This ensures that all the Commands we implement can optionally decide to validate the inputs that they take in, simply by overriding ValidateInput.

The async version is exactly the same… except async:

public abstract class AsyncBaseCommand<TInput> : OaktonAsyncCommand<TInput>
    where TInput : BaseInput
{
    public override async Task<bool> Execute(TInput input)
    {
        // Both calls return Task<bool>, so each must be awaited before applying &&.
        return await ValidateInput(input) && await Run(input);
    }

    protected abstract Task<bool> Run(TInput input);

    protected virtual Task<bool> ValidateInput(TInput input)
    {
        /* ... */
    }
}

There is an additional class I’ve not yet shown, which adds some further reusable functionality between each base class, and that’s the BaseHelper class. I’ve got a pretty good idea that any commands I write for the app are going to operate on posts or post drafts, which in Jekyll are stored in _posts and _drafts respectively. Consequently, the commands need an easy way of having these paths to hand, so a little internal helper class is a good place to put this shared logic.

internal class BaseHelper<TInput> where TInput : BaseInput
{
    public string DraftsPath { get; }

    public string PostsPath { get;  }

    private BaseHelper(TInput input)
    {
        DraftsPath = Path.Combine(input.BaseDirectoryFlag, "_drafts");
        PostsPath = Path.Combine(input.BaseDirectoryFlag, "_posts");
    }

    public static BaseHelper<TInput> Initialise(TInput input)
    {
        return new BaseHelper<TInput>(input);
    }

    public bool ValidateInput(TInput input)
    {
        if (!Directory.Exists(DraftsPath))
        {
            ConsoleWriter.Write(ConsoleColor.Red, "Unable to find blog _drafts folder");
            return false;
        }

        if (!Directory.Exists(PostsPath))
        {
            ConsoleWriter.Write(ConsoleColor.Red, "Unable to find blog _posts folder");
            return false;
        }

        return true;
    }
}

This means that our base class implementations can now become:

private BaseHelper<TInput> _baseHelper = null!;
protected string DraftsPath => _baseHelper.DraftsPath;
protected string PostsPath => _baseHelper.PostsPath;

public override bool Execute(TInput input)
{
    _baseHelper = BaseHelper<TInput>.Initialise(input);
    return ValidateInput(input) && Run(input);
}

protected virtual bool ValidateInput(TInput input)
{
    return _baseHelper.ValidateInput(input);
}

Note the null!, where I am telling the compiler to ignore the fact that _baseHelper is being initialised to null, as I know better.

This allows each command implementation to hook into this method and validate itself automatically.

First Command

Now that we have some base classes to work with, we can start to write our first command. If you check the history in the repo, you’ll see this wasn’t the first command I actually wrote… but it probably should have been. In any case, it only serves to illustrate our first real command implementation.

public class InfoCommand : BaseCommand<BaseInput>
{
    public InfoCommand()
    {
        Usage("Info");
    }

    protected override bool Run(BaseInput input)
    {
        var posts = LoadPosts();
        var blogDetails = new Details();

        DeterminePostCount(posts, blogDetails);
        DetermineDraftsInfo(posts, blogDetails);
        DetermineRecentPosts(posts, blogDetails);
        DetermineDaysSinceLastPost(blogDetails);

        RenderDetails(blogDetails);

        return true;
    }

    /**...*/
}

LoadPosts is a method in the base class which is responsible for loading the posts into memory, so that we can process them and extract meaningful details about the posts. We store this information in a Details class, which is what we ultimately use to render the details to the console. You can see the details of these methods in the GitHub repository; however, they all boil down to simple LINQ queries.
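To give a flavour of what such queries might look like, here is a minimal sketch over made-up data. The tuple shape stands in for the real post model and the variables stand in for the Details class; both are assumptions for illustration, and the actual implementations are in the repository.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the loaded posts: (title, published date, is draft).
var posts = new[]
{
    (Title: "First post", Published: new DateTime(2022, 3, 4), IsDraft: false),
    (Title: "Second post", Published: new DateTime(2022, 7, 14), IsDraft: false),
    (Title: "Work in progress", Published: DateTime.MinValue, IsDraft: true),
};

var published = posts.Where(p => !p.IsDraft).ToList();

var postCount = published.Count;
var draftCount = posts.Count(p => p.IsDraft);

// Most recent posts first, limited to a short list.
var recentTitles = published
    .OrderByDescending(p => p.Published)
    .Take(5)
    .Select(p => p.Title)
    .ToList();

// A fixed reference date keeps the example reproducible; real code would use DateTime.Today.
var today = new DateTime(2022, 7, 22);
var daysSinceLastPost = (today - published.Max(p => p.Published)).Days;

Console.WriteLine($"{postCount} posts, {draftCount} drafts, {daysSinceLastPost} days since last post");
Console.WriteLine(string.Join(", ", recentTitles));
```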

Summary

In this post we’ve seen how to set up Oakton and configure a base class to extend the functionality and give us more flexibility, and an initial command. In subsequent posts, we’ll cover more commands and I’ll start to use the utility to tidy up metadata across all the posts in the blog and fix things like images for posts.

2022-07-14 00:00

2022-06-09

Normally you can’t broadly stop someone from being able to send you mail. However, there is a loophole.

You can file a PS Form 1500 and say that the advertisement you received from them made you horny. No questions asked prohibitory order.

🙅‍♂️🥵📫

by Factor Mystic at 2022-06-09 00:37

2022-03-11

I just had to set up my vimrc and vimfiles on a new laptop for work, and had some fun with Vim, mostly as it’s been years since I had to do it. I keep my vimfiles folder in my GitHub, so I can grab it wherever I need it.

To recap, one of the places that Vim will look for things is $HOME/vimfiles/vimrc, where $HOME is actually the same as %USERPROFILE%. In most corporate environments, the %USERPROFILE% is actually stored in a networked folder location, to enable roaming profile support and help when a user gets a new computer.

So you can put your vimfiles there, but, it’s a network folder - it’s slow to start an instance of Vim. Especially if you have a few plugins.

Instead, what you can do is to edit the _vimrc file in the Vim installation folder (usually in C:\Program Files (x86)\vim), delete the entire contents and replace it with:

set rtp+=C:\path\to\your\vimfiles
set viminfo+=nC:\path\to\your\vimfiles\or\whatever
source C:\path\to\your\vimfiles\vimrc

What this does is:

  1. Sets the runtime path to be the path to your vimfiles
  2. Tells vim where to store/update the viminfo file (which stores useful history state amongst other things)
  3. Sources your vimrc file and uses it

This post largely serves as a memory aid for myself, so that when I need to do this again in future I won’t spend longer than I need to googling how to do it, but I hope it helps someone else.

2022-03-11 00:00

2022-03-04

Recently I was inspired by @buhakmeh’s blog post, Supercharge Blogging With .NET and Ruby Frankenblog to write something similar, both as an exercise and excuse to blog about something, and as a way of tidying up the metadata on my existing blog posts and adding header images to old posts.

High level requirements

The initial high level requirements I want to support are:

  1. Cross-platform. This blog is jekyll based, and as such is written in markdown. Any tool I write for automation purposes should be cross-platform.
  2. Easily add posts from the command line, and have some default/initial yaml header metadata automatically added.
  3. See a high level overview of the current status of my blog. This should include things like the most recent post, how many days I’ve been lazy and not published a post, available drafts etc
  4. Publish posts from the command line, which should update the post with published status and add the published date to the yaml header and filename.
  5. Create a customised post header for each post on the blog, containing some kind of blog branding template and the post title, and update or add the appropriate yaml header metadata to each post. This idea also comes from another of @buhakmeh’s posts.
  6. The blog has many years of blog posts, spread across several different blogging platforms before settling on Jekyll. As such, some of the yaml metadata for each blog post is… not consistent. Some effort should go into correcting this.
  7. Automatically notify Twitter of published posts.

Next steps

The next series of posts will cover implementing the above requirements… not necessarily in that order. First I will go over setting up the project and configuring Oakton.

After that I will probably cover implementing fixes to the existing blog metadata, as I think that is going to be something that will be required in order for any sort of Info function to work properly, as all of the yaml metadata will need to be consistent.

Then I think I’ll tackle the image stuff, which should be fairly interesting, and should give a nice look to the existing posts, as having prominent images for posts is part of the theme for the blog, which I’ve not really taken full advantage of.

I’ll try to update this post with links to future posts, or else make it all a big series.

dotnet new console --name BlogHelper9000

2022-03-04 00:00

2022-01-11

At work, we have recently been porting our internal web framework into .net 6. Yes, we are late to the party on this, for reasons. Suffice it to say I currently work in an inherently risk averse industry.

Anyway, one part of the framework is responsible for getting reports from SSRS.

The way it did this is to use a wrapper class around a SOAP client generated from good old ReportService2005.asmx?wsdl, using our faithful friend svcutil.exe. The wrapper class used some TaskCompletionSource magic on the events in the client to make the client.LoadReportAsync and the other *Async methods actually async, as the generated client was not truly async.

Fast forward to the modern times, and we need to upgrade it. How do we do that?

Obviously, Microsoft are a step ahead: svcutil has a dotnet version - dotnet-svcutil. We can install it and get going:

dotnet tool install --global dotnet-svcutil

Once installed, we can call it against the endpoint:

Make sure you call this command in the root of the project where the service should go
dotnet-svcutil http://server/ReportServer/ReportService2005.asmx?wsdl

In our wrapper class, the initialisation of the client has to change slightly, because the generated client is different to the original svcutil implementation. Looking at the diff between the two files, it’s because the newer version of the client uses more modern .net functionality.

The wrapper class constructor has to be changed slightly:

public Wrapper(string url, NetworkCredential credentials)
{
    var binding = new BasicHttpBinding(BasicHttpSecurityMode.TransportCredentialOnly);
    binding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Ntlm;
    binding.MaxReceivedMessageSize = 10485760; // this is a 10mb limit
    var address = new EndpointAddress(url);

    _client = new ReportExecutionServiceSoapClient(binding, address);
    _client.ClientCredentials.Windows.AllowedImpersonationLevel = TokenImpersonationLevel.Impersonation;
    _client.ClientCredentials.Windows.ClientCredential = credentials;
}

Then, the code which actually generates the report can be updated to remove all of the TaskCompletionSource, which actually simplifies it a great deal:

public async Task<byte[]> RenderReport(string reportPath, string reportFormat, ParameterValue[] parameterValues)
{
    await _client.LoadReportAsync(null, reportPath, null);
    await _client.SetExecutionParametersAsync(null, null, parameterValues, "en-gb");
    var deviceInfo = @"<DeviceInfo><Toolbar>False</Toolbar></DeviceInfo>";
    var request = new RenderRequest(null, null, reportFormat, deviceInfo);
    var response = await _client.RenderAsync(request);
    return response.Result;
}

You can then do whatever you like with the byte[], like return it in an IActionResult or load it into a MemoryStream and write it to disk as the file.
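As a rough illustration of the second option, the sketch below copies the bytes through a MemoryStream into a file. SaveReportAsync, the file name, and the four placeholder bytes are all hypothetical; in real use the byte[] would come from RenderReport.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Sketch: persist rendered report bytes to disk via a MemoryStream.
static async Task SaveReportAsync(byte[] reportBytes, string path)
{
    using var source = new MemoryStream(reportBytes);
    await using var target = File.Create(path); // creates or overwrites the file
    await source.CopyToAsync(target);
}

var fakeReport = new byte[] { 0x25, 0x50, 0x44, 0x46 }; // stand-in for RenderReport output
var outputPath = Path.Combine(Path.GetTempPath(), "report-sketch.pdf");

await SaveReportAsync(fakeReport, outputPath);
Console.WriteLine($"Wrote {new FileInfo(outputPath).Length} bytes to {outputPath}");
```

If the MemoryStream is not needed for anything else, File.WriteAllBytesAsync(path, bytes) does the same job in a single call.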

Much of the detail of this post is sourced from various places around the web, but I’ve forgotten all of the places I gleaned the information from.

2022-01-11 00:00

2021-12-22

who is eating cereal anymore? Literally don’t think I’ve seen someone eat a bowl of cereal in twenty years

🥣🤔

by Factor Mystic at 2021-12-22 03:23

2021-10-26

Recently we realised that we had quite a few applications being deployed through Octopus Deploy, and that we had a number of Environments, and a number of Channels, and that managing the ports being used in Dev/QA/UAT across different servers/channels was becoming… problematic.

When looking at this problem, it’s immediately clear that you need some way of dynamically allocating a port number on each deployment. This blog post from Paul Stovell shows the way, using a custom Powershell build step.

As we’d lost track of which sites were using which ports, and as we also have ad-hoc websites in IIS that aren’t managed by Octopus Deploy, we thought that asking IIS “Hey, what ports are the sites you know about using?” might be a way forward. We also had the additional requirement that some of our servers might have arbitrary services using a port, so we might bump into a situation where a chosen port was already being used by a non-IIS application/website.

Researching the first situation, it’s quickly apparent that you can do this in Powershell, using the WebAdministration module. Based on the answers to this question on Stack Overflow, we came up with this:

Import-Module Webadministration

function Get-IIS-Used-Ports()
{
    $Websites = Get-ChildItem IIS:\Sites

    $ports = foreach($Site in $Websites)
    {
        $Binding = $Site.bindings
        [string]$BindingInfo = $Binding.Collection
        [string]$Port = $BindingInfo.SubString($BindingInfo.IndexOf(":")+1,$BindingInfo.LastIndexOf(":")-$BindingInfo.IndexOf(":")-1)

        $Port -as [int]
    }

    return $ports
}

To get the list of ports on a machine that are not being used is also fairly straightforward in Powershell:

function Get-Free-Ports()
{
    $availablePorts = 49000..65000   # range operator: every candidate port, not the result of a subtraction
    $usedPorts = @(Get-NetTCPConnection | Select -ExpandProperty LocalPort | Sort -Descending | Where { $_ -ge 49000})

    $unusedPorts = foreach($possiblePort in $availablePorts)
    {
        $unused = $possiblePort -notin $usedPorts
        if($unused)
        {
            $possiblePort
        }
    }

    return $unusedPorts
}

With those two functions in hand, you can work out what free ports are available to be used as the ‘next port’ on a server. It’s worth pointing out that if a site in IIS is stopped, then IIS won’t allow that port to be used in another website (in IIS), but the port also doesn’t show up as a used port in netstat -a, which is kind of what Get-NetTCPConnection does.

function Get-Next-Port()
{
    $iisUsedPorts = Get-IIS-Used-Ports
    $freePorts = Get-Free-Ports

    $port = $freePorts | Where-Object { $iisUsedPorts -notcontains $_} | Sort-Object | Select-Object -First 1

    Set-OctopusVariable -Name "Port" -Value "$port"
}

Then you just have to call it at the end of the script:

Get-Next-Port

You’d also want to have various Write-Host or other logging messages so that you get some useful output in the build step when you’re running it.

2021-10-26 00:00

2021-05-06

If you found this because you have a build server which is ‘offline’, without any external internet access because of reasons, and you can’t get your build to work because dotnet fails to restore the tool you require for your build process because of said lack of external internet access, then this is for you.

In hindsight, this may be obvious for most people, but it wasn’t for me, so here it is.

In this situation, you just need to shy away from local tools completely because, as of yet, I’ve been unable to find any way of telling dotnet not to try to restore them, and the restore fails on every build.

Instead, I’ve installed the tool(s) as a global tool, in a specific folder, e.g. C:\dotnet-tools, which I’ve then added to the system path on the server. You may need to restart the build server for it to pick up the changes to the environment variable.

One challenge that remains is how to ensure the dotnet tools are consistent on both the developer machine, and the build server. I leave that as an exercise for the reader.

2021-05-06 00:00

2021-04-01

I’m leaving this here so I can find it again easily.

We had a problem updating the Visual Studio 2019 Build Tools on a server, after updating an already existing offline layout.

I won’t go into that here, because it’s covered extensively on Microsoft’s Documentation website.

The installation kept failing, even when using --noweb. It turns out that when your server is completely cut off from the internet, as was the case here, you also need to pass --noUpdateInstaller.

This is because (so it would seem) even though --noweb correctly tells the installer to use the offline cache, it doesn’t prevent the installer from trying to update itself, which will obviously fail in a totally disconnected environment.

2021-04-01 00:00

2021-01-03

Since a technical breakdown of how Betsy does texture compression was posted, I wanted to lay out how the compressors in Convection Texture Tools (CVTT) work, as well as provide some context of what CVTT's objectives are in the first place to explain some of the technical decisions.

First off, while I am very happy with how CVTT has turned out, and while it's definitely a production-quality texture compressor, providing the best compressor possible for a production environment has not been its primary goal. Its primary goal is to experiment with compression techniques to improve the state of the art, particularly finding inexpensive ways to hit high quality targets.

A common theme that wound up manifesting in most of CVTT's design is that encoding decisions are either guided by informed decisions, i.e. models that relate to the problem being solved, or are exhaustive.  Very little of it is done by random or random-like searching. Much of what CVTT exists to experiment with is figuring out techniques which amount to making those informed decisions.

CVTT's ParallelMath module, and choice of C++

While there's some coincidence in CVTT having a similar philosophy to Intel's ISPC compressor, CVTT's SPMD-style design was actually motivated by it being built as a port of the skeleton of DirectXTex's HLSL BC7 compressor.

I chose to use C++ instead of ISPC for three main reasons:
  • It was easier to develop it in Visual Studio.
  • It was easier to do operations that didn't parallelize well.  This turned out to matter with the ETC compressor in particular.
  • I don't trust in ISPC's longevity.  In particular, I think it will be obsolete as soon as someone makes something that can target both CPU and GPU, like either a new language that can cross-compile, or SPIR-V-on-CPU.

Anyway, CVTT's ParallelMath module is kind of the foundation that everything else is built on.  Much of its design is motivated by SIMD instruction set quirks, and a desire to maintain compatibility with older instruction sets like SSE2 without sacrificing too much.

Part of that compatibility effort is that most of CVTT's ops use a UInt15 type.  The reason for UInt15 is to handle architectures (like SSE2!) that don't support unsigned compares, min, or max, which means performing those operations on a 16-bit number requires flipping the high bit on both operands.  For any number where we know the high bit is zero for both operands, that flip is unnecessary - and a huge number of operations in CVTT fit in 15 bits.
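To make the high-bit trick concrete, here is a small Python sketch that emulates a signed-only 16-bit compare. The helper names are mine, not CVTT's; the point is only that flipping bit 15 of both operands turns a signed compare into an unsigned one, and that values known to fit in 15 bits don't need the flip at all.

```python
def as_int16(x):
    """Reinterpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def signed_less(a, b):
    """What a signed-only 16-bit compare instruction would report."""
    return as_int16(a) < as_int16(b)


def unsigned_less_via_flip(a, b):
    """Unsigned compare built from the signed one by flipping bit 15."""
    return signed_less(a ^ 0x8000, b ^ 0x8000)


print(signed_less(40000, 100))             # wrong answer: 40000 wraps negative
print(unsigned_less_via_flip(40000, 100))  # agrees with 40000 < 100
print(signed_less(20000, 30000))           # 15-bit values need no flip
```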

The compare flag types are basically vector booleans, where either all bits are 1 or all bits are 0 for a given lane - There's one type for 16-bit ints, and one for 32-bit floats, and they have to be converted since they're different widths.  Those are combined with several utility functions, some of which, like SelectOrZero and NotConditionalSet, can elide a few operations.

The RoundForScope type is a nifty dual-use piece of code.  SSE rounding modes are determined by the CSR register, not per-op, so RoundForScope when targeting SSE will set the CSR, and then reset it in its destructor.  For other architectures, including the scalar target, the TYPE of the RoundForScope passed in is what determines the operation, so the same code works whether the rounding is per-op or per-scope.

While the ParallelMath architecture has been very resistant to bugs for the most part, where it has run into bugs, they've mostly been due to improper use of AnySet or AllSet - Cases where parallel code can behave improperly because lanes where the condition should exclude it are still executing, and need to be manually filtered out using conditionals.

BC1-7 common themes

All of the desktop formats that CVTT supports are based on interpolation.  S3TC RGB (a.k.a. DXT1) for instance defines two colors (called endpoints), then defines all pixels as being either one of those two colors, or a color that is part-way between those two colors, for each 4x4 block.  Most of the encoding effort is spent on determining what the two colors should be.
 
You can read about a lot of this on Simon Brown's post outlining the compression techniques used by Squish, one of the pioneering S3TC compressors, which in turn is the basis for the algorithm used by CVTT's BC1 compressor.

Principal component analysis

Principal component analysis determines, based on a set of points, what the main axis is that the colors are aligned along.  This gives us a very good guess of what the initial colors should be, simply using the colors that are the furthest along that axis, but it isn't necessarily ideal.
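As a rough illustration (not CVTT's actual code), the principal axis can be estimated with a few rounds of power iteration on the covariance matrix, and the initial endpoints are then the pixels that project furthest along it. The function names and the choice of starting vector are assumptions of this sketch.

```python
def principal_axis(points, iterations=32):
    """Return (mean, unit axis) of the dominant direction of the points."""
    n, dims = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - mean[d] for d in range(dims)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(dims)]
           for i in range(dims)]
    # Start from the longest covariance row, then repeatedly multiply by
    # the covariance matrix and renormalize (power iteration).
    axis = max(cov, key=lambda row: sum(v * v for v in row))[:]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dims)) for i in range(dims)]
        length = sum(v * v for v in nxt) ** 0.5
        if length == 0:
            break  # all points identical: there is no meaningful axis
        axis = [v / length for v in nxt]
    return mean, axis


def initial_endpoints(points):
    """Pick the two points that lie furthest apart along the principal axis."""
    mean, axis = principal_axis(points)
    proj = [sum((p[d] - mean[d]) * axis[d] for d in range(len(axis)))
            for p in points]
    return points[proj.index(min(proj))], points[proj.index(max(proj))]


pixels = [(0, 0, 0), (10, 20, 30), (20, 40, 60), (5, 10, 15)]
print(initial_endpoints(pixels))
```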

Endpoint refinement

In BC1 for instance, each color is assigned to one of four possible values along the color line.  CVTT solves for that by just finding the color with the shortest distance to each pixel's color.  If the color assignments are known, then it's possible to determine what the color values are that will minimize the sum of the square distance of that mapping.  One round of refinement usually yields slightly better results and is pretty cheap to check.  Two rounds will sometimes yield a slightly better result.
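Here is a minimal sketch of one refinement round, assuming each pixel has already been assigned an interpolation weight w (0 at one endpoint, 1 at the other). Minimizing the squared error of (1-w)*A + w*B against the pixel colors gives a 2x2 linear system per channel, solved below with Cramer's rule. The names and the weight ordering are illustrative and do not follow BC1's index encoding.

```python
def nearest_index(pixel, palette):
    """Index of the palette color with the smallest squared distance."""
    return min(range(len(palette)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])))


def refine_endpoints(pixels, weights):
    """Least-squares endpoints (A, B) for fixed per-pixel weights, or None."""
    aa = sum((1 - w) ** 2 for w in weights)
    ab = sum((1 - w) * w for w in weights)
    bb = sum(w * w for w in weights)
    det = aa * bb - ab * ab
    if det == 0:
        return None  # every pixel shares one weight: endpoints are ambiguous
    end_a, end_b = [], []
    for ch in range(len(pixels[0])):
        ra = sum((1 - w) * p[ch] for p, w in zip(pixels, weights))
        rb = sum(w * p[ch] for p, w in zip(pixels, weights))
        end_a.append((ra * bb - rb * ab) / det)
        end_b.append((rb * aa - ra * ab) / det)
    return end_a, end_b


WEIGHTS = (0.0, 1 / 3, 2 / 3, 1.0)
start_a, start_b = (0, 0, 0), (90, 60, 30)
palette = [tuple((1 - w) * x + w * y for x, y in zip(start_a, start_b))
           for w in WEIGHTS]
pixels = [(2, 1, 0), (31, 19, 11), (58, 41, 19), (91, 60, 29)]
indices = [nearest_index(p, palette) for p in pixels]
print(refine_endpoints(pixels, [WEIGHTS[i] for i in indices]))
```

A second round would simply re-run the index assignment against the palette built from the refined endpoints and solve again.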

Extrapolation

One problem with using the farthest extents of the principal axis as the color is that the color precision is reduced (quantized) by the format.  In BC1-3, the color is reduced to a 16-bit color with 5 bits of red, 6 bits of green, and 5 bits of blue.  It's frequently possible to achieve a more accurate match by using colors outside of the range so that the interpolated colors are closer to the actual image colors - This sacrifices some of the color range.

CVTT internally refers to these as "tweak factors" or similar, since what they functionally do is make adjustments to the color mapping to try finding a better result.

The number of extrapolation possibilities increases quadratically with the number of indexes.  CVTT will only ever try four possibilities: No insets, one inset on one end (which is two possibilities, one for each end), and one inset on both ends.

BC1 (DXT1)

CVTT's BC1 encoder uses the cluster fit technique developed by Simon Brown for Squish.  It uses the principal axis to determine an ordering of each of the 16 pixels along the color line, and then rather than computing the endpoints from the start and end points, it computes them by trying each possible count of pixels assigned to each endpoint that maintains the original order and still totals 16.  That's a fairly large set of possibilities with a lot of useless entries, but BC1 is fairly tight on bits, so it does take a lot of searching to maximize quality out of it.
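To give a feel for the size of that search, here is a small sketch that enumerates every way to split 16 ordered pixels into consecutive runs, one run per palette index. It assumes the four-value case described earlier; the function name is made up, and real implementations prune this set rather than visiting all of it.

```python
def ordered_splits(pixel_count=16, clusters=4):
    """Yield every tuple of per-cluster counts that sums to pixel_count."""
    def rec(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in rec(remaining - first, slots - 1):
                yield (first,) + rest
    return rec(pixel_count, clusters)


# Each tuple says how many of the sorted pixels go to index 0, 1, 2 and 3.
print(sum(1 for _ in ordered_splits()))
```

Because the pixel ordering is fixed by the principal axis, a split is fully described by the counts alone, which is what keeps the search exhaustive but bounded.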

BC2 (DXT3)

BC2 uses BC1 for RGB and 4bpp alpha.  There's not much to say here, since it just involves reducing the alpha precision.

BC3 (DXT5)

This one is actually a bit interesting.  DXT5 uses indexed alpha, where it defines two 8-bit alpha endpoints and a 3-bit interpolator per pixel, but it also has a mode where 2 of the interpolators are reserved 0 and 255 and only 6 are endpoint-to-endpoint values.  Most encoders will just use the min/max alpha.  CVTT will also try extrapolated endpoints, and will try for the second mode by assuming that any pixels within 1/10th of the endpoint range of 0 or 255 would be assigned to the reserved endpoints.  The reason for the 1/10th range is that the rounding range of the 6-value endpoints is 1/10th of the range, and it assumes that for any case where the endpoints would include values in that range, it would just use the 8-index mode and there'd be 6 indexes between them anyway.

BC4 and BC5

These two modes are functionally the same as BC3's alpha encoding, with the exception that the signed modes are offset by 128.  CVTT handles signed modes by pre-offsetting them and undoing the offset.

BC7

BC7 has 8 modes of operation and is the most complicated format to encode, but it's actually not terribly more complicated than BC1.  All of the modes do one of two things: They encode 1 to 3 pairs of endpoints that are assigned to specific groupings of pixels for all color channels, referred to as partitions, or they encode one set of endpoints for the entire block, except for one channel, which is encoded separately.
 
Here are the possible partitions:

Credit: Jon Rocatis from this post.

Another feature of BC7 is parity bits, where the low bit of each endpoint is specified by a single bit.  Parity bits (P-bits) exist as a way of getting a bit more endpoint precision when there aren't as many available bits as there are endpoint channels without causing the channels to have a different number of bits, something that caused problems with gray discoloration in BC1-3.
 
CVTT will by default just try every partition, and every P-bit combination.

Based on some follow-up work that I'm still experimenting with, a good quality trade-off would be to only check certain subsets.  Among the BC7 subsets, the vast majority of selected subsets fall into only about 16 of the possible ones, and omitting the rest causes very little quality loss.  I'll publish more about that when my next experiment is further along.

Weight-by-alpha issues

One weakness that CVTT's encoder has vs. Monte Carlo-style encoders is that principal component analysis does not work well for modes in BC7 where the alpha and some of the color channels are interpolated using the same indexes.  This is never a problem with BC2 or BC3, which can avoid that problem by calculating alpha first and then pre-weighting the RGB channels.

I haven't committed a solution to that yet, and while CVTT gets pretty good quality anyway, it's one area where it underperforms other compressors on BC7 by a noticeable amount.

Shape re-use

The groupings of pixels in BC7 are called "shapes."

One optimization that CVTT does is partially reuse calculations for identical shapes.  That is, if you look at the 3 subset grouping above, you can notice that many of the pixel groups are the same as some pixel groups in the 2 subset grouping.

To take advantage of that fact, CVTT performs principal component analysis on all unique shapes before performing further steps.  This is a bit of a tradeoff though: It's only an optimization if those shapes are actually used, so it's not ideal for if CVTT were to reduce the number of subsets that it checks.

Weight reconstruction

One important aspect of BC7 is that, unlike BC1-3, it specifies the precision that interpolation is to be done at, as well as the weight values for each index.  However, doing a table lookup for each value in a set of parallelized index values is a bit slow.  CVTT avoids this by reconstructing the weights arithmetically:

MUInt15 weight = ParallelMath::LosslessCast<MUInt15>::Cast(ParallelMath::RightShift(ParallelMath::CompactMultiply(g_weightReciprocals[m_range], index) + 256, 9));

Coincidentally, doing this just barely fits into 16 bits of precision accurately.
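The arithmetic can be checked against the weight tables from the BC7 specification with a few lines of Python. The reciprocal values here are my own derivation, round(64 * 512 / (levels - 1)), and are an assumption rather than a copy of CVTT's g_weightReciprocals table; the multiply, +256 and shift by 9 follow the line above.

```python
# Interpolation weights from the BC7 specification for 2-, 3- and 4-bit indexes.
BC7_WEIGHTS = {
    2: [0, 21, 43, 64],
    3: [0, 9, 18, 27, 37, 46, 55, 64],
    4: [0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64],
}


def reciprocal(bits):
    """Assumed reciprocal: 64 * 512 divided by the top index, rounded."""
    top_index = (1 << bits) - 1
    return (64 * 512 + top_index // 2) // top_index


def reconstruct_weight(bits, index):
    """Rebuild the table weight as (reciprocal * index + 256) >> 9."""
    product = reciprocal(bits) * index + 256
    assert product < (1 << 16)  # stays within an unsigned 16-bit lane
    return product >> 9


for bits, table in BC7_WEIGHTS.items():
    rebuilt = [reconstruct_weight(bits, i) for i in range(1 << bits)]
    print(bits, rebuilt == table)
```

With these reciprocals the largest intermediate value is a little over 33000, which is why it only just fits in 16 bits.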

BC6H

BC6H is very similar to BC7, except it's 16-bit floating point.   The floating point part is achieved by encoding the endpoints as a high-precision base and low-precision difference from the base.  Some of the modes that it supports are partitioned similar to BC7, and it also has an extremely complicated storage format where the endpoint bits are located somewhat arbitrarily.
 
There's a reason that BC6H is the one mode that's flagged as "experimental."  Unlike all other modes, BC6H is floating point, but has a very unique quirk: When BC6H interpolates between endpoints, it's done as if the endpoint values are integers, even though they will be bit-cast into floating point values.

Doing that severely complicates making a BC6H encoder, because part of the floating point values are the exponent, meaning that the values are roughly logarithmic.  Unless they're the same, they don't even correlate proportionally with each other, so color values may shift erratically, and principal component analysis doesn't really work.
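A deliberately simplified Python illustration of the quirk: take two half-precision values, average their bit patterns as integers, and compare that with the ordinary arithmetic midpoint. Real BC6H interpolates unquantized endpoint integers with its own weights and scaling, so treat this only as a demonstration of why integer interpolation of float bits behaves logarithmically.

```python
import struct


def half_bits(value):
    """Bit pattern of value as an IEEE 754 half-precision float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_from_bits(bits):
    """Half-precision float whose bit pattern is bits."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


lo, hi = 1.0, 4.0
mid_bits = (half_bits(lo) + half_bits(hi)) // 2
print(half_from_bits(mid_bits))  # midpoint taken in integer bit space
print((lo + hi) / 2)             # midpoint taken in linear space
```

The bit-space midpoint of 1.0 and 4.0 is 2.0, their geometric mean, while the linear midpoint is 2.5.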

CVTT tries to do its usual tricks in spite of this, and it sort of works, but it's an area where CVTT's general approach is ill-suited.

ETC1

ETC1 is based on cluster fit, via what's basically a mathematical reformulation of it.

Basically, ETC1 is based on the idea that the human visual system sees color detail less than intensity detail, so it encodes each 4x4 block as a pair of either 4x2 or 2x4 blocks which each encode a color, an offset table ID, and a per-pixel index into the offset table.  The offsets are added to ALL color channels, making them grayscale offsets, essentially.
 

Unique cumulative offsets

What's distinct about ETC compared to the desktop formats, as far as using cluster fit is concerned, is two things: First, the primary axis is always known.  Second, the offset tables are symmetrical, where 2 of the entries are the negation of the other two.
 
The optimal color for a block, not accounting for clamping, will be the average color of the block, offset by 1/16th of the offset assigned to each pixel.  Since half of the offsets negate each other, every pair of pixels assigned to opposing offsets cancel out, causing no change.  This drastically reduces the search space, since many of the combinations will produce identical colors.  Another thing that reduces the search space is that many of the colors will be duplicates after the precision reduction from quantization.  Yet another thing is that in the first mode, the offsets are +2 and +8, which have a common factor, causing many of the possible offsets to overlap, cancelling out even more combinations.

So, CVTT's ETC1 compressor simply evaluates each possible offset from the average color that results in a unique color post-quantization, and picks the best one.  Differential mode works by selecting the best VALID combination of colors, first by checking if the best pair of colors is valid, and failing that, checking all evaluated color combinations.
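The deduplication effect is easy to see by brute force. This sketch assigns one entry of a symmetric offset table to each pixel of an 8-pixel half-block and collects the distinct cumulative offsets. The table values and the function name are illustrative choices for the sketch, not taken from CVTT.

```python
from itertools import combinations_with_replacement


def unique_offset_sums(table=(-8, -2, 2, 8), pixels=8):
    """Return (number of assignments by count, set of distinct offset sums)."""
    sums = set()
    combos = 0
    # The order of pixels does not affect the sum, so multisets are enough.
    for assignment in combinations_with_replacement(table, pixels):
        combos += 1
        sums.add(sum(assignment))
    return combos, sums


combos, sums = unique_offset_sums()
print(combos, len(sums))
```

Each distinct sum, divided by the pixel count and subtracted from the average color, is one candidate base color; quantization then collapses the candidate list further.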
 

ETC2

ETC2 has 3 additional selectable modes on top of the ETC1 modes.  One, called T mode, contains 4 colors: Color0, Color1, Color1+offset, and Color1-offset.  Another, called H mode, contains Color0+offset, Color0-offset, Color1+offset, and Color1-offset.  The final mode, called planar mode, contains what is essentially a base color and a per-axis offset gradient.

T and H mode

T and H mode both exist to better handle blocks where, within the 2x4 or 4x2 blocks, the colors do not align well along the grayscale axis.  CVTT's T/H mode encoding basically works with that assumption by trying to find where it thinks the poorly-aligned color axes might be.  First, it generates some chrominance coordinates, which are basically 2D coordinates corresponding to the pixel colors projected on to the grayscale plane.  Then, it performs principal component analysis to find the primary chrominance axis.  Then, it splits the block based on which side of the half-way point each pixel is to form two groupings that are referred to internally as "sectors."

From the sectors, it performs a similar process of inspecting each possible offset count from the average to determine the best fit - But it will also record if any colors NOT assigned to the sector can still use one of the results that it computed, which are used later to determine the actual optimal pairing of the results that it computed.

One case that this may not handle optimally is when the pixels in a block ARE fairly well-aligned along the grayscale axis, but the ability of T/H colors to be relatively arbitrary would be an advantage.
 

ETC2 with punch-through, "virtual T mode"

ETC2 supports punch-through transparency by mapping one of the T or H indexes to transparent.  Both of these are resolved in the same way as T mode.  When encoding punch-through, the color values for T mode are Color0, Color1+offset, transparent, and Color1-offset, and in H mode, they are Color0+offset, Color0-offset, transparent, and Color1.

Essentially, both have a single color and another color +/- an offset.  There are only 2 differences: First, the isolated color in H mode is still offset, so the offset has to be undone.  If that quantizes to a more accurate value, then H mode is better.  Second, the H mode color may not be valid - H mode encodes the table index low bit based on the order of the colors, but unlike when encoding opaque, reordering the colors will affect which color has the isolated value and which one has the pair of values.

H mode as T mode encoding

One special case to handle with testing H mode is the possibility that the optimal color is the same.  This should be avoidable by evaluating T mode first, but the code handles it properly just to be safe.  Because H mode encodes the table low bit based on a comparison of the endpoints, it may not be possible to select the correct table if the endpoints are the same.  In that case, CVTT uses a fallback where it encodes the block as T mode instead, mapping everything to the color with the pair of offsets.

Planar mode

Planar mode involves finding an optimal combination of 3 values that determine the color of each channel value as O+(H*x)+(V*y)

How planar mode actually works is by just finding the least-squares fit for each of those three values at once.
 
Where error=(reconstructedValue-actualValue)², we want to solve for d(error)/dO=0, d(error)/dH=0, and d(error)/dV=0

Because the error is quadratic in O, H, and V, all three of these derivatives are linear, so the entire thing is just converted to a system of linear equations and solved.  The proof and steps are in the code.
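For one channel, the fit can be sketched in Python as follows. Setting the three derivatives to zero gives the usual normal equations for O, H and V, which are solved here with Cramer's rule. The function names are mine, and this ignores the quantization and the way ETC2 actually stores the planar colors.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(values):
    """Least-squares (O, H, V) so that values[y][x] ~ O + H*x + V*y."""
    samples = [(x, y, row[x]) for y, row in enumerate(values)
               for x in range(len(row))]
    n = len(samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sxx = sum(x * x for x, _, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    sc = sum(c for _, _, c in samples)
    sxc = sum(x * c for x, _, c in samples)
    syc = sum(y * c for _, y, c in samples)
    a = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    b = [sc, sxc, syc]
    d = det3(a)
    if d == 0:
        return None  # degenerate block, e.g. a single row or column
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]  # Cramer's rule: swap in the right-hand side
        solution.append(det3(m) / d)
    return tuple(solution)


block = [[10 + 3 * x - 2 * y for x in range(4)] for y in range(4)]
print(fit_plane(block))
```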

ETC2 alpha and EAC

These "grayscale" modes are both more complicated because they have 3-bit indexes, multiple lookup tables, and an amplitude multiplier.

CVTT tries a limited set of possibilities based on alpha insets.  It tries 10 alpha ranges, which correspond to all ranges where the index inset of each endpoint is +/- 1 the number of the other endpoint.  So, for example, given 8 alpha offsets numbered 0-7, it will try these pairs:
  • 0,7
  • 0,6
  • 1,7
  • 1,6
  • 1,5
  • 2,6
  • 2,5
  • 2,4
  • 3,5
  • 3,4
Once the range is selected, 2 multipliers are checked: The highest value that can be multiplied without exceeding the actual alpha range, and the smallest number that can be multiplied while exceeding it.

The best result of these possibilities is selected.
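The ten index pairs listed above follow a simple rule that can be written down directly: the inset from the low end and the inset from the high end may differ by at most 1, and the low index must stay below the high index. A short sketch, with a function name of my own choosing:

```python
def alpha_range_pairs(count=8):
    """Return (low, high) index pairs whose end insets differ by at most 1."""
    top = count - 1
    pairs = []
    for low in range(count):
        for high in range(top, low, -1):
            low_inset, high_inset = low, top - high
            if abs(low_inset - high_inset) <= 1:
                pairs.append((low, high))
    return pairs


print(alpha_range_pairs())
```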

Possible improvements and areas of interest

BC6H is by far the most improvable aspect.  Traditional PCA doesn't work well because of the logarithmic interpolation.  Sum-of-square-difference in floating point pseudo-logarithmic space performs much worse than in gamma space and is prone to sparkly artifacts.

ETC1 cumulative offset deduplication assumes that each pixel is equally important, which doesn't hold when using weight-by-alpha.

ETC2 T/H mode encoding could try all 15 possible sector assignments (based on the 16-pixel ordering along the chroma axis) instead of one.  I did try finding the grouping that minimized the total square distance to the group averages instead of using the centroid as the split point, but that actually had no effect... they might be mathematically equivalent?  Not sure.

A lot of these concepts don't translate well to ASTC.  CVTT's approaches largely assume that it's practical to traverse the entire search space, but ASTC is highly configurable, so its search space has many axes, essentially.  The fact that partitioning is done AFTER grid interpolation in particular is also a big headache that would require its own novel solutions.

Reduction of the search space is one of CVTT's biggest sore spots.  It performs excellently at high quality targets, but is relatively slow at lower quality targets.  I justified this because typically developers want to maximize quality when import is a one-time operation done offline, and CVTT is fast enough for the most part, but it probably wouldn't be suitable for real-time operation.

by OneEightHundred (noreply@blogger.com) at 2021-01-03 23:21

2020-10-20

 

The plan to post a play-by-play for dev kind of fell apart as I preferred to focus on just doing the work, but the Windows port was a success.

If you want some highlights:

  • I replaced the internal resource format with ZIP archives to make it easier to create custom resource archives.
  • PICT support was dropped in favor of BMP, which is way easier to load.  The gpr2gpa tool handles importing.
  • Ditto with dropping "snd " resource support in favor of WAV.
  • Some resources were refactored to JSON so they could be patched, mostly dialogs.
  • Massive internal API refactoring, especially refactoring the QuickDraw routines to use the new DrawSurface API, which doesn't have an active "port" but instead uses method calls directly to the draw surface.
  • A bunch of work to allow resolution changes while in-game.  The game will load visible dynamic objects from neighboring rooms in a resolution-dependent way, so a lot of work went in to unloading and reloading those objects.

The SDL variant ("AerofoilSDL") is also basically done, with a new OpenGL ES 2 rendering backend and SDL sound backend for improved portability.  The lead version on Windows still uses D3D11 and XAudio2 though.

Unfortunately, I'm still looking for someone to assist with the macOS port, which is made more difficult by the fact that Apple discontinued OpenGL, so I can't really provide a working renderer for it any more.  (Aerofoil's renderer is actually slightly complicated, mostly due to postprocessing.)

Goin' mobile

In the meantime, the Android port is under way!  The game is fully playable so far; most of the work has to do with redoing the UI for touchscreens.  The in-game controls use corner taps for rubber bands and battery/helium, but it's a bit awkward if you're trying to use the battery while moving left due to the taps being on the same side of the screen.

Most of the cases where you NEED to use the battery, you're facing right, so this was kind of a tactical decision, but there are some screens (like "Grease is on TV") where it'd be really nice if it was more usable facing left.

I'm also adding a "source export" feature: The source code package will be bundled with the app, and you can just use the source export feature to save the source code to your documents directory.  That is, once I figure out how to save to the documents directory, which is apparently very complicated...

Anyway, I'm working on getting this into the Google Play Store too.  There might be some APKs posted to GitHub as pre-releases, but there may (if I can figure out how it works) be some Internal Testing releases via GPS.  If you want to opt in to the GPS tests, shoot an e-mail to codedeposit.gps@gmail.com

Will there be an iOS port?

Maybe, but there are two obstacles:

First, the game is GPL-licensed, and there have reportedly been problems with Apple removing GPL-licensed apps from the App Store, so it may not be possible to comply with the license there.  I've heard there is now a way to push apps to your personal device via Xcode with only an Apple ID, which might make satisfying some of the requirements easier, but I don't know.

Second, as with the macOS version, someone would need to do the port.  I don't have a Mac, so I don't have Xcode, so I can't do it.


by OneEightHundred (noreply@blogger.com) at 2020-10-20 11:09

2020-10-06

A conservative estimate has me shooting hogs in 45 seconds

🐷🔫⏱

by Factor Mystic at 2020-10-06 22:36

2020-08-03

As part of modernising, updating and generally overhauling my blog, I thought it would be nice to add some consistency to the Yaml front matter used by Jekyll. For those who do not know, Jekyll uses Yaml front matter blocks to process any file which contains one as a special file. The front matter can contain variables in the form foo: value. Jekyll itself defines some predefined global variables and variables for posts, but anything else is valid and can be used in Liquid tags.

I wondered if I could write some F# to:

  1. Load all the markdown files.
  2. Parse all the front matter.
  3. Modify the front matter to drop variables no longer required by a theme.
  4. Update the front matter with new variables which are understood by the current theme.
  5. Randomly assign a path to a header image file for each post which doesn’t already have one.
  6. Write the front matter back to its post.

Fairly straightforward requirements.

Loading and parsing the front matter

I’m using YamlDotNet to do most of the heavy lifting. I think I could also have used the FSharp.Configuration Type Provider, but I’m not sure that it would have done exactly what I wanted.

I’m just writing this in an F# script, hosted in a project. After adding the YamlDotNet NuGet package, we can reference it and get to work:

#r "../../.nuget/packages/YamlDotNet/8.1.2/lib/netstandard2.1/YamlDotNet.dll"

open System.IO
open System.Text.RegularExpressions
open YamlDotNet.Serialization
open YamlDotNet.Serialization.NamingConventions

let path = "../sgrassie.github.io/_posts"

Here, we reference the package, and then open various namespaces for use later on. The code for my blog is kept in a separate folder, relative to the project which has got the F# scripts I’m writing about in it. This is nice and easy.

type FrontMatter() =
    member val Title = "" with get, set
    member val Description = "" with get, set
    member val Layout = "" with get, set
    member val Tags = [|""|] with get, set
    member val Published = "" with get, set
    member val Category = "" with get, set
    member val Categories = "" with get, set
    member val Metadescription = "" with get, set
    member val Series = "" with get, set
    member val Featured = false with get, set
    member val Hidden = false with get, set
    member val Image = "" with get, set
    [<YamlMember(Alias = "featured_image", ApplyNamingConventions = false)>]
    member val FeaturedImage = "" with get, set
    [<YamlMember(Alias = "featured_image_thumbnail", ApplyNamingConventions = false)>]
    member val FeaturedImageThumbnail = "" with get, set
    [<YamlIgnore>]
    member val MarkdownFilePath = "" with get, set

This is a class with auto-implemented properties. You can see three attributes in use. The YamlMember attribute allows us to alias a property in Yaml which doesn’t follow the CamelCase convention we configured the deserialiser with. I think that a C# version of this would look pretty much the same.

let deserializer = DeserializerBuilder()
                     .WithNamingConvention(CamelCaseNamingConvention.Instance)
                     .Build()

This initialises the YamlDotNet deserialiser, and is almost exactly how you would do this in C#. To deserialise something, we need some Yaml. When I was testing this, I got a pretty weird error in YamlDotNet which essentially means that it can’t parse the file, and it turns out it’s all the other stuff outside the Yaml front matter that is upsetting it.

let expression = "(?:---)(?<yaml>[\\s\\S]*?)(?:---)"

Oh regex, I do love thee.

Very simply, this regex will capture everything in a file between two --- markers into a named group called yaml. We now have the actual front matter, but we still need to parse it into an object.

let extractFrontmatter filePath =
    let file = File.ReadAllText(filePath)
    let result = Regex.Match(file, expression).Groups.["yaml"].Value
    let frontMatter =
        let frontMatter = deserializer.Deserialize<FrontMatter>(result)
        frontMatter.MarkdownFilePath <- filePath
        frontMatter
    frontMatter

This is a bit more complex so let’s unpack it:

  1. Pass in the filePath.
  2. Read all of the text from it.
  3. Strip only the front matter from the text.
  4. Parse the front matter text with an inner block, which uses the deserializer, and return it. Here, we also keep track of the file path (we will need this later).

We also need to load all of the markdown files:

let loadMarkdownFiles path = Directory.EnumerateFiles(path, "*.md", SearchOption.AllDirectories) 

Notice how those last couple of functions are using ‘currying’. It lets us do all of the work in one pipeline:

path |> loadMarkdownFiles |> Seq.map extractFrontmatter |> Seq.iter (fun x -> printfn "%s - %s" x.MarkdownFilePath x.Title)
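If you want to poke at the regex without setting up the F# script, the extraction step can be sketched in Python with only the standard library. The named group is the same idea as above, but the simple_fields helper is a naive stand-in I made up for this sketch: it only splits flat "key: value" lines and is not a Yaml parser like YamlDotNet.

```python
import re

# Same idea as the F# expression: lazily capture everything between
# the first two --- markers into a named group.
FRONT_MATTER = re.compile(r"(?:---)(?P<yaml>[\s\S]*?)(?:---)")


def extract_front_matter(text):
    """Return the raw front matter text, or None if there is none."""
    match = FRONT_MATTER.search(text)
    return match.group("yaml").strip() if match else None


def simple_fields(yaml_text):
    """Naive split of flat 'key: value' lines; nested Yaml is not handled."""
    fields = {}
    for line in yaml_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


post = "---\ntitle: Hello\nlayout: post\n---\n\nBody text goes here.\n"
print(simple_fields(extract_front_matter(post)))
```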

This gives us a dataset to work with. Next time we’ll continue with the rest of the requirements.

2020-08-03 00:00

2020-07-27

Many years ago, after working in my first programming job for a couple of years the company was taken over, and coding tests for new hires were introduced. The incumbent developers all decided to take the test, and it was seen as a fun diversion for a couple of hours.

I don’t have access to the actual wording of the requirements given to candidates, but the test required a text file containing around 100k words to be loaded and sorted to find the largest set of the longest anagram. For example, in the words file I’m using in this blog post, there are 466544 words, 406627 of which are anagrams. The largest set is for a 7 letter anagram, of which there are 15 words. There are smaller sets of longer anagrams, but we’re not interested in those. And, it had to run in less than a second. They had three hours to write it, on a computer not connected to the internet. They had access to Java, through Eclipse, C/C++/C# through Visual Studio and Delphi through Embarcadero Studio.

I don’t know where the test originally came from - I think it originated in a different company which had been acquired by the same company I now worked for, but I’m not sure. I think the intent of the test was in part to gauge how the candidate reacted to the deadline pressure, in part how well they could understand the requirements given to them, and lastly what sort of code they wrote.

As it has been a long time, and the company no longer recruits after moving most development overseas, I’m going to present my solution.

Making people sit coding tests during interviews is not good for anyone, and doesn’t always guarantee that you’ll hire the best person for the job.

The Solution

First we have to load the file, and figure out how to generate the anagram key and keep track of how many instances of that anagram there are. It turned out that this was the bit most candidates taking the test got stuck on, specifically the short mental leap of working out that you need to sort the letters of the word alphabetically to create the key.

private static string CreateKey(string word)
{
    var lowerCharArray = word.ToLowerInvariant().ToCharArray();
    Array.Sort(lowerCharArray);
    return new string(lowerCharArray);
}

private static void LoadWords(string filePath, Dictionary<string, List<string>> words)
{
    using (var streamReader = File.OpenText(filePath))
    {
        string s;

        while ((s = streamReader.ReadLine()) != null)
        {
            var key = CreateKey(s);

            if (words.TryGetValue(key, out var set))
            {
                set.Add(s);
            }
            else
            {
                var newSet = new List<string> {s};
                words.Add(key, newSet);
            }
        }
    }
}

words is a Dictionary<string, List<string>>, which we use to track the sets of anagrams by key. The rest of the file loading is a fairly standard while loop over the reader’s ReadLine method: check the dictionary to see if the anagram key has already been found and, if so, add the new word to its set; otherwise, add the key with a new list to hold the word(s).
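
To see why the sorted-letter key does the job, here is a minimal, self-contained sketch that groups a handful of words the same way. The class name AnagramKeyDemo, the Group helper and the sample words are my own choices for illustration, not part of the original test.

```csharp
using System;
using System.Collections.Generic;

public static class AnagramKeyDemo
{
    // Same idea as CreateKey: lower-case the word, then sort its letters.
    public static string CreateKey(string word)
    {
        var letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }

    // Groups words by key, mirroring what LoadWords does for each line of the file.
    public static Dictionary<string, List<string>> Group(IEnumerable<string> words)
    {
        var sets = new Dictionary<string, List<string>>();
        foreach (var word in words)
        {
            var key = CreateKey(word);
            if (!sets.TryGetValue(key, out var set))
            {
                set = new List<string>();
                sets.Add(key, set);
            }
            set.Add(word);
        }
        return sets;
    }

    public static void Main()
    {
        // Sample words chosen for illustration only.
        var sets = Group(new[] { "retains", "retinas", "Tersina", "banana" });
        foreach (var pair in sets)
        {
            Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
        }
    }
}
```

"retains", "retinas" and "Tersina" all collapse to the key aeinrst, so they end up in one list, while "banana" gets a list of its own under aaabnn.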

Once we have all the words loaded and matched into sets of anagrams, we can process them to work out which is the largest set with the longest word.

private static KeyValuePair<string, List<string>> ProcessAnagrams(Dictionary<string, List<string>> words)
{
    var largestSet = 0;
    var longestWord = 0;
    var foundSet = new KeyValuePair<string, List<string>>();

    foreach (var set in words)
    {
        var count = set.Value.Count;
        var length = set.Key.Length;

        // Prefer the bigger set; when two sets are the same size, prefer the longer word.
        if (count > largestSet || (count == largestSet && length > longestWord))
        {
            largestSet = count;
            longestWord = length;
            foundSet = set;
        }
    }

    return foundSet;
}

Here we simply brute-force check all of the entries in the dictionary to find the answer. It’s not elegant, but it gets the job done. Running it on my MacBook Pro gives:

406627 anagrams processed from 466544 in 00:02:850
File read and key generation in 00:02:829
Anagrams searched in: 00:00:021
Found: 
Key: AEINRST (7), Count: 15
aeinrst
antsier
asterin
eranist
nastier
ratines
resiant
restain
retains
retinas
retsina
stainer
starnie
stearin
Tersina
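
The selection rule (biggest set first, longest key as the tie-break) can also be expressed as a LINQ ordering. This is only a sketch under that reading of the requirements; the class name AnagramSelection and the sample data are hypothetical, and sorting costs more than the single pass used in ProcessAnagrams.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramSelection
{
    // Orders by set size, then by key length, and takes the first entry.
    // First() throws on an empty dictionary, so callers should check for that.
    public static KeyValuePair<string, List<string>> FindLargest(
        Dictionary<string, List<string>> words)
    {
        return words
            .OrderByDescending(set => set.Value.Count)
            .ThenByDescending(set => set.Key.Length)
            .First();
    }

    public static void Main()
    {
        // Hypothetical data: two sets of three words, one of them with a longer key.
        var words = new Dictionary<string, List<string>>
        {
            ["opst"] = new List<string> { "post", "stop", "spot" },
            ["aelst"] = new List<string> { "least", "stale", "steal" },
            ["act"] = new List<string> { "act", "cat" },
        };

        var found = FindLargest(words);
        // Prints: Key: aelst (5), Count: 3
        Console.WriteLine($"Key: {found.Key} ({found.Key.Length}), Count: {found.Value.Count}");
    }
}
```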

2020-07-27 00:00

2020-07-20

There are lots of blog comment systems, and this blog has used Disqus as the comment system for a long time. I’m not going to go into all the reasons to move away from Disqus, but page load times, wanting more control over your data, and being able to respect your readers’ privacy figure highly.

Also, this blog is a technical blog focused on software development and associated topics, and this means that anyone who wants to comment on my blog is almost certain to be familiar with GitHub and have an account, and also be as uncomfortable using Disqus as I have been.

I did investigate rolling my own code based on examples from other blogs, who have used some jekyll liquid templates and javascript to pull from the Github API and use it to post comments back to the repo hosting the blog. This has some attraction, but also has a big drawback, which is the authorisation situation to the Github API, as you don’t really want your client id and client secret exposed in the repo.

Enter utteranc.es

You can get around this by hosting an app in Heroku to use as the postback url so that you can hide the client id and client secret, and there is also staticman, but none of these seemed as simple as just using utteranc.es.

To configure utteranc.es, head over to the website and follow the instructions, and fill out the form to suit you. For the blog post to issue mapping, I chose ‘Issue title contains page title’, and I also chose to have utteranc.es add a ‘Comment’ label to the issue it creates in the blog repository. After you do that, you’ll get a code snippet generated for you that looks somewhat like this:

<script src="https://utteranc.es/client.js"
        repo="sgrassie/sgrassie.github.io"
        issue-term="title"
        label="Comment"
        theme="github-light"
        crossorigin="anonymous"
        async>
</script>

Add this to a jekyll include, for example utterances.html and then include it in your post.html layout at the position you want the blog comments to appear. Most jekyll blog templates have Disqus support, so it will probably just be a simple case of finding where in the layout that Disqus is included, and replacing it.

Exporting existing comments

If your existing comments are not important to you, then at this point you can stop and enjoy your new Github powered comment system. Personally for me, it’s the principle of the thing, and the fact that the comments on my blog belong to me, and the author of the comment. So, we can do something about it.

Disqus allows you to export your comments, and once you do so, you will get your comments emailed to the address registered with your Disqus account. I’ve done a lot of work with XML in a previous role, and I think that the Disqus XML export looks… odd. The reason I say that is that each post on your blog appears to be mapped to a <thread> element, which contains a bunch of expected metadata about the blog post. I would expect each individual comment to be nested in a <comments> element, but this is not the case. Instead, each individual comment has an entry as a <post> element at the same level as the <thread>, and they are mapped to each other using an id attribute. That doesn’t make any sense to me. I’m sure there must be good reasons; I just can’t think what they might be.

A thread, then, looks like this:

<thread dsq:id="1467739952">
    <id>218 http://temporalcohesion.co.uk/?p=218</id>
    <forum>temporalcohesion</forum>
    <category dsq:id="2467491" />
    <link>http://temporalcohesion.co.uk/2010/10/25/lets-write-an-api-library-for-github/</link>
    <title>Let&amp;#8217;s write an API library for Github</title>
    <message />
    <createdAt>2010-10-25T12:00:24Z</createdAt>
    <author>
        <name>Stuart Grassie</name>
        <isAnonymous>false</isAnonymous>
        <username>stuartgrassie</username>
    </author>
    <isClosed>false</isClosed>
    <isDeleted>false</isDeleted>
</thread>

An actual comment on this post looks like:

<post dsq:id="952258229">
    <id>wp_id=25</id>
    <message><![CDATA[<p>Great post Stu!</p>]]></message>
    <createdAt>2010-10-25T22:47:44Z</createdAt>
    <isDeleted>false</isDeleted>
    <isSpam>false</isSpam>
    <author>
        <name>John Sheehan</name>
        <isAnonymous>true</isAnonymous>
    </author>
    <thread dsq:id="1467739952" />
</post>

You can see the way that the post element is mapped back to the containing thread using the dsq:id attribute.

Parsing the XML

The strange structure of the XML makes it less straightforward to parse the XML, as it means we’ll have to do a little bit of work in matching up blog posts and the comments on them. Also very annoying is the fact that a thread element doesn’t know if it actually has any associated post comments.

We can accomplish this fairly easily with a little bit of F# and the FSharp.Data XmlProvider. Setting the provider up is straightforward; here I’m just using a direct reference to the assembly which I’d previously added via NuGet.

#r "../../.nuget/packages/fsharp.data/3.3.3/lib/netstandard2.0/FSharp.Data.dll"
open FSharp.Data

type Disqus = XmlProvider<"/Users/stuart/Downloads/temporalcohesion-2020-07-13T20 27 09.014136-all.xml">

type Comment = { Author: string; Message: string; Created: System.DateTimeOffset; ParentThreadId: int64; }
type BlogPost = { Title: string; Url: string; Author: string; ThreadId: int64; Comments : Comment list }

let data = Disqus.Load("/Users/stuart/Downloads/temporalcohesion-2020-07-13T20 27 09.014136-all.xml")

If you are new to F# (and I’m still fairly new) this might look scary, but it really isn’t. After referencing the assembly in the script, we open the FSharp.Data namespace, and then initialise an XmlProvider by passing it the XML file we’re going to parse.

Do not do this for really big XML files! See the XmlProvider documentation for more details.

That enables the XmlProvider to infer a lot of things about the XML in the file, and then the XmlProvider loads the actual data from the file. Two records are also defined to hold the details about the threads and posts that are going to be imported, and how multiple comments map to a single blog post. These records are analogous to simple C# POCO classes, although F# records are immutable by default, so think getters without setters.

With these types ready, we can define a couple of functions to convert the XML into them, and thus do away with a lot of the extraneous noise from the XML that we don’t really care about.

let toComments posts =
    posts
    // Keep a comment only if it is neither spam nor deleted, hence && rather than ||.
    |> Seq.filter (fun (post : Disqus.Post) -> not post.IsSpam && not post.IsDeleted)
    |> Seq.map (fun (post : Disqus.Post) -> {Author = post.Author.Name; Message = post.Message; Created = post.CreatedAt; ParentThreadId = post.Thread.Id})
    |> Seq.toArray

let toBlogPosts posts =
    posts
    |> Seq.filter (fun (thread : Disqus.Thread) -> not thread.IsDeleted)
    |> Seq.map (fun (thread : Disqus.Thread) -> {Title = thread.Title; Url = thread.Link.Substring(0, thread.Link.Length - 1); Author = thread.Author.Name; ThreadId = thread.Id; Comments = [] })

These functions use currying, which as a longtime C# developer I’m still getting the hang of, and that will come in handy shortly. They map the Disqus types generated by the XmlProvider into the custom types I defined, taking care to filter out comments we don’t want to import and not importing any blog posts which Disqus says have been deleted.

The Seq.filter in toComments has to combine its two checks with &&: written with ||, it only drops a comment that is both spam and deleted. I originally had || there, which is most likely why I still had to go and manually delete a couple of comments that were marked as spam from the GitHub Issues.

With those functions defined, we need a way of mapping the comments to the correct blog post.

let mapBlogToComments(post, comments) =
    let commentsOnPost = comments 
                         |> Array.filter (fun comment -> comment.ParentThreadId = post.ThreadId) 
                         |> Array.toList
    {post with Comments = commentsOnPost}

Here we take a single post, and all of the comments, and then filter them down to the set of comments associated with that post, by way of the ThreadId. With that written, we can use some more currying to create another function that will do a lot of hard work for us:

let addCommentsToTheirPosts comments = data.Threads |> toBlogPosts |> Seq.map (fun post -> mapBlogToComments(post, comments))

This function will take the threads, use the toBlogPosts method to turn them into BlogPost and then map each blog post to the correct comments using the method we’ve just defined to do that. But where do the comments come from? Well, it turns out this currying thing is really quite useful, as it enables all this magic looking |>, or ‘piping’ to happen.

let toImport = data.Posts
               |> toComments
               |> addCommentsToTheirPosts 
               |> Seq.filter (fun x -> x.Comments.Length > 0)

Take all the posts data, turn them all into comments, and then pipe that to the addCommentsToTheirPosts function, and then filter out blog posts which don’t have any comments, as importing those is pointless. All for around 24 lines of code. I know full well the C# it would take to do all that, and whilst with C# 8 you could probably get close, I doubt you’d equal 24 lines.
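
For comparison, here is a rough C# sketch of the same matching step using LINQ to XML rather than a type provider. It is not the code I ran: the namespace URIs, the class name DisqusJoin and the sample document are assumptions made for illustration, so check the root element of your own export before relying on them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DisqusJoin
{
    // Assumed namespace URIs; verify them against the root element of your export.
    static readonly XNamespace D = "http://disqus.com";
    static readonly XNamespace Dsq = "http://disqus.com/disqus-internals";

    // Returns each thread title with its comment messages, skipping threads without comments.
    public static List<(string Title, List<string> Comments)> CommentsByThread(XDocument doc)
    {
        // Index the comments we want to keep by the dsq:id of the thread they point at.
        var commentsByThreadId = doc.Root.Elements(D + "post")
            .Where(p => (string)p.Element(D + "isSpam") != "true"
                     && (string)p.Element(D + "isDeleted") != "true")
            .ToLookup(
                p => (string)p.Element(D + "thread").Attribute(Dsq + "id"),
                p => (string)p.Element(D + "message"));

        // Only top-level <thread> elements are blog posts; a lookup miss yields an empty list.
        return doc.Root.Elements(D + "thread")
            .Where(t => (string)t.Element(D + "isDeleted") != "true")
            .Select(t => (
                Title: (string)t.Element(D + "title"),
                Comments: commentsByThreadId[(string)t.Attribute(Dsq + "id")].ToList()))
            .Where(t => t.Comments.Count > 0)
            .ToList();
    }

    public static void Main()
    {
        // Trimmed-down, made-up document in the same shape as the export.
        var xml = @"<disqus xmlns='http://disqus.com' xmlns:dsq='http://disqus.com/disqus-internals'>
  <thread dsq:id='1'><title>First post</title><isDeleted>false</isDeleted></thread>
  <thread dsq:id='2'><title>Quiet post</title><isDeleted>false</isDeleted></thread>
  <post dsq:id='10'><message>Great post!</message><isDeleted>false</isDeleted><isSpam>false</isSpam><thread dsq:id='1' /></post>
  <post dsq:id='11'><message>Buy pills</message><isDeleted>false</isDeleted><isSpam>true</isSpam><thread dsq:id='1' /></post>
</disqus>";

        foreach (var (title, comments) in CommentsByThread(XDocument.Parse(xml)))
        {
            // Prints: First post: 1 comment(s)
            Console.WriteLine($"{title}: {comments.Count} comment(s)");
        }
    }
}
```

The lookup does the job of mapBlogToComments, and the final Where mirrors the filter that drops blog posts without comments.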

Whilst googling for clarification on an aspect of the Octokit.net api, I came across Removing Disqus and adding GitHub Issue Comments, which is essentially what I’m doing here, just in C#.

Just to be on the safe side, it’s probably a good idea to look through each of the posts and comments that we’ve now got, just to see if things are matching up correctly.

toImport |> Seq.iter (fun post -> printfn "%s - %s - comments: %d" post.Title post.Url post.Comments.Length)

Running that will give you an idea of what blog posts are going to be imported, and the number of comments. The first time I ran this, I found some of the blog posts in the Disqus XML export did not have the post’s title set, so I was getting duplicated post titles. As there were only three instances of this error, I just manually corrected the XML and re-ran the script to check I had everything correct.

Uploading to GitHub

So far, so good. Now comes the fun part and something I’ve yet to do in F#, which is interop with a C# library. It turns out that it’s not so hard, but that makes perfect sense when you understand that F# is a .net language, just like C#. A long time ago I started to write an API library for GitHub, but I gave it up in favour of Octokit.net.

The F# which follows looks horrible, and I am certain there must be a cleaner way of doing what I’m about to show, but I don’t know what it is.

We can easily reference Octokit and open the namespace as before:

#r "../../.nuget/packages/octokit/0.48.0/lib/netstandard2.0/Octokit.dll"
open Octokit

Then we just need to set up a few variables:

let repo = "sgrassie.github.io"
let githubApp = "foo"
let token = "<your-personal-access-token-here>"
let credentials = Credentials(token)
let header = ProductHeaderValue(githubApp)
let client = GitHubClient(header, Credentials = credentials) 

These just get us a client to work with, and all I did was just register a new Personal Access Token on my account to use as the password. Notice how with F# you don’t need to new anything, even though they are classes from a C# assembly. These can then be used in the following function, which I’m gonna prefix with this warning:

I’m still new at F#, I’ve no idea if what you’re about to see is ‘good’ F#.

It does work though, so just… use at your own caution.

let exportToGithub posts =
    for post in posts do
        System.Threading.Thread.Sleep(2000)
        let issuebody = sprintf "Comment thread for the post [%s](%s)" post.Title post.Url
        printfn "%s" issuebody
        let newIssue = NewIssue(post.Title, Body = issuebody)
        let issue = client.Issue.Create("sgrassie", repo, newIssue) |> Async.AwaitTask |> Async.RunSynchronously
        printfn "New issue created for %s" post.Title
        for comment in post.Comments do
           System.Threading.Thread.Sleep(2000)
           let message = sprintf "Comment by **%s** on **%s** (imported from Disqus):\r\n\r\n%s" comment.Author (comment.Created.ToString("f")) comment.Message
           let newComment = client.Issue.Comment.Create("sgrassie", repo, issue.Number, message) |> Async.AwaitTask |> Async.RunSynchronously
           printfn "    New comment created for %s" comment.Author

toImport |> exportToGithub 

I’m sure that a more experienced F# person is going to look at that and be like “WTF”, but as I said, it does work. I left the printfn log messages in, but essentially it loops over each post, waits a couple of seconds and then creates the new issue, and then loops over all of the comments for that post and adds them as comments to the issue. I put the Thread.Sleep calls in there just so I didn’t hammer the GitHub API. Honestly, there were so few comments to import that I doubt it would have triggered the rate limit, but I imagine a more popular blog with more comments on its posts would.

2020-07-20 00:00

2020-07-13

I’ve been upgrading part of our build infrastructure to handle the ongoing upgrade to .net core, and as part of that, I’ve had to update the Cake build script to handle doing the restore in an offline environment, on the build server.

There is a great post on the Octopus blog about writing a Cake build script for .net core. I encourage you to check that out; I’m not going to repeat too much of it.

My specific requirement is that the ‘DotNetCoreRestore’ needs to succeed on an ‘offline’ build server, that is, a build server that has no access to the internet.

In order for this to succeed, you need to provide a way for NuGet to get the packages. Usually this is done by maintaining an offline NuGet cache which you can point NuGet at, or even by checking the packages into the repository. I’d always recommend going with the first option, although there are scenarios where the second option might be required.

However you do it, you need to tell NuGet where they are. The easiest thing to do is to use a NuGet.Config local to the .sln, but it is possible to code a location into the script.

Here is the restore task:

Task("Restore")
    .IsDependentOn("Clean")
    .Does(() =>
    {
        var settings = new DotNetCoreRestoreSettings();

        if(BuildSystem.IsRunningOnTeamCity)
        {
            settings.PackagesDirectory = "./packages";
            settings.IgnoreFailedSources = true;
            //optionally
            //settings.Sources = new[] { "http://someinternalfeed/nuget" }
        }

        foreach(var project in projects)
        {
            DotNetCoreRestore(project.FullPath, settings);
        }
    });

This project uses a NuGet.config to add the paths of internal package sources, and sets the location of the packages folder to be local to the .sln - we’ve found this cuts down on conflicts on developer machines.
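
As a rough idea of what such a NuGet.config might contain, here is a minimal sketch. The source name and both paths are hypothetical placeholders rather than our real configuration, and globalPackagesFolder applies to PackageReference projects (packages.config projects use repositoryPath instead).

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <config>
    <!-- Keep restored packages next to the .sln (hypothetical path) -->
    <add key="globalPackagesFolder" value="./packages" />
  </config>
  <packageSources>
    <!-- Drop inherited sources so an offline restore never tries nuget.org -->
    <clear />
    <!-- Hypothetical offline cache; an internal feed URL would also go here -->
    <add key="offline-cache" value="./offline-packages" />
  </packageSources>
</configuration>
```

With the sources cleared and replaced like this, the restore only ever looks in locations the build server can actually reach.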

2020-07-13 00:00

2020-07-10

In the previous post, I displayed my fledgling understanding of F# by writing a script which can parse the CSV set of results of the English Premier League to generate the league table. The script does this primarily by using a mutable BCL Dictionary. F# is immutable by default, and whilst mutability is available, you have to go out of your way to enable it. I’ll try to avoid repeating Scott Wlaschin.

There are some improvements that can be made to the script. I’ll highlight them here and then link to the full script as a gist.

First a note on pattern matching. In the previous post I mentioned that I thought I could use pattern matching in a particular place, and obviously I can:

let fullTimeResult =
        match row.FTR with
        | "H" -> Home
        | "A" -> Away
        | _ -> Draw

Rather than if/then/else. Here, the ‘_’ is equivalent to the default in a C# switch statement: if it’s not a Home or Away (win), then it must be a draw.

Making things immutable

To start making things immutable, we can update the updateTeam function from the previous post, and pass in a Map<string, LeagueRow>:

let updateTeam (league : Map<string, LeagueRow>, team : string, points : int, forGoals : int, againstGoals: int, won : int, drawn, lost: int) =
    if league.ContainsKey team then
        let existing = league.[team]
        let updated = {existing with Played = existing.Played + 1; Won = existing.Won + won; Drawn = existing.Drawn + drawn; Lost = existing.Lost + lost; For = existing.For + forGoals; Against = existing.Against + againstGoals; Points = existing.Points + points}
        league.Add(team, updated)
    else
        let leagueRow = {Team = team; Played = 1; Won = won; Drawn = drawn; Lost = lost; GD = 0; For = forGoals; Against = againstGoals; Points = points}
        league.Add(team, leagueRow)

The code is almost the same as the previous version, except that we no longer use the <- operator to update the mutable dictionary. Instead, F# creates a new instance of the LeagueRow with the updated values and adds that to the Map by key. Because a Map is immutable, Add does not change the existing Map: it returns a new Map in which the league row identified by the key has been replaced with the updated version.

The updateHomeWin function becomes:

let updateHomeWin (league : Map<string, LeagueRow>, result : MatchResult) =
    let league = updateTeam(league, result.HomeTeam, 3, result.HomeGoals, result.AwayGoals, 1, 0, 0)
    let league = updateTeam(league, result.AwayTeam, 0, result.AwayGoals, result.HomeGoals, 0, 0, 1)
    league

This again replaces the BCL Dictionary with the Map, and simply passes the league map through each updateTeam call, and then returns the updated league object.

processMatchResult is also updated to pass in a Map, and calling the fold with it and a default map is straightforward:

|> Seq.fold processMatchResult (Map<string, LeagueRow> [])

This makes the script much more ‘the way of things’ in F#, which is to say it’s using an immutable data structure.
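
For C# developers, roughly the same pattern can be sketched with ImmutableDictionary from System.Collections.Immutable, with Aggregate standing in for Seq.fold. This is an analogy rather than a translation of the script: the class name ImmutableLeague, the team names and the points-only league are made up for illustration.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

public static class ImmutableLeague
{
    // Adds points for a team and returns a NEW dictionary; the input is never modified.
    public static ImmutableDictionary<string, int> AddPoints(
        ImmutableDictionary<string, int> league, string team, int points)
    {
        var current = league.TryGetValue(team, out var existing) ? existing : 0;
        // SetItem adds or replaces the key, much like Map.Add does in F#.
        return league.SetItem(team, current + points);
    }

    public static void Main()
    {
        var empty = ImmutableDictionary<string, int>.Empty;

        // Hypothetical results: (team, points earned in one match).
        var results = new[] { ("Team A", 3), ("Team B", 0), ("Team A", 1) };

        // Aggregate plays the role of Seq.fold: the league is threaded through each step.
        var league = results.Aggregate(empty, (acc, r) => AddPoints(acc, r.Item1, r.Item2));

        Console.WriteLine(empty.Count);       // 0, the starting value is untouched
        Console.WriteLine(league["Team A"]);  // 4
        Console.WriteLine(league["Team B"]);  // 0
    }
}
```

Each step hands back a fresh dictionary, which is the same reason updateHomeWin has to rebind league after every updateTeam call.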

2020-07-10 00:00