For a lot of non-technical founders setting out to build their product, it can be hard to understand just how much work needs to be done before you write a single line of code! In this post, I will break down much of the ground work that needs to be done by your tech team before you start to actually build out your product.

This initial setup involves getting the following ready:

  1. Setting up your accounts with the app stores you plan to launch in
  2. Setting up your cloud hosting and database hosting plans
  3. Setting up your domain and configuring your SSL certificates
  4. Setting up your code repository
  5. Setting up an alpha/beta testing process for pre-release testing
  6. Setting up your task and bug tracking system
  7. Choosing the framework(s) you plan to leverage to build out your technology

While all of these steps are fairly straightforward, and common to just about every development project, they are nonetheless absolutely critical, and in some cases the decisions you make here can affect your company in the future — sometimes even very far into the future!

So let’s take a look at them one by one…

Setting up your app store accounts

First, you need to get set up with the app stores. Luckily, this process has become much simpler in the last few years than it was in the beginning! But it does still require some time, so plan accordingly!

Getting started on Google Play is a piece of cake, as Google is well-known for minimizing the hoops you have to jump through. For the most part you just need to sign up and pay the annual developer fee and you are good to go.

Apple, on the other hand, requires a D-U-N-S number (issued by Dun & Bradstreet) and some more detailed information about your business (or yourself, if you are signing up as an individual). The D-U-N-S number is not hard to get, but it can take a while to arrive, so plan accordingly.

After the initial business setup, there is actually quite a bit of technical setup required at this stage — notably, you need to generate some signing certificates and distribution profiles with Apple in order to build, sign and deploy your apps. You need even more certificates if you plan to send push notifications to users, or leverage some of the other specialized functions that the iOS SDK exposes to you.

On Google Play, similarly, you will need to generate some certificates for app signing, sending notifications, and a few other optional bits of functionality.

Again, none of these steps are particularly difficult, but they do take time, and it is highly advisable that you have your developer or someone else experienced in these setup steps get this stuff done for you, to prevent unnecessary delays.

Setting up your cloud and database hosting

Back in the old days, you could run a tech company from your bedroom if you had the technical know-how. These days, you still can — but if you plan to scale up in the future and potentially accommodate millions of users, you will want to plan ahead and make sure you are running your tech on a scalable platform.

But it’s not just scaling you have to think about — it’s also the cost of maintenance!

Back in the day, when I was running Glowfoto, our technology was powered by several cabinets in a vibrant datacenter in downtown Los Angeles, which was host to a couple dozen servers all performing individual tasks (serving files, serving web pages, answering database requests, managing payments….)

While this worked great, it was on us to make sure these servers were running. If we started to outgrow our infrastructure, it was on us to bring in more servers, and reconfigure our software and hardware set up to handle the new load.

Often this meant spending hours — sometimes in the middle of the night — in a cold data center with a laptop and screwdriver getting everything configured.

But that’s not all! The new load would also come with new bandwidth requirements, which meant once we got everything ready for scale, we had to go down a few floors and negotiate a bandwidth and colocation upgrade!

Luckily we don’t deal with this stuff anymore. Services like Amazon AWS, Google Cloud Platform, and Microsoft’s Azure provide us with an almost set-it-and-forget-it place to host our technology. Ok, I’m exaggerating the degree to which this is “hands off”, but compared to the old days, hosting our tech in the cloud is a dream.

We also have a few higher level options if we want to give up a little control in return for a more managed platform (services like Heroku for example).

But in general, we just need to decide on a hosting platform, and ideally a managed database platform.

On the majority of the projects I work on these days, this means signing up with Amazon AWS for server hosting, and MongoDB for database hosting. And our Mongo instances are actually hosted and run on AWS hardware!

This setup not only involves signing up and configuring billing, but also making some initial technical decisions. Managing hosted platforms like AWS is a full-time job in itself these days, so definitely make sure someone experienced is setting this up for you!

Setting up your domain and configuring your SSL certificates

Modern mobile platforms no longer allow connections over old-school http — you now must connect to your API over https which means you need a certificate for your domain. Which, in turn, means you need a domain!

On AWS this entails setting up a load balancer in front of your EC2 instances, configuring an SSL certificate for that load balancer, and pointing your domain at it with an alias record so that all communication with your API happens securely over https.

Does that all sound like gibberish? Don’t worry, it’s actually pretty straightforward. But I would like to pause for a moment to point out that we are only halfway through the setup phase, and we’ve already touched FIVE Amazon AWS services (six if you count the billing module)!

So far we’ve dealt with IAM, S3, EC2 or ElasticBeanstalk, Route 53, and Certificate Manager — and we haven’t even written a single line of code!

I have worked with very large companies that have actually had a single person in charge of one of those services — e.g. a non-programmer, devops employee who only deals with managing DNS via Route 53. Granted, that’s pretty rare, but what isn’t rare is having a single person in charge of the AWS services as a whole. Again, this is not a programmer, this is just an administrator in charge of AWS management.

Remember at this point, you likely have one person in charge of AWS, and everything else, including your code! Hopefully you can start to see how important this is.

Setting up your code repository

This is pretty straightforward, but we’re going to want a place to host our code!

Why do we want this? Well, if we bring on additional programmers later — either because the code becomes more complex, or we start to move to more parallel development (e.g. having one programmer work on Android, one programmer work on iOS, etc) — then we want an easy place for new coders to quickly grab the code and start digging in, and also a place where the team can be immediately aware when one programmer makes a change, so we all can stay in sync.

More importantly, in the early stages, it’s important that you have access to your code, so that if anything ever happens to your developer, you won’t have to wonder where your project is!

There are a few options here, but the most common is to sign up with Github and create a repository there. Again, this is a fairly technical step (at least at the signup stage) so you will definitely want your developer to do this for you.

Setting up a deployment system for alpha/beta testing

You may be aware of Apple’s Test Flight system for beta testing, and you may know that deploying apps on Android devices outside of the store is (relatively!) simple.

However, Test Flight requires apps to be approved by Apple’s testers, and sideloading Android apps still requires a few steps that confuse the average user.

On my projects, I use the Beta deployment system by Fabric, which allows me to fire off dozens of updates a day and keep my clients up to date as often as I want. This comes in especially handy when developing functionality that requires extra discussion — we can review multiple app updates in a single day and try out options right on our devices, rather than imagining what they will look like, or waiting for Apple to approve the latest iteration for Test Flight deployment.

Fabric Beta is an absolutely mandatory tool in our tool belt: it’s free, and it’s fairly user friendly. But there is a bit of setup involved in terms of getting signing certificates set up and inviting testers, so again you’ll want your developer to get this ready for you!

Setting up your task management and bug tracking

This is the one step that admittedly isn’t SUPER important, since in the early stages of most development projects, the team is fairly small and email and Skype/Slack etc. will work just fine. But services like Asana are free and easy, and it’s not a bad idea to start learning how to use them now, since soon enough they will be necessary!

Choosing your frameworks

This is where your choice of developer starts to really matter, and it’s the point just before we start to write that mythical first line of code.

Are we going to build our technology on top of a framework? And if so, which one?

There are a couple reasons to work with a framework, but the two most important are: they help keep your code readable and manageable if and when you bring on new developers, and they reduce the amount of code you have to write!

There are a few popular options you can use to build a mobile app on top of — two of them are Firebase and Parse. Both give you a common paradigm with which you can build out your system, and both give you some tools you can use to start to visualize your database, add data to your system, and monitor and administrate your system once it goes live.

But most important is what you can get out-of-the-box. Taking Parse, for example, you will get — right away on installation — the following:

  • Simple push notification functionality on both platforms
  • Basic email verification for new users
  • A robust user-authentication (signup, login, logout) system from day one
  • A file upload and hosting framework for handling, for example, user images or uploads
  • A well-defined framework for handling data validation (when and by whom objects can be created, deleted, or edited) and security (read/write access, security role definition, etc.)

And more. These are all things you will need at some point and, without a framework like Parse providing them out of the box, they will need to be coded by your developer. Leveraging a framework like Parse and the common systems it includes can cut literally hundreds of hours off of your development over the lifespan of your app.
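To make the “out of the box” point concrete, here is a rough sketch of what signing up a user looks like against a Parse Server REST endpoint. The server URL and application ID below are placeholders, and your deployment may require additional keys; this is an illustration, not production code.

```python
# Sketch: signing up a user via the Parse Server REST API.
# "https://example.com/parse" and "YOUR_APP_ID" are placeholders.
import json
import urllib.request

req = urllib.request.Request(
    "https://example.com/parse/users",            # hypothetical server URL
    data=json.dumps({
        "username": "alice",
        "password": "s3cret!",
        "email": "alice@example.com",
    }).encode("utf-8"),
    headers={
        "X-Parse-Application-Id": "YOUR_APP_ID",  # placeholder app ID
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the new user's objectId and
# session token -- no signup endpoint for your developer to write.
```

The point is not the five lines of HTTP, but everything behind them: password hashing, session management, and duplicate-username handling all come with the framework.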

Obviously your developer will need to know how to use the framework you choose, but more importantly your developer should have the expertise to help you choose which framework is right for your project!

Now we code!

And there you have it… all of the work that must be done before we start to write any code! It’s a lot, and it takes dozens of hours. But it is absolutely critical that you don’t take any shortcuts here if you really truly want to maximize your chances of success!

I have had founders come into my office asking for help saving a dying project, and when we dug in, I actually identified the initial point of failure right here — before coding even began. Sounds crazy, but it happens. This is really, really important stuff!

I’m here to help. I have spent my entire career guiding founders and established enterprises along the path to successful product launch. Feel free to reach out to me if you want to discuss any of this in detail. The majority of this is what I cover during the initial call with clients!

by stromdotcom at 2020-01-23 21:21


Most of the images in Glider PRO's resources are in PICT format.

The PICT format is basically a bunch of serialized QuickDraw opcodes and can contain a combination of both image and vector data.

The first goal is to get all of the known resources to parse.  The good news is that none of the resources in the Glider PRO application resources or any of the houses contain vector data, so it's 100% bitmaps.  The bad news is that the bitmaps have quite a bit of variation in their internal structure, and sometimes they don't match the display format.

Several images contain multiple images spliced together within the image data, and at least one image is 16-bit color even though the rest of the images are indexed color.  One is 4-bit indexed color instead of 8-bit.  Many of them are 1-bit, and the bit scheme for 1-bit images is also inverted compared to the usual expectations (i.e. 1 is black, 0 is white).

Adding to these complications, while it looks like all of the images are using the standard system palette, there's no guarantee that they will.  It's actually even possible to make a PICT image that combines multiple images with different color palettes, because the palette is defined per picture op, not per image file.

There's also a fun quirk where the PICT image frame doesn't necessarily have 0,0 as the top-left corner.

I think the best solution to this will simply be to change the display type to 32-bit and unpack PICT images to a single raster bitmap on load.  The game appears to use QuickDraw abstractions for all of its draw operations, so while it presumes that the color depth should be 8-bit, I don't think there's anything that will prevent GlidePort from using 32-bit instead.
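As a sketch of that unpacking step, expanding one row of a 1-bit image (with the inverted convention where a set bit means black) into 32-bit ARGB pixels might look like this; the helper is a hypothetical illustration, not GlidePort code:

```python
# Hypothetical sketch: expand packed 1-bit pixels (1 = black, 0 = white)
# into 0xAARRGGBB values, most significant bit first.
def unpack_1bit_row(row: bytes, width: int) -> list[int]:
    pixels = []
    for i in range(width):
        byte = row[i // 8]
        bit = (byte >> (7 - (i % 8))) & 1   # MSB-first bit order
        # In this format, a set bit means black
        pixels.append(0xFF000000 if bit else 0xFFFFFFFF)
    return pixels

row = bytes([0b10100000])     # pixels: black, white, black, then padding
pixels = unpack_1bit_row(row, 3)
```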

In the meantime, I've been able to convert all of the resources in the open source release to PNG format as a test, so it should be possible to now adapt that to a runtime PICT loader.

by OneEightHundred at 2019-11-23 20:43


Recently found out that Classic Mac game Glider PRO's source code was released, so I'm starting a project called GlidePort to bring it to Windows, ideally in as faithful of a reproduction as possible and using the original data files.  Some additions like gamepad support may come at a later time if this stays on track.

While this is a chance to restore one of the few iconic Mac-specific games of the era, it's also a chance to explore a lot of the technology of that era, so I'll be doing some dev diaries about the process.

Porting Glider has a number of technical challenges: It's very much coded for the Mac platform, which has a lot of peculiarities compared to POSIX and Windows.  The preferred language for Mac OS was originally Pascal, so the C standard library is often mostly or entirely unused, and the Macintosh Toolbox (the operating system API)  has differences like preferring length-prefixed strings instead of C-style null terminated strings.

Data is in big endian format, as it was originally made for Motorola 68k and PowerPC CPUs.  Data files are split into two "forks," one as a flat data stream and the other as a resource database that the toolbox provides parsing facilities for.  In Mac development, parsing individual data elements was generally the preferred style vs. reading in whole structures, which leads to data formats often having variable-length strings and no padding for character buffer space or alignment.
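The length-prefixed string convention can be illustrated in a few lines of Python (a sketch, not GlidePort code): one length byte, followed by that many bytes of text, with no null terminator.

```python
# Sketch: read a Mac Toolbox "Pascal" string -- a length byte followed by
# that many characters (no null terminator), here decoded as Mac Roman.
import struct

def read_pascal_string(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Return the string at `offset` and the offset just past it."""
    (length,) = struct.unpack_from(">B", data, offset)  # single unsigned byte
    raw = data[offset + 1 : offset + 1 + length]
    return raw.decode("mac_roman"), offset + 1 + length

# Example: the bytes a resource might contain for the string "Glider",
# followed by padding
blob = b"\x06Glider\x00\x00"
text, next_offset = read_pascal_string(blob)
```

Note how naturally this fits the "parse individual elements" style: the caller gets back the next offset rather than assuming a fixed-size record.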

Rendering is done using QuickDraw, the system-provided multimedia infrastructure.  Most images use the system-native PICT format, a vector format that is basically a list of QuickDraw commands.

At minimum, this'll require parsing a lot of Mac native resource formats, some Mac interchange formats (i.e. BinHex 4), reimplementation of a subset of QuickDraw and QuickTime, substitution of copyrighted fonts, and switch-out of numerous Mac-specific compiler extensions like dword literals and Pascal string escapes.

The plan for now is to implement the original UI in Qt, but I might rebuild the UI instead if that turns out to be impractical.

by OneEightHundred at 2019-10-10 02:03


When adding ETC support to Convection Texture Tools, I decided to try adapting the cluster fit algorithm used for desktop formats to ETC.

Cluster fit works by sorting the pixels into an order based on a color axis, and then repeatedly evaluating each possible combination of counts of the number of pixels assigned to each index.  It does so by taking the pixels and applying a least-squares fit to produce the endpoint line.

For ETC, this is simplified in a few ways: The axis is always 1,1,1, so the step of picking a good axis is unnecessary.  There is only one base color and the offsets are determined by the table index, so the clustering step would only solve the base color.

Assuming that you know what the offsets for each pixel are, the least squares fit amounts to simply subtracting the offset from each of the input pixels and averaging the result.
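Concretely, that averaging step looks like this (grayscale values and a made-up assignment of table offsets, purely for illustration):

```python
# Sketch of the least-squares step: with the offsets fixed, the best base
# color is just the average of (pixel - offset). Grayscale for brevity;
# the real computation runs per RGB channel.
def solve_base(pixels, offsets):
    assert len(pixels) == len(offsets)
    return sum(p - o for p, o in zip(pixels, offsets)) / len(pixels)

# 8 pixels of a 4x2 block, with offsets drawn from the table {-17,-5,5,17}
pixels  = [100, 120,  80,  90, 110,  95, 105, 100]
offsets = [  5,  17, -17,  -5,  17,  -5,   5,  -5]
base = solve_base(pixels, offsets)
```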

For a 4x2 block, there are 165 possible cluster configurations, but it turns out that some of those are redundant, given certain assumptions.  The base color is derived from the formula ((color1-offset1)+(color2-offset2)+...)/8, but since the adds are commutative, that's identical to ((color1+color2+...)-(offset1+offset2+...))/8

The first half of that is the total of the colors, which is constant.  The second is the total of the offsets.

Fortunately, not all of the possible combinations produce unique offsets.  Some of them cancel out, since adding 1 to or subtracting 1 from the count of the offsets that are negatives of each other produces no change.  In an example case, the count tuples (5,0,1,2) and (3,2,3,0) are the same, since 5*-L + 0*-S + 1*S + 2*L = 3*-L + 2*-S + 3*S + 0*L.

For most of the tables, this results in only 81 possible offset combinations.  For the first table, the large value is divisible by the small value, causing even more cancellations, and only 57 possible offset combinations.
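These counts are easy to verify by brute force. A quick sketch, using what I believe are the standard ETC1 modifier tables (table 0 is {-8, -2, 2, 8} and table 1 is {-17, -5, 5, 17}):

```python
# Enumerate every way to assign 8 pixels across the four offsets
# {-L, -S, S, L} and count how many distinct offset totals survive.
def offset_combos(S, L):
    totals = set()
    count = 0
    for a in range(9):                  # pixels assigned offset -L
        for b in range(9 - a):          # pixels assigned offset -S
            for c in range(9 - a - b):  # pixels assigned offset +S
                d = 8 - a - b - c       # the rest get offset +L
                count += 1
                totals.add(-a * L - b * S + c * S + d * L)
    return count, len(totals)

combos, table1 = offset_combos(5, 17)   # a "generic" table
_, table0 = offset_combos(2, 8)         # first table: L divisible by S
```

Running this gives 165 cluster configurations, 81 distinct offset totals for table 1, and 57 for table 0, matching the counts above.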

Finally, most of the base colors produced by the offset combinations are not unique after quantization: Individual mode only has 4-bit color resolution, and differential mode only has 5-bit resolution, so after quantization, many of the results get mapped to the same color.  Deduplicating them is also inexpensive: If the offsets are checked in ascending order, then once the candidate color progresses past the threshold where the result could map to a specific quantized color, it will never cross back below that threshold, so deduplication only needs to inspect the last appended quantized color.

Together, these reduce the candidate set of base colors to a fairly small number, creating a compact, high-quality search space at low cost.

There are a few circumstances where these assumptions don't hold:

One is when the clamping behavior comes into effect, particularly when a pixel channel's value is near 0 or 255.  In that case, this algorithm can't account for the fact that changing the value of the base color would have no effect on some of the offset colors.

One is when the pixels are not of equal importance, such as when using weight-by-alpha, which makes the offset additions non-commutative, but that only invalidates the cancellation part of the algorithm.  The color total can be pre-weighted, and the rest of the algorithm would have to rely on working more like cluster fit: Sort the colors along the 1,1,1 line and determine the weights for the pixels in that order, generate all 165 cluster combinations, and compute the weight totals for each one.  Sort them into ascending order, and then the rest of the algorithm should work.

One is when dealing with differential mode constraints, since not all base color pairs are legal.  There are some cases where a base color pair that is just barely illegal could be made legal by nudging the colors closer together, but in practice, this is rare: Usually, there is already a very similar individual mode color pair, or another differential mode pair that is only slightly worse.

In CVTT, I deal with differential mode by evaluating all of the possibilities and picking the best legal pair.  There's a shortcut case when the best base color for both blocks produces a legal differential mode pair, but this is admittedly a bit less than optimal: It picks the first evaluation in the case of a tie when searching for the best, but since blocks are evaluated starting with the largest combined negative offset, it's a bit more likely to pick colors far away from the base than colors close to the base, even though colors closer to the average tend to produce smaller offsets and are more likely to be legal, so this could be improved by making the tie-breaking function prefer smaller offsets.

In practice though, the differential mode search is not where most of the computation time is spent: Evaluating the actual base colors is.

As with the rest of CVTT's codecs, brute force is still key: The codec is designed to use 8-wide SSE2 16-bit math ops wherever possible to process 8 blocks at once, but this creates a number of challenges since sorting and list creation are not amenable to vectorization.  I solve this by careful insertion of scalar ops, and the entire differential mode part is scalar as well.  Fortunately, as stated, the parts that have to be scalar are not major contributors to the encoding time.

You can grab the stand-alone CVTT encoding kernels here:

by OneEightHundred at 2019-09-06 00:47


Hands down the most terrifying phase of building a new tech startup is launch. So many pieces of a very complex puzzle have to come together at exactly the same time in exactly the right way for things to go right.

If you’ve never gone through it, you may be wondering what on Earth I’m on about. After all, isn’t the goal just to get the product out the door as fast as possible? Release early, release often and all that, right? Well, not in my experience. I have actually witnessed many products that launched way too early! But more importantly, I’ve seen teams that launched way too wrongly.

While experience is helpful in getting all of those moving parts in sync and timed up perfectly, it also contributes to that overall dread. The more products you’ve launched, the more you know how many things can go wrong, and just how slightly one variable can be off to throw the whole machine into chaos and, worst of all, how much luck is actually involved!

I’m happy to say our launch of Graphite Comics in June went as well as we could have possibly hoped. Heck, how often can you say your app launch was covered by The New York Times? But while I’d like to say we just got lucky, or our product was just that good — the truth is, a lot of work, stress, timing and money went into getting the product ready, the messaging ready, the branding ready, the team ready, the content ready… and so on and so on.

I’m at a point in my career now where I can say I have launched about a hundred products, and as such, about a hundred companies — although “company” is a broad term. Some “companies” are just simple apps that make a couple bucks and go away. Some “companies” are, like Graphite Comics, venture backed endeavors made up of millions of dollars worth of technology, serving top quality content curated and sourced from some of the biggest and brightest creators and publishers in the comic book industry. The amount of energy that went into this launch was nothing short of enormous, especially given the (relatively) small size of the team and our working budget.

We poured years into this product, and we hope it shows.

You can check out Graphite Comics on the web, on iPhone and iPad in the App Store, or on Android in Google Play.

Are you looking to launch your app or tech product? Or are you at the idea phase, ready to start building the foundation for the product you ultimately hope to launch one day? Get in touch! I’d love to talk to you about your company and where you are at!

by stromdotcom at 2019-08-25 00:04


It would take me a while to get used to carrying my urine and feces around…

by Factor Mystic at 2019-05-23 21:51


Although there are readily available skeletons for purchase, I want to print my own skeleton for “profiling and debugging” purposes :)


by Factor Mystic at 2019-04-05 22:10


The March update of Graphite Comics just went live. It’s one of those updates that is very significant, with major changes under the hood — but which users will almost certainly not notice at all.

This makes for an interesting opportunity to talk about balancing needs in software development. In this case, we are looking at balancing speed and efficiency against operating costs. And even more interestingly, we can look at an example where a technical decision that improved efficiency and user experience evolved, over time, to actually have a detrimental effect on user experience!

In the early versions of Graphite, we didn’t have many of the features we have now — notably playlists, notifications, favorites, and so on. We really just had a big list of titles, and when you selected a title, we went out and fetched the books for that title. In order to keep things quick, when we fetched the books, we also fetched the pages for that book. That way when you selected a book to read, you didn’t have to wait for the app to go out and fetch the pages — you could just dive right in super quick.

The user experience, naturally, was amazing. Load times are a sure way to kill the mood, and we were avoiding load times completely by pre-loading the content.

In terms of economy, there was of course a minor increase in cost. This is a given, because we were loading content that may or may not have been necessary or needed. But that cost was so low as to be negligible.

A typical example in the beginning would look like this: a user opened a title, causing the app to fetch the books for that title. Titles had an average of around 10-20 books back then, each with 20-30 pages. So we were loading somewhere around 200-600 objects ahead of time. That might sound like a lot, but it really isn’t these days (I have a mental limit of 1000 objects that I keep in my head as an ideal maximum to fetch at once, and that’s just a personal preference), and because our system is designed efficiently, the overhead and cost to make queries like that is incredibly low. Fetching all of that data ahead of time added a tiny fraction of a second to the initial query, and an even smaller fraction of a penny to the cost of the query, but saved an entire query and all the time a query takes when the issue was tapped. The result was a great user experience at a low overall cost impact.

Over time, however, a few things changed. First, we started adding many, many more books to our system. Titles could now have hundreds or thousands of books. Even worse, books were starting to come in with hundreds of pages — sometimes up to 500 pages!

Additionally, we started adding new features like playlists and notifications. Those playlists included books, and notifications announced the release of new books. Similar to the query we ran when viewing a title, we were preloading books and pages for notifications (and other objects) that referenced a book — that way you could look at your notification feed, tap a book, and immediately — without an additional query — start reading that book. Neat!

Except for those big books, and the amount of top level objects being queried. In the notifications example, we were pulling down up to 1000 objects at the top level. Let’s look at a worst case scenario: you have 1000 notifications which alert you to a new book, each of which has 150 pages. That query is pulling down 1000 top level objects, each with a book (so another 1000 objects), each with 150 pages — that’s 152,000 objects in one query!
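Worked out explicitly, the per-level counts in that worst case are:

```python
# Worst-case fetch: every notification, its book, and every page of
# every book, all in a single query.
notifications = 1000
pages_per_book = 150

top_level = notifications                  # the notification objects
books     = notifications                  # one book per notification
pages     = notifications * pages_per_book # 150,000 page objects
total     = top_level + books + pages
```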

Suddenly this low impact query which greatly improved user experience by pre-loading content became our longest and slowest query by far, and completely ruined the experience on a few screens. At that point, it became much more efficient and economical to load pages on demand.

Of course, there are ways to improve on this as well, and those improvements were part of the March update. For starters, we cache those loaded pages so, as long as the cache is still fresh, you don’t have to re-fetch pages that were loaded recently (if, for example, you open a book you recently closed).
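A minimal sketch of that caching idea (the TTL and data structure here are assumptions for illustration, not the actual Graphite implementation):

```python
# Sketch: cache fetched pages per book, and only re-fetch when the cached
# copy is older than some freshness window.
import time

CACHE_TTL_SECONDS = 300          # assumed freshness window, for illustration
_page_cache = {}                 # book_id -> (timestamp, pages)

def get_pages(book_id, fetch_fn, now=None):
    now = time.time() if now is None else now
    cached = _page_cache.get(book_id)
    if cached and now - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]         # fresh: no query needed
    pages = fetch_fn(book_id)    # stale or missing: hit the backend
    _page_cache[book_id] = (now, pages)
    return pages
```

Re-opening a recently closed book hits the first branch, so the reader never pays for the same page list twice in a row.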

Also, as we have done from the beginning, we load the actual page content on demand — that’s what makes Graphite a streaming platform, after all — so the real heavy lifting of page content doesn’t happen until you are very close to viewing the page itself. So all in all, the impact of this change ended up being negligible, especially in light of the improvement to load times on those screens that contained hundreds or thousands of top level objects.

If you want to see this in action, download Graphite Comics for iOS or Graphite Comics for Android and start reading today! Graphite is a free to read, ad and subscription based platform for comic books which is currently in a public beta ahead of our official launch at San Diego Comic Con 2019.


by stromdotcom at 2019-03-20 21:40


I’ve updated my social networking demo app Scrawl to include a couple often requested features – Universal links, email verification, and password reset.

In addition, I’ve integrated Mailgun into the backend application in order to allow the server to send email messages to users (which is obviously a requirement for two of these features).

If you are interested in a slightly deep dive into these features, read on!

Universal links

The first feature added to Scrawl is universal links. In a nutshell, this feature essentially allows our app to request that it be the first responder for any requests to a given domain.

For example, if your app is called MyApp and you own the domain, you can set up your app to handle any links to that domain. Instead of the device opening Safari and loading up a webpage, the URL can be passed to the app, and the app can handle the request.

This is great because it allows you to create a mobile website that works perfectly fine for users who don’t have your app installed, but allows those users who do have your app installed to get the best, native app experience possible.

Scrawl actually now supports two URL handling schemes: it handles universal links to its domain as well as responding to its own custom URL scheme scrawl:// — the latter scheme is often useful for inter-app communication and data passing, whereas the former is convenient for users to pass around links to and into your app.

There’s another great advantage to using universal links: iOS devices will store your user’s credentials as if they were website passwords — which makes things much, much easier for your users when they reinstall your app and want to sign in again quickly!
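For reference, the server side of universal links is a small JSON file (the apple-app-site-association file) served from your domain; the team ID, bundle ID, and paths below are placeholders, so substitute your own:

```json
{
  "applinks": {
    "apps": [],
    "details": [
      {
        "appID": "TEAMID.com.example.scrawl",
        "paths": [ "/verify/*", "/reset/*" ]
      }
    ]
  }
}
```

iOS fetches this file when the app is installed, which is how the device knows your app is allowed to claim links for those paths.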

In Scrawl, I’m using universal links to handle two kinds of links: email validation, and password reset requests.

Email verification

Often, you will want to make sure users verify their identity in some way before they have access to some or all of the features in your app. The simplest and most common validation is email verification.

In order to handle email validation requests, a few things have to happen:

  1. The user object needs to be updated to include a column to denote whether the email address was validated — BUT this column must only be writable by the server! Users should not be able to overwrite this column themselves (e.g. via a public REST API or any other method).
  2. When a user signs up, or when the user requests a new validation email, the server should create an object in the database, generate a token, store that token in the new object, and send an email to the address with a link back to the app. This object must be creatable, readable, and writable only by the server.
  3. When the user clicks this link, the app should handle the request by validating that the request was issued by the server and, if so, mark that new column as true. This is done by simply checking the database to see if an object exists for the given email address matching the token. Since the server created the token and sent it to the email address, and since the object we stored it in is not readable, we can safely assume that anyone who has the token at least has access to the email address.
  4. The validation request should then be deleted from the server.
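The steps above can be sketched in a few lines of server-side code. This is a minimal, framework-agnostic illustration; the names (`VERIFICATIONS`, `create_verification`, the example domain) are hypothetical stand-ins, not Scrawl's actual implementation:

```python
import hashlib
import hmac
import secrets

VERIFICATIONS = {}  # email -> hashed token; server-only storage in practice

def create_verification(email: str) -> str:
    """Step 2: generate a token, store it server-side, and build the email link."""
    token = secrets.token_urlsafe(32)
    # Store only a hash, so a leaked table doesn't leak usable tokens.
    VERIFICATIONS[email] = hashlib.sha256(token.encode()).hexdigest()
    return f"https://example.com/verify?email={email}&token={token}"

def verify(email: str, token: str) -> bool:
    """Steps 3-4: check the token, mark the user verified, delete the request."""
    stored = VERIFICATIONS.get(email)
    if stored is None:
        return False
    if not hmac.compare_digest(stored, hashlib.sha256(token.encode()).hexdigest()):
        return False
    del VERIFICATIONS[email]  # step 4: each link is one-shot
    return True
```

Deleting the record after a successful check is what makes each link single-use, and the constant-time comparison avoids leaking token prefixes through timing.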

This is all pretty straightforward. The interesting thing about email validation is how to handle the case when the user hasn’t validated their email yet.

Do we pop up a window and refuse to close it until the validation is complete, essentially blocking the user out until validation is successful?

Or do we simply check that column before we allow the user to perform certain actions — both on the client side and the server side?

In Scrawl, I opted for the latter option — primarily because I don’t like putting walls in front of users, and prefer to let them do as much as we can possibly allow without interrupting their use of the app.

Specifically, I want to prevent the user from creating any content — a place, a comment, a photo, etc — until they have validated their email address. Other types of actions — like voting on places or comments, or simply reading comments, are perfectly fine to do even if you haven’t verified your email address.

This approach is consistent with my overall philosophy of not annoying the user! Let them have as much of the experience as possible before you ask them to sign up, pay, verify their identity, etc.

However, that’s not always appropriate, and sometimes “as much as we can allow” is… well, nothing!

To see an example of the former case, you can check out GlobeChat, which requires validation to finish signing up or logging in. In GlobeChat’s case, the validation email is sent immediately upon signup (and upon login, if the user has not verified their email yet!), and a modal dialog essentially locks users out until validation occurs.

The final piece of the pie is gatekeeping on the client and server side. I far too often see projects where there is only one check — either client or server — before allowing access to a feature that demands verification.

The server should always have the final say, and in the worst case, if you really, really can only do that check in one place, please make it the server!

That said, client side checks are super important to make sure you don’t waste calls to the server on requests you already know will be rejected. Even more important, you should alert users that what they are trying to do is going to fail, without making them wait for a network request to complete. Better yet, make sure the UI reflects this fact so they can’t even try!
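Here is one way that double gate can look in practice. This is an illustrative sketch, not Scrawl's code; the decorator name, the user dictionary shape, and the exception are all invented for the example:

```python
from functools import wraps

class NotVerified(Exception):
    pass

def requires_verified_email(handler):
    """Server-side gate: the final say, regardless of what the client did."""
    @wraps(handler)
    def wrapper(user, *args, **kwargs):
        if not user.get("email_verified"):
            raise NotVerified("verify your email to do this")
        return handler(user, *args, **kwargs)
    return wrapper

@requires_verified_email
def create_post(user, text):
    # A hypothetical content-creation endpoint.
    return {"author": user["name"], "text": text}

def client_can_post(user) -> bool:
    """Client-side pre-check: disable the UI instead of burning a request."""
    return bool(user.get("email_verified"))
```

The client check exists purely for UX; only the server-side decorator actually protects the data.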

Password reset

When a user requests a password reset, the server needs to generate a token, and send an email to the user with a link back into the app.

When the user clicks the link, the client app should verify with the server that the request is valid, and then present the user with a password reset screen.

When the password is reset, the server should update the user, and delete the request.
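Sketched out, the reset flow mirrors the verification flow, with the addition of an expiry on the token. Again, every name here (`RESET_REQUESTS`, `request_reset`, the example domain) is a hypothetical stand-in:

```python
import hashlib
import hmac
import secrets
import time

RESET_REQUESTS = {}  # email -> (hashed token, expiry timestamp)
PASSWORDS = {}       # email -> password hash (stand-in for the user table)

def request_reset(email: str, ttl: float = 3600.0) -> str:
    """Generate a token, store it server-side, and build the email link."""
    token = secrets.token_urlsafe(32)
    RESET_REQUESTS[email] = (
        hashlib.sha256(token.encode()).hexdigest(),
        time.time() + ttl,  # reset tokens should expire
    )
    return f"https://example.com/reset?email={email}&token={token}"

def reset_password(email: str, token: str, new_password: str) -> bool:
    """Validate the token, update the user, and delete the request."""
    entry = RESET_REQUESTS.get(email)
    if entry is None or time.time() > entry[1]:
        return False
    if not hmac.compare_digest(entry[0], hashlib.sha256(token.encode()).hexdigest()):
        return False
    PASSWORDS[email] = hashlib.sha256(new_password.encode()).hexdigest()
    del RESET_REQUESTS[email]  # the request is single-use
    return True
```

A real implementation would use a proper password hash (bcrypt, scrypt, or argon2) rather than bare SHA-256, but the token lifecycle is the part that matters here.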

This probably seems really straightforward, but it is absolutely critical to get features like this right. If email validation or password reset requests are not secure and well built, users could potentially change other users’ passwords, create accounts with fake emails… or worse. Scrawl now reflects what I consider best practices in implementing these features.

by stromdotcom at 2019-03-01 01:31


In this series of posts, I want to discuss several myths, misconceptions, and misunderstandings that threaten to derail inexperienced or non-technical founders of tech startups.  

Note: this post also appears on Glowdot with permission.

In the beginning stages of planning an app development, there are two ubiquitous questions that get asked, and they are the obvious ones:

  • How much will it cost to build this app?
  • How long will it take to build this app?

These are not unreasonable questions — indeed, when building anything the first two things you need to wrap your head around are budget and timeline.

However, these questions become much more difficult — if not impossible — to answer when it comes to software. Let’s look at why that is.

Software is not physical

To start, when we set out to build software, we really need to understand that we aren’t building a physical, standalone thing that exists in its own space, on its own terms, independent of anything else.

Unlike, say, a house or a car which may need occasional maintenance or upgrades, software often needs constant maintenance and upgrades. It also often needs to evolve. There are a few common reasons for this.

The platform your software runs on has changed

Apple and Google constantly roll out updates to the software (iOS and Android) that powers our devices. These updates can include improvements to overall performance, but they can also include breaking changes that cause our apps to suddenly stop working as expected. This includes things like changes to the way the OS allows apps to interface with hardware (permissions, drivers, etc) or changes to the SDK we use to perform OS-exposed tasks (deprecated methods, etc).

The APIs your software uses have changed

If your app relies on third party APIs — as many do — then you need to respond to changes in those APIs often.

A very common example is the Facebook SDK, which changes frequently, with Facebook regularly retiring old methods. Sometimes these changes can be handled easily by updating the SDK and rewriting a few lines of code. Sometimes they can fundamentally change the way your app works.

A few years ago, Facebook changed the way apps could request a user’s friend data. And by “changed the way” I really mean, took away the option. You may remember a time when mobile games were spamming your Facebook wall with requests for you to join so-and-so in a game. When Facebook took away the ability for apps to request a user’s friends (without special permission to do so), many, many apps had to completely change the way they handled on-boarding and user acquisition.

Users are not static

Your users change, because people change. In order to stay relevant, your software is going to have to evolve to keep up with user expectations.

UI/UX expectations change

In the many years I have been building software, I have seen tremendous changes in the expectations of users in terms of not only functionality and performance, but even more importantly user interface (UI) and user experience (UX).

It’s important to remember that the dominant apps — the Facebooks and Instagrams and so on — have whole teams devoted to improving the UI/UX of their apps, even publishing papers on new theories of human-computer interaction. When Instagram changes its interface, there is often a shockwave that affects every other app out in the wild, as the UI paradigm shifts in another direction.

User requirements change

Users’ needs change often! Let’s say you have an app that allows people to upload images, get a link, and post the link to their images in various places.

This works great, until a new social network comes along, and they forbid posting links. They do allow posting a special kind of code however. We have seen this in the past in the form of WordPress shortcodes, BBCode, and so on.

If that site blows up, you will definitely want your app to adapt and allow your users to post their images on this new site. If you don’t, I can assure you someone else will come along and fill that user need!

Your expectations change!

There is another, even more common change that happens: you change your mind! It happens much, much more often than you might think. As development progresses and you get to hold in your hands this thing that started as just an idea, you start to get a different perspective on it. You realize there are things you didn’t consider before. You realize some features you thought were critical are actually useless, or at least less important. You simply reconsider and adjust, and therefore change direction. Sometimes, you may pivot and change the concept entirely!

Although in the best case we always set out to build the thing we originally wanted to build, it’s good to keep an open mind and remain flexible.

Bug free software is not a thing!

It pains me to say this, but there really is no such thing as bug free software. This is especially true as your software grows in scope and complexity.

What we strive to achieve, really, is a minimum set of bugs. There is not a single app you use that doesn’t have a giant bug list waiting to be tackled. Pull back the curtain on any piece of software, no matter the type, platform, or author, and you will find a dedicated bug tracking system behind the scenes, with people constantly entering bugs and developers constantly pulling out reports and trying to fix them. It’s just a reality of software development.

That being said, your users should be blissfully unaware of the number of issues in the queue — and often, they are. Many of the bugs in these lists are minor issues, or issues that affect only a tiny subset of users doing very specific things. However, they still linger and need to be addressed — and as your software evolves, I can almost guarantee new ones will pop up.

Changes trigger new requirements

This is a big one. A change in one place often requires many changes in many other places. Being able to predict those new requirements comes with experience.

For example, let’s say the CEO of a blog platform app hears from users that they want to be able to upload photos with their posts. He asks the CTO how long it will take.

The inexperienced CTO thinks about it, realizes adding the client side functionality to choose and upload a photo is pretty simple, and the backend capability of receiving and storing the photo is also pretty simple, and linking the two together by adding a column to the Posts table in the database is a piece of cake, so he answers: “that’ll take a couple days!”.

The experienced CTO realizes that adding photo uploads creates a whole new set of requirements beyond just uploading a photo. We need to be able to delete photos, and replace photos. We need to be able to moderate user uploads to keep unwanted content off our platform — this alone often requires building a whole new software bundle to allow content moderators to do their work! We need to think about things like validating files on the client side and the server side (making sure users don’t upload files that aren’t images). We probably want to add some logic to resize uploads so that if a popular user uploads a 1GB image we don’t have to serve that giant file to everyone who views their post.

And on and on and on. That one simple feature request cascades out into a whole new set of requirements — and possibly those new requirements create their own requirements!
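To pick one concrete slice of that cascade: validating on the server that an upload really is an image means checking the file's magic bytes rather than trusting its extension. A minimal sketch (a real service would also enforce size limits and re-encode the image; the function name is invented for illustration):

```python
from typing import Optional

# Well-known magic bytes for common image formats.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
GIF_MAGICS = (b"GIF87a", b"GIF89a")

def sniff_image_type(data: bytes) -> Optional[str]:
    """Identify an upload by its leading bytes, ignoring the filename."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(GIF_MAGICS):
        return "gif"
    return None  # reject anything we can't identify
```

Even this tiny check implies more work downstream: what error do you show the user, do you log the rejection, and who reviews the files that pass?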

And the list goes on

There are even more reasons we can discuss, but hopefully I’ve made the point.

As a final example, the first big product I ever built was a social network and media sharing site that grew alongside MySpace in the early 2000s.

It started small — just a form to upload photos and get a sharing code running on a single, humble server. Eventually it evolved into a full-fledged social network, and grew to the point where it ran in a stack of highly powerful servers in a dedicated datacenter.

Let’s use that site to answer the original question here: when was development “finished”? When it was retired 12 years after it was launched! New features were constantly being added or removed according to user demands. New enhancements were constantly being made as traffic increased. It truly was a living, evolving being that needed constant care and maintenance, and that constant care and maintenance is largely why it was so successful.

So what’s the answer?

I’ve discussed in part why the original two questions that started this blog post are difficult if not impossible to answer. But I also noted that those questions are crucial to ask before starting any development! So how do we answer those questions?

The key is getting specific. Instead of asking “how much does it cost to build this app?” we break down the lifespan of the product into chunks we can wrap our head around — workshops, prototypes, alpha/beta development phases, release candidates and, after release, versioned updates. The question then becomes, “how long will it take to prototype an app with the following features….” or “how much will it take to get an MVP of my app, on iOS and Android, with this limited feature set“?

In this way, we can more accurately estimate how long a project will take and at what cost. Often, the first quote and proposal I offer is a package of several of those questions — we plan out as far ahead as is reasonable. This usually means a few workshops to set the plan, a prototype, alpha and beta development phases where we build a few critical “must have” features, and then a final pre-release phase where we make sure everything we need to release the app publicly — aka the MVP, or minimum viable product — is in place. And then we stop there.

Once we reach release, then we can regroup, discuss where we are, put together a list of feature requests, changes, and bug fixes, and estimate a timeline and budget for that new chunk of development. Not only does this help us answer those two hard questions more accurately, it helps us stay focused and on track by encouraging us all to not bite off more than we can chew.

By keeping the size and scope of each phase manageable, we set attainable goals rather than looking too far into the future and risk getting lost along the way.

There is an extra bit of knowledge here: if you are shopping around for quotes and you ask generally “how much will it cost to make this app?” and you get an explicit answer — run and don’t look back. This is a developer telling you what they think you want to hear, not what you need to hear. Developers like this will do anything to get you to sign on with them: telling you anything and everything is possible, agreeing to every feature request without any advice on how it might be approached differently, more efficiently, or more cost effectively, and approaching your project as a single task instead of the complex procedure that it is.

by stromdotcom at 2019-02-25 17:11


Graphite is a scalable, efficient and multi-platform graphical content distribution system for mobile devices and the web.

I designed and developed the backend system powering Graphite: a Node.js based system built on several AWS services, along with some locally hosted server functionality (mostly handling maintenance jobs and statistical analysis of the live system) and a media sharing system that facilitates onboarding new users coming from social media sites like Twitter and Facebook.

I also developed the iOS app for Graphite — one of the biggest and most complex mobile projects I have ever taken on. Although on the surface Graphite seems quite simple, in fact the technology powering it is extremely sophisticated and complex.

In addition to developing the iOS app and the server-side platform, I currently manage the development of every other current and future platform — including Android, the web, and a few other platforms on the roadmap.

You can download the public beta of Graphite on iOS and on Android and visit the Graphite website here.

If you need an app developed, reach out and let’s talk! You can contact me using the contact form on this site, via Skype at stromdotcom or by visiting my company website at

by stromdotcom at 2019-02-12 20:13

Scrawl is a social media app developed over the course of one month, representing a typical social media app project.

Aside from being a pretty neat idea, I actually launched this app to demonstrate a few critical concepts that many first time (and even some fairly experienced!) mobile entrepreneurs might not be aware of.  Let’s take a look at a few of them!

Delayed permission requests

One of the surest ways to lose a new user is to ask for permissions too soon — or worse, immediately upon first launch of the app.  Even if you are lucky and don’t immediately cause that user to delete your app, you will almost certainly end up with that user denying your request.

The reason for this is pretty simple, if not immediately obvious.  A new user really doesn’t know much about you or your app at all.  And you certainly haven’t earned their trust yet.  But here you are, asking for access to their photos, contacts, location… wouldn’t your first response be, “wait, who are you and what are you going to do with this information?”

And while the push notifications permission might not be invasive in terms of privacy, it is invasive in terms of annoyance factor for many apps, as it has been chronically abused in the past.  Consequently, most users tend to answer “no” unless they have a compelling reason to say “yes”.

The worst part of all this is that once a user says no, it’s really hard to get them to go into the Settings app and change their answer to yes — primarily because most users don’t even know how to do that, even if they wanted to.

A much better approach is to delay asking for those permissions until you have spent a little time with that user, and given them a good reason to accept the request.  In Scrawl, we do this in a few simple ways:

  • Push notifications are not requested until you perform an action for which you might receive a notification.  In Scrawl, you can receive notifications when users reply to you, or post in a place you have favorited.  So we don’t ask for push permission until you post something, or favorite a place.
  • For location permission, we start users off as if they were sitting in our office in Santa Monica, CA.  At the top of the “nearby places” list, we let them know they are looking at places near us, and offer them the option of sharing their location so we can show them places near them.  In this way, we allow the user to request that we accept their share, rather than intrusively ask them where they are.
  • For camera and photo library permissions, we ask only when the user requests to share an image.  This isn’t that interesting, however, as most apps work like this.  It is, in fact, default behavior in iOS apps.

Aside from waiting to ask, we employ one more strategy here to maximize acceptance of permission requests: we ask the user ourselves before we ask iOS to make the request.  This is a slightly more advanced concept, based on the fact that once iOS itself asks for a permission, it will never ask again — the user MUST change their answer in the Settings app.  So instead of immediately requesting the push notification permission, for example, we pop up an explanatory dialog explaining why we want it, and asking the user if they are interested in granting it.  If they say no, we don’t ask them again for a while, giving them a chance to get more comfortable in the app, and giving us a chance to earn their trust.  If they say yes, then we tell iOS to go ahead and request the permission.  If they said yes to us, there’s a pretty good chance they’ll say yes to iOS.  So when the request is made, even though it’s the last time it will ever be made, the odds that it will be accepted are much, much higher.
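The logic of that "soft prompt" pattern is simple enough to sketch in a platform-agnostic way. The class name, cooldown length, and state handling below are invented for illustration; on iOS, the hand-off at the end would trigger the one-and-only real system prompt:

```python
import time

class SoftPermissionPrompt:
    """Tracks our own pre-prompt so the real OS prompt is only fired on a 'yes'."""

    def __init__(self, cooldown_seconds: float = 7 * 24 * 3600):
        self.cooldown = cooldown_seconds
        self.last_declined = None
        self.system_prompt_used = False  # the OS only ever asks once

    def should_show_soft_prompt(self, now=None) -> bool:
        now = time.time() if now is None else now
        if self.system_prompt_used:
            return False  # nothing left to ask; only Settings can change it
        if self.last_declined is None:
            return True
        # Wait out the cooldown before asking again.
        return now - self.last_declined >= self.cooldown

    def record_answer(self, accepted: bool, now=None):
        if accepted:
            self.system_prompt_used = True  # hand off to the real OS prompt
        else:
            self.last_declined = time.time() if now is None else now
```

The key property is that a "no" to our dialog costs nothing: we simply wait and ask again later, whereas a "no" to the system prompt would be effectively permanent.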

Delayed log in requests

As a general principle, we like to delay asking a user to log in or sign up as long as we possibly can.  At a bare minimum, we try with all our might to avoid requesting users log in immediately upon launching the app!  For some apps this is unavoidable, but in most situations there is a set of functionality that is perfectly reasonable to offer to anonymous users.

In Scrawl, for example, users can search places and read posts, look at photos, and even report objectionable content without logging in.  But to create a place, post a message, upload a photo or vote on any content, they need to log in.  When a user attempts to do one of those things, that’s when we request they log in or sign up — not before.
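That gate boils down to a simple mapping from actions to auth requirements, checked before any login screen ever appears. A sketch, with action names mirroring Scrawl's features but the function itself hypothetical:

```python
# Reads and reports are fine anonymously; anything that creates or
# ranks content requires an account.
ANONYMOUS_OK = {"search_places", "read_posts", "view_photos", "report_content"}

def needs_login(action: str) -> bool:
    if action in ANONYMOUS_OK:
        return False
    # Fail closed: any unrecognized action requires login.
    return True
```

Failing closed on unknown actions means a newly added feature is login-gated by default until someone deliberately whitelists it.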

Content moderation

One of the most important things to consider when creating a social network, and especially one which centers on user generated content (UGC), is content moderation.  You need to have a way to make sure objectionable content is quickly removed and malicious users are removed from the system.  I won’t go into detail about our moderation system here for obvious reasons, but we have created a model for self-moderation that allows objectionable content to be filtered out of the system until such a time that a human moderator can permanently remove it and moderate the offending user accounts.  It’s a pretty smart system, and one that we believe fixes this major problem which has plagued other similar apps in the past.

If you’d like to see Scrawl in action, we’d love for you to check it out on iOS here:

Scrawl on the App Store.

An Android version is coming soon!

Scrawl – post anything, anywhere, anytime!

If you need an app developed, reach out and let’s talk! You can contact me using the contact form on this site, via Skype at stromdotcom or by visiting my company website at

by stromdotcom at 2019-02-12 00:06


Originating in a thesis, REST is an attempt to explain what makes the browser distinct from other networked applications.

You might be able to imagine a few reasons why: there’s tabs, there’s a back button too, but what makes the browser unique is that a browser can be used to check email, without knowing anything about POP3 or IMAP.

Although every piece of software inevitably grows to check email, the browser is unique in the ability to work with lots of different services without configuration—this is what REST is all about.

HTML only has links and forms, but it’s enough to build incredibly complex applications. HTTP only has GET and POST, but that’s enough to know when to cache or retry things. HTTP uses URLs, so it’s easy to route messages to different places too.

Unlike almost every other networked application, the browser is remarkably interoperable. The thesis was an attempt to explain how that came to be, and called the resulting style REST.

REST is about having a way to describe services (HTML), to identify them (URLs), and to talk to them (HTTP), where you can cache, proxy, or reroute messages, and break up large or long requests into smaller interlinked ones too.

How REST does this isn’t exactly clear.

The thesis breaks down the design of the web into a number of constraints—Client-Server, Stateless, Caching, Uniform Interface, Layering, and Code-on-Demand—but it is all too easy to follow them and end up with something that can’t be used in a browser.

REST without a browser means little more than “I have no idea what I am doing, but I think it is better than what you are doing.”, or worse “We made our API look like a database table, we don’t know why”. Instead of interoperable tools, we have arguments about PUT or POST, endless debates over how a URL should look, and somehow always end up with a CRUD API and absolutely no browsing.

There are some examples of browsers that don’t use HTML, but many of these HTML replacements are for describing collections, and as a result most of the browsers resemble file browsing more than web browsing. It’s not to say you need a back and a next button, but it should be possible for one program to work with a variety of services.

For an RPC service you might think about a curl-like tool for sending requests to a service:

$ rpctl http://service/ describe MyService
methods: ...., my_method

$ rpctl http://service/ describe MyService.my_method
arguments: name, age

$ rpctl http://service/ call MyService.my_method --name="James" --age=31
   message: "Hello, James!"

You can also imagine a single command line tool for databases that might resemble kubectl:

$ dbctl http://service/ list ModelName --where-age=23
$ dbctl http://service/ create ModelName --name=Sam --age=23
$ ...

Now imagine using the same command line tool for both, and using the same command line tool for every service—that’s the point of REST. Almost.

$ apictl call MyService:my_method --arg=...
$ apictl delete MyModel --where-arg=...
$ apictl tail MyContainers:logs --where ...
$ apictl help MyService

You could implement a command line tool like this without going through the hassle of reading a thesis. You could download a schema in advance, or load it at runtime, and use it to create requests and parse responses, but REST is quite a bit more than being able to reflect, or describe a service at runtime.

The REST constraints require using a common format for the contents of messages so that the command line tool doesn’t need configuring, require sending the messages in a way that allows you to proxy, cache, or reroute them without fully understanding their contents.

REST is also a way to break apart long or large messages up into smaller ones linked together—something far more than just learning what commands can be sent at runtime, but allowing a response to explain how to fetch the next part in sequence.

To demonstrate, take an RPC service with a long running method call:

class MyService(Service):
    def long_running_call(self, args: str) -> bool:
        id = third_party.start_process(args)
        while third_party.wait(id):
            pass
        return third_party.is_success(id)

When a response is too big, you have to break it down into smaller responses. When a method is slow, you have to break it down into one method to start the process, and another method to check if it’s finished.

class MyService(Service):
    def start_long_running_call(self, args: str) -> str:
        ...

    def wait_for_long_running_call(self, key: str) -> bool:
        ...

In some frameworks you can use a streaming API instead, but replacing a procedure call with streaming involves adding heartbeat messages, timeouts, and recovery, so many developers opt for polling instead—breaking the single request into two, like the example above.

Both approaches require changing the client and the server code, and if another method needs breaking up you have to change all of the code again. REST offers a different approach.

We return a response that describes how to fetch another request, much like an HTTP redirect, and handle it in the client library much like an HTTP client handles redirects, too.

def long_running_call(self, args: str) -> Result[bool]:
    key = third_party.start_process(args)
    return Future("MyService.wait_for_long_running_call", {"key": key})

def wait_for_long_running_call(self, key: str) -> Result[bool]:
    if not third_party.wait(key):
        return third_party.is_success(key)
    return Future("MyService.wait_for_long_running_call", {"key": key})

def fetch(request):
    response = make_api_call(request)
    while response.kind == 'Future':
        request = make_next_request(response.method_name, response.args)
        response = make_api_call(request)
    return response

For the more operations minded, imagine I call time.sleep() inside the client, and maybe imagine the Future response has a duration inside. The neat trick is that you can change the amount the client sleeps by changing the value returned by the server.

The real point is that by allowing a response to describe the next request in sequence, we’ve skipped over the problems of the other two approaches—we only need to implement the code once in the client.

When a different method needs breaking up, you can return a Future and get on with your life. In some ways it’s as if you’re returning a callback to the client, something the client knows how to run to produce a request. With Future objects, it’s more like returning values for a template.

This approach works for breaking up a large response into smaller ones too, like iterating through a long list of results. Pagination often looks something like this in an RPC system:

cursor = rpc.open_cursor()
output = []
while cursor:
    output.extend(cursor.values)
    cursor = rpc.move_cursor(cursor.id)

Or something like this:

start = 0
output = []
while True:
    out = rpc.get_values(start, batch=30)
    output.extend(out)
    start += len(out)
    if len(out) < 30:
        break

For a more involved example, imagine a service that mediates access to work queues, with methods like:

class WorkerApi(Service):
    def register_worker(self, name: str) -> str:
        ...
    def lock_queue(self, worker_id: str, queue_name: str) -> str:
        ...
    def take_from_queue(self, worker_id: str, queue_name: str, queue_lock: str):
        ...
    def upload_result(self, worker_id, queue_name, queue_lock, next, result):
        ...
    def unlock_queue(self, worker_id, queue_name, queue_lock):
        ...
    def exit_worker(self, worker_id):
        ...

Unfortunately, the client code looks much nastier:

worker_id = rpc.register_worker(my_name)
lock = rpc.lock_queue(worker_id, queue_name)
while True:
    next = rpc.take_from_queue(worker_id, queue_name, lock)
    if next:
        result = process(next)
        rpc.upload_result(worker_id, queue_name, lock, next, result)
    else:
        break
rpc.unlock_queue(worker_id, queue_name, lock)

Each method requires a handful of parameters relating to the current session open with the service. They aren’t strictly necessary — they do make debugging the system far easier — but the problem of having to chain together requests might be a little familiar.

What we’d rather do is use some API where the state between requests is handled for us. The traditional way to achieve this is to build these wrappers by hand, creating special code on the client to assemble the responses.

With REST, we can define a Service that has methods like before, but also contains a little bit of state, and return it from other method calls:

class WorkerApi(Service):
    def register(self, worker_id):
        return Lease(worker_id)

class Lease(Service):
    worker_id: str

    def lock_queue(self, name):
        lock = ...  # acquire the lock server-side
        return Queue(self.worker_id, name, lock)

    def expire(self):
        ...

class Queue(Service):
    name: str
    lock: str
    worker_id: str

    def get_task(self):
        return Task(..., self.name, self.lock, self.worker_id)

    def unlock(self):
        ...

class Task(Service):
    task_id: str
    worker_id: str

    def upload(self, out):
        mark_done(self.task_id, self.worker_id, out)
Instead of one service, we now have four. Instead of returning identifiers to pass back in, we return a Service with those values filled in for us. As a result, the client code looks a lot nicer—you can even add new parameters in behind the scenes.

lease = rpc.register_worker(my_name)

queue = lease.lock_queue(queue_name)

while True:
    next = queue.take_next()
    if next:
        next.upload(process(next))
    else:
        break

queue.unlock()

Although the Future looked like a callback, returning a Service feels like returning an object. This is the power of self-description: unlike reflection, where every request that can be made is specified in advance, each response has the opportunity to define a new parameterised request.

It’s this navigation through several linked responses that distinguishes a regular command line tool from one that browses — and it is where REST gets its name: the passing back and forth of requests from server to client is where the ‘state-transfer’ part of REST comes from, and using a common Result or Cursor object is where the ‘representational’ comes from.

A RESTful system is more than just these pieces combined, though: along with a reusable browser, you have reusable proxies too.

In the same way that messages describe things to the client, they describe things to any middleware between client and server: using GET, POST, and distinct URLs is what allows caches to work across services, and using a stateless protocol (HTTP) is what allows a proxy or load balancer to work so effortlessly.

The trick with REST is that despite HTTP being stateless, and despite HTTP being simple, you can build complex, stateful services by threading the state invisibly between smaller messages—transferring a representation of state back and forth between client and server.

Although the point of REST is to build a browser, the point is to use self-description and state-transfer to allow heavy amounts of interoperation—not just a reusable client, but reusable proxies, caches, or load balancers.

Going back to the constraints (Client-Server, Stateless, Caching, Uniform Interface, Layering and Code-on-Demand), you might be able to see how these things fit together to achieve these goals.

The first, Client-Server, feels a little obvious, but sets the background. A server waits for requests from a client, and issues responses.

The second, Stateless, is a little more confusing. If an HTTP proxy had to keep track of how requests link together, it would involve a lot more memory and processing. The point of the stateless constraint is that to a proxy, each request stands alone. The point is also that any stateful interactions should be handled by linking messages together.

Caching is the third constraint: labelling if a response can be cached (HTTP uses headers on the response), or if a request can be resent (using GET or POST). The fourth constraint, Uniform Interface, is the most difficult, so we’ll cover it last. Layering is the fifth, and it roughly means “you can proxy it”.

Code-on-demand is the final, optional, and most overlooked constraint, and it builds on the use of Cursors, Futures, or parameterised Services—the idea that despite using a simple means to describe services or responses, the responses can define new requests to send. Code-on-demand takes that further, and imagines passing back code, rather than templates and values to assemble.

With the other constraints handled, it’s time for uniform interface. Like stateless, this constraint is more about HTTP than it is about the system atop, and frequently misapplied. This is the reason why people keep making database APIs and calling them RESTful, but the constraint has nothing to do with CRUD.

The constraint is broken down into four ideas, and we’ll take them one by one: self-descriptive messages, identification of resources, manipulation of resources through representations, hypermedia as the engine of application state.

Self-Description is at the heart of REST, and this sub-constraint fills in the gaps between the Layering, Caching, and Stateless constraints. Sort-of. It covers using ‘GET’ and ‘POST’ to indicate to a proxy how to handle things, and covers how responses indicate if they can be cached, too. It also means using a content-type header.

The next sub-constraint, identification, means using different URLs for different services. In the RPC examples above, it means having a common, standard way to address a service or method, as well as one with parameters.

This ties into the next sub-constraint, which is about using standard representations across services—this doesn’t mean using special formats for every API request, but using the same underlying language to describe every response. In other words, the web works because everyone uses HTML.

Uniformity so far isn’t too difficult: use HTTP (self-description), URLs (identification) and HTML (manipulation through representations), but it’s the last sub-constraint that causes most of the headaches: hypermedia as the engine of application state.

This is a fancy way of talking about how large or long requests can be broken up into interlinked messages, or how a number of smaller requests can be threaded together, passing the state from one to the next. Hypermedia refers to using Cursor, Future, or Service objects, application state is the details passed around as hidden arguments, and being the ‘engine’ means using it to tie the whole system together.

Together they form the basis of the Representational State-Transfer Style. More than half of these constraints can be satisfied by just using HTTP, and the other half only really help when you’re implementing a browser, but there are still a few more tricks that you can do with REST.

Although a RESTful system doesn’t have to offer a database-like interface, it can.

Along with Service or Cursor, you could imagine Model or Rows objects to return, but you should expect a little more from a RESTful system than just create, read, update and delete. With REST, you can do things like inlining: along with returning a request to make, a server can embed the result inside. A client can skip the network call and work directly on the inlined response. A server can even make this choice at runtime, opting to embed if the message is small enough.
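A hedged sketch of that inlining idea, with invented names: the server describes the request a client could make, and may embed the result directly when it is small enough to be worth skipping the round-trip:

```python
# Hypothetical sketch of inlining: the response always carries a link
# to fetch the result, and may also embed the result itself, letting
# the client skip a network call. The server decides at runtime.

def describe_report(report_id, reports, inline_limit=64):
    body = reports[report_id]
    response = {"fetch_url": f"/reports/{report_id}"}
    if len(body) <= inline_limit:
        response["inlined"] = body  # small enough: embed it directly
    return response

def get_report(response, fetch):
    # Client: use the embedded copy if present, otherwise follow the link.
    if "inlined" in response:
        return response["inlined"]
    return fetch(response["fetch_url"])
```

The client code is identical either way, which is what lets the server make the choice at runtime.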

Finally, with a RESTful system, you should be able to offer things in different encodings, depending on what the client asks for—even HTML. In other words, if your framework can do all of these things for you, offering a web interface isn’t too much of a stretch. If you can build a reusable command line tool, generating a web interface isn’t too difficult, and at least this time you don’t have to implement a browser from scratch.

If you now find yourself understanding REST, I’m sorry. You’re now cursed. Like a cross between the Greek myths of Cassandra and Prometheus, you will be forced to explain the ideas over and over again to no avail. The terminology has been utterly destroyed to the point it has less meaning than ‘Agile’.

Even so, the underlying ideas of interoperability, self-description, and interlinked requests are surprisingly useful—you can break up large or slow responses, you can browse or even parameterise services, and you can do it in a way that lets you re-use tools across services too.

Ideally someone else will have done it for you, and like with a web browser, you don’t really care how RESTful it is, but how useful it is. Your framework should handle almost all of this for you, and you shouldn’t have to care about the details.

If anything, REST is about exposing just enough detail—Proxies and load-balancers only care about the URL and GET or POST. The underlying client libraries only have to handle something like HTML, rather than unique and special formats for every service.

REST is fundamentally about letting people use a service without having to know all the details ahead of time, which might be how we got into this mess in the first place.

2019-01-08 16:57


If you ask a programmer for advice—a terrible idea—they might tell you something like the following: Don’t repeat yourself. Programs should do one thing and one thing well. Never rewrite your code from scratch, ever!

Following “Don’t Repeat Yourself” might lead you to a function with four boolean flags, and a matrix of behaviours to carefully navigate when changing the code. Splitting things up into simple units can lead to awkward composition and struggling to coordinate cross cutting changes. Avoiding rewrites means they’re often left so late that they have no chance of succeeding.

The advice isn’t inherently bad—although there is good intent, following it to the letter can create more problems than it promises to solve.

Sometimes the best way to follow an adage is to do the exact opposite: embrace feature switches and constantly rewrite your code, pull things together to make coordination between them easier to manage, and repeat yourself to avoid implementing everything in one function.

This advice is much harder to follow, unfortunately.

Repeat yourself to find abstractions.

“Don’t Repeat Yourself” is almost a truism—if anything, the point of programming is to avoid work.

No-one enjoys writing boilerplate. The more straightforward it is to write, the duller it is to summon into a text editor. People are already tired of writing eight exact copies of the same code before even having to do so. You don’t need to convince programmers not to repeat themselves, but you do need to teach them how and when to avoid it.

“Don’t Repeat Yourself” often gets interpreted as “Don’t Copy Paste” or to avoid repeating code within the codebase, but the best form of avoiding repetition is in avoiding reimplementing what exists elsewhere—and thankfully most of us already do!

Almost every web application leans heavily on an operating system, a database, and a variety of other lumps of code to get the job done. A modern website reuses millions of lines of code without even trying. Unfortunately, programmers love to avoid repetition, and “Don’t Repeat Yourself” turns into “Always Use an Abstraction”.

By an abstraction, I mean two interlinked things: an idea we can think and reason about, and the way in which we model it inside our programming languages. Abstractions are a way of repeating yourself, so that you can change multiple parts of your program in one place. Abstractions allow you to manage cross-cutting changes across your system, or to share behaviours within it.

The problem with always using an abstraction is that you’re preemptively guessing which parts of the codebase need to change together. “Don’t Repeat Yourself” will lead to a rigid, tightly coupled mess of code. Repeating yourself is the best way to discover which abstractions, if any, you actually need.

As Sandi Metz put it, “duplication is far cheaper than the wrong abstraction”.

You can’t really write a re-usable abstraction up front. Most successful libraries or frameworks are extracted from a larger working system, rather than being created from scratch. If you haven’t built something useful with your library yet, it is unlikely anyone else will. Code reuse isn’t a good excuse to avoid duplicating code, and writing reusable code inside your project is often a form of preemptive optimization.

When it comes to repeating yourself inside your own project, the point isn’t to be able to reuse code, but rather to make coordinated changes. Use abstractions when you’re sure about coupling things together, rather than for opportunistic or accidental code reuse—it’s ok to repeat yourself to find out when.

Repeat yourself, but don’t repeat other people’s hard work. Repeat yourself: duplicate to find the right abstraction first, then deduplicate to implement it.

With “Don’t Repeat Yourself”, some insist that it isn’t about avoiding duplication of code, but about avoiding duplication of functionality or duplication of responsibility. This is more popularly known as the “Single Responsibility Principle”, and it’s just as easily mishandled.

Gather responsibilities to simplify interactions between them

When it comes to breaking a larger service into smaller pieces, one idea is that each piece should only do one thing within the system—do one thing, and do it well—and the hope is that by following this rule, changes and maintenance become easier.

It works out well in the small: reusing variables for different purposes is an ever-present source of bugs. It’s less successful elsewhere: although one class might do two things in a rather nasty way, disentangling it isn’t of much benefit when you end up with two nasty classes with a far more complex mess of wiring between them.

The only real difference between pushing something together and pulling something apart is that some changes become easier to perform than others.

The choice between a monolith and microservices is another example of this—the choice between developing and deploying a single service, or composing things out of smaller, independently developed services.

The big difference between them is that cross-cutting change is easier in one, and local changes are easier in the other. Which one works best for a team often depends more on environmental factors than on the specific changes being made.

Although a monolith can be painful when new features need to be added and microservices can be painful when co-ordination is required, a monolith can run smoothly with feature flags and short lived branches and microservices work well when deployment is easy and heavily automated.

Even a monolith can be decomposed internally into microservices, albeit in a single repository and deployed as a whole. Everything can be broken into smaller parts—the trick is knowing when it’s an advantage to do so.

Modularity is more than reducing things to their smallest parts.

Invoking the ‘single responsibility principle’, programmers have been known to brutally decompose software into a terrifyingly large number of small interlocking pieces—a craft rarely seen outside of obscenely expensive watches, or bash.

The traditional UNIX command line is a showcase of small components that do exactly one function, and it can be a challenge to discover which one you need and in which way to hold it to get the job done. Piping things into awk '{print $2}' is almost a rite of passage.

Another example of the single responsibility principle is git. Although you can use git checkout to do six different things to the repository, they all use similar operations internally. Despite having singular functionality, components can be used in very different ways.

A layer of small components with no shared features creates a need for a layer above where these features overlap, and if absent, the user will create one, with bash aliases, scripts, or even spreadsheets to copy-paste from.

Even adding this layer might not help you: git already has a notion of user-facing and automation-facing commands, and the UI is still a mess. It’s always easier to add a new flag to an existing command than it is to duplicate it and maintain it in parallel.

Similarly, functions gain boolean flags and classes gain new methods as the needs of the codebase change. In trying to avoid duplication and keep code together, we end up entangling things.

Although components can be created with a single responsibility, over time their responsibilities will change and interact in new and unexpected ways. What a module is currently responsible for within a system does not necessarily correlate to how it will grow.

Modularity is about limiting the options for growth

A given module often gets changed because it is the easiest module to change, rather than the best place for the change to be made. In the end, what defines a module is what pieces of the system it will never be responsible for, rather than what it is currently responsible for.

When a unit has no rules about what code cannot be included, it will eventually contain larger and larger amounts of the system. This is eternally true of every module named ‘util’, and why almost everything in a Model-View-Controller system ends up in the controller.

In theory, Model-View-Controller is about three interlocking units of code. One for the database, another for the UI, and one for the glue between them. In practice, Model-View-Controller resembles a monolith with two distinct subsystems—one for the database code, another for the UI, both nestled inside the controller.

The purpose of MVC isn’t to just keep all the database code in one place, but also to keep it away from frontend code. The data we have and how we want to view it will change over time independent of the frontend code.

Although code reuse is good and smaller components are good, they should be the result of other desired changes. Both are tradeoffs, introducing coupling through a lack of redundancy, or complexity in how things are composed. Decomposing things into smaller parts or unifying them is neither universally good nor bad for the codebase, and largely depends on what changes come afterwards.

In the same way abstraction isn’t about code reuse, but coupling things for change, modularity isn’t about grouping similar things together by function, but working out how to keep things apart and limiting co-ordination across the codebase.

This means recognizing which bits are slightly more entangled than others, knowing which pieces need to talk to each other, which need to share resources, what shares responsibilities, and most importantly, what external constraints are in place and which way they are moving.

In the end, it’s about optimizing for those changes—and this is rarely achieved by aiming for reusable code, as sometimes handling changes means rewriting everything.

Rewrite Everything

Usually, a rewrite is only a practical option when it’s the only option left. Technical debt, or code the seniors wrote that we can’t be rude about, accrues until all change becomes hazardous. It is only when the system is at breaking point that a rewrite is even considered an option.

Sometimes the reasons can be less dramatic: an API is being switched off, a startup has taken a beautiful journey, or there’s a new fashion in town and orders from the top to chase it. Rewrites can happen to appease a programmer too—rewarding good teamwork with a solo project.

The reason rewrites are so risky in practice is that replacing one working system with another is rarely an overnight change. We rarely understand what the previous system did—many of its properties are accidental in nature. Documentation is scarce, tests are ornamental, and interfaces are organic, stubbornly locking behaviours in place.

If migrating to the replacement depends on switching over everything at once, make sure you’ve booked a holiday during the transition, well in advance.

Successful rewrites plan for migration to and from the old system, plan to ease in the existing load, and plan to handle things being in one or both places at once. Both systems are continuously maintained until one of them can be decommissioned. A slow, careful migration is the only option that reliably works on larger systems.

To succeed, you have to start with the hard problems first—often performance related—but it can involve dealing with the most difficult or the biggest customer or user of the system too. Rewrites must be driven by triage, reducing the problem in scope to something that can be effectively improved while being guided by the larger problems at hand.

If a replacement isn’t doing something useful after three months, odds are it will never do anything useful.

The longer it takes to run a replacement system in production, the longer it takes to find bugs. Unfortunately, migrations get pushed back in the name of feature development. A new project has the most room for feature bloat—this is known as the second-system effect.

The second-system effect is the name of the canonical doomed rewrite, one where numerous features are planned, not enough are implemented, and what has been written rarely works reliably. It’s similar to writing a game engine without a game to guide decisions, or a framework without a product inside. The resulting code is an unconstrained mess that is barely fit for its purpose.

The reason we say “Never Rewrite Code” is that we leave rewrites too late, demand too much, and expect them to work immediately. It’s more important to never rewrite in a hurry than to never rewrite at all.

null is true, everything is permitted

The problem with following advice to the letter is that it rarely works in practice. The problem with following it at all costs is that eventually we cannot afford to do so.

It isn’t “Don’t Repeat Yourself”, but “Some redundancy is healthy, some isn’t”, and using abstractions when you’re sure you want to couple things together.

It isn’t “Each thing has a unique component”, or other variants of the single responsibility principle, but “Decoupling parts into smaller pieces is often worth it if the interfaces are simple between them, and try to keep the fast changing and tricky to implement bits away from each other”.

It’s never “Don’t Rewrite!”, but “Don’t abandon what works”. Build a plan for migration, maintain in parallel, then decommission, eventually. In high-growth situations you can probably put off decommissioning, and possibly even migrations.

When you hear a piece of advice, you need to understand the structure and environment in place that made it true, because they can just as often make it false. Things like “Don’t Repeat Yourself” are about making a tradeoff, usually one that’s good in the small or for beginners to copy at first, but hazardous to invoke without question on larger systems.

In a larger system, it’s much harder to understand the consequences of our design choices—in many cases the consequences are only discovered far, far too late in the process and it is only by throwing more engineers into the pit that there is any hope of completion.

In the end, we call our good decisions ‘clean code’ and our bad decisions ‘technical debt’, despite following the same rules and practices to get there.

2018-08-05 13:02



In the last post we were left with some tests that exercised some very basic functionality of the Deck class. In this post, we will continue to add unit tests and write production code to make those tests pass, until we get a class which is able to produce a randomised deck of 52 cards.

Test Refactoring

You can, and should, refactor your tests where appropriate. For instance, on the last test in the last post, we only asserted that we could get all the cards for a particular suit. What about the other three? With most modern test frameworks, that is very easy.

public void Should_BeAbleToSelectSuitOfCardsFromDeck(Suit suit)
{
    var deck = new Deck();

    var cards = deck.Where(x => x.Suit == suit);

    cards.Should().HaveCount(13); // thirteen cards per suit
}

More Cards

We are going to want actual cards with values to work with. And for the next test, we can literally copy and paste the previous test to use as a starter.

public void Should_BuildAllCardsInDeck(Suit suit)
{
    var deck = new Deck();

    var cards = deck.Where(x => x.Suit == suit);

    cards.Should().Contain(new List<Card>
    {
        new Card(suit, "A"), new Card(suit, "2"), new Card(suit, "3"), new Card(suit, "4"),
        new Card(suit, "5"), new Card(suit, "6"), new Card(suit, "7"), new Card(suit, "8"),
        new Card(suit, "9"), new Card(suit, "10"), new Card(suit, "J"), new Card(suit, "Q"),
        new Card(suit, "K")
    });
}

Now that I’ve written this, when I compare it to the previous one, it’s testing the exact same thing, in slightly more detail. So we can delete the previous test; it’s just noise.

The test is currently failing because it can’t compile, due to there not being a constructor which takes a string. Let’s fix that.

public struct Card
{
    private Suit _suit;
    private string _value;

    public Card(Suit suit, string value)
    {
        _suit = suit;
        _value = value;
    }

    public Suit Suit { get { return _suit; } }
    public string Value { get { return _value; } }

    public override string ToString()
    {
        return $"{Suit}";
    }
}

There are a couple of changes to this class. Firstly, I added the constructor, and private fields which hold the two defining values, exposed through properties with only public getters. I changed it from being a class to being a struct, and it’s now an immutable value type, which makes sense. In a deck of cards, there can, for example, only be one Ace of Spades.

These changes mean that our tests don’t work, as the Deck class is now broken: the code which builds the set of thirteen cards for a given suit no longer compiles, because it doesn’t understand the new Card constructor, or the fact that the .Suit property is now read-only.

Here is my first attempt at fixing the code, which I don’t currently think is all that bad:

private string _ranks = "A23456789XJQK";

private List<Card> BuildSuit(Suit suit)
{
    var cards = new List<Card>(_suitSize);

    for (var i = 1; i <= _suitSize; i++)
    {
        var rank = _ranks[i - 1].ToString();
        var card = new Card(suit, rank);
        cards.Add(card);
    }

    return cards;
}

This now builds us four suits of thirteen cards. I realised as I was writing the production code that handling “10” as a value would not be straightforward, so I opted for the simpler (and common) approach of using “X” to represent “10”. The test passes four times, once for each suit. This is probably unnecessary, but it protects us in future from inadvertently adding any code which may affect the way that cards are generated for a particular suit.

Every day I’m (randomly) shuffling

It’s occurred to me as I write this that the Deck class is functionally complete, as it produces a deck of 52 cards when it is instantiated. You will however recall that we want a randomly shuffled deck of cards. If we consider, and invoke, the Single Responsibility Principle, then we should add a Dealer class; we are modelling a real world event, and a pack of cards cannot shuffle itself: that’s what the dealer does.


In this post I’ve completed the walk through of developing a class to create a deck of 52 cards using some basic TDD techniques. I realised adding the ability to shuffle the pack to the Deck class would be a violation of SRP, as the Deck class should not be concerned with, or have any knowledge about, how it is shuffled. In the next post I will discuss how we can implement a Dealer class, and illustrate some techniques for swapping the randomisation algorithm around.

2018-06-13 00:00


Debuggable code is code that doesn’t outsmart you. Some code is a little harder to debug than other code: code with hidden behaviour, poor error handling, ambiguity, too little or too much structure, or code that’s in the middle of being changed. On a large enough project, you’ll eventually bump into code that you don’t understand.

On an old enough project, you’ll discover code you forgot about writing—and if it wasn’t for the commit logs, you’d swear it was someone else. As a project grows in size it becomes harder to remember what each piece of code does, harder still when the code doesn’t do what it is supposed to. When it comes to changing code you don’t understand, you’re forced to learn about it the hard way: Debugging.

Writing code that’s easy to debug begins with realising you won’t remember anything about the code later.

Rule 0: Good code has obvious faults.

Many used-methodology salesmen have argued that the way to write understandable code is to write clean code. The problem is that “clean” is highly contextual in meaning. Clean code can be hardcoded into a system, and sometimes a dirty hack can be written in a way that’s easy to turn off. Sometimes the code is clean because the filth has been pushed elsewhere. Good code isn’t necessarily clean code.

Code being clean or dirty is more about how much pride, or embarrassment, the developer takes in the code, rather than how easy it has been to maintain or change. Instead of clean, we want boring code where change is obvious—I’ve found it easier to get people to contribute to a code base when the low hanging fruit has been left around for others to collect. The best code might be anything you can look at and quickly learn things about.

  • Code that doesn’t try to make an ugly problem look good, or a boring problem look interesting.
  • Code where the faults are obvious and the behaviour is clear, rather than code with no obvious faults and subtle behaviours.
  • Code that documents where it falls short of perfect, rather than aiming to be perfect.
  • Code with behaviour so obvious that any developer can imagine countless different ways to go about changing it.

Sometimes, code is just nasty as fuck, and any attempts to clean it up leave you in a worse state. Writing clean code without understanding the consequences of your actions might as well be a summoning ritual for maintainable code.

It is not to say that clean code is bad, but sometimes the practice of clean coding is more akin to sweeping problems under the rug. Debuggable code isn’t necessarily clean, and code that’s littered with checks or error handling rarely makes for pleasant reading.

Rule 1: The computer is always on fire.

The computer is on fire, and the program crashed the last time it ran.

The first thing a program should do is ensure that it is starting out from a known, good, safe state before trying to get any work done. Sometimes there isn’t a clean copy of the state because the user deleted it, or upgraded their computer. The program crashed the last time it ran and, rather paradoxically, the program is being run for the first time too.

For example, when reading and writing program state to a file, a number of problems can happen:

  • The file is missing
  • The file is corrupt
  • The file is an older version, or a newer one
  • The last change to the file is unfinished
  • The filesystem was lying to you

These are not new problems, and databases have been dealing with them since the dawn of time (1970-01-01). Using something like SQLite will handle many of these problems for you, but if the program crashed the last time it ran, the code might be run with the wrong data, or in the wrong way too.
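One common defence against the file problems listed above is to never update state in place. This sketch is my own illustration (the function names and JSON format are assumptions, not from the post): write to a temporary file, flush it, then atomically rename it over the old copy, so a crash leaves either the old state or the new state, never a torn one:

```python
import json
import os
import tempfile

def save_state(path, state):
    # Write to a temporary file in the same directory, flush to disk,
    # then atomically rename it over the old copy. A crash at any point
    # leaves either the old file or the new one, never a half-write.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

def load_state(path, default):
    # A missing or corrupt file falls back to a known good default.
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        return default
```

Reading then starts from a known good state whether the file is missing, corrupt, or half-written, rather than failing in a new and exciting way.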

With scheduled programs, for example, you can guarantee that the following accidents will occur:

  • It gets run twice in the same hour because of daylight savings time.
  • It gets run twice because an operator forgot it had already been run.
  • It will miss an hour, due to the machine running out of disk, or mysterious cloud networking issues.
  • It will take longer than an hour to run and may delay subsequent invocations of the program.
  • It will be run with the wrong time of day
  • It will inevitably be run close to a boundary, like midnight, end of month, end of year and fail due to arithmetic error.

Writing robust software begins with writing software that assumes it crashed the last time it ran, and crashes whenever it doesn’t know the right thing to do. The best thing about throwing an exception over leaving a comment like “This Shouldn’t Happen” is that when it inevitably does happen, you get a head-start on debugging your code.

You don’t have to be able to recover from these problems either—it’s enough to let the program give up and not make things any worse. Small checks that raise an exception can save weeks of tracing through logs, and a simple lock file can save hours of restoring from backup.
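The lock file mentioned above can be sketched in a few lines (the path and behaviour here are illustrative, not from the post): creating the file with O_EXCL fails if it already exists, so a second run gives up instead of making things worse, and a crashed run leaves the lock behind for a human to inspect:

```python
import os

def acquire_lock(path):
    # O_CREAT | O_EXCL fails if the file already exists, so only one
    # invocation can hold the lock at a time. Returning None means
    # another run is (or was) in progress: give up loudly, don't retry.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.write(fd, str(os.getpid()).encode())  # record who holds it
    os.close(fd)
    return path

def release_lock(path):
    os.unlink(path)
```

If a run crashes, the stale lock stops the next run from starting, which is the point: a person decides whether it is safe to continue, rather than the program making things worse.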

Code that’s easy to debug is code that checks to see if things are correct before doing what was asked of it, code that makes it easy to go back to a known good state and try again, and code that has layers of defence to force errors to surface as early as possible.

Rule 2: Your program is at war with itself.

Google’s biggest DoS attacks come from ourselves—because we have really big systems—although every now and then someone will show up and try to give us a run for our money, but really we’re more capable of hammering ourselves into the ground than anybody else is.

This is true for all systems.

Astrid Atkinson, Engineering for the Long Game

The software always crashed the last time it ran, and now it is always out of cpu, out of memory, and out of disk too. All of the workers are hammering an empty queue, everyone is retrying a failed request that’s long expired, and all of the servers have paused for garbage collection at the same time. Not only is the system broken, it is constantly trying to break itself.

Even checking if the system is actually running can be quite difficult.

It can be quite easy to implement something that checks if the server is running, but not if it is handling requests. Unless you check the uptime, it is possible that the program is crashing in between every check. Health checks can trigger bugs too: I have managed to write health checks that crashed the system they were meant to protect. On two separate occasions, three months apart.

In software, writing code to handle errors will inevitably lead to discovering more errors to handle, many of them caused by the error handling itself. Similarly, performance optimisations can often be the cause of bottlenecks in the system—making an app that’s pleasant to use in one tab can make an app that’s painful to use when you have twenty copies of it running.

Another example is where a worker in a pipeline is running too fast, and exhausting the available memory before the next part has a chance to catch up. If you’d rather a car metaphor: traffic jams. Speeding up is what creates them, and can be seen in the way the congestion moves back through the traffic. Optimisations can create systems that fail under high or heavy load, often in mysterious ways.

In other words: the faster you make it, the harder it will be pushed, and if you don’t allow your system to push back even a little, don’t be surprised if it snaps.

Back-pressure is one form of feedback within a system, and a program that is easy to debug is one where the user is involved in the feedback loop, having insight into all behaviours of a system, the accidental, the intentional, the desired, and the unwanted too. Debuggable code is easy to inspect, where you can watch and understand the changes happening within.

Rule 3: What you don’t disambiguate now, you debug later.

In other words: it should not be hard to look at the variables in your program and work out what is happening. Give or take some terrifying linear algebra subroutines, you should strive to represent your program’s state as obviously as possible. This means things like not changing your mind about what a variable does halfway through a program. If there is one obvious cardinal sin, it is using a single variable for two different purposes.

It also means carefully avoiding the semi-predicate problem: never use a single value (count) to represent a pair of values (boolean, count). Avoid things like returning a positive number for a result, and returning -1 when nothing matches. The reason is that it’s easy to end up in a situation where you want something like "0, but true" (and notably, Perl 5 has this exact feature), or you create code that’s hard to compose with other parts of your system (-1 might be a valid input for the next part of the program, rather than an error).
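Python's `str.find` is a handy illustration of how the -1 convention composes badly; the `find_index` wrapper below is a made-up alternative for illustration, not a standard function.

```python
s = "debug.log"

# str.find returns -1 for "not found", but -1 is also a valid index,
# so the failure value flows silently into the next operation:
print(s[s.find("!")])  # no "!" in s, yet this "works": it prints "g"

# Returning a distinct value (None) keeps failure out of the domain
# of valid answers, so callers are forced to handle it explicitly.
def find_index(text, needle):
    i = text.find(needle)
    return None if i == -1 else i

print(find_index(s, "!"))    # None
print(find_index(s, "log"))  # 6
```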

Along with using a single variable for two purposes, it can be just as bad to use a pair of variables for a single purpose—especially if they are booleans. I don’t mean keeping a pair of numbers to store a range is bad, but using a number of booleans to indicate what state your program is in is often a state machine in disguise.

When state doesn’t flow from top to bottom, give or take the occasional loop, it’s best to give the state a variable of its own and clean the logic up. If you have a set of booleans inside an object, replace it with a variable called state and use an enum (or a string if it’s persisted somewhere). The if statements end up looking like if state == name and stop looking like if bad_name && !alternate_option.
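As a sketch of that refactoring in Python (the `Connection` class and its states are invented for illustration):

```python
from enum import Enum

class State(Enum):
    CONNECTING = "connecting"
    READY = "ready"
    CLOSED = "closed"

class Connection:
    def __init__(self):
        # One variable holding one of a known set of states, instead
        # of a tangle like self.connected / self.closing / self.failed.
        self.state = State.CONNECTING

    def on_open(self):
        self.state = State.READY

conn = Connection()
conn.on_open()
if conn.state == State.READY:  # reads like: if state == name
    print("ready to send")
```

The invalid combinations (connected and closing and failed all at once) simply cannot be represented any more.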

Even when you do make the state machine explicit, you can still mess up: sometimes code has two state machines hidden inside. I had great difficulty writing an HTTP proxy until I had made each state machine explicit, tracing connection state and parsing state separately. When you merge two state machines into one, it can be hard to add new states, or know exactly what state something is meant to be in.

This is far more about creating things you won’t have to debug, than making things easy to debug. By working out the list of valid states, it’s far easier to reject the invalid ones outright, rather than accidentally letting one or two through.

Rule 4: Accidental Behaviour is Expected Behaviour.

When you’re less than clear about what a data structure does, users fill in the gaps—any behaviour of your code, intended or accidental, will eventually be relied upon somewhere else. Many mainstream programming languages had hash tables you could iterate through, which sort-of preserved insertion order, most of the time.

Some languages chose to make the hash table behave as many users expected them to, iterating through the keys in the order they were added, but others chose to make the hash table return keys in a different order, each time it was iterated through. In the latter case, some users then complained that the behaviour wasn’t random enough.

Tragically, any source of randomness in your program will eventually be used for statistical simulation purposes, or worse, cryptography, and any source of ordering will be used for sorting instead.

In a database, some identifiers carry a little bit more information than others. When creating a table, a developer can choose between different types of primary key. The correct answer is a UUID, or something that’s indistinguishable from a UUID. The problem with the other choices is that they can expose ordering information as well as identity, i.e. not just if a == b but if a <= b, and by "other choices" I mean auto-incrementing keys.

With an auto-incrementing key, the database assigns a number to each row in the table, adding 1 when a new row is inserted. This creates an ambiguity of sorts: people do not know which part of the data is canonical. In other words: do you sort by key, or by timestamp? As with the hash tables before, people will decide the right answer for themselves. The other problem is that users can easily guess the keys of nearby records, too.
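For illustration, in Python (the invoice numbers are made up):

```python
import uuid

# An auto-incrementing key leaks ordering and makes neighbours
# guessable: if your invoice lives at /invoice/1041, then
# /invoice/1040 almost certainly exists too.
auto_key = 1041
guessable_neighbour = auto_key - 1

# A random UUID carries identity and nothing else: no ordering,
# no neighbours to enumerate.
random_key = uuid.uuid4()
print(len(str(random_key)))  # 36-character canonical form
```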

Ultimately any attempt to be smarter than a UUID will backfire: we already tried with postcodes, telephone numbers, and IP Addresses, and we failed miserably each time. UUIDs might not make your code more debuggable, but less accidental behaviour tends to mean less accidents.

Ordering is not the only piece of information people will extract from a key: If you create database keys that are constructed from the other fields, then people will throw away the data and reconstruct it from the key instead. Now you have two problems: when a program’s state is kept in more than one place, it is all too easy for the copies to start disagreeing with each other. It’s even harder to keep them in sync if you aren’t sure which one you need to change, or which one you have changed.

Whatever you permit your users to do, they’ll implement. Writing debuggable code is thinking ahead about the ways in which it can be misused, and how other people might interact with it in general.

Rule 5: Debugging is social, before it is technical.

When a software project is split over multiple components and systems, it can be considerably harder to find bugs. Once you understand how the problem occurs, you might have to co-ordinate changes across several parts in order to fix the behaviour. Fixing bugs in a larger project is less about finding the bugs, and more about convincing the other people that they’re real, or even that a fix is possible.

Bugs stick around in software because no-one is entirely sure who is responsible for things. In other words, it’s harder to debug code when nothing is written down, everything must be asked in Slack, and nothing gets answered until the one person who knows logs on.

Planning, tools, process, and documentation are the ways we can fix this.

Planning is how we remove the stress of being on call, putting structures in place to manage incidents. Plans are how we keep customers informed, switch out people when they’ve been on call too long, and how we track problems and introduce changes to reduce future risk. Tools are the way in which we deskill work and make it accessible to others. Process is the way in which we remove control from the individual and give it to the team.

The people will change, and the interactions too, but the processes and tools will be carried on as the team mutates over time. It isn’t so much valuing one more than the other as building one to support changes in the other. Process can also be used to remove control from the team, so it isn’t always good or bad, but there is always some process at work, even when it isn’t written down, and the act of documenting it is the first step to letting other people change it.

Documentation means more than text files: documentation is how you handover responsibilities, how you bring new people up to speed, and how you communicate what’s changed to the people impacted by those changes. Writing documentation requires more empathy than writing code, and more skill too: there aren’t easy compiler flags or type checkers, and it’s easy to write a lot of words without documenting anything.

Without documentation, how can you expect people to make informed decisions, or even consent to the consequences of using the software? Without documentation, tools, or processes you cannot share the burden of maintenance, or even replace the people currently lumbered with the task.

Making things easy to debug applies just as much to the processes around code as the code itself, making it clear whose toes you will have to stand on to fix the code.

Code that’s easy to debug is easy to explain.

A common occurrence when debugging is realising the problem while explaining it to someone else. The other person doesn’t even have to exist, but you do have to force yourself to start from scratch: explain the situation, the problem, the steps to reproduce it. Often that framing is enough to give us insight into the answer.

If only. Sometimes when we ask for help, we don’t ask for the right help, and I’m as guilty of this as anyone—it’s such a common affliction that it has a name: “The X-Y Problem”: How do I get the last three letters of a filename? Oh? No, I meant the file extension.

We talk about problems in terms of the solutions we understand, and we talk about the solutions in terms of the consequences we’re aware of. Debugging is learning the hard way about unexpected consequences and alternative solutions, and it involves one of the hardest things a programmer can ever do: admit that they got something wrong.

It wasn’t a compiler bug, after all.

2018-05-14 04:30


Convection Texture Tools is now roughly equal quality-wise with NVTT at compressing BC7 textures despite being about 140 times faster, making it one of the fastest and highest-quality BC7 compressors.

How this was accomplished turned out to be simpler than expected.  Recall that Squish became the gold standard of S3TC compressors by implementing a "cluster fit" algorithm that ordered all of the input colors on a line and tried every possible grouping of them to least-squares fit them.

Unfortunately, using this technique isn't practical in BC7 because the number of orderings has rather extreme scaling characteristics.  While 2-bit indices have a few hundred possible orderings, 4-bit indices have millions, most BC7 mode indices are 3 bits, and some have 4.
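To get a feel for that scaling, here is a back-of-envelope count (my own model, not taken from the Squish or Convection source): if cluster fit assigns 16 ordered colors to k index values in non-decreasing order along the line, the number of distinct groupings is the binomial coefficient C(n + k - 1, k - 1).

```python
from math import comb

def orderings(n_pixels, index_bits):
    # Ways to split n ordered colors among k = 2**bits index values,
    # keeping the assignment non-decreasing along the line:
    # C(n + k - 1, k - 1). (Back-of-envelope, for intuition only.)
    k = 2 ** index_bits
    return comb(n_pixels + k - 1, k - 1)

print(orderings(16, 2))  # 969: a few hundred, tractable
print(orderings(16, 4))  # 300540195: hundreds of millions, not tractable
```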

With that option gone, most BC7 compressors until now have tried to solve endpoints using various types of endpoint perturbation, which tends to require a lot of iterations.

Convection just uses 2 rounds of K-means clustering and a much simpler technique based on a guess about why Squish's cluster fit algorithm is actually useful: It can create endpoint mappings that don't use some of the terminal ends of the endpoint line, causing the endpoint to be extrapolated out, possibly to a point that loses less accuracy to quantization.

Convection just tries cutting off 1 index at each end, then 1 index at both ends.  That turned out to be enough to place it near the top of the quality benchmarks.

Now I just need to add color weighting and alpha weighting and it'll be time to move on to other formats.

by OneEightHundred ( at 2018-03-30 05:26


I once ate 10 mg LSD by accident. It was a dilution error. The peak lasted ~10 hr. At some point, I saw the top of my head. But hey, maybe it was just an hallucination ;)

maybe ;)

by Factor Mystic at 2018-02-04 17:30


I know that last time I said I was going to start soldering, but I really wanted to play with the networking capabilities of the ESP8266 first.

There’s a bunch of example programs to do web requests, such as BasicHttpClient, HTTPSRequest, WifiClient, StreamHttpClient, etc. I had trouble getting these working because there’s no built in certificate store or TLS validation capabilities. That means you can’t do a “normal” HTTPS request to test services like RequestBin (since they’re HTTPS only). The example programs have you type in the server’s SHA1 thumbprint, but that didn’t seem to work for me. The certs I inspected were SHA256, which I assume is the problem.

Anyway, I’m not interested in doing HTTP in any of my project ideas right now, so I moved on to what I actually want to do, which is MQTT. Once again it was Hack-a-day that clued me in to this protocol, which is very popular for small devices & home automation. I started out looking for a “simplest possible MQTT example for ESP8266” and didn’t find anything simple enough initially. Later I realized that there are two great places to start looking for libraries & examples. First is the esp8266/Arduino repo on Github, which has a list of miscellaneous third party projects compatible with this specific chip. Second is in the Arduino IDE itself; the Library Manager is searchable:

Arduino Library Manager – Searching for “MQTT”

The problem here is knowing which (if any) are actually good, useful, or correct for the ESP8266. The first search result in that screenshot is only for the “Arduino Uno Wifi Developer Edition”, for example.

Another challenge here is working through all the company branding. The second library listed, “Adafruit MQTT Library”, is ESP8266 compatible and comes with a “simple” example program to get started. However, it’s oriented around the Adafruit IO IoT web service (which is apparently a thing). I did get it to work with the local MQTT broker I’m running here on my PC, but I had to guess a bit and try to peel away their extra stuff just to get to the bones.

The ESP8266 Github linked to lmroy/pubsubclient, which itself is a fork of knolleary/pubsubclient which seems more up to date. I don’t know why they’re linking to an out of date fork, except that it appears to more easily support “large” messages. The original has a default max packet size of 128 bytes, which might be too small for real apps, but I’m looking for a simple example so it should be fine.

Here’s a link to the example program from that repo:


The example is easy to set up; just punch in your Wifi access info & MQTT broker host name. Interestingly it does apparently support DNS… from working with the HTTP examples earlier, some of them used IP addresses rather than host names, so it wasn’t clear if DNS was supported in some of these network libraries. This one does, apparently.

Here’s what the output looks like. I’m running mosquitto 1.4.8 on my PC, with mosquitto_sub running in the lower panel subscribed to # (a wildcard, so all topic messages are shown).

Basic MQTT Example on the ESP8266

Actual footage of me as this program was running

I thought it would be fun to give the messages a little personality, so I found a list of all the voice lines from the turrets in Portal 2, copied them into a giant array, and now instead of “Hello World #37″ it’ll send something like “So what am I, uh, supposed to do here?” or “Well, I tried. Best of Luck!” once every few seconds.

Additionally, I made it so that the power LED blinks as it’s sending a message, as a little visual chirp to let you know it’s alive.


The code is here, and this version is a modification of the Adafruit MQTT example, rather than the other library linked above, because I wrote it before I discovered that simpler example. (Found the list of voice lines here, and removed a few dupes).

by Factor Mystic at 2018-02-03 19:03


I thought it might be fun to play around with programmable microcontrollers, so I bought some to play with. One of the most popular chips right now is the ESP8266 which I first saw pop up on Hack-a-day in 2014. I had to search backwards through 47 pages of blog posts that have been made in the meantime — that might give you a sense of its popularity.

I played around with PIC16/PIC18s in the early 2000s but never actually made anything; the interest has been there, though. It’s also been a long-time desire of mine to create an E-ink weather display (once again thanks to an old Hack-a-day post, this one from 2012. That’s how far behind on projects I am). Recently I noticed some inexpensive E-ink development boards on Aliexpress and decided to jump into a less shallow end of the pool. I had also been following the ESP8266 for Arduino repo on Github, so I vaguely knew where to begin.

WeMos D1 ESP8266

This evening I received the hardware (specifically, three of these) and decided to see if I could get a basic program deployed, just to get started. I don’t really know what I’m doing but I’m pretty good at reading & following directions (that counts for a lot in life).

The basics are:

1. Follow the “Installing with Boards Manager” directions here (which includes grabbing the latest Arduino IDE software, then pulling in the ESP8266 chip configuration, which includes the WeMos D1 mini board configuration).

2. From WeMos’ website, grab the driver for the on-board USB/programmer chip, which for me was the CH340G. Nothing more “exciting” than installing Chinese device driver software!

3. I got a little tripped up here, but later figured it out: when you plug one of the D1 boards into your PC via USB, it’ll show up as a COM Port in Device Manager. You have to pick that same COM Port in the Arduino IDE or it can’t find your board to deploy to. This was a little confusing because it won’t show up until you’re plugged in.

Picking the right COM Port in the Arduino IDE

4. The Arduino IDE comes with the ability to load in example programs from the board configuration, loaded in Step 1. I wanted the simplest possible thing to make sure everything was working, so picked the “Hello World” of microcontroller programs: “Blink”, which toggles the power LED in an infinite loop.

So far, so good! All told, this took about an hour from opening the package to getting the light to blink (which includes scrounging around for a good USB cable and trying to get an in-focus picture).

As you can see, I didn’t even bother to solder on the headers yet. I will do that, but I think next I will look into getting some wifi code up and running.

by Factor Mystic at 2018-01-30 02:36



In the previous post in this series, we finished up with a very basic unit test, which didn’t really test much and which we had run using dotnet xunit in a console, and saw some lovely output.

We’ll continue to write some more unit tests to try and understand what kind of API we need in a class (or classes) which can help us satisfy the first rule of our Freecell engine implementation. As a reminder, our first rule is: There is one standard deck of cards, shuffled.

I’m trying to write both the code and the blog posts as I go along, so I have no idea what the final code will look like when I’ve finished. This means I’ll probably make mistakes and make some poor design decisions, but the whole point of TDD is that you can get a feel for that as you go along, because the tests will tell you.

Don’t try to TDD without some sort of plan

Whilst we obey the 3 Laws of TDD, that doesn’t mean that we can’t or shouldn’t doodle a design and some notes on a whiteboard or a notebook about the way our API could look. I always find that having some idea of where you want to go and what you want to achieve aids the TDD process, because then the unit tests should kick in and you’ll get a feel for whether things are going well or the conceptual design you had is not working.

With that in mind, we know that we will want to define a Card object, and that there are going to be four suits of cards, so that gives us a hint that we’ll need an enum to define them. Unless we want to play the same game of Freecell over and over again, then we’ll need to randomly generate the cards in the deck. We also know that we will need to iterate over the deck when it comes to building the Cascades, but the Deck should not be concerned with that.

With that in mind, we can start writing some more tests.

To a functioning Deck class

First things first, I think that I really like the idea of having the Deck class enumerable, so I’ll start with testing that.

public void Should_BeAbleToEnumerateCards()
{
    foreach (var card in new Deck())
    {
    }
}

This is enough to make the test fail, because the Deck class doesn’t yet have a public definition for GetEnumerator, but it gives us a feel for how the class is going to be used. To make the test pass, we can do the simplest thing to make the compiler happy, and give the Deck class a GetEnumerator definition.

public IEnumerator<object> GetEnumerator()
{
    return Enumerable.Empty<object>().GetEnumerator();
}

I’m using the generic type of object in the method, because I haven’t yet decided on what that type is going to be, because to do so would violate the three rules of TDD, and it hasn’t yet been necessary.

Now that we can enumerate the Deck class, we can start making things a little more interesting. Given that it is a deck of cards, it should be reasonable to expect that we could expect to be able to select a suit of cards from the deck and get a collection which has 13 cards in it. Remember, we only need to write as much of this next test as is sufficient to get the test to fail.

public void Should_BeAbleToSelectSuitOfCardsFromDeck()
{
    var deck = new Deck();

    var hearts = deck.Where();
}

It turns out we can’t even get to the point in the test of asserting something because we get a compiler failure. The compiler can’t find a method or extension method for Where. But, the previous test where we enumerate the Deck in a foreach passes. Well, we only wrote as much code to make that test pass as we needed to, and that only involved adding the GetEnumerator method to the class. We need to write more code to get this current test to pass, such that we can keep the previous test passing too.

This is easy to do by implementing IEnumerable<> on the Deck class:

public class Deck : IEnumerable<object>
{
    public IEnumerator<object> GetEnumerator()
    {
        foreach (var card in _cards)
        {
            yield return card;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

I’ve cut some of the other code out of the class so that you can see just the detail of the implementation. The second, explicitly implemented IEnumerable.GetEnumerator is there because IEnumerable<> inherits from it, so it must be implemented, but as you can see, we can just fast-forward to the generically implemented method. With that done, we can now add using System.Linq; to the test class so that we can use the Where method.

var deck = new Deck();

var hearts = deck.Where(x => x.Suit == Suit.Hearts);

This is where the implementation is going to start getting a little more complicated than the actual tests. Obviously, in order to make the test pass, we need to add an actual Card class and give it a property which we can use to select the correct suit of cards.

public enum Suit
{
    Clubs, Diamonds, Hearts, Spades
}

public class Card
{
    public Suit Suit { get; set; }
}

After writing this, we can then change the enumerable implementation in the Deck class to public class Deck : IEnumerable<Card>, and the test will now compile. Now we can actually assert the intent of the test:

public void Should_BeAbleToSelectSuitOfCardsFromDeck()
{
    var deck = new Deck();

    var hearts = deck.Where(x => x.Suit == Suit.Hearts);

    hearts.Count().Should().Be(13);
}



In this post, I talked through several iterations of the TDD loop, based on the 3 Rules of TDD, in some detail. An interesting discussion that always rears its head at this point is: do you need to follow the 3 rules so excruciatingly religiously? I don’t really know the answer to that. Certainly I always had it in my head that I would need a Card class, and that would necessitate a Suit enum, as these are pretty obvious things when thinking about the concept of a class which models a deck of cards. Could I have taken a short cut, written everything and then written the tests to test the implementation (as it stands)? Probably, for something so trivial.

In the next post, I will write some more tests to continue building the Deck class.

2017-11-28 00:00



I thought Freecell would make a fine basis for talking about Test Driven Development. It is a game which I enjoy playing. I have an app for it on my phone, and it’s been available on Windows for as long as I can remember, although I’m writing this on a Mac, which does not by default have a Freecell game.

The rules are fairly simple:

  • There is one standard deck of cards, shuffled.
  • There are four “Free” Cell piles, which may each have any one card stored in it.
  • There are four Foundation piles, one for each suit.
  • The cards are dealt face-up left-to-right into eight cascades
    • The cards must alternate in colour.
    • The result of the deal is that the first four cascades will have seven cards, the final four will have six cards.
  • The top-most card of a cascade begins a tableau.
  • A tableau must be built down by alternating colours.
  • A card in a cell may be moved onto a tableau subject to the previous rule.
  • A tableau may be recursively moved onto another tableau, or to an empty cascade, only if there is enough free space in cells or empty cascades to use as intermediate locations.
  • The game is won when all four Foundation piles are built up in suit, Ace to King.

These rules will form the basis of a Freecell Rules Engine. Note that we’re not interested in a UI at the moment.

This post is a follow on from my previous post of how to setup a dotnet core environment for doing TDD.

red - first test

We know from the rules that we need a standard deck of cards to work with, so our initial test could assert that we can create an array, of some type that is yet to be determined, which has a length of 52.

public void Should_CreateAStandardDeckOfCards()
{
    var sut = new Deck();
}


There! Our first test. It fails (by not compiling). We’ve obeyed The 3 Laws of TDD: We’ve not written any production code and we’ve only written enough of the unit test to make it fail. We can make the test pass by creating a Deck class in the Freecell.Engine project. Time for another commit:

green - it passes

It is trivial to make our first test pass, as all we need to do is create a new class in our Freecell.Engine project, and our test passes as it now compiles. We can prove this by instructing dotnet to run our unit tests for us:

nostromo:Freecell.Engine.Tests stuart$ dotnet watch xunit
watch : Started
Detecting target frameworks in Freecell.Engine.Tests.csproj...
Building for framework netcoreapp2.0...
  Freecell.Engine -> /Users/stuart/dev/freecell/Freecell.Engine/bin/Debug/netstandard2.0/Freecell.Engine.dll
  Freecell.Engine.Tests -> /Users/stuart/dev/freecell/Freecell.Engine.Tests/bin/Debug/netcoreapp2.0/Freecell.Engine.Tests.dll
Running .NET Core 2.0.0 tests for framework netcoreapp2.0... Console Runner (64-bit .NET Core 4.6.00001.0)
  Discovering: Freecell.Engine.Tests
  Discovered:  Freecell.Engine.Tests
  Starting:    Freecell.Engine.Tests
  Finished:    Freecell.Engine.Tests
   Freecell.Engine.Tests  Total: 1, Errors: 0, Failed: 0, Skipped: 0, Time: 0.142s
watch : Exited
watch : Waiting for a file to change before restarting dotnet...

It is important to make sure to run dotnet xunit from within the test project folder; you can’t pass the path to the test project like you can with dotnet test. As you can see, I’ve also started watching xunit, and the runner is now going to wait until I make and save a change before automatically compiling and running the tests.

red, green

This first unit test still doesn’t really test very much, and because we are obeying the 3 TDD rules, it forces us to think a little before we write any test code. When looking at the rules, I think we will probably want the ability to move through our deck of cards and have the ability to remove cards from the deck. So, with this in mind, the most logical thing to do is to make the Deck class enumerable. We could test that by checking a length property. Still in our first test, we can add this:

var sut = new Deck();

var length = sut.Length;

If I switch over to our dotnet watch window, we get the immediate feedback that this has failed:

Detecting target frameworks in Freecell.Engine.Tests.csproj...
Building for framework netcoreapp2.0...
  Freecell.Engine -> /Users/stuart/dev/freecell/Freecell.Engine/bin/Debug/netstandard2.0/Freecell.Engine.dll
DeckTests.cs(13,30): error CS1061: 'Deck' does not contain a definition for 'Length' and no extension method 'Length' accepting a first argument of type 'Deck' could be found (are you missing a using directive or an assembly reference?) [/Users/stuart/dev/freecell/Freecell.Engine.Tests/Freecell.Engine.Tests.csproj]
Build failed!
watch : Exited with error code 1
watch : Waiting for a file to change before restarting dotnet...

We have a pretty good idea that we’re going to make the Deck class enumerable, and probably make it implement IEnumerable<>; then we could add some sort of internal array to hold another type, probably a Card, and then write a bunch more code that will make our test pass.

But that would violate the 3rd rule, so instead, we simply add a Length property to the Deck class:

public class Deck
{
    public int Length { get; }
}

This makes our test happy, because it compiles again. But it still doesn’t assert anything. Let’s fix that, and assert that the Length property actually has a length that we would expect a deck of cards to have, namely 52:

var sut = new Deck();

var length = sut.Length;

length.Should().Be(52);
The last line of the test asserts, through the use of FluentAssertions, that the Length property should be 52. I like FluentAssertions; I think it looks a lot cleaner than writing something like Assert.Equal(52, sut.Length), and it’s quite easy to read and understand: ‘Length’ should be 52. I love it. We can add it with the command dotnet add package FluentAssertions. Fix the using reference in the test class so that it compiles, and then check our watch window:

Detecting target frameworks in Freecell.Engine.Tests.csproj...
Building for framework netcoreapp2.0...
  Freecell.Engine -> /Users/stuart/dev/freecell/Freecell.Engine/bin/Debug/netstandard2.0/Freecell.Engine.dll
  Freecell.Engine.Tests -> /Users/stuart/dev/freecell/Freecell.Engine.Tests/bin/Debug/netcoreapp2.0/Freecell.Engine.Tests.dll
Running .NET Core 2.0.0 tests for framework netcoreapp2.0... Console Runner (64-bit .NET Core 4.6.00001.0)
  Discovering: Freecell.Engine.Tests
  Discovered:  Freecell.Engine.Tests
  Starting:    Freecell.Engine.Tests
    Freecell.Engine.Tests.DeckTests.Should_CreateAStandardDeckOfCards [FAIL]
      Expected value to be 52, but found 0.
      Stack Trace:
           at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message)
           at FluentAssertions.Execution.AssertionScope.FailWith(String message, Object[] args)
           at FluentAssertions.Numeric.NumericAssertions`1.Be(T expected, String because, Object[] becauseArgs)
        /Users/stuart/dev/freecell/Freecell.Engine.Tests/DeckTests.cs(16,0): at Freecell.Engine.Tests.DeckTests.Should_CreateAStandardDeckOfCards()
  Finished:    Freecell.Engine.Tests
   Freecell.Engine.Tests  Total: 1, Errors: 0, Failed: 1, Skipped: 0, Time: 0.201s
watch : Exited with error code 1
watch : Waiting for a file to change before restarting dotnet...

Now to make our test pass, we could again just start implementing IEnumerable<>, but that’s not TDD, and Uncle Bob might get upset at me. Instead, we will do the simplest thing that will make the test pass:

public class Deck
{
    public int Length { get { return new string[52].Length; } }
}


Now that we have a full test with an assertion that passes, we can move on to the refactor stage of the red/green/refactor TDD cycle. As it stands, our simple class passes our test, but we can see right away that newing up an array in the getter of the Length property is not going to serve our interests well in the long run, so we should do something about that. Making it a member variable seems to be the most logical thing to do at the moment, so we’ll do that. We don’t need to make any changes to our test in the refactor stage. If we do, that’s a design smell that would indicate that something is wrong.

public class Deck
{
    private const int _size = 52;
    private string[] _cards = new string[_size];
    public int Length { get { return _cards.Length; } }
}


In this post, we’ve fleshed out our Deck class a little more and gone through the full red/green/refactor TDD cycle. I also introduced FluentAssertions, and showed the watch window output as the test failed.

2017-11-21 00:00


A few months ago I left a busy startup job I’d had for over a year. The work was engrossing: I stopped blogging, but I was programming every day. I learned a completely new language, but got plenty of chances to use my existing knowledge. That is, after all, why they hired me.


I especially liked something that might seem boring: combing through logs of occasional server errors and modifying our code to avoid them. Maybe it was because I had set up the monitoring system. Or because I was manually deleting servers that had broken in new ways. The economist in me especially liked putting a dollar value on bugs of this nature: 20 useless servers cost an extra 500 dollars a week on AWS.

But, there’s only so much waste like this to clean up. I’d automated most of the manual work I was doing and taught a few interns how to do the rest. I spent two weeks openly wondering what I’d do after finishing my current project, even questioning whether I’d still be useful with the company’s new direction.

Career Tip: don’t do this.

That’s when we agreed to part ways. So, there I was, no “official” job but still a ton of things to keep me busy. I’d help run a chain of Hacker Hostels in Silicon Valley, I was still maintaining Wine as an Ubuntu developer, and I was still a “politician” on Ubuntu’s Community Council having weekly meetings with Mark Shuttleworth.

Politicking, business management, and even Ubuntu packaging, however, aren’t programming. I just wasn’t doing any of it anymore, until last week. I got curious about counting my users on Launchpad. Download counts are exposed by an API, but not viewable on any webpage. No one else had written a proper script to harvest that data. It was time to program.


And man, I went a little nuts. It was utterly engrossing, in the way that writing and video games used to be. I found myself up past 3am before I even noticed the time; I’d spent a whole day just testing and coding before finally putting it on GitHub. I rationalized my need to make it good as a service to others who’d use it. But in truth I just liked doing it.

It didn’t stop there. I started looking around for programming puzzles. I wrote 4 lines of Python that I thought were so neat they needed to be posted as a self-answered question on Stack Overflow. I literally thought they were beautiful, and using the new "yield from" feature in Python 3 made me inordinately happy.
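
His four lines aren't reproduced here, but as a minimal sketch of the "yield from" delegation he's referring to (my own illustration, not his actual snippet):

```python
def flatten(nested):
    """Yield every item from each sub-iterable in turn.

    'yield from sub' delegates to the sub-iterable, replacing the
    pre-Python-3.3 pattern 'for item in sub: yield item'.
    """
    for sub in nested:
        yield from sub

# Lazily flattens one level of nesting:
result = list(flatten([[1, 2], [3], [4, 5]]))  # [1, 2, 3, 4, 5]
```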

And now, I’m writing again. And making terrible cartoons on my penboard. I missed this shit. It’s fucking awesome.

by YokoZar at 2014-01-29 02:46


Lock'n'Roll, a Pidgin plugin for Windows designed to set an away status message when the PC is locked, has received its first update in three and a half years!

Daniel Laberge has forked the project and released a version 1.2 update which allows you to specify which status should be set when the workstation locks. Get it while it’s awesome (always)!

by Chris at 2013-02-08 03:56


How do you generate the tangent vectors, which represent which way the texture axes on a textured triangle are facing?

Hitting up Google tends to produce articles like this one, or maybe even that exact one. I've seen others linked too; the basic formulae tend to be the same. Have you looked at what you're pasting into your code, though? Have you noticed that you're using the T coordinates to calculate the S vector, and vice versa? Look at the underlying math, and you'll find that it's because the usual derivation assumes the normal, S vector, and T vector form an orthonormal matrix and attempts to invert it; in a sense you're not really using the S and T vectors, but rather vectors perpendicular to them.
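
For reference, the commonly pasted formula looks something like the following (a Python sketch of the standard edge-delta derivation; the function and variable names are mine, not from any of those articles). Note how the T-coordinate deltas (t1, t2) appear in the S-vector computation and vice versa:

```python
def standard_tangents(p0, p1, p2, uv0, uv1, uv2):
    """The widely copied tangent-basis formula for a triangle.

    p0..p2 are 3D positions, uv0..uv2 are (s, t) texture coordinates.
    Returns the (S, T) texture-axis vectors it produces.
    """
    # Position edges relative to the first vertex
    e1 = [b - a for a, b in zip(p0, p1)]
    e2 = [b - a for a, b in zip(p0, p2)]
    # Texture-coordinate edges
    s1, t1 = uv1[0] - uv0[0], uv1[1] - uv0[1]
    s2, t2 = uv2[0] - uv0[0], uv2[1] - uv0[1]
    r = 1.0 / (s1 * t2 - s2 * t1)
    # The S vector is built from the T deltas, and vice versa
    s_vec = [(t2 * a - t1 * b) * r for a, b in zip(e1, e2)]
    t_vec = [(s1 * b - s2 * a) * r for a, b in zip(e1, e2)]
    return s_vec, t_vec
```

On an axis-aligned, grid-like mapping this gives the expected axes; the article's point is what happens when the projection is skewed.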

But that's fine, right? I mean, it's an orthogonal matrix, and they are perpendicular to each other, right? Well, does your texture project onto the triangle with the texture axes at right angles to each other, like a grid?

... Not always? Well, you might have a problem then!

So, what's the real answer?

Well, what do we know? First, translating the vertex positions will not affect the axial directions. Second, scrolling the texture will not affect the axial directions.

So, for triangle (A,B,C), with coordinates (x,y,z,t), we can create a new triangle (LA,LB,LC) and the directions will be the same:

We also know that both axis directions lie on the same plane as the points, so to resolve that, we can convert this into a local coordinate system and force one axis to zero.

Now we need triangle (Origin, PLB, PLC) in this local coordinate space. We know PLB[y] is zero since LB was used as the X axis.

Now we can solve this. Remember that PLB[y] is zero, so...

Do this for both axes and you have your correct texture axis vectors, regardless of the texture projection. You can then multiply the results by your tangent-space normalmap, normalize the result, and have a proper world-space surface normal.

As always, the source code spoilers:

terVec3 lb = ti->points[1] - ti->points[0];
terVec3 lc = ti->points[2] - ti->points[0];
terVec2 lbt = ti->texCoords[1] - ti->texCoords[0];
terVec2 lct = ti->texCoords[2] - ti->texCoords[0];

// Generate local space for the triangle plane
terVec3 localX = lb.Normalize2();
terVec3 localZ = lb.Cross(lc).Normalize2();
terVec3 localY = localX.Cross(localZ).Normalize2();

// Determine X/Y vectors in local space
float plbx = lb.DotProduct(localX);
terVec2 plc = terVec2(lc.DotProduct(localX), lc.DotProduct(localY));

terVec2 tsvS, tsvT;

tsvS[0] = lbt[0] / plbx;
tsvS[1] = (lct[0] - tsvS[0]*plc[0]) / plc[1];
tsvT[0] = lbt[1] / plbx;
tsvT[1] = (lct[1] - tsvT[0]*plc[0]) / plc[1];

ti->svec = (localX*tsvS[0] + localY*tsvS[1]).Normalize2();
ti->tvec = (localX*tsvT[0] + localY*tsvT[1]).Normalize2();

There's an additional special case to be aware of: Mirroring.

Mirroring across an edge can cause wild changes in a vector's direction, possibly even degenerating it. There isn't a clear-cut solution to these, but you can work around the problem by snapping the vector to the normal, effectively cancelling it out on the mirroring edge.

Personally, I check the angle between the two vectors: if they're more than 90 degrees apart, I cancel them; otherwise, I merge them.
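
That rule could be sketched roughly like this (plain Python, hypothetical helper name, assuming unit-length inputs; the angle test reduces to the sign of the dot product):

```python
import math

def merge_tangents(v1, v2, normal):
    """Combine two per-face tangent vectors at a shared edge.

    If they are more than 90 degrees apart (dot product < 0), the edge
    is mirrored: snap to the normal, effectively cancelling the tangent.
    Otherwise merge by summing and renormalizing.
    """
    dot = sum(a * b for a, b in zip(v1, v2))
    if dot < 0.0:  # more than 90 degrees apart: mirrored edge
        return tuple(normal)
    summed = [a + b for a, b in zip(v1, v2)]
    length = math.sqrt(sum(c * c for c in summed))
    return tuple(c / length for c in summed)
```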

by OneEightHundred at 2012-01-08 00:23


Valve's self-shadowing radiosity normal maps concept can be used with spherical harmonics in approximately the same way: Integrate over the sphere based on how much light will affect a sample if incoming from numerous sample directions, accounting for collision with other samples due to elevation.

You can store this as three DXT1 textures, though you can improve quality by packing channels with similar spatial coherence. Coefficients 0, 2, and 6 in particular tend to pack well, since they're all dominated primarily by directions aimed perpendicular to the texture.

I use the following packing:
Texture 1: Coefs 0, 2, 6
Texture 2: Coefs 1, 4, 5
Texture 3: Coefs 3, 7, 8

You can reference an early post on this blog for code on how to rotate a SH vector by a matrix, in turn allowing you to get it into texture space. Once you've done that, simply multiply each SH coefficient from the self-shadowing map by the SH coefficients created from your light source (also covered on the previous post) and add together.
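
The multiply-and-add step described above is just a coefficient-wise product and sum per channel. A rough Python sketch (names are mine, not Valve's; each argument is a 9-element SH coefficient vector for one color channel):

```python
def shade_sample(shadow_coefs, light_coefs):
    """Multiply each self-shadowing SH coefficient by the matching
    light-source SH coefficient and sum: a 9-term dot product that
    yields the shaded intensity for one channel."""
    return sum(s * l for s, l in zip(shadow_coefs, light_coefs))

def add_lights(*coef_sets):
    """Multiple lights combine by simply adding their SH coefficient
    vectors before the per-texel multiply."""
    return [sum(cs) for cs in zip(*coef_sets)]
```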

by OneEightHundred at 2011-12-07 18:39


Spherical harmonics seems to have some impenetrable level of difficulty, especially among the indie scene, which has little to go off of other than a few presentations and whitepapers, some of which even contain incorrect information (e.g. one of the formulas in the Sony paper on the topic is incorrect), and most of which still use ZYZ rotations because it's so hard to find out how to do a matrix rotation.

Hao Chen and Xinguo Liu did a presentation at SIGGRAPH '08, and the slides from it contain a good deal of useful stuff, not to mention one of the ONLY easy-to-find rotate-by-matrix functions. It also treats the Z axis a bit awkwardly, so I patched the rotation code up a bit, and added a pre-integrated cosine convolution filter so you can easily get SH coefs for a directional light.

There was also gratuitous use of sqrt(3) multipliers, which can be completely eliminated by simply premultiplying or predividing coef #6 by it; incidentally, this causes all of the constants and multipliers to resolve to rational numbers.

As always, you can include multiple lights by simply adding the SH coefs for them together. If you want specular, you can approximate a directional light by using the linear component to determine the direction, and constant component to determine the color. You can do this per-channel, or use the average values to determine the direction and do it once.

Here are the spoilers:

#define SH_AMBIENT_FACTOR   (0.25f)
#define SH_LINEAR_FACTOR (0.5f)
#define SH_QUADRATIC_FACTOR (0.3125f)

void LambertDiffuseToSHCoefs(const terVec3 &dir, float out[9])
{
    // Constant
    out[0] = 1.0f * SH_AMBIENT_FACTOR;

    // Linear
    out[1] = dir[1] * SH_LINEAR_FACTOR;
    out[2] = dir[2] * SH_LINEAR_FACTOR;
    out[3] = dir[0] * SH_LINEAR_FACTOR;

    // Quadratics
    out[4] = ( dir[0]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;
    out[5] = ( dir[1]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;
    out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * SH_QUADRATIC_FACTOR;
    out[7] = ( dir[0]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;
    out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;
}

void RotateCoefsByMatrix(float outCoefs[9], const float pIn[9], const terMat3x3 &rMat)
{
    // DC
    outCoefs[0] = pIn[0];

    // Linear
    outCoefs[1] = rMat[1][0]*pIn[3] + rMat[1][1]*pIn[1] + rMat[1][2]*pIn[2];
    outCoefs[2] = rMat[2][0]*pIn[3] + rMat[2][1]*pIn[1] + rMat[2][2]*pIn[2];
    outCoefs[3] = rMat[0][0]*pIn[3] + rMat[0][1]*pIn[1] + rMat[0][2]*pIn[2];

    // Quadratics
    outCoefs[4] = (
          ( rMat[0][0]*rMat[1][1] + rMat[0][1]*rMat[1][0] ) * ( pIn[4] )
        + ( rMat[0][1]*rMat[1][2] + rMat[0][2]*rMat[1][1] ) * ( pIn[5] )
        + ( rMat[0][2]*rMat[1][0] + rMat[0][0]*rMat[1][2] ) * ( pIn[7] )
        + ( rMat[0][0]*rMat[1][0] ) * ( pIn[8] )
        + ( rMat[0][1]*rMat[1][1] ) * ( -pIn[8] )
        + ( rMat[0][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )
        );

    outCoefs[5] = (
          ( rMat[1][0]*rMat[2][1] + rMat[1][1]*rMat[2][0] ) * ( pIn[4] )
        + ( rMat[1][1]*rMat[2][2] + rMat[1][2]*rMat[2][1] ) * ( pIn[5] )
        + ( rMat[1][2]*rMat[2][0] + rMat[1][0]*rMat[2][2] ) * ( pIn[7] )
        + ( rMat[1][0]*rMat[2][0] ) * ( pIn[8] )
        + ( rMat[1][1]*rMat[2][1] ) * ( -pIn[8] )
        + ( rMat[1][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )
        );

    outCoefs[6] = (
          ( rMat[2][1]*rMat[2][0] ) * ( pIn[4] )
        + ( rMat[2][2]*rMat[2][1] ) * ( pIn[5] )
        + ( rMat[2][0]*rMat[2][2] ) * ( pIn[7] )
        + 0.5f*( rMat[2][0]*rMat[2][0] ) * ( pIn[8] )
        + 0.5f*( rMat[2][1]*rMat[2][1] ) * ( -pIn[8] )
        + 1.5f*( rMat[2][2]*rMat[2][2] ) * ( pIn[6] )
        - 0.5f * ( pIn[6] )
        );

    outCoefs[7] = (
          ( rMat[0][0]*rMat[2][1] + rMat[0][1]*rMat[2][0] ) * ( pIn[4] )
        + ( rMat[0][1]*rMat[2][2] + rMat[0][2]*rMat[2][1] ) * ( pIn[5] )
        + ( rMat[0][2]*rMat[2][0] + rMat[0][0]*rMat[2][2] ) * ( pIn[7] )
        + ( rMat[0][0]*rMat[2][0] ) * ( pIn[8] )
        + ( rMat[0][1]*rMat[2][1] ) * ( -pIn[8] )
        + ( rMat[0][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )
        );

    outCoefs[8] = (
          ( rMat[0][1]*rMat[0][0] - rMat[1][1]*rMat[1][0] ) * ( pIn[4] )
        + ( rMat[0][2]*rMat[0][1] - rMat[1][2]*rMat[1][1] ) * ( pIn[5] )
        + ( rMat[0][0]*rMat[0][2] - rMat[1][0]*rMat[1][2] ) * ( pIn[7] )
        + 0.5f*( rMat[0][0]*rMat[0][0] - rMat[1][0]*rMat[1][0] ) * ( pIn[8] )
        + 0.5f*( rMat[0][1]*rMat[0][1] - rMat[1][1]*rMat[1][1] ) * ( -pIn[8] )
        + 0.5f*( rMat[0][2]*rMat[0][2] - rMat[1][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )
        );
}

... and to sample it in the shader ...

float3 SampleSHQuadratic(float3 dir, float3 shVector[9])
{
    float3 ds1 = dir * dir;
    float3 ds2 = dir * dir.yzx; // xy, zy, xz

    float3 v = shVector[0];

    v += dir.y * shVector[1];
    v += dir.z * shVector[2];
    v += dir.x * shVector[3];

    v += ds2.x * shVector[4];
    v += ds2.y * shVector[5];
    v += (ds1.z * 1.5 - 0.5) * shVector[6];
    v += ds2.z * shVector[7];
    v += (ds1.x - ds1.y) * 0.5 * shVector[8];

    return v;
}

For Monte Carlo integration, take sampling points, feed direction "dir" to the following function to get multipliers for each coefficient, then multiply by the intensity in that direction. Divide the total by the number of sampling points:

void SHForDirection(const terVec3 &dir, float out[9])
{
    // Constant
    out[0] = 1.0f;

    // Linear
    out[1] = dir[1] * 3.0f;
    out[2] = dir[2] * 3.0f;
    out[3] = dir[0] * 3.0f;

    // Quadratics
    out[4] = ( dir[0]*dir[1] ) * 15.0f;
    out[5] = ( dir[1]*dir[2] ) * 15.0f;
    out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * 5.0f;
    out[7] = ( dir[0]*dir[2] ) * 15.0f;
    out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 15.0f;
}

... and finally, for a uniformly-distributed random point on a sphere ...

terVec3 RandomDirection(int (*randomFunc)(), int randMax)
{
    float u = (((float)randomFunc()) / (float)(randMax - 1))*2.0f - 1.0f;
    float n = sqrtf(1.0f - u*u);

    float theta = 2.0f * M_PI * (((float)randomFunc()) / (float)(randMax));

    return terVec3(n * cos(theta), n * sin(theta), u);
}
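
Putting the Monte Carlo procedure together, the loop might look like this in Python (a sketch mirroring the two functions above; intensity_fn is a hypothetical callback returning the light intensity arriving from a direction):

```python
import math
import random

def sh_for_direction(d):
    """Coefficient multipliers for a unit direction d = (x, y, z)."""
    x, y, z = d
    return [1.0,
            y * 3.0, z * 3.0, x * 3.0,
            x * y * 15.0, y * z * 15.0,
            (1.5 * z * z - 0.5) * 5.0,
            x * z * 15.0,
            0.5 * (x * x - y * y) * 15.0]

def random_direction(rng=random):
    """Uniformly distributed random point on the unit sphere."""
    u = rng.uniform(-1.0, 1.0)
    n = math.sqrt(1.0 - u * u)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (n * math.cos(theta), n * math.sin(theta), u)

def integrate_sh(intensity_fn, samples=1024):
    """Accumulate each coefficient multiplier scaled by the intensity in
    that direction, then divide the totals by the sample count."""
    coefs = [0.0] * 9
    for _ in range(samples):
        d = random_direction()
        w = intensity_fn(d)
        for i, c in enumerate(sh_for_direction(d)):
            coefs[i] += c * w
    return [c / samples for c in coefs]
```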

by OneEightHundred at 2011-12-02 12:22


Fresh install on OS X of ColdFusion Builder 2 (TWO, the SECOND one). Typing a simple conditional, this is what I was given:

I also had to manually write the closing cfif tag. It's such a joke.

The absolute core purpose of an IDE is to be a text editor. Secondary to that are other features that are supposed to make you work better. ColdFusion Builder 2 (TWO!!!!!) completely fails on all levels as a text editor. It doesn't even function as well as notepad.exe!

Text search is finicky, Find & Replace is completely broken half the time, the UI is often unresponsive (yay Eclipse), the text cursor sometimes disappears, double-clicking folders or files in an FTP view pops up the Rename dialog every time, HTML / CF tag completion usually doesn't happen, indentation is broken, function parameter tooltips obscure the place you are typing, # and " completion randomly breaks (often leaving you with a ###)... the list goes on and on.

Adobe has a big feature list on their site. I'm thinking maybe they should go back and use some resources to fix the parts where you type things into the computer, you know, the whole point of the thing.

by Ted at 2011-12-01 15:14


Has it really been a year since the last update?

Well, things have been chugging along with less discovery and more actual work. However, development on TDP is largely on hold due to the likely impending release of the Doom 3 source code, which has numerous architectural improvements like rigid-body physics and much better customization of entity networking.

In the meantime, however, a component of TDP has been spun off into its own project: The RDX extension language. Initially planned as a resource manager, it has evolved into a full-fledged programmability API. The main goal was to have a runtime with very straightforward integration, to the point that you can easily use it for managing your C++ resources, but also to be much higher performance than dynamically-typed interpreted languages, especially when dealing with complex data types such as float vectors.

Features are still being implemented, but the compiler seems to be stable and load-time conversion to native x86 code is functional. Expect a real release in a month or two.

The project now has a home on Google Code.

by OneEightHundred at 2011-10-19 01:37