sn.printf.net


2024-06-08

I think failures are educational and worth documenting, so here's the story of how FSE-on-GPU failed.

One of the things I'm working on right now is Gstd, an attempt to make a Zstandard variant that runs on GPU compute.

Zstandard uses a newer entropy coding method called FSE, which is a variant of tabled asymmetric numeral systems (tANS).  Using this scheme requires two steps, both of which have challenges in how to implement them on GPU compute: Decoding the probabilities and converting those probabilities into a decode table.

GPUs are designed for highly parallel instruction execution, especially executing the same operation on many different inputs at once (a "vector" of values).  Like Brotli-G and GDEFLATE, Gstd uses multiple bitstreams simultaneously to allow values to be parsed from all of them at once.

One of the reasons this type of scheme is so efficient is that GPUs have instructions to compute a running total for each of the inputs, so if you can create a vector containing 0 for lanes that don't need to be refilled and 1 for lanes that do need to be refilled, and do a prefix-sum op, then you'll get a vector where all of the lanes contain the offset from the current read position that they need to be refilled from.
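As a concrete illustration, here's a scalar C++ model of that refill trick (a real shader would use a wave/subgroup prefix-sum intrinsic; the lane count and one-word-per-refill layout here are assumptions for illustration, not Gstd's actual layout):

```cpp
// Scalar model of the refill-offset trick: 1 = lane needs another word from
// the shared bitstream, 0 = it doesn't.
#include <cstdio>

int main() {
    const int kLanes = 8;
    int needsRefill[kLanes] = {1, 0, 1, 1, 0, 0, 1, 0};

    int offset[kLanes];
    int runningTotal = 0;                 // exclusive prefix sum
    for (int lane = 0; lane < kLanes; ++lane) {
        offset[lane] = runningTotal;      // how many earlier lanes also refill
        runningTotal += needsRefill[lane];
    }

    // Each refilling lane reads the word at readPos + offset[lane]; afterwards
    // the shared read position advances by the total number of refills.
    for (int lane = 0; lane < kLanes; ++lane)
        if (needsRefill[lane])
            std::printf("lane %d refills from readPos + %d\n", lane, offset[lane]);
    std::printf("advance readPos by %d words\n", runningTotal);
    return 0;
}
```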

Doing things in parallel this way however requires minimizing the dependency of sequential values.  Running totals ARE dependent on preceding values, but they can be done fast because the GPU has an operation to do it.

This creates a problem for decoding FSE because it has multiple steps that are sequentially dependent.

Here's a quick summary of the FSE table decoding and generation algorithm:

  • Determine how many table slots have yet to be assigned.
  • Determine the maximum possible value that the next value to read can have - It is equal to the number of remaining slots + 1.
  • Load enough bits to encode that value.
  • If the value is small enough (how small is determined by the number of remaining slots), return 1 bit to the bitstream (or, depending on how you phrase it, don't load it in the first place).  This step prevents any value from being encoded that would be larger than the number of remaining slots.
  • Subtract 1 from the decoded number
  • If the resulting number is -1, then it is a special "less than one" probability with a slot usage of 1.  Otherwise the slot usage is the decoded value.
  • Repeat the preceding steps until there are no slots remaining.
  • Once all symbols are decoded, position all "less than one" probabilities at the end.
  • For each symbol, scatter the slots by adding a prime number to the last slot index, and then take that sum modulo the table size.  If it overlaps a less-than-one entry, repeat this process until an empty slot is found.
  • Sort the table positions in ascending order.
  • Assign each slot a baseline and bit usage.  Basically, when a value is decoded, the slot corresponding to a "state" value is read from the table.  That slot indicates a number of bits to load and a baseline value that it must be added to.

The example given in the specification looks like this:

state order          0       1       2       3       4
state value          1      39      77      84     122
width               32      32      32      16      16
Number_of_Bits       5       5       5       4       4
range number         2       4       6       0       1
Baseline            32      64      96       0      16
range            32-63   64-95  96-127    0-15   16-31

Note that the baselines start at zero where bit usage is low and wrap around to the high bit usage ones, which are lower-numbered and come first.  The size of the baseline range is always 2^Number_Of_Bits, which ensures there is always a slot that will decode to any given state value.  Like all ANS variants, these are encoded in the reverse order that they are decoded, so there always needs to be a slot that will decode to a symbol that, combined with the bits, equals the desired state value.
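As a cross-check on that description, here's a small C++ sketch that assigns bit counts and baselines to one symbol's slots and reproduces the example table above (5 slots in a 128-entry table); it mirrors the usual FSE construction, but it isn't lifted from any particular implementation:

```cpp
// Reproduce the Number_of_Bits / Baseline columns of the example table above
// for a symbol that owns 5 slots in a 128-entry table.
#include <cstdio>

static int highestBit(int v) {            // position of the highest set bit
    int r = 0;
    while (v > 1) { v >>= 1; ++r; }
    return r;
}

int main() {
    const int tableLog  = 7;              // 2^7 = 128 table slots
    const int tableSize = 1 << tableLog;
    const int slots     = 5;              // this symbol's slot usage

    for (int i = 0; i < slots; ++i) {
        int next     = slots + i;                      // 5, 6, 7, 8, 9
        int nbBits   = tableLog - highestBit(next);    // 5, 5, 5, 4, 4
        int baseline = (next << nbBits) - tableSize;   // 32, 64, 96, 0, 16
        std::printf("slot %d: %d bits, baseline %d, range %d-%d\n",
                    i, nbBits, baseline, baseline, baseline + (1 << nbBits) - 1);
    }
    return 0;
}
```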

There are several serial dependencies here:

  • The low-bit-usage cutoff of each probability depends on the slot usage remaining after the previous symbol.
  • The table index depends on how many less-than-one probabilities were skipped by preceding symbols (otherwise it is just a multiple of the slot index in the series).
  • The bit ranges depend on how many slots of the same symbol precede it in numerical order.

The last one has a few implications.  There are basically two ways to do it: Run a loop over the post-placement slots and figure out which occurrence of the symbol each entry is, or run a loop over pre-placement slots for each symbol to sort them.  Both are seriously problematic due to low utilization.  What if we could just assign the baselines to pre-scattered values instead of post-scatter?

This goes against the theory, but if it didn't hurt the compression ratio too much, it might still work.  A clever idea that turned out to be a big mistake.  It's actually pretty nice in terms of implementation possibilities:

First, fill an empty table with zeroes and then pack the baseline, bits, and symbol into a value and put it into the table.  On hardware with WaveMatch and equivalent ops, it's possible to create a bitfield containing 0 for lanes with zeroes and 1 for lanes with non-zero values, which can be used in tandem with masking and high-bit search ops to quickly find which lane preceding a given lane has a non-zero value.  On hardware without it, it's possible to do it in logarithmic time by packing and transforming the values in a way that their desired order is numerically ascending and repeatedly using WaveMax.
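Here's a scalar C++ model of the first path described above - build a one-bit-per-lane mask of non-zero entries, then mask and do a high-bit search to find the nearest earlier non-zero lane.  On GPU this would be a ballot/WaveMatch-style op plus a firstbithigh-type instruction; the packed values here are placeholders:

```cpp
#include <cstdio>
#include <cstdint>

static int highestSetBit(uint32_t v) {    // portable stand-in for a high-bit search op
    int r = -1;
    while (v) { v >>= 1; ++r; }
    return r;
}

int main() {
    const int kLanes = 16;
    // Packed (baseline/bits/symbol) entries; 0 means "slot not yet assigned".
    uint32_t packed[kLanes] = {7, 0, 0, 5, 0, 9, 0, 0, 0, 3, 0, 0, 8, 0, 0, 0};

    // One bit per lane, set if the lane holds a non-zero entry.
    uint32_t nonZeroMask = 0;
    for (int lane = 0; lane < kLanes; ++lane)
        if (packed[lane] != 0) nonZeroMask |= 1u << lane;

    // For each lane, mask off itself and later lanes, then take the highest
    // remaining bit to find the closest preceding non-zero lane.
    for (int lane = 0; lane < kLanes; ++lane) {
        uint32_t earlier = nonZeroMask & ((1u << lane) - 1u);
        if (earlier == 0) { std::printf("lane %2d: no non-zero lane before it\n", lane); continue; }
        int src = highestSetBit(earlier);
        std::printf("lane %2d: nearest non-zero lane before it is %d (value %u)\n",
                    lane, src, packed[src]);
    }
    return 0;
}
```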

The baselines are ascending, but not monotonically ascending, so this basically involved negating the bit usage and coding a value that could be subtracted from the slot index and then multiplied by 2^BitUsage to get the actual baseline.

Once that's all done, loop over that table and multiply each index by the prime distribution value and take that value modulo the table size to get the final position.  Great!


So, I tested this out with some sample data from OpenArena.  The results were coming out pretty favorably (~15%) ahead of GDEFLATE and changing from post-scatter to pre-scatter was not having a big impact on the compression ratio.

However, FSE is still difficult to work with on GPU: It requires a big table, and that big table has to be indexed from groupshared memory because of how big it is, unless it can be crammed into a register somehow... But there's a lot of data that would have to be crammed into registers!

For 15%, it would be worth it though!  Unfortunately, I hit a painful realization while working on the decoder: The encoder was dropping the contents of incompressible blocks.  Once that was fixed, the savings were only about 3%.

3% was not great to begin with, but was especially odd because GDEFLATE is not functionally different from Deflate, and Zstandard usually outperforms Deflate by a measurable margin.  Upon running tests, there were a good number of blocks where Zstandard was compressing to about 10% larger than GDEFLATE.  My best guess was, and still is, that GDEFLATE uses libdeflate as its compressor, and libdeflate is quite good... so, to test this possibility, I needed to make some code that could convert a Deflate block into a Zstandard block!

Now, in order to do that, I needed to generate FSE probability tables for the match length, literal length, and offset codes, and then rescale them to total a power of 2 as the scheme requires.  How do you do that?

Well, normally for an arithmetic or range coder, the cost per occurrence of a symbol is -log(Probability/SumOfProbabilities)/log(2).  As a symbol's probability increases, its bit cost declines rapidly at first and then more slowly.  Through some math, this can be simplified into smaller calculations, but there's one thing that's weird about FSE: The reduction of bit cost seems linear, doesn't it?  Maybe the bit usage is just approximately-logarithmic but not actually?  Close enough?

Suppose the successor state for a symbol is uniformly random (and they do behave pretty randomly).  Have a look at the table above.  If you bump the slot usage for a symbol by 1, the effect is to split one of those 5-bit ranges into two 4-bit ranges, but those 4-bit ranges cover the same range of successor states, so really that fraction of the successor state ranges was reduced by 1 bit in cost.  The problem is that all of those 5-bit ranges are the same size, so each time you add 1 slot, it keeps reducing the bits-per-symbol by the same amount, not an amount that's changing according to a logarithm curve.  It continues doing this until it hits a power of 2, at which point it has to start splitting 4-bit ranges into 3-bit ranges.
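To put numbers on that, here's a quick check (assuming uniform successor states and a 128-entry table like the spec example above): between 4 and 8 slots, the expected bits per symbol falls by exactly 0.25 per added slot, while the ideal -log2 cost follows a curve.

```cpp
#include <cstdio>
#include <cmath>

int main() {
    const int tableLog  = 7;
    const int tableSize = 1 << tableLog;   // 128, as in the spec example

    for (int slots = 4; slots <= 8; ++slots) {
        // Find the wider bit count used by this symbol, then split its slots
        // into wide (maxBits) and narrow (maxBits - 1) ranges that exactly
        // cover the table.
        int maxBits = 1;
        while ((slots << maxBits) < tableSize) ++maxBits;
        int wide   = tableSize / (1 << (maxBits - 1)) - slots;
        int narrow = slots - wide;

        // With uniform successor states, a slot is hit in proportion to the
        // number of states it covers (2^nbBits).
        double expectedBits =
            (wide   * (1 << maxBits)       * maxBits +
             narrow * (1 << (maxBits - 1)) * (maxBits - 1)) / (double)tableSize;
        double idealBits = -std::log2(slots / (double)tableSize);

        std::printf("%d slots: %.3f bits assuming uniform states, %.3f bits ideal\n",
                    slots, expectedBits, idealBits);
    }
    return 0;
}
```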


But that means the ideal slot to bump the probability of is the same until it hits a power-of-2 usage, resulting in a table that contains only power-of-2 values except for at most one probability - and all of those powers of 2 have integral bit usage, which is just Huffman.

This seemed very problematic so I had to run a few tests, and the answer was that FSE doesn't have this problem because given uniform successor states, FSE will produce predecessor states that are disproportionately lower values.  (This is why the less-than-one distribution works in the first place.)  It's somewhat easy to see why: Lower-numbered states take up more numeric range.

Doing baseline distribution before scattering breaks that assumption though, and it makes the predecessor states uniformly distributed.


Some further tests confirmed my suspicions: If you use a properly-scaled probability distribution with cell properties determined before scattering, then the result is worse than Huffman.  It's only better than Huffman if the cell properties are determined AFTER scattering.

That means that in order to do this right, it was necessary to sort the entries, which I don't think there is any particularly good algorithm for.

But tests showed it was only a small amount of overhead?

Yeah, but Huffman is only a small amount of overhead too.  FSE and arithmetic coding make it possible to get fractional bit usage, but the benefit of that depends a lot on the distribution of values.  The fewer bits used by the most-probable symbols, the more likely it is to help, because a fractional bit of accuracy is a bigger portion of the values being encoded.


The silver lining

Ultimately, this realization is pretty lethal to using FSE on GPU, since the remaining options for table construction are quite bad, and the table is a problem in the first place.

The good news is that rANS should still work.  In fact, the way GDEFLATE decodes Huffman codes is by using binary search within a vector register, and rANS symbol lookup would work the same way.
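For what it's worth, the symbol lookup itself is simple: given a cumulative frequency table, the low bits of the rANS state select a slot and a binary search finds the owning symbol.  A minimal sketch (the frequencies and state value below are made up, and renormalization is omitted):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
    // Example frequencies scaled to a power-of-two total (16).
    const uint32_t freq[4]    = {8, 4, 3, 1};         // symbols 0..3
    const uint32_t cumFreq[5] = {0, 8, 12, 15, 16};   // prefix sums

    uint32_t state = 0x1234;                  // some rANS state
    uint32_t slot  = state & 15;              // state mod total

    // Binary search for the symbol whose [cumFreq[s], cumFreq[s+1]) holds slot.
    int symbol = int(std::upper_bound(cumFreq, cumFreq + 5, slot) - cumFreq) - 1;

    // Standard rANS decode step: x' = freq * (x / total) + (x mod total) - cumFreq.
    state = freq[symbol] * (state >> 4) + slot - cumFreq[symbol];

    std::printf("decoded symbol %d, next state 0x%x\n", symbol, state);
    return 0;
}
```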

So, Gstd isn't dead yet (it will be dead if it turns out that the larger blocks are due to format inefficiencies instead of libdeflate being really good), but it will not be using FSE, and I think the work of getting FSE to work on GPU compute was unfortunately wasted.

by OneEightHundred (noreply@blogger.com) at 2024-06-08 17:21

2024-03-06

This is part 3 of a 4-part series, mirrored from the Gale Force Games page.

Part 1 is here.

Part 2 is here.

Part 4 is here.

A quick foreword on "un-exporting" projects

The only other engine in ScummVM that really resembles mTropolis in terms of the development style and challenges is Director, and ScummVM's Director engine is able to take advantage of a tool called ProjectorRays that converts exported and protected Director movies back into a form that Director itself can load, including decompiling Lingo bytecode into source code.

Wouldn't something like that for mTropolis be useful too?  Unfortunately, no, for multiple reasons:

  1. Running things in mTropolis is buggy.  If you change the state of a scene object while running the game, the state can persist into the editor.  The documentation even tells you to do this for list variables because there is no way to edit their contents in the editor for some reason.  Scenes may be loaded when playing in the editor when they wouldn't be loaded in-game.
  2. Setting up test cases to determine its behavior doesn't usually require using the actual title.
  3. Plug-ins have editor-only files that aren't distributed with titles.  For instance, here's the Resource directory of the demo:

    The files ending with ".ePP" provide editor functionality; only the ".rPP" and ".cPP" files are distributed with the built title.  That means if a title is using a non-standard plug-in, the editor won't be able to use it, because the ".ePP" file for it is missing.
  4. mTropolis has a lot of non-standard plug-ins.  Remember what I said earlier about it seeming like mFactory was using a business-to-business model?  A lot of plug-ins have shown up in retail titles that seem like pre-release or customized versions of the standard ones.  There is very little consistency.
  5. I made a debugger that is way more informative than the mTropolis editor anyway, which we'll get to very soon!

We can rebuild it. We have the technology.

In the last installment, I covered how MTDisasm came into being and ultimately was able to fully dump Obsidian's game data.  At that point, making it work again was clearly an achievable goal, it was just going to take a lot of work to do it.

My initial plan was to convert MTDisasm into a library and use that as the loader for a new program called MTEmu (mUlator was a bit too on-the-nose).  Unfortunately, while I had done a bunch of OS interface stuff for my Glider PRO port, very little of it except for the resource loader was going to be useful for Obsidian.  Even the StuffIt unpack tool, ported from The Unarchiver, wasn't going to be too useful - the installer format used a different algorithm, so I'd have to port that too.  I was basically going to be starting from 2D drawing boilerplate, and writing boilerplate sucks.

Trying to add it to ScummVM was somewhat under consideration, and it already had things I needed: QuickTime parsing and decoding, PE-COFF resource parsing (needed to get cursors out of DLLs), MIDI output, and all of the OS boilerplate.  My biggest reservation though was the license.  I am not a GPL fan, everything I do is MIT/Apache licensed if I can help it because I think reaching more platforms is more important than contributor reciprocation.

I ultimately decided to try doing it as a ScummVM addition anyway for three reasons:

  1. It was already in-demand for being added to ScummVM, and doing it as a separate project was probably going to result in it being brought in anyway with a bunch of duplicated work.
  2. I didn't think my reservations about the licensing were going to matter because I didn't think anyone was actually going to untangle the IP rights mess needed to put it back into print.  Surprise!
  3. I figured it was probably technically possible to do in some kind of arms-length way that the mTropolis player code could be easily hoisted out if I wanted to do it later.  It was, in fact, coded that way.

Obsidian on console when?

I want to keep the possibility open, but bringing it to console would mean re-implementing the things mentioned above (basically going the MTEmu route) and paying for certification.  It wouldn't be cheap or easy, and given the level of interest in it, I think it would be difficult to justify.

I didn't think it would be re-released in the first place though, so who knows?  If the people with the rights to it want ink on a dotted line, they know where to find me.

Foundational work and design

ScummVM has a "create_engine" tool now, but at the time the mTropolis work was started, the recommendation was to clone the engine used for acclaimed 1993 Game of the Year Plumbers Don't Wear Ties, so that was used to get a simple foundation.  One immediate problem, though, was that the initial plan of bringing MTDisasm in as a library was going to be a no-no with the project's code standards (which, among other things, prefer the project's own integer types over stdint.h).

Because of this, the MTDisasm data object loaders were mostly brought in by hand.  One thing I knew I wanted to do from the start was separate the loading of object data from instantiation of the objects, partly for cleanliness (though at the cost of duplication), partly to deal with disconnects between how data was stored on disk vs. how it should exist in a loaded object.

One of the first technical challenges was planning for linking up object references, which has 2 problematic aspects: The first problem was that if objects are loaded in some order, then it's possible that an object references another object that hasn't been loaded yet, so object link-up has to be its own phase.  The second, and much bigger problem, was mTropolis's system of "aliases."

As briefly mentioned in an earlier part, a modifier in mTropolis may be converted into an "alias" that allows the modifier to be reused in multiple different objects.  This was especially important in tandem with "behaviors," which are modifiers containing other modifiers.  One thing that might not be apparent from that though is that the behaviors have to be cloned when they are brought into the scene as aliases.  If a behavior includes a variable inside of the behavior, for instance, then each instance of that behavior has its own copy of the variable - and other things inside of the behavior reference that variable!

On top of that, if a variable is converted to an alias, then the aliased variable has another special behavior: Changes to any instance of the alias apply to all of them, making it effectively a variable reference.

Aliased variables were actually implemented via an incorrect assumption for quite a while.  Initially, they were implemented via a somewhat complicated approach where the aliased variable modifier existed globally, but references to it could be put in multiple parts of the project.  It later turned out that aliased variable modifiers are actually distinct objects - If you add them to the scene twice, each one has a different GUID.  The engine was eventually changed to handle this correctly, cloning variable modifiers and making them reference a shared storage object instead.

As a legacy of that though, the mTropolis engine's loader has several steps to making an object exist:

  1. The definition of an object is loaded from disk as a DataObject.
  2. An object corresponding to the DataObject's type (which may be an alias!) is created.
  3. The object is initialized by loading the data in the DataObject.
  4. The object is, at some point, added to the scene.
  5. After some operation that adds objects to the scene, the object is "materialized."

"Materializing" an object does 3 things:

  1. Assigns it a new runtime GUID
  2. Replaces any alias modifiers with clones of the global object referenced by the alias.
  3. Resolves any references inside of the object.  In some cases, this means references with static GUIDs are resolved up the scene hierarchy.  In other cases, it means that it detects that a reference is to an object that was cloned, and the reference is replaced with a reference to the clone.
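Here's a much-simplified C++ model of those three steps; every type and member name here is hypothetical, this is not the actual ScummVM code, and it skips resolving static GUIDs up the scene hierarchy:

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Object {
    uint32_t guid = 0;
    bool isAlias = false;
    uint32_t aliasTarget = 0;            // GUID of the global modifier the alias points to
    std::vector<uint32_t> references;    // GUIDs of other objects this one refers to
    std::vector<Object> children;        // attached modifiers/behaviors (deep-copied on clone)
};

struct Runtime {
    uint32_t nextRuntimeGUID = 0x10000;
    std::map<uint32_t, Object> globalModifiers;    // objects that aliases can reference

    // cloneMap records original GUID -> runtime GUID so references can be redirected.
    void materialize(Object& obj, std::map<uint32_t, uint32_t>& cloneMap) {
        cloneMap[obj.guid] = nextRuntimeGUID;
        obj.guid = nextRuntimeGUID++;                          // 1. fresh runtime GUID

        for (Object& child : obj.children) {
            if (child.isAlias)
                child = globalModifiers.at(child.aliasTarget); // 2. replace alias with a clone
            materialize(child, cloneMap);
        }

        for (uint32_t& ref : obj.references) {                 // 3. redirect references that
            auto it = cloneMap.find(ref);                      //    point at cloned objects
            if (it != cloneMap.end()) ref = it->second;
        }
    }
};
```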

Even now, it's not entirely clear how object references are supposed to work though.  Muppet Treasure Island, another mTropolis title, had numerous problems with variables having duplicate names but different contents and behaving in a way where the correct behavior must have involved resolving the references to a different one than what the GUID pointed to, but in other cases, the one pointed to by the GUID was the correct one.

Investing in the development experience

Dealing with a complex thing like this is really hard if you can't actively see what's happening.  The first line of defense is logging messages.  Since mTropolis depends heavily on message-passing for logic, it's really important to be able to see what messages are being sent and where they are going.  If you run the game with debug level 3 or higher (e.g. by putting "-d 3" in the command line), then ScummVM will log all message propagation to the console.


This is extremely important, but printing things to the console doesn't provide a lot of information about the state of the scene.  To help with a lot of development problems at once, one feature that went in very early was the debug overlay.  It was so high-priority that its position in the to-do list was right after being able to reach the first screen.

If you've paid attention while adding Obsidian to ScummVM (or if you go to the options by hitting Ctrl-F5 in the ZOOM/Steam version to exit to the launcher...), you may notice that there's an option marked "Start with debugger."

This launches the game with a debug overlay and a few buttons on the side, and a display that shows you the name of the active scene and active shared scene.

Unfortunately, the step-through debugger part was planned (and much of the internal architecture built around it), but never actually came to fruition because it turned out to not be the right tool for debugging most of the logic bugs that popped up.

Step-through debugging is useful if you want to be able to analyze the state of a program while it's in the middle of running scripts, for example, but most problems with game logic working properly were not due to scripts executing incorrectly, they were due to problems with messaging - sending messages in the wrong order, sending messages that weren't supposed to be sent, not sending messages that were supposed to be sent, and so on.

Debugging message problems, on the other hand, didn't benefit a lot from having a step-through debugger; it mostly depended on looking at the disassembled scene to figure out how it was supposed to work, comparing that with the message log to figure out what it was actually doing instead, and then setting up test scenarios in mTropolis to see what messages actually get sent - and in what order - in a similar situation.

One example of how messaging went wrong is having to figure out the exact point where queued messages were discharged during a scene transition, something that was causing the wrong music to play in the Statue lower level.

The project viewer and inspector, however, were incredibly helpful:

These let you see all kinds of information in real-time: What is loaded, what the values of variables are set to, what the GUIDs of objects in the scene are (which allows them to be cross-referenced with MTDisasm output), and any other information that's been exposed for that object.

There are also some toast notifications that pop up at the bottom:

Warnings are colored yellow and errors are colored red.  You won't see many warnings these days, but the main source of warnings is that every modifier and element in the ScummVM mTropolis engine has a "support level," which is either unimplemented, partially-finished, or finished.

Entering a scene with a partially-finished or unimplemented element or modifier results in a warning notification.  This is to make it clear that a scene with things in an unfinished state has been entered, and if something isn't working correctly, one of those unfinished things is likely to be the culprit!

(You may be wondering why there's a warning about text labels in a scene with no text labels.  That's because Obsidian has its own debug overlay text label, but in the retail version, it's moved off-screen.)

Error popups mostly occur due to Miniscript errors.  If you play Obsidian, you'll notice several of these.  That's not actually a problem with ScummVM - the game has some scripts that do invalid things, and in all of the cases that I'm aware of, I've confirmed that the proper behavior is in fact to throw an error and stop the script.

Making this also ran into a bit of a problem with ScummVM's internal architecture: it has its own GUI system, which you can see in the in-game settings and launcher.  It's not a great GUI architecture, but it's not really bad for what it does either; it does have one problem, though: It assumes exclusive control until you leave it, which means it is completely unusable while the game is running.  Dealing with that required rolling a new small UI kit to make the windows, scroll bars, hierarchy tree, etc. that you see in the debug overlay.

Having that UI kit also required everything else to be aware of it.  For example, if the game is supposed to change your mouse cursor, it doesn't actually do that directly; it changes the cursor assignment of the main game "window," so that the regular mouse cursor - not the game cursor - is visible when it's over one of the debug overlays.  Same goes for detecting mouse movement if the mouse isn't in the game window.

Of course, the big irony of all of this is that these debugging capabilities are considerably more advanced and informative than what mTropolis gives you, so I actually had way more information available to me than the developers of the game did!

I know your deepest, darkest secrets

One fun thing about being able to see everything in the game data is being able to see things that were either obscure or in one case, really not meant to be found at all, and find all kinds of new information about the game that was either not well-known or not known at all.

Realm names

The internal name of the bureau realm is "Labyrinth" and the name of the spider realm is "Abraxas."

Space bar skip

It turns out pressing the space bar skips most cinematics, except for important story cinematics, letting you get through the game much faster.  Huh.  Doing this causes some bugs sometimes though, like preventing the music from stopping when you beat the Bureau chapter.

I heard you like FMV games so I put an FMV game in your FMV game

You may have noticed that there is a help booth in the first part of the Bureau named "Sources" that shows a falling book, referencing the Myst intro, and if you click the screen, you get a crazed man telling you to bring him "the blue pages," another Myst reference.  What you may not know is that all of the booths are reachable from the phone puzzle, and if you call the Sources booth, the phone is answered by Henry Stauf from The 7th Guest.

Speaking of the phone puzzle, I've looked at the code for it, and I still don't understand it.  Each of the dials actually represents a positive or negative coordinate, and a coordinate is only valid if all three are either negative or positive (determined by which half of the slider it's on), but that means there are actually 2 valid coordinates for any point you want to reach.

Fireflies in the Bismuth junkyard

There are supposed to be fireflies in the Bismuth junkyard, but you probably won't see them on a normal playthrough because when you first land in it, you're actually in a duplicate of it that belongs to the previous chapter, and the duplicate doesn't have the fireflies.  Normally, leaving that landing scene triggers a disk change, which puts you in the actual section.  Try going back down and looking around!

The dirtiest dirty secret of them all

Not only was this one completely unknown previously, it was actually discovered by accident due to the unfinished modifier toast popup mentioned earlier.  Viewing the zoetrope popped up a warning about an unimplemented "Path Motion Modifier," which was a bit strange because the wobbling frame was working just fine and nothing seemed to be unusual, let alone having problems because of a missing modifier that basically existed to move sprites around.

Looking at the scene in the debugger showed some interesting things.

The path motion modifier responds to a message named B206_Start_Drop.  Drop what though?  Further down there is a "Click Behavior" but it's actually not a click behavior, it has 5 key detection triggers and a counter that fires when all 5 are reached.

After looking through the logic, it turns out that if you type "Voxel" after beating the puzzle, then a head appears and the animated bird drops a turd on it.

To make matters worse, the path motion modifier is not used for anything else in the game except for this, which meant I had to implement its behavior just for this bird poop Easter egg that nobody even knew about.

You thought nobody would ever find out...

Spaghetti no-code

No game can survive without reusable systems, though, and making systems out of objects in a scene hierarchy is particularly weird.

Much of Obsidian's navigation system is handled by a list of "nodes," and you're usually on one of those nodes.  The node list is usually supplied by a compound variable that has to be named "cSR" in the subsection.

This is a fairly common recurring pattern: Behaviors are placed in the hierarchy alongside variables that the behaviors have to consume.

A pretty gnarly piece of "what were they thinking?" here though is how those values get in the list in the first place.  The editor lets you set the initial value of variables, so I bet you're thinking that you can edit lists in the editor, right?  Well, you would be wrong: the actual way is that you have a script set the values that you want, run your project, and then when you go back into the editor, the values that you set while running it have persisted into the variable.  Intentionally leaking play state back into editor-persistent state as intended design?  Whee!

Some other mTropolis users would later come up with their own patterns, like Muppet Treasure Island populating lists by broadcasting a message to part of the scene hierarchy and having all of the responders ping back.  Why this was better than just giving the scripting language a "for" loop, I have no idea.

You're probably wondering then, if coding was so cumbersome in this, how did the library terminals work?  Well, for one thing, the terminals are done via a combination of several different parts that handle specific functions, from text output to actually updating things.  It's basically a finite state machine, so most of the logic behavior simply handles all possible states.


The game also includes a custom plug-in called "RSGKit" that includes a few modifiers used to do more complex tasks, like string manipulation (which Miniscript has no built-in support for) and creating WordMixer answers.

Speaking of WordMixer, the dictionary data is also completely stored in the RSGKit plug-in.  The ScummVM mTropolis engine has its own internal plug-in system to implement these modifiers, including parsing out the dictionary data from hard-coded offsets in the DLLs, and the dictionaries are used for both WordMixer and the filing cabinets.  Naturally, getting the word list exactly correct (including short words) is mandatory, since some important files are referenced by number in the scripts.

Trouble is on the menu

Sometimes, the way the game and its logic system work creates problems that clash with how ScummVM is supposed to work in ways that are really hard to do anything about.  A big one is the menus.

If you go to the game options in ScummVM and change the sound and music volume, you might notice that it... doesn't really work. 


That's because the game logic manages the sound levels, and it kind of has to because there is no global volume, every sound emitter has its own volume and the game logic has to update them itself.

You might be thinking "well so what, just scale the volume of sounds with the SFX level and the MIDI music with the music level!"  That's easier said than done.  The volume levels are stored in the save files (by the game logic), and syncing them up with ScummVM's settings requires manually hooking into the game logic.  Additionally, the sound volume system used in Obsidian was designed to evaluate volume levels when entering a scene - not while you're already in the scene!  But one quirk you may not be expecting is that MIDI was used for some non-music sounds, like the keycard sound in the maze, and (surprise!) the vidbot screen on/off sound.  (Remember that last one for the next installment!)

Similar problems when trying to load from the in-game menu:

Simply put, the game expects loads to happen at a certain time (i.e. only from the menu), and to follow through on them a certain way.  Saving a game in mTropolis is not done by the engine tracking everything that needs to be saved; it's done by the game logic, which has its own ideas about what needs to be saved and loaded.

Fortunately, I was able to implement saving from the menus for Obsidian specifically, since whether or not you can press the Esc button to go to the menu is controlled by a single boolean variable, and you can save from anywhere that the menu is available.

Even supporting a menu is more complicated than it may seem.  The game has to be able to support transitioning out of and then back into just about any scene, which is a problem if there is game state from puzzles persisting through scenes.  That's probably why the piazza puzzle has a blocker that prevents you from saving anywhere in the main puzzle area, and there is at least one bug where reloading the game in a particular scene skips the puzzle (which is not a ScummVM bug - it was present in the retail version).

Room to grow

I think this has covered most of the process involved in getting the ScummVM mTropolis engine into a working state.  The next and final installment of this series will cover how it went beyond the original, with better MIDI support, widescreen mode, subtitles, improved color depth, and a tiny little quirk to get the save screenshots right!  It'll also have some final thoughts about wrapping this project up, and the future of the mTropolis engine in ScummVM.

by OneEightHundred (noreply@blogger.com) at 2024-03-06 01:48

2024-02-27

This is part 1 of a 4 part series, mirrored from the Gale Force Games page.

Part 2 is here.

Part 3 is here.

Part 4 is here

I'm putting this together to tell the weird story and technical details of how Obsidian was brought back to life and added to ScummVM.  I didn't originally want to do this because I think dev blogs are more interesting when they show the state of the art and help people understand how to solve similar problems.  If there are two things that stand out about porting Obsidian, it's how unusual of a project it was, and how much luck was involved in it being as easy as it turned out to be - and possible at all.

Maybe that weirdness will be an interesting story on its own?  Who knows.

A little bit of history (and context)

In 1993, Cyan released Myst, a 3D-rendered surreal point-and-click adventure game heavy on puzzles and atmosphere, and it was a massive success.  Numerous other games attempted to follow in its footsteps, and most were... not very good.  In 1997, Rocket Science Games released Obsidian, an ambitious, high-budget game with a similar gameplay concept and fantastic visuals, set in a series of surreal dream worlds full of creative and similarly-surreal puzzles.  Unfortunately, for various reasons, the game didn't sell well, and RSG was forced to close its doors.

With Obsidian relying on QuickTime, the seeds of its demise as a playable piece of software were sown.  Apple built an entirely new operating system, leaving the Mac version unplayable when the emulation environment for their old operating system was discontinued.  They also discontinued QuickTime for Windows, already a notoriously finicky piece of software, and future updates to Windows made it quite difficult to run.

Fast-forward 25 years later to August 2022, and support for Obsidian was added to ScummVM.  Barely over a year later, the rights to the game were acquired by Jordan Freeman Group and it was officially put back into print on ZOOM Platform and Steam.

A quick history of the state of the art of 1990's multimedia

In 1984, Apple released the Macintosh computer, which laid the foundation of the user interface used by desktop computers to this day, with later models incorporating high-resolution displays and high-fidelity waveform sound output.  It was this design, combined with the release of QuickTime, Apple's video authoring and playback system, that kicked off an era of "multimedia" experiences.

With the release of Windows 3.1 in 1992 and especially Windows 95 in 1995, Windows had largely caught up to Macs in multimedia capability, but much of the multimedia authoring ecosystem was still on Mac, with popular tools such as HyperCard, Authorware, SuperCard, and Director.  HyperCard and SuperCard never got Windows support, and Director didn't get it until its fourth major version.

mTropolis entered the market in 1995 with a Mac-only authoring environment and runtime support on Mac and Windows.  Unlike Director, a project exported from mTropolis couldn't be packaged into a stand-alone application, it had to be bundled with a player program.  This also came with a major caveat (which we will come back to, because it's really important!): Projects exported for Mac and Windows were in different formats.

The life of a program in a multimedia operating system

Before we get into the format differences, I need to explain the nature of the operating systems in this environment.  Another way that Macs were a major departure from DOS, and much closer to how things work today, was how application development for it worked.  Applications were developed against a large suite of functions provided by the operating system, unlike DOS where your application was mostly shoving information directly into the hardware.

One of Apple's big innovations at the time was that a huge amount of multimedia capability was standard and provided by the operating system.  An API called QuickDraw handled drawing, Sound Manager handled sound output, and later QuickTime handled video and audio decompression and playback.  It also had its own image format (PICT) that interfaced with so many of its capabilities that it's difficult to open on Windows to this day.

With Windows, Microsoft sought to have similar facilities.  In particular, an API called GDI existed to provide much of the functionality that QuickDraw provided on Mac.  Naturally, both APIs had different memory structures and formats that they used to provide their data to their drawing systems.

The life of a big number that lives on a circuit

Another concept that will quickly become important here is byte ordering.  All numbers on computers are stored as a sequence of binary data, usually chopped up into 8-bit numbers or bytes.  Most numbers are either 8-bit, 16-bit, 32-bit, or 64-bit in size.  I'll be discussing most numbers in hexadecimal here.

When a number larger than a byte is stored, there is an important question to ask: What order do you store the bytes in?  Say you have the decimal number 123456789, which is 75bcd15 in hexadecimal.  There are two ways of storing this: Least-significant byte first ("little endian"), which produces the byte sequence 15 cd 5b 07, or most-significant byte first ("big endian"), which produces the sequence 07 5b cd 15.  Why would you use one versus the other?  That's a question for the electrical engineers, but Macs were originally developed for the Motorola 68000 CPU series, which used MSB-first ordering when loading large values from memory, while DOS and Windows were developed for Intel 8086 derivatives which used LSB-first ordering.
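If you want to see this on your own machine, a few lines of C++ will do it (on the little-endian x86 machines discussed here this prints 15 cd 5b 07; a big-endian 68k Mac would store 07 5b cd 15):

```cpp
#include <cstdio>
#include <cstdint>
#include <cstring>

int main() {
    uint32_t value = 123456789;                  // 0x075BCD15
    unsigned char bytes[4];
    std::memcpy(bytes, &value, sizeof(value));   // copy out in the host's byte order
    std::printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}
```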

How does one make a cross-platform multimedia framework anyway?

Combining all of these things, we start to see one of the timeless dilemmas of portability: A lot of multimedia stuff is being provided by the operating system, but the operating system is determining the format that the data has to be in to use it.  How do you deal with this?  There are two strategies: Use a common format that you can convert into the format that the operating system expects at runtime, or export to the format that the target operating system expects.

There is a third option, which is to load the OS-native format of one operating system on the other operating system.  This turns out to be a pretty bad option for Mac-based authoring for two reasons: First, many of the memory and data formats that have first-class citizenship on MacOS were thinly-documented, and second, Apple liked to have complicated (but capable) file formats, often integrating with other pieces of their multimedia infrastructure, making supporting them robustly on other platforms difficult.  (Windows, by comparison, tended to use fairly simple formats.)

One thing that helped with this process though (but ultimately consigned Obsidian to bit-rot) was that Apple had ported QuickTime to Windows.  While Microsoft had come up with their own competing technology, Video for Windows (commonly known for playing AVI files), repackaging QuickTime files into AVIs was generally not a simple process due to QuickTime's more advanced features not being supported in VFW.  Ultimately, mTropolis could play back AVI files on the Windows version of a project, but it had to be in AVI format ahead of time.

This became a major problem further down the line though, as Apple discontinued support for QuickTime on Windows, and getting QuickTime to work became more and more difficult with later versions.

Coming soon to... nowhere

mTropolis was positioned as an alternative to Macromedia Director, but ultimately failed in the market.  mFactory was bought out by Quark, who released mTropolis 2.0 only as an update to existing mTropolis licensees and then closed the company, favoring their own product, Quark Immedia (which also failed).  What's not so clear about this is how mTropolis was actually sold to licensees though.

Director was pretty pricey, but could be bought through a whole bunch of authorized resellers.  mTropolis was considerably more rare, and VERY expensive.  The first version was $4,500 per copy, equivalent to $9,200 today, and I've yet to find any evidence of it being sold as a mass-market product.  It may have only been sold directly to licensees, and numerous credit lines and mentions suggest that mFactory was more hands-on with their licensees.  Was it being offered under a more "premium" business-to-business model where they provided lots of support?  Was the plan to go boxed-product after they'd worked out the kinks with early licensees?  Who knows.  Either way, used copies of it are extremely scarce.

While I was eventually able to obtain a used copy of the full version, the easiest way around this problem is actually the demo version, which was bundled with some reference material (which is also rare) and included on a few issues of the UK magazine MacFormat, specifically the February 1997 (MF47) and May 1998 (MF63) issues.  While the demo versions don't allow you to save your projects, they do allow you to export them, and since we're only interested in the exported format, that's perfectly fine!  Now I just need to dust off an old piece of fruit company hardware to run it.

Sorry about those aftermarket prices

Asking a toddler to solve a math problem

After finally obtaining a copy, I wasn't really expecting to contribute more than gathering as much information as I could and hoping that someone else who knew what they were doing would pick up development of it.  Most engines in ScummVM are developed by reverse-engineering the game executable with IDA or Ghidra, tools which can help turn compiled code run by the CPU into a more readable format.  This requires a considerable amount of work, since the structure of more complex types and the usage of values have to all be determined by hand, and sometimes two different values wind up being treated as the same value because they were mapped to the same hardware register, and all other kinds of problems.

Doing that is often crucial though, because often you really need to know what the program is doing to handle certain things.   File formats and behaviors can be cryptic, logic can be hard-coded, and being unable to look at the actual code doing things is a big problem.

I had no idea how to do any of that (although I've picked up on it somewhat since!), but I did know how to poke around data formats, and I had just come off of porting Glider PRO, which involved doing a lot of research on Mac-native formats documented in the Inside Macintosh reference book series, and getting a copy of mTropolis meant I had room to play with it.  At this point, actually making it work again was not even remotely the plan.

What $4,500 (or a magazine subscription) buys you

Let's see what this thing actually looks like!  You may have all kinds of ideas in your head of what a good way to write a multimedia authoring suite would look like, and then you have mTropolis, which is a bit... sparse?


mTropolis authoring interface
So, let's talk about the design paradigm of this thing.  The developers of it were really embracing the idea of a "no code" paradigm.  A project is organized into a tree, and the tree consists of several descending layers: The project, sections, subsections, and scenes.  Below scenes, the only supported element types are movies, images, graphic elements (which can be invisible boxes, or used to hold shapes), text, and sounds.

All elements can have modifiers attached, which is where the real fun is.  Variables are modifiers, scripts are modifiers, logical things are modifiers, movement is done via modifiers, basically everything interesting is done by attaching a modifier to an object.

We'll get to how all of this works in a later installment of this series.  First, we need to talk about how to analyze this thing.  If you go to the "Build" menu and select "Build Title..." then you get this nice dialog.

Build dialog


"mFactory Animation?"  That could be a serious problem.  We'll talk about why later.  For now, let's click Build to export our project!

Prompt for path for startup segment export


Looks like we forgot something!  We need to set the directory to export to.  mTropolis was designed from the ground up for multi-CD titles, and the way it does this is by letting you create segment labels, each of which corresponds to a single exported file, and then assign each section in the project to a segment label.  We can set these up further in the project config:

Segment mapping editor

But anyway, let's set our startup segment to the "Move to PC" directory here and build it with a Macintosh target, and then build it again with a Windows target.

Exported projects

Success!  Let's move the files to my machine running the best operating system for open source development and everything else (Windows) and pop it open in XVI32.

Exported project in XVI32
 

There's not a lot that we can tell from the outset, but resizing the window horizontally causes things to line up.  That seems to suggest that there's a table with a bunch of fixed-size elements in it.  Of what purpose?  We'll find out soon, but I'm also going to highlight another pattern that occurs quite often here.

Aligned header table, length-prefixed strings

 We can now tell that this size table takes up 34 bytes per row, but it's also not clear where it starts.  But also, look at the boxes highlighted purple and green.  Both of them contain strings with a length prefix.  While C represents the length of strings by terminating them with zero bytes, MacOS was written mainly in assembly and Pascal, and Pascal uses length-prefixed strings.  Even on MacOS, there was an extension to insert length prefixes into strings in C because the MacOS APIs preferred that so much, and that mentality carried over to a lot of Mac software, like this.

However, that also brings us to another problem: How many bytes long is that length prefix?  That's actually a recurring problem throughout this kind of thing: Knowing how big a value is, and where it starts and ends, is really important, but most of the time, values are going to be too small to take up their entire capacity, so it's not clear if the zeroes in the other half belong to that value, or represent something else.
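To make the ambiguity concrete, here's a tiny C++ sketch: the same reader has to be told how wide the prefix is, and the same bytes decode to different things depending on that guess (the example bytes are made up, not taken from an actual export):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// bigEndian matters once the prefix is wider than one byte (Mac exports).
std::string readPrefixedString(const std::vector<uint8_t>& data, size_t pos,
                               int prefixBytes, bool bigEndian) {
    uint32_t length = 0;
    for (int i = 0; i < prefixBytes; ++i) {
        uint32_t b = data[pos + i];
        length |= bigEndian ? b << (8 * (prefixBytes - 1 - i)) : b << (8 * i);
    }
    return std::string(data.begin() + pos + prefixBytes,
                       data.begin() + pos + prefixBytes + length);
}

int main() {
    // 00 05 48 65 6c 6c 6f: a 2-byte big-endian prefix of 5 followed by "Hello",
    // or a zero-length string followed by something else entirely?
    std::vector<uint8_t> data = {0, 5, 'H', 'e', 'l', 'l', 'o'};
    std::printf("as 2-byte prefix: \"%s\"\n", readPrefixedString(data, 0, 2, true).c_str());
    std::printf("as 1-byte prefix: \"%s\"\n", readPrefixedString(data, 0, 1, true).c_str());
    return 0;
}
```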

The lucky break that made everything else possible

Let's open the Windows version, "Untitled-11.MPL", and compare it side-by-side to the Mac version.

Position-aligned byte swaps

I've highlighted a few blocks here, but really this type of thing is all over the format.  Remember what I said earlier about Mac and Windows machines using different byte order?  If you compare the files, data tends to appear in the same positions, but in reverse byte order.  This is immensely helpful, because it means the overwhelming majority of data start positions and sizes can now be determined by just exporting the data into both formats and looking for the byte order flips.
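A minimal sketch of that trick in C++: walk two exports of the same project in parallel and flag offsets where one file holds the byte-reversed form of the other's 32-bit group (the buffers below are stand-ins; in practice they'd be read from the two exported segment files):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    std::vector<uint8_t> mac = {0x00, 0x00, 0x00, 0x3D, 0x12, 0x34, 0x56, 0x78};
    std::vector<uint8_t> win = {0x3D, 0x00, 0x00, 0x00, 0x78, 0x56, 0x34, 0x12};

    size_t n = std::min(mac.size(), win.size());
    for (size_t i = 0; i + 4 <= n; i += 4) {
        bool flipped = true, identical = true;
        for (int j = 0; j < 4; ++j) {
            if (mac[i + j] != win[i + 3 - j]) flipped = false;
            if (mac[i + j] != win[i + j])     identical = false;
        }
        if (flipped && !identical)
            std::printf("offset 0x%zx: looks like a byte-swapped 32-bit value\n", i);
    }
    return 0;
}
```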

There did turn out to be a few caveats to this though.  The developers of this decided to try sticking to platform-native formats whenever they could, so a few things have data that is in a completely different format.  One of the things that is a source of much deception is the rectangle and point structures.  On MacOS, a point is typically two 16-bit values, with the first representing the vertical position and the second representing the horizontal position.  On Windows, it is the reverse.  The problem with this is that the member swapping combined with the byte swapping results in a complete 32-bit swap, so it is often difficult to tell if a 4-byte flip is a 32-bit number or a point.

This is still a bunch of salad though, and the earliest stages of doing something like this are the most painful because, while there are numerous concepts used repeatedly throughout this, they haven't been discovered yet.  Even today, I haven't actually figured out what all of the values in the file header mean.

We can start with one basic one though: Most of the time, when a file format has a catalog, it has size and position data.  In this case, some of the values in the catalog are a bit deceptive: The "sceneStr" text is actually just junk that was left in memory when the catalog data was being written; only the part with the stream type up until the first 0-byte terminator is relevant, and the stream type field is a fixed-size 24-byte character buffer.

We can look at some of the values and see what is likely to be catalog positioning.  An easy guess is that if two values sum up into a third value, then the value that gets summed into is probably the file position and the other one is the size.

Catalog position and size data, and the first recorded position

Once we've identified this though, we still need to figure out if that data belongs to the scene type string before or after it.  In this case, that can be guessed pretty easily by seeing if the data after the last string makes sense as a catalog entry or not.  In this case, it does, so the data probably belongs to the string preceding it.

This can be poked with a little further by adding more assets and scenes to the project, which causes the value at 0x30 to correspondingly increase.  Great, we know where the substreams are located!

At this point, I started making a program called MTDisasm to do further analysis, in particular dumping the substreams of a project to separate files.  I'll spare most of the details, but the catalog is a series of streams marked by segment number, and the file positions are relative to the start of the segment file.  Segment files have no header!

Marco Polo

Thankfully, this byte-swapping thing is unbelievably helpful, but our adventure is just beginning.  We still don't know what all of that data is before the "bootStream" or what the contents of any of them mean.  This is where we get into "black box analysis," where we start doing things to our project, exporting it, comparing exports, and using those comparisons to understand how what we do in the editor is reflected in the data.

In theory, if it's possible to completely determine how everything you can do in the editor maps to the file format, then you can basically determine almost all of what the creators had in their project before they exported it.

There are still many issues in front of us though.  Knowing what the data says doesn't always tell us how things are supposed to behave, sometimes the exported data is missing important things, and there's still all the work of making it all actually happen.

About those animations...

I'll cover the details of this in a later installment, but when I said this required a lot of luck, the byte-swapping stuff was a massive windfall - and there is one other type of thing that could have easily derailed this, at least to the point of really needing to start reverse-engineering the actual code: Proprietary compression formats!

For the most part, if data is fixed-size, it is relatively easy to load, but proprietary compression formats can be a nightmare.  If you look at the deflate compression algorithm used in the ZIP format, for instance, it has LZ77 commands and literals compressed by Huffman trees, and the Huffman trees themselves are compressed, and there's very weak correlation between what the bits in the data are and what values they actually represent.

Even simple compression formats like Cinepak have multiple layers of information processing before you can get the data that you want, and being lossy formats, there isn't a reliable way to put data into it where you get a predictable result out.

mTropolis uses a proprietary compression format for its animations called "mFactory Animation" and fortunately, that format turned out to be relatively simple.  This is actually somewhat expected: mTropolis was supposed to run on low-end Macs, at a time when high-end Macs could only decode an MP3 file in real-time if you weren't doing anything else, so compression formats were often relatively simple.

This type of thing was a pretty close call though.  If it weren't for that, this might not have happened.

I'll get to how that was reverse-engineered in a later installment of this series.

Coming up next...

Plot twists, surprises, an anamorphic filter created solely for the intro video of a dead company, and of course, jank!  But maybe less jank than you expect.  In future installments I'll talk about:

  • How mTropolis stores things
  • How mTropolis handles game logic
  • Dealing with platform-specific format differences
  • Decompiling Miniscript
  • Going from decompiling projects to actually running them
  • Figuring out how things are supposed to work
  • Debugging
  • Game systems built out of spaghetti
  • MIDI jank
  • Auto-save, widescreen support, and subtitles
  • Tales from other titles in ScummVM's mTropolis engine


by OneEightHundred (noreply@blogger.com) at 2024-02-27 20:11

2024-02-24

This is part 2 of a 4-part series, mirrored from the Gale Force Games page.

Part 1 is here.

Part 3 is here.

Part 4 is here.

Everyone needs some structure

In the last part, I went over the basics of analyzing values in an exported project, but this doesn't explain how to get actual data out of the scene.  To skip ahead a bit, one problem is that objects in mTropolis aren't the same size.  Without any knowledge of the structure, we aren't really even sure how to determine where objects start and end, and that can be a bit difficult.

A simple way of dealing with variable-sized objects is to just make multiple identical ones and see how far apart they wind up in the data.  It also isn't entirely clear yet how the stream catalog even associates with scenes in the project tree.

Let's export it and use MTDisasm's "bin" mode to dump the streams, then open up stream-3-1.scene for the Mac and Win versions in XVI32 again.

A bunch of useful things to note here.  We can see that the element ends with a name and a null byte, and by noting the spacing, we can guess where the first one starts, and determine that they're 61 bytes in size.  61 is 3D in hexadecimal, and we can see the size marked in the data as well.  The scene stream also seems to start with a scene definition of some sort too, maybe?

Also of interest, there is a 32-bit value that is 8 only in the last one - This turns out to be a flags field, and the 8 is a bit flag indicating that the element is the last one in the list of elements for that level of the tree.

(Fun fact: One of the flag bits is for if the object is expanded in the structure window, which is useless outside of the editor.)

The start of the element also provides some useful clues, which led to figuring out the format after investigating different types of objects: Most (but not all!) object types start with a 32-bit object type ID, a 16-bit revision number, a 32-bit field of unknown usage, and a 32-bit size that includes the preceding values.

Unfortunately, while the first 2 values are common across all objects, the next 2 are not always present.  That turned out to be a problem when a fallback handler for unknown object types was sometimes reading past the end of the data because the size field wasn't actually there.  Plug-ins (which we'll get to later) are particularly weird, with the size value being garbled, but still decodable, only on Windows.
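
To make that concrete, here's a rough C# sketch (not MTDisasm's actual code) of reading that common header, assuming little-endian Windows data; the field names are my own, and as noted, the last two fields can't be trusted to exist for every object type:

using System.IO;

// Hypothetical layout of the common object header described above.
// Little-endian (Windows-style) byte order is assumed.
struct ObjectHeader
{
    public uint TypeId;      // 32-bit object type ID
    public ushort Revision;  // 16-bit revision number
    public uint Unknown;     // 32-bit field of unknown usage (not always present)
    public uint Size;        // 32-bit size including the preceding fields (not always present)

    public static ObjectHeader Read(BinaryReader reader)
    {
        return new ObjectHeader
        {
            TypeId = reader.ReadUInt32(),
            Revision = reader.ReadUInt16(),
            Unknown = reader.ReadUInt32(),
            Size = reader.ReadUInt32(),
        };
    }
}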

That's no scene

There is actually an unusual quirk here: That piece at the start of the file is not the scene definition.  One way that you can tell that it isn't is that if you add modifiers to the scene, they don't show up in the scene stream, they show up in... the boot stream?

Well, now we get to one of the weirder aspects of mTropolis's architecture, a quirk that actually causes an annoying bug in the Windows version of Obsidian.  The object at the start is some kind of stream header, of unknown purpose.  The definition of the actual scene object is not stored in the scene stream, it's stored in the boot stream and it always exists, but the objects inside of the scene are loaded on-demand.

Not only that, but scenes aren't really even scenes.  Projects have their own type, as do sections and subsections, but scenes are just graphic elements... or at least they are most of the time.  If you link them to an image or QuickTime movie, they change type into that kind of element, so scenes are actually the same type of object as a scene element.

So what is the bug in Obsidian that this causes? It's the bug where if you trigger the rebellion music in the first chapter, and then go the wrong way, the music keeps playing forever, even if you restart the game, because the MIDI modifier that plays the music is directly under the scene in the project hierarchy, preventing it from ever being unloaded.  The Mac version fixes this issue by... deleting that music track.

An informative experience

The strategy for MTDisasm for the most part has been to create structures for all of the data types, byte arrays by default if all of the values seem to be the same, and larger values if they have visible byte swaps.  Fields with unknown use are named "Unknown" with a number after it, and for convenience, if an unknown field turns out to be multiple values, it is not renumbered, but rather split so other values maintain their number.  For example, if a field named unknown3 turns out to be 2 values, it is split into unknown3_1 and unknown3_2.

Here's what our test scene looks like cranked through MTDisasm's text mode today:

An inconsistent experience

Let's take one of the more minor examples of how this can go wrong by creating a floating point variable modifier and setting its initial value to pi.


So far so good, export the files, dump the streams with MTDisasm bin mode and...

Uh oh, something is very wrong here!  The Windows version has a value in it that XVI32 is telling us looks like a double-precision float encoding pi at the end, but the Mac version is 2 bytes longer and looks pretty different.  What is going on?

Let's think like a Mac programmer here: Motorola 68000 CPUs didn't originally have floating point support, and when they did, it was added via the MC68881 FPU add-on, later incorporated into the 68040 CPU.  That FPU preferred float data in an 80-bit extended format: a sign bit, a 15-bit exponent, and a 64-bit mantissa whose top (integer) bit is stored explicitly and must be 1 for normalized finite numbers.

That format has a few advantages, such as being relatively efficient to work with on CPUs that only support integer arithmetic, but the specifics of floating point behavior are well-documented in other places, and the important thing here is that the Mac and Windows versions are using completely different number formats for floats.
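
For illustration, here's a rough sketch (in C#, and not the ScummVM code) of converting that Mac representation into a double, assuming the 10 bytes are stored big-endian in the layout described above:

// Sketch of an 80-bit extended float to double conversion.  Denormals
// just underflow to zero here, which is fine for this illustration.
static double ExtendedToDouble(byte[] bytes)
{
    ushort signExp = (ushort)((bytes[0] << 8) | bytes[1]);
    bool negative = (signExp & 0x8000) != 0;
    int exponent = signExp & 0x7FFF;

    ulong mantissa = 0;
    for (int i = 2; i < 10; i++)
        mantissa = (mantissa << 8) | bytes[i];

    if (exponent == 0 && mantissa == 0)
        return negative ? -0.0 : 0.0;

    if (exponent == 0x7FFF)
        return (mantissa << 1) == 0
            ? (negative ? double.NegativeInfinity : double.PositiveInfinity)
            : double.NaN;

    // The top bit of the mantissa is the integer part, so the value is
    // mantissa * 2^(exponent - 16383 - 63).
    double value = mantissa * Math.Pow(2.0, exponent - 16383 - 63);
    return negative ? -value : value;
}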

This kind of difference shows up in some other places too: Color values on Mac are 6 bytes per color (due to Macs having 16-bit precision per channel), but a 4-byte BGRA format on Windows.  The auxiliary data in QuickTime elements is particularly different, with 14 extra bytes in the header on Mac and 12 extra bytes in the footer on Windows.

Interactivity: A machine built out of mail

Now that we've covered much of how the data is structured, it's time to talk about what is either the best or worst feature of mTropolis: Object behavior and communication.

Almost everything that makes objects interesting in mTropolis is accomplished via modifiers and messages.  Modifiers are a special type of object that can be attached to scene elements (and, in a few specific cases, other modifiers), making them do things.  Importantly, many modifiers do not really modify the object, but provide other functionality, and simply exist as modifiers.  Variables, for instance, are modifiers.  Saving and loading games is a modifier.  Scripts are modifiers.

Also central to this is one of mTropolis's most important features: Behaviors, which are containers for other modifiers.  Behaviors can be switchable, allowing them to be turned on and off, potentially turning off all of the modifiers inside of the behavior, and they can also be aliased, allowing you to save a behavior into the alias palette and reuse it across multiple objects.

Inter-object and inter-modifier communication is done via messages.  Most modifiers trigger when they respond to a message, and the message is either a built-in message type (such as a "Mouse Over" message firing when the mouse moves over an element) or a custom message type created by the author.  Messages can usually be sent with a value as well, potentially affecting how the recipient responds.

Recreating mTropolis's behavior requires extensively testing what messages fire when, and in what order.  Sometimes the order is complicated.  For instance, message senders usually have an "immediate" flag that determines whether the message sends immediately (before the message currently being propagated continues propagating), or is added to a message queue.  Exactly when that message queue is processed was behind a bug where the Statue area was playing the wrong music: it turns out that if a non-immediate message is sent, and then something causes a scene transition, the queued-up message has to be delivered after the scene transition.

Weird code in the land of no code

It turns out that having a spaghetti of message senders and recipients isn't really enough to express your game logic efficiently all of the time, so mTropolis does supply a scripting system in the form of the Miniscript modifier.

Miniscript is infamous for being hobbled by intentionally not having any loop constructs, to encourage developers to use things other than code instead.  If you want a loop, you have to queue up a message of some kind to re-trigger the modifier, and possibly chain things together.  That's really only the beginning of its problems though: it's full of weird cases where the semantics aren't really clear at all.  For example, as mentioned earlier, variables are modifiers, but if you use a variable from a script, it may treat it as referencing the value of the variable or referencing the variable modifier itself, depending on what you're doing.  What triggers the implicit conversion of a variable to the value it contains?  Who knows, you just have to endure the bugs and find out.

One case involves a bunch of things being miscompiled in Obsidian due to a design fault that is invisible, and impossible to fix without trashing the project and starting over.  For example, let's look at the Miniscript modifier with GUID 0034164d, named "Init on PE".  Here's what MTDisasm currently tells us that it says when decompiled:

if element.width <> 640 then
    set element.width to 640
end if
if element.height <> 360 then
    set element.height to 360
end if
if element.visible <> true then
    set element.visible to true
end if
if element.position <> (0, 0) then
    set element.position to (0, 0)
end if
if element.direct <> true then
    set element.direct to true
end if
if element.paused <> true then
    set element.paused to true
end if
if label:(4:3ae) <> true then
    set element.loop to true
end if

... What?

What is going on with this last one?  What is a label?

Well, let's say you write this code in Miniscript:

if loop <> true then
    set element.loop to false
end if


What have you actually written here?  When you save a Miniscript modifier, if there is an identifier like "loop" in this case, then it first tries to resolve it to a global value.  If that fails, then it has to look it up, which internally is done using a subscript (i.e. by looking up a value "contained in" something else).  At first I thought the thing it was subscripting was something that meant, basically, "go look for it."  Nope.  The default thing to subscript is the current element, but if you subscript an element by name, it will still look further up in the scene hierarchy for that object.

The exact search order had to be sussed out via repeated tests, because you really don't want to be looking up the wrong thing, do you?

MTDisasm will add some additional annotations to note how a value is being resolved.  But what is a label?  Well, in this case, mTropolis supports a feature where you can embed time markers in AIFF sound files, and then allow those time markers to trigger a media cue modifier, among other things, but those time markers are named, and those names go into the global label table.  In this case, someone imported a sound file with a time marker named "loop."

There is no way to view the global label table, there is no way to delete a label after it's been imported, and the labels permanently become names that identifiers in scripts resolve to.

So, some poor sap is now comparing the value of a sound marker to "true" instead of the element's "loop" property like he would have been if that sound file was never imported.

Another thing that must be handled properly when running Miniscript is errors.  Many errors will terminate a script execution, while some things (like sending a message to an object that doesn't exist) will not, and games have code that doesn't work correctly if that behavior isn't mirrored exactly.

Garbage in, un-garbage out

How are we even getting these scripts back though?  Do the Miniscript modifiers just have the program source in them?  Well, in very early versions of mTropolis, yes.  In every shipped title I've seen, no, the export process removes the Miniscript code and only leaves in a compiled version.

Let's take a look at the Miniscript object that we just mentioned in Obsidian.

After isolating the values, we see what we're looking at.  Quick note: The reason for the "program header" is that there are actually two modifiers that use Miniscript programs: The Miniscript modifier, and the "If Messenger Modifier" which uses a Miniscript program to evaluate a condition.

In this case, there are no references.  A reference is just a name and an object GUID.  There is a field indicating the size of the instruction data, which looks like a blob of... stuff.  It's not aligned, so the instructions aren't fixed-size.  Let's look a bit closer.


If we look carefully, there is a bit of a pattern here.  Each instruction has a size field 4 bytes in, which makes it possible to isolate individual instructions.  Convenient!

The first guess of how this works, which turned out to be correct, is that this is a stack machine, one of the easiest ways to write a scripting system.  The way a stack machine works, for the unfamiliar, is that instructions do some combination of removing ("popping") values from the end of a list and appending ("pushing") new values to the end of the list ("stack").  For example, if you had to evaluate the expression:

((A*B)+(C*D))

Then the stack machine instructions would look something like:

push A
push B
multiply
push C
push D
multiply
add
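
To make that concrete, here's a toy C# version of that exact sequence (nothing to do with Miniscript's real instruction encoding), with the built-in Stack<T> standing in for the value stack:

// Toy stack machine: each "instruction" pops its inputs off the stack
// and pushes its result back on.
var stack = new Stack<double>();

void Push(double value) => stack.Push(value);
void Multiply() { var b = stack.Pop(); var a = stack.Pop(); stack.Push(a * b); }
void Add() { var b = stack.Pop(); var a = stack.Pop(); stack.Push(a + b); }

double A = 2, B = 3, C = 4, D = 5;

Push(A); Push(B); Multiply();   // stack now holds A*B
Push(C); Push(D); Multiply();   // stack now holds A*B, C*D
Add();                          // stack now holds (A*B)+(C*D)

Console.WriteLine(stack.Pop()); // 26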

After a lot of messing around, this is what MTDisasm outputs for the instructions of that first condition block:

0: push_global  default
1: get_child  00000000
2: push_value  double 640
3: cmp_neq  
4: jump  conditional   unknown=62  skip=5
5: push_global,args=1  default
6: get_child,args=1  00000000
7: push_value  double 640
8: set 

Okay, so what are these?  "default" in the disassembly really means "element", but what we are doing here is pushing the value of "element", then getting the attribute with index 0 from the attribute list (which is "width"), then pushing the value 640.0, then doing a comparison (which pops the last 2 values from the stack and pushes true or false onto the stack).  Then it does a conditional jump based on the value, and if it's false, skips 5 instructions ahead.  If that skip didn't happen, it pushes the element (with a write-intention flag in the arguments!), gets the child (also with a write-intention flag), pushes the value 640.0, and then runs a set operation that applies the value.

One irony here is that it would have been extremely easy to add support for loops to this by just allowing negative offsets in the jump instruction; the developers simply chose not to because... reasons?

This is kind of hard to read though, so what can we do? We decompile it!

Stacking boxes

Decompilation for something like this normally involves 2 steps: Converting stack machine instructions back into expression trees, and converting the control flow graph back into structured control flow.

The first one is straightforward, since Miniscript never has branches in the middle of an expression.  Remember that pseudocode for the stack machine I had earlier?  Well, if you just keep track of each input to something that consumes values from the stack, you can just regurgitate them into expression text.  The only thing complicated there is some precedence analysis, which is done to avoid having to put parentheses around everything... It also doesn't quite work right, if I remember, but it works enough that I can tell what's going on.
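
As a rough illustration of that regurgitation step (again, not MTDisasm's code, which builds real expression trees and does precedence analysis), the same walk can be done with strings instead of values:

// The stack holds expression text instead of values; anything that
// consumes operands pops their strings and pushes a combined string.
// This naive version just parenthesizes everything.
var exprStack = new Stack<string>();

void Push(string operand) => exprStack.Push(operand);
void BinaryOp(string op)
{
    var right = exprStack.Pop();
    var left = exprStack.Pop();
    exprStack.Push($"({left} {op} {right})");
}

Push("A"); Push("B"); BinaryOp("*");
Push("C"); Push("D"); BinaryOp("*");
BinaryOp("+");

Console.WriteLine(exprStack.Pop());  // ((A * B) + (C * D))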

Control flow analysis, which involves converting jumps back into if/else/end if blocks, is the harder part.  Let's say we write a script that roughly looks like this:

if A then
    do something A
else
    do something B
end if
do something C

This will compile into stack machine code that looks something like this:

0: push A
1: jump conditional to instruction 4
2: Do something A
3: jump to instruction 5
4: Do something B
5: Do something C

These "Do something ..." blocks can be any number of instructions.  In order to decompile this, we need to convert it into a control flow graph.  Control flow graphs in modern compilers can be quite complicated, especially if they form loops, but since Miniscript doesn't have loops, we're left with what's called a directed acyclic graph, meaning no path through the graph ever flows through the same node more than once.

To do this, we split sets of instructions out into blocks.  A block starts with any instruction that is the target of a jump, and ends with either a jump instruction or where another block starts.

Note that there is an "end of program" block.  This block is always added to the end and contains no instructions, but it's useful for the decompile algorithm.
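
The splitting rule is simple enough to sketch out directly (hypothetical Instr type, not MTDisasm's actual structures):

// Block starts are: instruction 0, every jump target, and every
// instruction that directly follows a jump.  An extra empty
// "end of program" block is represented by instrs.Count.
internal record Instr(bool IsJump, int? JumpTarget);

internal static class BlockSplitter
{
    public static List<int> FindBlockStarts(IReadOnlyList<Instr> instrs)
    {
        var starts = new SortedSet<int> { 0, instrs.Count };

        for (int i = 0; i < instrs.Count; i++)
        {
            if (!instrs[i].IsJump)
                continue;
            if (instrs[i].JumpTarget is int target)
                starts.Add(target);
            starts.Add(i + 1);  // the jump ends its block
        }

        return starts.ToList();
    }
}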

So how do we work with this?

First, for each one of these nodes, we have to determine several pieces of information about it:

  1. Immediate predecessors (which nodes directly flow into the node)
  2. Immediate successors (which nodes it can directly flow into)
  3. Post-dominators (which nodes must be flowed through after the node)
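
Here's a minimal sketch of computing those three sets, leaning on the fact that the blocks are numbered in program order and that Miniscript jumps only ever go forward (the last block being the empty end-of-program one); this isn't MTDisasm's code, just the idea:

// successors[i] lists the blocks that block i can flow into.
static (List<int>[] Preds, HashSet<int>[] PostDoms) Analyze(List<int>[] successors)
{
    int n = successors.Length;
    var preds = new List<int>[n];
    var postDoms = new HashSet<int>[n];
    for (int i = 0; i < n; i++)
        preds[i] = new List<int>();

    // Immediate predecessors: just invert the successor lists.
    for (int i = 0; i < n; i++)
        foreach (int s in successors[i])
            preds[s].Add(i);

    // Post-dominators: the nodes that appear on every path from a block
    // to the exit.  Since every edge goes forward, walking blocks from
    // last to first visits all successors before their predecessors.
    for (int i = n - 1; i >= 0; i--)
    {
        HashSet<int> common = null;
        foreach (int s in successors[i])
        {
            var through = new HashSet<int>(postDoms[s]) { s };  // s plus everything after s
            if (common == null)
                common = through;
            else
                common.IntersectWith(through);
        }
        postDoms[i] = common ?? new HashSet<int>();  // exit block: nothing runs after it
    }

    return (preds, postDoms);
}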

I'm sure there are better ways to do this, but the way MTDisasm does this is by creating "islands" that each correspond to an emittable block of code.  All islands start with a block of code and end in a "sink" which exits the island.  Initially, there is one island that starts with the first block and sinks into the "end of program" block.  It then repeatedly processes islands until it's done, and each time an island is processed, it can produce additional islands that also must be processed.

Processing an island is done as follows:

First, check if there is more than one post-dominator of the starting block.  If there is, then we need to find one that terminates the island.  Post-dominators are always reached in a fixed sequence, so we can find it by just following the first successor until we hit the first post-dominator.  The island is then split into 2 islands, one that sinks into the post-dominator, and one that starts at the post-dominator (with the original sink).

Then, the successor list of the block is checked.  For each successor, it checks if there is an island starting at that successor already.  Doing that handles "if" statements that have no "else" block and go to a successor of the "else" block when the condition is false, for instance.  If there is no island starting at that successor, a new island is created, starting with that successor's first block and ending in the original island's sink.  Since we know the island is now post-dominated by its sink, we know that is the end of the control flow path of the new island.

This process is repeated until there are no islands left.

After that, we convert the islands into code.  Starting with the first island, the island's instructions are converted back into an expression tree and emitted.  Then, if the block ends in a condition, an "if" statement with the condition expression is emitted, followed by the "true" island, an "else" followed by the "false" island if present, and then an "end if", followed by the sink island (which can chain into more conditions and sink islands).

That's it!  We can now see the code!

The last format standing

Most of the other data can be figured out using this test-case analysis: create a bunch of the same thing, change one thing, see what changes.  Some things are formatted in interesting ways; for instance, a lot of things have a "rate" parameter which is actually exported as a large value divided by the rate, turning it into a delay instead.

Most of the asset formats are fairly straightforward too.  Images are flipped and have different channel order on Windows, but otherwise are uncompressed.  Audio is uncompressed.  Sometimes things are compressed, but are compressed with a QuickTime codec and have the codec ID included, and nearly all QuickTime codecs have been reverse-engineered already.

What hasn't been reverse-engineered is that "mFactory Animation" codec.  mTropolis has an animation format called "mToon" designed for possibly-moving image sequences.  The mTropolis demo doesn't allow saving projects, but it does allow creating and saving mToon assets.  I guess it would have to, otherwise you couldn't try it out, and it's not like you can do much with an mToon file if you can't save your project, right?  (... Right?  ... Yeah.)

In the state of the art of the mid-90's, there were basically three ways of compressing animation in a way that could feasibly be played back in real time: Vector quantization, run-length encoding (RLE), or other simple lossless schemes like LZ77.  Vector quantization involves taking chunks of the image and converting them into a smaller set of similar chunks (the codebook), and replacing parts of the image with a lookup into the codebook.  It's similar to creating a palette for an image (do people even know what that means these days?), except it creates lookups for blocks of pixels instead of single pixels.  Run-length encoding just replaces series of identical pixels with a code indicating how long the series is.

How do we poke at this to start guessing what it does?  Well, we can pop open Color It!, one of the many programs of the era designed to be an affordable Photoshop replacement, and try doing some things.  Let's make an arrow, and give it a weird size like 12x13 so we can find the size easily, then make 2 identical frames so we can determine the frame stride and hopefully match that up with the size somewhere.


Okay, looks like we've got it.  Let's set the compression to "None" and export our project and crack it open!

Important thing to note here: The segment asset data appears after all of the scene objects, so the way mTropolis handles some things here is by storing the metadata with the scene object, but the frame data is in the asset data area.

We have a problem though.  We told it to be uncompressed, which means we should be seeing something that looks like an arrow shape in the hex data, but we don't.  It compressed it anyway.  The good news is, we can see from the codec ID and frame headers that this looks like an RLE codec, which is really good news because RLE codecs are way easier to figure out than VQ codecs.

There are a lot of ways to write an RLE codec.  You can set a minimum run length to increase the numeric range of runs by adding it to the run count, you can expect that runs and non-run byte sequences alternate, you can let runs go past the end of a row, and so on, but it's very easy to correlate because you can just make test images with runs of given lengths and see how they're encoded.
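
For a sense of what these codecs look like, here's a generic sketch of one possible byte-oriented RLE scheme in C# - to be clear, an illustration of the idea, not the actual mToon bitstream:

static byte[] DecodeRle(byte[] src)
{
    var dst = new List<byte>();
    int pos = 0;

    while (pos < src.Length)
    {
        byte control = src[pos++];
        int count = control & 0x7F;

        if ((control & 0x80) != 0)
        {
            // Run: repeat the next byte 'count' times.
            byte value = src[pos++];
            for (int i = 0; i < count; i++)
                dst.Add(value);
        }
        else
        {
            // Literal: copy the next 'count' bytes as-is.
            for (int i = 0; i < count; i++)
                dst.Add(src[pos++]);
        }
    }

    return dst.ToArray();
}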

This did turn out to be slightly more complicated: Most animations have "temporal compression" which allows for frames where runs of pixels are skipped and reused from the previous frame, and later it turned out that there was a special code for vertical displacement, which skips many rows of pixels at once.  The 16-bit version is just the 8-bit version, but uses 16-bit values!

Internally, this was handled in a somewhat sneaky way too: ScummVM's mTropolis engine converts the RLE-16 format to RLE-32 so it can decode it on the fly and blast pixels straight into a 32-bit frame buffer... but that's for another time, because...

It looks like we've hit a turning point?

With all of this information, MTDisasm reached a point where it could dump all of Obsidian's data in readable, viewable, and listenable formats.  I finally had something resembling the "source code" to the game.  That's still a long way from actually running it, but it meant that there was no more information about how it's supposed to work that was out-of-reach.

Upon reaching this point, I still wasn't on the ScummVM team or really looking to join it.  I had just come off of porting Glider PRO, which involved re-implementing tons of MacOS stuff by hand, so I really was not shy about spending excessive amounts of my free time on this type of thing and doing it myself.

That story is for the next installment though.  Next up, I'll take you down the path of turning an mTropolis player re-implementation into reality.

by OneEightHundred (noreply@blogger.com) at 2024-02-24 08:48

This is part 4 of a 4-part series, mirrored from the Gale Force Games page.

Part 1 is here.

Part 2 is here.

Part 3 is here.

Better than it was before.  Better. Stronger. Faster.

In the last installment, I discussed the process of getting Obsidian back to a point where it was, at least, no buggier than the original game.  That is still not quite the case: there is one bug that doesn't occur in the original version, inventory items getting swept across the screen during pan transitions.

Ironically, that bug isn't actually visible on modern machines running the original game, because the pan transitions were all set to run at maximum rate, which is as fast as possible.  If you don't like the bug, you can always turn off the "Enable short transitions" option in the game options.  Today though, we're not here to talk about bugs, we are here to talk about making Obsidian in ScummVM the best Obsidian experience that ever Obsidianed.

The first topic though falls into both categories: MIDI

A cacophony of bugs

Obsidian uses MIDI for music, played back through QuickTime.  This seems like an odd choice, given that Myst and most other games were using digital music, and QuickTime's tinny MIDI instruments were very limiting, but based on interviews, this may have been driven by Thomas Dolby's love of technology and MIDI in particular.  It sounds like a lot of the rationale was driven by a desire to have a dynamic music system, which didn't quite pan out.  Dynamic music is used in exactly two places in the full game: During the church puzzle, and after solving the statue puzzle (which cuts out the piano track).

Before we start, we have to talk about what MIDI is.  MIDI is an acronym for Musical Instrument Digital Interface, and it's designed to allow MIDI-compatible controllers (such as keyboards or programmable sequencers) to interface with devices that consume MIDI commands, such as synthesizers, computers, and drum machines.

It was also commonly used in the early DOS days for various reasons related to being able to decode it using low-CPU-usage algorithms, or offloading it to sound hardware where it didn't use CPU at all, combined with its small size on disk, so ScummVM already had ample support for playing back MIDI - Sort of.

When you request a MIDI device from ScummVM, it might be a software backend like FluidSynth, it might be a physical MIDI playback device, it might be anything, but you only get one.  This is a problem because Obsidian often has multiple MIDI files playing at once.  The vidbot on/off sound for instance is a MIDI sound, and that plays simultaneously with the music.  There's also one even more surprising one which I'll get to in a moment.

This is a problem because of how MIDI works: there are several different "channels", and a MIDI controller sends commands addressed to specific channels.  This allows a MIDI device to play multiple instruments at once by assigning instruments to different channels.

Being a protocol designed for real-time, MIDI commands that play notes also do not have an associated duration, instead there are separate commands for activating and deactivating a playing note.  In turn, the file format used for MIDI in Obsidian, known as SMF (Standard MIDI Format), basically encodes commands to send to the MIDI device and what time to send them.

However, this creates a problem: If we stop sending commands from a MIDI file while a note is playing, then we'll never send the command to stop playing the note.  When using QuickTime's software synthesizer, this isn't a problem, because it just stops running the software synthesizer for that MIDI source.  If we only have one continuously-active MIDI output though, then we can't do that.  Fortunately, most notes have finite duration anyway, but sometimes they don't, and for that, Obsidian has a gnarly hack: A MIDI file that sends an "all notes off" command to every channel in response to various commands, but also periodically on a 30-second timer.

In fact, you may notice a moment in the intro where the music pauses dramatically before displaying the title - that pause is at exactly the 30 second mark, presumably timed exactly to prevent that music note-stopper from ruining the theme.

So, the good news is that despite not having a proper multi-source output, the game is still capable of functioning with a single-source output, as long as you're willing to tolerate abrupt cuts in the music every 30 seconds.

There is another problem: MIDI sources have individually-controlled volume, so how do we handle that?  Well, the simple way is that when a MIDI note is sent, it is sent with a "velocity" which roughly corresponds to the volume of the note, but really means something like "intensity."  It's not exactly a volume scale, because you might get more attack in the sound with higher velocity, for instance, with a higher-quality MIDI renderer.  But, it's close enough for what we need, so it was done by intercepting note on and note off commands, modulating the velocity, stuffing the new velocity back into the command, and sending it out.
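
Something along these lines is enough for a per-source volume (a sketch with a hypothetical IMidiOutput interface, not ScummVM's real driver API):

// MIDI note-on messages are status 0x9n (n = channel), then the key,
// then a velocity of 0..127.  Scaling the velocity by the source's
// volume before forwarding the event approximates a volume control.
interface IMidiOutput
{
    void Send(byte[] message);
}

static void SendScaledNoteOn(IMidiOutput output, byte channel, byte key, byte velocity, double sourceVolume)
{
    // sourceVolume is 0.0..1.0; keep a non-zero note from becoming
    // velocity 0, which MIDI treats as a note-off.
    int scaled = (int)Math.Round(velocity * sourceVolume);
    byte newVelocity = (byte)Math.Clamp(scaled, velocity > 0 ? 1 : 0, 127);

    output.Send(new byte[] { (byte)(0x90 | (channel & 0x0F)), key, newVelocity });
}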

Mixing things up a bit

Just because it works doesn't really mean it's good though, and the music periodically getting chopped is really annoying.  But, like I said, I like digging through file formats and standards almost as much as I hate having free time, so I decided to create the secret weapon to solve this problem: The dynamic MIDI mixer.  This fun feature is enabled if you enable the "Improved music mixing" option in the game options (it's on by default).

So, what does this actually do?  Well, basically we are going to implement multiple MIDI drivers that funnel into a single MIDI driver in a more intelligent way.  First, we have a way of tracking the full state of every single MIDI channel:
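
Roughly, the per-channel bookkeeping needs to look something like this (my reconstruction in C# from the description below, not ScummVM's actual MidiChannelState):

class MidiChannelState
{
    public byte Program;                        // current instrument
    public ushort PitchBend = 0x2000;           // centered by default
    public byte[] Controllers = new byte[128];  // last value seen for each controller
    public bool Sustain;                        // CC 64 pedal down
    public bool Sostenuto;                      // CC 66 pedal down

    // Notes currently sounding, and whether each one is only being
    // held by a pedal at this point.
    public Dictionary<byte, bool> ActiveNotes = new();

    // Used when picking a channel to steal: prefer the one that went
    // quiet the longest time ago.
    public long LastNoteOffTime;
}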

We track this for every channel of the output device and also every channel of each MIDI source.

Each MIDI source in the game is assigned a MidiCombinerSource, which is a thing that looks like a MIDI driver to the source, but in the dynamic music mixer is actually a funnel into the combiner.

Aside from starting and stopping notes though, there are several other types of MIDI messages: There are controllers, which can alter the qualities of a channel in some way, there's the program (which affects what instrument to play), and also two features called "sustain" and "sostenuto."  So far, ScummVM doesn't support any games known to use sostenuto, but I wanted to get this right the first time.  Sustain and sostenuto are normally controlled by pedals, and if the pedal is down when a note's "note off" command arrives, the note keeps playing until the pedal is released, so to track an active note, we need to track whether or not it is sustained as well.

Internally, in order for anything to happen, a MIDI source channel has to be dynamically assigned to an available "physical" channel on the output device.  The way that works is: All channels are unassigned by default.  If a controller change happens, and the channel is unassigned, then it updates the MidiChannelState of the source channel, but otherwise does nothing.

When a note is played on an unassigned channel, the combiner tries to find an output channel that is the right channel type (one channel is typically reserved for percussion) and, among those, the one that stopped playing its last note the longest time ago.  If there is still an active source assigned to that channel, then that source channel is unassigned.  Then, the new source channel is assigned to that output channel, the channel state of the output channel and the source's channel are compared, and anything that is different is adjusted by sending the necessary MIDI commands, and then the note is played.  If a source channel is still actively assigned to an output channel, then control commands are sent to the output immediately.
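
In sketch form, the allocation step looks something like this (hypothetical types, not the real combiner code):

class SourceChannel { public int AssignedOutput = -1; }

class OutputChannel
{
    public bool IsPercussion;
    public SourceChannel Owner;   // source channel currently mapped here, if any
    public long LastNoteOffTime;  // when this channel last went quiet
}

// Pick an output channel of the right type that has been silent the
// longest, evicting whatever source channel currently owns it.
static int AllocateOutputChannel(OutputChannel[] outputs, bool wantPercussion)
{
    int best = -1;

    for (int i = 0; i < outputs.Length; i++)
    {
        if (outputs[i].IsPercussion != wantPercussion)
            continue;
        if (best < 0 || outputs[i].LastNoteOffTime < outputs[best].LastNoteOffTime)
            best = i;
    }

    if (best >= 0 && outputs[best].Owner != null)
    {
        outputs[best].Owner.AssignedOutput = -1;  // evict the previous source channel
        outputs[best].Owner = null;
    }

    return best;  // -1 if no channel of the right type exists
}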

Because of this, the "all notes off" trigger actually doesn't do anything any more.  It updates the MIDI combiner source's channel state, but since nothing in that MIDI file ever plays a note, none of it ever goes to the output driver.

We can also use the MIDI gain control, which is an actual volume control, instead of having to modulate velocity, and since we know which output channels are being used by a source, we can completely quiet a source by just sending an "all sound off" command (which bypasses sustain) to silence the channel, and then deallocate it.

La la la, I can't hear you!

Well that's unfortunate.  Individual sounds in Obsidian can have their own volume level, which is modulated by the global volume level, so setting the volume to the maximum doesn't necessarily mean that the MIDI source is set to the maximum possible volume.  As mentioned earlier, the vidbot sounds are MIDI, and by default their volume is set to 50%, but they do seem significantly quieter when using gain control vs. modulating the velocity.  What's going on here?

After a bunch of trawling around, it turns out the answer is in the General MIDI Level 2 specification at page 7: "gain in dB = 40 * log10(cc7/127)"

... What does that mean?  Unfortunately, I'm actually finding out that I did the fix wrong as I'm writing this, and need to fix it again!  "cc7" is the MIDI channel volume value.  The volume scale on MIDI sources is a linear scale, meaning a value half as large causes the amplitude to be halved.  Decibels (dB) are a logarithmic scale, meaning any 2 values the same distance apart have the same ratio.  One problem with this though is figuring out what we're measuring.  Many measurements in electrical and sound engineering are the square or square root of other measurements due to various physics interactions.  Decibels are a scale designed around factor-of-10 changes, but whether a 10x change is +10 or +20 depends on what quantity is being measured, due to that problem of quantities being squared.

What we want is the measure that the volume is supposed to be scaling, which is amplitude, which is on the +20 dB = 10x scale.


So converting a normalized MIDI volume (a.k.a. the volume rescaled to a 0-1 scale) to an amplitude multiplier involves squaring it, and we need to figure out how to convert a scaled amplitude back into a new normalized volume.

... okay, great, so basically all we have to do to compute the new MIDI volume is scale the MIDI source volume to a 0-1 scale, and then multiply the original MIDI volume by the square root of that value.  Cool.  This means our vidbots playing at 50% volume are now a 0.7071x multiplier instead of 0.5.  Unfortunately, the original implementation got the decibel scale wrong and was using a 4th root, which I guess is better than it being too quiet, but still wrong.
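
In code form, the whole conversion boils down to something like this (my paraphrase of the math, not the actual ScummVM patch):

// GM2 says channel volume maps to amplitude as (cc7/127)^2, so scaling
// a source's amplitude by 'volume' (0..1) means multiplying the
// original controller value by sqrt(volume).
static byte ScaleChannelVolume(byte originalCc7, double volume)
{
    double scaled = originalCc7 * Math.Sqrt(volume);
    return (byte)Math.Clamp((int)Math.Round(scaled), 0, 127);
}

// e.g. a vidbot at 50% volume: ScaleChannelVolume(127, 0.5) == 90,
// i.e. a ~0.7071x multiplier on the controller value rather than 0.5x.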

Anyway, it's fixed-er now!

I've got 32 problems and my bit depth isn't one

Most monitors today try to run games in 32-bit 8-bits-per-channel (and 8 bits of waste to help with memory alignment) color mode, or sometimes HDR if the game supports it.  Attempting to run Obsidian in 32-bit mode will greet you with this error though:

Obsidian was released at a time when 16-bit color depth (or "thousands of colors" as it was called on Mac) was fairly new; most things ran in 8-bit depth with a color lookup table, so 16-bit was fairly cutting edge.

All of Obsidian's images are 16-bit images though, so we're not really losing anything by running with a 32-bit render target, are we?  Actually, we are.  The images are 16-bit.  The videos are not.  The videos are encoded with Cinepak, which in full-color mode can render out to 8 bits per channel, and even if the input images to Cinepak were in 16-bit color, the averaging that occurs during the encoding process (and the YUV-to-RGB perceptual transform) produces results with more accuracy per color channel.  So, we want to render in 32-bit.

That's accomplished by having an override that just lies to the scripts about what color depth the game is being run at.

A more cinematic experience

When Obsidian came out, most desktop monitors were 640x480, which is a 4:3 aspect ratio.  Most displays today are 16:9 widescreen.  However, the images and videos in Obsidian are all 640x360, which is 16:9.  Well, that's pretty neat.  It would be cool to play the game in widescreen and get rid of the letterboxing on both sides of the monitor, wouldn't it?

Okay, let's just offset the game frame up 60 pixels, cut down the resolution, and lie to the game about the resolution it's running at so it doesn't throw a startup error!  That's a great start, but what about the inventory items that display below the frame?

There's an internal system for overriding object behaviors, which can somewhat handle this.  It's more complicated than it could be, because with how Obsidian is broken up, sometimes you carry the item into different sections and subsections, which means the elements that display them are duplicated.

Fortunately, the inventory items are color-keyed already even though they're on a black background, so I didn't need to do anything to make them not render a black box outline, but they did run into a problem with the security survey in the maze.


Uh oh, the keycards are overlapping the survey, and I really need to let him know that I eat my ice cream straight from a bowl!  Well I guess we could just change the layer order so the form's on top of the cards, right?  Actually, no, because the security form image isn't just the security form, it also includes a bit to the left.

Ultimately, this was resolved by detecting this specific situation.  When the security form is displayed, the cards are moved off-screen, and moved back when the security form is dismissed.  Now I can eat my ice cream in peace.

There was one last problem in widescreen mode:


... the Rocket Science Games logo is too big!  Now you might be thinking, "so what, that company hasn't existed for 25 years, who caaaaaares!?"  Unfortunately, I care, so we need to fix this!  But what can we do?  Shrink it?  But then we get letterboxing on the sides again, and that doesn't look nice!  What we need, which you may have seen if you watch old TV shows re-broadcast on widescreen, is an anamorphic filter.

An anamorphic filter works by stretching out the sides of the image more than the center areas.  This was done by computing an exponential curve that has a derivative of 0 at the point where it's supposed to stop (meaning, basically, the rate of pixel coordinate change becomes normal where the stretching ends, preventing a noticeable seam), and applying that to the pixel grid.  Here's what the filter applied to an 8x8 grid pattern looks like:

And here's the filter applied to the logo video:

Much better!

Forgetting to save is half the adventure

While it's practically a foreign concept these days, there was a time when checkpoints weren't a thing.  Like most games of its time, if you wanted to keep your progress, you had to save the game.  The game even reminds you to save when you try to quit!  Having auto-save would be really nice though, wouldn't it?

The auto-save feature works partly on a timer, like most ScummVM games, but there's an option (enabled by default) to also auto-save at progress points.  Most puzzle solutions in Obsidian set some variable, but you aren't always allowed to save, so this creates a bit of a problem: Finding how to detect puzzle solutions, and then finding a safe place to save.

Auto-save detection is done in two ways: One way is by detecting arrival in a specific scene while coming from a specific other scene, which is used to detect things like chapter transitions, but also things like beating the maze without the proper document.  The other way is detecting arrival in a specific scene with a puzzle completion variable set differently than what it was the last time the game was loaded or restarted.  Normally, the latter category is done by triggering in the scene you would wind up in after the puzzle.

There is a minor omission in this scheme: If you complete a puzzle, save the game, then reload it, the autosave won't trigger... but in that case, they've pretty much saved right there anyway, so who cares?

I'm playing a game, not a menu!

ScummVM shows screenshots of the game where you saved it in its save game UI, but there's one problem: If you save the game from the in-game UI, then you're not looking at the game, you're looking at a menu.  So, some hooks had to be added to take a screenshot before transitioning to the menu and using that screenshot for the save instead.

Some light reading

People seem to like subtitles, so why not add them too?  Well, they were added, and you can download them from ScummVM's add-ons page.

Subtitles mostly work by detecting when a specific sound asset is activated and popping up the subtitles.  There is however an option for a very small number of subtitles:

Without spoiling too much, there's one puzzle that involves sound, but popping up subtitles for the sound at all kind of gives away the answer, and it's one of the neatest puzzles in the game.  So, depending on what your rationale is for enabling subtitles, you can keep this option off if you want to have subtitles, and can hear sound, but don't want to spoil the challenge.

Overall, there are 686 voice lines in the game.  Deciding when to split a subtitle up, and where to split the lines, was an ongoing challenge that I lack the expertise for, but I did the best I could.

In some cases, this involved getting help for figuring out lines I couldn't make out.  I'd never actually heard the term "lousing up" before in my life.  Also, I added speaker names to the subtitles, but in some cases, we don't really know the names of these characters and just have to guess.

The identity of the characters also clearly has some story implications, but without knowing for sure who they are, that has to be danced around, a problem made more difficult by Obsidian's casting.  While some of the characters are professional actors, many of them (especially the vidbots) are Rocket Science employees.

The bureau chief, for instance, is almost certainly Howard Cushnir, who also appears in a celebration video, and I think is also a character in a chapter intro cinematic, but it's not clear if that's because it's the same character, the same person playing multiple characters, or if I'm mistaken and it's not even the same person.  Adding to this, the brief appearance in a cinematic is as Max's teacher, but he appears to be the project administrator in the journal.  (At least, that seems to be the implication - that his appearance as the chief in the bureau realm is a reflection of his bureaucratic authority status in the real world.)  It would be odd for that to be the same person, then, but maybe he's a graduate professor at a research university and it is the same character?  There's no way to tell.

Another amusing case was the eye test in the bureau maze.  The voiceover there drops to a soft, barely audible level, and it's supposed to be a gag that it's unintelligible, so he tells you to go to the hearing test booth (who sends you back for an eye test).  The problem is that it is intelligible if you isolate the sound and turn the volume up, so the intent is that it's gibberish, but there are actual coherent words in the line.  So, should the line show the words, or something like "<Unintelligible>" to keep the gag?  I kind of split the difference.

Eventually, it's time to move on

Doing this involved a lot of testing.  In many cases, I found things in the logic that looked like they caused bugs, confirmed that they caused bugs, and verified that the same bugs occurred in the original game.  That type of bug is much harder to fix, and ultimately only one of them was (a progression blocker that occurs if you save before logging into the journal).

The inventory panning bug is the only bug left that hasn't been fixed, and due to how the widescreen mod works, giving it a proper fix is difficult.  In non-widescreen mode, the fix is to only pan the part of the screen with main-scene elements.

In the end though, the two most important lessons I've learned in life are that shipping beats perfection, and motivation is a finite resource.  Working on this was always a race against burnout, and eventually it started reaching the point where it was hard to justify any further work vs. moving on to other things.  Eventually it's time to say "it's good, and that's good enough," stick a fork in it, and move on to other things.

The future of the mTropolis engine in ScummVM

mTropolis was used to ship a few dozen titles, and a few of them have been on the list as possible additions: Muppet Treasure Island, S.P.Q.R. - The Empire's Darkest Hour, Star Trek: The Game Show, and MindGym.  The first one is done and was added in ScummVM in the 2.8.0 release, the next two I have copies of.  S.P.Q.R. is the next one on the to-do.

However, as mentioned above, motivation is a finite resource, and unlike some members of the team, I don't really approach things with nostalgia, or as a historian who thinks it's not their place to judge the quality of things from the past.  My goal is to save valuable things from oblivion, and the further down the to-do list I go, the more questionable that "value" gets, in my opinion.  So, we'll see what happens, and when it happens, but that's the plan for now.

Jank beyond the dream worlds

It wouldn't be a good rant if I didn't leave you with some tales from other mTropolis games, would it?  Muppet Treasure Island has numerous hacks to deal with duplicated aliased compound variables needing to be linked up in a way that I still haven't made any sense of, and music doesn't work in S.P.Q.R. right now because it depends on sending messages to objects in a scene that is never loaded.

Also, I finally figured out what those extra 8 bytes in the catalog header are: Eventually mFactory realized that it was a problem to have separate formats for Mac and Windows, so they decided to make a cross-platform format.  Does the cross-platform format work by being in a common format that works on both platforms?  Of course not.  It exports the Mac and Windows versions into the same file, and de-duplicates the asset data.

That's all for now.  See you in the future, somewhere.

~~Fin~~

by OneEightHundred (noreply@blogger.com) at 2024-02-24 08:47

2022-11-15

After reviewing the code for the simple YAML parser I wrote, I decided it was getting a little messy, so before continuing, I decided to refactor it a little bit.

The simplest thing to do was to separate the serialisation and the deserialisation into separate classes, and simply call those from within the YamlConvert class in the existing methods. This approach tends to be what other JSON and YAML libraries do, with added functionality such as being able to control aspects of the serialisation/deserialisation process for specific types.

I currently don’t need, or want, to do that, as I’m taking a much more brute force approach - however it is something to consider for a future refactor. Maybe.

I ended up with the following for the YamlConvert:

public static class YamlConvert
{
    private static YamlSerialiser Serialiser;
    private static YamlDeserialiser Deserialiser;
    
    static YamlConvert()
    {
        Serialiser = new YamlSerialiser();
        Deserialiser = new YamlDeserialiser();
    }
    
    public static string Serialise(YamlHeader header)
    {
        return Serialiser.Serialise(header);
    }

    public static YamlHeader Deserialise(string filePath)
    {
        if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);

        var content = File.ReadAllLines(filePath);

        return Deserialise(content);
    }

    public static YamlHeader Deserialise(string[] rawHeader)
    {
        return Deserialiser.Deserialise(rawHeader);
    }
}

It works quite well, as it did before, and looks a lot better. There is no dependency configuration to worry about; as I mentioned above, I’m not worried about swapping out the serialisation/deserialisation process at any time.

2022-11-15 00:00

2022-07-30

Previously we left off with a method which could parse the YAML header in one of our markdown files, and it was collecting each line between the --- header markers for further processing.

One of the main requirements for the overall BlogHelper9000 utility is to be able to standardise the YAML headers in each source markdown file for a post. Some of the posts had a mix of different tags, that were essentially doing the same thing, so one of the aims is to be able to collect those, and transform the values into the correct tags.

In order to achieve this, we can specify a collection of the valid header properties up front, and also a collection of the ‘other’ properties that we find, which we can hold onto for later in the process, once we’ve written the code to handle those properties. The YamlHeader class has already been defined, and we can use a little reflection to load that class up and pick the properties out.

private static Dictionary<string, object?> GetYamlHeaderProperties(YamlHeader? header = null)
{
    var yamlHeader = header ?? new YamlHeader();
    return yamlHeader.GetType()
        .GetProperties(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.Instance)
        .Where(p => p.GetCustomAttribute<YamlIgnoreAttribute>() is null)
        .ToDictionary(p =>
        {
            var attr = p.GetCustomAttribute<YamlNameAttribute>();

            return attr is not null ? attr.Name.ToLower() : p.Name.ToLower();
        }, p => p.GetValue(yamlHeader, null));
}

We need to be careful not to collect properties that are not part of the YAML header in markdown files, but that exist on the YamlHeader for use in further processing - such as the property holding the ‘extra’ tags that we’ll need to match up with their valid counterparts in a later step. Thus we have the custom YamlIgnoreAttribute that we can use to ensure we drop properties that we don’t care about. We also need to ensure that we can match up C# property names with the actual YAML header names, so we also have the YamlNameAttribute to handle this.
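
Neither attribute needs to be anything more than a marker; something along these lines would do the job (a sketch of how they might look, not necessarily the exact definitions in the repo):

[AttributeUsage(AttributeTargets.Property)]
public class YamlIgnoreAttribute : Attribute
{
}

[AttributeUsage(AttributeTargets.Property)]
public class YamlNameAttribute : Attribute
{
    public string Name { get; }

    public YamlNameAttribute(string name)
    {
        Name = name;
    }
}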

Then we just need a way of parsing the individual lines and pulling the header name and the value out.

(string property, string value) ParseHeaderTag(string tag)
{
    tag = tag.Trim();
    var index = tag.IndexOf(':');
    var property = tag.Substring(0, index);
    var value = tag.Substring(index+1).Trim();
    return (property, value);
}

Here we just return a simple tuple after doing some simple substring manipulation, which is greatly helped by the header and its value always being separated by ‘:’.

Then if we put all that together we can start to parse the header properties.

private static YamlHeader ParseYamlHeader(IEnumerable<string> yamlHeader)
{
    var parsedHeaderProperties = new Dictionary<string, object>();
    var extraHeaderProperties = new Dictionary<string, string>();
    var headerProperties = GetYamlHeaderProperties();

    foreach (var line in yamlHeader)
    {
        var propertyValue = ParseHeaderTag(line);

        if (headerProperties.ContainsKey(propertyValue.property))
        {
            parsedHeaderProperties.Add(propertyValue.property, propertyValue.value);
        }
        else
        {
            extraHeaderProperties.Add(propertyValue.property, propertyValue.value);
        }
    }

    return ToYamlHeader(parsedHeaderProperties, extraHeaderProperties);

All we need to do is set up some dictionaries to hold the header properties, get the dictionary of valid header properties, and then loop through each line, parsing the header tag and verifying whether the property is a ‘valid’ one that we definitely know we want to keep, or one we need to hold for further processing. You’ll notice in the above code that it’s missing an end brace: this is deliberate, because the ParseHeaderTag method and ToYamlHeader method are both nested methods.
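
For completeness, ToYamlHeader ends up being roughly the reverse of GetYamlHeaderProperties - something like this sketch (simplified; the real version also needs to convert values into non-string property types and stash the extra properties on the header):

YamlHeader ToYamlHeader(Dictionary<string, object> properties, Dictionary<string, string> extras)
{
    var header = new YamlHeader();

    foreach (var property in header.GetType()
                 .GetProperties(BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.Instance)
                 .Where(p => p.GetCustomAttribute<YamlIgnoreAttribute>() is null))
    {
        var attr = property.GetCustomAttribute<YamlNameAttribute>();
        var name = (attr is not null ? attr.Name : property.Name).ToLower();

        // Everything arrives as a string at this point; this sketch only
        // handles string properties, while a real version converts to the
        // property's actual type (bools, dates, tag lists, ...).
        if (properties.TryGetValue(name, out var value)
            && value is string text
            && property.PropertyType == typeof(string))
        {
            property.SetValue(header, text);
        }
    }

    // The 'extras' dictionary also gets attached to the header here, via
    // the property that the reflection loop deliberately ignores.
    return header;
}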

Reading through the code to write this post has made me realise that we can do some refactoring to make this look a little nicer.

So we’ll look at that next.

2022-07-30 00:00

2022-07-22

The next thing to do to get BlogHelper9000 functional is to write a command which provides some information about the posts in the blog. I want to know:

  • How many published posts there are
  • How many drafts there are
  • A short list of recent posts
  • How long it’s been since a post was published

I also know that I want to introduce a command which will allow me to fix the metadata in the posts, which is a little messy. I’ve been inconsistently blogging since 2007, originally starting off on a self-hosted python blog I’ve forgotten the name of, before migrating to Wordpress, and then migrating to a short-lived .net static site generator before switching over to Jekyll.

Obviously, Markdown powered blogs like Jekyll have to provide non-markdown metadata in each post, and for Jekyll (and most markdown powered blogs) that means: YAML.

Parse that YAML

There are a couple of options when it comes to parsing YAML. One would be to use YamlDotNet which is a stable library which conforms with V1.1 and v1.2 of the YAML specifications.

But where is the fun in that?

I’ve defined a POCO called YamlHeader which I’m going to use as the in-memory object to represent the YAML metadata header at the top of a markdown file.

If we take a leaf from different JSON converters, we can define a YamlConvert class like this:

public static class YamlConvert
{
    public static string Serialise(YamlHeader header)
    {
    }

    public static YamlHeader Deserialise(string filePath)
    {
    }
}

With this, we can easily serialise a YamlHeader into a string, and deserialise a file into a YamlHeader.

Deserialise

Deserialising is the slightly more complicated of the two, so let’s start with that.

Our first unit test looks like this:

    [Fact]
    public void Should_Deserialise_YamlHeader()
    {
        var yaml = @"---
layout: post
title: 'Dynamic port assignment in Octopus Deploy'
tags: ['build tools', 'octopus deploy']
featured_image: /assets/images/posts/2020/artem-sapegin-b18TRXc8UPQ-unsplash.jpg
featured: false
hidden: false
---
post content that's not parsed";
        
        var yamlObject = YamlConvert.Deserialise(yaml.Split(Environment.NewLine));

        yamlObject.Layout.Should().Be("post");
        yamlObject.Tags.Should().NotBeEmpty();
    }

This immediately requires us to add an overload for Deserialise to the YamlConvert class, which takes a string[]. This means our implementation for the first Deserialise method is simply:

public static YamlHeader Deserialise(string filePath)
{
    if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);

    var content = File.ReadAllLines(filePath);

    return Deserialise(content);
}

Now we get into the fun part. And a big caveat: I’m not sure if this is the best way of doing this, but it works for me and that’s all I care about.

Anyway. A YAML header block is identified by a single line of only ---, followed by n lines of YAML, ended by another single line of only ---. You can see this in the unit test above.

The algorithm I came up with goes like this:

For each line in lines:
  if line is '---' then
    if header start marker not found then
      mark header start marker found
      continue
    else
      break loop
  else
    store line
parse each stored line of the header

So in a nutshell, it loops through each line in the file, looks for the first --- to identify the start of the header, and then until it hits another ---, it gathers the lines for further processing.

Translated into C#, the code looks like this:

public static YamlHeader Deserialise(string[] fileContent)
{
    var headerStartMarkerFound = false;
    var yamlBlock = new List<string>();

    foreach (var line in fileContent)
    {
        if (line.Trim() == "---")
        {
            if (!headerStartMarkerFound)
            {
                headerStartMarkerFound = true;
                continue;
            }

            break;
        }

        yamlBlock.Add(line);
    }
        
    return ParseYamlHeader(yamlBlock);
}

This is fairly straightforward, and isn’t where I think some of the problems with the way it works actually are - all that is hidden behind ParseYamlHeader, and is worth a post on its own.

2022-07-22 00:00

2022-07-14

In the introductory post to this series, I ended with issuing a command to initialise a new console project, BlogHelper9000. It doesn’t matter how you create your project, be it from Visual Studio, Rider or the terminal, the end result is the same, as the templates are all the same.

With the new .net 6 templates, the resulting Program.cs is somewhat sparse; if you discount the single comment, then all you get in the file is a Console.WriteLine("Hello, World!");, thanks to all the new wizardry in the latest versions of the language and the framework.

Thanks to this new fangled sorcery, the app still has a static main method, you just don’t need to see it, and as such, the args string array is still there. For very simple applications, this is all you really need. However, once you get past a few commands, with a few optional flags, things can get complicated, fast. This can turn into a maintenance headache.

In the past I’ve written my own command line parsing abstractions, I’ve used Mono.Options and other libraries, and I think I’ve finally settled on Oakton as my go to library for quickly and easily adding command line parsing to a console application. It’s intuitive, easy to use and easy to maintain. This means you can easily introduce it into a team environment and have everyone understand it immediately.

Setup Command loading

After following Oakton’s getting started documentation, you can see how easy it is to get going with a basic implementation. I recommend introducing the ability to execute both synchronous and asynchronous commands, which you can achieve with a small tweak to Program.cs that takes the top-level statements in .net 6 into account, like this:

using System.Reflection;

var executor = CommandExecutor.For(_ =>{
    _.RegisterCommands(typeof(Program).GetTypeInfo().Assembly);
});

var result = await executor.ExecuteAsync(args);
return result;

In .net 5, or if you don’t like top-level statements and have a static int Main you can make it static Task<int> Main instead and return the executor.ExecuteAsync instead of awaiting it.

Base classes

In some console applications, different commands can have the same optional flags, and I like to put mine in a class called BaseInput. Because I know I’m going to have several commands in this application, I’m going to add some base classes so that the different commands can share some of the same functionality. I’ve also used this in the past to, for example, create a database instance in the base class, which is then passed into each inheriting command. It’s also a good place to add some common argument/flag validation.

What I like to do is have an abstract base class which inherits from the Oakton command, and add an abstract Run method to it, and usually a virtual bool ValidateInput too; these can then be overridden in our actual Command implementations, giving us a lot of nice functionality automatically, in a way that can be shared across all Commands.

Some of the details of these classes are elided to stop this from becoming a super long post; you can see all the details in the GitHub repo.

public abstract class BaseCommand<TInput> : OaktonCommand<TInput>
    where TInput : BaseInput
{
    public override bool Execute(TInput input)
    {
        return ValidateInput(input) && Run(input);
    }

    protected abstract bool Run(TInput input);

    protected virtual bool ValidateInput(TInput input)
    {
        /* ... */
    }
}

This ensures that all the Commands we implement can optionally decide to validate the inputs that they take in, simply by overriding ValidateInput.

The async version is exactly the same… except async:

public abstract class AsyncBaseCommand<TInput> : OaktonAsyncCommand<TInput>
    where TInput : BaseInput
{
    public override async Task<bool> Execute(TInput input)
    {
        return await ValidateInput(input) && await Run(input);
    }

    protected abstract Task<bool> Run(TInput input);

    protected virtual Task<bool> ValidateInput(TInput input)
    {
        /* ... */
    }
}

There is an additional class I’ve not yet shown, which adds some further reusable functionality between each base class, and that’s the BaseHelper class. I’ve got a pretty good idea that any commands I write for the app are going to operate on posts or post drafts, which in jekyll are stored in _posts and _drafts respectively. Consequently, the commands need an easy way of having these paths to hand, so a little internal helper class is a good place to put this shared logic.

internal class BaseHelper<TInput> where TInput : BaseInput
{
    public string DraftsPath { get; }

    public string PostsPath { get;  }

    private BaseHelper(TInput input)
    {
        DraftsPath = Path.Combine(input.BaseDirectoryFlag, "_drafts");
        PostsPath = Path.Combine(input.BaseDirectoryFlag, "_posts");
    }

    public static BaseHelper<TInput> Initialise(TInput input)
    {
        return new BaseHelper<TInput>(input);
    }

    public bool ValidateInput(TInput input)
    {
        if (!Directory.Exists(DraftsPath))
        {
            ConsoleWriter.Write(ConsoleColor.Red, "Unable to find blog _drafts folder");
            return false;
        }

        if (!Directory.Exists(PostsPath))
        {
            ConsoleWriter.Write(ConsoleColor.Red, "Unable to find blog _posts folder");
            return false;
        }

        return true;
    }
}

This means that our base class implementations can now become:

private BaseHelper<TInput> _baseHelper = null!;
protected string DraftsPath => _baseHelper.DraftsPath;
protected string PostsPath => _baseHelper.PostsPath;

public override bool Execute(TInput input)
{
    _baseHelper = BaseHelper<TInput>.Initialise(input);
    return ValidateInput(input) && Run(input);
}

protected virtual bool ValidateInput(TInput input)
{
    return _baseHelper.ValidateInput(input);
}

Note the null!, where I am telling the compiler to ignore the fact that _baseHelper is being initialised to null, as I know better.

This allows each command implementation to hook into this method and validate itself automatically.

First Command

Now that we have some base classes to work with, we can start to write our first command. If you check the history in the repo, you’ll see this wasn’t the first command I actually wrote… but it probably should have been. In any case, it only serves to illustrate our first real command implementation.

public class InfoCommand : BaseCommand<BaseInput>
{
    public InfoCommand()
    {
        Usage("Info");
    }

    protected override bool Run(BaseInput input)
    {
        var posts = LoadPosts();
        var blogDetails = new Details();

        DeterminePostCount(posts, blogDetails);
        DetermineDraftsInfo(posts, blogDetails);
        DetermineRecentPosts(posts, blogDetails);
        DetermineDaysSinceLastPost(blogDetails);

        RenderDetails(blogDetails);

        return true;
    }

    /**...*/
}

LoadPosts is a method in the base class which is responsible for loading the posts into memory, so that we can process them and extract meaningful details about them. We store this information in a Details class, which is what we ultimately use to render the details to the console. You can see the details of these methods in the GitHub repository; they all boil down to simple LINQ queries.

Summary

In this post we’ve seen how to set up Oakton, configure a base class to extend its functionality and give us more flexibility, and implement an initial command. In subsequent posts, we’ll cover more commands, and I’ll start to use the utility to tidy up metadata across all the posts in the blog and fix things like images for posts.

2022-07-14 00:00

2022-03-11

I just had to setup my vimrc and vimfiles on a new laptop for work, and had some fun with Vim, mostly as it’s been years since I had to do it. I keep my vimfiles folder in my github, so I can grab it wherever I need it.

To recap, one of the places that Vim will look for things is $HOME/vimfiles/vimrc, where $HOME is actually the same as %USERPROFILE%. In most corporate environments, the %USERPROFILE% is actually stored in a networked folder location, to enable roaming profile support and help when a user gets a new computer.

So you can put your vimfiles there, but, it’s a network folder - it’s slow to start an instance of Vim. Especially if you have a few plugins.

Instead, what you can do is to edit the _vimrc file in the Vim installation folder (usually in C:\Program Files (x86)\vim), delete the entire contents and replace it with:

set rtp+=C:\path\to\your\vimfiles
set viminfo+=nC:\path\to\your\vimfiles\or\whatever
source C:\path\to\your\vimfiles\vimrc

What this does is:

  1. Sets the runtime path to be the path to your vimfiles
  2. Tells vim where to store/update the viminfo file (which stores useful history state amongst other things)
  3. Sources your vimrc file and uses it

This post largely serves as a memory aid for myself, so that when I need to do this again in the future I won’t spend longer than necessary googling how to do it, but I hope it helps someone else too.

2022-03-11 00:00

2022-03-04

Recently I was inspired by @buhakmeh’s blog post, Supercharge Blogging With .NET and Ruby Frankenblog to write something similar, both as an exercise and excuse to blog about something, and as a way of tidying up the metadata on my existing blog posts and adding header images to old posts.

High level requirements

The initial high level requirements I want to support are:

  1. Cross-platform. This blog is jekyll based, and as such is written in markdown. Any tool I write for automation purposes should be cross-platform.
  2. Easily add posts from the command line, and have some default/initial yaml header metadata automatically added.
  3. See a high level overview of the current status of my blog. This should include things like the most recent post, how many days I’ve been lazy and not published a post, available drafts etc
  4. Publish posts from the command line, which should update the post with published status and add the published date to the yaml header and filename.
  5. Create a customised post header for each post on the blog, containing some kind of blog branding template and the post title, and update or add the appropriate yaml header metadata to each post. This idea also comes from another @buhakmeh’s post.
  6. The blog has many years of blog posts, spread across several different blogging platforms before settling on Jekyll. As such, some of the yaml metadata for each blog post is… not consistent. Some effort should go into correcting this.
  7. Automatically notify Twitter of published posts.

Next steps

The next series of posts will cover implementing the above requirements… not necessarily in that order. First I will go over setting up the project and configuring Oakton.

After that I will probably cover implementing fixes to the existing blog metadata, as I think that is going to be something that will be required in order for any sort of Info function to work properly, as all of the yaml metadata will need to be consistent.

Then I think I’ll tackle the image stuff, which should be fairly interesting, and should give a nice look to the existing posts, as having prominent images for posts is part of the theme for the blog, which I’ve not really taken full advantage of.

I’ll try to update this post with links to future posts, or else make it all a big series.

dotnet new console --name BlogHelper9000

2022-03-04 00:00

2022-01-11

At work, we have recently been porting our internal web framework into .net 6. Yes, we are late to the party on this, for reasons. Suffice it to say I currently work in an inherently risk averse industry.

Anyway, one part of the framework is responsible for getting reports from SSRS.

The way it did this is to use a wrapper class around a SOAP client generated from good old ReportService2005.asmx?wsdl, using our faithful friend svcutil.exe. The wrapper class used some TaskCompletionSource magic on the events in the client to make client.LoadReportAsync and the other *Async methods actually async, as the generated client was not truly async.

Fast forward to the modern times, and we need to upgrade it. How do we do that?

Obviously, Microsoft are a step ahead: svcutil has a dotnet version - dotnet-svcutil. We can install it and get going:

dotnet tool install --global dotnet-svcutil

Once installed, we can call it against the endpoint:

# Make sure you call this command in the root of the project where the service should go
dotnet-svcutil http://server/ReportServer/ReportService2005.asmx?wsdl

In our wrapper class, the initialisation of the client has to change slightly, because the generated client is different to the original svcutil implementation. Looking at the diff between the two files, it’s because the newer version of the client uses more modern .net functionality.

The wrapper class constructor has to be changed slightly:

public Wrapper(string url, NetworkCredential credentials)
{
    var binding = new BasicHttpBinding(BasicHttpSecurityMode.TransportCredentialOnly);
    binding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Ntlm;
    binding.MaxReceivedMessageSize = 10485760; // this is a 10mb limit
    var address = new EndpointAddress(url);

    _client = new ReportExecutionServiceSoapClient(binding, address);
    _client.ClientCredentials.Windows.AllowedImpersonationLevel = TokenImpersonationLevel.Impersonation;
    _client.ClientCredentials.Windows.ClientCredential = credentials;
}

Then, the code which actually generates the report can be updated to remove all of the TaskCompletionSource, which actually simplifies it a great deal:

public async Task<byte[]> RenderReport(string reportPath, string reportFormat, ParameterValue[] parameterValues)
{
    await _client.LoadReportAsync(null, reportPath, null);
    await _client.SetExecutionParametersAsync(null, null, parameterValues, "en-gb");
    var deviceInfo = @"<DeviceInfo><Toolbar>False</Toolbar></DeviceInfo>";
    var request = new RenderRequest(null, null, reportFormat, deviceInfo);
    var response = await _client.RenderAsync(request);
    return response.Result;
}

You can then do whatever you like with the byte[], like return it in an IActionResult or load it into a MemoryStream and write it to disk as the file.

Much of the detail of this post is sourced from various places around the web, but I’ve forgotten all of the places I gleaned the information from.

2022-01-11 00:00

2021-10-26

Recently we realised that we had quite a few applications being deployed through Octopus Deploy, and that we had a number of Environments, and a number of Channels, and that managing the ports being used in Dev/QA/UAT across different servers/channels was becoming… problematic.

When looking at this problem, it’s immediately clear that you need some way of dynamically allocating a port number on each deployment. This blog post from Paul Stovell shows the way, using a custom Powershell build step.

As we’d lost track of what sites were using what ports, and we also have ad-hoc websites in IIS that aren’t managed by Octopus Deploy, we thought that asking IIS “Hey, what ports are the sites you know about using?” might be a way forward. We also had the additional requirement that some of our servers might have arbitrary services using a port, so we could end up choosing a port that was already being used by a non-IIS application/website.

Researching the first situation, it’s quickly apparent that you can do this in Powershell, using the Webadministration module. Based on the answers to this question on Stackoverflow, we came up with this:

Import-Module Webadministration

function Get-IIS-Used-Ports()
{
    $Websites = Get-ChildItem IIS:\Sites

    $ports = foreach($Site in $Websites)
    {
        $Binding = $Site.bindings
        [string]$BindingInfo = $Binding.Collection
        [string]$Port = $BindingInfo.SubString($BindingInfo.IndexOf(":")+1,$BindingInfo.LastIndexOf(":")-$BindingInfo.IndexOf(":")-1)

        $Port -as [int]
    }

    return $ports
}

To get the list of ports on a machine that are not being used is also fairly straightforward in Powershell:

function Get-Free-Ports()
{
    $availablePorts = @(49000..65000)
    $usedPorts = @(Get-NetTCPConnection | Select -ExpandProperty LocalPort | Sort -Descending | Where { $_ -ge 49000})

    $unusedPorts = foreach($possiblePort in $availablePorts)
    {
        $unused = $possiblePort -notin $usedPorts
        if($unused)
        {
            $possiblePort
        }
    }

    return $unusedPorts
}

With those two functions in hand, you can work out what free ports are available to be used as the ‘next port’ on a server. It’s worth pointing out that if a site in IIS is stopped, then IIS won’t allow that port to be used in another website (in IIS), but the port also doesn’t show up as a used port in netstat -a, which is kind of what Get-NetTCPConnection does.

function Get-Next-Port()
{
    $iisUsedPorts = Get-IIS-Used-Ports
    $freePorts = Get-Free-Ports

    $port = $freePorts | Where-Object { $iisUsedPorts -notcontains $_} | Sort-Object | Select-Object -First 1

    Set-OctopusVariable -Name "Port" -Value "$port"
}

Then you just have to call it at the end of the script:

Get-Next-Port

You’d also want to have various Write-Host or other logging messages so that you get some useful output in the build step when you’re running it.

2021-10-26 00:00

2021-05-06

If you found this because you have a build server which is ‘offline’, without any external internet access because of reasons, and you can’t get your build to work because dotnet fails to restore the tool you require for your build process because of said lack of external internet access, then this is for you.

In hindsight, this may be obvious for most people, but it wasn’t for me, so here it is.

In this situation, you just need to shy away from local tools completely because, as yet, I’ve been unable to find any way of telling dotnet not to try to restore them, and they fail every build.

Instead, I’ve installed the tool(s) as a global tool, in a specific folder, e.g. C:\dotnet-tools, which I’ve then added to the system path on the server. You may need to restart the build server for it to pick up the changes to the environment variable.

One challenge that remains is how to ensure the dotnet tools are consistent on both the developer machine, and the build server. I leave that as an exercise for the reader.

2021-05-06 00:00

2021-04-01

I’m leaving this here so I can find it again easily.

We had a problem updating the Visual Studio 2019 Build Tools on a server, after updating an already existing offline layout.

I won’t go into that here, because it’s covered extensively on Microsoft’s Documentation website.

The installation kept failing, even when using --noweb. It turns out that when your server is completely cut off from the internet, as was the case here, you also need to pass --noUpdateInstaller.

This is because (so it would seem) even though --noweb correctly tells the installer to use the offline cache, it doesn’t prevent the installer from trying to update itself, which will obviously fail in a totally disconnected environment.

2021-04-01 00:00

2021-01-03

Since a technical breakdown of how Betsy does texture compression was posted, I wanted to lay out how the compressors in Convection Texture Tools (CVTT) work, as well as provide some context of what CVTT's objectives are in the first place to explain some of the technical decisions.

First off, while I am very happy with how CVTT has turned out, and while it's definitely a production-quality texture compressor, providing the best compressor possible for a production environment has not been its primary goal. Its primary goal is to experiment with compression techniques to improve the state of the art, particularly finding inexpensive ways to hit high quality targets.

A common theme that wound up manifesting in most of CVTT's design is that encoding decisions are either guided by informed decisions, i.e. models that relate to the problem being solved, or are exhaustive.  Very little of it is done by random or random-like searching. Much of what CVTT exists to experiment with is figuring out techniques which amount to making those informed decisions.

CVTT's ParallelMath module, and choice of C++

While there's some coincidence in CVTT having a similar philosophy to Intel's ISPC compressor, the reason for CVTT's SPMD-style design was actually that it was built as a port of the skeleton of DirectXTex's HLSL BC7 compressor.

I chose to use C++ instead of ISPC for three main reasons:
  • It was easier to develop it in Visual Studio.
  • It was easier to do operations that didn't parallelize well.  This turned out to matter with the ETC compressor in particular.
  • I don't trust in ISPC's longevity, in particular I think it will be obsolete as soon as someone makes something that can target both CPU and GPU, like either a new language that can cross-compile, or SPIR-V-on-CPU.

Anyway, CVTT's ParallelMath module is kind of the foundation that everything else is built on.  Much of its design is motivated by SIMD instruction set quirks, and a desire to maintain compatibility with older instruction sets like SSE2 without sacrificing too much.

Part of that compatibility effort is that most of CVTT's ops use a UInt15 type.  The reason for UInt15 is to handle architectures (like SSE2!) that don't support unsigned compares, min, or max, which means performing those operations on a 16-bit number requires flipping the high bit on both operands.  For any number where we know the high bit is zero for both operands, that flip is unnecessary - and a huge number of operations in CVTT fit in 15 bits.
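
As a concrete illustration of that trick (using raw SSE2 intrinsics rather than CVTT's ParallelMath wrappers), an unsigned 16-bit max can be built from the signed max by flipping the high bit, and the flip disappears entirely when both operands are known to be 15-bit:

#include <emmintrin.h>

// Unsigned 16-bit max on SSE2: flip the high bit of both operands, take the
// signed max, then flip the high bit of the result back.
static inline __m128i Max_UInt16_SSE2(__m128i a, __m128i b)
{
    const __m128i highBit = _mm_set1_epi16(static_cast<short>(0x8000));
    const __m128i flippedMax = _mm_max_epi16(_mm_xor_si128(a, highBit),
                                             _mm_xor_si128(b, highBit));
    return _mm_xor_si128(flippedMax, highBit);
}

// If both operands are known to fit in 15 bits (high bit clear), signed and
// unsigned ordering agree, so the signed max can be used directly - this is
// the case the UInt15 type exists to exploit.
static inline __m128i Max_UInt15_SSE2(__m128i a, __m128i b)
{
    return _mm_max_epi16(a, b);
}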

The compare flag types are basically vector booleans, where either all bits are 1 or all bits are 0 for a given lane - There's one type for 16-bit ints, and one for 32-bit floats, and they have to be converted since they're different widths.  Those are combined with several utility functions, some of which, like SelectOrZero and NotConditionalSet, can elide a few operations.

The RoundForScope type is a nifty dual-use piece of code.  SSE rounding modes are determined by the CSR register, not per-op, so RoundForScope when targeting SSE will set the CSR, and then reset it in its destructor.  For other architectures, including the scalar target, the TYPE of the RoundForScope passed in is what determines the operation, so the same code works whether the rounding is per-op or per-scope.

While the ParallelMath architecture has been very resistant to bugs for the most part, where it has run into bugs, they've mostly been due to improper use of AnySet or AllSet - Cases where parallel code can behave improperly because lanes where the condition should exclude it are still executing, and need to be manually filtered out using conditionals.

BC1-7 common themes

All of the desktop formats that CVTT supports are based on interpolation.  S3TC RGB (a.k.a. DXT1) for instance defines two colors (called endpoints), then defines all pixels as being either one of those two colors, or a color that is part-way between those two colors, for each 4x4 block.  Most of the encoding effort is spent on determining what the two colors should be.
 
You can read about a lot of this on Simon Brown's post outlining the compression techniques used by Squish, one of the pioneering S3TC compressors, which in turn is the basis for the algorithm used by CVTT's BC1 compressor.

Principal component analysis

Principal component analysis determines, based on a set of points, what the main axis is that the colors are aligned along.  This gives us a very good guess of what the initial colors should be, simply using the colors that are the furthest along that axis, but it isn't necessarily ideal.
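
As an illustration of the idea (not necessarily CVTT's exact implementation), the principal axis of a block can be found by accumulating a covariance matrix for the pixel colors and running a few rounds of power iteration on it:

#include <cmath>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Finds the axis the block's colors are most spread along: accumulate the
// covariance matrix of the pixels around their mean, then run a few rounds of
// power iteration to converge on its dominant eigenvector.
Vec3 PrincipalAxis(const Vec3 pixels[16])
{
    Vec3 mean = { 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < 16; i++)
    {
        mean.x += pixels[i].x;
        mean.y += pixels[i].y;
        mean.z += pixels[i].z;
    }
    mean.x /= 16.0f; mean.y /= 16.0f; mean.z /= 16.0f;

    float cov[3][3] = {};
    for (size_t i = 0; i < 16; i++)
    {
        const float d[3] = { pixels[i].x - mean.x, pixels[i].y - mean.y, pixels[i].z - mean.z };
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                cov[r][c] += d[r] * d[c];
    }

    // Start from the grayscale direction and refine.
    float axis[3] = { 0.577f, 0.577f, 0.577f };
    for (int iteration = 0; iteration < 8; iteration++)
    {
        float next[3] = {};
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                next[r] += cov[r][c] * axis[c];

        const float len = std::sqrt(next[0]*next[0] + next[1]*next[1] + next[2]*next[2]);
        if (len < 1e-10f)
            break;  // Degenerate block (all pixels nearly identical); keep the previous axis.

        for (int r = 0; r < 3; r++)
            axis[r] = next[r] / len;
    }

    return Vec3{ axis[0], axis[1], axis[2] };
}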

Endpoint refinement

In BC1 for instance, each color is assigned to one of four possible values along the color line.  CVTT solves for that by just finding the color with the shortest distance to each pixel's color.  If the color assignments are known, then it's possible to determine what the color values are that will minimize the sum of the square distance of that mapping.  One round of refinement usually yields slightly better results and is pretty cheap to check.  Two rounds will sometimes yield a slightly better result.
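
A minimal sketch of what one refinement round looks like for a single channel, assuming the interpolation weight assigned to each pixel is already known, is a 2x2 least-squares solve:

#include <cstddef>

// One refinement round for a single channel: values[i] is the pixel's channel
// value, weights[i] is its interpolation weight toward endpoint 1 (for BC1,
// one of 0, 1/3, 2/3, or 1).  Minimizing the total squared error leads to a
// 2x2 system of normal equations.  Returns false if the system is degenerate
// (e.g. every pixel was assigned the same weight).
bool RefineEndpoints(const float values[16], const float weights[16],
                     float &endpoint0, float &endpoint1)
{
    float a = 0.0f, b = 0.0f, c = 0.0f, d0 = 0.0f, d1 = 0.0f;
    for (size_t i = 0; i < 16; i++)
    {
        const float w = weights[i];
        const float iw = 1.0f - w;
        a += iw * iw;
        b += iw * w;
        c += w * w;
        d0 += iw * values[i];
        d1 += w * values[i];
    }

    const float det = a * c - b * b;
    if (det == 0.0f)
        return false;

    // Solve [a b; b c] * [endpoint0; endpoint1] = [d0; d1]
    endpoint0 = (c * d0 - b * d1) / det;
    endpoint1 = (a * d1 - b * d0) / det;
    return true;
}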

Extrapolation

One problem with using the farthest extents of the principal axis as the color is that the color precision is reduced (quantized) by the format.  In BC1-5, the color is reduced to a 16-bit color with 5 bits of red, 6 bits of green, and 5 bits of blue.  It's frequently possible to achieve a more accurate match by using colors outside of the range so that the interpolated colors are closer to the actual image colors - This sacrifices some of the color range.

CVTT internally refers to these as "tweak factors" or similar, since what they functionally do is make adjustments to the color mapping to try finding a better result.

The number of extrapolation possibilities increases quadratically with the number of indexes.  CVTT will only ever try four possibilities: No insets, one inset on one end (which is two possibilities, one for each end), and one inset on both ends.

BC1 (DXT1)

CVTT's BC1 encoder uses the cluster fit technique developed by Simon Brown for Squish.  It uses the principal axis to determine an ordering of each of the 16 pixels along the color line, and then rather than computing the endpoints from the start and end points, it computes them by trying each possible count of pixels assigned to each endpoint that maintains the original order and still totals 16.  That's a fairly large set of possibilities with a lot of useless entries, but BC1 is fairly tight on bits, so it does take a lot of searching to maximize quality out of it.

BC2 (DXT3)

BC2 uses BC1 for RGB and 4bpp alpha.  There's not much to say here, since it just involves reducing the alpha precision.

BC3 (DXT5)

This one is actually a bit interesting.  DXT5 uses indexed alpha, where it defines two 8-bit alpha endpoints and a 3-bit interpolator per pixel, but it also has a mode where 2 of the interpolators are reserved 0 and 255 and only 6 are endpoint-to-endpoint values.  Most encoders will just use the min/max alpha.  CVTT will also try extrapolated endpoints, and will try for the second mode by assuming that any pixels within 1/10th of the endpoint range of 0 or 255 would be assigned to the reserved endpoints.  The reason for the 1/10th range is that the rounding range of the 6-value endpoints is 1/10th of the range, and it assumes that for any case where the endpoints would include values in that range, it would just use the 8-index mode and there'd be 6 indexes between them anyway.

BC4 and BC5

These two modes are functionally the same as BC3's alpha encoding, with the exception that the signed modes are offset by 128.  CVTT handles signed modes by pre-offsetting them and undoing the offset.

BC7

BC7 has 8 modes of operation and is the most complicated format to encode, but it's actually not terribly more complicated than BC1.  All of the modes do one of two things: They encode 1 to 3 pairs of endpoints that are assigned to specific groupings of pixels for all color channels, referred to as partitions, or they encode one set of endpoints for the entire block, except for one endpoint, which is encoded separately.
 
Here are the possible partitions:

Credit: Jon Rocatis from this post.

Another feature of BC7 is parity bits, where the low bit of each endpoint is specified by a single bit.  Parity bits (P-bits) exist as a way of getting a bit more endpoint precision when there aren't as many available bits as there are endpoint channels, without causing the channels to have different numbers of bits - something that caused problems with gray discoloration in BC1-3.
 
CVTT will by default just try every partition, and every P-bit combination.

Based on some follow-up work that I'm still experimenting with, a good quality trade-off would be to only check certain subsets.  Among the BC7 subsets, the vast majority of selected subsets fall into only about 16 of the possible ones, and omitting the rest causes very little quality loss.  I'll publish more about that when my next experiment is further along.

Weight-by-alpha issues

One weakness that CVTT's encoder has vs. Monte Carlo-style encoders is that principal component analysis does not work well for modes in BC7 where the alpha and some of the color channels are interpolated using the same indexes.  This is never a problem with BC2 or BC3, which can avoid that problem by calculating alpha first and then pre-weighting the RGB channels.

I haven't committed a solution to that yet, and while CVTT gets pretty good quality anyway, it's one area where it underperforms other compressors on BC7 by a noticeable amount.

Shape re-use

The groupings of pixels in BC7 are called "shapes."

One optimization that CVTT does is partially reuse calculations for identical shapes.  That is, if you look at the 3 subset grouping above, you can notice that many of the pixel groups are the same as some pixel groups in the 2 subset grouping.

To take advantage of that fact, CVTT performs principal component analysis on all unique shapes before performing further steps.  This is a bit of a tradeoff though: It's only an optimization if those shapes are actually used, so it's not ideal for if CVTT were to reduce the number of subsets that it checks.

Weight reconstruction

One important aspect of BC7 is that, unlike BC1-3, it specifies the precision that interpolation is to be done at, as well as the weight values for each index.  However, doing a table lookup for each value in a vector of parallelized index values is a bit slow.  CVTT avoids this by reconstructing the weights arithmetically:

MUInt15 weight = ParallelMath::LosslessCast<MUInt15>::Cast(ParallelMath::RightShift(ParallelMath::CompactMultiply(g_weightReciprocals[m_range], index) + 256, 9));

Coincidentally, doing this just barely fits into 16 bits of precision accurately.
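
As a scalar illustration of the same reconstruction, with reciprocal constants I picked to reproduce the standard BC7 weight tables (0..64) - CVTT's actual g_weightReciprocals values may differ:

#include <cassert>
#include <cstdint>

// Indexed by the bit count of the index (2, 3, or 4 bits).  These constants
// are ones chosen to reproduce the spec's weight tables; they are not
// necessarily the exact values in g_weightReciprocals.
static const uint16_t kWeightReciprocals[5] = { 0, 0, 10923, 4681, 2184 };

static uint16_t ReconstructWeight(unsigned indexBits, uint16_t index)
{
    return static_cast<uint16_t>((kWeightReciprocals[indexBits] * index + 256) >> 9);
}

int main()
{
    // Spot-check against the BC7 2-bit interpolation weight table {0, 21, 43, 64}.
    assert(ReconstructWeight(2, 0) == 0);
    assert(ReconstructWeight(2, 1) == 21);
    assert(ReconstructWeight(2, 2) == 43);
    assert(ReconstructWeight(2, 3) == 64);
    return 0;
}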

BC6H

BC6H is very similar to BC7, except it's 16-bit floating point.   The floating point part is achieved by encoding the endpoints as a high-precision base and low-precision difference from the base.  Some of the modes that it supports are partitioned similar to BC7, and it also has an extremely complicated storage format where the endpoint bits are located somewhat arbitrarily.
 
There's a reason that BC6H is the one mode that's flagged as "experimental."  Unlike all other modes, BC6H is floating point, but has a very unique quirk: When BC6H interpolates between endpoints, it's done as if the endpoint values are integers, even though they will be bit-cast into floating point values.

Doing that severely complicates making a BC6H encoder, because part of the floating point values are the exponent, meaning that the values are roughly logarithmic.  Unless they're the same, they don't even correlate proportionally with each other, so color values may shift erratically, and principal component analysis doesn't really work.

CVTT tries to do its usual tricks in spite of this, and it sort of works, but it's an area where CVTT's general approach is ill-suited.

ETC1

ETC1 is based on cluster fit, via what's basically a mathematical reformulation of it.

Basically, ETC1 is based on the idea that the human visual system sees color detail less than intensity detail, so it encodes each 4x4 block as a pair of either 4x2 or 2x4 blocks which each encode a color, an offset table ID, and a per-pixel index into the offset table.  The offsets are added to ALL color channels, making them grayscale offsets, essentially.
 

Unique cumulative offsets

What's distinct about ETC compared to the desktop formats, as far as using cluster fit is concerned, is two things: First, the primary axis is always known.  Second, the offset tables are symmetrical, where 2 of the entries are the negation of the other two.
 
The optimal color for a block, not accounting for clamping, will be the average color of the block, offset by 1/16th of the offset assigned to each pixel.  Since half of the offsets negate each other, every pair of pixels assigned to opposing offsets cancel out, causing no change.  This drastically reduces the search space, since many of the combinations will produce identical colors.  Another thing that reduces the search space is that many of the colors will be duplicates after the precision reduction from quantization.  Yet another thing is that in the first mode, the offsets are +2 and +4, which have a common factor, causing many of the possible offsets to overlap, cancelling out even more combinations.

So, CVTT's ETC1 compressor simply evaluates each possible offset from the average color that results in a unique color post-quantization, and picks the best one.  Differential mode works by selecting the best VALID combination of colors, first by checking if the best pair of colors is valid, and failing that, checking all evaluated color combinations.
 

ETC2

ETC2 has 3 additional selectable modes on top of the ETC1 modes.  One, called T mode, contains 4 colors: Color0, Color1, Color1+offset, and Color2+offset.  Another, called H mode, contains Color0+offset, Color0-offset, Color1+offset, and Color1-offset.  The final mode, called planar mode, contains what is essentially a base color and a per-axis offset gradient.

T and H mode

T and H mode both exist to better handle blocks where, within the 2x4 or 4x2 blocks, the colors do not align well along the grayscale axis.  CVTT's T/H mode encoding basically works with that assumption by trying to find where it thinks the poorly-aligned color axes might be.  First, it generates some chrominance coordinates, which are basically 2D coordinates corresponding to the pixel colors projected on to the grayscale plane.  Then, it performs principal component analysis to find the primary chrominance axis.  Then, it splits the block based on which side of the half-way point each pixel is to form two groupings that are referred to internally as "sectors."

From the sectors, it performs a similar process of inspecting each possible offset count from the average to determine the best fit - But it will also record if any colors NOT assigned to the sector can still use one of the results that it computed, which are used later to determine the actual optimal pairing of the results that it computed.

One case that this may not handle optimally is when the pixels in a block ARE fairly well-aligned along the grayscale axis, but the ability of T/H colors to be relatively arbitrary would be an advantage.
 

ETC2 with punch-through, "virtual T mode"

ETC2 supports punchthrough transparency by mapping one of the T or H indexes to transparent.  Both of these are resolved in the same way as T mode.  When encoding punch-through the color values for T mode are Color0, Color1+offset, transparent, Color1-offset, and in H mode, they are Color0+offset, Color0-offset, transparent, and Color1.

Essentially, both have a single color, and another color +/- an offset; there are only 2 differences: First, the isolated color in H mode is still offset, so the offset has to be undone.  If that quantizes to a more accurate value, then H mode is better.  Second, the H mode color may not be valid - H mode encodes the table index low bit based on the order of the colors, but unlike when encoding opaque, reordering the colors will affect which color has the isolated value and which one has the pair of values.

H mode as T mode encoding

One special case to handle with testing H mode is the possibility that the optimal color is the same.  This should be avoidable by evaluating T mode first, but the code handles it properly just to be safe.  Because H mode encodes the table low bit based on a comparison of the endpoints, it may not be possible to select the correct table if the endpoints are the same.  In that case, CVTT uses a fallback where it encodes the block as T mode instead, mapping everything to the color with the pair of offsets.

Planar mode

Planar mode involves finding an optimal combination of 3 values that determine each channel value as O + (H*x) + (V*y).

How planar mode actually works is by just finding the least-squares fit for each of those three values at once.
 
Where error=(reconstructedValue-actualValue)², we want to solve for d(error)/dO=0, d(error)/dH=0, and d(error)/dV=0

All three of these cases resolve to quadratic formulas, so the entire thing is just converted to a system of linear equations and solved.  The proof and steps are in the code.
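
As an illustration of what that system looks like for one channel - ignoring ETC2's exact corner-sample parameterization and the quantization of O, H, and V - a least-squares plane fit can be sketched like this:

// Solves the symmetric 3x3 normal-equation system via Cramer's rule.
static bool Solve3x3(const float m[3][3], const float rhs[3], float out[3])
{
    const float det =
          m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    if (det == 0.0f)
        return false;

    for (int col = 0; col < 3; col++)
    {
        float t[3][3];
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                t[r][c] = (c == col) ? rhs[r] : m[r][c];

        out[col] = (  t[0][0] * (t[1][1] * t[2][2] - t[1][2] * t[2][1])
                    - t[0][1] * (t[1][0] * t[2][2] - t[1][2] * t[2][0])
                    + t[0][2] * (t[1][0] * t[2][1] - t[1][1] * t[2][0])) / det;
    }
    return true;
}

// Fits one channel of a 4x4 block with value(x, y) = O + H*x + V*y by least
// squares: accumulate the normal equations over all 16 pixels and solve.
bool FitPlanar(const float values[4][4], float &o, float &h, float &v)
{
    float m[3][3] = {};
    float rhs[3] = {};
    for (int y = 0; y < 4; y++)
    {
        for (int x = 0; x < 4; x++)
        {
            const float basis[3] = { 1.0f, static_cast<float>(x), static_cast<float>(y) };
            for (int r = 0; r < 3; r++)
            {
                for (int c = 0; c < 3; c++)
                    m[r][c] += basis[r] * basis[c];
                rhs[r] += basis[r] * values[y][x];
            }
        }
    }

    float result[3];
    if (!Solve3x3(m, rhs, result))
        return false;

    o = result[0];
    h = result[1];
    v = result[2];
    return true;
}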

ETC2 alpha and EAC

Both of these "grayscale" modes are more complicated because they have 3-bit indexes, multiple lookup tables, and an amplitude multiplier.

CVTT tries a limited set of possibilities based on alpha insets.  It tries 10 alpha ranges, which correspond to all ranges where the index inset of each endpoint is within 1 of the inset of the other endpoint.  So, for example, given 8 alpha offsets numbered 0-7, it will try these pairs:
  • 0,7
  • 0,6
  • 1,7
  • 1,6
  • 1,5
  • 2,6
  • 2,5
  • 2,4
  • 3,5
  • 3,4
Once the range is selected, 2 multipliers are checked: The highest value that can be multiplied without exceeding the actual alpha range, and the smallest number that can be multiplied while exceeding it.

The best result of these possibilities is selected.

Possible improvements and areas of interest

BC6H is by far the most improvable aspect.  Traditional PCA doesn't work well because of the logarithmic interpolation.  Sum-of-square-difference in floating point pseudo-logarithmic space performs much worse than in gamma space and is prone to sparkly artifacts.

ETC1 cumulative offset deduplication assumes that each pixel is equally important, which doesn't hold when using weight-by-alpha.

ETC2 T/H mode encoding could try all 15 possible sector assignments (based on the 16-pixel ordering along the chroma axis) instead of one.  I did try finding the grouping that minimized the total square distance to the group averages instead of using the centroid as the split point, but that actually had no effect... they might be mathematically equivalent?  Not sure.

A lot of these concepts don't translate well to ASTC.  CVTT's approaches largely assume that it's practical to traverse the entire search space, but ASTC is highly configurable, so its search space has many axes, essentially.  The fact that partitioning is done AFTER grid interpolation in particular is also a big headache that would require its own novel solutions.

Reduction of the search space is one of CVTT's biggest sore spots.  It performs excellently at high quality targets, but is relatively slow at lower quality targets.  I justified this because typically developers want to maximize quality when import is a one-time operation done offline, and CVTT is fast enough for the most part, but it probably wouldn't be suitable for real-time operation.

by OneEightHundred (noreply@blogger.com) at 2021-01-03 23:21

2020-10-20

 

The plan to post a play-by-play for dev kind of fell apart as I preferred to focus on just doing the work, but the Windows port was a success.

If you want some highlights:

  • I replaced the internal resource format with ZIP archives to make it easier to create custom resource archives.
  • PICT support was dropped in favor of BMP, which is way easier to load.  The gpr2gpa tool handles importing.
  • Ditto with dropping "snd " resource support in favor of WAV.
  • Some resources were refactored to JSON so they could be patched, mostly dialogs.
  • Massive internal API refactoring, especially refactoring the QuickDraw routines to use the new DrawSurface API, which doesn't have an active "port" but instead uses method calls directly to the draw surface.
  • A bunch of work to allow resolution changes while in-game.  The game will load visible dynamic objects from neighboring rooms in a resolution-dependent way, so a lot of work went in to unloading and reloading those objects.

The SDL variant ("AerofoilSDL") is also basically done, with a new OpenGL ES 2 rendering backend and SDL sound backend for improved portability.  The lead version on Windows still uses D3D11 and XAudio2 though.

Unfortunately, I'm still looking for someone to assist with the macOS port, which is made more difficult by the fact that Apple discontinued OpenGL, so I can't really provide a working renderer for it any more.  (Aerofoil's renderer is actually slightly complicated, mostly due to postprocessing.)

Goin' mobile

In the meantime, the Android port is under way!  The game is fully playable so far, most of the work has to do with redoing the UI for touchscreens.  The in-game controls use corner taps for rubber bands and battery/helium, but it's a bit awkward if you're trying to use the battery while moving left due to the taps being on the same side of the screen.

Most of the cases where you NEED to use the battery, you're facing right, so this was kind of a tactical decision, but there are some screens (like "Grease is on TV") where it'd be really nice if it was more usable facing left.

I'm also adding a "source export" feature: The source code package will be bundled with the app, and you can just use the source export feature to save the source code to your documents directory.  That is, once I figure out how to save to the documents directory, which is apparently very complicated...

Anyway, I'm working on getting this into the Google Play Store too.  There might be some APKs posted to GitHub as pre-releases, but there may (if I can figure out how it works) be some Internal Testing releases via GPS.  If you want to opt in to the GPS tests, shoot an e-mail to codedeposit.gps@gmail.com

Will there be an iOS port?

Maybe, but there are two obstacles:

The game is GPL-licensed and there have reportedly been problems with Apple removing GPL-licensed apps from the App Store, and it may not be possible to comply with it.  I've heard there is now a way to push apps to your personal device via Xcode with only an Apple ID, which might make satisfying some of the requirements easier, but I don't know.

Second, as with the macOS version, someone would need to do the port.  I don't have a Mac, so I don't have Xcode, so I can't do it.


by OneEightHundred (noreply@blogger.com) at 2020-10-20 11:09

2019-11-23

Most of the images in Glider PRO's resources are in PICT format.

The PICT format is basically a bunch of serialized QuickDraw opcodes and can contain a combination of both image and vector data.

The first goal is to get all of the known resources to parse.  The good news is that none of the resources in the Glider PRO application resources or any of the houses contain vector data, so it's 100% bitmaps.  The bad news is that the bitmaps have quite a bit of variation in their internal structure, and sometimes they don't match the display format.

Several images contain multiple images spliced together within the image data, and at least one image is 16-bit color even though the rest of the images are indexed color.  One is 4-bit indexed color instead of 8-bit.  Many of them are 1-bit, and the bit scheme for 1-bit images is also inverted compared to the usual expectations (i.e. 1 is black, 0 is white).

Adding to these complications, while it looks like all of the images are using the standard system palette, there's no guarantee that they will - It's actually even possible to make a PICT image that combines multiple images with different color palettes, because the palette is defined per picture op, not per image file.

There's also a fun quirk where the PICT image frame doesn't necessarily have 0,0 as the top-left corner.

I think the best solution to this will simply be to change the display type to 32-bit and unpack PICT images to a single raster bitmap on load.  The game appears to use QuickDraw abstractions for all of its draw operations, so while it presumes that the color depth should be 8-bit, I don't think there's anything that will prevent GlidePort from using 32-bit instead.

In the meantime, I've been able to convert all of the resources in the open source release to PNG format as a test, so it should be possible to now adapt that to a runtime PICT loader.

by OneEightHundred (noreply@blogger.com) at 2019-11-23 20:43

2019-10-10

Recently found out that Classic Mac game Glider PRO's source code was released, so I'm starting a project called GlidePort to bring it to Windows, ideally in as faithful of a reproduction as possible and using the original data files.  Some additions like gamepad support may come at a later time if this stays on track.

While this is a chance to restore one of the few iconic Mac-specific games of the era, it's also a chance to explore a lot of the technology of that era, so I'll be doing some dev diaries about the process.

Porting Glider has a number of technical challenges: It's very much coded for the Mac platform, which has a lot of peculiarities compared to POSIX and Windows.  The preferred language for Mac OS was originally Pascal, so the C standard library is often mostly or entirely unused, and the Macintosh Toolbox (the operating system API)  has differences like preferring length-prefixed strings instead of C-style null terminated strings.

Data is in big endian format, as it was originally made for Motorola 68k and PowerPC CPUs.  Data files are split into two "forks," one as a flat data stream and the other as a resource database that the toolbox provides parsing facilities for.  In Mac development, parsing individual data elements was generally the preferred style vs. reading in whole structures, which leads to data formats often having variable-length strings and no padding for character buffer space or alignment.

Rendering is done using QuickDraw, the system-provided multimedia infrastructure.  Most images use the system-native PICT format, a vector format that is basically a list of QuickDraw commands.

At minimum, this'll require parsing a lot of Mac native resource formats, some Mac interchange formats (i.e. BinHex 4), reimplementation of a subset of QuickDraw and QuickTime, substitution of copyrighted fonts, and switch-out of numerous Mac-specific compiler extensions like dword literals and Pascal string escapes.

The plan for now is to implement the original UI in Qt, but I might rebuild the UI instead if that turns out to be impractical.

by OneEightHundred (noreply@blogger.com) at 2019-10-10 02:03

2019-09-06

When adding ETC support to Convection Texture Tools, I decided to try adapting the cluster fit algorithm used for desktop formats to ETC.

Cluster fit works by sorting the pixels into an order based on a color axis, and then repeatedly evaluating each possible combination of counts of the number of pixels assigned to each index.  It does so by taking the pixels and applying a least-squares fit to produce the endpoint line.

For ETC, this is simplified in a few ways: The axis is always 1,1,1, so the step of picking a good axis is unnecessary.  There is only one base color and the offsets are determined by the table index, so the clustering step would only solve the base color.

Assuming that you know what the offsets for each pixel are, the least squares fit amounts to simply subtracting the offset from each of the input pixels and averaging the result.

For a 4x2 block, there are 165 possible cluster configurations, but it turns out that some of those are redundant, given certain assumptions.  The base color is derived from the formula ((color1-offset1)+(color2-offset2)+...)/8, but since the adds are commutative, that's identical to ((color1+color2+...)-(offset1+offset2+...))/8

The first half of that is the total of the colors, which is constant.  The second is the total of the offsets.

Fortunately, not all of the possible combinations produce unique offsets.  Some of them cancel out, since adding 1 to or subtracting 1 from the count of the offsets that are negatives of each other produces no change.  In an example case, the count tuples (5,0,1,2) and (3,2,3,0) are the same, since 5*-L + 0*-S + 1*S + 2*L = 3*-L + 2*-S + 3*S + 0*L.

For most of the tables, this results in only 81 possible offset combinations.  For the first table, the large value is divisible by the small value, causing even more cancellations, and only 57 possible offset combinations.

Finally, most of the base colors produced by the offset combinations are not unique after quantization: Differential mode only has 5-bit color resolution, and individual mode only has 4-bit resolution, so after quantization, many of the results get mapped to the same color.  Deduplicating them is also inexpensive: If the offsets are checked in ascending order, then once the candidate color progresses past the threshold where the result could map to a specific quantized color, it will never cross back below that threshold, so deduplication only needs to inspect the last appended quantized color.

Together, these reduce the candidate set of base colors to a fairly small number, creating a very optimal search space at low cost.
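
A rough sketch of that candidate enumeration, for a single 4x2 subblock and a single modifier table, might look like the following; it follows the description above rather than CVTT's actual code:

#include <algorithm>
#include <cstdint>
#include <vector>

struct Color { float r, g, b; };

// 5-bit quantization corresponds to differential mode; individual mode would use 4 bits.
static uint8_t Quantize5(float v)
{
    const int q = static_cast<int>(v * 31.0f / 255.0f + 0.5f);
    return static_cast<uint8_t>(std::min(31, std::max(0, q)));
}

// "table" holds the four signed modifiers of one offset table.  Enumerates the
// candidate base colors for a 4x2 subblock with the given average color.
std::vector<Color> EnumerateBaseCandidates(const Color &average, const int table[4])
{
    // All 165 ways of splitting 8 pixels across the 4 offsets, keeping only
    // the distinct cumulative offset totals (many combinations cancel out).
    std::vector<int> totals;
    for (int n0 = 0; n0 <= 8; n0++)
        for (int n1 = 0; n1 <= 8 - n0; n1++)
            for (int n2 = 0; n2 <= 8 - n0 - n1; n2++)
            {
                const int n3 = 8 - n0 - n1 - n2;
                totals.push_back(n0 * table[0] + n1 * table[1] + n2 * table[2] + n3 * table[3]);
            }
    std::sort(totals.begin(), totals.end());
    totals.erase(std::unique(totals.begin(), totals.end()), totals.end());

    // Visit the totals in ascending order so that duplicate quantized colors
    // are adjacent: only the last appended candidate needs to be checked.
    std::vector<Color> candidates;
    uint8_t lastQ[3] = { 255, 255, 255 };
    for (const int total : totals)
    {
        const Color c = { average.r - total / 8.0f,
                          average.g - total / 8.0f,
                          average.b - total / 8.0f };
        const uint8_t q[3] = { Quantize5(c.r), Quantize5(c.g), Quantize5(c.b) };
        if (!candidates.empty() && q[0] == lastQ[0] && q[1] == lastQ[1] && q[2] == lastQ[2])
            continue;
        lastQ[0] = q[0]; lastQ[1] = q[1]; lastQ[2] = q[2];
        candidates.push_back(c);
    }
    return candidates;
}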

There are a few circumstances where these assumptions don't hold:

One is when the clamping behavior comes into effect, particularly when a pixel channel's value is near 0 or 255.  In that case, this algorithm can't account for the fact that changing the value of the base color would have no effect on some of the offset colors.

One is when the pixels are not of equal importance, such as when using weight-by-alpha, which makes the offset additions non-commutative, but that only invalidates the cancellation part of the algorithm.  The color total can be pre-weighted, and the rest of the algorithm would have to rely on working more like cluster fit: Sort the colors along the 1,1,1 line and determine the weights for the pixels in that order, generate all 165 cluster combinations, and compute the weight totals for each one.  Sort them into ascending order, and then the rest of the algorithm should work.

One is when dealing with differential mode constraints, since not all base color pairs are legal.  There are some cases where a base color pair that is just barely illegal could be made legal by nudging the colors closer together, but in practice, this is rare: Usually, there is already a very similar individual mode color pair, or another differential mode pair that is only slightly worse.

In CVTT, I deal with differential mode by evaluating all of the possibilities and picking the best legal pair.  There's a shortcut case when the best base color for both blocks produces a legal differential mode pair, but this is admittedly a bit less than optimal: It picks the first evaluation in the case of a tie when searching for the best, but since blocks are evaluated starting with the largest combined negative offset, it's a bit more likely to pick colors far away from the base than colors close to the base, even though colors closer to the average tend to produce smaller offsets and are more likely to be legal, so this could be improved by making the tie-breaking function prefer smaller offsets.

In practice though, the differential mode search is not where most of the computation time is spent: Evaluating the actual base colors is.

As with the rest of CVTT's codecs, brute force is still key: The codec is designed to use 8-wide SSE2 16-bit math ops wherever possible to process 8 blocks at once, but this creates a number of challenges since sorting and list creation are not amenable to vectorization.  I solve this by careful insertion of scalar ops, and the entire differential mode part is scalar as well.  Fortunately, as stated, the parts that have to be scalar are not major contributors to the encoding time.


You can grab the stand-alone CVTT encoding kernels here: https://github.com/elasota/ConvectionKernels

by OneEightHundred (noreply@blogger.com) at 2019-09-06 00:47

2018-03-30

Convection Texture Tools is now roughly equal quality-wise with NVTT at compressing BC7 textures despite being about 140 times faster, making it one of the fastest and highest-quality BC7 compressors.

How this was accomplished turned out to be simpler than expected.  Recall that Squish became the gold standard of S3TC compressors by implementing a "cluster fit" algorithm that ordered all of the input colors on a line and tried every possible grouping of them to least-squares fit them.

Unfortunately, using this technique isn't practical in BC7 because the number of orderings has rather extreme scaling characteristics.  While 2-bit indices have a few hundred possible orderings, 4-bit indices have millions, most BC7 mode indices are 3 bits, and some have 4.

With that option gone, most BC7 compressors until now have tried to solve endpoints using various types of endpoint perturbation, which tends to require a lot of iterations.

Convection just uses 2 rounds of K-means clustering and a much simpler technique based on a guess about why Squish's cluster fit algorithm is actually useful: It can create endpoint mappings that don't use some of the terminal ends of the endpoint line, causing the endpoint to be extrapolated out, possibly to a point that loses less accuracy to quantization.

Convection just tries cutting off 1 index at each end, then 1 index at both ends.  That turned out to be enough to place it near the top of the quality benchmarks.

Now I just need to add color weighting and alpha weighting and it'll be time to move on to other formats.

by OneEightHundred (noreply@blogger.com) at 2018-03-30 05:26

2012-01-08

How do you generate the tangent vectors, which represent which way the texture axes on a textured triangle are facing?

Hitting up Google tends to produce articles like this one, or maybe even that exact one. I've seen others linked too, the basic formulae tend to be the same. Have you looked at what you're pasting into your code though? Have you noticed that you're using the T coordinates to calculate the S vector, and vice versa? Well, you can look at the underlying math, and you'll find that it's because that's what happens when you assume the normal, S vector, and T vectors form an orthonormal matrix and attempt to invert it, in a sense you're not really using the S and T vectors but rather vectors perpendicular to them.

But that's fine, right? I mean, this is an orthogonal matrix, and they are perpendicular to each other, right? Well, does your texture project on to the triangle with the texture axes at right angles to each other, like a grid?


... Not always? Well, you might have a problem then!

So, what's the real answer?

Well, what do we know? First, translating the vertex positions will not affect the axial directions. Second, scrolling the texture will not affect the axial directions.

So, for triangle (A,B,C), with coordinates (x,y,z,t), we can create a new triangle (LA,LB,LC) and the directions will be the same:

We also know that both axis directions are on the same plane as the points, so to resolve that, we can convert this into a local coordinate system and force one axis to zero.



Now we need triangle (Origin, PLB, PLC) in this local coordinate space. We know PLB[y] is zero since LB was used as the X axis.


Now we can solve this. Remember that PLB[y] is zero, so...
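
In notation consistent with the code below (writing S = (S_x, S_y) in the local plane coordinates, with LBT_s and LCT_s the s-coordinate deltas along the two edges), the constraints and their solution are:

$$S \cdot PLB = LBT_s, \qquad S \cdot PLC = LCT_s$$

$$S_x = \frac{LBT_s}{PLB_x}, \qquad S_y = \frac{LCT_s - S_x \, PLC_x}{PLC_y} \quad \text{(since } PLB_y = 0\text{)}$$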


Do this for both axes and you have your correct texture axis vectors, regardless of the texture projection. You can then multiply the results by your tangent-space normalmap, normalize the result, and have a proper world-space surface normal.

As always, the source code spoilers:

terVec3 lb = ti->points[1] - ti->points[0];
terVec3 lc = ti->points[2] - ti->points[0];
terVec2 lbt = ti->texCoords[1] - ti->texCoords[0];
terVec2 lct = ti->texCoords[2] - ti->texCoords[0];

// Generate local space for the triangle plane
terVec3 localX = lb.Normalize2();
terVec3 localZ = lb.Cross(lc).Normalize2();
terVec3 localY = localX.Cross(localZ).Normalize2();

// Determine X/Y vectors in local space
float plbx = lb.DotProduct(localX);
terVec2 plc = terVec2(lc.DotProduct(localX), lc.DotProduct(localY));

terVec2 tsvS, tsvT;

tsvS[0] = lbt[0] / plbx;
tsvS[1] = (lct[0] - tsvS[0]*plc[0]) / plc[1];
tsvT[0] = lbt[1] / plbx;
tsvT[1] = (lct[1] - tsvT[0]*plc[0]) / plc[1];

ti->svec = (localX*tsvS[0] + localY*tsvS[1]).Normalize2();
ti->tvec = (localX*tsvT[0] + localY*tsvT[1]).Normalize2();


There's an additional special case to be aware of: Mirroring.

Mirroring across an edge can cause wild changes in a vector's direction, possibly even degenerating it. There isn't a clear-cut solution to these, but you can work around the problem by snapping the vector to the normal, effectively cancelling it out on the mirroring edge.

Personally, I check the angle between the two vectors, and if they're more than 90 degrees apart, I cancel them, otherwise I merge them.

by OneEightHundred (noreply@blogger.com) at 2012-01-08 00:23

2011-12-07

Valve's self-shadowing radiosity normal maps concept can be used with spherical harmonics in approximately the same way: Integrate a sphere based on how much light will affect a sample if incoming from numerous sample directions, accounting for collision with other samples due to elevation.

You can store this as three DXT1 textures, though you can improve quality by packing channels with similar spatial coherence. Coefficients 0, 2, and 6 in particular tend to pack well, since they're all dominated primarily by directions aimed perpendicular to the texture.

I use the following packing:
Texture 1: Coefs 0, 2, 6
Texture 2: Coefs 1, 4, 5
Texture 3: Coefs 3, 7, 8

You can reference an early post on this blog for code on how to rotate a SH vector by a matrix, in turn allowing you to get it into texture space. Once you've done that, simply multiply each SH coefficient from the self-shadowing map by the SH coefficients created from your light source (also covered on the previous post) and add together.
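
A minimal sketch of that combine step, assuming the three packed textures have already been sampled and the light's SH coefficients have already been rotated into texture space, is just a 9-component dot product:

// Unpacks the nine self-shadowing SH coefficients from the three textures
// (using the channel assignment listed above) and dots them with the light's
// SH coefficients.
float EvaluateSelfShadowedLight(const float packedTex1[3],  // coefs 0, 2, 6
                                const float packedTex2[3],  // coefs 1, 4, 5
                                const float packedTex3[3],  // coefs 3, 7, 8
                                const float lightCoefs[9])
{
    float shadowCoefs[9];
    shadowCoefs[0] = packedTex1[0];
    shadowCoefs[2] = packedTex1[1];
    shadowCoefs[6] = packedTex1[2];
    shadowCoefs[1] = packedTex2[0];
    shadowCoefs[4] = packedTex2[1];
    shadowCoefs[5] = packedTex2[2];
    shadowCoefs[3] = packedTex3[0];
    shadowCoefs[7] = packedTex3[1];
    shadowCoefs[8] = packedTex3[2];

    // "Multiply each coefficient ... and add together" is a 9-component dot product.
    float result = 0.0f;
    for (int i = 0; i < 9; i++)
        result += shadowCoefs[i] * lightCoefs[i];
    return result;
}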

by OneEightHundred (noreply@blogger.com) at 2011-12-07 18:39

2011-12-02

Spherical harmonics seems to have some impenetrable level of difficulty, especially among the indie scene, which has little to go off of other than a few presentations and whitepapers. Some of those even contain incorrect information (e.g. one of the formulas in the Sony paper on the topic is wrong), and most are still using ZYZ rotations because it's so hard to find out how to do a matrix rotation.

Hao Chen and Xinguo Liu did a presentation at SIGGRAPH '08, and the slides from it contain a good deal of useful stuff, not to mention one of the ONLY easy-to-find rotate-by-matrix functions. It treats the Z axis a bit awkwardly, though, so I patched the rotation code up a bit and added a pre-integrated cosine convolution filter so you can easily get SH coefs for a directional light.

There was also gratuitous use of sqrt(3) multipliers, which can be eliminated entirely by simply premultiplying or predividing coef #6 by it; doing so incidentally causes all of the constants and multipliers to resolve to rational numbers.

As always, you can include multiple lights by simply adding their SH coefs together. If you want specular, you can approximate a directional light by using the linear component to determine the direction and the constant component to determine the color. You can do this per-channel, or use the average values to determine the direction and do it once.
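
Here's a minimal sketch of that directional-light approximation for a single channel, using the coefficient ordering from the code below (1 = y, 2 = z, 3 = x); the function name and signature are mine, not part of the original code:

void ApproximateDirectionalLight(const float coefs[9], terVec3 &outDirection, float &outIntensity)
{
    // The linear band points toward the dominant light direction.
    outDirection = terVec3(coefs[3], coefs[1], coefs[2]).Normalize2();

    // The constant band stands in for the light's color/intensity.
    outIntensity = coefs[0];
}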

Here are the spoilers:

#define SH_AMBIENT_FACTOR   (0.25f)
#define SH_LINEAR_FACTOR (0.5f)
#define SH_QUADRATIC_FACTOR (0.3125f)

void LambertDiffuseToSHCoefs(const terVec3 &dir, float out[9])
{
// Constant
out[0] = 1.0f * SH_AMBIENT_FACTOR;

// Linear
out[1] = dir[1] * SH_LINEAR_FACTOR;
out[2] = dir[2] * SH_LINEAR_FACTOR;
out[3] = dir[0] * SH_LINEAR_FACTOR;

// Quadratics
out[4] = ( dir[0]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;
out[5] = ( dir[1]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;
out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * SH_QUADRATIC_FACTOR;
out[7] = ( dir[0]*dir[2] ) * 3.0f*SH_QUADRATIC_FACTOR;
out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 3.0f*SH_QUADRATIC_FACTOR;
}


void RotateCoefsByMatrix(float outCoefs[9], const float pIn[9], const terMat3x3 &rMat)
{
// DC
outCoefs[0] = pIn[0];

// Linear
outCoefs[1] = rMat[1][0]*pIn[3] + rMat[1][1]*pIn[1] + rMat[1][2]*pIn[2];
outCoefs[2] = rMat[2][0]*pIn[3] + rMat[2][1]*pIn[1] + rMat[2][2]*pIn[2];
outCoefs[3] = rMat[0][0]*pIn[3] + rMat[0][1]*pIn[1] + rMat[0][2]*pIn[2];

// Quadratics
outCoefs[4] = (
( rMat[0][0]*rMat[1][1] + rMat[0][1]*rMat[1][0] ) * ( pIn[4] )
+ ( rMat[0][1]*rMat[1][2] + rMat[0][2]*rMat[1][1] ) * ( pIn[5] )
+ ( rMat[0][2]*rMat[1][0] + rMat[0][0]*rMat[1][2] ) * ( pIn[7] )
+ ( rMat[0][0]*rMat[1][0] ) * ( pIn[8] )
+ ( rMat[0][1]*rMat[1][1] ) * ( -pIn[8] )
+ ( rMat[0][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )
);

outCoefs[5] = (
( rMat[1][0]*rMat[2][1] + rMat[1][1]*rMat[2][0] ) * ( pIn[4] )
+ ( rMat[1][1]*rMat[2][2] + rMat[1][2]*rMat[2][1] ) * ( pIn[5] )
+ ( rMat[1][2]*rMat[2][0] + rMat[1][0]*rMat[2][2] ) * ( pIn[7] )
+ ( rMat[1][0]*rMat[2][0] ) * ( pIn[8] )
+ ( rMat[1][1]*rMat[2][1] ) * ( -pIn[8] )
+ ( rMat[1][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )
);

outCoefs[6] = (
( rMat[2][1]*rMat[2][0] ) * ( pIn[4] )
+ ( rMat[2][2]*rMat[2][1] ) * ( pIn[5] )
+ ( rMat[2][0]*rMat[2][2] ) * ( pIn[7] )
+ 0.5f*( rMat[2][0]*rMat[2][0] ) * ( pIn[8])
+ 0.5f*( rMat[2][1]*rMat[2][1] ) * ( -pIn[8])
+ 1.5f*( rMat[2][2]*rMat[2][2] ) * ( pIn[6] )
- 0.5f * ( pIn[6] )
);

outCoefs[7] = (
( rMat[0][0]*rMat[2][1] + rMat[0][1]*rMat[2][0] ) * ( pIn[4] )
+ ( rMat[0][1]*rMat[2][2] + rMat[0][2]*rMat[2][1] ) * ( pIn[5] )
+ ( rMat[0][2]*rMat[2][0] + rMat[0][0]*rMat[2][2] ) * ( pIn[7] )
+ ( rMat[0][0]*rMat[2][0] ) * ( pIn[8] )
+ ( rMat[0][1]*rMat[2][1] ) * ( -pIn[8] )
+ ( rMat[0][2]*rMat[2][2] ) * ( 3.0f*pIn[6] )
);

outCoefs[8] = (
( rMat[0][1]*rMat[0][0] - rMat[1][1]*rMat[1][0] ) * ( pIn[4] )
+ ( rMat[0][2]*rMat[0][1] - rMat[1][2]*rMat[1][1] ) * ( pIn[5] )
+ ( rMat[0][0]*rMat[0][2] - rMat[1][0]*rMat[1][2] ) * ( pIn[7] )
+0.5f*( rMat[0][0]*rMat[0][0] - rMat[1][0]*rMat[1][0] ) * ( pIn[8] )
+0.5f*( rMat[0][1]*rMat[0][1] - rMat[1][1]*rMat[1][1] ) * ( -pIn[8] )
+0.5f*( rMat[0][2]*rMat[0][2] - rMat[1][2]*rMat[1][2] ) * ( 3.0f*pIn[6] )
);
}


... and to sample it in the shader ...


float3 SampleSHQuadratic(float3 dir, float3 shVector[9])
{
float3 ds1 = dir.xyz*dir.xyz;
float3 ds2 = dir*dir.yzx; // xy, zy, xz

float3 v = shVector[0];

v += dir.y * shVector[1];
v += dir.z * shVector[2];
v += dir.x * shVector[3];

v += ds2.x * shVector[4];
v += ds2.y * shVector[5];
v += (ds1.z * 1.5 - 0.5) * shVector[6];
v += ds2.z * shVector[7];
v += (ds1.x - ds1.y) * 0.5 * shVector[8];

return v;
}


For Monte Carlo integration, take sampling points, feed each direction "dir" to the following function to get multipliers for each coefficient, then multiply those by the intensity arriving from that direction and accumulate. Finally, divide the totals by the number of sampling points:


void SHForDirection(const terVec3 &dir, float out[9])
{
// Constant
out[0] = 1.0f;

// Linear
out[1] = dir[1] * 3.0f;
out[2] = dir[2] * 3.0f;
out[3] = dir[0] * 3.0f;

// Quadratics
out[4] = ( dir[0]*dir[1] ) * 15.0f;
out[5] = ( dir[1]*dir[2] ) * 15.0f;
out[6] = ( 1.5f*( dir[2]*dir[2] ) - 0.5f ) * 5.0f;
out[7] = ( dir[0]*dir[2] ) * 15.0f;
out[8] = 0.5f*( dir[0]*dir[0] - dir[1]*dir[1] ) * 15.0f;
}


... and finally, for a uniformly-distributed random point on a sphere ...


terVec3 RandomDirection(int (*randomFunc)(), int randMax)
{
float u = (((float)randomFunc()) / (float)(randMax - 1))*2.0f - 1.0f;
float n = sqrtf(1.0f - u*u);

float theta = 2.0f * M_PI * (((float)randomFunc()) / (float)(randMax));

return terVec3(n * cos(theta), n * sin(theta), u);
}
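
Putting the last two functions together, a minimal sketch of the whole Monte Carlo projection might look like this; ProjectSHMonteCarlo and the sampleIntensity callback are hypothetical placeholders for however you sample your environment:

void ProjectSHMonteCarlo(float (*sampleIntensity)(const terVec3 &dir),
    int (*randomFunc)(), int randMax, int numSamples, float outCoefs[9])
{
    for (int i = 0; i < 9; i++)
        outCoefs[i] = 0.0f;

    for (int s = 0; s < numSamples; s++)
    {
        // Pick a uniformly-distributed direction on the sphere.
        terVec3 dir = RandomDirection(randomFunc, randMax);

        // Get the per-coefficient multipliers for that direction and
        // weight them by the intensity arriving from it.
        float weights[9];
        SHForDirection(dir, weights);

        float intensity = sampleIntensity(dir);
        for (int i = 0; i < 9; i++)
            outCoefs[i] += weights[i] * intensity;
    }

    // Divide the totals by the number of sampling points.
    for (int i = 0; i < 9; i++)
        outCoefs[i] /= (float)numSamples;
}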

by OneEightHundred (noreply@blogger.com) at 2011-12-02 12:22

2011-12-01

Fresh install on OS X of ColdFusion Builder 2 (TWO, the SECOND one). Typing a simple conditional, this is what I was given:



I also had to manually write the closing cfif tag. It's such a joke.

The absolute core purpose of an IDE is to be a text editor. Secondary to that are other features that are supposed to make you work better. ColdFusion Builder 2 (TWO!!!!!) completely fails on all levels as a text editor. It doesn't even function as well as notepad.exe!

Text search is finicky, Find & Replace is completely broken half the time, the UI is often unresponsive (yay Eclipse), the text cursor sometimes disappears, double-clicking folders or files in an FTP view pops up the Rename dialog every time, HTML / CF tag completion usually doesn't happen, indentation is broken, function parameter tooltips obscure the place you are typing, # and " completion randomly breaks (often leaving you with a ###)... the list goes on and on.

Adobe has a big feature list on their site. I'm thinking maybe they should go back and use some resources to fix the parts where you type things into the computer, you know, the whole point of the thing.

by Ted (noreply@blogger.com) at 2011-12-01 15:14

2011-10-19

Has it really been a year since the last update?

Well, things have been chugging along with less discovery and more actual work. However, development on TDP is largely on hold due to the likely impending release of the Doom 3 source code, which has numerous architectural improvements like rigid-body physics and much better customization of entity networking.


In the meantime, however, a component of TDP has been spun off into its own project: the RDX extension language. Initially planned as a resource manager, it has evolved into a full-fledged programmability API. The main goal was a runtime with very straightforward integration, to the point that you can easily use it for managing your C++ resources, but with much higher performance than dynamically-typed interpreted languages, especially when dealing with complex data types such as float vectors.

Features are still being implemented, but the compiler seems to be stable and load-time conversion to native x86 code is functional. Expect a real release in a month or two.

The project now has a home on Google Code.

by OneEightHundred (noreply@blogger.com) at 2011-10-19 01:37