Learning about Swift Performance

Before refactoring the OysterKit demos to improve interactive performance (I'm currently syntax colouring the whole file on the main thread, which is not the best way to make things feel responsive), I wanted to get a sense of Swift's performance, not to mention that of my own code.

The first step was to add some performance tests using the new features in Xcode 6. This is simple to do:

    func testOKScriptTokenizerPerformance() {
        var tokFileTokDef = TokenizerFile().description
        tokFileTokDef += tokFileTokDef
        tokFileTokDef += tokFileTokDef
        tokFileTokDef += tokFileTokDef
        tokFileTokDef += tokFileTokDef
        self.measureBlock() {
            let parserGeneratedTokens = TokenizerFile().tokenize(tokFileTokDef)
        }
    }

The most complex tokenizer I have is the one used by the OK Script parser itself, so I get this and multiply it out a few times!

The secret sauce is in the self.measureBlock() call. This takes a single closure as a parameter (there is a richer version, but I didn't need it). It runs the block 10 times and records the time taken for each run. Failures will occur if the standard deviation (how much the time taken varies across the 10 runs) exceeds 3%. Once you've run the test once, you can set a baseline. Perfect. I can now track the impact of the improvements.
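To make the "standard deviation across the runs" idea concrete, here is a minimal sketch of that kind of measurement. It is not how XCTest implements it; TimingSummary, summarize and measure are hypothetical names of mine, and it is written in current Swift syntax rather than the Xcode 6 beta syntax used elsewhere in this post.

```swift
import Foundation

// Hypothetical helper types, not part of XCTest.
struct TimingSummary {
    let mean: Double            // average time per run, in seconds
    let relativeStdDev: Double  // standard deviation divided by the mean
}

// Reduces a list of timings to a mean and a relative standard deviation.
func summarize(_ samples: [Double]) -> TimingSummary {
    precondition(!samples.isEmpty, "need at least one sample")
    let mean = samples.reduce(0, +) / Double(samples.count)
    let variance = samples
        .map { ($0 - mean) * ($0 - mean) }
        .reduce(0, +) / Double(samples.count)
    let relative = mean > 0 ? variance.squareRoot() / mean : 0
    return TimingSummary(mean: mean, relativeStdDev: relative)
}

// Runs the block `runs` times and summarises the wall-clock timings.
func measure(runs: Int = 10, _ block: () -> Void) -> TimingSummary {
    var samples: [Double] = []
    for _ in 0..<runs {
        let start = Date()
        block()
        samples.append(Date().timeIntervalSince(start))
    }
    return summarize(samples)
}

// Usage: time some throwaway work and inspect how noisy the runs were.
let summary = measure { _ = (0..<100_000).reduce(0, +) }
print("mean: \(summary.mean)s, relative std dev: \(summary.relativeStdDev)")
```

A high relative standard deviation means the runs disagree with each other, so a comparison against the baseline would not tell you much.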

Time to start profiling!

The Ugly

The first thing thrown up was that state transitions in the tokenizer were taking a huge chunk of time. There's nothing Swift-specific here... I was doing a lot of unnecessary resetting of states. I applied a very computer science approach, and made each TokenizationState shed its internal state (with the exception of Repeated states, which have to count things). At the end, I had a much leaner strategy, and almost nothing needed resetting any more.

Only Repeat has any internal state, meaning that everything else could just do nothing... Much faster than doing something!
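As a rough illustration of the idea (these are simplified, hypothetical stand-ins, not OysterKit's actual classes), the base state gets a reset that does nothing and only the counting state overrides it:

```swift
// Hypothetical, simplified stand-ins for the tokenizer's state classes.
class State {
    // Most states carry nothing between tokens, so resetting is a no-op.
    func reset() {}
}

// A state that only holds immutable configuration has nothing to reset.
final class CharState: State {
    let allowed: Set<Character>
    init(allowed: Set<Character>) { self.allowed = allowed }
}

// The one state that has to count things, and so the only one that
// pays anything when it is reset.
final class RepeatState: State {
    private(set) var repeats = 0
    func recordRepeat() { repeats += 1 }
    override func reset() { repeats = 0 }
}
```

The saving comes from what is no longer done on every transition, which is why the improvement has nothing to do with the language.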

Performance improvement... 60% 

Retain/Release

With that performance hog out of the way, the next biggest drag was retain/release activity. The biggest cost I was paying was on token creation. Much of it was happening in Repeat states, where you often just loop through incoming characters until you hit one that isn't suitable for the current token. While it's doing that, it's spewing out new tokens. Oh, and you remember that Repeat states are expensive to enter because of their internal state?

So I found myself faced with a purity vs. pragmatism scenario. I don't like to have two ways of doing things. Repeat has a clear use-case, but looping through a stream of characters until an invalid one comes along is very common too.

So I have introduced a new LoopingChar state. Very very simple, it just doesn't exit until it hits a character that isn't allowed (rather than consuming the character, issuing a token, exiting, only to be re-entered next time). 

    override func consume(character: UnicodeScalar, controller: TokenizationController) -> TokenizationStateChange {
        if isAllowed(character) {
            return TokenizationStateChange.None
        } else {
            return selfSatisfiedBranchOutOfStateTransition(false, controller: controller, withToken: createToken(controller, useCurrentCharacter: false))
        }
    }

See that TokenizationStateChange.None? That's the magic. Not changing state is cheap. That is about the only difference (well one minor change needed to add this construct to OK Script, but that's just serialisation). 
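For anyone who wants to play with the idea outside OysterKit, here is a self-contained sketch in current Swift syntax. StateChange, LoopingChar and firstRun are hypothetical names for this post only; the real tokenizer hands the work to a controller instead of a free function.

```swift
// Hypothetical, simplified model of a looping state; not OysterKit's API.
enum StateChange {
    case none  // stay in the current state, nothing emitted
    case exit  // leave the state; the caller builds the token
}

struct LoopingChar {
    let isAllowed: (Unicode.Scalar) -> Bool

    // Holds no state of its own: it only decides whether to stay or leave.
    func consume(_ scalar: Unicode.Scalar) -> StateChange {
        return isAllowed(scalar) ? .none : .exit
    }
}

// Minimal driver: captures the leading run of allowed scalars and counts
// how many state changes were needed to do it.
func firstRun(in text: String, using state: LoopingChar) -> (token: String, transitions: Int) {
    var captured = String.UnicodeScalarView()
    var transitions = 0
    for scalar in text.unicodeScalars {
        if state.consume(scalar) == .exit {
            transitions += 1
            break
        }
        captured.append(scalar)
    }
    return (String(captured), transitions)
}

// Usage: four digits are consumed with a single transition at the "+".
let digits = LoopingChar(isAllowed: { $0 >= "0" && $0 <= "9" })
let run = firstRun(in: "1234+5", using: digits)
print(run.token, run.transitions)
```

The point of the sketch is the transition count: one exit for the whole run, rather than one per character.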

We've just banked a further 10% performance improvement.

Lazy Code means Slow Swift

With 70% of the cruft out the way, I get to see the first real impact of using a Swift feature... String interpolation. I'm using UnicodeScalars in the main in the tokenizer, but there were a couple of places (not many mind) where I needed to convert that into a character for comparison, and I used string interpolation to do that. Now... lazy because typing

Was nice and readable, but just not performant. This was used moderately heavily for delimiters, but the impact far outweighed the number of calls. A quick tweak to get the single character (always the first) out, and we are away:

    delimiter.unicodeScalars[delimiter.unicodeScalars.startIndex]

From OysterKit's perspective, this is just a better solution. However, there may be some cases where there isn't such a simple fix. Write this down. String interpolation is slow in Swift. I don't think this is a problem, it's just something to be aware of. 
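To show the two shapes side by side, here is a small sketch. The interpolation line is only my assumption about the kind of comparison involved, and the delimiter and incoming values are made up for the example; only the unicodeScalars lookup is taken from the fix above.

```swift
// Made-up values for illustration.
let delimiter = ","
let incoming: Unicode.Scalar = ","

// Assumed "lazy" shape: builds a brand new String from the scalar
// every time a comparison is made.
let matchesViaInterpolation = "\(incoming)" == delimiter

// The fix: pull the delimiter's first scalar out once, then compare
// scalars directly with no String construction.
let delimiterScalar = delimiter.unicodeScalars[delimiter.unicodeScalars.startIndex]
let matchesViaScalar = incoming == delimiterScalar

print(matchesViaInterpolation, matchesViaScalar)
```

Both comparisons give the same answer here; the difference is only in how much work is done to get it.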

The one-line change above netted me the best part of another 10%. 

The Benefits of Libraries

OysterKit provides a range of standard tokenisers to get you started quickly, and I went around and updated them to use LoopingChar instead of repeats where possible (uncovering a bug in repeats at the same time). I used a couple of these in my OK Script tokenizer, and I got another 3% gain, instantly delivered to anyone using the tokenisers!

The Learning

I have a confession. When I first looked at the performance results, I blamed Swift. Normally I would lambast someone who blamed the programming language before themselves. However, I had let the embryonic state of Swift convince me that the protocol thunking calls and the massive number of retains and releases weren't my problems. They were the result of an overly heavy Swift compilation, with ARC not doing its thing properly yet.

As usual, it wasn't the language. In a few short hours I had a performance improvement of over 83% the traditional optimisation way: do less.

String interpolation needs to be avoided in anything like a tight loop, but that doesn't surprise me at all, and I don't see it as Swift's fault. 

So guys, when you see bad performance, the normal rules apply: It's probably your fault.