The Swift AST

Put simply, the Swift Abstract Syntax Tree is a parsed version of a Swift source file.

Some tools I found helpful

While building this site, I was looking into how I could write articles without needing to fool around with much HTML. I came across several libraries from John Sundell, including Ink, which parses Markdown into HTML, and Splash, which can be used as a plugin to Ink to format code blocks. Ink so far seems to be working well, but Splash is quite limited in its syntax highlighting ability. I wanted something that could produce syntax colors as well as Xcode does, so I turned to the swift-syntax library to try writing my own highlighter from scratch.

As it turns out, "from scratch" is surprisingly easy because the swift-syntax library does basically everything for you. Parsing a string into the AST using swift-syntax is just one line and some imports.

import SwiftParser
import SwiftSyntax

let syntaxTree: SourceFileSyntax = Parser.parse(source: code)

Like the name suggests, the AST is, in fact, a tree. The root node represents the entire source file, and each leaf represents an atomic piece of relevant syntax like an identifier name, a comma, or a keyword. I say relevant because each leaf also carries the extra decorative bits around it, like whitespace and comments, which are called "trivia". Because the trivia is preserved, you can reproduce the original source from the tree, which is why the docs point out that it's more accurate to describe it as a concrete syntax tree. Each leaf, along with its leading and trailing trivia, is represented as a TokenSyntax instance.

Before we can add HTML tags to color the individual pieces, we start by reproducing the original source file.

// This will hold our html
var attributedSource = String()

for token in syntaxTree.tokens(viewMode: .sourceAccurate) {
    attributedSource.append(token.leadingTrivia.description)

    attributedSource.append(token.text)
    
    attributedSource.append(token.trailingTrivia.description)
}
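As a quick sanity check, the loop above should give back the input character for character. Here is a minimal sketch of that round trip; the sample source and the name `rebuilt` are made up for illustration, and it assumes the swift-syntax package is available to import.

```swift
import SwiftParser
import SwiftSyntax

// Hypothetical sample input; any Swift source should behave the same way.
let code = """
// this comment is leading trivia of the `let` token
let answer = 42 // a trailing comment

"""

let syntaxTree = Parser.parse(source: code)

// Same loop as above, collecting into a separate string.
var rebuilt = String()
for token in syntaxTree.tokens(viewMode: .sourceAccurate) {
    rebuilt.append(token.leadingTrivia.description)
    rebuilt.append(token.text)
    rebuilt.append(token.trailingTrivia.description)
}

// Nothing is lost, because comments and whitespace travel with the tokens.
assert(rebuilt == code)
// The tree's own description is the same concatenation.
assert(syntaxTree.description == code)
```

If either assertion ever failed, the highlighter would be silently dropping or duplicating parts of the source, so this is a cheap check to keep around while adding the HTML tags.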

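While parsing, swift-syntax recovers from malformed input: tokens it expected but did not find are added to the tree as missing tokens, and tokens it did not expect are kept as unexpected nodes. The synthesized missing tokens are only visited in the .fixedUp view mode. That would be useful for an IDE suggesting fix-its, but we are only interested in highlighting the original source, so we use .sourceAccurate. The docs describe the two modes like this: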

SyntaxTreeViewMode.sourceAccurate: Visit the tree in a way that reproduces the original source code. Missing nodes will not be visited, unexpected nodes will be visited. This mode is useful for source code transformations like a formatter.

SyntaxTreeViewMode.fixedUp: Views the syntax tree with fixes applied; that is, missing nodes will be visited but unexpected nodes will be skipped. This views the tree in a way that’s closer to being syntactically correct and should be used for structural analysis of the syntax tree.
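To see the difference between the two modes, it helps to feed the parser something broken. The sketch below is only an illustration: the input string is invented, and which tokens the parser decides to synthesize may vary between swift-syntax versions, so it prints them rather than assuming a particular result.

```swift
import SwiftParser
import SwiftSyntax

// Deliberately broken input: the closing parenthesis is missing.
let broken = "func greet(name: String {}"
let tree = Parser.parse(source: broken)

// .sourceAccurate only walks tokens that exist in the text,
// so joining them gives the input back unchanged.
let accurate = tree.tokens(viewMode: .sourceAccurate)
    .map { $0.description }
    .joined()
assert(accurate == broken)

// .fixedUp also walks the tokens the parser synthesized to repair the tree.
for token in tree.tokens(viewMode: .fixedUp) where token.presence == .missing {
    print("parser inserted:", token.tokenKind)
}
```

For a highlighter, the first property is the one that matters: whatever the user typed, valid or not, comes back out exactly once.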

Inspecting TokenSyntax, we can see it has a tokenKind property which, at first glance, has exactly what we're looking for. The TokenKind enum has cases for all the different Swift keywords, punctuation marks, literals, and more.

@frozen
public enum TokenKind: Hashable {
  case eof
  case associatedtypeKeyword
  case classKeyword
  case deinitKeyword
  case enumKeyword
  case extensionKeyword
  case funcKeyword
  case importKeyword
  case initKeyword
  // ...
  case period
  case prefixPeriod
  case comma
  // ...
  case poundErrorKeyword
  case poundIfKeyword
  // ...
  case integerLiteral(String)
  case floatingLiteral(String)
  case stringLiteral(String)
  case regexLiteral(String)
  case unknown(String)
  case identifier(String)
  // ...
  case rawStringDelimiter(String)
  case stringSegment(String)
  case stringInterpolationAnchor
  case yield
}

We can create an extension on TokenSyntax that returns a category we want to color.

enum SyntaxTokenCategory: Equatable {
    case plain
    case string
    case number
    case regex
    case keyword
    case preprocessor
    case identifier
    case property
    case type
    case externalType
    case function
}

extension TokenSyntax {
    var category: SyntaxTokenCategory {
        return switch self.tokenKind {
        case .eof:
                .plain
        case .associatedtypeKeyword:
                .keyword
        case .classKeyword:
                .keyword
        case .deinitKeyword:
                .keyword
        case .enumKeyword:
                .keyword
        // ...
        case .poundHasSymbolKeyword:
                .preprocessor
        case .integerLiteral(_):
                .number
        case .floatingLiteral(_):
                .number
        case .stringLiteral(_):
                .string
        // ...
        }
    }
}

This gives us keywords, strings, regexes, and numbers, but every other word (type names, function names, properties, variables) is lumped under .identifier. To tell these apart, we just need to inspect the token's parent. This is where the AST Explorer comes in handy. Using it, we can quickly figure out extra rules to differentiate between identifiers.
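When the AST Explorer isn't at hand, roughly the same information can be dumped with a few lines of code. This is a sketch rather than part of the highlighter: the sample source and the `parentTypes` dictionary are hypothetical, and it relies on `syntaxNodeType` to get the parent's concrete node type.

```swift
import SwiftParser
import SwiftSyntax

// Hypothetical sample; swap in whatever source you are curious about.
let sample = "struct Point { let x: Int }"
let tree = Parser.parse(source: sample)

// Map each identifier's text to the type name of its parent node.
var parentTypes: [String: String] = [:]
for token in tree.tokens(viewMode: .sourceAccurate) {
    // Skip keywords, punctuation, and anything without a parent.
    guard case .identifier = token.tokenKind, let parent = token.parent else { continue }
    let typeName = "\(parent.syntaxNodeType)"
    parentTypes[token.text] = typeName
    print(token.text, "->", typeName)
}
```

For `Point` this reports StructDeclSyntax and for `x` it reports IdentifierPatternSyntax. The node name reported for `Int` depends on the swift-syntax version in use, since some node types have been renamed over time.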

extension TokenSyntax {
    func identifyToken() -> SyntaxTokenCategory {
        if let parent = parent {
            if parent.is(SwiftSyntax.EnumDeclSyntax.self) {
                return .type
            }
            if parent.is(SwiftSyntax.StructDeclSyntax.self) {
                return .type
            }
            if parent.is(SwiftSyntax.ClassDeclSyntax.self) {
                return .type
            }
            if parent.is(SwiftSyntax.ProtocolDeclSyntax.self) {
                return .type
            }
            if parent.is(SwiftSyntax.MemberAccessExprSyntax.self) {
                return .property
            }
            if parent.is(SwiftSyntax.SimpleTypeIdentifierSyntax.self) {
                if parent.parent?.is(CustomAttributeSyntax.self) == true {
                    return .keyword
                }
                return .externalType
            }
            if parent.is(SwiftSyntax.EnumCaseDeclSyntax.self) {
                return .property
            }
            if let fnParent = parent.as(SwiftSyntax.FunctionParameterSyntax.self) {
                return self == fnParent.firstName ? .function : .plain
            }
            if parent.is(SwiftSyntax.MemberTypeIdentifierSyntax.self) {
                return .externalType
            }  
            if parent.is(SwiftSyntax.GenericParameterSyntax.self) {
                return .externalType
            }
            if parent.is(SwiftSyntax.FunctionDeclSyntax.self) {
                return .function
            }
            if parent.is(SwiftSyntax.IdentifierExprSyntax.self) {
                return .identifier
            }
            if parent.is(SwiftSyntax.IdentifierPatternSyntax.self) {
                return .identifier
            }
        }
        return .plain
    }
}

In some cases, we have to look up another level and inspect the type of the parent's parent, as with the CustomAttributeSyntax check above. For parameters in function declarations, we can replicate Xcode's behavior and highlight the firstName one way and the secondName another.
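The parameter rule is easier to see in isolation. The following sketch labels the two names of a single parameter; the `roles` dictionary is a made-up helper for the example, and the comparison is written so that it should compile whether `firstName` is optional or not in the swift-syntax version you have.

```swift
import SwiftParser
import SwiftSyntax

let tree = Parser.parse(source: "func test(foo bar: String) { }")

// Record which role each token plays inside its parameter.
var roles: [String: String] = [:]
for token in tree.tokens(viewMode: .sourceAccurate) {
    // Only tokens whose direct parent is a function parameter are of interest.
    guard let parameter = token.parent?.as(FunctionParameterSyntax.self) else { continue }
    if token == parameter.firstName {
        roles[token.text] = "firstName"    // the external label, `foo`
    } else if token == parameter.secondName {
        roles[token.text] = "secondName"   // the internal name, `bar`
    }
}

print(roles)
```

The colon is also a direct child of the parameter, but it matches neither name and is left out, which mirrors the `.plain` fallback in `identifyToken()`.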

let someVar = "abc"
let a = someVar.first!

#if DEBUG
import MyLib
#endif

func test(foo bar: String) { }

enum Options: Comparable {
    case good
    case better(kinda: Int)
    case best(String)
}

let regex = /name[a-Z]*^/

struct LinkTag: LeafTag {
    @State var myState: String
    let path: String

    func render(_ ctx: LeafKit.LeafContext) throws -> LeafKit.LeafData {
        LeafData("/\(path)")
    }
    
    func foo<T: View>(@ViewBuilder _ v: () -> T, s: Binding<String>) async throws {
        $myState.foo
    }
}
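With a category for each token, the remaining step is the one mentioned earlier: adding HTML tags. Below is one possible way to do it, not necessarily how this site does it. The `escapeHTML` and `wrap` helpers and the `syntax-` class prefix are invented for the example, and the enum is a trimmed copy so the sketch compiles on its own.

```swift
// Trimmed copy of the category enum so this sketch is self-contained.
enum SyntaxTokenCategory: Equatable {
    case plain, string, number, keyword, type, function
}

/// Escapes the characters that would otherwise be read as markup.
func escapeHTML(_ text: String) -> String {
    var escaped = ""
    for character in text {
        switch character {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        default: escaped.append(character)
        }
    }
    return escaped
}

/// Wraps a token's text in a span whose class is derived from the category.
/// The class naming scheme is hypothetical.
func wrap(_ text: String, category: SyntaxTokenCategory) -> String {
    let escaped = escapeHTML(text)
    // Plain tokens need no markup at all.
    guard category != .plain else { return escaped }
    return "<span class=\"syntax-\(category)\">\(escaped)</span>"
}

print(wrap("func", category: .keyword))
print(wrap("Binding<String>", category: .type))
```

In the token loop, `token.text` would go through `wrap`, while the leading and trailing trivia would only go through `escapeHTML`. Escaping matters here because Swift source is full of `<`, `>`, and `&`, as in generics and comparison operators.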