Common State Functions

Swift

Each state has three key functions you can use to easily build your tokeniser. Each state also implements Printable, constructing the OK Script for itself as its description. This is useful if you would like to confirm the syntax for usage in an OK Script. 

branch(states:TokenizationState...)->TokenizationState

Accepts a series of states, which are added as branches of the target state. It returns the target state to enable chaining.

sequence(states:TokenizationState...)->TokenizationState

Chains the supplied states together. Unlike branch, each state is attached as a branch of the state before it in the list. As with branch, it returns the target state to enable chaining.
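To make the difference between branch and sequence concrete, here is a minimal, self-contained sketch. SketchState is a hypothetical stand-in written for this illustration, not the library's TokenizationState, and it assumes that the first state passed to sequence is attached to the target state itself. Its description renders an OK-Script-like string so the resulting shape is easy to compare.

```swift
// Hypothetical stand-in for a tokenization state (NOT the library's class).
final class SketchState: CustomStringConvertible {
    let label: String
    private(set) var branches: [SketchState] = []

    init(_ label: String) { self.label = label }

    // Attach every supplied state directly to the receiver.
    @discardableResult
    func branch(_ states: SketchState...) -> SketchState {
        branches.append(contentsOf: states)
        return self
    }

    // Assumption: the first state hangs off the receiver, the second off
    // the first, and so on down the list.
    @discardableResult
    func sequence(_ states: SketchState...) -> SketchState {
        var previous = self
        for state in states {
            previous.branches.append(state)
            previous = state
        }
        return self
    }

    // Render the state and its branches in an OK-Script-like form.
    var description: String {
        let quoted = "\"\(label)\""
        switch branches.count {
        case 0:
            return quoted
        case 1:
            return quoted + "." + branches[0].description
        default:
            let children = branches.map { $0.description }.joined(separator: ",")
            return quoted + ".{" + children + "}"
        }
    }
}

let branched = SketchState("a").branch(SketchState("b"), SketchState("c"))
let chained = SketchState("a").sequence(SketchState("b"), SketchState("c"))
print(branched) // "a".{"b","c"}
print(chained)  // "a"."b"."c"
```

With branch, both "b" and "c" are alternatives directly after "a"; with sequence, "c" can only follow "b".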

token(tokenName:String)

token(token:Token)

token(with:TokenCreationBlock)

These three methods specify the token that should be created if the state is satisfied. In general these should be terminal states, but each method returns the target state for chaining should that not be the case.

OK Script

The . Operator

In OK Script a branch is added using the . operator.

"a"."b"

The example above says "a" should be followed by "b". If you would like "a" to be followed by "b" OR "c", simply enclose them in a branch (see below for more details).

"a".{"b","c"}

Creating tokens

Tokens are created using the -> operator followed by the name of the token to be emitted. For example, to extend the last example to emit either an "ab" or an "ac" token, we simply do the following:

"a".{"b"->ab,"c"->ac}

Token names must start with an English letter, but after this they can also contain decimal digits, - or _ (dash or underscore).
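The naming rule can be expressed as a small check. This is only a sketch: isValidTokenName is a hypothetical helper, not part of the library, and it assumes that letters are also allowed after the first character (as names such as hexDigit and double-quote elsewhere in this document suggest).

```swift
// Hypothetical helper (not a library function).
// Assumed rule: the first character is an ASCII letter; later characters
// may be ASCII letters, decimal digits, "-" or "_".
func isValidTokenName(_ name: String) -> Bool {
    let lower: ClosedRange<Unicode.Scalar> = "a"..."z"
    let upper: ClosedRange<Unicode.Scalar> = "A"..."Z"
    let digits: ClosedRange<Unicode.Scalar> = "0"..."9"

    func isLetter(_ scalar: Unicode.Scalar) -> Bool {
        return lower.contains(scalar) || upper.contains(scalar)
    }

    // An empty name has no first character and is rejected.
    guard let first = name.unicodeScalars.first, isLetter(first) else {
        return false
    }
    return name.unicodeScalars.dropFirst().allSatisfy { scalar in
        isLetter(scalar) || digits.contains(scalar) || scalar == "-" || scalar == "_"
    }
}

let accepted = isValidTokenName("double-quote") // true
let rejected = isValidTokenName("2hex")         // false, starts with a digit
```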

Branch

Branch is the basic state. It maintains a list of branches in an ordered array. When a Branch state is evaluated, it looks through its child states for one that can accept the available character. If subsequently asked to consume that character, it will pass it to that child state.

Swift

Branch()

Constructs an empty branch state

Branch(states:TokenizationState...) 

Constructs a Branch with the specified branches

OK Script

Branches have a very simple syntax: list the states, separated by commas, between braces

{State, State, ..., State}

You will also see this Branch syntax used wherever a set of states must be supplied. For example, a Branch with two Char states is simply

{"a","b"}

Char

Swift

The Char state accepts or rejects characters based on the characters supplied to its initialiser. These are supplied in a Swift String, and the particular constructor you use governs whether only characters in the String are accepted, or any character except those in the String is accepted.

Char(from:String)

Only characters from the supplied String are accepted. 

Char(except:String)

Any characters except those in the supplied String are accepted. 
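The two constructors differ only in whether the accept test is inverted. The sketch below models that idea with a hypothetical CharSketch type (the name, its stored properties and the accepts method are assumptions made for illustration, not the library's Char implementation).

```swift
// Hypothetical model of the accept/reject rule (NOT the library's Char).
struct CharSketch {
    let characters: Set<Character>
    let inverted: Bool

    // Only characters from the supplied string are accepted.
    init(from accepted: String) {
        characters = Set(accepted)
        inverted = false
    }

    // Any character except those in the supplied string is accepted.
    init(except rejected: String) {
        characters = Set(rejected)
        inverted = true
    }

    func accepts(_ character: Character) -> Bool {
        // Membership decides the result; inversion flips it.
        return characters.contains(character) != inverted
    }
}

let onlyA = CharSketch(from: "a")
let notA = CharSketch(except: "a")
let fromAcceptsA = onlyA.accepts("a")   // true
let exceptAcceptsA = notA.accepts("a")  // false
let exceptAcceptsB = notA.accepts("b")  // true
```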

OK Script

The Char state is represented by double quotes "", with the characters between the quotes used as the accept string. If you wish to invert the state (so that any character except the ones supplied is accepted), simply prefix the first " with a !

"a" //Only a accepted
!"a" //Everything except "a" accepted

The following escape codes may be used: \" for double quote, \\ for backslash, \n for newline, \t for tab and \r for carriage return.
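As an illustration of how these escape codes map to characters, here is a small sketch that decodes the text between the quotes. unescapeOKScript is a hypothetical helper, and returning nil for an unknown escape or a trailing backslash is an assumption of this sketch; the actual parser may behave differently.

```swift
// Hypothetical helper (not a library function): decodes the escape codes
// listed above. Unknown escapes and a dangling backslash yield nil here,
// which is a choice made for this sketch only.
func unescapeOKScript(_ body: String) -> String? {
    var result = ""
    var characters = body.makeIterator()
    while let character = characters.next() {
        // Anything other than a backslash is copied through unchanged.
        guard character == "\\" else {
            result.append(character)
            continue
        }
        // A backslash must be followed by one of the known codes.
        guard let escaped = characters.next() else { return nil }
        switch escaped {
        case "\"": result.append("\"")
        case "\\": result.append("\\")
        case "n": result.append("\n")
        case "t": result.append("\t")
        case "r": result.append("\r")
        default: return nil
        }
    }
    return result
}

let decodedTab = unescapeOKScript("a\\tb")  // "a", a tab, then "b"
let decodedQuote = unescapeOKScript("\\\"") // a single double quote
```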

Repeat

The repeat state counts the number of tokens issued by a child set of states (they become the root of the tokenisation process until the repeat fails or is satisfied). You may specify both a minimum and maximum number of times the state should be entered. 

Swift

Repeat(repeatingState:State,min:Int =1, max:Int?)

The repeating state can be any state (including, for example, a Branch() or another Repeat state). The Repeat state will be exited when repeatingState can no longer be entered. At this point any token specified on the state will be emitted, and its branches evaluated. A token will not be emitted if the minimum number of tokens has not been issued by repeatingState. As soon as the maximum number (if specified) is reached, the state will exit through its branches, or directly to the parent state.

OK Script

The OK Script very closely mirrors the Swift. 

(repeated-state[,min[,max]])

For example, to match exactly two hexadecimal digits:

("0123456789abcdefABCDEF"->hexDigit,2,2)->byte
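The minimum and maximum rules can be sketched independently of the library. The function below is a hypothetical model, not the Repeat implementation: it treats each accepted character as one token issued by the repeating state, stops as soon as the maximum is reached or a character is rejected, and returns nil when fewer than the minimum were seen (the case in which no token would be emitted).

```swift
// Hypothetical model of the min/max counting rule (NOT the library's Repeat).
// Returns how many leading characters were consumed, or nil if the minimum
// was not reached.
func repeatCount(in input: String,
                 accepting allowed: Set<Character>,
                 min minimum: Int = 1,
                 max maximum: Int? = nil) -> Int? {
    var count = 0
    for character in input {
        // Exit as soon as the maximum (if specified) has been reached.
        if let maximum = maximum, count >= maximum { break }
        // Exit when the repeating state can no longer be entered.
        guard allowed.contains(character) else { break }
        count += 1
    }
    return count >= minimum ? count : nil
}

let hexDigits = Set("0123456789abcdefABCDEF")
let byte = repeatCount(in: "ff00", accepting: hexDigits, min: 2, max: 2)    // 2
let tooShort = repeatCount(in: "fz", accepting: hexDigits, min: 2, max: 2)  // nil
```

In the first call only the first two digits are consumed, leaving "00" for whatever follows, which mirrors the exactly-two-digits example above.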

 

Delimited

Delimited states allow you to enter a completely different tokenisation strategy when a delimiter is encountered. You may specify a single delimiter (e.g. ' ) or separate opening and closing delimiters (e.g. [ and ] ). Unlike Repeat states, the tokens emitted by states inside the delimiter will be published in the normal fashion. The delimiter itself will issue the specified token when it is entered and exited.

Swift

Delimited(delimiter:String,states:TokenizationState...)

Creates a delimited state using a single string for both the start and end of the delimitation. Any number of states can be supplied and act like the root of a tokenizer until the delimiter character is encountered again.  

Delimited(open:String,close:String,states:TokenizationState...)

As above but a separate opening and closing delimiter can be specified. 

OK Script

Delimited states are specified between < and > characters. They take up to three parameters (just like the Swift constructors).

<'opening-delimiter'[,'closing-delimiter'], delimited-states>

If a closing delimiter is not specified the opening delimiter will be used. If you wish to use ' as the delimiter it must be escaped (\') and backslash can be used by escaping it also (\\). 

Only one character can be used for a delimiter. 

As an example, here any character is accepted in between quotation marks:

<'"',{!"\""->char}>->double-quote
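The behaviour described for delimited states (a token on entry, the inner tokens published normally, a token on exit) can be modelled with a short sketch. scanDelimited is a hypothetical function written for this illustration, not the library's Delimited state. It assumes the input starts with the opening delimiter, emits one inner token per enclosed character, and returns nil when the closing delimiter is never found.

```swift
// Hypothetical model of a delimited scan (NOT the library's Delimited).
// Returns the names of the tokens issued, or nil if the input does not
// start with the opening delimiter or is never closed.
func scanDelimited(_ input: String,
                   open: Character,
                   close: Character? = nil,
                   delimiterToken: String,
                   innerToken: String) -> [String]? {
    // If no closing delimiter is given, the opening one is reused.
    let closing = close ?? open
    var characters = input.makeIterator()
    guard characters.next() == open else { return nil }

    var tokens = [delimiterToken] // token issued on entry
    while let character = characters.next() {
        if character == closing {
            tokens.append(delimiterToken) // token issued on exit
            return tokens
        }
        tokens.append(innerToken) // inner tokens are published as usual
    }
    return nil // ran out of input before the closing delimiter
}

let quoted = scanDelimited("\"hi\"", open: "\"",
                           delimiterToken: "double-quote", innerToken: "char")
// quoted is ["double-quote", "char", "char", "double-quote"]
```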