HSP2HF

From OHRRPGCE-Wiki
This article is about the OHRRPGCE FMF project, which is an alternate implementation of the OHRRPGCE for Java mobile phones. Technical implementation details discussed here should not be confused with those of the RPG format.

Most of you will never have to touch Henceforth if you don't want to. (But who wouldn't want to?) The HSP2HF cross-compiler is used to automatically convert your HamsterSpeak into Henceforth bytecode. It is part of the RPG2XRPG conversion process.


How the Cross-Compiler Works: Motivations

Some parts of the OHRRPGCE FMF project are slightly incompatible with the standard OHRRPGCE, but for scripting this is simply unacceptable. A slightly jittery enemy HP value is immediately apparent when you first load your game on a phone. A silent error in a script, by contrast, costs hours of headache-inducing debugging that is probably not worth anything at all in the end.

So, the goal of the cross-compiler is script-level compatibility. Efficiency and conciseness, while important, take a backseat to this driving need.


How the Cross-Compiler Works: Naive Compiler

The Henceforth cross-compiler benefits from HamsterSpeak's tree-like structure; a naive conversion can simply convert each node to a local Henceforth function. Consider the following script (from Wandering Hamster, loaded in the HSP2HF utility):

[Image: FMF cross compiler screenshot.png]

As is typical of most HSZ scripts, "setnpcspeed" contains a head "do" node. This node happens to contain only one element, which takes three arguments, each of which is a simple number or variable. Clicking "cross-compile" will invoke the naive converter, which produces the following code:

 \[14]{
   [1]@
 }
 \[12]{
   3
 }
 \[10]{
   [0]@
 }
 \[4]{
   [10]()
   [12]()
   [14]()
   [HS:78]()
 }
 
 #Init local variables
 @[1]
 @[0]
 
 #Main script loop
 do_start
 [4]()
 do_end

Let's start from the local variables section:

 @[1]

This is a shorthand syntax; it basically calls "local store", a function deep within the script interpreter which does something like this:

 void localStore(int arg) {
   local_variables[arg] = pop_parent();
 }

Next, we have the main loop. The "do_start" and "do_end" primitives are there to help the "break" and "continue" primitives to function properly. The meat of the main loop is the call to:

 [4]()

...which is simply a call to a script-local subroutine.

The following code defines script-local subroutine 4:

 \[4]{
   [10]()
   [12]()
   [14]()
   [HS:78]()
 }

...as three script-local calls ([10], [12], and [14]) and one built-in function call (HamsterSpeak:78, alterNPC). The remaining local functions are equally easy to understand. For example, "[1]@" calls "local load" with "1" as an argument. Local load is defined internally as:

 void localLoad(int arg1) {
   push(local_variables[arg1]);
 }
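
As a rough illustration of the two helpers above, here is a minimal Python sketch. The Frame class, its field names, and the use of a plain list as the caller's stack are assumptions made for this example; they are not taken from the HVM source.

```python
class Frame:
    """Hypothetical script frame: its own operand stack, its locals, and the caller's stack."""

    def __init__(self, parent_stack, num_locals):
        self.parent_stack = parent_stack          # operand stack of the calling script
        self.stack = []                           # this script's own operand stack
        self.local_variables = [0] * num_locals

    def local_store(self, index):
        # @[index]: pop an argument from the parent's stack into a local variable.
        self.local_variables[index] = self.parent_stack.pop()

    def local_load(self, index):
        # [index]@: push the value of a local variable onto this script's stack.
        self.stack.append(self.local_variables[index])


# The caller pushed two arguments; "@[1]" followed by "@[0]" pops them in reverse order.
caller_stack = [7, 3]
frame = Frame(caller_stack, num_locals=2)
frame.local_store(1)
frame.local_store(0)
frame.local_load(0)
frame.local_load(1)
print(frame.local_variables, frame.stack)  # prints [7, 3] [7, 3]
```

Storing in reverse order (1, then 0) mirrors the "Init local variables" section of the listing: the last argument pushed by the caller is the first one popped.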

The HSP2HF utility is also an excellent example of a place where recommended syntax is ignored. Although alter_npc is more readable to humans, [HS:78]() is a lot easier for a compiler to parse. Likewise, do_start and do_end are cleaner machine syntax than do{ ... }.
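
The naive conversion described in this section can be sketched in a few lines of Python. This is only an illustration: the Node class, the three node kinds, and the function names are hypothetical, and the real utility handles many more node types. Each node becomes one script-local subroutine, with children emitted before their parent in the same order as the listing above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    kind: str      # "number", "local", or "builtin" in this sketch
    value: int     # literal value, local variable index, or built-in function ID
    children: list = field(default_factory=list)


def body_lines(node):
    """Return the Henceforth lines that make up one node's subroutine body."""
    if node.kind == "number":
        return [str(node.value)]
    if node.kind == "local":
        return ["[%d]@" % node.value]
    if node.kind == "builtin":
        calls = ["[%d]()" % child.node_id for child in node.children]
        return calls + ["[HS:%d]()" % node.value]
    raise ValueError("unsupported node kind: " + node.kind)


def naive_convert(node):
    """Emit one script-local subroutine per node, children (last to first) before the parent."""
    lines = []
    for child in reversed(node.children):
        lines += naive_convert(child)
    lines.append("\\[%d]{" % node.node_id)
    lines += ["  " + line for line in body_lines(node)]
    lines.append("}")
    return lines


# The single call inside "setnpcspeed": built-in 78 with three simple arguments.
call = Node(4, "builtin", 78, [
    Node(10, "local", 0),
    Node(12, "number", 3),
    Node(14, "local", 1),
])
listing = naive_convert(call)
print("\n".join(listing))
```

The variable initialization and the do_start / do_end wrapper are left out; they would be appended once per script rather than once per node.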


How the Cross-Compiler Works: Reasonable Inlining

Simple functions like "setnpcspeed" are very easy to inline, just by copying the leaf nodes' source into their parents. The previous script can be rewritten as:

 #Init local variables
 @[1]
 @[0]
 
 #Main script loop
 do_start
 [0]@
 3
 [1]@
 [HS:78]()
 do_end

...which is much more concise. Due to the nature of HF bytecode, inlining usually improves both performance and storage efficiency. (When I have time to profile, I hope to collect some facts to back this up.) However, inlining everything is often either impossible or unwise, which is why one needs a policy for inlining. At the moment, the OHRRPGCE FMF's cross-compiler uses the following algorithm to determine what to inline:

 1) Start by doing a naive conversion, and retain the tree structure. 
    Mark all nodes as "not dead" and with a "count" of 1.
 2) Loop until all nodes are inlined or dead. At each iteration, do the following.
 3) Determine which nodes are leaf nodes. (Leaf nodes have no children, or only dead children).
    a) If this node cannot be inlined (e.g., it's self-referential, or checks b/c/d/e below fail), 
       mark it as "dead".
    b) If this node's count is "1", inline it. (Copy its corresponding source into any node that 
       references it, incrementing their "count" value by 1, and then delete it.)
    c) If this node's count is "2", and it is referenced 8 times or fewer, inline it.
    d) If this node's count is "3", and it is referenced 4 times or fewer, inline it.
    e) If this node's count is "4" or more, only inline it if it is referenced exactly once.
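
Checks b through e above reduce to a small decision function. The Python sketch below is one reading of those thresholds, not the cross-compiler's actual code; the function name and parameters are made up for illustration, and it assumes the node has already passed the "can this be inlined at all" test from step a.

```python
def should_inline(count, references):
    """Apply checks b-e to a leaf node that is otherwise inlinable.

    count      -- the node's "count" value (1 for a freshly converted node)
    references -- how many times other nodes reference this node
    """
    if count == 1:
        return True                # check b: always inline
    if count == 2:
        return references <= 8     # check c
    if count == 3:
        return references <= 4     # check d
    return references == 1         # check e: count of 4 or more


# A few sample decisions.
for count, references in [(1, 20), (2, 8), (2, 9), (3, 4), (3, 5), (4, 1), (6, 2)]:
    print(count, references, should_inline(count, references))
```

The thresholds trade code size against call overhead: the larger a node has grown through earlier inlining, the fewer copies of it the policy is willing to make.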

We are still discussing what makes a node impossible to inline. Technically, the problem is difficult, but HamsterSpeak bytecode is fairly structured in nature, which means we can probably define a few simple criteria for exclusion.


Primitives & HSPEAK->HF Snippets

The cross-compiler inserts snippets of HF code when it encounters an HSPEAK node of a given type. For example, at node 10, given a "number" node with value 75, it inserts:

 \[10] {
   75
 }

Henceforth, we shall refer to a node of ID n as \[n] {}; this allows us to generalize HSPEAK nodes into simple templates. Just a reminder: \[n] {} represents a script-local subroutine; it is not valid Format T syntax.

The following templates are loaded when the HVM initializes, so the cross-compiler makes use of them to simplify syntax:


Templates

 #Template: -2 4 set_var will set local variable 1 to value 4
 \set_var {
   swap
   dup
   0 lt if {
     1 add -1 mult
     @[]
   } else {
     @[.G]
   }
 }
 
 #Template: 2 get_var will return the contents of global variable 2
 \get_var {
   dup
   0 lt if {
     1 add -1 mult
     []@
   } else {
     [.G]@
   }
 }
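
The ID convention used by these two templates (negative IDs select a local variable at index -(id+1), non-negative IDs select a global) can be restated in Python. This is a sketch only; the VariableStore class and its fixed-size lists are assumptions for the example.

```python
class VariableStore:
    """Hypothetical storage for one script's locals plus the shared globals."""

    def __init__(self, num_locals, num_globals):
        self.locals = [0] * num_locals
        self.globals = [0] * num_globals

    def set_var(self, var_id, value):
        if var_id < 0:
            # "1 add -1 mult" turns -1 into 0, -2 into 1, and so on; then @[]
            self.locals[-(var_id + 1)] = value
        else:
            # @[.G]
            self.globals[var_id] = value

    def get_var(self, var_id):
        if var_id < 0:
            return self.locals[-(var_id + 1)]   # []@
        return self.globals[var_id]             # [.G]@


store = VariableStore(num_locals=4, num_globals=4)
store.set_var(-2, 4)     # "-2 4 set_var" sets local variable 1 to 4
store.set_var(2, 9)      # "2 9 set_var" sets global variable 2 to 9
print(store.locals, store.globals)  # prints [0, 4, 0, 0] [0, 0, 9, 0]
```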


Here are the snippets used by the cross-compiler; we repeat numbers for the sake of completeness:


Numbers

Kind: number
ID: value
args[]: (none)
Henceforth snippet:

 \[n] {
   value
 }


Do Loops

Kind: flow
ID: do
args[]: node_x, node_y, ...
Henceforth snippet:

 \[n] {
   do{
     [x]()
     [y]()
     ...
   }
 }


If Statements

Kind: flow
ID: if
args[]: conditional_x, then_y, else_z
Henceforth snippet:

 \[n]{
   [x]()
   if {
     [y]()
   } else {
     [z]()
   }
 }


Then/Else Blocks

Kind: flow
ID: then/else
args[]: node_x, node_y, ...
Henceforth snippet:

 \[n] {
   [x]()
   [y]()
   ...
 }


Break/Continue

Kind: flow
ID: command
args[]: amount (if amount==1, skip amount; otherwise, append "_x" to command)
Henceforth snippet:

 \[n] {
   [amount]
   command
 }


Returning

Kind: flow
ID: return
args[]: value
Henceforth snippet:

 \[n] {
   value
   @[-1]
 }

Kind: flow
ID: exitscript (at a given depth)
args[]: (none)
Henceforth snippet:

 \[n] {
   invalid
   @[-1]
   depth
   break_x
 }

Kind: flow
ID: exitscript (at a given depth)
args[]: value
Henceforth snippet:

 \[n] {
   value
   @[-1]
   depth
   break_x
 }


While/For

Kind: flow
ID: while
args[]: conditional_x, do_y
Henceforth snippet:

 \[n] {
   do {
     [x]()
     not if {
       break
     }
     inline_y{
       y_command_1
       y_command_2
       ...
       y_command_z
     }
   }
 }

Kind: flow
ID: for
args[]: count_id_x, counter_start_s, count_end_e, counter_step_w, do_y
Henceforth snippet:

 \[n] {
   [x]()
   [s]()
   set_var
   do {
     [w]()
     0 gt
     [x]()
     get_var
     [e]()
     gt xor not
     [x]()
     get_var
     [e]()
     neq and
     if {
       break
     }
     inline_y{
       y_command_1
       y_command_2
       ...
       y_command_z
     }
     #Store counter + step back into the counter (set_var expects the ID, then the value)
     [x]()
     [x]()
     get_var
     [w]()
     add
     set_var
   }
 }

Note 1: The block inline_y simply unrolls the do_y block into the body of [n](). This is done so that break and continue will function properly.
Note 2: The upshot of this is that do_y will be instantly culled from the source, unless another node references it (which would be a bit of a hack, in my opinion). Regardless, this is fine, and will not affect program validity in any way.
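
The exit test in the "for" snippet is easier to check when written out in a conventional language. The Python sketch below mirrors the stack operations one by one; the function names are hypothetical, and it assumes the counter is advanced by the step at the end of every pass. A step of zero is not considered.

```python
def should_break(counter, end, step):
    """Mirror of the break test in the "for" snippet."""
    step_positive = step > 0                        # [w]() 0 gt
    past_end = counter > end                        # [x]() get_var [e]() gt
    same_side = not (step_positive ^ past_end)      # xor not
    return same_side and counter != end             # [x]() get_var [e]() neq and


def for_values(start, end, step):
    """Collect the counter values for which the loop body would run."""
    if step == 0:
        raise ValueError("a step of zero is outside the scope of this sketch")
    values = []
    counter = start
    while not should_break(counter, end, step):
        values.append(counter)
        counter += step
    return values


print(for_values(1, 5, 1))    # prints [1, 2, 3, 4, 5]
print(for_values(5, 1, -2))   # prints [5, 3, 1]
print(for_values(3, 1, 1))    # prints []
```

The extra "neq" term is what keeps the end value inclusive when counting downwards: with a negative step, "counter > end" is false at the end value itself, so without it the final pass would be skipped.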


Switch

Kind: flow
ID: switch
args[]: (not yet documented)
Henceforth snippet: ???

This is not yet documented in HSZ, so we will deal with it later.


Variable Access

Kind: global
ID: variable_x
args[]: (none)
Henceforth snippet:

 \[n] {
   [x]()
   [.G]@
 }

Kind: local
ID: variable_x
args[]: (none)
Henceforth snippet:

 \[n] {
   [x]()
   []@
 }


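Math Functions

Kind: math
ID: set_variable
args[]: lhs_l, rhs_r
Henceforth snippet:

 \[n] {
   [l]()
   [r]()
   set_var
 }

Kind: math
ID: increment_variable
args[]: lhs_l, rhs_r
Henceforth snippet:

 \[n] {
   #Push the variable ID twice: once for get_var, once for set_var
   [l]()
   [l]()
   get_var
   [r]()
   add
   set_var
 }

Kind: math
ID: decrement_variable
args[]: lhs_l, rhs_r
Henceforth snippet:

 \[n] {
   #Push the variable ID twice: once for get_var, once for set_var
   [l]()
   [l]()
   get_var
   [r]()
   sub
   set_var
 }

Kind: math
ID: not
args[]: lhs_l
Henceforth snippet:

 \[n] {
   [l]()
   not
 }

Kind: math
ID: and
args[]: lhs_l, rhs_r
Henceforth snippet:

 \[n] {
   [l]()
   if {
     [r]()
   } else {
     False
   }
 }

Kind: math
ID: or
args[]: lhs_l, rhs_r
Henceforth snippet:

 \[n] {
   [l]()
   not
   if {
     [r]()
   } else {
     True
   }
 }

Kind: math
ID: operand (if the operand is listed above, use that code block, not this one)
args[]: lhs_l, rhs_r
Henceforth snippet:

 \[n] {
   [l]()
   [r]()
   operand
 }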


Built-In and User-Defined Functions

Kind: built-in
ID: func_id_x
args[]: (none)
Henceforth snippet:

 \[n] {
   hspeak_api_call_x
 }

Kind: user-script
ID: func_id_x
args[]: (none)
Henceforth snippet:

 \[n] {
   user_script_call_x
 }


Resolution Engine

After cross-compiling, the HSP2HF utility is basically left with a sequence of bytecodes for each script. The final step is to lump these together into HF lumps. The size of a script, along with its potential to call other scripts, is used to properly group several scripts into one HF lump. To gather this information, a single pass of each script is made. Simultaneously, the resolution engine scans scripts to find simple optimizations which can be performed in place. A list of these optimizations follows.


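Found: number get_var
Replaced by:

 [-(number+1)]@  #If number is < 0 (local variable, as in the get_var template)
 [number.G]@     #Otherwise (global variable)

Reasoning: The parameter ID is often known.

Found: number value set_var
Replaced by:

 value @[-(number+1)]  #If number is < 0 (local variable, as in the set_var template)
 value @[number.G]     #Otherwise (global variable)

Reasoning: Same as above.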
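
A rewrite of this kind can be sketched as a single left-to-right pass over a token list. The Python below is a simplified illustration, not the resolution engine itself: the function names are hypothetical, it only recognizes a set_var whose value is a single token, and it follows the set_var and get_var templates in treating negative IDs as local variables.

```python
def is_int(token):
    """Return True if the token is an integer literal."""
    try:
        int(token)
        return True
    except ValueError:
        return False


def direct_ref(var_id):
    """Map a variable ID to the index used inside [..]@ or @[..]."""
    if var_id < 0:
        return str(-(var_id + 1))    # local variable
    return "%d.G" % var_id           # global variable


def optimize(tokens):
    """Replace "number get_var" and "number value set_var" with direct accesses."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and is_int(tokens[i]) and tokens[i + 1] == "get_var":
            out.append("[%s]@" % direct_ref(int(tokens[i])))
            i += 2
        elif i + 2 < len(tokens) and is_int(tokens[i]) and tokens[i + 2] == "set_var":
            out.append(tokens[i + 1])
            out.append("@[%s]" % direct_ref(int(tokens[i])))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


print(optimize(["2", "get_var"]))         # prints ['[2.G]@']
print(optimize(["-2", "get_var"]))        # prints ['[1]@']
print(optimize(["-2", "5", "set_var"]))   # prints ['5', '@[1]']
```

Sequences the pass does not recognize are copied through unchanged, so the output stays valid even when nothing can be simplified.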