Difference between revisions of "Offline BeepBeep 3 Benchmark1"

From CRV
Jump to: navigation, search
(A FO-LTL Specification)
(A FO-LTL Specification)
Line 167: Line 167:
 
&nbsp;&nbsp;(p<sub>1</sub>/status = WALKER &and; p<sub>2</sub>/status = BLOCKER &and; <br>
 
&nbsp;&nbsp;(p<sub>1</sub>/status = WALKER &and; p<sub>2</sub>/status = BLOCKER &and; <br>
 
&nbsp;&nbsp;p<sub>1</sub>/velocity/x &gt; 0 &and; |p<sub>1</sub>/position/x - p<sub>2</sub>/position/x| &lt; 6  &and; |p<sub>1</sub>/position/y - p<sub>2</sub>/position/y| &lt; 10)<br>
 
&nbsp;&nbsp;p<sub>1</sub>/velocity/x &gt; 0 &and; |p<sub>1</sub>/position/x - p<sub>2</sub>/position/x| &lt; 6  &and; |p<sub>1</sub>/position/y - p<sub>2</sub>/position/y| &lt; 10)<br>
&nbsp;&nbsp;&rarr; (<b>X</b> characters/character[id=i<sub>1</sub>]/velocity &lt; 0))
+
&nbsp;&nbsp;&rarr; (<b>X</b> characters/character[id=i<sub>1</sub>]/velocity/x &lt; 0))
  
 
Intuitively, the first line fetches the XML elements for the two characters and the ID of p<sub>1</sub>, the second line checks that they are respecively a Walker and a Blocker, the third checks that they collide, and the last checks that in the next event, the velocity of the walker has changed its sign.
 
Intuitively, the first line fetches the XML elements for the two characters and the ID of p<sub>1</sub>, the second line checks that they are respecively a Walker and a Blocker, the third checks that they collide, and the last checks that in the next event, the velocity of the walker has changed its sign.

Revision as of 11:28, 3 June 2016

Pingus Synthetic execution traces of the video game Pingus, used for runtime verification. All details regarding these traces can be found in the paper Automated Bug Finding in Video Games: A Case Study for Runtime Monitoring. We highly recommend you scan that paper first.

List of categories

Benchmark Data

Pingus is a clone of Psygnosis’ Lemmings game series made by Ingo Ruhnke. It regularly counts among the highest quality open source games available and was once ranked in the Top 10 Linux games by CNN. The game is divided into more than 70 levels, each of which being populated with various kinds of obstacles, walls, and gaps. Between 10 and 100 autonomous, penguin-like characters (the Pingus) progressively enter the level from a trapdoor and start walking across the area. A Pingu keeps walking in the same direction until it either reaches a wall (in which case it turns around) or falls into a gap (and dies, if it falls from too high).

The goal of the game is to have a minimum percentage of the incoming Pingus safely reach the exit door. To this end, the player can give special abilities to certain Pingus, allowing them to modify the landscape in order to create a walkable path to the goal. For example, some Pingus can become Bashers and dig into the ground; others can become Builders and construct a staircase to reach over a gap. Other abilities modify the behaviour of other Pingus: hence the Blocker stands in the way and makes any Pingu that reaches it turn around as if it encountered a wall.

The Trace Part

The dataset is made of about 20 traces consisting of a simulated execution of the game; each of these traces contains from 10 to 90 Pingus walking in the game field at the same time. You can download the trace files from DataHub (click on the button "Go to resource").

Event structure

All traces are in XML format. Each event of the trace has the following structure:

<message>
 <characters>
  <character>
   <timestamp>123456</timestamp>
   <id>0</id>
   <status>WALKER</status>
   <position>
    <x>62</x>
    <y>0</y>
   </position>
   <velocity>
    <x>1</x>
    <y>0</y>
   </velocity>
  </character>
  ...
 </characters>
</message>

The event is contained in a message element, which contains a single characters element. Within this element, there are multiple character elements, corresponding to the properties of a single Pingu on the game field. Each Pingu has a unique ID, a position (x-y) and a velocity vector (x-y). In addition, each Pingu has a status, which can be "walker" (normal state), "hidden" (not yet injected into the playing field) or any of these special abilities: "blocker", "basher", "blower", "builder", "faller".

IDs are unique within each individual message element, but obviously, they repeat from one event to the next. Some Pingus, however, may disappear at some point in the trace if they die, so the number of character elements is not fixed for the entire trace. Within each message, IDs do not appear in any particular order: this depends on the iterator on game elements provided by the game itself. We cannot assume either that IDs start at 0 or that they form a continuous sequence.

In this simulation, the game loop runs at about 100 Hz; hence message elements correspond to snapshots of the game's state roughly 0.01 seconds apart. The traces in the dataset correspond to approximately two minutes of playing in the same, simple level.

Naming convention

Trace files are labelled using the following convention: description-x.xml where x is the number of Pingus in the level, and description tells which of the properties described below this trace violates. A trace where no property is violated has all-ok for its description.

The Property Part

The paper on Pingus (see link at top of the page) describes many properties one can verify on the execution of the game. For the purpose of this benchmark, we picked three of them.

Property 1: spontaneous Pingu creation

Informal Description

From one event to the next, Pingus can only disappear from the game field; no Pingu can be created mid-game.

Demonstration Traces

Here are two counter-examples of this property (for the sake of clarity, we only list IDs occurring in each event):

  {0,1,2}, {0,1,2,4}
  {0,1,2}, {0,1,3}

Here are two examples of this property (for the sake of clarity, we only list IDs occurring in each event):

  {0,1,2}, {0,1,2}, {0,1,2}
  {0,1,2}, {0,2}, {0,2}, {0}
Formal Specification

This property can be checked in multiple ways. For example, in past-time first-order LTL, one can say that for every ID found in an event, in the previous event, there must be a tag with the same ID:

G (∀ x ∊ /characters/character/id : Y (∃ y ∊ /characters/character/id : x = y))

Another possibility is first to aggregate all IDs from each event into a set, and then check that the sequence of such sets is monotonically decreasing, etc.

A FO-LTL Specification

See above.

Property 2: endless bashing

Informal Description

Every Basher must become a Walker when it stops bashing. Bashers dig horizontal tunnels into walls; all these walls are of finite dimensions and hence a Basher that keeps bashing until the end of a level indicates a problem.

Note that it is not sufficient to check that no Bashers remain in the last event of the trace, since:

  • A Pingu could be a Basher and then disappear (not allowed)
  • A Pingu could change from being a Basher to something other than walker (not allowed either)
Demonstration Traces

This sequence of two events, anywhere in a trace, is illegal:

<message>
  <characters>
    <character>
      <id>0</id>
      <status>BASHER</status>
      ...
    </character>
   ...
   </characters>
</message>

<message>
  <characters>
    <character>
      <id>0</id>
      <status>BLOCKER</status>
      ...
    </character>
   ...
   </characters>
</message>

In the dataset, the traces labelled basher-blocker violate this property.

Formal Specification

This property introduces an additional complexity, as the same constraint applies independently for every Pingu in the trace. A first way of checking it is to "slice" the trace into subtraces, one for each ID. For each such slice, the property then becomes:

G status=BASHER → (status=BASHER U status=WALKER)

A FO-LTL Specification

Without resorting to slices, a specification in LTL-FO+ reads as:

G ∃ x ∊ characters/character/id : /characters/character[id=x]/status = BASHER → (/characters/character[id=x]/status = BASHER U /characters/character[id=x]/status = WALKER)

Note that the notation /character[id=x]/status means: "the value of the status element for the character element whose id has value x".

Property 3: turn around

Informal Description

A Walker encountering a Blocker must turn around and keep on walking.

Demonstration Traces

In the dataset, the traces labelled rebel-walker violate this property.

Formal Specification

The complexity added to this property is that one cannot resort to "slices" of a single Pingu, as we need to correlate in each event the x-y positions of two Pingus.

In this dataset, two Pingus collide if:

|x1-x2| < 6 and |y1-y2| < 10

where (xi,yi) are the x-y coordinates of Pingu i. One must therefore:

  • Find two Pingus with IDs i1 and i2 such that i1 is a Walker and i2 is a Blocker
  • Check that they collide using the formula above
  • If i1's x-velocity in the current event is positive, it should be negative in the next event, and vice versa
A FO-LTL Specification

To simplify, in LTL-FO+, we can break the specification in two cases: one for the Walker going to the right (x-velocity positive), and the other for the Walker going to the left. Here is the first of these cases:

G (∃ p1 ∊ characters/character : ∃ p2 ∊ characters/character : ∃ i1 ∊ p1/id :
  (p1/status = WALKER ∧ p2/status = BLOCKER ∧
  p1/velocity/x > 0 ∧ |p1/position/x - p2/position/x| < 6 ∧ |p1/position/y - p2/position/y| < 10)
  → (X characters/character[id=i1]/velocity/x < 0))

Intuitively, the first line fetches the XML elements for the two characters and the ID of p1, the second line checks that they are respecively a Walker and a Blocker, the third checks that they collide, and the last checks that in the next event, the velocity of the walker has changed its sign.

Clarification Requests

This is a space where other teams can ask for clarifications and the submitting team can answer. When a team requests a clarification, the team should be mentioned in the request and the request timestamped. The team submitting the benchmark should ensure they watch this page so that they get notifications of questions.