Knowledge Base Administration Guide

Duplicate Job detection (Alias)

Simscope automatically tracks and drops duplicate jobs during regression import.

Purpose: this feature prevents accidental duplicate (i.e. alias) job postings from compute nodes. Otherwise, duplicates would incorrectly skew regression statistics (e.g. pass rates).

Double publish

It's possible your codebase is publishing job-finish JSON files more than once per job (i.e. duplicate publishes).

  • If this occurs, Simscope attempts to discard these.

If you want to debug this in your codebase, you can use a wrapper script (log.sh) to get a log of all calls to simscope-tunnel:

  • Save this to log.sh and make it executable (chmod +x log.sh):
#!/bin/bash

set -e

# Record the timestamp and full command line, then run the command.
# "$@" (quoted) preserves arguments that contain spaces.
echo "$(date) | $*" >> ~/log.txt
exec "$@"

Then prefix your simscope-tunnel command with log.sh. For example:

/path/to/log.sh /path/to/simscope-tunnel --args...

Then re-run your regression; every publish command will be recorded in ~/log.txt.
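To spot double publishes in that log, you can strip the timestamp prefix and count identical command lines. This is an illustrative sketch, not part of Simscope itself; the sample log contents and file paths below are hypothetical:

```shell
#!/bin/bash
# Hypothetical sample of ~/log.txt in the "date | command" format
# written by log.sh above.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Tue May  7 02:00:11 CDT 2019 | /path/to/simscope-tunnel --publish job1.json
Tue May  7 02:00:12 CDT 2019 | /path/to/simscope-tunnel --publish job1.json
Tue May  7 02:00:13 CDT 2019 | /path/to/simscope-tunnel --publish job2.json
EOF

# Drop everything up to the " | " separator so identical commands
# published at different times compare equal, then count repeats.
# Any count greater than 1 is a double publish.
dupes=$(sed 's/^.* | //' "$LOG" | sort | uniq -c | awk '$1 > 1')
echo "$dupes"
rm -f "$LOG"
```

Here job1.json was published twice, so it is the only line reported.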


Duplicate detection logic

Duplicates are detected based on the following Job fields:

  1. Regression name
  2. Job Start Timestamp
  3. Job Category
  4. Signature (ie bucketized error message)
  5. Additional fields (consulted only when the job timestamp has 1-second precision)
    • Test config
    • Seed
    • Compute Time
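The field selection above can be sketched as a small shell function that builds a comparison key per job. This is illustrative only, not Simscope's actual implementation; the argument order and the "no sub-second digits means 1-second precision" heuristic are assumptions:

```shell
#!/bin/bash
# Build a duplicate-detection key from job fields:
#   $1=regr  $2=timestamp  $3=category  $4=signature
#   $5=config  $6=seed  $7=compute-ms
dedup_key() {
    local key="$1|$2|$3|$4"
    # Only when the timestamp is second-precision (no "." in it)
    # are the extra fields used to tell otherwise-identical jobs apart.
    case "$2" in
        *.*) ;;                      # sub-second precision: stop here
        *)   key="$key|$5|$6|$7" ;;  # second precision: add extra fields
    esac
    echo "$key"
}

# The two example jobs from this guide differ only in config,
# so with a second-precision timestamp their keys differ.
a=$(dedup_key smoke/10 2019-05-07T02:00:10-05:00 sim pass random+nowait 93c26740 12945)
b=$(dedup_key smoke/10 2019-05-07T02:00:10-05:00 sim pass flush+nowait  93c26740 12945)
echo "$a"
echo "$b"
```

With a sub-second timestamp the extra fields would be ignored, and the two jobs would produce identical keys, i.e. the second would be treated as a duplicate.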

Duplicate job example

For example, if two passing jobs are imported with:

regr:       smoke/10
timestamp:  2019-05-07T02:00:10-05:00
category:   sim
config:     random+nowait
seed:       93c26740
result:     pass
compute-ms: 12945

Then the second job would be skipped.

If you look at the Simscope console, you will see a warning similar to this, indicating that a duplicate job was skipped:

[INFO ] Rabbit import skipped job records (possible alias) regr=smoke/10 num-skipped=1

Different config (not duplicate)

If this job is imported:

regr:       smoke/10
timestamp:  2019-05-07T02:00:10-05:00
category:   sim
config:     flush+nowait
seed:       93c26740
result:     pass
compute-ms: 12945

It will be added, since the timestamp is second-precision and the config fields are different:

  • random+nowait
  • flush+nowait

Note: a different fix is to ensure the seed or compute-time fields have unique values.

  • If two identical pass jobs are imported from the same regression with a different seed or compute time, they will not alias.

Debugging duplicate jobs

By default, Simscope shows only a summary when duplicates are detected.

[INFO ] Rabbit import skipped job records (possible alias) regr=smoke/10 num-skipped=1

If you want to see details of each duplicate job, run the Simscope server with the additional --debug command-line switch.

> $SIMSCOPE_BIN/simscope.sh --debug

Then you will get log lines similar to the following, showing the details of each duplicate job:

[DEBUG ] Discarding duplicate job during import (alias)
regr=import/123 sig=2 id=2/b9pryrvuzcbe config=+foo +bar=12
  • In this case, you can open http://SIMSCOPE_URL/sigs/2/b9pryrvuzcbe in your browser to see the job that Simscope treated the incoming job as a duplicate of.