The convexity API

Web interface

The convexity containment check against the reference sets is also available from the queue listing in the web interface by clicking the convexity button. Running the check from there provides essentially the same functionality as this API and accepts both a list of positions and a JSON document as input.

Queue

Call this API to check whether the predicted GPS positions are contained in one or more of the reference sets. The positions are either supplied in the payload or read from an already finished job.

Both the number of dimensions and the reference sets to use can be selected. For large datasets, the convexity check can be offloaded as a queued job using the convexity task, which makes the computation asynchronous.

The input structure

The request payload can contain identity or positions (not both at the same time), dimensions, algorithm, refset and offload. Apart from the input data (identity or positions), all keys are optional and have suitable defaults. See the respective sections below for more information.
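The rules above (exactly one of identity or positions, everything else optional) can be enforced on the client before sending anything. The sketch below uses a hypothetical helper, build_request, that is not part of the API; it leaves unset keys out so the server applies its own defaults. The accepted key names are the ones described on this page, including the references and points echo flags from the response section.

```python
# Optional keys described on this page; any other key is rejected locally.
OPTIONAL_KEYS = {"dimensions", "algorithm", "refset", "offload", "references", "points"}


def build_request(identity=None, positions=None, **options):
    """Hypothetical helper: assemble a convexity request payload.

    Exactly one of identity or positions must be given. Options whose value
    is None are dropped so the server falls back to its defaults.
    """
    if (identity is None) == (positions is None):
        raise ValueError("supply exactly one of identity or positions")
    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if identity is not None:
        payload = {"identity": identity}
    else:
        payload = {"positions": positions}
    # Copy only the options that were actually set.
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload
```

For example, build_request(identity={"jobid": "...", "result": "..."}, dimensions=3, refset=["cubes"]) gives the same structure as the identity-based curl example under Requests.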


To get a feel for what a request looks like, here is an example that supplies the data and checks containment over the first three dimensions:

{
    "offload": false,
    "dimensions": 3,
    "algorithm": "linprog",
    "refset": [
        "set1",
        "set2"
    ],
    "positions": [
        [
            -6.356088,
            -1.367276,
            -0.882047,
            -1.370692,
            0.689635,
            -0.571485,
            -1.360353,
            -0.879498
        ],
        [
            -6.225663,
            -1.302499,
            -0.747851,
            -1.284257,
            0.583682,
            -0.362617,
            -1.209907,
            -0.813425
        ]
    ]
}

Replace positions with identity to compute convexity for an existing job. All positions are then fetched automatically from the job, and no data needs to be sent along with the request.

Response message

The response contains the inside and refset sections, where the inside section lists the computed result for each selected reference set. Because the dimensions and algorithm are decided dynamically unless explicitly requested, they are also included for information.

{
    "inside": {
        "set1": [
            false,
            true
        ],
        "set2": [
            true,
            true
        ]
    },
    "refset": [],
    "algorithm": "convhull",
    "dimensions": 3
}

If references is set to true in the input document, the reference set points are echoed back in the refset section. The reference set data might be useful for plotting, but it is excluded unless explicitly requested. If points is set to true in the request, the positions are also included in the response.

Using a table is perhaps a better way to describe the relationship between input and response:

Index (positions)   set1 (refset)   set2 (refset)
0                   false           true
1                   true            true
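The same table can be built from a response in code. This minimal sketch (inside_table is a hypothetical name) relies only on the one-to-one mapping between result indexes and position indexes, and assumes every result array has the same length.

```python
def inside_table(response):
    """Turn the "inside" section of a convexity response into table rows.

    Each row is (index, {set name: bool}); index i refers to row i of the
    positions that were checked.
    """
    inside = response["inside"]
    names = sorted(inside)
    count = len(inside[names[0]]) if names else 0
    return [(i, {name: inside[name][i] for name in names}) for i in range(count)]


# The example response from above, as a Python dict.
example = {
    "inside": {"set1": [False, True], "set2": [True, True]},
    "refset": [],
    "algorithm": "convhull",
    "dimensions": 3,
}

for index, row in inside_table(example):
    print(index, row)
```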
Performance

Some experimentation is required to find the optimal combination of settings.

The number of dimensions influences performance, since the number of vertices grows with it and makes the computation more complex. The choice of algorithm also influences computing time. The number of selected reference sets and the number of rows in positions both have a linear effect.

Requests

Both input and response contain JSON-encoded data. GET requests are supported, but due to encoding issues POST is recommended. The curl command can be used for exploring the API:

curl -XPOST "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1" -d '{"dimensions":3,"identity":{"jobid":"00a0cacb-fcce-4e96-ae1f-b6e2ceddbb3d","result": "15419047559485"},"refset":["cubes"]}'
{
    "status": "success",
    "result": {
        "inside": {
                "cubes": [ 
                        // array of booleans 
                ]
        }, 
        "refset": {}
    }
}
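The same POST can be sent from Python using only the standard library. This is a sketch, not an official client: make_request and post_convexity are hypothetical names, and the status/result envelope check assumes the response shape shown above. No Content-Type header is set, so urllib sends application/x-www-form-urlencoded, the same default that curl -d uses.

```python
import json
import urllib.request

# Endpoint taken from the curl examples on this page.
API_URL = "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1"


def make_request(payload, url=API_URL):
    """Build (but do not send) a POST request carrying a JSON payload."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def post_convexity(payload, url=API_URL, timeout=60):
    """Send the request and return the "result" section of the response.

    Assumes the {"status": ..., "result": ...} envelope shown above; any
    status other than "success" is raised as an error.
    """
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"convexity request failed: {body}")
    return body["result"]
```

Calling post_convexity({"dimensions": 3, "identity": {...}, "refset": ["cubes"]}) mirrors the curl command above.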

Passing a JSON document from a file is probably more convenient, as the data set might be large:

curl -XPOST "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1" -i -d @indata.json
{
    "status": "success",
    "result": {
        "inside": {
                "cubes": [
                        // array of booleans
                ]
        }, 
        "refset": {}
    }
}

Testing (the unit cube)

The reference set "cubes" is provided for testing the API and defines an 8-dimensional unit hypercube spanning (0,0,0) -> (1,1,1) when projected onto 3D space.
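Because the convex hull of a unit hypercube's vertices is the box [0, 1]^d, the expected answer for "cubes" can be computed locally with a per-coordinate range test and compared with the API result. This sketch assumes "cubes" is exactly that hypercube and treats the surface as inside; the API's numerical handling of boundary points may differ. inside_unit_cube is a hypothetical name.

```python
def inside_unit_cube(position, dimensions=3, tol=0.0):
    """Return True if the first `dimensions` coordinates lie in [0, 1].

    `tol` widens the interval slightly to absorb rounding in the input.
    """
    coords = position[:dimensions]
    if len(coords) < dimensions:
        raise ValueError("position has fewer values than dimensions")
    return all(-tol <= x <= 1.0 + tol for x in coords)


print(inside_unit_cube([0.5, 0.5, 0.5]))           # True
print(inside_unit_cube([0.5, 1.5, 0.5]))           # False
print(inside_unit_cube([0.2, 0.3, 0.4, 2.0], 3))   # True: only 3 columns used
```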

About the convhull algorithm in 8 dimensions

Because computing containment with the convex hull algorithm in all 8 dimensions is slow even for a small set of positions, it has been prohibited. Either use linprog as the algorithm for dimensions > 3 or leave algorithm undefined (and let the system decide).

Download the JSON document (linprog-8d-cubes.json) to disk and use it for testing:

curl -XPOST "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1" -i -d @linprog-8d-cubes.json
HTTP/1.1 200 OK
Date: Wed, 05 Dec 2018 22:49:59 GMT
Server: Apache
X-Powered-By: PHP/7.1.22
Set-Cookie: PHPSESSID=h83nlqf0mtvct9dhu0oo4uddul; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Content-Length: 151
Content-Type: application/json

{
    "status": "success",
    "result": {
        "refset": [],
        "inside": {
            "cubes": [
                true,true,true,true,true,true,true,true,true,false,false,false,false,false,false,false
            ]
        }
    }
}

JSON document

Use data from an existing job (identity)

Supply the identity of an existing job to compute the convexity for. The positions are read from the job result. Note that using identity is mutually exclusive with using positions.

The job identity can be obtained from the queue or opendir methods in the queue API. A single identity object is accepted for this key:

{
    "identity": {
        "jobid": "00a0cacb-fcce-4e96-ae1f-b6e2ceddbb3d",
        "result": "15419047559485"
    }
}

Use supplied data (positions)

The data for the convexity computation is supplied directly in the payload. Note that using positions is mutually exclusive with using identity.

The positions data is a two-dimensional array where each child array contains 3 to 8 floating-point values:

{
    "positions": [
        [
            -6.356088,
            -1.367276,
            -0.882047
        ],
        [
            "..."
        ]
    ]
}

The number of elements in each child array has to be greater than or equal to the dimensions value in the payload. If the dimensions key is missing, it defaults to three, which means that each child array must have at least three values.
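The length rule above, together with the dimensions key selecting columns from index 0, can be checked before a request is sent. A minimal sketch with a hypothetical helper, select_columns, that mirrors the documented rules rather than the server code:

```python
def select_columns(positions, dimensions=3):
    """Keep the first `dimensions` values of each row.

    Raises ValueError for a row that is too short, mirroring the rule that
    every child array must hold at least `dimensions` values.
    """
    if not 1 <= dimensions <= 8:
        raise ValueError("dimensions must be between 1 and 8")
    trimmed = []
    for index, row in enumerate(positions):
        if len(row) < dimensions:
            raise ValueError(f"row {index} has {len(row)} values, need {dimensions}")
        trimmed.append([float(x) for x in row[:dimensions]])
    return trimmed
```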

The complexity (dimensions)

Select the number of dimensions to compute the convexity over; three is used by default.

The first three dimensions are the most characteristic for the ChemGPS-NP model, so limiting the computation to the first three dimensions gives a good approximation.

{
    "dimensions": 3
}

In practice, the only two interesting values for dimensions are 3 and 8, but feel free to use any other value between one and eight. The dimensions value is intimately connected with the positions data (whether it was supplied directly or read from an existing job) in that it defines the number of columns to use from each "row" (child array in positions), starting from index 0.

Select computing method (algorithm)

The default algorithm should be fine, but it's possible to select one explicitly.

The available names for algorithm are linprog and convhull. The convhull algorithm should not be used for dimensions > 3 due to poor performance, and its use is prohibited in that case.

{
    "algorithm": "linprog"
}
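This page does not describe what linprog does internally. A common way to test convex-hull membership with linear programming, given here as an assumption about what the name refers to, is a feasibility problem. With reference points v_1, ..., v_n and a position p, each truncated to the selected number of dimensions d, p lies inside the convex hull exactly when non-negative weights summing to one reproduce p:

```latex
\exists\, \lambda \in \mathbb{R}^{n} : \quad
\sum_{i=1}^{n} \lambda_i v_i = p, \qquad
\sum_{i=1}^{n} \lambda_i = 1, \qquad
\lambda_i \ge 0 \quad (i = 1, \dots, n),
\qquad v_i,\, p \in \mathbb{R}^{d}
```

Only feasibility matters, so any objective (for example zero) will do. This formulation never enumerates the hull's facets, whose number can grow quickly with d, which is consistent with linprog being the recommended choice above three dimensions.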

Reference sets (refset)

Select the reference sets to check the positions for containment within.

The refset key is either an array or the string '*'. For each selected reference set, the output result contains an array of true/false values, keyed by the set name, indicating whether each position is contained by that reference set. Array indexes in each result array map one-to-one to indexes in the positions array.

The reference sets are subject to tweaks. Running the same computation in the future might yield slightly different results if the reference sets have been curated.

By default, all reference sets are selected. This is equivalent to using the asterisk:

{
    "refset": "*"
}

Supply an array of reference set names to select these only:

{
    "refset": [
        "set1",
        "set2"
    ]
}
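The two accepted forms of refset can be resolved to a concrete list of names on the client, for example to know which keys to expect under inside. In this sketch, select_refsets is a hypothetical helper and `available` is a caller-supplied list of set names; this page does not describe how to list the sets on the server.

```python
def select_refsets(refset, available):
    """Resolve the refset key to a list of reference set names.

    "*" (or a missing key, passed as None) selects every available set;
    an array selects only the listed names, in the given order.
    """
    if refset is None or refset == "*":
        return list(available)
    if isinstance(refset, str):
        raise ValueError("refset must be '*' or an array of names")
    missing = [name for name in refset if name not in available]
    if missing:
        raise ValueError(f"unknown reference sets: {missing}")
    return list(refset)
```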

Perform async computation (offload)

If enabled, the computation is offloaded to run as a task executing in the background.

The task is scheduled for execution as a convexity task that can be monitored (polled for completion) like any other job using the queue API. Once it has finished, the result can be downloaded.

{
    "offload": true
}

Instead of the computed result, the response contains a queued job object structure:

curl -XPOST "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1" -i -d @linprog-8d-cubes.json
{
    "status": "success",
    "result": {
        "identity": {
            "jobid": "50a7f2bb-d2f1-46e3-b596-9cd5032b7b79",
            "result": "f528764d624db129b32c21fbca0cb8d6"
        },
        "status": {
            "queued": {
                "date": "2018-12-06 04:17:47.287722",
                "timezone_type": 3,
                "timezone": "Europe/Stockholm"
            },
            "started": null,
            "finished": null,
            "state": "pending"
        },
        "submit": {
            "task": "convexity",
            "name": null
        }
    }
}

The identity structure can be passed to the queue API (the stat method) to check its status:

curl -XPOST "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1" -i -d '{"jobid": "50a7f2bb-d2f1-46e3-b596-9cd5032b7b79","result": "f528764d624db129b32c21fbca0cb8d6"}'
{
    "status": "success",
    "result": {
        "queued": {
            "date": "2018-12-06 04:17:47.287722",
            "timezone_type": 3,
            "timezone": "Europe/Stockholm"
        },
        "started": null,
        "finished": null,
        "state": "pending"
    }
}
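Polling the stat method can be wrapped in a small loop that stops once the state leaves pending or running. In this sketch, wait_for_job is a hypothetical helper and fetch_state is a caller-supplied function (also hypothetical) that performs the stat request for the job identity and returns the state string from the result. The loop returns the final state and leaves it to the caller to check for success.

```python
import time


def wait_for_job(fetch_state, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll a job until its state leaves "pending"/"running".

    Raises TimeoutError if the job is still pending or running after
    roughly `timeout` seconds of waiting.
    """
    waited = 0.0
    while True:
        state = fetch_state()
        if state not in ("pending", "running"):
            return state
        if waited >= timeout:
            raise TimeoutError(f"job still {state} after {waited:.0f} s")
        sleep(interval)
        waited += interval
```

Passing sleep as a parameter keeps the loop easy to test without real delays.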

When the state switches from pending or running to success, the job has completed without errors. We can now continue by downloading the result:

curl -XPOST "http://chemgps.bmc.uu.se/batchelor/api/convexity/?pretty=1" -i -d '{"job":{"jobid": "50a7f2bb-d2f1-46e3-b596-9cd5032b7b79","result": "f528764d624db129b32c21fbca0cb8d6"},"file":"result/output"}'
HTTP/1.1 200 OK
Date: Thu, 06 Dec 2018 03:38:01 GMT
Server: Apache
X-Powered-By: PHP/7.1.22
Set-Cookie: PHPSESSID=snhsk242hvgbccrhdloo0h81m7; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Content-Disposition: attachment; filename="output"
Content-Length: 181
ETag: 4b7cac3f522fb2a2fa858b0902b56353
Content-Type: text/plain;charset=UTF-8

{"inside": {"cubes": [true, true, true, true, true, true, true, true, true, false, false, false, false, false, false, false]}, "algorithm": "linprog", "dimensions": 3, "refset": {}}

Notice that the identity data should be placed inside the job key. The response is intended as a file download that opens a "save as" dialog, and it carries the appropriate HTTP headers, including text/plain as MIME type, since JSON files are just plain text.

See the queue API for further information about monitoring jobs and reading results.
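The download request body can be derived from the object returned when offload is true, and the downloaded file parsed as JSON. A minimal sketch with hypothetical helper names; the nesting under the job key and the file name result/output follow the curl example above.

```python
import json


def download_payload(offload_result, file="result/output"):
    """Build the download request body from an offloaded job.

    `offload_result` is the "result" object returned when offload is true;
    its identity is nested under the "job" key.
    """
    identity = offload_result["identity"]
    return {"job": {"jobid": identity["jobid"], "result": identity["result"]}, "file": file}


def summarize_output(text):
    """Count (inside, outside) positions per reference set in a result file."""
    inside = json.loads(text)["inside"]
    return {name: (sum(flags), len(flags) - sum(flags)) for name, flags in inside.items()}
```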
