The molecules database

About the database

The molecules database contains previously predicted GPS positions for about one million molecules, represented as SMILES strings. The data were collected from jobs submitted to ChemGPS-NP Web over a period of about ten years, and the database is now available for public search.

This database can also be queried/browsed using our JSON API.


This page shows some general help for using the web interface for the molecules database.


The search field

By default, the search string is matched against SMILES strings:

If you want the search string to match a specific column in the database, then you can prefix it with the column name:

Some search strings might contain whitespace characters. In this case, quote the search string:

To match multiple columns at once, supply a space-separated list of search strings, each prefixed by its column name:

Multiple search criteria are combined with a logical AND when matching results. All searches are case-insensitive.
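
The search syntax described above can be sketched as a small helper that assembles a search string. This is only an illustration: the column:value form follows the "ipaddr:xxx" example given under Advanced options, while the use of double quotes around the value and the function name build_search are assumptions, not part of the web interface.

```python
def build_search(criteria):
    """Assemble a search string from (column, value) pairs.

    A column of None means "no prefix", so the value is matched against
    SMILES strings by default. Quoting with double quotes is an assumption.
    """
    parts = []
    for column, value in criteria:
        # Quote values containing whitespace so they stay one criterion.
        if any(ch.isspace() for ch in value):
            value = '"' + value + '"'
        parts.append(value if column is None else f"{column}:{value}")
    # Space-separated criteria are combined with a logical AND.
    return " ".join(parts)


print(build_search([(None, "CCO")]))
print(build_search([("name", "acetic acid"), ("smiles", "CC(=O)O")]))
```

The resulting strings would be pasted into the search field; the helper itself does not contact the database.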

Fuzzy matching

When fuzzy matching is selected, a full-text index is used to match the search parameters.

This mode has the advantage of finding different combinations of the supplied SMILES string, but it may return false positives, especially when searching in multiple columns.

The default search mode is more traditional and supports, for example, searching for molecule fragments within the complete SMILES string stored in the database.

Using wildcards

Use wildcard characters to search for substrings. For example, this search will find all molecules collected in September 2017:

For fuzzy search there is no need to use wildcards.

Column names

These column names can be used as prefixes for search strings:

Name         Type      Description
name         text      The molecule name (may be missing or random)
smiles       text      The SMILES string (always present)
pos1 - pos8  float     One of the eight predicted positions
created      datetime  The date and time when the molecule was added to the database
ipaddr       text      The IP address from which the molecule was submitted
hostid       text      The host ID from which the molecule was added
md5sum       char(32)  The computed checksum for this molecule
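
As a rough sketch, the column names above can be collected into a list and used to check a prefix before building a search string. The names COLUMNS and is_valid_prefix are hypothetical helpers for local use, and the exact-match check is an assumption; the page does not say whether column prefixes themselves are case-sensitive.

```python
# Column names taken from the table above; "pos1 - pos8" is expanded
# into the eight individual position columns.
COLUMNS = (
    ["name", "smiles"]
    + [f"pos{i}" for i in range(1, 9)]
    + ["created", "ipaddr", "hostid", "md5sum"]
)


def is_valid_prefix(prefix):
    """Return True if prefix is one of the documented column names."""
    return prefix in COLUMNS


print(is_valid_prefix("pos3"))
print(is_valid_prefix("pos9"))
```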

Advanced options

The restrictions options

Selecting the option "restrict to molecules submitted from this computer" filters the search results to molecules collected from jobs submitted from your computer. This is equivalent to adding "ipaddr:xxx" as a search criterion, with xxx replaced by your IP address.

If "restrict to current used queue name" is selected, search results are also filtered on your currently used queue name. This is equivalent to adding "hostid:xxx" as a search criterion, with xxx taken from the hostid cookie used to track your currently used job queue.
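
The equivalence between the two restriction options and explicit search criteria can be sketched as follows. The function name apply_restrictions and the example values (a documentation-range IP address and a made-up host ID) are hypothetical; the real values come from your own connection and the hostid cookie.

```python
def apply_restrictions(search, ipaddr=None, hostid=None):
    """Append the criteria that the restriction options stand for."""
    parts = [search] if search else []
    if ipaddr is not None:
        # "restrict to molecules submitted from this computer"
        parts.append(f"ipaddr:{ipaddr}")
    if hostid is not None:
        # "restrict to current used queue name"
        parts.append(f"hostid:{hostid}")
    # All criteria are combined with a logical AND.
    return " ".join(parts)


print(apply_restrictions("smiles:CCO", ipaddr="192.0.2.10", hostid="example-queue"))
```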