Example programs based on chemfp

#chemfp examples

The chemfp documentation contains many small examples of how to use the chemfp API for different tasks. On this page you'll find larger examples of how to integrate chemfp with other tools.

The entire repository can be cloned with Mercurial:

hg clone https://hg.sr.ht/~dalke/chemfp_examples

or downloaded as chemfp_examples-tip.tar.gz.

#Datasets

These programs use data from ChEMBL 28. The programs which use fingerprints expect to find them in the file chembl_28.fpb in the current directory. The programs which use the SQLite database expect to find it in chembl_28.db in the current directory. You can configure a different location, typically through the command-line arguments --fingerprints and --sqlite.
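As a rough illustration of how such options are usually wired up, here is a minimal argparse sketch. The function name make_parser, the help text, and the metavar are made up for this example, and it assumes the default database filename is chembl_28.db (the name produced by the download steps on this page). The real programs may define these options differently.

```python
import argparse


def make_parser():
    """Build a parser with the two dataset options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the dataset options; not the real program")
    # Defaults are assumed to be files in the current directory.
    parser.add_argument(
        "--fingerprints", metavar="FILENAME", default="chembl_28.fpb",
        help="location of the ChEMBL 28 fingerprints in FPB format")
    parser.add_argument(
        "--sqlite", metavar="FILENAME", default="chembl_28.db",
        help="location of the ChEMBL 28 SQLite database")
    return parser


# Override one location and leave the other at its default.
args = make_parser().parse_args(["--fingerprints", "/data/chembl_28.fpb"])
print(args.fingerprints, args.sqlite)
```

With this layout, a program run with no options looks only in the current directory, and each file can be relocated independently.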

#• To get chembl_28.fpb:

To get chembl_28.fpb, download chembl_28.fpb.gz and uncompress it. Here is one way to do it:

wget https://chemfp.com/datasets/chembl_28.fpb.gz
gunzip chembl_28.fpb.gz

The FPB file contains the ChEMBL 28 fingerprints, converted into FPB format, an authorization key that enables chemfp's Tanimoto search for that data set, and the license and other legal details for redistributing ChEMBL data.

While chemfp can read gzip-compressed FPB files, it is typically significantly faster to work with uncompressed files.
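If gunzip isn't available, the uncompress step can also be done with Python's standard library. This is only a sketch: gunzip_file is a helper name invented for this example, and unlike gunzip it leaves the .gz file in place.

```python
import gzip
import shutil


def gunzip_file(src, dst, chunk_size=1024 * 1024):
    """Decompress the gzip file `src` into `dst`, streaming in chunks.

    Streaming keeps memory use low even for a large fingerprint file.
    The compressed source file is not removed.
    """
    with gzip.open(src, "rb") as infile, open(dst, "wb") as outfile:
        shutil.copyfileobj(infile, outfile, chunk_size)


# Example call, assuming the download above is in the current directory:
#   gunzip_file("chembl_28.fpb.gz", "chembl_28.fpb")
```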

#• To get chembl_28.db:

To get chembl_28.db, download the ChEMBL 28 SQLite distribution, uncompress and untar it, and move the file chembl_28.db to your working directory. Here is one way to do it:

wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_28/chembl_28_sqlite.tar.gz
tar xf chembl_28_sqlite.tar.gz
mv chembl_28/chembl_28_sqlite/chembl_28.db .
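The untar-and-move steps can also be collapsed into a single pass with Python's tarfile module, which copies just the database file out of the archive without unpacking the directory tree. This is a sketch: extract_member is a helper name made up for this example, and the member path is taken from the mv command above.

```python
import shutil
import tarfile

# Path of the database inside the archive, as used by the mv command above.
MEMBER = "chembl_28/chembl_28_sqlite/chembl_28.db"


def extract_member(archive, member, dst):
    """Copy one regular file out of a (possibly compressed) tar archive."""
    with tarfile.open(archive) as tar:
        # Raises KeyError if the member is missing from the archive.
        src = tar.extractfile(member)
        if src is None:
            raise ValueError(f"{member!r} is not a regular file")
        with src, open(dst, "wb") as outfile:
            shutil.copyfileobj(src, outfile)


# Example call, assuming the archive is in the current directory:
#   extract_member("chembl_28_sqlite.tar.gz", MEMBER, "chembl_28.db")
```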

#Example programs

chembl_search_cli.py is a command-line utility which finds the nearest neighbors in ChEMBL for a given query and outputs the results in SDF format. The query can be a SMILES string, a ChEMBL id, or a ChEMBL compound name. The SDF output is made by combining the neighbor id and score with the molfile and canonical SMILES from the SQLite database.

It also prints timing information to stderr. By default it memory-maps the FPB file, which means the reported search time includes some of the data load time. Use --no-mmap to load the data set entirely into memory before searching.
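To make the SDF assembly step concrete, here is a small self-contained sketch. It assumes the molfile and canonical SMILES come from the compound_structures table, joined to molecule_dictionary by molregno. The data item tag names, the helper names, and the tiny in-memory stand-in database (with a placeholder methane record named DEMO1) are invented for this example and are not necessarily what chembl_search_cli.py uses.

```python
import sqlite3

# Assumed lookup: ChEMBL id -> molfile and canonical SMILES.
LOOKUP_SQL = """
SELECT cs.molfile, cs.canonical_smiles
  FROM molecule_dictionary AS md
  JOIN compound_structures AS cs ON cs.molregno = md.molregno
 WHERE md.chembl_id = ?
"""

# Placeholder structure (methane) so the sketch runs without ChEMBL.
DEMO_MOLFILE = (
    "\n  demo\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)


def lookup_structure(db, chembl_id):
    """Return (molfile, canonical_smiles), or None if the id is unknown."""
    return db.execute(LOOKUP_SQL, (chembl_id,)).fetchone()


def make_sdf_record(molfile, chembl_id, score, smiles):
    """Combine a molfile with id, score, and SMILES data items."""
    lines = [molfile.rstrip("\n")]
    # Hypothetical tag names; each data item ends with a blank line.
    items = (("chembl_id", chembl_id),
             ("score", f"{score:.3f}"),
             ("canonical_smiles", smiles))
    for tag, value in items:
        lines += [f"> <{tag}>", str(value), ""]
    lines.append("$$$$")  # SDF record terminator
    return "\n".join(lines) + "\n"


def make_demo_db():
    """Build an in-memory database with only the columns used above."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE molecule_dictionary "
               "(molregno INTEGER, chembl_id TEXT)")
    db.execute("CREATE TABLE compound_structures "
               "(molregno INTEGER, molfile TEXT, canonical_smiles TEXT)")
    db.execute("INSERT INTO molecule_dictionary VALUES (1, 'DEMO1')")
    db.execute("INSERT INTO compound_structures VALUES (1, ?, 'C')",
               (DEMO_MOLFILE,))
    return db


db = make_demo_db()
molfile, smiles = lookup_structure(db, "DEMO1")
print(make_sdf_record(molfile, "DEMO1", 0.5, smiles), end="")
```

Against the real database, the same query would be run once per neighbor returned by the similarity search.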

Examples of use:

python chembl_search_cli.py CHEMBL25 --threshold 0.4
python chembl_search_cli.py caffeine -k 1
python chembl_search_cli.py 'c1ccccc1O' -k 10 --threshold 0.6

More help is available using python chembl_search_cli.py --help.
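For readers unfamiliar with how -k and --threshold interact, the following pure-Python sketch shows the general idea of a thresholded k-nearest Tanimoto search over byte-string fingerprints. It is not chemfp's implementation, which is far faster, and the function names, the toy one-byte fingerprints, and the choice to score two empty fingerprints as 0.0 are assumptions made for this example.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    both = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    either = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    # Assumption: two fingerprints with no bits set score 0.0.
    return both / either if either else 0.0


def knearest(query_fp, targets, k, threshold=0.0):
    """Return up to k (id, score) pairs with score >= threshold, best first."""
    hits = []
    for target_id, target_fp in targets:
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            hits.append((target_id, score))
    # sorted() is stable, so ties keep their original target order.
    hits.sort(key=lambda hit: -hit[1])
    return hits[:k]


# Toy one-byte fingerprints, for illustration only.
targets = [("A", b"\x0f"), ("B", b"\x03"), ("C", b"\xf0")]
print(knearest(b"\x0f", targets, k=10, threshold=0.4))
```

The threshold removes weak matches first, and k then caps how many of the remaining hits are reported, so a search can return fewer than k results.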

#★ Neighbor browser using Flask

chembl_neighbor_browser_flask.py is a web app for browsing through the k-nearest neighbors of a given ChEMBL id. The neighbors are sorted from most similar to least. Each neighbor is shown with its similarity score, an SVG image (served by the ChEMBL server), and its id. Colors are used to help distinguish between different levels of similarity.

This uses basic web technologies that would have been understood in the late 1990s. The web app implements a single function. It processes the query, does the chemfp similarity search, and passes the information to a Jinja2 template which generates the output.

The template contains an HTML form that lets you enter a new ChEMBL id or change the value of 'k' for the k-nearest neighbors. There is a small amount of JavaScript to auto-submit the form, which makes it feel a bit like a modern AJAX/single-page application.
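The color coding of similarity levels mentioned earlier can be as simple as mapping score bands to CSS colors and passing the result to the template. The cutoffs, colors, and function name below are invented for illustration and are not the ones the web app uses.

```python
# Hypothetical similarity bands, from most to least similar.
# Each entry is (minimum score, CSS color).
SIMILARITY_BANDS = [
    (0.9, "#1a9850"),  # very similar
    (0.7, "#91cf60"),
    (0.5, "#fee08b"),
    (0.0, "#fc8d59"),  # weakly similar
]


def similarity_color(score):
    """Return the CSS color of the first band whose cutoff the score meets."""
    for cutoff, color in SIMILARITY_BANDS:
        if score >= cutoff:
            return color
    # Scores below every cutoff fall back to the last band.
    return SIMILARITY_BANDS[-1][1]


for score in (0.95, 0.7, 0.1):
    print(score, similarity_color(score))
```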

Example use:

python chembl_neighbor_browser_flask.py

or use the more standard Flask command-line interface:

FLASK_APP=chembl_neighbor_browser_flask flask run

then go to the server URL, likely http://127.0.0.1:5000/.

You should see something like:

chembl_search_streamlit.py is a web app based on Streamlit which finds the nearest neighbors in ChEMBL for a given query, displays the results in a table, and offers download links to get the result in SMILES or CSV formats. Streamlit makes it easy to turn a script into an interactive web app.

The query can be a SMILES string, a ChEMBL id, or a ChEMBL compound name. The number of neighbors to find and the minimum similarity threshold can be changed interactively. Or, press the button to go to a random id!

The chemfp search results are put in a Pandas table, along with a compound name and SMILES, found by an identifier lookup using the SQLite database chembl_28.db. The preferred name is used if there is one, otherwise one of the other available compound names is used, likely the IUPAC name. RDKit's PandasTools module is used to add a column containing an image of the structure.
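The name fallback described above amounts to "use the preferred name if present, otherwise the first other usable name". A minimal sketch, with a function name and argument layout made up for this example (how the real program orders the alternative names isn't covered here):

```python
def choose_name(pref_name, other_names):
    """Return the preferred name, else the first non-empty alternative.

    Returns None when no usable name is available.
    """
    if pref_name:
        return pref_name
    for name in other_names:
        if name:
            return name
    return None


print(choose_name("CAFFEINE", ["1,3,7-trimethylxanthine"]))
print(choose_name(None, ["", "1,3,7-trimethylpurine-2,6-dione"]))
```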

Example of use:

python chembl_search_streamlit.py

By default this will create a local web server and have your browser connect to it.

You should see something like: