The chemfp documentation contains many small examples of how to use the chemfp API for different tasks. On this page you'll find larger examples of how to integrate chemfp with other tools.
The entire repository can be cloned with Mercurial:
hg clone https://hg.sr.ht/~dalke/chemfp_examples
or downloaded as chemfp_examples-tip.tar.gz.
These programs use data from ChEMBL 28. The programs which use
fingerprints expect them to be in the file
chembl_28.fpb in the current directory. The programs which use
the SQLite database expect to find it in
chembl_28.db in the current directory. You can also
configure them to use a different location.
To get chembl_28.fpb, download
chembl_28.fpb.gz and uncompress it. Here
is one way to do it:
wget https://chemfp.com/datasets/chembl_28.fpb.gz
gunzip chembl_28.fpb.gz
The FPB file contains the ChEMBL 28 fingerprints, converted into FPB format, an authorization key that enables chemfp's Tanimoto search for that data set, and the license and other legal details for redistributing ChEMBL data.
While chemfp supports gzip-FPB files, it is typically significantly faster to work with uncompressed files.
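If gunzip is not available, the decompression step above can also be done from Python. This is a minimal sketch using only the standard library; the helper name gunzip_file is made up for this example and is not part of chemfp.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Decompress the gzip file at src_path into dst_path.

    The data is streamed in chunks so the whole file never has to
    fit in memory. Hypothetical helper, not part of chemfp.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path
```

For example, gunzip_file("chembl_28.fpb.gz", "chembl_28.fpb"). Unlike the gunzip command, this leaves the compressed file in place.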
To get chembl_28.db, download the
ChEMBL 28 SQLite distribution,
uncompress and untar it, and move the file
chembl_28.db to your working directory. Here is one way to do it:
wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_28/chembl_28_sqlite.tar.gz
tar xf chembl_28_sqlite.tar.gz
mv chembl_28/chembl_28_sqlite/chembl_28.db .
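The untar-and-move step can be sketched in Python with the standard tarfile module, which copies just the database file out of the archive without unpacking everything else. The helper name extract_member is made up for this example; the member path is the one used in the mv command above.

```python
import shutil
import tarfile


def extract_member(tar_path, member_name, dst_path):
    """Copy a single file out of a (possibly compressed) tar archive.

    Hypothetical helper. Raises KeyError if member_name is not in
    the archive.
    """
    with tarfile.open(tar_path, "r:*") as archive:
        src = archive.extractfile(member_name)
        if src is None:
            # extractfile() returns None for directories and other non-files
            raise ValueError(f"{member_name!r} is not a regular file")
        with src, open(dst_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return dst_path
```

For example, extract_member("chembl_28_sqlite.tar.gz", "chembl_28/chembl_28_sqlite/chembl_28.db", "chembl_28.db").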
chembl_search_cli.py is a command-line utility which finds the nearest neighbors in ChEMBL for a given query and outputs the results in SDF format. The query can be a SMILES string, a ChEMBL id, or a ChEMBL compound name. The SDF output is made by combining the neighbor id and score with the molfile and canonical SMILES from the SQLite database.
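The step of combining a hit with its database record might look roughly like the sketch below. It is not the code from chembl_search_cli.py. It assumes the ChEMBL SQLite tables molecule_dictionary and compound_structures with the column names shown. Putting the id in the molfile title line, and the tag names score and canonical_smiles, are choices made only for this illustration.

```python
import sqlite3

# Assumed ChEMBL schema: molecule_dictionary(molregno, chembl_id, ...)
# and compound_structures(molregno, molfile, canonical_smiles, ...).
LOOKUP_SQL = """
SELECT cs.molfile, cs.canonical_smiles
  FROM molecule_dictionary AS md
  JOIN compound_structures AS cs ON cs.molregno = md.molregno
 WHERE md.chembl_id = ?
"""


def make_sdf_record(db: sqlite3.Connection, chembl_id, score):
    """Return one SDF record for a hit, or None if no structure is found."""
    row = db.execute(LOOKUP_SQL, (chembl_id,)).fetchone()
    if row is None:
        return None
    molfile, smiles = row
    lines = molfile.rstrip("\n").split("\n")
    lines[0] = chembl_id  # the first molfile line is the title line
    record = "\n".join(lines) + "\n"
    # Each SDF data item is a "> <tag>" line, the value, then a blank line.
    for tag, value in (("score", f"{score:.3f}"), ("canonical_smiles", smiles)):
        record += f"> <{tag}>\n{value}\n\n"
    return record + "$$$$\n"
```

Writing the result of make_sdf_record(db, neighbor_id, score) for each neighbor, in order, gives a multi-record SD file.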
It also prints timing information to stderr. The default uses
memory-mapping to read the FPB file, which means the search time
includes some of the data load time. Use
--no-mmap to load the data
set entirely into memory before searching.
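To see why a memory-mapped file shifts load time into the first search, here is a small standard-library sketch. It has nothing chemfp-specific in it: a CRC-32 pass over the file stands in for a search because it touches every byte, and the function name timed_scan is made up for this example.

```python
import mmap
import time
import zlib


def timed_scan(path, use_mmap=True):
    """Return (crc32, open_seconds, scan_seconds) for a non-empty file.

    The CRC-32 pass is only a stand-in for a search over the data.
    """
    start = time.perf_counter()
    with open(path, "rb") as infile:
        if use_mmap:
            # Mapping is cheap: pages are read from disk lazily, on first access.
            data = mmap.mmap(infile.fileno(), 0, access=mmap.ACCESS_READ)
        else:
            # Reading pays the full load cost up front.
            data = infile.read()
    opened = time.perf_counter()
    try:
        checksum = zlib.crc32(data)
    finally:
        if use_mmap:
            data.close()
    scanned = time.perf_counter()
    return checksum, opened - start, scanned - opened
```

With use_mmap=True the open step is typically quick and the disk reads show up in the scan time; with use_mmap=False the cost moves to the open step. The actual numbers depend heavily on whether the file is already in the operating system's cache.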
Examples of use:
python chembl_search_cli.py CHEMBL25 --threshold 0.4
python chembl_search_cli.py caffeine -k 1
python chembl_search_cli.py 'c1ccccc1O' -k 10 --threshold 0.6
More help is available using
python chembl_search_cli.py --help.
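A command line like the ones above can be handled with argparse. The sketch below covers only the options mentioned on this page (the query, -k, --threshold and --no-mmap). It is not the program's real parser, and it leaves the defaults as None because the real defaults are not given here.

```python
import argparse


def build_parser():
    """Build a parser for the options shown above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Find the nearest ChEMBL neighbors of a query.")
    parser.add_argument(
        "query", help="a SMILES string, a ChEMBL id, or a compound name")
    parser.add_argument(
        "-k", type=int, default=None,
        help="number of nearest neighbors to report")
    parser.add_argument(
        "--threshold", type=float, default=None,
        help="minimum similarity score")
    parser.add_argument(
        "--no-mmap", dest="mmap", action="store_false",
        help="load the data set into memory instead of memory-mapping it")
    return parser
```

For example, build_parser().parse_args(["caffeine", "-k", "1"]) gives a namespace with query "caffeine", k set to 1, threshold left as None, and mmap set to True.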
chembl_neighbor_browser_flask.py is a web app for browsing through the k-nearest neighbors of a given ChEMBL id. The neighbors are sorted from most similar to least. Each neighbor is shown with its similarity score, an SVG image (served by the ChEMBL server), and its id. Colors are used to help distinguish between different levels of similarity.
This uses basic web technologies that would have been understood in the late 1990s. The web app implements a single function. It processes the query, does the chemfp similarity search, and passes the information to a Jinja2 template which generates the output.
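The ordering and color-coding of neighbors can be illustrated in plain Python. This sketch is a slow stand-in for chemfp's search, not its implementation: fingerprints are held as Python integers used as bit sets, and the score cutoffs and color names are invented for the example.

```python
def tanimoto(fp1, fp2):
    """Tanimoto similarity of two fingerprints stored as int bit sets."""
    union = bin(fp1 | fp2).count("1")
    if union == 0:
        return 0.0
    return bin(fp1 & fp2).count("1") / union


def nearest_neighbors(query_fp, targets, k=10, threshold=0.0):
    """Return up to k (id, score) pairs, most similar first.

    targets is an iterable of (id, fingerprint) pairs.
    """
    hits = [(target_id, tanimoto(query_fp, fp)) for target_id, fp in targets]
    hits = [hit for hit in hits if hit[1] >= threshold]
    hits.sort(key=lambda hit: hit[1], reverse=True)  # stable for tied scores
    return hits[:k]


# Hypothetical cutoffs and colors, checked from highest to lowest.
COLOR_BANDS = [(0.9, "green"), (0.7, "yellow"), (0.0, "lightgray")]


def similarity_color(score):
    """Map a similarity score to a display color."""
    for cutoff, color in COLOR_BANDS:
        if score >= cutoff:
            return color
    return COLOR_BANDS[-1][1]
```

A template can then loop over the (id, score) pairs and use similarity_color(score) as, say, a background color for each neighbor.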
One way to run it is through the standard Flask interface:
FLASK_APP=chembl_neighbor_browser_flask flask run
then go to the server URL, likely http://127.0.0.1:5000/.
chembl_search_streamlit.py is a web app based on Streamlit which finds the nearest neighbors in ChEMBL for a given query, displays the results in a table, and offers download links to get the result in SMILES or CSV formats. Streamlit makes it easy to turn a script into an interactive web app.
The query can be a SMILES string, a ChEMBL id, or a ChEMBL compound name. The number of neighbors to find and the minimum similarity threshold can be changed interactively. Or, press the button to go to a random id!
The chemfp search results are put in a Pandas table, along with a
compound name and SMILES, found by an identifier lookup in
chembl_28.db. The preferred name is used if
there is one; otherwise one of the other available compound names is
used, likely the IUPAC name. RDKit
is used to add a column containing an image of the structure.
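The name fallback can be expressed in a single SQL query. The sketch below is an assumption-heavy illustration, not the code from chembl_search_streamlit.py. It assumes the ChEMBL tables molecule_dictionary (pref_name), compound_records (compound_name) and compound_structures (canonical_smiles), and it treats compound_records.compound_name as the source of the "other" names, which may not match what the app does.

```python
import sqlite3

# Assumed ChEMBL schema; COALESCE picks pref_name when it is not NULL,
# otherwise an arbitrary non-NULL compound_name for the same molecule.
NAME_SQL = """
SELECT COALESCE(
         md.pref_name,
         (SELECT cr.compound_name
            FROM compound_records AS cr
           WHERE cr.molregno = md.molregno
             AND cr.compound_name IS NOT NULL
           LIMIT 1)
       ),
       cs.canonical_smiles
  FROM molecule_dictionary AS md
  LEFT JOIN compound_structures AS cs ON cs.molregno = md.molregno
 WHERE md.chembl_id = ?
"""


def annotate_hits(db: sqlite3.Connection, hits):
    """Turn (id, score) pairs into row dicts with a name and SMILES."""
    rows = []
    for chembl_id, score in hits:
        found = db.execute(NAME_SQL, (chembl_id,)).fetchone()
        name, smiles = found if found is not None else (None, None)
        rows.append({"id": chembl_id, "score": score,
                     "name": name, "smiles": smiles})
    return rows
```

The resulting list of dicts can be passed directly to pandas.DataFrame to build the table.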
Launch the script through Streamlit. By default this will create a local web server and have your browser connect to it.