The chemfp documentation contains many small examples of how to use the chemfp API for different tasks. On this page you'll find larger examples of how to integrate chemfp with other tools.
The entire repository can be cloned with Mercurial:
hg clone https://hg.sr.ht/~dalke/chemfp_examples
or downloaded as chemfp_examples-tip.tar.gz.
These programs use data from ChEMBL 28. The programs which use fingerprints expect them to be in the file chembl_28.fpb in the current directory. The programs which use the SQLite database expect to find it in chembl_28.db in the current directory. You can also configure them to use a different location, typically through the command-line arguments --fingerprints and --sqlite.
chembl_28.fpb: To get chembl_28.fpb, download chembl_28.fpb.gz and uncompress it. Here is one way to do it:
wget https://chemfp.com/datasets/chembl_28.fpb.gz
gunzip chembl_28.fpb.gz
The FPB file contains three things: the ChEMBL 28 fingerprints converted into FPB format, an authorization key that enables chemfp's Tanimoto search for that data set, and the license and other legal details for redistributing ChEMBL data.
While chemfp can read gzip-compressed FPB files, working with uncompressed files is typically significantly faster.
chembl_28.db: To get chembl_28.db, download the ChEMBL 28 SQLite distribution, uncompress and untar it, and move the file chembl_28.db to your working directory. Here is one way to do it:
wget https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_28/chembl_28_sqlite.tar.gz
tar xf chembl_28_sqlite.tar.gz
mv chembl_28/chembl_28_sqlite/chembl_28.db .
chembl_search_cli.py is a command-line utility which finds the nearest neighbors in ChEMBL for a given query and outputs the results in SDF format. The query can be a SMILES string, a ChEMBL id, or a ChEMBL compound name. The SDF output is made by combining the neighbor id and score with the molfile and canonical SMILES from the SQLite database.
It also prints timing information to stderr. The default uses memory-mapping to read the FPB file, which means the search time includes some of the data load time. Use --no-mmap to load the data set entirely into memory before searching.
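As a rough illustration of how a neighbor id and score might be combined with the molfile and canonical SMILES to make one SDF record, here is a minimal sketch. It is not the code from chembl_search_cli.py. The table and column names (molecule_dictionary, compound_structures, molfile, canonical_smiles) are assumed to follow the ChEMBL schema, the tag names are hypothetical, and an in-memory database with one toy row stands in for chembl_28.db.

```python
import sqlite3

def make_sdf_record(molfile, tags):
    """Return one SDF record: the molfile, the data items, then '$$$$'."""
    lines = [molfile.rstrip("\n")]
    for name, value in tags.items():
        lines.append(f"> <{name}>")
        lines.append(str(value))
        lines.append("")  # a blank line terminates each data item
    lines.append("$$$$")
    return "\n".join(lines) + "\n"

def neighbor_to_sdf(db, chembl_id, score):
    """Look up the structure for chembl_id and format it as an SDF record."""
    # Assumed ChEMBL-style schema; adjust if your database differs.
    row = db.execute(
        "SELECT cs.molfile, cs.canonical_smiles "
        "FROM molecule_dictionary md "
        "JOIN compound_structures cs ON cs.molregno = md.molregno "
        "WHERE md.chembl_id = ?",
        (chembl_id,),
    ).fetchone()
    if row is None:
        return None
    molfile, smiles = row
    # Hypothetical tag names, chosen only for this sketch.
    tags = {"chembl_id": chembl_id, "score": f"{score:.3f}",
            "canonical_smiles": smiles}
    return make_sdf_record(molfile, tags)

# Toy stand-in for chembl_28.db. The molfile is a placeholder, not a real structure.
TOY_MOLFILE = "\n  toy\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY, chembl_id TEXT);
CREATE TABLE compound_structures (molregno INTEGER, molfile TEXT, canonical_smiles TEXT);
""")
db.execute("INSERT INTO molecule_dictionary VALUES (1, 'CHEMBL25')")
db.execute("INSERT INTO compound_structures VALUES (1, ?, ?)",
           (TOY_MOLFILE, "CC(=O)Oc1ccccc1C(=O)O"))

record = neighbor_to_sdf(db, "CHEMBL25", 0.875)
print(record)
```

Keeping the record formatting separate from the database lookup makes it easy to test the SDF layout without a real ChEMBL database.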
Examples of use:
python chembl_search_cli.py CHEMBL25 --threshold 0.4
python chembl_search_cli.py caffeine -k 1
python chembl_search_cli.py 'c1ccccc1O' -k 10 --threshold 0.6
More help is available using python chembl_search_cli.py --help.
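To make the meaning of -k and --threshold in the examples above concrete, here is a pure-Python sketch of a k-nearest Tanimoto search with a minimum score. It does not use chemfp and is far slower than chemfp's search. The fingerprints are toy integers used as bit sets, and the function names and default values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: defined here as 0.0
    return bin(a & b).count("1") / union

def knearest(query_fp, targets, k=3, threshold=0.0):
    """Return up to k (id, score) pairs with score >= threshold, best first."""
    hits = []
    for target_id, target_fp in targets:
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            hits.append((target_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:k]

# Toy data: four-to-six bit fingerprints, not real ChEMBL records.
targets = [("A", 0b001111), ("B", 0b000111), ("C", 0b000011), ("D", 0b110000)]
query = 0b001111

top2 = knearest(query, targets, k=2, threshold=0.6)
wide = knearest(query, targets, k=10, threshold=0.4)
print(top2)
print(wide)
```

The threshold removes weak matches first, and k then caps how many of the remaining hits are reported, so a search can return fewer than k results.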
chembl_neighbor_browser_flask.py is a web app for browsing through the k-nearest neighbors of a given ChEMBL id. The neighbors are sorted from most similar to least. Each neighbor is shown with its similarity score, an SVG image (served by the ChEMBL server), and its id. Colors are used to help distinguish between different levels of similarity.
This uses basic web technologies that would have been understood in the late 1990s. The web app implements a single function. It processes the query, does the chemfp similarity search, and passes the information to a Jinja2 template which generates the output.
The template contains an HTML form that lets you enter a new ChEMBL id or change the value of 'k' for the k-nearest neighbors. There is a small amount of JavaScript to auto-submit the form, which makes it feel a bit like a modern AJAX/single-page application.
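One simple way to color neighbors by similarity level is to map score bands to colors before handing the results to the template. The sketch below is only an illustration of that idea. The band boundaries, the hex colors, and the function name are all hypothetical and are not taken from chembl_neighbor_browser_flask.py.

```python
# Hypothetical (minimum score, CSS color) bands, highest band first.
COLOR_BANDS = [
    (0.9, "#1a9850"),  # very similar
    (0.7, "#91cf60"),
    (0.5, "#fee08b"),
    (0.0, "#d73027"),  # least similar
]

def score_to_color(score):
    """Return the color of the first band whose minimum the score reaches."""
    for min_score, color in COLOR_BANDS:
        if score >= min_score:
            return color
    return COLOR_BANDS[-1][1]  # fallback for unexpected negative scores

# Attach a color to each (id, score) neighbor so a template can use it directly.
neighbors = [("CHEMBL25", 1.0), ("CHEMBL_X", 0.72), ("CHEMBL_Y", 0.31)]
colored = [(nid, score, score_to_color(score)) for nid, score in neighbors]
print(colored)
```

Computing the color in Python keeps the template free of threshold logic, at the cost of mixing a presentation detail into the view function.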
Example use:
python chembl_neighbor_browser_flask.py
or use the more standard flask interface:
FLASK_APP=chembl_neighbor_browser_flask flask run
then go to the server URL, likely http://127.0.0.1:5000/.
You should see something like:
chembl_search_streamlit.py is a web app based on Streamlit which finds the nearest neighbors in ChEMBL for a given query, displays the results in a table, and offers download links to get the result in SMILES or CSV formats. Streamlit makes it easy to turn a script into an interactive web app.
The query can be a SMILES string, a ChEMBL id, or a ChEMBL compound name. The number of neighbors to find and the minimum similarity threshold can be changed interactively. Or, press the button to go to a random id!
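A query that may be a ChEMBL id, a compound name, or a SMILES string has to be classified before searching. Here is one possible sketch of that step, not the logic used by chembl_search_streamlit.py. It assumes ids look like "CHEMBL" followed by digits, that names can be matched against a pref_name column in a ChEMBL-style molecule_dictionary table, and that anything else is treated as SMILES. The function name and the toy row are hypothetical.

```python
import re
import sqlite3

CHEMBL_ID_PAT = re.compile(r"CHEMBL\d+$", re.IGNORECASE)

def classify_query(query, db):
    """Return ("id", chembl_id) or ("smiles", text) for a user query."""
    query = query.strip()
    if CHEMBL_ID_PAT.match(query):
        return ("id", query.upper())
    # Assumed schema: names are stored in molecule_dictionary.pref_name.
    row = db.execute(
        "SELECT chembl_id FROM molecule_dictionary "
        "WHERE pref_name = ? COLLATE NOCASE",
        (query,),
    ).fetchone()
    if row is not None:
        return ("id", row[0])
    # Not an id and not a known name: assume it is a SMILES string.
    return ("smiles", query)

# Toy stand-in for chembl_28.db with a single illustrative row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE molecule_dictionary (chembl_id TEXT, pref_name TEXT)")
db.execute("INSERT INTO molecule_dictionary VALUES ('CHEMBL113', 'CAFFEINE')")

for text in ["chembl25", "caffeine", "c1ccccc1O"]:
    print(text, "->", classify_query(text, db))
```

Checking the id pattern first avoids a database lookup for the most common case. Validating the SMILES fallback is left to the cheminformatics toolkit.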
The chemfp search results are put in a Pandas table, along with a compound name and SMILES, found by an identifier lookup using the SQLite database chembl_28.db. The preferred name is used if there is one; otherwise one of the other available compound names is used, likely the IUPAC name. RDKit's PandasTools module is used to add a column containing an image of the structure.
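The "preferred name, otherwise another available name" rule can be expressed as a single SQL query. The following is a sketch under assumptions, not the query used by chembl_search_streamlit.py. It assumes preferred names are in molecule_dictionary.pref_name and other names are in compound_records.compound_name, as in the ChEMBL schema, and it uses toy rows in an in-memory database instead of chembl_28.db.

```python
import sqlite3

def lookup_name(db, chembl_id):
    """Return pref_name if set, else the first other compound name, else None."""
    row = db.execute(
        """
        SELECT COALESCE(
            md.pref_name,
            (SELECT cr.compound_name
               FROM compound_records cr
              WHERE cr.molregno = md.molregno
                AND cr.compound_name IS NOT NULL
              ORDER BY cr.rowid
              LIMIT 1))
          FROM molecule_dictionary md
         WHERE md.chembl_id = ?
        """,
        (chembl_id,),
    ).fetchone()
    return row[0] if row is not None else None

# Toy stand-in for chembl_28.db; ids and names are illustrative only.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE molecule_dictionary (molregno INTEGER PRIMARY KEY, chembl_id TEXT, pref_name TEXT);
CREATE TABLE compound_records (molregno INTEGER, compound_name TEXT);
INSERT INTO molecule_dictionary VALUES (1, 'CHEMBL25', 'ASPIRIN');
INSERT INTO molecule_dictionary VALUES (2, 'CHEMBL_TOY', NULL);
INSERT INTO compound_records VALUES (1, '2-acetoxybenzoic acid');
INSERT INTO compound_records VALUES (2, 'toy systematic name');
""")

print(lookup_name(db, "CHEMBL25"))
print(lookup_name(db, "CHEMBL_TOY"))
```

COALESCE returns its first non-NULL argument, so the subquery result is only used when no preferred name is recorded.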
Example of use:
python chembl_search_streamlit.py
By default this will create a local web server and have your browser connect to it.
You should see something like: