Software for molecular dataset preparation, semi-automated curation, and data visualization

Features:

•Converts units

•Converts between SMILES and MolFiles

•Handles duplicate compounds

•Adds InChI keys and molecular weights

•Removes special characters from the header

•Removes NA activities

•Neutralizes charges

•Removes salts / other chemical fragments

•Uses a decision boundary to binarize values

•Can convert units

•Can use >, <, and = qualifiers to filter out ambiguous values

•Removes duplicate values that do not meet the agreement ratio (the fraction of duplicates sharing the same binary value)

•Removes NA values

•Returns rows with matching or mismatched values between two datasets

•Can match compounds by InChI key conversion or by raw values

•Uses ECFP (adjustable radius and bit length) or MACCS fingerprints to generate similarity matrix values and a graphic

•Can use same dataset for each axis or upload a different one

•Generates t-SNE plots from ECFP (adjustable radius and bit length), MACCS, other quantifiable descriptors, or ECFP combined with other descriptors

•Other descriptors are z-normalized

•Generates a plot that can be edited and downloaded as SVG or PNG
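
To make the binarization features above more concrete, here is a minimal Python sketch of how a decision boundary, the > < = qualifiers, and an agreement ratio could work together. It is an illustration only and not e-Clean's actual implementation: the function names `binarize` and `curate`, the record layout (compound key, qualifier, value), the convention that values at or above the boundary are labeled 1, and the choice to keep the majority label are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def binarize(value, qualifier, boundary):
    """Return 1, 0, or None when the qualifier leaves the label ambiguous.

    Assumed convention: values at or above the boundary are labeled 1.
    """
    if qualifier == "=":
        return int(value >= boundary)
    # ">" means the true value exceeds the reported one, so it is only
    # informative when the reported value already reaches the boundary.
    if qualifier == ">" and value >= boundary:
        return 1
    # "<" means the true value is below the reported one, so it is only
    # informative when the reported value does not exceed the boundary.
    if qualifier == "<" and value <= boundary:
        return 0
    return None  # ambiguous or unknown qualifier


def curate(records, boundary, min_agreement):
    """Binarize records and resolve duplicates by agreement ratio.

    records: iterable of (compound_key, qualifier, value) tuples
             (a hypothetical layout chosen for this sketch).
    Returns a dict mapping each retained compound key to one binary label.
    """
    labels_by_key = defaultdict(list)
    for key, qualifier, value in records:
        if value is None:  # drop NA values
            continue
        label = binarize(value, qualifier, boundary)
        if label is not None:  # drop ambiguous values
            labels_by_key[key].append(label)

    curated = {}
    for key, labels in labels_by_key.items():
        majority, count = Counter(labels).most_common(1)[0]
        # Keep the compound only if enough duplicates share the same label.
        if count / len(labels) >= min_agreement:
            curated[key] = majority
    return curated


records = [
    ("A", "=", 7.2), ("A", "=", 6.8), ("A", "=", 4.0),  # 2 of 3 agree
    ("B", ">", 5.0),                                    # ambiguous
    ("C", "<", 5.0),                                    # clearly below
    ("D", "=", 6.5), ("D", "=", 5.5),                   # 1 of 2 agree
    ("E", "=", None),                                   # NA value
]
result = curate(records, boundary=6.0, min_agreement=0.6)
print(result)
```

In this sketch, compound A is kept because two of its three duplicates agree, B is dropped because ">5.0" cannot be placed on either side of a boundary of 6.0, D is dropped because its duplicates split evenly, and E is dropped as NA.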

Access

  • We can use e-Clean on your behalf as fee-for-service work.

  • We can provide an annual license for you to access this software on your own server.

  • We provide maintenance and customization options.

Please contact us to learn how you can use this technology.