| Title: | Interface to the 'PubChem' Database for Chemical Data Retrieval |
|---|---|
| Description: | Provides an interface to the 'PubChem' database via the PUG REST <https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest> and PUG View <https://pubchem.ncbi.nlm.nih.gov/docs/pug-view> services. This package allows users to automatically access chemical and biological data from 'PubChem', including compounds, substances, assays, and various other data types. Functions are available to retrieve data in different formats, perform searches, and access detailed annotations. |
| Authors: | Selcuk Korkmaz [aut, cre] |
| Maintainer: | Selcuk Korkmaz <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 3.0.0 |
| Built: | 2026-05-28 07:01:17 UTC |
| Source: | https://github.com/selcukorkmaz/pubchemr |
These functions are used to retrieve identification information for assays, substances, and compounds from the PubChem database.
AIDs(object, ...)
CIDs(object, ...)
SIDs(object, ...)

## S3 method for class 'PubChemInstance_AIDs'
AIDs(object, .to.data.frame = TRUE, ...)

## S3 method for class 'PubChemInstance_CIDs'
CIDs(object, .to.data.frame = TRUE, ...)

## S3 method for class 'PubChemInstance_SIDs'
SIDs(object, .to.data.frame = TRUE, ...)
object |
An object returned from a PubChem request, typically generated by functions such as get_cids, get_aids, and get_sids. |
... |
Additional arguments passed to other methods. Currently, these arguments have no effect. |
.to.data.frame |
A logical value. If TRUE, the returned object is converted into a data.frame (or tibble). If the conversion fails, a list is returned with a warning. Be careful with complicated lists (i.e., many elements nested within each other), since converting them into a data frame may be time consuming. |
The generic dispatches to class-specific methods and returns either a compact
identifier table (.to.data.frame = TRUE) or raw nested payloads.
A tibble (default) or list containing mapped identifier outputs.
# Retrieve Assay IDs
aids <- get_aids(identifier = c("aspirin", "caffeine"), namespace = "name")
AIDs(aids)

# Compound IDs
cids <- get_cids(identifier = c("aspirin", "caffeine"), namespace = "name")
CIDs(cids)

# Substance IDs
sids <- get_sids(identifier = c("aspirin", "caffeine"), namespace = "name")
SIDs(sids)
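The `.to.data.frame` behaviour described above (convert to a table, fall back to a list with a warning) can be illustrated without a network request. The sketch below is not PubChemR code: `ids_to_table()` and the `raw_ids` input are hypothetical stand-ins for the package's internal conversion and for a parsed response.

```r
# Hypothetical helper (not part of PubChemR): flatten a named list of
# identifier vectors into a data frame, falling back to the raw list.
ids_to_table <- function(x, .to.data.frame = TRUE) {
  if (!.to.data.frame) {
    return(x)
  }
  tryCatch(
    # One row per (query, identifier) pair.
    do.call(rbind, lapply(names(x), function(nm) {
      data.frame(query = nm, id = as.integer(x[[nm]]), stringsAsFactors = FALSE)
    })),
    error = function(e) {
      warning("Could not convert to a data frame; returning a list instead.")
      x
    }
  )
}

# Mock input standing in for identifiers parsed from a PubChem response.
raw_ids <- list(aspirin = 2244L, caffeine = 2519L)
id_table <- ids_to_table(raw_ids)
print(id_table)
```

Passing `.to.data.frame = FALSE` returns the input list unchanged, which mirrors the "raw nested payloads" option mentioned in the details.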
This function sends a request to PubChem to retrieve content in the specified format for a given identifier. It then writes the content to a specified file path.
download(
  filename = NULL,
  outformat,
  path,
  identifier,
  namespace = "cid",
  domain = "compound",
  operation = NULL,
  searchtype = NULL,
  overwrite = FALSE,
  options = NULL
)
filename |
A character string specifying the name of the file to be saved. If not specified, the default file name "file" is used. |
outformat |
A character string specifying the desired output format (e.g., "sdf", "json"). |
path |
A character string specifying the path where the content should be saved. |
identifier |
A vector of positive integers (e.g. cid, sid, aid) or identifier strings (source, inchikey, formula). In some cases, only a single identifier string (name, smiles, xref; inchi, sdf by POST only). |
namespace |
Specifies the namespace for the query. For the 'compound' domain, possible values include 'cid', 'name', 'smiles', 'inchi', 'sdf', 'inchikey', 'formula', 'substructure', 'superstructure', 'similarity', 'identity', 'xref', 'listkey', 'fastidentity', 'fastsimilarity_2d', 'fastsimilarity_3d', 'fastsubstructure', 'fastsuperstructure', and 'fastformula'. For other domains, the possible namespaces are domain-specific. |
domain |
Specifies the domain of the query. Possible values are 'substance', 'compound', 'assay', 'gene', 'protein', 'pathway', 'taxonomy', 'cell', 'sources', 'sourcetable', 'conformers', 'annotations', 'classification', and 'standardize'. |
operation |
Specifies the operation to be performed on the input records. For the 'compound' domain, possible operations include 'record', 'property', 'synonyms', 'sids', 'cids', 'aids', 'assaysummary', 'classification', 'xrefs', and 'description'. The available operations are domain-specific. |
searchtype |
Specifies the type of search to be performed. For structure searches, possible values are combinations of 'substructure', 'superstructure', 'similarity', 'identity' with 'smiles', 'inchi', 'sdf', 'cid'. For fast searches, possible values are combinations of 'fastidentity', 'fastsimilarity_2d', 'fastsimilarity_3d', 'fastsubstructure', 'fastsuperstructure' with 'smiles', 'smarts', 'inchi', 'sdf', 'cid', or 'fastformula'. |
overwrite |
A logical value indicating whether to overwrite the file if it already exists. Default is FALSE. |
options |
Additional arguments. |
'download()' is a convenience wrapper around get_pubchem() for
file-oriented workflows. The returned payload is written as bytes, so both
text and binary outputs can be handled.
No return value. The function writes the content to the specified file path and prints a message indicating the save location.
# Download a JSON file for the compound "aspirin" into "Aspirin.json".
# A folder named "Compound" will be created under the current directory.
download(
  filename = "Aspirin",
  outformat = "json",
  path = "./Compound",
  identifier = "aspirin",
  namespace = "name",
  domain = "compound",
  overwrite = TRUE
)

# Remove downloaded files and folders.
file.remove("./Compound/Aspirin.json")
unlink("./Compound", recursive = TRUE)
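Writing the payload as bytes, as the details for download() note, is what lets one code path serve both text and binary formats. The following is a minimal offline sketch of that idea, not the package's implementation: `save_payload()` is a hypothetical helper, the payload is made-up content, and stopping when the file exists and `overwrite = FALSE` is an assumption about reasonable behaviour.

```r
# Hypothetical helper: write a raw payload to <path>/<filename>.<outformat>.
save_payload <- function(payload, filename = "file", outformat, path,
                         overwrite = FALSE) {
  # Create the target folder if it does not exist yet.
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  target <- file.path(path, paste0(filename, ".", outformat))
  # Assumption: refuse to replace an existing file unless asked to.
  if (file.exists(target) && !overwrite) {
    stop("File already exists; set overwrite = TRUE to replace it: ", target)
  }
  # writeBin() stores the bytes as-is, so text and binary content both work.
  writeBin(payload, target)
  message("Saved to: ", target)
  invisible(target)
}

demo_dir <- file.path(tempdir(), "Compound")
payload <- charToRaw('{"example": "not real PubChem output"}')
saved <- save_payload(payload, "Aspirin", "json", demo_dir, overwrite = TRUE)

# Read the bytes back to confirm the round trip, then clean up.
roundtrip <- readBin(saved, what = "raw", n = length(payload))
unlink(demo_dir, recursive = TRUE)
```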
This function queries the PubChem database to retrieve Assay IDs (AIDs) based on a given identifier.
get_aids(
  identifier,
  namespace = "cid",
  domain = "compound",
  searchtype = NULL,
  options = NULL
)
identifier |
A vector of identifiers, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace of the identifier. Possible values depend on the - For - For - For For more details, see the Input section. |
domain |
A character string specifying the domain of the query. Possible values are: - - - - Other domains as specified in the API documentation. |
searchtype |
An optional character string specifying the search type. Possible values depend on the Examples include: - - If |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstance_AIDs', which is a list containing information retrieved from the PubChem database. Assay IDs can be extracted from the returned object using the getter function AIDs.
To extract assay IDs from the returned object, use the AIDs function. See examples.
# Request assay IDs for multiple compounds.
# If an identifier is unknown or incorrect, PubChem returns an error for it.
aids <- get_aids(
  identifier = c("aspirin", "ibuprofen", "rstudio"),
  namespace = "name"
)
print(aids)

# Return all Assay IDs.
AIDs(aids)
This function retrieves a list of all current depositors of substances or assays from PubChem.
get_all_sources(domain = "substance")
domain |
A character string specifying the domain for which sources are to be retrieved. Possible values are: - 'substance' (default) - 'assay' |
The PubChem PUG REST API provides a way to retrieve all current depositors (sources) for substances or assays. For more detailed information, please refer to the PubChem Data Sources documentation.
A character vector containing the names of all sources for the specified domain.
get_all_sources(
  domain = 'substance'
)
This function sends a request to PubChem to retrieve assay data based on the specified parameters.
get_assays(identifier, namespace = "aid", operation = NULL, options = NULL)
identifier |
A vector of positive integers (e.g., |
namespace |
A character string specifying the namespace of the identifier. Possible values include: - - - - - - For more details, see the Input section of the PUG REST API. |
operation |
A character string specifying the operation to perform. Possible values include: - - - - - - - - - If For a full list of operations, see the Operations section of the PUG REST API. |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstanceList' containing the information retrieved from the PubChem database.
To extract information about a specific assay from the returned list, use the instance function.
Each assay may include information on several properties. Specific information from the assay can be extracted using the retrieve function. See examples.
# Retrieve a list of assays from the PubChem database
assays <- get_assays(
  identifier = c(1234, 7815),
  namespace = 'aid'
)

# Return assay information for assay ID '1234'
assay1234 <- instance(assays, "1234")
print(assay1234)

# Retrieve specific elements from the assay output
retrieve(assay1234, "aid")
This helper fetches a PUG View record and extracts section(s) with heading '"Biological Test Results"' (or a custom heading) from the PubChem 'CONTENTS' structure.
get_biological_test_results(
  identifier,
  domain = "compound",
  heading = "Biological Test Results",
  .match_type = c("match", "contain"),
  .all = FALSE,
  .verbose = FALSE,
  ...
)
identifier |
A single identifier for a PUG View request. |
domain |
A domain value accepted by |
heading |
Section heading to search for. Defaults to
|
.match_type |
Matching strategy for |
.all |
Logical. If |
.verbose |
Logical. If |
... |
Additional arguments passed to |
This is a targeted utility built on top of get_pug_view,
retrieve, and sectionList to simplify extraction
of deeply nested biological testing sections from PUG View records.
A PugViewSection object (default) or PugViewSectionList
when .all = TRUE. If no section is found, a failed
PugViewSection object is returned with error details.
bio <- get_biological_test_results(identifier = "2244", domain = "compound")
bio

# Return all matching sections (if multiple are present)
bio_all <- get_biological_test_results(
  identifier = "2244",
  domain = "compound",
  .all = TRUE
)
bio_all
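To make the idea of pulling a headed section out of a deeply nested PUG View record more concrete, here is a small offline sketch. It assumes the record has been parsed into nested lists whose sections carry a `TOCHeading` and an optional child `Section` list; `find_sections()` and the mock `record` are hypothetical and do not reproduce the package's own traversal or its exact `.match_type` semantics.

```r
# Mock record shaped like PUG View JSON parsed into nested lists.
record <- list(
  RecordTitle = "mock",
  Section = list(
    list(TOCHeading = "Names and Identifiers", Section = list()),
    list(
      TOCHeading = "Biological Test Results",
      Section = list(
        list(TOCHeading = "BioAssay Results", Section = list())
      )
    )
  )
)

# Hypothetical helper: depth-first search for sections by heading.
find_sections <- function(node, heading, match_type = c("match", "contain")) {
  match_type <- match.arg(match_type)
  hits <- list()
  for (child in node$Section) {
    title <- child$TOCHeading
    is_hit <- if (match_type == "match") {
      identical(title, heading)                     # exact heading
    } else {
      isTRUE(grepl(heading, title, fixed = TRUE))   # substring of heading
    }
    if (is_hit) {
      hits <- c(hits, list(child))
    }
    # Descend into nested subsections and collect their hits as well.
    hits <- c(hits, find_sections(child, heading, match_type))
  }
  hits
}

exact <- find_sections(record, "Biological Test Results")
partial <- find_sections(record, "Bio", match_type = "contain")
```

In this mock record the exact search finds one section, while the substring search also picks up the nested "BioAssay Results" subsection.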
This function sends a request to PubChem to retrieve Compound IDs (CIDs) for given identifier(s).
get_cids(
  identifier,
  namespace = "name",
  domain = "compound",
  searchtype = NULL,
  options = NULL
)
identifier |
A vector of identifiers, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace of the identifier. Possible values depend on the - For - For - For For more details, see the Input section of the PUG REST API. |
domain |
A character string specifying the domain of the query. Possible values are: - - - - Other domains as specified in the API documentation. |
searchtype |
An optional character string specifying the search type. Possible values depend on the Examples include: - - If |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstance_CIDs', which is a list containing information retrieved from the PubChem database. Compound IDs can be extracted from the returned object using the CIDs function.
To extract compound IDs from the returned object, use the CIDs function. See examples.
compound <- get_cids(
  identifier = "aspirin",
  namespace = "name"
)
compound

# Extract compound IDs.
CIDs(compound)
This function sends a request to the PubChem database to retrieve compound data based on specified parameters.
get_compounds(
  identifier,
  namespace = "cid",
  operation = NULL,
  searchtype = NULL,
  options = NULL
)
identifier |
A vector of positive integers (e.g., cid, sid, aid) or identifier strings (source, inchikey, formula).
In some cases, a single identifier string (e.g., name, smiles, xref; inchi, sdf by POST only) is sufficient.
**Note**: |
namespace |
A character string specifying the namespace of the identifier. Possible values include: - - - - - - For more details, see the Input section of the PUG REST API. |
operation |
A character string specifying the operation to perform. Possible values include: - - - - - - If For a full list of operations, see the Operations section of the PUG REST API. |
searchtype |
An optional character string specifying the search type. Possible values include: - - - - If For more details, see the Input section of the PUG REST API. |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstanceList' and 'PC_Compounds' containing compound information from the PubChem database.
compound <- get_compounds(
  identifier = c("aspirin", "ibuprofen", "rstudio"),
  namespace = "name"
)
print(compound)

# Return results for a selected compound.
instance(compound, "aspirin")
instance(compound, "rstudio")
# instance(compound, "unknown")  # returns an error

# Extract compound properties for the compound "aspirin".
# Use the 'retrieve()' function to extract specific slots from the compound list.
retrieve(instance(compound, "aspirin"), "props")
This function sends a request to PubChem to retrieve compound properties based on the specified parameters.
get_properties(
  properties = NULL,
  identifier,
  namespace = "cid",
  searchtype = NULL,
  options = NULL,
  propertyMatch = list(.ignore.case = FALSE, type = "contain")
)

property_map(
  x,
  type = c("match", "contain", "start", "end", "all"),
  .ignore.case = TRUE,
  ...
)
properties |
A character vector specifying the properties to retrieve.
If |
identifier |
A vector of compound identifiers, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace of the identifier. Possible values include: - - - - - - - Other namespaces as specified in the API documentation. |
searchtype |
An optional character string specifying the search type. Possible values include: - - - - - Other search types as specified in the API documentation. If For more details, see the API documentation. |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
propertyMatch |
A list of arguments to control how properties are matched. The list can include: - - - Default is |
x |
A character vector of compound properties. The property_map function will search for each property provided here within the available properties. The search can be customized using the |
type |
Defines how to search within the available properties. The default is "match". See Notes for details. |
.ignore.case |
A logical value. If TRUE, the pattern match ignores case letters. This argument is ignored if |
... |
Other arguments. Currently, these have no effect on the function's return. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class "PubChemInstanceList" containing all the properties of the requested compounds.
property_map() is not used to request properties directly from the PubChem database. This function is intended to list the available compound properties that can be requested from PubChem. It has flexible options to search properties from the available property list of the PubChem database. The output of property_map is used as the property input in the get_properties function. This function may be practically used to request specific properties across a range of compounds. See examples for usage.
# Molecular weight, molecular formula, and InChI of the compounds
props <- get_properties(
  properties = c("MolecularWeight", "MolecularFormula", "InChI"),
  identifier = c("aspirin", "ibuprofen", "caffeine"),
  namespace = "name"
)

# Properties for a selected compound
instance(props, "aspirin")
retrieve(props, .which = "aspirin", .slot = NULL)
retrieve(instance(props, "aspirin"), .slot = NULL)

# Combine properties of all compounds into a single data frame (or list)
retrieve(props, .combine.all = TRUE)

# Return selected properties
retrieve(props, .combine.all = TRUE, .slot = c("MolecularWeight", "MolecularFormula"))

# Return properties for the compounds in a range of CIDs
props <- get_properties(
  properties = c("mass", "molecular"),
  identifier = 2244:2255,
  namespace = "cid",
  propertyMatch = list(
    type = "contain"
  )
)
retrieve(props, .combine.all = TRUE, .to.data.frame = TRUE)

# Return all available properties of the requested compounds
props <- get_properties(
  properties = NULL,
  identifier = 2244:2245,
  namespace = "cid",
  propertyMatch = list(
    type = "all"
  )
)
retrieve(props, .combine.all = TRUE)

#### EXAMPLES FOR property_map() ####
# List all available properties:
property_map(type = "all")

# Exact match:
property_map("InChI", type = "match")
property_map("InChi", type = "match", .ignore.case = TRUE) # Returns no match. Ignores '.ignore.case'

# Match at the start/end:
property_map("molecular", type = "start", .ignore.case = TRUE)
property_map("mass", type = "end", .ignore.case = TRUE)

# Partial match with multiple search patterns:
property_map(c("molecular", "mass", "inchi"), type = "contain", .ignore.case = TRUE)
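The matching modes used by property_map() ("match", "contain", "start", "end", "all") can be mimicked offline with base R string functions. The sketch below is only an approximation under stated assumptions: `match_properties()` is a hypothetical helper, `available` is a short hand-picked subset of PubChem property names rather than the package's full list, and exact matching is assumed to be case-sensitive, as the example comment above suggests.

```r
# Small illustrative subset of property names (not the full PubChem list).
available <- c("MolecularFormula", "MolecularWeight", "InChI", "InChIKey",
               "ExactMass", "MonoisotopicMass")

# Hypothetical helper mimicking the documented matching modes.
match_properties <- function(x = NULL, available,
                             type = c("match", "contain", "start", "end", "all"),
                             .ignore.case = TRUE) {
  type <- match.arg(type)
  if (type == "all") {
    return(available)
  }
  if (type == "match") {
    # Exact, case-sensitive match; .ignore.case is not consulted here.
    return(available[available %in% x])
  }
  hay <- if (.ignore.case) tolower(available) else available
  needles <- if (.ignore.case) tolower(x) else x
  keep <- rep(FALSE, length(available))
  for (needle in needles) {
    keep <- keep | switch(type,
      contain = grepl(needle, hay, fixed = TRUE),
      start = startsWith(hay, needle),
      end = endsWith(hay, needle)
    )
  }
  available[keep]
}

starts <- match_properties("molecular", available, type = "start")
ends <- match_properties("mass", available, type = "end")
none <- match_properties("InChi", available, type = "match")
```

The resulting character vectors could then be passed as the `properties` argument of a request, which is the workflow the note on property_map() describes.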
This function sends a request to the PubChem PUG REST API to retrieve various types of data for a given identifier. It supports fetching data in different formats and allows saving the output.
get_pug_rest(
  identifier = NULL,
  namespace = "cid",
  domain = "compound",
  operation = NULL,
  output = "JSON",
  searchtype = NULL,
  property = NULL,
  options = NULL,
  save = FALSE,
  dpi = 300,
  path = NULL,
  file_name = NULL,
  ...
)
identifier |
A vector of identifiers for the query, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace for the request. Possible values include: - - - - - - - For more details, see the PUG REST API documentation. |
domain |
A character string specifying the domain for the request. Possible values include: - - - For more details, see the PUG REST API documentation. |
operation |
An optional character string specifying the operation for the request. Possible values depend on the Examples include: - - - - - If For a full list of operations, see the PUG REST API documentation. |
output |
A character string specifying the output format. Possible values are: - - - - - - - For more details, see the PUG REST API documentation. |
searchtype |
An optional character string specifying the search type. Possible values include: - - - If For more details, see the PUG REST API documentation. |
property |
An optional character string specifying the property or properties to retrieve. This is typically used when Examples include: - - - - - If For a full list of properties, see the Compound Property Tables. |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
save |
A logical value indicating whether to save the output as a file or image.
Default is |
dpi |
An integer specifying the DPI for image output when |
path |
A character string specifying the directory path where the output file will be saved if |
file_name |
A character string specifying the name of the file (without file extension) to save.
If |
... |
Additional arguments passed to the underlying HTTP request functions. |
For more information on the possible values for parameters such as namespace, domain, operation,
output, searchtype, and property, please refer to the
PUG REST API documentation.
An object of class 'PugRestInstance' containing:
Logical value indicating if the request was successful.
If 'success' is 'FALSE', a list containing error messages.
The content retrieved from the API; format depends on 'output'.
A list of the arguments used in the request.
If 'save' is 'TRUE', details about the saved file.
result <- get_pug_rest(
  identifier = "2244",
  namespace = "cid",
  domain = "compound",
  output = "JSON"
)
pubChemData(result)
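For orientation, a PUG REST request URL is composed of a base prefix followed by domain, namespace, identifiers, an optional operation, and the output format. The sketch below assembles such a URL with base R only; `build_pug_rest_url()` is a hypothetical helper written for illustration and is not how get_pug_rest() is necessarily implemented. No request is sent.

```r
# Hypothetical helper: assemble a PUG REST URL from its path components.
build_pug_rest_url <- function(identifier, namespace = "cid",
                               domain = "compound", operation = NULL,
                               output = "JSON") {
  base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
  # Percent-encode each identifier and join several with commas.
  ids <- vapply(as.character(identifier), utils::URLencode, character(1),
                reserved = TRUE)
  ids <- paste(ids, collapse = ",")
  # c() drops a NULL operation, so the segment is simply omitted.
  paste(c(base, domain, namespace, ids, operation, output), collapse = "/")
}

record_url <- build_pug_rest_url(2244)
property_url <- build_pug_rest_url(
  c(2244, 2519),
  operation = "property/MolecularWeight"
)
print(record_url)
print(property_url)
```

Seeing the URL laid out this way may help when choosing values for `namespace`, `domain`, `operation`, and `output`, since each argument corresponds to one path segment.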
This function sends a request to the PubChem PUG View API to retrieve various types of data for a given identifier. It supports fetching annotations, QR codes, and more, with options for different output formats including JSON and SVG.
get_pug_view(
  annotation = "data",
  identifier = NULL,
  domain = "compound",
  output = "JSON",
  heading = NULL,
  headingType = NULL,
  page = NULL,
  qrSize = "short",
  save = FALSE
)
annotation |
A character string specifying the type of annotation to retrieve. Valid values are:
|
identifier |
A single identifier for the query, either numeric or character.
**Note:** Only one identifier is allowed per request for certain annotations.
For |
domain |
A character string specifying the domain for the request. Possible values include:
- Other domains as specified in the API documentation. |
output |
A character string specifying the output format. Possible values include:
|
heading |
An optional character string specifying a heading to filter the data.
Used with |
headingType |
An optional character string specifying a heading type to filter the data.
Possible values include |
page |
An optional integer specifying a page number for pagination. |
qrSize |
A character string specifying the size of the QR code.
Possible values are |
save |
A logical value indicating whether to save the output to a file. Default is |
The PubChem PUG View API allows users to retrieve detailed information about compounds, substances, and assays. This function constructs the appropriate API call based on the provided parameters. For more detailed information, please refer to the PubChem PUG View API documentation.
Depending on the output format, this function returns different types of content: JSON or JSONP format returns parsed JSON content. SVG format returns an image object. For QR codes, it returns an image object or saves a PNG file.
result <- get_pug_view(identifier = "2244", annotation = "linkout", domain = "compound")
retrieve(result, .slot = "ObjUrl", .to.data.frame = FALSE)
This function sends a request to PubChem to retrieve data in SDF format based on the specified parameters. It then saves the retrieved data as an SDF file in the current working directory (or into the system-specific temporary folder).
get_sdf(
  identifier,
  namespace = "cid",
  domain = "compound",
  operation = NULL,
  searchtype = NULL,
  path = NULL,
  file_name = NULL,
  options = NULL
)
identifier |
A vector of compound identifiers, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace of the identifier. Possible values include: - - - - - - - Other namespaces as specified in the API documentation. For more details, see the Input section of the PUG REST API. |
domain |
A character string specifying the domain of the query. Possible values include: - - Other domains as specified in the API documentation. |
operation |
A character string specifying the operation to perform.
For SDF retrieval, the operation is typically |
searchtype |
An optional character string specifying the search type. Possible values include: - - - - - Other search types as specified in the API documentation. If For more details, see the Input section of the PUG REST API. |
path |
A character string specifying the directory path where the SDF file will be saved.
If |
file_name |
A character string specifying the name of the SDF file (without file extension).
If |
options |
A list of additional options for the request.
Available options depend on the specific request and the API.
If |
The PubChem PUG REST API allows users to retrieve compound data in various formats, including SDF. This function constructs the appropriate API call and saves the SDF data to a file. For more detailed information, please refer to the PubChem PUG REST API documentation.
No return value. The function saves the retrieved data as an SDF file in the target directory and prints a message indicating the file's location.
get_sdf(
  identifier = "aspirin",
  namespace = "name",
  path = NULL
)
This function sends a request to PubChem to retrieve Substance IDs (SIDs) for a given identifier.
get_sids(
  identifier,
  namespace = "cid",
  domain = "compound",
  searchtype = NULL,
  options = NULL
)
identifier |
A vector of identifiers, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace of the identifier. Possible values depend on the - For - For - For For more details, see the Input section of the PUG REST API. |
domain |
A character string specifying the domain of the query. Possible values are: - - - - Other domains as specified in the API documentation. |
searchtype |
An optional character string specifying the search type. Possible values depend on the Examples include: - - If For more details, see the Input section of the PUG REST API. |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstance_SIDs', which is a list containing information retrieved from the PubChem database. Substance IDs can be extracted from the returned object using the SIDs function.
result <- get_sids(identifier = c("aspirin", "ibuprofen"), namespace = "name")
# Extract substance IDs of all compounds
SIDs(result)
This function sends a request to PubChem to retrieve substance data based on the specified parameters.
get_substances(identifier, namespace = "sid", operation = NULL, options = NULL)
identifier |
A vector of substance identifiers, either numeric or character. |
namespace |
A character string specifying the namespace of the identifier. |
operation |
A character string specifying the operation to perform. |
options |
A list of additional options for the request. |
For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstanceList' containing the substance information for each requested identifier.
subs <- get_substances(identifier = c("aspirin", "ibuprofen"), namespace = "name")
instance(subs, "aspirin")
retrieve(instance(subs, "aspirin"), "source")
This function sends a request to PubChem to retrieve synonyms for a given identifier. It returns a list of synonyms corresponding to the provided identifier.
get_synonyms(identifier, namespace = "cid", domain = "compound", searchtype = NULL, options = NULL)
identifier |
A vector of identifiers, either numeric or character.
The type of identifier depends on the |
namespace |
A character string specifying the namespace of the identifier. Possible values depend on the - For - For - For For more details, see the Input section of the PUG REST API. |
domain |
A character string specifying the domain of the query. Possible values are: - - - - Other domains as specified in the API documentation. |
searchtype |
An optional character string specifying the search type. Possible values depend on the Examples include: - - If For more details, see the Input section of the PUG REST API. |
options |
A list of additional options for the request. Available options depend on the specific request and the API. Examples include: - For similarity searches: - For substructure searches: If For more details, see the Structure Search Operations section of the PUG REST API. |
The PubChem PUG REST API allows for retrieving synonyms related to various domains. The table below summarizes valid combinations for retrieving synonyms: For more detailed information, please refer to the PubChem PUG REST API documentation.
An object of class 'PubChemInstance_Synonyms', which is a list containing information retrieved from the PubChem database. Synonyms data can be extracted from the returned object using the synonyms function.
syns <- get_synonyms(identifier = "aspirin", namespace = "name")
syns
synonyms(syns)
This function extracts the results of a PubChem instance from an object. It is designed to retrieve
information about a compound from a comprehensive list where multiple elements (such as assay, compound, etc.) are requested.
instance(object, ...)
## S3 method for class 'PubChemInstanceList'
instance(object, .which = NULL, ...)
object |
An object of class |
... |
Additional arguments passed to other methods. Currently, these have no effect. |
.which |
A string specifying which instance's results to return. If NULL, the results of the first instance in
the |
'instance()' is a lightweight accessor that selects one requested identifier result from a multi-request container.
A single instance object extracted from object, typically of
class PubChemInstance.
compounds <- get_compounds(identifier = c("aspirin", "ibuprofen"), namespace = "name")
instance(compounds) # Returns the results for "aspirin"
instance(compounds, "ibuprofen")
Converts long-form assay activity data into dense or sparse matrix-ready representations indexed by compound and assay identifiers.
pc_activity_matrix(x, cid_col = "CID", aid_col = "AID", outcome_col = "ActivityOutcome", value_map = c(Active = 1, Inactive = 0, Inconclusive = NA_real_), strict_outcome = FALSE, unknown_outcome = NA_real_, fill = NA_real_, prefix = "AID_", aggregate = c("max", "mean", "first"), output = c("tibble", "sparse"))
x |
A long-form table or 'PubChemResult' with at least CID/AID/outcome columns. |
cid_col |
Column name containing compound identifiers. |
aid_col |
Column name containing assay identifiers. |
outcome_col |
Column name containing activity outcome values. |
value_map |
Named numeric mapping for character outcomes. |
strict_outcome |
Logical. If 'TRUE', unknown outcome labels raise an error. |
unknown_outcome |
Numeric fallback for unknown labels when 'strict_outcome = FALSE'. |
fill |
Fill value for missing matrix cells. |
prefix |
Prefix used for assay columns in wide format output. |
aggregate |
Aggregation method for repeated CID/AID pairs. |
output |
Output type: '"tibble"' (default dense table) or '"sparse"' (Matrix backend). |
Character outcomes are normalized through 'pc_activity_outcome_map()'. For repeated CID/AID pairs, values are aggregated using the selected strategy.
A wide tibble ('output = "tibble"') or a sparse matrix wrapper object of class 'PubChemSparseActivityMatrix' ('output = "sparse"').
long_tbl <- tibble::tibble(
  CID = c("1", "1", "2"),
  AID = c("10", "11", "10"),
  ActivityOutcome = c("Active", "Inactive", "Active")
)
pc_activity_matrix(long_tbl)
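The long-to-wide conversion and the aggregation of repeated CID/AID pairs described above can be pictured with a small base R sketch. This is only an illustration of the idea, not the package's implementation: the data are made up, and `tapply()` stands in for whatever pivoting the package performs internally. It uses the documented default `value_map`, the `"max"` aggregation, and the `"AID_"` prefix.

```r
# Made-up long-form activity data; CID 2 / AID 10 appears twice on purpose.
long_tbl <- data.frame(
  CID = c("1", "1", "2", "2"),
  AID = c("10", "11", "10", "10"),
  ActivityOutcome = c("Active", "Inactive", "Inactive", "Active"),
  stringsAsFactors = FALSE
)

# Map outcome labels to numbers (same values as the documented default value_map).
value_map <- c(Active = 1, Inactive = 0, Inconclusive = NA_real_)
long_tbl$value <- unname(value_map[long_tbl$ActivityOutcome])

# Aggregate repeated CID/AID pairs with max and pivot to a CID x AID matrix.
# Cells with no observation stay NA, which mirrors fill = NA_real_.
wide <- tapply(long_tbl$value, list(long_tbl$CID, long_tbl$AID), max)
colnames(wide) <- paste0("AID_", colnames(wide))
wide
```

With `"max"` aggregation, a compound that is reported both active and inactive in the same assay ends up active in the matrix, which is worth keeping in mind when choosing between `"max"`, `"mean"`, and `"first"`.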
Maps textual activity outcomes (for example, 'Active'/'Inactive') to numeric values for matrix and modeling workflows.
pc_activity_outcome_map(values, map = NULL, strict = FALSE, unknown = NA_real_)
values |
Outcome values (character/factor/numeric). |
map |
Optional named numeric map. Names are matched case-insensitively after trimming. |
strict |
Logical. If 'TRUE', unknown non-empty labels raise an error. |
unknown |
Numeric value assigned to unknown labels when 'strict = FALSE'. |
Matching is case-insensitive after trimming whitespace. Custom maps augment the built-in defaults unless names overlap.
Numeric vector aligned with 'values'.
pc_activity_outcome_map(c("Active", "Inactive", "Unknown"))
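As a rough picture of what case-insensitive matching after trimming means, the following base R sketch normalizes both the input labels and the map names before looking them up. The helper `map_outcomes()` and its two-entry default map are hypothetical and exist only for this illustration; the package's built-in defaults and internals may differ.

```r
# Hypothetical helper, not the package implementation.
map_outcomes <- function(values, map = c(active = 1, inactive = 0), unknown = NA_real_) {
  # Normalize labels and map names the same way: trim whitespace, lower-case.
  key <- tolower(trimws(as.character(values)))
  names(map) <- tolower(trimws(names(map)))
  # match() returns NA for labels that are not in the map.
  idx <- match(key, names(map))
  ifelse(is.na(idx), unknown, map[idx])
}

res <- map_outcomes(c(" Active", "INACTIVE", "Unknown"))
res
```

Using `match()` rather than testing the mapped value for `NA` keeps a label that is deliberately mapped to `NA` distinct from a label that is simply unknown.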
Convenience wrapper around 'pc_request()' for assay retrieval workflows.
pc_assay(identifier, namespace = "aid", operation = "description", output = "JSON", options = NULL, ...)
identifier |
Identifier(s). |
namespace |
PubChem namespace. |
operation |
Operation. |
output |
Output format. |
options |
Named list of query options. |
... |
Additional arguments forwarded to 'httr::RETRY'. |
Defaults are tuned for assay description retrieval while preserving all transport controls via '...'.
A typed 'PubChemRecord' object.
assay_rec <- pc_assay(367, offline = TRUE)
inherits(assay_rec, "PubChemRecord")
## Not run:
pc_assay(367)
## End(Not run)
Builds a normalized long-form table from the PubChem 'compound/*/assaysummary/JSON' payload, suitable for 'pc_activity_matrix()'.
pc_assay_activity_long(identifier = NULL, namespace = "cid", x = NULL, chunk_size = NULL, unique_rows = TRUE, add_outcome_value = TRUE, outcome_map = NULL, strict_outcome = FALSE, unknown_outcome = NA_real_, ...)
identifier |
Identifier vector used when 'x' is 'NULL'. |
namespace |
Namespace for 'identifier' when requesting from PubChem. |
x |
Optional source object. One of: - 'PubChemResult' from 'pc_request(...)' - raw parsed payload list containing 'Table/Columns/Row' |
chunk_size |
Optional chunk size for large identifier vectors. |
unique_rows |
Logical. Remove duplicate rows after normalization. |
add_outcome_value |
Logical. If 'TRUE', adds a numeric 'ActivityOutcomeValue' column when 'ActivityOutcome' exists. |
outcome_map |
Optional named mapping passed to 'pc_activity_outcome_map()'. |
strict_outcome |
Logical. If 'TRUE', unknown outcome labels error. |
unknown_outcome |
Numeric fallback for unknown labels when 'strict_outcome = FALSE'. |
... |
Additional arguments forwarded to 'pc_request()' when 'x' is 'NULL'. |
Input can be fetched directly from PubChem or provided as an already parsed payload. Column names are normalized to stable identifiers and optional numeric activity outcomes can be added for modeling workflows.
A tibble with normalized fields including 'CID', 'AID', and 'ActivityOutcome' when available.
payload <- list(
  Table = list(
    Columns = list(Column = c("CID", "AID", "Activity Outcome")),
    Row = list(
      list(Cell = c("2244", "367", "Active")),
      list(Cell = c("2244", "368", "Inactive"))
    )
  )
)
pc_assay_activity_long(x = payload)
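To make the 'Table/Columns/Row' payload shape more concrete, here is a minimal base R sketch that flattens the same payload into a data frame. It is not the package's parser. In particular, the column-name normalization used here (dropping whitespace so that "Activity Outcome" becomes "ActivityOutcome") is an assumption inferred from the field names mentioned above.

```r
payload <- list(
  Table = list(
    Columns = list(Column = c("CID", "AID", "Activity Outcome")),
    Row = list(
      list(Cell = c("2244", "367", "Active")),
      list(Cell = c("2244", "368", "Inactive"))
    )
  )
)

# Stack each row's cells into a character matrix, one row per record.
cells <- do.call(rbind, lapply(payload$Table$Row, function(r) r$Cell))
long <- as.data.frame(cells, stringsAsFactors = FALSE)

# Assumed normalization: remove whitespace from the reported column names.
names(long) <- gsub("\\s+", "", payload$Table$Columns$Column)
long
```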
Splits identifier vectors into chunks, applies a worker function per chunk, and records per-chunk success and error metadata.
pc_batch(ids, fn, chunk_size = 100, parallel = FALSE, workers = NULL, checkpoint_dir = NULL, checkpoint_id = NULL, resume = FALSE, rerun_failed = TRUE, ...)
ids |
Identifier vector. |
fn |
Function to run on each chunk of 'ids'. |
chunk_size |
Chunk size. |
parallel |
Logical; use parallel execution. |
workers |
Number of workers. |
checkpoint_dir |
Optional directory to persist per-chunk checkpoint files. |
checkpoint_id |
Optional checkpoint run id. If 'NULL', a deterministic id is generated. |
resume |
Logical; resume from an existing checkpoint manifest. |
rerun_failed |
Logical; when resuming, rerun chunks previously marked as failed. |
... |
Additional arguments passed into 'fn'. |
Checkpoint files can be written to disk and resumed later. Parallel execution is available on supported platforms when checkpointing is disabled.
A typed 'PubChemBatchResult' object.
batch <- pc_batch(ids = 1:6, fn = function(chunk_ids, ...) sum(chunk_ids), chunk_size = 2)
length(batch$results)
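The chunk-then-apply pattern with per-chunk error capture can be sketched in a few lines of base R. The helper `run_in_chunks()` and the fields of its result are hypothetical and chosen for this illustration only; they are not the structure of 'PubChemBatchResult', and checkpointing and parallel execution are left out.

```r
# Hypothetical helper illustrating chunked execution with per-chunk error capture.
run_in_chunks <- function(ids, fn, chunk_size = 100, ...) {
  # Split ids into consecutive chunks of at most chunk_size elements.
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  lapply(chunks, function(chunk_ids) {
    tryCatch(
      list(success = TRUE, result = fn(chunk_ids, ...), error = NULL),
      # A failing chunk is recorded instead of aborting the whole run.
      error = function(e) list(success = FALSE, result = NULL, error = conditionMessage(e))
    )
  })
}

worker <- function(chunk_ids, ...) {
  if (any(chunk_ids == 4)) stop("chunk failed")
  sum(chunk_ids)
}

out <- run_in_chunks(1:6, worker, chunk_size = 2)
vapply(out, function(x) x$success, logical(1))
```

Recording failures per chunk is what makes a later rerun of only the failed chunks possible.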
Evaluates 'pc_batch()' execution under multiple chunk-size and parallel settings and returns runtime and failure metrics.
pc_benchmark(ids, fn, chunk_sizes = c(25, 50, 100), parallel_options = c(FALSE), workers = NULL, ...)
ids |
Identifier vector. |
fn |
Function applied by 'pc_batch()'. |
chunk_sizes |
Integer vector of chunk sizes. |
parallel_options |
Logical vector controlling parallel toggle. |
workers |
Number of workers used when parallel is enabled. |
... |
Additional arguments passed to 'fn'. |
This benchmark is intended for tuning operational parameters before running large production queries.
A tibble with runtime and success metrics for each benchmark scenario.
bm <- pc_benchmark(
  ids = 1:20,
  fn = function(chunk_ids, ...) sum(chunk_ids),
  chunk_sizes = c(5, 10),
  parallel_options = FALSE
)
nrow(bm)
Executes benchmark scenarios (defaults: 10, 1000, 100000 identifiers), evaluates threshold gates, and optionally writes a report artifact.
pc_benchmark_harness(fn, ids = NULL, scenario_sizes = c(10L, 1000L, 100000L), id_generator = NULL, chunk_sizes = c(25L, 100L, 1000L), parallel_options = c(FALSE), workers = NULL, thresholds = pc_default_benchmark_thresholds(), report_path = NULL, report_format = c("markdown", "csv", "rds"), ...)
fn |
Function applied by 'pc_batch()'/'pc_benchmark()'. |
ids |
Optional base identifier vector. If shorter than a scenario size, identifiers are recycled. |
scenario_sizes |
Integer vector of benchmark scenario sizes. |
id_generator |
Optional function 'function(n)' returning 'n' identifiers. |
chunk_sizes |
Integer vector of chunk sizes evaluated per scenario. |
parallel_options |
Logical vector controlling parallel toggle. |
workers |
Number of workers used when parallel is enabled. |
thresholds |
Named list with optional elements 'elapsed_sec' and 'failed_chunk_ratio'. Each can be scalar or a named numeric vector keyed by scenario size. |
report_path |
Optional path to write report output. |
report_format |
One of '"markdown"', '"csv"', or '"rds"'. |
... |
Additional arguments passed to 'fn'. |
Scenario-level summaries include elapsed-time and failed-chunk-ratio gates, making this helper useful for CI performance regression checks.
An object of class 'PubChemBenchmarkReport' containing 'details' and 'summary' tibbles.
report <- pc_benchmark_harness(
  fn = function(chunk_ids, ...) sum(chunk_ids),
  ids = 1:50,
  scenario_sizes = c(10L, 20L),
  chunk_sizes = c(5L),
  parallel_options = FALSE
)
class(report)
Clears request cache entries stored in memory and/or on disk.
pc_cache_clear(cache_dir = NULL, memory = TRUE, disk = TRUE)
cache_dir |
Cache directory. If 'NULL', uses configured cache directory. |
memory |
Logical; clear in-memory cache. |
disk |
Logical; clear on-disk cache. |
Use this helper before reproducible reruns or after changing transport settings when you want to avoid stale cached responses.
Invisibly returns 'TRUE'.
tmp_dir <- tempdir()
pc_cache_info(cache_dir = tmp_dir)
pc_cache_clear(cache_dir = tmp_dir, memory = TRUE, disk = FALSE)
pc_cache_info(cache_dir = tmp_dir)
Reports the current number and size of cached items in memory and on disk.
pc_cache_info(cache_dir = NULL)
cache_dir |
Cache directory. If 'NULL', uses configured cache directory. |
This is a lightweight diagnostic utility for validating cache behavior in long-running workflows and CI jobs.
A one-row tibble with memory and disk cache diagnostics.
pc_cache_info(cache_dir = tempdir())
Resolves a 'PubChemAsyncQuery' object to a final 'PubChemResult' by polling when a listkey is present, or returning the initial response otherwise.
pc_collect(x, ...)
x |
A 'PubChemAsyncQuery' object. |
... |
Additional arguments passed to 'pc_poll'. |
This helper simplifies asynchronous flow control by encapsulating the listkey/no-listkey branching logic in one call.
A 'PubChemResult' object.
q <- structure(
  list(
    initial = pc_request(identifier = 2244, offline = TRUE),
    listkey = NULL,
    domain = "compound",
    operation = NULL,
    output = "JSON",
    options = NULL
  ),
  class = "PubChemAsyncQuery"
)
out <- pc_collect(q)
inherits(out, "PubChemResult")
Convenience wrapper around 'pc_request()' for 'compound' domain record retrieval.
pc_compound(identifier, namespace = "cid", operation = NULL, searchtype = NULL, output = "JSON", options = NULL, ...)
identifier |
Identifier(s). |
namespace |
PubChem namespace. |
operation |
Operation. |
searchtype |
Search type. |
output |
Output format. |
options |
Named list of query options. |
... |
Additional arguments forwarded to 'httr::RETRY'. |
Returns a typed 'PubChemRecord' object and preserves transport metadata such as 'success', 'status', and cache flags.
A typed 'PubChemRecord' object.
cmp <- pc_compound(2244, offline = TRUE)
inherits(cmp, "PubChemRecord")
## Not run:
pc_compound(2244)
## End(Not run)
Reads or updates global defaults used by the next-generation PubChemR transport helpers (timeouts, retries, cache settings, and rate limits).
pc_config(...)
... |
Named configuration values to update. |
Call with no arguments to inspect the active configuration. Named arguments update only the provided keys and keep all other values unchanged.
A named list of active configuration values.
cfg <- pc_config()
names(cfg)
pc_config(rate_limit = cfg$rate_limit)$rate_limit
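The "update only the provided keys" behavior is the familiar partial-update pattern for named lists. A minimal base R sketch, assuming a hypothetical `update_config()` helper and made-up key values (only `rate_limit` is a key name that appears in the example above), could look like this:

```r
# Hypothetical defaults; the real keys and values are whatever pc_config() reports.
defaults <- list(timeout = 60, retries = 3, rate_limit = 5)

# Hypothetical helper: named arguments overwrite matching keys, others are kept.
update_config <- function(config, ...) {
  utils::modifyList(config, list(...))
}

cfg <- update_config(defaults, retries = 5)
cfg
```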
Joins cross-domain PubChem tables into a single analysis-ready tibble using configurable key mappings and join type.
pc_cross_domain_join(compounds, substances = NULL, assays = NULL, targets = NULL, by = list(compound_substance = "CID", compound_assay = "CID", assay_target = "AID"), join = c("left", "inner", "full"))
compounds |
Base compound table. |
substances |
Optional substance table. |
assays |
Optional assay table. |
targets |
Optional target table. |
by |
Named list of join keys for each join edge. |
join |
Join type ('"left"', '"inner"', '"full"'). |
Join steps are applied in order: compounds-substances, compounds-assays, then assays-targets when corresponding tables are supplied.
A joined tibble suitable for downstream analysis workflows.
compounds <- tibble::tibble(CID = c("1", "2"), MW = c(100, 200))
assays <- tibble::tibble(CID = c("1", "2"), AID = c("10", "11"))
pc_cross_domain_join(compounds, assays = assays)
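The ordered join steps can be approximated with base R `merge()` calls. This sketch assumes left joins and the default keys (`CID` for compounds-assays, `AID` for assays-targets); the `targets` table and its `Target` column are made up for illustration, and the package itself may join differently under the hood.

```r
compounds <- data.frame(CID = c("1", "2"), MW = c(100, 200), stringsAsFactors = FALSE)
assays <- data.frame(CID = c("1", "2"), AID = c("10", "11"), stringsAsFactors = FALSE)
# Made-up target table: only assay 10 has a target annotation.
targets <- data.frame(AID = "10", Target = "T1", stringsAsFactors = FALSE)

# Step 1: compounds-assays on CID (left join keeps every compound).
joined <- merge(compounds, assays, by = "CID", all.x = TRUE)
# Step 2: assays-targets on AID (left join keeps rows without a target).
joined <- merge(joined, targets, by = "AID", all.x = TRUE)
joined
```

A left join keeps compounds with no matching assay or target as rows with `NA` in the added columns, whereas an inner join would drop them.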
Writes model-ready data to disk from either a 'PubChemModelMatrix' object or a tabular object.
pc_export_model_data(x, path, format = c("csv", "rds"), include_ids = TRUE, include_outcome = TRUE)
x |
Input object. Supported: - 'PubChemModelMatrix' - 'data.frame'/'tibble' |
path |
Output path. |
format |
Export format, one of '"csv"' or '"rds"'. |
include_ids |
Logical. Include ID columns from 'PubChemModelMatrix'. |
include_outcome |
Logical. Include outcome vector from 'PubChemModelMatrix'. |
Output format is selected by 'format'; 'csv' is written with 'utils::write.csv()' and 'rds' with 'saveRDS()'.
Invisibly returns a list with 'path', 'format', and output dimensions.
out_file <- tempfile(fileext = ".csv")
x <- tibble::tibble(CID = c("1", "2"), x1 = c(0.1, 0.2))
meta <- pc_export_model_data(x, path = out_file, format = "csv")
file.exists(meta$path)
Retrieves compound property fields and returns them in a flat table intended for feature engineering and modeling workflows.
pc_feature_table(identifier, properties = c("MolecularWeight", "XLogP", "TPSA", "HBondDonorCount", "HBondAcceptorCount"), namespace = "cid", numeric_only = TRUE, ...)
identifier |
Identifier vector for compound property retrieval. |
properties |
Compound property names. |
namespace |
Namespace for identifier. |
numeric_only |
If 'TRUE', coerce feature columns to numeric where possible. |
... |
Additional arguments passed to 'pc_property()'. |
Transport metadata columns are removed from the returned table. With 'numeric_only = TRUE', feature columns are opportunistically converted to numeric where conversion yields at least one finite value.
A tibble of compound features suitable for downstream modeling workflows.
names(formals(pc_feature_table))
## Not run:
pc_feature_table(2244, properties = c("MolecularWeight", "XLogP"))
## End(Not run)
Converts one identifier type to another ('CID', 'SID', 'AID') using PubChem identifier mapping endpoints.
pc_identifier_map(identifier, namespace = "name", to = c("cids", "sids", "aids"), domain = "compound", searchtype = NULL, options = NULL, ...)
identifier |
Identifier(s). |
namespace |
PubChem namespace. |
to |
Operation target; usually one of '"cids"', '"sids"', or '"aids"'. |
domain |
PubChem domain. |
searchtype |
Search type. |
options |
Named list of query options. |
... |
Additional arguments forwarded to 'httr::RETRY'. |
This is commonly used to bridge domains, for example mapping names to CIDs before downstream property or assay workflows.
A typed 'PubChemIdMap' object.
id_map <- pc_identifier_map("aspirin", namespace = "name", to = "cids", offline = TRUE)
inherits(id_map, "PubChemIdMap")
## Not run:
pc_identifier_map("aspirin", namespace = "name", to = "cids")
## End(Not run)
Returns the compatibility policy table used for PubChemR legacy and next-generation APIs.
pc_lifecycle_policy()
The table summarizes expected support windows and deprecation notice guarantees by API stream.
A tibble describing PubChemR compatibility and deprecation guarantees.
pc_lifecycle_policy()
Converts tabular PubChem features into a numeric model-matrix bundle with optional outcome and identifier metadata.
pc_model_matrix(x, outcome = NULL, id_cols = c("CID", "SID", "AID", "Identifier"), na_fill = NULL, scale = FALSE)
x |
Input table or 'PubChemResult'. |
outcome |
Optional outcome column name. |
id_cols |
Identifier columns excluded from predictors. |
na_fill |
Optional numeric value used to fill missing predictor values. |
scale |
Logical; center and scale predictor matrix. |
Non-numeric predictors are coerced when feasible, then filtered to numeric columns only. The returned object stores predictors, optional response, and identifier columns.
An object of class 'PubChemModelMatrix' with 'x', 'y', and metadata fields.
tbl <- tibble::tibble(CID = c("1", "2"), x1 = c("1.0", "2.0"), y = c(0, 1))
mm <- pc_model_matrix(tbl, outcome = "y")
class(mm)
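The "coerce when feasible, then keep numeric columns only" step can be illustrated in base R as follows. This is a sketch under assumptions rather than the package's code: the feasibility rule used here (a column is converted only if at least one value parses as a number) is a guess, and the `label` column is made up to show a predictor being dropped.

```r
tbl <- data.frame(
  CID = c("1", "2", "3"),
  x1 = c("1.0", "2.0", "3.0"),   # numeric values stored as text
  label = c("a", "b", "c"),      # made-up non-numeric column
  y = c(0, 1, 0),
  stringsAsFactors = FALSE
)
id_cols <- "CID"
outcome <- "y"

# Predictors are everything except identifier and outcome columns.
pred <- tbl[, setdiff(names(tbl), c(id_cols, outcome)), drop = FALSE]

# Assumed rule: convert a column only if at least one value parses as numeric.
pred[] <- lapply(pred, function(col) {
  num <- suppressWarnings(as.numeric(col))
  if (all(is.na(num))) col else num
})

# Keep numeric predictors, then center and scale (as with scale = TRUE).
pred <- pred[, vapply(pred, is.numeric, logical(1)), drop = FALSE]
x <- scale(as.matrix(pred))
y <- tbl[[outcome]]
x
```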
Polls PubChem listkey endpoints until results are ready or the attempt limit is reached.
pc_poll(x, domain = "compound", operation = NULL, output = "JSON", options = NULL, interval = 1.5, max_attempts = 20, ...)
x |
A 'PubChemAsyncQuery' object or listkey string. |
domain |
Domain for polling. |
operation |
Operation for polling. |
output |
Output format. |
options |
Polling options. |
interval |
Poll interval in seconds. |
max_attempts |
Maximum polling attempts. |
... |
Additional arguments passed to 'pc_request'. |
Polling stops early once 'pending = FALSE'. On timeout, a structured 'PubChemResult' failure with code 'PollingTimeout' is returned.
A 'PubChemResult' object.
polled <- pc_poll("example-listkey", max_attempts = 1, offline = TRUE)
polled$success
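The polling contract (stop early once the result is no longer pending, otherwise return a structured timeout) can be sketched with a plain loop. The helper `poll_until_ready()`, the `check` callback, and the shape of the returned lists are hypothetical; only the `pending` flag, the `interval` and `max_attempts` arguments, and the 'PollingTimeout' code come from the description above.

```r
# Hypothetical polling loop; `check` stands in for one listkey request.
poll_until_ready <- function(check, interval = 1.5, max_attempts = 20) {
  for (attempt in seq_len(max_attempts)) {
    res <- check(attempt)
    # Stop early as soon as the result is no longer pending.
    if (!isTRUE(res$pending)) {
      return(list(success = TRUE, attempts = attempt, result = res))
    }
    # Wait between attempts, but not after the last one.
    if (attempt < max_attempts) Sys.sleep(interval)
  }
  # Attempt limit reached: return a structured failure instead of raising an error.
  list(success = FALSE, attempts = max_attempts, error = list(code = "PollingTimeout"))
}

# Fake check that reports "ready" on the third attempt.
done <- poll_until_ready(function(attempt) list(pending = attempt < 3), interval = 0)
# Fake check that never finishes.
timed_out <- poll_until_ready(function(attempt) list(pending = TRUE),
                              interval = 0, max_attempts = 2)
```

Returning a failure object rather than stopping lets batch workflows record the timeout and continue with other identifiers.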
Applies a named transport profile and optional overrides to 'pc_config()'.
pc_profile(profile = c("default", "cloud", "high_throughput"), ...)
profile |
One of '"default"', '"cloud"', or '"high_throughput"'. |
... |
Named overrides applied on top of the selected profile. |
Profiles provide baseline settings for rate limits, retries, and cache TTL. Any named overrides supplied in '...' replace the selected profile values.
A named list with active transport configuration.
cfg <- pc_profile("default")
cfg$rate_limit
Retrieves selected PubChem compound properties for one or more identifiers.
pc_property(identifier, properties, namespace = "cid", searchtype = NULL, options = NULL, ...)
identifier |
Identifier(s). |
properties |
Character vector of property names. |
namespace |
PubChem namespace. |
searchtype |
Search type. |
options |
Named list of query options. |
... |
Additional arguments forwarded to 'httr::RETRY'. |
'properties' is encoded into the PUG REST operation path. At least one property name is required.
A typed 'PubChemRecord' object.
prop_rec <- pc_property(2244, properties = c("MolecularWeight"), offline = TRUE)
inherits(prop_rec, "PubChemRecord")
## Not run:
pc_property(2244, properties = c("MolecularWeight", "XLogP"))
## End(Not run)
Executes PubChem API calls with unified handling for URL construction, retries, throttling, and optional cache replay.
pc_request(domain = "compound", namespace = "cid", identifier = NULL, operation = NULL, searchtype = NULL, output = "JSON", options = NULL, method = c("GET", "POST"), body = NULL, rate_limit = TRUE, timeout = NULL, retries = NULL, pause_base = NULL, pause_cap = NULL, user_agent = NULL, cache = FALSE, cache_dir = NULL, cache_ttl = NULL, force_refresh = FALSE, offline = NULL, ...)
domain |
PubChem domain. |
namespace |
PubChem namespace. |
identifier |
Identifier(s). |
operation |
Operation. |
searchtype |
Search type. |
output |
Output format. |
options |
Named list of query options. |
method |
HTTP method; '"GET"' or '"POST"'. |
body |
Optional POST body. |
rate_limit |
'TRUE' to use configured default, 'FALSE' to disable, or numeric req/sec. |
timeout |
Timeout in seconds. |
retries |
Retry count. |
pause_base |
Retry base pause. |
pause_cap |
Retry max pause. |
user_agent |
User-agent string. |
cache |
Logical; enable memory+disk cache. |
cache_dir |
Cache directory. |
cache_ttl |
TTL in seconds. |
force_refresh |
Skip cache and refresh. |
offline |
'TRUE' to use cache-only replay mode (no network calls). |
... |
Additional arguments forwarded to 'httr::RETRY'. |
'pc_request()' is the low-level engine behind higher-level 'pc_*' helpers. When 'offline = TRUE', requests are served from cache only and return a structured failure object on cache misses.
An object of class 'PubChemResult'.
# Fast, network-free call: returns cache hit or structured cache-miss result.
res <- pc_request(identifier = 2244, offline = TRUE)
res$success
## Not run:
pc_request(identifier = 2244)
## End(Not run)
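The cache-only replay behavior of `offline = TRUE` can be pictured with a toy in-memory cache. Everything in this sketch is hypothetical (the `cached_request()` helper, the cache key, the `fetch` callback, and the result fields); it only illustrates why an offline call either replays a stored response or returns a structured failure without touching the network.

```r
# Toy in-memory cache; the package also supports an on-disk cache.
cache <- new.env()

# Hypothetical helper: `fetch` stands in for the real network call.
cached_request <- function(key, fetch, offline = FALSE) {
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(list(success = TRUE, from_cache = TRUE, content = get(key, envir = cache)))
  }
  if (offline) {
    # Cache miss in offline mode: structured failure, no network access.
    return(list(success = FALSE, from_cache = FALSE, error = "cache miss in offline mode"))
  }
  value <- fetch()
  assign(key, value, envir = cache)
  list(success = TRUE, from_cache = FALSE, content = value)
}

miss <- cached_request("compound/cid/2244", function() "payload", offline = TRUE)
first <- cached_request("compound/cid/2244", function() "payload")
hit <- cached_request("compound/cid/2244", function() stop("not called"), offline = TRUE)
```

This also shows why the documented examples check `res$success` after an offline call: on a cold cache the result is a failure object, not an error.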
Parses PubChem response payloads and standardizes them into a typed 'PubChemResult' object with success, error, and metadata fields.
pc_response(response, request = list())
response |
A 'httr' response object or raw text. |
request |
Request metadata list. |
JSON and plain-text payloads are supported. Fault payloads and non-2xx HTTP statuses are normalized into structured error entries rather than raising immediate transport exceptions.
An object of class 'PubChemResult'.
ok <- pc_response('{"IdentifierList":{"CID":[2244]}}', request = list(domain = "compound"))
ok$success
fail <- pc_response('{"Fault":{"Code":"PUGREST.NotFound","Message":"Not found"}}')
fail$success
Reloads a previously checkpointed 'pc_batch()' run and executes pending or failed chunks.
pc_resume_batch(fn, checkpoint_dir, checkpoint_id, parallel = FALSE, workers = NULL, rerun_failed = TRUE, ...)
fn |
Function to run on each pending chunk. |
checkpoint_dir |
Directory containing checkpoint manifest/files. |
checkpoint_id |
Checkpoint run id. |
parallel |
Logical; use parallel execution where supported. |
workers |
Number of workers. |
rerun_failed |
Logical; rerun chunks previously marked as failed. |
... |
Additional arguments passed into 'fn'. |
This helper reads the batch manifest created by 'pc_batch()' and preserves the original chunking strategy.
A typed 'PubChemBatchResult' object.
cp_dir <- tempdir()
cp_id <- "pc-doc-example"
pc_batch(
  ids = 1:4,
  fn = function(chunk_ids, ...) sum(chunk_ids),
  chunk_size = 2,
  checkpoint_dir = cp_dir,
  checkpoint_id = cp_id
)
resumed <- pc_resume_batch(
  fn = function(chunk_ids, ...) sum(chunk_ids),
  checkpoint_dir = cp_dir,
  checkpoint_id = cp_id
)
resumed$checkpoint$resumed
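The resume logic can be sketched in base R as follows. The manifest layout (one status per chunk) and the `resume()` helper are hypothetical and are not the on-disk format used by 'pc_batch()'. The sketch only shows how pending and, optionally, failed chunks might be selected while completed chunks are left alone.

```r
# Split ids into fixed-size chunks, as a batch runner might.
ids <- 1:6
chunks <- split(ids, ceiling(seq_along(ids) / 2))

# Hypothetical manifest: one status per chunk, as if a previous run was interrupted.
manifest <- data.frame(
  chunk = names(chunks),
  status = c("done", "failed", "pending"),
  stringsAsFactors = FALSE
)
results <- list("1" = 3)  # result already stored for the completed chunk

resume <- function(manifest, chunks, results, fn, rerun_failed = TRUE) {
  # Select pending chunks, plus failed ones when rerun_failed is TRUE.
  todo <- manifest$status == "pending" |
    (rerun_failed & manifest$status == "failed")
  for (i in which(todo)) {
    key <- manifest$chunk[i]
    out <- tryCatch(fn(chunks[[key]]), error = function(e) e)
    if (inherits(out, "error")) {
      manifest$status[i] <- "failed"
    } else {
      results[[key]] <- out
      manifest$status[i] <- "done"
    }
  }
  list(manifest = manifest, results = results)
}

resumed_all  <- resume(manifest, chunks, results, fn = sum)
resumed_some <- resume(manifest, chunks, results, fn = sum, rerun_failed = FALSE)
```

Because the chunks are read back rather than recomputed, the original chunking is preserved across runs.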
Queries the PubChem SDQ (Structured Data Query) agent to retrieve the full
biological test results table for a compound. Uses the download query
mode to return all available columns for each record. The number and names
of columns vary by compound depending on available data (e.g. baid, aid,
sid, cid, activityid, aidtypeid, aidname, targetname, cmpdname, acvalue,
geneid, etc.).
pc_sdq_bioactivity(
  identifier,
  namespace = "cid",
  collection = "bioactivity",
  limit = 10000000L,
  order = "activity,asc",
  rate_limit = TRUE,
  cache = FALSE,
  cache_dir = NULL,
  cache_ttl = NULL,
  force_refresh = FALSE
)
identifier |
A single compound identifier (CID, name, or InChIKey
depending on |
namespace |
Character. The namespace for |
collection |
Character. SDQ collection to query. Default "bioactivity". |
limit |
Integer. Maximum number of rows to return. Default 10000000L. |
order |
Character. Column and direction for sorting results. Default "activity,asc". |
rate_limit |
Logical or numeric. If |
cache |
Logical. If |
cache_dir |
Character. Directory for disk cache. Defaults to the value
from |
cache_ttl |
Numeric. Cache time-to-live in seconds. Defaults to the
value from |
force_refresh |
Logical. If |
When namespace != "cid", the identifier is first resolved to CID via
pc_request before querying SDQ. Returned columns depend on
source availability for the requested compound.
A tibble of class PubChemTable containing the full
bioactivity results.
names(formals(pc_sdq_bioactivity))
## Not run:
# Retrieve bioactivity data for aspirin (CID 2244)
bio <- pc_sdq_bioactivity(2244)
head(bio)
## End(Not run)
Performs similarity search requests and returns mapped identifiers in a typed result object.
pc_similarity_search(
  identifier,
  namespace = c("smiles", "cid", "inchi", "sdf"),
  domain = "compound",
  to = c("cids", "sids", "aids"),
  searchtype = c("similarity", "fastsimilarity_2d", "fastsimilarity_3d"),
  threshold = 95,
  max_records = NULL,
  options = NULL,
  ...
)
identifier |
Query identifier(s) for similarity search. |
namespace |
Namespace for query identifiers. |
domain |
PubChem domain. |
to |
Output mapping target ('"cids"', '"sids"', or '"aids"'). |
searchtype |
Similarity mode ('"similarity"', '"fastsimilarity_2d"', '"fastsimilarity_3d"'). |
threshold |
Similarity threshold. |
max_records |
Optional maximum number of records returned by PubChem. |
options |
Optional named list of additional query options. |
... |
Additional arguments passed to 'pc_request()'. |
Similarity mode is controlled by 'searchtype' and 'threshold'. Optional 'max_records' is translated to 'list_return' and appended to query options.
A typed object inheriting from 'PubChemIdMap'.
sim <- pc_similarity_search("CC(=O)OC1=CC=CC=C1C(=O)O", namespace = "smiles", offline = TRUE)
inherits(sim, "PubChemIdMap")
## Not run:
pc_similarity_search("CC(=O)OC1=CC=CC=C1C(=O)O", namespace = "smiles")
## End(Not run)
Submits a potentially asynchronous PubChem request and returns a query object containing the initial response and listkey metadata.
pc_submit(
  domain = "compound",
  namespace = "cid",
  identifier = NULL,
  operation = NULL,
  searchtype = NULL,
  output = "JSON",
  options = NULL,
  ...
)
domain |
PubChem domain. |
namespace |
PubChem namespace. |
identifier |
Identifier(s). |
operation |
Operation. |
searchtype |
Search type. |
output |
Output format. |
options |
Named list of query options. |
... |
Additional arguments forwarded to 'httr::RETRY'. |
If PubChem responds with a waiting payload, 'listkey' is captured and can be polled later using 'pc_poll()' or 'pc_collect()'.
An object of class 'PubChemAsyncQuery'.
q <- pc_submit(identifier = 2244, offline = TRUE)
inherits(q, "PubChemAsyncQuery")
## Not run:
pc_submit(identifier = 2244)
## End(Not run)
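The submit-then-poll pattern can be illustrated without any network access. In the sketch below, `make_mock_job()` and `poll_until_done()` are hypothetical stand-ins. They are not 'pc_poll()' or 'pc_collect()', and the shape of the waiting payload is an assumption.

```r
# Mock of a remote job: reports "waiting" for the first polls, then returns data.
# Everything here is hypothetical and stands in for real network calls.
make_mock_job <- function(ready_after = 2L) {
  polls <- 0L
  function(listkey) {
    polls <<- polls + 1L
    if (polls <= ready_after) {
      list(waiting = TRUE, listkey = listkey)
    } else {
      list(waiting = FALSE, data = c(2244L, 2245L))
    }
  }
}

# Poll until the job finishes or the attempt budget runs out.
poll_until_done <- function(poll_fn, listkey, max_attempts = 5L, delay = 0) {
  for (attempt in seq_len(max_attempts)) {
    res <- poll_fn(listkey)
    if (!isTRUE(res$waiting)) {
      return(list(success = TRUE, attempts = attempt, data = res$data))
    }
    Sys.sleep(delay)
  }
  list(success = FALSE, attempts = max_attempts, data = NULL)
}

done    <- poll_until_done(make_mock_job(2L), listkey = "abc123")
timeout <- poll_until_done(make_mock_job(10L), listkey = "abc123", max_attempts = 3L)
```

Bounding the number of attempts keeps a stalled job from blocking the session indefinitely.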
Convenience wrapper around 'pc_request()' for 'substance' domain retrieval.
pc_substance(
  identifier,
  namespace = "sid",
  operation = "record",
  output = "JSON",
  options = NULL,
  ...
)
identifier |
Identifier(s). |
namespace |
PubChem namespace. |
operation |
Operation. |
output |
Output format. |
options |
Named list of query options. |
... |
Additional arguments forwarded to 'httr::RETRY'. |
The function returns a typed record object without altering payload shape, making it suitable for downstream custom parsers.
A typed 'PubChemRecord' object.
sub_rec <- pc_substance(5360534, offline = TRUE)
inherits(sub_rec, "PubChemRecord")
## Not run:
pc_substance(5360534)
## End(Not run)
Converts SMILES strings from a PubChem table to a 'ChemmineR' SDF object.
pc_to_chemminer(x, smiles_col = "CanonicalSMILES")
x |
Input table or 'PubChemResult'. |
smiles_col |
Column with SMILES strings. |
Requires the optional 'ChemmineR' package with an available 'smiles2sdf()' function.
A 'ChemmineR' SDF object.
names(formals(pc_to_chemminer))
## Not run:
ex_tbl <- tibble::tibble(CanonicalSMILES = "CCO")
sdf <- pc_to_chemminer(ex_tbl)
class(sdf)
## End(Not run)
Converts SMILES strings from a PubChem table into 'rcdk' molecule objects.
pc_to_rcdk(x, smiles_col = "CanonicalSMILES", id_col = "CID")
x |
Input table or 'PubChemResult'. |
smiles_col |
Column with SMILES strings. |
id_col |
Optional identifier column used to name returned molecules. |
Requires the optional 'rcdk' package. Empty or missing SMILES values are dropped before conversion.
A list of 'rcdk' molecule objects.
names(formals(pc_to_rcdk))
## Not run:
ex_tbl <- tibble::tibble(CID = "1", CanonicalSMILES = "CCO")
mols <- pc_to_rcdk(ex_tbl)
length(mols)
## End(Not run)
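The pre-filtering step mentioned above (dropping empty or missing SMILES before conversion) might look like the following base-R sketch. It does not call 'rcdk', and the table contents are made up for illustration. Only the filtering and naming logic is shown.

```r
# Example table with one valid, one missing, and one empty SMILES value.
tbl <- data.frame(
  CID = c("1", "2", "3"),
  CanonicalSMILES = c("CCO", NA, ""),
  stringsAsFactors = FALSE
)

# Keep rows whose SMILES is neither NA nor an empty/whitespace-only string.
keep <- !is.na(tbl$CanonicalSMILES) & nzchar(trimws(tbl$CanonicalSMILES))
clean <- tbl[keep, , drop = FALSE]

# Name the remaining SMILES by their identifier column.
smiles <- setNames(clean$CanonicalSMILES, clean$CID)
```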
Generic accessor returning raw payload data from a PugRestInstance.
pubChemData(object, ...)
## S3 method for class 'PugRestInstance'
pubChemData(object, ...)
object |
an object of class 'PugRestInstance' returned from get_pug_rest function. |
... |
additional arguments. Currently has no effect on results. |
This helper bypasses higher-level format-specific extractors and returns the underlying parsed result object as stored on the request instance.
A vector, list, or data.frame containing the raw data retrieved from the PubChem database through the PUG REST API.
result <- get_pug_rest(identifier = "2244", namespace = "cid", domain = "compound", output = "JSON")
pubChemData(result)
PubChemInstanceList and PubChemInstance Classes
The PubChemInstanceList object is a superclass returned by a request for compound(s) from
the PubChem Database, such as the output from get_compounds, get_assays, etc.
The PubChemInstance object is another superclass for a PubChem instance, such as an assay, compound, substance, etc.
These instances are nested within the results slot of a PubChemInstanceList object. Similar to PubChemInstanceList,
the PubChemInstance also contains the same slots as described below. For more details, see instance.
results:A list containing elements of each of the requested compounds, assays, substances, etc.
request_args:A list containing the input arguments of a PubChem request.
success:A logical value indicating whether the request was successfully completed (TRUE) or not (FALSE).
error:A list detailing any errors encountered during the request, if applicable.
There is no constructor function for the PubChemInstanceList or PubChemInstance classes. These objects are
constructed within related functions and returned as the output of PubChem requests.
There are several subclasses defined under the PubChemInstanceList and PubChemInstance superclasses. The PubChem API
returns request results in a list; however, each request may have a different list structure and/or items within the returned list.
Therefore, we have defined subclasses to make generic functions compatible with any PubChem request, such as assays, instances,
substances, etc. These subclasses may include PC_Compounds, PC_Substance, PC_Properties, PubChemInstance_AIDs,
PubChemInstance_SIDs, PubChemInstance_CIDs, PubChemInstance_Synonyms, and PubChemInstance_Substances.
Most of the defined subclasses have similar slots as described above. However, some classes may have additional slots not described here. Please refer to the contents of the returned object for more details.
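The superclass/subclass arrangement described above relies on ordinary S3 dispatch. The sketch below uses made-up class names (`Instance`, `Instance_CIDs`, `Instance_Other`) and a made-up generic `describe()`. It is not PubChemR code and only shows how a subclass-specific method can coexist with a superclass fallback.

```r
# Hypothetical generic and classes, loosely mirroring a superclass/subclass layout.
describe <- function(object, ...) UseMethod("describe")

# Fallback method for the superclass.
describe.Instance <- function(object, ...) {
  paste("generic instance with", length(object$results), "result(s)")
}

# More specific method for one subclass; other subclasses use the fallback.
describe.Instance_CIDs <- function(object, ...) {
  paste("CIDs:", paste(object$results, collapse = ", "))
}

# Constructor that puts the subclass in front of the superclass.
new_instance <- function(results, subclass = NULL) {
  structure(
    list(results = results, success = TRUE, error = NULL),
    class = c(subclass, "Instance")
  )
}

cids  <- new_instance(c(2244, 3672), subclass = "Instance_CIDs")
other <- new_instance(list(a = 1), subclass = "Instance_Other")

describe(cids)   # dispatches to describe.Instance_CIDs
describe(other)  # falls back to describe.Instance
```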
The Pug View API of the PubChem database returns more detailed information about a PubChem request, such as assays, compounds,
substances, etc. The superclass PugViewInstance is returned by the get_pug_view function. This class has
the slots detailed below.
results:A list containing elements of each of the requested compounds, assays, substances, etc.
request_args:A list containing the input arguments of a PubChem request.
success:A logical value indicating whether the request was successfully completed (TRUE) or not (FALSE).
error:A list detailing any errors encountered during the request, if applicable.
The Pug View API returns many sections about the requested instance, which include detailed information from the PubChem database. There may be many nested sections, each containing details about a different feature of the requested instance. These sections can be listed via the sectionList function.
Other classes, called PugViewSectionList and PugViewSection, are defined to control the outputs of available sections
and sub-sections returned from get_pug_view. See related functions for details.
This function retrieves the input arguments from a specified PubChem database request object.
request_args(object, .which = NULL, ...)
object |
An object returned from related request functions of the PubChem database. |
.which |
A string specifying which argument's content to retrieve from |
... |
Additional arguments. These have no effect on the returned outputs and are included for compatibility with S3 methods in the PubChemR package. |
This accessor is useful for auditing request provenance when downstream objects are passed through multiple transformation steps.
A list or string vector containing the options used in the function call.
request <- get_cids("aspirin", namespace = "name")
request_args(request, "identifier")
request_args(request)
This generic function extracts a specific slot from a PubChem instance.
retrieve(object, ...)
## S3 method for class 'PubChemInstance'
retrieve(object, .slot = NULL, .to.data.frame = TRUE, .verbose = FALSE, ...)
## S3 method for class 'PubChemInstanceList'
retrieve(
  object,
  .which = NULL,
  .slot = NULL,
  .to.data.frame = TRUE,
  .combine.all = FALSE,
  ...
)
## S3 method for class 'PC_Substance'
retrieve(
  object,
  .slot = NULL,
  .idx = 1,
  .to.data.frame = TRUE,
  .verbose = FALSE,
  ...
)
## S3 method for class 'PugViewInstance'
retrieve(object, .slot = NULL, .to.data.frame = TRUE, ...)
## S3 method for class 'PugViewSection'
retrieve(object, .slot = NULL, .to.data.frame = FALSE, ...)
object |
An object returned from a PubChem request. |
... |
Additional arguments passed to other methods. |
.slot |
A string specifying which slot to return. It should not be NULL or have a length greater than 1, with some exceptions. See the notes for details. |
.to.data.frame |
A logical value. If TRUE, the returned object will be converted into a data.frame (or tibble).
If conversion to a data.frame fails, a list will be returned with a warning. Be cautious with complex lists
(i.e., many elements nested within each other) as it may be time-consuming to convert such lists into a data frame.
Additionally, |
.verbose |
A logical value. Should the resulting object be printed to the R console? If TRUE, the object is returned invisibly and the output is printed nicely to the R console. This option may not be available for some slots (or classes). See Notes/Details. |
.which |
A character value. This is the identifier of the PubChem request that will be extracted from the complete list. It is ignored if |
.combine.all |
a logical value. If TRUE, the properties of all requested instances are combined into a single data frame (or a list if |
.idx |
An integer indicating which substance result should be returned. A PubChem request may return multiple
substances in the output. |
'retrieve()' is a generic extractor that targets result slots across several PubChemR object classes and can optionally coerce outputs to tabular form.
Retrieved slot contents as a vector, list, or tabular object, depending on method and arguments.
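The conversion fallback described for .to.data.frame (try a tabular conversion, otherwise return the list with a warning) can be sketched in base R. The `to_table_or_list()` helper is hypothetical and simpler than what 'retrieve()' does internally.

```r
# Hypothetical helper: try tabular conversion, fall back to the list with a warning.
to_table_or_list <- function(x, .to.data.frame = TRUE) {
  if (!.to.data.frame) return(x)
  tryCatch(
    as.data.frame(x, stringsAsFactors = FALSE),
    error = function(e) {
      warning("Could not convert to a data.frame; returning a list.", call. = FALSE)
      x
    }
  )
}

# A flat list converts cleanly to a one-row data frame.
flat <- to_table_or_list(list(CID = 2244, Formula = "C9H8O4"))

# Elements of unequal length cannot form a data frame, so the list is returned.
ragged <- suppressWarnings(to_table_or_list(list(a = 1:2, b = 1:3)))
```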
'PugViewInstance' and 'PugViewSection'
The PugView API returns a detailed list related to PubChem requests. The 'Section' slot in this list is structured into
a sub-class called 'PugViewSection'. This object contains information organized through several sections (or sub-sections),
which can be retrieved using section-specific functions such as section and sectionList.
The function argument .to.data.frame is ignored if the "Section" slot is being extracted from the complete list.
For other slots, .to.data.frame is considered as usual. See examples for usage.
If the object is of class 'PC_Properties', .slot can be NULL. If .slot = NULL, retrieve() will return all available properties. If 'object' is of any class other than 'PC_Properties', .slot must have length 1.
In some cases, it may be practical to extract multiple slots from 'object'. For example, one may wish to extract properties from the output of get_properties by running the function in a loop. See the code below for a practical example:
library(dplyr)
props <- get_properties(
  properties = c("MolecularWeight", "MolecularFormula", "HBondDonorCount",
                 "HBondAcceptorCount", "InChIKey", "InChI"),
  identifier = 2244,
  namespace = "cid",
  propertyMatch = list(
    .ignore.case = TRUE,
    type = "contain"
  )
)
# Keep the identifier column once, then append the second column of each result.
bind_columns <- function(x, ...){
  part1 <- x[[1]][ ,"Identifier"]
  part2 <- lapply(x, "[", 2)
  bind_cols(part1, part2)
}
propsToExtract <- c("MolecularWeight", "MolecularFormula", "HBondDonorCount")
tmp <- lapply(propsToExtract, retrieve, object = props, .which = "2244")
bind_columns(tmp)
'.verbose' argument
retrieve returns output silently (invisibly) when .verbose = TRUE. However, the function behaves differently
under the following scenarios:
.verbose is ignored if .combine.all = TRUE. The output is returned silently.
.verbose is ignored if the requested slot is not printable to the R console because it is too complicated to print.
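The "print, then return invisibly" convention behind .verbose is a standard R idiom. A minimal sketch follows, using a hypothetical `get_slot()` accessor rather than 'retrieve()' itself.

```r
# Hypothetical accessor: print a summary and return invisibly when .verbose = TRUE.
get_slot <- function(x, slot, .verbose = FALSE) {
  value <- x[[slot]]
  if (.verbose) {
    cat("Slot '", slot, "':\n", sep = "")
    print(value)
    return(invisible(value))
  }
  value
}

obj <- list(xref = c("PMID:1", "PMID:2"))

# withVisible() reports whether a result would be auto-printed at top level.
vis_quiet   <- withVisible(get_slot(obj, "xref"))
vis_verbose <- withVisible(get_slot(obj, "xref", .verbose = TRUE))
```

In both cases the value itself is the same, so the result can still be assigned or piped onward.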
compounds <- get_compounds(
  identifier = c("aspirin", "ibuprofen", "rstudio"),
  namespace = "name"
)
# Extract information for "aspirin"
aspirin <- instance(compounds, "aspirin")
# print(aspirin)
# Extract a specific slot from the "aspirin" compound.
retrieve(aspirin, "props", .to.data.frame = TRUE)
# Examples (PubChemInstanceList)
retrieve(compounds, "aspirin", "props", .to.data.frame = TRUE)
# Verbose Assay References to R Console
assays <- get_assays(identifier = c(1234, 7815), namespace = "aid")
instance(assays, "7815")
retrieve(assays, "7815", "xref", .verbose = TRUE)
# Print assay protocol to R console (if available)
# Note that it may be too long to print for some assays.
# retrieve(assays, "1234", "protocol", .verbose = TRUE)
# No protocol is available for assay "1234".
# retrieve(assays, "7815", "protocol", .verbose = TRUE)
# Ignores ".verbose" and ".which" if ".combine.all = TRUE".
retrieve(assays, .slot = "xref", .verbose = TRUE, .combine.all = TRUE)
### PUG VIEW EXAMPLES ###
pview <- get_pug_view(identifier = "2244", annotation = "data", domain = "compound")
# PugViewSectionList object.
# This object contains all the section information related to the PubChem request.
sect <- retrieve(pview, .slot = "Section")
print(sect)
retrieve(pview, .slot = "RecordType", .to.data.frame = TRUE)
section returns section details from a Pug View request.
section(object, ...)
## S3 method for class 'PugViewInstance'
section(object, .id = "S1", .verbose = FALSE, ...)
## S3 method for class 'PugViewSectionList'
section(object, .id = "S1", .verbose = FALSE, ...)
## S3 method for class 'PugViewSection'
section(object, .id = "S1", .verbose = FALSE, ...)
object |
an object returned from get_pug_view. |
... |
other arguments. Currently has no effect on the outputs. Can be ignored. |
.id |
A character value that corresponds to the ID of a specific section. Detailed information about the section with the given section ID will be returned. If NULL, the first section (i.e., "S1") is returned. If there is no section under |
.verbose |
A logical value. Should the resulting object be printed to the R console? If TRUE, the object is returned invisibly and the output is printed nicely to the R console. This option may not be available for some slots (or classes). See Notes/Details. |
'section()' traverses nested PUG View section structures and supports recursive drill-down by section IDs.
A PugViewSection object for a selected section, or NULL
when section content is unavailable.
A Pug View Request returns a detailed list from the PubChem database. This list may include data under many nested sections, each corresponding to a different property structured within further nested sections. The complicated structure of the returned object makes it impossible to print all information to the R console at once. Therefore, it is recommended to print sections selectively. Furthermore, one may navigate through the nested sections using the section function. See Examples.
Use the sectionList function to list available sections (or subsections of a section) of a Pug View request and related section IDs.
'.verbose' to Print Section Details
It is possible to print section details to the R console. If .verbose = TRUE, the resulting object is returned invisibly and a summary of section details is printed to the R console. This might be useful to navigate through nested sections and sequentially print multiple sections to the R console. For example, consider the following command:
> section(section(request, "S1", .verbose = TRUE), "S3", .verbose = TRUE)
This command will print section "S1" and the subsection "S3" located under "S1" to the R console. One may navigate through sections under other sections, similar to exploring dreams within dreams as depicted in the exceptional movie Inception. (SPOILER WARNING!!) However, be careful not to get lost or stuck in the dreams!! Also, note that this strategy works only if .verbose = TRUE for all sections and/or subsections.
# Pug View request for the compound "aspirin (CID = 2244)".
pview <- get_pug_view(identifier = "2244", annotation = "data", domain = "compound")
section(pview, "S1")
section(pview, "S1", .verbose = TRUE)
# List all available sections
sectionList(pview)
# Subsections under the section "S1"
sectionList(section(pview, "S1"))
# Print multiple sections
# section(section(pview, "S1", .verbose = TRUE), "S3", .verbose = TRUE)
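Drilling down through nested sections by ID amounts to repeated lookup in a tree. The base-R sketch below uses a made-up nested list and hypothetical `node()`, `pick()`, and `drill()` helpers. It assumes that IDs are relative to each level, and it does not reflect the actual structure of a PugViewSection object.

```r
# Hypothetical nested structure: each node has an id, a heading, and child sections.
node <- function(id, heading, children = list()) {
  list(id = id, heading = heading, children = children)
}
root <- node("root", "Record", list(
  node("S1", "Structures", list(
    node("S1", "2D Structure"),
    node("S2", "3D Conformer")
  )),
  node("S2", "Names and Identifiers")
))

# Pick the child with a given ID; NULL when no such child exists.
pick <- function(x, id) {
  if (is.null(x)) return(NULL)
  for (child in x$children) {
    if (identical(child$id, id)) return(child)
  }
  NULL
}

# Walk a path of IDs from the top level downwards.
drill <- function(x, path) Reduce(pick, path, x)

leaf   <- drill(root, c("S1", "S2"))
absent <- drill(root, c("S2", "S9"))
```

Returning NULL for an unknown ID mirrors the documented behaviour of returning NULL when section content is unavailable.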
This function lists the available sections (or subsections) of a PubChem request returned from get_pug_view. It is useful when one wants to extract a specific section (or subsection) from a PubChem request. It supports pattern-specific searches within sections. See Detail/Note below for more information.
sectionList(object, ...)
## S3 method for class 'PugViewInstance'
sectionList(object, ...)
## S3 method for class 'PugViewSectionList'
sectionList(
  object,
  .pattern = NULL,
  .match_type = c("contain", "match", "start", "end"),
  ...
)
## S3 method for class 'PugViewSection'
sectionList(
  object,
  .pattern = NULL,
  .match_type = c("contain", "match", "start", "end"),
  ...
)
object |
an object of PubChem request, generally returned from get_pug_view. |
... |
other arguments. Currently has no effect on the outputs. Can be ignored. |
.pattern |
a character vector. Each text pattern given here will be searched within Pug View sections by using the pattern matching strategy defined with |
.match_type |
a string. How should search patterns (i.e., |
Pattern matching is used to filter sections that match user-defined patterns. It is useful when there are more sections than can be printed to the R console. In such situations, it may be reasonable to print only the subset of sections that meets the search criteria. The available pattern matching methods are described below:
Partial Matching ("contain", "start", "end"): Returns the section names that contain, start with, or end with the given text patterns.
Exact Matching ("match"): Returns the section names that exactly match the given text patterns.
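The four matching strategies can be mimicked with base R string functions. The `match_headings()` helper below is hypothetical, and its case-insensitive comparison is an assumption of this sketch rather than documented behaviour of 'sectionList()'.

```r
# Hypothetical matcher; case-insensitive comparison is an assumption of this sketch.
match_headings <- function(headings, pattern,
                           match_type = c("contain", "match", "start", "end")) {
  match_type <- match.arg(match_type)
  h <- tolower(headings)
  hit <- rep(FALSE, length(h))
  # A heading is kept if it matches any of the supplied patterns.
  for (p in tolower(pattern)) {
    hit <- hit | switch(match_type,
      contain = grepl(p, h, fixed = TRUE),
      match   = h == p,
      start   = startsWith(h, p),
      end     = endsWith(h, p)
    )
  }
  headings[hit]
}

headings <- c("Structures", "Chemical and Physical Properties",
              "Safety and Hazards", "Chemical Safety")

contain_hits <- match_headings(headings, c("safety", "chemical"), "contain")
exact_hits   <- match_headings(headings, "safety", "match")
start_hits   <- match_headings(headings, "chemical", "start")
end_hits     <- match_headings(headings, "properties", "end")
```

Note that exact matching on "safety" returns nothing here, because no heading consists of that word alone.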
A tibble with section IDs/headings, or NULL when no section
data are available.
pview <- get_pug_view(identifier = "2244", annotation = "data", domain = "compound")
# List all section names
sectionList(pview)
# Pattern-matched section names
sectionList(pview, .pattern = c("safety", "chemical"), .match_type = "contain")
sectionList(pview, .pattern = "safety", .match_type = "match")
sectionList(pview, .pattern = "properties", .match_type = "end")
# Use section IDs to extract section data from Pug View request
section(pview, "S12") # Safety and Hazards
Extracts synonym data from a PubChem request made with the function get_synonyms.
synonyms(object, ...)
## S3 method for class 'PubChemInstance_Synonyms'
synonyms(object, .to.data.frame = TRUE, ...)
object |
An object of class |
... |
Additional arguments passed to other methods. Currently, these have no effect. |
.to.data.frame |
a logical. If TRUE, returned object will be forced to be converted into a data.frame (or tibble). If failed to convert into a data.frame, a list will be returned with a warning. Be careful for complicated lists (i.e., many elements nested within each other) since it may be time consuming to convert such lists into a data frame. |
The method can return either a combined tibble or raw synonym payloads,
depending on .to.data.frame.
A data.frame (or list) object containing the synonym data.
syns <- get_synonyms(identifier = c("aspirin", "caffeine"), namespace = "name")
synonyms(syns)