Pubmetric Documentation
This guide provides instructions for setting up and using the pubmetric package locally, including co-citation graph generation and the calculation of metrics.
Local Setup
Ensure you have Python 3.7+ installed.
Download the repository on GitHub.
Install the required Python packages listed in the setup.py file:
pip install setup.py
For development purposes, include the test dependencies by running:
pip install .[test]
Note
The pubmetric co-citation graph can be visualised using Cytoscape. To use Cytoscape, make sure you have it installed and open on your machine.
Importing the Package:
Import the necessary modules in your Python script, pubmetric relies on asyncio and needs to be executed using it:
import pubmetric
import asyncio
Graph Generation
Generate New Co-citation Graph
Test Size: Create a smaller co-citation graph for testing purposes:
testsize_graph = asyncio.run(pubmetric.network.create_network(test_size=20))
Full Network:
Generate a co-citation network for a specific topic:
full_proteomics_graph = asyncio.run(pubmetric.network.create_network(topic_id="topic_0121"))
full_metabolomics_graph = asyncio.run(pubmetric.network.create_network(topic_id="topic_3172"))
Generate a genomics co-citation graph, using not the topic, but all well-annotated tools currently in bio.tools by specifying topic_id=None and tool_selection=’full’:
full_genomics_graph = asyncio.run(pubmetric.network.create_network(topic_id=None, tool_selection='full'))
Generate a co-citation network for a user-specifified list of tools, using a list of bio.tool tool names:
tool_pmid_list = ["tool_name_1", "tool_name_2", ...]
graph_from_list = asyncio.run(pubmetric.network.create_network(topic_id=None, tool_selection=tool_pmid_list))
Pubmetric outputs three files; doi_pmid_library.json, tool_metadata.json and graph.pkl. These can be used to load the graph, or recreate the graph using the already downloaded metadata.
Load Data:
Load an existing co-citation graph from a specified path:
path_to_data = 'path/to/your/data/directory'
loaded_graph = asyncio.run(pubmetric.network.create_network(inpath=path_to_data, load_graph=True))
Note
If load_graph=True is not specified, the graph will be regenerated using the metadata file in the specified directory.
Graph visualisation Install and import Cytoscape for python (py4cytoscape) to interact with Cytoscape directly in your script:
import py4cytoscape as p4c
When running the code, make sure the Cytoscape software is open:
p4c.create_network_from_igraph(graph, f"Cocitation_graph", collection="Bibliographic networks")
Below, an example of a merged co-citation graph containing tools from metabomics, proteomics and genomics is shown.

Figure: Merged co-citation graph of the three domains genomics (blue, bottom right), metabolomics (pink/purple, left) and proteomics (green, top right). Illustrating shared tools and distributions of edge weights, ages, and degrees. The graph only includes nodes with a degree of 20 or higher, for clarity. Edge transparency and width correspond to the edge weight, with higher weights represented by thicker and more opaque lines. Node size corresponds to degree, with larger nodes indicating higher connectivity. Node border colour corresponds to tool age, where brighter green signifies newer tools and grey represents older ones. The inner colour of the nodes indicates the domain to which the tool belongs.
Metric Calculation
Download Workflow Data:
To calculate the metric score the workflow must be loaded using parse_cwl. CWL files can be downloaded from the Workflomics live demo.
The CWL parser function needs the tool_metadata.json metadata file which is generated alongside the co-citation graph.
There are two main metrics, workflow_average
and complete_average
which take into account only the workflow
edges, or all possible edges between tools in a workflow, respectively:
cwl_file_path = "./path/to/candidate_workflow.cwl"
metadata_file_path = 'path/to/tool_metadata.json'
workflow = pubmetric.workflows.parse_cwl(cwl_file_path, metadata_file_path)
metric_score = pubmetric.metrics.workflow_average(loaded_graph, workflow)
File Schemas
This package requires files and data structures to adhere to specific schemas for proper operation. These are presented below.
The Metadata File:
The metadata file holds the metadata also contained within the graph, with some additional global statistics on the data download. It can be used to regenerate a graph containing the tools in the file. By specifying inpath=”./path/to/directory” containg the metadata file.
{
"creationDate": "string",
"topic": "string",
"totalNrTools": "number",
"biotoolsWOpmid": "number",
"pmidFromDoi": "number",
"tools": [
{
"name": "string",
"doi": "string or null",
"topics": ["string"],
"nrPublications": "number",
"allPublications": ["string"],
"pubDate": "number",
"pmid": "string",
"nrCitations": "number"
}
]
}
Field |
Required |
Description |
||
---|---|---|---|---|
|
No |
Date and time of file creation. |
||
|
No |
The topic for which the graph was generated, or None if the full bio.tools or a selection was used. |
||
|
No |
The total nr of tools in bio.tools (or for the topic, if specified). |
||
|
No |
The number of tools that dont have a Pubmed ID in bio.tools . |
||
|
No |
Number of tool Pubmed IDs downloaded from NCBI using their DOI. |
||
|
Yes |
A list of the tools for which Pubmed IDs were found, and their metadata. |
||
|
Yes |
The tool name as registered in bio.tools. |
||
|
No |
The DOI for the primary publication of the tool. |
||
|
No |
The topics linked to the tool in bio.tools. |
||
|
No |
The number of publications linked to (presenting) the tool. |
||
|
No |
The Pubmed IDs of publications linked to (presenting) the tool. |
||
|
Yes |
The year the primary publication was published |
||
|
Yes |
The Pubmed ID for the primary publication of the tool. |
||
|
No |
The number of citations the primary publication of the tool, downloaded from EuropePMC. |
The Graph:
Each vertex in the co-citation graph represents a software tool in bio.tools. Each edge represents the number of co-citations between a pair of tools. The edges and vertices contain additional metadata which is used in the calculation of the metrics.
Field |
|
Description |
||
---|---|---|---|---|
|
Graph |
Date and time of file creation. |
||
|
Graph |
The topic for which the graph was generated, or None if the full bio.tools or a selection was used. |
||
|
Graph |
A boolean for if the graph was created using a list of selected tools rather than a topic. |
||
|
Graph |
The total time taken for the data download and graph creation. |
||
|
Vertex |
The tool name, as registered in bio.tools, of the tool represented by the vertex. |
||
|
Vertex |
The Pubmed ID for the primary publication of the tool represented by the vertex. |
||
|
Vertex |
Time since publication of the primary publication for the tool represented by the vertex. |
||
|
Vertex |
The number of citations the primary publication of the tool, downloaded from EuropePMC. |
||
|
Vertex |
The number of edges connected to the vertex. |
||
|
Edge |
The number of co-citations between a certain pair of tools. |
||
|
Edge |
The closeness of a pair of tools (inverted number of co-citations between a certain pair of tools.) |
The Workflow Dictionary
The workflow dictionary, which is generated using parse_cwl
, follows the following schema:
{
"edges": [
[
"string",
"string"
]
],
"steps": {
"string": "string"
},
"pmid_edges": [
[
"string",
"string"
]
]
}
Field |
Required |
Description |
||
---|---|---|---|---|
|
Yes |
Tuples of step names, as given by the APE generated CWL file. Allows for repetition of tools through step indexing |
||
|
Yes |
A dictionary which links each step name to its Pubmed ID. |
||
|
Yes |
Tuples of Pubmed IDs. Often extracted by metrics which do not require the worfklow order to be perserved. OBS this does not reflect potential repetition of tools, but treats each occurance of a tool as a single node. |