Domain Development Guide

This guide explains how to create and maintain a domain within the Workflomics platform. By following this step-by-step guide, you will learn how to configure a new domain, link it to CWL tool descriptions, and validate it using the APE library. Each step is broken down into detailed instructions for ease of use.

Overview of Steps

Clone the Workflomics Repository: Set up your local environment by cloning the necessary repository.
Create a New Domain Folder: Organize your new domain directory with the required configuration files.
Add Tools to Your Domain: Include tools in your domain by reusing existing annotations or creating new ones.
Configure Domain Files: Update the domain configuration files with metadata, inputs, outputs, and constraints.
Validate the Domain with APE Library: Check your domain configuration for errors using the APE library.
Submit and Integrate the Domain: Submit your domain to the Workflomics repository and update the database.

Now, let’s go through each of these steps in detail.

Step 1: Clone the Workflomics Repository

To start, fork the Workflomics/tools-and-domains repository on GitHub (or plan to work on a new branch of it) and clone the repository to your local machine. If you forked it, use your fork's URL in place of the one below:

git clone https://github.com/Workflomics/tools-and-domains.git

Navigate to the cloned directory:

cd tools-and-domains

This repository contains templates and configuration files that will serve as a starting point for creating your new domain. In addition, the latest EDAM version (used to describe domain terminology, data types, formats, operations) and all the tools and domains hosted on the Workflomics platform are stored in this repository.

Step 2: Create a New Domain Folder

Create a new domain folder within the domains directory. This folder will store all configurations and tool annotations for your domain. You can copy the structure of the template-domain folder:

mkdir domains/my-domain
cp -r domains/template-domain/* domains/my-domain/

Your domain folder should have the following structure (the specified file names are recommended but can be customized):

my-domain/
├── tools.json
├── config.json
└── constraints.json

Explanation of Files:

`config.json`: Defines domain-specific metadata and configuration. This includes paths, input and output types, and tool references.
`tools.json`: Contains bio.tools annotations for tools in your domain. You will populate this file with the list of tools used in the domain.
`constraints.json`: Specifies constraints for tools and workflows in your domain. These constraints can be included or linked in config.json.
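As a quick sanity check of the layout described above, the following minimal sketch lists which of the recommended files are missing from a domain folder. The function name missing_domain_files is hypothetical, and the file names are only the recommended defaults; adjust them if you customized yours.

```python
from pathlib import Path

# File names recommended by this guide; they can be customized per domain.
RECOMMENDED_FILES = ("tools.json", "config.json", "constraints.json")


def missing_domain_files(domain_dir, expected=RECOMMENDED_FILES):
    """Return the expected file names that are absent from domain_dir."""
    root = Path(domain_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Example call against the folder created in Step 2.
    print(missing_domain_files("domains/my-domain"))
```

An empty list means all three files are present.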

Step 3: Add Tools to Your Domain

This section explains how to add tools to your domain, i.e., how to update the tools.json file with the bio.tools annotations and references to CWL descriptions for each tool. These CWL descriptions specify how the tool should be executed and are stored in the tools-and-domains/cwl-tools directory. To add a tool, follow the process below to either reuse an existing annotation from the cwl-tools directory or create a new annotation if the tool is not yet included.

Check for Existing Tools in `cwl-tools`:

Begin by searching for the tool on bio.tools and note down its biotoolsID. For example, the biotoolsID for the tool Comet is comet, while for CoMet it is comet-universe.

Note

How to obtain `biotoolsIDs`

The biotoolsID for each tool can be obtained from bio.tools, where it is visible in the URL of the tool page. Alternatively, you can use the bio.tools REST API to fetch the biotoolsID for a tool. For example, see the API call for the comet entry.
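The lookup described in this note could be scripted roughly as follows. This is a sketch under assumptions: it assumes the bio.tools REST API serves a single entry at https://bio.tools/api/tool/<biotoolsID>?format=json and that the returned JSON carries a biotoolsID field; check both against the bio.tools API documentation. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the bio.tools single-entry endpoint.
BIOTOOLS_API = "https://bio.tools/api/tool"


def biotools_entry_url(biotools_id):
    """Build the (assumed) API URL for one bio.tools entry."""
    return f"{BIOTOOLS_API}/{quote(biotools_id, safe='')}?format=json"


def fetch_biotools_id(biotools_id, timeout=10):
    """Fetch the entry and return the biotoolsID it reports (needs network)."""
    with urlopen(biotools_entry_url(biotools_id), timeout=timeout) as response:
        entry = json.load(response)
    return entry.get("biotoolsID")
```

Calling fetch_biotools_id("comet") requires network access; biotools_entry_url("comet") only builds the URL and can be pasted into a browser.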

Use the biotoolsID to check if the tool is already annotated in the cwl-tools directory, where each tool is stored in a folder named after its biotoolsID. For example, the Comet tool is annotated in the cwl-tools/comet directory.

If the tool exists, copy the content of cwl-tools/<biotoolsID>/<biotoolsID>.json and paste it into the tools.json file in your domain directory. This way, you can add tools without modifying or creating any CWL descriptions.

For example, here is the full annotation for the Comet tool:

{
  "outputs": [
    {
      "format_1915": ["http://edamontology.org/format_3655"],
      "data_0006": ["http://edamontology.org/data_0945"]
    },
    {
      "format_1915": ["http://edamontology.org/format_3247"],
      "data_0006": ["http://edamontology.org/data_0945"]
    },
    {
      "format_1915": ["http://edamontology.org/format_3475"],
      "data_0006": ["http://edamontology.org/data_0945"]
    }
  ],
  "inputs": [
    {
      "format_1915": [
        "http://edamontology.org/format_3244",
        "http://edamontology.org/format_3654",
        "http://edamontology.org/format_3651"
      ],
      "data_0006": ["http://edamontology.org/data_0943"]
    },
    {
      "format_1915": ["http://edamontology.org/format_1929"],
      "data_0006": ["http://edamontology.org/data_2976"]
    }
  ],
  "taxonomyOperations": ["http://edamontology.org/operation_3646"],
  "implementation": {
    "cwl_reference": "https://raw.githubusercontent.com/Workflomics/tools-and-domains/refs/heads/main/cwl-tools/comet/comet.cwl"
  },
  "biotoolsID": "comet",
  "label": "Comet",
  "id": "Comet"
}
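Copying an annotation like the one above into tools.json can also be scripted. The sketch below rests on two assumptions that you should verify against the template domain: that tools.json is a JSON object holding its annotations in a list under the key "functions", and that the per-tool JSON file contains a single annotation object. The function name add_tool_annotation and the list_key parameter are hypothetical.

```python
import json
from pathlib import Path


def add_tool_annotation(tools_path, annotation_path, list_key="functions"):
    """Append one tool annotation to the domain's tools file.

    Returns False (and leaves the file untouched) if an entry with the
    same biotoolsID is already present. The "functions" key is an
    assumption about the tools.json layout.
    """
    tools_file = Path(tools_path)
    tools = json.loads(tools_file.read_text(encoding="utf-8"))
    annotation = json.loads(Path(annotation_path).read_text(encoding="utf-8"))

    entries = tools.setdefault(list_key, [])
    if any(e.get("biotoolsID") == annotation.get("biotoolsID") for e in entries):
        return False

    entries.append(annotation)
    tools_file.write_text(json.dumps(tools, indent=2) + "\n", encoding="utf-8")
    return True
```

For example, add_tool_annotation("domains/my-domain/tools.json", "cwl-tools/comet/comet.json") would add Comet once and do nothing on a second call.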

Double-check that the cwl_reference field is correct and points to the appropriate CWL file in the repository. The cwl_reference should be accessible and point to the raw file URL of the CWL description for this tool in the cwl-tools directory.
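A minimal sketch of that double-check is shown below. It only inspects the shape of the URL and does not download the file. It assumes the convention seen in the Comet example, where the CWL file lives at cwl-tools/<biotoolsID>/<biotoolsID>.cwl and is served from raw.githubusercontent.com; the function name check_cwl_reference is hypothetical.

```python
from urllib.parse import urlparse


def check_cwl_reference(annotation):
    """Return a list of problems found in a tool annotation's cwl_reference."""
    reference = annotation.get("implementation", {}).get("cwl_reference")
    if not reference:
        return ["missing implementation.cwl_reference"]

    problems = []
    parsed = urlparse(reference)
    if parsed.scheme != "https" or parsed.netloc != "raw.githubusercontent.com":
        problems.append("not a raw.githubusercontent.com HTTPS URL")

    # Assumed layout: cwl-tools/<biotoolsID>/<biotoolsID>.cwl
    tool_id = annotation.get("biotoolsID", "")
    expected_suffix = f"/cwl-tools/{tool_id}/{tool_id}.cwl"
    if not parsed.path.endswith(expected_suffix):
        problems.append(f"path does not end with {expected_suffix}")
    return problems
```

An empty list means the reference looks plausible; opening the URL in a browser remains the definitive accessibility check.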

Adding New Tools from `bio.tools` Not Present in `cwl-tools`:

If the tool is not already annotated in the cwl-tools directory, follow the instructions in the Adding a new tool page, which explains how to create new CWL files and annotations for the tool.

Once you have added the new tool to cwl-tools, opened a pull request, and had the changes merged into the main branch, update your domain’s tools.json file using the same process as above, linking to the new CWL file in the cwl_reference field.

For additional guidance on how to create new CWL files and annotations, refer to the TESS CWL user guide.

Step 4: Configure Domain Files

Edit config.json

The config.json file contains most of the bioinformatics domain-specific metadata and configuration (e.g., path to the latest EDAM ontology, EDAM identifiers for root terminology - data format, data type, operation, etc.). You should update the config.json file with your domain’s specific configurations:

Update paths for tool_annotations_path and constraints_path to point to the correct files in your domain folder (paths can be local while you are testing the domain, but when making a PR the paths should point to the expected “raw” path on main, as used in the template).
Define inputs and outputs for the domain to reflect a demo example of the expected inputs and outputs for the tools in your domain. The terminology used adheres to EDAM classes and URIs (always use the latest EDAM version). As an example, the config.json provided in the template folder should contain input fields in the following format:

{
  "inputs": [
    {
      "data_0006": ["data_0943"],
      "format_1915": ["format_3244"]
    },
    {
      "data_0006": ["data_2976"],
      "format_1915": ["format_1929", "format_3654"]
    }
  ]
}

This specifies that the workflow accepts two distinct inputs. The first must have data type (data_0006) Mass spectrum (data_0943) and data format (format_1915) mzML (format_3244). The second must have data type (data_0006) Protein sequence (data_2976), and its data format (format_1915) may be either of two allowed formats: FASTA (format_1929) or mzXML (format_3654). Define the output fields in the same way, following the same semantics; the only difference is that the inputs field is replaced with outputs.
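To catch malformed entries early, a small check like the one below can be run over the inputs or outputs list. It is a sketch that assumes every entry should declare both dimensions, data_0006 and format_1915, as non-empty lists, which matches the template example but may be stricter than APE itself requires. The function name check_io_entries is hypothetical.

```python
# Dimensions used in the template example: EDAM data type and data format roots.
REQUIRED_DIMENSIONS = ("data_0006", "format_1915")


def check_io_entries(entries):
    """Return a list of problems found in an inputs/outputs list."""
    problems = []
    for index, entry in enumerate(entries):
        for dimension in REQUIRED_DIMENSIONS:
            terms = entry.get(dimension)
            if not isinstance(terms, list) or not terms:
                problems.append(
                    f"entry {index}: {dimension} must be a non-empty list"
                )
    return problems


if __name__ == "__main__":
    template_inputs = [
        {"data_0006": ["data_0943"], "format_1915": ["format_3244"]},
        {"data_0006": ["data_2976"], "format_1915": ["format_1929", "format_3654"]},
    ]
    print(check_io_entries(template_inputs))
```

The same function applies unchanged to the outputs list, since both fields share the same structure.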

For a full list of configurable options, see the configuration documentation.

Edit tools.json

The tools.json file holds the bio.tools annotations for all tools in your domain. At this stage, you should have updated this file with the correct tool annotations and CWL references for each tool. If, however, you want to generate the domain from scratch (without using the existing CWL files and provided JSON annotations), you can use the APE CLI to generate the tools.json file from a list of bio.tools IDs:

java -jar APE-2.4.0-executable.jar convert-tools ./toolIDsList.json

Refer to the APE CLI documentation for more details on generating tool annotations.

Edit constraints.json

Modify the constraints.json file to include domain-specific constraints such as tool dependencies, data types, and workflow limitations. This file can be referenced in config.json, as currently done in the template, or included directly in the config.json file under the constraints field.

For more details on constraint formatting, see the constraints documentation.

Step 5: Validate the Domain with APE Library

After configuring the domain, validate the domain files using the APE library to check for errors:

java -jar APE-2.4.0-executable.jar synthesis ./domains/my-domain/config.json

This command will validate your config.json and related files, ensuring that all inputs, outputs, and constraints are correctly defined. In addition, the command will generate workflows that fit the configuration specified (inputs, outputs, constraints) and check for any errors or inconsistencies. Make sure that this configuration produces at least one valid workflow, as it will be used as a demo example for the domain on the Workflomics platform.
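Before running the synthesis command, it can help to confirm that the path fields in config.json resolve. The sketch below classifies tool_annotations_path and constraints_path as remote URLs or local files. It is an assumption that local paths may be relative to either the working directory or the folder containing config.json, so both are tried; the function name classify_config_paths is hypothetical.

```python
import json
from pathlib import Path

# Path fields named in Step 4 of this guide.
PATH_FIELDS = ("tool_annotations_path", "constraints_path")


def classify_config_paths(config_path):
    """Map each path field to 'unset', 'remote', 'local-ok' or 'local-missing'."""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    report = {}
    for field in PATH_FIELDS:
        value = config.get(field)
        if not isinstance(value, str) or not value:
            report[field] = "unset"
        elif value.startswith(("http://", "https://")):
            report[field] = "remote"
        elif Path(value).is_file() or (config_file.parent / value).is_file():
            report[field] = "local-ok"
        else:
            report[field] = "local-missing"
    return report
```

While testing locally you would expect 'local-ok' entries; before opening a pull request, both fields should report 'remote' and point to the raw path on main. An 'unset' constraints_path is fine if the constraints are included directly in config.json.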

Step 6: Submit and Integrate the Domain

If the validation is successful, create a pull request to merge your changes into the Workflomics repository. The pull request should be reviewed and approved by the Workflomics development team.

Once the pull request is merged:

Create an issue in the Workflomics repository to request the addition of your domain to the database.
Include the domain name, a brief description, and the link to your domain’s config.json file.
Update the database using the SQL script that contains the new domain information.

The Workflomics development team will finalize the integration and update the Workflomics platform to include your domain.

Configure CWL Files

CWL files for the tools in your domain should be added to the cwl-tools directory and annotated according to bio.tools standards. Ensure each tool has its own CWL file in a folder named after its biotoolsID, such as cwl-tools/comet/comet.cwl.

Once the CWL files are added, update tools.json to include the correct cwl_reference links.

For more information on creating and formatting CWL files, refer to the Elixir Training Platform.

Configure Workflomics

To integrate a new domain into the Workflomics platform, ensure the domain configuration is included in the public.domain table of the Postgres database. This can be done using the SQL script provided in the repository:

INSERT INTO public.domain (name, description, config_path) VALUES ('my-domain', 'A new bioinformatics domain', 'domains/my-domain/config.json');

After updating the database, restart the Workflomics server to reflect the new domain changes.

If you have any questions or need assistance, please contact the Workflomics development team.