Stothard Research Group

Proksee

Proksee provides users with a powerful, easy-to-use, and feature-rich system for assembling, annotating, analysing, and visualizing bacterial genomes. Proksee accepts Illumina sequence reads as compressed FASTQ files or pre-assembled contigs in raw, FASTA, or GenBank format. Alternatively, users can supply a GenBank accession or a previously generated Proksee map in JSON format. Proksee then performs assembly (for raw sequence data), generates a graphical map, and provides an interface for customizing the map and launching further analysis jobs. Notable features of Proksee include unique and informative assembly metrics provided via a custom reference database of assemblies; a deeply integrated high-performance genome browser for viewing and comparing analysis results at individual base resolution (developed specifically for Proksee); an ever-growing list of embedded analysis tools whose results can be seamlessly added to the map or searched and explored in other formats; and the option to export graphical maps, analysis results, and log files for data sharing and research reproducibility. All these features are provided via a carefully designed multi-server cloud-based system that can easily scale to meet user demand and that ensures the web server is robust and responsive.

SVDB-DC

SVDB-DC (Structural Variant Database for Dairy Cattle) is a web server containing the results of our large-scale structural variant analysis conducted using publicly available Holstein cattle genome sequences. SVDB-DC contains extensive information on each structural variant discovered including functional impact predictions, feature overlaps, overlapping SNP chip markers, tagging SNPs from SNP chips and whole-genome sequencing, and population statistics. BAM file content for all study samples is included for all SVs and their surrounding regions, allowing the supporting evidence for each SV to be inspected. In this way it is often possible to refine SV boundaries and genotypes. For each SV, SVDB-DC automatically loads data from representative samples from each observed genotype class. Various buttons can be used to adjust how sequence information is displayed, control the number of samples shown, refresh the sample set with a new random selection, request specific samples, and limit the selected samples to a particular sex. Various previously reported and novel SVs can be viewed in SVDB-DC including those that overlap with POPDC3, ORM1, G2E3, FANCI, TFB1M, FOXC2, N4BP2, GSTA3, and RGS2. A non-genic SV associated with horn absence can also be found in SVDB-DC. Visualization helps to reveal processed pseudogenes that lead to multiple adjacent false deletion calls in genes, as in an example involving the COPA gene. SVDB-DC can be used to find well-supported genic SVs, determine SV breakpoints, design genotyping approaches, and identify processed pseudogenes masquerading as deletions.

PHASTEST

PHASTEST (PHAge Search Tool with Enhanced Sequence Translation) is a web server designed to support the rapid identification, annotation and visualization of prophage sequences within bacterial genomes and plasmids. PHASTEST also supports extensive annotation and interactive visualization of all other genes (protein coding regions, tRNA sequences and rRNA sequences) in those same genomes. PHASTEST can now process a typical bacterial genome in 3.2 minutes from the raw sequence alone or in 1.3 minutes when given a pre-annotated GenBank file.

PlasMapper

PlasMapper 3.0 is a web server that allows users to generate, edit, annotate and interactively visualize publication quality plasmid maps. It is the successor to PlasMapper 2.0 and offers many features found only in commercial plasmid mapping/editing packages. PlasMapper 3.0 allows users to paste or upload plasmid sequences as input or to upload existing plasmid maps from its large database of more than 2000 pre-annotated plasmids (PlasMapDB). This database can be searched by plasmid names, sequence features, restriction sites, preferred host organism, and sequence length. PlasMapper 3.0 also supports the annotation of new or never-before-seen plasmids using its own feature database (FeatureDB) that contains common promoters, terminators, regulatory sequences, replication origins, selectable markers and other features found in most cloning vectors and plasmids.

DrugBank

DrugBank is a comprehensive, free-to-access, online database containing information on drugs and drug targets. It combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. DrugBank is widely used by the drug industry, medicinal chemists, pharmacists, physicians, students and the general public. Because of its broad scope, comprehensive referencing, and detailed data descriptions, DrugBank is enabling major advancements across the data-driven medicine industry.

HMDB

The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body. It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education. The database is designed to contain or link three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data.

The Sequence Manipulation Suite

A collection of simple programs for generating, formatting, and analyzing short DNA and protein sequences. The Sequence Manipulation Suite is commonly used by molecular biologists, for teaching purposes, and for program and algorithm testing. Open the Sequence Manipulation Suite at https://paulstothard.github.io/sequence_manipulation_suite/.

Sequence Manipulation Suite 3 (SMS3)

SMS3 provides tools for DNA/RNA and protein sequence manipulation, analysis, visualization, workflows, and interactive viewers. Open SMS3 at https://paulstothard.github.io/sequence-manipulation-suite/.

CGView

A Java package for generating high quality, navigable maps of circular genomes. Its primary purpose is to serve as a component of sequence annotation pipelines. Feature information and rendering options are supplied to the program using an XML file or a tab-delimited file. CGView converts the input into a graphical map (PNG, JPG, or SVG format), complete with labels, a title, and legends. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any web browser, allowing rapid genome browsing, and facilitating data sharing. The feature labels in maps can be hyperlinked to external resources, allowing CGView maps to be integrated with existing web site content or databases.

CGView.js

CGView.js is a circular and linear map viewer for microbial and organellar genomes. Written in JavaScript and based on the original Java-based CGView, CGView.js creates high-quality, interactive maps that can be easily embedded in web pages. A comprehensive API provides actions to manipulate map components and hooks for integration with third-party tools.

CGView Comparison Tool (CCT)

The CGView Comparison Tool (CCT) is a package for visually comparing bacterial, plasmid, chloroplast, and mitochondrial sequences. The comparisons are conducted using BLAST, and the BLAST results are presented in the form of graphical maps that can also show sequence features, gene and protein names, COG category assignments, and sequence composition characteristics. CCT can generate maps in a variety of sizes, including 400 Megapixel maps suitable for posters. Comparisons can be conducted within a particular species or genus, or all available genomes can be used. The entire map creation process, from downloading sequences to redrawing zoomed maps, can be completed easily using scripts included with CCT. User-defined features or analysis results can be included on maps, and maps can be extensively customized.

fast-pptx

Quickly make a PowerPoint presentation from a directory of code snippets, CSV files, TSV files, Graphviz DOT files, Mermaid mmd files, images, PDFs, and URLs. fast-pptx converts the CSV and TSV files to Markdown tables, renders the DOT and mmd files, creates high-resolution images from the PDFs, captures high-resolution screenshots of the websites, and then builds a Markdown presentation file for input to Pandoc. The Markdown file is then converted to PowerPoint presentations using templates that preserve syntax highlighting and make effective use of slide space. You can edit the Markdown to add content and regenerate the presentations using the generated pandoc.sh script, or you can edit the presentations in PowerPoint.
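
To illustrate the CSV/TSV-to-Markdown step, here is a minimal Python sketch of how delimited text can be turned into a Markdown pipe table. It is not fast-pptx's own code; the function name csv_to_markdown and its handling of short rows and pipe characters are assumptions made for this example.

```python
import csv
import io


def csv_to_markdown(text, delimiter=","):
    """Convert delimited text to a Markdown pipe table (first row = header).

    Hypothetical helper for illustration; not part of fast-pptx.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Escape pipes and pad short rows so every line has the same cell count.
    rows = [
        [cell.replace("|", "\\|") for cell in row] + [""] * (width - len(row))
        for row in rows
    ]
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(csv_to_markdown("sample,reads\nA,100\nB,250"))
```

For TSV input the same function can be called with delimiter="\t".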

genotype_conversion_file_builder

The genotype_conversion_file_builder is a pipeline for determining the genomic location and transformation rules for the variants described in Illumina or Affymetrix genotype panel manifest files. Briefly, the pipeline extracts the flanking sequence of each variant from the manifest file, and performs a BLAST search comparing each flanking sequence against a new reference genome of interest. Next, the resulting BLAST alignments are parsed in conjunction with the manifest file, to establish the position of each variant on the reference genome, and to generate simple transformation rules that can be used to convert genotypes between any of the standard formats (AB, TOP, FORWARD, DESIGN) and from any of the standard formats to the forward strand of the reference genome (PLUS). An indication of which allele is observed in the reference genome is also provided. The position information and transformation rules are written to separate files, referred to as "position" and "conversion" files, respectively. An additional "wide" file provides the position and conversion information together in a format that can be easily converted to files used by downstream tools like PLINK. See the output file documentation for detailed descriptions of the output files and sample output. See the conversion example documentation for an example of using a conversion file.
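
As a rough illustration of what a transformation rule does, the Python sketch below applies per-variant allele mappings to convert AB genotypes to PLUS. The marker names, allele assignments, in-memory rule layout, and the convert_genotype function are all invented for this example and do not reflect the pipeline's actual "conversion" file format, which is described in the output file documentation.

```python
# Hypothetical per-variant rules: allele in the source format (AB) mapped to
# the allele on the forward strand of the reference genome (PLUS).
# Marker names and allele assignments are invented for illustration.
AB_TO_PLUS = {
    "marker_1": {"A": "T", "B": "C"},
    "marker_2": {"A": "G", "B": "A"},
}


def convert_genotype(marker, genotype, rules):
    """Translate each allele of a genotype using the marker's rule."""
    rule = rules[marker]
    # Symbols without a rule (e.g. a missing call) are passed through unchanged.
    return "".join(rule.get(allele, allele) for allele in genotype)


print(convert_genotype("marker_1", "AB", AB_TO_PLUS))  # TC
print(convert_genotype("marker_2", "BB", AB_TO_PLUS))  # AA
```

Because each rule is specific to one variant, the same AB genotype can translate to different nucleotides at different markers.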

Genome Artistry

The Genome Artistry tool in Proksee generates high-resolution art posters from bacterial genome comparisons. A reference genome is aligned against a collection of comparison genomes; blocks of similarity are coloured by percent identity and arranged into visual compositions such as linear rows, wrapped rows, concentric rings, or an Archimedean spiral.
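
For readers unfamiliar with the Archimedean spiral layout, the Python sketch below shows one simple way a position along a genome could be mapped onto such a spiral, where the radius grows linearly with the angle. The function spiral_point and its parameters (turns, r_start, spacing) are assumptions for this example; how Genome Artistry actually places blocks is not described here.

```python
import math


def spiral_point(fraction, turns=3.0, r_start=1.0, spacing=1.0):
    """Return (x, y) for a fractional genome position on an Archimedean spiral.

    Hypothetical helper: fraction runs from 0.0 (start) to 1.0 (end), and the
    radius increases by `spacing` with every full turn.
    """
    theta = fraction * turns * 2 * math.pi
    r = r_start + spacing * theta / (2 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)


# Sample a few positions along the genome.
for frac in (0.0, 0.25, 0.5, 1.0):
    x, y = spiral_point(frac)
    print(f"{frac:.2f} -> ({x:.2f}, {y:.2f})")
```

Note that equal steps in angle do not correspond to equal distances along the curve, so a real layout would likely need an arc-length correction to keep blocks proportional to sequence length.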

Genome Loom

The Genome Loom tool in Proksee creates comparative genome ribbon plots that show how a reference genome aligns with multiple comparison genomes across overview, pairwise, and neighbor-chain views. The tool preserves genome order, contig order, and contig orientation while changing which ribbon layer is visible.

helpful_commands

Command-line tools, commands, and code snippets for performing routine data processing and bioinformatics tasks.

helpful_links

Links to resources related to bioinformatics and data analysis.

map-artistry

Artistic topographic and GPX route map generator using OpenStreetMap, satellite imagery, and digital elevation models. Generates high-resolution, stylized region maps and route-overlay maps via a customizable pipeline.

step-by-sample

A Bash-first pattern for per-sample workflow steps that generates one job script per sample, collects them in a run list, and executes that list locally or on Slurm. Each sample writes its own run.log and ends with either .done or .failed in its output folder. The design stays intentionally small with no workflow engine or background controller—just generated scripts plus helper commands for validation, status reporting, and recovery.
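
The marker-file convention makes status reporting straightforward. The tool itself is Bash-based; the Python sketch below only illustrates the idea, assuming each sample has its own subfolder under a common output directory and that .done and .failed are files with exactly those names. The function sample_status and the folder layout are hypothetical.

```python
import tempfile
from pathlib import Path


def sample_status(output_root):
    """Map each sample folder to 'done', 'failed', or 'pending'.

    Hypothetical helper: assumes one subfolder per sample containing a
    .done or .failed marker file once the sample has finished.
    """
    status = {}
    sample_dirs = sorted(p for p in Path(output_root).iterdir() if p.is_dir())
    for sample_dir in sample_dirs:
        if (sample_dir / ".done").exists():
            status[sample_dir.name] = "done"
        elif (sample_dir / ".failed").exists():
            status[sample_dir.name] = "failed"
        else:
            status[sample_dir.name] = "pending"
    return status


# Demonstrate on a throwaway directory with three invented samples.
with tempfile.TemporaryDirectory() as root:
    for name, marker in [("s1", ".done"), ("s2", ".failed"), ("s3", None)]:
        sample_dir = Path(root) / name
        sample_dir.mkdir()
        if marker:
            (sample_dir / marker).touch()
    demo = sample_status(root)

print(demo)
```

Samples reported as failed or pending are the ones a recovery step would need to rerun.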

News

Our large-scale analysis of structural variation in Holstein cattle and associated database have been published in BMC Genomics.

Congratulations to Natalie Diether on receiving the 2023 ALES Graduate Student PhD Thesis Award.

Our Proksee bacterial genome analysis server has been published in Nucleic Acids Research.

We helped solve a mystery involving cattle DNA testing in Ireland.

Our Bison Integrated Genomics (BIG) Project is now underway.

Feed efficiency genetic evaluations from our Genome Canada-funded research with the University of Guelph have been released by Lactanet Canada.

Our paper on a new SNP array for bison has been published in Frontiers in Genetics, and the array is now available to bison producers through Neogen Canada.

Software