helpful_links
Links to resources related to bioinformatics and data analysis.
Table of Contents
- helpful_links
- ATAC-Seq
- Audiovisual
- BAM and SAM
- ChIP-Seq
- Containers
- Courses and training
- Data sharing and management
- Data visualization
- Diagrams, flowcharts and maps
- Document conversion and manipulation
- Documentation creation and publishing
- EMBL-EBI
- Genome alignment and comparison
- Genome annotation and sequence characterization
- Genome assembly
- Genome visualization
- GWAS
- Images
- Machine learning
- Metagenomics
- Methyl-Seq
- Multiomics
- NCBI
- Phylogenetics and phylogenomics
- Population genetics and conservation
- Programming
- Raw sequence data processing and QC
- RNA-Seq
- Sequence feature utilities
- Sequence read alignment
- Sequence searching and clustering
- Sequence utilities
- Software development tools
- Statistics
- Synthetic data
- Tabular data and exploratory data analysis
- Utilities
- Variant identification and analysis
- VCF files
- Vim
- Workflow development and workflows
ATAC-Seq
- nf-core/atacseq - ATAC-Seq peak-calling, QC and differential analysis pipeline.
Audiovisual
- ffmpeg - A complete, cross-platform solution to record, convert and stream audio and video.
- ffmprovisr - A tool to help you build complex FFmpeg commands without writing a single line of code.
BAM and SAM
- alimanfoo/pysamstats - Reports simple statistics for genome positions based on sequence alignments from a SAM or BAM file.
- genome/bam-readcount - Generates low-level information about sequencing data at specific nucleotide positions in a BAM or CRAM file.
- pysam-developers/pysam - A Python module for reading and manipulating SAM/BAM/VCF/BCF files.
- samtools/samtools - Tools for manipulating next-generation sequencing data.
- shiquan/bamdst - Generate BAM file statistics.
- sstadick/perbase - Calculate per-base statistics from BAM/CRAM files.
ChIP-Seq
- nf-core/chipseq - ChIP-Seq peak-calling, QC and differential analysis pipeline.
Containers
- BioContainers - A registry of bioinformatics software containers.
- Docker Hub - Find and share Docker container images.
- orbstack/orbstack - Fast, light, simple Docker containers and Linux machines for macOS.
- Quay - Find and share container images.
Courses and training
- Canadian Bioinformatics Workshops - The Canadian Bioinformatics Workshops (CBW), offered through bioinformatics.ca, focus on training in leading technologies and the latest methods used in computational biology to work with genomic and other high-throughput data.
- matloff/fasteR - A thorough and thoughtful introduction to R for non-programmers.
- raivivek/awesome-biology - Learning resources, research papers, tools, and other resources across different fields of biology.
- sib-swiss/training-collection - Bioinformatics training materials.
Data sharing and management
- datalad/datalad - Keep code, data, containers under control with git and git-annex.
- Zenodo - Store research-related data, software, and reports and make them citable using a DOI.
Data visualization
- arvestad/alv - A console-based alignment viewer.
- ChartsCSS/charts.css - Open source CSS framework for data visualization.
- cxli233/FriendsDontLetFriends - Examples of good and bad practices in data visualization.
- cytoscape/cytoscape.js - Graph theory (network) library for visualisation and analysis.
- davidgohel/ggiraph - Create dynamic ggplot graphs with tooltips, hover effects and JavaScript actions.
- dreamRs/esquisse - RStudio add-in to make plots interactively with ggplot2.
- ewels/MultiQC - Aggregate results from bioinformatics analyses across many samples into a single report.
- GenomeVIS USASK - A variety of browser-based visualization tools to support genomics research.
- hms-dbmi/UpSetR - An R implementation of the UpSet set visualization technique.
- jbkunst/highcharter - An R wrapper for the Highcharts JavaScript library and its modules. Highcharts is a flexible and customizable JavaScript charting library with a powerful API.
- krassowski/complex-upset - A library for creating complex UpSet plots with ggplot2 geoms.
- mw201608/SuperExactTest - Statistical testing and visualization of intersections among multiple sets.
- plotly/plotly.py - An interactive, open-source, and browser-based graphing library for Python.
- plotly/plotly.R - An interactive, open-source, and browser-based graphing library for R.
- R CHARTS - Code examples of R graphs made with base R graphics, ggplot2 and other packages.
- reneshbedre/bioinfokit - Bioinformatics data analysis and visualization toolkit.
- rich-iannone/DiagrammeR - Graph and network visualization using tabular data in R.
- seaborn - A Python data visualization library that provides a high-level interface for drawing attractive and informative statistical graphics.
- simonw/datasette - A tool for exploring and publishing data. It helps people take data of any shape or size and publish that as an interactive, explorable website and accompanying API.
- slowkow/ggrepel - Provides geoms for ggplot2 to repel overlapping text labels.
- smin95/smplot2 - An R package for statistical data visualization that complements ggplot2.
- taiyun/corrplot - A visual exploratory tool for correlation matrices.
- thackl/gggenomes - A versatile graphics package for comparative genomics.
- thomasp85/patchwork - Makes it ridiculously simple to combine separate ggplots into the same graphic.
- vega/altair - A simple and consistent API for data visualizations in Python.
- wilkox/gggenes - Draw gene arrow maps in ggplot2.
Diagrams, flowcharts and maps
- jgraph/drawio - A configurable diagramming application.
- k4m454k/MapPosterCreator - Create beautiful maps using OpenStreetMap data.
- mermaid-js/mermaid - Generation of diagrams and flowcharts from text.
- MyOSMatic - A free software web service that allows you to generate maps of cities using OpenStreetMap data.
Document conversion and manipulation
- jgm/pandoc - A library for converting from one markup format to another, and a command-line tool that uses this library.
- pansapiens/standalone_html.py - Convert HTML to a self contained file with inline Base64 encoded PNG images.
- paulstothard/document-builder - Produces nicely formatted PDF and HTML documents from Markdown source files.
- PDFgear - Read, edit, convert, merge, and sign PDF files.
- py-pdf/pypdf - A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
Documentation creation and publishing
- Antora - A modular documentation site generator designed for technical documentation sites with customizable AsciiDoc structure for managing large-scale projects.
- Astro - A modern static site generator allowing the creation of websites using a mix of static and dynamic components, supporting various rendering engines and frameworks.
- Docusaurus - An open-source framework for building static websites, particularly for documentation, with simple configuration, Markdown support, versioning, search, and localization.
- GitBook - An online platform for creating and hosting documentation with version control, team collaboration, and integrations for a collaborative writing environment.
- honkit/honkit - A command-line tool and Node.js library based on GitBook for building online documentation or ebooks using Markdown files.
- Jekyll - A static site generator for creating websites or blogs using plain text files and Markdown syntax with customizable templates.
- JupyterBook - An open-source tool for building interactive, publication-quality books and documentation using Jupyter Notebooks and Markdown, incorporating code, visualizations, and interactive elements.
- Kozea/WeasyPrint - Converts simple HTML pages into visually appealing PDF reports, invoices, and posters.
- mcanouil/awesome-quarto - A curated list of Quarto talks, tools, examples and articles.
- mdBook - A command-line utility that generates books or documentation websites from Markdown files, focusing on simplicity with table of contents, navigation, and search functionality.
- MkDocs Material - A theme for MkDocs with a modern and responsive design, customizable navigation, search, and a clean user interface.
- MkDocs - A Python-based static site generator for creating documentation websites using Markdown and YAML configuration, offering themes and plugins.
- Nextra - A minimalist and customizable Next.js theme for building documentation websites with Markdown and code syntax highlighting.
- paulstothard/fast-pptx - Generate Markdown and PowerPoint slides from a folder of images, code snippets, and other files.
- Quarto - A scientific and technical publishing system integrating Markdown, code, and output into a single format, enabling reproducible reports, books, or websites with multiple language support and rich media content.
- secretGeek/clowncar - A lightweight static site generator converting Markdown files into HTML pages without complex configurations or dependencies.
EMBL-EBI
- Tools & Data Resources - The European Bioinformatics Institute (EMBL-EBI) maintains the world’s most comprehensive range of freely available and up-to-date molecular data resources.
Genome alignment and comparison
- bluenote-1577/skani - Fast, robust ANI and aligned fraction for genomes and contigs.
- evotools/nf-LO - A Nextflow workflow to generate liftOver files for any pair of genomes.
- gamcil/clinker - Gene cluster comparison figure generator.
- lastz/lastz - A pairwise aligner for DNA sequences.
- mauve - A system for efficiently constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion.
- metagenlab/mummer2circos - Circular bacterial genome plots based on BLAST or NUCMER/PROMER alignments.
- moshi4/ANIclustermap - A tool for clustering genomes based on ANI values.
- mummer4/mummer - A versatile and fast alignment tool for DNA and protein sequences that can align mammalian genomes in a few hours.
- ParBLiSS/FastANI - Fast alignment-free computation of whole-genome average nucleotide identity (ANI).
- paulstothard/cct - A package for visually comparing bacterial, plasmid, chloroplast, and mitochondrial sequences.
- schneebergerlab/plotsr - Plot synteny and structural rearrangements between genomes.
- schneebergerlab/syri - Predict and visualize genomic differences between related genomes using whole-genome assemblies.
Genome annotation and sequence characterization
- AdmiralenOla/Scoary - Pan-genome wide association studies.
- fmalmeida/bacannot - Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports.
- fwhelan/coinfinder - A tool for the identification of coincident (associating and dissociating) genes in pangenomes.
- Gaius-Augustus/BRAKER - A pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET and AUGUSTUS in novel eukaryotic genomes.
- gbouras13/pharokka - Fast phage annotation.
- gtonkinhill/panaroo - Producing polished prokaryotic pangenomes.
- jime-sg/deleat - Gene essentiality prediction and deletion design for bacterial genome reduction.
- jotech/gapseq - Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks.
- ncbi/amr - Identifies AMR genes, resistance-associated point mutations, and select other classes of genes using protein annotations and/or assembled nucleotide sequence.
- ncbi/pgap - Annotate bacterial and archaeal genomes (chromosomes and plasmids).
- nextgenusfs/funannotate - Eukaryotic genome annotation pipeline.
- oschwengers/bakta - A tool for the rapid and standardized annotation of bacterial genomes and plasmids from isolates and MAGs.
- replikation/What_the_Phage - A scalable and easy-to-use workflow for phage identification and analysis.
- sanger-pathogens/Roary - Rapid large-scale prokaryote pan-genome analysis.
- soedinglab/metaeuk - A modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs.
- tseemann/mlst - Scan contig files against traditional PubMLST typing schemes.
- tseemann/prokka - Rapid prokaryotic genome annotation.
- WrightonLabCSU/DRAM - Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes.
Genome assembly
- ablab/quast - Evaluates genome/metagenome assemblies by computing various metrics.
- ablab/spades - SPAdes genome assembler.
- adigenova/wengan - An accurate and ultra-fast hybrid genome assembler.
- alekseyzimin/masurca - The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit.
- BUSCO - Evaluate the completeness of a genome assembly and annotation by assessing the presence of a set of single-copy, conserved orthologous genes.
- Ecogenomics/CheckM - Assess the quality of genomes recovered from isolates, single cells, or metagenomes.
- fenderglass/Flye - Fast and accurate de novo assembler for single molecule sequencing reads.
- gbouras13/hybracter - An automated long-read first bacterial genome assembly tool implemented in Snakemake.
- gbouras13/plassembler - Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates.
- Kinggerm/GetOrganelle - A fast and versatile toolkit for accurate assembly of organelle genomes.
- lbcb-sci/racon - Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.
- lbcb-sci/raven - A de novo genome assembler for long uncorrected reads.
- lh3/miniasm - Ultrafast de novo assembly for long noisy reads.
- malonge/RagTag - Tools for fast and flexible genome assembly scaffolding and improvement.
- marbl/canu - A single molecule sequence assembler for genomes large and small.
- marbl/verkko - A hybrid genome assembly pipeline developed for telomere-to-telomere assembly of PacBio HiFi and Oxford Nanopore reads.
- marcelauliano/MitoHiFi - Assemble mitogenomes from PacBio HiFi reads.
- nanoporetech/medaka - A tool to create consensus sequences and variant calls from nanopore sequencing data.
- Nextomics/NextDenovo - Fast and accurate de novo assembler for long reads.
- Nextomics/NextPolish - Fast and accurate polishing of genomes generated by long reads.
- rrwick/Bandage - A tool that allows users to interact with the assembly graphs made by de novo assemblers such as Velvet, SPAdes, and MEGAHIT.
- rrwick/Minipolish - Use Racon to polish a miniasm assembly, while keeping the assembly in graph form.
- rrwick/Polypolish - A short-read polishing tool for long-read assemblies.
- rrwick/Trycycler - A tool for generating consensus long-read assemblies for bacterial genomes.
- rrwick/Unicycler - A hybrid assembly pipeline for bacterial genomes.
- tseemann/shovill - Assemble bacterial isolate genomes from Illumina paired-end reads.
- vgl-hub/gfastats - Generate FASTA file summary statistics and manipulate FASTA files.
- whatshap/whatshap - Read-based phasing of genomic variants.
- xiaochuanle/NECAT - A de novo assembly tool for Nanopore long noisy reads.
Genome visualization
- bernatgel/karyoploteR - An R package to plot arbitrary data along the genome.
- cmdcolin/awesome-genome-visualization - Interesting genome browser or genome-browser-like implementations.
- deeptools/pyGenomeTracks - Reproducible plots for multivariate genomic data sets.
- igvteam/igv-reports - Generate self-contained HTML reports that consist of a table of genomic sites or regions and associated IGV views for each site.
- moshi4/pyGenomeViz - A genome visualization Python package for comparative genomics.
- paulstothard/cgview - Generate high-quality, zoomable maps of circular genomes.
- Proksee - In-depth characterization and visualization of bacterial genomes.
GWAS
- brentp/vcfassoc - Perform genotype-phenotype association tests on a VCF with logistic regression.
- chrchang/plink-ng - A comprehensive update to the PLINK association analysis toolset.
- jianyangqt/gcta - A software package for performing genome-wide association studies and many related analyses.
- MareesAT/GWA_tutorial - A comprehensive tutorial about GWAS and PRS.
- qtltools/qtltools - A tool collection for the discovery of molecular QTLs (e.g. eQTLs) from raw sequence data.
- TASSEL - A software package for assessing diversity, linkage disequilibrium, relatedness, and genotype/phenotype associations.
- xiaolei-lab/rMVP - A memory-efficient, visualization-enhanced, and parallel-accelerated tool for GWAS.
Images
- BioRender - Create professional science figures in minutes.
- faressoft/terminalizer - Record your terminal and generate animated GIF images or share a web player.
- flameshot-org/flameshot - Powerful yet simple to use screenshot software.
- GIMP - A free and open-source raster graphics editor.
- ImageJ - A Java-based image processing program developed at the National Institutes of Health.
- ImageMagick - A free and open-source software suite for displaying, converting, and editing raster and vector image files.
- ImageOptim/gifski - Produce high-quality GIFs from video frames.
- nbedos/termtosvg - Record terminal sessions as SVG animations.
- Photopea - A free online tool for editing raster and vector graphics with support for PSD, AI, and Sketch files.
- Shottr - A fast and feature-rich screenshot tool for macOS.
- sindresorhus/pageres-cli - Capture website screenshots.
Machine learning
- DeepLabCut/DeepLabCut - Markerless pose estimation of user-defined features with deep learning for all animals.
- josephmisiti/awesome-machine-learning - A curated list of awesome machine learning frameworks, libraries and software.
- karpathy/convnetjs - A JavaScript implementation of neural networks.
- keras-team/keras - A deep learning API written in Python that runs on top of JAX, TensorFlow, or PyTorch.
- marrlab/InstantDL - An easy and convenient deep learning pipeline for image segmentation and classification.
- neural networks course - A course by Andrej Karpathy on building neural networks, from scratch, in code.
- pytorch/pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
- scikit-learn/scikit-learn - Machine learning in Python.
- tensorflow/tensorflow - An end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
Metagenomics
- biobakery/biobakery_workflows - A collection of workflows and tasks for executing common microbial community analyses using standardized, validated tools and parameters.
- biobakery/humann - A pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data.
- biobakery/Maaslin2 - A comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta-omics features.
- biobakery/MetaPhlAn - A computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data.
- Ecogenomics/GTDBTk - A toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
- fbreitwieser/pavian - Interactive analysis of metagenomics data.
- metagenome-atlas/atlas - An easy-to-use metagenomic pipeline based on Snakemake.
- MrOlm/drep - Rapid comparison and dereplication of genomes.
- nf-core/mag - Assembly and binning of metagenomes.
Methyl-Seq
- EpiDiverse - A collection of Nextflow pipelines for epigenome analysis.
- kdkorthauer/dmrseq - R package for inference of differentially methylated regions (DMRs) from bisulfite sequencing.
- nf-core/methylseq - Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel.
Multiomics
- bioFAM/MOFA2 - A factor analysis model that provides a general framework for the integration of multiomic data sets in an unsupervised fashion.
- cantinilab/momix-notebook - Evaluation of multiomics joint dimensionality reduction approaches.
- mixOmics - An R package that offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection.
NCBI
- All Resources - The National Center for Biotechnology Information (NCBI) advances science and health by providing access to biomedical and genomic information.
- NCBI-Hackathons/EDirectCookbook - Examples illustrating the use of NCBI's Entrez Direct (EDirect), which provides access to the NCBI's suite of interconnected databases.
- shenwei356/taxonkit - A practical and efficient NCBI Taxonomy toolkit.
Phylogenetics and phylogenomics
- achtman-lab/GrapeTree - A fully interactive tree visualization program that supports facile manipulation of both tree layout and metadata.
- AstrobioMike/GToTree - A user-friendly workflow for phylogenomics.
- biobakery/phylophlan - Precise phylogenetic analysis of microbial isolates and genomes from metagenomes.
- Cibiv/IQ-TREE - Efficient software for phylogenomic inference.
- davidemms/OrthoFinder - Finds orthogroups and orthologs, infers rooted gene trees for all orthogroups and identifies all of the gene duplication events in those gene trees.
- FastTree - Memory- and time-efficient inference of phylogenetic trees from up to a million aligned nucleotide or protein sequences.
- IcyTree - A browser-based phylogenetic tree viewer.
- iTOL - An online tool for the display, annotation, and management of phylogenetic trees.
- MEGA - Conduct statistical analyses of molecular evolution and construct phylogenetic trees.
- stephaneguindon/phyml - A package that uses modern statistical approaches to analyze alignments of nucleotide or amino acid sequences in a phylogenetic framework.
- YuLab-SMU/ggtree - Visualization and annotation of phylogenetic trees.
Population genetics and conservation
- ADMIXTURE - A software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets.
- ANGSD/angsd - Perform population genetic analyses using raw sequencing data or by using genotype likelihoods.
- Cervus - A tool for parentage analysis that supports microsatellite and SNP data.
- COLONY - Estimate full- and half-sib relationships, assign parentage, and infer mating systems (polygamous/monogamous) and reproductive skew in both diploid and haplodiploid species.
- fastStructure - Infer population structure from large SNP genotype data.
- HAPMIX - Infer chromosomal segments of distinct continental ancestry in admixed populations, using dense genetic data.
- lima1/franzpedigree - A fast and flexible parentage inference program for natural populations.
- millanek/Dsuite - Fast calculation of Patterson's D (ABBA-BABA) and the f4-ratio statistics across many populations/species using VCF files and genotype uncertainty.
- NGSadmix - A tool for estimating individual admixture proportions from NGS data that makes use of genotype likelihoods and works well for medium and low coverage NGS data.
- STRUCTURE - Use multi-locus genotype data to investigate population structure.
- thibautjombart/adegenet - An R package for the exploratory analysis of genetic data.
Programming
- Automate the Boring Stuff with Python - Practical programming for total beginners.
- gto76/python-cheatsheet - A comprehensive Python cheat sheet.
- kvz/bash3boilerplate - A collection of boilerplate templates for writing better Bash scripts.
- R for Data Science - Learn how to get your data into R, get it into the most useful structure, transform it, visualize it and model it.
- ralish/bash-script-template - A best-practices template for Bash scripts.
- The Modern JavaScript Tutorial - From the basics to advanced topics with simple, but detailed explanations.
- the-art-of-command-line - Master the command line, in one page.
- awesome-vscode - A curated list of delightful VS Code packages and resources.
- ziishaned/learn-regex - Learn regex the easy way.
Raw sequence data processing and QC
- BBMap - Includes BBMap, a short read aligner, as well as various other bioinformatic tools written in Java.
- gear-genomics/tracy - Basecalling, alignment, assembly and deconvolution of Sanger chromatogram trace files.
- huishenlab/biscuit - Perform alignment, DNA methylation and mutation calling, and allele specific methylation from bisulfite sequencing data.
- kishwarshafin/pepper - A genome inference module based on recurrent neural networks that enables long-read variant calling and nanopore assembly polishing in the PEPPER-Margin-DeepVariant pipeline.
- lh3/seqtk - Toolkit for processing sequences in FASTA/Q formats.
- OpenGene/fastp - An ultra-fast all-in-one FASTQ preprocessor (QC, adapter removal, trimming, filtering, splitting, merging).
- rrwick/Filtlong - Quality filtering tool for long reads.
- s-andrews/FastQC - A quality control application for high throughput sequence data.
- sheinasim/HiFiAdapterFilt - Remove adapter-contaminated PacBio HiFi reads.
- wdecoster/chopper - Filter and trim long reads.
- wdecoster/NanoPlot - Plotting scripts for long read sequencing data.
RNA-Seq
- kevinblighe/EnhancedVolcano - Publication-ready volcano plots with enhanced coloring and labeling.
- nf-core/nanoseq - Nanopore demultiplexing, QC and alignment pipeline.
- nf-core/rnaseq - RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
- pachterlab/kallisto - Near-optimal RNA-Seq quantification.
- STAR-Fusion/STAR-Fusion - Uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads.
- stemangiola/bioc_2020 - A tidy transcriptomics introduction to RNA sequencing analyses.
- suhrig/arriba - Fast and accurate gene fusion detection from RNA-Seq data.
- YosefLab/ImpulseDE2 - A differential expression algorithm for longitudinal count data sets from RNA-Seq, ChIP-Seq, ATAC-Seq and DNase-Seq experiments.
Sequence feature utilities
- agshumate/Liftoff - A tool that accurately maps GFF or GTF annotations between assemblies of the same or closely related species.
- arq5x/bedtools2 - Intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
- bedops/bedops - High-performance genomic feature operations.
- EMBOSS - A free open source software analysis package developed for the needs of the molecular biology and bioinformatics user community.
- gpertea/gffread - GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more.
- NBISweden/AGAT - A suite of tools to handle gene annotations in any GTF/GFF format.
- paulstothard/sms2 - A collection of simple JavaScript programs for generating, formatting, and analyzing short DNA and protein sequences.
Sequence read alignment
- lh3/bwa - Burrows-Wheeler aligner for short-read alignment.
- lh3/minimap2 - A versatile pairwise aligner for genomic and spliced nucleotide sequences.
- philres/ngmlr - A long-read mapper designed to align PacBio or Oxford Nanopore reads (standard and ultra-long) to a reference genome, with a focus on reads that span structural variations.
Sequence searching and clustering
- bbuchfink/diamond - Accelerated BLAST compatible local sequence aligner.
- BLAST+ - A suite of command-line tools to run BLAST searches.
- soedinglab/MMseqs2 - Search and cluster huge protein and nucleotide sequence sets.
Sequence utilities
- ialbert/bio - A collection of command-line utilities for working with sequence records.
- kblin/ncbi-acc-download - Download files from NCBI Entrez by accession.
- kingfisher-download - Easier download/extract of FASTA/Q read data and metadata from the ENA, NCBI, AWS or GCP.
- lindenb/jvarkit - Java utilities for bioinformatics.
- nf-core/fetchngs - Pipeline to fetch metadata and raw FASTQ files from public and private databases.
- pachterlab/ffq - A tool to find sequencing data and metadata from public databases.
- shenwei356/seqkit - A cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
- telatin/seqfu2 - A general-purpose program to manipulate and parse information from FASTA/FASTQ files, supporting gzipped input files.
- tseemann/any2fasta - Convert various sequence formats to FASTA.
Software development tools
- cookiecutter/cookiecutter - A cross-platform command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, C projects.
- heyman/heynote - A dedicated scratchpad for developers.
- JupyterLab - A web-based interactive development environment for notebooks, code, and data.
- nteract/papermill - Parameterize, execute, and analyze Jupyter Notebooks.
- RStudio Desktop - An integrated development environment for R, a programming language for statistical computing and graphics.
- Visual Studio Code - A source-code editor for Windows, Linux and macOS. Features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git.
Statistics
- Design of Experiments - A course on the design of experiments.
- easystats/easystats - A collection of R packages designed to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.
- easystats/report - Automatically produces reports of R models and data frames according to best practices guidelines (e.g., APA’s style), ensuring standardization and quality in results reporting.
- IndrajeetPatil/ggstatsplot - Enhancing ggplot2 plots with statistical analysis.
- jasp-stats/jasp-desktop - A complete statistical package for both Bayesian and frequentist statistical methods, that is easy to use and familiar to users of SPSS.
- kassambara/factoextra - Extract and visualize the results of multivariate data analyses.
- paul-buerkner/brms - An interface to fit Bayesian generalized (non-)linear multivariate multilevel models using Stan.
- paulvanderlaken/ppsr - R implementation of Predictive Power Score.
- scipy/scipy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
- statsmodels/statsmodels - Statistical modeling and econometrics in Python.
- topGO - Semi-automated enrichment analysis for Gene Ontology (GO) terms.
Synthetic data
- DeclareDesign/fabricatr - Quickly create variables that mimic those you plan to collect during the course of observational or experimental work.
- joke2k/faker - Create synthetic data to test software.
- kgoldfeld/simstudy - A collection of functions for generating simulated data sets.
Tabular data and exploratory data analysis
- apache/arrow - A multi-language toolbox for accelerated data interchange and in-memory processing.
- BurntSushi/xsv - A fast CSV command line toolkit written in Rust.
- ClickHouse/ClickHouse - A column-oriented database management system that allows generating analytical data reports in real-time.
- duckdb/duckdb - A high-performance analytical database system that provides a rich SQL dialect.
- harelba/q - Run SQL directly on delimited files and multi-file sqlite databases.
- johnkerl/miller - Like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.
- jqnatividad/qsv - A fork of the popular xsv command-line toolkit for working with CSV files that adds numerous useful features.
- Kanaries/pygwalker - Turn a Pandas DataFrame into a Tableau-style interface for visual analysis.
- lux-org/lux - Recommends a set of visualizations highlighting interesting trends and patterns in Pandas DataFrames.
- man-group/dtale - Visualizer for Pandas DataFrames.
- markfairbanks/tidytable - A data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
- OpenRefine - A powerful tool (previously Google Refine) for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
- pandas-dev/pandas - Flexible and powerful data analysis and manipulation library for Python.
- pola-rs/polars - Fast DataFrame library written in Rust and available for Python, R, and Node.js.
- pstaender/csv2md - Converts CSV data to Markdown tables.
- ropensci/skimr - A frictionless, pipeable approach to dealing with summary statistics.
- saulpw/visidata - A terminal spreadsheet multitool for discovering and arranging data.
- shenwei356/csvtk - A cross-platform, efficient and practical CSV/TSV toolkit in Golang.
- sqlitebrowser/sqlitebrowser - A high-quality, visual, open-source tool to create, design, and edit database files compatible with SQLite.
- TomWright/dasel - Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
- wireservice/csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
- ydataai/ydata-profiling - One-line exploratory data analysis for Pandas and Spark DataFrames.
Utilities
- adrianlopezroche/fdupes - A program for identifying or deleting duplicate files residing within specified directories.
- alacritty/alacritty - A cross-platform, GPU-accelerated terminal emulator.
- AppCleaner - A small macOS application that allows you to thoroughly uninstall unwanted apps.
- axel-download-accelerator/axel - Tries to accelerate the download process by using multiple connections per file, and can also balance the load between different servers. Supports HTTP, HTTPS, FTP and FTPS protocols.
- casey/just - A handy way to save and run project-specific commands.
- ClipGrab - A free downloader and converter for YouTube, Vimeo, Facebook and many other online video sites.
- Clipy/Clipy - Clipboard extension app for macOS.
- divriots/jampack - Optimizes static websites to improve performance and user experience.
- dwarvesf/hidden - An ultra-light macOS utility that helps hide menu bar icons.
- gildas-lormeau/SingleFile - Web extension for saving a faithful copy of a complete web page in a single HTML file.
- go-task/task - A task runner / simpler Make alternative written in Go.
- ibraheemdev/modern-unix - A collection of modern alternatives to common Unix commands.
- joh/when-changed - Execute a command when a file is changed.
- jonschlinkert/markdown-toc - API and CLI for adding a table of contents to a Markdown file.
- newmarcel/KeepingYouAwake - A menu bar utility for macOS that prevents your Mac from going to sleep.
- phiresky/ripgrep-all - Wraps ripgrep and enables it to search more file types.
- rclone/rclone - A command-line program to sync files and directories to and from different cloud storage providers.
- restic/restic - Fast, secure, efficient backup program.
- schollz/croc - Easily and securely send things from one computer to another.
- stevenvachon/broken-link-checker - Find broken links within HTML files.
- storizzi/notes-exporter - Export Apple Notes in Markdown, HTML, or PDF format.
- tcort/markdown-link-check - Check hyperlinks in Markdown text.
- Y2Z/monolith - Command-line tool to save web pages as a single HTML file.
Variant identification and analysis
- ACEnglish/truvari - A toolkit for benchmarking, merging, and annotating structural variants.
- barricklab/breseq - A computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes.
- CRG-CNAG/CalliNGS-NF - GATK RNA-Seq variant calling in Nextflow.
- fritzsedlazeck/Sniffles - A fast structural variant caller for long-read sequence data.
- fritzsedlazeck/SURVIVOR - Toolset for SV simulation, comparison and filtering.
- HKU-BAL/Clair3 - A germline small variant caller for long reads.
- marbl/gingr - A flexible platform for visualizing and compressing alignments and phylogenetic trees.
- marbl/parsnp - A command-line tool for efficient microbial core genome alignment and SNP detection.
- mkirsche/Jasmine - A pipeline for accurately detecting SVs and comparing variant calls across large numbers of individuals.
- nf-core/sarek - Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing.
- PacificBiosciences/pbsv - PacBio structural variant calling and analysis tools.
- PoisonAlien/maftools - Summarize, analyze and visualize MAF files from TCGA or in-house studies.
- ryanlayer/samplot - Plot structural variant signals from many BAMs and CRAMs.
- tjiangHIT/cuteSV - Long-read structural variation detection.
- tseemann/snippy - Rapid haploid variant calling and core genome alignment.
VCF files
- BGI-shenzhen/VCF2Dis - A simple and efficient tool to calculate a p-distance matrix from VCF files.
- brentp/vcfanno - Annotate a VCF with other VCFs/BEDs/tabixed files.
- dnanexus-rnd/GLnexus - Scalable gVCF merging and joint variant calling for population sequencing projects.
- freeseek/score - Includes BCFtools/liftover, a tool to convert genomic coordinates of variants in VCF format across different genome assemblies.
- Illumina/hap.py - A tool for benchmarking variant calls against a gold standard truth set.
- knausb/vcfR - A package to manipulate and visualize VCF data in R.
- pcingola/SnpEff - Genomic variant annotations and functional effect prediction toolbox.
- samtools/bcftools - A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
- vcflib/vcflib - C++ library and command-line tools for parsing and manipulating VCF files.
- vcftools/vcftools - A set of tools written in Perl and C++ for working with VCF files.
- vembrane/vembrane - Filter VCF files using Python expressions.
Vim
- iggredible/Learn-Vim - Vim guide for beginner and advanced users.
Workflow development and workflows
- bcbio/bcbio-nextgen - A Python toolkit and pipelines for fully automated high-throughput sequence data analysis.
- maxplanck-ie/snakepipes - Customizable workflows based on Snakemake and Python for the analysis of NGS data.
- nextflow-io/nextflow - A bioinformatics workflow manager that enables the development of portable and reproducible workflows.
- nf-core/tools - Tools for working with nf-core pipelines.
- nf-core - A community effort to collect a curated set of analysis pipelines built using Nextflow.
- Nukesor/pueue - A command-line task queue for sequential and parallel execution of long-running tasks.
- ploomber/ploomber - A framework to build collaborative and modular pipelines.
- ropensci/targets - A Make-like pipeline tool for statistics and data science in R.
- Snakemake workflow catalog - A comprehensive catalog of standards-compliant, public Snakemake workflows.
- snakemake/snakemake-wrappers - A collection of reusable wrappers for adding popular command-line tools to Snakemake workflows.
- snakemake/snakemake - A tool to create reproducible and scalable data analyses.