1  Refresher and other resources

On this page we link to a few resources covering the prerequisite knowledge for the rest of this course.

1.1 Unix command line and bash scripting

Be sure to check out the various other resources linked on this page.

For a more interactive learning experience, you can try https://sandbox.bio/.

1.2 Whole-genome sequencing pipelines

We have collected a number of related resources below, which have served as an inspiration while preparing our own course. We hope they’ll be useful to you for further learning as well.

The video below by Tobias Rausch of EMBL-EBI (Rausch 2022) also provides an excellent overview of topics that come up in genomic variant calling, although it is broader in scope than AmpliSeq or molecular surveillance of parasites:

1.3 Computational thinking, best practices and reproducibility

  • Track software versions
  • Use “good” file and variable names
  • Keep track of order of analysis and how each file was produced
  • Structure scripts/data in clear directories
  • Avoid manually tweaking things
  • Add readmes and/or comments for complex steps
  • (Track and version your own code with git)
  • (For pipelines and complex scripts: try everything on small test data before starting a long-running analysis)
  • (General bash safety and best practices: no sudo, double-check every rm and mv command, etc.)
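
As a minimal sketch, a few of the points above might look like this in a bash script. The directory layout, file names, and the tiny FASTQ file are made up for illustration and are not part of the course material; in a real analysis you would record the versions of the tools you actually run.

```bash
#!/usr/bin/env bash
# Stop on the first error, on unset variables, and on failures inside pipelines.
set -euo pipefail

# Hypothetical project layout (created in a temporary directory for this demo).
project_dir="$(mktemp -d)"
mkdir -p "$project_dir/data" "$project_dir/results" "$project_dir/logs"

# Track software versions: record the version of every tool the analysis uses.
# Only bash itself is recorded here; other tools would be appended the same way.
echo "bash ${BASH_VERSION}" > "$project_dir/logs/software_versions.txt"

# Made-up input: a FASTQ file with three reads (four lines per read).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n@read3\nTTAA\n+\nIIII\n' \
    > "$project_dir/data/sample.fastq"

# Try everything on small test data first: keep only the first two reads.
n_test_reads=2
head -n "$((n_test_reads * 4))" "$project_dir/data/sample.fastq" \
    > "$project_dir/data/sample.test.fastq"

# Quote variables so that paths containing spaces cannot break a command.
echo "Test reads written to: $project_dir/data/sample.test.fastq"
echo "Versions recorded in:  $project_dir/logs/software_versions.txt"
```

Keeping data, results, and logs in separate directories, and writing the versions to a file rather than only printing them, makes it easier to work out later how each output was produced.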

Further reading: