Using Séance

All Séance commands are run through the ‘seance’ script that you should already have installed (see installation guide).

Contents

Getting help
Downloading example dataset
Data preprocessing
Preprocessing summary
OTU clustering (454 data)
OTU clustering (non 454 data)
Cluster labelling
Phylogenetic placement of cluster centroids
Inspection with Wasabi
Heatmap visualisations

 
Back to Séance home

Getting help

To see details about all seance subcommands and their options type:

 

seance --help

 

This also works for individual commands, for example we can see all the options to the ‘preprocess’ command by typing:

 

seance preprocess --help

 

Downloading example dataset

For this example we are going to analyse four soil samples from a survey of nematode diversity in temperate rainforests using the 18S rRNA marker gene (see Porazinska et al. 2010 and accession SRX160525 from the short read archive). First we download the .sra archive. dump everything into a FASTQ file and demultiplex the samples. Demultiplexing is performed with a custom script written using BioPython.

 

wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR522/SRR522901/SRR522901.sra
fastq-dump SRR522901.sra
wget http://wasabiapp.org/download/seance/tutorial_files/porazinska_demultiplex.py
python porazinska_demultiplex.py

 

FASTQ files are placed in a directory called ‘soil_data’, one sample per file:

 

grep -c "^+$" soil_data/*fastq
soil_data/SRR522901_soil1.fastq:6286
soil_data/SRR522901_soil2.fastq:4697
soil_data/SRR522901_soil3.fastq:16319
soil_data/SRR522901_soil4.fastq:16170

 

(Alternatively, just download the data from here).

Data preprocessing

Séance requires that there be one sample per file. Unfortunately we do not have the original SFF files so we cannot use the flowgrams to perform denoising, but we can be quite stringent with quality control instead:

 

seance preprocess --outdir soil_out --forwardprimer TACAAAGGGCAGGGACGTAAT --clipprimers --primererrors 2 --midlength 15 --miderrors 1 --length 250 --qualmethod window --quality 35 --windowlength 50 --chimeras soil_data/*.fastq

 

This command will preprocess all FASTQ files found in the ‘data’ directory. Séance keeps all of it’s temporary files in a directory specified by the user with the –outdir flag. In this case the directory is called ‘soil_out’. This directory must be stated in all subsequent commands.
 
The rest of the command states what the forward primer is (–forwardprimer), that the primer sequence is to be removed (–clipprimers) and that we will allow two errors in the primer sequence or else discard the read (–primererrors). The length of the MID (–midlength) is set to 15, the MIDs used in this dataset are actually 11 basepairs, but the four base library key sequence (TCAG) has not been clipped. The MID can contain one error (–miderrors), sequences are truncated to 250 basepairs or else filtered out (–length). Quality filtering is performed in windowed mode (–qualmethod), where the window is 50 basepairs (–window) and must have an average quality of 35 (–quality). Chimeras checking is performed with UCHIME (–chimeras).

Preprocessing summary

We can use the summary command to see statistics about our preprocessing:

 

seance summary --outdir soil_out

 

Which in this case outputs:

 

       Filename    Length(250)    MID(1)    Windowed(50)AvgQuality(35)    Primer(2)    Chimera    Homopolymer(8)    Ambiguous(N)    Accepted    Unique

SRR522901_soil1            787       173                          2696            9        261                 0              41        2319       814
SRR522901_soil2              5       103                          2220            5         55                 0              28        2281       700
SRR522901_soil3             38      6915                          4567           25         69                 1              88        4616      1371
SRR522901_soil4              3      5223                          5849           17        155                 0             100        4823      1491

         Totals            833     12414                         15332           56        540                 1             257       14039      4376

 

Most of the sequences that were filtered out were due to not meeting the quality threshold (all 50 basepair windows must have an average quality of 35 or greater) or having two many errors in the barcode (MID).

OTU clustering (454 data)

With our data preprocessed we can move on to clustering the sequences into OTUs. By default Séance uses Pagan’s modelling of homopolymer length uncertainty on sequences from platforms known to produce such errors (e.g. Roche/454). This modelling reduces alignment errors around low-complexity regions separated by short spacers and affects the calculation of sequence similarity.

Séance provides several options to control clustering. In this example we state the similarity threshold that defines whether a sequence should be clustered with a centroid sequence (–threshold) and the minimum copy number for a sequence to be included in clustering (–duplicates), ‘–duplicates 2’ effectively removes all singleton sequences.

 

seance cluster --outdir soil_out --similarity 0.99 --duplicates 2

 

The cluster subcommand generates two files: soil_out/seance.cluster.fasta and soil_out/seance.cluster.biom, containing the cluster centroid sequences and abundance data, respectively. In this example the four soil samples were found to contain 223 OTUs in total.

 

Whilst all the count data is stored in the BIOM file format, the count data can also be accessed using the showcounts commands. By default the output is tab separated, but the user can specify the delimiter character (–delimiter) to generate a comma separated value file, for example.

 

seance showcounts --outdir soil_out > tab_separated.txt
seance showcounts --outdir soil_out --delimiter ',' > comma_separated.txt

 

OTU clustering (non 454 data)

For non 454 data (i.e. for which we do not expect homopolymer errors), use the –nohomopolymer flag:

 

seance cluster --outdir soil_out --similarity 0.99 --duplicates 2 --nohomopolymer

 

Cluster labelling

Séance implements a simple method for getting cluster labels by finding the lowest common ancestor in the NCBI taxonomy of all top scoring BLAST hits. Using this approach we find that 77 out of 223 OTUs BLAST to species in the phylum Nematoda:

 

seance label --outdir soil_out --labels taxonomy --cutoff 0.95

 

Sometimes labeling can fail due to the NCBI servers being overloaded and requests timeout or otherwise fail. Instead of rerunning the entire label command we can just tell it to get the missing labels:

 

seance label --outdir soil_out --labels taxonomy --cutoff 0.95 --missing

 

Or a custom database used (here I have used the SILVA database):

 

seance label --outdir soil_out --labels blastlocal --cutoff 0.95 --dbfile silva_115_db.fasta

 

Labels can be inspected with the showlabels command:

 

seance showlabels --outdir soil_out

 

Phylogenetic placement of cluster centroids

Séance supports reference-based phylogenetic analysis using Pagan’s phylogenetic placement capabilities ensuring an accurate phylogenetic tree is created even with short amplicon sequences. To perform phylogenetic placement you require a premade multiple sequence alignment (–refalignment) and phylogenetic tree (–reftree), both generated using the complete marker gene. In our case we based the reference on the SILVA alignments (release 115) and inferred a tree using RaXML.

 

Download the example reference tree and alignment:

 

curl -O http://wasabiapp.org/download/seance/18S_nematode_reference/SSU_reference_l5_Silva.fasta
curl -O http://wasabiapp.org/download/seance/18S_nematode_reference/SSU_reference_l5_Silva.tree

 

Run the phylogenetic placement:

 

seance phylogeny --outdir soil_out --refalignment SSU_reference_l5_Silva.fasta --reftree SSU_reference_l5_Silva.tree --subset nematoda

 

In this example we are only interested in nematodes, so we use the information from the labels to only place the sequences that BLASTed to nematode species (–subset nematoda).

Inspection with Wasabi

Séance has integrated support for viewing phylogeny and multiple sequence alignments with Wasabi (replace XXX with your username):

 

seance wasabi --outdir soil_out --user XXX

 

You can see the result by looking at this shared link from Wasabi: http://wasabi2.biocenter.helsinki.fi:8000?share=v04M8s

Heatmap visualisations

For publication Séance can create heatmaps integrating the phylogenetic tree:

 

seance heatmap --outdir soil_out

 

and without the integrated tree:

 

seance heatmap --outdir soil_out --notree

 

There are many options for generating heatmaps, in this final example we do not print the part of the label before ‘Nematoda’ and ladderise the tree:

 

seance heatmap --outdir soil_out --labelclip nematoda --ladderise

 

was@bi