Igniting full-length isoform and mutation analysis of single-cell RNA-seq data with FLAMES

Abstract

In this vignette, we will walk through the FLAMES single-cell pipeline, from de-multiplexing raw FASTQ files to identifying and quantifying transcripts. We then perform analysis including single nucleotide polymorphism (SNP), differential transcript expression analysis and differential transcript usage analysis. We will use the cell line mixture data from Tian et al. (Tian et al., 2021-11-11).

Introduction

While the popular single-cell kits allows profiling the transcriptome of up to 10,000 cells (What is the maximum number of cells that can be profiled?) at once, the traditional short-read sequencing (100-200 nts) limits our ability to unambiguously profile isoform expression and mutations. To overcome this issue, we can combine long-read sequencing with existing single-cell kits.

Though long-read sequencing can provide read lengths several orders of magnitude larger than the typical short-read sequencing, it comes at much steeper cost and lower throughput, to accommodate this, we profile a smaller sub-sample of cells in our FLT-seq protocol(Tian et al., 2021-11-11):

There are also many other protocols that combine single-cell with long-read sequencing, including scRaCH-seq which profiles gene panel (Peng et al., 2024-01-30), LR-Split-seq which uses combinatorial barcoding and does not require microfluidic equipments (Rebboah et al., 2021-10-07).

Set-up

if (!file.exists("~/filtered_sorted.bam") | !file.exists("~/filtered_sorted.bam.bai")) {
  options(timeout = max(600, getOption("timeout")))
  download.file("https://zenodo.org/records/12751214/files/filtered_sorted.bam?download=1", "filtered_sorted.bam", mode = 'wb')
  download.file("https://zenodo.org/records/12751214/files/filtered_sorted.bam.bai?download=1", "filtered_sorted.bam.bai", mode = 'wb')
  bam <- "filtered_sorted.bam"
} else {
  bam <- path.expand("~/filtered_sorted.bam")
}

if (!file.exists("~/subset_GRCh38.fa")) {
  download.file("https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_47/GRCh38.primary_assembly.genome.fa.gz", "GRCh38.primary_assembly.genome.fa.gz", mode = 'wb')
  genome <- Biostrings::readDNAStringSet("GRCh38.primary_assembly.genome.fa.gz")
  names(genome) <- sapply(names(genome), function(x) strsplit(x, " ")[[1]][1])
  chr19 <- genome[genome$names == "chr19"]
  Biostrings::writeXStringSet(chr19, "~/subset_GRCh38.fa")
}
fa <- path.expand("~/subset_GRCh38.fa")

if (!file.exists("~/sce_lib10.qs") | !file.exists("~/sce_lib90.qs")) {
  download.file("https://zenodo.org/records/12770737/files/sce_lib10.qs?download=1", "sce_lib10.qs", mode = 'wb')
  download.file("https://zenodo.org/records/12770737/files/sce_lib90.qs?download=1", "sce_lib90.qs", mode = 'wb')
  sce_lib10 <- qs::qread("sce_lib10.qs")
  sce_lib90 <- qs::qread("sce_lib90.qs")
} else {
  sce_lib10 <- qs::qread("~/sce_lib10.qs")
  sce_lib90 <- qs::qread("~/sce_lib90.qs")
}

library(FLAMES)
library(SingleCellExperiment)
library(scater)
library(tidyverse)
library(ggplot2)
library(edgeR)
library(limma)

`FLAMES` pipeline

Read pre-processing

The FLAMES pipeline starts with de-multiplexing the FASTQ reads, i.e. identifiying the cell barcode. For protocols that use the 10x single-cell kits, the raw reads will look like the following, where the 10x adapter sequences and polyT tails are highlighted. Screenshot from SeqSizzle
Chromium Single Cell 3’ Reagent Kits User Guide

FLAMES will locate the cell barcode by locating the adjacent adapter sequences , and match the barcode against an allow-list (if it is provided). We can specify the above 10x 3’ protocol as follows:

protocol_pattern <- c(
  primer = "CTACACGACGCTCTTCCGATCT",
  BC = "NNNNNNNNNNNNNNNN",
  UMI = "NNNNNNNNNNNN", 
  polyT = "TTTTTTTTT"
)

Note that the order has to be the same as they appear in the protocol. The names can be arbitrary for each sequence, but the names UMI and BC are required .

Let’s try de-multiplexing a couple reads with FLAMES::find_barcode:

outdir <- tempfile()
dir.create(outdir)

# prepare a barcode allow-list
bc_allow <- file.path(outdir, "bc_allow.tsv")
R.utils::gunzip(
  filename = system.file("extdata/bc_allow.tsv.gz", package = "FLAMES"), 
  destname = bc_allow, remove = FALSE
)

find_barcode(
  # raw fastq file to be de-multiplexed
  fastq = system.file("extdata/fastq", package = "FLAMES"),
  # outputs
  stats_out = file.path(outdir, "bc_stat"),
  reads_out = file.path(outdir, "demultiplexed.fq"),
  barcodes_file = bc_allow,
  pattern = protocol_pattern, # our protocol pattern
  threads = 1
)

## FLEXIPLEX 0.96.2
## Setting max barcode edit distance to 2
## Setting max flanking sequence edit distance to 8
## Setting read IDs to be  replaced
## Setting number of threads to 1
## Search pattern: 
## primer: CTACACGACGCTCTTCCGATCT
## BC: NNNNNNNNNNNNNNNN
## UMI: NNNNNNNNNNNN
## polyT: TTTTTTTTT
## Setting known barcodes from /tmp/RtmphCOmSV/filebf7586c9898/bc_allow.tsv
## Number of known barcodes: 143
## Processing file: /usr/local/lib/R/site-library/FLAMES/extdata/fastq/demultiplexed.fq.gz
## Searching for barcodes...
## Processing file: /usr/local/lib/R/site-library/FLAMES/extdata/fastq/musc_rps24.fastq.gz
## Searching for barcodes...
## Number of reads processed: 403
## Number of reads where at least one barcode was found: 368
## Number of reads with exactly one barcode match: 364
## Number of chimera reads: 1
## All done!
## Skipping TSO trimming...

## $fastq
## $fastq$reads_tb
## # A tibble: 372 × 8
##    Read            CellBarcode FlankEditDist BarcodeEditDist UMI   TooShort Outfile Sample
##    <chr>           <chr>               <int>           <int> <chr> <lgl>    <chr>   <chr> 
##  1 50409b8e-c5c8-… CGGGTCAGTA…             1               0 CCGG… FALSE    /tmp/R… fastq 
##  2 d54ed65d-dad1-… CGGACTGAGT…             1               0 GGGA… FALSE    /tmp/R… fastq 
##  3 94bfd5ea-64df-… CTGTGCTTCC…             1               2 TTCT… FALSE    /tmp/R… fastq 
##  4 d6443706-7a0b-… GTCGTAATCC…             7               0 GGTA… FALSE    /tmp/R… fastq 
##  5 055dd650-c268-… TTTATGCAGA…             1               0 ATTC… FALSE    /tmp/R… fastq 
##  6 66cbd950-e2a9-… GTAACTGAGA…             2               0 TGAG… FALSE    /tmp/R… fastq 
##  7 e427b9da-019d-… CGAACATGTC…             2               0 ATAC… FALSE    /tmp/R… fastq 
##  8 5f7420ae-9705-… ACCGTAACAA…             3               1 CCTC… FALSE    /tmp/R… fastq 
##  9 f164d67e-d704-… GACTAACTCC…             3               1 GTTC… FALSE    /tmp/R… fastq 
## 10 c7ae369e-ad72-… CATCGAACAG…             2               1 TTAA… FALSE    /tmp/R… fastq 
## # ℹ 362 more rows
## 
## $fastq$read_counts
##                                       total reads 
##                                               403 
##                               demultiplexed reads 
##                                               368 
##                                single match reads 
##                                               364 
## single strand single barcode multi-matching reads 
##                                                 0 
##              single strand multiple barcode reads 
##                                                 3 
##                 both strands single barcode reads 
##                                                 0 
##               both strands multiple barcode reads 
##                                                 1

FLAMES will find and trim the specified pattern, save the barcode and UMI to the identifier field of the read, let check the first raw read:

read_lines(system.file("extdata/fastq/musc_rps24.fastq.gz", package = "FLAMES"),
           n_max = 2)

## [1] "@50409b8e-c5c8-4b86-8d2b-95effbb4434c"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
## [2] "GTTGCCGTTCAGTTGCTAAGCAGTGGTATCAACGCGATCCGCTGGGGCCTTGCGCGTTGATATGATTGGTCGGCAGTGAGCGGTCCTCTTTTCCTCCTCTCCCGGCTCGGCGCCGTAGCCATCATGAATGACAGTAACCATCCGGACCAGGAAGTTCATGACCAACCCAAATCTGCTCAGAGGAAACAGATGGTCATTGATATCCTTCATCCTGGGAAGGCAACAGTACCAAAAGGTGAAATTCAGGAAAAGCTGGCCAAAATGTACAAAACCACACCAGATGTCATCTTTGTATTTGGATTCAGAACCCACTTCAATTGGTAGCAAGACCACTAGCTTTGGCATGATCTATGATTCTTTAGATTATGCAAGAAGATGAGCCTAAACACAGACTGGCAAGACATGGCCTTTATGAGAGAAAGACCTCCGAAAACAGCAGAAAAGAGACACAAGAACAGAATGAAGAAGGTCAGGGGCACGGCGAGAGTAATGTTGGTGCTATGCAAAAAGTGAGTGGAGATTGGATACAGCCAAGGAGTAGATCTGCAGTGACTTGATGTTTGCTGTGATGTGCAGATTTCTGAGAGGACAAATAAACTAAAAGCTCCTACTAAAAAAAAAAAAAAAAAAAAAAGACAGCCCGGTACTACTACTGACCCGAGATCAGGAAGAGCGTCGTGTAGAGCAATACGTAAC"

and the first de-multiplexed read:

read_lines(file.path(outdir, "demultiplexed.fq"), n_max = 2)

## [1] "@CGGGTCAGTAGTAGTA_CCGGGCTGTCTT#50409b8e-c5c8-4b86-8d2b-95effbb4434c_-1of1\tCB:Z:CGGGTCAGTAGTAGTA\tUB:Z:CCGGGCTGTCTT"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
## [2] "AGTAGGAGCTTTTAGTTTATTTGTCCTCTCAGAAATCTGCACATCACAGCAAACATCAAGTCACTGCAGATCTACTCCTTGGCTGTATCCAATCTCCACTCACTTTTTGCATAGCACCAACATTACTCTCGCCGTGCCCCTGACCTTCTTCATTCTGTTCTTGTGTCTCTTTTCTGCTGTTTTCGGAGGTCTTTCTCTCATAAAGGCCATGTCTTGCCAGTCTGTGTTTAGGCTCATCTTCTTGCATAATCTAAAGAATCATAGATCATGCCAAAGCTAGTGGTCTTGCTACCAATTGAAGTGGGTTCTGAATCCAAATACAAAGATGACATCTGGTGTGGTTTTGTACATTTTGGCCAGCTTTTCCTGAATTTCACCTTTTGGTACTGTTGCCTTCCCAGGATGAAGGATATCAATGACCATCTGTTTCCTCTGAGCAGATTTGGGTTGGTCATGAACTTCCTGGTCCGGATGGTTACTGTCATTCATGATGGCTACGGCGCCGAGCCGGGAGAGGAGGAAAAGAGGACCGCTCACTGCCGACCAATCATATCAACGCGCAAGGCCCCAGCGGATCGCGTTGATACCACTGCTTAGCAACTGAACGGCAAC"

This was done with an Rcpp version of flexiplex (Cheng et al., 2024-03-01), and is equivalent to calling using the native flexiplex command line tool with the following arguments:

 ~/flexiplex/flexiplex -x CTACACGACGCTCTTCCGATCT \
  -b '????????????????' -u '????????????' -x TTTTTTTTT \
  -e 2 -f 8 -k bc_allow.tsv musc_rps24.fastq > main_out.fq

If cell barcodes are not known in advance, e.g. when there is no coupled short-reads, the BLAZE tool (You et al., 2023-04-06) can be used to identify the cell barcodes and UMI sequences , and FLAMES also provides a wrapper FLAMES::blaze to call BLAZE.

blaze(
  100,
  fq_in = system.file("extdata/fastq/musc_rps24.fastq.gz", package = "FLAMES")
)

Alignment

Once the reads are processed, they are ready to be aligned to the reference genome. FLAMES calls minimap2 (Li, 2018-09-15) to perform alignment and samtoools (Li et al., 2009-08-15) to sort and index the BAM file, so we need to make sure the binaries are available in $PATH for FLAMES to find it.

minimap2_align(
    config = jsonlite::fromJSON(
      system.file('extdata', 'config_sclr_nanopore_3end.json', package = 'FLAMES')
    ),
    fa_file = fa,
    fq_in = file.path(outdir, "demultiplexed.fq"),
    annot = system.file("extdata", "filtered.gtf", package = "IgniteRNAseq"),
    outdir = outdir
)

FLAMES then performs gene quantification, isoform identification and isoform quantification. For the interest of time we will skip running them individually, as FLAMES also provides the sc_long_pipeline wrapper to do all steps. It will return a SingleCellExperiment object with the isoform counts, and the isoform annotations stored in the rowRanges slot. Let’s run the pipeline with a small number of reads:

sce <- FLAMES::sc_long_pipeline(
  genome_fa = system.file("extdata/rps24.fa.gz", package = "FLAMES"),
  fastq = system.file("extdata/fastq/musc_rps24.fastq.gz", package = "FLAMES"),
  annotation = system.file("extdata/rps24.gtf.gz", package = "FLAMES"),
  outdir = outdir,
  barcodes_file = bc_allow
)

Data pre-processing

After running the pipeline, we get a SingleCellExperiment object and we can finally start some analysis. Let’s load the SingleCellExperiment object from our previous cell line mixture data (Tian et al., 2021-11-11).

sce <- qs::qread(system.file("extdata", "sce.qs", package = "IgniteRNAseq"))
sce

## class: SingleCellExperiment 
## dim: 15076 162 
## metadata(1): OutputFiles
## assays(1): counts
## rownames(15076): ENSG00000000419.14_50934870_50958532_1
##   ENSG00000001036.14_143495487_143511720_2 ... ENST00000713574.1
##   ENST00000713579.1
## rowData names(3): transcript_id gene_id FSM_match
## colnames(162): AAAGAACAGCGATCGA AACAGGGTCCAGTGCG ... TTTCAGTCACCAGTAT
##   TTTGGAGCACAAAGTA
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(1): gene

altExp(sce, 'gene')

## class: SingleCellExperiment 
## dim: 14555 162 
## metadata(0):
## assays(1): counts
## rownames(14555): ENSG00000179820.16 ENSG00000114062.22 ... ENSG00000212907.2
##   ENSG00000228253.1
## rowData names(0):
## colnames(162): AAAGAACAGCGATCGA AACAGGGTCCAGTGCG ... TTTCAGTCACCAGTAT
##   TTTGGAGCACAAAGTA
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

I have also attached the gene count matrix from long-reads to the altExp slot. Let’s do clustering with the gene counts, using Bioconductor packages for normalization, dimensionality reduction and clustering:

altExp(sce, 'gene') <- altExp(sce, 'gene') |>
  scuttle::logNormCounts() |>
  scater::runPCA() |>
  scater::runUMAP()

sce$cluster <- scran::buildSNNGraph(altExp(sce, 'gene')) |>
  igraph::cluster_leiden() |>
  igraph::membership() |>
  factor()

altExp(sce, 'gene')$cluster <- sce$cluster

scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = 'sizeFactor')

scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = 'cluster')

# sce <- qs::qread(system.file("extdata", "processed_sce.qs", package = "IgniteRNAseq"))
download.file("https://raw.githubusercontent.com/ChangqingW/IgniteRNAseq/devel/inst/extdata/processed_sce.qs", "processed_sce.qs", mode = 'wb')
sce <- qs::qread("processed_sce.qs")
scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = 'sizeFactor')

scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = 'cluster')

We can clearly see the five cell lines are identified by the clustering using the gene counts from the long-reads. Now let’s see if we can get mutations / isoforms specific to each cell line.

Mutation analysis

Mutation discovery

Let’s suppose we no nothing about the mutations in our cell lines, we can use the FLAMES::find_variants function get bulk mutation count (similar to the samtools mpileup command, but in a tibble format with extra information):

mutation_discovery_tb <- FLAMES::find_variants(
  bam,
  fa,
  system.file("extdata", "filtered.gtf", package = "IgniteRNAseq"),
  min_nucleotide_depth = 2000,
  annotated_region_only = TRUE,
  names_from = "gene_name",
  threads = 4
)

## 12:07:10 Reading reference ...

## 12:07:10 Reading annotation ...

## Merging overlapping genes ...

## 0 overlapping regions, their gene_name(s) are merged with `, ` as separator

## 12:07:10 Got 1 bam file, parallelizing over each region ...

##   |                                                                                        |                                                                                |   0%  |                                                                                        |================                                                                |  20%  |                                                                                        |================================                                                |  40%  |                                                                                        |================================================                                |  60%  |                                                                                        |================================================================                |  80%  |                                                                                        |================================================================================| 100%

## 12:08:03 Merging results ...

## 12:08:03 Calculating homopolymer percentages ...

##   |                                                                                        |                                                                                |   0%  |                                                                                        |                                                                                |   1%  |                                                                                        |=                                                                               |   1%  |                                                                                        |=                                                                               |   2%  |                                                                                        |==                                                                              |   2%  |                                                                                        |==                                                                              |   3%  |                                                                                        |===                                                                             |   3%  |                                                                                        |===                                                                             |   4%  |                                                                                        |====                                                                            |   4%  |                                                                                        |====                                                                            |   5%  |                                                                                        |====                                                                            |   6%  |                                                                                        |=====                                                                           |   6%  |                                                                                        |=====                                                                           |   7%  |                                                                                        |======                                                                          |   7%  |                                                                                        |======                                                                          |   8%  |                                                                                        |=======                                                                         |   8%  |                                                                                        |=======                                                                         |   9%  |                                                                                        |========                                                                        |   9%  |                                                                                        |========                                                                        |  10%  |                                                                                        |========                                                                        |  11%  |                                                                                        |=========                                                                       |  11%  |                                                                                        |=========                                                                       |  12%  |                                                                                        |==========                                                                      |  12%  |                                                                                        |==========                                                                      |  13%  |                                                                                        |===========                                                                     |  13%  |                                                                                        |===========                                                                     |  14%  |                                                                                        |============                                                                    |  15%  |                                                                                        |============                                                                    |  16%  |                                                                                        |=============                                                                   |  16%  |                                                                                        |=============                                                                   |  17%  |                                                                                        |==============                                                                  |  17%  |                                                                                        |==============                                                                  |  18%  |                                                                                        |===============                                                                 |  18%  |                                                                                        |===============                                                                 |  19%  |                                                                                        |================                                                                |  20%  |                                                                                        |================                                                                |  21%  |                                                                                        |=================                                                               |  21%  |                                                                                        |=================                                                               |  22%  |                                                                                        |==================                                                              |  22%  |                                                                                        |==================                                                              |  23%  |                                                                                        |===================                                                             |  23%  |                                                                                        |===================                                                             |  24%  |                                                                                        |====================                                                            |  25%  |                                                                                        |====================                                                            |  26%  |                                                                                        |=====================                                                           |  26%  |                                                                                        |=====================                                                           |  27%  |                                                                                        |======================                                                          |  27%  |                                                                                        |======================                                                          |  28%  |                                                                                        |=======================                                                         |  28%  |                                                                                        |=======================                                                         |  29%  |                                                                                        |========================                                                        |  30%  |                                                                                        |========================                                                        |  31%  |                                                                                        |=========================                                                       |  31%  |                                                                                        |=========================                                                       |  32%  |                                                                                        |==========================                                                      |  32%  |                                                                                        |==========================                                                      |  33%  |                                                                                        |===========================                                                     |  33%  |                                                                                        |===========================                                                     |  34%  |                                                                                        |============================                                                    |  35%  |                                                                                        |============================                                                    |  36%  |                                                                                        |=============================                                                   |  36%  |                                                                                        |=============================                                                   |  37%  |                                                                                        |==============================                                                  |  37%  |                                                                                        |==============================                                                  |  38%  |                                                                                        |===============================                                                 |  38%  |                                                                                        |===============================                                                 |  39%  |                                                                                        |================================                                                |  40%  |                                                                                        |================================                                                |  41%  |                                                                                        |=================================                                               |  41%  |                                                                                        |=================================                                               |  42%  |                                                                                        |==================================                                              |  42%  |                                                                                        |==================================                                              |  43%  |                                                                                        |===================================                                             |  43%  |                                                                                        |===================================                                             |  44%  |                                                                                        |====================================                                            |  45%  |                                                                                        |====================================                                            |  46%  |                                                                                        |=====================================                                           |  46%  |                                                                                        |=====================================                                           |  47%  |                                                                                        |======================================                                          |  47%  |                                                                                        |======================================                                          |  48%  |                                                                                        |=======================================                                         |  48%  |                                                                                        |=======================================                                         |  49%  |                                                                                        |========================================                                        |  50%  |                                                                                        |=========================================                                       |  51%  |                                                                                        |=========================================                                       |  52%  |                                                                                        |==========================================                                      |  52%  |                                                                                        |==========================================                                      |  53%  |                                                                                        |===========================================                                     |  53%  |                                                                                        |===========================================                                     |  54%  |                                                                                        |============================================                                    |  54%  |                                                                                        |============================================                                    |  55%  |                                                                                        |=============================================                                   |  56%  |                                                                                        |=============================================                                   |  57%  |                                                                                        |==============================================                                  |  57%  |                                                                                        |==============================================                                  |  58%  |                                                                                        |===============================================                                 |  58%  |                                                                                        |===============================================                                 |  59%  |                                                                                        |================================================                                |  59%  |                                                                                        |================================================                                |  60%  |                                                                                        |=================================================                               |  61%  |                                                                                        |=================================================                               |  62%  |                                                                                        |==================================================                              |  62%  |                                                                                        |==================================================                              |  63%  |                                                                                        |===================================================                             |  63%  |                                                                                        |===================================================                             |  64%  |                                                                                        |====================================================                            |  64%  |                                                                                        |====================================================                            |  65%  |                                                                                        |=====================================================                           |  66%  |                                                                                        |=====================================================                           |  67%  |                                                                                        |======================================================                          |  67%  |                                                                                        |======================================================                          |  68%  |                                                                                        |=======================================================                         |  68%  |                                                                                        |=======================================================                         |  69%  |                                                                                        |========================================================                        |  69%  |                                                                                        |========================================================                        |  70%  |                                                                                        |=========================================================                       |  71%  |                                                                                        |=========================================================                       |  72%  |                                                                                        |==========================================================                      |  72%  |                                                                                        |==========================================================                      |  73%  |                                                                                        |===========================================================                     |  73%  |                                                                                        |===========================================================                     |  74%  |                                                                                        |============================================================                    |  74%  |                                                                                        |============================================================                    |  75%  |                                                                                        |=============================================================                   |  76%  |                                                                                        |=============================================================                   |  77%  |                                                                                        |==============================================================                  |  77%  |                                                                                        |==============================================================                  |  78%  |                                                                                        |===============================================================                 |  78%  |                                                                                        |===============================================================                 |  79%  |                                                                                        |================================================================                |  79%  |                                                                                        |================================================================                |  80%  |                                                                                        |=================================================================               |  81%  |                                                                                        |=================================================================               |  82%  |                                                                                        |==================================================================              |  82%  |                                                                                        |==================================================================              |  83%  |                                                                                        |===================================================================             |  83%  |                                                                                        |===================================================================             |  84%  |                                                                                        |====================================================================            |  84%  |                                                                                        |====================================================================            |  85%  |                                                                                        |=====================================================================           |  86%  |                                                                                        |=====================================================================           |  87%  |                                                                                        |======================================================================          |  87%  |                                                                                        |======================================================================          |  88%  |                                                                                        |=======================================================================         |  88%  |                                                                                        |=======================================================================         |  89%  |                                                                                        |========================================================================        |  89%  |                                                                                        |========================================================================        |  90%  |                                                                                        |========================================================================        |  91%  |                                                                                        |=========================================================================       |  91%  |                                                                                        |=========================================================================       |  92%  |                                                                                        |==========================================================================      |  92%  |                                                                                        |==========================================================================      |  93%  |                                                                                        |===========================================================================     |  93%  |                                                                                        |===========================================================================     |  94%  |                                                                                        |============================================================================    |  94%  |                                                                                        |============================================================================    |  95%  |                                                                                        |============================================================================    |  96%  |                                                                                        |=============================================================================   |  96%  |                                                                                        |=============================================================================   |  97%  |                                                                                        |==============================================================================  |  97%  |                                                                                        |==============================================================================  |  98%  |                                                                                        |=============================================================================== |  98%  |                                                                                        |=============================================================================== |  99%  |                                                                                        |================================================================================|  99%  |                                                                                        |================================================================================| 100%

mutation_discovery_tb

## # A tibble: 501 × 10
##    seqnames      pos nucleotide  count    sum   freq ref   bam_path region homopolymer_pct
##    <fct>       <dbl> <fct>       <int>  <dbl>  <dbl> <fct> <chr>    <chr>            <dbl>
##  1 chr19    48965820 -            4102 177214 0.0231 char… filtere… FTL               -Inf
##  2 chr19    48965820 C          173112 177214 0.977  char… filtere… FTL               -Inf
##  3 chr19    48965821 T          176517 176517 1      char… filtere… FTL               -Inf
##  4 chr19    48965822 T          176137 176137 1      char… filtere… FTL               -Inf
##  5 chr19    48965823 -            2967 177758 0.0167 char… filtere… FTL               -Inf
##  6 chr19    48965823 C          174791 177758 0.983  char… filtere… FTL               -Inf
##  7 chr19    48965825 +            3399 176933 0.0192 char… filtere… FTL               -Inf
##  8 chr19    48965824 C          174591 174591 1      char… filtere… FTL               -Inf
##  9 chr19    48965825 -            2411 176933 0.0136 char… filtere… FTL               -Inf
## 10 chr19    48965825 A            4480 176933 0.0253 char… filtere… FTL               -Inf
## # ℹ 491 more rows

Since this is Nanopore sequencing, there ones with very low alternative allele frequency are likely to be sequencing errors; Nanopore is also known to have higher error rate in homopolymer regions, so we can filter out those as well:

mutation_discovery_tb <- mutation_discovery_tb |>
  filter(freq > 0.2, freq < 0.8) |>
  arrange(desc(count))

mutation_discovery_tb

## # A tibble: 12 × 10
##    seqnames      pos nucleotide  count    sum  freq ref    bam_path region homopolymer_pct
##    <fct>       <dbl> <fct>       <int>  <dbl> <dbl> <fct>  <chr>    <chr>            <dbl>
##  1 chrM         8228 C          211448 266121 0.795 chara… filtere… MT-CO2            -Inf
##  2 chrM         8269 G          209767 264311 0.794 chara… filtere… MT-CO2            -Inf
##  3 chrM         8251 G          156306 263505 0.593 chara… filtere… MT-CO2            -Inf
##  4 chrM         9123 G          150286 219009 0.686 chara… filtere… MT-AT…            -Inf
##  5 chrM         7028 T          130634 249199 0.524 chara… filtere… MT-CO1            -Inf
##  6 chrM         7028 C          115940 249199 0.465 chara… filtere… MT-CO1            -Inf
##  7 chrM         8251 A          104893 263505 0.398 chara… filtere… MT-CO2            -Inf
##  8 chrM        11719 G           98511 149795 0.658 chara… filtere… MT-ND4            -Inf
##  9 chr19    48965830 C           85423 178263 0.479 chara… filtere… FTL               -Inf
## 10 chr19    48965830 T           84346 178263 0.473 chara… filtere… FTL               -Inf
## 11 chrM         9123 A           66376 219009 0.303 chara… filtere… MT-AT…            -Inf
## 12 chrM        11719 A           51284 149795 0.342 chara… filtere… MT-ND4            -Inf

As we can see, there is a couple of potential SNP mutations, let’s see if they are correlated with our clustering.

Mutation calling

The previous find_variants function is for bulk mutation calling, but since we have single-cell long-reads, we can call mutations at the single-cell level, and associate them with transcription profiles.

FLAMES offers the sc_mutations function to call mutations at the single-cell level:

snps_tb <- FLAMES::sc_mutations(
  bam,
  seqnames = as.character(mutation_discovery_tb$seqnames),
  positions = mutation_discovery_tb$pos,
  indel = F,
  barcodes = colnames(sce),
  threads = 4
)

## 12:08:27 Got 1 bam file, parallelizing over each position ...

##   |                                                                                        |                                                                                |   0%  |                                                                                        |=======                                                                         |   8%  |                                                                                        |=============                                                                   |  17%  |                                                                                        |====================                                                            |  25%  |                                                                                        |===========================                                                     |  33%  |                                                                                        |=================================                                               |  42%  |                                                                                        |========================================                                        |  50%  |                                                                                        |===============================================                                 |  58%  |                                                                                        |=====================================================                           |  67%  |                                                                                        |============================================================                    |  75%  |                                                                                        |===================================================================             |  83%  |                                                                                        |=========================================================================       |  92%  |                                                                                        |================================================================================| 100%

## 12:10:07 Merging results ...

Lets visualize the SNP status of each cell on the long-read derived UMAP. We will color the cell by the allele that is supported by more than half of the reads from that cell. This can be done easily with some tidyverse functions. The cell lines are homozygous, but with e.g. cancer data this could require more considerations.

head(snps_tb)

## # A tibble: 6 × 7
##   allele barcode          allele_count cell_total_reads     pct   pos seqname
##   <chr>  <chr>                   <dbl>            <dbl>   <dbl> <dbl> <chr>  
## 1 A      AAAGAACAGCGATCGA            1              828 0.00121  8228 chrM   
## 2 A      AACAGGGTCCAGTGCG            0             1109 0        8228 chrM   
## 3 A      AAGTCGTCAGTAGGAC            3             2483 0.00121  8228 chrM   
## 4 A      AATCACGGTCCTCCTA            0             1538 0        8228 chrM   
## 5 A      AATCGACAGGGTGGGA            0             1325 0        8228 chrM   
## 6 A      AATCGACTCGTTTACT            0             1368 0        8228 chrM

chr19 <- snps_tb |>
  filter(allele_count > 0, pct > 0.5) |>
  filter(seqname == 'chr19', pos == 48965830)
chr19 <- chr19[match(colnames(sce), chr19$barcode),"allele"] |>
 as.data.frame()
scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = chr19)

chrM_7028 <- snps_tb |>
  filter(allele_count > 0, pct > 0.5) |>
  filter(seqname == 'chrM', pos == 7028)
chrM_7028 <- chrM_7028[match(colnames(sce), chrM_7028$barcode),"allele"] |>
 as.data.frame()
scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = chrM_7028)

chrM_8251 <- snps_tb |>
  filter(allele_count > 0, pct > 0.5) |>
  filter(seqname == 'chrM', pos == 8251)
chrM_8251 <- chrM_8251[match(colnames(sce), chrM_8251$barcode),"allele"] |>
 as.data.frame()
scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = chrM_8251)

chrM_9123 <- snps_tb |>
  filter(allele_count > 0, pct > 0.5) |>
  filter(seqname == 'chrM', pos == 9123)
chrM_9123 <- chrM_9123[match(colnames(sce), chrM_9123$barcode),"allele"] |>
 as.data.frame()
scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = chrM_9123)

chrM_11719 <- snps_tb |>
  filter(allele_count > 0, pct > 0.5) |>
  filter(seqname == 'chrM', pos == 11719)
chrM_11719 <- chrM_11719[match(colnames(sce), chrM_11719$barcode),"allele"] |>
 as.data.frame()
scater::plotReducedDim(altExp(sce, 'gene'), 'UMAP', colour_by = chrM_11719)

Differential Transcript Experssion analysis

Since we have the transcript count matrix from FLAMES, we can do some DTE analysis. Let’s pretend that instead of cell lines, we have two replicate samples, and we devide the cells in each sample by a SNP mutation status. We can use limma to find differentially transcribed genes in the mutated vs reference cells.

First we need to aggregate the counts to pseudo-bulk samples:

# aggregate to pseudo-bulk counts
pseudo_bulk <- counts(sce) |> 
    t() |>
    as.matrix() |>
    by(sce$cluster, colSums) |>
    sapply(identity) |>
    by(rownames(counts(sce)), colSums) |>
    sapply(identity) |>
    t()

dgelist <- DGEList(
  # throw out one cell line to make this 2 vs 2
  counts = pseudo_bulk[, -2],
  group = factor(c("mutated", "reference", "mutated", "reference"))
)

# remove lowly expressed isoforms
dgelist <- dgelist[
  filterByExpr(dgelist), , 
  keep.lib.sizes = FALSE
]

# visualize the pseudo-bulk samples
plotMDS(dgelist)

Looks like there is some within-group variation as well as between-group variation. Let’s use limma to find significantly differentiated expressed transcripts between the mutated and reference cells.

dgelist <- calcNormFactors(dgelist, method = "TMM")
# means model
design <- model.matrix(~ 0 + dgelist$samples$group, data = dgelist$samples)
colnames(design) <- gsub(".*\\$", "", colnames(design)) |>
  gsub(" .*$", "", x = _)
dge_v <- voom(dgelist, design, save.plot = T, plot = F, span = 0.2)
efit <- lmFit(dge_v, dge_v$design) |>
  contrasts.fit(contrasts = makeContrasts(
    MalevsFemale = "groupmutated - groupreference",
    levels = dge_v$design
  )) |>
  eBayes()
topTable(efit)

##                    logFC AveExpr      t  P.Value adj.P.Val    B
## ENST00000621160.5  -7.20    7.01 -11.71 1.29e-06    0.0103 5.15
## ENST00000284440.9   5.95    6.64  11.00 2.15e-06    0.0103 4.92
## ENST00000618200.4  -7.91    6.81 -11.03 2.10e-06    0.0103 4.46
## ENST00000649529.1  -5.55    7.65  -9.11 9.85e-06    0.0289 3.80
## ENST00000316403.15  5.12    5.99   8.87 1.21e-05    0.0289 3.58
## ENST00000412585.7  -4.57    6.00  -8.60 1.55e-05    0.0306 3.37
## ENST00000361157.11 -4.61    6.96  -8.41 1.85e-05    0.0306 3.27
## ENST00000361566.7  -5.38    8.03  -8.29 2.06e-05    0.0306 3.19
## ENST00000378292.9   6.48    5.22   9.03 1.05e-05    0.0289 3.18
## ENST00000368716.9   4.90    6.44   8.25 2.14e-05    0.0306 3.14

Let’s visualise the top differentially expressed transcript on the UMAP:

scater::plotReducedDim(altExp(sce[, sce$cluster != 2], 'gene'), 'UMAP', 
  colour_by = data.frame(exp = counts(sce)['ENST00000621160.5', sce$cluster != 2]),
  shape_by =  "cluster"
)

Differential Transcript Usage analysis

Previously we have identified the transcripts that are differentially expressed ( i.e. DTE), another question we can ask is whether there are differentiall transcript usage (DTU) between the mutated and reference cells. The difference between DTU and DTE is illustrated by the following figure, adapted from (Ntranos et al., 2019-02).

The sc_DTU_analysis function can test for DTU (chi-squre test against uniform distribution). It will ignore the transcription start site and end site differences and only consider splicing differences.

# The splice sites are saved in isoform_FSM_annotation.csv during
# isoform identification. Let's make sure it is available:
sce@metadata$OutputFiles$outdir <- tempdir()
download.file('https://zenodo.org/records/12751214/files/isoform_FSM_annotation.csv?download=1', 
              file.path(sce@metadata$OutputFiles$outdir, "isoform_FSM_annotation.csv"))

colLabels(sce) <- sce$cluster
dtu_df <- sc_DTU_analysis(sce)

We will get a data frame with the chi-square test results:

head(dtu_df)

##                 gene_id X_value df                                   DTU_tr DTU_group
## 794  ENSG00000130066.17    2311 12   ENSG00000130066.17_23783173_23799848_2         4
## 1634 ENSG00000186468.13    1733 12   ENSG00000186468.13_82276060_82278355_2         1
## 2229 ENSG00000170889.14    1691 12                        ENST00000391751.7         4
## 2477 ENSG00000138326.21    6555 12                        ENST00000360830.9         5
## 2623 ENSG00000243678.12    4943 12                        ENST00000503064.5         2
## 1127 ENSG00000148346.12    1027 12 ENSG00000148346.12_128149453_128153453_7         5
##        p_value     adj_p
## 794   0.00e+00  0.00e+00
## 1634  0.00e+00  0.00e+00
## 2229  0.00e+00  0.00e+00
## 2477  0.00e+00  0.00e+00
## 2623  0.00e+00  0.00e+00
## 1127 2.81e-212 7.95e-209

Visualizations for DTU

Unlike DTE, DTU is a bit harder to visiuliaze. For experiments similar to FLT-seq, where matching short-reads are available for both the smaller sub-sample and the larger sub-sample, we provide some additional plotting functions:

sce_lr <- altExp(sce, 'gene')
altExp(sce) <- NULL
altExp(sce_lr, 'transcript') <- sce
counts(sce_lr) <- as.matrix(counts(sce_lr)) 
rownames(sce_lr) <- str_remove(rownames(sce_lr), "\\..*")
colnames(sce_lib90) <- sce_lib90$Barcode
combined_sce <- combine_sce(sce_lr, sce_lib90) |>
  scuttle::logNormCounts() |>
  scater::runPCA() |>
  scater::runUMAP()

plot_isoform_reduced_dim(combined_sce, "ENSG00000170889.14")

## Constructing graphics...
## Constructing graphics...
## Constructing graphics...
## Constructing graphics...

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] edgeR_4.4.0                 limma_3.62.0                lubridate_1.9.3            
##  [4] forcats_1.0.0               stringr_1.5.1               dplyr_1.1.4                
##  [7] purrr_1.0.2                 readr_2.1.5                 tidyr_1.3.1                
## [10] tibble_3.2.1                tidyverse_2.0.0             scater_1.34.0              
## [13] ggplot2_3.5.1               scuttle_1.16.0              SingleCellExperiment_1.28.0
## [16] SummarizedExperiment_1.36.0 Biobase_2.66.0              GenomicRanges_1.58.0       
## [19] GenomeInfoDb_1.42.0         IRanges_2.40.0              S4Vectors_0.44.0           
## [22] BiocGenerics_0.52.0         MatrixGenerics_1.18.0       matrixStats_1.4.1          
## [25] FLAMES_2.0.1               
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.5                  ProtGenerics_1.38.0       bitops_1.0-9             
##   [4] httr_1.4.7                RColorBrewer_1.1-3        doParallel_1.0.17        
##   [7] tools_4.4.1               backports_1.5.0           utf8_1.2.4               
##  [10] R6_2.5.1                  HDF5Array_1.34.0          uwot_0.2.2               
##  [13] lazyeval_0.2.2            rhdf5filters_1.18.0       GetoptLong_1.0.5         
##  [16] withr_3.0.2               prettyunits_1.2.0         GGally_2.2.1             
##  [19] gridExtra_2.3             cli_3.6.3                 textshaping_0.4.0        
##  [22] scatterpie_0.2.4          labeling_0.4.3            ggbio_1.54.0             
##  [25] sass_0.4.9                pkgdown_2.1.1             yulab.utils_0.1.7        
##  [28] Rsamtools_2.22.0          systemfonts_1.1.0         txdbmaker_1.2.0          
##  [31] foreign_0.8-87            R.utils_2.12.3            dichromat_2.0-0.1        
##  [34] parallelly_1.38.0         BSgenome_1.74.0           rstudioapi_0.17.1        
##  [37] RSQLite_2.3.7             FNN_1.1.4.1               generics_0.1.3           
##  [40] shape_1.4.6.1             RApiSerialize_0.1.4       BiocIO_1.16.0            
##  [43] vroom_1.6.5               Matrix_1.7-1              ggbeeswarm_0.7.2         
##  [46] fansi_1.0.6               abind_1.4-8               R.methodsS3_1.8.2        
##  [49] lifecycle_1.0.4           yaml_2.3.10               rhdf5_2.50.0             
##  [52] SparseArray_1.6.0         BiocFileCache_2.14.0      grid_4.4.1               
##  [55] blob_1.2.4                dqrng_0.4.1               crayon_1.5.3             
##  [58] dir.expiry_1.14.0         lattice_0.22-6            beachmat_2.22.0          
##  [61] cowplot_1.1.3             GenomicFeatures_1.58.0    KEGGREST_1.46.0          
##  [64] magick_2.8.5              metapod_1.14.0            pillar_1.9.0             
##  [67] knitr_1.48                ComplexHeatmap_2.22.0     bambu_3.8.0              
##  [70] rjson_0.2.23              xgboost_1.7.8.1           codetools_0.2-20         
##  [73] glue_1.8.0                ggfun_0.1.7               data.table_1.16.2        
##  [76] vctrs_0.6.5               png_0.1-8                 gtable_0.3.6             
##  [79] cachem_1.1.0              xfun_0.48                 S4Arrays_1.6.0           
##  [82] DropletUtils_1.26.0       iterators_1.0.14          bluster_1.16.0           
##  [85] statmod_1.5.0             bit64_4.5.2               progress_1.2.3           
##  [88] filelock_1.0.3            bslib_0.8.0               irlba_2.3.5.1            
##  [91] vipor_0.4.7               rpart_4.1.23              colorspace_2.1-1         
##  [94] DBI_1.2.3                 Hmisc_5.2-0               nnet_7.3-19              
##  [97] tidyselect_1.2.1          bit_4.5.0                 compiler_4.4.1           
## [100] curl_5.2.3                httr2_1.0.5               graph_1.84.0             
## [103] htmlTable_2.4.3           BiocNeighbors_2.0.0       basilisk.utils_1.18.0    
## [106] xml2_1.3.6                desc_1.4.3                DelayedArray_0.32.0      
## [109] stringfish_0.16.0         rtracklayer_1.66.0        checkmate_2.3.2          
## [112] scales_1.3.0              RBGL_1.82.0               rappdirs_0.3.3           
## [115] SpatialExperiment_1.16.0  digest_0.6.37             rmarkdown_2.28           
## [118] basilisk_1.18.0           XVector_0.46.0            htmltools_0.5.8.1        
## [121] pkgconfig_2.0.3           base64enc_0.1-3           sparseMatrixStats_1.18.0 
## [124] highr_0.11                dbplyr_2.5.0              fastmap_1.2.0            
## [127] ensembldb_2.30.0          rlang_1.1.4               GlobalOptions_0.1.2      
## [130] htmlwidgets_1.6.4         UCSC.utils_1.2.0          DelayedMatrixStats_1.28.0
## [133] farver_2.1.2              jquerylib_0.1.4           jsonlite_1.8.9           
## [136] BiocParallel_1.40.0       R.oo_1.26.0               BiocSingular_1.22.0      
## [139] VariantAnnotation_1.52.0  RCurl_1.98-1.16           magrittr_2.0.3           
## [142] Formula_1.2-5             GenomeInfoDbData_1.2.13   Rhdf5lib_1.28.0          
## [145] munsell_0.5.1             Rcpp_1.0.13               viridis_0.6.5            
## [148] reticulate_1.39.0         stringi_1.8.4             MASS_7.3-61              
## [151] zlibbioc_1.52.0           plyr_1.8.9                ggstats_0.7.0            
## [154] parallel_4.4.1            listenv_0.9.1             ggrepel_0.9.6            
## [157] Biostrings_2.74.0         hms_1.1.3                 circlize_0.4.16          
## [160] locfit_1.5-9.10           igraph_2.1.1              reshape2_1.4.4           
## [163] biomaRt_2.62.0            ScaledMatrix_1.14.0       XML_3.99-0.17            
## [166] evaluate_1.0.1            biovizBase_1.54.0         scran_1.34.0             
## [169] RcppParallel_5.1.9        BiocManager_1.30.25       tweenr_2.0.3             
## [172] tzdb_0.4.0                foreach_1.5.2             polyclip_1.10-7          
## [175] qs_0.27.2                 future_1.34.0             clue_0.3-65              
## [178] ggforce_0.4.2             rsvd_1.0.5                restfulr_0.0.15          
## [181] AnnotationFilter_1.30.0   viridisLite_0.4.2         ragg_1.3.3               
## [184] OrganismDbi_1.48.0        memoise_2.0.1             beeswarm_0.4.0           
## [187] AnnotationDbi_1.68.0      GenomicAlignments_1.42.0  cluster_2.1.6            
## [190] timechange_0.3.0          globals_0.16.3