This document presents the UMAP (Uniform Manifold Approximation and
Projection) analysis of single-cell RNA sequencing (scRNA-seq) data from
the published study E-MTAB-12598. UMAP is a
dimensionality reduction technique that captures high-dimensional
transcriptional variation, allowing visualization of cellular
heterogeneity and lineage transitions.
The dataset consists of human embryonic stem cells (hESCs) subjected
to different durations of Wnt signaling activation via GSK3β inhibition
(CHIR99021).
Cells were collected at 4 hours, 24 hours, and 3 days
after Wnt activation for analysis.
In the next phase, we will conduct a time-resolved study of
cardiomyocyte differentiation from hESCs, analyzing both Wnt inhibition
and differentiation stages, with cells collected at various time points.
The filtered feature barcode matrix (filtered_feature_bc_matrix.h5)
is used for analysis, as it contains high-quality cells.
# Set working directory
setwd("/home/telomere/Documents/degree_project_AG_Cantulab/poster/UMAPs")
# Load necessary libraries
library("Seurat")
library("ggplot2")
library("umap")
library("patchwork")
library("Cairo")
# Define time points
time_points <- c("hESC_WntOFF", "hESC_Chir4h", "hESC_Chir24h", "hESC_Chir3d")
# Create an empty list to store Seurat objects
seurat_list <- list()
# Loop through each time point and process it
for (time in time_points) {
# Define file path
file_path <- paste0(time, "_filtered_feature_bc_matrix.h5")
# Load data
data <- Read10X_h5(file_path)
# Create Seurat object
seurat_obj <- CreateSeuratObject(counts = data, min.cells = 3, min.features = 200)
# Assign time point identity
seurat_obj$orig.ident <- time # <-- This line ensures each time point is labeled
# Normalize and find variable features
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = "vst", nfeatures = 2000)
# Scale data and perform PCA
seurat_obj <- ScaleData(seurat_obj)
seurat_obj <- RunPCA(seurat_obj, npcs = 30)
# Run UMAP
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:20)
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:20)
# Store processed object in list
seurat_list[[time]] <- seurat_obj
}
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2377
## Number of edges: 99249
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8505
## Number of communities: 5
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 4000
## Number of edges: 156048
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8707
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3231
## Number of edges: 122425
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8202
## Number of communities: 6
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3181
## Number of edges: 122107
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8418
## Number of communities: 7
## Elapsed time: 0 seconds
The original dataset labeled hESC_WntOFF to represent cells
without CHIR99021 (CHIR) treatment. However, since Wnt signaling is
already active in hESCs, this label may be misleading. To better reflect
the experimental setup, we rename it hESC_D0 (No CHIR),
indicating cells collected before CHIR exposure.
CHIR was added to enhance Wnt activation, promoting mesoderm
induction alongside Activin A. The time points correspond to different
durations of CHIR exposure, capturing early lineage transitions.
To implement this change, we update the factor levels accordingly.
To analyze all time points together while preserving their
individual identities, we merge the Seurat objects.
# Merge all Seurat objects and keep their identities
merged_seurat <- merge(seurat_list[[1]], y = seurat_list[2:length(seurat_list)], add.cell.ids = names(seurat_list))
# Ensure `orig.ident` is correctly assigned
table(merged_seurat$orig.ident) # should show counts per time point
##
## hESC_Chir24h hESC_Chir3d hESC_Chir4h hESC_WntOFF
## 3231 3181 4000 2377
# Rename 'hESC_WntOFF' to 'hESC_D0 (No CHIR)' in the metadata
merged_seurat$orig.ident <- as.character(merged_seurat$orig.ident) # Convert to character
merged_seurat$orig.ident[merged_seurat$orig.ident == "hESC_WntOFF"] <- "hESC_D0 (No CHIR)"
# Convert back to factor and define levels in ascending order
merged_seurat$orig.ident <- factor(
merged_seurat$orig.ident,
levels = c("hESC_D0 (No CHIR)", "hESC_Chir4h", "hESC_Chir24h", "hESC_Chir3d")
)
# Run PCA and UMAP again after merging
merged_seurat <- NormalizeData(merged_seurat)
merged_seurat <- FindVariableFeatures(merged_seurat)
merged_seurat <- ScaleData(merged_seurat)
merged_seurat <- RunPCA(merged_seurat, npcs = 30)
merged_seurat <- FindNeighbors(merged_seurat, dims = 1:20)
merged_seurat <- FindClusters(merged_seurat, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 12789
## Number of edges: 468178
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9258
## Number of communities: 14
## Elapsed time: 1 seconds
##
## hESC_D0 (No CHIR) hESC_Chir4h hESC_Chir24h hESC_Chir3d
## 2377 4000 3231 3181
# Plot UMAP to visualize batch effects
DimPlot(merged_seurat, group.by = "orig.ident", reduction = "umap", cols = c("firebrick", "gold", "seagreen", "steelblue"))
Batch effect assessment in UMAP
This UMAP
visualizes cells by time points (orig.ident) to check for batch effects
before correction.
Distinct clusters suggest transcriptional differences across time
points, but strong separation may indicate batch effects.
Overlap
between CHIR 4h (yellow) and CHIR 24h (green) suggests a gradual
transition, while D0 (No CHIR) (red) and CHIR 3d (blue) remain distinct.
If differences are biological, batch correction may not be needed.
However, if separation is driven by technical artifacts, Harmony or
another correction method should be applied.
library(harmony)
# Run Harmony to correct batch effects
merged_seurat <- RunHarmony(
object = merged_seurat,
group.by.vars = "orig.ident" # Adjust batch based on time points
)
# Run UMAP on Harmony-corrected embeddings
merged_seurat <- RunUMAP(merged_seurat, reduction = "harmony", dims = 1:20)
# Plot corrected UMAP
DimPlot(merged_seurat, reduction = "umap", group.by = "orig.ident", cols = c("firebrick", "gold", "seagreen", "steelblue"))
Interpretation of Batch-corrected UMAP
This
UMAP plot visualizes all time points after batch correction, ensuring
that clustering reflects biological variation rather than technical
artifacts.
Stronger overlap among D0 (No CHIR) (red), CHIR 4h (yellow), and CHIR
24h (green) suggests better alignment of transcriptionally similar
populations, reducing artificial separation.
CHIR day 3 (blue)
remains distinct, likely representing true biological differences rather
than batch effects, as later differentiation stages should diverge.
Batch correction has minimized artificial clustering, preserving
meaningful transitions in gene expression dynamics.
This confirms
that Harmony has effectively integrated the datasets, allowing a clearer
analysis of differentiation trajectories.
The trajectory from
Day0 (No CHIR) → CHIR 4h → CHIR 24h → CHIR 72h suggests a progressive
shift in gene expression.
Overlapping regions between adjacent time
points indicate transitional states, where cells are gradually shifting
toward a Wnt-activated phenotype.
The distinct spatial separation
of the clusters indicates that Wnt signaling significantly alters gene
expression dynamics over time.
This UMAP suggests that Wnt activation drives gradual and distinct
transcriptional changes in hESCs, likely influencing their pluripotency,
differentiation potential, or lineage commitment. Further gene-specific
UMAPs (e.g., stemness, mesoderm, cardiomyocyte markers) will help
clarify how Wnt influences differentiation pathways.
# Check presence of genes in all conditions
gene_list <- c("AXIN2", "SP5", "NANOG", "SOX2", "POU5F1", "FOXA2", "EOMES", "MESP1", "TBXT", "GATA4", "HAND1", "NKX2-5", "ISL1", "TBX5")
# Apply function to check if each gene is present in all conditions
gene_presence <- sapply(gene_list, function(g) sapply(seurat_list, function(x) g %in% rownames(x)))
# Print results
print(gene_presence)
## AXIN2 SP5 NANOG SOX2 POU5F1 FOXA2 EOMES MESP1 TBXT GATA4 HAND1
## hESC_WntOFF TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE
## hESC_Chir4h TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## hESC_Chir24h TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## hESC_Chir3d TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## NKX2-5 ISL1 TBX5
## hESC_WntOFF FALSE TRUE TRUE
## hESC_Chir4h FALSE TRUE TRUE
## hESC_Chir24h FALSE TRUE TRUE
## hESC_Chir3d TRUE TRUE TRUE
Stemness markers → AXIN2, NANOG,
SOX2, POU5F1
# Stemness markers
FeaturePlot(merged_seurat, features = c("AXIN2"), reduction = "umap", cols = c("lightgray", "blue")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("NANOG"), reduction = "umap", cols = c("lightgray", "blue")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("SOX2"), reduction = "umap", cols = c("lightgray", "blue")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("POU5F1"), reduction = "umap", cols = c("lightgray", "blue")) +
theme(plot.title = element_text(size = 10))
Mesoderm markers → EOMES, MESP1,
TBXT, GATA4
# Mesoderm markers
FeaturePlot(merged_seurat, features = c("EOMES"), reduction = "umap", cols = c("lightgray", "darkgreen")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("MESP1"), reduction = "umap", cols = c("lightgray", "darkgreen")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("TBXT"), reduction = "umap", cols = c("lightgray", "darkgreen")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("GATA4"), reduction = "umap", cols = c("lightgray", "darkgreen")) +
theme(plot.title = element_text(size = 10))
Cardiomyocyte markers → NKX2.5,
ISL1, TBX5, HAND1
# Cardiomyocyte markers
FeaturePlot(merged_seurat, features = c("NKX2-5"), reduction = "umap", cols = c("lightgray", "red")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("ISL1"), reduction = "umap", cols = c("lightgray", "red")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("TBX5"), reduction = "umap", cols = c("lightgray", "red")) +
theme(plot.title = element_text(size = 10))
FeaturePlot(merged_seurat, features = c("HAND1"), reduction = "umap", cols = c("lightgray", "red")) +
theme(plot.title = element_text(size = 10))
Interpreting gene-specific UMAPs in hESC
differentiation
Each UMAP plot visualizes the expression
of a specific gene across all time points. The color gradient indicates
expression levels, with gray representing low or no expression and
blue/red/yellow showing high expression.
Cells are positioned based on transcriptional similarity—closer
clusters suggest similar expression profiles.
Stemness Markers (AXIN2, NANOG,
SOX2, POU5F1)
These UMAP plots visualize
the expression of key stemness markers (POU5F1, NANOG,
SOX2, AXIN2) across all conditions.
POU5F1 (OCT4): A master regulator of pluripotency, essential
for maintaining self-renewal in embryonic stem cells. Its widespread
expression in earlier time points indicates active pluripotency.
SOX2: Functions alongside OCT4 to sustain pluripotency and
prevent differentiation, showing a similar expression pattern.
NANOG: Supports stem cell identity and self-renewal, but its
expression is weaker, suggesting a gradual loss of pluripotency as cells
transition towards differentiation.
AXIN2: A Wnt signaling
target involved in regulating self-renewal and differentiation balance,
showing moderate expression.
The expression of POU5F1
(OCT4), SOX2, and NANOG is prominent in the left
clusters, corresponding to early time points, and diminishes in the
right cluster (CHIR 3d), indicating a transition away from pluripotency.
This suggests that as Wnt signaling persists, cells progressively lose
their pluripotent identity, aligning with early differentiation events.
Mesoderm Markers (EOMES, MESP1,
TBXT, GATA4)
TBXT (Brachyury):
A key mesodermal marker, essential for early mesoderm and primitive
streak formation. Its expression is enriched in distinct clusters,
indicating mesodermal commitment in a subset of cells.
GATA4: A transcription factor involved in mesoderm and cardiac
lineage specification. Its low expression across time points suggests
that cardiogenic commitment is not yet prominent.
MESP1: A
master regulator of cardiovascular progenitor development, showing
minimal expression, consistent with an early differentiation state
before mesoderm-derived cardiac fate specification.
EOMES:
Critical for mesoderm induction and primitive streak formation. Its
expression is observed in specific clusters, aligning with cells
transitioning toward mesodermal fate.
The expression patterns of these markers suggest early mesodermal
specification, with TBXT and EOMES showing the highest
expression, marking early-stage mesoderm induction. The low expression
of GATA4 and MESP1 reinforces that cells are in early
mesodermal stages and have not yet fully committed to mesoderm
differentiation.
Cardiomyocyte Markers (NKX2-5, ISL1,
TBX5, HAND1)
Since these cells have not
yet been subjected to a cardiomyocyte differentiation protocol, it is
expected that cardiac markers like NKX2-5, ISL1,
HAND1, and TBX5 would show little to no expression at
these time points.
NKX2-5: A crucial cardiac transcription factor essential for
heart development, showing sparse expression.
ISL1: A
marker of cardiac progenitors, expected at early stages of heart
specification, but largely absent here.
TBX5: Regulates
cardiac lineage commitment and chamber formation, with low expression,
consistent with the early time points.
HAND1: Plays a role
in ventricular development, but its minimal presence further confirms
that these cells have not initiated cardiac differentiation.
The
lack of strong expression indicates that cardiomyogenic commitment has
not yet begun, aligning with the experimental setup where Wnt has not
been inactivated to promote cardiac differentiation.
Future datasets, including Wnt inhibition phases, will be necessary
to observe definitive cardiomyocyte commitment.