On low-dimensional embeddings of high-dimensional data
[Figure: heat map of as.matrix(dist(rw)) and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
Wolfgang Huber and Susan Holmes
include detecting coverage peaks or concentrations in chromatin immunoprecipitation–sequencing, counting the number of cDNA fragments that match each transcript or exon (RNA-seq) and calling DNA sequence variants (DNA-seq). Such summaries can be stored in an instance of the class GenomicRanges.
Coordinated analysis of multiple samples. To facilitate the analysis of experiments and studies with multiple samples, Bioconductor defines the SummarizedExperiment class. The computed summaries for the ranges are compiled into a rectangular array whose rows correspond to the ranges and whose columns correspond to the different samples (Fig. 2). For a typical experiment, there can be tens of thousands to millions of ranges and from a handful to hundreds of samples. The array elements do not need to be single numbers: the summaries can be multivariate.
The SummarizedExperiment class also stores metadata on the rows and columns. Metadata on the samples usually include experimental or observational covariates as well as technical information such as processing dates or batches, file paths, etc. Row metadata comprise the start and end coordinates of each feature and the identifier of the containing polymer, for example, the chromosome name. Further information can be inserted, such as gene or exon identifiers, references to external databases, reagents, functional classifications of the region (e.g., from efforts such as the Encyclopedia of DNA Elements (ENCODE)5) or genetic associations (e.g., from genome-wide association studies, the study of rare diseases, or cancer genetics). The row metadata aid integrative analysis, for example, when matching two experiments according to overlap of genomic regions of interest. Tight coupling of metadata with the data reduces opportunities for clerical errors during reordering or subsetting operations.
Annotation packages and resources. Reference genomes, annotations of genomic regions and associated gene products (transcripts or proteins), and mappings between molecule identifiers are essential for placing statistical and bioinformatic results into biological perspective. These needs are partly addressed by the Bioconductor annotation data repository, which provides 894 prebuilt standardized annotation packages for use with common model organisms as well as other organisms. Each of the packages presents its data through a standard interface using defined Bioconductor classes, including classes for whole-genome sequences (BSgenome), gene model or transcript databases (TxDb) derived from UCSC (University of California, Santa Cruz) tracks or BioMart annotations, and identifier cross-references from the US National Center for Biotechnology Information, or NCBI (org). There are also facilities for users to create their own annotation packages.
The AnnotationHub resource provides ready access to more than 10,000 genome-scale assay and annotation data sets obtained from Ensembl, ENCODE, dbSNP, UCSC and other sources and delivered in an easy-to-access format (e.g., Ranges-compatible, where appropriate). Bioconductor also supports direct access to underlying file formats such as GTF, 2bit or indexed FASTA.
Bioconductor also offers facilities for directly accessing online resources through their application programming interfaces. This can be valuable when a resource is not represented in an annotation package or when the very latest version of the data is required. The rtracklayer package accesses tables and tracks underlying the UCSC Genome Browser, and the biomaRt package supports fine-grained on-line harvesting of Ensembl, UniProt, COSMIC (Catalogue Of Somatic Mutations In Cancer) and allied resources. Many additional packages access web resources, for example, KEGGREST, PSICQUIC and Uniprot.ws.
Figure 2 | The integrative data container SummarizedExperiment. Its assays component is one or several rectangular arrays of equivalent row and column dimensions. Rows correspond to features, and columns to samples. The component rowData stores metadata about the features, including their genomic ranges. The colData component keeps track of sample-level covariate data. The exptData component carries experiment-level information, including MIAME (minimum information about a microarray experiment)-structured metadata21. The R expressions exemplify how to access components. For instance, provided that these metadata were recorded, rowData(se)$entrezId returns the NCBI Entrez Gene identifiers of the features, and se$tissue returns the tissue descriptions for the samples. Range-based operations, such as %in%, act on the rowData to return a logical vector that selects the features lying within the regions specified by the data object CNVs. Together with the bracket operator, such expressions can be used to subset a SummarizedExperiment to a focused set of genes and tissues for downstream analysis.
BOX 1 GETTING STARTED
Install R and Bioconductor following the directions at http://www.bioconductor.org/install. Optionally, choose an Integrated Development Environment (IDE), for example, RStudio (http://www.rstudio.com). Learn the basics of the R language, for example, with http://tryr.codeschool.com.
Explore the Bioconductor help, http://www.bioconductor.org/help—which includes material from training courses, sample workflows, vignettes and manual pages—and the online support forum (https://support.bioconductor.org).
Identify and install Bioconductor packages using hierarchically organized “BiocViews” and text search (http://www.bioconductor.org/packages/release/BiocViews.html) and by exploring ‘landing pages’ for package descriptions and links to vignettes, manual pages and usage statistics.
Get to work exploring sample data sets and adapting established workflows for your own analysis!
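[Figure 2 schematic: assays(se) is an array with rows = features (genes) and columns = samples; rowData(se) runs along the features, colData(se) along the samples, and exptData(se) holds experiment-level information. R expressions shown in the figure:]
se <- SummarizedExperiment(assays, rowData, colData, exptData)
assays(se)$count
rowData(se)$entrezId
colData(se)$tissue
se$tissue
exptData(se)$projectId
se %in% CNVs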
NATURE METHODS | VOL.12 NO.2 | FEBRUARY 2015 | 117
PERSPECTIVE
Dimension reduction / embedding
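$X \approx U \Lambda V^{t}$, where $X$ is $n \times m$, $U$ is $n \times p$, $\Lambda$ is $p \times p$ (diagonal) and $V^{t}$ is $p \times m$.
Example with $n = 20000$, $m = 1000$: the full matrix has $n \times m = 2 \times 10^{7}$ entries.
With $p = 3$, the approximation needs only $(n + 1 + m) \times p = 63003$ numbers.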
Applications: Principal component analysis (PCA), Non-negative matrix factorization (NMF), ...
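The storage argument can be illustrated with a truncated singular value decomposition. The following is a minimal sketch in base R, assuming a small simulated matrix; the sizes n = 200 and m = 50, the noise level and all variable names are illustrative choices, not taken from the slides. The count for n = 20000, m = 1000, p = 3 is recomputed at the end.

```r
# Illustrative sizes only; the slide uses n = 20000, m = 1000.
set.seed(1)
n <- 200; m <- 50; p <- 3

# Simulate a matrix that is approximately of rank p, plus a little noise.
X <- matrix(rnorm(n * p), n, p) %*% matrix(rnorm(p * m), p, m) +
  matrix(rnorm(n * m, sd = 0.01), n, m)

s <- svd(X)                      # X = U diag(d) V^t
U <- s$u[, 1:p, drop = FALSE]    # n x p
L <- diag(s$d[1:p], nrow = p)    # p x p, diagonal
V <- s$v[, 1:p, drop = FALSE]    # m x p, so t(V) is p x m

X_hat <- U %*% L %*% t(V)        # rank-p approximation of X

# Numbers stored: full matrix versus the three factors.
full_size    <- n * m
reduced_size <- (n + 1 + m) * p
rel_error    <- norm(X - X_hat, "F") / norm(X, "F")

# The same count for the sizes quoted on the slide.
slide_full    <- 20000 * 1000
slide_reduced <- (20000 + 1 + 1000) * 3
```

Here `rel_error` is small only because the simulated matrix was built to be close to rank 3; real data need not compress this well.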
Multi-dimensional scaling (MDS)
Classical MDS is achieved by singular value decomposition of the (double-centred) matrix of squared distances, $D^2$. In R: cmdscale
Non-linear extensions: t-SNE, UMAP, ...
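To make the classical MDS recipe concrete, the sketch below double-centres the squared distances by hand, takes the eigendecomposition (which for this symmetric matrix plays the role of the SVD) and compares the result with `cmdscale`. The simulated data and the variable names are assumptions made for illustration.

```r
# Points in 5 dimensions (simulated; any Euclidean data would do).
set.seed(2)
X <- matrix(rnorm(30 * 5), nrow = 30)
D <- as.matrix(dist(X))          # Euclidean distance matrix

# Double centring of the squared distances: B = -1/2 * J D^2 J,
# with J = I - (1/n) 1 1^t.
n <- nrow(D)
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% (D^2) %*% J

# B is symmetric; its leading eigenvectors, scaled by the square roots
# of the eigenvalues, give the MDS coordinates.
e <- eigen(B, symmetric = TRUE)
k <- 2
coords_manual <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]), nrow = k)

# Reference result from base R.
coords_cmd <- cmdscale(D, k = k)

# Each axis is only defined up to a sign flip, so compare absolute values.
max_diff <- max(abs(abs(coords_manual) - abs(coords_cmd)))
```

The sign ambiguity seen here is the same one that makes the orientation of an MDS map arbitrary.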
Question 9.1 Make a barplot of all the eigenvalues output by the cmdscale function: what do you notice?
Solution 9.1 If you execute:
plotbar(MDSEuro, m = length(MDSEuro$eig))
you will note that, unlike in PCA, there are some negative eigenvalues. These are due to the fact that the data do not come from a Euclidean space.
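Why non-Euclidean dissimilarities produce negative eigenvalues can be seen in a three-object toy example. This is a sketch with made-up numbers, not the European cities data: the first matrix violates the triangle inequality, so no configuration of points can reproduce it, while the second comes from genuine points in the plane.

```r
# Three objects whose dissimilarities violate the triangle inequality:
# d(1,3) = 3 > d(1,2) + d(2,3) = 2, so no Euclidean configuration exists.
D <- matrix(c(0, 1, 3,
              1, 0, 1,
              3, 1, 0), nrow = 3, byrow = TRUE)

n <- nrow(D)
J <- diag(n) - matrix(1 / n, n, n)          # centring matrix
B <- -0.5 * J %*% (D^2) %*% J               # double-centred squared distances
eig_non_euclid <- eigen(B, symmetric = TRUE)$values

# For comparison: distances between genuine points in the plane.
P  <- rbind(c(0, 0), c(1, 0), c(0, 1))
DE <- as.matrix(dist(P))
BE <- -0.5 * J %*% (DE^2) %*% J
eig_euclid <- eigen(BE, symmetric = TRUE)$values
```

A call such as `barplot(eig_non_euclid)` shows one bar below zero, whereas all entries of `eig_euclid` are non-negative up to rounding error.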
[Figure: scatter plot of the cities (Barcelona, Belgrade, Berlin, Brussels, Bucharest, Budapest, Copenhagen, Dublin, Hamburg, Istanbul, Kiev, London, Madrid, Milan, Moscow, Munich, Paris, Prague, Rome, Saint_Petersburg, Sofia, Stockholm, Vienna, Warsaw); axes PCo1 and PCo2]
Figure 9.3: MDS map of European cities based on their distances.
To position the points on the map we have projected them on the new coordinates created from the distances (we will discuss how the algorithm works in the next section). Note that while relative positions in Figure 9.3 are correct, the orientation of the map is unconventional: e.g., Istanbul, which is in the South-East of Europe, is at the top left.
MDSeur = tibble(
  PCo1 = MDSEuro$points[, 1],
  PCo2 = MDSEuro$points[, 2],
  labs = rownames(MDSEuro$points))
g = ggplot(MDSeur, aes(x = PCo1, y = PCo2, label = labs)) +
  geom_point(color = "red") + xlim(-1950, 2000) + ylim(-1150, 1150) +
  coord_fixed() + geom_text(size = 4, hjust = 0.3, vjust = -0.5)
g
We reverse the signs of the principal coordinates and redraw the map. We also read in the cities’ true longitudes and latitudes and plot these alongside for comparison (Figure 9.4).
g %+% mutate(MDSeur, PCo1 = -PCo1, PCo2 = -PCo2)
Eurodf = readRDS("../data/Eurodf.rds")
ggplot(Eurodf, aes(x = Long, y = Lat, label = rownames(Eurodf))) +
  geom_point(color = "blue") + geom_text(hjust = 0.5, vjust = -0.5)
[Figure: two scatter plots of the same cities; left panel with axes PCo1 and PCo2, right panel with axes Long and Lat]
Figure 9.4: Left: same as Figure 9.3, but with axes flipped. Right: true latitudes and longitudes.
Question 9.2 Which cities seem to have the worst representation on the PCoA map in the left panel of Figure 9.4?
Solution 9.2 It seems that the cities at the extreme West (Dublin, Madrid and Barcelona) have worse projections than the central cities. This is likely because the data are more sparse in these areas and it is harder for the method to ‘triangulate’ the
Two-dimensional layout of European cities based on matrix D of their pairwise distances
What does this have to do with RNA-seq?
ARTICLE RESEARCH
(Extended Data Fig. 6h) and performed flow cytometry analysis using the markers CD16/32, and CSF1R. Rare CD16/32+CSF1R+ cells were found in all dissected regions (Extended Data Fig. 6i), indicating that by E8.5 this population has already started to migrate out of the yolk sac.
A platform to dissect genetic mutations
Previous work has emphasized the critical role of the basic helix–loop–helix (bHLH) transcription factor TAL1 (also known as SCL) in haematopoiesis; in these experiments, Tal1−/− mouse embryos died of severe anaemia at around E9.5 (ref. 31). Dissecting the temporal and mechanistic roles of such major regulatory genes in vivo is challenging using knockout mice: breeding mice and genotyping embryos is time-consuming, and furthermore, the direct effects of a mutation are often masked by gross developmental malformations or embryo lethality. To circumvent these difficulties, we generated chimeric mouse embryos in which Tal1−/− tdTomato+ mouse embryonic stem (ES) cells were injected into wild-type blastocysts. In the resulting chimaeras, wild-type cells still produce blood cells, and this allows the specific effects of TAL1 depletion to be studied in an otherwise healthy embryo (ref. 32).
To determine whether Tal1 mutant cells were associated with abnormalities in specific lineages, we sorted tdTomato− (wild type) and tdTomato+ (Tal1−/−) cells from chimeric embryos at E8.5, and then performed scRNA-seq (Fig. 4a; Extended Data Fig. 7a, b). Each cell was annotated by computationally mapping its transcriptome onto our wild-type atlas (Methods; Fig. 4b; Extended Data Fig. 7c–e). Consistent with the pivotal role of Tal1 in haematopoiesis, tdTomato+ cells did not contribute to blood lineages (Fig. 4b; Extended Data Fig. 7e–g). Notably, we confirmed that wild-type control tdTomato+ Tal1+/+ ES cells, when injected into wild-type embryos, make a similar contribution to haematopoiesis as the tdTomato− host cells (Extended Data Fig. 7h, i).
Comparisons between wild-type and Tal1−/− chimeric cells mapped to the landscape defined in Fig. 3a illustrated that TAL1 depletion
[Figure 3, panels a to g: force-directed layout and graph abstraction of the blood-lineage sub-clusters (Mes, Haem, EC, BP, Ery, My, Mk), coloured by sub-cluster and by collection time point (E6.5 to E8.5); expression overlays for Kdr, Spi1 and Itga2b; fraction of endothelial cells by location (yolk sac, embryo proper, allantois); marker-gene heat maps]
Fig. 3 | Temporal analysis of blood emergence reveals early myeloid cells. a, Force-directed graph layout of cells associated with the blood lineage, coloured by sub-cluster (15,875 cells). The inset box shows a zoomed-in section that focuses on myeloid, megakaryocytic, and haemogenic endothelial cells. BP, blood progenitor; EC, endothelial cell; Ery, erythrocyte; Haem, haemato-endothelial progenitor; Mes, mesodermal cell; Mk, megakaryocyte; My, myeloid cell. b, Graph abstraction summarizing the relationships between the sub-clusters as in a, coloured by sub-cluster (left) and collection time point (right). Two samples of mixed-time point embryos were excluded. c, Expression levels of Kdr, overlaid on the force-directed layout from a. d, Expression levels of Spi1 and Itga2b, overlaid on the inset of the force-directed layout from a.
e, Fraction of endothelial cells that mapped to yolk sac, allantois, and embryo proper. f, Heat map illustrating row-normalized expression of genes that were significantly upregulated in cells of the EC7 (n = 197), Haem4 (n = 102), My (n = 56), BP4 (n = 54), and Mk (n = 32) sub-clusters when performing pairwise differential expression analyses between a specific sub-cluster and the rest of the cells in a. Significance was considered if log2(mean expression of specific cluster/mean expression of the rest of cells) > 2.5 and Benjamini–Hochberg-adjusted P < 0.05. g, Heat map illustrating the log-count expression (log2(normalized count + 1), ranging from 0 (blue) to 3.5 (red)) of previously described microglial (Micr.) and erythro-myeloid progenitor (EMP) markers.
28 FEBRUARY 2019 | VOL 566 | NATURE | 493
Fig. 3a from Pijuan-Sala et al. (Nature 2019)
15,875 cells from blood lineage (out of 116,312) from mouse embryos at 9 time points from 6.5-8.5 days post-fertilization
Branch points 👉 Lineages
Trajectories 👉 Differentiation
Clusters 👉 Cell types
But the geometry of high-dimensional spaces is weird
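Every pair of random points has nearly the same distance: $d(x, y)^2 = \sum_{i=1}^{n} (x_i - y_i)^2$ is a sum of $n$ terms, so for independent coordinates the central limit theorem applies and the relative spread of the distances shrinks as $n$ grows.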
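The concentration of distances can be checked by simulation. The sketch below assumes points drawn uniformly from the unit cube, which is one convenient choice of "random points" and not necessarily the one behind the slide; the function name `relative_spread` is made up for this example.

```r
set.seed(3)
n_points <- 200

# Relative spread (sd / mean) of all pairwise Euclidean distances
# between points drawn uniformly from the unit cube [0, 1]^n_dim.
relative_spread <- function(n_dim) {
  X <- matrix(runif(n_points * n_dim), nrow = n_points)
  d <- as.vector(dist(X))
  sd(d) / mean(d)
}

dims <- c(2, 10, 100, 1000)
spread <- sapply(dims, relative_spread)
names(spread) <- dims
```

Printing `spread` shows the ratio falling steadily with the dimension: in 1000 dimensions all pairs of points are at almost the same distance from each other.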
[Figure: dV/dr plotted against r from 0 to 1, one curve per dimension (legend: 2, 3, 5, 10, 15, 20)]
Almost all volume of any shape (e.g. hypercube, sphere) is close to its surface
$V(r) = r^{d}$, $\quad \frac{dV(r)}{dr} = d\, r^{d-1}$
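A direct consequence of the volume formula is that the fraction of volume lying in a thin outer shell of relative thickness eps is 1 - (1 - eps)^d. The short sketch below evaluates this for a 1% shell; the function name `shell_fraction` and the chosen dimensions are illustrative.

```r
# Fraction of the volume of a d-dimensional shape of size 1 that lies
# within the outer shell of relative thickness eps.
# Follows from V(r) proportional to r^d: 1 - (1 - eps)^d.
shell_fraction <- function(d, eps = 0.01) 1 - (1 - eps)^d

d <- c(2, 3, 5, 10, 100, 1000)
frac <- shell_fraction(d)
names(frac) <- d
```

In 2 dimensions the outer 1% holds about 2% of the volume; in 100 dimensions it holds more than 60%, and in 1000 dimensions essentially all of it.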
MDS of 100 objects
[Figure: heat map of the distance matrix D and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
Idealized model for such data: $D_{ij} = 1 - e^{-\lambda |i - j|}$
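The idealized model can be reproduced in a few lines. This is a sketch under one explicit assumption: the index difference |i - j| is rescaled to the interval [0, 1], because the slides do not state the scaling. The helper names `model_dist`, `embed` and `axis_ratio` are made up for this example; the values of lambda are those shown on the following slides.

```r
# Assumption: the positions of the 100 objects are rescaled to [0, 1];
# the slides do not state the scaling of |i - j|.
n <- 100
pos <- seq(0, 1, length.out = n)

# Model distances D_ij = 1 - exp(-lambda * |pos_i - pos_j|).
model_dist <- function(lambda) 1 - exp(-lambda * abs(outer(pos, pos, "-")))

# Classical MDS on the model distances.
embed <- function(lambda, k = 2) cmdscale(model_dist(lambda), k = k)

Y2  <- embed(2)
Y6  <- embed(6)
Y20 <- embed(20)

# Spread along the second MDS axis relative to the first.
axis_ratio <- function(Y) sd(Y[, 2]) / sd(Y[, 1])
```

The objects lie on a line by construction, yet the second axis is far from negligible: `plot(Y20, type = "b")` draws a curved, horseshoe-like path instead of a straight segment.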
MDS of a 100-dimensional dataset
$D_{ij} = 1 - e^{-\lambda |i - j|}$, $\lambda = 2$
[Figure: heat map of the distance matrix (panel title D) and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
MDS of a 100-dimensional dataset
$D_{ij} = 1 - e^{-\lambda |i - j|}$, $\lambda = 6$
[Figure: heat map of the distance matrix (panel title D2) and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
MDS of a 100-dimensional dataset
$D_{ij} = 1 - e^{-\lambda |i - j|}$, $\lambda = 20$
[Figure: heat map of the distance matrix (panel title D3) and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
[Figure: two panels titled "Eigenvectors", each showing EV1 to EV5 plotted as value against index Var1 (0 to 100)]
What is going on?
[Figure: pairs plot (scatter plot matrix) of the first five eigenvectors, EV1 to EV5]
eigD <- eigen(double.center(D))
pairs(eigD$vectors[, 1:5])
For the mathematically inclined:
[Figure: heat map of the double-centred D]
$\frac{d^2}{dx^2} f(x) = \lim_{h \to 0} \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2}$
$\frac{d^2}{dx^2} f(x) = -k^2 f(x) \;\Leftrightarrow\; f(x) \propto e^{ikx} = \cos kx + i \sin kx$
Persi Diaconis, Sharad Goel and Susan Holmes. Horseshoes in Multidimensional Scaling and Local Kernel Methods. The Annals of Applied Statistics, 2008.
A straight line in $\mathbb{R}^{100}$
$x = at + b$ for $t \in [t_1, t_2]$; $a, b \in \mathbb{R}^{100}$
[Figure: heat map of the distance matrix D (panel title dm) and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
A straight line in $\mathbb{R}^{100}$, with saturation of larger distances
$x = at + b$ for $t \in [t_1, t_2]$; $a, b \in \mathbb{R}^{100}$
[Figure: heat map of the saturated distance matrix sat(D) (panel title sat(dm)), its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP, and a plot of sat(d) against d]
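The effect of saturation on classical MDS can be sketched as follows. The saturation function used on the slides is not given, so `sat` below is a hypothetical stand-in (close to d for small d, levelling off at s = 60); the number of points, the range of t and the helper `mds_eig` are likewise assumptions made for illustration.

```r
# 41 equally spaced points on a straight line x = a * t + b in R^100.
set.seed(4)
a <- rnorm(100); a <- a / sqrt(sum(a^2))   # unit direction vector
b <- rnorm(100)                            # arbitrary offset
tpar <- seq(0, 250, length.out = 41)
X <- outer(tpar, a) + matrix(b, nrow = length(tpar), ncol = 100, byrow = TRUE)

dm <- as.matrix(dist(X))                   # equals |t_i - t_j|

# Hypothetical saturation function (not taken from the slides):
# close to d for small d, levelling off at s for large d.
sat <- function(d, s = 60) s * (1 - exp(-d / s))

# Eigenvalues of the double-centred squared distances (classical MDS).
mds_eig <- function(D) {
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)
  eigen(-0.5 * J %*% (D^2) %*% J, symmetric = TRUE)$values
}

ev_line <- mds_eig(dm)
ev_sat  <- mds_eig(sat(dm))

# Size of the second eigenvalue relative to the first.
ratio_line <- abs(ev_line[2]) / ev_line[1]   # about 0: one axis suffices
ratio_sat  <- ev_sat[2] / ev_sat[1]          # clearly > 0: a spurious second axis
```

With the exact distances a single axis explains everything. Once large distances saturate, a second axis of comparable size appears although the data are still one-dimensional.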
2D random field with spatial correlation
[Figure: heat maps of x, cov(x) and as.matrix(dist(x)), and the 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
Matrix filled with random numbers with sequential correlation
[Figure: heat map of the matrix rw and a PCA scatter plot with axes PC1 (17.39%) and PC2 (14.38%)]
Matrix filled with random numbers with sequential correlation
[Figure: heat map of as.matrix(dist(rw)) and its 2D embeddings (axes P1, P2) by cmdscale, isoMDS, t-SNE and UMAP]
2D t-SNE on 'impossible' shapes
[Figure: 2D t-SNE embeddings (axes x, y) of five shapes: 2D grid, 3D grid, 2D torus, 3D torus, 2D sphere surface]
2D cmdscale on 'impossible' shapes
[Figure: 2D cmdscale embeddings (axes V1, V2) of five shapes: 2D grid, 3D grid, 2D torus, 3D torus, 2D sphere surface]
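For the 3D grid, the loss incurred by a 2D classical MDS embedding can be quantified from the eigenvalues. The sketch below uses a 5 x 5 x 5 grid, which is an illustrative size and not necessarily the one on the slide.

```r
# Regular 3D grid: 5 x 5 x 5 = 125 points.
grid3 <- as.matrix(expand.grid(x = 1:5, y = 1:5, z = 1:5))

# Classical MDS, keeping all eigenvalues.
fit <- cmdscale(dist(grid3), k = 2, eig = TRUE)

# The three leading eigenvalues are equal (one per grid axis), so any 2D
# classical MDS embedding captures only two thirds of the variation.
lead <- fit$eig[1:3]
captured_2d <- sum(lead[1:2]) / sum(lead)
```

Because the three leading eigenvalues are tied, the choice of the two plotted axes is arbitrary within the 3D space, and one grid direction is always flattened onto the others.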
Take home messages
Embeddings of high-dimensional data into lower-dimensional space are useful
But they can create one-dimensional (“time-like”) patterns that have little to do with the data-generating process
Sometimes, a faithful embedding is mathematically impossible
High-dimensional geometry is weird
Be aware