RESOURCE
The Chimonanthus salicifolius genome provides insight into magnoliid evolution and flavonoid biosynthesis
Qundan Lv1,†, Jie Qiu2,†, Jie Liu2,†, Zheng Li3, Wenting Zhang2, Qin Wang2, Jie Fang1, Junjie Pan1, Zhengdao Chen1, Wenliang Cheng1, Michael S. Barker3, Xuehui Huang2, Xin Wei2,* and Kejun Cheng1,*
1 Chemical Biology Center, Lishui Institute of Agriculture and Forestry Sciences, Lishui, China,
2 Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China, and
3 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, USA
Received 4 November 2019; revised 25 May 2020; accepted 2 June 2020.
*For correspondence (e-mail [email protected]; [email protected]).
†These authors contributed equally to this work.
SUMMARY
Chimonanthus salicifolius, a member of the Calycanthaceae of magnoliids, is one of the most famous
medicinal plants in Eastern China. Here, we report a chromosome-level genome assembly of C. salicifolius,
comprising 820.1 Mb of genomic sequence with a contig N50 of 2.3 Mb and containing 36 651 annotated
protein-coding genes. Phylogenetic analyses revealed that magnoliids were sister to the eudicots. Two
rounds of ancient whole-genome duplication were inferred in the C. salicifolius genome. One is shared by
Calycanthaceae after its divergence from Lauraceae, and the other is in the ancestry of Magnoliales and Laurales.
Notably, long genes (> 20 kb) were much more prevalent in the magnoliid genomes than in
other angiosperms, which could be caused by the expansion of introns through the insertion of
transposable elements. Homologous genes within the flavonoid pathway for C. salicifolius were identified,
and correlation of gene expression with the contents of flavonoid metabolites revealed potential critical
genes involved in flavonoid biosynthesis. This study not only provides an additional whole-genome
sequence from the magnoliids, but also opens the door to functional genomic research and molecular
breeding of C. salicifolius.
Keywords: Chimonanthus salicifolius, de novo genome assembly, magnoliids, evolution, long genes, gene
expression.
INTRODUCTION
Magnoliids represent the third largest group of angios-
perms, which includes approximately 10 000 species (Pal-
mer et al., 2004; Massoni et al., 2015). Numerous useful
plants are in the magnoliids, such as avocado, nutmeg,
bay laurel, black pepper, star anise, wintersweet and cam-
phor tree. They provide fruit, spices, traditional medicine,
industrial raw materials and ornamental trees for human
use. The availability of genomes for more than 300 mono-
cots and eudicots has greatly accelerated phylogenetic
reconstruction and genetic research in monocots and eudi-
cots. Despite the importance of magnoliids, few magnoliid
genomes have been sequenced, and their phylogenetic
position remains uncertain (Chaw et al., 2019; Chen et al.,
2019; Hu et al., 2019; Rendon-Anaya et al., 2019).
The mysterious phylogenetic position of magnoliids has
been debated for decades. Based on different genomic
components, including plastid genes, mitochondrial genes,
nuclear genes and plastomic inverted repeat regions, three
main phylogenetic topologies have been proposed, that is:
(i) sister to the monocots (Endress and Doyle, 2009); (ii) sis-
ter to the clade containing monocots and eudicots (Moore
et al., 2007; Qiu et al., 2010); and (iii) sister to the eudicots
(Zeng et al., 2014; One Thousand Plant Transcriptomes Ini-
tiative, 2019). In the Angiosperm Phylogeny Group (APG)
system, the phylogenetic position of magnoliids is not con-
sistent among the four versions (The Angiosperm Phy-
logeny Group, 1998, 2003, 2009, 2016). Recently, four
genomes of magnoliids were released, including Lirioden-
dron chinense (Chen et al., 2019), Cinnamomum kanehirae
© 2020 Society for Experimental Biology and John Wiley & Sons Ltd. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
The Plant Journal (2020) doi: 10.1111/tpj.14874
(Chaw et al., 2019), Persea americana (Rendón-Anaya
et al., 2019) and Piper nigrum (Hu et al., 2019). These genomes
greatly facilitated our understanding of magnoliid
evolution (Soltis and Soltis, 2019). However, the
phylogenetic positions of the sequenced species revealed
by these genomes were different, resulting in more confu-
sion regarding the genome evolution of magnoliids. In
addition, the relative timing of the whole-genome duplica-
tion (WGD) events for different species and the divergence
times between different magnoliid plants remain ambigu-
ous (Cui et al., 2006; Chaw et al., 2019; Chen et al., 2019;
Rendon-Anaya et al., 2019).
Chimonanthus salicifolius (Chinese name ‘Liu-Ye-La-
Mei’) is a shrub that belongs to the Calycanthaceae in
the Laurales. The leaves of C. salicifolius have been
used as traditional medicine to relieve diarrhea symp-
toms by people of the She nationality for hundreds of
years in Eastern China, and its definite curative effects
have earned C. salicifolius the title of “the uncrowned
king of traditional She nationality medicine”. A large
number of secondary metabolites, such as flavonoids,
coumarins, alkaloids and terpenoids, which might be
the active components responsible for its therapeutic
effects, have been identified in the leaves of
C. salicifolius (Ma et al., 2015; Li et al., 2016; Wang
et al., 2016, 2019). Ethanolic extracts of C. salicifolius
show significant antimicrobial and antibiotic-mediating
activity (Wang et al., 2018). Moreover, the young leaves
of C. salicifolius are processed into tea, which is com-
monly consumed in Eastern China. Despite the commer-
cial interest and increasing demand for C. salicifolius,
the basic biological research and genetic improvement
of C. salicifolius are quite limited. The lack of genome
information has hindered identification of the flavonoid
biosynthetic genes.
In this study, the genome of C. salicifolius was
sequenced by Illumina and PacBio, and scaffolded using
10× Genomics and Hi-C technologies. Approximately 820
megabases (Mb) of genome sequences were assembled
with a contig N50 of 2.3 Mb. This high-quality genome
provides a resource for inferring the phylogeny of mag-
noliids, and identifying the key genes responsible for fla-
vonoid biosynthesis and genes underlying the complex
agronomic traits such as flowering at low temperatures.
Comparative genomic analysis was performed with the
three published magnoliid genomes, L. chinense (Chen
et al., 2019), C. kanehirae (Chaw et al., 2019) and P. amer-
icana (Rendon-Anaya et al., 2019). In addition, the preva-
lence of long genes was discovered in the genomes of
C. salicifolius and other magnoliids. Overall, our results
shed light on the phylogeny of magnoliids, and lay a
foundation for understanding the mechanism of flavo-
noid biosynthesis and molecular breeding of high-flavo-
noid-content varieties.
RESULTS
Genome sequencing, assembly and annotation
The genomic DNA of C. salicifolius was extracted from one
individual plant collected in Eastern China, and sequenced
using both Illumina and PacBio sequencing platforms. For
the initial contig assembly, a total of 82.3 gigabases (Gb)
of PacBio data were generated, representing approximately
101.5-fold coverage of the 810.6-Mb genome, a size pre-
dicted by a 17-mer analysis (Figure S1). Based on the flow
cytometry survey, the genome size of C. salicifolius was
evaluated to be approximately 835.5 Mb (Figure S2), which
is close to the genome size estimated by the k-mer strat-
egy. The contigs were assembled using Falcon (Chin et al.,
2013) with the PacBio data, and were error corrected by
Pilon (Walker et al., 2014) with 98.5 Gb (121.5-fold) of clean
Illumina reads. Afterwards, the contigs were scaffolded by
FragScaff (Adey et al., 2014) using 127.8 Gb (157.7-fold) of 10×
Genomics data, and the final genome size was
851.7 Mb. A total of 1741 contigs and 1531 scaffolds were
assembled.
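The fold-coverage values quoted above follow from dividing each data volume by the k-mer genome size estimate. A minimal Python sketch of that arithmetic is given here; the function name `fold_coverage` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
# Fold coverage = sequenced bases / estimated genome size.
# Data volumes (Gb) and the 810.6-Mb 17-mer estimate come from the text;
# the function and variable names are hypothetical.

GENOME_SIZE_MB = 810.6  # genome size predicted by the 17-mer analysis


def fold_coverage(data_gb, genome_mb=GENOME_SIZE_MB):
    """Return sequencing depth as a fold value (1 Gb = 1000 Mb)."""
    return data_gb * 1000.0 / genome_mb


datasets_gb = {"PacBio": 82.3, "Illumina": 98.5, "10x Genomics": 127.8}

for name, gb in datasets_gb.items():
    print(f"{name}: {fold_coverage(gb):.1f}-fold")
```

The same function can be applied to any other library once its data volume is known.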
To assemble the scaffolds into pseudochromosomes, a
high-throughput chromosome conformation capture (Hi-C)
library was constructed and sequenced, resulting in
148.1 Gb (182.7-fold) data. Using LACHESIS (Burton et al.,
2013) to cluster, order and orient them, the assembled
scaffolds were anchored into 11 clusters, with a total size
of 820.1 Mb of genome sequence. The assembled genome
size was very close (98.2% coverage) to the estimated genome
size (835.5 Mb) obtained from the flow cytometry
analysis. The number of groups corresponded to the haploid
chromosome number of C. salicifolius (2n = 22). The
lengths of the pseudochromosomes ranged from 51.2 to
97.7 Mb (Table 1). The N50 values of pseudochromosome
and contig were 96.3 and 2.3 Mb, respectively. The Hi-C
contact matrix based on the assembled genome is visual-
ized in Figure S3.
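For readers unfamiliar with the N50 statistic reported above, the sketch below shows one common way to compute it, together with the ratio behind the 98.2% figure. It is an illustration only: the toy lengths are invented, and the function name `n50` is an assumption rather than the tool used in this study.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


# Toy contig lengths (arbitrary units), not the real assembly.
toy_lengths = [6, 5, 4, 3, 2]
print("toy N50:", n50(toy_lengths))

# Anchored assembly size relative to the flow cytometry estimate (values from the text).
anchored_mb, estimated_mb = 820.1, 835.5
print(f"assembly / estimate: {100 * anchored_mb / estimated_mb:.1f}%")
```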
The short reads generated from Illumina sequencing
were aligned with our assembled genome. We found that
95.6% of the reads could be mapped back to the genome
and covered 99.9% of the genome. Single nucleotide poly-
morphism (SNP) calling showed that the heterozygosity
rate was 0.46%. The assembled genome was evaluated by
BUSCO (Benchmarking Universal Single Copy Orthologs;
Simao et al., 2015), and we found that the ‘complete’ per-
centage was 95.1% (Table S1). The GC content of the
C. salicifolius genome was 36.9%.
Based on the combined gene prediction strategy, con-
sidering evidence from de novo prediction, protein homol-
ogy and transcriptomic support, we predicted 36 651
protein-coding gene models with an average gene length
of 6593.9 bp and an average coding sequence length of 1069 bp. Of
the 36 651 genes, 34 119 (93.1%) were supported by either
the identification of homologs in other species or the
RNA-seq data. The gene density ranged from 0 to 110
genes per megabase across the chromosomes (Figure 1).
Although the density of protein-coding genes seemed
generally complementary to the repetitive elements across
the genome, we found long genes (> 20 kb) tended to be
distributed more in the repetitive regions (Figures 1 and
S4). Meanwhile, heterozygous SNPs were more likely to
appear in regions with low repeat sequence density (Fig-
ures 1 and S5).
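A per-megabase density of the kind described above can be obtained by counting features in fixed-size windows along each pseudochromosome. The following sketch assumes non-overlapping 1-Mb windows and 0-based gene start coordinates; both choices, and the function name `gene_density`, are assumptions made for illustration and may differ from the procedure actually used.

```python
import math


def gene_density(gene_starts, chrom_length, window=1_000_000):
    """Count genes per window along one chromosome.

    gene_starts: 0-based start coordinates, each smaller than chrom_length.
    Returns a list with one count per non-overlapping window.
    """
    n_windows = math.ceil(chrom_length / window)
    counts = [0] * n_windows
    for start in gene_starts:
        counts[start // window] += 1
    return counts


# Toy example: five genes on a 3-Mb chromosome (invented coordinates).
toy_starts = [0, 10, 999_999, 1_000_000, 2_500_000]
print(gene_density(toy_starts, 3_000_000))
```

The same counting scheme applies to repeats, long genes or heterozygous SNPs by swapping the coordinate list.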
The annotation revealed the proportion of repetitive
sequences in the C. salicifolius genome was 57.5%, com-
parable to L. chinense (61.6%) but much higher than
C. kanehirae (48.0%). Interspersed repeat sequences
accounted for 56.6% of the repetitive sequences (Table S2).
Similar to most sequenced plant genomes, long terminal
repeats (LTRs) were the most abundant type of inter-
spersed repeats, occupying the majority (30.0%) of the
repeat sequences, followed by DNA transposons at 9.7%
(Table 1). With regard to non-coding sequences, 174
microRNAs (miRNAs), 695 transfer RNAs (tRNAs), 254
small nuclear RNAs (snRNAs), 1052 small nucleolar RNAs
(snoRNAs) and 283 ribosomal RNAs (rRNAs) were pre-
dicted in the C. salicifolius genome.
Phylogeny and comparative genomic analysis
To infer the phylogenetic position of C. salicifolius, gen-
omes of an outgroup species Selaginella moellendorffii
and 15 other angiosperms, including Amborella tri-
chopoda, three monocots (Musa acuminata, Zea mays and
Oryza sativa), eight eudicots (Daucus carota, Mimulus gut-
tatus, Vitis vinifera, Prunus mume, Arabidopsis thaliana,
Populus trichocarpa, Aquilegia coerulea and Nelumbo
nucifera) and three magnoliids (L. chinense, P. americana
and C. kanehirae), were analyzed. Based on the 103 single-
copy orthologous genes, we constructed a phylogeny of
these 17 plant species (Figure 2a). The phylogenetic tree
shows C. salicifolius clustered with three other magnoliids.
In addition, the magnoliids are sister to the eudicots, rather
than sister to monocots or both monocots and eudicots.
We further used a coalescence-based approach to analyze
1420 gene trees to help reduce the impact of incomplete
lineage sorting. The coalescence-based phylogenetic
tree shows the same topology for magnoliids as the
concatenation tree (Figure S6). Therefore, we concluded that
the magnoliids are likely to be sister to eudicots rather
than sister to monocots or sister to the clade of eudicots
and monocots.
We further compared the gene numbers among the four
magnoliid plants. A total of 8896 gene families were
shared by L. chinense, C. salicifolius, P. americana and
C. kanehirae. This suggests that they may be core gene
Table 1 Summary of assembly, annotation and repeat sequences of the Chimonanthus salicifolius genome

Sequencing and assembly
  Subgroup            N50 size (Mb)   N90 size (Mb)   Longest (Mb)   Total size (Mb)
  Pseudochromosome    96.3            67.6            97.7           820.1
  Contig              2.3             0.3             11.9           820.0

Genome annotation
  Protein-coding gene
    Gene models   Gene size (bp)   Supported by expression   Supported by expression & homolog
    36 651        6593.9           90.1%                     93.0%
  Exon
    No. exons   Exons per gene
    169 342     4.6

Repetitive elements (%)
  LTR    LINEs & SINEs   DNA transposons   Total
  20.8   4.4             5.6               57.7
Figure 1. Landscape of the Chimonanthus salicifolius genome. Circos plot of the C. salicifolius genome assembly.
Circles from the outside inwards: (a) pseudochromosomes, (b) repeat density, (c) gene density, (d) long gene (> 20 kb) density and (e) heterozygous single nucleotide polymorphism (SNP) density. These density metrics were calculated with 1-Mb sliding windows. The syntenic genomic blocks (> 300 kb) are illustrated with gray lines.
families for these four magnoliids. We found 490 gene
families specific to C. salicifolius (Figure 2d), which is
higher than in the other three magnoliid genomes.
The numbers of Pfam protein families in the genomes of
the four magnoliids and 11 other plants (three monocots
and eight eudicots) were examined and compared (Figure 2e;
Table S3). In the magnoliids, genes carrying
Pfam domains such as PF14432 (DYW family of nucleic acid
deaminases) and PF01397 (terpene synthase, N-terminal domain) are
commonly expanded compared with monocots and eudicots,
whereas genes carrying PF01565 (FAD binding domain), PF00646
(F-box domain) and PF00931 (NB-ARC domain) are commonly
reduced. In C. salicifolius, the specifically expanded genes carry
domains such as PF13041 (PPR repeat family), PF00313 (‘Cold-shock’
DNA-binding domain) and PF03105 (SPX domain),
whereas the specifically reduced domains include PF13410
(glutathione S-transferase, C-terminal domain) and
PF00891 (O-methyltransferase).
Ancient whole-genome duplications in Chimonanthus
Self-alignment of the C. salicifolius genome showed clear
syntenic evidence for ancient WGD events (Figure 1). To
infer ancient WGDs in C. salicifolius, we used age distribu-
tion of duplicate genes followed by a mixture model imple-
mented in the mixtools R package to identify significant
peaks of gene duplication consistent with WGDs. The mix-
ture model identified two peaks at about Ks 0.53 and 0.86
(Figures 3a and S7). The recent peak of duplication in Chi-
monanthus has a median value of Ks at about 0.53,
younger than the ortholog divergences of Chimonanthus
and Cinnamomum (Ks 0.64), and the divergences of Chi-
monanthus and Liriodendron (Ks 0.76; Figure 3b). The
older peak of duplication in Chimonanthus has a median
Ks of about 0.86. This suggests this duplication most likely
occurred before the divergence of Chimonanthus and other
magnoliids.
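The peak-finding step above relies on the mixtools R package. As a rough illustration of what a two-component mixture fit does, the sketch below runs a plain expectation-maximization (EM) loop for a one-dimensional Gaussian mixture in pure Python. It is not the authors' pipeline: the Ks values are synthetic, drawn around the two reported peaks (0.53 and 0.86) with an assumed spread of 0.05, and every function name is hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, iterations=200):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns a list of (mean, sd, weight) tuples sorted by mean.
    """
    n = len(values)
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    means = [min(values), max(values)]  # crude but deterministic start
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weight, mean and sd of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / n
    return sorted(zip(means, sds, weights))


# Synthetic Ks values standing in for real paralog pairs (assumption).
random.seed(42)
synthetic_ks = [random.gauss(0.53, 0.05) for _ in range(500)]
synthetic_ks += [random.gauss(0.86, 0.05) for _ in range(500)]

peaks = fit_two_component_mixture(synthetic_ks)
for mean, sd, weight in peaks:
    print(f"peak at Ks = {mean:.2f} (sd {sd:.2f}, weight {weight:.2f})")
```

On real data the number of components would itself need to be chosen and tested, which this sketch does not attempt.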
To confirm that C. salicifolius has undergone two rounds of
ancient WGD, we compared the syntenic depth ratio
between the C. salicifolius genome and the genomes of A. trichopoda
and V. vinifera. We observed an overall four to
one syntenic depth ratio between C. salicifolius and A. tri-
chopoda (Figure 3c), that is, a single A. trichopoda region
could be aligned to four genomic regions in the C. salici-
folius genome (Figure 3d). Given the A. trichopoda gen-
ome had not experienced any ancient WGD after the
ancestral angiosperm WGD (Amborella Genome Project,
2013), the overall four to one syntenic depth ratio suggests
that C. salicifolius experienced two rounds of ancient
WGD. We also compared C. salicifolius with the V. vinifera
genome, which was most recently duplicated by the eudi-
cot hexaploidy event (Jaillon et al., 2007). Consistent with
our hypothesis, we found a four to three syntenic depth
ratio between Chimonanthus and Vitis (Figure S8). For
comparison to Chimonanthus, we also analyzed the syn-
tenic depth ratios between C. kanehirae and L. chinense to
Amborella and Vitis (Figures S7 and S8). In Liriodendron,
we found a two to one syntenic depth ratio to Amborella,
and two to three ratio to Vitis. In Cinnamomum, we recov-
ered a four to one ratio between Cinnamomum and
Amborella, and a four to three ratio to Vitis. Previous syn-
teny analyses inferred one round of ancient WGD in L. chi-
nense, and two rounds of ancient WGD in C. kanehirae
(Chaw et al., 2019; Chen et al., 2019). Our results are con-
sistent with these previous studies, and indicate two
rounds of ancient WGD occurred in Chimonanthus.
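The reasoning behind these ratios can be written down compactly: each duplication multiplies the expected number of syntenic copies by two, and a hexaploidy multiplies it by three. The sketch below encodes only that bookkeeping, counting events since the split from the common ancestor and ignoring later gene loss; the function names are illustrative assumptions.

```python
from math import prod


def syntenic_depth(multipliers):
    """Expected copies of an ancestral region after the listed polyploidy events.

    multipliers: one entry per event, 2 for a WGD and 3 for a hexaploidy.
    An empty list (no event) gives a depth of 1.
    """
    return prod(multipliers)


def depth_ratio(query_events, reference_events):
    """Expected syntenic depth ratio, query : reference."""
    return syntenic_depth(query_events), syntenic_depth(reference_events)


# Event lists as described in the text.
print("Chimonanthus : Amborella =", depth_ratio([2, 2], []))
print("Chimonanthus : Vitis     =", depth_ratio([2, 2], [3]))
print("Liriodendron : Amborella =", depth_ratio([2], []))
print("Liriodendron : Vitis     =", depth_ratio([2], [3]))
```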
To corroborate the inferences and phylogenetic place-
ments of these ancient WGD, we also used the Multi-tAxon
Paleopolyploidy Search (MAPS) tool (Li et al., 2015, 2018;
Li and Barker, 2020). The MAPS algorithm filters collections
of gene trees for subtrees consistent with given
[Figure 2 graphic. (a) Species tree of 17 taxa with clade labels Eudicots, Magnoliidae and Monocots and bootstrap values at the nodes; (b) percentage of genes per length class (< 5 kb, 5-10 kb, 10-20 kb, > 20 kb); (c) gene, intron and CDS length (bp); (d) Venn diagram of gene families labeled C. kanehirae (11,156), C. salicifolius (11,630), L. chinense (11,153) and P. americana (11,315), with 8,896 shared by all four; (e) Pfam labels PF14432, PF01535, PF13041, PF00931 and PF00646.]
Figure 2. Phylogenetics and gene families of Chimonanthus salicifolius.
(a) The phylogenetic tree including 17 species was constructed based on
103 single-copy genes, with Selaginella moellendorffii as an outgroup.
(b) The percentage of genes with different length ranges is illustrated for
each species on the tree. The golden bar indicates the percentage of long
genes (> 20 kb).
(c) The average lengths of the genes, introns and coding regions of each
species are shown. The lengths of the long genes (dark blue) are
mainly attributable to the length of introns (light blue) rather than that of
the CDSs (gray).
(d) Gene families in the genomes of Persea americana, Cinnamomum kane-
hirae, C. salicifolius and Liriodendron chinense.
(e) Scatter plot showing the number of genes with specific Pfam domains in
the C. salicifolius genome compared with those of monocot and eudicot
species.
relationships at each node in the species tree. Based on
these filtered subtrees, MAPS reports the number of gene
duplications shared by descendant taxa at each node. It
then compares the observed results with null and positive
simulations of WGDs. For our MAPS analysis, we selected
six species including the four magnoliids, Chimonanthus,
Cinnamomum, Persea, and Liriodendron. Oryza and
Amborella were used as outgroups. We observed a burst
of shared gene duplications at nodes N2, N3 and N4 (Fig-
ure 3e; Table S4). This signal had significantly more
shared gene duplications than expected compared with
the null simulations (P < 0.05). To further characterize
these significant gene bursts, we simulated an additional
set of gene trees with a WGD at the phylogenetic location
of the duplication bursts. At the node representing the
most recent common ancestor of Laurales and Magnoliales
(N3), we found this episodic burst of shared gene duplica-
tion was statistically consistent with our positive simula-
tions of WGD. These MAPS inferences of WGDs
corroborated the results of our Ks plots and ortholog diver-
gence analyses, and provided evidence consistent with an
ancient WGD shared among Laurales and Magnoliales.
Based on the substitution rate of 3.02E-9 for the magnoliids
(Cui et al., 2006), we estimated that these two ancient polyploidy
events in the Chimonanthus genome date back to
approximately 87 million years ago (Mya) and 142 Mya.
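These dates are consistent with the standard molecular clock relation T = Ks / (2r), where r is the substitution rate. The sketch below applies that relation to the two Ks peaks; treating the quoted rate as substitutions per synonymous site per year, and the use of this exact formula, are assumptions on our part rather than details stated in the text.

```python
SUBSTITUTION_RATE = 3.02e-9  # rate quoted in the text (Cui et al., 2006); units assumed per site per year


def ks_to_mya(ks, rate=SUBSTITUTION_RATE):
    """Convert a Ks value to an age in million years, using T = Ks / (2 * rate)."""
    return ks / (2.0 * rate) / 1e6


for label, ks in [("recent WGD", 0.53), ("older WGD", 0.86)]:
    print(f"{label}: Ks = {ks:.2f} -> about {ks_to_mya(ks):.1f} Mya")
```

The factor of two reflects that both duplicated copies accumulate substitutions independently after the event.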
Large number of long genes in magnoliids
An interesting phenomenon was observed by analyzing
the gene length of C. salicifolius: long genes (> 20 kb) in
C. salicifolius (2737) were much greater in number than
those in most monocots and eudicots, such as rice (45),
maize (910), Arabidopsis (4) and poplar (66). We investi-
gated the lengths of all the genes in other plant genomes
and found that long genes were also common in magnoli-
ids and A. trichopoda but not in S. moellendorffii, suggest-
ing that long genes might be a specific genomic
characteristic of magnoliids and some angiosperms (Fig-
ure 2b). Further measuring the lengths of the coding
regions and introns revealed that the average coding
region lengths in all 17 plant genomes are similar (ranging
from 825 to 1396 bp), whereas the lengths of introns vary
greatly (ranging from 556 to 10 202 bp; Figure 2c). The
genes in the four magnoliid genomes have much longer
introns (7144 bp on average) than those in the three mono-
cots (2435 bp on average) and the eight eudicots (2933 bp
on average), suggesting that the long genes are due to the
extension of the intron length rather than coding regions.
According to the gene length, we divided genes in
C. salicifolius into different groups, < 5 kb (short genes), 5–10, 10–20 and > 20 kb (long genes). We further character-
ized LTR content in genes of different length ranges, and
found that a much higher percentage of LTRs, including
Gypsy and Copia, existed in the long genes group (Fig-
ure 4a). When examining paralogous genes in the C. salici-
folius genome, we found that many long genes were
paralogs of short and moderate-length genes. Calculation
of the Ka/Ks values for the paralogous genes with different
lengths revealed a significantly higher (one-sided Wilcoxon
test) Ka/Ks ratio for short–long comparison than that of
short–moderate (P-value: 4.37E-07) and moderate–long
(P-value: 2.98E-10) comparisons. This may indicate that in
addition to dramatic changes in the intron regions, more
non-synonymous variants in the coding regions exist
between long and short genes (Figure 4b).
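The length classes used in this section can be assigned with a few comparisons. The sketch below is illustrative only: the gene lengths are invented, the handling of genes lying exactly on a boundary (5, 10 or 20 kb) is an assumption because the text does not specify it, and the function name is hypothetical.

```python
from collections import Counter


def length_class(gene_length_bp):
    """Assign a gene to one of the four length classes used in the text.

    Boundary handling is an assumption: a gene of exactly 20 kb is placed
    in the 10-20 kb class, so "long" means strictly longer than 20 kb.
    """
    if gene_length_bp < 5_000:
        return "<5 kb"
    if gene_length_bp < 10_000:
        return "5-10 kb"
    if gene_length_bp <= 20_000:
        return "10-20 kb"
    return ">20 kb"


# Invented gene lengths (bp), for illustration only.
toy_gene_lengths = [1_200, 4_999, 5_000, 8_400, 15_000, 20_000, 20_001, 65_300]
class_counts = Counter(length_class(n) for n in toy_gene_lengths)
print(class_counts)
```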
The expression levels of all the genes were analyzed
using transcriptomic data. The average expression level of
[Figure 3 graphic. Ks density and gene-duplication histograms with labels 0.46, 0.53 and 0.86 and ortholog divergence labels 0.05, 0.64 and 0.76; syntenic depth ratio of C. salicifolius (Csa) vs. A. trichopoda (Atr), 4 : 1, plotted as percentage of genome against number of blocks per gene; time-scale labels WGD 142 MYA (0.86), WGD 87 MYA (0.53), WGD 76 MYA (0.46), 106 MYA (0.64), 125 MYA (0.76), 8 MYA (0.05), 173-199 MYA# and 148-166 MYA#; percentage of subtrees from MAPS analysis at nodes N1 to N4.]
Figure 3. Ancient whole-genome duplications (WGDs) and ortholog diver-
gences in four magnoliid plants.
(a) The ortholog divergence in four magnoliid plants.
(b) The frequency distributions of synonymous substitutions (Ks) for ortho-
logs between four magnoliid plants [Persea americana (Pam), Cinnamo-
mum kanehirae (Cka), Chimonanthus salicifolius (Csa) and Liriodendron
chinense (Lch)] and Amborella trichopoda (Atr).
(c) The syntenic depth ratio of C. salicifolius compared with A. trichopoda.
(d) Genomic syntenic blocks between C. salicifolius and A. trichopoda are
shown, with blue wedges as a case highlighting a typical ancestral region
of Amborella that can be tracked to four genomic regions of C. salicifolius.
(e) A schematic diagram summarizing WGD events for the four currently
genome-sequenced magnoliid species. The estimated times of polyploidiza-
tion based on Ks are shown with brown ovals. The divergence times
obtained from TimeTree are shown with the symbol ‘#’. The percentage of
subtrees by MAPS analysis, which contained a gene duplication (red line)
shared by descendant species for each node, is shown on the right. The
symbol ‘**’ indicates that the observed value of percentage is significantly
higher than the null simulations.
the long genes was similar to that of the other genes (Fig-
ure 4c). In addition, 20 long genes were randomly selected
and polymerase chain reaction (PCR) amplified from cDNA
of several C. salicifolius tissues. Fourteen genes were suc-
cessfully amplified in leaf, seed, stem and flower (Fig-
ure S9), and their gene sequences were validated by
Sanger sequencing (Table S5). These results strongly indi-
cated that most predicted long genes in the C. salicifolius
genome are functional. GO enrichment analysis of the long
genes revealed that long genes played important roles in
cell components, plant growth and development, and
energy metabolism, as well as in nucleotide binding, cat-
alytic activity and hydrolase activity (Figure 4d; Table S6).
The gene encoding the tryptophan synthase alpha chain
in A. thaliana (AT3G54640) is reported to be involved in
multiple biological processes, including auxin and tryptophan
biosynthesis, defense response to bacteria
and gravitropism (Zhang et al., 2008). The introns of
the orthologous gene in C. salicifolius (Cs05g01228) are
much longer than those in O. sativa, A. thaliana and
even A. trichopoda. LTRs were detected in the introns
of Cs05g01228 but not in any of the orthologous genes in
O. sativa, A. thaliana and A. trichopoda (Figure 4e), indicating
that LTR insertion might be involved in the intron
expansion of genes in C. salicifolius.
Transcriptomic profiles of Chimonanthus salicifolius
tissues and flowers during development
To obtain a global transcriptomic map of different tissues
and flower development in C. salicifolius, we selected tis-
sues including root, stem, seed, pericarp, leaf and flower.
Three stages of flowers (bud, blooming and withering) as
well as three stages of leaves (bud, young and senescent)
were collected from wild C. salicifolius individuals
growing in Kaihua County in Zhejiang Province for RNA-seq
analysis (Figure S10). Each tissue was collected from three
individuals as three biological replicates. RNA-seq was
performed on these 10 tissues, and comprehensive expres-
sion profiles of C. salicifolius genes were obtained
(Table S7). According to a principal component analysis
(PCA), different tissues could be separated quite well. The
individuals from different stages of the same tissues
grouped together (Figure S11). In young tissues, such as
leaf buds, flower buds and developing roots, higher gene
expression was observed (Figure S12).
In Eastern China, C. salicifolius flowers from the end of
autumn to the beginning of winter; few plants can flower
in these low-temperature conditions. To dissect this phenomenon,
RNA from flowers at the bud, blooming and withering
stages was collected, and RNA-seq was performed.
A MapMan analysis showed that many types of transcrip-
tion factors were significantly up- or downregulated (Fig-
ure 5). When comparing the transcriptomes of flowers
between the bud and blooming stages, most genes in the
blooming stage were downregulated (Figure 5a). The GO
terms of the downregulated genes were enriched in terms
of catalytic activity, response to stress and transcription
regulator activity (Figure S13a). Notably, we found that the
expression of WRKY genes was significantly upregulated
in the blooming stage, which might reflect a response
to the rapid decrease in temperature at this stage (Figure S13b).
Numerous studies have revealed that WRKY
genes are involved in cold or chilling tolerance in plants
(Lafuente et al., 2017; Li et al., 2017; Luo et al., 2017). In
addition, two gene families containing the domains PF00313
(‘cold-shock’ DNA-binding domain) and PF04180 (low-temperature
viability protein) were significantly expanded in
the genome of C. salicifolius (Figure S14a), and might contribute
to the cold tolerance of C. salicifolius. One gene
containing the PF00313 domain, Cs11g02011, showed high
expression in low-temperature conditions, and might be
related to the cold tolerance of C. salicifolius (Figure S14b).
The expression of genes related to flavonoid metabolism
and phenylpropanoid synthesis increased markedly in
[Figure 4 graphic. (a) Percentage of Gypsy, Copia, unknown and all LTRs by gene length (< 5 kb, 5-10 kb, 10-20 kb, > 20 kb); (b) Ka/Ks between gene-length groups (< 5 kb, 5-20 kb, > 20 kb), with labels 0.32 (185), 0.22 (865) and 0.19 (545); (c) log2(FPKM + 1) for genes of 0-20 kb and > 20 kb; (d) number of genes per GO term; (e) exon, intron and LTR structure of Cs05g01228, AT3G54640.1, AT4G02610.1, LOC_Os03g58320.1, LOC_Os03g58300.1, LOC_Os03g58290.1, LOC_Os07g08430.1, LOC_Os03g58260.2, AmTr v1.0 scaffold2.531 and AmTr v1.0 scaffold2.532; scale bar 1 kb.]
Figure 4. Characterization of long genes.
(a) Percentage of long terminal repeats (LTRs) within genes of different
lengths.
(b) Values of Ka/Ks between paralogous genes in the Chimonanthus salici-
folius genome. The paralogous genes are categorized into three groups
based on their lengths, and the median values of Ka/Ks between different
groups and within each group are shown. The values in the bracket are the
number of gene pairs.
(c) The relationship between the length of genes and their expression
levels. The expression level for one gene is the average FPKM values from
the 30 RNA-seq samples generated in this study.
(d) Top 10 enriched GO terms for long genes (> 20 kb).
(e) Phylogenetic tree and gene structure of genes with PF00290 (tryptophan
synthase alpha chain) domain in four species, Amborella trichopoda, Ara-
bidopsis thaliana, Oryza sativa and C. salicifolius. The gene of C. salicifolius
is indicated by a red circle. The bootstrap values above 70 are shown on the
nodes.
the withering stage, suggesting that secondary metabolite
accumulation in seeds began at this stage (Figures 5b and
S15).
Investigation of genes encoding enzymes of the flavonoid
biosynthetic pathways
Genes involved in the flavonoid biosynthetic pathways
have been identified and characterized in plants (Saito
et al., 2013), including phenylalanine ammonia-lyase (PAL),
cinnamate 4-hydroxylase (C4H), 4-coumaroyl:CoA ligase
(4CL), chalcone synthase (CHS), chalcone isomerase (CHI),
flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H),
flavonol synthase (FLS), UDP glucose:flavonoid 3-O-glycosyltransferase
(UFGT), flavonol 3-O-glucoside rhamnosyltransferase (GRT),
dihydroflavonol 4-reductase (DFR)
and anthocyanidin synthase (ANS; Figure 6a). UFGT and
GRT are the genes that directly affect the content of final
flavonoid products. Homologs of UFGT and GRT, which
are genes with the PF00201 (UDP-glucoronosyl and UDP-glucosyl
transferase) domain, were identified in the C. salicifolius
genome. A phylogenetic tree was constructed from
Arabidopsis UDP-glucosyltransferase multigene family
genes and C. salicifolius genes with the PF00201 domain (Figure S16).
The genes clustered with UGT79 and UGT91
were UFGT and GRT genes, respectively. In total, 82 homologs
of the genes within the flavonoid biosynthetic pathway
were identified in the C. salicifolius genome (Figure S17;
Table S8).
Gene expression analysis showed that the genes in the
flavonoid biosynthetic pathways had the highest expression
levels in the leaf buds (Figure 6b), suggesting that more
flavonoids might be produced at this stage. It is worth
noting that one CHS gene (Cs04g01186) and two FLS genes
(Cs07g00635 and Cs07g00727) had extremely high expression
levels (FPKM > 3000) in the leaf buds. The products of these
genes might be functional bioactivators related to the
therapeutic effects of C. salicifolius.
The leaves of C. salicifolius have been used as traditional
medicine in Eastern China for hundreds of years (Ma
et al., 2017). A previous study showed that six flavonoids
(kaempferol, kaempferol-3-O-glucoside,
kaempferol-3-O-rutinoside, quercetin, isoquercetin and rutin)
were abundant in the leaves of Calycanthus plants (Yang
et al., 2018). Flavonoids in different tissues of C. salicifolius
were detected by high-performance liquid chromatography
(HPLC), and the contents of these six flavonoids were
determined (Table S9). In general, the contents of these
flavonoids were much higher in leaves than in other tissues,
except flower buds. Flower buds had the highest content of
the upstream flavonoids (kaempferol and quercetin), while
the leaf buds had the highest content of the downstream
flavonoids (kaempferol-3-O-rutinoside and rutin).
In total, 34 UFGT, GRT and FLS homologs were identified
in the C. salicifolius genome. However, it was unclear
which of these genes were involved in the biosynthesis of
the six flavonoids in C. salicifolius. The correlation
between flavonoid contents and the expression values of
the homologous genes was estimated (Table S10). Two FLS
homologs (Cs07g00635 and Cs07g00727), four GRT homologs
(Cs01g03503, Cs01g03506, Cs03g03359 and Cs04g02810)
and two UFGT homologs (Cs03g02624 and Cs03g02680)
showed significant positive correlation (P < 0.05) with rutin
content (Figure 6c), indicating that these eight genes might
be involved in the synthesis of rutin in C. salicifolius. In
addition, the two FLS homologs and three GRT homologs
showed strong positive correlation with
kaempferol-3-O-rutinoside content. This is in line with the
fact that the GRT-encoded flavonol 3-O-glucoside
rhamnosyltransferase synthesizes both rutin and
kaempferol-3-O-rutinoside.
To investigate the upstream genes that were involved in
biosynthesis of the six flavonoids, correlation of the
expression of the eight rutin-related genes and other
homologs was analyzed (Figure 6d). The CHS gene
Figure 5. Transcriptomic profiles of Chimonanthus salicifolius flowers during different developmental stages.
An overview of the dynamic expression changes in diverse pathways for blooming versus bud (a) and withering versus blooming (b). Color intensity corresponds to the expression fold change at the log2 scale (red: upregulated, blue: downregulated).
(Cs04g01186) with extremely high expression
showed significant correlation (r > 0.8, P < 0.05) with the
homologous genes involved in rutin synthesis,
indicating that this CHS gene might be an important
upstream gene in the rutin biosynthetic pathway. Therefore,
we concluded that these nine genes might be
involved in rutin synthesis. Furthermore, we found that
the expression of genes involved in the flavonoid
biosynthesis pathways was well correlated. For example, the
genes acting upstream of chalcone, including PAL, 4CL and
C4H, showed significantly positive coexpression patterns.
Similarly, the genes acting upstream of dihydroquercetin,
including CHS, CHI and F3H, were also
positively coexpressed. In addition, genes expressed after
the divergence of the two flavonoid biosynthetic pathways,
including FLS, DFR and ANS, showed negatively correlated
expression, suggesting that along with the accumulation
of flavonoids, which was affected by the expression of
FLS, the synthesis of anthocyanins (affected by DFR and
ANS) was suppressed.
DISCUSSION
Chimonanthus salicifolius is likely to be sister to eudicots
The phylogenetic position of the magnoliids has differed
among the four versions of the APG system (The Angiosperm
Phylogeny Group, 1998, 2003, 2009, 2016). Magnoliids are
sister to both monocots and eudicots in APG I, sister to
monocots in APG II, and sister to the clade containing both
monocots and eudicots in APG III and IV. Although several
magnoliid genomes have been published, and phylogenomic
analyses of magnoliids have been carried out with
the L. chinense, C. kanehirae, P. americana and P. nigrum
genomes, the phylogenetic placement of
magnoliids has remained inconclusive. Based on a phylogeny
constructed from 211 strictly single-copy genes in 13 seed
plants, Chaw et al. (2019) found that C. kanehirae
(representing magnoliids) was sister to the eudicots with strong
bootstrap support (bootstrap value of 100). Based on a
phylogenetic tree constructed from 502 low-copy
orthogroups in 11 plant species, Chen et al. (2019) found
that L. chinense (representing magnoliids) was sister to
monocots and eudicots with weak bootstrap support
(bootstrap value of 50). Rendon-Anaya et al. (2019) suggested
P. americana as sister to the enormous monocot and
eudicot lineages according to a phylogenetic tree
constructed from 176 single-copy genes in 19 angiosperms.
In this study, phylogenetic analyses based on both
concatenated alignments and coalescent-based approaches
revealed that magnoliids had a closer relationship to
eudicots than to monocots, suggesting that C. salicifolius is
likely to be sister to the eudicots. This result disagrees with the
[Figure 6 graphic, panels (a)-(d): pathway diagram from phenylalanine to rutin, kaempferol 3-O-rutinoside and anthocyanins; expression levels (FPKM) of PAL, 4CL, C4H, CHS, CHI, F3H, F3′H, FLS, DFR, ANS, UGT91 (GRT) and UGT79 (UFGT) genes in stem, pericarp, seed, root, leaf (bud, young, senescent) and flower (bud, blooming, withering), with Cs04g01186, Cs07g00635 and Cs07g00727 labeled; correlation with rutin content; correlation coefficient matrix.]
Figure 6. Genes involved in the flavonoid biosynthetic pathway.
(a) The flavonoid biosynthesis pathway in Chimonanthus salicifolius. The flavonoids that were mainly synthesized in leaf and flower are indicated by green and red boxes, respectively.
(b) The expression levels of different paralogous genes encoding different enzymes in the flavonoid biosynthesis pathway.
(c) The correlation between flavonoid biosynthetic genes and rutin content in different tissues. P < 0.05 is indicated by a red circle.
(d) The coexpression coefficient matrix for genes in the flavonoid biosynthesis pathway. The yellow arrow indicates the CHS gene (Cs04g01186) strongly related to the genes that are involved in rutin synthesis.
resolution of APG III and IV, which placed magnoliids as
sister to a clade containing both monocots and eudicots.
However, it is in line with a previous analysis of 59
low-copy nuclear genes in 26 Mesangiospermae (Zeng et al.,
2014), and a phylogeny constructed from orthologous
low-copy nuclear genes in 115 plant species (Zhang et al.,
2020). In addition, it is also supported by the
phylogenomic framework constructed from 410 single-copy nuclear
gene families extracted from genome and transcriptome
data from 1153 species (One Thousand Plant
Transcriptomes Initiative, 2019).
Ancient whole-genome duplications in the Chimonanthus
genome
In this study, we inferred and placed two rounds of ancient
WGD in the genome of C. salicifolius by incorporating Ks
plots and ortholog divergences, synteny analyses, and the
MAPS phylogenomic approach. We show evidence for an
ancient polyploidy event found only in the Chimonanthus
genome, and not shared with Cinnamomum and Liriodendron.
This WGD was not inferred in the 1KP project (One
Thousand Plant Transcriptomes Initiative, 2019). This is
likely due to difficulties in detecting two highly overlapping
WGD peaks with mixture models from duplicate gene age
distributions. Based on the similarity of the Ks distributions of
Idiospermum australiense and Calycanthus floridus from
the 1KP study, this Chimonanthus WGD is likely shared by
the Calycanthaceae. We also show evidence consistent
with an ancient WGD shared among Laurales and
Magnoliales. A previous study has shown that the ancient WGD
inferred in the Liriodendron genome likely predated the
divergence of Magnoliaceae and Lauraceae (Chen et al.,
2019). Based on the genome of Cinnamomum and the
transcriptomes of 17 Laurales and Magnoliales from the 1KP
project, previous studies inferred an ancient polyploidy
event shared by Lauraceae and another round of ancient
WGD in the ancestry of Laurales and Magnoliales (Chaw
et al., 2019; One Thousand Plant Transcriptomes Initiative,
2019). Consistent with these studies, our MAPS phylogenomic
approach provided further evidence for the placement of
this ancient WGD shared by Laurales and Magnoliales.
Overall, our ancient WGD analyses are largely
consistent with previous findings, and provide clear
evidence for two rounds of ancient WGDs in Chimonanthus.
The magnoliid genomes contain a large number of long
genes
In total, 2737 long genes were identified in the
C. salicifolius genome, many more than in monocot and eudicot
genomes. Long genes with long introns (> 10 kb) have also
been detected in animals, such as humans, Rattus
norvegicus, Danio rerio and Drosophila. In the human genome,
there are more than 8000 introns longer than 24 kb and
more than 1200 super-long introns (> 100 kb)
(Shepard et al., 2009). Previous research
on the long introns of Drosophila revealed that some of
them undergo recursive splicing (Hatton
et al., 1998; Conklin et al., 2005; Sibley et al., 2015).
Mutations at recursive splicing sites have been reported to
result in many human diseases (Chabot and Shkreta, 2016).
Recursive splicing is difficult to
capture, and requires nascent RNA sequencing, which can
profile pre-mRNA transcripts shortly after they are
transcribed (Pai et al., 2018). With data from purpose-designed
transcriptomic experiments, more characteristics of the long
genes (such as recursive splicing and other mechanisms)
in the genomes of magnoliids could be explored in future.
The Chimonanthus salicifolius genome benefits functional
genomics research and molecular breeding of
Chimonanthus salicifolius
The genus Chimonanthus is widely grown in Asia, America
and Europe. Chimonanthus salicifolius is distributed
mainly in central and eastern China. It is collected and
used as a traditional medicine. The plants of this species
show vigorous growth, tolerance to several abiotic and
biotic stresses, and flowering at low temperatures. Despite
its importance, C. salicifolius remains underutilized,
and basic research on the species is lacking.
With this high-quality reference genome, genome-wide
association studies (GWAS) and genome-wide
linkage mapping could be performed to quickly and
comprehensively identify quantitative trait loci (QTLs) that are
related to the yield (yield of leaf buds) and quality (content
of flavones and other bioactive secondary metabolites) of
C. salicifolius. Using gene annotation and gene expression
information, candidate genes in the QTL regions could be
identified. Genome-editing and genetic complementation
experiments, which will also benefit from the gene sequences
provided by this genome, could be carried out to validate the
candidate genes. These genes can be further utilized in
molecular breeding for high-yield and superior-quality
C. salicifolius cultivars.
Thus, an accurate reference genome of C. salicifolius
will provide a platform for elucidating the genomic
evolution of the Chimonanthus genus and understanding the
genes responsible for biosynthesis of the various
flavonoids made in C. salicifolius, as well as laying a foundation
for the molecular breeding of C. salicifolius.
EXPERIMENTAL PROCEDURES
Plant materials, genomic DNA extraction and sequencing
The individual C. salicifolius that was used for genome sequencing was originally collected from Liandu District (28°27′53″N, 119°55′31″E), Lishui City, Zhejiang Province in Eastern China, and preserved in the Lishui Institute of Agriculture and Forestry Sciences. The RNA samples were collected from a wild population
of C. salicifolius in the natural environment at Kaihua County (29°14′26″N, 118°27′57″E), Zhejiang Province, Eastern China.
Genomic DNA was extracted from young leaves of C. salicifolius plants using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the user manual. Further treatment and preparation of the genomic DNA for Illumina sequencing followed the description in Wei et al. (2016). PacBio SMRTbell libraries (20 kb inserts) were prepared with a Template Prep Kit (Pacific Biosciences, Menlo Park, CA, USA), and 12 SMRT cells were run on the PacBio Sequel system with P6-C4 chemistry (Chin et al., 2013).
RNA extraction and sequencing
Tissues of roots, stems, leaf buds and seeds, as well as flowers and leaves at three developmental stages, were collected from three individuals, and total RNA was extracted from each sample using an RNeasy Plant Mini Kit (Qiagen) according to the user manual. The cDNA was synthesized from 20 μg total RNA using ReverTra Ace (TOYOBO, Osaka, Japan) with oligo(dT) primer following the manufacturer’s protocol. High-throughput sequencing was then performed on the Illumina HiSeq X Ten platform.
Genome size estimation
Flow cytometry was used to determine the nuclear DNA content of C. salicifolius as described by Doležel et al. (2007). Samples were prepared by homogenizing young leaves of C. salicifolius and O. sativa ssp. japonica cv. Nipponbare (as an internal standard, 0.91 pg/2C; Ammiraju et al., 2006) on ice in Galbraith’s buffer (complemented with 5 mM sodium metabisulfite and 5 μl β-mercaptoethanol) with 50 μg ml⁻¹ propidium iodide, and then analyzed on a MoFlo XDP Cell Sorter (excitation 488 nm, emission 620 nm; Beckman Coulter, Hialeah, FL, USA) after filtration. The data were further analyzed with FlowJo_V10.4.0 software. The nuclear DNA content of C. salicifolius was estimated as follows, with 1 pg of DNA assumed to be equivalent to 9.78 × 10⁸ bp: Sample 1C value = Reference 1C value × sample 2C mean peak position/reference 2C mean peak position. Genome size estimation based on Illumina short reads was conducted via a 17-bp k-mer frequency analysis with ‘kmerfreq’ as implemented in SOAPdenovo2 (Luo et al., 2012).
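The flow-cytometry calculation above can be sketched in a few lines of Python. This is only an illustration: the rice standard (0.91 pg/2C) and the conversion factor (1 pg = 9.78 × 10⁸ bp) come from the text, whereas the two peak positions and the function names are hypothetical and are not measurements or code from this study.

```python
# Minimal sketch of the flow-cytometry genome size calculation described above.
# The peak positions are hypothetical example values, not data from this study.

BP_PER_PG = 9.78e8   # 1 pg of DNA assumed equivalent to 9.78 x 10^8 bp (from the text)
RICE_2C_PG = 0.91    # O. sativa cv. Nipponbare internal standard, pg per 2C (from the text)


def sample_1c_pg(reference_1c_pg: float, sample_peak: float, reference_peak: float) -> float:
    """Sample 1C = reference 1C x (sample 2C peak position / reference 2C peak position)."""
    if reference_peak <= 0:
        raise ValueError("reference peak position must be positive")
    return reference_1c_pg * sample_peak / reference_peak


def pg_to_mb(pg: float) -> float:
    """Convert a DNA mass in picograms to megabases."""
    return pg * BP_PER_PG / 1e6


# Hypothetical mean 2C peak positions (arbitrary fluorescence units).
reference_peak = 100.0
sample_peak = 190.0

one_c = sample_1c_pg(RICE_2C_PG / 2, sample_peak, reference_peak)
print(f"1C = {one_c:.4f} pg, about {pg_to_mb(one_c):.1f} Mb")
```

Because both peaks are measured in the same run, only their ratio enters the estimate, so the result does not depend on the fluorescence units.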
De novo assembly and genome evaluation
De novo assembly of C. salicifolius was performed using Falcon v1.87 (Chin et al., 2016) software. After the process of base error correction, overlap graphs were built, and consensus contigs were constructed based on raw PacBio long reads. Contig sequences were aligned against each other to remove redundant sequences with more than 85% similarity and overlap. The Illumina data were aligned with the assembly contigs by bwa (Li and Durbin, 2009), and SNP and indel errors were corrected using Pilon v1.22 (Walker et al., 2014).
The contigs were scaffolded by FragScaff v140324.1 (Adey et al., 2014) using 10× Genomics data. Based on Hi-C data, scaffolds were anchored to 11 pseudomolecules using LACHESIS software (Burton et al., 2013). The completeness of the assembled genome was evaluated by BUSCO v3 using the ‘embryophyta_odb9’ database (Simao et al., 2015).
Repeat and gene annotation
We constructed a C. salicifolius genome repeat library using RepeatModeler v1.0.11 with the default parameters (Chen, 2004). The constructed C. salicifolius repeat library was further used to run RepeatMasker v4.0.7 (Chen, 2004) for whole-genome repeat annotation.
A combination of ab initio gene prediction, protein homolog evidence and transcriptomic evidence was used for the prediction of protein-coding genes. AUGUSTUS v3.0.3 (Stanke and Waack, 2003), SNAP v5.0 (Leskovec and Sosic, 2016) and GeneMark-ET v4.212 (Lomsadze et al., 2014) were used in ab initio gene prediction. The protein sequences of Arabidopsis were aligned to the assembled C. salicifolius genome by Exonerate (Slater and Birney, 2005) to obtain evidence for gene structure. The open reading frames (ORFs) in the transcripts from the RNA-seq data were predicted by PASA v2.0.1 (Haas et al., 2003). Finally, all the predictions were combined into consensus gene models using EVM (Haas et al., 2008).
The predicted C. salicifolius gene models were aligned against the Swiss-Prot and NR protein databases for functional annotation (BLASTP, E-value ≤ 1E-5). InterProScan v5 (Zdobnov and Apweiler, 2001) was then applied for the prediction of protein domains and GO terms for each gene model with the setting ‘-appl PfamA -goterms -pa’. Non-coding RNAs were predicted by the Infernal program using the default parameters (Nawrocki and Eddy, 2013).
Phylogenetic analysis and estimation of divergence time
A total of 17 plant species, including four magnoliids (C. salicifolius, P. americana, C. kanehirae, L. chinense), three monocots (O. sativa, Z. mays, M. acuminata), eight eudicots (A. thaliana, P. trichocarpa, P. mume, V. vinifera, D. carota, M. guttatus, A. coerulea, N. nucifera), A. trichopoda and S. moellendorffii, were selected for building the phylogenetic tree. Except for N. nucifera, all the genomes were downloaded from the ftp site of JGI (ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v12.0/). Paralogs and orthologs among the 17 species were identified using the OrthoFinder pipeline with the parameter ‘-M msa -oa’ (Emms and Kelly, 2015), and the protein sequences of the identified 103 single-copy genes were used for phylogenetic tree construction. RAxML v8 (Stamatakis, 2014) was used for the tree construction, with the parameters ‘-m PROTGAMMAAUTO --auto-prot=bic’ to automatically select the best protein model. A total of 100 bootstrap replicates were performed. The phylogenetic tree was visualized using MEGA v5 (Tamura et al., 2011). In addition, ASTRAL-III v5.7.3 (Zhang et al., 2018) was applied to infer the coalescence-based species tree with 1420 gene trees (Figure S6).
Estimation of divergence and ancient whole-genome
duplications
DupPipe analyses of ancient whole-genome duplications. For each genome, we used the DupPipe pipeline to construct gene families and estimate the age distribution of gene duplications (Barker et al., 2008, 2010). We translated DNA sequences and identified ORFs by comparing the Genewise (Birney et al., 2004) alignment to the best-hit protein from a collection of proteins from 25 plant genomes from Phytozome (Goodstein et al., 2012). For all DupPipe runs, we used protein-guided DNA alignments to align our nucleic acid sequences while maintaining the ORFs. We estimated synonymous divergence (Ks) using PAML with the F3X4 model (Yang, 2007) for each node in the gene family phylogenies. We then used mixture modeling to identify significant peaks consistent with a potential WGD and to estimate their median paralog Ks values. Significant peaks were identified using a likelihood ratio test in the boot.comp function of the package mixtools in R (Benaglia et al., 2009).
Estimating orthologous divergence. To place putative WGDs in relation to lineage divergence, we estimated the synonymous
divergence of orthologs among pairs of species that may share a WGD based on their phylogenetic position and evidence from the within-species Ks plots. We used the RBH Orthologue pipeline (Barker et al., 2010) to estimate the mean and median synonymous divergence of orthologs, and compared those with the synonymous divergence of inferred paleopolyploid peaks. We identified orthologs as reciprocal best blast hits in pairs of transcriptomes. Using protein-guided DNA alignments, we estimated the pairwise synonymous divergence for each pair of orthologs using PAML with the F3X4 model (Yang, 2007).
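The reciprocal-best-hit criterion mentioned above can be illustrated with a short Python sketch. It is a generic illustration of the idea, not the RBH Orthologue pipeline itself: the hit tuples, gene identifiers and the use of the bit score to rank hits are all assumptions made for the example.

```python
# Minimal sketch of reciprocal-best-hit (RBH) ortholog calling.
# Hits are (query, subject, bit_score) tuples; all identifiers are hypothetical.
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search results between species A and species B.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0), ("A2", "B2", 310.0), ("A3", "B2", 120.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A2", 305.0), ("B2", "A3", 118.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, A3 finds B2 as its best hit, but B2 prefers A2, so the A3/B2 pair is not reported as orthologous.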
Synteny analyses and dating of ancient whole-genome duplications and orthology divergence. The genomic collinearity blocks for intra- and interspecies comparisons for magnoliids were identified by the MCscan program (Tang et al., 2008). We performed all-against-all LAST (Kielbasa et al., 2011) and chained the LAST hits with a distance cutoff of 10 genes, requiring at least 5 gene pairs per synteny block. The syntenic ‘depth’ function implemented in MCscan was applied to estimate the duplication history in the respective genomes. The genomic synteny was visualized by the python version of MCScan (Tang et al., 2008) and Circos (Krzywinski et al., 2009). The dates of ancient WGDs and orthology divergence were estimated using the formula T = Ks/2R, where Ks refers to the synonymous substitutions per site, and R (3.02 × 10⁻⁹) is the synonymous substitution rate for magnoliids estimated by Cui et al. (2006). Estimation of the divergence times for A. trichopoda – O. sativa and O. sativa – magnoliids was based on TimeTree (Kumar et al., 2017).
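As a worked illustration of the dating formula T = Ks/2R, the sketch below converts a Ks value into an age. The rate is the value given in the text and is assumed here to be expressed per site per year; the Ks value of 0.5 is a hypothetical input chosen for the example, not a peak reported in this study.

```python
# Sketch of the dating formula T = Ks / (2R) given in the text.
# R is assumed to be in synonymous substitutions per site per year.

R_MAGNOLIIDS = 3.02e-9  # rate for magnoliids cited in the text (Cui et al., 2006)


def divergence_time_years(ks: float, rate: float = R_MAGNOLIIDS) -> float:
    """Return T = Ks / (2 * R) in years."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate)


example_ks = 0.5  # hypothetical Ks value, not a result from this study
t_mya = divergence_time_years(example_ks) / 1e6
print(f"Ks = {example_ks} corresponds to about {t_mya:.1f} million years")
```

The factor of 2 reflects that substitutions accumulate independently along both lineages (or both duplicate copies) after the split.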
MAPS analyses of whole-genome duplications from genomes of multiple species. To determine the WGD node across the magnoliid phylogeny, the MAPS tool (Li et al., 2015, 2018) was applied. Six species were selected: the four magnoliids (P. americana, C. kanehirae, C. salicifolius, L. chinense), plus one monocot species (O. sativa) and A. trichopoda as outgroups. Orthologous groups for the six species were obtained from Orthofinder (Emms and Kelly, 2015). We chose gene families with a maximum gene family size of 20, and obtained a total of 8437 gene families. The phylogenetic trees for the 8437 gene families constructed by FastTree (Price et al., 2009) were analyzed by the MAPS program. Both null and positive simulations of the background gene birth and death rates were performed to compare with the observed number of duplications at each node.
For null simulations, we estimated the gene birth rate (λ) and death rate (μ) for the selected six species with WGDgc (Rabier et al., 2014). Gene count data of each gene family for the six species were obtained from Orthofinder (Emms and Kelly, 2015). The estimated parameters (λ = 1.355; μ = 0.050) were configured in the MAPS program, and the gene trees were then simulated within the species tree using the GuestTreeGen program from GenPhyloData (Sjostrand et al., 2013). For each species tree, we simulated 3000 gene trees with at least one tip per species: 1000 gene trees at the estimated λ and μ, 1000 gene trees at half of the estimated λ and μ, and 1000 trees at three times λ and μ, according to the settings in the 1KP program (One Thousand Plant Transcriptomes Initiative, 2019; Li and Barker, 2020). We then randomly resampled 1000 trees without replacement from the total pool of gene trees 100 times to provide a measure of uncertainty on the percentage of subtrees at each node. A Fisher’s exact test was used to identify locations with significant increases in gene duplication compared with the null simulation.
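The comparison of observed and simulated duplication counts at a node can be illustrated with a small Fisher's exact test written with the Python standard library. This is a sketch under stated assumptions rather than the MAPS implementation: the one-sided alternative, the 2 × 2 table layout and the counts are hypothetical choices made for the example.

```python
# Sketch of a one-sided Fisher's exact test for excess duplications at a node.
# Table layout (an assumption for this example):
#                 with duplication   without duplication
#   observed             a                   b
#   simulated            c                   d
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null with fixed margins."""
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b          # observed subtrees
    col1 = a + c          # subtrees with a duplication
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1))
    return numerator / comb(total, col1)


# Hypothetical counts: 300 of 1000 observed subtrees carry a duplication at the
# node, versus 50 of 1000 subtrees from the null simulation.
p_value = fisher_exact_greater(300, 700, 50, 950)
print(f"one-sided p = {p_value:.3g}")
```

A small p-value in this setup indicates more duplications at the node than the background birth and death process would be expected to produce.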
For positive simulations, we simulated gene trees using the same methods described above. However, we incorporated WGDs at the location in the MAPS phylogeny with significantly larger numbers of gene duplications compared with the null simulation. We allowed at least 20% of the genes to be retained following the simulated WGD to account for biased gene retention and loss.
Identification and validation of long genes
The lengths of all genes were screened, and genes longer than 20 kb were selected. Twenty long genes were randomly selected, and their coding sequences were amplified from the cDNA of different C. salicifolius tissues using KOD-FX Plus (TOYOBO). The primers used for cloning long genes are listed in Table S11. The amplified fragments were ligated into the pMD18-T cloning vector using the pMD™ 18-T Vector Cloning Kit (TAKARA, Shiga, Japan) after A-tailing with the DNA A-Tailing Kit (TAKARA). Positive single bacterial colonies were selected for plasmid extraction and further sequencing. The sequences were aligned with those of the long genes.
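The length screen described above can be sketched as a simple filter over GFF3-style gene records. The input format, the gene identifiers and the coordinates below are assumptions for illustration only; the text does not state which file format or script was used.

```python
# Sketch: select genes longer than 20 kb from GFF3-style annotation lines.
# All records below are hypothetical; the real annotation is not reproduced here.
from typing import Dict, Iterable, List

LONG_GENE_MIN_BP = 20_000  # threshold from the text (> 20 kb)


def gene_lengths(gff_lines: Iterable[str]) -> Dict[str, int]:
    """Map gene ID to genomic length (end - start + 1) for 'gene' features."""
    lengths: Dict[str, int] = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        start, end = int(fields[3]), int(fields[4])
        attributes = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        gene_id = attributes.get("ID")
        if gene_id:
            lengths[gene_id] = end - start + 1
    return lengths


def long_genes(lengths: Dict[str, int], min_bp: int = LONG_GENE_MIN_BP) -> List[str]:
    """Return IDs of genes strictly longer than min_bp, sorted by ID."""
    return sorted(gene_id for gene_id, length in lengths.items() if length > min_bp)


example_gff = [
    "##gff-version 3",
    "chr1\tsketch\tgene\t1000\t5000\t.\t+\t.\tID=geneA",
    "chr1\tsketch\tgene\t10000\t35000\t.\t-\t.\tID=geneB",
    "chr1\tsketch\tmRNA\t10000\t35000\t.\t-\t.\tID=geneB.1;Parent=geneB",
    "chr2\tsketch\tgene\t1\t20000\t.\t+\t.\tID=geneC",
]

lengths = gene_lengths(example_gff)
selected = long_genes(lengths)
print(selected)
```

Here gene length is taken as the genomic span including introns, which matches the emphasis on long introns elsewhere in the text; a gene of exactly 20 kb is not counted as long.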
Gene expression profiling
The raw paired-end RNA-seq reads were filtered into clean data by FASTP (Chen et al., 2018). The clean reads were aligned to our generated C. salicifolius genome reference by Hisat2 (Kim et al., 2015), and StringTie (Pertea et al., 2015) was adopted for quantification of expression. The differential expression analysis was performed with Cuffdiff in the Cufflinks package (Trapnell et al., 2010). The gene coexpression pattern was visualized using the R package ‘corrplot’.
The MapMan software was used to investigate the transcriptomic profiles of different developmental stages of flowers and leaves. A functional annotation database was constructed with Mercator (Lohse et al., 2014). The list of significantly differentially expressed genes was loaded into MapMan to analyze the significantly up- and downregulated pathways. GO enrichment analysis was performed using agriGO (Tian et al., 2017), with the GO terms identified with InterProScan as the species background. The ‘Plant GO slim’ option was selected, and a false discovery rate (FDR) criterion of 0.05 was used to select enriched GO terms.
Identification of genes involved in the flavonoid pathway
To identify the candidate genes involved in the flavonoid pathway in the C. salicifolius genome, we collected the genes in A. thaliana that were documented in the flavonoid pathway (Saito et al., 2013). The protein sequences of genes for four species (A. trichopoda, O. sativa, A. thaliana, C. salicifolius) were combined into a database. Using each gene of A. thaliana in the flavonoid pathway as a query sequence, BLASTP was applied to search for homologous genes (E-value threshold: 1E-10). Phylogenetic trees were constructed for the homologous genes of the four species by RAxML v8 (Stamatakis, 2014), and further used for identification of candidate orthologous genes.
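The E-value filtering step can be illustrated with a short parser for BLAST tabular output. The sketch assumes the default 12-column tabular format (`-outfmt 6`), in which the E-value is the 11th column, and treats the threshold as inclusive; the output format used in the study is not stated, and the hit lines below are hypothetical.

```python
# Sketch: keep BLASTP hits at or below an E-value threshold.
# Assumes default 12-column tabular output (-outfmt 6); hits are hypothetical.
from typing import Iterable, List, Tuple

EVALUE_COLUMN = 10  # zero-based index of the E-value in default tabular output


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-10) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for hits with evalue <= max_evalue."""
    kept: List[Tuple[str, str, float]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[EVALUE_COLUMN])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


example_hits = [
    "query1\tsubjectA\t85.0\t300\t45\t0\t1\t300\t1\t300\t3e-45\t420",
    "query1\tsubjectB\t40.0\t120\t72\t3\t5\t124\t10\t129\t2e-08\t55.1",
    "query2\tsubjectC\t62.5\t250\t94\t2\t1\t250\t3\t252\t1e-10\t180",
]

homologs = filter_hits(example_hits)
for query, subject, evalue in homologs:
    print(query, subject, evalue)
```

Hits that pass this filter would then go into the phylogenetic step described above, where tree topology rather than the E-value alone decides which homologs are candidate orthologs.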
Evaluation of flavonoid content in different tissues
The tissues used in the flavonoid evaluation corresponded to the samples used in the RNA-seq. These samples were collected and dried at 60°C. The dried samples were ground into powder and passed through an 80–100 mesh sieve. HPLC analysis was carried out on an Agilent 1260 instrument following the method described previously (Yang et al., 2018). Contents of six flavonoids, including kaempferol, kaempferol-3-O-glucoside, kaempferol-3-O-rutinoside, quercetin, isoquercitrin and rutin, were analyzed with commercial reference standards. Pearson
correlation coefficients were calculated between the expression values of each gene in the identified flavonoid pathway and the measured flavonoid contents.
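A minimal sketch of the Pearson correlation between a gene's expression and a metabolite's content across tissues is given below. The FPKM and rutin values are hypothetical and the function name is illustrative; the significance test (P < 0.05) reported in the Results is not reproduced here and would normally be taken from a statistics package.

```python
# Sketch: Pearson correlation between gene expression and flavonoid content.
# The expression and content values are hypothetical, not data from this study.
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for one gene across five tissues.
fpkm = [12.0, 150.0, 3200.0, 40.0, 800.0]
rutin_content = [0.1, 0.9, 14.5, 0.3, 4.2]

r = pearson_r(fpkm, rutin_content)
print(f"r = {r:.3f}")
```

Repeating this calculation for every pathway gene against each of the six flavonoids would give a gene-by-metabolite table of coefficients comparable in layout to Table S10.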
ACKNOWLEDGEMENTS
This work is financially supported by the Zhejiang Major Science & Technology Project of New Agricultural Varieties (2016C02058), the Zhejiang Province Major Science & Technology Project (2012C12014-1), the National Natural Science Foundation of China (31671282), Shanghai Science and Technology Committee Rising-Star Program (19QA1406500), and Shanghai Engineering Research Center of Plant Germplasm Resources (17DZ2252700). The authors thank Nextomics Biosciences Co., Ltd (Wuhan) for help in genome assembly, Dr Qiang Zhao from the National Center for Gene Research of the Chinese Academy of Sciences for assistance in genome annotation, and Dr Yunpeng Zhao (Zhejiang University) and Dr Jun Yang (Chinese Academy of Sciences) for discussion and valuable suggestions on the manuscript. The authors gratefully acknowledge the support of the IBM high-performance computing cluster of the Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University.
AUTHOR CONTRIBUTIONS
KC, XH and XW conceived and coordinated the project. QL,
JL, QW, JF, JP, ZC and WC prepared the materials and
conducted the experiments. JL and JQ performed the
assembly and annotation of the genome. JQ, XW, ZL, WZ
and JL carried out the phylogenetic, comparative genomics
and transcriptome analysis. XW, JQ, ZL, QL, MB and KC
wrote the manuscript.
CONFLICT OF INTEREST
The authors declare no competing financial interests.
DATA AVAILABILITY STATEMENT
The assembled C. salicifolius genome and its related data
have been deposited under NCBI BioProject accession
PRJNA602413. The genome assembly has been assigned
the accession number JAAGOE000000000. The SRA
accession numbers for the raw sequencing data (PacBio,
Illumina, 10× Genomics, and Hi-C) are
SRR11127589-SRR11127597 and SRR11191851-SRR11191853. The
transcriptomic data generated in this study are under
accession numbers SRR11109013-SRR11109042. The
C. salicifolius genome assembly and the annotated genes
are accessible at
http://xhhuanglab.cn/data/Chimonanthus_salicifolius.html.
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online ver-sion of this article.
Figure S1. Genome survey of C. salicifolius based on K-mer analy-sis using Illumina sequencing data.
Figure S2. Genome size estimation based on flow cytometry usingO. sativa as an internal reference.
Figure S3. Hi-C contact map of the 11 constructed pseudochromo-somes.
Figure S4. Percentage of long genes and all genes that present ingenomic regions of different repetitive levels.
Figure S5. Heterozygous SNP distribution in the repeat sequenceregions.
Figure S6. Coalescent-based phylogenetic tree constructed by1420 orthologous genes retrieved from 15 plants.
Figure S7. Distribution of Ks among paralogs in four magnoliidplants.
Figure S8. Genomic syntenic depth ratio between magnoliidsagainst A. trichopoda and V. vinifera.
Figure S9. Long genes validated by PCR amplification.
Figure S10. Tissues used for RNA-seq.
Figure S11. PCA based on the expression profile of all genes for different tissues of C. salicifolius.
Figure S12. Transcriptomic profiles for different tissues of C. salicifolius.
Figure S13. GO and MapMan terms for the significantly differentially expressed genes between bud and blooming stages.
Figure S14. Expansion of two gene families related to cold tolerance.
Figure S15. Transcriptomic profile for metabolism-related genes visualized by MapMan.
Figure S16. Classification of UDP-glucosyltransferase multigene family in the C. salicifolius genome.
Figure S17. Distribution of flavonoid pathway genes in the C. salicifolius genome.
Table S1. Assessment of the completeness of the genome assembly by BUSCO analysis.
Table S2. Summary statistics of repeat sequences in the C. salicifolius genome.
Table S3. Comparison of the number of genes with specific protein domains in C. salicifolius and magnoliids against 11 monocot and eudicot plants.
Table S4. MAPS result for placements of WGDs for magnoliids and their simulated distributions.
Table S5. Long genes that were successfully amplified from cDNA of C. salicifolius tissues.
Table S6. GO enrichment terms of long genes.
Table S7. Summary of RNA-seq data generated in this study.
Table S8. Homologous genes involved in flavone biosynthetic pathways in C. salicifolius.
Table S9. Flavonoid content of different tissues in C. salicifolius.
Table S10. Correlation of flavonoid content and the expression of flavonoid biosynthetic genes.
Table S11. Primers used in PCR amplification for validation of long genes.
© 2020 Society for Experimental Biology and John Wiley & Sons Ltd
This article has been contributed to by US Government employees and their work is in the public domain in the USA.
The Plant Journal, (2020), doi: 10.1111/tpj.14874