Intrinsic protein disorder is an interesting structural feature where fully functional proteins lack a three-dimensional structure in
solution. In this work, we estimated the relative content of intrinsic protein disorder in 96 plant proteomes including monocots
and eudicots. In this analysis, we found variation in the relative abundance of intrinsic protein disorder among these major clades;
the relative level of disorder is higher in monocots than eudicots. In turn, there is an inverse relationship between the degree of
intrinsic protein disorder and protein length, with smaller proteins being more disordered. The relative abundance of amino acids
depends on intrinsic disorder and also varies among clades. Within the nucleus, intrinsically disordered proteins are more
abundant than ordered proteins. Intrinsically disordered proteins are specialized in regulatory functions, nucleic acid binding,
RNA processing, and in response to environmental stimuli. The implications of this on plants’ responses to their environment are
Introduction

The traditional structure-function paradigm states that protein function depends on a well-defined three-dimensional structure. However, there are regions of proteins and even complete proteins that are fully functional even though they do not fold into secondary or tertiary structures in solution (Uversky 2011; Pancsa and Tompa 2012). These proteins are known as intrinsically disordered regions/proteins (IDRs/IDPs) and are present in all domains of life (Xue et al. 2010; Xue et al. 2012; Yruela et al. 2017). In eukaryotes, it is estimated that between 23 and 28% of proteins are highly [...]

Structural disorder is considered to be significantly higher in [...]

(W, C, F, I, Y, V, L, H, T, and N), are underrepresented, while they have an abundance of proline and polar and charged res- [...]

similar between IDPs and their ordered counterparts (Radivojac 2003). These characteristics in the primary structure give IDPs/IDRs a high net charge and a low average hydrophobicity (Uversky et al. 2000).

Intrinsic disorder promotes structural flexibility, and this flexibility allows fast transitions between different structural states, which promotes multispecific functions (Romero et al. 2001; Radivojac 2003; Uversky 2011; Sun et al. 2013; Covarrubias et al. 2017; Zamora-Briseño et al. 2018). IDPs/IDRs are associated with the regulation of transcription, signaling, and stress responses (Sun et al. 2013; Pietrosemoli et al. 2013).

The ubiquitous nature of IDPs in multiple cellular processes has encouraged the development of programs for intrinsic protein disorder prediction, which are based on the physicochemical attributes of these proteins. Some of these predictors have been shown to be highly reliable (Romero et al. 2001; Peng et al. 2006; Mészáros et al. 2009; Xue et al. 2010; Walsh et al. 2012; Dosztányi 2018). It is now possible to estimate the content of IDPs at the proteomic scale with high confidence (Walsh et al. 2012; Kurotani et al. 2014; Yruela and Contreras-Moreira 2012). This has made possible a significant number of studies aimed at answering questions about structural disorder at the genomic scale in a large number of models (Schad et al. 2011; Xue et al. 2012; Pietrosemoli et al. 2013; Peng et al. 2014). However, it is often difficult to compare results obtained from different studies and to produce generalizations from them, in part because each study uses different predictors (each with a different confidence level) and different criteria to estimate and classify structural disorder (Pancsa and Tompa 2012).

In plants, global-scale analyses of IDPs are limited to Arabidopsis thaliana and a few other plant models (Pancsa and Tompa 2012; Yruela and Contreras-Moreira 2012; Pietrosemoli et al. 2013; Kurotani et al. 2014; Vincent and Schnell 2016; Liu et al. 2017; Alvarez-Ponce et al. 2018). This limits the identification of biological roles of IDPs without homologous functions in other models. For example, plants have developed systems that allow them to adapt to the environment from which they cannot escape (Moore et al. 2008; Schad et al. 2011; Xue et al. 2012; Pietrosemoli et al. 2013; Peng et al. 2014). Since IDPs participate in signaling cascades and stress response processes, IDPs may be particularly important in plants’ development and adaptation to their environment (Kovacs et al. 2008; Pietrosemoli et al. 2013; Liu et al. 2017; Alvarez-Ponce et al. 2018; Zamora-Briseño et al. 2018). Furthermore, although conclusions derived from other models may be applicable to plants, this is not always the case. For example, in a study evaluating the correlation between the occurrence of post-translational modifications in IDPs/IDRs of plants, it was found that while phosphorylations, acetylations, and O-glycosylations show a preference for IDPs/IDRs as in animals, methylations occur preferentially in ordered regions (Kurotani et al. 2014).

In this study, we predicted intrinsic disorder in 96 proteomes of plants. We found bias in the relative disordome content among the different clades analyzed, with significant differences between monocots and eudicots. Unlike other reports, we classified disorder predictions into four categories (0–25, 25–50, 50–75, and 75–100% of intrinsic protein disorder). Based on this criterion, we observed that protein roles depend on their disorder level. The disorder level affects the abundance of aa and influences protein size, its distribution in the cell, and protein functions. For these reasons, we consider that the disordome may have major adaptive implications.

Materials and methods

In order to predict intrinsic protein disorder in plant proteomes, we downloaded proteomes available in the Ensembl Genomes (Howe et al. 2020) and Phytozome (Goodstein et al. 2012) genomic browsers and from NCBI. All sequences below 30 aa in length were removed, as well as all non-specific positions. For each proteome, disorder prediction was estimated in the Espritz program using “X-ray” and “Best sw” parameters (Walsh et al. 2012). Predictions were grouped into four categories of intrinsic protein disorder: 0–25, 25–50, 50–75, and 75–100%. We estimated the relative abundance of each disorder category for each species. A phylogenetic tree was constructed with PhyloT from the scientific name of each species, and results were visualized with iTOL v3.4 (Letunic and Bork 2019).

We estimated the abundance of each aa per intrinsic protein disorder category for monocots and eudicots. To find enriched ontological functions among each category, protein sequences were annotated with InterProScan 5 (Jones et al. 2014). This allowed us to handle annotated proteins with a homogeneous criterion. Then, a random sub-sample of 25,000 proteins was taken from each category and analyzed using the WEGO online program (Ye et al. 2006) to compare parental ontological terms that were significantly enriched by category (p < 0.001). The protein length and intrinsic disorder content of each category were compared between monocots and eudicots, using t test and Kruskal-Wallis, respectively. Statistical differences among them were determined in R (R Development Core Team 2016), and data were plotted using ggplot2 (Ginestet 2011). In addition, the binned data in the four categories of disorder were analyzed with a principal component analysis (PCA) biplot, calculated with the FactoMineR library (Lê et al. 2008) in the R environment. A linear discriminant analysis effect size (LEfSe) (Segata et al. 2011) was performed to detect the discriminant protein categories between the eudicots and monocots; significance was set at p < 0.05.

To examine in detail the association between intrinsic protein disorder in both biological processes and cellular location,
Fig. 2 Comparison between the proteomes of monocots and dicots considering the disorder abundance. a PCA biplot of the four categories of disordered proteins between eudicots and monocots. Vectors are plotted towards the direction of their major abundance in the samples. b [...] each disorder category. c LEfSe analysis results. The category 0–25 is better represented in the eudicots, while the categories 25–50, 50–75, and 75–100 are better represented in the monocots.
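The binning into four disorder categories described in Materials and methods, and the per-species relative abundance compared in Fig. 2, can be illustrated with a minimal Python sketch. It is not the pipeline used in this work. It assumes a hypothetical input format in which each protein carries one label per residue ('D' for disordered, 'O' for ordered), and it assumes half-open bins, so that a protein with exactly 25% disorder falls in the 25–50 category. The function names are illustrative only, and the removal of non-specific positions is not covered.

```python
from collections import Counter

MIN_LENGTH = 30  # sequences below 30 aa are discarded, as stated in the Methods
CATEGORIES = ("0-25", "25-50", "50-75", "75-100")


def disorder_percent(labels: str) -> float:
    """Percentage of residues labelled 'D' in a per-residue label string."""
    return 100.0 * labels.count("D") / len(labels)


def disorder_category(percent: float) -> str:
    """Assign a disorder percentage to one of the four categories.

    Assumption: bins are half-open, so 25.0 maps to '25-50';
    100.0 is placed in the last bin, '75-100'.
    """
    index = min(int(percent // 25), len(CATEGORIES) - 1)
    return CATEGORIES[index]


def relative_abundance(proteome: dict[str, str]) -> dict[str, float]:
    """Fraction of proteins of one proteome that fall in each category.

    `proteome` maps a protein ID to its per-residue 'D'/'O' labels
    (hypothetical format, one label per amino acid).
    """
    kept = [labels for labels in proteome.values() if len(labels) >= MIN_LENGTH]
    counts = Counter(disorder_category(disorder_percent(labels)) for labels in kept)
    total = len(kept)
    return {cat: (counts[cat] / total if total else 0.0) for cat in CATEGORIES}


# Toy proteome (invented labels, for illustration only)
toy_proteome = {
    "p1": "O" * 40,             # 0% disorder
    "p2": "D" * 10 + "O" * 30,  # 25% disorder
    "p3": "D" * 30 + "O" * 10,  # 75% disorder
    "p4": "D" * 40,             # 100% disorder
    "p5": "D" * 10,             # shorter than 30 aa, removed
}
print(relative_abundance(toy_proteome))
```

In this toy example, one of the four retained proteins falls in the 0–25 category, one in 25–50, none in 50–75, and two in 75–100. Repeating the calculation for each species would give one row per proteome, which is the kind of table a PCA or a between-clade comparison takes as input.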
disorder and their functions. We consider that cataloging proteins into ordered versus disordered is overly simplistic. In other words, it is important to determine not only whether or not a protein is intrinsically disordered but also its degree of disorder, since this is associated with function. In some ways, this classification attempts to capture part of the different intrinsic disorder flavors described for IDPs (Dunker et al. 2008; Walsh et al. 2012; Forcelloni and Giansanti 2020).

It is interesting that as intrinsic disorder increases, there is a decrease in protein length. This negative correlation between intrinsic disorder and protein length has been previously reported and is generally accepted (Howell et al. 2012; Peng et al. 2014; Afanasyeva et al. 2018; Zamora-Briseño et al. 2019). This is expected given the biased aa composition of IDPs because in some way the occurrence of amino acids is associated with protein length (Carugo 2008). Since longer proteins tend to be more conserved than small proteins (Lipman et al. 2002), more disordered proteins, which tend to be shorter, are expected to be less conserved. It is known that amino acid changes are faster for proteins with higher proportions of aa exposed to the solvent, as occurs with IDPs (Lin et al. 2007). Moreover, IDPs have a higher mutational rate than globular proteins and have a high tolerance to mutations (Brown et al. 2002; Forcelloni and Giansanti 2020). This suggests that disorder-promoting aa are subjected to reduced evolutionary constraints (relaxed evolutionary forces at these sites) and therefore have a higher mutation rate than order-promoting aa (with stronger evolutionary constraints to keep their function). This explains why order-promoting aa are clearly separated on the heat map, with a more conserved distribution pattern among clades than disorder-promoting aa.

The relative abundance of aa apparently differs among the clades. In general, it is considered that, compared with structured proteins, IDPs show a reduction in their contents of C, W, Y, F, I, V, and L, while being significantly enriched in M, K, R, S, Q, P, and E (Dunker et al. 2008). This general rule does not seem to follow the same pattern in plants because M is not enriched in any of the clades. Furthermore, A and G are enriched in IDPs of algae and monocots, but these aa are not usually considered enriched in IDPs (Radivojac 2003). In addition, the enrichment of disorder-promoting aa seems to differ among clades.

Since there is a positive correlation between genetic recombination rate and protein disorder frequency in plants, it has been proposed that genetic recombination could be considered an evolutionary force that contributes to structural disorder in proteins (Yruela and Contreras-Moreira 2013). The fact that IDPs have a higher recombination rate and higher mutability is of particular interest in plant adaptation to challenging environmental conditions. According to our GO enrichment analysis, highly intrinsically disordered proteins are enriched in the response to biotic and abiotic stimuli. Moreover, intrinsic disorder is higher in young genes and in genes created de novo in alternative reading frames, as well as in orphan genes of several non-plant species (Rancurel et al. 2009; Mukherjee et al. 2015; Wilson et al. 2017). So, it is possible that proteins encoded by young and orphan genes in plants possess a higher degree of disorder, a question that must be answered in the future.

GO enrichment analysis of biological components for the A. thaliana proteome divided into disorder categories. Biological processes varied among categories

Highly disordered proteins (> 75% intrinsic disorder) are particularly enriched in the regulation and transport of RNA, as well as RNA splicing. This is consistent with previous data indicating that a large number of proteins that bind to RNA exhibit broad IDRs. For example, it is estimated that more than 50% of amino acid residues of RNA chaperones occur in IDRs (Tompa and Csermely 2004). This has wide-reaching consequences. For example, alternative splicing is a very important process for stress-induced responses in plants, which can modulate the phenotypic traits of plants and can contribute to their adaptations to different environmental stressors (Mastrangelo et al. 2012; Ling et al. 2019).

In addition, intrinsic disorder influences the sub-cellular localization of proteins; IDPs are enriched in the nucleus, as has been suggested for other non-plant models (Frege and Uversky 2015; Skupien-Rabian et al. 2015). This is very reasonable considering that IDPs have functional specialization (Vincent and Schnell 2016; Deiana et al. 2019) and such functional specialization also depends on protein length (Howell et al. 2012). Considering that monocots possess a higher proportion of IDPs than eudicots, it is feasible that the proportion of nuclear proteins is also higher. In comparative terms, we expected that monocots would possess a higher proportion of nuclear proteins than eudicots.

Finally, given that a large part of the proteome with unknown functions (dark proteome) is enriched in IDPs (Bhowmick et al. 2016), it can be inferred that a large part of the disordome represents a reservoir of potential functions involved in the stress response that are waiting to be discovered. This may be exploited for biotechnological purposes,
Towards an understanding of the role of intrinsic protein disorder on plant adaptation to environmental...
