String Database

Biological Database BIT2002
Digital Assignment 1
Faculty - RAMANATHAN K
Rohit Chakraborty 18BCB0114
P Anirudh 18BCB0137
Shobhit D 18BCB0162
STRING
Functional Protein Association Networks

1. About the Database
STRING is a database of known and predicted protein-protein interactions.
The interactions include direct (physical) and indirect (functional)
associations; they stem from computational prediction, from knowledge
transfer between organisms, and from interactions aggregated from other
(primary) databases.
Interactions in STRING are derived from five main sources:
1. Genomic Context Predictions
2. High-throughput Lab Experiments
3. (Conserved) Co-Expression
4. Automated Text Mining
5. Previous Knowledge in Databases
The STRING database currently covers 24'584'628 proteins from 5'090
organisms.

2. Importance of Database
Protein–protein interaction networks are an important ingredient for the
system-level understanding of cellular processes. Such networks can be
used for filtering and assessing functional genomics data and for providing
an intuitive platform for annotating structural, functional and evolutionary
properties of proteins. Exploring the predicted interaction networks can
suggest new directions for future experimental research and provide
cross-species predictions for efficient interaction mapping.
Protein–protein interaction information can already be retrieved from a
number of online resources. First, primary interaction databases, which
largely collaborate with one another, provide curated experimental data originating from a
variety of biochemical, biophysical and genetic techniques. Second, since
protein–protein interactions can also be predicted computationally, a
number of resources have their main focus on interaction prediction, using
a variety of algorithms. Lastly, a group of online resources integrates
both known and predicted interactions, thus aiming for high
comprehensiveness and coverage. These include STRING, as well as
GeneMANIA, FunCoup, I2D, ConsensusPathDB and others. Within this
landscape of online resources, STRING places its focus on interaction
confidence scoring, comprehensive coverage (in terms of number of
proteins, organisms and prediction methods), intuitive user interfaces and
on a commitment to maintain a long-term, stable resource (since 2000).
Some important use cases of the database are:
1. Researching protein-networks in the context of early immune system
establishment.
2. Showing that highly connected proteins have a stable steady-state
distribution of gene expression.
3. Searching for candidate genes involved in the immune response to
gluten.
4. Identifying candidates for unknown enzymes in a pathway.
5. Using STRING to narrow the search space for two-locus epistasis.
6. Using STRING to show network connectivity.
7. STRING as a general purpose database.
8. STRING to guide experiments.
9. Prioritizing functional assignments in RNAi screens using interaction
network data.

3. Updates, if Any
Data can be uploaded to the database by providing:
1. Basic Information
2. Node Data - Proteins
3. Edge Data - Protein Associations
The dataset can then be reviewed and confirmed.

The STRING App allows users to modify an already retrieved network in
three different ways.
First, the confidence cutoff for the imported evidence channels can be
increased or decreased, which in the latter case involves fetching
additional interactions from STRING.
Second, users can expand the network by a user-specified number of
interactors that are most closely associated with all network nodes or a
selected subset of them.
Third, any number of additional nodes can be queried by name and added
to the existing network.
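
The first two modifications can be illustrated with a short Python sketch. It is not the STRING App's implementation: the Edge type, the example proteins A to D and their scores are hypothetical, and ranking candidates by the summed score of their edges to the current nodes is only one assumed reading of "most closely associated".

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    protein_a: str
    protein_b: str
    score: float  # confidence between 0 and 1 (illustrative values only)


def apply_cutoff(edges, cutoff):
    """Keep only edges whose confidence is at least `cutoff`."""
    return [e for e in edges if e.score >= cutoff]


def expand_network(nodes, candidate_edges, n_extra):
    """Return `nodes` plus up to `n_extra` new proteins, ranked by the
    summed score of their edges to proteins already in `nodes`
    (an assumed ranking, not necessarily the one the app uses)."""
    totals = defaultdict(float)
    for e in candidate_edges:
        if e.protein_a in nodes and e.protein_b not in nodes:
            totals[e.protein_b] += e.score
        elif e.protein_b in nodes and e.protein_a not in nodes:
            totals[e.protein_a] += e.score
    # Highest total first; ties broken alphabetically for a stable result.
    ranked = sorted(totals, key=lambda p: (-totals[p], p))
    return set(nodes) | set(ranked[:n_extra])


# Hypothetical toy network
edges = [
    Edge("A", "B", 0.9),
    Edge("A", "C", 0.5),
    Edge("B", "D", 0.8),
    Edge("C", "D", 0.3),
]

high_confidence = apply_cutoff(edges, 0.7)
expanded = expand_network({"A", "B"}, edges, n_extra=1)
```

Raising the cutoff only discards edges that are already loaded, whereas lowering it needs interactions that were never retrieved, which is why the latter case involves fetching more data from STRING.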

4. How to retrieve the information?
To specify the starting point of an analysis, use the input form on the
STRING start page, which offers the following options:
● Protein by name
● Protein by sequence
● Multiple proteins
● Multiple sequences
● Organisms
● Protein families (COGs)
● Examples
● Random entry
You can search STRING by a single protein name, by multiple names, or by
amino acid sequence (in any format). There are also example inputs and a
random input generator, which randomly selects a protein with at least 4
predicted links at medium confidence or better. An organism entry lets you
check whether your species of interest is available. It is also possible to
search by protein family rather than by a protein in a single organism, by
searching the COGs (clusters of orthologous groups).
Commonly, you enter your protein of interest by supplying its name or
identifier. The organism can be selected by clicking on the arrow or by
typing its name directly into the relevant input field (an autocompletion
mechanism will appear to help you). General names that group more than
one organism (e.g. "Mammals", "Chordata") can also be used.
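
For scripted retrieval, STRING also offers a web API. The following is a minimal sketch of building a request URL. It assumes the base URL https://string-db.org/api, the network method, and the parameter names identifiers, species and required_score, as recalled from the STRING API documentation; these should be checked against the current documentation before use. The function name build_network_url is hypothetical.

```python
from urllib.parse import urlencode

STRING_API_BASE = "https://string-db.org/api"  # assumed base URL


def build_network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build a URL asking for the interaction network of the given proteins.

    Multiple identifiers are joined with a carriage return, which
    urlencode turns into %0D (assumed separator convention).
    """
    params = {
        "identifiers": "\r".join(identifiers),
        "species": species,                # NCBI taxonomy identifier
        "required_score": required_score,  # assumed 0-1000 scale
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


url = build_network_url(["TP53", "MDM2"], species=9606)
print(url)
```

Here 9606 is the NCBI taxonomy identifier for human, and a required_score of 400 is assumed to correspond to medium confidence. The sketch only constructs the URL; fetching it (for example with urllib.request.urlopen) is left out so that the example runs without network access.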

5. References – At least one Research article uses this
dataset

The paper describing the database is:
Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D.,
Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K. P., Kuhn, M.,
Bork, P., Jensen, L. J., & von Mering, C. (2015). STRING v10: protein-protein
interaction networks, integrated over the tree of life. Nucleic acids research,
43(Database issue), D447–D452. https://doi.org/10.1093/nar/gku1003.
In total, 2450 articles use this dataset.
Examples:
1. Hong, S. K., & Lee, J. G. (2020). DTranNER: biomedical named entity
recognition with deep learning-based label-label transition model.
BMC bioinformatics, 21(1), 53.
https://doi.org/10.1186/s12859-020-3393-1
2. Zhang, M. J., Ntranos, V., & Tse, D. (2020). Determining sequencing
depth in a single-cell RNA-seq experiment. Nature communications,
11(1), 774. https://doi.org/10.1038/s41467-020-14482-y
3. Reyna, M. A., Haan, D., Paczkowska, M., Verbeke, L., Vazquez, M.,
Kahraman, A., Pulido-Tamayo, S., Barenboim, J., Wadi, L., Dhingra, P.,
Shrestha, R., Getz, G., Lawrence, M. S., Pedersen, J. S., Rubin, M. A.,
Wheeler, D. A., Brunak, S., Izarzugaza, J., Khurana, E., Marchal, K., …
PCAWG Consortium (2020). Pathway and network analysis of more
than 2500 whole cancer genomes. Nature communications, 11(1),
729. https://doi.org/10.1038/s41467-020-14367-0
4. Tian, S., Wang, C., Zhang, J., & Yu, D. (2020). The cox-filter method
identifies respective subtype-specific lncRNA prognostic signatures
for two human cancers. BMC medical genomics, 13(1), 18.
https://doi.org/10.1186/s12920-020-0691-4
5. Cacheiro, P., Muñoz-Fuentes, V., Murray, S. A., Dickinson, M. E., Bucan,
M., Nutter, L., Peterson, K. A., Haselimashhadi, H., Flenniken, A. M.,
Morgan, H., Westerberg, H., Konopka, T., Hsu, C. W., Christiansen, A.,
Lanza, D. G., Beaudet, A. L., Heaney, J. D., Fuchs, H., Gailus-Durner, V.,
Sorg, T., … International Mouse Phenotyping Consortium (2020).
Human and mouse essentiality screens as a resource for disease
gene discovery. Nature communications, 11(1), 655.
https://doi.org/10.1038/s41467-020-14284-2
6. Jiang, C. H., Yuan, X., Li, J. F., Xie, Y. F., Zhang, A. Z., Wang, X. L., Yang,
L., Liu, C. X., Liang, W. H., Pang, L. J., Zou, H., Cui, X. B., Shen, X. H., Qi,
Y., Jiang, J. F., Gu, W. Y., Li, F., & Hu, J. M. (2020).
Bioinformatics-based screening of key genes for transformation of
liver cirrhosis to hepatocellular carcinoma. Journal of translational
medicine, 18(1), 40. https://doi.org/10.1186/s12967-020-02229-8
7. Landa-Galvan, H. V., Rios-Castro, E., Romero-Garcia, T., Rueda, A., &
Olivares-Reyes, J. A. (2020). Metabolic syndrome diminishes
insulin-induced Akt activation and causes a redistribution of
Akt-interacting proteins in cardiomyocytes. PloS one, 15(1),
e0228115. https://doi.org/10.1371/journal.pone.0228115
8. Assefa, T., Zhang, J., Chowda-Reddy, R. V., Moran Lauter, A. N., Singh,
A., O'Rourke, J. A., Graham, M. A., & Singh, A. K. (2020).
Deconstructing the genetic architecture of iron deficiency chlorosis in
soybean using genome-wide approaches. BMC plant biology, 20(1),
42. https://doi.org/10.1186/s12870-020-2237-5
9. Herzberg, D., Strobel, P., Müller, H., Meneses, C., Werner, M., &
Bustamante, H. (2020). Proteomic profiling of proteins in the dorsal
horn of the spinal cord in dairy cows with chronic lameness. PloS
one, 15(1), e0228134. https://doi.org/10.1371/journal.pone.0228134
