String Database


Biological Database BIT2002 

Digital Assignment 1 
Faculty - RAMANATHAN K 

Rohit Chakraborty 18BCB0114
P Anirudh 18BCB0137
Shobhit D 18BCB0162

STRING  
Functional Protein Association Networks 

 
 
 
 
 
 
 
1. About the Database 
STRING is a database of known and predicted protein-protein interactions. 
The interactions include direct (physical) and indirect (functional) 
associations; they stem from computational prediction, from knowledge 
transfer between organisms, and from interactions aggregated from other 
(primary) databases.  

Interactions in STRING are derived from five main sources: 

1. Genomic Context Predictions
2. High-throughput Lab Experiments
3. (Conserved) Co-Expression
4. Automated Text Mining
5. Previous Knowledge in Databases

The STRING database currently covers 24'584'628 proteins from 5'090
organisms.
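
To make the idea of integrating several evidence sources concrete, the
sketch below combines one confidence score per source into a single
score. It is only an illustration: this report does not describe how
STRING actually computes its scores, so the independence assumption, the
formula 1 - prod(1 - s_i), the function name and the example values are
all hypothetical.

```python
from math import prod


def combine_scores(channel_scores):
    """Combine per-source confidence scores (each in [0, 1]) into one score.

    Assumption (not taken from the report): the sources are independent,
    so the combined score is the probability that at least one of them
    is right, i.e. 1 - prod(1 - s_i).
    """
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1], got {score}")
    return 1.0 - prod(1.0 - s for s in channel_scores.values())


# Hypothetical scores for one protein pair, one entry per evidence source.
example = {"experiments": 0.5, "textmining": 0.5, "coexpression": 0.0}
combined = combine_scores(example)
print(f"combined confidence: {combined:.2f}")
```

Under this assumption, two moderately confident sources reinforce each
other, while a source with score 0 leaves the result unchanged.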

 
 
 
 
 
 
 
 
 
 
2. Importance of Database 
Protein–protein interaction networks are an important ingredient for the 
system-level understanding of cellular processes. Such networks can be 
used for filtering and assessing functional genomics data and for providing 
an intuitive platform for annotating structural, functional and evolutionary 
properties of proteins. Exploring the predicted interaction networks can 
suggest new directions for future experimental research and provide 
cross-species predictions for efficient interaction mapping. 

Protein–protein interaction information can already be retrieved from a
number of online resources. First, primary interaction databases, most of
which collaborate with one another, provide curated experimental data
originating from a variety of biochemical, biophysical and genetic
techniques. Second, since protein–protein interactions can also be
predicted computationally, a number of resources focus mainly on
interaction prediction, using a variety of algorithms. Lastly, a group of
online resources integrates both known and predicted interactions, thus
aiming for high comprehensiveness and coverage. These include STRING, as
well as GeneMANIA, FunCoup, I2D, ConsensusPathDB and others. Within this
landscape of online resources, STRING focuses on interaction confidence
scoring, comprehensive coverage (in terms of number of proteins,
organisms and prediction methods), intuitive user interfaces and a
commitment to maintaining a long-term, stable resource (since 2000).

Some important use cases of the database are:

1. Researching protein networks in the context of early immune system
establishment.
2. Showing that highly connected proteins have a stable steady-state
distribution of gene expression.
3. Searching for candidate genes involved in the immune response to
gluten.
4. Identifying candidates for unknown enzymes in a pathway.
5. Using STRING to narrow the search space for two-locus epistasis.
6. Using STRING to show network connectivity.
7. Using STRING as a general-purpose database.
8. Using STRING to guide experiments.
9. Prioritizing functional assignments in RNAi screens using interaction
network data.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3. Updates, if Any?
Users can upload their own data to the database by providing:

1. Basic Information
2. Node Data - Proteins
3. Edge Data - Protein Associations

We can then review our dataset and confirm it.
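
As a rough illustration of what node data, edge data and a review step
could look like, the sketch below models proteins and associations as
simple records and checks them for consistency. The field names, the
score range and the checks are hypothetical; they are not STRING's actual
upload format, which this report does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One protein (hypothetical fields)."""
    protein_id: str
    description: str = ""


@dataclass(frozen=True)
class Edge:
    """One protein association with a confidence score (hypothetical fields)."""
    source: str
    target: str
    score: float


def review_dataset(nodes, edges):
    """Return a list of problems; an empty list means the data look consistent."""
    problems = []
    known = {n.protein_id for n in nodes}
    if len(known) != len(nodes):
        problems.append("duplicate protein identifiers in node data")
    for e in edges:
        for end in (e.source, e.target):
            if end not in known:
                problems.append(
                    f"edge {e.source}-{e.target} references unknown protein {end}"
                )
        if not 0.0 <= e.score <= 1.0:
            problems.append(f"edge {e.source}-{e.target} has score outside [0, 1]")
    return problems


# Made-up example data: the second edge points at a protein that was never declared.
nodes = [Node("P1", "kinase"), Node("P2", "phosphatase")]
edges = [Edge("P1", "P2", 0.8), Edge("P1", "P3", 0.4)]
for problem in review_dataset(nodes, edges):
    print(problem)
```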

 
 
 

The STRING App allows users to modify an already retrieved network in 
three different ways.  

First, the confidence cutoff for the imported evidence channels can be 
increased or decreased, which in the latter case involves fetching 
additional interactions from STRING.  

Second, users can expand the network by a user-specified number of
interactors that are most closely associated with all network nodes or a
selected subset of them.

Third, any number of additional nodes can be queried by name and added 
to the existing network. 
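
The three modifications described above can be sketched on a small
in-memory network. This is not the STRING App's code or API: the edge
list, the function names and the ranking rule (summed confidence of edges
into the current network) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical scored interactions: (protein_a, protein_b, confidence in [0, 1]).
EDGES = [
    ("A", "B", 0.9),
    ("A", "C", 0.5),
    ("B", "D", 0.8),
    ("C", "E", 0.3),
    ("D", "E", 0.7),
]


def apply_cutoff(edges, cutoff):
    """1) Keep only interactions at or above the confidence cutoff."""
    return [e for e in edges if e[2] >= cutoff]


def expand(network_nodes, all_edges, n_new):
    """2) Pick the n_new outside proteins most strongly tied to the network.

    Assumption: "most closely associated" is ranked by the summed
    confidence of edges linking a candidate to the current network nodes.
    """
    strength = defaultdict(float)
    for a, b, score in all_edges:
        if a in network_nodes and b not in network_nodes:
            strength[b] += score
        elif b in network_nodes and a not in network_nodes:
            strength[a] += score
    # Sort by descending strength, breaking ties alphabetically.
    ranked = sorted(strength, key=lambda p: (-strength[p], p))
    return ranked[:n_new]


def add_nodes(network_nodes, names):
    """3) Add explicitly named proteins to the network."""
    return set(network_nodes) | set(names)


network = {"A", "B"}
print(apply_cutoff(EDGES, 0.7))    # raising the cutoff drops weaker edges
print(expand(network, EDGES, 1))   # the single closest outside interactor
print(sorted(add_nodes(network, ["E"])))
```

Lowering the cutoff differs from raising it in one respect: the edges
that become eligible are not in the retrieved network yet, which is why
the text notes that they have to be fetched from STRING.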

 
 
 
 
 
 
 
4. How to retrieve the information? 

To specify your desired starting point of the analysis you have to use the 
input form at the STRING start page. 

● Protein by name 
● Protein by sequence 
● Multiple proteins 
● Multiple sequences 
● Organisms 
● Protein families (COGs) 
● Examples 
● Random entry 

You can search STRING by a single protein name, by multiple names or by
amino acid sequence (in any format). There are also example inputs and a
random input generator, which randomly selects a protein with at least 4
predicted links at medium confidence or better. An organism entry lets
you check whether your species of interest is available. It is also
possible to search by protein family rather than by a protein in a single
organism, by searching the COGs (clusters of orthologous groups).

Commonly, you enter your protein of interest by supplying its name or
identifier. The organism can be selected by clicking on the arrow or by
typing its name directly into the corresponding input field (an
autocompletion mechanism will appear to help you). General names that
group more than one organism (e.g. "Mammals", "Chordata") can also be
used.
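
Besides the web form, the same kind of query (protein names plus an
organism) can be expressed as a URL for programmatic access. The sketch
below only builds such a URL and does not contact any server. The
endpoint layout and the parameter names (identifiers, species,
required_score) are assumptions that are not stated in this report and
should be checked against the current STRING API documentation; the
helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base address of the STRING REST API (verify before use).
STRING_API_BASE = "https://string-db.org/api"


def build_network_url(identifiers, species, required_score=400, output_format="tsv"):
    """Build a query URL for the interaction network of the given proteins.

    identifiers    -- protein names or identifiers, e.g. ["TP53", "MDM2"]
    species        -- NCBI taxonomy ID of the organism (9606 is human)
    required_score -- assumed minimum confidence on a 0-1000 scale
    """
    params = {
        # Assumption: multiple identifiers are separated by a carriage return.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


url = build_network_url(["TP53", "MDM2"], species=9606)
print(url)
```

The resulting URL could then be passed to urllib.request.urlopen or
opened in a browser to retrieve the tab-separated interaction table.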

 
 
 
 
 
 
 
 
 
 
 
 
5. References – At least one Research article uses this dataset
 
The paper for the database is -  

Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., 
Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K. P., Kuhn, M., 
Bork, P., Jensen, L. J., & von Mering, C. (2015). STRING v10: protein-protein 
interaction networks, integrated over the tree of life. Nucleic acids research, 
43(Database issue), D447–D452. https://doi.org/10.1093/nar/gku1003

There are in total 2450 articles which use this dataset.

Examples -

1. Hong, S. K., & Lee, J. G. (2020). DTranNER: biomedical named entity
recognition with deep learning-based label-label transition model.
BMC bioinformatics, 21(1), 53.
https://doi.org/10.1186/s12859-020-3393-1
2. Zhang, M. J., Ntranos, V., & Tse, D. (2020). Determining sequencing 
depth in a single-cell RNA-seq experiment. Nature communications, 
11(1), 774. https://doi.org/10.1038/s41467-020-14482-y
3. Reyna, M. A., Haan, D., Paczkowska, M., Verbeke, L., Vazquez, M., 
Kahraman, A., Pulido-Tamayo, S., Barenboim, J., Wadi, L., Dhingra, P., 
Shrestha, R., Getz, G., Lawrence, M. S., Pedersen, J. S., Rubin, M. A., 
Wheeler, D. A., Brunak, S., Izarzugaza, J., Khurana, E., Marchal, K., … 
PCAWG Consortium (2020). Pathway and network analysis of more 
than 2500 whole cancer genomes. Nature communications, 11(1), 
729. https://doi.org/10.1038/s41467-020-14367-0
4. Tian, S., Wang, C., Zhang, J., & Yu, D. (2020). The cox-filter method 
identifies respective subtype-specific lncRNA prognostic signatures 
for two human cancers. BMC medical genomics, 13(1), 18. 
https://doi.org/10.1186/s12920-020-0691-4 
5. Cacheiro, P., Muñoz-Fuentes, V., Murray, S. A., Dickinson, M. E., Bucan, 
M., Nutter, L., Peterson, K. A., Haselimashhadi, H., Flenniken, A. M., 
Morgan, H., Westerberg, H., Konopka, T., Hsu, C. W., Christiansen, A., 
Lanza, D. G., Beaudet, A. L., Heaney, J. D., Fuchs, H., Gailus-Durner, V., 
Sorg, T., … International Mouse Phenotyping Consortium (2020). 
Human and mouse essentiality screens as a resource for disease 
gene discovery. Nature communications, 11(1), 655. 
https://doi.org/10.1038/s41467-020-14284-2 
6. Jiang, C. H., Yuan, X., Li, J. F., Xie, Y. F., Zhang, A. Z., Wang, X. L., Yang, 
L., Liu, C. X., Liang, W. H., Pang, L. J., Zou, H., Cui, X. B., Shen, X. H., Qi, 
Y., Jiang, J. F., Gu, W. Y., Li, F., & Hu, J. M. (2020). 
Bioinformatics-based screening of key genes for transformation of 
liver cirrhosis to hepatocellular carcinoma. Journal of translational 
medicine, 18(1), 40. https://doi.org/10.1186/s12967-020-02229-8
7. Landa-Galvan, H. V., Rios-Castro, E., Romero-Garcia, T., Rueda, A., & 
Olivares-Reyes, J. A. (2020). Metabolic syndrome diminishes 
insulin-induced Akt activation and causes a redistribution of 
Akt-interacting proteins in cardiomyocytes. PloS one, 15(1), 
e0228115. https://doi.org/10.1371/journal.pone.0228115
8. Assefa, T., Zhang, J., Chowda-Reddy, R. V., Moran Lauter, A. N., Singh, 
A., O'Rourke, J. A., Graham, M. A., & Singh, A. K. (2020). 
Deconstructing the genetic architecture of iron deficiency chlorosis in 
soybean using genome-wide approaches. BMC plant biology, 20(1), 
42. https://doi.org/10.1186/s12870-020-2237-5
9. Herzberg, D., Strobel, P., Müller, H., Meneses, C., Werner, M., & 
Bustamante, H. (2020). Proteomic profiling of proteins in the dorsal 
horn of the spinal cord in dairy cows with chronic lameness. PloS 
one, 15(1), e0228134. https://doi.org/10.1371/journal.pone.0228134
