Principal component analysis for bacterial proteomic analysis II
Y-h. Taguchi, Chuo University
Akira Okamoto, Nagoya University
1. Introduction
2. Incubation condition of Streptococcus pyogenes & retrieval for proteomics data
3. PCA analysis of proteome
4. Biological meanings of obtained proteins
5. Summary & Conclusion
1. Introduction
Streptococcus pyogenes is ... normal bacterial flora, but it can also cause life-threatening diseases.
Thus, it is important to know what triggers S. pyogenes to cause such dangerous diseases.
→ In this study, we employ proteomic analysis of S. pyogenes during its growth phases under two distinct conditions.
2. Incubation condition of Streptococcus pyogenes & retrieval for proteomics data
37℃, until or 4, 6, 14 and 20 hours (OD660 = 0.40, 0.83, 0.92, and 0.90)
Under two distinct conditions:
1) shaking (sha): more oxidative stress
2) static (sta): ordinary condition
Fractions: cell (wc) and supernatant (snt) [separated by centrifuge]
Three biological replicates each
Retrieval of proteomic data
Mass spectrometry detection of fragmented proteins [by LTQ-Orbitrap XL + LC]
Protein identification by MASCOT software
%emPAI values (normalized amounts of proteins) are used for further analysis.
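The slides do not spell out how %emPAI is computed. A minimal sketch, assuming the usual definition (emPAI = 10^(observed peptides / observable peptides) - 1, then each protein expressed as a percentage of the sample total), could look like the following. The function names and the peptide counts are hypothetical and only illustrate the normalization.

```python
import numpy as np

def empai(n_observed, n_observable):
    """emPAI per protein: 10 ** (N_observed / N_observable) - 1."""
    ratio = np.asarray(n_observed, dtype=float) / np.asarray(n_observable, dtype=float)
    return 10.0 ** ratio - 1.0

def percent_empai(empai_values):
    """Express each protein's emPAI as a percentage of the sample total."""
    values = np.asarray(empai_values, dtype=float)
    return 100.0 * values / values.sum()

# Hypothetical peptide counts for three proteins in one sample (not study data).
observed = [4, 10, 1]
observable = [8, 10, 20]
pct = percent_empai(empai(observed, observable))
print(pct.round(2))
```

Because the values within a sample sum to 100, profiles from different samples become comparable before PCA.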
3. PCA analysis of proteome
Feature (protein) selection methods are similar to those in yesterday's presentation, “Refined blood-borne miRNome of human diseases via PCA-based feature extraction”, and thus will be skipped.
The reason we employ this method here differs from yesterday's reason (“selection of miRNA biomarker independent of selection of training/test sets”).
Here, the aim is to answer the question “What is significantly expressed during these processes?” without any prejudgement.
We would like to list “any” significant features in this experiment, e.g.,
Temporal significance
Incubation condition significance
Fraction significance
Or their combinations.....
[Schematic for each case: profiles of sta:wc, sta:snt, sha:wc and sha:snt against time]
Unsupervised methods like PCA are useful.
(Clustering may be OK, too, but it forces a hierarchical structure or a prejudged number of clusters.)
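Since the selection method itself is skipped in this talk, the following is only a rough sketch of how a PCA-based unsupervised selection might be set up, not the authors' exact procedure. It assumes a proteins x samples matrix, standardizes each sample, embeds proteins by their principal component scores, and picks the proteins farthest from the origin. The function name, the distance criterion and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pca_feature_extraction(X, n_components=2, n_select=23):
    """X: proteins x samples matrix.

    Returns indices of the selected proteins, the protein scores and the
    sample loadings. The selection rule (largest distance from the origin)
    is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    # Standardise each sample (column) to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD: Z = U S Vt; U * S embeds proteins, rows of Vt describe samples.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    # Rank proteins by their distance from the origin in the embedding.
    distance = np.linalg.norm(scores, axis=1)
    selected = np.argsort(distance)[::-1][:n_select]
    return selected, scores, loadings

# Synthetic stand-in data (not the study's data): 200 proteins x 16 samples,
# with proteins 0-4 raised in the first 8 ("shaking") samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
X[:5, :8] += 10.0
selected, scores, loadings = pca_feature_extraction(X, n_select=5)
print(sorted(selected.tolist()))
```

No class labels enter the computation, so temporal, condition, fraction or combined effects can all surface without being specified in advance.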
Results
PCA embeddings of samples
⇒ three clear clusters
⇒ What are representative proteins?
[Figure: PCA embedding of samples. Legend: sha05, sha07, sha14, sha20, sta04, sta06, sta14, sta20, each as wc and snt]
Mclust: Model-Based Clustering
The optimal model according to BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models.
best model: diagonal, equal shape with 9 components
Warning messages:
1: In hcEII(data = data) : # of observations <= data dimension
2: In summary.mclustBIC(Bic, data, G = G, modelNames = modelNames) : best model occurs at the min or max # of components considered
3: In Mclust(t(XX)) : optimal number of clusters occurs at max choice
Does not work without feature selection..... Useless for feature selection … orz
PCA embeddings of samples with only the 23 selected proteins
⇒ Configuration is conserved
⇒ These 23 proteins are critical for this configuration
We repeated the same procedure after removing these 23 proteins, which yielded an additional 30 proteins.
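In the slides, conservation of the configuration is judged by eye from the two embeddings. One possible way to put a number on it, offered here only as an illustrative assumption, is to correlate the pairwise sample distances of the embedding computed from all proteins with those computed from the selected proteins only. The helper names and the synthetic data below are hypothetical.

```python
import numpy as np

def sample_scores(X, n_components=2):
    """PCA scores of samples (columns of a proteins x samples matrix)."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=1, keepdims=True)  # centre each protein across samples
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_components].T * S[:n_components]

def pairwise_distances(E):
    """Distances between all pairs of rows of E (upper triangle only)."""
    diff = E[:, None, :] - E[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(E.shape[0], k=1)]

def configuration_similarity(X, selected):
    """Correlation of sample distances: all proteins vs. selected proteins."""
    X = np.asarray(X, dtype=float)
    full = pairwise_distances(sample_scores(X))
    subset = pairwise_distances(sample_scores(X[selected]))
    return float(np.corrcoef(full, subset)[0, 1])

# Synthetic stand-in data (not the study's data): 200 proteins x 16 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))
X[:5, :8] += 10.0      # proteins 0-4: raised in the first 8 samples
X[5:10, ::2] += 10.0   # proteins 5-9: raised in every other sample
sim_selected = configuration_similarity(X, np.arange(10))
sim_other = configuration_similarity(X, np.arange(100, 110))
print(sim_selected, sim_other)
```

Pairwise distances are used because they do not change when a principal axis flips sign or the axes rotate, which can happen between two separate PCA runs.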
4. Biological meanings of obtained proteins
Peroxiredoxin reductase (SPy2079:AhpC), which is estimated to be involved in oxygen metabolism and hydrogen peroxide decomposition, is found in the shaking culture condition rather than the static condition. An increased amount of AhpC under shaking seems reasonable, because the shaking condition induces higher oxygen stress.
This is just an example. Almost all selected proteins are biologically reasonable.
5. Summary & Conclusion
・Proteomic analysis is applied to the growth phases of S. pyogenes.
・A PCA-based, unsupervised feature extraction method is applied to the proteomics data.
・Feature (protein) extraction based upon PCA extracted biologically important proteins.
・At the moment, we have not yet figured out the trigger of disease, but more extensive research will enable us to understand it.