Identifier of an NCBI GEO dataset. It must begin with “GSE” followed by a numeric code (e.g. GSE10325). A GSE file is a specific NCBI GEO format that contains a gene expression matrix (rows: genes or probe sets; columns: samples) and all the information about the experiment, including the annotation file and the sample information. For more details, visit: http://www.ncbi.nlm.nih.gov/geo/info/datasets.html .
This web tool allows you to include your own data in the analysis. To upload your data, you must use the correct format (Figure 1). The data must be saved in plain text format (.txt or .tsv), where the first row contains “ID” in column one, followed by the sample counter in the next columns. The first value of the second row must be “NA”, followed by a categorical vector (“0” for group 1, “1” for group 2 and “X” to exclude the sample from the analysis). The remaining rows contain the gene identifiers or probe identifiers and, in the following columns, their expression values.
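As a small illustration of the identifier rule above, the following Python sketch checks that a string starts with “GSE” and continues with digits only. The function name `is_gse_id` is hypothetical, and requiring upper-case letters is an assumption; the tool itself may be more or less strict.

```python
import re

# Assumed pattern: the literal prefix "GSE" followed by one or more digits.
GSE_PATTERN = re.compile(r"GSE\d+")

def is_gse_id(identifier):
    """Return True if the whole string looks like a GEO series identifier."""
    return GSE_PATTERN.fullmatch(identifier) is not None

print(is_gse_id("GSE10325"))  # identifier used as the example in the text
print(is_gse_id("GSM10325"))  # wrong prefix
```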
Figure 1: Data format to upload
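The layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: columns are taken to be tab-separated, the “sample counter” is taken to be a running number, and the gene symbols, expression values and function names (`build_upload_table`, `to_tsv`) are made up for the example.

```python
import csv
import io

def build_upload_table(samples, groups, expression):
    """Return the rows of the upload file: header, group row, then one row per gene."""
    if len(samples) != len(groups):
        raise ValueError("one group label per sample is required")
    if not set(groups) <= {"0", "1", "X"}:
        raise ValueError("group labels must be '0', '1' or 'X'")
    rows = [["ID"] + list(samples), ["NA"] + list(groups)]
    for gene_id, values in expression.items():
        if len(values) != len(samples):
            raise ValueError(f"{gene_id}: expected {len(samples)} values")
        rows.append([gene_id] + [str(v) for v in values])
    return rows

def to_tsv(rows):
    """Serialize the rows as tab-separated plain text (separator is an assumption)."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical data: four samples, the last one excluded with "X".
samples = ["1", "2", "3", "4"]
groups = ["0", "0", "1", "X"]
expression = {
    "TP53": [7.1, 6.9, 8.4, 7.7],
    "BRCA1": [5.2, 5.0, 4.1, 4.8],
}

rows = build_upload_table(samples, groups, expression)
tsv_text = to_tsv(rows)
print(tsv_text)
```

The resulting text can be saved with a .txt or .tsv extension before uploading.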
Table 1 lists the GEO platforms supported by ImaGEO. In addition, gene symbols can be used as gene identifiers for datasets that you upload yourself.
Table 1. Platforms admitted.
A meta-analysis is a statistical analysis that combines the results of multiple scientific studies. Gene expression meta-analysis combines expression datasets from different sources. There are two major goals in meta-analysis studies: (1) to combine the same experimental condition across different studies in order to increase the sample size and the statistical power; (2) to compare different experimental conditions (e.g. different diseases) in order to discover common and distinct biomarkers.
Effect size can refer to different quantitative measures of the strength of a phenomenon in different groups. In our case, the effect size is the difference between two group means (e.g. case and control, case1 and case2, etc.) divided by the standard deviation, which makes the values combinable and comparable across different studies. It is divided into two specific methods, the fixed-effects model and the random-effects model.
The choice between the two methods depends on the heterogeneity of the data. In the context of gene expression meta-analysis, a fixed-effects model identifies the genes with the strongest effect in the studies analyzed, while a random-effects model attempts to identify the genes with the strongest average effect in a hypothetical population of studies. Although the latter is theoretically preferable, it might not be optimal if, for example, a source of heterogeneity is not expected to occur again in future data. In this context, a heterogeneity test can help to decide which method is the most suitable.
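The ideas above can be sketched for a single gene in Python. This is a minimal illustration, not the tool's implementation: it assumes the pooled standard deviation as the denominator of the effect size, inverse-variance weights for the fixed-effects estimate, Cochran's Q as the heterogeneity statistic and the DerSimonian-Laird estimator for the between-study variance. All function names are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(cases, controls):
    """Return (d, var_d): difference of group means over the pooled SD, and its approximate variance."""
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    d = (mean(cases) - mean(controls)) / math.sqrt(pooled_var)
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))
    return d, var_d

def combine_effects(effects, variances):
    """Combine per-study effect sizes with fixed-effects and random-effects models."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    # Fixed effects: inverse-variance weighted average.
    fixed = sum(w * d for w, d in zip(weights, effects)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effects estimate.
    q = sum(w * (d - fixed) ** 2 for w, d in zip(weights, effects))
    # DerSimonian-Laird estimate of the between-study variance (never negative).
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effects: the weights also account for the between-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    random_effect = sum(w * d for w, d in zip(re_weights, effects)) / sum(re_weights)
    return {"fixed": fixed, "random": random_effect, "Q": q, "df": k - 1, "tau2": tau2}

# Hypothetical expression values of one gene in two studies.
study_a = standardized_mean_difference([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
study_b = standardized_mean_difference([5.0, 6.0, 7.5, 8.0], [3.0, 4.0, 4.5, 5.0])
summary = combine_effects([study_a[0], study_b[0]], [study_a[1], study_b[1]])
print(summary)
```

Under the hypothesis of no heterogeneity, Q approximately follows a chi-square distribution with `df` degrees of freedom, so a large Q relative to that distribution points towards the random-effects model.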
This method integrates the P-values from the individual analyses into a combined P-value. Several methods are available.
These methods tend to give more significant results than effect size methods, but with lower confidence (a greater number of potential false positives).
Combining P-values has the advantage of standardizing the associations from genomic studies to a common scale, which makes it possible to compare very heterogeneous datasets, for example datasets from different tissues.
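As an illustration of how P-values from individual analyses can be merged, the sketch below implements Fisher's method, one widely used combination rule. Choosing Fisher's method here is an assumption made for the example; it is not necessarily one of the methods offered by the tool. The function name is hypothetical, and the studies are assumed to be independent.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For an even number of
    degrees of freedom, the upper tail has a closed form, so no external
    library is needed.
    """
    if not pvalues:
        raise ValueError("at least one P-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("P-values must lie in the interval (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2
    # Upper tail: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)

# Hypothetical P-values of one gene in three datasets.
combined = fisher_combined_pvalue([0.04, 0.10, 0.03])
print(combined)
```

Repeating this calculation for every gene yields one combined P-value per gene, which would normally be corrected for multiple testing before interpretation.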
The Sample Selection tab shows four tables. The one at the top (Selected samples) shows the number of cases and controls selected for each dataset. Below this table are the buttons to change the dataset and to access the information about its samples. That information is shown in the second table, at the bottom left (Unassigned samples). In this table you have to select the samples and assign them, according to your own criteria, to the groups previously created (Cases/Controls in the example). The last two tables are specific to those groups. To assign samples, select them and click the right-pointing arrow of the corresponding table (Controls/Cases). To remove them from the analysis, select them again in the group table and click the left-pointing arrow.
Figure 2: Sample selection
If you have any doubts, questions or suggestions, you can write to us at bioinfo@genyo.es .