Upload and Analyze Your Data Tutorial

Overview:


Use IPA to upload and analyze your 'omics data.

Scenario:
You have RNA-seq, microarray, miRNA, proteomic, genomic, SNP, or metabolic data that you want to analyze with IPA. You need to know the rules, restrictions, and best practices for preparing the data for upload into IPA, and how to start an analysis.

Tasks:

- Format the data in a way that IPA can upload.
- Set Upload options and assign the ID and Observation columns (measurement values).
- Set analysis parameters: If the datasets are large, adjust the expression cutoff value(s) to restrict their size.
- Run the analysis.

To understand more about interpreting the results of the analysis, see the Analysis Results Tutorial.


1) Prepare your data for uploading (in spreadsheet format).

a) If your data is not in an Excel spreadsheet or tab-delimited text file, transfer your data into one. IPA can also upload Cuffdiff-formatted files directly. See Cuffdiff file import.
b) If using Excel, perform basic calculations in the spreadsheet if necessary. Example: compute the average ratio for experiment samples vs. control samples, and p-values for replicates. Ideally, upload fold changes, log ratios, or log fold changes (which are the same thing as log ratios).
c) Make sure there is only one header row. (IPA can be set to ignore the first row when doing the import.)
d) Move the molecule IDs to the first column. If IDs are in the first column, IPA will scan down approximately 100 rows to guess the ID type(s) in the column. See Data Upload definitions for the types of IDs that are recognized.
e) IPA uses text in the header row to guess the column types. For example, if it detects the text "fold change", it will assign the column to Expr Fold Change.



f) IPA allows a maximum of 20 observations per imported file. An observation is one "comparison" between an experiment and a control. For example, a dataset with three time points, each compared to a 0 time point and each with its own set of fold changes and p-values, contains three observations. See Data Upload definitions for more on this topic. If your data has more than 20 observations, either reduce the data to 20 observations in the spreadsheet or choose up to 20 to import during the upload process.
g) IPA allows up to 8 "measurement values" per observation. For example, fold change, p-value, and average expression intensity equate to three measurements. Any beyond 8 are excluded during the upload process.
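
The preparation rules in this step can be sketched as a short Python script. This is only an illustration and not part of IPA: the function names, the input layout (one experiment mean and one control mean per molecule), the header labels, and the sample values are all assumptions made for the example.

```python
import csv
import io
import math

def build_upload_rows(records):
    """Return rows for an upload file: one header row, molecule IDs in column 1.

    `records` is a list of (molecule_id, experiment_mean, control_mean, p_value)
    tuples. This input layout is hypothetical.
    """
    # A single header row with descriptive text; IPA uses header text
    # to guess the column types.
    rows = [["ID", "Log Ratio", "p-value"]]
    for molecule_id, experiment_mean, control_mean, p_value in records:
        # Log2 ratio of experiment vs. control.
        log_ratio = math.log2(experiment_mean / control_mean)
        rows.append([molecule_id, f"{log_ratio:.4f}", f"{p_value:g}"])
    return rows

def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Placeholder values, not real measurements.
example = [("TP53", 200.0, 100.0, 0.001), ("MYC", 50.0, 200.0, 0.03)]
text = to_tab_delimited(build_upload_rows(example))
print(text)
```

Saving `text` to a .txt file gives a tab-delimited file with one header row, the IDs in the first column, and two measurement columns that belong to a single observation.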
 

2) Launch IPA.

3) Upload your data into IPA.

a) Select File>Upload Dataset... from the menu.
b) Select the dataset that you modified in Step 1.
c) Click Open.

Your data appears in the "Dataset Upload - …" window.
d) Select "Flexible Format" if it's not already selected. (The other options are for legacy support.)
e) Try clicking the Infer Observations button to see if IPA can guess the columns in your dataset. If that works well, skip to the Save step below. Otherwise, click the button again and continue with the next step.
f) If you have one header row in your dataset, make sure the Contains Column Header option is set to Yes.
g) Select the appropriate Identifier Type(s). Note: IPA will guess the appropriate ID types from the IDs in the first column, but it is always a good idea to double-check. Do not select all identifier types at once; this can lead to mis-mapping.
h) If the data came from a commonly known array platform, select it from the dropdown menu; otherwise, leave the value "Non Specified/applicable". Setting the array platform properly here automatically selects the same platform as the “reference” on the Create Core Analysis page and improves the accuracy of the p-value calculations used in the analysis.
i) Select the appropriate ID and Observation labels from the dropdown menus for each of the columns.

i. Set the column with your molecule identities to ID. Check the Dataset Summary tab to view the breakdown of mapped vs. unmapped molecules from your uploaded data.
ii. Set the column with the first observation value to Observation 1.


A second dropdown menu appears.


iii. Set the new dropdown menu to the measurement type (e.g., Expr Fold Change or Expr p-value).
iv. Repeat the last two steps for each of the observations if there is more than one.


Note: If there is more than one measurement type per observation, be sure to assign each batch of measurement columns to the same observation. For example, if the dataset has a log ratio column and a p-value column for the same comparison, this is NOT two observations: do NOT assign one column to Observation 1 and the other to Observation 2. It is one observation (Observation 1) with two measurements.

If there are multiple observations, there are shortcuts available to make assigning the batches easier (as described below).

Shortcuts:
- Ignore: Right-click a column and select Ignore.
- Repeat Selection: Select one or more columns, right-click the selected columns, and select Repeat Selection. The assignments made for the selected columns will be repeated for the columns to the right of the selected ones.
- Header Names -> Observation Name: This selection takes the name found in the column header and uses it to label the observation.
- Group In: Select more than one column, right-click the selected columns, and select Group In. The selected columns will be grouped as an observation.


v. Leave any extraneous columns set to Ignore.


j) You can customize the names of the column headers.


i. Click the EDIT OBSERVATION NAMES button. This is especially important when uploading a multi-observation dataset.
ii. Select the name from the dropdown menu or type in a new one.
iii. Click OK.


k) Click Save at the bottom of the window.


A Save Dataset dialog opens.


l) Click New Project to create a new project.
m) Enter a name for your dataset.
n) Click Save.


IPA saves the data in the Project Manager under the project folder that you selected.
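
Before uploading, it can help to check the planned column assignments against the limits from step 1 (20 observations per file, 8 measurement values per observation). The sketch below is a hypothetical helper, not an IPA feature: the dictionary layout, the function names, and the column headers are assumptions, and the ID column is left out of the mapping.

```python
MAX_OBSERVATIONS = 20  # per-file limit stated in step 1
MAX_MEASUREMENTS = 8   # per-observation limit stated in step 1

def group_columns(assignments):
    """Group column headers by the observation they are assigned to.

    `assignments` maps a column header to an (observation, measurement type)
    pair, or to None for a column left set to Ignore.
    """
    observations = {}
    for header, assignment in assignments.items():
        if assignment is None:  # column is ignored during upload
            continue
        observation, measurement = assignment
        observations.setdefault(observation, []).append((header, measurement))
    return observations

def find_problems(observations):
    """Return a list of messages for any limit that is exceeded."""
    problems = []
    if len(observations) > MAX_OBSERVATIONS:
        problems.append(
            f"{len(observations)} observations exceed the limit of {MAX_OBSERVATIONS}"
        )
    for name, columns in observations.items():
        if len(columns) > MAX_MEASUREMENTS:
            problems.append(
                f"{name} has {len(columns)} measurements; the limit is {MAX_MEASUREMENTS}"
            )
    return problems

# Three time points, each with a fold change and a p-value: three observations
# with two measurements each, not six observations.
assignments = {
    "1h fold change": ("Observation 1", "Expr Fold Change"),
    "1h p-value": ("Observation 1", "Expr p-value"),
    "4h fold change": ("Observation 2", "Expr Fold Change"),
    "4h p-value": ("Observation 2", "Expr p-value"),
    "24h fold change": ("Observation 3", "Expr Fold Change"),
    "24h p-value": ("Observation 3", "Expr p-value"),
    "Notes": None,
}
observations = group_columns(assignments)
problems = find_problems(observations)
print(len(observations), problems)
```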
 

 

4) Start a Core Analysis on your dataset.

a) Select File>New>Core Analysis… from the application menu.

 
A Create Core Analysis dialog appears.


b) Select the dataset that you saved in step 3 and click Next.


5) Set Filters and General Settings for Analysis parameters.


a) Set the Reference Set parameter to the molecule set (reference set) that should be treated as the complete universe when calculating statistical significance. You may have already set this when uploading the data.
b) You may either leave the Network Analysis parameters, Optional Analysis, Data Source, Confidence, Species, and Tissue & Cell Lines options on their default settings or apply filters of your choice.

 


6) Set Cutoffs.

It is important to analyze the most significant molecules in your dataset. For example, if you uploaded the data for an entire microarray, you need to set a cutoff so that IPA analyzes only the significantly differentially expressed genes.

a) Enter values for the Cutoffs.
b) Click the Recalculate button to recalculate the "analysis ready" molecules (i.e., those that passed your cutoffs).


Note: We recommend keeping the number of "analysis ready" molecules below about 3000 in order to focus on the most relevant molecules in your dataset. This will minimize noise in the results. Make the cutoff values more stringent to restrict how many molecules are analyzed. Also, the maximum number of molecules that can be analyzed in one analysis is 8000. You may use the expression value cutoff and also the analysis filter options to keep the number of “analysis ready” molecules within the allowable limits.
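
To get a rough idea of how a cutoff will affect dataset size before uploading, you can count the passing molecules yourself. The sketch below is an assumption-laden illustration, not IPA's own logic: it assumes log ratio and p-value measurements and inclusive comparisons, and all names are hypothetical. The count shown by the Recalculate button remains the authoritative one.

```python
RECOMMENDED_MAX = 3000  # recommended upper bound from the note above
HARD_MAX = 8000         # maximum molecules per analysis from the note above

def count_analysis_ready(rows, min_abs_log_ratio, max_p_value):
    """Count molecules whose |log ratio| and p-value pass both cutoffs.

    `rows` is a list of (log_ratio, p_value) pairs.
    """
    return sum(
        1
        for log_ratio, p_value in rows
        if abs(log_ratio) >= min_abs_log_ratio and p_value <= max_p_value
    )

def describe(count):
    """Compare a molecule count with the limits from the note above."""
    if count > HARD_MAX:
        return "over the analysis limit"
    if count >= RECOMMENDED_MAX:
        return "above the recommended size"
    return "within the recommended size"

# Placeholder values, not real measurements.
rows = [(1.5, 0.001), (-2.0, 0.04), (0.2, 0.0001), (3.0, 0.2)]
ready = count_analysis_ready(rows, min_abs_log_ratio=1.0, max_p_value=0.05)
print(ready, describe(ready))
```

If the count is too high, tighten one cutoff at a time and recount until the number falls into the recommended range.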



7) Click the Run Analysis button at the bottom right of the window.


The Start Analysis dialog appears.


a) Select the same project that contains your dataset.
b) Set the analysis name. It is often helpful to include the cutoff values used in the analysis in the name.
c) Click OK.
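
One way to keep analysis names consistent is to build them from the dataset name and the cutoffs. The helper below is a hypothetical naming convention, not something IPA requires; the function name, labels, and separator are assumptions.

```python
def analysis_name(dataset_name, cutoffs):
    """Build an analysis name that records the cutoff values used.

    `cutoffs` maps a short label to its numeric cutoff value.
    """
    parts = [f"{label}{value:g}" for label, value in cutoffs.items()]
    return "_".join([dataset_name] + parts)

# Hypothetical dataset name and cutoffs.
name = analysis_name("liver_timecourse", {"logratio": 1, "p": 0.05})
print(name)
```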


An Analysis Running dialog appears to tell you that the analysis is in progress. When the analysis is complete, an email is sent to you, and the name of the analysis appears in bold text in the Project Manager window.


8) Open the analysis and explore the results.

a) When the analysis is finished, open it by double-clicking it in the Project Manager. If you analyzed a multi-observation dataset, the analysis will appear in a folder.

Results: See Analysis Results Tutorial for help on understanding analysis results.