Data Upload Definitions

IPA is designed for the import of processed 'omics data. An ideal dataset is one that represents a comparison (or contrast) between two groups of 'omics samples, for example between treated samples and untreated controls, or samples from a disease state vs. healthy controls, or between some time point and an earlier time point. 
The comparison can capture changes in gene expression, protein phosphorylation, metabolite levels, variant loss or gain, and so on. As explained in greater detail below, the dataset should start with a column of molecule identifiers (for genes, proteins, or chemicals), ideally followed by additional columns of "measurements" such as fold changes and p-values.

Single vs Multiple Observation

An Observation is a list of molecule identifiers and their corresponding measurement values (e.g. fold change, p-values, etc) for a given experimental condition relative to the control.

A dataset file may contain a single observation or multiple observations. A Single Observation dataset contains only one experimental comparison (for example, mutant vs. wild-type, or treated vs. mock-treated). A Multiple Observation dataset contains more than one experimental condition (e.g. a time-course experiment with multiple time points, or a dose-response experiment with multiple doses) and can be uploaded into IPA as a single file (e.g. text or Excel).

For multiple-observation datasets, you can upload a maximum of 20 observations in a single file, and each observation may have up to 8 measurement value columns. For example, if you are uploading expression data and also have information about variants in the underlying genes, each observation might have columns for expression fold change, expression p-value, expression FDR, expression intensity in the control, expression intensity in the experimental condition, and variant loss/gain. That is 6 measurement value columns per observation.

In a multiple-observation file, every observation must have the same number and types of columns. The exception is the Phospho Site and Note columns: you can have EITHER one such column that applies to all observations, OR one such column per observation. If you use a single Phospho Site and/or Note column in a multiple-observation dataset, assign it to any one of the observations, and IPA will apply it to all other observations in that file.
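The layout rules above can be checked before upload. The following Python sketch is a minimal, hypothetical validator: the `Observation` class and `check_layout` function are illustrative names, not part of IPA. It assumes that only the listed measurement columns count toward the limit of 8; whether IPA also counts Phospho Site or Note columns toward that limit is not stated here.

```python
from dataclasses import dataclass, field

MAX_OBSERVATIONS = 20         # per file, as stated above
MAX_MEASUREMENT_COLUMNS = 8   # per observation
SHAREABLE_COLUMNS = ("Phospho Site", "Note")


@dataclass
class Observation:
    """Hypothetical description of the columns assigned to one observation."""
    name: str
    measurements: list  # measurement value types, e.g. ["Exp Fold Change", "Exp p-value"]
    shareable: set = field(default_factory=set)  # any of SHAREABLE_COLUMNS


def check_layout(observations):
    """Return a list of problems; an empty list means the layout looks valid."""
    if not observations:
        return ["the file contains no observations"]
    problems = []
    if len(observations) > MAX_OBSERVATIONS:
        problems.append(f"{len(observations)} observations; the maximum is {MAX_OBSERVATIONS}")
    first = observations[0]
    for obs in observations:
        if len(obs.measurements) > MAX_MEASUREMENT_COLUMNS:
            problems.append(f"{obs.name}: {len(obs.measurements)} measurement columns; "
                            f"the maximum is {MAX_MEASUREMENT_COLUMNS}")
        # Every observation needs the same number and types of columns.
        if sorted(obs.measurements) != sorted(first.measurements):
            problems.append(f"{obs.name}: columns differ from those of {first.name}")
    # Phospho Site / Note: one column for all observations, or one per observation.
    for column in SHAREABLE_COLUMNS:
        count = sum(column in obs.shareable for obs in observations)
        if count not in (0, 1, len(observations)):
            problems.append(f"{column}: found in {count} of {len(observations)} observations")
    return problems


# Usage: a three-point time course with a single shared Note column.
time_course = [Observation(f"{h}h", ["Exp Fold Change", "Exp p-value"]) for h in (1, 6, 24)]
time_course[0].shareable.add("Note")
print(check_layout(time_course))  # []
```

A Note column assigned to two of three observations is reported, because it is neither shared by all nor present in each.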
 

Species Supported in IPA

The following species are supported with full content in IPA.

  • Human
  • Mouse
  • Rat

For the following species, we support dbSNP, Entrez Gene, GenBank, GenPept, NCBI GI number, UniGene, and SwissProt/UniProt identifiers, as well as some array-specific identifiers. These identifiers are mapped via HomoloGene to the corresponding human, mouse, and rat orthologs in the Ingenuity Knowledge Base. Therefore, the content will be specific to human, mouse, and rat.

  • Arabidopsis thaliana
  • Bos taurus (bovine)
  • Caenorhabditis elegans
  • Gallus gallus (chicken)
  • Pan troglodytes (chimpanzee)
  • Danio rerio (zebrafish)
  • Canis lupus familiaris (canine)
  • Drosophila melanogaster
  • Macaca mulatta (Rhesus Monkey)
  • Saccharomyces cerevisiae
  • Schizosaccharomyces pombe
  • Ovis aries (sheep)

 
Identifier Type

Identifiers are unique public or vendor annotations that represent a gene, protein or chemical. IPA takes items found in the Identifier column and attempts to map them to molecules that exist in our Knowledge Base.

A dataset file may contain more than one identifier type. If it does, select all of the identifier types present in the file from the dropdown menu. However, do not simply select every identifier type: this can lead to incorrect mapping when different identifier types happen to share the same IDs.
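To see why selecting every identifier type can cause incorrect mapping, the sketch below matches an identifier against rough format patterns based on the example formats in the identifier table. The regular expressions and the `candidate_types` function are illustrative assumptions, not the rules IPA uses; the point is that a bare integer such as 5757 fits several identifier types at once.

```python
import re

# Rough format patterns based on the example formats in the identifier table.
# They are illustrative assumptions, not the matching rules IPA applies.
ID_PATTERNS = {
    "Entrez Gene": r"\d+",
    "GI Number": r"\d+",
    "PubChem CID": r"\d+",
    "Applied Biosystems": r"\d+",
    "CAS Registry Number": r"\d{2,7}-\d{2}-\d",
    "dbSNP": r"rs\d+",
    "Ensembl": r"ENS[A-Z]*G\d{11}",
    "RefSeq": r"[NX][MRP]_\d+(\.\d+)?",
    "HMDB": r"HMDB\d+",
    "KEGG": r"C\d{5}",
    "miRBase (mature, MIMAT)": r"MIMAT\d+",
    "miRBase (name)": r"[a-z]{3}-(miR|let)-\S+",
    "UniProt/Swiss-Prot": r"[OPQ]\d[A-Z0-9]{3}\d",
    "UniGene": r"[A-Z][a-z]{1,3}\.\d+",
    "Affymetrix": r"\S+_at",
}


def candidate_types(identifier):
    """Return every identifier type whose approximate format fits `identifier`."""
    value = identifier.strip()
    return [name for name, pattern in ID_PATTERNS.items() if re.fullmatch(pattern, value)]


for raw in ["5757", "rs12345678", "MIMAT0003214", "NM_010603"]:
    print(raw, candidate_types(raw))
```

Prefixed identifiers such as rs12345678 or MIMAT0003214 match one type, while 5757 matches Entrez Gene, GI Number, PubChem CID, and Applied Biosystems. If all of those types were selected, an integer meant as a PubChem CID could be mapped as a gene.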

The application accepts the following identifier types:

Identifier Type                                      Example Format
Affymetrix                                           D26439_at
Affymetrix SNP ID [1]                                SNP_A-1507322
Affymetrix Transcript Cluster ID (Exon Arrays) [2]   3758318
Agilent                                              A_68_P33008885
Applied Biosystems (Life Technologies)               124112
CAS Registry Number                                  50-28-2
CodeLink                                             GE102183
dbSNP [3]                                            rs12345678
Ensembl                                              ENSG00000169710
Entrez Gene (Locus Link)                             11287
GenBank                                              AK025375
Gene Symbol - Mouse (Entrez Gene)                    Cyp3a5
Gene Symbol - Rat (Entrez Gene)                      Cyp3a5
Gene Symbol - Human (HUGO/HGNC, Entrez Gene)         CYP3A5
GenPept                                              BAC36826
GI Number                                            6225801
Human Metabolome Database (HMDB)                     HMDB10075
Illumina [4]                                         GI_31543715 or NM_010603
Ingenuity                                            Internal to Ingenuity; not available for use outside of IPA
International Protein Index                          IPI00123456
KEGG                                                 C11476
Life Technologies (Applied Biosystems)               124112
miRBase ID - Sanger [5]                              MIMAT0003214 (recommended) or mmu-miR-483
PubChem CID                                          5757
RefSeq                                               NM_010603
UCSC (hg18)                                          uc001mng
UCSC (hg19)                                          uc010fth
UniGene                                              Hs.111680
UniProt/Swiss-Prot                                   P08908


1. Affymetrix SNP ID mapping to Entrez Gene identifiers (and therefore to Ingenuity genes) was provided by Affymetrix. Affymetrix SNP ID mapping files are also available on the Affymetrix NetAffx site. Note that many Affymetrix SNP IDs do not map to a single gene but instead lie within the region of several genes. Currently, only Affymetrix SNP IDs that map unambiguously to a single gene are considered mappable in IPA; Affymetrix SNP IDs that map to multiple genes will appear in the Unmapped tab of your Dataset Mapping file.


2. Affymetrix Human, Mouse, and Rat Exon 1.0 ST arrays are supported at the level of the Transcript Cluster ID. Uploading data from an exon array into IPA runs an analysis similar to that for a traditional Affymetrix expression array dataset, since IPA does not distinguish molecules at the level of exons. In addition, only the core (22K) transcript probe set is supported, not the extended (133K) or full (266K) set, because the core set is the one that maps to genes.


3. You can upload SNP array data from Illumina or Affymetrix platforms, or upload NCBI dbSNP IDs for SNPs of interest directly. IPA automatically maps SNPs falling within or near gene-coding regions to the relevant gene orthologs for subsequent pathway analysis and exploration. A SNP is mapped to a gene if it falls within the gene coding region, or within 2 kb upstream or 0.5 kb downstream of the gene coding region.


4. Older Illumina probe identifiers (IDs) that have an integer format (e.g. 12345) are not unique across the Illumina expression arrays for different species and are therefore no longer supported in IPA. For gene mapping, we recommend using either the GI number or the RefSeq identifier (usually listed under Accession). The affected arrays are older arrays (prior to December 2006) whose Illumina probe IDs have an integer format. Newer arrays use a different format that starts with ILMN_ and are not affected.


5. For microRNA identifiers, we recommend using the mature miRNA accessions (format MIMAT###) because they are stable identifiers (i.e., they should rarely change). miRNA names (format: mmu-miR-483 for mouse or hsa-miR-483 for human) can also be used for data upload, but they change over time. We recommend not using names when the MIMAT accession is available, but because some miRNA arrays provide annotations only by name, we provide mappings for them as well.
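The SNP-to-gene rule in note 3 (inside the gene coding region, or up to 2 kb upstream or 0.5 kb downstream of it) amounts to an interval test. The sketch below is a hypothetical illustration: the `Gene` class, coordinates, and gene names are made up, and it assumes "upstream" follows the direction of transcription, so the 2 kb flank lies before the start on the + strand and after the end on the - strand.

```python
from dataclasses import dataclass

UPSTREAM_BP = 2_000    # 2 kb upstream (note 3)
DOWNSTREAM_BP = 500    # 0.5 kb downstream (note 3)


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int   # lowest coordinate of the coding region
    end: int     # highest coordinate of the coding region
    strand: str  # "+" or "-"


def mapping_window(gene):
    """Interval in which a SNP is attributed to `gene` (strand-aware assumption)."""
    if gene.strand == "+":
        return gene.start - UPSTREAM_BP, gene.end + DOWNSTREAM_BP
    return gene.start - DOWNSTREAM_BP, gene.end + UPSTREAM_BP


def genes_for_snp(chrom, pos, genes):
    """Symbols of every gene whose mapping window contains the SNP position."""
    hits = []
    for gene in genes:
        low, high = mapping_window(gene)
        if gene.chrom == chrom and low <= pos <= high:
            hits.append(gene.symbol)
    return hits


genes = [Gene("GENE_A", "1", 10_000, 20_000, "+"), Gene("GENE_B", "1", 30_000, 40_000, "-")]
print(genes_for_snp("1", 8_500, genes))   # ['GENE_A']  1.5 kb upstream of a + strand gene
print(genes_for_snp("1", 41_500, genes))  # ['GENE_B']  1.5 kb upstream of a - strand gene
print(genes_for_snp("1", 25_000, genes))  # []
```

A position that falls in several windows returns several symbols, which is the kind of ambiguity note 1 describes for Affymetrix SNP IDs.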


Measurement Value

A measurement value is a numerical value associated with the identifier in your omics experiment. For example, in a differential expression experiment, you may have the fold change for a gene as well as the p-value for the significance of that change in expression. These would be two different "measurement values" for that gene. The types of values and their expected ranges are listed below:
 

Measurement Value Type                  Used for                            Expected Values
Exp Ratio [1]                           Expression data, metabolomics data  (0 to +INF)
Exp Fold Change [2]                     Expression data, metabolomics data  (-INF to -1) and (1 to +INF)
Exp Log Ratio                           Expression data, metabolomics data  (-INF to +INF)
Exp p-value                             Expression data, metabolomics data  (0 to 1)
Exp False Discovery Rate (q-value)      Expression data, metabolomics data  (0 to 100), though typically 0 to 1
Exp Intensity / RPKM / FPKM / Counts    Expression data, metabolomics data  (0 to +INF)
Exp Other (normalized around zero)      Expression or other                 (-INF to +INF)
Variant Loss/Gain                       Variant data (gene level) or other  -2, -1, 0, 1, 2
                                        categorical gene-level data
Variant ACMG Classification             Variant data (gene level)           -2, -1, 0, 1, 2
Phospho Ratio [1]                       Phosphoproteomics data              (0 to +INF)
Phospho Fold Change [2]                 Phosphoproteomics data              (-INF to -1) and (1 to +INF)
Phospho Log Ratio                       Phosphoproteomics data              (-INF to +INF)
Phospho p-value                         Phosphoproteomics data              (0 to 1)
Phospho False Discovery Rate (q-value)  Phosphoproteomics data              (0 to 100), though typically 0 to 1
Phospho Intensity                       Phosphoproteomics data              (0 to +INF)
Phospho Site                            Phosphoproteomics data              Up to 256 characters per row. Use
                                                                            alphanumeric characters and
                                                                            _ ( ) @  ; - > < : , . ! ? # ^ *
Absent                                  All                                 A
Override                                All                                 X
Note                                    All                                 Up to 256 characters per row. Use
                                                                            alphanumeric characters and
                                                                            _ ( ) @  ; - > < : , . ! ? # ^ *

1.    Ratio values are converted to fold changes upon upload into IPA. Ratios between 0 and 1 are converted to negative fold changes (i.e., downregulation) by taking their negative inverse (-1/x). For example, a ratio of 0.5 is converted to a fold change of -2, meaning that the molecule is 2-fold downregulated. Ratio values greater than 1 are equivalent to positive fold changes and are left unchanged. Note that fold changes of exactly 1 or exactly -1 equate to “no change”.
2.    Log ratio = log fold change. If your data is labeled “log fold change”, “log2 fold change”, etc., map it to a log ratio measurement in IPA. Do not map log fold change to fold change.
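The conversions in notes 1 and 2 can be written out as a short Python sketch. `ratio_to_fold_change` follows note 1 directly; `log_ratio_to_fold_change` is an illustration that assumes you know the log base of your data (base 2 is only this sketch's default) and does not describe how IPA stores log ratios internally. Both function names are hypothetical.

```python
def ratio_to_fold_change(ratio):
    """Convert a ratio in (0, +INF) to a signed fold change (note 1)."""
    if ratio <= 0:
        raise ValueError("a ratio must be greater than 0")
    if ratio < 1:
        return -1.0 / ratio   # 0.5 -> -2.0, i.e. 2-fold down-regulated
    return float(ratio)       # ratios >= 1 already equal the fold change


def log_ratio_to_fold_change(log_ratio, base=2.0):
    """Convert a log ratio (= log fold change) to a signed fold change.

    Assumes the log base is known; base 2 is only a default for this sketch.
    """
    return ratio_to_fold_change(base ** log_ratio)


print(ratio_to_fold_change(0.5))       # -2.0
print(log_ratio_to_fold_change(1.0))   # 2.0
print(log_ratio_to_fold_change(-1.0))  # -2.0
```

This also shows why note 2 matters: a log2 fold change of 1.0 mapped as a fold change would be read as exactly 1, i.e. "no change", although the molecule actually doubled.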


It is important to assign the right type of measurement value to your dataset columns. For example, if you have phosphoproteomics data, do not assign it as expression data. IPA treats the measurement types differently.

The application assumes that molecules with high expression values (or low p-values) are more important, and focuses on these for network generation. Expression values can range from 0 to -INF or +INF (infinity); the application uses the magnitude (absolute value) of the expression value to help prioritize molecules for inclusion in interaction networks.
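The idea of prioritizing by magnitude can be illustrated with a simple ranking. This is a minimal sketch of that one idea only, not IPA's network-generation algorithm; the function names and gene names are hypothetical.

```python
def prioritize_by_magnitude(fold_changes):
    """Molecules ordered by |fold change|, largest first; the sign is ignored."""
    return sorted(fold_changes, key=lambda molecule: abs(fold_changes[molecule]), reverse=True)


def prioritize_by_p_value(p_values):
    """Molecules ordered by p-value, smallest (most significant) first."""
    return sorted(p_values, key=p_values.get)


print(prioritize_by_magnitude({"GENE_A": 1.8, "GENE_B": -6.0, "GENE_C": 3.2}))
# ['GENE_B', 'GENE_C', 'GENE_A']
```

A strongly downregulated molecule (-6.0) ranks ahead of a moderately upregulated one (3.2), because only the magnitude is used.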

All molecules whose measurement values pass the cutoffs you set when setting up an analysis are eligible for analysis. For most value types, a molecule must be above the cutoff; for p-values, it must be below the cutoff.


If you are using a custom expression value type other than the ones listed in our application, contact Customer Support at support-ingenuity@qiagen.com for assistance.


Using More Than One Measurement Value Type

You may prioritize your list of molecules by assigning each identifier up to eight different measurement value types per observation. The dataset file can therefore contain 1) a single value type (e.g. expression fold change), 2) more than one value type (e.g. expression fold change and expression p-value), or 3) no measurement values at all. You can then set a corresponding cutoff for each value type before running the analysis. Molecules must meet the cutoffs for all value types to be eligible for analysis.
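The "all cutoffs must be met" rule can be sketched as follows. The function names are hypothetical, and several details are assumptions rather than documented IPA behavior: comparisons are strict ("above"/"below"), signed values such as fold changes are compared by magnitude, and a molecule missing a value for a type that has a cutoff is treated as ineligible.

```python
# Value types for which smaller values are more significant.
BELOW_CUTOFF_TYPES = {
    "Exp p-value", "Exp False Discovery Rate (q-value)",
    "Phospho p-value", "Phospho False Discovery Rate (q-value)",
}


def meets_cutoff(value_type, value, cutoff):
    """One value against one cutoff (strict comparisons are an assumption)."""
    if value_type in BELOW_CUTOFF_TYPES:
        return value < cutoff
    # Assumption: signed values such as fold changes are compared by magnitude.
    return abs(value) > cutoff


def is_eligible(values, cutoffs):
    """Eligible only if the molecule meets the cutoff for EVERY value type with a cutoff."""
    return all(
        value_type in values and meets_cutoff(value_type, values[value_type], cutoff)
        for value_type, cutoff in cutoffs.items()
    )


cutoffs = {"Exp Fold Change": 1.5, "Exp p-value": 0.05}
print(is_eligible({"Exp Fold Change": -2.3, "Exp p-value": 0.01}, cutoffs))  # True
print(is_eligible({"Exp Fold Change": 3.0, "Exp p-value": 0.20}, cutoffs))   # False
```

The second molecule has a large fold change but fails the p-value cutoff, so it is not eligible.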


Absent 
Absent can be used to exclude molecules that you wish to ignore. For example, you may know that a certain gene is not important in the tissue you are studying, or that a particular expression value for a gene identifier is an untrustworthy outlier.

Molecules annotated with an "A" (case-sensitive) in the Absent column will be considered absent and will not be considered for the analysis, irrespective of any measurement value. All other annotations (i.e. any string of characters other than "A") in the Absent column will be considered "present" and therefore can be considered for network generation and functional analysis. 

1) The Ingenuity definition of Absent is not the same as the Affymetrix definition. The Absent column should only be used to designate identifiers you wish to exclude from your analysis. It is not an indication of whether or not that sample had a positive signal in your experiment. 
2) If more than one identifier exists for one molecule (for example a duplicate gene identifier) and one of these is marked as "Absent", IPA will not include any of the other corresponding gene identifiers in the gene list even if some of the other identifiers are not marked as Absent and meet the cutoff. 
3) In a multiple observation experiment, IDs marked as Absent are omitted only from the observations in which they are marked. In observations where they are not marked as Absent, they remain eligible for analysis.
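The Absent rules above can be summarized in a short sketch: only an exact "A" counts, one Absent identifier removes every identifier mapped to the same molecule, and each observation is filtered on its own. The row layout, function names, and gene names are hypothetical, and the identifier-to-molecule mapping (done by IPA) is taken as given.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, Optional[str]]  # (identifier, mapped molecule, Absent cell)


def is_absent(cell):
    """Only an exact, case-sensitive "A" marks a row as Absent."""
    return cell == "A"


def rows_for_analysis(rows: List[Row]) -> List[Row]:
    """Rows of ONE observation that survive the Absent rules.

    If any identifier mapped to a molecule is Absent, every identifier for that
    molecule is dropped, even ones that are not marked and meet the cutoffs.
    """
    absent_molecules = {molecule for _, molecule, cell in rows if is_absent(cell)}
    return [row for row in rows if row[1] not in absent_molecules]


# Each observation is filtered independently, so a molecule that is Absent
# at 1h can still be analyzed at 6h.
obs_1h = [("ID_1", "GENE_A", "A"), ("ID_2", "GENE_A", None), ("ID_3", "GENE_B", "a")]
obs_6h = [("ID_1", "GENE_A", None), ("ID_2", "GENE_A", None), ("ID_3", "GENE_B", None)]
print(rows_for_analysis(obs_1h))       # [('ID_3', 'GENE_B', 'a')]  lowercase "a" is present
print(len(rows_for_analysis(obs_6h)))  # 3
```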

Override 
Override can be used to override the measurement values and mark a molecule as eligible for analysis (i.e., "analysis-ready"). This optional column should not be used together with the Absent column.

Annotate genes with an "X" in the Override column to indicate that they should be considered for analysis. If you do not want to override a gene, leave the corresponding cell empty: entering any string of characters other than "X" in this column will also cause the gene to be overridden.

Please note that annotating a gene in the Override column only makes it "analysis-ready". It does not guarantee that it will be included in the results, because that depends on information present in the QIAGEN Knowledge Base. 

NOTE: If you mark some genes as Override, you MUST enter a cutoff for ALL expression value types. If you do not enter a cutoff for every expression value type, only the Override genes will be eligible for analysis.
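The Override rules can be sketched as a selection of "analysis-ready" molecules. This is a hypothetical illustration: the function names and data layout are assumptions, the cutoff logic is simplified to fold change and p-value, and the behavior when no Override is used and a cutoff is missing (that value type is simply not filtered) is an assumption. As noted above, analysis-ready does not guarantee a molecule appears in the results.

```python
BELOW_CUTOFF_TYPES = {"Exp p-value", "Phospho p-value"}  # simplified for this sketch


def meets_cutoff(value_type, value, cutoff):
    if value_type in BELOW_CUTOFF_TYPES:
        return value < cutoff
    return abs(value) > cutoff  # assumption: magnitude for signed values


def analysis_ready(rows, value_types, cutoffs):
    """Molecules that are analysis-ready under the Override rules described above.

    rows maps molecule -> (Override cell, {value_type: value}).
    """
    # Any non-empty Override cell counts, not only "X".
    overridden = {molecule for molecule, (cell, _) in rows.items() if cell}
    if overridden and not value_types.issubset(cutoffs):
        # Override in use but a cutoff is missing: only Override molecules qualify.
        return overridden
    ready = set(overridden)
    for molecule, (_, values) in rows.items():
        # Assumption: without Override, a value type lacking a cutoff is not filtered.
        if all(t in values and meets_cutoff(t, values[t], cutoffs[t])
               for t in value_types if t in cutoffs):
            ready.add(molecule)
    return ready


rows = {
    "GENE_A": ("X", {"Exp Fold Change": 1.1, "Exp p-value": 0.40}),
    "GENE_B": (None, {"Exp Fold Change": -3.0, "Exp p-value": 0.001}),
    "GENE_C": ("yes", {"Exp Fold Change": 1.2, "Exp p-value": 0.30}),  # also overridden
    "GENE_D": (None, {"Exp Fold Change": 1.2, "Exp p-value": 0.30}),
}
types = {"Exp Fold Change", "Exp p-value"}
print(sorted(analysis_ready(rows, types, {"Exp Fold Change": 1.5, "Exp p-value": 0.05})))
# ['GENE_A', 'GENE_B', 'GENE_C']
print(sorted(analysis_ready(rows, types, {"Exp Fold Change": 1.5})))
# ['GENE_A', 'GENE_C']
```

With a cutoff missing for the p-value, GENE_B no longer qualifies even though it would pass both cutoffs, which is the situation the NOTE above warns about.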