The Pancreatic Expression Database (PED) is a repository for pancreas-derived -omics data. Through a generic web-based system, the database gives the research community an open-access tool to mine currently available pancreatic cancer data sets generated with large-scale transcriptomic, genomic, proteomic, miRNA and methylomic platforms. The database can be interrogated using combined criteria from pancreatic data (disease stages, regulation, differential expression, expression, platform technology, publication) and/or public data (antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). The website also lets users submit their own published datasets for inclusion in the database.
The database contains gene expression and differential expression measurements, copy number alterations and methylation alterations extracted from published transcriptomic, proteomic, methylomic, miRNA, meta-analysis and genomic studies of normal, benign, precancerous and malignant pancreatic tissues, body fluids, cell lines and xenograft models under different treatment conditions. Together, these describe pancreas-related regulation events in 7924 genes/proteins, 44358 transcripts and 307 miRNAs, as well as methylation events in 2438 genes and 15051 transcripts. The copy number alteration section includes information on 32002 gains, 23957 losses, 1053 deletions, 4717 amplifications and 88 loss of heterozygosity (LOH) events occurring in distinct genes and genomic regions. We update the database regularly and welcome content suggestions from scientists.
The data are conceptually divided into two separate databases according to the type of experimental data reported in the studies. One contains the gene expression data from transcriptomic, proteomic, methylomic and miRNA studies; the other contains the copy number alteration data from genomic studies.
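To make the split concrete, the two kinds of record could be modelled as below. This is a minimal sketch, assuming hypothetical field names drawn from the record attributes and event types described on this page; it is not PED's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Regulation(Enum):
    UP = "up"
    DOWN = "down"


class CopyNumberEvent(Enum):
    GAIN = "gain"
    LOSS = "loss"
    DELETION = "deletion"
    AMPLIFICATION = "amplification"
    LOH = "loss of heterozygosity"


@dataclass(frozen=True)
class ExpressionRecord:
    """A regulation event from a transcriptomic, proteomic, methylomic or miRNA study (hypothetical fields)."""
    gene_id: str
    technology: str           # e.g. "transcriptomics"
    study: str                # one study corresponds to one published article
    platform: str
    target_specimen: str
    baseline_specimen: str
    regulation: Regulation
    fold_change: Optional[float] = None
    p_value: Optional[float] = None


@dataclass(frozen=True)
class CopyNumberRecord:
    """A copy number alteration from a genomic study (hypothetical fields)."""
    chromosome: str
    start: int                # 1-based, inclusive
    end: int                  # 1-based, inclusive
    event: CopyNumberEvent
    study: str
    platform: str

    def length(self) -> int:
        """Number of bases covered by the altered region."""
        return self.end - self.start + 1


# Usage with invented values:
cna = CopyNumberRecord("8", 1_000, 1_999, CopyNumberEvent.AMPLIFICATION, "study A", "SNP array")
print(cna.event.value, cna.length())  # amplification 1000
```

Keeping the two record types separate mirrors the two databases: expression events are keyed by gene, while copy number events are keyed by genomic region.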
|Summary of Gene Expression/Regulation Events|
|Number of articles|41|18|3|12|3|
|Summary of Copy Number Alteration Events|
|Number of articles|5|
|Copy number altered regions/genes|33092|
|Loss of heterozygosity (LOH)|88|
A full list of published datasets and a summary of the corresponding experimental results are available on the PED website.
Our database comprises -omics data from a wide range of specimens derived from tissues, fine-needle aspirates and body fluids of healthy individuals and of patients with malignant or benign pancreatic diseases. These are stored alongside information on different treatments and profiling data from cell lines and mouse models.
Additional information on the specimen (source, collection, preparation, gender, cellularity, cohort), the metastasis site (for metastatic samples) and the patient cohorts has also been included. Specimen and model types include:
• acinar cells
• stromal cells
• normal duodenum
• pancreatic intraepithelial neoplasias (PanIN-1a, PanIN-1b, PanIN-2, PanIN-3)
• primary pancreatic ductal adenocarcinoma (PDAC)
• metastatic PDAC
• pancreatic endocrine tumors (PET), functioning and non-functioning
• pancreatic acinar cell carcinoma (PACC)
• well-differentiated and poorly-differentiated endocrine carcinoma (WDEC & PDEC)
• metastatic endocrine carcinoma
• intraductal papillary mucinous neoplasms (IPMN)
• mucinous cystic neoplasms (MCN)
• mucinous cystic ovarian type stroma
• ampullary carcinoma
• pancreatic cancer liver metastasis
• chronic pancreatitis (CP)
• pancreatic pseudocyst
• human pancreatic ductal epithelial (HPDE) cells
• pancreatic juice
• ectopic xenografts from patient tissues and cancer cell lines
• orthotopic xenografts from patient tissues and cancer cell lines
Treatments include:
• Hsp90 inhibitor (IPI-504)
• epidermal growth factor receptor (EGFR) inhibitors erlotinib and cetuximab
• SMO-acting antagonist of the Hedgehog (Hh) pathway
• gemcitabine, cisplatin, methotrexate and 5-fluorouracil (5-FU)
• oncolytic adenoviruses
Reported samples have been profiled on a wide range of transcriptomic, proteomic, methylomic, miRNA and genomic platforms, and data from large-scale meta-analyses are presented as well. To avoid ambiguity, experimental platforms are presented separately from validation platforms. The platforms represented include:
• Affymetrix GeneChip Human Genome U95 arrays (A,B,C,D,E)
• Affymetrix GeneChip Human Genome U95Av2
• Affymetrix GeneChip Human Genome U133A
• Affymetrix GeneChip Human Genome U133B
• Affymetrix GeneChip Human Genome U133 Plus 2.0
• Affymetrix GeneChip HuGeneFL
• Affymetrix GeneChip HuGene 1.0 ST
• Illumina Human-6 Expression BeadChip
• Agilent Whole Human Genome 4x44K Microarray
• Sanger human 10K cDNA arrays
• Sanger custom 5K1 cDNA arrays
• Clontech Atlas Human Cancer cDNA Expression Array
• cDNA Array (Human Genome Centre Tokyo)
• Serial Analysis of Gene Expression (SAGE)
• cDNA Array United Gene technique Ltd
• Human Genome Oligo-Set-Version 2.0 (Operon, Germany)
• one-dimensional gel electrophoresis
• two-dimensional gel electrophoresis
• two-dimensional difference gel electrophoresis (2D-DIGE)
• enzyme-linked immunosorbent assay (ELISA)
• isotope-coded affinity tag (ICAT)
• isobaric tags for relative and absolute quantification (iTRAQ)
• matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry
• liquid chromatography-tandem mass spectrometry (LC-MS/MS)
• Invitrogen ProtoArray Human Protein Microarrays
• Western blot
• Northern blot
• Sirius Red staining
• methylation-specific PCR (MSP)
• Illumina Infinium 27K Human Methylation BeadChip
• Agilent Human CpG Island ChIP-on-Chip Microarray 244K
• Agilent Human miRNA array
• Ambion mirVana miRNA Bioarray
• Geniom Biochip miRNA
• Exiqon miRCURY LNA microRNA Array v.11.0 - hsa (human)
• Exiqon miRCURY LNA microRNA Array v.11.0 - mmu (mouse)
• Exiqon miRCURY LNA microRNA Array v.11.0 - rno (rat)
• Affymetrix GeneChip miRNA Array
• Ohio State University Comprehensive Cancer Center miRNA-Array (OSU_CCC)
• TaqMan Low Density Arrays (TLDA) Human MicroRNA (Applied Biosystems)
• Northern blot
• re-analysis of proteomic data from liquid chromatography-mass spectrometry experiments
• re-analysis of transcriptomic data from Affymetrix GeneChip Human Genome U133 Plus 2.0 experiments
• re-analysis of transcriptomic data from Affymetrix GeneChip HuGeneFL Array experiments
• re-analysis of transcriptomic data from Sanger human 10K and 45K cDNA arrays
• Molecular Cytogenetics (MCG) Cancer Array-800
• Illumina Infinium II Whole-Genome Genotyping Assay
• Affymetrix GeneChip Human Mapping 100K SNP Array Set
• Affymetrix GeneChip Human Mapping 50K SNP Array
• Affymetrix Genome-Wide Human SNP Array 6.0
All studies were manually processed, checked for accuracy and consistency, and loaded into our relational database alongside annotations from several public resources such as Ensembl, dbSNP and The Human Protein Atlas, together with multi-species comparison data. We imported the available Ensembl human genome annotations (Ensembl release 88) for genes and proteins, SNP information, gene structure and multi-species data, enabling the integration and annotation of heterogeneous pancreatic data. To avoid integration and annotation errors, we used the pre-established Ensembl annotations and microarray probe set mappings. Ensembl links to The Human Protein Atlas, UniProt/Swiss-Prot, RefSeq and UniProt/TrEMBL are made on the basis of sequence similarity, and all other links are inferred from these mappings. Ensembl also maps microarray probe set identifiers to Ensembl transcripts by matching probe set sequences to transcript sequences. We have also added the Ensembl Genes, Variation and Regulation data to expand data-mining capabilities.
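The inference chain described above (probe set to transcript, transcript to gene) can be sketched as a simple join. The identifiers and mapping tables here are hypothetical stand-ins for Ensembl's probe set and transcript mappings; a probe set that reaches more than one gene is flagged as ambiguous, which illustrates why curated mappings are preferable to ad hoc ones.

```python
from collections import defaultdict


def map_probesets_to_genes(probe_to_transcripts, transcript_to_gene):
    """Infer probe set -> gene links by chaining probe set -> transcript -> gene."""
    genes_by_probe = defaultdict(set)
    for probe, transcripts in probe_to_transcripts.items():
        for transcript in transcripts:
            gene = transcript_to_gene.get(transcript)
            if gene is not None:          # transcripts with no known gene are skipped
                genes_by_probe[probe].add(gene)
    return dict(genes_by_probe)


def ambiguous_probes(genes_by_probe):
    """Probe sets whose sequence matches transcripts of more than one gene."""
    return sorted(p for p, genes in genes_by_probe.items() if len(genes) > 1)


# Hypothetical mapping tables standing in for Ensembl's probe set and transcript data.
probe_to_transcripts = {"probe_1": ["T1", "T2"], "probe_2": ["T2", "T3"], "probe_3": ["T9"]}
transcript_to_gene = {"T1": "GENE_A", "T2": "GENE_A", "T3": "GENE_B"}

mapping = map_probesets_to_genes(probe_to_transcripts, transcript_to_gene)
print(mapping)                    # probe_3 is absent because T9 has no gene
print(ambiguous_probes(mapping))  # ['probe_2']
```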
Data in PED can be accessed via the web interface, web services, the Distributed Annotation System (DAS) or LinkOut.
Web access to the data is provided through a customised version of MartView, the BioMart web-based query interface.
Programmatic access is available from the BioMart central server, which exposes the data to third-party software such as the Bioconductor package biomaRt (allowing easy interrogation from within the open-source R statistical environment and integration into expression profiling experiments), the Galaxy framework and Cytoscape. This web services layer also makes PED interoperable with the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA).
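BioMart web services accept queries written in BioMart's XML query format and submitted to a `martservice` endpoint. The sketch below builds such a query in Python with the standard library; the server URL and the dataset, filter and attribute names are hypothetical placeholders, since PED's actual dataset configuration is not described here. From R, biomaRt's `getBM()` function wraps this kind of query.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_biomart_query(dataset, filters, attributes, formatter="TSV"):
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,   # e.g. "TSV" or "CSV"
        "header": "1",            # include a header row in the results
        "uniqueRows": "1",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(base_url, xml_query):
    """GET URL that submits the XML query to a martservice endpoint."""
    return base_url + "?" + urlencode({"query": xml_query})


# Hypothetical server, dataset, filter and attribute names.
xml_query = build_biomart_query(
    dataset="ped_expression",
    filters={"disease_stage": "PDAC", "regulation": "up"},
    attributes=["ensembl_gene_id", "fold_change", "p_value"],
)
url = martservice_url("https://biomart.example.org/biomart/martservice", xml_query)
print(url)
```

Building the XML with `ElementTree` rather than string concatenation keeps attribute values correctly escaped.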
PED also acts as a DAS server, providing annotations to the wider community so that they can be used in other resources or browsers, such as Ensembl GeneView, via the GeneDAS protocol.
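A DAS client requests the annotations for a genomic segment with the server's `features` command and receives DASGFF XML in return. The sketch below assumes a DAS 1.x server; the server URL and data source name are hypothetical, and the sample response is invented for illustration.

```python
import xml.etree.ElementTree as ET


def das_features_url(server, source, chrom, start, stop):
    """DAS 1.x 'features' request for one segment (server and source are hypothetical)."""
    return f"{server}/das/{source}/features?segment={chrom}:{start},{stop}"


def parse_das_features(xml_text):
    """Extract basic fields from each FEATURE in a DASGFF response."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "note": feature.findtext("NOTE"),
            })
    return features


# Invented response for illustration only.
SAMPLE = """<?xml version="1.0" standalone="no"?>
<DASGFF>
  <GFF version="1.0" href="https://das.example.org/das/ped/features">
    <SEGMENT id="12" start="1000" stop="5000">
      <FEATURE id="f1" label="GENE_A">
        <TYPE id="regulation">up-regulated</TYPE>
        <START>1200</START>
        <END>2400</END>
        <NOTE>hypothetical PED annotation</NOTE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

print(das_features_url("https://das.example.org", "ped", "12", 1000, 5000))
print(parse_das_features(SAMPLE))
```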
The database can be interrogated using combined criteria from pancreatic data (disease stages, regulation, differential expression, expression, platform technology, publication) and/or public data (pathways, antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). Our database thus connects otherwise disparate data sources and allows relatively simple navigation between all data types and annotations. Users can display the results or download them to a file as HTML, CSV (comma-separated values), TSV (tab-separated values), XLS (Excel) or ADF (array description format). Users can also request a compressed file; the query then runs in the background, and an e-mail address must be provided so that a notification containing a download URL for the results can be sent.
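Once a TSV export has been downloaded (optionally gzip-compressed), it can be post-processed with a few lines of Python. The column names used here (`gene`, `regulation`, `fold_change`, `p_value`) are assumptions; check the header row of an actual export before relying on them.

```python
import csv
import gzip
import io
from typing import Dict, List


def parse_results(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated export (first row = header) into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def load_results(path: str) -> List[Dict[str, str]]:
    """Read a plain or gzip-compressed TSV export from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as fh:
        return parse_results(fh.read())


def significant(rows: List[Dict[str, str]], regulation: str, max_p: float = 0.05) -> List[str]:
    """Genes with the given regulation status and p-value <= max_p; rows without a p-value are skipped."""
    hits = []
    for row in rows:
        p = (row.get("p_value") or "").strip()
        if row.get("regulation") == regulation and p and float(p) <= max_p:
            hits.append(row["gene"])
    return hits


# Invented export with hypothetical column names.
SAMPLE = (
    "gene\tregulation\tfold_change\tp_value\n"
    "GENE_A\tup\t2.5\t0.01\n"
    "GENE_B\tdown\t0.4\t0.001\n"
    "GENE_C\tup\t1.8\t\n"
    "GENE_D\tup\t3.0\t0.2\n"
)
print(significant(parse_results(SAMPLE), "up"))  # ['GENE_A']
```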
Alternatively, users can quickly retrieve summary information about their gene or protein of interest from the home page. Users enter an HGNC or Ensembl gene ID, a miRNA accession or a Swiss-Prot or Ensembl protein ID in a dedicated search box, and the results summarise the PED records related to the queried gene. Each record includes key attributes of the study and experiment in which the gene or transcript was found: the -omics technology, the study, the experimental platform, the target and baseline specimens/samples, the regulation status, the corresponding fold change and p-value, and the validation platform(s).
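A search box that accepts several identifier types has to decide which kind of identifier it has been given. One way to do that is pattern matching, sketched below under the assumption of human Ensembl IDs, miRBase accessions and the UniProt accession format; this is an illustrative approach, not necessarily how PED routes its queries.

```python
import re

# Patterns checked in order; the first full match wins.
PATTERNS = [
    ("ensembl_gene", re.compile(r"ENSG\d{11}(\.\d+)?")),
    ("ensembl_protein", re.compile(r"ENSP\d{11}(\.\d+)?")),
    ("mirbase_mature", re.compile(r"MIMAT\d{7}")),
    ("mirbase_precursor", re.compile(r"MI\d{7}")),
    # UniProt accession format as documented by UniProt.
    ("uniprot", re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    # Fallback assumption: remaining alphanumeric tokens are treated as HGNC symbols.
    ("hgnc_symbol", re.compile(r"[A-Za-z0-9][A-Za-z0-9\-.]*")),
]


def classify_identifier(query: str) -> str:
    """Guess the identifier type of a search-box query (illustrative routing only)."""
    token = query.strip()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return kind
    return "unknown"


for q in ["ENSG00000133703", "MIMAT0000076", "P04637", "KRAS"]:
    print(q, classify_identifier(q))
```

Ordering matters: the specific accession patterns are tried before the permissive symbol fallback.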
We have added new graphical features to the BioMart query interface that allow users to query, overlay and visualise retrieved results. A separate browser window opens in which users can view the differentially expressed genes and/or copy number altered regions as separate tracks in the UCSC Genome Browser. Users can change the chromosomal view by selecting a chromosome from the drop-down list provided. A simple colour-coding scheme is used: up-regulated genes and copy number gains/amplifications are shown in green, down-regulated genes and copy number losses/deletions in red, and genes for which no regulation information is available in PED in black.
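This colour scheme maps naturally onto a UCSC custom track in BED format, where the ninth column (`itemRgb`) sets each feature's colour when the track line includes `itemRgb="On"`. The sketch below assumes hypothetical status labels and RGB values; it illustrates the scheme rather than reproducing PED's own track generator.

```python
# Hypothetical status labels -> RGB strings for the BED itemRgb column.
GREEN, RED, BLACK = "0,128,0", "255,0,0", "0,0,0"
STATUS_COLOURS = {
    "up": GREEN, "gain": GREEN, "amplification": GREEN,
    "down": RED, "loss": RED, "deletion": RED,
}


def colour_for(status):
    """Green for up/gain/amplification, red for down/loss/deletion, black if unknown or missing."""
    return STATUS_COLOURS.get((status or "").lower(), BLACK)


def to_bed(features, track_name="PED_results"):
    """Build a BED9 custom track; features are (chrom, start, end, name, status), 0-based half-open."""
    lines = [f'track name="{track_name}" itemRgb="On"']
    for chrom, start, end, name, status in features:
        fields = [chrom, str(start), str(end), name, "0", ".", str(start), str(end), colour_for(status)]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Invented regions for illustration.
print(to_bed([
    ("chr12", 1000, 2000, "GENE_A", "up"),
    ("chr9", 5000, 9000, "REGION_B", "deletion"),
    ("chr1", 300, 400, "GENE_C", None),
]))
```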
Alternatively, users can view the retrieved results across the whole genome with the Circos viewer, which uses the same colour coding as the UCSC Genome Browser view. For additional flexibility, users can click a particular chromosome band in the Circos image to be redirected to the UCSC Genome Browser for a detailed view of the region of interest.
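The Circos view and the click-through to UCSC both require translating region coordinates between tools. A minimal sketch, assuming a Circos highlight track (which accepts a per-region `fill_color` option), Circos's `hs` chromosome prefix for human, and the hg38 assembly as the default `db`; PED's actual assembly and Circos configuration may differ.

```python
def ucsc_to_circos_chrom(chrom):
    """'chr1' -> 'hs1', 'chrX' -> 'hsX' (Circos human karyotype naming)."""
    return "hs" + (chrom[3:] if chrom.startswith("chr") else chrom)


CIRCOS_COLOURS = {"up": "green", "gain": "green", "amplification": "green",
                  "down": "red", "loss": "red", "deletion": "red"}


def circos_highlight_line(chrom, start, end, status):
    """One line of a Circos highlight data file, coloured like the UCSC view (black if unknown)."""
    colour = CIRCOS_COLOURS.get(status, "black")
    return f"{ucsc_to_circos_chrom(chrom)} {start} {end} fill_color={colour}"


def ucsc_region_url(chrom, start, end, db="hg38"):
    """UCSC Genome Browser link; converts a 0-based half-open start to 1-based display coordinates."""
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?db={db}&position={chrom}:{start + 1}-{end}"


print(circos_highlight_line("chr9", 5000, 9000, "deletion"))  # hs9 5000 9000 fill_color=red
print(ucsc_region_url("chr9", 5000, 9000))
```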
Researchers can now upload their own datasets of interest for inclusion in PED. Only a basic set of information is required. The user first provides information about the published article; one study corresponds to one published article. Next, the user provides information about each individual experiment, such as the experiment title, platform technology, specimen details and a compiled result data file. A submission is not accepted until the result data file containing the expression or copy number data has been uploaded. Once submitted, the data are checked by our team before being included in PED. Users must provide an e-mail address so that we can contact them if there are any issues or questions regarding the uploaded data.
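The minimum requirements for a submission can be expressed as a validation check run before the data reach the curation team. The field names below are hypothetical; only the required items themselves (article information, experiment title, platform technology, specimen details, result data file and a contact e-mail) come from the description above.

```python
import re

# Hypothetical field names for the per-experiment requirements described above.
REQUIRED_EXPERIMENT_FIELDS = ("title", "platform_technology", "specimen_details", "result_file")
# Deliberately loose e-mail check: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_submission(sub):
    """Return a list of problems; an empty list means the submission can go to curation."""
    errors = []
    if not sub.get("article"):
        errors.append("missing article information")
    if not EMAIL_RE.fullmatch(sub.get("email") or ""):
        errors.append("missing or malformed contact e-mail")
    experiments = sub.get("experiments") or []
    if not experiments:
        errors.append("at least one experiment is required")
    for i, exp in enumerate(experiments, start=1):
        for field in REQUIRED_EXPERIMENT_FIELDS:
            if not exp.get(field):
                errors.append(f"experiment {i}: missing {field}")
    return errors


submission = {
    "article": {"title": "Example study"},
    "email": "user@example.org",
    "experiments": [{"title": "PDAC vs normal", "platform_technology": "transcriptomics",
                     "specimen_details": "tissue", "result_file": "results.tsv"}],
}
print(validate_submission(submission))  # []
```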
We believe that interoperability is a key factor in the utility and productive use of any current or future cancer database. It is essential to ensure the sustainability of any cancer database and to facilitate its integration with major international efforts in cancer research such as the International Cancer Genome Consortium (ICGC), supported by the BioMart technology platform, and The Cancer Genome Atlas (TCGA), supported by the cancer Biomedical Informatics Grid (caBIG™) technology platform. It will also allow more sophisticated analysis portals to be designed and implemented. The cancer research community needs open-source, fully interoperable resources that allow information connectivity and data sharing; only such resources can ensure that cancer data generated across different organisations are shared, thereby maximising the impact of cancer research. Because it uses the same BioMart technology for data management, our platform is fully interoperable with the ICGC. Through its web services layer, it is also interoperable with TCGA via the caBIG™ data-mining platform. Similarly, our bioinformatics platform is integrated with other complementary resources such as Ensembl.