23 June 2010: The Pancreatic Expression Database Version 2.0 is available
To broaden our user base and increase functionality, we have substantially revised and improved this resource by expanding its -omics, selection, specimen and annotation data types.
The database contains 56015 differential expression or expression measurements and 6363 copy number variations extracted from 59 published transcriptomics, proteomics, miRNA or genomics studies of normal, malignant or benign pancreatic tissues, body fluids, cell lines and mouse models under different treatment conditions. These data describe pancreas-related regulation events in 8938 genes/proteins, 28448 transcripts and 279 miRNAs, as well as 2771 gains, 1073 losses, 347 homozygous deletions, 1297 high-level amplifications and 875 loss of heterozygosity (LOH) events occurring in distinct genomic regions.
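The copy number figures are consistent with one another: the five event categories sum to the stated total of 6363 copy number variations. A short Python check, using only the counts given above:

```python
# Copy number event counts as reported for this release.
cnv_events = {
    "gain": 2771,
    "loss": 1073,
    "homozygous_deletion": 347,
    "high_level_amplification": 1297,
    "loh": 875,  # loss of heterozygosity
}

# The per-category counts should add up to the overall copy number total.
total = sum(cnv_events.values())
print(f"Total copy number events: {total}")  # prints 6363
```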
Our database comprises -omics data from a wide range of specimens derived from tissues, fine needle aspirates and body fluids of healthy people and patients with pancreatic malignant or benign diseases.
These are stored alongside information on different treatments and profiling data from cell lines and mouse models.
- Normal tissues: ductal, islet, acinar, stromal and stellate cells, and normal duodenum.
- Disease tissues: pancreatic intraepithelial neoplasias (PanIN-1a, PanIN-1b, PanIN-1b/2, PanIN-2, PanIN-3), pancreatic ductal adenocarcinoma (PDAC), pancreatic endocrine tumors (PET; functioning and non-functioning), pancreatic acinar cell carcinoma (PACC), well-differentiated endocrine carcinoma (WDEC), intraductal papillary mucinous neoplasms (IPMN), mucinous cystic neoplasms, mucinous cystic neoplasms with ovarian-type stroma, ampullary carcinoma, pancreatic cancer liver metastasis, chronic pancreatitis and pancreatic pseudocyst.
- Cell lines: Human pancreatic ductal epithelial (HPDE), A818, AsPC-1, BxPC-3, Capan-1, Capan-2, CFPAC-1, HPAFII, Hs766T, L3.6pl, Mia PaCa-2, MPanc96, PANC-1, Panc3.27, PaTu8902, PaTu8988S, PaTu8988T, PL45, PT45, SU8686, Suit007, Suit0028 and SW1990.
- Body fluids: pancreatic juice, plasma, saliva, urine and serum.
- Mouse models: ectopic and orthotopic xenografts from patient tissues and cancer cell lines.
- Treatments/drugs: the Hsp90 inhibitor IPI-504, the epidermal growth factor receptor (EGFR) inhibitors erlotinib and cetuximab, an SMO-acting antagonist of the Hedgehog (Hh) pathway, gemcitabine, methotrexate and oncolytic adenoviruses.
These samples have been profiled on a wide range of miRNA, transcriptomics, genomics or proteomics platforms.
- miRNA: Human miRNA array (Agilent), mirVana miRNA Bioarray (Ambion), Ohio State University Comprehensive Cancer Center miRNA array (OSU_CCC), TaqMan Low Density Arrays (TLDA) Human MicroRNA (Applied Biosystems), Northern blot, qRT-PCR.
- Transcriptomics: several Affymetrix GeneChip® Human Genome arrays (U95 (A, B, C, D, E), U95Av2, U133A, U133B, U133 Plus 2.0 and HuGeneFL), Sanger human 10K cDNA arrays, Sanger custom 5K1 cDNA arrays, Clontech Atlas Human Cancer cDNA Expression Array, cDNA Array (Human Genome Centre Tokyo), Serial Analysis of Gene Expression (SAGE), qRT-PCR, cDNA Array (United Gene technique Ltd), Human Genome Oligo Set Version 2.0 (Operon, Germany), Illumina Human-6 Expression BeadChip, Clontech Atlas Cancer Array.
- Genomics: Molecular Cytogenetics (MCG) Cancer Array-800, Illumina Human 1M-Duo SNP BeadChip, Affymetrix GeneChip® Human Mapping 100K SNP Set.
- Proteomics: one-dimensional and two-dimensional gel electrophoresis, two-dimensional difference gel electrophoresis (2D-DIGE), enzyme-linked immunosorbent assay (ELISA), isotope-coded affinity tag (ICAT), immunohistochemistry, isobaric tags for relative and absolute quantification (iTRAQ), matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, MALDI-TOF/TOF mass spectrometry, liquid chromatography tandem mass spectrometry (LC-MS/MS), ProtoArray (Invitrogen), Western blot.
All studies were manually processed, checked for accuracy and consistency, and loaded into our relational database alongside annotations from several public resources, including Reactome, Ensembl, Gene Ontology (GO), dbSNP, multi-species comparisons, UniProt and the Human Protein Atlas. We imported the available Ensembl human genome annotations (Ensembl release 56) for genes and proteins, SNP information, sequences, gene structures and multi-species data, enabling the integration and annotation of heterogeneous pancreatic data. To avoid integration and annotation errors, we used the pre-established Ensembl annotations and microarray probe set mappings. Ensembl links to the Human Protein Atlas, UniProt/Swiss-Prot, RefSeq and UniProt/TrEMBL databases are made on the basis of sequence similarity, and all other links are inferred from these mappings. Ensembl also maps microarray probe set identifiers to transcripts by matching probe set sequences to Ensembl transcripts. Finally, we added Reactome data to extend data mining to deregulated pathways.
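The mapping strategy described above (probe sets matched to transcripts, transcripts to genes, and gene-level cross-references inferred from there) can be sketched as a chain of lookups. This is an illustration under stated assumptions, not the database's implementation: the dictionaries stand in for tables imported from an Ensembl release, and every identifier and cross-reference below is a hypothetical placeholder.

```python
# Hypothetical placeholder mappings; in practice these come from an Ensembl release.
probe_to_transcripts = {
    "probe_001": ["transcript_A", "transcript_B"],
    "probe_002": ["transcript_C"],
    "probe_003": [],  # probe set whose sequence matched no transcript
}
transcript_to_gene = {
    "transcript_A": "gene_X",
    "transcript_B": "gene_X",
    "transcript_C": "gene_Y",
}
# Gene-level cross-references (hypothetical accessions).
gene_to_xrefs = {
    "gene_X": {"UniProt/Swiss-Prot": "SP_0001", "RefSeq": "REFSEQ_0001"},
    "gene_Y": {"UniProt/TrEMBL": "TR_0002"},
}


def annotate_probe(probe_id):
    """Infer gene-level annotations for a probe set from pre-established mappings.

    Returns a dict of gene ID -> cross-reference dict. Probe sets with no
    transcript match return an empty dict instead of a guessed annotation.
    """
    genes = {
        transcript_to_gene[t]
        for t in probe_to_transcripts.get(probe_id, [])
        if t in transcript_to_gene
    }
    return {gene: gene_to_xrefs.get(gene, {}) for gene in sorted(genes)}


print(annotate_probe("probe_001"))
```

Reusing existing mappings rather than re-deriving them is what keeps a probe set with no transcript match (here `probe_003`) unannotated instead of wrongly annotated.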
The data can be accessed through several routes:
- Web interface: Access to the data is provided through a customised version of MartView, the BioMart web-based query interface.
- Web services: The data are also available from the BioMart central server, where they are exposed to third-party software such as the Bioconductor package biomaRt (allowing easy interrogation within the open-source R statistical environment and integration into expression profiling experiments), the Galaxy framework and Cytoscape. This web services layer also makes interoperability with ICGC and TCGA possible.
- DAS: Our database acts as a Distributed Annotation System (DAS) server, providing annotations that the wider community can use in other resources or browsers, such as Ensembl GeneView via the GeneDAS protocol.
- LinkOut: Our database is registered as an NCBI LinkOut resource, providing LinkOut annotations in Entrez Gene.
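To illustrate what a programmatic BioMart query involves, the Python sketch below builds a BioMart XML query document (a dataset, its filters and the attributes to return) using only the standard library. The dataset, filter and attribute names are hypothetical placeholders, not this database's actual configuration, which should be taken from its MartView interface. Submitting the document to a BioMart server is not shown; the biomaRt package performs comparable query construction and submission from R.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes, formatter="TSV"):
    """Build a BioMart XML query document as a string.

    The caller supplies dataset, filter and attribute names; the ones used
    in the example below are hypothetical placeholders.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,  # output format, e.g. TSV or CSV
        "header": "1",           # include a header row in the results
        "uniqueRows": "1",       # collapse duplicate result rows
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Each filter narrows the result set; multiple filters are combined.
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    # Attributes are the columns returned for each matching record.
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# All names below are hypothetical, for illustration only.
xml_query = build_biomart_query(
    "hypothetical_pancreas_dataset",
    {"hypothetical_disease_stage": "PDAC", "hypothetical_regulation": "up"},
    ["hypothetical_gene_symbol", "hypothetical_fold_change"],
)
print(xml_query)
```

Building the document with ElementTree rather than string concatenation keeps attribute values correctly escaped.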
The database can be queried using combined criteria from pancreatic data (disease stage, regulation, differential expression, expression, platform technology, publication) and/or public data (pathways, antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). It thus connects otherwise disparate data sources and allows relatively simple navigation between all data types and annotations. Results can be displayed or downloaded to a file as HTML, CSV (comma-separated values), TSV (tab-separated values), XLS (Excel) or ADF (array description format). Users can also request a compressed output file, in which case the query runs in the background; an e-mail address must then be provided to receive a notification containing a URL from which the results can be downloaded.
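Downloaded results can be post-processed with ordinary tools. As a minimal sketch, the Python code below reads a TSV export with the standard csv module and keeps the genes reported as up-regulated at a given disease stage. The column names ("Gene", "Disease stage", "Regulation"), the value "up" and the sample rows are assumptions for illustration; the header row of a real export should be checked first.

```python
import csv
import io


def up_regulated_genes(tsv_text, stage):
    """Return sorted, de-duplicated gene names marked 'up' for a disease stage.

    Assumes hypothetical column names 'Gene', 'Disease stage' and 'Regulation'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sorted({
        row["Gene"]
        for row in reader
        if row["Disease stage"] == stage and row["Regulation"] == "up"
    })


# Hypothetical sample export; gene names are placeholders, not real results.
sample = (
    "Gene\tDisease stage\tRegulation\n"
    "GENE_A\tPDAC\tup\n"
    "GENE_B\tPDAC\tdown\n"
    "GENE_C\tPanIN-3\tup\n"
    "GENE_A\tPDAC\tup\n"
)
print(up_regulated_genes(sample, "PDAC"))  # ['GENE_A']
```

For a downloaded file, the text can be read with `open(path, newline="", encoding="utf-8")` and passed to the same function.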
We believe that interoperability is key to the utility and productive use of any current or future cancer database. It is essential for the sustainability of such databases and for their integration with major international cancer research efforts, such as the International Cancer Genome Consortium (ICGC), supported by the BioMart technology platform, and The Cancer Genome Atlas (TCGA), supported by the Cancer Biomedical Informatics Grid (caBIG™) technology platform. Interoperability will also allow more sophisticated analysis portals to be designed and implemented. The cancer research community needs open-source, fully interoperable resources that allow information connectivity and data sharing; only such resources can ensure that cancer data generated across different organisations are shared, maximising the impact of cancer research. Because our platform uses the same BioMart technology for data management, it is fully interoperable with the ICGC. Through its web services layer, it is also interoperable with TCGA via the caBIG™ data mining platform. Similarly, our bioinformatics platform is integrated with complementary resources such as Ensembl and Reactome.