Welcome to the Pancreatic Expression Database Version 3.0
In our continuous effort to increase the functionality of PED, we have substantially changed and improved its -omics, selections, specimens and annotation data types. This latest version has a redefined database structure, an extensive controlled vocabulary and more detailed and richer information than the previous versions.
Major improvements in version 3.0 over version 2.0 are as follows:
The database contains differential expression or expression measurements for 12641 distinct genes and 33092 distinct copy number alterations extracted from 78 published transcriptomics, proteomics, methylomics, miRNA, meta-analysis or genomics studies of various pancreatic normal, benign, precancerous and malignant tissues, body fluids, cell lines and xenograft models under different treatment conditions. These data describe pancreas-related regulation events in 7924 genes/proteins, 44358 transcripts and 307 miRNAs as well as methylation events in 2438 genes and 15051 transcripts. The copy number alteration section includes information on 32002 gains, 23957 losses, 1053 deletions, 4717 amplifications and 88 loss of heterozygosity (LOH) events occurring in distinct genes and genomic areas.
Our database comprises -omics data from a wide range of specimens derived from tissues, fine needle aspirates and body fluids of healthy people and patients with pancreatic malignant or benign diseases. These are stored alongside information on different treatments and profiling data from cell lines and mouse models.
These samples have been profiled on a wide range of transcriptomics, proteomics, methylomics, miRNA or genomics platforms. Data from large-scale meta-analysis are presented as well.
In order to avoid ambiguity, we have presented experiment platforms separately from validation platforms.
All the studies were manually processed, checked for accuracy and consistency and loaded into our relational database alongside annotations from several public resources such as Reactome, Ensembl, Gene Ontology (GO), dbSNP, multi-species comparisons, UniProt and the Human Protein Atlas. We imported the available Ensembl human genome annotations (Ensembl release 63) for genes and proteins, SNP information, sequences, gene structure and multi-species data, enabling the integration and annotation of heterogeneous pancreatic data. In order to avoid integration and annotation errors, we used the pre-established Ensembl annotations and microarray probe set mapping. Ensembl links to the Human Protein Atlas, UniProt/Swiss-Prot, RefSeq and UniProt/TrEMBL databases are made on the basis of sequence similarity. All other subsequent links are inferred from these mappings. Ensembl also establishes mappings to microarray probe set identifiers by matching probe set sequences to Ensembl transcripts. We have also added the Reactome, UniProt, PRIDE, InterPro and COSMIC data to expand data mining capabilities.
The database can be interrogated using combined criteria from pancreatic data (disease stages, regulation, differential expression, expression, platform technology, publication) and/or public data (pathways, antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). Thus, our database enables connections between otherwise disparate data sources and allows relatively simple navigation between all data types and annotations. Users can choose to display the results or download them to a file as 'HTML', 'CSV' for comma-separated values, 'TSV' for tab-separated values, 'XLS' for Excel or 'ADF' for array description format. Users can also select a compressed file output, in which case the query runs in the background and the results are downloaded later. For this option, an e-mail address must be provided; a notification e-mail is then sent containing a URL from which the query results can be downloaded.
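As an illustration of working with a downloaded result file, the following is a minimal sketch that reads a 'TSV' export using Python's standard csv module and keeps only the up-regulated rows. The column names (gene, regulation, fold_change) and the sample rows are assumptions made up for this example; the actual headers depend on the attributes selected in the query.

```python
import csv
import io


def load_tsv(handle):
    """Read a tab-separated export into a list of dicts keyed by column header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_by_regulation(rows, status, column="regulation"):
    """Keep rows whose regulation column matches `status`, ignoring case.

    The column name is hypothetical; pass the real header of your export.
    """
    wanted = status.strip().lower()
    return [
        row for row in rows
        if (row.get(column) or "").strip().lower() == wanted
    ]


# Made-up sample standing in for a downloaded file (placeholder gene names).
SAMPLE = (
    "gene\tregulation\tfold_change\n"
    "GENE_A\tup\t2.5\n"
    "GENE_B\tdown\t-3.1\n"
    "GENE_C\tUp\t1.8\n"
)

rows = load_tsv(io.StringIO(SAMPLE))
up_rows = filter_by_regulation(rows, "up")
print([row["gene"] for row in up_rows])
```

For a real export, replace io.StringIO(SAMPLE) with an open file handle, for example open(path, newline="", encoding="utf-8").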
Alternatively, users can quickly extract summarized information about their gene/protein of interest from the home page. Users can provide the HGNC/Ensembl gene id, miRNA accession or SwissProt/Ensembl protein id in a dedicated search box and the results will summarize PED records related to the queried gene. Each record includes important attributes regarding the study and experiment where the gene/transcript was found. The attributes list includes information on the -omics technology, exact study, experimental platform, target and baseline specimens/samples used, regulation status, corresponding fold-change and p-value as well as the validation platform(s).
We have added new graphical features to the BioMart query interface to allow users to query, overlay and visualise retrieved results. A separate browser window will appear where users can view the differentially expressed genes and/or copy number altered regions in the UCSC Genome Browser under different tracks. Users can change the chromosomal view by selecting a chromosome from the drop-down list provided. A simple colour-coding scheme is used where up-regulated genes and copy-number gains/amplifications are presented in green, whereas down-regulated genes and copy-number losses/deletions are presented in red. Genes for which regulation information is not available in PED are presented in black.
Alternatively, users can select a whole-genome view of the retrieved results using the Circos viewer. The colour coding is similar to that used for the UCSC Genome Browser visualisation. To provide additional flexibility, users can click on a particular chromosome band in the Circos image to be redirected to the UCSC Genome Browser for a detailed view of the region of interest.
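The colour scheme described for the genome views can be summarised in a few lines of code. This is only a sketch of the stated rule, not the code used by PED: the event labels and the function name are hypothetical, and treating any unrecognised label as "no regulation information" (black) is an assumption.

```python
# Event labels are hypothetical; PED's internal vocabulary may differ.
GREEN_EVENTS = {"up-regulated", "gain", "amplification"}
RED_EVENTS = {"down-regulated", "loss", "deletion"}


def display_colour(event=None):
    """Map a regulation or copy-number event label to its display colour.

    Green: up-regulation, gains, amplifications.
    Red:   down-regulation, losses, deletions.
    Black: no regulation information (assumed fallback for unknown labels).
    """
    if event is None:
        return "black"
    label = event.strip().lower()
    if label in GREEN_EVENTS:
        return "green"
    if label in RED_EVENTS:
        return "red"
    return "black"


for example in ("up-regulated", "deletion", None):
    print(example, "->", display_colour(example))
```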
Researchers can now upload their own datasets of interest to be included in the PED. A very basic set of information is required to complete the process. The user will first provide information regarding the published article. One study corresponds to one published article. Next, the user will provide information regarding the individual experiment such as experiment title, platform technology, specimen details and a compiled result data file. The submission will not be accepted until the user uploads the result data file containing the expression/copy number data. Once submitted, the uploaded data will be checked by our team before being included in PED. It is imperative for the users to provide an email address so that we can contact them in case there are any issues/questions regarding the uploaded data.
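Before submitting, it can help to check that nothing required is missing. The sketch below models the items listed above as a simple record and reports which ones are empty. The class, field names and example values are hypothetical and do not reflect the actual PED submission form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical container for the information requested at upload."""
    article_reference: str = ""          # the published article (one per study)
    experiment_title: str = ""
    platform_technology: str = ""
    specimen_details: str = ""
    result_file: Optional[str] = None    # path to the compiled result data file
    contact_email: str = ""


REQUIRED_FIELDS = (
    "article_reference",
    "experiment_title",
    "platform_technology",
    "specimen_details",
    "result_file",
    "contact_email",
)


def missing_fields(submission: Submission) -> List[str]:
    """Return the names of required fields that are empty or unset."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = getattr(submission, name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Placeholder values for illustration only; the result file is left out.
draft = Submission(
    article_reference="Example article reference",
    experiment_title="Example experiment",
    platform_technology="Example platform",
    specimen_details="Example specimen",
    contact_email="user@example.org",
)
print(missing_fields(draft))
```

Here the draft lacks a result data file, which mirrors the rule that a submission is not accepted until that file has been uploaded.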
We believe that interoperability is a key factor in the utility and productive use of any current and future cancer database.
This is essential to ensure the sustainability of any cancer database and to facilitate its integration with major international efforts in cancer research such as the International Cancer Genome Consortium (ICGC), supported by the BioMart technology platform, and The Cancer Genome Atlas (TCGA), supported by the cancer Biomedical Informatics Grid (caBIG™) technology platform. It will also allow the design and implementation of more sophisticated analysis portals. The cancer research community needs open-source, fully interoperable resources that allow information connectivity and data sharing. Only resources of this type can ensure that cancer data generated across different organisations are shared, thereby maximising the impact of cancer research. By using the same BioMart technology for its data management system, our platform is fully interoperable with the ICGC. Through its web service layer, it is also interoperable with The Cancer Genome Atlas (TCGA) via the caBIG™ data mining platform.
Similarly, our bioinformatics platform is integrated with other complementary resources such as Ensembl, Reactome, UniProt, PRIDE, InterPro and COSMIC.