Sunday, June 8, 2014

tools and toys.

ALL OF THE DOWNLOAD LINKS ARE BROKEN DUE TO HOSTING ISSUES AND WILL BE UPDATED SOON!

http://www.earnfiles.org/file/56a4d52f
http://www.earnfiles.org/file/bdd874495517
http://asd.so/Hce
http://www.earnfiles.org/file/ba8c18b8b8
http://asd.so/Ice
http://www.earnfiles.org/file/dc7984
http://asd.so/Jce
http://asd.so/Kce
http://asd.so/Lce
http://www.earnfiles.org/file/0630d0d3
http://asd.so/Mce
http://asd.so/Nce
http://asd.so/Oce
http://www.earnfiles.org/file/56a4d52f
http://asd.so/Mce
http://www.earnfiles.org/file/ea96a6196
http://asd.so/Pce
http://www.earnfiles.org/file/cd44c3
http://asd.so/Qce

Welcome to vx.net78.net
Email:
justpaste.it/OpinorADN


best rna virus c rss proxy boxxy generien

http://213.251.145.96/

proxy feed

Love Quote of the Day

Duke Ellington

"Love is supreme and unconditional; like is nice but limited."

Billie Holiday

"Don't threaten me with love, baby. Let's just go walking in the rain."

Reinhold Niebuhr

"Nothing we do, however virtuous, can be accomplished alone; therefore we are saved by love."

Zelda Fitzgerald

"Nobody has ever measured, not even poets, how much the heart can hold."

rohitab.com

Using API Monitor to crack copy protected software

This tutorial demonstrates how to use API Monitor to crack copy protected software. Software cracking is the modification of software to remove or disable features which are considered undesirable by the person cracking the software, usually related to protection methods: copy protection, trial/demo version, serial number, hardware key, date checks, CD check or software annoyances [...]

API Monitor Tutorial: Using Breakpoints to modify application output

This tutorial demonstrates how to use API Monitor breakpoints to modify the output of an application. For this example, I will be using an application called Asteroids that was developed by Napalm (thanks for letting me use it for this tutorial). The following screenshot displays the output of the application. Using an API Monitor Breakpoint [...]

API Monitor Tutorial: Sniffing Firefox SSL Data/Traffic

This tutorial will walk you through the process of viewing SSL data submitted by a web browser. For this tutorial we will be using the 32-bit version of Firefox 3.5.7. API Monitor will enable us to view data that is sent to the website before it is encrypted by the web browser. You can use [...]

API Monitor Tutorial: Sniffing Internet Explorer SSL Data/Traffic

This tutorial will walk you through the process of viewing SSL data submitted by a web browser. For this tutorial we will be using the 64-bit version of Internet Explorer 7; however, you can also use Internet Explorer 8. API Monitor will enable us to view data that is sent to the website before it [...]

API Monitor Tutorial: Monitoring your first application

This tutorial will walk you through the process of monitoring Notepad. Step 1: Start up API Monitor. We will be using the 64-bit version in this tutorial; however, the 32-bit version will work the same. Step 2: Select the APIs that should be monitored. For this tutorial, we will be monitoring CreateFileA, CreateFileW and NtCreateFile. [...]

Nexus One (Android) VPN Connection to DD-WRT Router

If you attempt to use the Nexus One built-in VPN client to connect to a DD-WRT router, it may fail with an error message. To fix this, you will need to modify the PPTP configuration for DD-WRT. The easiest way to do this is to create a startup script. Log [...]

Structured Exception Handling in Assembly Language

Overview Windows 95 and Windows NT support a robust approach to handling exceptions, called Structured Exception Handling, which involves cooperation of the operating system but also has direct support in the programming language. An “exception” is an event that is unexpected or disrupts the ability of the process to proceed normally. Exceptions can be detected [...]


Things you should look up:

autoblog php script example

auto rss php script example

https://code.google.com/p/tesseract-ocr/downloads/list

inurl:rss2html

res://ieframe.dll/acr_error.htm#imgupload.org,http://www.imgupload.org/view.php?filename=11_20.jpg&view=25,950

tunneling proxy ssl

intext:"vx.com" "Glype"

url shorteners paid

(l2affiliates)

stenograph

content syndication system

best rna virus c rss proxy boxxy generien .

Gene News

Bulk access to Gene summaries via FTP

Gene summaries, previously available on individual Gene web pages and programmatically through the E-utilities API, can now also be downloaded in bulk via the gene_summary.gz file on the Gene FTP site. The complete description of this file can be found at the FTP site, in:

https://ftp.ncbi.nih.gov/gene/DATA/README

Please note that not all NCBI Gene records have summaries. Users can easily obtain a list of genes that do have summaries by searching NCBI Gene using the query "has summary"[properties] or by following this link:

https://www.ncbi.nlm.nih.gov/gene/?term=has+summary+[properties]

Orthologs to be expanded to include insects

Reports of orthologous genes are being expanded to include insects versus D. melanogaster (txid7227) in both the web pages and the gene_orthologs.gz file on the Gene FTP site. These insect orthologs are computed using the same process as the other orthologs, described here.

The interim gene_orthologs_supplemental.gz FTP file will be removed within the next two weeks when its content is incorporated into the main gene_orthologs.gz file.

Updates to the UniProtKB FTP file

The gene_refseq_uniprotkb_collab.gz file on the Gene FTP site reports matched pairs of NCBI RefSeq and UniProtKB accessions. With a new process to find UniProtKB and RefSeq proteins related to each other, this file now reports data for over 170 million RefSeqs. This update introduces three additional columns.

First, columns are being added for both the NCBI TaxID and the UniProtKB TaxID for each match.

Second, a column is being added to indicate the method used to source each match, with one of these three values:

  • uniprot – matches imported from UniProt.
  • identical – matches where the protein sequence and assigned organism of the two accessions are identical to each other.
  • similar – matches where both proteins have the same assigned organism and share more than 90% sequence identity with more than 80% coverage.

The new column layout is:

  1. NCBI protein accession
  2. UniProtKB protein accession
  3. NCBI tax id
  4. UniProtKB tax id
  5. method
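
The five-column layout above can be read with a few lines of Python. This is only a sketch: it assumes the data rows are tab-delimited like the other Gene FTP files, and the sample row is made up for illustration rather than taken from the real file.

```python
from typing import NamedTuple

# Method values listed in the announcement above.
KNOWN_METHODS = {"uniprot", "identical", "similar"}


class CollabRow(NamedTuple):
    ncbi_accession: str
    uniprotkb_accession: str
    ncbi_tax_id: int
    uniprotkb_tax_id: int
    method: str


def parse_collab_line(line: str) -> CollabRow:
    """Parse one data row, assuming tab-delimited columns in the listed order."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    ncbi_acc, uniprot_acc, ncbi_tax, uniprot_tax, method = fields
    if method not in KNOWN_METHODS:
        raise ValueError(f"unexpected method: {method!r}")
    return CollabRow(ncbi_acc, uniprot_acc, int(ncbi_tax), int(uniprot_tax), method)


# Hypothetical row: the accessions are placeholders, not real records.
row = parse_collab_line("XP_000000.1\tA0A000XXX0\t9606\t9606\tidentical\n")
print(row.method, row.ncbi_tax_id == row.uniprotkb_tax_id)
```

Keeping both TaxIDs as separate integer fields makes it easy to spot pairs whose assigned organisms differ between the two resources.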

CCDS release 24 for human is public in Gene

The Consensus Coding Sequence (CCDS) update that compares the NCBI Homo sapiens annotation release 110 to the Ensembl release 108 was released last week, and this update is now reflected in Gene as well. This update adds 2,746 new CCDS IDs, and adds 237 genes into the human CCDS set. CCDS Release 24 includes a total of 35,608 CCDS IDs that correspond to 19,107 GeneIDs, with 48,062 protein sequences from NCBI and 47,762 from Ensembl.

For information about CCDS, please visit: https://www.ncbi.nlm.nih.gov/CCDS.​

Gene Information from the Alliance of Genome Resources

NCBI Gene has added descriptive information about genes from the Alliance of Genome Resources for organisms that include Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, and Saccharomyces cerevisiae. Links to gene pages at the Alliance of Genome Resources are provided in the Summary section of the Full Report page, and in the right-hand sidebar in the Links to other resources section. Textual gene descriptions are provided in the Summary section for genes lacking a RefSeq summary.

At the Gene FTP site, the gene_info.gz files include AllianceGenome references in the dbXrefs column.

Annotation Matches with Ensembl Rapid Releases

NCBI Gene has added Ensembl Rapid Releases to the calculation of matching annotations between NCBI RefSeq and Ensembl. This has resulted in the inclusion of over 60 additional assemblies for a total of 241 organisms represented in the set. Matches are made based on transcript and CDS comparisons, and Ensembl gene, transcript, and protein identifiers for annotations similar to the NCBI RefSeq annotations are reported in NCBI Gene and in the gene2ensembl file on the Gene FTP site. The Ensembl annotation is also available in the graphical view and in NCBI’s Genome Data Viewer to give you a side-by-side view of how the annotations compare. Check out blue whale E2F1 for an example. ​

CCDS release 23 for mouse is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Mus musculus annotation release 108 to Ensembl's release 98 was released last week, and this update is now reflected in Gene as well. This update adds 1,570 new CCDS IDs, and adds 175 genes into the mouse CCDS set. CCDS release 23 includes a total of 27,219 CCDS IDs that correspond to 20,486 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS.​​

FTP change

In accordance with changes in the representation of lineage in the Taxonomy database, the following subsets of data for viruses will no longer be available separately in the ASN_BINARY and GENE_INFO directories on the Gene FTP site:

  • dsDNA viruses, no RNA stage
  • dsRNA viruses
  • ssDNA viruses
  • ssRNA negative-strand viruses
  • ssRNA positive-strand viruses, no DNA stage

Data for these records will continue to be available in the All_Viruses files.

Vega links to be removed

The Vega website, which was maintained by the HAVANA group at the Wellcome Trust Sanger Institute, has been archived, as per this announcement. Consequently, our links to matching Vega sequences and genes are being removed. This change will occur within the next week.

The gene2vega.gz file on our FTP site will be retained, but is no longer being updated.

CCDS release 22 for human is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Homo sapiens annotation release 109 to Ensembl's release 92 was released this week, and this update is now reflected in Gene as well. This update adds 894 new CCDS IDs, and adds 154 genes into the human CCDS set. CCDS release 22 includes a total of 33,397 CCDS IDs that correspond to 19,033 GeneIDs.

For information about CCDS, please visit: https://www.ncbi.nlm.nih.gov/CCDS.​

More expression data in Gene

The gene expression data that was announced last February is now being made accessible in more ways:

  1. We've added a brief sentence to the Summary section describing the expression pattern.
  2. It's now possible to query for genes with particular expression patterns.
  3. We've expanded the data, initially available for human, mouse and rat, to include pig, rice and Nile tilapia, with other commonly used organisms planned for the future.

For these new features, a gene is considered to be expressed in a particular sample if it is at a level >=5% of the expression seen in the most strongly expressing sample. For organisms with expression data from multiple projects, the summary and query functions only use the data in the primary expression dataset.

We have categorized expression levels as follows:

  • ubiquitous expression: expressed in all samples
  • broad expression: expressed in >=50% of samples
  • restricted expression: expressed in more than 1 and less than 50% of samples
  • biased expression: expressed in only 1 sample of the primary dataset
  • low expression: not expressed above 1.0 RPKM in any sample
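
The categories above can be sketched as a small classifier. This is an illustration of the stated rules, not NCBI's actual implementation: the function name, the input shape (RPKM per sample) and the order in which the rules are checked are assumptions, and the case of a dataset with a single sample is not addressed by the announcement.

```python
from typing import Mapping


def expression_category(rpkm_by_sample: Mapping[str, float]) -> str:
    """Classify a gene from its RPKM values in the primary dataset."""
    values = list(rpkm_by_sample.values())
    if not values:
        raise ValueError("at least one sample is required")
    peak = max(values)
    # "low expression": not expressed above 1.0 RPKM in any sample.
    if peak <= 1.0:
        return "low expression"
    # A sample counts as expressed at >=5% of the strongest sample's level.
    expressed = sum(1 for value in values if value >= 0.05 * peak)
    total = len(values)
    if expressed == total:
        return "ubiquitous expression"
    if expressed / total >= 0.5:
        return "broad expression"
    if expressed > 1:
        return "restricted expression"
    return "biased expression"


# Made-up values for illustration only.
print(expression_category({"heart": 100.0, "liver": 1.0, "brain": 0.0}))
```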

Here's what's new:

  • First, in the Summary section at the top of the full report page for a gene, a brief sentence describes tissue-specific expression of the gene, with a link to the complete description that appears in the Expression section.

  • Second, searches for genes can reference expression levels using the categories that were described earlier along with the "expression category" prefix. For example, to query for genes that are expressed in all samples:

    • "expression category ubiquitous expression"[Properties]

    To find genes with expression data, regardless of category, use this:

    • "has expression data"[Properties]

  • Finally, searches for genes can specify tissue names as they appear in the expression data. Tissue names may contain multiple words, and searches may specify some or all of those words. For example, this query:

    • mouse[orgn] liver[expression/tissues]

    will find mouse genes expressed in liver at any of the 4 developmental stages represented in the primary mouse dataset, whereas this query:

    • mouse[orgn] "liver E14.5"[expression/tissues]

    will be restricted to just the one time point.

Genes are indexed for all samples with >=5% maximal expression, so genes expressed in many different samples can be found even though only the top two are mentioned by name in the Summary section.
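
For scripted searches, the query strings shown above can be assembled and wrapped in an E-utilities ESearch URL. This is a hedged sketch: the helper names are invented for illustration, and it assumes that ESearch against db=gene accepts the same field names as the web search. No request is sent; the code only builds the URL.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def expression_term(organism: str,
                    tissue: Optional[str] = None,
                    category: Optional[str] = None) -> str:
    """Compose a Gene query in the style of the examples above."""
    parts = [f"{organism}[orgn]"]
    if tissue:
        # Multi-word tissue names are quoted, as in "liver E14.5".
        name = f'"{tissue}"' if " " in tissue else tissue
        parts.append(f"{name}[expression/tissues]")
    if category:
        parts.append(f'"expression category {category}"[Properties]')
    return " ".join(parts)


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL for the Gene database."""
    return ESEARCH + "?" + urlencode({"db": "gene", "term": term, "retmax": retmax})


print(expression_term("mouse", tissue="liver E14.5"))
print(esearch_url(expression_term("mouse", tissue="liver")))
```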

New FTP file for orthologs

Orthologous genes and other types of gene groups are currently reported in the gene_group file on the Gene FTP site. For ease in accessing the orthology data subset, a new gene_orthologs FTP file has been created, which uses the same format as the gene_group file.

The ortholog records will continue to be represented in the gene_group FTP file for a period of 2 months, and then will be removed.

HIV-1 update

The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI).

For the protein interactions dataset:

  • 8,005 interactions
  • 16,215 interaction descriptions
  • 3,859 proteins encoded by 3,757 human genes
  • 6,822 publications.

For the replication interactions dataset:

  • 1,595 interactions
  • 1,854 interaction descriptions
  • 1,583 proteins encoded by 1,583 human genes
  • 229 publications.

Data are also available at the RefSeq HIV-1 web site and the GeneRIF FTP site.​ ​

RefSeq Functional Elements in Gene

NCBI is pleased to announce the initial data release of RefSeq Functional Elements, a resource that provides RefSeq and Gene records for experimentally validated human and mouse non-genic functional elements. For further information, please see the recent NCBI Insights blog and the RefSeq Functional Elements website.

Gene records for these elements include a new biological region Gene type, and feature types derived from feature annotation on the associated RefSeq. Both the Gene type and Feature type(s) are displayed in the Summary section of relevant Gene records, e.g., OPSIN-LCR. When appropriate, annotated INSDC features are listed along with feature classes or controlled vocabularies. For example:

  • misc_feature: conserved_region
  • regulatory: TATA_box, locus_control_region, promoter, transcriptional_cis_regulatory_region

On the Gene FTP site, the gene_info.gz files will be updated with a new column to represent the feature types. This update will occur within the next week. The complete description of the gene_info.gz files can be found at the FTP site, in: ftp://ftp.ncbi.nih.gov/gene/DATA/README

Update to gene2xml utility

If you use the gene2xml utility to read files from the ASN_BINARY directory on the Gene FTP site, please download the latest revision (1.6) here. This update will be needed to read new data elements that will be added to the ASN.1 in the near future.

To see which version of gene2xml you are currently using, simply run gene2xml with a single hyphen as a parameter.​

Expression data in Gene

NCBI's Gene resource has added a new feature to report normalized RNA expression levels computed from RNA-seq data for human, mouse, and rat genes. Expression data can provide key insights into where and when a gene may be functioning, for example by exposing the correlation between expression of human SLC25A4 and its established role in heart function, so this new feature should be a valuable addition for many researchers.

An expression chart is available on the Gene full report pages, with an additional table view and download option on the new expression report page available through the “See details” link or format menu. Bulk datasets will also be available on the Gene FTP site. The RNA-seq expression coverage graphs for each sample used to compute expression levels are available in the embedded graphical viewer and Genome Data Viewer under the expression category. We welcome questions about this new dataset at info@ncbi.nlm.nih.gov or through the “Contact Help Desk” link available on every Gene page.

Ensembl identifier versions

Matching annotations from Ensembl and Vega are reported in NCBI Gene's full report web page and FTP site. Soon the reported matching transcript and protein identifiers will include identifier version numbers, in accordance with Ensembl's stable identifier version statement. On the FTP site, this change will appear in the gene2ensembl and gene2vega files. These updates will occur as annotation is updated for each individual organism.

CCDS release 21 for mouse is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Mus musculus annotation release 106 to Ensembl's release 86 was released this week, and this update is now reflected in Gene as well. This update adds 938 new CCDS IDs, and adds 137 genes into the mouse CCDS set. CCDS release 21 includes a total of 25,757 CCDS IDs that correspond to 20,354 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS.​​

CCDS survey in progress

NCBI and the CCDS collaboration invite you to take a survey that will help us assess how the human and mouse Consensus CDS (CCDS) data is being accessed and used by the scientific community. We welcome your feedback and suggestions on this data collection. Data gathered from the survey will help us plan the future direction of the CCDS project. The survey is available on the CCDS webpage (https://www.ncbi.nlm.nih.gov/projects/CCDS/). ​

CCDS release 20 for human is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Homo sapiens annotation release 108 to Ensembl's release 85 was released last week, and this update is now reflected in Gene as well. This update adds 1,158 new CCDS IDs, and adds 98 genes into the human CCDS set. CCDS release 20 includes a total of 32,524 CCDS IDs that correspond to 18,892 GeneIDs.

For information about CCDS, please visit: https://www.ncbi.nlm.nih.gov/CCDS.​

HIV-1 update

The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI).

For the protein interactions dataset:

  • 7,762 interactions
  • 15,665 interaction descriptions
  • 3,729 proteins encoded by 3,649 human genes
  • 6,690 publications.

For the replication interactions dataset:

  • 1,325 interactions
  • 1,439 interaction descriptions
  • 1,325 proteins encoded by 1,325 human genes
  • 125 publications.

Data are also available at the RefSeq HIV-1 web site and the GeneRIF FTP site.​

Change to FTP file headers

Most of the tab-delimited files on the Gene FTP site include a header row beginning with # that describes the columns, with each column name separated by a single space character. In order to be consistent with the data rows, the column header row is being changed to use a tab character between each column name. If you are downloading and parsing files, this change may affect you. We plan to make this change at the end of July 2016.​
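
If you parse these files yourself, a header reader that accepts both the old and the new layout avoids breakage across the switch. A minimal sketch, assuming column names contain no spaces; the column names in the example are used only for illustration.

```python
from typing import List


def parse_header(line: str) -> List[str]:
    """Return column names from a '#' header row, space- or tab-separated."""
    text = line.rstrip("\r\n")
    if not text.startswith("#"):
        raise ValueError("header rows are expected to begin with '#'")
    text = text[1:].strip(" ")
    if "\t" in text:
        # New layout: a tab between column names, matching the data rows.
        return [name.strip() for name in text.split("\t")]
    # Old layout: a single space between column names.
    return text.split()


print(parse_header("#tax_id GeneID Symbol\n"))
print(parse_header("#tax_id\tGeneID\tSymbol\n"))
```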

FTP Update

The NCBI Eukaryotic Genome Annotation Pipeline recently announced that it will now directly annotate top-level sequences (chromosomes, unplaced, and unlocalized scaffolds), and drop annotation from placed scaffolds. In conjunction with this change, the content of the gene2accession and gene2refseq files on Gene’s FTP site is being modified to remove non-top-level scaffolds for all taxa. Also, links will no longer be provided between Gene and placed scaffolds with gene features. Annotations on unplaced and unlocalized scaffolds will not change. We plan to make this change in about 2 weeks.​

Links to genome browsers

Gene has added a new Genome Browsers section to the links in the right sidebar on the Full Report page, which provides an easy way to access all of your favorite browsers. These include:

  • Variation Viewer (human)
  • 1000 Genomes Browser (human)
  • NEW Genome Data Viewer, available for over 300 species
  • Map Viewer
  • Ensembl
  • UCSC

Try out the links available for BRCA1 or your favorite gene today!

Changes to historical annotation reporting

Gene is changing how historical annotation information is reported on current genes. The vast majority of current records in Gene are annotated on a current RefSeq genome; however, a small fraction of gene records are considered "current" even though they are not presently annotated on a genome. Historically, these records may have reported old annotation information in the Genomic Context, Genomic Regions, and Reference Sequences sections, as well as in the Gene FTP files. We are revising this policy to suppress the out-of-date annotation information to make it easier to focus on current annotation data. We are also working on reviewing and cleaning up historical records that are no longer of value.

This change will affect approximately 26,000 current Gene records. It does not affect reporting in the Genomic Context table for genes that are annotated on both the current and a previous assembly, such as human genes annotated on both GRCh38.p2 and GRCh37.p13. The change will be implemented within the next two weeks.

Changes to Gene-to-Ensembl matching

​Gene is revising the criteria used for matching RefSeq transcripts and proteins to similar annotation from Ensembl. Many vertebrate genomes have been annotated or re-annotated at NCBI using RNA-seq evidence to define transcripts, including UTRs, and significantly improve overall annotation quality. However, in some cases this results in fewer matches to similar Ensembl annotation solely because of UTR differences. To help compensate for this, we are revising our matching criteria as follows:

  • coding transcripts must have at least 80% protein overlap and at least 60% matching protein splice sites. Of those, the best transcript match is reported, but we are removing any minimum threshold for overall transcript matching to allow for more UTR differences.
  • non-coding transcripts must have at least 60% matching splice sites and 50% overlap (no change from prior criteria)

This change increases the percentage of protein-coding genes with at least one matched transcript by 10-15% in some organisms. It is important to remember that the Gene-to-Ensembl mappings do not report identical annotations, but for many studies the identification of similar annotations is a useful aid to utilize both resources effectively.
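
The revised thresholds can be written as a simple predicate. This is a sketch of the criteria as stated above, not the pipeline's code: the function and parameter names are hypothetical, fractions are expressed in the range 0 to 1, and the later step of choosing the best match among qualifying transcripts is not modeled.

```python
def passes_match_criteria(is_coding: bool,
                          splice_site_match: float,
                          overlap: float) -> bool:
    """Check one RefSeq/Ensembl transcript pair against the stated thresholds.

    For coding transcripts, `overlap` is protein overlap and
    `splice_site_match` is the fraction of matching protein splice sites.
    """
    if is_coding:
        # No minimum on overall transcript matching, so UTRs may differ.
        return overlap >= 0.80 and splice_site_match >= 0.60
    # Non-coding transcripts: unchanged from the prior criteria.
    return splice_site_match >= 0.60 and overlap >= 0.50


print(passes_match_criteria(True, splice_site_match=0.75, overlap=0.85))
print(passes_match_criteria(False, splice_site_match=0.55, overlap=0.90))
```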

FTP Update

The content of the gene2accession and gene2refseq files on Gene’s FTP site is being modified to represent a record for each location interval of cross-origin genes, trans-spliced genes, and other genes with multiple location intervals in one genomic placement. Currently, some of these genes appear in a single record with a location that represents the total range of all intervals combined.

This change will be implemented within the next week.​

HIV-1 update

The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI).

For the protein interactions dataset:

  • 7,567 interactions
  • 15,074 interaction descriptions
  • 3,623 proteins encoded by 3,582 human genes
  • 6,610 publications.

For the replication interactions dataset:

  • 1,298 interactions
  • 1,369 interaction descriptions
  • 1,298 proteins encoded by 1,298 human genes
  • 94 publications.

Data are also available at the RefSeq HIV-1 web site and the GeneRIF FTP site.​

CCDS release 19 for mouse is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Mus musculus annotation release 105 to Ensembl's release 81 was released this week, and this update is now reflected in Gene as well. This update adds 1,003 new CCDS IDs, and adds 148 genes into the mouse CCDS set. CCDS release 19 includes a total of 24,834 CCDS IDs that correspond to 20,215 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS.​​

New column in the mim2gene_medgen file

The mim2gene_medgen file

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen

will be modified this week by adding a new 6th column.

This column reports the qualifiers OMIM provides about a gene-phenotype relationship (see http://omim.org/help/faq).

NCBI converted these symbols to text as documented in our README file:

ftp://ftp.ncbi.nlm.nih.gov/gene/README

A '-' in the Comment column indicates that no qualifier was provided.
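
A parser can treat the new column defensively, so that rows from before the change still load. A rough sketch, assuming tab-delimited rows with the Comment value in the 6th position; the sample rows and the qualifier text are placeholders, not real data.

```python
from typing import Optional


def omim_qualifier(line: str) -> Optional[str]:
    """Return the Comment (6th column) value, or None when absent or '-'."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 6:
        # Row written before the 6th column was added.
        return None
    comment = fields[5].strip()
    return None if comment == "-" else comment


# Hypothetical rows: every value here is a placeholder.
print(omim_qualifier("000000\t0\tphenotype\t-\t-\t-"))
print(omim_qualifier("000000\t0\tphenotype\t-\t-\texample qualifier"))
```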

CCDS release 18 for human is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Homo sapiens annotation release 107 to Ensembl's release 79 was released this week, and this update is now reflected in Gene as well. This update adds 808 new CCDS IDs, and adds 86 genes into the human CCDS set. CCDS release 18 includes a total of 31,371 CCDS IDs that correspond to 18,826 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS.​

HIV-1 update

The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI).

For the protein interactions dataset:

  • 7,451 interactions
  • 14,781 interaction descriptions
  • 3,575 proteins encoded by 3,534 human genes
  • 6,505 publications.

For the replication interactions dataset:

  • 1,289 interactions
  • 1,360 interaction descriptions
  • 1,289 proteins encoded by 1,289 human genes
  • 88 publications.

Data are also available at the RefSeq HIV-1 web site and the GeneRIF FTP site.

Update to gene2xml utility

If you use the gene2xml utility to read files from the ASN_BINARY directory on the Gene FTP site, please download the latest revision (1.5) here. This update will be needed to read new data elements that will be added to the ASN.1 in the near future.

To see which version of gene2xml you are currently using, simply run gene2xml with a single hyphen as a parameter.​

HIV-1 update

​The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI).

For the protein interactions dataset:

  • 7,220 interactions
  • 14,346 interaction descriptions
  • 3,458 proteins encoded by 3,417 human genes
  • 6,372 publications.

For the replication interactions dataset:

  • 1,268 interactions
  • 1,336 interaction descriptions
  • 1,268 proteins encoded by 1,268 human genes
  • 72 publications.

Data are also available at the RefSeq HIV-1 web site and the GeneRIF FTP site.

Changes to E-Utilities/ESummary

On July 24, 2014, we gave advance notice of two upcoming changes that will affect users of E-Utilities/ESummary and Gene: a change to the default display format, and the removal of redundant elements in the Document Summary. These changes will be implemented this month. For the full description of these changes, please see the Gene News item from July 24, 2014.

HIV-1 update

​The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI).

For the protein interactions dataset:

  • 6,939 interactions
  • 13,707 interaction descriptions
  • 3,378 proteins encoded by 3,337 human genes
  • 6,145 publications.

For the replication interactions dataset:

  • 1,257 interactions
  • 1,324 interaction descriptions
  • 1,257 proteins encoded by 1,257 human genes
  • 59 publications.

Data are also available at the GeneRIF FTP site.

Changes coming for representation of prokaryotic genes

We are planning to change the representation of prokaryotic records in Gene over the coming year to adjust to the explosion in sequencing of new bacterial strains. In the past, Gene has represented prokaryotic genes from all complete genomes and ignored draft (WGS) genomes because of low sequence quality. This has resulted in an over-representation of genes from some species, primarily human pathogens, whereas other prokaryotic species have not been represented at all because of lack of a finished genome sequence.

The RefSeq project is now defining reference and representative genomes to use as a standard baseline for comparison while continuing to provide genome annotation for all bacterial strains, including disease outbreak isolates, to support surveillance and testing needs (Tatusova et al 2014, PMID 24316578). To accommodate this expansion, Gene will focus primarily on content for reference and representative RefSeq genomes for prokaryotes, including draft genomes, in order to reduce intra-species record redundancy and also include species currently missing from Gene. NCBI's prokaryotic annotation pipeline is also exploring methods to provide shared GeneIDs for equivalent loci on all genomes annotated for a given species in order to provide access to relevant gene information for non-reference/representative RefSeq genomes.

In the first round of updates, we will be discontinuing records for non-reference/representative prokaryotic genomes, or in some cases marking records as replaced by their equivalent GeneID from a reference or representative genome from the same species. This will result in a large decrease in the number of prokaryotic GeneIDs and NCBI TaxIDs currently represented, which should make it easier for users to find the records of most interest. A comprehensive history of record replacements is available in the gene_history.gz file on the Gene FTP site at ftp://ftp.ncbi.nih.gov/gene/DATA/.

A second round of updates will add representative prokaryotic WGS genomes to Gene, increasing the taxonomic breadth of records.

These changes are expected to begin in the next month.

CCDS release 17 for human is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Homo sapiens annotation release 106 to Ensembl's release 76 was released last week, and this update is now reflected in Gene as well. This update adds 1,871 new CCDS IDs, and adds 195 genes into the human CCDS set. CCDS release 17 includes a total of 30,499 CCDS IDs that correspond to 18,800 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS.

Important change coming for HGNC and MGI database identifiers

We will be making a change to the formatting of HGNC and MGI database identifiers (db_xref IDs) in Gene in early September that may affect your automated processes. Both HGNC and MGI consider the "HGNC:" or "MGI:" prefix to be part of the ID itself, in addition to referring to the source database, so that the ID string is a unique identifier (not just an integer). To help ensure that these IDs can be consistently processed by standard rules, we will be updating our web and FTP outputs to better reflect the full ID strings as follows:

  1. HGNC and MGI IDs in web displays and FTP files that use the "database:ID" format will appear as "HGNC:HGNC:###" and "MGI:MGI:###". In particular, this change will appear in the gene_info.gz FTP file. Note that this change has already been made in RefSeq nucleotide and protein flatfile displays.
  2. ASN.1 files from the Gene web or FTP site (*.ags.gz files) will contain the format:
             {db "HGNC",
                tag str "HGNC:###"}
    The equivalent change to the RefSeq ASN.1 files will also be made, and reflected in the next RefSeq FTP release.
  3. XML conversion of Gene ASN.1 (*.ags.gz files) will contain formatting like this for both HGNC and MGI Dbtags:
             <Dbtag>
                <Dbtag_db>HGNC</Dbtag_db>
                <Dbtag_tag>
                  <Object-id>
                    <Object-id_str>HGNC:###</Object-id_str>
                  </Object-id>
                </Dbtag_tag>
             </Dbtag>
  4. Dbxref attributes in future GFF3-formatted RefSeq files will use the "HGNC:HGNC:###" and "MGI:MGI:###" formats.

The intent is to store the full string, including the HGNC: or MGI: prefix, as the ID. The changes above are intended to allow you to do that seamlessly, either storing the full string as it appears in the ASN.1 or XML, or parsing db_xrefs on the first colon into database:identifier values, regardless of the particular database. Using the full string will provide consistent ID formatting between files from NCBI and other resources, improving inter-resource compatibility.

Important: If you have any special processing in place to insert the MGI: or HGNC: prefix, you may need to revise your code to avoid storing IDs with duplicate prefixes.
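The parsing rule described above can be sketched in Python as follows. This is a minimal illustration, not NCBI code: the function names split_db_xref and normalize_id are hypothetical. The first function splits a db_xref on the first colon only; the second guards against storing a duplicated prefix when legacy code already inserts one.

```python
def split_db_xref(db_xref):
    """Split a db_xref such as 'HGNC:HGNC:7' on the first colon only.

    Returns (database, identifier); the identifier keeps its own prefix.
    """
    database, _, identifier = db_xref.partition(":")
    return database, identifier


def normalize_id(database, raw_id):
    """Return the ID with exactly one 'DB:' prefix (hypothetical helper).

    Accepts a legacy bare integer ('7'), an already prefixed ID ('HGNC:7'),
    or an accidentally doubled prefix ('HGNC:HGNC:7').
    """
    prefix = database + ":"
    value = str(raw_id)
    # Strip every leading copy of the prefix, then add it back once.
    while value.startswith(prefix):
        value = value[len(prefix):]
    return prefix + value
```

With this approach, a pipeline that previously prepended "HGNC:" to an integer can pass both old and new inputs through normalize_id and store the same full ID string either way.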

For linking to HGNC or MGI after converting to the new ID format, you can use the following URL formats:
HGNC: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=<id>
MGI: http://www.informatics.jax.org/marker/<id>

For example:
http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=HGNC:7
http://www.informatics.jax.org/marker/MGI:87854
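As a small sketch of how these link formats might be applied to IDs stored in the new full-string form, the Python below fills the two URL templates given above. The names LINK_TEMPLATES and link_for are illustrative assumptions, and only the two databases named in this announcement are covered.

```python
# URL templates copied from the announcement; {id} is the full ID string.
LINK_TEMPLATES = {
    "HGNC": "http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id={id}",
    "MGI": "http://www.informatics.jax.org/marker/{id}",
}


def link_for(full_id):
    """Build a link for a full ID such as 'HGNC:7' or 'MGI:87854'."""
    # The prefix before the first colon names the source database.
    database = full_id.split(":", 1)[0]
    return LINK_TEMPLATES[database].format(id=full_id)
```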

Changes to E-Utilities/ESummary

There are two changes with both short-term and long-term consequences for E-Utilities users.

New XML Display Format for ESummary

The current, default display in ESummary for Gene uses an XML Item element for each element in the DocSum, along with a Name attribute to identify the element name. For example:

<Item Type="Integer" Name="TaxID">9606</Item>

A new display format is now available in ESummary that is simpler and more compact, using each field name as the XML tag:

<TaxID>9606</TaxID>

This new format can be used by specifying "version=2.0" as a URL parameter, e.g.,

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=672&version=2.0

To revert to the original format, you can specify "version=1.0", or simply omit the parameter for the version.

At this time, the default display format will remain unchanged. However, in January 2015, we plan to change the default display format to the new, compact format. You will still be able to use the original format by explicitly specifying "version=1.0".
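For scripts that need to work across the change in default, one option is to request a version explicitly and to read fields in a way that accepts either format. The Python below is a sketch under that assumption: esummary_url and get_field are hypothetical names, and get_field is only exercised against the single-element fragments shown above, not against a full ESummary response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESUMMARY = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(gene_id, version=None):
    """Build an ESummary URL for Gene; omit version to get the default."""
    params = {"db": "gene", "id": str(gene_id)}
    if version is not None:
        params["version"] = version
    return ESUMMARY + "?" + urlencode(params)


def get_field(xml_text, name):
    """Return the text of a DocSum field from either display format."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # New format: the field name is the tag itself.
        if elem.tag == name:
            return elem.text
        # Original format: an Item element with a Name attribute.
        if elem.tag == "Item" and elem.get("Name") == name:
            return elem.text
    return None
```

Pinning "version=1.0" or "version=2.0" in every request avoids depending on whichever format is the default at the time.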

New Element in DocSum for Organism Information

The Gene DocSum has been updated with a new Organism element that consolidates organism information, namely:

  • Scientific name
  • Common name
  • Tax ID

Note that two existing elements, "Orgname" and "TaxID", contain redundant information. In January 2015, we plan to remove these two redundant elements. Users of ESummary should plan to access the new Organism element to obtain these two pieces of information.

HIV-1 update

The HIV-1 interaction datasets available in NCBI's Gene resource have been updated with data provided by the Southern Research Institute (SRI). For the protein interactions dataset, Gene now reports 12,785 interaction descriptions for 3,183 proteins encoded by 3,142 human genes. For the replication interactions dataset, Gene now reports 1,316 interaction descriptions for 1,250 proteins encoded by 1,250 human genes.

Data are also available at the GeneRIF FTP site.

Identifying genes that are not in the current annotation release

Genes from the eukaryotic genome pipeline that are not annotated in the current annotation release are identified in the Summary section of the full report display with the phrase "not on current assembly". Such genes can also be found by searching with the quoted phrase "not on current assembly". To improve the accuracy of this phrase, it is being changed to "not in current annotation release".

The distinction relates to the difference between an annotation release and an assembly.  If a gene is not annotated in the current annotation release, but was annotated in a previous annotation release, and that previous annotation release was based on the same assembly, then the gene can be considered to still be on the current assembly.  In that case, it would be inaccurate to identify the gene as being "not on current assembly".

This change will be made within the next few days.

CCDS release 16 for mouse is public in Gene

The Consensus Coding Sequence (CCDS) update that compares NCBI's Mus musculus annotation release 104 to Ensembl's release 75 was released this week, and this update is now reflected in Gene as well. This update adds 803 new CCDS IDs, and adds 97 Genes into the mouse CCDS set. CCDS release 16 includes a total of 23,880 CCDS IDs that correspond to 20,079 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS.

Changes to Gene-to-Ensembl matching

Gene has updated the criteria used for matching RefSeq transcripts and proteins to similar annotation from Ensembl. Many vertebrate genomes have been annotated or re-annotated at NCBI in the last year, using RNA-seq evidence to define transcripts and providing annotation as a mixture of known (NM/NR/NP) and model (XM/XR/XP) RefSeq products. In particular, the annotations for 58 vertebrate, 7 invertebrate, and 6 plant species now take advantage of public short-read RNA-seq data to improve the quality of NCBI's RefSeq annotations.

In some cases these changes resulted in substantially fewer matches to Ensembl's annotation, often because of differences in UTR predictions. In an effort to provide useful mappings to similar Ensembl transcripts and proteins, we have relaxed our matching criteria to compensate for these differences. Specifically, we have retained the requirement for at least 80% protein overlap and at least 60% matching splice sites, but have relaxed the minimum transcript overlap criteria from 80% to 50%. These changes allow models with similar coding regions (CDS) but different UTR extents to be reported as matches. The matching criteria for human and mouse comparisons have been left at their original values (80% CDS overlap, 60% splice sites, 80% coverage).
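The thresholds above can be expressed as a simple predicate. The Python below is only a sketch of the stated cut-offs: how the overlap and splice-site fractions are computed is not described here, so they are taken as precomputed inputs in the range 0 to 1, and the function name is_match is hypothetical.

```python
def is_match(protein_overlap, splice_site_match, transcript_overlap,
             human_or_mouse=False):
    """Apply the matching thresholds quoted in this announcement.

    All three inputs are fractions between 0 and 1 (an assumption about
    representation, not a description of NCBI's implementation).
    """
    # Human and mouse keep the original 80% transcript requirement;
    # other organisms use the relaxed 50% requirement.
    min_transcript = 0.80 if human_or_mouse else 0.50
    return (protein_overlap >= 0.80
            and splice_site_match >= 0.60
            and transcript_overlap >= min_transcript)
```

For example, a pair with 90% protein overlap, 70% matching splice sites, and 55% transcript overlap passes under the relaxed criteria but not under the human and mouse criteria.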

It is important to remember that the Gene-to-Ensembl mappings do not report identical annotations, but for many studies the identification of similar annotations is a useful aid to utilize both resources effectively.

Ortholog and Gene Region links now available in Gene

Orthologs
The Homology section for many genes now features a link for "Orthologs from Annotation Pipeline." This dataset is computed as part of NCBI's Eukaryotic Genome Annotation Pipeline using a combination of protein sequence similarity and local synteny information. The pipeline determines orthology between the genome assembly that is being annotated and a reference genome, typically human. The collection of pairwise orthology calls is then tracked as a group which may be further supplemented by manual curation.  This process provides ortholog information more quickly for newly annotated genomes, and supplements the content available in HomoloGene. The link provided in the Homology section of the Gene Report returns the list of Gene records that are tracked in a group, and thus includes the reference gene plus the set of orthologs computed (or manually stored) for other species. Data are currently available for 77 vertebrate species and will be expanded with future annotation releases. For example, the Homology section of the Gene report page for BRCA1 now includes this link, which when followed resolves to the set of BRCA1 orthologs calculated in this manner.

Regions
We define Region gene records for loci that are officially named and are composed of multiple parts or represent clusters of related genes, such as the immunoglobulin heavy locus (GeneID:3492) or the homeobox D cluster (GeneID:3230). We are now providing links between the region gene and its members. Look for the "Related region members" and "Related region gene" links under the General gene information section. Links are currently available for 17 officially named region loci in human and mouse. For example, the IGH region gene is linked to the 182 gene records representing the individual immunoglobulin segments annotated in the IGH region.

Both of these datasets are also available for download from the Gene FTP site. The terms that are used to report these new relationships in gene_group.gz are Ortholog, Region member, and Region parent:
ftp://ftp.ncbi.nih.gov/gene/DATA/gene_group.gz

Gene will soon provide query results as a table, and support filters

Gene is updating its query interface to report results in a tabular format, and to provide filters to make it easier to refine those results. The filters replace the previous Limits interface. If you prefer the paragraph format of the result set, you can use Display Settings at the top of the page to switch from Tabular to Summary.

The use of filters rather than a Limits page is modeled after PubMed, so you may already be familiar with how to select and clear your filters. The commonly used functions of the Limits page were retained, though some now appear elsewhere on the page. For example, if you frequently filtered by species, you can still do so via the Top Organisms section at the upper right of the result page.

The tabular report format is also available as tab-delimited text, which includes additional columns in a more parsable format. In Display Settings, this is implemented by selection of Tabular (text). Note that there is an upper limit of 200 records that can be returned by this mechanism. To download a complete result set in tabular format, use the Send to option at the upper right, select file, and Format Tabular Text.

We are updating our Help documentation to provide more details.

These functions should be public later this month.

CCDS release 15 for human is public in Gene

The Consensus Coding Sequence (CCDS) update for Homo sapiens annotation release 105 and Ensembl release 74 was released last week, and this update is now reflected in Gene as well. This update adds 349 new CCDS IDs, and adds 12 Genes into the human CCDS set. CCDS release 15 includes a total of 29,045 CCDS IDs that correspond to 18,683 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/CCDS

RNA-Seq coverage graphs available in Gene

The graphical display in NCBI's Gene resource has been updated to include several additional tracks, including the human Ensembl annotation as well as RNA-Seq coverage tracks for organisms annotated using NCBI's Eukaryotic Genome Annotation Pipeline. The RNA-Seq tracks display the aggregate exon and intron coverage and individual intron features found from the set of RNA-Seq samples used for genome annotation. The data is filtered to reduce background noise levels in the tracks, and log(2) scaled to help visualize large differences in expression levels. For example, see Xenopus nbr1. The individual sample coverage graphs will be available in the Gene graphical viewer configuration interface and in other browsers in a future release.

RNA-Seq coverage graphs are currently available for 27 organisms, including human, zebrafish, cow, chicken, Xenopus tropicalis, sea hare and chickpea, and will be added for additional organisms as they are re-annotated in the future. The graphs are generated by NCBI's Eukaryotic Genome Annotation Pipeline, which utilizes short read RNA-Seq data from the Sequence Read Archive to aid in the prediction of gene models and for reporting supporting evidence. These coding and non-coding models are incorporated into the RefSeq database with XM/XR/XP accession prefixes as the primary annotation of model genes or alternative variants of known genes. RNA-Seq support for a RefSeq is reported in one of two ways:

  • Model RefSeqs (XM/XR/XP accession prefixes) include a /note describing transcript, protein, and RNA-Seq support on the genomic and transcript records (e.g., XM_005270694.1).
  • Known RefSeqs (NM/NR/NP accession prefixes) include an Evidence Data structured comment describing the transcript and RNA-Seq support (e.g., NM_001282554.1)

More details on NCBI's annotation process are available at: http://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/
A summary of recently completed annotations is available at: http://www.ncbi.nlm.nih.gov/genome/annotation_euk/status/#recent

CCDS human update (Hs105) is public in Gene

The Consensus Coding Sequence (CCDS) update for Homo sapiens annotation release 105 was released yesterday, and this update is now reflected in Gene as well. This update adds 978 new CCDS IDs, and adds 74 Genes into the human CCDS set. CCDS release 14 includes a total of 28,694 CCDS IDs that correspond to 18,673 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS

Update to gene2xml utility

If you use the gene2xml utility to read files from the ASN_BINARY directory on the Gene FTP site, please download the latest revision (1.4) here. This update will be needed to read new data elements that will be added to the ASN.1 in the near future.

To see which version of gene2xml you are currently using, simply run gene2xml with a single hyphen as a parameter.

Finding genes linked to Swiss-Prot

A new srcdb swiss prot property has been added to Gene to make it easy to find records with protein sequences derived from Swiss-Prot. For example, the following query will find all such records:

NCBI's Homo sapiens Annotation Release 105 is public

NCBI recently completed a re-annotation of 3 complete and 1 partial human assemblies:

  • GRCh37.p13 (GCF_000001405.25)
  • CHM1_1.1 (GCF_000306695.2)
  • HuRef (GCF_000002125.1)
  • CRA_TCAGchr7v2 (GCF_000002135.2)

This is our last full annotation of the GRCh37 assembly; the next full annotation release for human will include GRCh38. See http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/ for more details.

You may have noticed that in this release, there are genes that include both known RefSeqs (accession starting with N) and model RefSeqs (accession starting with X). Previously, we did not allow a mixture of known and model RefSeqs for a gene. We have changed this policy in order to provide increased annotation of splice variants. RefSeq models are calculated using cDNA, protein, and RNA-Seq data. There may be good support at the level of each exon pair; however, the long range exon combination represented in the model may not be fully supported and thus is less likely to be represented with an N* series accession. Twelve thousand genes were annotated with both known and model RefSeqs on the GRCh37.p13 assembly, approximately doubling the number of splice variants represented. For example, see Gene ID: 23499.

CCDS mouse update (Mm103) is public in Gene

The Consensus Coding Sequence (CCDS) update for Mus musculus annotation release 103 was released earlier this week, and this update is now reflected in Gene as well. This update adds 96 new CCDS IDs, and adds 61 Genes into the mouse CCDS set. CCDS release 13 includes a total of 23,093 CCDS IDs that correspond to 19,988 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS

mim2gene replaced with mim2gene_medgen

As reported July 8, 2013, the file mim2gene_medgen (ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen), which reports relationships between MIM numbers and records in Gene and MedGen, replaces the previous files mim2gene_partial and mim2gene.

mim2gene, an alias to mim2gene_partial, had not been updated after July 8, 2013.

Details about the mim2gene_medgen file are provided here:

ftp://ftp.ncbi.nih.gov/gene/DATA/README

If you have any questions, please contact our help desk, info@ncbi.nlm.nih.gov, or use this form:

http://www.ncbi.nlm.nih.gov/projects/RefSeq/update.cgi

 

More gene neighbors data

The gene neighbors data that appears in the Genomic Context diagram on the Full Report web page is now being made available in more places:

  1. On our FTP site, there is a new file named gene_neighbors.gz that contains the raw data that corresponds to the Genomic Context diagram. This new file is described in more detail in the README file.
  2. On the Full Report web page, the links section under the Related Information heading on the right hand side includes a new link to Gene Neighbors. This link re-queries Gene to produce a set of search results that includes the subject gene and all of its neighbors, from all top-level genomic placements.
  3. Gene neighbors can be queried programmatically using E-Utilities, specifically, using the Entrez Links function. For example, to find all neighbors of GeneID 672, you would use this:

    • http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&dbto=gene&linkname=gene_gene_neighbors&from_uid=672

    Note that the neighbors that are identified using this method are not associated with a specific genomic placement, but instead represent all neighbors for the gene from all reported genomic placements. In many cases, a given gene's neighbors are the same for all genomic placements. However, in some cases, a gene's neighbors differ from one genomic placement to another, for example, between the reference assembly and an alternate assembly.
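For scripted use, the example URL above can be assembled from its parts. The Python below is a minimal sketch; the function name gene_neighbors_url is hypothetical, and the parameter names are copied exactly from the URL shown in this announcement rather than taken from any other documentation.

```python
from urllib.parse import urlencode

ELINK = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def gene_neighbors_url(gene_id):
    """Build the Entrez Links URL for the neighbors of one GeneID."""
    # Parameter names and order follow the example URL in the announcement.
    params = {
        "dbfrom": "gene",
        "dbto": "gene",
        "linkname": "gene_gene_neighbors",
        "from_uid": str(gene_id),
    }
    return ELINK + "?" + urlencode(params)
```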

Transitioning to Annotation Release numbers

The NCBI eukaryotic genome annotation pipeline has started to use an Annotation Release number, as announced here.  As re-annotations occur, the use of build numbers and versions will be phased out and replaced with Annotation Release numbers.  This change will be reflected in Gene, both in our Full Report web page, and in FTP files that include this information, such as the README_ensembl file.

Modification to mim2gene_medgen

The file mim2gene_medgen (ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene_medgen), which reports relationships between MIM numbers and records in Gene and MedGen, will be modified today to report all null values as '-'.

We encourage you to test this file, because it will replace mim2gene_partial and mim2gene later this month.
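When testing the file, a parser may want to treat '-' as a missing value. The Python below is a sketch under the assumption that the file is tab-delimited; the function name parse_fields is hypothetical, and no particular column layout is assumed (see the README for the actual columns).

```python
def parse_fields(line):
    """Split one tab-delimited line, mapping the '-' null marker to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if field == "-" else field for field in fields]
```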

Details about the mim2gene_medgen file are provided here:

ftp://ftp.ncbi.nih.gov/gene/DATA/README

If you have any questions, please contact our help desk, info@ncbi.nlm.nih.gov, or use this form:

http://www.ncbi.nlm.nih.gov/projects/RefSeq/update.cgi

Modification to reporting related sequences for human mitochondrial genes

With the depth of sequencing that has been done for the human mitochondrion, it is no longer feasible to report all related sequences in Gene, so we are no longer reporting any related sequences for these genes. From Gene, you can find related sequences by clicking on the RefSeq protein link, scrolling down the protein sequence that is returned, finding the 'Related information' section, and clicking either on 'Related Sequences' or 'Identical Proteins'. You can then select the species of the sequences you want to review by using the Top Organisms filter in the right column of the page of results that is displayed. This change will affect linking from protein records to Gene, and the gene2accession file on our FTP site. We hope it does not result in any inconvenience to you.

CCDS human update (Hs104) is public in Gene

The Consensus Coding Sequence (CCDS) update for Homo sapiens annotation release 104 was released earlier this week, and this update is now reflected in Gene as well. This update adds 302 new CCDS IDs, and adds 79 Genes into the human CCDS set. CCDS release 12 includes a total of 27,752 CCDS IDs that correspond to 18,606 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS

Two FTP Updates

Two changes will be made in the next week, affecting the gene2accession and gene2refseq files on Gene’s FTP site.

The first change relates to genes with mature peptide and precursor protein products. While both of these types of products appear in the Gene full report display, they are not both represented in the aforementioned FTP files. To make these data more complete, the accession and GI for the precursor protein product will appear in existing columns #6 and #7, respectively; two new columns will be added for the accession and GI for the mature peptide product, and these will become columns #14 and #15, respectively.

The second change for both of these FTP files is to add the current gene symbol, which will become column #16. This will make it easier to associate mRNA and protein features with their corresponding genes. The GeneID will continue to be reported in column #2.
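The column positions described above can be captured in a small lookup. The Python below is a sketch that assumes the files are tab-delimited and uses the 1-based column numbers from this announcement. The field names in COLUMNS and the function parse_row are hypothetical labels, not official column headers. Rows from older files with fewer columns yield None for the new fields.

```python
# 1-based column numbers as stated in the announcement.
COLUMNS = {
    "gene_id": 2,
    "protein_accession": 6,          # precursor protein product
    "protein_gi": 7,
    "mature_peptide_accession": 14,  # new column
    "mature_peptide_gi": 15,         # new column
    "symbol": 16,                    # new column
}


def parse_row(line):
    """Extract the columns of interest from one tab-delimited row."""
    fields = line.rstrip("\n").split("\t")
    return {
        name: (fields[col - 1] if col <= len(fields) else None)
        for name, col in COLUMNS.items()
    }
```

Looking up columns by number in one place makes it easier to adjust a parser when columns are appended, as they are here.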

More Matches to Ensembl Annotation

Gene is now calculating matches to Ensembl annotation for non-coding RNA.  This supplements our existing calculations of matches to protein coding loci.

Matches to Ensembl annotation may be found in Gene with the matches Ensembl property, i.e.:

Also, there is a comprehensive list of matches in the gene2ensembl.gz file on our FTP site.

HIV-1 update

The HIV-1 human protein interaction dataset available in NCBI's Gene resource has been updated. This update, provided by the Southern Research Institute (SRI), nearly doubles the number of interaction descriptions available in Gene and increases the number of GeneIDs with HIV-1 human interaction data by more than 75%. With this update, Gene now reports 10,009 interaction descriptions for 2,570 proteins encoded by 2,553 human genes.

The Gene records with these data can be retrieved using this query:

http://www.ncbi.nlm.nih.gov/gene?term=hiv1interactions[Properties]

Data are also available at the GeneRIF FTP site.

CCDS human update (Hs103) is public in Gene

The Consensus Coding Sequence (CCDS) update for Homo sapiens annotation release 103 was released earlier this week, and this update is now reflected in Gene as well. This update adds 1138 new CCDS IDs, reinstates 2 previously withdrawn CCDS IDs, and adds 93 Genes into the human CCDS set. CCDS release 11 includes a total of 27,511 CCDS IDs that correspond to 18,535 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS

Increase in size of GeneRIF text

The maximum size of GeneRIF text is being increased from 255 characters to 425 characters.  If you are downloading and parsing files from the GeneRIF FTP site, this may affect you.  This change will be effective within the next few days.

New FTP subset files for 4 organisms

The ASN_BINARY and GENE_INFO directories on the Gene  FTP site have been updated with subset files for four additional organisms:

Archaea_Bacteria:
Escherichia coli str. K-12 substr. MG1655
Pseudomonas aeruginosa PAO1
Fungi:
Penicillium chrysogenum Wisconsin 54-1255
Plants:
Chlamydomonas reinhardtii

Please contact the NCBI Service Desk (info@ncbi.nlm.nih.gov) if you have any questions or suggestions.

CCDS mouse update (Mm38.1) is public in Gene

The Consensus Coding Sequence (CCDS) update for mouse build Mm38.1 was released earlier this week, and this update is now reflected in Gene as well. This update adds 958 new CCDS IDs, reinstates 1 previously withdrawn CCDS ID, and adds 506 Genes into the mouse CCDS set. Mouse build 38.1 includes a total of 23,027 CCDS IDs that correspond to 19,945 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS

WormBase identifiers and Gene

Records for Caenorhabditis elegans are now being submitted to NCBI from ENA (http://www.ebi.ac.uk/ena/) rather than directly from WormBase (http://www.wormbase.org/). An unexpected consequence of that change was a modification to how links to WormBase were provided, so that the references from Gene to WormBase were lost. We expect corrected data to be submitted in November of 2012. In the meantime, some of the data (protein-coding genes only) are now available via FTP:
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/misc/WormBase2GeneID_WS231.txt

We regret any inconvenience.

RefSeq genome annotation in GFF3 format (v.1.20)

If you have been looking for information about NCBI's gene annotation, and/or have used the seq_gene.md files, we are pleased to announce that NCBI is now providing annotation of the latest assemblies for human, cow, dog, chicken, and many others in the GFF3 format (specification version 1.20) in the genomes path of our ftp site. For example, the human GRCh37.p5 annotation is available at:

ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/

Work to provide GFF3 files for all species that NCBI annotates is ongoing, and the files will be provided routinely in future releases. See the README file in each species directory for further details.

Please contact the NCBI Service Desk (info@ncbi.nlm.nih.gov) if you have any questions or suggestions.

Links Between RefSeqs and Vega

Gene is now calculating matches between NCBI and Vega annotation (which in turn is provided by the HAVANA group at the Wellcome Trust Sanger Institute).

This data can be accessed in a number of different ways.

First, in the Full Report view in Gene, matching Vega transcripts and proteins are listed with the RefSeqs in the NCBI Reference Sequences section next to the label Related, with links to the Vega web site for the associated transcripts and proteins. Links to Vega genes are reported in the Summary section at the top of the Full Report, next to the See related label.

Second, genes with matching Vega annotation can be found using a new property named "matches Vega". For example, to find all genes with Vega matches, use:

  • matches Vega [properties]

Third, Vega matches are provided on our FTP site in a new file named gene2vega.gz. This file is described in ftp://ftp.ncbi.nih.gov/gene/README.

 

Changes in reporting of clone names

Clone names have been displayed in the Full Report web page in the summary section at the top of the page, with the label Also known as, along with alternate symbols.  The clone names are now being reported separately, in the General gene information section, under the heading Clone Names.  This will make it easier to distinguish clone names from alternate symbols.

In the gene_info FTP file, clone names were previously reported in column 5 under the heading Synonyms, but will no longer be reported in this file.

Phasing out use of hypothetical in Gene and RefSeq protein names

Gene and RefSeq are discontinuing use of 'hypothetical' for predicted genes with little or no similarity to known products. Although the names of discontinued records will not be changed, names beginning with 'hypothetical' in current records will now begin with the word 'uncharacterized'. See also: http://www.uniprot.org/docs/nameprot

CCDS human update (HsGRCh37.3) is public in Gene

The Consensus Coding Sequence (CCDS) update for human build HsGRCh37.3 was released earlier this week, and this update is now reflected in Gene as well. This update adds 972 new CCDS IDs, reinstates 2 previously withdrawn CCDS IDs, and adds 91 Genes into the human CCDS set. Human build 37.3 includes a total of 26,473 CCDS IDs that correspond to 18,471 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS 

Celera assembly no longer appearing in Gene

The RefSeq project is no longer calculating annotation updates for the human Celera assembly, and has suppressed all RefSeqs that were based on that assembly and its annotation.  Thus these RefSeqs will no longer appear in current Gene records, or in the gene2accession or gene2refseq files on our FTP site:

    ftp://ftp.ncbi.nih.gov/gene/DATA/

All gene predictions based only on the Celera assembly have also been discontinued. So don’t be concerned about the changes in file sizes; this is intentional.

Readthrough genes

Are you interested in retrieving information about genes that are sometimes transcribed with others? Gene describes these as readthrough loci, and has improved reporting their attributes. In the General gene information section you will now find a collapsible subsection with the word Readthrough and the symbol of the readthrough locus. Within that section will be a report of the loci included in the readthrough (if you are looking at the record for the readthrough) or the report of the readthrough and other included loci (if you are looking at one of the included loci).

All loci involved in parent-child readthrough can be retrieved by the query readthrough[property] or the explicit property values:

  • readthrough parent[property]
  • readthrough child[property]

Loci that are potentially readthrough can be retrieved by this query:

  • potential readthrough child[property]

Each pair of loci in a readthrough relationship is represented on our FTP site in the gene_group file. These relationships include readthrough parent, readthrough child, readthrough sibling, and potential readthrough child.

Gene elected to use the term readthrough, rather than conjoined, because officially named loci in this category include the word readthrough.

Words excluded from queries

Did you know that some common words and terms are automatically excluded from searches in Gene?  Usually this helps make searches more accurate.

However, sometimes these words, which we call stopwords, happen to be gene symbols, and this can interfere with the task of finding the gene that you are looking for.  In instances like this, the exclusion of the stopword can be bypassed.  For example, if you are searching for a gene whose symbol is WAS, you can easily find it by using a field qualifier or by enclosing the term in double quotes:

Please see the section in the Gene Help for a list of stopwords that are used in Gene.

The list is also available in the stopwords_gene file on our FTP site:

    ftp://ftp.ncbi.nih.gov/gene/DATA/

Policy change in reporting gene location only on top level genome-level RefSeqs

It is a goal of our annotation pipeline to report publicly only top-level sequences.  In other words, if there is a RefSeq accession for a chromosome, the component scaffolds need not be public.  However, if a scaffold is not a component of a chromosome, the scaffold would be 'top-level'.

Achieving this goal would make NCBI consistent with UCSC and EBI, and would reduce complexity in our web displays. 

Gene currently reports locations of gene features on all levels of RefSeqs, but we plan to restrict our reporting to top-level sequences in advance of the implementation of the 'top level only' policy by our annotation group. The affected structures would be:

  • Gene locations in both the Locus and Reference Sequences sections in the ASN.1
  • Web displays (Full Report/GeneTable)

We plan to implement this change on or about June 1, 2011.

If you have questions or need further information, please contact info@ncbi.nlm.nih.gov.

Ensembl matches for Pongo abelii (Sumatran orangutan)

Gene is pleased to announce that Pongo abelii (Sumatran orangutan) has been added to the list of organisms for which Ensembl annotation matches are being identified. Matches between NCBI annotation and Ensembl annotation can be found: in the Reference Sequences section of the Full Report display; using the "matches Ensembl" index property; and in the gene2ensembl FTP file.

A summary of organisms whose annotations have been compared, including release and assembly information, and the date when the comparison was last performed, can be found at:

ftp://ftp.ncbi.nih.gov/gene/DATA/README_ensembl

OMIM at NCBI

As you may have noticed from the posting on http://www.ncbi.nlm.nih.gov/omim/, NHGRI assumed funding responsibility for OMIM at JHU as of January 1, 2011, and has been working to conclude an agreement with JHU to allow NIH to obtain OMIM updates. This has still not been achieved, so the file ftp://ftp.ncbi.nih.gov/gene/DATA/mim2gene is incomplete and out of date. OMIM staff have been adding MIM numbers for genes, but some are missed and we have no computational way to verify them. We know the MIM numbers for disorders are underrepresented. We suggest you go to http://omim.org/downloads. We will be renaming the file mim2gene as mim2gene_partial on April 29, 2011.

Try PheGenI - a new link in the Phenotype section of Gene

You can now select a physical trait or phenotype, a region of the genome, a gene or a [set of] SNP(s) and find the genomic variants associated with it by using a new web portal, called the Phenotype-Genotype Integrator (PheGenI, pronounced FEE-GEE-NEE).  The link provided from the Phenotype section in Gene takes you to a window of the chromosome including that gene.

Use MyNCBI to establish your preference for displaying records in Gene

My NCBI supports storing preferences for displaying records from many of NCBI's databases.  For Gene, this now includes control over which sections of a full report are 'open' or 'closed' by default. If you find yourself routinely scrolling by a certain section, you can now set it to be 'closed'. You can always open it to display the content by clicking on the triangle icon at the left of the section header.

More Matches to Ensembl Annotation

Entrez Gene is pleased to announce that Anolis carolinensis (green anole, lizards) and Meleagris gallopavo (turkey) have been added to the list of organisms for which Ensembl annotation matches are being identified. As before, matches between NCBI annotation and Ensembl annotation can be found: in the Reference Sequences section of the Full Report display; using the "matches Ensembl" index property; and in the gene2ensembl FTP file.

A summary of organisms whose annotations have been compared, including release and assembly information, and the date when the comparison was last performed, can be found at:

ftp://ftp.ncbi.nih.gov/gene/DATA/README_ensembl

 

CCDS human update is public in Gene

The Consensus Coding Sequence (CCDS) update for human build HsGRCh37.2 was released earlier this week, and these updates are now reflected in Gene as well.  This update adds 2,126 new CCDS IDs, reinstates 13 previously withdrawn CCDS IDs, and adds 365 Genes into the human CCDS set. Human build 37.2 includes a total of 25,564 CCDS IDs that correspond to 18,409 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS 

Added explicit reporting of the curation status of RefSeqs for a gene

Gene now reports the curation status of a gene and its RefSeqs in two locations. 

  • In the Summary section, as RefSeq status: This is calculated as the highest review status for any RefSeq representing or annotated with this gene.
  • In the Reference sequences section, as Status, immediately under the accession and title of the sequences.

This functionality is being phased in, and is not apparent for all taxa at this time.

An example (which will change as more RefSeqs are reviewed)  is

http://www.ncbi.nlm.nih.gov/gene/643

The RefSeq status is REVIEWED in the Summary section, because NM_032966.1 has REVIEWED status. NM_001716.3, however, is VALIDATED.
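
The "highest review status" rule can be sketched as follows. This is only an illustration: the example above implies that REVIEWED outranks VALIDATED, but the other status codes in the list and their relative order are assumptions, and the function name is hypothetical.

```python
# Assumed ranking, highest first. Only REVIEWED > VALIDATED is implied by the
# example above; the remaining codes and their order are illustrative.
STATUS_RANK = ["REVIEWED", "VALIDATED", "PROVISIONAL", "PREDICTED", "MODEL", "INFERRED"]


def gene_refseq_status(refseq_statuses):
    """Return the highest-ranked status among a gene's RefSeqs, or None if empty."""
    known = [s for s in refseq_statuses if s in STATUS_RANK]
    if not known:
        return None
    # The lowest index in STATUS_RANK is the highest review status.
    return min(known, key=STATUS_RANK.index)


# Input mirroring the GeneID 643 example above.
statuses = {"NM_032966.1": "REVIEWED", "NM_001716.3": "VALIDATED"}
print(gene_refseq_status(statuses.values()))
```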

Search results are now grouped by NCBI Taxonomy identifiers

When a search of Gene results in multiple results, the panel at the right now includes a section, labeled Top Organisms [Tree], which summarizes the results by identifiers in NCBI's taxonomy database.  This report, which is used in multiple databases at NCBI, allows you to switch among:

  • Default: List of top 5 organisms in descending order.
  • Result from clicking on more... at the bottom of the default display: List of the top 20 organisms in descending order.
  • Result from clicking on [Tree]: Counts of results grouped by the taxonomic nodes.

If your species of interest is not in the top 20, you can either refine your query, or move down the summary, 20 organisms at a time, by clicking on the number after All other taxa at the bottom of the list.

Try it:

http://www.ncbi.nlm.nih.gov/gene?term=zinc%20finger

 

RefSeqGene, Locus Reference Genomic (LRG) and Gene

The genomic sequences of human genes with known medical importance are being accessioned as RefSeqGenes (http://www.ncbi.nlm.nih.gov/refseq/rsg/). As a member of the Locus Reference Genomic (http://www.lrg-sequence.org) collaboration, the RefSeqGene group may also assign a unique LRG accession (PubMed 20398331) to a version of a RefSeqGene sequence.

Human genes with RefSeqGene and LRG accessions can be retrieved from Gene's query interface.  For RefSeqGenes, just use refseqgene as your query (case insensitive).  For LRG, use has_lrg[property] or the shorter has_lrg[prop].

To retrieve the sequence record from the Gene full report page, use the RefSeqGene link in the Links section at the right, or use the Reference Sequences link in the table of contents to navigate quickly to the Reference Sequences section. The RefSeqGene is listed first. If there is an LRG, the accession, displayed at the far right, anchors a link to the record at LRG.

Try it...

Gene records with RefSeqGenes: http://www.ncbi.nlm.nih.gov/gene/?term=refseqgene

Gene records with LRG: http://www.ncbi.nlm.nih.gov/gene/?term=has_lrg[prop]
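
The same query terms could also be submitted programmatically. The sketch below only builds the request URLs and does not contact any server. It assumes the E-utilities ESearch endpoint and its db, term and retmax parameters, none of which are described in this announcement; the function name is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service (an assumption; not stated above).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(term, retmax=20):
    """Build an ESearch URL that searches the Gene database for `term`."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    # urlencode percent-encodes the square brackets in terms like has_lrg[prop].
    return ESEARCH + "?" + urlencode(params)


for term in ("refseqgene", "has_lrg[prop]"):
    print(gene_search_url(term))
```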

For more information about RefSeqGene or LRG, visit the home pages listed above. Contact information is provided there.

Modifications to the phenotype section

The phenotype section has been restructured. The changes are most apparent for human genes, whenever there are data from GeneReviews, the NHGRI GWAS Catalog, or OMIM. To the left of each named phenotype is a box containing a +/- which allows you to open/close the display to see more information. If there is a GeneReview related to the named phenotype, you can follow a link to the full GeneReview, or open a display of a summary of data extracted from the GeneReview's abstract in PubMed. If there are data for a phenotype in the NHGRI GWAS Catalog, you can navigate to either PubMed or the NHGRI site. Similarly, when there are data in OMIM, you can follow a link there.

Entrez Gene is now just plain Gene

You may have noticed that our home page is no longer entitled Entrez Gene. The term Entrez was removed to make the database name consistent with the URL and other named links within NCBI.  We are still in the process of replacing 'Entrez Gene' with 'Gene' in our documents, so the transition is not complete. And of course, we will still respond if called by 'Entrez Gene'.

CCDS mouse update is public in Gene

The Consensus Coding Sequence (CCDS) update for mouse build MGSCv37.2 was released earlier this week, and these updates are now reflected in Gene as well.  This update includes the addition of 4,561 new CCDS IDs and adds 2,685 Genes into the mouse CCDS set. Mouse build 37.2 includes a total of 22,187 CCDS IDs that correspond to about 19,500 GeneIDs.

For information about CCDS, please visit: http://www.ncbi.nlm.nih.gov/projects/CCDS 

Changes in Gene-Ensembl Matching

Changes are in store for matches between RefSeqs and Ensembl annotation. Matches are currently calculated by matching RefSeqs to all possible good matches from Ensembl. Soon the calculation will be limited to only the best match. This will result in a more concise and accurate report. The overall number of reported matches for all species will decrease by about 70%.

This change will affect the Full Report display in Gene and the gene2ensembl FTP file.

Transitioning to RSS feeds for announcements

Starting in early 2011, Entrez Gene announcements will be distributed via the Entrez Gene News RSS (Really Simple Syndication) feed instead of the gene-announce mail-list. RSS is a Web standard for sharing and distributing news and other frequently updated content provided by Web sites.

RSS feeds may be read using a web browser, an e-mail client application or other applications. For information on how to receive information from this RSS feed, and for a list of NCBI RSS feeds (including Entrez Gene News), please see: http://www.ncbi.nlm.nih.gov/feed/styles/help.html

The URL for the Entrez Gene News RSS feed is: http://www.ncbi.nlm.nih.gov/feed/rss.cgi?ChanKey=GeneNews

The archives of the gene-announce mail-list are still available.

A second announcement will be sent when the transition is complete.

Enhancements to Entrez Gene for genomic displays

Entrez Gene is announcing two enhancements to our web content.

1. Selection of genomic sequence for display

The Genomic Regions, Transcripts and Products section of the Entrez Gene full report display and the Gene Table display have both been enhanced to allow selection of the genomic sequence to display.  The selection will include all genomic placements for the gene including chromosomes and scaffolds on reference and alternate assemblies, according to each gene's current genome build annotation.

2. Genomic Context diagram

The Genomic Context section in the Entrez Gene full report display has been enhanced to show multiple diagrams in cases of pseudoautosomal genes and other genes that are placed on multiple chromosomes.  For example:

    http://www.ncbi.nlm.nih.gov/gene/6473

The diagram will show the gene's placement on reference chromosomes, if annotated there.  Otherwise, the diagram will show another genomic placement, in this order:  reference contig; reference genomic region (NG); alternate chromosome; contig of an alternate assembly.

Recent changes to Entrez Gene: October 14, 2010

We released several bug fixes and modifications to the Entrez Gene web site today.

  1. The table in the GeneTable display option (e.g. http://www.ncbi.nlm.nih.gov/gene/4204/?report=gene_table) now correctly puts the first intronic coordinates in the intron column.
  2. GeneTable can now be reported in a text format:  Display Setting -> GeneTable (text)
  3. The Conserved Domains summary link to display the conserved domains in a RefSeq protein is fixed.
  4. The display of links to related information from LinkOut providers is now 'open' by default, e.g. http://www.ncbi.nlm.nih.gov/gene/5076/#Additional-links.  We hope this facilitates access to the resources which have registered as LinkOut providers to Entrez Gene.
  5. The source of GeneOntology annotation is now reported correctly.  We regret, for example, our failure to credit GeneDB for their processing of GO annotation for Schizosaccharomyces pombe.

We thank those who took the time to point out these problems, and we regret any inconvenience they may have caused.

New Properties for Discontinued and Replaced Records

For some time now, it has been possible to identify the subset of genes that are current and primary by using the "alive" property. Genes that are not alive can be categorized as either replaced or discontinued. Entrez Gene is now indexing these categories as well.

With regard to the status of a gene record, the three relevant properties are defined as follows:

alive The record is current and primary, i.e., not secondary or discontinued. (The term secondary is applied to any record that has been merged into another. This occurs most often when multiple genes are defined based on incomplete data, and these are later discovered to be parts of the same gene. One gene record then becomes secondary to the other.)
replaced The record is no longer current because it has been made secondary to another gene record.
discontinued The record is no longer current, and it has not been made secondary to any other gene record.

Note that every gene record falls into one, and only one, of these three categories.
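
The three definitions amount to a simple decision rule, sketched below. The function and its two inputs (a flag for "current and primary" and an optional replacement GeneID) are hypothetical names chosen for illustration; they are not fields of any Entrez Gene record format.

```python
def record_status(is_current, replaced_by=None):
    """Classify a gene record as 'alive', 'replaced', or 'discontinued'.

    is_current  -- True if the record is current and primary
    replaced_by -- GeneID of the record this one was merged into, or None
    """
    if is_current and replaced_by is not None:
        # A current, primary record cannot also be secondary to another record.
        raise ValueError("a current record cannot have a replacement")
    if is_current:
        return "alive"
    if replaced_by is not None:
        return "replaced"
    return "discontinued"


# Hypothetical records: (is_current, replaced_by)
for record in [(True, None), (False, 12345), (False, None)]:
    print(record, record_status(*record))
```

Because the branches are exhaustive and mutually exclusive, each valid input lands in exactly one category, which matches the note above.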

More Matches to Ensembl Annotation

Entrez Gene is pleased to announce that Drosophila melanogaster (fruit fly) has been added to the list of organisms for which Ensembl annotation matches are being identified. As before, matches between NCBI annotation and Ensembl annotation can be found: in the Reference Sequences section of the Full Report display; using the "matches Ensembl" index property; and in the gene2ensembl FTP file.

A summary of species whose annotations have been compared, including release and assembly information, and the date when the comparison was last performed, can be found at:

ftp://ftp.ncbi.nih.gov/gene/DATA/README_ensembl

 

Increase in number of records for Xenopus (Silurana) tropicalis (western clawed frog, taxonomy id 8364) and Ailuropoda melanoleuca (giant panda, taxonomy id 9646)

Recently, NCBI's genome annotation pipeline released its annotation on RefSeqs of Xenopus (Silurana) tropicalis and Ailuropoda melanoleuca (giant panda).

http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Link&LinkName=genomeprj_nuccore&from_uid=43581
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Link&LinkName=genomeprj_nuccore&from_uid=48353

The genes annotated on these RefSeqs are now included in the species-specific displays from Entrez Gene.

http://www.ncbi.nlm.nih.gov/gene/?term=txid8364[orgn]

http://www.ncbi.nlm.nih.gov/gene/?term=txid9646[orgn]


HINT: The word 'panda' is not sufficient to retrieve data from Gene by an organism term.

      panda[organism] will not work.
      giant panda[orgn] will work.

There are too many species that include the word panda as either the genus or part of the common name.

Restructuring of the Entrez Gene web display, with a new GeneRIF display type

Entrez Gene released a new web display today, August 16, 2010.  This change does not affect the ftp site or tools to retrieve data from eutils.

Highlights of this new display:

  1. Added functions to open/close each section of the display. The triangles at the left side of the gray bar that separates sections provide the control.
  2. A new interface (labeled Display settings) to change the display, including sort options when query results are returned.
  3. A new interface to redirect results to files, clipboards, or collections (labeled Send to).  If File is selected, a menu is provided to define the format of that file.
  4. The Genomic regions, transcripts, and products section is now implemented by embedding a display similar to that from the Graphic display option of the Nucleotide database. The full display is still accessible, by clicking on the link now labeled as 'Go to nucleotide graphics'.
  5. The Bibliography section now includes a display of up to 5 PubMed citations (title, first author, journal, date, PubMed id), with a link indicating the total count that connects to PubMed.

    The GeneRIF section now displays up to 10 GeneRIFs. If there are more, the total count is displayed as part of an anchor of a link to a comprehensive report of all GeneRIFs for the gene. The full GeneRIF report is now also available as a display option for a single gene record.
  6. Each table within the report now allows paging of results.
  7. The Gene Table display has been completely reworked, allowing selection of the report specific to any genomic RefSeq, with the default report selected according to priority RefSeqGene (human only) > Reference assembly > alternate assembly.


Updates to Gene's help and FAQ documents, which will provide more details about these new displays, will be released shortly.

New FTP file for Gene-Ensembl matching

Entrez Gene calculates matches between NCBI and Ensembl annotation, and reports these matches in several ways, including: the Full Report display; the "matches Ensembl" index property; and the gene2ensembl FTP file.  A new FTP file, named README_ensembl, is being added to provide a summary of species whose annotations have been compared, including release and assembly information, and the date when the comparison was last performed.

The complete description of this file can be found at the FTP site, in:

    ftp://ftp.ncbi.nih.gov/gene/DATA/README

Ordering of the report of exon locations in Entrez Gene

As you may have noticed, the order of reporting exon locations in Entrez Gene's ASN.1 and XML reports is in a transition phase.

NCBI's new standard is to report exon location in exon order, i.e. first exon 1, then exon 2, etc.  For genes annotated on the minus strand, this means that the location of the first exon will have a numerical position greater than that of the second exon, etc. This differs from previous reporting in which locations were ordered by sequence position, so that on the minus strand, the last exon was reported first. As genomes are re-annotated, the newer representation will be used and reporting of exons in sequence order rather than exon order will be deprecated.

For each exon, the range will continue to be reported according to the standard of seq-interval_from  < seq_interval_to.
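
As an illustration of the difference between the two orderings, the sketch below sorts exon intervals into exon order for either strand. The coordinates and the function name are made up for the example. Each interval keeps from < to, as described above.

```python
def exons_in_exon_order(exons, strand):
    """Sort (start, end) intervals so that exon 1 comes first.

    Each interval keeps start < end regardless of strand; only the order of
    the intervals changes.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand, exon 1 has the highest coordinates.
    return sorted(exons, key=lambda iv: iv[0], reverse=(strand == "-"))


# Hypothetical coordinates for a three-exon gene, listed in sequence order.
exons = [(100, 200), (300, 400), (500, 600)]
print(exons_in_exon_order(exons, "+"))
print(exons_in_exon_order(exons, "-"))
```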

More Links to RefSeqs

Links from Gene to Reference Sequences in Entrez Protein and Entrez Nucleotide have been added. These links represent the subset of protein and nucleotide links that are RefSeqs.

For genes with RefSeqs, these links will appear in the search results and the Full Report display as:

  • RefSeq Proteins
  • RefSeq RNAs
  • RefSeqGene

 

Changes in Gene-Ensembl matching

A change has been made in the way that matches between NCBI and Ensembl annotation are identified. Originally, matches for human and mouse were identified based directly on representation in the Consensus Coding Sequence (CCDS) project, while all other organisms were handled based on a somewhat different set of rules. These rules are now going to be applied for human and mouse as well, so there will be a single, consistent set of rules for all organisms. The set of matches will also be updated to be more comprehensive in certain cases of genes with transcript variants.

These changes will affect the gene2ensembl FTP file, the Full Report view in Entrez Gene, as well as the set of genes that are assigned the 'matches Ensembl' property in Entrez Gene.

Names of genes and proteins

Entrez Gene is announcing the following modifications to processing of names assigned to genes and proteins:

1. The use of 'similar to' is being replaced with '-like'

RefSeq and Entrez Gene have generated gene and protein names beginning with 'similar to' to indicate predicted genes and proteins that show a sequence relationship to another gene or protein.  This practice is being replaced with adding '-like' at the end of the name assigned to the matched sequence. Annotations based on this practice will be released late in March or early in April, 2010.

2. Increased consistency with Swiss-Prot names

RefSeq and Entrez Gene are working more closely with UniProtKB in naming proteins.  A subset of proteins was renamed within the last month based on the connection of a Swiss-Prot record with a record in Entrez Gene.

3. Selection of preferred symbols

Entrez Gene is in the process of altering the representation of gene symbols for those taxa that do not have official nomenclature committees and have sequence records with locus_tags. Currently, the value of locus_tag has precedence over any unofficial symbol assigned by the RefSeq group.  In the future, the preferred RefSeq symbol will have precedence over locus_tag.  Whether or not a symbol is official will continue to be represented on the full Gene Report, and in the gene_info file on gene's FTP site.

New Properties rnatype_* Similar To genetype_*

For some time now, the type of gene has been indexed with a genetype property such as "genetype protein coding [properties]".  Entrez Gene is now indexing rna types as well, so you can find genes by rna type, such as "rnatype mRNA [properties]".  The current list of rnatype properties is:

    rnatype mirna
    rnatype miscrna
    rnatype mrna
    rnatype ncrna
    rnatype other
    rnatype other genetic
    rnatype rrna
    rnatype scrna
    rnatype snorna
    rnatype snrna
    rnatype trna

Links Between RefSeqs and Ensembl

Entrez Gene is now calculating matches between NCBI and Ensembl annotation based on comparison of rna and protein features.

For organisms that are represented in the Consensus Coding Sequence (CCDS) project (i.e., human and mouse), the set of matches includes all protein sequences in CCDS and their corresponding mRNAs.

For all other organisms, matches are collected as follows. For a protein to be identified as a match between RefSeq and Ensembl, there must be at least 80% overlap between the two. Furthermore, splice site matches must meet certain conditions: either 60% or more of the splice sites must match, or there may be at most one splice site mismatch.

For rna features, the matching criteria are the same as for proteins above.
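
The thresholds above can be expressed as a small predicate. This is a sketch under stated assumptions: the announcement does not say how overlap is measured or how splice sites are counted, so the function takes those numbers as given, and its name and parameters are hypothetical.

```python
def is_ensembl_match(overlap, splice_sites_total, splice_sites_matching):
    """Apply the stated thresholds to one RefSeq/Ensembl feature pair.

    overlap -- fraction (0.0 to 1.0) of overlap between the two features;
               how this fraction is computed is not specified in the source.
    """
    if overlap < 0.80:
        return False
    mismatches = splice_sites_total - splice_sites_matching
    # At most one splice site mismatch is always acceptable.
    if mismatches <= 1:
        return True
    # Otherwise at least 60% of the splice sites must match.
    return splice_sites_matching / splice_sites_total >= 0.60


print(is_ensembl_match(0.90, 10, 6))
print(is_ensembl_match(0.90, 10, 5))
```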

This data can be accessed in a number of different ways.

First, in the Full Report view in Entrez Gene, matching Ensembl transcripts and proteins are listed with the RefSeqs in the NCBI Reference Sequences section next to the label Related Ensembl, with links to the Ensembl web site for the associated transcripts and proteins. Links to Ensembl genes will continue to be reported in the Summary section at the top of the Full Report.

Second, genes with matching Ensembl annotation can be found using a new property named "matches Ensembl". For example, to find all genes with Ensembl matches, use:

  • matches Ensembl [properties]

Third, Ensembl matches are provided on our FTP site in a new file named gene2ensembl.gz. This file is described in ftp://ftp.ncbi.nih.gov/gene/README.
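
A minimal sketch for reading the file is shown below. It assumes the file is tab-delimited with a single header line that begins with '#'. The layout is described in the README, so the column names used in the sample are placeholders rather than a statement of the real format. The local path in open_gene2ensembl is hypothetical.

```python
import csv
import gzip
import io


def read_gene2ensembl(handle):
    """Yield one dict per row, keyed by the names found in the header line."""
    header = handle.readline().lstrip("#").split()
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t")
    for row in reader:
        yield row


def open_gene2ensembl(path="gene2ensembl.gz"):
    """Open a downloaded copy in text mode (the path is hypothetical)."""
    return gzip.open(path, "rt")


# Placeholder sample standing in for the real file; names and values are made up.
sample = io.StringIO(
    "#tax_id\tGeneID\tEnsembl_gene_identifier\n"
    "0000\t1111\tENSG_PLACEHOLDER\n"
)
rows = list(read_gene2ensembl(sample))
print(rows[0]["GeneID"])
```

Reading the column names from the file itself means the code does not need to be changed if columns are added later.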

 


http://vx.22web.org/6/z/c for Maria Jose Cristerna: you are the most beautiful woman, far more than beautiful. Heaven misses you; may God bless you for all eternity. I understand your pain, and I respect your heart; you make BME sexy. ^.^ ImgUpload.org