{"id":2228,"date":"2018-04-02T14:41:09","date_gmt":"2018-04-02T13:41:09","guid":{"rendered":"http:\/\/pcool.dyndns.org:8080\/statsbook\/?page_id=2228"},"modified":"2025-06-30T10:22:01","modified_gmt":"2025-06-30T09:22:01","slug":"dna-analysis","status":"publish","type":"page","link":"https:\/\/pcool.dyndns.org\/index.php\/dna-analysis\/","title":{"rendered":"DNA Analysis"},"content":{"rendered":"\n<p><strong>Bioinformatics<\/strong><\/p>\n\n\n\n<p>Bioinformatics is a challenging field in computer science and data analysis. The human genome contains an enormous amount of information and is the subject of much research. The human genome sequence was published in 2001 and contains an estimated 20,000 protein-coding genes. Protein-coding sequences make up only a small proportion of the total DNA (approximately 2%). The human genome is approximately&nbsp;<span class=\"nowrap\">3,235<\/span><span class=\"nowrap\">&nbsp;Mb (megabase pairs) per haploid genome and&nbsp;<\/span><span class=\"nowrap\">6,450&nbsp;Mb in total (diploid).<\/span><\/p>\n\n\n\n<p>Information on specific genes and entire genomes can be downloaded from the&nbsp;<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/\" target=\"_blank\" rel=\"noopener\">National Center for Biotechnology Information<\/a> (NCBI) website, which also provides comprehensive guidance on using its databases.<\/p>\n\n\n\n<p>Genetic sequence information can be downloaded in several formats. 
These include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>FASTA file<\/strong>&nbsp;&#8211; a simple text-based format containing nucleotide or peptide sequences (<a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/ls_orchid.fasta\" target=\"_blank\" rel=\"noreferrer noopener\">example<\/a> &#8211; open with a text editor)<\/li>\n\n\n\n<li><strong>GenBank file<\/strong> &#8211; the NCBI&#8217;s standard format, which contains additional annotation such as references (<a href=\"https:\/\/pcool.dyndns.org:\/wp-content\/data_files\/ls_orchid.gb\" target=\"_blank\" rel=\"noreferrer noopener\">example<\/a> &#8211; open with a text editor)<\/li>\n<\/ul>\n\n\n\n<p>FASTA files are simple, easy to parse and a standard in bioinformatics; this book uses them throughout. Each sequence record starts with a header line beginning with a &#8220;&gt;&#8221; sign, followed by a description of the species and sequence; the sequence itself follows on the next lines. The FASTA files in this book open in a new browser window. To save the contents to a file, select all (Ctrl-A), copy (Ctrl-C) and paste (Ctrl-V) into a text editor. Then save the file with an appropriate name and the &#8220;.fasta&#8221; extension (make sure the editor does not add a &#8220;.txt&#8221; extension to the file name).<\/p>\n\n\n\n<p><strong>Installation of packages<\/strong><\/p>\n\n\n\n<p>The following packages need to be <a href=\"https:\/\/pcool.dyndns.org\/index.php\/packages\/\" data-type=\"page\" data-id=\"22\">installed<\/a>&nbsp;to work through the examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>seqinr<\/strong><sup class='sup-ref-note' id='note-zotero-ref-p2228-r1-o1'><a class='sup-ref-note' href='#zotero-ref-p2228-r1'>1<\/a><\/sup>\n<ul class=\"wp-block-list\">\n<li>Installation is easy and the same as for <a href=\"https:\/\/pcool.dyndns.org\/index.php\/packages\/\" data-type=\"page\" data-id=\"22\" target=\"_blank\" rel=\"noreferrer noopener\">any other R package<\/a>. 
In the R console, enter:<br><code>install.packages(\"seqinr\")<\/code><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Biostrings<\/strong><sup class='sup-ref-note' id='note-zotero-ref-p2228-r2-o1'><a class='sup-ref-note' href='#zotero-ref-p2228-r2'>2<\/a><\/sup> and <strong>pwalign<\/strong>\n<ul class=\"wp-block-list\">\n<li>These packages are distributed through Bioconductor (<a href=\"https:\/\/www.bioconductor.org\/\" target=\"_blank\" rel=\"noopener\">https:\/\/www.bioconductor.org\/<\/a>) and are installed differently from other R packages: install the <code>BiocManager<\/code> package from CRAN, then run <code>BiocManager::install(c(\"Biostrings\", \"pwalign\"))<\/code>. For details, please follow the instructions on&nbsp;<a href=\"https:\/\/www.bioconductor.org\/install\/\" target=\"_blank\" rel=\"noopener\">https:\/\/www.bioconductor.org\/install\/<\/a> and the <a href=\"https:\/\/pcool.dyndns.org\/index.php\/packages\/\" data-type=\"page\" data-id=\"22\">guidance on this website<\/a>.<br><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>Examples of gene analysis<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/pcool.dyndns.org\/index.php\/ladys-slipper-orchid\/\" data-type=\"page\" data-id=\"2239\">Lady&#8217;s slipper orchid (<i>Cypripedium calceolus<\/i>)<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"195\" height=\"258\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/LS_orchid.jpg\" alt=\"\" class=\"wp-image-3270\"\/><\/figure>\n\n\n\n<p><a href=\"https:\/\/pcool.dyndns.org\/index.php\/idh2-gene\/\" data-type=\"page\" data-id=\"2232\">Isocitrate dehydrogenase (IDH2) gene<\/a>, which codes for the mitochondrial enzyme, on chromosome 15 in humans (<em>Homo sapiens<\/em>) and orangutans (<em>Pongo abelii<\/em>).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"668\" src=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/orangutan-baby-1024x668.jpg\" alt=\"\" class=\"wp-image-3295\" srcset=\"https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/orangutan-baby-1024x668.jpg 1024w, 
https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/orangutan-baby-300x196.jpg 300w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/orangutan-baby-768x501.jpg 768w, https:\/\/pcool.dyndns.org\/wp-content\/uploads\/2025\/06\/orangutan-baby.jpg 1500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Bioinformatics Bioinformatics is a challenging field in computer science and data analysis. The human genome contains an enormous amount of information which is the subject of much research. The human genome was published in 2001 and there are an estimated 20,000 protein coding genes. The protein coding genes are only a small proportion of the total [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"inline_featured_image":false,"footnotes":""},"class_list":["post-2228","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/comments?post=2228"}],"version-history":[{"count":3,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2228\/revisions"}],"predecessor-version":[{"id":4594,"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/pages\/2228\/revisions\/4594"}],"wp:attachment":[{"href":"https:\/\/pcool.dyndns.org\/index.php\/wp-json\/wp\/v2\/media?parent=2228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}