Please can someone look over part 2 and explain to me how to do it.
The Question

Recall that the Huffman algorithm to construct the tree was to find the two least frequent characters (suppose they occurred $n_1$ and $n_2$ times respectively), draw edges from each of them to a new parent symbol, and then treat the parent as a new symbol that occurred $n_1 + n_2$ times. We repeated this process until all the symbols were combined into a single tree. Suppose there are $k$ symbols we have to represent, let $n_i$ be the number of times that the $i$th symbol occurs in the text, and let $l_i$ be the number of bits used to represent the $i$th symbol. Then the total number of bits used to represent all the copies of the $i$th symbol is $n_i \times l_i$, and so the total number of bits used to represent the entire text will be $(n_1 \times l_1) + (n_2 \times l_2) + \dots + (n_k \times l_k)$.

To find the optimal tree, instead of using the number of times $n_i$ that the $i$th symbol occurs (called the frequency of the symbol), we could use its relative frequency $f_i$, which is $n_i / n$, where $n$ is the total number of symbols in the text. (In our problem, $n$ is the total number of bases in the genome, and $f_1$ is the fraction of the bases that are of type "1".) The two symbols with the smallest frequency will be the two symbols with the smallest relative frequency (since we just divide everything by $n$), and it is easy to see that running the algorithm with the relative frequencies would give us the same tree as if we were to use the frequencies. The average number of bits used to represent a single text symbol will be $(f_1 \times l_1) + (f_2 \times l_2) + (f_3 \times l_3) + \dots + (f_k \times l_k)$. The total number of bits used will be $n$ times the average number of bits per symbol.
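The procedure and the two bit-count formulas above can be sketched in Python. This is only an illustrative sketch, not the assignment's reference solution: the function names `huffman_code_lengths` and `total_bits`, the tie-breaking order, the one-bit rule for a single-symbol alphabet, and the 1,000-base example counts are all assumptions made for the example.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length in bits} for a Huffman tree.

    `weights` maps each symbol to its count n_i or its relative
    frequency f_i; either choice leads to the same merges.
    """
    if len(weights) == 1:
        # Assumption: a lone symbol is still given a one-bit code.
        return {symbol: 1 for symbol in weights}

    tie_breaker = count()  # keeps heap entries comparable when weights tie
    # Each heap entry is (weight, tie-breaker, symbols under this node).
    heap = [(w, next(tie_breaker), [s]) for s, w in weights.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}

    while len(heap) > 1:
        # Take the two least frequent "symbols" and give them a new parent.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # every symbol under the parent moves one level deeper
        heapq.heappush(heap, (w1 + w2, next(tie_breaker), group1 + group2))
    return lengths


def total_bits(counts, lengths):
    """Sum of n_i * l_i over all symbols."""
    return sum(counts[s] * lengths[s] for s in counts)


# Hypothetical text of 1,000 bases (illustrative counts only).
counts = {"A": 960, "C": 20, "G": 10, "T": 10}
n = sum(counts.values())
freqs = {s: c / n for s, c in counts.items()}

lengths = huffman_code_lengths(counts)
average_bits = sum(freqs[s] * lengths[s] for s in freqs)

print(lengths)
print(total_bits(counts, lengths))  # sum of n_i * l_i
print(n * average_bits)             # n times the average; agrees up to rounding
```

Running the function on `freqs` instead of `counts` gives the same code lengths, which is the point made above about relative frequencies: dividing every weight by $n$ does not change which two symbols are smallest at each step.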
(You should check that the total number of bits using this formula is the same as the answer from the slightly different formula we used in the previous paragraph.)

Suppose the relative frequencies of the bases are 27% A, 26% C, 24% G, and 23% T. To construct the Huffman code, we would note that G and T have the lowest frequencies, so we would connect them together to form a "new symbol" with relative frequency 47%. (See the left side of Figure 1.)

Figure 1. Initial steps of Huffman encoding

Next, A and C have the two smallest frequencies (from among those remaining), so we connect them and form a new symbol with relative frequency 53%. (See the right side of Figure 1.) Now that we only have two "symbols" left, we connect them to form a tree, as below. Now that we have a complete tree, we can assign labels to the nodes, so that A is 00, C is 01, and so on.

Figure 2. Complete Huffman tree

Now suppose the relative frequencies are 96% A, 2% C, 1% G, and 1% T. If we were to construct the Huffman tree, we would start by combining the symbols G and T, since they have the lowest relative frequencies. This would give us the figure on the left below, with the new combined "symbol" having relative frequency 1% + 1% = 2%. Now, the two lowest remaining frequencies are for the new symbol and C, so we connect them, giving us the right side of Figure 3.

Figure 3. Partial development of second Huffman tree

Now we have only two "symbols" left; we connect them to form the tree below. We assign labels to the nodes, with 0 indicating that we move left, and 1 that we move right. So A gets the label 0, C gets the label 10, and so on.

Figure 4. Complete second Huffman tree

1. First, notice that there are only two really different Huffman-type trees that have 4 "leaves" (that is, that have 4 symbols to represent). The first tree looks like the one we got when all the symbols have nearly the same frequency (in this case, the tree looks perfectly balanced, and all the symbols have two-bit representations).
The second tree looks like t
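As a check on the two worked examples, the average-bits formula can be evaluated with the code lengths read off the two trees. The labels "balanced" and "skewed" are just names chosen here. The 3-bit lengths for G and T in the second tree are inferred from the tree shape, since the text gives only the labels for A and C explicitly.

```latex
\begin{align*}
\text{balanced tree:}\quad & 0.27 \cdot 2 + 0.26 \cdot 2 + 0.24 \cdot 2 + 0.23 \cdot 2 = 2.00 \text{ bits per base} \\
\text{skewed tree:}\quad   & 0.96 \cdot 1 + 0.02 \cdot 2 + 0.01 \cdot 3 + 0.01 \cdot 3 = 1.06 \text{ bits per base}
\end{align*}
```

On the 96% / 2% / 1% / 1% distribution, the second tree therefore needs a little over half as many bits per base as a fixed two-bit code.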