

Research/Technical 2011/01/12 15:11

While working with an array I needed to remove some of its elements, and after some trial and error
I found that splice(), not just delete(), is useful for this.


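use strict;
use warnings;   # needed to see the 'uninitialized value' warning

my @temp = qw( 1 2 3 4 5 6 7 8 9 10 );
foreach my $each_idx ( 0..$#temp ) {
        # array contents before the deletion
        print $each_idx ."\t". join(" # ", @temp) ."\n";
        my $each_val = $temp[$each_idx];
        if ( $each_val % 3 == 0 ) {
                delete $temp[ $each_idx ];   # clears the value but keeps the slot
        }
        # array contents after the deletion
        print $each_idx ."\t". join(" @ ", @temp) ."\n";
}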

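This code puts the numbers 1 through 10 into @temp, walks over the elements, and deletes the multiples of 3.
When it runs with warnings enabled, the warning 'Use of uninitialized value in join or string at ~~' appears partway through.
The reason seems to be that delete() clears the value of the element but does not remove the slot itself;
it leaves the element looking as if undef() had been called on it. In other words, with '3' stored in $temp[2],
the result is much the same as running undef( $temp[2] ), so by the time print() tries to output the value, $temp[2] is
already uninitialized. The array only gets shorter when the deleted element is the last one.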
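The state that delete() leaves behind can be inspected directly. The following is a minimal sketch, not part of the original test; the variable names are made up for illustration. It checks the array length and uses exists() and defined() on the deleted slot.

```perl
use strict;
use warnings;

my @temp = (1 .. 5);

# Delete an element in the middle: the value goes away, the slot stays.
delete $temp[2];
my $len_after_middle = scalar @temp;
my $slot_exists      = exists  $temp[2] ? 1 : 0;
my $slot_defined     = defined $temp[2] ? 1 : 0;

# Delete the last element: here the array does get shorter.
delete $temp[4];
my $len_after_last = scalar @temp;

print "length after deleting a middle element: $len_after_middle\n";
print "exists: $slot_exists, defined: $slot_defined\n";
print "length after deleting the last element: $len_after_last\n";
```

The length stays at 5 after the first delete and drops to 4 only after the last element is deleted, which matches the warning seen when join() reaches the emptied slot.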

If 'delete $temp[ $each_idx ];' is replaced with 'splice @temp,$each_idx,1;', the result is different.
splice() removes the slot itself and then shifts the elements behind it down to fill the gap.
The page quoted below describes it as
 'while the splice() function removes the slot entirely and shifts the remainder of the array down to fill in the gap'
Because of this, unlike with delete(), no undef element is left at the deleted position, so printing the array with join() does not trigger the warning there.
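For comparison, here is a small sketch of what splice() does to the same kind of array. The variable names are illustrative only. splice() returns the removed elements, and the array becomes one element shorter.

```perl
use strict;
use warnings;

my @temp = (1 .. 5);

# Remove one element starting at index 2; later elements shift down.
my @removed = splice @temp, 2, 1;

print "removed: @removed\n";
print "left:    @temp\n";
print "length:  ", scalar(@temp), "\n";
```

After the call, what used to be $temp[3] is found at $temp[2], so no empty slot is left for join() to trip over.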

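One more point, about the foreach loop: the example above iterates over the list '0..$#temp'. This list is
built once, when the loop starts, and is not refreshed on each pass. If @temp becomes shorter or longer while the loop is running,
the remaining indices do not reflect the change, so with splice() the loop skips the element that slides into the freed slot and eventually indexes past the end of the array. It may seem obvious, but I was not sure, so it is worth testing.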
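The behaviour of the index range can be checked with a short sketch. As far as I know, Perl evaluates '0..$#temp' once, before the first pass, so the first part below counts the passes while the array shrinks. The second and third parts show two common ways to remove elements without relying on the range being updated: walking the indices in reverse, and building a new list with grep. All variable names are made up for this illustration.

```perl
use strict;
use warnings;

# (a) Count the passes while the array shrinks inside the loop body.
my @shrinking  = (1 .. 10);
my $iterations = 0;
foreach my $idx (0 .. $#shrinking) {
    $iterations++;
    pop @shrinking if @shrinking > 5;   # shorten the array, down to 5 elements
}
print "iterations: $iterations, final length: ", scalar(@shrinking), "\n";

# (b) Splice from the back, so the indices still to be visited never move.
my @temp = (1 .. 10);
foreach my $idx (reverse 0 .. $#temp) {
    splice @temp, $idx, 1 if $temp[$idx] % 3 == 0;
}
print "reverse splice: @temp\n";

# (c) Or skip index bookkeeping entirely and keep what is wanted.
my @kept = grep { $_ % 3 != 0 } 1 .. 10;
print "grep:           @kept\n";
```

Part (a) reports 10 passes even though only 5 elements remain, and parts (b) and (c) both leave 1 2 4 5 7 8 10.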

///////////
September 12, 2001, 11:00 PM —  ITworld — 
'Deleting' elements from an array might mean two things: deleting the value for a particular index (or indices) in the array (while still leaving the slot in the array open), or, actually removing a slot (and its contents) from the array. The first case can be accomplished with the delete() function and the second with the splice function.

my @array = (0,1,2,3,4,5,6);
delete $array[3];
print join(':', @array),"\n";
splice(@array, 3, 1);
print join(':', @array),"\n";
This snippet produces the following output:

Use of uninitialized value in join or string at - line 3.
0:1:2::4:5:6
0:1:2:4:5:6

You can see that the delete() function only deletes the value at index 3 in the array, while the splice() function removes the slot entirely and shifts the remainder of the array down to fill in the gap. The delete() function can also be used on an array slice as well as a single element. That slice need not be a contiguous range of elements:

# delete a range
delete @array[0..3];
# or a discontiguous slice
delete @array[0,3,5];

(rest omitted)

Other posts in the 'Research > Technical' category

Perl - delete() & splice()  (0) 2011/01/12
Bio::DB::Sam  (0) 2010/12/30
Bioperl Tutorial  (0) 2010/12/30
2010. 12. 30. Installing the SAMTools-related modules for Bioperl  (0) 2010/12/30
[R] Using Data in 'table' type  (0) 2010/09/10
[R] Sorting My Data by a Specific Column  (0) 2010/09/10
Posted by Snowyday
TAG perl
Research/Technical 2010/12/30 16:35



The high-level API

The high-level API provides a BioPerl-compatible interface to indexed BAM files. The BAM database is treated as a collection of Bio::SeqFeatureI features, and can be searched for features by name, location, type and combinations of feature tags such as whether the alignment is part of a mate-pair.

When opening a BAM database using the high-level API, you provide the pathnames of two files: the FASTA file that contains the reference genome sequence, and the BAM file that contains the query sequences and their alignments. If either of the two files needs to be indexed, the indexing will happen automatically. You can then query the database for alignment features by combinations of name, position, type, and feature tag.
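As a rough sketch of the description above: the helper below opens a BAM/FASTA pair and lists the alignments that overlap a region. The helper name list_alignments, the file paths and the sequence name are placeholders of mine, not from the documentation. The calls themselves (Bio::DB::Sam->new with -bam and -fasta, and get_features_by_location) follow the module's synopsis as I remember it, so check them against the installed version. The module is loaded only when the helper is called.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one tab-separated line per alignment.
sub list_alignments {
    my ($bam_path, $fasta_path, $seq_id, $start, $end) = @_;

    require Bio::DB::Sam;   # loaded lazily; needs the Bio-SamTools distribution

    # High-level API: BAM file plus the FASTA file of the reference.
    my $sam = Bio::DB::Sam->new(
        -bam   => $bam_path,
        -fasta => $fasta_path,
    );

    # Alignments overlapping the requested region of the reference.
    my @alignments = $sam->get_features_by_location(
        -seq_id => $seq_id,
        -start  => $start,
        -end    => $end,
    );

    return map {
        join("\t", $_->seq_id, $_->start, $_->end, $_->strand, $_->cigar_str)
    } @alignments;
}

# Example call, only when five arguments are given on the command line:
#   perl list_alignments.pl ex1.bam ex1.fa seq1 500 800
if (@ARGV == 5) {
    print "$_\n" for list_alignments(@ARGV);
}
```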

The high-level API provides access to up to four feature "types":
 * "match": The "raw" unpaired alignment between a read and the reference sequence.
 * "read_pair": Paired alignments; a single composite feature that contains two subfeatures for the alignments of each of the mates in a mate pair.
 * "coverage": A feature that spans a region of interest that contains numeric information on the coverage of reads across the region.
 * "region": A way of retrieving information about the reference sequence. Searching for features of type "region" will return a list of chromosomes or contigs in the reference sequence, rather than read alignments.
 * "chromosome": A synonym for "region".

.......

The main object classes that you will be dealing with in the high-level API are as follows:
 * Bio::DB::Sam -- A collection of alignments and reference sequences.
 * Bio::DB::Bam::Alignment -- The alignment between a query and the reference.
 * Bio::DB::Bam::Query -- An object corresponding to the query sequence in which both (+) and (-) strand alignments are shown in the reference (+) strand.
 * Bio::DB::Bam::Target -- An interface to the query sequence in which (-) strand alignments are shown in reverse complement.

You may encounter other classes as well. These include:
 * Bio::DB::Sam::Segment -- This corresponds to a region on the reference sequence.
 * Bio::DB::Sam::Constants -- This defines CIGAR symbol constants and flags.
 * Bio::DB::Bam::AlignWrapper -- An alignment helper object that adds split alignment functionality. See Bio::DB::Bam::Alignment for the documentation on using it.
 * Bio::DB::Bam::ReadIterator -- An iterator that mediates the one-feature-at-a-time retrieval mechanism.
 * Bio::DB::Bam::FetchIterator -- Another iterator for feature-at-a-time retrieval.

The low-level API

The low-level API closely mirrors that of the libbam library. It provides the ability to open TAM and BAM files, read and write to them, build indexes, and perform searches across them. There is less overhead to using the API because there is very little Perl memory management, but the functions are less convenient to use. Some operations, such as writing BAM files, are only available through the low-level API.

The classes you will be interacting with in the low-level API are as follows:
 * Bio::DB::Tam -- Methods that read and write TAM (text SAM) files.
 * Bio::DB::Bam -- Methods that read and write BAM (binary SAM) files.
 * Bio::DB::Bam::Header -- Methods for manipulating the BAM file header.
 * Bio::DB::Bam::Index -- Methods for retrieving data from indexed BAM files.
 * Bio::DB::Bam::Alignment -- Methods for manipulating alignment data.
 * Bio::DB::Bam::Pileup -- Methods for manipulating the pileup data structure.
 * Bio::DB::Sam::Fai -- Methods for creating and reading from indexed Fasta files.
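To give a feel for the low-level classes listed above, here is a hedged sketch that reads alignments one at a time from a BAM file. The helper name dump_alignments and the optional limit argument are my own. The method names (Bio::DB::Bam->open, header, target_name, read1, tid, pos, calend, cigar_str) are taken from the module's synopsis as I recall it and should be verified against the installed documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: returns up to $limit tab-separated alignment lines.
sub dump_alignments {
    my ($bam_path, $limit) = @_;

    require Bio::DB::Sam;   # the same distribution provides the Bio::DB::Bam classes

    my $bam    = Bio::DB::Bam->open($bam_path);
    my $header = $bam->header;
    my $names  = $header->target_name;   # array ref of reference sequence names

    my @rows;
    while (my $align = $bam->read1) {
        my $tid = $align->tid;
        next if $tid < 0;                 # skip reads with no reference assigned
        push @rows, join("\t",
            $names->[$tid],
            $align->pos + 1,              # pos is 0-based; report 1-based start
            $align->calend,
            $align->cigar_str,
        );
        last if defined $limit && @rows >= $limit;
    }
    return @rows;
}

# Example call, only when a BAM path is given on the command line:
#   perl dump_alignments.pl ex1.bam
if (@ARGV >= 1) {
    print "$_\n" for dump_alignments($ARGV[0], 10);
}
```

Compared with the high-level interface, the caller has to translate target ids to names and convert coordinates, which is the kind of inconvenience the text above refers to.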


Posted by Snowyday
Research/Technical 2010/12/30 15:59

* Bioperl Tutorial

Ah.. once I have gone through all of this, a great many tasks are going to get easier!
There is a lot of it, though.



Posted by Snowyday
TAG BioPerl