配列Aから配列Bの要素を取り除いた配列Cを作る方法

面白そうなネタを見つけたので、便乗してみた。

1
my @update = grep { my $a = 1; foreach my $b (@key) { $a = 0 if $_ eq $b; } $a; } @items;
1
2
my %key = map { $_ => 1 } @key;
my @update = grep { not exists $key{$_} } @items;

上の二つでは、ハッシュを使うほうが段違いに速い。

その傾向は大きい配列になるほど顕著になる。

でも、配列が大きくなると、mapの部分が結構時間がかかるようなので、ハッシュスライスにしてみた。

1
2
3
my %hash;
undef @hash{@key};
my @update = grep { not exists $hash{$_} } @items;

その結果、さらに速くなった。

しかも、配列が大きくなればなるほど効果も上がっている。

やはり、スライスは使えるテクニックだ。

参考ベンチの結果とソースは以下のとおり。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(:all);
use Data::Dumper::Names;
$Data::Dumper::Indent = 1;
sub p (@) { print Dumper(@_) }
use List::Util qw(shuffle);
use Perl6::Say;
our @items = shuffle("a" .. "z");
our @key   = shuffle("f" .. "zz");
say join ' ', '@items = qw(', @items, ');';
say join ' ', '@key   = qw(', @key,   ');';
cmpthese(
    timethese(
        0,
        {
            Linear => sub {
                my @update = grep {
                    my $a = 1;
                    foreach my $b (@key) { $a = 0 if $_ eq $b; }
                    $a;
                } @items;
            },
            Hash => sub {
                my %key = map { $_ => 1 } @key;
                my @update = grep { not exists $key{$_} } @items;
            },
            Hash2a => sub {
                my %hash;
                @hash{@key} = (1) x @key;
                my @update = grep { not exists $hash{$_} } @items;
            },
            Hash2b => sub {
                my %hash;
                @hash{@key} = (1) x @key;
                my @update = grep { !$hash{$_} } @items;
            },
            Hash3 => sub {
                my %hash;
                undef @hash{@key};
                my @update = grep { not exists $hash{$_} } @items;
            },
        }
    )
);
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
@items = qw( b t m e d k w r c s j z n v h g l q o y a u f x i p );
@key   = qw( b c d a f e );
Benchmark: running Hash, Hash2a, Hash2b, Hash3, Linear for at least 3 CPU seconds...
Hash:  3 wallclock secs ( 3.22 usr +  0.00 sys =  3.22 CPU) @ 30791.86/s (n=99119)
Hash2a:  3 wallclock secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 38859.88/s (n=121476)
Hash2b:  3 wallclock secs ( 3.16 usr +  0.00 sys =  3.16 CPU) @ 38478.30/s (n=121476)
Hash3:  3 wallclock secs ( 3.19 usr +  0.00 sys =  3.19 CPU) @ 40126.10/s (n=127922)
Linear:  3 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @ 14515.13/s (n=46042)
Rate Linear   Hash Hash2b Hash2a  Hash3
Linear 14515/s     --   -53%   -62%   -63%   -64%
Hash   30792/s   112%     --   -20%   -21%   -23%
Hash2b 38478/s   165%    25%     --    -1%    -4%
Hash2a 38860/s   168%    26%     1%     --    -3%
Hash3  40126/s   176%    30%     4%     3%     --
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
@items = qw( y f n m b k l x w e p t a z o r q h u g j i s c d v );
@key   = qw( bs uy we bq kg cn vq ap ue zr ot hh u ss cq s mv bf qi at ly yv sq hf jw ph yd rb yw wm he br lz lg kd uf ik lh hj rw gs ox xj qo tf ky kf lt pe zg lw ac er el ao sg jd v gl mz my sj sl xa ws lk iu cr xz gy bm ui k cj h yt ez gz rh oh ep gx de md aw gv tn vm hu mm ti m pk ry vi cb rg mt zv uq ew qw no cm wz nk gd nl ut ng vr vf rd au w rt vo pc fu ix xk xw mq px wu ef bk kb jp z li tg ka bp fw sn nr mj af vx vc ff zk zu o ni vy rv dr kr bj zi cw tb fb su fx fd tm zy ob vn iw uc pi xs bu fe qf wv zc uk hz do qg mb wx kp op dk zm aj kc oe ip cx ey jy pr wo bc sx um fl if fz jq yu in tv lb mw rf mx ha nx zd ea vl xb wt gp kh rx gr te py fj uv zx fg eb yb im ge qr ed bx ad aa hx wd lq ii du sp mi yl ud ru ie nj wf qk jt sa ee gi ij fm yz wb je lv pq wa is nt ej st qs tk pn it xh uu y ro gu gt sc pa sh bh vs et cs dh ne rj ku ze wl mp rn pd ai oq tj xu ga yy us un gc ck hs xr bg r ch vu oy to fa tu va oa td eu hg ki rp tc fy gk nn q qm eq dw yc za nm jr zf dq ct uj nz ft pz ba ym nq ub ib mu cd aq so bw eg i tz fo wk qu rq wh kw bb hr xy qd lu ln vg lf dn iq p dj zn zq pv bl tt or pg sd qp az mo uz kx xt an es fc ug xf jb kz pj nv ca ya mr ml pp fk hp al rl sf xm ae ig wj sr ei hq wg io nd kv ce as qe tl tp vp n sk tq x cc rs ev hb sw nb dm mk ko lj jk xx da ds qa wn vt js zh ar sv ma kl ul pl en re be xi wq xp os wr yf hk jg lc lr cz di ci hv od ux yp em iy cp qt vd bi bv am qq gq ps om vk qz np ov ho oz kj oo tx xl gf gh bt up hc dz po ic oi ou yn ht hd se ir mc ld fh la ec lx ia pt rk ol yr dl rr id cu dd dv go xo rc ay nu wi og il kn ra rz cl iv by yx ek xv qx f nf gw bz fv kk cv yo ql ag ke jo jl me ye jj qh ty fq yk ks sy zs ak sb kq yq ns mn hw vb jx vw rm mh ms ur vh gg on jz hl oc pm co vz fp jf ow tr av hm df cf gn ja ta ax qc fi l ri dy ts gm le of ah t nw sz ua jm zz ls fn xn jc zj zo ju ll qv dg cg lp nh g dc wc eo xg ji fr jh pb ys dp ok xq zw wy yg uo wp pf mg uw xe zl tw vv ww bd th gb iz yi bo pw ab yh bn hy oj zp lo gj zb xc jn ih si qj j lm hn fs zt sm pu db xd jv cy nc na ve mf hi dt ex uh qy kt eh vj qb yj dx km ny qn );
Benchmark: running Hash, Hash2a, Hash2b, Hash3, Linear for at least 3 CPU seconds...
Hash:  3 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @ 681.90/s (n=2163)
Hash2a:  3 wallclock secs ( 3.26 usr +  0.00 sys =  3.26 CPU) @ 1994.49/s (n=6512)
Hash2b:  3 wallclock secs ( 3.25 usr +  0.00 sys =  3.25 CPU) @ 1997.23/s (n=6491)
Hash3:  3 wallclock secs ( 3.23 usr +  0.00 sys =  3.23 CPU) @ 2299.63/s (n=7437)
Linear:  3 wallclock secs ( 3.17 usr +  0.00 sys =  3.17 CPU) @ 557.06/s (n=1767)
Rate Linear   Hash Hash2a Hash2b  Hash3
Linear  557/s     --   -18%   -72%   -72%   -76%
Hash    682/s    22%     --   -66%   -66%   -70%
Hash2a 1994/s   258%   192%     --    -0%   -13%
Hash2b 1997/s   259%   193%     0%     --   -13%
Hash3  2300/s   313%   237%    15%    15%     --
comments powered by Disqus
Built with Hugo
Theme Stack designed by Jimmy