Mastering the SQL-like Hadoop Hive: Hive, a Hadoop Guide for RDB Users (Part 2) (page 1 of 3)
In the previous article we built a Hadoop environment on a local machine, installed Hive, and checked the basic operations. This time we use the same environment to look at more practical data operations.
Using partitions
This time, let's define a slightly more elaborate table.
Postal code data is updated every month, so we want to be able to specify a version when referring to the table. In Hive, partitions are used for this kind of case.
Below we define a table "zip" that stores postal codes, and give it a partition ver of the date type DATE.
hive> CREATE TABLE zip (zip STRING, pref INT, city STRING, town STRING)
    > PARTITIONED BY (ver DATE)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n';
OK
Time taken: 0.128 seconds
Next, we load two files into the table zip: zip20081128.csv, the postal code data as of November 28, 2008, and zip20081226.csv, the data as of December 26, 2008.
hive> LOAD DATA LOCAL INPATH '/home/hiveuser/localfiles/zip20081128.csv'
    > OVERWRITE INTO TABLE zip PARTITION (ver = '2008-11-28');
Copying data from file:/home/hiveuser/localfiles/zip20081128.csv
Loading data to table zip partition {ver=2008-11-28}
OK
Time taken: 1.582 seconds
hive> LOAD DATA LOCAL INPATH '/home/hiveuser/localfiles/zip20081226.csv'
    > OVERWRITE INTO TABLE zip PARTITION (ver = '2008-12-26');
Copying data from file:/home/hiveuser/localfiles/zip20081226.csv
Loading data to table zip partition {ver=2008-12-26}
OK
Time taken: 1.788 seconds
Let's check whether the data was really loaded. An ordinary SELECT statement like the ones used earlier would do, but here we look at how the data is stored on Hadoop's HDFS.
We use the Hadoop command to list everything under the HDFS directory "/user/hive/warehouse". Hive stores its data in this directory.
hiveuser> $HADOOP_HOME/bin/hadoop dfs -lsr /user/hive/warehouse
drwxr-xr-x   - hiveuser supergroup          0 2009-01-01 12:22 /user/hive/warehouse/pref
-rw-r--r--   3 hiveuser supergroup        611 2009-01-01 12:21 /user/hive/warehouse/pref/pref.csv
drwxr-xr-x   - hiveuser supergroup          0 2009-01-01 12:24 /user/hive/warehouse/zip
drwxr-xr-x   - hiveuser supergroup          0 2009-01-01 12:24 /user/hive/warehouse/zip/ver=2008-11-28
-rw-r--r--   3 hiveuser supergroup    4541673 2009-01-01 12:24 /user/hive/warehouse/zip/ver=2008-11-28/zip20081128.csv
drwxr-xr-x   - hiveuser supergroup          0 2009-01-01 12:24 /user/hive/warehouse/zip/ver=2008-12-26
-rw-r--r--   3 hiveuser supergroup    4541979 2009-01-01 12:24 /user/hive/warehouse/zip/ver=2008-12-26/zip20081226.csv
The listing shows that a directory with the same name as each table has been created and that the loaded files sit underneath it. For the table with a partition, a subdirectory has been created for each partition value.
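The directory layout in the listing can be summarized in a few lines of Python. This is only a sketch of the naming pattern visible above, not Hive's own code: the function name partition_path is hypothetical, and the warehouse root is taken from the listing.

```python
import posixpath

# Warehouse directory as it appears in the listing above.
WAREHOUSE_ROOT = "/user/hive/warehouse"

def partition_path(table, partition, filename):
    """Return the HDFS path a loaded file appears under (hypothetical helper).

    `partition` maps partition column names to values; each pair becomes
    one "name=value" subdirectory below the table directory.
    """
    subdirs = ["{}={}".format(name, value) for name, value in partition.items()]
    return posixpath.join(WAREHOUSE_ROOT, table, *subdirs, filename)

# Partitioned table: one subdirectory per partition value.
print(partition_path("zip", {"ver": "2008-11-28"}, "zip20081128.csv"))
# Unpartitioned table: the file sits directly in the table directory.
print(partition_path("pref", {}, "pref.csv"))
```

Both printed paths match entries in the listing, which is a quick way to confirm that the partition value given in LOAD DATA ended up in the directory name.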
We skip the details here, but the state of HDFS can also be viewed in a Web browser.
Open "localhost:50070" in a Web browser and click "Node = localhost" under "Live Datanodes" to display the HDFS root directory.
Column: About MapReduce operations
Hive can drive MapReduce, one of Hadoop's core features. Hive queries are themselves executed as MapReduce jobs, but here we explain how to run a MapReduce script from Hive. The MapReduce syntax is described in the Hive documentation.
Here we run a Map operation on the table pref that removes the character 県 ("prefecture") from values such as 青森県 (Aomori) and 岩手県 (Iwate) in the pref column and inserts the result into a new table, pref_new.
First, define the table pref_new.
hive> CREATE TABLE pref_new (id int, pref STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n';
Next we run the Map operation. Let's start with a simplified version.
hive> FROM (
    >   FROM pref
    >   SELECT TRANSFORM(pref.id, pref.pref) AS (oid, opref)
    >   USING '/bin/sed s/県//'
    >   CLUSTER BY oid
    > ) tmap
    > INSERT OVERWRITE TABLE pref_new SELECT tmap.oid, tmap.opref;
The USING clause, which names the Map script, contains nothing more than the UNIX command sed. When this statement runs, the table is passed to the sed command one row at a time, the string 県 is removed, and the result is handed to the INSERT clause.
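The row-by-row flow described above can be imitated locally. The following is a rough Python model, not Hive itself: run_transform is a hypothetical stand-in for TRANSFORM ... USING, the ids in the sample rows are illustrative, and CLUSTER BY is not modeled.

```python
import re

def sed_remove_ken(line):
    """Rough equivalent of sed 's/県//': drop the first 県 anywhere in the line."""
    return re.sub("県", "", line, count=1)

def run_transform(rows, line_filter):
    """Hypothetical stand-in for TRANSFORM ... USING.

    Each row is written as one tab-separated line, passed through the
    filter, and split back into output columns.
    """
    result = []
    for row in rows:
        line = "\t".join(str(col) for col in row)
        result.append(tuple(line_filter(line).split("\t")))
    return result

rows = [(2, "青森県"), (3, "岩手県")]  # illustrative ids, not taken from pref.csv
print(run_transform(rows, sed_remove_ken))
# prints [('2', '青森'), ('3', '岩手')]
```

The script only ever sees text lines, so every column comes back as a string; in the real statement the AS clause names the output columns.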
The sed command applies its substitution to the whole line, that is, to every column of the row, so it can also alter data in columns other than the intended one and is not a general solution. Let's achieve the same effect with a perl script that transforms only the target column.
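#!/usr/bin/perl
# Strip a trailing 県 from the second tab-separated column of each input line.
while (<>) {
    chomp;                          # remove the trailing newline only
    my (@w) = split(/\t/, $_);      # columns arrive separated by tabs
    $w[1] =~ s/県$//;               # modify the pref column only
    printf("%d\t%s\n", $w[0], $w[1]);
}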
hiveuser> chmod +x /home/hiveuser/test.pl
Note that the delimiter is "\t". The usual delimiter for Hive data stored on HDFS is "^A", but when Hive passes rows through a script and writes the result to a table, the fields appear to be separated by "\t".
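To make the difference between the two approaches concrete, here is a small Python sketch that can be run without Hive. It assumes tab-separated input as noted above; strip_ken_column mirrors the intent of the perl script but is not a port of it, and the sample row containing 県 in its first column is hypothetical.

```python
import re

def sed_like(line):
    """Line-wide substitution, like sed 's/県//': first 県 in the whole line."""
    return re.sub("県", "", line, count=1)

def strip_ken_column(line):
    """Column-targeted version: remove a trailing 県 from the second column only."""
    cols = line.rstrip("\n").split("\t")
    if cols[1].endswith("県"):
        cols[1] = cols[1][:-1]
    return "\t".join(cols)

# Hypothetical row whose first column also contains 県.
line = "県1\t青森県"
print(repr(sed_like(line)))          # prints '1\t青森県'  (wrong column changed)
print(repr(strip_ken_column(line)))  # prints '県1\t青森'  (only the pref column changed)
```

Splitting on the delimiter first is what keeps the other columns untouched; the same check on sample lines is a cheap way to test a transform script before using it in a query.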
After that, run it in the same way as before.
hive> FROM (
    >   FROM pref
    >   SELECT TRANSFORM(pref.id, pref.pref) AS (oid, opref)
    >   USING '/home/hiveuser/test.pl'
    >   CLUSTER BY oid
    > ) tmap
    > INSERT OVERWRITE TABLE pref_new SELECT tmap.oid, tmap.opref;
The results in this article were verified only with the version it assumes (hadoop-0.19.1). If you try this with another version, check the documentation for that version separately.
Copyright © ITmedia, Inc. All Rights Reserved.