
About MapReduce

Why does a Hive query that selects a single column launch a MapReduce job?

死之徒

2016-01-24

Source: 走近大数据之Hive入门 (Approaching Big Data: Getting Started with Hive), section 4-1


2 Answers

vcfvct
2016-03-03

Not exactly.

Queries that select specific columns can actually run without an MR job. Since Hive 0.14 this behavior is enabled by default.

In earlier versions such as 0.13, you need to set the parameter 'hive.fetch.task.conversion' to 'more' in hive-site.xml:

<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to a single FETCH task, minimizing latency.
    Currently the query should be single-sourced, not having any subquery, and should not have
    any aggregations or distincts (which incur RS), lateral views or joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>

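If your table is very large, you may also need to set the hive.fetch.task.conversion.threshold parameter. Since Hive 0.14 its default is 1 GB (measured against the table's input size). You can set it to -1 (no limit), but on petabyte-scale tables -1 can be somewhat dangerous, so use it with caution.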
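A minimal sketch of the matching hive-site.xml entry, in the same format as the property above. It assumes the threshold is expressed in bytes, so 1073741824 corresponds to the 1 GB default mentioned above; swap in -1 only if you accept the risk described for very large tables.

```xml
<property>
  <name>hive.fetch.task.conversion.threshold</name>
  <!-- Input size limit (assumed to be in bytes) for converting a query to a FETCH task.
       1073741824 bytes = 1 GB. -1 removes the limit; use with caution on huge tables. -->
  <value>1073741824</value>
</property>
```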