About MapReduce
Why does Hive run a MapReduce job when I query a single column?
2016-01-24
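Not exactly.
Selecting specific columns does not have to launch an MR job. From Hive 0.14 onward, skipping the MR job for such queries is the default behavior.
In earlier versions such as 0.13, you need to set the parameter 'hive.fetch.task.conversion' to 'more' in hive-site.xml: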
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task minimizing latency.
    Currently the query should be single sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS), lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
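To try this without editing hive-site.xml, the same property can also be set for the current session only. A minimal sketch, assuming a hypothetical table `my_table` with columns `col_a` and `col_b` (these names are placeholders, not from the original question):

```sql
-- Session-level override; lasts only for the current Hive session.
SET hive.fetch.task.conversion=more;

-- Hypothetical table and columns, for illustration only.
-- A plain column projection with a filter and a limit is the kind of
-- query that 'more' allows to run as a single fetch task.
EXPLAIN
SELECT col_a, col_b
FROM my_table
WHERE col_a > 10
LIMIT 20;
```

If the conversion applies, the plan should show only a fetch stage and no map-reduce stage. The exact plan text differs between Hive versions, so treat this as a rough check.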
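If your table is very large, you may also need to set the hive.fetch.task.conversion.threshold parameter. From 0.14 onward its default is 1 GB (table size). You can set it to -1 (no limit), but with -1 a petabyte-scale table could be a little dangerous, so use it with caution.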
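The threshold is given in bytes. A short sketch of how it might be inspected and changed in a session, again with a hypothetical `my_table` and `col_a`:

```sql
-- Print the value currently in effect.
SET hive.fetch.task.conversion.threshold;

-- 1 GB expressed in bytes (the default from Hive 0.14 onward).
SET hive.fetch.task.conversion.threshold=1073741824;

-- No limit: any table size qualifies for fetch conversion.
-- Use with care on very large tables.
SET hive.fetch.task.conversion.threshold=-1;

-- Hypothetical query; the LIMIT bounds how many rows come back to the client.
SELECT col_a FROM my_table LIMIT 100;
```

Keeping a finite threshold and raising it only as far as needed is usually the safer choice, since a fetch task reads the data in a single client-side process instead of distributing the work.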