Background
Kerberizing a Hadoop cluster enables a properly authorised user to access the cluster without entering username and password details. For example (after running a kinit command and starting the beeline JDBC client):
beeline> !connect jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;
Connecting to jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;
Enter username for jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;: myusername
Enter password for jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;: ************
Connected to: Apache Hive (version 1.2.1.2.3.0.1-3)
Driver: Hive JDBC (version 1.2.1.2.3.0.1-3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
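For reference, the Kerberos ticket used in the session above would typically be obtained first with kinit. A minimal sketch, assuming a user principal of myusername@COMPANY.INTERNAL; the keytab path is purely illustrative:
kinit myusername@COMPANY.INTERNAL
# or non-interactively, using a keytab (path is an assumption):
kinit -kt /etc/security/keytabs/myusername.keytab myusername@COMPANY.INTERNAL
klist
beeline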
Despite the successful login above, two errors occurred subsequently when running Hive queries.
First error (permission denied)
1: jdbc:hive2://hdplinux1.company.internal:10000/default> select a,b from c where a=1;
INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1441612826389_0022 failed 2 times due to AM Container for appattempt_1441612826389_0022_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hdplinux1.company.internal:8088/cluster/app/application_1441612826389_0022Then, click on links to logs of each attempt.
Diagnostics: Application application_1441612826389_0022 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is hive
main : requested yarn user is hive
Can't create directory /var/log/hadoop/yarn/local/usercache/hive/appcache/application_1441612826389_0022 - Permission denied
Did not create any app directoriesFailing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:678)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:205)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:239)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
Workaround:
The above error was fixed by renaming the local application cache directory on each datanode:
su -
mv /var/log/hadoop/yarn/local/usercache/hive/appcache /var/log/hadoop/yarn/local/usercache/hive/appcache.bak
A new appcache directory will be created when the Hive query is re-run. Note: this step was performed in a development cluster with no other users, so it may have harmful side effects on a busy, running cluster!
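Before resorting to the rename, it can be worth checking the ownership and permissions of the offending directories on each datanode, since the diagnostics above point at a permission problem rather than a missing path. A minimal check, assuming the same local-dir layout as in the error message (the likely cause, though not verified here, is stale directories created before Kerberos was enabled and therefore owned by a different user than the one the containers now run as):
su -
ls -ld /var/log/hadoop/yarn/local/usercache/hive
ls -ld /var/log/hadoop/yarn/local/usercache/hive/appcache
# in a secured cluster the containers run as the requesting user (hive here),
# so pre-Kerberos leftovers with a different owner would trigger the
# "Permission denied" seen above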
Second error (org.apache.hadoop.util.DiskChecker$DiskErrorException)
After the above workaround was applied a new error appeared when executing the Hive query:
1: jdbc:hive2://hdplinux1.company.internal:10000/default> select a,b from c where a=1;
INFO : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1441612826389_0036 failed 2 times due to AM Container for appattempt_1441612826389_0036_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hdplinux1.company.internal:8088/cluster/app/application_1441612826389_0036Then, click on links to logs of each attempt.
Diagnostics: Application application_1441612826389_0036 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is hive
main : requested yarn user is hive
org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /var/log/hadoop/yarn/local/usercache/hive/filecache/0/11603
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:372)Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:678)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:205)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:239)
at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
Workaround:
This second error was fixed by renaming the local filecache directory on each datanode:
su -
mv /var/log/hadoop/yarn/local/usercache/hive/filecache /var/log/hadoop/yarn/local/usercache/hive/filecache.bak
A new filecache directory will be created when the Hive query is re-run. Again, note that the impact on a running cluster is uncertain, as other jobs may be actively using files in these local cache directories.
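Since both workarounds have to be applied on every datanode, a small loop over the NodeManager hosts can save some typing. This is a hedged sketch only; the hostnames below are placeholders rather than real cluster nodes:
for host in hdplinux2.company.internal hdplinux3.company.internal; do
  # rename the stale filecache directory in place on each node
  ssh root@"$host" 'mv /var/log/hadoop/yarn/local/usercache/hive/filecache /var/log/hadoop/yarn/local/usercache/hive/filecache.bak'
done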
After performing the above steps, the original Hive query now runs successfully.
Further info
Vinod Vavilapalli and Omkar Vinit Joshi from Hortonworks describe the role of the appcache and filecache directories in their post on Resource Localization in YARN. They describe how resources are localised to YARN application nodes for performance reasons, and how the downloaded files end up in different local directories depending on their categorisation. For example, application-specific files are found in <local-dir>/usercache/<userid>/appcache/<app-id>/ and private (user-specific) files are found in <local-dir>/usercache/<userid>/filecache.
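The <local-dir> root referred to above is whatever yarn.nodemanager.local-dirs is set to; judging by the error messages, on this cluster that is /var/log/hadoop/yarn/local. Assuming that same path, the localised layout on a node can be inspected with something like:
find /var/log/hadoop/yarn/local/usercache/hive -maxdepth 2 -type d
# typically shows appcache/<app-id> directories for running applications and
# numbered subdirectories under filecache for user-private localised resources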