Disk Space Issues in Hadoop¶
This topic describes how to troubleshoot a few common Hadoop disk space issues.
Handling a Disk Space Issue When Creating a Directory¶
While running Hadoop jobs, you can hit this exception: cannot create directory: No space left on device.
This exception usually appears when the disk space in HDFS is full. In Qubole, only temporary/intermediate data is written to HDFS, and a cron job runs regularly to delete these temporary files. This issue can appear in cases such as:
- Long-running jobs that write large amounts of intermediate data; the cron job cannot delete that data while the jobs are still running.
- Long-running clusters where, in rare cases, data written by failed or killed tasks may not get deleted.
Solution: Verify the actual cause by checking the HDFS disk usage with one of these methods:
On the Qubole UI, through DFS Status on the running cluster's UI page.
By logging into the cluster node and running this command:
hadoop dfsadmin -report
A sample response is shown below.
Configured Capacity: 153668681728 (143.12 GB)
Present Capacity: 153668681728 (143.12 GB)
DFS Remaining: 153555091456 (143.01 GB)
DFS Used: 113590272 (108.33 MB)
DFS Used%: 0.07%
Under replicated blocks: 33
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: x.x.x.x:50010 (ip-x-x-x-x.ec2.internal)
Hostname: ip-x-x-x-x.ec2.internal
Decommission Status : Normal
Configured Capacity: 76834340864 (71.56 GB)
DFS Used: 56795136 (54.16 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 76777545728 (71.50 GB)
DFS Used%: 0.07%
DFS Remaining%: 99.93%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Dec 26 11:21:19 UTC 2017

Name: x.x.x.x:50010 (ip-x-x-x-x.ec2.internal)
Hostname: ip-x-x-x-x.ec2.internal
Decommission Status : Normal
Configured Capacity: 76834340864 (71.56 GB)
DFS Used: 56795136 (54.16 MB)
Non DFS Used: 0 (0 B)
DFS Remaining: 76777545728 (71.50 GB)
DFS Used%: 0.07%
DFS Remaining%: 99.93%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Dec 26 11:21:21 UTC 2017
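If you want to monitor this from a script rather than reading the report by hand, the cluster-wide DFS Used% field can be extracted with standard text tools. The following is a minimal sketch; the 90% threshold and the trimmed sample report are illustrative, and in practice you would pipe the live output of hadoop dfsadmin -report into the same function:

```shell
#!/bin/sh
# Warn when the cluster-wide "DFS Used%" from a dfsadmin report crosses
# a threshold. THRESHOLD is an illustrative value, not a Qubole default.
THRESHOLD=90

check_dfs_usage() {
    # Reads a dfsadmin report on stdin and prints the first "DFS Used%"
    # value (the cluster-wide one, before the per-datanode sections).
    awk -F': *' '/^DFS Used%/ { sub(/%/, "", $2); print $2; exit }'
}

# In practice: used=$(hadoop dfsadmin -report | check_dfs_usage)
# Here a trimmed sample report keeps the sketch self-contained.
used=$(check_dfs_usage <<'EOF'
Configured Capacity: 153668681728 (143.12 GB)
DFS Remaining: 153555091456 (143.01 GB)
DFS Used: 113590272 (108.33 MB)
DFS Used%: 0.07%
EOF
)

echo "DFS Used%: ${used}"    # prints: DFS Used%: 0.07
if awk -v u="$used" -v t="$THRESHOLD" 'BEGIN { exit !(u > t) }'; then
    echo "WARNING: HDFS usage above ${THRESHOLD}%"
fi
```

The `exit` in the awk pattern matters: the full report repeats DFS Used% once per datanode, and stopping at the first match keeps only the cluster-wide figure.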
Handling a Device Disk Space Error¶
While running jobs, you may hit this exception: java.io.IOException: No space left on device.
Cause: This exception usually appears when there is no disk space left on the worker or coordinator nodes. You can confirm this by logging into the corresponding node and running df -h while the query is still running.
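The same check can be scripted with POSIX df. A minimal sketch, assuming the job writes to the root filesystem; replace the mount point with the disk your cluster actually uses:

```shell
#!/bin/sh
# Report the usage percentage of a mount point using POSIX df.
# MOUNT=/ is an assumption; point it at the disk your jobs write to.
MOUNT=/

disk_used_pct() {
    # "df -P" guarantees one record per filesystem; field 5 is "Use%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(disk_used_pct "$MOUNT")
echo "Disk usage on ${MOUNT}: ${pct}%"
if [ "$pct" -ge 95 ]; then
    echo "WARNING: almost no space left on ${MOUNT}"
fi
```

The -P (portable) flag is used instead of -h so the output format is stable enough to parse; the 95% warning threshold is illustrative.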
Solution: You can avoid this error with one of these approaches:
- Enable EBS autoscaling. Once it is enabled, additional EBS volumes are attached based on the query's requirements.
- Alternatively, use cluster instance types with larger disk space.