Hadoop is a powerful open-source framework that revolutionized the way we handle large datasets, enabling efficient storage and processing of massive amounts of data. Built to handle the challenges of big data, Hadoop is widely used in industries for tasks like data analysis, machine learning, and more.
In this blog, we’ll explore essential Hadoop commands that every beginner should know. Each command will be explained in detail and accompanied by practical examples to make learning easy and effective.
Basic Hadoop Commands
1. Check Hadoop Version
Use this command to verify the installed version of Hadoop:
hadoop version

2. Get General Help
To see a list of available Hadoop commands, ask for the built-in usage text:
hadoop --help
Output :

Displays the general options and the subcommands available in Hadoop.
HDFS (Hadoop Distributed File System) Commands
HDFS is the backbone of Hadoop's storage system. Below are essential HDFS commands with examples.
1. Creating Directories
hdfs dfs -mkdir /user/hadoop
hdfs dfs -mkdir /user/hadoop/input
Output :

Creates a directory named input under /user/hadoop.
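A plain -mkdir fails if the parent directory is missing, for example when /user does not exist yet on a fresh cluster. A minimal sketch that creates the whole path in one step with the -p flag. The variable name target_dir is only for illustration, and the guard simply skips the commands on a machine without the hdfs client:

```shell
# Hypothetical variable name; the path is the one used in this section.
target_dir="/user/hadoop/input"

if command -v hdfs >/dev/null 2>&1; then
  # -p creates missing parent directories and does not fail if the path already exists.
  hdfs dfs -mkdir -p "$target_dir"
  # List the parent to confirm the new directory is there.
  hdfs dfs -ls /user/hadoop
else
  echo "hdfs client not found on PATH; skipping"
fi
```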
2. Listing Files and Directories
hdfs dfs -ls /user/hadoop
Output :

Lists the files and directories under /user/hadoop.
3. Copying Files from Local to HDFS
Before copying a file to HDFS, make sure the file exists in your local directory. You can create a sample file if it doesn't exist:
Step 1: Create a Sample File
mkdir -p ~/data
echo "Hello, this is a sample file for Hadoop commands." > ~/data/sample.txt
Output :

Creates the data directory and a sample.txt file containing the sample text.
Step 2: Verify the File Exists
cat ~/data/sample.txt
Output :

Displays the text stored in sample.txt.
Step 3: Copy the File to HDFS
hdfs dfs -put ~/data/sample.txt /user/hadoop/input
This command uploads sample.txt from the local directory to /user/hadoop/input in HDFS.
Output :

The file sample.txt now exists in HDFS. The command prints nothing on success, so list /user/hadoop/input with hdfs dfs -ls to confirm the upload.
4. Viewing File Contents
hdfs dfs -cat /user/hadoop/input/sample.txt
Output :

Displays the content of sample.txt.
5. Copying Files from HDFS to Local
mkdir ~/output
hdfs dfs -get /user/hadoop/input/sample.txt ~/output/
Output :

Downloads the file sample.txt to the local output directory.
To confirm the download, view the local copy:
cat ~/output/sample.txt
Output :

Displays the content of the downloaded sample.txt.
6. Removing Files and Directories
hdfs dfs -rm /user/hadoop/input/sample.txt
Output :

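Deletes the specified file. To delete a directory together with its contents, add the -r flag (hdfs dfs -rm -r followed by the directory path).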
7. Checking File Replication
The previous command removed sample.txt, so first upload it to HDFS again, then check its replication factor:
hdfs dfs -put ~/data/sample.txt /user/hadoop/input
hdfs dfs -stat %r /user/hadoop/input/sample.txt
Output :

Displays the replication factor of the file.
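The -stat format string accepts more than %r, so several attributes can be printed in one call. A small sketch, assuming the file uploaded above is still in place. The variable name file is only for illustration:

```shell
# Hypothetical variable name; the path is the file uploaded in this section.
file="/user/hadoop/input/sample.txt"

if command -v hdfs >/dev/null 2>&1; then
  # %n = file name, %r = replication factor, %b = size in bytes, %o = block size in bytes
  hdfs dfs -stat "%n %r %b %o" "$file"
else
  echo "hdfs client not found on PATH; skipping"
fi
```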
8. Disk Space Usage
hdfs dfs -du -h /user/hadoop
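Output :

Shows the space used by each file and directory under /user/hadoop, with sizes in human-readable units.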
File Operations Commands
1. Moving Files
Before moving the file, make sure the processed directory exists in HDFS; otherwise the command fails with an error.
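One way to make sure the target directory is there is to test for it and create it only when needed. This is a sketch under the assumption that the directory should live at /user/hadoop/processed, as in the commands of this section. The variable name processed_dir is illustrative:

```shell
# Hypothetical variable name; the path matches the move command in this section.
processed_dir="/user/hadoop/processed"

if command -v hdfs >/dev/null 2>&1; then
  # -test -d exits with status 0 only when the path exists and is a directory.
  if ! hdfs dfs -test -d "$processed_dir"; then
    hdfs dfs -mkdir -p "$processed_dir"
  fi
else
  echo "hdfs client not found on PATH; skipping"
fi
```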
hdfs dfs -mv /user/hadoop/input/sample.txt /user/hadoop/processed/
Output :

Moves the file sample.txt to the processed directory.
2. Renaming Files
hdfs dfs -mv /user/hadoop/processed/sample.txt /user/hadoop/processed/data.txt
Output :

Renames the file to data.txt.
3. Changing File Permissions
hdfs dfs -chmod 644 /user/hadoop/processed/data.txt
Output :

Sets read-write permissions for the owner and read-only permissions for the group and others.
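HDFS uses the same octal permission notation as a local Unix file system, so the meaning of 644 can be checked without touching the cluster. The sketch below applies the mode to a throwaway local file and prints the symbolic form. The variable names are illustrative:

```shell
# Work on a temporary local file so nothing in HDFS is changed.
tmpfile="$(mktemp)"
chmod 644 "$tmpfile"

# The first ten characters of ls -l are the file type and the symbolic mode.
mode="$(ls -l "$tmpfile" | cut -c1-10)"
echo "$mode"   # prints -rw-r--r--

rm -f "$tmpfile"
```

Reading the result in groups of three: rw- for the owner (6), r-- for the group (4), and r-- for others (4).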
4. Changing Ownership
hdfs dfs -chown user:group /user/hadoop/processed/data.txt
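Output :

Changes the owner of data.txt to user and its group to group. Replace user:group with real names from your system; changing the owner normally requires HDFS superuser privileges.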
Hadoop MapReduce Commands
1. Running a MapReduce Job
This example assumes that the input text file is already in /user/hadoop/input and that the output2 directory does not exist yet. The job creates output2 itself and stores its results there; if the directory already exists, the job fails.
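A quick pre-flight check can save a failed submission. The sketch below only reports whether the output path is free and suggests the cleanup command rather than deleting anything. The variable name output_dir is illustrative:

```shell
# Hypothetical variable name; the path is the output directory used in this section.
output_dir="/user/hadoop/output2"

if command -v hdfs >/dev/null 2>&1; then
  # -test -e exits with status 0 when the path exists.
  if hdfs dfs -test -e "$output_dir"; then
    echo "$output_dir already exists; remove it or pick another output path:"
    echo "  hdfs dfs -rm -r $output_dir"
  else
    echo "$output_dir is free; the job can create it"
  fi
else
  echo "hdfs client not found on PATH; skipping"
fi
```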
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /user/hadoop/input /user/hadoop/output2
Processes the files in /user/hadoop/input and stores the results in /user/hadoop/output2.
When Submitting the Job
When you run a MapReduce job using the hadoop jar command, the output will display a job submission log. The job ID is typically printed to the console. It looks something like this:
job id is : job_1734608918111_0002
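If you want the job ID in a variable for the commands that follow, it can be pulled out of the console output with grep. The log line below is an illustrative sample rather than captured output, and the variable names are hypothetical:

```shell
# Illustrative sample line; a real submission log has many more lines around it.
log_line="INFO mapreduce.Job: Running job: job_1734608918111_0002"

# -o prints only the part of the line that matches the job ID pattern.
job_id="$(printf '%s\n' "$log_line" | grep -o 'job_[0-9]*_[0-9]*')"
echo "$job_id"   # prints job_1734608918111_0002
```

On a real run, the same grep can be applied to a saved copy of the job's console output.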
Checking Job Status
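mapred job -status job_1734608918111_0002
Retrieves the status of the specified job. The older hadoop job -status form is deprecated in Hadoop 3 and prints a warning that points to mapred job.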
Listing All Running Jobs
You can list all running jobs using the following command (mapred job is the current replacement for the deprecated hadoop job):
mapred job -list
Output :
This will output a list of currently running jobs with their respective job IDs:

From the ResourceManager Web UI
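Open your browser and navigate to the ResourceManager Web UI. With default settings it listens on port 8088 of the ResourceManager host, so on a single-node setup the URL is typically http://localhost:8088. The Applications page lists submitted jobs with their IDs and status.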
Administrative Commands
1. Starting Hadoop Services
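On a typical installation the HDFS and YARN daemons are started with the start-dfs.sh and start-yarn.sh scripts that ship in Hadoop's sbin directory. The sketch below assumes that directory is on your PATH; the helper name start_service is hypothetical and only adds a friendlier message when a script cannot be found:

```shell
# Run a Hadoop start script if it is on PATH, otherwise say where to look.
start_service() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1"
  else
    echo "$1 not found on PATH; try \$HADOOP_HOME/sbin/$1"
  fi
}

start_service start-dfs.sh    # NameNode, DataNode, SecondaryNameNode
start_service start-yarn.sh   # ResourceManager, NodeManager
```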
Output :

2. Checking Service Status
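A common way to see which daemons are up is jps, the Java process lister that ships with the JDK. The sketch below wraps it in a small check; the function name check_daemons and the list of expected daemons (a single-node setup) are assumptions for illustration:

```shell
# Report which of the usual single-node daemons appear in jps-style output.
check_daemons() {
  running="$1"
  for daemon in NameNode DataNode ResourceManager NodeManager; do
    # -w matches whole words, so SecondaryNameNode does not count as NameNode.
    if printf '%s\n' "$running" | grep -qw "$daemon"; then
      echo "$daemon: running"
    else
      echo "$daemon: NOT running"
    fi
  done
}

if command -v jps >/dev/null 2>&1; then
  check_daemons "$(jps)"
else
  echo "jps not found; is a JDK on your PATH?"
fi
```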
Output :

Common Errors and Troubleshooting
Permission Denied:
Solution: Use the hdfs dfs -chmod command to modify permissions.
Directory Not Found:
Solution: Ensure the path exists before running commands.
Insufficient Replication:
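Solution: Check that enough DataNodes are running, then use the hdfs dfs -setrep command to set a replication factor the cluster can satisfy (for example, 1 on a single-node cluster).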
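The three checks above can be sketched as a short diagnostic pass over one path. The path is the file from the earlier examples, the variable name target is illustrative, and the replication factor of 1 assumes a single-node cluster:

```shell
# Hypothetical variable name; the path is the renamed file from the earlier section.
target="/user/hadoop/processed/data.txt"

if command -v hdfs >/dev/null 2>&1; then
  # Permission denied: look at the owner, group, and mode before changing them.
  hdfs dfs -ls "$target"

  # Directory or file not found: -test -e exits with status 0 only if the path exists.
  if hdfs dfs -test -e "$target"; then
    echo "$target exists"
  else
    echo "$target is missing"
  fi

  # Replication: print the current factor, then set one the cluster can satisfy.
  # -w waits until the new replication level has been reached.
  hdfs dfs -stat %r "$target"
  hdfs dfs -setrep -w 1 "$target"
else
  echo "hdfs client not found on PATH; skipping"
fi
```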
Mastering these Hadoop commands is the first step to effectively managing big data projects. Hadoop's robust ecosystem empowers you to work with vast datasets seamlessly, and proficiency with these commands will make your journey smoother.
