LAB 1. Word Median
Solution 1: Word Median
# Open the Terminal and perform the following tasks.
# Step 1: Create a file named sample.txt.
lab@user:~$ gedit sample.txt
# Step 2: Write the following lines to the file and save it.
'''
Hadoop MapReduce is a software framework. Hadoop MapReduce easily writing applications which process vast amounts of data in-parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
'''
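# Optional: a minimal sketch for creating the same file without an editor, assuming a POSIX shell. The here-document delimiter EOF is an arbitrary choice, and the ''' marker lines are assumed to be delimiters only, so they are left out of the file.

```shell
# Write the two sample lines to sample.txt in the current directory.
# Quoting 'EOF' stops the shell from expanding anything inside the text.
cat > sample.txt <<'EOF'
Hadoop MapReduce is a software framework. Hadoop MapReduce easily writing applications which process vast amounts of data in-parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
EOF

# Confirm that the file holds exactly two lines.
wc -l < sample.txt
```

# This overwrites any existing sample.txt in the current directory.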
# Step 3: Copy the file to the HDFS path /user/labuser.
lab@user:~$ hdfs dfs -put sample.txt /user/labuser/
# Check that the new file appears in the HDFS directory.
lab@user:~$ hdfs dfs -ls /user/labuser/
# Step 4: Find the word count of the above text file.
lab@user:~$ hadoop fs -text /user/labuser/sample.txt | wc -w
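# Note: wc -w counts whitespace-separated tokens, so punctuation and hyphens stay attached to their word. A small local illustration on a fragment of the sample text, with no HDFS needed (the variable names fragment and count are illustrative):

```shell
# "reliable," and "fault-tolerant" each count as a single word.
fragment='in a reliable, fault-tolerant manner.'
count=$(printf '%s\n' "$fragment" | wc -w)

# Print the number of whitespace-separated tokens in the fragment.
echo "$count"
```

# The fragment has five tokens: in, a, reliable, fault-tolerant, manner.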
# Step 5: Find the number of occurrences of the word 'Hadoop' in the file.
lab@user:~$ hadoop fs -text /user/labuser/sample.txt | grep -o "Hadoop" | wc -l
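# Note: grep -o prints each match on its own line, which is why its output is piped into wc -l. grep -c alone would count matching lines, not occurrences. A minimal local sketch of the difference, using a shortened two-line text (the variable names text, lines, and matches are illustrative):

```shell
text='Hadoop MapReduce is a framework. Hadoop MapReduce runs on clusters.
The Apache Hadoop library is a framework.'

# Number of lines that contain "Hadoop" at least once.
lines=$(printf '%s\n' "$text" | grep -c "Hadoop")

# Number of occurrences of "Hadoop": grep -o emits one output line per match.
matches=$(printf '%s\n' "$text" | grep -o "Hadoop" | wc -l)

echo "matching lines: $lines, occurrences: $matches"
```

# The two numbers differ here because the first line contains the word twice. Adding -w (grep -ow) restricts matches to whole words, so a token such as Hadoops would not be counted.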
# Step 6: Find the number of occurrences of the word 'MapReduce' in the file.
lab@user:~$ hadoop fs -text /user/labuser/sample.txt | grep -o "MapReduce" | wc -l
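# Optional: Steps 5 and 6 count one word at a time. A sketch of counting every word in a single pass with standard tools, shown on a local string. On the cluster the same pipeline could be fed from hadoop fs -text instead. Treating only runs of ASCII letters as words is an assumption, and it splits hyphenated tokens such as in-parallel.

```shell
text='Hadoop MapReduce is a framework. Hadoop MapReduce runs on clusters.
The Apache Hadoop library is a framework.'

# Turn every run of non-letter characters into a newline (one word per line),
# group identical words, count them, and list the most frequent first.
freq=$(printf '%s\n' "$text" | tr -cs 'A-Za-z' '\n' | sort | uniq -c | sort -rn)

echo "$freq"
```

# Each output line holds a count followed by the word. The match is case-sensitive, so The and the would be counted separately.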