The Hadoop Hornbook Fresco Play Hands-on Solution

This lab covers a few basic Hadoop tasks: creating a text file, copying it to HDFS, and counting words and word occurrences in it from the command line.

LAB 1. Word Median

Solution 1: Word Median

# Open the Terminal and perform the following tasks.
# Step 1: Create a file named sample.txt.
lab@user:~$ gedit sample.txt

# Step 2: Write the following lines in the file and save it.
'''
Hadoop MapReduce is a software framework. Hadoop MapReduce easily writing applications which process vast amounts of data in-parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
'''
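If no graphical editor is available, the file from Steps 1 and 2 can also be created directly from the terminal. The following is a minimal sketch, assuming the ''' markers above are only delimiters and are not meant to be part of the file:

```shell
# Write the sample text with a here-document instead of a GUI editor.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > sample.txt <<'EOF'
Hadoop MapReduce is a software framework. Hadoop MapReduce easily writing applications which process vast amounts of data in-parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
EOF

# Quick local check that the file was written before uploading it.
wc -w sample.txt
```

Running the local `wc -w` first gives a number to compare against the HDFS result in Step 4.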

# Step 3: Copy the file to the HDFS path /user/labuser.
lab@user:~$ hdfs dfs -put sample.txt /user/labuser/

# Check that the new file is listed in the HDFS directory.
lab@user:~$ hdfs dfs -ls /user/labuser/
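If Step 3 is run more than once, or the target directory does not exist yet, `-put` can fail. The sketch below shows one way to make the upload repeatable. The variable names `HDFS_DIR` and `LOCAL_FILE` are illustrative, and the guard around `hdfs` is only there so the snippet does nothing harmful on a machine without Hadoop installed:

```shell
HDFS_DIR=/user/labuser   # target directory from Step 3
LOCAL_FILE=sample.txt    # file created in Step 1

if command -v hdfs >/dev/null 2>&1; then
    # -p creates the directory (and any parents) only if it is missing.
    hdfs dfs -mkdir -p "$HDFS_DIR"
    # -f overwrites an existing copy instead of failing.
    hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"
    # -test -e exits with status 0 when the path exists.
    hdfs dfs -test -e "$HDFS_DIR/$LOCAL_FILE" && echo "upload confirmed"
else
    echo "hdfs not found on PATH; skipping upload" >&2
fi
```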

# Step 4: Find the word count of the above text file.
lab@user:~$ hadoop fs -text /user/labuser/sample.txt | wc -w
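It helps to know what `wc -w` actually counts in Step 4: it splits on whitespace only, so punctuation stays attached and a hyphenated term counts as one word. The sketch below illustrates this locally on a fragment of the sample text, with no HDFS needed. The second pipeline is an alternative definition of "word" for comparison, not something the lab asks for:

```shell
# A fragment of the sample text, used here only for illustration.
sample='in a reliable, fault-tolerant manner.'

# Whitespace-separated tokens: "reliable," and "fault-tolerant" each count once.
tokens=$(printf '%s\n' "$sample" | wc -w)

# Alternative: turn every run of non-alphanumeric characters into a newline,
# then count the non-empty lines. "fault-tolerant" now counts as two words.
split_tokens=$(printf '%s\n' "$sample" | tr -cs '[:alnum:]' '\n' | grep -c .)

echo "wc -w tokens: $tokens, alphanumeric tokens: $split_tokens"
```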


# Step 5: Find the number of occurrences of the word 'Hadoop' in the file.
lab@user:~$ hadoop fs -text /user/labuser/sample.txt | grep -o "Hadoop" | wc -l

# Step 6: Find the number of occurrences of the word 'MapReduce' in the file.
lab@user:~$ hadoop fs -text /user/labuser/sample.txt | grep -o "MapReduce" | wc -l
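Steps 5 and 6 rely on `grep -o` printing each match on its own line, which is why `wc -l` then gives the number of occurrences rather than the number of matching lines. The sketch below shows the difference locally on a shortened piece of the sample text. The `-w` variant is an optional refinement, in case substring matches should be excluded:

```shell
# A shortened piece of the sample text, used here only for illustration.
text='Hadoop MapReduce is a software framework. Hadoop MapReduce easily writing applications.'

# -o prints every match on its own line, so wc -l counts occurrences.
occurrences=$(printf '%s\n' "$text" | grep -o "Hadoop" | wc -l)

# -c counts matching lines instead; both occurrences sit on one line here.
matching_lines=$(printf '%s\n' "$text" | grep -c "Hadoop")

# -w matches whole words only, so "Map" is not found inside "MapReduce".
whole_word=$(printf '%s\n' "$text" | grep -ow "Map" | wc -l)

echo "occurrences=$occurrences matching_lines=$matching_lines whole_word=$whole_word"
```

For the words in this lab the plain `-o` form and the `-ow` form should agree, since 'Hadoop' and 'MapReduce' only appear as standalone words in the sample text.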

About the author

D Shwari
I'm a professor in the Department of Computer Science at National University. My main areas are data science and data analysis, along with project management for many computer science-related sectors. My next project is on AI with deep learning.
