Home » Pig Example

Pig Example

Use case: Using Pig find the most occurred start letter.

Solution:

Case 1: Load the data into bag named “lines”. The entire line is stuck to element line of type character array.

Case 2: The text in the bag lines needs to be tokenized this produces one word per row.

Case 3: To retain the first letter of each word type the below command .This commands uses substring method to take the first character.

Case 4: Create a bag for unique character where the grouped bag will contain the same character for each occurrence of that character.

Case 5: The number of occurrence is counted in each group.

Case 6: Arrange the output according to count in descending order using the commands below.

Case 7: Limit to One to give the result.

Case 8: Store the result in HDFS . The result is saved in output directory under tutor folder.

Next TopicPig UDF

You may also like