I want to aggregate values per key in Hive the way collect_set() does, but keep the duplicates instead of dropping them. Example input:
hash_id | num_of_cats
=====================
abcdef    5
abcdef    4
abcdef    3
fndflka   1
fndflka   2
fndflka   3
djsb33    7
djsb33    7
djsb33    7
The query should return:
hash_agg | cats_aggregate
===========================
abcdef     Array<int>(5,4,3)
fndflka    Array<int>(1,2,3)
djsb33     Array<int>(7,7,7)
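
To make the desired behavior concrete, here is a minimal Python sketch (not Hive) of the aggregation I am after. The function name `aggregate_keep_duplicates` is only illustrative, and the rows are the ones from the example above. A set-style aggregation would collapse the three 7s for djsb33 into one; the list-style grouping below keeps all of them.

```python
from collections import defaultdict


def aggregate_keep_duplicates(rows):
    """Group num_of_cats by hash_id, keeping every value (duplicates included)."""
    groups = defaultdict(list)
    for hash_id, num_of_cats in rows:
        # Append rather than add to a set, so repeated values survive.
        groups[hash_id].append(num_of_cats)
    return dict(groups)


# Rows copied from the example table above.
rows = [
    ("abcdef", 5), ("abcdef", 4), ("abcdef", 3),
    ("fndflka", 1), ("fndflka", 2), ("fndflka", 3),
    ("djsb33", 7), ("djsb33", 7), ("djsb33", 7),
]

result = aggregate_keep_duplicates(rows)
for hash_id, cats in result.items():
    print(hash_id, cats)
```

This only pins down the semantics; the sketch preserves input order, which I would not assume a distributed Hive query does.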