Abstract: In the parallel computing framework of Hadoop/Spark, data skew is a common problem resulting in performance degradation, such as prolonging of the entire execution time and idle resources.