
1. Requirements

Given the following order data:

order id        product id    transaction amount
Order_0000001   Pdt_01        222.8
Order_0000001   Pdt_05        25.8
Order_0000002   Pdt_03        522.8
Order_0000002   Pdt_04        122.4
Order_0000002   Pdt_05        722.4
Order_0000003   Pdt_01        222.8

The task: for each order, find the single transaction with the largest amount.
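For the sample data above, the expected result is one line per order:

Order_0000001   222.8
Order_0000002   722.4
Order_0000003   222.8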

2. Analysis

1. Use a bean holding both the order id and the transaction amount as the map output key. All records read in the map phase can then be partitioned by order id and sorted by amount (largest first) on their way to reduce.

2. On the reduce side, use a GroupingComparator to group key-value pairs with the same order id into one reduce() call; the first record of each group is then the maximum (a trace of the resulting reduce input is sketched below).
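Concretely, after the shuffle sorts the sample keys (order id ascending, amount descending) and the GroupingComparator marks group boundaries wherever the id changes, reduce sees:

Order_0000001   222.8    <- group 1 starts; its first record is the max
Order_0000001   25.8
Order_0000002   722.4    <- group 2 starts; its first record is the max
Order_0000002   522.8
Order_0000002   122.4
Order_0000003   222.8    <- group 3

(With more than one reduce task the groups are spread across reducers, but each group stays intact.) Writing only the key of each reduce() call therefore emits exactly the largest transaction per order.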

3. Implementation

Define a custom GroupingComparator.

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

/**
 * Reduce-side GroupingComparator that makes a whole group of beans
 * with the same order id look like one key.
 */
public class ItemidGroupingComparator extends WritableComparator {

    // Pass in the class of the bean used as the key; the second argument
    // tells the framework to create key instances via reflection.
    protected ItemidGroupingComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean abean = (OrderBean) a;
        OrderBean bbean = (OrderBean) b;
        // When comparing two beans, compare only the order id.
        return abean.getItemid().compareTo(bbean.getItemid());
    }
}
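As a quick sanity check, here is a hypothetical snippet (not from the original post) showing that two beans with the same order id but different amounts compare as equal for grouping, yet unequal for sorting. It must live in the same package as the comparator, whose constructor is protected; OrderBean is defined in the next listing.

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;

public class GroupingDemo {
    public static void main(String[] args) {
        OrderBean a = new OrderBean(new Text("Order_0000002"), new DoubleWritable(722.4));
        OrderBean b = new OrderBean(new Text("Order_0000002"), new DoubleWritable(122.4));
        ItemidGroupingComparator grouper = new ItemidGroupingComparator();
        System.out.println(grouper.compare(a, b)); // 0: same group for the framework
        System.out.println(a.compareTo(b));        // negative: 722.4 sorts before 122.4
    }
}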

Define the order bean used as the key.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class OrderBean implements WritableComparable<OrderBean> {

    private Text itemid;
    private DoubleWritable amount;

    public OrderBean() {
    }

    public OrderBean(Text itemid, DoubleWritable amount) {
        set(itemid, amount);
    }

    public void set(Text itemid, DoubleWritable amount) {
        this.itemid = itemid;
        this.amount = amount;
    }

    public Text getItemid() {
        return itemid;
    }

    public DoubleWritable getAmount() {
        return amount;
    }

    @Override
    public int compareTo(OrderBean o) {
        // Primary sort: order id ascending; secondary sort: amount descending,
        // so within each order the largest transaction comes first.
        int cmp = this.itemid.compareTo(o.getItemid());
        if (cmp == 0) {
            cmp = -this.amount.compareTo(o.getAmount());
        }
        return cmp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(itemid.toString());
        out.writeDouble(amount.get());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        String readUTF = in.readUTF();
        double readDouble = in.readDouble();
        this.itemid = new Text(readUTF);
        this.amount = new DoubleWritable(readDouble);
    }

    @Override
    public String toString() {
        return itemid.toString() + "\t" + amount.get();
    }
}
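A minimal sketch (hypothetical, using the sample data above) of the natural ordering the shuffle will apply to these beans:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;

public class OrderBeanSortDemo {
    public static void main(String[] args) {
        List<OrderBean> beans = new ArrayList<>();
        beans.add(new OrderBean(new Text("Order_0000002"), new DoubleWritable(122.4)));
        beans.add(new OrderBean(new Text("Order_0000002"), new DoubleWritable(722.4)));
        beans.add(new OrderBean(new Text("Order_0000001"), new DoubleWritable(25.8)));
        Collections.sort(beans); // uses OrderBean.compareTo
        beans.forEach(System.out::println);
        // Order_0000001	25.8
        // Order_0000002	722.4
        // Order_0000002	122.4
    }
}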
Define a Partitioner so that order beans with the same id are sent to the same partition.
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class ItemIdPartitioner extends Partitioner<OrderBean, NullWritable> {

    @Override
    public int getPartition(OrderBean bean, NullWritable value, int numReduceTasks) {
        // Order beans with the same id go to the same partition, and the
        // number of partitions matches the configured number of reduce tasks.
        return (bean.getItemid().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
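Why the "& Integer.MAX_VALUE" mask: hashCode() may be negative, and a negative left operand makes % return a negative partition index, which is not a valid partition number. Masking off the sign bit keeps the result in [0, numReduceTasks). A quick illustration with a hypothetical hash value:

public class MaskDemo {
    public static void main(String[] args) {
        int h = -1712343; // hashCode() can be negative
        System.out.println(h % 2);                       // -1: not a valid partition
        System.out.println((h & Integer.MAX_VALUE) % 2); // 1: always non-negative
    }
}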

Write the MapReduce job: mapper, reducer, and driver.

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondarySort {

    static class SecondarySortMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {

        OrderBean bean = new OrderBean();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] fields = StringUtils.split(line, ",");
            bean.set(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
            context.write(bean, NullWritable.get());
        }
    }

    static class SecondarySortReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {

        // By the time reduce runs, all beans with the same id have been grouped
        // together, and the one with the largest amount is first.
        @Override
        protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(SecondarySort.class);

        job.setMapperClass(SecondarySortMapper.class);
        job.setReducerClass(SecondarySortReducer.class);

        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path("c:/wordcount/gpinput"));
        FileOutputFormat.setOutputPath(job, new Path("c:/wordcount/gpoutput"));

        // Set the custom GroupingComparator class here.
        job.setGroupingComparatorClass(ItemidGroupingComparator.class);
        // Set the custom Partitioner class here.
        job.setPartitionerClass(ItemIdPartitioner.class);

        job.setNumReduceTasks(2);

        job.waitForCompletion(true);
    }
}
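To try this out, save the sample data as comma-separated lines in a file under c:/wordcount/gpinput (the paths are hard-coded in the driver above):

Order_0000001,Pdt_01,222.8
Order_0000001,Pdt_05,25.8
Order_0000002,Pdt_03,522.8
Order_0000002,Pdt_04,122.4
Order_0000002,Pdt_05,722.4
Order_0000003,Pdt_01,222.8

Because numReduceTasks is 2, the result is split across part-r-00000 and part-r-00001 in c:/wordcount/gpoutput; which file an order lands in depends on its id's hash, but together the two files contain exactly the per-order maxima shown in section 1.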

 
