Sunday, October 23, 2011

Bay Area Hadoop User Group (HUG) Jan 2011 Meetup



Agenda: 
  • 6:00 - 6:30 - Socialize over food and beer(s) 
  • 6:30 - 7:00 - New features in Pig 0.8 
  • 7:00 - 7:30 - Kafka: LinkedIn's Real-time Data Stream System 
  • 7:30 - 8:00 - Howl: Table Management Service for Hadoop 

New features in Pig 0.8: Pig 0.8 focused on extending Pig's usability. We added the ability to write UDFs in scripting languages such as Python, gave users better access to statistics, and created PigUnit to help users test their Pig Latin, to name only a few. We also continued to improve performance by enabling compression of intermediate results and combining small blocks into a single mapper. We'll cover these and more in this overview of Pig 0.8's new features, and talk about what we're working on now for 0.9.
Presenter: Alan Gates, Yahoo!
Kafka: LinkedIn's Real-time Data Stream System: Kafka is a distributed, real-time, persistent messaging system developed at LinkedIn. It supports distributing message production, brokering, and consumption horizontally across commodity machines. The system serves as the backbone of LinkedIn's log aggregation and activity processing, providing data feeds for Hadoop as well as for real-time consumers.
Presenter: Jay Kreps, LinkedIn
Howl: Table Management Service for Hadoop: Howl is a project that aims to provide a table management service, a need common to data processors using Hadoop. The goal of this service is to track data that exists in a Hadoop grid and present that data to users in a tabular format. The service will present data in a uniform format to tools such as MapReduce, Streaming, Pig, and Hive by providing an interface to each of these data processing tools.
Presenter: Devaraj Das, Yahoo!
