UW Big Data Event presents Storm

After hearing from a friend in the Madison Big Data Meetup that Twitter would be sending its engineers to the UW Madison CS Department to talk about Apache Storm, a group of Bendyworkers bundled up against the cold and made the short trek to the UW. Here at Bendyworks we’re pretty excited about Storm, and it was great to join UW undergrads and grad students for the event and learn alongside them.

Storm is a distributed, real-time computation system often used to process large amounts of data. It was originally developed at BackType and open-sourced by Twitter after Twitter acquired the company. Twitter uses it on the firehose of tweets and user data to perform analytics, suggest tweets you may be interested in, run machine learning and natural language processing tasks on tweet data, and more.

The typical Storm system is a cluster of servers that run in the cloud and perform number crunching and other computations on data that would be too big to fit on one server. Storm processes data in real time as it arrives, which is an improvement over previous systems that ran in batches. It also runs against your databases and other data stores rather than requiring data to be turned into files or special formats for processing.

A large chunk of Storm is written in Clojure, which is of special interest to us. Our team has increasingly been using Clojure, so we were excited to learn about this cutting-edge use of the language and crunch some numbers in the cloud!

After a brief presentation on how Storm works and how jobs are written, we dove into a hackathon to create and submit jobs to the cluster that Twitter had provided for the event. After a bit of hacking, our team had a simple word-counting job, written in Clojure, that reads Bendyworks blog posts to learn which words are most common on our blog.
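
For a rough idea of what a word-counting job computes, here is a minimal sketch in plain Python rather than Clojure. It is not the job from the hackathon and it does not use Storm's API: the sample posts and every function name are made up for illustration, and everything runs in one process instead of on a cluster. In Storm terms, the first stage plays the role of a spout (a source of data) and the other two play the role of bolts (processing steps).

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for blog post text; not real Bendyworks content.
SAMPLE_POSTS = [
    "Storm processes streams of data",
    "Clojure makes Storm jobs short",
    "Streams of data never end",
]


def post_source(posts: Iterable[str]) -> Iterator[str]:
    """Spout-like stage: emit one post at a time."""
    for post in posts:
        yield post


def split_words(posts: Iterable[str]) -> Iterator[str]:
    """Bolt-like stage: emit each word of each post, lowercased."""
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            yield word


def count_words(words: Iterable[str]) -> Counter:
    """Bolt-like stage: keep a running count for every word seen."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
    return counts


def most_common_words(posts: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Wire the three stages together and return the n most frequent words."""
    return count_words(split_words(post_source(posts))).most_common(n)


if __name__ == "__main__":
    for word, count in most_common_words(SAMPLE_POSTS, n=4):
        print(word, count)
```

Because each stage only consumes the output of the one before it, the stages can in principle run on different machines, which is the kind of separation a cluster relies on to spread the work around.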

Companies everywhere are accumulating ever-larger stores of data. This new field, frequently termed “Big Data,” often requires new tools and machine learning approaches to deliver value from that data. We expect Storm to become part of our toolset for meeting that growing demand.

The Big Data Meetup group here in Madison runs monthly meetings as well as other events. Keep up to date with their announcements on Meetup.


Category: Development
Tags: Databases