Day 2 – ApacheCon Big Data, Seville

November 15, 2016

The second day of ApacheCon Big Data was also a success. It was a long day that started with a keynote by Mayank Bansal from Uber, who explained Uber’s big data stack and how they scaled it up.

The next keynote was by Sean Owen from Cloudera, who explained how Apache is more than just another GitHub where people dump their code: it’s a place for building a community. It was also nice to hear his shoutout to Apache Allura, which he used to illustrate the diversity and reach of the projects. He pointed out that we usually think of the ASF as the home of HTTPd and the Big Data projects, but it is more than that, and there are other projects just as big, such as Apache Allura.

Then I attended the session on Distributed and Native Machine Learning using Apache Mahout by Suneel Marthi from Red Hat. The talk was math-intensive and showed how data scientists can forget about the implementation of the underlying stack and just write the code for their data projects in their favorite language. He demonstrated how easy it is to do distributed linear algebra with Apache Mahout-Samsara, using the Eigenfaces classification problem as an example.

Another interesting talk was given by Clemens Valiente from the Trivago development team, who explained his company’s big data stack and how they moved from a simple Java platform to a big data stack, which reduced their query time from 5 seconds to less than a second.

Then I spent some time with Melissa and Gaurav at the Apache Software Foundation booth in the showcase foyer.

Julien Nioche gave a talk on Low Latency Web Crawling using Apache Storm.

Julien Herzen presented Meerkat, a system built at Swisscom to do real-time anomaly detection on time series. Meerkat uses a combination of machine learning and big data technologies to trigger alerts when problems occur in the Swisscom network.