OpenZoo

A framework for distributed stream and batch processing

Overview

OpenZoo is an open-source, MIT-licensed, distributed stream/batch processing framework. It enables the development of processing topologies with minimal configuration through easy-to-use user interfaces. Its multi-language and cross-platform support allows complex topologies to be created and deployed either on cloud infrastructure or across the PCs available in a lab environment. The current distribution consists of a Java service template, a management GUI and several test services that demonstrate the use of the framework. Templates for C++ and Python will be available soon.

Features

OpenZoo offers the following core functionalities:

  • Remote deployment of services
  • Basic classes for service registration, intercommunication and monitoring
  • Load balancing through the use of queues
  • Data storage
  • Data caching
  • Easy service topology creation and management
  • Transparent allocation of available resources
  • Easy exchange of components
  • One-touch creation of service wrappers
  • Abstraction layer over communication, persistence, caching, etc.
  • Schema-free JSON as message exchange format
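Queue-based load balancing is commonly achieved with the competing-consumers pattern: producers push messages onto one shared queue and several identical service instances pull from it, so each message goes to whichever instance is free. Assuming OpenZoo's queues behave this way, the idea can be sketched in plain Java with an in-process `java.util.concurrent.BlockingQueue` standing in for a RabbitMQ queue. This is not OpenZoo's API: the class name, the `run` method and the `tweetId` message field are hypothetical, and messages are plain JSON-like strings with no parser involved.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an in-memory queue stands in for a RabbitMQ queue.
public class QueueLoadBalancingSketch {

    // Sentinel message telling a worker to stop ("poison pill").
    private static final String POISON_PILL = "__STOP__";

    /** Enqueues `jobs` messages, lets `workers` consumers compete for them,
     *  and returns how many messages each worker processed. */
    public static Map<String, Integer> run(int jobs, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Map<String, Integer> processed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Start identical workers that all consume from the same queue.
        for (int w = 0; w < workers; w++) {
            final String name = "worker-" + w;
            processed.put(name, 0);
            pool.submit(() -> {
                try {
                    while (true) {
                        String msg = queue.take(); // blocks until a message arrives
                        if (POISON_PILL.equals(msg)) {
                            return;
                        }
                        processed.merge(name, 1, Integer::sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Producer: enqueue JSON-like messages, then one stop signal per worker.
            for (int i = 0; i < jobs; i++) {
                queue.put("{\"tweetId\": " + i + "}");
            }
            for (int w = 0; w < workers; w++) {
                queue.put(POISON_PILL);
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(1000, 4);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("Per-worker counts: " + counts);
        System.out.println("Total processed: " + total);
    }
}
```

Because every worker blocks on `take()` until a message is available, a faster or less loaded worker simply ends up processing more messages; the per-worker counts vary between runs, while the total always equals the number of enqueued messages. In a distributed deployment, a broker such as RabbitMQ plays the role of the shared queue across machines.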

Applications

A wide range of applications can be developed on top of OpenZoo:

  • Real-time search and analytics applications
  • Streaming and batch processing frameworks
  • Distributed and scalable architectures

Success cases

OpenZoo has been tested as a real-time search and analytics framework, based on images shared through Twitter, during the CUbRIK project. It ran for over a year, distributed across several (8-15) servers, processing millions of tweets per week. It is currently used as a video processing framework in the Big Data oriented LASIE project, extracting visual features from gigabytes of video. In the near future, it will be used for a combination of the above-mentioned use cases, retrieving social media content, including video, within the TRILLION project.

Support

OpenZoo supports both Windows (tested on XP and Windows 7) and Linux (tested on Ubuntu 14.04 LTS). It uses RabbitMQ as the communication medium and MongoDB for persistence and parameter exchange. Services run on Tomcat (tested on Tomcat 7). Currently, only Java is supported for service development; C++ and Python service templates are the next major milestone.