APACHE SOFTWARE FOUNDATION
This library is designed for production systems that must process massive data sets. It includes adaptors for Apache Hive, Apache Pig, and PostgreSQL (C++), which also serve as examples for writing adaptors for other systems. The sketches in this library are designed to have compatible binary representations across languages (Java, C++, Python) and platforms...
For any system that needs to extract useful information from big data, these sketches are a required toolkit that should be tightly integrated into its analysis capabilities. This technology has helped Yahoo reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms.
Associated domains: datasketches.github.io