Probabilistic Data Structures and Algorithms for Big Data Applications.
A technical book about popular space-efficient data structures that are extremely useful in modern Big Data applications.
Let's start GraphQL: structure, behavior, and architecture.
In this talk, I describe how to get started with GraphQL in a company that has experience with a Python stack and REST APIs. We go from the definition of GraphQL, through behavioral aspects and data management, to the most common architectural questions.
Exceeding Classical: Probabilistic Data Structures in Data-Intensive Applications, EuroSciPy 2019, Bilbao, Spain.
In this talk, I explain the five most important data processing problems that occur in different domains and can be solved efficiently with probabilistic data structures and algorithms. We cover membership querying, counting unique elements, frequency and rank estimation in data streams, and similarity. [Slides are here.]
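As an illustration of the membership-querying problem mentioned above, here is a minimal Bloom filter sketch. It is not taken from the talk; the class name, the default sizes, and the use of seeded SHA-256 hashes are assumptions chosen for readability rather than speed.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed (an assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))
```

An added element is always reported as possibly present. An element that was never added is usually, but not always, reported as absent.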
How to count Big Data: Probabilistic data structures and algorithms. KDnuggets, Aug 26, 2019.
In this talk, we learn how probabilistic data structures and algorithms can be used for cardinality estimation in Big Data streams.
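To give a feel for cardinality estimation, the sketch below uses Linear Counting, one of the simplest probabilistic counters. The text above does not say which algorithms are covered, so this choice, the function name, and the bitmap size are assumptions made for illustration.

```python
import hashlib
import math


def linear_counting_estimate(stream, num_bits=8192):
    """Estimate the number of distinct items in `stream` with Linear Counting."""
    bitmap = bytearray(num_bits)  # one byte per bit, for clarity
    for item in stream:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        bitmap[int.from_bytes(digest[:8], "big") % num_bits] = 1
    zero_bits = num_bits - sum(bitmap)
    if zero_bits == 0:
        raise ValueError("bitmap is saturated; use more bits")
    # Estimator: n ~ -m * ln(V), where V is the fraction of bits still zero.
    return -num_bits * math.log(zero_bits / num_bits)


# 50,000 events, but only 1,000 distinct values.
events = [i % 1000 for i in range(50000)]
print(round(linear_counting_estimate(events)))
```

Memory use is fixed by `num_bits` regardless of stream length. Accuracy degrades as the bitmap fills up, which is why sketches such as HyperLogLog are usually preferred for very large cardinalities.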
Too Much Data? - Just Sample, Just Hash, ... , Pittsburgh Code & Supply, May 31, 2019.
Probabilistic Data Structure (PDS) concepts have been incorporated into Spark SQL. They are also used by Amazon Redshift and Google BigQuery. Consequently, PDS is not just some interesting academic topic.
New book on Advanced Data Structures and Algorithms for Big Data Applications. Data Science Central, May 13, 2019.
A presentation of the PDSA book for the Data Science Central community.
Books for Big Data and AI. How to start and where to go for software engineers.
My book recommendations in the Big Data and AI domains for software developers who want to learn more.
An Introduction to Time Series Forecasting with Python, PyCon UA, April 28-29, 2018.
In this talk, we learn the basic theoretical concepts without going deep into the mathematics, study different models, and try them in practice using StatsModels, Prophet, scikit-learn, and Keras.
Load distribution with DNS Delegation
The talk is about balancing load without a single point of failure, with built-in support for the geographic location of users.
Implementing a Fileserver with Nginx and Lua.
With Nginx, it is easy to implement fairly complex file upload logic, including metadata and authorization support, without a heavy application server. In this article, you can find a basic implementation of such a fileserver using only Nginx and Lua.
An automatic terms extraction for Domain-specific corpora.
Using simple frequency-based methods, such as the Domain Specificity method and Domain-Specific TF-IDF, it is possible to automatically extract and score terms for a given domain-specific corpus. In this article, we use Python and its ecosystem to illustrate these methods in action.
Recurrent Neural Networks. Part 1: Theory
In this presentation, I cover the basic aspects of two popular RNN architectures: LSTM and GRU.
Data Mining 2014/2015 (Rus)
The course was offered in Fall 2014 to students of the School of Computer Science at V. Karazin Kharkov National University, Ukraine. It consists of 8 lectures and the final coursework task.
Probabilistic data structures. Quotient filter.
In this article, we continue our acquaintance with implementations of probabilistic sets and consider a modern successor of the Bloom filter called the Quotient filter. Such data structures work effectively when we need to handle billions of elements and need optimized memory access.
A Simple Way to Find Turning points for a Trajectory with Python.
Using the Ramer-Douglas-Peucker algorithm, I construct an approximated trajectory and find valuable turning points.
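The idea can be sketched roughly as follows. This is not the article's code: the function names, the use of point-to-segment distance, and the angle threshold for deciding what counts as a turning point are all assumptions made for illustration.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (a common RDP variant)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify a polyline; epsilon is a non-negative distance tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]


def turning_points(simplified, min_angle_deg):
    """Interior vertices where the heading changes by at least min_angle_deg."""
    result = []
    for prev, cur, nxt in zip(simplified, simplified[1:], simplified[2:]):
        heading_in = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        heading_out = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        change = abs(heading_out - heading_in)
        change = min(change, 2 * math.pi - change)
        if math.degrees(change) >= min_angle_deg:
            result.append(cur)
    return result


path = [(0, 0), (1, 0.05), (2, 0), (3, 0.02), (4, 0), (4, 1), (4.03, 2), (4, 3)]
simplified = rdp(path, epsilon=0.1)
print(simplified)                      # [(0, 0), (4, 0), (4, 3)]
print(turning_points(simplified, 45))  # [(4, 0)]
```

Simplifying first removes small jitter, so the angle test only sees the large-scale shape of the trajectory. Both `epsilon` and `min_angle_deg` would need tuning for real data.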
A Simple Way to Find Outliers in an array with Python.
Using a basic definition of an outlier, I show a simple Python function that detects such values and highlights them on a plot.
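The blurb does not say which definition the article uses. As one possible reading, the sketch below assumes Tukey's rule, where a value is an outlier if it lies more than 1.5 interquartile ranges outside the quartiles. The function name and the factor `k` are illustrative, and plotting is left out.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, assumed here)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(find_outliers(readings))  # [95]
```

The returned values could then be drawn in a different color on a scatter plot, for example with matplotlib, to highlight them.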