Articles

Books for Big Data and AI. How to start and where to go for software engineers.

My book recommendations on Big Data and AI for software developers who want to learn more about these domains.

Implementing a Fileserver with Nginx and Lua.

With Nginx, it is easy to implement fairly complex file-upload logic, including metadata and authorization support, without any heavy application server. This article presents a basic implementation of such a file server using only Nginx and Lua.

An automatic terms extraction for Domain-specific corpora.

Using simple frequency-based methods, such as the Domain Specificity method and Domain-Specific TF-IDF, it is possible to automatically extract and score terms for a given domain-specific corpus. In this article, we use Python and its ecosystem to illustrate these methods in action.

Probabilistic data structures. Quotient filter.

In this article, we continue our acquaintance with implementations of probabilistic sets and consider a modern successor of the Bloom filter, called the Quotient filter. Such data structures work effectively when we need to handle billions of elements and want optimized memory access.

A Simple Way to Find Turning points for a Trajectory with Python.

Using the Ramer-Douglas-Peucker algorithm, I construct an approximated trajectory and find valuable turning points.
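
As a rough illustration of the idea, here is a minimal pure-Python sketch of Ramer-Douglas-Peucker simplification. The function names `rdp` and `_point_line_distance` and the `epsilon` tolerance are assumptions made for this example and are not taken from the article; the interior points that survive simplification can be read as candidate turning points.

```python
import math


def _point_line_distance(point, start, end):
    """Perpendicular distance from `point` to the line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to the plain point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than `epsilon` from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_line_distance(points[i], start, end)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist > epsilon:
        # Split at the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [start, end]


trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
simplified = rdp(trajectory, epsilon=0.5)
```

In this toy trajectory the only retained interior point is the corner at `(2, 0)`. A larger `epsilon` keeps fewer points, so the tolerance controls how sharp a turn must be to survive.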

Probabilistic data structures. Bloom filter.

In this article, we consider a popular implementation of a probabilistic set, the Bloom filter, which can efficiently determine whether an element belongs to a large set without storing every element or performing many comparisons.
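
To make the idea concrete, here is a minimal Bloom filter sketch in Python. The class name, the default sizes, and the choice of deriving positions from salted SHA-256 digests are all assumptions made for illustration and are not taken from the article.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: a bit array plus k hash-derived positions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive k positions by salting a single hash function (illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
bloom.add("apple")
```

The elements themselves are never stored, only the bits they set. The cost is that `might_contain` can return a false positive, although it never returns a false negative for an element that was added.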

A Simple Way to Find Outliers in an array with Python.

Using a basic definition of an outlier, I show a simple Python function to detect such values and highlight them on a plot.
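
The teaser does not say which definition the article uses. The sketch below assumes the common interquartile-range rule, where a value is an outlier if it lies more than 1.5 IQR outside the quartiles. The function name `find_outliers` and the factor `k` are hypothetical, and plotting is left out.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # requires Python 3.8+
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 100]
outliers = find_outliers(data)
```

For this sample only `100` falls outside the bounds. A smaller `k` flags more values, so the threshold should be chosen with the data in mind.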

Twitter analysis for Strata+Hadoop World (BCN, 2014) with Apache Spark and D3.

Using the official hashtag #StrataHadoop, I've made a basic analysis of Twitter activity during the Strata+Hadoop World conference that was held on 19-21 November 2014 in Barcelona, Spain.

Realtime Twitter Sentiment Analysis with Storm and Elasticsearch.

In this article, I build an Apache Storm topology that processes the Twitter stream and provides basic sentiment statistics based on Stanford CoreNLP.
