Implementing a Fileserver with Nginx and Lua

With Nginx, it is easy to implement the fairly complex logic of file upload with metadata and authorization support, without the need for a heavy application server. This article presents a basic implementation of such a Fileserver using only Nginx and Lua.

Problem and Solution

We have a RESTful API server that communicates with clients using application/json and provides advanced functionality for data management (mostly textual). Every client is authenticated by a dedicated third-party authentication server and authorized by the API server to access the requested resources. End users benefit from the API through a client-side frontend application that provides a rich user interface based on the data and functionality delivered by the API server at the backend.

At some point, end users asked for the ability to manage (store, analyze and retrieve) files using the same client-side application and, consequently, the same API.

Therefore, in addition to the usual data management provided by the API, clients request:

Such functionality is clearly outside the scope of the API, so it is natural to split it across two applications: the API manages file metadata, and the Fileserver takes care of the actual file upload and download.

Because we need to authenticate clients and communicate with the API for metadata, an obvious solution is to run a real application server as the Fileserver (e.g. Flask or Django, since we mainly use Python) and send upload/download requests from the client directly to it. However, adding another application to the project seems like overkill and could increase maintenance effort in the future.

This is why we looked for another solution, and it seems we found one: with the Nginx web server (which we use anyway), it is quite straightforward to implement exactly the same logic efficiently using Lua scripting.

Nginx and Lua


Nginx is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. Unlike traditional servers, Nginx doesn’t rely on threads to handle requests. Instead, it uses a much more scalable event-driven (asynchronous) architecture. This architecture uses small, but more importantly, predictable amounts of memory under load.

Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description. Lua is dynamically typed, runs by interpreting bytecode with a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping.

The Lua module for Nginx, ngx_http_lua_module, embeds Lua into Nginx and, by leveraging Nginx's subrequests, allows Lua threads to be integrated into the Nginx event model.

The various *_by_lua_* configuration directives serve as gateways to the Lua API within the nginx.conf file. In this article we use the *_by_lua_block directives, the most convenient way to embed Lua code into Nginx. Dozens of such blocks are supported, but the top three are:

The Nginx API is available in Lua as the ngx object: server variables are accessible as ngx.var.{name}, and GET query arguments as ngx.var.arg_{name}. Unfortunately, Nginx does not parse POST variables into variables automatically.
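ngx_http_lua_module does offer ngx.req.read_body() followed by ngx.req.get_post_args() for application/x-www-form-urlencoded bodies, but that does not help with multipart uploads. To show what parsing a POST body by hand involves, here is a minimal plain-Lua sketch for the URL-encoded case; url_decode and decode_form are illustrative names, not part of any library.

```lua
-- Percent-decode one component; in form bodies "+" stands for a space.
local function url_decode(s)
  s = s:gsub("%+", " ")
  return (s:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end))
end

-- Decode "a=1&b=2&b=3&flag" into { a = "1", b = { "2", "3" }, flag = true }:
-- repeated keys collect into a table, keys without "=" become true.
local function decode_form(body)
  local args = {}
  for pair in body:gmatch("[^&]+") do
    local key, eq, value = pair:match("^([^=]*)(=?)(.*)$")
    key = url_decode(key)
    if eq == "" then
      value = true
    else
      value = url_decode(value)
    end
    local prev = args[key]
    if prev == nil then
      args[key] = value
    elseif type(prev) == "table" then
      prev[#prev + 1] = value
    else
      args[key] = { prev, value }
    end
  end
  return args
end
```

The whole body has to be in memory for this to work, which is exactly what rules it out for large multipart file uploads.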

We use Nginx 1.10.2 on a Debian 8 server (from the Dotdeb repository) and some additional modules:

Penlight provides a set of pure Lua libraries. They cover areas such as input data handling (e.g. reading configuration files), functional programming (e.g. map, reduce, placeholder expressions) and OS path management. We use these modules to manipulate strings and files easily.

Architecture


First of all, let's see what the user flow of our project should look like. Consider the two main operations: uploading a file and downloading a file.

Upload

Download

Translated into Nginx configuration, the skeleton of our Fileserver could look like this:

where _api and _auth are simply proxies to the API and authentication servers, respectively.

User authentication


The first thing the Fileserver must do is ensure that the request was made by an authenticated client (we use the access_code query parameter to identify users). At this step it is enough that the user exists; authorization is done at the API level, as before.

Everything related to authentication can be kept separate from the main logic and placed in the access_by_lua_block mentioned above:

Note that access_by_lua_block runs early, in the access phase (after the standard Nginx access module) and before any of our other Lua blocks, so from here on we can assume that we are dealing with a real user.
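A minimal sketch of that access check, written as a plain function so it can run outside Nginx. The capture parameter stands in for ngx.location.capture, and the "/_auth" URI and the choice to treat any non-200 answer from the auth server as "not authenticated" are assumptions, not the author's confirmed setup.

```lua
local HTTP_OK, HTTP_UNAUTHORIZED, HTTP_BAD_GATEWAY = 200, 401, 502

-- Returns true if the auth server knows the user behind access_code,
-- otherwise false plus the status the Fileserver should answer with.
-- `capture` is expected to behave like ngx.location.capture(uri, options).
local function authenticate(access_code, capture)
  if access_code == nil or access_code == "" then
    return false, HTTP_UNAUTHORIZED
  end
  -- "/_auth" is the (hypothetical) internal proxy location to the auth server.
  local res = capture("/_auth", { args = { access_code = access_code } })
  if res.status == HTTP_OK then
    return true
  elseif res.status >= 500 then
    return false, HTTP_BAD_GATEWAY -- auth server failure, not a bad user
  end
  return false, HTTP_UNAUTHORIZED
end

-- Wiring in nginx.conf could look roughly like this (assuming authenticate
-- is loaded from a module):
--   access_by_lua_block {
--     local ok, status = authenticate(ngx.var.arg_access_code, ngx.location.capture)
--     if not ok then return ngx.exit(status) end
--   }
```

Passing the capture function in keeps the check testable with a stub and makes it independent of where the auth proxy lives.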

Upload


The upload flow consists of the following five main steps:

The actual code can live in content_by_lua_block, the most appropriate place for the main logic.

First, we need to parse the multipart request. This is harder than it may seem at first, because by default Nginx doesn't extract POST variables, and we decided not to use a custom Nginx build (which could do so, but would still have problems handling big files).

The most commonly suggested approach is to parse multipart forms in memory with Lua, but we can't use it, because our forms contain files too big to fit in memory. So we have to parse the multipart body as a stream, and the resty.upload module is a good choice for that. The idea is to read the multipart request data from Nginx piece by piece and store each part either in memory or directly in the filesystem, depending on its type.

In our case, we allow only one file object, under the variable file, but the code works for multiple files in a multipart request. All files are stored in the filesystem as temporary files, and all other inputs are parsed directly into a Lua table and therefore kept in memory.

We keep the helper that parses such multipart input in the utils.lua module:
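The following is a sketch of such a streaming helper, not the original utils.lua. It assumes a form object that follows resty.upload's form:read() protocol (types "header", "body", "part_end", "eof"; header results are {key, value, line} tables), which in Nginx would come from require("resty.upload"):new(chunk_size) after checking that it returned a form. Using os.tmpname for temporary paths is also an assumption.

```lua
local utils = {}

-- Extract `name` and `filename` from a Content-Disposition value.
-- The [;%s] anchor stops `filename="..."` from also matching as `name="..."`.
local function disposition_params(value)
  local name = value:match('[;%s]name="([^"]*)"')
  local filename = value:match('[;%s]filename="([^"]*)"')
  return name, filename
end

-- Stream a multipart body: file parts are written to temporary files, other
-- fields are collected in memory. Returns a table keyed by field name:
--   { value = "..." }                                      for plain fields
--   { filename = ..., fullpath = ..., content_type = ... } for files
function utils.parse_multipart(form, tmp_path)
  tmp_path = tmp_path or os.tmpname
  local fields = {}
  local part, fh, buf
  while true do
    local typ, res, err = form:read()
    if not typ then
      if fh then fh:close() end
      return nil, err
    end
    if typ == "header" then
      local key, value = res[1]:lower(), res[2]
      if key == "content-disposition" then
        local name, filename = disposition_params(value)
        part = { name = name, filename = filename }
        if filename then
          part.fullpath = tmp_path()
          local ferr
          fh, ferr = io.open(part.fullpath, "wb")
          if not fh then return nil, ferr end
        else
          buf = {}
        end
      elseif key == "content-type" and part then
        part.content_type = value
      end
    elseif typ == "body" then
      if fh then
        fh:write(res)          -- file data goes straight to disk
      elseif buf then
        buf[#buf + 1] = res    -- small form values stay in memory
      end
    elseif typ == "part_end" then
      if fh then fh:close(); fh = nil end
      if part and part.name then
        if not part.fullpath then part.value = table.concat(buf or {}) end
        fields[part.name] = part
      end
      part, buf = nil, nil
    elseif typ == "eof" then
      break
    end
  end
  return fields
end

-- In utils.lua the file would end with `return utils`.
```

Only one chunk of a file part is held in memory at a time, which is what keeps large uploads from exhausting the Lua VM.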

Calling it from the _upload location is straightforward:

An important part of making it work is setting client_body_buffer_size equal to client_max_body_size. Otherwise, if the client body is bigger than client_body_buffer_size, the Nginx variable $request_body will be empty.

As a result, we get a Lua table form with all the variables from the form input; for file objects, instead of a value key, it contains a fullpath key with the path of the file in the temporary filesystem.

Once the form is parsed, we store the file's metadata by calling our existing API, which also returns the expected file path on the permanent filesystem (usually some kind of network file system).

To call the API, we issue a subrequest with ngx.location.capture, passing parameters based on the user's form input:

Note that subrequests issued by ngx.location.capture inherit all the request headers of the current request by default and that this may have unexpected side effects on the subrequest responses.

If something goes wrong with metadata creation (e.g. invalid data or unauthorized access), we simply forward the API's response directly to the client and stop further processing.
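A hedged sketch of the registration step and its error forwarding. api_create is a hypothetical wrapper around ngx.location.capture plus JSON encoding; the "/_api/files" URI and the shape of the API's answer are assumptions. Because of the header inheritance noted above, the subrequest would also carry the client's multipart Content-Type, so the _api proxy may need to override it (for instance with proxy_set_header); that too is an assumption about the setup.

```lua
-- Collect the metadata to send to the API: plain fields as-is, file fields
-- reduced to their original filename (the temporary path stays server-side).
local function build_metadata(form)
  local meta = {}
  for name, field in pairs(form) do
    if field.fullpath then
      meta[name] = field.filename
    else
      meta[name] = field.value
    end
  end
  return meta
end

-- `api_create(meta)` must return (status, body). On a non-2xx status the
-- API's status and body are handed back unchanged so they can be forwarded.
local function register_upload(form, api_create)
  local status, body = api_create(build_metadata(form))
  if status < 200 or status >= 300 then
    return nil, status, body
  end
  return body
end

-- Inside content_by_lua_block, roughly (hypothetical URI and JSON handling):
--   local cjson = require "cjson"
--   local function api_create(meta)
--     local res = ngx.location.capture("/_api/files", {
--       method = ngx.HTTP_POST, body = cjson.encode(meta) })
--     if res.status >= 200 and res.status < 300 then
--       return res.status, cjson.decode(res.body)
--     end
--     return res.status, res.body
--   end
--   local created, status, body = register_upload(form, api_create)
--   if not created then
--     ngx.status = status
--     ngx.print(body)
--     return ngx.exit(ngx.HTTP_OK)
--   end
```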

Once the file is officially "registered", we move it to the permanent filesystem and tell the API that it is ready for download:
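One detail worth handling when "moving" the file: os.rename fails when source and destination are on different filesystems (a local temporary directory versus a network mount), so a copy-and-delete fallback is needed. The sketch below assumes created.path is the permanent path returned by the API and that api_ready is a hypothetical wrapper around the subrequest that marks the file as downloadable.

```lua
local CHUNK = 64 * 1024

-- Move src to dst; fall back to a chunked copy plus delete when rename fails
-- (e.g. across filesystems). Returns true, or nil and an error message.
local function move_file(src, dst)
  if os.rename(src, dst) then return true end

  local input, err = io.open(src, "rb")
  if not input then return nil, err end
  local output
  output, err = io.open(dst, "wb")
  if not output then input:close(); return nil, err end

  while true do
    local chunk = input:read(CHUNK)
    if not chunk then break end
    local wok, werr = output:write(chunk)
    if not wok then
      input:close(); output:close(); os.remove(dst)
      return nil, werr
    end
  end
  input:close()
  local cok, cerr = output:close()
  if not cok then os.remove(dst); return nil, cerr end
  os.remove(src)
  return true
end

-- Move the uploaded file into place, then tell the API it is ready.
-- `api_ready(created)` is assumed to return the API's HTTP status.
local function finalize_upload(file_field, created, api_ready)
  local ok, err = move_file(file_field.fullpath, created.path)
  if not ok then
    return nil, 500, err
  end
  local status = api_ready(created)
  if status < 200 or status >= 300 then
    return nil, status
  end
  return true
end
```

Copying in fixed-size chunks keeps memory use flat regardless of file size, for the same reason the upload is parsed as a stream.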

Download


The download flow is a bit simpler, but still contains some important steps:

Authentication is exactly the same as for the upload procedure. After authentication, we take $path extracted from the request (accessible in Lua as ngx.var.path) and ask the API for its metadata. At this step, we also ensure that the file exists, is ready for download and that the user has access to it.
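The metadata check could be sketched as follows. api_get is a hypothetical wrapper around an ngx.location.capture call to the _api proxy that returns (status, decoded_body), and the ready field is an assumed name for the API's "ready for download" flag. Forwarding the API's own status mirrors what the upload flow does with errors; answering 404 for a registered but unfinished upload is a design choice of this sketch.

```lua
-- Ask the API about `path` and decide whether the download may proceed.
-- Returns the metadata table, or nil plus the status to answer with.
local function lookup_download(path, api_get)
  local status, meta = api_get(path)
  if status ~= 200 then
    return nil, status  -- not found, not allowed, etc.: pass the API's answer on
  end
  if type(meta) ~= "table" or not meta.ready then
    return nil, 404     -- registered, but the upload has not been finalized
  end
  return meta
end

-- In content_by_lua_block, roughly:
--   local meta, status = lookup_download(ngx.var.path, api_get)
--   if not meta then return ngx.exit(status) end
--   -- then hand the actual file transfer over to Nginx, e.g. via
--   -- return ngx.exec("@download_file")
```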

Once the file has been found, its metadata gives us all the necessary information about its location, and we are ready to send it to the user. A common recommendation is to read the file in Lua and return it to the user with ngx.print, but that is completely unsuitable for us: the files can be huge and would simply crash Lua's virtual machine with an error like Lua VM crashed, reason: not enough memory.

This is why it is better to rely on Nginx itself, a very performant web server, and let it serve our file. To do that, we created a special internal named location (to prevent unauthorized access) that simply serves the requested static files from the desired directory.

The @download_file location is quite simple, but we also adjust the response headers to provide the real filename for download (on our filesystem, all files are stored under unique generated names):
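Restoring the real filename comes down to building a Content-Disposition value. This sketch follows RFC 6266 by sending an ASCII-only filename= fallback plus an RFC 5987 encoded filename*= with the exact UTF-8 name. Where the string gets assigned (for example ngx.header["Content-Disposition"] inside the download location) is left open, and the function names are hypothetical.

```lua
-- Percent-encode every byte outside RFC 5987's attr-char set.
local function rfc5987_encode(s)
  return (s:gsub("[^%w!#%$&%+%-%.%^_`|~]", function(c)
    return string.format("%%%02X", c:byte())
  end))
end

-- Build a Content-Disposition value for the original filename.
-- filename= gets an ASCII fallback (non-ASCII bytes, quotes and backslashes
-- replaced with "_"); filename*= carries the exact UTF-8 name.
local function content_disposition(filename)
  local fallback = filename:gsub("[^\32-\126]", "_"):gsub('["\\]', "_")
  return string.format("attachment; filename=\"%s\"; filename*=UTF-8''%s",
    fallback, rfc5987_encode(filename))
end
```

Clients that understand filename*= show the exact name; older ones fall back to the sanitized ASCII version instead of a garbled one.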

That's it! We now have a fully functional, very fast and yet quite small Fileserver, without needing to install a heavy application server that would require a lot of support.

How to use


Upload a file

Download a file