Implementing a Fileserver with Nginx and Lua

With Nginx, it is easy to implement fairly complex file upload logic, including metadata and authorization support, without any heavy application server. This article presents a basic implementation of such a Fileserver using only Nginx and Lua.

Problem and Solution

We have a RESTful API server that communicates with clients using application/json and provides advanced functionality for data management (mostly textual). Every client is authenticated by a dedicated third-party authentication server and authorized by the API server to access the requested resources. End users benefit from the API via a client-side frontend application that provides a rich user interface based on the data and functionality delivered by the API server at the backend.

At some point, end users requested the ability to manage (store, analyze, and retrieve) files using the same client-side application and, consequently, the same API.

Therefore, in addition to the usual data management from the API, clients request:

Obviously, such functionality is out of scope for the API, so it is natural to split it across two applications: the API manages file metadata, and the Fileserver takes care of the actual file upload and download.

Because we need to authenticate clients and communicate with the API for metadata, one immediate solution is to run a real application server as the Fileserver (e.g. Flask or Django, since we mainly use Python) and direct upload/download requests from the client to it. However, adding another application to the project seems like overkill and could increase maintenance effort in the future.

This is why it was reasonable to look for another solution, and it seems we found one! With the Nginx web server (which we use anyway), it is quite straightforward to implement exactly the same logic very efficiently using Lua scripting.

Nginx and Lua


Nginx is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. Unlike traditional servers, Nginx doesn’t rely on threads to handle requests. Instead, it uses a much more scalable event-driven (asynchronous) architecture. This architecture uses small, but more importantly, predictable amounts of memory under load.

Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description. Lua is dynamically typed, runs by interpreting bytecode with a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping.

The Lua module for Nginx, ngx_http_lua_module, embeds Lua into Nginx and, by leveraging Nginx's subrequests, allows the integration of Lua threads into the Nginx event model.

The various *_by_lua_* configuration directives serve as gateways to the Lua API within the nginx.conf file. In this article we use the *_by_lua_block directives, the most common way to embed Lua code into Nginx. Dozens of such blocks are supported, but the top 3 are:

The Nginx object is available in Lua as ngx. Server variables are accessible as ngx.var.{name} and GET arguments as ngx.var.arg_{name}. Unfortunately, Nginx does not parse POST variables into variables automatically.
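As a minimal sketch of the variable access described above, the helper below reads a GET argument through ngx.var. The name get_query_arg is hypothetical, and the ngx table is passed in as a parameter only so that the snippet can also be tried outside Nginx with a stand-in table.

```lua
-- Hypothetical helper: read a GET argument via the ngx.var.arg_{name} convention.
-- `ngx` is a parameter so the function can be exercised with a plain Lua table.
local function get_query_arg(ngx, name)
    local value = ngx.var["arg_" .. name]
    -- Treat a missing argument and an empty one the same way
    if value == nil or value == "" then
        return nil
    end
    return value
end
```

Inside a *_by_lua_block, this would be called as get_query_arg(ngx, "access_token").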

We use Nginx 1.10.2 on a Debian 8 server (from the Dotdeb repository) with some additional modules:

$ apt-get update
$ apt-get install nginx
$ apt-get install libnginx-mod-http-ndk libnginx-mod-http-lua
$ apt-get install lua-cjson lua-penlight lua-filesystem

A lot of pure Lua libraries are provided by Penlight. They cover such areas as input data handling (such as reading configuration files), functional programming (such as map, reduce, placeholder expressions, etc), and OS path management. We will use these modules to easily manipulate strings and files.

Architecture


First of all, let's see what the user flow of our project should look like. Consider the two main operations: uploading a file and downloading a file.

Upload

Download

If we translate that into Nginx configuration, the skeleton of our Fileserver could look like this:

lua_package_path '/opt/lua/?.lua;;';

server {
    listen 80;
    server_name files.example.com;

    access_log /var/log/nginx/access.files.log;
    error_log /var/log/nginx/error.files.log;

    # proxy to the API server
    location ^~ /_api {
        rewrite ^/_api/(.*) /$1 break;
        proxy_pass https://api.example.com;
    }

    # proxy to the authentication server
    location ^~ /_auth {
        rewrite ^/_auth/(.*) /$1 break;
        proxy_pass https://auth.example.com;
    }

    location ~ ^/_download/(.*)$ {
        set $path /$1;
        default_type 'application/json';
        limit_except GET { deny all; }
        ...
    }

    location = /_upload {
        limit_except POST { deny all; }
        if ($http_content_type !~ "multipart/form-data") {
            return 406;
        }
        lua_need_request_body off;
        client_body_buffer_size 200M;
        client_max_body_size 200M;
        default_type 'application/json';
        ...
    }

    # everything else is forbidden
    location / {
        access_log off;
        log_not_found off;
        deny all;
    }
}

where _api and _auth are just proxies to the API and authentication servers, respectively.
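To illustrate what the rewrite rules in the two proxy locations do, here is a small Lua sketch that mimics the prefix stripping with string patterns. It only imitates the effect of `rewrite ^/_api/(.*) /$1 break;` for explanation purposes; Nginx performs the real rewrite, and the function name strip_prefix is hypothetical.

```lua
-- Hypothetical illustration: drop a leading "/<prefix>/" segment from a URI,
-- the way the rewrite rules strip "/_api/" and "/_auth/" before proxying.
local function strip_prefix(uri, prefix)
    -- Escape Lua pattern magic characters in the prefix
    local escaped = prefix:gsub("%p", "%%%0")
    local rest = uri:match("^/" .. escaped .. "/(.*)$")
    if rest == nil then
        return uri -- no match: leave the URI unchanged
    end
    return "/" .. rest
end
```

With this sketch, a subrequest to /_auth/access/check corresponds to /access/check on the authentication server.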

User authentication


The first action the Fileserver should perform is to ensure that the request was made by an authenticated client (we use the access_token query parameter to identify users). At this step it is enough that the user exists; authorization is done at the API level, as before.

Everything related to authentication can be kept away from the main logic and placed in the access_by_lua_block mentioned above:

access_by_lua_block {
    local cjson = require("cjson")

    -- check if the user with this access_token can be authenticated
    local function authenticate_user(access_token)
        local params = {
            method = ngx.HTTP_GET,
            args = {
                access_token = access_token
            }
        }
        local res = ngx.location.capture("/_auth/access/check", params)
        if not res or res.status ~= 200 then
            return nil
        end
        return cjson.decode(res.body)
    end

    -- if the request has no access_token, return 403 Forbidden
    local access_token = ngx.var.arg_access_token
    if not access_token then
        return_http_forbidden("Forbidden", "Forbidden")
    end

    -- authenticate user
    local credentials = authenticate_user(access_token)

    -- if the user can't be resolved, return 403 Forbidden
    if not credentials or not credentials.data
            or not credentials.data.user or not credentials.data.user.id then
        return_http_forbidden("Forbidden", "Forbidden")
    end
}
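The snippets in this article call helpers such as return_http_forbidden that are not shown. The following is only a guess at what such a helper might look like: the JSON error shape, the names json_string and make_error_helper, and passing ngx in as a parameter are all assumptions made for illustration, not the original implementation. In a real deployment cjson.encode would be the natural way to build the body; a hand-rolled encoder is used here only to keep the sketch dependency-free.

```lua
-- Escape a string for embedding in a JSON document (minimal, ASCII-oriented)
local function json_string(s)
    local escaped = tostring(s):gsub('[%c"\\]', function(c)
        if c == '"' then return '\\"' end
        if c == "\\" then return "\\\\" end
        return string.format("\\u%04x", c:byte())
    end)
    return '"' .. escaped .. '"'
end

-- Hypothetical factory: build a helper that ends the request with `status`
local function make_error_helper(ngx, status)
    return function(title, detail)
        ngx.status = status
        ngx.header["Content-Type"] = "application/json"
        ngx.print('{"error":{"title":' .. json_string(title) ..
                  ',"detail":' .. json_string(detail) .. '}}')
        return ngx.exit(status)
    end
end

-- Inside Nginx one might then define:
-- local return_http_forbidden = make_error_helper(ngx, ngx.HTTP_FORBIDDEN)
```

Building all the return_http_* helpers from one factory keeps the error format consistent across locations.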

Note that the access_by_lua_block is executed very early (right after the Nginx access module), before any of our other Lua blocks, so from here on we can assume that we are dealing with a real user.

Upload


The upload flow consists of the following 5 main steps:

The actual code can live in content_by_lua_block, the most appropriate place for the main logic.

First of all, we need to parse the multipart request. This is not as easy as it first seems, since by default Nginx does not extract POST variables and we decided not to use custom Nginx builds (which could do so, but there would still be a problem with handling big files).

The most commonly suggested approach is to parse multipart forms in memory with Lua, but we cannot use it because our forms contain files that are too big to fit in memory. So we have to use streaming multipart parsing. A good choice is the resty.upload module. The idea is to read the multipart request data from Nginx chunk by chunk and store its parts in memory or directly in the filesystem, depending on their type.

In our case, we allow only one file object, under the variable file, but the code works for multiple files in a multipart request. All files are stored in the filesystem as temporary files; all other inputs are parsed directly into a Lua table and are therefore kept in memory.
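The core of the streaming approach can be sketched in plain Lua: each chunk is appended to a temporary file as soon as it arrives, so memory use stays bounded by the chunk size. The function name spool_to_tmpfile and the chunk iterator are hypothetical stand-ins for the reader that resty.upload provides, and os.tmpname is used instead of Penlight only to keep the sketch dependency-free.

```lua
-- Hypothetical sketch: write chunks to a temporary file one at a time.
-- `next_chunk` is an iterator function returning a string, or nil when done.
local function spool_to_tmpfile(next_chunk)
    local path = os.tmpname()
    local file, err = io.open(path, "wb")
    if not file then
        return nil, err
    end
    local total = 0
    for chunk in next_chunk do
        file:write(chunk)       -- only the current chunk is held in memory
        total = total + #chunk
    end
    file:close()
    return path, total
end
```

The caller receives the temporary path and the number of bytes written, and is responsible for moving or removing the file afterwards.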

We keep our helper for parsing such multipart input in the utils.lua module:

local pathx = require("pl.path")
local stringx = require("pl.stringx")
local upload = require("resty.upload")

local utils = {}

--
-- Extract variables from the Content-Disposition header
--
local function decode_content_disposition(value)
    local result
    local disposition_type, params = string.match(value, "([%w%-%._]+);(.+)")
    if disposition_type then
        result = {}
        result.disposition_type = disposition_type
        result.params = {}
        if params then
            for _, param in ipairs(stringx.split(params, "; ")) do
                -- locals, so we neither leak globals nor overwrite `value`
                local key, val = param:match('([%w%.%-_]+)="(.+)"$')
                if key then
                    result.params[key] = val
                end
            end
        end
    end
    return result
end

--
-- Extract POST variables in a streaming way
-- and store each file object (e.g. the one named "file") in a temporary
-- file (its path is stored in form_data.file.fullpath).
-- All other form variables are stored completely in form_data
-- (we expect them to be small)
--
function utils.streaming_multipart_form()
    local chunk_size = 4096
    local form, err = upload:new(chunk_size)
    if form == nil then
        return nil, "Can't create upload form: " .. tostring(err)
    end
    form:set_timeout(1000) -- 1 sec

    local file = nil
    local part_name = nil
    local form_data = {}

    while true do
        local typ, res, read_err = form:read()
        if not typ then
            return nil, "Can't read uploaded form: " .. tostring(read_err)
        end

        if typ == "header" then
            local header_name = res[1]
            if header_name == "Content-Disposition" then
                local header_full = res[3]
                local parsed_header = decode_content_disposition(header_full)
                if not parsed_header or not parsed_header.params.name then
                    return nil, "Can't parse Content-Disposition header"
                end
                part_name = parsed_header.params.name
                form_data[part_name] = parsed_header.params
                if parsed_header.params.filename then
                    -- a file part: stream it into a temporary file
                    form_data[part_name]["fullpath"] = pathx.tmpname()
                    local open_err
                    file, open_err = io.open(form_data[part_name]["fullpath"], "w+")
                    if not file then
                        return nil, "Can't open temporary file: " .. tostring(open_err)
                    end
                end
            elseif header_name == "Content-Type" then
                local header_value = res[2]
                if part_name then
                    form_data[part_name]["content_type"] = header_value
                end
            end
        elseif typ == "body" then
            if file then
                file:write(res)
            elseif part_name then
                -- a value may arrive in several chunks, so append
                form_data[part_name]["value"] = (form_data[part_name]["value"] or "") .. res
            end
        elseif typ == "part_end" then
            if file then
                file:close()
            end
            file = nil
        elseif typ == "eof" then
            break
        end
    end

    return form_data
end

return utils

Calling it from the _upload location is straightforward:

local helpers = require("utils")
local form, err = helpers.streaming_multipart_form()
if form == nil then
    return_http_internal_server_error("Server Error", err)
end

An important part of making this work is to set client_body_buffer_size equal to client_max_body_size. Otherwise, if the client body is bigger than client_body_buffer_size, the Nginx variable $request_body will be empty.

As a result, we have a Lua table form with all the variables from the form input; for file objects, instead of a value key it contains a fullpath key with the path of the file in the temporary filesystem.
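For illustration, after uploading the form from the usage example at the end of this article, the parsed table might look roughly like the one below. The temporary path and the content type are made-up values, and is_file_part is a hypothetical convenience function; the keys follow what utils.streaming_multipart_form stores.

```lua
-- Illustrative shape of the parsed form (values are examples only)
local form = {
    title = { name = "title", value = "Example title" },
    file = {
        name = "file",
        filename = "file.pdf",
        content_type = "application/pdf", -- illustrative
        fullpath = "/tmp/lua_a1b2c3",     -- illustrative temporary path
    },
}

-- Hypothetical helper: a part is a file when it carries a fullpath
-- instead of a value
local function is_file_part(part)
    return part ~= nil and part.fullpath ~= nil
end
```

Such a check makes it easy to reject requests where the expected file part is missing or was sent as a plain text field.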

Once the form is parsed, we store the file's metadata by calling our existing API, which also returns the expected file path on the permanent filesystem (usually some kind of network file system).

To call the API, we can issue a subrequest using ngx.location.capture, with parameters based on the form submitted by the user:

local cjson = require("cjson")

-- locals do not survive between *_by_lua_block sections,
-- so read the token from the request again
local access_token = ngx.var.arg_access_token

local data = {
    status = "not_ready"
}

-- 'title' is the name of the part with text input
if form.title and form.title.value then
    data["title"] = form.title.value
end

-- 'file' is the name of the part with binary input
if form.file and form.file.filename then
    data["filename"] = form.file.filename
end

local params = {
    method = ngx.HTTP_POST,
    args = {
        access_token = access_token
    },
    body = cjson.encode(data)
}
local res = ngx.location.capture("/_api/v1/file", params)
if not res then
    return_http_bad_gateway("Bad Gateway", "Can't create metadata")
end

-- forward any non-201 response of the API to the client as-is
local create_metadata_response = res.body
if res.status ~= 201 then
    ngx.status = res.status
    ngx.print(create_metadata_response)
    ngx.exit(res.status)
end

local file_metadata = cjson.decode(create_metadata_response)

Note that subrequests issued by ngx.location.capture inherit all the request headers of the current request by default, and that this may have unexpected side effects on the subrequest responses.

If something goes wrong during metadata creation (e.g. invalid data or unauthorized access), we forward the API response directly to the client and stop further processing.

As soon as the file is officially "registered", we move it to the permanent filesystem and tell the API that it is ready for download:

local filex = require("pl.file")
local dirx = require("pl.dir")
local pathx = require("pl.path")
local cjson = require("cjson")

local file_fullpath = "/storage/files/" .. file_metadata.hash .. file_metadata.path

-- ensure that subdirectories exist
local file_fulldir = pathx.dirname(file_fullpath)
dirx.makepath(file_fulldir)

-- move the tmp file from the form variable 'file' to its permanent position
local is_moved, err = filex.move(form.file.fullpath, file_fullpath)

-- if the file has been successfully moved, we can update its metadata
-- to make it available for download
if is_moved then
    local params = {
        method = ngx.HTTP_PUT,
        args = {
            access_token = access_token
        },
        body = cjson.encode({
            status = "ready"
        })
    }
    ngx.location.capture("/_api/v1/file/" .. file_metadata.id, params)
end

-- provide some headers with metadata
ngx.header["X-File-Name"] = file_metadata.filename
ngx.header["X-File-ID"] = file_metadata.id
ngx.status = ngx.HTTP_CREATED
ngx.print(create_metadata_response)
ngx.exit(ngx.HTTP_CREATED)

Download


The download flow is a bit simpler, but still contains some important steps:

The authentication steps are exactly the same as for the upload procedure. After authentication, we take the $path extracted from the request (accessible in Lua as ngx.var.path) and ask the API for its metadata. At this step, we also ensure that the file exists, is ready for download, and that the user has access to it.

local cjson = require("cjson")
local access_token = ngx.var.arg_access_token

-- find a file with the requested path that is ready for download
local search_by_path = {
    filters = {
        must = {
            {
                exact = {
                    field = "path",
                    values = {ngx.var.path}
                }
            },
            {
                exact = {
                    field = "status",
                    values = {"ready"}
                }
            }
        }
    },
    size = 1
}
local params = {
    method = ngx.HTTP_POST,
    args = {
        access_token = access_token
    },
    body = cjson.encode(search_by_path)
}
local res = ngx.location.capture("/_api/v1/search/files", params)
if not res then
    return_http_bad_gateway("Bad Gateway", "Can't search for the file")
end

local found = cjson.decode(res.body)
if found.total < 1 then
    return_http_not_found("File Not Found", "File Not Found")
end

As soon as the file has been found, the metadata gives us all the necessary information about its location, and we are ready to send it to the user. Sometimes people recommend reading such a file in Lua and returning it to the user with ngx.print. That is completely unsuitable for us: potentially huge files would simply crash Lua's virtual machine with an error like "Lua VM crashed, reason: not enough memory".

This is why it is better to rely on Nginx itself, a very performant web server, and let it respond with our file. To implement this, we created a special internal named location (to prevent unauthorized access) that just serves the requested static files from the desired directory.

local file_metadata = found.results[1]
if file_metadata.content_type then
    ngx.header["Content-type"] = file_metadata.content_type
end
ngx.header["X-File-Name"] = file_metadata.name
ngx.header["X-File-Path"] = file_metadata.path
ngx.header["X-File-ID"] = file_metadata.id

-- point the URI at the stored file and hand over to the named location
ngx.req.set_uri("/" .. file_metadata.hash .. file_metadata.path)
return ngx.exec("@download_file")

The @download_file location is quite simple, but we additionally adjust the response headers to provide the real filename for the download (on our filesystem, all files are stored under unique generated names):

location @download_file {
    internal;
    root /storage/files/;
    try_files $uri =404;
    header_filter_by_lua_block {
        ngx.header["Cache-Control"] = "no-cache"
        -- offer the original filename, if the metadata provided one
        local file_name = ngx.header["X-File-Name"]
        if file_name then
            ngx.header["Content-Disposition"] = "attachment; filename=\"" .. file_name .. "\""
        end
    }
}
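One detail worth considering is that a filename containing a double quote, a backslash, or a control character would break the Content-Disposition value built in the header filter. A minimal sketch of a quoting helper follows; the function name content_disposition is hypothetical, and it only handles quoted-string escaping, not non-ASCII filenames.

```lua
-- Hypothetical helper: build a Content-Disposition value with a safely
-- quoted filename.
local function content_disposition(filename)
    -- Drop control characters (including CR/LF), then backslash-escape
    -- double quotes and backslashes
    local cleaned = filename:gsub("%c", "")
    local escaped = cleaned:gsub('["\\]', "\\%0")
    return 'attachment; filename="' .. escaped .. '"'
end
```

In the header filter, the result of content_disposition(file_name) could be assigned to ngx.header["Content-Disposition"] instead of concatenating the raw name.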

That's it! We now have a fully functional, very fast, and yet quite small Fileserver, without having to install a heavy application server that would require a lot of support.

How to use


Upload a file

curl -XPOST "https://files.example.com/_upload?access_token={SOME_TOKEN}" \
   --form file=@/tmp/file.pdf \
   --form title="Example title" \
   -H "Content-Type: multipart/form-data"

Download a file

curl -XGET "https://files.example.com/_download/723533/2338342189083057604.pdf?access_token={SOME_TOKEN}"