Implementing a Fileserver with Nginx and Lua
With Nginx, fairly complex file upload logic, including metadata handling and authorization, can be implemented without a heavy application server. This article presents a basic implementation of such a Fileserver using only Nginx and Lua.
Problem and Solution
We have a RESTful API server that communicates with clients using application/json and provides advanced functionality for data management (mostly textual). Every client is authenticated by a dedicated third-party authentication server and authorized by the API server to access the requested resources. End users benefit from the API through a client-side frontend application that provides a rich user interface based on the data and functionality delivered by the API server at the backend.
At some point, end users requested the ability to manage (store, analyze, and retrieve) files using the same client-side application and, consequently, the same API.
Therefore, in addition to the usual data management provided by the API, clients need the ability to:
- process multipart/form-data requests (proxied from the form in the client-side application)
- extract and handle file metadata
- provide file storage and access
Such functionality is clearly outside the scope of the API, so it's a natural decision to split it across two applications: the API manages file metadata, and the Fileserver takes care of the actual file upload and download.
Because we need to authenticate clients and communicate with the API for metadata, one immediate solution is to run a real application server as the Fileserver (e.g. Flask or Django, since we mainly use Python) and direct upload/download requests from clients to it. However, adding another application to the project seems like overkill and could increase maintenance effort in the future.
This is why it was reasonable to look for another solution, and it seems we found one! With the Nginx web server (which we use anyway), the same logic can be implemented simply and efficiently using Lua scripting.
Nginx and Lua
Nginx is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. Unlike traditional servers, Nginx doesn’t rely on threads to handle requests. Instead, it uses a much more scalable event-driven (asynchronous) architecture. This architecture uses small, but more importantly, predictable amounts of memory under load.
Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description. Lua is dynamically typed, runs by interpreting bytecode with a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping.
The Lua module for Nginx, ngx_http_lua_module, embeds Lua into Nginx and, by leveraging Nginx's subrequests, allows the integration of Lua threads into the Nginx event model.
The various *_by_lua_* configuration directives serve as gateways to the Lua API within the nginx.conf file. In this article we use the *_by_lua_block directives to embed Lua code directly into the Nginx configuration. Many such directives are supported; the three most relevant here are:
- header_filter_by_lua_block: uses Lua code to define an output header filter (e.g. for overriding or adding a response header).
- access_by_lua_block: acts as an access phase handler and executes Lua code for every request. As with other access phase handlers, access_by_lua will not run in subrequests.
- content_by_lua_block: acts as a "content handler" and executes Lua code for every request. Do not use this directive and other content handler directives (e.g. proxy_pass) in the same location.
The Nginx object is available in Lua as ngx. Server variables are accessible as ngx.var.{name} and GET arguments as ngx.var.arg_{name}. Unfortunately, Nginx does not parse POST variables into variables automatically.
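For small application/x-www-form-urlencoded bodies, the Lua module can parse POST arguments on demand. The sketch below is only an illustration and not part of our setup: collect_params is a hypothetical helper that receives the ngx object as a parameter, so it can also be tried outside Nginx with a stand-in table. It does not handle multipart bodies, which is why the upload code later in this article streams them instead.

```lua
-- Hypothetical helper: merge GET and urlencoded POST arguments into one table.
-- `ngx_api` is the `ngx` object inside Nginx, or a stand-in table elsewhere.
local function collect_params(ngx_api)
    local params = {}

    -- GET arguments are available without reading the body
    local uri_args = ngx_api.req.get_uri_args()
    for name, value in pairs(uri_args) do
        params[name] = value
    end

    -- POST arguments can be parsed only after the body has been read explicitly
    ngx_api.req.read_body()
    local post_args, err = ngx_api.req.get_post_args()
    if not post_args then
        return params, err
    end
    for name, value in pairs(post_args) do
        params[name] = value
    end
    return params
end

-- Stand-in for `ngx`, used only to run the sketch outside Nginx
local fake_ngx = {
    req = {
        get_uri_args = function() return { access_token = "abc" } end,
        read_body = function() end,
        get_post_args = function() return { title = "Example title" } end,
    }
}

local params = collect_params(fake_ngx)
print(params.access_token, params.title)
```

Inside a content_by_lua_block the call would simply be collect_params(ngx).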
We use Nginx 1.10.2 on a Debian 8 server (from the Dotdeb repository) with some additional modules:
$ apt-get update
$ apt-get install nginx
$ apt-get install libnginx-mod-http-ndk libnginx-mod-http-lua
$ apt-get install lua-cjson lua-penlight lua-filesystem
A lot of pure Lua libraries are provided by Penlight. They cover such areas as input data handling (such as reading configuration files), functional programming (such as map, reduce, placeholder expressions, etc), and OS path management. We will use these modules to easily manipulate strings and files.
Architecture
First, let's see what the user flow of our project should look like. Consider the two main operations: uploading a file and downloading a file.
[Diagram: upload flow]
[Diagram: download flow]
Translated into Nginx configuration, the skeleton of our Fileserver could look like this:
lua_package_path '/opt/lua/?.lua;;';

server {
    listen 80;
    server_name files.example.com;

    access_log /var/log/nginx/access.files.log;
    error_log /var/log/nginx/error.files.log;

    location ^~ /_api {
        rewrite ^/_api/(.*) /$1 break;
        proxy_pass https://api.example.com;
    }

    location ^~ /_auth {
        rewrite ^/_auth/(.*) /$1 break;
        proxy_pass https://auth.example.com;
    }

    location ~ ^/_download/(.*)$ {
        set $path /$1;
        default_type 'application/json';
        limit_except GET { deny all; }
        ...
    }

    location = /_upload {
        limit_except POST { deny all; }
        if ($http_content_type !~ "multipart/form-data") {
            return 406;
        }
        lua_need_request_body off;
        client_body_buffer_size 200M;
        client_max_body_size 200M;
        default_type 'application/json';
        ...
    }

    location / {
        access_log off;
        log_not_found off;
        deny all;
    }
}
_api and _auth are just proxies to the API and authentication servers, respectively.
User authentication
The first thing the Fileserver must do is ensure that the request has been made by an authenticated client (we use the access_token query parameter to identify users). At this step it is enough that the user exists; authorization is still done at the API level, as before.
Everything related to authentication can be kept apart from the main logic and placed in the access_by_lua_block mentioned above:
access_by_lua_block {
    local cjson = require("cjson")

    -- check whether the user with this access_token can be authenticated
    local function authenticate_user(access_token)
        local params = {
            method = ngx.HTTP_GET,
            args = {
                access_token = access_token
            }
        }
        local res = ngx.location.capture("/_auth/access/check", params)
        if not res or res.status ~= 200 then
            return nil
        end
        return cjson.decode(res.body)
    end

    -- return_http_forbidden is a helper defined elsewhere (not shown here);
    -- it is expected to send the response and finish the request

    -- if the request has no access_token, return 403 Forbidden
    local access_token = ngx.var.arg_access_token
    if not access_token then
        return return_http_forbidden("Forbidden", "Forbidden")
    end

    -- authenticate user
    local credentials = authenticate_user(access_token)

    -- if the user can't be resolved, return 403 Forbidden
    if not credentials or not credentials.data.user.id then
        return return_http_forbidden("Forbidden", "Forbidden")
    end
}
Note that the access_by_lua_block is executed early, at the end of the Nginx access phase (right after the standard access module) and before our content handler, so from this point on we can assume that we are dealing with a real user.
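The return_http_forbidden helper used above is not shown in this article. A minimal sketch of how such a helper might look follows; the JSON body layout, the make_responders name, and the idea of passing ngx in as a parameter are all assumptions made for illustration. In real code, cjson.encode would be the natural way to build the body.

```lua
-- Minimal JSON string escaping for this sketch (quotes and backslashes only);
-- real code would use cjson.encode instead.
local function json_string(s)
    local escaped = tostring(s):gsub('[\\"]', '\\%0')
    return '"' .. escaped .. '"'
end

-- Hypothetical factory: builds error helpers bound to the given `ngx` object.
local function make_responders(ngx_api)
    local function respond(status, title, detail)
        ngx_api.status = status
        ngx_api.header["Content-Type"] = "application/json"
        ngx_api.print('{"error":{"title":' .. json_string(title) ..
                      ',"detail":' .. json_string(detail) .. '}}')
        -- inside Nginx, ngx.exit finishes the request
        return ngx_api.exit(status)
    end

    return {
        forbidden = function(title, detail)
            return respond(403, title, detail)
        end,
        not_found = function(title, detail)
            return respond(404, title, detail)
        end,
    }
end

-- Stand-in for `ngx` that records what would be sent to the client
local sent = { header = {}, body = "" }
sent.print = function(chunk) sent.body = sent.body .. chunk end
sent.exit = function(status) sent.exited_with = status end

local responders = make_responders(sent)
responders.forbidden("Forbidden", "Missing access_token")
print(sent.status, sent.body)
```

Binding the helpers to an injected object keeps them testable outside Nginx; inside a Lua block one would call make_responders(ngx) once and reuse the result.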
Upload
The upload flow consists of the following five main steps:
- receive the upload request
- authenticate the user
- extract metadata and store it in API
- calculate the fullpath of the file in the storage
- store file
The actual code can live in content_by_lua_block, the most appropriate place for the main logic.
First of all, we need to parse the multipart request. This is not as easy as it might seem at first, since by default Nginx doesn't extract POST variables, and we decided not to use custom Nginx builds (which could do so, but would still leave the problem of handling big files).
The most commonly suggested approach is to parse multipart forms in memory with Lua, but we can't use it because our forms contain files that are too big to fit in memory. So we have to use streaming multipart parsing. A good choice is the resty.upload module. The idea is to read the multipart request data from Nginx chunk by chunk and store its parts in memory or directly on the filesystem, depending on their type.
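The routing idea can be tried outside Nginx. The sketch below rests on several assumptions: instead of resty.upload it feeds a hard-coded list of simplified {type, payload} events (modelled loosely on the header, body, part_end, and eof tokens that module yields, with the header already parsed into a table) into a hypothetical route_events function. File parts are appended to a temporary file chunk by chunk, while ordinary fields are accumulated in memory.

```lua
-- Route a stream of multipart events: file chunks go to disk, fields to memory.
-- The events are simplified stand-ins for the tokens resty.upload returns.
local function route_events(events)
    local fields, current, sink = {}, nil, nil

    for _, event in ipairs(events) do
        local typ, payload = event[1], event[2]

        if typ == "header" then
            current = { filename = payload.filename }
            fields[payload.name] = current
            if payload.filename then
                -- file part: open a temporary file to receive the chunks
                current.fullpath = os.tmpname()
                sink = assert(io.open(current.fullpath, "wb"))
            else
                current.value = ""
            end
        elseif typ == "body" then
            if sink then
                sink:write(payload)      -- written out chunk by chunk
            elseif current then
                current.value = current.value .. payload
            end
        elseif typ == "part_end" then
            if sink then sink:close() end
            sink, current = nil, nil
        elseif typ == "eof" then
            break
        end
    end
    return fields
end

local form = route_events({
    { "header", { name = "title" } },
    { "body", "Example " }, { "body", "title" },
    { "part_end" },
    { "header", { name = "file", filename = "report.pdf" } },
    { "body", "%PDF-" }, { "body", "1.4" },
    { "part_end" },
    { "eof" },
})

print(form.title.value, form.file.filename, form.file.fullpath)
```

Only the small text field ends up as a Lua string; the file content exists solely on disk, which is what keeps memory usage independent of the upload size.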
In our case, we allow only one file object, under the variable file, but the code works for multiple files in a multipart request. All files are stored on the filesystem as temporary files, and all other inputs are parsed directly into a Lua table and therefore kept in memory.
We keep our helper for parsing such multipart input in the utils.lua module:
local pathx = require("pl.path")
local stringx = require("pl.stringx")
local upload = require("resty.upload")

local utils = {}

--
-- Extract variables from the Content-Disposition header
--
local function decode_content_disposition(value)
    local result
    local disposition_type, params = string.match(value, "([%w%-%._]+);(.+)")
    if disposition_type then
        result = {}
        result.disposition_type = disposition_type
        result.params = {}
        if params then
            for _, param in ipairs(stringx.split(params, "; ")) do
                -- local names, so neither globals nor the argument are clobbered
                local key, val = param:match('([%w%.%-_]+)="(.+)"$')
                if key then
                    result.params[key] = val
                end
            end
        end
    end
    return result
end

--
-- Extract POST variables in a streaming way
-- and store the file object (with name "file") in a temporary
-- file (its path is stored in form_data.file.fullpath).
-- All other form variables are stored fully in form_data
-- (we expect them to be small)
--
function utils.streaming_multipart_form()
    local chunk_size = 4096
    local form, err = upload:new(chunk_size)
    if form == nil then
        return nil, "Can't create upload form: " .. (err or "unknown error")
    end
    form:set_timeout(1000) -- 1 sec

    local file = nil
    local part_name = nil
    local form_data = {}

    while true do
        local typ, res, read_err = form:read()
        if not typ then
            if file then file:close() end
            return nil, "Can't read uploaded form: " .. (read_err or "unknown error")
        end

        if typ == "header" then
            local header_name = res[1]
            if header_name == "Content-Disposition" then
                local parsed_header = decode_content_disposition(res[3])
                if parsed_header and parsed_header.params.name then
                    part_name = parsed_header.params.name
                    form_data[part_name] = parsed_header.params
                    if parsed_header.params.filename then
                        -- file part: stream it to a temporary file
                        form_data[part_name]["fullpath"] = pathx.tmpname()
                        file, err = io.open(form_data[part_name]["fullpath"], "wb")
                        if not file then
                            return nil, "Can't open temporary file: " .. (err or "unknown error")
                        end
                    end
                end
            elseif header_name == "Content-Type" then
                if part_name then
                    form_data[part_name]["content_type"] = res[2]
                end
            end
        elseif typ == "body" then
            if file then
                file:write(res)
            elseif part_name then
                -- a value may arrive in several chunks, so append
                local part = form_data[part_name]
                part["value"] = (part["value"] or "") .. res
            end
        elseif typ == "part_end" then
            if file then
                file:close()
            end
            file = nil
            part_name = nil
        elseif typ == "eof" then
            break
        end
    end
    return form_data
end

return utils
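To see what the Content-Disposition parsing yields, here is a dependency-free variant that can be run with a plain Lua interpreter. It is a sketch rather than the helper above: parse_content_disposition is a hypothetical name, and it uses string.gmatch instead of Penlight's split, assuming parameter values are double-quoted and contain no embedded quotes.

```lua
-- Dependency-free sketch: split a Content-Disposition value into its
-- disposition type and a table of quoted parameters.
local function parse_content_disposition(value)
    local disposition_type, rest = value:match("^%s*([%w%-%._]+)%s*;?(.*)$")
    if not disposition_type then
        return nil
    end
    local params = {}
    -- every key="value" pair becomes one entry in params
    for key, val in rest:gmatch('([%w%.%-_]+)="([^"]*)"') do
        params[key] = val
    end
    return { disposition_type = disposition_type, params = params }
end

local parsed = parse_content_disposition(
    'form-data; name="file"; filename="report.pdf"')
print(parsed.disposition_type, parsed.params.name, parsed.params.filename)
```

For a file part, params carries both name and filename; for a plain text field only name is present, which is how the streaming helper tells the two apart.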
Calling it from the _upload location is straightforward:
local helpers = require("utils")

local form, err = helpers.streaming_multipart_form()
if form == nil then
    return return_http_internal_server_error("Server Error", err)
end
An important part of making this work is to set client_body_buffer_size equal to client_max_body_size. Otherwise, if the client body is bigger than client_body_buffer_size, the Nginx variable $request_body will be empty.
As a result, we have a Lua table form with all the variables from the form input. For file objects, instead of a value key it contains a fullpath key with the path of the file on the temporary filesystem.
As soon as the form is parsed, we want to store the file's metadata by calling our existing API, which also returns the expected file path on the permanent filesystem (usually some kind of network file system).
To call the API, we can issue a subrequest using ngx.location.capture with parameters based on the form the user submitted:
local cjson = require("cjson")

-- the token was already validated in access_by_lua_block
local access_token = ngx.var.arg_access_token

local data = {
    status = "not_ready"
}
-- 'title' is the name of the part with text input
if form.title and form.title.value then
    data["title"] = form.title.value
end
-- 'file' is the name of the part with binary input
if form.file and form.file.filename then
    data["filename"] = form.file.filename
end

local params = {
    method = ngx.HTTP_POST,
    args = {
        access_token = access_token
    },
    body = cjson.encode(data)
}
local res = ngx.location.capture("/_api/v1/file", params)
if not res then
    return return_http_bad_gateway("Bad Gateway", "Can't create metadata")
end

-- on any non-201 response, forward the API's answer to the client as-is
local create_metadata_response = res.body
if res.status ~= 201 then
    ngx.status = res.status
    ngx.print(create_metadata_response)
    return ngx.exit(res.status)
end
local file_metadata = cjson.decode(create_metadata_response)
Note that subrequests issued by ngx.location.capture inherit all the request headers of the current request by default, and this may have unexpected side effects on the subrequest responses.
If something goes wrong with metadata creation (e.g. invalid data or unauthorized access), we just forward the API's output directly to the client and stop further processing.
As soon as the file is officially "registered", we move it to the permanent filesystem and tell the API that it is ready for download:
local pathx = require("pl.path")
local filex = require("pl.file")
local dirx = require("pl.dir")
local cjson = require("cjson")

local file_fullpath = "/storage/files/" .. file_metadata.hash .. file_metadata.path

-- ensure that subdirectories exist
local file_fulldir = pathx.dirname(file_fullpath)
dirx.makepath(file_fulldir)

-- move the tmp file from the form variable `file` to its permanent location
local is_moved, err = filex.move(form.file.fullpath, file_fullpath)

-- if the file has been moved successfully, update its metadata
-- to make it available for download
if is_moved then
    local params = {
        method = ngx.HTTP_PUT,
        args = {
            access_token = access_token
        },
        body = cjson.encode({
            status = "ready"
        })
    }
    ngx.location.capture("/_api/v1/file/" .. file_metadata.id, params)
else
    ngx.log(ngx.ERR, "can't move uploaded file: ", err)
end

-- provide some headers with metadata
ngx.header["X-File-Name"] = file_metadata.filename
ngx.header["X-File-ID"] = file_metadata.id
ngx.status = ngx.HTTP_CREATED
ngx.print(create_metadata_response)
ngx.exit(ngx.HTTP_CREATED)
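Since the final path is assembled from values returned by the API, it may be worth validating it before touching the filesystem. The following sketch is an addition and not part of our setup: build_storage_path is a hypothetical helper that joins the storage root, file_metadata.hash, and file_metadata.path, and rejects any path containing a ".." segment. It assumes that the hash is a single directory name and that the path starts with "/".

```lua
-- Hypothetical helper: join root, hash and path, refusing path traversal.
local function build_storage_path(root, hash, path)
    if type(hash) ~= "string" or hash == "" or hash:find("/", 1, true) then
        return nil, "invalid hash"
    end
    if type(path) ~= "string" or path:sub(1, 1) ~= "/" then
        return nil, "path must start with /"
    end
    -- reject any ".." segment, wherever it appears
    for segment in path:gmatch("[^/]+") do
        if segment == ".." then
            return nil, "path traversal is not allowed"
        end
    end
    -- strip trailing slashes from the root to avoid a double separator
    local clean_root = root:gsub("/+$", "")
    return clean_root .. "/" .. hash .. path
end

print(build_storage_path("/storage/files/", "ab12", "/723533/report.pdf"))
print(build_storage_path("/storage/files/", "ab12", "/../etc/passwd"))
```

For well-formed input the result matches the plain concatenation used above, so the helper could replace that line without changing where files are stored.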
Download
The download flow is a bit simpler, but still contains some important steps:
- authenticate the user
- retrieve metadata of the file
- retrieve the file and return it
The authentication step is exactly the same as for the upload procedure. After authentication, we take the $path extracted from the request (accessible in Lua as ngx.var.path) and ask the API for its metadata. At this step, we also ensure that the file exists, is ready for download, and that the user has access to it.
local cjson = require("cjson")

-- the token was already validated in access_by_lua_block
local access_token = ngx.var.arg_access_token

local search_by_path = {
    filters = {
        must = {
            {
                exact = {
                    field = "path",
                    values = {ngx.var.path}
                }
            },
            {
                exact = {
                    field = "status",
                    values = {"ready"}
                }
            }
        }
    },
    size = 1
}
local params = {
    method = ngx.HTTP_POST,
    args = {
        access_token = access_token
    },
    body = cjson.encode(search_by_path)
}
local res = ngx.location.capture("/_api/v1/search/files", params)
if not res then
    return return_http_bad_gateway("Bad Gateway", "Can't search for the file")
end
local found = cjson.decode(res.body)
if found.total < 1 then
    return return_http_not_found("File Not Found", "File Not Found")
end
As soon as the file has been found, the metadata provides all the necessary information about its location, and we are ready to return it to the user. Sometimes people recommend reading such a file in Lua and returning it to the user with ngx.print. That is completely unsuitable for us because of the potentially huge size of the files, which would simply crash Lua's virtual machine with an error like "Lua VM crashed, reason: not enough memory".
This is why it's better to rely on the existing functionality of Nginx as a very performant web server and let it respond with our file. To implement this, we created a special internal named location (to prevent unauthorized access) that just serves the requested static files from the desired directory.
local file_metadata = found.results[1]
if file_metadata.content_type then
    ngx.header["Content-Type"] = file_metadata.content_type
end
ngx.header["X-File-Name"] = file_metadata.name
ngx.header["X-File-Path"] = file_metadata.path
ngx.header["X-File-ID"] = file_metadata.id
ngx.req.set_uri("/" .. file_metadata.hash .. file_metadata.path)
-- arguments are ignored for named locations, so none are passed
return ngx.exec("@download_file")
The @download_file location is quite simple, but we additionally adjust the response headers to provide a real filename for the download (on our filesystem all files are stored under unique generated names):
location @download_file {
    internal;
    root /storage/files/;
    try_files $uri =404;
    header_filter_by_lua_block {
        ngx.header["Cache-Control"] = "no-cache"
        ngx.header["Content-Disposition"] = "attachment; filename=\"" .. ngx.header["X-File-Name"] .. "\""
    }
}
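One caveat with building the Content-Disposition value by concatenation is that a filename containing quotes or line breaks would corrupt the header. A possible guard is sketched below as a small hypothetical content_disposition function. It assumes that filenames are plain ASCII; non-ASCII names would need the filename* form described in RFC 6266, which this sketch does not cover.

```lua
-- Hypothetical helper: build a Content-Disposition value with a safe filename.
local function content_disposition(filename)
    local safe = tostring(filename or "download")
    safe = safe:gsub("%c", "")            -- drop CR, LF and other control chars
    safe = safe:gsub('[\\"]', "\\%0")     -- escape backslashes and double quotes
    return 'attachment; filename="' .. safe .. '"'
end

print(content_disposition("report.pdf"))
print(content_disposition('my "final" report.pdf'))
```

In the header filter, this could be used as ngx.header["Content-Disposition"] = content_disposition(ngx.header["X-File-Name"]).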
That's it! We now have a fully functional, fast, and quite small Fileserver, with no need to install a heavy application server that would require a lot of maintenance.
How to use
Upload a file
curl -XPOST https://files.example.com/_upload?access_token={SOME_TOKEN} \
    --form file=@/tmp/file.pdf \
    --form title="Example title" \
    -H "Content-Type: multipart/form-data"
Download a file
curl -XGET https://files.example.com/_download/723533/2338342189083057604.pdf?access_token={SOME_TOKEN}