Files on Heroku

Photo by dlxmedia.hu on Unsplash


Options to overcome the Heroku ephemeral filesystem. For free.

The Challenge

The Heroku filesystem is ephemeral: applications can write to the filesystem, but any change is discarded when the dyno (the container that runs the application) restarts, so it is only suitable for temporary data.

It is also important to remember that Heroku cycles dynos: every dyno is restarted at least once a day, and all local filesystem modifications are lost.

A solution

Applications that must persist data should instead rely on an external database or a remote file server. In this post, we explore several free storage options for reading and writing files programmatically with Python (the same is of course possible with other programming languages and libraries):

  • GitHub

  • S3

  • Dropbox

  • DriveHQ (FTP)

  • GitLab

  • Gmail

Photo by Brina Blum on Unsplash


Store files on GitHub

A valid option is GitHub: it is free and reliable, and developers are already familiar with it.

Suitable for low traffic, for example saving at regular intervals a single file that contains data updated by the users (like a CSV/JSON file).

The idea is to store files in any of your GitHub private or public repositories using the PyGithub Python module.

GitHub Access Token

Generate the GitHub access token needed to authenticate the API calls: go to Account -> Settings -> Developer settings -> Personal access tokens.

Grant the token the necessary privileges (read/write access to repositories) and NEVER show or share your tokens.

See the GitHub documentation if unsure: Creating a personal access token
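A common way to keep the token out of the source code on Heroku is a config var, which Heroku exposes to the application as an environment variable. A minimal sketch, assuming a config var named GITHUB_TOKEN and a helper called get_secret (both hypothetical names, pick your own):

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from the environment, failing loudly if it is missing.

    get_secret is a hypothetical helper, not part of PyGithub or Heroku.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# GITHUB_TOKEN is a hypothetical config var name:
# token = get_secret("GITHUB_TOKEN")
# github = Github(token)
```

The config var can be set from the app settings in the Heroku dashboard or with heroku config:set GITHUB_TOKEN=<your token>; the same helper works for the other access tokens and passwords in this post.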


Create Personal Access Token (Image by author)

Write a file

import json
from github import Github

github = Github('personal_access_token')
repository = github.get_user().get_repo('my_repo')
# path in the repository
filename = 'files/file.json'
# serialize the data to a JSON string
content = json.dumps({'name': 'beppe', 'city': 'amsterdam'})
# create with commit message (fails if the file already exists)
f = repository.create_file(filename, "create_file via PyGithub", content)
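create_file fails when the path already exists, which is the normal situation when the same JSON file is saved at regular intervals. A minimal create-or-update sketch, assuming a PyGithub Repository object like repository above: get_contents, create_file and update_file are PyGithub methods, while save_file is a hypothetical helper name. PyGithub signals a missing path with an exception whose status is 404 (UnknownObjectException), and an update needs the SHA of the current file version.

```python
def save_file(repository, path: str, content: str, message: str):
    """Create `path` in the repository, or update it if it already exists.

    save_file is a hypothetical helper; `repository` is assumed to be a
    PyGithub Repository object.
    """
    try:
        existing = repository.get_contents(path)
    except Exception as exc:
        # PyGithub raises UnknownObjectException (status 404) for a missing
        # path; any other error is re-raised unchanged.
        if getattr(exc, "status", None) != 404:
            raise
        return repository.create_file(path, message, content)
    # Updating an existing file requires the SHA of its current version.
    return repository.update_file(path, message, content, existing.sha)
```

Each call still produces one commit, so this fits the low-traffic use case described above.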

Read a file

from github import Github

github = Github('personal_access_token')
repository = github.get_user().get_repo('my_repo')
# path in the repository
filename = 'files/file.json'
file = repository.get_contents(filename)
# decoded_content holds the file content as bytes
print(file.decoded_content.decode())

Amazon S3

Amazon S3 (Simple Storage Service) is the remote storage offered by Amazon. It is not entirely free but extremely cheap.
A great advantage of this option is that there are several other S3-compatible storage services, which makes it easier to move to a different provider if needed.

Suitable if you need production-ready storage (managing folders, supporting large files, scaling up in volume and size).

The Python boto3 library is a simple and highly popular module for accessing and managing S3 resources.

AWS Credentials

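Obtain the AWS credentials (an access key ID and a secret access key) from the IAM Console. Rather than hard-coding them as in the snippets below, you can store them in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables (for example as Heroku config vars): boto3 reads them automatically when no keys are passed to the client.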

Get a file

import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='xyz',
    aws_secret_access_key='abc'
)
# download the object dir/a.txt into the local file /tmp/a.txt
s3.download_file(Bucket='bucket_name', Key='dir/a.txt', Filename='/tmp/a.txt')

Put a file

import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='xyz',
    aws_secret_access_key='abc'
)
# upload the local file /tmp/a.txt as the object dir/a.txt
s3.upload_file(Bucket='bucket_name', Key='dir/a.txt', Filename='/tmp/a.txt')
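Since the dyno's disk is ephemeral anyway, it can be simpler to skip the temporary file and send the bytes directly. A minimal sketch, assuming an S3 client created as above: put_object and get_object are boto3 client methods, while put_json and get_json are hypothetical helper names.

```python
import json


def put_json(s3, bucket: str, key: str, data) -> None:
    """Serialize `data` to JSON and store it as an S3 object (no local file).

    put_json is a hypothetical helper; `s3` is assumed to be a boto3 S3 client.
    """
    body = json.dumps(data).encode("utf-8")
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")


def get_json(s3, bucket: str, key: str):
    """Download an S3 object and parse its content as JSON (hypothetical helper)."""
    response = s3.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming body; read() returns the raw bytes
    return json.loads(response["Body"].read())
```

For example, put_json(s3, 'bucket_name', 'dir/data.json', {'name': 'beppe'}) followed by get_json(s3, 'bucket_name', 'dir/data.json') round-trips the dictionary.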

Delete a file

import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='xyz',
    aws_secret_access_key='abc'
)

# delete the object dir/a.txt from the bucket
s3.delete_object(Bucket='bucket_name', Key='dir/a.txt')
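S3 has no real directories: a "folder" such as dir/ is just a shared key prefix, so managing folders means listing keys by prefix (for example before deleting them). A sketch, assuming the s3 client created above: get_paginator('list_objects_v2') is part of boto3, and list_keys is a hypothetical helper. The paginator matters because a single ListObjectsV2 call returns at most 1,000 keys.

```python
def list_keys(s3, bucket: str, prefix: str = "") -> list:
    """Return every object key in `bucket` that starts with `prefix`.

    list_keys is a hypothetical helper; `s3` is assumed to be a boto3 S3 client.
    """
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # pages with no matching objects have no "Contents" entry
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

Each returned key can then be passed to delete_object as in the snippet above.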

Dropbox

Dropbox is a file hosting service that offers various plans at different prices. The Dropbox Basic account is free and allows storing up to 2 GB of files.

Suitable when the files need to be accessed or modified outside the application (e.g. via the Dropbox web application or Google Docs).

The intuitive web interface can be used to create, edit and download files; there is a nice integration with Google Docs and Office Online, and a powerful Python SDK.

Dropbox Setup

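Sign up for a Dropbox account (you can also register with your Google account), then head to the Dropbox App Console. Create a new application, define its permissions (e.g. files.content.write, files.content.read) and generate the access token. Tokens generated in the App Console are short-lived, so a long-running application may need a refresh token instead (the Python SDK accepts oauth2_refresh_token together with app_key and app_secret).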

Dropbox App Console (image by author)


Create a file

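import dropbox

dbx = dropbox.Dropbox('access_token')
filename = '/local_files/file.json'  # destination path in Dropbox
with open('tmp/file.json', 'rb') as f:  # hypothetical local file to upload
    dbx.files_upload(f.read(), filename, mute=True)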

Get a file

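import dropbox

dbx = dropbox.Dropbox('access_token')
filename = '/dropbox_root/file.json'
# returns the file metadata and the HTTP response holding the content
metadata, response = dbx.files_download(filename)
print(response.content.decode())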
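A minimal sketch for saving and loading JSON straight from memory, assuming a dropbox.Dropbox client like dbx above; files_upload and files_download are the SDK calls used in the snippets, while save_json and load_json are hypothetical helper names. Extra keyword arguments are forwarded to files_upload, for example mode=dropbox.files.WriteMode.overwrite to replace an existing file.

```python
import json


def save_json(dbx, path: str, data, **upload_kwargs) -> None:
    """Upload `data` as a JSON file to `path` in Dropbox (hypothetical helper).

    Extra keyword arguments are passed through to files_upload.
    """
    dbx.files_upload(json.dumps(data).encode("utf-8"), path, **upload_kwargs)


def load_json(dbx, path: str):
    """Download the file at `path` from Dropbox and parse it as JSON (hypothetical helper)."""
    # files_download returns (metadata, HTTP response); the bytes are in .content
    _metadata, response = dbx.files_download(path)
    return json.loads(response.content)
```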

DriveHQ

DriveHQ is a file server provider supporting the FTP and WebDAV protocols. It offers a “Free 5 GB Basic” service.

Suitable when developers are familiar with the FTP protocol and tools.

Sign up for a DriveHQ Basic account. Then use the Python ftplib module (for storing and fetching files via FTP) to log in with the same credentials.

Upload a new file

import ftplib

def upload_file():
    ftp_host = "ftp.drivehq.com"
    ftp_username = "my_username"
    ftp_password = "my_password"
    filename = "/files/file.csv"  # remote path
    localfile = "tmp/file.csv"
    # open the FTP session and the local file; both are closed automatically
    with ftplib.FTP(ftp_host, ftp_username, ftp_password) as session, \
         open(localfile, 'rb') as file:
        session.storbinary('STOR ' + filename, file)  # send the file

Download an existing file

import ftplib

def download_file():
    ftp_host = "ftp.drivehq.com"
    ftp_username = "my_username"
    ftp_password = "my_password"
    filename = "/files/file.csv"  # remote path
    localfile = "tmp/file.csv"
    # open the FTP session and save the remote file into the local file
    with ftplib.FTP(ftp_host, ftp_username, ftp_password) as session, \
         open(localfile, 'wb') as f:
        session.retrbinary('RETR ' + filename, f.write, 1024)
    # open the local file and read its content
    with open(localfile, 'rb') as f:
        content = f.read()
    return content
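The same transfers can be done in memory with io.BytesIO, so nothing touches the ephemeral disk. A minimal sketch, assuming an open ftplib.FTP session like the one above; storbinary and retrbinary are ftplib methods, while upload_bytes and download_bytes are hypothetical helper names.

```python
import io


def upload_bytes(session, remote_path: str, data: bytes) -> None:
    """Store `data` at `remote_path` using an open ftplib.FTP session (hypothetical helper)."""
    session.storbinary("STOR " + remote_path, io.BytesIO(data))


def download_bytes(session, remote_path: str) -> bytes:
    """Fetch `remote_path` into memory using an open ftplib.FTP session (hypothetical helper)."""
    buffer = io.BytesIO()
    # retrbinary calls buffer.write for every block it receives
    session.retrbinary("RETR " + remote_path, buffer.write)
    return buffer.getvalue()


# Usage:
# with ftplib.FTP("ftp.drivehq.com", "my_username", "my_password") as session:
#     upload_bytes(session, "/files/file.csv", b"name,city\nbeppe,amsterdam\n")
#     content = download_bytes(session, "/files/file.csv")
```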

GitLab

GitLab can also be used as a simple file repository.

Suitable for low traffic, for example regularly saving a file that contains data updated by the users (like a CSV/JSON file).

Store files in any of your GitLab private or public repositories using the python-gitlab library.

GitLab Personal Access Token

Generate a GitLab Personal Access Token: go to Account -> Settings -> Access Tokens -> Add Personal Access Token

Grant the token the necessary privileges (the api scope, which allows reading and writing repositories through the API) and NEVER show or share your tokens.

Read a file

import gitlab

gl = gitlab.Gitlab('https://gitlab.com', private_token='token')
# get repository by ID
project = gl.projects.get(123)
f = project.files.get(file_path='files/config.json', ref='master')
# ProjectFile.decode() returns the file content as bytes
print(f.decode().decode())

Write a file

import json
import gitlab

gl = gitlab.Gitlab('https://gitlab.com', private_token='token')
# get repository by ID
project = gl.projects.get(123)
# define name and content (a valid JSON document)
filename = 'files/config.json'
content = json.dumps({'name': 'Beppe', 'city': 'Amsterdam'})
# payload: a single commit that creates the file on the master branch
data = {
    'branch': 'master',
    'commit_message': 'Push file',
    'actions': [
        {
            'action': 'create',
            'file_path': filename,
            'content': content,
        }
    ]
}

commit = project.commits.create(data)
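The create action fails if the file already exists on the branch; the GitLab commits API also accepts an update action for existing files. A minimal sketch that builds the payload for several files at once, assuming the caller already knows which paths exist (for example from an earlier project.files.get call); build_commit_payload is a hypothetical helper name.

```python
def build_commit_payload(branch: str, message: str, files: dict, existing: set) -> dict:
    """Build a commit payload that writes every file in `files`.

    build_commit_payload is a hypothetical helper. `files` maps repository
    paths to text content; paths in `existing` get an 'update' action,
    all others a 'create' action.
    """
    actions = [
        {
            "action": "update" if path in existing else "create",
            "file_path": path,
            "content": content,
        }
        for path, content in files.items()
    ]
    return {"branch": branch, "commit_message": message, "actions": actions}


# Usage with the project object from the snippet above:
# data = build_commit_payload('master', 'Push file',
#                             {'files/config.json': content}, {'files/config.json'})
# commit = project.commits.create(data)
```

Grouping several files into one payload keeps them in a single commit.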

Gmail

Send a file by email using your Gmail account and the yagmail Python library.

A very basic approach, but it can be useful for making an application file available (e.g. sending debug data to the administrators).

Security

You will need your Gmail credentials. Google no longer offers the Allow less secure apps setting, so enable 2-Step Verification on the account and use an app password instead of the regular password.

import yagmail

yag = yagmail.SMTP('username', 'password')
# define recipient and file to attach
recipient = "beppe@example.com"
attachment = 'tmp/attach.txt'
# yagmail attaches list items that are paths to existing files
contents = ['Body of email', attachment]
yag.send(recipient, 'File from Python app', contents)
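To avoid the extra dependency, the standard library can do the same with smtplib and email.message; this is a swapped-in technique, not the yagmail approach above. A minimal sketch, assuming Gmail's SMTP endpoint smtp.gmail.com on port 465 (SSL) and an app password; build_message and send_message are hypothetical helper names.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(sender: str, recipient: str, subject: str, body: str, attachment: str) -> EmailMessage:
    """Create an email with a text body and one file attachment (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    path = Path(attachment)
    # guess the MIME type from the file extension, defaulting to binary data
    ctype, _ = mimetypes.guess_type(path.name)
    maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
    msg.add_attachment(path.read_bytes(), maintype=maintype, subtype=subtype, filename=path.name)
    return msg


def send_message(msg: EmailMessage, username: str, password: str) -> None:
    """Send `msg` through Gmail's SMTP server over SSL (hypothetical helper).

    `password` is assumed to be a Gmail app password.
    """
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(username, password)
        smtp.send_message(msg)
```

Building the message separately from sending it keeps the attachment logic testable without a network connection.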

Conclusions

Storing data in a database undoubtedly allows more flexibility, better maintenance and faster data retrieval; however, the additional complexity is not always necessary. Dealing with simple files is a lot easier and sometimes just enough to meet the developer’s needs.

All code snippets shown in the article can be found in the GitHub repository HerokuFiles. You can catch me on Twitter for questions and suggestions.

Happy coding ✌️