Options to overcome the Heroku ephemeral filesystem. For free.
The Challenge
The Heroku file system is ephemeral: applications can write to it, but any change is discarded when the Dyno (the application host) restarts, so the local disk is only suitable for temporary data.
It is also important to remember that Heroku implements cycling: every Dyno restarts at least once every 24 hours, and all local filesystem modifications are lost.
A solution
Applications that must persist data should instead rely on an external database or a remote file server. In this post, we explore several freely available storage options for reading and writing files programmatically. The examples use Python, although the same is of course possible with other programming languages and libraries:
GitHub
S3
Dropbox
DriveHQ (FTP)
GitLab
Gmail
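Whichever service is chosen, the application-side pattern tends to be the same: treat the Dyno's disk as a cache, restore the file from remote storage when the local copy is missing (for example after a restart), and push it back after each change. A minimal sketch of that pattern, assuming hypothetical `fetch_remote` and `push_remote` callables that wrap one of the services described in this post:

```python
import os
from typing import Callable


def load_or_restore(local_path: str, fetch_remote: Callable[[], bytes]) -> bytes:
    """Return the file content, restoring it from remote storage if the local copy is gone."""
    if not os.path.exists(local_path):
        # After a Dyno restart the local file no longer exists: fetch it again.
        data = fetch_remote()
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(data)
        return data
    with open(local_path, "rb") as f:
        return f.read()


def save_and_push(local_path: str, data: bytes, push_remote: Callable[[bytes], None]) -> None:
    """Write the file locally, then push the same bytes to remote storage."""
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with open(local_path, "wb") as f:
        f.write(data)
    push_remote(data)
```

The two callables keep the sketch independent of the storage backend, so the same functions could sit in front of GitHub, S3 or an FTP server.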
Photo by Brina Blum on Unsplash
Store files on GitHub
A valid option is to use GitHub: it is free and reliable, and developers are already familiar with it.
Suitable for low traffic, for example saving a single file of user-updated data (such as a CSV or JSON file) at regular intervals.
The idea is to store files in any of your private or public GitHub repositories using the PyGithub Python module.
GitHub Access Token
Generate the GitHub access token needed to authenticate the API calls: go to Account -> Settings -> Developer Settings -> Personal Access Token.
Grant the token the necessary privileges (read/write access to repositories) and NEVER show or share your tokens.
See GitHub documentation if unsure: Creating a personal access token
Create Personal Access Token (Image by author)
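One way to keep the token out of the source code is to store it in a Heroku config var (set with `heroku config:set`), which the Dyno exposes to the application as an environment variable. A minimal sketch, assuming a hypothetical variable name `GITHUB_TOKEN`; neither GitHub nor Heroku prescribes that name, and `get_secret` is a helper invented for this example:

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from the environment and fail early if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

The result can then be passed to the client in place of a hard-coded string, for example `Github(get_secret('GITHUB_TOKEN'))`.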
Write a file
from github import Github

github = Github('personal_access_token')
repository = github.get_user().get_repo('my_repo')
# path in the repository
filename = 'files/file.json'
content = '{"name":"beppe","city":"amsterdam"}'
# create the file with a commit message
f = repository.create_file(filename, "create_file via PyGithub", content)
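`create_file` is meant for new paths. When the same file is saved repeatedly, an existing file has to be updated instead, and `update_file` needs the file's current SHA. The sketch below is one way to combine both cases. The helper name `save_file` is hypothetical, and it assumes that the "not found" error carries a `status` attribute equal to 404, as PyGithub's `UnknownObjectException` does:

```python
def save_file(repository, path, message, content):
    """Create the file if it is missing, otherwise update it in place."""
    try:
        existing = repository.get_contents(path)
    except Exception as exc:
        # Assumption: a missing path raises an error with status 404
        # (PyGithub's UnknownObjectException); anything else is re-raised.
        if getattr(exc, "status", None) != 404:
            raise
        repository.create_file(path, message, content)
        return "created"
    # update_file needs the SHA of the version being replaced
    repository.update_file(path, message, content, existing.sha)
    return "updated"
```

Here `repository` is the object returned by `get_repo`, so a call such as `save_file(repository, 'files/file.json', 'save data', content)` can be repeated safely.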
Read a file
from github import Github

github = Github('personal_access_token')
repository = github.get_user().get_repo('my_repo')
# path in the repository
filename = 'files/file.json'
file = repository.get_contents(filename)
# decoded_content is bytes; decode it to get a string
print(file.decoded_content.decode())
Amazon S3
Amazon S3 (Simple Storage Service) is the remote storage offered by Amazon. It is not entirely free but extremely cheap.
A great advantage of this option is that there are several other S3-compatible storage services, which makes it easier to move to a different provider if needed.
Suitable if you need a production-ready file system (folder management, support for large files, room to scale in volume and size).
The Python boto3 library is a simple and highly popular module for accessing and managing S3 resources.
AWS Credentials
Obtain the AWS credentials from the IAM Console
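The hard-coded keys in the snippets that follow are placeholders. A minimal sketch of building the client arguments from environment variables instead, with an optional `endpoint_url` for the S3-compatible providers mentioned above. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are variable names boto3 itself recognises; `S3_ENDPOINT_URL` and the helper name `s3_client_kwargs` are assumptions made for this example:

```python
import os


def s3_client_kwargs(env=os.environ) -> dict:
    """Collect the keyword arguments for boto3's session.client() from the environment."""
    kwargs = {
        "service_name": "s3",
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    # Hypothetical variable: only set when using an S3-compatible provider.
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

The client would then be created with `boto3.session.Session().client(**s3_client_kwargs())`.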
Get a file
import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='xyz',
    aws_secret_access_key='abc'
)
# download the object to a local path
s3.download_file(Bucket='bucket_name', Key='dir/a.txt', Filename='/tmp/a.txt')
Put a file
import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='xyz',
    aws_secret_access_key='abc'
)
# upload the local file to the bucket
s3.upload_file(Bucket='bucket_name', Key='dir/a.txt', Filename='/tmp/a.txt')
Delete a file
import boto3

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id='xyz',
    aws_secret_access_key='abc'
)
# remove the object from the bucket
s3.delete_object(Bucket='bucket_name', Key='dir/a.txt')
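Since the local disk is ephemeral anyway, small files do not have to pass through `/tmp`. A sketch using the client's `put_object` and `get_object` calls to move JSON between memory and the bucket directly. The helper names `put_json` and `get_json` are hypothetical, and `s3` is assumed to be a boto3 S3 client like the one created in the snippets above:

```python
import json


def put_json(s3, bucket: str, key: str, data) -> None:
    """Serialise data to JSON and store it as a single S3 object."""
    body = json.dumps(data).encode("utf-8")
    s3.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")


def get_json(s3, bucket: str, key: str):
    """Fetch an S3 object and parse its body as JSON."""
    response = s3.get_object(Bucket=bucket, Key=key)
    # "Body" is a stream; read() returns the raw bytes
    return json.loads(response["Body"].read())
```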
Dropbox
Dropbox is a file hosting service that offers various plans at different prices. The Dropbox Basic account is free and allows storing up to 2GB of files.
Suitable when the files need to be accessed or modified outside the application (e.g. via the Dropbox web application or Google Docs).
The intuitive Web User Interface can be used to create, edit and download files; there is a nice integration with Google Docs and Office Online and a powerful Python Dropbox SDK.
Dropbox Setup
Sign up for a Dropbox account (it is also possible to register via your Google account), then head to the Dropbox App Console. Create a new application, define the permissions (e.g. files.content.write, files.content.read) and generate the Access Token.
Dropbox App Console (image by author)
Create a file
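import dropbox

dbx = dropbox.Dropbox('access_token')
filename = '/local_files/file.json'  # destination path in Dropbox
with open('tmp/file.json', 'rb') as f:  # placeholder path of the local file to upload
    dbx.files_upload(f.read(), filename, mute=True)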
Get a file
import dropbox

dbx = dropbox.Dropbox('access_token')
filename = '/dropbox_root/file.json'
# files_download returns the file metadata and the HTTP response
f, r = dbx.files_download(filename)
print(r.content)
DriveHQ
DriveHQ is a file server provider supporting FTP and WebDAV protocols. It offers a “Free 5 GB Basic” Service.
Suitable when developers are familiar with the FTP protocol and its tools.
Sign up for a DriveHQ Basic account. Then import the Python ftplib module (for storing and fetching files via FTP) and log in through the Python API with the same credentials.
Upload a new file
import ftplib

def upload_file():
    ftp_host = "ftp.drivehq.com"
    ftp_username = "my_username"
    ftp_password = "my_password"
    filename = "/files/file.csv"
    localfile = "tmp/file.csv"
    # open session
    session = ftplib.FTP(ftp_host, ftp_username, ftp_password)
    file = open(localfile, 'rb')  # file to send
    session.storbinary('STOR ' + filename, file)  # send the file
    file.close()  # close file and FTP session
    session.quit()
Download an existing file
import ftplib

def download_file():
    ftp_host = "ftp.drivehq.com"
    ftp_username = "my_username"
    ftp_password = "my_password"
    filename = "/files/file.csv"
    localfile = "tmp/file.csv"
    # open session
    session = ftplib.FTP(ftp_host, ftp_username, ftp_password)
    f = open(localfile, 'wb')  # save into local file
    session.retrbinary('RETR ' + filename, f.write, 1024)
    f.close()  # close file and FTP session
    session.quit()
    # open the local file and return its content
    with open(localfile, 'rb') as f:
        content = f.read()
    return content
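When the content is only needed in memory, the intermediate local file can be skipped by handing `retrbinary` the `write` method of an in-memory buffer. A minimal sketch; the helper name `fetch_bytes` is hypothetical, and `session` is assumed to be an already logged-in `ftplib.FTP` object:

```python
import io


def fetch_bytes(session, remote_path: str) -> bytes:
    """Download a remote file over an open FTP session straight into memory."""
    buffer = io.BytesIO()
    # retrbinary calls buffer.write once per received block
    session.retrbinary("RETR " + remote_path, buffer.write)
    return buffer.getvalue()


# Possible usage (credentials are placeholders); the with-statement
# closes the FTP connection when the block ends:
#
#     import ftplib
#     with ftplib.FTP("ftp.drivehq.com", "my_username", "my_password") as session:
#         content = fetch_bytes(session, "/files/file.csv")
```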
GitLab
GitLab can also be used as a simple file repository.
Suitable for low traffic, for example regularly saving a file of user-updated data (such as a CSV or JSON file).
Store files in any of your private or public GitLab repositories using python-gitlab.
GitLab Personal Access Token
Generate a GitLab Personal Access Token: go to Account -> Settings -> Access Tokens -> Add Personal Access Token.
Grant the token the necessary privileges (read/write access to repositories) and NEVER show or share your tokens.
Read a file
import gitlab

gl = gitlab.Gitlab('https://gitlab.com', private_token='token')
# get repository by ID
project = gl.projects.get(123)
f = project.files.get(file_path='files/config.json', ref='master')
# decode() returns the file content as bytes
print(f.decode().decode())
Write a file
import gitlab

gl = gitlab.Gitlab('https://gitlab.com', private_token='token')
# get repository by ID
project = gl.projects.get(123)
# define name and content
filename = 'files/config.json'
content = '{"name": "Beppe", "city": "Amsterdam"}'
# payload
data = {
    'branch': 'master',
    'commit_message': 'Push file',
    'actions': [
        {
            'action': 'create',
            'file_path': filename,
            'content': content,
        }
    ]
}
commit = project.commits.create(data)
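The `create` action is intended for new files. GitLab's commits API also accepts an `update` action for a file that already exists in the branch. A small sketch of a payload builder that switches between the two; the helper name `build_commit_payload` and its `exists` flag are hypothetical, and how the caller finds out whether the file exists (for instance by attempting `project.files.get` first) is left open:

```python
def build_commit_payload(file_path, content, exists, branch="master", message="Push file"):
    """Build the dict passed to project.commits.create() for a single file."""
    return {
        "branch": branch,
        "commit_message": message,
        "actions": [
            {
                # "update" replaces an existing file, "create" adds a new one
                "action": "update" if exists else "create",
                "file_path": file_path,
                "content": content,
            }
        ],
    }
```

The returned dictionary has the same shape as the `data` payload above and would be passed to `project.commits.create` in the same way.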
Gmail
Send a file by email using your Gmail account and the yagmail Python library.
A very basic approach, but it can be useful for making an application file available to someone (e.g. sending debug data to the administrators).
Security
You will need to use your Gmail credentials and enable "Allow less secure apps" in the Gmail settings.
import yagmail

yag = yagmail.SMTP('username', 'password')
# define recipient and file to attach
recipient = "beppe@example.com"
attachment = 'tmp/attach.txt'
# a path in the contents list is sent as an attachment
contents = ['Body of email', attachment]
yag.send(recipient, 'File from Python app', contents)
Conclusions
Storing data in a database undoubtedly allows more flexibility, better maintenance and faster data retrieval. However, the additional complexity is not always necessary: dealing with simple files is a lot easier and is sometimes enough to meet the developer’s needs.
All code snippets shown in the article can be found in the GitHub repository HerokuFiles. You can catch me on Twitter for questions and suggestions.
Happy coding ✌️