Photo by Hitesh Choudhary on Unsplash
Use Python & Boto3 to Back Up Files/Logs to AWS S3
A Python script, using Boto3, to back up files in a folder or server logs to AWS S3 daily at a fixed time, with the backup date in the file/log name.
Introduction
Let’s say we have a folder on our server in which logs are generated for the various services that keep our application available to users. Now, what if we want to back up those logs to an AWS S3 bucket daily at 00:00? This guide shows exactly how to do that. Let’s dive in!
Getting the S3 bucket ready
Create an S3 bucket from your AWS S3 Console.
Create an IAM user and make sure that the user has write privileges to the S3 bucket.
Keep the bucket name, AWS access key ID, and AWS secret access key handy, as they are required in later steps.
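For the write privilege, a minimal IAM policy might look like the sketch below; `my-backup-bucket` is a placeholder for your actual bucket name:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-backup-bucket/*"
        }
    ]
}
```

Granting only s3:PutObject on this one bucket follows the least-privilege practice linked in the Resources section.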
Let’s Write Some Code!
1. Create the project directory & python virtual environment
$ mkdir 'Backup Logs S3'
$ cd 'Backup Logs S3'
$ python3 -m venv env
$ source env/bin/activate
2. Create a requirements.txt file listing all the packages used in this project.
schedule==0.6.0
boto3==1.13.20
3. Install the requirements using pip
(env)$ pip install -r requirements.txt
4. Create a function to upload a file to an S3 bucket
Use your favourite editor to create backup_logs_s3.py as follows:
import boto3
from botocore.exceptions import ClientError


def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.

    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
    # If folder_name was specified, upload in the folder
    if folder_name is not None:
        object_name = f'{folder_name}/{object_name}'
    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)
The function accepts 4 parameters:
file_name: Name of the file with the absolute path
bucket: Name of the bucket to which the file is to be uploaded
object_name: (Optional) To specify the name of the object when the file is uploaded to the bucket
folder_name: (Optional) Name of the folder under which the file will be uploaded. If the folder doesn’t already exist, it will appear automatically, since S3 folders are just key prefixes.
In this function, first, if object_name was not given as a parameter, we assign it the name of the file obtained by splitting the path off file_name.
Then, if folder_name is given, we set object_name to ‘folder_name/object_name’.
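The key-building logic can be isolated and sanity-checked on its own. This sketch mirrors the two if blocks above; build_object_key is a hypothetical helper, not part of the script:

```python
def build_object_key(file_name, object_name=None, folder_name=None):
    # Mirrors the key construction in upload_file_to_s3
    if object_name is None:
        object_name = file_name.split('/')[-1]
    if folder_name is not None:
        object_name = f'{folder_name}/{object_name}'
    return object_name

# The bare file name becomes the key, prefixed by the folder when given
print(build_object_key('/home/pushp/logs/server1.log'))
# server1.log
print(build_object_key('/home/pushp/logs/server1.log', folder_name='server_logs'))
# server_logs/server1.log
```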
In the try block, we create a client by calling the client method of the boto3 package. Make sure to replace ‘YOUR_AWS_ACCESS_KEY_ID’ and ‘YOUR_AWS_SECRET_ACCESS_KEY’ with the actual keys you kept handy earlier.
This client is then used to call upload_file, which uploads the file to our S3 bucket. Note that upload_file returns None on success, so the printed response mainly confirms the call completed; any ClientError raised is caught and printed.
5. Create a function to append the date to log files (Optional)
This step is optional; if you simply want to upload your files to S3, feel free to skip it. Suppose I have a log file named ‘server.log’ which gets appended with the requests that the server receives. If my server has been running for a week, all requests for the whole week have been logged to the same file, which makes checking the logs for a particular day troublesome. To resolve this, each day at 00:00 when we back up the logs to the S3 bucket, we first append the previous day’s date to the file name and then upload the file to S3, which lets us browse through the logs date-wise.
import boto3
from botocore.exceptions import ClientError
import os


def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.

    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
    # If folder_name was specified, upload in the folder
    if folder_name is not None:
        object_name = f'{folder_name}/{object_name}'
    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)


def append_text_to_file_names(files, text):
    """
    Appends given text to the name of the files.

    Params:
        files: List(str): list of file paths
        text: str: Text that is to be appended
    Returns:
        files: List(str): list of file paths with text appended
    """
    for i in range(len(files)):
        file_splitted = files[i].split('/')
        file_path = file_splitted[:-1]
        file_name = file_splitted[-1]
        file_name_splitted = file_name.split('.')
        new_file_name = '.'.join([file_name_splitted[0], text, file_name_splitted[1]])
        file_path.append(new_file_name)
        new_file_name_with_path = '/'.join(file_path)
        os.rename(files[i], new_file_name_with_path)
        files[i] = new_file_name_with_path
    return files
The function append_text_to_file_names() accepts 2 parameters:
files: List of file names with their absolute path
text: String that is to be appended to the file names
In this function, we rename each file by appending the given text to its name, then return the list of files with their new names.
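The pure string part of the rename (everything except the os.rename call) can be sketched like this, assuming file names with a single dot, as in the loop body above; make_dated_name is a hypothetical helper:

```python
def make_dated_name(path, text):
    # Split off the directory part and the file name
    parts = path.split('/')
    directory, file_name = parts[:-1], parts[-1]
    # Insert the text between the base name and the extension
    base, ext = file_name.split('.')
    directory.append('.'.join([base, text, ext]))
    return '/'.join(directory)

print(make_dated_name('/home/pushp/logs/server1.log', '31-12-2023'))
# /home/pushp/logs/server1.31-12-2023.log
```

Note that file_name.split('.') assumes exactly one dot in the file name; a name like ‘app.server.log’ would need os.path.splitext instead.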
6. Create a function that will use the above functions
The purpose of this function is to call the functions above; it will be used as the task for scheduling later on.
import boto3
from botocore.exceptions import ClientError
import os
from datetime import datetime, timedelta


def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.

    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
    # If folder_name was specified, upload in the folder
    if folder_name is not None:
        object_name = f'{folder_name}/{object_name}'
    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)


def append_text_to_file_names(files, text):
    """
    Appends given text to the name of the files.

    Params:
        files: List(str): list of file paths
        text: str: Text that is to be appended
    Returns:
        files: List(str): list of file paths with text appended
    """
    for i in range(len(files)):
        file_splitted = files[i].split('/')
        file_path = file_splitted[:-1]
        file_name = file_splitted[-1]
        file_name_splitted = file_name.split('.')
        new_file_name = '.'.join([file_name_splitted[0], text, file_name_splitted[1]])
        file_path.append(new_file_name)
        new_file_name_with_path = '/'.join(file_path)
        os.rename(files[i], new_file_name_with_path)
        files[i] = new_file_name_with_path
    return files


def rename_and_backup_logs_s3():
    """
    Backs up log files to the S3 bucket.
    """
    today = datetime.now()
    yesterday = today - timedelta(days=1)
    text = yesterday.strftime('%d-%m-%Y')
    log_files = [
        '/home/pushp/logs/server1.log',
        '/home/pushp/logs/server2.log',
        '/home/pushp/logs/server3.log',
        '/home/pushp/logs/server4.log'
    ]
    print('Appending date to log files...')
    log_files = append_text_to_file_names(log_files, text)
    print('Appended date to log files...')
    print('Uploading logs to S3...')
    for log_file in log_files:
        upload_file_to_s3(
            file_name=log_file,
            bucket='YOUR_BUCKET_NAME',
            folder_name='server_logs'
        )
    print('Uploaded logs to S3...')
In rename_and_backup_logs_s3(), the previous day’s date is calculated and converted to the ‘DD-MM-YYYY’ string format. The log_files list stores all the files that we want to back up every day. We call append_text_to_file_names(), passing the list of files and the previous day’s date, to append the date to the file names. upload_file_to_s3() is then called for each renamed file in the list to upload it to the S3 bucket. Remember to replace YOUR_BUCKET_NAME with the actual name you assigned while creating the bucket.
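The date arithmetic relies only on the standard library; with a pinned ‘today’ it behaves like this:

```python
from datetime import datetime, timedelta

# Pin 'today' so the example is reproducible; the script uses datetime.now()
today = datetime(2023, 6, 1)
yesterday = today - timedelta(days=1)
print(yesterday.strftime('%d-%m-%Y'))  # 31-05-2023
```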
7. Final Step, Scheduling the task
We schedule the task rename_and_backup_logs_s3 to run daily at 00:00.
import boto3
from botocore.exceptions import ClientError
import os
from datetime import datetime, timedelta
import schedule
import time


def upload_file_to_s3(file_name, bucket, object_name=None, folder_name=None):
    """
    Upload a file to an S3 bucket.

    Params:
        file_name: File to upload
        bucket: Bucket to upload to
        object_name: S3 object name. If not specified then file_name is used
        folder_name: Folder name in which file is to be uploaded
    """
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name.split('/')[-1]
    # If folder_name was specified, upload in the folder
    if folder_name is not None:
        object_name = f'{folder_name}/{object_name}'
    # Upload the file
    try:
        s3_client = boto3.client(
            service_name='s3',
            aws_access_key_id='YOUR_AWS_ACCESS_KEY_ID',
            aws_secret_access_key='YOUR_AWS_SECRET_ACCESS_KEY'
        )
        response = s3_client.upload_file(file_name, bucket, object_name)
        print(response)
    except ClientError as e:
        print(e)


def append_text_to_file_names(files, text):
    """
    Appends given text to the name of the files.

    Params:
        files: List(str): list of file paths
        text: str: Text that is to be appended
    Returns:
        files: List(str): list of file paths with text appended
    """
    for i in range(len(files)):
        file_splitted = files[i].split('/')
        file_path = file_splitted[:-1]
        file_name = file_splitted[-1]
        file_name_splitted = file_name.split('.')
        new_file_name = '.'.join([file_name_splitted[0], text, file_name_splitted[1]])
        file_path.append(new_file_name)
        new_file_name_with_path = '/'.join(file_path)
        os.rename(files[i], new_file_name_with_path)
        files[i] = new_file_name_with_path
    return files


def rename_and_backup_logs_s3():
    """
    Backs up log files to the S3 bucket.
    """
    today = datetime.now()
    yesterday = today - timedelta(days=1)
    text = yesterday.strftime('%d-%m-%Y')
    log_files = [
        '/home/pushp/logs/server1.log',
        '/home/pushp/logs/server2.log',
        '/home/pushp/logs/server3.log',
        '/home/pushp/logs/server4.log'
    ]
    print('Appending date to log files...')
    log_files = append_text_to_file_names(log_files, text)
    print('Appended date to log files...')
    print('Uploading logs to S3...')
    for log_file in log_files:
        upload_file_to_s3(
            file_name=log_file,
            bucket='YOUR_BUCKET_NAME',
            folder_name='server_logs'
        )
    print('Uploaded logs to S3...')


if __name__ == "__main__":
    schedule.every().day.at("00:00").do(rename_and_backup_logs_s3)
    while True:
        schedule.run_pending()
        time.sleep(60)  # wait one minute
Run the script:
(env)$ python3 backup_logs_s3.py
In a production environment, use a process manager such as supervisord to start the script and keep it running.
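A minimal supervisord program entry might look like the sketch below; the paths are placeholders for wherever you created the project and virtual environment:

```ini
[program:backup_logs_s3]
command=/path/to/project/env/bin/python backup_logs_s3.py
directory=/path/to/project
autostart=true
autorestart=true
```

Pointing command at the virtual environment’s python ensures the schedule and boto3 packages installed earlier are found.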
Resources:
Create an S3 bucket: https://docs.aws.amazon.com/quickstarts/latest/s3backup/step-1-create-bucket.html
Create an IAM user: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html
User permission best practice: https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege