How to Copy Files from AWS S3 to Your Local Machine and Vice Versa Using aws s3 sync
Prerequisites
Before we begin, ensure that you have the following:
- An AWS account with access to S3.
- AWS Command Line Interface (CLI) installed on your local machine.
- Configured AWS CLI with your credentials.
Step 1: Install AWS CLI
If you haven’t installed AWS CLI on your local machine, you can do so by following the instructions on the official AWS CLI User Guide.
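If you’re not sure whether the installation succeeded, you can check the installed version from a terminal (the exact output varies by platform and CLI version):
aws --version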
Step 2: Configure AWS CLI
Once you’ve installed AWS CLI, you need to configure it with your AWS credentials. You can do this by running the following command:
aws configure
You’ll be prompted to enter your AWS Access Key ID, Secret Access Key, default region name, and default output format like below:
$ aws configure
AWS Access Key ID [None]: accesskey
AWS Secret Access Key [None]: secretkey
Default region name [None]: us-west-2
Default output format [None]:
- For the Access Key ID and Secret Access Key, you can find or create them under Security Credentials in the account name dropdown menu of the AWS Console.
- For the Default Region, you can set your preferred region from the Settings icon under More User Settings, or simply enter the region you work in most often (for example, us-west-2).
- For the Default Output Format, choose how you want CLI responses rendered, such as json, text, or table.
If you’re unsure about these, you can find more information in the AWS documentation.
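To double-check what the CLI will actually use, you can print the active configuration with the standard aws configure list command. It shows the profile, the access keys (partially masked), and the region currently in effect:
aws configure list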
We now have two methods to copy files: the cp command, or the aws s3 sync command, which is more powerful and flexible when working with S3 buckets.
Method 1: The cp command
- List Your S3 Buckets
Before copying files, you need to know which S3 buckets are available. You can list all your S3 buckets using the following command:
aws s3 ls
- Copy Files from S3 to Your Local Machine
Now that you’ve listed your S3 buckets, you can copy files from any of these buckets to your local machine. The command to do this is:
aws s3 cp s3://your-bucket-name/your-file-name /path/to/local/directory
Replace your-bucket-name and your-file-name with the name of your S3 bucket and the file you want to copy, respectively, and replace /path/to/local/directory with the path to the directory on your local machine where you want the file copied.
- Verify the Copy
After the copy operation, it’s always good practice to verify that the file has been copied correctly. You can do this by checking the contents of the local directory where you copied the file. A complete worked example of all three steps is shown after this list.
ls /path/to/local/directory
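Putting the three steps together, a complete Method 1 session might look like the following. The bucket name my-data-bucket, the object key reports/2024-report.csv, and the local path ~/data are hypothetical; replace them with your own values.
aws s3 ls
aws s3 cp s3://my-data-bucket/reports/2024-report.csv ~/data/
ls ~/data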
Method 2: AWS S3 Sync
The AWS CLI provides the “aws s3 sync” command, making it simple to transfer files between your local machine and S3 in both directions, or directly between different buckets. It comes with various flags and options to fulfill all your synchronization needs.
This is the general syntax, without any of the flags and options that we will explore later on:
aws s3 sync <source> <destination>
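One option worth knowing from the start is --dryrun, which makes sync print the operations it would perform without transferring anything. For example, to preview a download (the bucket name and path are placeholders):
aws s3 sync s3://mybucket ~/Downloads --dryrun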
Downloading files from a bucket to your local machine
We can download all files from a bucket, or from a specific folder within it, to our local machine. The sync command is recursive by default, so nested folders and their files are downloaded as well (unlike cp, sync does not take a --recursive flag).
aws s3 sync s3://mybucket ~/Downloads
The S3 sync command will skip empty folders in both upload and download. This means that there won’t be a folder creation at the destination if the source folder does not include any files.
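If you only need a subset of the files, sync also accepts --exclude and --include filters, which are applied in the order given. For example, to download only the CSV files from the bucket (the pattern and paths are illustrative):
aws s3 sync s3://mybucket ~/Downloads --exclude "*" --include "*.csv"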
Uploading files to a bucket
This also works in the other direction by switching our parameters.
aws s3 sync ~/Downloads s3://mybucket
By default, aws s3 sync only uploads files that are new or have changed (it compares file size and modification time), so unchanged files are skipped. Files that already exist in the bucket and have changed are overwritten, or, if versioning is enabled, saved as a new version.
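The same filters work for uploads. For instance, to upload a directory while skipping temporary files (the *.tmp pattern is just an example):
aws s3 sync ~/Downloads s3://mybucket --exclude "*.tmp"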
Syncing files between buckets
You can also copy files between two buckets.
aws s3 sync s3://source-bucket s3://target-bucket
This removes the intermediate step of explicitly downloading the files to your local machine from the source bucket and only then uploading them afterward to your target bucket.
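If you want the target bucket to become an exact mirror of the source, you can add the --delete flag, which also removes objects in the destination that no longer exist in the source. Use it with care, since it deletes data in the target bucket:
aws s3 sync s3://source-bucket s3://target-bucket --delete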
Conclusion
Copying files between AWS S3 and your local machine is a straightforward process once you’ve installed and configured the AWS CLI. This guide has walked you through two different methods: the simple cp command and the more powerful aws s3 sync command.
Remember, working with AWS S3 and other cloud storage services can greatly enhance your data science workflows. However, always ensure that you’re following best practices for data security and management.