Introduction to how AWS data transfer pricing works, pitfalls to watch out for, strategies to reduce costs for different types of data transfer, and how Cloudthread can help.
I’ve worked on various cloud migration projects, moving workloads from on-premises data centers to AWS, and I’ve had to estimate the cost of moving and running those workloads. I’ve observed that people often ignore the so-called ‘hidden’ aspects of cost, like the amount of data transferred in, out, and between workload components. Instead, they focus on resource costs: the running cost of EC2 instances, the cost of a managed database service like RDS, AWS Lambda charges based on function execution time and memory allocation, DynamoDB read capacity unit (RCU) / write capacity unit (WCU) costs, and so on.
For simple workloads that provision only a few AWS resources, this is usually fine, and the actual cost of running them on AWS should land close to our initial estimates. But highly complex workloads are different. When EC2 instances are spread across multiple availability zones or regions, whether for high availability or because compliance requires running in the same region as our target users, we need to pay close attention to the amount of data moving between the components of the workload. If we ignore the data being transferred in, out, and between our AWS deployments, we can very quickly rack up an unexpectedly large AWS bill, and our actual spend can end up far above our initial estimates.
In the rest of this article we will walk through the different AWS data transfer charges and the precautions we can take to avoid an unexpectedly large AWS bill.
Tags are metadata attached to AWS resources to identify and organize them. A tag is a unique key-value pair carrying information about the tagged resource, such as the user who created it or whether it belongs to a Test or Production stack. Tags can either be generated by AWS at resource creation or be user-defined.
Cost allocation tags are a type of tag used to segregate and organize AWS resource costs. They group resource costs inside the cost allocation report, for example to see how much our Test and Production resources each cost, or other such groupings of related resource costs.
Data transfer costs can relate to data moved between multiple AWS resources, so they are often not associated with any single resource. That makes them hard to organize and attribute using tags and cost allocation tags, unlike plain resource usage costs.
It is also difficult to isolate what a given data transfer charge relates to: inter-availability-zone traffic, inbound or outbound traffic over the internet, or region-to-region traffic.
Finally, there is a lot of variability in data transfer pricing. The rate depends on the destination of the outgoing data, and it is hard to keep track of every possible cost for every possible destination.
Data coming into AWS from the public internet is not charged, so we are free to move as much data as we want into AWS this way. Data transferred out of AWS over the internet is charged differently depending on the service and the region it leaves from, though the first 100 GB of outbound data per month, aggregated across all AWS services and regions, is free. Also, the more data we transfer out of AWS, the cheaper the per-gigabyte price gets. The chart below shows the EC2 outbound data transfer pricing per GB for the us-east-1 (N. Virginia) region.
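The tiered scheme is easy to get wrong when estimating, because each slice of traffic is billed at its own tier's rate rather than the whole volume at one rate. As a sketch, here is how a month's EC2 outbound bill for us-east-1 would be computed, using the tier rates in effect at the time of writing (check the current AWS pricing page before relying on these numbers):

```python
# Illustrative EC2 outbound-to-internet tiers for us-east-1.
# The first 100 GB/month (aggregate across services) is free.
TIERS = [                      # (tier size in GB, $/GB)
    (100, 0.000),              # monthly free tier
    (10 * 1024 - 100, 0.090),  # up to 10 TB
    (40 * 1024, 0.085),        # next 40 TB
    (100 * 1024, 0.070),       # next 100 TB
    (float("inf"), 0.050),     # beyond 150 TB
]

def outbound_cost(gb: float) -> float:
    """Charge each slice of the month's traffic at its tier's rate."""
    total, remaining = 0.0, gb
    for size, rate in TIERS:
        slice_gb = min(remaining, size)
        total += slice_gb * rate
        remaining -= slice_gb
        if remaining <= 0:
            break
    return round(total, 2)

print(outbound_cost(100))   # 0.0   -- fully inside the free tier
print(outbound_cost(1124))  # 92.16 -- 1 TB billed at $0.09/GB
```

Note that the free 100 GB comes off the top, so the first billable gigabyte is the 101st.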
The general advice here is to minimize outbound data transfer as much as possible, but for businesses built on serving data to their customers that may not be an option. In such cases we can use a content distribution service like Amazon CloudFront to cache static application content, such as images and videos, closer to our users and reduce outgoing data costs. Data transfer from CloudFront edge locations is cheaper than data transfer directly from AWS regions; the cheapest US-based rate is $0.020/GB for data transferred through CloudFront in excess of 5 PB per month. CloudFront also has a generous free tier of 1 TB of data transfer each month, but we obviously have to pay for using the service, and that charge depends on the set of worldwide AWS edge locations we want to cache our data on.
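To make the CloudFront saving concrete, here is a rough comparison for 10 TB/month of static content. The $0.09/GB EC2 rate and $0.085/GB CloudFront rate used below are illustrative first-tier US figures at the time of writing, not guaranteed current prices; CloudFront's 1 TB/month free tier is applied first, and transfer from the origin into CloudFront is free:

```python
# Assumed illustrative first-tier US rates -- verify on the AWS pricing pages.
EC2_OUT_PER_GB = 0.090   # EC2 -> internet
CF_OUT_PER_GB = 0.085    # CloudFront edge -> internet, US
CF_FREE_GB = 1024        # CloudFront free tier: 1 TB/month

def direct_cost(gb: float) -> float:
    """Serve everything straight from EC2."""
    return round(gb * EC2_OUT_PER_GB, 2)

def cloudfront_cost(gb: float) -> float:
    """Serve through CloudFront; the free tier comes off the top."""
    return round(max(gb - CF_FREE_GB, 0) * CF_OUT_PER_GB, 2)

gb = 10 * 1024
print(direct_cost(gb))      # 921.6
print(cloudfront_cost(gb))  # 783.36
```

The gap widens further at higher volumes as CloudFront's tiers step down toward the $0.020/GB rate mentioned above.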
For private subnets where EC2 instances pull operating system updates and patches from the internet via NAT gateways, we should make sure a NAT gateway is present in every AZ that has private subnets. Otherwise, instances will pull updates through a NAT gateway in a subnet in another availability zone, incurring a cross-availability-zone data transfer fee on both sides ($0.01/GB out + $0.01/GB in).
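A quick sketch of what a misplaced NAT gateway costs for, say, 500 GB/month of update traffic. The NAT data processing charge ($0.045/GB in us-east-1 at the time of writing) and the NAT hourly charge apply in both setups, so the hourly charge is omitted and only the processing and cross-AZ charges are compared:

```python
NAT_PROCESSING_PER_GB = 0.045  # us-east-1 NAT data processing, at time of writing
CROSS_AZ_PER_GB = 0.01 + 0.01  # billed on both the sending and receiving side

def nat_same_az(gb: float) -> float:
    """Traffic stays inside one AZ on its way to the NAT gateway."""
    return round(gb * NAT_PROCESSING_PER_GB, 2)

def nat_other_az(gb: float) -> float:
    """Traffic crosses an AZ boundary to reach the NAT gateway."""
    return round(gb * (NAT_PROCESSING_PER_GB + CROSS_AZ_PER_GB), 2)

print(nat_same_az(500))   # 22.5
print(nat_other_az(500))  # 32.5
```

A roughly 45% overhead here, purely from routing through the wrong AZ.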
When a lot of data is being transferred from EC2 instances over the public internet, consider sending it directly through an internet gateway instead of routing it through a NAT gateway. NAT gateways incur a steep data processing fee, while an internet gateway is free.
If we need to move highly sensitive data between AWS and an on-premises data center or another public cloud provider, and it must travel over a secure channel, we should provision an IPSec Site-to-Site VPN tunnel between the on-premises data center (or the other provider’s VPC) and our AWS VPC, and transfer the data over the VPN connection. In this case, we are charged for each hour the VPN connection is alive and active. All inbound data transferred to AWS over the VPN connection is still free. For outbound data, the standard outbound transfer charges of the AWS service involved still apply.
If we want to transfer huge volumes of data continually between our on-premises data center and AWS, and we don’t want to use the public internet due to bandwidth restrictions, we can establish a Direct Connect connection: a dedicated high-bandwidth network link between AWS and our data center. In this case, we are charged for each hour the Direct Connect connection is active. Beyond that, inbound data transfer over Direct Connect is free, and outbound data transfer costs $0.0200/GB for US-based regions, which is cheaper than even the cheapest data transfer rate from an AWS service over the public internet; for EC2, for example, that is $0.0500/GB for data in excess of 150 TB (us-east-1).
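Using the figures just quoted, here is how the per-GB saving adds up for a workload pushing 50 TB/month out over Direct Connect instead of EC2's cheapest internet tier. Direct Connect port hours are billed separately and depend on port speed, so they are not included:

```python
DX_PER_GB = 0.02                     # Direct Connect outbound, US regions
EC2_CHEAPEST_INTERNET_PER_GB = 0.05  # EC2 -> internet, beyond 150 TB tier

def monthly_saving(gb: float) -> float:
    """Difference in pure transfer cost, ignoring port-hour charges."""
    return round(gb * (EC2_CHEAPEST_INTERNET_PER_GB - DX_PER_GB), 2)

print(monthly_saving(50 * 1024))  # 1536.0
```

At sustained volumes like this, the transfer saving alone can cover the Direct Connect port charges.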
For a one-time transfer of a high volume of data to AWS, we can use one of the Snow family devices: AWS ships a storage device to us, we load our data onto it and ship it back, and AWS moves the data from the device into S3. Snow devices can also be used for outbound data transfer from AWS. Pricing-wise, AWS charges per day for the device, based on its storage capacity. Beyond that, inbound data transfers are free, and outbound data transfer costs $0.0300/GB to selected US-based regions, which is cheaper than transferring the same data over the public internet.
Now let’s talk about the cost of the data transferred between the different components of our workload running inside AWS.
AWS does not charge for any data transferred between workload components within the same availability zone. This covers, for example, data exchanged between two microservices running on two different EC2 instances, or data an application on an EC2 instance reads from an RDS database or an ElastiCache instance in the same availability zone. One thing worth highlighting here: when transferring data internally, always use the destination’s private IP address rather than its public IP address. If destination EC2 public IP addresses are used when transferring data within an availability zone, the transfer is not free, and the standard $0.01/GB charge applies.
For high availability, we might deploy a multi-AZ RDS database, with the primary RDS instance in one AZ and a standby instance in another. The data replication between the primary and the standby across AZs is free. However, if data is transferred to the RDS instance from an EC2 instance in another AZ, the standard outgoing and incoming cross-AZ charges apply ($0.01/GB + $0.01/GB). The general advice is to minimize cross-AZ traffic as much as possible.
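Cross-AZ application traffic adds up quietly. As a small worked example using the $0.01/GB-each-way rate above, an application exchanging 2 TB/month with an RDS instance in another AZ pays:

```python
CROSS_AZ_PER_GB = 0.01 + 0.01  # charged on both the sending and receiving side

def cross_az_cost(gb: float) -> float:
    """Monthly cost of traffic that crosses an AZ boundary."""
    return round(gb * CROSS_AZ_PER_GB, 2)

print(cross_az_cost(2 * 1024))  # 40.96
```

Small per-GB rates, but chatty services moving tens of terabytes across AZs turn them into a visible line item.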
Any data transfer between two application components in separate VPCs in two different AWS regions over a peered connection incurs the standard outgoing inter-region data transfer charge of the source region ($0.01 - $0.02/GB). Incoming data in the destination region is free. In general, we should keep inter-region data transfers to a bare minimum.
Data transfer between AWS services like S3, DynamoDB, etc. and workload components (EC2 instances) running inside a public subnet within the same region, via the service’s public endpoint, is free.
Data transfer between AWS services and workload components (EC2 instances) running inside a public subnet (VPC) in a different region incurs the standard inter-region data transfer charge of the source region ($0.01 - $0.02/GB).
Data transfers between AWS services and workload components running inside a private subnet can either go through a NAT gateway out over the public internet to the service’s public endpoint, or use VPC endpoints, where the traffic never traverses the public internet. S3 and DynamoDB offer Gateway VPC endpoints; all other AWS services use Interface VPC endpoints powered by AWS PrivateLink. Data going to AWS services through a NAT gateway incurs a data processing charge per GB crossing the gateway; for the us-east-1 region this is $0.045/GB, in addition to an hourly service charge of $0.045 for the NAT gateway itself. It is a lot more economical to reach AWS services from a private subnet over a VPC endpoint. Interface VPC endpoints charge $0.01 per GB of data processed, in addition to an hourly running charge of $0.01 per AZ. Gateway VPC endpoints for S3 and DynamoDB carry no hourly or data processing charges at all.
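Putting the three options side by side for 1 TB/month from a private subnet to S3, using the us-east-1 rates quoted here, an assumed 730-hour month, and a single endpoint/gateway in one AZ (per current AWS pricing, gateway endpoints for S3 and DynamoDB are free of charge):

```python
HOURS_PER_MONTH = 730  # assumed average month length
GB = 1024              # 1 TB of traffic to S3

# NAT gateway: hourly charge + per-GB data processing (us-east-1)
nat_gateway = round(HOURS_PER_MONTH * 0.045 + GB * 0.045, 2)

# Interface VPC endpoint (PrivateLink), one AZ: hourly + per-GB processing
interface_ep = round(HOURS_PER_MONTH * 0.010 + GB * 0.010, 2)

# Gateway VPC endpoint: no hourly or per-GB charge for S3/DynamoDB
gateway_ep = 0.00

print(nat_gateway)   # 78.93
print(interface_ep)  # 17.54
print(gateway_ep)    # 0.0
```

For S3 and DynamoDB specifically, a gateway endpoint is essentially always the right answer from a cost standpoint.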
When it comes to data transfer, the challenge is often isolating where costs are coming from, identifying what is driving them, and then tracking data transfer cost efficiency over time. Using Cloudthread’s cost filters we can easily isolate the usage types associated with DataTransfer, identify the culprit service/resource/region/account, save relevant Cost Views, and share overview reports and anomalies with the relevant stakeholders via Slack and email.
As a reminder when you start digging in, the Usage Types most commonly associated with data transfer costs are below:
As we have seen in this article, if we ignore the various data transfer costs and considerations while estimating our AWS costs, we run the risk of getting our estimates completely wrong and severely overshooting our budgets. We have to think through the various data flows and the amount of data being transferred into, out of, and across the components of our AWS deployments. And we have to make sure we have good, in-depth visibility into the data transfer costs we are incurring, something the Cloudthread platform can help us with.