As an experienced AWS specialist transitioning to Azure, I encountered a well-known challenge in a new environment. One notable issue was the high transaction costs in Azure Table Storage. Leveraging my AWS expertise, I developed a custom solution that efficiently cleans Azure Tables while minimizing transaction expenses. In this blog post, I’ll be sharing the insights from my journey.
Archive Data Problem
Keeping old data in the cloud can be costly and can hurt performance. Our long-standing customer relies heavily on IoT devices that deliver data to an Azure Storage Table. With over 10 million entities added daily and no retention policy in place, a massive amount of data has accumulated over time. This has had a significant effect on both cost and performance, especially on the transaction cost of data inserts. The difficult part is reducing these costs while keeping the system working as before. Here are the customer's current costs.
Note: The current rate is $0.045 per GB, which is $46.08 per TB
Operation and data transfer price: $0.0004 per 10K insert transactions (varies by tier)
**Data storage and transaction pricing shown here is for tables encrypted with an account-scoped key, that is, a key scoped to the storage account so that a customer-managed key can be configured for encryption at rest.
The end goal is to lower the data transaction cost and implement a retention period of 1 month, which means data older than 1 month should be removed.
Current Total Cost over 1 year
Total Storage Cost: the monthly storage bill grows by $13.50 every month because nothing is deleted, so over 12 months it adds up to $13.50 + $27 + ... + $162 = $1053 USD
Total Transaction Cost: $144 USD
Grand Total: $1053 + $144 = $1197 USD
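The yearly figures can be reproduced with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in this post; the 30-day month and the variable names are assumptions made for the estimate, not billing rules.

```python
# Inputs taken from the figures quoted in this post.
MONTHLY_STORAGE_GROWTH_USD = 13.5    # the storage bill grows by this much each month
ENTITIES_PER_DAY = 10_000_000        # one insert transaction per entity
PRICE_PER_10K_TRANSACTIONS = 0.0004  # USD, single insert transactions
DAYS_PER_MONTH = 30                  # assumption: simplified month length
MONTHS = 12

# Without retention, month n is billed for n months of accumulated data.
storage_cost = sum(MONTHLY_STORAGE_GROWTH_USD * month
                   for month in range(1, MONTHS + 1))

# Transactions are priced per block of 10,000.
daily_transaction_cost = ENTITIES_PER_DAY / 10_000 * PRICE_PER_10K_TRANSACTIONS
transaction_cost = daily_transaction_cost * DAYS_PER_MONTH * MONTHS

print(f"Storage:      ${storage_cost:,.2f}")
print(f"Transactions: ${transaction_cost:,.2f}")
print(f"Total:        ${storage_cost + transaction_cost:,.2f}")
```

The storage term is an arithmetic series, which is why a modest monthly increase turns into the largest line item within a year.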
Before diving into the solution, let's briefly review Azure Table Storage.
Understanding Azure Table Storage
- It's a cloud-based NoSQL datastore for structured data
- Offers a key/attribute store with a schemaless design
- Provides fast and cost-effective access for many applications
- Ideal for storing flexible datasets like user data, device information, and metadata
- Can store terabytes of structured data
- Supports authenticated calls from inside and outside the Azure cloud
Writes can also be grouped into entity group (batch) transactions. An entity group transaction must meet the following requirements:
- All entities in the transaction must have the same PartitionKey value.
- An entity can appear only once in a transaction, and only one operation may be performed against it.
- The transaction's overall payload size cannot exceed 4 MiB, and it can contain up to 100 entities in total.
- All entities are subject to the limitations described in Understanding the Table Service Data Model.
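The requirements above can be checked locally before a batch is sent. The helper below is a hypothetical sketch, not part of any SDK: `check_batch` and its constants are names introduced here, and the payload size is only approximated from a JSON encoding of each entity, since the service measures the real request body.

```python
import json

MAX_ENTITIES = 100                   # entity limit per transaction
MAX_PAYLOAD_BYTES = 4 * 1024 * 1024  # 4 MiB payload limit


def check_batch(operations):
    """Return a list of rule violations for a proposed transaction.

    `operations` is a list of (operation_name, entity_dict) pairs, where
    every entity has "PartitionKey" and "RowKey" entries.
    """
    problems = []

    if len(operations) > MAX_ENTITIES:
        problems.append(f"{len(operations)} entities exceed the limit of {MAX_ENTITIES}")

    partition_keys = {entity["PartitionKey"] for _, entity in operations}
    if len(partition_keys) > 1:
        problems.append(f"multiple PartitionKey values: {sorted(partition_keys)}")

    seen = set()
    for _, entity in operations:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in seen:
            problems.append(f"entity {key} appears more than once")
        seen.add(key)

    # Rough estimate only: serialise each entity as JSON and add up the bytes.
    approx_size = sum(len(json.dumps(entity, default=str).encode("utf-8"))
                      for _, entity in operations)
    if approx_size > MAX_PAYLOAD_BYTES:
        problems.append(f"approximate payload of {approx_size} bytes exceeds 4 MiB")

    return problems
```

An empty list means the batch passes these local checks; the service can still reject it for reasons not covered here.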
Crafting a Custom Solution
We discussed this with the customer, and it turned out they only insert into the table and never use the data after a month. In addition, every insertion into the Azure table was about 2 KB and was sent as a single-entity insert, which was costing a lot.
So we buffered the rows to be inserted in groups of 2000 and wrote them with batch write transactions instead of single inserts.
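A minimal sketch of such a buffered writer is shown below. It assumes `table_client` behaves like `azure.data.tables.TableClient`, whose `submit_transaction` accepts a list of `(operation, entity)` pairs. The class name `BufferedTableWriter` is hypothetical, and splitting each 2000-row buffer into per-partition batches of at most 100 entities is an assumption made to respect the transaction limits listed earlier. It uses `"upsert"` rather than `"create"` so that retrying a failed flush does not fail on rows that were already written; retries and error handling are left out.

```python
from collections import defaultdict

MAX_BATCH_SIZE = 100  # entity limit per entity group transaction


class BufferedTableWriter:
    """Buffers entities and writes them as per-partition batch transactions."""

    def __init__(self, table_client, buffer_size=2000):
        self._client = table_client       # assumed TableClient-like object
        self._buffer_size = buffer_size
        self._buffer = []

    def add(self, entity):
        """Queue one entity; flush automatically when the buffer is full."""
        self._buffer.append(entity)
        if len(self._buffer) >= self._buffer_size:
            self.flush()

    def flush(self):
        """Write all buffered rows; returns the number of transactions sent."""
        # A transaction may only touch one partition, so group rows first.
        by_partition = defaultdict(list)
        for entity in self._buffer:
            by_partition[entity["PartitionKey"]].append(entity)

        sent = 0
        for rows in by_partition.values():
            # Split each partition's rows into chunks of at most 100 entities.
            for start in range(0, len(rows), MAX_BATCH_SIZE):
                chunk = rows[start:start + MAX_BATCH_SIZE]
                self._client.submit_transaction([("upsert", row) for row in chunk])
                sent += 1

        self._buffer.clear()
        return sent
```

Call `flush()` once more on shutdown so that a partially filled buffer is not lost. How much this saves depends on how many rows share a partition key, because rows from different partitions can never share a transaction.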
To delete older data, I leveraged my AWS experience to create a custom solution for removing outdated records from Azure Tables.
Data Identification: Leveraging timestamps and partition keys to efficiently locate outdated entries.
Here's a snippet of the logic that identifies entries older than 6 months:
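A minimal sketch of this step, assuming `table_client` behaves like `azure.data.tables.TableClient` and exposes `query_entities(query_filter, select=...)`. The function names are hypothetical. The retention window is a parameter: 180 days approximates six months, and 30 would match the one-month retention goal. Filtering on `Timestamp` alone scans the whole table, so the optional `partition_key` argument narrows the query when the partition is known.

```python
from datetime import datetime, timedelta, timezone


def build_cutoff_filter(retention_days, now=None, partition_key=None):
    """Build an OData filter matching entities older than the retention window.

    `now` must be timezone-aware when given; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).astimezone(timezone.utc)
    stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
    query_filter = f"Timestamp lt datetime'{stamp}'"

    if partition_key is not None:
        # Single quotes inside OData string literals are escaped by doubling.
        escaped = partition_key.replace("'", "''")
        query_filter = f"PartitionKey eq '{escaped}' and {query_filter}"
    return query_filter


def find_expired_entities(table_client, retention_days=180, partition_key=None):
    """Return an iterable of expired entities, fetching only their keys."""
    query_filter = build_cutoff_filter(retention_days, partition_key=partition_key)
    # Only the keys are needed for deletion, so skip the other properties.
    return table_client.query_entities(
        query_filter=query_filter,
        select=["PartitionKey", "RowKey"],
    )
```

If the partition key already encodes a date or device ID, a range filter on `PartitionKey` is usually cheaper than relying on `Timestamp`, since the partition key is indexed.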
Deletion Strategy: Using the Azure Python SDK to automate the deletion process, inspired by AWS Boto3.
Here's a snippet of the core deletion logic:
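A minimal sketch of the deletion step, assuming `table_client` behaves like `azure.data.tables.TableClient` and that `entities` is any iterable of dictionaries carrying `PartitionKey` and `RowKey`, such as the result of a query. The function name `delete_entities` is hypothetical. Deletes are grouped per partition and sent in batches of up to 100 so that each batch is billed as a single transaction; retries and error handling are left out.

```python
def delete_entities(table_client, entities, batch_size=100):
    """Delete entities in per-partition batch transactions.

    Returns a (deleted_count, transaction_count) tuple.
    """
    pending = {}  # PartitionKey -> list of queued delete operations
    deleted = 0
    transactions = 0

    def submit(operations):
        nonlocal deleted, transactions
        table_client.submit_transaction(operations)
        deleted += len(operations)
        transactions += 1

    for entity in entities:
        partition = entity["PartitionKey"]
        # Only the keys are required to address the entity being deleted.
        key = {"PartitionKey": partition, "RowKey": entity["RowKey"]}
        operations = pending.setdefault(partition, [])
        operations.append(("delete", key))

        # Send the batch as soon as this partition reaches the size limit.
        if len(operations) == batch_size:
            submit(operations)
            del pending[partition]

    # Send whatever is left over for each partition.
    for operations in pending.values():
        submit(operations)

    return deleted, transactions
```

Because the input is consumed lazily, the function can be fed directly from a paged query without loading every expired entity into memory first.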
This automated approach allows for periodic cleanups without manual intervention, keeping tables lean and efficient.
Now, the current cost looks like this:
Note: The current rate is $0.045 per GB, which is $46.08 per TB
Operation and data transfer price: $0.075 per 10K batch insert transactions (varies by tier)
New Total Cost Over 1 Year
- Total Storage Cost: $162 USD
- Total Transaction Cost: $27 USD
Grand Total: $162+$27=$189 USD
Cost Savings
Total cost savings per year from deleting old data every month = $1053 - $162 = $891
Total cost savings from moving from single insertion to batched insertion = $144 - $27 = $117
Total cost savings = ($1053+$144) - ($162 + $27) = $1008 (84.21% cost savings)
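The savings can be double-checked with a short calculation. The sketch below uses only the yearly totals quoted in this post; the variable names are illustrative.

```python
# Yearly totals quoted in this post, in USD.
old_storage, old_transactions = 1053, 144
new_storage, new_transactions = 162, 27

# With one-month retention the storage bill stays flat at $13.50 per month.
assert new_storage == 13.5 * 12

old_total = old_storage + old_transactions
new_total = new_storage + new_transactions
savings = old_total - new_total
savings_percent = savings / old_total * 100

print(f"Before: ${old_total}, after: ${new_total}")
print(f"Saved:  ${savings} ({savings_percent:.2f}%)")
```

Most of the reduction comes from retention rather than batching: $891 of the $1008 saved is storage.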
You would save about 84% on your bill by implementing a tailored retention solution that removes old data every month and by using batch transactions. This approach makes regular data maintenance possible and greatly reduces both storage and transaction costs.
Benefits of the Custom Solution
- Cost Reduction: Storage expenses for out-of-date entries dropped significantly.
- Performance Improvement: Data retrieval and queries became noticeably faster.
- Automation: Less manual intervention is needed in data management activities.
- Scalability: The solution can be adapted to accommodate increasing volumes of data.
Conclusion
Transitioning from AWS to Azure presented unique challenges as well as opportunities to leverage cross-platform experience. By developing this solution and optimizing the cost, we not only resolved a specific problem with Azure Table Storage but also demonstrated that cloud expertise transfers across platforms.
Regardless of the platform you are using, effective data management is essential to getting the best performance from your cloud while keeping expenses under control.
[1]: Please check the most recent pricing, as it may vary: official documentation
Code for Reference: