AWS S3 SELECT, Aurora Multi-Master & Serverless, DynamoDB Backup & Global Tables, and Neptune Overview
The article introduces AWS S3 SELECT for in‑object SQL filtering, highlights Aurora's new Multi‑Master and Serverless capabilities, describes DynamoDB's on‑demand backup and Global Tables, and provides a brief overview of the managed graph database Neptune, emphasizing performance and cost benefits.
Storage
As more users adopt S3 as a data lake, AWS has released S3 SELECT, which pushes SQL SELECT statements down to S3 objects so that only the matching data is returned. This reduces data transfer and, according to AWS, delivers up to a fourfold performance improvement.
# Official sample code
import boto3
from s3select import ResponseHandler


class PrintingResponseHandler(ResponseHandler):
    # Print each batch of returned records as UTF-8 text.
    def handle_records(self, record_data):
        print(record_data.decode('utf-8'))


handler = PrintingResponseHandler()
s3 = boto3.client('s3')
response = s3.select_object_content(
    Bucket="super-secret-reinvent-stuff",
    Key="stuff.csv",
    SelectRequest={
        'ExpressionType': 'SQL',
        # Select the first column of every row in the CSV object.
        'Expression': 'SELECT s._1 FROM S3Object AS s',
        'InputSerialization': {
            'CompressionType': 'NONE',
            'CSV': {
                'FileHeaderInfo': 'IGNORE',
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            }
        },
        'OutputSerialization': {
            'CSV': {
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            }
        }
    }
)
# Stream the result through the handler defined above.
handler.handle_response(response['Body'])

The preview currently supports CSV and JSON objects, and support for more formats is expected. S3 SELECT is free during the preview, and for Athena users it can significantly reduce the amount of data scanned and therefore lower costs.
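The sample above appears to depend on a preview-only `s3select` helper module and a `SelectRequest` wrapper. In the boto3 API as it later shipped, the same settings are passed as top-level keyword arguments and the result comes back as an event stream under `Payload`. The following is a minimal sketch under that assumption, not the official sample; `collect_records` and `select_first_column` are hypothetical helper names, and the bucket and key in the usage comment are reused from the example.

```python
def collect_records(event_stream):
    """Join the payload bytes of every Records event into one string."""
    chunks = []
    for event in event_stream:
        # Other event types (Stats, Progress, End, ...) carry no row data.
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])
    # Decode once at the end so a multi-byte character split across
    # two chunks is still decoded correctly.
    return b''.join(chunks).decode('utf-8')


def select_first_column(s3_client, bucket, key):
    """Return the first column of a CSV object as newline-delimited text."""
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        Expression='SELECT s._1 FROM S3Object AS s',
        InputSerialization={
            'CompressionType': 'NONE',
            'CSV': {
                'FileHeaderInfo': 'IGNORE',
                'RecordDelimiter': '\n',
                'FieldDelimiter': ',',
            },
        },
        OutputSerialization={
            'CSV': {'RecordDelimiter': '\n', 'FieldDelimiter': ','},
        },
    )
    return collect_records(response['Payload'])


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client('s3')
#   print(select_first_column(s3, 'super-secret-reinvent-stuff', 'stuff.csv'))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub, without touching AWS.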
For the feature to pay off, the surrounding ecosystem must adopt it as well; vendors such as Cloudera and Databricks are expected to add support.
Glacier offers a similar capability called Glacier SELECT, which pushes SELECT logic to Glacier for faster data access.
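As a rough illustration of what pushing SELECT logic to Glacier might look like from boto3, the sketch below builds the `jobParameters` dictionary for a `select` job submitted through `initiate_job`. The parameter layout reflects the Glacier API as we recall it and should be checked against the current documentation; `build_glacier_select_job`, the vault name, the bucket names, and the `FROM archive` table name in the usage comment are all assumptions for illustration.

```python
def build_glacier_select_job(archive_id, expression, output_bucket,
                             output_prefix, tier='Standard'):
    """Build jobParameters for a Glacier select job (assumed layout)."""
    return {
        'Type': 'select',
        'ArchiveId': archive_id,
        'Tier': tier,  # retrieval tier, e.g. 'Expedited', 'Standard', 'Bulk'
        'SelectParameters': {
            'InputSerialization': {'csv': {'FileHeaderInfo': 'IGNORE'}},
            'ExpressionType': 'SQL',
            'Expression': expression,
            'OutputSerialization': {'csv': {}},
        },
        # Results are written to S3 rather than returned inline.
        'OutputLocation': {
            'S3': {'BucketName': output_bucket, 'Prefix': output_prefix},
        },
    }


# Usage (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   glacier = boto3.client('glacier')
#   params = build_glacier_select_job(
#       'example-archive-id', 'SELECT s._1 FROM archive s',
#       'example-output-bucket', 'select-results/')
#   glacier.initiate_job(vaultName='example-vault', jobParameters=params)
```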
Databases
Database updates include:
Aurora now supports Multi‑Master and Serverless.
DynamoDB adds on‑demand backup/recovery and Global Tables.
Amazon Neptune, a managed graph database, has been launched.
Aurora Multi‑Master
Previously Aurora allowed only a single master; it now supports multiple masters spread across Availability Zones, providing automatic failover with zero downtime. Write throughput is expected to improve, but how much latency conflict resolution adds remains to be seen.
Aurora Serverless
Aurora Serverless offers true on‑demand, pay‑as‑you‑go OLTP capabilities, automatically scaling compute resources based on load without user‑managed nodes.
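To make "scaling compute based on load" a little more concrete, here is a sketch of the request one might send to create a serverless cluster, based on the RDS API as it later shipped rather than on anything shown at the announcement. The capacity bounds and auto-pause settings are illustrative defaults, and `serverless_cluster_request` is a hypothetical helper name.

```python
def serverless_cluster_request(cluster_id, username, password,
                               min_acu=2, max_acu=16,
                               pause_after_seconds=300):
    """Build keyword arguments for rds.create_db_cluster in serverless mode."""
    if min_acu > max_acu:
        raise ValueError('min_acu must not exceed max_acu')
    return {
        'DBClusterIdentifier': cluster_id,
        'Engine': 'aurora',
        'EngineMode': 'serverless',
        'MasterUsername': username,
        'MasterUserPassword': password,
        'ScalingConfiguration': {
            # Capacity floats between these bounds as load changes.
            'MinCapacity': min_acu,
            'MaxCapacity': max_acu,
            # Pause compute entirely after a quiet period.
            'AutoPause': True,
            'SecondsUntilAutoPause': pause_after_seconds,
        },
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   rds = boto3.client('rds')
#   rds.create_db_cluster(**serverless_cluster_request(
#       'example-cluster', 'admin', password_from_secret_store))
```

The only knobs exposed are capacity bounds and the pause timeout, which matches the idea that there are no user-managed nodes to size.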
For more Aurora details, see the AWS Database team session “Deep Dive on the Amazon Aurora MySQL version”.
DynamoDB
DynamoDB now offers a one‑click backup feature that can back up tables of any size in seconds, even up to petabyte scale.
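Programmatically, an on-demand backup and a later restore can be sketched with two boto3 calls, `create_backup` and `restore_table_from_backup`. The helper names `backup_table` and `restore_table` and the table names in the usage comment are hypothetical; treat this as a sketch rather than a complete backup workflow (it does no waiting or error handling).

```python
def backup_table(dynamodb_client, table_name, backup_name):
    """Start an on-demand backup and return its ARN."""
    response = dynamodb_client.create_backup(
        TableName=table_name,
        BackupName=backup_name,
    )
    return response['BackupDetails']['BackupArn']


def restore_table(dynamodb_client, backup_arn, target_table_name):
    """Restore a backup into a new table and return the new table's status."""
    response = dynamodb_client.restore_table_from_backup(
        TargetTableName=target_table_name,
        BackupArn=backup_arn,
    )
    return response['TableDescription']['TableStatus']


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   ddb = boto3.client('dynamodb')
#   arn = backup_table(ddb, 'example-table', 'example-table-backup')
#   restore_table(ddb, arn, 'example-table-restored')
```

A restore creates a new table instead of overwriting the source, so the target name must differ from the original.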
Global Tables provide cross‑region data availability with automatic conflict resolution, akin to an AWS‑managed version of Google Spanner.
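The automatic conflict resolution is documented by AWS as a last-writer-wins rule: when two regions update the same item concurrently, the most recent write prevails. The toy model below illustrates that rule only; it is not how the service is implemented, and the `Version` type, the timestamp field, and the tie-break on region name are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    updated_at: float  # write timestamp in seconds (assumed field)
    region: str


def last_writer_wins(a, b):
    """Return the version that survives when two replicas disagree."""
    # The later timestamp wins; the region name breaks exact ties so that
    # every replica picks the same winner (a choice made for this sketch).
    return max(a, b, key=lambda v: (v.updated_at, v.region))


# Example: two regions write the same item one second apart.
east = Version('shipped', updated_at=100.0, region='us-east-1')
west = Version('cancelled', updated_at=101.0, region='us-west-2')
winner = last_writer_wins(east, west)  # the us-west-2 write survives
```

The practical consequence is that an older concurrent write is silently discarded, which applications with strict ordering requirements need to account for.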
Amazon Neptune
Neptune is a fully managed graph database with standard access interfaces, positioned as the “Aurora of the graph database space”.
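One of those standard interfaces is SPARQL over HTTP (Gremlin is the other). The sketch below only builds the HTTP request for a SPARQL query and sends nothing; the endpoint host in the usage comment is a placeholder, port 8182 and the `/sparql` path are Neptune's documented defaults as far as we know, and `build_sparql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query, port=8182):
    """Build a POST request carrying a SPARQL query as form data."""
    url = f'https://{endpoint}:{port}/sparql'
    body = urlencode({'query': query}).encode('utf-8')
    return Request(
        url,
        data=body,
        method='POST',
        headers={
            'Content-Type': 'application/x-www-form-urlencoded',
            'Accept': 'application/sparql-results+json',
        },
    )


# Usage (the cluster endpoint is reachable only from inside its VPC):
#   from urllib.request import urlopen
#   req = build_sparql_request('example-cluster-endpoint',
#                              'SELECT ?s WHERE { ?s ?p ?o } LIMIT 10')
#   with urlopen(req) as resp:
#       print(resp.read().decode('utf-8'))
```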
[1] https://www.youtube.com/watch?v=rPmKo2g9znA
Stay tuned for more engineer insights in the coming days.
Liulishuo Tech Team
Help everyone become a global citizen!