As it does every year, AWS re:Invent week brought a host of new announcements; you can find the complete list here.
At Altis Consulting we focus on data and what we can do with it, so don’t expect full hardware specs or a comparison of the newly released EC2 instance families. If you are still keen to talk gear, drop me an email and we can chat when no one is watching. Instead, this post looks at two new services arriving in the big data and analytics arena.
The first one, AWS Athena, is already available in a few US regions. This service lets you query data stored in S3 directly, without loading it anywhere first. Queries are written in standard SQL, and supported file formats include CSV, JSON and Parquet. Athena integrates with AWS QuickSight, or any JDBC-compatible BI tool, for visualisation of the output.
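To give a feel for what "standard SQL over files in S3" could look like, here is a minimal sketch in Python that only builds the two statements as strings: a Hive-style external table definition pointing at a folder of CSV files, and an ad-hoc aggregate query over it. Everything named here is an assumption made up for illustration (the `sales_lake` database, the `orders` table, its columns and the `s3://example-bucket/orders/` location), and the sketch does not call AWS at all; you would paste the resulting SQL into the Athena query editor or send it through a JDBC connection.

```python
# Illustrative sketch only: all database, table, column and bucket names
# below are hypothetical placeholders, not taken from the announcement.


def external_table_ddl(database, table, columns, s3_location):
    """Build Hive-style DDL that maps a table definition onto CSV files in S3.

    `columns` is a list of (name, sql_type) pairs.
    """
    column_list = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_list}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def top_customers_query(database, table, limit=10):
    """Build an ad-hoc query ranking customers by total order amount."""
    return (
        "SELECT customer_id, SUM(amount) AS total_amount\n"
        f"FROM {database}.{table}\n"
        "GROUP BY customer_id\n"
        "ORDER BY total_amount DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Assumed layout: CSV files with two columns under a single S3 prefix.
    print(external_table_ddl(
        "sales_lake",
        "orders",
        [("customer_id", "string"), ("amount", "double")],
        "s3://example-bucket/orders/",
    ))
    print()
    print(top_customers_query("sales_lake", "orders"))
```

The point of the sketch is that the table definition is just metadata laid over the files: nothing is copied out of S3, and the query runs against the data where it already sits.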
In conversations about data lakes, customers often ask: “Once I’ve got all my data stored in my Data Lake on S3, how do I access it?” The usual answer is that further processing needs to be done on EMR, or that the data can be pushed to Redshift and then analysed with a data discovery tool. Well, AWS Athena will certainly change this conversation.
Of course, there is still a place for data analytics heavyweights like Redshift and EMR, and Athena certainly won’t replace either of them. Rather, they work together in a bi-modal way, with Mode 1 addressed by EMR and Redshift and Mode 2 supported by Athena for ad-hoc querying of the data. If you are unfamiliar with bi-modal BI, I suggest you check out point 6 of this blog post by my colleague James Black.
The other service that was announced is AWS Glue. It is not available yet, but it is presented as ETL as a service. There is not a lot of information available at the moment, but it seems that ETL processes can be created without any scripting, although the generated scripts can be modified if needed. As expected, it will integrate with most AWS data stores (S3, RDS, Redshift) and crawl them to generate a catalogue of the data available in your account.
These are exciting times, as we can now envisage a complete, fully managed bi-modal big data and analytics platform on AWS, all of it without the need to spin up a single EC2 instance.
One last thing: you can now move loads of data to AWS using a truck!