I’m excited about this new productivity solution the nClouds team has created — we’ve made it faster and easier to get started on big data apps with the Quick Start for Amazon EMR by nClouds.
As our clients instrumented their infrastructure and their businesses, we saw them generating enormous volumes of data. A number of big data frameworks can process large data sets across many computers, which makes this data easier to manage and makes analytics apps simpler to build and deploy.
Apache Hadoop is a popular example of such a framework. It pairs a distributed file system (HDFS) with the MapReduce programming model and a stack of supporting components to make large-scale batch processing more accessible. However, implementing the framework can be difficult, time-consuming, and expensive, especially when it comes to deploying, configuring, and managing distributed clusters.
We’ve been using Amazon EMR, a managed Hadoop framework built on the elastic infrastructure of Amazon EC2 and Amazon S3. It makes it easy, fast, and cost-effective to distribute data computation across multiple, dynamically scalable EC2 instances.
Amazon EMR in a nutshell
If you’re new to Amazon EMR, it essentially enables you to run big data frameworks like Apache Hadoop, Apache Spark, HBase, Presto, and Flink on AWS. You can interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB and process it for analytics purposes and business intelligence workloads.
Amazon EMR also securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.
Amazon EMR delivers a powerful set of benefits
- Cost-effective. Amazon EMR pricing depends on the instance type, the number of EC2 instances you deploy, and the region in which you launch your cluster. You can lower costs below on-demand pricing by purchasing Reserved Instances or using Spot Instances. Spot Instances can offer significant savings, as low as a tenth of on-demand pricing in some cases.
- Scalable and flexible. Amazon EMR gives you the flexibility to scale your cluster up or down as your computing needs change. You can resize your cluster to add instances for peak workloads and remove instances to control costs when workloads are lighter. You can also combine different instance types to take advantage of better pricing for one Spot Instance type over another.
- Reliable. Amazon EMR monitors nodes in your cluster and automatically stops and replaces an instance when it fails. Amazon EMR also provides configuration options that allow you to control how your cluster is terminated—automatically or manually.
- Secure. Amazon EMR leverages other AWS services, such as IAM and Amazon VPC, and features such as Amazon EC2 key pairs, to help you secure your clusters and data. You can control inbound and outbound traffic to your EC2 instances and assign security groups to your master and core/task instances for more advanced rules.
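To make these options concrete, here is a minimal sketch of a cluster request shaped like the keyword arguments of boto3's EMR `run_job_flow` call. It only builds and prints the request; nothing is sent to AWS. The function name, cluster name, release label, instance types, bid price, key pair, and subnet ID are all illustrative assumptions, not values from the Quick Start.

```python
import json


def build_cluster_request(name, key_pair, subnet_id, task_nodes=2, spot_bid_usd="0.10"):
    """Build a dict shaped like the arguments to boto3's EMR run_job_flow.

    All concrete values (release label, instance types, bid price) are
    placeholders chosen for illustration.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.30.0",  # assumed release; pick the one you need
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # Master and core nodes on on-demand capacity.
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes on Spot capacity: the cost lever. Changing
                # task_nodes is the simplest way to size for peak workloads.
                {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
                 "BidPrice": spot_bid_usd, "InstanceType": "m5.xlarge",
                 "InstanceCount": task_nodes},
            ],
            "Ec2KeyName": key_pair,    # EC2 key pair for SSH access
            "Ec2SubnetId": subnet_id,  # places the cluster inside your VPC
            # False: terminate automatically once all steps finish.
            # True: keep the cluster running until you terminate it manually.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default IAM instance profile
        "ServiceRole": "EMR_DefaultRole",      # default IAM service role
    }


if __name__ == "__main__":
    request = build_cluster_request("demo-cluster", "my-key-pair", "subnet-0123456789abcdef0")
    print(json.dumps(request, indent=2))
```

With AWS credentials configured, a request like this could be passed to `boto3.client("emr").run_job_flow(**request)`. Whether Spot task nodes suit a given workload depends on how well it tolerates interruptions.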
Quick Start for Amazon EMR by nClouds
We wanted to make it fast and easy to get started with Amazon EMR, so we created a Quick Start for Amazon EMR by nClouds. You can get up and running fast with all your use cases, and we’ve made it really easy to use Spot and Reserved Instance discounts to help you save money.
Go faster and reduce costs — that’s the name of the game:
- Stand up clusters fast using AWS CloudFormation templates and end-to-end automation.
- Identify Spot Instance discounts to reduce costs using the intelligent pricing option added to CloudFormation.
- Automatically shut down the cluster after scheduled use to save money.
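As a rough illustration of how these three ideas can fit together, the sketch below picks the cheapest instance type from a table of made-up Spot prices and generates a small CloudFormation template containing an `AWS::EMR::Cluster` resource. It is not the Quick Start's actual template or pricing logic: the price table, resource name, parameter names, and release label are assumptions, and the idle-timeout `AutoTerminationPolicy` is only a stand-in for the scheduled shutdown, whose mechanism is not described here.

```python
import json

# Hypothetical hourly Spot prices in USD; real prices vary by region and time.
SAMPLE_SPOT_PRICES = {"m5.xlarge": 0.07, "r5.xlarge": 0.09, "c5.xlarge": 0.06}


def cheapest_spot_type(prices):
    """Return the instance type with the lowest hourly Spot price."""
    return min(prices, key=prices.get)


def build_template(core_type, core_count=2, idle_timeout_s=3600):
    """Build a minimal CloudFormation template (as a dict) for an EMR cluster."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative EMR cluster with Spot core nodes",
        "Parameters": {
            "KeyName": {"Type": "AWS::EC2::KeyPair::KeyName"},
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
        },
        "Resources": {
            "DemoCluster": {  # hypothetical logical resource name
                "Type": "AWS::EMR::Cluster",
                "Properties": {
                    "Name": "quickstart-demo",
                    "ReleaseLabel": "emr-5.30.0",  # assumed release
                    "Applications": [{"Name": "Spark"}],
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "ServiceRole": "EMR_DefaultRole",
                    "Instances": {
                        "Ec2KeyName": {"Ref": "KeyName"},
                        "Ec2SubnetId": {"Ref": "SubnetId"},
                        "MasterInstanceGroup": {
                            "InstanceCount": 1,
                            "InstanceType": "m5.xlarge",
                            "Market": "ON_DEMAND",
                        },
                        # Core nodes on Spot, using the type chosen above.
                        "CoreInstanceGroup": {
                            "InstanceCount": core_count,
                            "InstanceType": core_type,
                            "Market": "SPOT",
                        },
                    },
                    # Stand-in for automatic shutdown: terminate the cluster
                    # after it has been idle for idle_timeout_s seconds.
                    "AutoTerminationPolicy": {"IdleTimeout": idle_timeout_s},
                },
            }
        },
    }


if __name__ == "__main__":
    chosen = cheapest_spot_type(SAMPLE_SPOT_PRICES)
    print(json.dumps(build_template(chosen), indent=2))
```

The printed JSON could be saved to a file and deployed as a CloudFormation stack. In practice the price table would come from current Spot pricing data rather than hard-coded numbers.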
The Quick Start includes everything you need to get started, including a demo with a CloudFormation template, a sample .csv file, a PySpark script, and more.
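For readers who have not submitted a PySpark job to EMR before, the sketch below shows one common way a script and its input file are wired together: an EMR step that runs `spark-submit` through `command-runner.jar`. The bucket, file names, step name, and function name are hypothetical, and the demo bundled with the Quick Start may be organized differently.

```python
import json


def build_pyspark_step(script_uri, input_uri, output_uri):
    """Build an EMR step definition that runs a PySpark script via spark-submit.

    The URIs after the script are passed to it as command-line arguments.
    """
    return {
        "Name": "demo-pyspark-job",  # hypothetical step name
        # Shut the cluster down if the job fails, so it does not sit idle.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,   # the PySpark script stored in S3
                input_uri,    # e.g. the sample .csv file
                output_uri,   # where the script should write results
            ],
        },
    }


if __name__ == "__main__":
    step = build_pyspark_step(
        "s3://example-bucket/scripts/job.py",   # hypothetical locations
        "s3://example-bucket/data/sample.csv",
        "s3://example-bucket/output/",
    )
    print(json.dumps(step, indent=2))
```

A step like this can be listed under `Steps` when the cluster is created, or added to a running cluster later.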