BidElastic – Optimize your cloud computing costs (https://bidelastic.com)

AWS cloud storage for Big Data (May 16, 2016)

A key financial advantage of cloud computing over onsite computing is elasticity: Cloud computing power and storage resources can be provisioned on-demand at virtually any size. This ability of clouds to elastically allocate resources makes performing large-scale data processing and analytic jobs cost-effective.

Cloud clusters

To begin, let’s go over AWS options for large computing clusters and massive data storage.

An AWS cluster can be built either from EC2 instances (IaaS) or with Elastic MapReduce (EMR), a Hadoop PaaS. With both options you need to make several decisions to optimize costs. For every cluster node you need to choose an instance type and how to purchase it: on-demand, reserved or spot. Remember that optimizing direct costs is only one part of the issue; the other part is optimizing over SLA levels. You need to make these pricing decisions for both IaaS and PaaS (EMR supports both reserved and spot instances). BidElastic can help you to find cost- and SLA-optimal node structures for your Hadoop or Apache Spark installation.
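
For illustration, here is a minimal boto3 sketch of launching an EMR cluster that keeps the master and core nodes on-demand while pushing the task nodes to the spot market. The cluster name, key pair, release label, instance types and bid price are placeholders, not a recommendation.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Hypothetical cluster: master and core on-demand, task nodes on spot.
response = emr.run_job_flow(
    Name="analytics-cluster",            # placeholder name
    ReleaseLabel="emr-4.7.0",            # pick the release you actually target
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Instances={
        "Ec2KeyName": "my-key",          # placeholder key pair
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
             "InstanceType": "m3.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
             "InstanceType": "m3.xlarge", "InstanceCount": 2},
            # Task nodes on the spot market; BidPrice is in USD per instance-hour.
            {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m3.xlarge", "InstanceCount": 4, "BidPrice": "0.10"},
        ],
    },
)
print(response["JobFlowId"])
```

The same split between reliable on-demand nodes and cheap spot capacity is what a cost- and SLA-aware node structure ultimately boils down to.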

How about data storage?

AWS data storage options for large-scale analytics are even more varied than its computing options. To understand the AWS ecosystem for large data storage, let's begin with the data types commonly found in large-scale computations: transactional data, streamed data, and logs and binary files.

For transactional data AWS offers three choices: relational databases (RDS), a NoSQL database (DynamoDB), and a data warehouse (Redshift).

  • When storing traditional relational transactional data in RDS, you can scale transactional speed up by increasing the database instance size. To improve read performance you can scale databases out by adding read replicas, which are copied from the master database asynchronously (see the sketch after this list). Up to five read replicas are available for MySQL and PostgreSQL and up to 15 for Aurora. However, if you're not concerned with latency, you can overcome this limit by creating read replicas of read replicas.
  • As a NoSQL PaaS, AWS offers DynamoDB, which supports a scalable key-value storage model. The key advantage of DynamoDB is its scale-out model and the ability to increase throughput capacity on demand. DynamoDB's pricing model is based on provisioned throughput and stored data volume.
  • Finally, AWS offers Redshift as a data warehousing solution with virtually unlimited columnar storage capacity. Redshift is compatible with PostgreSQL database drivers, so you can analyze Redshift data with SQL queries.
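
As a rough boto3 sketch of the scaling options above, the snippet below adds a read replica to an RDS master and raises DynamoDB provisioned throughput on demand. All identifiers, instance classes and capacity numbers are hypothetical.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Scale RDS reads out: create an asynchronous read replica of an existing master.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="analytics-replica-1",      # placeholder replica name
    SourceDBInstanceIdentifier="analytics-master",   # placeholder MySQL/PostgreSQL master
    DBInstanceClass="db.m3.large",
)

# Scale DynamoDB throughput on demand by updating provisioned capacity.
dynamodb.update_table(
    TableName="events",                              # placeholder table
    ProvisionedThroughput={"ReadCapacityUnits": 400, "WriteCapacityUnits": 200},
)
```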

To process streamed data in AWS you can use Kinesis. Kinesis supports simultaneous asynchronous ingestion of several data streams and integrates nicely with the AWS technology stack. Various services can process the data coming out of Kinesis; for example, you can send each record to an AWS Lambda function or an SQS queue for further processing.
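
A minimal sketch of that pattern with boto3: a producer puts a record into a stream, and a consumer reads a batch from one shard and forwards each record to an SQS queue. The stream name, queue URL and event payload are placeholders; a production consumer would of course iterate over all shards and checkpoint its position.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
sqs = boto3.client("sqs", region_name="us-east-1")

STREAM = "clickstream"  # hypothetical stream
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/clickstream-work"  # hypothetical queue

# Producer side: push one event into the stream.
kinesis.put_record(
    StreamName=STREAM,
    Data=json.dumps({"user": "u-42", "action": "page_view"}).encode("utf-8"),
    PartitionKey="u-42",
)

# Consumer side: read a batch from the first shard and hand each record to SQS.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]

for record in kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]:
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=record["Data"].decode("utf-8"))
```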

Logs and binary files can be stored in AWS S3. The big advantage of S3 is that a Hadoop cluster can operate directly on S3, so there is no need to build dedicated data nodes. If your problem permits it, using S3 greatly increases the cost-efficiency of an analytic job. You need to be cautious here, however: the data transfer rate of a local volume outperforms that of S3. This can be mitigated by data compression and by composing clusters out of larger numbers of smaller nodes. As compression methods we recommend LZO for Hadoop 1 and Snappy for Hadoop 2. Data transfer parallelism on S3 is efficient, so when many workers on several nodes request S3 data simultaneously, the combined transfer rate is satisfactory for most applications. As another bonus, Amazon's Hadoop PaaS (EMR) and standalone Hadoop installations support reading map-reduce input from and writing output to S3. It is also possible to query S3 data with Presto or Hive, and integration of S3 with Apache Spark is straightforward.
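
To make the Spark point concrete, here is a tiny PySpark sketch that analyzes compressed logs straight from S3, without copying anything to HDFS first. The bucket, prefix and filter condition are placeholders; on EMR the path scheme is typically s3://, while standalone Hadoop installations usually go through the s3a:// connector.

```python
from pyspark import SparkContext

sc = SparkContext(appName="s3-log-analysis")

# Hypothetical bucket and prefix; gzip files are decompressed transparently.
logs = sc.textFile("s3a://my-analytics-bucket/logs/2016/05/*.gz")

# Count error lines directly against S3, with no dedicated data nodes involved.
error_count = logs.filter(lambda line: "ERROR" in line).count()
print(error_count)
```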

Optimizing Amazon Web Services costs (April 9, 2016)

Amazon Web Services (AWS) evolves constantly. In 2015 Amazon announced over 700 AWS updates, typically new and modified features or novel pricing mechanisms. When AWS functionality evolves, so do cost-effective cloud solutions on AWS. A solution that used to offer low costs a year ago doesn't necessarily do so today. Let's go over three examples.

The first example of an AWS update that makes it necessary to refine cloud solutions on AWS is the Infrequent Access (IA) storage class for Simple Storage Service (S3). S3 costs 0.03 USD per GB per month; S3-IA costs 0.0125 USD, about 60 percent less. S3-IA functionality and redundancy are identical to those of S3, but there is a catch: it costs 0.01 USD to retrieve a GB of data from S3-IA, while data retrieval is free in S3. S3-IA GET requests also cost 150 percent more than those of S3. So S3-IA can lower the cost of rarely used buckets substantially, but can also raise the cost of frequently used buckets. We offer predictive analytics and cost structure optimization on S3 to help companies take full advantage of S3-IA.
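
A back-of-the-envelope calculation using the prices quoted above shows where the crossover sits (request charges ignored for simplicity; the volumes are just examples):

```python
# Monthly cost in USD for a bucket, given stored and retrieved volume in GB.
def monthly_cost(gb_stored, gb_retrieved, storage_price, retrieval_price):
    return gb_stored * storage_price + gb_retrieved * retrieval_price

gb_stored = 1000.0

for gb_retrieved in (0, 500, 2000, 5000):
    s3 = monthly_cost(gb_stored, gb_retrieved, 0.03, 0.0)      # S3 Standard
    ia = monthly_cost(gb_stored, gb_retrieved, 0.0125, 0.01)   # S3-IA
    print(gb_retrieved, round(s3, 2), round(ia, 2))
```

At these prices, and ignoring request charges, S3-IA stays cheaper as long as monthly retrieval is below roughly 1.75 times the stored volume; above that, S3 Standard wins.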

Another issue to keep in mind is how AWS credits and cost tracking work together, or don't. Amazon issues credits as service discounts, as incentives to new customers, and as compensation when it breaches the SLA on specific services such as S3 (S3 SLA). Again there is a catch: Amazon recommends that customers track AWS costs via Billing Alarms, but a Billing Alarm is not triggered until the customer account runs out of credit. So once customers receive credits for any reason, they can't properly track and monitor AWS costs unless they use an external tool for AWS cost tracking.
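
For reference, this is roughly what such a Billing Alarm looks like when created with boto3. The alarm name, threshold and SNS topic ARN are placeholders. Note that the EstimatedCharges metric is net of credits, which is exactly why the alarm stays silent until the credits are used up.

```python
import boto3

# Billing metrics live in us-east-1 regardless of where your workloads run.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-bill-over-500-usd",        # placeholder name and threshold
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                                 # billing data updates a few times a day
    EvaluationPeriods=1,
    Threshold=500.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # hypothetical SNS topic
)
```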

Finally, consider billing for the Amazon Relational Database Service (RDS). With a regular EC2 instance, Amazon doesn't charge customers for the time the instance is stopped. Not so with an RDS instance: here Amazon charges for instance time regardless of whether the instance is stopped or not. Only deleting the database server along with its automatic backups stops the billing. Deleting automatic backups doesn't affect persistent manual database snapshots, which can be used to restore the database at a later time. So RDS customers who use a relational database for only a few hours a day can lower costs either by running an RDBMS on an EC2 instance, or by using a script that snapshots the database, deletes it at the end of the day and restores it the next morning (a minimal sketch follows below). None of these options is particularly attractive.
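
A minimal sketch of that snapshot-and-restore workaround with boto3, assuming a hypothetical instance called reporting-db. A real script would use dated snapshot identifiers and wait for each operation to finish before proceeding.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")
DB_ID = "reporting-db"            # hypothetical instance identifier
SNAPSHOT_ID = "reporting-db-eod"  # hypothetical snapshot identifier

def shut_down_for_the_night():
    # Deleting the instance is what actually stops RDS billing; the final
    # snapshot preserves the data so it can be restored the next morning.
    rds.delete_db_instance(
        DBInstanceIdentifier=DB_ID,
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier=SNAPSHOT_ID,
    )

def bring_back_in_the_morning():
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=DB_ID,
        DBSnapshotIdentifier=SNAPSHOT_ID,
        DBInstanceClass="db.m3.medium",
    )
```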

The above examples show that to run competitive cloud computing solutions on AWS, companies need to continuously track the cost structure of their solution, constantly monitor AWS feature updates, and adapt accordingly. A more cost-effective and less time-consuming way to cut costs is to deploy cost optimization services.

Slash HPC computing costs on the cloud (April 9, 2016)

StarCluster was developed at the Massachusetts Institute of Technology for scientific HPC computing on Amazon Web Services (AWS). It supports building HPC clusters based on the Open Grid Scheduler (previously Sun Grid Engine) or HTCondor, a high-throughput computing environment previously known as Condor.

StarCluster offers Python-based tools for authentication, node management, elastic auto scaling and monitoring of HPC clusters. Data persistence within a cluster is provided by an EBS volume shared with worker nodes through NFS, by Amazon EFS, or by other Amazon storage services such as S3, Redshift or DynamoDB.

Setting up computational jobs on StarCluster is easy. Any pre-configured Ubuntu or Red Hat Amazon Machine Image (AMI) can be used as an HPC cluster node. StarCluster is easily configurable and extendable via plugins, which make it possible to customize HPC cluster nodes at boot time, install packages remotely, and generate setup and configuration files (a minimal plugin sketch follows below). One of our team members has designed, and is participating in building, such an extension for D-MASON, a toolkit for distributed multi-agent simulations.
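
As an illustration, here is a small plugin sketch roughly following StarCluster's documented plugin interface (a ClusterSetup subclass whose run method is called on every node at boot). The class name, package list and apt-based install command are placeholders for whatever your AMI actually needs.

```python
from starcluster.clustersetup import ClusterSetup
from starcluster.logger import log

class InstallAnalyticsPackages(ClusterSetup):
    """Hypothetical plugin: install extra packages on every node at boot time."""

    def __init__(self, packages="htop,git"):
        # Plugin options come from the StarCluster config file as constructor kwargs.
        self.packages = [p.strip() for p in packages.split(",")]

    def run(self, nodes, master, user, user_shell, volumes):
        for node in nodes:
            log.info("Installing %s on %s" % (", ".join(self.packages), node.alias))
            node.ssh.execute("apt-get -y install %s" % " ".join(self.packages))
```

The plugin is then referenced from a [plugin ...] section in the StarCluster configuration and listed in the cluster template's PLUGINS setting, so it runs automatically whenever a cluster is started.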

It is also worth noting that StarCluster comes with sensible security defaults: an entire cluster runs within a single Amazon Security Group, and every node in the cluster is accessible only via SSH private key authentication.

StarCluster supports running large computational jobs on spot instances at much lower cost than on-demand instances. For one of our customers we built an extension to the StarCluster auto scaler that scales the HPC cluster with spot instances and integrates with our Bid Server.

Cloud Monitor (January 26, 2016)

Cloud Monitor is a secure web application that ingests raw EC2 CloudWatch feeds and gives the customer access to interactive reports on:

Instance hierarchy based on platform, environment, type, name and instance metadata:

  • Functional group: Testing, staging, pre-production, production and demonstration.
  • Production group of instance functionalities such as ingest and encode.
  • Instance ID.
  • Measure time.
  • Instance type, for example t2.micro, c3.large or m3.medium.

Default measures automatically made available include:

  • Costs.
  • CPU usage.
  • Incoming network transfer.
  • Outgoing network transfer.
  • Disk read and write operations.
  • CPU credit balance for burstable t2.* instances.

Custom measures, which require running code on the instance to generate (see the sketch after this list):

  • Memory usage.
  • CPU usage for every running process.
  • Any other custom metrics specific to customer needs.
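
A minimal sketch of what such instance-side code might look like, assuming psutil is available on the instance: it reads memory usage locally and publishes it as a custom CloudWatch metric. The namespace, dimension and instance ID are placeholders.

```python
import boto3
import psutil  # third-party library; assumed to be installed on the instance

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Memory usage is not reported by CloudWatch out of the box, which is why
# custom measures need code running on the instance to publish them.
cloudwatch.put_metric_data(
    Namespace="Custom/CloudMonitor",   # hypothetical namespace
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        "Value": psutil.virtual_memory().percent,
        "Unit": "Percent",
    }],
)
```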

Cloud Monitor reports can be downloaded in PDF format with the underlying data available in XLS.

Optimizing simulation computations on the Amazon EC2 spot market (January 26, 2016)

The BidElastic team has published an article in a special issue of Simulation Modelling Practice and Theory. The article describes some of the technology and principles used by Bid Server to plan and provision EC2 spot instances.

Using the Amazon spot price market can significantly lower the execution costs of large-scale simulations that require millions of computations. However, Amazon can interrupt computations on the spot market when user bids are too low. To complete computations without incurring high costs, a bidding algorithm is needed that balances cost against completion time.

We have developed such an algorithm by identifying the drivers of spot prices on Amazon EC2 and using these insights to propose an adaptive bidding strategy that simultaneously minimizes computation costs and the delays caused by computation termination. It turns out that bidding close to spot prices and dynamically switching between instance types is an efficient and simple strategy (a much simplified sketch of the idea follows below). To develop and test other bidding strategies on the Amazon spot price market, we have also built a simulator of the EC2 spot pricing mechanism.
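
The boto3 sketch below illustrates only the core idea, not the algorithm from the article: look up recent spot prices for a pool of candidate instance types, pick the cheapest, and bid a small margin above its current price. The candidate types, margin and AMI ID are placeholders, and a real implementation would also consider availability zones, interruption history and job checkpointing.

```python
import datetime
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
CANDIDATE_TYPES = ["c3.xlarge", "m3.xlarge", "r3.xlarge"]  # example instance pool

# Fetch the last hour of spot prices for the candidate types.
history = ec2.describe_spot_price_history(
    InstanceTypes=CANDIDATE_TYPES,
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=1),
)

# Keep only the most recent price per instance type.
latest = {}
for p in sorted(history["SpotPriceHistory"], key=lambda x: x["Timestamp"], reverse=True):
    latest.setdefault(p["InstanceType"], float(p["SpotPrice"]))

instance_type, price = min(latest.items(), key=lambda kv: kv[1])

# "Bid close to the spot price": a small margin above the current price.
bid = round(price * 1.05, 4)
ec2.request_spot_instances(
    SpotPrice=str(bid),
    InstanceCount=1,
    LaunchSpecification={"ImageId": "ami-12345678",  # hypothetical AMI
                         "InstanceType": instance_type},
)
```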

The article is available as Open Access on ScienceDirect.
