An oft-held misconception by many individuals and organizations is that AWS is great for Web services, big data processing, DR, and all of the other “Internet facing” applications but not for running your internal business applications. While AWS is absolutely an excellent fit for the aforementioned purposes, it is also an excellent choice for running the vast majority of business applications. Everything from email services, to BI applications, to ERP, and even your own internally built applications can be run in AWS with ease while virtually eliminating future IT capex spending.
Laying the foundation
One of the most foundational pieces of architecture for most businesses is the network that applications and services ride upon. In a traditional model, this will generally look like a varying number of switches in the datacenter that are interconnected with a core switch (e.g. a pair of Cisco Nexus 7000s). Then you have a number of routers and VPN devices (e.g. Cisco ASA 55XX) that interconnect the core datacenter with secondary datacenters and office sites. This is a gross oversimplification of what really happens on the business’s underlying network (and neglects to mention technologies like Fibre Channel and InfiniBand). But that further drives the point that migrating to AWS can greatly reduce the complexity and cost of a business in managing a traditional RYO (run your own) datacenter.
Anyone familiar with IT budgeting is more than aware of the massive capex costs associated with continually purchasing new hardware as well as the operational costs associated with managing it – maintenance agreements, salaries of highly skilled engineers, power, leased datacenter and network space, and so forth. Some of these costs can be mitigated by going to a “hosted” model where you are leasing rack space in someone else’s datacenter, but you are still going to be forking out a wad of cash on a regular basis to support the hosted model.
The AWS VPC (Virtual Private Cloud) is a completely virtual network that allows businesses the ability to create private network spaces within AWS to run all of their applications on, including internal business applications. Through the VGW (Virtual Private Gateway) the VPC inherently provides a pathway for businesses to interconnect their off-cloud networks with AWS. This can be done through traditional VPNs or by using the VPC’s Direct Connect. Direct provides a dedicated private connection from AWS to your off-cloud locations (e.g. on-prem, remote offices, colocation). The VPC is also flexible enough that it will allow you to run your own VPN gateways on EC2 instances if that is a desired approach. In addition, interconnecting with most MPLS providers is supported, as long as the MPLS provider hands off VLAN IDs.
Moving up the stack
The prior section showed how the VPC is a low cost and simplified approach to managing network infrastructure. We can proceed up the stack to the server, storage, and application layers. Another piece of the network layer that is generally heavily intertwined with the application architecture and the server’s hosting is load balancing. At a minimum, load balancing enables the application to run in a highly available and scalable manner while providing a single namespace/endpoint for the application client to connect. Amazon’s ELB (Elastic Load Balancer) is a very cost effective, powerful, and easy to use solution to load balancing in AWS. A lot of businesses have existing load balancing appliances, like F5 BigIP, Citrix Netscaler, or A1, that they use to manage their applications. Many have also written a plethora of custom rules and configs, like F5 iRules, to do some layer 7 processing and logic on the application. All of the previously mentioned load balancing solution providers, and quite a few more, have AWS hosted options available, so there is an easy migration path if they decide the ELB is not a good fit for their needs. However, I have personally written migration tools for our customers to convert well over a thousand F5 Virtual IPs and pools (dumped to a CSV) into ELBs. It allowed for a quick and scripted migration of the entire infrastructure with an enormous cost savings to the customer. In addition to off-the-shelf appliances for load balancing, you can also roll your own with tools like HAProxy and Nginx, but we find that for most people the ELB is an excellent solution for meeting their load balancing needs.
Now we have laid the network foundation to run our servers and applications on. AWS provides several services for this. If you need, or desire, to manage your own servers and underlying operating system, EC2 (Elastic Compute Cloud) provides the foundational building blocks for spinning up virtual servers you can tailor to suit whatever need you have. A multitude of Linux and Windows-based Operating Systems are supported. If your application supports it, there are services like ElasticBeanstalk, OpsWorks, or Lambda, to name a few, that will manage the underlying compute resources for you and simply allow you to “deploy code” on completely managed compute resources in the VPC.
What about my databases?
There are countless examples of people running internal business application databases in AWS. The RDS (Relational Database Service) provides a comprehensive, robust, and HA capable hosted solution for MySQL, PostgreSQL, Microsoft SQL server, and Oracle. If your database platform isn’t supported by RDS, you can always run your own DB servers on EC2 instances.
NAS would be nice
AWS has always recommended a very ephemeral approach to application architectures and not storing data directly on an instance. Sometimes there is no getting away from needing shared storage, though, across multiple instances. Amazon S3 is a potential solution but is not intended to be used as attached storage, so the application must be capable of addressing and utilizing S3’s endpoints if that is to be a solution. There are a great many applications that aren’t compatible with that model.
Until recently your options were pretty limited for providing a NAS type of shared storage to Amazon EC2 instances. You could create a GlusterFS (AKA Redhat Storage Server) or Ceph cluster out of EC2 instances spanned across multiple availability zones, but that is fairly expensive and has several client mounting issues. The Gluster client, for example, is a FUSE (filesystem in user space) client and has sub-optimal performance. Linux Torvalds has a famous and slightly amusing – depending upon the audience – rant about userspace filesystems (see: https://lkml.org/lkml/2011/6/9/462). To get around the FUSE problem you could always enable NFS server mode, but that breaks the ability of the client to dynamically connect to another GlusterFS server node if one fails thus introducing a single point of failure. You could conceivable set up some sort of NFS Server HA cluster using Linux heartbeat, but that is tedious, error prone, and places the burden of the storage ecosystem support on the IT organization, which is not desirable for most IT organizations. Not to mention that Heartbeat requires a shared static IP address, which could be jury rigged in VPC, but you absolutely cannot share the same IP address across multiple Availability Zones, so you would lose multi-AZ protection.
Yes, there were “solutions” but nothing that was easy and slick like most everything else in AWS is nor anything that is ready for primetime. Then on April 9th, 2015 Amazon introduced us to EFS (Elastic File System). The majority of corporate IT AWS users have been clamoring for a shared file system solution in AWS for quite some time, and EFS is set to fill that need. EFS is a low latency, shared storage solution available to multiple EC2 instances simultaneously via NFSv4. It is currently in preview mode but should be released to GA in the near future. See more at https://aws.amazon.com/efs/.
Thinking outside the box
In addition to the AWS tools that are analogs of traditional IT infrastructure (e.g. VPC ≈ Network Layer, EC2 ≈ Physical server or VM) there are a large number of tools and SaaS offerings that add value above and beyond. Tools like SQS, SWF, SES, RDS – for hosted/managed RDMBS platforms – CloudTrail, CloudWatch, DynamoDB, DirectoryServices, WorkDocs, WorkSpace, and many more make transitioning traditional business applications into the cloud easy, all the while eliminating capex costs, reducing operating costs, and increasing stability and reliability.
A word on architectural best practices If it is at all possible, there are some guiding principles and best practices that should be followed when designing and implementing solutions in AWS. First and foremost, design for failure. The new paradigm in virtualized and cloud computing is that no individual system is sacred and nothing is impervious to potential failure. Having worked in a wide variety of high tech and IT organizations over the past 20 years, this should really come as no surprise because even when everything is running on highly redundant hardware and networks, equipment and software failures have ALWAYS been prevalent. IT and software design as a culture would have been much better off adopting this mantra years and years ago. However, overcoming some of the hurdles designing for failure creates wasn’t a full reality until virtualization and the Cloud were available.
AWS is by far the forerunner in providing services and technologies that allow organizations to decouple the application architecture from the underlying infrastructure. Tools like Route53, AutoScaling, CloudWatch, SNS, EC2, and configuration management allow you to design a high level of redundancy and automatic recovery into your infrastructure and application architecture. In addition to designing for failure, decoupling the application state from the architecture as a whole should be strived for. The application state should not be stored on any individual component in the stack, nor should it be passed around between the layers. This way the loss of a single component in the chain will not destroy the state of the application. Having the state of the application store in its own autonomous location, like a distributed NoSQL DB cluster, will allow the application to function without skipping a beat in the event of a component failure.
Finally, a DevOps, Continuous Integration, or Continuous Delivery methodology should be adopted for application development. This allows changes to be ed automatically before being pushed into production and also provides a high level of business agility. The same kind of business agility that running in the Cloud is meant to provide.
The exponential growth of big data is pushing companies to process massive amounts of information as quickly as possible, which is often times not realistic, practical or down right just not achievable on standard CPI’s. In a nutshell, High Performance Computing (HPC) allows you to scale performance to process and report on the data quicker and can be the solution to many of your big data problems.
However, this still relies on your cluster capabilities. By using AWS for your HPC needs, you no longer have to worry about designing and adjusting your job to meet the capabilities of your cluster. Instead, you can quickly design and change your cluster to meet the needs of your jobs. There are several tools and services available to help you do this, like the AWS Marketplace, AWS API’s, or AWS CloudFormation Templates.
Today, I’d like to focus on one aspect of running an HPC cluster in AWS that people tend to forget about – placement groups.
Placement groups are a logical grouping of instances in a single availability zone. This allows you to take full advantage of a low-latency 10 GB network, which in turn will allow you to be able to transfer up to 4TB of data per hour between nodes. However, because of the low-latency 10 GB network, the placement groups cannot span to multiple availability zones. This may scare some people away from using them, but it shouldn’t. You can create multiple placement groups in different availability zones as a work-around, and with enhanced networking you can also still connect between the different HPC’s.
One of the grea benefits of AWS HPC is that you can run your High Performance Computing clusters with no up-front costs and scale out to hundreds of thousands of cores within minutes to meet your computing needs. Learn more about Big Data and HPC solutions on AWS or Contact Us to get started with a workload workshop.
Business intelligence (BI) is an umbrella term that refers to a variety of software applications used to analyze an organization’s raw data. BI as a discipline is made up of several related activities including data mining, online analytical processing, querying and reporting. Analytics is the discovery and communication of meaningful patterns in data. This blog will look at a few areas of BI that will include data mining and reporting, as well as talk about using analytics to find the answers you need to make better business decisions.
Data Mining is an analytic process designed to explore data. Companies of all sizes continuously collect data, often times in very large amounts, in order to solve complex business problems. Data collection can range in purpose from finding out the types of soda your customers like to drink to tracking genome patterns. To process these large amounts of data quickly takes a lot of processing power, and therefore, a system such as Amazon Elastic MapReduce (EMR) is often needed to accomplish this. AWS EMR can handle most use cases from log analysis to bioinformatics, which are key when collecting data, but AWS EMR can only report on data that is collected, so make sure the collected data is accurate and complete.
Reporting accurate and complete data is essential for good BI. Tools like Splunk’s Hunk and Hive work very well with AWS EMR for modeling, reporting, and analyzing data. Hive is business intelligence software used for reporting meaningful patterns in the data, while Hunk helps interactively review logs with real-time alerts. Using the correct tools is the difference between data no one can use and data that provides meaningful BI.
Why do we collect all this data? To find answers of course! Finding answers in your data, from marketing data to application debugging, is why we collect the data in the first place. AWS EMR is great for processing all that data with the right tools reporting on that data. But more than knowing just what happened, we need to find out how it happened. Interactive queries on the data are required to drill down and find the root causes or customer trends. Tools like Impala and Tableau work great with AWS EMR for these needs.
Business Intelligence and Analytics boils down to collecting accurate and complete data. That includes having a system that can process that data, having the ability to report on that data in a meaningful way, and using that data to find answers. By provisioning the storage, computation and database services you need to collect big data into the cloud, we can help you manage big data, BI and analytics while reducing costs, increasing speed of innovation, and providing high availability and durability so you can focus on making sense of your data and using it to make better business decisions. Learn more about our BI and Analytics Solutions here.
Batch computing isn’t necessarily the most difficult thing to design a solution around, but there are a lot of moving parts to manage, and building in elasticity to handle fluctuations in demand certainly cranks up the complexity. It might not be particularly exciting, but it is one of those things that almost every business has to deal with in some form or another.
The on-demand and ephemeral nature of the Cloud makes batch computing a pretty logical use of the technology, but how do you best architect a solution that will take care of this? Thankfully, AWS has a number of services geared towards just that. Amazon SQS (Simple Queue Services) and SWF (Simple Workflow Service) are both very good tools to assist in managing batch processing jobs in the Cloud. Elastic Transcoder is another tool that is geared specifically around transcoding media files. If your workload is geared more towards analytics and processing petabyte scale big data, then tools like EMR (Elastic Map Reduce) and Kinesis could be right up your alley (we’ll cover that in another blog). In addition to not having to manage any of the infrastructure these services ride on, you also benefit from the streamlined integration with other AWS services like IAM for access control, S3, SNS, DynamoDB, etc.
For this article, we’re going to take a closer look at using SQS and SWF to handle typical batch computing demands.
Simple Queue Services (SQS), as the name suggests, is relatively simple. It provides a queuing system that allows you to reliably populate and consume queues of data. Queued items in SQS are called messages and are either a string, number, or binary value. Messages are variable in size but can be no larger than 256KB (at the time of this writing). If you need to queue data/messages larger than 256KB in size the best practice is to store the data elsewhere (e.g. S3, DynamoDB, Redis, MySQL) and use the message data field as a linker to the actual data. Messages are stored redundantly by the SQS service, providing fault tolerance and guaranteed delivery. SQS doesn’t guarantee delivery order or that a message will be delivered only once, which seems like something that could be problematic except that it provides something called Visibility Timeout that ensures once a message has been retrieved it will not be resent for a given period of time. You (well, your application really) have to tell SQS when you have consumed a message and issue a delete on that message. The important thing is to make sure you are doing this within the Visibility Timeout, otherwise you may end up processing single messages multiple times. The reasoning behind not just deleting a message once it has been read from the queue is that SQS has no visibility into your application and whether the message was actually processed completely, or even just successfully read for that matter.
Where SQS is designed to be data-centric and remove the burden of managing a queuing application and infrastructure, Simple Workflow Service (SWF) takes it a step further and allows you to better manage the entire workflow around the data. While SWF implies simplicity in its name, it is a bit more complex than SQS (though that added complexity buys you a lot). With SQS you are responsible for managing the state of your workflow and processing of the messages in the queue, but with SWF, the workflow state and much of its management is abstracted away from the infrastructure and application you have to manage. The initiators, workers, and deciders have to interface with the SWF API to trigger state changes, but the state and logical flow are all stored and managed on the backend by SWF. SWF is quite flexible too in that you can use it to work with AWS infrastructure, other public and private cloud providers, or even traditional on-premise infrastructure. SWF supports both sequential and parallel processing of workflow tasks.
Note: if you are familiar with or are already using JMS, you may be interested to know SQS provides a JMS interface through its java messaging library.
One major thing SWF buys you over using SQS is that the execution state of the entire workflow is stored by SWF extracted from the initiators, workers, and deciders. So not only do you not have to concern yourself with maintaining the workflow execution state, it is completely abstracted away from your infrastructure. This makes the SWF architecture highly scalable in nature and inherently very fault-tolerant.
There are a number of good SWF examples and use-cases available on the web. The SWF Developer Guide uses a classic e-commerce customer order workflow (i.e. place order, process payment, ship order, record completed order). The SWF console also has a built in demo workflow that processes an image and converts it to either grayscale or sepia (requires AWS account login). Either of these are good examples to walk through to gain a better understanding of how SWF is designed to work.
Contact 2nd Watch today to get started with your batch computing workloads in the cloud.
As a digital business, one of the essential platforms you are leveraging today is your ecommerce platform as a way to interact, engage and sell to your customers. 2nd Watch offers Amazon Web Services hosting for ecommerce platforms for large businesses that want a flexible, secure, highly scalable, global and low-cost solution for online sales and retailing.
The architecture and management of the configuration is vital because every second counts to your customers, especially during peak hours, days and seasonal traffic. In today’s highly-connected world, forecasting demand can be difficult and often reaches new peaks through social awareness of deals or offers. Consumers are impatient, and their expectations for how fast they get information is increasing. Any performance issues can affect your brand, conversions, sales and ultimately your top line performance. In order for ecommerce platforms to be highly responsive and meet your customer demand, you must design-for-change so that you can meet your customers where they want and quickly.
Whether your enterprise is running BlueCherry with MS Dynamic AX or Magento, AWS offers the most powerful infrastructure that can scale globally to meet your customers’ demands. The essential part of running in the cloud is the architecture and engineering that will allow your business to scale efficiently to avoid unnecessary costs. With the proper configuration and management, your business can handle millions of catalog views and hundreds of thousands of orders easily to meet your top line objectives.
Enterprise essentials for running on AWS
Security – At a high level, 2nd Watch has taken the following approach to secure the AWS infrastructure
User access. Management of user access and data management is one of the most important aspects for a digital business. Enterprises need to control secure access for users. AWS Identity and Access Management (IAM) allows enterprises to control access to AWS services and resources. When an account is properly set-up and managed, users and groups have controls and permissions that allow or deny them access to any particular AWS resource. The proper account structure and management are required to ensure security and governance.
Manage IAM usersandtheir access – You can create users in IAM, assign them individual security credentials (in other words, access keys, passwords, and multi-factor authentication devices), or request temporary security credentials to provide users access to AWS services and resources. You can manage permissions in order to control which operations a user can perform.
Manage IAM roles and their permissions – You can create roles in IAM and manage permissions to control which operations can be performed by the entity, or AWS service, that assumes the role. You can also define which entity is allowed to assume the role.
Manage federated users and their permissions – You can enable identity federation to allow existing identities (e.g. users) in your enterprise to access the AWS Management Console, to call AWS APIs, and to access resources, without the need to create an IAM user for each identity.
Data Privacy. Encrypting data in transit and at rest is extremely important in the public cloud. AWS provides the essential platform enhancements to easily implement an end-to-end encryption solution. Many AWS services use SSL connections by default, and AWS enables users to securely and easily manage custom SSL certificates for their applications. Data encryption for personal or business data at rest within AWS can be easily and transparently implemented using AWS- or user-supplied encryption keys. AWS maintains platform certification compliance for many of the most important data protection and privacy certifications your business requires, and publishes backup and redundancy procedures for services so that customers can gain greater understanding of how their data flows throughout AWS. For more information on the data privacy and backup procedures for each service in the AWS cloud, consult the Amazon Web Services: Overview of Security Processes
Reports, Certifications, and Independent Atations. AWS has, in the past, successfully completed multiple SAS70 Type II audits, and now publishes a Service Organization Controls 1 (SOC 1) report, published under both the SSAE 16 and the ISAE 3402 professional standards. In addition, AWS has achieved ISO 27001 certification, and has been successfully validated as a Level 1 service provider under the Payment Card Industry (PCI) Data Security Standard (DSS). In the realm of public sector certifications, AWS has received authorization from the U.S. General Services Administration to operate at the FISMA Moderate level, and is also the platform for applications with Authorities to Operate (ATOs) under the Defense Information Assurance Certification and Accreditation Program (DIACAP). We will continue to obtain the appropriate security certifications and conduct audits to demonstrate the security of our infrastructure and services. For more information on risk and compliance activities in the AWS cloud, consult the Amazon Web Services: Risk and Compliance whitepaper.
Physical Security. Amazon has many years of experience in designing, constructing, and operating large-scale data centers. AWS infrastructure is housed in Amazon-controlled data centers throughout the world. Only those within Amazon who have a legitimate business need to have such information know the actual location of these data centers, and the data centers themselves are secured with a variety of physical controls to prevent unauthorized access.
Secure Services. Each of the services within the AWS cloud is architected to be secure and contains a number of capabilities that restrict unauthorized access or usage without sacrificing the flexibility that customers demand. For more information about the security capabilities of each service in the AWS cloud, consult the Amazon Web Services: Overview of Security Processes whitepaper referenced above.
Amazon Elastic Compute Cloud (EC2)
Elastic Load Balancing
Amazon CloudFront (CDN)
Amazon Relational Database (RDS)
Amazon Route 53
Amazon Simple Storage Service (Amazon S3)
Only proper configuration of enterprise ecommerce platforms and the management of user access, data management and infrastructure (IaaS) management will lead to a successful implementation in the public cloud. With the 2nd Watch solution you get the best practices for architecture, configuration, security, and performance. This allows your platform to accommodate for daily, weekly, monthly or yearly cyclical performance requirements that are easily expanded globally.
We are an AWS Premier Partner with over 400 projects on AWS and highly recommend hosting your ecommerce platform on AWS, regardless of if it is BlueCherry with MS Dynamic AX, Magento or another solution. Learn more about 2nd Watch Digital Marketing Solutions on Amazon Web Service Benefits.
Are you interested in a High Performance Solution for an ecommerce platform?
We’ve all been there: Surfing the internet for, well everything, and then BOOM! The website you land on serves up text, but the static and dynamic images fail to appear, leaving nothing but blank, barren real estate and feelings of frustration. Or perhaps you’ve been trying to download the la episode of Game of Thrones only to be thwarted by delivery speeds that make Tyrion’s journey to Volantis seem like it was taken aboard the Concord.
I never really stopped to consider—or appreciate—the technology that delivers consumer-facing web content like images, media, games and software downloads, until recently. I’ve been taking for granted, like most of consumers, that the content I am searching for just appears (like magic) with a single click of a mouse and rapid load of a browser.
What Are CDNs?
Content Delivery Networks (CDN)s have been around since the birth of the internet. They are the key technology that enables websites to deliver content to consumers and give content owners and publishers the ability to scale to meet increasing global demand from consumers using multiple devices and a variety of platforms.
How Do CDNs Work?
In order to achieve optimal delivery performance and accuracy, CDNs maintain a large network of globally distributed servers that are connected to the internet and store or connect to local copies of the customer’s content. By caching the content to the closest end user, it improves the experience by decreasing the amount of time needed to deliver the content to the user’s device.
Why CDNs are Important
As we discussed in our previous post about websites and web hosting, your website is one of the most visible and valuable ways of communicating with your current and potential customers. While there are several ways your business can benefit from building and hosting its website in the cloud, one key benefit is increased performance. A benefit that is realized when your customers receive the information they want, when they want it: With little to no latency and high data transfer speeds.
Most websites contain a mix if static and dynamic content. Static content includes images or style sheets while dynamic or application-generated content includes elements of your site that are personalized to each view. Previously, developers who wanted to improve the performance and reliability of their dynamic content had limited options, as the solutions offered by traditional CDNs are expensive, hard to configure and difficult to manage.
Public cloud services like Amazon CloudFront are a perfect example of how successful consumer-facing websites like PBS are achieving optimal content delivery speeds that delight visitors and improve the overall customer experience.
In terms of enterprise-related benefits, Amazon CloudFront allows developers to get started in minutes and without long term commitments for use, monthly platform fees or additional costs to deliver dynamic content to your end users.
.It works seamlessly with dynamic web applications running in Amazon EC2 or your origin running outside of AWS (example: on-premises data center) without any custom coding or proprietary configurations. This makes Amazon CloudFront easy to deploy and manage. Plus, you can use one, single Amazon CloudFront distribution to deliver your entire website, allowing you to use a single domain name for your entire website without the need to separate your static and dynamic content or manage multiple domain names on your website.
An AWS-sponsored whitepaper by Frost & Sullivan that compared CDN Performance of four ed CDNs discusses the benefits for enterprises:
“For enterprise companies in particular, Amazon CloudFront allows them to deliver large volumes of content with reliable performance to a global audience at a fraction of the cost of trying to deliver the content themselves using their own in-house infrastructure. Instead of a content owner having to buy their own servers, rent co-location space, buy bandwidth, enter into long-term contracts with a variety of vendors or worry about traffic spikes and delivery performance, the content owner can use Amazon CloudFront. By using Amazon CloudFront, the content owner can focus their time and resources on their core product and services, not infrastructure.”
The whitepaper also presents its findings from multiple comparison s that included top CDNs: Amazon CloudFront, Akamai, Level 3 and Limelight. The results show that Amazon CloudFront is, on average, seven percent faster than the next closest CDN and 51 percent faster than the third CDN ed.
There are many kinds of CDNs that deliver everything from small objects like images on websites, to larger pieces of content like software and media downloads. While the type of content can vary, the main goal (and central benefit) of a CDN remains the same: Improving end-user experience by more rapidly and accurately delivering content.
Migrating to Amazon CloudFront
For enterprises, choosing the best CDN partner for their business can be challenging. At 2nd Watch, our digital marketing capabilities are flexible, highly scalable, elastic and enable you to deliver valuable marketing content to your growing customer base—without the need for upfront investments or long-term contracts. It’s a low cost solution that allows you to manage your digital marketing assets (from static and dynamic content to live streaming video and gaming) with ease and agility. Whether you’re migrating your CDN from Akamai or Limelight to Amazon CloudFront, 2nd Watch’s public cloud environments enable you to focus on delivering relevant content that your current and potential customers want, when they want it. Contact us to get started.