Why the Benefits of a Data Warehouse Outweigh the Financial Cost and How to Reduce the Cost of Development

Any organization that’s invested in an analytics tool like Tableau, Power BI, or Looker knows that these tools are only as good as the data you feed them. Challenges such as disparate sources, inconsistent data formats, and slow legacy systems are just some of the roadblocks that stand in the way of getting the insights you need from your BI reporting and analytics tool.

A common solution to this challenge is a data warehouse that enables data management, analytics, and advanced data science. The data warehouse helps organizations to facilitate data-driven decision making, find cost savings, improve profitability, and the list goes on. No matter the industry, size of the organization, technology involved, or data savviness, our clients always ask us the same question: how do the benefits of a data warehouse justify the cost?

What are the costs of building a modern data warehouse?

Before diving into the many reasons why the benefits of a data warehouse are worth the costs to build it, let’s spend a bit of time discussing the two main investments.

The first major investment will be either hiring a consulting firm to develop your modern data warehouse or dedicating internal resources to the task. By hiring data consultants, you introduce additional costs in consulting fees, but yield results much quicker and therefore save time. If you choose to create an internal task force for the job, you reduce upfront costs, but altering day-to-day functions and the inevitable learning curve lead to longer development timelines.


The second investment is almost always necessary as you will need a tech stack to support your modern data warehouse. This may simply involve expanding or repurposing current tools or it could require selecting new technology. It’s important to note that pricing is different for each technology and varies greatly based on your organization’s needs and goals.

It typically involves paying for storage, computing time, or computing power, in addition to a base fee for using the technology. In total, this typically incurs a yearly starting cost of $25,000 and up. To make budgeting easier, each of the major data warehouse technologies (Amazon Redshift, Snowflake, and Microsoft Azure SQL Database) offers a cost estimating tool. With a clearer understanding of what your costs will look like, let’s jump into why they are worth it.

Why consultants are worth the additional costs.

While your current IT team likely has an intimate knowledge of your data and the current ecosystem, consultants offer benefits as well. Since consultants are dedicated full time to building out your modern data warehouse, they are able to make progress much quicker than an internal team would. Additionally, since they spend their careers developing a wide variety of analytics solutions, they may be more up to date on relevant technology advancements, have experience with various forms of data modeling to evaluate, and most importantly, they understand how to get business users to actually adopt the new solution.

At Aptitive (a 2nd Watch company), we have seen the most success by bringing in a small team of consultants to work side by side with your IT team, with a shared goal and vision. This ensures that your IT department will be able to support the modern data warehouse when it is completed and that the solution will address all of the details integral to your organization’s data. Considering the wealth of experience consultants bring to the table, their ability to transfer knowledge to internal employees, and the increased speed of development, the high ROI of hiring consultants is unquestionable.

 

Download Now: A Holistic Approach to Cloud Cost Optimization [eBook]

 

Using a Modern Data Warehouse costs you less than using a traditional data analytics system you may currently have in place.

While this is a considerable amount of money to invest in data analytics, many of your current technology investments will be phased out, or their costs will be reduced by modern technology. These solutions relieve your IT team of cumbersome maintenance tasks through automatic clustering, self-managed infrastructure, and advanced data security options. This allows your IT team to focus on more important business needs and strategic analytics.

With the volume and variety of data organizations track, it’s easy to find yourself stuck with messy data held in siloed systems. Modern data warehouses automate processes to eliminate duplicate information, reduce unnecessary clutter, and combine various sources of data, which enables you to save money by storing data efficiently. Think of it this way: if your data experts struggle to find key information, so does your technology. The extra compute time and storage cost more than you would expect; implementing a system that stores your data logically and in a streamlined manner greatly reduces these costs.

Advanced analytics unlocks insights, enables you to respond to events quicker, and optimizes key decision-making activities.

While it is more difficult to quantify the ROI here, dashboards and advanced analytics greatly enhance your employees’ ability to perform well in their jobs and save money. Regardless of your industry, using a modern data warehouse to drive analytics empowers employees to perform better in several ways:

  • Dashboards dramatically decrease the time employees spend finding and organizing the data. For many of our clients, reports that once took analysts weeks of effort are now aggregated automatically in seconds.
  • Accurate data empowers better decision-making and yields creative problem-solving. You have the right information quicker.
  • Real-time analytics enables you to quickly respond to significant business events. This gives you a competitive edge since you can easily retain customers, spot inefficiencies, and respond to external influences.
  • Predictive analytics saves you money by finding opportunities before you need to act.

Developing a full-scale data warehouse requires time and money that may not be available at the moment. That being said, the benefits of a data warehouse are necessary to remain competitive. To address this discrepancy, Aptitive has found a solution to help you build a modern data warehouse quicker and without the large upfront investment. A modular data warehouse contains key strategic data and ensures that you gain the advantages of analytics almost immediately. On top of that, it provides a scalable foundation that you can add data to over time until you incorporate all the data necessary for your business functions.


For more details about implementing a modular data warehouse, check out this link or reach out to us directly to get started on your modular data warehouse.


9 Helpful Tools for Building a Data Pipeline

Companies create tons of disparate data throughout their organizations through applications, databases, files, and streaming sources. Moving the data from one data source to another is a complex and tedious process. Ingesting different types of data into a common platform requires extensive skill and knowledge of both the data types in use and their sources.

Due to these complexities, this process can be faulty, leading to inefficiencies like bottlenecks or the loss or duplication of data. As a result, data analytics becomes less accurate and less useful and, in many instances, provides inconclusive or just plain inaccurate results.

For example, a company might be looking to pull raw data from a database or CRM system and move it to a data lake or data warehouse for predictive analytics. To ensure this process is done efficiently, a comprehensive data strategy needs to be deployed, necessitating the creation of a data pipeline.

What is a Data Pipeline?

A data pipeline is a set of actions organized into processing steps that integrates raw data from multiple sources into one destination for storage, business intelligence (BI), data analysis, and visualization.

There are three key elements to a data pipeline: source, processing, and destination. The source is the starting point for a data pipeline. Data sources may include relational databases and data from SaaS applications. There are two different processing, or ingestion, models: batch processing and stream processing (a brief sketch contrasting the two follows the list below).

  • Batch processing: Occurs when the source data is collected periodically and sent to the destination system. Batch processing enables the complex analysis of large datasets. As batch processing occurs periodically, the insights gained from this type of processing are from information and activities that occurred in the past.
  • Stream processing: Occurs in real-time, sourcing, manipulating, and loading the data as soon as it’s created. Stream processing may be more appropriate when timeliness is important because it takes less time than batch processing. Additionally, stream processing comes with lower cost and lower maintenance.
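To make the distinction concrete, here is a minimal Python sketch contrasting the two models. The source and destination objects and their methods are hypothetical stand-ins for whatever systems sit at each end of your pipeline, not a specific library.

```python
from datetime import datetime, timedelta

# Batch processing: collect a period's worth of records at once, then load them.
# Insights lag by up to the length of the batch window (here, one day).
def run_nightly_batch(source, destination):
    since = datetime.utcnow() - timedelta(days=1)
    records = source.fetch_records(since=since)   # one large, periodic pull
    destination.write_many(records)               # one bulk load

# Stream processing: handle each record as soon as it is created,
# so the destination reflects activity in near real time.
def run_stream(source, destination):
    for record in source.subscribe():             # yields events as they arrive
        destination.write_one(record)             # load immediately
```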

The destination is where the data is stored, such as an on-premises or cloud-based location like a data warehouse, a data lake, a data mart, or a certain application. The destination may also be referred to as a “sink”.


Data Pipeline vs. ETL Pipeline

One popular subset of a data pipeline is an ETL pipeline, which stands for extract, transform, and load. While popular, the term is not interchangeable with the umbrella term of “data pipeline”. An ETL pipeline is a series of processes that extract data from a source, transform it, and load it into a destination. The source might be business systems or marketing tools with a data warehouse as a destination.

There are a few key differentiators between an ETL pipeline and a data pipeline. First, ETL pipelines always involve data transformation and are processed in batches, while data pipelines can ingest data in real time and do not always involve transformation. Additionally, an ETL pipeline ends with loading the data into its destination, while a data pipeline doesn’t always end with the loading. Instead, the loading can activate new processes by triggering webhooks in other systems.
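To illustrate the extract, transform, and load steps end to end, here is a minimal Python sketch. The API endpoint, field names, and the local SQLite database standing in for a warehouse are all hypothetical, chosen only to keep the example self-contained.

```python
import sqlite3
import requests

def extract(api_url: str) -> list:
    """Extract: pull raw records from a source system (here, a REST endpoint)."""
    return requests.get(api_url, timeout=30).json()["records"]

def transform(rows: list) -> list:
    """Transform: clean and reshape the raw records before loading."""
    return [
        (row["id"], row["name"].strip().title(), float(row["amount"]))
        for row in rows
        if row.get("amount") is not None          # drop incomplete records
    ]

def load(rows: list, db_path: str = "warehouse.db") -> None:
    """Load: write the transformed rows into the destination (SQLite stand-in)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS sales (id TEXT, customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

if __name__ == "__main__":
    # Hypothetical endpoint; in a broader data pipeline this load step might
    # instead trigger a webhook or downstream process rather than being the end.
    load(transform(extract("https://example.com/api/sales")))
```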

Uses for Data Pipelines:

  • To move, process, and store data
  • To perform predictive analytics
  • To enable real-time reporting and metric updates

Uses for ETL Pipelines:

  • To centralize your company’s data
  • To move and transform data internally between different data stores
  • To enrich your CRM system with additional data

9 Popular Data Pipeline Tools

Although a data pipeline helps organize the flow of your data to a destination, managing the operations of your data pipeline can be overwhelming. For efficient operations, there are a variety of useful tools that serve different pipeline needs. Some of the best and most popular tools include:

  • AWS Data Pipeline: Easily automates the movement and transformation of data. The platform helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available.
  • Azure Data Factory: A data integration service that allows you to visually integrate your data sources with more than 90 built-in, maintenance-free connectors.
  • Etleap: A Redshift data pipeline tool that’s analyst-friendly and maintenance-free. Etleap makes it easy for businesses to move data from disparate sources to a Redshift data warehouse.
  • Fivetran: A platform that emphasizes faster time to insight rather than time spent on ETL, using robust solutions with standardized schemas and automated pipelines.
  • Google Cloud Dataflow: A unified stream and batch data processing platform that simplifies operations and management and reduces the total cost of ownership.
  • Keboola: A SaaS platform that starts for free and covers the entire pipeline operation cycle.
  • Segment: A customer data platform used by businesses to collect, clean, and control customer data to help them understand the customer journey and personalize customer interactions.
  • Stitch: A cloud-first platform that rapidly moves data to your business’s analysts within minutes so it can be used according to your requirements. Instead of making you focus on your pipeline, Stitch helps reveal valuable insights.
  • Xplenty: A cloud-based platform for ETL that is beginner-friendly, simplifying the ETL process to prepare data for analytics.

 

How We Can Help

Building a data pipeline can be daunting due to the complexities involved in safely and efficiently transferring data. At 2nd Watch, we can build and manage your data pipeline for you so you can focus on the BI and analytics that drive your business. Contact us if you would like to learn more.


Why Cloud Services are Here to Stay for Media & Entertainment

During the COVID-19 pandemic, media and entertainment (M&E) organizations accelerated their need to undertake a digital transformation. As we approach a post-pandemic world, M&E companies are realizing that their digital transformation is no longer just a short-term solution, but rather, it is a long-term necessity to survive the increasingly competitive and saturated landscape of content distribution and consumption. Cloud service providers play a crucial role for M&E brands as they continue their digital evolution. Throughout the pandemic, cloud solutions allowed M&E companies to adapt efficiently and effectively. Beyond the landscape of COVID-19, a cloud-based framework will continue to facilitate agility and scalability in the M&E business model.


How COVID-19 Impacted the Media and Entertainment Industry

When COVID-19 created an unprecedented environment and altered our daily operations, people and businesses had to rapidly adjust to the new circumstances. In particular, the M&E industry faced a reckoning that was imminent before the pandemic and became more acute during the pandemic.

For M&E businesses, COVID-19 forced a pivotal point in their digital strategy. The pandemic didn’t present vastly new challenges for M&E organizations; it simply accelerated and highlighted the problems they had already begun experiencing in the last five or so years. Viewer behavior is one of the biggest shake-ups in the M&E industry. Prior to 2020, audiences were already hunting for new ways to consume content. Traditional linear broadcast was waning and modern digital streaming services were booming. Media content consumption was drastically changing, as audiences streamed content on different devices, such as their smartphones, tablets, connected TVs, PCs, and gaming consoles. Now, legacy M&E brands are no longer competing just against nimble new players in the streaming space, but they are also competing against music, gaming, and esports platforms. All of these trends that were in motion pre-pandemic became more apparent after society began sheltering in place.

With most of the United States going remote, industry giants like Warner Brothers and Disney pivoted their focus to streaming content to adjust to shelter-in-place orders. In an unprecedented move, Warner Brothers began releasing new movies in theaters and via streaming platforms simultaneously. Disney’s emphasis on its streaming service, Disney Plus, paid off: it exploded during quarantine and quickly accumulated 100 million subscribers. Disney also followed a similar cinema distribution model to Warner Brothers by releasing new hits via streaming rather than just in theaters.

The need for digital innovation was crucial for the M&E industry to adapt to the new circumstances created by the pandemic, and this need will continue long into the post-COVID world. M&E organizations faced a catalyst in their structural transformation, and the digitization of content workflows and distribution became absolutely imperative as employees went remote and content consumption hit an all-time high. Moreover, certain market trends were felt more acutely during the pandemic and represented a paradigmatic shift for the M&E industry. These trends include the rise of direct-to-consumer, content wars via mergers and acquisitions, and wavering audience loyalty. Change is ever-present, and the consequences of not adapting to the modern world became obvious and unavoidable in the face of the pandemic. Ultimately, M&E incumbents who are slow to modernize their technology, production, and monetization strategies will be left behind by more agile competitors.

How M&E Companies Can Use the Cloud to Innovate

As we return “back to normal,” we’ll see how the pandemic affected our societal structures temporarily and permanently. The M&E industry was particularly changed in an irrevocable manner: a new age of media has been fully realized, and M&E businesses will have to rethink their business models as a result. How the pandemic will continue to evolve from here is still unknown, but it is clear that media organizations will have to continue to innovate in order to keep up with the changes in working patterns and audience behavior.

To adapt to the accelerated changes driven by COVID-19, the modern media supply chain will require agility, flexibility, and scalability. Cloud solutions (such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform) are the key enabler for M&E companies as they look to innovate. According to a Gartner report on digital transformation in media and entertainment, 80% of broadcasters and content creators migrated all or part of their operations to public cloud platforms as an urgent response to the effects of quarantine in 2020. By switching to cloud-based infrastructures, M&E companies were able to collaborate and create remotely, better understand real-time audience behavior, and maintain a secure environment while supporting media production, storage, processing, and distribution requirements.

There is no one-size-fits-all cloud strategy, as it depends on the business. Some companies opt for a single cloud provider, while others choose a multicloud strategy. A hybrid cloud solution is also an option, which utilizes data centers in conjunction with cloud service providers. Regardless of a company’s cloud strategy, the benefits of migrating to the cloud remain the same. Below, we’ll dive into a few of the pros of utilizing the cloud for modernizing workflows, supply chains, and data analyses.

Unifying Workflows

With a cloud platform, teams can now collaborate remotely and globally, which ultimately leads to greater productivity and efficiency in content creation. When it comes to media production, whether it is live or pre-filmed, massive teams of professionals are needed to make the vision come alive (editors, visual effects artists, production professionals, etc.). COVID-19 demonstrated that teams using cloud service providers could still work collaboratively and effectively in a remote environment. In fact, businesses realized that requiring teams to come on-site for content production can be more time-consuming and costly than working remotely. Virtual post-production is a great example of how the cloud is more economical in terms of both money and time. Using a modern cloud infrastructure, M&E brands can create virtual workstations, which replace physical workstations at the user’s desk. Unlike traditional workstations, virtual workstations do not carry a capital expense. Virtual workstations are extremely customizable in terms of size and power to the exact specifications needed for a given task. Furthermore, the billing is flexible, and you only pay for the resources you use. Lastly, with physical workstations, there are many “hidden costs.” Think about the electricity and staffing fees that businesses must pay in order to keep a workstation running. When you switch to a virtual workstation for post-production work, all of the aforementioned costs are managed by a cloud service provider.

Streamlining the Media Supply Chain

As media and entertainment shifts to direct-to-consumer, content management has become absolutely crucial in the media supply chain. Content libraries are only growing bigger, and there is an influx of newly produced assets as teams work more efficiently. Even so, most media companies store their library assets on-premises and within tape-based LTO cartridges. By doing so, these assets are not indexable, searchable, or readily accessible. This slows down editing, versioning, compliance checking, and repackaging, all of which hurts an organization’s ability for rapid content monetization. By implementing a cloud-based infrastructure, M&E companies can utilize tools like machine learning to manage, activate, and monetize their assets throughout the content supply chain.

Capturing Real-time Data

Archaic and lagging metrics, such as overnight ratings and box office returns, struggle to produce actionable insights today. Digital transformation for M&E organizations will require a technology and cultural transformation toward a data-driven mindset. To make data-driven decisions, you need to have the tools to collect, process, and analyze the data. Cloud platforms can help process big data by employing machine learning capabilities to deeply understand audiences, which can translate into monetization opportunities further down the funnel. By harnessing the cloud to redefine data strategy, businesses can make confident decisions using real-time data and use actionable insights to deliver real transformation.

Conclusion 

Before the pandemic, 2020 was shaping up to be a pivotal year for the M&E industry as audience behavior was changing and new competitors were cropping up; however, the effects of COVID-19 expedited these trends and forced organizations to transform immediately. In this new age of media, M&E companies must reckon with these unique and long-lasting challenges and seek to change their business models, cultures, and technologies to keep up with the changing landscape.

-Anthony Torabi, Media & Entertainment Strategic Account Executive


Simple & Secure Data Lakes with AWS Lake Formation

Data is the lifeblood of business. Helping companies visualize their data, guide business decisions, and enhance their business operations requires employing machine learning services. But where to begin? Today, tremendous amounts of data are created by companies worldwide, often in disparate systems.


These large amounts of data, while helpful, don’t necessarily need to be processed immediately, yet they need to be consolidated into a single source of truth to enable business value. Companies are faced with the issue of finding the best way to securely store their raw data for later use. One popular type of data store is referred to as a “data lake,” and it is very different from the traditional data warehouse.

Use Case: Data Lakes and McDonald’s

McDonald’s brings in about 1.5 million customers each day, creating 20-30 new data points with each of their transactions. The restaurant’s data comes from multiple data sources including a variety of data vendors, mobile apps, loyalty programs, CRM systems, etc. With all this data from various sources, the company wanted to build a complete perspective of customer lifetime value (CLV) and other useful analytics. To meet their needs for data collection and analytics, McDonald’s France partnered with 2nd Watch to build a data lake. The data lake allowed McDonald’s to ingest data into one source, reducing the effort required to manage and analyze their large amounts of data.

Due to their transition from a data warehouse to a data lake, McDonald’s France has greater visibility into the speed of service, customer lifetime value, and conversion rates. With an enhanced view of their data, the company can make better business decisions to improve their customers’ experience. So, what exactly is a data lake, how does it differ from a data warehouse, and how do they store data for companies like McDonald’s France?

What is a Data Lake?

A data lake is a centralized storage repository that holds a vast amount of raw data in its native format until it is needed for use. A data lake can include any combination of:

  • Structured data: highly organized data from relational databases
  • Semi-structured data: data with some organizational properties, such as HTML
  • Unstructured data: data without a predefined data model, such as email

Data lakes are often mistaken for data warehouses, but the two data stores cannot be used interchangeably. Data warehouses, the more traditional data store, process and store your data for analytical purposes. Filtering data through data warehouses occurs automatically, and the data can arrive from multiple locations. Data lakes, on the other hand, store and centralize data that comes in without processing it. Thus, there is no need to identify a specific purpose for the data as with a data warehouse environment. Your data, whether in its original form or curated form, can be stored in a data lake. Companies often choose a data lake for its flexibility in supporting any type of data, its scalability, analytics and machine learning capabilities, and low costs.

While data warehouses are appealing for their automatically curated data and fast results, data lakes can lead to several areas of improvement for your data and business, including:

  • Improved customer interactions
  • Improved R&D innovation choices
  • Increased operational efficiencies

Essentially, a piece of information stored in a data lake will seem like a small drop in a big lake. Due to the lack of organization and security that tends to occur when storing large quantities of data in data lakes, this storage method has received some criticism. Additionally, setting up a data lake can be time and labor intensive, often taking months to complete. This is because, when built the traditional way, there are a series of steps that need to be completed and then repeated for different data sets.

Even once fully architected, there can be errors in the setup due to your data lake being manually configured over an extended period. An important piece of your data lake is a data catalog, which uses machine learning capabilities to recognize data and create a universal schema when new datasets come into your data lake. Without defined mechanisms and proper governance, your data lake can quickly become a “data swamp,” where your data becomes hard to manage and analyze and ultimately becomes unusable. Fortunately, there is a solution to all these problems. You can build a well-architected data lake in a short amount of time with AWS Lake Formation.

AWS Lake Formation & its Benefits

Traditionally, data lakes were set up as on-premises deployments before people realized the value and security provided by the cloud. These on-premises environments required continual adjustments for things like optimization and capacity planning—which is now easier due to cloud services like AWS Lake Formation. Deploying data lakes in the cloud provides scalability, availability, security, and faster time to build and deploy your data lake.

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days, saving your business a lot of time and effort to focus on other aspects of the business. While AWS Lake Formation significantly cuts down the time it takes to set up your data lake, the result is still built and deployed securely. Additionally, AWS Lake Formation enables you to break down data silos and combine a variety of analytics to gain data insights and ultimately guide better business decisions. The benefits delivered by this AWS service are:

  • Build data lakes quickly: To build a data lake in Lake Formation, you simply need to import data from databases already in AWS, other AWS sources, or other external sources. Data stored in Amazon S3, for example, can be moved into your data lake, where you crawl, catalog, and prepare your data for analytics. Lake Formation also helps transform data with AWS Glue to prepare it for quality analytics. Additionally, with AWS’s FindMatches, data can be cleaned and deduplicated to simplify your data. A minimal setup sketch follows this list.
  • Simplify security management: Security management is simpler with Lake Formation because it provides automatic server-side encryption, providing a secure foundation for your data. Security settings and access controls can also be configured to ensure high-level security. Once configured with rules, Lake Formation enforces your access controls. With Lake Formation, your security and governance standards will be met.
  • Provide self-service access to data: With large amounts of data in your data lake, finding the data you need for a specific purpose can be difficult. Through Lake Formation, your users can search for relevant data using custom fields such as name, contents, and sensitivity to make discovering data easier. Lake Formation can also be paired with AWS analytics services, such as Amazon Athena, Amazon Redshift, and Amazon EMR. For example, queries can be run through Amazon Athena using data that is registered with Lake Formation.
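As a rough illustration of these building blocks, the sketch below uses boto3 to register an S3 location with Lake Formation, catalog the data with an AWS Glue crawler, and grant a role read access to the resulting database. The bucket, role, and database names are placeholders, and a real deployment would also need the underlying IAM roles and Lake Formation administrator settings in place.

```python
import boto3

# Placeholder names: replace the bucket, role ARNs, and database with your own.
BUCKET_ARN = "arn:aws:s3:::example-data-lake-bucket"
CRAWLER_ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole"
ANALYST_ARN = "arn:aws:iam::123456789012:role/ExampleAnalystRole"

lakeformation = boto3.client("lakeformation")
glue = boto3.client("glue")

# 1. Register the S3 location so Lake Formation governs access to it.
lakeformation.register_resource(ResourceArn=BUCKET_ARN, UseServiceLinkedRole=True)

# 2. Crawl the raw data with AWS Glue to catalog its schema.
glue.create_crawler(
    Name="example-lake-crawler",
    Role=CRAWLER_ROLE_ARN,
    DatabaseName="example_lake_db",
    Targets={"S3Targets": [{"Path": "s3://example-data-lake-bucket/raw/"}]},
)
glue.start_crawler(Name="example-lake-crawler")

# 3. Grant an analyst role permission to discover the cataloged database.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": ANALYST_ARN},
    Resource={"Database": {"Name": "example_lake_db"}},
    Permissions=["DESCRIBE"],
)
```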

Building a data lake is one hurdle but building a well-architected and secure data lake is another. With Lake Formation, building and managing data lakes is much easier. On a secure cloud environment, your data will be safe and easy to access.

2nd Watch has been recognized as a Premier Consulting Partner by AWS for nearly a decade and our engineers are 100% certified on AWS. Contact us to learn more about AWS Lake Formation or to get assistance building your data lake.

-Tessa Foley, Marketing


BigQuery, Looker, and Cloud Functions – Let’s create a data solution.

As a cloud consulting company, we see enterprise clients with a lot of data; typical for most of these clients is that the data is siloed and universal access to the information is far from transparent. Client data libraries are essentially islands of misfit toys.

During an internal hackathon, Nick Centola and I decided to take up the challenge of creating an enterprise-class solution that would extract, transform, and load (ETL) data from multiple sources to a data warehouse, with the capability of performing advanced forecasting, and that would be 100% serverless by design, inherently keeping running costs to a minimum.

We decided to keep the scope relatively simple and used the publicly available Citi Bike NYC dataset. The Citi Bike NYC dataset has monthly trip data exported as public CSV files and a near real-time API, which from our experience is a pattern we often see in enterprises. The diagram below represents what we were trying to achieve.

Extract Transform Load (ETL)

At 2nd Watch, we love Functions-as-a-Service (FaaS) and Cloud Functions as we can create very scalable solutions, have no infrastructure to manage, and in most instances, we will not have to worry about the cost associated with the Cloud Functions.

There were two ETL jobs to write. One was to take the zipped CSV data from the public S3 trip data bucket and land it in our Google Cloud Storage bucket for an automated daily import into BigQuery. The other function was to grab data from the stations’ near real-time RESTful API endpoint and insert it into our BigQuery table.
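As a sketch of the second job, here is roughly what an HTTP-triggered Cloud Function in Python could look like. The station feed URL is the public Citi Bike GBFS station_status endpoint, while the BigQuery table ID and the columns kept are placeholder assumptions rather than the exact schema we used.

```python
import requests
import functions_framework
from google.cloud import bigquery

STATION_STATUS_URL = "https://gbfs.citibikenyc.com/gbfs/en/station_status.json"
TABLE_ID = "my-project.citibike.station_status"  # placeholder project.dataset.table

bq_client = bigquery.Client()

@functions_framework.http
def ingest_station_status(request):
    """Fetch near real-time station data and stream it into BigQuery."""
    payload = requests.get(STATION_STATUS_URL, timeout=30).json()
    rows = [
        {
            "station_id": s["station_id"],
            "num_bikes_available": s["num_bikes_available"],
            "num_docks_available": s["num_docks_available"],
            "last_reported": s["last_reported"],
        }
        for s in payload["data"]["stations"]
    ]
    errors = bq_client.insert_rows_json(TABLE_ID, rows)  # streaming insert
    return (f"insert errors: {errors}", 500) if errors else ("ok", 200)
```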

Nick is most efficient with Python; I am most efficient with NodeJS. As both languages are acceptable production code languages for most organizations we work with, we decided to write a function each in our respective preferred languages.

The data that we pulled into BigQuery was already clean. We did not need to enrich or transform the data for our purpose – this is not always the case, and cleaning and enriching data are areas where we usually spend most of our time when building similar solutions for our customers.

Machine Learning

We wanted to enable a relatively simple forecast of bike demand at individual stations across New York City. BigQuery ML is incredibly powerful and has more than 30 built-in machine learning models. The model of choice for our use case was the ARIMA model, which takes time series data as an input. I won’t go into too much detail on why the ARIMA model is a good fit compared to the other available models; the full form of the acronym describes why: Auto Regressive (AR) Integrated (I) Moving Average (MA).
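As a sketch of how this looks in practice, with placeholder project, dataset, table, and column names, the model is trained with a CREATE MODEL statement using BigQuery ML’s ARIMA_PLUS model type over hourly trip counts, and forecasts with upper and lower bounds are read back with ML.FORECAST:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder dataset/table/column names; adjust to your schema.
TRAIN_MODEL_SQL = """
CREATE OR REPLACE MODEL `my-project.citibike.station_demand_arima`
OPTIONS(
  model_type                = 'ARIMA_PLUS',
  time_series_timestamp_col = 'trip_hour',
  time_series_data_col      = 'trips_started',
  time_series_id_col        = 'start_station_id'
) AS
SELECT
  TIMESTAMP_TRUNC(started_at, HOUR) AS trip_hour,
  start_station_id,
  COUNT(*) AS trips_started
FROM `my-project.citibike.trips`
GROUP BY trip_hour, start_station_id
"""

FORECAST_SQL = """
SELECT start_station_id, forecast_timestamp, forecast_value,
       prediction_interval_lower_bound, prediction_interval_upper_bound
FROM ML.FORECAST(MODEL `my-project.citibike.station_demand_arima`,
                 STRUCT(24 AS horizon, 0.9 AS confidence_level))
"""

client.query(TRAIN_MODEL_SQL).result()           # train the time-series model
for row in client.query(FORECAST_SQL).result():  # hourly forecasts with bounds
    print(row.start_station_id, row.forecast_timestamp, row.forecast_value)
```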

Visualizations

Bringing it all together, we created our LookML models in Looker and interacted with the data exceptionally easily. We made a couple of heat map-based visualizations of New York City to easily visualize the popular routes and stations, and a station dashboard to monitor expected supply and demand over the next hour. With the bike stations API data flowing into BQ every 5 seconds, we get a close-to-real-time dashboard that we can use as the basis for alerting staff of an inadequate number of bikes at any station across NYC.

The station forecast shows the upper and the lower bound forecast for each hour over the next month. We use the upper bound forecast for our predicted "number of bikes in the next hour" and pull in our available bikes from the real-time API. If you use your imagination, you can think of other use cases where a similar prediction could be relevant: franchise restaurant ingredient forecasting, or forecasting at retailers for inventory or staffing needs to service customers – the possibilities are endless.

One of the coolest things we did, from Nick’s and my perspective, was to drive model training and forecasting straight from Looker and LookML, allowing us to essentially kick off our model training every time we receive new data in BigQuery – all from the convenient interface of Looker.

Enterprise Solution

As this was a quick prototyping effort, we took a few shortcuts compared to our delivery standards at 2nd Watch. First, we did not use infrastructure as code, a best practice we implement for all production-ready customer engagements. Second, we decided not to worry about data quality, which would be something we would clean, enrich, and transform based on your documented business requirements. Third, we did not set up telemetry that would allow us to respond to things like slow queries and broken ETL jobs or visualizations.

Is this hard?

Yes and no. For us it was not – Nick’s and my combined experience adds up to thousands of hours building and documenting data pipelines and distributed systems. If you are new to this and your data footprint includes more than a few data sources, we highly recommend that you ask for enterprise expertise in building out your pipeline. You’ll need a team with in-depth experience to help you set up LookML as this will be the foundation for self-service within your organization. Ultimately though, experiments like this can serve to create both business intelligence and allow your organization to proactively respond to events to meet your corporate and digital transformation initiatives.

Want to see a demo of our solution? Check out our webinars below:

Aleksander Hansson, 2nd Watch Google Cloud Specialist

 


3 Reasons Businesses Use Google Cloud Platform (GCP) for AI

Google Cloud Platform (GCP) offers a wide scope of artificial intelligence (AI) and machine learning (ML) services fit for a range of industries and use cases. With more businesses turning to AI for data-based innovation and new solutions, GCP services are proving effective. See why so many organizations are choosing Google Cloud to motivate, manage, and make change easy.


1. Experimentation and Cost Savings

Critical to the success of AI and ML models are data scientists. The more you enable, empower, and support your data scientists through the AI lifecycle, the more accurate and reliable your models will be. Key to any successful new strategy are flexibility and cost management. One way GCP reduces costs while offering enterprise flexibility is with Google’s AI Platform Notebooks.

Managed JupyterLab notebook instances give data scientists functional flexibility – including access to BigQuery, with the ability to add CPUs, RAM, and GPUs to scale – cloud security, and data access with a streamlined experience from data to deployment. Relying on on-prem environments, data scientists are limited by resource availability and a variety of costs related to data warehousing infrastructure, hosting, security, storage, and other expenses. JupyterLab notebooks and BigQuery, on the other hand, are pay-as-you-go and always available via AI Platform Notebooks. With cost-effective experimentation, you avoid overprovisioning, only pay for what you use and when you run, and give data scientists powerful tools to get data solutions fast.
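As a small illustration of that pay-as-you-go model, the sketch below runs a query from a notebook against a public BigQuery dataset with a cap on bytes billed, so exploratory work stays inside a known cost ceiling. The 10 GB cap is arbitrary, and pandas is assumed to be installed for to_dataframe().

```python
from google.cloud import bigquery

client = bigquery.Client()

# Cap how much data a single query may scan; BigQuery fails the job instead of
# billing past the cap, which keeps notebook experimentation predictable.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)  # ~10 GB

sql = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

df = client.query(sql, job_config=job_config).to_dataframe()
print(df)
```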

2. Access and Applications

AI and ML projects are only possible after unifying data. A common challenge to accomplishing this first step is data silos across the organization. These pockets of disjointed data across departments threaten the reliability and business outcomes of data-based decision-making. The GCP platform is built on a foundation of integration and collaboration, giving teams the necessary tools and expansive services to gain new data insights for greater impact.

For instance, GCP enables more than just data scientists to take advantage of its AI services, databases, and tools. Developers without data science experience can utilize APIs to incorporate ML into a solution without ever needing to build a model. Even those without data science knowledge can create custom models that integrate into applications and websites using Cloud AutoML.
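As one example of that API route (any of the pre-trained Cloud APIs would do), the sketch below calls the Cloud Natural Language API to score sentiment without building or training a model. The sample text is made up, and application credentials are assumed to be configured in the environment.

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

# Analyze the sentiment of a piece of text with a pre-trained model;
# no data science or model building required.
document = language_v1.Document(
    content="The new checkout flow is fantastic and so much faster.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

sentiment = client.analyze_sentiment(document=document).document_sentiment
print(f"score={sentiment.score:.2f}, magnitude={sentiment.magnitude:.2f}")
```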

Additionally, BigQuery Omni, a new service from GCP, enables compatibility across platforms. BigQuery Omni enables you to query data residing in other places using standard SQL with the powerful engine of BigQuery. This innovation furthers your ability to join data quickly and without additional expertise for unobstructed applicability.

3. ML Training and Labs

Google enables users with best practices for cost-efficiency and performance. Through its Qwiklabs platform, you get free, temporary access to GCP and AWS to learn the cloud on the real thing, rather than simulations. Google also offers training courses ranging from 30-minute individual sessions to multi-day sessions. The courses are built for introductory users all the way up to expert level, and are instructor-led or self-paced. Thousands of topics are covered, including AI and ML, security, infrastructure, app dev, and many more.

With educational resources at their fingertips, data teams can roll up their sleeves, dive into sample data sets and labs, and experience the potential of GCP hands-on. Having the ability to experiment with labs without running up a bill – because it is in a sandbox environment – makes the actual implementation, training, and verification process faster, easier, and more cost-effective. There is no danger of accidentally leaving a BigQuery system up and running, executing over and over, with a huge cost to the business.

Next Steps

If you’re contemplating AI and ML on Google Cloud Platform, get started with Qwiklabs to see what’s possible. Whether you’re the one cheerleading AI and ML in your organization or the one everyone is seeking buy-in from, Qwiklabs can help. See what’s possible on the platform before going full force on a strategy. Google is constantly adding new services and tools, so partner with experts you can trust to achieve the business transformation you’re expecting.

Contact 2nd Watch, a Google Cloud Partner with over 10 years of cloud experience, to discuss your use cases, level of complexity, and our advanced suite of capabilities with a cloud advisor.

Learn more

Webinar: 6 Essential Tactics for your Data & Analytics Strategy

Webinar: Building an ML foundation for Google BigQuery ML & Looker

-Sam Tawfik, Sr Product Marketing Manager
