9 Helpful Tools for Building a Data Pipeline

Companies create tons of disparate data across their organizations through applications, databases, files, and streaming sources. Moving that data from one system to another is a complex and tedious process. Ingesting different types of data into a common platform requires extensive skill and knowledge of both the data types involved and the sources they come from.

Due to these complexities, this process can be faulty, leading to inefficiencies like bottlenecks, or the loss or duplication of data. As a result, data analytics becomes less accurate and less useful, and in many instances provides inconclusive or just plain inaccurate results.

For example, a company might be looking to pull raw data from a database or CRM system and move it to a data lake or data warehouse for predictive analytics. To ensure this process is done efficiently, a comprehensive data strategy needs to be deployed, which necessitates the creation of a data pipeline.

What is a Data Pipeline?

A data pipeline is a set of actions organized into processing steps that integrates raw data from multiple sources into one destination for storage, business intelligence (BI), data analysis, and visualization.

There are three key elements to a data pipeline: source, processing, and destination. The source is the starting point for a data pipeline. Data sources may include relational databases and data from SaaS applications. There are two different processing (or ingestion) models: batch processing and stream processing.

  • Batch processing: Occurs when the source data is collected periodically and sent to the destination system. Batch processing enables the complex analysis of large datasets. Because batch processing occurs periodically, the insights gained from this type of processing are from information and activities that occurred in the past.
  • Stream processing: Occurs in real-time, sourcing, manipulating, and loading the data as soon as it’s created. Stream processing may be more appropriate when timeliness is important because it takes less time than batch processing. Additionally, stream processing comes with lower cost and lower maintenance.

The destination is where the data is stored, such as an on-premises or cloud-based location like a data warehouse, a data lake, a data mart, or a certain application. The destination may also be referred to as a “sink”.
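To make the two ingestion models concrete, here is a minimal, illustrative sketch in plain Python. The source, records, and destination are stand-ins rather than any particular product: the batch loop collects records and loads them periodically, while the streaming loop loads each record as soon as it appears.

```python
import time
from datetime import datetime, timezone

def event_source():
    """Stand-in for a real source: a database, SaaS application, or message queue."""
    for i in range(10):
        yield {"id": i, "created_at": datetime.now(timezone.utc).isoformat()}
        time.sleep(0.1)  # events trickle in over time

def load_to_destination(records):
    """Stand-in for loading into a warehouse, lake, data mart, or application."""
    print(f"loaded {len(records)} record(s)")

# Batch processing: collect source data periodically, then send it in one shot.
batch = []
for record in event_source():
    batch.append(record)
    if len(batch) >= 5:            # or on a schedule, e.g., nightly
        load_to_destination(batch)
        batch.clear()
if batch:
    load_to_destination(batch)     # flush whatever is left over

# Stream processing: source, manipulate, and load each record as it is created.
for record in event_source():
    load_to_destination([record])
```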

Data Pipeline vs. ETL Pipeline

One popular subset of a data pipeline is an ETL pipeline, which stands for extract, transform, and load. While popular, the term is not interchangeable with the umbrella term of “data pipeline”. An ETL pipeline is a series of processes that extract data from a source, transform it, and load it into a destination. The source might be business systems or marketing tools with a data warehouse as a destination.

There are a few key differentiators between an ETL pipeline and a data pipeline. First, ETL pipelines always involve data transformation and are processed in batches, while data pipelines do not always involve transformation and can ingest data in real time. Additionally, an ETL pipeline ends with loading the data into its destination, while a data pipeline doesn’t always end with the load. Instead, the load can activate new processes by triggering webhooks in other systems.
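As a small illustration of that last point, the sketch below shows how a pipeline might notify another system once a load finishes instead of simply stopping at the destination. It is plain Python, and the webhook URL and table name are purely hypothetical.

```python
import json
import urllib.request

def notify_load_complete(table_name: str, row_count: int) -> None:
    """POST a small payload to a downstream system after a load finishes."""
    payload = json.dumps({"table": table_name, "rows": row_count}).encode("utf-8")
    request = urllib.request.Request(
        "https://example.com/webhooks/load-complete",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # the receiving system kicks off its own process

# After the pipeline loads rows into the destination, trigger downstream work.
notify_load_complete("orders_clean", 5000)
```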

Uses for Data Pipelines:

  • To move, process, and store data
  • To perform predictive analytics
  • To enable real-time reporting and metric updates

Uses for ETL Pipelines:

  • To centralize your company’s data
  • To move and transform data internally between different data stores
  • To enrich your CRM system with additional data

9 Popular Data Pipeline Tools

Although a data pipeline helps organize the flow of your data to a destination, managing the operations of your data pipeline can be overwhelming. For efficient operations, there are a variety of useful tools that serve different pipeline needs. Some of the best and most popular tools include:

  • AWS Data Pipeline: Easily automates the movement and transformation of data. The platform helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available.
  • Azure Data Factory: A data integration service that allows you to visually integrate your data sources with more than 90 built-in, maintenance-free connectors.
  • Etleap: A Redshift data pipeline tool that’s analyst-friendly and maintenance-free. Etleap makes it easy for businesses to move data from disparate sources to a Redshift data warehouse.
  • Fivetran: A platform that emphasizes faster time to insight rather than hands-on ETL, using robust solutions with standardized schemas and automated pipelines.
  • Google Cloud Dataflow: A unified stream and batch data processing platform that simplifies operations and management and reduces the total cost of ownership.
  • Keboola: A SaaS platform that starts for free and covers the entire pipeline operations cycle.
  • Segment: A customer data platform used by businesses to collect, clean, and control customer data to help them understand the customer journey and personalize customer interactions.
  • Stitch: A cloud-first platform that rapidly moves data to your business’s analysts within minutes so it can be used according to your requirements. Stitch helps you focus on revealing valuable insights rather than on your pipeline.
  • Xplenty: A cloud-based platform for ETL that is beginner-friendly, simplifying the ETL process to prepare data for analytics.

 

How We Can Help

Building a data pipeline can be daunting due to the complexities involved in safely and efficiently transferring data. At 2nd Watch, we can build and manage your data pipeline for you so you can focus on the BI and analytics that drive your business. Contact us if you would like to learn more.

Why Cloud Services are Here to Stay for Media & Entertainment

During the COVID-19 pandemic, media and entertainment (M&E) organizations accelerated their need to undertake a digital transformation. As we approach a post-pandemic world, M&E companies are realizing that their digital transformation is no longer just a short-term solution, but rather a long-term necessity to survive the increasingly competitive and saturated landscape of content distribution and consumption. Cloud service providers play a crucial role for M&E brands as they continue their digital evolution. Throughout the pandemic, cloud solutions allowed M&E companies to adapt efficiently and effectively. Beyond the landscape of COVID-19, a cloud-based framework will continue to facilitate agility and scalability in the M&E business model.

How COVID-19 Impacted the Media and Entertainment Industry

When COVID-19 created an unprecedented environment and altered our daily operations, people and businesses had to rapidly adjust to the new circumstances. In particular, the M&E industry faced a reckoning that was imminent before the pandemic and became more acute during the pandemic.

For M&E businesses, COVID-19 forced an important pivot point in their digital strategy. The pandemic didn’t present vastly new challenges for M&E organizations; it simply accelerated and highlighted the problems they had already begun experiencing over the last five or so years. Viewer behavior is one of the biggest shake-ups in the M&E industry. Prior to 2020, audiences were already hunting for new ways to consume content. Traditional linear broadcast was waning and modern digital streaming services were booming. Media content consumption was drastically changing as audiences streamed content on different devices, such as smartphones, tablets, connected TVs, PCs, and gaming consoles. Now, legacy M&E brands are no longer competing just against nimble new players in the streaming space; they are also competing against music, gaming, and esports platforms. All of these trends that were in motion pre-pandemic became more apparent after society began sheltering in place.

With most of the United States going remote, industry giants like Warner Brothers and Disney pivoted their focus to streaming content to adjust to shelter-in-place orders. In an unprecedented move, Warner Brothers began releasing new movies in theaters and via streaming platforms simultaneously. Disney’s emphasis on its streaming service, Disney Plus, paid off: it exploded during quarantine and quickly accumulated 100 million subscribers. Disney also followed a cinema distribution model similar to Warner Brothers by releasing new hits via streaming rather than just in theaters.

The need for digital innovation was crucial for the M&E industry to adapt to the new circumstances created by the pandemic, and this need will continue long into the post-COVID world. M&E organizations faced a catalyst in their structural transformation, and the digitization of content workflows and distribution became absolutely imperative as employees went remote and content consumption hit an all-time high. Moreover, certain market trends were felt more acutely during the pandemic and represented a paradigmatic shift for the M&E industry. These trends include the rise of direct-to-consumer, content wars via mergers and acquisitions, and wavering audience loyalty. Change is ever-present, and the consequences of not adapting to the modern world became obvious and unavoidable in the face of the pandemic. Ultimately, M&E incumbents who are slow to modernize their technology, production, and monetization strategies will be left behind by more agile competitors.

How M&E Companies Can Use the Cloud to Innovate

As we return “back to normal,” we’ll see how the pandemic affected our societal structures temporarily and permanently. The M&E industry was particularly changed in an irrevocable manner: a new age of media has been fully realized, and M&E businesses will have to rethink their business models as a result. How the pandemic will continue to evolve from here is still unknown, but it is clear that media organizations will have to continue to innovate in order to keep up with the changes in working patterns and audience behavior.

To adapt to the accelerated changes driven by COVID-19, the modern media supply chain will require agility, flexibility, and scalability. Cloud solutions (such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform) are the key enabler for M&E companies as they look to innovate. According to a Gartner report on digital transformation in media and entertainment, 80% of broadcasters and content creators migrated all or part of their operations to public cloud platforms as an urgent response to the effects of quarantine in 2020. By switching to cloud-based infrastructures, M&E companies were able to collaborate and create remotely, better understand real-time audience behavior, and maintain a secure environment while supporting media production, storage, processing, and distribution requirements.

There is no one-size-fits-all cloud strategy, as it is dependent on the business. Some companies opt for a single cloud provider, while others choose a multi-cloud strategy. A hybrid cloud solution is also an option, which utilizes data centers in conjunction with cloud service providers. Regardless of a company’s cloud strategy, the benefits of migrating to the cloud remain the same. Below we’ll dive into a few of the pros of utilizing the cloud for modernizing workflows, supply chains, and data analysis.

Unifying Workflows

With a cloud platform, teams can now collaborate remotely and globally, which ultimately leads to greater productivity and efficiency in content creation. When it comes to media production, whether it is live or pre-filmed, massive teams of professionals are needed to make the vision come alive (editors, visual effects artists, production professionals, etc.). COVID-19 demonstrated that teams using cloud service providers could still work collaboratively and effectively in a remote environment. In fact, businesses realized that requiring teams to come on-site for content production can be more time consuming and costly than working remotely. Virtual post-production is a great example of how the cloud is more economical in terms of both cost and time. Using a modern cloud infrastructure, M&E brands can create virtual workstations, which replace physical workstations at the user’s desk. Unlike traditional workstations, virtual workstations do not carry a capital expense. Virtual workstations are extremely customizable, sized and powered to the exact specifications needed for a given task. Furthermore, the billing is flexible, and you only pay for the resources you use. Lastly, with physical workstations, there are many “hidden costs”: think about the electricity and staffing fees that businesses must pay in order to keep a workstation running. When you switch to a virtual workstation for post-production work, all of the aforementioned costs are managed by a cloud service provider.

Streamlining the Media Supply Chain

As media and entertainment shifts to direct-to-consumer, content management has become absolutely crucial in the media supply chain. Content libraries are only growing bigger, and there is an influx of newly produced assets as teams work more efficiently. Even so, most media companies store their library assets on-premises, often on tape-based LTO cartridges. As a result, these assets are not indexable, searchable, or readily accessible. This slows down editing, versioning, compliance checking, and repackaging, all of which hurts an organization’s ability to monetize content rapidly. By implementing a cloud-based infrastructure, M&E companies can use tools like machine learning to manage, activate, and monetize their assets throughout the content supply chain.

Capturing Real-time Data

Archaic, lagging metrics such as overnight ratings and box office returns struggle to produce actionable insights today. Digital transformation for M&E organizations will require a technological and cultural shift toward a data-driven mindset. To make data-driven decisions, you need the tools to collect, process, and analyze data. Cloud platforms can help process big data by employing machine learning capabilities to deeply understand audiences, which can translate into monetization opportunities further down the funnel. By harnessing the cloud to redefine data strategy, businesses can make confident decisions using real-time data and use actionable insights to deliver real transformation.

Conclusion 

Before the pandemic, 2020 was already shaping up to be a pivotal year for the M&E industry as audience behavior changed and new competitors cropped up; however, the effects of COVID-19 expedited these trends and forced organizations to transform immediately. In this new age of media, M&E companies must reckon with these unique and long-lasting challenges and seek to change their business models, cultures, and technologies to keep up with the changing landscape.

-Anthony Torabi, Media & Entertainment Strategic Account Executive

Simple & Secure Data Lakes with AWS Lake Formation

Data is the lifeblood of business. Helping companies visualize their data, guide business decisions, and enhance their business operations increasingly requires employing machine learning services. But where to begin? Today, tremendous amounts of data are created by companies worldwide, often in disparate systems.

These large amounts of data, while helpful, don’t necessarily need to be processed immediately, yet they need to be consolidated into a single source of truth to enable business value. Companies are faced with the issue of finding the best way to securely store their raw data for later use. One popular type of data store is referred to as a “data lake,” and it is very different from the traditional data warehouse.

Use Case: Data Lakes and McDonald’s

McDonald’s brings in about 1.5 million customers each day, creating 20-30 new data points with each of their transactions. The restaurant’s data comes from multiple sources, including a variety of data vendors, mobile apps, loyalty programs, CRM systems, and more. With all this data from various sources, the company wanted to build a complete perspective of customer lifetime value (CLV) and other useful analytics. To meet their needs for data collection and analytics, McDonald’s France partnered with 2nd Watch to build a data lake. The data lake allowed McDonald’s to ingest data into one source, reducing the effort required to manage and analyze their large amounts of data.

Due to their transition from a data warehouse to a data lake, McDonald’s France has greater visibility into the speed of service, customer lifetime value, and conversion rates. With an enhanced view of their data, the company can make better business decisions to improve their customers’ experience. So, what exactly is a data lake, how does it differ from a data warehouse, and how do they store data for companies like McDonald’s France?

What is a Data Lake?

A data lake is a centralized storage repository that holds a vast amount of raw data in its native format until it is needed for use. A data lake can include any combination of:

  • Structured data: highly organized data from relational databases
  • Semi-structured data: data with some organizational properties, such as HTML
  • Unstructured data: data without a predefined data model, such as email

Data lakes are often mistaken for data warehouses, but the two data stores cannot be used interchangeably. Data warehouses, the more traditional data store, process and store your data for analytical purposes. Filtering data through a data warehouse occurs automatically, and the data can arrive from multiple locations. Data lakes, on the other hand, store and centralize data as it comes in without processing it. Thus, there is no need to identify a specific purpose for the data as in a data warehouse environment. Your data, whether in its original form or curated form, can be stored in a data lake. Companies often choose a data lake for its flexibility in supporting any type of data, its scalability, its analytics and machine learning capabilities, and its low cost.

While data warehouses are appealing for their automatically curated data and fast results, data lakes can lead to several areas of improvement for your data and business, including:

  • Improved customer interactions
  • Improved R&D innovation choices
  • Increased operational efficiencies

Essentially, a piece of information stored in a data lake will seem like a small drop in a big lake. Due to the lack of organization and security that tends to occur when storing large quantities of data this way, data lakes have received some criticism. Additionally, setting up a data lake can be time and labor intensive, often taking months to complete. This is because, when built the traditional way, there is a series of steps that needs to be completed and then repeated for different data sets.

Even once fully architected, there can be errors in the setup because your data lake was manually configured over an extended period. An important piece of your data lake is a data catalog, which uses machine learning capabilities to recognize data and create a universal schema as new datasets come into your data lake. Without defined mechanisms and proper governance, your data lake can quickly become a “data swamp,” where your data becomes hard to manage and analyze and ultimately becomes unusable. Fortunately, there is a solution to all these problems: you can build a well-architected data lake in a short amount of time with AWS Lake Formation.

AWS Lake Formation & its Benefits

Traditionally, data lakes were set up as on-premises deployments before people realized the value and security provided by the cloud. These on-premises environments required continual adjustments for things like optimization and capacity planning—which is now easier due to cloud services like AWS Lake Formation. Deploying data lakes in the cloud provides scalability, availability, security, and faster time to build and deploy your data lake.

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days, saving you significant time and effort to focus on other aspects of your business. While AWS Lake Formation significantly cuts down the time it takes to set up your data lake, the lake is still built and deployed securely. Additionally, AWS Lake Formation enables you to break down data silos and combine a variety of analytics to gain data insights and ultimately guide better business decisions. The benefits delivered by this AWS service are:

  • Build data lakes quickly: To build a data lake in Lake Formation, you simply import data from databases already in AWS, from other AWS sources, or from external sources. Data stored in Amazon S3, for example, can be moved into your data lake, where you crawl, catalog, and prepare your data for analytics (a rough example of registering an S3 location follows this list). Lake Formation also helps transform data with AWS Glue to prepare it for quality analytics. Additionally, with AWS’s FindMatches, data can be cleaned and deduplicated to simplify your data.
  • Simplify security management: Security management is simpler with Lake Formation because it provides automatic server-side encryption, providing a secure foundation for your data. Security settings and access controls can also be configured to ensure high-level security. Once configured with rules, Lake Formation enforces your access controls. With Lake Formation, your security and governance standards will be met.
  • Provide self-service access to data: With large amounts of data in your data lake, finding the data you need for a specific purpose can be difficult. Through Lake Formation, your users can search for relevant data using custom fields such as name, contents, and sensitivity to make discovering data easier. Lake Formation can also be paired with AWS analytics services, such as Amazon Athena, Amazon Redshift, and Amazon EMR. For example, queries can be run through Amazon Athena using data that is registered with Lake Formation.
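As a rough sketch of the first two points using the AWS SDK for Python (boto3), the snippet below registers an S3 location with Lake Formation and grants an analyst role SELECT access to a cataloged table. The bucket, role ARN, database, and table names are placeholders, and a real setup depends on your account structure and governance model.

```python
import boto3

lake = boto3.client("lakeformation")

# Register an S3 location so Lake Formation can manage access to the data in it.
lake.register_resource(
    ResourceArn="arn:aws:s3:::example-data-lake/raw/",   # placeholder bucket/prefix
    UseServiceLinkedRole=True,
)

# Grant an analyst role SELECT on a table previously cataloged (e.g., by a Glue crawler).
lake.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/analyst"},
    Resource={"Table": {"DatabaseName": "sales", "Name": "orders"}},
    Permissions=["SELECT"],
)
```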

Building a data lake is one hurdle, but building a well-architected and secure data lake is another. With Lake Formation, building and managing data lakes is much easier. On a secure cloud environment, your data will be safe and easy to access.

2nd Watch has been recognized as a Premier Consulting Partner by AWS for nearly a decade and our engineers are 100% certified on AWS. Contact us to learn more about AWS Lake Formation or to get assistance building your data lake.

-Tessa Foley, Marketing

3 Reasons Businesses Use Google Cloud Platform (GCP) for AI

Google Cloud Platform (GCP) offers a wide scope of artificial intelligence (AI) and machine learning (ML) services fit for a range of industries and use cases. With more businesses turning to AI for data-based innovation and new solutions, GCP services are proving effective. See why so many organizations are choosing Google Cloud to motivate, manage, and make change easy.

1. Experimentation and Cost Savings

Critical to the success of AI and ML models are data scientists. The more you enable, empower, and support your data scientists through the AI lifecycle, the more accurate and reliable your models will be. Key to any successful new strategy are flexibility and cost management. One way GCP reduces costs while offering enterprise flexibility is with Google’s AI Platform Notebooks.

Managed JupyterLab notebook instances give data scientists functional flexibility – including access to BigQuery, with the ability to add CPUs, RAM, and GPUs to scale – cloud security, and data access, with a streamlined experience from data to deployment. Relying on on-prem environments, data scientists are limited by resource availability and a variety of costs related to data warehousing infrastructure, hosting, security, storage, and other expenses. JupyterLab notebooks and BigQuery, on the other hand, are pay as you go and always available via AI Platform Notebooks. With cost-effective experimentation, you avoid overprovisioning, only pay for what you use when you run it, and give data scientists powerful tools to get to data solutions fast.
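For example, from an AI Platform Notebooks instance (or anywhere the google-cloud-bigquery client library is installed and authenticated), a data scientist can run a query against BigQuery and work with the results immediately, paying only for the query itself. The public dataset below is just a convenient illustration.

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up the notebook's service account credentials

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

# BigQuery executes the query serverlessly; we only iterate over the results.
for row in client.query(query).result():
    print(row.name, row.total)
```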

2. Access and Applications

AI and ML projects are only possible after unifying data. A common challenge to accomplishing this first step is data silos across the organization. These pockets of disjointed data across departments threaten the reliability and business outcomes of data-based decision making. The GCP platform is built on a foundation of integration and collaboration, giving teams the necessary tools and expansive services to gain new data insights for greater impact.

For instance, GCP enables more than just data scientists to take advantage of its AI services, databases, and tools. Developers without data science experience can use APIs to incorporate ML into a solution without ever needing to build a model. Even those without any data science knowledge can create custom models that integrate into applications and websites using Cloud AutoML.

Additionally, BigQuery Omni, a new service from GCP, enables compatibility across platforms. BigQuery Omni enables you to query data residing in other places using standard SQL with the powerful engine of BigQuery. This innovation furthers your ability to join data quickly and without additional expertise for unobstructed applicability.

3. ML Training and Labs

Google enables users with best practices for cost-efficiency and performance. Through its Qwiklabs platform, you get free, temporary access to GCP and AWS so you can learn the cloud on the real thing rather than simulations. Google also offers training courses ranging from 30-minute individual sessions to multi-day sessions. The courses are built for introductory users all the way up to expert level, and they are instructor-led or self-paced. Thousands of topics are covered, including AI and ML, security, infrastructure, app dev, and many more.

With educational resources at their fingertips, data teams can roll up their sleeves, dive in, find some sample data sets and labs, and experience the potential of GCP hands-on. Having the ability to experiment with labs without running up a bill – because it is in a sandbox environment – makes the actual implementation, training, and verification process faster, easier, and more cost-effective. There is no danger of accidentally leaving a BigQuery system up and running, executing over and over, with a huge cost to the business.

Next Steps

If you’re contemplating AI and ML on Google Cloud Platform, get started with Qwiklabs to see what’s possible. Whether you’re the one cheerleading AI and ML in your organization or the one everyone is seeking buy-in from, Qwiklabs can help. See what’s possible on the platform before going full force on a strategy. Google is constantly adding new services and tools, so partner with experts you can trust to achieve the business transformation you’re expecting.

Contact 2nd Watch, a Google Cloud Partner with over 10 years of cloud experience, to discuss your use cases, level of complexity, and our advanced suite of capabilities with a cloud advisor.

Learn more

Webinar: 6 Essential Tactics for your Data & Analytics Strategy

Webinar:  Building an ML foundation for Google BigQuery ML & Looker

-Sam Tawfik, Sr Product Marketing Manager

3 Types of Employees That Can Use AI Offerings on Google Cloud

The Google Cloud Platform (GCP) comes with a number of services, databases, and tools to operationalize company-wide data management and analytics. With the insights and accessibility provided, you can leverage data into artificial intelligence (AI) and machine learning (ML) projects cost-efficiently. GCP empowers employees to apply their ideas and experience into data-based solutions and innovation for business growth. Here’s how.

1. Developers without Data Science Experience

With GCP, developers can connect their software engineering experience with AI capabilities to produce powerful results. Using product APIs, developers can incorporate ML into the product without ever having to build a model.

Let’s take training videos, for example. Your company has thousands of training videos varying in length and subject matter. They include everything from full-day trainings on BigQuery to minutes-long security trainings. How do you operationalize all that information so employees can quickly find exactly what they want?

Using Google’s Cloud Video Intelligence API, the developer can transcribe not only every single video, word-for-word, but also document the start and end time of every word, in every video. The developer builds a search index on top of the API, and just like that, users can search specific content in thousands of videos. Results display both the relevant videos and timestamps within the videos, where the keyword is found. Now employees can immediately find the topic they want to learn more about, without needing to sift through what could be hours of unrelated information.
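A rough sketch of that flow with the google-cloud-videointelligence Python client is below. The Cloud Storage path and timeout are placeholders, and the search index the developer builds on top of these word timings is out of scope here.

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

config = videointelligence.SpeechTranscriptionConfig(language_code="en-US")
context = videointelligence.VideoContext(speech_transcription_config=config)

# Kick off an asynchronous transcription job for a video stored in Cloud Storage.
operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.SPEECH_TRANSCRIPTION],
        "input_uri": "gs://example-bucket/trainings/bigquery-intro.mp4",  # placeholder
        "video_context": context,
    }
)
result = operation.result(timeout=600)

# Each transcribed word comes back with start and end offsets, which is what a
# search index would store alongside the video ID.
for transcription in result.annotation_results[0].speech_transcriptions:
    for word in transcription.alternatives[0].words:
        print(f"{word.start_time.total_seconds():.1f}s  {word.word}")
```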

Additional APIs include Cloud Natural Language, Speech-to-Text, Text-to-Speech, Cloud Data Loss Prevention, and many others in ML.

2. Everyone without Data Science Experience, who isn’t a Developer

Cloud AutoML enables your less technical employees to harness the power of machine learning. It bridges the gap between the API and building your own ML model. Using AutoML, anyone can create custom models tailored to your business needs, and then integrate those models into applications and websites.

For this example, let’s say you’re a global organization who needs to translate communications across dialects and business domains. The intricacies and complexities of natural language require expensive linguists and specialist translators with domain-specific expertise. How do you communicate in real time effectively, respectfully, and cost-efficiently?

With AutoML Translation, almost anyone can create translation models that return query results specific to your domain, in 50 different language pairs. It graphically ingests your data from any type of Sheet or CSV file. The input data necessary is pairs of sentences that mean the same thing in both the language you want to translate from, and the one you want to translate to. Google goes the extra mile between generic translation and specific, niche vocabularies with an added layer of specificity to help the model get the right translation for domain-specific material. Within an hour, the model translates based on your domain, taxonomy, and the data you provided.

Cloud AutoML is available for sight, structured data, and additional language capabilities.

3. Data Scientists

Data scientists have the experience and data knowledge to take full advantage of GCP’s AI tools for ML. One of the issues data scientists often confront is notebook functionality and accessibility. Whether it’s TensorFlow, PyTorch, or JupyterLab, these open-source ML platforms require too many resources to run on a local computer, and they don’t easily connect to BigQuery.

Google AI Platform Notebooks is a managed service that provides a pre-configured environment to support these popular data science libraries. From a security standpoint, AI Platform Notebooks is attractive to enterprises for the added security of the cloud. Relying on a local device, you run the risk of human error, theft, and catastrophic data loss. Equipped with a hosted, integrated, secure, and protected JupyterLab environment, data scientists can do the following:

  • Virtualize in the cloud
  • Connect to GCP tools and services, including BigQuery
  • Develop new models
  • Access existing models
  • Customize instances
  • Use Git / GitHub
  • Add CPUs, RAM, and GPUs to scale
  • Deploy models into production
  • Backup machines

With a seamless experience from data to a deployed ML model, data scientists are empowered to work faster, smarter, and safer. Contact Us to further your organization’s ability to maximize data, AI, and ML.


-Sam Tawfik, Sr Product Marketing Manager

Maximizing Cloud Data with Google Cloud Platform Services

If you’re trying to run your business smarter, not harder, utilizing data to gain insights into decision making gives you a competitive advantage. Cloud data offerings empower utilization of data in the cloud, and the Google Cloud Platform (GCP) is full of options. Whether you’re migrating data, upgrading to enterprise-class databases, or transforming customer experience on cloud-native databases – Google Cloud services can fit your needs.

Highlighting some of what Google has to offer

With so many data offerings from GCP, it’s nearly impossible to summarize them all. Some are open source projects being distributed by other vendors, while others were organically created by Google to service their own needs before being externalized to customers. A few of the most popular and widely used include the following.

  • BigQuery: Core to GCP, this serverless, scalable, multi-cloud, data warehouse enables business agility – including data manipulation and data transformation, and it is the engine for AI, machine learning (ML), and forecasting.
  • Cloud SQL: Traditional relational database in the cloud that reduces maintenance costs with fully managed services for MySQL, PostgreSQL, and SQL Server.
  • Spanner: Another fully managed relational database offering unlimited scale, strong consistency, and almost 100% availability – ideal for supply chain and inventory management across regions.
  • Bigtable: Low latency, NoSQL, fully managed database for ML and forecasting, using very large amounts of data in analytical and operational workloads.
  • Data Fusion: Fully managed, cloud-native data integration tool that enables you to move different data sources to different targets – includes over 150 preconfigured connectors and transformers.
  • Firestore: From the Firebase world comes the next generation of Datastore. This cloud-native, NoSQL, document database lets you develop custom apps that directly connect to the database in real-time.
  • Cloud Storage: Object-based storage that can be considered a database because of all the things you can do with it through BigQuery – including using standard SQL to query objects in storage.

Why BigQuery?

After more than 10 years of development, BigQuery has become a foundational data management tool for thousands of businesses. With a large ecosystem of integration partners and a powerful engine that shards queries across petabytes of data and delivers a response in seconds, there are many reasons BigQuery has stood the test of time. It’s more than just super speed, data availability, and insights.

Standard SQL language

If you know SQL, you know BigQuery. As a fully managed platform, it’s easy to learn and use. Simply populate the data and that’s it! You can also bring in large public datasets to experiment and further learn within the platform.

Front-end data

If you don’t have Looker, Tableau, or another type of business intelligence (BI) tool to visualize dashboards off of BigQuery, you can use the software development kit (SDK) for web-based front-end data display. For example, government health agencies can show the public real-time COVID-19 case numbers as they’re being reported. The ecosystem of BigQuery is so broad that it’s a source of truth for your reports, dashboards, and external data representations.

Analogous across offerings

Coming from on-prem, you may be pulling data into multiple platforms – BigQuery being one of them. GCP offerings have a similar interface and easy navigation, so functionality, user experience, and even endpoint verbs are the same. Easily manage different types of data based on the platforms and tools that deliver the most value.

BigQuery Omni

One of the latest GCP services, BigQuery Omni brings the familiar BigQuery API and console experience to data stored on other platforms. That compatibility enables you to query data living in other places using standard SQL. With BigQuery Omni, you can connect and combine data from outside GCP without having to learn a new language.

Ready for the next step in your cloud journey?

As a Google Cloud Partner, 2nd Watch is here to be your trusted cloud advisor throughout your cloud data journey, empowering you to fuel business growth while reducing cloud complexity. Whether you’re embracing cloud data for the first time or finding new opportunities and solutions with AI, ML, and data science, our team of data scientists can help. Contact us for a targeted consultation and explore our full suite of advanced capabilities.

Learn more

Webinar: 6 Essential Tactics for your Data & Analytics Strategy

Webinar:  Building an ML foundation for Google BigQuery ML & Looker

-Sam Tawfik, Sr Product Marketing Manager

Cloud Crunch Podcast: 5 Strategies to Maximize Your Cloud’s Value – Create Competitive Advantage from your Data

AWS Data Expert, Saunak Chandra, joins today’s episode to break down the first of five strategies used to maximize your cloud’s value – creating competitive advantage from your data. We look at tactics including Amazon Redshift, RA3 node type, best practices for performance, data warehouses, and varying data structures. Listen now on Spotify, iTunes, iHeart Radio, Stitcher, or wherever you get your podcasts.

We’d love to hear from you! Email us at CloudCrunch@2ndwatch.com with comments, questions and ideas.

Google Cloud, Open-Source and Enterprise Solutions

In 2020, a year where enterprises had to rethink their business models to stay alive, Google Cloud was able to grow 47% and capture market share. If you are not already looking at Google Cloud as part of your cloud strategy, you probably should.

Google has made conscious choices about not locking in customers with proprietary technology. Open-source technology has, for many years, been a core focus for Google, and many of Google Cloud’s solutions can integrate easily with other cloud providers.

Kubernetes (GKE), Knative (Cloud Functions), TensorFlow (Machine Learning), and Apache Beam (Data Pipelines) are some examples of cloud-agnostic tools that Google has open-sourced and which can be deployed to other clouds as well as on-premises, if you ever have a reason to do so.

Specifically, some of Google Cloud’s services and its go-to-market strategy set Google Cloud apart. Modern and scalable solutions like BigQuery, Looker, and Anthos fall into this category. They are best-in-class tools for their respective use cases, and if you are serious about your digital transformation efforts, you should evaluate their capabilities and understand what they can do for your business.

Three critical challenges we repeatedly see from our enterprise clients here at 2nd Watch include:

  1. How to get started with public cloud
  2. How to better leverage their data
  3. How to take advantage of multiple clouds

Let’s dive into each of these.

Foundation

Ask any architect if they would build a house without a foundation, and they would undisputedly tell you “No.” Unfortunately, many companies new to the cloud do precisely that. The most crucial step in preparing an enterprise to adopt a new cloud platform is to set up the foundation.

Future standards are dictated in the foundation, so building it incorrectly will cause unnecessary pain and suffering for your valuable engineering resources. The proper foundation, which includes a project structure aligned with your project lifecycle and environments and a CI/CD pipeline to push infrastructure changes through code, will enable your teams to become more agile while managing infrastructure in a modern way.

A foundation’s essential blocks include project structure, network segmentation, security, IAM, and logging. Google has a multi-cloud tool called Cloud Operations for logs management, reporting, and alerting, or you can ingest logs into existing tools or set up the brand of firewalls you’re most familiar and comfortable with from the Google Cloud Marketplace. Depending on your existing tools and industry regulations, compliance best practices might vary slightly, guiding you in one direction or another.

DataOps

Google has, since its inception, been an analytics powerhouse. The amount of data moving through Google’s global fiber network at any given time is incredible. Why does this matter to you? Google has now made some of its internal tools that manage large amounts of data available to you, enabling you to better leverage your data. BigQuery is one of these tools.

Being serverless, you can get started with BigQuery on a budget, and it can scale to petabytes of data without breaking a sweat. If you have managed data warehouses, you know that scaling them and keeping them performant is a task that is not easy. With BigQuery, it is.

Another valuable tool, Looker, makes visualizing your data easy. It enables departments to share a single source of truth, which breaks down data silos and enables collaboration between departments with dashboards and views for data science and business analysis.

Hybrid Cloud Solutions

Google Cloud offers several services for multi-cloud capabilities, but let’s focus on Anthos here. Anthos provides a way to run Kubernetes clusters on Google Cloud, AWS, Azure, on-premises, or even on the edge while maintaining a single pane of glass for deploying and managing your containerized applications.

With Anthos, you can deploy applications virtually anywhere and serve your users from the cloud datacenter nearest them, across all providers, or run apps at the edge – like at local franchise restaurants or oil drilling rigs – all with the familiar interfaces and APIs your development and operations teams know and love from Kubernetes.

Currently in preview, BigQuery Omni will soon be released to the public. BigQuery Omni lets you extend the capabilities of BigQuery to the other major cloud providers. Behind the scenes, BigQuery Omni runs on top of Anthos, and Google takes care of scaling and running the clusters, so you only have to worry about writing queries and analyzing data, regardless of where your data lives. For enterprises that have already adopted BigQuery, this can mean significant savings in data transfer charges between clouds, as your queries run where your data lives.

Google Cloud offers some unmatched open-source technology and enterprise solutions you can leverage to gain competitive advantages. 2nd Watch has helped organizations overcome business challenges and meet objectives with similar technology, implementations, and strategies on all major cloud providers, and we would be happy to assist you in getting to the next level on Google Cloud.

2nd Watch is here to serve as your trusted cloud data and analytics advisor. When you’re ready to take the next step with your data, contact us.

Learn more

Webinar: 6 Essential Tactics for your Data & Analytics Strategy

Webinar:  Building an ML foundation for Google BigQuery ML & Looker

-Aleksander Hansson, 2nd Watch Google Cloud Specialist

Ready to Migrate your Data to the Cloud? Answer these 4 Questions to find Out

Many companies are already storing their data in the cloud, and even more are considering making the migration. The cloud offers unique benefits for data access and consolidation, but some businesses choose to keep their data on-prem for various reasons. Data migration isn’t a one-size-fits-all formula, so when developing your data strategy, think about your long-term needs and goals for optimal results.

We recommend evaluating these 4 questions before making the decision to migrate your data to the cloud:

1. Why do you Want to Migrate your Data to the Cloud?

Typically, there are two reasons businesses find themselves in a position of wanting to change their IT infrastructure. Either your legacy platform is reaching end of life (EOL) and you’re forced to make a change, or it’s time to modernize. If you’re faced with the latter – your business data has expanded beyond what the EOL platform can handle – it’s a good indication migrating to the cloud is right for you. The benefits of cloud-based storage can drastically improve your business agility.

2. What is Important to You?

You need to know why you’re choosing the platform you are deploying and how it’s going to support your business goals better than other options. Three central arguments for cloud storage – that are industry and business agnostic – include:

  • Agility: If you need to move quickly (and what business doesn’t?), the cloud is for you. It’s easy to start, and you can spin up a cloud environment and have a solution deployed within minutes or hours. There’s no capital expense, no server deployment, and no need for an IT implementation team.
  • Pay as you go: If you like starting small, testing things before you go all in, and only paying for what you use, the cloud is for you. It’s a very attractive feature for businesses hesitant to move all their data at once. You get the freedom and flexibility to try it out, with minimal financial risk. If it’s not a good fit for your business, you’ve learned some things, and can use the experience going forward. But chances are, the benefits you’ll find once utilizing cloud features will more than prove their value.
  • Innovation: If you want to ride the technology wave, the cloud is for you. Companies release new software and features to improve the cloud every day, and there’s no long release cycles. Modernized technologies and applications are available as soon as they’re released to advance your business capabilities based on your data.

3. What is your Baseline?

The more you can plan for potential challenges in advance, the better. As you consider data migration to the cloud, think about what your data looks like today. If you have an on-prem solution, like a data warehouse, lift and shift is an attractive migration plan because it’s fairly easy.

Many businesses have a collection of application databases and haven’t yet consolidated their data. They need to pull the data out, stage it, and store it without interfering with the applications. The main cloud providers offer different but similar options to get your data into a place where it can be used: AWS offers S3, Google Cloud has Cloud Storage, and Azure provides Blob Storage. Later, you can pull the data into a data warehousing solution like Amazon Redshift, Google BigQuery, Azure Synapse, or Snowflake.
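As a simple, hypothetical illustration of that staging step, the sketch below extracts rows from an application database (an in-memory SQLite database stands in for the real source), writes them to a CSV file, and lands the file in S3 with boto3. The bucket and key names are placeholders; a warehouse can pick the file up from object storage later.

```python
import csv
import sqlite3
import boto3

# Extract from an application database without interfering with the application
# (an in-memory SQLite database stands in for the real source here).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, total REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, 42.00)])

with open("orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "total"])
    writer.writerows(source.execute("SELECT id, total FROM orders"))

# Stage the extract in object storage; the warehouse loads it from there later.
boto3.client("s3").upload_file("orders.csv", "example-staging-bucket", "raw/orders.csv")
```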

4. How do you Plan to use your Data?

Always start with a business case and think strategically about how you’ll use your data. The technology should fit the business, not the other way around. Once you’ve determined that, garner the support and buy-in of sponsors and stakeholders to champion the proof of concept. Bring IT and business objectives together by defining the requirements and the success criteria. How do you know when the project is successful? How will the data prove its value in the cloud?

As you move forward with implementation, start small, establish a reasonable timeline, and take a conservative approach. Success is crucial for ongoing replication and investment. Once everyone agrees the project has met the success criteria, celebrate loudly! Demonstrate the new capabilities, and highlight overall business benefits and impact, to build and continue momentum.

Be Aware of your Limitations

When entering anything unknown, remember that you don’t know what you don’t know. You may have heard things about the cloud or on-prem environments anecdotally, but making the decision of when and how to migrate data is too important to do without a trusted partner. You risk missing out on big opportunities, or worse, wasting time, money, and resources without gaining any value.

2nd Watch is here to serve as your trusted cloud advisor, so when you’re ready to take the next step with your data, contact us.

Learn more about 2nd Watch Data and Analytics services

-Sam Tawfik, Sr Product Marketing Manager, Data & Analytics

Migrating Data to Snowflake – An Overview

When considering migrating your data to the cloud, everyone’s familiar with the three major cloud providers – AWS, Google Cloud, and Microsoft Azure. But there are a few other players you should also take note of. Snowflake is a leading cloud data platform that offers exceptional design, scalability, simplicity, and return on investment (ROI).

What is Snowflake?

The Snowflake cloud data platform was born in the cloud for data warehousing. It’s built entirely to maximize cloud usage and designed for almost unlimited scalability. Users like the simplicity, and businesses gain significant ROI from the wide range of use cases Snowflake supports.

Out of the box, Snowflake is easy to interact with through its web interface. Without having to download any applications, users can connect with Snowflake and create additional user accounts for a fast and streamlined process. Additionally, Snowflake performs as a data platform, rather than just a data warehouse. Data ingestion is cloud native and existing tools enable effortless data migration.

Business Drivers

The decision to migrate data to a new cloud environment, or data warehousing solution, needs to be based on clearly defined value. Why are you making the transition? What’s your motivation? Maybe you need to scale up, or there’s some sort of division or business requirement driving the migration. Oftentimes, companies have a particular implementation that needs to change, or they have specific needs that aren’t being met by their current data environment.

Take one of our clients, for instance. When the client’s company was acquired, they came to utilize a data warehouse shared by all the companies the acquiring company owned. When the client was eventually sold, they needed their own implementation and strategy for migrating data into the cloud. Together, we took the opportunity to evaluate some of the newer data platform tools, like Snowflake, for their specific business case and to migrate quickly to an independent data platform.

With Snowflake, set up was minimal and supported our client’s need for a large number of database users. Migrating from the shared data warehouse to Snowflake was relatively easy, and it gave all users access through a simple web interface. Snowflake also provided more support for unstructured data usage, which simplified querying things like JSON or nested data.

Implementation

Migrating data to Snowflake is generally a smooth transition because Snowflake accepts data from your existing platform. For instance, if data is stored in Amazon S3, Google Cloud, or Azure, you can create Snowflake environments in each, then ingest the data using SQL commands and configuration. Not only can you run all the same queries with minor tweaks and get the same output, but Snowflake also fits additional needs and requirements. If you’ve worked in SQL in any manner – on an application database or in data warehousing – training is minimal.

Another advantage with Snowflake is its ability to scale either horizontally or vertically to pull in any amount of data. And since it is cloud native, Snowflake has embraced the movement toward ‘pay as you go’ – in fact, that’s their entire structure. You only pay for the ingestion time and when the data warehouse is running. After that, it shuts off, and so does your payment. Cost-effective implementation lets you experiment, compare, test, and iterate on the best way to migrate each piece of your data lifecycle.
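A minimal sketch of that kind of ingestion with the snowflake-connector-python library is below. The account, warehouse, stage, and bucket names are placeholders, and a production load would typically use a storage integration rather than an unauthenticated stage, but the shape of the SQL is the same.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",        # placeholder account identifier
    user="LOADER",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# Point an external stage at data already sitting in S3 (credentials omitted).
cur.execute("""
    CREATE STAGE IF NOT EXISTS orders_stage
    URL = 's3://example-staging-bucket/raw/'
""")

# Bulk-load the staged files into a Snowflake table with a COPY command.
cur.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, total NUMBER(10,2))")
cur.execute("""
    COPY INTO orders
    FROM @orders_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
""")

# The warehouse only bills while it runs, so suspend it once the load is done.
cur.execute("ALTER WAREHOUSE LOAD_WH SUSPEND")
conn.close()
```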

Long Term Results

Snowflake has yielded successful data migrations because of its ease of use and lack of complications. Users also see performance improvements because they’re able to get their data faster than ever, and they can grow with Snowflake – bringing in new and additional data sources and tools, taking advantage of artificial intelligence and machine learning, increasing automation, and experimenting and iterating.

From a security and governance perspective, Snowflake is strong. Snowflake enforces a multi-layer security structure, including user management. You can grant access to certain groups, organize them accordingly, integrate with your active directory, and have it run with those permissions. You assign an administrator to regulate specific access to tables in specified areas. Snowflake also lets you choose your desired security level during implementation, with options including enterprise level, HIPAA compliance, and a maximum security level at a higher per-second rate.

Do you want to explore data migration opportunities? Make the most of your data by partnering with trusted experts. We’re here to help you migrate, store, and utilize data to grow your business and streamline operations. If you’re ready to take the next step in your data journey, contact us.

Learn more about 2nd Watch Data and Analytics services

-Sam Tawfik, Sr Product Marketing Manager, Data & Analytics