Posted by Brian McCallion ● May 24, 2014 3:11:57 PM

Enterprise Public Cloud Data Warehouse Case Study

While the line of business owns significant applications in the data center, the data behind them wasn't easy for the various stakeholders to access. Further, while the service levels for the Production data sets were understood, the organization didn't invest in the "lower environments," such as Development, the way it invested in Production. Even in Production, the kinds of queries that could be run and the amount of data that could be stored were restricted by existing capacity. Requesting and provisioning new hardware to meet the demand for ad hoc queries, or to test ideas, took months, by which time the inspiration and the opportunity had surely been lost. And this happened continuously. Rather than allow this external force to shape the culture and destiny of the business, the executives decided to build out the infrastructure for a Data Mart and Data Warehouse in the Public Cloud. Such a journey isn't easy to make the first time, yet the choice between the two roads was clear.

 

Business Objectives

  1. To keep its highly educated and specialized "Quants" engaged, the business needed to give them the resources to run their simulations and tests and to host the data sets those tests depend on.
  2. Moore's Law meant little in the data center when the competing projects of the many business units sharing it caused long delays and consumed the business's time and passion in red tape and organizational latency.

 

Benefits

High-performance access to data lets "Data Scientists" (aka "Quants") build and test models in Development on high-performance infrastructure while reducing the risk of impacting Production.

Eliminates the noise and collateral risk of internal data center operations disrupting critical work in the Development environment.

Unexpected Benefits

The teams expected the solution to provide the necessary resources, but beyond simple "one-for-one" resource provisioning they also found an entirely different dimension to working with Cloud Infrastructure. In the Cloud the team could not only run the applications and databases but also manage the data and infrastructure in ways that made them more self-reliant and productive.

One often-overlooked aspect of Cloud Infrastructure is that it is a "Shared Service," on which complete strangers build and run applications on common underlying infrastructure. For Cloud IaaS to work as a business, many different types of customers, with varying degrees of technical experience and knowledge, need to be able to provision and manage their own infrastructure. Cloud Service Providers have therefore solved a problem that remains unsolved by legacy vendors and by most Corporate IT organizations: access and resource controls, and the isolation of resources from each other, that empower teams to work more effectively.

  1. Cloud Infrastructure is easier to configure, and because it is used by so many customers, the tools and APIs for configuring it have become reliable.
  2. Cloud Infrastructure isolates users from other users.
  3. Cloud Infrastructure isolates users from the underlying data center operations (capacity planning, network changes, adding capacity, refreshing capacity).

Over time, the application team learned to manage data with tools such as snapshots and on-demand provisioning. Although it worked within an isolated network, the team had powerful tools for managing its data infrastructure.

  • Design and lead the successful migration of the Data Mart and Data Warehouse to Amazon Web Services.
  • Work with stakeholders to assess, define, and understand the organization's policies governing the data.
 

Technical Solution Components

  • MicroStrategy and Informatica data mart and data management platform, moved to an AWS Virtual Private Cloud (VPC).
  • PII data protected via Transparent Data Encryption (TDE) in Oracle and SQL Server Enterprise Edition.
  • Provisioned IOPS storage and I/O-optimized instances.
  • A 3 TB SQL Server-based data mart and a 6 TB Oracle data warehouse running on EC2.
  • Instances managed using AWS Data Pipeline.

Topics: Amazon Web Services, Amazon Redshift, Big Data, blog, business, Cloud Adoption, Cloud Capability, Cloud Computing, Cloud Strategy, data center, data warehouse, Financial Services, Public Cloud, Solutions, SQL Server, Uncategorized