by Ross Romano
On 18 December 2020

Financial Data’s Journey to the Cloud: An Introductory Overview

The financial services industry is undergoing a sustained push towards cloud architectures for compute and storage. Firms are at drastically different stages of their migration journey, but almost uniformly the industry remains a long way from being cloud-native.

In cloud compute, basic techniques have given way to repeatable patterns. Early implementations essentially resembled remote data centers: individual compute instances were allocated based on projected need, much as a new server might be racked on-prem. As developers and architects came to better understand the implications and possibilities of the new technology, more sophisticated patterns emerged to maximize resource usage and developer efficiency, including container orchestration, serverless architectures, and related techniques.
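
To give a flavor of how far these patterns sit from allocating a server, the sketch below is a minimal serverless handler in the style of AWS Lambda. The event shape, names, and trigger are illustrative assumptions rather than anything prescribed here; the point is that the unit of deployment shrinks to a single function, with no server to provision or patch.

```python
# Minimal sketch of a serverless (AWS Lambda-style) handler.
# The event payload shape and key names are illustrative assumptions.
import json

def handler(event, context):
    """Entry point invoked by the platform; no server to provision or patch."""
    # A typical trigger: records describing new data landing in object storage.
    records = event.get("Records", [])
    processed = [r.get("s3", {}).get("object", {}).get("key") for r in records]
    # The return value is surfaced to the caller and to platform logging.
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": processed}),
    }
```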

The movement towards fully realizing the potential of data in the cloud is not nearly as mature. Some firms – particularly those with experience running big data technologies like Hadoop and Cassandra – took early advantage of storage technologies like Amazon’s S3 for simple, scaled use cases, in a manner analogous to provisioning a new server in Amazon Elastic Compute Cloud (EC2). But consensus is still lacking on more sophisticated uses of the technology to solve the complete set of storage use cases of the traditional financial services enterprise.
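
For those simple, scaled use cases, object storage behaves almost like a bottomless file system. The sketch below, assuming the boto3 AWS SDK and hypothetical bucket and key names, shows how little ceremony is involved in writing and reading an object in S3.

```python
# Minimal sketch of S3 as simple, scaled object storage.
# Bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")

# Write a day's worth of tick data (here just a placeholder payload).
s3.put_object(
    Bucket="example-market-data",          # hypothetical bucket
    Key="ticks/2020-12-18/EURUSD.csv",     # hypothetical key layout by date/instrument
    Body=b"timestamp,bid,ask\n",
)

# Read it back; scaling, replication, and durability are the provider's problem.
response = s3.get_object(Bucket="example-market-data", Key="ticks/2020-12-18/EURUSD.csv")
payload = response["Body"].read()
```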

Cloud providers have introduced an unprecedented variety of modern database technologies, information security frameworks, and infrastructure choices. This explosion of implementation options requires careful analysis of data ingestion, production, storage requirements, and consumption patterns. It also demands architectural design that factors in fitness for use and information security considerations.

The multiple competing on-prem models which have matured over the last several decades are now reasonably well understood by large enterprises. Typical legacy technology considerations included questions like:

  • How do I choose between a few monopolistic relational database providers?
  • How do I scale and allocate storage infrastructure in the data center?
  • How do I lessen input/output (I/O) overheads in my systems given hardware limitations?
  • How do I use NoSQL-style data lakes and unstructured data stores to centralize my data?

The cloud presents software developers with a significantly more complex set of choices and trade-offs, in an operating environment with which they have far less experience. Faced with these unique complexities, it’s unsurprising that financial firms and regulators are struggling with various aspects of data migration to the cloud.

Data governance, security, and compliance

Cloud infrastructure and the rise of DevOps have driven a convergence of technical responsibilities and skillsets that used to be separate and specialized. For example, a software developer working on-prem in a large organization could safely make assumptions about their production environments, the path to provisioning those environments, and the security of the data center. In DevOps, the boundaries between developer and system administrator are deliberately knocked down. In the cloud, developers routinely provision their own compute and data infrastructure using Infrastructure as a Service (IaaS) and deploy in real time through automation.
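
As a concrete illustration, the sketch below shows a developer provisioning a storage bucket entirely from code using the boto3 AWS SDK, applying encryption and public-access blocking in the same script; the bucket name and region are illustrative assumptions. Guardrails that an on-prem data center once provided implicitly now have to be applied, explicitly and correctly, by the developer.

```python
# Sketch: a developer provisioning their own data infrastructure from code.
# Bucket name and region are illustrative assumptions.
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

bucket = "example-trade-archive"
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Guardrails the on-prem data center used to provide implicitly:
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
    },
)
```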

There are therefore far more opportunities in the cloud for developers to step into traps they may not even know exist. To have confidence in full-blown adoption, enterprises must redesign their existing compliance, cybersecurity, and infrastructure capabilities, while simultaneously getting up to speed on both the potential and the risks of the same technologies.

Financial services firms are heavily regulated and operate in highly sensitive commercial markets; both factors make data confidentiality paramount. Enterprises have to consider the disposition of each piece of data – whether it is sensitive by regulation, commercially confidential, someone else’s IP, etc. – and then decide on the right trade-offs against the technology’s benefits, as well as the lengths to which they go to mitigate risks.
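
One lightweight way to keep the disposition of each piece of data visible to downstream controls is to record a classification at write time. The sketch below tags an S3 object with hypothetical sensitivity labels that access policies and audits could key off; the bucket, key, and tag values are assumptions for illustration only.

```python
# Sketch: recording a data-classification decision as object tags.
# The bucket, key, and tag values are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_object_tagging(
    Bucket="example-trade-archive",
    Key="counterparty/positions/2020-12-18.parquet",
    Tagging={
        "TagSet": [
            {"Key": "classification", "Value": "commercially-confidential"},
            {"Key": "regulation", "Value": "mifid2"},
        ]
    },
)
```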

Modern database technologies

At the outset of a greenfield project in the pre-cloud era, a typical enterprise’s database decision matrix was straightforward but tedious. The enterprise would usually have existing support agreements with one or possibly two large Relational Database Management System (RDBMS) providers, and an established database Center of Excellence with a full complement of database administrators (DBAs) and other database professionals.

A ticket would be filled out with estimated storage requirements. Some organizations would require DBAs to be involved in schema design. Projects with more complex compute and storage needs might take a slightly different route, in idiosyncratic cases requiring a hardware-software appliance or specialized hardware like a Storage Area Network (SAN), but these were a minority of applications outside of pricing, risk, and other intensive analytics. Generally, the process was well understood and the number of alternatives constrained.

In organizations progressing in their cloud journey, the architecture team may have provided clear guidance on the permitted or recommended technologies. But containerization and Continuous Integration/Continuous Delivery (CI/CD) techniques have made it easy to package together amalgams of many different open-source components, frameworks, and custom code and push them into the cloud. This makes many previously niche alternative database technologies accessible: graph databases, document stores, key-value stores, column-oriented databases, traditional relational databases (including in-memory variants), and hybrids across the spectrum.

In addition, the major cloud providers each have their own packaging of similar technologies available as managed services. In a multi-cloud environment in particular, the combinatorial space is huge. Firms need to build the right expertise to provide guidance to their developers, avoiding both an explosion of complexity and technology being chosen as an enthusiastic developer’s science experiment.
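
As a small example of that managed-service packaging, the sketch below writes and reads a record in DynamoDB, Amazon’s managed key-value/document store. The table name and item attributes are illustrative assumptions, and the table is presumed to already exist.

```python
# Sketch: using a managed key-value/document store instead of running a database.
# Table name and attributes are illustrative assumptions; the table is presumed to exist.
import boto3

dynamodb = boto3.resource("dynamodb")
trades = dynamodb.Table("example-trades")

# Write a trade keyed by identifier; no servers, storage, or replication to manage.
trades.put_item(Item={"trade_id": "T-1001", "instrument": "EURUSD", "quantity": 5_000_000})

# Point read by primary key.
item = trades.get_item(Key={"trade_id": "T-1001"}).get("Item")
```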

Cost considerations

Having evolved over decades, data centers in most enterprises have been heavily optimized. Mature organizational processes exist to inform planning and provisioning; in some cases, firms have successfully outsourced the complexity to third parties. But this typically comes with trade-offs in time to market and duplication. Turnaround times for provisioning new infrastructure can be very long, for reasons of both bureaucracy and logistics. Good business continuity planning and disaster recovery can require massive over-provisioning, both in absolute terms and within individual geographies.

The promise of the cloud is to outsource much of this complexity for massive cost savings, but realizing that promise is not necessarily straightforward. Cloud adopters often discover early on how easy it is to rack up a massive bill. This has driven revolutionary new cost-optimization approaches on the compute side, but the problem is still not particularly well addressed for data.
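
One of the few data-side cost levers that is already well supported is tiering colder data into cheaper storage classes. The sketch below applies a lifecycle policy to an S3 bucket via boto3; the bucket name, prefix, and retention periods are illustrative assumptions, not recommendations.

```python
# Sketch: controlling storage cost by tiering and expiring data with a lifecycle policy.
# Bucket name, prefix, and retention periods are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-market-data",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-ticks",
                "Status": "Enabled",
                "Filter": {"Prefix": "ticks/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after a month
                    {"Days": 365, "StorageClass": "GLACIER"},     # archive after a year
                ],
                "Expiration": {"Days": 2555},                     # delete after roughly seven years
            }
        ]
    },
)
```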

Planning for scaled data usage in the cloud is much more complex than in the old model. It is inexorably intertwined for many use cases with planning the correct compute architecture and choosing the right database solution for the actual problem being solved. Organizations are continuing to learn this the hard way and laboring to find new paths forward.

Conclusion

A huge amount of experimentation and activity is taking place in financial services around cloud adoption, both in compute and in data. Revolutionary new approaches are being defined by the minute, addressing many of the problems of the past, the complexities of the moment, and the opportunities of the future. However, much work remains to be done, and consensus is still to emerge on many topics. Lab49 will be publishing a series of articles in Q1 2021 going into depth on each of these topics. If you would like to receive the follow-ups, please register for our newsletter here. Alternatively, you can follow us on LinkedIn and continue to check in with us here on our website.