In the latest episode of our Energy Transition Talks series, CGI Vice-President, Consulting – Data and Analytics Doug Leal discusses with Peter Warren the evolving landscape of data use in the energy and utilities sector, particularly in light of new AI applications. In the first installment of this two-part conversation, they explore the challenges of scaling AI models, the move away from experimentation toward practical solutions, and two key approaches to data management: the Data Lakehouse and the Data Mesh, both of which are shaping the success of modern data strategies.
Utility organizations are facing increasing pressure to leverage data effectively for decision-making. This involves the integration of various data sources, such as Advanced Metering Infrastructure (AMI) and outage management systems, to enhance operational insights. While some organizations are already progressing in this area, Doug says, many are still in the early stages of their data journey.
Doug and Peter discuss two distinct approaches to AI: one that treats it as a novel tool to explore, and another that focuses on practical problem-solving. The latter, Doug says, is essential for developing a strategic approach to AI implementation, ensuring that solutions are not only effective for immediate challenges but also adaptable for future developments.
“We need to be able to build a model or any type of AI solution in a way that will enable the organization to scale—not only scale that model for production, but also for everything that comes after that model, the innovation that comes after that model.”
The challenge of transitioning from proof of concept (POC) to production
Typically, a business unit recognizes the potential of a technology or model and decides to invest further. However, without a well-defined operational process to move from proof of concept (POC) or proof of value to full production, organizations face significant challenges and bottlenecks.
As Doug shares, only 53% of models successfully progress from POC to production, making it an expensive endeavor when roughly half fail to deliver results.
Shifting focus to Minimum Viable Products (MVPs) and practicality
Peter agrees, citing a current client’s approach that skips the POC phase entirely and jumps straight to developing minimum viable products (MVPs). He explains that their strategy involves creating solutions that are aligned with their organizational goals and can be effectively scaled. This ensures that the IT team can support the growth of these products and that the business can derive tangible value from them.
Doug has also noticed a shift in mindset among clients. As he sees it, there’s a growing emphasis on how to effectively transition ideas into production rather than just experimenting, reflecting an increased understanding of the importance of assessing the real value and return on investment for these initiatives. Given the substantial costs associated with infrastructure, data scientists and machine learning engineers required for model development, organizations are increasingly cautious about treating these efforts as mere experiments.
Understanding the Data Lakehouse as a unified, scalable platform
Looking at new tools that organizations are using to accelerate outcomes, Peter and Doug explore the concept of the Data Lakehouse. The Data Lakehouse is an innovative architectural model that merges the characteristics of both Data Lakes and Data Warehouses, serving as a unified platform that consolidates data from various sources in a utility organization:
- Data Lake: The Data Lake's cloud-based infrastructure allows for easy scalability, eliminating the need for physical storage upgrades. This component offers the flexibility to store a wide variety of data types, including:
  - Structured Data: Traditional tabular formats, such as those found in relational databases.
  - Semi-Structured Data: Formats like JSON documents or PDFs.
  - Unstructured Data: Media files such as images, audio and video.
- Data Warehouse: In contrast, this element provides reliable performance for business intelligence and reporting tasks. It ensures that organizations can analyze data efficiently.
The Data Lakehouse combines the flexibility of the Data Lake with the predictable performance of a Data Warehouse, supporting diverse business functions and catering to various skill levels within the organization. This centralized approach ensures that all teams within the organization can access and apply data effectively, promoting collaboration and informed decision-making.
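To make the layered flow concrete, here is a minimal sketch, in Python, of how meter readings might move through the raw, curated and aggregated layers of a Lakehouse (the same layers Doug describes in the transcript below). The file layout, column names and quality rules are illustrative assumptions only; a production Lakehouse would typically sit on cloud object storage behind an engine such as Apache Spark with an open table format.

```python
# Illustrative "lakehouse" layering with pandas + parquet files.
# Assumes pandas and pyarrow are installed; paths and schema are hypothetical.
from pathlib import Path

import pandas as pd

BASE = Path("lakehouse")  # in practice: cloud object storage (S3, ADLS, ...)

def ingest_raw(readings: pd.DataFrame) -> Path:
    """Land source data as-is in the raw layer (full fidelity, no cleanup)."""
    path = BASE / "raw" / "ami_readings.parquet"
    path.parent.mkdir(parents=True, exist_ok=True)
    readings.to_parquet(path, index=False)
    return path

def curate(raw_path: Path) -> Path:
    """Apply data-quality rules: drop duplicates and impossible values."""
    df = pd.read_parquet(raw_path)
    df = df.drop_duplicates(subset=["meter_id", "read_at"])
    df = df[df["kwh"] >= 0]  # a hypothetical quality rule
    path = BASE / "curated" / "ami_readings.parquet"
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(path, index=False)
    return path

def aggregate(curated_path: Path) -> Path:
    """Build the business-ready layer: daily consumption per meter."""
    df = pd.read_parquet(curated_path)
    daily = (
        df.assign(day=pd.to_datetime(df["read_at"]).dt.date)
          .groupby(["meter_id", "day"], as_index=False)["kwh"].sum()
    )
    path = BASE / "aggregated" / "daily_consumption.parquet"
    path.parent.mkdir(parents=True, exist_ok=True)
    daily.to_parquet(path, index=False)
    return path

if __name__ == "__main__":
    sample = pd.DataFrame({
        "meter_id": ["M1", "M1", "M2"],
        "read_at": ["2024-05-01T00:00", "2024-05-01T01:00", "2024-05-01T00:00"],
        "kwh": [1.2, 0.9, 2.4],
    })
    aggregate(curate(ingest_raw(sample)))
```

Data scientists would work against the raw layer, while business users point tools like Power BI or Tableau at the aggregated layer, which is exactly the division of labor the Lakehouse is designed to support.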
Understanding the Data Mesh concept as an enabler of business agility
By contrast, Doug explains, the Data Mesh is a sociotechnical framework designed to enhance the sharing, access, management and analysis of data across organizations. Unlike traditional approaches that rely on centralized data platforms, such as enterprise Data Warehouses or Lakehouses, Data Mesh advocates for a decentralized model. This shift aims to empower business units to take ownership of their data, ensuring they are responsible for data quality and insights.
Key principles of Data Mesh
- Decentralization: Data Mesh encourages the decentralization of data platforms, moving away from a singular, large data repository. Instead of creating isolated data silos, it promotes the establishment of domain-specific data platforms.
- Ownership transfer: For instance, in a utility organization, the generation team would manage its own domain-based Data Lakehouse. This transition of ownership from IT to business units allows those closest to the data to leverage their expertise for better insights.
- Cultural shift: Implementing a Data Mesh requires a significant cultural change, as it involves granting business teams control over data management, which may not be suitable for every organization.
Despite the shift towards decentralization, Doug says, IT still plays a crucial role in the Data Mesh framework. IT is responsible for establishing enterprise data governance and providing infrastructure as a service to support business operations.
One of the primary advantages of adopting a Data Mesh approach is business agility. By enabling business units to access and analyze data independently, organizations can respond more swiftly to insights without relying on IT or other departments.
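As a rough illustration of these principles, the sketch below models domain-owned data products that register with a shared catalog through a central governance check, so ownership sits with the business while IT enforces enterprise-wide policy. All names, tags and rules here are hypothetical simplifications; a real Data Mesh would rely on a proper data catalog and policy engine rather than a few lines of Python.

```python
# Illustrative sketch of Data Mesh roles: domains own data products,
# while a central (IT-run) governance layer enforces enterprise-wide rules.
# All names and policies here are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A domain-owned data product, published for the rest of the organization."""
    name: str
    domain: str                 # owning business unit, e.g. "generation"
    owner: str                  # accountable team in the business, not IT
    output_port: str            # where consumers read it, e.g. a table or API
    tags: set[str] = field(default_factory=set)

def passes_enterprise_governance(product: DataProduct) -> bool:
    """Central policy IT enforces on every product; domains control the rest."""
    has_owner = bool(product.owner)
    is_classified = bool({"public", "internal", "restricted"} & product.tags)
    return has_owner and is_classified

catalog: list[DataProduct] = []

def publish(product: DataProduct) -> None:
    """Register a product in the shared catalog if it meets enterprise policy."""
    if not passes_enterprise_governance(product):
        raise ValueError(f"{product.name}: fails enterprise governance checks")
    catalog.append(product)

# Each domain publishes and maintains its own products:
publish(DataProduct(
    name="daily_generation_output",
    domain="generation",
    owner="generation-analytics-team",
    output_port="lakehouse://generation/aggregated/daily_output",
    tags={"internal"},
))
```

The point of the sketch is the split of responsibilities: the generation domain decides what its product contains and keeps it accurate, while the checks every product must pass are defined once, centrally.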
Stay tuned: Shifts in organizational structure and culture ensure data strategy success
For a successful implementation of Data Mesh, Doug emphasizes the need for an established, robust data governance process, ensuring that the decentralized model operates effectively and maintains data integrity.
In part two of the conversation, Peter and Doug explore organizational structure and data governance, which are crucial for successful implementation of these approaches.
Listen to part two of the conversation here.
Listen to other podcasts in this series to learn more about the energy transition
Read the transcript
1. Introduction to the importance of data in energy and utilities
Peter Warren:
Hello again everyone, and welcome to another session in our exploration of all things energy and utilities, as well as the intersections with other industries. Today, we're going to be diving into data. Over the last few podcasts, we've talked about how important data is, but we never answered the questions “How do you get to the data?” and “What are some techniques to move forward?” Today, I'm very privileged to have an expert in that area. Doug, do you want to introduce yourself?
Doug Leal:
Absolutely. I’m Doug Leal, a technical Vice-President at CGI with more than 23 years of experience in the data and analytics area. I'm one of the leads here at the CGI Data and Analytics practice, serving our clients in the southeastern United States. It's a very exciting time to be working with data.
Peter Warren:
Yeah, I think that's an interesting thing. We talked about this on previous podcasts, but I'll just summarize it again. When we did our Voice of Our Client interviews (for those who aren't familiar with it, that's global research we conduct with our existing clients and other folks, much like what an analytics firm would do), this year in energy and utilities there was a dip in executives’ satisfaction with data. Doug, my hypothesis on that was that they're rethinking their data because they've got new uses for AI. Do you concur with that? How do you see data being re-envisioned in the energy and utilities sector?
2. Utility organizations’ pressure to become data-driven
Doug Leal:
Yeah, so utility organizations are facing increasing pressure to become more data-driven. To make better decisions, they need to be able to collect, store, analyze and drive insights from large amounts of data from different sources. Think about AMI, outage systems and workforce management.
So, it is a journey. Most utility organizations are just starting. Some of them are well ahead of the curve, right? They are positioned to build a scalable platform to support not only traditional machine learning, but also more advanced AI use cases, such as generative AI, which is very popular nowadays.
3. Challenges of moving from Proof of Concept (POC) to production
Peter Warren:
We see that happening in Gen AI and things like that; it seems that a lot of people are dabbling in the problem or the possibility. I guess there's two approaches. One is: “I've got a great toy. What can I use it for?” And the other one is more practical: “I've got a problem. How do I solve it?” What's your viewpoint on how to approach these issues?
Doug Leal:
That's a great question, because as technologists, sometimes we focus only on the problem at hand. We do need to focus on the problem at hand, but not tackle it in a way that is not scalable. We need to be able to build a model or any type of AI solution in a way that will enable the corporation or the organization to scale—not only scale that model for production, but also everything that comes after that model, the innovation that comes after that model. Usually what happens is one business unit sees the value of that technology or the value of that model and they say, “Okay, let's invest more.” But if you do not have an operational process to take that model from proof of concept (POC)—or proof of value, if you will—to production, it becomes a challenge. It becomes a bottleneck. I have an amazing statistic here: only 53% of models actually make it from POC, or proof of value, to production. So, it is a very expensive experiment when only roughly half of those models make it to production and delivery.
Peter Warren:
I think that's a big point. One of the clients I'm working with right now is heading in that direction. Their theory is that they really don't want to do a POC, or a proof of value, but jump right to minimum viable products (MVPs), which is what you were talking about. How can I start with something I've already got the plans and vision to take to scale? In other words, it's compatible with my organization. It's not built on something I don't support. My IT team can grow it. My business can get value out of it.
And that's what I think people are doing; it’s less of the playing that we saw even six months ago or even three months ago, and much more of “How do I take this idea into production?” Do you see that as a trend?
Doug Leal:
Yes, absolutely. It is a trend where anyone who has that proposition, that use case, is stepping back and trying to find the real value. Should we be spending our time, money and resources investing in this use case? What is the value behind it? What is the return on our investment? Because it is not a cheap experiment when you add not only the infrastructure, but also all the data scientists and machine learning engineers you have behind that model being produced. It becomes a very expensive effort to be just an experiment. So yes, that's a trend I have been seeing with several of our clients.
4. The Data Lakehouse: Combining flexibility and predictability
Peter Warren:
So, let's dive into the “how.” What tools are people looking at? I mean, you've talked in our prep here about Lakehouses and Data Mesh. I think I said that right; I'd like to learn from you as well. How do you see these new tools being applied in reality to accelerate success?
Doug Leal:
Absolutely, absolutely. Let's start with the Data Lakehouse. The Data Lakehouse is a design pattern that combines both the Data Lake and the Data Warehouse. Here's what I mean by that: the Data Lake provides the flexibility to ingest various types of data. That means it could be tabular, structured data. Think about a relational database. It could be semi-structured data, like a JSON document or a PDF document, if you will, or unstructured data: images, audio, video. The Data Lake enables us to store different types of data sets while leveraging the scalability of cloud storage. I don't need to order new hard drives or storage for my data center. I'm using the cloud, which is easily scalable.
On the other hand, we have the Data Warehouse, which gives me predictable performance for my business intelligence and reporting. So, we combine those two patterns. We have the Lakehouse, which now feeds or supports different business organizations regardless of their skill set. As we ingest the data into the Lakehouse, this data is curated. We apply data quality. We curate this data at different levels. We have layers like raw, curated and aggregated. On the aggregated layer of this Lakehouse, we have something that is very close to a golden record, where the data is ready for business units with very minimal technical skill sets. Using a data visualization tool like Power BI or Tableau, they can just go create their own reports and answer their own questions. On the other hand, in the raw layer of the Lakehouse, we have the entire data set, which is better suited for data scientists. So, it is a scalable platform that allows different teams to consume the data using different tools, but it is a unified platform. All your data from different systems—think about a utility company, where you have outage data and AMI data—all of this data sits in one scalable platform feeding the entire organization.
5. The Data Mesh: Decentralizing data platforms
Peter Warren:
I think that's it. We were talking with another client last week about making the data pertinent to that audience, and that's one of the things you just hit on there. So, what about the other strategy, the Data Mesh?
Doug Leal:
Yeah, so the Data Mesh is a sociotechnical approach to sharing, accessing, managing and driving insights out of analytics data. I say sociotechnical because it is a methodology. There's not a tool out there that will solve your problem so you can say you've implemented a Data Mesh. It is a methodology; therefore, the principles of Data Mesh are a little bit different from what we've been used to, because the first principle of Data Mesh is to decentralize your data platform. Usually, we have one big platform, the enterprise Data Warehouse or the enterprise Lakehouse, if you will, that serves the entire organization. According to Data Mesh, it's a big problem, or a big challenge, to serve the whole organization with one data platform.
One key principle is to decentralize: not to create data silos, but to decentralize your data platforms. Let's think about a utility company here at a very high level. We have generation, we have distribution and we have the customer. Let's just limit it to three key “actors” here, for lack of a better term. So, we build a Data Lakehouse, a domain-based Data Lakehouse, for generation.
The generation team is the owner of that Lakehouse now. We transfer the ownership from the IT team to the business. The business is now responsible for data quality. The business is now responsible for driving insights out of that data. They are close to the data, right? They know the data better than the IT folks. So, there is value there. It's not for everyone, of course, because as you can imagine, it is a big cultural shift. Now you are taking the data platform out of IT and giving the keys to the kingdom to the business. For some clients, this model aligns better than for others. We can talk a little bit about why that is, and what operating model better aligns with Data Mesh. But that's one aspect.
So, IT still has its role with Data Mesh, which is to build enterprise data governance, which is very important, and to support the business with infrastructure as a service. Just to close out here on Data Mesh, one of the biggest benefits is business agility. Now that we are transferring this domain-based data platform to the business, the business has quick access to the data. They can move faster without depending on IT or any other teams to deliver their insights. As you can imagine, you need to have a very well-established data governance process in place for a Data Mesh implementation to be successful.
6. Stay tuned for part two: The critical role of organizational structure in data management
Peter Warren:
Well, that's brilliant. I think that's a good place to pause this episode. We'll pick up the key point you just brought up a minute ago, organizational structure, in part two. So with that, Doug, thank you very much for your participation, and we'll pick you up in the second part. Thanks everyone. See you then.