CGI’s GenAI and predictive analytics models cut records identification from months to minutes and streamline analysis at scale and speed for a major oil and gas company.

Introduction

When our oil and gas client recognized the need to improve their record management strategy for better information quality, faster processes and regulatory compliance, we provided expert consulting through our IP solution CGI GovernWise360. By working closely with their teams and understanding their unique challenges, we combined generative AI (GenAI) with predictive models tailored to specific engineering fields (e.g., civil, electrical, process engineering). This empowered their teams to efficiently identify which files needed to be declared as records and enabled them to process and analyze legacy documents at scale, all while ensuring the solution fit seamlessly within their existing workflows.

"CGI's Intelligent Records Radar pilot has transformed record management by enabling swift, scalable classification across engineering domains with expert-validated models, unlocking critical insights and strengthening asset stewardship across the organization."

Senior IT expert at large oil and gas organization

Challenges

One of the world’s largest oil and gas companies experienced massive digital content growth, leading to confusion, slower operations and reduced productivity across various teams. The non-record disposal policy they implemented was intended to streamline content retention by keeping only the most relevant information. However, it also created a significant risk of losing important data through misclassification. Manual analysis was overwhelming for users and error-prone, while their existing software tools could neither handle large datasets nor provide predictions with the nuanced understanding required for critical decisions.

CGI's customer-centric approach was key to overcoming these challenges. We collaborated closely with the client’s engineering teams to ensure that the AI-based solution we developed met their specific needs for accuracy and ease of use. Developing models for technical domains was complex: each model required input from engineering experts, validated keywords and domain-specific documents. Additionally, the sheer volume of documents (over 900 million) presented a significant challenge for bulk classification and retention.

Solution

The approach involved scanning SharePoint and Microsoft Teams to identify files that should be declared official records. This allowed teams to easily identify records and analyze legacy documents at scale and speed. With CGI's support, the "Intelligent Records Radar" pilot was launched, tailored to the needs of civil, electrical and process engineering teams.

We prioritized user experience by aligning AI-driven models with team workflows, automating tasks like record identification, retention, deletion and document security classification. Tailored to specific engineering domains (e.g., COP, ICE, mechanical, process), the system enabled fast, accurate identification of critical records for informed decisions.

Key solution features:

  • Data embedding:
    Tasks were established to extract and embed relevant documents from SharePoint, which were then processed using advanced NLP techniques like text summarization, lemmatization and text classification. The cleaned data was stored in a database for easy access and use in machine learning (ML) modules.
  • Processing with various models:
    The solution integrates multiple processing models, including batch processing, real-time streaming and machine learning-based analytics, to enhance flexibility and efficiency. Batch processing optimizes performance by handling large data volumes at set intervals, while real-time streaming enables instantaneous processing for time-sensitive applications. Machine learning models provide actionable insights and predictive analytics, ensuring the system adapts to evolving demands and supports diverse processing needs.
  • Data retrieval:
    Upon receiving a query, the system used models such as Logistic Regression, Naive Bayes and ANN to identify related content based on embedded vectors. It predicted whether a file was a record and provided a probability score for that prediction. The model also proposed a retention label and duration using document information and similarity scores from Gensim Word2Vec embeddings and cosine similarity. Additionally, department-specific keyword lists were used to predict the keywords associated with each document.
  • Adoption and confidence building:
    GenAI-driven content summarization and word cloud generation significantly optimized the model and boosted team confidence. An integrated web interface was developed to visualize predictions, allowing users to review document summaries and word clouds generated by the GenAI model. This interface enabled users to validate and correct predictions, creating a feedback loop that continuously improved model accuracy and reliability. This process not only fostered trust but also enhanced model performance through iterative refinement.
  • Report generation and visualization:
    Power BI transforms predicted data and probability scores into actionable insights through interactive reports and dashboards. The dashboard includes insights on files categorized as records or non-records, probable retention labels and durations for each file, probability scores per department with the highest scores displayed in charts, and the count of keywords most relevant to each department.
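
As a rough illustration of the similarity-based retention proposal described under "Data retrieval", the sketch below averages word vectors for a document and picks the retention label whose reference keywords are closest by cosine similarity. It is a minimal sketch under stated assumptions, not the pilot's implementation: the tiny hand-written vectors stand in for trained Word2Vec embeddings, and the label names, keywords, retention durations, the `min_similarity` threshold and the function names are all hypothetical.

```python
import math

# Toy 3-dimensional word vectors standing in for trained Word2Vec embeddings.
# All words, values, and labels below are hypothetical illustrations.
WORD_VECTORS = {
    "pump": [0.9, 0.1, 0.0],
    "pressure": [0.8, 0.2, 0.1],
    "valve": [0.7, 0.3, 0.0],
    "invoice": [0.0, 0.9, 0.2],
    "payment": [0.1, 0.8, 0.3],
    "lunch": [0.0, 0.1, 0.9],
}

# Hypothetical retention labels: (reference keywords, retention in years).
RETENTION_LABELS = {
    "Engineering design record": (["pump", "pressure", "valve"], 25),
    "Financial record": (["invoice", "payment"], 7),
}


def embed(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return None
    dims = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose_retention(text, min_similarity=0.8):
    """Return (label, years, score) for the closest label, or None.

    None means no label was similar enough, so the file would be left
    for a human reviewer rather than labeled automatically.
    """
    doc_vec = embed(text.lower().split())
    if doc_vec is None:
        return None
    best = None
    for label, (keywords, years) in RETENTION_LABELS.items():
        score = cosine_similarity(doc_vec, embed(keywords))
        if best is None or score > best[2]:
            best = (label, years, score)
    return best if best[2] >= min_similarity else None


if __name__ == "__main__":
    for sample in ["Pump pressure valve datasheet", "Team lunch menu"]:
        print(sample, "->", propose_retention(sample))
```

Returning `None` below the threshold, instead of always forcing the nearest label, reflects the review-and-correct loop described under "Adoption and confidence building": low-confidence files are the ones a user would most need to check by hand.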

Outcomes

CGI partnered with the client to leverage its Microsoft 365 investment and CGI GovernWise360 expertise, developing AI and ML models that streamlined record management and reduced content classification and retention time from months to hours. The pilot scanned the client’s repositories, delivering fast, accurate record identification while addressing key needs like automated deletion, long-term retention and duplication detection. This user-focused solution strengthened the client's compliance posture, enabled tight security measures, minimized errors and boosted efficiency. It received high praise from the client's leadership for its speed, accuracy and productivity gains.

CGI GovernWise360 grows with our clients' business

CGI plans to expand AI-driven information management across more business lines and disciplines, enhancing CGI GovernWise360's scalability and adaptability. By refining AI models and integrating advanced machine learning, CGI aims to deliver more precise, intelligent solutions tailored to industry needs. This broader transformation will push the boundaries of AI-powered records management, focusing on deeper automation, better decision-making and enhanced integration for greater efficiency, accuracy and compliance across sectors.