5 benefits of using synthetic data for artificial intelligence

The concept of “garbage in, garbage out” (GIGO) has never been more relevant in our increasingly AI-driven world. GIGO applies when poor-quality input results in poor-quality output, and some of the early experiences with GenAI are a perfect example of this. Regardless of the type of AI solution used, it will always require complete, accurate, and timely data to deliver trusted outcomes.

As organizations across industries increasingly invest in AI, using the right data at the right time and in the right (and responsible) way is becoming more challenging. In this blog, I share a few of these challenges and propose a way forward—the use of synthetic (or artificially produced) data for AI.

Challenges in AI training data

AI and machine learning engines require vast amounts of data to be trained so they can perform their intended tasks. While data volume typically is not an issue, data usage is another story. Three problems associated with using organic data for AI are worth discussing.

Privacy. First, data usage is subject to significant regulation focused on protecting individual privacy. A clear example is the European Union’s General Data Protection Regulation (GDPR). GDPR aims to ensure that personal information is handled in a responsible and secure manner, while also giving individuals more control over their data. It limits how data can be collected and used, as well as how long it can be stored. Because of regulations like GDPR, customer and employee data cannot be freely used to train AI engines. To legally use individual-based data, extensive anonymization is often required, which is both complicated and expensive; and data anonymization does not guarantee security.

Copyright. Second, some data is copyrighted. There is much debate about the use of copyrighted data for AI. Given regulatory discussions happening across various government entities, we anticipate new guidelines to be released soon that will require organizations to clearly indicate which public data has been used to train a specific AI function or module.

Quality. Third, data can include a range of errors and biases, which may or may not be easy to correct, even if the data is organically produced. Further, identifying high-quality data among the volumes of available data can be burdensome and costly.

How synthetic data can help

An alternative to organically produced data is synthetic data. Unlike data collected from real events, synthetic data is artificially generated. However, well-constructed synthetic data is designed to preserve the statistical properties of organic data, so analyses run on it should lead to the same statistical conclusions. This makes it very useful for AI solutions.

Synthetic data can be generated programmatically using a variety of techniques. With machine learning, for example, it’s possible to produce synthetic data that mirrors the statistical properties of real-world data. Data collected from real-life people, events, or objects also can be converted to synthetic data via computer simulations or algorithms: data scientists take the real-world data, extract the desired information, and convert it into synthetic datasets.
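To make the idea of mirroring statistical properties concrete, here is a minimal Python sketch. It assumes a simple two-column table and a bivariate normal model, which is only one of many possible generation techniques. The "real" records are themselves a hypothetical stand-in (made-up age and income values), and all function names are illustrative rather than taken from any specific tool.

```python
import math
import random
import statistics


def fit_two_columns(rows):
    """Estimate the means, standard deviations, and correlation of (x, y) rows."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "r": cov / (sx * sy)}


def sample_synthetic(params, n, seed=0):
    """Draw n artificial rows from a bivariate normal with the fitted parameters."""
    rng = random.Random(seed)
    r = params["r"]
    residual = math.sqrt(max(0.0, 1.0 - r * r))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mx"] + params["sx"] * z1
        # Mixing z1 into y reproduces the fitted correlation between the columns.
        y = params["my"] + params["sy"] * (r * z1 + residual * z2)
        rows.append((x, y))
    return rows


def make_stand_in_real_data(n, seed=1):
    """Hypothetical 'real' records of (age, income). Not actual data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(40, 10)
        income = 20000 + 800 * age + rng.gauss(0, 5000)
        rows.append((age, income))
    return rows


real = make_stand_in_real_data(500)
real_params = fit_two_columns(real)
synthetic = sample_synthetic(real_params, 20000)
synthetic_params = fit_two_columns(synthetic)
print(real_params)
print(synthetic_params)
```

In this sketch only five summary numbers pass from the original table to the generator, so no original row is copied into the synthetic set. Real projects typically use richer models, and keeping individual records out of the output does not by itself amount to a privacy guarantee.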

What are the benefits of using synthetic data for AI? Here are just a few:

Multiple purposes and types: Synthetic data can be generated for a variety of purposes and in several different formats—from simple table data to more advanced data types like imagery, text, and speech.

Ease of training: Synthetic data enables organizations to avoid many of the above-mentioned challenges associated with training data. It can be generated in the desired volume and completely anonymized to ensure regulatory compliance, while still providing the same statistical conclusions as organic data.

Quality control: With synthetic data, you can control the level of quality. In some cases, such as test data for systems development, the quality can be low. Other uses, however, may require high-quality data to achieve desired outcomes. Keep in mind that assessing the quality of synthetic data is a new area of exploration, with definitions and measurements only beginning to emerge.

Risk reduction: With synthetic data, organizations and researchers can perform extensive analyses and develop AI models without the risks and limitations associated with using real data that is confidential and/or sensitive. Through such risk reduction, synthetic data can advance an organization’s responsible use of AI. (To learn more, check out these blogs from my colleague Dr. Diane Gutiw: Embracing responsible AI in the move from automation to creation, and Guardrails for data protection in the age of GenAI.)

Cost savings: Synthetic data offers potential cost savings because it can be less expensive to generate than collecting real data.

Synthetic data use cases and challenges

Because there are no limitations on the type or size of synthetic data that can be generated—either from real-world data, including images, or from scratch—potential use cases abound. Synthetic data can be generated, for example, in healthcare to support research and development without compromising real-life patient data. It can be used in industries like retail and transportation to statistically mirror customer behavior and drive product and service innovation.

The use of synthetic data, however, comes with challenges. It may not be as precise as real-world data or perfectly reflect real-world scenarios. For example, outliers and low-probability events, common in real-world datasets, are difficult to reproduce in synthetic data.
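The difficulty with rare events can be illustrated with a small Python sketch. It assumes a deliberately naive generator (a single normal distribution fitted to the data) and hypothetical measurements in which about 1% of values are spikes far from the rest. More flexible generators can do better, so this only shows the mechanism, not a general result.

```python
import random
import statistics


def make_real_with_rare_events(n, rare_rate=0.01, seed=2):
    """Hypothetical measurements: mostly near 100, with rare spikes near 200."""
    rng = random.Random(seed)
    return [
        rng.gauss(200, 10) if rng.random() < rare_rate else rng.gauss(100, 10)
        for _ in range(n)
    ]


def gaussian_synthetic(real, n, seed=3):
    """Naive generator: one normal distribution fitted to the real values."""
    rng = random.Random(seed)
    mu, sigma = statistics.mean(real), statistics.stdev(real)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def tail_share(values, threshold):
    """Fraction of values above the threshold."""
    return sum(v > threshold for v in values) / len(values)


real_values = make_real_with_rare_events(50000)
synthetic_values = gaussian_synthetic(real_values, 50000)
real_tail = tail_share(real_values, 170)
synthetic_tail = tail_share(synthetic_values, 170)
print(real_tail, synthetic_tail)
```

The fitted generator matches the overall mean and spread, yet it places almost no values above the threshold where the stand-in real data has its spikes. A model trained only on such synthetic data would rarely, if ever, see those events.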

Synthetic data also can pose a security risk when used to support AI models. Malicious use of synthetic datasets, for example, can lead to AI models that are more vulnerable to security attacks.

Moving forward with synthetic data

Synthetic data is an exciting area of AI that resolves some of the biggest challenges in data management, such as privacy, data availability, and quality. Synthetic data can open new opportunities for exploring AI innovations, while maintaining a high level of data protection and regulatory compliance.

For organizations evaluating the use of synthetic data, we recommend the following:

Clearly define your objectives (Understand what you want to do with synthetic data and the business rationale for using it.)

Determine the level of synthetic data quality you need (Is synthetic data that matches real-world data by 80% sufficient? Is a higher level of mirroring required?)

Assess the cost of producing the synthetic data you need (Will there be a clear ROI?)

Consider security and regulatory issues, such as GDPR requirements
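For the quality question in the list above, a percentage such as 80% only means something once a metric is chosen, and the article does not name one. As one hypothetical option, the Python sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions) for a single numeric column and reports one minus that value as a similarity score. The score definition and function names are assumptions for illustration, not an established standard.

```python
import bisect


def ks_statistic(real, synthetic):
    """Largest gap between the empirical CDFs of two numeric samples (0 to 1)."""
    real_sorted, synthetic_sorted = sorted(real), sorted(synthetic)
    largest_gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a data point.
    for value in real_sorted + synthetic_sorted:
        cdf_real = bisect.bisect_right(real_sorted, value) / len(real_sorted)
        cdf_syn = bisect.bisect_right(synthetic_sorted, value) / len(synthetic_sorted)
        largest_gap = max(largest_gap, abs(cdf_real - cdf_syn))
    return largest_gap


def similarity_score(real, synthetic):
    """Illustrative score: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return 1.0 - ks_statistic(real, synthetic)


print(similarity_score([1, 2, 3, 4], [1, 2, 3, 4]))
print(similarity_score([1, 2, 3, 4], [3, 4, 5, 6]))
print(similarity_score([1, 2, 3], [4, 5, 6]))
```

A per-column score like this ignores relationships between columns, so a fuller assessment would also compare correlations and the performance of models trained on each dataset.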

The quality of synthetic data is an area in which I'm particularly interested. CGI recently partnered with Karlstad University* to find better methods for assessing synthetic data quality and to co-publish a research paper. Feel free to contact me to discuss synthetic data, data usage, or AI in general. You also can explore CGI’s AI capabilities and experience.

*Announcement in Swedish

About this author

Jonas Forsman

Director, Consulting Expert

Jonas Forsman has more than 20 years of experience in designing, developing, testing, and implementing advanced technology solutions across industries using artificial intelligence, big data, data analytics, and business intelligence. He also has significant experience in research and innovation project management both within and outside ...
