Google Analytics 4 (GA4) and BigQuery both offer powerful data collection and analysis capabilities, but discrepancies are to be expected when you compare the data reported by the two platforms.
This guide provides practical insights to help you understand and address these differences.
What to expect
Because GA4 and BigQuery are different platforms that process data independently, a small amount of discrepancy is inherent.
While there isn’t a specific percentage that can be used as a baseline for these discrepancies, minor differences should be anticipated when analyzing and comparing data across different platforms.
Basic checks
Beyond the pointers in the Google Analytics 4 article “[GA4] Compare Analytics reports and data exported to BigQuery”, there are a few more considerations when comparing BigQuery and Google Analytics 4 data:
Sampling
In Google Analytics 4, sampling is a technique used to process large amounts of data more efficiently. Google Analytics 4 randomly selects a subset of data to represent the entire dataset, which reduces processing time and resource consumption, allowing for faster reporting and analysis. When sampling is applied, the data quality icon indicates it, along with the percentage of data used to extrapolate the results in the report.
BigQuery does not apply this sampling and can query the full exported dataset, so check whether the Google Analytics 4 report is sampled before comparing it with BigQuery.
Data Integrity
Note that the BigQuery daily export is delivered in batches and can take up to 72 hours to complete. The data in these tables can therefore still be updated, so the same query may return different results when it is run again later.
Streaming export, on the other hand, works on a best-effort basis and might not always include all of the data that ends up in the daily export table. It is therefore recommended to use only the daily export table for analysis and comparison with Google Analytics 4.
When querying the daily export tables, make sure the queries do not include dates from within the last 72 hours.
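One way to apply the 72-hour rule consistently is to compute the most recent table date that is safe to query before building the SQL. The sketch below is only an illustration: the helper names are hypothetical, it assumes the daily tables follow the events_YYYYMMDD naming of the GA4 export, and it assumes the timestamp you pass in is already in the property's reporting time zone.

```python
from datetime import datetime, timedelta


def last_settled_suffix(now, settle_hours=72):
    """Return the YYYYMMDD suffix of the latest day that lies entirely
    before the settle window (72 hours by default)."""
    cutoff = now - timedelta(hours=settle_hours)
    # The day containing the cutoff is only partly settled, so step back one more day.
    last_full_day = cutoff.date() - timedelta(days=1)
    return last_full_day.strftime("%Y%m%d")


def settled_suffix_filter(start_suffix, now, settle_hours=72):
    """Build a _TABLE_SUFFIX condition that never reaches into the settle window."""
    end_suffix = last_settled_suffix(now, settle_hours)
    if start_suffix > end_suffix:
        raise ValueError("start date falls inside the settle window")
    return f"_TABLE_SUFFIX BETWEEN '{start_suffix}' AND '{end_suffix}'"


if __name__ == "__main__":
    # Hypothetical "current" time, used only for illustration.
    print(settled_suffix_filter("20240501", datetime(2024, 5, 10, 15, 0)))
```

The resulting condition can be placed in the WHERE clause of a query against the events_* wildcard table.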
To see which export type in BigQuery is being used, you may refer to Admin > Property Settings > Product Links > BigQuery Links.
Once a BigQuery project is linked to Google Analytics 4, it appears under the existing links. Clicking the link shows the linking details and data configurations, including the export type.
Cookieless Pings and User Consent
When the “Device-based” reporting identity is used for comparisons with BigQuery, Google Analytics 4 data reflects only consented users. In contrast, the BigQuery export includes all cookieless pings, which leads to discrepancies.
Cookieless pings, in this case, are anonymized and non-identifiable data points collected by GA4 from users who have opted out of being tracked. Please see the Consent mode reference for more information.
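To get a feel for how much of the exported data comes from non-consented traffic, the exported events can be grouped by consent state. The sketch below works on a few hand-written rows rather than a real export. It assumes that each row carries the privacy_info.analytics_storage field from the export schema and that denied consent is recorded there as "No"; the function name and the sample rows are made up for illustration.

```python
from collections import Counter


def count_events_by_consent(rows):
    """Count events per analytics_storage state ("Yes", "No", or "Unset")."""
    counts = Counter()
    for row in rows:
        privacy = row.get("privacy_info") or {}
        # Assumption: a missing or empty value is treated as "Unset".
        state = privacy.get("analytics_storage") or "Unset"
        counts[state] += 1
    return dict(counts)


# Hypothetical rows shaped like exported events (most fields omitted).
sample_rows = [
    {"event_name": "page_view", "privacy_info": {"analytics_storage": "Yes"}},
    {"event_name": "page_view", "privacy_info": {"analytics_storage": "No"}},
    {"event_name": "scroll", "privacy_info": {"analytics_storage": "No"}},
    {"event_name": "page_view", "privacy_info": None},
]

if __name__ == "__main__":
    print(count_events_by_consent(sample_rows))
```

If the share of "No" events is large, a gap against a device-based Google Analytics 4 report is to be expected.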
Traffic Source
When analyzing Google Analytics 4 and BigQuery data, we also need to be aware of the scope being used on each platform.
Previously, BigQuery could only query the first user traffic source from Google Analytics, which is stored under the traffic_source.* fields in the schema. The export now also provides session_traffic_source_last_click (session-level) and collected_traffic_source (event-level), so it is essential to ensure that the scopes match between Google Analytics 4 and BigQuery to avoid potential discrepancies.
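A small lookup can help keep the scopes aligned when queries are written by several people. The mapping below is a sketch: the scope labels ("first_user", "session", "event") and the function name are made up for this example, while the field names are the ones mentioned above, with the first user source assumed to live in the traffic_source record.

```python
# Hypothetical scope labels mapped to the export fields that hold the traffic source.
SCOPE_TO_FIELD = {
    "first_user": "traffic_source",
    "session": "session_traffic_source_last_click",
    "event": "collected_traffic_source",
}


def traffic_source_field(scope):
    """Return the export field that matches a GA4 reporting scope."""
    try:
        return SCOPE_TO_FIELD[scope]
    except KeyError:
        valid = ", ".join(sorted(SCOPE_TO_FIELD))
        raise ValueError(f"unknown scope {scope!r}; expected one of: {valid}") from None


if __name__ == "__main__":
    # A session-scoped GA4 dimension should be compared against the session-level field.
    print(traffic_source_field("session"))
```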
Metric calculations
In Google Analytics 4, certain metrics such as user and session counts are calculated with HyperLogLog++ (HLL++), an algorithm that estimates the number of unique users and sessions rather than counting them exactly.
BigQuery, on the other hand, analyzes the raw data without the HLL++ approximation, which can lead to differences in the reported figures. For more information, refer to the Google Analytics developer article on the usage of HyperLogLog++.
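To illustrate why an estimated count and an exact count rarely match to the last digit, here is a minimal sketch of the basic HyperLogLog idea in Python. It is deliberately simplified: it is plain HyperLogLog with a small-range correction, not HyperLogLog++, and it is not the implementation or the precision settings that Google Analytics 4 uses. The function name, the precision parameter p, and the generated user IDs are assumptions made for this example.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash of the value
        index = h >> (64 - p)                      # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining bits
        rank = (64 - p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > registers[index]:
            registers[index] = rank
    alpha = 0.7213 / (1 + 1.079 / m)               # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)             # linear counting for small sets
    return raw


if __name__ == "__main__":
    user_ids = [f"user-{i}" for i in range(50_000)]  # made-up identifiers
    exact = len(set(user_ids))                       # what an exact distinct count returns
    approx = hll_estimate(user_ids)                  # what an HLL-style estimate returns
    print(exact, round(approx))
```

The two printed numbers are close but generally not identical, which is the kind of gap to expect between an estimated user count in Google Analytics 4 and an exact distinct count over the raw export.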
Query Structure
The complexity of the SQL also plays a role in the discrepancy in some situations. If a discrepancy persists despite the checks above, I recommend querying a simple metric in BigQuery, comparing it to Google Analytics 4, and confirming whether the discrepancy also exists on a smaller scale.
If a simple query returns the same data in Google Analytics 4 and BigQuery but a complex one does not, revisiting how the SQL is structured may be beneficial. A good reference is [GA4] BigQuery Export schema.
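As a starting point for such a check, a daily event count is about as simple as a comparison query gets. The helper below only builds the SQL text, so it can be pasted into the BigQuery console. The function name is hypothetical, and the table path in the usage example is a placeholder to be replaced with your own project and dataset.

```python
import re


def baseline_event_count_sql(table, start_suffix, end_suffix):
    """Build a minimal per-day event count query over the daily export tables."""
    for suffix in (start_suffix, end_suffix):
        if not re.fullmatch(r"\d{8}", suffix):
            raise ValueError(f"expected a YYYYMMDD suffix, got {suffix!r}")
    return (
        "SELECT event_date, COUNT(*) AS event_count\n"
        f"FROM `{table}`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start_suffix}' AND '{end_suffix}'\n"
        "GROUP BY event_date\n"
        "ORDER BY event_date"
    )


if __name__ == "__main__":
    # Placeholder table path; substitute your own project and dataset.
    print(baseline_event_count_sql(
        "my-project.analytics_123456789.events_*", "20240501", "20240506"
    ))
```

If the daily totals line up with the event count in an unsampled Google Analytics 4 report for the same dates, the remaining gap most likely comes from the more complex query rather than from the export itself.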
Conclusion
Bridging the gap between Google Analytics 4 (GA4) and BigQuery is a must for data-driven insights. Both tools provide unique views of your website or app data, but understanding their differences is essential for accurate analysis.
By recognizing the potential discrepancies and their root causes, you can make more informed decisions based on accurate data.