Why you should use BigQuery with GA4
One of the features that is available in Google Analytics 4 is the possibility to link your GA4 property to Google BigQuery free of charge. Well, almost, but we'll get into that later in this article. In Universal Analytics, this was a feature that was only available with the premium license (GA360). In this article we'll take a look at why you really should connect your property to BigQuery and what possibilities you have once the data is stored in a table. But first, let's dive into what BigQuery is, in case you haven't heard of it before.
If you're looking for information on how to connect GA4 to BigQuery and you're not interested on the theoretical parts, Google actually provides a really good video guide themselves.
What is BigQuery?
BigQuery is a service within Google Cloud Platform (GCP), which is Google's equivalent of Amazon's AWS and Microsoft's Azure. BigQuery is used to handle and analyze vast amount of data, so called "big data". It is considered to be a serverless data warehouse, meaning you don't need to maintain any infrastructure. Some of the pros of using BigQuery is that it's fast, scalable and easy to use. You can analyze terabytes of data in seconds.
BigQuery also integrates with other Google Cloud services, including machine learning and data analytics tools, which can further enhance its capabilities. Plus, it provides several ways to load data, including streaming data in real-time or batch loading. It also supports various data formats.
In BigQuery, you can write SQL queries to answer complex questions based on the collected data. You can also create views or shcedule queries to create static tables, which then can be connected to third party tools in order to visualize data.
What is the advantage of connecting GA4 to BigQuery?
There are a lot of reasons for connecting GA4 to BigQuery. We'll have a look at some of the most important ones.
Get access to the raw data
In the GA4 interface, there are a lot of limitation to what you can do. For example, you are not allowed to use certain dimensions or metrics together. You are also not allowed to create advanced filters in the report collections. I'm sure you've felt limited at times.
When looking at the data in BigQuery, you instead have access to all data passed to GA4 for all events, except for data generated by Google Signals. By writing SQL queries, you can then structure the data in whatever way you prefer. There are lots of situations where the data in the GA4 interface might actually not be representing what you think. One example is the landing page report, where you will probably see a lot of missing values. This is due to the fact that sessions can be created in GA4 that don't include a page view event, meaning there is no landing page. In BigQuery, you can either filter out those "sessions" easily, or you could create the landing page dimension yourself, since the event/events that actually triggered during the sessions will contain a parameter for "page_location".
Another thing worth mentioning is that GA4 is not actually presenting data based on the entire raw data, but instead uses an algorithm to estimate the data. You can read more of the details about it in another article I wrote on this topic.
GA4 real time data
In the GA4 interface, you will have to wait a long time before you can view collected data. It can take up to 48 hours before the data is completely processed and available. In BigQuery, you will have access to the data in a matter of seconds, provided that you activate the intraday stream. This makes a huge difference when it comes to reporting and analysis.
Connecting GA4 to Looker Studio
If you've added GA4 to Looker Studio, you might have noticed two issues:
- The reports are showing an error that the charts can't be loaded due to the fact that you've reached your API quota. Google recently increased the quota limits, but it might still happen if you use the report a lot.
- The charts are loading slowly. This usually tends to happen often if you add features such as blended data and filters.
With BigQuery, you will never run into any quota limits and it's the fastest way of loading data to a Looker Studio report. Google has built a feature called "BI Engine", which means that the streaming of data from BigQuery to Looker Studio is optimized to be super fast. However, the table you get when you connect GA4 to BigQuery is quite big in size. It also has kind of a special format, which doesn't make it simple to use when you connect it as is to Looker Studio.
Create simple tables in BigQuery based on GA4 data
To make it as easy as possible both to query GA4 data and to connect the data to Looker Studio (or other third party tools), you can create smaller tables that contains subsets of data from the GA4 table. There are lots of ways to do this, which we will not go into in this article. One thing that might be worth mentioning though is that you should not create views and connect those to directly to third party tools, since that might be expensive. Each time the tool changes what should be visible, it will force the view to run the configuration query again. To be honest, I'm not sure if BI Engine does this as well or not, but I would still recommend using static tables that you update by scheduling queries to append or overwrite them. Personally, I prefer to create views and then schedule queries that are collecting data from the views. In that case you can use the views for multiple purposes within BigQuery, but can still have static tables that you connect to Looker Studio. Some examples of smaller tables that could be created are:
- sessions (containing relevant session information, where each row is a session)
- transactions (containing information about purchases, where each row is a transaction)
- items (containing item information related to a purchase, where each row is a unique item)
- users (containing user related data, where each row is a user)
When you need to analyze something related to sessions, you would simply query the session table and avoid processing unnecessary data. This also helps increase load times when connecting to external platforms.
What's the cost of connecting GA4 to BigQuery?
Google Cloud Platform has a price model which is based on "pay-as-you-go", which means that you only pay for what you actually use. This means that there are no retainers or monthly subscriptions you have to pay for. When it comes to BigQuery, there are mainly two things that affects the cost; the storage of data and how much data you process when you run queries. The more data you store and the more data you process, the higher your cost will be. Makes sense, right?
So to the big question, how much will this cost me? There's no simple answer, since it depends on how you use the service. However, by simply connecting GA4 to BigQuery, you will probably not pay more than at most a few euros per month (and that is when I exaggerate). The main cost will probably be when you run queries, and that is also usually pretty cheap. If you create an infrastructure of smaller static tables, you will find BigQuery to cost almost next to nothing. However, I want to be clear that if you run complex queries, connect views to third party sources and similar stuff, the cost might increase significantly. You should not allow someone without an understanding of SQL work within the service.
If you want to have a look at the actual price of working in BigQuery, you can check out the Google pricing documentation.