Running reports on DB Replicas

In 2019, Chuck Boecking hosted a presentation discussing environment management in iDempiere, where he also mentioned scaling environments in a load balance setting and utilizing DB replicas to run reports without impacting performance.

Environment Management, Chuck Boecking – iDempiere world conference 2019, Lyon

This approach is incredibly powerful for ensuring smooth system operation, especially when running heavy reports or processes. In large-scale implementations like Palich’s, it’s not just an option—it’s a necessity!

There is more to talk about replication:



What is Streaming/Physical DB replication?

“Streaming” or “Physical” (often used interchangeably) database replication involves copying and synchronizing all data from a primary database to one or more secondary databases. This process ensures that the required data is consistent across all replicas and can be read independently for various purposes such as disaster recovery, load balancing, and reporting.

Here’s why database replication is important:

  1. High Availability: Database replication provides redundancy, ensuring that if the primary database fails, one of the secondary databases can take over without disrupting operations.
  2. Disaster Recovery: In the event of a catastrophic failure or data loss in the primary database, replicated data in secondary databases can be used to restore operations quickly and minimize downtime.
  3. Load Balancing: Replicated databases can distribute read operations across multiple servers, reducing the load on the primary database and improving overall system performance.
  4. Geographic Distribution: Database replication enables data to be replicated across different geographic locations, allowing for faster access to data and compliance with data residency regulations.
  5. Reporting and Analytics: Replicated databases can be used for reporting and analytics purposes without impacting the performance of the primary production database, ensuring that business insights can be derived efficiently.

In summary, database replication is crucial for maintaining data integrity, availability, and performance in environment management, especially in scenarios where high availability, disaster recovery, and scalability are essential requirements.

Said another way, if your database is not performing fast enough, or if you cannot restore your database quickly enough in a disaster recovery scenario, replication might be a good option to pursue.

How can I implement DB replicas in iDempiere?

iDempiere (up to iDempiere 11) can use the DB Replication for two purposes: reporting and alert processors. To configure your environment for reading from a replica, follow the instructions here.

Up to now, iDempiere does not support connection pooling for reading the replica. It opens a direct connection to the replica database, executes the report and closes the connection.

Replication Timing: Synchronous vs Asynchronous.

The default option for replication is asynchronous. In scenarios like what is described in this article, most users would choose asynchronous.

  • Asynchronous replication performs better because the DB reads and updates periodically from the master database. The latency is not usually noticeable, but there’s a risk of reading stale data from the DB replica. iDempiere monitors the latency for you. If latency is unacceptable, iDempiere will automatically read from the master database when generating a report.
  • Synchronous replication guarantees that the data is always up-to-date and you’re reporting from the current data at all times. This is accomplished by slowing commits until data parity among replicas is ensured. This approach can be expensive in terms of performance.

Replication Types: Physical/Streaming vs Logical

Postgresql implements two types of replication. In this article, we refer only to streaming/physical replication. We will save logical replication for another day.

  • Physical/streaming is often used for reading replicas and disaster recovery. It is a bit-for-bit copy with many restrictions.
  • Logical replication is where you have more freedom and flexibility. The primary Postgres server is your publisher and there are many options for subscribers including other services that implement the PSQL subscriber protocol (such as Google’s BigQuery).