eXo JCR Backup Service

1 Concept

The main purpose of this feature is to restore data after system faults and repository crashes. The backup results may also be used as a content history.

The eXo JCR backup service was developed from the JCR 1.8 implementation. It's an independent service available as an eXo JCR Extensions project.

The concept is based on exporting a workspace unit in the Full or Full + Incrementals model. A repository workspace can be backed up and restored using a combination of these modes. In all cases, at least one Full (initial) backup must be executed to mark the starting point of the backup history. An Incremental backup is not a complete image of the workspace; it contains only the changes made during some period. It is therefore not possible to perform an Incremental backup without an initial Full backup.

The Backup service may operate as a hot-backup process at runtime on an in-use workspace. In that case the Full + Incrementals model should be used to guarantee data consistency during restoration. The Incremental backup runs from the start point of the Full backup and therefore also contains the changes that occurred while the Full backup was running.

A restore operation is the mirror of a backup one. At least one Full backup must be restored to obtain a workspace corresponding to some point in time. Incrementals may then be restored in the order of their creation to reach the required state of the content. If an Incremental contains the same data as the Full backup (hot-backup), the changes will be applied again as if they were made in the normal way via API calls.

According to the model there are several modes for backup logic:

  • Full backup only : single operation, runs once
  • Full + Incrementals : starts with an initial Full backup and then keeps incremental changes in one file. Runs until it is stopped.
  • Full + Incrementals (periodic) : starts with an initial Full backup and then keeps incrementals with periodic result file rotation. Runs until it is stopped.

2 How it works

2.1 Implementation details

Full backup/restore is implemented using the JCR SysView Export/Import. Workspace data is exported as SysView XML data, starting from the root node.

Restore is implemented using a special eXo JCR API feature: dynamic workspace creation. Restoring a workspace Full backup creates a new workspace in the repository. The SysView XML data is then imported as the root node.

Incremental backup is implemented using the eXo JCR ChangesLog API. This API allows each JCR API call to be recorded as an atomic entry in a changelog. The Incremental backup uses a listener that collects these logs and stores them in a file.

Restoring an incremental backup consists of applying the collected set of ChangesLogs to a workspace in the correct order.
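
The restore order described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the RestoreOrderSketch class, its Change type and the map standing in for a workspace are hypothetical and are not part of the eXo JCR API. It shows a Full image being loaded first and the incremental change sets being replayed in creation order, including a change that repeats data already present in the Full image, as can happen with a hot backup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a workspace is a map of item path -> value.
// None of these types belong to the eXo JCR API.
class RestoreOrderSketch {

  /** One recorded change; a null value means "remove the item". */
  static final class Change {
    final String path;
    final String value;

    Change(String path, String value) {
      this.path = path;
      this.value = value;
    }
  }

  /** Loads the Full image, then replays each incremental log in creation order. */
  static Map<String, String> restore(Map<String, String> fullImage,
                                     List<List<Change>> incrementalsInOrder) {
    Map<String, String> workspace = new LinkedHashMap<>(fullImage);
    for (List<Change> log : incrementalsInOrder) {
      for (Change change : log) {
        if (change.value == null) {
          workspace.remove(change.path);
        } else {
          workspace.put(change.path, change.value);
        }
      }
    }
    return workspace;
  }

  public static void main(String[] args) {
    Map<String, String> full = new LinkedHashMap<>();
    full.put("/a", "1");
    full.put("/b", "2");

    // The first log repeats a change already captured by the Full image (hot backup).
    List<Change> first = Arrays.asList(new Change("/b", "2"), new Change("/c", "3"));
    List<Change> second = Arrays.asList(new Change("/a", null));

    List<List<Change>> logs = new ArrayList<>();
    logs.add(first);
    logs.add(second);

    System.out.println(restore(full, logs));
  }
}
```

Replaying the overlapping change is harmless in this model because writing the same value twice leaves the item unchanged; swapping the order of the two logs, however, could produce a different workspace, which is why the order of creation matters.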

2.2 Work basics

The Backup service works with the BackupConfig configuration and the BackupChain logical unit.

BackupConfig describes the backup operation chain that will be performed by the service. The configuration must be prepared before the backup is started.

The configuration contains such values as:

  • types of full and incremental backup (fullBackupType, incrementalBackupType) - Strings with the fully qualified names of the classes that implement each backup type.
  • incremental period - the period, in seconds (long), after which the current backup will be stopped and a new one will be started.
  • target repository and workspace names - Strings with the corresponding names.
  • destination directory for result files - a String with the path to the folder where operation result files will be stored.

BackupChain is the unit that performs the backup process: it executes the initial Full backup and manages the Incremental operations. BackupChain is used as the key object for accessing current backups at runtime via BackupManager. Each BackupJob performs a single atomic operation - a Full or an Incremental process. The result of that operation is data for a Restore. A BackupChain can contain one or more BackupJobs, but the initial Full job is always present. Each BackupJob has its own unique number which gives its order in the chain; the initial Full job always has the number 0.

Backup process, result data and file location

To start the backup process, create a BackupConfig and call the BackupManager.startBackup(BackupConfig) method. This method returns a BackupChain created according to the configuration. At the same time the chain creates a BackupChainLog, which persists the BackupConfig content and the BackupChain operation states to a file in the service working directory (see Configuration).

When the chain starts work and the initial BackupJob starts, the job creates a result data file using the destination directory path from BackupConfig. The destination directory will contain a directory with an automatically generated name following the pattern repository_workspace-timestamp, where timestamp is the current time in the format yyyyMMdd_hhmmss (e.g. db1_ws1-20080306_055404). That directory will contain the results of all Jobs configured for execution. Each Job stores its backup result in its own file named repository_workspace-timestamp.jobNumber. BackupChain saves each state (STARTING, WAITING, WORKING, FINISHED) of its Jobs in the BackupChainLog, which also holds the full path of the current result file.
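
The naming pattern above can be reproduced with a few lines of Java. This is a sketch for illustration, not the service's own code: the BackupNamingSketch class and its method names are hypothetical, and the format string is copied literally from the text. Note that in SimpleDateFormat "hh" is the 12-hour clock field; whether the service itself uses a 12-hour or a 24-hour clock is not stated here.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative only: rebuilds the documented naming pattern.
class BackupNamingSketch {

  /** Directory name: repository_workspace-timestamp. */
  static String chainDirName(String repository, String workspace, Date start, TimeZone zone) {
    // Pattern copied from the text; "hh" is the 12-hour clock field in SimpleDateFormat.
    SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd_hhmmss", Locale.ROOT);
    format.setTimeZone(zone);
    return repository + "_" + workspace + "-" + format.format(start);
  }

  /** Job result file name: repository_workspace-timestamp.jobNumber. */
  static String jobFileName(String chainDirName, int jobNumber) {
    return chainDirName + "." + jobNumber;
  }

  public static void main(String[] args) {
    String dir = chainDirName("db1", "ws1", new Date(), TimeZone.getDefault());
    System.out.println(dir);                 // chain directory
    System.out.println(jobFileName(dir, 0)); // initial Full job
    System.out.println(jobFileName(dir, 1)); // first Incremental job
  }
}
```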

The BackupChain log file and the job result files form a single consistent unit that is the source for a Restore.

The BackupChain log contains absolute paths to the job result files. Don't move these files to another location.

Restore requirements

As mentioned before, a Restore operation is the mirror of a Backup. The process is a Full restore of a root node, followed by restoring additional Incremental backups to reach the desired workspace state. Restoring the workspace Full backup creates a new workspace in the repository using the given RepositoryEntry of the existing repository and the given (preconfigured) WorkspaceEntry for the new target workspace. The Restore process then restores the root node there from the SysView XML data.

The target workspace must not already exist in the repository. Otherwise a BackupConfigurationException will be thrown.

For creating and managing workspaces, see the article Repository and Workspace management.

In summary, a Restore is the process of creating a new Workspace and filling it with the Backup content. If a Workspace with the same name already exists in the Repository, you have to configure a new name for the target. If no such workspace exists in the Repository, you may use the same name as the backed-up one.
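
The naming rule above amounts to a simple precondition that can be checked before a restore is attempted. The following is a minimal sketch: the RestoreTargetSketch class is hypothetical, and the list of existing workspace names is assumed to have been collected by the caller. The actual check is made by the Backup service itself.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Sketch of the precondition only; not part of the eXo JCR API.
class RestoreTargetSketch {

  /** A restore target is usable only if no workspace with that name exists yet. */
  static boolean isUsableTarget(Collection<String> existingWorkspaces, String targetName) {
    return !existingWorkspaces.contains(targetName);
  }

  public static void main(String[] args) {
    List<String> existing = Arrays.asList("system", "ws1");
    System.out.println(isUsableTarget(existing, "ws1"));          // name already taken
    System.out.println(isUsableTarget(existing, "ws1_restored")); // free name
  }
}
```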

3 Configuration

As an optional extension, the Backup service is not enabled by default. You need to enable it via configuration.

Below is an example configuration compatible with JCR 1.9.3 and later :

<component>
  <key>org.exoplatform.services.jcr.ext.backup.BackupManager</key>
  <type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type>
  <init-params>
    <properties-param>
      <name>backup-properties</name>
      <property name="default-incremental-job-period" value="3600" /> <!-- set default incremental period = 60 minutes -->
      <property name="full-backup-type" value="org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob" />
      <property name="incremental-backup-type" value="org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob" />
      <property name="backup-dir" value="target/backup" />
    </properties-param>
  </init-params>
</component>

Where:

  • incremental-backup-type (since 1.9.3) : the FQN of the incremental job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob
  • full-backup-type (since 1.9.3) : the FQN of the full backup job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob
  • default-incremental-job-period (since 1.9.3) : the period between incremental flushes (in seconds)
  • backup-dir : the path to a working directory where the service will store internal files and chain logs.

4 Usage

4.1 Perform a Backup

In the following example we create a BackupConfig bean for the Full + Incrementals mode, then ask the BackupManager to start the backup process.

// Obtain the backup service from the eXo container.
BackupManager backup = (BackupManager) container.getComponentInstanceOfType(BackupManager.class);

// Prepare the BackupConfig instance with custom parameters (full backup & incremental).
File backDir = new File("/backup/ws1"); // the destination path for result files
backDir.mkdirs();

BackupConfig config = new BackupConfig();
config.setRepository(repository.getName());
config.setWorkspace("ws1");
config.setBackupDir(backDir);

// Before 1.9.3, you also need to indicate the backup job class FQNs
// config.setFullBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob");
// config.setIncrementalBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob");

// start backup using the service manager
BackupChain chain = backup.startBackup(config);

To stop the backup operation you have to use the BackupChain instance.

// stop backup
backup.stopBackup(chain);

4.2 Perform a Restore

Restoration involves reloading the backup log file into a BackupChainLog and applying the appropriate workspace initialization. The following snippet shows the typical sequence for restoring a workspace:

// find the BackupChain using the repository and workspace names (returns null if not found)
BackupChain chain = backup.findBackup("db1", "ws1");

// get the RepositoryEntry and WorkspaceEntry
ManageableRepository repo = repositoryService.getRepository(repository);
RepositoryEntry repositoryEntry = repo.getConfiguration();
List<WorkspaceEntry> entries = repositoryEntry.getWorkspaceEntries();
// create a copy entry from an existing one (getNewEntry is a helper that is not shown here)
WorkspaceEntry workspaceEntry = getNewEntry(entries, workspace);

// load the backup log
File backLog = new File(chain.getLogFilePath());
BackupChainLog bchLog = new BackupChainLog(backLog);

// initialize the workspace
repo.configWorkspace(workspaceEntry);

// run restoration using the ready RepositoryEntry and WorkspaceEntry
backup.restore(bchLog, repositoryEntry, workspaceEntry);

4.2.1 Restoring into an existing workspace

These instructions apply only to regular workspaces. Special instructions for the System workspace are provided below.

To restore a backup over an existing workspace, you must first clear its data. Follow these steps:

  • remove workspace
ManageableRepository repo = repositoryService.getRepository(repository);
repo.removeWorkspace(workspace);

  • clean database, value storage, index;
  • restore (see snippet above)

4.2.2 System workspace

The BackupWorkspaceInitializer is available in JCR 1.9 and later.

Restoring the JCR System workspace requires shutting down the system and using a special initializer.

Follow these steps (this will also work for normal workspaces):

  • Stop the repository (or portal)
  • Clean the database, value storage and index
  • In the workspace configuration, set the BackupWorkspaceInitializer to reference your backup.
For example:
<workspaces>
  <workspace name="production" ... >
    <container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
      ...
    </container>
    <initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
      <properties>
        <property name="restore-path" value="D:\java\exo-working\backup\repository_production-20090527_030434"/>
      </properties>
    </initializer>
    ...
  </workspace>
</workspaces>

  • Start repository (or portal).

5 Scheduling (experimental)

The Backup service has an additional feature that can be useful for a production-level backup implementation. To organize the backup of a repository, you need a tool that can create and manage a cycle of Full and Incremental backups in a periodic manner.

The service has an internal BackupScheduler which can run a configurable cycle of BackupChains as if they had been executed by a user over some period of time. In other words, BackupScheduler is a user-like daemon which asks the BackupManager to start or stop backup operations.

For that purpose BackupScheduler has the method

BackupScheduler.schedule(backupConfig, startDate, stopDate, chainPeriod, incrementalPeriod)

where

  • backupConfig - a ready configuration which will be given to the BackupManager.startBackup() method
  • startDate - the date and time of the backup start
  • stopDate - the date and time of the backup stop
  • chainPeriod - the period, in seconds, after which the current BackupChain will be stopped and a new one will be started
  • incrementalPeriod - if it is greater than 0, it overrides the same value in backupConfig.

// getting the scheduler from the BackupManager
BackupScheduler scheduler = backup.getScheduler();

// schedule backup using a ready configuration (Full + Incrementals) to run from startTime
// to stopTime. A Full backup will be performed every 24 hours (BackupChain lifecycle),
// the incremental will rotate result files every 3 hours.
scheduler.schedule(config, startTime, stopTime, 3600 * 24, 3600 * 3);

// it's possible to run the scheduler for an open-ended period of time (i.e. without a stop time).
// schedule backup to run from startTime until it is stopped manually;
// here the incremental will rotate result files as configured in BackupConfig
scheduler.schedule(config, startTime, null, 3600 * 24, 0);

// to unschedule a backup, simply call the scheduler with the configuration describing the
// already planned backup cycle.
// the scheduler will search its internal task list for the task with the repository and
// workspace names from the configuration and will stop that task.
scheduler.unschedule(config);

When the BackupScheduler starts scheduling, it uses an internal Timer with startDate for the first (or only) execution. If chainPeriod is greater than 0, the task is repeated with this value as the period, starting from startDate. Otherwise the task is executed once, at startDate. If the scheduler has a stopDate, it will stop the task (the chain cycle) after stopDate. The last parameter, incrementalPeriod, is used instead of the corresponding BackupConfig value if it is greater than 0.
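
The timing rules in the paragraph above can be written down as a small calculation. The sketch below is not the BackupScheduler implementation: the ScheduleTimingSketch class and its methods are hypothetical, times are plain epoch milliseconds, and the handling of a start that falls exactly on stopDate is an assumption (it is treated as not planned).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the timing rules described above; not the BackupScheduler implementation.
class ScheduleTimingSketch {

  /**
   * Returns up to maxCount planned chain start times (epoch milliseconds).
   * stopMillis may be null, meaning "run until stopped manually".
   * Assumption: a start that would fall at or after stopMillis is not planned.
   */
  static List<Long> plannedStarts(long startMillis, Long stopMillis,
                                  long chainPeriodSeconds, int maxCount) {
    List<Long> starts = new ArrayList<>();
    long next = startMillis;
    while (starts.size() < maxCount && (stopMillis == null || next < stopMillis)) {
      starts.add(next);
      if (chainPeriodSeconds <= 0) {
        break; // no period: the task runs once, at startDate
      }
      next += chainPeriodSeconds * 1000L;
    }
    return starts;
  }

  /** incrementalPeriod overrides the BackupConfig value only when it is greater than 0. */
  static long effectiveIncrementalPeriod(long configuredSeconds, long incrementalPeriodSeconds) {
    return incrementalPeriodSeconds > 0 ? incrementalPeriodSeconds : configuredSeconds;
  }

  public static void main(String[] args) {
    long day = 3600L * 24;
    // Daily chains over a window of two and a half days, starting at time 0.
    System.out.println(plannedStarts(0L, day * 2500L, day, 10));
    // An incrementalPeriod of 0 keeps the value configured in BackupConfig.
    System.out.println(effectiveIncrementalPeriod(3600L, 0L));
  }
}
```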

For each task started (BackupScheduler.schedule(...)), the scheduler creates a task file in the service working directory (see Configuration, backup-dir) which describes the task's backup configuration and period values. These files are used at backup service start (JVM start) to reinitialize the BackupScheduler so that task scheduling continues. Only tasks that have no stopDate, or whose stopDate has not expired, will be reinitialized.
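
The reinitialization rule can be expressed as a one-line predicate. This is a sketch with a hypothetical class name; it assumes that "not expired" means the stopDate is strictly later than the time at which the service starts.

```java
import java.util.Date;

// Sketch of the rule only; not the BackupScheduler implementation.
class TaskReinitSketch {

  /** True when a stored task should be scheduled again at service start. */
  static boolean shouldReinitialize(Date stopDate, Date now) {
    // Assumption: a stopDate equal to "now" counts as expired.
    return stopDate == null || stopDate.after(now);
  }

  public static void main(String[] args) {
    Date now = new Date();
    Date inOneHour = new Date(now.getTime() + 3600_000L);
    Date oneHourAgo = new Date(now.getTime() - 3600_000L);
    System.out.println(shouldReinitialize(null, now));       // open-ended task
    System.out.println(shouldReinitialize(inOneHour, now));  // still within its window
    System.out.println(shouldReinitialize(oneHourAgo, now)); // window already closed
  }
}
```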

There is one caveat about BackupScheduler task reinitialization in the current implementation, which follows from the nature of the BackupScheduler. As the scheduler is just a virtual user which asks the BackupManager to start or stop backup operations, it isn't able to reinitialize a BackupChain that existed before the service (JVM) was stopped. It can, however, start a new operation via BackupManager with the same configuration (the one that was configured before and stored in a task file).

This is the main detail of the BackupScheduler that should currently be taken into account when designing backup operations. On reinitialization the task will have new time values for the backup operation cycle, as the chainPeriod and incrementalPeriod will be applied again. That behaviour may change in the future.

Created by Peter Nedonosko on 03/24/2008
Last modified by Sören Schmidt on 08/06/2009

Copyright (c) 2000-2010. All Rights Reserved - eXo platform SAS