eXo JCR Backup Service
1 Concept
The main purpose of this feature is to restore data after system faults and repository crashes. The backup results may also be used as a content history.
The eXo JCR backup service was developed from the JCR 1.8 implementation. It's an independent service available as an eXo JCR Extensions project.
The concept is based on exporting a workspace unit in either the Full or the Full + Incrementals model.
A repository workspace can be backed up and restored using a combination of these modes. In all cases, at least one Full (initial) backup must be executed to mark the starting point of the backup history.
An Incremental backup is not a complete image of the workspace. It contains only the changes made during some period, so it is not possible to perform an Incremental backup without an initial Full backup.
The Backup service may operate as a hot-backup process at runtime on an in-use workspace. In this case the Full + Incrementals model should be used to guarantee data consistency during restoration. The Incremental backup is run from the starting point of the Full backup, so it also contains the changes that occurred while the Full backup was running.
A restore operation is the mirror of a backup one. At least one Full backup must be restored to obtain a workspace corresponding to some point in time. Incrementals may then be restored in order of creation to reach the required state of the content. If an Incremental contains the same data as the Full backup (hot-backup), the changes will be applied again as if they were made in the normal way via API calls.
According to the model there are several modes for backup logic:
- Full backup only: a single operation, runs once.
- Full + Incrementals: starts with an initial Full backup and then keeps incremental changes in one file. Runs until it is stopped.
- Full + Incrementals (periodic): starts with an initial Full backup and then keeps incremental changes with periodic rotation of the result file. Runs until it is stopped.
2 How it works
2.1 Implementation details
Full backup/restore is implemented using the JCR SysView Export/Import.
Workspace data is exported as SysView XML, starting from the root node.
Restore is implemented using a special eXo JCR API feature: dynamic workspace creation. Restoring a workspace Full backup creates a new workspace in the repository, and the SysView XML data is then imported as its root node.
Incremental backup is implemented using the eXo JCR ChangesLog API. This API records each JCR API call as an atomic entry in a changes log. The Incremental backup uses a listener that collects these logs and stores them in a file.
Restoring an incremental backup consists of applying the collected set of ChangesLogs to a workspace in the correct order.
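The replay order described above can be illustrated with a small, self-contained model. This is only a sketch under simplifying assumptions: the workspace is reduced to a map of paths to values, and the Change type and restore method are hypothetical names invented for this illustration. They are not the eXo JCR ChangesLog API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model for illustration only; these types are hypothetical
// and are NOT the eXo JCR ChangesLog API.
class IncrementalRestoreSketch {

    /** One recorded change: set a value at a path, or remove the path when value is null. */
    static final class Change {
        final String path;
        final String value;

        Change(String path, String value) {
            this.path = path;
            this.value = value;
        }
    }

    /**
     * Rebuilds a workspace state: start from a copy of the Full image,
     * then replay every incremental log in creation order.
     */
    static Map<String, String> restore(Map<String, String> fullImage,
                                       List<List<Change>> incrementalLogs) {
        Map<String, String> workspace = new LinkedHashMap<String, String>(fullImage);
        for (List<Change> log : incrementalLogs) {
            for (Change change : log) {
                if (change.value == null) {
                    workspace.remove(change.path);
                } else {
                    workspace.put(change.path, change.value);
                }
            }
        }
        return workspace;
    }

    public static void main(String[] args) {
        Map<String, String> full = new LinkedHashMap<String, String>();
        full.put("/a", "1");
        full.put("/b", "2");

        // The first log repeats "/a", as can happen when a change is captured
        // both by a hot Full backup and by the Incremental started with it.
        List<Change> log1 = Arrays.asList(new Change("/a", "1"), new Change("/c", "3"));
        List<Change> log2 = Arrays.asList(new Change("/b", null));

        List<List<Change>> logs = new ArrayList<List<Change>>();
        logs.add(log1);
        logs.add(log2);

        System.out.println(restore(full, logs));
    }
}
```

In this model, re-applying a change that the Full image already contains leaves the state unchanged, which mirrors the hot-backup case where an Incremental overlaps the Full backup.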
2.2 Work basics
The work of the Backup service is based on the BackupConfig configuration and the BackupChain logical unit.
BackupConfig describes the backup operation chain that will be performed by the service. The configuration must be prepared before the backup is started.
The configuration contains such values as:
- types of full and incremental backup (fullBackupType, incrementalBackupType): Strings with the fully qualified names of the classes that implement each type.
- incremental period: the period after which the current backup is stopped and a new one is started, in seconds (long).
- target repository and workspace names: Strings with the corresponding names.
- destination directory for result files: String with the path to a folder where operation result files will be stored.
BackupChain is the unit that performs the backup process. It covers the execution of the initial Full backup and manages the Incremental operations.
BackupChain is used as a key object for accessing current backups at runtime via BackupManager. Each BackupJob performs a single atomic operation: a Full or an Incremental process. The result of that operation is data for a Restore.
A BackupChain can contain one or more BackupJobs, but the initial Full job is always there. Each BackupJob has its own unique number, which gives its order in the chain; the initial Full job always has the number 0.
Backup process, result data and file location
To start the backup process, create a BackupConfig and call the BackupManager.startBackup(BackupConfig) method. This method returns a BackupChain created according to the configuration. At the same time the chain creates a BackupChainLog, which persists the BackupConfig content and the BackupChain operation states to a file in the service working directory (see Configuration).
When the chain starts work and the initial BackupJob starts, the job creates a result data file using the destination directory path from BackupConfig. The destination directory will contain a directory whose name is created automatically using the pattern repository_workspace-timestamp, where timestamp is the current time in the format yyyyMMdd_hhmmss (e.g. db1_ws1-20080306_055404). That directory will contain the results of all Jobs configured for execution. Each Job stores its backup result in its own file named repository_workspace-timestamp.jobNumber.
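The naming scheme above can be reproduced with a few lines of plain Java, which may help when locating result files by hand. This is a minimal sketch: the class and method names are hypothetical and not part of the backup service, and the date pattern is copied literally from the text (so "hh" is treated as a 12-hour field, which is an assumption about how the service formats it).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative helper; the class and method names are hypothetical and
// not part of the backup service API.
class BackupResultNames {

    /** Builds the chain directory name: repository_workspace-timestamp. */
    static String chainDirName(String repository, String workspace, Date time, TimeZone zone) {
        // Pattern copied from the text; note that "hh" is a 12-hour clock field.
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd_hhmmss");
        format.setTimeZone(zone);
        return repository + "_" + workspace + "-" + format.format(time);
    }

    /** Builds a job result file name: repository_workspace-timestamp.jobNumber. */
    static String jobFileName(String chainDirName, int jobNumber) {
        return chainDirName + "." + jobNumber;
    }

    public static void main(String[] args) {
        String dir = chainDirName("db1", "ws1", new Date(), TimeZone.getDefault());
        System.out.println(dir);
        System.out.println(jobFileName(dir, 0)); // job 0 is the initial Full backup
    }
}
```

Since the initial Full job always has the number 0, its result file is the one ending in ".0"; Incremental jobs follow with higher numbers.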
BackupChain saves each state (STARTING, WAITING, WORKING, FINISHED) of its Jobs in the BackupChainLog, which also holds the full path of the current result file.
The BackupChain log file and the job result files form a single consistent unit, which is the source for a Restore.
The BackupChain log contains absolute paths to the job result files. Don't move these files to another location.
Restore requirements
As mentioned before, a Restore operation is the mirror of a Backup. The process is a Full restore of the root node, followed by restoring additional Incremental backups to reach the desired workspace state. Restoring the workspace Full backup will create a new workspace in the repository using the given RepositoryEntry of an existing repository and the given (preconfigured) WorkspaceEntry for the new target workspace. The Restore process will restore the root node there from the SysView XML data.
The target workspace must not already exist in the repository. Otherwise a BackupConfigurationException will be thrown.
For creating and manipulating Workspaces, see the article Repository and Workspace management.
Finally, a Restore is the process of creating a new Workspace and filling it with Backup content. If a Workspace with the target name already exists in the Repository, you have to configure a new name for the restored one. If no such workspace exists in the Repository, you may use the same name as in the Backup.
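The naming rule above can be checked before a Restore is attempted. The following is a minimal sketch of such a pre-check: the class and method are hypothetical helpers written for this illustration, and the collection of existing workspace names is passed in as plain strings (how it is obtained from the repository is outside the sketch).

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical pre-check helper; not part of the backup service API.
class RestoreTargetName {

    /**
     * Returns the name to use for the restored workspace.
     * The backup's own name is reused when it is free; otherwise the caller
     * must supply an alternative name that is not taken either.
     */
    static String choose(Collection<String> existingWorkspaces,
                         String backupWorkspaceName,
                         String alternativeName) {
        if (!existingWorkspaces.contains(backupWorkspaceName)) {
            return backupWorkspaceName;
        }
        if (alternativeName == null || existingWorkspaces.contains(alternativeName)) {
            throw new IllegalArgumentException(
                "Workspace '" + backupWorkspaceName + "' already exists; a new, unused name is required");
        }
        return alternativeName;
    }

    public static void main(String[] args) {
        Collection<String> existing = Arrays.asList("system", "ws1");
        System.out.println(choose(existing, "ws2", null));           // name is free, reuse it
        System.out.println(choose(existing, "ws1", "ws1_restored")); // name is taken, use the new one
    }
}
```

Failing early in this way avoids starting a Restore that the service would reject with a BackupConfigurationException.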
3 Configuration
As an optional extension, the Backup service is not enabled by default.
You need to enable it via configuration.
Below is an example configuration compatible with JCR 1.9.3 and later:
<component>
<key>org.exoplatform.services.jcr.ext.backup.BackupManager</key>
<type>org.exoplatform.services.jcr.ext.backup.impl.BackupManagerImpl</type>
<init-params>
<properties-param>
<name>backup-properties</name>
<property name="default-incremental-job-period" value="3600" /> <!-- set default incremental period = 60 minutes -->
<property name="full-backup-type" value="org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob" />
<property name="incremental-backup-type" value="org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob" />
<property name="backup-dir" value="target/backup" />
</properties-param>
</init-params>
</component>
Where:
- incremental-backup-type (since 1.9.3): the FQN of the incremental backup job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob
- full-backup-type (since 1.9.3): the FQN of the full backup job class. Must implement org.exoplatform.services.jcr.ext.backup.BackupJob
- default-incremental-job-period (since 1.9.3): the period between incremental flushes (in seconds)
- backup-dir: the path to a working directory where the service will store internal files and chain logs.
4 Usage
In the following example we create a BackupConfig bean for the Full + Incrementals mode, then ask the BackupManager to start the backup process.
// Obtaining the backup service from the eXo container.
BackupManager backup = (BackupManager) container.getComponentInstanceOfType(BackupManager.class);
// And prepare the BackupConfig instance with custom parameters.
// full backup & incremental
File backDir = new File("/backup/ws1"); // the destination path for result files
backDir.mkdirs();
BackupConfig config = new BackupConfig();
config.setRepository(repository.getName());
config.setWorkspace("ws1");
config.setBackupDir(backDir);
// Before 1.9.3, you also need to indicate the backup job class FQNs
// config.setFullBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.FullBackupJob");
// config.setIncrementalBackupType("org.exoplatform.services.jcr.ext.backup.impl.fs.IncrementalBackupJob");
// start backup using the service manager
BackupChain chain = backup.startBackup(config);
To stop the backup operation, use the BackupChain instance.
// stop backup
backup.stopBackup(chain);
Restoration involves reloading the backup log file into a BackupChainLog and applying the appropriate workspace initialization.
The following snippet shows the typical sequence for restoring a workspace:
// find the BackupChain using the repository and workspace names (returns null if not found)
BackupChain chain = backup.findBackup("db1", "ws1");
// get the RepositoryEntry and WorkspaceEntry
ManageableRepository repo = repositoryService.getRepository(repository);
RepositoryEntry repoconf = repo.getConfiguration();
List<WorkspaceEntry> entries = repoconf.getWorkspaceEntries();
WorkspaceEntry workspaceEntry = getNewEntry(entries, workspace); // create a copy of an existing entry
// load the backup log from the chain log file
File backLog = new File(chain.getLogFilePath());
BackupChainLog bchLog = new BackupChainLog(backLog);
// initialize the workspace
repo.configWorkspace(workspaceEntry);
// run the restoration using the prepared RepositoryEntry and WorkspaceEntry
backup.restore(bchLog, repoconf, workspaceEntry);
4.2.1 Restoring into an existing workspace
These instructions apply only to regular workspaces. Special instructions for the System workspace are given below.
To restore a backup over an existing workspace, you must first clear its data. The procedure should follow these steps:
- remove the workspace:
ManageableRepository repo = repositoryService.getRepository(repository);
repo.removeWorkspace(workspace);
- clean the database, value storage and index;
- restore (see the snippet above).
4.2.2 System workspace
The BackupWorkspaceInitializer is available in JCR 1.9 and later.
Restoring the JCR System workspace requires shutting down the system and using a special initializer.
Follow these steps (this will also work for normal workspaces):
- Stop the repository (or portal).
- Clean the database, value storage and index.
- In the workspace configuration, set BackupWorkspaceInitializer to reference your backup.
For example:
<workspaces>
<workspace name="production" ... >
<container class="org.exoplatform.services.jcr.impl.storage.jdbc.JDBCWorkspaceDataContainer">
...
</container>
<initializer class="org.exoplatform.services.jcr.impl.core.BackupWorkspaceInitializer">
<properties>
<property name="restore-path" value="D:\java\exo-working\backup\repository_production-20090527_030434"/>
</properties>
</initializer>
...
</workspace>
- Start the repository (or portal).
5 Scheduling (experimental)
The Backup service has an additional feature that can be useful for a production-level backup implementation. When you need to organize the backup of a repository, you need a tool that can create and manage a cycle of Full and Incremental backups in a periodic manner.
The service has an internal BackupScheduler which can run a configurable cycle of BackupChains as if they had been executed by a user over some period of time. In other words, BackupScheduler is a user-like daemon which asks the BackupManager to start or stop backup operations.
For that purpose BackupScheduler has the method
BackupScheduler.schedule(backupConfig, startDate, stopDate, chainPeriod, incrementalPeriod)
where
- backupConfig: a ready configuration which will be given to the BackupManager.startBackup() method
- startDate: the date and time of the backup start
- stopDate: the date and time of the backup stop
- chainPeriod: the period after which the current BackupChain will be stopped and a new one will be started, in seconds
- incrementalPeriod: if it is greater than 0, it will be used to override the same value in backupConfig.
// getting the scheduler from the BackupManager
BackupScheduler scheduler = backup.getScheduler();
// schedule backup using a ready configuration (Full + Incrementals) to run from startTime
// to stopTime. Full backup will be performed every 24 hours (BackupChain lifecycle),
// incremental will rotate result files every 3 hours.
scheduler.schedule(config, startTime, stopTime, 3600 * 24, 3600 * 3);
// it's possible to run the scheduler for an unlimited period of time (i.e. without a stop time).
// schedule backup to run from startTime until it is stopped manually;
// here the incremental will rotate result files as configured in BackupConfig
scheduler.schedule(config, startTime, null, 3600 * 24, 0);
// to unschedule backup simply call the scheduler with the configuration describing the
// already planned backup cycle.
// the scheduler will search in internal tasks list for task with repository and
// workspace name from the configuration and will stop that task.
scheduler.unschedule(config);
When the BackupScheduler starts the scheduling, it uses an internal Timer with startDate for the first (or only) execution. If chainPeriod is greater than 0, the task is repeated with this value as the period, starting from startDate. Otherwise the task is executed once at startDate. If the scheduler has a stopDate, it will stop the task (the chain cycle) after stopDate. The last parameter, incrementalPeriod, is used instead of the corresponding value from BackupConfig if it is greater than 0.
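The timing rules above can be summarised in a short, self-contained model. It is only a sketch and does not use the BackupScheduler or Timer classes: times are plain epoch seconds, the class and method names are hypothetical, and the choice to treat a start time equal to stopDate as already past the stop is an assumption of this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the scheduling rules; not the BackupScheduler implementation.
class ScheduleSketch {

    /**
     * Lists the times (epoch seconds) at which a new chain would start.
     * stopTime may be null for an open-ended schedule, so maxRuns caps the result.
     */
    static List<Long> chainStarts(long startTime, Long stopTime, long chainPeriod, int maxRuns) {
        List<Long> starts = new ArrayList<Long>();
        long next = startTime;
        while (starts.size() < maxRuns && (stopTime == null || next < stopTime)) {
            starts.add(next);
            if (chainPeriod <= 0) {
                break; // no period: the task runs once at startTime
            }
            next += chainPeriod;
        }
        return starts;
    }

    /** The scheduler argument wins only when it is greater than 0. */
    static long effectiveIncrementalPeriod(long configuredPeriod, long incrementalPeriod) {
        return incrementalPeriod > 0 ? incrementalPeriod : configuredPeriod;
    }

    public static void main(String[] args) {
        long day = 3600L * 24;
        // Three days of daily chains, starting at time 0.
        System.out.println(chainStarts(0L, 3 * day, day, 100));
        // Incremental period of 0 falls back to the value from the configuration.
        System.out.println(effectiveIncrementalPeriod(3600L, 0L));
    }
}
```

The maxRuns cap exists only so that the model terminates when no stop time is given; a real timer would simply keep firing until the task is unscheduled.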
When starting each task (BackupScheduler.schedule(...)), the scheduler creates a task file in the service working directory (see Configuration, backup-dir) which describes the task's backup configuration and periodic values.
These files are used at backup service start (JVM start) to reinitialize the BackupScheduler for continuous task scheduling. Only tasks that have no stopDate, or whose stopDate has not expired, are reinitialized.
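The reinitialization filter described above amounts to a single condition. As a minimal sketch (the class and method names are hypothetical, and the task file format is deliberately left out):

```java
import java.util.Date;

// Hypothetical illustration of the reinitialization rule; not the scheduler's code.
class TaskReinitSketch {

    /** A stored task is picked up again if it has no stopDate or its stopDate is still in the future. */
    static boolean shouldReinitialize(Date stopDate, Date now) {
        return stopDate == null || stopDate.after(now);
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date past = new Date(now.getTime() - 1000L);
        System.out.println(shouldReinitialize(null, now)); // open-ended task
        System.out.println(shouldReinitialize(past, now)); // expired task
    }
}
```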
There is one caveat about BackupScheduler task reinitialization in the current implementation. It comes from the nature of the BackupScheduler and its implemented behaviour. As the scheduler is just a virtual user which asks the BackupManager to start or stop backup operations, it isn't able to reinitialize the BackupChains that existed before the service (JVM) was stopped. It is, however, possible to start a new operation via BackupManager with the same configuration (the one that was configured before and stored in a task file).
This is the main detail of the BackupScheduler that should be taken into account when designing a backup operation. On reinitialization the task will have new time values for the backup operation cycle, as the chainPeriod and incrementalPeriod will be applied again. That behaviour may change in the future.