Installation artifacts lifecycle
Delivering installation artifacts
- DGCLI downloads the installation artifacts from Public Update Servers.
- DGCLI places the fetched datasets into the installation artifacts storage. (The Docker images can be placed into the Docker Registry.)
  See more in the DGCLI utility description.
Note:
The installation artifacts storage requires regular maintenance to clear out outdated installation artifacts and prevent the storage space from overflowing.
DGCLI does not track or manage free space in the installation artifacts storage or the Docker Registry. It is recommended to set up monitoring for these parts of the infrastructure and perform regular maintenance.
- All artifacts then migrate from the public network to the private network, so that they become available to Helm and the On-Premise services.
The migration process can be implemented in different ways depending on the specifics of the project.
Example:
To implement the migration of installation artifacts from the public network to the private network, you can install a Docker Registry and an S3-compatible storage in the private network, and then configure synchronization between them and the corresponding entities in the public network.
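For illustration only, the sketch below shows one possible way to run such synchronization, assuming skopeo is used to copy Docker images between registries and rclone is configured with remotes for the two S3-compatible storages. The registry addresses, bucket name, and remote names are hypothetical.

# Illustrative sketch; all addresses and names below are placeholders.
# Copy Docker images from the public registry to the registry in the private network.
skopeo sync --src docker --dest docker \
    registry.public.example.com/on-premise \
    registry.private.example.com/on-premise

# Mirror installation artifacts between the S3-compatible storages.
# The rclone remotes "public-s3" and "private-s3" must be configured beforehand.
rclone sync public-s3:artifacts private-s3:artifacts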
Importing data during installation or updating of a service
Installation of a service includes copying the required datasets from the installation artifacts storage (see the previous section) into one or more storages that the service will use, e.g., into a PostgreSQL database. Often, a dedicated Kubernetes Importer job exists for this purpose, providing the following lifecycle for a dataset:
- The job reads a manifest file from the installation artifacts storage. This file contains a list of objects stored in the installation artifacts storage and their latest versions.
- The job uses the manifest to determine whether there is new data for the service. If there is no new data, the job stops.
- The job spawns several workers. Each worker fetches the necessary installation artifacts and imports the new data into the service's data storage as a separate copy.
- After the workers complete the data import, the job performs a series of health checks to ensure the integrity of the new data.
  If all checks complete successfully, the job removes the original data, replacing it with the new data.
  If one or more checks fail, the job stops the update process and requires actions from the system administrator. The original data is left intact.
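To observe the Importer job while it runs, standard kubectl commands can be used. The following is a sketch only: the namespace and job name are hypothetical and depend on the service's Helm chart.

# Hypothetical names: replace "2gis" and "search-api-importer" with the actual
# namespace and Importer job name from your deployment.
kubectl get jobs --namespace 2gis
kubectl logs job/search-api-importer --namespace 2gis --follow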
Common scenarios for updating services and datasets
The first step of updating any service is fetching installation artifacts using the DGCLI pull mode. In this mode, a manifest file is created that contains information about the service, including its version. Every time the DGCLI pull mode is run, a new manifest file is created; the older files are not changed.
Next, different scenarios for updating services and datasets are possible:
Updating a service without updating its data
Helm updates the service in a similar way to how the Kubernetes job updates the data (see the previous section): new instances of the service are deployed in addition to the current ones, and if the health checks complete successfully, traffic is redirected to the new instances. Otherwise, the process stops, requiring actions from the system administrator.
To update a service, specify the required version in the --version flag when running the helm upgrade command, for example:
helm upgrade --version=VERSION --atomic --values ./values-search.yaml search-api 2gis-on-premise/search-api
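Because the command uses the --atomic flag, Helm automatically rolls the release back if the upgrade fails. To inspect the outcome, you can use standard Helm commands (the release name search-api matches the example above):

# Show the current state and the revision history of the release.
helm status search-api
helm history search-api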
Updating a service and its data with Helm
This scenario is not supported by certain services.
This scenario includes two stages:
- Helm launches the service's Kubernetes Importer job to update the data. To make the job recognize that new data is available, specify a new manifest in the dgctlStorage.manifest parameter of the values.yaml service configuration file.
- Helm updates the service to the version specified in the --version flag (see the previous scenario).
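The documentation above describes setting the manifest in values.yaml; as an equivalent illustration, Helm's standard --set flag can override the same parameter on the command line. This is a sketch with placeholder values (NEW_VERSION and the manifest name):

# Placeholders: NEW_VERSION and {manifest-name}.json. Setting dgctlStorage.manifest
# via --set is equivalent to editing it in the values file.
helm upgrade --version=NEW_VERSION --atomic \
    --set dgctlStorage.manifest={manifest-name}.json \
    --values ./values-search.yaml \
    search-api 2gis-on-premise/search-api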
Updating a service's data only
This scenario is not supported by certain services.
The corresponding Kubernetes Importer job is scheduled to run, for example, on a daily basis.
To avoid updating the service version, specify its current version in the --version flag when running the helm upgrade command. To make the job recognize that new data is available, specify a new manifest in the dgctlStorage.manifest parameter of the values.yaml service configuration file.
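As a sketch with placeholder values, the currently deployed chart version can be looked up with helm list and then reused, so that only the Importer job processes the new manifest:

# The CHART column of the output shows the currently deployed chart version.
helm list
# Placeholders: CURRENT_VERSION is the version already deployed; the manifest name is new.
helm upgrade --version=CURRENT_VERSION --atomic \
    --set dgctlStorage.manifest={manifest-name}.json \
    --values ./values-search.yaml \
    search-api 2gis-on-premise/search-api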
Important:
Some services may not support updating datasets, or the updating process may differ from the one described above.
For a specific service's updating process description, see its documentation in the Updating services section.
Cleaning old data
To free up space in the storage, it is recommended to clean out irrelevant data regularly. You also need to keep the manifests for the current and some previous versions of the data so that you can revert to them in case of problems. Depending on your data update strategy, two cleaning scenarios are available.
If all data is updated at the same time
If you update data for all services at the same time, execute the following command after each run of dgctl pull:
dgctl manifest cleanup --keep-count N
Here, N is the number of most recent manifests to keep. The command above deletes all manifests and their data, except for the N+2 most recent ones. For example, if you update data daily, the manifests for the N+2 most recent days are kept.
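For example, with daily updates:

# Keeps the manifests (and their data) for the 5 + 2 = 7 most recent days.
dgctl manifest cleanup --keep-count 5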
If data is updated separately
If the data of different services is updated with different frequency (for example, daily and monthly), the previous cleaning scenario is not recommended, because the manifests for monthly updates must be kept longer than the ones for daily updates. You need to prevent the erroneous deletion of a manifest for the current version of the data that is updated less frequently; otherwise, this data can be lost in the future.
Do the cleanup in the following stages:
- For the first cleanup:
  - Get a list of all manifests in the storage:
    dgctl manifest list
  - Remove from the list the manifests that your environment is currently deployed from.
  - Delete all other manifests:
    dgctl manifest delete --manifest-name {manifest-name}.json
- Create two files with lists of manifests: one for the daily data fetch and one for the monthly data fetch.
- For the next cleanups (see the sketch after this list):
  - After each run of the dgctl pull command, write the name of the received manifest to the end of the corresponding file.
  - Decide how many manifests you want to keep to enable reverting to the previous version.
  - Delete all other manifests:
    dgctl manifest delete --manifest-name {manifest-name}.json
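The following bash sketch illustrates the recurring part of this procedure for one of the files. It is an assumption-laden illustration, not part of the product: the tracking file name, the KEEP value, and the way the new manifest name is obtained after dgctl pull (here, the NEW_MANIFEST variable) all depend on your environment.

#!/usr/bin/env bash
# Illustrative sketch only; assumes the name of the manifest received by the
# latest dgctl pull run is available in the NEW_MANIFEST environment variable.
set -euo pipefail

TRACK_FILE="daily-manifests.txt"   # hypothetical tracking file for the daily fetch
KEEP=3                             # hypothetical number of manifests to keep

# Record the manifest received by the latest dgctl pull run.
echo "${NEW_MANIFEST}" >> "${TRACK_FILE}"

# Delete every manifest listed in the file except the KEEP most recent ones
# (head -n -K requires GNU coreutils).
head -n -"${KEEP}" "${TRACK_FILE}" | while read -r name; do
    dgctl manifest delete --manifest-name "${name}"
done

# Trim the tracking file so it only lists the manifests that were kept.
tail -n "${KEEP}" "${TRACK_FILE}" > "${TRACK_FILE}.tmp"
mv "${TRACK_FILE}.tmp" "${TRACK_FILE}"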