dp3.database.schema_cleaner
SchemaCleaner
SchemaCleaner(db: Database, schema_col: Collection, model_spec: ModelSpec, config: HierarchicalDict, log: Logger = None)
Maintains a history of the schema defined by model_spec and updates the database when needed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db | Database | Target database mapping. (not just a database connection) | required |
schema_col | Collection | Collection where schema history is stored. | required |
model_spec | ModelSpec | Model specification. | required |
log | Logger | Logger instance to serve as parent. If not provided, a new logger will be created. | None |
Source code in dp3/database/schema_cleaner.py
get_current_schema_doc
Parameters:
Name | Type | Description | Default |
---|---|---|---|
infer | bool | Whether to infer the schema if it is not found in the database. | False |
Returns: Current schema
safe_update_schema
Checks whether schema saved in database is up-to-date and updates it if necessary.
Will NOT perform any changes in master collections on conflicting model changes, but will raise an exception instead. Any such changes must be performed via the CLI.
As this method modifies the schema collection, it should be called only by the main worker.
The schema collection format is as follows:
{
    "_id": int,
    "schema": { ... },
    "storage": {
        "snapshot_bucket_size": int
    },
    "version": int
}
Raises:
Type | Description |
---|---|
ValueError | If conflicting changes are detected. |
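The check-then-update behaviour described above can be illustrated with a minimal sketch. This is not DP3's implementation: the helper name `safe_update_schema_doc`, the definition of a "conflict" (anything removed or redefined), and the choice to increment `_id` for each history entry are all assumptions made for illustration.

```python
from copy import deepcopy


def safe_update_schema_doc(stored_doc: dict, configured_schema: dict, storage: dict) -> dict:
    """Return the document to store, or raise ValueError on a conflicting change.

    Hypothetical helper: a "conflict" here is any entity or attribute that
    exists in the stored schema but is missing from, or defined differently
    in, the configured one.
    """
    stored_schema = stored_doc["schema"]
    for entity, attrs in stored_schema.items():
        if entity not in configured_schema:
            raise ValueError(f"entity {entity!r} was removed from the configuration")
        for attr, spec in attrs.items():
            if configured_schema[entity].get(attr) != spec:
                raise ValueError(f"attribute {entity}.{attr} changed or was removed")

    if stored_schema == configured_schema and stored_doc["storage"] == storage:
        return stored_doc  # already up to date, nothing to write

    # Only additions (or storage changes) remain, so a new history entry is safe.
    return {
        "_id": stored_doc["_id"] + 1,  # assumed: one document per schema revision
        "schema": deepcopy(configured_schema),
        "storage": dict(storage),
        "version": stored_doc["version"],
    }
```

In this sketch, purely additive changes produce a new document, while removals or redefinitions raise `ValueError` and leave the decision to an operator, mirroring the rule that conflicting changes go through the CLI.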
get_schema_status
Gets the current schema status.
database_schema is the schema document from the database.
configuration_schema is the schema document constructed from the current configuration.
updates is a dictionary of required updates to each entity.
Returns:
Type | Description |
---|---|
tuple[dict, dict, dict, list] | Tuple of ( |
detect_changes
detect_changes(db_schema_doc: dict, current_schema: dict) -> tuple[dict[str, dict[str, dict[str, str]]], list[str]]
Detects changes between configured schema and the one saved in the database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
db_schema_doc | dict | Schema document from the database. | required |
current_schema | dict | Schema from the configuration. | required |
Returns:
Type | Description |
---|---|
tuple[dict[str, dict[str, dict[str, str]]], list[str]] | Tuple of required updates to each entity and list of deleted entities. |
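As a rough illustration of the comparison, the sketch below diffs two schema mappings and returns a value with the same shape as the documented return type. The function name `detect_schema_changes`, the "removed"/"changed" keys, and the note strings are hypothetical; the actual contents of the updates dictionary are not specified here. Unlike the real method, the sketch takes the stored schema mapping directly rather than the whole schema document.

```python
def detect_schema_changes(db_schema: dict, current_schema: dict) -> tuple[dict, list]:
    """Compare a stored schema with the configured one (illustrative only)."""
    updates: dict[str, dict[str, dict[str, str]]] = {}
    deleted_entities: list[str] = []

    for entity, db_attrs in db_schema.items():
        if entity not in current_schema:
            # The whole entity type disappeared from the configuration.
            deleted_entities.append(entity)
            continue
        current_attrs = current_schema[entity]
        for attr, db_spec in db_attrs.items():
            if attr not in current_attrs:
                kind, note = "removed", "attribute no longer configured"
            elif current_attrs[attr] != db_spec:
                kind, note = "changed", "attribute definition differs"
            else:
                continue  # identical definition, nothing to do
            updates.setdefault(entity, {}).setdefault(kind, {})[attr] = note

    return updates, deleted_entities
```

Newly added entities and attributes produce no entry in this sketch, since they require no change to data already stored.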
infer_current_schema
Infers schema from current database state.
Will detect the attribute type, i.e. whether it is a plain, timeseries or observations attribute.
All other properties will be set based on configuration.
Returns:
Type | Description |
---|---|
dict | Dictionary with inferred schema. |
construct_schema_doc
Constructs schema document from current schema configuration.
The schema document format is as follows:
{
    "entity_type": {
        "attribute": {
            "t": 1 | 2 | 4,
            "data_type": <data_type.type_info> | None,
            "timeseries_type": <timeseries_type> | None,
            "series": dict[str, <data_type.type_info>] | None
        }
    }
}
where t is the attribute type and data_type is the data type string.
timeseries_type is one of regular, irregular or irregular_intervals.
series is a dictionary of series names and their data types.
timeseries_type and series are present only for timeseries attributes; data_type is present only for plain and observations attributes.
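A concrete document in this format might look like the one built below. The mapping of the t values (1 for plain, 2 for observations, 4 for timeseries) is an assumption, as are the helper `attribute_entry`, the entity and attribute names, and the data type strings "string" and "int".

```python
from typing import Optional

# Assumed meaning of the "t" values; verify against DP3's attribute type definitions.
PLAIN, OBSERVATIONS, TIMESERIES = 1, 2, 4
TIMESERIES_TYPES = {"regular", "irregular", "irregular_intervals"}


def attribute_entry(
    t: int,
    data_type: Optional[str] = None,
    timeseries_type: Optional[str] = None,
    series: Optional[dict] = None,
) -> dict:
    """Build one attribute entry in the documented shape (hypothetical helper)."""
    if t == TIMESERIES:
        if timeseries_type not in TIMESERIES_TYPES or not series:
            raise ValueError("timeseries attributes need a valid timeseries_type and series")
        return {"t": t, "data_type": None, "timeseries_type": timeseries_type, "series": dict(series)}
    if data_type is None:
        raise ValueError("plain and observations attributes need a data_type")
    return {"t": t, "data_type": data_type, "timeseries_type": None, "series": None}


# Example document for a made-up "ip" entity type.
schema_doc = {
    "ip": {
        "hostname": attribute_entry(PLAIN, data_type="string"),
        "open_ports": attribute_entry(OBSERVATIONS, data_type="int"),
        "traffic": attribute_entry(TIMESERIES, timeseries_type="regular", series={"bytes": "int"}),
    }
}
```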
await_updated
Checks whether schema saved in database is up-to-date and awaits its update by the main worker on mismatch.
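Waiting for another process to bring the stored schema up to date is typically a polling loop. The sketch below shows one way to write it; the function name `await_schema_update`, the `fetch_stored_schema` callable, the polling interval, and the timeout are all assumptions, and the actual method may wait differently or indefinitely.

```python
import time
from typing import Callable


def await_schema_update(
    fetch_stored_schema: Callable[[], dict],
    configured_schema: dict,
    poll_interval: float = 1.0,
    timeout: float = 30.0,
) -> None:
    """Block until the stored schema equals the configured one (illustrative)."""
    deadline = time.monotonic() + timeout
    while fetch_stored_schema() != configured_schema:
        if time.monotonic() >= deadline:
            raise TimeoutError("schema was not updated in time")
        # Only the main worker writes the schema; other workers just wait.
        time.sleep(poll_interval)
```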
get_index_name_by_keys
Gets index name by keys.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection | str | Collection name. | required |
keys | dict | Index keys. | required |
Returns: Index name.
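One way to look up an index by its keys is to scan the mapping returned by PyMongo's `Collection.index_information()`, which maps each index name to a dictionary whose "key" entry lists (field, direction) pairs. The sketch below works on such a mapping directly so that it runs without a database; the function name `find_index_name` is hypothetical, and the real method takes a collection name instead.

```python
from typing import Optional


def find_index_name(index_info: dict, keys: dict) -> Optional[str]:
    """Return the name of the index whose key specification matches `keys`.

    `index_info` is expected in the shape produced by PyMongo's
    Collection.index_information(). Key order matters for compound indexes.
    """
    wanted = list(keys.items())
    for name, spec in index_info.items():
        if [tuple(pair) for pair in spec["key"]] == wanted:
            return name
    return None
```

With a live collection, the same function could be called as `find_index_name(collection.index_information(), {"eid": 1})`, where `collection` and the field name are placeholders.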
update_storage
Updates storage settings in schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prev_storage | dict | Previous storage settings. | required |
curr_storage | dict | Current storage settings. | required |
migrate
Migrates schema to the latest version.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema_doc | dict | Schema to migrate. | required |
Returns: Migrated schema.
migrate_schema_2_to_3
Migrates schema from version 2 to version 3.
This migration adds the storage setting for snapshot bucket size to the schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema | | Schema to migrate. | required |
Returns: Migrated schema.
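Version-by-version migration can be sketched as a chain of small steps, each lifting the document by one version. The code below is an illustration under assumptions, not DP3's implementation: the function names, the `snapshot_bucket_size` argument (the value actually written by the migration is not stated here), and the dispatch table are hypothetical.

```python
from copy import deepcopy


def migrate_2_to_3(schema_doc: dict, snapshot_bucket_size: int) -> dict:
    """Add the storage section introduced in version 3 (illustrative)."""
    migrated = deepcopy(schema_doc)
    migrated["storage"] = {"snapshot_bucket_size": snapshot_bucket_size}
    migrated["version"] = 3
    return migrated


def migrate_to_latest(schema_doc: dict, snapshot_bucket_size: int, latest: int = 3) -> dict:
    """Apply single-version migrations until the document reaches `latest`."""
    migrations = {2: lambda doc: migrate_2_to_3(doc, snapshot_bucket_size)}
    doc = schema_doc
    while doc["version"] < latest:
        step = migrations.get(doc["version"])
        if step is None:
            raise ValueError(f"no migration from version {doc['version']}")
        doc = step(doc)
    return doc
```

Keeping each step separate means a later version only needs one new entry in the dispatch table.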