This is a wrapper function for applying normalise, autoref, and decompose. It takes a data frame and converts it straight into a database, which is the main intended use case for the package.

Usage

autodb(
  df,
  digits = getOption("digits"),
  single_ref = FALSE,
  ensure_lossless = TRUE,
  remove_avoidable = FALSE,
  constants_name = "constants",
  progress = FALSE,
  progress_file = "",
  ...
)

Arguments

df

a data.frame, containing the data to be normalised.

digits

a positive integer, indicating how many significant digits are to be used for numeric and complex variables. This is used for both pre-formatting in discover, and for rounding the data before use in decompose, so that the data satisfies the resulting schema. A value of NA results in no rounding. By default, this uses getOption("digits"), similarly to format. See the "Floating-point variables" section for discover for why this rounding is necessary for consistent results across different machines. See the note in print.default about digits >= 16.

single_ref

a logical, FALSE by default. If TRUE, then only one reference between each relation pair is kept when generating foreign key references. If a pair has multiple references, the kept reference refers to the earliest key for the child relation, as sorted by priority order.

ensure_lossless

a logical, indicating whether to check that the normalisation is lossless. If it is not, then an additional relation is added to the final database, containing a key for df. This is enough to make the normalisation lossless.

remove_avoidable

a logical, indicating whether to remove avoidable attributes in relations. If so, then an attribute is removed from a relation if the keys can be changed such that it is not needed to preserve the given functional dependencies.

constants_name

a scalar character, giving the name for any relation created to store constant attributes. If this is the same as a generated relation name, it will be changed, with a warning, to ensure that all relations have a unique name.

progress

a logical, for whether to display progress to the user during dependency search in discover.

progress_file

a scalar character or a connection. If progress is TRUE, determines where the progress is written to, in the same way as the file argument for cat.

...

further arguments passed on to discover.
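
The rounding controlled by digits can be illustrated with base R alone. The sketch below uses signif() as a stand-in for the rounding step; whether autodb calls signif() internally is an assumption, and the values are made up for illustration.

```r
# Two values that are equal on paper but differ in floating-point arithmetic.
x <- 0.1 + 0.2
y <- 0.3

# Compared exactly, they look like distinct values, so a dependency
# search could treat them as different entries.
exact_equal <- x == y

# Rounded to 7 significant digits (R's default "digits" option),
# they collapse to the same value.
rounded_equal <- signif(x, 7) == signif(y, 7)

print(c(exact = exact_equal, rounded = rounded_equal))
```

Rounding both values before comparison is what lets results agree across machines whose floating-point arithmetic differs in the last few bits.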

Value

A database, containing the data in df within the inferred database schema.
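
A minimal sketch of a call with non-default arguments, assuming the autodb package is installed. The log path is a hypothetical temporary file, and the argument values are chosen only for illustration.

```r
# Hypothetical destination for progress messages.
log_path <- tempfile(fileext = ".log")

# Only run the call when the package is available.
if (requireNamespace("autodb", quietly = TRUE)) {
  db <- autodb::autodb(
    ChickWeight,
    single_ref = TRUE,        # keep one reference per relation pair
    constants_name = "fixed", # name used if a constants relation is created
    progress = TRUE,          # report progress during dependency search
    progress_file = log_path  # write progress to a file, not the console
  )
  print(db)
}
```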

Details

Since decompose only works with functional dependencies, not approximate dependencies, the accuracy in discover is fixed as 1.
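
The wrapper can be approximated by running the steps by hand. This is a sketch, not the function's actual body: it assumes that discover takes an accuracy argument, that normalise accepts the discovered dependencies and covers the autoref step, and that decompose takes the data frame followed by the schema. None of these signatures are shown on this page, and they may differ between package versions.

```r
have_autodb <- requireNamespace("autodb", quietly = TRUE)

if (have_autodb) {
  # Exact functional dependencies only: accuracy fixed at 1.
  deps <- autodb::discover(ChickWeight, accuracy = 1)

  # Turn the dependencies into a schema (assumed to include references).
  schema <- autodb::normalise(deps)

  # Split the original data frame according to that schema.
  db_manual <- autodb::decompose(ChickWeight, schema)
  print(db_manual)
}
```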

Examples

# simple example
autodb(ChickWeight)
#> database with 2 relations
#> 4 attributes: weight, Time, Chick, Diet
#> relation Chick: Chick, Diet; 50 records
#>   key 1: Chick
#> relation Time_Chick: Time, Chick, weight; 578 records
#>   key 1: Time, Chick
#> references:
#> Time_Chick.{Chick} -> Chick.{Chick}