This is a wrapper function for applying normalise, autoref, and decompose. This takes a data frame and converts it straight into a database, which is the main intended use case for the package.
Usage
autodb(
  df,
  digits = getOption("digits"),
  single_ref = FALSE,
  ensure_lossless = TRUE,
  remove_avoidable = FALSE,
  constants_name = "constants",
  progress = FALSE,
  progress_file = "",
  ...
)
Arguments
- df
a data.frame, containing the data to be normalised.
- digits
a positive integer, indicating how many significant digits are to be used for numeric and complex variables. This is used for both pre-formatting in discover, and for rounding the data before use in decompose, so that the data satisfies the resulting schema. A value of NA results in no rounding. By default, this uses getOption("digits"), similarly to format. See the "Floating-point variables" section for discover for why this rounding is necessary for consistent results across different machines. See the note in print.default about digits >= 16.
- single_ref
a logical, FALSE by default. If TRUE, then only one reference between each relation pair is kept when generating foreign key references. If a pair has multiple references, the kept reference refers to the earliest key for the child relation, as sorted by priority order.
- ensure_lossless
a logical, indicating whether to check whether the normalisation is lossless. If it is not, then an additional relation is added to the final "database", containing a key for df. This is enough to make the normalisation lossless.
- remove_avoidable
a logical, indicating whether to remove avoidable attributes in relations. If so, then an attribute is removed from a relation if the keys can be changed such that the attribute is not needed to preserve the given functional dependencies.
- constants_name
a scalar character, giving the name for any relation created to store constant attributes. If this is the same as a generated relation name, it will be changed, with a warning, to ensure that all relations have a unique name.
- progress
a logical, for whether to display progress to the user during dependency search in discover.
- progress_file
a scalar character or a connection. If progress is TRUE, determines where the progress is written to, in the same way as the file argument for cat.
- ...
further arguments passed on to discover.
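The rounding controlled by digits can be illustrated with base R alone. The sketch below is not the package's implementation: it uses signif as a stand-in for whatever rounding discover and decompose apply, and only shows why two values that print identically can still count as distinct when compared exactly, which would change the dependencies found.

```r
# Two values that are equal mathematically but differ in floating point.
x <- c(0.1 + 0.2, 0.3)

# Compared exactly, they count as two distinct values.
exact_distinct <- length(unique(x))

# Rounded to 7 significant digits (R's default for getOption("digits")),
# they collapse to a single value.
rounded <- signif(x, digits = 7)
rounded_distinct <- length(unique(rounded))

print(exact_distinct)   # 2
print(rounded_distinct) # 1
```

Because the exact result of floating-point arithmetic can vary between machines, comparing rounded values is what keeps the set of distinct values, and so the inferred schema, stable.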
Value
A database, containing the data in df within the inferred database schema.
Details
Since decompose only works with functional dependencies, not approximate dependencies, the accuracy in discover is fixed as 1.
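As a rough sketch of what the wrapper does, the steps can be run by hand. This assumes that the installed package version matches this page: that discover takes an accuracy argument, that normalise accepts its result and returns a schema with the foreign key references already added (that is, includes the autoref step), and that decompose takes the data frame followed by the schema. Check the help pages for each function before relying on these signatures.

```r
library(autodb)

# Step 1: search for functional dependencies. accuracy = 1 requests exact
# dependencies only, the value that autodb fixes (assumed argument name).
deps <- discover(ChickWeight, accuracy = 1)

# Step 2: convert the dependencies into a normalised schema
# (assumed here to include the foreign key references).
schema <- normalise(deps)

# Step 3: split the data frame into relations that follow the schema.
db <- decompose(ChickWeight, schema)

print(db)
```

Running the steps separately is mainly useful for inspecting the intermediate dependencies or schema; for the plain data-frame-to-database case, autodb(ChickWeight) is the shorter route.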
Examples
# simple example
autodb(ChickWeight)
#> database with 2 relations
#> 4 attributes: weight, Time, Chick, Diet
#> relation Chick: Chick, Diet; 50 records
#> key 1: Chick
#> relation Time_Chick: Time, Chick, weight; 578 records
#> key 1: Time, Chick
#> references:
#> Time_Chick.{Chick} -> Chick.{Chick}