Difference between revisions of "Db-string-type-cleanup"

Latest revision as of 16:52, 24 October 2013

Blueprint Name: db-string-type-cleanup

Author: Rick Harris/s1rp/sirp <rconradharris@gmail.com>

Summary

The proposal here is to change from using SQLAlchemy's String type directly in our models to using logical datatypes that may or may not (depending on the database) map to string columns under the hood.

Rationale

We use string columns to store a variety of logical datatypes: UUIDs, IP addresses, generic strings, among others.

The current practice is for developers to use SQLAlchemy's `String` datatype directly and specify the desired length each time, e.g. String(255), or String(36).

This has a few problems:

Typos - For example, https://review.openstack.org/#/c/39296 accidentally changed '255' to '25'. Since these are runtime and not parse-time errors, they are usually difficult to recover from and require further cleanup migrations. With the proposed change, however, typos become parse-time errors that we can promptly fix.

Unecessary Variance - For example, we have 'String(256)' in places where String(255) was preferable (since that's the de facto standard for generic strings). With the proposed change, developers are encouraged to use the existing logical datatypes where it makes sense, and only introduce new datatypes when necessary.

No Physical Storage/Logical Datatype Abstraction - Some databases support certain logical datatypes natively, like UUID. By declaring the columns as strings, instead of say 'UUID', we're tying that column down to a particular underlying storage type. (In fact for IP Addresses and CIDR addresses, we're already using this abstraction.). With the proposed change, we are free down the road to change the underlying representation of these logical datatypes where it makes sense.

Design

The proposal here is to extend the work we did for `IPAddress` and `CIDR` to other logical datatypes that are stored as underlying Strings.

This would mean that instead of developers using SQLAlchemy Strings directly and specifying a length, they would use (or add) a predefined logical datatype from our existing nova/db/sqlachemy/types.py module.

The following mapping from existing physical datatype to logical datatype is proposed:

Count	Current Physical Datatype	Proposed Logical Datatype
1	String(2047)	types.SMBackendConfigParams
2	String(12)	types.PCIAddress
2	String(5)	types.NetworkProtocol
2	String(8)	types.PCIDeviceType
4	String(4)	types.PCIVendorID, types.PCIProductID
5	String(39)	types.IPAddress (exists)
7	String(256)	types.String
8	String(43)	types.CIDR (exists)
56	String(36)	types.UUID
388	String(255)	types.String

@@ Line 1: / Line 1: @@
-Blueprint Name: db-string-type-cleanup
+Blueprint Name: db-string-type-cleanup<br />
+Author: Rick Harris/s1rp/sirp <rconradharris@gmail.com>
 == Summary ==
+The proposal here is to change from using SQLAlchemy's String type directly in our models to using logical datatypes that may or may not (depending on the database) map to string columns under the hood.
 == Rationale ==
-We use String columns as the physical storage for many kinds of logical datatypes, whether it's a UUID as String(36), IPAddress as String(39), or generic strings which are usually String(255).
+We use string columns to store a variety of logical datatypes: UUIDs, IP addresses, generic strings, among others.
+The current practice is for developers to use SQLAlchemy's `String` datatype directly and specify the desired length each time, e.g. String(255), or String(36).
+This has a few problems:
+* ''Typos'' - For example, https://review.openstack.org/#/c/39296 accidentally changed '255' to '25'. Since these are runtime and not parse-time errors, they are usually difficult to recover from and require further cleanup migrations. With the proposed change, however, typos become parse-time errors that we can promptly fix.
+* ''Unecessary Variance'' - For example, we have 'String(256)' in places where String(255) was preferable (since that's the de facto standard for generic strings). With the proposed change, developers are encouraged to use the existing logical datatypes where it makes sense, and only introduce new datatypes when necessary.
+* ''No Physical Storage/Logical Datatype Abstraction'' - Some databases support certain logical datatypes natively, like UUID.  By declaring the columns as strings, instead of say 'UUID', we're tying that column down to a particular underlying storage type. (In fact for IP Addresses and CIDR addresses, we're already using this abstraction.). With the proposed change, we are free down the road to change the underlying representation of these logical datatypes where it makes sense.
+== Design ==
+The proposal here is to extend the work we did for `IPAddress` and `CIDR` to other logical datatypes that are stored as underlying Strings.
+This would mean that instead of developers using SQLAlchemy Strings directly and specifying a length, they would use (or add) a predefined logical datatype from our existing nova/db/sqlachemy/types.py module.
+The following mapping from existing physical datatype to logical datatype is proposed:
 {| class="wikitable"
 |-
-! Count !! Type !! Notes
+! Count !! Current Physical Datatype !! Proposed Logical Datatype
 |-
-| 2     || String(12)   || PCI Address
+| 1     || String(2047)|| types.SMBackendConfigParams ||
 |-
-| 2     || String(5)     || Protocol
+| 2     || String(12)   || types.PCIAddress
 |-
-| 2     || String(8)     || PCI Device Type
+| 2     || String(5)     || types.NetworkProtocol
 |-
-| 4     || String(4)     || PCI Vendor/Product IDs
+| 2     || String(8)     || types.PCIDeviceType
 |-
-| 5     || String(39)   || IPAddress
+| 4     || String(4)     || types.PCIVendorID, types.PCIProductID
 |-
-| 7     || String(256) || Typos?
+| 5     || String(39)   || types.IPAddress ''(exists)''
 |-
-| 8     || String(43)   || CIDR
+| 7     || String(256) || types.String
 |-
-| 56   || String(36)   || UUID
+| 8     || String(43)   || types.CIDR ''(exists)''
 |-
-| 388 || String(255) || Generic String
+| 56   || String(36)   || types.UUID
+|-
+| 388 || String(255) || types.String
 |}
-Right now, the standard is for developers to use SQLAlchemy's String type directly, where they pass in the desired length.
-There are a few problems with this approach:
-. UNSAFE: Each time a developer types '255', there is an opportunity to introduce a bug. Two examples of this are the 7 256-length columns we have and https://review.openstack.org/#/c/39296/ where '255' was accidentally changed to '25' without reviewers noticing.
-What makes this issue particularly nefarious, is that these kinds of column-length bugs usually do not surface until after a deployment when we start using real data, with all of its natural variance. This makes recovering from these bugs much harder than normal, since additional, carefully-crafted migrations need to be written to fix the existing bad data. As such, it would be nice to turn these runtime bugs into things the interpreter can catch at parse-time.
-== Design ==