RFC-DB.txt
$Id: RFC-DB.txt,v 1.1 2008-01-26 16:18:40 clorenz Exp $ ---------------------------------------------------------------------------- 1. PREFACE ---------- This document should describe the database format of the next generation of mmusic. Please don't confuse this with gmmusic, since gmmusic is only the frontend for gmmusic. We don't know, if this new database scheme will ever be implemented, but we want to keep this as a kind of mind reference. This document will cover the structure of the new database as well as the transitions from the old database to the new one. The new database will be completely different from the existing one; it should be a "clean" structure, without redundancy and without any unconnected data (like today for example the "authors" field, which has got no reference at all to a "person" or "artist" table). This approach hugely blows up the database, but at the end, all data are perfectly organized and are absolutely consistent, which allows you to efficiently work with it. At the end of this document, you can find some mind scratches, notes and everything else, which is just here, since we don't want to lost it, but haven't yet found the time to finalize the thoughts. 2. THE NEW DATA STRUCTURES -------------------------- From various comments, I've learned, that the current data structure, with the artist-song-recording-media tables is insufficient for anything but modern pop music. Classical songs, which require for example a conductor can't be handled properly. Or jazz songs, where one song can be performed by many perfomers, are not really "new" songs, but should be represented as incarnation of a real "song". That leads us now to the question of all questions: What exactly is a song? (1) Something, that only consists of a name (the name of the song) and a few general data, like composer,writer,style,year? If yes, how can he handle re-issues of a song, where the style changes? Just think of "Yesterday" of the Beatles? I am sure, not all of the more than 1000 versions are of that slow style. This would mean, that in the end, the recording should get a "recording style" (mp3 category like) field, too, which usually equals the "song style", but in some cases it could be different. In the search methods, you'd have to ensure, that you all combinations (AND,OR,NOT) of those two styles can be searched for. (2) Something, that consists only of the name, composer, year and writer? If yes, this would be a kind of "original song data", but all non-original incarnations, like remixes, have to get the additional data (new composers/remixers, new year, etc.) somewhere else. But where? Do these fields really belong into the "recording" field? If yes, they would be redundant, if you have got two recordings of the same song. What about a new table "incarnation", which is the "middleware" between a song and a recording: A song can have multiple incarnations (such as remixes, reissues, styles, etc.), and each of those incarnations can have multiple true recordings. In this case, the "recordings" table would carry nothing, but the pure recording data, the quality- and a "comment" field. The "special" field should move into the "incarnation" table. Open: What about the "recording year"? Where does this field belong to? Especially for songs, played on TV? If a song, released in 1998 is recorded from a TV show in 1999, without any differences (so the song *is* the original one), the "incarnation" table would carry the year 1998, while the recording year would carry the year 1999. This means, that we finally would use three (possibly different) year values for one song: One of the (original) song, one of its incarnation and one of its "true" recording. Overkill or brilliant? I'd tend to the latter ;) (3) Something, that only consists of the song name? Certainly not. My personal favourite is (2). 3. THE TABLES ------------- I really have to draw an entity-relationship diagram... 3.1. PERSON_DATA ---------------- Need a better name for this... Maybe "people" ? This table handles all native data of a true, existant person. However, there's one thing, this table does *not* carry, and that is (or are) the name (or names), the person acts in public. Fields: - unique ID - date of birth - location of birth - date of death - location of death - cause of death - comment 3.2. KNOWN_AS ------------- This table carries a 1:n connection between a true, existant person (in table "person_data" and it's names, the person is known for. If a person does not have any pseudonyms, it is a 1:n connection, but many persons have a birth name and pseudonym. If you ask, why I won't carry the birth name in the table "person data", it's just because of redundancy. The table "person_data" will never be directly referenced. Maybe, this table could also be called "public". Fields: - id (primary key) - name (varchar) (Problem: How to deal with two non-itentical true persons with the same pseudonym? In this case, the frontend application has to provide the user all neccessiary data (of person_data), so that the *user* can select, which one he'll take.) - person_data (foreign key -> person_data(id)) 3.3. ARTIST_APPEARANCE ---------------------- This table should track, when and where an artist (e.g. in a band) appears Fields: - id (primary key) - known_as (foreign key to known_as(id)) - artist (foreign key to artist(id)) - start_date - start_location - end_date - end_location - end_cause - remarks Just an example: "Elton John" would have the following fields: id: 123 known_as: 456 (dereferences to "Elton John") artist: 789 (dereferences to "Elton John") ..... "Simon and Garfunkel" could have the following two rows: id: 124 known as: 457 (dereferences to "Simon & Garfunkel) artist: 365 (dereferences to "Paul Simon") ...... id: 125 known_as: 457 (dereferences to "Simon & Garfunkel) artist: 366 (dereferences to "Art Garfunkel") .... 3.4. ARTIST ----------- This tables' name is a bit misleading. This table is used to handle bands (consisting of multiple persons) and individual persons, regardles, whether they're the singers of a band or the composers or whatever else. Fields: - id primary key (really??) - name (cannot be unique, since it is possible, that e.g. two bands with the same name exist. For example, there do exist two bands, called "Atlantis") - printable_name The former "consists of" relation is now handles in the ARTIST_APPEARANCE table. 3.5. SONG --------- Carries all native song data. We assume, that a "song" consists only of a name. The rest is all carried by the incarnation of a song. So, the name "Song" is perhaps misleading... Fields: - id - name 3.6. AVAILABLE_SONG_ROLES ------------------------- This table represents all roles, a "performer" can have in a song Fields: - id (need to be hardcoded!) - role_type (need to be hardcoded!) Possible values: "performer", "composer", "lyrics". Anything else? 3.7. SONG_ROLE -------------- 1:1:n Connection between a Song, a performer and its roles in that song. (just think of "Albert Hammond" wrote "Free Electric Band", and "Albert Hammond" also sang "Free Electric Band). Fields: - id - song (foreign key to incarnation) - artist (foreign key to artist) - role (foreign key to available_song_roles) 3.8. DANCE STYLES ----------------- Fields: - id - name - comment 3.9. GENRE ---------- This table handles all MP3 ID-Tag genre entries Fields: - id - name - comment 3.10. RELEASE ------------- This table handles all possible releases of a song, such as S/A, S/B, LP, ..... Fields: - id - name - comment 3.11. INCARNATION ---------------- We clearly need a better name here :-) Fields: - id - song (foreign key to song) - publishing_date - genre (foreign key to genre) - dance (foreign key to dance styles) - bpm - release (foreign key to release) - lyrics - keywords (like war, summer, love, rain, baseball etc.) - comment In case, you wonder, there artist and authors are, they are represented by the "Song_Role" tables. And if you wonder, where the "Title" is, it is located in the "Song" table. Disadvantage: Information, which was previously held in a single table is now spread over three and more tables Advantage: Full referencial integrity. 3.12. RECORDING -------------- Fields: - id - song_id (foreign key to incarnation) - medium (foreign key to medium) - medium_side - medium_position (handles counter *and* track!) - length - year - longplay (bit) quality bits (I'd prefer true bit fields, but we need a function, that sums them up to a compareable value) special bits (Live, Cover version, ...) - remarks - digital 3.13. AVAILABLE_RECORDING_ROLES ------------------------------ Possible values: "engineer","conductor","technican", .... (see song) Fields: - id - role - remarks 3.14. RECORDING_ROLE ------------------- (see song) Fields: - id - recording (foreign key to recording) - artist (foreign key to artist) - role (foreign key to available_recording_roles) 3.15. MEDIUM_TYPE ----------------- Must be as generic as possible. Possible fields: - id - name - description - number_of_sides ("2" for Singles and tapes, "1" for CDs) - quality value (unique! With that field, a qualitative ordering of recordings of this medium should be made. A 5:1-recording on a DVD should get the "best" value here, while a recording on an audio tape should get the "worst" value. A select * from medium_type order by quality_value will fill the toolbar of gmmusic in the correct order, just as it's currently the case.) I'd suggest not to use sequential numbers (which leave no gaps for further injections), but steps of ten. 3.16. MEDIUM ------------ - id - verbose_id (for me ;-), not unique, since a LP and a CD can have the same verbose_id) - artist - title - medium_type (foreign key for medium_type) - length (for CD, Singles and LPs, this could be the sum length of all recordings; for all writeable media, such as ROM and tapes, this is the brutto recording length) - manufacturer (for tapes and CD-R) - label - year - order no - begin_date_a - begin_date_b - end_date - remarks - buy_date - buy_price - discid - category - track_offsets (CD, ROM only) - genre (ID3 tag) - digital (defaults to AAA) - burning_date - production credits - cover art - purchase info 4. THE STORED PROCEDURES ------------------------ ... will be developed in PL/pgSQL, see http://www.ca.postgresql.org/users-lounge/docs/7.1/programmer/plpgsql.html (CL, 18.10.02) 5. TRANSITION FROM OLD DB TO NEW DB ----------------------------------- 5.1. AT FIRST.... ----------------- - The new DB must have a different name than mmusic2 or mmusic3, since these two names are already in use. Suggestion: mmusic-v2, but can be overridden at install time - Installation can no longer happen, when you install the RPM, since the user is asked for the name of the new DB - If gmmusic (new) finds no database mmusic-v2, it will create the database (with the points given above). - If gmmusic finds an already existing mmusic (old) database, maybe while browsing .gmmusicrc, it will ask the user, whether he will do the transition and fill the new DB with the values of the old DB. (He will also be told, that the old DB will not be deleted) - For every upgrade process, the new mmusic backend will (with interaction of the gmmusic frontend) do the upgrade. The RPM no longer does any upgrading. 5.2. THE OLD MEDIA TABLES ------------------------- Table.Field old | Table.Field new --------------------------+--------------------------- *.id no longer used | medium.id *.medium_id | medium.verbose_id *.interpret | medium.artist *.titel | medium.title *.remarks | medium.remarks *.label | medium.label *.order_no | medium.order_no *.size | medium.length *.firm | medium.manufacturer *.begin_date | medium.begin_date_a *.begin_b | medium.begin_date_b *.end_date | medium.end_date *.year | medium.year *.buy_date | medium.buy_date *.buy_price | medium.buy_price *.cddb_id | medium.discid *.category | medium.category *.track_offsets | medium.track_offsets *.id3_genre | medium.genre *.digital | medium.digital video_tape.system | obsolete rom.audio | obsolete rom.rewitable | obsolete rom.burning_date | medium.burning_date files.magic | medium.discid files.type | medium.medium_type * | medium.medium_type All unused fields must carry the value "NULL" 5.3. THE OLD SONG TABLE ----------------------- 5.4. THE OLD TABLE STATISTICS ----------------------------- I think, we don't need this table any more, the data should be retrived on-the-fly every now and then and the frontend might take care itself, too. 6. UNSORTED THOUGHTS -------------------- - Soundex !! (CL, 17.5.02) - Implementaion in Java! mmusic (the backend) should be a SOAP-Server jmmusic (one of the clients) should be a SOAP client Advantage: Client knows nothing about the database; Client can be "small and stupid"; Server can be a centralized big machine. (CL,18.5.03) ---------------------------------------------------------------------------- vim: tw=78 expandtab