Dillfrog Muse Data Dumps
Dillfrog Muse was created by a hobbyist, using tools and data available freely over the Interwebs. As such, I'd like to give some stuff back to the community. Here's some stuff you can use for your own evil or neutral purposes. If you use it for the purposes of good though, please exercise caution. These things can backfire.
If there's something you'd like to snarf that you don't see here, contact Plat and maybe we can work something out.
Rhyme Data Dump - RhymerPubPronunciations
Download It
Data Sources
Here's the raw data we're slicing and dicing, in case you want to brew up your own custom flavor.
- WordNet SQL (WordNet 3.0) - license
- CMU Pronouncing Dictionary - license
- Moby Pronunciation List (by Grady Ward) - Not copyrighted in the United States. See the eBook download for Gutenberg's terms.
Documentation - General Notes
- Hey! If you end up using this for something awesome, please show me! (You don't have to, though!)
- Disclaimer: This data is being offered on a trial basis. It is neither comprehensive nor representative of how the muse.dillfrog.com site's data is organized. It's meant to help get you on your way. Because of this, you might find bugs, outdated data, data not used in the muse.dillfrog.com tools, data not published in this dump, or other weird stuff. There's no guarantee the data will be kept up-to-date, either - the first dump may be the last. Future builds may include breaking changes (e.g. renaming columns, removing columns). It's also a hobby effort, and might have bad data. If you find a problem, let us know! We probably want to fix it!
- Each row corresponds to a pronunciation for a given word. Some words have multiple pronunciations, so you'll find multiple rows for the same word.
- Be careful: some columns' values are meant only to show relationships between rows within a particular build, and could change in future builds. For example, consider the "rhyme_group" column. The rhyme_group values for the words "BLUE" and "ACCRUE" will still match each other in future releases, but the actual value they share is likely to change across releases. If you store this value elsewhere, don't expect it to work as a foreign key across releases.
- Optimization: Our snapshot doesn't [currently] include any fancy indexes. Be kind to your server - create some! We might include these by default on future releases.
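As a rough sketch of the indexing advice above, the snippet below builds a tiny stand-in table in SQLite and adds one index per column that the lookups in this document filter or join on. The index names and the sample rows are made up for illustration, and SQLite is only an assumption here; the `CREATE INDEX` statements themselves are plain SQL and should carry over to most databases with little or no change.

```python
import sqlite3

# Toy stand-in for the dump: same table and column names as the schema,
# but the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rhymerpubpronunciations (
        word TEXT,
        syllable_count INTEGER,
        rhyme_group INTEGER,
        offrhyme_group INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO rhymerpubpronunciations VALUES (?, ?, ?, ?)",
    [("BLUE", 1, 10, 100), ("ACCRUE", 2, 10, 100), ("MOOD", 1, 11, 100)],
)

# One index per column used for filtering (word) or joining (the group columns).
# The index names are hypothetical; pick whatever fits your conventions.
INDEX_STATEMENTS = [
    "CREATE INDEX idx_rpp_word ON rhymerpubpronunciations (word)",
    "CREATE INDEX idx_rpp_rhyme_group ON rhymerpubpronunciations (rhyme_group)",
    "CREATE INDEX idx_rpp_offrhyme_group ON rhymerpubpronunciations (offrhyme_group)",
]
for statement in INDEX_STATEMENTS:
    conn.execute(statement)

# List the indexes that now exist, to confirm they were created.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```

Since the group values can change between builds, it is probably simplest to recreate these indexes each time you load a new dump rather than trying to carry them over.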
Documentation - Table Schema
| Column | Description | Data Sources |
|---|---|---|
| word | Textual representation of the word. Values are upper-cased, and underscores are used instead of spaces (e.g. "BOOK_REVIEW" instead of "BOOK REVIEW"). Yes, "BOOK_REVIEW" is considered a 'word' here; technically this field's value can consist of multiple words. | Pronunciation (merged) |
| syllable_count | The number of syllables in this pronunciation of the word. | Pronunciation (merged) |
| rhyme_group | The ID of the rhyme group this pronunciation is tied to. Pronunciations with the same rhyme_group are considered to rhyme with one another. Warning: the actual value for this column is meant to be internal and can change on future data builds. Only use it to find relationships between the rows. | Pronunciation (merged) |
| offrhyme_group | The ID of the offrhyme (slant rhyme) group this pronunciation is tied to. Pronunciations with the same offrhyme_group are considered to offrhyme with one another. Warning: the actual value for this column is meant to be internal and can change on future data builds. Only use it to find relationships between the rows. | Pronunciation (merged) |
| wn_lemma | The WordNet 3.x lemma that this pronunciation ties to. You can use this to do a quicker lookup or cross-reference with the WordNetSQL data. If we couldn't find a corresponding WordNet entry for this row's word, the field's value will be NULL. | WordNet |
| primary_pos | The word's primary part of speech. | WordNet |
| wn_total_tag_count | The total tag count for all senses of this word. See WordNet's doc on word_cnt for more details. The higher this number is, the more frequently the word was found in the corpora WordNet used. You can use this as a very crude way to find common words (high = common). Hopefully we'll have a more comprehensive 'familiarity' score in future releases. | WordNet |
| is_noun | Determines whether or not this word can be used as a noun. | WordNet |
| is_verb | Acts like is_noun, but for classifying words as verbs. | WordNet |
| is_adjective | Acts like is_noun, but for classifying words as adjectives. | WordNet |
| is_adverb | Acts like is_noun, but for classifying words as adverbs. | WordNet |
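To show how the WordNet-derived columns might be combined with the rhyme columns, here is a minimal sketch that finds nouns rhyming with a target word and lists the most common ones first, using wn_total_tag_count as the crude frequency signal described above. It assumes is_noun is stored as 1 for "yes" and 0 for "no" (the table doesn't spell out the encoding, so check the real dump), and every row, count, and group ID below is invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rhymerpubpronunciations (
        word TEXT,
        syllable_count INTEGER,
        rhyme_group INTEGER,
        wn_lemma TEXT,
        wn_total_tag_count INTEGER,
        is_noun INTEGER  -- ASSUMPTION: 1 = can be a noun, 0 = cannot
    )
    """
)
# Made-up rows: the tag counts and group IDs are NOT taken from the real dump.
conn.executemany(
    "INSERT INTO rhymerpubpronunciations VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("BLUE", 1, 10, "blue", 50, 1),
        ("SHOE", 1, 10, "shoe", 30, 1),
        ("CLUE", 1, 10, "clue", 12, 1),
        ("ACCRUE", 2, 10, "accrue", 3, 0),
        ("KAZOO", 2, 10, "kazoo", 0, 1),
    ],
)

COMMON_NOUN_RHYMES_SQL = """
SELECT DISTINCT R.word, R.wn_total_tag_count
FROM rhymerpubpronunciations T
JOIN rhymerpubpronunciations R ON T.rhyme_group = R.rhyme_group
WHERE T.word = ?
  AND R.word <> T.word   -- leave the target itself out
  AND R.is_noun = 1      -- keep only words usable as nouns
ORDER BY R.wn_total_tag_count DESC, R.word
"""

common_nouns = conn.execute(COMMON_NOUN_RHYMES_SQL, ("BLUE",)).fetchall()
print(common_nouns)
```

Swapping `R.is_noun` for `R.is_verb`, `R.is_adjective`, or `R.is_adverb` should give the equivalent filter for the other parts of speech, under the same encoding assumption.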
Example queries
Find words that rhyme with 'BLUE':

    SELECT DISTINCT R.word, R.syllable_count
    FROM rhymerpubpronunciations T, rhymerpubpronunciations R
    WHERE
        -- Target the word we want to find similar words to
        T.word = 'BLUE'
        -- The related word must be of the same group as the target
        AND T.rhyme_group = R.rhyme_group
        -- Exclude the target word from the results (e.g. so "blue" doesn't show up in the results)
        AND T.word <> R.word
    ORDER BY R.syllable_count ASC, R.word;
Find words that offrhyme but do not perfectly rhyme with 'BLUE':

    SELECT DISTINCT R.word, R.syllable_count
    FROM rhymerpubpronunciations T, rhymerpubpronunciations R
    WHERE
        -- Target the word we want to find similar words to
        T.word = 'BLUE'
        -- The related word must be of the same OFFRHYME group as the target
        AND T.offrhyme_group = R.offrhyme_group
        -- Exclude words that perfectly rhyme with the target
        AND T.rhyme_group <> R.rhyme_group
        -- Exclude the target word from the results (e.g. so "blue" doesn't show up in the results)
        AND T.word <> R.word
    ORDER BY R.syllable_count ASC, R.word;
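If you'd rather call these queries from a program than paste in a literal word each time, something like the sketch below could work. It wraps both queries in a small Python helper, binds the target word as a parameter, and normalizes input to the upper-case, underscore-separated form the word column uses. The helper names (`normalize`, `find_related`) and the sample rows are hypothetical, and the `?` placeholder is specific to Python's sqlite3 module; other database drivers may expect a different placeholder style.

```python
import sqlite3

RHYME_SQL = """
SELECT DISTINCT R.word, R.syllable_count
FROM rhymerpubpronunciations T
JOIN rhymerpubpronunciations R ON T.rhyme_group = R.rhyme_group
WHERE T.word = ? AND T.word <> R.word
ORDER BY R.syllable_count ASC, R.word
"""

OFFRHYME_SQL = """
SELECT DISTINCT R.word, R.syllable_count
FROM rhymerpubpronunciations T
JOIN rhymerpubpronunciations R ON T.offrhyme_group = R.offrhyme_group
WHERE T.word = ? AND T.rhyme_group <> R.rhyme_group AND T.word <> R.word
ORDER BY R.syllable_count ASC, R.word
"""


def normalize(text):
    """Convert user input to the word column's format, e.g. 'book review' -> 'BOOK_REVIEW'."""
    return text.strip().upper().replace(" ", "_")


def find_related(conn, text, offrhyme=False):
    """Return (word, syllable_count) rows that rhyme (or only offrhyme) with text."""
    sql = OFFRHYME_SQL if offrhyme else RHYME_SQL
    return conn.execute(sql, (normalize(text),)).fetchall()


# Toy table with made-up group IDs, just to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rhymerpubpronunciations ("
    "word TEXT, syllable_count INTEGER, rhyme_group INTEGER, offrhyme_group INTEGER)"
)
conn.executemany(
    "INSERT INTO rhymerpubpronunciations VALUES (?, ?, ?, ?)",
    [
        ("BLUE", 1, 10, 100),
        ("SHOE", 1, 10, 100),
        ("ACCRUE", 2, 10, 100),
        ("MOOD", 1, 11, 100),
        ("BOOK_REVIEW", 3, 12, 101),
    ],
)

rhymes = find_related(conn, "blue")
offrhymes = find_related(conn, "blue", offrhyme=True)
print(rhymes)
print(offrhymes)
```

Because the helper looks up the group IDs fresh on every call, it never stores them, which fits the warning that those values are only meaningful within a single build.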