Dillfrog Muse Data Dumps

Dillfrog Muse was created by a hobbyist, using tools and data available freely over the Interwebs. As such, I'd like to give some stuff back to the community. Here's some stuff you can use for your own evil or neutral purposes. If you use it for the purposes of good though, please exercise caution. These things can backfire.

If there's something you'd like to snarf that you don't see here, contact Plat and maybe we can work something out.

Rhyme Data Dump - RhymerPubPronunciations

Download It

Data Sources

Here's the raw data we're slicing and dicing, in case you want to brew up your own custom flavor.

Documentation - General Notes

Documentation - Table Schema

Column: word
Description: Textual representation of the word. Values are upper-cased, and underscores are used instead of spaces (e.g. "BOOK_REVIEW" instead of "BOOK REVIEW"). Yes, "BOOK_REVIEW" is considered a 'word' here; technically this field's value can consist of multiple words.
Data Sources: Pronunciation (merged)

Column: syllable_count
Description: The number of syllables in this pronunciation of the word.
Data Sources: Pronunciation (merged)

Column: rhyme_group
Description: The ID of the rhyme group this pronunciation is tied to. Pronunciations with the same rhyme_group are considered to rhyme with one another.
Warning: the actual value for this column is meant to be internal and can change on future data builds. Only use it to find relationships between the rows.
Data Sources: Pronunciation (merged)

Column: offrhyme_group
Description: The ID of the offrhyme (slant rhyme) group this pronunciation is tied to. Pronunciations with the same offrhyme_group are considered to offrhyme with one another.
Warning: the actual value for this column is meant to be internal and can change on future data builds. Only use it to find relationships between the rows.
Data Sources: Pronunciation (merged)

Column: wn_lemma
Description: The WordNet 3.x lemma that this pronunciation ties to. You can use this to do a quicker lookup or cross-reference with the WordNetSQL data. If we couldn't find a corresponding WordNet entry for this row's word, the field's value will be NULL.
Data Sources: WordNet

Column: primary_pos
Description: The word's primary part of speech.
    value   meaning
    a       adjective
    n       noun
    r       adverb
    v       verb
    ?       Unknown; we couldn't look it up in WordNet.
WordNet also uses a value of 's' to denote satellite adjectives. We mark those as 'a' (adjective), not 's'.
Data Sources: WordNet

Column: wn_total_tag_count
Description: The total tag count for all senses of this word. See WordNet's doc on word_cnt for more details. The higher this number is, the more frequently the word was found in the corpora WordNet used. You can use this as a very crude way to find common words (high = common). Hopefully we'll have a more comprehensive 'familiarity' score in future releases.
Data Sources: WordNet

Column: is_noun
Description: Indicates whether or not this word can be used as a noun.
    value   meaning
    1       The word is known to be usable as a noun
    0       The word is known to be NOT usable as a noun
    ?       We don't know how this word is meant to be used (i.e., we couldn't look the word up in WordNet)
Data Sources: WordNet

Column: is_verb
Description: Acts like is_noun, but for classifying words as verbs.
Data Sources: WordNet

Column: is_adjective
Description: Acts like is_noun, but for classifying words as adjectives.
Data Sources: WordNet

Column: is_adverb
Description: Acts like is_noun, but for classifying words as adverbs.
Data Sources: WordNet
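
To see how the part-of-speech flags and wn_total_tag_count might be combined, here is a rough sketch in Python using the standard sqlite3 module. It assumes the dump has been loaded into a table named rhymerpubpronunciations with the columns described above. The toy rows, the group IDs, the tag counts, the made-up word "SNARFOO", the TEXT storage of the is_* flags, and the helper names build_toy_db and common_noun_rhymes are all invented for illustration; the real dump's column types and values may differ.

```python
import sqlite3

# Toy rows invented for illustration. Group IDs and tag counts are made up,
# and "SNARFOO" stands in for a word that could not be found in WordNet.
ROWS = [
    # word, syllable_count, rhyme_group, offrhyme_group, wn_lemma,
    # primary_pos, wn_total_tag_count, is_noun, is_verb, is_adjective, is_adverb
    ("BLUE", 1, 10, 100, "blue", "a", 50, "1", "1", "1", "0"),
    ("SHOE", 1, 10, 100, "shoe", "n", 30, "1", "1", "0", "0"),
    ("GLUE", 1, 10, 100, "glue", "n", 5, "1", "1", "0", "0"),
    ("PURSUE", 2, 10, 100, "pursue", "v", 40, "0", "1", "0", "0"),
    # Assumption: unknown words carry a tag count of 0 here.
    ("SNARFOO", 2, 10, 100, None, "?", 0, "?", "?", "?", "?"),
]


def build_toy_db():
    """Create an in-memory table shaped like the documented schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE rhymerpubpronunciations (
            word TEXT, syllable_count INTEGER, rhyme_group INTEGER,
            offrhyme_group INTEGER, wn_lemma TEXT, primary_pos TEXT,
            wn_total_tag_count INTEGER, is_noun TEXT, is_verb TEXT,
            is_adjective TEXT, is_adverb TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO rhymerpubpronunciations VALUES (?,?,?,?,?,?,?,?,?,?,?)",
        ROWS,
    )
    return conn


def common_noun_rhymes(conn, target):
    """Rhymes of `target` that are known nouns, most common first."""
    sql = """
        SELECT DISTINCT R.word, R.wn_total_tag_count
        FROM rhymerpubpronunciations T, rhymerpubpronunciations R
        WHERE T.word = ?
          AND T.rhyme_group = R.rhyme_group
          AND T.word <> R.word
          -- '0' (known non-noun) and '?' (unknown) are both filtered out
          AND R.is_noun = '1'
        ORDER BY R.wn_total_tag_count DESC, R.word
    """
    return conn.execute(sql, (target,)).fetchall()


conn = build_toy_db()
result = common_noun_rhymes(conn, "BLUE")
print(result)
```

Note that the query never hard-codes a rhyme_group value. It looks the group up through the target word, which is in keeping with the warning that the IDs can change between data builds.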

Example queries

Find words that rhyme with 'BLUE':

SELECT DISTINCT R.word, R.syllable_count
FROM rhymerpubpronunciations T, rhymerpubpronunciations R
WHERE
    -- Target the word we want to find similar words to
    T.word = 'BLUE'
    -- The related word must be of the same group as the target
    AND T.rhyme_group = R.rhyme_group
    -- Exclude the target word from the results (e.g. so "blue" doesn't show up in the results)
    AND T.word <> R.word
ORDER BY R.syllable_count ASC, R.word;
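
Because the query above sorts by syllable_count first, its rows can be bucketed by syllable count in a single pass. The following is a minimal sketch of that idea in Python with the standard sqlite3 module. The toy table holds only the three columns the query touches, and its rows, the group IDs, and the helper name rhymes_by_syllables are invented for illustration.

```python
import sqlite3
from itertools import groupby

# Toy data invented for illustration; group IDs are arbitrary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rhymerpubpronunciations "
    "(word TEXT, syllable_count INTEGER, rhyme_group INTEGER)"
)
conn.executemany(
    "INSERT INTO rhymerpubpronunciations VALUES (?, ?, ?)",
    [
        ("BLUE", 1, 10),
        ("SHOE", 1, 10),
        ("GLUE", 1, 10),
        ("PURSUE", 2, 10),
        ("CAT", 1, 20),
    ],
)


def rhymes_by_syllables(conn, target):
    """Map syllable count -> list of words that rhyme with `target`."""
    sql = """
        SELECT DISTINCT R.word, R.syllable_count
        FROM rhymerpubpronunciations T, rhymerpubpronunciations R
        WHERE T.word = ?
          AND T.rhyme_group = R.rhyme_group
          AND T.word <> R.word
        ORDER BY R.syllable_count ASC, R.word
    """
    rows = conn.execute(sql, (target,)).fetchall()
    # groupby only merges adjacent rows, so it relies on the ORDER BY above.
    return {
        count: [word for word, _ in group]
        for count, group in groupby(rows, key=lambda row: row[1])
    }


buckets = rhymes_by_syllables(conn, "BLUE")
print(buckets)
```

The target word is passed as a bound parameter (the ? placeholder) rather than pasted into the SQL string, which is the safer habit if the word comes from user input.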

Find words that offrhyme but do not perfectly rhyme with 'BLUE':

SELECT DISTINCT R.word, R.syllable_count
FROM rhymerpubpronunciations T, rhymerpubpronunciations R
WHERE
    -- Target the word we want to find similar words to
    T.word = 'BLUE'
    -- The related word must be of the same OFFRHYME group as the target
    AND T.offrhyme_group = R.offrhyme_group
    -- Exclude words that perfectly rhyme with the target
    AND T.rhyme_group <> R.rhyme_group
    -- Exclude the target word from the results (e.g. so "blue" doesn't show up in the results)
    AND T.word <> R.word
ORDER BY R.syllable_count ASC, R.word;
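
The two queries above can also be turned around to ask how a specific pair of words relates. Here is a rough sketch in Python with the standard sqlite3 module. The toy rows, the group IDs (including the choice to put MOON in the same offrhyme group as BLUE), and the helper name classify_pair are invented for illustration. It also leans on SQLite evaluating a comparison to 0 or 1, so other databases may need a CASE expression instead.

```python
import sqlite3

# Toy data invented for illustration; group IDs are arbitrary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rhymerpubpronunciations "
    "(word TEXT, rhyme_group INTEGER, offrhyme_group INTEGER)"
)
conn.executemany(
    "INSERT INTO rhymerpubpronunciations VALUES (?, ?, ?)",
    [
        ("BLUE", 10, 100),
        ("SHOE", 10, 100),
        ("MOON", 11, 100),
        ("CAT", 20, 200),
    ],
)


def normalize(word):
    """Match the word column's format: upper case, underscores for spaces."""
    return word.strip().upper().replace(" ", "_")


def classify_pair(conn, word_a, word_b):
    """Return 'rhyme', 'offrhyme', 'none', or 'unknown' for two words."""
    sql = """
        SELECT MAX(A.rhyme_group = B.rhyme_group),
               MAX(A.offrhyme_group = B.offrhyme_group)
        FROM rhymerpubpronunciations A, rhymerpubpronunciations B
        WHERE A.word = ? AND B.word = ?
    """
    # MAX over every pronunciation pair: one matching pair is enough.
    rhymes, offrhymes = conn.execute(
        sql, (normalize(word_a), normalize(word_b))
    ).fetchone()
    if rhymes is None:
        # No rows joined, so at least one word is missing from the table.
        return "unknown"
    if rhymes:
        return "rhyme"
    if offrhymes:
        return "offrhyme"
    return "none"


print(classify_pair(conn, "blue", "shoe"))
print(classify_pair(conn, "blue", "moon"))
print(classify_pair(conn, "blue", "cat"))
```

Checking the perfect rhyme first mirrors the second query above, which treats a shared rhyme_group as taking precedence over a shared offrhyme_group.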