Graph Based Natural Language Generation

In my previous work, I investigated whether adding rhetorical devices to a computer-generated piece of text could make the text appear as if it were written by a human. Today, I took a much more typical approach: a graph-based approach to natural language generation.

A graph can be built up from a piece of text if distinct words are considered to be the nodes of the graph, and two nodes n1 and n2 are adjacent in the graph if word n2 appears directly after word n1. Since the same word may appear after another word more than a single time in a piece of text, this becomes a weighted graph, with the weights equaling the number of times n2 appears after n1.
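As a rough illustration of that construction, here is a minimal sketch that stores the graph as a dictionary of counters rather than as node objects. The function name build_word_graph and the toy sentence are made up for this example, and splitting on whitespace is an assumption standing in for a real tokenizer.

```python
from collections import Counter, defaultdict


def build_word_graph(words):
    """Map each word to a Counter of the words that directly follow it."""
    graph = defaultdict(Counter)
    # Pair every word with its successor; each pair adds 1 to that edge's weight.
    for current_word, next_word in zip(words, words[1:]):
        graph[current_word][next_word] += 1
    return graph


# Toy input: whitespace splitting is only a stand-in for a real tokenizer.
words = "the cat sat on the mat and the cat slept".split()
graph = build_word_graph(words)
```

In this toy sentence "cat" follows "the" twice and "mat" follows it once, so the edge from "the" to "cat" carries the larger weight.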

One can generate new text by selecting a node to start with (perhaps randomly) and then traversing the graph. Each new word can be generated either by selecting the adjacent node with maximum weight or by choosing an adjacent node at random. In experimenting, I've found that choosing the new word at random 1% of the time produces output that is both interestingly varied and distinctly human.
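The traversal could be sketched roughly as follows. This is an illustration under stated assumptions, not the exact code behind the results: the names generate_words and explore_prob are hypothetical, the graph is assumed to be a dictionary mapping each word to a dictionary of follower counts, and the random step is assumed to pick uniformly among the adjacent words.

```python
import random


def generate_words(graph, start, max_words, explore_prob=0.01, rng=None):
    """Walk the graph from `start`, usually following the heaviest edge."""
    rng = rng or random.Random()
    word = start
    output = [word]
    while len(output) < max_words:
        followers = graph.get(word)
        if not followers:
            # Dead end: this word was never followed by anything in the corpus.
            break
        if rng.random() < explore_prob:
            # Assumption: the random step is uniform over the adjacent words.
            word = rng.choice(sorted(followers))
        else:
            # Otherwise take the adjacent word with the maximum weight.
            word = max(followers, key=followers.get)
        output.append(word)
    return output


# Hypothetical toy graph: word -> {following word: count}.
toy_graph = {
    "the": {"cat": 2, "mat": 1},
    "cat": {"sat": 1},
    "sat": {"on": 1},
    "on": {"the": 1},
}
greedy_words = generate_words(toy_graph, "the", 6, explore_prob=0.0)
```

With explore_prob set to 0 the walk is fully greedy and quickly falls into a loop, which is one reason a small amount of randomness helps.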

To ensure that each node has adjacent nodes, one needs a sufficiently large corpus of text. Making use of the excellent NLTK corpora, I built the word graph from the works of Jane Austen, the authors of the King James Bible, William Blake, Sara Cone Bryant, Thornton W. Burgess, Lewis Carroll, G.K. Chesterton, Maria Edgeworth, Herman Melville, John Milton, William Shakespeare, and Walt Whitman. For fun, I also included the texts of all previous US presidents' inaugural addresses.

I then limited the output text to 140 characters to see what it would be like if all those authors decided to work together to tweet.
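One simple way to apply such a limit is to keep adding whole words until the next one would push the text past 140 characters. The sketch below assumes that approach; the helper name fit_to_limit is hypothetical, and stopping at a word boundary (rather than cutting mid-word) is an assumption.

```python
def fit_to_limit(words, limit=140):
    """Join words with spaces, stopping before the text would exceed `limit`."""
    kept = []
    length = 0
    for word in words:
        # Every word after the first also costs one separating space.
        extra = len(word) + (1 if kept else 0)
        if length + extra > limit:
            break
        kept.append(word)
        length += extra
    return " ".join(kept)


short_text = fit_to_limit(["aaa", "bbb", "ccc"], limit=7)
tweet = fit_to_limit(["word"] * 100)
```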

Here are some example tweets:

mathematician Gentlemen he was a very much as the same time to be the earth and all that he said unto the world and a good and in a few minutes

cruised on the other side of my dear I am sure I have been the first and said the Lord GOD Behold I was the sea and that the whole of it was not

Unscrew the man of our God and of this day of thy God hath not be in his hand of him and with the day and it is the most of that I shall be so

The next question is where one could obtain a modern corpus that would enable replication of the speech of today. Twitter, with its amazingly privacy-free API, could offer this. I may never have to speak again. I'll just generate a new sentence every time it's needed. 🙂 Since conversations also exist on Twitter, it may be possible to use Twitter to understand and then generate specific text given a context. Who knows?

Anyway, here's some graph building code:

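def build_graph(self, text):
    # Walk the tokens in order, adding an edge from each word to its successor.
    last_word = None
    for w in self.tokenizer.tokenize(text):
        # Create a node the first time a word is seen.
        if w not in self.graph:
            self.graph[w] = Node(w)
        # Link the previous word to the current one (skipped for the first token).
        if last_word is not None:
            self.graph[last_word].addEdgeToNode(str(self.graph[w].node_id))
        last_word = w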

Enjoy!

Posted in Random

PB+J: Origins

After his discovery of Quantum Mechanics, one would expect Werner Heisenberg to have been catapulted onto the world stage of physics research, but physicists were very reluctant to accept it. When Erwin Schrödinger published his approach to Quantum Mechanics, physicists around the world reveled in his glory, pronounced him “King of Physics” and dismissed Heisenberg as a radical. Schrödinger was a rather ostentatious man, so he invited Heisenberg to his coronation ceremony so he could dispute him in front of thousands of physicists. As could be expected, the event did not go well for Heisenberg: despite his proof of the equivalence of the matrix and wave mechanics approaches, he was ridiculed by his peers. Heisenberg was enraged; he decided to abandon physics to pursue his hidden passion for cooking.

Two years after being shunned by the scientific community, Heisenberg began an experiment to discern the effect of different materials on sandwich taste. His attempts with Gouda proved futile, as he could never accept the “Copenhagen interpretation” of the sandwich. After days of failures, Heisenberg said to himself, “What if I try to relate two seemingly unrelated quantities: peanut butter and jelly?” He tried over and over, again and again, and discovered something quite remarkable which became his lasting legacy: “The more precisely I apply the peanut butter, the less precisely I apply the jelly, and vice versa.” Reluctant at first, Heisenberg eventually published his results and the cooking community would never be the same. Never again would peanut butter and jelly be considered separate quantities.

Posted in Random

Leave no stone un-philosophized

Hopefully everyone likes the Never Lose icon. Underlying these simple images is great meaning. I think…

I’d like to preface this post by explaining that my original intent for using glasses, a book, and a car was that these were items which are either often displaced or for which Never Lose could be of great help in tracking. Only later did I realize that inadvertently, I had imbued some interesting relationships into the icon.

Let’s go in order:

(1) Glasses – it is quite obvious that these are green-tinted glasses. Why green-tinted? I thought they looked nice. Readers of Ralph Ellison’s incredible work – Invisible Man – must now think of the green-tinted sunglasses of the Rinehart character, which the nameless narrator wears in an effort to hide his own identity, allowing him to take on the life of another. Once the narrator took off Rinehart’s sunglasses, he was forced to face again the reality of his own existence, and his own lack of identity.

(2) The book – given the green-tinted glasses, this book is clearly Invisible Man.

(3) The car – another character integral to Invisible Man is Dr. Bledsoe – a man whom the narrator initially sought to emulate for his materialistic success – “he was the possessor of not one, but two Cadillacs.” Dr. Bledsoe’s life, identity, and character were founded on his materialistic success – he valued his life by his possession of items. For Bledsoe, losing his valuables would mean losing his identity. Therefore, Dr. Bledsoe may have found solace knowing a technology such as Never Lose existed – for it would help him ensure he would never lose track of his existence.

(4) The question mark – the central question of Invisible Man is undoubtedly the question of identity and existence – “Who am I?” The question mark in this icon is, therefore, a challenge – posed to you, the user, by me, the creator. This challenge takes many forms – will you use this app honestly? Will you enjoy it? And ultimately, will you define your identity by your possessions, or can you lead a meaningful life without the pursuit of materialism?

Either way, if you see this post prior to downloading the app, you should definitely download it – it’s really very useful for tracking and finding your valuables, and if you see this post after already downloading the app, then thank you and I hope you enjoy it!

Posted in Random

Hello World

Hello,

I’m Benjamin Englard.

Goodbye

Posted in Random