To get a zip archive of Russian poetry, visit:
https://s3-us-west-2.amazonaws.com/lab-workshops/russian-poetry.zip
Then unzip the archive and move the extracted directory to your Desktop to make it easier to find.
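On Mac or Linux, the download can also be done from a terminal. The following is a minimal sketch, assuming that curl and unzip are installed and that your Desktop lives at $HOME/Desktop; the names DEST, ZIP_NAME, and fetch_corpus are made up for this example. It does not check what the archive contains.

```shell
# URL taken from the text above; DEST is an assumed location.
ZIP_URL="https://s3-us-west-2.amazonaws.com/lab-workshops/russian-poetry.zip"
ZIP_NAME="${ZIP_URL##*/}"   # last path component of the URL
DEST="$HOME/Desktop"

# Download the archive into DEST and extract it there.
# -L follows redirects; unzip -o overwrites existing files without asking.
fetch_corpus() {
  curl -L -o "$DEST/$ZIP_NAME" "$ZIP_URL" &&
    unzip -o "$DEST/$ZIP_NAME" -d "$DEST"
}
```

Nothing is downloaded until you call fetch_corpus yourself, which requires network access.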
-
Install the Anaconda Python 2.7 Distribution: click on "Graphical Installer" under Python 2.7.
-
Open a Terminal (Mac and Linux) or PowerShell (Windows).
-
Run
pip install --pre topicexplorer -i https://inpho.cogs.indiana.edu/pypi/
Note: --pre has two - characters. When the 1.0 release happens, --pre will no longer be necessary.
-
Test installation by typing
topicexplorer -h
to print usage instructions.
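If the usage instructions do not appear, it can help to check whether the command is on your PATH at all. This is a small sketch for Mac and Linux terminals (not PowerShell); have_cmd is a hypothetical helper name, and the PATH hint is only a common cause, not a diagnosis.

```shell
# Succeed if the named command can be found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd topicexplorer; then
  echo "topicexplorer found at: $(command -v topicexplorer)"
else
  echo "topicexplorer not found; check that your Anaconda install is on the PATH"
fi
```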
-
Initialize the Topic Explorer on a file, folder of text files, or folder of folders:
topicexplorer init insertPathToCorpus #for example: topicexplorer init Desktop/wikimedia_russian_texts/txt
When prompted, name your corpus (ex: wikimedia_russian_texts).
The init command will generate a configuration file, referred to as CONFIG in the commands below.
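Before running init, it may be worth confirming that the corpus path points where you expect. The sketch below counts text files in a folder and its subfolders. It assumes the files end in .txt and uses a hypothetical absolute form of the example path; count_texts and CORPUS are names invented for this example.

```shell
# Count the .txt files under a folder, including any subfolders.
count_texts() {
  find "$1" -type f -name '*.txt' | wc -l | tr -d ' '
}

# Hypothetical path modeled on the example above; adjust to your corpus.
CORPUS="$HOME/Desktop/wikimedia_russian_texts/txt"
if [ -d "$CORPUS" ]; then
  echo "$(count_texts "$CORPUS") .txt files under $CORPUS"
else
  echo "No such folder: $CORPUS"
fi
```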
-
Set a min and max frequency for word occurrence:
topicexplorer prep insertPathToCorpus #for example: topicexplorer prep Desktop/wikimedia_russian_texts/txt
When prompted, enter the maximum frequency (ex: 500).
When prompted, enter the minimum frequency (ex: 1).
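To get a feel for what a frequency cutoff does, the sketch below counts whitespace-separated tokens across some files and keeps those whose total count falls between a minimum and a maximum. It is only an illustration with standard Unix tools, not how topicexplorer tokenizes or filters; treating both bounds as inclusive is an assumption, and filter_by_frequency is a name invented for this example.

```shell
# Print "count word" for tokens whose total count lies in [MIN, MAX].
# Usage: filter_by_frequency MIN MAX FILE...
# Tokens are split on whitespace only; LC_ALL=C makes sort and uniq
# compare raw bytes, so distinct non-ASCII words are never merged.
filter_by_frequency() {
  min="$1"
  max="$2"
  shift 2
  cat "$@" |
    tr -s '[:space:]' '\n' |
    grep -v '^$' |
    LC_ALL=C sort |
    LC_ALL=C uniq -c |
    awk -v lo="$min" -v hi="$max" '$1 >= lo && $1 <= hi { print $1, $2 }'
}
```

For example, `filter_by_frequency 1 500 Desktop/wikimedia_russian_texts/txt/*.txt` would list the tokens that survive the example cutoffs, assuming the files sit directly in that folder.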
-
Train LDA models using the on-screen instructions:
topicexplorer train CONFIG #for example: topicexplorer train Desktop/wikimedia_russian_texts/txt.ini
When prompted, specify the number of topics you would like (defaults: 20, 40, 60, 80).
When prompted, specify the number of training iterations (default: 200).
-
Launch the Topic Explorer:
topicexplorer launch CONFIG #for example: topicexplorer launch Desktop/wikimedia_russian_texts/txt.ini
-
Press Ctrl+C to quit all servers.