This project aims to implement and improve upon the classical Chinese poetry generation system proposed in "Chinese Poetry Generation with Planning based Neural Network".
Python 2.7
TensorFlow 1.2.1
Jieba 0.38
Gensim 2.0.0
pypinyin 0.23
Network:
- Bidirectional encoder
- Attention decoder
Training and Predicting:
- Alignment boosted word2vec
- Data loading mode: only keywords (no preceding sentences)
- Data loading mode: reversed
- Data loading mode: aligned
- Training mode: ground truth
- Training mode: scheduled sampling
- Predicting mode: greedy
- Predicting mode: sampling
- Predicting mode: beam search
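The three predicting modes differ only in how the next token is chosen from the decoder's output distribution. The following is a minimal sketch of that difference, not the code in predict.py: `next_token_probs` is a hypothetical stand-in for one decoder step, and its probability table is made up for illustration.

```python
import math
import random

# Hypothetical toy "decoder step": maps the tokens generated so far to a
# next-token distribution. A real model would run the attention decoder here.
def next_token_probs(prefix):
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"x": 0.1, "y": 0.9},
    }
    return table[tuple(prefix)]

def greedy_decode(steps):
    """Always pick the single most probable next token."""
    prefix = []
    for _ in range(steps):
        probs = next_token_probs(prefix)
        prefix.append(max(probs, key=probs.get))
    return prefix

def sample_decode(steps, rng):
    """Draw each next token at random, weighted by its probability."""
    prefix = []
    for _ in range(steps):
        probs = next_token_probs(prefix)
        tokens = sorted(probs)
        r = rng.random()
        cumulative = 0.0
        choice = tokens[-1]  # fallback guards against rounding error
        for tok in tokens:
            cumulative += probs[tok]
            if r < cumulative:
                choice = tok
                break
        prefix.append(choice)
    return prefix

def beam_search_decode(steps, beam_width):
    """Keep the beam_width best partial sequences by total log probability."""
    beams = [(0.0, [])]  # (log probability, tokens)
    for _ in range(steps):
        candidates = []
        for logp, prefix in beams:
            for tok, p in next_token_probs(prefix).items():
                candidates.append((logp + math.log(p), prefix + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]

print(greedy_decode(2))          # ['a', 'x'], sequence probability 0.33
print(beam_search_decode(2, 2))  # ['b', 'y'], sequence probability 0.36
print(sample_decode(2, random.Random(0)))
```

In this toy table, greedy decoding commits to "a" and ends with a less probable sequence than beam search with width 2, which is the usual motivation for keeping several hypotheses alive.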
Refinement:
- Output refiner
- Reinforcement learning tuner
- Iterative polishing
Evaluation:
- Evaluation: rhyming
- Evaluation: tonal structure
- Evaluation: alignment score
- Evaluation: BLEU score
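For reference, a BLEU-style score can be sketched in a few lines. This is a simplified illustration (single reference, character-level n-grams up to bigrams, no smoothing) and is not necessarily how evaluate.py computes its score; the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with each
    n-gram's count clipped to its count in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / float(total)

def bleu(candidate, reference, max_n=2):
    """Geometric mean of modified precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - float(len(reference)) / len(candidate))
    return brevity_penalty * math.exp(log_mean)

print(bleu(list("abcd"), list("abcd")))  # identical sequences score 1.0
print(bleu(list("abcd"), list("abce")))  # one differing character lowers the score
```

Each poem line would be passed as a list of characters; the ASCII strings above only keep the example short.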
Data
data: directory for raw data, processed data, pre-processed starterkit data, and generated poetry samples
model: directory for saved neural network models
log: directory for training logs
notebooks: directory for exploratory/experimental IPython notebooks
training_scripts: directory for sample scripts used for training several basic models
Code
model.py: graph definition
train.py: training logic
predict.py: prediction logic
plan.py: keyword planning logic
main.py: user interaction program
To prepare training data:
python data_utils.py
Detail
This script does the following in order:
- Parse corpus
- Build vocab
- Filter quatrains
- Count words
- Rank words
- Generate training data
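Two of the steps above, filtering quatrains and building a vocabulary, can be illustrated with a small sketch. The helper names and the exact filtering rule (four lines of equal length, five or seven characters each) are assumptions for illustration and may differ from what data_utils.py does.

```python
# -*- coding: utf-8 -*-
from collections import Counter

def is_quatrain(lines):
    """Assumed rule: four lines of equal length, each 5 or 7 characters."""
    return (len(lines) == 4
            and len(set(len(line) for line in lines)) == 1
            and len(lines[0]) in (5, 7))

def build_vocab(poems):
    """Characters ordered by descending frequency, ties broken by character."""
    counts = Counter(ch for poem in poems for line in poem for ch in line)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ch for ch, _ in ordered]

corpus = [
    [u"床前明月光", u"疑是地上霜", u"举头望明月", u"低头思故乡"],
    [u"白日依山尽", u"黄河入海流"],  # only two lines, so it is filtered out
]

quatrains = [poem for poem in corpus if is_quatrain(poem)]
vocab = build_vocab(quatrains)
print(len(quatrains))  # 1
print(len(vocab))      # 17 distinct characters
```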
Note
The TextRank algorithm may take many hours to run.
You can interrupt the iterations and stop it early
once the progress shown in the terminal has remained stationary for a long time.
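Stopping once progress stalls is the manual version of a convergence threshold. The sketch below shows a TextRank-style iteration that stops automatically when no score changes by more than a tolerance. It is an illustration under assumed names and defaults (`textrank`, `tol`, `max_iters`), not the ranking code used by this project.

```python
def textrank(graph, damping=0.85, tol=1e-4, max_iters=200):
    """Rank words on an undirected weighted co-occurrence graph.

    graph: dict mapping word -> dict of neighbor -> positive weight, with
    every edge listed in both directions.
    Returns (scores, number of iterations actually run).
    """
    if not graph:
        return {}, 0
    scores = dict((w, 1.0) for w in graph)
    total_weight = dict((w, float(sum(graph[w].values()))) for w in graph)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_scores = {}
        for w in graph:
            # Each neighbor passes on a share of its score proportional
            # to the weight of the edge it shares with w.
            incoming = sum(graph[v][w] / total_weight[v] * scores[v]
                           for v in graph[w])
            new_scores[w] = (1.0 - damping) + damping * incoming
        delta = max(abs(new_scores[w] - scores[w]) for w in graph)
        scores = new_scores
        if delta < tol:  # scores have become stationary: stop early
            break
    return scores, iteration

toy_graph = {
    "moon": {"frost": 2, "home": 1},
    "frost": {"moon": 2},
    "home": {"moon": 1},
}
scores, iterations = textrank(toy_graph)
print(sorted(scores, key=scores.get, reverse=True))  # ['moon', 'frost', 'home']
```

A looser tolerance trades ranking precision for run time, which is the same trade-off made by interrupting the script by hand.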
Then, to generate the word embedding:
python word2vec.py
Alternative
As an alternative, we have also provided pre-processed data in the data/starterkit directory.
You may simply run cp data/starterkit/* data/processed to skip the data processing step.
To train the default model:
python train.py
To view the full list of configurable training parameters:
python train.py -h
Note
You should almost always train a new model after modifying any of the parameters.
Models are saved to model/ by default. To train a new model, you may either remove the existing model from model/
or specify a new model path during training with python train.py --model_dir :new_model:dir:
To start the user interaction program:
python main.py
Similarly, to view the full list of configurable prediction parameters:
python main.py -h
Note
The program currently does not check that the prediction parameters match the corresponding training parameters.
The user has to ensure, in particular, that the data loading modes correspond to the ones used during training
(e.g. if the training data is reversed and aligned, then the prediction input should also be reversed and aligned).
Otherwise, results may range from subtle differences in output to a total crash.
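Until such a check exists, one way to guard against mismatched modes is to store the training parameters next to the model and compare them before predicting. The sketch below is hypothetical: the file name params.json, the key names, and both helper functions are assumptions and are not part of this project's code.

```python
import json
import os
import tempfile

# Hypothetical names for the data loading modes that must match.
DATA_MODE_KEYS = ("reversed", "aligned", "keywords_only")

def save_training_params(model_dir, params):
    """Write the parameters used for training next to the saved model."""
    with open(os.path.join(model_dir, "params.json"), "w") as f:
        json.dump(params, f, sort_keys=True)

def find_mismatches(model_dir, predict_params):
    """Return {key: (trained value, prediction value)} for every differing mode."""
    with open(os.path.join(model_dir, "params.json")) as f:
        trained = json.load(f)
    return dict((key, (trained.get(key), predict_params.get(key)))
                for key in DATA_MODE_KEYS
                if trained.get(key) != predict_params.get(key))

model_dir = tempfile.mkdtemp()  # stands in for a real model directory
save_training_params(
    model_dir, {"reversed": True, "aligned": True, "keywords_only": False})
mismatches = find_mismatches(
    model_dir, {"reversed": True, "aligned": False, "keywords_only": False})
print(mismatches)  # {'aligned': (True, False)}
```

A prediction script could refuse to run, or print a warning, whenever the returned dictionary is not empty.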
To generate sample poems for evaluation:
python generate_samples.py
Detail
By default, this script randomly samples 4000 poems from the training data and saves them as humanpoems. It then uses each entire poem as input to the planner to create keywords for the predictor. The predicted poems are saved as machinepoems.
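The flow described above can be sketched as follows. The planner and predictor here are toy stand-ins passed in as plain functions, and `generate_samples` is a hypothetical helper, not the interface of generate_samples.py.

```python
import random

def generate_samples(poems, planner, predictor, sample_size=4000, seed=0):
    """Sample human poems, then predict one machine poem per human poem.

    planner: function mapping an entire poem to a list of keywords.
    predictor: function mapping a list of keywords to a generated poem.
    """
    rng = random.Random(seed)
    human = rng.sample(poems, min(sample_size, len(poems)))
    machine = [predictor(planner(poem)) for poem in human]
    return human, machine

# Toy stand-ins for illustration only.
def toy_planner(poem):
    return [line[0] for line in poem]      # first token of each line

def toy_predictor(keywords):
    return [kw * 5 for kw in keywords]     # repeat each keyword five times

poems = [
    ["abcde", "fghij", "klmno", "pqrst"],
    ["uvwxy", "zabcd", "efghi", "jklmn"],
    ["opqrs", "tuvwx", "yzabc", "defgh"],
]
human, machine = generate_samples(poems, toy_planner, toy_predictor, sample_size=2)
print(len(human), len(machine))
```

Because each machine poem is planned from a sampled human poem, the two lists stay paired index by index, which keeps later comparisons between them straightforward.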
To evaluate the generated poems:
python evaluate.py
Auxiliary
- "Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks"
- "Sequence-to-Sequence Learning as Beam-Search Optimization"
- "Tuning Recurrent Neural Networks with Reinforcement Learning"
- "Deep Reinforcement Learning for Dialogue Generation"
Poetry Generation
- May 10, 2017: "Flexible and Creative Chinese Poetry Generation Using Neural Memory"
- Dec 7, 2016: "Chinese Poetry Generation with Planning based Neural Network"
- June 19, 2016: "Can Machine Generate Traditional Chinese Poetry? A Feigenbaum Test"
- The data processing source code is based on DevinZ1993's implementation.
- The neural network implementation is inspired by JayParks's work.
