
Unlocking data synthesis with a conditional generator


Experiments

We conducted experiments on four datasets: three correspond to downstream generative tasks and one to a classification task. Generative tasks are typically harder than classification tasks. This is because generative tasks are evaluated by next-token prediction accuracy, which requires the synthetic data to preserve fine-grained textual information from the private data. In contrast, classification tasks only require preserving the co-occurrence patterns between labels and words in the private data.

The three generative tasks were chosen to cover a diverse set of practical scenarios: PubMed (medical paper abstracts), Chatbot Arena (human-to-machine interactions), and Multi-Session Chat (human-to-human daily dialogues). To evaluate the quality of the generated synthetic data, we followed the setup of Aug-PE: we train a small downstream language model on the synthetic data and then compute its next-token prediction accuracy on the real test data.
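The train-on-synthetic, test-on-real protocol described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and uses a toy bigram counter as a stand-in for the small downstream language model; the function names and example sentences are hypothetical and are not taken from the Aug-PE setup.

```python
from collections import Counter, defaultdict

def train_bigram(texts):
    """Count next-token frequencies per token (toy stand-in for a small LM)."""
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_accuracy(counts, texts):
    """Fraction of positions where the most frequent next token equals the real one."""
    correct = total = 0
    for text in texts:
        tokens = text.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            total += 1
            seen = counts.get(prev)  # None if the token never appeared in training
            if seen and seen.most_common(1)[0][0] == nxt:
                correct += 1
    return correct / total if total else 0.0

# Hypothetical data: the model sees only synthetic text, and is scored on real text.
synthetic_train = ["the patient received the drug", "the drug reduced symptoms"]
real_test = ["the drug reduced pain"]

model = train_bigram(synthetic_train)
acc = next_token_accuracy(model, real_test)
print(f"next-token accuracy on real test data: {acc:.3f}")
```

The metric rewards synthetic data that reproduces the exact token sequences of the real data, which is why it is sensitive to fine-grained textual information.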

The classification task is performed on the OpenReview (academic paper reviews) dataset. To evaluate the quality of the generated synthetic data, we train a downstream classifier on the synthetic data and compute its classification accuracy on the real test data.
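As a rough illustration of this evaluation, the sketch below trains a small multinomial Naive Bayes classifier on synthetic examples and scores it on real ones. Naive Bayes is chosen here only because it relies directly on label-word co-occurrence counts; the source does not say which classifier was used, and the labels and review snippets are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label). Returns label counts, per-label word counts, vocab."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability under Laplace smoothing."""
    label_counts, word_counts, vocab = model
    n_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_examples)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    return sum(predict(model, text) == label for text, label in examples) / len(examples)

# Hypothetical data: train on synthetic reviews, evaluate on real reviews.
synthetic_train = [
    ("strong results and clear writing", "accept"),
    ("novel method with strong experiments", "accept"),
    ("weak baselines and unclear writing", "reject"),
    ("limited novelty and weak evaluation", "reject"),
]
real_test = [
    ("clear and strong experiments", "accept"),
    ("weak evaluation and unclear novelty", "reject"),
]

clf = train_naive_bayes(synthetic_train)
test_accuracy = accuracy(clf, real_test)
print(f"classification accuracy on real test data: {test_accuracy:.2f}")
```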

To mitigate concerns about data contamination, we carefully analyzed our selected datasets. Our analysis found no overlap between our pre-training data and the downstream datasets.
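The text does not describe how the overlap analysis was carried out. One common approach, shown here purely as an assumption, is to flag any downstream document that shares a word n-gram with the pre-training corpus. The function names, the n-gram length, and the example strings are all hypothetical.

```python
def ngrams(text, n):
    """Set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlapping_docs(pretrain_docs, downstream_docs, n=8):
    """Indices of downstream docs that share at least one n-gram with the pre-training docs."""
    seen = set()
    for doc in pretrain_docs:
        seen |= ngrams(doc, n)
    return [i for i, doc in enumerate(downstream_docs) if ngrams(doc, n) & seen]

# Hypothetical example with a short n-gram length so the overlap is visible.
pretrain = ["the quick brown fox jumps over the lazy dog"]
downstream = ["a quick brown fox jumps high", "completely unrelated sentence here"]
flagged = overlapping_docs(pretrain, downstream, n=3)
print(flagged)
```

An empty result from such a check would be consistent with the "no overlap" finding, although exact n-gram matching misses paraphrased or lightly edited copies.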
