2017 STARDATAAStarCraftAIResearchDat

(Lin, Gehring et al., 2017) ⇒ Zeming Lin, Jonas Gehring, Vasil Khalidov, and Gabriel Synnaeve. (2017). “STARDATA: A StarCraft AI Research Dataset.” In: Proceedings of AIIDE-2017.

Subject Headings: STARDATA Dataset, StarCraft.

Notes

https://aaai.org/ocs/index.php/AIIDE/AIIDE17/paper/view/15837

Cited By

http://scholar.google.com/scholar?q=%222017%22+STARDATA%3A+A+StarCraft+AI+Research+Dataset

Quotes

Abstract

We release a dataset of 65646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft. The game state data was recorded every 3 frames, which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imitation learning, forward modeling, partial information extraction, and others. We use TorchCraft to extract and store the data, which standardizes the data format for both reading from replays and reading directly from the game. Furthermore, the data can be used on different operating systems and platforms. The dataset contains valid, non-corrupted replays only, and its quality and diversity were ensured by a number of heuristics. We illustrate the diversity of the data with various statistics and provide examples of tasks that benefit from the dataset. We make the dataset available at https://github.com/TorchCraft/StarData . En Taro Adun!
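To give a feel for the scale of these figures, the short Python sketch below turns the totals quoted in the abstract into rough per-replay averages. The 42 ms frame duration (StarCraft's "fastest" speed setting) is an assumption that the abstract does not state, as is the reading of "1535 million frames" as game frames; the variable names are illustrative only.

```python
# Totals quoted in the abstract.
REPLAYS = 65_646
FRAMES = 1_535_000_000      # "1535 million frames"
ACTIONS = 496_000_000       # "496 million player actions"
SAMPLE_EVERY = 3            # game state recorded every 3 frames

# Assumption (not from the abstract): one game frame lasts 42 ms at the
# "fastest" speed setting, i.e. roughly 23.8 game frames per second.
FRAME_MS = 42

# Simple averages over the whole dataset.
frames_per_replay = FRAMES / REPLAYS
actions_per_replay = ACTIONS / REPLAYS

# Average game length, assuming the frame total counts game frames.
minutes_per_replay = frames_per_replay * FRAME_MS / 1000 / 60

# Rate at which full game states are stored when sampling every 3rd frame.
stored_states_per_second = (1000 / FRAME_MS) / SAMPLE_EVERY

print(f"frames per replay:        {frames_per_replay:,.0f}")
print(f"actions per replay:       {actions_per_replay:,.0f}")
print(f"minutes per replay:       {minutes_per_replay:.1f}")
print(f"stored states per second: {stored_states_per_second:.2f}")
```

Under these assumptions the stored-state rate comes out just under 8 per second, which is consistent with the "about 8 frames per second" figure given in the introduction.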

Introduction

Real-time strategy (RTS) games are attracting substantial attention as an AI research problem (Ontanon et al. 2013; Usunier et al. 2016; Peng et al. 2017) due to their complex game dynamics, partial observability, and existing expert games in the form of human replays. These games are a good test bed for various reinforcement learning algorithms on a domain with higher complexity than toy robotics tasks and turn-based board games. Due to recent advances in deep learning, we see a trend of improved model performance with larger datasets. As the learning capacity of these models increases, there is a growing need for data, especially in order to apply deep learning methods to control in RTS games.

Although learning in StarCraft can be performed through playing, the dynamics of the game are extremely complex, and it is beneficial to speed up learning by using existing games. The availability of datasets of recorded games between experienced players is therefore desirable.

StarCraft allows one to record replays of games which contain all commands issued by players. A number of online resources contain collections of replays from various tournaments (see Table 1). Some information can be directly inferred from the replay file; however, reconstructing the full game state requires playback in StarCraft.

There are several aspects that make it difficult to use the replays directly for machine learning purposes. Firstly, the reconstruction speed of StarCraft is limited and would impose an upper bound on training speed. Secondly, incompatibility between replays produced by different StarCraft versions makes it impossible to use the same game engine for all the replays, or might result in corrupted data. Finally, the reconstruction process can only be reliably run on Windows, which adds unnecessary restrictions. Hence, the utility of a replay dataset can be increased by extracting game states, validating them, and storing them as a separate dataset.

For a dataset to serve as a good base for learning models, it should fulfill a number of requirements:

Universality: the data stored in the dataset can be used to learn different aspects of game strategy and at different levels. Thus the dataset should provide data which is not specific to any particular context and should be as close to the full game state as possible.

Diversity: the dataset should cover a variety of game scenarios in terms of match-ups, maps, player strategies, etc.

Validity: the dataset should be representative of the distribution of StarCraft matches where both sides are trying to win.

Interfacing: one should be able to easily substitute game states received from the game engine with game states recorded in the dataset.

Portability: dataset access should be supported on a variety of platforms and operating systems.

With these requirements in mind, we constructed a new dataset of StarCraft replays from games among humans that can be used for StarCraft AI research. The following are our major contributions.

We provide a large set of StarCraft human replays, which is about 10x bigger than any of the comparable datasets currently available. The dataset includes a variety of scenarios and thus ensures the diversity requirement. Detailed statistics on matchups, maps etc. can be found in further sections. All replays are checked for playability in StarCraft and BWAPI. We used additional scripted rule-based checks for corruption to fulfill the validity requirement.

The dataset is stored in a format that can be read by TorchCraft (Synnaeve et al. 2016), a library used as an interface between scientific computing frameworks and StarCraft. One can use exactly the same code to read data from the dataset and control StarCraft. This satisfies both the interfacing and portability requirements, since TorchCraft has clients in C++, Lua, and Python, and can be compiled easily on any operating system.

For each replay in the dataset, the complete game state is stored every 3 frames (about 8 frames per second). This means that one can employ the dataset to learn different aspects of the game strategy, from the micro level to the macro level, which fulfills the universality requirement.

The current paper is structured as follows.

…
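The interfacing requirement described in the introduction (the same code reading game states from the dataset or from the running game) can be pictured with a small Python sketch. Everything here is hypothetical and deliberately simplified: `FrameSource`, `RecordedSource`, `LiveSource`, and the dictionary-based frame are illustrative names invented for this example, not TorchCraft's actual API or data format.

```python
from typing import Dict, Iterable, Iterator, Protocol

# Hypothetical, heavily simplified stand-in for a full game state.
Frame = Dict[str, int]


class FrameSource(Protocol):
    """Anything that yields game states in order (hypothetical interface)."""

    def frames(self) -> Iterator[Frame]: ...


class RecordedSource:
    """Plays back states stored ahead of time, e.g. read from a dataset."""

    def __init__(self, stored: Iterable[Frame]) -> None:
        self._stored = list(stored)

    def frames(self) -> Iterator[Frame]:
        return iter(self._stored)


class LiveSource:
    """Stand-in for a running game; it only simulates a growing unit count."""

    def __init__(self, n_frames: int) -> None:
        self._n = n_frames

    def frames(self) -> Iterator[Frame]:
        for i in range(self._n):
            # One state every 3 game frames, mirroring the sampling rate.
            yield {"frame": i * 3, "units": 4 + i}


def peak_unit_count(source: FrameSource) -> int:
    """Consumer code that does not care where the frames come from."""
    return max(frame["units"] for frame in source.frames())


recorded = RecordedSource([{"frame": 0, "units": 4}, {"frame": 3, "units": 6}])
print(peak_unit_count(recorded))
print(peak_unit_count(LiveSource(5)))
```

The point of the design is that `peak_unit_count` is written once against the shared interface, so a model developed on recorded states could in principle be pointed at a live game without changes to the consuming code.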

References

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2017 STARDATAAStarCraftAIResearchDat	Jonas Gehring Zeming Lin Vasil Khalidov Gabriel Synnaeve			STARDATA: A StarCraft AI Research Dataset						2017