GM-RKB XML Snapshot File
A GM-RKB XML Snapshot File is a MediaWiki XML Data Snapshot File that contains a snapshot of the data in the GM-RKB (Gabor Melli's Research Knowledge Base).
- Context:
- It can be used for GM-RKB Maintenance and GM-RKB Analysis.
- It can be processed by a gmrkb_xml_snapshot_processor.py.
- …
- Example(s):
- rkb-mediawiki-20230604-1206.xml.
- …
- Counter-Example(s):
- Wikipedia XML Data Snapshot, such as enwiki-latest-pages-articles.xml.
- Non-XML Export File.
- XML files not adhering to the MediaWiki Wiki Export File Format.
- See: GM-RKB.
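The counter-examples above separate MediaWiki exports from other XML files. One rough way to tell them apart is to look only at the root element. The sketch below assumes that a MediaWiki export has a root `mediawiki` element in a namespace of the form `http://www.mediawiki.org/xml/export-<version>/` (the 0.10 form appears in the script on this page; other version suffixes are an assumption). The function name `looks_like_mediawiki_export` and the `SAMPLE` document are hypothetical and used only for illustration.

```python
import io
import re
from xml.etree import ElementTree

# Assumed shape of the root tag: {http://www.mediawiki.org/xml/export-<version>/}mediawiki
EXPORT_ROOT_PATTERN = re.compile(
    r"^\{http://www\.mediawiki\.org/xml/export-[\d.]+/\}mediawiki$"
)

def looks_like_mediawiki_export(binary_file):
    """Return True if the root element of an open binary file is a namespaced <mediawiki>."""
    try:
        # The first "start" event is the root element, so we return after seeing it.
        for _event, element in ElementTree.iterparse(binary_file, events=("start",)):
            return bool(EXPORT_ROOT_PATTERN.match(element.tag))
    except ElementTree.ParseError:
        # Not well-formed XML at all (for example, a Non-XML Export File).
        return False
    return False

# Hypothetical miniature snapshot, used only to exercise the check.
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b'<page><title>GM-RKB</title></page>'
    b'</mediawiki>'
)

print(looks_like_mediawiki_export(io.BytesIO(SAMPLE)))
```

For a file on disk, the same function can be called as `looks_like_mediawiki_export(open_file)` inside a `with open(path, "rb") as open_file:` block. This checks only the root element, not the full export schema.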
References
2023
- chat
import json
from xml.etree import ElementTree

# Introduction: This program extracts the titles and contents of pages from a given XML file.
# It then formats the data into a JSON file that is ready to be uploaded to a specified destination.

# XML namespace used by the page elements; adjust it if the snapshot's root element declares another version.
NS = '{http://www.mediawiki.org/xml/export-0.10/}'

def extract_pages(xml_file):
    # Parse the XML file
    tree = ElementTree.parse(xml_file)
    root = tree.getroot()
    # Initialize a list to hold the extracted pages
    pages = []
    # Iterate through each page element in the XML file
    for page in root.iter(NS + 'page'):
        # Extract the title and content of the page
        title = page.find(NS + 'title').text
        text_element = page.find('.//' + NS + 'text')
        # A page without a <text> element gets None instead of raising AttributeError
        content = text_element.text if text_element is not None else None
        # Append the title and content as a dictionary to the pages list
        pages.append({
            'title': title,
            'content': content
        })
    return pages

# Specify the XML file to extract from
xml_file = 'rkb-mediawiki-20230604-1206.xml'
# Extract the pages from the XML file
pages = extract_pages(xml_file)
# Create the JSON object to be uploaded
data_to_upload = {"value": pages}
# Write to the JSON file (UTF-8, because ensure_ascii=False emits non-ASCII characters as-is)
with open('data_to_upload.json', 'w', encoding='utf-8') as json_file:
    json.dump(data_to_upload, json_file, ensure_ascii=False, indent=4)
# Print a message to indicate success
print("File successfully created.")
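The script above loads the whole snapshot into memory with `ElementTree.parse`, which may be impractical for a large file. A streaming variant can be sketched with `ElementTree.iterparse`, as below. This is an illustration under stated assumptions, not the behavior of gmrkb_xml_snapshot_processor.py: the names `local_name`, `iter_pages`, and `SAMPLE` are hypothetical, element names are matched without their namespace so the export version does not need to be hardcoded, and empty page text is returned as an empty string.

```python
import io
from xml.etree import ElementTree

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(source):
    """Yield one {'title', 'content'} dict per <page>, given a path or a binary file object."""
    for _event, element in ElementTree.iterparse(source, events=("end",)):
        if local_name(element.tag) != "page":
            continue
        title, content = None, None
        # Take the first <title> and the first <text> found anywhere under the page.
        for child in element.iter():
            name = local_name(child.tag)
            if name == "title" and title is None:
                title = child.text
            elif name == "text" and content is None:
                content = child.text
        yield {"title": title, "content": content or ""}
        # Release the page's text and children; the emptied element stays attached to the root.
        element.clear()

# Hypothetical miniature snapshot, used only to exercise the generator.
SAMPLE = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b'<page><title>GM-RKB</title><revision><text>A knowledge base.</text></revision></page>'
    b'<page><title>Empty Page</title><revision><text/></revision></page>'
    b'</mediawiki>'
)

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

Because `iter_pages` is a generator, each page can be written out or uploaded as soon as it is produced, rather than after the full list has been built.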