Handle Yahoo US collector API limit issue (Fix #1953) #1970
base: main
@microsoft-github-policy-service agree
    symbols.extend(page_symbols)
    page += 1
    time.sleep(0.01)
except:
Can we use an explicit type of exception?
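Inside that `try`, the realistic failures are a transport error from `requests.get`, a non-JSON body from `resp.json()`, and a `None` where a dict was expected. Below is a minimal, offline sketch of the parsing half using only the standard library. The function name `parse_page_symbols` is hypothetical. The idea that `"data"` may be `null` past the last page is an assumption, not something the PR confirms.

```python
import json
from typing import Any, List


def parse_page_symbols(body: str) -> List[str]:
    """Return the f12 (ticker) values from one page of a clist JSON response.

    Only malformed JSON is treated as an error; an empty or missing "diff"
    yields [], which a paging loop can read as "no more pages".
    """
    try:
        payload: Any = json.loads(body)
    except json.JSONDecodeError as exc:  # JSONDecodeError subclasses ValueError
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    # Guard each level: "data" may be null (assumed to happen past the last page).
    data = payload.get("data") if isinstance(payload, dict) else None
    diff = data.get("diff") if isinstance(data, dict) else None
    if not diff:
        return []
    # The PR reads diff.values() (a dict keyed "0", "1", ...); accept a list as well.
    rows = diff.values() if isinstance(diff, dict) else diff
    return [row["f12"] for row in rows if isinstance(row, dict) and "f12" in row]
```

On the network side, a narrow replacement for the bare `except:` in the PR's loop could be `except (requests.RequestException, ValueError)`. `requests` raises `RequestException` subclasses for connection errors and timeouts, and `resp.json()` raises a `ValueError` subclass when the body is not valid JSON.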
for page in 120 121 122; do
  curl -s -L "http://4.push2.eastmoney.com/api/qt/clist/get?pn=$page&pz=100&fs=m:105,m:106,m:107&fields=f12" | jq '.data.diff | length'
done
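The curl loop probes pages 120 to 122, presumably to see where the listing runs out at 100 rows per page. The sketch below builds the same URL from a params dict. That dict is only one guess at what the `params` variable in the PR's `requests.get(url, headers=headers, params=params, timeout=10)` might hold. It also uses ceiling division to predict the last non-empty page for a given symbol count. `page_params`, `page_url`, and `pages_needed` are hypothetical helpers.

```python
from urllib.parse import urlencode

BASE_URL = "http://4.push2.eastmoney.com/api/qt/clist/get"


def page_params(page: int, page_size: int = 100) -> dict:
    """Query parameters for one page of the US listing (same fields as the curl loop)."""
    return {
        "pn": page,                 # 1-based page number
        "pz": page_size,            # page size; the API now caps this at 100
        "fs": "m:105,m:106,m:107",  # market filter used by the collector
        "fields": "f12",            # f12 = ticker symbol
    }


def page_url(page: int, page_size: int = 100) -> str:
    # Keep ':' and ',' unescaped so the URL matches the curl command verbatim.
    return f"{BASE_URL}?{urlencode(page_params(page, page_size), safe=':,')}"


def pages_needed(total: int, page_size: int = 100) -> int:
    # Ceiling division: index of the last non-empty page for `total` rows.
    return -(-total // page_size)
```

If the listing held N symbols, `pages_needed(N)` would be the last page that returns rows. Every later page should come back empty, which is the condition the paging loop stops on.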
if len(symbols) < 8000:
    raise ValueError("request error")
try:
    resp = requests.get(url, headers=headers, params=params, timeout=10)
    if resp.status_code != 200:
        break
    data = resp.json()
    diff = data.get("data", {}).get("diff")
    if not diff:
        break
    page_symbols = [v["f12"].replace("_", "-P") for v in diff.values() if "f12" in v]
    if not page_symbols:
        break
    symbols.extend(page_symbols)
    page += 1
    time.sleep(0.01)
except:
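The comprehension appears meant to turn `_` in eastmoney tickers into `-P`. For example, `BAC_K` would become `BAC-PK`; that ticker is an illustrative assumption. The pattern has to be the literal underscore, because Python's `str.replace` with an empty pattern inserts the replacement at every position. The helper name `to_yahoo_symbol` is hypothetical.

```python
def to_yahoo_symbol(eastmoney_code: str) -> str:
    """Map an eastmoney ticker to the collector's format: '_' becomes '-P'.

    Note: replacing "" instead of "_" would insert "-P" between every
    character, e.g. "AB" -> "-PA-PB-P".
    """
    return eastmoney_code.replace("_", "-P")
```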
def _get_eastmoney():
    url = "http://4.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=10000&fs=m:105,m:106,m:107&fields=f12"
    resp = requests.get(url, timeout=None)
    if resp.status_code != 200:
        raise ValueError("request error")
    symbols = []
    page = 1
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
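The setup above (`symbols = []`, `page = 1`, a `User-Agent` header) feeds a page-by-page loop. Here is a sketch of that loop under a few stated assumptions:

- The network call is injected as a hypothetical `fetch_page(page)` callable that returns the tickers on one page, or an empty list past the end.
- `max_pages` is an added safety cap that the PR does not have.
- The 8000-symbol threshold is kept from the original check.

`collect_all_symbols` is a hypothetical name.

```python
import time
from typing import Callable, List


def collect_all_symbols(
    fetch_page: Callable[[int], List[str]],
    min_expected: int = 8000,
    max_pages: int = 500,
    pause: float = 0.01,
) -> List[str]:
    """Walk pages 1, 2, ... until a page comes back empty, then sanity-check the count."""
    symbols: List[str] = []
    # Hard cap so a misbehaving API cannot keep the loop running forever.
    for page in range(1, max_pages + 1):
        page_symbols = fetch_page(page)
        if not page_symbols:
            break  # an empty page marks the end of the listing
        symbols.extend(page_symbols)
        time.sleep(pause)  # small delay between requests, as in the PR
    # Keep the original guard: too few symbols means the download went wrong.
    if len(symbols) < min_expected:
        raise ValueError(f"request error: only {len(symbols)} symbols collected")
    return symbols
```

In the collector, `fetch_page` would wrap the `requests.get(url, headers=headers, params=params, timeout=10)` call plus JSON parsing. Keeping it injectable lets the loop be exercised offline, for instance with a dict of fake pages and `pause=0`.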
Fix eastmoney API pagination for US stock data collection
Description
Fixed a pagination issue in the `_get_eastmoney()` function that caused a "request error" when collecting US stock symbols. Instead of requesting 10,000 symbols in a single page, the function now paginates with 100 symbols per page and iterates until a page comes back empty.
Changes made:
- Changed the `pz` parameter (page size) from `10000` to `100`.

Motivation and Context
Related Issue: the eastmoney API now limits the page size to a maximum of 100 symbols, but the original code tried to fetch 10,000 symbols in a single request.
Problem: Running `python collector.py download_data --source_dir ~/.qlib/stock_data/source/us_data --start 2020-01-01 --end 2020-12-31 --delay 1 --interval 1d --region US` failed with "request error" because `len(_symbols) < 8000`.

Root Cause: The eastmoney API endpoint `http://4.push2.eastmoney.com/api/qt/clist/get` returned only 100 symbols instead of the requested 10,000, which triggered the validation error.
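The root cause was a silent cap. The server accepted `pz=10000` but returned 100 rows, and the only symptom was the later `len(_symbols) < 8000` check. Below is a sketch of a more direct check on the first page. It assumes the caller knows the listing size `total`; the clist response may report it as `data.total`, but that field is an assumption here. `check_first_page` is a hypothetical name.

```python
def check_first_page(requested: int, returned: int, total: int) -> None:
    """Raise if page 1 holds fewer rows than it should.

    Page 1 should contain min(requested, total) rows; fewer suggests the
    server capped the page size and the caller needs to paginate.
    """
    expected = min(requested, total)
    if returned < expected:
        raise ValueError(
            f"requested {requested} rows, got {returned} (listing has {total}); "
            "page size appears to be capped server-side"
        )
```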
How Has This Been Tested?
API Endpoint Testing:
Test Commands Used: