HCSS / RuBase

What is OpenAlex?

📚 293M+ Scholarly Works + 🔓 100% Free & Open = 🚀 OpenAlex Research Graph

The OurResearch Team

🎓 OURRESEARCH
🏛️ 501(c)(3) Non-Profit
🔓 Created Unpaywall
📖 100M+ Papers Unlocked
🌍 Open Science Mission
Jason Priem: Co-founder & CEO
Heather Piwowar: Co-founder & Data Scientist
2021 Launch: When Microsoft Academic shut down

Why OpenAlex?

🧬 MAG DNA: Built on Microsoft Academic Graph
🔑 No API Key: Zero barriers to entry
⚡ Daily Updates: Fresh data every 24 hours
♾️ Free Forever: Committed to open access

Funding & Sustainability

🏛️ ARCADIA FUND
UK charity founded by Lisbet Rausing (Tetra Pak heiress) and Peter Baldwin (UCLA historian)
Supporting Cultural Heritage & Open Knowledge
$7.5M: March 2024 • 5-Year Grant
$4.5M: 2021 Initial Grant
$1B+ Total Given Since 2002
🧭 Navigation Fund
$688,000: Nov 2024 • UI Enhancement
📊 Monthly Impact
115M API Calls: Serving Global Research
✅ POSI Principles
5th Org to Commit
Open Infrastructure
100% Free Forever
Committed to Open Science Infrastructure

Search Operators Available

✅ BOOLEAN OPERATORS: AND, OR, NOT, "exact phrase", ( ) grouping
❌ NOT SUPPORTED: * wildcard, ? single char, ~ fuzzy, NEAR proximity, regex patterns
Example: "russian foreign policy" AND (ukraine OR nato) NOT soviet

Our API Query Strategy

API_ENDPOINT = "https://api.openalex.org/works"
# Three exact phrases joined with OR (the only search terms actually used)
SEARCH_QUERY = (
    '"Russian foreign policy" '
    'OR "Russian defense policy" '
    'OR "Russian security policy"'
)
NO_YEAR_FILTER = "All years included"  # no year restriction is applied
TYPE = "article"
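A minimal sketch of how the settings above might be turned into a request URL. The function name `build_url`, the choice to send `TYPE` as a `filter=type:...` parameter, and the `mailto` placeholder are assumptions for illustration, not the workshop's actual script; `search`, `filter`, `per-page`, `page` and `mailto` are OpenAlex query parameters.

```python
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://api.openalex.org/works"
SEARCH_QUERY = (
    '"Russian foreign policy" '
    'OR "Russian defense policy" '
    'OR "Russian security policy"'
)
TYPE = "article"


def build_url(page: int = 1, per_page: int = 200, mailto: Optional[str] = None) -> str:
    """Return the request URL for one page of results (hypothetical helper)."""
    params = {
        "search": SEARCH_QUERY,
        "filter": f"type:{TYPE}",  # assumption: TYPE is applied as a filter
        "per-page": per_page,
        "page": page,
    }
    if mailto:
        params["mailto"] = mailto  # optional contact address, placeholder only
    # urlencode percent-encodes quotes, spaces and colons for the query string
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_url(page=1))
```

No year filter is added, which matches the "all years included" setting.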

Search Term Coverage

"RUSSIAN FOREIGN POLICY" OR "RUSSIAN DEFENSE POLICY" OR "RUSSIAN SECURITY POLICY"
Boolean Search
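As an illustration, the three phrases can be combined into one Boolean search string in a few lines of Python. The names `TERMS` and `or_query` are hypothetical; the sketch assumes each term should be matched as an exact phrase and that operators are written in uppercase.

```python
TERMS = [
    "Russian foreign policy",
    "Russian defense policy",
    "Russian security policy",
]


def or_query(terms):
    """Quote each term as an exact phrase and join the phrases with OR."""
    return " OR ".join(f'"{term}"' for term in terms)


query = or_query(TERMS)
print(query)
```

Keeping the terms in a list makes it easy to add or drop a phrase without editing the query string by hand.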

Handling 6,329 Papers

31 pages × 200 papers + 1 final page × 129 papers = 6,329 papers (32 API calls)
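The page arithmetic can be checked with a short sketch. It assumes a page size of 200 records and uses a hypothetical helper, `page_sizes`, to list how many records each request should return.

```python
import math

TOTAL_RESULTS = 6329  # papers matched by the search
PER_PAGE = 200        # page size used in this workflow


def page_sizes(total: int, per_page: int) -> list:
    """Return the number of records expected on each page."""
    if total <= 0:
        return []
    n_pages = math.ceil(total / per_page)
    sizes = [per_page] * (n_pages - 1)                # full pages
    sizes.append(total - per_page * (n_pages - 1))    # final, possibly partial page
    return sizes


sizes = page_sizes(TOTAL_RESULTS, PER_PAGE)
print(len(sizes), sizes[-1], sum(sizes))  # 32 129 6329
```

That is 31 full pages plus one final page of 129 records, for 32 requests in total.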

API Rate Management

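Max Rate: 10 requests/second
Our Rate: 1 request per 100 ms (the maximum rate)
Total Time: ~5.3 minutes (the 100 ms pauses across 32 calls account for only about 3 seconds of this)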
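One simple way to keep requests at least 100 ms apart is sketched below. The helper `throttled` and its `sleep` and `clock` parameters are hypothetical names introduced here so the pacing logic can be tested without waiting; this is not the workshop's own script.

```python
import time

MIN_INTERVAL = 0.1  # seconds between request starts (1 request per 100 ms)


def throttled(calls, min_interval=MIN_INTERVAL, sleep=time.sleep, clock=time.monotonic):
    """Run each callable in order, keeping their start times min_interval apart."""
    results = []
    last_start = None
    for call in calls:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # pause only for the time still owed
        last_start = clock()
        results.append(call())
    return results


# 32 requests have 31 gaps between them.
print(f"minimum enforced delay: {31 * MIN_INTERVAL:.1f} s")
```

Each element of `calls` would be a function that fetches one page; slow responses count toward the interval, so no extra delay is added when the API itself takes longer than 100 ms.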

Data Cleaning Process

🚫 Remove Paratext: -133
📑 Filter Metadata: -47
⚠️ Remove Retracted: -3
✅ Final Clean Dataset: 6,196
6,329 → 6,196
98% Data Retention
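A hedged sketch of what the three cleaning steps could look like on records shaped like OpenAlex works. `is_paratext`, `is_retracted`, `title` and `publication_year` are OpenAlex work fields; treating "filter metadata" as "missing title or year" is an assumption made here, and `clean_works` is a hypothetical helper.

```python
def clean_works(works):
    """Drop paratext, records with missing key metadata, and retracted works."""
    kept = []
    removed = {"paratext": 0, "metadata": 0, "retracted": 0}
    for work in works:
        if work.get("is_paratext"):
            removed["paratext"] += 1
        elif not work.get("title") or not work.get("publication_year"):
            # assumption: "metadata" filtering means a missing title or year
            removed["metadata"] += 1
        elif work.get("is_retracted"):
            removed["retracted"] += 1
        else:
            kept.append(work)
    return kept, removed


# Made-up records for illustration only
sample = [
    {"title": "A", "publication_year": 2020},
    {"title": "Front matter", "publication_year": 2020, "is_paratext": True},
    {"title": None, "publication_year": 2019},
    {"title": "B", "publication_year": 2018, "is_retracted": True},
]
kept, removed = clean_works(sample)
print(len(kept), removed)
```

Each record is counted under the first rule it fails, so the three removal counts never overlap.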

Quality Assurance

✓ Publication year range: 1963-2026
✓ Author deduplication: 6,561
✓ Citation integrity check: 27,257
✓ Institution normalization: 1,842
✓ Language detection: 17 langs
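Some of these checks can be approximated directly from the work records. The sketch below assumes OpenAlex-style fields (`publication_year`, `authorships[].author.id`, `language`) and counts authors by distinct author ID, which is only one possible reading of "deduplication"; `qa_summary` is a hypothetical helper.

```python
def qa_summary(works):
    """Summarise year range, distinct authors and distinct languages."""
    years = [w["publication_year"] for w in works if w.get("publication_year")]
    author_ids = set()
    for work in works:
        for authorship in work.get("authorships", []):
            author_id = (authorship.get("author") or {}).get("id")
            if author_id:
                author_ids.add(author_id)  # the same ID is counted once
    languages = {w["language"] for w in works if w.get("language")}
    return {
        "year_range": (min(years), max(years)) if years else None,
        "unique_authors": len(author_ids),
        "languages": len(languages),
    }


# Made-up records for illustration only
sample = [
    {"publication_year": 1999, "language": "en",
     "authorships": [{"author": {"id": "A1"}}, {"author": {"id": "A2"}}]},
    {"publication_year": 2015, "language": "ru",
     "authorships": [{"author": {"id": "A1"}}]},
]
print(qa_summary(sample))
```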

Fields Extracted

15 Metadata Fields
8 Citation Metrics
12 Topic Categories
5 Access Types

The Hidden Literature Problem

About 55% of relevant papers mention the search terms only in the full text, not in the title or abstract
Traditional Search (Title + Abstract Only): 2,821 papers found (~45%)
OpenAlex Full-Text (Title + Abstract + Full-Text): 6,329 papers found (100%)
Hidden Literature: 3,508 papers (~55% of relevant papers)
Only mentioned in methodology, case studies, comparative analyses, or body text
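The hidden-literature share follows from the two counts on this slide. A quick check in Python (the `share` helper is a hypothetical name) gives 3,508 hidden papers, about 44.6% found by title and abstract and about 55.4% found only through full text.

```python
TOTAL_FULL_TEXT = 6329  # title + abstract + full-text search
TITLE_ABSTRACT = 2821   # title + abstract search only


def share(part: int, whole: int) -> float:
    """Return part as a fraction of whole."""
    return part / whole


hidden = TOTAL_FULL_TEXT - TITLE_ABSTRACT
print(hidden)                                           # 3508
print(f"{share(TITLE_ABSTRACT, TOTAL_FULL_TEXT):.1%}")  # 44.6%
print(f"{share(hidden, TOTAL_FULL_TEXT):.1%}")          # 55.4%
```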

Why Are They Hidden?

📊 Comparative Studies: Papers comparing multiple countries mention Russian policy in analysis sections
🔬 Methodology Papers: Research methods papers using Russian policy as example cases
📚 Historical Analyses: Broader historical works with sections on Russian policy
🌍 Regional Studies: Area studies mentioning Russian policy in regional context
📖 Book Chapters: Edited volumes with Russian policy discussed in specific chapters

The Zotero Challenge

⚠️ Traditional Zotero Import Limitations
🔍 Browser Extension: Only sees visible results
📑 Metadata Matching: Relies on titles/abstracts
❌ Result: Miss ~55% of papers
✅ OpenAlex → Zotero Solution
🌐 API Export: Get ALL results
📄 RIS/BibTeX Format: Convert to a format Zotero can import
✨ Result: Capture 100% of papers
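A minimal sketch of the conversion step, assuming records shaped like OpenAlex works (`title`, `authorships[].author.display_name`, `publication_year`, `doi`). The function `work_to_ris` is hypothetical, it emits only a handful of RIS tags, and it labels every record as a journal article (`JOUR`), which a real export would refine.

```python
def work_to_ris(work: dict) -> str:
    """Render one OpenAlex-style work record as a minimal RIS entry."""
    lines = ["TY  - JOUR"]  # assumption: every record is a journal article
    if work.get("title"):
        lines.append(f"TI  - {work['title']}")
    for authorship in work.get("authorships", []):
        name = (authorship.get("author") or {}).get("display_name")
        if name:
            lines.append(f"AU  - {name}")
    if work.get("publication_year"):
        lines.append(f"PY  - {work['publication_year']}")
    doi = work.get("doi") or ""
    if doi:
        # OpenAlex stores DOIs as full URLs; RIS expects the bare identifier
        lines.append("DO  - " + doi.replace("https://doi.org/", "", 1))
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


# Made-up placeholder record, not a real paper
example = {
    "title": "Example title",
    "publication_year": 2020,
    "doi": "https://doi.org/10.0000/example",
    "authorships": [{"author": {"display_name": "Example Author"}}],
}
ris = work_to_ris(example)
print(ris)
```

Writing one such entry per paper to a `.ris` file gives something Zotero can read through File → Import.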

Why We Need Everything: Dual Purpose

📚 Reference Management
✓ Citations & footnotes
✓ Bibliographies
✓ Literature reviews
✓ Academic writing
🤖 LLM Corpus Analysis
✓ Full-text processing
✓ Pattern discovery
✓ Taxonomic annotation
✓ Knowledge graphs
Precision × Recall = Comprehensive Corpus
Maximum relevant papers × Minimum noise = Optimal LLM training data

Next Workshop Sessions: LLM-Powered Analysis

1. TODAY: OpenAlex & Zotero Integration
• Complete corpus collection (6,196 papers)
• Dual-purpose workflow for reference management
• Export to RIS/BibTeX for LLM analysis
2. Building Rich Taxonomies with LLMs
• Multi-level classification systems
• Multiple perspectives (theoretical, methodological, thematic)
• Emergent categories from 6,196 papers
• LLM-suggested hierarchies
3. Intelligent Corpus Annotation
• Chunk documents into segments
• Apply taxonomic labels via LLMs
• Create knowledge graphs
• Discover hidden connections
From 6,196 papers → Structured knowledge
Including the 3,508 "hidden" papers only findable via full-text search

Workshop Resources & Materials

Everything you need in one place
📁 Dropbox Folder: /2603 - Boston/
📂 Main Workshop Materials
├── 📊 260311-13 Part 1 - Setting the Scene.pptx (117MB)
├── 📊 260311-13 Part 2 - Building Corpora.pptx (74MB)
├── 📊 260311-13 Part 3 - Building Taxonomies.pptx (22MB)
└── 📊 260311-13 Part 4 - Using LLMs.pptx (33MB)
📂 /openalex/ - Interactive Dashboard
├── 🌐 index.html - This presentation
└── 📈 dashboard.html - Analytics dashboard
📂 Documentation & Guides
├── 📄 CLI_LLM_Setup_Guide.md
├── 📄 OPENALEX_API_STUDENT_GUIDE.md
├── 📄 ZOTERO_INTEGRATION.md
└── 📄 GOOGLE_DRIVE_MCP_SETUP.md
📂 Scripts & Tools
├── 🐍 Python scripts for data collection
└── 🔧 Helper utilities for analysis
🔗 How to Access
Day 1 (Today): Public link - anyone can view and download
During Break: Add your Dropbox account for sync access
Day 2-3: Private access - only registered participants

Tech Help & Troubleshooting

Common issues and quick fixes
🖥️ WSL (Windows Subsystem for Linux)
Problem: "WSL is not recognized"
Fix: Run wsl --install in PowerShell as Admin
Problem: "No Linux distributions found"
Fix: Run wsl --install -d Ubuntu
Problem: WSL won't start
Fix: Restart computer, enable virtualization in BIOS
🔌 OpenAlex API
Problem: "Connection timeout"
Fix: Check internet, try a smaller query (per-page=10)
Problem: "429 Too Many Requests"
Fix: Wait 1 minute, add your email to requests (mailto=you@example.com)
Problem: No results returned
Fix: Simplify search terms, check spelling
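For scripted collection, a 429 response can also be handled automatically. The sketch below uses exponential backoff rather than the fixed one-minute wait suggested above; `fetch_with_retry` and its parameters are hypothetical, and `fetch` stands for any function that performs one request and returns a `(status_code, body)` pair.

```python
import time


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until the status is not 429, backing off between attempts."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1 s, 2 s, 4 s, ...
    return status, body  # still rate-limited after the last attempt


# Demo with a fake fetch that is rate-limited twice, then succeeds
responses = iter([(429, None), (429, None), (200, "ok")])
print(fetch_with_retry(lambda: next(responses), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the demo instant; in a real script the default `time.sleep` would be used.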
🐍 Python & CLI Tools
Problem: "Python not found"
Fix: Install via sudo apt install python3
Problem: "Module not found"
Fix: Run pip install requests
Problem: Permission denied
Fix: Use sudo (for apt) or the --user flag (for pip install)
🆘 Need More Help?
Raise your hand - we'll come to you!
Or use the buddy system - help each other 🤝

OpenAlex → Zotero + LLM Workflow

1. Search OpenAlex: Full-text query
2. Export Results: Python script
3. Convert Format: RIS/BibTeX
4. Import Zotero: File → Import
📚 Zotero Library: 6,196 references, ready for citations
🤖 LLM Corpus: Full-text documents, ready for analysis

Complete Dataset Ready

6,196 Clean Papers Ready
Including 3,508 Hidden Papers