
Exposing Repository Content To Google Scholar


Exposing Repository Content to Google Scholar

Item Type Presentation

Authors Luyten, Bram

Download date 06/07/2022 05:41:08

Link to Item http://hdl.handle.net/2384/582854


Exposing repository content to Google Scholar

Bram Luyten - @LuytenBram


Dec 12th 2018
Atmire Webinar
www.atmire.com
Open Repositories 2014
Overview
Basic principles

Check your repository with analyzer.atmire.com

Google Analytics

Most, if not all, of the information presented today is available,
in some shape or form, at

https://scholar.google.com/intl/en/scholar/inclusion.html
But first
Does Google Scholar rely on OAI-PMH?
What does the "site:" operator tell you
about coverage of your repository?
Does Google Scholar index all your items?
How long does it take for an item to get indexed?
If you don't see your item in the search results,
does that mean it isn't indexed?
Is it required to register your repository
somewhere in order to get it included?

https://www.google.com/support/scholar/bin/request.py
Site-wide principles
Google Scholar treats DSpace like any website
Robots.txt needs to be in the root of the domain
Robots.txt needs to reference a sitemap
Pages should load "fast enough"
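
The robots.txt points above can be checked with a few lines of Python. This is a minimal sketch, assuming the robots.txt body has already been fetched from the root of the domain. The function name `sitemap_urls` and the sample host repository.example.org are illustrative and not part of DSpace or Google Scholar.

```python
from __future__ import annotations


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs listed in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so "https://" in the URL stays intact.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for an example repository.
SAMPLE = """\
User-agent: *
Disallow: /discover
Sitemap: https://repository.example.org/sitemap
"""

if __name__ == "__main__":
    found = sitemap_urls(SAMPLE)
    print(found if found else "robots.txt does not reference a sitemap")
```

An empty result means the file does not reference a sitemap, which is one of the site-wide points to fix first.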

Basic Principles - Site wide


It's alive !
How to test an item

Basic Principles
Item specific principles

Scholarly articles
"The content hosted on your website must consist primarily of
scholarly articles - journal papers, conference papers,
technical reports, or their drafts, dissertations, pre-prints,
post-prints, or abstracts. Content such as news or magazine
articles, book reviews, and editorials is not appropriate for
Google Scholar. Documents larger than 5MB, such as books
and long dissertations, should be uploaded to Google Book
Search; Google Scholar automatically includes scholarly
works from Google Book Search."

Basic Principles - Item specific


File format

"Your files need to be either in the HTML or in the
PDF format. PDF files must have searchable text, i.e.,
you must be able to search for and find words in the
document using Adobe Acrobat Reader.

Each file must not exceed 5MB in size. To index
larger files, or to index scanned images of pages
that require OCR, please upload them to Google
Book Search."
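
The format and size limits quoted above lend themselves to a quick pre-check. The sketch below is an illustration only: the helper `file_problems` is hypothetical, the 5MB limit is read as 5 * 1024 * 1024 bytes (an assumption), and the check cannot tell whether a PDF actually contains searchable text.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: "5MB" is interpreted as 5 * 1024 * 1024 bytes.
MAX_BYTES = 5 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".html", ".htm"}


def file_problems(name: str, size_bytes: int) -> list[str]:
    """List the ways a file falls outside the quoted format and size limits."""
    problems = []
    if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append("not HTML or PDF")
    if size_bytes > MAX_BYTES:
        problems.append("larger than 5MB")
    return problems


if __name__ == "__main__":
    print(file_problems("thesis.pdf", 1_200_000))
    print(file_problems("scan.tiff", 6 * 1024 * 1024))
```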
The next slide is the most
important slide in this webinar
Metadata Matters!
Required fields

citation_title

citation_author: preferably one author per tag, in the correct order.

Make sure you retain the entire author list.

citation_publication_date: at minimum the year; use dc.date.issued


citation_pdf_url 


https://scholar.google.com/intl/en/scholar/inclusion.html#indexing
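
One way to verify the required fields listed above is to parse an item page and report which tags are absent. The following is a minimal sketch using only the Python standard library. The names `citation_tags` and `missing_required` and the sample markup are assumptions made for illustration, not output copied from a real repository.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Required fields as listed on the slide above.
REQUIRED = (
    "citation_title",
    "citation_author",
    "citation_publication_date",
    "citation_pdf_url",
)


class MetaCollector(HTMLParser):
    """Collect citation_* meta tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and name.startswith("citation_") and content:
            self.tags.setdefault(name, []).append(content)


def citation_tags(html: str) -> dict[str, list[str]]:
    collector = MetaCollector()
    collector.feed(html)
    return collector.tags


def missing_required(html: str) -> list[str]:
    found = citation_tags(html)
    return [name for name in REQUIRED if name not in found]


# Hypothetical item page head, with one author per tag.
SAMPLE_HEAD = """<head>
<meta name="citation_title" content="An example article">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_publication_date" content="2018">
</head>"""

if __name__ == "__main__":
    print(missing_required(SAMPLE_HEAD))
    print(citation_tags(SAMPLE_HEAD)["citation_author"])
```

Because the authors are collected in document order, the same helper can be used to compare the tag order against the author order in the full text.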


Author order pre-DSpace 5.4
Author order was systematically wrong in DSpace 5.0, 5.1, 5.2 and 5.3.
This was fixed in DSpace 5.4 and is correct as of DSpace 6.
https://jira.duraspace.org/browse/DS-2679
If you're on a version before DSpace 5.4, either apply the patch or
perform the minor upgrade.


Metadata Example


Metadata mapping configuration
google.citation_title = dc.title
google.citation_publisher = dc.publisher
google.citation_author = dc.author | dc.contributor.author | dc.creator
google.citation_date = dc.date.copyright | dc.date.issued | dc.date.available | dc.date.accessioned
google.citation_language = dc.language.iso
...
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace/config/crosswalks/google-metadata.properties
https://wiki.duraspace.org/display/DSDOC6x/Search+Engine+Optimization
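
The mapping lines above chain several Dublin Core fields with "|". A plausible reading, assumed here and worth confirming against the comments in google-metadata.properties, is an ordered fallback: the first field in the chain that has a value is used. The Python sketch below illustrates that reading only. `resolve` is a hypothetical helper, not DSpace code.

```python
from __future__ import annotations


def resolve(chain: str, metadata: dict[str, list[str]]) -> list[str]:
    """Return the values of the first field in a '|' chain that has any.

    Assumption: '|' means ordered fallback (first match wins).
    """
    for field in (part.strip() for part in chain.split("|")):
        values = metadata.get(field)
        if values:
            return values
    return []


# Hypothetical item metadata without dc.date.copyright.
ITEM = {
    "dc.title": ["An example article"],
    "dc.date.issued": ["2018-12-12"],
    "dc.date.accessioned": ["2019-01-03"],
}

DATE_CHAIN = (
    "dc.date.copyright | dc.date.issued | dc.date.available | dc.date.accessioned"
)

if __name__ == "__main__":
    print(resolve(DATE_CHAIN, ITEM))
```

Under this reading, an item without dc.date.copyright falls through to dc.date.issued, which is why the order of the chain matters for the date Google Scholar sees.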


Journal and Conference Papers
Optional fields

citation_journal_title, citation_conference_title

citation_issn

citation_isbn

citation_volume

citation_issue

citation_firstpage

citation_lastpage


https://scholar.google.com/intl/en/scholar/inclusion.html#indexing


Theses, Dissertations, Tech Reports
Optional fields

citation_dissertation_institution,
citation_technical_report_institution: the name of the institution
citation_technical_report_number


https://scholar.google.com/intl/en/scholar/inclusion.html#indexing


Item principles RECAP
1. Scholarly content only

2. Google Scholar needs to be able to get to the full text, in
HTML or PDF, max 5MB and clearly linked FROM the item
page if it's not ON the item page itself.
3. The metadata on your item page needs to match
what's included in the full text file. Watch out especially for
correct authors and issue dates.
Checking your repository and items
https://analyzer.atmire.com
Your Google Scholar traffic in Google Analytics
Identifying traffic from a specific source
Use?
1. Get a feel for your "normal" daily volume of
Google Scholar traffic
2. Set up a recurring report so that you can easily
check whether traffic suddenly deviates from this
standard amount
If you care about Google Scholar as a source of
traffic, this is a way to stay on top of the flow of
incoming traffic, and to start looking for sources
of problems when you see alarming drops in
traffic.
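
The idea of comparing each day against a "normal" volume can be sketched in Python, assuming daily session counts for Google Scholar referrals have been exported from the analytics tool. The function `alarming_days`, the 7-day window and the 50% threshold are illustrative choices, not recommendations from the webinar.

```python
from __future__ import annotations

from statistics import median


def alarming_days(
    daily_sessions: list[int], window: int = 7, drop_ratio: float = 0.5
) -> list[int]:
    """Return indexes of days whose traffic fell below
    drop_ratio * median of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_sessions)):
        baseline = median(daily_sessions[i - window:i])
        if baseline > 0 and daily_sessions[i] < drop_ratio * baseline:
            flagged.append(i)
    return flagged


# Hypothetical daily sessions referred by Google Scholar.
SESSIONS = [120, 131, 118, 125, 140, 122, 128, 126, 40, 35]

if __name__ == "__main__":
    print(alarming_days(SESSIONS))
```

A median baseline is used so that a single unusual day in the window does not shift what counts as normal.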
Questions?

If you want these slides, please take 5 minutes to give us
feedback on this webinar. The link to the slides is shown after
completing the questions.

bit.ly/atmire-scholar-webinar-feedback

Image credits
Web Crawler - Scot Swigart https://flic.kr/p/aoYNcG
Mutianyu Great Wall - Arian Zwegers https://flic.kr/p/atS39L
