Some notes on the differences between the websites
that house the patents for India and the US.
A simple title search, which works on the US Patent Office website, generates
an error in India’s online system. I therefore have to search other fields on
the Indian patent office site to get results for patents that are related to
‘open source’ and ‘python’. Also, the results on the Indian patent office site
are not directly linkable the way they are on the US Patent and Trademark
Office’s website. Therefore, India’s patent search results cannot be scraped in
the same way as the US Patent Office’s site.
Additionally, the results do not seem to be related to software development.
Most of them are about data science, machine learning and analytics. Many more
records come back from an initial search with the terms “open-source” +
“python” in the Description section of the Indian patent office records.
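As a rough sketch of that Description search, the same two-term filter can be run locally over records that have already been downloaded by hand. The records, field names and the `matches_all_terms` helper below are hypothetical and only illustrate the idea; they are not real entries from the Indian patent office.

```python
import re

# Hypothetical records standing in for manually downloaded search results.
RECORDS = [
    {"title": "Sample analytics system",
     "description": "Built with Python and open-source libraries."},
    {"title": "Sample sensor housing",
     "description": "A moulded enclosure for sensors."},
    {"title": "Sample build tool",
     "description": "An open source compiler toolchain written in C."},
]

# "open-source" and "open source" are treated as the same term.
TERMS = [r"open[- ]source", r"\bpython\b"]


def matches_all_terms(text, patterns):
    """Return True when every pattern occurs somewhere in text (case-insensitive)."""
    return all(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


# Keep only the records whose description mentions both terms.
hits = [r["title"] for r in RECORDS if matches_all_terms(r["description"], TERMS)]
print(hits)
```

A filter like this only narrows the records by keyword. Judging whether a patent is really about software development would still need a manual read of the Abstract and Claims.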
Question: What is a better way to retrieve granted patents in India’s
patent office website?
Answer: Newer scraping methods in Python can probably retrieve and scrape the
results from India’s patent office website.
Question: How can we bypass the captcha system that the Indian patent
website has?
Answer: We need to look at the different scraping methods available in Python,
how each can be used, and their pros and cons. This is one of the technical
obstacles we face in transferring our existing PHP technology that scrapes the
US Patent and Trademark Office’s website.
The Indian patent site is not friendly to scraping the way we have done it for
the US Patent and Trademark Office’s site. The Indian patent office website
uses JavaScript to produce the results. We need to explore more ways to rebuild
our algorithm specifically for the Indian patent office’s website. I will be
conferring with my PHP developer to get his suggestions on how to approach the
Indian patent office’s site.
Beautiful Soup is the Python library that Joshi has found to filter patents
from the US Patent and Trademark Office. Beautiful Soup is not able to scrape
the Indian patent office’s site because the data is not in the HTML page but is
produced by JavaScript. We may need to research asynchronous web requests to
communicate with the JavaScript that is used on the Indian patent office’s
website.
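The limitation described above can be shown with a small sketch. It uses Python's standard-library `html.parser` in place of Beautiful Soup so that it runs without installing anything; any parser that only reads the delivered HTML hits the same wall. Both page snippets, the `loadResults` call and its path are hypothetical and are not taken from either patent office's site.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text outside a <td>, including script source, is ignored.
        if self.in_cell:
            self.cells[-1] += data


def extract_cells(html):
    """Return the stripped text of each table cell found in html."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Hypothetical static page: the results are already present in the HTML.
STATIC_PAGE = """
<table id="results">
  <tr><td>Sample patent A</td></tr>
  <tr><td>Sample patent B</td></tr>
</table>
"""

# Hypothetical script-driven page: the table stays empty until a browser
# executes the script, which an HTML parser never does.
SCRIPTED_PAGE = """
<table id="results"></table>
<script>loadResults("/hypothetical/search");</script>
"""

print(extract_cells(STATIC_PAGE))    # rows are found
print(extract_cells(SCRIPTED_PAGE))  # nothing to find
```

The second call returns an empty list because the rows never exist in the markup that was downloaded. Getting them would mean either running the page's JavaScript or calling whatever request the script itself makes.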
Question: Where do we start?
Answer: Joshi has located the email address of the officer at the Indian patent
office. Diane will draft a letter to the officer requesting access to the
patents website and asking how we can access the patent records published in
the database. Joshi will review it, then we will send it off and see what
response we receive. In the interim we will manually download the Abstract and
the Claims of the patents granted in India.
Right now, Google Patents does not include patents from the Indian patents
website. Additionally, a CAPTCHA is installed on the Indian patents website to
deter scraping.