Some notes on the differences between the websites that host patent records for India and the US.
A simple title search, which is possible on the US Patent Office website, generates an error in India’s online system. I therefore have to search other fields on the Indian patent office website to get results for patents related to ‘open source’ and ‘python’. Also, the results on the Indian patent office site are not directly linkable, as they are on the US Patent Office’s website. Therefore, the results from India’s patent system cannot be scraped in the same way as those from the US Patent Office’s site.
Additionally, the results do not seem to be related to software development. Most of them are about data science, machine learning and analytics. An initial search for the terms “open-source” + “python” in the Description field of the Indian patent office records returns many more records.
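If records are saved locally (for example, by manual download), the same two-term search can be reproduced offline. The sketch below is only an illustration: it assumes each record is a dictionary with a hypothetical `description` field, and the sample records are invented for the example. It does not reflect the Indian patent office's actual data format.

```python
import re

# Match "open source", "open-source" or "opensource", case-insensitive.
OPEN_SOURCE = re.compile(r"\bopen[\s-]?source\b", re.IGNORECASE)
PYTHON = re.compile(r"\bpython\b", re.IGNORECASE)


def matches_both_terms(description: str) -> bool:
    """Return True when the text mentions both 'open source' and 'python'."""
    return bool(OPEN_SOURCE.search(description)) and bool(PYTHON.search(description))


def filter_records(records):
    """Keep records whose (hypothetical) 'description' field matches both terms."""
    return [r for r in records if matches_both_terms(r.get("description", ""))]


# Invented sample records, for illustration only.
sample = [
    {"title": "A", "description": "Built with open-source Python libraries."},
    {"title": "B", "description": "A machine learning model written in Python."},
    {"title": "C", "description": "An Open Source hardware design."},
]

matched_titles = [r["title"] for r in filter_records(sample)]
```

Matching on word boundaries keeps near-misses such as "pythonic" out of the results, and accepting both the hyphenated and the spaced spelling avoids losing records to a formatting difference.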
Question: What is a better way to retrieve granted patents from India’s patent office website?
Answer: Newer scraping methods in Python can probably retrieve and scrape records from India’s patent office website.
Question: How can we bypass the CAPTCHA system that the Indian patent website has?
Answer: We need to look at the different scraping methods available in Python, how each can be used, and their pros and cons. This is one of the technical obstacles we face in transferring our existing PHP technology that scrapes the US Patent Office’s website.
Question: Where do we start?
Answer: Joshi has located the email address of the officer at the Indian Patent Office. Diane will draft a letter to the officer requesting access to the patents website and asking how we can access the patent records published in the database. Joshi will review it, then we will send it off and see what response we receive. In the interim, we will manually download the Abstract and the Claims of the patents granted in India.
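To keep the manually downloaded Abstracts and Claims usable later, they could be collected into a single CSV file. The following is a minimal sketch under stated assumptions: the naming scheme (`<id>_abstract.txt` and `<id>_claims.txt` in one folder) and the function names `collect_records` and `write_csv` are hypothetical choices for illustration, not something the project has already agreed on.

```python
import csv
from pathlib import Path


def collect_records(folder):
    """Pair up '<id>_abstract.txt' and '<id>_claims.txt' files found in `folder`."""
    records = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Split a stem such as "IN001_abstract" into the id and the section name.
        patent_id, _, section = path.stem.rpartition("_")
        if section not in ("abstract", "claims") or not patent_id:
            continue  # Skip files that do not follow the assumed naming scheme.
        record = records.setdefault(
            patent_id, {"id": patent_id, "abstract": "", "claims": ""}
        )
        record[section] = path.read_text(encoding="utf-8").strip()
    return list(records.values())


def write_csv(records, out_path):
    """Write the collected records to a CSV file with one row per patent."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "abstract", "claims"])
        writer.writeheader()
        writer.writerows(records)
```

For example, `write_csv(collect_records("downloads"), "india_patents.csv")` would produce one row per patent, where `downloads` and `india_patents.csv` are placeholder names. A patent with only one of the two files still gets a row, with the missing section left empty.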
Right now, Google Patents does not include patents from the Indian patents website. Additionally, a CAPTCHA is installed on the Indian patents website to deter scraping.