springer_books_download_script/Springer-Libros.py

#############################################
###
### Download Springer Books
### Corona Virus Time
###
### carlos@cardenas.pe
###
###  GPL 3.0 v
###
### 27/04/2020
###
#############################################
import PyPDF2
import urllib3
import os
import wget


def download_book_from_page(page_url):
    http = urllib3.PoolManager()

    res = http.request('GET', page_url)

    title = ''.join(res.data.decode('utf-8').split('h1')[1].split('>')[1].split('<')[0].split('/')[0])+".pdf"
    
    # skip books already downloaded
    if os.path.isfile(title):
        return

    download_url = "https://link.springer.com/content/"+res.data.decode('utf-8').split('Download book PDF')[0].split('content/')[1].split('title')[0].split('.pdf')[0]+".pdf"

    wget.download(download_url, title)


def process_books_in_pdf(pdf):
    for i in range(0, pdf.numPages):
        lines = pdf.getPage(i).extractText().split('\n')

        i = 0
        no_of_lines = len(lines)
        while i < no_of_lines:
            if lines[i].startswith("http://"):
                # changing protocol from http to https
                url = "https://"+lines[i][7:]
                print(url)
                try:
                    download_book_from_page(url)
                except:
                    print("Error while downloading, trying again.")
                    continue
            i += 1


def main():
    file = open('Spring.pdf', 'rb')
    pdf = PyPDF2.PdfFileReader(file)
    process_books_in_pdf(pdf)


main()
playing with the Free Books 4 years ago			`#############################################`
			`###`
			`### Download Springer Books`
			`### Corona Virus Time`
			`###`
			`### carlos@cardenas.pe`
			`###`
			`### GPL 3.0 v`
			`###`
			`### 27/04/2020`
			`###`
			`#############################################`
			`import PyPDF2`
			`import urllib3`
Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`import os`
playing with the Free Books 4 years ago			`import wget`


Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`def download_book_from_page(page_url):`
			`http = urllib3.PoolManager()`
playing with the Free Books 4 years ago
Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`res = http.request('GET', page_url)`
playing with the Free Books 4 years ago
Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`title = ''.join(res.data.decode('utf-8').split('h1')[1].split('>')[1].split('<')[0].split('/')[0])+".pdf"`

			`# skip books already downloaded`
			`if os.path.isfile(title):`
			`return`
playing with the Free Books 4 years ago
Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`download_url = "https://link.springer.com/content/"+res.data.decode('utf-8').split('Download book PDF')[0].split('content/')[1].split('title')[0].split('.pdf')[0]+".pdf"`
playing with the Free Books 4 years ago
Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`wget.download(download_url, title)`
playing with the Free Books 4 years ago

Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`def process_books_in_pdf(pdf):`
			`for i in range(0, pdf.numPages):`
			`lines = pdf.getPage(i).extractText().split('\n')`
playing with the Free Books 4 years ago
Keep trying if encouter error while downloading 4 years ago			`i = 0`
			`no_of_lines = len(lines)`
			`while i < no_of_lines:`
Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`if lines[i].startswith("http://"):`
			`# changing protocol from http to https`
			`url = "https://"+lines[i][7:]`
			`print(url)`
Keep trying if encouter error while downloading 4 years ago			`try:`
			`download_book_from_page(url)`
			`except:`
			`print("Error while downloading, trying again.")`
			`continue`
			`i += 1`
playing with the Free Books 4 years ago

Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`def main():`
			`file = open('Spring.pdf', 'rb')`
			`pdf = PyPDF2.PdfFileReader(file)`
			`process_books_in_pdf(pdf)`
playing with the Free Books 4 years ago

Simplified logic, changed http to https everhwere, skip files already downloaded 4 years ago			`main()`