Author Topic: loony rants

Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #30 on: July 07, 2017, 01:11:50 am »
Okay, it's working. I think I just need to save the hash-to-URL mappings (both Photobucket and Imgur) and add code to automagically upload to Imgur using its API. I also still need to remove that PA_THREAD and TEST_URL test scaffolding.

Code: [Select]
from bs4 import BeautifulSoup
import requests
import sys, os
import hashlib, base64
from io import BytesIO
from urllib.parse import urlparse
from os.path import splitext, basename


UA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
TEST_URL = 'http://i984.photobucket.com/albums/ae321/isaacscr/Misc/HPIM5242.jpg'

# haha i'm totally Chrome
HEADERS = {'Upgrade-Insecure-Requests': '1',
           'User-Agent': UA,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
           'DNT': '1'}

def filename_of_url(u):
    # Name the file after a hash of its URL, keeping the original extension.
    img_url_path = basename(urlparse(u).path)
    img_ext = splitext(img_url_path)[1]  # includes the leading dot, e.g. ".jpg"
    url_hash = base64.urlsafe_b64encode(hashlib.sha3_224(u.encode('utf-8')).digest()).decode("ascii")
    return url_hash + img_ext  # splitext already kept the dot

# Returns: None
# Saves the Photobucket image at the given url
def fetch_pb_image(url):
    try:
        s = requests.Session()
        s.headers.update(HEADERS)
        # Fetch once, then re-fetch with the resolved URL as the referer so
        # Photobucket serves the actual image instead of hotlink-blocking it.
        req1 = s.get(url)
        s.headers.update({'referer': req1.url})
        img_req = s.get(url)  # .replace('http', 'https')

        img_data = BytesIO(img_req.content)

        img_filename = filename_of_url(url)

        with open(img_filename, 'wb') as out:
            out.write(img_data.read())

    except Exception as err:
        # Anything could've gone wrong. Hope that it doesn't for
        # the next image.
        # stderr should be saved to log failed images
        print(err.__class__, file=sys.stderr)
        print(err, file=sys.stderr)
        print("failed to get", url, file=sys.stderr)

# Leftover test scaffolding (flagged above as needing removal):
pa_session = requests.Session()
PA_THREAD = "http://www.primitivearcher.com/smf/index.php/topic,27206.0.html"

thread_page = BeautifulSoup(requests.get(PA_THREAD).content, "html.parser")
current_pagenum = 420

TEST_URL = 'http://i1278.photobucket.com/albums/y506/psmith311/Mobile%20Uploads/2017-05/3896EDFF-DA57-41F2-9D8B-DDF95DED3F01_zps70czmzvh.jpg'

fetch_pb_image(TEST_URL)

def process_thread(url, session):
    while True:
        thread_page = BeautifulSoup(session.get(url).content, "html.parser")

        # SMF wraps the current page number in <strong>; the link whose text is
        # the next number (if any) points to the next page of the thread.
        thread_pagelinks = thread_page.select(".pagelinks")[0]
        thread_pageno = int(thread_pagelinks.find("strong").text)
        next_page_link = thread_pagelinks.find("a", text=str(thread_pageno + 1))

        pb_imgs = [img.get("src") for img in thread_page.find_all("img")
                   if img.get("src") and "photobucket.com" in img.get("src")]

        for pb_img in pb_imgs:
            fetch_pb_image(pb_img)

        if next_page_link:
            url = next_page_link.get("href")
        else:
            break

pa_session = requests.Session()
process_thread("http://www.primitivearcher.com/smf/index.php/topic,60633.0.html", pa_session)

Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #31 on: July 07, 2017, 01:17:20 am »
You know, the next thing I should do is a 'web app' that takes an imgur album link and turns it into BBcode img tags to paste into PA.

but EFFORT

edit: last iteration for today, I guess. Works for a single thread with the URL hard-coded in.

Code: [Select]
from bs4 import BeautifulSoup
import requests
import sys, os
import hashlib, base64
import pickle
from io import BytesIO
from urllib.parse import urlparse
from os.path import splitext, basename


UA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

# haha i'm totally Chrome
HEADERS = {'Upgrade-Insecure-Requests': '1',
           'User-Agent': UA,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
           'DNT': '1'}

hash_to_url = {}

def filename_of_url(u):
    # Name the file after a hash of its URL, keeping the original extension.
    img_url_path = basename(urlparse(u).path)
    img_ext = splitext(img_url_path)[1]  # includes the leading dot, e.g. ".jpg"
    url_hash = base64.urlsafe_b64encode(hashlib.sha3_224(u.encode('utf-8')).digest()).decode("ascii")
    hash_to_url[url_hash] = u
    return url_hash + img_ext  # splitext already kept the dot

# Returns: None
# Saves the Photobucket image at the given url into imgs/
def fetch_pb_image(url):
    try:
        s = requests.Session()
        s.headers.update(HEADERS)
        # Fetch once, then re-fetch with the resolved URL as the referer so
        # Photobucket serves the actual image instead of hotlink-blocking it.
        req1 = s.get(url)
        s.headers.update({'referer': req1.url})
        img_req = s.get(url)  # .replace('http', 'https')

        img_data = BytesIO(img_req.content)

        img_filename = filename_of_url(url)

        os.makedirs("imgs", exist_ok=True)  # open() fails if the dir is missing
        with open("imgs/"+img_filename, 'wb') as out:
            out.write(img_data.read())

    except Exception as err:
        # Anything could've gone wrong. Hope that it doesn't for
        # the next image.
        # stderr should be saved to log failed images
        print(err.__class__, file=sys.stderr)
        print(err, file=sys.stderr)
        print("failed to get", url, file=sys.stderr)

def process_thread(url, session):
    while True:
        thread_page = BeautifulSoup(session.get(url).content, "html.parser")

        # SMF wraps the current page number in <strong>; the link whose text is
        # the next number (if any) points to the next page of the thread.
        thread_pagelinks = thread_page.select(".pagelinks")[0]
        thread_pageno = int(thread_pagelinks.find("strong").text)
        next_page_link = thread_pagelinks.find("a", text=str(thread_pageno + 1))

        pb_imgs = [img.get("src") for img in thread_page.find_all("img")
                   if img.get("src") and "photobucket.com" in img.get("src")]

        for pb_img in pb_imgs:
            fetch_pb_image(pb_img)

        if next_page_link:
            url = next_page_link.get("href")
        else:
            break

try:
    with open("rels.pkl", "rb") as f:
        hash_to_url = pickle.load(f)
except FileNotFoundError:
    pass

pa_session = requests.Session()
process_thread("http://www.primitivearcher.com/smf/index.php/topic,60645.0.html", pa_session)

with open("rels.pkl", "wb") as f:
    pickle.dump(hash_to_url, f, pickle.HIGHEST_PROTOCOL)
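
Since every file in imgs/ is named by the hash of its original Photobucket URL, rels.pkl is what ties a re-uploaded image back to the link it replaces. A rough sketch of that follow-up step, assuming an upload_to_imgur() helper like the one sketched a couple of replies up:

Code: [Select]
import os
import pickle
from os.path import splitext

with open("rels.pkl", "rb") as f:
    hash_to_url = pickle.load(f)

old_to_new = {}
for fname in os.listdir("imgs"):
    url_hash = splitext(fname)[0]      # filename is url hash + original extension
    pb_url = hash_to_url.get(url_hash)
    if pb_url is None:
        continue                       # not from this scrape; skip it
    old_to_new[pb_url] = upload_to_imgur("imgs/" + fname)  # assumed helper

with open("old_to_new.pkl", "wb") as f:
    pickle.dump(old_to_new, f, pickle.HIGHEST_PROTOCOL)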
« Last Edit: July 07, 2017, 02:00:59 am by loon »

Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #32 on: July 07, 2017, 01:13:16 pm »
Turns out I don't need to make anything to do the img code stuff. Here's the easiest way to get BBcode links from the imgur desktop website, meaning the [ img ] tags you can use in the forums.

1. Be logged into imgur. Upload your images into an album/post in "New post". You can give it a name to make it easier to find in the next steps.

2. Click on your name (with the arrow), then click on Images, not Albums.

3. Click on the left where it says "All images". Select your album with the images that you want to put on PA.

4. Click where it says "View image info", then click on "Generate image links".

5. Click/drag on one of the pics to box-select them all.

6. Click on Done. Now click where it says "Link (email & IM)" and switch it to "BBCode (message boards & forums)". Just copypaste that stuff into PA.

yay tepeliks


that hornbow... bleh...

I haven't forgotten UtahChippewa's antler that he very generously sent to me. I have a Dremel tool, a vise, and a drill, so I don't see why I couldn't make thumbrings pretty easily. Also thought of making deer antler guitar picks.
« Last Edit: July 07, 2017, 07:13:58 pm by loon »

Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #33 on: July 09, 2017, 01:01:16 am »
an electrical high voltage thing blew up and we lost power

still no power hours later


Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #34 on: July 09, 2017, 01:21:51 pm »
power is back

Well, extracting photobucket URLs from the database using MySQL from a Unix-like environment (GNU utils at least) isn't as clean as I'd hoped. It would be something like:

Code: [Select]
SELECT body FROM smf_messages WHERE body LIKE "%photobucket.com%" INTO OUTFILE '/tmp/msgswithpb.csv' LINES TERMINATED BY '\n';

then from a shell..

Code: [Select]
sed -r 's/(\[\/img\])/\1\n/g' /tmp/msgswithpb.csv | grep -Po '\[img.*\]?http://.*photobucket.com/.*' | grep -o 'http.*' | sed 's|\[/img\]||'

may be easier with postgres..

if this is too complicated, I guess I'll have to do what I originally planned to (go through all the threads on the forums)
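
For comparison, the same extraction is only a few lines of Python over the OUTFILE dump. A sketch, assuming the [img]...[/img] tags survive as plain text in the body column:

Code: [Select]
import re

# Pull http(s)://...photobucket.com/... URLs out of [img ...]...[/img] tags.
IMG_TAG = re.compile(r'\[img[^\]]*\](https?://[^\[\s]*photobucket\.com/[^\[\s]*)\[/img\]',
                     re.IGNORECASE)

with open('/tmp/msgswithpb.csv', encoding='utf-8', errors='replace') as f:
    for line in f:
        for match in IMG_TAG.finditer(line):
            print(match.group(1))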


about imgur -

Quote
The Imgur API uses a credit allocation system to ensure fair distribution of capacity. Each application can allow approximately 1,250 uploads per day or approximately 12,500 requests per day

hmm..

Or I could make it into a PHP script that you can run somehow
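
If the credit allocation is the worry, spacing the uploads out would keep a batch under the ~1,250-per-day ceiling. A crude sketch, reusing the hypothetical upload_to_imgur() from earlier (86,400 seconds per day / 1,250 uploads is about 69 s, rounded up):

Code: [Select]
import time

UPLOAD_GAP_SECONDS = 70  # 86400 / 1250 is about 69.1; round up for slack

def upload_all(paths):
    # Upload each file in turn, sleeping between uploads to respect the daily quota.
    links = []
    for i, path in enumerate(paths):
        links.append(upload_to_imgur(path))  # hypothetical helper from earlier
        if i + 1 < len(paths):
            time.sleep(UPLOAD_GAP_SECONDS)
    return links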
« Last Edit: July 10, 2017, 10:17:46 pm by loon »

Offline Marc St Louis

  • Administrator
  • Member
  • Posts: 7,877
  • Keep it flexible
    • Marc's Bows and Arrows
Re: loony rants
« Reply #35 on: July 09, 2017, 06:51:40 pm »
Quote from: loon on July 09, 2017, 01:21:51 pm

Cripes, makes me wish I knew more about computers, well maybe not  :D
Home of heat-treating, Corbeil, On.  Canada

Marc@Ironwoodbowyer.com

Offline Marc St Louis

  • Administrator
  • Member
  • Posts: 7,877
  • Keep it flexible
    • Marc's Bows and Arrows
Re: loony rants
« Reply #36 on: July 11, 2017, 07:59:53 am »
It's my understanding, from what amateurhour said, that it's too risky and could crash the whole message board; we can't afford that.
Home of heat-treating, Corbeil, On.  Canada

Marc@Ironwoodbowyer.com

Offline amateurhour

  • Administrator
  • Member
  • Posts: 239
Re: loony rants
« Reply #37 on: July 11, 2017, 04:16:47 pm »
To clarify that: the risk is taking an SQL database as large as PA's and running a query this large against it that makes changes. The chance of DB corruption would be there, even if low.

That being said, I don't want to have to be on standby to back up the database and do a master restore if it doesn't work.


Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #38 on: July 11, 2017, 06:09:49 pm »
😂

766918 posts... 6 GB max?

I could just make some sort of web app that you'd paste either a link or BBcode into..
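
For the paste-an-album-link version, the API side would be small. A sketch, assuming Imgur's GET album images endpoint and a placeholder client ID:

Code: [Select]
import requests

IMGUR_CLIENT_ID = "YOUR_CLIENT_ID"  # placeholder

def album_to_bbcode(album_url):
    # Turn an album URL like https://imgur.com/a/abc123 into [img] tags.
    album_id = album_url.rstrip("/").split("/")[-1]
    resp = requests.get("https://api.imgur.com/3/album/%s/images" % album_id,
                        headers={"Authorization": "Client-ID " + IMGUR_CLIENT_ID})
    resp.raise_for_status()
    return "\n".join("[img]%s[/img]" % img["link"] for img in resp.json()["data"])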
« Last Edit: July 12, 2017, 06:38:18 pm by loon »

Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #39 on: July 14, 2017, 08:17:47 pm »
Tapered blunt arrow with ugly composite nock. Sitka spruce.
The nock is too tight. It's not that easy to do. Maybe I should've made it 1/8".
The blunt is not even/symmetrical. Sometimes it seems like I have trouble with basic stuff, such as making straight cuts, though it helps to focus.
Sigh.

Offline Hawkdancer

  • Member
  • Posts: 5,040
Re: loony rants
« Reply #40 on: July 15, 2017, 12:02:06 am »
Loon,
Some judicious filing with a flat diamond needle file should loosen the nock (very judicious) ;D >:D  Looks like it would stop a bunny rabbit!
Hawkdancer
Life is far too serious to be taken that way!
Jerry

Offline TimBo

  • Member
  • Posts: 1,047
Re: loony rants
« Reply #41 on: July 15, 2017, 05:10:10 pm »
When my nocks are snug, I double fold some not-too-aggressive sandpaper and work them down that way.  That's an interesting nock design!

Offline loon

  • Member
  • Posts: 1,307
Re: loony rants
« Reply #42 on: July 15, 2017, 06:31:04 pm »
Thanks, I filed it some with needle files, and now it's too loose, but it's fine for me. Should've listened to "judicious" more. Gotta be patient and go slow, same applies to tillering.
It seems like it can fly straight enough.
These sorts of nocks were used by the Turks.

I'm hoping that the AWS free tier will be enough for that web app.