Tuesday, December 11, 2012

Postgres temporary table definition and loading in SQLAlchemy, as well as handling large lists of values to be checked against as query parameters

Often, I have to pass a large list of potential value matches to a PostgreSQL query through SQLAlchemy . I believe that SQLAlchemy limits the number of allowed arguments to somewhere around 1000, so to get around that, I'll pass a comma separated list of values and then split the values in my query. For instance, say I am looking for values with ids matching 2, 4, 6, 8, and 10. The query would work as follows:

SELECT * from my_table where id = any(string_to_array( '2,4,6,8,10' )); --Yes, you have to worry about casting here with numeric ids...
And you could automate this behavior with a Python function like so:

def csv_string_selectable(element_list): from sqlalchemy import select, func return select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) )]) def sample_usage(): matching_ids = [ "2", "4", "6", "8", "10" ] query = session.query(MyTable) query = query.filter(MyTable.id.in_(csv_string_selectable(matching_ids)))
Unfortunately, Postgres as of 8.4 doesn't always process these sorts of queries as optimally as it could. That is, if I rewrite the query to use a temporary table, I see a performance improvement. If multiple of these csv lists are used against different tables in the same query, using a temporary table provides a huge performance improvement. The incantation for getting the temporary table up and into the correct session was difficult to derive:

#You also have to have your sqlalchemy # metadata variable defined and available from sqlalchemy.sql.expression import Executable, ClauseElement from sqlalchemy.ext.compiler import compiles class InsertFromSelect(Executable, ClauseElement): def __init__(self, table, select): self.table = table self.select = select @compiles(InsertFromSelect) def visit_insert_from_select(element, compiler, **kw): return "INSERT INTO %s (%s)" % ( compiler.process(element.table, asfrom=True), compiler.process(element.select) ) def csv_string_selectable_tt(session, element_list): #Generate a unique temporary table name import uuid table_name = "temp_" + str(uuid.uuid1()).replace("-", '') #Create the temporary table and commit it to existence temp_table = Table( table_name, metadata, Column('id', VARCHAR(length=255, convert_unicode=False, assert_unicode=None), primary_key=True), prefixes=['TEMPORARY'] ) temp_table.create(bind=session.bind) #This commit is necessary or else the session will be rolled back before the next command session.commit() from sqlalchemy import select, func, insert source_select = select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) ).label('id')]) #This custom generator is necessary to select into a table from a selectable insert_statement = InsertFromSelect(temp_table, source_select) session.execute(insert_statement) return temp_table #Usage: def sample_usage(): allowed_ids = [ "2", "4", "6", "8", "10" ] allowed_ids_table = csv_string_selectable_tt(session, allowed_ids) query = session.query(MyTable) query = query.filter(MyTable.description == My2ndTable.description ) query = query.filter(MyTable.id == allowed_ids_table.alias().c.id ) query = query.filter(My2ndTable.id == allowed_ids_table.alias().c.id )

Thursday, March 8, 2012

Compare the simlarity of two strings in Python

Again, I have adapted this code from some C code I found on the Internet years ago.

Side note: Did you know that toward and towards are completely interchangeable? Toward tends to get used in American English while towards is used more heavily in British English.



def wordcompare (s1, s2, weight_toward_front = False):
"""Compares two strings and returns a value between 0 and 100 for similarness"""

if not s1 or not s2:
return 0

if s1 == s2:
return 100

s1_len = len(s1)
s2_len = len(s2)

n = i = j = k = x = y = 0

while i < s1_len:
x = i
j = 0
while j < s2_len:
y = j
k = 0
while x < s1_len and y < s2_len and s1[x] == s2[y]:
eb = 1
if weight_toward_front:
if (x < 1 and y < 1):
eb += 3
if (x < 2 and y < 2):
eb += 2
if (x < 3 and y < 3):
eb += 1

k += 1
n += (k*k*eb)
x += 1
y += 1

j += 1
i += 1

n = int((n * 20) / (s1_len * s2_len))

if n > 100:
n = 100

return n

Wednesday, March 7, 2012

Compare the simlarity of two strings in Visual Basic

I have adapted this code from some C code I found on the Internet years ago.


Function WordCompare(word1 As String, word2 As String, Optional weight_towards_front As Boolean = False) As Integer
'Compares two strings and returns a value between 0 and 100 for similarness
If word1 = "" And word2 = "" Then
WordCompare = 0
Exit Function
End If

If word1 = word2 Then
WordCompare = 100
Exit Function
End If

Dim word1_len, word2_len As Integer

word1_len = Len(word1)
word2_len = Len(word2)

Dim n, i, j, k, x, y, eb As Integer
n = 0
i = 0
j = 0
k = 0
x = 0
y = 0

For i = 1 To word1_len
x = i

For j = 1 To word2_len
y = j
k = 0

Do While x <= word1_len And y <= word2_len And Mid(word1, x, 1) = Mid(word2, y, 1)
eb = 1
If weight_towards_front Then
If x < 2 And y < 2 Then
eb = eb + 3
End If
If x < 3 And y < 3 Then
eb = eb + 2
End If
If x < 4 And y < 4 Then
eb = eb + 1
End If
End If

k = k + 1
n = n + (k * k * eb)
x = x + 1
y = y + 1
Loop

Next j
Next i

n = (n * 20) / (word1_len * word2_len)

If n > 100 Then
n = 100
End If

WordCompare = n

End Function

Wednesday, November 30, 2011

Thursday, January 6, 2011

A python function to add/update querystring arguments on a given url


def add_get_args_to_url(url, arg_dict):
import urllib, urlparse

try:
from urlparse import parse_qs
except ImportError:
from cgi import parse_qs

url_parts = urlparse.urlparse(url)

qs_args = parse_qs(url_parts[4])
qs_args.update(arg_dict)

new_qs = urllib.urlencode(qs_args, True)

return urlparse.urlunparse(list(url_parts[0:4]) + [new_qs] + list(url_parts[5:]))

Wednesday, November 10, 2010

Acquiring stock information in Excel from Yahoo Finance

Yahoo Finance has a nice feature where you can request stock information in csv format. For instance, want the current earnings per share value for Bank of America? You can get that in a pretty raw form suited for automation.

http://finance.yahoo.com/d/quotes.csv?s=BAC&f=e

There you go. Click that, save it, and open it with Excel, and up will come the information that you asked for.

You can request multiple pieces of information at a time, and a number of different data fields for them. The list is here: http://www.gummy-stuff.org/Yahoo-data.htm

Let's take this a step further. We can make Excel automatically load this information from the Internet. Not only will it eliminate the need to go look up this information and fill it in, but it will reload it whenever the worksheet is refreshed.

To do this, we're going to write a VBA function that takes a ticker symbol and the field we are looking for.


Public Function GetQuote(ByVal ticker As String, ByVal field As String)
'Valid field options can be found at http://www.gummy-stuff.org/Yahoo-data.htm
Dim objHttp As Object
Set objHttp = CreateObject("MSXML2.ServerXMLHTTP") 'Create an XMLHTTP object to access the web
Dim url As String
url = "http://finance.yahoo.com/d/quotes.csv?s=" & ticker & "&f=" & field 'Build our query consisting of our ticker symbol and the field we want
objHttp.Open "GET", url, False 'Open up our connection to Yahoo
objHttp.Send 'And get Yahoo's response

'Before returning, knock off the last two characters from the response--they are junk
GetQuote = Left(objHttp.ResponseText, Len(objHttp.ResponseText) - 2)
End Function


For those of you new to VBA, press Alt-F11 to bring up the VBA window. Right-click in the Project window, and insert a new Module. Then, just copy and paste the above code into the code window.



Now that we have our function, we can make use of it. Returning to our worksheet, we can start entering our formula. If you have Excel 2010 or later, Excel should fill in the name of the function you have added.



Fill in the ticker symbol and field you are looking up. In this example, I'm using the function to create a matrix of values with the ticker symbols along the left and the field symbols along the top.




Excel will go out to Yahoo and get the requested data.




In this example, we can drag the cell border to fill the values down and over.

Friday, April 3, 2009

No module named servicemanager - building a windows service with py2exe

Trying to build a windows service out of my standalone django application, I was using the various examples available, and kept running into the problem "No module named servicemanager" whenever I tried to run the generated executable. There is almost nothing out there on this, but it occurred to me that perhaps I needed to install/reinstall the Python Win32 Extensions. Turns out, that was it.