Thursday, May 5, 2011

SQL TOP 1 analog for lists in Python

Here is an example of my input csv file:

...
0.7,0.5,0.35,14.4,0.521838919218
0.7,0.5,0.35,14.4,0.521893472678
0.7,0.5,0.35,14.4,0.521948026139
0.7,0.5,0.35,14.4,0.522002579599
...

I need to select the first row whose last field (a float) is greater than a random number. My current implementation is very slow, because the script repeats this lookup many times inside outer loops:

for line in foo:
   if float(line[-1]) > random.random():
      res = line
      break
...

How can I make this better and faster?

EDIT:

I was advised to use bisect for this task, but I don't know how to do it.

From stackoverflow
  • The fastest approach is to use bisect (assuming the float list is ordered). You can do it like this:

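    import bisect
    import random

    # csv fields are strings, so convert the last one to float before searching
    float_list = [float(line[-1]) for line in foo]
    # bisect.bisect returns the index of the first value > the random number
    index = bisect.bisect(float_list, random.random())
    if index < len(float_list):
        result = foo[index]
    else:
        result = None  # no row qualifies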
    

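    The float list must be sorted in ascending order for this to work. Build it once, outside the loops, so that each lookup costs O(log n) instead of a full scan.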

  • You might actually be able to use the appropriate SQL command if you import the CSV file into SQLite. Python has a built-in sqlite library you can use to query the database.
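A minimal sketch of the SQLite route, assuming Python 3 and the five-column layout shown in the question. The table name `samples`, the column names, and the helpers `load_rows` and `first_row_above` are made up for illustration, and the in-memory sample stands in for the real file. Note that `ORDER BY threshold LIMIT 1` only matches "first row in the file" when the last column is already sorted, as in the sample data.

```python
import csv
import io
import random
import sqlite3

# Stand-in for the real CSV file (hypothetical; use open("your_file.csv") instead).
SAMPLE_CSV = """\
0.7,0.5,0.35,14.4,0.521838919218
0.7,0.5,0.35,14.4,0.521893472678
0.7,0.5,0.35,14.4,0.521948026139
0.7,0.5,0.35,14.4,0.522002579599
"""


def load_rows(conn, csv_file):
    """Import the CSV into a table once; column names are assumptions."""
    conn.execute(
        "CREATE TABLE samples (a REAL, b REAL, c REAL, d REAL, threshold REAL)"
    )
    rows = (
        tuple(float(value) for value in record)
        for record in csv.reader(csv_file)
        if record  # skip empty lines
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", rows)
    # An index on the searched column lets SQLite avoid a full table scan.
    conn.execute("CREATE INDEX idx_threshold ON samples (threshold)")
    conn.commit()


def first_row_above(conn, value):
    """Return the row with the smallest threshold > value, or None."""
    cursor = conn.execute(
        "SELECT a, b, c, d, threshold FROM samples "
        "WHERE threshold > ? ORDER BY threshold LIMIT 1",
        (value,),
    )
    return cursor.fetchone()


conn = sqlite3.connect(":memory:")
load_rows(conn, io.StringIO(SAMPLE_CSV))
row = first_row_above(conn, random.random())
```

Here `LIMIT 1` plays the role of `TOP 1`. Whether this beats bisect on an in-memory list depends on the data size and how often the lookup runs, so it is worth timing both.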
